key: cord- -d gkfo s authors: perzel mandell, kira a.; price, amanda j.; wilton, richard; collado-torres, leonardo; tao, ran; eagles, nicholas j.; szalay, alexander s.; hyde, thomas m.; weinberger, daniel r.; kleinman, joel e.; jaffe, andrew e. title: characterizing the dynamic and functional dna methylation landscape in the developing human cortex date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: d gkfo s dna methylation (dnam) is a key epigenetic regulator of gene expression across development. the developing prenatal brain is a highly dynamic tissue, but our understanding of key drivers of epigenetic variability across development is limited. we therefore assessed genomic methylation at over million sites in the prenatal cortex using whole genome bisulfite sequencing and found loci and regions in which methylation levels are dynamic across development. we saw that dnam at these loci was associated with nearby gene expression and enriched for enhancer chromatin states in prenatal brain tissue. additionally, these loci were enriched for genes associated with psychiatric disorders and genes involved with neurogenesis. we also found autosomal differences in dnam between the sexes during prenatal development, though these have less clear functional consequences. we lastly confirmed that the dynamic methylation at this critical period is specifically cpg methylation, with very low levels of cph methylation. our findings provide detailed insight into prenatal brain development as well as clues to the pathogenesis of psychiatric traits seen later in life. sites change over time and these changes lead to adjustments in gene expression and splicing. these key regions have also been linked to neurodevelopmental disorders such as schizophrenia, in which early dysregulation plays a vital role , . dnam is an attractive epigenetic mechanism to study in post-mortem human brain tissue, because it represents an interaction between genetic and environmental effects. 
external factors such as changes in diet , exposure to cigarette smoking , and exposure to arsenic have been associated with both global and site-specific changes to dnam levels. in order to better understand the causes and consequences of deviant dnam patterns in psychiatric disease development, we must first understand the normal landscape. illuminating typical methylation changes in prenatal development will both provide insight into gene expression and molecular pathways active in the postnatal developing brain, and provide a baseline for identification of aberrant dnam in postnatal disease states, as the pathological changes that lead to symptoms of psychiatric disease may precede the onset of illness by several decades . the dorsolateral prefrontal cortex (dlpfc) is a dynamic region of the brain throughout development, essential for motor planning, conceptual organization, and working memory, and is often functionally dysregulated in patients with schizophrenia . previous studies using microarrays to quantify dnam have revealed that there are many dnam changes in the dlpfc around the time of birth , but the dynamics of the prenatal brain are far less characterized. previous work assessing fetal brain dnam has largely used microarrays, and sampled whole brain rather than a specific region . most cells in the developing brain are neuronal , and it has been shown that neuronal dna methylation is especially dynamic in the earliest stages of postnatal life, making the dlpfc a potentially fruitful region for deeper interrogation in prenatal samples . in the present report, we describe the use of whole genome bisulfite sequencing (wgbs) to capture an unbiased map of the dnam landscape, and to characterize both cpg and cph methylation during prenatal brain development. 
dynamic regions of dnam even in the early developmental context are likely to be associated with psychiatric risk-associated genes, and connection to gene expression data can validate the importance of these genomic regions. these important regions will help point to pathways and mechanisms of normal brain development as well as psychiatric and neurodevelopmental conditions. study samples: brain tissue from these second-trimester prenatal samples was obtained via a material transfer agreement with the national institute of child health and human development brain and tissue bank. all specimens were flash-frozen, then brain ph was measured and postmortem interval (pmi, in hours) was calculated for every sample. postmortem tissue homogenates of the dorsolateral prefrontal cortex (dlpfc) were obtained from all subjects. samples were obtained from the developing prefrontal cortex from the dorsolateral convexity of the frontal lobe half-way between the frontal pole and temporal pole, mm lateral to the central sulcus. specimens extended from the surface of the brain to the ventricular zone. there were of each male and female subjects, and subjects were african american and were european ancestries (table s ). data generation: genomic dna was extracted from mg of pulverized dlpfc tissue with the phenol-chloroform method. dna was subjected to bisulfite conversion followed by sequencing library preparation using the truseq dna methylation kit from illumina. lambda dna was spiked in prior to bisulfite conversion to assess its rate, and we used % phix to better calibrate illumina base calling on these lower complexity libraries. resulting libraries were pooled and sequenced on an illumina hiseq x ten sequencer with paired end bp reads ( x bp), targeting gb per sample. this corresponds to x coverage of the human genome as extra reads were generated to account for the addition of phix. 
data processing: the raw wgbs data was processed using fastqc to control for quality of reads, trim galore to trim reads and remove adapter content, arioc for alignment to the grch .p genome (obtained from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/ / / /gca_ . _grch .p /gca_ . _grch .p _assembly_structure/primary_assembly/assembled_chromosomes/), samblaster to remove duplicate alignments, and the bismark methylation extractor to extract methylation data from the sequencing data. we then used the bsseq r/bioconductor package (v . ) to process and combine the dna methylation proportions across the samples for all further manipulation and analysis. after initial data metrics were calculated, the methylation data was smoothed using bsmooth for modelling. cpgs were filtered to those that had ≥ coverage in all samples, and cphs were filtered to those that had ≥ coverage and non-zero methylation in at least half (≥ ) of the samples.
comparison to k: to compare wgbs methylation levels to k methylation levels, we used data generated from the same samples with the two methods, and compared methylation levels at the same sites both graphically (figure d) and by assessing the mean differences and root-mean-square deviation (figure c). we then validated our model's findings by applying the same model to the microarray data at overlapping sites; we considered a locus significant at fdr < . and validated if its association had p < . in the validating set (figure s ).
gene set enrichment: we annotated our data using gencode v. on hg . we performed gene ontology and gene set enrichment using clusterprofiler (v . ) with a p-value cutoff of . and q-value cutoff of . . we used sfari . for an autism spectrum disorder gene set, and a set of clinical gene sets defined by birnbaum et al for other neuropsychiatric and neurodevelopmental disorders. enrichment was calculated on a background of genes expressed in our samples, to avoid brain bias.
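the role of the expressed-gene background can be illustrated with a minimal sketch of an over-representation test. this is not the clusterprofiler implementation; it assumes a one-sided hypergeometric test, and the function name `enrichment_p` and the toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p(background, hits, gene_set):
    """One-sided hypergeometric p-value for over-representation.

    background: all genes eligible for testing (here, genes expressed in the samples)
    hits:       genes flagged by the analysis (e.g. genes containing significant CpGs)
    gene_set:   the annotated set being tested (e.g. a disorder-associated gene list)
    """
    background = set(background)
    # Restrict both lists to the background so that genes that could never
    # have been detected do not inflate or deflate the enrichment.
    hits = set(hits) & background
    gene_set = set(gene_set) & background

    n_background = len(background)
    n_set = len(gene_set)
    n_hits = len(hits)
    overlap = len(hits & gene_set)

    # P(X >= overlap) when drawing n_hits genes without replacement.
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_set, i) * comb(n_background - n_set, n_hits - i)
        for i in range(overlap, min(n_set, n_hits) + 1)
    )
    return overlap, tail / total


# Toy data (hypothetical gene names).
background = [f"g{i}" for i in range(20)]
gene_set = ["g0", "g1", "g2", "g3", "g4", "not_expressed"]
hits = ["g0", "g1", "g2", "g10"]
overlap, p_value = enrichment_p(background, hits, gene_set)
print(overlap, round(p_value, 4))
```

restricting both lists to the background is the step that corresponds to testing against genes expressed in the samples rather than against the whole genome.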
we performed ld score regression as described by and with data from finucane et al .
data analysis: cell composition of samples was deconvoluted using flow-sorted dlpfc microarray data from fluorescence-activated nuclear sorted neurons and non-neurons and the r package minfi. linear regression modelling was performed accounting for sex, age, and embryonic stem cell content. sex-related cpgs and their surrounding sequence were re-aligned to the genome with bowtie to check for homology. association was tested using limma (v . ), fitting a linear regression model accounting for sex, age, and embryonic stem cell content. the homogenate rna-seq samples were also part of a larger study of rna-seq data from homogenate dlpfc tissue (brainseq consortium phase i). we compared a site's methylation or mean methylation (for dmrs) to the expression level from rna-seq using a linear model adjusting for the strongest covariates, including sex, age, and estimated embryonic stem cell (esc) content.
section : whole genome bisulfite sequencing in the prenatal human cortex
we performed whole genome bisulfite sequencing (wgbs) to better characterize the shifting dnam landscape in the developing human dorsolateral prefrontal cortex (dlpfc) in prenatal samples during the second trimester in utero (table s ). after data processing and quality control (see methods), we analyzed , , cpg and , , cph (h=a, t, or c) sites across the epigenome. we first focused on cpg dnam and performed a series of quality control analyses. we quantified read coverage at every cpg, which after processing and alignment resulted in an average of . reads per cpg (figure a). most of these sites are methylated (> % dnam) and a minority are unmethylated (< % dnam) (table s ).
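the per-site linear model described in the data analysis methods can be sketched as an ordinary least squares fit of methylation on age, sex, and estimated esc fraction. this is an illustration only: limma additionally applies empirical bayes moderation of the variances, which is not reproduced here, and the function names and toy values below are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_site(meth, age, sex, esc):
    """Least squares fit of meth ~ intercept + age + sex + esc for one site.

    Returns [intercept, age_slope, sex_effect, esc_effect].
    """
    rows = [[1.0, a, s, e] for a, s, e in zip(age, sex, esc)]
    p = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, meth)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy example: seven samples with made-up ages (weeks), sex coding and esc fractions.
age = [14, 15, 16, 17, 18, 19, 20]
sex = [0, 1, 0, 1, 1, 0, 1]
esc = [0.30, 0.28, 0.22, 0.25, 0.15, 0.18, 0.10]
# Noise-free methylation generated from known coefficients, so the fit should recover them.
meth = [0.2 + 0.01 * a + 0.05 * s - 0.1 * e for a, s, e in zip(age, sex, esc)]
coef = fit_site(meth, age, sex, esc)
print([round(c, 4) for c in coef])
```

the age coefficient is the quantity of interest: it is the change in methylation proportion per week after adjusting for sex and esc fraction.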
we then performed principal component analysis (pca) on the dnam levels across these high-coverage cpgs and found that the top principal components were most associated with sex and estimated fractions of embryonic stem cells (escs) via deconvolution (figure b; figure s ; see methods), in line with our previous work. we lastly compared the dnam levels from wgbs to levels from the illumina k microarray measured on of the samples for the subset of cpgs in common to both technologies (n= ), and found that overall, cpg methylation levels were highly concordant regardless of cpg read coverage (figure c-d). these analyses together suggested that our dnam data was of high quality and suitable for subsequent differential methylation analysis.
to understand the changes in the prenatal epigenetic landscape, we performed linear modeling across all cpg sites. we found that dnam changes were abundant even during this relatively restricted period in prenatal development, with , cpg sites differentially methylated across the ages of - post-conception weeks (pcw, at fdr < . , table s ). on average, each week of development was associated with a . % change in dnam (iqr: . % - . %) at a given cpg, with some sites changing as much as . % per week. these differentially methylated cpgs were not evenly distributed across the genome, with none on the y chromosome, and a small number on the x chromosome (table s ). there was also some unevenness on the autosomes, with some chromosomes having up to . x higher frequency of age-associated cpgs than others (table s ). , of these cpgs lie in previously annotated cpg islands, with , in shores, defined as kb from island ends. the vast majority ( %) of these cpgs lie within genes, and , lie within kb of a transcription start site and thus potentially in promoters. this leaves . % in intergenic space. sites were fairly evenly split in the direction of methylation change by age, with % increasing in methylation as the cortex develops.
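the fdr threshold applied to the per-site tests can be illustrated with a short sketch. the text does not name the adjustment procedure, so the benjamini-hochberg method is assumed here; the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four sites.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
print([round(q, 4) for q in qvals])
```

a site is then called differentially methylated when its adjusted value falls below the chosen fdr cutoff. with tens of millions of sites tested, the adjustment is what makes the reported counts conservative.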
additionally, less than half of these cpg sites are significantly associated with the estimated esc proportion in the sample, suggesting that many of these cpg sites have true prenatal age changes in methylation rather than reflecting effects of maturing cell type composition. we further explored whether these sites could be organized into differentially methylated regions (dmrs), which have been shown to be more functionally relevant than individual cpgs. we therefore used a "bumphunting" technique adapted to wgbs data to identify regions with . % dnam change per week (corresponding to a . % change across our developmental window) across adjacent cpgs, and calculated statistical significance using bootstrap-based permutations. using a conservative cutoff (fwer < . ) there were dmrs across prenatal development, though a less conservative cutoff (p < . ) identified , dmrs (table s , figure s a for dmr plot). like the cpgs, the dmrs are unevenly distributed throughout the genome, being far less frequent on chrx and variable among the autosomes. the dmrs on average had a width of bp (iqr: - ), % overlap with annotated genes, and % overlap with annotated cpg islands. like the cpgs, the dmrs were split between hypo- and hypermethylation, where in % of the dmrs, methylation increased with age. we also found differentially methylated sites and regions between the sexes in these prenatal samples. there were , significantly differential cpgs by sex and, as expected, the vast majority ( %) of these were on the sex chromosomes. there were still , significant autosomal cpgs (table s ), and while of them were in regions homologous to the sex chromosomes, the majority ( %) had no homology to chrx or chry. among these, we conservatively identified dmrs with a % methylation change between sexes (table s , figure s b for dmr plot).
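the region-finding step behind the dmr analysis can be sketched as grouping adjacent cpgs whose per-week change exceeds a cutoff in the same direction. this is a simplified illustration of the candidate-region idea only. it does not reproduce the smoothing or the bootstrap-based significance step used in the study, and the function name, the cutoff, and the `max_gap` parameter are hypothetical.

```python
def candidate_regions(positions, slopes, cutoff, max_gap):
    """Group neighbouring CpGs with |slope| >= cutoff and the same sign.

    positions: sorted genomic coordinates on one chromosome
    slopes:    per-site change in methylation per week
    cutoff:    minimum absolute slope for a site to count
    max_gap:   largest distance (bp) allowed between consecutive sites in a region
    Returns a list of (start, end, n_sites, mean_slope) tuples.
    """
    regions = []
    current = None
    for pos, slope in zip(positions, slopes):
        passes = abs(slope) >= cutoff
        sign = 1 if slope > 0 else -1
        extends = (
            passes
            and current is not None
            and current["sign"] == sign
            and pos - current["end"] <= max_gap
        )
        if extends:
            current["end"] = pos
            current["slopes"].append(slope)
            continue
        # Close the open region, if any, then start a new one when this site passes.
        if current is not None:
            regions.append(current)
        current = (
            {"start": pos, "end": pos, "sign": sign, "slopes": [slope]}
            if passes
            else None
        )
    if current is not None:
        regions.append(current)
    return [
        (r["start"], r["end"], len(r["slopes"]), sum(r["slopes"]) / len(r["slopes"]))
        for r in regions
    ]


# Toy example with made-up coordinates and slopes.
positions = [100, 150, 180, 1000, 1040, 1100, 5000]
slopes = [0.02, 0.03, 0.001, -0.02, -0.025, -0.03, 0.05]
regions = candidate_regions(positions, slopes, cutoff=0.01, max_gap=300)
for region in regions:
    print(region)
```

in practice each candidate region would then be compared against a null distribution of regions obtained by resampling, which is how the family-wise error rates reported above are derived.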
again we saw that these were not global, but regional changes in methylation, with equal numbers of dmrs being hypo-and hypermethylated in males and females. these data show that there are many differences in the dnam landscape throughout second trimester development, and between sexes prenatally. previously, many studies of brain dna methylation have used the illumina infinium® humanmethylation beadchip (" k") and more recent infinium methylationepic (" k") microarray technologies. while these platforms can sensitively measure dnam levels without the high coverage/sequencing requirements of wgbs, they assay a limited number of sites. to better identify the tradeoff between breadth (assaying more sites) and depth (assaying a given site more accurately), we compared our wgbs findings to analogous statistics calculated using these arrays on the same dna extractions. first, using the probe coordinates alone, the k array does not measure dnam levels at , ( %) of the significant age-correlated cpgs or , ( . %) of the significant sex-correlated cpgs we identified. the k dnam data validated ( %) out of the , age-correlated sites that were covered. performing the same age model on the k data, our wgbs data validated / ( %) of the significant results. the effects of age on methylation were generally directionally concordant between the two datasets ( figure s ). though dmrs have much wider spans, the k does not cover ( %) and ( %) of dmrs for age and sex, respectively. the newer k microarray had almost twice the number of probes, yet we found that it does not measure % of our significant sites for age and % of sites for sex. additionally, it still failed to capture % of age dmrs and % of sex dmrs. microarrays are potentially missing a great deal of significantly differential sites, suggesting wgbs is best for thorough analysis. 
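the platform comparison rests on two summary statistics computed at sites shared by wgbs and the microarray: the mean difference and the root-mean-square deviation. a minimal sketch follows, assuming methylation is expressed as a proportion between 0 and 1; the function name and the toy values are hypothetical.

```python
import math


def platform_concordance(wgbs, array):
    """Mean difference and root-mean-square deviation between paired measurements.

    wgbs, array: methylation proportions for the same CpGs, in the same order.
    """
    if len(wgbs) != len(array) or not wgbs:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [w - a for w, a in zip(wgbs, array)]
    mean_diff = sum(diffs) / len(diffs)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_diff, rmsd


# Toy values for four shared CpGs.
wgbs_beta = [0.90, 0.10, 0.50, 0.75]
array_beta = [0.85, 0.15, 0.50, 0.80]
mean_diff, rmsd = platform_concordance(wgbs_beta, array_beta)
print(round(mean_diff, 4), round(rmsd, 4))
```

the mean difference captures systematic bias between platforms, while the rmsd also reflects site-to-site scatter, so the two are worth reporting together.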
section : functional characterization of differential sites while differential methylation analysis provides detailed information about changes in the methylation landscape across brain prenatal development and between the sexes, we wanted to better understand the molecular consequences of these changes. we performed gene set enrichment and gene ontology to better understand in which processes the genes containing significant cpgs are involved. the top biological processes associated with genes containing differentially methylated cpgs across age were related to axon development and guidance, and regulation of neuron projection ( figure s , table s ). these sites were also shown to be enriched for enhancers in fetal brain tissue as well as enriched for transcription start site regions in brain tissue from the epigenome roadmap project , implying that these areas are likely functional in prenatal brain ( figure a ). to better assess the putative functionality of the methylation at these cpgs and dmrs, we correlated these dnam associations with nearby expression levels using rna-seq data from the same cortical dissection. among the , age-related cpgs, % were within an rna-seq-measured gene (corresponding to , unique genes). , ( %) of these cpgs had methylation levels significantly correlated (p < . ) to expression of the gene they lie directly within ( unique genes, %, see methods, figure b -c). these expression-correlated cpgs did not have any differential gene ontology from the overall group of genes. these sites were mostly in weak transcription, enhancer, and quiescent chromatin states in fetal brain ( figure s ) . with an expanded window of kb around genes, , ( %) cpgs were accounted for, and , ( %) correlated to the nearest gene's expression. it is important to note that this only tested sites within actual genes, and only the effect on the same gene, so enhancer, upstream, and trans-acting effects were not accounted for. 
the correlation to expression was much lower for sex-related cpgs, at % and % significant correlation depending on inclusion or exclusion of sex chromosomes. the lack of correlation to expression here is unsurprising because there are almost no transcriptomic differences between the sexes in these samples. gene set enrichment on autosomal sex-related cpgs found genes involved with synaptic transmission and signaling, regulation of gtpase activity, and the glutamate receptor signaling pathway (table s ) . age-related dmrs were most enriched for genes related to stem cell proliferation and various cell fate specifications (table s ). among the dmrs across age, % were significantly correlated with gene expression. the genes represented by this are otx , ac . , cyp e , plekhh , and dux l . there was no enrichment for any gene category among the sex-related dmrs, but it is worth noting that many of these genes coded for lncrnas. % of sex-related dmrs were correlated to gene expression within their own gene and these genes were linc , linc , spatc l , and rfpl . correlation to gene expression overall was not different between differentially methylated regions and non-differentially methylated regions; the dmrs we found were not enriched for highly correlated regions. these results provide potential starting points for further understanding the dna methylation landscape of the developing brain, allowing us to understand which processes are active during normal and disordered development. while dna methylation occurs exclusively in the cpg context in most somatic cell types, neurons in the human brain uniquely have high levels of dna methylation in other cytosine contexts (predominantly cpa) . we therefore investigated the potential role of cph methylation across brain development. only . % (iqr: . - . ) of sites in a chg context and . % (iqr: . - . ) of sites in a chh context were methylated. 
for comparison, around % of cph sites are methylated in postnatal neurons, so it is likely as previously suggested that cph methylation accumulates beginning around the time of birth. this contrasted with cpg sites, which were predominantly methylated across the genome (table s ) . because there were so many more cph than cpg sites in the genome, this means that there are actually similar numbers of methylated cpg and cph sites, despite the different proportions. using the same linear model as was used for cpgs on , , cph sites, no cphs were significantly associated with age within this time period when accounting for the large multiple testing burden. however, autosomal cphs were significantly associated with sex (at fdr < . , table s ). additionally, there were associated cphs on the sex chromosomes, though most of these were on chry, overall very proportionally different from cpg methylation. there was no enrichment for any more specific trinucleotide context in these significant sites over the whole genome. these cphs were less likely to be in or near genes than the cpgs, but of the autosomal cphs that were in or near genes, only was significantly correlated (p < . ) with expression of the nearest gene within kb. it is notable that the majority of these cphs are relatively far from a gene. additionally, the effect size of the cph methylation at these significant sites is independent of the effect of cpgm in the surrounding area in most cases. five of the cphs' effect is reduced when accounting for the methylation of the nearest cpg ( figure s ) , showing that these few are dependent on the local cpg landscape, but most function independently. these loci are found both in and outside of genes and their surrounding regions. overall, cph methylation does not seem to be very dynamic or functional in prenatal second trimester development, though later in early postnatal life it is . section : links to neuropsychiatric conditions and their (genetic) risk. 
to understand the clinical implications of our findings, we tested the genes represented by prenatal age-associated cpgs and found that they were enriched for bipolar disorder- (p = . ), neurodevelopmental disorder- (p = . ), and schizophrenia- (p = . ) associated genes. age-differential cpgs with correlation to gene expression were not further enriched. the set of age-differential cpg genes was particularly enriched for autism-associated genes (p = x - , or = . ), with of our , genes being linked to autism spectrum disorder (asd). sex-differential cpgs were also enriched for asd-associated genes (p = x - , or = . ), but not for genes associated with other psychiatric disorders. dmrs represented a greater portion of the genome than cpgs, and thus may have different functional effects. to further understand what phenotypes may be linked to our dmrs, we performed stratified ld score regression. our dmr sets represented a fairly small portion of the genome, but at . mb, the less stringent age-based dmrs were wide enough to detect enrichment of bmi- and subjective well-being- (swb) associated markers (table s ). these dmrs were also shown to be enriched for brain-linked traits overall, compared to non-brain-linked traits. additionally, the dmrs overlap with of schizophrenia loci from the psychiatric genomics consortium. these results suggest that dynamic methylation states in prenatal development may play a role in many disease and non-disease phenotypes that manifest later in life.
here we demonstrate that cpg methylation changes are associated with prenatal cortical development, are likely functionally relevant, and may play a role in developmental psychiatric disorders such as autism spectrum disorder (asd). the shifting dnam landscape during this restricted prenatal time period (the second trimester) likely plays a role in neurogenesis of the developing human brain.
following current dogma, changes in dnam at these critical sites presumably lead to altered gene expression which promotes the formation of new neural connections in the cortex. at this point in time, we see that it is cpgs that are the important sites in the dnam landscape, as opposed to cphs. this is in line with previous research which finds that most cphm is established around birth and postnatally, later in neuron differentiation and development. we also see that much of this regulation is occurring in cis, within the site's own gene. we do not currently understand the trans effects of our developmentally dynamic sites, but they may explain the effects of the methylation that does not correlate to its nearest gene's expression. the fact that the age-associated cpg sites we identified are often in enhancer regions in prenatal brain is further evidence that these sites may be regulating from a distance. our findings are in line with previous findings based on a microarray platform, but using wgbs allows us to assess methylation at far more loci than the more commonly used illumina microarrays. we have shown that these microarrays do not even measure the majority of sites that are dynamic in neural development. despite potentially less precision, we also show that wgbs has enough coverage to assess dnam dynamics and that its findings were consistent with deeper-coverage assays. additionally, we present novel loci due to our narrower developmental age range, which means that we can detect more subtle changes in the second trimester but also means that we do not replicate all the findings of studies done across wider timespans. confining our samples to just cortex tissue rather than whole brain tissue also allows us to detect more regionally specific changes. most variation in our data likely comes from different cell types in the cortex, which we partially accounted for in our model by adjusting for esc estimated proportions.
we believe that most of the cell types in this area at this time are neurons, but there is likely a range of maturities leading to variation in methylation. by accounting for esc fraction in our model, we believe that the age effects we find are truly a result of prenatal age. we also find that there are autosomal differences in dnam between the sexes, though by our assessment they seem less functionally significant. they may be acting more in trans, which makes it more difficult to assess significance, but there are also just generally far fewer transcriptomic differences between sexes than between ages. sex-differential dnam later in life has been linked to psychiatric genes, which has been proposed as a mechanism for sex differences in psychiatric disorders, so perhaps the early changing dnam is laying the groundwork for future effects. like xia et al, we found that asd-associated genes are differentially methylated between the sexes, but this does not readily translate into transcriptomic differences. given the enrichment in our developmentally-associated sites for psychiatric-linked genes, these data support the notion of early neurodevelopmental components for disorders such as schizophrenia. genes implicated in bipolar disorder, schizophrenia, and asd are clearly dynamic at this time point in life, so dysregulation at this stage could lead to vulnerabilities to these disorders later in life. our data also imply that bmi and subjective well-being, both non-disease traits, could be linked to neural development at this stage of life. as more data is generated, particularly through genome-scale methods like wgbs, we will be able to establish normal ranges of dnam at all ages, which will undoubtedly provide insight into molecular dynamics in this hard-to-study period and organ, as well as give clues to where deviation from the norm is important.
unfortunately, wgbs still does not measure hydroxymethylation, another chemical modification thought to be epigenetically important. our findings are only the beginning of what may be found given the limited sample size of our study, but even here we reveal important processes. there is room for much more characterization of these epigenetic marks, but it is clear that they are worth understanding. dna methylation serves as an exciting potential avenue to understand neural development and psychiatric disorders. there are clear and functional changes in the neuronal dnam landscape over this important window of brain development and between sexes, and further investigation will help elucidate unknown mechanisms in the brain. supplemental data include figures and tables.
divergent neuronal dna methylation patterns across human cortical development reveal critical periods and a unique role of cph methylation
spatio-temporal transcriptome of the human brain
epigenetic mechanisms in neurological disease
persistent epigenetic differences associated with prenatal exposure to famine in humans
tobacco-smoking-related differential dna methylation: k discovery and replication
long term low-dose arsenic exposure induces loss of dna methylation
genetic insights into the neurodevelopmental origins of schizophrenia
complexity of prefrontal cortical dysfunction in schizophrenia: more than up or down
mapping dna methylation across development, genotype and schizophrenia in the human frontal cortex
methylomic trajectories across human fetal brain development
a single-cell transcriptomic atlas of human neocortical development during mid-gestation
a wrapper around cutadapt and fastqc to consistently apply adapter and quality trimming to fastq files, with extra functionality for rrbs data
arioc: gpu-accelerated alignment of short bisulfite-treated reads
samblaster: fast duplicate marking and structural variant read extraction
bismark: a flexible aligner and methylation caller for bisulfite-seq applications
bsmooth: from whole genome bisulfite sequencing reads to differentially methylated regions
clusterprofiler: an r package for comparing biological themes among gene clusters
sfari gene . : a community-driven knowledgebase for the autism spectrum disorders (asds)
prenatal expression patterns of genes associated with neuropsychiatric disorders
partitioning heritability by functional annotation using genome-wide association summary statistics
flowsorted.dlpfc. k: illumina humanmethylation data on sorted frontal cortex cell populations (r: bioconductor)
minfi: a flexible and comprehensive bioconductor package for the analysis of infinium dna methylation microarrays
limma powers differential expression analyses for rna-sequencing and microarray studies
robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression
developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis
bump hunting to identify differentially methylated regions in epigenetic epidemiology studies
integrative analysis of reference human epigenomes
distribution, recognition and regulation of non-cpg methylation in the adult mammalian brain
divergent neuronal dna methylation patterns across human cortical development reveal critical periods and a unique role of cph methylation
biological insights from schizophrenia-associated genetic loci
global epigenomic reconfiguration during mammalian brain development
principles governing dna methylation during neuronal lineage and subtype specification
sex-differential dna methylation and associated regulation networks in human brain implicated in the sex-biased risks of psychiatric disorders
the neurodevelopmental hypothesis of schizophrenia, revisited
implications of normal brain development for the pathogenesis of schizophrenia
the authors thank the umb brain bank at the department of pediatrics in the university of maryland
school of medicine for the tissue provided. this project was supported by the lieber institute for brain development and by nih grants r mh and r mh . finally, we are indebted to the generosity of the families of the decedents, who donated the brain tissue used in these studies. the authors declare no competing interests. raw and processed nucleic acid sequencing data generated to support the findings of this study are part of the psychencode consortium and the brainseq consortium data releases. specifically, wgbs data have been deposited at www.synapse.org along with the other psychencode data, under the accession code syn . the homogenate rna-seq samples were also part of a larger study of rna-seq data from homogenate dlpfc tissue (brainseq consortium phase i) , which was also deposited at www.synapse.org and summarized in http://eqtl.brainseq.org/phase . the processed, homogenate rna-seq data for this study have additionally been deposited via globus under the jhpce#brainepi-cellsorted collection at the following location: http://research.libd.org/globus/jhpce_brainepi-cellsorted/ . neun-sorted rna-seq data were originally published as part of phase ii of the brainseq consortium ( http://eqtl.brainseq.org/phase / ) and have also been deposited via globus under the jhpce#brainepi-polya collection at the following location: http://research.libd.org/globus/jhpce_brainepi-polya/ . publicly available data reprocessed in support of the conclusions in this work were downloaded from the gene expression omnibus under geo accession gse . key: cord- -ju pao authors: carey, clayton m.; apple, sarah e.; hilbert, zoё a.; kay, michael s.; elde, nels c. title: conflicts with diarrheal pathogens trigger rapid evolution of an intestinal signaling axis date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ju pao the pathogenesis of infectious diarrheal diseases is largely attributed to enterotoxin proteins that disrupt intestinal water absorption, causing severe dehydration. despite profound health consequences, the impacts of diarrhea-causing microbes on the evolutionary history of host species are largely unknown. we investigated patterns of genetic variation in mammalian guanylate cyclase-c (gc-c), an intestinal receptor frequently targeted by bacterial enterotoxins, to determine how hosts might adapt in response to diarrheal infections. under normal conditions, gc-c interacts with endogenous guanylin peptides to promote water secretion in the intestine, but signaling can be hijacked by bacterially-encoded heat-stable enterotoxins (sta) during infection, which leads to overstimulation of gc-c and diarrhea. phylogenetic analysis in mammals revealed evidence of recurrent positive selection in the gc-c ligand-binding domain in primates and bats, consistent with selective pressures to evade interactions with sta. using in vitro assays and transgenic intestinal organoids to model sta-mediated diarrhea, we show that gc-c diversification in these lineages results in substantial variation in toxin susceptibility. in bats, we observe a unique pattern of compensatory coevolution in the endogenous gc-c ligand uroguanylin, reflecting intense bouts of positive selection at the receptor-ligand interface. these findings demonstrate control of water physiology as a previously unrecognized interface for genetic conflict and reveal diarrheal pathogens as a source of selective pressure among diverse mammals. many enteric pathogens enhance their growth and dissemination to new hosts by secreting enterotoxins that cause diarrhea. dehydration resulting from these host-enterotoxin interactions represents a significant cause of mortality and morbidity in human populations, particularly in children under the age of five.
the intestinal receptor guanylate cyclase-c (gc-c) is a frequent target of enterotoxins encoded by diverse bacterial pathogens. under normal conditions, gc-c activity is stimulated by interactions with the endogenous peptides guanylin and uroguanylin, leading to an increase in intracellular cgmp levels in enterocytes lining the small intestine and colon ( figure a ). enterocyte cgmp levels regulate osmotic balance of the gut by promoting chloride secretion through cystic fibrosis transmembrane conductance regulator (cftr) and inhibiting sodium import through sodium/hydrogen exchanger (nhe ), causing water to flow into the intestinal lumen. during infection, gc-c signaling can be hijacked by bacterial pathogens that produce heat-stable enterotoxins (sta). these toxin peptides mimic sequence and structural features of guanylins to overstimulate cgmp production and cause severe watery diarrhea , promoting rapid dissemination to new hosts. given the central role of sta-gc-c interactions in the pathogenesis of many diarrheal infections, we hypothesized that gc-c might adapt resistance to sta variants as part of an ongoing evolutionary conflict with diarrheal bacteria. to determine whether gc-c evolved under selective pressure to modulate interactions with enterotoxins, we first collected the sequences of gc-c orthologs from primate genomes for phylogenetic analysis (table s ). using statistical models that compare ratios of nonsynonymous to synonymous mutations (dn/ds) across species , , we find strong evidence of recurrent positive selection in gc-c (table s ). similar patterns of abundant nonsynonymous substitutions are widely reported for genes encoding dedicated immune functions that are targeted by pathogen factors . this suggests that gc-c may similarly be subject to recurrent selection based on temporary advantages conferred by mitigating pathogen interactions. 
unlike dedicated immune proteins, however, gc-c is functionally constrained by its physiological function of regulating water secretion through interactions with endogenous ligands. indeed, the gc-c signaling axis is found in all vertebrates, a pattern associated with functional conservation. to better understand the role of toxin interactions in gc-c evolution, we compared sequence evolution in the paralogous natriuretic peptide receptors (npr - ), which share similar overall structure and function with gc-c. npr and npr are not known to interact with pathogen-derived toxins, and in contrast to gc-c, bear signatures of purifying selection with highly conserved sequences across primates (tables s ,s ). rapid evolution of gc-c is therefore associated with its unique interaction with pathogens. humans encounter a diverse array of sta peptides during infection with different bacterial pathogens. these commonly include toxin variants stp and sth encoded by strains of enterotoxigenic e. coli. pathogenic strains from other bacterial genera, including yersinia and vibrio species, also encode distinct sta variants. these peptides show considerable sequence variation, in contrast to the endogenous gc-c ligands guanylin and uroguanylin, which are nearly identical among primates ( figure a , s ). to further examine how gc-c evolution in primates might be influenced by toxin interactions, we mapped rapidly evolving codons with significantly elevated dn/ds ratios to the primary structure of gc-c. strikingly, all six positively-selected sites mapped to the extracellular ligand-binding domain of gc-c, whereas the intracellular domain is highly conserved ( figure b , table s ). these patterns of diversification of both sta and the gc-c ligand-binding domain support a model of recurring conflicts between diarrhea-causing bacteria and hosts over control of water secretion in the intestine.
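The distinction between nonsynonymous and synonymous changes that underlies dn/ds can be illustrated with a small sketch. It only counts codon differences between two aligned, in-frame sequences using the standard genetic code; it is not the likelihood-based estimation performed by the phylogenetic software used in this study, and a real dn/ds estimate also normalizes by the number of synonymous and nonsynonymous sites. The function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a, seq_b):
    """Count (synonymous, nonsynonymous) codon differences.

    Both inputs must be gap-free, in-frame coding sequences of equal length.
    These are raw counts, not per-site rates, so their ratio is NOT dN/dS.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical toy sequences: CTG->CTA keeps leucine, AAA->AGA changes K to R.
print(classify_codon_changes("ATGCTGAAA", "ATGCTAAGA"))
```

An excess of amino-acid-changing substitutions relative to silent ones, after proper normalization, is what the codon models interpret as a signature of positive selection.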
in addition to primates, sta toxicity has been demonstrated in other mammalian species including rodents and livestock . to test whether gc-c might be subject to recurrent positive selection in other mammalian lineages, we collected sequences of gc-c from four additional groups of mammals including bovidae (cloven-hooved ruminants), caniformia (suborder of dog-like carnivores), myomorpha (mouse-like rodents), and chiroptera (bats) (table s ). in contrast to primates, substitution patterns among caniformia and bovidae are consistent with purifying selection, while signals of positive selection are supported in only a subset of tests among myomorpha gc-c (table s ) . patterns of positive selection were remarkably widespread in bats and strongly supported by all phylogenetic tests. examination of bat gc-c sequences revealed codon positions with significantly elevated dn/ds values mapping exclusively to the gc-c ligand-binding domain ( figure b , table s ). sequence comparisons of bat gc-c and npr ligand-binding domains revealed extensive diversification in gc-c relative to the closely related paralog ( figure s ). although the structural basis of the interaction between gc-c and sta has not yet been determined, a putative ligand-binding pocket was proposed in a membrane-proximal region between residues - (ref ). a cluster of seven sites with elevated dn/ds values among bats is in close proximity to this region, in addition to one site identified in our analysis of gc-c in primates ( figure s ). bats are known to harbor many types of viruses, including several human pathogens . this widespread association with viruses is attributed to high density population structures and the ability of long-distance flight . notably, frequent infection with coronaviruses likely sparked high levels of sequence diversification in the viral entry receptor ace- among bats , illustrating the intensity of selection pathogens place on these species. 
in addition to viruses, bats can carry bacteria that act as human pathogens, including genera that encode sta. furthermore, sta variants have been detected in dna isolated from the feces of wild bats. our detection of signals of intense positive selection in the gc-c ligand-binding domain of bats suggests that diarrheal pathogens may have profoundly impacted the evolution of bats, similar to their established coevolutionary relationship with viruses ( figure b ). to test if rapid evolution of gc-c ligand-binding domains results in functional differences in sta susceptibility, we generated cell lines stably expressing gc-c from seven primate and five bat species. functional diversity in these gc-c variants was assessed by measuring intracellular cgmp production in response to chemically synthesized uroguanylin and sta variants from four strains of pathogenic bacteria. hek t cells expressing human, chimpanzee, gorilla, gibbon and african green monkey gc-c generated similar levels of cgmp upon stimulation with sta variants from pathogenic e. coli (stp and sth), yersinia enterocolitica (y-st), and vibrio mimicus (v-st) (figure a , s ). in contrast, cells expressing gc-c from orangutan and rhesus macaque were significantly more susceptible to stp, showing a large increase in cgmp production relative to other species (figure a ). dose-response curves comparing maximal activation of human and orangutan gc-c by each toxin revealed similar maximal responses for human gc-c, while orangutan gc-c exhibited increases in relative vulnerability to stp and sth toxin variants compared to y-st and v-st ( figure b , c). thus, variation in gc-c among primates results in differing levels of susceptibility to diverse toxins, providing evidence that ancient sta-like peptides influenced the course of gc-c evolution. differences in toxin susceptibility were even more pronounced among gc-c encoded by bats.
we cloned gc-c from representatives of the vesper bats: myotis lucifugus (little brown bat), eptesicus fuscus (big brown bat), and miniopterus natalensis (natal long-fingered bat), as well as old world fruit bats: pteropus vampyrus (large flying fox) and pteropus alecto (black flying fox) for expression in cell lines. this sampling captures roughly a million year interval of divergence from the common ancestor of chiropteran bats. we discovered a wide range of susceptibility to sta variants across bat species. while gc-c from the vesper bats e. fuscus and m. natalensis responded similarly to all four toxins at high concentrations and produced modest levels of cgmp comparable to human gc-c, cells expressing m. lucifugus, p. vampyrus, and p. alecto gc-c produced nearly -fold more cgmp in response to toxin treatments ( figure d -f). intriguingly, both p. vampyrus and p. alecto failed to respond to treatment with µm y-st, in notable contrast to gc-c encoded by m. lucifugus ( figure d ). additional comparisons across a wide range of toxin concentrations revealed that p. vampyrus gc-c is only activated by y-st stimulation at concentrations exceeding µm, well outside probable concentrations encountered during infection ( figure e ). thus, consistent with strong signatures of positive selection for gc-c from bats, susceptibility to sta variants differs widely between bat species, with some receptors appearing resistant to activation at physiologically plausible concentrations of toxin. we next sought to more directly measure the physiological significance of the differing toxin susceptibility we observed in our cgmp generation assays. cultured organoids composed of intestinal enterocytes have recently emerged as a means to directly assay symptoms of secretory diarrhea. in this system, gc-c mediated water secretion induces organoid swelling upon ligand stimulation, directly modeling symptoms of diarrhea.
we cultured organoids derived from the small intestine of gc-c -/- mice to investigate the relationship between cgmp production and water secretion across species. we chose to compare organoids expressing gc-c from humans and the fruit bat p. vampyrus, which displayed heightened susceptibility to stp, but marked resistance to y-st. while mouse gc-c -/- organoids fail to respond to ligand stimulation, complementation with human or p. vampyrus gc-c via lentiviral transduction rescued swelling in response to sta ( figure a , b). organoids expressing p. vampyrus gc-c displayed considerably increased water secretion during stp treatment as measured by the total change in organoid area compared to organoids expressing human gc-c ( figure a ). consistent with our in vitro analysis, treatment with y-st resulted in minimal swelling of organoids expressing p. vampyrus gc-c compared to human ( figure c , d). thus, differences in susceptibility to sta variants likely have a direct influence on the level of water secretion and symptoms of diarrhea experienced by host species during infection. in order to regulate intestinal water levels, gc-c must interact with endogenous guanylin and uroguanylin peptides. to determine how rapid divergence of gc-c might influence the evolution of its cognate ligands, we next examined diversity in uroguanylin, the more potent of the two peptides. sequence comparisons revealed that secreted uroguanylin peptides are generally highly conserved across mammals with little variation occurring outside the most n- and c-terminal residues ( figure s ). in bats, however, uroguanylin sequences are highly variable, with frequent mutations occurring in core residues of the peptide ( figure a ). given the variability in sequence and toxin interactions we observed in bat gc-c, we hypothesized that sequence variation in bat uroguanylin might reflect compensatory mutations required to maintain affinity for its rapidly evolving receptor.
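The kind of sequence comparison described above, asking which peptide positions are conserved and which vary, can be sketched as a per-column conservation score over an alignment. This is a minimal illustration under stated assumptions: the function name is hypothetical, the toy strings are invented placeholders and not real uroguanylin sequences, and the score (fraction of sequences sharing the most common residue) is only one of several reasonable conservation measures.

```python
from collections import Counter


def column_conservation(aligned):
    """Return, for each alignment column, the fraction of sequences
    that carry the most common residue at that position."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for column in zip(*aligned):
        # most_common(1) gives [(residue, count)] for the top residue.
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(aligned))
    return scores


# Invented toy alignment (NOT real peptide sequences): variation is
# confined to the first, third and last columns.
toy_alignment = ["ACDEL", "ACDEL", "ACNEL", "SCDEV"]
print(column_conservation(toy_alignment))
```

Columns scoring 1.0 are invariant across the set; lower scores flag positions where substitutions have accumulated, which is the pattern contrasted here between bats and other mammals.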
to test if uroguanylin co-evolved with gc-c in bats, we synthesized active isomers of uroguanylin peptides encoded by the vesper bats m. lucifugus and e. fuscus, as well as the more distantly related p. vampyrus ( figure a, s ). treatment of cells expressing p. vampyrus gc-c with uroguanylin from m. lucifugus did not stimulate cgmp production, whereas treatment with p. vampyrus uroguanylin robustly activated catalytic activity ( figure c ). conversely, m. lucifugus gc-c responded more strongly to its species-matched uroguanylin than to the p. vampyrus peptide ( figure b ). intriguingly, e. fuscus gc-c responded weakly to all uroguanylin variants tested, including its own peptide ( figure b ). these experiments are consistent with compensatory co-evolution of gc-c and uroguanylin in bats, and also reveal that some species may exist in intermediate states with sub-optimal affinity for uroguanylin. given functionally consequential variation in gc-c that impacts both enterotoxin and uroguanylin interactions, we propose a model of compensatory coevolution triggered by bacterial enterotoxin interactions ( figure c ). in this model, mutations in gc-c that allow escape from overstimulation by sta can provide a fitness benefit even at the cost of disrupting the interaction with uroguanylin. this intermediate state of low affinity with endogenous peptides is tolerated while toxin-susceptible variants are culled from the population, given that survival is possible when gc-c signaling is disrupted. subsequent compensatory mutations in uroguanylin that optimize signaling interactions with gc-c might then outcompete mismatched variants with compromised receptor-ligand interfaces. in this proposed scenario, a single pathogen-encoded protein directly influences the evolution of the host receptor, which subsequently impacts variation in the endogenous ligand.
our work reveals how genes involved in intestinal water physiology can rapidly adapt in ongoing conflict with enteric pathogens. similar to well established conflicts between host immune defenses and pathogen effectors, the recurrent evolutionary innovation seen in the gc-c ligand-binding domain indicates that modulating interactions with bacterial toxins has been critical for survival in primates and bats. we show that this diversification in the molecular machinery of the intestine likely contributes to disease susceptibility in host species and may restrict the host range of sta producing pathogens. our observation of remarkable diversification at the gc-c receptor-ligand interface in bats further suggests that pathogens can spark compensatory coevolution within host signaling pathways. together these findings illustrate the far-reaching impacts of deadly diarrheal infections on the evolutionary history of diverse mammals. evolutionary analysis gc-c and npr - nucleotide sequences were retrieved from ncbi genbank for each species. prior to analysis, sequences for each gene within each clade were first aligned using clustalw. in each analysis, a generally agreed upon phylogeny was used for each clade. evidence for gene-wide positive selection was obtained using the branch-site models implemented in paml and busted software. in paml, alignments were fitted to a f x codon model and likelihood ratio tests were performed based on comparisons of nssites model to model and model to model . busted analysis was performed on all branches of a user-specified tree using an online webserver with default parameters (https://www.datamonkey.org/busted). gene trees based on the amino acid sequence of gc-c and npr were generated by first aligning amino acid sequences with clustalw in each clade. trees were generated using the phyml plugin implemented in geneious software using the le gascuel substitution model. 
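The likelihood ratio tests mentioned in the evolutionary analysis compare the fit of nested codon models. A minimal sketch of that calculation follows. It assumes the alternative model has two more free parameters than the null (the specific model numbers are elided in the text above, so this is an assumption), in which case the chi-square tail probability has the closed form exp(-x/2). The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_two_extra_params(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models that differ by two
    free parameters (ASSUMED here; check the models actually compared).

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value
    from a chi-square distribution with 2 degrees of freedom.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # Numerical noise can make a nested fit look marginally worse.
    statistic = max(statistic, 0.0)
    # Chi-square survival function with df = 2 is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_two_extra_params(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

For comparisons with a different number of extra parameters the closed form no longer applies, and a general chi-square routine from a statistics library would be needed instead.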
unless otherwise noted, all gene and amino acid sequences were retrieved from genbank (table s ). sequence alignments were created using clustalw and exported using geneious. figures were created using assets from www.biorender.com. to generate a c-terminal acid with the last amino acid of the sequence pre-loaded and a density of ~ . mmol/g, mg -chlorotrityl resin (chempep, - mesh) was washed with dmf and dcm and swelled for min, . mmol of the respective amino acid was dissolved in ml dmf/dcm and . mmol dipea was added to the amino acid solution. this mixture was added to pre-swelled resin and rotated at rt for hour. then the resin was washed with x dcm and capped with ml : : dcm:meoh:dipea with manual mixing between washes. finally, resin was washed x with ~ ml dcm and x with ~ ml dmf prior to synthesis. remaining amino acids were chemically synthesized using solid-phase peptide synthesis (spps) with a standard synthesis scale of µmol on a prelude x peptide synthesizer (gyros protein technologies) using fmoc-protected amino acids (gyros protein technologies). uroguanylin peptides were synthesized using fmoc-cys(acm)-oh (aapptec) at c /c . the following cycles were used during spps: fmoc deprotection: x -min cycles of ml % piperidine in dmf. amino acid coupling: mixing fmoc-protected amino acid in nmp ( . ml, mm), hatu in dmf ( . ml, mm), plus nmm in dmf ( . ml, . m) for min with shaking at rpm and nitrogen bubbling. dmf washing ( ml) was performed between deprotection and coupling steps ( x s). completed peptides were washed with dcm and dried prior to cleavage. peptide cleavage from resin peptides were cleaved from resin for h using . ml tfa, µl water, µl tis and µl edt. after cleavage, peptides were precipitated into cold diethyl ether ( ml), stored at - °c overnight, then washed with diethyl ether ( x ml), and finally peptides were pelleted by centrifugation and dried overnight in a desiccator prior to hplc purification. 
oxidation of free cysteine residues in purified peptides was accomplished using air oxidation of solid peptide dissolved in peptide oxidation buffer ( mm tris base, % dmso, ph at °c) at ≤ . mg/ml (to ensure intramolecular disulfide formation) for ~ h at °c with shaking at rpm. the reaction was quenched with glacial acetic acid (final concentration of %), and the ph was confirmed to be between ph - . reaction solution was then diluted -fold with hplc buffer a, spun for min at xg and purified by reverse-phase hplc on a phenomenex jupiter -µm proteo c column using a gradient of - % buffer b (v-st), to % buffer b (y-sta), - % buffer b (stp), . - % buffer b (sth) over min with a flow rate of ml/min. all hplc methods started at % buffer b to prevent peptide precipitation. fractions were collected and checked for purity by lc/ms ( - or - % buffer b over min) and lyophilized. for each peptide, eight distinct peaks were collected ( figure s a ) and tested for activity in the cellular cgmp assay. the most active peaks were: oxidized stp peak # , oxidized sth peak # , oxidized v-st peak # , and oxidized y-st peak # ( figure s b and c). oxidation of free cysteine residues (human, p. vampyrus, m. lucifugus, and e. fuscus uroguanylin peptides) crude peptides were first validated by lc/ms and analytical hplc, as described above. oxidation of free cysteine residues was accomplished using air oxidation of solid peptide dissolved in peptide oxidation buffer at ≤ . mg/ml for - h at °c with shaking at rpm. the reaction was quenched with glacial acetic acid ( % final concentration) and the ph was confirmed to be between - . reaction solution was then diluted -fold with hplc buffer a, spun for min at xg and purified by reverse-phase hplc on a phenomenex jupiter -µm proteo c column (see specs above) using a gradient of - % buffer b (human), - % buffer b (p. vampyrus), - % buffer b (m. lucifugus), or - % buffer b (e. 
fuscus) over min, all hplc methods starting at % buffer b to prevent peptide precipitation. fractions were collected and checked for purity by lc/ms ( - or - % buffer b over min, ml/min). acm removal and formation of second disulfide (human, p. vampyrus, m. lucifugus uroguanylin peptides) dry purified peptides with a single formed disulfide and cys(acm) protecting groups at positions c and c ( - mg) were dissolved in buffer a to a concentration of ~ µm. fresh i -tfa mixture was prepared (~ mg i dissolved in ml acn, then added to a solution of ml h o with . ml tfa), and the i -tfa mix was added ( x volume) to the peptide-buffer a solution. this mixture reacted on a rotator at rt for min before quenching with m ascorbic acid added drop-wise until the solution changed from rust-colored to clear and colorless (typically less than μl). quenched solution was diluted with one volume buffer a, spun min at xg, and transferred to a new tube for purification. same-day purification proceeded via reverse-phase hplc using a phenomenex -μm c -kinetex column c Å ( x . mm) with a gradient of - % buffer b (human uroguanylin), - % buffer b (p. vampyrus), - % buffer b (m. lucifugus), or - % buffer b (e. fuscus) over min at ml/min to achieve baseline separation of peptide topoisomers. two individual peaks were collected separately and their mass confirmed by lc/ms. generally the larger peak was determined to be the active isomer (consistent with previous studies). the uroguanylin topoisomers were found to interconvert at rt (as seen previously for human uroguanylin), with m. lucifugus converting the fastest, within min of peak purification. therefore, fractions were placed on ice immediately after purification until confirmation by lc/ms, then pooled and lyophilized. all dry, purified peptides were stored in parafilmed containers in the dark at - °c.
we tested the order of disulfide bond formation and found that forming the c -c disulfide second yielded more of the active isomer compared to forming the c -c disulfide second. to generate each gc-c expressing cell line, . x t cells were seeded into a single well of a well plate. after hours of growth, µl concentrated lentivirus encoding gc-c linked to a gfp reporter was added to a total of ml growth medium containing µm polybrene (sigma). hours post-transduction, cells were transferred to a t flask and grown for - days. transgenic lines were established by facs sorting the top % of gfp expressing cells from each transduction. t cells expressing gc-c variants were grown to confluence in -well plates before ligand stimulation. toxin and uroguanylin solutions were diluted in reduced serum media (opti-mem, thermo scientific) to the appropriate experimental concentration. for each measurement, cell culture media was then aspirated and replaced with the ligand-containing solution in - replicate wells. following incubation at °c/ % co for min, the ligand solution was aspirated and cells were lysed directly in . m hcl. intracellular cgmp was then measured using an enzyme-linked immunosorbent assay kit (enzo scientific) according to the manufacturer's specifications. absorbance was measured using a plate reader (biotek), and cgmp concentrations were calculated based on a standard curve. four-parameter variable slope dose-response curves were generated in prism (graphpad). lentivirus generation and cloning gc-c sequences were downloaded from genbank based on whole genome sequences from each species tested. each c-terminally ha tagged gc-c variant was synthesized via gene synthesis (life technologies). gc-c was then cloned into the lentiviral transfer vector pultra (addgene # ) between the xbai and bamhi restriction sites by gibson assembly, in frame with gfp and the t a linker sequence. to generate lentiviral particles, cm dishes were seeded with x t cells hours prior to transfection.
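For readers unfamiliar with the four-parameter variable slope model named in the cgmp assay description, the usual form of that curve can be written as a short function. This is a sketch of the standard log(agonist)-versus-response equation, not the authors' fitting configuration; the parameter names and example values are hypothetical, and no curve fitting is performed here.

```python
def four_parameter_response(log_conc, bottom, top, log_ec50, hill_slope):
    """Standard four-parameter (variable slope) dose-response curve.

    log_conc and log_ec50 are log10 ligand concentrations; bottom and top
    are the lower and upper response plateaus; hill_slope sets steepness.
    """
    return bottom + (top - bottom) / (
        1.0 + 10.0 ** ((log_ec50 - log_conc) * hill_slope)
    )


# Hypothetical parameters: response units and concentrations are arbitrary.
params = dict(bottom=2.0, top=10.0, log_ec50=-7.0, hill_slope=1.0)
for log_c in (-9.0, -8.0, -7.0, -6.0, -5.0):
    response = four_parameter_response(log_c, **params)
    print(f"log10[ligand] = {log_c:5.1f}  response = {response:6.3f}")
```

At a concentration equal to the EC50 the response sits exactly halfway between the two plateaus, which is why shifts in the fitted log EC50 are a convenient summary of differences in ligand sensitivity between receptors.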
cells were then transfected with . µg pultra-gc-c, . µg pspax packaging plasmid (addgene # ), and . µg pmd .g envelope plasmid (addgene # ) with µl fugene hd transfection reagent according to the manufacturer's specifications. media was replaced with ml media hours post-transfection. viral supernatants were collected hours post-transfection and passed through a . µm filter, followed by overnight incubation with x peg-it solution (system biosciences) at °c. precipitated viral particles were centrifuged at xg for min at °c and resuspended in pbs at a final volume of µl before storage at - °c. western blotting t cells expressing gc-c were grown to confluence in -well plates prior to immunoblot analysis. to avoid aggregation, cells were lysed directly in x laemmli buffer containing m urea (sigma), m thiourea (sigma), and β-mercaptoethanol for min at room temperature prior to analysis. total protein was resolved by mini-protean gtx polyacrylamide gel electrophoresis (bio-rad). proteins were detected using anti-ha (covance cat# mms- p, : ) and anti-actin (bd cat# , : ) antibodies. blots were visualized using film or c-digit chemiluminescent imager (li-cor). intestinal organoid culture small intestinal organoids derived from gc-c knockout mice were a generous gift from the laboratory of scott waldman and were isolated as described previously. organoids were maintained in µl matrigel droplets (corning) in wells of a -well plate containing µl intesticult mouse organoid growth medium (stemcell technologies cat# ) at °c/ % co . organoids were passaged every - days by disruption with tryple (thermo scientific) and reseeded at a concentration of - organoids per well. intestinal organoid transduction for each transduction approximately organoids were mechanically disrupted by vigorous pipetting using a µl pipette tip. disrupted organoids were then resuspended in µl concentrated lentivirus solution containing µm polybrene (sigma), µm y- (sigma) and µm sb (sigma).
lentiviral infection was allowed to proceed at °c/ % co for h prior to resuspension in matrigel and seeding into a new well. transduced organoids were monitored for gfp expression and - organoids with uniform fluorescence were selected manually h post-transduction and transferred to a new well to establish transgenic lines. organoid swelling assay organoids were grown in a well plate for - days after passaging prior to swelling assays at a concentration of - organoids per well. organoids were imaged on an imagexpress pico automated cell imaging system (molecular devices) using the live cell imaging cassette with temperature set to ˚c, % co , and humidity levels - %. these parameters were monitored and remained constant throughout each experiment. for swelling analysis, st toxins were added directly to the organoid culture media to a final concentration of µm. following addition of toxin, a mm area of each well (corresponding to ~ % of the total well area) was imaged every min at x magnification for the full hours of the experiment. the imaged area was constant across the experiment and analogous between wells and contained at least organoids for each analyzed strain. stacked transmitted light images were collected for each well with a focus step of µm and the best plane image was calculated by the imagexpress software. gfp images were collected at the beginning and end of the experiment using the same imaging parameters, and both the best focus and maximum fluorescence images for each stack were calculated and output by the imagexpress software. organoids used for analysis were identified in imagej and assigned roi identifiers so that the same organoids could be easily assessed across timepoints. organoids from each strain were selected randomly to include a variety of starting sizes and were checked for gc-c expression by comparison with gfp images. gc-c expressing organoids that ruptured during the experiment were excluded from analysis. 
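The swelling quantification for this assay, a percent change in area per organoid followed by an unpaired comparison with welch's correction, can be sketched as below. This is an illustrative calculation rather than the authors' analysis pipeline: function names and the example areas are hypothetical, and only the test statistic and the welch-satterthwaite degrees of freedom are computed; a p-value would come from a t distribution in a statistics package.

```python
import math
from statistics import mean, variance


def percent_area_change(area_start, area_end):
    """Percent change in organoid area between two timepoints."""
    return 100.0 * (area_end - area_start) / area_start


def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and its degrees of freedom."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical (start, end) areas in arbitrary units for two groups.
group_1 = [percent_area_change(s, e) for s, e in [(100, 180), (120, 200), (90, 170)]]
group_2 = [percent_area_change(s, e) for s, e in [(100, 120), (110, 125), (95, 118)]]
t_stat, dof = welch_t(group_1, group_2)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

Normalizing each organoid to its own starting area is what allows organoids of different initial sizes to be pooled within a group before the comparison.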
swelling was assessed by measuring the area of each organoid in imagej at t= and t= min. the percent change in organoid area from to min was calculated for each organoid and plotted in prism. significance in organoid swelling data was assessed by an unpaired t-test with welch's correction using prism . figure s : gc-c ligand-binding domain sequence divergence in primates and bats. amino acid alignments were generated for the ligand-binding domains of gc-c and npr in primates and bats. trees were generated using phyml with the le gascuel amino acid substitution model. branches connecting npr and gc-c trees are artificially collapsed (dashed line). two distinct topoisomers were observed following oxidation of synthesized uroguanylin from each species. (b) cgmp generation assays in hek t cells expressing gc-c from the indicated species were performed to identify the most active topoisomer. human isomer # is the known active peak. e. fuscus peak # was used in experimental assays ( figure ). enteric infection meets intestinal function: how bacterial pathogens cause diarrhoea estimates of the global, regional, and national morbidity, mortality, and aetiologies of diarrhoea in countries: a systematic analysis for the global burden of disease study receptor guanylyl cyclase c (gc-c): regulation and signal transduction cure and curse: e. 
coli heat-stable enterotoxin and its receptor guanylyl cyclase c gene-wide identification of episodic selection phylogenetic analysis by maximum likelihood rules of engagement: molecular insights from hostvirus arms races uroguanylin and guanylin peptides: pharmacology and experimental therapeutics natriuretic peptides, their receptors, and cyclic guanosine monophosphate-dependent signaling functions molecular mechanisms of enterotoxigenic escherichia coli infection amino acid sequence of heat-stable enterotoxin produced by vibrio cholerae non- isolation, primary structure and synthesis of heat-stable enterotoxin produced by yersinia enterocolitica identification of ligand recognition sites in heat-stable enterotoxin receptor, membrane-associated guanylyl cyclase c by site-directed mutational analysis bats: important reservoir hosts of emerging viruses bats as 'special' reservoirs for emerging zoonotic pathogens evidence for ace -utilizing coronaviruses (covs) related to severe acute respiratory syndrome cov in bats bats and bacterial pathogens: a review microbiome analysis reveals the abundance of bacterial pathogens in rousettus leschenaultii guano a time-calibrated species-level phylogeny of bats (chiroptera, mammalia) simple and reliable enzyme-linked immunosorbent assay with monoclonal antibodies for detection of escherichia coli heat-stable enterotoxins intestinal enteroids model guanylate cyclase c-dependent secretion induced by heat-stable enterotoxins characterization of human uroguanylin: a member of the guanylin peptide family disruption of the guanylyl cyclase-c gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice generation of two isomers with the same disulfide connectivity during disulfide bond formation of human uroguanylin topological isomers of human uroguanylin: interconversion between biologically active and inactive isomers synthesis, biological activity and isomerism of guanylate cyclase cactivating 
peptides guanylin and uroguanylin the authors thank scott waldman and amanda pattison for generously providing gc-c knockout organoids. n.c.e. is supported by the national institute of health, united states (r gm and r gm ) and the burroughs wellcome fund investigators in the pathogenesis of infectious disease program, united states. m.s.k. is supported by key: cord- - xg s o authors: coulibali, zonlehoua; cambouris, athyna nancy; parent, serge-Étienne title: site-specific machine learning predictive fertilization models for potato crops in eastern canada date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xg s o statistical modeling is commonly used to relate the performance of potato (solanum tuberosum l.) to fertilizer requirements. prescribing optimal nutrient doses is challenging because of the involvement of many variables including weather, soils, land management, genotypes, and severity of pests and diseases. where sufficient data are available, machine learning algorithms can be used to predict crop performance. the objective of this study was to predict tuber yield and quality (size and specific gravity) as impacted by nitrogen, phosphorus and potassium fertilization as well as weather, soils and land management variables. we exploited a data set of field experiments conducted from to in quebec (canada). we developed, evaluated and compared predictions from a hierarchical mitscherlich model, k-nearest neighbors, random forest, neuronal networks and gaussian processes. machine learning models returned r values of . – . for tuber marketable yield prediction, which were higher than the mitscherlich model r ( . ). the models were more likely to predict medium-size tubers (r = . – . ) and tuber specific gravity (r = . – . ) than large-size tubers (r = . – . ) and marketable yield. 
response surfaces from the mitscherlich model, neural networks and gaussian processes returned smooth responses that agreed more with actual evidence than discontinuous curves derived from k-nearest neighbors and random forest models. when marginalized to obtain optimal dosages from dose-response surfaces given constant weather, soil and land management conditions, some disagreements occurred between models. due to their built-in ability to develop recommendations within a probabilistic risk-assessment framework, gaussian processes stood out as the most promising algorithm to support decisions that minimize economic or agronomic risks.
* npk factorial design or others where n (nitrogen), p (phosphorus) and k (potassium) were kept constant.
we matched the duration from planting to harvest, but the class names differed. the preceding crops were categorized as in parent et al. [ ] as grasslands, legumes, cereals, low-residue crops and high-residue crops. toponymic names, geographical coordinates and years were recorded at each site. fertilizers other than n, p or k, fertilizer source, dosage and application method, seeding density and date, harvest date, tuber marketable yield (excluding tubers < . cm in diameter), tuber size distribution (small, medium, large) and specific gravity (sg) were recorded. the n fertilizers were either all applied at seeding or split-applied between seeding and hilling. the p fertilizers were banded at planting. the k fertilizers were band-applied or split-applied before planting and at planting. we added trials conducted in and in the outaouais, centre-du- québec, and lac-saint-jean regions. we reported the growing season lengths provided by scouting teams, which cover the period from seeding to harvest and do not strictly correspond to the theoretical cfia [ ] growth duration, as shown for cultivars superior, goldrush, krantz and fl from the trials used for model analysis (table ).
side, $c_j^-$ is the compositional vector at the left-hand side, $c_j^+$ is the compositional vector at the right-hand side, and $g()$ is the geometric mean function. the proportion of the textural components and the carbon content formed the soil texture simplex. the balances are presented in table . we followed the [denominator parts | numerator parts] notation [ ]. index for rainfall rd is daily rainfall, n is the number of days and tm is daily mean temperature. typically, values greater than . are considered acceptable [ ]. the mae is the average of the absolute differences between predictions and observations as in equation .
the mitscherlich, nn and gp models generated smooth response curves, while the knn and rf models generated stepped curves. the marketable yield was non-responsive to p application in the rf model. the mitscherlich and rf models also showed no effect of k fertilization on yield. all models for the p trial somewhat underestimated marketable yield, while response curves followed the data for n. (fig ) showed increasing response to n fertilization across models, while response was globally poor for p and k. for the [s | m] balance, responses increased with increasing fertilizer doses, except for the p and k trial data fitted with the gp model (fig ). the response of sg in the k trial was also poor (fig ). the sg response decreased from zero k levels and increased then decreased as p dosage increased. for n trials, sg slightly increased then decreased as n dose increased in the rf model, but was non-responsive with the other models. the prediction of optimum fertilizer doses and optimum or maximum outputs showed some disagreements for the case presented (fig ). there should be a single economic optimal dose or agronomic optimal dose at each site each year. some models were more consistent than others in deriving optimal doses depending on the target variable.
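the balance and mae definitions given in this section can be sketched in a few lines of python. this is a minimal illustration, not the authors' code: it assumes the standard isometric log-ratio form of a balance (a normalizing coefficient times the log of the ratio of geometric means), and the function names and the tuber-size proportions are hypothetical.

```python
import math


def geometric_mean(parts):
    """Geometric mean g() of strictly positive compositional parts."""
    return math.exp(sum(math.log(p) for p in parts) / len(parts))


def balance(denominator_parts, numerator_parts):
    """Balance in [denominator parts | numerator parts] notation.

    Assumes the standard isometric log-ratio form:
    sqrt(r * s / (r + s)) * ln(g(numerator) / g(denominator)),
    where r and s are the numbers of parts on each side.
    """
    r = len(denominator_parts)
    s = len(numerator_parts)
    coefficient = math.sqrt(r * s / (r + s))
    ratio = geometric_mean(numerator_parts) / geometric_mean(denominator_parts)
    return coefficient * math.log(ratio)


def mean_absolute_error(predictions, observations):
    """Average of the absolute differences between predictions and observations."""
    differences = [abs(p - o) for p, o in zip(predictions, observations)]
    return sum(differences) / len(differences)


# Hypothetical tuber size proportions (small, medium), for illustration only.
small, medium = 0.2, 0.5
s_m_balance = balance([small], [medium])  # [s | m]: positive when medium > small

# Hypothetical predictions and observations.
mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

a positive balance means the numerator parts dominate the denominator parts, so the sign of the [s | m] balance indicates whether medium-size tubers outweigh small ones.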
at extremely low predicted n, p or k doses, it could be challenging to manage the fertilization program at low economic risk for producers, who generally consider that the cost of over-fertilization is low compared to the cost of under-fertilization [ , ]. the probabilistic prediction capability of gaussian processes may help to determine credible dosage. the average gp curve is shown as a black line, with its optimal dosage as a black dot. five sampled gp curves are plotted as grey lines, with their optimal doses as grey dots. the probability distributions of the optimal doses are shown under the respective response curves. the figures show that predicted means prediction only for the n trial the probabilistic prediction was equal to the mean gp prediction for p trial i.e., kg p ha - , while n and k trials returned equal predictions with the [s | m] balance prediction models with . kg ha - and . kg ha - , respectively (fig ). for tuber sg prediction models examples of optimal economic n, p, k doses distribution with gaussian processes using marketable yield for selected trials. n: nitrogen, p: phosphorous examples of agronomic optimal n, p, k doses distribution with gaussian processes using tuber size [m, s | l] balance for selected trials. n: nitrogen, p: phosphorous, and k: potassium fig : examples of agronomic optimal n, p, k doses distribution with gaussian processes using tuber size [s | m] balance for selected trials. n: nitrogen, p: phosphorous examples of agronomic optimal n, p, k doses distribution with gaussian processes using tuber sg for selected trials features with low or no importance could be removed without affecting model performance [ ]. the preceding crops categories i.e., grassland, small grains, legumes, low-residue crops and high-residue crops, as categorized by parent et al. 
[ ], returned zero (for tuber sg) or faintest scores (for other target variables) and were thus removed despite a substantial body of literature on the advantages of crop rotation to the next crop. nonetheless, zebarth et al. [ ] stated that the amount of nitrogen mineralized from organic matter during the growing season cannot be predicted accurately. torma et al. [ ] found that the n supplied by soil and crop residues (maize, potato, silage maize, soybean, sunflower, winter rape, winter wheat) ranged from to kg ha - , while the phosphorus ranged from to kg ha - or decreasing gp samples, which are more frequent when the sample is close to patterns in data where the response to fertilizer is flat. a zero-fertilizer recommendation could be interpreted as a soil sufficiently fertile to supply the crop khiari et al. [ ] assessed the th and th percentiles. the mean ( %), the median or any other percentile dose could be computed to support decision-making. for example, the mean gp and the probability distribution processes returned the upper bound of the simulation dosage (i.e., kg n ha - ) as the economic optimal dose for the n trial with the marketable yield prediction model (fig ). the conditional expectation percentiles showed that a lower dose (i.e., kg n ha - ) could be recommended this study assessed machine learning techniques as an alternative for potato fertilizer recommendations at local scale usually handled by statistical models or meta- analysis at regional scale. a large collection of field trial data provided information to fit machine learning models with specific traits of cultivars, soil properties, weather indexes, and n, p and k fertilizers dosage used as predictive features p and k doses derived from yield, or against optimal agronomic n, p and k doses derived from tuber size and sg. the models trained using machine learning algorithms outperformed the mitscherlich tri-variate response predictive model. 
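the idea of deriving a mean, median or other percentile dose from sampled response curves can be sketched as follows. this is a hedged illustration only: the curves are synthetic quadratics standing in for sampled gp curves, the dose grid and peak positions are invented, and the helper names are hypothetical. it locates the agronomic optimum (maximum response) of each curve and summarizes the resulting distribution; an economic optimum would additionally need prices, which are not modeled here.

```python
def optimal_dose(doses, response):
    """Dose at which a single response curve reaches its maximum."""
    best_index = max(range(len(doses)), key=lambda i: response[i])
    return doses[best_index]


def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Synthetic stand-ins for sampled response curves (assumption, not real data):
# each curve is a quadratic peaking at a different dose on a 0-250 grid.
doses = list(range(0, 251, 10))
peaks = [120, 150, 150, 180, 200]
curves = [[30.0 - 0.0005 * (d - peak) ** 2 for d in doses] for peak in peaks]

# One optimal dose per sampled curve, then summary statistics of that set.
optima = [optimal_dose(doses, curve) for curve in curves]
median_dose = percentile(optima, 50)
lower_quartile_dose = percentile(optima, 25)
upper_quartile_dose = percentile(optima, 75)
```

choosing a lower or higher percentile of the optimal-dose distribution is one way to express how much risk of under- or over-fertilization a decision-maker is willing to accept.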
the marketable yield prediction coefficient (r ) varied between . and . , while the mitscherlich from uniform distributions under constant weather conditions, soil properties and land management factors as large amounts of data are being assembled into observational data sets, in the context of precision agriculture. to assess model performance under real-world situations data since accurate future weather data covering the growing season are unavailable any biotic factor other than fertilizer, e.g., length of growing season or planting density, could be optimized with management scenarios. with more experiment data, the training and testing division could be performed at trial level to improve the model predictive ability criteria for publishing papers on crop modeling. field crops research an overview of available crop growth and yield models for studies and assessments in agriculture decision support systems in potato production potato, sweet potato, and yam models for climate change: a review mathematical models of plant growth and development a neural network experiment on the site-specific simulation of potato tuber growth in eastern canada. computers and electronics in agriculture site-specific multilevel modeling of potato response to nitrogen fertilization effects of soil compaction on potato growth and its removal by cultivation differentiation of potato ecosystems on the basis of relationships among physical, chemical and biological soil parameters agronomic practices an analysis of the response of sugar beet and potatoes to fertilizer nitrogen and mineral soil mineral nitrogen potato response to crop sequence and nitrogen fertilization following sod breakup in a gleyed humo-ferric podzol responses of potato (solanum tuberosum l.) 
to green manure cover crops and nitrogen fertilization rates soil mineralizable nitrogen and soil nitrogen supply under two-year potato rotations italian ryegrass management effects on nitrogen supply to a subsequent potato crop effect of straw and fertilizer nitrogen management for spring barley on soil nitrogen supply to a subsequent potato crop a model of the development and bulking of potatoes (solanum tuberosum l.) i. derivation from well-managed field crops. field crops research potato response to nitrogen sources and rates in an irrigated sandy soil rate and timing of nitrogen fertilization of russet burbank potato: yield and processing quality the potato crop: the scientific basis for improvement water relations and growth of potatoes. the potato crop: springer effects of climate on different potato genotypes. . dry matter allocation and duration of the growth cycle comparison of empirical daily surface incoming solar radiation models. agricultural and forest meteorology yield levels of potato crops: recent achievements and future prospects prediction of soil nitrogen supply in potato fields using soil temperature and water content information sa. soil nutrient bioavailability: a mechanistic approach minerals, soils and roots net primary productivity and below-ground crop residue inputs for root crops: potato (solanum tuberosum l.) and sugar beet (beta vulgaris l.) water-nutrients interaction: exploring the effects of water as a central role for availability & use efficiency of nutrients by shallow rooted vegetable crops -a review potash requirements of potatoes düngung sichert ertrag und qualität. 
land & fort the significance of trends in concentrations of total nitrogen and nitrogenous compounds commercial potato production in north america the potato association of america handbook global markets for processed potato products meeting global food needs: realizing the potential via genetics x environment x management interactions do farmers waste fertilizer? a comparison of ex post optimal nitrogen rates and ex ante recommendations by model, site and year. agricultural systems nouveaux outils de gestion de l'azote dans la production de la pomme de terre. craaq, colloque sur la pomme de terre dynamics of nitrate leaching under irrigated potato rotation in washington state: a long-term simulation study. agriculture, ecosystems & environment long-term simulations of nitrate leaching from potato production systems in prince edward island controls on nitrate loading and implications for bmps under intensive potato production systems in prince edward island groundwater monitoring to support development of bmps for groundwater protection: the abbotsford-sumas aquifer case study an agri- environmental phosphorus saturation index for acid coarse-textured soils agri- environmental models using mehlich-iii soil phosphorus saturation index for canadian journal of soil science mehlich-iii soil phosphorus saturation indices for quebec acid to near neutral mineral soils varying in texture and genesis nitrogen balances and yields of spring cereals as affected by nitrogen fertilization in northern conditions: a meta-analysis management of nitrogen and water in potato production pers, wageningen disaggregating model bias and variability when calculating economic optimum rates of nitrogen fertilization for corn alternative benchmarks for economically optimal rates of nitrogen fertilization for corn advances in machine learning applications in software engineering: igi global application of machine learning methodologies for predicting corn economic optimal nitrogen rate soil 
test correlation, calibration, and recommendation soil testing and plant analysis potato plants characteristics, maturity. canadian food inspection agency: canadian food inspection agency a specific gravity calculator for potatoes soil classification working group. canadian system of soil classification canadian system of soil classification numerical clustering of soil series using morphological profile attributes for potato methods of soil analysis: part -physical and mineralogical methods (agronomy m): soil science society of america determination of soil texture by laser diffraction method inventaire des problèmes de dégradation des sols agricoles du québec: rapport synthèse. entente auxiliaire canada- québec sur le développement agro-alimentaire québec service de recherche en sols soil reaction and exchangeable acidity soil sampling and methods of analysis. . nd ed methods of soil analysis part chemical and microbiological properties a comparison of three methods of organic carbon determination in some new zealand soils table interprétative de la mesure du ph des sols du québec par quatre méthodes différentes mehlich-iii extractable elements correlation of mehlich , bray , and ammonium acetate extractable p, k, ca, and mg for alaska agricultural soils a modified single solution method for the determination of phosphate in natural waters guide de référence en fertilisation guide de référence en fertilisation. è ed groups of parts and their balances in compositional data analysis balance trees reveal microbial niche differentiation. msystems plant ionome diagnosis using sound balances: case study with mango (mangifera indica). 
frontiers in plant science development and testing of canada-wide interpolated spatial models of daily minimum-maximum temperature and precipitation for - corn response to nitrogen is influenced by soil texture and weather scikit- learn: machine learning in python dealing with zeros and missing values in compositional data sets using nonparametric imputation zcompositions -r package for multivariate imputation of left-censored data under a compositional approach r: a language and environment for statistical computing. r foundation for statistical computing tidyverse: easily install and load the 'tidyverse'. r package version . . compositions: compositional data analysis. r package version robcompositions: an r-package for robust statistical analysis of compositional data john wiley & sons python tutorial, technical report cs r : centrum voor wiskunde en informatica (cwi) amsterdam scipy . --fundamental algorithms for scientific computing in python. arxiv preprint the numpy array: a structure for efficient numerical computation hunter jd. matplotlib: a d graphics environment réseaux de neurones a review of supervised machine learning algorithms and their applications to ecological data modeling avena fatua seedling emergence dynamics: an artificial neural network approach. computers and electronics in agriculture classification of arrhythmia using machine learning techniques gaussian processes based bivariate control parameters optimization of variable-rate granular fertilizer applicator a bivariate response surface for growth data. 
fertilizer research model evaluation guidelines for systematic quantification of accuracy in watershed simulations hydrological modeling of the iroquois river watershed using hspf and swat nitrogen uptake across site specific management zones in irrigated corn production systems a simple software tool to simulate nitrate and potassium co-leaching under potato crop nitrogen management for potato: general fertilizer recommendations residual plant nutrients in crop residues -an important resource. acta agriculturae scandinavica section b-soil and crop rotation effects on soil fertility and plant nutrition university of maryland: nraes do plants need nitrate? the mechanisms by which nitrogen form affects plants chapter -functions of macronutrients. marschner's mineral nutrition of higher plants feddes ra. water, heat and crop growth a simulation model for potato growth and development: substor-potato version . : michigan state university, department of crop and soil sciences adaptation of potato to high temperatures and salinity-a review compaction of coarse-textured soils: balance models across mineral and organic compositions. frontiers in ecology and evolution above-ground and below-ground plant development potatoes and human health the effect of in-row seed piece spacing and harvest date of the tuber yield and processing quality of conestoga potatoes in southern manitoba aspects physiologiques de la croissance et du développement la pomme de terre: production ennemis et maladies, utilisations. 
paris: inra; evaluation of the effect of density on potato yield and tuber size distribution factors affecting specific gravity loss in crisping potato crops in yield response of potatoes to variable nitrogen management by landform element and in relation to petiole nitrogen -a case study nitrogen fertilization and irrigation affects tuber characteristics of two potato cultivars influence of fertilizer management and soil fertility on tuber specific gravity: a review effect of nitrogen, phosphorus, and potassium fertilizers on yield components and specific gravity of potatoes effects of nitrogen, phosphorus, and potassium on yield, specific gravity, crisp colour, and tuber chemical composition of potato comparison of models for describing corn yield response to nitrogen-fertilizer modeling nutrient responses in the field. plant and soil comparison of three statistical models describing potato yield response to nitrogen fertilizer modified-quadratic/plateau model for describing plant-responses to fertilizer quadratic and quadratic-plus-plateau models for predicting optimal nitrogen rate of corn: a comparison relationships between nitrogen rate, plant nitrogen concentration, yield, and residual soil nitrate-nitrogen in silage corn agronomic use efficiency of n fertilizer in maize-based systems in sub-saharan africa within the context of integrated soil fertility management rich ae. potato diseases influence of weed competition on potato growth, production and radiation use efficiency potato_df.csv' file available in 'data' repository at key: cord- -v vrely authors: harda, zofia; spyrka, jadwiga; jastrzębska, kamila; szumiec, Łukasz; bryksa, anna; klimczak, marta; polaszek, maria; gołda, sławomir; zajdel, joanna; błasiak, anna; parkitna, jan rodriguez title: loss of mu and delta opioid receptors on neurons expressing dopamine receptor d has no effect on reward sensitivity date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: v vrely opioid signaling controls the activity of the brain’s reward system. it is involved in signaling the hedonic effects of rewards and also has essential roles in reinforcement and motivational processes. here, we focused on opioid signaling through mu and delta receptors on dopaminoceptive neurons and evaluated the role these receptors play in reward-driven behaviors. we generated a genetically modified mouse with selective double knockdown of mu and delta opioid receptors in neurons expressing dopamine receptor d . selective expression of the transgene was confirmed using immunostaining. knockdown was validated by measuring the effects of selective opioid receptor agonists on neuronal membrane currents using whole-cell patch clamp recordings. we found that in the nucleus accumbens of control mice, the majority of dopamine receptor d -expressing neurons were sensitive to a mu or delta opioid agonist. in mutant mice, the response to the delta receptor agonist was blocked, while the effects of the mu agonist were strongly attenuated. behaviorally, the mice had no obvious impairments. the mutation did not affect sensitivity to the rewarding effects of morphine injections or social contact and had no effect on preference for sweet taste. knockdown had a moderate effect on motor activity in some of the tests performed, but this effect did not reach statistical significance. thus, we found that knocking down mu and delta receptors on dopamine receptor d -expressing cells does not appreciably affect reward-driven behaviors. 
highlights
– it is well accepted that opioid signaling controls the brain’s reward system
– we generated mutant mice with mu and delta receptor knockdown in d neurons
– knockdown made dopaminoceptive neurons insensitive to mu and delta opioid receptor agonists
– the mutation did not cause obvious behavioral impairments
– the loss of mu and delta receptors on d neurons does not affect reward sensitivity

opioid signaling controls the activity of the brain's reward system, acting as a regulator of the mesolimbic system, both at the level of dopamine neurons in the midbrain and dopaminoceptive medium spiny neurons in the striatum. it is now well established that mu receptor agonists inhibit gabaergic signaling in the ventral midbrain, which leads to an increase in dopamine neuron activity and is critical for the euphoric and reinforcing effects of opioids (fields and margolis, ). conversely, the role of opioid signaling in dopaminoceptive neurons (i.e., neurons that receive dopamine inputs), particularly striatal medium spiny neurons, remains only partly understood. it has been reported that the activation of mu or delta opioid receptors in discrete areas of the nucleus accumbens of the striatum induces hedonic reactions and promotes the intake of palatable food or drink (e.g. castro and berridge, ; peciña and berridge, ). these observations led to an influential model linking opioid signaling through mu and delta receptors in the nucleus accumbens to the "liking" component of reward, and separating it from dopamine signaling, involved primarily in motivational processes, or the "wanting" component of reward (berridge et al., ; wise, ). nevertheless, some evidence has indicated that such elegant separation of roles may be too simplistic.
the reported reinforcing effects of direct mu opioid receptor agonist injection into the striatum on conditioned place preference in rats appear to be inconsistent, although anatomical differences have been suggested as a potential explanation for these discrepancies (bals-kubik et al., ; castro and berridge, ) . intrastriatal injections of delta agonists have been observed to have no effect on place preference (bals-kubik et al., ) . the complete loss of mu opioid receptors abolishes the rewarding effects of morphine in mice (matthes et al., ) , and the reintroduction of mu receptor expression to direct pathway medium spiny neurons of the striatum is sufficient to restore morphine-conditioned place preference (cui et al., ) . it has also been reported that the selective deletion of mu opioid receptors in forebrain gabaergic neurons (including those in the striatum) spares opioid-conditioned place preference but affects motivation to self-administer heroin or obtain palatable food (charbogne et al., ) . to add further complexity, it appears that opioid signaling in the forebrain may play different roles depending on the type of reward, as the same mutation completely abolishes alcohol-conditioned place preference (hamida et al., ) . it should also be noted that mu opioid signaling in the nucleus accumbens has been implicated in the rewarding effects of social interaction (e.g. trezza et al., ) . conversely, data regarding the specific roles of delta opioid receptors in reward signaling are more limited. a recent report implicated these receptors in resilience to social stress through a mechanism involving primarily d -expressing neurons (nam et al., ) . thus, while the majority of reports have implicated opioid signaling in the striatum in the control of reward-driven behaviors, there is no consensus on the actual mechanisms involved. 
all types of opioid receptors are expressed in the striatum (including the nucleus accumbens), and their mrna and protein levels follow discrete distribution patterns (le merrer et al., ; mansour et al., ; svingos et al., ). mu opioid receptor mrna and protein are mainly present, and ligand binding mainly occurs, in striatal patches or striosomes (crittenden and graybiel, ). conversely, delta opioid receptor mrna levels are relatively low, but the delta opioid receptor is the most abundant opioid receptor protein in the striatum. it has been reported that delta opioid receptors are present mainly on indirect pathway neurons that project to the intermediate nuclei in the basal ganglia, while mu receptors are expressed on both direct and indirect pathway medium spiny neurons; however, the reported fractions of neurons expressing these receptors vary considerably (ambrose et al., ; banghart et al., ; oude ophuis et al., ). recent single-cell transcriptome analyses of the whole brain identified a subclass of neurons that express dopamine receptor d but have no or very low delta opioid receptor expression as well as subtypes of cells that express the d , mu and delta receptors (zeisel et al., ). together, these data suggest that the expression of opioid receptors in the same neuronal population may be variable, and there is no clear agreement among reports based on different methodologies.
here, we investigated the role of mu and delta opioid receptor-dependent signaling in d -expressing neurons in reward-driven behaviors. we generated a novel mouse model with selective knockdown of both receptors by means of rna interference. the main rationale for targeting both receptors was that they are mainly activated by the same ligands, enkephalins, and trigger intracellular signaling cascades initiated by go/i proteins (gendron et al., ; williams et al., ).
therefore, while these receptors may potentially have different roles in the striatum, this is mainly due to their presence on different cell types; on the other hand, their effects in a single type of neuron are highly similar. thus, the elimination of one of these receptors could be compensated for by the effects of the other. we found that while knockdown reduced or abolished the activity of the mu and delta receptors, no significant effects on reward-driven behaviors were observed.
oprd /oprm d -kd mouse strain
short hairpin rnas against the mu and delta receptors were designed using block-it rnai designer software (invitrogen, usa). the sequences of the -nt fragments complementary to the target mrnas were gctgccctttcagagtgttaa (oprm - ), cctttggaaacatcctctgca (oprm - ), gctggtgattcctaaactgta (oprd - ) and cgccttgagataacatcgggt (oprd - ). synthetic oligos were cloned into the gw/emgfp-mir vector (invitrogen), which contains sequences derived from mir- (see fig. a for summary). silencing efficiency was validated in cho-k cells cotransfected with plasmids encoding opioid receptors and plasmids encoding the mirna cassette. a construct harboring hairpins was recombined into a bacterial artificial chromosome (bac; rp - e ; children's hospital oakland research institute, usa) containing the mouse drd a gene. the bac was purified, the vector sequences were removed, and the transgene was injected into the pronuclei of fertilized oocytes from swr/j mice. mutant mice were congenically bred onto the c bl/ n background (> generations of backcrosses before the start of the experiments). transgenic animals were genotyped by pcr using primers with the sequences acgtaaacggccacaagttc and acgtaaacggccacaagttc (amplicon size bp), which target the egfp-encoding sequence. each genotyping reaction also included positive control primers with the sequences ccatttgctggagtgactctg and taaatctggcaagcgagacg (amplicon size bp).
the pcr conditions were as follows: minutes of denaturation at °c followed by cycles of s at °c, s at °c, and s at °c.
the experiments were performed on adult oprd /oprm d -kd mice and their wild-type littermates aged - weeks at the beginning of the procedures. the exception was the socially conditioned place preference procedure, which was started when mice reached the age of weeks. male mice were used for all experiments except the intellicage experiment. the animals were housed - per cage in rooms with a controlled temperature of ± °c under a / -h light-dark cycle with ad libitum access to food and water. the animals were handled for - days before the beginning of the experimental procedures. the experiments were conducted during the dark phase, except for the light-dark box test and the morphine-conditioned place preference test, which were conducted during the light phase. the saccharin preference test and the intellicage experiment were conducted over one or several days. all behavioral procedures were approved by the ii local bioethics committee in krakow (permit numbers / , / , and / ) and conducted in accordance with the european communities council directive of november ( / /eec).
the expression of the transgene was assessed by analyzing the coexpression of the egfp protein with the mir precursor by immunostaining. both immunohistochemistry and immunofluorescence were used. briefly, the animals were perfused with % paraformaldehyde in phosphate-buffered saline, and the brains were removed and postfixed overnight. for immunohistochemistry, the brains were sliced into -µm sections in the coronal plane on a vibratome (leica, germany), the sections were blocked with goat/pig serum and incubated with an anti-gfp antibody (rabbit, thermo fisher scientific, a- , : ), and then the signal was developed using the rabbit vectastain abc hrp kit and diaminobenzidine substrate (vector inc., usa). images were acquired using an aperio scanscope cs device (leica, germany).
for immunofluorescence, perfused brains were frozen in % (w/v) sucrose, sliced into -µm sections on a cryostat (leica), and mounted on superfrost+ microscope slides. the sections were stained with an anti-gfp antibody (chicken, abcam, ab , : ) and, in some cases, a rabbit anti-ppenk antibody (neuromics, ra - ; : ). the following secondary antibodies were used: alexa fluor -conjugated goat anti-chicken igy (h+l) (invitrogen, a- , : ) and alexa fluor -conjugated donkey anti-rabbit (invitrogen, a- , : ). fluorescence images were acquired using an lsm zeiss upright confocal microscope. mice ( -to -week-old males) were decapitated under isoflurane (aerrane, baxter) anesthesia. coronal brain slices ( μm thick) containing the nucleus accumbens were cut on a vibratome (vt s leica microsystems, germany). the slices were prepared from brain tissue submerged in ice-cold cutting artificial cerebrospinal fluid (acsf) containing (in mm) nacl, . nahco , . nah po , . mgso , . kcl, . cacl , . hepes, . sodium ascorbate, . sodium pyruvate, . thiourea and glucose and continuously bubbled with a mixture of % o and % co , ph . - . , osmolarity - mosmol/kg. after cutting, the slices were immediately transferred to an incubation chamber filled with normal acsf containing (in mm) nacl, nahco , . nah po , mgso , . kcl, . cacl and glucose at ± . °c. the tissue was incubated for at least minutes before electrophysiological recordings. nucleus accumbens neurons were visualized with an upright microscope (zeiss axio examiner a microscope, zeiss, germany) using video-enhanced infrared differential interference contrast (dic) and fluorescence optics. individual neurons were visualized using a × water immersion lens. dopamine receptor d expressing cells were identified based on the expression of gfp (kd animals) or tdtomato (control animals). gfp was excited at nm and tdtomato was excited at nm by led illumination (zeiss colibri). 
whole-cell current- and voltage-clamp recordings were made using the sec- x amplifier (npi, tamm, germany). signals were filtered at khz and digitized at khz using a micro converter (cambridge electronic design (ced), cambridge, uk) with signal and spike software (ced, uk). patch micropipettes ( - MΩ) were pulled from borosilicate glass capillaries (sutter instrument, novato, ca, usa) using the sutter instrument p puller. the internal pipette solution contained (in mm) k-gluconate, kcl, mgso , hepes, na -atp, . na-gtp, egta and . % biocytin, osmolarity - mosmol/kg, ph . - . . a liquid junction potential of + . mv was calculated, and the data were corrected for this value. to characterize the basic electrophysiological properties of d neurons, recordings were performed in normal acsf. the membrane potential of the recorded neurons was held at − mv by continuous current injections, and voltage responses to rectangular current steps ( ms in duration, - pa to + pa, -pa increments, with -s intervals between steps) and a depolarizing current ramp ( . to na) were recorded. the membrane resistance, time constant and capacitance were measured from the voltage response to a - pa hyperpolarizing pulse. the excitability of the recorded neurons was examined using depolarizing current pulses from + pa to + pa, and for each cell, the number of action potentials was plotted against the intensity of the injected current. action potential (ap) parameters, specifically the ap threshold, amplitude, - rise time, half width, ahp minimum and action potential peak time to ahp minimum time (ap peak to ahp), were measured for the first action potential evoked by the minimal depolarizing current step. the rheobase was defined as the minimal current necessary to induce an action potential and was determined using a current ramp protocol. 
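the excitability analysis described above (action potential count plotted against injected current, then fitted by linear regression) can be illustrated with a minimal python sketch. this is not the authors' analysis code, which was run in graphpad prism; the function name `fit_fi_curve`, the restriction of the fit to suprathreshold steps, and the definition of threshold current as the x-intercept of the fitted line are all assumptions made for illustration. the slope is returned in spikes per pa; converting it to hz/pa would require dividing by the step duration, which is not legible in this text.

```python
def fit_fi_curve(currents_pa, spike_counts):
    """Least-squares line through the suprathreshold part of an f-I curve.

    Returns (gain, threshold_current): the slope in spikes per pA and the
    x-intercept of the fitted line in pA. Both definitions are assumptions
    made for this sketch, not the procedure documented in the paper.
    """
    # Keep only current steps that evoked at least one action potential.
    points = [(i, n) for i, n in zip(currents_pa, spike_counts) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two suprathreshold steps to fit a line")

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least squares: slope = Sxy / Sxx.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("cannot derive a threshold from a flat or degenerate fit")

    gain = sxy / sxx
    intercept = mean_y - gain * mean_x
    threshold_current = -intercept / gain  # current at which the line crosses zero
    return gain, threshold_current


if __name__ == "__main__":
    # Hypothetical cell: silent up to 50 pA, then 2 extra spikes per 50 pA step.
    gain, threshold = fit_fi_curve([0, 50, 100, 150, 200], [0, 0, 2, 4, 6])
    print(f"gain = {gain:.3f} spikes/pA, threshold current = {threshold:.1f} pA")
```

group comparisons of gain and threshold current would then be made on the per-cell values returned by such a fit.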
next, to characterize current-voltage relationships, cells were voltage-clamped at a command potential of - mv, and recordings were performed in the presence of . μm tetrodotoxin. voltage steps from - mv to + mv ( ms in duration, -mv increments) were delivered every seconds, and steady-state current responses were measured. to verify the presence of mu and delta opioid receptors on the membranes of d -expressing neurons, cells were voltage-clamped at a command potential of - mv. all recordings were performed in normal acsf containing tetrodotoxin ( . μm). after obtaining stable whole-cell current recordings for at least minutes, the selective mu opioid receptor agonist damgo ( μm) or the selective delta opioid receptor agonist dpdpe ( μm) was added to acsf-perfused slices. recordings lasted for at least minutes following batch application of the drugs. the values are given as the means ± sem. in all experiments, p < . was considered significant. all data sets were tested for normal distribution, and outliers were excluded from the analysis (rout method, q = %). the firing characteristics of the recorded neurons were analyzed by linear regression. statistical significance between groups was determined using either student's t test or the mann-whitney test when applicable. data analysis was performed using graphpad prism for windows. a recorded neuron was classified as responsive to a drug if the whole-cell current of the neuron after drug application differed from the baseline by more than three standard deviations. behavior was recorded using a basler (ac - gm) camera and ethovision . software (noldus). the experimenter was blinded to the genotypes of the animals tested. walking initiation and open field exploration were measured during the adaptation period before the social interaction test. the mice were introduced to a novel transparent plastic cage ( × . cm × . cm high) containing . cm of bedding. 
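the responsiveness criterion stated above for the electrophysiological recordings (a whole-cell current after drug application that differs from baseline by more than three standard deviations) can be written as a short python sketch. the function name `is_responsive`, the use of the baseline samples to estimate the standard deviation, and the comparison of mean currents are assumptions; the text does not specify how the baseline variability was computed.

```python
from statistics import mean, stdev


def is_responsive(baseline_pa, drug_pa, n_sd=3.0):
    """Classify a neuron as drug-responsive.

    baseline_pa: whole-cell current samples (pA) before drug application.
    drug_pa: whole-cell current samples (pA) during drug application.
    Returns (responsive, change), where change is the difference between the
    mean current during the drug and the mean baseline current.
    Assumption: the standard deviation is taken over the baseline samples.
    """
    baseline_mean = mean(baseline_pa)
    baseline_sd = stdev(baseline_pa)  # sample standard deviation
    change = mean(drug_pa) - baseline_mean
    return abs(change) > n_sd * baseline_sd, change


if __name__ == "__main__":
    # Hypothetical current samples, not data from the study.
    baseline = [0.0, 1.0, -1.0, 0.0, 0.0]
    print(is_responsive(baseline, [5.0, 6.0, 4.0]))  # clear outward shift
    print(is_responsive(baseline, [1.0, 1.0, 1.0]))  # within baseline noise
```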
the latency to leave a x -cm square (outlined digitally) was used as a measure of walking initiation. latencies were measured manually from video recordings with boris software (friard and gamba, ) . leaving the square was defined as putting all four paws outside the square. the distance moved during the -minute habituation period was measured by ethovision . software (noldus, the netherlands). the test was performed in a two-chamber apparatus (each chamber was × × cm high). the light compartment was brightly lit ( lux), and the dark compartment was dimly lit ( lux). the animals were placed in the dark compartment and allowed to freely explore the apparatus for minutes. partners ( -to -week-old male c bl/ n mice, charles river laboratories, germany) were habituated to our laboratory facilities for at least one week prior to the experiments. focal animals were habituated to the test cage for minutes before the test ( × . × . cm, containing . cm of bedding). immediately after habituation, the partner animal was placed in the cage, and the mice were allowed to interact for minutes. control animals were exposed to a novel object instead of a novel mouse. aspen gnawing blocks ( × × cm) were used as novel objects. the blocks were identical in shape and size to the blocks in the home cages of the mice but differed in smell. time in proximity was measured from video recordings with ethovision . software (noldus, the netherlands). proximity was defined as < cm between body centers. saccharin preference was assessed as described previously (jastrzębska et al., ) . briefly, individually housed animals had -h access to two -ml graduated drinking bottles. one bottle was filled with water, and the other bottle was filled with . % saccharin solution. food was provided ad libitum on the cage floor. 
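for the two-bottle saccharin test described above, preference is commonly expressed as the share of saccharin solution in total fluid intake. the sketch below assumes that convention; the exact formula used in the study is given in the cited protocol and is not restated in this text, and the function name `saccharin_preference` is hypothetical.

```python
def saccharin_preference(saccharin_ml, water_ml):
    """Percentage of total fluid intake taken from the saccharin bottle.

    Assumption: preference = saccharin / (saccharin + water) * 100.
    """
    if saccharin_ml < 0 or water_ml < 0:
        raise ValueError("volumes must be non-negative")
    total_ml = saccharin_ml + water_ml
    if total_ml == 0:
        raise ValueError("no fluid was consumed, preference is undefined")
    return 100.0 * saccharin_ml / total_ml


if __name__ == "__main__":
    # Hypothetical intake: 3 ml saccharin solution and 1 ml water.
    print(saccharin_preference(3.0, 1.0))
```

a value of 50 would indicate no preference between the bottles under this definition.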
the test was performed in an intellicage (new behavior, switzerland), which allows for long-term monitoring of up to mice living in a group with minimal interference from the experimenters. mouse behavior was monitored using rfid chips (uno pico id, animalab, poznań, poland). the cage consisted of a housing area and four operant chambers (situated in the corners of the cage) equipped with sensors. the operant chambers (later referred to as "corners") were accessible by only one animal at a time. each of the corners allowed access to two bottles through a guillotine door. the experiment consisted of two phases: the adaptation phase ( days) and the test phase ( days). during adaptation, the mice had free access to water in all corners. during the test, the mice were provided . % saccharin solution in two of the corners. the guillotine door closed seconds from the first lick or immediately after the mouse left the corner. the delay from the moment the mouse was detected in the corner until the doors opened increased from . to seconds every h (an initial delay of . s lasted for h). the positions of the saccharin and water bottles were exchanged every h. the test was conducted in automatic conditioned place preference (cpp) cages with three compartments (env- c, med associates inc., usa). one of the side compartments had black walls and a white floor, while the other had white walls and a black floor. on the first day of the procedure, the mice were habituated to the apparatus for minutes (the floor was covered with white paper). on the second day, a pretest was conducted; the mice were placed in the central compartment and allowed to freely explore the whole apparatus for minutes. on conditioning days, the mice received an i.p. injection of either morphine hydrochloride ( mg/kg every other day) or saline and were immediately placed in one of the compartments for minutes. 
the pairing of the compartment with morphine injection was biased, i.e., the mice were assigned to the compartment that was initially less preferred. the conditioning phase lasted eight days and was followed by a posttest, which was performed in the same manner as the pretest, on the ninth day. the procedure was performed as previously described, with modifications (dölen et al., ; panksepp and lahvis, ) . before the test, the animals were housed in groups of to on aspen shavings with aspen gnawing blocks (context a). the test consisted of three phases: the pretest, conditioning phase, and posttest (fig. a) . during the pretest, the animals were placed in a custom-made plastic cage ( x x cm high) divided into two identical compartments by a transparent plastic wall with a x -cm opening at the base. each compartment contained a type of novel bedding (cellulose (biofresh performance bedding, absorption corp, usa, / ' pelleted cellulose), beech (p.p.h. "wo-jar", poland, trociny bukowe przesiane gat. ), or spruce (lignocel® fs , j. rettenmaier and sohne, germany) and a gnawing block different from the one in the home cage in size and/or shape (contexts b and c). the mice were allowed to freely explore the cage for minutes. the amount of time spent in each compartment was measured by ethovision xt . software (noldus, the netherlands). after the pretest, the mice were returned to their home cages (context a). the next day, the mice were assigned to undergo social conditioning (housing with cage mates) for h in one of the contexts used in the pretest followed by h of isolate conditioning (single housing) on the other type of bedding. two experiments were performed on separate groups of animals. in the first experiment, beech and spruce were used. since all mice showed a preference for beech over spruce, spruce was chosen as the social bedding. in the second experiment, beech and cellulose were used. 
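the biased design described above, in which the rewarded context is the one that was less preferred in the pretest, can be sketched in python as follows. the function names and the use of a simple post-minus-pre difference as the conditioning score are illustrative assumptions, not the authors' documented procedure.

```python
def assign_paired_context(pretest_seconds):
    """Biased design: pair the reward with the initially less preferred context.

    pretest_seconds: dict mapping context name -> time spent there in the pretest.
    """
    if not pretest_seconds:
        raise ValueError("no pretest data")
    return min(pretest_seconds, key=pretest_seconds.get)


def preference_change(pre_seconds, post_seconds):
    """Change in time spent in the paired context from pretest to posttest."""
    return post_seconds - pre_seconds


if __name__ == "__main__":
    # Hypothetical pretest times in seconds for one mouse.
    pretest = {"black_walls": 500.0, "white_walls": 300.0}
    paired = assign_paired_context(pretest)
    print(paired, preference_change(pretest[paired], 450.0))
```

in an unbiased design the first function would be replaced by a random choice between contexts.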
the mice showed no preference between these beddings, and the assignment of contexts was random (unbiased design). the results of the two experiments were pooled. conditioning was repeated for days ( days in each context, alternating every day). the posttest was performed in the same manner as the pretest. the rewarding effects of the social context were measured by comparing the time spent in the social context during the posttest to the time spent in the social context during the pretest. the strain was generated following the general outline of the method described by (novak et al., ) . the transgene harbors a sequence encoding egfp for easy detection of expression and two hairpins against each of the targeted sequences, the mu and delta opioid receptors (fig. a) . we first tested the knockdown efficiency in the cho-k cell line and found average reductions of and % in the abundance of the mrna sequences corresponding to mu and delta receptors, respectively (fig. b) . transgenic mice were generated by injecting the transgene construct (without the vector cassette) into fertilized oocytes. the offspring were screened for the presence of the transgene, and two founder lines were established. the line with higher transgene levels was selected for further experiments. the results indicated that the expression of gfp was consistent with the known pattern of dopamine receptor d protein expression (fremeau et al., ) (fig. c) . the strongest signal was observed in areas corresponding to the striatum, including the nucleus accumbens and olfactory tubercle, and slightly weaker staining was also present in the deeper layers of the cortex and discrete areas of the septum. importantly, in the nucleus accumbens, the expression of gfp and preproenkephalin exhibited minor overlap (fig. d) , which is again consistent with d receptors being mainly expressed on the neurons of the direct pathway equivalent in the nucleus accumbens (gangarossa et al., ; kupchik et al., ) (fig. 
c & d) . to characterize the influence of selective double knockdown of the mu and delta opioid receptors on the electrophysiological properties of d receptor-expressing neurons in the nucleus accumbens core, we used oprd /oprm d -kd [tg/ ] mice as the kd group and d -tdtomato mice [tg/ ] as the control group. only neurons expressing tdtomato (control group) or gfp (kd group) were recorded ( fig. a, b) . in total, neurons from the control and from the kd group were included in the analysis. whole-cell recordings revealed that mu and delta opioid receptor knockdown did not influence the shape of the action potentials of the examined neurons (fig. c) . subsequent statistical analysis did not reveal any differences in the measured ap parameters (action potential threshold, amplitude, - rise time, half width, and ahp minimum and ap peak to ahp, tab. ). double knockdown of the mu and delta opioid receptors also did not influence the membrane resistance, capacitance or time constant of the recorded neurons (fig. d, e, f) . the excitability of nucleus accumbens neurons was determined by testing the relationship between the firing rate and the intensity of the injected current (fig. g) . the firing characteristics of the recorded neurons were fitted by linear regression (fig. h) , and the parameters did not differ between the groups. the mean gain and the threshold current were not affected by mu and delta opioid receptor knockdown (gain: . ± . hz/pa in the control group and . ± . hz/pa in the kd group, t( ) = . , p = . ; threshold current: . ± . na in the control group and . ± . na in the kd group; t( ) = . , p = . ). current-voltage relationships (i-v curves) were plotted using measurements of steady-state currents in response to a series of incremental voltage steps recorded in acsf containing tetrodotoxin. two-way repeated measures anova revealed no significant differences in the current responses of kd and control neurons (main effect of treatment, f( , ) = . , p = . 
; fig. i ). to validate the lack of functional mu opioid receptor expression in the d -expressing neurons of oprd /oprm d -kd mice, we bath-applied the selective mu opioid receptor agonist damgo ( μm) and recorded the whole-cell currents (- mv command potential) of d -expressing cells in slices from kd and control mice. damgo induced a reversible outward current in of voltage-clamped control neurons (fig. a, b) . statistical analysis of the damgo-induced current amplitude showed that in control cells, damgo application induced a statistically significant change in the recorded whole-cell current (mean whole-cell current at baseline: . ± . na; mean whole-cell current during damgo application: . ± . na; t( ) = . , p = . ; fig. b ). damgo application produced a reversible outward current in only of recorded kd neurons (fig. c , d, e). statistical analysis of the damgo-induced current amplitude showed that in kd neurons, damgo did not induce a statistically significant alteration in the whole cell current (mean whole-cell current at baseline: . ± . na; mean whole-cell current during damgo application: . ± . na; t( ) = . , p = . ; fig. d ). subsequent comparison of the damgo-induced whole-cell current in kd and control neurons showed that the mu opioid receptor agonist evoked a reduced effect in kd cells (whole-cell current change: ± pa) in comparison with control neurons (whole-cell current change: ± pa), but the difference did not reach statistical significance (t( ) = . , p = . ; fig. f ). to confirm the knockdown efficiency of delta opioid receptors in d -expressing cells, d striatal neurons from oprd /oprm d -kd and control mice were voltage clamped at - mv, and their responsiveness to the selective delta opioid receptor agonist dpdpe ( μm) was recorded (fig. a , b, c, d). 
we found that all examined neurons from kd mice were insensitive to the delta opioid receptor agonist, which was confirmed by subsequent statistical analysis (mean whole-cell current at baseline: . ± . na; mean whole-cell current during dpdpe application: . ± . na; t( ) = . , p = . ). at the same time, in neurons from the control mice, bath application of the drug generated a reversible outward current (in of recorded cells, fig. e ), and the change in the whole-cell current was statistically significant (mean whole-cell current at baseline: . ± . na; mean whole-cell current during dpdpe application: . ± . na; t( ) = . , p = . ). moreover, the dpdpe-evoked change in the whole-cell current of d striatal neurons was significantly different between control and kd neurons (change in the whole-cell current in control neurons: ± pa; change in the whole-cell current in kd neurons: . ± . pa; t( ) = . , p = . ; fig. f ). first, we assessed the behavior of mutant mice in an open field. the mutation targeted striatal neurons of the direct pathway, which may have an effect on motor activity. oprd /oprm d -kd mice showed a normal latency to leave the square in the open field test (fig. a , t( ) = . , p = . ), indicating no deficit in movement initiation. there was a trend towards a shorter total distance traveled by mutant mice compared to wild-type littermates, but this difference did not reach significance (t( ) = . , p = . ; fig. b ). these results indicate no appreciable motor impairments that could confound the interpretation of the observed behaviors. next, we tested anxiety-like behaviors in the light-dark box and during social interaction tests. oprd /oprm d -kd mice did not differ from wild-type controls in the latency to enter the lit compartment (t( ) = . , p = . ; fig. c ) and overall spent a similar amount of time on the lit side (t( ) = . , p = . ; fig. d ). 
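the between-group comparisons reported above rely on student's t test. as a minimal illustration of how the reported t statistic and its degrees of freedom arise for two independent groups, the python sketch below computes the pooled-variance statistic using only the standard library. it is not the authors' analysis, which was performed in graphpad prism, and it does not compute a p-value; in practice a statistics package such as scipy (`scipy.stats.ttest_ind`) would be used for that.

```python
from math import sqrt
from statistics import mean, variance


def students_t(group_a, group_b):
    """Pooled-variance Student's t statistic for two independent samples.

    Returns (t, df) where df = n_a + n_b - 2.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / standard_error, df


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    t, df = students_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
    print(f"t({df}) = {t:.3f}")
```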
there was no effect of genotype on the number of crossings between compartments (control: . ± . ; oprd /oprm d -kd : . ± . ; p = . ). we performed a social interaction test, another test for altered anxiety-like behavior, in the open field apparatus. oprd /oprm d -kd mice spent as much time in proximity with an unfamiliar adult conspecific as their wild-type littermates and spent significantly more time interacting with another mouse than an inanimate object (genotype: f( , ) = . , p = . ; animal vs. object: f( , ) = . , p < . ; animal vs. object × genotype: f( , ) = . , p = . ; fig. e ). together, these results indicate that the mutation did not alter anxiety-like behaviors and that its effects were different from the reported effects of the complete inactivation of the oprd gene (filliol et al., ) or treatment with a delta opioid receptor agonist (perrine et al., ) . to assess the effects of the mutation on reward sensitivity, we used two paradigms: the saccharin preference test and morphine- or social-conditioned place preference. first, we measured the volume of saccharin-sweetened water ( . % w/v) consumed by male oprd /oprm d -kd and wild-type mice over a period of h in a two-bottle choice task. mutant mice did not differ from wild-type animals in preference for sweet taste (t( ) = . , p = . ; fig. a ) or the total volume of . % saccharin solution consumed (t( ) = . , p = . ; fig. b ). next, we subjected female oprd /oprm d -kd and wild-type mice to a delay discounting task with saccharin as a reward. the main advantage of this method is that it is able to detect even a minor difference in subjective reward value and the propensity for impulsive choices. the test was conducted in intellicages, in which a group of female mice implanted with radio-frequency identification (rfid) chips were housed together for days. 
oprd /oprm d -kd mice showed general activity similar to that of control animals, which decreased slightly but significantly over the course of the experiment (number of corner visits, time: f( , ) = . , p < . ; genotype: f( , ) = . , p = . ; time × genotype : f( , ) = . , p = . ; fig. c ). the mutation had no effect on initial preference for the saccharin solution and did not affect the rate of discounting with increasing delay to access the reward (time: fig. f ) consumed. thus, we found that the mutation did not affect preference for, the intake of or the motivation to drink the saccharin solution. next, we tested morphine-conditioned place preference (cpp). both wild-type and oprd /oprm d -kd mice showed an increase in preference for the context paired with morphine injections ( mg/kg, i.p.; pre-post: f( , ) = . , p < . ; genotype: f( , ) = . , p = . ; pre-post × genotype: f( , ) = . , p = . ; post hoc sidak's test; wt: p = . ; kd: p = . ; fig. g ). mutant mice had, on average, slightly lower locomotor activity (particularly during the posttest); however, this effect did not reach significance (pre-post: f( , ) = . , p = . ; genotype: f( , ) = . , p = . ; pre-post × genotype: f( , ) = . , p = . ; fig. h ). the results showed no appreciable effects of receptor knockdown on morphine reward, which is consistent with previously reported normal cpp in animals with selective deletion of the mu receptor in forebrain gabaergic neurons (charbogne et al., ). finally, we tested the rewarding effects of social contact using the conditioned place preference paradigm (fig. a ). both mutant and control mice acquired a preference for the context associated with group housing (pre-post: f( , ) = . , p = . ; genotype: f( , ) = . , p = . ; interaction: f( , ) = . , p = . ; post hoc sidak's test; wt: p = . ; kd: p = . ; fig. b ). motor activity was not affected by genotype, although animals from both groups exhibited an increase in activity from the pretest to the posttest (pre-post: f( , ) = . 
, p = . ; genotype: f( , ) = . , p = . ; interaction: f( , ) = . , p = . ; post hoc sidak's test; wt: p = . ; kd: p = . ; fig. c ). these results showed that the mutation did not appreciably affect the rewarding effects of social contact. we attribute the genotype-independent change in activity to the normal physical development of young animals. we found that knocking down the mu and delta opioid receptors in d -expressing neurons had no appreciable effects on sensitivity to rewards or motivation to obtain them. the only changes in behavior were alterations in motor activity in some of the tasks that included rewards. these results do not completely exclude the involvement of the mu and delta opioid receptors present on neurons expressing dopamine receptor d in reward processing but suggest that they are not an essential part of the associated mechanism. first, it should be stressed that the mutation utilized in this study targets neurons expressing dopamine receptor d , a group that is broader than a subset of nucleus accumbens or striatal medium spiny neurons and prominently includes glutamatergic neurons in lower cortical layers. we note, however, that knockdown efficiency is dependent on gene promoter activity, and d expression is several-fold higher in medium spiny neurons than in other cell types (nam et al., ) . no significant behavioral effects of knockdown were observed, which could indicate that opioid signaling in non-striatal cells also has no essential role in reward-driven behaviors. however, it is likely that knockdown in cells other than medium spiny neurons was less efficient and that mu and delta opioid receptor signaling was thus spared. we also note that the knockdown efficiency in d -expressing medium spiny neurons was probably higher than that found in the in vitro experiment based on the previously reported effects of mglur knockdown in a mutant mouse line generated using the same gene promoter (novak et al., ) . 
we showed that under ex vivo conditions, the majority of d -expressing striatal medium spiny neurons from control mice were sensitive to mu and delta receptor agonists. the coexpression of the d and delta opioid receptors has been previously reported in striatal neurons (ambrose et al., ; ma et al., ) . here, we confirmed these data and showed that delta opioid receptor stimulation in d -expressing neurons activated outward whole-cell currents, which correspond to the hyperpolarization of the cell membrane. to the best of our knowledge, the presented results are the first to show the direct, inhibitory effect of opioid receptor activation on whole-cell currents in striatal neurons. notably, in knockdown mice, the mutation led to the complete loss of sensitivity to the delta opioid receptor agonist (dpdpe). these results not only confirm knockdown but also, along with results obtained in acsf in the presence of tetrodotoxin, are proof of the presence of delta opioid receptors on the postsynaptic membrane of the majority of striatal d -receptor-expressing neurons. similarly, most of the tested control cells were sensitive to the mu opioid receptor agonist (damgo), and the selective activation of mu receptors also led to outward current activation. d and mu opioid receptor coexpression in the striatum is well documented (cui et al., ; oude ophuis et al., ) , and its hyperpolarizing influence is attributed to postsynaptic action (elghaba and bracci, ) . again, the lack of response of knockdown mice to the agonists allows us to conclude that the functional mu opioid receptor is largely absent in mutant animals. importantly, the lack of either mu or delta receptors did not change the passive or active membrane properties of d -expressing neurons; therefore, any behavioral effect of the mutation can be attributed to altered opioid signaling, not the dysfunction of d -expressing neurons. 
the normal preference for sweet taste and the normal morphine- or social-conditioned place preference observed in kd animals were unexpected. based on the reported role of opioid signaling in the "liking" component of reward, we anticipated a decrease in sweet taste preference or possibly reduced consumption of sweetened water (castro and berridge, ; peciña and berridge, ) . we also hypothesized that the loss of mu and delta receptors would affect the rewarding effects of social contact, but we found no evidence of any social impairments. we were uncertain whether knockdown could affect motivation (i.e., delay discounting) or the rewarding effects of morphine, and found them to be unaltered. the simplest explanation for the normal phenotypes is that knockdown allowed sufficient opioid receptor activity to sustain reward-driven behaviors. nevertheless, we find this possibility unlikely, as behavioral changes have been reported after antagonist treatment, which is unlikely to completely block receptor function. alternatively, it can be argued that opioid receptors in other types of striatal neurons play the primary and essential role in reward-driven behaviors. however, this may be inconsistent with the reported role of d -expressing neurons in mediating the reinforcing effects of rewards (e.g. calipari et al., ; cole et al., ; kravitz et al., ) . moreover, this assumption might be counterintuitive in the context of the reported rescue of morphine-conditioned place preference in mice with oprm gene ko by the reintroduction of mu receptors in pdyn-expressing (i.e., mainly direct pathway medium spiny neurons in the nucleus accumbens) cells (cui et al., ) or the reported involvement of delta opioid receptors on d -expressing neurons in resilience to social stress (nam et al., ) . one way to reconcile these reports and the observed phenotypes is to assume redundancy in the roles played by opioid signaling in different types of medium spiny neurons and through presynaptic receptors. 
thus, opioid receptors on dopamine neurons lacking d receptors or on presynaptic terminals could possibly compensate for the effects of knockdown. opioid receptors have primarily inhibitory effects on transmission, and their activation in any part of the network that drives activity, such as delta receptors on excitatory neurons in the cortex or mu and delta receptors on cholinergic neurons, could have similar effects. this is speculative and immediately raises the question of why redundancy in opioid system functions could be necessary. we found that mu and delta receptors are ubiquitously present in d -expressing medium spiny neurons but are not essential for the control of signaling underlying reward-driven behaviors. this work was supported by the grant opus / /b/nz / from the national science centre, poland, and statutory funds of the maj institute of pharmacology of the polish academy of sciences. the data are presented as the mean ± sem; no statistically significant differences were revealed between examined neurons. the filled circles represent neurons in which the change in current in response to damgo was greater than three standard deviations from baseline. the data are presented as the mean ± sem. significant differences between group means (student's t test) are represented by *** (p < . ). 
(c) a representative voltage-clamp recording showing that neurons from kd mice did not respond to dpdpe application and (d) a corresponding line graph showing the current amplitude recorded at baseline and during dpdpe application. note the lack of differences between whole-cell current amplitudes in the tested conditions. (e) pie charts showing the proportion of dpdpe-responsive (filled) and dpdpe-nonresponsive (empty) neurons from control and kd mice. (f) the amplitudes of dpdpe-induced outward whole-cell currents recorded in neurons from control and kd mice. the filled circles represent neurons in which the change in current in response to dpdpe was greater than three standard deviations from baseline. the data are presented as the mean ± sem. significant differences between group means (student's t test) are represented by ** (p < . ). (g) time spent in the morphine-paired compartment before (pre) and after (post) conditioning. each pair of points connected by a line corresponds to an individual animal. (h) motor activity in the apparatus before (pre) and after (post) conditioning. activity was scored each time an animal crossed one of the infrared beams in the compartments. significant differences between group means (sidak's test) are represented by *** (p < . ). 
rewarding effects of social interactions in oprd /oprm d -kd mice. (a) schematic representation of the experiment. (b) time spent in the context associated with social interaction before and after conditioning (in the pretest and posttest)
key: cord- -bi jyz r authors: wilson, audrey e; siddiqui, ali; dworkin, dr. ian title: spatial heterogeneity in resources alters selective dynamics in drosophila melanogaster date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bi jyz r environmental features can alter the behaviours and phenotypes of organisms and populations evolving within them including the dynamics between natural and sexual selection. experimental environmental manipulation, particularly when conducted in experiments where the dynamics of the purging of deleterious alleles are compared, has demonstrated both direct and indirect effects on the strength and direction of selection. however, many of these experiments are conducted with fairly simplistic environments when it is not always clear how or why particular forms of spatial heterogeneity may influence behaviour or selection. using drosophila melanogaster, we tested three different spatial environments designed to determine if spatial constraint of critical resources influences the efficiency of natural and sexual selection. 
we conducted two allele purging experiments to 1) assess the effects of these spatial treatments on the selective dynamics of six recessive mutations, and 2) determine how these dynamics changed when sexual selection was relaxed and the spatial area was reduced. we found that allele purging dynamics depended on spatial environment; however, the patterns of purging rates between the environments differed across distinct deleterious mutations. we also found that for two of the mutations, the addition of sexual selection increased the purging rate. understanding mating systems and the dynamics between the sexes can illuminate how sexual selection acts within populations, driving many organisms' behaviours and phenotypes. key work in the theory of mating systems conducted by bateman ( ) , trivers ( ) , and emlen and oring ( ) has led to many studies being dedicated to examining male and female interactions across different species and populations. the mating systems of numerous species have been shown to vary through local adaptation or ecological constraints imposed by environmental factors (miller and svensson ) . for example, ungulate species that inhabit open environments tend towards group mating systems while those within closed or forested environments tend to adopt small group or pair mating systems (carranza ; bowyer et al. ) . this variation in behaviour can occur within species as well, as seen in the mating system of prunella modularis, which has been shown to shift between polygyny, polygynandry, and polyandry depending on food distribution (davies and lundberg ) . environmental features such as spatial size, structure, resource abundance, and climate can alter the strength of sexual selection and conflict (both intra- and inter-) on an individual, in turn leading to fitness payoffs for certain phenotypes. 
in sancassania berlesei, increasing environmental complexity changes the fitness differences between the fighter and scramble male morphs, which was believed to be a result of reduced encounters between fighter males (lukasik et al. ) . another example can be found in certain populations of katydid, where sex role reversal occurs under conditions of low resource abundance, placing a greater influence of inter- and intra-sexual selection on females (gwynne and simmons ) . since environmental variation can impact fitness, it is important to keep environmental context in mind when studying the strength of natural selection, sexual selection and sexual conflict. along with the environment, understanding the interaction between sexual selection and other components of natural selection (fecundity and viability) is important for determining an organism's or a population's phenotypic and behavioural origins. since the term was introduced by darwin ( ), studies have focused on how traits under strong sexual selection (weaponry, ornaments, and mating behaviours) arise and persist within populations. when sexual conflict is present, mutations may be beneficial in one sex but deleterious in the other (antagonistic pleiotropy), allowing for the maintenance of conditionally deleterious alleles. in many species, an extreme case of this is males harming females during copulation, either through mating itself or ejaculates, in order to prevent re-mating, further securing the male's paternity (johnstone and keller ) . however, while natural and sexual selection are often portrayed as being at odds with one another, individuals of higher overall condition will on average receive more mates, resulting in sexual selection working in tandem with other components of natural selection. for instance, in ungulates, males of overall higher condition tend to have the largest weaponry and are better able to obtain fertilizations along with access to females themselves (preston et al. ; hoem et al. ; vanpé et al. ; emlen ) . 
a common way of determining how various factors influence natural and sexual selection is to conduct allele purging experiments. within these experiments, deleterious mutations are introduced into populations at a known frequency (or via induced mutations) and either the rate at which they are removed from the populations over time is recorded or the populations undergo various fitness assays. experimental conditions are manipulated (thermal stress, dietary stress, population density, environmental complexity, and mate choice) (sharp and agrawal ; wang et al. ; young et al. ; hollis et al. ; maclellan et al. ; laffafian et al. ; hollis and houle ; mcguigan et al. ; arbuthnott and rundle ; clark et al. ; maclellan et al. ; singh et al. ; colpitts et al. ) and purging rates (or fitness) are compared to obtain estimates of the effects these conditions have on selective dynamics. while several kinds of these studies have been conducted, many show contrasting results in reference to whether sexual selection aids natural selection in the removal of deleterious alleles. one potential reason for such inconsistencies is that most experiments are performed in small, simple environments (i.e. small vials) at relatively high densities, and it is not clear to what degree this may influence the strength and orientation of selection. such simple and high-density environments likely constrain the mating strategies that would be available to individuals in more natural conditions. alternative mating strategies are density-dependent in several species (greenfield and shelly ; höglund and robertson ; kokko and rankin ) , and particularly for drosophila melanogaster, territorial defence strategies by males are less likely to occur when the population is at a high density (hoffmann and cacoyianni ) . simple environments may also influence female strategies in that they may accept more mates due to being unable to seek refuge or escape from constant male harassment (byrne et al. ) . 
creating a more "complex" environment consisting of a larger space, multiple food cups, and additional spatial structure to alter the interactions between the sexes, yun et al. ( ) showed that male harassment of high quality d. melanogaster females was greater in the simple fly vial environments used in many experiments, exaggerating the effects of sexual selection to reduce variance in female fitness. since yun et al.'s ( ) experiment, there have been several studies conducted to determine how natural and sexual selection change within simple (high density in single vials or bottles) versus "complex" environments (lower density cages with multiple resources for interactions to occur). in a later study, yun et al. ( ) found that flies that had mating opportunities within "complex" environments adapted more quickly to novel larval environments than those mating in simple environments or lacking mate competition. using a similar environmental design but creating a larger, lower density simple environment, colpitts et al. ( ) demonstrated that "complex" environments aided the purging of two deleterious mutations that had previously been found to have no difference in purging rate while manipulating opportunity for mate choice (arbuthnott and rundle ) . singh et al. ( ) showed increased purging rate of deleterious alleles from populations evolving within these "complex" environments, while macpherson et al. ( ) revealed that low quality females experienced a greater reduction in fitness due to male harm compared to high quality females but only in "complex" relative to simple environments. these studies exemplify that with even modest changes in spatial environment (increasing space and lowering density of individuals), the dynamics of natural and sexual selection can vary vastly. 
complexity without the manipulation of overall environment size has been shown to influence female fitness in terms of offspring production (malek and long ) , but this has not been used to test overall population fitness. while these studies potentially show how these forces interact in a way that may be more representative of what is seen in nature, the types of environments employed are still simple and largely reflect changes in density. however, it is important for such experiments to explicitly consider factors that are known to influence mating strategy as well, such as territory availability and spatial heterogeneity of resources. increasing the environmental complexity in which populations evolve may reveal new patterns of how sexual selection acts, particularly for d. melanogaster, which as a species shows considerable variability in mating strategy in different spatial contexts. although d. melanogaster males typically display scramble competition in the lab, territorial behaviours and resource defense polygyny have been observed when they are given a desirable resource (hoffmann ) . males also appear to display this behaviour more often when females are present, when there is a low density of males, when the resource is readily used by females for oviposition, and when resource patches are in a range of sizes (~ mm diameter) (hoffmann and cacoyianni ) . within laboratory experiments where aggressive interactions amongst d. melanogaster males are observed, it is typical that larger males, or males that hold residence of a territory first, have greater reproductive success (hoffmann ) . considering this, if populations are within an environment that allows males to benefit from territorial behaviour, these populations may show an increase in overall fitness, and more variation in mating strategies. yet to date, most experimental evolution and purging experiments have not considered these explicit factors in their design. 
while the previous work outlined above has made considerable contributions to our understanding of the interplay between environmental complexity and selective forces, the environments used in these experiments are relatively simplistic when considering the plasticity of animal mating behaviour. we conducted a series of short-term allele purging experimental evolution assays in which environmental complexity and the accessibility of critical resources to d. melanogaster were manipulated with these factors in mind. in the first part of this experiment we looked at how differences in resource patch size and accessibility influenced the purging of six recessive deleterious mutations from populations being held within a series of complex environments. specifically, we provided multiple resource patches of high (to maximize female fecundity) and low quality. in each treatment, high quality patch size and accessibility varied according to how they should potentially influence aspects of territoriality. in the second experiment, we examined how the rate of removal for two of these mutations differed between the complex environments and two simple environments in which we additionally manipulated opportunity for mate choice (via forced monogamy). we expected that if natural and sexual selection were aligned, we would see an increase in purging rate as accessibility to resources decreased and that the purging rate overall would be greater when sexual selection was allowed to act in the form of mate choice than when it was removed. images of the environmental treatments and an illustration of the general setup are provided in figure . three environments were created in order to test the effects of desirable resource availability on the removal of deleterious mutations from populations. within each environment there were both "high quality" resources of a yeast-rich food (see table s ) and a % dilution (in water/carrageenan) of this food as a "low quality" resource. 
high quality food was determined based on previously published nutritional geometry studies (lee et al. ; jensen et al. ) that identified diets maximizing female fecundity. based on previous studies, the intent of these high quality food resources was to entice females to use these patches for oviposition and potentially lead males to defend these resources to maximize their own mating success. the diluted medium provided additional resource patches, such that individuals were not competing for survival per se, but for the desirable resources that females may prefer in order to maximize their fecundity. for each replicate environment described below, mesh bugdorm m cages ( cm ) were used. the "nonterritory" treatment environment (nt) consisted of a single drosophila culture bottle ( ml), with a surface area of . cm ( mm x mm base) containing ~ ml of high quality food with the addition of four drops of a yeast-paste and orange juice mixture on top (to attract females (dweck et al. )), as well as a bottle with only ml of low quality food. these represent "typical" drosophila lab environments where apparent scramble competition is commonly observed (spieth ) , although subtle interference competition may be occurring as well (baxter et al. ). the "unconstrained territory" spatial treatment (uct) consisted of eight open vials (height of mm, mm diameter, . cm surface area) each filled with ~ ml of high quality food with a single drop of yeast-paste/orange juice mixture on top and a single bottle with the low quality food. finally, the "spatially constrained territory" treatment (sct) had the same set-up as the uct treatment except each vial had a 3d printed cap ( mm diameter, mm height, mm opening, see supplemental fig ) to further restrict ease of access to high quality food patches. these 3d printed caps were designed and tested with several specific features in mind. 
first, that it was relatively difficult to gain access, but would be relatively easy (given positive photo-taxis and negative geo-taxis in drosophila (markow and merriam ) ) for an interloper to be chased out. second, that the aperture was of sufficient size that two large d. melanogaster individuals could pass one another, but one individual could still harass or chase the other in this space. finally, the cap was designed so that if an individual did display territorial behaviours, it had multiple places to survey or defend (food surface, inner aperture, and outside top of aperture). pipe cleaners were wrapped around the tops of bottles and vials to provide additional perching substrate for individuals. to examine deleterious allele purging rates, six mutations with known morphological defects were used across each of the three spatial treatments. each allele was picked because of previous work examining the effects of selection on them in the context of either spatial manipulations or varying degrees of sexual selection (arbuthnott and rundle ; colpitts et al. ). three of these mutations are autosomal (brown , vestigial , and plexus ) and three are x-linked (white , yellow , and forked ). the mutations plexus , white , yellow , and forked were obtained from bloomington stock center while brown and vestigial were obtained from stocks kept in the lab. these alleles were chosen for their wide array of phenotypic effects with two influencing eye colour (white and brown ), two influencing wing morphology (plexus and vestigial ), one affecting body colour and behaviour (yellow ) and one affecting bristle morphology (forked ). to create experimental populations, individuals were backcrossed into a large outbred domesticated lab population (census size of - individuals) originally collected from fenn valley winery (fvw), michigan (gps co-ordinates: . , - . ) in . 
this population was chosen to potentially minimize confounding effects of lab adaptation in this experiment (harshman and hoffmann ) , i.e. it is expected that this population has already had considerable opportunity to adapt to our lab environment (~ generations prior to initiation of this experiment). to generate experimental populations, the following procedure was used. for autosomal mutations, mutant female virgins were crossed with fvw males. the f offspring were then crossed to each other and mutant homozygote females were collected. for x-linked mutations, mutant males were crossed with wildtype females. the heterozygous females from this cross were then crossed back to wildtype males; the mutant offspring from this cross were then collected and the process was repeated. for each mutation, backcrossing was conducted for five generations and, in the final generation, offspring from the final cross were mated together to create mutant males and females. fifty pairs were used to generate each cross. for each mutation, nine replicate populations were created and three of each randomly assigned to one of the three environmental treatments. initial populations consisted of males and females with starting allele frequencies of . for their respective mutation. populations were maintained at l: d cycles at °c with % relative humidity in a conviron walk-in chamber (cmp ). each generation, adults were placed into their respective treatments and allowed to mate and lay eggs for three days. after the three day period, adults were removed from the environments and discarded. eggs were allowed to develop for days, after which the next generation of adults was collected by bringing the adults to the cold room kept at °c and gently knocking them into vials. after this initial collection, males and females from each replicate were phenotyped under light co2 and placed into their respective environments with fresh food. this cycle was repeated for generations. 
due to a laboratory bacterial infection in one replicate of the brown population for the nt treatment, this replicate was discarded after generation . a fourth replicate was created with the same starting allele frequencies ( . ) in order to account for the missing data. this replicate was therefore five generations behind the rest of the experiment and was continued for generations. in order to get an estimate of allele frequencies for autosomal mutations during this experiment, monogamous pairings of phenotypically wildtype females and mutant males were conducted at generations and , for brown and plexus populations and at generations and for vestigial populations. after the collection of adults for the next generation, for each population virgin females were phenotyped over light co2. of the females, those that lacked the mutant phenotype (i.e. could be homozygous or heterozygous for the wild type allele) were placed singly into vials with a mutant male. offspring were analyzed from these vials over days after emergence. if a vial contained only wildtype offspring, the female parent was scored as homozygous for the wild type allele; if the vial contained a mixture of wildtype and mutant offspring, the female parent was scored as heterozygous for the mutation. for the x-linked mutations, allele frequencies were estimated from the frequency of the mutation in males. to determine the effects of sexual selection on purging rates, we re-ran the experiment using white and vestigial with the addition of two new treatments. the first treatment, deemed "vial no choice" (vnc), consisted of randomly assigning individual pairs into vials to mate (i.e. forced monogamy). the second treatment, "vial choice" (vc), consisted of randomly assigning male and female adults into vials of mixed sex pairs. after three days of mating for each treatment, males were removed and females were placed into environments similar to the nt treatment. 
after three days the females were removed and eggs were allowed to develop for - days. emerging female virgins and adult males were collected as described above and the process was repeated. nt, uct, and sct treatments were conducted the same as above except females were collected as virgins and males and females were held separately for three days after collection in order to align with the experimental schedule of the vnc and vc treatments. this experiment was conducted for only four generations as it was disrupted by a lab shutdown brought about by the covid-19 pandemic. one replicate of the sct vestigial treatment did not have any surviving adults at generation four. the rate of mutant allele loss in each population over multiple generations for each component of the experiment was analyzed by fitting generalized linear mixed effect models with a binomial distribution (i.e. a logistic mixed model). since each allele was started at a known frequency, and the intercept was therefore known, models were fit without estimating a global intercept (but included offsets). main effects for allele or treatment were also not included (as all treatments started with the same frequency for a given allele). fixed effects included in the model were thus generation x mutation type, generation x treatment, and generation x mutant type x treatment. random slopes for generation were included across replicate lineages, and the intercept was offset to . for allele frequency (or . for autosomal and . for sex-linked mutations when modelling mutant genotypic frequencies). fixed effects were further examined for significance with a two-way anova (type ii wald χ² test) and treatment contrasts averaged over mutant type were examined by comparing estimated marginal means within each model. 
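the role of the known intercept in the model described above can be sketched numerically. the snippet below is a minimal illustration under stated assumptions, not the authors' r code: it assumes a logit-linear trajectory in which the starting frequency enters as a fixed offset and only a per-generation slope is free. the function names, the starting frequency of 0.5 and the slope values are hypothetical and chosen only for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_frequency(q0, slope, generation):
    """Expected mutant frequency after `generation` generations.

    logit(q0) plays the role of the offset (the known intercept), so no
    intercept is estimated. `slope` is the change in log-odds per generation;
    in a mixed model it would be the sum of the relevant generation-by-factor
    terms plus a replicate-specific random slope (an assumption of this sketch).
    """
    return inv_logit(logit(q0) + slope * generation)


# hypothetical starting frequency and slopes, for illustration only
q0 = 0.5
for slope in (-0.1, -0.4):
    trajectory = [predicted_frequency(q0, slope, g) for g in range(6)]
    print(slope, [round(q, 3) for q in trajectory])
```

because the offset fixes the prediction at generation zero to the starting frequency, every treatment and mutation begins at the same point and differences between them appear only as differences in slope, which is why main effects without generation are omitted.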
for analyzing purging rates across environmental treatments, models were generated with and without the third sct replicate for the forked mutation due to this replicate having mutant allele frequencies approaching fixation consistently throughout the experiment ( figure s , table s and table s ). results presented exclude this replicate unless otherwise indicated. selection coefficients for each mutation were estimated using the allele frequency data. the selection coefficient per generation was calculated as s = 1 - (q'/q), where q and q' are the mutant allele frequencies in consecutive generations, and these estimates were then averaged across generation and replicate for each mutant type. all statistical analyses were performed in r v. . . (r core team ) using glmer() (lme4 package v . - (bates et al. ) ), anova() (car package v . - (fox and weisberg )), and emtrends() (emmeans package v. . . (lenth )). all plots were generated with ggplot2 v. . . (wickham ) . as expected, average allele frequency declined over generations for all six mutation types, indicating these alleles to be deleterious (fig ) . we observed substantial differences in rates of purging (as assessed by genotypic frequencies) based on the identity of the mutation. anova shows significant effects for all interactions of generation with mutant type and treatment; however, significant effects may be restricted to certain mutation types, as contrast estimates between treatments averaged over all mutant types are non-significant (table and table ). across the six mutation types, there was no consistent overall pattern in purging rate between the nt, uct, and sct environmental treatments. similar results are shown when analyzing males and females separately. when examining estimated allele frequencies, only the interactions between generation and mutant type, and generation and treatment are significant (fig , table ). however, treatment contrasts are still not significantly different from one another ( table ) . 
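the step from scored genotypes to a per-generation selection coefficient can be sketched as follows. this is a minimal illustration under loud assumptions, not the authors' pipeline: it assumes the selection coefficient takes the standard form s = 1 - q'/q, and it assumes that the test-cross scores are combined with the phenotype counts by weighting the heterozygote fraction by the share of phenotypically wildtype females. all function names and all counts are hypothetical.

```python
def allele_frequency_from_test_crosses(n_females, n_mutant, n_tested, n_het):
    """Estimate the mutant allele frequency for a recessive autosomal allele.

    n_females: females phenotyped in the population sample
    n_mutant:  females showing the mutant phenotype (homozygous mutant)
    n_tested:  phenotypically wildtype females test-crossed to mutant males
    n_het:     tested females scored heterozygous (mixed offspring)
    The weighting scheme is an assumption of this sketch.
    """
    if n_females <= 0 or n_tested <= 0:
        raise ValueError("sample sizes must be positive")
    f_mut = n_mutant / n_females
    # heterozygote fraction among wildtype-looking females, rescaled to the
    # whole sample
    f_het = (1.0 - f_mut) * (n_het / n_tested)
    return f_mut + 0.5 * f_het


def selection_coefficients(freqs):
    """Per-generation s = 1 - q'/q for consecutive allele frequencies.

    Generations in which q is already zero are skipped, because the ratio is
    undefined there (a choice made for this sketch).
    """
    return [1.0 - q_next / q for q, q_next in zip(freqs, freqs[1:]) if q > 0]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# hypothetical counts and frequencies, for illustration only
q_hat = allele_frequency_from_test_crosses(100, 10, 30, 12)
s_values = selection_coefficients([0.5, 0.4, 0.3])
print(round(q_hat, 3), round(mean(s_values), 3))
```

averaging the per-generation values, as in the last line, mirrors the averaging across generations described above; averaging across replicates would apply the same mean to the pooled per-replicate values.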
overall trends of significance from anova and treatment contrasts are the same when including the third sct replicate for the forked mutation. estimated selection coefficients are of differing strengths for each mutant type; however, these estimates also indicate no consistent pattern in the strength of selection among treatment types across mutations (fig ) . overall the results suggest that while there are effects of the three spatial treatments on rates of purging (fig s ) , they are relatively modest in comparison to the effects of individual mutants and their interactions with the spatial treatment. in the second experiment, we replicated the above experiment with two alleles and added additional treatments with explicit manipulations of sexual selection. the addition of sexual selection for both white and vestigial mutant populations increased purging rates (fig , table ). while the forced monogamy (vnc) treatment showed the slowest purging rate for both mutations, between the treatments that include sexual selection there is no consistent pattern in purging rate by treatment across the two mutant types. the anova shows significant effects of the interaction between generation and mutant type, and generation and treatment but not for the interaction between all three fixed effects. treatment contrasts show that the vnc (vial no choice) treatment (i.e. forced monogamy) is significantly different from the other treatments but vc, nt, uct, and sct are not significantly different from each other. when analyzing the sexes separately, only the interactions between generation and treatment, and generation and mutant type were significant for males whereas the interactions between generation and treatment, and generation, treatment and mutant type were significant for females. 
treatment contrasts were similar between male and female models with only the vnc treatment showing a significant difference from other treatment types when looking across all mutation types (table ). spatial heterogeneity in the environment can alter many aspects of an organism's phenotype including mating strategy, which in turn influences how selection acts on a population, including the degree to which allelic effects may be concordant or antagonistic across fitness components. the directions and magnitudes of the components of natural selection have been investigated in many contexts; however, many empirical studies teasing apart these elements in varying environments fail to recognise the influence of mating strategies. we created populations with known mutation frequencies and allowed them to evolve in environments differing in spatial constraints on resource accessibility to determine how environmental complexity influences the removal of deleterious mutations. we found environmental complexity did influence purging rates, but these rates depended greatly on mutation type. we reanalyzed the purging rates of two of these mutations in the same environments but also including treatments allowing different opportunities for mate choice within a more "simple" environment. again, we found that purging rates between treatments varied with mutation type, but for both mutations a lack of mate choice (forced monogamy) decreased purging rates. for each of the six mutations, we expected that with increased variance in resource accessibility there would be an increase in purging rate and therefore the highest purging rate would be seen in the sct treatment, with the lowest being in the nt treatment. 
this prediction rested on several assumptions, including that natural and sexual selection are aligned, that high quality food patches in the sct treatment would initiate territorial behaviour within males, and that males of the highest quality would be able to hold and defend these food patches with the most success, leading to the most mates. while the sct treatment showed the highest purging rate among treatment types for plexus populations, this pattern does not hold for other mutant types. this discrepancy between our predictions and the data could be due to inaccurate assumptions or other unknown factors. despite evidence that drosophila melanogaster among other drosophila species can show context-dependent territoriality (hoffmann ; hoffmann and cacoyianni ) , considerable uncertainty exists about which factors influence it and how it ultimately influences the fitness of an individual. it should also be noted that evolutionarily stable strategy theories predict that a behavioural strategy will only be adopted by an individual or population if it is advantageous (maynard smith ) . while our environments were designed based on theory that would suggest our assumptions provide the most advantageous strategy (emlen and oring ; emlen ) , this cannot be known without further empirical testing and observation, and other strategies may have been implemented that cause the discrepancy between our expectations and results. the lack of consistency between mutant alleles and the difference between treatments could be due to populations not using the environments as predicted. the nt environment was designed to resemble environments that promote scramble competition in drosophila, with uct having characteristics that promote territorial behaviours. the sct environment was designed to provide greater opportunity for one-on-one contests to occur between individuals due to limited entry to the desirable resource. 
individuals in the uct and sct environments were provided mm diameter high quality food patches with potential densities of males per high quality food patch (if the males within each environment were equally distributed across patches). while these conditions have been shown to increase the rate of territorial behaviour and the success of those males that defend territories (hoffmann and cacoyianni ) , these results were found over short-term experiments (up to hours) and these behaviours may not persist in d. melanogaster populations over longer time periods like the three days we allowed in our experiment. although not observed, other unexpected uses of the environments such as the majority of copulations occurring outside of food patches, and skewed patch use could have caused the disparity between our predictions and results. also, the addition of cap in the sct treatment was expected to aid males in further defending their resource patches. however due to the novelty of these environments, the behaviours these environments were meant to encourage may not have been used or had the opportunity to evolve. if the behaviours did evolve but at a point in the experiment where the allele frequencies for the mutations were low, genetic drift could have masked their effects. although our results do not show any consistent pattern of purging rate across treatment types between mutant types, inconsistent results are common to many purging experiments. many studies that analyze multiple mutations find that each mutation acts differently to experimental treatments not only in magnitude but also direction and thus mainly focus on the overall patterns among mutation types (sharp and agrawal ; maclellan et al. ; arbuthnott and rundle ; clark et al. ; maclellan et al. ; colpitts et al. ; singh et al. ). 
these differences are also reflected in our calculated selection coefficients, where higher selection coefficients lead to faster purging rates but the environmental treatment that has the highest selection coefficient changes depending on mutation type. differences between how these mutant individuals interact within their environment can likely explain these variances. for example, the mutant vestigial has a wing phenotype that influences both its movement and courtship signalling (pezzoli et al. ) putting it at a greater disadvantage compared to wildtype individuals in the same population, which is likely why it has the most drastic purging rate across environmental treatments among all the mutations analyzed in this study. further investigation into the behaviours of these mutant types may give an indication as to why these results differ between mutant types. while we wanted to explore how resource accessibility and environmental complexity influence populations through purging rates, we also wanted to evaluate how these compared to the purging rates of populations that lacked sexual selection and populations that had simple mating environments. as expected, the addition of sexual selection increased the purging rate for both mutations tested. however, there was no difference between the simple and relatively complex environments in purging rate for either mutation. this contradicts previous work of colpitts et al. ( ) where polygamous populations of mutant white d. melanogaster showed increased purging rates in complex environments. while the overall ideas between our experiments are similar, key differences in experimental design could explain these differences. firstly, due to the alignment of the experimental schedule, virgins from the vnc and vc treatments were able to mate more quickly than the virgins in the nt, uct, and sct treatments that were initially held separately before mating. 
this difference in waiting times to mate could have caused virgins from the nt, uct, and sct treatments to be more receptive to potential mates (pavković-lučić and kekić ). this could also explain why we see differences in the overall trends between the nt, uct, and sct environments compared to our initial experiment. secondly, our experiment had a much shorter mating period ( days versus ) and all eggs laid during this time period were kept to potentially contribute to the next generation for the nt, uct, and sct treatments, but not for the vnc and vc treatments. this could potentially lead to lower quality offspring from early matings with lower quality males being kept within the experiment, decreasing the purging rates within the complex mating treatments. overall our study adds to the recently growing body of literature considering "environmental complexity" while breaking down "complexity" further to accommodate for changes in mating strategy by environment. figure s : schematic for d-printed cap design. caps were created using filament material. 
sexual selection is ineffectual or inhibits the purging of deleterious mutations in drosophila melanogaster intra-sexual selection in drosophila fitting linear mixed-effects models using lme mating success in fruit flies: courtship interference versus female choice evolution of ungulate mating systems: integrating social and environmental factors effect of a refuge from persistent male courtship in the drosophila laboratory environment environmental effects on the evolution of mating systems in endotherms relative effectiveness of mating success and sperm competition at eliminating deleterious mutations in drosophila melanogaster the purging of deleterious mutations in simple and complex mating environments the descent of man and selection in relation to sex food distribution and a variable mating system in the dunnock, prunella modularis olfactory preference for egg laying on citrus substrates in drosophila reproductive contests and the evolution of extreme weaponry the evolution of animal weapons ecology, sexual selection, and the evolution of mating systems an {r} companion to applied regression. second alternative mating strategies in a desert grasshopper: evidence of density-dependence experimental reversal of courtship roles in an insect laboratory selection experiments using drosophila: what do they really tell us? fighting behaviour in territorial male roe deer capreolus capreolus: the effects of antler size and residence a laboratory study of male territoriality in the sibling species drosophila melanogaster and d. 
simulans territoriality of drosophila melanogaster as a conditional strategy chorusing behaviour, a density-dependent alternative mating strategy in male common toads (bufo bufo) sexual selection accelerates the elimination of a deleterious mutant in drosophila melanogaster populations with elevated mutation load do not benefit from the operation of sexual selection sex-specific effects of protein and carbohydrate intake on reproduction but not lifespan in drosophila melanogaster how males can gain by harming their mates: sexual conflict, seminal toxins, and the cost of mating lonely hearts or sex in the city? density-dependent effects in mating systems variation in the strength and softness of selection on deleterious mutations lifespan and reproduction in drosophila: new insights from nutritional geometry emmeans: estimated marginal means, aka least-squares means structural complexity of the environment affects the survival of alternative male reproductive tactics dietary stress does not strengthen selection against single deleterious mutations in drosophila melanogaster sexual selection against deleterious mutations via variable male search success the effects of male harm vary with female quality and environmental complexity in drosophila melanogaster spatial environmental complexity mediates sexual conflict and sexual selection in drosophila melanogaster phototactic and geotactic behaviour of countercurrent defective mutants of drosophila melanogaster the theory of games and the evolution of animal conflicts reducing mutation load through sexual selection on males sexual selection in complex environments influence of mating experience on mating latency and copulation duration in drosophila melanogaste females fitness components in a vestigial mutant strain of drosophila melanogaster overt and covert competition in a promiscuous mammal: the importance of weaponry and testes size to male reproductive success r: a language and environment for statistical 
computing. r foundation for statistical computing mating density and the strength of sexual selection against deleterious alleles in drosophila melanogaster environmental complexity and the purging of deleterious alleles courtship behaviour in drosophila parental investment and sexual selection. pp. - in sexual selection and the descent of man antler size provides an honest signal of male phenotypic quality in roe deer selection, epistasis, and parent-of-origin effects on deleterious mutations across environments in drosophila melanogaster ggplot : elegant graphics for data analysis the effect of pathogens on selection against deleterious mutations in drosophila melanogaster competition for mates and the improvement of nonsexual fitness the physical environment mediates male harm and its effect on selection in females we thank dr. tony frankino and christine sikes for their assistance with the design and development of d-printed caps used in this study. funding for this research was provided by the natural sciences and engineering research council (nserc) of canada and mcmaster university to id. key: cord- -mb qcd b authors: seymour, elif; Ünlü, nese lortlar; carter, eric p.; connor, john h.; Ünlü, m. selim title: configurable digital virus counter on robust universal dna chips date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mb qcd b here, we demonstrate real-time multiplexed virus detection by applying dna-directed antibody immobilization technique to a single-particle interferometric reflectance imaging sensor (sp-iris). in this technique, the biosensor chip surface spotted with different dna sequences is converted to a multiplexed antibody array by flowing antibody-dna conjugates and allowing specific dna-dna hybridization. 
the resulting antibody array is shown to detect three different recombinant vesicular stomatitis viruses (rvsvs) genetically engineered to express surface glycoproteins of ebola, marburg, and lassa viruses in real-time in a disposable microfluidic cartridge. we also show that this method can be modified to produce a single-step, homogeneous assay format by mixing the antibody-dna conjugates with the virus sample in solution phase prior to flowing in the microfluidic cartridge, eliminating the antibody immobilization step. this homogenous approach achieved detection of the model ebola virus, rvsv-ebov, at a concentration of pfu/ml in hour. finally, we demonstrate the feasibility of this homogeneous technique as a rapid test using a passive microfluidic cartridge. a concentration of pfu/ml was detectable under minutes for the rvsv-ebola virus. utilizing dna microarrays for antibody-based diagnostics is an alternative approach to antibody microarrays and offers advantages such as configurable sensor surface, long-term storage ability, and decreased antibody use. we believe these properties will make sp-iris a versatile and robust platform for point-of-care diagnostics applications. rapid and sensitive detection of viral infections is of significant importance for improving patient care and containing outbreaks that threaten public health. current techniques employed in clinical diagnosis of viral infections include polymerase chain reaction (pcr), enzyme-linked immunosorbent assay (elisa), isothermal nucleic acid amplification techniques (e.g. lamp and rpa), or virus isolation in cell culture. , these tests often require sending patient samples to a central laboratory with necessary equipment and trained personnel, and can take on the order of days, or weeks in the case of virus isolation from cell culture, hampering the fast containment of the virus and delaying the appropriate course of treatment. 
this situation is exacerbated when there is an excessive number of samples to be tested in an epidemic. epidemics are one of the challenging problems that have caused widespread deaths since the beginning of the known human history. starting from the post-classical era, humankind faced several epidemics such as the plague, viral hemorrhagic fever, cholera, smallpox, measles, poliomyelitis, and influenza. it is estimated that more than million people died from spanish flu (swine flu h n ) from to . the rapid spread of the novel coronavirus, sars-cov- , has reminded us that such pandemics are not historical anecdotes and exposed the major gaps in today's infectious disease diagnostics. in many countries, due to the huge demand in rt-pcr tests, clinical laboratories have faced shortages of trained personnel, test reagents and lab space. the system has been rapidly overwhelmed, causing longer wait times and delaying appropriate isolation procedures. to ease the burden on centralized laboratories and ramp up the testing capacity, rapid point-of-care (poc) tests in lateral assay format have been developed. although these tests can provide fast and simple detection, they lack sensitivity to detect low viral loads at the early stages of infection, when detection is essential for stopping community spread. there is a continuing need for alternative viral diagnostic techniques that can meet high sensitivity requirements in easy-to-use and portable poc platforms without the need for laboratory environment and trained personnel. an ideal poc platform should offer rapid (sample-to-answer in about min) and sensitive detection with minimal sample preparation. moreover, it should have multiplexing capability, which is especially important when it is necessary to distinguish between different viral pathogens that cause similar physical symptoms. 
it is also desirable that the diagnostic platform is easily configurable to new emerging viral pathogens with highly scalable and long-shelf-life consumables. in this paper, we describe a configurable and multiplexed digital virus counter capable of enumerating virions captured on a universal dna sensor chip and its application as a rapid testing platform utilizing dna-conjugated antibodies and disposable microfluidic cartridges. our group developed a label-free biosensor termed interferometric reflectance imaging sensor (iris) that has been used to quantify biomass accumulation on a microarray chip and was shown to detect virus particles captured onto an antibody-printed chip with a limit-of-detection (lod) of x pfu/ml. iris utilizes a silicon-silicon dioxide microarray chip and an imaging sensor, consisting of different wavelength leds for illumination and a ccd camera that records reflected light intensities. although this sensor provided a simple, inexpensive, and high-throughput platform for virus detection, its sensitivity was moderate because of its ensemble-based nature. the iris system was therefore modified to image single virus particles and termed single-particle iris (sp-iris). sp-iris has been shown to individually count and size the nanoparticles bound to capture probes on the sensor surface over a large sensor area. in this technique, high-affinity capture probes that can selectively bind to the target virus are immobilized on the surface. when virus particles bind to the surface, scattered light from the particles interferes with the reference beam reflecting from the layered substrate, producing an enhanced signal from the particle that is detected on the ccd camera. particles captured on the sensor surface appear as bright dots in the resulting image. 
this technique can also be used as a single-particle interferometric (spir) microscopy modality that provides shape and size information, allowing detailed morphological characterization of viruses. sp-iris has been utilized for detecting viruses in complex media using antibody microarrays by imaging chips both dry (dried after sample incubation and wash steps) and in liquid using disposable microfluidic cartridges. microfluidic integration of sp-iris provided an enclosed chamber for virus incubation, eliminated wash and drying steps, and further improved the sensitivity of virus detection. these recent developments rendered sp-iris highly sensitive, fast, and easy to use. (see table s - of the supporting information for a comparison of different modalities of iris in terms of key sensor properties.) as we improved sp-iris to develop it as a robust poc diagnostic platform, we focused on optimizing the virus capture efficiency of antibody microarray chips, one of the most important factors that affect assay sensitivity in solid-phase immunoassays. a major challenge with antibody-based solid-phase biosensors is the immobilization of capture probes on the sensor surface. the surface attachment chemistry can affect the biological activity of the antibody, its affinity, and the background noise, ultimately affecting the sensitivity of the biosensor. moreover, printing of antibodies on the microarray surface can introduce issues such as non-uniform surface coverage and assay-to-assay variability, affecting the assay accuracy and reproducibility. a dna-based site-specific antibody immobilization technique, known as dna-directed immobilization (ddi), has gained interest due to the reproducible production of dna microarrays, the compatibility of dna microarrays with the fabrication of integrated microfluidic systems, and the stability and robustness of dna chips. 
in this technique, a universal ssdna chip is first converted to an antibody microarray using antibodies tagged with short dna sequences complementary to the immobilized dna capture probes (figure ). the resulting ddi-antibody microarray can be used for detection of the target using a variety of labeled or label-free biosensing techniques. the ddi technique has been shown to improve the antigen binding capacity, the antibody surface coverage, and assay reproducibility compared to directly immobilized antibodies. in a previous study, we applied the ddi approach to sp-iris to demonstrate the label-free detection of whole viruses. we showed that ddi elevates the antibodies from the sensor surface and improves the virus capture efficiency, increasing the sensitivity of the sp-iris platform. here, we extend this approach to show multiplexed detection of three virus pseudotypes genetically engineered to express ebola, marburg, and lassa glycoproteins as a model for ebola, marburg, and lassa virus detection. utilizing antibody-dna conjugates to convert a dna chip into a multiplexed antibody array suggests an alternative approach for generating robust and repeatable diagnostic platforms by making use of the stability and highly reproducible nature of dna microarrays. we also demonstrate, for the first time, a homogeneous dna-directed virus capture assay where the antibody-dna conjugates and the virus sample are mixed in solution phase before incubation with the dna chip. we present the combined utility of this homogeneous assay with a passive flow cartridge as an example of the application of the sp-iris platform to a rapid test format that is suitable for poc testing. silicon chips with a patterned thermally grown silicon dioxide were purchased from silicon valley microelectronics inc. an oxide thickness of nm was used since the optimization studies for in-liquid visualization of the viruses showed that this thickness gave the highest level of particle contrast. 
custom-designed, disposable, active and passive microfluidic cartridges were purchased from aline. monoclonal antibodies (mabs) against ebola virus glycoprotein ( f ), marburg virus glycoprotein (agp - ), and lassa virus glycoprotein ( . f) were provided by mapp biopharmaceutical, prof. ayato takada (hokkaido university), and prof. james robinson (tulane university), respectively. recombinant vesicular stomatitis virus (rvsv) stocks expressing surface glycoproteins of ebola, marburg, and lassa viruses were created as described previously. hplc purified ′-aminated single-stranded dna (ssdna) molecules were purchased from integrated dna technologies. antibody-dna conjugation kit was purchased from innova biosciences. polymer kit for chip surface coating (mcp- ) was purchased from lucidant polymers. sensor chip functionalization. the silicon-silicon dioxide chips were cleaned by sonicating in acetone and then rinsing with methanol and nanopure water. chips were then dried under nitrogen. chips were coated with a -d polymeric coating, copoly(n,n-dimethylacrylamide (dma)acryloyloxysuccinimide (nas) - (trimethoxysilyl)propylmethacrylate (maps)) polymer, that offers a simple, inexpensive, and repeatable coating process and provides high density probe immobilization due to its -d structure. , copoly(dma-nas-maps) has nhs esters for covalent binding of proteins and amine-tagged dna molecules. for coating, the chips were first treated with oxygen plasma and then immersed in × mcp- polymer solution for min. chips were then rinsed extensively with nanopure water and dried with nitrogen. polymer coated chips were baked at °c for min and stored in a desiccator until microarray printing. printing of biomolecules on sensor chips. antibody and dna molecules were printed on the polymer-coated chips using a piezo-driven, non-contact dispensing system, sciflexarrayer s (scienion, germany). ′-aminated ssdna surface probes were spotted at µm in mm sodium phosphate buffer (ph = . 
), producing dna spots of ~ μm diameter. for the passive cartridge experiment and the stability test, mg/ml f antibody in pbs with mm trehalose was spotted on the chip along with ssdna (a´ sequence in table ). antibody spots were ~ μm in diameter. during spotting, humidity was kept at % in the spotter chamber and the spotted chips were kept in the chamber overnight at % humidity. the chips were then washed with mm ethanolamine in × tris-buffered saline ( mm nacl and mm tris-hcl, fisher scientific), ph = . , for min to quench the unreacted nhs groups in the polymer. this step was followed by a min wash with pbst (pbs with . % tween) and a rinse with pbs and nanopure water. the chips were finally dried with nitrogen. antibody-dna conjugation. antibody-dna conjugates were prepared using the thunder-link oligo conjugation kit (innova biosciences). each monoclonal antibody ( f , agp - , and . f, mg/ml) was reacted with a specific ′-aminated mer ssdna ( μm) according to the manufacturer's instructions. the dna concentration used in the conjugation was optimized to yield a dna to antibody ratio between - in the final conjugate. the ′-aminated mer dna sequences that are immobilized on the sensor chips are partially complementary to the antibody-conjugated dna strands. the length of the dna surface probes was optimized in a previous study that showed mer probes provide optimal elevation of the antibodies from the sensor surface. three antibody-linked dna sequences (a, b, and c) and corresponding surface-immobilized probe sequences (a′, b′, and c′) are given in table . [figure: components for virus detection assay, shown as sequential steps.] complementary regions between the antibody-linked sequences and the surface probes are underlined. the sequences were designed to eliminate the formation of hairpin and self-dimer structures and to prevent cross-hybridization between dna sequences. a ′ spacer sequence ( -bp polya) was added to the antibody-linked dna sequences to increase the hybridization efficiency. 
as given by the bradford assay for the protein part of the conjugate and the absorbance at nm for the dna part, dna-to-ab ratios were measured as . , . , and . , for f , agp - , and . f mab conjugates, respectively. antibody-dna conjugates are designated in the text by adding the letter representing the dna sequence to the antibody name as follows: anti-ebov-dna 'a', anti-marv-dna 'b', and anti-lasv-dna 'c'. optical biosensor setup and data analysis. in-liquid virus detection experiments were performed using sp-iris and spotted biosensor chips mounted into either a multi-layer laminate, disposable, active flow cartridge or a disposable passive flow cartridge that is composed of laminate layers and an absorbent pad (figure ). for the active flow cell, the flow was controlled with a syringe pump (harvard apparatus, phd ) and a flow rate of μl/min was used. the sp-iris setup is composed of a single wavelength led ( nm) for illumination of the substrate, a high-magnification objective ( ×, . na) to obtain a high spatial resolution image, and a ccd camera. an autofocus system (mfc- , applied scientific instrumentation) was used to control the focus during image acquisition. the recorded sp-iris images are analyzed for the bound virus particles for each spot. image analysis is performed using custom software that identifies the particle-associated intensity peaks in a given image and applies a gaussian filter to eliminate the noise from the background. the morphological features of the antibody spots that become prominent due to the high resolution of the optical system correlate poorly with a gaussian-type intensity profile, and therefore the background signal caused by these features is eliminated by adjusting the gaussian filter parameters. sp-iris uses a forward model to correlate the background-normalized intensities of the particles to the particle size, allowing size-based filtering of the images to increase the specificity of detection. 
to quantify the virus particles in a spot, the diffraction-limited particles in the appropriate size range are detected and counted. the signal is expressed as virus density (number of particles per mm ) for a given spot by dividing the number of the detected particles by the analyzed spot area. for the end-point experiments, the initial particle count is subtracted from the final particle count for each spot to obtain the net number of particles bound to the spot during the experiment. the ssdna surface probes (table ) that are partially complementary to the antibody-linked dna sequences were spotted on a polymer-coated sp-iris chip at a μm concentration. the chip was mounted in the active microfluidic cartridge via a pressure-sensitive adhesive (psa) and the assembled cartridge was placed on the sp-iris stage. first, a mixture of dna-conjugated anti-ebov, anti-marv, and anti-lasv mabs (at μg/ml in pbs with % bsa) was flowed through the channel for min at a rate of μl/min. after a μl wash step with pbs, recombinant vsv models of ebov, marv, and lasv were flowed sequentially over the sp-iris chip, each vsv pseudotype for min. μl pbs was flowed through the channel after each virus incubation to wash excess virus out of the channel and the tubing. the order of the virus incubation was rvsv-ebov, rvsv-marv, and rvsv-lasv, and their titers were , and pfu/ml, respectively, as determined by the plaque assay. the images of the anti-ebov, anti-marv, and anti-lasv spots generated by ddi were acquired every minute, and the virus densities on each spot were calculated over the course of the experiment. lod determination for one-step homogeneous detection of rvsv-ebov. the one-step homogeneous assay uses a dna chip and a solution-phase mixture of the virus sample and antibody-dna conjugates, eliminating the antibody-dna conjugate incubation step. 
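as an illustration of the data analysis step described above (virus density as particle count divided by analyzed spot area, and the net count for end-point experiments as final minus initial), the following is a minimal python sketch. it is not the custom software used in this work; the `SpotScan` structure, the function names, and the numbers in the usage example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpotScan:
    """One scan of one spot (hypothetical structure, not from the paper)."""
    particle_count: int  # particles detected within the accepted size range
    area_mm2: float      # analyzed spot area, in square millimetres


def virus_density(scan: SpotScan) -> float:
    """Virus density: detected particles divided by the analyzed spot area."""
    if scan.area_mm2 <= 0:
        raise ValueError("analyzed spot area must be positive")
    return scan.particle_count / scan.area_mm2


def net_virus_density(pre: SpotScan, post: SpotScan) -> float:
    """End-point signal: (final count - initial count) / analyzed area.

    Assumes both scans cover the same analyzed area of the same spot.
    """
    if pre.area_mm2 != post.area_mm2:
        raise ValueError("pre- and post-incubation scans must use the same area")
    net_count = post.particle_count - pre.particle_count
    return net_count / post.area_mm2


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from this study.
    pre_scan = SpotScan(particle_count=5, area_mm2=0.25)
    post_scan = SpotScan(particle_count=55, area_mm2=0.25)
    print("net virus density:", net_virus_density(pre_scan, post_scan))
```

subtracting the pre-incubation count per spot, rather than a chip-wide average, keeps particles that were already on a given spot before sample addition from being counted as captured virus.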
to determine the lod for the homogeneous, dna-directed rvsv-ebov assay using sp-iris, we performed a dilution experiment with fold dilutions of a pfu/ml rvsv-ebov stock, ranging from pfu/ml to pfu/ml. five sp-iris chips were spotted with replicates of a´ probe and washed as described previously. μl of anti-ebov-dna 'a' conjugate at μg/ml was mixed with . ml of each of the rvsv-ebov dilutions prepared in . × pbs with % bsa. a blank sample was also prepared by mixing the same amount of ab-dna conjugate with . ml . × pbs with % bsa. after waiting for min, μl of each virus dilution and of the blank sample were passed over different sp-iris chips in the active microfluidic cartridge in separate experiments. for each sample, the channel was first filled with . × pbs with % bsa and the spots were scanned to obtain the pre-incubation particle counts. then, the sample was flowed for h in the cartridge at a rate of μl/min. after the channel was washed with pbs, the spots were scanned again. the net number of virus particles captured on the a´ spots was counted and the average virus densities were calculated from replicate spots for each chip. combining homogeneous dna-directed assay with passive microfluidic cartridge. the passive microfluidic cartridge was designed to simplify the sp-iris platform by eliminating the need for an active syringe pump and to create a fully contained test platform that minimizes sample handling. briefly, the passive microfluidic cartridge consists of a sample reservoir with a vented luer cap and an integrated °c fan shape absorbent pad in the channel placed after the chip (figure ). the sample to be tested is pipetted into the reservoir and the flow is established by applying pressure through the closure of the reservoir cap. once the sample flows over the chip and touches the absorbent pad on the other side, the adhesive sealing tab on the cap is removed to let the fluid migrate under atmospheric pressure. 
a stable flow rate (~ μl/min) is established by the °c fan shape of the absorbent pad. to demonstrate the feasibility of using the homogeneous assay in combination with the passive flow cartridge and to compare its performance to the directly immobilized antibody assay, an sp-iris chip was printed with anti-ebov mab, a´ probe, and a negative dna sequence, and washed as described previously. . μl of μg/ml anti-ebov-dna 'a' conjugate was added to μl of pfu/ml rvsv-ebov sample in pbs with % bsa. after min incubation, μl of this mixture was placed in the sample reservoir of the passive microfluidic cartridge. the reservoir cap was closed and tightened until the liquid started touching the absorbent pad. the cartridge was immediately placed on the sp-iris stage and the images of the directly immobilized anti-ebov, a´ probe, and negative dna spots were recorded every min during a min incubation. following the image acquisition, the virus density was calculated for each of the three spot types at every time point to show the real-time binding of the viruses. accelerated stability testing of dried antibody-dna conjugates. µl of μg/ml anti-ebov-dna 'a' was aliquoted into five tubes and placed in a vacuum oven for min at °c for drying the conjugate solution. after drying, of the dried conjugate tubes were placed into the oven at °c for the accelerated stability test. one tube was used for the day measurement on the same day. sp-iris chips, spotted with replicates of anti-ebov antibody and a´ sequence, were also kept in the oven at °c. pfu/ml rvsv-ebov sample was prepared on day and aliquoted into five tubes. four of the virus samples were stored at - °c until the virus detection experiments. on each of the days , , , , and , one dried conjugate tube was reconstituted with µl pbs with % bsa and flowed over the sp-iris chip mounted on the active microfluidic cartridge for min for ddi. 
the channel was washed with pbs and the spots were imaged with sp-iris for the pre-incubation particle counts. then, pfu/ml rvsv-ebov sample was flowed over the chip for min and the spots were scanned again to obtain the postincubation particle counts. average virus densities on the ddi-antibody spots were calculated from replicate spots for each chip. to show the multiplexed detection of the rvsv models for ebola, marburg, and lassa viruses and specificity of the antibody-dna conjugates, we performed a sequential incubation with these viruses after functionalizing a dna spotted sp-iris chip with three antibody-dna conjugates. first, a mixture of dna conjugated anti-ebov, anti-marv, and anti-lasv mabs was flowed through the active microfluidic cartridge for minutes. this step loaded the antibodies onto the specific complementary ssdna spots on the sensor chip. following the antibody immobilization step, rvsv samples were flowed sequentially in the following order: rvsv-ebov, rvsv-marv, and rvsv-lasv. each virus sample was flowed for minutes followed by a μl pbs wash step. sp-iris image acquisition was done with min intervals. figure shows the virus densities (particle count / mm ) on the three ddi-antibody spots (anti-ebov-dna 'a', anti-marv-dna 'b', and anti-lasv-dna 'c') during the course of the experiment. following the rvsv-ebov sample addition (red band), the signal on the anti-ebov-dna 'a' spot starts to increase whereas the other two spots do not show any virus binding. after the rvsv-marv (green band) is introduced, the virus density on the anti-marv-dna 'b' spot starts to increase showing the specific detection of rvsv-marv. finally, when the rvsv-lasv sample is flowed in the channel (blue band), the virus density on the anti-lasv-dna 'c' spot increases whereas the signal on the other two spots remain constant. 
overall, these results show that the site-specific self-assembly of the three antibody-dna conjugates on a dna surface was performed successfully to generate a multiplexed antibody microarray, and each antibody-dna conjugate was able to detect its target virus specifically with no cross-reactivity from other viruses. such a programmable dna surface can serve as a universal chip that can be adapted for the detection of different target viruses by using different sets of antibody-dna conjugates as needed.
antibody-dna conjugates in solution-phase. one drawback of the conventional ddi-based detection assay is the time associated with the antibody-dna conjugate immobilization step. to reduce the assay time and make this approach compatible with passive flow platforms where it is not practical to have sequential flows, we explored a homogeneous tagging approach where the virus sample is mixed with the antibody-dna conjugates in solution prior to the incubation of the chip. this approach replaces the -min antibody immobilization step of the conventional ddi technique with a simple and fast mixing step, reducing the number of incubation and wash steps. although the homogeneous tagging of the target has been demonstrated previously for the detection of antigens, our work is, to the best of our knowledge, the first to show the capture of whole viruses decorated with dna-encoded antibodies on a ssdna microarray surface. one important consideration for this approach is the amount of antibody-dna conjugate to be added to the virus sample. an excess of antibody-dna conjugates in the solution would block the dna surface with free conjugates, preventing the binding of the virus particles that are already decorated with antibody-dna conjugates. we found that a concentration of . μg/ml antibody-dna conjugate does not saturate the surface, causing only . 
nm height increase, as measured by iris (data not shown), compared to about a nm height increase when the surface is fully saturated with the antibody-dna conjugates. (in iris, nm surface height corresponds to a surface antibody density of . ng/mm .) we mixed the anti-ebov-dna 'a' conjugate (at a final concentration of . μg/ml) with the rvsv-ebov samples prepared by -fold dilutions from a pfu/ml stock, ranging between - pfu/ml, and waited for min prior to the flow over the sp-iris chip. virus titer of the rvsv-ebov stock was measured by plaque assay. next, we flowed this mixture over the sp-iris chip in the active microfluidic cartridge for h and determined the captured virus density on the complementary a´ spots. figure shows the average virus densities on the a´ spots obtained from replicate spots for each titer tested. the detection threshold, indicated by solid red line, was calculated as the average virus density of six a´ spots plus three times the standard deviation from the blank chip that was incubated with only anti-ebov-dna 'a' conjugate. average virus density from the blank chip, which is virus count/mm , is also shown in the graph as dashed red line. in our earlier work, we have demonstrated that variance scales inversely with the sensor area (equivalently number of spots averaged). thus, for a large sensor area (or many spots), the lod will depend only on the mean virus density of the blank chip. therefore, the lod for a single spot detection is virus count/mm (solid red line) corresponding to pfu/ml, obtained by the extrapolation of the linear fit based on the data presented in figure . similarly, for a large area sensor where many spots can be averaged to virtually eliminate the variance, the lod is expected to approach virus count/mm , further improving the sensitivity. 
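the threshold and lod calculation described above can be sketched in a few lines. this is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the sample standard deviation is an assumption (the text does not say whether sample or population sd was used), and fitting the line in log10-log10 space is also an assumption, since the text only says "linear fit".

```python
import math
import statistics


def detection_threshold(blank_densities, k=3.0):
    """Mean blank-spot density plus k sample standard deviations (k=3 in the text)."""
    return statistics.mean(blank_densities) + k * statistics.stdev(blank_densities)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def limit_of_detection(titers, densities, threshold):
    """Titer at which the fitted line crosses the threshold.

    Assumption: the line is fitted to log10(density) versus log10(titer).
    """
    slope, intercept = fit_line(
        [math.log10(t) for t in titers],
        [math.log10(d) for d in densities],
    )
    return 10 ** ((math.log10(threshold) - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; they are not data from the study.
    blank = [10, 12, 14, 16, 18, 20]          # virus count/mm^2 on six blank spots
    titers = [1e2, 1e3, 1e4]                  # pfu/ml
    densities = [10.0, 100.0, 1000.0]         # virus count/mm^2
    thr = detection_threshold(blank)
    print(thr, limit_of_detection(titers, densities, thr))
```

for a large sensor area, where averaging many spots suppresses the variance term, calling `detection_threshold` with `k=0` gives the blank mean alone, which mirrors the limiting case discussed in the text.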
the lod for the homogeneous assay ( pfu/ml) is comparable to the one obtained from the heterogeneous ddi technique, pfu/ml, and therefore, our results indicate that the homogeneous approach provides a simpler assay procedure without affecting the sensitivity of the detection. the sp-iris platform provides a positive result once an adequate number of virions are counted, and therefore, test time can be significantly reduced for high titer samples. according to figure , the experimental virion count response scales with [virus concentration] ( / ), a perfect theoretical fit to the sheet density of the virus on the chip surface. in contrast, in an rt-pcr test, the cycle threshold (ct) values are inversely and logarithmically related to the viral rna copy number. ct values reported for sars-cov- in a recent study are . , . , . , and . , corresponding to . × , . × , . × , and . × copies/ml, respectively. while a thousand-fold increase in viral load provides a marginal ( %) reduction in ct values corresponding to a small time saving for rt-pcr (about min), the reduction in test time for sp-iris can be as high as -fold. based on our data presented in this and the next section, sp-iris is expected to have significantly reduced test times at median viral loads (a few minutes), allowing for high throughput testing. one other advantage of the homogeneous assay is the decreased antibody usage compared to the direct immobilization and the conventional ddi. the homogeneous assay uses at least three-fold less antibody per test, decreasing the cost of the assay. (see supporting information for the comparison of the antibody usage for three methods.) our lod determination experiment used a high number of antibody-dna conjugates per virus particle. this amount can be further decreased (at least -fold) while still leaving sufficient ab-dna molecules in the solution for efficient tagging of the virus particles (table s- of the supporting information).
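the rt-pcr comparison above rests on the logarithmic relation between ct and template copy number. as a rough sketch under an idealised assumption (each cycle multiplies the product by a fixed factor of 1 + efficiency, with efficiency = 1 meaning perfect doubling), the number of cycles saved by a given fold increase in viral load can be computed as follows. the function name and the default efficiency are illustrative and do not come from the study.

```python
import math


def ct_shift(fold_change, efficiency=1.0):
    """Cycles by which ct drops when the starting template rises by fold_change.

    Assumes ideal exponential amplification: product grows by (1 + efficiency)
    per cycle, so the shift is log(fold_change) / log(1 + efficiency).
    """
    if fold_change <= 0 or efficiency <= 0:
        raise ValueError("fold_change and efficiency must be positive")
    return math.log(fold_change) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    # A thousand-fold higher viral load saves only about ten cycles at perfect efficiency.
    print(round(ct_shift(1000), 2))
```

under this idealisation a thousand-fold change in load moves ct by roughly ten cycles, which is why the time saving for rt-pcr is small compared with a counting-based readout.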
moreover, the use of low antibody concentrations makes testing of the antibodies possible even when they are available at low quantities or concentrations. dilution experiment using single -step homogeneous assay approach. average virus densities are calculated from replicate a´ spots that are complementary to the anti-ebov-dna 'a' conjugate. solid red line shows the detection threshold calculated as the mean virus density from six spots plus three standard deviation of the mean from a blank chip. pfu/ml rvsv-ebov was detectable for a -hour incubation. lod, based on the extrapolation of the linear fit, is pfu/ml. passive microfluidic cartridge test using the homogeneous approach. active flow cartridge-based approach is not practical for field testing due to the complexity of the sample flow process that uses a syringe pump and tubing. this especially brings concerns in the case of lethal virus outbreaks due to the risk associated with the sample fluid handling. therefore, we have designed a passive microfluidic cartridge that would eliminate the need for an active pump and provide a fully contained test platform with minimum sample handling. we combined the homogeneous virus tagging approach with this lateral-flow cartridge to show the utility of sp-iris as a rapid testing platform. for this purpose, we added the anti-ebov-dna 'a' conjugate (final concentration of . μg/ml) to the rvsv-ebov sample ( pfu/ml), mixed, and waited for minutes. then, we applied this mixture to the reservoir and closed the cap. once the flow started, we put the cartridge onto the sp-iris stage and started scanning the dna spots (both complementary and negative dna sequences) and directly immobilized anti-ebov spots with min intervals. figure shows the captured virus density as a function of time for complementary dna spots (a´), negative dna spots and directly immobilized anti-ebov antibody. 
a positive signal can be observed on the complementary dna spots within the first minute of the incubation. moreover, the signal from the directly immobilized anti-ebov spots is lower than that of the dna spots, which is consistent with our previous findings. our results suggest that the combination of the homogeneous assay with the lateral-flow cartridge can provide a suitable platform for rapid and sensitive virus diagnostics.
to test the stability of the dried antibody-dna conjugates, we performed an accelerated stability test by storing the dried anti-ebov-dna 'a' conjugates and the spotted sp-iris chips at °c for weeks and performing in-liquid virus detection experiments using ddi on days , , , and . figure shows how the sp-iris signal changes over time for the dna-conjugated antibodies that were reconstituted for ddi on the test day. at the end of the two weeks, the antibody-dna conjugates captured approximately . times more viruses per mm than the directly immobilized antibody spots on the day chip (shown by the dashed red line), showing the superior performance of the ddi technique over direct spotting. the signal on the directly spotted antibodies showed a faster degradation rate over time and higher variability compared to the dried antibody-dna conjugates (data not shown). by extrapolating the line fit for the data points in figure , the stability of the antibody-dna conjugates is calculated as days at °c. using the q-rule
we applied the ddi technique to our virus counter, sp-iris, for generation of a multiplexed antibody array for the detection of ebola, marburg, and lassa viruses. we showed the specific self-assembly of the antibodies on a dna microarray surface and subsequent real-time detection of three different rvsvs in a disposable microfluidic cartridge. we also demonstrated the homogeneous tagging of the viruses with antibody-dna conjugates in solution phase. 
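the q-rule mentioned in the stability test is commonly stated as follows: the degradation rate changes by a constant factor (q10) for every 10 °c change in temperature, so a shelf life measured at an elevated temperature can be scaled to a lower storage temperature. the sketch below illustrates that general rule only. the function name, the default q10 of 2 and the example temperatures are assumptions for illustration, not values taken from the study.

```python
def extrapolate_shelf_life(days_at_test_temp, test_temp_c, storage_temp_c, q10=2.0):
    """Scale a shelf life from an accelerated-test temperature to a storage temperature.

    Q-rule: the degradation rate is multiplied by q10 for every 10 degrees C increase,
    so shelf life is multiplied by q10 ** ((test_temp - storage_temp) / 10).
    """
    if q10 <= 0:
        raise ValueError("q10 must be positive")
    acceleration = q10 ** ((test_temp_c - storage_temp_c) / 10.0)
    return days_at_test_temp * acceleration


if __name__ == "__main__":
    # Hypothetical example: 30 days of stability at 45 C, stored at 25 C, q10 = 2.
    print(extrapolate_shelf_life(30, 45, 25))
```

a q10 of 2 is a conventional conservative choice; a larger q10 predicts a longer shelf life at the lower temperature, so the assumed value should be reported alongside any extrapolated figure.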
by introducing this dna-encoded virus tagging approach, antibody immobilization step in conventional ddi can be eliminated, decreasing the assay time and complexity. in addition, homogenous dna-directed assay uses substantially less antibody (three-fold) compared to the conventional ddi and direct spotting. moreover, in-solution binding of antibodies eliminates the problems that affect capture efficiency such as antibody orientation, steric hindrance, and the activity loss due to antibody immobilization. we also demonstrated the combined utility of this homogenous method with a passive microfluidic cartridge. this platform allowed us to detect pfu/ml rvsv-ebov in less than min in a disposable, contained cartridge, showing the feasibility of sp-iris as a rapid and sensitive viral diagnostic platform. dna chips also offer the advantage of being configurable, allowing the use of the same multiplexed dna chips to create the desired virus detection panel. for example, clinicians would greatly benefit from a multiplexed test for sars-cov- and different influenza viruses to differentiate between these viruses that cause similar symptoms and that can happen concurrently. we envision that a multiplexed, dna microarray-based sp-iris system would provide a versatile, robust, and simple diagnostic platform through the use of universal dna chips and specific antibody-dna conjugates that can be synthesized quickly according to the need. comparing antibody usage for direct immobilization and dnadirected assays, molecular and immunological diagnostic tests of covid- : current status and challenges. 
iscience dengue and dengue hemorrhagic fever the lancet see the following epidemics to eradication: the modern history of poliomyelitis reviewing the history of pandemic influenza: understanding patterns of emergence and transmission fast, portable tests come online to curb coronavirus pandemic recent advances in immobilization methods of antibodies on solid supports technological development of antibody immobilization for optical immunoassays: progress and prospects antibody-based protein multiplex platforms: technical and operational challenges surface chemistry and morphology in single particle optical imaging sars-cov- viral load in upper respiratory specimens of infected patients assessing shelf life using real-time and accelerated stability tests authors would like to thank steven m. scherr for his help with microfluidic cartridge experiments and a. j. devaux for taking cartridge pictures. this work was supported by r ai to j.h.c. and m.s.u. key: cord- -mc zruyx authors: toksvang, linea natalie; schmidt, magnus strøh; arup, sofie; larsen, rikke hebo; frandsen, thomas leth; schmiegelow, kjeld; rank, cecilie utke title: hepatotoxicity during -thioguanine treatment in inflammatory bowel disease and childhood acute lymphoblastic leukaemia: a systematic review date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: mc zruyx background the recently established association between higher levels of dna-incorporated thioguanine nucleotides and lower relapse risk in childhood acute lymphoblastic leukaemia (all) calls for reassessment of prolonged -thioguanine ( tg) treatment, while avoiding the risk of hepatotoxicity. objectives to assess the incidence of hepatotoxicity in patients treated with tg, and to explore if a safe dose of continuous tg can be established. data sources databases, conference proceedings, and reference lists of included studies were systematically searched for tg and synonyms from – . 
methods we included studies of patients with all or inflammatory bowel disease (ibd) treated with tg, excluding studies with tg as part of an intensive chemotherapy regimen. we uploaded a protocol to prospero (registration number crd ). database and manual searches yielded unique records. of these, full-texts were screened for eligibility. finally, reports representing studies were included.
results and conclusions we included data from studies of all and ibd patients; four randomised controlled trials (rcts) including , patients, observational studies including patients, and case reports including patients. hepatotoxicity in the form of sinusoidal obstruction syndrome (sos) occurred in – % of the all patients in two of the four included rcts using tg doses of – mg/m /day, and long-term hepatotoxicity in the form of nodular regenerative hyperplasia (nrh) was reported in . %. in ibd patients treated with tg doses of approximately mg/m /day, nrh occurred in % of patients; sos has not been reported. at a tg dose of approximately mg/m /day, nrh was reported in % of ibd patients, which is similar to the background incidence. according to this review, doses at or below mg/m /day are rarely associated with notable hepatotoxicity and can probably be considered safe.
two review authors independently evaluated risk of bias of the included reports using the cochrane collaboration tool for rcts. [ ] the study quality assessment tools of the national heart, lung, and blood institute of the national institutes of health (nih) for quality assessment of observational cohort and cross-sectional studies and controlled intervention studies [ ] were used to assess the risk of bias of observational studies. a graphic illustration of potential bias of observational studies ( ), and were case reports ( ). 
no similar reviews were found in the database search or on prospero. the included rcts were generally well executed with low risk of bias, except in the areas of blinding participants and outcome assessment, which may lead to performance and detection bias. risk of bias assessments in the included observational studies did not result in any suspicion of specific biases. the confidence in the cumulative estimate of the incidence of hepatotoxicity was graded as moderate, and the use of diagnostic methods as moderate, primarily because of the inconsistencies in reporting within these subjects. the confidence in the cumulative estimate of dose reduction or truncation of tg was graded as high.
the findings of the present systematic review indicate that tg-induced hepatotoxicity in the form of sos or nrh is highly dose-dependent, and that it rarely occurs at daily doses of less than mg/m /day. furthermore, tg-induced hepatotoxicity appears to be largely reversible, except at high doses exceeding mg/m /day. the three rcts of childhood all, the coall- trial, the ccg- trial, and the uk mrc all / trial, have previously been included in a meta-analysis, which estimated the increase in sos between treatment arms to be a factor . (or, % ci . - . ). [ ] sos was highlighted as a dose-related toxicity with a multifactorial aetiology. no sos was reported in the coall- trial, although a high frequency of discordant thrombocytopenia was observed, which may reflect a minor degree of hepatotoxicity. in that trial, vincristine and corticosteroids were not co-administered with tg, as was the case in the other two rcts. [ ] this difference in co-medication has been proposed as an explanation for the differing incidence of sos between the three rcts. [ ] the theory that nrh is tg dose or ery-tgn level-dependent was first presented in . 
[ ] this is supported by a mouse model, in which sos arose from high peak concentrations of tg,
lnt created and pilot tested the data extraction form. sa, mss and lnt extracted the data, made assessments of risk of bias and confidence in cumulative evidence. lnt drafted the manuscript. the manuscript was read, revised, and approved by all authors.
setting, duration, inclusion and exclusion criteria, source of funding, conflict of interest, ethical approvals, key conclusions), patient characteristics (number of patients, age, sex, ethnicity, disease, comorbidity, concomitant therapy), details of the interventions (duration of tg, dose of tg, cumulative dose of tg, maximum dose of tg, route of administration, ery-tgn levels), comparator (comparator drug, duration of mp or other standard of care drug, dose of mp or other standard of care drug, cumulative dose of mp or other standard of care drug, maximum dose of mp or other drug
incidence of any hepatotoxicity reported as sos, veno-occlusive disease (vod), drug-induced liver injury or non-specified hepatotoxicity. due to the lack of standardised definitions of hepatotoxicity, authors of the included studies may not have used the above-mentioned terms. 
to assess additional hepatotoxicity we therefore report any pathological findings of liver biopsies and use the ponte di legno (pdl) toxicity working group consensus criteria for sos, which entail fulfilment of at least three out of the following five criteria: (i) hyperbilirubinaemia; (ii) hepatomegaly furthermore, we considered an increase in alanine transaminase, aspartate transaminase, alkaline phosphatase, conjugated bilirubin or total bilirubin of more than two times upper normal limit as evidence of hepatotoxicity secondary outcomes: diagnostic methods (number of patients who had a liver biopsy, indication for liver biopsy, other diagnostic methods, study conclusions about diagnostic methods); dose reduction or truncation of tg due to hepatotoxicity (how many patients had tg truncated or the dose reduced if protocols were unavailable, we compared outcomes reported in the methods and results sections. we did not quantitate the impact of meta ) consistency, and ( ) directness. the evidence on each outcome was graded as 'very low', 'low', 'moderate' or 'high'. higher in studies on tg for adult ibd and childhood all. tg-related hepatotoxicity persists in the form of nrh in % of the patients. however, the use of tg doses of approximately mg/m /day or less leads to hepatotoxicity in only % of the adult patients, corresponding to the incidence in the background population veno-occlusive disease of the liver after chemotherapy of acute leukemia. report of two cases toxicity and efficacy of -thioguanine versus -mercaptopurine in childhood lymphoblastic leukaemia: a randomised trial oral -mercaptopurine versus oral -thioguanine and veno-occlusive disease in children with standard-risk acute lymphoblastic leukemia: report of the children's oncology group ccg- clinical trial hepatic sinusoidal obstruction syndrome during maintenance therapy of childhood acute lymphoblastic leukemia is associated with continuous asparaginase therapy and mercaptopurine metabolites. 
pediatr blood cancer review article: the association between nodular regenerative hyperplasia, inflammatory bowel disease and thiopurine therapy long-term risk of portal hypertension and related complications in a novel mouse model of venoocclusive disease provides strategies to prevent thioguanine-induced hepatic toxicity nodular regenerative hyperplasia and thiopurines: the case for level-dependent toxicity meta-analysis of randomised trials comparing thiopurines in childhood acute lymphoblastic leukaemia novel therapy for childhood acute lymphoblastic leukemia relapsed childhood acute lymphoblastic leukemia in the nordic countries: prognostic factors, treatment and outcome the cytotoxicity of thioguanine vs mercaptopurine in acute lymphoblastic leukemia drug insight: pharmacology and toxicity of thiopurine therapy in patients with ibd dna-thioguanine nucleotide concentration and relapse-free survival during maintenance therapy of childhood acute lymphoblastic leukaemia (nopho all ): a prospective substudy of a phase trial cochrane handbook for systematic reviews of interventions version . . 
[updated preferred reporting items for systematic reviews and meta-analyses: the prisma statement the prisma statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration consensus definitions of severe acute toxic effects for childhood lymphoblastic leukaemia treatment: a delphi consensus thiopurine-induced liver injury in patients with inflammatory bowel disease: a systematic review the study quality assessment tools of the national heart, lung, and blood institute of the national institutes of health (nih) for quality assessment of observational cohort and cross-sectional studies and con grading quality of evidence and strength of recommendations thioguanine offers no advantage over mercaptopurine in maintenance treatment of childhood all: results of the randomized trial coall- long-term outcome in children with relapsed all by risk-stratified salvage therapy: results of trial acute lymphoblastic leukemia-relapse study of the berlin-frankfurt-munster group substitution of oral and intravenous thioguanine for mercaptopurine in a treatment regimen for children with standard risk acute lymphoblastic leukemia: a collaborative children's oncology group/national cancer institute pilot trial (ccg- ). 
pediatr blood cancer -thioguanine: a new old drug to procure remission in inflammatory bowel disease high variation of tioguanine absorption in patients with chronic active crohn's disease -thioguanine can cause serious liver injury in inflammatory bowel disease patients -thioguanine associated nodular regenerative hyperplasia in patients with inflammatory bowel disease may induce portal hypertension treatment of inflammatory bowel disease with -thioguanine ( -tg): retrospective case series from a tertiary care center efficacy of -thioguanine in patients with crohn's disease intolerant to azathioprine biotransformation of -thioguanine in inflammatory bowel disease patients: a comparison of oral and intravenous administration of -thioguanine lower" -thioguanine ( -tg) levels may be as effective as "higher" -tg levels in ibd patients treated with thioguanine -thioguanine in children with acute lymphoblastic leukaemia: influence of food on parent drug pharmacokinetics and -thioguanine nucleotide concentrations -thioguanine as an alternative therapy in inflammatory bowel disease-experience in a london district general hospital splitting a therapeutic dose of thioguanine may avoid liver toxicity and be an efficacious treatment for severe inflammatory bowel disease: a -center observational cohort study efficacy and safety of -thioguanine in the management of inflammatory bowel disease to evaluate the efficacy and safety of -thioguanine therapy in patients with inflammatory bowel disease-a london dgh experience a systematic survey evaluating -thioguanine-related hepatotoxicity in patients with inflammatory bowel disease toxicity and efficacy of intensive chemotherapy for children with acute lymphoblastic leukemia (all) after first bone marrow or extramedullary relapse the prevalence of nodular regenerative hyperplasia in inflammatory bowel disease patients treated with thioguanine is not associated with clinically significant liver disease thioguanine in inflammatory 
bowel disease: long-term efficacy and safety. united tpmt and mthfr genotype is not associated with altered risk of thioguanine-related sinusoidal obstruction syndrome in pediatric acute lymphoblastic leukemia: a report from the children's oncology group. pediatr blood cancer portal hypertension develops in a subset of children with standard risk acute lymphoblastic leukemia treated with oral -thioguanine during maintenance therapy. pediatr blood cancer psychosomatic complications during treatment for ulcerative colitis long-term follow-up of children with -thioguanine-related chronic hepatoxicity following treatment for acute lymphoblastic leukaemia variceal hemorrhage in a patient with ulcerative colitis treated with -thioguanine acute sinusoidal obstruction syndrome after -thioguanine therapy for crohn's disease thioguanine treatment-related sinusoidal obstruction syndrome in children mri patterns in a case of -thioguanine-related hepatic sinusoidal obstruction syndrome safe -thioguanine therapy of a tpmt deficient crohn's disease patient by using therapeutic drug monitoring veno-occlusive disease of the liver associated with thiopurines in a child with acute lymphoblastic leukemia pharmacokinetics of -thioguanine and -mercaptopurine combination maintenance therapy of childhood all: hypothesis and case report the case of colitis ulcerosa -diagnostic and therapeutic difficulties hepatotoxicity associated with -thioguanine therapy for crohn's disease liver venoocclusive disease (vod) in a patient given -thioguanine for crohn's disease early nodular hyperplasia of the liver occurring with inflammatory bowel diseases in association with thioguanine therapy safety of tioguanine during pregnancy in inflammatory bowel disease thioguanine-induced symptomatic thrombocytopenia on the limitation of -tioguaninenucleotide monitoring during tioguanine treatment thioguanine use in inflammatory bowel disease: year experience in a tertiary centre toxicity of -thioguanine: no 
hepatotoxicity in a series of ibd patients treated with longterm, low dose -thioguanine. some evidence for dose or metabolite level dependent effects? dig liver dis transient elastography to assess liver stiffness in patients with inflammatory bowel disease early hepatic nodular hyperplasia and submicroscopic fibrosis associated with -thioguanine therapy in inflammatory bowel disease histopathology of liver biopsies from a thiopurine-naïve inflammatory bowel disease cohort: prevalence of nodular regenerative hyperplasia micronodular transpormation (nodular regenerative hyperplasia) of the liver: a report of cases among , autopsies and a new classification of benign hepatocellular nodules -thioguanine treatment in inflammatory bowel disease: a critical appraisal by a european -tg working party a multicenter assessment of liver toxicity by mri and biopsy in ibd patients on -thioguanine -thioguanine-related chronic hepatotoxicity and variceal haemorrhage in children treated for acute lymphoblastic leukaemia -a dual-centre experience nodular regenerative hyperplasia: evolving concepts on underdiagnosed cause of portal hypertension nodular regenerative hyperplasia rarely leads to liver transplantation: a -year cohort study in all dutch liver transplant units. united eur gastroenterol j key: cord- -wo gg nx authors: mathew, nimitha r.; jayanthan, jayalal k.; smirnov, ilya; robinson, jonathan l.; axelsson, hannes; nakka, sravya s.; emmanouilidi, aikaterini; czarnewski, paulo; yewdell, william t.; lebrero-fernández, cristina; bernasconi, valentina; harandi, ali m.; lycke, nils; borcherding, nicholas; yewdell, jonathan w.; greiff, victor; bemark, mats; angeletti, davide title: single cell bcr and transcriptome analysis after respiratory virus infection reveals spatiotemporal dynamics of antigen-specific b cell responses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wo gg nx b cell responses are a critical component of anti-viral immunity. 
however, a comprehensive picture of antigen-specific b cell responses, differentiation, clonal proliferation and dynamics in different organs after infection is lacking. here, we combined single-cell rna sequencing with single-cell b cell receptor (bcr) characterization of antigen-specific cells in the draining lymph nodes, spleen and lungs after influenza infection. we identify several novel b cell subpopulations forming after infection and find organ-specific differences that persist over the course of the response. we discover important transcriptional differences between memory cells in lungs and lymphoid organs and describe organ-restricted clonal expansion. strikingly, by combining bcr mutational analysis, monoclonal antibody expression and affinity measurements we find no differences between germinal center (gc)-derived memory and plasma cells, at odds with an affinity-based selection model. by linking antigen recognition with transcriptional programming, clonal proliferation and differentiation, these findings provide important advances in our understanding of antiviral b cell immunity.
viral respiratory infections caused by influenza-, orthopneumo- or coronaviruses are major concerns worldwide. b cell-derived antibodies (abs) are a central feature of adaptive immunity to viruses. abs can greatly reduce viral pathogenicity in primary infections and can provide complete protection against disease-causing reinfections (lam and baumgarth, ). influenza a virus (iav) is a highly prevalent respiratory virus that causes significant morbidity and mortality in humans (iuliano et al., ). intranasal (i.n.) 
infection with iav initiates b cell responses in several organs characterized by a robust early extrafollicular plasmablast (pb) response, followed by persistent germinal center (gc) formation in the draining mediastinal lymph nodes (mln) and diffuse memory b cell (bmem) dispersion across several organs (angeletti et al., ; boyden et al., ; frank et al., ; joo et al., ; rothaeusler and baumgarth, ). respiratory virus infection can also promote circulating blood cells to generate inducible bronchus-associated lymphoid tissues (ibalts) in the lung parenchyma (moyron-quiroz et al., ), exemplified by the formation of gc-like structures in mouse lungs by days post infection (dpi) with iav (denton et al., ; tan et al., ). the iav surface glycoprotein hemagglutinin (ha) is the immunodominant target of b cell responses to iav infection and immunization (altman et al., ; angeletti and yewdell, ). pioneering studies performed over three decades ago examined the diversity of mouse b cell responses to iav by sequencing b cell receptors (bcr) from hybridomas generated from b cells recovered at various times p.i. with the a/pr/ (h n ) mouse-adapted strain (kavaler et al., ; kavaler et al., ). of note, with hybridomas it is impossible to discern the cell type from which the fusions originated. nevertheless, comprehensive studies assessing the link between transcriptional status and clonal diversity of b cell populations at different developmental stages within or between organs after respiratory viral infections are lacking. deciphering how bcr characteristics are linked to cell differentiation is crucial for our ability to understand and ultimately manipulate b cell responses with more effective vaccines or adjuvants. few studies have identified lung bmem as critical in preventing iav reinfection (allie et al., ; onodera et al., ). these tissue-resident bmem (allie et al., ) appear to have a broader specificity compared to splenic bmem (adachi et al., ). 
however, virtually nothing is known about their overall transcriptional programming, their bcr profile, and whether they originate from lung-ibalt vs. other lymphoid organs. better appreciation of the origin and formation of lung resident memory cells after infection is a crucial first step in developing mucosal vaccines against respiratory viruses. germinal centers (gc) form as a consequence of rapid clonal proliferation during t cell dependent b cell responses and are the site of b cell affinity maturation through selection of high affinity clones generated via somatic hypermutation (shm). gc entry, exit and dynamics have primarily been studied during responses to simple model antigens (victora and nussenzweig, ). a recent study suggested that following protein immunization, naive b cells with high avidity bcrs tend to immediately differentiate into pb or igm bmem, while b cells carrying lower avidity bcrs enter the gc (pape et al., ). activated b cells initially acquire a light zone (lz) phenotype, then cycle into the dark zone (dz) where they proliferate and acquire shm. after returning to lz, bcr avidity is assessed by interaction with antigen on follicular dendritic cells, and current models suggest various b cell fates are determined by signals from t follicular helper cells. such fates include re-entering the dz for additional rounds of shm, differentiating to pb or bmem or undergoing apoptosis. signals that regulate terminal b cell differentiation to pb and bmem have primarily been studied using model antigens and transgenic mice (krautler et al., ; phan et al., ; shinnakasu et al., ; smith et al., ; suan et al., a; suan et al., b; weisel et al., ). the general consensus is that higher avidity b cells will differentiate into pb, while b cells of lower avidity become bmem (suan et al., b). 
this was demonstrated in a study using the model hapten nitrophenol (np) as antigen, where lz gc b cells highly expressing bach were of lower avidity and destined for bmem differentiation (shinnakasu et al., ) . a subsequent study using the switched-hel (hen-egg lysozyme) transgenic mouse model, supported this and further indicated ccr as marker for lower avidity lz gc b cells becoming bmem (suan et al., a) . interestingly, the latter study reported a high avidity bmem subset as well. importantly, unlike most natural responses, both np and hel models require only a single mutation for the germline v region in the bcr to mature from low to high avidity. whether a similar selection of lower avidity gc cells into the memory compartment occurs after viral infection and how the selection differs between organs is unknown. this is particularly relevant to understand how the first encounter with a virus shapes bmem formation, a central feature of the original antigenic sin phenomenon in anti-iav responses (henry et al., ; yewdell and santos, ) . to address these questions, and overcome limitations of previous studies, we have sequenced single ha-specific b cells to correlate their transcriptome with their paired heavy and light chain bcr within different organs and across time points post intranasal iav infection. our data provide a comprehensive resource to trace b cell differentiation upon respiratory viral infection. intranasal (i.n.) mouse infection with iav is a well-established acute respiratory viral infection model. we infected mice i.n. with iav pr and tracked antigen-specific responses at , and days post infection (dpi) by sorting antigen experienced, ha+ igd-b cells from individual lungs, spleen and mediastinal lymph nodes (mln) (fig a-b) . as a control, we sorted total b cells (live, cd -, b +) from spleen and lungs of two mice (fig s a) . 
we subjected antigen-specific cells to single cell rna sequencing (scrnaseq) paired with single cell b cell receptor profiling (scbcrseq). in total, we analyzed results from , cells from naïve mice and , cells from infected mice ( , , , , and , cells, respectively, from , , and dpi) . unsupervised clustering, using the sauron implementation of seurat package, distinguished populations of ha-specific b cells that clustered according to their transcriptional profile ( fig c) . differential gene expression analysis allowed us to define specific cell populations ( fig d) : naïve and activated cells (clusters c , c and c , both igm+ and igd+), marginal zone (mz) b cells (cluster c ) and gc cells (clusters c , c , c - ). pb (cluster c ) and bmem (cluster c ) were confirmed by previously described gene signatures ( fig. e ) (bhattacharya et al., ) . interestingly, gc cluster c showed high expression of genes associated with bmem fate (see below for further discussion). likewise, we could readily divide gc clusters into lz and dz cells using sets of genes known to distinguish these cells (fig. f ) (victora et al., ) . despite having regressed out cell cycle influence, gc clusters still separated based on cell cycle phase, as expected ( fig s b) . we identified two lz clusters (c and c ) that expressed gene signatures typical for strong interactions with t follicular helper (tfh) cells and gsea confirmed signatures consistent with recently antigen-activated b cells ( fig g) (busconi et al., ) . indeed, cluster c cells were mostly in g (pregc) while c cells were entering cell cycle (earlygc). clusters c , c , c , and c had similar signatures, indicative of dz gc b cells, with only differences in cell cycle status, with c in g m, c and c in s phase and c between g m and g , indicating cells exiting the cell cycle. cluster c had a strong lz signature, as did c , which was split between g m and g and c , in s phase (fig s a-b ). 
to incorporate bcr sequence data into the overall analysis we compiled the sequence data, isotype and somatic mutations for the heavy chain sequences using the immcantation pipeline (gupta et al., ; vander heiden et al., ) and screpertoire to define clonal status and expansion (borcherding et al., ) . consistent with the recent demonstration of pre-gc class switching (roco et al., ) , cluster pregc-c exhibited over % class switched ig (fig. h) . therefore, we identified pregc-c cells as those receiving positive t cell help signals and that were actively recruited into the earlygc-c . supporting this hypothesis, the fraction of b cells in these two clusters was almost double at d compared to d and . the continued presence of these clusters weeks after infection is consistent with continued replenishment of the gc reaction (fig. i ). bmem cells (cluster c ) were ~ / igm+/class switched, while all gc clusters´ cells were dominated by igg b/c. iga isotype was highly enriched both in bmem as well as in pb in all organs, suggesting a potential preferential recruitment of these cells into the effector arm ( fig. h and fig s c) . whereas most clusters were found in all organs, surprisingly, ha+ b cell clusters were largely organ-and not dpi-specific ( fig. i-j) . while the fraction of naïve and early activated cells was somewhat higher at dpi in all organs, the most evident difference was the distribution between cell types in the three organs studied. the mln is characterized by strong gc activity with a smaller proportion of bmem ( - %) and a considerable number of pb (from % at day to % at d ) (fig. i) . conversely, we could detect very low ha+ gc b cells in the lungs (~ %) but a remarkably high number of bmem that rose from % at day to ~ % of ha+ cells at d and . pb were constant between - %. ha+ b cells in the spleen exhibited strong gc activity and relatively constant proportion of bmem ( - %) and pb ( . - %). 
interestingly, pb proportion was the highest in mln at day but lowest in other organs. this, linked to mutation analysis (fig ) showing nearly germline bcrs in pb, indicates an early and selective expansion of pb in the mln. coordinately, we detected a burst of bmem in mln at d . overall, these data demonstrate early expansion and differentiation of ha-specific cells to pb and bmem in the mln, and persistence of bmem in lungs. the dynamics of gc at single cell resolution after viral infection have not been previously elucidated and, further, the identity of bmem precursors in the gc is controversial (laidlaw et al., ; shinnakasu et al., ; suan et al., a). in order to address these issues and decipher the pattern of bmem differentiation at the single cell level, we performed trajectory analysis using slingshot (street et al., ) and rna velocity analysis with scvelo (bergen et al., ). rna velocity analysis suggests that pregc-c cells differentiate to earlygc-c and subsequently enter gc (fig a), in accordance with gsea (fig. g). further, velocity analysis suggests that premem-c could potentially have some backflow to bmem (c ). for trajectory analysis with slingshot we removed cluster pb-c , as the cluster was clearly disconnected from all others. the trajectory analysis, with preset start at pregc-c , showed a major trajectory going from lz to dz and to lz again, with cells exiting from premem-c and differentiating to bmem-c (fig. b). changes of gene expression along slingshot pseudotime showed a marked switch in transcriptional programming at premem-c (fig c). the premem-c cluster is unique in that cells highly express several gc marker genes (mki , aicda, bcl ) as well as foxo , bach , cd and have high mitochondrial content (fig d). all these genes and features have been implicated in bmem development (chappell et al., ; jang et al., ; shinnakasu et al., ). 
altogether, this cell population closely resembles the bmem-precursor cell population previously identified by shinnakasu et al. (shinnakasu et al., ). to further confirm that premem-c was the bmem precursor cluster, we ran a series of gsea analyses. we defined high and low affinity scores based on the average expression, at the single cell level, of genes corresponding to high and low affinity cells in shinnakasu et al. (shinnakasu et al., ). gsea analysis identified premem-c as a lz gc b cell population enriched in the low affinity score signature as well as more similar to bmem than pb (fig e). when assigning low and high affinity scores we confirmed premem-c as the highest among gc clusters for this signature. likewise, the bmem population was enriched for the low affinity signature. conversely, most of the other gc clusters were enriched for the high affinity signature (fig f). thus, cluster premem-c likely represents a bmem precursor population in the gc, characterized by high mki , bcl , cd , bach , foxo expression and high mitochondrial content. bmem in the lungs have a distinct transcriptional profile, independent of isotype. having found a different proportion of bmem in different organs (fig i-j) we investigated whether their transcriptional programming varied with anatomic location. unsupervised analysis of the bmem population (c ) only, revealed subclusters, with the main determinant of separation being lung vs. non-lung localization (fig a-d). clusters , and were almost exclusively made up of lung bmem (fig c). these cells were of different isotypes and were both mutated and unmutated, thus probably of gc-dependent and gc-independent origins (fig d). while the number and proportion of iga bmem in the lung was higher compared to spleen (fig s a), b cell heavy chain class was only a partial determinant in the cell segregation. igm+ bmem were the principal determinant of bmem-cluster (fig b), but were present in other clusters. 
rather, several genes strongly contributed to differential clustering between lung and spleen/mln (fig e). interestingly, among the most differentially upregulated genes in the lungs was cd , an adhesion molecule linked to tissue residency of immune cells, together with cd , ahr, ccr , cxcr and cd . conversely, spleen and mln had significantly higher sell expression (encoding cd l), together with cd , cr , bcl and cd (fig e). gsea analysis on cd tissue resident memory signatures revealed striking similarity of lung bmem to cd tissue-resident memory (trm), as opposed to spleen and mln bmem (fig f) (mackay et al., ). validating the observed differences, pb and gc transcriptional profiles appeared largely similar between organs (fig s d-k). the only detected difference was between iga pb and others, as previously reported (fig s e) (neu et al., ; price et al., ). thanks to the power of our approach, we detected transcriptional differences in bmem and pb between germline (gc-independent) and mutated (gc-dependent) cells, suggesting long term functional differences, depending on cell origin (fig g and fig s b and fig s l). germline (and probably gc-independent) bmem expressed higher levels of btg , foxp , plac , and other genes that control cell proliferation and differentiation. on the other hand, highly mutated bmem (probably gc-dependent) expressed higher levels of jchain (iga specific), slpi, txndc , cmah and others associated with programming towards pb differentiation. to validate the transcriptional differences detected in bmem, we infected mice and analyzed lung and spleen b cells at dpi by flow cytometry. this revealed upregulation of cd and cd in lung bmem consistent with scrnaseq data. cd and cxcr , even if upregulated at the mrna level, were not detected on the bmem surface, as expected given their role in gc. splenic bmem had higher cd l, cr and cd expression consistent with scrnaseq data (fig h). 
collectively, these data demonstrate a distinct transcriptional profile of bmem in the lung with hallmarks of activation and tissue residency. the low number of ha+ gc b cells in the lung suggests that the majority of lung bmem are either gc independent or derived from gc in other organs and acquire a new transcriptional signature once they take residence in the lungs. we assigned v gene usage using the immcantation pipeline according to imgt standard (giudicelli et al., ). as expected, the unselected naïve repertoire was composed of a large number of vh genes while we could start observing selection already at day after infection (fig a). at dpi, vh - expressing b cells (cb site specific) dominated the pb and gc response on d (fig s a), as previously reported (angeletti et al., ; kavaler et al., ; rothaeusler and baumgarth, ), producing germline or near germline abs (fig s b). extending previous findings, we found an increased proportion of these b cells also within the bmem compartment in unmutated form, indicating that these b cells, with high avidity germline bcr, not only dominate the extrafollicular pb response but also differentiate into igm and switched bmem (fig s c). at day in all individual mice, vh gene usage became far more polarized, indicating vigorous clonal selection. critically, by d one vh family dominated in each mouse, ranging from % to % of the response (fig a). when comparing cell clusters at different time points, our data show that some vh genes undergo selection as early as dpi in gcs, and dominant clones also appear in pb at dpi (fig s d). finally, we detect some skewed vh usage in the bmem population at dpi , consistent with prolonged gc selection (fig s d). selection of vh, d and jh genes was mainly cell type rather than organ specific (fig s e-f). v gene usage was private to individual mice, except for three genes that were consistently overrepresented in all mice (vh - , vh - and vh - ) (fig s a). 
we then analyzed the vh gene usage overlap between different mice, organs and cell types, as defined by umap clusters, using a pearson's correlation matrix (fig b and fig s b) (greiff et al., a). this analysis gives information on overall vh gene usage, selection and clonal expansion. this revealed that ) the v gene repertoire is most clustered by the high similarity between lung and spleen b cells, ) numerous vh genes can be used to generate ha binding abs, ) most of them are selected into gc and bmem compartments, ) selection into the pb compartment is limited to much fewer clonotypes, which are unusually excluded from gc and bmem compartments. analyzing clonal overlap based on both heavy and light chain (fig s c) revealed that clustering is dominated by individual mice, extending many prior findings (greiff et al., a; greiff et al., ; greiff et al., b; miho et al., ) that even in mice with nearly identical genetic backgrounds most b cells generate repertoires that emerge stochastically. the cdr pearson's correlation between individual mice, organs and cell types indicates that clonal overlap is mostly organ specific with notable exceptions. in several mice, the bmem populations in lungs overlapped with gc and bmem populations in mln, rather than with other b cell populations in the lungs themselves. pb in mln were strongly correlated with mln gc in most mice but the same was not always true for pb in spleen and lung (fig c and fig s d). we further investigated clone sharing between pb, bmem and gc in different organs. we only considered mutated, likely gc-dependent clones. by day we could assign almost all pb in mln to have gc origin by clonal relationship (fig d). this was not true for most of the lung and spleen, due at least in part to the low degree of clonal expansion and limited sampling. however, it should be noted that, at least for the spleen and mln, overall diversity for each cluster was similar (fig s ). 
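the vh gene usage comparison by pearson's correlation described above can be illustrated with a minimal sketch. it is not the pipeline used in this study: the group labels (mouse_organ_cluster) and gene names are hypothetical placeholders, and usage is simply computed as the fraction of cells in each group that carry a given vh gene.

```python
from collections import Counter
from math import sqrt


def usage_vector(vh_calls, genes):
    """Fraction of cells using each VH gene, in the order given by `genes`."""
    counts = Counter(vh_calls)
    total = sum(counts[g] for g in genes)
    return [counts[g] / total if total else 0.0 for g in genes]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def correlation_matrix(groups):
    """groups: dict mapping a group label to the list of VH calls of its cells."""
    genes = sorted({g for calls in groups.values() for g in calls})
    names = list(groups)
    vectors = {name: usage_vector(groups[name], genes) for name in names}
    matrix = [[pearson(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical toy input: labels and gene names are placeholders.
groups = {
    "m1_mln_gc": ["VH_A", "VH_A", "VH_B"],
    "m1_mln_pb": ["VH_A", "VH_A", "VH_B"],
    "m1_lung_bmem": ["VH_B", "VH_B", "VH_A"],
}
names, matrix = correlation_matrix(groups)
```

groups with similar vh usage give coefficients close to 1, and groups dominated by different genes give low or negative values; hierarchical clustering of such a matrix would then group mice, organs and cell types by repertoire similarity.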
for bmem in spleen we identified clonal relatives for only ~ % of the cells. this could be due to smaller clonal families that make sampling limiting. we could, however, track as high as % of lung and mln bmem at day (fig h). surprisingly, we observed sharing of gc-derived bmem, with spleen gc being a source of bmem in mln and lungs and mln-derived bmem present in the lungs and spleen. these data are consistent with a high degree of dissemination of gc-derived bmem between organs. next, we examined clonal expansion. we found that up to % of mln antigen specific b cells belonged to expanded clonotypes. by contrast, at most % of b cells in the lung/spleen were expanded (fig a). consistently, the top expanded clones represent more than % of the repertoire in mln, while just - % in lungs and spleen (fig s a). as expected, we detected no sign of clonal expansion in naïve mice. clonal expansion by cluster and dpi shows lower expansion in splenic gc b cells compared to mln b cells (fig b). clonal expansion of the various gc clusters was uniform. bmem are the most bcr diverse compartment, and had higher numbers of unique clonotypes (fig s ). conversely, pb started with few highly expanded clonotypes at day , became more diverse by day and narrowed again over the next two weeks. we generated alluvial graphs to assess the extent to which highly expanded clonotypes are shared between clusters or preferentially expanded in certain clusters over time (fig c). at day only a few clonotypes in the gc clusters were shared among multiple clusters, regardless of clonal expansion state. by day about - % of the highly expanded clones were present in all gc clusters and most of the highly expanded clones populate bmem and pb subsets. clonal sharing between gc and bmem and pb was independent of clonal size. at day , more than % of bcr sequences were shared between gc clusters, in particular among highly expanded clones. 
consistent with clonal expansion state, about % of pb derived from highly expanded clones. as in previous dpi, bmem originated not only from highly expanded gc families but also from smaller clones (fig c). when looking at individual mice, m _ and m _ at d stood out in that most of the clonal sharing was only between splenic gc and not mln gc, while m _ at d had more than % clonally expanded clones shared between clusters (fig s b). we visualized clonal expansion by arbitrarily dividing clones into five categories: single ( cell), small (between to cells), medium (between to cells), large (between to cells) and hyperexpanded (over cells) clones and rendered them on the umap plot (fig. d). the umap clearly shows that the majority of expanded cell clones are in the gc clusters, but also in pb and bmem clusters. as expected, cells in naïve and mz clusters mostly belonged to single clones. interestingly, cells in the pregc-c cluster, which we propose to represent cells just entering the gc and undergoing tfh interactions, were mostly unique, consistent with our hypothesis. surprisingly, splitting the umap according to organ and day (fig. e-f and fig s c) reveals that gc clonal expansion is organ dependent. in mln, medium expanded clones are already present at d p.i. and large and hyperexpanded clones by d . conversely, splenic gc had few medium sized clones at day and maintained an essentially unmutated expansion profile at day . likewise, gc clones in the lungs were mostly single or small with occasional medium sized clones appearing. these findings are quite surprising, as they clearly show organ-dependent regulation of gc clonal expansion. notably, the number of analyzed gc cells in mln and spleen at day is nearly identical (n= vs ). nevertheless, to verify that sampling differences did not affect our day observation, we randomly downsampled mln d to the same number of gc cells as spleen and reassigned the expansion status of clonotypes (fig s d). 
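as an illustration of the binning and downsampling control just described, the following is a minimal sketch rather than the code used in the study. the size boundaries in SIZE_BINS are hypothetical placeholders (the actual cut-offs are not recoverable from this text), and clone identifiers are arbitrary strings.

```python
import random
from collections import Counter

# Hypothetical inclusive upper bounds on clone size for each category.
# These are placeholders, not the thresholds used in the study.
SIZE_BINS = [(1, "single"), (5, "small"), (20, "medium"), (100, "large")]


def expansion_category(clone_size):
    """Return the expansion category for a clone with `clone_size` cells."""
    for upper, label in SIZE_BINS:
        if clone_size <= upper:
            return label
    return "hyperexpanded"


def categorize_cells(clone_ids):
    """Assign every cell the expansion category of the clone it belongs to."""
    sizes = Counter(clone_ids)
    return [expansion_category(sizes[c]) for c in clone_ids]


def downsample_and_recategorize(clone_ids, n_cells, seed=0):
    """Randomly keep `n_cells` cells (without replacement), then recompute
    clone sizes and expansion categories on the reduced set."""
    rng = random.Random(seed)
    sampled = rng.sample(clone_ids, n_cells)
    return sampled, categorize_cells(sampled)


# Toy example: one clone id per cell.
cells = ["cloneA"] * 30 + ["cloneB"] * 3 + ["cloneC"]
categories = categorize_cells(cells)
sampled, sub_categories = downsample_and_recategorize(cells, 10, seed=1)
```

because clone sizes are recomputed after sampling, a clone that was large in the full data can fall into a smaller category in the subsample, which is why the comparison between organs is made only after both have been reduced to the same number of cells.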
we still detected the majority of gc b cells to be either large or hyperexpanded, in stark contrast with spleen gc, validating our conclusion. focusing on highly expanded clones (more than cells), we could define four patterns: clonotypes present in gc only, gc and pb, gc and bmem and gc, bmem and pb. remarkably, the proportion of highly expanded clones found only in the gc increased from % to about % from day to . conversely, the fraction of gc clones shared only with pb decreased, while the fraction of bmem derived from gc stayed constant by dpi (fig g) . together with overall clonal expansion status, this observation indicates that a constant number of gc derived bmem are generated as the immune response progresses while the pb diversity decreases as pb clones expand. finally, we investigated how bcr somatic mutation status relates to cell type and clonal expansion of ha-specific b cells. in line with what would be expected, naïve and marginal zone cells had almost no mutations, while gc cells were overall the most mutated followed by pb and bmem (fig a) . separating by dpi and differentiation clusters we found that d mice had very few mutations, regardless of cell type (even in pb and bmem), indicating that pb and bmem initially derive from expansion of unmutated cells. the mutation frequency increased at later dpi with similar trends for all cluster and organs (fig b) . when considering the different heavy chain classes (fig c and fig s a) , we did not detect major differences between clusters and organs with two exceptions: igm cells were generally less mutated than class switched cells and iga cells tended to have a higher mutation rate, starting from d . b cell mutation frequencies between mice were comparable ( fig s b) . to facilitate mutation analysis, we divided the cells in four discrete bins: germline (not mutated), low (up to % nucleotide mutation), medium (up to %) and high (over % mutation) (fig d) . 
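the binning of cells by mutation load can be sketched as follows. this is an illustrative example under stated assumptions, not the immcantation implementation: the two cut-offs LOW_MAX and MEDIUM_MAX are hypothetical placeholders (the percentages used in the study are not recoverable from this text), and sequences are assumed to be already aligned to their germline at equal length.

```python
# Hypothetical cut-offs (fractions of mutated nucleotides); placeholders only.
LOW_MAX = 0.01
MEDIUM_MAX = 0.03


def mutation_frequency(observed, germline):
    """Fraction of comparable aligned positions that differ from germline.

    Positions where either sequence has a gap ('-', '.') or an ambiguous
    base ('N') are skipped.
    """
    if len(observed) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for o, g in zip(observed.upper(), germline.upper()):
        if o in "N-." or g in "N-.":
            continue
        compared += 1
        if o != g:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def mutation_bin(freq):
    """Map a mutation frequency to germline / low / medium / high."""
    if freq == 0:
        return "germline"
    if freq <= LOW_MAX:
        return "low"
    if freq <= MEDIUM_MAX:
        return "medium"
    return "high"


# Toy example: one mismatch in four comparable positions.
freq = mutation_frequency("ACGA", "ACGT")
label = mutation_bin(freq)
```

discrete bins of this kind make it straightforward to tabulate, per cluster, organ and dpi, the share of cells that remain germline versus those that have accumulated mutations.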
by day most gc cells carried bcr with low to medium mutations while at day they had medium/high mutation rate. comparing the mutation data with clonal expansion data (fig f) highlights the different dynamics between mln and other organs. approximately % of bmem remained germline at day and % at day (fig e). while we cannot determine the timing of their production, the increased proportion of highly mutated bmem suggested recent origin. more than half of the unmutated bmem were of igm isotype but we also detected igg and iga (fig s c). unexpectedly, we found that when excluding nonmutated, likely gc-independent cells, overall mutation rates of pb and bmem bcr were statistically indistinguishable, with the exception of bmem in the lungs and spleen at d having lower mutation rate than pb in the same organ (fig f and fig s c). similarly, premem-c , identified by trajectory analysis to be bmem precursors (fig ), had indistinguishable mutation rate and clonal expansion profile compared to all other clusters. in addition, we found that % of pb had bcr sequence identical to a bmem. further, mutation distribution did not correlate with clonal size, with families with only members already showing members with high mutation rate (fig s d). while an imperfect proxy, higher shm usually reflects increased ab binding avidity (gitlin et al., ; kocks and rajewsky, ; neu and wilson, ). indeed, high and low avidity signatures correlated with mutation rate (fig s e). as gc-derived bmem are thought to originate from lower affinity cells as compared to pb (shinnakasu et al., ; suan et al., a), we would expect gc-derived pb to have higher mutation rate as compared to bmem but this was not the case in our infection model (fig f). the mutation data suggest only a minor contribution of affinity-based selection for gc b cells becoming bmem vs pb. 
to test this hypothesis, we generated mabs from mutated bmem and pb that are members of large clonal families. we generated clonal trees for five families (one from m and two from m at dpi and one each from m and m at dpi ) that contributed differing proportions of cells to gc, bmem and pb populations (fig a). the branching point for differentiation into pb vs. bmem appeared to be random. in more complex trees, some branches gave rise to both pb and bmem. we expressed representative switched mabs ( from bmem and from pb) and calculated their affinity to ha by biolayer interferometry (bli) (fig b and fig s f). of note, of the selected mabs were identical between pb and bmem. bli affinity measurement showed no pattern of differential affinity between bmem and pb. indeed, the major determinant of affinity was the clonal family and not the cell type or number of mutations, similar to what was recently reported for bmem recall after immunization. surprisingly, mabs from the highly expanded clonotype , with more than sequenced cells (> % of all cells of m _ ) exhibited low to extremely low avidity for ha by bli. in an extreme example of diverse avidity within a single clonotype, in clone , mab and mab , which differ by amino acids (with in the cdr ), exhibit a nearly million-fold difference in kd ( . mm vs. . nm). we confirmed the bli measurements by testing the mab by elisa on ha, ha from virus and pr virus treated at ph to expose hidden epitopes (fig s g-i). interestingly, ph treatment affected mostly members of one clonal family ( ) by decreasing their apparent kd, while only one mab had increased kd upon ph treatment (mab ). while the results did not fully recapitulate the bli measurements, they confirmed that there was no difference in apparent avidity between bmem and pb within the same family, that the clonal family was the main determinant of avidity differences and that all mab bound virus. 
overall, these data indicate that following viral infection: ) a sizeable number of high avidity gc-dependent bmem are generated as early as two weeks p.i. ) gc-derived bmem and pb possess bcr with similar number of mutations and avidity for antigen. ) clonal selection is not strictly based on the measured avidity of soluble abs encoded by the clones. b cell responses are the cornerstone of preventing viral infections. better understanding of how antigen specific b cell immunity develops after a respiratory viral infection is crucial for designing effective vaccines for influenza viruses, parainfluenza viruses and sars-cov- . how antigen-specific b cells differentiate prior to gc and from the gc to pb and bmem remains elusive. here, by combining antigen-sorting, scrna-seq and scbcr-seq we have generated a detailed map of differentiation stages of antigen specific b cells in response to respiratory viral infection. by analyzing events in lungs, draining ln, and spleen, our data elucidate the complex mechanisms involved in b cell responses to infection. most seminal discoveries in b cell biology have been made in mice immunized with haptens or simple, monovalent protein antigens (e.g. np, ova, hel, cgg). such models do not, however, fully recapitulate the complexity of infectious agents, each of which expresses dozens to hundreds of antigens, and also idiosyncratically activates innate immunity, which sculpts the adaptive response. indeed, using a more complex protein immunogen for immunization (iav-ha) can challenge established principles of b cell differentiation (kuraoka et al., ). our findings clearly demonstrate the critical importance of using viral infections to study b cell responses. following iav lung infection, we used gene signatures to assign identities to scrna seq clusters, and to differentiate lz from dz in the gc. we observed that the proportion of bmem, pb and gc ha-specific cells did not vary with time p.i. but was specific for each organ. 
this was also true for all specific gc clusters. not surprisingly, pregc-c and earlygc-c , which include unmutated cells interacting with tfh and entering the gc reaction, were overrepresented on day . within pregc-c and earlygc-c we could detect progressive class switching as cells were approaching gc, consistent with recent findings of pre-gc class switching (roco et al., ). the regulation of b cell differentiation was organ specific. in particular, lungs harbor a large number of ha-specific bmem, much more abundant than expected from the number of ibalt ha-specific gc b cells. this, together with our demonstration that lung bmem exhibit a different transcriptional profile, leads us to conclude that bmem generated in other organs emigrate to lungs, as previously speculated by allie et al. ( ). here, we show that gc-derived bmem in the lungs can be generated in spleen gc up to day and mln gc at d and subsequently traffic to the lungs where they become tissue resident. ha stem-specific bmem have been described to derive from gc of lungs of iav infected mice (adachi et al., ). we did not assess stem vs. head specificity, but it is possible that ibalt derived bmem harbor more cross-reactive b cells, particularly since their clonotypes do not overlap with mln-gc. pulmonary bmem also differed from spleen and mln in their transcriptional profiles, with upregulated cd , cd , ahr and downregulated cd l (sell), cr , cd , among others. this confirms recent observations of organ-specific diversity of bmem in bone marrow vs spleen (riedel et al., ). it also highlights the complexity of the bmem compartment, reflecting a need for specialization and rapid response of bmem, depending on organ. it is interesting to ponder the potential function(s) of early arriving bmem cells in the lung. they may serve a back-up function to pb to ensure clearance of the initial infection. 
they may also reflect early deployment against the chance of reinfection, particularly if niches for tissue resident bmem are specifically created during the remodeling of the lung as the tissue returns to a pre-infection state. it is also possible that their function is antibody independent, e.g. participating in cytokine based tissue remodeling or immune cell regulation. we observed a time dependent increase in bcr mutations in all b cell populations in all organs. surprisingly, we did not detect a similar rate of clonal expansion in gc from mln and spleen, despite similar diversity and even after subsampling to equalize cell numbers. clonal bursts (tas et al., ) may be more common in mln because the total number of gc is lower or because of increased/persistent antigen levels. alternatively, clonal expansion could be similar, but splenic gc may experience increased apoptosis. whatever the explanation, the net result is the presence of a few hyperexpanded clonal families in mln gcs and many small clonal families in spleen gc. notably, previous studies using np and hel immunization model systems suggested a switch in the output of pb and bmem, with early bmem being unswitched, followed by swig bmem (between weeks - ) and then at day the generation of pb (weisel and shlomchik, ). except for early igm bmem, the response to iav infection differs, featuring a constant output of pb and bmem from gc, as judged by mutation rate. the strong correlation between clonal size and, importantly, mutational pattern suggests that bmem are outputted constantly from gc. similarly, the origin of bmem from gc has been hotly debated. both inductive and stochastic models for bmem differentiation have been proposed. recently it has been proposed that bmem precursors in the gc are selected into the memory compartment because of lower affinity as compared to pb (shinnakasu et al., ; suan et al., a). this notion is based on np and hel immunization using bcr transgenic mice. 
in both cases just one amino acid substitution is needed to dramatically improve bcr avidity (w l for np and y d for hel). to relate this to anti-viral b cell responses we first examined the prototypical c idiotype in the anti-ha response, first described by gerhard and colleagues, which is specific for the cb antigenic site of ha (kavaler et al., ) . these cells carry a bcr with high germline affinity for ha, rapidly differentiate into extrafollicular plasma cells and do not participate in secondary responses to flu, and were therefore assumed not to form memory (rothaeusler and baumgarth, ) . our analysis shows that these high-affinity cells are also capable of forming bmem through both gc-dependent and gc-independent routes. based on pseudotime analysis and comparison with previously identified gene signatures we identified premem-c as bmem-precursor cells. importantly, we did not find any significant differences in the bcr expressed by cells in this cluster, in terms of mutational load, class switching, or clonality, as compared to other gc clusters. dpi gc-derived bmem were at least as mutated as pb. while somatic hypermutation is not a perfect proxy for affinity, it suggests that these cells underwent a similar number of selection cycles in the gc. further, we detected a high proportion of cells expressing identical bcr in both the bmem and pb compartments, which is again not compatible with affinity-based selection. this was further confirmed when we expressed mabs from selected bmem and pb and found their affinities to be comparable. further, our results indicate that the major determinant of affinity differences is the clonal family. while a large number of the bmem that we detect are unmutated igm, overall, the data presented here do not support the hypothesis of avidity-based selection for gc-derived bmem in the case of infection with a respiratory virus.
it is possible that affinity selection is prevalent in immunization settings, where antigen amount is limiting, but not when antigen is in excess, such as in the case of live replicating virus. in addition, soluble mab expression might not fully recapitulate the complex gc environment, where avidity is also determined by multivalency and bcr density on b cells (lingwood et al., ; slifka and amanna, ; tolar and pierce, ) . further, the ha used for avidity measurements might not be in the same form as that presented in gc on fdc. nevertheless, bcr from gc-derived bmem and pb and even premem-c were indistinguishable by all the measures presented here. our data suggest that stochasticity (blink et al., ; good-jacobson and shlomchik, ; pelissier et al., ; smith et al., ) might be one of the major determinants of bmem differentiation from gc. indeed, we identify clonal family as the main determinant of b cell affinity, a finding that would have been impossible to make using transgenic monoclonal mice. importantly, we found the diversity of bmem to be much larger than that of pb, with the latter mainly selected from, and composed of, expanded clones. in summary, our study provides a comprehensive resource, linking bcr characteristics with transcriptional regulation of antigen-specific b cell activation and differentiation in different organs upon respiratory virus infection. we would like to thank the staff at the experimental biomedicine (ebm) core facility at the university of gothenburg for animal management; e. rekabdar and a. almstedt, genomics core facility at sahlgrenska hospital, for running the sequencing and data pre-processing; m. bäckstrom and r. lymer, mammalian protein expression (mpe) core facility at the university of gothenburg, for recombinant ha production and purification; d. anastasakis, niams, nih, for help retrieving data sets for gsea; a. svitorka-härtlova for assistance with figure . the authors have no conflicts of interest to disclose.
c bl/ mice were purchased from taconic biosciences, denmark and were housed in the animal facility of the experimental biomedicine unit at the university of gothenburg. eight- to twelve-week-old female mice were used in the experiments. all the experiments were conducted according to the protocols (ethical permit number: / ) approved by the regional animal ethics committee in gothenburg. mice were anesthetized with isoflurane and infected through nasal inoculation with tcid influenza a/puerto rico/ / (pr ) (molecular clone; h n ) diluted in hbbs containing . % bsa. c bl/ mice were infected with pr h n virus and were euthanized on different days postinfection. lungs, spleen and mediastinal lymph nodes (mln) were isolated. the same organs from naïve mice were used as controls. spleen and mln were mashed and passed through a µm filter to obtain a single-cell suspension. lungs were perfused and processed into a single-cell suspension using the mouse lung dissociation kit (miltenyi biotec) according to the manufacturer's instructions. splenocytes and lung cells were enriched for total b cells using the easysep mouse pan-b cell isolation kit (stemcell technologies), while whole mln cells were used for downstream processing. the cells were incubated for one hour at °c with a cocktail of fluorochrome-labelled antibodies consisting of anti-cd -bv (cat. no: , bd biosciences), anti-b -apc-cy (cat. no: , bd biosciences) and anti-igd-pacific blue (cat. no: , bd biosciences), and µg/ml biotinylated recombinant hemagglutinin (rha) (whittle et al., ) conjugated to streptavidin apc (cat. no: s , invitrogen). to exclude dead cells, the cells were washed and stained with live/dead™ fixable aqua dead cell stain (cat. no: l , invitrogen) according to the manufacturer's instructions. a maximum of , live ha-specific mature b cells (cd -b + igd -rha + ) were sorted and collected in a bd facsaria fusion or bd facsaria iii (bd biosciences) cell sorter and processed immediately.
all the fluorochrome-labelled antibodies used in flow cytometry were titrated to determine the optimal concentration. briefly, spleens and lungs were harvested from c bl/ mice on day post-pr h n infection after euthanization. spleens were processed into a single-cell suspension by mashing them and passing them through a µm filter. lungs were processed into a single-cell suspension using the mouse lung dissociation kit (miltenyi biotec). the following fluorochrome-conjugated antibodies were used for labelling the cells: anti-cd -bv (cat. no: , bd biosciences), anti-b -apc-cy (cat. (cat. no: , biolegend) . the cells were stained with fluorochrome-labelled antibodies for min at °c. after washing, the cells were stained with live/dead™ fixable aqua dead cell stain (cat. no: l , invitrogen) to exclude dead cells. the labelled cells were run and the data were acquired on the bd lsr fortessa x- (bd biosciences) and analyzed using flowjo software (tree star). nearly - , sorted ha-specific mature b cells from individual organs were processed into single cells in a chromium controller ( x genomics). during this process, individual cells are embedded in gel beads-in-emulsion (gems) where all generated cdna share a common x oligonucleotide barcode. after amplification of the cdna, a ´gene expression library and an enriched b cell library with paired heavy and light chains were generated from cdna of the same cell using the chromium single cell vdj reagent kit (v . chemistry, x genomics). the ´gene expression libraries were sequenced on a nextseq or novaseq sequencer (illumina) using the nextseq / v . sequencing reagent kit (read length: x bp) or the novaseq s sequencing reagent kit (read length: x bp) (illumina), respectively. the enriched b cell libraries were sequenced on a nextseq or miseq sequencer using the nextseq mid output v . sequencing reagent kit (read length: x bp) or the miseq reagent kit v (read length: x bp) (illumina), respectively.
lungs from mice m _ , m _ and m _ and spleen from m _ failed to yield good quality gems and libraries and were not sequenced. single-cell rna-seq data was processed in r with sauron (https://github.com/nbisweden/sauron), which primarily utilizes the seurat (v . . ) package (stuart et al., ) . this workflow comprises a generalized set of tools and commands to analyze single cell data in a more reproducible and standardized manner, either locally or in a computer cluster. the complete workflow and associated scripts are available on https://github.com/angelettilab/scmousebcellflu. a set of instructions on how to use the workflow and completely reproduce the results shown herein are available there. raw umi count matrices generated from the cellranger x pipeline were loaded and merged into a single seurat object. cells were discarded if they met any one of the following criteria: percentage of mitochondrial counts > %; percentage of ribosomal (rps or rpl) counts > %; number of unique features or total counts was in the bottom or top . % of all cells; number of unique features < ; gini or simpson diversity index < . . furthermore, mitochondrial genes, non-protein-coding genes, and genes expressed in fewer than cells were discarded, whereas the immunoglobulin genes ighd, ighm, ighg , ighg c, ighg b, ighg , igha, and ighe were retained in the dataset regardless of their properties. gene counts were normalized to the same total counts per cell ( ) and natural log transformed (after the addition of a pseudocount of ). the normalized counts in each cell were mean-centered and scaled by their standard deviation, and the following variables were regressed out: number of features, percentage of mitochondrial counts, and the difference between the g m and s phase scores. 
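as a minimal sketch of the normalization and scaling steps described above (this is not the seurat code used in the study), the python functions below scale one cell's counts to a common total, apply a natural-log transform after adding a pseudocount, and then mean-center and scale one gene across cells. the function names and the default values of `target_total` and `pseudocount` are illustrative assumptions; the values actually used are not legible in this text.

```python
import math


def log_normalize(counts, target_total=1000.0, pseudocount=1.0):
    """Scale one cell's gene counts to a common total, then take the natural log.

    target_total and pseudocount are placeholder values for illustration only.
    """
    total = sum(counts)
    if total == 0:
        # An empty cell maps every gene to log(pseudocount).
        return [math.log(pseudocount) for _ in counts]
    return [math.log(c / total * target_total + pseudocount) for c in counts]


def scale_gene(values):
    """Mean-center one gene across cells and divide by its sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    sd = math.sqrt(variance)
    if sd == 0:
        # A gene with no variation carries no information after centering.
        return [0.0] * n
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    cell = [1, 1, 2]
    print(log_normalize(cell, target_total=4.0))
    print(scale_gene([1.0, 2.0, 3.0]))
```

regressing out covariates such as the number of features or cell-cycle score differences is a separate step that this sketch does not cover.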
data integration across cells originating from different samples, time points and tissues was done on regressed scaled counts using the mutual nearest neighbors (mnn) method (haghverdi et al., ) on a set of highly variable genes (hvgs) identified within each sample individually and combined. the top nearest neighbors (k) with a final dimensionality of were used. uniform manifold approximation and projection (umap) (mcinnes et al., ) was applied to the mnn-integrated data to further reduce dimensionality for visualization ( dimensions) or for unsupervised clustering ( dimensions). at this stage, differential expression between clusters and cell correlation with cell-type-specific gene lists were evaluated to identify clusters of non-b-cells (such as nk or t cells). predicted non-b-cells were removed from the data, and the entire single-cell rna-seq processing pipeline was re-run using only the remaining b cells. finally, hierarchical clustering was performed on the dimensional umap embedding to define clusters of b cell subtypes, which were then visualized on the -dimensional umap embedding. trajectory inference analysis was performed on a diffusion map embedding ( diffusion components; dcs) of the mnn-integrated count data using the destiny package (angerer et al., ) . the cell differentiation lineages were then predicted from the dcs using the slingshot package (street et al., ) . cluster (cells entering the gc) was specified as the starting point and cluster as the end point (bmem). distance along the resulting curve was used to define the position of each cell in pseudotime. differentially expressed genes were identified by fitting a generalized additive model (gam) to the trajectory curve using the tradeseq package (van den berge et al., ), allowing us to detect which genes exhibited expression behavior that was most strongly associated with progression along the defined lineage.
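the idea of defining pseudotime as distance along the lineage curve can be illustrated with a simplified python sketch. slingshot fits smooth principal curves in the reduced space; here the curve is assumed to be a plain two-dimensional polyline, and the function name `pseudotime_on_polyline` is hypothetical. each cell is projected onto its closest point on the curve, and the arc length from the start of the curve to that point is returned.

```python
import math


def pseudotime_on_polyline(point, curve):
    """Return the arc length along `curve` at the point closest to `point`.

    `curve` is a list of (x, y) vertices ordered from lineage start to end.
    This is a simplification of principal-curve projection, for illustration.
    """
    best_dist = float("inf")
    best_arc = 0.0
    arc_start = 0.0  # arc length accumulated before the current segment
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            continue  # skip degenerate segments
        seg_len = math.sqrt(seg_len_sq)
        # Fractional position of the orthogonal projection, clamped to the segment.
        t = ((point[0] - x0) * dx + (point[1] - y0) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        px, py = x0 + t * dx, y0 + t * dy
        dist = math.hypot(point[0] - px, point[1] - py)
        if dist < best_dist:
            best_dist = dist
            best_arc = arc_start + t * seg_len
        arc_start += seg_len
    return best_arc


if __name__ == "__main__":
    lineage = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    for cell in [(0.5, 0.2), (1.3, 0.5), (-1.0, 0.0)]:
        print(cell, pseudotime_on_polyline(cell, lineage))
```

cells that project before the first vertex receive pseudotime zero, which mirrors the choice of a fixed starting cluster.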
single-cell rna-seq bam files were processed using the velocyto command line tool (la manno et al., ) to quantify the amount of unspliced and spliced rna reads of each gene in each sample. the scvelo package (bergen et al., ) was used to perform the rna velocity analysis. the first- and second-order moments for velocity estimation were calculated using the mnn-integrated data as the representation, and the cell velocities were computed using the likelihood-based dynamical model. a velocity graph was calculated based on cosine similarities between cells, and cell velocities were visualized as streamlines overlaid on the -dimensional umap embedding. the bcr sequence data was processed using the immcantation toolbox (v . . ) using the igblast and imgt germline sequence databases, with default parameter values unless otherwise noted. the igblast database was used to assign v(d)j gene annotations to the bcr fasta files for each sample using the change-o package (gupta et al., ) , resulting in a matrix containing sequence alignment information for each sample for both light and heavy chain sequences. bcr sequence database files associated with the same individual (mouse) were combined and processed to infer the genotype using the tigger package as well as to correct allele calls based on the inferred genotype. the shazam package (gupta et al., ) was used to evaluate sequence similarities based on their hamming distance and estimate the distance threshold separating clonally related from unrelated sequences. the predicted thresholds ranged from . to . ; a default value of . was assumed in cases where the automatic threshold detection failed. ig sequences were assigned to clones using change-o, where the distance threshold was set to the corresponding value predicted with shazam in the previous step. germline sequences were generated for each mouse using the genotyped sequences (fasta files) obtained using tigger.
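the threshold-based clonal assignment described above can be sketched in python as follows. this is a simplified illustration and not the change-o or shazam implementation: it assumes that sequences are compared only when they have equal length, uses a length-normalized hamming distance, and groups sequences by single linkage. the function names and the threshold used in the example are hypothetical.

```python
def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(junctions, threshold):
    """Single-linkage grouping: link two sequences if their distance <= threshold.

    Returns one integer clone label per input sequence, numbered in order
    of first appearance.
    """
    n = len(junctions)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if len(junctions[i]) != len(junctions[j]):
                continue  # different lengths are never grouped in this sketch
            if normalized_hamming(junctions[i], junctions[j]) <= threshold:
                parent[find(j)] = find(i)

    labels = {}
    result = []
    for i in range(n):
        root = find(i)
        labels.setdefault(root, len(labels))
        result.append(labels[root])
    return result


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "AAAA"]
    print(assign_clones(seqs, threshold=0.15))
```

in practice sequences would first be partitioned by v gene, j gene and junction length before any distance is computed; that partitioning is omitted here for brevity.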
bcr mutation frequencies were then estimated using shazam. the bcr sequence data, clone assignments, and estimated mutation frequencies were integrated with the single-cell rna-seq data by aligning and merging the data with the metadata slot in the processed rna-seq seurat object. screpertoire (borcherding et al., ) was used to determine clonal groups based on paired heavy and light chains. this package uses the filtered contig annotation obtained from cell ranger. clones were assigned only for cells in which high-quality paired heavy and light chains were sequenced. clones were assigned for each mouse based on ctstrict. ctstrict considers two sequences clonally related when they have identical v gene usage and > % normalized hamming distance of the nucleotide sequence. the percentage of unique clonotypes was obtained using the quantcontig function. integration with the seurat object was done using the combineexpression function. ranking of clones was determined using the clonalproportion function, and shannon and simpson´s diversity were determined using the clonaldiversity function. all functions were run with the exporttable = t argument to obtain a table, and the graphs were then faceted as needed in r using the ggplot package. sharing of clones between clusters was visualized using the ggalluvial package. differentially expressed genes between different clusters, organs, isotypes or differentially mutated cells were identified using the findallmarkers function from seurat using default settings (wilcoxon test and bonferroni p value correction). significant genes with average log fold change > . and expressed in > % of cells in that group were ranked according to fold change and represented in the featureplot.
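to make the diversity measures mentioned above concrete, the following python sketch computes shannon and simpson indices from a list of per-cell clone labels. it is an illustration only and does not reproduce the screpertoire output: the function name is hypothetical, and the simpson index is assumed here to be the gini-simpson form (one minus the sum of squared clone proportions), which is one of several conventions in use.

```python
import math
from collections import Counter


def clone_diversity(clone_ids):
    """Return (shannon, gini_simpson) for a list of per-cell clone labels."""
    counts = Counter(clone_ids)
    n_cells = sum(counts.values())
    if n_cells == 0:
        raise ValueError("at least one cell is required")
    proportions = [c / n_cells for c in counts.values()]
    # Shannon entropy in nats: higher when cells are spread over many clones.
    shannon = 0.0 - sum(p * math.log(p) for p in proportions)
    # Gini-Simpson: probability that two cells drawn with replacement differ in clone.
    gini_simpson = 1.0 - sum(p * p for p in proportions)
    return shannon, gini_simpson


if __name__ == "__main__":
    expanded = ["c1"] * 8 + ["c2", "c3"]        # one dominant clonal family
    diverse = ["c%d" % i for i in range(10)]    # every cell in its own clone
    print(clone_diversity(expanded))
    print(clone_diversity(diverse))
```

a population dominated by a few expanded families scores low on both indices, whereas a population of many small families scores high, which is the contrast drawn in the text between expanded and diverse compartments.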
for gsea, differentially expressed genes for each cluster or organ were calculated using the wilcoxon rank sum test via the wilcoxauc function of the presto package using default parameters (including benjamini-hochberg false discovery rate correction) and filtered on logfc > and padj < . . gsea was run on pre-ranked genes using the fgsea package (korotkevich et al., ) . for each enrichment graph we report p, padj (fdr q) and nes (enrichment score normalized to mean enrichment of random samples of the same size) values in the figure. five clonal families were randomly selected among the hyperexpanded clones, as defined by screpertoire. clonal trees were reconstructed using the alakazam package of immcantation (gupta et al., ) . in brief, clones were made with the function makechangeoclones and lineages were reconstructed using the dnapars function of the phylip package via the buildphyliplineage function. clonal trees were visualized via the igraph package in r. random distinct germinal center selection at local sites shapes memory b cell response to viral escape the establishment of resident memory b cells in the lung requires local antigen encounter lamprey vlrb response to influenza virus supports universal rules of immunogenicity and antigenicity defining b cell immunodominance to viruses is it possible to develop a "universal" influenza virus vaccine? outflanking antibody immunodominance on the road to universal influenza vaccination destiny: diffusion maps for large-scale single-cell data in r generalizing rna velocity to transient cell states through dynamical modeling.
biorxiv transcriptional profiling of antigen-dependent murine b cell differentiation and memory formation early appearance of germinal center-derived memory b cells and plasma cells in blood after primary immunization screpertoire: an r-based toolkit for singlecell immune receptor analysis pulmonary infection with influenza a virus induces site-specific germinal center and t follicular helper cell responses functional outcome of b cell activation by chromatin immune complex engagement of the b cell receptor and tlr many variable region genes are utilized in the antibody response of balb/c mice to the influenza virus a/pr/ / hemagglutinin cd is required for formation of memory b cell precursors within germinal centers type i interferon induces cxcl to support ectopic germinal center formation a simple flow-cytometric method measuring b cell surface immunoglobulin avidity enables characterization of affinity maturation to influenza a virus automated analysis of highthroughput b-cell sequencing data reveals a high frequency of novel immunoglobulin v gene segment alleles clonal selection in the germinal centre by regulated proliferation and hypermutation imgt/v-quest: imgt standardized analysis of the immunoglobulin (ig) and t cell receptor (tr) nucleotide sequences plasticity and heterogeneity in the generation of memory b cells and long-lived plasma cells: the influence of germinal center interactions and dynamics systems analysis reveals high genetic and antigen-driven predetermination of antibody repertoires throughout b cell development bioinformatic and statistical analysis of adaptive immune repertoires learning the high-dimensional immunogenomic features that predict public and private antibody repertoires change-o: a toolkit for analyzing large-scale b cell immunoglobulin repertoire sequencing data batch effects in single-cell rnasequencing data are corrected by matching mutual nearest neighbors from original antigenic sin to the universal influenza virus 
vaccine estimates of global seasonal influenza-associated respiratory mortality: a modelling study mitochondrial function provides instructive signals for activationinduced b-cell fates broad dispersion and lung localization of virusspecific memory b cells induced by influenza pneumonia a b cell population that dominates the primary response to influenza virus hemagglutinin does not participate in the memory response a set of closely related antibodies dominates the primary antibody response to the antigenic site cb of the a/pr/ / influenza virus hemagglutinin stepwise intraclonal maturation of antibody affinity through somatic hypermutation fast gene set enrichment analysis. biorxiv neuraminidase inhibition contributes to influenza a virus neutralization by anti-hemagglutinin stem antibodies differentiation of germinal center b cells into plasma cells is initiated by high-affinity antigen and completed by tfh cells complex antigens drive permissive clonal selection in germinal centers rna velocity of single cells the ephrelated tyrosine kinase ligand ephrin-b marks germinal center and memory precursor b cells the multifaceted b cell response to influenza virus structural and genetic basis for development of broadly neutralizing influenza antibodies the developmental pathway for cd (+)cd + tissue-resident memory t cells of skin umap: uniform manifold approximation and projection for dimension reduction restricted clonality and limited germinal center reentry characterize memory b cell reactivation by boosting spec-seq unveils transcriptional subpopulations of antibodysecreting cells following influenza vaccination taking the broad view on b cell affinity maturation memory b cells in the lung participate in protective humoral immune responses to pulmonary influenza virus reinfection naive b cells with high-avidity germline-encoded antigen receptors produce persistent igm(+) and transient igg(+) memory b cells high affinity germinal center b cells are actively 
selected into the plasma cell compartment igm, igg, and iga influenza-specific plasma cells express divergent transcriptomes discrete populations of isotype-switched memory b lymphocytes are maintained in murine spleen and bone marrow b-cell fate decisions following influenza virus infection regulated selection of germinal-center cells into the memory b cell compartment role of multivalency and antigenic threshold in generating protective antibody responses the extent of affinity maturation differs between the memory and antibody-forming cell compartments in the primary immune response bcl- transgene expression inhibits apoptosis in the germinal center and reveals differences in the selection of memory b cells and bone marrow antibody-forming cells slingshot: cell lineage and pseudotime inference for single-cell transcriptomics comprehensive integration of single-cell data ccr defines memory b cell precursors in mouse and human germinal centers, revealing light-zone location and predominant low antigen affinity plasma cell and memory b cell differentiation from the germinal center inducible bronchus-associated lymphoid tissues (ibalt) serve as sites of b cell selection and maturation following influenza infection in mice visualizing antibody affinity maturation in germinal centers a conformation-induced oligomerization model for b cell receptor microclustering and signaling trajectory-based differential expression analysis for single-cell sequencing data presto: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires germinal centers germinal center dynamics revealed by multiphoton microscopy with a photoactivatable fluorescent reporter memory b cells of mice and humans a temporal switch in the germinal center determines differential output of memory b and plasma cells flow cytometry reveals that h n vaccination elicits cross-reactive stem-directed antibodies from multiple ig heavy-chain lineages antibody-containing eluates 
were concentrated by centrifugation through vivaspin columns with a kda cut-off. antibody concentration was estimated using nanodrop (thermofisher) equipment. binding of all mabs was confirmed using elisa. briefly, plates were coated overnight with hau of pr virus in pbs. plates were blocked with pbs % milk for h at rt. mabs were serially diluted and incubated for hr at rt. after washing, plates were incubated with anti-mouse igg secondary ab bio-layer interferometry in principle, bli is an optical, label-free analytical technique used to analyze biological interactions using the difference in the interference pattern of white light reflected from two surfaces: a layer of immobilized ligand on the biosensor tip orbital shake speed for affinity analysis. briefly, . µg of avitagged ha diluted in acetate . (pall fortebio, sartorius group experiments were carried out at a plate shake speed of rpm and a plate temperature of o c. the reference sensor was immobilized with pbs (ph . , gibco, sweden) and samples were run under the same experimental conditions as the ligand-immobilized sensor. data was processed using octet data analysis software for single cells analyses statistics are described above. for multiple comparisons between groups, one-way anova or two-way anova with tukey´s multiple comparisons was used. a) schematic diagram of experimental setup of influenza infection, cell sorting followed by scrna-seq and bcr profiling. b) representative gating for the cell sorting of single ha+ igd-b cells. c) umap plot of unsupervised clustering of ha-specific b cells, combining all organs and dpi. d) mean expression of the top marker genes for each cell cluster.
color intensity denotes average expression while dot size the percent of cells expressing the gene. e) umap plot as in c showing average expression of gene signatures associated with plasma blasts and memory b cell programs. f) umap plot of gc clusters showing average expression of aicda and gene signatures associated with dark and light zone programs. g) enrichment score from gsea comparing pregc-c and earlygc-c to all others for genes involved in b cell activation and differentiation. h) alluvial plot showing proportion of cells with defined antibody isotype for each cluster. i) alluvial plot showing proportion of cells for each umap cluster as in c, divided by organ and dpi. j) umap plot of infected mice divided by organ. a) rna velocity determined by scvelo projected on umap. arrowheads determine predicted direction of the cell movement and arrow size determines strength of predicted directionality. in the squares are highlighted cells moving from pregc-c to earlygc-c (top) and cells moving from premem-c (bottom). b) trajectory inference by slingshot projected on umap with pregc-c selected as starting cluster. on the right the same graph with pseudotime coloring. cluster pb-c was excluded as clearly disconnected from the others. c) list of differentially expressed genes over trajectory-based pseudotime. colors on top indicate clusters. d) umap plot of gc clusters showing average expression of selected genes. e) enrichment score from gsea comparing premem-c to all others for genes involved in gc program, lz program, memory cell program and low affinity signature as described by shinnakasu et al. ( ) f) violin plot showing high and low affinity gene expression scores by umap clusters as defined by shinnakasu et al. ( ) a) umap plot of unbiased clustering of ha-specific memory b cells (c in fig c) , combining all organs and dpi. b) umap plot of unbiased clustering of ha-specific memory b cells as in a, colored by bcr isotype. 
on the right alluvial plot showing proportion of cells with defined isotype per cluster. c) umap plot of unbiased clustering of ha-specific memory b cells as in a, colored by organ. on the right alluvial plot showing proportion of cells, belonging to a specific organ per cluster. d) umap plot of unbiased clustering of ha-specific memory b cells as in a, colored by bcr mutation rate. germline (not mutated), low (up to % nucleotide mutation), medium (up to %) and high (over % mutation). on the right alluvial plot showing proportion of cells with defined mutation rate per cluster. e) mean expression of the top marker genes for each organ for bmem. color intensity denotes average expression while dot size the percent of cells expressing the gene. d) umap plot of infected cells divided by organ and dpi, colored by mutation rate. germline (not mutated), low (up to % nucleotide mutation), medium (up to %) and high (over % mutation). e) graph showing proportion of cells for each mutation rate for each cluster, divided by dpi and organ. f) violin plot comparing mutation frequency of total and gc-derived bmem vs pb. statistical differences were tested using student´s t test. g) alluvial plot showing proportion of pb with a sequence totally identical to a bmem, divided by dpi. a) clonal trees of five selected clonal families from mice at different dpi. color indicates cell types, where each circle or square is a cell that was sequenced in our experiments. where symbols are missing between junctions, denotes an inferred member of the clonal family. expressed mabs are indicated by name. b) characteristics of the expressed mabs including kd value measured by bli. key: cord- -ena usqv authors: long, rory k. m.; moriarty, kathleen p.; cardoen, ben; gao, guang; vogl, a. wayne; jean, françois; hamarneh, ghassan; nabi, ivan r. title: super resolution microscopy and deep learning identify zika virus reorganization of the endoplasmic reticulum date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ena usqv the endoplasmic reticulum (er) is a complex subcellular organelle composed of diverse structures such as tubules, sheets and tubular matrices. flaviviruses such as zika virus (zikv) induce reorganization of er membranes to facilitate viral replication. here, using d super resolution microscopy, zikv infection is shown to induce the formation of dense tubular matrices associated with viral replication in the central er. viral non-structural proteins ns b and ns b associate with replication complexes within the zikv-induced tubular matrix and exhibit distinct er distributions outside this central er region. deep neural networks trained to identify zikv-infected versus mock-infected cells successfully identified zikv-induced central er tubular matrices as a determinant of viral infection. super resolution microscopy and deep learning are therefore able to identify and localize morphological features of the er and may be of use to screen for inhibitors of infection by er-reorganizing viruses. the endoplasmic reticulum (er) is a highly dynamic network composed of - nm ribosome-studded rough er sheets and convoluted networks of smooth er tubules ( , ) . er shaping proteins such as the lumenal sheet spacer protein cytoskeleton-linking membrane protein (climp- ), in order to study er morphology following zikv infection, we first generated stable u glioblas- for hours. cells were fixed with % paraformaldehyde/ . % glutaraldehyde to preserve er architecture ( , , , ) and labeled for dsrna, a marker for zikv replication factories ( ). maximum figure . d sted microscopy reveals zikv-induced er reorganization in human u glioblastoma cells. a) ermoxgfp or sec β-gfp stably transfected u cells were mock-infected or infected for hours with the prvabc zikv strain (moi= ). er reporter gfp and immunostained dsrna-labeled zikv replication factories were imaged by d sted microscopy.
b) fluorescence intensity of ermoxgfp of infected cells using a spectrum heat map and a segmentation mask of the er that colocalizes with dsrna (grey), both generated on imaris x . . (imaris), are depicted. yellow squares in the panels indicate the magnified rois shown in the adjacent panels. quantification of the mean normalized er density ((intensity sum of mask/total cell intensity sum)/ (volume sum of mask/ total cell volume sum)) was performed for both dsrna-positive and dsrna-negative ermoxgfp and sec β-gfp in zikv-infected cells by imaris segmentation. scale bar= microns. cells per biological replicate (n= ). statistics were done using unpaired student's t tests: **= p< . , error bars represent sem. were imaged by d sted microscopy. magnified rois (yellow rois identified by red roman numerals) show that the per extends over - sections ( nm step size) and cer > sections. graph shows average per and cer z-height for each ermoxgfp or sec β-gfp labeled cell. a z-height cutoff of . microns (red line) was used to identify per and cer objects. b) segmented er labeling from -hour zikv-or mock-infected ermoxgfp or sec β-gfp stably transfected u cells (moi= ) was visualized using a z-height spectrum heat map and cer (green; > . µm) and per (red; < . µm) masks are shown. c) volume percentage (left) and mean normalized density (right) of cer and per masks between mock-and zikv-infected cells. cells per biological replicate were analyzed for a total of n= . anova with post-hoc tukey hsd: *= p< . , **= p< . , and ***= p< . . error bars represent sem. scale bar= microns. (green) and per (red) masks overlaid with the dsrna-positive er mask (white) for zikv-infected ermoxgfp and sec β-gfp transfected u cells. enlarged images of ermoxgfp transfected cells show x µm rois of the mock-infected cer (green box) and of the zikv-infected dsrna-positive (red box) and dsrna-negative (yellow box) cer shown in b. 
graphs show the volume percent of the cer or per region that contains dsrna-positive er (left) and the volume percent of the dsrna-positive er that resides within the cer mask. b) d images of er (white) and dsrna (red) labeling in x µm rois of the zikv-infected dsrna-positive (red) and dsrna-negative (yellow) cer and mock-infected (green) cer are shown above imaris d surface rendering of x µm regions of the above rois. graph shows mean normalized er density for each of the three cer zones by imaris segmentation and masking. cells per biological replicate were analyzed for a total of n= . anova with post-hoc tukey hsd: *= p< . , **= p< . , and ***= p< . . error bars represent sem. scale bar: microns ( nm for zoomed rois).
projections of d sted image stacks show high intensity ermoxgfp and sec β-gfp labeling in a cer region and low intensity labeling in per tubules in mock-infected cells (figure a), as reported previously by diffraction-limited confocal microscopy ( ). upon zikv infection, the cer reorganizes to form an intensely labeled crescent-shaped region surrounding a lower intensity perinuclear region (figure a). interestingly, the crescent-shaped zikv-induced perinuclear er overlapped extensively with dsrna (figure a). imaris bitplane software fragments the er into distinct segments that can then be analyzed for different features, including reporter density, segment z-height and segment overlap with other labels, such as dsrna. density-based segmentation of the ermoxgfp- and sec β-gfp-labelled er of zikv-infected cells showed that the higher density crescent-shaped cer region exhibited significant overlap with dsrna-positive er structures relative to the rest of the er (figure b). this suggests that zikv dsrna associates with an er region of high density for both lumenal and membrane er reporters.
figure . ultrastructural analysis of zikv-infected cerebral brain organoids.
transmission em images of nm thin sections of -hour mock- and zikv-infected cerebral brain organoids (moi= ). yellow boxes show rois of adjacent higher magnification images that highlight rough er sheets in mock-infected and tubular matrices (convoluted membranes) in zikv-infected cells. scale bars: nm and nm for zoomed image rois.
we then investigated the relation- figure d ). overlaying the cer and per masks with the dsrna-positive er mask showed that the dsrna-positive er (> % volume/volume) is predominantly included within the cer mask (figure a). indeed, only % of per volume contains dsrna while % of cer volume contains dsrna for both er reporters (figure a). morphological comparison of the dsrna-positive and -negative cer of zikv-infected cells with the cer of mock-infected cells showed that the cer was composed of a convoluted network of tubules for both the ermoxgfp- and sec β-labeled er (figure b). d reconstructions confirmed that these regions were predominantly tubular with a few small sheet-like structures, similar to the tubular matrix morphology of peripheral sheets ( ). d voxel-based visualization and quantification showed that the density of er tubular structures in the dsrna-positive er is higher, for both the ermoxgfp and sec β-gfp er reporters, than in the dsrna-negative cer regions of zikv-infected cells or the cer of mock-infected cells (figure b). the higher er reporter density reflects reduced spacing between tubules in the dsrna-positive er, suggesting that zikv infection induces tubular matrix reorganization in a subdomain of the cer in u cells. consistently, em analysis of the microcephaly-relevant cerebral brain organoid model ( ) showed zikv-induced er reorganization from perinuclear stacked rough er sheets to a perinuclear, circular region of convoluted smooth er tubules (figure ). these results are consistent with zikv induction of a perinuclear tubular matrix.
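the mean normalized er density given in the figure legends, (intensity sum of mask/total cell intensity sum)/(volume sum of mask/total cell volume sum), and the volume percent overlap between two masks can be illustrated with a short sketch. this is not the imaris workflow used in the study: the flat per-voxel lists, the function names and the toy values are all assumptions made for illustration.

```python
def normalized_density(intensity, region_mask, cell_mask):
    """(region intensity sum / cell intensity sum) / (region volume / cell volume).

    All three arguments are flat, equal-length sequences with one entry per
    voxel (a hypothetical layout chosen for this sketch); volume is counted
    in voxels.
    """
    if not (len(intensity) == len(region_mask) == len(cell_mask)):
        raise ValueError("inputs must have one entry per voxel")

    cell_intensity = cell_volume = 0.0
    region_intensity = region_volume = 0.0
    for value, in_region, in_cell in zip(intensity, region_mask, cell_mask):
        if not in_cell:
            continue  # voxels outside the cell are ignored
        cell_intensity += value
        cell_volume += 1
        if in_region:
            region_intensity += value
            region_volume += 1

    if cell_intensity == 0 or region_volume == 0:
        raise ValueError("empty region or zero total intensity")

    intensity_fraction = region_intensity / cell_intensity
    volume_fraction = region_volume / cell_volume
    return intensity_fraction / volume_fraction


def volume_percent(mask_a, mask_b):
    """Percent of mask_a voxels that also fall inside mask_b."""
    inside_a = sum(1 for a in mask_a if a)
    if inside_a == 0:
        raise ValueError("mask_a is empty")
    both = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 100.0 * both / inside_a


# Toy cell of 8 voxels: two bright voxels form the region of interest,
# and one of them is also marked in a second (hypothetical dsRNA) mask.
intensity = [3, 3, 1, 1, 1, 1, 1, 1]
region = [True, True, False, False, False, False, False, False]
cell = [True] * 8
dsrna = [True, False, False, False, False, False, False, False]

density = normalized_density(intensity, region, cell)  # (6/12) / (2/8)
overlap = volume_percent(region, dsrna)                # 1 of 2 voxels
```

a value above 1 means the region holds a larger share of the reporter signal than its share of the cell volume, which is how a dense subdomain stands out from the rest of the er.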
er localization of zikv ns b and ns b non-structural proteins
we then labeled cells for zikv ns proteins ns b and ns b to assess their relationship to the zikv-induced tubular matrix. d sted analysis showed a predominant distribution of both ns b and ns b to the cer and more particularly to the dense zikv-induced crescent-shaped tubular matrix in ermoxgfp transfected u cells (figure a). while ns b is predominantly associated with the dsrna-positive cer, ns b labeling extended throughout the cer as well as to the per (figure a). to quantify this, we identified ns b-positive and ns b-positive er segments and determined their overlap with total er, cer and per (figure a). while ns b was present at high levels on both per and cer segments, ns b was enriched in the cer relative to the per and presented a similar er distribution to dsrna (figure a). the majority (> %) of dsrna-labeled puncta were associated with ns b or ns b, consistent with the presence of both these ns proteins in the zikv-induced tubular matrix. in contrast, a minority of ns b (~ %) and ns b (~ %) spots overlapped with dsrna spots (figure b). in the dsrna-positive cer, the highly punctate ns b labeling differed from a more reticular ns b labeling. these two zikv ns proteins therefore exhibit distinct distributions within the zikv-induced tubular matrix when not associated with dsrna replication complexes (figure b). together with the differential distribution of ns b and ns b within the er as a whole (figure a), these results highlight that these two zikv ns proteins do not associate exclusively with replication factories and suggest that, following synthesis of the zikv polyprotein, ns b and ns b follow distinct biosynthetic pathways before reuniting in er-associated replication complexes.
are required by non-deep learning methods; and ) provide the ability to move beyond simple classification to inspect discriminative regions (i.e. subregions of the er within each cell).
we therefore applied deep neural networks to identify and distinguish the morphological features of the er of zikv-infected cells. a pipeline outlining our approach is shown in figure a. we trained a cnn using d frames (each representing a single z-frame) from d sted volumes of er-
figure . er distribution of zikv ns b and ns b proteins. a) mock- or zikv-infected ermoxgfp (red) stably transfected u cells were immunostained for zikv ns b or ns b (green) and dsrna (white). merged images show ns b or ns b (green) overlaid with the cer, per or dsrna-positive er (white). graphs show quantification of volume percent of ns b, ns b and dsrna er regions relative to total er, cer or per. b) mock- or zikv-infected ermoxgfp stably transfected u cells were immunostained for zikv ns b or ns b (green) and dsrna (red). white squares show rois of adjacent zoomed images in which white arrowheads show colocalization between ns protein and dsrna puncta. graphs show percent of dsrna puncta overlapping ns b or ns b puncta (left) and percent of ns b or ns b puncta overlapping dsrna puncta. cells per biological replicate (n= ) with each dot representing a cell. anova with post-hoc tukey hsd: *=p< . , **=p< . , and ***=p< . . error bars represent sem. all images are maximum projections from d sted stacks. scale bar= microns. roi scale bar = microns.
rate of networks when dealing with small target datasets. certain filters (combinations of weights) learned on the first dataset (i.e. imagenet) may still be useful for classifying a second dataset (i.e. sted); as a result, fewer weight updates are needed before achieving good performance. using a pretrained vgg as our base model, we obtained a % boost in test accuracy compared with a random weight initialization. using ermoxgfp labelled er alone, the cnn was able to distinguish between zikv- and mock-infected cells with an % accuracy ( figure b, figure .
deep learning classification pipeline: pretrained convolutional neural network accurately predicts labels of d frames from d sted volumes. a) leave-one-out cross validation is successively applied to each cell. this prevents information from d frames leaking between training, validation and test sets. during training, the network uses d frames from cells (specifically, for training and for validation). the trained cnn then assigns a class label (i.e. zikv-infected or mock-infected) to each d frame of the remaining test cell. class activation maps are also generated for each d frame belonging to the test set. b) cnn performance reported on a cell basis and across d frames. normalized confusion matrices report the total number of predicted labels (zikv-infected or mock-infected) over the total number of ground truth labels. for example, % of all mock-infected d frames were predicted correctly by the cnn (top right). predicted cell labels correspond to the majority label of predicted frame labels for each cell (top left). when excluding frames beyond the cell with reduced ermoxgfp signal, performance metrics increase both in terms of cell label predictions (bottom left) and individual frame label predictions (bottom right).
the remaining cell. this process is outlined in figure a , and is repeatedly applied using each cell show poor accuracy to predict class label (supp fig ). when considering those frames containing ermoxgfp signal intensity greater than the median, we achieved % accuracy and % sensitivity. on a per frame basis, accuracy for all frames was % and sensitivity %, which increased to % and %, respectively, when considering frames expressing ermoxgfp greater than median this suggests that the cams used to identify both zikv- and mock-infected cells correspond to high er density regions localized over the cell (see figure b) and that vgg is using differences in the er label (ermoxgfp) to identify slices as either zikv- or mock-infected. figure b, left) .
conversely, regions identified by the thresholded mock cam have increased er density for tn compared to tp cells (figure b, right). we then calculated the average euclidean figure c ) and the precise nature of the features that the cnn uses to discriminate between zikv- and mock-infected cells remains to be determined. cam localization analysis shows that the neural network uses the same cer region that we have observed to be modified upon zikv infection. deep learning therefore has the ability to identify the er morphological changes associated with zikv infection. zikv infection is characterized by re-organization of the er to create replication factories and convoluted er membranes involved in viral replication, whose ultrastructure has been elegantly char-
given roi, er density is defined as total ermoxgfp intensity within the roi divided by ermoxgfp area inside the roi. er density for each roi defined by the cam is then normalized by the er density of the whole cell. a) er density of rois defined by cam thresholds from - % with increments of % is compared across subgroups: zikv-infected d frames correctly predicted to be infected (true positives); mock-infected d frames correctly predicted to be uninfected (true negatives); zikv-infected d frames incorrectly predicted to be uninfected (false negatives); mock-infected d frames incorrectly predicted to be infected (false positives). b) er densities of % cam rois compared across subgroups. c) euclidean distances between center of mass of % cam rois and weighted center of mass of ermoxgfp signal.
( , ). the er is a morphologically complex organelle, containing smooth er tubules and ribosome-studded rough er sheets identified ultrastructurally by em since over years ( networks that correspond to previously described per tubular matrices ( ).
while we were unable to detect er sheets by super-resolution analysis of cultured u cells, em of brain organoids shows the transformation of er sheets to convoluted membranes upon zikv infection. this suggests that organoid structures present more highly developed er structures than cultured cells; application of d live cell super-resolution analysis ( ) to this model of the developing fetal brain, composed of a heterogeneous population of cell types, may lead to better definition of complex er structures and their dynamic transitions in response to stress, such as viral infection. nevertheless, the fixed cell d sted analysis applied here demonstrates that convoluted membranes associated with zikv replication derive from tubular matrix reorganization in the cer. the zikv-induced cer-localized, high er density tubular matrices are enriched for dsrna. as formance for d super-resolution microscopy data sets that will be of service to other researchers applying deep learning approaches to super-resolution microscopy. the interpretability of artificial intelligence is an evolving field and we believe that interpretable methods, such as grad-cam ( ), are important tools for the understanding of deep neural networks applied to exploratory data sets. this approach has now allowed us to identify features of discriminatory regions, and has not, to our knowledge, been applied to subcellular morphology,
figure . network performance analysis a) performance metrics are reported across predictions of frame labels and cell labels, where cell labels correspond to the majority label of predicted frame labels for each cell. results are reported using a given selection criterion: using all frames (rows , ), using only frames with normalized ermoxgfp signal greater than mean normalized ermoxgfp signal (rows , ) or greater than median normalized ermoxgfp signal (rows , ). mean and median thresholds are computed on a cell basis.
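the frame selection and majority-vote labeling described in this legend can be sketched as follows. the sketch assumes that each frame of a cell is available as a (predicted label, normalized signal) pair and that a tie in the vote goes to the label encountered first; these choices, the function names and the example values are illustrative assumptions, not details taken from the study.

```python
from collections import Counter
from statistics import mean, median


def select_frames(frames, criterion="all"):
    """Keep the frames of ONE cell that pass the chosen selection criterion.

    frames    : list of (predicted_label, normalized_signal) pairs
    criterion : "all", or keep only frames whose signal is greater than
                the per-cell "mean" or "median" signal
    """
    if criterion not in ("all", "mean", "median"):
        raise ValueError(f"unknown criterion: {criterion!r}")
    if criterion == "all":
        return list(frames)
    signals = [signal for _, signal in frames]
    # The threshold is computed per cell, from that cell's own frames.
    threshold = mean(signals) if criterion == "mean" else median(signals)
    return [(label, signal) for label, signal in frames if signal > threshold]


def cell_label(frames):
    """Majority label across the predicted frame labels of one cell."""
    if not frames:
        raise ValueError("no frames left after selection")
    counts = Counter(label for label, _ in frames)
    # most_common keeps first-seen order among equal counts (assumed tie rule).
    return counts.most_common(1)[0][0]


# Hypothetical cell: dim frames are predicted "mock", bright frames "zikv".
frames = [("mock", 0.1), ("mock", 0.2), ("mock", 0.3), ("zikv", 0.8), ("zikv", 0.9)]

label_all = cell_label(select_frames(frames, "all"))
label_median = cell_label(select_frames(frames, "median"))
```

in this toy example the vote over all frames returns "mock", whereas restricting the vote to frames above the per-cell median returns "zikv", which illustrates how low-signal frames can change the cell-level label.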
the rug plot (above x-axis) visualises distribution of z-frames. b) accuracy reported across corrected z-frames, z= is where normalized ermoxgfp intensity sum is maximal. c) normalized ermoxgfp intensity sum plotted against corrected z-frame. dashed lines indicate the median (orange) and mean (red) normalized ermoxgfp intensity sum computed across all frames. microtubules and the endoplasmic reticulum are highly inter- dependent structures rough sheets and smooth tubules mechanisms determining the morphology of the peripheral er a class of membrane proteins shaping the tubular endoplasmic reticulum dynamic nanoscale morphology of the er surveyed by sted microscopy reticulon and climp- control nanodomain organization of pe- ripheral er tubules increased spatiotemporal reso- lution reveals highly dynamic dense tubular matrices in the peripheral er architecture and biogenesis of plus-strand rna virus replication fac- tories rig-i-like receptors: cytoplasmic sensors for non-self rna molecular mechanism of signal perception and integration by the innate immune sensor retinoic acid-inducible gene-i (rig-i) hostile takeover: hijacking of endoplasmic reticulum function by t ss and t ss effectors creates a niche for intracellular pathogens endoplasmic reticulum: the favorite intracellular niche for pathogen-endoplasmic-reticulum interactions: in through the out door immunolocalization of the dengue virus nonstructural glycoprotein ns suggests a role in viral rna replication human coronavirus: host-pathogen interaction zika virus associated with meningoencephalitis . fauci as, morens dm. 
zika virus in the americas-yet another arbovirus threat ultrastructural characterization of zika virus replication factories cytoarchitecture of zika virus infection in human neuroblastoma and aedes albopictus cell lines virus in human fetal neural progenitors persists long term with partial cytopathic and limited molecular aspects of dengue virus replication genome sequence of a zika virus strain isolated from the serum of an infected patient in thailand in rewiring cellular networks by members of the flaviviridae family the host protein reticulon is utilized by flaviviruses to facilitate membrane remodelling virus perturbs mitochondrial morphodynamics to dampen innate immune responses. cell host microbe composition and three- dimensional architecture of the dengue virus replication and assembly sites zika ns b is a crucial factor recruiting ns to the er and activating its protease activity a palette of fluorescent proteins optimized for diverse cellular environments ligand-induced redistribution of a human kdel receptor from the golgi complex to the endoplasmic reticulum a conserved er targeting motif in three families of lipid binding proteins and in opi p binds vap using brain organoids to understand zika virus- induced microcephaly imagenet classification with deep convolutional neural networks very deep convolutional networks for large-scale image recognition. 
arxiv preprint arxiv rethinking model scaling for convolutional neural networks skin lesion analysis toward melanoma detection: a challenge at the international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic) a dataset for breast cancer histopathological image classification learning deep features for discriminative computer vision and pattern recognition evaluating cnn-based semantic food segmentation across illuminants a novel segmentation framework for uveal melanoma in magnetic resonance imaging based on class acti- vation maps coronavirus infection, er stress, apoptosis and innate immunity. front micro- biol studies on the endoplasmic reticulum. i. its identification in cells in situ applying systems-level spectral imaging and analysis to reveal the organelle interactome atlastin en- doplasmic reticulum-shaping proteins facilitate zika virus replication er-shaping atlastin proteins act as central hubs to promote flavivirus replication and virion assembly crystal structure of unlinked ns b-ns protease from zika virus flaviviral ns b, chameleon and jack-in-the-box roles in viral replication and pathogenesis, and a molecular target for antiviral intervention zika virus ns a and ns b proteins dereg- ulate akt-mtor signaling in human fetal neural stem cells to inhibit neurogenesis and induce knowledge transfer for melanoma screening with deep learning texture analysis for muscu- lar dystrophy classification in mri with improved class activation mapping. pattern recognition breast cancer histology images classification: training from scratch or transfer learn- ing? ict express grad-cam: visual explanations from deep networks via gradient-based localization form follows function: the importance of endoplas- mic reticulum shape defining host-pathogen interactions employing an artificial intelligence workflow. 
elife sars- coronavirus replication is supported by a reticulovesicular network of modified endoplasmic retic- ulum ca + signaling machinery is present at intercellular junctions and structures associated with junction turnover in rat sertoli cells key: cord- - u adnk authors: li, jinzhi; mahoney, brennan dale; jacob, miles solomon; caron, sophie jeanne cécile title: visual input into the drosophila melanogaster mushroom body date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: u adnk the ability to integrate input from different sensory systems is a fundamental property of many brains. yet, the patterns of neuronal connectivity that underlie such multisensory integration remain poorly characterized. the drosophila melanogaster mushroom body — an associative center required for the formation of olfactory and visual memories — is an ideal system to investigate how different sensory channels converge in higher-order brain centers. the neurons connecting the mushroom body to the olfactory system have been described in great detail, but input from other sensory systems remains poorly defined. here, we use a range of anatomical and genetic techniques to identify two novel types of mushroom body input neuron that connect visual processing centers — namely the lobula and the posterior lateral protocerebrum — to the dorsal accessory calyx of the mushroom body. together with previous work that described a pathway conveying visual information from the medulla to the ventral accessory calyx of the mushroom body (vogt et al., ), our study defines a second, parallel pathway that is anatomically poised to convey information from the visual system to the dorsal accessory calyx. this connectivity pattern — the segregation of the visual information into two separate pathways — could be a fundamental feature of the neuronal architecture underlying multisensory integration in associative brain centers. 
sensory systems use different strategies to detect specific physical features of the outside world. for instance, the olfactory system contains many different types of sensory neuron that are each specialized in detecting a specific class of volatile chemicals. through only two neuronal layers, olfactory information (the identity of an odor and its concentration) is relayed to higher brain centers (leinwand and chalasani, ). in contrast, the visual system contains far fewer types of sensory neuron but, through numerous neuronal layers, it relays a range of highly processed information (for instance, color, brightness, motion and shape) to higher brain centers (baden et al., ). thus, higher brain centers have to integrate different types of processed information, bind that information into a coherent representation of the outside world, and use such representations to guide behavior (yau et al., ). how higher brain centers achieve this feat remains largely unknown. this gap in our knowledge mainly stems from the fact that higher brain centers are formed by a large number of neurons, and that the projection neurons conveying information from different sensory systems to these centers often remain poorly characterized. this makes it difficult to understand whether there are specific patterns of neuronal connectivity that enable multisensory integration and what the nature of these patterns is. deciphering the fundamental neuronal mechanisms that underlie multisensory integration requires a model system such as the drosophila melanogaster mushroom body, which consists of a relatively small number of neurons whose connections can be charted reliably. the drosophila mushroom body is formed by ~ , neurons, called the kenyon cells, and has long been studied for its essential role in the formation of olfactory associative memories (aso et al., ; hige, ).
the identity of the projection neurons that connect the olfactory system to the mushroom body (and the way kenyon cells integrate input from these neurons) has been characterized in great detail, highlighting fundamental connectivity patterns that enable this higher brain center to represent olfactory information efficiently (bates et al., ; caron et al., ; tanaka et al., a, b; zheng et al., ). evidence in drosophila melanogaster shows that the mushroom body is more than an olfactory center, as it is also required for the formation of visual and gustatory associative memories (liu et al., ; masek and scott, ; vogt et al., ). however, the identity of the neurons that connect the mushroom body to other sensory systems remains poorly characterized. thus, a first step towards understanding how the mushroom body integrates multisensory information is to identify such non-olfactory mushroom body input neurons and the genetic tools necessary to manipulate these neurons. the mushroom body receives its input through its calyx and sends its output through its lobes. the calyx, a morphologically distinct neuropil containing the synapses formed between projection neurons and kenyon cells, can be divided into four non-overlapping regions: one main calyx as well as three accessory calyces named the dorsal, lateral and ventral accessory calyces (aso et al., ; yagi et al., ). the five output lobes (the a, a', b, b' and g lobes) contain the synapses formed between kenyon cells, mushroom body output neurons and dopaminergic neurons (aso et al., ). with respect to these input and output regions, kenyon cells can be divided into seven distinct types (aso et al., ). of these seven types, five types (the a/bc, a/bs, a'/b'ap, a'/b'm and gmain kenyon cells) extend their dendrites only into the main calyx and their axons along one or two lobes.
most of the neurons that project to the main calyx emerge from the antennal lobe, the primary olfactory center in the drosophila brain. thus, the a/bc, a/bs, a'/b'ap, a'/b'm and gmain kenyon cells receive input primarily from the olfactory system (caron et al., ; aso et al., ; zheng et al., ). in contrast, the two other classes of kenyon cells do not extend their dendrites into the main calyx. instead, the a/bp kenyon cells extend their dendrites into the dorsal accessory calyx, completely avoiding the main calyx and the lateral and ventral accessory calyces, and their axons along the a and b lobes. likewise, the gd kenyon cells extend their dendrites exclusively into the ventral accessory calyx and their axons along the g lobe (aso et al., ; vogt et al., ). thus, both the a/bp and gd kenyon cells are anatomically poised to receive non-olfactory input. there is evidence suggesting that the ventral accessory calyx receives input from the medulla, a region of the optic lobe that specializes in processing brightness and color (morante and desplan, ; vogt et al., ). furthermore, a recent study suggests that the dorsal accessory calyx is a multisensory center that integrates input from multiple sensory pathways including the olfactory, gustatory and visual systems (yagi et al., ). here, we report a strategy that uses a combination of genetic tools (including transgenic lines that drive expression in a few neurons and a photo-labeling technique used to identify individual neurons and their presynaptic partners) to characterize the input neurons of the a/bp kenyon cells. we identify two novel types of mushroom body input neuron that, together, form about half of the total input the a/bp kenyon cells receive in the dorsal accessory calyx. the first neuronal type, henceforth referred to as lopn, consists of a neuron that projects from the lobula, a region of the optic lobe specialized in detecting visual features such as shape and motion.
the second type of neuron, henceforth referred to as plppns, consists of projection neurons that emerge from the posterior lateral protocerebrum, a brain region that receives input from the optic lobe (otsuna and ito, ; keleş and frye, ; wu et al., ). interestingly, lopn and plppns do not project to the ventral accessory calyx and do not connect to the gd kenyon cells. based on these findings, we conclude that there are two parallel pathways that convey visual information to the mushroom body: a pathway projecting from the medulla to the gd kenyon cells and another pathway projecting from the lobula and posterior lateral protocerebrum to the a/bp kenyon cells.
neurons projecting to the dorsal accessory calyx emerge from different brain regions
the dorsal accessory calyx is a neuropil formed from the synapses connecting ~ a/bp kenyon cells to their input neurons (figure a; aso et al., ). using transgenic lines that drive expression specifically in the a/bp kenyon cells (the r d -gal dbd, transgenic line also known as the mb -gal ) and transgenic lines that express a photoactivatable form of gfp, pa-gfp (a combination of the uas-c pa-gfp and uas-spa-gfp transgenic lines, henceforth referred to as uas-pa-gfp), we photo-labeled individual a/bp kenyon cells in a manner similar to that described in previous studies (aso et al., ; datta et al., ; ruta et al., ). we found that individual a/bp kenyon cells extend on average ± claw-shaped dendritic terminals (n = ) exclusively into the dorsal accessory calyx (figure b, d) and project their axons along the a and b lobes (not depicted). the overall morphology of a/bp kenyon cells is similar to the morphology of other types of a/b kenyon cell; for instance, individual a/bs kenyon cells extend on average ± claw-shaped dendritic terminals (n = ) exclusively in the main calyx (figure c) and project their axons along the a and b lobes (not depicted).
it is worth noting that the claws formed by the a/bp kenyon cells are much smaller in diameter ( . ± . µm; n = claws) than the claws formed by the a/bs kenyon cells ( . ± . µm; n = claws; figure b-c (white arrows), e). it is also worth noting that the border of the dorsal accessory calyx is looser than the compact and well-defined circular border of the main calyx. individual a/bp kenyon cells can extend dendrites further away from the core of the dorsal accessory calyx, resulting in an irregularly shaped calyx (figure d). thus, the a/bp kenyon cells resemble a/bs and a/bc kenyon cells but also differ from them in one major way: the dendrites of the a/bs and a/bc kenyon cells exclusively innervate the main calyx, a region known to receive most of its input from the olfactory system, whereas dendrites of the a/bp kenyon cells exclusively innervate the dorsal accessory calyx, a poorly characterized region of the mushroom body. to identify neurons that project to the dorsal accessory calyx and connect to the a/bp kenyon cells, we used a targeted photo-labeling technique that was adapted from previously published techniques (aso et al., ; datta et al., ; ruta et al., ). to photo-label only the neurons projecting to the dorsal accessory calyx, and not the a/bp kenyon cells, we used a combination of transgenes that, in concert, drive the expression of pa-gfp in all neurons except the kenyon cells (the n-synaptobrevin-gal , mb -gal and uas-pa-gfp transgenes); instead, kenyon cells were labeled with the red fluorescent protein dsred using the mb -dsred transgene (figure a). the brains of adult flies carrying all the aforementioned transgenes were dissected and imaged using two-photon microscopy (figure b-c).
guided by the expression of dsred, we targeted the dorsal accessory calyx, which is clearly distinct from the main calyx, with high-energy light in order to photo-convert pa-gfp specifically in the neurons projecting to that area (figure c, white dashed outline). upon photo-labeling, the somata of the neurons that express pa-gfp, and that project either dendrites or axons into the dorsal accessory calyx, were labeled (figure d-e). on average, ± neurons (n = ) were photo-labeled. the somata of these neurons are located in seven distinct clusters that are distributed across the brain (figure a, d-e). on the anterior side of the brain, we found three clusters: one located near the antennal lobe (cluster al containing ± neurons), one located in the optic lobe (cluster ol containing ± neurons) and one located near the anterior ventral lateral protocerebrum (cluster avlp containing ± neurons) (figure d, f). on the posterior side of the brain, we found four clusters: one located in the superior medial protocerebrum (cluster smp containing ± neurons), one located in the superior lateral protocerebrum (cluster slp containing ± neurons), one located in the lateral horn (cluster lh containing ± neurons) and one located in the posterior lateral protocerebrum (cluster plp containing ± neurons) (figure e-f). altogether, these results suggest that the dorsal accessory calyx receives input from a diverse and distributed collection of projection neurons that can be divided into seven clusters. while en masse photo-labeling of the neurons projecting to the areas within and around the dorsal accessory calyx gives a good approximation of the number and identity of neurons possibly connecting to the a/bp kenyon cells, this technique cannot confidently identify true pre-synaptic partners.
additionally, the amount of photo-activated pa-gfp within individual neurons varies significantly between trials, and because a large number of neurons project to the dorsal accessory calyx, it is difficult to resolve the morphology of individual neurons using this technique. we therefore sought to identify gal transgenic lines that drive expression specifically in these neurons. to identify such lines, we carried out an anatomical screen using the flylight collection of gal transgenic lines (figure a; jenett et al., ). as a first step, we screened through the flylight database, an online catalogue that reports the expression patterns of ~ , gal driver lines, and selected transgenic lines that highlighted neuronal processes in the dorsal accessory calyx (figure b-c). as a second step, we specifically labeled neurons that project to the dorsal accessory calyx using the same technique as described above but with a different combination of transgenes (the r_line-gal , uas-pa-gfp and mb -dsred transgenes). the brains of adult flies that carry this combination of transgenes were dissected and imaged using two-photon microscopy (figure d, f). guided by the expression of dsred, we targeted the dorsal accessory calyx with high-energy light in order to convert pa-gfp specifically in the neurons that project to that region (figure d, white dashed outline). we screened the pre-selected transgenic lines and identified lines that we chose to investigate further based on two criteria. first, we selected lines that showed a strong signal after photo-labeling and clear pre-synaptic terminals near or in the dorsal accessory calyx (figure e, g; table ). possible pre-synaptic terminals, which are bouton-shaped (figure e, white arrow), were distinguished from possible post-synaptic terminals, which are mesh-shaped, based on the recovered photo-labeling signal.
second, we selected lines based on the strength and specificity of their expression patterns: lines expressing at a high level in a few neurons led to clear photo-labeling signals, making it easier to characterize these putative input neurons further ( figure g ). finally, we determined whether the neurons identified with these lines are pre-synaptic partners of the a/bp kenyon cells. to this end, we used an activity-dependent version of the gfp reconstitution across synaptic partners (grasp) technique ( figure a ) (feinberg et al., ; macpherson et al., ) . in this technique, the green fluorescent protein (gfp) is split into two complementary fragments -the gfp - and gfp fragments -that do not fluoresce when expressed alone; the gfp - fragment is tagged to an anchor that specifically targets the pre-synaptic membrane (syb::spgfp - ), whereas the gfp fragment is tagged to an anchor that targets the membrane (spgfp ::cd ). when both fragments are in close contact -in this case, when the syb::spgfp - fragment is released at an active synapse -gfp molecules can be reconstituted and recover their fluorescence. the reconstitution of gfp molecules, as indicated by the presence of fluorescent gfp speckles, suggests that the neurons expressing the two complementary fragments form functional synapses (macpherson et al., ) . in our study, the syb::spgfp - fragment was expressed in the putative input neurons using the different transgenic lines identified in our screen as well as two transgenic lines that a previous study found to drive expression in putative dorsal accessory calyx input neurons ( figure b -m, table ; (yagi et al., ) ). the expression of the spgfp ::cd fragment was driven in most of the kenyon cells using the mb -lexa transgenic line. we looked for gfp speckles in the core of the dorsal accessory calyx, which is clearly visible in the red channel as an irregularly shaped structure anterior to the main calyx ( figure b , white dashed outline). 
we detected a large number of gfp speckles in the core of the dorsal accessory calyx for four of the ten lines that were identified in the screen as well as for the two transgenic lines identified by the previous study ( figure l-m and table ). additionally, we detected a small number of gfp speckles in the same region for three other transgenic lines ( figure f -h and table ). it is worth noting that some of the gfp speckles lay outside of the red signal; most likely these speckles are formed by kenyon cells that extend dendrites outside of the core of the dorsal accessory calyx ( figure b -h, l-m). finally, we did not detect any gfp speckles for the remaining three lines ( figure i -k and table ) . altogether, our screen identified at least nine transgenic lines that drive expression in neurons that form synapses with the a/bp kenyon cells of the dorsal accessory calyx. to further characterize the morphology of the projection neurons that connect to the a/bp kenyon cells, we photo-labeled individual neurons using the transgenic lines recovered from our screen that showed a reliable grasp signal. to this end, neurons were photo-labeled in the dissected brains of flies carrying the r_line-gal and uas-pa-gfp transgenes; these brains were then immuno-stained and imaged using confocal microscopy. we first focused our attention on the transgenic lines in which we had found the strongest grasp signal ( figure b -e and table ). photo-labeling of neurons that project to the dorsal accessory calyx using the r h -gal and r d -gal transgenic lines revealed a single neuron ( figure a ). in each line, a neuron projecting from the lobula to the dorsal accessory calyx was clearly photo-labeled ( figure b -c). therefore, we named this type of neuron "lopn". the soma of lopn is located in cluster plp, a region medial to the optic lobe ( figure e , d-e). 
lopn extends its dendrites to the lobula and projects its axon to the superior lateral protocerebrum and the dorsal accessory calyx, completely avoiding the main calyx and the ventral accessory calyx ( figure d -e). it is worth noting that lopn is very similar to the olct neuron that has been previously described using a different transgenic line, r f -gal ; accordingly, we recovered a strong grasp signal for that transgenic line ( figure l , table ; (yagi et al., ) ). we then set out to confirm whether lopn does indeed connect to a/bp kenyon cells, as surmised from the grasp results. a previous study has demonstrated that a technique, which combines photo-labeling and dye-labeling tools, can be used to identify the complement of input that individual kenyon cells receive from the antennal lobe, and the frequency at which individual projection neurons connect to kenyon cells (caron et al., ) . we have modified this technique in order to directly measure the connectivity rate between a given projection neuron and kenyon cells. in this modified version of the technique, we used a combination of transgenic lines to photolabel the projection neuron of interest -using the same protocol that we described in the previous section -and to dye-label a single kenyon cell ( figure a ). we then assessed how many of the dendritic claws formed by the dye-labeled kenyon cell are connected to a given projection neuron. as a proof of principle, we measured the connectivity rate of the dc glomerulus projection neurons to the olfactory kenyon cells that innervate the main calyx. the connectivity rate of the dc projection neurons to the olfactory kenyon cells has been previously approximated to be . %; this means that there is a . % chance that a given olfactory kenyon cell claw receives input from the dc projection neuron (caron et al., ) . using the modified technique, we found that the dc projection neurons connect to kenyon cells at a connectivity rate of . 
% (n = ), a value well within the range measured previously (data not shown). we measured the connectivity rates of lopn to a/bp kenyon cells using the two transgenic lines we identified in our screen. using the r h -gal driver line, we found that . % (n = ) of the claws formed by the a/bp kenyon cells connect to lopn ( figure b -c, h and table ). likewise, using the r h -gal driver line, we found that . % (n = ) of the claws formed by the a/bp kenyon cells connect to lopn ( figure h and table ). altogether, based on the results obtained using both the grasp technique and the dye/photo-labeling technique, we conclude that lopn is a true input neuron of the a/bp kenyon cells and that it conveys information from the lobula, a visual processing center, to the dorsal accessory calyx. photo-labeling of neurons that project to the dorsal accessory calyx using the r h -gal and r g -gal transgenic lines revealed a group of neurons ( figure a ). in each line, neurons that project from the posterior lateral protocerebrum to the dorsal accessory calyx were photo-labeled ( figure b -c). therefore, we named this type of neuron "plppns". the somata of the plppns are located in cluster lh, a region near the lateral horn ( figure d ). we could identify ± plppns (n = ) per hemisphere using the r h -gal transgenic line ( figure d ). the r g -gal transgenic line drives expression at a much lower level, making it difficult to analyze the plppns in great detail ( figure c ). plppns extend projections into many brain centers, including the posterior lateral protocerebrum, the superior clamp, the superior lateral protocerebrum and the dorsal accessory calyx, completely avoiding the main calyx and the ventral accessory calyx ( figure d ). using the dye/photo-labeling technique described above, we measured the connectivity rate between plppns and a/bp kenyon cells. we found that the plppns identified using the r h -gal transgenic line connect at a high rate: we found that . 
% (n = ) of the claws formed by the a/bp kenyon cells connect to the plppns ( figure d , e, h). we also measured the connectivity rate of individual plppns using the same transgenic line and found that, on average, an a/bp kenyon cell claw has a . % (n = ) probability of connecting to a plppn ( figure h and table ). to determine whether the plppns identified in the r h -gal and r g -gal transgenic lines are the same or a different group of neurons, we used the split-gal technique (luan et al., ) . in this technique, the gal transcription factor is split into two complementary fragments -the gal ad and gal dbd domains -that are both transcriptionally inactive when expressed alone; when both fragments are expressed in the same cell, gal can be reconstituted and recovers its transcriptional activity. plppns were visible in flies that carry the r g -gal dbd and r h -gal ad transgenes, thus confirming that both lines are expressed in the same group of plppns ( figure e -g). this result revealed, even more clearly, that individual plppns project to many different brain centers ( figure g ). in order to distinguish the axonal terminals from the dendritic arbors, we used the split-gal combination of transgenic lines (the r g -gal dbd and r h -gal ad transgenic lines) to drive the expression of the presynaptic marker synaptotagmin in the plppns ( figure h ). we found that the projections extending into the superior lateral protocerebrum, superior clamp and dorsal accessory calyx contain presynaptic terminals. similarly, when the expression of the postsynaptic marker denmark was driven specifically in the plppns, we found that all of the projections made by the plppns contain postsynaptic terminals ( figure i ). however, the projections extending into the posterior lateral protocerebrum are the only projections formed by plppns that contain only post-synaptic terminals and no pre-synaptic terminals ( figure h -i). 
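as an illustration only, the marker logic in the preceding paragraph can be written as set operations. the sketch below is not part of the study's analysis: the neuropil labels and the function name classify_compartments are hypothetical, and the two sets simply restate which plppn projections were reported to contain synaptotagmin or denmark signal.

```python
# Hypothetical short labels for the neuropils named in the text.
PLP, SLP, SCL, DACA = "plp", "slp", "scl", "daca"

# Compartments in which each marker was reported (restated from the text).
presynaptic_marker = {SLP, SCL, DACA}        # synaptotagmin signal
postsynaptic_marker = {PLP, SLP, SCL, DACA}  # denmark signal


def classify_compartments(pre, post):
    """Group compartments by which synaptic markers they contain."""
    return {
        "input_only": sorted(post - pre),   # postsynaptic marker only
        "output_only": sorted(pre - post),  # presynaptic marker only
        "mixed": sorted(pre & post),        # both markers present
    }


result = classify_compartments(presynaptic_marker, postsynaptic_marker)
print(result)
```

under these assumptions, the posterior lateral protocerebrum is the only compartment classified as input-only, which matches the description above.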
to identify the neurons that project to the post-synaptic terminals formed by the plppns in the posterior lateral protocerebrum, a poorly characterized visual processing center, we used the targeted photo-labeling technique described above ( figure a ). in short, we used a combination of transgenes that, in concert, drive the expression of pa-gfp in all neurons (the n-synaptobrevin-qf and quas-pa-gfp transgenes) and the expression of tdtomato in the plppns (using the r h -gal ad, r g -gal dbd and uas-tdtomato transgenes). guided by the expression of tdtomato, we targeted the post-synaptic terminals formed by plppns in the posterior lateral protocerebrum with high-energy light ( figure b , c). upon photo-labeling, two types of neuron were clearly photo-labeled: the plppns -showing that the photo-activation was specific to these neurons -and neurons projecting from the ventral medulla ( figure d ). these photo-labeled neurons project into deeper layers of the medulla ( figure e ). altogether, from this set of experiments, we conclude that plppns are one of the major input neurons of the a/bp kenyon cells and that they convey information from the posterior lateral protocerebrum, and possibly from the ventral medulla, to the dorsal accessory calyx. photo-labeling using the transgenic lines that displayed a weak grasp signal -these are the r e -gal , r c -gal and r b -gal lines -or no grasp signal -these are the r f -gal , r f -gal and r c -gal lines -led to the identification of a third type of neuron (figure ). the somata of these neurons are all located in the al cluster, a region near the antennal lobe. although each of these neurons has a distinct overall morphology, they all project from the antennal lobe to the superior lateral protocerebrum and extend their axons in a region near the dorsal accessory calyx. therefore, we named this type of neuron "alpns". 
the neurons photo-labeled using the r e -gal and r c -gal driver lines show a nearly identical morphology: in each line, a single neuron that extends its dendrites into the posterior antennal lobe and projects its axons into the superior lateral protocerebrum was visible ( figure a ). we named this neuron "alpn ". it is worth noting that alpn is very similar to the thermosensitive ac neuron that has been previously described (shih and chiang, ) . using a combination of photo-labeling and dye-tracing techniques, as described above, we measured the connectivity rate between alpn and a/bp kenyon cells. we could not detect any connections between alpn and a/bp kenyon cells (n = ), suggesting that alpn is not a dorsal accessory calyx input neuron ( figure h , table ). similarly, the neuron photo-labeled using the r b -gal line extends its dendrites into the column region of the posterior antennal lobe, a region known to be activated by high humidity, and the sub-esophageal ganglion, a gustatory processing center ( figure b ). we named this neuron "alpn ". again, we could not detect any connections between alpn and a/bp kenyon cells (n = ), suggesting that alpn is not a dorsal accessory calyx input neuron ( figure f -h, table ). thus, alpn and alpn are most likely not contributing major input to the a/bp kenyon cells. we extended our analysis to the transgenic lines that displayed no grasp signal. the neurons photo-labeled using the r f -gal and r c -gal driver lines show an overall similar morphology: in each line, a single neuron that extends its dendrites into the arm region of the posterior antennal lobe, a region known to be activated by low humidity, was visible ( figure d ). we named this neuron "alpn ". not surprisingly, we could not detect any connections between alpn and a/bp kenyon cells (n = ), suggesting that alpn does not provide input into the dorsal accessory calyx (table ) . 
similarly, the neuron photo-labeled using the r f -gal line extends its dendrites broadly throughout the anterior antennal lobe, an olfactory processing center ( figure f ). we named this neuron "alpn ". again, we could not detect any connections between alpn and a/bp kenyon cells (n = ), suggesting that alpn does not provide input into the dorsal accessory calyx (table ) . thus, we could confirm that the neurons labeled by the transgenic lines that displayed no grasp signal do not appear to provide major input to the a/bp kenyon cells. altogether, these results show that the third type of neuron identified in our screen -alpns -projects to a region close to the dorsal accessory calyx but is most likely not presynaptic to the a/bp kenyon cells. our results suggest that the lopn and plppns that we identified in our screen connect to the a/bp kenyon cells and that, together, these two types of projection neuron represent a large fraction of the input neurons of the dorsal accessory calyx. our results also suggest that the alpns do not connect to the a/bp kenyon cells, as another study suggested (yagi et al., ) . thus, we conclude that the dorsal accessory calyx is anatomically poised to receive information primarily from the lobula and the posterior lateral protocerebrum, two visual processing centers. we verified whether these findings are corroborated by the recently released drosophila hemibrain connectome (xu et al., ) . using the neuprint . . platform, we focused our attention on the reconstructed a/bp kenyon cells that have been fully traced. altogether, these a/bp kenyon cells receive input from a large number of neurons but most of these synapses are most likely axoaxonic synapses as they are located along the a and b lobes. these input neurons are primarily other kenyon cells, various dopaminergic neurons as well as neurons known to connect broadly to all kenyon cells such as the apl neuron and the dpm neuron (data not shown). 
we focused our attention on the input neurons that connect to the a/bp kenyon cells in the dorsal accessory calyx. not surprisingly, in accordance with the results obtained from our en masse photo-labeling experiment, the cell bodies of these input neurons can be divided into seven different clusters and the ratio of neurons belonging to a given cluster is largely consistent between both analyses ( figure a ). interestingly, the lh cluster is the most numerous cluster: our study identified ± neurons (n = ) in this cluster -including the ± plppns (n = ) -whereas the connectome shows a total of input neurons with cell bodies located in the lateral horn ( figure a ). among these neurons, we identified that are morphologically highly similar to plppns ( figure j -k, table ). we found that these plppn-like neurons connect to of the reconstructed a/bp kenyon cells, representing . % of the input that all a/bp kenyon cells receive in the dorsal accessory calyx ( figure b and table ). plppn-like neurons connect mostly to a/bp kenyon cells but also form a few connections to gd and gmain kenyon cells, reinforcing the observation that the dorsal accessory calyx input neurons and the ventral accessory calyx input neurons form two parallel pathways ( figure c ). in our study, we identified neurons connecting the ventral medulla to the plppns (figure ) but, because the hemibrain connectome does not include the medulla, we could not identify fully traced neurons similar to these plppn input neurons. additionally, our study identified ± neurons (n = ) in the ol cluster -including lopn -whereas the connectome shows a total of input neurons with cell bodies located in the optic lobe ( figure a ). among these neurons, we identified a neuron very similar to lopn (figure f -g). we found that this lopn-like neuron connects to of the reconstructed a/bp kenyon cells, representing . % of the input that all a/bp kenyon cells receive in the dorsal accessory calyx ( figure b and table ). 
as we found in our study, this lopn-like neuron does not connect to any other types of kenyon cell, forming a pathway parallel to the one conveying visual information to the ventral accessory calyx. finally, both studies identified a number of neurons with cell bodies located near the antennal lobe: using en masse photo-labeling, we identified a total of ± neurons (n = ) in the al cluster, whereas nine such neurons are found in the connectome. among these neurons, we recognized three of the four alpns we characterized: an alpn -like neuron and an alpn -like neuron that represent, respectively, . % and . % of the input that a/bp kenyon cells receive in the dorsal accessory calyx ( figure c , g and table ). we could not recognize an alpn -like neuron in the connectome but we identified an alpn -like neuron ( figure e , table ). the alpn -like neuron does not connect to any kenyon cells. together, the nine antennal lobe input neurons -including the alpn -like and alpn -like neurons -represent less than % of the input that a/bp kenyon cells receive in the dorsal accessory calyx ( figure b ). interestingly, most of the remaining input that a/bp kenyon cells receive in the dorsal accessory calyx ( . % of the total input) is from a single neuron projecting from many brain regions including the main calyx and the pedunculus, two regions of the mushroom body, as well as the superior lateral protocerebrum (table ) . altogether, these observations confirm that lopn-like and plppn-like neurons are the major input neurons that connect to the a/bp kenyon cells in the dorsal accessory calyx. thus, the dorsal accessory calyx receives most of its input from visual processing centers -the posterior lateral protocerebrum and the lobula -and is anatomically poised to process mostly, if not strictly, visual information. 
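the connectivity rates reported in this section are proportions of claws: the number of claws found to contact the photo-labeled neuron divided by the total number of claws sampled for that neuron. a minimal sketch of that arithmetic follows; the function name and the example counts are hypothetical placeholders chosen for illustration and are not data from this study.

```python
def connectivity_rate(connected_claws: int, sampled_claws: int) -> float:
    """Percentage of sampled claws that contact the photo-labeled neuron."""
    if sampled_claws <= 0:
        raise ValueError("at least one claw must be sampled")
    if not 0 <= connected_claws <= sampled_claws:
        raise ValueError("connected claws must lie between 0 and sampled claws")
    return 100.0 * connected_claws / sampled_claws


# Placeholder counts, used only to show the calculation.
example_rate = connectivity_rate(connected_claws=3, sampled_claws=20)
print(f"{example_rate:.1f} %")
```

the same ratio, applied to synapse counts rather than claw counts, gives the share of dorsal accessory calyx input attributed to a neuron type in the connectome analysis.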
in this study, we identified and characterized neurons projecting to the dorsal accessory calyx of the mushroom body and showed that these neurons are presynaptic to the a/bp kenyon cells ( figure ). using a combination of genetic and anatomical techniques, we could distinguish two different types of projection neuron: lopn projecting from the lobula -an area of the optic lobe processing visual features such as shape and motion -and the plppns projecting from the posterior lateral protocerebrum. although the posterior lateral protocerebrum remains poorly characterized in d. melanogaster, evidence from other insects shows that this brain region receives input from the optic lobe (paulk et al., , ). interestingly, we found that the dendrites formed by the plppns in the posterior lateral protocerebrum are in close proximity to neurons that project from the ventral medulla. based on our results -and considering insights from the connectome -we estimate that lopns and plppns account for half of the total input that a/bp kenyon cells receive in the dorsal accessory calyx. lopns and plppns do not extend axonal terminals into the ventral accessory calyx, the other calyx known to receive visual input, but rather extend axonal terminals into the dorsal accessory calyx and into the superior lateral protocerebrum. likewise, the a/bp kenyon cells do not connect to the visual projection neurons that are associated with the ventral accessory calyx (vogt et al., ) . these findings suggest that the visual system is connected to the mushroom body via two parallel pathways: the a/bp kenyon cells receive input from the lobula and the posterior lateral protocerebrum, whereas the gd kenyon cells receive input directly from the medulla. further functional studies are necessary to determine what kind of visual information is processed by the a/bp kenyon cells. in drosophila melanogaster, the mushroom body has long been studied as an olfactory processing center. 
however, evidence from many insects, including the honeybee apis mellifera, shows that the mushroom body integrates sensory information across different modalities. in honeybees, the input region of the mushroom body, also called the calyx, is divided into different layers and each layer receives input from either the olfactory or visual system (gronenberg, ) . because the dendrites of kenyon cells are also restricted to specific layers, it has been suggested that, in the honeybee, multisensory integration does not occur at the level of individual kenyon cells, but rather at the population level (ehmer and gronenberg, ) . although the honeybee mushroom body differs greatly from the drosophila mushroom body -it contains about a hundred times as many kenyon cells and its input region is divided into multiple complex layers -it appears that both mushroom bodies share a common fundamental connectivity principle: the segregation of input based on sensory modality. this connectivity mechanism is immediately apparent in the structural organization of the drosophila melanogaster mushroom body: the kenyon cells receiving input from the olfactory system all extend their dendrites into the main calyx, whereas the kenyon cells receiving input from the visual system extend their dendrites into either the dorsal accessory calyx or the ventral accessory calyx. many studies have demonstrated that the kenyon cells that process olfactory information -those associated with the main calyx -integrate input broadly across the different types of olfactory projection neuron (caron et al., ; zheng et al., ) . interestingly, it appears that the kenyon cells that process visual information are wired differently. 
we have a thorough understanding of how olfactory kenyon cells integrate input from the antennal lobe: most kenyon cells receive, on average, input from seven projection neurons and the projection neurons connecting to the same kenyon cell share no apparent common features (caron et al., ; zheng et al., ) . theoretical studies have shown that this random-like connectivity pattern enables the mushroom body to form sparse and decorrelated odor representations and thus maximizes learning (litwin-kumar et al., ) . randomization of sensory input is a connectivity pattern that is well suited for representing olfactory information -as an odor is encoded based on the ensemble of olfactory receptors it activates -and might not be suitable for representing visual information. indeed, our results suggest that specific visual features -the signals processed by the medulla and the ones processed by the lobula and the posterior lateral protocerebrum -need to be represented by two separate subpopulations of kenyon cells. this observation mirrors anatomical studies of the honeybee brain: the neurons projecting from the lobula terminate in a different layer than the neurons projecting from the medulla (ehmer and gronenberg, ) . this arrangement might be essential to preserve distinct visual features when forming associative memories. functional and behavioral studies are required to determine whether the mushroom body indeed represents multisensory stimuli in this manner. flies (drosophila melanogaster) were raised on standard cornmeal agar medium and maintained in an incubator set at °c, % humidity with a hours light / hours dark cycle (percival scientific, inc.). crosses were set up and reared under the same conditions, but the standard cornmeal agar medium was supplemented with dry yeast. the strains used in this study are described in the table below. two to six-day-old flies were used for all photo-labeling experiments. 
the protocol was largely based on the one developed in a previous study (aso et al., ) . in short, brains were dissected in saline ( mm nacl, mm kcl, mm hepes, mm trehalose, mm sucrose, mm nah po , mm nahco , mm cacl , mm mgcl , ph≈ . ), treated for minute with mg/ml collagenase (sigma-aldrich) and mounted on a piece of sylgard placed at the bottom of a petri dish. photo-labeling and image acquisition were performed using an ultima two-photon laser scanning microscope (bruker) with an ultrafast chameleon ti:sapphire laser (coherent) modulated by pockels cells (conoptics). for photo-labeling, the laser was tuned to nm with an intensity of - mw; for image acquisition, the laser was tuned to nm with an intensity of - mw (both power values were measured behind the objective lens). a x water-immersion objective lens (olympus) was used for both photo-labeling and image acquisition. a x water-immersion objective lens (olympus) was used for image acquisition in some experiments (the ones described in figure and figure ) . a gaasp detector (hamamatsu photonics) and a pmt detector were used for measuring green and red fluorescence, respectively. photo-labeling and image acquisition files were visualized on a computer using the prairie view software version . (bruker). image acquisition was performed at a resolution of by pixels, with a pixel size of . μm ( x lens) or . μm ( x) and a pixel dwell time of μs. each pixel was scanned times. for photo-labeling of the dorsal accessory calyx (or the posterior lateral protocerebrum), a volume spanning the entire neuropil was divided into eight to planes with a step size of μm. the mask function of the prairie view software was used to mark the targeted region in every plane, and the boundaries of the mask were determined based on the red fluorescent protein dsred expressed by the a/bp kenyon cells. the photo-labeling step was performed using a pixel size of . μm and a pixel dwell time of μs. each pixel was scanned four times. 
each plane was scanned times with a -second interval. the entire photo-labeling cycle was repeated two to three times, with a -minute resting period between cycles. the entire brain was imaged before and after photo-labeling using the x water-immersion objective lens (olympus, japan). the number of cell bodies recovered after the photo-labeling was measured using the multi-point tool function of the fiji software (schindelin et al., ) . for the photo-labeling of single neurons, a slightly modified protocol was used. instead of a volume, a single square plane of . μm by . μm centered on the soma of a neuron was scanned to times with a seconds interval between scans. photo-labeled brains were fixed in % paraformaldehyde diluted in x phosphate buffered saline (pfa) for minutes at room temperature, washed five times in . % triton x- diluted in x phosphate buffered saline technologies) were used. on the following day, brains were washed four times in pbst and mounted on a slide (fisher scientific) using the mounting medium vectashield (vector laboratories inc.). immuno-stained brains were imaged using an lsm confocal microscope (zeiss). the neuropils innervated by the input neurons were identified by comparing the confocal images with the adult brain template jfrc available on virtual fly brain (https://v .virtualflybrain.org/) (jenett et al., ) . one to three-day-old flies were used when mapping the connectivity rate between input neurons and α/βp kenyon cells. brains were dissected in saline, treated for one minute with mg/ml collagenase (sigma-aldrich) and mounted on a piece of sylgard placed at the bottom of a petri dish. the imaging protocol is the same as described above but the photo-labeling protocol is different. each of the input neurons was photo-labeled using a single plane centered on either its soma or its projection and by scanning the plane three to five times. each pixel was scanned eight times with a pixel size of . μm and a pixel dwell time of μs. 
a fire-polished borosilicate glass pipette ( . mm i.d., . mm o.d., cm length; sutter instruments) was pulled using the p- micropipette puller (sutter instruments) and backfilled with texas red dye (lysine-fixable mw; life technologies) dissolved in saline. the tip of the pipette was positioned next to the cell body of a randomly chosen α/βp kenyon cell under the two-photon microscope. the dye was electroporated into the cell body using three to five - millisecond pulses of - v. the dye was allowed to diffuse within the kenyon cell for minutes before the brain was imaged. all confocal images were collected on an lsm confocal microscope (zeiss). for imaging whole brains, each sample was imaged twice using a plan-apochromat x/ . oil m objective lens. first, the entire brain was divided into four tiles, each tile was imaged separately (voxel size = . μm by . μm by μm, by pixels per image plane) and then stitched together using the stitch function of the zen microscope software (zeiss). for imaging specific neuropil regions, the same objective lens was used, but with higher resolution (voxel size = . μm by . μm by μm, by pixels per image plane). for imaging brains manipulated using the grasp technique, a plan-apochromat x/ . oil m objective lens was used in combination with the rgb airyscan mode. images were processed using the airyscan function of the zen microscope software. all confocal images were analyzed using the fiji software (schindelin et al., ) . all figure panels are maximum intensity projections of confocal stacks or sub-stacks. (a) a schematic of the drosophila brain shows how the frequency of connections between the α/βp kenyon cells (red) and a given input neuron (green), here lopn, was measured. (b-g) a given input neuron was photo-labeled (bright green) and a randomly chosen a/bp kenyon cell was dye-filled (red). 
the total number of claws formed by the dye-filled kenyon cell was counted and claws connecting to the axonal terminals of the photo-labeled neuron were detected (arrows). such connections were found for lopn (b-c), plppns (d-e) but not for the other neurons identified in this study such as alpn (f-g). (h) the frequency of connections between α/βp kenyon cells and a given input neuron was calculated by dividing the number of connections detected (for instance arrows in c and e) by the total number of claws sampled for that particular input neuron. the following genotypes were used in this figure: yw/yw;uas-c pa-gfp unknown /uas-spa-gfp attp ; r f -gal ad vk ,r d -gal dbd attp /r_line(as indicated in the panel)- the neuronal architecture of the mushroom body provides a logic for associative learning understanding the retinal basis of vision across species complete connectomic reconstruction of olfactory projection neurons in the fly brain random convergence of olfactory inputs in the drosophila mushroom body the drosophila pheromone cva activates a sexually dimorphic neural circuit segregation of visual input to the mushroom bodies in the honeybee (apis mellifera) gfp reconstitution across synaptic partners (grasp) defines cell contacts and synapses in living nervous systems subdivisions of hymenopteran mushroom body calyces by their afferent supply systematic analysis of the visual projection neurons of drosophila melanogaster: lobula-specific pathways what can tiny mushrooms in fruit flies tell us about learning and memory? 
a gal -driver line resource for drosophila neurobiology object-detecting neurons in drosophila olfactory networks: from sensation to perception context generalization in drosophila visual learning requires the mushroom bodies refined spatial manipulation of neuronal function by combinatorial restriction of transgene expression dynamic labelling of neural connections in multiple colours by trans-synaptic fluorescence complementation connectomics analysis reveals first, second, and third order thermosensory and hygrosensory neurons in the adult drosophila brain limited taste discrimination in drosophila the color-vision circuit in the medulla of drosophila visual processing in the central bee brain the processing of color, motion, and stimulus timing are anatomically segregated in the bumblebee brain a dimorphic pheromone circuit in drosophila from sensory input to descending output fiji: an open-source platform for biological-image analysis. organization of antennal lobe-associated neurons in adult drosophila melanogaster brain dye fills reveal additional olfactory tracts in the protocerebrum of wild-type drosophila direct neural pathways convey distinct visual information to drosophila mushroom bodies shared mushroom body circuits underlie visual and olfactory memories in drosophila visual projection neurons in the drosophila lobula link feature detection to distinct behavioral programs a connectome of the adult drosophila central brain convergence of multimodal sensory pathways to the mushroom body calyx in drosophila melanogaster dissecting neural circuits for multisensory integration and crossmodal processing a naturally monomeric infrared fluorescent protein for protein labeling in vivo a complete electron microscopy volume of the brain of adult drosophila melanogaster r h -gal ) or weakly (c, r h -gal ) in neurons projecting to or from the dorsal accessory calyx (white dashed outline) were selected for further investigation. 
these images were obtained from the flylight website. (d, e) the dorsal accessory calyx (white dashed outline) of the mushroom body (red) was visualized in the selected lines, here r h -gal , and targeted for photo-activation by designing a mask that exposed the outlined region to high energy light; transgenic lines driving expression in a few neurons extending clear axonal terminals in the dorsal accessory calyx (e, arrow) were selected for further investigation. (f, g) before photo-labeling the following genotype was used in the d-g panels: yw/yw;mb -dsred unknown ,uas-c pa-gfp unknown /uas-c pa-gfp attp b-c) lopn (bright green) was identified in the screen using two different transgenic lines (b: r h -gal and c: r d -gal ); the neurons photo-labeled in each line show an overall similar morphology. (d-e) lopn was photo-labeled using either the r h -gal (d) or the r d -gal (e) transgenic line; the samples were fixed, immuno-stained (nc antibody, blue) and imaged. the photo-labeled neurons show an overall similar morphology: their somata are located near the optic lobe; they extend dendritic terminals in a small region of the lobula (lo); they extend axonal terminals in the dorsal accessory calyx (daca) and the superior lateral protocerebrum (slp). (f-g) a lopn-like neuron was identified in the hemibrain connectome: (f) this neuron projects from the lobula to the dorsal accessory calyx and the superior lateral protocerebrum; (g) its axonal terminals innervate the dorsal accessory calyx (daca), but not the main calyx (ca) or the ventral accessory calyx (vaca) e) yw/yw;mb -dsred unknown ,uas-c pa-gfp attp /uas-c pa-gfp unknown b-c) a group of plppns (bright green) was identified in the screen using two different transgenic lines (b: r h -gal and c: r g -gal ). (d) plppns were photo-labeled using the r h -gal transgenic line; the sample was fixed, immuno-stained (nc antibody, blue) and imaged. 
the photo-labeled neurons show an overall similar morphology: their somata are located near the lateral horn; they extend projections in the posterior lateral protocerebrum (plp), the superior clamp (sc), the superior lateral protocerebrum (slp) and the dorsal accessory calyx (daca). (e-f) the expression patterns observed for each of the gal lines are broad and include many neurons; (e) the r h -gal line drives expression strongly in many neurons, including the plppns, (f) whereas the r g -gal line drives expression weakly in fewer neurons, including the plppns. these images were obtained from the flylight website. (g) a split-gal j) these neurons project from the posterior lateral protocerebrum to the dorsal accessory calyx, the superior lateral protocerebrum and the superior clamp; (k) their axonal terminals innervate the dorsal accessory calyx (daca), but not the main calyx (ca) or the ventral accessory calyx (vaca). the following genotypes were used in this figure: (b, d) yw/yw; mb -dsred unknown , uas-c pa- gfp unknown /uas-c pa-gfp attp c) yw/yw; uas-c pa-gfp unknown /cyo;uas-c pa-gfp attp , r g -gal attp /mkrs; (g) yw/yw; mb -dsred unknown , uas-c pa-gfp unknown /cyo; uas-c pa-gfp attp unknown ; (i) yw/yw;uas-denmark /cyo the authors declare that no competing interests exist. key: cord- -cquxqbc authors: martinez, xavier; baaden, marc title: fair sharing of molecular visualization experiences: from pictures in the cloud to collaborative virtual reality exploration in immersive d environments date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cquxqbc motivated by the current covid- pandemic that has spurred a substantial flow of structural data we describe how molecular visualization experiences can be used to make these datasets accessible to a broad audience. 
using a variety of technology vectors related to the cloud, d- and virtual reality gear, we examine how to share curated visualizations of structural biology, modeling and/or bioinformatics datasets for interactive and collaborative exploration. we discuss f.a.i.r. as an overarching principle for sharing such visualizations. we provide four initial example scenes related to recent covid- structural data together with a ready-to-use (and share) implementation in the unitymol software. synopsis visualization renders structural molecular data accessible to a broad audience. we describe an approach to sharing molecular visualization experiences based on fair principles. our workflow is exemplified with recent covid- related data. the covid- pandemic has spurred a wealth of new structural biology data related to the severe acute respiratory syndrome coronavirus (sars-cov)- virus and its interactions with other macromolecules involved in the development and spread of the disease [ , ]. such structural data is transparent to experts in the field, but requires adequate visualization to become accessible to a broader audience. as put by card et al., "visual artifacts aid thought; in fact, they are completely entwined with cognitive action" [ ]. visualizations prepared by experts may serve both educational and research purposes by highlighting key aspects of a given dataset or comparing features across several datasets, thereby supporting thought and reasoning. here we aim to establish a general workflow to prepare such curated educational and research material, taking the advent of recent structural and modeling data as an example. our overall goal is to enable easy sharing of a given visual experience with others so that the same content can be accessed in various ways. we focus on collaborative virtual reality (vr) as well as on sharing d models and scenes to intuitively convey the spatial complexity of these molecular objects. 
shape-related features bear particular relevance for drug design and hence for expert users, yet they can also be understood by an inexperienced person, in particular through a visual experience such as those described here. a particular aim is to treat the sharing of visual experiences with the same f.a.i.r. principles [ ] that one should apply to the underlying raw chemical [ ], structural and modelling data, and as is generally the case in crystallography [ ]. this target is challenging in itself, as common fair-based sharing platforms do not provide a category that is well adapted to visual experiences. the features of the workflow we are seeking to implement are schematically summarized in figure . starting from the original raw data that should encompass experimental structural datasets, molecular modelling results and bioinformatics findings, we aspire to prepare new shareable visual content to be set up and curated by an expert. to do so, we will use a molecular visualization software package to describe the visualization scene and all its annotations through command scripts. the experience can be customized through add-ons, for example providing dataset-specific user menus leading to a visual scene that can be explored directly in the software. this exploration does, however, require learning at least some basic usage of that software and how to manipulate the scene. this task can be simplified to some extent by designing appropriate simple-to-use custom user menus. as an alternative, the core content to be visualized can be exported to a variety of media such as still images, movies of various formats (possibly stereoscopic ones), native d models or a fully interactive scene to be explored with advanced technology. we aim to support a broad variety of sophisticated hardware such as vr headsets, wall-size stereoscopic displays and the hololens, but also simple setups such as google cardboard or even just running in a web browser. 
to render these media findable, accessible, interoperable and reusable, several options are discussed. the set of (open source) software plus openly accessible scripts that we propose to provide makes the visual experience as reproducible as possible [ ]. inherent limitations arise from technical hardware and software dependencies related to rapidly evolving visualization technology that are by nature short-lived with a high rate of change. we chose the unitymol package to implement our proof-of-concept set of visualization experiences, as we have been developing and redesigning this software, which provides specific advantages for the tasks at hand in its latest version [ , ]. many alternative software options of high value could have been considered, such as those recently reviewed for protein visualization [ ]. we opted for unitymol as it provides a good compromise between feature-richness and ease of use. a wide variety of media and technology vectors is supported, while the software remains simple to use for a broad range of users due to its single-window design with only a few menus to operate. our ambition is to provide a few proof-of-concept implementations that others can subsequently build on, improve or simply adapt. we have implemented all functionalities that we deemed necessary to prepare the covid- -related materials, and present here four basic examples. based on these example scenes, we experiment with applying f.a.i.r. principles to such visualization-centric data. one way to achieve this goal is by addressing the setup of the visualization experiences, i.e. the underlying software and the related scripts (designated as "raw elements" in fig. ), rather than the resulting media products. a more classical approach is to provide the derived media, which may consist of videos or images for which sharing platforms exist, although this is not enough to optimize fair compliance [ ]. 
a scheme summarizing the suggested workflow for fair sharing of molecular visualization experiences, building on covid- -related structural data. using existing data, new shareable visual content is created from raw elements such as a script with instructions, software to execute the script and optional add-ons to facilitate the end-user tasks (for instance custom menus to provide only the strictly necessary shortcuts for a given experience). the visual scene that is created can be explored within the software itself or it can be exported in various ways to generate derived media such as images, movies, d models and even full experiences in e.g. vr contexts. these media can be made accessible through platforms that implement fair principles and ease their exploration. sharing of visualization experiences appears to be a particularly timely topic. the case of the covid- pandemic provides a compelling example of the need to make the latest research data explorable by the broad scientific community. if such data were only to be provided in its raw form, the accessibility would be quite limited. the worldwide protein data bank [ ] indeed provides quite a few visualizations along with the actual data files. recently, the european pdb also proposed a visual analytics platform that is fair compliant [ ]. here we try to go one step further in the democratization of such data by preparing curated visual experiences as a valuable support for cross-disciplinary scientific exchange. in related fields, similar initiatives have recently emerged. now more than ever, sharing data from molecular simulations and modeling is becoming a priority to ensure reproducibility and accelerate discovery [ ]. an increasing number of initiatives aim to develop open access and reproducible molecular simulations (see for example [ ] [ ] [ ] among other approaches). 
sharing such molecular dynamics (md) simulation data equally needs efficient visualization and collaboration tools [ ] for apprehending such data remotely to avoid unnecessary data transfers. yet to the best of our knowledge, current initiatives focus on the actual data, not so much on the visualizations and annotation thereof. concerning the specific case of covid- , many data portals and hubs were put together to provide more unified access to relevant data. a brief selection is listed in table and served as testing ground for our visualization experiments. many of the listed portals provide a very direct access to the data we require for our visualization experiences, a few provide more general search engines with a less direct path to the actual dataset of interest. we did not include the classical routes to search data such as the pdb, gigadb [ , ] or zenodo and we only included one of the many relevant initiatives by individual scientists or laboratories as example. here we will briefly describe key functionalities implemented for the purpose of sharing molecular visualization experiences. as an overarching principle, we want to be able to generate a broad variety of media, including still images and videos based on these visual scenes describing the core of the contents, capturing them from within the running software, ideally in an executable build and for a handful of technically more involved features from within the unity d editor environment. for this purpose we implemented, extended and stabilized several features. a certain number of those features are still experimental only. we added a python console and api for scripting within unitymol, providing an easy way to formalize and share the narrative of a given experience, as well as facilitating the reproducibility of all necessary steps leading to a scene. 
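the idea of formalizing the narrative of an experience as a script can be pictured with a small, tool-agnostic sketch. it is not the unitymol api: the command strings below are hypothetical placeholders, and the helper names (`render_script`, `script_digest`) are assumptions introduced only for illustration. the sketch turns an ordered list of annotated steps into a plain-text script and computes a checksum, so that two people can check that they are replaying the same narrative.

```python
import hashlib
from typing import List, Tuple

# One step of the narrative: (command text, human-readable note).
# The command text is an opaque placeholder, not a real UnityMol call.
Step = Tuple[str, str]


def render_script(title: str, steps: List[Step]) -> str:
    """Render the ordered steps as a commented plain-text script."""
    lines = [f"# {title}"]
    for number, (command, note) in enumerate(steps, start=1):
        lines.append(f"# step {number}: {note}")
        lines.append(command)
    return "\n".join(lines) + "\n"


def script_digest(script: str) -> str:
    """SHA-256 digest of the script text, usable as a sharing checksum."""
    return hashlib.sha256(script.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical commands, shown only to illustrate the structure.
    demo_steps = [
        ("load_structure('example.pdb')", "load the structure of interest"),
        ("show_surface('all')", "display the molecular surface"),
    ]
    text = render_script("demo scene", demo_steps)
    print(text)
    print(script_digest(text))
```

keeping the script as plain text with one comment per step is a deliberate choice in this sketch: the same file then serves both as the executable recipe and as its human-readable description.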
we furthermore implemented the ability to generate custom user menus so that complex and fine-tuned functionality can be achieved with the click of a button. a possible practical use could be to group shortcuts to relevant structural ensembles in light of covid- for visual analysis, comparison or even manipulation. the latter can be achieved through interactive simulations: unitymol implements the generic imd protocol for steering molecular simulations through mddriver [ ], and provides a few in-app implementations such as a rigid protein-protein docking module. the scenes can be annotated by the user through several functionalities such as text bullets or freehand drawing. a guided tour function was implemented to walk the user through a set of predetermined key frames in a fully interactive experience. a particular focus concerns the ability to generate visually striking, high-quality graphics, for instance in order to convey the complex shapes of molecular surfaces using ambient occlusion (ao) or to draw the user's attention to a certain part of the structure of interest using photography effects such as depth-of-field (dof). in addition to the molecular structures themselves, their properties such as the electrostatic field can be depicted as well [ ]. in the latest version of the software we added an experimental feature enabling interactive raytracing using the ospray [ ] package. this is illustrated for our first example of the spike trimer complex with the ace enzyme ( figure a ). we added a custom menu to simplify navigating through this example ( figure b ). we explored the scene collaboratively in a -user virtual reality session ( figure c ) where avatars represent each participant. to capture such sessions, still images from screenshots or movies from video capture can be created at user-defined resolution. these media can be generated and exported with specific options such as a degree view or a d stereoscopy enhancement. 
for these purposes, specific cameras had to be added within unity. in general, image and video exports do not, however, preserve the full complexity of the initial object. to address this, we added the export of textured d polygonal objects in the common obj, fbx and probably soon also gltf formats. special effects such as ao and dof cannot currently be exported. such d models can be transformed into tangible physical objects through d printing [ ]. we have started the process of building up a collection of covid- example scripts with several objectives: they should be easy to understand, even for non-specialists; others should be able to re-run them under various conditions in terms of visualization hardware and modality and (in the future) it should be simple to use them as a starting scene for shared multi-user sessions. a particular emphasis lies on the design of visual experiences suitable and tested for virtual reality (vr) exploration as our current research is focused on this technology. in our opinion its potential is still largely underexplored in academia and industry, be it for research, collaboration or education purposes [ ]. to illustrate the sharing of such visual experiences, we have prepared four typical datasets (depicted in figure and figure ): a simple structural view, a collection of small molecules binding to a viral protein, an md trajectory and a bioinformatics sample with conservation as well as mutation data. haddock docking score from [ ] and their size is varied according to residue conservation assessed through a shannon entropy measure. the first example depicted in figure aims at setting up a simple structural view of the sars spike glycoprotein complex with human angiotensin-converting enzyme (ace ). the trimeric complex accessible from pdb-id cs [ ] was used as a starting point for many early sars-cov- studies. we then have a comparative look at where drug molecules bind to the main protease of sars-cov- in our second example ( figure a ). 
the visualization is inspired by the animation of small molecules in protein data bank structures (https://www.rbvi.ucsf.edu/chimerax/data/sars-protease-may /) prepared by the chimerax [ ] team. we created a d print of the protease monomer as well. molecular dynamics simulations provide another invaluable resource for insight into molecular mechanisms. example ( figure b ) is based on a trajectory depicting a binding event between the receptor-binding domain (rbd) of the sars-cov- spike and the human ace receptor (desres-anton-[ , ] of ref. [ ] ). the fourth example illustrates a visual experience related to representing the results of bioinformatics analyses. our example is built upon the freely available data from a recent study on cross-species transmission of sars-cov- , highlighting the species variability of viral-host protein interactions [ ]. we visually map this data onto the ace -rbd complex ( figure c ). the first level of sharing these four examples is to make the software and the scripts themselves available. we considered the zenodo and gigadb platforms and chose zenodo as the main platform for our experiment. typically, no dedicated category for "visual experiences" exists in these databases, whereas the classical ones such as software, dataset, image, video, workflow or other only partially reflect this case. as a first approach, we created a zenodo community ( figure a ) on fair sharing of molecular visualization experiences at https://zenodo.org/communities/fair-molvisexp/. we then linked a github repository for our visualization scripts to zenodo and the collection. we produced a set of derived materials including pictures, movies and d objects. cloud platforms for pictures and movies are very common nowadays and we will not go into much detail on this aspect. 
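for the bioinformatics example, residue conservation is said to be assessed through a shannon entropy measure and mapped onto the size of the displayed elements. the following is a minimal sketch of such a measure, not the authors' actual procedure: the gap handling, the normalization by log2(20) and the direction of the size mapping (more conserved residues drawn larger) are all assumptions, as are the function names.

```python
import math
from collections import Counter


def shannon_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column.

    Gap characters ('-') are ignored; an empty column yields 0.0.
    """
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conservation_to_radius(entropy: float,
                           r_min: float = 0.5,
                           r_max: float = 2.0,
                           max_entropy: float = math.log2(20)) -> float:
    """Map entropy to a display radius (assumed mapping).

    Low entropy (high conservation) gives a radius near r_max,
    high entropy gives a radius near r_min.
    """
    score = 1.0 - min(entropy / max_entropy, 1.0)
    return r_min + score * (r_max - r_min)


if __name__ == "__main__":
    # Toy alignment columns, one string per position.
    for col in ["AAAA", "AACC", "ACDE"]:
        h = shannon_entropy(col)
        print(col, round(h, 3), round(conservation_to_radius(h), 3))
```

a fully conserved column has zero entropy, and a column split evenly between two residue types has an entropy of one bit; the radius mapping simply rescales this to a range that a viewer can display.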
we used both the zenodo and figshare platforms (http://figshare.com) for pictures and for movies, the former for consistency and the latter because of its visually oriented interface and popularity. to group our outputs on figshare, we created a collection (https://doi.org / . /m .figshare.c. ) shown in figure b . we also experimented with tangible physical objects as a particularly accessible form for d models through d printing. many generic d printing databases exist, whereas the nih d print exchange [ ] is a specific research initiative dedicated to bioscientific d prints. we deposited our models in this resource ( figure c ). such d models can also be disseminated and shared virtually, which is a less common process. we experimented with four platforms: sketchfab (https://sketchfab.com ; figure d ), google poly (https://poly.google.com ; figure e ), printing and an fbx file for d model building. referencing our datasets on these platforms represents only the first, relatively easy step towards implementing the fair principles. the main issues may lie in the interoperability and reusability aspects for the (meta)data associated with the visual experiences. for example, metadata is needed to describe software version, script purpose, underlying datasets, dependencies and the variants of output that can be produced. there are definitely many ways to go about implementing the fair principles. in this early attempt at visual experiences, we first and foremost want to raise awareness about this as yet overlooked category, but do not aim to provide a full-fledged solution. concerning the requirement that our data be findable, the choice to target covid- makes this goal relatively easy to achieve as there is an abundance of hubs for increasing the findability of such data as described briefly above (table ). the accessible attribute should be taken care of by our choice of well-established fair-compliant databases for most of the produced media. 
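the metadata items listed above (software version, script purpose, underlying datasets, dependencies and output variants) can be gathered in a small machine-readable record. the sketch below is only one possible layout under stated assumptions: the class name, field names and example values are hypothetical and do not follow any schema defined by zenodo, figshare or unitymol.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class VisualExperienceMetadata:
    """Hypothetical metadata record for one shared visual experience."""
    title: str
    software: str
    software_version: str
    script_purpose: str
    datasets: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    output_variants: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record as indented JSON with sorted keys."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # All values are placeholders chosen for illustration.
    record = VisualExperienceMetadata(
        title="example scene",
        software="unitymol",
        software_version="x.y.z",
        script_purpose="simple structural view",
        datasets=["placeholder-dataset-id"],
        output_variants=["image", "movie", "3d model"],
    )
    print(record.missing_fields())
    print(record.to_json())
```

a check such as `missing_fields` is a cheap way to notice an incomplete record before deposition; which fields are mandatory would depend on the target platform.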
interoperability is clearly an area for further improvement. in particular we consider bridges between different visualization environments as an important way forward. initiatives fostering universal protocols that can be interchanged, such as molql for the query language to define selections [ ], need to be extended to describe all ingredients needed for a visual experience, in particular the molecular scene representation. as a modest step in this direction we have experimented with a pymol session interpreter for unitymol (https://github.com/lbt-cnrs/pymoltounitymol), so that pymol users can re-use their previously prepared scenes. reproducibility is intrinsically a difficult aim for visual experiences. however, by providing scripts with full instructions for the setup of the experience and by making the corresponding visualization software available as a versioned open source project along with ready-to-use executable builds for many operating systems and platforms, reproducibility and accessibility are maximized. furthermore, the software is built within a game engine, which by design is easily extensible, with a shallow learning curve. from this early experiment, we plan to expand the range of example visual experiences, be it based on the context of the rapidly growing covid- data or more generally the overwhelming volume of structural, modeling and dynamic data of macromolecular assemblies. we are trying to optimize the generated objects so that they can run on devices with limited hardware specifications. an example would be the hololens, with good ergonomics but limited graphical power [ ], for which we want to be able to export simplified fbx/gltf polygonal models that run smoothly. most of our efforts will be dedicated to refining the multi-user collaborative features to enable visual sessions for joint exploration. 
more generally, our contribution calls for considering whether visual experiences should be assigned a specific category to be added to fair sharing platforms in order to take into account the specificities of such interactive, computer-supported graphical representations of research data. visualizing an unseen enemy; mobilizing structural biology to counter covid- chemistry and biology of sars-cov- . chem s readings in information visualization: using vision to think the fair guiding principles for scientific data management and stewardship chemical data in life sciences r&d and the fair principles fact and fair with big data allows objectivity in science: the view of crystallography scientific workflows for computational reproducibility in the life sciences: status, challenges and opportunities visualizing biomolecular electrostatics in virtual reality with unitymol-apbs visualizing protein structures -tools and trends how fair can you get? image retrieval as a use case to ieee th international conference on e-science (e-science) announcing the worldwide protein data bank pdbe-kb: a community-driven resource for structural and functional annotations a community letter regarding sharing biomolecular simulation data for covid- sharing data from molecular simulations ten simple rules on how to create open access and reproducible molecular simulations of biological systems about the need to make computational models of biological macromolecules available and discoverable mdsrv: viewing and sharing molecular dynamics simulations on the web gigadb: announcing the gigascience database increased interactivity and improvements to the gigascience database complex molecular assemblies at hand via interactive simulations tracing framework for scientific visualization tangible interfaces for structural molecular biology will chemists tilt their heads for virtual reality? 
molecular dynamics simulations related to sars-cov- insights on cross-species transmission of sars-cov- from structural modeling stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis meeting modern challenges in visualization and analysis the nih d print exchange: a public resource for bioscientific and biomedical d prints molql: towards a common general purpose molecular query language interactive molecular graphics for augmented reality using hololens acknowledgements this work was supported by the "initiative d'excellence" program from the french state (grant "dynamo", anr- -labx- and grant "cacsice", anr- -eqpx- ). x.m. and m.b. thank sesame ile-de-france for co-funding the display wall used for data analysis. x.m. and m.b. thank ucb biopharma for support. we thank nawel khenak for help with d printing and nicolas férey and antoine taly for experiments with laser and led-lighting of translucent d-printed molecules. key: cord- -e vc q j authors: yoon, hye-jin; jeong, hyunah; lee, hyung ho; jang, soonmin title: signature of n-terminal domain (ntd) structural re-orientation in npc for proper alignment of cholesterol transport: molecular dynamics study with mutation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: e vc q j the lysosomal membrane proteins npc (niemann-pick type c ) and npc (niemann-pick type c ) are the main players in cholesterol control in the lysosome, and mutations in these proteins are known to lead to a cholesterol-trafficking disease called niemann-pick type c (npc) disease. the mutation r w or r q in npc is one such disease-related mutation; it reduces cholesterol transport by half, resulting in accumulation of cholesterol and lipids in the late endosomal/lysosomal region of the cell. 
even though there has been significant progress in understanding cholesterol transport by npc in combination with npc , especially after the structural determination of full length npc in , many details, such as the interaction of full length npc with npc , the molecular motions responsible for cholesterol transport during and after this interaction, and the structure-function relations of many mutations, are still not well understood. we report extensive molecular dynamics simulations to gain insight into the structure and motions of the npc lumenal domain during cholesterol transport and into the disease mechanism behind the mutation (r w). we find that the mutation induces a structural shift of the ntd (n-terminal domain) toward the loop region in the mld (middle lumenal domain), which is believed to play a central role in the interaction with the npc protein, such that the interaction with the npc protein might be less favorable than with wild-type npc . the simulations also indicate a possible re-orientation of the ntd, which aligns to form an internal tunnel after receiving cholesterol from npc , in wild-type npc but not in the mutant; this is a possible pose for further action in cholesterol trafficking. we believe the current study can provide a better understanding of cholesterol transport by npc , especially the role of the ntd of npc , in combination with the npc interaction. synopsis modeling study of cholesterol binding protein npc cholesterol homeostasis is maintained by many different proteins, depending on the tissue of the body. [ ] blood cholesterol levels are regulated by several processes, including bio-synthesis, cholesterol absorption/re-absorption, and biliary clearance and excretion. [ , ] elevated blood cholesterol levels contribute to atherosclerotic coronary heart disease. 
[ , ] previous studies have shown that lowering the levels of plasma cholesterol significantly reduces the risk of cardiovascular diseases associated with diabetes mellitus even in diabetic patients with normal levels of plasma cholesterol. [ ] therefore, control of cholesterol for therapeutic purposes in diseases such as cardiovascular disease is of fundamental importance. tremendous effort has been devoted to understanding how different organ systems, not only the liver and the intestine but also the less explored brain, coordinate their cellular mechanisms to control body cholesterol homeostasis. [ , ] the transmembrane protein niemann-pick type c (npc ) inside the lysosome is one of the key players in cholesterol transport, which is mediated by niemann-pick type c (npc ). [ , ] npc facilitates transport of ldl-derived cholesterol out of lysosomes for subsequent delivery to the endoplasmic reticulum and plasma membrane. npc functions mostly in tandem with npc , a soluble lysosomal protein, to move unesterified cholesterol. [ ] abnormal function of npc due to factors such as mutation causes accumulation of cholesterol within the lysosome, leading to a disease called npc disease. both the crystal structures of the n-terminal lumenal domain (ntd) of npc with and without cholesterol (pdb id: gki and gkh) [ ] and the cryo-em (cryogenic electron microscopy) structures of full length npc with and without cholesterol in the ntd (pdb id: jd and jnx) have been reported. [ ] the npc glycoprotein has helices in transmembrane domains (tmd) and three relatively large, lumenally oriented domains. the n-terminal lumenal domain (ntd) contains a cholesterol binding pocket, while the newly determined structure of the cysteine-rich c-terminal domain (ctd) contains a loop region close to the ntd, indicating the importance of this region in cholesterol transport, in conjunction with maintaining the orientation of the ntd needed to receive cholesterol from npc . 
[ ] it is believed that the soluble npc accepts ldl-cholesterol and delivers it directly to the ntd, the so-called "hand-off mechanism". [ ] the two protruding mld loops in npc are known to mediate the interaction with npc as well as with the glycoprotein of ebola virus, and the structures of the mld in complex with the ebola glycoprotein and with npc have both been determined. [ , [ ] [ ] [ ] [ ] in fact, npc is important not only as a mediator of cholesterol transport but also as a mediator of infection by viruses such as sars coronavirus and ebola virus. in particular, npc has received much attention as a possible target protein in connection with the recent covid- pandemic, because inhibiting this protein, and thereby mimicking npc disease, could reduce the replication of coronaviruses including the covid- virus. [ , ] several computational or modeling studies of cholesterol transport through npc have complemented the experimental results. a putative structure of cholesterol-loaded npc in complex with the ntd was suggested by modeling studies. [ ] [ ] [ ] the detailed sliding-like hand-off of cholesterol, especially the isomerization of cholesterol during transport from npc to the ntd, was described in a qm/mm study within a fixed npc /ntd conformational framework. [ ] based on the x-ray structure of npc in complex with the npc -mld region, another possible ntd/npc complex was also suggested. [ ] the interface contacts between the ntd and npc differ considerably between these two putative complexes, especially in the angle between the ntd and npc and in the orientation of npc relative to the ntd. molecular dynamics simulations show that one of the complexes, termed the "texas model", has a favorable interaction at the npc /ntd interface when cholesterol is on the npc side, while the same complex dissociates when cholesterol is on the ntd side or when there is no cholesterol on either side.
[ , ] on the other hand, the other complex, coined the "california model", shows the opposite behavior. based on these observations, it was suggested that the texas model may correspond to the initial structure for cholesterol transport from npc to the ntd, while the california model corresponds to the structure after cholesterol transport. [ ] even though there has been significant progress in understanding dietary cholesterol exchange in the cellular environment in connection with npc , especially after the structural determination of full-length npc , many of the details still remain unclear. the location or orientation of the ntd in the full-length npc structure might not be the active form, and the ntd might change its orientation or location relative to that structure during cholesterol transfer from npc in order to align properly. [ , , ] eventually, it is believed that once cholesterol has been transferred from npc to the ntd, it should be delivered to the sterol sensing domain (ssd), which lies within the helix bundle in the membrane region, possibly triggering a sequence of events once it arrives there. [ , ] the transfer of cholesterol from the ntd to the ssd could proceed through a conduit-like channel like the one observed in the patched protein by a proton-driven network [ , ] . note that a modeling study of the disease-causing l p mutation indicates a breakdown of this tunnel in npc , [ ] emphasizing the importance of this tunnel in cholesterol transport. a recent cryo-em structure of npc with the npc blocker itraconazole showed the blocker located in this tunnel, supporting this scheme. [ ] in this context, possible binding sites were recently suggested through a modeling study. [ ] we note that the channel was observed in a short molecular dynamics simulation with no cholesterol in the ntd.
[ ] interestingly, a very recent molecular dynamics study of npc with cholesterol placed in the itraconazole binding site indicates that cholesterol migrates along this tunnel toward the ssd in the wild type, whereas in the mutant it migrates in the reverse direction. [ ] another possibility is that the cholesterol is transferred from the ntd of a neighboring npc , i.e. inter-protein transfer. [ ] in either case, there should be some structural rearrangement of the npc protein after cholesterol is transferred from npc to the ntd, possibly a significant re-orientation or translocation of the ntd for further action. [ , ] a recent modeling study shows that the presence of cholesterol on the ssd induces large structural changes, i.e. unfolding of a transmembrane helix (tm ), loss of contact between the mld and ctd, and disengagement of the ntd, suggesting that cholesterol on the ssd can serve as either positive or negative feedback. [ ] mutational studies, both theoretical and experimental, may provide opportunities to gain insight, directly or indirectly, into the mechanism of the npc cholesterol transfer process. since the first discovery of a mutation in the npc protein and its connection to npc related diseases, [ ] numerous additional mutations in either npc or npc have been found. [ , ] the point mutation r w (or r q) is one such example, [ ] and it has been reported that this mutation reduces the cholesterol transfer activity by %. [ ] at the same time, the binding affinity of npc to full-length npc is noticeably reduced. [ , ] the transfer efficiency was maximal at ph . , which is the environment of the lysosome. [ , ] experiments have shown that the interaction between npc and npc has two aspects, i.e. cholesterol-independent weak interactions and cholesterol-dependent strong interactions.
[ ] later, it was found that this cholesterol-dependent binding affinity is due to a structural difference in npc near the region that binds the mld loops, depending on the presence of cholesterol. [ ] npc might initially bind to the npc mld loops and then form a stable complex with the ntd when cholesterol is present on npc , so that the ntd also acts as an anchor, as pointed out by gong et al. [ ] a computational study of the point mutations i t, p a, and g w in the ctd shows structural instability of the mutated npc , especially in the ntd, suggesting the importance of the correct orientation or stability of the ntd in npc . [ ] on the other hand, computer simulations of the ntd and two of its mutations (q r and q s) show the importance of the correct electrostatic distribution near the entrance of the cholesterol pocket, as well as of structural stability, for proper binding of cholesterol to the ntd. [ ] in combination with experiment, molecular dynamics simulation of the l p mutation indicates disruption of the tunnel between the mld and ctd, which is believed to play an essential role in cholesterol transport, as mentioned earlier. [ ] since it was pointed out that the r w mutation decreases cholesterol transfer activity not because of misfolding of npc but because of a functional defect of npc , [ ] an atomic-level mechanism behind this defective function could provide insight into the structure-function relationship of npc , especially its interaction with npc . in this paper, we present extensive molecular dynamics simulations of npc to understand its structural and dynamical characteristics upon the r w mutation and the effects on overall cholesterol transport efficiency, in connection with possible re-location or re-orientation of the ntd and the interaction of npc with npc . [ ] we built the initial full-length npc structure using the x-ray structure (pdb id: u ) as a template.
[ ] the missing ntd was added by superimposing the overall structure onto the cryo-em structure presented by gong et al. (pdb id: jd ). [ ] for the simulations with the r w mutation, we mutated the arg to trp with pymol, [ ] selecting the conformer with the lowest steric hindrance. the ph of the simulation was set to . to reflect the lumenal environment of the lysosome. [ , , ] for this purpose, taking the intrinsic pka values of the amino acids into account, the protonation states of every asp, glu, and his were obtained with propka . [ ] the overall structure was solvated with tip p water and the total charge was neutralized by adding na + ions. to reduce the computational cost, we considered only the lumenally exposed part of full-length npc and imposed position restraints on the cα atoms connected to the tm region. the restraint force constant was set to kj/mol·nm and the restrained residues are pro , lys , glu , tyr , and tyr . note that our simulations include the long proline-rich strand that connects the ntd to the membrane surface. of course, this simplified model may not fully reflect the behavior of full-length npc including the membrane. a recent molecular dynamics simulation [ ] of full npc with cholesterol present in the binding site of the npc inhibitor itraconazole, [ ] which is at the interface between the membrane and the lumenal region, shows a non-negligible distance correlation coefficient between the tmd and the rest of the domains. however, long simulations of full npc suggest that the tmd is conformationally more stable than the three lumenally exposed domains, i.e. it changes little from the initial structure, with a cα rmsd below Å. [ ] in addition, one of the current simulation results agrees with the very recent full-length npc simulation, as discussed in the results section. therefore, the model presented here might capture the qualitative structural and dynamic features of the lumenal domains within a simplified framework.
the initial structure used in this study is shown in figure . all figures of protein structures were drawn with pymol. [ ] we used the charmm force field [ ] for both the protein and, when needed, cholesterol. the simulation box was constructed by setting the minimum distance from the protein edge to the box boundary to . Å, and the final box size was . , . , and . Å along the x, y, and z directions, respectively. periodic boundary conditions were imposed in all three directions. the non-bonded cutoff distance was set to . Å with the pair list updated every time steps. long-range electrostatic interactions were treated with the particle-mesh ewald (pme) method with a pme order of and a fourier spacing of . Å. the prepared system was energy minimized with the steepest descent method, followed by molecular dynamics simulation for ns with all bonds in the protein constrained using the lincs algorithm, before the production run. the production simulations were run at constant temperature and constant pressure, i.e. in the npt ensemble. the temperature was set to k using the velocity rescale method with a coupling constant of . ps, and the pressure was set to atm using the berendsen pressure coupling method with a coupling constant of . ps and a compressibility of . x - bar - . all simulations were run using gromacs- . . [ ] we used the leap-frog integrator with a time step of . fs. to allow this time step and speed up the simulations, the hydrogen atom masses were increased by a factor of four while the total mass was kept unchanged by subtracting the added mass from the heavy atoms bonded to the hydrogens, as implemented in gromacs. [ , ] we performed molecular dynamics simulations for four different systems: wild-type npc , r w mutated npc , wild-type npc with cholesterol in the ntd, and r w mutated npc with cholesterol in the ntd.
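the hydrogen mass repartitioning mentioned in the setup above can be illustrated with a minimal sketch. this is not the gromacs implementation; the function name, the toy atom list, and the assumption that every hydrogen has exactly one bond to a heavy atom are ours, chosen only to show that hydrogen masses are scaled while the total mass is conserved.

```python
def repartition_hydrogen_mass(masses, is_hydrogen, bonds, factor=4.0):
    """Scale each hydrogen mass by `factor` and remove the added mass
    from the heavy atom it is bonded to (hypothetical helper)."""
    new_masses = list(masses)
    for i, j in bonds:
        # Only hydrogen/heavy-atom bonds take part in the repartitioning.
        if is_hydrogen[i] == is_hydrogen[j]:
            continue
        h, heavy = (i, j) if is_hydrogen[i] else (j, i)
        added = masses[h] * (factor - 1.0)
        new_masses[h] += added
        new_masses[heavy] -= added
    return new_masses


# Toy methyl group attached to a second carbon: C(H3)-C
masses = [12.011, 1.008, 1.008, 1.008, 12.011]
is_hydrogen = [False, True, True, True, False]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
new_masses = repartition_hydrogen_mass(masses, is_hydrogen, bonds)
```

in this toy example each hydrogen becomes four times heavier and the methyl carbon correspondingly lighter, so the slowest-moving quantity that limits the time step (the fast hydrogen vibrations) is damped without changing the total mass of the system.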
we included the systems with cholesterol in the ntd of mutated npc because we wanted to trace the effect of the mutation after successful cholesterol transfer from npc to the ntd, since according to experiment as much as half of the cholesterol is still transferred to npc from npc in the mutant. [ ] the simulation times and the numbers of independent simulations are described in detail in the results section. initially, we ran eight independent simulations of wild-type npc with no cholesterol in the ntd for ns each. one of the trajectories shows a noticeable displacement of the ntd toward the outside of npc , with an rmsd of . Å compared to the initial structure. we therefore extended this trajectory to . μs. the other trajectories show essentially no change, with an rmsd of around . Å from the starting structure. from those remaining seven trajectories, we randomly selected two and extended them to . μs, but these two trajectories remained essentially steady. the results presented here are obtained from the trajectory with the large ntd displacement mentioned above. to monitor this displacement or tilting of the ntd, we computed the relative angles between the domains as a function of simulation time, shown in figure . the angles shown are the relative angles between long helices from each domain. note that the rmsd of each individual domain is relatively small during these simulations (less than . Å). therefore, the angles between well-preserved helices from each domain can serve as a measure of the change in relative orientation between the domains. for this purpose, we defined a single vector for each helix and calculated the angles between them. [ ] the helices we defined are shown in supplementary figure s . panel a shows a sudden change of the angles, especially the angle between the ctd and ntd, starting from around ns; the angles remain steady with some fluctuations after ns.
compared to panel a, the same plots in panels b and c, which correspond to the two randomly selected trajectories, show only minor changes in the angles. the figure shows the final structure from our μs simulation trajectory superimposed on the initial structure. one can clearly see the displacement of the ntd away from the mld. the movie file corresponding to this trajectory can be found in the supplementary material. in fact, this observation agrees with the simulation of full-length npc including the membrane, with no cholesterol in the ntd and no cholesterol within the membrane. [ ] detailed structural information on the interaction or complex formation of npc with the ntd in the context of full-length npc could significantly enhance our understanding of cholesterol transfer from npc to the ntd. we note that the putative ntd-npc complex with cholesterol on the npc side, [ , ] the so-called texas model, which was proposed based on the ntd alone, produces significant steric clashes when superimposed on the cryo-em structure, indicating that some re-orientation or displacement of the ntd may be needed to accommodate npc . fitting the texas complex into the current simulation structure after . μs, shown in figure , produces no such clashes, so a favorable interaction between full-length npc and npc is possible. it would certainly be interesting to follow this putative full-length npc + npc structure with long molecular dynamics simulations in the presence of cholesterol, which is underway. for the r w mutation, we generated six independent trajectories of ns each. essentially no changes are observed, except in one trajectory where the ntd is displaced toward the mld side, which is the opposite direction to that in the wild-type simulation.
possibly, this leads to a less favorable interaction with npc and could be the signature of the experimentally observed cholesterol-independent weak interaction of npc with mutated npc . [ , ] in fact, superimposing the mld from the last frame of this trajectory on the mld of the x-ray structure (pdb id: kwy) [ ] produces steric clashes at the interface between npc and the ntd, more specifically between residues a ~k of the ntd and residues g ~p of npc ( figure s ). another possible cause of the weakened interaction is a structural change of the npc -binding mld loops upon mutation. it is reported that seven residues (q , y , p , d , f , f , and y ) in the two protruding mld loops are involved in the interaction with npc . to examine this possibility, we compared the loop structures between the wild-type and mutated trajectories, but we were not able to correlate the structural differences in the loops with the difference in npc binding affinity. note that residue d is e in our simulation, since we used human npc , unlike the npc x-ray structure, which is for the bovine protein. therefore, we believe the reduced binding affinity of the r w mutant is due to the displacement of the ntd toward the mld, leading to a less favorable ntd-npc interaction compared to a state where the interface is water exposed. this can be understood by noting that the interactions in this region are mostly between hydrophobic (or charged) residues from the ntd and hydrophobic residues from npc . we also performed molecular dynamics simulations with cholesterol loaded in the ntd. seven independent simulations of ns each show no distinctive tilting motion or distance change between domains. there were no noticeable structural differences between any two trajectories either. we therefore selected one of the trajectories and extended the simulation to . μs. we then observed a change of the ntd orientation. the backbone rmsd relative to the starting structure is plotted in figure as a function of time.
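as a reminder of what the rmsd values quoted above measure, the following minimal sketch computes the root-mean-square deviation between a frame and a reference structure. it assumes that the frame has already been superimposed on the reference (analysis tools normally perform a least-squares fit first, which is omitted here), and the function name and toy coordinates are illustrative only.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) positions, assuming they are already superimposed."""
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords))


# Toy example: two atoms, both shifted by 2 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
value = rmsd(frame, reference)
```

applied frame by frame to the backbone atoms of a trajectory, this gives an rmsd-versus-time curve of the kind referred to above.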
we observed that, as a result of this re-orientation, the trajectory quickly forms the conduit-like channel mentioned above, with the cholesterol-containing ntd as a part of this channel. as one can see from figure , which shows the same angle profile as figure , the relative orientations between the domains differ from those in the structure with no cholesterol, at least within our simulation time scale. we selected a structure near . μs, which has an rmsd of . Å, and present the identified tunnel in figure . the tunnel was constructed using moleonline [ ] and imported into pymol. it is somewhat surprising that this long tunnel persists even with large structural fluctuations. this tunnel might act as a channel for cholesterol transport from the ntd to the ssd, much like the experimentally observed channel in ncr . [ ] of course, the simulation has not reached a fully equilibrated state. hence, the tunnel we observed should be understood as a snapshot of a dynamically fluctuating system, and its shape might not be ideal. however, this simulation shows that the trajectory can form a relatively stable tunnel that can facilitate the transport of cholesterol, which was not observed in the trajectory with no cholesterol in the ntd, and that the ntd begins to align with this channel when it holds cholesterol, possibly for proper transport of cholesterol from the ntd all the way to the ssd. to observe the behavior of mutated npc once cholesterol has been transferred from npc to the ntd, which might be relatively easy after npc binds to the mld loops, we performed the same simulations with cholesterol in the ntd for the r w mutation. nine independent trajectories were generated in total; eight were run up to ns each and one up to . μs. two of these nine trajectories show similar behavior, i.e. formation of a tunnel or tunnel-like structure, as shown in figure .
we also examined whether the r w mutation could block the tunnel, since it is located near the tunnel, but no such blocking was observed in any of the simulations. these observations agree with the experimental finding that the reduced cholesterol trafficking of r w npc is mainly due to a functional defect rather than misfolding, as stated above. based on the mutational study here, it seems that the reduced binding affinity of npc to r w mutated npc is due to an orientation of the ntd that is unfavorable for npc binding, which in turn decreases the cholesterol transfer activity from npc to the ntd. on the other hand, there seems to be no difficulty in properly aligning the ntd with the cholesterol-transporting channel after cholesterol has been transferred from npc to the ntd, supporting a stepwise process for cholesterol transport from npc to the ssd in the lysosomal membrane. before the transfer of cholesterol from npc to the ntd, the ntd has to adopt a proper pose for favorable interaction with npc , as indicated by the tilting of the ntd in the current simulation and in a previous simulation. [ ] the structure of full-length npc + npc (with cholesterol) has not yet been observed, and it would be interesting to model this complex starting from the putative structure presented in this work. the texas model is based on the interaction between the ntd and npc alone, and their interaction structure might be very different in the presence of full-length npc . right after cholesterol transfer from npc to the ntd, the cholesterol in the ntd must be oriented more or less toward the region of npc from which the cholesterol left. it is believed that the next step is the re-orientation of the ntd, starting from this conformation, toward the long channel that leads to the ssd, which could happen between neighboring npc . the departure of npc from the ntd and the re-orientation of the ntd could happen independently or they could be connected.
in either case, our simulations make it reasonable to assume that the ntd changes its orientation in a way that favors formation of a tunnel, which can facilitate the release of cholesterol from the ntd into the tunnel for further processing. this behavior was not observed previously. we note that the cholesterol transfer efficiency from npc to npc is reduced by % upon deletion of the ntd. [ ] therefore, the ntd must play a crucial role in proper cholesterol transport through npc . with the current study, we demonstrated the importance of the ntd as a dynamic domain whose behavior depends on the presence of cholesterol in it and on the r w mutation. clearly, the current simulations are limited, since our model includes only the lysosomal lumenal domains instead of full-length npc with the membrane and cytoplasmic loops. therefore, possible concerted motion between domains via the tmd and cytoplasmic loops is not fully reflected in the current study. moreover, the current simulations are too short to reach the equilibrium state, and the results presented here should be understood as a 'dynamical signature' of the corresponding behaviors rather than as final equilibrium properties. nevertheless, the current study provides insight into the structure-function relation of npc , especially the shift of the ntd orientation depending on the stage of cholesterol transport in connection with npc , and the effect of the r w mutation on these motions and on the conformations of the npc lumenal domains. we believe the current study can enhance our understanding of the cholesterol absorption/re-absorption process via npc in atomic detail. panel a corresponds to the trajectory with the tilted ntd. we selected a helix from each domain and calculated the angles between the helix axes. please refer to figure s for the selected helices in each domain.
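the inter-helix angle used above to follow the relative domain orientation can be sketched as follows. this is only an illustration under our own assumptions: the helix axis is approximated by the vector between the centroids of the first and last few cα atoms, which is simpler than the helix-axis definition cited in the text, and all function names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def helix_axis(ca_coords, n_end=4):
    """Approximate helix axis: vector from the centroid of the first
    `n_end` C-alpha atoms to the centroid of the last `n_end`."""
    start = centroid(ca_coords[:n_end])
    end = centroid(ca_coords[-n_end:])
    return tuple(e - s for s, e in zip(start, end))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Toy "helices": one running along x, one along y.
helix_a = [(float(i), 0.0, 0.0) for i in range(10)]
helix_b = [(0.0, float(i), 0.0) for i in range(10)]
angle = angle_deg(helix_axis(helix_a), helix_axis(helix_b))
```

evaluating such an angle for one helix pair in every frame yields an angle-versus-time profile; because each domain is internally rigid over the simulation, changes in this angle reflect changes in the relative orientation of the domains.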
mechanisms and regulation of cholesterol homeostasis
absorption and metabolism of dietary cholesterol
cholesterol efflux and reverse cholesterol transport: experimental approaches
regulation of the mevalonate pathway
srebps: transcriptional mediators of lipid homeostasis, cold spring harbor symposia on quantitative biology
diabetes mellitus and intestinal niemann-pick c -like gene expression, molecular nutrition and diabetes
transporters as drug targets: discovery and development of npc l inhibitors
identification of surface residues on niemann-pick c essential for hydrophobic handoff of cholesterol to npc in lysosomes
structure of n-terminal domain of npc reveals distinct subdomains for binding and transfer of cholesterol
structural insights into the niemann-pick c (npc )-mediated cholesterol transfer and ebola infection
blobel, . Å structure of niemann-pick c protein reveals insights into the function of the c-terminal luminal domain in cholesterol transport
niemann-pick type c function requires lumenal domain residues that mediate cholesterol-dependent npc binding
purified npc protein i. binding of cholesterol and oxysterols to a -amino acid membrane protein
purified npc protein ii. localization of sterol binding to a -amino acid soluble luminal loop
ebola viral glycoprotein bound to its endosomal receptor niemann-pick c
clues to the mechanism of cholesterol transfer from the structure of npc middle lumenal domain bound to npc
insights into the covid- pandemic from a rare neurodegenerative disease
the lysosome: a potential juncture between sars-cov- infectivity and niemann-pick disease type c, with therapeutic implications
computational studies of the cholesterol transport between npc and the n-terminal domain of npc (npc (ntd))
niemann-pick type c disease: a qm/mm study of conformational changes in cholesterol in the npc (ntd) and npc binding pockets
simulations of npc (ntd): npc protein complex reveal cholesterol transfer pathways
npc intracellular cholesterol transporter (npc )-mediated cholesterol export from lysosomes
cholesterol binding to the sterol-sensing region of niemann pick c protein confines dynamics of its n-terminal domain
clues to npc -mediated cholesterol export from lysosomes
structural basis for cholesterol transport-like activity of the hedgehog receptor patched
structural insight into eukaryotic sterol transport through niemann-pick type c proteins
high-content imaging and structure-based predictions reveal functional differences between niemann-pick c variants
structural basis for itraconazole-mediated npc inhibition
computational tools unravel putative sterol binding sites in the lysosomal npc protein
cholesterol transport in wild-type npc and p s: molecular dynamics simulations reveal changes in dynamical behavior
lysosomal cholesterol export reconstituted from fragments of niemann-pick c
npc gene mutations in japanese patients with niemann-pick disease type c
identification of novel mutations in niemann-pick disease type c: correlation with biochemical phenotype and importance of ptc -like domains in npc
npc facilitates bidirectional transfer of cholesterol between npc and lipid bilayers, a step in cholesterol egress from lysosomes
molecular dynamics simulations reveal structural differences among wildtype npc protein and its mutant forms
comparative study of the effect of disease causing and benign mutations in position q on cholesterol binding by the npc n-terminal domain
the pymol molecular graphics system, version
very fast empirical prediction and rationalization of protein pka values
charmm m: an improved force field for folded and intrinsically disordered proteins
gromacs: fast, flexible, and free
improving efficiency of large time-scale molecular dynamics simulations of hydrogen-rich systems
in silico direct folding of thrombin-binding aptamer g-quadruplex at all-atom level
defining the axis of a helix
mole online . : interactive web-based analysis of biomacromolecular channels

the authors declare no competing financial interests.

key: cord- -wk iipl authors: tausch, simon h.; loka, tobias p.; schulze, jakob m.; andrusch, andreas; klenner, jeanette; dabrowski, piotr w.; lindner, martin s.; nitsche, andreas; renard, bernhard y. title: patholive – real-time pathogen identification from metagenomic illumina datasets date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: wk iipl

motivation: over the past years, ngs has become a crucial workhorse for open-view pathogen diagnostics. yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. the interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data.
results: we implemented patholive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. based on real-time alignment with hilive , mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms.
the results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. for a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only of sequencing cycles. for a real-world sample from sudan the results correctly indicated the presence of crimean-congo hemorrhagic fever virus. in a second real-world dataset from the sars-cov- outbreak in wuhan, we found the presence of a sars coronavirus as the most relevant hit without the novel virus reference genome being included in the database. for all samples, clinically irrelevant hits were correctly de-emphasized. our approach is valuable to obtain fast and accurate ngs-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance.
availability: patholive is open source and available on gitlab (https://gitlab.com/rkibioinformatics/patholive) and bioconda (conda install -c bioconda patholive).
contact: bernhard.renard@hpi.de, nitschea@rki.de

the identification of pathogens directly from patient samples is a major clinical need. while highly accurate pathogen detection methods such as polymerase chain reaction (pcr), cell culture, or amplicon sequencing exist, such routine procedures often fail to identify the underlying cause of a patient's symptoms due to their targeted behavior (breitwieser, et al., ; bzhalava, et al., ; greninger, et al., ; salzberg, et al., ) . as a complementary approach, metagenomics next-generation sequencing (ngs) has been proposed as a valuable technique for clinical application. ngs facilitates the detection and characterization of pathogens without a priori knowledge about candidate species. further, it generates a sufficient amount of data to detect even lowly abundant pathogens without targeted amplification of specified sequences allowing for hypothesis-free diagnostic analysis.
current tools to address ngs-based pathogen identification can be divided into two major categories, either aiming to discover yet unknown genomes huson, et al., ; kostic, et al., ; li, et al., ; norling, et al., ; piro, et al., ; roux, et al., ; skewes-cox, et al., ; wommack, et al., ; zhao, et al., ) or to detect known organisms in a sample (bray, et al., ; byrd, et al., ; dadi, et al., ; flygare, et al., ; francis, et al., ; hong, et al., ; lee, et al., ; lindner and renard, ; menzel, et al., ; naccache, et al., ; piro, et al., ; piro, et al., ; scheuch, et al., ; truong, et al., ; wood, et al., ; wood and salzberg, ; zheng, et al., ) . from an algorithmic perspective, a further distinction can be made between alignment-based methods, alignment-free methods or combinations of both. while alignment-free methods usually deliver faster results, alignment-based methods potentially allow for a more extensive characterization of the sample. regardless of the algorithmic approach, existing methods based on unbiased metagenomics ngs face various obstacles, especially concerning the ranking of the results according to their clinical relevance and the long overall turnaround time (breitwieser, et al., ; dutilh, et al., ; frey, et al., ; lecuit and eloit, ; lecuit and eloit, ; mokili, et al., ; roux, et al., ; snyder, et al., ) . the lack of good ranking methods is based on the fact that the distinction of clinically relevant and irrelevant data is not trivial. first, the dominating part of the sequences in a patient sample usually originates from the host genome. second, there are nucleic acids of various species that are usually of low clinical relevance such as endogenous retroviruses (erv) or nonpathogenic bacteria which commonly colonize a person. for these reasons, the number of reads hinting towards a relevant pathogen can be as low as a handful of individual reads. 
to put it more generally, relying only on quantitative measures when ranking the importance of candidate hits is a widespread mistake, since it is not the abundance but the uncommonness of a species in a given sample that may give critical indications of its relevance. based on the premise that a large proportion of the produced reads may stem from the host genome, species irrelevant for diagnosis, or common contaminations, even highly accurate methods struggle with false positive hits potentially concealing the relevant results. this central problem is exacerbated by the fact that even microbial databases are contaminated with human sequences (breitwieser, et al., ) . existing pipelines tackle this problem in different ways. one common strategy is to ignore sequences that occur in a reference database of host and contaminating sequences (byrd, et al., ; dutilh, et al., ; flygare, et al., ; naccache, et al., ; wommack, et al., ; zheng, et al., ) . while facilitating cleaner results, this approach may lead to a premature rejection of relevant sequences and does not solve the problem of human contaminations in reference databases as those "derive primarily from high-copy human repeat regions, which themselves are not adequately represented in the current human reference genome" (breitwieser, et al., ) . further, the definition of precise contamination databases proves rather difficult and has not yet been adequately solved. thus, deleting any results to gain a better overview comes at great risk of overlooking the true cause of an infection. a different strategy is the use of intensity filters, as implemented e.g. in slimm (dadi, et al., ) , that disregard sequences with low genome coverages. as the authors state, this step eliminates many genomes, which introduces the risk of losing information that might be relevant in the following diagnostic process.
this problem even intensifies for marker-gene based methods such as metaphlan (truong, et al., ) , as large parts of the sequenced reads cannot be assigned due to the miniaturized reference database. while this may lead to a better ratio of seemingly relevant assigned reads to those from the background, it comes with the risk of disregarding relevant candidates. another fundamental problem of ngs-based pathogen identification approaches is the fact that sequencing and analysis is very time consuming. even when considering the reduction of sequencing time in the last years, current mid-and high-throughput devices still have maximum runtimes of more than a day (nextseq ) and up to two (novaseq ) or three days (hiseq x), respectively. the resulting turnaround times of two to four days including data processing and analysis are not short enough for many critical scenarios such as sepsis and infectious disease outbreaks. to obtain actionable results within an appropriate time frame it is crucial to reduce the time span from sample receipt to diagnosis. however, existing approaches to speed up ngs-based diagnostics come with significant disadvantages such as a highly reduced throughput and data quality (quick, et al., ) , massive reduction of analyzed reads or targets (stranneheim, et al., ) or the need of specialized hardware that involves additional costs and relatively low flexibility to adapt the workflow to a given scenario (miller, et al., ) . an actual approach for taxonomic classification of ngs data during runtime of sequencing is implemented in livekraken, a real-time version of the well-known kraken software (tausch, et al., ) . however, by not providing positional information in the results, a sequence-based ranking to determine the relevance of hits is not possible with this approach. 
as a general complement to real-time analysis of short-read sequencing data, there are several promising studies for pathogen detection using the minion handheld device which is particularly useful for field studies and produces longer reads of up to several hundred kilobase pairs. while allowing very fast throughput times, these devices yield only approximately a million reads with comparably low per-base qualities, limiting their areas of application to targeted sequencing so far (cao, et al., ; greninger, et al., ; loose, et al., ; quick, et al., ; stewart and watson, ) . therefore, from today's perspective, ngs is the only technology providing sufficient amount and quality of data for many applications in clinical diagnostics. the currently high turnaround times from sample arrival to final diagnosis make it necessary to develop efficient methods to generate, analyze, and understand large metagenomics datasets in an accurate and quick manner to pave the way for ngs as a standard tool for clinical diagnostics. this enforces ngs-based diagnostics workflows to generate and evaluate large numbers of reads to facilitate adequate sequencing depths while reducing the time span between sample receipt and diagnosis. to overcome the named obstacles, we present patholive, an ngs-based real-time pathogen detection tool. we present an innovative approach to handle the occurrence of common contaminations, background data and irrelevant species in a single step. to tackle the problem of long overall turnaround times, we based our novel approach on the real-time read mapper hilive that enables the analysis of sequencing data while an illumina sequencer is still running (loka, et al., ) . this enables patholive to perform nucleotide-level analysis based on ngs providing an open view and high accuracy in short turnaround times while generating an intuitive and interactive visualization of results that highlights organisms of high clinical significance. 
our workflow follows a different paradigm than other frameworks to tackle the existing problems, as shown in fig. : (i) prepare informative, well defined reference databases, (ii) automatically define contaminating or non-pathogenic sequences beforehand, (iii) use hilive for accurate real-time alignment of illumina sequencing data, (iv) visualize the potential risk of candidate pathogens and present results in an intuitive, comprehensible manner. the details on the modules for each of these steps are provided in the following paragraphs: (i) preparation of reference databases: in order to save computational effort during the analysis, reference databases including the full taxonomic lineage of organisms are prepared before the first execution of patholive. for this purpose user selectable databases, for example the refseq genomic database (brister, et al., ) , are downloaded from the file transfer protocol (ftp) servers of the national center for biotechnology information (ncbi) and annotated accordingly with taxonomic information from the ncbi taxonomy database. while preserving the original ncbi annotation of each sequence, additional information is appended to the sequence header. this information consists of each taxonomic identifier (taxid), rank and name of each taxon in the lineage of an organism. afterwards, user definable sub-databases of taxonomic clades relevant for a distinct pathogen search are automatically created. for the experiments in this manuscript, we focused on viruses. the database updater used for this purpose is available at https://gitlab.com/rki_bioinformatics/database-updater. the viral database used in this manuscript can be downloaded as a single compressed fasta file from zenodo (https://doi.org/ . /zenodo. ) and is ready to use for viral diagnostics with patholive. (ii) identification and labelling of clinically irrelevant hits: a main obstacle in ngs based diagnostics is the large amount of background noise contained in the data. 
this includes various sources of contamination such as sequencing artefacts, ambiguous references and clinically irrelevant species, which hinder a quick evaluation of a dataset. defining an exhaustive set of possible contaminations is a yet unachieved goal. furthermore, deleting such sequences carries the risk of losing relevant results. since in this step, raw sequencing data from a human host is examined, the logical conclusion is to contrast it to comparable raw datasets instead of processed genomes. we implemented a method to define and mark all kinds of undesired signals on the basis of comparable datasets from freely available resources. for this purpose, raw data from randomly selected datasets from the genomes project phase (the genomes project consortium, ) were downloaded, assuming that a large majority of the participants in the genomes project were not acutely ill with an infectious disease. the full list of selected datasets is provided in the supplementary material (section . ). the reads are quality trimmed using trimmomatic (bolger, et al., ) and mapped to the selected pathogen reference database using bowtie (langmead and salzberg, ) . whenever a stretch of a sequence is covered once or more in a dataset from the genomes project, the overall background coverage of these bases is increased by one. coverage maps of all references from the pathogen database are stored in the serialized pickle file format. stretches of dna found in this data are marked as of lower clinical significance and visualized as such in later steps of the workflow. the coverage maps of the background abundances are plotted in red color against the coverage maps of the reads from the patient dataset in green color on the same reference (fig. ) . this enables highlighting presumably relevant results without discarding other candidate pathogens, giving the researcher the best options to interpret the results in-depth but still in an efficient manner. 
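the background counting rule in step (ii) can be illustrated with a minimal python sketch. it assumes that the alignments of each background dataset are already available as half-open (start, end) intervals per reference; the function names, the interval representation and the in-memory list layout are illustrative assumptions and not taken from the patholive code.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def background_coverage(datasets, ref_lengths):
    """Count, for every base of every reference, how many datasets cover it.

    datasets:    list of dicts, one per background dataset, mapping a
                 reference name to a list of (start, end) alignment intervals
    ref_lengths: dict mapping a reference name to its length in bases
    """
    coverage = {ref: [0] * length for ref, length in ref_lengths.items()}
    for alignments in datasets:
        for ref, intervals in alignments.items():
            if ref not in coverage:
                continue  # hypothetical policy: ignore unknown references
            track = coverage[ref]
            # Merging first ensures a base counts once per dataset,
            # no matter how many reads of that dataset cover it.
            for start, end in merge_intervals(intervals):
                for pos in range(max(start, 0), min(end, len(track))):
                    track[pos] += 1
    return coverage
```

a dictionary of this kind could then be serialized with python's pickle module, in line with the storage format mentioned above. merging the intervals per dataset before counting is what makes each dataset contribute at most one to any base.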
the code for the generation of these databases is part of patholive. (iii) using hilive for real-time alignment of reads: we used hilive (version . ) to produce real-time alignments of intermediate sequencing results. thereby, the raw sequencing data is directly loaded in raw bcl file format without the need to perform a file conversion step. alignments are updated with each new sequencing cycle and output in bam format can be created for any sequencing cycle. as changes in the mapping positions mainly occur in early sequencing cycles, we recommend creating output in shorter intervals at the beginning of sequencing. options for integrated demultiplexing and adapter trimming are available. for algorithmic details of hilive , we refer to loka, et al. ( ). (iv) visualization and hazardousness classification: a key hurdle in a rapid diagnostics workflow, which is often underestimated, is the presentation of results in an intuitive way. many promising efforts have been made by different tools, e.g. providing coverage plots (lindner and renard, ; naccache, et al., ) or interactive taxonomy explorers (flygare, et al., ; huson, et al., ). while being hard to measure and thus often ignored, the time it takes for groups of experts to assess the results and come to a correct conclusion should be considered. our browser-based, interactive visualization is implemented in javascript using the data visualization library d (bostock, et al., ). for an example of the visualization, see fig. . while providing all available information on demand, the structure of a taxonomic tree allows an intuitive overview. detailed measures are available on genus, family, species and sequence level. for the calculation of scores for a given node $n$, we define $R(n)$ as the total number of read alignments to an underlying species of $n$. $FG(n)$ is the total number of bases being covered by all reads with respect to $n$. accordingly, $BG(n)$ describes the number of bases being covered by the background database and $FG \setminus BG(n)$ is the number of bases being covered by the foreground but not by the background data. in total, we provide three different scores for each node $n$ of the tree:

(a) total hits $S_T(n)$, representing the total number of hits to all underlying sequences in this branch: $S_T(n) = R(n)$

(b) unambiguous bases $S_U(n)$, representing the total number of bases covered in the foreground data but not in any background dataset: $S_U(n) = FG \setminus BG(n)$

(c) weighted score $S_W(n)$, being the ratio of unambiguous bases for the foreground data to the number of bases covered by the background database and logarithmically weighted by the total number of alignments: $S_W(n) = \frac{FG \setminus BG(n)}{BG(n)} \cdot \log R(n)$

while the total hits can be useful to get a general impression of the abundance of sequences in the sample, the unambiguous bases provide a first comparison to the background dataset. the weighted score introduces an intensified metric of how often a sequence is found in a healthy individual, and thereby allows drawing stricter conclusions from the background data. not only exactly overlapping mappings of foreground and background are regarded, but the overall abundance of a sequence within the background data is also considered. the values of the selected scoring scheme are reflected in the thickness of the branches, which draws the visual focus to higher rated branches. users can switch between the three scores via the respective buttons in the interactive visualization. in order to enable users to make early decisions regarding the handling of a sample as well as to further enhance the intuitive understanding of the results, the hazardousness of detected pathogens is color-coded based on a biosafety level (bsl) score list (biosafety and biotechnology unit, ). minor manual changes were applied to this list to improve matching with the organism names in the reference database. the bsl score gives information on the biological risk emanating from an organism. therefore, it qualifies as a measure of hazardousness in this use case.
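as a minimal sketch of the three node scores described above, the following python function takes the number of read alignments of a node together with the sets of bases covered by the foreground (patient) and background data. the representation of covered bases as sets of (sequence id, position) pairs, the use of the natural logarithm, and the guard against an empty background are assumptions made for illustration; the text does not specify them.

```python
import math


def node_scores(num_alignments, foreground, background):
    """Compute the three scores for one node of the taxonomic tree.

    num_alignments: total number of read alignments below this node
    foreground:     set of (sequence_id, position) covered by patient reads
    background:     set of (sequence_id, position) covered by background data
    """
    # Bases covered in the foreground but in no background dataset.
    unambiguous = len(foreground - background)

    # Assumption: avoid division by zero when no background base is covered.
    background_bases = max(len(background), 1)

    # Assumption: natural logarithm; nodes without alignments get weight 0.
    weight = math.log(num_alignments) if num_alignments > 0 else 0.0

    return {
        "total_hits": num_alignments,
        "unambiguous_bases": unambiguous,
        "weighted_score": unambiguous / background_bases * weight,
    }
```

under these assumptions, a node whose foreground bases all fall into background-covered regions receives a weighted score of zero regardless of how many reads map to it, which is the down-ranking behaviour the weighted score is meant to provide.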
the bsl-score is color-coded in green (no information/bsl ), blue (bsl ), yellow (bsl ) or red (bsl ), and the maximum hazardousness-level of a branch is propagated to the parent nodes. phages are displayed in grey, as they cannot infect humans directly, but may imply information on the presence of bacteria. details about the sums of all three available scores of all underlying species are provided on mouse-over (fig. in the results section) . when expanding a branch to sequence level, additional plots of the foreground coverage calculated in step (iii) as well as the abundance of bases in the background datasets calculated in step (ii) are shown when hovering the mouse over the node (fig. ) . these plots provide a visualization of the significance of a hit. the hits of a species in the patient dataset are shown in green, while background hits are drawn in red on a coverage plot. this way, it is easy to evaluate if a sequence is commonly found in non-ill humans and therefore can be considered less relevant, or if a detected sequence is unique and could lead to more certain conclusions. we compared the results of patholive to two existing solutions, clinical pathoscope (byrd, et al., ) and bracken . we selected clinical pathoscope for its very sophisticated read reassignment method, which promises a highly reliable rating of candidate hits. it also is perfectly tailored to this use case. other promising pipelines such as surpi (naccache, et al., ) or taxonomer (flygare, et al., ) were not locally installable and had to be disregarded. bracken, a method based on metagenomics classification with kraken (wood and salzberg, ) , was included in the benchmark as one of the fastest and best known classification tools which makes it one of the primary go-to methods for many users. the experiment is based on a real sequencing run on an illumina hiseq in high output mode. we designed an in-house generated sample in order to have a solid ground truth. 
we ran all tools using threads, starting each at the earliest possible time point when the data was available from the sequencer in the expected input format. for the non-real-time tools, the base calling was executed via illumina's standard tool bcl fastq and the runtime was regarded in the overall turnaround time. clinical pathoscope and bracken were both run with default parameters, apart from the multithreading. we built the databases for patholive and bracken using the viral part of the ncbi refseq (o'leary, et al., ) . for clinical pathoscope we downloaded the associated database from http://www.bu.edu/jlab/wpassets/databases.tar.gz using the provided viral database as foreground and the human database as background. details of the database construction are given in the supplementary methods (section . ). please note that, in contrast to all other results shown in this manuscript, the live analysis of the in-house sample was performed using read-mapping results of hilive, the predecessor of hilive . however, we repeated the analysis using hilive and obtained similar results with respect to accuracy (cf. supplementary fig. and supplementary table ) . to validate patholive on real data, we applied it to a previously described diagnostic human serum sample from an outbreak of hemorrhagic fever virus in sudan (andrusch, et al., ; kohl, et al., ) and a dataset from an outbreak of severe acute respiratory syndrome coronavirus (sars-cov- ) in wuhan, (wu, et al., ) . as the data was only available in fastq format, it was converted to bcl file format following the procedure described in the supplementary methods (section . ). the total read length was x bp for the cchfv dataset from sudan and x bp for the sars-cov- dataset from wuhan. k is rightly found in the dataset, but not considered a clinically relevant candidate due to its common prevalence in healthy human individuals. 
viral metagenomics studies were performed with a human plasma mix of six different rna and dna viruses as well-defined surrogate for clinical liquid specimen. the informed consent of the patient has been obtained. this µl mix contained orthopoxvirus (vaccinia virus vr- ), flavivirus (yellow fever virus d vaccine), paramyxovirus (mumps virus vaccine), bunyavirus (rift valley fever virus mp vaccine), reovirus (t /bat/germany/ / ) and adenovirus (human adenovirus ) from cell culture supernatant at different concentrations. the sample also contains dependoparvovirus as proven via pcr. the sample was filtered through a . µm filter and nucleic acids were extracted using the qiaamp ultrasense kit (qiagen) following the manufacturers' instructions. the extract was treated with turbo dna (life technologies, darmstadt, germany). cdna and double-stranded cdna (ds-cdna) synthesis were performed as previously described ( ). the ds-cdna was purified with the rneasy minelute cleanup kit (qiagen). the purification method takes ~ h to complete. the library preparation was performed with the nextera xt dna sample preparation kit following the manufacturers' instructions (illumina). ngs libraries were quantified using the kapa library quantification kits for illumina sequencing (kapa biosystems). if the starting amount of ng of nucleic acid was not reached the entire sample volume was added to the library. the diagnostic sample from sudan was prepared according to (andrusch, et al., ; kohl, et al., ) , including inactivation of the human serum in qiagen buffer avl, extraction with qiagen qiaamp viral rna mini kit and dna digestion using the thermo fisher turbo dna-free kit. a sequencing library was created using the illumina nextera xt dna library preparation kit. the sample was sequenced on an illumina miseq. 
the dataset from the outbreak of sars-cov- in wuhan in was sequenced on an illumina miniseq sequencing device and is publicly available at the ncbi sequence read archive (sra) under accession number srr (wu, et al., ). the human plasma sample spiked with a viral mixture was sequenced on an illumina hiseq in high output mode on one lane. patholive was executed from the beginning of the sequencing run using threads. intermediary results were taken after , , and cycles or after , , and hours, respectively. the time needed to produce results from the intermediary sequencing data was lower than minutes for all output cycles. raw reads usable for the testing of other tools were available only after hours as they had to be translated into the human readable fastq format first. as a ground truth, we selected all sequences associated with the species described as abundant above. the area under the curve (auc) of the receiver operating characteristic (roc) was calculated using the highest ranking species, as given by the tested tools. the top of the identified species are considered because hits appearing after twice the number of true positives cannot be expected to be regarded by a user in this experiment. furthermore, none of the tested tools found more true positives within the next hits. the roc-plot (fig. ) denotes the true positive rate and false positive rate for each threshold n≤ , whereby a threshold n means that the best n hits are taken into account. this means that only the rank of the hits was considered while disregarding the actual score. for patholive, the ranks were determined by the weighted score, for clinical pathoscope we used the "final guess" metric and for bracken, the species with most estimated reads were ranked highest.

fig. (caption, partial): phages are shown in grey. on mouse-over, detailed information (here on genus mastadenovirus) is displayed. the selected score (here: weighted score) is highlighted in grey.
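the rank-based roc evaluation described above can be sketched in python as follows. the sketch assumes that each tool's output is reduced to a list of species names ordered from best to worst rank, that the false positive rate is normalized by the number of non-truth species among the considered top hits, and that the auc is obtained by the trapezoidal rule; these normalization choices and all names are illustrative assumptions.

```python
def roc_points(ranked_species, truth, top_n):
    """Return (fpr, tpr) points for thresholds n = 0..top_n.

    ranked_species: species names ordered from best to worst rank
    truth:          set of species that are truly present in the sample
    top_n:          number of top-ranked hits taken into account
    """
    considered = ranked_species[:top_n]
    positives = len(truth)
    negatives = sum(1 for species in considered if species not in truth)

    points = [(0.0, 0.0)]
    tp = fp = 0
    for species in considered:  # threshold n: the best n hits are accepted
        if species in truth:
            tp += 1
        else:
            fp += 1
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

for example, with two true species and a cut-off of twice that number, the ranking `["a", "x", "b", "y"]` with truth `{"a", "b"}` gives an auc of 0.75, whereas a ranking that lists both true species first gives 1.0. only the order of the hits enters the computation, not their scores.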
the visualization clearly emphasizes all spiked pathogens through the thickness of their clades, while other species are shown only in smaller clades and therefore ranked lower. we were able to detect all abundant spiked species in the library after only cycles of the sequencing run using patholive. while the overall number of false positive hits decreases with the sequencing time, the weighted score and the number of unambiguous bases yield accurate results throughout all reports. reported phages are included in these numbers, although they are visually greyed out in the visualization, as they cannot infect vertebrates directly. as an example report, a screenshot of the resulting interactive tree of results after cycles is shown in fig. . a central issue in pathogen identification, especially for viruses, is the potentially low number of pathogenic reads in the sample. therefore, we demonstrated the performance of patholive on real data that is known to contain a low number of reads of interest. we analyzed a human serum sample from sudan that was confirmed via pcr to contain crimean-congo hemorrhagic fever virus (cchfv) but only shows a small amount of related reads in the corresponding illumina sequencing data ( out of , , reads were reported by andrusch, et al. ( ) to unambiguously belong to cchfv). when running patholive with default parameters and with adapter trimming activated, bunyaviridae was the family with the highest weighted score over the complete sequencing procedure when not considering phages and the "unassigned family" branch. the score of bunyaviridae was consistently equal to the score of the underlying species cchfv, while other underlying species did not contribute to the overall score of the family. fig. shows the development over time for all families that reach a score of in at least one output cycle.
it can be seen that the weighted score of cchfv (represented by the family of bunyaviridae) is in the top three of all identified families after only sequencing cycles which corresponds to % of the sequencing procedure. at this time point, only reads were aligned to cchfv. thus, indications for the correct finding are already possible within a short time span and based on only a couple of available reads while the result is more and more emphasized with ongoing sequencing. the only other family reaching a score higher than and not exclusively containing phages was retroviridae, being mainly driven by the species hiv . however, a more detailed view on the sequence level shows that all mappings to hiv cluster in a small region of approximately , bp (fig. d) while the alignments to cchfv distribute over the complete genome (fig. b,c) . this strongly indicates that cchfv is more likely to be a true positive. fig. further shows the family level visualization of the patholive tree structure (fig. a) and an example for granulovirus of the baculoviridae family that shows a high total number of mappings, but all of those being located in regions that are covered in the background database leading to a weighted score of (fig. e) . the overall results for this sample show the strength of patholive to pronounce interesting findings at first glance while still allowing for a more detailed perspective that is often important for interpretation. for a dataset from the outbreak of sars-cov- in wuhan, , we could also identify a coronavirus as the most probable causative virus. this example clearly demonstrates the strength of our scoring approach. when using the pure quantity of alignments for the visualization of results, the coronaviridae family branch is not among the most prominently visualized hits. in contrast, when activating the weighted score, a clear indication was already available after only sequencing cycles, corresponding to % of the complete sequencing run (fig. ) . 
the ranks and underlying scores of the visualization shown in table further support the strength of the weighted score approach. a more detailed analysis of the underlying tree of the coronaviridae family shows that there are different coronavirus species with high scores, mainly dominated by a sars-related coronavirus and a bat coronavirus. this further indicates that a clear assignment to one of the underlying species was not possible. these results are as expected, since the correct species, sars-cov- , was not present in the reference database. another branch that is clearly highlighted by its red color belongs to the poxviridae family. however, a more detailed look into the results shows that the bsl- classification originates from a single sub-branch where all mapping positions cluster to only two single peaks (not shown). this is a similar pattern to what we already showed for the occurrence of hiv in the previous section (cf. fig. d ) and is therefore most probably not of biological interest. ngs has been shown to be the current state of the art dna sequencing technology for pathogen detection and makes an increasing impact on the diagnosis of infectious diseases. although third generation sequencing approaches are also becoming more and more influential, the discovery of lowly abundant pathogens is still problematic due to the relatively low number of reads. additionally, the comparably low coverage and high error rates still hamper certain types of complex follow-up analyses such as the detection of antimicrobial resistances or the geographical origin of a pathogen. 
on the other hand, long-read sequencing technology shows immense potential for real-time diagnostics in the future, especially when considering the continuously decreasing error rates, shorter sample preparation times, emerging higher-throughput devices such as the promethion, as well as valuable technology-specific features such as the read until functionality, for which first attempts have been made to separate microbial reads from host dna during the sequencing procedure (edwards, et al., ; loose, et al., ). all these aspects considered, we regard long-read sequencing technology as a valuable complement to ngs-based diagnostics in the future, with distinct properties and therefore potentially different application areas.

fig. (caption, partial): by selecting the weighted score, the results correctly highlight the presence of a coronavirus by assigning the clearly highest score. when opening the details of the coronavirus branch, highest similarity can be determined with sars-related coronavirus and a bat coronavirus. in both subfigures, the family of coronaviridae is marked by a red box.

the high turnaround time of ngs-based diagnostics is a major drawback compared to targeted molecular methods. past efforts to speed up ngs-based diagnostics have been made but often come with significant disadvantages: quick, et al. ( ) introduced a fast sequencing protocol for illumina sequencers that allows obtaining results after as little as hours. this speedup is accompanied by lower throughput and lower data quality, making it less suitable for whole genome shotgun sequencing approaches without a priori knowledge. other approaches aiming at performing analyses of intermediate sequencing data require either a massive reduction of the amount of analyzed reads and/or targets (stranneheim, et al., ) or the application of specialized hardware such as field-programmable gate array (fpga) technology which is, for example, used for the dragen system (miller, et al., ).
such specialized hardware approaches come with additional costs, either for purchase and infrastructure of local solutions or for the use of a cloud system. at the same time, such approaches provide a low level of flexibility in the analysis and are not algorithmically optimized for working with incomplete data. patholive does not require the use of specialized hardware and provides accurate diagnostics results in real-time, illustrated with an easily understandable and interactive visualization. this makes it much easier to gain insights into a clinical sample before the sequencer has finished. real-time output before the sequencing process of the first read has finished lacks information about multiplex indices, though. therefore, early results of multiplexed sequencing runs can only be assigned to a specific sample after sequencing of the multiplex indices. for paired-end sequencing runs, this means analyses are still possible long before the sequencer finishes, and single-end sequencing runs can produce results at the very moment the indices have been sequenced. a possible solution for this problem is to sequence the indices before the first read, which can pose addressable challenges for cluster identification. as a working solution, many sequencing devices allow paired-end sequencing with different lengths for the first and second reads. it is thereby possible to sequence only a short fragment of the first read to get early access to the multiplex indices. thus, this approach can be used to obtain de facto single-end reads (i.e., the full second read) while having the multiplex information available from the beginning of the read. for pathogen identification, we changed the basis for the selection of clinically relevant hits from pure abundance or coverage-based measures towards a metric that takes information on the singularity of a detected pathogen into account.
still, we decided not to completely trust the algorithmic evaluation alone, but provide all available information to the user in an intuitive interactive taxonomic tree. while we assume that this form of presentation allows users to come to the right conclusions very quickly, more sophisticated methods for the abundance estimation especially on strain level exist. implementing an additional abundance estimation approach comparable to the read reassignment of clinical pathoscope (byrd, et al., ) or the abundance estimation of bracken could enable more accurate results, albeit this would not be applicable trivially to the overall conception of patholive. the sensitivity and specificity of patholive varies with the time of a sequencing run. in the beginning, when only little sequence information is available, only a small number of nucleotides specify a candidate hit, leading to comparably high false positive rates. at the end of a sequencing run, the number of sequence mismatches in the longer alignments may lead to the erroneous exclusion of hits, especially when sequencing quality decreases. however, this behavior is implicitly considered by the hilive algorithm which allows for an increasing number of mismatching nucleotides with increasing length of the reads. still, the results can vary over runtime with the optimal outcome being measured at intermediate cycles if the selected parameters are not well-suited for the specific sample or if the sequencing quality decreases stronger than usual. besides these challenges which are unique to patholive, similar problems as conventional approaches occur. first, the definition of meaningful reference databases is difficult. no reference database can ever be exhaustive since not all existing organisms have been sequenced yet. besides that, there may be erroneous information in the reference databases due to sequencing artefacts, contaminations or false taxonomic assignment. 
the definition of the hazardousness was especially complicated, as to our knowledge no well-established solution for the automated assignment of this information exists. therefore, the basis for our bsl-levelling approach might not be exhaustive, leading to underestimated danger levels of pathogens that are missing in the underlying bsl list. furthermore, in-house contaminations, some of which are known to be carried over from run to run on the sequencer while others may come from the lab, could interfere with the result interpretation of a sequencing run. especially since no indices are sequenced for the first results of patholive, comparably large numbers of carry-over contaminations might lead to false conclusions. candidate contaminations should therefore be kept in mind when interpreting results. using in-house generated spiked human plasma samples, we were able to show the advantages of patholive not only concerning its unprecedented runtime but also the selection of relevant pathogens. we further show the high sensitivity of our approach by identifying cchfv in a real sample from sudan based on a few dozens of reads. while being very fast and accurate, a limitation of patholive lies in the discovery of yet unknown pathogens. this is due to the limited sensitivity of alignment-based methods in general, which hampers the correct assignment of highly deviant sequences. however, the analysis of a dataset from the sars-cov- outbreak in clearly shows that the detection of novel species that are related to known pathogens is still possible. concluding, patholive is a helpful tool for accurate and yet rapid detection of pathogens in clinical ngs datasets. the key advantages are the real-time availability of analysis results as well as the intuitive and interactive visualization with down-prioritization of likely irrelevant candidates. 
paipline: pathogen identification in metagenomic and clinical next generation sequencing samples belgian classifications for micro-organisms based on their biological risks -definitions trimmomatic: a flexible trimmer for illumina sequence data data-driven documents near-optimal probabilistic rna-seq quantification a review of methods and databases for metagenomic classification and assembly re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus d infection human contamination in bacterial genomes has created thousands of spurious proteins ncbi viral genomes resource clinical pathoscope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data unbiased approach for virus detection in skin lesions streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time minion(tm) sequencing slimm: species level identification of microorganisms from metagenomes virus discovery by metagenomics: the (im)possibilities reference-independent comparative metagenomics using crossassembly: crass real-time selective sequencing with rubric: read until with basecall and reference-informed criteria taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mrna expression profiling pathoscope: species identification and strain attribution with unassembled sequencing data comparison of three next-generation sequencing platforms for metagenomic sequencing and identification of pathogens in blood rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis rapid metagenomic next-generation sequencing during an investigation of hospital-acquired human parainfluenza virus infections pathoscope . 
: a complete computational framework for strain identification in environmental or clinical sequencing samples megan community edition -interactive exploration and analysis of large-scale microbiome sequencing data crimean congo hemorrhagic fever pathseq: software to identify or discover microbes by deep sequencing of human tissue fast gapped-read alignment with bowtie the diagnosis of infectious diseases by whole genome next generation sequencing: a new era is opening the potential of whole genome ngs for infectious disease diagnosis scalable metagenomics alignment research tool (smart): a scalable, rapid, and complete search heuristic for the classification of metagenomic sequences from complex sequence populations vip: an integrated pipeline for metagenomics of virus identification and discovery metagenomic abundance estimation and diagnostic testing on species level metagenomic profiling of known and unknown microbes with microbegps reliable variant calling during runtime of illumina sequencing real-time selective sequencing using nanopore technology bracken: estimating species abundance in metagenomics data fast and sensitive taxonomic classification for metagenomics with kaiju a -hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases metagenomics and future perspectives in virus discovery a cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples an in silico experimental design, simulation and analysis tool for viral metagenomics studies reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation ganon: precise metagenomics classification against large and upto-date sets of reference sequences dudes: a top-down taxonomic profiler for metagenomics metameta: integrating metagenome analysis tools to improve taxonomic profiling rapid draft sequencing and real-time nanopore sequencing in a 
hospital outbreak of salmonella benchmarking viromics: an in silico evaluation of metagenomeenabled estimates of viral community composition and diversity metavir : new tools for viral metagenome comparison and assembled virome analysis next-generation sequencing in neuropathologic diagnosis of infections of the nervous system riems: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets profile hidden markov models for the detection of viruses within metagenomic sequence data next-generation sequencing--the promise and perils of charting the great microbial unknown pore guis for parallel and real-time processing of minion sequence data rapid pulsed whole genome sequencing for comprehensive acute diagnostics of inborn errors of metabolism rambo-k: rapid and sensitive removal of background sequences from next generation sequencing data the genomes project consortium. a global reference for human genetic variation metaphlan for enhanced metagenomic taxonomic profiling virome: a standard operating procedure for analysis of viral metagenome sequences improved metagenomic analysis with kraken kraken: ultrafast metagenomic sequence classification using exact alignments a new coronavirus associated with human respiratory disease in china virusseeker, a computational pipeline for virus discovery and virome composition analysis virusdetect: an automated pipeline for efficient virus discovery using deep sequencing of small rnas we gratefully acknowledge the support of claudia kohl concerning the selection of appropriate datasets. we thank andrea thürmer and aleksandar radonić for sharing their expertise in illumina sequencing. we further thank all hilive contributors for their work on the real-time read mapping approach. conflict of interest: none declared. 
key: cord- -aioqogaw authors: chiu, elliott s.; vandewoude, sue title: presence of endogenous viral elements negatively correlates with felv susceptibility in puma and domestic cat cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: aioqogaw while feline leukemia virus (felv) has been shown to infect felid species other than the endemic domestic cat host, differences in felv susceptibility among species have not been evaluated. previous reports have noted a negative correlation between enfelv copy number and exogenous felv infection outcomes in domestic cats. since felids outside the genus felis do not harbor enfelv genomes, we hypothesized that the absence of enfelv results in more severe disease consequences in felid species lacking these genomic elements. we infected primary fibroblasts isolated from domestic cats (felis catus) and pumas (puma concolor) with felv and quantitated proviral and viral antigen loads. domestic cat enfelv env and ltr copy numbers were determined for each individual and compared to felv viral outcomes. felv proviral and antigen levels were also measured in naturally infected domestic cats and naturally infected florida panthers (p. concolor coryi). we demonstrated that puma fibroblasts are more permissive to felv than domestic cat cells, and domestic cat felv restriction was highly related to enfelv ltr copy number. terminal tissues from felv-infected florida panthers and domestic cats had similar exfelv proviral copy numbers, but florida panther tissues had higher felv antigen loads. our work indicates enfelv ltr elements negatively regulate exogenous felv replication. further, puma concolor lacking enfelv are more permissive to felv infection than domestic cats, suggesting endogenization can play a beneficial role in mitigating exogenous retroviral infections. conversely, presence of endogenous retroelements may relate to new host susceptibility during viral spillover events.
importance feline leukemia virus (felv) can infect a variety of felid species. only the primary domestic cat host and related small cat species harbor a related endogenous virus in their genomes. previous studies noted a negative association between the endogenous virus copy number and exogenous virus infection in domestic cats. this report shows that puma cells, which lack endogenous felv, produce more virus more rapidly than domestic cat fibroblasts following cell culture challenge. we document a strong association between domestic cat cell susceptibility and felv long terminal repeat (ltr) copy number, similar to observations in natural felv infections. viral replication does not, however, correlate with felv env copy number, suggesting this effect is specific to felv ltr elements. this discovery indicates a protective capacity of the endogenous virus against the exogenous form, either via direct interference or indirectly via gene regulation, and may suggest evolutionary outcomes of retroviral endogenization.
the vast majority of vertebrate genomes, including up to % of the human genome, harbor fossils of ancient viral infections made up predominantly of retroviral genetic material ( - ). during infection, the retroviral rna genome is reverse-transcribed to form double-stranded dna, which is in turn integrated into the host's genome ( ). while most of these infections target somatic cells, these viruses are capable of infecting and integrating into germ cells ( ). the consequences of viral integration into the germline are profound, and ultimately the virus is vertically transmitted as permanent genetic elements inherited in a mendelian fashion ( ). fixation of the retroviral content in host genomes is a process termed endogenization and leads to new host genetic elements called endogenous retroviruses (ervs) ( ). ervs in their early stages are believed to undergo massive changes during host cell transcription, and the foreign, potentially deleterious genetic material accumulates mutations and deletions that often render the newly endogenized virus defunct ( ). while typically unable to produce infectious virions, many ervs are still capable of undergoing transcription and may even produce functional viral proteins. certain ervs are known to function in important physiologic, cellular, or biological processes, including placentation, oncogenesis, immune modulation, and infectious disease progression ( , ).
endogenous feline leukemia virus (enfelv) is an example of an erv which has a horizontally transmitted retroviral counterpart (feline leukemia virus, felv). only members of the felis genus harbor enfelv as endogenization is believed to have originated after the felis genus split off from other members of the felidae family ( , ) . felv can infect felid species that harbor enfelv (i.e., domestic cats) as well as species that lack enfelv (i.e., puma). felv epizootics have been documented in multiple non-felis species populations including the north american puma (puma concolor) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . felv represents an endogenous-exogenous retroviral system that has perhaps been best studied with regard to disease biology and outcome during naturally occurring infections in an outbred, highly dispersed mammalian host. thus, evaluation of this system provides opportunities to better understand erv-exogenous viral interactions that are highly relevant to virus and host evolution and ecology. felv epizootics in wild felids are characterized by serious disease of epidemic proportions, ( , ), whereas felv infection in adult cats frequently results in regressive and abortive infections ( ). it has been hypothesized that enfelv may be associated with differences in infection outcome. we previously demonstrated enfelv long terminal repeat (ltr) copy number was associated with better infection outcomes during a natural felv outbreak in a multi-cat household ( ). here, we evaluate felv infection of puma (p. concolor) and domestic cat cells in vitro and in situ to examine the susceptibility of endemic and novel hosts to felv infection with respect to enfelv to provide further evaluation of this relationship. primary fibroblasts were successfully propagated from ear punches from three free-ranging puma (two kittens of unknown sex and one adult male) and abdominal skin incision from a mature adult female captive puma. 
primary fibroblasts were cultured from domestic cats abdominal full skin biopsies from necropsied cats at colorado state university ( male, female). exogenous felv proviral load was significantly and substantially greater in puma fibroblasts domestic cat and puma viable cell counts were equivalent on days or , averaging . x cells per cm for domestic cat cells and . x cells per cm for puma cells (supplemental figure a) . crfks are smaller than domestic cat and puma primary fibroblasts and therefore cell density was higher in these cultures ( . x cells per cm at day and . x cells per cm at day ). percent dead cells (as measured by trypan blue exclusion) on days and ranged between . ± . % in for domestic cat cells and . %± . % for puma cells regardless of infection status (supplemental figure b ). there was a consistent trend for lower percent mortality at day when cells first reached confluency. crfks experienced greater cell mortality over the course of the infection regardless of infection status ( . % at day and . % at day ). as anticipated, domestic cat cells harbored more enfelv ltrs than enfelv env genes ( figure a ). normalized ltr sequence copy numbers in domestic cat cells ranged from and copies per cell with an average of copies per cell. copy numbers for enfelv env were significantly lower, ranging from to copies per cell with an average of copies. variation in domestic cat fibroblasts enfelv-ltr copy number correlated to felv antigen loads (day , pearson's correlation coefficient=- . ; p< . ; figure b ), whereas variation in enfelv-env did not (day , pearson's correlation coefficient=- . , p= . ; figure c ). enfelv- exfelv correlations were calculated at day since this is the timepoint that cells reached complete confluency and the rate at which antigen was produced waned after this timepoint. 
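the correlation analysis reported above relates per-animal enfelv copy number to felv antigen load. a minimal sketch of the pearson coefficient follows; the function name `pearson_r` and the example values are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented values for illustration only: copies per cell vs. antigen signal.
    ltr_copies = [40, 50, 60, 70, 80]
    antigen_load = [0.9, 0.8, 0.6, 0.5, 0.2]
    print(pearson_r(ltr_copies, antigen_load))  # negative: more ltr copies, less antigen
```

a coefficient near -1 would indicate that higher copy number goes with lower antigen load; the p-values quoted in the text additionally depend on the number of animals sampled.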
linear regression analysis of felv proviral load against antigen load showed that only % of the variation in antigen production could be explained by proviral load (r = . ; figure ). felv loads in bone marrow, spleen, thymus and peripheral lymph nodes were assessed in experimentally infected cats and naturally infected pumas (not all tissues were available for each animal, see table ). both tissue proviral load and tissue antigen load failed the kolmogorov-smirnov test for log-normality, indicating a non-normal distribution. felv proviral load was greater in domestic cats compared to panthers by a median difference of . by mann-whitney test of log-transformed copy numbers per cell (u= , p= . ; figure a ). despite lower mean proviral load in florida panther tissues, p capsid antigen in tissues tended to be higher (median value . vs. . , mann-whitney test, u= , p= . ; figure b ). there was no difference in antigen or proviral load among tissues, with the exception of bone marrow antigen. alterations in infectivity and virulence have been noted following cross species viral transmission ( ). in some cases, disease spillovers into novel species of the same family can result in dead-end hosts for the virus (infectious hematopoietic necrosis virus ( ); feline immunodeficiency virus strain lru ( )). in other cases, disease spillover may result in active infections that maintain persistent transmission (mycoplasmosis; ( ); feline foamy virus ( )). further still, some cases result in adaptation of the virus in novel hosts leading to increased morbidity and mortality (hiv ( ); covid- ( )). outcomes of diseases associated with spillover are dependent on the given specifics of host, environment and agent interactions ( ). multiple felv spillover events have been documented in free-ranging pumas with resultant significant morbidity ( , ).
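the tissue comparisons earlier in this section rely on the mann-whitney test because the loads were not normally distributed. as a rough sketch of what the u statistic counts, the function below (the name `mann_whitney_u` is hypothetical) tallies pairwise wins directly; it omits the p-value calculation, and the conventions here (ties count one half, u is reported for the first sample) are assumptions rather than a description of the software used in the study.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: number of pairs (a, b) with a > b; ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


if __name__ == "__main__":
    # Invented values for illustration only.
    cats = [3.1, 2.8, 3.5, 2.9]
    panthers = [2.2, 3.0, 2.5]
    u_cats = mann_whitney_u(cats, panthers)
    u_panthers = mann_whitney_u(panthers, cats)
    # The two U values always sum to len(cats) * len(panthers).
    print(u_cats, u_panthers)
```

because the statistic depends only on the ordering of the values, it is unchanged by a log transformation of the copy numbers.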
the apparent virulence of felv in pumas and other nondomestic felids has led to speculation that this virus may have enhanced virulence in novel hosts ( , ). this study was thus undertaken to evaluate the hypothesis that felv might replicate more competently in nondomestic felids than in the domestic cat reservoir host, examined specifically in the context of the presence of enfelv. experimental infections were conducted in vitro to establish differences in felv replication in puma and domestic cat cells in the absence of immunological and physiological parameters. in fibroblast cultures, felv infection resulted in higher proviral load in puma cells than in domestic cat cells (figure a). additionally, increased viral antigen was documented in infected puma cells, which is suggestive of increased viral production (figure b). therefore, at a cellular level, puma cells appear to be more competent at supporting felv infection and replication than fibroblasts of the primary host, the domestic cat. this is further supported by the fact that proviral load could only explain % of the variation in viral antigen production, indicating that proviral integration events alone are not a surrogate for virus replication. other factors may be influencing the increased viral production in puma cells or restriction of viral replication in domestic cats. enfelv-ltr copy number is associated with resistance to felv infection and antigen production. endogenous elements constitute a sizable component of an animal's genome ( ) and solo ltrs vastly outnumber full endogenous genomes/pseudogenomes ( ). this occurs because two flanking ltrs allow for the intervening genes to be removed by homologous recombination, leaving behind just one copy of the ltr in its place ( ). as such, env copy number serves as a proxy for full enfelv genomes, since loss of one gene would not occur frequently.
in this sampling of domestic cat fibroblast enfelv components, solo ltrs range from - copies per cell, while env ranges from - copies per cell, similar to previous observations (figure a). this is consistent with previously reported measures of full-length enfelv range of - copies per cell ( , , ), versus - copies of enfelv ltr per cell ( ). ltr copy number variation, but not full enfelv genomes, correlated with felv replication as evidenced by directly proportional antigen load; enfelv-env showed no correlation to felv replication (figure b-c). while felv replication may intuitively seem to be correlated to proviral integration number, only % of felv antigen variation is explained by proviral load (figure ). linear regression analysis therefore suggests that factors other than proviral integration number contribute to host susceptibility. this also corroborates observations made in vivo in a domestic cat colony naturally infected with felv ( ). alternatively, it is possible that the enfelv genetic elements may directly interfere with exfelv infection by encoding for small interfering rnas or piwi interacting rnas that activate host dicer complexes to specifically target felv transcripts ( , ). our results are more suggestive of direct interference mechanisms due to the linearity of the felv restriction afforded by enfelv-ltrs. it is unlikely that all ltr integration sites are influencing the transcription of host genetic factors and would have an effect in a dose-dependent manner. felv reaches high viral load in lymphoid tissues during natural infections. pumas naturally infected with felv had lower proviral load than domestic cats, with the exception of two bone marrow samples that achieved x proviruses per x cells (figure a). interestingly, a much wider range of proviral copy numbers was noted in pumas, and viral antigen loads in pumas equaled or exceeded those of domestic cats (figure b).
field collections were performed opportunistically on florida panthers when animals were either found deceased or hit by vehicle, often hours to days after death occurred. in contrast, felv positive shelter cats were euthanized, and tissues were collected rapidly following death. the timeliness of collection likely impacted quality of the sample prior to dna extraction for puma samples. while normalization to feline ccr helped to address these issues for proviral copy number calculations, it is possible that viral antigen loads measured in puma tissues underestimate actual values. biological aspects of felv transmission that differ between domestic cats and pumas may impact subsequent infection kinetics. the initial spillover events of felv to pumas have been associated with predation of domestic cats ( , ), whereas felv in domestic cats is believed to be transmitted in households through social interactions such as grooming, or via antagonistic interactions ( ). domestic cats interact socially, so behaviors like grooming may sustain infection in animals in close contact and infections may result from repeated exposures. unlike domestic cats, pumas are much more solitary and interactions between pumas outside of mother-offspring groups are primarily believed to be antagonistic ( ). gammaretroviruses require the dissolution of the nucleus during mitosis in order to integrate into cells ( ), and therefore dividing cells are more susceptible to felv infection and replication. we measured cell count as a proxy for rate of cell division, and percent cell mortality as a measure of viability. neither measure differed between domestic cat and puma fibroblasts, though crfk cells displayed greater cell count and cell mortality at confluency (supplemental figure ). immortalized cell lines have accumulated multiple changes that fundamentally alter their morphology and physiologic behaviors, including decreases in contact inhibition ( ).
felv proviral integration and antigen production in crfk infections had far less within- and between-run variation than primary fibroblasts, likely attributable to the clonal nature of crfk versus wild type derived primary tissue cultures. observation of cell culture parameters did not suggest differences in growth characteristics that explain the differing felv susceptibility of puma and domestic cat fibroblasts. in this report, we present information that demonstrates that enfelv-ltr confers protection against exfelv infection in vitro through the limitation of felv replication. the exact mechanism by which these constituents act has yet to be determined, which leaves room for further investigation in the felv system as well as other endogenous-exogenous retroviral dyads. we hypothesize that felv restriction may manifest as direct interference through rna silencing mechanisms, or by indirect enfelv-ltr-mediated promotion of host anti-viral genes. felv provides an opportunity to directly interrogate the mechanisms that govern related exogenous-endogenous retroviral interactions in an outbred and diverse population. penicillin/streptomycin/fungizone). one puma culture was infected with feline foamy virus and was treated prophylactically with the anti-retroviral drug, azt ( ug/ml; sigma) per manufacturer's direction until in cultures where ffv cpe were detected. cells were passaged two times for approximately days in media without azt washout prior to infection. primary cultures were expanded for at most four passages before being frozen in ( % dmso, % fbs, % serum free dmem) using a freezing container (nalgene) and stored at - ˚c. bone marrow, thymus, spleen, and lymph node from naturally felv-infected domestic cats were washed with sterile pbs, were given fresh media, and were incubated with % co at ˚c for ten days. titration was repeated three times. felv antigen elisa detection, described below, was used to detect viral capsid antigen p in the supernatant.
the quantity of virus necessary to infect % of tissue cultures (tcid ) was calculated by previously published methods ( ). enfelv and exfelv quantification by real time qpcr ltr and env enfelv copy number was quantified in domestic cat cells. env was used as a proxy for full-length endogenous felv, and ltr copy number detected both full length enfelv as well as solo ltrs. exogenous felv proviral dna was measured by a third qpcr protocol targeting exfelv specific ltrs, which vary from enfelv ( ). enfelv-env, enfelv-ltr and exfelv-ltr primers and probes were previously designed and reactions were performed as described ( ) biorad cfx thermocycler. in order to determine enfelv and exfelv proviral load, quantified felv was normalized against feline ccr (c-c chemokine receptor type ; ( )) recognizing both domestic cat and puma ccr sequences. we used the delta ct method accounting for two ccr genes per cell ( ). custom dna oligos were synthetically constructed with target regions of enfelv ltr and env, exfelv, and ccr on one dna construct for quantification (gblock, idt; primer and probe sequences and qpcr thermocycling conditions are reported in table ( (bio-rad), water, and dna template. felv and ccr reactions were run simultaneously on the same plate on a bio-rad cfx at ˚c for minutes, followed by cycles of ˚c for seconds and ˚c for seconds. the limit of detection for this assay is ≥ copies per reaction. standards for this assay were created as custom synthetic oligos (gblocks, idt) containing a relevant fragment of the exogenous felv and ccr genes (supplementary figure ) . standard dilution and controls were run in duplicate and samples were run in triplicate. primary puma and domestic cat fibroblast cultures passaged fewer than five times were cultured in % fbs-supplemented dmem high glucose media. cells were plated at a density of , cells per cm in a -well plate and infected with a multiplicity of infection (moi) of . felv- e in triplicate and cultured with . 
ml media. μl of supernatant was collected and stored at ˚c at days , , , , , and for detection of p elisa. at days and , cells were harvested to determine cellular viability based on cell number and percent mortality by counting cells stained with trypan blue (gibco) on a hemocytometer. one domestic cat cell culture, one puma cell culture, and one crfk cell culture infection were terminated at day due to equipment failure. one puma primary cell culture triplicate infection was repeated twice. felv capsid antigen p was measured by sandwich elisa. costar immulon hb plates were coated with ug cm capture antibody (custom monoclonal, inc., us) in ul . m carbonate buffer ( . g/l sodium bicarbonate, . g/l sodium carbonate, ph ~ . ) overnight at ˚c. plates were blocked with ul % bsa in ten buffer for two hours. one hundred μl of samples buffered with μl elisa diluent were incubated for two hours on a plate shaker. six hundred micrograms of biotinylated secondary antibody (cm -b; custom monoclonal, inc., us) was incubated in each well, followed by : dilution of hrp-conjugated streptavidin (thermofisherscientific, ma). each step following sample incubation was followed with x wash with ten buffer ( . m tris base, . m edta, . m nacl, ph . - . ) with . % tween. all incubations were performed at room temperature. p antigen was detected indirectly following the addition of , ', , ' tetramethyl benzidine (tmb) substrate and peroxidase (biolegend, san diego, ca) at room temperature for . min before adding . n h so was quantified by bioanalyzer at nm. semi-purified felv p diluted in appropriate media (dmem or rpmi) was used as a standard curve. cutoff values for negative samples were three times the standard error over the average od measured for control media samples. one puma, one domestic cat, and one control cat experiment were ended at day . cutoff values were established by x standard error above average value for negative control wells (red line). 
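two of the calculations in the methods above lend themselves to a short sketch: normalizing felv signal to a two-copy reference gene by the delta ct method, and setting the elisa negative cutoff at three standard errors above the mean of control wells. the function names, the assumption of perfect doubling per cycle (amplification factor of 2.0), and the example readings are hypothetical; the study itself also used synthetic standards for quantification, which this sketch does not model.

```python
import math
import statistics


def copies_per_cell(ct_target, ct_reference, reference_copies_per_cell=2, amplification=2.0):
    """Delta-Ct estimate of target copies per cell.

    Assumes target and reference amplify with the same per-cycle factor and that
    each cell carries `reference_copies_per_cell` copies of the reference gene.
    """
    delta_ct = ct_reference - ct_target
    return reference_copies_per_cell * amplification ** delta_ct


def elisa_cutoff(control_ods, n_se=3.0):
    """Cutoff = mean control OD + n_se * standard error of the mean."""
    if len(control_ods) < 2:
        raise ValueError("need at least two control wells")
    mean_od = statistics.mean(control_ods)
    # Standard error uses the sample standard deviation (n - 1 denominator).
    se = statistics.stdev(control_ods) / math.sqrt(len(control_ods))
    return mean_od + n_se * se


if __name__ == "__main__":
    # Target detected one cycle earlier than the reference: twice as abundant.
    print(copies_per_cell(ct_target=24.0, ct_reference=25.0))
    # Invented control readings for illustration only.
    print(elisa_cutoff([0.05, 0.06, 0.04, 0.05]))
```

a sample would then be scored as antigen positive only if its optical density exceeds the value returned by `elisa_cutoff` for the matching control medium.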
variation between enfelv-ltrs and env. in individual cats, ltr copy number was greater than env copy number. enfelv-ltr ranged between - copies per cell. enfelv-env was more variable ranging between - copies per cell. b) domestic cat enfelv-ltr copy number was negatively correlated with felv antigen production (pearson's coefficient= - . ; p= . ). :e . . consortium ihgs. . initial sequencing and analysis of the human genome endogenous retroviruses and the human germline. current opinion in evolution of retroviruses: fossils in our dna endogenous retroviruses of non-avian/mammalian vertebrates illuminate diversity and deep history of retroviruses transmission, evolution, and endogenization: lessons learned from recent retroviral invasions. microbiology and molecular retrotransposons, endogenous retroviruses, and the evolution of retroelements degradation and remobilization of endogenous retroviruses by recombination during the earliest stages of a germ-line invasion endogenous retroviral syncytin: compilation of experimental research on syncytin and its possible role in normal and disturbed human placentogenesis expression and regulation of human endogenous retrovirus w elements feline leukaemia virus: half a century since its discovery evolutionary dynamics of endogenous feline leukemia virus proliferation among species of the domestic cat lineage multiple introductions of domestic cat feline leukemia virus in endangered florida panthers feline leukemia virus and other pathogens as important threats to the survival of the critically endangered iberian lynx (lynx pardinus) prevalence of infectious diseases in feral cats in northern florida prevalence of fiv and felv infections in cats in istanbul prevalence of feline leukaemia virus and antibodies to feline immunodeficiency virus and feline coronavirus in stray cats sent to an rspca hospital prevalence of feline immunodeficiency virus and other retroviral infections in sick cats in italy lymph node spleen thymus 
key: cord- -t bt eb authors: yao, dehui; lao, fang; zhang, zeyi; liu, yan; cheng, jianwei; ding, fengjiao; wang, xiaofei; xi, lun; wang, chuang; yan, xichong; zhang, rongkun; ouyang, fangxing; ding, hui; ke, tianyi title: human h-ferritin presenting rbm of spike glycoprotein as potential vaccine of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t bt eb the outbreak of covid- has so far afflicted millions of people all around the world and will have a long-lasting effect on every aspect of everyone's life. yet there is no effective approved treatment for the disease. in an effort to utilize human ferritin as a nanoplatform for drug delivery, we engineered a fusion protein by presenting the receptor-binding motif (rbm) of the sars-cov- virus spike glycoprotein on the n-terminus of ferritin subunits. the designed fusion protein, with a cage-like structure similar to that of the coronavirus, is a potential anti-sars-cov- vaccine. we hereby show the construction, preparation, and characterization of the fusion protein rbm-hftn. our initial affinity study confirmed its biological activity towards the ace receptor, which suggests its mode of action against sars-cov- could be either through vaccine therapy or through blocking the cellular entry of the virus as an antagonist of the ace receptor. the pandemic coronavirus disease (covid- ) caused by the sars-cov- virus has brought tremendous suffering to tens of millions of people around the world. even though quite a few clinical studies involving different approaches are under way for the treatment of the disease, there is no effective cure to date. vaccines are therefore urgently needed to prevent the further spread of covid- . the coronavirus, sars-cov- , consists of a large rna genome, four structural proteins, nonstructural proteins, and some accessory proteins.
the four structural proteins include spike, envelope, membrane, and nucleocapsid proteins, of which the spike glycoprotein is of particular interest for it is a popular vaccine target for corona virus. antibodies targeting the spike glycoprotein of sars-cov and mers-cov, especially its receptor-binding domain (rbd), was found to efficiently neutralize virus infection [ , ] . antibodies from sars-cov and sars-cov- patients however showed limited cross neutralization, in spite of the high sequence similarity between two viruses [ ] . other vaccine approaches include the production of live attenuated whole virion vaccines, inactivated whole virion vaccines, recombinant protein vaccines, and mrna based vaccines [ ] . sars-cov- was found to enter cells through binding of the host cellular receptor angiotensin-converting enzyme (ace ) via its spike glycoprotein soon after its outbreak in china [ ] . later the cryo-em revealed the structure of sars-cov- s-rbd complexed with its receptor human ace [ ] [ ] [ ] [ ] . the s subunit of spike glycoprotein undergoes a hinge-like conformational transition from "down" conformation to "up" conformation before binding to ace [ ] . the receptor-binding motif (rbm, amino acid in total) of spike glycoprotein in close contact of ace receptor was identified for its sequence between - [ ] , and this rbm was consistent with the identified s-rbd [ ] . this conformational transition state has become the target for antibody-mediated neutralization, and the atomic-level understanding of this transitional state and the identification of s-rbm would facilitate the vaccine design and development against sars-cov- . ferritin is a -mer protein assembly consisting of heavy chain ( kd) and light chain ( kd) . it has a cage-like structure in a way similar to sars-cov- . because of its unique structure, ferritin is a promising nanoplatform for antigen presentation and immune stimulation [ ] [ ] [ ] [ ] . 
its spherical architecture has an outer diameter of nm, suitable for rapid tissue penetration and draining to lymph node [ ] . in this work, we engineered a human ferritin heavy chain (hftn) by fusing and presenting the rbm of its spike glycoprotein as potential vaccine of sars-cov- . human ace (his tag) ( -h h) ,ace rabbit antibody ( -rp ), goat anti-rabbit/hrp secondary antibody(ssa , and sars-cov- ( -ncov) and rbd of spike glycoprotein (mfc tag)( -v h) were purchased from sino biological (beijing, china). the refolding solution was concentrated and buffer exchanged to mm tris-hcl buffer (ph . ) with a millipore lab scale tangential flow filter (tff) system. after buffer exchange, rbm-hftn was found to have a typical soluble ferritin structure. the fusion protein with its right space structure was further purified by using anion exchange chromatography (flow through mode), followed with a size-exclusion chromatography to remove aggregates and other low molecule impurities. the physico-chemical properties of fusion rbm-hftn were analyzed by sec-hplc overnight. after blocking with % bsa, ace protein (concentration . μg/ml) was allowed to bind to the surface coated proteins for h at c, followed by washing with washing buffer three times. anti-ace antibody (dilution : ) was incubated for . hours at c followed by washing for three times. secondary antibody/hrp ( : dilution) in % bsa was incubated c for . h followed by washing for three times. the tmb substrate was added and incubated for min, and then the absorbance was read at nm wavelength. ferritin heavy chain protein has found many biological applications in nanomedicines and molecular diagnostics [ ] [ ] [ ] . lately, a few constructs based on ferritin have been developed as antivirus and anticancer vaccines [ , , ] . because ferritin heavy chain protein is derived from a natural occurring protein in human, itself has low immunogenicity. 
its -mer assembly cage-like structure has four different symmetries, six -fold axes, eight -fold axes, twelve -fold axes, and twenty-four c -c interfaces. by presenting a fusion protein at the -fold axes, ferritin was able to display eight trimers of fused protein, resulting in enhanced immunogenicity of protein on display [ ] . in this work we engineered a human ferritin heavy chain fused with the rbm of spike glycoprotein of sars-cov- at its nterminus with (ggggs) short peptide linker ( figure a ). we chose a short polypeptide sequence ( amino acid) of spike glycoprotein rbm (s-rbm) based on latest -d cryo-em studies of spike glycoprotein-ace complex [ , ] . because this sequence doesn't appear to have a stable -d structure, we engineered it at the n-terminus, so that the fused s-rbm subunits are distant from each other and won't affect the formation -mer assembly. for the same reason, we chose a minimal size of rbd so that it is properly displayed on the surface of ferritin. in spite of those considerations, the rbm-hftn fusion protein was found mainly in the inclusion bodies, instead of in soluble form. hypothetically, the designed rbm-hftn fusion protein may involve in two different pathways against virus infection. first and most importantly, the rbm-hftn may act as vaccine against sars-cov- , and antibodies responsive to spike glycoprotein rbm may subsequently neutralize sars-cov- , followed by removal of virus by immune system ( figure b, upper pathway) . two antibodies found in covid- patients were shown to block the binding between virus s-protein rbd and cellular receptor ace [ ] . their therapeutic effect was validated in mouse model by reducing virus titer in infected lungs [ ] . those findings supported our hypothesis that by properly presenting s-protein rbd/rbm, the designed rbm-hftn has great potential as an anti-sars-cov- vaccine, thus inducing the production of antivirus antibodies. 
the second pathway, more obvious and direct, is to block the cellular entry of sars-cov- by pre-occupying the ace receptor with rbm-hftn, thereby suppressing virus proliferation ( figure b , lower pathway). the first pathway is more effective for prevention of virus infection. the second pathway is valuable for treatment after virus infection, with rbm-hftn acting as an antagonist of the ace receptor. the expressed rbm-hftn was found in the form of inclusion bodies in the bacterial lysis precipitates (supplementary figure s ). rbd of spike glycoprotein (mfc tag) and rbm-hftn at varied concentrations in l were incubated in a maxisorp plate, and ace was allowed to bind to the surface-coated proteins. binding was detected by an ace antibody followed by its secondary antibody. in this indirect elisa, the maximum binding intensity correlates with the number of binding sites on the surface. as indicated by the results in figure , s-rbd showed a higher plateau than rbm-hftn, suggesting that more binding sites are available in s-rbd coated wells than in rbm-hftn coated wells. the ec50 values for the binding of ace to s-rbd and to rbm-hftn were estimated to be . nm and . nm, respectively. rbm-hftn thus appears to have a higher binding affinity than the s-protein for the same ace receptor. the apparently higher binding affinity may be attributable to the cage-like structure of rbm-hftn, which presents multiple copies ( copies) of rbm on the surface. this result suggests that the rbm is properly presented on the surface of heavy chain human ferritin and is recognizable by the ace receptor. its potential as a sars-cov- vaccine and as an antagonist of the ace receptor is being further studied in animal experiments. the receptor-binding motif (rbm) of sars-cov- , a -amino acid polypeptide, was fused with the n-terminus of human heavy chain ferritin (hftn) through a proper linker. the constructed rbm-hftn was found in inclusion bodies of the bacterial lysate and could be refolded by gradual dialysis. 
the purified rbm-hftn was found to be in good purity, expected size and morphology. its biological activity towards ace receptor was confirmed by elisa and the fusion protein rbm-hftn bodes well as potential vaccine and therapeutics against sars-cov- as ace receptor antagonist. t. ke holds ownership interest (including patents) in kunshan xinyunda biotech co., ltd. no potential conflicts of interests were disclosed by the other authors. the spike protein of sars-cov--a target for vaccine and therapeutic development mers-cov spike protein: targets for vaccines and therapeutics a pneumonia outbreak associated with a new coronavirus of probable bat origin sars-cov- vaccines: status report. immunity, structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structural basis for the recognition of sars-cov- by full-length human ace structural basis of receptor recognition by sars-cov- cryo-em structure of the -ncov spike in the prefusion conformation receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus self-assembling influenza nanoparticle vaccines elicit broadly neutralizing h n antibodies engineered human ferritin nanoparticles for direct delivery of tumor antigens to lymph node and cancer immunotherapy a milk-based self-assemble rotavirus vp -ferritin nanoparticle vaccine elicited protection against the viral infection structure-based design of ferritin nanoparticle immunogens displaying antigenic loops of neisseria gonorrhoeae. 
febs open bio vaccine delivery: a matter of size, geometry, kinetics and molecular patterns ferritin nanocages: a biological platform for drug delivery, imaging and theranostics in cancer ferritins as nanoplatforms for imaging and drug delivery emerging and dynamic biomedical uses of ferritin structure and immunogenicity of a stabilized hiv- envelope trimer based on a group-m consensus sequence a noncompeting pair of human neutralizing antibodies block covid- virus binding to its receptor ace key: cord- - lumewy authors: cox, robert m.; sourimant, julien; govindarajan, mugunthan; natchus, michael g.; plemper, richard k. title: therapeutic targeting of measles virus polymerase with erdrp- suppresses all rna synthesis activity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lumewy morbilliviruses, such as measles virus (mev) and canine distemper virus (cdv), are highly infectious members of the paramyxovirus family. mev is responsible for major morbidity and mortality in non-vaccinated populations. erdrp- , a pan-morbillivirus small molecule inhibitor for the treatment of measles, targets the morbillivirus rna-dependent rna-polymerase (rdrp) complex and displayed unparalleled oral efficacy against lethal infection of ferrets with cdv, an established surrogate model for human measles. resistance profiling identified the l subunit of the rdrp, which harbors all enzymatic activity of the polymerase complex, as the molecular target of inhibition. here, we examined binding characteristics, physical docking site, and the molecular mechanism of action of erdrp- through label-free biolayer interferometry, photoaffinity cross-linking, and in vitro rdrp assays using purified mev rdrp complexes and synthetic templates. 
results demonstrate that unlike all other mononegavirus small molecule inhibitors identified to date, erdrp- inhibits all phosphodiester bond formation in both de novo initiation of rna synthesis at the promoter and rna elongation by a committed polymerase complex. photocrosslinking and resistance profiling-informed ligand docking revealed that this unprecedented mechanism of action of erdrp- is due to simultaneous engagement of the l protein polyribonucleotidyl transferase (prntase)-like domain and the flexible intrusion loop by the compound, pharmacologically locking the polymerase in pre-initiation conformation. this study informs selection of erdrp- as clinical candidate for measles therapy and identifies a previously unrecognized druggable site in mononegavirus l polymerase proteins that can silence all synthesis of viral rna. importance the mononegavirus order contains major established and recently emerged human pathogens. despite the threat to human health, antiviral therapeutics directed against this order remain understudied. the mononegavirus polymerase complex represents a promising drug target due to its central importance for both virus replication and viral mitigation of the innate host antiviral response. in this study, we have mechanistically characterized a clinical candidate small-molecule mev polymerase inhibitor. the compound blocked all phosphodiester bond formation activity, a unique mechanism of action unlike all other known mononegavirus polymerase inhibitors. photocrosslinking-based target site mapping demonstrated that this class-defining prototype inhibitor stabilizes a pre-initiation conformation of the viral polymerase complex that sterically cannot accommodate template rna. function-equivalent druggable sites exist in all mononegavirus polymerases. 
in addition to its direct anti-mev impact, the insight gained in this study can therefore serve as a blueprint for indication spectrum expansion through structure-informed scaffold engineering or targeted drug discovery. morbilliviruses belong to the paramyxovirus family of highly contagious respiratory rna viruses several cryo-electron microscopy-based reconstructions of mononegavirus polymerase proteins have been reported, including structural models of piv- and rsv l polymerases that are closely related to mev l ( , ). we have previously identified signature resistance hot-spots for erdrp- in mev and cdv l through viral adaptation ( , ) . escape sites locate to the polymerase and a catalytically defective l mutant harboring an n a substitution in the polymerase active site confirmed that rna synthesis was mev p-l specific and not due to co-purified cellular contaminants. 
polymerase complexes containing an erdrp- resistance mutations h y or t a showed greatly reduced susceptibility to compound-mediated suppression of de novo polymerase initiation, reflected by inhibitory concentrations of . µm and . µm, respectively, which represents a -and backpriming and/or primer extension ( , , ) . backpriming refers to the spontaneous formation of a resulting paired '-ends beyond the actual length of the template (figure a). to assess whether the same limitation applies to erdrp- , we applied the mev polymerase complex to a previously described -mer rna template derived from an authentic rsv promoter sequence that is capable of efficient backpriming ( ). the rsv template was efficiently recognized as a suitable substrate for de in conclusion, these results consistently demonstrate that erdrp- directly suppresses all phosphodiester bond formation, setting the compound mechanistically apart from all other allosteric mononegavirus inhibitor classes characterized to date. photoaffinity labeling maps the erdrp- target site to the central l cavity to map the physical target site of erdrp- , we synthesized a photo-activatable analog of the compound, erdrp- az , through installation of an aryl azide moiety at c- position of the erdrp- piperidine ring via a short tether (figure a). analog design was guided by our extensive insight into the structure-activity relationship (sar) of the erdrp- chemotype that we had acquired in previous work ( ). bioactivity testing of erdrp- az against mev revealed dose-dependent suppression of virus replication and an ec of . µm (figure b). no cytotoxicity was detectable at the highest concentration tested ( µm), indicating specific virus inhibition by the photo-activatable probe. 
compared to erdrp- , however, antiviral potency of erdrp- az was reduced approximately -fold, which may reflect reduced plasma membrane permeability of the azide analog in cell-based assays or partially impaired target access of the modified compound. we compensated for this reduction in activity by incubating the p-l complexes in the presence of µm compound prior to photo-activation. photo-coupling of erdrp- az to purified l followed by lc-ms/ms analysis after trypsin digestion of the ligand-l complexes identified three discrete peptides that are located in the capping we therefore concentrated further analysis on peptides and , which are near the intersection between the polymerase capping, connector and mtase domains, and in close proximity to highly conserved polymerase motifs such as the proposed prntase domain (hr moiety of motif d and g of motif a) as well as both the postulated paramyxovirus l priming and "intrusion" loops ( ) (figure e). overlaying the photo-crosslinking results and resistance maps in the mev l structural model revealed a circular arrangement of all potential erdrp- anchor points along the interior lining of the central polymerase cavity (figure a). docking of erdrp- into the l structure was guided by the following constraints: positioning of the ligand is compatible with the formation of covalent bonds with residues in photo-crosslinking defined peptides and ; and the docked compound is in equal erdrp- , two chemical analogs, the original screening hit ( ) and the first-generation lead the intersection of mev l capping and rdrp domains, between motifs a and d of the predicted and . - . 
Å distance to peptide , establishing hydrogen bond interactions between residue y in individual models were ranked based on goodness of fit (r ) and predictive capacity (q ) against the consistent with this conclusion, none of these three inhibitor classes affects extension of the rna template after backpriming ( , , ) , which is considered to mimic rna elongation by a committed characterization of erdrp- demonstrated that preventing the switch to elongation mode is not the only mechanism available to allosteric small molecule inhibitors to block mononegavirus polymerases, since the compound interrupted both initiation at the promoter and rna elongation after backpriming with equal potency. underscoring a unique mechanism of action of erdrp- , although three peptides were identified by photocrosslinking, we primarily focused on peptides and . peptide is located in a highly variable and unstructured region that is completely solvent hydrogen bonding between the sulfonyl group of erdrp- and this residue is very likely catastrophic for the formation of productive initiation complexes. confirmed resistance sites to erdrp- line the internal wall of the central polymerase cavity, but individual hot-spots do not cluster in the native structure and none is predicted to be in direct contact with the docked ligand. remarkably, however, resistance sites were located in highly sequence conserved l domains such as on the priming and intrusion loops (i.e. h y, r q, and v a) and/or in immediate proximity of functional motifs (i.e. s a and t a framing the gdnq catalytic center). we conclude that escape from erdrp- is mediated by secondary structural effects rather than due to primary resistance, which is unusual for non-nucleoside analog polymerase inhibitors ( - purified mev l was incubated with µm erdrp- az for minutes prior to activating the crosslinker. the mev l -erdrp- az mixture was incubated on ice and exposed to uv light ( nm) for min. 
the sample was then exposed to additional uv light ( nm) for minutes. protein was then collected using flag resin. bound resin was then incubated with laemmli buffer at °c for minutes. sds-page electrophoreses was then performed using laemmli buffer on - % acrylamide gels. bands of interest were excised and analyzed by mass spectrometry. erdrp- az crosslinked peptides were identified by the proteomics & metabolomics facility at the wistar institute as previously described ( ). in order to identify erdrp- az crosslinked peptides, mass addition of and created initial pharmacophore models based on the training set. after electrostatic, steric, and space filling model building was performed, a partial-least squares analysis was performed and each model was validated, scored, and ranked based on loo correlation (q ) and goodness of fit (r ). the predictive potential of the model was then tested using a conformation database of the test set. directly transmitted infections diseases: control by vaccination age-related changes in the rate of disease transmission the vesicular stomatitis virus l protein possesses the a dual-functional priming-capping loop of new therapeutic strategies in hcv: polymerase leu key: cord- -csdezu y authors: booeshaghi, a. sina; pachter, lior title: normalization of single-cell rna-seq counts by log(x+ )* or log( +x)* date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: csdezu y single-cell rna-seq technologies have been successfully employed over the past decade to generate many high resolution cell atlases. these have proved invaluable in recent efforts aimed at understanding the cell type specificity of host genes involved in sars-cov- infections. while single-cell atlases are based on well-sampled highly-expressed genes, many of the genes of interest for understanding sars-cov- can be expressed at very low levels. 
common assumptions underlying standard single-cell analyses don’t hold when examining low-expressed genes, with the result that standard workflows can produce misleading results. key points: lowly expressed genes in single-cell rna-seq can be easily misanalyzed; log(1+x) count normalization introduces errors for lowly expressed genes; the average log(1+x) expression differs considerably from log(x) when x is small; an alternative approach is to use the fraction of cells with non-zero expression. the ace receptor, which facilitates entry of sars-cov- into cells [ ] , has become one of the most studied genes in the history of genomics over the past two months. there are already hundreds of preprints about the gene (google scholar), and it is currently the default gene displayed on the ucsc genome browser [ ] . several studies have reported on the expression of ace at single-cell resolution, and papers have been rife with speculation about implications of differential ace mrna abundance for severity of disease. as is common in single-cell rna-seq, the expression estimates of ace are derived from counts that are filtered and normalized. figure a shows an analysis of ace mrna in mouse lungs (data from [ ] ). the expression is computed from cells containing at least one copy of the gene. while single-cell rna-seq expression data has been modeled with many different distributions [ , ] , for simplicity in illustrating our points we model this count data with a simple poisson random variable x with parameter λ in order to demonstrate the implications of this restriction. application of the filter amounts to computing $E[X \mid X > 0] = \frac{\lambda}{1 - e^{-\lambda}}$. while this is approximately λ when λ is large, it is close to 1 when λ is small [ ] . figure b shows the fraction of cells containing at least one copy of ace [ ] . evidently, figure a creates a misleading impression. 
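the effect of the filter can be checked numerically. the following is a minimal sketch, assuming the poisson model used above; the function names are hypothetical and not taken from the authors' code. it sums the poisson probability mass function over nonzero counts and compares the result with the closed form λ / (1 − exp(−λ)).

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), evaluated in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def mean_of_nonzero_counts(lam: float) -> float:
    """E[X | X > 0], computed by summing k * P(X = k) over k >= 1."""
    # Truncation point chosen so that the neglected tail mass is negligible.
    k_max = int(lam + 20 * math.sqrt(lam) + 50)
    p_nonzero = -math.expm1(-lam)  # 1 - exp(-lam), accurate for small lam
    total = sum(k * poisson_pmf(k, lam) for k in range(1, k_max + 1))
    return total / p_nonzero


def closed_form(lam: float) -> float:
    """lam / (1 - exp(-lam)): the mean of a zero-truncated Poisson."""
    return lam / -math.expm1(-lam)


if __name__ == "__main__":
    for lam in (0.01, 0.1, 1.0, 10.0, 50.0):
        print(
            f"lambda={lam:<5} filtered mean={mean_of_nonzero_counts(lam):.4f} "
            f"closed form={closed_form(lam):.4f}"
        )
```

for small λ the filtered mean stays near one regardless of the true rate, which is why averages computed only over expressing cells can hide real differences in abundance.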
while it may appear that average ace expression is similar between young and old mice, when comparing the fraction of cells with nonzero expression of ace it is clear that ace has significantly lower mrna expression in the lungs of aged mice than young mice. figure : a) changes in ace expression in the lungs of eight -month old mice and seven -month old mice after log1p transformation of the raw counts on the cells with non-zero ace expression. the p-value was computed using a t-test. b) changes in ace expression as determined by the fraction of ace positive cells. the p-value was computed using a t-test. c) a comparison of the naïve estimate of the expectation of log1p (red) to the taylor approximation of the expectation of log1p (blue). the code to produce the panels in the figure is available here. the fraction f of cells with nonzero expression of a gene has a useful statistical interpretation. we leave it as an exercise for the reader to show that the following estimator for the poisson rate is consistent: $\hat{\lambda} = -\log(1 - f)$. since f is approximately equal to this expression when f is small, this provides an interpretation of the fraction of cells with at least one copy of a low-abundance gene as an estimate of the rate parameter λ in a poisson distribution. another mistake that we've found to be common in reporting ace expression has to do with the log transformation, frequently used as part of a normalization of counts. counts are log transformed for two reasons: the first is to stabilize the variance, as the log transform has the property that it stabilizes the variance for random variables whose variance is quadratic in the mean [ , ] . the rationale of this step for single-cell rna-seq is manifold: first, when performing pca on the gene expression matrix to find a reduced-dimensional representation that captures the variance, it is desirable that all genes contribute equally. 
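the fraction-based rate estimate discussed earlier in this section can be sketched in a few lines. this is an illustration under the poisson assumption only: under that model the probability of a nonzero count is 1 − exp(−λ), so inverting gives −log(1 − f). the function names and the toy counts are hypothetical.

```python
import math


def fraction_nonzero(counts):
    """Fraction f of cells with at least one copy of the gene."""
    if not counts:
        raise ValueError("counts must be non-empty")
    return sum(1 for c in counts if c > 0) / len(counts)


def rate_from_fraction(f):
    """Poisson rate estimate -log(1 - f), from P(X > 0) = 1 - exp(-lambda)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must lie in [0, 1)")
    return -math.log1p(-f)  # log1p keeps precision when f is small


if __name__ == "__main__":
    # Toy data: 100 cells, 10 of which contain the transcript.
    counts = [0] * 90 + [1] * 8 + [2] * 2
    f = fraction_nonzero(counts)
    print(f"f = {f:.3f}, estimated rate = {rate_from_fraction(f):.4f}")
```

when f is small the estimate and f itself nearly coincide, which is the interpretation given in the text.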
the second rationale for the log transform is that it converts multiplicative relative changes to additive differences. in the context of pca, this allows for interpreting the projection axes in terms of relative, rather than absolute, abundances of genes. a seemingly minor technical issue in log transforming counts is that zero counts cannot be "logged", as log(0) is undefined. to circumvent this problem, it is customary to add a "pseudocount", e.g. +1, to each gene count prior to log transforming the data [ ] . we denote this by log1p (see units of figure a ), in accordance with nomenclature standard in scientific computing [ ] . for a gene with an average of λ counts where λ is large, it is intuitive that the average of the log1p transformed counts is approximately log(λ). however, this is not true for small λ. an understanding of the result of applying the log1p transform begins with the observation that for a random variable x, e[f(x)] is not, in general, equal to f(e[x]). for example, if x is a poisson random variable with parameter λ, it is not true that e[log(1 + x)] = log(λ + 1). by taylor approximation, $E[\log(1 + X)] \approx \log(\lambda + 1) - \frac{\lambda}{2(\lambda + 1)^2}$. this shows that for low-expressed genes, the average log1p expression differs considerably from log(λ) (see figure c) . thus, while a -fold change for large λ translates to a log( ) difference after log1p, that is not the case for small λ. in summary, while single-cell rna-seq atlases offer detailed information about the transcriptomic profiles of distinct cell types, their use to examine specific genes, as has been done recently in the study of sars-cov- infection related genes, requires care. methods should not be used unless their limitations are understood. for example, while it doesn't matter whether one uses log(x+1) or log(1+x), the filtering and normalization applied to counts can affect comparative estimates in non-intuitive ways. 
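the gap between the average of log-transformed counts and the log of the average can be made concrete with a short calculation. the sketch below assumes the poisson model from the text; the function names are hypothetical, and the doubling of λ is used purely as an illustrative fold change. it compares the exact expectation of log(1 + x) with the naive value log(1 + λ) and with the second-order taylor approximation log(1 + λ) − λ / (2(1 + λ)²).

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), evaluated in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def exact_mean_log1p(lam: float) -> float:
    """E[log(1 + X)], computed by summing over the Poisson pmf."""
    k_max = int(lam + 20 * math.sqrt(lam) + 50)  # neglected tail is negligible
    return sum(poisson_pmf(k, lam) * math.log1p(k) for k in range(k_max + 1))


def naive_mean_log1p(lam: float) -> float:
    """log(1 + E[X]): the transform applied to the mean."""
    return math.log1p(lam)


def taylor_mean_log1p(lam: float) -> float:
    """Second-order Taylor approximation of E[log(1 + X)] around the mean."""
    return math.log1p(lam) - lam / (2.0 * (1.0 + lam) ** 2)


if __name__ == "__main__":
    for lam in (0.1, 1.0, 10.0, 100.0):
        print(
            f"lambda={lam:<6} exact={exact_mean_log1p(lam):.4f} "
            f"naive={naive_mean_log1p(lam):.4f} taylor={taylor_mean_log1p(lam):.4f}"
        )
    # Effect of doubling lambda on the average log1p value, low vs high expression.
    low = exact_mean_log1p(0.2) - exact_mean_log1p(0.1)
    high = exact_mean_log1p(200.0) - exact_mean_log1p(100.0)
    print(f"doubling at low lambda: {low:.4f}, at high lambda: {high:.4f}")
```

for highly expressed genes the three quantities nearly agree and a doubling of λ shifts the average by close to log(2); for lowly expressed genes the same doubling produces a much smaller shift, so differences on the log1p scale cannot be read as fold changes.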
moreover, there are subtle problems that arise when working with small counts that transcend the elementary issues we have raised [ , ] . these matters are not theoretical; we leave the identification of published preprints and papers that have ignored the issues we've raised, and hence reported misleading results, as another exercise for the reader. angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target ucsc genome browser enters th year an atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics bayesian gamma-negative binomial modeling of single-cell rna sequencing data observation weights unlock bulk rna-seq tools for zero inflation and single-cell applications analyse des infiniment petits moutard decrease in ace mrna expression in aged mouse lung. biorxiv, the use of transformations a new family of power transformations to improve normality or symmetry scclustviz-single-cell rnaseq cluster assessment and visualization berkeley elementary functions test suite why you cannot transform your way out of trouble for small counts overcoming systematic errors caused by log-transformation of normalized single-cell rna sequencing data. biorxiv we thank charles herring, michael hoffman, harold pimentel, jeffrey spence, and valentine svensson for helpful comments. key: cord- -ex zlq authors: de wijngaert, brent; sultana, shemaila; dharia, chhaya; vanbuel, hans; shen, jiayu; vasilchuk, daniel; martinez, sergio e.; kandiah, eaazhisai; patel, smita s.; das, kalyan title: cryo-em structures reveal transcription initiation steps by yeast mitochondrial rna polymerase date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ex zlq cryo-em structures of transcription pre-initiation complex (pic) and initiation complex (ic) of yeast mitochondrial rna polymerase show fully resolved transcription bubbles and explain promoter melting, template alignment, dna scrunching, transition into elongation, and abortive synthesis. promoter melting initiates in pic with mtf trapping the − to − non-template (nt) bases in its nt-groove. transition to ic is marked by a large-scale movement that aligns the template with rna at the active site. rna synthesis scrunches the nt strand into an nt-loop, which interacts with centrally positioned mtf c-tail. steric clashes of the c-tail with rna:dna and nt-loop, and dynamic scrunching-unscrunching of dna explain abortive synthesis and transition into elongation. capturing the catalytically active ic-state with utpαs poised for incorporation enables modeling toxicity of antiviral nucleosides/nucleotides. mitochondrial dna is transcribed by single-subunit rnaps (mtrnap), which unlike their phage counterparts, depend on one or more transcription factors for promoter-specific transcription initiation. much of our understanding of mitochondrial dna transcription comes from studies of yeast (s. cerevisiae) and human mtrnaps ( , , ) . the yeast mtrnap transcription initiation complex (y-mtic) is comprised of the catalytic subunit rpo and a transcription factor mtf . the human mtrnap transcription initiation complex (h-mtic) comprises of polrmt and two transcription factors. the h-mtic has been structurally characterized by crystallography ( ) , but the structure was captured in an inactive fingersclenched state with a major part of the transcription bubble disordered. hence, the structural basis for promoter melting, dna scrunching, and transcription initiation remains largely unknown for the mtrnaps. rpo (∆n ) and mtf were assembled on a pre-melted promoter (- to + , s yeast mtdna promoter; fig. 
a ) to generate the yeast mitochondrial transcription preinitiation complex (y-mtpic) (fig. s ). the y-mtpic was incubated with pppgpg rna and a non-hydrolysable utpas to generate the y-mtic poised to incorporate the + ntp. singleparticle cryo-em data analysis of the quinary y-mtic revealed a surprising coexistence of pic and ic states in equilibrium (fig. s ) . the y-mtic structure had bound rna and ntp, and y-mtpic structure had no rna or ntp. another dataset collected from a pic-only grid extends the resolution to . Å. key steps guiding transcription initiation are revealed from the . Å y-mtpic and . Å y-mtic structures. in y-mtpic structure ( fig. b & c) , the mtf is traced from to out of amino acid residues, rpo is traced from to the end residue with few disordered regions, and unambiguously traced dna (fig. s a) . a stable transcription core is composed of rpo , mtf , and transcription bubble (fig. c ). rpo interacts with mtf at multiple locations. two rpo b-hairpins -the intercalating hairpin (ich) and the mtf supporting hairpin (k -p ) form a crescent-shaped platform that accommodates the c-terminal domain of mtf ( - ) (fig. d) . the n-terminal domain of mtf contacts the tip of the rpo thumb helix (fig. e) ; biochemically, we show the interaction stabilizes rpo -mtf complex ( fig. s c -d) ( ) . the mtf -supporting hairpin also guides the mtf c-tail ( - ) towards the active site (fig. f ); the c-tail is disordered in free mtf ( ) . the pic structure has not been observed previously; thus, it provides new insights into the mechanism of promoter melting. the y-mtpic structure suggests that rpo and mtf initiate the promoter melting by creating a -nt transcription bubble from - to - . we provided a - to + pre-melted promoter; however, the + and + nucleotides assume a duplex-like dna conformation in pic albeit lacking canonical base-pairing. 
dna melting is driven by sequence-specific interactions of the nt strand with the nt-groove that lies at the interface of the n- and c-terminal domains of mtf (residues - , - , and - ). the - to - aag bases in the nt strand are flipped towards the nt-groove (fig. g ). the - guanine base is sandwiched between the aromatic side chains of y and w , and all n and o atoms of the base, except n , are engaged in complementary hydrogen bond interactions (fig. h); mutation of the - guanine severely impairs promoter melting ( ). the - nt base stacks with y with no base-specific interaction. the - and - aa bases are the quinary y-mtic has a promoter designed to bind a -mer rna and incorporate a third nucleotide (fig. a). the . Å density map of ic locates many previously uncharacterized structural elements, including the c-tail of mtf and the scrunched dna (fig. b-c), and reveals the mechanism of template alignment and rna synthesis during initiation. comparison of pic and ic shows little shift of the upstream promoter dna and interacting structural elements (including ich, specificity loop, and thumb of rpo and nt-groove of mtf ) during the pic to ic transition. in contrast, the downstream dna and the interacting c-terminal domain of rpo undergo large conformational changes, including fingers closing (movie s ). during the transition, the upstream dna from position - onward is locked in the mtf nt-groove while the template strand undergoes a large conformational switch to align with the -mer rna and the incoming utpαs at the active site (fig. c; movie s ). these unsynchronized events at the two ends of the transcription bubble scrunch the nt strand into an nt-loop (movie s ). dna scrunching has been proposed in multi- and single-subunit rnaps including y-mtrnap ( ) ( ) ( ) ( ). our y-mtic structure is the first to capture the scrunched conformation (fig. c). the scrunched nt-loop is stabilized by ich (h , n ), thumb (r and k ), and mtf c-tail (m -y ). 
the looping of the nt strand alters the downstream dna track and bends it from ~ ° to ~ ° with respect to the upstream dna, thus transforming the v-shaped dna in pic into a u-shape in ic (fig. s ). biochemical studies show that the mtf c-tail plays an important role in template alignment, dna scrunching, and triggering transition into elongation ( ). the ic structure captures the entire c-tail in the active-site cavity, where it interacts with the template dna, nt-loop, and '-end of the rna transcript. the c-tail base is stabilized by ich, thumb helix, and a loop ( - ) of rpo (fig. a ). the c-tail tip residue s is Å away from the '-end α-phosphate of pppgpg rna. the main chain carbonyls of e and h hydrogen bond with the n and n atoms of the - template-base; similar interactions with n and n atoms of unmutated cytosine at the - position are expected. the c-tail also stabilizes the scrunched nt-loop; m -y of the c-tail stack against the looped-out + and + nt bases (fig. b ). structural projection indicates that rna synthesis will progressively push the c-tail out of its position in ic (fig. c ), and at a critical length of rna, the c-tail will be displaced from the active-site cavity. single-molecule fret studies show that the ic to ec transition completes at -mer rna synthesis and that c-tail deletion delays the transition ( , ). superposition of y-mtic on polrmt ec with -bp rna:dna ( ) shows that the c-tail must exit for the ic to ec transition (fig. d ). we expect a similar role for the c-tail in the homologous h-mtic (fig. s ). upon complete displacement of the c-tail, the mtf -supporting hairpin will switch its role from guiding the c-tail in ic to supporting the upstream dna in ec ( ). the pic and ic structures provide a basis for understanding the mechanism of abortive synthesis. 
abortive synthesis is observed in all dna-dependent rnaps during transcription initiation, and y-mtrnap generates large amounts of -mer and -mer abortive products compared to -to -mer abortives ( the y-mtic has captured the incoming ntp in a catalytic-competent state poised for incorporation (fig. a); the ntp and the dna:rna duplex make extensive interactions with rpo in the active site, which is highly conserved in mitochondrial rnaps (fig. s and s ). the structure of y-mtic permits reliable modeling of antiviral nucleosides/nucleotides for cytotoxicity prediction. nucleos(t)ide analogs are widely used to treat viral infections and can cause cytotoxicity by binding to cellular rnap and mitochondrial polrmt. remdesivir is a nucleotide analog with a broad antiviral profile ( ), including treatment of sars-cov- (covid- ) infection. modeling of remdesivir diphosphate into the ntp-binding pocket of mtrnap reveals that the characteristic 'cyano group of remdesivir clashes with the conserved h in polrmt (fig. b). thus, remdesivir is expected to have low cytotoxicity, consistent with its low incorporation efficiency by polrmt ( , ). the platform therefore provides a framework for testing mitochondrial toxicity of nucleoside analogs. g. q. tang the coordinates and density maps for the y-mtpic and y-mtic structures were deposited under pdb accession numbers ymv and ymw and emdb ids emd- and emd- , respectively. all data are available in the main text or the supplementary materials. all data, code, and materials used in the analysis will be available for purposes of reproducing or extending the analysis. (b) superposition of y-mtpic structure (black non-template; gray template) on y-mtic structure (cyan non-template; pink template) shows the bending of the downstream dna while the upstream dna in both structures is aligned. 
the angles between the upstream and downstream dna are about ° and °, respectively, in y-mtpic (blue axes) and y-mtic (red axes); i.e., the dna is bent by ~ ° in pic and subsequently by another ° to ~ ° in the ic structure. the dna bending calculations were done using the curves+ server ( ). movies s -s show the conformational changes during the transition from the pic to the ic state. qln) shows that the promoter dna template in y-mtic (pink) is bent sharply at the active site and after nucleotides, which is analogous to the template track in t rnap ic (gray); however, the bound utpαs captures y-mtic in the catalytic mode for nucleotide incorporation, whereas the t rnap ic structure represents the post-translocated state with no bound ntp. (c) the comparison also shows that the ntp-binding pocket undergoes a conformational change. the conserved y on the o helix of t rnap must shift to the position that y of rpo takes to accommodate an ntp. (d) superposition of y-mtic on the h-mtic structure (pdb id eqr) shows that the y-helix of polrmt clashes with the template strand, and the state observed in the h-mtic structure would not position the template in a conformation that is compatible with rna/ntp binding. the h-mtic structure represents an inactive-clenched state. the y-helix ( - of rpo ) is critical for downstream dna unwinding. 
insights into transcription: structure and function of singlesubunit dna-dependent rna polymerases maintenance and expression of mammalian mitochondrial dna structural basis of mitochondrial transcription mechanism of bacterial transcription initiation: rna polymerase -promoter binding, isomerization to initiation-competent open complexes, and initiation of rna synthesis mechanism of transcription initiation by the yeast mitochondrial rna polymerase structural basis of mitochondrial transcription initiation the thumb subdomain of yeast mitochondrial rna polymerase is involved in processivity, transcript fidelity and mitochondrial transcription factor binding crystal structure of the transcription factor sc-mttfb offers insights into mitochondrial transcription transcription factor-dependent dna bending governs promoter recognition by the mitochondrial rna polymerase mutations in the yeast mitochondrial rna polymerase specificity factor, mtf , verify an essential role in promoter utilization structure of a transcribing t rna polymerase initiation complex initial transcription by rna polymerase proceeds through a dna-scrunching mechanism movies s -s movie s . overall structural change in the transition from pic to ic. morphing between the transcription pre-initiation state (y-mtpic, gray rpo and dna, yellow mtf ) and initiation state (y-mtic, yellow mtf , blue rpo , cyan non-template, and pink template) simulates the conformational changes in the promoter and protein during transition from the pic to ic state. the downstream dna bends inward and parts of the c-terminal domain including fingers (in front) undergo large conformational changes. mtf , upstream dna, and parts of the n-terminal domain of rpo that interact with mtf and upstream dna show minimal conformational changes; e.g. the thumb helix on the left and mtf on the top have minimal movements. 
morphing between the pic state (gray dna) and ic state (cyan non-template, pink template, and -mer rna pppgpg and utpαs in stick models) shows dna bubble expansion associated with the template base-pairing with the -mer rna and utp at the polymerase active site. the downstream dna bends by about ° (fig. s ). the protein atoms are removed for clear visualization of the dna. movie s . scrunching of the non-template dna strand as an nt-loop. morphing between the pic state (gray dna) and ic state (cyan non-template, pink template, and -mer rna pppgpg and utpαs in stick models) shows looping of the non-template strand into an nt-loop. this looping appears to be a major contributor to bending of the downstream dna with respect to the upstream dna (fig. s ). key: cord- -wetqqt i authors: brandell, ellen e.; becker, daniel j.; sampson, laura; forbes, kristian m. title: the rise of disease ecology date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wetqqt i disease ecology is an interdisciplinary field that has recently grown rapidly in size and influence. we described the composition and educational experiences of disease ecology practitioners and identified changes in research foci. we combined a global survey with a literature synthesis involving machine-learning topic detection. disease ecology practitioners have diversified in the last decade in terms of gender identity and institution, with weaker diversification in terms of race and ethnicity. topic detection analysis of over , research articles revealed research foci that have declined (e.g., hiv), increased (e.g., infectious disease in bats), or remained common (e.g., malaria ecology, influenza). 
the steady increase in topics such as climate change, emerging infectious diseases, and superspreaders indicates that disease ecology as a field of research will continue advancing our understanding of complex host-pathogen interactions and forms a critical and adaptable component of the global response to emergent health and environmental threats. among these is the urgency to understand and address novel disease threats, which are rooted in natural systems but are often exacerbated by societal inequalities (carlson & mendenhall ). for example, the impact of habitat degradation on pathogen spillover is an expanding area of research that can be used to guide risk assessments and environmental policy (patz et al. ). at the same time, infrastructure has developed around disease ecology, such as a specialized national science foundation and national institutes of health funding program and conference series (scheiner & rosenthal ), which have helped to direct research effort and create networks among researchers. still, many questions remain as to the composition of disease ecology practitioners, core research foci, and how research trends have changed to meet societal needs. answering these questions will help to improve training pathways and prioritize emphases for future research. however, understanding these complex and interrelated factors as they apply to an interdisciplinary research topic requires diverse and innovative approaches. here we characterize both the practitioners and field of disease ecology by addressing the following questions: ( ) who comprises the field in terms of education, demographics, and research foci? ( ) which are the most influential scientific articles and journals? ( ) and significantly, how have research trends emerged and changed over time? for example, do they follow global health priorities such as disease outbreaks? 
to answer these questions, we surveyed self-declared disease ecologists globally and conducted a literature synthesis with machine-learning topic detection ( had to meet specific criteria using boolean filters, including a focus on studying a pathogen or parasite, host infections (to distinguish from solely environmental persistence of microorganisms), and individual-level or higher-order dynamics (e.g. not cellular processes, with the exception of those analyzed as a population-level process). the full list of search terms is provided in the supporting information, alongside a set of exclusionary terms to remove similar but non-disease ecology articles. web of science categories were used to narrow our search and also reduce false-positive inclusions. finally, articles with fewer than four citations were removed as a form of quality control. to evaluate false-positives, two authors (djb and kmf) independently evaluated the same randomly selected articles and classified them as 'disease ecology' or 'outside the field'. papers that fell outside the field predominantly described bacterial communities, within-host behavior or adaptations, or genetics/genomics. over % of the articles in the final corpus were classified as disease ecology, and consensus was strong among evaluators ( % agreement, cohen's κ= . ). within-host studies were accepted if they focused on population-level processes (e.g., to evaluate false-negatives, we cross-validated our corpus using our survey data. specifically, we assessed whether articles that were identified by at least two respondents as influential were present in our corpus. we calculated the proportion of papers that were included in our corpus out of the list of such articles, with the requirement that at least % of papers had to be included. of the influential articles identified by survey respondents (written ≥ times) restricted to journals used in building the corpus, approximately % ( / ) were present in the corpus. 
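the agreement check between the two evaluators can be illustrated with a short calculation. the following is a minimal python sketch, not the authors' code: the function name and the two label lists are hypothetical and invented for illustration only (they are not the study's ratings). it computes raw agreement and cohen's κ for two raters who classify the same articles as 'disease ecology' or 'outside the field'.

```python
from collections import Counter


def agreement_and_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of articles")
    n = len(rater_a)
    # Observed agreement: fraction of articles given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_chance == 1.0:
        # Both raters used a single identical label; kappa is undefined, report 1.0.
        return p_observed, 1.0
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa


# Hypothetical ratings for ten articles (illustration only, not the study's data).
rater_a = ["disease ecology"] * 8 + ["outside the field"] * 2
rater_b = ["disease ecology"] * 7 + ["outside the field"] * 3

p_observed, kappa = agreement_and_kappa(rater_a, rater_b)
print(f"agreement = {p_observed:.2f}, kappa = {kappa:.3f}")
```

in this invented example the raters disagree on one article out of ten, so raw agreement is 0.90 while κ is lower (about 0.74), because κ discounts the agreement expected by chance from each rater's label frequencies.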
yet the 'most influential' articles had a higher probability of being included: the corpus included % of articles written four or more times, % of articles written three times, and % of articles written twice. we adjusted the search and exclusion terms twice using the workflow described in figure small or too large, we were unable to detect temporal variation in that topic. if j was too small or too large, the topics were not clearly defined. for example, a topic with only five words may not be interpretable; similarly, a topic with words may be too broad to assign meaning. we used i = and j = , so our corpus was analyzed for topics with words each. we used k-means clustering from the nltk python library to construct topics, where each topic comprised commonly co-occurring words. we assigned a name to each topic to describe its theme. for example, we named a topic containing immunodeficiency, hiv, patient, therapy, drug, aids, background, treatment, and risk an hiv topic. we gave each topic name a 'confidence' measurement of - , from high to low confidence in identifying the topic. in addition to topics that emerged from the literature, we also generated and assessed our own topic lists based on key research areas, such as climate change, dilution effect, superspreaders, network analysis, eids, bovine tuberculosis, infectious diseases in bats and rodents, and chytrid fungus (fig. ). to ensure topic trends were not confounded by an increase in the total number of published articles through time, we constructed a baseline topic using neutral words that should be in all disease ecology articles: analysis, study, and paper. we evaluated temporal trends in publications for each theme using generalized additive models (gams) fit using the mgcv package in r (wood ). the proportion of words in each topic relative to all words was modeled as a binomial response using thin plate splines with shrinkage for publication year. 
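the binomial response described above (topic words out of all words, per publication year) can be tabulated before any model is fit. the sketch below is a minimal python illustration under stated assumptions, not the authors' pipeline: the function name, the whitespace tokenization, the four example texts, and the years are all hypothetical. the two word lists are the hiv topic and the neutral baseline topic named in the text. the gam itself was fit in r with mgcv and is not reproduced here.

```python
from collections import defaultdict

# Word lists taken from the text: the HIV topic and the neutral baseline topic.
HIV_TOPIC = {"immunodeficiency", "hiv", "patient", "therapy", "drug",
             "aids", "background", "treatment", "risk"}
NEUTRAL_TOPIC = {"analysis", "study", "paper"}


def yearly_topic_counts(articles, topic_words):
    """Map each year to (topic-word count, total word count).

    These pairs are the successes and trials of a binomial response.
    Tokenization here is a plain lowercase whitespace split (an assumption).
    """
    counts = defaultdict(lambda: [0, 0])
    for year, text in articles:
        tokens = text.lower().split()
        counts[year][0] += sum(token in topic_words for token in tokens)
        counts[year][1] += len(tokens)
    return {year: tuple(pair) for year, pair in sorted(counts.items())}


# Hypothetical mini-corpus of (year, abstract text); invented for illustration.
articles = [
    (2000, "hiv therapy study of patient risk"),
    (2000, "analysis of aids drug treatment"),
    (2010, "bat virus spillover study"),
    (2010, "paper on hiv in bats"),
]

hiv_counts = yearly_topic_counts(articles, HIV_TOPIC)
neutral_counts = yearly_topic_counts(articles, NEUTRAL_TOPIC)

for year in hiv_counts:
    hits, total = hiv_counts[year]
    base_hits, _ = neutral_counts[year]
    print(year, f"hiv: {hits}/{total}", f"neutral: {base_hits}/{total}")
```

in this toy corpus the hiv topic falls from 7 of 11 words to 1 of 9 words, while the neutral topic stays at 2 words in both years. comparing each topic against such a baseline is what separates a genuine shift in focus from a simple change in the amount of text published.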
lastly, to assess covariation among topics, we estimated spearman's rank correlation coefficients (ρ) at the zero-year lag. survey a total of self-declared disease ecologists participated in the survey. the average respondent was . years old ( - , median: ; n= ). . % of participants (n= ) considered at least half of their research to fall within disease ecology. participants that considered ≥ % of their research to be disease ecology were concentrated from ages - , and most self-identified as women ( %) (n= ). more broadly, . % of participants identified as women (n= ), . % as men (n= ), . % as other (n= ), and . % preferred not to say (n= ). we report on participants that chose to disclose a gender identity for results regarding gender. most respondents identifying as women were younger (age ≤ - ) than most respondents identifying as men (age - ). the youngest age category (≤ years) was . % women (n= ), and the oldest age category ( + years) was . % men (n= ). current positions held by survey participants were: undergraduate student ( . %, n= ), master's student ( . %; n= ), phd student ( . %; n= ), post-doctoral researcher ( . %; n= ), faculty ( . %; n= ), researcher ( . %; n= ), and other ( . %; n= ). respondents identifying as women comprised most of each academic position except master's student and faculty (table s ). in general, most phd students and post-doctoral researchers were young and identified as women. most masters' students were young and identified as men, and most faculty were middle-aged and identified as men (tables s -s , fig. s c ). participants that did not identify with a strict gender binary were distributed across age (≤ - ) and position categories. important' areas were parasitology, immunology, field/laboratory techniques, microbiology/virology, and genetics/genomics/bioinformatics. 
ecology was also listed as least important, suggesting that most participants considered ecology either the most or the least important area of research; however, the number of responses for the former was nearly three times greater than for the latter. survey respondents were asked to write in scientific journals and articles that they believed were the most influential in disease ecology (table ). we compiled a list of journals that at least four survey participants said were the most important in disease ecology, plus science and nature. we searched these journals for relevant articles in the field using the algorithm described below, and our final corpus many of the topics that emerged from the disease ecology literature, such as malaria, influenza, and vaccination, have remained constant over time (fig. b). others, such as hiv and serology, have declined over time, whereas host-pathogen coevolution has steadily increased. these emergent topics comprised a notable portion of the disease ecology literature and were more prominent than author-selected topics. we constructed a neutral topic for comparison, which was constant through time (fig. b, gray line in panels), thus validating the observed temporal changes in these topics. using key term searches, we next explored select topic trends: climate change, emerging infectious diseases (eids), the dilution effect, superspreaders, network analysis, pathogens in rodents and bats, bovine tuberculosis, and chytrid fungus in amphibians (fig. b). as with emergent topics, our topic detection was sensitive to detecting changes in frequency over time, have remained prominent foci of disease ecology, whereas an increase in a priori selected topics such as emerging infectious diseases, climate change, and effects of biodiversity loss emphasizes how this expanding field has grown to meet global health crises and priorities. 
further, addressing diversity and allocating resources toward these growing topics could promote equity within disease ecology, improve training programs, designate funding opportunities, and provide the infrastructure for concentrated advancements. self-declared disease ecology practitioners are becoming more diverse in terms of country of education, gender identity, and institution (fig. ) . this echoes similar trends in conservation in general, research on epidemics tended to be responsive rather than anticipatory, such that we observed an immediate spike in publications on high-profile pathogens followed by a decline or plateau (e.g., bovine tuberculosis and chytrid fungus). emergent topics were remarkably stable through time, with the exception of hiv and host-pathogen coevolution, which have respectively decreased and increased. research focusing on concepts (e.g., dilution effect, superspreaders, coevolution) or approaches (e.g., network analyses) rather than specific hosts or pathogens tended to rise more gradually and remain a notable proportion of the literature. on the other hand, mosquito-borne pathogens and influenza have been defining topics over the entire time series, which we expect to persist for the foreseeable future. although our analysis of cross-correlation between the topic time series is associative, we observed several especially interesting relationships. 
in particular, publications on bat disease, chytrid fungus, climate change, the dilution effect, superspreaders, and emerging infectious diseases were all positively economic burden of livestock disease and drought in northern tanzania climate change and infectious diseases: from evidence to a predictive framework the proximal origin of sars-cov- infectious diseases of humans: dynamics and control population biology of infectious diseases: part i quantifying the burden of vampire bat rabies in peruvian livestock natural language processing with python: analyzing text with the natural language toolkit probabilistic topic models women and science careers: leaky pipeline or gender filter? gender global trends in antimicrobial resistance in animals in low-and middle-income countries wildlife mortality investigation and disease research: contributions of the usgs national wildlife health center to endangered species management and recovery women and minorities in science, technology, engineering, and mathematics: upping the numbers bats: important reservoir hosts of emerging viruses preparing for emerging infections means expecting new syndemics biodiversity inhibits parasites: broad evidence for the dilution effect effects of landscape heterogeneity on the emerging forest disease sudden oak death disentangling the interaction among host resources, the immune system and pathogens anthropogenic environmental change and the emergence of infectious diseases in wildlife seeking congruity between goals and roles: a new look at why women opt out of science, technology, engineering, and mathematics careers when do female role models benefit women? 
the importance of differentiating recruitment from retention in stem gender diversity of editorial boards and gender differences in the peer review process at six journals of ecology and evolution patterns of authorship in ecology and evolution: first, last, and corresponding authorship vary with gender and geography an emerging disease causes regional population collapse of a common north american bat species ecology of infectious diseases in natural populations group of eight. . the changing phd on the benefits of systematic reviews for wildlife parasitology topic modeling of major research themes in disease ecology of mammals statistical methods for meta-analysis a timeline of hiv prevention of population cycles by ecology of wildlife diseases diversity in the geosciences and successful strategies for increasing diversity why infectious disease research needs community ecology global trends in emerging infectious diseases challenges and supports for women conservation leaders taming wildlife disease: bridging the gap between science and management effects of species diversity on disease risk the rise of disease ecology and its implications for parasitology-a review achieving synthesis with meta-analysis by combining and comparing all available studies facilitating systematic reviews, data extraction and meta-analysis with the metagear package for r plasmodium knowlesi: reservoir hosts and tracking the emergence in humans and macaques the changing face of pathogen discovery and surveillance superspreading and the effect of individual variation on disease emergence nltk: the natural language toolkit biological invasions: a field synopsis, systematic review, and database of the literature population biology of infectious diseases: part ii the educational benefits of diversity: evidence from multiple sectors national science foundation. . 
women, minorities, and persons with disabilities in automated content analysis: addressing the big literature challenge in ecology and evolution a guide to conducting a standalone systematic literature review host and viral traits predict zoonotic spillover from mammals unhealthy landscapes: policy recommendations on land use change and infectious disease emergence ecological responses to altered flow regimes: a literature review to inform the science and management of environmental flows recognizing the benefits of diversity: when and how does diversity increase group performance? global expansion and redistribution of aedes-borne virus transmission risk with climate change amphibian fungal panzootic causes catastrophic and ongoing loss of biodiversity ecology of infectious disease: forging an alliance leaks in the pipeline: separating demographic inertia from ongoing gender differences in academia ecological interventions to prevent and manage zoonotic pathogen spillover the influence of feeding behaviour and temperature on the capacity of mosquitoes to transmit malaria interactions between hiv/aids and the environment: toward a syndemic framework press release: infectious diseases kill over million people a year: who warns of global crisis the top causes of death unesco institute for statistics (uis) higher education. heterogeneity in pathogen transmission: mechanisms and methodology quantifying the impact of human mobility on malaria wildlife disease ecology: linking theory to data and application generalized additive models: an introduction with r evaluating ecological restoration success: a review of the literature key: cord- -llmfgavd authors: sprenger, kayla g.; louveau, joy e.; chakraborty, arup k. title: optimizing immunization protocols to elicit broadly neutralizing antibodies date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: llmfgavd natural infection and vaccination with a pathogen typically stimulate the production of potent antibodies specific for the pathogen through a darwinian evolutionary process known as affinity maturation. such antibodies provide protection against reinfection by the same strain of a pathogen. a highly mutable virus, like hiv or influenza, evades recognition by these strain-specific antibodies via the emergence of new mutant strains. a vaccine that elicits antibodies that can bind to many diverse strains of the virus, known as broadly neutralizing antibodies (bnabs), could protect against highly mutable pathogens. despite much work, the mechanisms by which bnabs emerge remain uncertain. using a computational model of affinity maturation, we studied a wide variety of vaccination strategies. our results suggest that an effective strategy to maximize bnab evolution is through a sequential immunization protocol, wherein each new immunization optimally increases the pressure on the immune system to target conserved antigenic sites, thus conferring breadth. we describe the mechanisms underlying why sequentially driving the immune system increasingly further from equilibrium, in an optimal fashion, is effective. the optimal protocol allows many evolving b cells to become bnabs via diverse evolutionary paths. significance statement the global health burden could be substantially alleviated by the creation of universal vaccines against highly mutable pathogens like hiv and influenza. broadly-neutralizing antibodies (bnabs) are encouraging targets for such vaccines, because they can bind to diverse strains of highly mutable pathogens. bnabs develop only rarely upon natural infection, after the immune system has been exposed to many mutated versions of a pathogen. thus, sequentially administering multiple different pathogen-like proteins (antigens) is a promising strategy to elicit bnabs through vaccination. 
however, it remains unclear how best to design and administer these antigens. we explore this matter using physics-based simulations, and provide new mechanistic insights into antibody evolution that could guide the creation of universal vaccines against highly mutable pathogens. successful vaccines stimulate immune responses that can protect the host from infection by a specific pathogen. such a vaccine is usually unable to protect against the diverse circulating strains of highly mutable pathogens. examples include viruses like hiv and hcv . such a vaccine can also not serve as a universal vaccine against variant strains of the influenza virus that arise annually. creating an effective universal vaccine against highly mutable pathogens will likely require a novel approach to vaccine design. successful prophylactic vaccines induce the immune system to generate antibodies (abs) that can bind to and neutralize the pathogen. the process that governs ab production upon pathogen (or antigen, ag) encounter/vaccine stimulation is a stochastic darwinian evolutionary process called affinity maturation (am) , . first, a few b cells become activated upon binding of their b cell receptors (bcrs) to the ag with sufficient affinity. activated b cells can then seed microstructures called germinal centers (gcs) in lymph nodes. during am, b cells proliferate, and upon induction of the activation-induced cytidine deaminase (aid) gene, mutations are introduced into the bcr at a high rate via a process known as somatic hypermutation. the b cells with mutated receptors then interact with the ag displayed on the surface of gc-resident follicular dendritic cells (fdcs) and attempt to bind and internalize the ag. b cells with bcrs that bind to the ag with higher affinity are more likely to internalize ag. the internalized ag is then processed and displayed on the surface of b cells as peptide-major histocompatibility complex (mhc) molecules. 
b cells that display peptide-mhc molecules then compete with each other to interact with t helper cells, productive binding of whose t cell receptors (tcrs) to peptide-mhc molecules delivers a survival signal. b cells that bind more strongly to the ag on fdcs are likely to internalize more ag and thus likely to display more peptide-mhc molecules on their surface, and are therefore more likely to be positively selected . b cells that do not bind the ag strongly enough or do not receive t cell help undergo apoptosis , . a few positively selected b cells exit the gc and differentiate into ab-producing plasma cells and memory b cells, while the majority are recycled for further rounds of mutation and selection , . upon immunization with a single ag, as cycles of diversification and selection ensue, abs with increasingly higher affinity for the ag are thus produced . one promising strategy for the development of an effective vaccine against hiv is to induce the immune system to generate broadly-neutralizing antibodies (bnabs) that can neutralize diverse strains of mutable pathogens. bnabs that can neutralize most hiv strains in vitro [ ] [ ] [ ] [ ] and diverse influenza strains have been isolated , . these abs target regions or epitopes on the surface of pathogenic proteins that contain amino acid residues that are relatively conserved because they are key to the virus' ability to propagate infection in human cells. one possible strategy to generate bnabs is to vaccinate with an immunogen (an immune response-eliciting ag) containing only these conserved residues. this tempting solution is impractical, however, because the abs thus generated cannot learn how to bind to the conserved residues in the molecular context in which they are presented on the pathogen surface. 
for example, an important target of bnabs against hiv is the cd binding site on the virus' spike, and the conserved residues therein are surrounded by glycans and variable residues that, due to the three-dimensional nature of the hiv spike proteins, partially shield b cell receptors (bcrs) from binding to the conserved residues. in fact, mutations in variable antigenic residues can insert loops that further hinder access to conserved residues . similarly, for influenza, the receptor binding site on the head of its spike is surrounded by variable residues , and another conserved region in the stem of the spike is sterically difficult to access because of the high density of spikes on the influenza virus' surface . a potentially promising vaccination strategy that may elicit bnabs is to first administer an ag that can stimulate germline b cells that can target the pertinent conserved residues, followed by immunization with variant ags that share the same amino acids at conserved positions, but diverse amino acids at the surrounding variable positions presented in the same molecular context as in the real pathogen's spike. a deep understanding of how abs evolve in such a vaccination setting is required to be able to guide the design of optimal vaccination strategies that can efficiently stimulate the production of bnabs - against different highly mutable pathogens in diverse individuals. developing this knowledge also presents an interesting challenge at the intersection of immunology, evolutionary biology, and biophysics. in cases where bnabs develop in hiv-infected individuals, their emergence typically occurs only after several years of infection, during which the immune system of the infected individual has been exposed to many different antigenic strains of the rapidly-mutating virus [ ] [ ] [ ] . bnabs against influenza have similarly been observed to emerge in rare individuals upon exposure to antigenically distinct strains of the virus , . 
these data suggest that being selected by different, but somewhat related, ags promotes the evolution of bnabs. the diverse infecting ags serve as selection forces that shape ab evolution during am. can properly designed vaccination protocols using variant ags that share conserved residues but differ in the variable regions result in am that elicits bnabs efficiently in diverse individuals? if so, a strategy for developing vaccines against highly mutable pathogens would be available. this tantalizing possibility has led to a great deal of research directed toward achieving this goal. a number of advances have been made in this regard, and we note just a few that have resulted from work focused on eliciting bnabs against hiv. strategies and immunogens have been designed to activate the correct germline b cells that can target conserved residues on the hiv viral spike, which have the potential for developing into bnabs upon subsequent immunization with variant ags [ ] [ ] [ ] . computational studies have described how the variant ags can serve as conflicting selection forces because a bcr/ab that binds well to one of the variant ags is unlikely to bind well to another variant ag unless relatively strong interactions with the shared conserved residues evolve. the presence of such conflicting selection forces has been termed "frustration" , too much of which has been shown to result in substantial b cell death/gc extinction , . studies have suggested that, in some instances, sequential immunization with variant ags may elicit bnabs more efficiently than a cocktail of the same ags , , [ ] [ ] [ ] [ ] [ ] [ ] . these studies also highlight the importance of separating the conflicting selection forces (or sources of frustration) over time to minimize significant b cell death, and of using multiple immunizations to help the developing abs acquire mutations that focus their binding on to the conserved residues, and thus confer breadth. 
it has also been suggested that cocktails can potentially be optimally designed to promote bnab formation while minimizing cell death by manipulating the vaccine dose and mutational distance between ags, thus identifying these variables as two further sources of frustration that can influence the outcome of am. despite this progress, several questions remain to be answered to devise immunogens and vaccination protocols that can efficiently induce bnabs. in particular, 1) which ags should be used as immunogens (number of ags and their sequences/compositions), and 2) how should they be administered in time? given the high diversity of the variable residues as well as the broad range of possible bcr-ag binding sites, the number of ags that share conserved residues and the number of different immunization protocols are too great to test all possible combinations of these ags and protocols with experiments in animal models. it is, however, possible for computational/theoretical studies to elucidate the effect of these variables and evaluate the outcomes in terms of both the quality (mean breadth) and quantity (titers) of the produced abs. these models could provide new insights into how immunogens and vaccination protocols influence bnab evolution and thus help to guide the design of an ab-based vaccine against highly mutable pathogens. such studies can also shed light on fundamental questions in immunology and biophysics. in this study, we used a computational model to explore how different sets of variant ags administered sequentially at different concentrations influence the evolution of bnabs by am. by quantifying frustration and characterizing its impact on am, we predict that ags and vaccination protocols that result in a temporally increasing level of frustration on gc reactions promote optimal bnab responses. this can be realized, for example, by decreasing the ag concentration for each new immunization, or by immunizing with increasingly dissimilar ags.
our results further indicate that an intermediate amount of frustration during each new immunization is optimal. the optimum is defined by the highest level of frustration allowed that does not result in extensive gc collapse/cell death. we describe the mechanisms underlying these results, which highlight that an appropriate level of diversity among b cells needs to evolve early on. this optimal level of diversity allows b cells to subsequently acquire mutations that confer breadth by diverse evolutionary trajectories. description of the computational model of affinity maturation. we built a simple computational am model of b cells in the presence of different ags that can be introduced at several time points. we simulate the processes that occur during am using a stochastic model, as is appropriate for an evolutionary process. the set of rules that define am are derived from experimental studies of am with a single ag , - , and these serve as instructions that are executed by the computer. the goal of this model is not the quantitative reproduction of existing experiments, but to provide mechanistic insights into how the nature of variant ags and immunization protocols influence the development of ab breadth and bnab titers. while our model represents the steps of am, it does not explicitly account for b cell migration within a gc or employ an atomistically detailed representation to compute the free energy of binding of a bcr to ags. the model is based on those of previous works , , , but there are a number of new features that are described below. all parameters that will be discussed are listed in table s . as in the models described by (wang et al. , luo et al. , the bcr paratope and the ag epitope of interest, which we will from here on refer to as bcr and ag, respectively, are both represented by a string of residues , . 
the bcr and the ag have identical lengths ( residues total; variable and conserved residues), so the bcr residue at position k binds to the antigenic residue located at the same position k. for the bcr, the identity of each residue is designated by a number, whose value was sampled from a continuous and bounded uniform distribution; variable residue values were bounded between - . and . (bounds change after the aid gene turns on; see table s ), and conserved residue values were bounded between . and . (see following section for more details). similarly, each ag residue is designated by a number. the value for a conserved ag residue is always + , while the value of a variable residue can be either + (non-mutated) or - (mutated). the binding free energy depends on the strength of the interactions between the bcr and the ag. we model this binding free energy as a sum of all pairwise interactions between the bcr and the ag, estimated as the product of the values that describe the identity of each bcr residue and its analogous antigenic residue. thus, the binding free energy is the sum, over all positions k, of the product of the bcr residue value and the ag residue value at position k. more positive binding free energies correspond to stronger affinities. this simple expression for the binding free energy is further embellished to account for the effects of three-dimensional conformations of the bcr and ag as described in context below. the absolute value of the binding free energy is arbitrary as it is determined only up to an additive constant. all binding free energies in our model are expressed in units of the thermal energy (k b t). we assume that a binding free energy of k b t is the minimum requirement for ag binding and therefore only b cells that bind to an ag with a binding free energy above this threshold can seed a gc. the value of this threshold binding free energy sets the scale for our calculations, and we do not expect qualitative results to depend upon this particular choice.
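the residue-string representation and the pairwise-product binding free energy described above can be illustrated with a minimal sketch. the function names, the ordering of variable residues before conserved ones, the use of +1/-1 for ag residues, and all bounds passed by the caller are assumptions made for illustration; they are not taken from the authors' code or from table s.

```python
import random


def binding_energy(bcr, ag):
    """Sum of position-wise products of BCR and Ag residue values."""
    if len(bcr) != len(ag):
        raise ValueError("bcr and ag must have the same length")
    return sum(b * a for b, a in zip(bcr, ag))


def make_antigen(n_variable, n_conserved, mutated_positions=()):
    """Build an Ag: variable residues first, then conserved ones.

    Assumption: residues are +1 unless listed in mutated_positions,
    in which case they are -1. Conserved residues are never mutated.
    """
    ag = [1.0] * (n_variable + n_conserved)
    for k in mutated_positions:
        if not 0 <= k < n_variable:
            raise ValueError("only variable positions can be mutated")
        ag[k] = -1.0
    return ag


def random_bcr(n_variable, n_conserved, var_bounds, cons_bounds, rng):
    """Draw each BCR residue from a bounded uniform distribution.

    var_bounds and cons_bounds are (low, high) tuples chosen by the
    caller; the values used in the study are not reproduced here.
    """
    variable = [rng.uniform(*var_bounds) for _ in range(n_variable)]
    conserved = [rng.uniform(*cons_bounds) for _ in range(n_conserved)]
    return variable + conserved
```

with this convention, a bcr that interacts favourably with a mutated (-1) variable residue carries a negative value at that position, so the product is still positive.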
the number of potential bcr/ab binding sites on the hiv envelope that do not contain conserved residues far exceeds those that could be bnab epitopes . in the repertoire of naïve b cells bnab precursors are rare , and thus the precursor frequency of germline b cells that target epitopes that do not contain conserved residues is much higher. thus, it is unlikely that bnab precursors will get activated during a natural infection . similar considerations apply to influenza bnabs, where there are many more highly variable hemagglutinin (ha) head epitopes than those that can potentially be epitopes for bnabs. immunization experiments for hiv try to remedy this challenge by priming with an ag that focuses immune responses on conserved residues (fig. ). that is, immunizing with ags designed to target specific germline (gl) bcrs is a key first step to eliciting bnabs , , , . subsequently, variant ags that mimic the viral spike are used as immunogens ( fig. ). our model mimics the priming to activate the appropriate gl bcrs by seeding the first gcs, upon immunization with variant ags, with a pool of b cells that we assume has been generated by a prior, gl-activating immunization, and thus bind to epitopes containing conserved residues (e.g., the cd binding site for hiv). mimicking the bcrs produced through gl-targeting experiments , we chose the residues of the bcrs that bind to the conserved residues of the ag to be slightly biased towards positive values, reflecting that they favor these residues because they were activated initially by the gl-targeting immunogen. since no specific selection force was imposed on the residues of these bcrs that bind to the variable residues of the ags, these were chosen to be highly variable among the seeding b cells. experiments have shown that the number of seeding b cells varies between a few to a hundred cells . we chose to seed gcs with cells; changing this number did not affect our qualitative results. 
we assume that the gl-targeting immunogen residues are all + s. this is just a reference sequence, and a different choice would lead to a linear shift in the mutational distances that we calculate for the variant ags used for subsequent immunizations, and thus would have no effect on the qualitative results we report. the b cells expand in the dark zone of the gc without mutation or selection for a week, reaching a population size of , cells. after the initial expansion of the b cell population, the aid gene turns on and mutations are introduced into the bcrs with a probability determined by experiments: each b cell of the dark zone divides twice per gc cycle ( divisions per day) , and mutations appear at a high rate ( . per sequence per division) . these mutations are known as somatic hypermutations (shms) and can have various effects on the fate of a b cell , . recent evidence suggests that b cells that internalize more ag divide more times , but this effect is not included in our simulations. experiments have shown that shms are lethal % of the time (for example, by making the bcr unable to fold properly), are silent % of the time (due to redundancy of the genetic code; i.e., synonymous mutations), and modify the binding free energy % of the time . the energy-affecting mutations are more likely to be detrimental; experiments have determined that across all protein-protein interactions, only ~ to % of energy-affecting mutations strengthen the binding free energy . for mutations that affect the binding free energy, a particular residue on the bcr is randomly picked to undergo mutation. the change in the strength of binding between this bcr residue and the complementary antigenic residue is sampled from a bounded lognormal distribution whose parameters are chosen to approximate the empirical distribution of changes in binding free energies of protein-protein interfaces upon single-residue mutation . 
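the three possible fates of a somatic hypermutation (lethal, silent, or energy-affecting) can be sketched as a single random draw per daughter cell. this is an illustrative sketch only: the probabilities and the mutation rate are left as caller-supplied arguments because their values are not reproduced here, and the function names are hypothetical.

```python
import random


def sample_mutation_fate(p_lethal, p_silent, rng):
    """Return 'lethal', 'silent', or 'affecting' for one mutation.

    The remaining probability mass (1 - p_lethal - p_silent) is
    assigned to energy-affecting mutations.
    """
    if p_lethal < 0 or p_silent < 0 or p_lethal + p_silent > 1:
        raise ValueError("probabilities must be non-negative and sum to <= 1")
    u = rng.random()
    if u < p_lethal:
        return "lethal"
    if u < p_lethal + p_silent:
        return "silent"
    return "affecting"


def daughter_fates(mutation_rate, p_lethal, p_silent, rng):
    """Fates of the two daughters produced by one division.

    Assumption: each daughter independently acquires at most one
    mutation, with probability mutation_rate per division.
    """
    fates = []
    for _ in range(2):
        if rng.random() < mutation_rate:
            fates.append(sample_mutation_fate(p_lethal, p_silent, rng))
        else:
            fates.append("unmutated")
    return fates
```

only daughters whose fate is "affecting" would go on to have their binding free energy changed; "lethal" daughters are removed and "silent" ones are kept unchanged.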
thus, the random change in binding free energy Δe is drawn from the following probability distribution: where r is a standard normal variable with mean zero and standard deviation equal to one. here o is a shift parameter, which is needed to center the lognormal distribution properly with respect to zero, and μ and σ are the mean and standard deviation of the lognormal distribution, respectively. the parameters of the lognormal distribution (o, μ, σ and r) are set so that only % of mutations increase the binding free energy and so that the tail of harmful mutations fits the distribution obtained by experiments. the additional parameter δ limits the effect of a single-residue mutation, chosen to prevent the b cell population from succeeding too fast and exceeding the average time for the gc population to reach its initial size after vaccination or infection with a single ag (see parameters section). the effects of ag residues that shield the conserved residues from being accessed easily by the bcr are incorporated into the model to account for the three-dimensional nature of ag-bcr interactions. mutations in variable residues can insert loops that hinder access to conserved residues : greater binding to loop residues reduces access to conserved residues, and vice versa. the insertion or deletion of a loop can drastically change the binding free energy . in particular, a new interaction with a loop residue can completely prevent bcr binding to non-loop residues and greatly lower the binding free energy. we mimic this effect as follows. if mutation of a bcr residue results in a stronger interaction with a variable antigenic residue, the interaction between a randomly chosen bcr residue that binds to a conserved antigenic residue is proportionally decreased by a factor α (eqn. ). similarly, a weaker interaction with a variable residue leads to a proportional increase in the binding to a conserved residue. 
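a minimal sketch of the two ingredients just described, the bounded shifted-lognormal draw and the loop-shielding coupling, is given below. the functional form o - exp(μ + σ r), the symmetric clipping at ±δ, and the way the coupling is applied to a single randomly chosen conserved-binding residue are assumptions chosen to be consistent with the verbal description; they should not be read as the authors' exact equations.

```python
import math
import random


def sample_delta_e(o, mu, sigma, delta, rng):
    """Draw a change in interaction strength for one BCR residue.

    Assumed form: o - exp(mu + sigma * r), with r standard normal,
    clipped to [-delta, +delta] so one mutation has a bounded effect.
    """
    r = rng.gauss(0.0, 1.0)
    raw = o - math.exp(mu + sigma * r)
    return max(-delta, min(delta, raw))


def apply_mutation(bcr, ag, k, delta_e, n_variable, alpha, delta_loop, rng):
    """Apply delta_e at position k and the loop-shielding coupling.

    Variable residues occupy positions [0, n_variable); Ag residues
    are assumed to be +1 or -1. If k is a variable position, a random
    conserved-binding residue is shifted by -alpha * delta_e (clipped
    to +/- delta_loop): stronger binding to variable residues weakens
    binding to conserved ones, and vice versa.
    """
    new_bcr = list(bcr)
    # ag[k] is +/-1, so this changes the product bcr[k]*ag[k] by delta_e
    new_bcr[k] += delta_e * ag[k]
    if k < n_variable and n_variable < len(bcr):
        c = rng.randrange(n_variable, len(bcr))
        shift = max(-delta_loop, min(delta_loop, -alpha * delta_e))
        new_bcr[c] += shift * ag[c]
    return new_bcr
```

in this sketch the parameters o, μ, and σ would be tuned so that only a small fraction of draws are positive, matching the statement that most energy-affecting mutations are detrimental.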
the boundary for the free energy change due to a loop δ l is chosen to be the same as that for a single-residue mutation (δ). this feature in our model favors the emergence of mutations that alleviate steric effects because it strengthens binding to conserved residues, and this aids binding to multiple variant ags. mathematically, this effect is represented as follows: see si for an explanation of the choice of the parameter, . after shm, the mutated b cells migrate to the light zone of the gc, where selection takes place through competition for binding to ag and receiving t cell help . b cells with the greatest binding free energy for the ag presented on fdcs have a better chance to bind that ag and present its peptides on their mhc molecules to receive t cell help. productive interactions with t helper cells results in positive selection, and otherwise b cells undergo apoptosis [ ] [ ] [ ] . we model this biology with a two-step selection process. first, each b cell successfully internalizes the ag it encounters with a probability that grows with the binding free energy and then saturates, following a langmuir form (eqn. ). only the b cells that successfully internalize ag can then go on to the second step. the surviving b cells are ranked according to their binding free energy, a proxy for the concentration of peptide-mhc molecules that they display, and only the top % are selected. this selection probability, as well as the parameter e scale in eq. below (pseudo inverse temperature, chosen to be . k b t - ), were manually adjusted to fit with experimental observations of gc dynamics upon single-ag administration (see parameters section for more details) . the probability of b cell j internalizing ag i depends on the binding free energy and the concentration c i of that ag, as well as the energy scale e scale and the activation threshold e act (eqn. 
). recycling to the dark zone, exit for differentiation, and termination of the gc reaction. most b cells that are positively selected are recycled for further rounds of mutation and selection, while a few randomly selected b cells exit the gc to mimic differentiation into memory and ab-producing plasma cells . as a proxy for the fact that all the ag will be consumed by internalization if a sufficient number of b cells successfully mature, the gc reaction is terminated if the number of b cells exceeds the initial gc population size of , cells. in addition, to reflect ag degradation over time, termination occurs when the number of cycles before the gc recovers its initial size exceeds ( days). the gc reaction also ends if all b cells die. every memory b cell that exits a gc is in circulation and could be reactivated by a second exposure to the ag that initially triggered its development. memory b cells can also then seed a new gc upon immunization with a second ag that is different from the first ag but shares conserved residues. if there are many other epitopes on the ag that do not contain conserved residues, naïve b cells that bind to these epitopes could also seed gcs. however, we assume that memory b cells outnumber these naïve b cells, or that the second ag is given sufficiently soon after the first immunization that it joins ongoing gcs with the first ag almost depleted. thus, we consider that only memory b cells undergo am upon the second immunization. ten memory cells are randomly chosen to seed a new gc. our model depends on several parameters (table s ). many of these parameters represent specific biological quantities, while the remaining parameters arise because of the coarse-grained nature of the model. the biological quantities can be split into those that have already been assessed by experiments (such as the shm rate and the fate of mutations) and those that have yet to be measured experimentally.
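the two-step selection and the termination rules described above can be sketched as follows. the langmuir expression used here, c·exp(e_scale·(e − e_act)) / (1 + c·exp(e_scale·(e − e_act))), is an assumed form that grows with binding free energy and then saturates; the function names and the rounding rule for the selected fraction are likewise illustrative choices rather than details confirmed by the text.

```python
import math
import random


def p_internalize(energy, conc, e_scale, e_act):
    """Assumed Langmuir-type probability of internalizing Ag."""
    x = conc * math.exp(e_scale * (energy - e_act))
    return x / (1.0 + x)


def select(energies, conc, e_scale, e_act, help_fraction, rng):
    """Two-step selection; returns indices of positively selected cells.

    Step 1: each cell internalizes Ag with probability p_internalize.
    Step 2: survivors are ranked by binding free energy and only the
    top help_fraction (rounded up) receive T cell help.
    """
    internalized = [
        i for i, e in enumerate(energies)
        if rng.random() < p_internalize(e, conc, e_scale, e_act)
    ]
    internalized.sort(key=lambda i: energies[i], reverse=True)
    n_keep = math.ceil(help_fraction * len(internalized))
    return internalized[:n_keep]


def gc_should_stop(n_cells, cycle, max_cells, max_cycles):
    """GC ends on extinction, on exceeding the initial population
    size (Ag consumed), or on exceeding the cycle limit (Ag degraded)."""
    return n_cells == 0 or n_cells > max_cells or cycle > max_cycles
```

selected cells would then be recycled to the dark zone with some probability, with the remainder exiting as memory or plasma cells; that bookkeeping is omitted here for brevity.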
we estimated the latter type of biological quantities by making reasonable guesses. experiments have so far focused on am in the case where only one ag is present; thus, we fit the parameters so that simulation results are in qualitative agreement with experiments with a single ag. using this methodology, we fit the following parameters that control both the growth of the b cell population and the properties of the produced abs: 1) f help,cutoff , the fraction of b cells that receive t cell help after binding ag; 2) e scale , the factor multiplying k b t in the equation for p internalize (eq. ); 3) p recycle , the probability to be recycled from the light zone to the dark zone; 4) c i , the concentration of the antigen(s); and 5) δ, the maximum value of a single-residue mutation. f help,cutoff , e scale , p recycle , c i , and δ are adjusted so that the population decreases sharply at the beginning of am until a few beneficial mutations appear and allow survival of a few b cells. then the population plateaus for about days until enough good mutations accumulate to increase the binding free energy dramatically. the population then rises quickly until it reaches the initial size of , cells, approximately days post immunization. the abs produced during am with a single ag accumulate about mutations and see their binding affinity grow by at least , -fold . with our chosen parameters (table s ), our model reproduces these features well. for each gc reaction we record the total number of b cells over time, their binding affinities to a test panel of ags to define the breadth of each ab (see below), the number of energy-affecting mutations acquired by each bcr, the value of each residue (identity of pseudo amino acid) of every bcr, and the binding free energy of the bcr and the ag it interacted with in the light zone during the gc reaction.
due to the stochastic nature of am, we executed many simulations under identical conditions (see below) and aggregated the results to obtain meaningful statistics. as in the am model described by , we group b cells into sets of functionally identical b cells, called b cell clones. all cells of a clone have the same properties, including binding free energies, number of mutations, and breadth. the size of a clone varies with time and its evolutionary trajectory. in order to determine the breadth of coverage of each ab, we compute the binding free energy of each clone against an artificial panel of ags different from those that the clone matured against. we found that increasing the size of the panel to , ags produced little-to-no change in our breadth estimates. these panel ags share the conserved residues, but the value of each variable residue has an equal chance to be + or - . we compute the binding free energies of the clone for each panel ag, and the breadth is calculated as the fraction of panel ags to which the clone binds with a binding free energy above a certain threshold e th (eqn. ). the threshold was set to k b t so that b cells produced by single-ag immunization had a breadth of . this criterion reflects the fact that the mutations required to confer breadth are unlikely to evolve in appreciable numbers in a single immunization, as implied by the fact that naturally evolving bnabs usually emerge several years after infection, and sequential immunizations with variant ags result in the evolution of bnab-like abs in mice , . a lower threshold would define abs generated from immunization with a single ag as "broad", while a higher threshold would result in few or no abs that would ever be defined as broad. in fact, the affinity of b cells for a single ag can increase during am by up to a few thousand-fold, but above a certain value the affinity saturates. the goal of this boundary may be to safeguard against potential autoimmune responses .
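the breadth calculation described above reduces to counting how many panel ags a clone binds above the threshold. the sketch below assumes ag residues take the values +1 and -1, that variable residues precede conserved ones, and that the comparison with the threshold is strict; the function names are hypothetical.

```python
import random


def random_panel(n_panel, n_variable, n_conserved, rng):
    """Panel Ags: conserved residues fixed at +1, each variable
    residue independently +1 or -1 with equal probability."""
    return [
        [rng.choice((1.0, -1.0)) for _ in range(n_variable)]
        + [1.0] * n_conserved
        for _ in range(n_panel)
    ]


def breadth(bcr, panel, e_threshold):
    """Fraction of panel Ags bound with energy above e_threshold."""
    if not panel:
        raise ValueError("panel must contain at least one antigen")
    hits = sum(
        1 for ag in panel
        if sum(b * a for b, a in zip(bcr, ag)) > e_threshold
    )
    return hits / len(panel)
```

a clone whose binding rests entirely on conserved residues scores a breadth of 1 in this sketch, whereas a clone that relies on one variable residue binds only the panel members that match it at that position.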
we ran , gc trials per vaccination setting and calculated a mean breadth of all clones produced across all , gc trials (eqn. ). we then averaged this mean clonal breadth across multiple simulations for each vaccination protocol studied. apart from breadth, the quantity of high-breadth abs produced (bnab titers; clonal breadth > . ) by a vaccination protocol is a key metric of success. this is because low titers of bnabs are unlikely to confer protection. a universal and efficient vaccine would likely need to elicit sufficiently high titers of high-breadth abs. thus, apart from the mean clonal breadth for a given vaccination setting, we also compute the average bnab titers/gc (referred to simply as the bnab titers/gc). we ran , gc trials per vaccination protocol and combined the outcome of all , trials in our calculations (eqn. ). as with the mean clonal breadth, we then averaged this value across multiple simulations for each vaccination protocol. due to the large number of gcs we analyzed ( , ) for each vaccination setting, our statistical results are robust (fig. s ). we studied the situation where there are two sequential immunizations following the consequences of an assumed gl-targeting immunization ( fig. ; see previous section on gl-targeting). past work has shown that variant ags subject b cells undergoing am to conflicting selection forces, an effect referred to as frustration because these forces can, under some circumstances, frustrate the evolutionary process and lead to gc collapse. past work has also shown that the extent of frustration can be modulated by changing the concentration of the ags and the mutational distances between them. we first performed simulations of a single immunization with one ag. in the results described below, we study the effects of the ag concentration (c ) and the mutational distance (d ) between the vaccine ag and previously-administered gl-targeting ag (sequence of all + s). 
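the two summary metrics used throughout the results, mean clonal breadth and bnab titers/gc, can be sketched as simple aggregations over the clones produced by many gc trials. the exact weighting used in the study is not recoverable from the text, so this sketch assumes an unweighted mean over clones for the breadth and a count of cells in high-breadth clones, divided by the number of gc trials, for the titers; both choices and the data layout are assumptions.

```python
def mean_clonal_breadth(gcs):
    """Unweighted mean breadth over all clones from all GC trials.

    gcs is a list of GC outcomes; each outcome is a list of
    (breadth, clone_size) tuples. Collapsed GCs are empty lists.
    """
    breadths = [b for gc in gcs for b, _size in gc]
    if not breadths:
        return 0.0
    return sum(breadths) / len(breadths)


def bnab_titers_per_gc(gcs, breadth_cutoff):
    """Cells belonging to clones with breadth above the cutoff,
    averaged over all GC trials (including collapsed ones)."""
    if not gcs:
        raise ValueError("need at least one GC trial")
    total = sum(size for gc in gcs for b, size in gc if b > breadth_cutoff)
    return total / len(gcs)
```

counting collapsed gcs in the denominator is what lets the titers fall at high frustration even when the few surviving clones are broad.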
we find that the mean clonal breadth increases with increasing frustration ( fig. a-b) , arising from changes in sequence (as d is increased) and concentration (as /c is increased). the bnab titers/gc goes through a maximum with changes in frustration ( fig. c-d) . the origins of these results can be understood as follows. at low frustration ( fig. : d ≤ (left), c ≤ . (right)), the gc success rate is high (fig. e-f) . however, due to the ease with which the gc b cells can succeed at being positively selected, there is too little time to acquire mutations before the successful b cells consume all the ag and the gc reaction ends (fig. g-h) , which results in a low diversity of clonal breadth (fig. i-j) . as a consequence, essentially none of the b cell clones can acquire much breadth. this is illustrated by graphs showing the variation in breadth among the resultant clones (fig. a-d) . thus, the bnab titers/gc is essentially zero. as the level of frustration increases, the mean breadth and bnab titers/gc begin to increase ( fig. : . (right)). the results for mutational distances beyond were included purely for completeness ( fig. : d > (left) ), and were achieved by simply not setting a threshold ag-bcr binding affinity for starting the gc reaction. at these high levels of frustration, we find that the changes in the number of cycles and mean breadth stall, almost all gcs collapse rapidly, and the very small population of b cells that survives contains no bnabs (requires lower concentrations than explored/presented to see in fig. j ). under these conditions, there are only rare evolutionary pathways to success (explored later in more detail). fig. . schematic of the in silico immunization scheme. the germline (gl)-targeting antigen (ag) was inspired by the eod-gt construct designed to target precursor naïve b cells against the cd binding site (cd bs) of hiv . 
conserved residues of the cd bs are schematically depicted in yellow, example mutated variable residues are shown in red, and the surrounding residues are shown in blue. visual molecular dynamics (vmd) was used to construct the images from pdb code fyj . gray dotted lines indicate the point beyond which gcs are unlikely to be seeded due to a low ag-bcr binding affinity. error bars represent the standard deviation of the mean across multiple simulations (n= , gcs). note that some error bars are too small to be visible, but are included on all points and are largest where some gc collapse occurs (introducing much stochasticity into the data), or at the highest levels of frustration where few statistics could be obtained altogether. our results thus far show that properties of the clonal population vary in similar ways with the level of frustration (fig. ) , either through modulating the mutational distance between ags (d ) or concentration (c ). additionally, our results suggest a tradeoff in frustration exists between these two variables in the determination of mean breadth. for example, a mean breadth of . can be achieved with either (d , /c ) = ( , . ) in fig. a , or with a decreased mutational distance but increased inverse concentration of (d , /c ) = ( , . ) in fig. b . we thus hypothesized that these two sources of frustration could be combined into a single metric for predicting breadth, allowing for a simpler comparison of different temporal patterns of immunization. to test this hypothesis, we chose a simple linear combination of the two individual sources of frustration (eqn. ). tfl refers to the total frustration level (subscript refers to the first vaccine immunization), due to the combined effects of the frustration originating from the mutational distance of the variant ag from the gl-targeting ag (sfl, sequence frustration level) and ag concentration (cfl, concentration frustration level). 
note that the cfl has been written as /c because administering less ag results in a higher level of frustration during the gc reaction. the parameter, w, describes the relative weight of the two contributions to the total level of frustration. by fitting the value of w, we find that all the simulation results described in the preceding section collapse onto a single curve that is highly predictive of the resultant mean clonal breadth after immunization with a single ag (fig. a ). furthermore, with the same value of w , the bnab titers/gc exhibits a maximum at an intermediate tfl (fig. b) , thus encapsulating the results in fig. . we next wanted to determine if the tfl concept could be used as a metric to predict ab properties after multiple immunizations, enabling us to determine the optimal way in which to manipulate frustration with time. we hypothesized that the tfl after multiple immunizations is the product of the tfl of each individual immunization. this history-dependent formulation is based on the idea that each new immunization operates on the memory b cells that resulted from the previous immunizations. this more general form of eqn. is shown in eqn. for n immunizations, where indicates a summation over the mutational distances between the sequences of all pairs ( ∑ ≠ ) of ags j and k administered across all immunizations. here, b is a weight that renormalizes differences in the simulated ranges of the cfl and sfl values; see si). terms with a subscript, i, in eq. refer to each individual immunization (e.g., tfl ), and tfl{n} refers to the total frustration level after n non-gl-targeting immunizations. following the first single-ag vaccine immunization (fig. a-b) , we next simulated a second sequential single-ag immunization (third immunization overall; fig. c-d) . 
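the frustration metric can be illustrated with a short sketch. it assumes that the mutational distance between two ags is the number of positions at which they differ, that the single-immunization tfl is the linear combination sfl + w·cfl with cfl equal to the inverse concentration, and that the multi-immunization tfl is the plain product of the per-immunization values. the renormalizing weight b mentioned in the text is omitted, so this is a simplified reading rather than the full expression.

```python
def mutational_distance(ag_a, ag_b):
    """Number of positions at which two Ag sequences differ."""
    if len(ag_a) != len(ag_b):
        raise ValueError("antigens must have the same length")
    return sum(1 for a, b in zip(ag_a, ag_b) if a != b)


def tfl_single(distance, conc, w):
    """Assumed single-immunization frustration: SFL + w * CFL,
    where SFL is the mutational distance and CFL is 1 / conc."""
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return distance + w / conc


def tfl_total(distances, concs, weights):
    """Assumed history-dependent frustration: the product of the
    per-immunization values (the weight b is omitted here)."""
    total = 1.0
    for d, c, w in zip(distances, concs, weights):
        total *= tfl_single(d, c, w)
    return total
```

for example, with a hypothetical weight of 2.0, halving the concentration from 1.0 to 0.5 raises the single-immunization value by the same amount as two additional mutated residues, which is the kind of trade-off between the two sources of frustration that the text describes.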
here, we manipulated frustration by changing concentration (c ) and the average mutational distance (d ) between the current immunizing ag and both previously-administered ags (gl-targeting ag and first vaccine ag). we find that eq. is highly predictive of the mean breadth after multiple immunizations ( fig. c ; w =w = . , b = , and b = . (see si)). as before, we also find that the bnab titers/gc are maximized at intermediate levels of frustration, graphed in fig. d as the net frustration level after all immunizations (i.e., tfl{ }). however, note that different levels of tfl lead to different peak positions and maximal values of the bnab titers/gc when the results are graphed against tfl{ }. in the following section, we investigate this behavior, to better understand how best to manipulate frustration over time to maximize bnab evolution. to determine the optimal way in which to manipulate frustration with successive immunizations (or time), simulations were performed that varied tfl at constant values of tfl , namely at a low tfl (figs. a, d) , intermediate tfl (figs. b, e) , and high tfl (figs. c, f) . we find that regardless of the level of frustration imposed upon the immune system in the first immunization, an optimal level of frustration always exists for the second immunization that maximizes the bnab titers/gc. the origin of the optimality in the bnab titers/gc after the second immunization is the same as was described earlier for the first immunization. notably, the optimal tfl is always higher than the corresponding tfl . this result says that increasing the level of frustration in successive immunizations maximizes bnab production. this is because after the first immunization, some b cells are likely to have developed moderate to strong interactions with the conserved residues, and a stronger selection force is required for evolving additional mutations that can further focus their interactions on the conserved residues to acquire breadth. 
a higher level of frustration provides such a selection force to promote bnab evolution. comparing the maximum bnab titer values in figs. d-f, we find that the highest titers are not only produced at intermediate values of tfl , but at an intermediate value of tfl , which can also be observed in fig. d . fig. s shows the validity of this finding across a wider range of tfl . taken together, our results imply that each new immunization should be administered at an intermediate level of frustration above the optimum level in the previous immunization to maximize bnab production. to determine the mechanism underlying such a requirement, we hypothesized that the intermediate value of tfl resulted in an optimal clonal diversity of memory b cells that provided many potential evolutionary pathways to "success" (i.e., becoming a bnab; breadth > . ) upon the second immunization. to test this hypothesis, we performed a more comprehensive set of simulations that varied tfl at constant tfl , recording the mutational trajectories of all clones. considering only the clonal trajectories that achieved success after the second vaccine immunization, we find that the clonal diversity after the first vaccine immunization was high between a tfl of and (fig. a ). consistent with our hypothesis above, such an intermediate value of tfl equal to resulted in the most successful trajectories after the second vaccine immunization (figs. b, d) . similar to our earlier discussion of fig. , this is because at this particular value of tfl , the number of gc cycles -or equivalently frustration -is as high as possible to allow for the most time to make affinity-increasing mutations and thus diversify the clonal population, without incurring major cell death. a lower tfl resulted in either: ) low clonal diversity due to too little time to make mutations before the gc reaction ended, providing few pathways for achieving success in the next immunization (tfl < ; fig. 
c ), or ) high clonal diversity but many of the resulting memory b cells are likely to lead to dead-end evolutionary trajectories that do not evolve bnabs upon subsequent immunizations at suboptimal levels of frustration (e.g., tfl = ). values of tfl higher than the optimum resulted in extensive gc extinction, which restricted the number of successful evolutionary trajectories that could be generated upon the subsequent immunization. finally, we note that in all three scenarios in fig. c -e, upon analyzing the mutational trajectories of clones from one successful gc at a time (here, not conditioned on success), we observe many instances of clonal interference (fig. s ) . we define clonal interference here as instances where the mean breadth of a given clone is below that of another clone after the first immunization, but surpasses the other clone in breadth after the second immunization. this result further emphasizes the importance of ensuring that multiple evolutionary pathways exist for achieving success, as promoted at intermediate frustration levels. effect of changing the total frustration level administered in the second immunization (tfl ), conditioned on (a) a low total frustration level administered in the first immunization (tfl ), (b) an intermediate tfl , and (c) a high tfl , on the mean clonal breadth and bnab titers/gc after the second vaccine immunization. values in parentheses indicate the mean clonal breadth and bnab titers/gc after the first immunization. universal vaccines currently do not exist for diseases like hiv, hcv, influenza, and malaria, in large part because of the high genetic variability of the pathogens that cause these diseases. broadly-neutralizing antibodies (bnabs), which target conserved regions of the pathogenic machinery, offer an exciting route to overcome this challenge. 
while bnabs have been isolated from a number of patients [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] , vaccination protocols to elicit them are, despite progress, not yet available. recent work identified that varying antigen (ag) sequence, concentration, or the pattern of administration can modulate bnab formation by impacting affinity maturation [ ] [ ] [ ] [ ] [ ] [ ] [ ] , . efforts to develop systematic strategies to design vaccines against highly mutable pathogens would be greatly aided if a deep mechanistic understanding of the pertinent immunological processes were available. toward this end, in this study, we employed computational models to investigate the effect of sequential immunization with variant ags of diverse sequence and concentration on the evolution of bnabs by am. a large number of immunization protocols are made possible by changing both ag sequence and concentration. our results show that a simple lower-dimensional representation of this high-dimensional design space is likely predictive of outcome vis-à-vis bnab production. specifically, the level of frustration imposed on the immune system upon a single immunization could be formulated as a linear combination of the frustration due to changing ag sequence and concentration. then, this level of frustration was multiplied across immunizations to provide the total frustration level (tfl) after multiple immunizations. we found this metric to be highly predictive of the resultant mean antibody breadth and bnab titers. however, it remains puzzling why such a simple model works for our computational results. while this governing scaling phenomenon may not quantitatively translate to an experimental setting, we predict that some combination of the imposed sequence and concentration frustrations will still likely be predictive of the resultant ab properties.
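The structure of this metric (a weighted sum of sequence and concentration frustration per immunization, multiplied across immunizations) can be illustrated with a minimal Python sketch. The exact equations are not reproduced in this text, so the functional form below is an assumption: the function names, the shared values of `w` and `b`, the use of `(1 - w)` as the concentration weight, and the placement of `b` on the concentration term are all hypothetical.

```python
from math import prod


def single_tfl(sfl, conc, w=0.5, b=1.0):
    """Frustration of one immunization (hypothetical form, not the paper's equation).

    sfl  : sequence frustration level, e.g. a summed mutational distance
    conc : antigen concentration; its inverse is used as the concentration term
    w    : assumed relative weight of the sequence term
    b    : assumed rescaling weight applied to the concentration term
    """
    if conc <= 0:
        raise ValueError("concentration must be positive")
    cfl = 1.0 / conc  # less antigen means more frustration
    return w * sfl + (1.0 - w) * b * cfl


def total_tfl(immunizations, w=0.5, b=1.0):
    """Multiply per-immunization frustration over (sfl, conc) pairs."""
    return prod(single_tfl(sfl, conc, w, b) for sfl, conc in immunizations)


# Two immunizations, the second with more dissimilar antigen and less of it.
print(total_tfl([(2.0, 1.0), (4.0, 0.5)]))
```

The `sfl` argument is left abstract on purpose: any scalar measure of how strongly the antigen sequences differ could be supplied in its place.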
for example, rather than use mutational distance between ags as a metric of frustration, it may instead be better to employ the difference in their ability to bind different gc b cells as a metric (if such calculations or experiments can be performed accurately). our model predicts that an optimal level of frustration imposed on gc reactions upon the first immunization, followed by a temporally increased level of frustration with each subsequent immunization promotes optimal bnab responses (fig. ) . an initial optimal level of frustration results in a population of b cells that has the appropriate diversity to subsequently evolve into bnabs via diverse evolutionary pathways when a stronger selection force to evolve breadth is imposed in subsequent immunizations. too low or too high a level of initial b cell diversity leads to either extensive gc extinction or dead-end evolutionary paths. recent advances in controlled ag release platforms, such as osmotic pumps or gene delivery vectors , may enable quantitative validation of the role of temporally increasing frustration on bnab formation. we suggest immunizing with increasingly dissimilar ags while holding concentration constant, modulating sequence frustration by inserting increasingly diverse amino acids at variable positions adjacent to conserved ag residues (fig. ) . additionally, advances in high-throughput mutagenesis, deep sequencing, and in vitro evolution methods (phage, yeast, bacterial, or ribosomal display) may enable validation of our finding that diverse evolutionary trajectories optimize bnab responses. like the cd binding site (cd bs) on hiv's envelope spike protein, the conserved region of the receptor binding site (rbs) on influenza's hemagglutinin spike protein is smaller in size than the binding footprint of an ab . thus, similar to the epitope of cd bs-directed abs, the epitope of rbs-directed abs contains peripheral variable sites in addition to the core conserved sites . 
our proposed vaccination approach may therefore be relevant to eliciting bnabs to the rbs of influenza's hemagglutinin spike protein. additionally, in the case of influenza, an intermediate level of frustration in the first immunization may be optimal for another reason, which is accounting for the effects of immunological memory in different individuals. for example, if the vaccine ag is too different from what a person has been exposed to in the past (i.e., a high level of frustration), then naïve, strain-specific b cells may be more favorably recruited to gcs over crossreactive memory b cells . after administering the first immunization at an intermediate level of frustration, increasing the frustration in subsequent injections (via differential ag design, in a similar manner as discussed above for hiv) may transform b cells into true rbs-directed bnabs . our results exhibit an interesting analogy to cognitive learning models , where foundational material is introduced first, followed by more complex material later on. these models, as does our approach, present increasingly difficult material (frustration) over time, allowing many students (b cells) of varying initial skill (breadth) to succeed at each step along the way and eventually pass the test (become a bnab). this is in contrast to providing complex material first and asking students to take the test right away (akin to a high level of frustration initially), resulting in the success of only a few very bright or lucky students. the idea of increasing frustration with time, particularly sequence frustration, is also similar to how highly mutable pathogens diversify their sequences over time, imposing ever-greater amounts of frustration upon the immune system and creating an evolutionary arms race. fig. . proposed vaccination scheme. 
our model predicts that imposing optimal, temporally increased levels of frustration on gc reactions upon subsequent immunizations (i.e., increasing the 'total frustration level'; orange line), will optimize bnab responses (green line). this can be achieved, for example, by immunizing with ags that are increasingly dissimilar (i.e., increasing the 'sequence frustration level'; blue line), while holding the concentration of each immunization constant (i.e., keeping the 'concentration frustration level' constant; purple line). the total frustration level is the sum of the sequence and concentration frustration levels. a simple schematic of how to increasingly diversify variable ag residues (white/gray ovals), to guide germline (gl)-targeted unmutated antibodies (ua) towards bnabs by shifting the focus to conserved ag residues (black ovals), is demonstrated above the plot. evolutionary and immunological implications of contemporary hiv- variation germinal center selection and the development of memory b and plasma cells gp -cd interactions are essential for germinal center formation and the development of b cell memory a brief history of t cell help to b cells somatic mutation leads to efficient affinity maturation when centrocytes recycle back to centroblasts variations in affinities of antibodies during the immune response * broad neutralization coverage of hiv by multiple highly potent antibodies developing an hiv vaccine hiv-host interactions: implications for vaccine design broadly neutralizing antibodies to hiv and their role in vaccine design conserved epitope on influenza-virus hemagglutinin head defined by a vaccine-induced antibody broadly neutralizing human antibody that recognizes the receptorbinding pocket of influenza virus hemagglutinin affinity maturation in an hiv broadly neutralizing b-cell lineage through reorientation of variable domains defining b cell immunodominance to viruses tuning environmental timescales to evolve and maintain generalists 
host-pathogen coevolution and the emergence of broadly neutralizing antibodies in chronic infections tracing antibody repertoire evolution by systems phylogeny maturation pathway from germline to broad hiv- neutralizer of a cd -mimic antibody co-evolution of a broadly neutralizing hiv- antibody and founder virus developmental pathway for potent v v -directed hivneutralizing antibodies viral variants that initiate and drive maturation of v v -directed hiv- broadly neutralizing antibodies rational hiv immunogen design to target specific germline b cell receptors hiv vaccine design to target germline precursors of glycan-dependent broadly neutralizing antibodies hiv- broadly neutralizing antibody precursor b cells revealed by germline-targeting immunogen optimal immunization cocktails can promote induction of broadly neutralizing abs against highly mutable pathogens manipulating the selection forces during affinity maturation to generate cross-reactive hiv antibodies sequential immunization elicits broadly neutralizing anti-hiv- antibodies in ig knockin mice sequential and simultaneous immunization of rabbits with hiv- trimers from clades a, b and c sequential immunization with a subtype b hiv- envelope quasispecies partially mimics the in vivo development of neutralizing antibodies tailored immunogens direct affinity maturation toward hiv neutralizing antibodies sequential immunizations with a panel of hiv- env virus-like particles coach immune system to make broadly neutralizing antibodies imaging of germinal center selection events during affinity maturation mutation drift and repertoire shift in the maturation of the immune response visualizing antibody affinity maturation in germinal centers role of framework mutations and antibody flexibility in the evolution of broadly neutralizing antibodies competitive exclusion by autologous antibodies can prevent broad hiv- antibodies from arising optimal sequential immunization can focus antibody responses against 
diversity loss and distraction precursor frequency and affinity determine b cell competitive fitness in germinal centers, tested with germline-targeting hiv vaccine immunogens stabilized hiv- envelope glycoprotein trimers for vaccine use optimality of mutation and selection in germinal centers antibodies in hiv- vaccine development and therapy structural insights on the role of antibodies in hiv- vaccine and therapy skempi: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models fine mapping of the interaction of neutralizing and nonneutralizing monoclonal antibodies with the cd binding site of human immunodeficiency virus type gp helper cell differentiation, function, and roles in disease clonal selection in the germinal centre by regulated proliferation and hypermutation what are the primary limitations in b-cell affinity maturation, and how much affinity maturation can we drive with vaccination? vmd: visual molecular dynamics trimeric hiv- -env structures define glycan shields from clades a, b, and g bnaber: database of broadly neutralizing hiv antibodies induction of broadly neutralizing antibodies in germinal centre simulations tackling influenza with broadly neutralizing antibodies use of hemagglutinin stem probes demonstrate prevalence of broadly reactive group influenza antibodies in human sera a human monoclonal antibody prevents malaria infection by targeting a new site of vulnerability on the parasite neville a public antibody lineage that potently inhibits malaria infection through dual binding to the circumsporozoite protein a lair insertion generates broadly reactive antibodies against malaria variant antigens broadly neutralizing epitopes in the plasmodium vivax vaccine candidate duffy binding protein human broadly neutralizing antibodies to the envelope glycoprotein complex of hepatitis c virus sustained antigen availability during germinal center initiation enhances antibody responses to 
vaccination immunization for hiv- broadly neutralizing antibodies in human ig knockin mice adeno-associated virus gene delivery of broadly neutralizing antibodies as prevention and therapy against hiv- viral receptor-binding site antibodies with diverse germline origins what are the primary limitations in b-cell affinity maturation, and how much affinity maturation can we drive with vaccination? immunogenic stimulus for germline precursors of antibodies that engage the influenza hemagglutinin receptor-binding site the science of learning: mechanisms and principles we thank krishna shrinivas for helpful discussions. financial support was provided by the lawrence livermore national laboratory llc award #b , and by the ragon institute of massachusetts general hospital, massachusetts institute of technology, and harvard university. key: cord- -d jsek authors: eguchi, raphael r.; anand, namrata; choe, christian a.; huang, po-ssu title: ig-vae: generative modeling of immunoglobulin proteins by direct d coordinate generation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d jsek while deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation—an important task in structure-based problems such as active site and interface design. we present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the d coordinates of immunoglobulins. our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. we show that the ig-vae can be used to create a computational model of a sars-cov -rbd binder via latent space sampling. 
we further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as a constrained optimization problem in the latent space of a generative model. over the past two decades, the field of protein design has made steady progress, with new computational methods providing novel solutions to challenging problems such as enzyme catalysis, viral inhibition, de novo structure generation and more [ , , , , , , , , ]. building on the improvements in force field accuracy and conceptual advancements in understanding structural engineering principles, computational protein design is seemingly ready to take on any engineering challenge. however, designs created computationally rarely match the performance of variants derived from directed evolution experiments; there is still much room for improvement. while design force fields are known to be imperfect, a major limitation of computational design also stems from the difficulty in modeling backbone flexibility, specifically, movements in loops and the relative geometries of the structural elements in the protein. when comparing structural changes between pre- and post-evolution structures, we often find that the polypeptide chain responds to sequence changes in ways not predictable by the design algorithm. this is because current methods, such as rosetta [ ], tend to restrict backbone conformations to local energy minima. while this issue is usually addressed by sampling backbone torsional angles from known fragments, such a stochastic optimization process is discrete, sparse, time-consuming, and rarely uncovers the true ground state. to address this fundamental challenge in modeling protein structure, we explore the use of deep neural networks to infer a continuous structural space in which we can smoothly interpolate between different backbone conformations.
in doing so we seek to capture elements of backbone motility that are otherwise difficult to reflect in rigid body design. we focus on the task of antibody design, as it harbors many challenges common across different protein design problems and is of significant practical importance: applications of antibodies have been found in every sector of biotechnology, from diagnostic procedures to immunotherapy. with recent advances in deep learning technology, machine learning tools have seen increasing applications in protein science, with deep neural networks being applied to tasks such as sequence design [ ] , fold recognition [ ] , binding site prediction [ ] , and structure prediction [ , ] . generative models, which approximate the distributions of the data they are trained on, have garnered interest as a data-driven way to create novel proteins. unfortunately, the majority of protein-generators create d amino acid sequences [ , ] making them unsuitable for problems that require structure-based solutions such as designing protein-protein interfaces. very few machine learning models have been implemented for protein backbone generation [ , ] , and none of the reported methods produce structures comparable in quality to those of experimentally validated tools such as rosettaremodel [ ] . previously, our group was the first to report a generative adversarial network (gan) that generated -residue backbones that could serve as templates for de novo design [ ] . this prior work focused on unconditioned structure generation via a distance matrix representation, and d coordinates were recovered using a convex optimization algorithm [ ] or a learned coordinate recovery module [ ] . despite its novelty, the gan method is accompanied by certain difficulties. the generated distance constraints are not guaranteed to be euclidean-valid, and thus it is not possible to recover d coordinates that perfectly satisfy the generated constraints. 
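One way to see how a generated distance matrix can fail to be Euclidean-valid is to test it against the triangle inequality. The Python sketch below is a hedged illustration only, not part of the original method, and the function name is hypothetical. The triangle inequality is a necessary condition but not a sufficient one: a complete test for embeddability in three dimensions would also require the centered Gram matrix derived from the squared distances to be positive semidefinite with rank at most three.

```python
from itertools import combinations


def violates_triangle_inequality(dist, tol=1e-9):
    """Return True if any triple of points breaks the triangle inequality.

    dist is a square, symmetric matrix (list of lists) of pairwise distances.
    Passing this check does not prove that 3D coordinates exist; failing it
    proves that they cannot.
    """
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        a, b, c = dist[i][j], dist[j][k], dist[i][k]
        if a > b + c + tol or b > a + c + tol or c > a + b + tol:
            return True
    return False


# Three collinear points at 0, 1 and 2: consistent distances.
valid = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# Same matrix with one corrupted entry: no set of points can realize it.
invalid = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
print(violates_triangle_inequality(valid), violates_triangle_inequality(invalid))
```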
we found that the quality of the recovered backbone torsions could often be degraded due to inconsistencies or errors in the generated distance constraints, leading to loss of important biochemical features, such as hydrogen-bonding. moreover, since the gan is trained on peptide fragments, we are not guaranteed to sample constraints for globular structures that can eventually fold. in this study, we use a variational autoencoder (vae) [ ] to perform direct d coordinate generation of full-atom protein backbones, circumventing the need to recover coordinates from pairwise distance constraints, and also avoiding the problem of distance matrix validity [ ] . using a coordinate representation has allowed us to make our model torsion-aware, significantly improving the quality of the generated backbones. importantly, our model motivates a conceptually new way of solving protein design problems. because the ig-vae generates coordinates directly, all of its outputs are fully differentiable through the coordinate representation. this allows us to use the generative prior to constrain structure generation with any differentiable coordinate-based heuristic, such as rosetta energy, backbone shape constraints, packing metrics, and more. by optimizing a structure in the vae latent space, designers are able to specify a set of desired structural features while the model creates the rest of the molecule. as an example of this approach, we use the ig-vae to perform constrained loop generation, towards epitope-specific antibody design. our technology ultimately paves the way for a novel approach to protein design in which backbone construction is solved via constrained optimization in the latent space of a generative model. in contrast to conventional methods [ , ] , we term this approach "generative design." figure : vae training scheme. the flow of data is shown with black arrows, and losses are shown in blue. 
first, ramachandran angles and distance matrices are computed from the full-atom backbone coordinates of a training example. the distance matrix is passed to the encoder network (e), which generates a latent embedding that is passed to the decoder network (d). the decoder directly generates coordinates in d space, from which the reconstructed ramachandran angles and distance matrix are computed. errors from both the angles and distance matrix are back-propagated through the d coordinate representation to the encoder and decoder. note that both the torsion and distance matrix losses are rotationally and translationally invariant, and that the coordinates of the training example are never seen by the model. the shown data are real inputs and outputs of the vae for the immunoglobulin chain in pdb: yxh(l). all training data were collected from the antibody structure database abdb [ ] . domains that were missing residues were excluded, and sequence-redundant structures were included to allow the network to learn small backbone fluctuations. the final training set is comprised of immunoglobulins spanning non-sequence-redundant structures, including single-domain antibodies. the training set covers close to % of the abdb database. structures in the dataset vary in length from to residues, with most falling between and . since the input of our model was fixed at residues ( atoms), structures larger than were center-cropped. structures smaller than were "structurally padded" by using rosettaremodel to append dummy residues to the n and c termini. the reconstruction loss of the padded regions was down-weighted over the course of training (see supplemental methods). we found that the structural padding step led to slight improvements in local bond geometries at the terminal regions. all structures were idealized and relaxed under the rosetta energy function with constraints to starting coordinates [ ] . 
this relaxation step was done to remove any potential confounding factors resulting from various crystal structure optimization procedures. a schematic of the model training scheme is shown in figure . like classical vaes [ ] our model minimizes a reconstruction loss and a kl-divergence loss that constrains the latent embeddings to be isotropic gaussian. importantly, the reconstruction loss is comprised of a distance matrix reconstruction loss and a torsion loss. the torsion loss is formulated as a supervised-learning objective, where the network infers the correct torsion distribution from the distance matrix. we found that early in training, the torsion loss must be heavily up-weighted relative to the distance loss in order to achieve correct stereochemistry, as molecular handedness cannot be uniquely determined from pairwise distances alone. decreasing the torsion weight later in training led to improvements in local structure quality. a detailed description of the loss weighting schedule is included in the supplemental methods. we note that both the distance and torsion losses are rotationally and translationally invariant, so the absolute position of the output coordinates is determined by the model itself. coordinates of the training examples are never seen by the model directly. to assess the utility of our model, we studied the ig-vae's performance on several tasks. the first of these is data reconstruction, which reflects the ability of the model to compress structural features into a low-dimensional latent space (section . ). this functionality is an underlying assumption of generative sampling, which requires that the latent space capture the scope of structure variation with sufficient resolution. next, we assessed the quality and novelty of the generated structures, characterizing the chemical validity of the samples (section . ), while also evaluating the quality of interpolations between embedded structures (section . ).
we visualize the distribution of embeddings within the latent space to better understand its structure, and to determine if the sampling distribution is well-supported (section . ). we ultimately challenge the ig-vae with a real design task; specifically, generation of a novel backbone with high shape-complementarity to the ace epitope of the sars-cov -rbd [ ] (section . . ). to evaluate the general utility of our approach, we investigated whether we could leverage the model's generative prior to perform backbone design subject to a set of local, human-specified constraints (section . . ). a core feature of an effective vae is the model's ability to embed and reconstruct data. high quality reconstructions indicate that a model is able to capture and compress structural features into a low-dimensional representation, which is a prerequisite for generation by latent space sampling. to evaluate this functionality, we reconstructed randomly selected structures and compared the real and reconstructed distributions of backbone torsion angles, pairwise distances, bond lengths, and bond angles (figure ). the structurally padded "dummy" regions were excluded from this analysis. the distance and torsion distributions are shown in figure a, where we observe that the real and reconstructed data agree well. on average, pairwise distances smaller than Å tended to be reconstructed slightly smaller than the actual distances, while larger distances tended to be reconstructed slightly larger (figure b, reconstructed). φ and ψ torsions tended to fall within ∼ ° of the real angles, while ω angles tended to fall within ∼ °. examples of reconstructed backbones are shown in the top row of figure c. these data demonstrate that the ig-vae accurately performs full-atom reconstructions over a range of loop conformations.
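The two feature types compared above, pairwise distances and backbone torsions, can both be computed from coordinates alone. The following Python sketch is an illustration under stated assumptions rather than the authors' implementation; all function names are hypothetical. It also shows why losses built on these features are invariant to rotation and translation, and why distances alone cannot fix handedness: a mirror image keeps every distance but flips the sign of each torsion.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance_matrix(coords):
    """All pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four consecutive atoms.

    For a protein backbone, phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1);
    omega uses CA, C, N(i+1), CA(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rigid_transform(coords, theta, shift):
    """Rotate points about the z axis by theta (radians), then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


def mirror_x(coords):
    """Reflect points through the yz plane (an improper transformation)."""
    return [(-x, y, z) for x, y, z in coords]


atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(dihedral(*atoms))                                        # original torsion
print(dihedral(*rigid_transform(atoms, 0.7, (3.0, -2.0, 5.0))))  # same value
print(dihedral(*mirror_x(atoms)))                               # opposite sign
```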
in order to use generated backbones in conjunction with existing protein design tools, it is crucial that our model produce structures with near-chemically-valid bond lengths and bond angles. otherwise, large movements in the backbone can occur as a result of energy-based corrections during the design process, leading to the loss of model-generated features. the bond length and bond angle distributions are depicted in figure d. the majority of bond length reconstructions were within ∼ . Å of ideal lengths, while bond angles tended to be within ∼ ° of their ideal angles. we found that a constrained optimization step using the rosetta centroid-energy function (see supplemental methods) could be used to effectively refine the outputs. this refinement process kept structures close to their output conformations (figure c, bottom) while correcting for non-idealities in the bond lengths and bond angles (figure d, green). refinement did not improve backbone reconstruction accuracy (figure b, refined), but did improve chemical validity, implying that our model outputs could be refined with rosetta without washing out generated structural features. our analysis of the reconstructions reveals that the ig-vae can be used to obtain high-resolution structure embeddings that are likely useful in various learning tasks on d protein data. these results support our later conclusion that the ig-vae embedding space can be leveraged to generate structures with high atomic precision, while also showing that the kl-regularization imposed on the latent embeddings does not overpower the autoencoding functionality of the vae. the left panel shows a plot of the post-refinement per-residue centroid energy against normalized nearest neighbor distance for the generated structures. the nearest neighbor distance is computed as the minimum frobenius distance between the generated distance matrix and all distance matrices in the training set.
each point is colored based on whether the nearest neighbor is a heavy or light chain ig. the center panel shows an overlay of the generated structures (pink) and their nearest neighbors (blue) in the training set. these six structures were selected using a combination of centroid energy, nearest neighbor distance, heavy/light classification, and manual inspection. the right panel shows sequence design results for structures iii and vi. the energies in the left panel are centroid energies, while the energies in the right panel are full-atom rosetta energies using the ref score function. to determine whether the ig-vae could generate novel, realistic ig backbones, we sampled structures from the latent space of the model and compared their feature distributions to non-redundant structures from the dataset. each generated structure was cropped based on its nearest neighbor in the training set. in figure a we show overlays of the distance and torsion distributions for the real and generated structures. the generated torsions were more variable than the real torsions, with more residues falling outside the range of the training data. the real and generated distance distributions agreed well. visual assessment of the backbone ensembles ( figure b ) revealed that the two were similarly variable, suggesting that our model captures much of the structural variation found in the training set. the generated structures exhibited good chemical-bond geometries ( figure c , red) that were slightly noisier but comparable to those of the reconstructed backbones ( figure d ). once again, we found that constrained refinement using rosetta could improve chemical bond geometries ( figure c , green), with minimal changes to the generated structures. to assess both the novelty and viability of the generated examples, we evaluated structures based on two criteria: ( ) post-refinement energy and ( ) nearest-neighbor distance. 
energies were normalized by residue-count to account for variable structure sizes. nearest-neighbor distance was computed as a length-normalized frobenius distance between cα-distance matrices. for notational convenience, nearest-neighbor distances were normalized between and . we avoid the use of the classical cα-rmsd, because it is neither an alignment-free nor a length-invariant metric, and because it lacks sufficient precision to make meaningful comparisons between ig structures. we found that there was a positive correlation between energy and nearest-neighbor distance ( figure d ), implying that while our model is able to generate structures that differ from any known examples, there is a concurrent degradation in quality when structures drift too far from the training data. despite this, we found that a significant number of generated structures had novel loop shapes, achieved favorable energies, and retained ig-specific structural features. six of these examples are shown as raw model outputs, overlaid with their nearest neighbors in the center panel of figure d . both heavy and light-chain-like structures exhibited dynamic loop structures, and the model appears to perform well in generating both long and short loops. to assess whether the generated loop conformations could be sustained by an amino acid sequence, we used rosetta fastdesign [ , ] to create sequences for the selected backbones. the outputs of two representative design trajectories are shown in the right panel of figure d . the design process yielded energetically favorable sequences with loops supported by features such as hydrophobic packing, π-π stacking and hydrogen bonding. overall these results suggest that the ig-vae is capable of generating novel, high-quality backbones that are chemically accurate, and that can be used in conjunction with existing design protocols to obtain biochemically realistic sequences. 
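the nearest-neighbor metric described above can be sketched in a few lines. this is a minimal illustration, not the authors' code: it assumes both structures have already been cropped to the same number of residues, it divides the frobenius norm by the residue count (the exact length normalization is not given in the text), and the function names are hypothetical. the final rescaling of distances to a fixed range is not shown.

```python
import math

def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates given as (x, y, z) tuples."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def length_normalized_frobenius(d_a, d_b):
    """Frobenius norm of the difference of two equal-size distance matrices,
    divided by the number of residues (assumed form of the normalization)."""
    n = len(d_a)
    if n != len(d_b):
        raise ValueError("structures must be cropped to the same length first")
    sq = sum((d_a[i][j] - d_b[i][j]) ** 2 for i in range(n) for j in range(n))
    return math.sqrt(sq) / n

def nearest_neighbor(generated, training_set):
    """Return (index, distance) of the training structure closest to `generated`."""
    d_gen = ca_distance_matrix(generated)
    scored = [
        (length_normalized_frobenius(d_gen, ca_distance_matrix(ref)), i)
        for i, ref in enumerate(training_set)
    ]
    best_dist, best_idx = min(scored)
    return best_idx, best_dist

# Toy example: a translated square is still identical to the square,
# because distance matrices need no alignment.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
line = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in square]
idx, dist = nearest_neighbor(shifted, [line, square])
```

because the comparison is made on distance matrices rather than coordinates, no superposition step is needed, which is the alignment-free property that motivates avoiding cα-rmsd.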
while the results of the preceding section suggest that the ig-vae is able to produce novel structures, an important feature of any generative model is the ability to interpolate smoothly between examples in the latent space. in design applications this functionality allows for dense structural sampling, and modeling of transitions between distinct structural features. a linear interpolation between two randomly selected embeddings is shown in figure a . the majority of interpolated structures adopt realistic conformations, retaining characteristic backbone hydrogen bonds while transitioning smoothly between different loop conformations ( figure c ). structures along the interpolation trajectory were able to achieve negative post-refinement energies ( figure b) , with the highest energy structure corresponding to the most unrealistic portion of the trajectory ( figure a , b, index ). to better understand the structure of the embedding space, we visualized the training data embeddings ( figure d ) using two dimensionality-reduction methods: t-distributed stochastic neighbor embedding (tsne) [ ] and principal components analysis (pca) [ ] . the top panel depicts a tsne decomposition of the embedding means (without variance) for the non-redundant structures in the dataset. k-means clustering (k= ) revealed distinct clusters that roughly correlated with loop structure, suggesting a correspondence between latent space position and semantically meaningful features ( figure d, insert) . in the bottom panel, we visualized sampled embeddings for the same set of structures, sampling embeddings per example. pca revealed a spherical, densely populated embedding space, suggesting that the isotropic gaussian sampling distribution is well supported. the pca results also suggest that the kl-loss was sufficiently weighted during training. 
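as a minimal sketch of the linear interpolation described above, the snippet below generates evenly spaced latent vectors between two embeddings. the function name is hypothetical, and the decoder that would turn each interpolated vector into a backbone is not shown.

```python
def interpolate_latents(z_start, z_end, n_steps):
    """Return n_steps latent vectors evenly spaced on the line from z_start to z_end."""
    if len(z_start) != len(z_end):
        raise ValueError("embeddings must have the same dimensionality")
    if n_steps < 2:
        raise ValueError("need at least the two endpoints")
    path = []
    for k in range(n_steps):
        t = k / (n_steps - 1)  # t runs from 0.0 (start) to 1.0 (end)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

# Toy two-dimensional embeddings; real embeddings would come from the encoder.
path = interpolate_latents([0.0, 0.0], [2.0, 4.0], 5)
```

each row of `path` would be passed through the decoder to obtain one frame of the interpolation trajectory.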
overall these results support the conclusions of the previous section, demonstrating that the ig-vae exhibits the features expected of a properly-functioning generative model, and that sampling from a gaussian prior is well motivated. the smooth interpolations agree with the observation that our model is capable of generating novel structures, which are expected to arise by sampling from interpolated regions between the various embeddings. while antibodies are usually comprised of two ig domains, there also exist a large number of single-domain antibodies in the form of camelids [ ] and bence jones proteins [ ] . to test the utility of our model in a real-world design problem, we challenged the ig-vae to generate a single-domain-binder to the ace epitope of the sars-cov receptor binding domain (rbd), an epitope which is of significant interest in efforts towards resolving the / coronavirus pandemic [ ] . to do this, we sampled structures from the latent space of the ig-vae. to find candidates with high shape complementarity to the ace epitope, we used patchdock [ ] to dock each generated structure against the rbd. to make the search sequence agnostic, both proteins were simulated as poly-valines during this step. we then selected ig's that bound the ace epitope specifically, and used fastdesign to optimize the sequences of the binding interfaces. two ig's that exhibited good shape complementarity to the ace epitope and adopted unique loop conformations are shown in figure a . after sequence design these candidates achieved favorable energies and complex ddg's of - . and - . rosetta energy units ( figure a, designed) . using rosettadock [ , ] , we were able to accurately recover the designed interfaces as the energy minimum of a blind global docking trajectory, suggesting that the binders are specific to their cognate epitopes ( figure a, recovered, docking) . 
these results demonstrate that the latent space of our generative model can be leveraged to create novel binding proteins that are otherwise unobtainable by discrete sampling of real structures. while we believe that designed proteins must be experimentally validated, our data suggest that generative models can provide compelling design candidates by computational design standards, making the method worthy of larger-scale and broader experimental testing. our data also demonstrate the compatibility of the ig-vae with established design suites like rosetta, which have conventionally relied on real proteins as templates. the functional elements of a protein are often localized to specific regions. antibodies are one example of this, where binding is attributable to a set of surface-localized loops; enzymes, which depend on the positioning of catalytic residues to form an active site, are another. despite this apparent simplicity, natural proteins carry a rich set of evolution-optimized structural features that are required to host the functional elements. the protein design process often seeks to mimic this organization, requiring the manual engineering of supporting elements centered around a desired structural feature. while logical, designing supportive features is almost always a difficult task, requiring large amounts of experience and manual tuning. motivated by these difficulties, we sought to investigate whether the generative prior of our model could be leveraged to create structures that conform to a human-specified feature without specification of other supporting features. to test this, we specified a -residue antibody loop shape as pairwise cα distances. we then sampled random latent initializations and applied the constraints to the generated structures. next, we optimized the structures via gradient descent, backpropagating constraint errors to the latent vectors through the decoder network. 
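the latent-vector optimization described above can be illustrated with a toy example. everything here is a stand-in and an assumption: a fixed random linear map replaces the trained decoder, central finite differences replace backpropagation, the step size is halved whenever a step fails to reduce the loss, and all names are hypothetical. the sketch only shows the shape of the procedure: decode, score distance constraints, update the latent vector, and repeat from several random initializations.

```python
import math
import random

def make_toy_decoder(latent_dim, n_atoms, seed=0):
    """Stand-in for a trained decoder: a fixed random linear map from z to 3D coordinates."""
    rng = random.Random(seed)
    weights = [[[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
                for _ in range(3)] for _ in range(n_atoms)]
    def decode(z):
        return [tuple(sum(w * zk for w, zk in zip(row, z)) for row in atom)
                for atom in weights]
    return decode

def constraint_loss(coords, targets):
    """Sum of squared errors over constrained pairs; targets maps (i, j) -> distance."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for (i, j), d in targets.items())

def numeric_grad(f, z, eps=1e-5):
    """Central finite-difference gradient (used here in place of backpropagation)."""
    grad = []
    for k in range(len(z)):
        hi = z[:k] + [z[k] + eps] + z[k + 1:]
        lo = z[:k] + [z[k] - eps] + z[k + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def optimize_latent(decode, targets, z0, steps=200, lr=0.1):
    """Gradient descent on z; a step is kept only if it lowers the loss."""
    f = lambda z: constraint_loss(decode(z), targets)
    z, loss = list(z0), f(z0)
    for _ in range(steps):
        g = numeric_grad(f, z)
        trial = [zk - lr * gk for zk, gk in zip(z, g)]
        trial_loss = f(trial)
        if trial_loss < loss:
            z, loss = trial, trial_loss
        else:
            lr *= 0.5  # step was too large; shrink and retry
    return z, loss

def best_of_restarts(decode, targets, latent_dim, n_restarts=5, seed=1):
    """The problem is non-convex, so try several random starts and keep the best."""
    rng = random.Random(seed)
    runs = [optimize_latent(decode, targets,
                            [rng.gauss(0.0, 1.0) for _ in range(latent_dim)])
            for _ in range(n_restarts)]
    return min(runs, key=lambda r: r[1])

decode = make_toy_decoder(latent_dim=4, n_atoms=5)
targets = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.5}  # arbitrary toy constraints
z_best, loss_best = best_of_restarts(decode, targets, latent_dim=4)
```

in a real setting the decoder would be the trained network and the gradient would come from automatic differentiation; the restart loop reflects the need for multiple random initializations.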
from initializations, we were able to recover the target loop shape in trajectories, with the vast majority of structures retaining high quality, realistic features. we visualize one trajectory in the center panel of figure b . while the middle ig-loop ( figure b , blue) is being constrained, the other loops move to adopt sterically compatible conformations and the angles of the β-strands change to support the new loop shapes. we note that the latent-vector optimization problem is non-convex, which is why we require multiple random initializations [ , ] . importantly, the recovered backbones in the generated ensemble differ from the originating structure ( figure b , orange). these data suggest that our model can be used to create backbones that satisfy specific design constraints, while also providing a distribution of compatible supporting elements. we emphasize that this procedure is not limited to distance constraints, and can be done using any differentiable coordinate-based heuristic such as shape complementarity [ ] , volume constraints, rosetta energy [ ] , and more. with a well-formulated loss function, which warrants a study in itself, it is possible to "mold" the loops of an antibody to a target epitope, or even the backbone of an enzyme around a substrate of choice. our example demonstrates a novel formulation of protein design as a constrained optimization problem in the latent space of a generative model. in contrast to methods that require manual curation of each part of a protein, we term our approach "generative design," where the requirement of human-specified heuristics constitutes the "design" element, and using a generative model to fill in the details of the structure constitutes the "generative" element. perhaps the greatest challenge in deep structure generation is that generated backbones must be both globally realistic and chemically valid to be of any practical use. 
in the case that a generated example is poor in quality, energy-based corrections to chemical bond angles, lengths, and torsions will often lead to unintended backbone movements, resulting in loss of model-derived features. this issue is closely tied to that of data representation; while proteins are often represented by d coordinates, these are not invariant under rotations and translations. several groups have experimented with distance matrix representations which provide such invariances [ , , ] ; however, generation of euclidean-valid distance matrices remains non-trivial [ ] and recovery of d coordinates from invalid matrices leads to feature degradation [ , ] . the training scheme that we present has sought to address these challenges by auto-encoding distance matrices while factoring through a d coordinate representation. our approach circumvents the need for coordinate recovery via secondary methods, and avoids the problem of euclidean validity altogether. importantly, the ig-vae loss function does not specify the absolute positions of the atomic coordinates, which are instead learned. factoring through a coordinate representation has also allowed us to back-propagate torsion error to the generative model, significantly improving local structure quality. in the structure prediction field, fragment-sampling has been used for many years as a powerful tool to efficiently search the vast backbone conformational space [ ] . recently, however, several groups discovered that structural priors obtained from coevolution data [ ] can be used to sufficiently restrict the conformational search space to allow for continuous optimization by gradient descent [ , ] . this insight gave alphafold [ ] a significant advantage in casp [ ] , as it enabled sampling of conformations otherwise not accessible by a discrete fragment set. 
unlike structure prediction, protein design does not often start with a sequence from which a structural prior can be obtained, and remains heavily dependent on stochastic fragment sampling. despite past success, fragment-based design methods can be problematic for two reasons. first, they are unable to model backbone flexibility, limiting the range of accessible conformations, sequences, and thus functions [ ] . second, their stochastic nature means that they use little prior information about global organization (e.g. topology, secondary structure contacts, etc.). while it is possible to filter for global patterns [ ] , such information is difficult to integrate into the sampling process itself, leading to the overwhelming majority of trajectories being discarded. in this study we have sought to address these long-standing difficulties by class-specific generative modeling. because our model provides realistic interpolations within the embedding space, generated backbones capture a range of dynamic motions that are difficult to sample via conventional simulations or fragment-sampling; the model infers a continuous distribution of ig-loop conformations from a set of discrete ones. while our model is class-specific, its generative prior allows us to restrict backbone conformations to the regime of a specific class, allowing for dense sampling while also supplying information about high-level features not provided by quantitative heuristics. additionally, because our approach is not specific to ig's, it can be applied to any fold-class well represented in structural databases. although there is much room to improve our model, in its current form the ig-vae remains a powerful tool for creating single-domain antibodies, and allows for high-throughput construction of epitope-tailored, structure-guided libraries. 
with such a tool, it may be possible to circumvent screening of fully randomized libraries, a large proportion of which are usually insoluble or fundamentally incompatible with the target of interest [ , , , ] . overall, our work is of significant interest to both protein engineers and machine learning scientists as the first successful example of d deep generative modeling applied to protein design. with the novel paradigm that it offers, we speculate that our scheme will motivate further study of class-specific generative models, as well as development of differentiable loss functions that can be used to create novel, functional proteins. code for the working model will be released on github at a later date. supplemental methods, rosetta commands, and an interpolation movie are available for download at: https://tinyurl.com/y cao h . de novo design of a fluorescence-activating β-barrel de novo design of a four-fold symmetric tim-barrel protein with atomic-level accuracy de novo design of potent and selective mimics of il- and il- a potent and broad neutralizing antibody recognizes and penetrates the hiv glycan shield kemp elimination catalysts by computational enzyme design de novo design of bioactive protein switches computational design of closely related proteins that adopt two well-defined but structurally divergent folds accurate computational design of multipass transmembrane proteins the coming of age of de novo protein design rosetta : an object-oriented software suite for the simulation and design of macromolecules protein sequence design with a learned potential multi-scale structural analysis of proteins by deep semantic segmentation deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning koray kavukcuoglu, and demis hassabis. 
improved protein structure prediction using potentials from deep learning improved protein structure prediction using predicted interresidue orientations unified rational protein engineering with sequence-based deep representation learning progen: language modeling for protein generation. preprint, synthetic biology generating tertiary protein structures via an interpretative variational autoencoder generative modeling for protein structures rosettaremodel: a generalized framework for flexible backbone protein design fully differentiable full-atom protein backbone generation auto-encoding variational bayes generating valid euclidean distance matrices principles for designing ideal protein structures abdb: antibody structure database-a database of pdb-derived antibody structures potential role of ace in coronavirus disease (covid- ) prevention and management accurate de novo design of hyperstable constrained peptides rosettascripts: a scripting language interface to the rosetta macromolecular modeling suite visualizing data using t-sne on lines and planes of closest fit to systems of points in space. the london, edinburgh, and dublin philosophical magazine camelid single-domain antibodies: historical perspective and future outlook bence jones proteins patchdock and symmdock: servers for rigid and symmetric docking benchmarking and analysis of protein docking performance in rosetta v protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations precise recovery of latent vectors from generative adversarial networks generalized latent variable recovery for generative adversarial networks shape complementarity at protein/protein interfaces the rosetta all-atom energy function for macromolecular modeling and design are there pathways for protein folding large-scale determination of previously unsolved protein structures using evolutionary information. 
elife critical assessment of methods of protein structure prediction (casp)-round xiii rosettabackrub-a web server for flexible backbone protein structure modeling and design evaluation of protein engineering and process optimization approaches to enhance antibody drug manufacturability predictive tools for stabilization of therapeutic proteins post-translational modifications of recombinant proteins: significance for biopharmaceuticals general strategy for the generation of human antibody variable domains with increased aggregation resistance we thank sergey ovchinnikov for helpful discourse during early phases of this project, and for contributing initial code that became part of the torsion-reconstruction loss function. this project was supported by startup funds from the stanford schools of engineering and medicine, the stanford chem-h chemistry/biology interface predoctoral training program and the national institute of general medical sciences of the national institutes of health under award number t gm . additionally, this project was supported by the u.s. department of energy, office of science, office of advanced scientific computing research, scientific discovery through advanced computing (scidac) program. this project is also based upon work supported by google cloud. key: cord- -rfhtlodc authors: azhar, mohd.; phutela, rhythm; ansari, asgar hussain; sinha, dipanjali; sharma, namrata; kumar, manoj; aich, meghali; sharma, saumya; rauthan, riya; singhal, khushboo; lad, harsha; patra, pradeep kumar; makharia, govind; chandak, giriraj ratan; chakraborty, debojyoti; maiti, souvik title: rapid, field-deployable nucleobase detection and identification using fncas date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rfhtlodc detection of pathogenic sequences or variants in dna and rna through a point-of-care diagnostic approach is valuable for rapid clinical prognosis. 
in recent times, crispr based detection of nucleic acids has provided an economical and quicker alternative to sequencing-based platforms which are often difficult to implement in the field. here, we present fncas editor linked uniform detection assay (feluda) that employs a highly accurate enzymatic readout for detecting nucleotide sequences, identifying nucleobase identity and inferring zygosity with precision. we demonstrate that feluda output can be adapted to multiple signal detection platforms and can be quickly designed and deployed for versatile applications including rapid diagnosis during infectious disease outbreaks like covid- . the rise of crispr cas based approaches for biosensing nucleic acids has opened up a broad diagnostic portfolio for crispr products beyond their standard genome editing abilities , . in recent times, crispr components have been successfully used for detecting a wide variety of nucleic acid targets such as those obtained from pathogenic microorganisms or disease-causing mutations from various biological specimens [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . at the heart of such a detection procedure lies the property of crispr proteins to accurately bind to target dna or rna, undergo conformational changes leading to cleavage of targets generating a reporter-based signal outcome [ ] [ ] [ ] [ ] [ ] . to enable such a detection mechanism to be foolproof, sensitive and reproducible across a large variety of targets, the accuracy of dna interrogation and subsequent enzyme activity is extremely critical, particularly when clinical decisions are to be made based on these results , . current technologies relying on using crispr components for nucleic acid detection can sense the identity of the target either through substrate cleavage mediated by an active crispr ribonucleoprotein (rnp) complex or by binding through a catalytically inactive rnp complex. 
cleavage outcomes are then converted to a reporter-based readout with or without signal amplification. among the crispr proteins that have been used so far, cas and cas or their inactive forms have been predominantly employed for detecting dna sequences while cas has been used for both dna and rna sequences. each of these approaches has its own strengths and limitations that are related to sensitivity, specificity and read-out modes for an accurate diagnosis. the primary focus of these platforms is towards detection of low copy numbers of nucleic acids from body fluids of patients where signal amplification through collateral activity of fluorescent reporters has been proven to be advantageous. for genotyping individuals with high confidence, including carriers of we have recently reported a cas ortholog from francisella novicida (fncas ) showing very high mismatch sensitivity both under in vitro and in vivo conditions [ ] [ ] [ ] . this is based on its negligible binding affinity to substrates that harbor mismatches, a property that is distinct from engineered cas proteins showing similar high specificity . we reasoned that fncas mediated dna interrogation and subsequent cleavage can both be adapted for accurately identifying any snvs provided that the fundamental mechanism of discrimination is consistent across all sequences ( figure a ). we name this approach fncas editor linked uniform detection assay (feluda) and demonstrate its utility in various pathological conditions including genetic disorders and infectious diseases including disease outbreaks like covid- [ ] [ ] [ ] . to identify an snv with high accuracy, we first sought to investigate if fncas can be directed to cleave the wild type (wt) variant at an snv by placing an additional mismatch in the sgrna sequence specific to the snv. to test this, we selected sickle cell anemia (sca), a global autosomal recessive genetic disorder caused by one point mutation (gag>gtg) [ ] [ ] . 
we identified an sgrna that can recognize the main disease causing point mutation (gag > gtg) on account of a pam site in the vicinity ( figure b ). we then fixed the position of this mutation with respect to pam and changed every other base in the sgrna sequence to identify which combination led to complete loss of cleavage of a wild type substrate in an in vitro cleavage (ivc) assay with fncas ( figure b, supplementary figure a ). we found that two mismatches at the nd and th positions away from the pam completely abrogated the cleavage of the target ( figure b ). similarly other combinations ( and , and , and , and and , numbers referring to positions away from pam) also abolished the cleavage, although to slightly lower levels ( figure b) . notably, on the sca substrate, the and mismatch combination produced near complete cleavage suggesting that this combination may be favorable for discriminating between wt and sca substrates ( figure b) . importantly, the same design principle can guide the allelic discrimination at every snv that appears in these positions upstream of the trinucleotide ngg pam in dna. we performed feluda with synthetic sequences corresponding to snvs reported for mendelian disorders such as glanzmann's thrombasthenia, hemophilia a (factor viii deficiency), glycogen storage disease type i and x linked myotubular myopathy and observed an identical pattern of successful discrimination at the snvs (supplementary figure b) . taken together, these experiments suggest that feluda design can be universally used for detection of snvs and would not require extensive optimization or validation steps for new snvs. to aid users for quick design and implementation of feluda for a target snv, we have developed a webtool jatayu (junction for analysis and target design for your feluda assay) that incorporates the above features and generates primer sequences for amplicon and sgrna synthesis (https://jatayu.igib.res.in, supplementary figure ). 
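the two-mismatch design logic described above can be sketched as a small helper. this is an illustration under stated assumptions and not the jatayu implementation: sequences are toy 20-mers written 5'→3' in the dna alphabet with the pam assumed to lie immediately 3' of the last base, positions are counted from the pam-proximal end, the positions used in the example are arbitrary placeholders (the specific positions reported in the text are not reproduced here), and all function names are hypothetical.

```python
def substitute(seq, pos_from_pam, new_base):
    """Return seq with the base at the given 1-based PAM-proximal position replaced."""
    n = len(seq)
    if not 1 <= pos_from_pam <= n:
        raise ValueError("position outside the protospacer")
    idx = n - pos_from_pam  # position 1 is the last (3'-most) base
    if seq[idx] == new_base:
        raise ValueError("new base must differ from the existing base")
    return seq[:idx] + new_base + seq[idx + 1:]

def mismatch_positions(guide, target):
    """1-based positions, counted from the PAM-proximal end, where guide and target differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    n = len(guide)
    return sorted(n - i for i in range(n) if guide[i] != target[i])

# Toy sequences, not the real hbb locus.
variant_target = "ACGTACGTACGTACGTACGT"
SNV_POS, EXTRA_POS = 3, 8  # placeholder positions chosen only for illustration
wt_target = substitute(variant_target, SNV_POS, "A")

# Guide matches the variant allele except for one deliberately introduced mismatch.
guide = substitute(variant_target, EXTRA_POS, "C")

vs_variant = mismatch_positions(guide, variant_target)  # one mismatch
vs_wt = mismatch_positions(guide, wt_target)            # two mismatches
```

under the design principle in the text, the substrate carrying a single mismatch to the guide is the one expected to be cleaved, while the substrate carrying both mismatches is expected to remain intact.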
we next tested feluda in dna from sca patients and a healthy control and found that in every case, the sc mutation containing substrates were cleaved to give a distinguishable signature while the wt substrate remained intact (supplementary figure a ). to establish that enzyme specificity for position-specific mismatches with respect to pam site is the fundamental reason for this discrimination, we designed the mismatch combinations such that cleavage will occur only for the wt substrate and observed identical results thus confirming our hypothesis (supplementary figure b ). notably, a simple agarose gel electrophoresis can be employed for this discrimination suggesting that feluda can be used in routine molecular biology labs to establish the presence of an snv in a dna sequence. in recent times fluorescence based nucleic acid detection has been widely used for several crisprdx platforms, particularly where collateral cleavage of reporters has been employed. although feluda results can be precisely determined by agarose or capillary electrophoresis, we envisioned fluorescence or chemiluminescence as alternate end-point readouts to expand the scope of devices that can suit feluda based detection. to implement this, we first investigated if feluda can be adapted to a non-cleavage, affinity-based method of detection which works with single nucleotide mismatch sensitivity. to develop such a readout, we tested feluda with a catalytically dead fncas (dfncas ) tagged with a fluorophore (gfp) and investigated if its mismatch discrimination is regulated at the level of dna binding (supplementary figure a) . we performed microscale thermophoresis (mst) assays to measure the binding affinity of inactive fncas -gfp rnp complex with wt or sca substrates. we observed that the sca substrate showed moderately strong binding (k d = . ± . nm) whereas the wt substrate exhibited very weak binding (k d = . nm ± . 
nm) consistent with the absence of cleavage on the ivc readout ( figure a) . we then developed a pipeline to adapt feluda for an affinity-based fluorescent read-out system, where the amplification step generates biotinylated products which can then be immobilized on magnetic streptavidin beads. upon incubation with fluorescent components in feluda, the absence or presence of a mismatch guides the binding of fncas molecules to the substrate leading to loss of fluorescence signal in the supernatant ( figure b ). we tested feluda using wt or sca substrates and observed distinct signatures that distinguished between the two alleles suggesting that feluda can be adapted for a fluorescence-based readout ( figure b ). next, we investigated if feluda detection can be extended to a quick point of care diagnosis of snvs using lateral flow strips for instrument-free visual detection. although such read-outs have been demonstrated with crispr effectors that have collateral activity, they rely on a secondary signal amplification step where fluorescent oligos are added to the reaction setup. we designed an assay using fam labelled rnp complex and biotin labelled amplicons on commercially available paper strips. in such a reaction, rnp-bound substrate molecules in a solution can lead to aggregation of the complex on a distinct test line of the strip while anti-fam antibody linked gold nanoparticles accumulate on a control line on the strip ( figure c ). to enable fam labelling of sgrna, we first validated the successful activity of chimeric fncas sgrnas by altering the length of overlap between crrna:tracrrna and observed that nt overlap can efficiently cleave the target ( figure d ). since a single fam labelled tracrrna is compatible with multiple crrnas, this design also reduces the time and cost of a feluda assay. 
next, we performed this assay with the hbb target and were able to detect up to femtomolar levels of target dna in a solution suggesting that even without signal amplification, feluda reaches sensitivity similar to that reported for crisprdx platforms employing collateral activity ( figure e ). finally, we tested feluda using wt and sca samples and obtained clear distinguishing bands for either condition validating the feasibility of visual detection for feluda diagnostics with complete accuracy ( figure f ). genotyping carrier individuals with heterozygosity through non-sequencing, pcr-based methods is often complicated and requires extensive optimization of primer concentration and assay conditions. although sickle cell trait (sct) individuals are generally non-symptomatic, carrier screening is vital to prevent the spread of sca in successive generations and is widely employed in sca control programs in various parts of the world . we speculated that the high specificity of feluda can be extended to identifying carriers since fncas would cleave % of the dna copies carrying sca mutation and thus show an intermediate cleavage pattern ( figure a ). we also investigated the use of saliva instead of blood as a non-invasive source of genomic dna that would allow genotyping children and aged subjects where drawing blood may not be feasible. we obtained a clear, distinguishable signature of dna cleavage in an sct subject that was intermediate between normal and sca individuals suggesting that feluda can be successfully used for determining zygosity at a specific snv ( figure a ). although this reinforced the inherent sensitivity of feluda for genotyping targets with bp mismatches, obtaining the three distinct readouts would necessitate significantly robust reaction components and high reproducibility across multiple subjects. we performed a blinded experiment using dna obtained from subjects with all three genotypes from a tertiary care center. 
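the idea of reading zygosity from an intermediate cleavage pattern can be sketched as follows. this is an idealized illustration, not the authors' analysis: it assumes a guide that cleaves only the sca allele, so the expected cleaved fraction is 0 for wt/wt, one half for a carrier (one of two alleles) and 1 for sca/sca; it assumes band intensities are available as numbers from gel quantification; and it assigns the genotype whose expected fraction is closest. the names and the nearest-value rule are hypothetical, and real assays show incomplete cleavage that would require calibrated thresholds.

```python
# Idealized expected fraction of cleaved amplicon per genotype (assumption:
# the guide cleaves only the sca allele and cleavage goes to completion).
EXPECTED_CLEAVED_FRACTION = {
    "wt/wt": 0.0,
    "wt/sca (carrier)": 0.5,
    "sca/sca": 1.0,
}

def cleaved_fraction(uncut_intensity, cut_intensity):
    """Fraction of total band intensity found in the cleavage products."""
    total = uncut_intensity + cut_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return cut_intensity / total

def call_genotype(fraction):
    """Pick the genotype whose idealized cleaved fraction is nearest to the observed one."""
    return min(EXPECTED_CLEAVED_FRACTION,
               key=lambda g: abs(EXPECTED_CLEAVED_FRACTION[g] - fraction))

# Hypothetical band intensities in arbitrary units.
observed = cleaved_fraction(uncut_intensity=48.0, cut_intensity=52.0)
genotype = call_genotype(observed)
```

the nearest-value rule is the simplest possible classifier; it is used here only to make the three expected readouts concrete.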
remarkably, the feluda results perfectly matched with the sequencing data from the same samples performed in a different laboratory (csir center for cellular and molecular biology, chandak lab) and thus identified all three genotypes with % accuracy ( figure b ). although several pathogenic snvs are located close to a ngg pam and can be accurately targeted by feluda, these form only a small subset of the total number of disease-causing snvs catalogued in clinvar database (supplementary figure ) . to detect the non-pam proximal snvs we designed an in-built pam site in the amplification step of feluda. we tested this approach using snvs (a g and a g) present in helicobacter pylori s rrna gene and which do not have a ngg pam in the vicinity. these snvs confer variable clarithromycin resistance in patients with gastric ulcers and clinically pose a serious concern for physicians . we validated that pam-mer based amplification can be successfully used for fncas based ivc by targeting synthetic dna sequence containing the h. pylori s rrna gene ( figure c ). we then isolated bacteria dna from patient gut biopsy samples and successfully distinguished the antibiotic resistance genotype from another closely matched synthetic wild type sequence ( figure c ). importantly, this procedure takes a few hours from obtaining sample to diagnosing the variant, a significant improvement over existing regimens which rely on antimicrobial susceptibility tests that can take several days to complete . since our design incorporates the need for fixing the snv at either nd or th position proximal to pam, this limits the possibility of discriminating substrates using a single feluda reaction. to overcome this, we explored single mismatches in the sgrna sequence that might lead to abrogation of cleavage or binding from the substrate particularly at positions - at pam distal end, since these showed minimum cleavage from our previous study. 
remarkably, we found that fncas shows negligible cleavage at each of these positions (supplementary figure a ). in particular, mismatch at pam distal th base shows complete absence of cleavage and negligible binding affinity to mismatched substrate ( figure d ). to confirm this strategy, we targeted the snv rs which is associated with either g or c or g/c (heterozygous) variants in different individuals (figure b). its ease of design and implementation, as exemplified by its urgent deployment during the covid- health crisis, offers immense possibilities for rapid and wide-spread testing that has so far proven to be successful in slowing the progression of the disease in multiple countries. the sequences for the hbb (wt and sca), emx and vegfa site were pcr similarly, a bp region flanking two snvs (a g and a g) in helicobacter pylori s rrna gene was ordered as synthetic dna cloned in puc by ecorv. the cloned sequences were confirmed by sanger sequencing. all the plasmid constructs used were adapted from our previous study. in vitro transcription for sgrnas/crrnas was done using megascript t transcription kit (thermo fisher scientific) with a t promoter containing template as substrate. ivt reactions were incubated overnight at ℃ followed by dnase treatment as per kit instructions and then purified by nucaway spin column (thermo fisher scientific). ivt sgrnas/crrnas were stored at - ℃ until further use. human genomic dna was extracted from the blood using the wizard genomic dna purification kit (promega) as per the instructions. for genomic dna extraction from saliva, ml of saliva was centrifuged at rpm followed by three washes with ml of x pbs. after washing, the pellet was lysed with µl of . % triton x at °c for minutes. the lysate was then centrifuged again at rpm and the supernatant was transferred into a fresh vial. a total volume of µl of the supernatant was used in pcr reaction or otherwise stored at - °c.
genomic dna was extracted from the biopsy samples. rpa reaction was set up as recommended in the twistamp ® basic kit. - ng of genomic dna was used with normal or biotinylated primers and the reaction was performed at ℃ for minutes. amplicons were then purified with qiagen pcr clean up kit and visualized on agarose gel. dna samples were directly pcr amplified for the target region while rna samples were converted to cdna using reverse transcriptase kit (qiagen) and pcr amplified. in vitro cleavage assay was performed as optimized in our previous study. chimeric grna (crrna:tracrrna) was prepared by mixing in-vitro transcribed crrna and synthetic '-fam labelled tracrrna in an equimolar ratio using annealing buffer ( mm nacl, mm tris-hcl, ph . and mm mgcl ); the mix was heated at c for - minutes and then allowed to cool at room temperature for - minutes. chimeric grna-dead fncas rnp complex ( nm) was prepared by mixing them in a buffer ( mm hepes, ph . , mm kcl, mm dtt, % glycerol, mm mgcl ) and incubated for min at rt. rpa or pcr amplified biotinylated amplicons were then incubated with the rnp complexes for minutes at °c. dipstick buffer was added along with the milenia hybridetect paper strip (twistdx) as per instructions, producing dark colored bands over the strip within - min. fncas -sgrna complex ( nm) was prepared by mixing them in a buffer ( mm hepes, ph . , mm kcl, mm dtt, % glycerol, mm mgcl ) and incubated for min at rt. the reconstituted rnp complexes along with pcr amplified dna amplicons were then used for ivc assays at different temperatures ranging from °c to °c for minutes. the reaction was inhibited using µl of proteinase k (ambion). after removing residual rna by rnase a (purelink), the cleaved products were visualized on agarose gel. fncas -sgrna complex ( nm) was prepared by mixing them in a buffer ( mm hepes, ph . , mm kcl, mm dtt, % glycerol, mm mgcl ) and incubated for min at rt.
linearized plasmid used here as a substrate was incubated with reconstituted rnp complexes at different time points starting from h to h with or without % sucrose in the reaction buffer respectively. further, cleaved products were visualized on agarose gel. for the binding experiment, dfncas -gfp protein was used along with page purified respective ivt sgrnas. notably, ivt sgrnas were purified by % urea- with rnp complex at c for min in reaction buffer. nanotemper standard treated capillaries were used for loading the sample. measurements were performed at °c using % led power in blue filter ( - nm excitation wavelength; - nm emission wavelength) and % mst power. all experiments were repeated at least three times for each measurement. all data analyses were done using nanotemper analysis software. the graphs were plotted using originpro . software. the sequencing reaction was carried out using big dye terminator v . cycle sequencing kit (abi, ) in μl volume (containing . μl purified dna, . μl sequencing reaction mix, μl x dilution buffer and . μl forward/ reverse primer) with the following cycling conditions - mins at °c, cycles of ( sec at °c, sec at °c, mins at °c) and mins at °c. subsequently, the pcr product was purified by mixing with μl of mm edta (ph . ) and incubating at rt for mins. μl of absolute ethanol and μl of m naoac (ph . ) were then added, incubated at rt for mins and centrifuged at rpm for mins, followed by invert spin at < rpm to discard the supernatant. the pellet was washed twice with μl of % ethanol at rpm for mins and supernatant was discarded by invert spin. the pellet was air dried, dissolved in μl of hi-di formamide (thermo fisher, ), denatured at °c for mins followed by snapchill, and linked to abi xl sequencer. base calling was carried out using sequencing analysis software (v . . ) (abi, us) and sequence was analyzed using chromas v . . (technelysium, australia). 
clinvar dataset (version: ) was used to extract disease variation spectrum that can be targeted by feluda , jatayu (jatayu.igib.res.in) is a web tool which enables users to design sgrnas and primer for the detection of variations. users need to provide a valid genomic sequence with position and type of variation. jatayu front-end has been created using bootstrap and jquery. in the back-end, python-based flask framework has been used with genome analysis tools bwa (burrows-wheeler aligner) and bedtools - . all oligos used in the study are listed in table . we thank all members of chakraborty and maiti labs for helpful discussions and valuable insights. we are grateful to mitali mukerji, rajesh pandey and mohd. faruq the present study was approved by the ethics committee, institute of genomics and step : users need to provide mutation information such as position and type of mutation. step : confirmation of the mutation information. step : design of sgrna and primers for the given input sequence and mutation. only one fluorescent rnp will bind to the target while the other rnps will dissociate. the process can be iteratively performed to identify each nucleobase in a sequence.
key: cord- -m ahicqb authors: romano, alessandra; casazza, marco; gonella, francesco title: energy dynamics for systemic configurations of virus-host co-evolution date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: m ahicqb virus cause multiple outbreaks, for which comprehensive tailored therapeutic strategies are still missing. virus and host cell dynamics are strictly connected, and convey in virion assembly to ensure virus spread in the body. study of the systemic behavior of virus-host interaction at the single-cell level is a scientific challenge, considering the difficulties of using experimental approaches and the limited knowledge of the behavior of emerging novel virus as a collectivity.
this work focuses on positive-sense, single-stranded rna viruses, like human coronaviruses, in their virus-individual host interaction, studying the changes induced in the host cell bioenergetics. a systems-thinking representation, based on stock-flow diagramming of virus-host interaction at the cellular level, is used here for the first time to simulate the system energy dynamics. we found that reducing the energy flow which fuels virion assembly is the most affordable strategy to limit the virus spread, but its efficacy is mitigated by the simultaneous inhibition of other flows relevant for the system. summary: positive-single-strand ribonucleic acid ((+)ssrna) viruses can cause multiple outbreaks, for which comprehensive tailored therapeutic strategies are still missing. virus and host cell dynamics are strictly connected, generating complex dynamics that converge in virion assembly to ensure virus spread in the body. this work focuses on (+)ssrna viruses in their virus-individual host interaction, studying the changes induced in the host cell bioenergetics. a systems-thinking representation, based on stock-flow diagramming of virus-host interaction at the cellular level, is used here for the first time to simulate the energy dynamics of the system. by means of a computational simulator based on the systemic diagramming, we identified host protein recycling and folded-protein synthesis as possible new leverage points. these also point to different strategies depending on the timing of the therapeutic procedures. reducing the energy flow which fuels virion assembly is identified as the most affordable strategy to limit the virus spread, but its efficacy is mitigated by the simultaneous inhibition of other flows relevant for the system. counterintuitively, targeting rna replication or virion budding does not give rise to relevant systemic effects, and can possibly contribute to further virus spread.
the tested combinations of multiple systemic targets are less efficient in minimizing the stock of virions than targeting only the virion assembly process, due to the systemic configuration and its evolution over time. viral load and early addressing (in the first two days from infection) of leverage points are the most effective strategies on stock dynamics to minimize virion assembly and preserve host-cell bioenergetics. as a whole, our work points out the need for a systemic approach to design effective therapeutic strategies that should take into account the dynamic evolution of the system.
interaction between a (+)ssrna virus and the host cell and therefore addressing effective intervention strategies.
starting from the knowledge of relevant processes in (+ss)rna virus replication, transcription, translation, virions budding and shedding and their energy costs (reported in supplementary methods table ) , we built up a systems-thinking (st) based energy diagram of the virus-host interaction. figure shows the stock-flow diagram for the system at issue, where each stock was quantified in terms of embedded energy of the corresponding variable. symbols are borrowed from the energy language , : shields indicate the stocks, big solid arrows the processes and line arrows the flows, whereas dashed lines indicate the controls exerted by the stocks on the processes. all stocks, flows and processes are expressed in terms of the energy embedded, transmitted and used during the infection. we used atp-equivalents (atp-eq) as energy unit referred to cellular costs, by using the number of atp (or gtp and other atp-equivalents) hydrolysis events as a proxy for energetic cost , . the dynamic determination of flows was based on the knowledge of characteristic time-scales of well-established biological processes (for more details see supplementary methods). the output flows j and j were set to be effective only if the value of the respective stocks q and q is higher than a threshold, as represented by the two switch symbols in the diagram. the stock q represents the embedded energy of resources addressed to protein synthesis in the host cell. the dynamics of allocation for protein synthesis depends on the cell bioenergetics, e.g., the number of mitochondria, ox-phos activity levels, and the cell cycle phase [ ] [ ] [ ] (for more details, see supplementary methods). in the absence of virus, energy flows from q (flows j , j a and j b) to produce short-half-life proteins (stock q a) and long-half-life proteins (stock q b), whose synthesis, degradation and secretion follow dynamic stationary conditions. 
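the threshold switch on the output flows described above can be sketched in a few lines of python. the linear dependence of the flow on the stock, the function name and the numerical values are assumptions made for illustration; the actual functional forms and coefficients are those given in the supplementary methods.

```python
def gated_outflow(stock, k, threshold):
    """Outflow from a stock that is switched on only above a threshold.

    Assumes, for illustration, a flow proportional to the stock (k * stock)
    once the stock exceeds `threshold`, and no flow otherwise.
    """
    return k * stock if stock > threshold else 0.0


if __name__ == "__main__":
    # Hypothetical values in arbitrary ATP-equivalent units.
    for stock in (5.0, 10.0, 20.0):
        print(stock, gated_outflow(stock, k=0.5, threshold=10.0))
```

a switch of this kind makes the flow discontinuous at the threshold, which is one reason a sufficiently short time-step matters when the equations are integrated numerically.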
in particular, typical of the specialized cells such as those of the pulmonary epithelium, there is a flow of proteins destined for degradation and recycling through the recruitment of autophagic receptors or organelles or proteasomes (identified in the diagram by the flows j a and j b, supplementary methods table ). the outflow of folded, fully functional proteins addressed to secretion or surface exposure is first, we investigated the system dynamics under different initial conditions, exploring the possible role of different initial viral loads (figure ). in the configuration of initial null viral load (q stock value= ) the value of stocks q , q a and q b were constant and the system behavior was stationary (figure a) . assuming a different viral loads (time zero) in the - , virions range, we found that there is a threshold in the initial viral load for triggering the progressive reduction of q , whose amount could in turn trigger the cell death in different ways. apoptosis is a cellular process requiring energy, and a deflection in q , as shown in figure when a (+)ssrna virus enters the single host cell, the q stock is fed, and its proteins can interact with the host proteome to sustain rna replication. based on previous works in the field - , we identified a time delay of to hours required to record changes in the q stock. moreover, the value of q varies over the time due to changes occurred at a different timepoints in the stocks q b, q and q . to be effective, a therapeutic strategy should limit the outflows of virions, j and/or j . however, minimization of j seems to be counter-effective in our simulation, due to the increase of q as consequence of the feedback action in the virus replication (vr) process. the minimization of j prevents the outflow (viral shedding) from q without stopping its growth. 
at the same time, this leads to increased resources diverting from q and q b that could promote the cell death, with consequent spread of the virions in the environment. arising from targeting j was mitigated by the combination with reduction of j or j (figure ) . the partial reduction of j , alone or in combination, did not change significantly the dynamic growth of q , induced increase of q at day , though it could prevent a reduction of q , to preserve the cell bioenergetics and preventing the cell death by atp lack. we also simulated the effect of applying the same external inputs at different times: after (extended data figure ), (extended data figure ) or days (extended data figure ) from the initial infection, alone or in combination (figure ) . application of external forcing factors at day , day or day could affect in a non-linear way the system response, with the largest efficacy targeting j (figures - ) . the combination of external driving forces acting on j and j did not result to be significantly synergic, while targeting j and j increased the q stock value instead of the expected reduction. it is worth stressing that this kind of behavior is a typical systemic feature, where an intervention on a specific local process may lead to counterintuitive rearrangements in the system dynamics as a whole. late suppression of j and j at day could reduce only the stock values of q . only early full reduction of j (by day ) could significantly limit the growth of q , while the combination of contemporary suppression of j and j could not prevent q growth (figure ) . thus, application of external driving forces at different timepoints is expected to model additional resilience configurations. 
in this work, we approached the host-virus interaction dynamics as a systemic problem and, for the first time in the field, we used combined systems thinking tools as a conceptual framework to build up a systemic description of the viral action and host response, critically depending on the existing metabolic environment. the complex dynamic behavior was described in terms of underlying accessible patterns, hierarchical feedback loops, self-organization, and sensitive dependence on external parameters, that were analytically computed by a simulator. the following novelties were addressed: i) the use of energy language as a common quantitative unit for different biological cells and used mass spectrometry to measure protein-protein interactions . with this experimental approach, they identified interactions between viral and host proteins, and noted existing drugs, known to target host proteins or associated pathways, which interact with sars-cov- , addressing the importance to target the host-virus interaction at the level of rna translation. there are known advantages of in silico modelling the action of therapeutic agents on known diseases through agent-based modelling . however, the literature evidenced some intrinsic limitations on the choice of parameters, like the size of investigated populations , while major problems are related to model validation , , also requiring to supplement the models with adequate formal ontologies . thanks to its abstract nature, stock-flow description can be used in a wide range of different fields, realizing the conceptual bridge that connects the language of biological systems to that of ecology. 
our approach unveils the potential of systems thinking for the study of other diseases or classes of disease, since it appears more and more clear how some incurable pathologies can be described only by adopting a more comprehensive systemic approach, in which the network of relationships between biological elements is treated in a quantitative way similar to that applied in this paper. the method used to study the dynamics of the virus-host interaction system is structured in basic steps, namely, the development of a flow-stock diagram that describes the virus-host interactions, the development of a virus-host systemic simulator, and, finally, its calibration and validation (extended data, figure ). a typical systems thinking diagram is formed of stocks, flows and processes. stocks are countable extensive variables qi, i= , ,...,n, relevant to the study at issue, that constitute an n-tuple of numbers that at any time represents a state of the system. a stock may change its value only through its inflows and/or its outflows, represented by arrows entering or exiting the stock. processes are any occurrence capable of altering, either quantitatively or qualitatively, a flow, by the action of one or more of the system elements. in a stationary state of the system, stock values are either constant or regularly oscillating. in the dynamics of a system, stocks act as shock absorbers, buffers, and time-delayers. processes are everything that happens inside a system that allows the stationarity of its state, or that may perturb the state itself. to occur, a process must be activated by another driver, acting on the flow where the process is located. these interaction flows may be regarded as flows of information, that control the occurring processes and so the value and nature of the matter flows. the pattern of the feedbacks acting in the system configurations is the feature that ultimately defines the systems dynamics.
we adopted an energy approach, where stocks, flows and processes are expressed in terms of the energy embedded, transmitted and used, respectively, during the system operation. the equations that characterize the flows relevant to the diagram are typical of dynamic st analysis , , and their setting up is linked in many respects to the energy network language . in this approach, each flow depends on the state variables qi by relationships of the kind dqi/dt=kf(qj), i,j= ,...,n, where n is the number of stocks in the system. given a set of proper initial conditions for the stocks (i.e., the initial system state) and a properly chosen set of phenomenological coefficients k, the set of interconnected equations is solved by a standard finite-difference method, taking care to choose a time-step short enough to resolve the dynamics of any of the studied processes. the coefficients ki are calculated using data on the dynamics of any single stock, in particular, by estimating the flows and the stocks during the time interval set for the simulation steps (as described in supplementary methods). when different flows co-participate in a process, a single coefficient gathers all the actions that contribute to the intensity of the outgoing flow(s). these phenomenological coefficients are not, therefore, related to specific biophysical phenomena or processes, but are set to describe how and how fast any part of the system reacts to a change in any of the other ones. in general, our model may be then regarded as based on a population-level model (plm) as defined by , which can be run at different scales from organism to sub-cellular ones (see for example , , ), that already increased the existing knowledge on the life cycle of different infections, like for example the ones generated by hiv and hepatitis c virus . to obtain the simulations, we used the open-source computation software scilab (https://www.scilab.org).
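the finite-difference treatment of the stock equations described above can be illustrated with a minimal explicit-euler sketch. it is written in python rather than scilab, and the two-stock system, its linear flows and the coefficient values are hypothetical stand-ins chosen only to show the mechanics; they are not the diagram or the coefficients used in this study.

```python
def simulate(q0, net_flows, dt, n_steps):
    """Explicit Euler integration of dq_i/dt = net_flows(q)[i].

    q0: initial stock values; net_flows: function mapping the list of stocks
    to the list of their net rates of change; dt: time-step.
    Returns the list of states, including the initial one.
    """
    q = list(q0)
    history = [tuple(q)]
    for _ in range(n_steps):
        dq = net_flows(q)
        q = [qi + dt * dqi for qi, dqi in zip(q, dq)]
        history.append(tuple(q))
    return history


def two_stock_flows(q, k_in=1.0, k_transfer=0.5, k_out=0.5):
    """Hypothetical chain: constant inflow -> stock 1 -> stock 2 -> outflow."""
    q1, q2 = q
    j_in = k_in               # constant source feeding stock 1
    j_12 = k_transfer * q1    # transfer flow controlled by stock 1
    j_out = k_out * q2        # outflow controlled by stock 2
    return [j_in - j_12, j_12 - j_out]


if __name__ == "__main__":
    states = simulate([0.0, 0.0], two_stock_flows, dt=0.01, n_steps=5000)
    print(states[-1])  # approaches the stationary state of this toy system
```

for this toy system the stationary state is reached when inflow and outflow balance for each stock; halving the time-step and checking that the trajectory does not change appreciably is a simple way to confirm that the step is short enough.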
the model simulations are developed based on the choice of the initial stock conditions, as well as of the parameters ki (see supplementary information for further details). a model is valid as far as its output reproduces the reality, and systems modelling must be tested under two aspects, concerned with the capability to correctly describe the system and its dynamics, respectively. extended data figure : workflow and study design. extended data figure : effects of targeting leverage points upon the application of generic external driving forces d at day
competing interests. authors declare no competing financial interests. the author to whom correspondence and material requests should be addressed is: dr. marco casazza
key: cord- -d uu e authors: hass, kenneth n.; bao, mengdi; he, qian; park, myeongkee; qin, peiwu; du, ke title: integrated micropillar polydimethylsiloxane accurate crispr detection (impact) system for rapid viral dna sensing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d uu e a fully integrated micropillar polydimethylsiloxane accurate crispr detection (impact) system is developed for viral dna detection. this powerful system is patterned with high-aspect ratio micropillars to enhance reporter probe binding. after surface modification and probe immobilization, crispr cas a/crrna complex is injected into the fully enclosed system. with the presence of double-stranded dna target, the crispr enzyme is activated and non-specifically cleaves the ssdna reporters initially immobilized on the micropillars.
this collateral cleavage releases fluorescent dye into the assay, and the intensity is linearly proportional to the target dna concentration over the range from . to nm. importantly, this system does not rely on traditional dye-quencher labeled probes, thus eliminating the fluorescence background present in the assay. furthermore, our one-step detection protocol is performed under isothermal conditions ( °c) without complicated and time-consuming off-chip probe hybridization and denaturation. this miniaturized and fully packed impact chip demonstrates rapid, sensitive, and simple nucleic acid detection and is an ideal candidate for next-generation molecular diagnostic platforms for point-of-care (poc) applications, responding to emerging and deadly pathogen outbreaks.
the widespread impact of the current coronavirus (covid- ) is a striking indicator that the global community is struggling to battle infectious diseases. failure to contain the virus early has resulted in an outbreak that has infected over , people with over , deaths. the usa cdc predicts that a nationwide epidemic is unavoidable. the current situation highlights an urgent need for access to real-time detection that is accurate and effective in identifying those infected so they can be properly quarantined and treated. an effective vaccine could be the best solution to contain epidemics. however, vaccines take a very long time to develop, as evidenced by the african swine fever virus (asfv), which was initially discovered in but only had a preliminary vaccine announced in january of this year. it took over an additional month to show its effectiveness in laboratory tests, and it still needs to be proven effective in the field. currently, there are no proven treatments for either covid- or asfv. even if one were to be developed soon, it can take years to prove its effectiveness in clinical trials and mass produce the vaccine, not to mention distribute and administer it in affected areas.
to contain and prevent the spread of these contagious outbreaks, a rapid point-of-care (poc) testing device is essential. current methods can take up to two days of titering in a centralized laboratory to diagnose whether a sample contains the disease, which is too slow to be viable when trying to isolate those infected and prevent them from spreading the disease to others. in addition, the collected patient samples need to be sent to a centralized laboratory, leading to very long turnaround times and greatly limiting the number of people who can be tested and confirmed. strides have been made to develop poc detection methods that can be deployed in the field, and lab-on-chip (loc) devices or simple test kits based on real-time polymerase chain reaction (pcr) have emerged as one of the leading choices for meeting the desired criteria. pcr is particularly sought after due to its ability to amplify viral rna/dna from a few copies to billions. it has also been shown to work with clinical samples in loc devices with low volumes of both reagents and patient samples. however, there have been many issues with implementing pcr test kits, such as shortages of test kits and of trained professionals to use them, as well as potentially inaccurate results from the kits. clustered regularly interspaced short palindromic repeats (crispr) provides an alternative to pcr amplification techniques for the detection of viral rna/dna. certain cas proteins, such as crispr-cas a, have been shown to be powerful in biological detection due to their ability to indiscriminately cut single-stranded dna (ssdna) after they are activated by a target dna. this is extremely useful when paired with "reporter probes" (ssdna strands with a fluorescent dye and quencher attached to them), as the crispr complex can cleave the reporter probe and release the dye for fluorescence quantification.
however, one of the issues with using the crispr complex is the high fluorescence background signal in the sample, as the quencher cannot fully quench the fluorescent dye. solid-phase detection assays have been developed as one potential solution to this liquid-phase issue. on the solid surface, reporter probes do not require a quencher since they are only measured in the liquid phase after degradation, thus no fluorescent signal will be detected without the target dna present in the assay. however, an extended surface with a larger surface area is needed to increase the probe binding capacity. it has been shown that such extended surfaces can increase the amount of probe binding, lower the detection limit of the target of interest, and extend the detection dynamic range. here, we present a fully enclosed integrated micropillar polydimethylsiloxane accurate crispr detection (impact) system for nucleic acid target detection. the reporter probes were first immobilized in the enclosed channel and the crispr complex was pumped into the system for reaction (fig. a). leveraging the high activity of the crispr-cas a enzyme and the ability of micropillars to bind more reporter probes, we successfully detect double-stranded dna target without background issues. in addition, the impact chip requires low volumes of reagents, operates in a one-step detection fashion, and does not require complicated temperature control, making it ideal for poc applications. to prove the concept of the impact chip for viral dna detection, the target sequence selected for the crrna was synthesized after a segment of the asfv-sy genome (b l). in the future, our device could work with any rna or dna based pathogens, such as the emerging covid- . the impact chip (fig. b) consists of cm long channels patterned with polydimethylsiloxane (pdms) micropillars. fig.
c shows the sem image of the channel, revealing the periodically curved nature of the channel and the even spacing of the micropillars (diameter: µm; height: µm). after channel fabrication and sealing, surface treatment with aptes (( -aminopropyl)triethoxysilane) and glutaraldehyde was performed for streptavidin immobilization. after streptavidin coating, the static water contact angle decreases from ~ degrees to ~ degrees for a flat pdms surface and from ~ degrees to degrees for the micropillar surface, respectively (fig. d). fig. e shows the entire sample preparation and detection protocol. dna reporter probes with a biotin label were conjugated on the streptavidin coated micropillars. crispr cas a complex was then introduced into the channel. in the presence of the asfv target dna, it cleaves the reporters from the micropillars for detection. the sequences of the dna target and the crrna are listed in fig. f. the target was selected based on the pam region in the asfv-sy genome. it has been shown that auto-fluorescence occurs from the chemical treatment of aptes and glutaraldehyde. in our case, we were able to confirm that the phenomenon was specific to glutaraldehyde and aptes (fig. si a), and to show that it only occurs on pdms in the presence of these chemicals (fig. si b). fig. g shows the uniformity of the chemical treatment on the surface, as the intensity of the auto-fluorescence (emission: nm) is uniform throughout. we then incubated the reporter probes (emission: nm) in the channel for hrs and showed uniform coating in the channel (fig. h).
after establishing a surface modification and streptavidin incubation strategy, we first determined the channel washing conditions. after incubating biotinylated photocleavable capture probes (/ pcbio/ttattcttattgtgtgaactgctccttc ttgactccacc/ -fam/) in the channel for hrs, the channel was washed with μl di water. then, the channel was evacuated, and the fluorescence intensity of the supernatant was immediately evaluated.
after the first wash, a high fluorescence peak was observed for all the samples (inset of fig. a), indicating that excess dna probes were washed from the channel. a second wash was performed to confirm that the unbound dna probes were completely removed from the channel. after the second wash, the collected supernatant barely shows any fluorescence peak. as shown in fig. a, the integrated signal ( to nm) for the first supernatant ranges from ~ , to , counts, regardless of surface treatment conditions (the spectrometer was saturated due to the high signal). on the other hand, the integrated signal for the second supernatant is only ~ counts, eliminating the background caused by weakly bound dna probes. after washing, the bound reporter probe was released by uv exposure and the results are presented in fig. b. the flat channel coated with streptavidin shows ~ fold more reporter probe binding than the uncoated surface. in addition, the micropillar channel coated with streptavidin shows the highest signal among the three substrates.
with an optimal washing condition in hand, we then studied the effect of incubation time on the binding of biotin labeled dna to the streptavidin coated pdms surface. after channel surface modification and streptavidin treatment, fluorescent reporter probe was introduced to bind on the solid surface. then, the washing protocol was performed, followed by uv light exposure to retrieve the reporter probe. as shown in fig. , for the streptavidin coated surface, the amount of dna immobilized on the surface does not change significantly with an incubation time between to min, as the integrated fluorescence intensity of the retrieved dna ranges from , to , counts. however, the binding capacity increases ~ % with hrs incubation when compared to min incubation. although the binding capacity further increases with hrs incubation, it requires refrigeration as the streptavidin protein could denature at room temperature.
on the other hand, the negative sample without streptavidin coating does not show a correlation to the incubation time, indicating less specific binding. moreover, the negative sample always shows much lower fluorescence intensity than the positive sample. for example, with hrs incubation, the retrieved dna from positive sample is ~ . times higher than the negative sample. it further indicates that surface modification can enhance the probe binding capacity. therefore, we used hrs incubation to prepare the impact chip for solid phase crispr detection. the extended surface provided by high-aspect ratio micropillars significantly increases the reporter probe binding capacity. to demonstrate this, we compared the number of captured dna molecules on the micropillar channel with the flat channel (fig. ) . with an input of x nmoles, the retrieved dna from the micropillar channel has an integrated intensity of ~ , counts, which is much higher than the flat surface (~ counts). with an input of x nmoles, the retrieved dna from the micropillar channel has an integrated intensity of ~ , counts and is ~ % higher than the flat channel. with an input of x nmoles, the micropillar channel still shows higher signal than the flat channel but is comparable to x nmoles sample, indicating that the channel is saturated with an input of ~ x - nmoles. thus, we used x nmoles reporter probes in our crispr detection as higher load could cause unnecessary background. we combined the dna probe modified micropillar channel with crispr cas a assay for solid-phase and background free viral dna sensing. we first mixed crispr cas a/crrna/target dna in an eppendorf tube and then injected the complex into the impact chip and allowed it to incubate for hrs. the activated complex diffuses in the microchannel and nonspecifically cleaves the reporter probes from the micropillars. the uncorrected emission curve and the integrated fluorescence signal of the crispr experiments are shown in fig. 
a and b, respectively. the measured fluorescence intensity increases linearly with the target concentration over the range . - nm (pearson's r= . ). on the other hand, the supernatant without any asfv target dna input does not present a fluorescence signal as the crispr cas a cannot be activated, demonstrating a fully enclosed and efficient microdevice for background-free viral dna quantification.
figure : released ssdna reporter probe integrated intensity for flat and micropillar channels with a dna input load of × , × , and × nmoles, respectively. uv photocleavage was used to retrieve the ssdna reporter probes. error bars represent %- % confidence intervals and middle line is mean.
we have shown that the pdms micropillars provide a significant increase in solid-state capture probe binding compared with the flat surface. the increased ssdna capture probe capacity can extend the detection limit and the dynamic range. in addition, the micropillar array increases the interaction between biomolecules and the substrate, as the surface area in contact with any given solution is significantly higher. to further increase the probe binding capacity, we can increase the aspect ratio of the pdms micropillars and the microchannel length. indeed, the channel was designed to have a periodic curvature throughout its length in order to show the ability to utilize a curved design with the chip, which in the future could snake back and forth across an entire sample, greatly increasing the overall surface area. the aspect ratio of the microstructures can be further increased by choosing more rigid materials. for example, an aspect ratio of : has been demonstrated on silicon by using deep reactive-ion etching. silicon would also have the advantage of potentially removing some of the air bubbles seen during use with the impact chip, which could ensure greater uniformity in coating of reporter probes.
alternatively, emerging additive manufacturing technology can also be used to create such ultra-high aspect ratio microstructures as an efficient target capture platform. one of the main advantages of the impact chip compared to traditional crispr assays is its ability to limit the background caused by dye-quencher probes, which is typically seen in liquid-phase crispr detection and must be designed around to lower the detection limit. our device, which uses solid-phase crispr, does not need a quencher tethered to the probe. the cleaved crispr products are sent to a separate reservoir for detection, thus completely avoiding the fluorescence background caused by traditional "one-pot" detection. as shown in fig. , without the presence of asfv target dna, no fluorescence background signal was detected, demonstrating this powerful background-free detection. this is an important improvement for molecular diagnostics, especially in low light settings in which the amount of background fluorescence present can significantly affect the detection limit. combining this advantage with an increased aspect ratio, on either a pdms or a silicon substrate, should further increase the accuracy and robustness of the device and improve the limit of detection. the impact chip offers a simple one-step detection strategy and does not require off-chip incubation. traditional nucleic acid based detection requires stringent multi-probe hybridization and washing and is thus slow and complicated. in this work, the crispr complex is quickly mixed with the target sample and then the crispr/crrna/target dna complex is injected into the channel for on-chip detection. the chip sits on a hotplate set at °c and, unlike pcr, does not need thermal cycling with temperature control. thus, the detection system we developed is much simpler and more compact, ideal for poc applications.
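the linear response reported above (integrated fluorescence versus target concentration, summarized by pearson's r) can be turned into a simple calibration curve. the following is a minimal sketch in python, not the authors' analysis code: the function names and the data points are hypothetical placeholders chosen only to illustrate the arithmetic, since the measured values are not recoverable from the text.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def concentration_from_signal(signal, slope, intercept):
    """Invert the calibration line to estimate a target concentration."""
    return (signal - intercept) / slope


# Hypothetical placeholder data (NOT values from the paper):
# target concentration in nM and integrated fluorescence in counts.
conc_nm = [0.0, 1.0, 2.0, 4.0, 8.0]
counts = [50.0, 1050.0, 2050.0, 4050.0, 8050.0]

slope, intercept, r = linear_fit(conc_nm, counts)
estimate = concentration_from_signal(3050.0, slope, intercept)
```

with a background-free assay the intercept of such a fit would be expected to sit close to the blank signal, which is one way the absence of dye-quencher background could be checked on real data.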
to improve the detection limit, our system can be integrated with isothermal amplification methods such as recombinase polymerase amplification for sensitive "one-pot" target amplification and crispr reaction, without the background issues seen in traditional "one-pot" methods. an advantage of the impact chip compared to other detection apparatuses such as the sherlock test strip is that it is a fully enclosed system with no need for extra packaging. this is advantageous as the treated micropillars are sealed before molecular diagnostics. the reporter probes are immobilized within the channel, avoiding degradation issues from outside contaminants. this is particularly important when dealing with rna targets, as rna is susceptible to degradation by rnase. our fully integrated chip, without special packaging needs, is able to avoid exposure to rnase present in air, in dust, and on human hands in poc settings. this impact chip concept can be extended to detect many biomarkers such as exosomes, single cells, and surface proteins. for example, the detection sensitivity of traditional immunoassay approaches is limited due to the low binding capacity. magnetic beads have been widely used to increase the binding affinity. however, it is difficult to incorporate beads into the microfluidic channel as they easily settle and get stuck in the channel. in vivo detection is also challenging as the beads emit strong auto-fluorescence. the micropillar arrays that we developed are patterned in the chip and can avoid those problems. in the future, on-chip treatment of blood infections could be achieved by immobilizing proper capture probes onto the impact chip.
materials and methods
device fabrication: to create the impact microfluidic chip, a mold was created using standard photolithography on a mm silicon wafer. the silicon wafer was first dehydrated by baking it at °c for min on a hot plate.
after cooling to room temperature, su- (microchem) was spin-coated onto the silicon wafer with a thickness of µm. the wafer was then left to sit for min on a level surface to allow for reflow, and then soft baked on a hotplate at °c for min, followed by min at °c. after that, the wafer was exposed using a karl suss ma mask aligner for - s. it was then allowed to sit for min, before receiving a post exposure bake at °c for min. the sample was developed for ~ min in su- developer (microchem) and then sprayed down with ipa and dried. a final hard bake at °c was done on a hotplate for min. after the hard bake, the wafer was silanized overnight using silanization solution i (~ % dimethyldichlorosilane in heptane) in a desiccator. the pdms channel was created by pouring a : ratio of sylgard silicone elastomer base to sylgard silicone curing agent over the su- molds and allowing it to cure. the pdms was then peeled from the su- mold, and holes (diameter of mm) were punched at both ends of the channels to allow for flow of reagents through the channel. the channels were then cleaned in an ultrasonic bath for min using ethanol, dried, and then cleaned again in the same manner using deionized water. the pdms slab was then bonded to a pre-cleaned glass substrate by treating both with oxygen plasma (electro-technic products) for min, followed by pressing the substrate and pdms together. the device was immediately baked on a hotplate overnight at ~ °c.
surface modification: the device was filled with % aptes (sigma aldrich, ( -aminopropyl)triethoxysilane) in ethanol (bvv, lab grade ethanol proof usa made). a total of µl of the solution was flowed into the channels at a flow rate of µl/min using a syringe pump (wpi, sp ). the aptes solution was then left to incubate in the channel for min and washed with % ethanol.
after washing, the remaining liquid in the channel was drained by a syringe filled with air, and then with a compressed air canister (dust off, electronics compressed-gas duster) to dry the channel. the channel was then baked on a hotplate at °c for min. glutaraldehyde (ad) solution ( %) in di water was next flowed in to fill the channel in the same manner as described above, but was allowed to incubate for a full hour after the µl of solution had passed through. the channel was then flushed with di water and dried with air.
streptavidin immobilization: after surface treatment, each channel was injected with µl of streptavidin (thermofisher scientific, streptavidin s ) at a concentration of mg/ml using a ga microliter syringe (hamilton). since each channel has a volume of ~ µl, any excess solution was allowed to exit to the outlet. the solution was left in the channel for hrs to incubate.
surface characterization: a goniometer system (model , ramé-hart) was used to characterize the chemically treated pdms surface. a sessile droplet (∼ µl) of deionized water was placed on the pdms surface with an automated dispensing system (part no. - ). the static water contact angle was immediately measured by using the dropimage matrix software, provided by ramé-hart.
reporter probe immobilization: in the same manner as the streptavidin, µl probe solution in pbs buffer (gibco tm, ph . ) was injected into the channel with a microliter syringe and allowed to incubate for hrs, regardless of whether a photocleavable linker was attached to the dna bases.
ultraviolet (uv) cleavage: reporter probe with a uv cleavable linker was cleaved by using a uv light (wavelength: nm). briefly, the sample was placed under a uv light with the glass side of the channel facing the light, at a distance of ~ cm from the uv lamp to the glass sealing the channel. the light was then turned on and the sample was exposed to the uv light for min.
finally, the released reporters were retrieved by injecting µl of di water into the channel.
solid-phase crispr cas a detection: lbcas a (new england biolabs, inc.) with a concentration of nm was pre-assembled with . nm crrna (idt, inc.) at room temperature for min. then, lbcas a-crrna complexes were mixed with x binding buffer and . µl nuclease free water (idt, inc.) to reach a µl reaction volume. after adding asfv target dna at various concentrations to the mixture, the detection assay was activated and immediately injected into the impact chip. the device was incubated on a hotplate at °c for hrs to allow for optimized reaction. finally, the cleaved product was retrieved from the channel and evaluated by a custom designed fluorometer.
fluorescence quantification: to quantify the fluorescence intensity of the ssdna reporter probes, a custom designed fluorometer, which has been reported before, was used. briefly, a continuous wave laser with an emission peak at nm (sapphire lp, coherent) was aligned under a reservoir filled with fluorescent molecules. the fluorescence signal was collected by an off-axis parabolic mirror and a fiber coupled mini usb spectrometer (usb +, ocean optics). to reduce background noise from the excitation light, a nm notch filter (thorlabs, inc.) was placed in front of the optical fiber.
fluorescence images of the surface modified with aptes-glutaraldehyde (positive control) and aptes-valeraldehyde (negative control). negative control images of figure g and h which did not receive any surface modification.
cases: live updates coronavirus update (live): , cases and deaths from covid- wuhan china virus outbreak officials warn of coronavirus outbreaks in the u.s.
the new york times we need a cheap way to diagnose coronavirus gene results in sterile immunity against the current epidemic eurasia strain a seven-gene-deleted african swine fever virus is safe and effective as a live attenuated vaccine in pigs potential rapid diagnostics, vaccine and therapeutics for novel coronavirus ( -ncov): a systematic review transmission of sars and mers coronaviruses and influenza virus in healthcare settings: the possible role of dry surface contamination point-of-care testing for infectious diseases: diversity, complexity, and barriers in low-and middle-income countries evolving status of the novel coronavirus infection: proposal of conventional serologic assays for disease diagnosis and infection monitoring development of a novel quantitative real-time pcr assay with lyophilized powder reagent to detect african swine fever virus in blood samples of domestic pigs in china continuous-flow, microfluidic, qrt-pcr system for rna virus detection clinical features of atypical novel coronavirus pneumonia with an initially negative rt-pcr assay crispr-cas a target binding unleashes indiscriminate single-stranded dnase activity plasma nanotextured pmma surfaces for protein arrays: increased protein binding and enhanced detection sensitivity high-throughput and all-solution phase african swine fever virus (asfv) detection using crispr-cas a and fluorescence based point-of-care system real-time reliable determination of binding kinetics of dna hybridization using a multi-channel graphene biosensor fabrication of ultra-deep high-aspect-ratio isolation trench without void and its application
spatial decoupling of light absorption and catalytic activity of ni-mo-loaded high-aspect-ratio silicon microwire photocathodes d-printed cactus-inspired spine structures for highly efficient water collection rna-based fluorescent biosensors for live cell imaging of small molecules and rnas background suppression in fluorescence nanoscopy with stimulated emission double depletion universal dynamic dna assembly-programmed surface hybridization effect for single-step, reusable, and amplified electrochemical nucleic acid biosensing a ratiometric electrochemical biosensor for the exosomal micrornas detection based on bipedal dna walkers propelled by locked nucleic acid modified toehold mediate strand displacement reaction ultrasensitive detection of target dna using pcr selective endpoint visualized detection of vibrio parahaemolyticus with crispr/cas a assisted pcr using thermal cycler for on-site application cas avdet: a crispr/cas a-based platform for rapid and visual nucleic acid detection nucleic acid detection with crispr nucleases test for detection of circulating nuclei acids ultrasensitive microfluidic analysis of circulating exosomes using a nanostructured graphene oxide/polydopamine coating enhanced immunoadsorption on imprinted polymeric microstructures with nanoengineered surface topography for lateral flow immunoassay systems on-chip signal amplification of magnetic bead-based immunoassay by aviating magnetic bead chains photobleaching: improving the sensitivity of fluorescence-based immunoassays by photobleaching the autofluorescence of magnetic beads (small / ) a novel pathogen capturing device for removal and detection rapid and fully microfluidic ebola virus detection with crispr-cas a
key: cord- -da zn x authors: almodaresi, fatemeh; zakeri, mohsen; patro, rob title: puffaligner: an efficient and accurate aligner based on the pufferfish index date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: da zn x
motivation: sequence alignment is one of the first steps in many modern genomic analyses, such as variant detection, transcript abundance estimation and metagenomic profiling. unfortunately, it is often a computationally expensive procedure. as the quantity of data and wealth of different assays and applications continue to grow, the need for accurate and fast alignment tools persists.
results: in this paper, we introduce puffaligner, a fast, accurate and versatile aligner built on top of the pufferfish index. puffaligner is able to produce highly-sensitive alignments, similar to those of bowtie , but much more quickly. while exhibiting similar speed to the ultrafast star aligner, puffaligner requires considerably less memory to construct its index and align reads. puffaligner strikes a desirable balance with respect to the time, space, and accuracy tradeoffs made by different alignment tools, and provides a promising foundation on which to test new alignment ideas over large collections of sequences.
availability: puffaligner is free and open-source software. it is implemented in c++ and can be obtained from https://github.com/combine-lab/pufferfish/tree/cigar-strings
since its introduction, next generation sequencing (ngs) has been widely used as a low-cost and accessible technology to produce high-throughput sequencing reads for many important biological assays. the sequencing data are generated in the form of short reads drawn from longer molecular fragments, and finding the optimal alignments of these short reads to some reference is a necessary first step for many downstream biological analyses. the process of finding the segment on the reference that is most similar to the query read, and therefore most likely to be the source of the fragment from which the read was drawn, is known as read mapping or read alignment.
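to make "the segment on the reference that is most similar to the query read" concrete, similarity can be measured by edit distance. the sketch below is a minimal illustration in python, assuming unit costs for mismatches, insertions and deletions; it is not puffaligner's algorithm (which relies on the pufferfish index and a scoring scheme not described here), and the function name `best_fit_alignment` is hypothetical. the read must be aligned end to end, while its start and end positions in the reference are free.

```python
def best_fit_alignment(read, reference):
    """Return (edit_distance, end_position) for the best placement of `read`
    inside `reference`. `end_position` is the exclusive end index in the
    reference; the leftmost end is returned when several placements tie."""
    m, n = len(read), len(reference)
    # Row 0: an empty read prefix can start anywhere in the reference for free.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        # Column 0: i read bases aligned against nothing cost i insertions.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if read[i - 1] == reference[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or mismatch
                prev[j] + 1,         # read base with no reference partner
                curr[j - 1] + 1,     # reference base skipped by the read
            )
        prev = curr
    # The read may end anywhere in the reference: take the minimum of the last row.
    best_end = min(range(n + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


exact = best_fit_alignment("ACGT", "TTACGTTT")
one_mismatch = best_fit_alignment("ACGT", "TTACCTTT")
```

this dynamic program takes time proportional to the product of the read and reference lengths, which is why practical aligners first use an index to find a small number of candidate locations and only then run a comparison of this kind.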
the main goal in read alignment is to find the contiguous sub-string of the underlying reference that yields a minimum edit distance (or maximum alignment score) between the read and the reference sequence at the alignment position. if the reads are paired-end, characteristics other than the alignment score can be used to filter spurious alignment locations, such as the orientation of each end of the alignment pair (forward or reverse) or the distance between the alignments corresponding to reads that are ends of the same fragment. short-read aligners are a major workhorse of modern genomics. given the importance of the alignment problem, a tremendous number of different tools have been developed to tackle it. some widely used examples are bwa , bowtie , hisat , and star . existing alignment tools use a variety of indexing methods. some tools, such as bwa, bowtie , and star use a full-text index over the reference sequences; bwa and bowtie use variants of the fm-index, while star uses a suffix array. a popular alternative to full-text indices is to instead index sub-strings of length k (k-mers) from the reference sequence. trading off index size for potential sensitivity, such indices can either index all of the k-mers present in the underlying reference, or some uniform or intelligently-chosen sampling of k-mers. there is a large variety of k-mer-based aligners, including tools like the subread aligner , shrimp , mrfast , and mrsfast . to reduce the index size, one can choose to select specific k-mers based on a winnowing (or minimizer) scheme. this approach has been particularly common in tools designed for long-read sequence alignment like mashmap and minimap . recently, a set of new indices for storing k-mers has been proposed based on graphs, specifically de bruijn graphs (dbg). 
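the two k-mer indexing strategies described above can be illustrated with a minimal python sketch. it is not the indexing code of any of the tools named in the text: the function names, the lexicographic ordering used to pick a minimizer, and the window parameter w are assumptions made only for illustration.

```python
from collections import defaultdict


def kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def minimizers(reference, k, w):
    """Sample k-mers: for each window of w consecutive k-mers keep the smallest one.

    "Smallest" is plain lexicographic order here (an illustrative assumption);
    ties are broken by the leftmost position.
    """
    kmers = [(reference[i:i + k], i) for i in range(len(reference) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        kmer, pos = min(kmers[start:start + w])
        selected.add((pos, kmer))
    return sorted(selected)
```

indexing every k-mer keeps all positions and so favors sensitivity, while the minimizer sample stores fewer entries at the cost of fewer potential seeds.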
a de bruijn graph is a graph over a set of distinct k-mers where each edge connects two neighboring k-mers that appear consecutively in a reference sequence and therefore overlap on "k − " bases. kallisto , debga , bgreat , browniealigner , and pufferfish are some tools which use an index constructed over the de bruijn graph built from the reference sequences. cortex , vari , rainbowfish , and mantis are also tools that use a colored compacted de bruijn graph for building their index over a set of raw experiments. all these approaches cover a wide range of the possible design space, and different design decisions yield different performance tradeoffs. generally, the fastest aligners (like star) have very large memory requirements for indexing, and make some sacrifices in sensitivity to obtain their speed. on the other hand, the most sensitive aligners (like bowtie ) have very moderate memory requirements, but obtain their sensitivity at the cost of a higher runtime. maintaining the balance between time and memory is especially critical when aligning to a large set of references, like a large collection of microbial and viral genomes which may be used as an index in microbiome or metagenomic studies. as both the collection of reference genomes and the amount of sequencing data grow quickly, it is important for alignment tools to achieve a time-space balance without losing sensitivity. based on the compact pufferfish index, we introduce a new aligner called puffaligner, that we believe strikes an interesting and useful balance in this design space. puffaligner is designed to be a highly-sensitive alignment tool while, simultaneously, placing a premium on low computational overhead. 
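as a minimal sketch of the graph definition given above, the following python function builds the node and edge sets of an (uncompacted, uncolored) de bruijn graph from a list of reference strings. the function name and the representation as python sets are assumptions for illustration; this is not the pufferfish data structure, which is compacted and far more space-efficient.

```python
def de_bruijn_graph(references, k):
    """Return (nodes, edges) of the de Bruijn graph over all reference strings.

    Nodes are the distinct k-mers; an edge (a, b) is added whenever k-mer b
    starts one base after k-mer a in some reference.
    """
    nodes = set()
    edges = set()
    for ref in references:
        kmers = [ref[i:i + k] for i in range(len(ref) - k + 1)]
        nodes.update(kmers)
        edges.update(zip(kmers, kmers[1:]))
    return nodes, edges
```

for example, the references "ACGTA" and "ACGTC" with k = 3 share the k-mers ACG and CGT, so the graph stores them once and branches only where the sequences diverge. this is the sense in which a de bruijn graph factors out repeated sub-sequences.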
by using the colored compacted de bruijn graph to factor out repeated sub-sequences in the reference, it is able to leverage the speed and cache friendliness of hash-table based aligners while still controlling the growth in the size of the index, especially in the context of redundant reference sequences. by carefully exploring the alignment challenges that arise in different assays, including single-organism dna-seq, rna-seq alignment to the transcriptome, and metagenomic sequencing, we have engineered a versatile tool that strikes a desirable balance between accuracy, memory requirements and speed. we compare puffaligner to some other popular aligners and show how it navigates these different tradeoffs. for measuring the performance of puffaligner and comparing it to other aligners, we have designed a series of experiments using both simulated and experimental data from different sequencing assays. we compare puffaligner with bowtie , star and debga . bowtie is a popular, sensitive and accurate aligner with the benefit of having very modest memory requirements. star requires a much larger amount of memory, but is much faster than bowtie and can also perform "spliced alignment" against a reference (which puffaligner, bowtie , and debga currently do not allow). debga is the tool most closely related to puffaligner conceptually, as it is an aligner with a colored compacted de bruijn graph-based index that is focused on exploiting redundancy in the reference sequence. we use different metrics to assess both the performance and accuracy of each method on a variety of types of sequencing samples. 
these experiments are designed to cover a variety of different use-cases for an aligner, spanning the gamut from situations where most alignments are expected to be unique (dna-seq), to situations where each fragment is expected to align to many loci with similar quality (rna-seq and metagenomic sequencing), and spanning the range of index sizes from small transcriptomes to large collections of genomes. first, we show that puffaligner exhibits accuracy similar to bowtie for aligning dna-seq reads, but is considerably faster. in the case of experimental reads, since the true origin of the read is unknown, we use measures such as mapping rate and concordance of alignments to compare the methods. furthermore, we evaluate the accuracy of aligners by aligning simulated dna-seq reads that include variation (single-nucleotide variants and small indels with respect to the reference). for aligning rna-seq reads, we compare the impact of alignments produced by each aligner on downstream analysis such as abundance estimation. finally, we show that puffaligner is very efficient for aligning metagenomic samples where there is a high degree of shared sequence among the reference genomes being indexed. we also illustrate that using alignments produced by puffaligner yields the highest accuracy for abundance estimation of metagenomic samples. the performance of each tool is impacted by the different alignment scoring schemes they use, e.g., different penalties for mismatches and indels. to enable a fair comparison, we attempted to configure the tools so as to minimize divergences that simply result from differences in the scoring schemes. for the experiments in this paper, we use bowtie in a near-default configuration (though ignoring quality values), and attempt to configure the other tools, as best as possible, to operate in a similar manner. 
the debga scoring scheme is not configurable, so we use this aligner in the default mode (unfortunately, the inability to disable local alignment and force computation of only end-to-end alignments in debga makes certain comparisons particularly difficult). for puffaligner we use a scheme as close to bowtie as possible. the maximum possible score for a valid alignment in bowtie is (in end-to-end mode) and each mismatch or gap subtracts from this score. bowtie uses an affine gap penalty scoring scheme, where opening and extending a gap (insertion or deletion) have a cost of and respectively. for dna-seq reads, we configure star to allow as many mismatches as bowtie and puffaligner by setting the options "--outfiltermismatchnoverreadlmax . " and "--outfiltermismatchnmax ". also, we use the option "--alignintronmax " in star to perform non-spliced alignments while aligning genomic reads. for rna-seq reads, star has a set of parameters which we change in our result evaluations, and which are detailed below in the relevant sections. in bowtie we also use the option --gbar to allow gaps anywhere on the read except within the first nucleotide (as the other tools have no constraints on where indels may occur). furthermore, for consistency, we also run bowtie with the option "--ignore-quals", since the other tools do not utilize base qualities when computing alignment scores. as explained in section . , for the sake of performance, highly repeated anchors (more than a user-defined limit) will be discarded before the alignment phase. this threshold is by default equal to in puffaligner. we set the threshold to the same value for star and debga using options --outfiltermultimapnmax and -n respectively. there is no such option exposed directly in bowtie . since puffaligner finds end-to-end alignments for the reads, we are also running other tools in end-to-end mode, which is the default alignment mode in bowtie as well. 
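the end-to-end scoring scheme with affine gap penalties described above can be sketched as follows. this is an illustration only: the default values of max_score, mismatch_penalty, gap_open and gap_extend are placeholders chosen for the example and are not taken from this text, and the convention that a gap of length n costs gap_open + n * gap_extend is likewise an assumption. the alignment is given as a list of (operation, length) pairs using the extended cigar operations =, x, i and d.

```python
def end_to_end_score(cigar, max_score=0, mismatch_penalty=6,
                     gap_open=5, gap_extend=3):
    """Score an end-to-end alignment under an affine gap penalty model.

    cigar: list of (op, length) with op in {"=", "X", "I", "D"}.
    All default penalty values are illustrative placeholders.
    """
    score = max_score
    for op, length in cigar:
        if op == "=":                      # matches leave the score unchanged
            continue
        if op == "X":                      # every mismatched base is penalized
            score -= mismatch_penalty * length
        elif op in ("I", "D"):             # one opening cost plus a per-base cost
            score -= gap_open + gap_extend * length
        else:
            raise ValueError(f"unsupported CIGAR operation: {op!r}")
    return score
```

with these placeholder penalties, one long gap is cheaper than several short gaps of the same total length, which is the usual motivation for an affine model.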
in star we enable this mode using the option --alignendstype endtoend. in the case of debga, although the documentation suggests it is not supposed to find local alignments by default, the output sam file contains many reads with relatively long soft clipped ends, so if a read is not aligned end-to-end, debga reports the local alignment for that read. we were not able to find any option to force debga to perform end-to-end alignments for all reads, and so we have compared it in the configuration in which we were able to run it. for aligning dna-seq samples, each aligner is configured to report a single alignment, which is the primary alignment, for each read. bowtie outputs one alignment per read by default. to replicate this in the other tools, we use the option --outsammultnmax in star, -o -x in debga, and --primaryalignment in puffaligner. first, we evaluate the performance of puffaligner with a whole genome sequencing (wgs) sample from the genomes project . we downloaded the err reads from sample hg , which is a low-coverage sample from a finnish male, sequenced in finland. there are , , paired-end reads, each of length nucleotides in this sample. using fastp , we remove low quality ends and adapter sequences from these reads. after trimming, there are , , reads remaining in the sample. indices for each of the tools are built over all dna chromosomes of the latest release of the human genome (v ) by gencode. in this experiment, all aligners are configured to report only concordant alignments, i.e., only pairs of alignments that are concordant and within the "maximum fragment length" are reported. the maximum fragment length in all aligners is set to , using the option --alignmatesgapmax in star, --maxins in bowtie and -u -f in debga. the default value for the maximum fragment length in puffaligner is set to ; the user can configure this value by using the flag --maxfragmentlength. 
this concordance requirement also prevents bowtie , puffaligner, and star from aligning both ends of a paired-end read to the same strand. the alignment rate, run-time memory usage and running time for all the aligners are presented in . the reason that debga has the highest mapping rate compared to the other tools is that it reports local alignments for the reads that are not alignable end-to-end under the scoring parameters for the other tools. bowtie and puffaligner are both able to find end-to-end alignments for ∼ % of the reads. star and puffaligner are the fastest tools, with star being somewhat faster than puffaligner. on the other hand, puffaligner is able to align more reads than star, while requiring less than half as much memory. the memory usage of bowtie is the smallest, since bowtie 's index does not contain a hash table. however, this comes at the cost of having the longest running time compared to other methods. overall, puffaligner benefits from the fast query of hash based indices while its run-time memory usage, which is mostly dominated by the size of the index, is significantly smaller than other hash based aligners. although debga's index is based on the de bruijn graph, similar to the pufferfish index, the particular encoding for it is not as space-efficient as that of pufferfish. to look more closely at how the mappings between the tools differ, we investigate the agreement of the reads which are mapped by each tool and visualize the results in an upset plot in fig. using the upsetr library . we are only comparing the three methods which perform end-to-end alignment in this plot, since outliers from the local alignments computed by debga would otherwise dominate the plot. the first bar shows that the majority of the reads are mapped by all three tools. the next largest set represents the reads which are only mapped by bowtie and puffaligner. all the other sets are much smaller compared to the first two sets. 
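the concordance requirement used in this experiment (mates on opposite strands and within the maximum fragment length) can be written as a small predicate. the sketch below is a simplification under stated assumptions: the matealignment record and its field names are hypothetical, and mate ordering and dovetailing, which real aligners also consider, are ignored.

```python
from dataclasses import dataclass


@dataclass
class MateAlignment:
    """Hypothetical minimal record for one aligned end of a read pair."""
    reference: str
    start: int        # 0-based leftmost reference position
    length: int       # number of reference bases covered
    is_reverse: bool  # True if aligned to the reverse strand


def is_concordant(m1, m2, max_fragment_length):
    """Simplified check: same reference, opposite strands, bounded fragment span."""
    if m1.reference != m2.reference:
        return False
    if m1.is_reverse == m2.is_reverse:   # both ends on the same strand
        return False
    fragment_start = min(m1.start, m2.start)
    fragment_end = max(m1.start + m1.length, m2.start + m2.length)
    return fragment_end - fragment_start <= max_fragment_length
```

in the simulated-variation experiment later in the text this filter is deliberately not applied, because structural variants can produce valid pairs that fail exactly these conditions.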
this fact illustrates that the highest agreement among the aligners is between bowtie and puffaligner. exploring a series of individual reads from the smaller sets in the upset plot suggests that some of these differences happen as a result of small differences in the scoring configuration, while some result from different search heuristics adopted by the different tools. supplementary fig. s shows the coherence between the alignments reported by the tools by also including the exact location to which the reads are aligned in the reference. to further investigate the accuracy of the aligners, we used simulated dna-seq reads. one of the main differences between simulated reads and experimental reads is that simulated reads are often generated from the same reference sequences to which they are aligned, with the only differences being due to (simulated) sequencing error. while (simulated) sequencing error prevents most reads from being exact substrings of the reference, it actually does not tend to complicate alignment too much. on the other hand, while dealing with experimental data, the genome of the individual from which the sample is sequenced might include different types of variations with respect to the reference genome to which we are aligning . therefore, it is desirable to introduce variations in the simulated samples, and to measure the robustness and performance of the different aligners in the presence of the variation. mason is able to introduce different kinds of variations to the reference genome, such as snvs, small gaps, and also structural variants (sv) such as large indels, inversions, translocations and duplications. we use mason to simulate dna-seq samples with different variation rates ranging from e− to e− . each sample includes m paired-end illumina reads of bp length from chromosome of the human genome, ensembl release ‡ . 
‡ ftp://ftp.ensembl.org/pub/release- /fasta/homo sapiens/dna/

for this analysis, we do not restrict the aligners to only report concordant alignments, since the structural variations in the samples can lead to valid discordant alignments, such as those on the same strand or with inter-mate distances larger than the maximum fragment length. to be specific, we do not use the options which limit bowtie and puffaligner to report only concordant alignments; in addition, we use the option "--dovetail" in bowtie to consider dovetail pairs as concordant pairs. the alignments reported by debga already include discordant pairs and also orphan mappings. furthermore, to remove any restrictions on the fragment length in the alignments reported by debga, we set the minimum and maximum insert size to and , respectively, since setting a larger value resulted in the tool running into a segmentation fault. to allow dovetail pairs and also larger gaps between the pairs in star, we use the following options: "--alignendsprotrude concordantpair", "--alignmatesgapmax ". by default there is not a specific option in star for allowing orphan alignment of paired-end reads. instead, we can increase the number of allowed mismatches to be as large as one end of the read by using the following options: "--outfiltermismatchnoverreadlmax . ", "--outfiltermismatchnoverlmax . ", "--outfilterscoreminoverlread ", "--outfiltermatchnminoverlread ". for each sample, mason produces a sam file which includes the alignment of the simulated reads to the original, non-variant version of the reference (the version which was used for building the aligners' indices in this experiment). based on the alignments reported in the truth file, some reads did not have a valid alignment to the original reference. this was the result of a high rate of variations at some sequencing sites. we call the set of reads that, according to the truth sam file, were aligned to the original reference the compatible reads. 
we compared the performance of aligners based upon how well they are able to align the compatible reads. we computed the precision and recall of the alignments reported for these reads as follows. true positives are considered the reads that are mapped by the aligner to the same location stated by the truth file. then, recall is computed by dividing the number of true positives by the number of all compatible reads. furthermore, we considered an alignment as a false positive in two different cases. first, an alignment was considered discordant if the reported alignment had a large edit distance (larger than ) for the non-compatible reads. second, in the case that an aligner reported an alignment to a location other than the one in the truth file, it was considered as a false positive if the edit distance of the reported alignment is greater than the edit distance of the true alignment. having defined the set of tp and fp for the alignments, and also having considered the set of all compatible reads as the set we are trying to recover, we computed precision and recall for the set of alignments reported by each aligner. figure shows the precision and recall of the aligners for different samples. according to fig. , for lower variation ratios up until e− , most of the tools are able to make accurate alignment calls with a high specificity. as the variation ratio introduced in the sample is increased, all the tools start to have lower precision and recall. debga and star perform worse in higher variation samples, as they fail to recover the true alignment for more reads, while bowtie and puffaligner are able to align most of the reads to their true location on the original reference. 
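one way to read the true-positive and false-positive rules above is sketched below in python. the data layout is an assumption made for illustration: truth maps each compatible read to its true location and true edit distance, reported maps each aligned read to its reported location and edit distance, and max_edit_distance stands in for the edit-distance cutoff, whose value is not given in this text.

```python
def precision_recall(truth, reported, max_edit_distance):
    """Compute (precision, recall) following the two false-positive cases.

    truth:    {read_id: (true_location, true_edit_distance)} for compatible reads
    reported: {read_id: (reported_location, reported_edit_distance)}
    """
    tp = fp = 0
    for read_id, (location, edit_distance) in reported.items():
        if read_id not in truth:
            # case 1: non-compatible read aligned with a large edit distance
            if edit_distance > max_edit_distance:
                fp += 1
            continue
        true_location, true_edit_distance = truth[read_id]
        if location == true_location:
            tp += 1
        elif edit_distance > true_edit_distance:
            # case 2: wrong location that is also worse than the true alignment
            fp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```

note that an alignment to a different location with an edit distance no worse than the true one counts as neither a true positive nor a false positive under this reading, so it lowers recall without lowering precision.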
these results show that puffaligner's accuracy is stable in the face of variation, which makes the tool suitable for datasets that are known to have substantial variation, such as when aligning reads to microbial genomes where the specific sequenced strain may not be represented in the reference set. mapping sequencing reads to target transcriptomes is the initial step in many pipelines for reference-based transcript abundance estimation. while lightweight mapping approaches , greatly speed up abundance estimation by, in part, eliding the computation of full alignment between reads and transcripts, there is evidence that alignments still yield the most accurate abundance estimates by providing increased sensitivity and avoiding spurious mappings , , . thus, the continued development of efficient methods for producing accurate transcriptome alignments of rna-seq reads remains a topic of interest. in this section, we compare the effect of alignments produced by each tool on the accuracy of rna-seq abundance estimation. we generated , , paired-end rna-seq reads using the polyester read simulator. the reads are generated by the simulate_experiment_countmat module in polyester. the input count matrix is calculated based on the estimates from the bowtie -salmon pipeline on the sample srr (where reads are first aligned with bowtie and then the alignments are quantified using salmon). this sample is a collection of paired-end rna-seq reads sequenced from the human transcriptome using an illumina hiseq . the human transcriptome from gencode release ( ) is used to build all the aligners' indices. also, for building star's index in the genome mode, the human genome and the comprehensive gene annotation (main annotation file) are obtained from the same release of gencode. 
as the reads in this experiment are rna-seq reads sequenced from the human transcriptome, it is important to account for multi-mapping, as a read often maps to multiple transcripts which share the same exon or exon junction. this property makes the direct evaluation of performance at the level of alignments difficult. therefore, a typical approach in evaluating the accuracy of the transcriptomic alignments is to assess the accuracy of downstream analysis such as abundance estimation by computing the correlation and relative differences of the estimates with the true abundance of the transcripts.

table : abundance estimation of simulated rna-seq reads, computed by salmon, using different tools' alignment outputs (columns: aligner, spearman, mard, time (mm:ss), memory (gb)). the time and memory are only for the alignment step of each tool and the time for abundance estimation by salmon is not considered.

to compare the accuracy of each tool we give the alignments produced by each aligner, which are in the sam format, as input to salmon to estimate the transcript expressions. puffaligner, by default, outputs up to alignments with an alignment score greater than . times the best possible alignment score, i.e., the score obtained when all bases of the read match the reference perfectly. to enable the multi-mapping to take into account the characteristics of alignment to the transcriptome, bowtie is run with the option -k which lets the tool output up to alignments per read. the value of is adopted from the suggested parameters for running rsem with bowtie alignments. we note that running bowtie with this option makes the tool considerably slower than the default mode, as many more alignments will be computed and output to the sam file under this configuration. for both bowtie and puffaligner, and also for star by default, orphan and discordant mappings are not allowed. we ran star with the 'encode' options, which are recommended in the star manual for rna-seq reads. 
star is also run in two different modes. in the first, the star index is built on the human genome and star is also provided a gtf file for gene annotation. in this mode, star performs spliced alignment to the genome, then projects the alignments onto transcriptomic coordinates. in the other mode, the star index is built on the human transcriptome directly, which allows star to align the rna-seq reads directly to the transcripts in an unspliced manner. we chose to run star in the transcriptomic mode as well, since we find that it yields higher accuracy, though this increases the running time of star. the debga index is built on the transcriptome, as are the bowtie and puffaligner indices, since these tools do not support spliced read alignment. debga is run with the options -o -x , which nominally have the same effect as -k in bowtie , according to the documentation of debga. accuracy of abundance estimation by salmon, when provided the sam output generated by each aligner, is displayed in table . the timing and memory benchmarks provided in this table are only for the alignment step. alignments produced by puffaligner, bowtie and star in the transcriptomic mode produce the best abundance estimates. debga's output alignments are not suitable for any abundance estimation as many reads are aligned only to the same strand, and these are later filtered during the abundance estimation by salmon, so we could not provide a meaningful correlation for abundance estimation using debga's alignments. aligning the reads by star to the genome and then projecting to transcriptomic coordinates does not generate as high a correlation as directly aligning the reads to the transcriptome by star. however, we note that, as described by srivastava et al. , there are numerous reasons to consider alignment to the entire genome that are not necessarily reflected in simulated experiments. 
while the memory usage by puffaligner is only fold larger than the memory used by bowtie , it computes the alignments much more quickly. according to the results in table , puffaligner is the fastest aligner in these benchmarks, with accuracy as high as that of bowtie and star for aligning rna-seq reads. here, puffaligner leads to the most accurate abundance estimates, while being times faster than bowtie . moreover, the memory usage is much less than other fast aligners such as star. to demonstrate the performance and accuracy of puffaligner for metagenomic samples, we designed two different experiments. one main property of metagenomic samples is the high similarity of the reference sequences against which one typically aligns, where a pair (or more) of references may be more than % identical. the first experiment we designed for this scenario, to specifically evaluate issues related to this challenge, we call the "single strain" experiment. additionally, metagenomic samples also have the property of containing reads from a variety of genomes, some of which are not even assembled yet (and hence unknown). this leads to the second experiment, which we call the "bulk" experiment, that compares the aligners in the presence of a high variety of species in the sample in addition to the high similarity of references. for simplicity and uniformity, all the experiments have been run in the concordant mode for both puffaligner and bowtie (both of which support such an option), disallowing orphans and discordant alignments. all aligners are run in three different configurations, allowing three specific maximum numbers of alignments per fragment; (primary output with highest score, breaking ties randomly), , and . puffaligner and star, as the only tools that support this option, also are run in the beststrata mode. in this mode, the aligner outputs all equally-best alignments for a read with highest score without the limitation on number of reported alignments. 
this option is inspired by the similarly-named option in bowtie . however, unlike bowtie , puffaligner and star only make a best-effort attempt to find the score of the best stratum alignments, and do not guarantee to find the best stratum (though the cases in which they fail to do so seem to be exceedingly rare). this option is especially useful in metagenomic analyses, as we will report only the best-score alignments without having an arbitrary limitation on the number of allowed alignments. this allows proper handling of highly multi-mapping metagenomic reads. in other words, using this option, one can achieve a high sensitivity without the need to hurt specificity. the details of each experiment are explained in the following sections. for this experiment, we download the viral database from ncbi, and choose three similar coronavirus genomes. this set includes one of the recently-uploaded samples from wuhan , . we select three very similar viral genomes to simulate reads from, which are: nc . , nc . , and nc . . there is also a large body of literature discussing the similarity in sequence and behavior for these three species of coronavirus [ ] [ ] [ ] . the first is the complete genome for severe acute respiratory syndrome coronavirus isolate wuhan-hu- known as covid with length of , bases. nc . is the id of sars coronavirus complete genome (length: , ) and finally, nc . is a bat coronavirus bm - /bgr/ complete genome (length: , ). we use mason to generate three simulated samples; each sample contains , reads only from one of the three viral references we mentioned earlier. then, reads were aligned back to the database of viral sequences using each of the four aligners. the results are shown in table for covid and table s for the other two simulations. as the results show, the alignments of all aligners, except for debga, are distributed only across the three references of interest out of all the reference sequences in the complete viral database. 
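the difference between a fixed cap on the number of reported alignments and the best-strata behavior described above can be shown with a short python sketch. it operates on already-computed (target, score) pairs and is an illustration only; it does not reproduce how puffaligner or star search for the best stratum, and the function names are assumptions.

```python
def best_strata(alignments):
    """Keep every alignment whose score equals the best score for the read."""
    if not alignments:
        return []
    best = max(score for _, score in alignments)
    return [(target, score) for target, score in alignments if score == best]


def top_k(alignments, k):
    """Keep at most k alignments, highest score first (a fixed reporting cap)."""
    return sorted(alignments, key=lambda a: a[1], reverse=True)[:k]
```

for a read with alignments [("refA", -6), ("refB", -6), ("refC", -12)], a cap of one drops an equally good hit, a cap of three also reports the sub-optimal hit to refC, and best_strata returns exactly the two equally-best alignments.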
debga reports only a few alignments to a fourth virus. in general, all of the aligners do a good job of reporting the correct alignment among the returned alignments for each read. here, we are more interested in exploring how sub-optimal alignments are computed and filtered under different settings when aligning to a collection of very similar genomes. the results show that all tools have very high sensitivity even when considering only a single (primary) alignment per read. as we allow more alignments to be reported, the sensitivity increases and quickly levels off for all the tools. on the other hand, more alignments are generated and bowtie , in particular, generates a considerable number of extra alignments as the maximum number of allowed reported alignments is increased. however, the results do not change when allowing more than alignments, which means no more than alignments ever pass the alignment score threshold for these reads in the viral database for any of the tools we are testing. the results indicate that, when allowing more than one alignment to be reported for every read, bowtie tends to report a large number of sub-optimal (yet, still valid) alignments compared to other tools. these are alignments that are accepted within the alignment score threshold, but are to another target than the one from which the read truly originates. generating these sub-optimal alignments is in no way wrong, but it has a non-trivial computational cost, as shown in figure a, even if these alignments are not used in downstream analysis. further, the score of the best alignment for each read is specific to that read and not known ahead of time, meaning that this situation cannot be completely addressed simply by setting more stringent parameters for which alignment scores should be allowed. this behavior of bowtie gives the other tools a computational advantage when the user only truly requires the set of equally-best alignments for each read. 
interestingly, there is one read that all tools except puffaligner fail to properly align. inspecting this alignment reveals it is a valid alignment within the range of the acceptable scoring threshold, and it is unclear why it is not discovered by the other tools. overall, the aligners tested perform very well here in reporting the true strain of origin without reporting too many extra alignments. interestingly, despite changing the parameters to allow more alignments, star tends to return the same set of alignments under all configurations in this experiment. figure shows that puffaligner has the lowest running time, even when the number of allowed alignments per read increases.

beststrata mode: in this small example, all tools showed good sensitivity (and puffaligner and star showed near-perfect sensitivity) even when reporting only a single alignment per read. this experiment is, of course, an atypically small test for multi-mapping reads. in larger samples, with reads deriving from more organisms and a larger database of references, permitting more alignments usually yields non-trivial improvements in sensitivity. to control the rate of reporting sub-optimal alignments, puffaligner supports the "best strata" option (also available in star), which allows only the alignments with the best calculated score to be reported (as a replacement for the maximum allowed number of alignments). using this option, puffaligner achieves full specificity and sensitivity in this experiment. the same results are achieved for the other two simulated single-strain samples shown in the supplementary table s . we further demonstrate the positive impact of this option on the alignment of bulk metagenomic samples in the next section. the best specificity is achieved by puffaligner in beststrata mode (as well as the primary mode). in this simulated sample, many alignments are not ambiguous, resulting in the good performance observed when using only primary alignments. 
however, typically in metagenomic analysis, many equally-good alignments exist, and selecting only one is equivalent to making a random choice. we chose a random set of complete bacterial genomes downloaded from the ncbi microbial database and constructed the indices of puffaligner, bowtie , star, and debga on the selected genomes. supplementary table s shows the time and memory required for constructing each of the indices, in addition to the size of the final index on disk. overall, puffaligner and bowtie show a similar trend in time and memory requirements, while star and debga require an order of magnitude more memory. in terms of the final index size, bowtie has the smallest index, puffaligner has the second-smallest, and star has the largest. for simulating a bulk metagenomic sample, we generated a list of simulated whole genome sequencing (wgs) reads through the following steps:
• select a real metagenomic wgs read sample.
• align the reads of the chosen real experiment to the genomes using bowtie , limiting bowtie to output one alignment per read.
• choose all the references with count greater than c from the quantification results. this defines the read distribution profile that we will use to simulate data.
• for each of the expressed references, use mason , a whole genome sequence simulator, to simulate bp paired-end reads with counts proportional to the reported abundance estimates so that the total number of reads is greater than a specified value n. in this step we ran mason with default options.
• mix and shuffle all of the simulated reads from each reference into one sample, which is used as the mock metagenomic sample.
we selected three illumina wgs samples that are publicly available on ncbi. the first is a soil experiment with accession id srr from a project for finding sub-biocrust soil microbial communities in the mojave desert. the sample has ∼ m paired-end reads, containing a mixture of genomes from various genera and families.
however, less than k of the reads in the sample were aligned to the strains present in our database, leading to the selection of species from a variety of genera. we scaled the read counts in the simulation to ∼ m reads. the other two selected samples are srr and srr , the details of which are explained in supplementary s . in this section we only report the performance of the tools on the first sample. the analysis results for the other samples (which show similar relative accuracy and performance for the different tools) are provided in s . the assessment of "accuracy" directly from the aligned reads is not a trivial task. due to the high rate of multi-mapping in these simulated samples, and due to the fact that multiple references can produce alignments of the same quality as the "true" origin of the read, we calculate the accuracy by comparing the true and estimated abundances using a quantification tool (in this case, salmon) rather than by comparing the read alignments directly. in table the accuracy metrics are calculated over the abundance estimations obtained using the alignments produced by running the aligners in the different modes specified. the list of metrics for metagenomic expression evaluations has been chosen to be similar to previous work such as in bracken and karp . the metrics selected are spearman correlation, mean absolute relative difference (mard), mean absolute error (mae), and mean squared log error (msle). each metric measures different characteristics of the predicted versus true abundance estimates. for example, lower mard indicates better distribution of the reads among the references relative to the abundance of each reference, while mae shows the quality of the distribution of the reads in a more absolute way regardless of the difference between the abundance of the references. in this case, one misclassified read has the same impact on the mae metric both for a high-abundance and low-abundance reference.
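the contrast between mard and mae can be made concrete with a small python sketch. the formulas below are commonly used forms of these metrics and are assumptions here, since the exact definitions used in the paper are given only in its supplementary material; spearman correlation is omitted for brevity, and the abundance vectors are made-up toy values.

```python
import math


def mae(true, est):
    """Mean absolute error over all references."""
    return sum(abs(t - e) for t, e in zip(true, est)) / len(true)


def mard(true, est):
    """Mean absolute relative difference.

    Assumed form: |t - e| / (t + e) per reference, taken as 0 when both are 0.
    """
    total = 0.0
    for t, e in zip(true, est):
        if t + e > 0:
            total += abs(t - e) / (t + e)
    return total / len(true)


def msle(true, est):
    """Mean squared log error, using log(1 + x) so that zero counts are allowed."""
    return sum((math.log1p(t) - math.log1p(e)) ** 2
               for t, e in zip(true, est)) / len(true)


# Two references: one high-abundance, one low-abundance.
truth = [1000.0, 10.0]
est_hi = [995.0, 10.0]   # 5 reads lost from the high-abundance reference
est_lo = [1000.0, 5.0]   # 5 reads lost from the low-abundance reference

print(mae(truth, est_hi), mae(truth, est_lo))
print(mard(truth, est_hi), mard(truth, est_lo))
```

under these assumed definitions, both estimates have the same mae, while the mard is far larger when the five reads are lost from the low-abundance reference.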
the definition of each of these metrics is provided in equation in the supplementary material. this experiment leads to three main observations. first, regardless of the alignment mode, quantifications derived from the debga alignments seem to lead to systematic underestimation of abundance. however, puffaligner, star, and bowtie show very similar behavior with respect to accuracy. star is the best in primary mode as well as when allowing alignments, closely followed by puffaligner. when allowing up to alignments per read, bowtie tends to yield the most accurate abundances, again with puffaligner being the close runner-up. these results demonstrate that puffaligner is a reliable alignment tool showing a stable pattern of being comparable to the best aligner under all the scenarios tested. that is, the good performance of puffaligner is robust across a variety of different parameter settings. moreover, due to the nature of the metagenomic data (the high degree of ambiguity and multi-mapping), we expect to see improvement in the accuracy metrics as more alignments are reported per read, as this leads to a higher recall. while star's accuracy changes only slightly from alignments to alignments (only improving mae), the results for puffaligner and bowtie improve considerably when allowing more alignments per read. however, this higher accuracy comes at the cost of alignment time for bowtie . as shown in figure , section b, bowtie alignment time increases sharply when allowing more alignments per read, while puffaligner exhibits only small changes in alignment time regardless of the maximum number of alignments being reported per read. the difference becomes especially evident when allowing up to alignments per read, where puffaligner is times faster than bowtie . additionally, in experimental data, many of the alignments reported do not necessarily have high quality, and only appear in the output as one of the alignments for the read.
in fact, we note the similar accuracy achieved by puffaligner in beststrata mode compared to when we allow up to alignments per read. this observation is also consistent across the other two simulated samples in the supplementary table : accuracy of abundance estimation with salmon using alignments reported by each aligner for the mock sample simulated from a real sample with accession id srr . we have run all the aligners in three main modes; allowing only one best alignment with ties broken randomly (primary), up to alignments reported per read, and up to alignments reported per read. puffaligner and star support a fourth mode that allows reporting all equally best alignments (beststrata). this option improves the performance while maintaining, or even slightly improving, the accuracy of the results. as well-known aligners like bowtie and star. on these data, it exhibits memory requirements close to those of the memory-frugal bowtie , while being much faster. the trend shows the effect of database size as well as redundancy and sequence similarity on the scalability of each of the tools. tools such as puffaligner and debga, which build a de bruijn graph based index on the input sequence, specifically compress similar sequences into unitigs and therefore scale well for databases with high redundancy such as microbiomes. it is worth mentioning that bowtie requires a switch from a -bit index to a -index as the total count of the input bases increases, which is another reason why the size is growing super-linearly. here, we compare a pipeline for metagenomic abundance estimation comprised of aligning reads using puffaligner followed by quantification with salmon, to a k-mer -based abundance estimation method that consists of classifying reads with kraken and estimating the abundances with bracken. 
to perform this comparison, we construct an index over all bacteria, viral, archaea, and fungi obtained from ncbi taxonomy database on may , using both pufferfish and kraken . we ran both approaches over samples from different categories of metagenomic analyses. ten of the samples are selected from non-human projects such as the "metasub project" as well as from submarine or soil samples . the remaining samples are selected from the human metagenome project (hmp) . the human samples are chosen from different tissue categories of plasma, tongue dorsal, gingiva, vaginal, and fecal. we then compare the abundance of the reported references under each of the two pipelines. we run puffaligner in two modes; the default mode, which only reports the best concordant alignments, and a less restrictive mode which will be explained. the default mode does not allow any orphan, discordant, or dovetail alignments, filters any mapping with alignment score less than . fraction of the maximum possible alignment score, i.e., when the read matches the reference perfectly, and for each read reports only alignments with the highest score (beststrata). we also ran puffaligner allowing orphans, discordant, and dovetail alignments, but keeping the rest of the parameters as default. we use salmon to estimate the abundance of references from the alignments. while this is a reasonable pipeline for metagenomic abundance estimation, it is likely that the accuracy could be further improved by incorporating specific features of the metagenomic data in the abundance estimation step, such as the topology of the taxonomy tree and the expression of certain marker genes. we also run kraken in two different modes. first, we run it in the "default" mode, which allows all reads having any k-mer match to be classified. second, we run it setting the confidence option to a value c, which prevents reporting reads that have a confidence lower than c. 
as per the authors' definition § , this number is calculated based on the ratio of unique k-mers mapped to a taxon in the taxonomy tree over all the non-ambiguous k-mers of the read (k-mers not containing an n character). there is not a one-to-one correspondence between the confidence threshold here and the "minscorefraction" in puffaligner. however, we believe both of these options are necessary for providing more reliable abundance estimates, by removing the reads that exhibit poor evidence of deriving from an indexed reference. filtering orphaned, discordant, or dovetail reads is a feature only available to the alignment-based procedures and not k-mer counting approaches, since they do not jointly consider such properties of the reads containing the k-mers . that is why we provide both modes for puffaligner, allowing and disallowing these sorts of reads, to provide a fair comparison of the two methods. in the absence of technical errors and variation from the reference genomes, if a sequencing read comes from a subset of references in the index, then there will be at least one exact match for it. however, due to the presence of both sequencing errors and biological variation of the organisms in the samples with respect to the reference genomes, a perfect alignment does not exist for most reads. therefore, alignment-based methods try to find the sub-sequences in the reference which match the read with a high alignment score (including both mismatches and gaps) and report these loci as the potential origin of the read. on the other hand, k-mer -based approaches break the read into its constituent k-mers and treat exact k-mer matches as evidence of a read originating from a reference. in most cases, there is a large degree of concordance between this approach and the alignment-based approaches for finding potential loci (i.e. the number of k-mer matches correlates well with alignment score).
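to illustrate how exact k-mer matches serve as evidence, the following python sketch counts the fraction of a read's non-ambiguous k-mers that occur in a reference. it is a toy illustration only: the function names, the sequences, and the value of k are assumptions, and it does not reproduce kraken's classification or its confidence score, which operates on a taxonomy tree.

```python
def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_match_fraction(read, reference, k):
    """Fraction of the read's usable k-mers found exactly in the reference.

    k-mers containing an 'N' are treated as ambiguous and left out of
    both the numerator and the denominator.
    """
    ref_kmers = set(kmers(reference, k))
    usable = [km for km in kmers(read, k) if "N" not in km]
    if not usable:
        return 0.0
    hits = sum(1 for km in usable if km in ref_kmers)
    return hits / len(usable)


reference = "ACGTTGCAAGGCTTAC"
exact_read = "GTTGCAAGGC"     # an exact substring of the reference
one_mismatch = "GTTGCTAGGC"   # the same read with a single substitution
with_n = "GTTGNAAGGC"         # the same read with one ambiguous base

print(kmer_match_fraction(exact_read, reference, 4))
print(kmer_match_fraction(one_mismatch, reference, 4))
print(kmer_match_fraction(with_n, reference, 4))
```

a single substitution removes every k-mer that overlaps it (four of the seven 4-mers here), which is one reason k-mer evidence and alignment score usually, but not always, agree.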
thus, in general, such approaches result in highly-correlated abundance estimates with those produced by alignment-based approaches, and they also tend to be very fast and memory efficient. nonetheless, such approaches still face challenges as the degree of sequence-similarity between strains (and species) can be very large and, simultaneously, the organisms present in a sample can show considerable sequence divergence from the strains in the reference database. as a result, k-mer -based approaches, while very sensitive, tend to sacrifice specificity in this type of data. in most cases, when no filter is applied to the taxonomic assignments, a k-mer -based approach overestimates the number of reads deriving from the reference strains in each sample. to see how the reference abundance estimations compare across the two pipelines of kraken and puffaligner we look at the top most highly-abundant (predicted) species for each sample and their abundance in supplementary figure s . we run kraken both with no confidence filter and filtering with a confidence value of . , and compare the results with default puffaligner (with minscorefraction=. ). in certain samples, we observe similar highly-abundant species discovered by kraken (with no confidence filter) and puffaligner that are reported as less abundant when processed with kraken using a confidence threshold of . (e.g. streptococcus gordonii for one of the subway samples). however, for other samples, applying the confidence threshold to the kraken results brings the predictions closer to those of puffaligner (e.g. lactobacillus crispatus for two vaginal samples). we do the same analyses at the genus level in figure s and there we observe similar inconsistencies.
§ https://github.com/derrickwood/kraken /wiki/manual
the count of filtered reads for both puffaligner and kraken depends on the sample and quality of the reads (the two plots on the top) as well as the filtered read counts relative to puffaligner's (the bottom plots). however, there is an observed inconsistency in how changing the confidence threshold of kraken affects the number of filtered reads when comparing to the alignment-based approach of puffaligner. in fact, not only is there no fixed confidence value that corresponds to the analogous alignment options, but further there is no corresponding value that provides a smooth consistent effect on the number of filtered reads. there are large changes from allocating many more reads to allocating many fewer reads compared to puffaligner in some samples, whereas the allocated read count barely changes in others. to further investigate how the filtering of low quality reads in the two approaches compares, we look at the trend of the number of reads filtered in kraken while applying different confidence thresholds with respect to the default puffaligner in figure . we run kraken with confidence values of . , . , . , as well as . which is recommended by the authors for filtering low quality alignments for general purposes ¶ . we compare the results of running the kraken + bracken pipeline with the puffaligner + salmon pipeline under both modes of puffaligner (default and less restrictive) so that the effect of the read-pair-based filters (those taking into account read orphan and dovetail status) can be evaluated. the plots in the top row of figure show the read count difference of default puffaligner and kraken applying different confidence values. the total read count varies widely across samples. therefore, to have a better understanding of the results, we normalize the read count differences based on the puffaligner reported read count in the two bottom plots of figure .
¶ https://github.com/derrickwood/kraken /issues/
as is observed, applying the confidence threshold may not cause a uniformly similar impact on the absolute read count for all samples. we expect to see a similar relative effect when applying the threshold over all the samples, i.e. the read count consistently becoming closer to or further from that obtained by the alignment-based approach. although this is the observed trend for small confidence values which are less effective, the plots in figure do not show a consistent pattern for larger confidence values in all the samples, where the application of the threshold in kraken makes the reported read counts sometimes closer and sometimes farther from the alignment-based pipeline. as might be expected, the read counts reported in kraken are further from the puffaligner pipeline in the default mode versus when allowing orphans and discordant alignments in the less restrictive mode (as can be seen when comparing the two columns of figure ). to summarize, through the experiments in this section, we observe that the k-mer -based approaches sometimes result in different reported highly-abundant references than alignment-based approach and, crucially, applying a score filter to the k-mer -based approaches does not always make the abundance predictions for all species closer to the alignment-based predictions. the efficacy of the score filter in the k-mer -based approaches may depend on the technical and biological details particular to that sample, so that the score filter that best matches the alignment-based approach in one sample may not be the same score filter that best matches the alignment-based approach in other samples. this is, perhaps, expected, as the alignment-based approach is taking into account more information, both within and across the ends of a paired-end fragment, than simply the number of matching k-mers . 
in this sense, we believe that a sensitive and efficient aligner like puffaligner, paired with a capable abundance estimation tool, provides a pipeline for metagenomic abundance estimation that, while sacrificing some speed, is likely to be more accurate and robust than k-mer -based approaches. in this paper we introduce puffaligner, an aligner suitable for the contiguous alignment of short-read sequencing data. we demonstrate its use in aligning dna-seq reads to the genome of a single species, aligning rna-seq reads to the transcriptome, and aligning dna-seq reads from metagenomic samples to a large collection of references. it is built on top of the pufferfish index, which constructs a colored compacted de bruijn graph using the input reference sequences. puffaligner begins read alignment by collecting unique maximal exact matches, querying k-mers from the read in the pufferfish index. the aligner then chains together the collected uni-mems using a dynamic programming approach, choosing the chains with the highest coverage as potential alignment positions for the reads. finally, puffaligner is able to efficiently compute alignments, exploiting information from long matches in the chains and making use of an alignment cache to avoid redundant work. we compared the accuracy and efficiency of puffaligner against two widely-used alignment tools, bowtie and star, that perform unspliced and (optionally) spliced alignments of reads, respectively. we also compare the results against debga, an aligner that also utilizes an index built over the compacted de bruijn graph. we analyze the performance of these tools on both simulated and experimental dna and rna sequencing datasets. the accuracy of puffaligner is comparable to bowtie , which exhibits very high alignment accuracy. puffaligner generally performs better than star and debga (though, unlike star, none of these other tools currently support spliced read alignment).
in terms of speed and memory, puffaligner reaches a tradeoff between the relatively high memory usage of star and debga and the slower speed of bowtie . hence, while the memory requirement of puffaligner is more than that of bowtie , the speed gain is significant. in the tests performed in this manuscript, puffaligner is almost always the fastest tool (with the exception being that star is faster when aligning unspliced dna-seq reads to a single human genome). an additional advantage of the pufferfish index utilized in puffaligner is that it can be built on a mixed collection of genomes, transcriptomes, or both. this feature is already utilized in a specific pipeline for rna-seq quantification that makes use of a joint index over the genome and transcriptome . the analysis shows that specificity of alignments in such a case can be improved by filtering from quantification reads that are better aligned to some genomic locus that is not present in the transcriptome. furthermore, the nature of the pufferfish index, which explicitly factorizes out highly-repetitive sequence, coupled with the fast (and repetition-aware) alignment procedure of puffaligner, makes it particularly useful for indexing and aligning to a highly similar collection of sequences. this potentially makes it a good match for metagenomic analyses. we have provided a proof of concept for such a puffaligner-based metagenomic analysis pipeline, and plan to build a more sophisticated and fully-featured metagenomic analysis framework around puffaligner in the future. puffaligner is an aligner built on top of the pufferfish indexing data structure. pufferfish is a space-efficient and fast index for the colored compacted de bruijn graph (ccdbg). a colored compacted de bruijn graph is a graph whose vertices (strings) are the compacted non-branching paths of the underlying de bruijn graph, with the restriction that each node also has the same color set (set of reference sequences in which it appears).
the nodes in the colored compacted de bruijn graph are referred to as unitigs. each unitig can be mapped to a list of tuples that describe exactly how this subsequence appears in the underlying collection of references. the basic query operation in the pufferfish index is to query a k-mer from the input sequence against the index. given this query, the pufferfish index returns the unique position (and orientation) where this k-mer appears in the colored compacted de bruijn graph (or a sentinel value if this k-mer does not occur). this match between the query and the graph can then be easily "unpacked" into the implied list of matches with the underlying references by finding all of the places that the matched unitig appears in the reference sequences and translating the relative position within the unitig into the corresponding reference position (and adjusting the orientation if necessary). the output of this step is then a list of all of the reference sequences, positions, and orientations where this exact match occurs. while k-mer query is the basic operation performed by the index, we actually do not use k-mer matches directly, and instead extend the initial match into unique maximal exact matches (uni-mems). specifically, each k-mer match is extended simultaneously in both the query and reference to obtain a longer exact match. the exact matches to the unitigs, called uni-mems, are then projected to the positions on the references associated with that unitig. then, uni-mems are aggregated into mems (described below) on each reference, and the chains of mems with the highest score are selected. in the case of paired-end reads, the chains of the left and right ends are paired with respect to their distance, orientation, etc.
finally, rather than fully aligning each query sequence to the anchored position on the reference, only the sub-sequences from the query that are not part of the uni-mems (exact matches) are aligned to the reference; we call this procedure the between-mem alignment. each of these steps is explained in detail in the following sections. the pufferfish index provides puffaligner with an efficient index for k-mer lookup within a list of references. specifically, the core components of the index are ( ) a minimal perfect hash function (mphf), ( ) a unitig sequence vector, ( ) a unitig-to-reference table, and ( ) a vector storing the position associated with each k-mer in the unitig sequence vector. the unitig sequence vector contains all the unitigs in the ccdbg. the pufferfish index admits efficient exact search for k-mers , as well as longer matches that are unique in both the query string and colored compacted de bruijn graph. these matches, called uni-mems, were originally defined in debga . a uni-mem is a maximal exact match (mem) between the query sequence and a unitig. using the combination of the mphf and the position vector, a k-mer is mapped to a unitig in the unitig sequence vector. the k-mer is then extended to a uni-mem via a linear scan of the query sequence and the unitig sequence vector. each uni-mem can appear in multiple different references, and since uni-mems must be completely contained within a unitig, it is possible for multiple uni-mems to be directly adjacent on both the query and some references where the unitig appears. uni-mem collection: the first step in read alignment is to collect exact matches shared between the query (single-end or paired-end reads) and the reference. in puffaligner, this is accomplished by collecting the set of uni-mems that co-occur between the query and reference. puffaligner starts processing the read from the left-end and looks up each k-mer that is encountered until a match to the index is found.
once a match is discovered, it is extended in both the query and the reference until one of these termination conditions occurs: ( ) a mismatch is encountered, ( ) the end of the query is reached, or ( ) the end of the unitig is reached. this process results in a uni-mem match shared between the query and reference. uni-mems where extension is terminated as a result of reaching the end of a unitig must later be examined and potentially "collapsed" together to form mems with respect to the references on which they appear. if the uni-mem extension is not terminated as a result of reaching the end of the query, then the position in the read is incremented by a small value and the same procedure is repeated for the next k-mer on the read. this process continues until either the uni-mem extension terminates because the end of the query is reached, or because the last k-mer of the query is searched in the index. here, we recall an important property of uni-mem extension that is different from e.g. mem extension or maximum mappable prefix (mmp) extension . due to the definition of the ccdbg, it is guaranteed that any k-mer appearing within a uni-mem cannot appear in any other unitig in the ccdbg. thus, extending k-mers to maximal uni-mems is, in some sense, safe with respect to greedy extension, as such extension will never cause missing a k-mer that would lead to another distinct uni-mem shared between the query and reference. the concept of safe extension of k-mer matches was introduced in . filtering highly-repetitive uni-mems: in order to avoid expending computation on performing the subsequent steps on regions of reads mapping to highly-repeated regions of the reference, any uni-mem that appears more than a user-defined number of times in the reference is discarded. in this manuscript, we use the threshold of .
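the collection, projection, and repetition filter described above can be sketched in a few lines of python. this is a toy model under stated assumptions, not the pufferfish implementation: a plain dictionary stands in for the mphf and position vector, only the forward strand is handled, the read position is advanced to the first k-mer not fully contained in the previous uni-mem (one possible reading of "incremented by a small value"), and all names (`build_kmer_table`, `collect_unimems`, `project`, `max_occ`) are hypothetical.

```python
def build_kmer_table(unitigs, k):
    """Map every k-mer to its (unitig id, offset); in a ccdbg this is unique."""
    table = {}
    for uid, seq in enumerate(unitigs):
        for off in range(len(seq) - k + 1):
            table[seq[off:off + k]] = (uid, off)
    return table


def collect_unimems(read, unitigs, table, k):
    """Return uni-mems as (read_pos, unitig_id, unitig_offset, length)."""
    unimems = []
    pos = 0
    while pos + k <= len(read):
        hit = table.get(read[pos:pos + k])
        if hit is None:
            pos += 1  # no match: try the next k-mer of the read
            continue
        uid, off = hit
        unitig = unitigs[uid]
        length = k
        # Extend until a mismatch, the end of the read, or the end of the unitig.
        # Scanning left to right with step 1 makes leftward extension unnecessary.
        while (pos + length < len(read) and off + length < len(unitig)
               and read[pos + length] == unitig[off + length]):
            length += 1
        unimems.append((pos, uid, off, length))
        if pos + length == len(read):
            break  # the extension reached the end of the query
        pos += length - k + 1  # first k-mer not contained in this uni-mem
    return unimems


def project(unimems, unitig_refs, max_occ):
    """Project uni-mems onto references, dropping highly repetitive unitigs."""
    hits = []
    for read_pos, uid, off, length in unimems:
        occurrences = unitig_refs[uid]  # list of (reference name, unitig start)
        if len(occurrences) > max_occ:
            continue
        for ref_name, start in occurrences:
            hits.append((ref_name, read_pos, start + off, length))
    return hits


k = 4
unitigs = ["ACGTTGCA", "GCAAGGCTT"]        # overlap by k - 1 bases
unitig_refs = {0: [("ref", 0)], 1: [("ref", 5)]}
table = build_kmer_table(unitigs, k)
found = collect_unimems("GTTGCAAGGC", unitigs, table, k)
print(found)
print(project(found, unitig_refs, max_occ=10))
```

in this toy example the read spans a unitig boundary, so two uni-mems are produced; after projection they lie on the same diagonal of the reference and overlap, which is exactly the situation in which they would later be collapsed into a single mem.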
this filter has a strong impact on the performance, since, even if one k-mer from the read maps to a highly-repetitive region of the reference, the following expensive steps of the alignment procedure should be performed for every mapping position of the uni-mem to find the right alignment for the read, while the less repetitive uni-mems also map to the true origin of the read on the reference as well. the drawback of this filter is that for a very small fraction of the reads which are truly originating from a highly-repetitive region, all of the matched uni-mems will be filtered out and no hit remains for aligning the read. however, we find that in the case of aligning paired-end reads, usually one end of the read maps to a non-repetitive region, and the alignment of the other end can then be recovered using orphan recovery (explained in section . ). furthermore, we also provide a flag -allowhighmultimappers that mitigates the effect of this filter for a slight tradeoff on the alignment performance. as shown in figure , having all the mems (maximal exact matches) from a read to each target reference, the goal of this step is to find promising chains of mems that cover the most unique bases in the read in a concordant fashion and that can potentially lead to a high quality alignment. to accomplish this, we adopt the dynamic programming approach used in minimap for finding co-linear chains of mems that are likely candidates to support high-scoring read alignments. as mentioned in minimap , all the mems from a read r to the reference t are sorted by the ending position of the mems on the reference. then, this algorithm computes a score for each set of mems based on the number of unique covered bases in the read. the coverage score is also penalized by the length of the gaps, both in the read and reference sequence, between each consecutive pair of mems.
in the first step, the chaining algorithm chooses the best chain of mems that provide the highest coverage score for each end of the read, that is the m -m chain for the left end and two single mem chain for the right end. then, the selected chains from each end are joined together to find the concordant pairs of chains, that is the (m -m , m ) pair for this read as m is too far from m -m . then, the chain from each end will go through to the next step, between-mem alignment. for the green areas (mems) no alignment is recalculated as they are exact matches. only the un-matched blue parts of the chains (those nucleotides not occurring within a mem) are aligned using a modified version of ksw . in puffaligner, if the distance between two mems, m and m , on the read and the reference is d r and d t respectively, these two mems should not be chained together if |d r −d t |>c, where c is the maximum allowed gap. so, the penalization term, the β value in , in the coverage score computation is modified accordingly to prevent pairing of such mems. also, unlike what is done in minimap , rather than considering together the mems that are discovered on both ends of a paired-end read, we consider the chaining and chain filtering for each end of the read separately. this is done in order to make it easier to enforce the orientation consistency of the individual chains. specifically, the chaining algorithm that is presented in introduces a transition in the recursion that can be used to switch between the mems that are part of one read and those that are part of the other. however, such switching makes it difficult to enforce the orientation consistency of the chains that are being built for each end of the read. 
one solution to this problem is to add another dimension to the dynamic programming table, encoding if one has already switched from the mems of one read end to the other, and the recurrence can be modified to allow only one switch from the one read end to the other, allowing enforcement of orientation consistency. however, we found that, in practice, simply chaining the read ends separately led to better performance. finally, we also adopt the heuristic proposed by minimap when calculating the highest scoring chains. that is, when a mem is added to the end of an existing chain, it is unlikely that a higher score for a chain containing this mem will be obtained by adding it to a preceding chain. thus, we consider only a small fixed number of rounds (by default ) of preceding chains once we have found the first chain to which we can add the current mem. the chaining algorithm described above finds the best chains of mems shared between the read r and the reference t in orientation o. a chain is accepted if its score is greater than a configurable fraction, which we call the consensusfraction, times the maximum coverage score found for the read r to any reference. throughout all the experiments in this manuscript the consensusfraction is set to . . if a chain passes the consensus fraction threshold, we call it a valid chain. additionally, rather than keeping all valid chains, we also filter highly-suboptimal chains with respect to the highest scoring chain per-reference. all valid chains shared between r and t are sorted by their scores, and chains having scores within % of the highest scoring chain for reference t are selected as potential mappings of the read r to the reference t. while these filters are essential for improving the throughput of the algorithm in finding the right alignment, they are carefully selected to have very little effect on the sensitivity of puffaligner. 
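the chaining and chain filtering steps can be sketched as follows. this is a deliberately simplified python sketch and not the scoring function of minimap or puffaligner: the linear `gap_cost` penalty, the default `max_gap`, the consensus fraction of 0.5, and the use of "at least" in the acceptance test are all assumptions chosen for illustration, and the heuristic that limits the number of preceding chains examined is left out.

```python
def best_chain(mems, max_gap=100, gap_cost=0.5):
    """Highest-scoring co-linear chain for one read end on one reference.

    mems: list of (read_pos, ref_pos, length). Returns (score, chain).
    """
    if not mems:
        return 0.0, []
    mems = sorted(mems, key=lambda m: m[1] + m[2])  # by end position on the reference
    score = [float(length) for _, _, length in mems]
    prev = [None] * len(mems)
    for i, (ri, ti, li) in enumerate(mems):
        for j in range(i):
            rj, tj, lj = mems[j]
            dr, dt = ri - rj, ti - tj
            # Require co-linearity and a bounded difference between the gaps.
            if dr <= 0 or dt <= 0 or abs(dr - dt) > max_gap:
                continue
            # Read bases newly covered by mem i when it follows mem j.
            new_bases = min(li, ri + li - (rj + lj), ti + li - (tj + lj))
            if new_bases <= 0:
                continue
            cand = score[j] + new_bases - gap_cost * abs(dr - dt)
            if cand > score[i]:
                score[i], prev[i] = cand, j
    end = max(range(len(mems)), key=score.__getitem__)
    best = score[end]
    chain = []
    while end is not None:
        chain.append(mems[end])
        end = prev[end]
    return best, chain[::-1]


def valid_chains(chains_by_ref, consensus_fraction):
    """Keep chains scoring at least consensus_fraction times the best score."""
    top = max(score for score, _ in chains_by_ref.values())
    return {ref: sc for ref, sc in chains_by_ref.items()
            if sc[0] >= consensus_fraction * top}


# Two nearby co-linear mems and one mem that is far away on the reference.
score, chain = best_chain([(0, 100, 20), (25, 125, 15), (60, 5000, 10)])
print(score, chain)
kept = valid_chains({"refA": (score, chain),
                     "refB": (10.0, [(60, 5000, 10)])}, 0.5)
print(sorted(kept))
```

the distant mem is never joined to the chain because the difference between its read gap and reference gap exceeds `max_gap`, and the weak single-mem chain on the hypothetical `refB` is removed by the consensus-fraction filter.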
for all the experiments in this manuscript, the same default settings of these parameters are used if not mentioned otherwise. after finding the high-scoring mem chains for each reference sequence, a base-to-base alignment of the read to each of the candidate reference sequences is computed. each selected chain implies a position on the reference sequence where the read might exhibit a high quality alignment. thus, we can attempt to compute an optimal alignment of the read to the reference at this implied position, potentially allowing a small bit of padding on each side of the read. this approach utilizes the positional information provided by the mem chains. however, the starting position of the alignments is not the only piece of information embedded in the chains. rather each chain of mems consists of sub-sequences of the read (of size at least k, though often longer) which match exactly to the reference. while the optimal alignment of the read to the reference at the position being considered is not guaranteed to contain these exact matches as alignments of the corresponding substrings, this is almost always the case. in puffaligner, we aim to exploit the information from the long matches to accelerate the computation of the alignments. in fact, since only chains with relatively high coverage score are selected, a large portion of the read sequences are typically already matched to the positions in the reference with which they will be matched in the final optimal alignment. for instance, in fig. , for the final chains selected on the reference sequence, it is already known for the light blue, dark blue and green sub-sequences on the left end of the read precisely where they should align to the reference. likewise this is the case for the yellow and purple sub-sequences on the right read. the unmapped regions of the reads are either bordered by the exact matches on both sides, or they occur at the either ends of the read sequence. 
puffaligner skips aligning the whole read sequence by considering the exact matches of the mems to be part of the alignment solution. as a result, it is only required to compute the alignment of the small unmapped regions, which reduces the computational burden of the alignments. when applying such an approach, two different types of alignment problems are introduced, which we call bounded sub-sequence alignment and ending sub-sequence alignment. for bounded sub-sequence alignment, we need to globally align some interval $i_r$ of the read to an interval $i_t$ of the reference. if $i_r$ and $i_t$ are of different lengths, the alignment solution will necessarily include insertions or deletions. if $i_r$ and $i_t$ are of the same length, then the optimal global alignment between them may or may not include indels. for each such bounded sub-sequence alignment, we determine the optimal alignment of $i_r$ to $i_t$ by computing a global pairwise alignment between the intervals, and stitching the resulting alignment together with the exact matches that bound these regions. gaps at the beginning or the end of the read are symmetric cases, and so we describe, without loss of generality, the case where there is an unaligned interval of the read after the last mem shared between the read and the reference. in this case, we need to solve the ending sub-sequence alignment problem. here, the unaligned interval of the read consists of the substring spanning from the last nucleotide of the terminal mem in the chain, up through the last nucleotide of the read. there is not a clearly defined interval on the reference sequence. while the left end of the relevant reference interval is defined by the last reference nucleotide that is part of the bounding mem, the right end of the reference interval should be determined by actually solving an extension or "end-free" alignment problem.
we address this by performing extension alignment of the unaligned interval of the read to an interval of the reference that begins on the reference at the end of the terminal mem, and extends for the length of the unaligned query interval plus the length of some problem-dependent buffer (which is determined by the maximum length difference between the read and reference intervals that would still admit an alignment within the acceptable score threshold). an example of both of these cases is displayed in figure . specifically, an alignment of the read could be obtained by solving only two smaller alignment problems: one is the ending sub-sequence alignment of the unmapped region after the green mem on the left read, and the other is the bounded sub-sequence alignment of the region on the right read bordered by the yellow and purple mems. puffaligner uses ksw for computing the alignments of the gaps between the mems and for aligning the ending sequences. ksw exposes a number of alignment modes such as global and extension alignments. for aligning the bounded regions, ksw alignment in the global mode is performed, and for the gaps at the beginning or end of reads, puffaligner uses the extension mode to find the best possible alignment of that region. puffaligner, by default, uses a match score of and mismatch penalty of . for indels, puffaligner uses an affine gap scoring scheme with gap open penalty of and gap extension penalty of . in puffaligner, after computing the alignment score for each read, only the alignments with a score higher than τ times the maximum possible score for the read are reported. the value of τ is controlled by the option -minscorefraction, which is set to . by default. because only the read's sub-sequences that are not included in the mems are aligned, the alignment problems solved in puffaligner are often much shorter than the length of the read.
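the decomposition of a read into bounded and ending sub-sequence problems can be sketched as follows. this is a minimal python illustration, not puffaligner's code: the (read_start, ref_start, length) tuple layout and the function name are hypothetical, and the chain is assumed to be non-empty, sorted, colinear, and non-overlapping on both the read and the reference.

```python
def gap_problems(mems, read_len):
    """List the sub-alignment problems left open by a chain of exact matches.

    mems: list of (read_start, ref_start, length) tuples, sorted by read_start.
    Returns tuples of (mode, read_interval, reference_info), where intervals
    are half-open [start, end). For extension problems only one end of the
    reference interval is known, so a single anchor coordinate is returned.
    """
    problems = []

    first_r, first_t, _ = mems[0]
    if first_r > 0:
        # Unaligned read prefix: the reference interval ends at first_t and
        # its start is found by extending leftwards.
        problems.append(("extension_left", (0, first_r), first_t))

    for (r1, t1, l1), (r2, t2, _) in zip(mems, mems[1:]):
        read_iv = (r1 + l1, r2)
        ref_iv = (t1 + l1, t2)
        if read_iv[0] < read_iv[1] or ref_iv[0] < ref_iv[1]:
            # Both ends are pinned by exact matches: global alignment.
            problems.append(("global", read_iv, ref_iv))

    last_r, last_t, last_l = mems[-1]
    if last_r + last_l < read_len:
        # Unaligned read suffix: the reference interval starts right after the
        # terminal mem and its end is found by extending rightwards.
        problems.append(("extension_right", (last_r + last_l, read_len), last_t + last_l))

    return problems


chain = [(5, 105, 20), (30, 131, 25)]
for problem in gap_problems(chain, read_len=60):
    print(problem)
```

in the example, the gap between the two mems is 5 bases on the read but 6 on the reference, so the global sub-problem must contain at least one indel, as noted in the text.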
however, to further speed up alignment, we also incorporate a number of other techniques to improve the performance of the alignment calculation. we describe the most important of these below:
• skipping alignment calculation by recognizing perfect chains and alignment caching: it is possible to avoid the alignment computation completely in a considerable number of cases. in fact, as has been explained in previous work, the alignment calculation step can be completely skipped if the set of exact matches for each chain covers the whole read. puffaligner skips alignment for cases where the coverage score of a chain of mems equals the length of the read, and assigns an all-match cigar string to that alignment. alignment computation for a read might also be skipped if the same alignment problem has already been detected and computed for this read. for example, in the case of rna-seq data, reads often map to the same exons on different transcripts. in such cases, each alignment solution for a read is stored in a cache (a hash table) so that if the same alignment problem is detected, the solution can be directly retrieved from the cache, and no further computation is required (see supplementary table s ).
• early stopping of the alignment computation when a valid score cannot be achieved: while care is taken to produce only high-scoring chains between the read and reference, it is nonetheless the case that the majority of the chains do not lead to an alignment of acceptable quality. since the minimum acceptable alignment score is immediately known based on τ and the length of the read, the base-to-base alignment calculation can be terminated at any point where it becomes impossible for the minimum required alignment score to be obtained. this approach can be applied both during the ksw alignment calculation, and also after the alignment calculation of each gap is completed. during this procedure, for each base at position i on the read of length n, if the best alignment score up to the i-th position is $s_i$, we can calculate the maximum possible alignment score, $s_{\max}$, that might be achieved starting at this location given the current alignment score by:
$$s_{\max} = s_i + \mathrm{ms} \cdot (n - i),$$
where ms is the score assigned to each match. if $s_{\max}$ is smaller than the minimum required score for accepting the alignment, the alignment calculation can be immediately terminated, since it is already known that this anchor is not going to yield a valid alignment for this read.
• full-sensitivity banded alignment: ksw is able to perform banded alignment to make alignment calculation more efficient. in this mode, the dynamic programming matrix for the alignment problem is only filled out along the sub-diagonals out to a certain distance d away from the main diagonal. if one is guaranteed that any valid alignment must have fewer than d insertions or deletions, then the alignment must not exit these bands of the dynamic programming matrix. note that alignments with >d indels can be represented within these bands as insertions and deletions move in opposite anti-diagonal directions, but it is certainly the case that no alignment with ≤d indels can exit these bands. by calculating the maximum number of gaps (insertions or deletions) allowed in each sub-alignment problem, in a way that we are certain that any alignment having greater than this number of gaps must drop below the acceptable threshold, we utilize the banded alignment in ksw within each sub-alignment problem without losing any sensitivity with respect to non-banded alignment.
finally, once alignments have been computed for the individual ends of a read, they must be paired together to produce valid alignments for the entire fragment.
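the early-stopping bound from the list above (the best score still reachable is the current score plus one match score per remaining base) can be sketched in python as follows. for brevity the sketch scores an ungapped, position-by-position comparison rather than running the ksw dynamic program; the function names and the score values in the example are illustrative assumptions, and the reference window is assumed to be at least as long as the read.

```python
def max_reachable_score(s_i, i, n, ms):
    """Upper bound on the final score after i of n read bases are scored."""
    return s_i + ms * (n - i)


def ungapped_score_with_early_stop(read, ref, ms, mismatch_penalty, min_score):
    """Score read against ref base by base, stopping once min_score is out of reach.

    Returns (score, bases_examined); score is None if the scan stopped early.
    """
    n = len(read)
    score = 0
    for i, (a, b) in enumerate(zip(read, ref), start=1):
        score += ms if a == b else -mismatch_penalty
        if max_reachable_score(score, i, n, ms) < min_score:
            # Even if every remaining base matched, the threshold is unreachable.
            return None, i
    return score, n


print(ungapped_score_with_early_stop("ACGTACGT", "ACGTACGT", 2, 4, 12))
print(ungapped_score_with_early_stop("AAAAAAAA", "CCCCAAAA", 2, 4, 12))
```

in the second call a single mismatch at the first base already caps the reachable score below the threshold, so the remaining seven bases are never examined.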
at this point in the process, on each reference sequence, there are a number of locations where the left end of each read or the right end of each read, or both, are mapped to the reference. for the purpose of determining which mappings will be reported as a valid pair, the mappings are joined together only if they occur on opposite strands of the reference, and if they are within a maximum allowed fragment length. there are two different types of paired-end alignments that can be reported by puffaligner: concordant and discordant. if puffaligner is disallowed from reporting discordant alignments, then the mapping orientation of the left and right end should agree with the library preparation protocols of the reads. puffaligner first tries to find concordant mapping pairs on a reference sequence, and if no concordant mapping is discovered and the tool is being run in a mode where discordant mappings are allowed, then puffaligner reports pairs that map discordantly. here, discordant pairs may be pairs that do not, for example, obey the requirement of originating from opposite strands. while this is not expected to happen frequently, it may occur if there has been an inversion in the sequenced genome with respect to the reference. orphan recovery: if there is no valid paired-end alignment for a fragment (either concordant or discordant, if the latter is allowed), then puffaligner will attempt to perform orphan recovery. the term "orphan" refers to one end of a paired-end read that is confidently aligned to some genomic position, but for which the other read end is not aligned nearby (and paired). to perform orphan recovery, puffaligner examines the reference sequence downstream of the mapped read (or upstream if the mapped read is aligned to the reverse complement strand) and directly performs dynamic programming to look for a valid mapping of the unmapped read end.
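the dynamic programming search used in orphan recovery can be illustrated with a plain python "fitting" edit distance, in which the unmapped read end must be aligned in full but may start and end anywhere inside the reference window next to the mapped mate. this is a textbook formulation written for illustration only; it is neither the implementation nor the interface of any particular library, and the function name and example sequences are assumptions.

```python
def fitting_edit_distance(query, window):
    """Best edit distance of query against any substring of window.

    Returns (distance, end), where end is the exclusive end coordinate in
    window of the first best-scoring placement.
    """
    # Row 0 is all zeros: the alignment may start at any window position for free.
    prev = [0] * (len(window) + 1)
    for i, q in enumerate(query, start=1):
        cur = [i] + [0] * len(window)  # skipping i query bases costs i
        for j, t in enumerate(window, start=1):
            cur[j] = min(
                prev[j - 1] + (q != t),  # match or substitution
                prev[j] + 1,             # query base with no window partner
                cur[j - 1] + 1,          # window base with no query partner
            )
        prev = cur
    # The alignment may also end anywhere: take the minimum over the last row.
    best = min(prev)
    return best, prev.index(best)


print(fitting_edit_distance("ACGT", "TTTACGTTT"))
```

a candidate placement found this way would then be re-scored with the aligner's own scoring scheme before it is accepted as the mate's alignment.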
for this purpose, we use the "fitting" alignment functionality of edlib to perform a simple levenshtein-distance-based alignment that will subsequently be re-scored by ksw . finally, if, after attempting orphan recovery, there is still no valid paired-end mapping for the fragment, then orphan alignments are reported by puffaligner (unless the "--noorphans" flag is passed).
rp is a co-founder of ocean genomics inc.
supplementary material for "puffaligner: an efficient, and accurate aligner based on the pufferfish index" fatemeh almodaresi, mohsen zakeri, and rob patro. the formulas for calculating the metrics used for evaluating the abundance estimation results in the manuscript are as follows.
fast and accurate short read alignment with burrows-wheeler transform fast gapped-read alignment with bowtie hisat: a fast spliced aligner with low memory requirements graph-based genome alignment and genotyping with hisat and hisat-genotype star: ultrafast universal rna-seq aligner the subread aligner: fast, accurate and scalable read mapping by seed-and-vote shrimp : sensitive yet practical short read mapping personalized copy number and segmental duplication maps using next-generation sequencing mrsfast: a cache-oblivious algorithm for short-read mapping a fast approximate algorithm for mapping long reads to large reference databases minimap : pairwise alignment for nucleotide sequences near-optimal probabilistic rna-seq quantification debga: read alignment with de bruijn graph-based seed and extension read mapping on de bruijn graphs browniealigner: accurate alignment of illumina sequencing data to de bruijn graphs a space and time-efficient index for the compacted colored de bruijn graph de novo assembly and genotyping of variants using colored de bruijn graphs succinct colored de bruijn graphs rainbowfish: a succinct colored de bruijn graph representation mantis: a fast, small, and exact large-scale sequence-search index a global reference for human 
genetic variation fastp: an ultra-fast all-in-one fastq preprocessor gencode reference annotation for the human and mouse genomes upsetr: an r package for the visualization of intersecting sets and their properties alignment and mapping methodology influence transcript abundance estimation mason: a read simulator for second generation sequencing data salmon provides fast and bias-aware quantification of transcript expression towards selective-alignment: bridging the accuracy gap between alignment-based and alignment-free transcript quantification a revisit of rsem generative model and its em algorithm for quantifying transcript abundances. biorxiv polyester: simulating rna-seq datasets with differential transcript expression the genotype-tissue expression (gtex) project rsem: accurate transcript quantification from rna-seq data with or without a reference genome aligning short sequencing reads with bowtie a new coronavirus associated with human respiratory disease in china programmed ribosomal frameshifting in decoding the sars-cov genome unique epidemiological and clinical features of the emerging novel coronavirus pneumonia (covid- ) implicate special control measures probable pangolin origin of sars-cov- associated with the covid- outbreak on the origin and continuing evolution of sars-cov- sub-biocrust soil microbial communities from mojave desert, california, united states - hms. 
sequence read archive (sra) national library of medicine (us) bracken: estimating species abundance in metagenomics data using pseudoalignment and base quality to accurately quantify microbial community composition the ncbi taxonomy database the metagenomics and metadesign of the subways and urban biomes (metasub) international consortium inaugural meeting report abundant rifampin resistance genes and significant correlations of antibiotic resistance genes and plasmids in various environments revealed by metagenomic analysis the human microbiome project: a community resource for the healthy human microbiome introducing difference recurrence relations for faster semi-global alignment of long sequences edlib: a c/c++ library for fast, exact sequence alignment using edit distance
heatmap showing most popular genera over samples through three pipelines of kraken (no confidence)+bracken, kraken (confidence= . )+bracken and default puffaligner+salmon.
the authors would like to thank laraib iqbal malik for her help and suggestions for improving the manuscript. this work has been funded by r hg , nsf ccf- , and nsf cns- to r.p. the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
key: cord- - gnbe n authors: gonzález-arias, fabio; reddy, tyler; stone, john e.; hadden-perilla, jodi a.; perilla, juan r. title: scalable analysis of authentic viral envelopes on frontera date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gnbe n
enveloped viruses infect host cells via fusion of their viral envelope with the plasma membrane. upon cell entry, viruses gain access to all the macromolecular machinery necessary to replicate, assemble, and bud their progeny from the infected cell. by employing molecular dynamics simulations to characterize the dynamical and chemical-physical properties of viral envelopes, researchers can gain insights into key determinants of viral infection and propagation. 
here, the frontera supercomputer is leveraged for large-scale analysis of authentic viral envelopes, whose lipid compositions are complex and realistic. vmd with support for mpi is employed on the massively parallel computer to overcome previous computational limitations and enable investigation into virus biology at an unprecedented scale. the modeling and analysis techniques applied to authentic viral envelopes at two levels of particle resolution are broadly applicable to the study of other viruses, including the novel coronavirus that causes covid- . a framework for carrying out scalable analysis of multi-million particle md simulation trajectories on frontera is presented, expanding the utility of the machine in humanity’s ongoing fight against infectious disease.
viruses are pathogens that cause infectious diseases in living organisms. because of their lack of metabolic capabilities, viruses require the molecular machinery of a host cell to replicate [ ]. virus particles, referred to as virions, exhibit chemical and structural diversity across families. the detailed architecture of a virus determines its fitness, mechanism of infection and replication, and host cell tropism. the most basic virions consist of genomic material protected by a protein shell, called a capsid. the viral genome contains the blueprint for synthesis of the macromolecular components that will form progeny virions. depending on the virus, the genome may be encoded as rna or as dna [ ], [ ]. more complex virus structures exhibit an exterior membrane, called an envelope, composed of a lipid bilayer. fusion of the viral envelope with the plasma membrane of the host cell initiates infection. enveloped viruses typically incorporate surface glycoproteins that interact with host cell receptors to mediate adhesion and facilitate membrane fusion [ ]. 
the composition of the viral envelope, along with the specificity of the adhesion proteins, contributes to the recognition, attachment, and fusion of the pathogen with its host [ ] , [ ] . other membrane-embedded structures that may be displayed by a virion include viral ion channels, called viroporins, proteins that provide particle scaffolding or assembly support, or proteins that participate in the release of progeny virions [ ] . enveloped viruses, such as influenza a, ebola, and hiv- , account for numerous cases of infection and death in humans each year. sars-cov- , the novel coronavirus that causes covid- , is also enveloped. the sars-cov- virion incorporates a class-i fusion glycoprotein called the spike (s), a purported viroporin known as the envelope (e) protein, and a structurally essential integral membrane (m) protein [ ] . while each of these components plays a key role in the viral life cycle, it is ultimately the envelope that enables each to carry out its function successfully. as the envelope is derived from the host cell membrane, particularly from the plasma membrane or organelle in which the virion assembles, its lipid composition may be highly complex [ ] . further, the envelope may be asymmetric, or present different lipid compositions across the inner versus outer leaflets of the bilayer; membrane asymmetry is crucial to the function of most organelles and may likewise be important in the assembly or infection processes of viruses [ ] , [ ] . detailed characterization of viruses is critical to the development of new antiviral therapies [ ] . computational studies now routinely contribute to the foundation of basic science that drives innovation in pharmacy, medicine, and the management of public health. notably, molecular dynamics (md) simulations have emerged as a powerful tool to investigate viruses. 
following advances in high-performance computing, md simulations can now be applied to elucidate chemical-physical properties of large, biologically relevant virus structures, revealing insights into their mechanisms of infection and replication that are inaccessible to experiments [ ] - [ ]. in recent years, md simulations of intact enveloped virions, including influenza a, dengue, and an immature hiv- particle, have been reported. state-of-the-art lipidomics profiling of viruses has enabled some of these models to contain realistic lipid composition, leading to the first computational studies of authentic viral envelopes [ ] - [ ]. here, next-generation simulation and analysis of authentic viral envelopes is discussed, leveraging the resources of the leadership-class frontera supercomputer. a framework for deploying large-scale analysis of md simulation trajectories on frontera is presented. md simulations of membranes can be carried out at coarse-grained (cg) or atomistic levels of detail. cg models employ particles that consolidate groups of atoms. atomistic models employ discrete particles for each constituent atom, including hydrogen. due to the size and chemical complexity of authentic viral envelopes, md simulations can include millions, even hundreds of millions, of particles, particularly when the physiological solvent environment of the system is taken into account. moreover, md simulations of membranes must be performed on timescales of hundreds of nanoseconds to microseconds in order to characterize dynamical properties, such as lipid lateral diffusion and flip-flop between leaflets of the bilayer [ ], [ ]. cg models reduce computational expense by eliminating degrees of freedom, enabling the exploration of extended simulation timescales; however, the loss of detail versus atomistic models reduces simulation accuracy. 
an approach that combines the strengths of both methods involves building and simulating a cg model, which reproduces essential features of the membrane and allows equilibration of lipid species [ ] , then backmapping the cg representation to an atomistic model for further study [ ] , [ ] . the backmapping process can be precarious and depends on the molecular mechanics force field and energy minimization scheme to resolve structural artifacts. either way, the incredible number of particle-particle interactions that must be computed to reproduce the behavior of a membrane bilayer over extended simulation timescales, especially for complex, large-scale systems like viral envelopes, presents a significant computational challenge [ ] . furthermore, the amount of data generated by md simulations of intact viral envelopes is tremendous, both in terms of chemical intricacy and storage footprint. md simulation trajectories, which may range from tens to hundreds of terabytes in size, fail to provide new information about viruses until they are analyzed in painstaking detail. the magnitude of trajectory files and the sheer numbers of particles that must be tracked during analyses necessitates massively parallel computational solutions. it follows that such large datasets cannot be feasibly transferred to other locations for analysis or visualization, and interaction with the data to yield scientific discoveries must be conducted on the same high-performance computing resource used to run the simulation in the first place. frontera is a leadership-class petascale supercomputer, funded by the national science foundation, and housed at the texas advanced computing center at the university of texas at austin. it is ranked as the fifth most powerful supercomputer in the world, and the fastest on any university campus. frontera provides an invaluable computational resource for furthering basic science research of large biological systems, including enveloped viruses. 
frontera comprises multiple computing subsystems. the primary partition is cpu-only and consists of , compute nodes powered by intel xeon platinum (clx) processors; each node provides cores ( cores per socket) and gb of ram. the large memory partition includes additional compute nodes with cores ( cores per socket) and memory upgraded to . tb nvdimm. another partition is a hybrid cpu/gpu architecture, consisting of compute nodes powered by intel xeon e - v processors and nvidia quadro rtx gpus; each node provides cores ( cores per socket), four gpus, and gb of ram. all frontera nodes contain gb solidstate drives and are interconnected with mellanox hdr infiniband. the longhorn subsystem consists of hybrid cpu/gpu compute nodes powered by ibm power processors and nvidia tesla v gpus; each node provides cores ( cores per socket), four gpus, gb of ram, and gb of local storage. eight additional nodes have ram upgraded to gb to support memory-intensive calculations. longhorn is interconnected with mellanox edr infiniband. frontera has a multi-tier file system and provides pb of lustre-based storage shared across nodes. the ability to store massive amounts of data and analyze this data in parallel enables the investigation of biological systems at an unprecedented scale. researchers gain access, not only to the computational capability to perform md simulations of millions of particles, but also to the capability to extract more complex and comprehensive information from their simulation trajectories, leading to deeper discoveries relevant to the advancement of human health. frontera's lustre filesystem is crucial for the performance of large-scale biomolecular analysis. lustre decomposes storage resources among a large number of so-called object store targets (osts) that are themselves composed of moderate sized high-performance raid-protected disk arrays. 
in much the same way that a raid- array can stripe a file over multiple disks, lustre can stripe files over multiple osts. this second level of striping over osts is particularly advantageous when working with very large files, since i/o operations at different file offsets can be serviced in parallel by multiple independent osts, according to the stripe count and stripe size set by the user, or by system defaults. figure a illustrates how an md simulation trajectory (and potentially even individual frames) can be striped over osts. figure b demonstrates the scalability of multi-million particle biomolecular analysis on frontera when using the lustre filesystem. visual molecular dynamics (vmd) is a widely used software package for biomolecular visualization and analysis [ ]. commonly utilized as a desktop application to set up md simulations and interact with trajectory data, vmd exploits multi-core cpus, cpu vectorization, and gpu-accelerated computing techniques to achieve high performance on key molecular modeling tasks. vmd can also be used in situ on massively parallel computers to perform large-scale modeling tasks, exploiting computing and i/o resources that are orders of magnitude greater than those available on even the most powerful desktop workstations. mpi implementations of vmd have already been employed on supercomputers to enable novel investigation of large virus structures, such as the capsid of hiv- [ ]. to facilitate large-scale molecular modeling pipelines, vmd incorporates support for distributed memory message passing with mpi, a built-in parallel work scheduler with dynamic load balancing, and easy-to-use scripting commands, enabling large-scale parallel execution of molecular visualization and analysis of md simulation trajectories. vmd's tcl and python scripting interfaces provide a user-friendly mechanism to distribute and schedule user-defined work across mpi ranks, to synchronize workers, and to gather results. 
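returning to the file striping described earlier in this passage, the way a stripe count and stripe size spread file offsets over osts can be sketched with simple round-robin arithmetic. this is a simplified model written for illustration: real lustre layouts start at an ost chosen by the filesystem and may use more elaborate layouts, and the stripe size and stripe count in the example are hypothetical values rather than the settings used in this work.

```python
def ost_for_offset(offset, stripe_size, stripe_count):
    """Index (0-based, within the file's layout) of the OST holding a byte offset."""
    return (offset // stripe_size) % stripe_count


def osts_touched(start, length, stripe_size, stripe_count):
    """Sorted layout indices of all OSTs that serve a read of `length` bytes at `start`."""
    if length <= 0:
        return []
    first = start // stripe_size
    last = (start + length - 1) // stripe_size
    if last - first + 1 >= stripe_count:
        # The request spans at least one full round, so every OST is involved.
        return list(range(stripe_count))
    return sorted({k % stripe_count for k in range(first, last + 1)})


MIB = 1 << 20
print(ost_for_offset(5 * MIB, MIB, 4))
print(osts_touched(3 * MIB + MIB // 2, 2 * MIB, MIB, 4))
```

under this model, two analysis ranks reading trajectory frames at distant file offsets will usually be served by different osts, which is what allows their i/o to proceed in parallel.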
this approach allows user-written modeling and analysis scripts to be readily adapted from existing scripts and protocols that have been developed previously on local computing resources, allowing robust tools to be deployed at large scale, often with few changes. vmd exploits node-level hardware-optimized cpu and gpu kernels, all of which are also used when running on distributed memory parallel computers. computationally demanding vmd commands are executed by the fastest available hardware-optimized code path, with a general purpose c++ implementation as the standard fall-back. when run in parallel, the i/o operations of each vmd mpi rank are independent of each other, allowing i/o intensive trajectory analysis tasks to naturally exploit parallel filesystems. vmd supports i/o-efficient md simulation trajectory formats that have been specifically designed for so-called burst buffers and flash-based high-performance computing storage tiers, with emerging gpu-direct storage interfaces achieving i/o rates of up to gb/sec on a single gpu-dense compute node, and conventional parallel i/o rates approaching tb/sec. vmd startup scripts can be customized to run one or multiple mpi ranks per node, while avoiding gpu sharing conflicts or other undesirable outcomes, allowing compute, memory, and i/o resources to be apportioned among mpi ranks, and thereby best utilized for the task at hand. to facilitate large-scale simulation and analysis of biological systems containing millions of particles, vmd has been compiled on frontera with mpi enabled. frontera's capacity for massively parallel investigation of intact, authentic viral envelopes is demonstrated by application to envelope models at two levels of particle resolution. lipidomics profiling by mass spectrometry has established the lipid composition of the hiv- viral envelope [ ], [ ]. based on this experimental data, a cg model of an authentic hiv- envelope was constructed (fig. a). 
the model exhibits a highly complex chemical makeup, with different lipid species and asymmetry across the inner and outer leaflets of the membrane bilayer. the envelope is nm in diameter and, with solvent, comprises million cg beads, represented as martini [ ] particles. the cg model was equilibrated using gromacs . [ ] , and its dynamics were investigated over a production simulation time of over five microseconds. the md simulation system, complete with solvent environment, is shown in figure b . all-atom md simulations are the most accurate classical mechanical approach that can be applied to study largescale biological systems [ ] . following simulation of the cg hiv- envelope model, which facilitated equilibration of lipid distributions over an extended simulation timescale, an atomistic model was constructed based on backmapping. the equilibrated cg model was mapped to an atomistic representation using the backward tool [ ] , which enables transformation of cg systems built with martini [ ] . the approach utilizes a series of mapping files that contain structural information and geometric restrictions relevant to the modeling of each lipid species. local geometries of atomistic lipids are reconstructed, taking into account stoichiometry and stereochemistry, to project the cg configuration into an atomistic configuration. an example of an atomistic representation of a lipid, along with its chemical structure, is given in figure c . the final atomistic model of an authentic hiv- viral envelope comprises over million atoms including solvent and ions. to complete the backmapping process, the system must be relaxed to a local energy minimum to resolve structural artifacts introduced by the cg to all-atom transformation. transverse diffusion of lipids occurs when they flip-flop between leaflets of the membrane bilayer. cellular enzymes can catalyze this movement of lipids, or it can occur spontaneously over longer timescales. 
computational approaches to characterize transverse diffusion in heterogeneous lipid mixtures have been described previously and indicate that flip-flop occurs at relatively slow rates [ ], [ ]. current strategies to measure flip-flop events are based on tracking the translocation and reorientation of lipid headgroups (i.e., colored beads in fig. a). a feature released in vmd . . by the authors (measure volinterior) [ ] facilitates tracking of headgroup dynamics by providing rapid classification of their locations in the inner versus outer leaflets of the bilayer. the method equips vmd with a mechanism to identify the inside versus outside of a biomolecular container. if the container surface is specified as the region of the bilayer composed of lipid tail groups (i.e., silver in fig. a), vmd can detect the presence of lipid headgroups in the endoplasmic versus exoplasmic leaflets. by tracking the number of headgroups that flip-flop from one leaflet to the other during intervals of simulation time, rates of transverse diffusion can be readily calculated [ ]. frontera was leveraged to measure transverse lipid diffusion in the cg model of the authentic hiv- viral envelope using measure volinterior in vmd compiled with mpi support. the entire . µs of md simulation, comprising , frames with a storage footprint of gb, was used for the calculation. to obtain high i/o performance, the trajectory files were set to a mb stripe width, striping over a total of lustre osts. the analysis was run on frontera's primary compute partition, utilizing intel clx cores per node. flip-flop events for the lipid dpsm were found to occur at a rate of . × − ± . × − molecules per nanosecond from the exoplasmic to endoplasmic leaflet, and . × − ± . × − molecules per nanosecond from the endoplasmic to exoplasmic leaflet (fig. a).
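the counting step behind such a rate can be illustrated with a minimal python sketch. it is not the tcl/vmd workflow used in the study: it assumes that each lipid has already been assigned a leaflet label per frame (for example by a classification such as measure volinterior), and the function names, the "exo"/"endo" labels, and the ns_per_frame parameter are all hypothetical. it reports a single overall rate rather than the per-interval mean and spread quoted above.

```python
from typing import List, Sequence, Tuple


def count_flip_flops(leaflets_by_frame: Sequence[Sequence[str]]) -> Tuple[int, int]:
    """Count leaflet changes between consecutive frames.

    Each frame holds one label per lipid, "exo" or "endo" (hypothetical labels).
    Returns (exo_to_endo, endo_to_exo).
    """
    exo_to_endo = 0
    endo_to_exo = 0
    for previous, current in zip(leaflets_by_frame, leaflets_by_frame[1:]):
        for before, after in zip(previous, current):
            if before == "exo" and after == "endo":
                exo_to_endo += 1
            elif before == "endo" and after == "exo":
                endo_to_exo += 1
    return exo_to_endo, endo_to_exo


def flip_flop_rates(
    leaflets_by_frame: Sequence[Sequence[str]], ns_per_frame: float
) -> Tuple[float, float]:
    """Convert event counts into molecules per nanosecond for each direction."""
    elapsed_ns = (len(leaflets_by_frame) - 1) * ns_per_frame
    if elapsed_ns <= 0:
        raise ValueError("need at least two frames and a positive frame spacing")
    exo_to_endo, endo_to_exo = count_flip_flops(leaflets_by_frame)
    return exo_to_endo / elapsed_ns, endo_to_exo / elapsed_ns


# Toy input: three lipids over three frames, 0.5 ns apart (made-up values).
frames: List[List[str]] = [
    ["exo", "exo", "endo"],
    ["endo", "exo", "endo"],
    ["endo", "exo", "exo"],
]
inward_rate, outward_rate = flip_flop_rates(frames, ns_per_frame=0.5)
```

splitting the frame list into intervals and calling flip_flop_rates on each would give per-interval rates from which a mean and standard deviation could be taken.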
because inward versus outward diffusion of this lipid species is not occurring at the same rate, the calculation reveals the process of lipid distributions being equilibrated across the asymmetric bilayer over the course of the simulation. the tcl framework for performing the calculation with vmd, the vmd startup script, and the slurm job submission script used to run the calculation on frontera, are given in the supplementary information. figure b shows the wall-clock time for the transverse diffusion calculation with one vmd mpi rank per node and two vmd mpi ranks per node. by running multiple vmd mpi ranks per node, i/o operations are more effectively overlapped with computations, thereby achieving better overall i/o and compute throughput both per node and in the aggregate, leading to an overall performance gain. it is worth noting, however, that finite per-node memory, gpu, and interconnect resources ultimately restrict the benefit of using multiple mpi ranks per node to achieve better compute-i/o overlap. ongoing vmd developments aim to improve compute-i/o overlap both for cpus and for gpu-accelerated molecular modeling tasks through increased internal multithreading of trajectory i/o to allow greater asynchrony and decoupling of i/o from internal vmd computational kernels. the atomistic model of an authentic hiv- viral envelope presented here was derived via backmapping from a cg model. a major limitation of the backmapping approach involves structural artifacts introduced by the cg to all-atom transformation. figure a shows an example of such structural artifacts in the lipid dpsm, including distorted bond lengths and angles. the current strategy for remedying these artifacts is to subject the backmapped model to molecular mechanics energy minimization, relaxing the geometry of each molecule, and the system as a whole, to a local energy minimum.
figure b shows an example of a post-minimization structure in which the distortions have been successfully resolved. energy minimization is a common procedure applied to biomolecular systems to eliminate close contacts and nonideal geometries prior to the start of md simulations. the energy of atomistic systems is described as an empirical potential energy function v(r_i), where the terms depend on the positions r of i atoms [ ]. here, energy minimization of the system is performed on frontera using namd . , an md engine based on c++ and charm++ that is amenable to large systems by virtue of being highly scalable on massively parallel computers [ ], [ ]. energy minimization is implemented in namd using the conjugate gradient algorithm [ ] for maximum performance. the atomistic model of an authentic hiv- viral envelope was subjected to energy minimization based on the charmm force field, with all the latest corrections in parameters for lipids and heterogeneous protein-lipid systems [ ]. figure c diagrams the complete procedure for constructing an atomistic envelope model from an initial cg model. following backmapping of the cg representation to an all-atom representation, explicit ions must be added to achieve electrostatic neutralization of the system, given the numerous charged lipid headgroups present in the envelope. coordinate and topology files are generated using the js plugin in vmd, utilizing a binary file format that overcomes the fixed-column limitation of pdb files. the minimization of multi-million atom systems can occasionally cause memory overflow or significant loss of performance in namd, due to the i/o, even on large-scale computational resources. as a remedy, the memory-optimized version of namd implements parallel i/o and a compressed topology file, reducing the memory requirements for large biomolecular systems.
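to make the conjugate gradient idea concrete, the following is a minimal python sketch of a nonlinear conjugate gradient minimizer applied to a single harmonic bond. it is not namd's implementation: the polak-ribière update, the backtracking line search, the restart rule, and the toy force constant and equilibrium length are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient_minimize(
    energy: Callable[[Vector], float],
    gradient: Callable[[Vector], Vector],
    start: Vector,
    max_steps: int = 500,
    tolerance: float = 1e-8,
) -> Vector:
    """Nonlinear conjugate gradient with a backtracking (Armijo) line search."""
    x = list(start)
    g = gradient(x)
    d = [-gi for gi in g]  # first search direction: steepest descent
    for _ in range(max_steps):
        if math.sqrt(dot(g, g)) < tolerance:
            break
        slope = dot(g, d)
        if slope >= 0.0:  # not a descent direction: restart from -gradient
            d = [-gi for gi in g]
            slope = dot(g, d)
        e0 = energy(x)
        step = 1.0
        trial = [xi + step * di for xi, di in zip(x, d)]
        # Halve the step until the energy decreases sufficiently.
        while energy(trial) > e0 + 1e-4 * step * slope and step > 1e-14:
            step *= 0.5
            trial = [xi + step * di for xi, di in zip(x, d)]
        if step <= 1e-14:
            break  # line search stalled; keep the last accepted coordinates
        x = trial
        g_new = gradient(x)
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))  # Polak-Ribiere, clipped at 0
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


# Toy system (assumed values): two particles on a line joined by a harmonic bond.
K_BOND = 100.0  # force constant, arbitrary units
R_EQ = 1.5      # equilibrium separation, arbitrary units


def bond_energy(x: Vector) -> float:
    return 0.5 * K_BOND * ((x[1] - x[0]) - R_EQ) ** 2


def bond_gradient(x: Vector) -> Vector:
    force = K_BOND * ((x[1] - x[0]) - R_EQ)
    return [-force, force]


# Start from a stretched bond (separation 3.0) and relax it.
relaxed = conjugate_gradient_minimize(bond_energy, bond_gradient, [0.0, 3.0])
```

on this one-bond toy the search direction is frequently reset to steepest descent, so the sketch shows the bookkeeping of the method rather than its performance advantage on large systems.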
following initial conjugate gradient minimization with namd, the model is assessed for major contributors to energetic instability, including structural clashes, close contacts, bond lengths or angles significantly exceeding equilibrium values, and unfavorable dihedral configurations. such assessments can be rapidly accomplished using vmd compiled with mpi support on frontera, using existing vmd commands like measure contacts, measure bond, measure angle, and measure dihed. in a second iteration of minimization, regions of the system that are relatively stable are restrained in order to focus further minimization efforts on regions that remain the most energetically unfavorable. useful namd commands for minimization of unstable systems include mintinystep and minbabystep, which control the magnitude of steps for the line minimizer algorithm for initial and further steps of minimization, respectively. this process is repeated until all geometric distortions are resolved, and the atomistic model has been driven to a local energy minimum, representative of the stable biological system at equilibrium. increased computational capability presents the opportunity for researchers to push the envelope and extend the bounds of scientific knowledge. building, simulating, and analyzing models of intact, authentic viral envelopes, particularly at atomistic detail, is a relatively new research endeavor under active development. the leadership-class frontera supercomputer provides a powerful resource to support the continued growth of this field. the combination of frontera's state-of-the-art, massively parallel architecture and high-performance software applications, such as vmd, that can exploit it will enable researchers to make discoveries relevant to the treatment and prevention of disease at an unprecedented scale.
all of the techniques discussed here are applicable to modeling and investigation of other enveloped viruses, including sars-cov- , which likewise comprises a complex lipidome. owing to resources like frontera, the computational biological sciences will play an increasingly important role in driving future innovations that improve human health and longevity. b atomistic representation of dpsm post-minimization with structural artifacts resolved. cg representation of the lipid is superimposed on the relaxed all-atom structure. c iterative relaxation process for large-scale authentic atomistic viral envelopes. following backmapping from cg representations to all-atom, the system is subjected to charge neutralization, followed by generation of topology and coordinate files. conjugate gradient minimization using namd, in combination with parallel analysis using vmd mpi, is employed iteratively to eliminate structural artifacts of the membrane and drive the system to a local minimum. this work is supported by national science foundation award mcb- , funded in part by the delaware established program to stimulate competitive research (epscor). this research is part of the frontera computing project at the texas advanced computing center. frontera is made possible by national science foundation award oac- . vmd development has been supported by national institutes of health grant p -gm . this research used resources provided by the los alamos national laboratory institutional computing program, which is supported by the u.s. department of energy national nuclear security administration under contract no. cna .
cold spring harbor perspectives in biology dna viruses: the really big ones (giruses) rna viruses: rna roles in pathogenesis, coreplication and viral load common features of enveloped viruses and implications for immunogen design for next-generation vaccines viral membrane fusion modulation of entry of enveloped viruses by cholesterol and sphingolipids relevance of viroporin ion channel activity on viral replication and pathogenesis genome composition and divergence of the novel coronavirus ( -ncov) originating in china coronavirus membrane fusion mechanism offers as a potential target for antiviral development lipid bilayer asymmetry dynamic transbilayer lipid asymmetry antiviral therapy molecular dynamics simulations of large macromolecular complexes multiscale modelling and simulation of viruses all-atom virus simulations computational virology: from the inside out nothing to sneeze at: a dynamic and integrative computational model of an influenza a virion the role of the membrane in the structure and biophysical robustness of the dengue virion envelope mesoscale all-atom influenza virus simulations suggest new substrate binding mechanism multiscale computer simulation of the immature hiv- virion computational modeling of realistic cell membranes coarse-grained modeling of lipids reconstruction of atomistic details from coarse-grained structures going backward: a flexible geometric approach to reverse transformation from coarse grained to atomistic models validating lipid force fields against experimental data: progress, challenges and perspectives vmd -visual molecular dynamics physical properties of the hiv- capsid from all-atom molecular dynamics simulations the hiv lipidome: a raft with an unusual composition comparative lipidomics analysis of hiv- particles and their producer cell membrane in different cell lines the martini force field: coarse grained model for biomolecular simulations gromacs . 
: a high-throughput and highly parallel open source molecular simulation toolkit rapid flipflop motions of diacylglycerol and ceramide in phospholipid bilayers cholesterol flipflop in heterogeneous membranes high-performance analysis of biomolecular containers to measure small-molecule transport, transbilayer lipid diffusion, and protein cavities charmm: a program for macromolecular energy, minimization, and dynamics calculations scalable molecular dynamics with namd parallel programming using c; wilson, gv; lu iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients update of the charmm all-atom additive force field for lipids: validation on six lipid types key: cord- -tu znm authors: le sage, valerie; jones, jennifer e.; kormuth, karen a.; fitzsimmons, william j.; nturibi, eric; padovani, gabriella h.; arevalo, claudia p.; french, andrea j.; avery, annika j.; manivanh, richard; mcgrady, elizabeth e.; bhagwat, amar r.; lauring, adam s.; hensley, scott e.; lakdawala, seema s. title: pre-existing immunity provides a barrier to airborne transmission of influenza viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tu znm human-to-human transmission of influenza viruses is a serious public health threat, yet the precise role of immunity from previous infections on the susceptibility to airborne viruses is still unknown. using human seasonal influenza viruses in a ferret model, we examined the roles of exposure duration and heterosubtypic immunity on influenza transmission. we found that airborne transmission of seasonal influenza strains is abrogated in recipient animals with pre-existing non-neutralizing immunity, indicating that transmissibility of a given influenza virus strain should be examined in the context of ferrets that are not immunologically naïve. 
airborne transmission is essential for the epidemiological success of human influenza a virus (iav), which imposes a significant seasonal public health burden. every influenza season is different, with one virus subtype (h n or h n ) typically dominating and factors such as age, pregnancy and pre-existing medical conditions putting people at increased risk of severe influenza infection. during the - h n predominant season, roughly , people died in the united states, which is more than the number who died during the h n pandemic ( , ). in the - h n epidemic, % of the cases were in the elderly ( +); in contrast, during the h n pandemic the highest burden of infection was found in individuals - years of age ( %) ( , ). this age-based discrepancy in iav burden suggests that pre-existing immunity could impact the susceptibility to iav infection, since people of different age groups are exposed to different strains of iav in early childhood ( ) ( ) ( ) ( ). an individual's first influenza infection typically occurs before the age of ( ), and individuals are repeatedly infected with influenza viruses during their lifetime. the antibody response to a person's first influenza infection will be boosted upon infection with each subsequent antigenically distinct influenza virus strain in a process classically referred to as 'original antigenic sin' ( ). the impact of pre-existing immunity on the spread of influenza viruses has been understudied, and the few reports in this area have found that pre-existing immunity against heterologous or homologous strains surprisingly protects animals against all subsequent influenza virus infections ( , ). in these studies, steel et al demonstrated that pre-existing seasonal h n or h n immunity in recipient guinea pigs reduced transmission of the h n pandemic virus ( ) and houser et al observed that pre-existing seasonal h n immunity in donor ferrets prevented transmission of emerging swine h n v virus ( ).
however, these observations do not mimic human epidemiology, since individuals can become infected with influenza virus many times, suggesting that these animal models may need to be revised to more accurately represent human transmission. the majority of published transmission studies use days of continuous exposure in immune naïve animals ( ) ( ) ( ) ( ) and a time interval of - weeks between primary and secondary infection ( , ) . to address these transmission parameters, we examine the role of timing of exposure and pre-existing immunity to address the barriers to transmission and provide a comprehensive comparison of a seasonal h n virus and the pandemic h n virus (h n pdm ). we used ferrets for respiratory droplet transmission because they are naturally susceptible to human isolates of iav and can transmit infectious iav particles through the air ( , , ) . we propose that transmissibility of emerging influenza viruses be assessed in immune ferrets and at short exposure times. in addition, our results indicate that pre-existing immunity provides different barriers for h n and h n pdm virus transmission, suggesting that pre-existing immunity can drive susceptibility to heterosubtypic infections. to investigate the constraint of exposure time on transmission, we examined a representative human seasonal h n virus (a/perth/ / ) and the most recent iav pandemic h n pdm virus (a/ca/ / ). airborne transmission of these two viruses to naïve ferrets was performed continuously for or days, as well as periodically for hours a day for consecutive days ( fig a) . these times were used to mimic human exposure conditions. h n pdm transmitted by respiratory droplets to % of all naïve recipients at all exposure times. donor ferrets shed from days to post-infection (fig b-d , red bars), while recipient ferrets had a wider range of shedding (fig b-d, blue bars) . 
shorter exposure times caused a delay in detectable h n pdm virus in recipient nasal secretions consistently after day post-exposure (fig c and d , blue bars). this observation is consistent with the highly transmissible nature of this virus in multiple transmission systems ( , ) . the h n virus replicated efficiently in donor ferrets on days , , and (fig e-g, green bars) . h n transmitted to / naïve animals after a -day exposure time with slightly delayed shedding kinetics on days to post-infection ( fig e, green bars) , as compared to ferrets infected with h n pdm ( fig b, red bars) . after a h exposure, h n transmitted to / naïve recipients ( fig f) and / in a second independent replicate (fig s ) , with recipient animals shedding on days , and post-infection (fig f and s , orange bars). at intermittent exposure times transmission of h n virus was slightly reduced to / naïve recipients, which we still consider to be efficient airborne transmission. taken together, these results indicate that seasonal influenza viruses are highly transmissible to naïve recipients. shaded gray bars depict exposure times. three donor ferrets were infected with a/ca/ / (h n ) (b-d) or a/perth/ / (h n ) (e-g) and a naïve recipient ferret was placed in the adjacent cage at hour post-infection for (b and e) or (c and f) continuous days, or hours a day for consecutive days (d and g). nasal washes were collected from all ferrets on the indicated days and each bar indicates an individual ferret. limit of detection is indicated by dashed line. viral titers for donor animals in part g have previously been published in ( ) . pre-existing immunity to heterologous strains impacts airborne transmission of influenza viruses at short exposure times. as demonstrated in figure , ferrets are naturally susceptible to human h n pdm and h n influenza virus infections. 
to develop a model to mimic the yearly seasonality of influenza infections, we waited between and days between infections to allow for the primary immune response to wane, as shown by others ( ) ( ) ( ) ( ), and result in two robust infections. six ferrets were infected with h n virus and then three of those animals were experimentally infected at days post-infection with h n pdm virus (herein referred to as 'h -h inf') (fig a). twenty-four hours after this secondary infection, the three remaining ferrets with h pre-existing immunity (table s ) were each placed in an adjacent cage to act as the recipient animal (herein referred to as 'h -imm recipients') and exposed for days (fig a). an -day gap produced a robust infection of the h n pdm virus in experimentally inoculated h -h inf ferrets with virus detected in ferret nasal secretions on multiple days (fig b). we observed that transmission of h n pdm to recipient animals was not significantly impacted by pre-existing h n virus immunity as / h -imm recipients became infected after a -day exposure period (fig b). interestingly, the duration and kinetics of h n pdm shedding were different between naïve animals and h -imm animals (fig c vs fig b). in a complementary study, ferrets with pre-existing h n pdm immunity were experimentally infected with h n virus to act as donors (herein referred to as 'h -h inf') in a subsequent transmission experiment days later (fig a). robust replication of h n in h n pdm immune donor (h -h inf) animals was observed (fig c, green bars) and a recipient ferret with pre-existing h n pdm immunity (herein referred to as 'h -imm recipient') was placed in the adjacent cage h post-infection. the recipient animals were exposed to the h -h inf donor animals for continuous days. surprisingly, no virus was detected in the nasal secretions of the h -imm recipients (fig c, orange bars) and no seroconversion was observed on day post-exposure (table s ).
we previously demonstrated that h n was able to transmit to out of recipients with no prior immunity ( figure f and s ). in comparison to this efficient transmission, pre-existing h n pdm immunity provides a complete block to h n transmission during a day exposure window (fig c) . to discern whether donor or recipient immunity was critical for the blockade in h n virus transmission, we examined the spread of h n virus to either h -imm donors or recipients. in figure d , h -h inf donors transmitted the virus to / recipients without prior immunity ( %) (fig d, orange bars) . this h n transmission efficiency is reduced as compared to % ( / ) in animals without prior immunity (fig f and s ), indicating that donor immunity may partially contribute to a barrier in h n transmission. in contrast, / h -imm recipients became infected upon a -day exposure to ferrets infected with h n virus as their primary infection (fig e, orange bars), which suggests that heterosubtypic immunity in recipients is sufficient to block h n virus transmission. taken together, these results indicate that airborne transmission of h n cannot overcome the barrier imposed by h n pdm immunity but transmission of h n pdm can overcome h n immunity even within short exposure times. barrier to h n airborne transmission is independent of neutralizing antibodies. to elucidate the difference in protection provided by h n pdm versus h n immunity, we first compared the shedding of the naïve and immune donor ferrets. higher viral loads in the donor animals might be expected to result in more efficient transmission, yet they do not always correlate with each other ( ) . direct comparison of the shedding kinetics between transmission experiments was not possible as the h n pdm and h n nasal washes were titered at different times on different cell lines. 
however, comparison of viral rna levels in nasal washes of h or h infected animals revealed similar levels between these two viral infections during both primary and secondary infections (fig s a-b). these results show that the block in h n transmission was not due to significantly decreased viral shedding by ferrets with pre-existing h n pdm immunity. to determine whether the h -imm recipient ferrets were generating cross-protective h neutralizing antibodies, we performed neutralization assays with sera from animals after primary or secondary infection. sera from the h -imm and h -h inf animals exhibited robust h n neutralizing antibodies that persisted from day post infection/exposure and appeared to increase slightly upon heterologous challenge (fig a). sera from the h -imm recipient animals had no detectable cross-reacting neutralizing antibodies against h n pdm (fig b), with the h n pdm virus transmitting efficiently to / h -imm recipient animals (fig b). similarly, sera from h -imm ferrets produced strong h n pdm neutralizing antibody titers that waned slightly over days and increased slightly upon the h transmission experiment (fig d). although the h -imm recipient animals were protected against h transmission, they had no detectable cross-reacting neutralizing antibodies against h (fig c, orange line). at day post-infection, only the experimentally infected animals (h -h inf) had neutralizing antibody titers against h (fig c, green line). these data indicate that the block in h n airborne transmission by h n pdm immunity is independent of cross-reactive neutralizing antibodies in either the donor or recipient. airborne transmission is critical for emergence of pandemic viruses and we demonstrate that epidemiologically successful human seasonal and pandemic influenza viruses transmit to naïve recipients efficiently even at shortened or intermittent exposure times.
in humans, primary influenza infections typically occur by about years of age, which initiates innate and adaptive immune responses resulting in immunological memory. thus, the spread of pandemic and seasonal iav occurs in a population with significant pre-existing immunity ( ). we demonstrate that pre-existing immunity can influence airborne transmissibility of iav. specifically, transmission of seasonal h n virus is abrogated in animals with pre-existing h n pdm immunity that was non-neutralizing. conversely, h n immunity did not significantly impair h n pdm transmission. importantly, recipient immunity is sufficient to produce a complete block in h n transmission through an immune mechanism that does not rely on cross-neutralizing antibodies. transmissibility of a given influenza virus relies on viral replication in the upper respiratory tract, release of virus into the air, survival of the virus within the environment, and successful replication in a susceptible recipient. in the ferret animal model, transmission studies typically house a naïve recipient animal adjacent to an infected donor for consecutive days ( ) ( ) ( ) ( ) and very few studies have altered this experimental animal model. in this study, we examined the transmissibility of seasonal h n and h n pdm viruses at shortened exposure times and found efficient transmission after a h continuous exposure and intermittent exposure of h a day for days, mimicking a work week. studies examining transmission of h n pdm have demonstrated transmission to naïve ferrets after a h exposure to as short as a h exposure ( , ), which is consistent with our observations with h n pdm . however, it is important to note that the transmission system utilized by koster et al ( ) had considerably lower air flow rates, liters per minute compared to cubic ft. per minute (fig s ).
interestingly, transmissibility of h n pdm within a h exposure window was efficient if the naïve recipients were exposed to infected ferrets early ( - days) as opposed to late ( - days) post-infection ( ), suggesting that transmission efficiency is also linked to an exposure window. the primary correlate of protection against influenza virus infections has been mapped to antibodies against the two antigenic determinants on the surface of the virus particle, the glycoproteins hemagglutinin (ha) and neuraminidase (na) ( ). in this study, we also demonstrate that heterosubtypic immunity can provide protection from transmission of h n virus. broadly protective immune responses can be generated between ha subtypes ( , , ( ) ( ) ( ) ( ) ( ) ( ) , but the lack of detectable h n neutralizing antibodies in the serum of donor and recipient animals suggests that these broadly cross-reactive neutralizing antibodies are not the source of protection. sub-neutralizing concentrations of antibodies that recognize the stalk portion of ha have been shown to limit influenza disease through antibody-dependent cell mediated cytotoxicity (adcc) ( ) ( ) ( ) ( ). alternatively, non-neutralizing antibodies against the influenza virus conserved antigens, np, m and m , might be exerting a protective function through adcc, antibody-dependent phagocytosis or antibody-mediated complement-dependent cytotoxicity ( ). heterosubtypic immunity in the recipients provided a robust barrier against infection with seasonal h n , but not h n pdm . this difference may be due to the immunogenicity of internal h n pdm antigens in ferrets compared to h n virus. future examination of cross-reactive adaptive immune responses between h n pdm and h n will further our understanding of influenza virus correlates of protection.
finally, heterosubtypic immunity in experimentally infected animals limited disease severity, but not viral shedding in nasal secretions, which may represent a system to study asymptomatic transmission. in the last years, two pandemic viruses have emerged in the human population: the pandemic h n virus and sars-cov- . both of the viruses are efficiently transmitted from person to person via expelled aerosols. during the h n influenza pandemic over , individuals died of the infection in the united states, with % of those deaths in patients - years old ( ). recent analyses correlating birth year to imprinted virus would suggest that a large proportion of those individuals would have been imprinted with a seasonal h n virus ( ) and our data demonstrate that pre-existing immunity against h n virus was not a significant barrier to natural infection with h n pdm virus. in contrast, in - a drifted h n predominated for which the vaccine was not efficacious, but individuals aged - had the lowest number of infections in that year ( ) and would include a large number of individuals imprinted with h n pdm virus. previous studies in ferrets have also demonstrated that pre-existing h n pdm immunity days prior allowed for rapid clearance of an antigenically distinct swine influenza virus and demonstrated reduced disease severity compared to naïve ferrets ( ). our data support this idea as immune imprinting with h n pdm is superior to h n imprinting to protect from infection with heterologous viruses. although the immunological mechanism underlying this phenotype will require further studies, translation of these results to the current covid- pandemic may be important to understand age-based distributions of sars-cov- disease severity and susceptibility. cells and viruses. mdck (madin darby canine kidney, obtained from atcc) and mdck siat cells (kind gift from dr. stacy schultz-cherry at st. 
jude) were grown at °c in % co in mem medium (sigma) containing % fetal bovine serum (fbs, hyclone), penicillin/streptomycin and l-glutamine. reverse genetics plasmids of a/perth/ / and a/california/ / were a generous gift from dr. jesse bloom (fred hutch cancer research center, seattle) and were rescued as previously described in ( ). the viral titers were determined by tissue culture infectious dose (tcid ) using the endpoint titration method on mdck cells for h n pdm and mdck siat cells for h n ( ). ferret transmission experiments were conducted at the university of pittsburgh in compliance with the guidelines of the institutional animal care and use committee. five to six month old male ferrets were purchased from triple f farms (sayre, pa, usa). all ferrets were screened for antibodies against circulating influenza a and b viruses by hemagglutination inhibition assay, as described in ( ). transmission studies. our transmission caging setup is a modified allentown ferret and rabbit bioisolator cage similar to those used in ( , ). additional details on the caging setup can be found in fig s . for each study, three to four ferrets were anesthetized by isoflurane and inoculated intranasally with tcid / ul of a/perth/ / or a/california/ / ; these served as the donor (or 'inf') animals. twenty-four hours later, a recipient ferret was placed into the cage but separated from the donor animal by two staggered perforated metal plates welded together one inch apart. nasal washes were collected from each donor and recipient every other day for days. to prevent accidental contact or fomite transmission by investigators, the recipient ferret was handled first, and extensive cleaning of all chambers, biosafety cabinet, and temperature monitoring wands was performed between each recipient and donor animal and between each pair of animals, which also included glove and anesthesia chamber changes.
sera were collected from donor and recipient ferrets upon completion of the experiment to confirm seroconversion. environmental conditions were monitored daily and ranged between - °c with - % relative humidity. to ensure no accidental exposure during husbandry procedures, recipient animal sections of the cage were cleaned first, then the infected side; three people participated in each husbandry event to ensure that a clean pair of hands handled bedding and food changes. one cage was done at a time and a min wait time to remove contaminated air was observed before moving to the next cage. new scrapers, gloves, and sleeve covers were used on subsequent cage cleaning. clinical symptoms such as weight loss and temperature were recorded during each nasal wash procedure and other symptoms such as sneezing, coughing, lethargy or nasal discharge were noted during any handling events. animals were given a/d diet twice a day to entice eating once they reached % weight loss. a summary of clinical symptoms for each study is provided in supplemental table . taq enzyme mix, . μl of μm probe, . μl of x pcr buffer master mix, and μl of extracted viral rna. the pcr master mix was thawed and stored at °c, h before reaction set-up. the rt-qpcr was performed on a fast real-time pcr system (applied biosystems) with a machine protocol of °c- min, °c- min followed by cycles of °c- sec, °c- sec. to relate genome copy number to ct value, we used a standard curve based on serial dilutions of a plasmid control, run in triplicate on the same plate. h n samples were compared to a plasmid containing the m segment of a/perth/ / . h n samples were compared to a plasmid containing the m segment of a/california/ / . serology assay. analysis of neutralizing antibodies from ferret sera was performed as previously described ( ). briefly, the microneutralization assay was performed using . tcid of either h n or h n pdm virus incubated with -fold serial dilutions of heat-inactivated ferret sera.
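the rt-qpcr standard curve mentioned above, which relates genome copy number to ct value, can be sketched in python as a least-squares line of ct against log10(copies). this is a generic illustration rather than the authors' analysis: the function names are hypothetical, and the plasmid dilution series and ct values are made-up numbers, not data from the study.

```python
import math
from typing import List, Tuple


def fit_standard_curve(copies: List[float], ct_values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate genome copies for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical plasmid dilution series (copies per reaction) and Ct values.
standard_copies = [1e2, 1e3, 1e4, 1e5]
standard_ct = [33.0, 29.7, 26.4, 23.1]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
estimated_copies = copies_from_ct(28.05, slope, intercept)
```

with triplicate standards, each replicate would simply enter the fit as its own (copies, ct) pair.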
the neutralizing titer was defined as the reciprocal of the highest dilution of serum required to completely neutralize infectivity of . tcid of virus on mdck cells. the concentration of antibody required to neutralize tcid of virus was calculated based on the neutralizing titer dilution divided by the initial dilution factor, multiplied by the antibody concentration. figure s . transmission of h n to naïve ferrets for a short exposure period. three ferrets were infected with a/perth/ / (h n ) and nasal washes were collected from each ferret on the indicated days post-infection. a naive ferret was placed in the adjacent cage at hour post-infection for days and nasal washes were collected from each recipient ferret on the indicated days post-exposure. bars indicate individual ferrets. all ferrets were serologically negative for circulating influenza viruses at the beginning of the study. the limit of detection was . tcid /ml. tcid , % tissue culture infectious dose. the viral titer data for donor animals was previously published in ( ) . 
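as an illustration of the standard-curve step described in the rt-qpcr methods (relating ct value to genome copy number via serial dilutions of a plasmid control), the following python sketch fits ct against log10 copy number and inverts the fit for an unknown sample. it is a minimal sketch under stated assumptions: the function names and every numeric value are hypothetical placeholders, not the authors' data or analysis code, and a simple least-squares line through the standards is assumed.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    `copies` and `ct_values` are paired measurements from the plasmid
    dilution series (hypothetical inputs; replicates may simply be listed
    as repeated copy numbers).
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_copies(ct, slope, intercept):
    """Invert the fitted line to estimate copy number for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Conventional qPCR efficiency estimate: E = 10^(-1/slope) - 1."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical dilution series (placeholder values, not from the study).
standard_copies = [1e2, 1e3, 1e4, 1e5]
standard_ct = [31.36, 28.04, 24.72, 21.40]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
sample_copies = ct_to_copies(26.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"efficiency={amplification_efficiency(slope):.2f}, "
      f"sample copies={sample_copies:.3g}")
```

fitting on the log10 scale keeps the relationship linear, so a single slope and intercept per plate are enough to convert any sample ct on that plate.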
estimated global mortality associated with the first months of pandemic influenza a h n virus circulation: a modelling study update: influenza activity in the united states during the season and composition of the - influenza vaccine potent protection against h n and h n influenza via childhood hemagglutinin imprinting childhood immune imprinting to influenza a shapes birth yearspecific risk during seasonal h n and h n epidemics genesis and pathogenesis of the pandemic h n influenza a virus age-specific differences in the dynamics of protective immunity to influenza prevalence of antibodies against seasonal influenza a and b viruses in children in netherlands on the doctrine of original antigenic sin impact of prior seasonal h n influenza vaccination or infection on protection and transmission of emerging variants of influenza a(h n )v virus in ferrets transmission of pandemic h n influenza virus and impact of prior exposure to seasonal strains or interferon treatment ferrets as a transmission model for influenza: sequence changes in ha of type a (h n ) virus lack of transmission of h n avian-human reassortant influenza viruses in a ferret model transmission and pathogenesis of swine-origin a(h n ) influenza viruses in ferrets and mice pathogenesis and transmission of swine-origin a(h n ) influenza virus in ferrets the ferret as a model organism to study influenza a virus infection the soft palate is an important site of adaptation for transmissible influenza viruses in vitro and in vivo characterization of new swine-origin h n influenza viruses original antigenic sin priming of influenza virus hemagglutinin stalk antibodies historical h n influenza virus imprinting increases vaccine protection by influencing the activity and sustained production of antibodies elicited at vaccination in ferrets elicitation of protective antibodies against years of future h n cocirculating influenza virus variants in ferrets preimmune to historical h n influenza viruses sequential 
seasonal h n influenza virus infections protect ferrets against novel h n influenza virus sequential infection in ferrets with antigenically distinct seasonal h n influenza viruses boosts hemagglutinin stalk-specific antibodies exhaled aerosol transmission of pandemic and seasonal h n influenza viruses in the ferret scintigraphic, spirometric, and roentgenologic effects of radiotherapy on normal lung tissue. short-term observations in consecutive patients with breast cancer transmission of a h n pandemic influenza virus occurs before fever is detected, in the ferret model the human antibody response to influenza a virus infection and vaccination induction of broadly cross-reactive antibody responses to the influenza ha stem region following h n vaccination in humans h stalk-based chimeric hemagglutinin influenza virus constructs protect mice from h n challenge pandemic h n influenza vaccine induces a recall response in humans that favors broadly cross-reactive memory b cells neutralizing antibodies against previously encountered influenza virus strains increase over time: a longitudinal analysis hemagglutinin stalk antibodies elicited by the pandemic influenza virus as a mechanism for the extinction of seasonal h n viruses broadly cross-reactive antibodies dominate the human b cell response against pandemic h n influenza virus infection broadly neutralizing hemagglutinin stalk-specific antibodies require fcgammar interactions for protection against influenza virus in vivo cross-reactive influenza-specific antibody-dependent cellular cytotoxicity antibodies in the absence of neutralizing antibodies antibody-dependent cellular cytotoxicity is associated with control of pandemic h n influenza virus infection of macaques hemagglutinin stalk immunity reduces influenza virus replication and transmission in ferrets non-neutralizing antibodies directed at conservative influenza antigens estimating the burden of pandemic influenza a (h n ) in the united states antigenically 
diverse swine origin h n variant influenza viruses exhibit differential ferret pathogenesis and transmission phenotypes eurasian-origin gene segments contribute to the transmissibility, aerosol release, and morphology of the pandemic h n influenza virus a simple method of estimating fifty percent endpoints this work was supported by the national institute of allergy and infectious diseases (hhsn c, ssl; r ai - a , ssl; r ai , seh; p ai , seh; ceirs hhsn c, seh). additional funding for ssl includes the american lung association biomedical research grant, and a new initiative award from the charles e. kaufman foundation, a supporting organization of the pittsburgh foundation. asl is supported by burroughs wellcome fund path award. key: cord- - uy dq authors: marano, jeffrey m.; chuong, christina; weger-lucarelli, james title: rolling circle amplification is a high fidelity and efficient alternative to plasmid preparation for the rescue of infectious clones date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: uy dq alphaviruses (genus alphavirus; family togaviridae) are a medically relevant family of viruses that include chikungunya virus, eastern equine encephalitis virus, and the emerging mayaro virus. infectious cdna clones of these viruses are necessary molecular tools to understand viral biology and to create effective vaccines. the traditional approach to rescuing virus from an infectious cdna clone requires propagating large amounts of plasmids in bacteria, which can result in unwanted mutations in the viral genome due to bacterial toxicity or recombination and requires specialized equipment and knowledge to propagate the bacteria. here, we present an alternative to the bacterial-based plasmid platform that uses rolling circle amplification (rca), an in vitro technology that amplifies plasmid dna using only basic equipment. 
we demonstrate that the use of rca to amplify plasmid dna is comparable to the use of a midiprepped plasmid in terms of viral yield, albeit with a slight delay in virus recovery kinetics. rca, however, has lower cost and time requirements and amplifies dna with high fidelity and with no chance of unwanted mutations due to toxicity. we show that sequential rca reactions do not introduce mutations into the viral genome and, thus, can replace the need for glycerol stocks or bacteria entirely. these results indicate that rca is a viable alternative to traditional plasmid-based approaches to viral rescue. importance the development of infectious cdna clones is critical to studying viral pathogenesis and for developing vaccines. the current method for propagating clones in bacteria is limited by the toxicity of the viral genome within the bacterial host, resulting in deleterious mutations in the viral genome, which can only be detected through whole-genome sequencing. these mutations can attenuate the virus, leading to lost time and resources and potentially confounding results. we have developed an alternative method of preparing large quantities of dna that can be directly transfected to recover infectious virus without the need for bacteria by amplifying the infectious cdna clone plasmid using rolling circle amplification (rca). our results indicate that viral rescue from an rca product produces a viral yield equal to bacterial-derived plasmid dna, albeit with a slight delay in replication kinetics. the rca platform, however, is significantly more cost and time-efficient compared to traditional approaches. when the simplicity and costs of rca are combined, we propose that a shift to an rca platform will benefit the field of molecular virology and could have significant advantages for recombinant vaccine production. 
rna viruses produce significant disease in humans and animals, highlighted by the current outbreak of severe acute respiratory syndrome coronavirus (sars-cov- ) [ ] . infectious cdna clones of these viruses are necessary molecular tools to understand viral biology since they facilitate the study of single-nucleotide polymorphisms [ ] and enable the insertion of reporter proteins to study virus replication or cell tropism [ ] . cdna clones have also been instrumental in developing vaccines for rna viruses, notably cyd-tdv (dengvaxia) [ ] and tak- (takeda) [ ] , both of which are tetravalent chimeric vaccines against dengue virus. alphaviruses (genus alphavirus; family togaviridae) are a group of small, enveloped, medically relevant positive-sense rna viruses with genomes of - kilobases in length [ ] . examples of medically relevant alphaviruses include several arthropod-borne viruses, or arboviruses, such as chikungunya virus, ross river virus, eastern equine encephalitis, and the emerging mayaro virus [ ] . additionally, alphavirus genomes are relatively easy to manipulate and have been used as expression vectors for foreign proteins [ , ] . typically, the propagation of infectious cdna clones before viral rescue requires the generation of high concentration plasmid stocks from bacteria, which is not only cumbersome and time-consuming but also presents an opportunity for the introduction of unwanted mutations during amplification in bacteria. bacterial instability of viral genomes has been reported for flaviviruses [ ] , alphaviruses [ ] , and coronaviruses [ ] . the cause of this is likely cryptic prokaryotic promoters, which results in the expression of viral proteins inside of the bacteria, which due to their toxicity, can lead to the selection of plasmids with deletions, mutations, or recombination with reduced bacterial toxicity [ ] [ ] [ ] . 
there are several critical points within the plasmid-based rescue workflow where deletion or mutations can occur ( figure ). these include the transformation of the plasmid into the bacteria, the selection of colonies from the agar plate, and the propagation of the colony in liquid culture. while deletions are easy to identify in plasmids by restriction enzyme digestion, mutations can only be determined by whole-genome sequencing, which is costly and laborious. furthermore, even synonymous changes can have profound impacts on viral replication [ ] [ ] [ ] and should be avoided in cdna clones. these unwanted changes to the viral genome can confound experimental results and, therefore, necessitate sequencing of the full viral genome every time new plasmid stocks are generated, a time-consuming and expensive task. thus, removing the need for the bacterial host to maintain and propagate infectious cdna clone plasmids would simplify the process and remove the possibility of deleterious bacterial-derived mutations. in this report, we describe an alternative to bacterial-based growth of infectious cdna clones-in vitro amplification using rolling circle amplification (rca). rca is an isothermal, high yield method of dna amplification [ ] that uses a highly processive polymerase that can amplify dna over kb [ ] . importantly, the enzyme replicates dna with high fidelity due to its '- 'exonuclease-or proofreading-activity [ ] . we show that peak virus yields were similar following rca-and plasmid-derived virus rescue in several cell lines and that small amounts of the rca product could be used to rescue virus successfully. finally, we showed that we could further amplify an rca product through additional rounds of rca without the introduction of unwanted mutations, thereby allowing a simple, cheap, and high-fidelity means to propagate infectious cdna clone plasmids. 
rca-launched infectious cdna clones represent a technical improvement in rescuing viruses without the need for bacteria. cell culture. vero (cercopithecus aethiops kidney epithelial cells, atcc® ccl- ™), bhk- clone (baby hamster kidney fibroblasts, atcc® ccl- ™), and hek t (human embryonic kidney cells, atcc® crl- ™) cell lines were maintained at °c in % co using dulbecco's modified eagle's medium (dmem) supplemented with % fetal bovine serum (fbs), % nonessential amino acids, and . % gentamicin. plaque assays were performed as previously described [ ] except that plates were either fixed for two days ( % gum tragacanth overlay-mp biomedicals catalog ) or three days ( . % methylcellulose overlay-spectrum chemical catalog me - gm) post-infection. each day for three days for the growth curves in multiple cell lines, each day for two days for the rca input comparison and the kit comparison, and two days post-transfection for the sequential rca experiment. viral titer was then determined using plaque assays on vero cells. we used an infectious cdna clone of mayaro virus strain trvl , which has previously been described [ ] , for our plasmid control. the plasmid was initially transformed into nebstable electrocompetent cells. cells were incubated for hours at °c and then hours at room temperature. colonies were picked and incubated in lennox broth (lb) supplemented with µg/ml of carbenicillin for hours. we extracted dna using both promega pureyield miniprep kit, for verification, and zymo midiprep kit, for transfection. we verified the plasmids using endonuclease digestion and gel electrophoresis and transfected the samples to ensure the infectivity of the clone. dna concentration was determined using invitrogen's qubit x dsdna hs kit. for the superphi rca premix kit with random primers (evomics catalog number pm ), µl of ng/µl of plasmid dna was mixed with µl of sample buffer while the thermocycler was preheated to °c. 
the mixture was incubated at °c for minute and then rapidly cooled to °c. µl of x superphi master mix was then mixed with the sample, which was then incubated for hours at °c before polymerase inactivation at °c for minutes. for the genomiphi v dna amplification kit (ge healthcare), µl of ng/µl of plasmid dna was mixed with µl of molecular grade water and µl of denaturation buffer. at the same time, the thermocycler was preheated to °c. the mixture was incubated for one minute at °c and then rapidly cooled to °c. µl of the denatured template was then added to the lyophilized reaction cake containing enzymes, dntps, and buffers and thoroughly mixed by pipetting. samples were incubated at °c for minutes, and then the enzyme was inactivated at °c for minutes. rca product concentration was determined using qubit after a -fold dilution in molecular grade water. amplification of the plasmid was confirmed using endonuclease digestion and gel electrophoresis. dilutions were performed using molecular grade water. to generate the rca passages, an initial rca was performed using the superphi protocol as described above and validated using gel electrophoresis. µl of rca product was then used as the template for a subsequent µl rca reaction. we then repeated the process of rca for a total of three passages. sequencing protocol. rca products were sanger sequenced at the genomics sequencing center at virginia tech. rca products were diluted to ng/µl to prepare them for sequencing. µl of diluted rca product was mixed with µl of µm primer and µl of molecular grade water. resulting reads were aligned using snapgene® . . software (gsl biotech). statistical analysis. statistics were performed using graphpad prism (san diego, ca). two-way anova tests were performed using sidak's corrections for multiple comparisons for the comparison of titers in different cell lines. 
for the comparison of rca kits, two-way anova tests were performed using dunnett's correction for multiple comparisons against the plasmid control. a one-way anova was performed using sidak's correction for multiple comparisons against the plasmid control for the sequential rca test. to examine the efficacy of rca for viral rescue, we chose three cell lines for transfection: vero, hek t, and bhk clone cells (bhk ). we selected these cell lines due to their widespread use in virus rescue [ , ] . we used a cmv-driven mayaro virus infectious cdna clone as both the template for our rca reactions and as our plasmid control for the transfections (fig. ) . following transfection, we collected virus each day until % of the cells showed cytopathic effect (cpe), which occurred by day three in all cases. on the first day post-transfection, viral titer in the plasmid transfection was significantly higher compared to the rca product in both bhk and hek t cells (p= . and p= . , respectively). no and p= . , respectively) . however, the titer of rca product transfection was significantly higher than the plasmid titer in hek t cells (p= . ). the above results demonstrate two critical features of using rca for viral rescue: similar replication kinetics-albeit with a slight delay-and identical peak yield. in all the cell lines, the peak viral titer occurred on the second-day post-transfection for both plasmid and rca product transfection. these data demonstrate that rca is equivalent to plasmids in their ability to rescue virus using standard transfection conditions in terms of viral yield. a hypothesis for the delay in viral production seen in bhk and hek t cells is that the complex structure of rca products (i.e., branched rca molecules) compared to plasmid dna may produce steric inhibition and delay transcription from the cmv promoter by rna polymerase ii [ ] . 
since the kinetics of virus recovery were similar for all cell lines, and since peak titers were observed two days post-transfection, we only used vero cells and only sampled on the first and second days following transfection for all future studies. we next sought to determine whether transfections with different rca product input concentrations would result in efficient virus rescue. to that end, we transfected vero cells with a range of rca inputs produced using the evomics superphi kit (fig. ) . one day post-transfection, ng of rca resulted in a decreased viral titer compared to a plasmid input of ng (p = . ). we observed no differences in any of the other input concentrations. as in the above experiment, by two days post-transfection, rca and plasmid titers did not differ significantly ( ng superphi p= . , ng superphi p=. , ng p>. , ng p=. ). these results indicate that the transfection of rca product is robust and can tolerate a wide variety of input concentrations without altering peak viral yield. to ensure that the above results were not restricted to a specific rca kit, vero cells were transfected in triplicate in two independent replicates with rca product produced using both the evomics superphi kit and the ge genomiphi kit or plasmid dna (fig. ) . one day post-transfection, the viral titers produced from both ng and ng of genomiphi rca products were lower than the titers produced by plasmid (p = . and p = . , respectively). there was no significant difference between the superphi samples and the plasmid samples one day post-transfection ( ng p=. , ng p=. ). two days post-transfection, no titer differed significantly from that of the plasmid rescue ( ng superphi p=. , ng superphi p=. , ng genomiphi p=. , ng genomiphi p=. ). thus, peak viral yields or recovery kinetics of rca products are not dependent on the rca kit. 
to determine if rca can further amplify an rca product without introducing unwanted mutations, an initial rca was performed using plasmid dna as a template (subsequently referred to as passage ) and amplified three more times. following the transfection of the different "passages," we found no significant differences between the viral titer produced in passage rca dna and plasmid dna (p= . ) (fig. ) . however, we did note a difference between the titers of later rca passages and plasmid dna (p = . , p = . , and p = . , respectively). to determine if these differences were due to either mutations or artifacts of repeated amplification, rca dna from passage and was sequenced using sanger sequencing. the sequences for passage , passage , and the original plasmid were identical, indicating that no mutations were introduced in the viral genome during repeated rcas. the likely cause of the reduction in peak viral yield is that the repeated rcas amplified both specific and non-specific dna. amplification of non-specific dna, which is caused by the concatemerization of the random hexamer primers [ ] , alters the ratio of specific to non-specific dna, resulting in a reduction in the amount of target dna that is transfected. to mitigate the effect of moderate titer reduction with sequential rca reaction, harvesting virus at a slightly later time or increasing dna input may be effective. however, these results indicate that rca products can effectively act as a template for subsequent rca reactions without introducing unwanted mutations. rca products can, therefore, substitute the standard glycerol bacterial stock protocol, or repeated bacterial transformations to generate midi-or maxi-prepped dna. in both bacterial methods, mutations can be introduced during growth and, thus, require sequencing. here, we report a simple method to recover infectious virus from a cdna clone using rca to amplify a plasmid. 
we observed that both rca and plasmid-based transfection produced similar peak viral titers following transfection for several cell lines, using several rca kits, and when transfecting variable input dna amounts. importantly, rca products can be reamplified by rca to maintain a dna record without generating mutations in the viral genome. the evidence above demonstrated that the rca platform is equivalent to the plasmid-based platform in terms of viral yield. however, when considering the time and cost to perform these two processes, it is apparent that rca offers many advantages (table ) . using the superphi kit as an example, one rca reaction costs $ . and produces µg of dna in hours. when using the plasmid approach, first, you would need to screen colonies using endonuclease digestion. assuming you screen ten colonies using the promega pureyield™ plasmid miniprep system, it will cost $ . . from there, positive colonies would then be used to inoculate cultures for midiprep. assuming you midiprep between one and four colonies using the zymopure™ ii plasmid midiprep kit, this purification would cost between $ . and $ . . the extracted midiprep dna would then be used for sanger sequencing, costing roughly $ per genome. therefore, the final cost of the plasmid workflow ranges from $ . to $ . , with a final dna yield of µg. when comparing cost per µg of dna, it is apparent that the rca system is superior to the plasmid approach. rca reactions cost $ . /µg, while the plasmid approach costs between $ . -$ . /µg. this difference is further emphasized when considering the time to complete the two reactions. an rca reaction takes hours or less and can be transfected directly while the bacterial approach takes several days and requires sequencing. 
given this cost and time analysis, the rca platform is a more time- and cost-efficient method for rescuing viruses and produces similar results, indicating that a shift to the rca-based approach would simplify viral rescue while saving time and money. the use of rca-launched expression from rna polymerase ii promoter-containing constructs has several potential commercial applications as well, including dna and recombinant live-attenuated vaccines. bacterial-derived plasmids have several safety concerns, including endotoxins, transposition of pathogenic elements, and the introduction of antibiotic resistance into the environment [ ] . rca mitigates many of these concerns since the antibiotic resistance marker is not required, and rca products are free from endotoxin. this study has two limitations: first, we only used a single mayv infectious cdna clone to characterize the rca rescue system. however, we successfully used rca to amplify and rescue a variety of infectious cdna clones, including zika virus, chikungunya virus, usutu virus, and sindbis virus. we anticipate that this system can be used for other positive-sense rna virus cdna clones and likely negative-strand viruses as well. second, we only tested viral rescue from a clone driven by a cytomegalovirus (cmv) promoter. using a cmv promoter allows for simple transfection of small amounts of plasmid dna without the need for extra reagents to produce viral rna or potential issues with unwanted mutations derived from the error-prone bacteriophage promoters [ ] . however, we have also used a linearized and purified rca product to generate full-length infectious rna transcripts from bacteriophage-driven clones, indicating the versatility of this system. taken together, rca represents a simple, high-fidelity, and cost-effective means to produce large amounts of plasmid dna that can be repeatedly propagated and used to rescue infectious virus directly. 
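the cost comparison in the discussion reduces to simple arithmetic, which the following python sketch makes explicit. it is a hedged illustration only: the function names are hypothetical, the dollar amounts and yields are placeholders rather than the figures from the cost table, and it assumes, as in the text, that one screening round is followed by one to four midipreps, each of which is also sequenced.

```python
def cost_per_ug(total_cost, yield_ug):
    """Cost per microgram of DNA for any workflow."""
    return total_cost / yield_ug


def plasmid_workflow_cost(screen_cost, midiprep_cost_each,
                          sequencing_cost_each, n_midipreps):
    """Total cost of the bacterial workflow.

    Assumes a single miniprep screening round (`screen_cost`) followed by
    `n_midipreps` colonies that are each midiprepped and sequenced.
    """
    return screen_cost + n_midipreps * (midiprep_cost_each
                                        + sequencing_cost_each)


# Placeholder inputs (NOT the values reported in the paper).
rca_reaction_cost = 10.0     # one RCA reaction, in dollars
rca_yield_ug = 20.0          # DNA produced by that reaction
screen_cost = 15.0           # miniprep screening of ten colonies
midiprep_cost_each = 8.0
sequencing_cost_each = 100.0
plasmid_yield_ug = 100.0

rca = cost_per_ug(rca_reaction_cost, rca_yield_ug)
plasmid_low = cost_per_ug(
    plasmid_workflow_cost(screen_cost, midiprep_cost_each,
                          sequencing_cost_each, n_midipreps=1),
    plasmid_yield_ug)
plasmid_high = cost_per_ug(
    plasmid_workflow_cost(screen_cost, midiprep_cost_each,
                          sequencing_cost_each, n_midipreps=4),
    plasmid_yield_ug)
print(f"rca: ${rca:.2f}/ug; plasmid: ${plasmid_low:.2f}-${plasmid_high:.2f}/ug")
```

keeping the per-colony costs separate from the one-off screening cost makes it easy to see how the plasmid range widens as more colonies are carried through to sequencing.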
we would like to acknowledge vt-fast, specifically kristin rose jutras, alexander crookshanks, michael stamper, and janet webster, for their assistance in preparing this manuscript for publication. we would like to thank members of the weger-lucarelli lab, specifically tyler bates, emily webb, and pallavi rai, for their valuable contributions and discussions in experimental design and manuscript preparation. this work was partially supported by funding from darpa's preventing emerging pathogenic threats (preempt) program. the plasmid-based system involves the transformation of a plasmid into bacteria. the bacteria are then selected and propagated using antibiotic enriched media, and the plasmid is purified from the bacteria and transfected into the cell type of interest. the red exclamation points indicate points during the workflow where mutations and errors can be introduced or enhanced. the rca-based system involves amplification of the plasmid using random hexamers to produce the hyperbranched product. this product can then be directly transfected into cells. the kinetics of plasmid and rca techniques to produce virus were assessed in vero, bhk clone , and hek t cells. cells were transfected in triplicate with either ng of evomics superphi rca or plasmid dna. the experiment was done in two independent biological replicates. the supernatant was collected each day post-transfection until cells reached % cpe for plaque assay. error bars represent the standard deviation from the mean. statistical analysis was performed using two-way anova with ad hoc sidak's correction for multiple comparisons (ns p > . , ** p ≤ . , *** p ≤ . ). the effect of rca input on viral production kinetics was examined using vero cells. cells were transfected in triplicate with ng, ng, ng, or ng of evomics superphi rca or ng of the plasmid. the supernatant was collected at one and -days post-transfection for plaque assay. error bars represent the standard deviation from the mean. 
statistical analysis was performed using two-way anova with ad hoc dunnett's correction for multiple comparisons. the effect of rca kits on viral production kinetics was examined using vero cells. cells were transfected with ng or ng of either evomics superphi or ge genomiphi rca or ng of plasmid dna. the supernatant was collected at one and -days post-transfection for plaque assay. error bars represent the standard deviation from the mean. statistical analysis was performed using two-way anova with ad hoc dunnett's correction for multiple comparisons against a plasmid control. we examined the impact of sequential rca on the ability to rescue virus in vero cells. we used the evomics superphi rca kit and a sample of midiprepped dna as a template to generate rca products (passage ). we then used the rca product as the template for subsequent rca reactions (passage - ). we transfected the cells using ng of rca product or ng of plasmid dna in triplicate. the supernatant was collected -days post-transfection for plaque assay. error bars represent the standard deviation from the mean. statistical analysis was performed using one-way anova with ad hoc dunnett's correction for multiple comparisons against the plasmid control (ns p > . , ** p ≤ . , *** p ≤ . ). pm ) . in constructing the cost estimate, it was assumed that ten colonies were selected for screening, and one to four of those colonies were then midiprepped and sequenced. 
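the figure legends above report plaque-assay titers from triplicate transfections with error bars showing the standard deviation from the mean. the python sketch below shows one common way such values are derived: converting a plaque count to pfu/ml and summarizing replicates on a log10 scale. this is an assumption-laden illustration; the function names, the choice to summarize on the log10 scale, and all numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def plaque_titer(plaque_count, dilution, inoculum_ml):
    """Titer in PFU/ml from one countable well.

    `dilution` is the fractional dilution plated (e.g. 1e-4) and
    `inoculum_ml` the volume added to the well.
    """
    return plaque_count / (dilution * inoculum_ml)


def summarize_log_titers(titers):
    """Mean and sample standard deviation of log10-transformed titers."""
    logs = [math.log10(t) for t in titers]
    return statistics.mean(logs), statistics.stdev(logs)


# Hypothetical triplicate wells (placeholder counts and dilutions).
replicates = [
    plaque_titer(42, 1e-4, 0.1),
    plaque_titer(55, 1e-4, 0.1),
    plaque_titer(38, 1e-4, 0.1),
]
mean_log, sd_log = summarize_log_titers(replicates)
print(f"mean log10 titer = {mean_log:.2f}, sd = {sd_log:.2f}")
```

summarizing on the log scale is a design choice rather than a requirement; it keeps replicate spread comparable across samples whose titers differ by orders of magnitude.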
host factors in positive-strand rna virus genome replication haiku: new paradigm for the reverse genetics of emerging rna viruses construction of an infectious chikungunya virus cdna clone and stable insertion of mcherry reporter genes at two different sites impact of dengue vaccination on serological diagnosis: insights from phase iii dengue vaccine efficacy trials efficacy of a tetravalent dengue vaccine in healthy children and adolescents replication of alphaviruses: a review on the entry process of alphaviruses into cells evolutionary relationships and systematics of the alphaviruses an alphavirus-based therapeutic cancer vaccine: from design to clinical trial alphavirus vectors as tools in neuroscience and gene therapy successful propagation of flavivirus infectious cdnas by a novel method to reduce the cryptic bacterial promoter activity of virus genomes infectious alphavirus production from a simple plasmid transfection+ reverse genetics with a full-length infectious cdna of the middle east respiratory syndrome coronavirus identification of a cryptic prokaryotic promoter within the cdna encoding the ′ end of dengue virus rna genome development and characterization of recombinant virus generated from a new world zika virus infectious clone exploring the instability of reporters expressed under the subgenomic promoter in chikungunya virus infectious cdna clones the fitness effects of synonymous mutations in dna and rna viruses random codon re-encoding induces stable reduction of replicative fitness of chikungunya virus in primate and mosquito cells synonymous mutations at the beginning of the influenza a virus hemagglutinin gene impact experimental fitness the discovery of rolling circle amplification and rolling circle transcription highly efficient dna synthesis by the phage Φ dna polymerase. 
symmetrical mode of dna replication the bacteriophage phi dna polymerase, a proofreading enzyme infectious cdna clones of two strains of mayaro virus for studies on viral pathogenesis and vaccine development new reverse genetics and transfection methods to rescue arboviruses in mosquito cells generation of high-yielding influenza a viruses in african green monkey kidney (vero) cells by reverse genetics improvements of rolling circle amplification (rca) efficiency and accuracy using thermus thermophilus ssb mutant protein ensuring safety of dna vaccines. microbial cell factories key: cord- - foh uvn authors: bosker, hans rutger; peeters, david title: beat gestures influence which speech sounds you hear date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: foh uvn beat gestures – spontaneously produced biphasic movements of the hand – are among the most frequently encountered co-speech gestures in human communication. they are closely temporally aligned to the prosodic characteristics of the speech signal, typically occurring on lexically stressed syllables. despite their prevalence across speakers of the world’s languages, how beat gestures impact spoken word recognition is unclear. can these simple ‘flicks of the hand’ influence speech perception? across six experiments, we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress (e.g., distinguishing object from object), and in turn, can influence what vowels listeners hear. thus, we provide converging evidence for a manual mcgurk effect: even the simplest ‘flicks of the hands’ influence which speech sounds we hear. significance statement beat gestures are very common in human face-to-face communication. yet we know little about their behavioral consequences for spoken language comprehension. we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress, and, in turn, can even shape what vowels we think we hear. 
this demonstration of a manual mcgurk effect provides some of the first empirical support for a recent multimodal, situated psycholinguistic framework of human communication, while challenging current models of spoken word recognition that do not yet incorporate multimodal prosody. moreover, it has the potential to enrich human-computer interaction and improve multimodal speech recognition systems.

the human capacity to communicate evolved primarily to allow for efficient face-to-face interaction. as humans, we are indeed capable of translating our thoughts into communicative messages, which in everyday situations often comprise concurrent auditory (speech) and visual (lip movements, facial expressions, hand gestures, body posture) information. speakers thus make use of different channels (mouth, eyes, hands, torso) to get their messages across, while addressees have to quickly and efficiently integrate or bind the different ephemeral signals they concomitantly perceive. this process is skilfully guided by top-down predictions that exploit a lifetime of communicative experience, acquired world knowledge, and personal common ground built up with the speaker.

the classic mcgurk effect is an influential, though not entirely uncontroversial, illustration of how visual input influences the perception of speech sounds. in a seminal study, participants were made to repeatedly listen to the sound /ba/. at the same time, they watched the face of a speaker in a video whose mouth movements corresponded to the sound /ga/. when asked what they heard, a majority of participants reported actually perceiving the sound /da/. this robust finding indicates that, during language comprehension, our brains take both verbal (here: auditory) and non-verbal (here: visual) aspects of communication into account, providing the addressee with a best guess of what exact message a speaker intends to convey.
we thus listen not only with our ears, but also with our eyes.

visual aspects of everyday human communication are not restricted to subtle mouth or lip movements. in close coordination with speech, the hand gestures we spontaneously and idiosyncratically make during conversations help us express our thoughts, emotions, and intentions. we manually point at things to guide the attention of our addressee to relevant aspects of our immediate environment, enrich our speech with iconic gestures that in handshape and movement resemble what we are talking about, and produce simple flicks of the hand aligned to the acoustic prominence of our spoken utterance to highlight relevant points in speech. over the past decades, it has become clear that such co-speech manual gestures are semantically aligned and temporally synchronized with the speech we produce. it is an open question, however, whether the temporal synchrony between these hand gestures and the speech signal can actually influence which speech sounds we hear.

in this study, we test for the presence of such a potential manual mcgurk effect. do hand gestures, like observed lip movements, influence what exact speech sounds we perceive? we here focus on manual beat gestures, commonly defined as biphasic (e.g., up and down) movements of the hand that speakers spontaneously produce to highlight prominent aspects of the concurrent speech signal. beat gestures are amongst the most frequently encountered gestures in naturally occurring human communication, as the presence of an accompanying beat gesture allows for the enhancement of the perceived prominence of a word. not surprisingly, they hence also feature prominently in politicians' public speeches. by highlighting concurrent parts of speech and enhancing its processing, beat gestures may even increase memory recall of words in both adults and children.
as such, they form a fundamental part of our communicative repertoire and also have direct practical relevance. in contrast to iconic gestures, beat gestures are not related to the spoken message on a semantic level, but only on a temporal level: their apex is aligned to vowel onset in lexically stressed syllables carrying prosodic emphasis. neurobiological studies suggest that listeners may tune oscillatory activity in the alpha and theta bands upon observing the initiation of a beat gesture to anticipate processing an assumedly important upcoming word. bilateral temporal areas of the brain may in parallel be activated for efficient integration of beat gestures and concurrent speech. however, evidence for behavioral effects of temporal gestural alignment on speech perception remains scarce.

given the close temporal alignment between beat gestures and lexically stressed syllables in speech production, ideal observers could in principle use beat gestures as a cue to lexical stress in perception. lexical stress is indeed a critical speech cue in spoken word recognition. it is well established that listeners commonly use suprasegmental spoken cues to lexical stress, such as greater amplitude, higher fundamental frequency (f0), and longer syllable duration, to constrain online lexical access, speeding up word recognition and increasing perceptual accuracy. listeners also use visual articulatory cues on the face to perceive lexical stress when the auditory signal is muted. nevertheless, whether non-facial visual cues, such as beat gestures, also aid in lexical stress perception, and might even do so in the presence of auditory cues to lexical stress, as in everyday conversations, is unknown.
these questions are relevant for our understanding of face-to-face interaction, where multimodal cues to lexical stress might considerably increase the robustness of spoken communication, for instance in noisy listening conditions, and even when the auditory signal is clear. indeed, a recent theoretical account proposed that multimodal low-level integration must be a general cognitive principle fundamental to our human communicative capacities. however, evidence showing that the integration of information from two separate communicative modalities (speech, gesture) modulates the mere perception of information derived from one of these two modalities (e.g., speech) is lacking. for instance, earlier studies have failed to find effects of simulated pointing gestures on lexical stress perception, leading to the idea that manual gestures provide information only about sentential emphasis, not about relative emphasis placed on individual syllables within words. as a result, there is no formal account or model of how audiovisual prosody might help spoken word recognition. hence, demonstrating a manual mcgurk effect would imply that existing models of speech perception need to be fundamentally expanded to account for how multimodal cues influence auditory word recognition.

below, we report the results of four experiments (and two direct replications: experiments s1-s2) in which we tested for the existence and robustness of a manual mcgurk effect. the experiments each approach the theoretical issue at hand from a different methodological angle, varying the degree of explicitness of the participants' experimental task and their response modality (perception vs. production). experiment 1 uses an explicit task to test whether observing a beat gesture influences the perception of lexical stress in disyllabic spoken stimuli.
using the same stimuli, experiment 2 implicitly investigates whether seeing a concurrent beat gesture influences how participants vocally shadow (i.e., repeat back) an audiovisual speaker's production of disyllabic stimuli. experiment 3 then uses those shadowed productions as stimuli in a perception experiment to implicitly test whether naïve listeners are biased towards perceiving lexical stress on the first syllable if the shadowed production was produced in response to an audiovisual stimulus with a beat gesture on the first syllable. finally, experiment 4 tests whether beat gestures may influence what vowels (long vs. short) listeners perceive. the perceived prominence of a syllable influences the expected duration of that syllable's vowel nucleus (i.e., longer if stressed). thus, if listeners use beat gestures as a cue to lexical stress, they may perceive a vowel midway between dutch short /ɑ/ and long /a:/ as /ɑ/ when presented with a concurrent beat gesture, but as /a:/ without it. for all experiments, we hypothesized that, if the manual mcgurk effect exists, we should see effects of the presence and timing of beat gestures on the perception and production of lexical stress (experiments 1-3) and, in turn, on vowel perception (experiment 4). in short, we found evidence in favor of a manual mcgurk effect in all four experiments and in both direct replications: the perception of lexical stress and vowel identity is consistently influenced by the temporal alignment of beat gestures to the speech signal.

in experiment 1, participants categorized disyllabic spoken stimuli as having initial stress or final stress (e.g., WAsol vs. waSOL; capitals indicate lexical stress). native speakers of dutch were presented with an audiovisual speaker (facial features masked in the actual experiment to avoid biasing effects of articulatory cues) producing non-existing pseudowords in a sentence context: nu zeg ik het woord... [target] "now say i the word... [target]".
pseudowords were manipulated along a -step lexical stress continuum, varying f0 in opposite directions for the two syllables, thus ranging from a 'strong-weak' (step ) to a 'weak-strong' prosodic pattern (step ). critically, the audiovisual speaker always produced a beat gesture whose apex was manipulated to be either aligned to the onset of the first vowel or the second vowel by varying the silent interval preceding the pseudoword (cf. figure a).

figure caption (panels b-d). b. participants in experiment 1 categorized audiovisual pseudowords as having lexical stress on either the first or the second syllable. each target pseudoword was sampled from a -step continuum (on x-axis) varying f0 in opposite directions for the two syllables. in the audiovisual block (solid lines), participants were significantly more likely to report perceiving lexical stress on the first syllable if the speaker produced a beat gesture on the first syllable (in blue) vs. second syllable (in orange). no such difference was observed in the audio-only control block (transparent lines). c. participants in experiment 3 categorized the audio-only shadowed productions, recorded in experiment 2, as having lexical stress on either the first or the second syllable. if the audiovisual stimulus in experiment 2 involved a pseudoword with a beat gesture on the first syllable, the elicited shadowed production was more likely to be perceived as having lexical stress on the first syllable (in blue). conversely, if the audiovisual stimulus in experiment 2 involved a pseudoword with a beat gesture on the second syllable, the elicited shadowed production was more likely to be perceived as having lexical stress on the second syllable (in orange). d. participants in experiment 4 categorized audiovisual pseudowords as containing either short /ɑ/ or long /a:/ (e.g., bagpif vs. baagpif). each target pseudoword contained a first vowel that was sampled from a -step f2 continuum from long /a:/ to short /ɑ/ (on x-axis).
moreover, prosodic cues to lexical stress (f0, amplitude, syllable duration) were set to ambiguous values. if the audiovisual speaker produced a beat gesture on the first syllable (in blue), participants were biased to perceive the first syllable as stressed, making the ambiguous first vowel relatively short for a stressed syllable, leading to a lower proportion of long /a:/ responses. conversely, if the audiovisual speaker produced a beat gesture on the second syllable (in orange), participants were biased to perceive the initial syllable as unstressed, making the ambiguous first vowel relatively long for an unstressed syllable, leading to a higher proportion of long /a:/ responses. for all panels, error bars enclose . x se on either side; that is, the % confidence intervals.

however, the task in experiment 1, asking participants to categorize the audiovisual stimuli as having stress on either the first or second syllable, is very explicit and presumably susceptible to response strategies. therefore, experiment 2 tested lexical stress perception implicitly and in another response modality by asking participants to overtly shadow the audiovisual pseudowords from experiment 1. to reduce the load on the participants, we selected only three relevant steps from the lexical stress continua: two relatively clear steps ( and ) and one relatively ambiguous step (step ). critically, lexical stress was never mentioned in any part of the instructions. we hypothesized that, if participants saw the audiovisual speaker produce a beat gesture on the second syllable, the suprasegmental characteristics of their shadowed productions would resemble a 'weak-strong' prosodic pattern (i.e., greater amplitude, higher f0, longer duration on the second syllable).
acoustic analysis of the amplitude, f0, and duration measurements of the shadowed productions by three separate linear mixed models (one for each acoustic cue) showed small but significant effects of beat condition, visualized in figure . that is, if the beat gesture was aligned to the second syllable, participants presumably implicitly perceived lexical stress on the second syllable, and therefore their own shadowed second syllables were produced with greater amplitude and a longer duration, while their first syllables were shorter. no effect of f0 was observed.

note that the effects of beat gestures in experiment 2 were robust but small. furthermore, no effect of beat condition was found on the f0 of the shadowed productions, the primary cue to stress in dutch. therefore, experiment 3 tested the perceptual relevance of the acoustic adjustments participants made to their shadowed productions in response to the two beat gesture conditions. each participant listened to all shadowed productions from two shadowers from experiment 2 and indicated whether they perceived the shadowed productions to have stress on the first or second syllable (2afc). critically, this involved audio-only stimuli and the participants in experiment 3 were unaware of the beat gesture manipulation in experiment 2. nevertheless, outcomes of a glmm on the binomial categorization data demonstrated that participants were significantly biased to report hearing stress on the first syllable of the audio-only shadowed productions if the audiovisual stimulus that elicited those shadowed productions had a beat gesture on the first syllable (main effect of beat condition: β = . in logit space, p = . ). this is indicated by the difference between the blue and orange lines in figure c.

experiment 4 is grounded in the observation that listeners take a syllable's perceived prominence into account when estimating its vowel's duration.
vowels in stressed syllables typically have a longer duration than vowels in unstressed syllables. hence, in perception, listeners expect a vowel in a stressed syllable to have a longer duration. that is, one and the same vowel duration can be perceived as relatively short in one (stressed) syllable, but as relatively long in another (unstressed) syllable. in languages where vowel duration is a critical cue to phonemic contrasts (e.g., to coda stop voicing in english, as in "coat-code"; to vowel length contrasts in dutch, as in tak /tɑk/ "branch" vs. taak /ta:k/ "task"), these expectations can influence the perceived identity of speech sounds.

participants in experiment 4 were presented with audiovisual stimuli, much like those used in experiments 1-2. however, we used a new set of disyllabic pseudowords whose initial vowel was manipulated to be ambiguous between dutch /ɑ-a:/ in its duration while varying the second formant frequency (f2) in -step f2 continua; f2 is a secondary cue to the /ɑ-a:/ contrast in dutch but not a cue to lexical stress. moreover, each pseudoword token was manipulated to be ambiguous with respect to its suprasegmental cues to lexical stress (set to average values of f0, amplitude, and duration). taken together, these manipulations resulted in pseudowords that were ambiguous in lexical stress and vowel duration, but varied in f2 as cue to vowel identity. participants indicated whether they heard an /ɑ/ or /a:/ as initial vowel of the pseudoword (2afc; e.g., bagpif or baagpif?); lexical stress was never mentioned in any part of the instructions.

the analysis showed an effect of beat condition on participants' vowel perception (β = - . in logit space, p = . ). that is, if the beat gesture was aligned to the first syllable (blue line in figure d), participants were more likely to perceive the first syllable as stressed, rendering the ambiguous vowel duration in the first syllable as relatively short for a stressed syllable, lowering the proportion of long /a:/ responses.
conversely, if the beat gesture was aligned to the second syllable (orange line in figure d), participants were more likely to perceive the first syllable as unstressed, making the ambiguous vowel duration in the first syllable relatively long for an unstressed syllable, leading to a small increase in the proportion of long /a:/ responses.

to what extent do we listen with our eyes? the classic mcgurk effect illustrates that listeners take both visual (lip movements) and auditory (speech) information into account to arrive at a best possible approximation of the exact message a speaker intends to convey. here we provide evidence in favor of a manual mcgurk effect: the beat gestures we see influence what exact speech sounds we hear. specifically, we observed in four experiments that beat gestures influence both the explicit and implicit perception of lexical stress, and in turn may influence the perception of individual speech sounds. to our understanding, this is the first demonstration of behavioral consequences of beat gestures on low-level speech perception. these findings support the idea that everyday language comprehension is primordially a multimodal undertaking in which the multiple streams of information that are concomitantly perceived through the different senses are quickly and efficiently integrated by the listener to make a well-informed best guess of what specific message a speaker is intending to communicate.

the four experiments we presented made use of a combination of implicit and explicit paradigms. not surprisingly, the explicit nature of the task used in experiment 1 led to strong effects of the timing of a perceived beat gesture on the lexical stress pattern participants reported hearing. although the implicit effects in the subsequent experiments 2-4 were smaller, conceivably partially due to suboptimal control over the experimental settings in online testing, they were nevertheless robust and replicable (cf. experiments s1-s2 in supplementary material).
moreover, these beat gesture effects even surfaced when the auditory cues were relatively clear (i.e., at the extreme ends of the acoustic continua), suggesting that these effects also prevail in naturalistic face-to-face communication outside the lab. in fact, the naturalistic nature of the audiovisual stimuli would seem key to the effect, since earlier studies found no effect of the temporal alignment of artificial hand movements (simulated pointing gestures) on lexical stress perception.

the outcomes of the present study challenge current models of (multimodal) word recognition, which would need to be extended to account for how prosody, specifically multimodal prosody, supports and modulates word recognition. audiovisual prosody is implemented in the fuzzy logical model of perception, but how this information affects lexical processing is underspecified. moreover, experiment 4 implies that the multimodal suprasegmental and segmental 'analyzers' would have to allow for interaction, since one can influence the other. therefore, existing models of audiovisual speech perception and word recognition need to be expanded to account for (1) how suprasegmental prosodic information is processed; (2) how suprasegmental information interacts with and modulates segmental information; and (3) how multimodal cues influence auditory word recognition.

a recent, broader theoretical account of human communication (i.e., beyond speech alone) proposed that two domain-general mechanisms form the core of our ability to understand language: multimodal low-level integration and top-down prediction. the present results are indeed in line with the idea that multimodal low-level integration is a general cognitive principle supporting language understanding. more precisely, we here show that this mechanism operates across communicative modalities.
the cognitive and perceptual apparatus that we have at our disposal apparently not only binds auditory and visual information that derive from the same communicative channel (i.e., articulation), but also combines visual information derived from the hands with concurrent auditory information, crucially to such an extent that what we hear is modulated by the timing of the beat gestures we see. as such, what we perceive is the model of reality that our brains provide us by binding visual and auditory communicative input, and not reality itself.

in addition to multimodal low-level integration, earlier research has indicated that top-down predictions also play a pivotal role in the perception of co-speech beat gestures. it seems reasonable to assume that listeners throughout life build up the implicit knowledge that beat gestures are commonly tightly paired with acoustically prominent verbal information. as the initiation of a beat gesture typically slightly precedes the prominent acoustic speech feature, these statistical regularities are then indeed used in a top-down manner to allocate additional attentional resources to an upcoming word.

the current first demonstration of the influence of simple 'flicks of the hands' on low-level speech perception should be considered an initial step in our understanding of how bodily gestures in general may affect speech perception. future research may investigate whether the temporal alignment of types of non-manual gestures, such as head nods and eyebrow movements, also impacts what speech sounds we hear. moreover, other behavioral consequences beyond lexical stress perception could be implicated, ranging from phase-resetting neural oscillations with consequences for speech intelligibility and phoneme identification, to facilitating selective attention in 'cocktail party' listening, word segmentation, and word learning. finally, the current findings have several practical implications.
speech recognition systems may be improved when including multimodal signals, even beyond articulatory cues on the face. presumably beat gestures are particularly valuable multimodal cues, because, while cross-language differences exist in the use and weighting of acoustic suprasegmental prosody (e.g., amplitude, f0, duration), beat gestures could presumably function as a language-universal prosodic cue. also, the field of human-computer interaction could benefit from the finding that beat gestures support spoken communication, for instance by enriching virtual agents and avatars with human-like gesture-to-speech alignment. the use of such agents in immersive virtual reality environments will in turn allow for taking the study of language comprehension into naturalistic, visually rich and dynamic settings while retaining the required degree of experimental control. by acknowledging that spoken language comprehension involves the integration of auditory sounds, visual articulatory cues, and even the simplest of manual gestures, it will be possible to significantly further our understanding of the human capacity to communicate.

native speakers of dutch were recruited from the max planck institute's participant pool.

for our auditory target stimuli, we used disyllabic pseudowords that are phonotactically legal in dutch (e.g., wasol /ʋa.sɔl/; see table s in supplementary material). in dutch, lexical stress is cued by three suprasegmental cues: amplitude, fundamental frequency (f0), and duration. since, in dutch, f0 is a primary cue to lexical stress in perception, we used -step lexical stress continua adopted from an earlier student project, varying in mean f0 while keeping amplitude and duration cues constant, thus ranging from a 'strong-weak' prosodic pattern (sw; step ) to a 'weak-strong' prosodic pattern (ws; step ).
these continua had been created by recording an sw (i.e., stress on initial syllable) and a ws version (i.e., stress on last syllable) of each pseudoword from the first author, a male native speaker of dutch. we first measured the average duration and amplitude values, separately for the first and second syllable, across the stressed and unstressed versions: mean duration syllable 1 = ms; syllable 2 = ms; mean amplitude syllable 1 = . db; syllable 2 = db. then, using praat, we set the duration and amplitude values of the two syllables of each pseudoword to these ambiguous values. subsequently, f0 was manipulated along a -step continuum, with the two extremes and the step size informed by the talker's originally produced f0. manipulations were always performed in an inverse manner for the two syllables of each pseudoword: while the mean f0 of the first syllable decreased along the continuum (from . to . hz in steps of . hz), the mean f0 of the second syllable increased (from . to . hz in steps of . hz). moreover, rather than setting the f0 within each syllable to a fixed value, more natural output was created by including a fixed f0 declination within the first syllable (linear decrease of . hz) and second syllable ( . hz). pilot data from a categorization task on these manipulated stimuli showed that these manipulations resulted in -step f0 continua that appropriately sampled the sw-to-ws perceptual space (i.e., from . proportion sw responses for step to . proportion sw responses for step ).

to create audiovisual stimuli, the first author was video-recorded using a canon xf camera (cf. figure a). by modulating the length of the intervening silent interval, the onset of either the first or the second vowel of the target pseudoword was aligned to the final beat gesture's apex. furthermore, the facial features of the talker were masked to hide any visual articulatory cues to lexical stress.
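the inverse f0 manipulation described above can be sketched in a few lines of python. this is only an illustration under stated assumptions, not the authors' praat procedure: the function names, the number of steps, and all hz values are hypothetical placeholders, since the actual values are not recoverable from this text. the sketch builds equally spaced mean-f0 targets that fall for the first syllable while rising for the second, plus a linear within-syllable declination centred on a given mean.

```python
def f0_targets(start_hz, end_hz, n_steps):
    """Equally spaced mean-F0 targets from start_hz to end_hz (inclusive)."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    step_size = (end_hz - start_hz) / (n_steps - 1)
    return [start_hz + i * step_size for i in range(n_steps)]


def stress_continuum(syll1_range, syll2_range, n_steps):
    """Pair the targets so the two syllables move in opposite directions."""
    syll1 = f0_targets(syll1_range[0], syll1_range[1], n_steps)
    syll2 = f0_targets(syll2_range[0], syll2_range[1], n_steps)
    return list(zip(syll1, syll2))


def declining_contour(mean_hz, drop_hz, n_points):
    """Linear F0 fall of drop_hz whose average equals mean_hz."""
    if n_points < 2:
        raise ValueError("a contour needs at least two points")
    top = mean_hz + drop_hz / 2.0
    return [top - drop_hz * i / (n_points - 1) for i in range(n_points)]


# Hypothetical placeholder values, NOT the values used in the study.
continuum = stress_continuum((160.0, 140.0), (140.0, 160.0), n_steps=5)
contour = declining_contour(mean_hz=150.0, drop_hz=10.0, n_points=5)
```

each tuple in `continuum` holds the mean f0 of the first and second syllable for one step, so the first tuple corresponds to the strong-weak end and the last to the weak-strong end. centring the declination on the target mean keeps the step's mean f0 unchanged while avoiding a flat contour.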
finally, to mask the cross-spliced nature of the audio, which combined recordings with variable room acoustics, the silent interval and pseudoword were mixed with stationary noise from a silent recording of the gesture lab.

we created new disyllabic pseudowords, with either a short /ɑ/ or a long /a:/ as first vowel (e.g., bagpif /bɑx.pɪf/ vs. baagpif /ba:x.pɪf/; cf. table s ). this new set had a fixed syllable structure (cvc.cvc, only stops and fricatives) to reduce item variance and to facilitate syllable-level annotations. the same male talker as before was recorded producing these pseudowords in four versions: with short /ɑ/ and stress on the first syllable (e.g., BAGpif); with short /ɑ/ and stress on the second syllable (bagPIF); with long /a:/ and stress on the first syllable (BAAGpif); with long /a:/ and stress on the second syllable (baagPIF). after manual annotation of syllable onsets and offsets, we measured the values of the suprasegmental cues to lexical stress (f0, amplitude, and duration) in all four conditions. pseudowords with long /a:/ and stress on the first syllable were selected for manipulation. first, we manipulated the cues to lexical stress by setting these to ambiguous values using psola in praat: each first syllable was given the same mean value calculated across all stressed and unstressed first syllables (mean f0 = hz; original contour maintained; amplitude = . db; duration = ms), and each second syllable was given the same mean value across all stressed and unstressed second syllables (mean f0 = hz; original contour maintained; amplitude = . db; duration = ms). this resulted in manipulated pseudowords that were ambiguous in their prosodic cues to lexical stress. since this included manipulating duration across all recorded conditions as well, it also meant that the duration cues to the identity of the first vowel were ambiguous.
second, the first /a:/ vowel was extracted and manipulated to form a spectral continuum from long /a:/ to short /ɑ/. in dutch, the /ɑ-a:/ vowel contrast is cued by both spectral cues (lower first (f1) and second formant (f2) values for /ɑ/) and temporal cues (shorter duration for /ɑ/). we decided to create -step f2 continua.

to create audiovisual stimuli, these manipulated pseudowords were spliced into the audiovisual stimuli from experiment 1. once again, manipulating the silent interval between carrier sentence offset and target onset resulted in two different alignments of the last beat gesture to either first vowel onset or second vowel onset. as in experiment 1, facial features of the talker were masked, and silent intervals as well as manipulated pseudowords were mixed with stationary noise to match the room acoustics across the sentence stimuli.

participants were tested individually in a sound-conditioning booth. they were seated at a distance of approximately cm in front of a , cm by , cm screen and listened to stimuli at a comfortable volume through headphones. stimulus presentation was controlled by presentation software (v . ; neurobehavioral systems, albany, ca, usa). participants were presented with two blocks of trials: an audiovisual block, in which the video and the audio content of the manipulated stimuli were concomitantly presented, and an audio-only control block, in which only the audio content but no video content was presented. block order was counter-balanced across participants. on each trial, the participants' task was to indicate whether the sentence-final pseudoword was stressed on the first or the last syllable (2-alternative forced choice task; 2afc). before the experiment, participants were instructed to always look at the screen during trial presentations. in the audiovisual block, trials started with the presentation of a fixation cross. after ms, the audiovisual stimulus was presented (video dimensions: x pixels).
at stimulus offset, a response screen was presented with two response options on either side of the screen, for instance: WAsol vs. waSOL; positioning of response options was counter-balanced across participants. participants were instructed that capitals indicated lexical stress, that they could enter their response by pressing the "z" key on a regular keyboard for the left option and the "m" key for the right option, and that they had to give their response within s after stimulus offset; otherwise a missing response was recorded. after their response (or at timeout), the screen was replaced by an empty screen for ms, after which the next trial was initiated automatically. the trial structure in the audio-only control block was identical to the audiovisual block, except that instead of the video content a static fixation cross was presented during stimulus presentation. participants were presented with pseudowords, sampled from -step f0 continua, in beat gesture conditions (onset of first or second vowel aligned to apex of beat gesture), resulting in unique items in a block. within a block, item order was randomized. before the experiment, four practice trials were presented to participants to familiarize them with the materials and the task. participants were given the opportunity to take a short break after the first block.

experiment 2 used the same procedure as experiment 1, except that only audiovisual stimuli were used. also, instead of making explicit 2afc decisions about lexical stress, participants were instructed to simply repeat back the sentence-final pseudoword after stimulus offset, and to hit the "enter" key to move to the next trial. no mention of lexical stress was made in the instructions. participants were instructed to always look at the screen during trial presentations. at the beginning of the experiment, participants were seated behind a table with a computer screen inside a double-walled acoustically isolated booth.
participants wore a circum-aural sennheiser game zero headset with attached microphone throughout the experiment to ensure that amplitude measurements were made at a constant level. participants were presented with pseudowords, sampled from step , , and of the f continua, in beat gesture conditions, resulting in unique items. each unique item was presented twice over the course of the experiment, leading to trials per participant. item order was random and item repetitions only occurred after all unique items had been presented. before the experiment, one practice trial was presented to participants to familiarize them with the stimuli and the task.

experiment was run online using psytoolkit version . . ; , because of limitations due to the coronavirus pandemic. each participant was explicitly instructed to use headphones and to run the experiment with undivided attention in quiet surroundings. each participant was presented with all shadowed productions from two talkers from experiment , leading to different lists, with two participants assigned to each list. stimuli from the same talker were grouped to facilitate talker and speech recognition, and presented in a random order. the task was to indicate after stimulus offset whether the pseudoword had stress on the first or the second syllable ( afc). two response options were presented, with capitals indicating stress (e.g., WAsol vs. waSOL), and participants used the mouse to select their response. no practice items were presented and the experiment was self-timed. note that the same participants who performed experiment also participated in experiment . to avoid explicit awareness about lexical stress influencing their behavior in experiment , these participants always first ran experiment , and only then took part in experiment .

experiment was run online using testable (https://testable.org).
each participant was explicitly instructed to use headphones and to run the experiment with undivided attention in quiet surroundings. before the experiment, participants were instructed to always look at and listen carefully to the audiovisual talker and to categorize the first vowel of the sentence-final target pseudoword ( afc). specifically, at stimulus offset, a response screen was presented with two response options on either side of the screen, for instance: bagpif vs. baagpif. participants entered their response by pressing the "z" key on a regular keyboard for the left option and the "m" key for the right option. the next trial was only presented after a response had been recorded (iti = ms). no mention of lexical stress was made in the instructions. participants were presented with pseudowords, sampled from -step f continua, in beat gesture conditions (onset of first or second vowel aligned to apex of beat gesture), resulting in unique items in a block. each participant received two identical blocks with a unique random order within blocks. trials with missing data (n = ; . %) were excluded from analysis. data were statistically

all shadowed productions were manually evaluated and annotated for syllable onsets and offsets. we excluded trials (< . %) because no speech was produced, and another trials ( . %) because participants produced more phonemes than the original target pseudoword (e.g., hearing wasol but repeating back kwasol), which would affect our duration measurements. thus, acoustic analyses only involved correctly shadowed productions (n = ; %) and incorrectly shadowed productions with the same number of phonemes as the target (n = ; . %). measurements of mean f , duration, and amplitude were calculated for each individual syllable using praat, and are summarized in figures s , s , and s in the supplementary material.
individual syllable measurements of f (in hz), duration (in ms), and amplitude (in db) were entered into separate linear mixed models as implemented in the lmertest library (version . - ) in r. statistical significance was assessed using the satterthwaite approximation for degrees of freedom . the structure of the three models was identical: they included fixed effects of syllable (categorical predictor; with the first syllable mapped onto the intercept), continuum step (continuous predictor; scaled and centered around the mean), and beat condition (categorical predictor; with a beat gesture on the first syllable mapped onto the intercept), and all interactions. the models also included participant and pseudoword as random factors, with by-participant and by-item random slopes for all fixed effects. model output is given in table and the combined predicted estimates of the three models are visualized in figure . the observed effects of syllable in all three models demonstrate word-final effects: the last syllables of words typically have lower amplitude, lower f , and longer durations (cf. figure ). the effects of continuum step confirm that participants adhered to our instructions to carefully shadow the pseudowords. that is, for higher steps on the f continuum (i.e., sounding more ws-like), the amplitude and f of the first syllable (mapped onto the intercept) decreased (see simple effects of continuum step), while amplitude and f of the second syllable increased (interactions syllable and continuum step). finally, and most critically, the duration of the first syllable (mapped onto the intercept) decreased when the beat gesture was on the second syllable (simple effect of beat condition; cf. figure ). that is, when the beat fell on the second syllable, the first syllable was more likely to be perceived as unstressed, leading participants to reduce the duration of the first syllable of their shadowed production.
moreover, when the beat fell on the second syllable, the second syllable itself had a higher amplitude and a longer duration (interactions syllable and beat condition; cf. figure ). that is, with the beat gesture on the second syllable, the second syllable was more likely to be perceived as stressed, resulting in louder and longer second syllables in the shadowed productions.

suggesting a tendency for the more ambiguous steps to show a larger effect of the beat gesture. we analyzed the binomial categorization data (/a:/ coded as ; /ɑ/ as ) using a glmm with a logistic linking function. fixed effects were continuum step and beat condition (coding adopted

the experimental data of this study will be made publicly available for download upon publication. at that time, they may be retrieved from https://osf.io/b kue under a cc-by

the first or the second syllable. each target pseudoword was sampled from a -step continuum (on x-axis) varying f in opposite directions for the two syllables. in the audiovisual block (solid lines), participants were significantly more likely to report perceiving lexical stress on the first syllable if the speaker produced a beat gesture on the first syllable (in blue) vs. second syllable (in orange). no such difference was observed in the audio-only control block (transparent lines). c. participants in experiment categorized the audio-only shadowed productions, recorded in experiment , as having lexical stress on either the first or the second syllable. if the audiovisual stimulus in experiment involved a pseudoword with a beat gesture on the first syllable, the elicited shadowed production was more likely to be perceived as having lexical stress on the first syllable (in blue). conversely, if the audiovisual stimulus in experiment involved a pseudoword with a beat gesture on the second syllable, the elicited shadowed production was more likely to be perceived as having lexical stress on the second syllable (in orange). d.
participants in experiment categorized audiovisual pseudowords as containing either short /ɑ/ or long /a:/ (e.g., bagpif vs. baagpif). each target pseudoword contained a first vowel that was sampled from a -step f continuum from long /a:/ to short /ɑ/ (on x-axis). moreover, prosodic cues to lexical stress (f , amplitude, syllable duration) were set to ambiguous values. if the audiovisual speaker produced a beat gesture on the first syllable (in blue), participants were biased to perceive the first syllable as stressed, making the ambiguous first vowel relatively short for a stressed syllable, leading to a lower proportion of long /a:/ responses. conversely, if the audiovisual speaker produced a beat gesture on the second syllable (in orange), participants were biased to perceive the initial syllable as

veni grant - - ). we would like to thank giulio severijnen for help in creating the pseudowords of experiments - , and nora kennis, esther de kerf, and myriam weiss for their help in testing participants and annotating the shadowing recordings.

key: cord- -pcuzylva authors: swoger, maxx; gupta, sarthak; charrier, elisabeth e.; bates, michael; hehnly, heidi; patteson, alison e. title: vimentin intermediate filaments mediate cell shape on visco-elastic substrates date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pcuzylva the ability of cells to take and change shape is a fundamental feature underlying development, wound repair, and tissue maintenance. central to this process is physical and signaling interactions between the three cytoskeletal polymeric networks: f-actin, microtubules, and intermediate filaments (ifs). vimentin is an if protein that is essential to the mechanical resilience of cells and regulates cross-talk amongst the cytoskeleton, but its role in how cells sense and respond to the surrounding extracellular matrix is largely unclear. to investigate vimentin’s role in substrate sensing, we designed polyacrylamide hydrogels that mimic the elastic and viscoelastic nature of in vivo tissues. using wild-type and vimentin-null mouse embryonic fibroblasts, we show that vimentin enhances cell spreading on viscoelastic substrates, even though it has little effect in the limit of purely elastic substrates. our results provide compelling evidence that the vimentin cytoskeletal network is a physical modulator of how cells sense and respond to mechanical properties of their extracellular environment.
the physical properties of the extracellular environment impact many cellular and tissue-level functions [ ] [ ] [ ] . cells sense and respond to physical forces by converting them into cellular signals through a process known as mechano-sensing [ ] . recent studies indicate that the cell cytoskeleton plays a key role in transmitting forces from cell-matrix adhesions to the nuclear surface, which can alter nuclear shape and lead to changes in gene expression [ ] [ ] [ ] . the cell cytoskeleton is composed of three polymeric networks: f-actin, microtubules, and intermediate filaments (if). vimentin is an if protein that is essential to the structural integrity of the cell [ , ] and to maintaining nuclear shape [ ] , but its role in cellular mechano-sensing is unclear. a key mechano-sensing signature in fibroblasts is increasing cell spread area on elastic substrates of increasing stiffness [ , ] . a recent study has shown that loss of vimentin decreases cell spreading on soft elastic substrates but increases cell spreading on stiff substrates [ ] . in the range of physiologically relevant stiffnesses, on the order of - kpa in shear moduli, however, the spread areas of wild-type and vimentin-null mef are nearly indistinguishable [ ] . on elastic substrates, the ability of cells to generate traction stresses and change shape depends on the shear modulus of the cytoskeleton, which is dominated by f-actin and microtubules [ ] . yet, most biological tissues are not only elastic but viscoelastic, capable of dissipating applied stresses on timescales relevant to cellular mechanical sensing [ ] [ ] [ ] . substrate viscoelasticity induces changes in cell morphology and cytoskeletal structure, reducing stress fiber formation and suppressing cellular adhesions [ ] .
on viscoelastic substrates, stresses imposed by a spreading cell dissipate and cell spreading is set by a balance between the stress relaxation time scales of the viscoelastic substrate and the cell focal adhesion turnover rate [ ] . the role of vimentin in the mechanical resilience of cells [ , , ] and its role in focal adhesion assembly [ , ] suggest that despite the modest effect of vimentin on elastic surfaces, its effects in more physiologically relevant viscoelastic settings might be more evident. to assess the effect of vimentin on cell surface sensing, here we use polyacrylamide hydrogels that model the elastic and viscoelastic properties of real tissues. using wild-type and vimentin-null mouse embryonic fibroblasts (mef; example immunofluorescence images are shown in fig. ), we find that, unlike substrate stiffness, small changes in substrate viscoelasticity have profound effects on how vimentin impacts cell spreading. unlike on elastic substrates, loss of vimentin significantly reduces cell spread area on substrates with viscous dissipation. our results suggest vimentin intermediate filaments are a significant contributor to cellular mechano-sensing and could drive differences in cell spreading and motility in tissue.

wild-type mefs and vimentin-null mefs were kindly provided by j. ericsson (abo akademi university, turku, finland) and maintained in dulbecco's modification of eagle's medium (dmem) + . g/l glucose, l-glutamine, and sodium pyruvate. culture medium was supplemented with % fetal bovine serum, % penicillin streptomycin with mm hepes and % non-essential amino acids. cell cultures were maintained at °c with % co2. cultures were passaged when they reached % confluence.

cells were fixed for immunofluorescence using % paraformaldehyde (fisher scientific, hampton, new hampshire). cell membranes were permeabilized with .
% triton-x (fisher bioreagents, hampton, new hampshire) in pbs for minutes at room temperature and blocked with % bovine serum albumin (bsa) (fisher bioreagents, hampton, new hampshire) for minutes at room temperature. for vimentin visualization, cells were incubated with primary anti-vimentin polyclonal chicken antibody (novus biologicals, centennial, colorado) diluted : in % bsa in pbs for hours at room temperature; the secondary antibody anti-chicken alexa fluor (invitrogen, carlsbad, california) was used at a dilution of : in % bsa in pbs and incubated, covered from light, for hour at room temperature. for visualizing paxillin, we used primary anti-paxillin mouse antibody (bd biosciences, san jose, california) diluted : in % bsa in pbs and incubated for hours at room temperature; the secondary antibody was anti-mouse alexa fluor (invitrogen, carlsbad, california) at a dilution of : in % bsa in pbs, incubated, covered from light, for hour at room temperature. cells were stained for dna using hoechst (molecular probes, eugene, oregon) at a concentration of : in % bsa in pbs; cells were stained for actin using rhodamine phalloidin (invitrogen, carlsbad, california) at a dilution of : in % bsa in pbs. cells were incubated for hour, covered from light, at room temperature while staining for both dna and actin. cells were imaged using a leica dmi (leica, bannockburn, illinois) equipped with a spinning disk x-light v confocal unit using a hc pl apo x/ . w corr cs water immersion objective. images were acquired using the visiview software (visitron systems, puchheim, germany) using a . micron z-step series with . micron steps and analyzed with imagej (nih, bethesda, maryland). (andor technologies, south windsor, ct). cells were maintained at °c and % co2 using a tokai hit (tokai-hit, shizuoka-ken, japan) stage top incubator and imaged using a x objective. cell areas were traced manually using imagej software. a minimum of cell areas were measured for at least independent experiments.
gels were prepared as described by charrier et al. [ ] . fully polymerized chains of linear polyacrylamide (paa) are enmeshed in crosslinked paa networks to create viscoelastic gels containing separate elastic and viscous components. the linear paa chains do not bond to the elastic network, and their effect is to increase the viscous contributions to the gel by allowing reorganization of the viscoelastic network via linear polymer relaxations. chains of linear paa were polymerized by making a solution of . % acrylamide in distilled water. this solution was degassed, then . % ammonium persulfate and . % tetramethylethylenediamine (temed) were added to the solution. the linear paa was allowed to polymerize for hours at °c. the elastic component of the polyacrylamide gel was synthesized by mixing acrylamide, bis-acrylamide, linear paa, and distilled water. polymerization was initiated by the addition of ammonium persulfate and temed. concentrations of elastic and viscoelastic gel components are listed in table i . to facilitate cell adhesion to the substrate, we covalently linked the paa network with collagen i. the gels are designed so that collagen is bound only to the elastic component of the gel network by incorporating . % nhs into its polymerizing solution. gels were then incubated in a µg/ml collagen solution for hr at room temperature. then, gels were washed three times with pbs and sterilized by exposure to ultraviolet light.

rheology measurements were performed on a malvern panalytical kinexus ultra+ rheometer (malvern panalytical, westborough, massachusetts) using a mm diameter parallel plate geometry. the elastic and viscoelastic gel solutions were polymerized at room temperature between the rheometer plates at a gap height of mm. the time evolution of polymerization was monitored by applying a small oscillatory shear strain of % at a frequency of rad/sec for minutes.
the values for the elastic shear modulus and viscous shear modulus of each gel were determined by their plateau values once the gel had fully polymerized. stress relaxation measurements were performed on a fully polymerized sample by applying a constant shear strain of % and tracking the resulting stress relaxation with time.

data are presented as mean ± standard error (se). each experiment was performed at least twice. an unpaired, two-tailed student's t-test at the % confidence level was used to determine statistical significance. denotations: *, p ≤ . ; **, p ≤ . ; ***, p < . ; ns, p > . .

to examine the effects of substrate viscous dissipation on cell spreading, we prepared polyacrylamide (paa) hydrogels with elastic and viscoelastic material properties. paa gels are a common model system for soft cell culture substrates. once polymerized, acrylamide and bis-acrylamide form a linearly elastic network with time-independent responses to stress. to form a viscoelastic hydrogel, a dissipative element of linear paa chains is incorporated into the network, as recently described in charrier et al. [ ] . a schematic of the viscoelastic gels is shown in fig. a , where the linear paa chains (pink) are integrated into the elastic crosslinked paa network (cyan and green). the dissipative component of these gels relaxes in a time-dependent response to an applied stress, imparting viscoelastic behavior to the gel. the mechanical properties of the gels are characterized by a shear storage (elastic) modulus g' and a viscous loss modulus g'', measured via oscillatory rheology. to determine the effects of viscous dissipation on cell spreading independently of substrate stiffness, the gels were designed with a fixed storage modulus (g' ≅ kpa) but variable loss modulus g'' for the elastic (g'' ≅ pa) and viscoelastic gels (g'' ≅ pa), as measured at a frequency of rad/s and % shear strain amplitude (fig. ).
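as a hedged illustration of the statistics described above (mean ± se and an unpaired student's t-test), the following minimal python sketch computes both quantities for two samples. the function names and the example values are hypothetical and are not taken from the study. the sketch returns only the t statistic and the degrees of freedom; converting these to a two-tailed p-value requires the t distribution (for example via scipy.stats.ttest_ind, assuming scipy is available).

```python
import math
from statistics import mean, stdev


def mean_se(values):
    """Return (mean, standard error of the mean) for one sample."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(sample_a, sample_b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups,
    which is the assumption behind the classical Student's t-test.
    """
    na, nb = len(sample_a), len(sample_b)
    var_a, var_b = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    dof = na + nb - 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / dof
    std_err_diff = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(sample_a) - mean(sample_b)) / std_err_diff, dof


# made-up numbers for illustration only (not measurements from the paper)
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = student_t(group_a, group_b)
```

the pooled-variance form is used here because the text names student's t-test; if the two groups had clearly unequal variances, a welch-type correction would be the usual alternative, but that is not what the source describes.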
the loss modulus of the viscoelastic gel is thus % of its elastic modulus, which is in the - % range of real tissue [ ] . if the time scale of substrate relaxation is similar to the time scale of cellular mechanosensing, the substrate relaxation may provide an important feedback for cells attempting to spread on viscoelastic substrates. to determine the time dependence of force dissipation on viscoelastic substrates, we performed stress relaxation measurements (fig. e) . a shear strain of % is continuously applied to the gel and the resulting stress relaxation is measured with time t. the stress relaxes to a finite value above zero, indicating that the viscoelastic gels behave as a viscoelastic solid. the decrease in stress can be captured by an exponential decay function, $\sigma(t) = a\,e^{-t/\tau} + c$, where $\sigma$ is the shear stress, τ is the characteristic timescale of force dissipation, and a and c are fitting constants. by fitting this relationship to the data in fig. e , we obtain a characteristic timescale τ of ≈ seconds, which is expected to be relevant to cellular motion in the extracellular matrix environment [ ] .

to determine the effect of vimentin on the cellular response to substrate viscous dissipation, we seeded wild-type and vimentin-null mouse embryonic fibroblasts (mef) on the elastic and viscoelastic gels (fig. a) . to facilitate cell attachment to the substrates, the elastic component of the hydrogels was covalently linked with collagen i. cells were imaged hours after plating, and bright-field images were used to quantify the spread area of single cells (fig. b) . we found that substrate viscoelasticity decreased cell spread area for both cell types, as seen on other viscoelastic hydrogels [ ] , but in a strongly vimentin-dependent manner. on purely elastic kpa gels, the two cell types spread similarly (fig. b) , as on other soft elastic substrates [ ] . however, on viscoelastic gels, their response was different (fig. b) : the vimentin-null mef were significantly less spread compared to wild-type mef.
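the exponential fit to the stress relaxation data described above can be sketched as follows. this is a minimal python illustration, assuming the form σ(t) = a·exp(−t/τ) + c and assuming the plateau c is already known (for example, read off from the long-time stress). with c fixed, ln(σ − c) is linear in t, so a and τ follow from an ordinary least-squares line. the function name, the fixed-plateau shortcut, and the synthetic parameter values are assumptions made for illustration and are not the authors' fitting procedure.

```python
import math


def fit_relaxation(times, stresses, plateau):
    """Fit sigma(t) = a*exp(-t/tau) + c with c fixed to `plateau`.

    Linearizes the model as ln(sigma - c) = ln(a) - t/tau and fits a
    straight line by ordinary least squares. Returns (a, tau).
    """
    xs, ys = [], []
    for t, s in zip(times, stresses):
        if s > plateau:  # the logarithm is only defined above the plateau
            xs.append(t)
            ys.append(math.log(s - plateau))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points above the plateau")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -1.0 / slope


# synthetic, noise-free data with made-up parameters (not the paper's values)
A_TRUE, TAU_TRUE, C_TRUE = 2.0, 10.0, 1.0
ts = [float(i) for i in range(0, 50, 5)]
sigma = [A_TRUE * math.exp(-t / TAU_TRUE) + C_TRUE for t in ts]
a_fit, tau_fit = fit_relaxation(ts, sigma, plateau=C_TRUE)
```

on noisy measurements the log transform over-weights points near the plateau, so a nonlinear least-squares fit of all three constants would likely be preferable; the linearized version is shown only because it is short and dependency-free.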
taken together, these results indicate that vimentin contributes to how cells sense and respond to the physical properties of the extracellular matrix environment, particularly to substrate viscoelasticity.

we next examined how substrate viscous dissipation affects cell-substrate and cell-cell interactions (fig. ) . to quantify cell-substrate adhesions, we counted the total number of adherent cells after hr. on both elastic and viscoelastic substrates, wild-type cells were more successful in remaining adhered to the substrate when compared to vimentin-null cells. this observation was more striking on viscoelastic substrates, where on average wild-type cells were -fold more likely to remain adhered (fig. a) . these results suggest that cell adhesion to viscoelastic substrates correlates with vimentin expression. cell spreading depends not only on the mechanical properties of the substrate matrix but also on direct contact-mediated cell-cell interactions. we observed different rates of cell clustering between our wild-type and vimentin-null mef on viscoelastic substrates. vimentin-null mef adhered to viscoelastic substrates formed noticeably more clusters after hours, whereas cells with vimentin or on elastic substrates did not cluster tightly. we quantified cell-cell interactions by the length of direct contact between two touching cells, which is independent of the adherent cell density. fig. b shows two examples of these cell-cell interactions. for wild-type cells on elastic substrates, cell-cell contacts are small compared to the cell's contact with the substrate, whereas the vimentin-null cells on viscoelastic substrates are strongly coupled. to normalize the cell-cell contact with cell spread area, we define the cell-cell contact by dividing the contact length by the total perimeter of the pair of cells. as shown in fig.
, we find that substrate viscoelasticity increases cell-cell contact for both cell types, indicating that cell-cell interactions promote adhesion on viscoelastic substrates. while cell-cell contact is similar between wild-type and vimentin-null mef on elastic substrates, cell-cell contact increases more than % for vimentin-null cells compared to wild-type cells on the viscoelastic substrates. taken together, these results indicate that vimentin may facilitate adhesion to viscoelastic substrates by promoting cell-matrix adhesions over cell-cell interactions.

next, we analyzed the organization of vimentin networks on elastic and viscoelastic substrates. vimentin networks are apparent in wild-type mef on all substrates, but the organization of the vimentin networks varied, as shown in the immunofluorescence images in fig. . on the elastic substrates, vimentin was spread throughout the cytoplasm of the cells, with filamentous bundles extending toward the periphery of the cells. filamentous vimentin bundles such as these have been implicated as load-bearing units in the cell, contributing to proper alignment of actin-based cell traction stress [ ] . however, on viscoelastic substrates, the vimentin network was more condensed in a mesh-like cage around the cell nucleus and filamentous vimentin strands were less evident. when cultured on viscoelastic gels, the fibroblasts were less spread, suggesting a reduced ability to form stable attachments to substrates. cells form adhesion sites with the substrate through integrins and the extracellular matrix ligands to which they bind. to visualize cell-matrix adhesions, we fixed cells and stained them with antibodies against paxillin, a cellular protein found in focal adhesion sites (fig. ) . most of the cells (> %) displayed paxillin patches and actin stress fibers on elastic substrates. this was reduced by nearly % on viscoelastic gels in both wild-type and vimentin-null mef.
these results suggest that substrate viscoelasticity alters focal adhesion formation and actin architecture. the number of cells with focal adhesions and stress fibers was similar between wild-type and vimentin-null cells, suggesting an independent effect of vimentin in enhancing cell spreading on viscoelastic substrates. cell spreading is a complex process that probes fundamental interactions between the behavior of the cell and its coupling to the underlying substrate matrix. this process depends on the cellular cytoskeleton and focal adhesion complexes that attach the cell to the substrate. the intermediate filament protein vimentin contributes to cell morphology and the mechanical resilience of cells, but its role in how cells sense and respond to physical properties of the environment is largely unclear. here, we provide evidence that vimentin is an integral element of cellular mechanical sensing, particularly on substrates with viscous dissipation. we found that wild-type (vim +/+) and vimentin-null (vim -/-) mefs spread similarly on soft purely elastic substrates, in agreement with prior experiments [ , ]; however, cells lacking vimentin have decreased spreading on viscoelastic substrates. substrate viscoelasticity altered the vimentin network structure, collapsing filamentous vimentin bundles into a mesh-like cage around the nucleus. based on models of vimentin as a load-bearing structure that helps align actin-based stresses [ ], this observed change in vimentin organization suggests that vimentin is under less mechanical tension on viscoelastic substrates compared to elastic substrates. on viscoelastic gels, cells are less spread and have fewer paxillin patches, indicating a weaker cell-substrate interaction. loss of vimentin correlated with fewer substrate-adherent cells and a preference for cell-cell interactions (fig. ). cells are thought to sense their substrate through a "motor-clutch" mechanism [ ].
in this context, focal adhesions act as molecular clutches, physically linking the cellular cytoskeleton to the extracellular matrix substrate. these physical linkages transmit traction forces to the underlying substrates. motor clutches are generally assumed to act as slip bonds, with a dissociation rate that increases with the amount of force it bears by coupling the cell with the substrate. as motors bind with the substrate, they generate friction and resist the retrograde flow of f-actin filaments. this allows actin polymerization to push the leading edge of the cell forward, which ultimately results in cell spreading. the strong effect of substrate viscoelasticity on vimentin-dependent cell spreading might be surprising given that vimentin is dispensable for cell spreading on soft elastic substrates [ ] . on the other hand, filamentous vimentin networks are less dynamic [ ] and more viscoelastic [ ] compared to the actin and microtubule networks in the cytoskeleton. thus, the stabilizing effect of vimentin on microtubule orientation [ ] and actin-based stresses [ , ] may be particularly evident for cell spreading on viscoelastic substrates, which act to dissipate and effectively lessen cell-generated traction stresses. on viscoelastic substrates, the dynamics of focal adhesion assembly and disassembly compete with substrate viscous dissipation to facilitate cell spreading. recent motor-clutch models [ ] have been developed to capture the effects of adhesion dynamics and substrate viscoelasticity on cell spreading. on viscoelastic substrates, cells feel a time-dependent effective stiffness that decreases over a characteristic substrate relaxation time scale. this results in weaker adhesions and faster retrograde flows, which decrease cell spreading. 
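The two ingredients of the motor-clutch picture described above, a slip bond whose dissociation rate rises with load and a substrate whose effective stiffness relaxes over a characteristic time, can be sketched as follows. This is only an illustration, not the model used in the cited work: the Bell-type exponential force dependence, the single-exponential stiffness relaxation, and every name and parameter (`k_off0`, `f_bond`, `k_long`, `k_extra`, `tau`) are assumptions introduced here.

```python
import math


def slip_bond_off_rate(force, k_off0, f_bond):
    """Dissociation rate of a slip bond under load (assumed Bell-type form).

    force   -- load carried by the clutch (same units as f_bond)
    k_off0  -- unloaded dissociation rate (1/time)
    f_bond  -- characteristic bond rupture force (hypothetical parameter)
    """
    return k_off0 * math.exp(force / f_bond)


def effective_stiffness(t, k_long, k_extra, tau):
    """Time-dependent stiffness felt on a relaxing substrate (assumed form).

    Starts at k_long + k_extra at t = 0 and decays toward the long-time
    value k_long over the substrate relaxation time tau.
    """
    return k_long + k_extra * math.exp(-t / tau)


def clutch_survival(t, force, k_off0, f_bond):
    """Probability that a clutch held at constant force is still bound at time t."""
    return math.exp(-slip_bond_off_rate(force, k_off0, f_bond) * t)
```

With these definitions, a larger load shortens the expected bond lifetime, and a substrate probed on time scales much longer than `tau` appears softer than the same substrate probed quickly.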
prior experiments using fluorescence recovery after photobleaching (frap) analysis of gfp-paxillin showed that the rate of paxillin turnover in mcf- cells is significantly increased in cells with high levels of vimentin expression [ ]. if vimentin increases the binding time of the clutches, then clutches can quickly bind and break without forming large stable focal adhesions. the time scale of binding is larger than the lifetime of a focal adhesion. this results in an effective "frictional slippage" regime, which stalls retrograde flow and promotes cell spreading. since focal adhesion turnover is slower in cells lacking vimentin, they are predicted to sense a lower effective substrate stiffness. in this case, the clutch binding time is less than the lifetime of the motor clutches, resulting in a "load and fail" regime. the adhesive forces are expected to be smaller than in the frictional slippage regime of vimentin-expressing cells, resulting in increased retrograde flow and reduced cell spreading. overall, our results suggest that vimentin could promote cell spreading on viscoelastic substrates by mediating emergent interactions between focal adhesion dynamics and substrate relaxation time scales. taken together, our results indicate a new function of vimentin intermediate filaments in modulating cellular response to viscoelastic environments, with important implications for understanding cell and tissue functions. intermediate filaments play diverse roles in a range of cell and tissue functions [ , ] and are important to maintaining cell morphology and adhesion [ ]. vimentin intermediate filaments in particular have been implicated in cataracts [ ], coronaviruses [ ], wound healing [ ], and many forms of metastatic cancer [ ] [ ] [ ]. our results indicate that vimentin promotes cellular adhesion and motility through viscoelastic environments, such as extracellular matrices and in vivo tissue.
impaired wound healing and cell migration in cells lacking vimentin could result from a reduced ability for cells to interact strongly with viscoelastic tissue. more broadly, our results show that vimentin contributes to how cells respond to viscoelastic properties of the extracellular matrix where applied stresses dissipate on timescales relevant to cellular mechanical sensing.
matrix crosslinking forces tumor progression by enhancing integrin signaling
matrix elasticity directs stem cell lineage specification
the influence of substrate creep on mesenchymal stem cell behaviour and phenotype
tissue cells feel and respond to the stiffness of their substrate
cell geometric constraints induce modular gene-expression patterns via redistribution of hdac regulated by actomyosin contractility
regulation of nuclear architecture, mechanics, and nucleocytoplasmic shuttling of epigenetic factors by cell geometric constraints
nuclear shape, mechanics, and mechanotransduction
demonstration of mechanical connections between integrins, cytoskeletal filaments, and nucleoplasm that stabilize nuclear structure
high stretchability, strength, and toughness of living cells enabled by hyperelastic vimentin intermediate filaments
vimentin protects cells against nuclear rupture and dna damage during migration
substrate stiffness regulates solubility of cellular vimentin
fibroblast adaptation and stiffness matching to soft elastic substrates
vimentin enhances cell elastic behavior and protects against compressive stress
normal and fibrotic rat livers demonstrate shear strain softening and compression stiffening: a model for soft tissue mechanics
quantification of liver viscoelasticity with acoustic radiation force: a study of hepatic fibrosis in a rat model
control of cell morphology and differentiation by substrates with independently tunable elasticity and viscous dissipation
soft hyaluronic gels promote cell spreading, stress fibers, focal adhesion, and membrane tension by phosphoinositide signaling, not traction force
impaired mechanical stability, migration and contractile capacity in vimentin deficient fibroblasts
the role of vimentin intermediate filaments in cortical and cytoplasmic mechanics
vimentin induces changes in cell shape, motility, and adhesion during the epithelial to mesenchymal transition
filamin a is required for vimentin-mediated cell adhesion and spreading
vimentin fibers orient traction stress
traction dynamics of filopodia on compliant substrates
intermediate filaments: possible functions as cytoskeletal connecting links between the nucleus and the cell surface
viscoelastic properties of vimentin compared with other filamentous biopolymer networks
vimentin intermediate filaments template microtubule networks to enhance persistence in cell polarity and directed migration
bidirectional interplay between vimentin intermediate filaments and contractile actin stress fibers
intermediate filaments and stress
dominant cataract formation in association with a vimentin assembly disrupting mutation
surface vimentin is critical for the cell entry of sars-cov
vimentin in cancer and its potential as a molecular target for cancer therapy
the role of vimentin intermediate filaments in the progression of lung cancer
intermediate filaments as effectors of cancer development and metastasis: a focus on keratins
we acknowledge useful discussions with jennifer schwarz, vivek shenoy, farid alisafaei, bobby carroll, and paul janmey. this work was supported by the national science foundation mcb awarded to a.e. patteson.
key: cord- -xx l r e authors: kodani, andrew; knopp, kristeene a.; di lullo, elizabeth; retallack, hanna; kriegstein, arnold r.; derisi, joseph l.; reiter, jeremy f. title: zika virus alters centrosome organization to suppress the innate immune response date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: xx l r e zika virus (zikv) is a flavivirus transmitted via mosquitoes and sex to cause congenital neurodevelopmental defects, including microcephaly. inherited forms of microcephaly (mcph) are associated with disrupted centrosome organization. similarly, we found that zikv infection disrupted centrosome organization. zikv infection disrupted the organization of centrosomal proteins including cep , a mcph-associated protein. the zikv nonstructural protein ns bound cep , and expression of ns was sufficient to alter centrosome architecture and cep localization. loss of cep suppressed zikv-induced centrosome disorganization, indicating that zikv requires cep to disrupt centrosome organization. zikv infection or loss of cep decreased the centrosomal localization and stability of tank-binding kinase (tbk ), a regulator of the innate immune response. zikv infection or loss of cep also increased the centrosomal accumulation of the cep interactors, mindbomb (mib ) and dtx , ubiquitin ligases that respectively activate and degrade tbk . therefore, we propose that zikv ns binds cep to increase centrosomal dtx localization and destabilization of tbk , thereby tempering the innate immune response. in addition to identifying a mechanism by which cep controls the innate immune responses in zikv infection, we propose that the altered centrosomal organization caused by altered cep function may contribute to zikv-associated microcephaly. in the summer of , cerebral malformations were linked to mosquito-transmitted zika virus (zikv). inherited forms of microcephaly (mcph) are characterized by reduced head and brain size, resulting in severe intellectual disability and motor movement defects. many forms of mcph are caused by autosomally recessive mutations in genes encoding centrosomal proteins required for centrosome biogenesis and mitotic progression . 
given the similar pathologies between zikv-associated and inherited mcph, we hypothesized that both disorders were due to centrosomal defects leading to disrupted brain development. in mammalian cells, centrosomes serve as the microtubule organizing center of the cell to facilitate neuronal migration and cell division - . the centrosome is composed of centrioles surrounded by a pericentrosomal matrix that nucleates microtubules , . during s phase, the centrosome duplicates by recruiting specialized proteins to the centriole base , - . many mcph-associated proteins are recruited to the centrosome in a hierarchical manner to promote centrosome duplication , [ ] [ ] [ ] [ ] . defects in centrosome organization and biogenesis leading to cell death or premature differentiation in neural progenitors may underlie the pathology of many forms of mcph , , . cells infected with zikv have disrupted centrosome organization and mitotic abnormalities, leading to altered neural progenitor differentiation [ ] [ ] [ ] [ ] [ ] . however, the mechanism by which zikv disrupts centrosome architecture remains unclear. we found that zikv alters the function of the mcph-associated protein, cep . more specifically, the zikv nonstructural protein, ns , localizes to the centrosome and binds cep . zikv ns overaccumulates cep at centrosomes of zikv-infected cells and recruits the ubiquitin ligases, mib and dtx . mib and dtx promote the degradation of tbk , a regulator of the innate immune response [ ] [ ] [ ] [ ] [ ] . consequently, zikv-infected cells express lower levels of interferon b (ifnb), a key anti-viral signal. we propose that zikv recruits cep to the centrosome to degrade tbk , dampening anti-viral responses, and disrupted centrosomal architecture in zikv-infected cells may also perturb neural development. 
to explore whether microcephaly associated with zikv infection involves the centrosome, we infected human induced neural precursor cells (inpcs) derived from induced pluripotent stem cells with zikv and examined the organization of their centrosomes. sixteen hours post infection (hpi) with zikv, inpcs displayed supernumerary foci of centrin, a component of the centriolar distal lumen (figure a, c). similarly, zikv-infected u and h cells exhibited supernumerary centrin foci (figure b, c, and supplemental figure a). like zikv-infected cultured cells, npcs isolated from human fetal neocortical tissue infected by zikv exhibited supernumerary centrin foci (figure d). in zikv-infected inpcs and u cells, the supernumerary centrin foci co-localized with the distal centriole protein cp (figure e and supplemental figure b), indicating that zikv-associated supernumerary centrin foci can recruit centriolar proteins. however, the supernumerary centrin foci do not accumulate the distal appendage protein cep or the pericentrosomal protein γ-tubulin (figure f, supplementary figure b, c). zikv infection did not alter the levels of cp or γ-tubulin (supplemental figure d). electron microscopy revealed that zikv infection did not alter centriole number or distal appendages hpi (figure g). instead, zikv-infected cells accumulated electron dense particles in the vicinity of the centrosome. thus, acute zikv infection does not cause centrosome overproduction, but rather disorganizes the centrosome. many mcph-associated proteins dynamically localize to the centrosome during its biogenesis [ ] [ ] [ ]. given that zikv infection alters centrosome organization, we investigated whether the localization of mcph-associated proteins is also altered. zikv infection in u cells did not affect either the localization or levels of the mcph-associated proteins plk , stil, sas , cdk rap , cep , or wdr (supplemental figure a-d).
in striking contrast, zikv infection caused cep , another mcph-associated protein , to overaccumulate at supernumerary centrin foci (figure a, quantified in figure b). as in u cells, zikv infection of inpcs and npcs isolated from dissociated fetal brain tissue relocalized cep to supernumerary centrin foci (figure c). although zikv infection did not dramatically affect cep protein levels, it did increase the levels of a higher molecular weight species of cep (figure d). because mcph-associated proteins function in centrosome biogenesis, we hypothesized that the disruption of cep localization participates in zikv-associated centrosomal disorganization. to test this, we infected control and cep gt/gt mouse embryonic fibroblasts (mefs) with zikv and assessed centrin organization. in contrast to control cells, zikv did not induce supernumerary centrin foci in the absence of cep (figure e-f). together, these findings indicate that cep is required for zikv-associated reorganization of the centrosome. in zikv-infected inpcs, the non-structural viral protein ns localized to one or two foci among the supernumerary centrin foci at hpi (figure a). a yeast two-hybrid analysis suggested that cep binds the non-structural protein ns of flaviviruses related to zikv . as zikv infection causes cep to overaccumulate at centrosomes, we determined whether zikv ns interacts with cep . immunoprecipitation of endogenous cep in zikv-infected cells revealed that cep interacts with zikv ns (figure b). to confirm the interaction of zikv ns with cep , we transfected cells with myc-tagged brazilian zikv ns and found that myc-ns co-precipitated with endogenous cep , but not cp , another centrosomal protein that misaccumulates at the centrosome upon zikv infection (figure c). as zikv ns interacts with cep , we investigated whether cep was required to localize ns to the centrosome. to test this, we infected control and cep gt/gt mefs with zikv and examined ns localization hpi.
similar to control cells, zikv ns localized to centrosomes in cep gt/gt mefs ( figure d ) suggesting that ns affects cep localization, but not vice versa. ns acts together with another nonstructural protein, ns b, to form a proteolytic complex ( figure e ) . to gain insight into how zikv disrupts centrosomes, we examined whether the ns b/ns complex could induce supernumerary centrin foci or if ns alone is sufficient. expression of either brazilian zikv myc-ns b/ns or myc-ns induced supernumerary centrin foci ( figure f-h) , suggesting ns alone is sufficient to perturb centrosome organization. additionally, myc-ns increased the centrosomal accumulation of cep ( figure i and quantified in supplementary figure d ) and had no effect on the localization or stability of cep , cdk rap and wdr ( figure i and supplemental figure b -d). therefore, brazilian zikv ns interacts with cep , localizes to the centrosome, and is sufficient to reorganize centrosomal architecture akin to zikv infection. microcephaly is associated with south american strains of zikv, but not uganda zikv [ ] [ ] [ ] . given the ns proteins of the brazilian and ugandan strains of zikv differ by amino acids, we examined whether ugandan ns was sufficient to induce supernumerary centrin foci and misrecruit cep . like brazilian ns , ugandan zikv ns localized to centrosomes ( figure j ). in striking contrast to brazilian zikv ns , the ugandan ns did not induce supernumerary centrin foci or increase centrosomal cep localization ( figure j , and quantified in supplementary figure e , f). together, these findings indicate that brazilian zikv ns has acquired the ability to increase centrosomal cep localization and induce supernumerary centrin foci. to assess how centrosomal disorganization participates in zikv biology, we examined whether centrosomal disorganization contributes to viral production. we found that loss of cep did not alter zikv production ( figure a ). 
as cep is not required for zikv production, we hypothesized that it may alter the cellular response to zikv infection. zikv inhibits type i interferon effector signaling by degrading stat and, by an unknown mechanism, inhibits interferon induction , . during viral infections, the kinase tbk phosphorylates and activates the transcription factor irf , and is then degraded, to regulate interferon induction , , . tbk localizes to centrosomes [ ] [ ] [ ] [ ] [ ] [ ] and proximity interaction studies have suggested that tbk and cep interact . through co-immunoprecipitation, we confirmed that endogenous cep and tbk interact (figure b). an activated form of tbk , phospho-tbk (p-tbk ), is removed from centrosomes upon zikv infection during mitosis , and we confirmed that p-tbk is also removed during interphase (figure c, quantified in supplemental figure a). as p-tbk is absent from centrosomes of zikv-infected cells, we assessed whether zikv infection induces interferon beta (ifnβ) using reverse-transcription digital droplet pcr (rt-ddpcr). following zikv infection, ifnβ expression gradually peaked to hpi (figure d). ifnβ expression was abruptly curtailed at to hpi in zikv-infected cells, corresponding to the time when centrosomes begin to disorganize. zikv infection decreased levels of tbk and phosphorylated tbk (p-tbk ) at hpi (figure e). therefore, we propose that zikv may suppress the innate immune response by reducing tbk stability and disrupting centrosome organization. to test whether zikv disrupts tbk function in a cep -dependent manner, we examined whether centrosomal p-tbk localization depends upon cep . indeed, p-tbk was absent from centrosomes in cep gt/gt mefs (figure f, quantified in supplemental figure b), indicating that cep is required for the centrosomal localization of p-tbk . similarly, tbk and p-tbk levels were decreased in cep gt/gt mefs (figure g). these data are consistent with zikv interfering with cep function to disrupt tbk stability.
how might zikv reorganize the centrosome in a cep -dependent manner to affect tbk stability? in response to sendai virus infections, mindbomb (mib ) binds and activates tbk through k -linked ubiquitylation , , and proximity interaction studies of cep identified mib as an interactor . we confirmed the interaction of mib to tbk and cep using reciprocal co-immunoprecipitation of endogenous mib with tbk and cep ( figure a and supplemental figure a ). furthermore, we assessed whether zikv infection induced a change in the centrosomal localization of mib . zikv infection increased centrosomal accumulation of mib in inpcs, fetal npcs, and u cells ( figure b and supplemental figure b , c). zikv infection mildly altered the stability of mib , but significantly increased the levels of a higher molecular weight species ( figure c ). we assessed whether cep restricts mib localization to the centrosome. like zikv infection, depletion of cep increased mib accumulation at the centrosome and resulted in the appearance of a higher molecular weight species of mib (supplemental figure d -f), indicating that mib localization to centrosomes is restricted by cep . we then tested whether cep regulates mib activity by examining the levels of mib dependent k -ubiquitylation of tbk in cep gt/gt mefs. k -ubiquitylation of tbk was significantly increased in cep mutant cells, suggesting that increased centrosomal mib may facilitate its ability to ubiquitylate tbk at the centrosome (supplemental figure g -h). similar to the loss of cep , k -ubiquitylation of tbk was dramatically increased in zikv infected cells ( figure d and supplemental figure i ). together, these data suggest that zikv disrupts cep to affect the levels and function of mib at the centrosome. once activated the innate immune response is attenuated by k ubiquitin-mediated degradation of tbk by the ubiquitin ligase dtx . as tbk localizes to the centrosome, we examined whether dtx is, like mib , present at centrosomes. 
interestingly, dtx partially co-localized with γ-tubulin (supplemental figure j). the specificity of the antibody was confirmed by sirna (supplemental figure j). in accordance with previously published data, depletion of dtx led to the stabilization of tbk (supplemental figure k). as tbk stability is disrupted in the absence of cep , we hypothesized that dtx interacts with and is localized to the centrosome by cep . reciprocal immunoprecipitation of cep and dtx confirmed that these two proteins interact. we have found that zikv-produced ns binds cep to alter centrosome organization without causing centrosome amplification , increases centrosomal mib and dtx , decreases tbk levels, and disrupts the innate immune response. as cep mutations cause microcephaly in humans , disrupted cep function may underlie the pathogenesis of zikv-associated microcephaly. moreover, the centrosomal cep interactors mib and dtx , regulators of neurogenic notch and innate immune signaling, respectively, are altered upon either loss of cep or zikv infection. based on these results, we propose that zikv ns disrupts cep function both to dampen the innate immune response and to disrupt developmental signaling during brain development. previous studies have reported conflicting effects on centrosome biogenesis in zikv-infected cells , , , , . our results using human neural progenitor cells from fetal tissue, induced pluripotent stem cells, and established neural cell lines indicate that acute zikv infection disrupts centrosome organization but does not lead to centrosome amplification. as studies by others on centrosome biogenesis examined later time points following zikv infection, centrosomal overabundance in zikv-infected cells may be secondary to multiple rounds of abnormal cell division. we found that zikv-produced ns localizes to the centrosome to induce the formation of supernumerary centrin foci by interacting with and localizing increased amounts of cep to the centrosome.
the failure of zikv infection in cep mutant mefs to induce supernumerary centrin foci supports our assertion that zikv disrupts centrosome organization in a cep -dependent manner. we and others have shown that cep promotes centriole duplication , , , . however, its function in other cellular processes has not been explored. here, we have provided evidence that cep controls the stability of tbk , a central component of innate immune signaling. given that cep interacts with and limits the localization of the centrosomal ubiquitin ligases, mib and dtx , to the centrosome, a direct role for cep in promoting cellular signaling at the centrosome is implicit. mib , a k -linked ubiquitin ligase, is a key regulator of notch signaling, a pathway that controls neural progenitor maintenance and cell division . a less studied role for mib is in the innate immune response . we have found that mib overaccumulates at the centrosome and k -ubiquitylates tbk in response to zikv infection or cep loss. the increased activity of mib in zikv-infected neural progenitors or cep -depleted cells suggests that mib activity in the notch signaling pathway could also be affected. in agreement with this, a recent publication has demonstrated that mib levels are affected in response to zikv infection , , raising the possibility that zikv-associated microcephaly is a side effect of zikv altering the centrosome to evade host immunity. we found that dtx localizes to the centrosome and promotes the k -ubiquitylation of tbk . similar to mib , dtx accumulates at the centrosome in zikv-infected cells in a cep -dependent manner. as tbk is ubiquitylated by dtx to promote its degradation, we propose that zikv-mediated recruitment of dtx to the centrosome may limit tbk activity and stability, and thus the innate immune response, to zikv infection.
it will be interesting to determine whether other viruses with ns homologues, such as sars-cov- m pro and nsp , which interact with centrosome proteins, can similarly suppress innate immunity by altering centrosome organization. in summary, we have found that zikv ns localizes to the centrosome and recruits cep and its binding partners mib and dtx to ubiquitylate and degrade tbk , a key regulator of the innate immune response. these findings provide mechanistic insight into how zikv specifically targets the centrosome, with implications for how it may both evade viral detection and alter brain development. hela and t/ cells (ucsf tissue culture facility) were cultured in advanced dulbecco's modified eagle's medium (dmem, thermo fisher scientific) supplemented with % fetal bovine serum (fbs, thermo fisher scientific) and glutamax-i (thermo fisher scientific). u and h cells were cultured in dmem (thermo fisher scientific) supplemented with % fbs and l-glutamine (thermo fisher scientific). cep gt/gt mefs and control cep +/- mefs were cultured in amniomax c- (thermo fisher scientific). t/ and hela cells were transfected with plasmids using fugenehd (promega) or lipofectamine (thermo fisher scientific), respectively, according to the manufacturer's instructions and analyzed h later. npcs were derived from pluripotent stem cells (nih human embryonic stem cell registry line wa (h ) at passages - ) according to a recently published protocol and maintained in neural media composed of dmem/f with sodium pyruvate and glutamax, n , b , heparin and antibiotics. medium was supplemented with the growth factors epidermal growth factor ( ng/ml) and fibroblast growth factor ( ng/ml). de-identified fetal brain tissue samples were collected with prior patient consent in strict observance of the legal and institutional ethical regulations from elective pregnancy termination specimens at san francisco general hospital.
protocols were approved by the human gamete, embryo and stem cell research committee (an institutional review board) at the university of california, san francisco. blocks of cortical tissue spanning the ventricle to the cortical plate were dissected away from meninges and germinal zone using a stereomicroscope, and then minced using a razor blade. cells were dissociated by incubation with papain (worthington biochemical corporation) at °c for - min, followed by addition of dnase i and trituration. the cells were collected by centrifugation for min at g, the supernatant was removed, and the cells were resuspended in sterile dmem containing n- , b- supplement, penicillin, streptomycin and glutamine and sodium pyruvate ( . mg/ml) (all invitrogen). the suspension was passed through a μm strainer (bd falcon) to yield a uniform suspension of single cells. cells were plated at a density of . x cells/well on mm coverslips (neuvitro -gg-pdl) precoated with high concentration growth factor-reduced matrigel (bdbiosciences, ) and cultured at °c, % co , % o . we acquired cep tm . (komp)vlcg (also called pibf tm . (komp)vlcg ) heterozygous mice (jackson laboratory). cep tm . (komp)vlcg is a deletion of chr : - (grcm .p ), covering all coding exons of cep . we isolated mefs from littermate e . embryos produced from a heterozygous intercross. mefs were genotyped by quantitative pcr using express sybr greener qpcr supermix, with premixed rox (invitrogen) on a real-time pcr machine (applied biosystems) and cep genotyping primers listed in the reagents table. zikv strain prvabc was propagated in vero cells. viral titers were determined by focus assay . briefly, serial dilutions of viral stock were added to vero cells in -well plates. h post-infection, inoculum was removed and cells were fixed with . % pfa for minutes. foci were visualized by immunofluorescent staining for flavivirus envelope. 
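The arithmetic behind a focus assay titer and the inoculum needed for a chosen multiplicity of infection (MOI) can be sketched as below. This is a generic illustration, not the authors' protocol: the function names, the units (focus-forming units per ml), and the example numbers in the usage note are assumptions introduced here, since the actual dilutions, volumes, and MOIs are not recoverable from the text.

```python
def titer_ffu_per_ml(foci_count, dilution, inoculum_volume_ml):
    """Viral titer from a focus assay well.

    foci_count          -- number of stained foci counted in the well
    dilution            -- dilution of the stock in that well (e.g. 1e-4)
    inoculum_volume_ml  -- volume of diluted virus added to the well (ml)
    """
    if foci_count < 0 or dilution <= 0 or inoculum_volume_ml <= 0:
        raise ValueError("counts must be non-negative; dilution and volume positive")
    return foci_count / (dilution * inoculum_volume_ml)


def inoculum_volume_ml(target_moi, n_cells, titer):
    """Volume of undiluted stock (ml) that delivers target_moi to n_cells."""
    if target_moi < 0 or n_cells <= 0 or titer <= 0:
        raise ValueError("MOI must be non-negative; cell number and titer positive")
    return target_moi * n_cells / titer
```

For example, with hypothetical values of 50 foci at a 1e-4 dilution and 0.1 ml per well, `titer_ffu_per_ml(50, 1e-4, 0.1)` gives about 5e6 focus-forming units per ml, and `inoculum_volume_ml(1.0, 1e5, 5e6)` gives about 0.02 ml of stock for 1e5 cells at an MOI of 1.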
zikv infections of npcs, u and h cell lines, at mois of , were carried out by incubating cells with inoculum for h and then replacing the inoculum with fresh media. zikv was added to dissociated fetal brain cells at mois of and incubated for h unless stated otherwise. mock and zikv-infected cells were fixed - h post infection in chilled methanol for min at - °c and processed for immunofluorescence. wild type human codon optimized zikv ns b and ns open reading frame flanked by attb sites were synthesized by gibson assembly (sgi). gateway cloning into pdonr generated pentr-ns zikv. subsequent gateway-mediated subcloning into pdest-cmv-myc (gift of keith yamamoto) generated pcmv-myc-ns brazilian and ugandan zikv, encoding n-terminally myc-tagged ns and pcmv-myc-ns b brazilian zikv encoding an n-terminally myc-tagged fusion of ns b and ns . the s a mutant form of ns was generated using site-directed mutagenesis to create pcmv-myc-ns -s a brazilian zikv (quikchange ii, agilent). immunoprecipitations were performed as previously described . in brief, mock or zikv infected u cells or t/ cells were collected in dulbecco's phosphate-buffered saline (dpbs), lysed in lysis buffer ( % igepal ca- , mm tris-hcl ph . , mm nacl, . mm kcl, . mm kh po , . mm na hpo - h o) supplemented with protease and phosphatase inhibitors (emd millipore). myc-tagged proteins were immunoprecipitated with ag myc monoclonal agarose beads (emd millipore), washed three times in lysis buffer and boiled in x laemmli reducing buffer (bio-rad). samples were separated on - % tgx precast gels (bio-rad), transferred onto protran ba nitrocellulose membrane (ge healthcare) and subsequently analyzed by immunoblot using ecl lightening plus (perkin-elmers) or supersignal west dura (thermo fisher scientific). cells were fixed in - °c methanol for minutes followed by permeabilization in blocking buffer ( . % bsa, . % triton-x , . % nan in dpbs) for min. 
primary and secondary antibodies were diluted in blocking buffer and incubated with cells for at least h. to detect cells in s-phase, cells were co-stained with antibodies to centrin and cyclin a to determine centriole number and s-phase/g cells, respectively. to detect phospho-tbk , u and mefs were fixed in % paraformaldehyde for minutes, blocked in % fbs and . % tritonx diluted in dpbs. permeabilized cells were stained overnight at room temperature using the p-tbk antibody at a dilution of : in % fbs and . % tritonx diluted in dpbs. samples were mounted in gelvatol and imaged with an axio observer d or lsm (zeiss). images were processed using adobe photoshop and analyzed using fiji. (d) immunoblot of mock or zikv-infected u cell lysates probed for ns , cp , or gtubulin. actin served as a loading control. name sequence cep wt l '-ggaaaccattttattgcgacag- ' cep wt r '-ctcaaagtctcgcagatttcg- ' cep gt l '-ctcatcaatgtatcttatcatgtctgg- ' cep gt r '-tcgactactaggaaagcaacgag- ' cep -wt l '-gtaggaccaggccttagcgttag- ' cep -wt r '-
cdk rap (green), cep (green) or wdr (green). (b-c) the fluorescence intensities ± s.d.
of plk , stil, sas , cdk rap , cep and wdr were quantified in mock and zikv infected u cells scale bars indicate μm for all images. (c) immunoblot of hela cells expressing myc-tagged ns probed for c-myc, cdk rap , cep , wdr , and cep . actin served as a loading control. asterisk denotes specific band for cep . (d) quantification of mean fluorescence intensities ± s.d. of cdk rap , cep , wdr , and cep in hela cells transfected with an empty c-myc vector (control) or brazilian myc-ns . for fluorescence quantifications cells were analyzed per experiment (n= ). asterisk denotes p< . (student's t test). (e) quantification of hela cells transfected with ugandan myc-ns with greater than four centrioles. (f) quantification of mean fluorescence intensities ± s.d. of cep in control and ugandan myc-ns expressing hela cells. supplemental figure : (a) quantification of mean fluorescence intensities ± s.d. of centrosomal p-tbk in mock and zikv-infected u cells. for fluorescence quantifications cells were analyzed per experiment (n= ) asterisk denotes p< . (student's t test). (d) s phase sc and cep sirna transfected hela cells co-stained for centrin (red) and mib (green). (e) mean fluorescence intensity quantifications of mib in sc and cep sirna-treated hela cells. (f) lysates from control and cep gt/gt mefs immunoblotted for mib . asterisk denotes a higher molecular weight species of mib actin served as a loading control. (i) mock and zikv-infected u cell lysates used to immunoprecipitate tbk in figure d were analyzed by western blot using antibodies to ns and tbk . actin served as a loading control. (j) sc or dtx -depleted hela cells were co-stained for g-tubulin and dtx . (k) immunoblot of hela cells transfected with sc or dtx sirna probed for dtx and tbk . actin served as a loading control. (l) quantification of mean fluorescence intensities ± s.d. of dtx in mock or zikv-infected u cells. 
(m) quantification of centrosomal dtx in sc and cep depleted hela cells expressed as a mean fluorescence intensities ± s.d. of the control. (n) total cell lysate from mock and zikv-transfected u cells hpi were analyzed by western blot using antibodies to zikv ns and dtx . actin served as a loading control. (o) immunoblot analysis of lysate from sc and cep sirna-treated hela cells using antibodies to cep and dtx . actin served as a loading control. (p) control and cep gt/gt mef lysates used to immunoprecipitate tbk in figure m were analyzed by western blot using antibodies to tbk . actin served as a loading control. (q) cell lysates from mock and zikv-infected u cells used to immunoprecipitate

key: cord- -cnlvyey authors: tekman, mehmet; batut, bérénice; ostrovsky, alexander; antoniewski, christophe; clements, dave; ramirez, fidel; etherington, graham j; hotz, hans-rudolf; scholtalbers, jelle; manning, jonathan r; bellenger, lea; doyle, maria a; heydarian, mohammad; huang, ni; soranzo, nicola; moreno, pablo; mautner, stefan; papatheodorou, irene; nekrutenko, anton; taylor, james; blankenberg, daniel; backofen, rolf; grüning, björn title: a single-cell rna-seq training and analysis suite using the galaxy framework date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cnlvyey

background: the vast ecosystem of single-cell rna-seq tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. the uptake of 10x genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically-driven methods needed to process and understand these ever-growing datasets.
results: here we outline several galaxy workflows and learning resources for scrna-seq, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. the galaxy reproducible bioinformatics framework provides tools, workflows and trainings that not only enable users to perform one-click 10x preprocessing, but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. the downstream analysis supports a wide range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal and clustering. the teaching resources cover an assortment of different concepts from computer science to cell biology. access to all resources is provided at the singlecell.usegalaxy.eu portal.

conclusions: the reproducible and training-oriented galaxy framework provides a sustainable hpc environment for users to run flexible analyses on both 10x and alternative platforms. the tutorials from the galaxy training network along with the frequent training workshops hosted by the galaxy community provide a means for users to learn, publish and teach scrna-seq analysis.

key points: single-cell rna-seq has stabilised towards 10x genomics datasets. galaxy provides rich and reproducible scrna-seq workflows with a wide range of robust tools. the galaxy training network provides tutorials for the processing of both 10x and non-10x datasets.

single-cell rna-seq and cellular heterogeneity. the continuing rise in single cell technologies has led to unprecedented levels of analysis into cell heterogeneity within tissue samples, providing new insights into developmental and differentiation pathways for a wide range of disciplines.
gene expression studies are now performed at a cellular level of resolution, which, compared to bulk rna-seq methods, allows researchers to model their tissue samples as distributions of different expressions instead of as an average.

pathways from single-cell data. the various expression profiles uncovered within tissue samples imply discrete cell types which are related to one another across an "expression landscape". the relationships between the more distinct profiles are inferred via distance metrics or manifold learning techniques. ultimately, the aim is to model the continuous biological process of cell differentiation from multipotent stem cells to distinct mature cell types, and infer lineage and differentiation pathways between transient cell types [ ].

elucidating cell identity. trajectory analysis which integrates the up or down regulation of significant genes along lineage branches can then be performed in order to uncover the factors and extracellular triggers that can coerce a pluripotent cell to become biased towards one cell fate outcome compared to another. this undertaking has created a new frontier of exploration in cell biology, where researchers have assembled reference maps for different cell lines for the purpose of fully recording these cell dynamics and their characteristics, with which to create a global "atlas" of cells [ , ].

sequencing sensitivity and normalization. with each new protocol comes a host of new technical problems to overcome. the first wave of software utilities to deal with the analysis of single cell datasets were statistical packages, aimed at tackling the issue of "dropout events" during sequencing, which would manifest as a high prevalence of zero-entries in over % of the feature-count matrix. these zeroes were problematic, since they could not be trivially ignored: their presence stated that either the cell did not produce any molecules for that transcript, or that the sequencer simply did not detect them.
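as a minimal sketch of how the prevalence of zero entries described above can be measured, the following python snippet computes the fraction of zeroes in a small feature-count matrix. the toy matrix and the function name `zero_fraction` are hypothetical and chosen only for illustration; they are not taken from any dataset or package discussed here.

```python
def zero_fraction(counts):
    """Return the fraction of entries in a genes x cells matrix that are zero."""
    total = sum(len(row) for row in counts)
    if total == 0:
        raise ValueError("count matrix is empty")
    zeros = sum(1 for row in counts for value in row if value == 0)
    return zeros / total


# hypothetical toy matrix: rows are genes, columns are cells
toy_counts = [
    [0, 3, 0, 0],
    [5, 0, 0, 1],
    [0, 0, 0, 0],
]

print(f"zero fraction: {zero_fraction(toy_counts):.2f}")
```

the number on its own cannot say which zeroes are true absences of a transcript and which are missed detections, which is exactly the ambiguity the statistical packages set out to address.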
normalisation techniques originally developed for bulk rna-seq had to be adapted to accommodate the uncertainty over whether a zero reflects absence or missed detection, and new ones were created that harnessed hurdle models, data imputation via manifold learning techniques, or the pooling of subsets of cells together to build general linear models [ ]. with the downstream analysis packages attempting to solve the dropouts via stochastic methods, the upstream sequencing technologies also aspired to solve the capture efficiency via new well, droplet, and flow cytometry based protocols, all of which lend importance to the process of barcoding sequencing reads. in each protocol, cells are tagged with cell barcodes such that any reads derived from them can be unambiguously assigned to the cell of origin. unique molecular identifiers (umis) are also employed to mitigate the effects of amplification bias of transcripts within the same cell. the detection, extraction, and (de-)multiplexing of cell barcodes and umis is therefore one of the first hurdles researchers encounter when receiving raw fastq data from a sequencing facility.

since the conception of scrna-seq, several different packages and many pipelines have been developed to assist researchers in its analysis [ , ]. the vast majority of these packages were written for the r programming language since many of the novel normalisation methods developed to handle the dropout events depended on statistical packages that were primarily r-based [ ]. standalone analysis suites emerged as the different authors of these packages rapidly expanded their methods to encapsulate all facets of the single-cell analysis, often creating compatibility issues with previous package versions. the bioconductor repository provided some much-needed stability in this regard by hosting stable releases, but researchers were still prone to building directly from repository sources in order to reap the benefits of new features in the upstream versions [ , ].

nonexchangeable data formats.
another issue was the proliferation of the many different and quickly evolving r-based file formats for processing and storing the data, such as singlecellexperiment as used by the scater suite, scseq from raceid, and seuratobject from seurat [ , ]. many new packages would cater only towards one format or suite, leading to the common problem that data processed in one suite could not be reliably processed by methods in another. this incompatibility between packages fuelled a choice of one analysis suite over another, or conversely required researchers to dig deeper into the internal semantics of r s4 objects in order to manually slot data components together [ ]. these problems only accelerated the rapid development of these suites, leading to further version instability. as a result of this analysis diversity, there are many tutorials on how to perform scrna-seq analysis, each oriented around one of these pipelines [ ].

error propagation and analysis uncertainty. different pipelines produce different results, where the stochastic nature of the analyses means that any uncertainty in a crucial quality control step upstream, such as filtering or the removal of unwanted variability, can propagate forward into the downstream sections to yield wildly different results on the same data. this uncertainty, and the statistically-driven methods to overcome it, leave a wide knowledge gap for researchers simply trying to understand the underlying dynamics of cell identity.

10x launch. in , 10x genomics released their gemcode product, which was a droplet-seq based protocol capable of sequencing tens of thousands of cells with an average cell quality higher than other facilities [ ]. this unprecedented level of throughput steadily gained traction amongst researchers and startups seeking to perform single-cell analysis, and thus 10x datasets began to prevail in the field.

10x analysis software.
10x genomics provided software that was able to perform much of the pre-processing, and provided feature-count matrices in a transparent hdf5-based format which provided a means of efficient matrix storage and exchange, and conclusively removed the restriction for downstream analysis modules to be written in r.

scanpy, a popular alternative. the scanpy suite [ ], written in python, using its own hdf5-based anndata format became a valid alternative for analysing 10x datasets. the seurat developers had similar aspirations and soon adopted the loom format, another hdf5 variant. however, the popularity of scanpy rose as it began to integrate the methods of other standalone packages into its codebase, making it the natural choice for users who wanted to achieve more without compromising on compatibility between different suites [ ].

accessible science. as the size of datasets scaled, so did the computing resources required to analyse them, both in terms of the processing power and in storage. galaxy is an open-source biocomputing infrastructure that exemplifies the three main tenets of science: reproducibility, peer review, and open access, all freely accessible within the web-browser [ ]. it hosts a wide range of highly-cited bioinformatics tools with many different versions, and enables users to freely create their own workflows via a seamless drag-and-drop interface.

reproducible workflows. galaxy can make use of conda or containers to set up tool environments in order to ensure that the bioinformatics tools will always be able to run, even when the library dependencies for a tool have changed, by building tools under locked version dependencies and bundling them together in a self-contained environment [ ]. these environments provide a concise solution for the package version instability that plagues scrna-seq analysis notebooks, both in terms of reproducibility and analysis flexibility.
a user could keep the quality control results obtained from an older version of scanpy, whilst running a newer scanpy version at the clustering stage to reap the benefits of the later improvements in that algorithm. by allowing the user to select from multiple versions of the same tool, and by further permitting different versions of the tools within a workflow, galaxy enables an unprecedented level of free-flow analysis by letting researchers pick and choose the best aspects of a tool without worrying about the underlying software libraries [ ]. the burden of software incompatibility and choice of programming language that previously plagued the scrna-seq analysis ecosystem is now lifted from the user.

user-driven custom workflows. analyses are not limited to one pipeline either, as the datasets which are passed between tools can easily be interpreted by a different tool that is capable of reading that dataset. in the case of scrna-seq, galaxy can convert between csv, mtx, loom and anndata formats. this interexchange of modules from different tools further extends the flexibility of the analysis by again letting the user decide which component of a tool would be best suited for a specific part of an analysis.

training resources. galaxy also provides a wide range of learning resources, with the aim of guiding users step-by-step through an analysis, often reproducing the results of published works. the teaching and training materials are part of the galaxy training network (gtn), which is a worldwide collaborative effort to produce high-quality teaching material in order to educate users in how to analyse their data, and in turn to train others with the same materials via easily deployable workshops backed by monthly stable releases of the gtn materials [ ]. training materials are provided on a wide variety of different topics, and workshops are hosted regularly, as advertised on the galaxy events web portal.
the gtn has grown rapidly since its conception and gains new volunteers every year who each contribute and coordinate training and teaching events, maintain topics and subtopics, translate tutorials into multiple languages, and provide peer review on new material [ ].

stable workflows in galaxy. the analysis of scrna-seq within galaxy was a two-pronged effort concentrated on bringing high quality single-cell tools into galaxy, and providing the necessary workflows and training to accompany them. as mentioned in the previous section, this effort needed to overcome incompatible file format issues and unstable packages due to rapid development, and needed to establish a standardised basis for the analysis.

tutorials. the tutorials are split into two main parts as outlined in figure : first, the pre-processing stage which constructs a count matrix from the initial sequencing data; second, the cluster-based downstream analysis on the count matrix. these stages are very different from one another in terms of their information content, since the pre-processing stage requires the researcher to be more familiar with wetlab sequencing protocols than the average bioinformatician normally is, and the downstream analysis stage requires the researcher to be familiar with statistics concepts that a wetlab scientist might not be too familiar with. the tutorials are designed to broadly appeal to both the biologist and the statistician, as well as complete beginners to the entire topic. the pre-processing scrna-seq materials tackle the two most common use-cases that researchers will encounter when they first begin in the field: processing scrna-seq data from 10x genomics, and processing data generated from alternative protocols. for instance, microwell-based protocols have been known to yield more features and display lower levels of dropouts compared to 10x, and so we accommodate them by providing a more customizable path through the preprocessing stage [ ].

barcode extraction.
before the era of 10x genomics, scrna-seq data had to be demultiplexed, mapped, and quantified. the demultiplexing stage entails an intimate knowledge of cell barcodes and unique molecular identifiers (umis) which are protocol dependent, and expects that the bioinformatician knows exactly where and how the data was generated. one common pitfall at this very first stage is estimating how many cells to expect from the fastq input data, and this requires three crucial pieces of information: which reads contain the barcodes (or precisely, which subset of both the forward and reverse reads contains the barcodes); of these barcodes, which specific ones were actually used for the analysis; and how to resolve barcode mismatches/errors.

barcode estimation. naive strategies involve using a known barcode template and querying against the fastq data to profile the number of reads that align to a specific barcode, often employing 'knee' methods to estimate this amount [ ]. however, this approach is not robust, since certain cells are more likely to be over-represented compared to others, and some cell barcodes may contain more unmappable reads compared to others, meaning that the metric of higher library read depth is not necessarily correlated with a better-defined cell. ultimately, the bioinformatician must inquire directly with the sequencing lab as to which cell barcodes were used, as these are often not specific to the protocol but to the technician who designed them, with the idea that they should not align to a specific reference genome or transcriptome.

quantification with cell ranger. 10x genomics simplified the scrna package ecosystem by using a language independent file format, and streamlining much of the barcode particularities with their cell ranger pipeline, allowing researchers to focus more on the internal biological variability of their datasets [ ].

quantification with starsolo.
the pre-processing workflow (titled "10x starsolo workflow") in galaxy uses the rna starsolo utility as a drop-in replacement for cell ranger, not only because it is a feature update of the already existing rna star tool in galaxy, but also because it boasts a ten-fold speedup in comparison to cell ranger and does not require illumina lane-read information to perform the processing [ , ].

other approaches. the pre-processing workflows for these "one-click" solutions consume the same datasets and yield approximately the same count matrices by following similar modes of barcode discovery and quantification. within galaxy, there is also alevin (paired with salmon) and scpipe which can both also perform the necessary demultiplexing, (alignment-free) mapping, and quantification stages in a single step [ , , ].

celseq barcoding. the custom pre-processing workflow (titled "celseq : single batch mm ") is modelled after the cel-seq protocol using the barcoding strategies of the freiburg max-planck institute laboratory as its main template, but the workflow is flexible enough to accommodate any droplet or well-based protocol such as smart-seq , and drop-seq [ ].

manual demultiplexing and quantification. the training pictographically guides users through the concepts of extracting cell barcodes from the protocol, explains the significance of umis in the process of read deduplication with illustrative examples, and instructs the user in the process of performing further quality controls on their data during the post-mapping process via rna star and other tools that are native to galaxy.

training the user. at each stage, the user's knowledge is queried via question prompts and expandable answer box dialogs, as well as helpful hints for future processing in comment boxes, all written in the transparent markdown specification developed for contributing to the gtn.

common stages of analysis.
the downstream modules are defined by the five main stages of downstream scrna-seq analysis: filtering, normalisation, confounder removal, clustering, and trajectory inference. there are three workflows to aid in this process (two of which are shown in figure ), each sporting a different well-established scrna-seq pipeline tool.

quality control with scater. the scater pipeline follows a visualise-filter-visualise paradigm which provides an intuitive means to perform quality control on a count matrix by use of repeated incremental changes on a dataset through the use of pca and library size based metrics [ ]. once this pre-analysis stage is complete, the full downstream analysis (comprising the five stages mentioned above) can be performed by workflows based on the following suites: raceid and scanpy.

downstream analysis with the raceid suite. raceid was developed initially to analyse rare cell transcriptomes whilst being robust against noise, and thus is ideal for working with smaller datasets in the range of to cells. due to its complex cell lineage and fate prediction models, it can also be used on larger datasets with some scaling costs.

downstream analysis with the scanpy suite. scanpy was developed as the python alternative to the innumerable r-based packages for scrna-seq, r being the dominant language for such analyses, and it was one of the first packages with native 10x genomics support. since then it has grown substantially, and has been re-implementing many of the newer r-based methods released in bioconductor as "recipe" modules, thereby providing a single source to perform many different types of the same analysis. the workflows derived from both these suites emulate the five main stages of analysis mentioned previously, where filtering, normalisation, and confounder removal are typically separated into distinct stages, though sometimes merged into one step depending on the tool.

cell and gene removal.
during the filtering stage, low-quality or unwanted cells are removed from the initial count matrix using commonly used parameters such as minimum gene detection sensitivity and minimum library size, and low-quality genes are also removed under similar metrics, where the minimum number of cells for a gene to be included is decided. the scater pre-analysis workflow can also be used here to provide a pca-based method of feature selection so that only the highly variable genes are left in the analysis. there is always the danger of over-filtering a dataset, whereby setting overzealous lower-bound thresholds on gene variability can have the undesired effect of removing essential housekeeping genes. these relatively uniformly expressed genes are often required for setting a baseline against which the more desired differentially expressed genes can be selected. it is therefore important that the user first performs a naive analysis and only later refines their filtering thresholds to boost the biological signal.

library size normalisation. the normalisation step aims to remove any technical factors that are not relevant to the analysis, such as the library size, where cells sharing the same identity are likely to differ from one another more by the number of transcripts they exhibit than by more relevant biological factors.

intrinsic cell factors. the first and foremost is cell capture efficiency, where different cells produce more or fewer transcripts based on the amplification and coverage conditions they are sequenced in. the second is the presence of dropout events which manifest as a prevalence of "zeroes" in the final count matrix. whether a "zero" is imputable to the lack of detection of an existing molecule or to the absence of the molecule in the cell is uncertain.
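the filtering and library size normalisation steps described above can be sketched in a few lines of plain python. this is an illustrative sketch only: the function names, the thresholds (`min_genes`, `min_library_size`, `min_cells`) and the scaling `target` are assumptions made for this example, and it does not reproduce the implementation of scater, raceid or scanpy.

```python
def filter_cells(counts, min_genes=2, min_library_size=5):
    """Keep cells (columns) with enough detected genes and enough total counts.

    counts is a list of gene rows, each a list of per-cell counts.
    Returns the filtered matrix and the indices of the cells that were kept.
    """
    n_cells = len(counts[0])
    keep = []
    for c in range(n_cells):
        column = [row[c] for row in counts]
        detected = sum(1 for v in column if v > 0)
        if detected >= min_genes and sum(column) >= min_library_size:
            keep.append(c)
    return [[row[c] for c in keep] for row in counts], keep


def filter_genes(counts, min_cells=2):
    """Keep genes (rows) detected in at least min_cells cells."""
    return [row for row in counts if sum(1 for v in row if v > 0) >= min_cells]


def normalise_library_size(counts, target=10.0):
    """Scale every cell so its counts sum to target (cells must be non-empty)."""
    n_cells = len(counts[0])
    sizes = [sum(row[c] for row in counts) for c in range(n_cells)]
    return [[row[c] * target / sizes[c] for c in range(n_cells)] for row in counts]


# hypothetical toy matrix: 4 genes x 4 cells; cell 1 is cell 0 sequenced twice as deep
toy = [
    [4, 8, 0, 1],
    [6, 12, 0, 0],
    [0, 0, 1, 0],
    [10, 20, 0, 0],
]

cells_kept, kept_index = filter_cells(toy)
genes_kept = filter_genes(cells_kept)
normalised = normalise_library_size(genes_kept)
print(kept_index)
print(normalised)
```

in this toy case the two surviving cells differ only in sequencing depth, so they become identical after scaling, which is the technical factor the normalisation step is meant to remove.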
the uncertainty over whether a zero reflects a missed or an absent molecule has alone led to a wide selection of different normalisation techniques that try to model this expression either via hurdle models, or by imputing the data via manifold learning techniques, or by working around the issue by pooling subsets of cells together [ ]. in this regard, both the raceid and scanpy workflows offer many different normalisation techniques, and users are encouraged to take advantage of the branching workflow model of galaxy to explore all possible options.

regression of cell cycle effects. other sources of variability stem from unwanted biological contributions known as confounder effects, such as cell cycle effects and transcription. depending on what stage of the cell cycle a cell was sequenced at, two cells of the same type might cluster differently because one might have more transcripts due to it being in the m-phase of the cell cycle. library sizes notwithstanding, it is the variability in specific cell cycle genes that can be the main driving factor in the overall variability. thankfully, these effects are easy to regress out, and we replicate an entire standalone scanpy workflow dedicated to detecting and visualising the effects based on the original notebook [ ].

transcriptional bursting. the transcription effects are harder to model, as these are semi-stochastic and are as yet not well understood. in bulk rna-seq the expression of genes undergoing transcription is averaged to give "high" or "low" signals, producing a global effect that gives the false impression that transcription is a continuous process. the reality is more complex, where cells undergo transcription in "bursts" of activity followed by periods of no activity, at irregular intervals [ ].
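a small toy model can illustrate how bursts in individual cells disappear in a bulk average. the sketch below is a deliberate simplification and an assumption of this example: bursts are made perfectly regular so that the output is deterministic, whereas real bursts occur at irregular intervals, and the function name `burst_expression` and all parameter values are hypothetical.

```python
def burst_expression(offset, n_timepoints=12, period=4, burst_length=1, burst_level=8):
    """Expression over time for one cell that transcribes in regular bursts.

    Regular bursts are a simplification made here for determinism;
    real bursts occur at irregular intervals.
    """
    return [
        burst_level if (t + offset) % period < burst_length else 0
        for t in range(n_timepoints)
    ]


# four hypothetical cells of the same type, bursting out of phase with each other
cells = [burst_expression(offset) for offset in range(4)]

# bulk signal: the average over all cells at each time point
bulk = [sum(cell[t] for cell in cells) / len(cells) for t in range(12)]

print("cell 0:", cells[0])
print("cell 1:", cells[1])
print("bulk  :", bulk)
```

each cell is either fully on or fully off at any time point, yet the averaged signal is flat, so two cells of the same type sampled at the same moment can legitimately show different expression for the same gene.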
at the bulk level these discrete bursts are smoothed to give a continuous effect, but at the cell level even two directly adjacent cells of the same type, normalised to the same number of transcripts, can still have different levels of expression for a gene because of this process. this cannot be corrected for, but it does teach users which factors they can and cannot control in an analysis, and how much variability they can expect to see.

dimension reduction and clustering. once users have obtained a count matrix they are confident in, they are guided through the process of dimension reduction (with a choice of different distance metrics), choosing a suitable low-dimensional embedding, and performing clustering through commonly used techniques such as k-means, hierarchical clustering, and neighbourhood community detection. these complex techniques are illustrated in layman's terms through the use of helpful images and community examples. for example, the gtn scanpy tutorial explains the louvain clustering approach [ ] via a standalone slide deck that accompanies the workflow [ ].

commonly-used embeddings. the clustering and the cluster inspection stages are deliberately separated into distinct utilities here, with the understanding that the same initial clustering can appear dissimilar under different projections, e.g. t-distributed stochastic neighbour embedding (tsne) against uniform manifold approximation and projection (umap) [ , ]. ultimately the user is encouraged to experiment with the plotting parameters to yield the best-looking clusters.

static plots or interactive environments. cluster inspection tools are available that allow users to easily generate static plots tailored to pipeline-specific information as originally defined by the software package authors.
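the clustering step mentioned above can be sketched with k-means, one of the techniques named. the example below is a minimal pure-python version that runs on an already computed two-dimensional embedding; the deterministic farthest-point initialisation, the function names and the toy coordinates are assumptions made for illustration, and louvain community detection as used in the scanpy tutorial works differently.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Cluster `points` into `k` groups; returns (labels, centres)."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centres))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres


# Hypothetical 2-D embedding of six cells forming two well-separated groups.
embedding = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
labels, centres = kmeans(embedding, k=2)
```

because the labels depend only on distances in the chosen embedding, the same cells can be grouped differently under another projection or distance metric, which is why clustering and cluster inspection are kept as separate steps.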
the anndata and loom specifications, however, store this map projection data separately, enabling the use of a plethora of plotting tools, including html-based interactive visualisations such as cellxgene [ ], that permit on-demand querying and rendering of individual cell features without having to generate static images. a collection of these galaxy interactive tools can be accessed at live.usegalaxy.eu. though these tools are excellent at dynamically displaying map projections, especially -dimensional ones, further computation must be performed to complete a full pseudotime analysis.

inferring developmental pathways. the cell pseudotime series analysis is often referred to as the trajectory inference stage, since cells are ordered along a trajectory to reflect the continuous changes of gene expression along a developmental pathway, under the assumption that the cells are transitioning from one pluripotent type to another, less potent type.

pseudotime techniques. for the trajectory inference stage, there is the partition-based graph abstraction (paga) technique championed by scanpy [ ], and there are also the fateid and stemid packages for the raceid workflow [ ]. the former provides a level of graph abstraction to the datasets in order to infer a community graph structure, which it can use to learn the shape of the data and infer pathways between neighbourhoods. the latter is more intuitive, in that it constructs a minimal spanning tree of related clusters from which lineage is inferred, and cell fate decisions can be explored by querying branches in the tree as a function of the genes that are up- or down-regulated along the currently explored pathway. the statistical strength and significance of each pathway guide the user towards more valid trajectories that more accurately reflect the biological variation occurring within transitioning cells.
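the idea of linking related clusters by a minimal spanning tree, described above, can be sketched with prim's algorithm over cluster centres. this is only an illustration of the tree construction under stated assumptions (euclidean distance between hypothetical centre coordinates); it is not the stemid or fateid implementation, which also assesses the significance of each link.

```python
import math


def minimum_spanning_tree(centres):
    """Prim's algorithm on the complete graph of cluster centres.

    Returns a list of (i, j, distance) edges, where i is already in the
    tree and j is the cluster being attached.
    """
    n = len(centres)
    if n == 0:
        return []
    in_tree = {0}  # start growing the tree from cluster 0
    edges = []
    while len(in_tree) < n:
        # Pick the shortest edge that connects the tree to a new cluster.
        best = min(
            (
                (i, j, math.dist(centres[i], centres[j]))
                for i in in_tree
                for j in range(n)
                if j not in in_tree
            ),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical cluster centres in a 2-D embedding.
centres = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (1.0, 3.0)]
tree = minimum_spanning_tree(centres)
```

in this toy example cluster 1 is attached to three other clusters and so acts as a branch point; in a lineage interpretation, such a node is where alternative fates would be explored by following each branch.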
the insights and novel cell types discovered in these analyses can also be integrated into the human cell atlas portal [ ], an initiative that aims to classify unique or rare cell types as well as their transitive properties, in order to build a comprehensive map of cells that can be used to investigate the various differentiation pathways of multipotent stem cells in the human body.

tutorial hierarchy. tutorials in the gtn are grouped by topic, e.g. variant calling, transcriptomics, and assembly. tutorials can also declare prerequisites, so that users can review required concepts from previous tutorials, e.g. quality control checks from bulk rna-seq that are still used in scrna-seq. not only does this allow users to derive a clear route through the range of training materials, but it also empowers them to choose their own learning path through the network of topics. for scrna-seq in particular, users can start their training with the pre-processing tutorials and continue through to downstream analysis.

tutorial structure. tutorials usually consist of a hands-on workflow that guides the user through an analysis with galaxy step by step, often accompanied by a slide deck that either explains standalone concepts more concisely or is used during workshops and trainings to introduce the user to the topic. in an effort to maintain reproducibility in science, all tutorials require example workflows, and all materials needed to run the workflows and tu-

user-driven contribution. user contributions are the heart of the gtn community, and options are given to suit different levels of contribution. at the casual level, each tutorial has an anonymous feedback form at the bottom for rating the quality of the tutorial and suggesting improvements, which the tutorial authors can then act on.
at the more eager level, users can contribute directly to the material hosted at the github repository using the approachable gtn markdown format, which empowers contributors not only to adapt existing material but also to write tutorials on their own specialist topics. github code reviews, paired with the plaintext gtn markdown format, facilitate easy peer review of tutorial topics using standard diff utilities.

subdomains encapsulate relevant tools. the galaxy tools and the gtn are further tied together by galaxy subdomains, which serve the various topics within their own self-contained environments. these complement the training materials by providing only the necessary galaxy tools, so as not to trouble the user with tools unrelated to the material, e.g. variant analysis tools are not included in an scrna-seq environment. this also has the benefit that smaller, more specialised galaxy instances can be packaged and deployed, avoiding the overhead of presenting the entire galaxy tool repertoire.

single cell and human cell atlas. in this light, the singlecell.usegalaxy.eu subdomain hosts the entirety of the single-cell materials, tools, workflows, and single-cell related events. a table containing the full list of tools in the subdomain, as well as their application to the previously mentioned stages of scrna-seq analysis, is given in supplemental table . human cell atlas community members, led by the european bioinformatics institute and the wellcome sanger institute, have their own subdomain at humancellatlas.usegalaxy.eu [ ], providing access to widely applicable tools including scanpy, seurat and monocle [ ], but also specialist tools such as those for cell type prediction (including scmap [ ], scpred [ ] and garnett [ ]).

analysis in galaxy workflows.
the tools outlined in the downstream workflows subsection expose the full set of parameters of their underlying program suites, in order to offer the same level of customisation that users would expect when running a notebook-based analysis. this suits the needs of most researchers, but some are more used to processing the data directly in a language-driven notebook environment.

galaxy interactive environments (gie). for the more programming-oriented users, galaxy hosts interactive environments at live.usegalaxy.eu, which allow users to spin up their own jupyter [ ] or rstudio [ ] notebooks whilst harnessing the same cloud compute infrastructure. here, users can import their galaxy datasets, process them in whatever manner they wish, and export them back into their histories in a similar way to how datasets are treated in workflows.

list of gies. in addition to interactive notebooks, the gie also boasts a selection of other interactive tools, such as the previously mentioned cellxgene featured in figure , as well as sparql (a query language interface), bam/vcf iobio (a file format analysis viewer) [ ], ethercalc (a web spreadsheet) [ ], phinch (a metagenomes visualiser) [ ], wallace (a species modelling platform) [ ], wilson (an omics visualiser) [ ], an ide for materials science, panoply (a netcdf viewer) [ ], higlass (a hi-c data visualiser) [ ], and even an xfce virtual desktop environment [ ].

growth of scrna training materials. the single-cell materials on the gtn are growing substantially every year, with at first only one pre-processing tutorial in , one downstream tutorial at the start of , and at the current time of writing three pre-processing tutorials and three downstream analysis workflows, further accompanied by slide decks and interactive visualisations.
single-cell galaxy workshops based on these materials have been given at the single-cell rnaseq training course at the earlham institute, the galaxy community conference (gcc ), within the freiburg meinbio consortium, and at the association of biomolecular resource facilities (abrf). the trainings also lend themselves readily to online webinars, which have proved useful during the covid- lockdown period.

reproducible cloud-based analysis. the advent of scrna-seq analysis within the galaxy framework echoes the efforts to standardise the analysis of scrna-seq with the promise of presenting reproducible research. the burden of computation on ever-growing datasets is shifted to cloud computing resources, and as scrna sequencing technology scales, more researchers are likely to migrate towards cloud-based solutions in order to reap the benefits of superior computing and storage capabilities. ultimately, the galaxy framework shields the user from the many nontrivial technicalities of the analysis, and exposes them to a legible interface of tools that they can pick and choose from.

longevity and accessibility. the community regularly comes together during scheduled code festivals (cofests) or hackathons to review, contribute to, and actively maintain the training materials. the number of community contributions has steadily increased over the last four years [ ], and this growing trend helps ensure that the galaxy resources will stay current and adapt to changes in scrna sequencing technology and analysis methods as necessary. the gtn also makes use of language translation tools to provide translations of the training materials in order to reach a wider, more internationally diverse audience.

future of scrna-seq in galaxy.
the capacity for growth of scrna-seq in galaxy is considerable, with new single-cell tools continually being incorporated into galaxy workflows, and the expanding gtn community bringing more expert-level contributions to the training material. users are unburdened of the vestiges of incompatible libraries and non-interchangeable file formats as web-based tools and strong biocomputing frameworks become more dominant. from the first data upload to the final finishing touches of a customized workflow, the single cell galaxy portal upholds the ideals of open science by supporting the user all the way from the initial training to the final publication, where they can export and share their results with a single click.

• project name: single-cell rna-seq analysis in galaxy
• project home page: singlecell.usegalaxy.eu
• operating system(s): web-based, platform independent
• license: gnu gpl v
• any restrictions to use by nonacademics: e.g. licence needed

all datasets used in the gtn are independently hosted at zenodo and are easily findable under the tag "galaxy training network", as well as being directly hosted within the galaxy data libraries on the usegalaxy.eu server. the tool wrappers which serve as the functional components of the many different single-cell analysis tools are hosted at the github tools-iuc repository, as well as at the galaxy toolshed under the category of "transcriptomics".

funded by the deutsche forschungsgemeinschaft (dfg, german research foundation) /grk and ba / - , the bbsrc core strategic programme grants bbs/e/t/ pr , bbs/e/t/ pr , bbs/e/t/ pr , and bbs/e/t/ pr , core capability grant bbs/e/t/ pr at the earlham institute, and the national institutes of health grant u hg . the european galaxy project is in part funded by collaborative research centre medical epigenetics (dfg grant sfb / ) and german federal ministry of education and research (bmbf grants a a/a c rbc, l b/ l c de.nbi-epi).
revealing the vectors of cellular identity with single-cell genomics
the human cell atlas: from vision to reality
the dynamics of gene expression in vertebrate embryogenesis at single-cell resolution
methods and challenges in the analysis of single-cell rna-sequencing data
scater: preprocessing, quality control, normalization and visualization of single-cell rna-seq data in r
spatial reconstruction of single-cell gene expression data
orchestrating single-cell analysis with bioconductor
list of seurat releases
scanpy release notes
singlecellexperiment: s classes for single cell data
single-cell messenger rna sequencing reveals rare intestinal cell types
s classes for distributions. the newsletter of the r project
current best practices in single-cell rna-seq analysis: a tutorial
massive and parallel expression profiling using microarrayed single-cell sequencing
scanpy: large-scale single-cell gene expression data analysis
the galaxy platform for accessible, reproducible and collaborative biomedical analyses: update
practical computational reproducibility in the life sciences
bioconda: sustainable and comprehensive software distribution for the life sciences
list of galaxy training network releases
community-driven data analysis training for
direct comparative analysis of x genomics chromium and smart-seq
umi-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy
massively parallel digital transcriptional profiling of single cells
star: ultrafast universal rna-seq aligner
alevin efficiently estimates accurate gene abundances from dscrna-seq data
salmon provides fast and bias-aware quantification of transcript expression
a flexible r/bioconductor preprocessing pipeline for single-cell rna-sequencing data
cel-seq : sensitive highly-multiplexed single-cell rna-seq
a galaxy-based training resource for single-cell rna-sequencing quality control and analyses
pooling across cells to normalize single-cell rna sequencing data with many zero counts
preprocessing and clustering k pbmcs tutorial
nature, nurture, or chance: stochastic gene expression and its consequences
fast unfolding of communities in large networks
accompanying slide deck for scanpy pbmc workflow
visualizing data using t-sne
uniform manifold approximation and projection for dimension reduction
paga: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells
fateid infers cell fate bias in multipotent progenitors from single-cell rna-seq data
science forum: the human cell atlas
user-friendly, scalable tools and workflows for single-cell analysis
the single-cell transcriptional landscape of mammalian organogenesis
scmap: projection of single-cell rna-seq data across data sets
scpred: accurate supervised method for cell-type classification from single-cell rna-seq data
supervised classification enables rapid annotation of cell atlases
jupyter notebooks-a publishing format for reproducible computational workflows
rstudio: integrated development environment for r
iobio: a web-based, real-time, sequence alignment file inspector
phinch: an interactive, exploratory data visualization framework for -omic datasets
a flexible platform for reproducible modeling of species niches and distributions built for community expansion
web-based interactive omics visualization
panoply netcdf, hdf and grib data viewer. national aeronautics and space administration-goddard institute for
higlass: web-based visual exploration and analysis of genome interaction maps
xfce: a lightweight desktop environment

we thank the bioinformatics group at the university of freiburg for the development and hosting of the european galaxy server, monika degen-hellmuth at the backofen lab for her assistance in the organization of the project, the institut français de bioinformatique (ifb) for its support of the artbio team, charles girardot at embl heidelberg for his useful feedback, and we also thank the worldwide contributions from users and developers towards the galaxy project and all upstream authors and contributors of the software ecosystem that we use and rely on.

the author(s) declare that they have no competing interests.

key: cord- -kf tra authors: sheffield, lakbira; sciambra, noah; evans, alysa; hagedorn, eli; delfeld, megan; goltz, casey; fierst, janna l.; chtarbanova, stanislava title: age-dependent impairment of disease tolerance is associated with a robust transcriptional response following rna virus infection in drosophila date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kf tra

advanced age in humans is associated with greater susceptibility to and higher mortality rates from infections, including infections with some rna viruses. the underlying innate immune mechanisms, which represent the first line of defense against pathogens, remain incompletely understood. drosophila melanogaster is able to mount potent and evolutionarily conserved innate immune defenses against a variety of microorganisms including viruses and serves as an excellent model organism for studying host-pathogen interactions. with its relatively short lifespan, drosophila also is an organism of choice for aging studies. despite numerous advantages that this model offers, drosophila has not been used to its potential to investigate the response of the aged host to viral infection.
here we show that in comparison to younger flies, aged drosophila succumb more rapidly to infection with the rna-containing flock house virus (fhv) due to an age-dependent defect in disease tolerance. in comparison to younger individuals, we find that older drosophila mount larger transcriptional responses characterized by differential regulation of more genes and genes regulated to a greater extent. our results indicate that loss of disease tolerance to fhv with age possibly results from a stronger regulation of genes involved in apoptosis, activation of the drosophila immune deficiency (imd) nf-kb pathway or from downregulation of genes whose products function in mitochondria and mitochondrial respiration. our work shows that drosophila can serve as a model to investigate host-virus interactions during aging and sets the stage for future analysis of the age-dependent mechanisms that govern survival and control of virus infections at older age.

infectious diseases, including viral infections, represent an important burden among the elderly. for instance, older age is a major risk factor for increased morbidity and mortality to numerous viral pathogens including the severe acute respiratory syndrome (sars) associated coronavirus- (sars-cov- ), the agent responsible for the current covid- pandemic (nikolich-zugich et al. ). immunosenescence, a collective term used to describe the progressive functional decline of the immune system over time, is associated with the increased susceptibility to infections and lower responsiveness to vaccination observed in the elderly (leng and goldstein ). considerable progress has been made in understanding how aging affects both the innate and adaptive immune systems; however, the causes underlying immunosenescence remain incompletely elucidated. in particular, the age-dependent mechanisms leading to dysregulated innate immunity, which represents the first

in the present study, we conducted comparative analysis of survival, virus load and gene expression between young and aged drosophila following infection with the flock house virus (fhv). fhv is a small, non-enveloped virus, whose genome is composed of two positive, single-stranded rna molecules (venter and schneemann ).
we report that older flies succumb faster to fhv infection without accumulating higher virus loads, suggesting that a tolerance mechanism becomes impaired with age. additionally, we show that aged flies mount a more robust transcriptional response to fhv than young flies, including the regulation of innate immunity genes, a response that is different from the response of flies undergoing aging. genes encoding components of the apoptotic process are predominantly regulated in aged, fhv-infected flies. we further show that several genes whose products function in mitochondria and the mitochondrial respiratory chain are specifically downregulated in aged, fhv-infected flies. we also demonstrate that among genes that do not belong to specific gene ontology categories, the expression of several encoding for non-coding rnas (ncrnas)

to determine how age affects survival to infection with fhv, we injected -day old and -day figure b). interestingly, although survival curves overlapped at days of age between both sexes (figure a and figure s a), virus load was significantly lower in females in comparison to males (figure b). at days of age, females showed a significant, two-fold decrease in virus load in comparison to males (figure b), which was accompanied by slightly better, although not significantly different, median survival to fhv ( . ± . days for males and . ± . days for females, figure s a). in support of the data obtained for oregonr male flies, similar differences in survival between -and -day old flies, and comparable fhv load at h p.i. between animals of the two age groups, were observed for males of another genotype, y w c (figure s b). additionally, we found non-significant differences between fhv titers in circulating hemolymph (insect blood) of -and -day old female w flies h p.i. (figure s c).

accounting for the observed increase in mortality.
to test this, we performed transcriptomics analysis using rna sequencing (rna-seq) on -day old (young) and -day old (aged) male oregonr drosophila injected with either tris or fhv at h and h following injection. this sex was chosen because aged males showed a more pronounced effect on survival than females. the time points were chosen early in the infection process, before differences in survival between age groups were detected. as an additional control, we used non-infected young and aged flies to control for the effects of aging alone in the absence of infection. an average of . % of each rna-seq library (table s ) aligned to the d. melanogaster genome (table s ). we validated the rna-seq data for aging and the h post fhv infection time point using specific primers and rt-qpcr analysis for four genes per experimental condition. we confirmed that in aging flies cpr fb and cg were upregulated and acp a and lman iii were downregulated. in young drosophila, h after fhv infection, upd and ets c were upregulated and rfabg and diedel were downregulated in comparison to tris-injected controls. in aged flies, fhv infection led to upregulation of or a and upd and downregulation of im and gnbp-like (figure s ). to evaluate the overall similarity and differences between treatments, we used principal

differential gene expression analysis following fhv infection revealed that more genes were significantly regulated (p adj < . ) at least two-fold at h p.i. in comparison to h p.i. in both age groups. more genes were differentially changed in aged fhv-infected flies in comparison to young flies for both time points (figure b, table s ). overall, in young flies, the expression of genes was differentially regulated h p.i. vs , genes h p.i. in aged flies, we observed differential regulation of genes at h p.i. and , genes at h p.i. the process of aging itself differentially regulated expression of , genes (figure b).
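the criterion used above, an adjusted p-value below a cut-off combined with at least a two-fold change, can be sketched as a simple filter over differential expression results. the adjusted p-value cut-off is not legible in this text, so `alpha` is left as a caller-supplied argument and the value used below is only a placeholder; the gene names and numbers are hypothetical, and this is not the authors' analysis code.

```python
import math


def call_regulated(results, alpha, min_fold=2.0):
    """Split genes into up- and down-regulated lists.

    `results` is an iterable of (gene, log2_fold_change, adjusted_p) tuples.
    A gene is called when adjusted_p < alpha and the absolute fold change
    is at least `min_fold` (two-fold corresponds to |log2FC| >= 1).
    """
    threshold = math.log2(min_fold)
    up, down = [], []
    for gene, log2fc, padj in results:
        if padj is None or padj >= alpha:
            continue  # not significant, or no adjusted p-value available
        if log2fc >= threshold:
            up.append(gene)
        elif log2fc <= -threshold:
            down.append(gene)
    return up, down


# Hypothetical results table; alpha=0.05 is a placeholder, not the paper's value.
results = [
    ("geneA", 1.5, 0.001),
    ("geneB", -2.0, 0.01),
    ("geneC", 0.4, 0.001),   # significant but below two-fold
    ("geneD", 3.0, 0.5),     # large change but not significant
    ("geneE", -1.0, 0.02),   # exactly two-fold down
]
up, down = call_regulated(results, alpha=0.05)
```

counting the two lists per condition gives the kind of up- versus down-regulated tallies compared between young, aged and aging groups in the text.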
we note that in aging flies, more genes are downregulated than upregulated, whereas in aged, fhv-infected flies there are fewer downregulated than upregulated genes (figure b). among the genes differentially regulated during aging, we observed a very small

altogether, these results indicate that aged male flies mount a larger transcriptional response following fhv infection than younger flies, a signature that is different from the transcriptional changes taking place during the aging process itself. the fact that most of the commonly regulated genes between young and aged fhv-infected flies were found to overlap as a function of time ( % of up-and % of down-regulated genes, figure s ) supports the hypothesis that the age-dependent defect in disease tolerance is unlikely to result from the regulation of these genes. rather, our data suggest that impaired tolerance in aged flies could be due to differential regulation of the genes that are uniquely expressed in infected young flies, uniquely expressed in infected aged flies, or a combination of both.

to visualize biological processes regulated by aging and fhv infection in young and aged flies, we performed gene ontology (go) analysis. the number of genes with flybase id (fbgn number) without a matching david id is listed in table s . we note that most differentially regulated genes with a david id were labeled as "others" (figure s ). for instance, % of differentially regulated genes for the aging group did not match a specific biological process. for young fhv h, young fhv h, aged fhv h and aged fhv h, these percentages are %, %, % and %, respectively (figure s ). our go analysis revealed a complex signature. for instance, aging led to changes in expression of genes belonging to biological processes.
five of them ('defense response', 'response to bacterium', 'antibacterial humoral response', 'defense response to gram-positive bacterium' and 'oxidation-reduction') overlapped between all five experimental conditions. 'mannose metabolic process' and 'protein refolding' were in common between aging and aged fhv h groups and 'sperm storage' between aging and aged fhv h groups. processes identified in common between the aging group and young and aged fhv-infected flies were 'circadian rhythm', 'multicellular organism reproduction' and 'proteolysis' (figure and table s ). in drosophila, aging leads to both deregulation of organismal reproduction (tatar )

at h p.i., we identified more biological processes in young flies than in aged animals ( vs , respectively), among which five overlapped between the two age groups. and biological processes were specific to young fhv h and aged fhv h, respectively (figure and table s ). at h p.i., we found an opposite trend with and biological processes in young and aged flies, respectively, among which overlapped. we found and biological processes to be specific to the young fhv h and aged fhv h groups, respectively (figure and table s ).

in both young and aged flies, fhv infection led to differential regulation of genes involved in processes associated with the nervous system. clustering analysis identified one module of 'neurogenesis' genes that were strongly upregulated in the aged fhv h group and regulated to a lesser extent in young fhv h and aged fhv h groups (figure s a). for instance, among genes belonging to this go category at h p.i., the gene midlife crisis table s ). other biological processes linked to nervous system development and function for which genes were enriched in young and aged fhv-infected groups were 'lateral inhibition', 'sleep' and 'ventral cord development' (table s ).
the significance of this regulation is not known, as fhv has not previously been demonstrated to target the nervous system, but rather the drosophila heart and fat body (eleftherianos et al.

several heat shock proteins belonging to the biological process 'response to heat' were upregulated in both young and aged fhv-infected flies (table s and table s ). this suggests that following fhv infection, this branch of antiviral immunity is functional in aged

interestingly, genes belonging to additional categories associated with nervous system function, such as 'neuromuscular synaptic transmission', 'transmembrane transport' and 'neurotransmitter secretion', were specifically found in the young fhv h group. on the other hand, among processes specific to aged fhv h we found 'autophagic cell death' and 'regulation of autophagy' (table s ). among processes specifically enriched h p.i., we found 'regulation of transcription, dna-templated', 'transmembrane receptor protein tyrosine kinase signaling pathway' and 'protein ubiquitination' in young flies and 'phagocytosis', 'programmed cell death' and 'peptidoglycan recognition protein signaling pathway' in aged flies. the latter category contained multiple genes encoding components of the drosophila imd pathway. finally, among the processes specifically regulated in aged flies at both h and h p.i., we found 'apoptotic process', 'determination of adult lifespan' and 'chromatin remodeling' (table s ).

overall, these results indicate that despite a large number of "other" genes, genes belonging to identifiable common and distinct categories of biological processes are regulated by aging and fhv infection of young and aged flies. although our results identify specific categories of biological processes for each experimental group (figure ), at this stage we are not able to determine whether the age-associated impairment of tolerance depends on the regulation of genes that are specifically regulated in young and/or aged flies.
table s ). interestingly, in both young and aged flies, fhv infection led to strong downregulation of most amp and im genes, despite a robust upregulation of the mrna encoding the nf-κb factor relish (figure a and table s ). in aged fhv-infected flies, we observed marked upregulation of imd pathway components pgrp-le, imd, key (ikkγ) and attd. this upregulation was greater in the aged fhv h group (figure a and figure s ). in comparison to aging and young fhv-infected drosophila, we found dsting, whose product acts upstream of relish to protect flies against infection with dcv and crpv figure a ). this suggests that older animals respond to injury by upregulating innate immunity genes to a greater extent than younger flies.

we took a closer look at the differentially regulated genes that were labeled as 'other' in our go analysis (figure s ). we observed that most of these genes are uncharacterized (categorized as candidate genes, or cg); several are non-coding rnas (ncrna); and others have previously described functions but do not fit a specific david go category. among

we compared the number of ncrnas differentially regulated at least two-fold in our aged fhv-infected than in young fhv-infected flies ( vs genes h p.i. and vs genes h p.i.) (figure a, b and figure s a). aging itself regulated the expression of ncrna genes. as observed for the total number of transcripts, ncrnas that were regulated by infection shared minimal overlap with aging (figure b and figure s a). among ncrnas, we identified the largest proportion to correspond to lncrnas. for all experimental groups we also found asrnas and small nucleolar rnas (snornas). in young fhv-infected flies, a small percentage of ncrnas corresponded to stable intronic sequence rnas (sisrnas). specifically, in aged, fhv-infected flies we found differential regulation of ncrnas that belong to small nuclear (snrnas) and small non-messenger rnas (snmrnas) (figure s b).
we compared the expression of cr (an asrna) and cr (an lncrna) genes h p.i. by rt-qpcr. consistent with the rna-seq data, we observed a significant increase in cr and a significant decrease in cr expression in comparison to tris-injected controls in aged, but not young, flies (figure c ). together, these results indicate that both aging and fhv infection affect the expression of genes encoding different categories of ncrnas, and that specific ncrnas are regulated in the aged organism after fhv infection. we used the highly tractable genetic model drosophila melanogaster to investigate the response of the aged organism following infection with the rna(+) virus fhv. we found that -day old flies died faster than younger flies after fhv infection and that older, but not younger, males were more sensitive than females. although for both sexes we did not observe a difference in virus load as a function of age, our results indicate higher fhv titers in younger males in comparison to younger females, for which survival curves overlap. although we cannot exclude genetic background-specific effects, our results raise the interesting question of whether control of virus replication in the young organism represents a sexually dimorphic trait. we observed that older males die faster than older females and contain twice the level of fhv rna transcript found in females. this could potentially indicate that, in comparison to females, younger males are able to tolerate higher fhv loads, but that this ability becomes impaired with age and results in more rapid death. indeed, it is increasingly recognized that however, additional studies including small rna sequencing during aging to compare the abundance of sirnas against the fhv genome, are needed to determine whether this is the we cannot entirely rule out the possibility that aging impacts resistance mechanisms in a tissue-specific way, differences which cannot necessarily be detected by measuring virus load in whole flies.
it therefore would be very informative to perform additional studies to determine whether fhv differentially targets tissues at different ages and whether fhv load differs among tissues as a function of age. for instance, it is appreciated that aging affects gene expression differently in different tissues and in mammalian models differentially expressed genes in a given tissue are often not genes specific to this tissue (rodwell et al. one striking finding of this study is that aged flies infected with fhv mount a more robust transcriptional response than younger flies. the fact that at h after fhv infection we find an overlap between % of upregulated genes and % of downregulated genes in young flies with genes regulated in aged flies, suggests that most of the transcriptional response to fhv is maintained as a function of age. however, aged flies show extensive regulation of additional genes. one possibility was that these additional genes are related to the process of aging itself. we show, however, that the overlap between the transcriptional profiles of aging, our transcriptomic analyses reveal that as fhv infection progresses in aged flies, genes associated with mitochondrial respiratory chain become downregulated. additionally, we notice that several transcripts of genes encoded by the mitochondrial genome (table s ) days-old (labeled as d-old), and aged flies were - days-old (labeled as d-old), the quick-rna miniprep kit (zymo research) was used to isolate total rna following manufacturer's instructions. rna ( ng) was converted to cdna using the high capacity rna-to-cdna kit (applied biosystems). rt-qpcr reaction was performed using power sybr™ green pcr master mix (applied biosystems) according to manufacturer's instructions. primer sequences are listed in table s . for all assays, expression of rpl (rp ) was used to normalize gene expression. table s . 
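normalisation of rt-qpcr data to a reference gene, as described above, is commonly done with the 2^-ΔΔct calculation. the text does not state which formula was used, so the following is only a minimal sketch under that assumption. the ct values are hypothetical, and the function name `relative_expression` is introduced here for illustration. the calculation also assumes that amplification efficiency is perfect, i.e. that the product doubles with each cycle.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in treated versus control samples.

    Each Ct is first normalised to the reference gene (delta Ct), then the
    treated sample is compared with the control (delta-delta Ct).
    Assumes 100% amplification efficiency (a factor of 2 per cycle).
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in infected flies, while the reference gene is unchanged.
fold = relative_expression(
    ct_target_treated=24.0, ct_ref_treated=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold)
```

a value above 1 indicates higher expression than in the control, and a value below 1 indicates lower expression.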
the authors affirm that all data necessary for confirming the conclusions of the article are

differential expression analysis for sequence count data
sexual dimorphisms in innate immunity and responses to infection in drosophila melanogaster
decline in self-renewal factors contributes to aging of the stem cell niche in the drosophila testis
mitochondria, bioenergetics and apoptosis in cancer
dnr mutations cause neurodegeneration in drosophila by activating the innate immune response in the brain
midlife crisis encodes a conserved zinc-finger protein required to maintain neuronal differentiation in drosophila
drosophila c virus systemic infection leads to intestinal obstruction
atp-sensitive potassium channel (k(atp))-dependent regulation of cardiotropic viral infections
evolution of longevity improves immunity in drosophila
inflamm-aging. an evolutionary perspective on immunosenescence
essential function in vivo for dicer- in host defense against rna viruses in drosophila
the interplay between immunity and aging in drosophila
the kinase ikkbeta regulates a sting- and nf-kappab-dependent antiviral response pathway in drosophila
phagocytic ability declines with age in adult drosophila hemocytes
bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists
systematic and integrative analysis of large gene lists using david bioinformatics resources
broad rna interference-mediated antiviral immunity and virus-specific inducible responses in drosophila
nf-kappab immunity in the brain determines fly lifespan in healthy aging and age-related neurodegeneration
induced antiviral innate immunity in drosophila
the host defense of drosophila melanogaster
impact of aging on viral infections
p -mediated rapid induction of apoptosis conveys resistance to viral infection in drosophila melanogaster
inflammation-induced, sting-dependent autophagy restricts zika virus infection in the drosophila brain
senescence of the cellular immune response in drosophila melanogaster
nucleic acid sensors and programmed cell death
disease tolerance as an inherent component of immunity
the heat shock response restricts virus infection in
innate and intrinsic antiviral immunity in drosophila
the twilight of immunity: emerging concepts in aging of the immune system
sars-cov- and covid- in older adults: what we may expect regarding pathogenesis, immune responses
genome-wide transcript profiles in aging and calorically restricted drosophila melanogaster
functional analysis of the drosophila immune response during aging
a transcriptional profile of aging in the human kidney
flock house virus induces apoptosis by depletion of drosophila inhibitor-of-apoptosis protein diap
comparative flavivirus-host protein interaction mapping reveals mechanisms of dengue and zika virus pathogenesis
from discovery to function: the expanding roles of long noncoding rnas in physiology and disease
reproductive aging in invertebrate genetic models
recent insights into the biology and biomedical applications of flock house virus
iiv- inhibits nf-kappab responses in drosophila
aging of the innate immune response in drosophila melanogaster
temporal and spatial transcriptional profiles of aging in drosophila melanogaster

table fragment: mitochondrial respiratory chain complex iv; mitochondrial respiratory chain complex i (nd , nd-b

we thank dr. annette schneemann for fhv virus stock and the anti-fhv antibody used in this study. we are grateful to drs. david wassarman and grace boekhoff-falk for critical reading of the manuscript. the authors declare that they have no conflict of interest.

figure panel c: up-regulated ncrna genes; down-regulated ncrna genes.

key: cord- - alcsd p authors: cheek, martin; tchiengue, barthelemy; van der burgt, xander title: taxonomic revision of the threatened african genus pseudohydrosme engl. (araceae), with p. ebo, a new, critically endangered species from ebo, cameroon date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: alcsd p this is the first revision in nearly years of the african genus pseudohydrosme, formerly considered endemic to gabon. sister to anchomanes, pseudohydrosme is distinct from anchomanes because of its – -locular ovary (not unilocular), peduncle concealed by cataphylls at anthesis and far shorter than the spathe (not exposed, far exceeding the spathe), stipitate fruits and viviparous (vegetatively apomictic) roots (not sessile, roots non-viviparous). three species, one new to science, are recognised, in two sections. although doubt has previously been cast on the value of recognising pseudohydrosme buettneri, of gabon, it is here accepted and maintained as a distinct species in the monotypic section zyganthera. however, it is considered to be probably globally extinct. pseudohydrosme gabunensis, type species of the genus, also gabonese, is maintained in sect. pseudohydrosme together with pseudohydrosme ebo sp.nov. of the ebo forest, littoral, cameroon, the first addition to the genus since the nineteenth century, and which extends the range of the genus km north from gabon, into the cross-sanaga biogeographic area. the discovery of pseudohydrosme ebo resulted from a series of surveys for conservation management in cameroon, and triggered this paper. all three species of pseudohydrosme are morphologically characterised, their habitat and biogeography discussed, and their extinction risks are respectively assessed as critically endangered (possibly extinct), endangered and critically endangered using the iucn standard. clearance of forest habitat for logging, followed by agriculture or urbanisation, is a major threat. one of the species may occur in a formally protected area and is also cultivated widely but infrequently in europe and the usa for its spectacular inflorescences. the new species that resulted in this revision was discovered as a result of a long-running survey of plants in cameroon to support improved conservation management.
the survey is led by botanists from the royal botanic gardens, kew and the national herbarium of cameroon-irad (institute for research in agronomic development), yaoundé. this study has focussed on the cross-sanaga interval (cheek et al., ) which contains the area with the highest species diversity per degree square in tropical africa (barthlott et al., ) . the herbarium specimens collected in these surveys formed the foundations for a series of conservation checklists (see below). so far, over new species and several new genera have been discovered and published as a result of these surveys, new protected areas have been recognised and the results of analysis are feeding into the cameroon important plant area programme (https://www.kew.org/science/our-science/projects/tropicalimportant-plant-areas-cameroon ). in october the last two authors found two leafless, flowering plants of a spectacular aroid in the ebo forest of littoral region, cameroon (van der burgt , fig. ). since these had prickles on the peduncle they were provisionally identified as anchomanes schott. in late this herbarium specimen was reidentified by the first author as pseudohydrosme engl., an erstwhile gabonese genus previously unknown from cameroon. pseudohydrosme is separated from anchomanes by a peduncle much shorter than the spathe (not far longer) and by - -locular (not unilocular) ovaries (mayo et al. ) . van der burgt was suspected of representing a new species to science since it differed in several characters from the two known species of pseudohydrosme. in addition, ebo in cameroon is separated geographically by km from the range of those two previously known species in gabon in a different biogeographic zone. in order to obtain the missing stages of fruit and leaf for van der burgt , it was decided to revisit the collection site at the next available opportunity. hence, in december leaves, although not fruits, and additional field data were obtained (van der burgt , fig. 
) including from a further site. supplementary characters separating the ebo taxon from other members of the genus were discovered in the new material. early in the last author uncovered multiple, previously overlooked, additional flowering herbarium sheets, morgan , of the same taxon at k, collected from a third site, close to the other two sites, with characters consistent with the first mentioned collection. pseudohydrosme was described with two species, p. gabunensis engl. based on soyaux collected oct. , and p. buettneri engl. based on büttner (buettner) (b) collected in sept. , both from forest at sibang, formerly near, and now largely subsumed by, libreville, the capital and principal city of gabon (engler , bogner ). in the renowned aroid specialist josef bogner visited gabon and rediscovered two plants of p. gabunensis. tubers were taken to germany and cultivated, allowing description of the leaves of pseudohydrosme for the first time (bogner, ). the second species of the genus, p. buettneri, has never been refound. it differs so greatly from the first in the structure of its inflorescence (see key and species account below) that the noted aroid specialist n.e. brown erected a separate genus, zyganthera n.e. brown, for it. engler ( ) reduced zyganthera to sectional rank within pseudohydrosme. in their monumental account 'the genera of araceae', mayo et al. ( ) placed pseudohydrosme next to anchomanes, together with nephthytis schott in the tribe nephthytideae engl. molecular phylogenetic analysis has subsequently supported the sister relationship of pseudohydrosme and anchomanes, but since each was represented by only a single taxon, it was not possible to test their monophyly (cabrera et al., ; nauheimer et al., ; cusimano et al., ).
in presenting new data on pseudohydrosme gabunensis based on successful pollination of flowers and fruit production on plants in cultivation in the netherlands, hetterscheid & bogner ( ) questioned the distinction of pseudohydrosme and anchomanes. they considered the only difference to be the locularity of the ovaries ( - versus ) and set aside the difference in peduncle: spathe proportions maintained in mayo et al. ( ) . however, later in their paper hetterscheid & bogner ( ) then brought to light two new synapomorphies that further support the distinction of pseudohydrosme from both anchomanes and from all other aroids (see discussion below). hetterscheid & bogner ( ) made the case to extend the range of pseudohydrosme from gabon southwards into the republic of congo (congo-brazzaville) citing photos and a specimen deposited at wag by ralf becker. however, the species concerned is not identified and we have not been able to access the material in order to verify this statement (see methods). in this paper we describe the cameroon material from ebo forest as a new species to science, pseudohydrosme ebo cheek, in the context of a revision of the genus, last revised nearly years ago (engler ) . herbarium citations follow index herbariorum (thiers et al., ) . specimens were viewed at b, br, k, p, wag, and ya . pseudohydrosme is centred in gabon. the national herbarium of gabon is lbv, but the most comprehensive herbaria for herbarium specimens of that country are p and wag. the national herbarium of cameroon, ya, was also searched for additional material but without success. during the time that this paper was researched, it was not possible to obtain physical access to material at wag (due to the transfer of wag to naturalis, leiden, subsequent construction work, and covid- travel and access restrictions). 
however, images for wag specimens were studied at https://bioportal.naturalis.nl/?language=en and those from p at https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form?lang=en_us. we also searched jstor global plants ( ) for additional type material of the genus, and finally the global biodiversity information facility (gbif, www.gbif.org accessed aug ) which lists occurrences and images, mainly relating to the holdings (including duplicate herbarium sheets) of wag, followed by p. binomial authorities follow the international plant names index (ipni, ). the conservation assessment was made using the categories and criteria of iucn ( ). geocat was used to calculate red list metrics (bachman et al., ). spirit preserved material was not available. herbarium material was examined with a leica wild m dissecting binocular microscope fitted with an eyepiece graticule measuring in units of . mm at maximum magnification. the drawing was made with the same equipment using a leica camera lucida attachment. the herbarium specimens of the new species described below as pseudohydrosme ebo were soaked in warm water to enable the spathe to be folded back, exposing the spadix and hydrated flowers. the terms and format of the description follow the conventions of mayo et al. ( ). georeferences for specimens lacking latitude and longitude were obtained using google earth (https://www.google.com/intl/en_uk/earth/versions/ ). the map was made using simplemappr (https://www.simplemappr.net). pseudohydrosme is sister to anchomanes schott ( - species) and the pair are in a sister relationship with nephthytis schott ( species). together these three tropical african genera comprise the nephthytideae (all are absent from madagascar, but the anomalous nephthytis bintuluensis a.hay, bogner & p.c. boyce occurs in borneo; hay et al., ; nauheimer et al., ), which is sister to the bigeneric se asian group aglaonemateae engl. (cabrera et al., ).
both groups share adjacent male and female flower zones, free stamens, and collenchyma arranged in threads peripheral to the vascular strands of leaf blades and petioles (with the exception of nephthytis, in which collenchyma can form interrupted bands) (keating, ; cabrera et al., ). morphological support for nephthytideae is the primary leaf venation with basal ribs of the primary veins very well developed, i.e., ± tripartite primary development (cabrera et al. ). dracontioid leaf divisions characterise pseudohydrosme and anchomanes but are not present in nephthytis. differences between pseudohydrosme and anchomanes are re-assessed in the discussion below. pseudohydrosme engl. (engler : ; brown in thistleton-dyer : ; engler : ; mayo et al., : - ) type species: pseudohydrosme gabunensis engl. (lectotypified by n.e.brown in thistleton-dyer : ) . br. (brown in thistleton-dyer, : ) . heterotypic synonym. large, seasonally dormant, monoecious herbs. rhizome shallowly subterranean, the growing point at ground level, subglobose or cylindrical with annular leaf scars, and erect to horizontal or obliquely inclined, growing continuously and not renewed with each growing period. roots fleshy, produced along length of rhizome, sometimes reproductive (the distal ends rising to the surface and producing new plants). leaf solitary, large; petiole cylindrical, erect, long, with minute and sparse prickles, sheath very short and inconspicuous. blade transitioning from simple, sagittate and entire in seedlings, older plants developing slits and divisions, in mature plants leaves dracontioid: trisect, primary divisions pinnatisect, distal lobes mostly truncate and bifid, sessile and decurrent, proximal lobes acuminate; primary lateral veins of ultimate lobes pinnate, forming a regular submarginal collective vein (p. ebo) or an irregular collective vein, or veins running into margin (p. gabunensis), higher order venation reticulate.
inflorescence solitary, appearing separately from the leaf. cataphylls papery-membranous, ( -) - , proximal ± triangular, small, distal oblong-elliptic, concealing spathe tube. peduncle concealed by cataphylls at anthesis, terete, very short < / th the length of the spathe, with minute, sparse, prickles. spathe large, fornicate, resembling the horn of a euphonium, unconstricted, blade very broad, with flaring auriculate margins; tube convolute, fleshy, obconic, with a few sparse prickles on the outer surface proximally. spadix short, about / - / length of spathe, sessile, female zone subcylindric, male zone cylindric, obtuse or rounded, subequal to ± twice (±four times in p. buettneri) as long as female, completely covered in flowers and fertile to apex (p. gabunensis) or with a distal appendix twice as long as the fertile portion and covered in sterile male flowers (p. buettneri) or flowers covering only about % of axis in the female zone (p. ebo). flowers unisexual, perigone absent. male flower - -androus, stamens free, subprismatic, compressed, anthers sessile, connective thick, broad, overtopping thecae, thecae oblong, long, lateral, dehiscing by apical pore. pollen extruded in strands, inaperturate, ellipsoid-oblong, very large (mean micrometres diam.) exine psilate to slightly scabrous. sterile male flowers (p. buettneri) composed of subprismatic, free staminodes. female flower ovary globose to broadly ellipsoid, usually prismatic, - -locular, ovules per locule, anatropous, funicle short, placentation axile, at base of septum, stylar region attenuate to cylindric, narrower than ovary, stigma thick, shallowly - -lobed or subdiscoid, concave centrally, wet when receptive. berry white, ripening dark purple, fleshy, wrinkled when mature, oblong-ellipsoid, laterally compressed to slightly bilobed, stipitate, large, borne on a slightly accrescent peduncle, stigma and style persistent (known only in p. gabunensis). 
seeds subglobose to broadly ovoid, one side convex, the other slightly flattened, testa thin, whitish, smooth, papery, transparent; embryo large, outer surface green, inner white, raphe distinct, hilum and micropyle purple, plumule with leaf primordia. three species. phenology: flowering sept. and oct. (or march in cultivation in europe); in leaf dec.-april. distribution & habitat: cameroon and gabon, lowland evergreen forest on coastal sediments (gabon) or inland foothills on basement complex rocks (cameroon) (fig. ). etymology: meaning "false hydrosme", hydrosme schott now being a synonym of amorphophallus. local name and uses: none are documented. conservation: all species are highly infrequent and globally threatened according to iucn ( ) criteria (see species accounts below), and p. buettneri is possibly extinct (not seen for over years, the majority of its former habitat destroyed). pollination has not been investigated in detail in pseudohydrosme, but is almost certainly by insects, as is usual in araceae. two species of flies and two of beetles were reported to visit p. gabunensis (see below). in cultivation the stigmas are reported to be wet and receptive for only two days, and the scent is reported to be faint, of lettuce (lactuca), in the same species (see below also). following successful fertilisation, seed development is reported to take up to months in p. gabunensis (see below). seed dispersal is probably by either ground-dwelling mammals or birds consuming the thinly fleshy purple berries. cabrera et al. ( ) for pseudohydrosme gabunensis, using five regions of coding (rbcl, matk) and noncoding plastid dna (partial trnk intron, trnl intron, trnl-trnf spacer). the voucher was wieringa (wag), genbank codes are am , am , am + am . cultivation of one species, pseudohydrosme gabunensis, is widespread but infrequent outside of africa in the tropical glass-house collections of several large botanical gardens, mainly in europe and n.
america (see under that species). chromosome numbers are reported for one species, pseudohydrosme gabunensis, as n = ca. (mayo et al., ); x = (bogner & petersen, ; cusimano et al., ). germination in pseudohydrosme gabunensis is cryptocotylar and takes weeks to months. the large seed embryo remains buried, producing a single hastate seedling leaf (hetterscheid & bogner, ). medicinal uses and chemistry are unreported in pseudohydrosme. however, the much more frequent sister genus anchomanes is harvested as a traditional medicine, e.g. in cameroon, and contains bioactive compounds (cheek, ). identification key to the sections and species of pseudohydrosme . pseudohydrosme buettneri engl. (engler : ; engler : ; brown in thistleton-dyer : ; engler : ) . - fig. , . holotype: gabon, estuaire province, libreville "gabun, mundagebiet; sibange-farm" fl. sept. , buettner (holotype b destroyed or mislaid). terrestrial herb, rhizome vertical, subglobose, . cm long, . cm wide, surface tuberculate, roots fleshy, from along the length of the rhizome. leaf unknown. inflorescence: cataphylls three or more, . - x . - cm; peduncle cm long, colour and indumentum unknown. spathe cm long. spadix subcylindrical, - cm long, c. . cm diam. female zone . cm long. fertile male zone cm long. appendix (sterile male flowers) cm long, c. . cm diam. female flowers mm long, ovary globose, mm diam., style mm long, slender; stigma mm diam., thick. male flowers. stamens mm long, mm wide, usually the two stamens of a flower close together. staminodes prismatic, - sided, lacking anther thecae and much smaller in diameter than the stamens (from engler ( notes: pseudohydrosme buettneri has the largest inflorescence by far of all known species of the genus, with an cm long spathe. the type specimen had lost the top part of the spathe, but dimensions were given by the collector (engler, ). the type, and only known specimen, was at b, but is reported to be no longer there (bogner ).
one can deduce that it was destroyed in the allied bombing of berlin in march , when most of the specimens at b were also destroyed. however, the type specimen of p. gabunensis (see below), dating from about the same time and also housed at b, has survived. no additional specimens of this species have been found in the years since the collection of the type specimen. hetterscheid & bogner ( ) have questioned whether this species is not just a variant of p. gabunensis. however, this seems highly unlikely, because the specimen differs in three independent characters from p. gabunensis (and p. ebo): ) the ratio between the female zone and the male zone (of fertile and sterile flowers) differs greatly. in p. buettneri the ratio is : +, while in the other two species it is less than : . ) in p. buettneri most of the spadix consists of an appendix of sterile male flowers. no such sterile appendix occurs in the other two species. ) in p. buettneri the stamens are paired (engler ), while in the other two species the stamens are not paired, but present in an indistinct ring of five. additional differences between the species are seen in the pistils and spadices. the styles in p. buettneri are < / the width of the ovary. in p. gabunensis they are ½. the spadix of p. buettneri is cylindrical, and even in width along its length, while that of p. gabunensis shows a pronounced constriction at the junction of female and male zones, and the male zone reaches a greater width than the female zone. conservation. pseudohydrosme buettneri is here assessed as critically endangered (possibly extinct). this is because it has only been found once, at a single site, in the "munda region" at sibang farm or plantation, in . at that time sibang was far outside libreville and consisted largely of forest, some of which was exploited to produce forest products such as timber and rubber, and cleared to produce agricultural products by europeans for international commerce e.g.
by the woermann company (cheek et al. : ). the munda is the estuary that forms the eastern edge of the peninsula on which libreville sits. tributaries of the munda drain the sibang area. beginning in , the population of libreville expanded -fold, and its footprint expanded. only a small part of the original forest formerly known as sibang survived. this part measures about m x m on google earth (see further details under p. gabunensis, below) and is now named the 'sibang arboretum'. this minute remnant of forest is probably the most visited by botanists in the whole of gabon because it is immediately adjacent to the site of the current national herbarium, lbv (cheek, pers. obs.). in the unlikely, although hoped-for, event of the rediscovery of pseudohydrosme buettneri, the area of occupancy would be expected to be calculated as km² using the iucn preferred gridcells of this size, and the extent of occurrence would be of the same size. if it should be found anywhere in the vicinity of libreville it is likely to be threatened by human pressures, since most of the population of gabon is concentrated here. the libreville region has the highest botanical specimen collection density in gabon, with specimens recorded in digital format. it also has the highest level of diversity of both plant species overall and of endemics (sosef ). the coastal forests of the libreville area are known to be especially rich in globally restricted species (lachenaud et al. ). these authors detail species globally restricted to the libreville area, of which eight have not been seen recently and are possibly extinct. among these is octoknema klaineana pierre, a rainforest tree "only collected in the immediate area of libreville at the beginning of the th century, and only once since." (gosline & malecot, ).
most of the collections of this possibly extinct species of octoknema were also, as with pseudohydrosme buettneri, from libreville-sibang, and were mainly made in the period - , during the colonial period, before the city expanded to its current extent. the other seven species recorded as globally restricted to the libreville area and as possibly extinct by lachenaud et al. ( ) : are ardisia pierreana taton (taton, ), dinklageella villiersii szlach. & olszewski (szlachetko & olszewski, ), eugenia librevillensis amshoff (amshoff, ), hunteria hexaloba (pichon) omino (omino, ), pandanus parvicentralis huynh (huynh, ), psychotria gaboonensis ruhsam (ruhsam et al. ) and tristemma vestitum jacq.-fél. (jacques-félix, ). these species have also not been seen for several decades or more; in the case of the penultimate species, not since . the explanation for this hotspot of unique species, fast disappearing if not already extinct, at libreville may be that it has the highest rainfall in gabon (gosline & malecot, ), with c. . m p.a. it seems likely that pseudohydrosme buettneri is an additional lost species endemic to the libreville area, likely rendered extinct by the expansion of the city. let us hope it is rediscovered in a fragment of forest in the greater libreville area, although this seems extremely unlikely given that it was the most spectacular species of the genus, with by far the largest spathe ( cm long) known in the genus, and that, as stated above, the libreville area is the most intensively botanically surveyed part of gabon (sosef et al., ). type species: pseudohydrosme gabunensis engl. chorianthera engl. (engler, : ) . homotypic synonym. male flowers free, in clusters of c. ; distal half of spadix lacking sterile male flowers; ratio of female:male spadix portions : . pseudohydrosme gabunensis engl. (engler, : ; engler : ; brown, : ; engler, : ; bogner, : ; hetterscheid & bogner, : - ) . -fig. , .
holotype: gabon, estuaire province, libreville, sibang, "gabun, mundagebiet; sibang-farm am ufer des maveli" fl. october , soyaux (holotype: b , herbarium specimen, image!) terrestrial herb, rhizome light brown, ellipsoid or subcylindric, erect or oblique, - cm diam. to cm long, surface with transverse ridges. roots fleshy - mm thick, brownish yellow, sometimes developing new plants at their tips. leaf - . (- . ) m tall, petiole terete, - . cm diam. at base, dark green olive and spotted with small yellowish white points; spines - mm long. blade of youngest seedlings sagittate-elliptic c. cm long, - cm wide, basal sinus c. cm long, breadth variable (see hetterscheid & bogner, ) . successively formed blades developing slits and divisions. blade of mature leaves dracontoid, primary divisions - cm long, pinnatisect, lobes - , dimorphic, larger, distal lobes elliptic ( -) - cm long, ( -) - (- ) cm, apex truncate, bifid, ( . -) - cm long; smaller, proximal leaflets ovate, . - cm long, . - cm wide, apex cuspidate; lateral nerves - on each side of the midrib, conspicuous on abaxial surface, running to the margin or forming an incomplete submarginal nerve, higher order nerves reticulate. inflorescence: cataphylls - membranous, reddish white or brown-purple, slightly spotted distichous, proximal subtriangular shorter, distal becoming longer and oblong elliptic to the spathe . - cm long, ( -) - . cm wide; peduncle ( -) - cm long, - . cm diam., with minute sparse spines - mm long and greenish white, colour as petiole. spathe ( -) - cm long, basal half ( - cm long) funnel-shaped to tubular, fleshy and to mm thick, blade comprising the distal half of the spathe, flaring widely and curving forward, the apex obtuse, margin undulate. outer surface uniform bright pale yellow, greenish yellow or yellow white. inner surface of blade pale yellow or yellowish white, at marginal separated by a solid irregular line from the dark purple central area which continues to the base of the tube. 
mouth facing horizontally, usually orbicular or elliptic. spadix with "unpleasant smell, but not so strong as some araceae" (van der laan , wag), sessile, cylindrical, ( -) - . cm long, ( . -) - . cm diam. female zone ( -) . (- ) cm long, female flowers completely covering the surface of the axis, usually contiguous with the male zone. male zone ( . -) - . cm long, apex rounded, completely covered in fertile male flowers. sterile appendix absent. flowers lacking perigone. male flowers with c. stamens, stamens densely packed, free, sessile, mm long, in plan view isodiametric, subprismatic, - -faceted, in cross section c. . (- ) mm x . mm wide, apex convex purple, sides white, anther thecae c. mm long, opening by an apical pore, pollen orange or yellow, in strings. female flowers white with ovary yellowish-white globose or ellipsoid, - mm diam., (- )-celled; style - . mm long, . mm diam., stigma black to reddish brown, surface papillose, mm wide, bilobed, lobes with a broad concave area, apex rounded. berry white, ripening purple-black, surface wrinkled when ripe, thinly fleshy, transversely ellipsoid, laterally compressed, rarely globose, . - . cm long, . - . cm wide, style and stigma persistent, ( -) -seeded, apex rounded-truncate, base stipitate stipes ( -) - mm long, c. mm diam. seeds subglobose to broadly ellipsoid, one side flattened, the other convex, mm long, mm wide. phenology: flowering mid-sept.-late oct. distribution & ecology: gabon, estuaire, moyen-ogooué (probably) and woleu-ntem provinces, known from five sites in lowland rainforest sometimes with aucoumea gabonensis pierre (burseraceae); - m alt. etymology: meaning "coming from gabon" (formerly, in german "gabun"). those specimens listed above which are sterile, e.g. wieringa (voucher for dna studies of the genus, see above), wieringa & haegens , are only provisionally identified as p. gabunensis. 
it is possible that these specimens might belong to another species of the genus (although unlikely since these specimens were collected at sibang arboretum where in recent years only this species of the genus has been collected in flower). equally, they may even represent a species of the genus anchomanes. conservation: pseudohydrosme gabunensis is possibly extinct at some of its historical locations and is threatened at all of those which remain. at the type location, sibang, formerly far outside libreville, at least four gatherings have been made in what is now a small and highly visited forest patch inside libreville (see notes under p. buettneri above). measured on google earth, the forest is approximately a square, c. m n to s, and m w to e, or about . km² (grid reference: ° ' . "n, ° ' . " e, m alt.). it is now completely surrounded by the dense urban settlement of libreville which has expanded greatly in the last years. in , at independence, the population of libreville was , . since then it has expanded -fold to, in , , (https://en.wikipedia.org/wiki/libreville, accessed sept. ) and has a vastly greater footprint. sibang arboretum, the surviving patch of forest of a once much greater area, is now known as one of the top two tourist destinations in libreville. at the cap santa clara location, the forêt de la mondah, known since as the raponda walker arboretum (walters et al. ), two collections were made, one in (see additional collections). since its creation as a protected area in , it has been reduced in size, losing % of its area in years to habitat clearance and degradation due to its close proximity (c. km) to the metropolis of libreville which draws upon its trees for timber and firewood (walters et al. ). it is not clear whether either of the two specimens from st. clara was from within the current protected area.
the species has not been recorded from the ogooué since it was collected there by leroy ( - ), despite intensive recent surveys in the lower reaches of the river whence it was probably collected. we have georeferenced the leroy record from lambarene since in leroy's time this was a trading post on the lower reaches of the river and it is credible that he stopped and collected there, but this is uncertain. neither has it been recorded in the last century from the komo at kango, whence it was collected by chevalier ( , p; fl. oct. ). since this is now on a major transnational route, and on google earth shows multiple cleared areas due to development, it is possible that it no longer survives at this location, especially since it has not been recorded here or anywhere near in such a long time, despite the peak decades of botanical collection in gabon having been at the end of the th century (sosef et al., ). pseudohydrosme gabunensis was assessed as endangered, en b ab(ii,iii) by lovell & cheek ( ) since it is or was known from ten specimens at five locations globally, with an area of occupation estimated as km² using the km² cell sizes preferred by iucn ( ) and the threats detailed above. threats in the libreville area have already resulted in the possible global extinction of nine species, including pseudohydrosme buettneri (see under that species, above). the extent of occurrence is calculated as , km². notes. the location given in the protologue for the type specimen (see above) is similar to that of pseudohydrosme buettneri but more detailed. the munda is the estuary that forms the eastern edge of the peninsula on which libreville sits. tributaries of the munda drain the sibang area, one of which may have been known as the maveli, on the forested banks of which soyaux recorded collecting the type of pseudohydrosme gabunensis.
the specimens leroy and chevalier (both p) had been determined as amorphophallus until identified by bogner (m) as pseudohydrosme gabunensis in dec. . in contrast, wieringa (wag), determined as this species and cited as such in sosef et al. ( ), is in fact an amorphophallus, evident in the larger leaf blade divisions all being acuminate not bifid, and the tuber being described as having the roots from the top (not scattered along the length). similarly, wieringa (wag), correctly cited in sosef et al. ( ) as pseudohydrosme gabunensis, was originally collected as an anchomanes until determined by hetterscheid in april . van der laan (wag) had been identified as anchomanes nigritianus rendle until redetermined by bogner in sept. . pseudohydrosme gabunensis is the most common and widespread member of the genus. however, it is still extremely rare and has a highly restricted range in the wild. it is sought after by private collectors of aroids, and live rootstocks and seed attract high prices on the internet. fortunately, it is found in several large public botanic gardens including in germany, france, netherlands, u.k. and u.s.a. we believe that plants are probably not collected from the wild (but this cannot be ruled out); rather, they are propagated from those already in cultivation, probably from seed derived from the netherlands. the collection reported in hetterscheid and bogner ( ) as from congo must be treated with great caution. since it is greatly disjunct (at least c. km) from the known range of this species, it may represent a further new species. if it consists only of a leaf and lacks reproductive parts, it may have been confused with an anchomanes. the specimen concerned should be located, studied carefully, and an attempt made to rediscover the source population. differences between pseudohydrosme gabunensis and p. buettneri are detailed under the last species. there is no doubt that pseudohydrosme gabunensis is much more closely related to p. ebo than to p.
buettneri. however the larger size of the spathes in p. gabunensis, their different colour and patterning, the usually bilobed style and bilocular female flowers densely covering the axis, all serve, together with the vegetative characters, to separate it from p. ebo (see also table below). it is to be hoped that further studies of live plants of p. ebo will be possible to determine if, like p. gabunensis it also propagates itself apomictically from the root tips. reproductive biology: hetterscheid & bogner ( ) working with cultivated plants, report that the female flowering phase is indicated by a faint yet clear lettuce-like scent as the spathe opens, at which time, for days, the receptive stigmas are wet and sticky. after this time, the stigmas turn darker brown, desiccate and are no longer receptive. individuals are obligate out-crossers. fruits take up months to mature (hetterscheid & bogner, ) . germination and development: germination takes weeks to months, producing a single small sagittate, entire leaf from a small rhizome. for several months to two years, new leaves are produced consecutively, usually each larger than its predecessor. from the second leaf onwards, slits may develop in the blade, and within two years the successively produced blades first become divided and finally develop the mature dracontoid pattern (see description). first flowering has occurred in as little as five years from first sowing (hetterscheid & bogner, ) . in the wild, time to maturity is likely to take longer due to predation, competition, and likely lower availability of nutrients . pseudohydrosme ebo cheek, sp. nov. - fig. - , & . differing from pseudohydrosme gabunensis engl. in the ovaries -celled, the stigma conspicuously -lobed, very rarely -celled/lobed (not: usually -celled, -lobed, rarely -celled/lobed), the female zone of the spadix only c. % covered in flowers (not % covered), the spathe at anthesis - (- . 
) cm long, the outer surface dull white with longitudinal brown stripes, inner surface light reddish brown with wide pale green veins (not ( -) - cm long, uniformly white, green or yellow on both surfaces, inner surface bicoloured, the mid-blade area dark purple, separated by a solid line from the marginal white/yellow coloured area). type: b.j. morgan (holotype k!; isotypes b! mo! ya !), cameroon, littoral region, yabassi-yingui, ebo proposed national park, fl. sept. . terrestrial herb, to . m tall. rhizome cylindric, c. cm diam. obliquely erect to almost parallel to substrate surface, only upper part exposed, surface with transverse ridges (leaf scars) about mm deep, mm apart. roots adventitious, thick, fleshy, c. mm diam., scattered along length of rhizome, vegetative apomixis not detected. leaf to . m tall, petiole terete, to cm. diameter at base, green, inconspicuously spotted yellow, mature plants with minute, patent, extremely sparse prickles . mm long. blade of youngest seedlings sagittate-elliptic, x . cm, apex obtuse, basal sinus . x . cm, petiole - cm long. older seedlings, in successive years with leaves developing first slits and then divisions, becoming triangular in outline with a broad basal sinus. blade of mature leaves dracontoid, primary division - x - cm, pinnatisect, lobes - , dimorphic, larger, mainly distal lobes oblong . - x . - . cm, apex acuminate or truncate-bifid (biacuminate), acumen . - . cm long, smaller, mainly proximal lobes ovate c. x . cm; lateral nerves - conspicuous on abaxial surface, on each side of the midrib, uniting to form a regular looping submarginal nerve - mm from the margin, higher order nerves reticulate. inflorescence: cataphylls , distichous, light brown, with light green spots, membranous, successively increasing in size from proximal to distal, the outer most triangular-broadly ovate, amplexicaul x cm, the third in succession, long lanceolate-oblong, x cm, the fourth - x . - cm; peduncle . - . x . - . 
cm, with minute, patent, extremely sparse prickles . mm long, colour as petiole. spathe - (- . ) x cm long basal / -¾ tubular, funnel-shaped, . - cm wide at cm above the peduncle, - cm wide at cm above the peduncle, and - cm wide at cm above the peduncle, the distal spathe (blade), half to one third of the total length flaring widely and curving forward, hood-like, shielding the spadix, the apex with a triangular acumen - x cm. outer surface of both tube and blade dull white, with pale brown-red ribs running longitudinally along veins from base of tube to mouth of blade. inner surface of spathe light reddish brown, with wide pale green veins, gradually becoming slightly darker along the midline. mouth facing horizontally, transversely elliptic, - cm high, - cm wide, margin entire. spadix sessile, cylindrical - mm long, - mm diam. female zone mm long, - mm wide, female flowers c. covering about half the surface of the axis, sometimes not contiguous with the male zone the axis then naked for several to mm. male zone - mm long, - mm wide, apex rounded, completely covered in male flowers, sterile appendix absent. flowers lacking perigone. male flowers with c. stamens, stamens free, sessile, prismatic, mm long, in plan-view isodiametric, - faceted, ( . -) mm diam., apex convex, minutely papillate; anther thecae lateral, four (fig. f ) oblong-elliptic, running the length of the stamen, with apical pore (fig. e ). female flowers with ovary globose, mm diam., -celled, (fig. i) , very rarely celled, style - . mm long, mm diam., stigma pale yellow, . mm thick, - . mm wide, strongly -lobed (fig. e) , lobes with a narrow midline groove, apex rounded. berry and seed not seen. distribution & ecology: cameroon, littoral region, known only from three sites at one location in the ebo forest near yabassi-yingui, in late secondary and intact, undisturbed lowland evergreen forest on ancient basement complex geology, rainfall c. m p.a., drier season october-march; - m alt. 
conservation: pseudohydrosme ebo is known from only three sites along a section of valley . km long and only - mature individuals in total have been seen by the collectors (second and third authors). these sites are along former logging roads which have reverted to forest (third author pers. obs. ) as well as intact forest. in the fourteen years since , botanical surveys have been mounted almost annually, at different seasons, over many parts of the formerly proposed national park of ebo. about botanical herbarium specimens have been collected, but despite the species being so spectacular in flower, with individual inflorescences lasting potentially two weeks (if in line with those of p. gabunensis), this species has been seen nowhere else in the c. km² of the ebo forest. however, much of this area has not been surveyed during the flowering season of the species, or not surveyed at all for plants. while it is likely that the species will be found at additional sites, there is no doubt that it is genuinely range-restricted. botanical surveys for conservation management in forest areas neighbouring ebo resulting in thousands of specimens being collected and identified have failed to find any specimens of pseudohydrosme (cheek et al. ; cable & cheek ; cheek et al. ; harvey et al. ; cheek et al. ; harvey et al. ) . it is possible that the species is unique to ebo and truly localised. the area of occupation of pseudohydrosme ebo is estimated as km² using the iucn preferred cell-size. the extent of occurrence is the same area. in february it was discovered that moves were in place to convert the forest into two logging concessions (e.g. https://www.globalwildlife.org/blog/ebo-forest-a-stronghold-for-cameroons-wildlife/ and https://blog.resourceshark.com/cameroon-approves-logging-concession-that-will-destroy-eboforest-gorilla-habitat/ both accessed sept. ). 
this would result in logging tracks that would allow access throughout the forest for poachers of rare collectable plants such as pseudohydrosme, and timber extraction would open up the canopy and remove the intact habitat in which pseudohydrosme grows. additionally, slash and burn agriculture often follows logging trails and would negatively impact the populations of this species. fortunately the logging concession was suspended due to representations to the president of cameroon on the global importance of the biodiversity of ebo (https://www.businesswire.com/news/home/ /en/relief-in-the-forest-cameroonian-government-backtracks-on-the-ebo-forest accessed sept. ). however, the forest habitat of this species remains unprotected and threats of logging and conversion of the habitat to plantations remain. pseudohydrosme ebo is therefore here assessed, on the basis of the range size given and threats stated, as cr b + ab(iii), that is critically endangered. etymology: named as a noun in apposition for the forest of ebo, in cameroon's littoral region, yabassi-yingui prefecture, to which this spectacular species is globally restricted on current evidence. local names and uses: none are known. the discovery of pseudohydrosme ebo is related in the introduction above. alvarez with van der burgt, and ngansop, discovered in dec. seedlings of the new species, at three different stages, preserved as van der burgt sheet ¼ (see fig ). clearly the species at this site is reproducing itself. associated photographs also show plants of different ages. abwe & morgan ( ) and cheek et al. ( a) characterise the ebo forest, and give overviews of habitats, species, and importance for conservation. fifty-two globally threatened plant species are currently listed from ebo on the iucn red list website and the number is set to rise rapidly. the discovery of a new species to science at the ebo forest is not unusual, since numerous new species have been published from ebo in recent years.
examples of other species that, like pseudohydrosme ebo, appear to be strictly endemic to ebo on current evidence are: ardisia ebo cheek (cheek & xanthos, ), crateranthus cameroonensis cheek & prance (prance & jongkind, ), gilbertiodendron ebo burgt & mackinder (burgt et al., ), inversodicraea ebo cheek (cheek et al., ), kupeantha ebo m.alvarez & cheek (cheek et al., b), palisota ebo cheek (cheek et al., a). further species described from ebo have also been found further west, in the cameroon highlands, particularly at mt kupe and the bakossi mts. examples are myrianthus fosi cheek (cheek & osborne, ), salacia nigra cheek (gosline & cheek, ), talbotiella ebo mackinder & wieringa (mackinder et al., ). additionally, several species formerly thought endemic to mt kupe have subsequently been found at ebo, e.g. coffea montekupensis stoff. (stoffelen et al., ), costus kupensis maas & h. maas (maas-van der kamer et al., ), microcos magnifica cheek (cheek, ), and uvariopsis submontana kenfack, gosline & gereau (kenfack et al., ). therefore, it is possible that pseudohydrosme ebo might yet also be found in the cameroon highlands, e.g. at mt kupe, further extending westward the known range of the genus. however, this is thought to be only a relatively small possibility given the spectacular nature of this plant, and the high level of survey effort at e.g. mt kupe: if it occurred there it is highly likely that it would have been recorded already. the biogeography of the cameroonian pseudohydrosme ebo is very different from that of the two gabonese species of the genus growing c. km to the south. the gabonese species grow on recently deposited, sandy coastal soils. although the gabonese species also experience a wet season of about metres of rainfall per annum, it is differently distributed: the dry season in libreville occurs from june to september inclusive and is colder than the wet season.
in contrast, at ebo the geology at the pseudohydrosme location is ancient, highly weathered basement complex, with some ferralitic areas in foothill areas which are inland, c. km from the coast. the wet season (successive months with cumulative rainfall > mm) is almost the inverse of that at libreville, falling between march and november, and is colder than the dry season (abwe & morgan ). in addition, the affinities of ebo, as indicated by shared plant species, seem to be with other parts of the cross-sanaga biogeographic area, the cameroon highlands, rather than with gabon (see above). although indicated as potentially congeneric with anchomanes by hetterscheid & bogner ( ), who cited only the difference in ovary locularity as a basis for maintaining the separation, in fact five other characters support maintaining the separation of these two genera (see table below). two of these characters were discovered for the first time by those authors. these are ) the development in the fruit of a pedicel-like stipe and ) vegetative apomixis from the fleshy roots, producing new plants distant from the parent rhizome. the last character is specifically remarked to be definitively absent from anchomanes species, which have been studied in detail in cultivation (hetterscheid & bogner ). spadix long > / - / as long as spathe; conspicuous, projected above the (short) spathe tube (mayo et al. : ). secondly, in p. ebo the male and female portions of the inflorescence are not completely contiguous, and the axis is % naked in the female portion, while in the other species of pseudohydrosme there is no naked portion and the spadix axis is completely covered in flowers. thirdly, the trilocular ovaries normal in pseudohydrosme ebo are different to those of the other two species, which are bilocular and only very rarely otherwise.
the following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): the fieldwork was approved by the institutional review board of the royal botanic gardens, kew entitled the overseas fieldwork committee (ofc). irad-herbier national du cameroun sanctioned the field work under a series of memoranda of collaboration with the royal botanic gardens, kew, the most recent signed th sept. , extending until th sept. . the following information was supplied regarding data availability: the specimens on which this manuscript is based are housed in the herbaria for which the standard codes are k and ya. specimen data and images of type material will be made available on the kew herbarium catalogue at http://apps.kew.org/herbcat/gotosearchpage.do. fieldwork for the research was supported by the garfield weston foundation and the bentham moxon trust. writing of this paper was supported by the players of the people's postcode lottery. the first and last author's salary during the study was paid by rbg, kew, the middle author by irad. there was no additional external funding received for this study. the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. the ebo forest: four years of preliminary research and conservation of the nigeria-cameroon chimpanzee (pan troglodytes vellerosus) notes on myrtaceae vii. 
myrtaceae of french equatorial africa supporting red list threat assessments with geocat: geospatial conservation assessment tool global distribution of species diversity in vascular plants: towards a world map of phytodiversity the chromosome numbers of the aroid genera the plants of mt cameroon, a conservation checklist phylogenetic relationships of aroids and duckweeds (araceae) inferred from coding and noncoding plastid dna botanical survey of the proposed mabeta-moliwe forest reserve in sw cameroon microcos magnifica (sparrmanniaceae) a new species of cloudforest tree from cameroon myrianthus fosi (cecropiaceae) a new submontane fruit tree from cameroon ardisia ebo sp. nov. (myrsinaceae) a creeping forest subshrub of cameroon and gabon rubiaceae), a new genus from cameroon and equatorial guinea mapping plant biodiversity on mt. cameroon. pp. - in van der maesen, van der a synoptic revision of inversodicraea (podostemaceae) vepris bali (rutaceae), a new critically endangered (possibly extinct) cloud forest tree species from bali ngemba, cameroon ternstroemia guineensis (ternstroemiaceae), a new endangered, submontane shrub with neotropical affinities the plants of dom the plants of mefou proposed national park the phytogeography and flora of western cameroon and the cross river-sanaga river interval new scientific discoveries: plants and fungi the plants of mount oku and the ijim ridge, cameroon, a conservation checklist the plants of kupe, mwanenguba and the bakossi mountains notes on the endemic plant species of the ebo forest, cameroon, and the new, critically endangered, palisota ebo (commelinaceae) taxonomic monograph of oxygyne (thismiaceae), rare achlorophyllous mycoheterotrophs with strongly disjunct distribution relationships with in the araceae: comparison of morphological patters with molecular phylogenies important plant areas: revised selection criteria for a global approach to plant conservation a remarkable range disjunction recorded in 
metarungia pubinervia (acanthaceae) pseudohydrosme in: engler a die natürlichen pflanzenfamilien nebst ihren gattungen und wichtigeren arten: ergänzungsheft enthaltend die nachträge zu den teilen ii-iv a monograph of octoknema (octoknemataceae-olacaceae s.l.) two new african species of salacia (salacioideae, celastraceae) revision and new species of the african genus mischogyne (annonaceae) the plants of bali ngemba forest reserve, cameroon. a conservation checklist the plants of the lebialem highlands, a conservation checklist nephthytis schott (araceae) in borneo. a new species and a new generic record for malesia recent observations and cultivation of pseudohydrosme gabunensis engl. (araceae) international plant names index. the royal botanic gardens, kew, harvard university herbaria & libraries and australian national botanic gardens iucn red list categories and criteria: version . . second edition bulletin du muséum national d'histoire naturelle de paris, e série jstor global plants. . continuously updated) available at acoraceae and araceae the genus uvariopsis in tropical africa, with a recombination and one new species from cameroon alatae from luzon, philippines showing striking pitcher convergence with n. maxima (sect. regiae) of indonesia les forêts littorales de la région de libreville (gabon) et leur importance pour la conservation: description d'un nouveau psychotria (rubiaceae) endémique pseudohydrosme gabunensis. the iucn red list of threatened species monograph of african costus a revision of the genus talbotiella baker f. 
(caesalpinioideae: leguminosae) the genera of araceae global history of the ancient monocot family araceae inferred with models accounting for past continental positions and previous ranges based on fossils a contribution to the leaf anatomy and taxonomy of apocynaceae in africa: the leaf anatomy of apocynaceae in east africa; a monograph of pleiocarpinae (series of revisions of apocynaceae ) red data book of the flowering plants of cameroon, iucn global assessments a revision of african lecythidaceae nomenclatural changes in preparation for a world rubiaceae checklist a new species of coffea (rubiaceae) and notes on mt kupe (cameroon) paris: musée national d'histoire naturelle; yaoundé : herbier national checklist of gabonese vascular plants contribution à l'étude du genre ardisia sw. (myrsinaceae) en afrique tropicale index herbariorum: a global directory of public herbaria and associated staff the gilbertiodendron ogoouense species complex (leguminosae: caesalpinioideae), central africa talbotiella cheekii (leguminosae: detarioideae), a new tree species from guinea peri-urban conservation in the mondah forest of libreville, gabon: red list assessments of endemic plant species, and avoiding protected area downsizing ekwoge abwe and bethan morgan of san diego zoo global and their team at ebo forest research project are thanked hugely for expediting our botanical surveys in the ebo forest of cameroon over several years. in particular bethan morgan is acknowledged for collecting the type specimen of pseudohydrosme ebo. janis shillito is thanked for typing the manuscript. the heads of irad (institute of research in agronomic development)-national herbarium of cameroon, yaounde, successively jean-michel onana, florence ngo ngwe and eric nana, are thanked for arranging permits and co-ordinating the co-operation with the royal botanic gardens, kew. the late josef bogner is thanked for conversations on araceae. 
maria alvarez is thanked for photos of seedlings of pseudohydrosme ebo. eric ngansop assisted in the field in ebo. marcello sellaro of the tropical nursery, royal botanic gardens, kew, is thanked for cultivation of and facilitating access to live material of araceae. the discovery of a new species of pseudohydrosme in cameroon, far from the border with gabon, is completely unexpected after nearly years in which no additional taxa have been added to the genus. it is also unexpected because one would not predict from the pre-existing data on the genus that such a new species would be so biogeographically and climatically disjunct from its congeners in the libreville area of gabon (see under pseudohydrosme ebo above). however examples of even more dramatically unexpected african range extensions have occurred recently such as the westward extension by km of the genus ternstroemia mutis ex l.f., of talbotiella baker by km, and of the genus metarungia baden by km, or in the other direction, eastwards, km in mischogyne exell (cheek et al. ; van der burgt et al. ; darbyshire et al., ; gosline et al., respectively) . such discoveries underline how incomplete our knowledge of the geography of african plant genera remains. such discoveries also underline the urgency for making such further discoveries while it is still possible since in all but one of the cases given, the range extension resulted from discovery of a new species for science with a narrow geographic range and/or very few individuals, and which face threats to their natural habitat, putting these species at high risk of extinction. about new species of vascular plant have been discovered each year for the last decade or more. until species are known to science, they cannot be assessed for their conservation status and the possibility of protecting them is reduced . documented extinctions of plant species are increasing, e.g. oxygyne triandra schltr. 
of southwest region, cameroon is now known to be globally extinct (cheek et al., c). in some cases species appear to be extinct even before they are known to science, such as vepris bali cheek, also from the cross-sanaga interval in cameroon (cheek et al., d) and elsewhere, nepenthes maximoides cheek (king & cheek, ). most of the > cameroonian species in the red data book for the plants of cameroon are threatened with extinction due to habitat clearance or degradation, especially of forest for small-holder and plantation agriculture following logging. efforts are now being made to delimit the highest priority areas in cameroon for plant conservation as tropical important plant areas (tipas) using the revised ipa criteria set out in darbyshire et al. ( ). this is intended to help avoid the global extinction of additional endemic species such as pseudohydrosme ebo, which will be included in the proposed ebo forest ipa. the authors declare there are no competing interests. martin cheek conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper. barthelemy tchiengue contributed reagents/materials/analysis tools, reviewed drafts of the paper. xander van der burgt contributed reagents/materials/analysis tools, reviewed drafts of the paper and produced many of the images.
key: cord- -n hoty authors: egli, adrian; goldman, nina; müller, nicola f.; brunner, myrta; wüthrich, daniel; tschudin-sutter, sarah; hodcroft, emma; neher, richard; saalfrank, claudia; hadfield, james; bedford, trevor; syedbasha, mohammedyaseen; vogel, thomas; augustin, noémie; bauer, jan; sailer, nadine; amar-sliwa, nadezhda; lang, daniela; seth-smith, helena m.b.; blaich, annette; hollenstein, yvonne; dubuis, olivier; nägele, michael; buser, andreas; nickel, christian h.; ritz, nicole; zeller, andreas; stadler, tanja; battegay, manuel; schneider-sliwa, rita title: high-resolution influenza mapping of a city reveals socioeconomic determinants of transmission within and between urban quarters date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n hoty with two-thirds of the global population projected to be living in urban areas by , understanding the transmission patterns of viral pathogens within cities is crucial for effective prevention strategies. here, in unprecedented spatial resolution, we analysed the socioeconomic determinants of influenza transmission in a european city. we combined geographical and epidemiological data with whole genome sequencing of influenza viruses at the scale of urban quarters and statistical blocks, the smallest geographic subdivisions within a city. we observed annually re-occurring geographic clusters of influenza incidences, mainly associated with net income, and independent of population density and living space. vaccination against influenza was also mainly associated with household income and was linked to the likelihood of influenza-like illness within an urban quarter. transmissions patterns within and between quarters were complex. high-resolution city-level epidemiological studies combined with social science surveys such as this will be essential for understanding seasonal and pandemic transmission chains and delivering tailored public health information and vaccination programs at the municipal level. 
as a surrogate marker for the population density and available living space in a particular area and thereby also be an indicator of influenza burden ( figure d ). to account for this potential ecologic fallacy, we corrected the influenza incidence rates per , inhabitants for each statistical block. we still observed similar dense influenza case patterns at higher the socioeconomic score, the higher the vaccine rate. income showed the highest correlation with vaccination rates, followed by living space and population density, respectively (r . , p= . ; r . , p= . ; r . , p= . ). self-reported vaccination rates may serve as a surrogate marker for herd immunity , however, in years with a low vaccine effectiveness (table s ) this association may be weaker. in order to monitor antibody titres over time in a healthy population across the city, we recruited table s ). before the / influenza season, we observed that across all urban quarters a median of % (iqr - . %) had seroprotective antibody levels (defined as hemagglutination inhibition titres equal or more than : ) ( figure a) . again, urban quarters with lower socioeconomic scores also showed low seroprotection rates (e.g. matthaeus, breite, kleinhueningen and klybeck) ( figure s a ). urban quarters with higher socioeconomic scores showed a median seroprotection rate of . %, whereas those with lower socioeconomic scores showed a median seroprotection rate of . % (p= . ). blood donors with influenza vaccination showed significant higher h n specific hi titers in comparison to people who were not vaccinated (p< . ; figure s b ). similar to the survey, in this cohort net income was table s ). this visualization can be (we) were more identical in comparison to other urban quarters (p< . , figure c ). transmission events within the same urban quarter were explored. 
interestingly, two urban quarters -gundeldingen (gu) and vorstaedte (vo) -showed influenza isolates that were significantly more related to other isolates from within the same urban quarter than to isolates from other quarters or outside of basel (p< . , figure c ). these two urban quarters show a low socioeconomic score and lower pre-seasonal seroprotection rate. phylogenetic cluster size did not correlate with any socioeconomic factors (p=n.s.). each year influenza infects millions of people around the globe , this would allow us to address more and account for a greater variety of population segments and help to identify potential drivers of transmission. the overall study design has been previously published . briefly, the study had retrospective to account for socioeconomic differences related to each urban quarter and potentially . sites with a coverage of less than were assumed to be unknown and were denoted as n, that is any possible nucleotide. sequences that showed a read depth of for at least % of the positions in at least four segments were used for the analysis. non double infections with two strains were noted. using these parameters, we continued our analysis using samples. the consensus sequences from these strains were deposited in genbank (numbers will be available upon acceptance of the manuscript). we aligned the consensus sequences using we also acknowledge the contributing colleagues and centers to gisaid (see data availability statement: all sequencing data (raw reads) will be made available at ncbi. as well tables with anonymized pcr-confirmed cases and anonymized survey results will be made available in a public data repository. code availability statement: all codes used to process the viral genomic data will be made available on github. tables patients with pcr-confirmed influenza are compared against patients with negative pcr testing. 
we next reconstructed the phylogenetic trees of all initial clusters by using the full genomes of all samples from the initial clusters. we fixed the evolutionary rates to be equal to the mean evolutionary rates as estimated using the methods above. as a population model, we used a constant coalescent model with an estimated effective population size that was shared among all initial clusters. we then estimated a distribution of phylogenies for each initial cluster, assuming that all segments share the same phylogeny. as estimated in the previous analysis, reassortment will not bias evolutionary rates. local cluster identification. to identify sets of sequences from basel that were likely to have been transmitted locally, we reconstructed the geographic origins of lineages that were introduced into basel. therefore, we used the phylogenetic tree distributions for each initial cluster to reconstruct the ancestral states using parsimony. we made some modifications to the standard algorithm for parsimony ancestral state reconstruction to reflect our prior assumption, that basel is unlikely to continental synchronicity of human influenza virus epidemics key: cord- -knvfbuzv authors: knyazev, sergey; tsyvina, viachaslau; shankar, anupama; melnyk, andrew; artyomenko, alexander; malygina, tatiana; porozov, yuri b.; campbell, ellsworth m.; mangul, serghei; switzer, william m.; skums, pavel; zelikovsky, alex title: accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: knvfbuzv rapidly evolving rna viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan preventing potential disease complications. 
minority haplotypes can be identified using next-generation sequencing (ngs), but sequencing noise hinders accurate identification. the elimination of sequencing noise is a non-trivial task that remains open. here we propose cliquesnv, a method based on extracting pairs of statistically linked mutations from noisy reads. this effectively reduces sequencing noise and enables the identification of minority haplotypes with frequencies below the sequencing error rate. we comparatively assess the performance of cliquesnv using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing hiv patient. we show that cliquesnv can accurately assemble viral haplotypes with frequencies as low as . % and maintains consistent performance across short- and long-read sequencing platforms.
* equal contribution. † to whom correspondence should be addressed. sk: sergey.n.knyazev@gmail.com; az: tel: + ; fax: + ; email: alexz@gsu.edu
rapidly evolving rna viruses such as human immunodeficiency virus (hiv), hepatitis c virus (hcv), influenza a virus (iav), sars, and sars-cov-2 form populations of closely related genomic variants inside infected hosts ( , , , , , , , , , ). the intra-host viral populations include minority viral variants that are frequently responsible for drug resistance, immune escape, and disease transmission ( , , , , , , , , , , , , ). therefore, accurately predicting minority viral populations from extremely large and noisy viral genomic data is important for biomedical research, epidemiology, and clinical applications. although this problem has recently attracted significant interest from the biomedical research community ( , , ), numerous obstacles still delay the integration of ngs into viral studies. the last decade witnessed numerous attempts to employ ngs and bioinformatics methods for reconstructing intra-host viral populations.
these methods are not accurate enough for clinical and epidemiological applications since they cannot reliably identify haplotypes accounting for a substantial portion of the population. existing methods are ill-equipped to assemble closely related haplotypes and have elevated false-positive rates. additionally, there is only one in vitro viral sequencing benchmark for validation of haplotyping tools ( ), and to convincingly demonstrate that such tools are ready for clinical and epidemiological applications, new comprehensive sequencing benchmarks are urgently required ( ).
© yyyy the author(s). this is an open access article distributed under the terms of the creative commons attribution non-commercial license (http://creativecommons.org/licenses/ by-nc/ . /uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
next-generation sequencing (ngs) technologies now provide versatile opportunities to study viral populations. in particular, the popular illumina miseq/hiseq platforms produce - million reads, which allow multiple coverage of highly variable viral genomic regions. this high coverage is essential for capturing rare variants. ngs technologies that efficiently identify minority variants have recently gained fda approval ( ). however, haplotyping of heterogeneous viral populations (i.e., assembly of full-length genomic variants and estimation of their frequencies) is extremely complicated due to the vast number of sequencing reads and the need to assemble an unknown number of closely related viral sequences while identifying and preserving low-frequency variants. single-molecule sequencing technologies, such as pacbio, provide an alternative to short-read sequencing by allowing full-length viral variants to be sequenced in a single pass.
however, the high level of sequence noise due to background or platformspecific sequencing errors produced by all currently available platforms makes inference of low-frequency genetically close variants especially challenging, since it is required to distinguish between real and artificial genetic heterogeneity produced by sequencing errors. recently, a number of computational tools for inference of viral quasispecies populations from ngs reads have been proposed ( ) , including savage ( ), predicthaplo ( ) , abayesqr ( ) , quasirecomb ( ), haploclique ( ), vga ( ) , vira ( , ) , shorah ( ) , vispa ( ), qure ( ) and others ( , , , , ) . even though these algorithms proved useful in many applications, accurate and scalable viral haplotyping remains a challenge. in particular, inference of low-frequency viral variants is still problematic, while many computational tools designed for the previous generation of sequencing platforms have severe scalability problems when applied to datasets produced by state-of-the-art technologies. previously, several tools such as v-phaser ( ), v-phaser ( ) and covama ( ) exploited linkage of mutations for single nucleotide variant (snv) calling rather than haplotype assembly, but they do not accommodate sequencing errors when deciding whether two variants are linked. these tools are also unable to detect the frequency of mutations above sequencing error rates ( ) . the snv algorithm ( ) accommodates errors in links and was the first such tool to be able to correctly detect haplotypes with a frequency below the sequencing error rate. we propose a novel method that can accurately identify minority haplotypes from ngs reads consisting of three steps. first, we extract pairs of statistically linked mutations. second, we find maximal sets of pairwise linked mutations (cliques) where each clique corresponds to a set of mutations in a minority haplotype. 
finally, we assign each read to the closest clique, and for each clique, we form a haplotype as a consensus of the reads assigned to it. all haplotyping tools require solid and convincing validation benchmarks ( , ). the true viral variants and their distribution are only known for simulated data ( ), but sequencing errors, variation of coverage depth, pcr bias, and systematic noise cannot be reliably simulated. therefore, experimental sequencing benchmarks that provide an adequate evaluation of haplotyping tools are necessary. to date, there are only two experimental sequencing benchmarks: (i) illumina sequencing reads from a mixture of five hiv-1 strains (hiv exp, see table ) ( ) and (ii) pacbio sequencing reads from a sample consisting of ten iav viral variants (iav exp, see table ) ( ). in hiv exp, five different hiv-1 strains, each having % frequency, were prepared to mimic an intra-host viral population. unfortunately, this benchmark is not realistic enough, since observed intra-host viral populations consist of variants that are much closer to each other than different strains are, and contain both frequent and rare variants ( ). the iav exp benchmark mimics the intra-host viral population significantly better, since its variants are very similar to each other and the variant frequencies are realistically non-uniform. thus, it would be beneficial to develop illumina benchmarks which, like the iav exp benchmark, adequately imitate intra-host viral populations containing closely related minority variants. to validate our method's performance, we have introduced two novel in vitro sequencing hiv-1 benchmarks, which consist of illumina miseq experiments on haplotype mixtures based on the mutation profile of an existing patient. finally, there is an essential gap in existing quality measures of intra-host viral population assembly. to date, instead of populations (i.e.
haplotypes with their frequencies), only sets of reconstructed and ground-truth haplotypes are compared ( ). here we propose to measure differences between haplotype populations using the matching error and the earth mover's distance, which account for both the distances between haplotypes and their frequencies.
a schematic diagram of the cliquesnv algorithm is shown in figure . the algorithm takes aligned reads as input and infers haplotype sequences with their frequencies as output. the method consists of six steps:
• step 1 uses aligned reads to build the consensus sequence and identifies all snvs. all pairs of snvs are then tested for dependency and divided into three groups: linked, forbidden, or unclassified. each snv is represented as a pair (p,n) of its position p and nucleotide value n in the aligned reads. if enough reads contain two snvs (p,n) and (p′,n′) simultaneously, the two snvs are tested for dependency. if the dependency test is positive and statistically significant (see cliquesnv algorithm details for more information), the algorithm classifies these two snvs as linked. otherwise, the two snvs are tested for independence. if the independence test is positive and statistically significant (see the detailed description), the two snvs are classified as a forbidden pair.
• in step 2, we build a graph g = (v,e) with a set of nodes v representing snvs and a set of edges e connecting linked snv pairs.
• ideally, the snvs of each true minority haplotype form a clique in g. a maximal clique c ⊆ v is a set of nodes such that (u,v) ∈ e for any u,v ∈ c and for any w ∉ c there is u ∈ c such that (w,u) ∉ e. step 3 finds all maximal cliques in g.
• for real sequencing data, the linkage between some snv pairs may be undetected due to sequencing noise, uneven coverage, or the shortness of the ngs reads. as a result, a single clique corresponding to a haplotype will be split into several overlapping cliques. step 4 merges such overlapping cliques. in order to avoid merging distinct haplotypes, two cliques are not merged if they contain a forbidden snv pair.
• step 5 assigns each read to the merged clique with which it shares the largest number of snvs. cliquesnv then builds a consensus haplotype from all reads assigned to a single merged clique.
• finally, haplotype frequencies are estimated via an expectation-maximization algorithm in step 6.
table . four experimental and two simulated sequencing datasets of human immunodeficiency virus type 1 (hiv-1) and influenza a virus (iav). the datasets contain miseq and pacbio reads from intra-host viral populations consisting of two to ten variants each, with frequencies in the range of . - % and hamming distances between variants in the range of . - . %.
we tested the ability of cliquesnv to assemble haplotype sequences and estimate their frequencies from pacbio and miseq reads using four real (experimental) and two simulated datasets from hiv and iav samples (table ). each dataset contains between two and ten haplotypes with frequencies of . to %. the hamming distances between pairs of variants for each dataset are shown in figure s .
experimental datasets: - . hiv-1 subtype b plasmid mixtures and miseq reads (hiv exp and hiv exp). we designed nine in silico plasmid constructs comprising a -bp region of the hiv-1 subtype b polymerase (pol) gene that were then synthesized and cloned into pucidt-amp (integrated dna technologies, skokie, il). each clone was confirmed by sanger sequencing. this -bp region at the beginning of pol contains the known protease and reverse transcriptase genes, which are monitored by sequence analysis for drug-resistance mutations as part of patient care. each of these plasmids contains a specific set of point mutations chosen using mutation profiles of patient p from a real clinical study ( ) to create nine unique synthetic hiv-1 pol haplotypes.
different proportions of these plasmids were mixed and then sequenced using an illumina miseq protocol to obtain × -bp reads (see supplementary methods). hiv exp and hiv exp are mixtures of two and nine variants, respectively.
this dataset consists of illumina miseq × -bp reads with an average read coverage of ~ , × obtained from a mixture of five hiv-1 isolates: . , hxb , jrcsf, nl , and yu , available at ( ). the isolates have pairwise hamming distances in the range of - . % ( to -bp differences). the original hiv-1 sequence length was . kb, but it was reduced to the beginning of pol with a length of . kb.
this benchmark contains ten iav clones that were mixed at frequencies of . - %. the hamming distances between clones ranged from . to . % ( - bp differences) ( ). the kb amplicon was sequenced using the pacbio platform, yielding a total of , reads with an average length of nucleotides.
precision and recall
inference quality is typically measured by precision and recall:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
where $TP$ is the number of true predicted haplotypes, $FP$ is the number of false predicted haplotypes, and $FN$ is the number of undiscovered haplotypes. initially, we measured precision and recall strictly, treating a predicted haplotype with even a single mismatch as an $FP$. additionally, as in ( ), we introduced an acceptance threshold, which is the number of mismatches permitted for a predicted haplotype to still count as a $TP$.
matching errors between populations
however, precision and recall do not take into account (i) the distances between true and inferred viral variants or (ii) the frequencies of the true and inferred viral variants. instead, we chose to use analogues of precision and recall defined for populations as follows.
let $T = \{(t, f_t)\}$ be the true haplotype population, where $f_t$ is the frequency of the true haplotype $t$, $\sum_{t \in T} f_t = 1$. similarly, let $P = \{(p, f_p)\}$ be the reconstructed haplotype population, where $f_p$ is the frequency of the reconstructed haplotype $p$, $\sum_{p \in P} f_p = 1$. let $d_{pt}$ be the distance between haplotypes $p$ and $t$. thus, instead of precision, we used the matching error $E_{T \to P}$, which measures how well each reconstructed haplotype $p \in P$, weighted by its frequency, is matched by the closest true haplotype:
$$E_{T \to P} = \sum_{p \in P} f_p \min_{t \in T} d_{pt}.$$
indeed, precision increases while $E_{T \to P}$ decreases and reaches 100% when $E_{T \to P} = 0$. similarly, instead of recall, we propose to use the matching error $E_{T \leftarrow P}$, which measures how well each true haplotype $t \in T$, weighted by its frequency, is matched by the closest reconstructed haplotype:
$$E_{T \leftarrow P} = \sum_{t \in T} f_t \min_{p \in P} d_{pt}.$$
note that recall increases while $E_{T \leftarrow P}$ decreases and reaches 100% when $E_{T \leftarrow P} = 0$.
earth mover's distance (emd) between populations
the matching errors described above match haplotypes of the true and reconstructed populations but do not match their frequencies. in order to simultaneously match haplotype sequences and their frequencies, we allowed for a fractional matching in which portions of a single haplotype $p$ of population $P$ are matched to portions of possibly several haplotypes of $T$ and vice versa. thus, we separated $f_p$ into $f_{pt}$'s, each denoting the portion of $p$ matched to $t$, such that $f_p = \sum_{t \in T} f_{pt}$, $f_{pt} \geq 0$. symmetrically, the $f_t$'s are also separated into $f_{pt}$'s, i.e., $\sum_{p \in P} f_{pt} = f_t$. finally, we chose the $f_{pt}$'s minimizing the total error of matching $T$ to $P$,
$$\mathrm{EMD}(T, P) = \min_{f_{pt}} \sum_{t \in T} \sum_{p \in P} f_{pt} \, d_{pt},$$
which is also known as the wasserstein metric or the emd between $T$ and $P$ ( , ). emds can vary a lot across benchmarks, since benchmarks may have different complexities, which depend on the number of true variants, the frequency distribution, the similarity between haplotypes, sequencing depth, sequencing error rate, and many other parameters.
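as an illustration, the two matching errors defined above can be sketched in a few lines of python. this is a minimal sketch under stated assumptions, not the authors' implementation: haplotypes are taken to be aligned strings of equal length, the distance between haplotypes is taken to be the hamming distance, and a population is represented by a hypothetical dict mapping each sequence to its frequency. all function and variable names are illustrative. the threshold-based precision and recall are included for comparison, assuming that a haplotype counts as matched when some haplotype in the other population lies within the acceptance threshold. the emd additionally requires solving a transportation problem and is not covered by this sketch.

```python
from typing import Dict, Tuple

# hypothetical representation: haplotype sequence -> frequency
Population = Dict[str, float]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two aligned, equal-length haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def matching_error(source: Population, target: Population) -> float:
    """Sum over source haplotypes of frequency times distance to the closest target."""
    return sum(f * min(hamming(h, t) for t in target) for h, f in source.items())


def precision_recall(
    true_pop: Population, pred_pop: Population, threshold: int = 0
) -> Tuple[float, float]:
    """Precision and recall; a haplotype is matched if within `threshold` mismatches."""
    matched_pred = sum(
        any(hamming(p, t) <= threshold for t in true_pop) for p in pred_pop
    )
    matched_true = sum(
        any(hamming(t, p) <= threshold for p in pred_pop) for t in true_pop
    )
    return matched_pred / len(pred_pop), matched_true / len(true_pop)


# toy populations (made-up data, not taken from the benchmarks)
true_pop = {"ACGT": 0.9, "ACGA": 0.1}
pred_pop = {"ACGT": 0.8, "ACTT": 0.2}

# each reconstructed haplotype matched to its closest true haplotype
e_t_to_p = matching_error(pred_pop, true_pop)
# each true haplotype matched to its closest reconstructed haplotype
e_t_from_p = matching_error(true_pop, pred_pop)

print(e_t_to_p, e_t_from_p)
print(precision_recall(true_pop, pred_pop))
print(precision_recall(true_pop, pred_pop, threshold=1))
```

in this toy case the strict precision and recall penalize the single-mismatch prediction as heavily as an unrelated sequence would be penalized, whereas the matching errors stay small because the mismatching haplotypes are both close to a counterpart and rare.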
hence, we measured the complexity of a benchmark as the emd between the true population and a population consisting of a single consensus haplotype ( ) . data input for cliquesnv consists of pacbio or illumina reads from an intra-host viral population aligned to a reference genome. output is the set of inferred viral variant rna sequences with their frequencies. the formal high-level pseudocode of the cliquesnv algorithm is described in the supplementary materials. below we describe in detail the six major steps of cliquesnv that are schematically presented in figure . step : finding linked and forbidden snv pairs. at a given genomic position i, the most frequent nucleotide is referred to as a major variant and is denoted . let us fix one of the less frequent nucleotide (referred to as a minor variant) and denote it . a pair of variants at two distinct genomic positions i and j is referred to as a -haplotype. there are four -haplotypes with major and minor variants at i and j: the pairs of minor variants (referred to as snv pairs) are classified into three categories: linked, forbidden, and unclassified. an snv pair is linked if it is highly probable that there exists a haplotype containing both minor variants. on the contrary, an snv pair is forbidden if it is extremely unlikely that the corresponding minor variants belong to the same haplotype. all other snv pairs are referred to as unclassified. assuming that errors are random, it has been proven in ( ) where e , e , and e are the expected numbers of reads containing the -haplotypes ( ), ( ) and ( ) where p is the user-defined p -value (by default p = . ) and dividing by l is the bonferroni correction for multiple testing. pairs of snvs passing this linkage test are classified as a linked snv pairs. for every other pair of snvs, we check whether they can be classified as a forbidden snv pair, i.e., whether the probability of observing at most reads is low enough (< . 
) given that the variant ( ) has frequency t ≥ t (by default t = . ). step : constructing the snv graph. the snv graph g = (v,e) consists of vertices corresponding to minor variants and edges corresponding to linked pairs of minor variants from different positions. if the intra-host population consists of very similar haplotypes, then graph g is very sparse. indeed, the pacbio dataset for iav encompassing , positions is split into , vertices, while the snv graph contains only edges, and, similarly, the simulated illumina read dataset for the same haplotypes contains only edges. note that the isolated minor variants correspond to genotyping errors unless they have a significant frequency. this fact allows us to estimate the number of errors per read, assuming that all isolated snvs are errors. as expected, the distribution of the pacbio reads has a heavy tail (see figure s ), which implies that most reads are (almost) error free, while a small number of heavy-tail reads accumulate most of the errors. our analysis allows the identification of such reads, which can then be filtered out. by default, we filter out ≈ % of pacbio reads, but we do not filter out any illumina reads. the snv graph is then constructed for the reduced set of reads. such filtering allows the reduction of systematic errors and refines the snv graph significantly. step : finding cliques in the snv graph g. although the max clique is a well-known np-complete problem and there may be an exponential number of maximal cliques in g, a standard bron-kerbosch algorithm requires little computational time since g is very sparse ( ). step : merging cliques in the clique graph c g . the clique graph c g = (c,f,l) consists of vertices corresponding to cliques in the snv graph g and two sets of edges f and l. a forbidding edge (p,q) ∈ f connects two cliques p and q with at least one forbidden pair of minor variants from p and q respectively. 
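the cliques that become the vertices of the clique graph come from the snv graph described above. a minimal sketch of that graph construction and of bron-kerbosch enumeration with pivoting follows. it is an illustration only, not the cliquesnv code: the function names, the representation of a minor variant as a (position, nucleotide) tuple, and the toy linked pairs are all assumptions made for the example, and isolated variants with no linked partner are simply left out of the graph.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple

SNV = Tuple[int, str]  # hypothetical encoding: (position, minor nucleotide)


def build_snv_graph(linked_pairs: Iterable[Tuple[SNV, SNV]]) -> Dict[SNV, Set[SNV]]:
    """Adjacency sets of the SNV graph; edges join linked variants at different positions."""
    graph: Dict[SNV, Set[SNV]] = defaultdict(set)
    for a, b in linked_pairs:
        if a[0] == b[0]:
            continue  # two variants at the same position cannot be linked
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def maximal_cliques(graph: Dict[SNV, Set[SNV]]) -> Iterator[FrozenSet[SNV]]:
    """Enumerate all maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""

    def expand(r: Set[SNV], p: Set[SNV], x: Set[SNV]) -> Iterator[FrozenSet[SNV]]:
        if not p and not x:
            yield frozenset(r)  # r cannot be extended, so it is maximal
            return
        # pivot on the vertex with the most neighbours among the candidates
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


# toy linked pairs (made-up data)
linked = [
    ((10, "A"), (25, "G")),
    ((10, "A"), (40, "T")),
    ((25, "G"), (40, "T")),
    ((40, "T"), (77, "C")),
]
snv_graph = build_snv_graph(linked)
cliques = set(maximal_cliques(snv_graph))
for clique in cliques:
    print(sorted(clique))
```

on a sparse graph such as this one the recursion terminates after a handful of calls, which is consistent with the observation that clique enumeration takes little time when very few snv pairs are linked.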
a linking edge $(p,q) \in L$ connects two cliques $p$ and $q$, $(p,q) \notin F$, with at least one linked pair of minor variants from $p$ and $q$ respectively. any true haplotype corresponds to a maximal $(L \setminus F)$-connected subgraph $H$ of $C_G$, i.e., a subgraph which is connected with edges from $L$ and does not contain any edge from $F$ (see fig. ( ) ). unfortunately, even deciding whether there is an $L$-path between $p$ and $q$ avoiding forbidding edges is known to be np-hard ( ). we find all subgraphs $H$ as follows (see figure s ): (i) connect all pairs of vertices except those connected with forbidding edges, (ii) find all maximal super-cliques in the resulting graph $C'_G = (C, C^{(2)} - F)$, where $C^{(2)}$ is the set of all pairs of vertices, using ( ), (iii) split each super-clique into $L$-connected components, and (iv) filter out the $L$-connected components which are proper subsets of other maximal $L$-connected components.
step : partitioning reads between merged cliques and finding consensus haplotypes. let $S$ be the set of all positions containing at least one minor variant in $V$. let $q_S$ be a major clique corresponding to a haplotype with all major variants in $S$. the distance between a read $r$ and a clique $q$ equals the number of variants in $q$ that are different from the corresponding nucleotides in $r$. each read $r$ is assigned to the closest clique $q$ (which can possibly be $q_S$). in case of a tie, we assign $r$ to all closest cliques. finally, for each clique $q$, cliquesnv finds the consensus $v(q)$ of all reads assigned to $q$. then $v(q)$ is extended from $S$ to a full-length haplotype by setting all non-$S$ positions to major variants.
step : estimating haplotype frequencies by using the expectation-maximization (em) algorithm. cliquesnv estimates the frequencies of the assembled intra-host haplotypes via an expectation-maximization algorithm similar to the one used in isoem ( ). let $K$ be the number of assembled viral variants, and let $\alpha$ be the probability of sequencing error. the em algorithm iterates over $n$: as long as the change in the frequency estimates between iterations exceeds $\varepsilon$, $n \leftarrow n+1$ and the update is repeated; the algorithm then outputs the estimated frequencies $f^{(n)}$.
we compared cliquesnv to the snv, predicthaplo, and abayesqr haplotyping methods. since cliquesnv, predicthaplo and abayesqr use illumina reads, we compared them using the hiv exp, hiv exp, hiv exp, hiv sim, and iav sim datasets. since cliquesnv, snv, and predicthaplo can also use pacbio reads, we compared them using the iav exp dataset. we also used the consensus sequence in the comparisons ( ) because of its simplicity and to evaluate the sequence most similar to that generated by the sanger sequencing method ( ). the precision and recall of haplotype discovery for each method are provided in table . cliquesnv had the best precision and recall for five of the six datasets. for the hiv exp dataset, predicthaplo was more conservative and predicted fewer false positive variants (better precision) than cliquesnv.
table . prediction statistics of haplotype reconstruction methods using experimental and simulated (a) miseq and (b) pacbio datasets. the precision and recall were evaluated stringently, such that if a predicted haplotype has at least one mismatch to its closest answer, then that haplotype is scored as a false positive.
following study ( ), we also showed how precision and recall grew as the restriction on mismatches was relaxed (fig. ). the number of true predicted haplotypes for cliquesnv was always greater than that of the other methods on real experimental sequencing benchmarks, indicating that cliquesnv more accurately identified the true haplotypes. the number of falsely predicted haplotypes for cliquesnv was always lower than that for abayesqr, but similar to that for predicthaplo on four out of five datasets, indicating that both cliquesnv and predicthaplo had the best precision with miseq datasets. matching distance analysis showed that the matching distances $E_{T \leftarrow P}$ and $E_{T \to P}$ are better for cliquesnv than for both predicthaplo and abayesqr on four out of five miseq datasets (fig. ).
for hiv sim, e t ←p for abayesqr was slightly better than for cliquesnv. using hiv exp, hiv exp, hiv sim, and iav sim datasets, the e t ←p and e t →p for cliquesnv were very close to zero indicating that the predictions were almost perfect. since e t ←p and e t →p correlate with precision and recall, matching distance analysis indicates that cliquesnv had a better precision, and significantly outperformed both predicthaplo and abayesqr. since abayesqr had a higher e t →p on miseq datasets, it is more likely to make more false predictions. notably, on the hiv sim dataset, abayesqr outperformed both cliquesnv and predicthaplo by e t ←p . the emd between the predicted and true haplotype populations for all five miseq datasets are shown in figure . the exact emd values are provided in table . cliquesnv provided the lowest (the best) emd across all tools on four out of five miseq benchmarks. for the simulated and pacbio datasets, cliquesnv had almost a zero emd indicating a low error in predictions. predicthaplo had a lower emd than abayesqr on four out of five miseq datasets. abayesqr has almost a zero emd with the hiv sim dataset and outperformed cliquesnv, while using the hiv exp dataset, abayesqr performed poorer than other methods. next, cliquesnv, snv, and predicthaplo were compared using the iav exp benchmark dataset (see table s ). cliquesnv correctly recovered all ten true variants, including the haplotype with frequencies significantly below the sequencing error rate. snv recovered nine true variants but found one false positive. predicthaplo recovered only seven true variants and falsely predicted three variants. to further explore the precision of these three methods with the iav exp data, we simulated low-coverage datasets by randomly subsampling n = k, k, k reads from the original data. for each dataset, cliquesnv found at least one true variant more than both snv and predicthaplo. 
to compare the computational run time of each method, we used the same pc (intel(r) xeon(r) cpu x . ghz x cores per cpu, dimm ddr , mhz ram gb x ) with the centos . operating system. the runtime of cliquesnv is sublinear with respect to the number of reads, while the runtimes of predicthaplo and snv exhibit superlinear growth. for the k iav sim reads, the cliquesnv analysis took seconds, while predicthaplo and snv took around minutes. the runtime of cliquesnv is quadratic with respect to the number of snvs rather than the length of the sequencing region (fig. s ). we also generated five hiv-1 variants within % hamming distance from each other, which is the estimated genetic distance between related hiv variants from the same person ( ). we then simulated m illumina reads for sequence regions of length , , and nucleotides, for which cliquesnv required , , , and seconds, respectively (fig. s ). for the hiv exp benchmark, abayesqr, predicthaplo, and cliquesnv required over ten hours, minutes, and only seconds, respectively.
assembly of haplotype populations from noisy ngs data is one of the most challenging problems of computational genomics. high-throughput sequencing technologies, such as illumina miseq and hiseq, provide deep sequence coverage that allows discovery of rare, clinically relevant haplotypes. however, the short reads generated by the illumina technology require assembly that is complicated by sequencing errors, an unknown number of haplotypes in a sample, and the genetic similarity of haplotypes within a sample. furthermore, the frequency of sequencing errors in illumina reads is comparable to the frequencies of true minor mutations ( ). recently developed single-molecule sequencing platforms such as pacbio produce reads that are sufficiently long to span entire genes or small viral genomes.
nonetheless, the error rate of single-molecule sequencing is exceptionally high, reaching − % ( ), which hampers the ability of pacbio sequencing to detect and assemble rare viral variants. we developed cliquesnv, a new reference-based assembly method for reconstruction of rare genetically-related viral variants such as those observed during infection with rapidly evolving rna viruses like hiv, hcv and iav. we demonstrated that cliquesnv infers haplotypes accurately in the presence of high sequencing error rates and is suitable for both single-molecule and short-read sequencing. in contrast to other haplotyping methods, cliquesnv infers viral haplotypes by detection of clusters of statistically linked snvs rather than through assembly of overlapping reads used with methods such as savage ( ) . applied to the novel in vitro sequencing hiv- benchmark, cliquesnv correctly reconstructed % of the intra-host haplotype population. at the same time, other state-of-the-art tools were not able to recover even a single haplotype without errors. additionally, we have used the only previously known and commonly used in vitro benchmark ( ) and simulated datasets to evaluate the accuracy of existing haplotyping methods. in contrast to the existing methods, cliquesnv was able to detect minority haplotypes at a low . % frequency and distinguish minority haplotypes differing in only two base pairs. although very accurate and fast, cliquesnv has some limitations. unlike savage ( ), cliquesnv is not a de novo assembly tool and requires a reference viral genome. this obstacle could easily be addressed by using vicuna ( ) or other analogous tools to first assemble a consensus sequence from the ngs reads, which can then be used as a reference.
another limitation is that variants differing only by isolated snvs separated by conserved genomic regions longer than the read length may not be accurately inferred by cliquesnv. while such situations usually do not occur for viruses, where mutations are typically densely concentrated in different genomic regions, we plan to address this limitation in the next version of cliquesnv. the ability to accurately infer the structure of intra-host viral populations makes cliquesnv applicable for studying viral evolution and transmission and for examining the genomic compositions of rna viruses. in addition, we envision that the application of our method could be extended to other highly heterogeneous genomic populations, such as metagenomes, immune repertoires, and cancer cell genes.
the datasets hiv exp and hiv exp have been deposited in the sequence read archive under accession numbers srr and srr , respectively. the links to the data sets and the consensus sequences of the individual strains are available at https://github.com/sergey-knyazev/cliquesnv-validation/blob/master/relevant haplotypes/hiv exp.fasta
cliquesnv is available at https://github.com/vtsyvina/cliquesnv
all scripts and configuration files that were used for validation of the tools are available at https://github.com/sergey-knyazev/cliquesnv-validation
this work was supported in part by nih grant r eb - and nsf grant ccf- . sk, vt, am were partly supported by molecular basis of disease at georgia state university.
use of trade names is for identification only and does not imply endorsement by the u.s.
department of health and human services, the public health service, or the centers for disease control and prevention (cdc). the findings and conclusions in this paper are those of the authors and do not necessarily represent the official position of the cdc. conflict of interest statement. none declared.
key: cord- -l u bp authors: lópez, maría s.; jordan, daniela i.; blatter, evelyn; walker, elisabet; gómez, andrea a.; müller, gabriela v.; mendicino, diego; estallo, elizabet l. title: dengue arbovirus affecting temperate argentina province for more than a decade - date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: l u bp dengue disease is found in tropical and subtropical climates and within the last decade it has extended to temperate regions. santa fe, a temperate province in argentina, has experienced an increase in dengue cases and virus circulation in the last decade, with the recent outbreak being the largest since dengue transmission was first reported in the province in . the aim of this work is to perform a description of spatio-temporal fluctuations of dengue (denv) cases from to the present in santa fe province. the data presented in this work provide a detailed description of dengue virus transmission for santa fe province by department. this information is useful to assist in better understanding the impact of ongoing dengue emergence in temperate regions across the world. indeed, this work provides data useful for future studies including those investigating socio-ecological, climate, and environmental factors associated with dengue transmission, as well as those investigating other variables related to the biology and the ecology of vector-borne diseases. dengue virus (denv serotypes - ) is considered one of the most important emerging and reemerging arboviruses today responsible for dengue fever, dengue hemorrhagic fever, and dengue shock syndrome.
aedes aegypti mosquitoes are the main vectors for denv as well as for yellow fever, zika and chikungunya viruses. dengue disease is found in tropical and subtropical climates, and in the past decade it has extended to temperate regions , . vector-borne diseases are sensitive to environmental and climatic changes , , which could bring on changes in vector distribution and abundance as well as changes in disease incidence rates . in the last years, denv has undergone a rapid expansion into temperate regions, generating numerous epidemic events . denv was eradicated from argentina in the middle of the past century due in part to successful ae. aegypti control programs; however, during the first autochthonous transmission and subsequent outbreak was registered in subtropical northern argentina . after the reemergence, successive outbreaks appeared in the warmest months and were always closely related to outbreaks in neighboring countries . currently, most provinces of argentina ( of ) have reported autochthonous cases of dengue since its reemergence . aedes aegypti is widely distributed in the country, and therefore there is a risk of outbreaks if the virus circulates in those areas . santa fe province is in central-northeastern argentina, in the southern cone of south america (fig. a) and according to the köppen-geiger climate classification has a temperate climate with hot summers and no dry season. the province is one of the most populated and productive areas of the country. in fact, it features international road connections through the bioceanic corridor and the parana-paraguay waterway, which gives santa fe a privileged geostrategic location. this central bi-oceanic corridor connects with chile in the pacific ocean and uruguay in the atlantic ocean. santa fe also connects the southern provinces of argentina with those of the center and northeast (fig. b) .
indeed, santa fe is a place of passage for land cargo and passengers travelling to and from bolivia, paraguay and brazil, which are neighboring countries with endemic denv circulation . according to the argentina ministry of health (moh), santa fe province belongs to the central epidemiological region of the country together with córdoba and entre ríos provinces, and we will therefore refer to it as part of the central region, even though santa fe is climatically and geographically in central northeastern argentina, as described above. dengue cases were reported for the first time in the central region of argentina during the outbreak. since then, dengue cases have been reported each year, with the largest number to date occurring in , when more than % of the dengue cases in the whole country were reported in this region. moreover, santa fe province is facing the biggest dengue epidemic since dengue's re-introduction in the country, despite control efforts of the health ministry of santa fe (mos) and the moh. several protocols were implemented, including entomological surveillance, environmental sanitation, focal control and emergency actions such as chemical spraying around dwellings with reported dengue cases . the aim of the present work is to provide a detailed description of spatio-temporal fluctuations of dengue cases from to the present in the temperate santa fe province of argentina. we also present dengue case distribution and incidence by province jurisdiction. the database included in this paper is important for future studies of denv cases in the central temperate region of the country, and it is an important source of information for researchers investigating dengue emergence worldwide. in the past few years, the mos together with the center for climate variability and climate change studies (cevarcam, acronym in spanish), have been working in collaboration to develop studies for a better understanding of the relationships between vector-borne diseases and climate .
studies of the impacts of climate change, through meteorological and environmental variables as well as socio-economic variables affecting dengue incidence and transmission rates, could be developed with the data sets presented here. studies with mathematical models could use these data to investigate previous outbreaks and predict future outbreaks and dengue case occurrence. this study could be useful to stakeholders making decisions related to dengue prevention, control, and management at local, national, or even international levels. dengue epidemics were documented between january and may in santa fe province ( fig. ). the region is characterized by a homogeneous geomorphological conformation where the chaco-pampeana plain predominates. it consists of a mosaic of wet savannahs and grasslands, subtropical dry forests, gallery forests, shrublands, and a wide variety of wetlands (e.g., rivers, streams, marshes, swamps). the climate in the region is temperate with hot summers and no dry season, according to köppen-geiger's climate classification . the paraná river is the main waterway and constitutes the eastern limit, along with a complex system of islands, main channels, lagoons, and wetlands. the dynamics of the paraná river floodplain are strongly shaped by cycles of rises and falls in water levels . the salado river is another important waterway that crosses the center of the province from west to east, flowing into the paraná river. santa fe province is divided into departments and contains a total population of . million inhabitants . we have compiled and reviewed all available data on dengue cases including confirmed and probable cases, autochthonous and imported cases, denv serotypes, and provenance of imported cases (either from another country or another province of argentina) for the period between january and may . this study does not include suspected and unconfirmed cases.
the spatiotemporal fluctuation of denv cases was analyzed and the areas and periods with the highest incidence of the disease were identified. a time series of monthly incidence (number of cases per , inhabitants) was created to determine the progression of outbreaks during the whole study period january -may . additionally, a time series was created with the number of cases per epidemiological week (ew) between january and may , to describe the most recent and most important outbreak in the province to date. a dengue incidence map (number of cases per , inhabitants) was prepared to help identify the most affected departments. the database is publicly available online (via figshare ) as a set of four column separated files. the first contains the total number of cases and incidence per month and per year for the province of santa fe. the second contains the total number of cases and incidence discriminated by department. the third file contains the number of cases in the cities with a high number of denv cases and incidence in santa fe province in . the column headings of the files are as follows. year: the year of the date of the entry. table ). imported dengue cases originated mainly from tropical countries where dengue fever is endemic, as well as the endemic northern region of argentina, although there were also imported cases from temperate countries such as uruguay (table ) . although denv circulation typically occurs between january and may each season, ten autochthonous dengue cases were reported in santa fe between june and december: august (n = ), july (n = ), october (n = ), december (n = ), november (n = ) and june (n = ). figure shows the incidence of dengue by departments of the province of santa fe in the period january -may . dengue incidence in santa fe departments was clearly highest in the northeast area of the province, which borders chaco and corrientes provinces in the northeast of the country.
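the incidence figures used for the time series and the department map follow a simple rate calculation, which can be sketched in python as follows. this is an illustrative sketch only: the department names, case counts, and populations are hypothetical, and because the scaling factor (cases per how many inhabitants) is not recoverable from the text, it is left as a caller-supplied parameter `per` whose example value is an assumption.

```python
def incidence(cases, population, per):
    """Cases per `per` inhabitants (the scaling factor is supplied by the caller)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return cases * per / population


# Hypothetical departments: name -> (number of cases, population).
departments = {
    "dept_a": (120, 80_000),
    "dept_b": (15, 200_000),
}

PER = 10_000  # assumption for illustration; not taken from the source text

rates = {name: incidence(c, p, PER) for name, (c, p) in departments.items()}
most_affected = max(rates, key=rates.get)

for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.2f} cases per {PER} inhabitants")
print("most affected:", most_affected)
```

ranking departments by rate rather than by raw case counts is what allows sparsely populated departments with intense transmission to stand out on an incidence map.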
the predominant serotype that circulated among all outbreaks was denv- although all four denv serotypes were detected across outbreaks (table ) . at present, during the outbreak, . % of the cases were detected as denv- ( cases), . % as denv- ( cases) and . % as denv- ( cases) ( table ) . denv- was widely distributed in the province, although denv- was reported in rosario department and denv- was not reported during this outbreak (fig. ). between ew and ew , five deaths were reported ( . % of total cases in the province). table shows the most affected cities during the outbreak, where almost % of the cases were reported. all data presented within this work were requested from the moh through the access to public information mechanism (national law ), with the protection of personal data regulated by the national law . the data requested from the moh were checked against the reports sent by the mos through the national health surveillance system. the data presented herein show the increases of denv cases and denv transmission in the last decade in the temperate santa fe province, highlighting the outbreak as the most important outbreak to date. these data also highlight the importance of this area in dengue emergence because of its geographic location relative to other provinces and neighboring countries where denv has endemic circulation. in the last decade, the intensity of denv circulation has increased with expansion to temperate regions of the globe and with the size of outbreaks in endemic and emerging regions increasing in magnitude . this is evident in santa fe province, which experienced increases in argentina experienced its most important season of dengue to date in . % of the national territory ( of provinces were affected by the dengue outbreak). the dengue outbreak in santa fe province during was four times larger than the outbreak.
the department of general obligado, in the northeast region of the province, presented co-circulation of denv and . this area also connects with northern provinces of the country that border the neighboring countries of brazil and paraguay, where dengue has a high incidence, suggesting southern spread of dengue transmission resulting from importation via neighboring countries. in addition to increases in overall transmission, the presence of multiple serotypes co-circulating presents a risk for increased severity of dengue due to reactions among serotypes . areas of southern brazil reported cocirculation of denv , and until ew . in paraguay, cases of denv , and were identified . rosario department in the southeast of santa fe was the only one reporting denv circulation due to imported cases from travelers to mexico and brazil (fig. ) . the co-circulation with similar incidence of denv and (table ) in santa fe province increases the possibility of severe dengue cases . co-circulation of more than one serotype, and the resulting increase in dengue severity, is a major concern for santa fe province and for other temperate areas of the globe where dengue is actively emerging. several factors could explain the advance of denv, and therefore of dengue cases, into temperate areas: the increased presence of the ae. aegypti mosquito due to favorable environmental conditions for the mosquito's expansion , global travel, and environmental changes associated with increasing temperatures and changing precipitation patterns . the favorable environmental conditions could be attributed to a combination of increased urbanization, human activities such as deforestation, expansion of agricultural areas, expansion of urban areas, and the consequences of urban areas lacking essential services such as garbage depots, a sufficient health system, and comprehensive entomological surveillance .
furthermore, the outbreak is occurring in a unique context because of the covid- pandemic, during which people in argentina are under mandatory quarantine. it is possible that the epidemic has grown in part due to the increased time that people are spending in their homes, which potentially increases exposure to ae. aegypti mosquitoes . the argentina dengue control plan is based on integrated strategies for diminishing the vector population. therefore, research should focus on the spatio-temporal dynamics of denv transmission in order to improve entomological surveillance and monitoring. the data presented in this work provide a detailed description of denv transmission for santa fe province by department to highlight the recent and ongoing emergence of dengue in the province. this information, together with other works in temperate argentina , will be useful in better understanding the impact of dengue emergence and reemergence in other areas of the world. indeed, this work can be combined with other existing data sets to contribute to future studies including those aimed at investigating socio-ecological, climate, and environmental factors associated with dengue emergence, as well as those aimed at understanding the influence of other variables related to the biology and the ecology of vector-borne diseases. table . total incidence of confirmed and probable cases, number of cases confirmed by serotypes, confirmed denv serotypes and origin of imported cases. incidence is calculated as the number of cases per , inhabitants of santa fe province. table . cities with the highest number of dengue cases in the outbreak in the province of santa fe. incidence is calculated as the number of cases per , inhabitants.
year total incidence of confirmed and probable cases n° cases identified by serotypes arbovirosis de importancia en las regiones tropicales (centro de investigación y desarrollo profesional arbovirus emergence in temperate climates: the case of spatio-temporal dynamics of dengue outbreak in córdoba city geographical limits of the southeastern distribution of aedes aegypti (diptera, culicidae) in argentina. plos neglected tropical diseases ncbi updated world map of the köppen-geiger climate classification strategies and capacities for a competitive global insertion ministry of government and reform of the state of the province of santa fe plan nacional de prevención y control del dengue y la fiebre amarilla spatio-temporal analysis of leptospirosis incidence and its relationship with hydroclimatic indicators in northeastern argentina. sc. of the total env ecorregiones de la argentina población según censo nacional de población poblacion/censo-nacional-de-poblaciony-vivienda- /estadisticas-por-dpto.-y-pcia/poblacion/poblacion-segun-censo-nacional-de-poblacion dengue arbovirus affecting temperate argentina province for more than a decade actualización epidemiológica dengue boletín integrado de vigilancia. número secretaria de vigilancia en salud arboviruses: a family on the move climate change and viral emergence: evidence from aedes-borne arboviruses denv- (n = ) argentina provinces: formosa and misiones denv- (n = ) argentina provinces: formosa argentina provinces: misiones the data of dengue cases were granted by ministry of health and meteorological data by national total of confirmed and probable cases key: cord- -vnmc ii authors: araki, takuma; tanatani, kenta; kamimura, naofumi; otsuka, yuichiro; yamaguchi, muneyoshi; nakamura, masaya; masai, eiji title: sphingobium sp. syk- syringate o-demethylase gene is regulated by desx, unlike other vanillate and syringate catabolic genes regulated by desr date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: vnmc ii syringate and vanillate are the major metabolites of lignin biodegradation. in sphingobium sp. strain syk- , syringate is o-demethylated to gallate by consecutive reactions catalyzed by desa and ligm, and vanillate is o-demethylated to protocatechuate by a reaction catalyzed by ligm. the gallate ring is cleaved by desb, and protocatechuate is catabolized via the protocatechuate , -cleavage pathway. the transcription of desa, ligm, and desb is induced by syringate and vanillate, while that of ligm and desb is negatively regulated by the marr-type transcriptional regulator desr, which is not involved in desa regulation. here we clarified the regulatory system for desa transcription by analyzing the iclr-type transcriptional regulator desx, located downstream of desa. quantitative reverse transcription (rt)-pcr analyses of a desx mutant indicate that the transcription of desa is negatively regulated by desx. in contrast, desx is not involved in the regulation of ligm and desb. the ferulate catabolic genes (ferba) under the control of a marr-type transcriptional regulator ferc are located upstream of desa. rt-pcr analyses suggest that the ferb-fera-slg_ -desa gene cluster consists of the ferba operon and the slg_ -desa operon. promoter assays reveal that a syringate- and vanillate-inducible promoter is located upstream of slg_ . purified desx bound to this promoter region, which overlaps with an -bp-inverted repeat sequence that appears to be essential for the dna binding of desx. syringate and vanillate inhibited the dna binding of desx, indicating that these compounds are effector molecules of desx. importance syringate is a major degradation product in the microbial and chemical degradation of syringyl lignin. along with other low-molecular-weight aromatic compounds, syringate is produced by chemical lignin depolymerization.
converting this mixture into value-added chemicals using bacterial metabolism (i.e., biological funneling) is a promising option for lignin valorization. to construct an efficient microbial lignin conversion system, it is necessary to identify and characterize the genes involved in the uptake and catabolism of lignin-derived aromatic compounds and elucidate their transcriptional regulation. in this study, we found that the transcription of desa, encoding syringate o-demethylase in syk- , is regulated by an iclr-type of transcriptional regulator, desx. the findings of this study, combined with our previous results on desr (a marr transcriptional regulator that controls the transcription of ligm and desb), provide an overall picture of the transcriptional regulatory systems for syringate and vanillate catabolism in syk- . phenylcoumaran, and diarylpropane) and monoaryls (e.g., ferulate, vanillin, and syringaldehyde) as its sole carbon and energy source ( , ). these aromatic compounds, crescentus, and c. glutamicum atcc , the transcriptional regulation of vanab is negatively regulated by a gntr-type, gntr-type, and padr-like transcriptional regulator, respectively, all of which are called vanr ( , , ) . va was determined to be an effector molecule for the latter two systems ( , ) . the degradation of sa by bacteria other than syk- has recently been reported in novosphingobium aromaticivorans dsm ( ), microbacterium sp. strain rg ( ), and pseudomonas sp. strain ngc ( ). while the sa catabolic pathway genes have been identified in dsm and predicted in rg , the regulatory system of sa catabolism has not been studied in any bacterium. in terms of the transcriptional regulation involved in va and sa catabolism in syk- , we have reported that the pca , -cleavage pathway genes are positively regulated by ligr, a lysr-type transcriptional regulator that recognizes pca and ga as effectors ( fig. ) ( ). 
moreover, we have shown that a marr-type transcriptional regulator, desr, negatively regulates the transcription of ligm and desb, and va and sa are effectors that release the repression by desr (fig. ) ( ) . although the transcription of desa is also induced by va and sa, desr does not participate in the transcriptional regulation of desa; thus, its regulatory system remains unknown. in this study, we clarified the regulatory system of sa catabolism in syk- by identifying and characterizing an iclr-type transcriptional regulator, desx, which regulates desa transcription. downstream of desa, there is slg_ , which is similar to the iclr-type transcriptional regulator (ittr) (fig a; table s ). to clarify whether the gene product of ferc or slg_ is involved in the binding of the desap probe, we conducted emsas using cell extracts of a ferc mutant and an slg_ mutant. the ferc mutant was obtained in our previous study ( ), while the slg_ mutant was constructed through homologous recombination in this study (fig. s ). emsas using a cell extract of the ferc mutant grown in wx-semp show a band shift similar to that of the wild type; however, no such band shift occurs in emsa using a cell extract of the slg_ mutant grown in the same medium (fig. c) . these results strongly suggest that the band shift was due to the binding of the slg_ gene product to the desap region. to determine whether the disruption of ferc and slg_ affects sa and va catabolism in syk- , we measured the growth of the ferc mutant and the slg_ mutant on mm sa and va. the slg_ mutant grew on sa and va somewhat faster than the wild type, while the ferc mutant grew on sa and va as well as the wild type ( fig. d and e) . these results suggest that the slg_ gene product negatively regulates the transcription of desa by binding to the desap region. introduction of a plasmid carrying slg_ (pjb ) into the slg_ mutant and syk- caused a substantial delay in their growth on sa compared to the slg_ mutant and syk- harboring the vector (pjb ) (fig. s ) .
these results indicate that the disruption of slg_ caused the changed phenotype of the slg_ mutant. thus, we designated slg_ as desx. (fig. ) , desa is transcribed at a higher rate in vivo under va-inducing conditions compared to sa-inducing conditions (fig. a) . the reason for this discrepancy is still unclear; however, the in vivo results may not completely reflect the in vitro results, as the former are affected by other factors, such as the level of substrate uptake. the transcription start site of the slg_ -desa operon is located bp upstream from the initiation codon of slg_ , and a weak shine-dalgarno sequence is -to -bp upstream of the initiation codon (fig. s ) . according to the rbs calculator ( ), the translation initiation rate of slg_ from the putative mrna sequence of the slg_ -desa transcript is . , a rate that is markedly lower than to oma by a hydrolase whose gene has not yet been identified (fig. ) ( ) . recently, the methylesterase (desc) and the cis-trans isomerase (desd) genes were reported to be involved in the conversion of chmod to oma during sa catabolism in novosphingobium aromaticivorans dsm ( ). in the syk- genome, the slg_ and slg_ amino acid sequences are % and % similar to those of desc and desd, respectively. however, there is no slg_ ortholog in the dsm genome. it will be necessary to investigate the involvement of these genes in the conversion of chmod in syk- in the future. in n. aromaticivorans dsm , sa is converted to mga by the saro_ gene product (desa na ), whose amino acid sequence is % similar to that of syk- desa. the resulting mga is metabolized via chmod as described above ( ). in the dsm genome, a blast search reveals saro_ , encoding a product with % amino acid sequence identity with desx. because saro_ (desa na ) and saro_ (desx ortholog) are closely located (fig. s ) , desa na is probably regulated catabolism. this information is essential for creating engineered bacteria that can efficiently produce value-added chemicals from lignin.
bacterial strains, plasmids, culture conditions, primers, and chemicals. the bacterial strains and plasmids used in this study are listed in table , and pcr primers are listed in table . sphingobium sp. syk- and its mutants were grown at c with (table ), and q hot start high-fidelity dna polymerase (new england biolabs). pcr products were electrophoresed on a . % agarose gel. psda _f psda _r psda _f psda a_f psda b_f psda c_f psda _r psda _f psda _r psda _f psda _r p bdx_f p bdx_r desap _f desap _r desap _f desap _r desap _f desap _r desap _f desap _r desap _f desap _r desap _f desap _f desap _r desap _f desap _r gcgcagaaccttaccaacgt agccatgcagcacctgtca gccttcgccttcctcaacta caccggaacccactgctt gctctccgacacgatgatca acgtactgcttcgccttgttg tttcgagcattattcgcatttc tccgcaggcgaatattcct ccggtggaacgggaaga ccacgccacgttgttcac gacatgctgtggcagatgtg cgcatctgccgctcatac cagaaggtggactcgtcgt aggatatcgagcgtgcg tgacgtacgacaatgcggaa tgacgcctccatcattctcg atgatccgcgtcttctcgtc atgagtcactcgccttcca tcagcaccggcattcactt gcatcgatgagggcatccat ccgacgaggttgaactggtt tcataatccgcccagggac gacgtcaccatgggaagcttgacacgatctacctgcgca cctgcaggatatctggatcctcctcgtgggactggtcat carbon balance in terrestrial detritus lignins: natural polymers from oxidative coupling of -hydroxyphenyl-propanoids lignin biosynthesis lignin biosynthesis and structure a field of dreams: lignin valorization into chemicals, materials, fuels, and health-care products opportunities and challenges in biological lignin valorization genetic and biochemical investigations on bacterial catabolic pathways for lignin-derived aromatic compounds bacterial catabolism of lignin-derived aromatics: new findings in a recent decade: update on bacterial lignin catabolism engineered microbial production of -pyrone- , -dicarboxylic acid from lignin residues for use as an industrial platform chemical glucose-free cis,cis-muconic acid production via new metabolic designs corresponding to the heterogeneity of lignin development of the production of -pyrone- 
, -dicarboxylic acid from lignin extracts, which are industrially formed as by-products, as raw materials the protocatechuate , -cleavage pathway: overview and new findings genetic and biochemical characterization of a -pyrone- , -dicarboxylic acid hydrolase involved in the protocatechuate , -cleavage pathway of sphingomonas paucimobilis syk- identification of three alcohol dehydrogenase genes involved in the stereospecific catabolism of arylglycerol-β-aryl ether by sphingobium sp. strain syk- ddvk, a novel major facilitator superfamily transporter essential for , '-dehydrodivanillate uptake by sphingobium sp a rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding cloning and expression of pseudomonas paucimobilis syk- genes involved in the degradation of vanillate and protocatechuate in p. putida use of bacteriophage t rna polymerase to direct selective high-level expression of cloned genes improved m phage cloning vectors and host strains: nucleotide sequences of the m mp and puc vectors zap: a bacteriophage λ expression vector with in vivo excision properties small mobilizable multi-purpose cloning vectors derived from the escherichia coli plasmids pk and pk : selection of defined deletions in the chromosome of corynebacterium glutamicum improved broad-host-range rk vectors useful for high and low regulated gene expression levels in gram-negative bacteria d -d . enzyme genes involved in the conversion of an arylglycerol-β-aryl ether metabolite and their use in generating a metabolic pathway for lignin valorization and the regulation of the pca , -cleavage genes by ligr ( ) are highlighted in blue background and black bold, respectively. 
the transcriptional regulation of desa by desx. transcriptional regulators: desx, iclr-type regulator. abbreviations: va, vanillate; pca, protocatechuate; chms, -carboxy- -hydroxymuconate- -semialdehyde; pdc, -pyrone- , -dicarboxylate; kch, -keto- -carboxy- -hexenedioate; cha, -carboxy- -hydroxy- -oxoadipate; ga, gallate; chmod, -carboxy- -hydroxy- -methoxy- -oxohexa- , -dienoate. black bars under the map show the dna fragments used for emsa (desap desap probes). (b) emsas of syk- cell extracts using the desap desap probes. emsas of Δferc and Δ cell extracts using the desap probe. the desap probe ( pm) was incubated in the presence (+) and absence (−) of the extracts ( . µg protein/µl) of Δferc and Δ cells grown in wx-semp. cells of syk- (gray), Δferc (magenta), and Δ (cyan) were incubated in wx- mm sa (d) or wx- mm va (e), and od was periodically monitored. total rnas were isolated from the cells of syk- and a desx mutant (Δdesx) grown in wx-semp, wx- mm sa, and wx- mm va. the relative mrna amounts of desa (a), ligm (b), desb (c), ferb (d), slg_ (e), and desx (f; measured only in the wild type) indicate the fold increases relative to the amount of mrna in syk- cells grown in wx-semp (level of . ).
values for each amount of mrna were normalized to the level of s rrna.

acggtcatcgcagatcag acctgcaggcatgcaagctttccatcattctcgacggcg atgtttttcctcctaagctttcagtccaccagcatcagg acctgcaggcatgcaagcttgcgcatccaggaactcgat acctgcaggcatgcaagcttgcatgacctttcaattgtgcg acctgcaggcatgcaagcttcgtatatacgaagaacatcgg acctgcaggcatgcaagcttcggatttccatgagtcactcg atgtttttcctcctaagcttttccccatagcccgcaa acctgcaggcatgcaagcttcctgatctgcgatgaccgt atgtttttcctcctaagcttcgcatctgccgctcatac acctgcaggcatgcaagcttgacatgctgtggcagatgtg atgtttttcctcctaagcttttccggcattgtccagca tatcgaaggtcgtcatatgatccagaaggtggactc ctttgttagcagccggatcctcagcggtcgccaag ctgcaggatgtgcgcc gtgaatgccagccccaaa gcatgacctttcaattgtgcg aagtgaatgccggtgctgat agatcattcgccgcgca atggcttcatgctgcacc accggcgccgatgatgc tggttgaagagcacggcgg aactggcgcaacgagca ctgcagctggaagcgatagt atgagtcactcgccttcca tttccctctgcacgacgt gtgcccagatacggtcatc gcgcatccaggaactcgat tttcatctactcgaagaccgg

key: cord- -km fnc authors: kinaneh, safa; knany, yara; khoury, emad; ismael-badarneh, reem; hammoud, shadi; berger, gidon; abassi, zaid; azzam, zaher s. title: identification, localization and expression of nhe isoforms in the alveolar epithelial cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: km fnc na+/h+ exchangers (nhes), encoded by solute carrier a (slc a) genes in humans, are ubiquitous integral membrane ion transporters that mediate the electroneutral exchange of h+ with na+ or k+. nhes, found in the kidney and intestine, play a major role in the process of fluid reabsorption together with the na+,k+-atpase pump and na+ channels. nevertheless, the expression pattern of nhe in the lung and its role in alveolar fluid homeostasis have not been addressed. therefore, we aimed to examine the expression of specific nhe isoforms in alveolar epithelial cells (aecs), and assess their role in congestive heart failure.
three nhe isoforms were identified in aec and a cell line, at the level of protein and mrna; nhe , nhe and mainly nhe , the latter was shown to be localized in the apical membrane of aec. treating a cells with angiotensin (ang) ii for and hours displayed a significant reduction in nhe protein abundance and to lesser extent at hours; however, there was no effect at hours. moreover, a treated overnight with ang ii downregulated nhe protein abundance. chf rats held for week had increased abundance of nhe compared to sham operated rats. however, lower abundance of nhe was observed in chf rats held for weeks. herein we show, for the first time, the expression of a novel nhe isoform by aec, namely nhe . besides being negatively affected by ang ii, nhe protein levels were distinctly affected in chf rats, which may be related to chf severity. introduction: alveolar fluid clearance has been shown to be an important mechanism in keeping the airspaces free of edema in both cardiogenic and non-cardiogenic states [ , ] . there is a large body of evidence that the removal of alveolar fluid is attained by the alveolar epithelial active sodium transport; by which sodium passively enters the alveolar epithelial cells (aec) via apical amiloride-sensitive na + channel (enac) or other na + channels and is pumped out of the cells by basolateral na + , k + -atpase, an energy consuming process. following sodium transport, water is extruded from the alveolar airspaces [ ] [ ] [ ] . it has been shown that the survival of acute lung injury patients, directly correlated with the rate of alveolar fluid clearance [ ] . the sodium hydrogen exchanger (nhe) family includes several isoforms, which have different characteristics, including cell-compartment localization, plasma membrane distribution and organ-dependent function [ , ] . the evidence regarding nhe expression and role in the lungs, particularly in alveolar epithelial cells, is scarce. 
according to the evidence on nhes in the kidney and intestine, water reabsorption is achieved by the function of the epithelial na+ channel (enac) and the na+,k+-atpase pump along with na+/h+ exchangers (nhes) [ ]; thus, it is conceivable that nhe may contribute to this transport in the lung, specifically in aec. therefore, the major objective of this work is to address whether nhe isoforms are expressed in alveolar epithelial cells and to evaluate their role in alveolar epithelial active sodium transport in healthy and congestive heart failure (chf) rats. rna was converted to cdna using the maxima first strand cdna kit. the primers used for rt-pcr are listed in table . cdna template ( - ng), data are presented as mean ± sem; n is the number of animals in each study group. one-way analysis of variance was used when multiple comparisons were made, followed by a multiple comparison test (tukey) when the f statistic indicated significance. to assess the differences between two study groups, we used the unpaired t-test. results were considered significant when p < . . the kolmogorov-smirnov test was used to analyze the normality of the groups. levene's test was used for comparing the equality of variances. the student t-test for independent groups was used to compare between two study groups. a two-tailed p value of . or less was considered to be statistically significant. a cells our major focus was on nhe isoforms that are primarily localized to cell membranes and thus can potentially contribute to the alveolar active sodium transport, and eventually to alveolar fluid clearance. among these are nhe - and nhe . nhe , however, was not included in our experiments as it is reported to be exclusively expressed in the brain. by using rt-pcr and primers targeted to each isoform, the expression of nhe and nhe was confirmed in isolated aec (figure a).
moreover, nhe was found to be expressed only following incubation of the cells for hours (figure b); this observation might be attributed to the differentiation of aec type ii into type i. similarly, these exchangers, namely nhe , nhe and nhe , are expressed in the a cell line, which is known to have characteristic features of aecii (fig. c). surprisingly, nhe expression was not demonstrated in either cell type, whereas the expression of an unexpected novel isoform, namely nhe , was established. based on previous reports, nhe might be localized to intracellular compartments or to the plasma membrane. immunofluorescence staining to nhe in aec and a based on the above-mentioned finding of nhe abundance in aec, and its plasma membrane localization, we assume that nhe has an important role in alveolar fluid clearance; therefore, we were interested in exploring whether its expression is modified in a model of congestive heart failure (chf) as compared to sham rats. we performed western blotting to evaluate nhe protein expression in lungs, one (chf- w) and four (chf- w) weeks following the acf procedure, as compared to sham-operated rats. chf- w rats exhibited an increased nhe protein abundance compared to sham- w rats, yet this elevation was not significant, possibly due to high variability of edema severity among chf rats (figures a and c). however, nhe immunoreactive levels were significantly decreased in chf- w rats as compared to sham- w rats (figures b and d). this distinct expression suggests that nhe expression may be related to the severity and progression of chf. there was no evidence for nhe expression in isolated aec, an observation that was supported by rt-pcr experiments (fig. a). therefore, other nhe isoforms known figure b). conceivably, this observation may be related to the assumption that nhe is localized in aeci. the aecii-like cell line, a , was used to validate the expression of nhe isoforms (figures a-c).
western blotting of isolated basolateral membranes of aec was negative for nhe expression (fig. d). these findings support the assumption that nhe is localized on the apical side of aec. unfortunately, there is no known selective inhibitor for nhe, since it was discovered only recently and has not been studied much. therefore, we decided to bypass this obstacle and study the response of this isoform to hormones and factors known to impair the ability of the lungs to clear edema. recently, our group has shown that ang ii impaired afc, partly by down-regulating na+,k+-atpase protein levels [ ]. assuming that nhe participates in vectorial sodium transport, we hypothesized that ang ii may affect nhe protein expression. therefore, we conducted an experiment in which a cells were treated with ang ii ( - m) and examined nhe protein changes over different periods of time: , , and hours. as shown in fig. , ang ii decreased the levels of nhe mature protein at , , and hours, while there was no effect following hours of furthermore, we investigated nhe expression in chf rats, using the acf rat model. rats were sacrificed week (chf- w) or weeks (chf- w) following the acf procedure. nhe was differentially expressed in chf rat lungs: in chf- w, nhe levels were increased, while in chf- w, nhe levels were decreased, as compared to sham rats (fig. ). this distinct expression pattern of nhe might suggest a protective role of nhe in chf- w that is driven by the need to clear excessive lung fluids, while the decreased levels in chf- w might be a result of the severe chf condition, where even protective pathways are badly damaged. the main limitation of this study was our inability to directly address the effect of nhe on afc due to the lack of specific inhibitors or nhe knock-out mouse models. latter it is localized to the apical membrane. by exchanging extracellular na+ with intracellular h+, nhe contributes to na+ vectorial transport along with the epithelial sodium channel (enac).
the entered sodium then is pumped out of alveolar cells by na + , k + atpase pump. sodium transport from alveolar space to the interstitium is accompanied by water movement and so edema clearance. resolution of pulmonary edema thirty years of progress patterns of alveolar fluid clearance in heart failure alveolar epithelium and na,k-atpase in acute lung injury invited review: lung edema clearance: role of k(+)-atpase lung edema clearance: years of progress invited review: alveolar edema fluid clearance in the injured lung alveolar fluid clearance is impaired in the majority of patients with acute lung injury and the acute respiratory distress syndrome the na+/h+ exchanger: an update on structure, regulation and cardiac physiology regulation of myocardial na+/h+ exchanger activity structural and functional analysis of the na+/h+ exchanger natriuretic peptides system in the pulmonary tissue of rats with heart failure: potential involvement in lung edema and inflammation isolation and culture of alveolar type ii cells the role of angiotensin ii and cyclic amp in alveolar active sodium transport regulation of ion channel structure and function by reactive oxygen-nitrogen species lung epithelial fluid transport and the resolution of pulmonary edema role of na+/h+ exchanger nhe in nephron function: micropuncture studies with s , an inhibitor of nhe mechanisms of pulmonary edema clearance evolutionary origins of eukaryotic sodium/proton exchangers slc /nhe gene family, a plasma membrane and organellar family of na +/h+ exchangers ontogeny of nhe in the rat proximal tubule the neurohormonal hypothesis: a theory to explain the mechanism of disease progression in heart failure a new coronavirus associated with human respiratory disease in china a call for rational intensive care in the era of covid- key: cord- -vpu w wh authors: le, trang t.; moore, jason h. title: treeheatr: an r package for interpretable decision tree visualizations date: - - journal: biorxiv doi: . / . . 
. sha: doc_id: cord_uid: vpu w wh summary treeheatr is an r package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. the integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. this visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. availability the treeheatr package is freely available under the permissive mit license at https://trang .github.io/treeheatr and https://cran.r-project.org/package=treeheatr. it comes with a detailed vignette that is automatically built with github actions continuous integration. contact ttle@pennmedicine.upenn.edu decision tree models comprise a set of machine learning algorithms widely used for predicting an outcome from a set of predictors or features. for specific problems, a single decision tree can provide predictions at desirable accuracy while remaining easy to understand and interpret (yan et al., ) . these models are also important building blocks of more complex tree-based structures such as random forests and gradient boosted trees. the simplicity of decision tree models allows for clear visualizations that can be incorporated with rich additional information such as the feature space. however, existing software frequently treats all nodes in a decision tree similarly, leaving limited options for improving information presentation at the leaf nodes. 
specifically, the r library rpart.plot displays at each node its characteristics including the number of observations falling in that node, the proportion of those observations in each class, and the node's majority vote. despite being potentially helpful, these statistics may not immediately convey important information about the tree such as its overall performance. the vistree() function from the r package visnetwork draws trees that are aesthetically pleasing but lack general information about the data and are difficult to interpret. the state-of-the-art python package dtreeviz produces decision trees with detailed histograms at inner nodes but still draws pie charts of the different classes at leaf nodes. ggparty is a flexible r package that allows the user to have full control of the representation of each node. however, this library fixes the leaf node widths, which limits its ability to show more collective visualizations. we have developed the treeheatr package to incorporate the functionality of ggparty but also utilize the leaf node space to display the data as a heatmap, a popular visualization that uncovers groups of samples and features in a dataset (wilkinson and friendly, ; galili et al., ). a heatmap also displays a useful general view of the dataset, e.g., how large it is or whether it contains any outliers. integrated with a decision tree, the samples in each leaf node are ordered based on an efficient seriation method. after simple installation, the user can apply treeheatr on their classification or regression tree with a single function:
heat_tree(x, target_lab = 'outcome')
this single line of code will produce a decision tree-heatmap as a ggplot object that can be viewed in rstudio's viewer pane, saved to a graphic file, or embedded in an rmarkdown document. this example assumes a classification problem, but one can also apply treeheatr on a regression problem by setting task = 'regression'. this article is organized as follows.
in section , we present an example treeheatr application by employing its functions on a real-world clinical dataset from a study of covid- patient outcome in wuhan, china (yan et al., ). in section , we describe in detail the important functions and corresponding arguments in treeheatr. we demonstrate the flexibility the user has in tweaking these arguments to enhance understanding of the tree-based models applied on their dataset. finally, we discuss general guidelines for creating effective decision tree-heatmap visualization. this example visualizes the conditional inference tree model built to predict whether or not a patient survived from covid- in wuhan, china (yan et al., ). the dataset contains blood samples of patients admitted to tongji hospital between january and february , . three features were selected based on their importance score from a multi-tree xgboost model, including lactic dehydrogenase (ldh), lymphocyte levels and high-sensitivity c-reactive protein (hs_crp). detailed characteristics of the samples can be found in the original publication (yan et al., ). the following lines of code compute and visualize the conditional decision tree along with the heatmap containing features that are important for constructing this model (fig. ): the heat_tree() function takes a party or partynode object representing the decision tree and other optional arguments such as the outcome label mapping. if instead of a tree object, x is a data.frame representing a dataset, heat_tree() automatically computes a conditional tree for visualization, given that an argument specifying the column name associated with the phenotype/outcome, target_lab, is provided. in the decision tree, the leaf nodes are labeled based on their majority votes and colored to correlate with the true outcome. on the right split of hs_crp (hs_crp ≤ . and hs_crp > . 
), although individuals of both branches are all predicted to survive by majority voting, the leaf nodes have different purity, indicating different confidence levels the model has in classifying samples in the two nodes. these seemingly non-beneficial splits present an opportunity to teach machine learning novices the different measures of node impurity such as the gini index or cross-entropy (hastie et al., ). in the heatmap, each (very thin) column is a sample, and each row represents a feature or the outcome. for a specific feature, the color shows the relative value of a sample compared to the rest of the group on that feature; higher values are associated with lighter colors. within the heatmap, similar color patterns between ldh and hs_crp suggest a positive correlation between these two features, which is expected because they are both systemic inflammation markers. together, the tree and heatmap give us an approximation of the proportion of samples per leaf and the model's confidence in its classification of samples in each leaf. three main blocks of different lymphocyte levels in the heatmap illustrate its importance as a determining factor in predicting patient outcome. when this value is below . but larger than . (observations with dark green lymphocyte value), hs_crp helps further distinguish the group that survived from the other. here, if we focus on the hs_crp > . branch, we notice that the corresponding hs_crp colors range from light green to yellow (> . ), illustrating that the individuals in this branch have higher hs_crp than the median of the group. this connection is immediate with the two components visualized together but would not have been possible with the tree model alone. in summary, the tree and heatmap integration provides a comprehensive view of the data along with key characteristics of the decision tree. 
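The difference in leaf purity described above can be made concrete with the two impurity measures named in the text. The following is a minimal Python sketch for illustration only; it is not part of treeheatr (an R package), the function names are hypothetical, and the class counts of the two leaves are invented rather than taken from the clinical dataset. Cross-entropy is computed in bits (base-2 logarithm), which is one common convention.

```python
import math


def gini_index(class_counts):
    """Gini index of a node: 1 - sum_k p_k^2, with p_k the class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def cross_entropy(class_counts):
    """Cross-entropy of a node in bits: -sum_k p_k * log2(p_k), skipping empty classes."""
    total = sum(class_counts)
    proportions = [count / total for count in class_counts if count > 0]
    return -sum(p * math.log2(p) for p in proportions)


# Hypothetical leaves as [survived, deceased] counts. Both leaves predict
# "survived" by majority vote, but the second one is far less pure.
leaf_counts = {
    "purer leaf": [48, 2],
    "less pure leaf": [30, 20],
}

for name, counts in leaf_counts.items():
    print(f"{name}: gini = {gini_index(counts):.3f}, "
          f"cross-entropy = {cross_entropy(counts):.3f} bits")
```

Both measures are zero for a perfectly pure leaf and grow as the class proportions approach an even split, which is why two leaves with the same majority vote can still reflect different levels of confidence.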
when the first argument x is a data.frame object representing the dataset instead of the decision tree, treeheatr automatically computes a conditional tree with default parameters for visualization. conditional decision trees (hothorn et al., ) are nonparametric models performing recursive binary partitioning with a well-defined theoretical background. conditional trees support unbiased selection among covariates and produce competitive prediction accuracy for many problems (hothorn et al., ). the default parameter setting often results in smaller trees that are less prone to overfitting. treeheatr utilizes the partykit r package to fit the conditional tree and the ggparty r package to compute its edge and node information. while ggparty assumes fixed leaf node widths, treeheatr employs a flexible node layout to accommodate the different number of samples shown in the heatmap at each leaf node. this new node layout structure supports various leaf node widths, prevents crossings of different tree branches, and generalizes as the trees grow in size. this new layout weights the x-coordinate of the parent node according to the levels of the child nodes in order to avoid branch crossing. this relative weight can be adjusted with the lev_fac parameter in heat_tree(). lev_fac = sets the parent node's x-coordinate perfectly in the middle of those of its child nodes. the default lev_fac = . seems to provide optimal node layout independent of the tree size. the user can define a customized layout for a specific set of nodes and combine that layout with the automatic layout for the remaining nodes. by default, heatmap samples (columns) are automatically reordered within each leaf node by a seriation method (hahsler et al., ) applied to all features and the outcome label, unless clust_target = false.
treeheatr uses the daisy() function in the cluster r package with the gower metric (gower, ) to compute the dissimilarity matrix of a dataset that may have both continuous and nominal categorical feature types. heatmap features (rows) are ordered in a similar manner. we note that, while there is no definitive guideline for proper weighting of features of different types, the goal of the seriation step is to reduce the amount of stochasticity in the heatmap and not to make precise inference about each grouping. in a visualization, it is difficult to strike a balance between enhancing understanding and overloading the viewer with information. we believe showing a heatmap at the leaf node space provides additional information about the data in an elegant way that is not overwhelming and may even simplify the model's interpretation. we leave it to the user to decide what type of information to display at the inner nodes via different geom objects (e.g., geom_node_plot, geom_edge_label) in the ggparty package. for example, one may choose to show at these decision nodes the distribution of the features or their corresponding bonferroni-adjusted p values computed in the conditional tree algorithm (hothorn et al., ). striving for simplicity, treeheatr utilizes direct labeling to avoid unnecessary legends. for example, in classification, the leaf node labels have colors corresponding to the different classes, e.g., purple for deceased and green for survived in the covid- dataset (fig. ). as for feature values, by default, the color scale ranges from to and indicates the relative value of a sample compared to the rest of the group on each feature. linking the color values of a particular feature to the corresponding edge labels can reveal additional information that is not available with the decision tree alone. in addition to the main dataset, the user can supply to heat_tree() a validation dataset via the data_test argument.
as a result, heat_tree() will train the conditional tree on the original training dataset, draw the decision tree-heatmap on the testing dataset, and, if desired, print next to the tree its performance on the test set according to specified metrics (e.g., balanced accuracy for classification or root mean squared error for regression problem). the integration of heatmap nicely complements the current techniques of visualizing decision trees. node purity, a metric measuring the tree's performance, can be visualized from the distribution of true outcome labels at each leaf node in the first row. comparing these values with the leaf node label gives a visual estimate of how accurate the tree predictions are. further, without explicitly choosing two features to show in a -d scatter plot, we can infer correlation structures among features in the heatmap. the additional seriation may also reveal sub-structures within a leaf node. in this paper, we presented a new type of integrated visualization of decision trees and heatmaps, which provides a comprehensive data overview as well as model interpretation. we demonstrated that this integration uncovers meaningful patterns among the predictive features and highlights the important elements of decision trees including feature splits and several leaf node characteristics such as prediction value, impurity and number of leaf samples. its detailed vignette makes treeheatr a useful teaching tool to enhance students' understanding of this fundamental model before diving into more complex tree-based machine learning methods. treeheatr is scalable to large datasets. for example, heat_tree() runtime on the waveform dataset with observations and features was approximately seconds on a machine with a . ghz intel core i processor and gb of ram. however, as with other visualization tools, the tree's interpretation becomes more difficult as the feature space expands. 
thus, for high-dimensional datasets, it is potentially beneficial to perform feature selection to reduce the number of features or random sampling to reduce the number of observations prior to plotting the tree. moreover, when the single tree does not perform well and the average node purity is low, it can be challenging to interpret the heatmap because a clear signal cannot emerge if the features have low predictability. future work on treeheatr includes enhancements such as support for left-to-right orientation and highlighting the tree branches that point to a specific sample. we will also investigate other data preprocessing and seriation options that might result in more robust models and informative visualizations.

heatmaply: an r package for creating interactive cluster heatmaps for online publishing
a general coefficient of similarity and some of its properties
getting things in order: an introduction to the r package seriation
the elements of statistical learning: data mining, inference, and prediction nd ed
unbiased recursive partitioning: a conditional inference framework
partykit: a modular toolkit for recursive partytioning in r
ggplot : elegant graphics for data analysis
the history of the cluster heat map
an interpretable mortality prediction model for covid- patients

the treeheatr package was made possible by leveraging integral r packages including ggplot (wickham, ), partykit (hothorn and zeileis, ), ggparty, heatmaply (galili et al., ) and many others. we would also like to thank daniel himmelstein for his helpful comments on the package's licensing and continuous integration configuration. finally, we thank two anonymous reviewers whose feedback helped improve the package and clarify this manuscript. this work has been supported by the national institutes of health grant nos. lm and ai .
key: cord- -g ti jj authors: schaworonkow, natalie; triesch, jochen; ziemann, ulf; zrenner, christoph title: eeg-triggered tms reveals stronger brain state-dependent modulation of motor evoked potentials at weaker stimulation intensities date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: g ti jj
background: corticospinal excitability depends on the current brain state. the recent development of real-time eeg-triggered transcranial magnetic stimulation (eeg-tms) allows studying this relationship in a causal fashion. specifically, it has been shown that corticospinal excitability is higher during the scalp surface negative eeg peak compared to the positive peak of µ-oscillations in sensorimotor cortex, as indexed by larger motor evoked potentials (meps) for fixed stimulation intensity.
objective: we further characterize the effect of µ-rhythm phase on the mep input-output (io) curve by measuring the degree of excitability modulation across a range of stimulation intensities. we furthermore seek to optimize stimulation parameters to enable discrimination of functionally relevant eeg-defined brain states.
methods: a real-time eeg-tms system was used to trigger meps during instantaneous brain-states corresponding to µ-rhythm surface positive and negative peaks with five different stimulation intensities covering an individually calibrated mep io curve in healthy participants.
results: mep amplitude is modulated by µ-phase across a wide range of stimulation intensities, with larger meps at the surface negative peak. the largest relative mep-modulation was observed for weak intensities, the largest absolute mep-modulation for intermediate intensities. these results indicate a leftward shift of the mep io curve during the µ-rhythm negative peak.
conclusion: the choice of stimulation intensity influences the observed degree of corticospinal excitability modulation by µ-phase.
lower stimulation intensities enable more efficient differentiation of eeg µ-phase-defined brain states.

the brain is ever active, with a rich dynamic structure of ongoing activity, even in the absence of task-related behavior. one salient feature of neurophysiological recordings is the presence of pronounced oscillatory rhythms, but the functional implications of these oscillations are not yet clear [ , ]. a characterization of relevant brain states is challenging: what part of the signal is essential and what part is incidental? additionally, how can we determine that a putative state-signature is functionally relevant? one promising approach uses brain-state triggered brain-stimulation to assess whether different eeg-derived state-signatures at the time of stimulation lead to different evoked potentials. based on the hypothesis that oscillations organize cortical responses [ ] [ ] [ ], the goal is to understand how different activity states lead to different functional consequences. in addition to providing a deeper understanding of the [...] consequences of different brain states to be probed in a causal manner and increases statistical power by preferentially targeting specific oscillatory phases. in the context of the motor system, a recent study [ ] demonstrated a dependence of corticospinal excitability and plasticity on the phase of the cortical µ-rhythm using a real-time triggered eeg-tms system. larger mep amplitudes were elicited by stimulation at the time of the µ-rhythm surface negative peak (n) compared to the µ-rhythm positive peak (p). in that study, a fixed stimulation intensity (eliciting meps of on average mv peak-to-peak amplitude or using a fixed stimulus intensity of % of mep threshold) was used to examine the effects of ongoing brain activity on corticospinal excitability.
the present study is motivated by the belief that the identification and characterization of functionally relevant eeg-defined large-scale brain-states is of critical importance for the development of more stable and effective personalized eeg-modulated therapeutic brain-stimulation protocols. the goal is to investigate the conditions under which functionally differentiable brain-states can be optimally identified in eeg-triggered tms, specifically with regard to stimulus intensity. our recent computational modelling work suggests a larger relative excitability modulation by phase for lower stimulation intensities [ ] . here, we experimentally addressed the question of which stimulation parameters are optimal for the differentiation of µ-rhythm derived brain states. we investigated how µ-phase-modulation of corticospinal excitability changes as a function of stimulation intensity. using a real-time eeg-tms set-up, pulses of five different stimulation intensities were triggered at two different oscillatory phase states (positive and negative peak) of the ongoing sensorimotor µ-rhythm, while meps were obtained to measure corticospinal excitability in each phase and intensity condition. a combined eeg-tms set-up was used to trigger stimulation pulses according to the instantaneous oscillatory phase of the recorded µ-rhythm. scalp eeg was recorded from a -channel tms compatible ag/agcl sintered ring electrode cap (easycap gmbh, germany) in the international - system arrangement. scalp electrode preparation consisted of light skin abrasion followed by filling with conductive gel (electrode cream, ge medical systems, usa) until an impedance of < kΩ was reached. a -bit biosignal amplifier was used for combined -channel eeg and -channel emg recordings (neurone tesla with digital out option, bittium biosignals ltd., finland), data was acquired in dc mode with a sample rate of khz at the head-stage and down-sampled online to a sample rate of khz. 
emg was recorded from relaxed right abductor pollicis brevis (apb) and first dorsal interosseous (fdi) muscle with bipolar adhesive hydrogel electrodes (kendall, covidien) in a belly-tendon montage. a passively cooled tms double coil (pmd -pcool, mm winding diameter, mag & more gmbh, germany) was used together with a magnetic stimulator (research , mag & more gmbh, germany) configured to deliver biphasic single cosine cycle pulses with µs period such that the second phase of the biphasic pulse induced an electrical field from lateral-posterior to medial-anterior, i.e., orthogonal to the central sulcus. each tms pulse was individually triggered through an external trigger input from the real-time system. stimulation intensity was set programmatically using an analog control interface between - v and corresponding to - % of maximum stimulator output through an analog output port interface (uei pd -mf- - / l, united electronic instruments, usa) from the real-time system to allow for randomized ordering of intensity conditions. stimulation was applied to the hand representation of left primary motor cortex (m ). the motor hot spot was identified as the coil position and orientation resulting consistently in maximum mep amplitudes [ ] . the target muscle was the muscle which responded to the lowest stimulator intensity and was then subsequently used to determine resting motor threshold (rmt) as the lowest intensity that elicited meps with a peak-to-peak amplitude of at least µv in out of trials [ ] . a neuronavigation system (localite gmbh, sankt augustin, germany) was used to mark the coil position over the motor hot spot to monitor coil stability over time. the real-time processing system used in this experiment is described in detail in zrenner et al. [ ] . briefly, an algorithm implemented in simulink real-time (mathworks ltd, usa, r a) was used for real-time data acquisition, data processing and as the tms stimulator control system. 
the algorithm was executed on an xpc target processor (dfi-acp cl -crm mainboard), processing online eeg data streamed through a real-time ethernet interface from the digital out interface of the eeg main unit in data packets at a rate of khz. the eeg signal used for real-time triggering consisted of eeg channels overlying left sensorimotor cortex (c , cp , cp , fc , fc ), which were combined in a c -centered laplacian montage [ ]. data was down-sampled to khz by averaging and a sliding window of data of ms width was used to compute estimates of instantaneous phase. the signal window was bandpass filtered (finite impulse response filter with order and pass band - hz), the last ms were discarded because of edge artefacts, and the signal was then forward predicted by an autoregressive model (yule-walker, order ) for ms. the analytic signal was computed by fast fourier transform-based hilbert transform, which was used to determine the instantaneous phase. in addition, the power spectrum was calculated from a sliding window of [...] spectral power in the - hz frequency band.

[figure caption, panels (c) and (d): (c) eeg µ-phase-triggered stimulation is performed according to the instantaneous phase from the laplace-filtered c -signal. two example single trials are shown for the two types of trigger conditions, surface positive µ-peak (p) and surface negative µ-peak (n). (d) the corresponding single-trial emg signals recorded for the two trigger conditions, with meps in the interval - ms after stimulation.]

a digital output signal was generated to trigger the magnetic stimulator when three conditions were met: ( ) the interstimulus interval (isi) to the preceding pulse is larger than . s, ( ) the temporal evolution of the phase estimate crosses the target phase, and ( ) a predefined µ-power threshold is met. the power-threshold was adjusted on an individual participant basis at the beginning of the experiment such that a median isi of seconds resulted.
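the three trigger conditions above can be sketched in a few lines of python. this is only an illustrative sketch, not the simulink real-time implementation used in the study: the function names, the cosine phase convention (phase 0 at the surface positive peak, π at the negative peak) and all numeric values in the usage note are assumptions introduced here.

```python
import math


def wrap_phase(phi):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def crosses_target(prev_phase, curr_phase, target_phase):
    """True if the phase estimate passed the target between two samples."""
    d_prev = wrap_phase(prev_phase - target_phase)
    d_curr = wrap_phase(curr_phase - target_phase)
    # A forward crossing moves from just before the target (negative offset)
    # to at or just after it, without jumping by half a cycle or more.
    return d_prev < 0.0 <= d_curr and (d_curr - d_prev) < math.pi


def should_trigger(now, last_pulse, prev_phase, curr_phase, mu_power,
                   target_phase, min_isi, power_threshold):
    """Combine the three conditions: minimum ISI, phase crossing, mu-power."""
    isi_ok = (now - last_pulse) > min_isi
    phase_ok = crosses_target(prev_phase, curr_phase, target_phase)
    power_ok = mu_power >= power_threshold
    return isi_ok and phase_ok and power_ok
```

for example, `should_trigger(10.0, 7.0, -0.1, 0.05, 1.5, target_phase=0.0, min_isi=2.0, power_threshold=1.0)` evaluates to true under these hypothetical settings, whereas the same call with `last_pulse=9.0` does not, because the minimum isi has not yet elapsed.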
in the case of random-phase stimulation, only the minimum isi and µ-power were considered as conditions for triggering the magnetic pulse, and a random delay between and ms was imposed. the experimental session consisted of three types of blocks: ( ) resting state eeg recordings were made, with eyes open and closed (five minutes eyes open, with participants instructed to fixate a cross meter in front of them, followed by one minute eyes closed). ( ) an io curve was obtained (fig. a). eight intensities ( % to % rmt, in steps of %) were tested in randomized order, with pulses per condition, applied with random-phase stimulation. the median isi was . ± . s across participants. the io curve was fitted with a logistic function f (s) = /( + exp(−b · (s − a))). five stimulation intensities were chosen on the individual io curve where median mep amplitudes resulted at the following percentages of the maximum saturation level: si % max / si % max / si % max / si % max / si % max (fig. b). ( ) µ-phase-triggered stimulation with real-time eeg-triggered tms at surface µ-positive peak and surface µ-negative peak conditions was performed at the five intensities chosen in the previous block in randomized order with pulses per condition (fig. c). the phase-dependent stimulation block was repeated three times, i.e., phases × intensities × pulses × blocks = pulses total. the stimulator intensity was automatically adjusted between pulses, and intensity conditions were applied in randomized order. the median isi was . ± . s across participants. as an output measure, emg was recorded (fig. d). the study protocol conformed to the declaration of helsinki and was approved by the local ethics committee at the medical faculty of the university of tübingen (protocol / bo ). written informed consent was obtained from all participants prior to the experiment. right-handed participants ( male, female, mean age: . ± . years, age range: - , average laterality score in edinburgh handedness survey: .
± . ) with no history of neurological disease or use of cns drugs were selected according to the following inclusion criteria: ( ) rmt of right fdi or apb muscle <= . % of maximum stimulator output (mso), so that a stimulation intensity range of up to % mso ( . * . %= % mso) could be explored. ( ) the presence of a µ-rhythm with sufficient signal-to-noise ratio (snr), as an adequate snr is required for the phase-detection algorithm to estimate phases with sufficient accuracy. snr was evaluated as follows (similar to nikulin and brismar [ ]): a power function ($c \cdot x^{\alpha}$) was fitted to the /f noise of the resting eeg data power spectrum from the laplace-filtered c -electrode of each individual participant. for that, data points from frequency bins with typically no oscillatory components present ( . - hz, - hz) were used. the fitted noise was subtracted from the power spectrum. the adjusted power in the - hz band was assessed for a clearly identifiable peak in the µ-range, with db over noise level as inclusion threshold. two participants were excluded from the experiment, one because of excessive pre-innervation in the emg (with . % of trials discarded according to the predefined threshold criterion), the other because the meps evoked by the fitted intensities for phase-dependent stimulation differed greatly (by . %) from the io curve fitted on median meps recorded in the pre-experiment, likely due to a coil position mismatch. this resulted in a sample size of participants. experiments were performed in accordance with current tms safety guidelines [ ]. all participants tolerated the procedures without any adverse effects. data was analyzed with matlab (mathworks ltd, usa, r b) and the bbci toolbox [ ]. emg signals were high-pass filtered (butterworth, filter order , hz). trials with muscle activity ms before stimulation onset were discarded with a threshold criterion (max-min amplitude > µv).
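the snr criterion described earlier in this section (power-law fit to the /f background, then peak height over the fitted noise) can be sketched as follows. this is a minimal sketch under stated assumptions, not the authors' matlab code: the power law is fitted by ordinary least squares in log-log space, "subtracting" the noise is interpreted as a ratio expressed in db, and the band edges in the usage note are hypothetical.

```python
import math


def fit_power_law(freqs, power):
    """Least-squares fit of power = c * f**alpha in log-log space."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha


def peak_snr_db(freqs, power, noise_bands, signal_band):
    """Largest ratio (in dB) of measured power to fitted 1/f noise in the band."""
    # Fit the noise model only on bins assumed to be free of oscillations.
    noise = [(f, p) for f, p in zip(freqs, power)
             if any(lo <= f <= hi for lo, hi in noise_bands)]
    c, alpha = fit_power_law([f for f, _ in noise], [p for _, p in noise])
    lo, hi = signal_band
    return max(10.0 * math.log10(p / (c * f ** alpha))
               for f, p in zip(freqs, power) if lo <= f <= hi)
```

a participant would then be included if `peak_snr_db(freqs, power, noise_bands, signal_band)` exceeds the chosen db threshold; the noise bands and signal band passed in are whatever frequency ranges the protocol specifies.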
peak-to-peak mep amplitudes were determined within the interval of - ms after the tms pulse. all mep time courses were inspected manually and non-response amplitudes were set to zero. trials of the three phase-dependent blocks were pooled. for one participant, the last phase-dependent block was discarded because of excessive coil drift. the µ-rhythm pre-stimulation activity ( ms before stimulation) extracted by the c -centered eeg-laplace filter was manually inspected for artefacts, and corresponding trials were discarded. overall, . % of trials were discarded for the phase-dependent stimulation sessions. we evaluated the relative as well as the absolute mep differences between n- and p-trials. we quantified the relative phase-modulation by calculating the ratio $\frac{\mathrm{median}(\mathrm{mep}_n)}{\mathrm{median}(\mathrm{mep}_p)}$ and the absolute phase-modulation by calculating $\frac{\mathrm{median}(\mathrm{mep}_n) - \mathrm{median}(\mathrm{mep}_p)}{\mathrm{ioc}_{max}}$ for each intensity condition, respectively, where ioc max is the median mep evoked at intensity % rmt, measuring the individual io curve saturation level. we assessed the effect of intensity on phase-modulation by bootstrapping. trials were randomly partitioned (with replacement) into two classes and the ratio and difference measures were calculated. this procedure was repeated for iterations to arrive at confidence bounds and p-values. to estimate the accuracy of the real-time phase-trigger algorithm, we determined the instantaneous phase by passing the five minutes of resting eeg through the simulink model from the experimental session to determine time points at which the algorithm would trigger. this procedure was chosen to avoid contamination by stimulation artefacts. instantaneous phase was estimated by applying the hilbert transform to the laplacian c signal, band-pass filtered in the - hz frequency range. phase prediction accuracy (mean ± standard deviation) across participants was - . ° ± . ° in the positive peak condition and . ° ± . ° in the negative peak condition.
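the two modulation measures and the resampling procedure can be sketched as below. this is a hedged illustration only: the text does not spell out how the p-value is derived from the resampled partitions, so the two-sided comparison of absolute median differences, the default iteration count and the fixed seed are assumptions made for this example.

```python
import random
from statistics import median


def relative_modulation(mep_n, mep_p):
    """Ratio of median MEP amplitudes, N-trials over P-trials."""
    return median(mep_n) / median(mep_p)


def absolute_modulation(mep_n, mep_p, ioc_max):
    """Difference of median MEP amplitudes, scaled by the saturation level."""
    return (median(mep_n) - median(mep_p)) / ioc_max


def resampling_p_value(mep_n, mep_p, n_iter=2000, seed=0):
    """Share of random partitions whose median difference is at least as large."""
    rng = random.Random(seed)
    pooled = list(mep_n) + list(mep_p)
    observed = abs(median(mep_n) - median(mep_p))
    hits = 0
    for _ in range(n_iter):
        # Draw two surrogate classes from the pooled trials, with replacement.
        fake_n = rng.choices(pooled, k=len(mep_n))
        fake_p = rng.choices(pooled, k=len(mep_p))
        if abs(median(fake_n) - median(fake_p)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)
```

the same surrogate distribution could also be used to read off confidence bounds for the ratio measure by collecting `median(fake_n) / median(fake_p)` instead of the difference.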
angular phase accuracy distribution plots for individual participants are shown in supplementary fig. s . the achieved phase accuracy was, as expected, similar to that reported in zrenner et al. [ ], as no changes were made to the core phase-detection algorithm. to validate that the intensities chosen for the phase-triggered measurement session matched the section of the io curve measured during the random-phase pre-measurement, the resulting averaged mep amplitudes were compared by computing the percentile rank of the median mep-amplitude for phase-stimulation conditions, pooled across n- and p-trials, assuming median meps from random-phase stimulation reflect an average of n- and p-trials. we found a mean deviation of - . % ± . %, which allows adequate comparisons to the random-phase io curve. the io curves for individual participants can be found in supplementary fig. s e . to illustrate the effect of µ-phase on mep amplitudes, fig. shows data from one measurement session for an illustrative single participant, including the phase-dependent io curves as well as mep histograms, and the io curve from the pre-experiment with random-phase stimulation. to quantify the effect of pre-stimulation µ-phase, we compared relative and absolute differences between mep amplitudes of n- and p-trials. the mean influence of stimulation intensity on the degree of mep amplitude modulation by µ-phase across all participants is shown in fig. . we replicated the dependence of mep amplitude on phase of the sensorimotor µ-rhythm at the time of tms (i.e., larger mep amplitudes at the negative peak of the µ-rhythm) [ ] and detected a significant difference between n- and p-trials in four out of five intensity conditions. the effect of stimulation intensity on modulation by µ-phase was assessed as relative and absolute differences between n- and p-trials. the mean n/p-ratio decreased with higher intensity (fig. b).
mep amplitudes at the µ-negative peak were on average between % (at si % max , corresponding on average to % rmt, p = . , two-tailed wilcoxon signed-rank test) and % (at si % max , corresponding on average to % rmt, p> . ) larger than meps evoked at the µ-positive peak. the n/p-difference peaked at the intermediate intensity si % max , with a group average difference between n- and p-trials of . % of ioc max (fig. c). the measures for individual participants are shown in supplementary figs. s and s . for the participant with the largest observed n/p-ratio, the phase-dependent stimulation meps are lower compared to the random-phase stimulation io curve (participant s , supplementary fig. s e ). this leads to increased n/p-ratio-values and to deviation from the n/p-ratio computed from the logistic fit (fig. b). variability of meps is higher at low intensities, as measured by the coefficient of variation (across participants, mean cv % max = . , cv % max = . , cv % max = . , cv % max = . , cv % max = . ). due to higher variability of meps at low intensities, even large n/p-ratio values are not always significant within participants according to the bootstrap test. to illustrate the effect of the sample size on the probability of detecting a difference between n- and p-trials for every intensity condition, we used a simulation approach. for each participant, meps were resampled with replacement separately for n- and p-trials for varying sample sizes and a wilcoxon rank-sum test was performed, noting whether the null hypothesis was rejected. this procedure was repeated times. the results are shown in fig. . using trials per condition, the mean probability to reject the null hypothesis (n/p-ratio equals or n/p-difference equals ) within a single participant is % for the cv % max and cv % max conditions, and around % for the cv % max condition. using only trials, these values decline to % and %, respectively.
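the sample-size simulation can be sketched with the python standard library alone. this is an illustration under assumptions, not the authors' analysis code: the rank-sum test below uses the normal approximation with tie-averaged ranks (no tie or continuity correction), and the significance level, repetition count and seed are placeholder values.

```python
import math
import random


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n, m = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the tied ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n * (n + m + 1) / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_probability(mep_n, mep_p, n_trials, n_rep=500, alpha=0.05, seed=0):
    """Fraction of resampled data sets in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        # Resample N- and P-trials separately, with replacement.
        sample_n = rng.choices(mep_n, k=n_trials)
        sample_p = rng.choices(mep_p, k=n_trials)
        if rank_sum_p(sample_n, sample_p) < alpha:
            rejections += 1
    return rejections / n_rep
```

calling `rejection_probability` for a range of `n_trials` values, once per participant and intensity condition, gives curves of the kind described in the text.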
this demonstrates that the required number of trials to differentiate between µ-phase-defined states with high statistical power within participants can be reduced by choosing a lower stimulation intensity. in this analysis, non-responders (i.e., participants without a significant phase-modulation for any tested stimulation intensity) are also included, as the goal of this analysis is to show relative differences in required sample sizes between intensities. for non-responders, the null hypothesis may be true, therefore the actual statistical power at a given sample size is higher than the average probability of rejecting h . at the group level, using a low to intermediate intensity will enable detection of significant effects of the size observed in the current study with high certainty. in our study, of the included participants did not show a significant mep amplitude modulation by instantaneous µ-phase at any of the tested stimulation intensities, which is consistent with the data reported in zrenner et al. [ ]. we performed a number of correlation analyses in order to identify possible factors that may separate responders from non-responders: we found no significant correlation between snr as obtained from resting state eeg data and the effect size for si % max (p> . , spearman rank correlation). overall, higher snr resulted in improved performance of the phase-detection algorithm as measured by decreased standard deviation of the phase accuracy (r=- . , p= . , spearman rank correlation), but a pronounced rhythm with high snr did not translate to a larger phase-modulation effect. a significant sub-population of participants remained that showed a clear µ-rhythm without exhibiting a clear phase-modulatory effect on mep amplitude. additionally, no significant correlation was observed between distance of coil to center electrode of the laplacian filter and effect size at si % max (p> . , spearman rank correlation).
in this study, a standard mni brain model was used for navigation. in future studies with individual participant mri anatomy additional factors could be investigated, such as the orientation of dipoles underlying the mean topography around the stimulation trigger (supplementary fig. s b ), coil position and orientation relative to the central sulcus. therefore, the factors required for observing µ-phase modulation of corticospinal excitability remain to be further elucidated. we replicated the finding that corticospinal excitability as measured by mep amplitude is modulated by the phase of the ongoing µ-rhythm [ ] , with larger mep amplitudes at the negative compared to the positive peak. additionally, in agreement with predictions based on our modeling work [ ] , we demonstrated that the magnitude of the modulatory effect depends on stimulation intensity, with largest relative modulation for low intensities and largest absolute differences for intermediate stimulation intensities. the reduced modulatory influence of µ-rhythm at high stimulation intensities can be explained by saturation of the io curves of meps. if stimulation intensity is sufficiently far above the motor threshold, any ongoing fluctuations of that threshold influence behavioral outcomes to a lesser degree and will result in smaller relative differences between n-and p-trials. this is compatible with many previous findings showing greatest sensitivity of mep amplitude to intervention in the low and/or intermediate parts of the io curve (e.g. [ ] [ ] [ ] [ ] ). 
one practical implication of the present findings is that studies seeking to demonstrate differential eeg-defined brain states based on differential tms-evoked responses should be designed with a sufficient number of interleaved trials (more than per condition) and use low stimulation intensity to maximize statistical power, where the lower limit of intensity is determined by the proportion of non-responses and measurement noise when quantifying small responses. based on this study, the intensity setting resulting in an average mep amplitude % of ioc max would be recommended; in our dataset, this corresponded to a stimulus intensity of on average % rmt, with of the participants in our sample in the range between - % rmt. however, depending on the io curve, a stimulation intensity based on a fixed rmt percentage can already be in the saturation range of an individual participant, where mep amplitudes do not significantly differ between µ-phase conditions. measuring the individual io curve and specifically estimating the maximum amplitude at saturation is therefore helpful to choose an optimal intensity. the range of tms intensities used is limited at the lower end by the motor threshold, as stimulation at intensities significantly below rmt does not reliably result in emg responses in any brain-state. nevertheless, sub-threshold tms does affect cortical circuits, as can be demonstrated, for instance, in paired-pulse paradigms (e.g., [ , ]), with pre-innervation [ ], or in tms evoked eeg potentials [ , ]. it is therefore feasible (using paired-pulse protocols, performing stimulation during pre-innervation, or using cortical eeg responses) to investigate in subsequent studies whether eeg-defined brain-states can be differentiated at intensities below single-pulse rmt.
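choosing an intensity at a given fraction of the individual saturation level amounts to inverting the fitted logistic io curve. the sketch below assumes the common three-parameter form with midpoint a, slope b and saturation level c (the constants of the fitted function are not fully legible in the text above, so this form is an assumption), and all numeric values in the usage note are hypothetical.

```python
import math


def io_curve(s, a, b, c=1.0):
    """Logistic IO curve: c / (1 + exp(-b * (s - a)))."""
    return c / (1.0 + math.exp(-b * (s - a)))


def intensity_for_fraction(p, a, b):
    """Intensity at which the curve reaches fraction p (0 < p < 1) of saturation."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Solve c / (1 + exp(-b * (s - a))) = p * c for s.
    return a - math.log(1.0 / p - 1.0) / b
```

with a hypothetical midpoint of 110 (in % rmt) and slope 0.2, `intensity_for_fraction(0.5, 110.0, 0.2)` returns the midpoint itself, and smaller fractions map to intensities below it.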
in addition to phase, pre-stimulus oscillatory power has been shown to modulate perception [ ] [ ] [ ] [ ], but in our experimental design, the impact of instantaneous µ-power (or power in other frequency bands) on mep amplitude is difficult to assess. as we used a power-threshold to ensure suitable accuracy of the phase-detection algorithm, no trials with low µ-power were acquired. real-time triggered eeg-tms could be used in the future to explore the role of instantaneous power. not all participants displayed a modulation of meps by instantaneous µ-phase. we tried to identify factors which predict the individual degree of phase-modulation. high snr alone is not sufficient for a large effect, as in our data set, there are participants with high snr but no modulation of mep amplitude by µ-phase. furthermore, the standard laplacian c filter may not extract the functionally relevant oscillatory component, depending on individual anatomical features. future studies may improve this aspect by using individualized spatial filters or anatomically guided source level analysis. this may also increase the proportion of participants who can be studied with phase-dependent brain stimulation, as in this study only participants with a µ-rhythm as detected by a standard c hjorth montage were included, whereas individualized approaches will yield increased snr for oscillatory components. the conditions required to establish a clear dependence of cortical excitability on instantaneous µ-phase are not yet sufficiently understood. understanding this dependence will yield a clearer view on what exactly is stimulated by tms, and on the interplay of endogenous oscillatory rhythms in motor areas and their functional role.
neurophysiological and computational principles of cortical rhythms in cognition
neuronal oscillations in cortical networks
ongoing eeg phase as a trial-by-trial predictor of perceptual and attentional variability
a mechanism for cognitive dynamics: neuronal communication through neuronal coherence
shaping functional architecture by oscillatory alpha activity: gating by inhibition
the phase of ongoing eeg oscillations predicts visual perception
to see or not to see: prestimulus α phase predicts visual awareness
the phase of ongoing oscillations mediates the causal relation between brain excitation and visual perception
the phase of prestimulus alpha oscillations affects tactile perception
cortical brain states and corticospinal synchronization influence tms-evoked motor potentials
variability of motor potentials evoked by transcranial magnetic stimulation
variability in the amplitude of skeletal muscle responses to magnetic stimulation of the motor cortex in man
real-time eeg-defined excitability states determine efficacy of tms-induced plasticity in human motor cortex
ongoing brain rhythms shape i-wave properties in a computational model
non-invasive electrical and magnetic stimulation of the brain, spinal cord, roots and peripheral nerves: basic principles and procedures for routine clinical and research application. an updated report from an i.f.c.n. committee
a practical guide to diagnostic transcranial magnetic stimulation: report of an ifcn committee
an on-line transformation of eeg scalp potentials into orthogonal source derivations
phase synchronization between alpha and beta oscillations in the human electroencephalogram
safety, ethical considerations, and application guidelines for the use of transcranial magnetic stimulation in clinical practice and research
the berlin brain-computer interface: progress beyond communication and control
abnormal cortical motor excitability in dystonia
input-output properties and gain changes in the human corticospinal pathway
ketamine increases human motor cortex excitability to transcranial magnetic stimulation
deleterious effects of a low amount of ethanol on ltp-like plasticity in human cortex
corticocortical inhibition in human motor cortex
interaction between intracortical inhibition and facilitation in human motor cortex
effects of voluntary contraction on descending volleys evoked by transcranial stimulation in conscious humans
the effect of stimulus intensity on brain responses evoked by transcranial magnetic stimulation
the spectral features of eeg responses to transcranial magnetic stimulation of the primary motor cortex depend on the amplitude of the motor evoked potentials
prestimulus oscillations predict visual perception performance between and within subjects
prestimulus oscillatory activity in the alpha band predicts visual discrimination ability
prestimulus oscillations enhance psychophysical performance in humans
prestimulus oscillatory power and connectivity patterns predispose conscious somatosensory perception

we thank anna kempf and tamara vasilkovska for help with participant coordination and experimental preparation. ns and jt acknowledge support from the quandt foundation. ns and cz are supported through an exist transfer of research grant of the german federal ministry for economic affairs and energy.
cz acknowledges support from the clinician scientist program at the faculty of medicine at the university of tübingen. uz acknowledges support from the german research foundation (dfg, grant zi / - ). this research was supported with funding from the federal ministry of education and research of germany through the motor-bic project (bmbf, grant gw a).

key: cord- - lbrfsrh authors: adam, kirsten c.s.; chang, lillian; rangan, nicole; serences, john t. title: steady-state visually evoked potentials and feature-based attention: pre-registered null results and a focused review of methodological considerations date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lbrfsrh feature-based attention is the ability to selectively attend to a particular feature (e.g., attend to red but not green items while looking for the ketchup bottle in your refrigerator), and steady-state visually evoked potentials (ssveps) measured from the human electroencephalogram (eeg) signal have been used to track the neural deployment of feature-based attention. although many published studies suggest that we can use trial-by-trial cues to enhance relevant feature information (i.e., greater ssvep response to the cued color), there is ongoing debate about whether participants may likewise use trial-by-trial cues to voluntarily ignore a particular feature. here, we report the results of a preregistered study in which participants either were cued to attend or to ignore a color. counter to prior work, we found no attention-related modulation of the ssvep response in either cue condition. however, positive control analyses revealed that participants paid some degree of attention to the cued color (i.e., we observed a greater p component to targets in the attended versus the unattended color). in light of these unexpected null results, we conducted a focused review of methodological considerations for studies of feature-based attention using ssveps.
in the review, we quantify potentially important stimulus parameters that have been used in the past (e.g., stimulation frequency; trial counts) and we discuss the potential importance of these and other task factors (e.g., feature-based priming) for ssvep studies. attending to a specific feature leads to systematic changes in the firing rates of neurons that encode the relevant feature space. for example, when looking for a ripe tomato, the firing rate of neurons tuned to red will be enhanced and the firing rate of neurons tuned to other colors will be suppressed (e.g. responses to green; ipata et al., ; martinez-trujillo & treue, ; störmer & alvarez, ) . although there is broad agreement that participants may learn to suppress irrelevant distractors with sufficient experience, there is disagreement about whether these behavioral suppression effects may be volitionally implemented on a trial-by-trial basis in response to an abstract cue (i.e., a "volitional account"), or if they instead are solely implemented via implicit or statistical learning mechanisms (i.e., a "priming-based" account). consistent with a volitional account, some work has found that participants can learn to use a trial-by-trial cue to ignore a particular color ( suppress a color on a trial-by-trial basis independent of target enhancement (i.e., a strong version of a volitional suppression account), we predicted that the time-course of enhancement vs. suppression of the ssvep signal would be reduced or reversed (i.e., that suppression of the cued, to-be-ignored color may happen even prior to enhancement of the other color). alternatively, if participants recode the "ignore" cue to serve as an indirect "attend" cue (e.g., "since i'm cued to ignore blue, that means i should attend red"; beck & hollingworth, ; becker et al., ) , then we predicted that target enhancement would always precede distractor suppression regardless of whether participants were cued to attend or ignore a particular color. 
to preview the results, we were unable to fully test our hypotheses about the time-course of feature-based enhancement and suppression because we did not find evidence for an overall attention effect with our task procedures. despite robust ssvep amplitude (cohen's d > ), we observed no credible evidence that the ssvep response was higher for an attended versus unattended color in either cue condition. positive control analyses revealed that our lack of ssvep effect was not due to a complete lack of attention to the attended color: erp responses (p ) to the targets were modulated by attention as [...]. in light of our inconclusive results, we also performed a focused methodological review of key potential task differences between our work and prior work that may have resulted in our failure to detect the effect of feature-based attention on ssvep amplitude. we considered whether task factors such as stimulus flicker frequency, sample size, stimulus duration, and stimulus color might have impacted our ability to observe an attention effect. no single methodological factor that we considered neatly explains our lack of effect. given our results and literature review, we propose that future work is needed to systematically explore two key factors: ( ) variation in feature-based attention effects across stimulus flicker frequencies and ( ) the extent to which feature-based priming modulates ssvep attention effects. we published a pre-registered research plan on the open science framework prior to data collection (https://osf.io/kfg h/). our raw data and analysis code will be made available online on the open science framework at https://osf.io/ew dv/ upon acceptance for publication. healthy volunteers (n = ; gender = female, male; mean age = . years [sd = . , min = , max = ]; handedness not recorded; corrected-to-normal visual acuity; normal color vision) participated in one .
to hour experimental session at the university of california san diego (ucsd) campus, and were compensated $ /hr. procedures were approved by the ucsd institutional review board, and all participants provided written informed consent. inclusion criteria included normal or corrected-to-normal visual acuity, normal color vision, age between and years old, and no self-reported history of major neurological disorders (e.g., epilepsy, stroke). data were excluded from analysis if there were fewer than trials in either cue condition (either due to leaving the study early or after artifact rejection). a sample size of was pre-registered, and artifact rejection criteria were pre-registered (see section "eeg preprocessing" below for more details). after running each participant, we checked whether the data were usable (i.e., sufficient number of artifact-free trials) so that we would know when to stop data collection. to reach our final sample size (n = participants with usable data), we ran a total of participants. nine participants' data were not used for the following reasons: subjects with an error in the task code (n = ), subjects who stopped the study early due to technical issues or to participants' preferences (n = ), subjects with too many artifacts (n = ). note, we were one subject short of our pre-registered target sample size of because data collection was suspended due to covid- . however, as our later power analyses will show, we do not believe the addition of one further subject would have meaningfully altered our conclusions. heterochromatic flicker photometry task. we chose perceptually equiluminant colors for each participant using a heterochromatic flicker photometry task. participants viewed a large circular, flickering stimulus ( º radius) on a black screen ( . cd/m ). we generated circular color spaces in cielab-space with varying luminance (circles centered on: l = - , a = , b = ; colors equally spaced around circle with radius = ) for use in the task. 
participants matched each of the colors to a medium-gray reference color (rgb = . . . ). on each trial, the circular background was flickered between two different colors. one color was always medium-gray, and the other color was the to-be-matched color on that trial. the colors of circular background were phase reversed at a rate of hz, giving the appearance of a fast flicker when the subjective luminance values were not matched. on top of the flickering circular stimulus small oriented bars were drawn in the medium- gray reference color (the bars changed locations at a rate of hz). the oriented bars served no purpose other than subjectively making it easier to discriminate fine-grained differences in luminance between the flickering colors (i.e., these bars gave secondary visual cues about equiluminance via the "minimally distinct border" phenomenon, kaiser, ) . participants increased or decreased the luminance of the to-be-matched color (using up and down arrow keys) until the amount of perceived flicker was minimized -the point of perceptual equiluminance. the luminance starting value of the to-be-matched color was chosen at random on each trial. once satisfied with their response, the participant pressed spacebar to continue to the next trial. each to-be-matched color was repeated times ( trials total). feature-based attention task. all stimuli were viewed on a luminance calibrated crt monitor ( x resolution, hz refresh rate) from a distance of ~ cm in a dimly lit room. stimuli were generated using matlab a and the psychophysics toolbox (brainard, ; kleiner et al., ; pelli, ) . participants rested their chin on a chin- rest and fixated a central dot ( . º radius) throughout the experiment. the stimulus was a circular aperture (radius = ~ . º) filled with oriented bars (each bar ~ . º long and ~. º wide). bars were centered on a grid and separated by ~ bar length such that they never overlapped with one another. on each individual frame (~ . 
ms) this grid was randomly phase shifted ( : π in x and y coordinates) and rotated ( : º), thus giving the appearance of random flicker. to achieve steady state visually evoked potentials (ssvep) half of the bars flickered at hz ( frames on, frames off) and the other half flickered at hz ( frames on, frames off). due to the jittered rotation of bar positions and to the random assignment of colors to bars on each "on" frame, this means that the individual pixels that were "on" for each color varied from frame to frame. the unpredictable nature of each bar's exact position is thus quite similar to unpredictable stimuli that have been used in past work (e.g., andersen et al., ) . for each "off frame" no bars of that color were shown (e.g., if hz had an "on" frame and hz had an "off" frame, then only out of bars would be shown on the black background). if both the hz and hz bars were "off", then a black screen would be shown on that frame. see figure s for an illustration of some example frame-by-frame screenshots of the stimuli. on each trial (figure ), the participants viewed the stimulus array of flickering, randomly oriented bars presented on a black background ( . cd/m ). half of these bars were shown in one color (randomly chosen from the possible colors) and the other half were in another randomly chosen color (with the constraint that the two sets of bars must be two different colors). during an initial baseline ( , ms), participants viewed the flickering dots while they did not yet know which color to attend; during this baseline, the fixation point was a medium gray color (same as the reference color in the flicker photometry task). after the baseline, the fixation dot changed color, cuing the participants about which color to attend or ignore. in the "attend cue" condition, the color of the fixation point indicated which color should be attended. in the "ignore cue" condition, the color of the fixation point indicated which color should be ignored. 
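The on/off frame scheme described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code (which was written in matlab with the psychophysics toolbox). The refresh rate and the on/off frame counts are not recoverable from the text, so the values used here (a 120 hz display, a 2-on/2-off cycle and a 3-on/3-off cycle) are hypothetical placeholders, as are the function names.

```python
def flicker_schedule(n_frames, frames_on, frames_off):
    """Return one boolean per frame: True when this set of bars is drawn."""
    period = frames_on + frames_off
    return [(i % period) < frames_on for i in range(n_frames)]


def flicker_frequency(refresh_hz, frames_on, frames_off):
    """Flicker frequency produced by one on/off cycle at a given refresh rate."""
    return refresh_hz / (frames_on + frames_off)


# Hypothetical values; the real refresh rate and frame counts are elided in the text.
REFRESH_HZ = 120
set_a = flicker_schedule(12, frames_on=2, frames_off=2)  # 30 hz at 120 hz refresh
set_b = flicker_schedule(12, frames_on=3, frames_off=3)  # 20 hz at 120 hz refresh

# Frames on which neither set is drawn, i.e. a fully black screen.
blank_frames = [i for i in range(12) if not set_a[i] and not set_b[i]]
```

With these placeholder values, the two sets share some "off" frames, which corresponds to the black-screen frames mentioned in the text.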
these two conditions were blocked, and the order was counterbalanced across participants (further details below). during the stimulus presentation ( , ms), participants monitored the relevant color for a brief "target event" ( ms). during this brief target event, a percentage of lines in the relevant color were coherent (iso-oriented). critically, the orientation of each coherent target or distractor event was completely unpredictable (randomly chosen between - degrees); thus, participants could not attend to a particular orientation in advance in order to perform the task. a target event occurred on % of trials, and participants were instructed to press the spacebar as quickly as possible if they detected a target event. importantly, physically identical events (iso-oriented lines in a random orientation, ms) could also appear in the distractor color ( % of trials). participants were instructed that they should only respond to target events; if they erroneously responded to the distractor event, the trial was scored as incorrect. the target and/or distractor events could begin as early as cue onset ( ms) and no later than , ms after stimulus onset. participants could make responses up to second into the inter-trial interval. if both a target and distractor event were present, their onset times were separated by at least ms. target events, and distractor events in the main conditions. in the 'attend cue' condition, participants made a response when the iso-oriented lines were the same color as the cue (target event) and did not respond if the iso-oriented lines occurred on the uncued color (distractor event). in the 'ignore cue' condition, participants made a response when the iso-oriented lines occurred on the uncued color (target event), and they did not respond if the iso-oriented lines occurred on the cued color (distractor event). 
note, all lines were of equal size in the real experiment; lines are shown at different widths here for easier visualization of the target and distractor colors. here, the iso-oriented lines are drawn at vertical in all examples. in the actual task, the iso-oriented lines could be any orientation ( - ). to ensure that the task was effortful for participants, the coherence of the lines in the target stimulus was adapted at the end of each block if behavior was outside the range of - % correct. at the beginning of the session, the target had % coherent iso- oriented lines. if accuracy over the block of trials was > %, coherency decreased by %. if block accuracy was < %, coherency increased by %. the maximum allowed coherence was % iso-oriented lines (so that participants would not be able to simply individuate and attend a single position to perform the task) and the minimum allowed coherence was %. the presence and absence of target and distractor events was balanced within each block yielding a total of sub-conditions within each cue type ( % each): ( ) target event + no distractor event, t d ( ) no target event + distractor event, t d ( ) target event + distractor event, t d ( ) no target event + no distractor event, t d . participants completed both task conditions (attend cue and ignore cue). the two conditions were blocked and counterbalanced within a session (i.e., half of participants performed the "attend cue" task for the first half of the session and the "ignore cue" task for the second half of the session.) each block of trials took approximately min sec. participants completed blocks ( per condition) for a total of trials per cue condition. note, we originally planned for blocks ( per condition) in the pre- registration, but the block number was reduced to after the first few participants did not finish all blocks. summary of deviations from the registered procedures. 
as described in-line above, there were some minor deviations from the pre-registration: ( ) we made changes to the pre-registered task code to fix errors that we discovered while running the first subjects (e.g., incorrect cues and behavioral feedback in the 'ignore cue' condition). ( ) we included code for eye-tracking, which allowed us to give participants automated real-time feedback if they blinked when they were not supposed to, i.e., during the stimulus period. ( ) we reduced the total number of experimental blocks from ( per cue condition) to ( per cue condition) due to time constraints. ( ) we had to prematurely stop data collection at n = out of due to covid- . ( ) we did not specify a statistical test for quantifying the robustness of overall ssveps in section "checking that an ssvep is elicited at the expected frequencies before collecting the full sample", so we have described our justification for the statistical tests we present here. ( ) due to our unanticipated failure to detect an overall attention effect, we performed additional non-pre-registered control analyses to attempt to rule out possible explanations of this null effect (see section: "non pre-registered control analyses" below). eeg pre-processing. continuous eeg data were collected online from ag/agcl active electrodes mounted in an elastic cap using a biosemi activetwo amplifier (cortech solutions, wilmington, nc). an additional external electrodes were placed on the left and right mastoids, above and below each eye (vertical eog), and lateral to each eye (horizontal eog). continuous gaze-position data were collected from an sr eyelink + eye-tracker (sampling rate: , hz; sr research, ottawa, ontario). we also measured stimulus timing with a photodiode affixed to the upper left-hand corner of the monitor (a white dot flickered at the to-be-attended color's frequency; the photodiode and this corner of the screen were covered with opaque black tape to ensure it was not visible). 
data were collected with a sampling rate of hz and were not downsampled offline. data were saved unfiltered and unreferenced (see: kappenman & luck, ) , then referenced offline to the algebraic average of the left and right mastoids, low-pass filtered (< hz) and high-pass filtered (>. hz). artifacts were detected using automatic criteria described below, and the data were visually inspected to confirm that the artifact rejection criteria worked as expected. we excluded subjects with fewer than trials remaining per cue condition. eye movements and blinks. we used the eye-tracking data and the heog/veog traces to detect blinks and eye movements. blinks were detected on-line during the task using the eye tracker. if a blink was detected (i.e., missing gaze position returned from the eye tracker), the trial was immediately terminated and the participant was given feedback that they had blinked (i.e., the word "blink" was written in white text in the center of the screen). if eye-tracking data could not be successfully collected (e.g., calibration issues), the veog trace was used to detect blinks and/or eye movements during offline artifact rejection. to do so, we used a split-half sliding window step function (luck, ; window size = ms, step size = ms, threshold = microvolts.) we also used a split-half sliding-window step function to check for eye-movements in the gaze-coordinate data from the eye-tracker (window size = ms, step size = ms, threshold = º) and in the horizontal electrooculogram (heog), window size = ms, step size = ms, threshold = microvolts, and to detect blinks and/or eye movements we also pre-registered an analysis plan for examining the time-course of ssvep amplitude. however, because our data failed to satisfy pre-registered pre-requisite analyses, we do not report these time-course effects here (for completeness, we show the time-course of snr in figure s ). 
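A split-half sliding-window step function of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, the window and step are expressed in samples rather than milliseconds, and the window sizes, step sizes, and thresholds used in the study are elided in the text and are not reproduced here.

```python
def step_function_artifact(signal, window, step, threshold):
    """Flag a trial if any window shows a step-like change.

    For each window position, the mean of the first half of the window is
    compared to the mean of the second half; the trial is flagged when the
    absolute difference exceeds `threshold` (same units as `signal`).
    `window` and `step` are given in samples, and `window` must be >= 2.
    """
    half = window // 2
    for start in range(0, len(signal) - window + 1, step):
        first = signal[start:start + half]
        second = signal[start + half:start + window]
        diff = sum(second) / len(second) - sum(first) / len(first)
        if abs(diff) > threshold:
            return True
    return False


# Toy traces: a flat signal and one with an abrupt 40-unit shift halfway through.
flat_trace = [0.0] * 100
step_trace = [0.0] * 50 + [40.0] * 50
```

The same function applies to a heog or veog trace (threshold in microvolts) or to a gaze-coordinate trace (threshold in degrees of visual angle); only the threshold units change.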
checking that an ssvep is elicited at the expected frequencies before collecting the full sample. at n = , we planned to confirm that our task procedure successfully produced reliable ssvep responses (i.e., check that we observed peaks at the correct stimulus flicker frequencies). if our task procedures failed to elicit an ssvep at the expected frequencies, we had planned to stop data collection and alter the task to troubleshoot the problem (e.g., optimize timing, choose different flicker frequencies, make stimuli brighter, etc.). we planned to begin data collection over again if we failed this trouble-shooting step. note, at this early stage we only verified whether the basic method worked (ssvep frequencies were robust): we did not test whether any hypothesized attention effects were present, as this could inflate our false discovery rate (kravitz & mitroff, ). note, in the original pre-registration we failed to specify what test we would run to determine if ssvep frequencies were robustly represented in the eeg signal. theoretical chance for snr would be , so the simplest test would be to compare the snr for our stimulation frequencies ( and hz) to using a t-test, which we report. however, it is often better to compare to an empirical baseline with a reasonable amount of noise (combrisson & jerbi, ). as such, we opted to also compute an effect size comparing the snr for our stimulation frequencies to all other frequencies (with the exception that we did not use frequencies +/- hz of or hz as baseline values, since snr was calculated as the power at frequency f divided by the power in the adjacent -hz bins). checking achieved power for the basic attention effect. without we did not anticipate our failure to find an overall attention effect with these task procedures and set of pre-registered "sanity check" analyses described above. to further understand the lack of ssvep attention effect, we performed additional non-pre-registered control analyses. 
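The snr definition given above (power at frequency f divided by the power in the adjacent bins) can be sketched as follows. This is an illustration under assumptions, not the authors' analysis code: the function name is hypothetical, the adjacent bins are averaged (the text does not say whether they were averaged or summed), and the neighborhood width is a parameter because the value used in the study is elided. The 24 hz peak in the toy spectrum is a made-up value.

```python
def snr_at(freqs, power, target_hz, neighbor_hz):
    """Power at the bin nearest `target_hz`, divided by the mean power of
    the other bins lying within `neighbor_hz` of that bin (assumed: mean)."""
    idx = min(range(len(freqs)), key=lambda i: abs(freqs[i] - target_hz))
    center = freqs[idx]
    neighbors = [p for f, p in zip(freqs, power)
                 if f != center and abs(f - center) <= neighbor_hz]
    if not neighbors:
        raise ValueError("no neighboring bins within neighbor_hz")
    return power[idx] / (sum(neighbors) / len(neighbors))


# Toy spectrum with 1 hz bins: flat noise floor plus a peak at a
# hypothetical stimulation frequency of 24 hz.
freqs = [float(f) for f in range(10, 41)]
power = [5.0 if f == 24.0 else 1.0 for f in freqs]

peak_snr = snr_at(freqs, power, target_hz=24.0, neighbor_hz=1.0)
floor_snr = snr_at(freqs, power, target_hz=15.0, neighbor_hz=1.0)
```

Under this definition a frequency that does not stand out from its neighbors has an snr near one, which is why snr values at non-stimulated frequencies can serve as an empirical baseline.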
we first confirmed that our ssvep procedure was effective at eliciting robust, frequency-specific modulations of the eeg signal. after collecting the first participants, we checked that overall ssvep amplitudes for our two target frequencies ( and hz) were robust when collapsed across conditions (fig a) before proceeding with data collection. we indeed found that the ssvep signal was robust during the stimulus period even with n= for both the hz frequency (mean snr = . , sd = . , snr > : p < . ) and for the hz frequency (mean snr = . , sd = . , snr > : p < . ). these values were similar for the full n= sample (fig b). to compute an effect size, we compared snr values for each target frequency ( hz and hz) to the snr values for each baseline frequency (frequencies from - hz not within +/- hz of or hz). snr values for the target frequencies were significantly higher than baseline, mean cohen's d = . (sd = . ) and . (sd = . ), respectively (see figure s ). as planned, we also confirmed that the electrodes we selected a priori (o , oz, and o ) were reasonable given the topography of overall ssvep amplitudes (i.e., they fell approximately centrally within the brightest portion of the heat map; figure experimental conditions. color scale indicates snr. as expected, the a priori electrodes o , o , and oz (magenta circles) showed robust snr during the stimulus period. next, we checked for a basic attention effect, defined as a larger amplitude response evoked by the attended frequency compared to the ignored frequency. note, for the sake of clarity, all conditions are translated into "attend" terminology. that is, if a participant was cued to "ignore blue" ( hz) during the "ignore cue" condition (and the other color was red and hz), this will instead be plotted as "attend red" ( hz). 
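The translation of both cue conditions into "attend" terminology can be sketched as a small recoding step. The function name and the example frequencies (24 and 30 hz) are hypothetical; the actual stimulation frequencies are elided in the text.

```python
def attended_frequency(cue_type, cued_hz, other_hz):
    """Return the frequency of the color the participant should attend.

    In the attend-cue condition the cued color is the attended one; in the
    ignore-cue condition the *other* (uncued) color is the attended one.
    """
    if cue_type == "attend":
        return cued_hz
    if cue_type == "ignore":
        return other_hz
    raise ValueError("cue_type must be 'attend' or 'ignore'")


# Hypothetical trial: blue flickers at 24 hz, red at 30 hz.
# "ignore blue" is recoded as "attend red".
recoded = attended_frequency("ignore", cued_hz=24, other_hz=30)
```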
figure shows the gaussian wavelet-derived frequency spectra during the stimulus period ( - ms) as a function of cue type (attend versus ignore) and attended frequency (attend hz or attend hz). we found a main effect of measured frequency, whereby snr was overall higher for versus hz, f( , ) = . , p < . , η p = . . however, we found no main effect of attended frequency (p = . ) or cue type (p = . ), and we found no significant interactions (p >= . ). collapsed down to a paired t-test, the observed effect size for attended versus unattended snr values was cohen's d = . . to detect an effect of this size with % power ( -β = . ; α = . ) would require a sample size n > , * . given that we did not find an overall attention effect, we did not analyze or interpret analysis of the ssvep time-course. however, for completeness we have shown the time course in figure s . frequency spectra in the attend cue (a) and ignore cue (b) conditions during the stimulus period. although we observe expected peaks at hz and hz, this ssvep response is not modulated by the attention manipulation. (c-d). violin plots of the signal-to-noise ratio at the ssvep frequencies in the attend cue (c) and ignore cue (d) conditions. although we pre-registered that we would analyze all trials (those with and without target/distractor events), most prior studies have included only trials without any target or distractor events in the main ssvep analysis (e.g., andersen et al., ; müller et al., ) . to ensure that our null result was not due to this analysis choice, we also planned in our pre-registration to examine the ssvep attention effect for trials with and without target and distractor events. when restricting our analysis to only trials without targets or distractors ( % of the trials, or trials total before artifact rejection), we likewise found no attention effect. as before, we found a main effect of measured frequency ( > hz), p < . , but no effect of cue condition (p = . 
) or attended frequency (p = . ), and, most critically, we found no interaction between measured frequency and attended frequency (p = . ). frequency spectra for all combinations of target and distractor presence are shown in figures s and s . finally, we also pre-registered that we would check whether the similarity of the target and distractor colors ( versus degrees apart on a circular color wheel; figure b) would modulate the ssvep attention effect. we likewise found that the similarity of the distractor colors did not significantly modulate the ssvep response, and we found no attention effect (interaction of measured frequency and attended frequency) in either color distance condition (p >= . ; figure s ). we conducted additional control analyses to rule out possible sources of our failure to find an attention effect. first, we examined the photodiode recording to rule out any failures due to trial indexing. the photodiode measured the luminance of a white dot that flickered at the attended frequency on each trial. as expected, performing an fft on the photodiode time-course thus yielded near-perfect tracking of the attended frequency ( figure a-b, p < . ) . on the other hand, we again found null results for the main attention manipulation ( figure c -f) when using an fft analysis that more closely followed prior work. we ran a repeated measures anova on the signal to noise ratio values during the stimulus period, including the factors measured frequency ( hz, hz), attended frequency ( hz, hz), and cue type (attend, ignore). we found no main effect of measured frequency (p = . ), attended frequency (p = . ), or cue type (p = . ), and we found no significant interactions (p >= . ). however, the average signal- to-noise ratio of the stimulus frequencies was overall robust (m = . , sd = . , greater than chance value of : p < x - ), so our inability to observe the attention effect was not due to lack of overall ssvep signal. 
given that some work has reported significant effects only for the second harmonic figures s and s ) . we found no significant attention effects for either second harmonic frequency. we also re-ran the fft analysis with other electrode-selection methods to ensure our a priori choice of electrodes did not impede our ability to observe an effect (m. x. cohen & gulbinaite, ). we found no evidence that electrode choice led to our null effect, as exploiting information from all electrodes by implementing rhythmic entrainment source separation (ress) likewise yielded null effects ( figure s -s ). to ensure that inconsistent task performance did not lead to null effects, we repeated the main fft analysis on only accurate trials. we likewise found null attention effects when analyzing only accurate trials ( figure s ) . finally, we tested whether phase consistency, rather than power, may track attention in our task (e.g., nunez et al., ; tallon-baudry et al., ) . to do so, we performed an fft on single trials rather than on condition-averaged waveforms, and we extracted single-trial phase values. we calculated a phase-locking index by computing mean-resultant vector length on each condition's histogram of single-trial phase values. mean-resultant vector length ranges from (fully random values) to (perfectly identical values), for reference, see zar ( ) . we found no effect of attention on this phase- locking index ( figure s ). was not due to using gaussian wavelets rather than an fft, we repeated the main analysis with an fft. frequency spectra for the attend cue condition (c) and ignore cue condition (d) reveal an overall robust ssvep signal at hz and hz, but no modulation by attention. likewise, violin plots of signal-to-noise ratios again show robust signal but no modulation by attention in either the attend cue condition (e) or the ignore cue condition (f). positive control: analysis of event-related potential (p b) for an attention effect. 
consistent with prior work, we found a significantly larger p component for target onsets compared to distractor onsets (figure ). a repeated measures anova with within-subjects factors cue type (attend cue or ignore cue) and event type (target or distractor onset) revealed a robust main effect of event type (target > distractor), f( , ) = . , p < x - , η p = . , and a main effect of cue type (attend > ignore), f( , ) = . , p = . , η p = . , but no interaction between event type and cue type (p = . ). control analyses confirmed this p modulation was not due to differential rates of making a motor response for targets and distractors (figure s ). the main effect of event type (target > distractor) remained when analyzing only trials where participants made a motor response (p < . ). thus, the p was overall larger for target than distractor events, consistent with prior work that found this erp attention effect alongside an ssvep attention effect. we defined "feature-based attention manipulation" as having the following characteristics: ( ) participants were cued to attend a feature(s) within a feature dimension (e.g., attend red, ignore blue) rather than across a feature dimension (e.g., attend contrast, ignore orientation), ( ) the attended and ignored feature were both frequency-tagged in the same trials (rather than only feature tagged per trial), ( ) each frequency was both "attended" and "ignored" on different trials, so that the amplitude of a given frequency could be examined as a function of attention, ( ) the task could not be performed by adopting a strategy of splitting spatial attention to separate spatial locations. after applying these screening criteria, some of the studies that we initially we identified a total of experiments from unique papers (tables s -s ) meeting our inclusion criteria. 
from these experiments, we quantified variables such as the number of subjects, number of trials, frequencies used, and the presence or absence of an attention effect in the expected direction (attended > ignored). if more than one group of participants was used (e.g., an older adults group) then we included the study but only quantified results for the healthy young adult group (quigley et al., ; quigley & müller, ) . the tasks used in these studies fell broadly into one of categories: ( ) a competing gratings task, ( ) a whole-field flicker task, ( ) a hemifield flicker task and ( ) a central task with peripheral flicker. in the competing gratings task (table s ), participants viewed a stream of centrally-presented, superimposed gratings (e.g., a red horizontal grating and a green vertical grating). because colored, oriented gratings were typically used, participants could thus generally choose to attend based on either one or both features (color and/or orientation). each grating flickered at its own frequency (e.g. green grating shown at . hz, red grating shown at . hz, as in chen et al., ) . because the gratings were superimposed, on any given frame only one of the two gratings was shown. on frames where both gratings should be presented according to their flicker frequencies, a hybrid "plaid" stimulus was shown. studies using a competing gratings task include: (allison et in the whole-field flicker task (table s ) , participants viewed a spatially global stimulus comprised of small, intermingled dots or lines. typically, half of the dots or lines were presented in one feature (e.g., red) and the other half of the lines were presented in another (e.g., blue); each set of dots flickered at a unique frequency. although the most common attended feature was color, some task variants included ( ) attending high or low contrast stimuli ( ) attending a particular orientation or ( ) attending a particular conjunction of color and orientation. 
the whole-field flicker task was the most common task variant, and it is also most similar to the task performed here. studies using a whole- first, we examined whether insufficient power (e.g., fewer subjects and/or trials relative to prior work) could have led to our failure to detect an attention effect. the number of studies employing each task variant is plotted in figure a , the number of subjects per experiment is plotted in figure b , the number of trials per experiment is plotted in figure c , and stimulus duration is plotted in figure d next, we examined the percentage of trials where the attended feature was repeated (e.g., if the attended color was red on trial n, what was the chance that red would also be attended on trial n+ ?). the priming-based account of feature-based attention posits that participants cannot use trial-by-trial cues to enhance a particular feature, but rather, feature-based enhancement happens automatically when a particular feature is repeated (theeuwes, ) . thus, if there is a substantial proportion of trials where the repeated color was attended (e.g. with possible colors, both the attended and ignored color will be repeated on % of trials), then the observed attentional enhancement effects might be driven primarily by incidental repetitions of attended features. in our study, we used different colors to reduce the potential effect of inter-trial priming on the observed ssvep attention effects ( % repeats of the attended color, % repeats of the attended color and the ignored color). we quantified the approximate percentage of trials on which an attended feature on one trial is repeated on the next trial (within a given block of trials). in some studies, participants were cued to attend more than one feature on a given trial, or they sometimes attended to a conjunction of features. 
in these cases, we in the studies with % repeats might be equally attributed to their low trial counts (median of repeats as the present study (störmer & alvarez, ). störmer and alvarez found a significant attention effect while using unique colors (intermixed randomly from trial to trial). the findings by störmer and alvarez provide evidence against the feature-based priming account, and suggest that the task factor "number of colors" cannot definitively explain our inability to observe an attention effect. however, given the lack of extant work using unpredictable color cues, we think future systematic work is needed to determine the degree to which inter-trial priming effects may modulate the size and reliability of feature-based attention effects. ssvep frequencies. we examined frequencies that have been most commonly used in the literature. in our study, we chose relatively high frequencies ( and hz) in order to have increased temporal resolution for detecting potential time-course effects. in addition, some have argued that using higher frequencies has advantages for driving a more finally, we examined whether the type of task and task difficulty may have influenced our ability to detect an attention effect. in particular, the specific targets that we used may differ slightly from prior work. in our experiment, participants detected a brief period ( ms) of an on average ~ % coherent orientation (the coherent line orientation was a random, unpredictable direction, from - degrees). in this task, participants performed well above chance, but the task was still fairly challenging overall (d' = . ). this raises the possibility that, compared to prior ssvep studies, subjects were giving up on some percentage of the trials and that this contributed to the lack of attention effects. 
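For readers unfamiliar with the sensitivity index reported above, d' is conventionally the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below illustrates that standard definition only. The text does not state how hit and false-alarm rates were computed or corrected in this study, so the log-linear correction (adding 0.5 to each count) is an assumption made here to keep the z-scores finite, and the function name and trial counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Uses a log-linear correction (assumption, not taken from the study)
    so that rates of exactly 0 or 1 do not yield infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: chance-level responding versus good discrimination.
at_chance = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
good = d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90)
```

In a detection task like the one described, a hit would be a response to a target event and a false alarm a response when no target was present; a d' of zero means responses carried no information about target presence.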
for the reviewed papers in which participants detected a target within the flickering stimulus ("whole-field flicker task" and "hemifield flicker task"), we compiled information about participants' accuracy, the duration of the target, the type of target, and the percentage of dots/lines that comprised the target (table s ). we found that our particular task (detect a coherent orientation in the cued color) was slightly different from the other tasks that have been used. two other prior studies did not use a behavioral task at all: participants were simply instructed to monitor a particular feature without making any overt response (pei et although the particulars of the luminance and motion tasks subtly differ from our orientation task, it is not clear why ssveps would track attention when the target is a coherent luminance value or motion direction, but not when the target is a coherent condition. thus, because we found no overall ssvep attention effect, we were unable to test our hypotheses about how this attention effect was modulated by being cued to attend versus cued to ignore. despite the lack of an ssvep attention effect, positive control analyses indicated that participants did successfully select the cued target color (i.e., we observed a significantly larger p component for target events in the attended color than in the ignored color). given our failure to observe an effect of attention on ssvep amplitude with our task procedures, we performed a focused review of the literature to quantify key methodological aspects of prior studies using ssveps to study feature-based attention. based on this review, we concluded that sample size and trial counts likely did not explain our failure to find an effect; our sample size and trial counts were near the maximum values found in the surveyed literature. likewise, the range of accuracy values found in the literature suggests that task difficulty does not explain our failure to find an attention effect. 
however, two key, intentional design differences may have hampered our ability to find an effect: ( ) the number of colors in our stimulus set and ( ) the frequencies used to generate the ssvep. the first key design difference in our study was the number of colors in our stimulus set. we purposefully minimized the influence of inter-trial priming on our estimates of feature-based attention (theeuwes, ) by using unique colors and randomly choosing target and distractor colors on each trial. according to a priming account of feature-based attention, a relatively high proportion of trials where the attended color is repeated back-to-back could inflate or even entirely drive apparent feature-based attention effects. using colors somewhat protects against this possibility, because it ensures that the attended color is repeated on % of trials, and both the attended/ignored colors are repeated on only % of trials. in the literature, we found that most studies had back-to-back color repeats on at least % of trials. it is thus plausible that inter-trial priming could contribute to observed attention differences in these studies. contrary to a priming account, however, one study found robust feature-based attention effects using a set of unique colors (störmer & alvarez, ), suggesting that participants can use a cue to direct feature-based attention even when the proportion of repeated trials is relatively low. to date, however, no study has directly manipulated the proportion of repeated trials or the number of possible stimulus colors in an ssvep study. given emerging evidence that history-driven effects play an important role in shaping both spatial and feature-based attentional selection ( the second key design difference in our study was the chosen set of frequencies. to ensure adequate temporal resolution to characterize time-course effects, we chose to use slightly higher frequencies ( and hz).
we believed these values would be reasonable, because an initial study of the time-course of spatial attention used ssvep it is perhaps puzzling that frequencies above hz have been commonly used in the spatial attention literature but have not been used in the feature-based attention literature. the truncation of the frequency distribution in the reviewed literature could be a piece of the "file drawer" in action. it is possible that other researchers likewise discovered that they were unable to track feature-based attention using certain frequencies, but that these null results were never published due to journals' and authors' biases toward publishing positive results (ferguson & heene, ; rosenthal, ) and biases against publishing negative results (i.e., "censoring of null results", guan & vandekerckhove, ; sterling, ; sterling et al., ) . thus, our results highlight the practical and theoretical importance of regularly publishing null results. on the practical side, if prior null results had been published, we may have better known which frequencies to use or avoid, and we would have been able to test our key hypotheses. on the theoretical side, our results highlight how seemingly unimportant null results can have implications for theory when viewed in the context of the broader literature. for example, if certain frequencies track spatial but not feature-based attention, this may inform our understanding of the brain networks and cognitive processes differentially modulated by flicker frequency (ding et al., ; srinivasan et al., ) . in short, we found no evidence that ssveps track the deployment of feature-based attention with our procedures, and future methodological work is needed to determine constraints on generalizability of the ssvep method for tracking feature-based attention. 
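as a worked illustration of the inter-trial priming factor discussed above, the sketch below computes how often colors repeat on consecutive trials for a stimulus set of a given size. it assumes that the attended and ignored colors are drawn uniformly at random, without replacement, and independently of the previous trial; the function name and the set size used in the example are hypothetical and are not taken from the study.

```python
from fractions import Fraction


def repeat_probabilities(n_colors):
    """Return (P(attended color repeats), P(attended and ignored both repeat)).

    Assumption: on every trial the attended color is drawn uniformly from
    n_colors, and the ignored color uniformly from the remaining n_colors - 1.
    """
    if n_colors < 2:
        raise ValueError("need at least two colors (one attended, one ignored)")
    # The new attended color matches the previous one with probability 1/n.
    p_attended = Fraction(1, n_colors)
    # Given that, the ignored color matches with probability 1/(n - 1).
    p_both = p_attended * Fraction(1, n_colors - 1)
    return p_attended, p_both


if __name__ == "__main__":
    # Illustrative set size only.
    p_att, p_both = repeat_probabilities(4)
    print(float(p_att), float(p_both))
```

under these assumptions both proportions shrink as the stimulus set grows, which is the sense in which a larger set of colors protects against back-to-back repeats.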
we performed a focused review of prior studies using ssveps to study feature-based attention, and from this review we identified two key factors (frequencies used; likelihood of inter-trial feature priming) that should be systematically investigated in future work. history-driven modulations of population codes in early visual cortex during visual search top-down attention is limited within but not between feature dimensions the berger rhythm: potential changes from the towards an independent brain-computer interface using steady state visual evoked potentials effects of feature-selective and spatial attention at different stages of visual processing attention facilitates multiple stimulus features in parallel in human visual cortex global facilitation of attended features is obligatory and restricts divided attention behavioral performance follows the time course of neural facilitation and suppression during cued shifts of feature-selective attention color-selective attention need not be mediated by spatial attention tracking the allocation of attention in visual scenes with steady-state evoked potentials attentional selection of feature conjunctions is accomplished by parallel and independent selection of single features bottom-up biases in feature-selective too little, too late, and in the wrong place: alpha band activity does not reflect an active mechanism of selective attention attentive and pre-attentive aspects of figural processing templates for rejection: configuring attention to ignore task-irrelevant features top-down versus bottom-up attentional control: a failed theoretical dichotomy evidence for negative feature guidance in visual search is explained by spatial recoding no templates for rejection: a failure to configure attention to ignore task-irrelevant features how many trials does it take to get a significant erp effect? 
it depends attention to a threat-related feature does not interfere with concurrent attentive feature selection the psychophysics toolbox distinct attention networks for feature enhancement and suppression in vision high gamma power is phase-locked to theta oscillations in human neocortex location-based explanations do not account for active attentional suppression feature-based attention resolves differences in target-distractor similarity through multiple mechanisms the power of human brain magnetoencephalographic signals can be modulated up or down by changes in an attentive visual task tracking feature-based attention normal electrocortical facilitation but abnormal target identification during visual sustained attention in schizophrenia using neuronal populations to study the mechanisms underlying spatial and feature attention rhythmic entrainment source separation: optimizing analyses of neural responses to rhythmic sensory stimulation exceeding chance level by chance: the caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy feature guidance by negative attentional templates depends on search difficulty taming the white bear: initial costs and eventual benefits of distractor inhibition attentional modulation of ssvep power depends on the network tagged by the flicker frequency cortical mechanisms of prioritizing selection for rejection in visual search statistical regularities induce spatial as well as feature-specific suppression a vast graveyard of undead theories: publication bias and psychological science's aversion to the null global enhancement but local suppression in feature-based attention near-real-time feature-selective feature-based attention is constrained to attended locations in older adults a bayesian approach to mitigation of publication bias attention differentially modulates the amplitude of resonance frequencies in the visual cortex feature-based attentional tuning during 
biological motion detection measured with ssvep the functional organization of human extrastriate cortex: a pet-rcbf study of selective attention to faces and locations human eeg responses to ? hz flicker: resonance phenomena in visual cortex and their potential correlation to cognitive phenomena lip responses to a popout stimulus are reduced if it is overtly ignored when conflict cannot be avoided relative contributions of early selection and frontal executive control in mitigating stroop conflict temporal dynamics of divided spatial attention a category-specific top-down attentional set can affect the neural responses outside the current focus of attention sensation luminance: a new name to distinguish cie luminance from luminance dependent on an individual's spectral sensitivity the effects of electrode impedance on data quality and statistical significance in erp recordings time courses of attentional modulation in neural amplification and synchronization measured with steady-state visual-evoked potentials audio-visual synchrony and feature-selective attention co- amplify early visual processing attention induces synchronization-based response gain in steady-state visual evoked potentials what's new in psychtoolbox- ? estimates of a priori power and false discovery rates induced by post-hoc changes from thousands of independent replications large-scale network-level processes during entrainment an introduction to the event-related potential technique feature-based attention increases the selectivity of population responses in primate visual cortex cortical summation and attentional modulation of combined chromatic and luminance signals neural mechanisms of divided feature-selective attention to colour the ignoring paradox: cueing distractor features leads first to selection, then to inhibition of to-be-ignored items. 
attention, perception feature-based attention selective attention to stimulus location modulates the steady-state visual evoked potential feature-selective attention enhances color signals in early visual areas of the human brain it takes two to tango: suppression of task-irrelevant features requires (spatial) competition concurrent recording of steady-state and transient event- related potentials as indices of visual-spatial selective attention effects of spatial selective attention on the steady-state visual evoked potential in the - hz range the time course of cortical facilitation during cued shifts of spatial attention the steady-state visual evoked potential in vision research: a review individual differences in attention influence perceptual decision making memory for object features versus memory for object location: a positron-emission tomography study of encoding and retrieval processes causal involvement of visual area mt in global feature-based enhancement but not contingent attentional capture neural responses to target features outside a search array are enhanced during conjunction but not unique- neural correlates of object-based attention the videotoolbox software for visual psychophysics: transforming numbers into movies feature- selective attention: evidence for a decline in old age feature-selective attention in healthy old age: a selective decline in selective attention cortical evidence for negative search templates steady-state evoked potentials the file drawer problem and tolerance for null results expectations do not alter early sensory processing during perceptual decision-making capture versus suppression of attention by salient singletons: electrophysiological evidence for an automatic attend-to-me signal steady-state visual evoked potentials distributed local sources and wave-like dynamics are sensitive to flicker frequency rapid adaptive adjustments of selective attention following errors revealed by the time course of steady-state 
visual evoked potentials publication decisions and their possible effects on inferences drawn from tests of significance-or vice versa publication decisions revisited the effect of the outcome of statistical tests on the decision to publish and vice versa feature-based attention elicits surround suppression in feature space stimulus specificity of phase-locked and non-phase-locked hz visual responses in human. the journal of neuroscience attentional capacity for processing concurrent stimuli is larger across sensory modalities than within a modality feature-based attention: it is all bottom-up priming selection of visual objects in perception and working memory one at a time using frequency tagging to quantify attentional deployment in a visual divided attention task inhibition in selective attention experience-dependent attentional tuning of distractor rejection attention selects informative neural populations in human v steady-state visually evoked potentials: focus on essential paradigms and future perspectives protecting visual short- term memory during maintenance: attentional modulation of target and distractor representations statistical regularities modulate attentional capture how to inhibit a distractor location? statistical learning versus active, top-down suppression the neural correlates of feature-based selective attention when viewing spatially and temporally overlapping images effect of higher frequency on the classification of steady-state visual evoked potentials biostatistical analysis an independent brain- computer interface using covert non-spatial visual selective attention feature-based attention modulates feedforward visual processing orientation. for example, just like in the coherent motion direction tasks used by others, the angle of the coherent orientation in our task was completely unpredictable. 
thus, participants in our task and in other tasks could not form a template of an orientation or motion direction they should attend in advance and instead had to attend to an orthogonal feature dimension such as color. in addition, in both prior tasks and the current task there were an equal number of coherent events in the cued and uncued color. if participants failed to attend to the cued color and instead responded to any orientation event, their performance in the task would be at chance. behavioral performance in the reviewed studies ranged from d' = . to d' = . (table s ). in many studies, performance was quite high (d' > or accuracy > %) relative to performance in our study (adamian et in this pre-registered study, we sought to test whether cuing participants to ignore however, we failed to replicate this basic overall attention effect; we found no difference in ssvep amplitude as a function of attention in either the attend cue or the ignore cue key: cord- -a fsr ys authors: zeng, zipeng; huang, biao; parvez, riana k.; li, yidan; chen, jyunhao; vonk, ariel; thornton, matthew e.; patel, tadrushi; rutledge, elisabeth a.; kim, albert d.; yu, jingying; pastor-soler, nuria; hallows, kenneth r.; grubbs, brendan h.; mcmahon, jill a.; mcmahon, andrew p.; li, zhongwei title: generation of kidney ureteric bud and collecting duct organoids that recapitulate kidney branching morphogenesis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: a fsr ys kidney organoids model development and diseases of the nephron but not the contiguous epithelial network of the kidney’s collecting duct (cd) system. here, we report the generation of an expandable, d branching ureteric bud (ub) organoid culture model that can be derived from primary ub progenitors from mouse and human fetal kidneys, or generated de novo from pluripotent human stem cells. 
ub organoids differentiate into cd organoids in vitro, with differentiated cell types adopting spatial assemblies reflective of the adult kidney collecting system. aggregating d-cultured nephron progenitor cells with ub organoids in vitro results in a reiterative process of branching morphogenesis and nephron induction, similar to kidney development. combining efficient gene editing with the ub organoid model will facilitate an enhanced understanding of development, regeneration and diseases of the mammalian collecting system. one sentence summary collecting duct organoids derived from primary mouse and human ureteric bud progenitor cells and human pluripotent stem cells provide an in vitro platform for genetic dissection of development, regeneration and diseases of the mammalian collecting system. the mammalian kidney contains thousands of nephrons, connected to a highly branched collecting duct (cd) system. nephrons filter and process the blood to form the primitive urine, which is collected and further refined in the cd system to adjust water, electrolytes and ph and to maintain the homeostasis of tissue fluid , . the complex and elaborate kidney is largely formed from the reciprocal interactions of two embryonic cell populations: the epithelial ureteric bud (ub); and the surrounding metanephric mesenchyme (mm). signals from the mm induce the repeated branching of ub, which gives rise to the entire cd network. meanwhile, signals from the ub induce the mm to form nephrons , . given this central role of the ub in kidney organogenesis, defects in ub/cd development often lead to malformation of the kidney, low endowment of nephrons at birth, and congenital anomalies of kidney and urinary tract (cakut) , , . thus, a better understanding of kidney branching morphogenesis is needed for in vitro efforts towards rebuilding the kidney. it is also required for developing novel preventive, diagnostic and therapeutic approaches for various kidney diseases. 
three-dimensional ( d) multicellular mini-organ structures, or organoids, have broad applications for modeling organ development and disease, and for regenerating organs through cell or tissue replacement therapies , . recently, we and others have been able to generate kidney organoids from human pluripotent stem cells [ ] [ ] [ ] [ ] or from expandable nephron progenitor cells (npcs) - . these organoids have greatly aided studies of the role of nephrons in kidney development and disease . however, we still lack a robust kidney organoid model that can recapitulate ub branching morphogenesis and its maturation into the renal cd network, despite previous efforts relying on primary mouse/rat tissue [ ] [ ] [ ] [ ] [ ] or human pluripotent stem cells [ ] [ ] [ ] [ ] . here, we report the development of a d organoid model that mimics the full spectrum of kidney branching morphogenesis in vitro-from the expandable immature ub progenitor stage, to the mature cd stage. these organoids, derived from primary ub progenitor cells and human pluripotent stem cells, are amenable to efficient gene editing, and have broad applications for studying kidney development, regeneration and disease. we previously developed a d culture system for the long-term expansion of mouse and human nephron progenitor cells (npcs), which can generate nephron organoids that recapitulate kidney development and disease , . ub branching morphogenesis is driven by another kidney progenitor population, the ub progenitor cells (upcs). we thus set out to establish a culture system for the expansion of upcs. upcs are specified around embryonic day . (e . ), when the ub starts to invade the mm. upcs disappear around postnatal day (p ), when nephrogenesis ceases. self-renewing upcs reside in the tip region of the branching ub. during their approximately -day lifespan, some upcs migrate out of ub tip niche to the ub trunk, and differentiate into the renal cd network. 
other upcs proliferate and replenish the self-renewing progenitor cell population of the ub tip. ret , and wnt have been identified as specific markers for upcs, and the transgenic reporter mouse strain wnt -myrtagrfp-ires-ce ("wnt -rfp" for short) has been generated to facilitate the real-time tracking of wnt -expressing cells based on rfp expression, and the lineage tracing of their progeny via a cre-mediated recombination system . we employed this wnt -rfp reporter system as a readout to screen for a culture condition that maintained the progenitor identity of upcs in vitro. t-shaped ubs were manually isolated from e . kidneys of wnt -rfp mice, and immediately embedded into matrigel to set up a d culture platform that supported epithelial branching. using this d culture format, hundreds of different combinations of growth factors and small molecules were tested, following strategies similar to those we used to establish optimal npc culture . this screening allowed us to identify a cocktail, which we named "ub culture medium" (ubcm), that maintained self-renewing upcs as a d branching ub organoid (fig. a). under this culture condition, the t-shaped ub formed a rapidly expanding branching epithelial morphology, with uniform wnt -rfp expression maintained throughout the d structure (fig. b). resected wnt -rfp + ub organoid tips, re-embedded in matrigel, branched and grew into additional wnt -rfp + ub organoids. repetitive passaging and embedding for up to weeks resulted in over a hundred thousand-fold expansion in the number of cells (fig. c). wnt -rfp levels remained uniform for the first days and progressively dropped thereafter, similar to the normal course of upcs in vivo (fig. c).
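to put the reported expansion in perspective, a fold expansion can be converted into a number of population doublings. the sketch below is a minimal illustration that assumes steady exponential growth with no cell loss; the function names are hypothetical, and the culture duration in the second example is a placeholder because the duration is not recoverable from the text.

```python
import math


def population_doublings(fold_expansion):
    """Number of doublings needed to reach a given fold expansion."""
    if fold_expansion <= 0:
        raise ValueError("fold expansion must be positive")
    return math.log2(fold_expansion)


def doubling_time_days(fold_expansion, culture_days):
    """Average doubling time, assuming constant exponential growth."""
    return culture_days / population_doublings(fold_expansion)


if __name__ == "__main__":
    # A hundred thousand-fold expansion corresponds to about 16.6 doublings.
    print(round(population_doublings(1e5), 1))
    # Placeholder example: 1024-fold expansion over 20 days (hypothetical).
    print(doubling_time_days(1024, 20))
```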
consistent with the uniform expression of wnt -rfp in the entire ub organoid, whole-mount immunostaining of cultures after days confirmed the homogenous expression of upc markers ret, etv , and sox , as well as broad ub lineage markers gata , pax , pax , krt , and cdh , in the ub organoids (fig. d ,e; extended data fig. a,b). to better define the identity of the ub organoids, we used rna-seq to profile the transcriptome of the organoids after days and days in culture. these data were compared with prior rna-seq data for primary ub tip and ub trunk populations , as well as for npcs and interstitial progenitor cells (ipcs) . unsupervised clustering (extended data fig. c ) and principal component analysis (pca) (fig. f) placed the cultured ub organoids closer to the primary ub tip samples than to differentiated stalk derivatives of the ub trunk (extended data fig. c). taken together, these findings indicate the ub organoid culture system enables a substantial expansion of cells retaining molecular characteristics of upcs in vitro. next, we tested whether the ub organoid culture system can be applied to mouse strains other than wnt -rfp. for this, we successfully derived ub organoids from e . ub from swiss webster, a wild-type mouse strain, and from multiple transgenic mouse strains including hoxb -venus , sox -gfp , and rosa -cas /gfp (extended data fig. d ,e and data not shown). all of these ub organoids retained the typical branching morphology and showed very similar growth rates, compared to wnt -rfp ub organoids (fig. g), indicating the robustness of the d/ubcm culture system. moreover, ub organoids still self-organized into branching organoids after a freeze-thaw process, adding flexibility to the culture system with regard to cell storage (extended data fig. f). in determining whether we had developed a synthetic niche for upc self-renewal, the most stringent test was whether the ubcm culture condition was able to derive ub organoids from single cells.
for this purpose, we dissociated e . ubs into single cells before embedding them into matrigel and culturing in ubcm medium. around % of the single cells self-organized into e . ub-like budding structures within days, though a smaller percentage ( - %) maintained wnt -rfp ( fig. h; extended data fig. g,h) , an efficiency similar to clonal organoid formation for lgr + intestinal stem cells . importantly, the clonally-derived wnt -rfp + budding structures were identical to intact e . ub-derived organoids in both branching morphology and growth rate ( fig. g-i) . furthermore, withdrawal of the major medium components from ubcm resulted in either growth arrest (chir ) or rapid loss of wnt -rfp (all other components), suggesting that each component was essential for optimal ub organoid culture (extended data fig. a-c) . these data, taken together, suggest that ubcm represents a synthetic niche for the in vitro expansion of upcs. the functions of the mature renal cd system are carried out by two major cell populations that are intermingled throughout the entire cd network. the more abundant principal cells (pcs) concentrate the urine and regulate na + /k + homeostasis via water and na + /k + transporters. the less abundant α-and β-intercalated cells (ics) regulate normal acid-base homeostasis via secretion of h + or hco into the urine. the absence of an in vitro system recapitulating pc and ic development in an appropriate d context, constrains physiological exploration, disease modeling and drug screening on the renal cd system. with this limitation in mind, we developed a screen to establish conditions supporting the differentiation of cd organoids, assaying expression of aqp and foxi , definitive markers for pc and ic lineages, respectively, by quantitative reverse transcription pcr (qrt-pcr), following days of culture under variable but defined culture conditions (fig. a) . 
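for readers unfamiliar with how qrt-pcr readouts such as those used in this screen are commonly summarized, the following is a minimal sketch of the widely used 2^(-ΔΔct) relative quantification. the source does not state which quantification method was used, so this is an assumption; the function and argument names are hypothetical, and the calculation presumes near-100% amplification efficiency and a stable reference gene.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    Assumptions: amplification efficiency is close to 100% for both genes,
    and the reference gene is unaffected by the culture condition.
    """
    # Normalize the target gene to the reference gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated sample to the control condition.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical ct values: the target crosses threshold 3 cycles earlier
    # in the treated sample, which corresponds to an 8-fold increase.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))
```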
in a first round of screening, we determined the base condition in which minimal growth factors/small molecules sustained the survival of the organoids and permitted their differentiation. the base medium used for ubcm, hbi, was tested, together with the commercially available apel medium for sustaining kidney organoid generation . combinations of fgf , egf and y (empirically determined) were tested, together with the two different base media (extended data fig. a ). after days of differentiation in the various conditions, we observed that the hbi+fgf +y condition enabled the survival of the organoid and permitted spontaneous basal differentiation, as shown by the modest elevation of both aqp and foxi (extended data fig. b,c). to enhance the efficiency of differentiation, we carried out a second round of screening to identify molecules that strongly induced the expression of aqp and/or foxi under the hbi+fgf +y condition. agonists or antagonists targeting major developmental pathways (tgf-β, bmp, wnt, fgf, hedgehog and notch) were tested, together with hormonal inputs known to regulate pc or ic activity (aldosterone and vasopressin). bmp , dapt (a notch pathway inhibitor), jaki (jak inhibitor i) and pd (mek inhibitor) dramatically increased both aqp and foxi expression, while jag- (notch agonist) and aldosterone led to a preferential increase in foxi expression, and vasopressin to enhanced aqp expression (extended data fig. d-f ). in a third round of screening, testing of various combinations of these factors led to the identification of an optimized cd differentiation medium (cddm) supplemented with fgf , y , dapt, pd , aldosterone and vasopressin.
seven days of ub organoid culture in cddm resulted in a marked decrease in expression of the upc genes (wnt and ret) and a concomitant elevation in the expression of pc-specific water transporter encoding genes (aqp and aqp ) and ic-specific transcription factor (foxi ), proton pump (atp v b ) and cl -/hco exchangers (slc a /ae , α-ic-specific; slc a /pendrin β-ic specific) (fig. b) . immunostaining confirmed the presence of aqp , aqp , foxi and tfcp l in cd organoids (fig. c-e; extended data fig. g,h) . differentiating cd organoids displayed a clear lumen (fig. c) , and the organization of pc and ic cell types reflected that of the kidney cd: aqp + /aqp + pcs comprised - % of the cells, and foxi + /atp v b + /tfcp l + /kit + , ics were widely dispersed, solitary cells (fig. c,d; extended data fig. g,h) . further, aqp and aqp showed a differential subcellular localization, aqp to the apical luminal facing surface and aqp to basolateral plasma membrane, reflecting an in vivo pc distribution essential to the physiological action of pcs ( fig. c-e) . more importantly, the cddm differentiation protocol was highly reproducible when testing ub organoids derived from different genetic backgrounds (extended data fig. i ). with the capacity to produce large quantities of npcs and upcs through in vitro expansion culture, we examined whether combining these cell types generated a model mimicking key features of in vivo kidney development, such as reiterative ureteric branching and nephron induction, and morphogenesis and patterning of differentiating derivatives (fig. a) . npcs in our long-term culture model grow as d aggregates , . to mimic the natural organization of npcs capping ub tips in the kidney anlagen, we manually excavated a hole in d cultured npcs and inserted a cultured ub organoid tip into this cavity. the synthetic structures were then embedded into matrigel and transferred onto an air-liquid interface (ali) to facilitate further kidney organogenesis . 
over days of culture, the ub underwent extensive branching (fig. b), generating a krt + tubular network extending from the center of the structure into the periphery. further, npcs generated nephron-like cell types and structures such as podxl+/wt + podocytes and ltl+ proximal tubules (fig. c). to determine whether the synthetic kidney also formed a connection between nephron and cd, we generated synthetic kidney from hoxb -venus ub and wild-type npcs. in this way, all progeny of the ub organoid could be tracked by venus expression. co-staining of the synthetic kidney structure with antibodies against cdh and gata identified a clear fusion of cdh + /venus - distal nephron with cdh + /venus + cd. importantly, gata expression was strong in the entire venus + cd structure, but progressively dropped along the distal-to-proximal axis of the distal nephron, as observed in vivo , (fig. d). thus, the synthetic kidney established an interconnection between the nephron and cd similar to that observed in vivo. approximately % of synthetic kidneys underwent a similar program, indicating in vitro self-organizing development was robust (fig. e). most failures likely reflected technical differences in manual reconstruction. taken together, synthetic kidney with interconnected nephron and cd can be efficiently generated from expandable npcs and ubs. the ub and cd models could provide an accessible in vitro complement to the mouse models for in-depth mechanistic studies and drug screening. here, efficient gene overexpression (oe) or gene knockout (ko) would significantly extend the capability and utility of the in vitro model (fig. f ). as a proof of concept, gfp oe and gfp ko ub organoids were generated. for gfp oe, we used a standard lentiviral system to introduce gfp under the control of a cmv promoter . however, even at a very high titer, the lentiviral infection efficiency of the intact t-shaped ub was very low (data not shown).
however, dissociating t-shaped ub or ub organoids into a single cell suspension prior to infection dramatically improved the infection efficiency, with widespread gfp activity in resulting ub organoids after re-aggregation of infected cells (fig. g). to test gene knockout, we targeted gfp in rosa -cas /gfp ub organoid, in which cas and gfp are constitutively expressed from the rosa loci (extended data fig. e). a mix of three different lentiviruses, expressing three different synthetic guide rnas (sgrnas) with cas targeting sites - bp apart , gave a highly efficient, gfp grna-specific, multiplexed crispr/cas gene knockout, demonstrating effective gene knockout in ub organoid cultures (fig. h). the successful generation of mouse ub and cd organoids prompted us to test whether the system can also derive human ub and cd organoids. to achieve this, we first developed a method to generate expandable human ub organoids from primary human upcs (hupcs) (extended data fig. a ). similar to their murine counterparts, hupcs within ub tips, express to determine whether ub and cd organoids could be generated from human pluripotent stem cells (hpsc)-derived upcs, we first genetically engineered h human embryonic stem cells (hescs) with a knockin dual reporter system where mcherry was expressed from the pax locus (pax -mcherry) and gfp from the wnt locus (wnt -gfp) (extended data fig. ). this reporter cell line facilitated the establishment of a stepwise protocol that produced high-quality hupcs, which generated branching induced ub (iub) organoids, which matured into induced cd (icd) organoids (fig. a). the ub is derived from the nephric duct (nd), which originates from primitive streak (mesendoderm)-derived anterior intermediate mesoderm , .
consistent with this developmental trajectory, following a -day directed differentiation protocol modified from previous protocols, we first observed expression of the mesendoderm (me) marker t on day of differentiation in most cells (extended data fig. a), followed by the formation of large numbers of compact cell colonies that were gata + /sox + /pax + /pax + /kit + /krt + on day of differentiation, suggesting the generation of potential precursor cells of the ub lineage (extended data fig. b). consistent with the immunostaining results, we identified a pax -mcherry + population ( . %) by facs on day . at this stage, however, the pax -mcherry + /wnt -gfp + population was very rare ( . %), preventing further characterization or culture (extended data fig. c). further culture of pax -mcherry+ cells in the d/hubcm culture conditions activated wnt -gfp reporter expression at around weeks, and the structure started to show a branching morphology (extended data fig. d). we refer to the pax -mcherry + /wnt -gfp + branching structure as an "iub" organoid hereafter. importantly, these iub organoids could be expanded stably in d/hubcm for at least months without losing reporter gene expression (fig. f). consistently, qrt-pcr analysis confirmed that wnt expression was low in the mcherry + cells purified by facs, but was dramatically elevated in the iub organoid. furthermore, the ub marker genes pax , gata , lhx and ret were already greatly elevated on day of differentiation, whereas wnt , cdh , emx and hnf b showed levels of expression comparable to the human fetal kidney only after extended hubcm culture, suggesting that hubcm promoted the transition from a common nephric duct to a specific ureteric epithelial precursor (fig. b). in addition to qrt-pcr, expression of the marker genes sox and cdh in the iub organoid was detected at the protein level by immunostaining (fig. c), further confirming the identity of the iub organoid.
we next tested whether the expandable iub organoid retained the potential to generate an icd organoid after long-term expansion. iub organoids were subjected to differentiation with the cddm medium identified for the mouse ub-to-cd transition. after days of differentiation in cddm, the human iub organoid grew and elongated, maintaining pax -mcherry expression but losing wnt -gfp expression (fig. d, and data not shown). more importantly, the expression of the upc markers wnt and ret was greatly diminished, while the cd marker genes aqp , aqp and foxi were dramatically elevated, suggesting a successful transition from iub to icd (fig. e). lastly, we tested whether expandable iub organoids could be generated from human induced pluripotent stem cells (hipscs). for this, we employed sox -gfp hipscs for differentiation and purified the sox -gfp+ ub precursor cells on day of differentiation (extended data fig. e). similar to hesc-derived iubs, following an extended culture in hubcm, we were able to derive sox -gfp iub organoids that expanded stably with retained sox -gfp expression throughout (fig. f; extended data fig. f-h). taken together, these results support the conclusion that expandable iub organoids and mature icd organoids can be derived from hesc and hipsc lines.
here, we show that with appropriate d culture conditions, it is possible to expand ub progenitor cells in vitro as branching ub organoids. the organoid culture medium thus serves as a synthetic niche for maintaining the ub progenitor cell identity. consistent with mouse genetics studies, signaling pathways that play key roles in kidney branching morphogenesis, such as gdnf, fgf, ra and wnt signaling, are also essential in maintaining ub progenitor cell identity in ub organoids. combined with efficient genome editing, this accessible in vitro model for ub progenitor cell self-renewal is a powerful tool for comparative studies of how ub progenitor cell fate is determined in mouse and human.
leveraging our ability to produce large quantities of high-quality ub progenitor cells in the format of expandable branching ub organoids, we performed a screen that identified cddm, a cocktail of growth factors, small molecules and hormones that together can differentiate ub organoids into cd organoids with spatially patterned mature pcs and ics. the molecular mechanisms underlying the ub-to-cd transition are still largely unknown. the in vitro organoid system provides a novel tool to study this process, and the chemically defined components in cddm shed new light on potential signals that trigger cd maturation in vivo. despite the general difficulty of maturing stem cell-derived tissues, our study shows that it is possible to achieve proper patterning and maturation in vitro, similar to what we observe in vivo, when starting from high-quality progenitor cells under appropriate culture conditions. the generation of a synthetic kidney from expandable npcs and upcs provides a proof of concept for rebuilding a kidney in vitro from kidney-specific progenitor cells.
sox -gfp mice were kindly shared by dr. haruhiko akiyama. hoxb -venus mice were kindly shared by dr. andrew mcmahon (jax # ). rosa -cas /gfp mice were kindly shared by dr. andrew mcmahon (jax # ). male mice with the desired genotype (wnt -rfp, hoxb -venus, sox -gfp, or rosa -cas /gfp) were mated with female swiss webster mice. plugs were checked the next morning, and midday of the plug-positive day was designated as embryonic day . (e . ). timed pregnant mice were euthanized at e . . kidneys were dissected out from embryos with standard dissection technique and transferred into kidney dissection medium in a . ml eppendorf tube on ice. next, the medium in the tube was removed, and at least µl fresh pre-warmed collagenase iv was added into the tube and incubated at °c for - minutes. after incubation, collagenase was removed and at least µl mouse ub dissection medium was added to resuspend the kidneys.
for deriving ub organoid from dissociated ub single cells (e.g. for gene editing purpose), after the isolation of e . t-shaped ubs from kidneys following the procedures described above, all ubs were collected into a . ml eppendorf tube with the medium removed as much as possible. appropriate amount (e.g. for ubs, we use µl, adjust accordingly) of pre-warmed accumax cell dissociation solution (innovative cell technologies, # am ) was added into the tube and the tube was then incubated at °c for minutes. then, an equal amount of mouse ub dissection medium was added into the tube to neutralize the accumax and the mixture was pipetted up and down gently for times to dissociate the ub into single cells. the tube was then centrifuged at g for minutes. after centrifugation, supernatant was carefully removed and ub cells were resuspended in appropriate amount of mubcm (y was supplemented at µm final concentration for the first hours) by pipetting up and down gently for less than times. cell density was measured using automatic cell counter (bio-rad, tc ). cells (~ x e . ub) were transferred into each well of a u-bottom -well low-attachment plate and extra amount of mubcm (with µm y ) was added to the well to make the final volume µl per well. the plate was then centrifuged at g for minutes and transferred and cultured in °c incubator. after hours, the single cells formed an aggregate autonomously and the aggregate was then transferred into an µl cold matrigel droplet in another well of the u-bottom -well low-attachment plate using a p micropipette. the aggregate was pipetted up and down gently for - times to mix with matrigel. after all aggregates were embedded in matrigel, the plate was incubated at °c for minutes for the matrigel to solidify. lastly, µl ubcm was added slowly into each well and the plate was then transferred into an incubator set at °c with % co . mouse ubcm was renewed with fresh medium every two days, and ub organoid was passaged every five days. 
ub organoid (with matrigel) was first transferred from the u-bottom -well plate onto a mm petri dish lid with - µl medium using a p pipette with the tip cut . - cm to widen the diameter. most of the matrigel surrounding the organoid was removed using sterile needles under a dissecting microscope. a small piece of the organoid with - branching tips was cut using the needles and then transferred into an µl cold matrigel droplet in a well of a u-bottom -well low-attachment plate using a p micropipette. the small piece of tip structure was pipetted up and down - times to mix with the matrigel. the plate was then incubated at °c for minutes for the matrigel to solidify. then, µl mubcm was slowly added into each well and the ub was cultured in a °c incubator.
ub organoid was transferred into a . ml eppendorf tube with as little medium as possible using a p pipette with the tip cut . - cm to widen the diameter. µl pre-warmed accumax cell dissociation solution was added into the tube and gently pipetted up and down using a p pipette (around times) to break the organoid into small pieces. the tube was then incubated at °c for minutes. after the incubation, µl ub dissection medium was added to neutralize the accumax and the mixture was pipetted up and down gently for times to further break down the organoid into single cells. the tube was then centrifuged at g for minutes. after centrifugation, the supernatant was carefully removed and the ub cells were resuspended in an appropriate amount of mubcm (with the addition of y at µm final concentration for the first hours) by pipetting up and down gently for less than times. cell density was measured with an automatic cell counter. cells were transferred into each well of a u-bottom -well low-attachment plate and an extra amount of mubcm (with µm y ) was added to the well to make the final volume µl per well. the plate was then centrifuged at g for minutes, transferred to a °c incubator and cultured.
after hours, the single cells formed an aggregate autonomously and the aggregate was then transferred into an µl cold matrigel droplet in another well of the u-bottom -well low-attachment plate using a p micropipette. the aggregate was pipetted up and down gently for - times to mix with matrigel. after all aggregates were embedded in matrigel, the plate was incubated at °c for minutes for the matrigel to solidify. lastly, µl ubcm was added slowly into each well and the plate was then transferred into an incubator set at °c with % co .
organoid derivation from ret+ primary upcs purified from human fetal kidney
all human fetal kidney samples were collected under institutional review board approval (usc-hs- - and chla- - ). following the patient decision for pregnancy termination, the patient was offered the option of donating the products of conception for research purposes, and those who agreed signed an informed consent. this did not alter the choice of termination procedure, and the products of conception from those who declined participation were disposed of in a standard fashion. the only information collected was gestational age and whether there were any known genetic or structural abnormalities. the kidney nephrogenic zone was dissected manually from fresh - -week human fetal kidney, chopped into small pieces with a surgical blade, and transferred into - . ml eppendorf tubes. tissues were washed with pbs and resuspended in µl of pre-warmed accumax per tube, and the tubes were incubated at °c with shaking for minutes. µl dissection medium was then added to neutralize the accumax, and the mixture was pipetted up and down for times to dissociate the tissues into single cells. the cell suspensions were then combined, passed through a µm cell strainer, and transferred into . ml eppendorf tubes. these tubes were then centrifuged at g for minutes and the supernatant was carefully removed.
lastly, µl hubcm was added slowly into each well and the plate was cultured in a °c incubator. after around - days of culture, epithelial tip structures could be seen budding out from the aggregate. these tip structures were dissected out, re-embedded into matrigel and expanded as human ub organoids.
human pluripotent stem cells were routinely cultured in mtesr (tesr) medium as a monolayer on matrigel-coated plates and passaged using dispase as previously described. the hpscs were pre-treated with µm y in tesr medium for one hour before being dissociated into single cells using accumax cell dissociation solution. following dissociation, , cells were seeded into a matrigel-coated -well plate with ml tesr medium plus µm y (day ). hours later (day ), the medium was removed and ml me stage medium was slowly added to the well. hours later (day ), me stage medium was removed and ml ub-i stage medium was slowly added to the well. hours later (day ), medium was changed to ml fresh ub-i medium again. after another hours (day ), ub-i medium was removed, and . - ml ub-ii stage medium was slowly added to the well. hours later (day ), medium was changed to .
human ub organoid expansion and passaging
both human ub organoids from primary ret + upcs and iub organoids from hpscs were passaged every - days. the passaging methods (manual or as single cells) are the same as for passaging mouse ub organoids, with the exception of using hubcm instead of mubcm.
mouse cd differentiation: mouse ub organoid was passaged at day of expansion as single cells and cells were seeded for continuing expansion. at day of mub expansion, mubcm was removed and µl xpbs was added and removed to wash the organoid. µl cd differentiation medium (cddm) was then added to start mouse cd (mcd) differentiation (mcd differentiation day ). the organoid was cultured in a °c incubator and the medium was changed every two days, or every day if needed (when the organoid is big and the medium turns yellow within hours), for a total of seven days.
no passage is needed. at mcd differentiation day , the mcd organoid was harvested for analyses.
after human ub organoid expansion was stabilized (at least days post facs), hubcm was removed and µl cd differentiation medium was added to start hcd differentiation (hcd differentiation day ). the organoid was cultured in a °c incubator and the medium was changed every two days, or every day if needed (when the organoid is big and the medium turns yellow within hours), for a total of days. the organoid can be passaged if it is too big, but it is better to start with a smaller organoid. at hcd differentiation day , the hcd organoid was harvested for analyses.
a small piece (with - branching tips) of day - cultured mub organoid was inserted into a microdissected hole in a d cultured mnpc aggregate in kidney reconstruction medium (apel + . µm ttnpb) with µm y . the reconstruct was then transferred into a well of a u-bottom -well low-attachment plate with µl kidney reconstruction medium plus µm y , using a p pipette with the top . - cm of the tip cut, and cultured in a °c incubator (day ). after hours (day ), dead cells surrounding the reconstruct were removed by pipetting up and down several times in the well using a p pipette with the top . - cm of the tip cut. the reconstruct was then transferred and embedded into a µl matrigel droplet in another well using the same p pipette and tip, or a smaller pipette and tip if possible. a sterile needle can be used to position the reconstruct in the matrigel droplet. the plate was then incubated at °c for minutes for the matrigel to solidify. then, µl kidney reconstruction medium was slowly added into the well and the reconstruct was cultured in a °c incubator. after another hours (day ), the reconstruct was transferred together with its surrounding matrigel onto a -well transwell insert membrane using a p pipette with the top . - cm of the tip cut. µl kidney reconstruction medium was added to the lower chamber of the transwell.
the medium was changed every two days for a total of - days for the kidney reconstruction, then the reconstruct was harvested for analyses.
samples were fixed in % pfa for minutes (ub/cd organoid) in eppendorf tubes or tissue culture plates at room temperature and then washed four times in x pbs for a total of minutes. they were then transferred into a plastic mold, embedded in oct compound (scigen, cat. no. k ) and frozen at - °c for hours to make a cryo-block. the cryo-blocks were sectioned using a leica cm cryostat. the sectioned slides were then blocked for minutes at room temperature, followed by one hour of primary antibody staining at room temperature. the slides were then washed four times with pbst for five minutes, followed by secondary staining for minutes. after the secondary staining, the slides were washed four times with pbst for five minutes and mounted with mounting medium.
day and day cultured mub organoids were collected and prepared for rna-seq. rna sequencing data were analyzed using partek flow, including published datasets of interstitial progenitor cells and nephron progenitor cells (lindstrom et al., ) and of ureteric tip and trunk cells (rutledge, et al., ). fastq files were trimmed from both ends based on a minimum read length of bps and a phred quality score of or higher. reads were aligned to gencode mm (release m ) using star . . d. aligned reads were quantified to the partek e/m annotation model. gene counts were normalized by adding then by tmm values. genes were filtered to include those differentially expressed in ub tip compared to ub trunk with false discovery rate <= . and fold change <- or > , resulting in ub tip signature genes. hierarchical clustering was then performed by clustering samples and features with average linkage cluster distance and euclidean point distance. principal component analysis (pca) was performed using the edaseq r/bioconductor packages and the plot was rendered with the ggplot r package.
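The signature-gene filter described above (a false discovery rate at or below a cutoff, combined with a fold change beyond a symmetric threshold in either direction) can be sketched in a few lines of Python. This is only an illustration and not the Partek Flow implementation: the function name, the input layout and the numeric cutoffs in the usage example are assumptions, because the actual threshold values are not given in the text.

```python
from typing import Dict, List, Tuple


def select_signature_genes(
    results: Dict[str, Tuple[float, float]],
    fdr_cutoff: float,
    fold_change_cutoff: float,
) -> List[str]:
    """Return genes with FDR <= fdr_cutoff and |signed fold change| > cutoff.

    `results` maps a gene name to (signed fold change, FDR). The signed
    fold change is negative for genes lower in tip than in trunk, so the
    rule "fold change < -c or > c" keeps strong changes in both directions.
    """
    selected = []
    for gene, (fold_change, fdr) in results.items():
        passes_fdr = fdr <= fdr_cutoff
        passes_fold_change = (
            fold_change < -fold_change_cutoff or fold_change > fold_change_cutoff
        )
        if passes_fdr and passes_fold_change:
            selected.append(gene)
    return sorted(selected)


# Hypothetical tip-versus-trunk results; gene names and values are made up.
example_results = {
    "gene_a": (4.0, 0.01),   # strong increase, significant
    "gene_b": (-3.5, 0.04),  # strong decrease, significant
    "gene_c": (1.2, 0.001),  # significant but small change
    "gene_d": (5.0, 0.20),   # large change but not significant
}
signature = select_signature_genes(
    example_results, fdr_cutoff=0.05, fold_change_cutoff=2.0
)
```

With these assumed cutoffs only the two genes that satisfy both criteria are retained; the resulting list would then be the input to hierarchical clustering.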
gene over-expression: lentiviral infection was used to overexpress gfp in e . mub cells. lentivirus was first concentrated x using the lenti-x concentrator kit from takara (cat # ). concentrated lentivirus was aliquoted and stored at - °c before use. the lentivirus was used at x final concentration together with μm polybrene (sigma-aldrich, cat. no. tr- -g) diluted in mubcm. µl virus-ubcm mixture was added to a well of a u-bottom -well low-attachment plate containing a single-cell suspension prepared from - e . mubs. the ub cells and virus were centrifuged together at g for minutes for spinfection at room temperature. after the spinfection, the virus-ubcm mixture was removed and the infected ub cells were washed three times with pbs, then aggregated, embedded in matrigel and cultured in mubcm in a °c incubator following the standard ub organoid culture procedures described above. μg/ml g- (invitrogen, cat. # ) was applied to select for ub cells that had been successfully infected. the ub aggregate self-organized into a typical branching organoid - days after infection.
gene knockout: an e . mub single-cell suspension from the rosa -cas /gfp background was used, and lentiviral vectors were constructed using the lentiguide-puro vector system (addgene # ) following the standard protocol to make lentiviruses expressing three different grnas targeting gfp with the cas cutting sites - bp apart, or three non-targeting grnas as control. x concentrated lentiviruses were used at x together with μm polybrene diluted in mubcm. µl virus-ubcm mixture was added to a well of a u-bottom -well low-attachment plate containing e . mubs that had been dissociated into single cells. the ub cells and virus were centrifuged together at g for minutes for spin-infection. after the spin, the virus-ubcm mixture was removed, fresh virus-ubcm mixture was added into the same well and the ub cells were spin-infected for another minutes at g.
then, the virus-ubcm mixture was removed and the infected ubs were washed three times with pbs, then aggregated, embedded in matrigel and cultured in mubcm in a °c incubator following the standard ub organoid culture procedures described above. . μg/ml puromycin was applied to select for ub cells that had been successfully infected. the ub aggregate self-organized into a typical branching organoid - days after infection.
non-cell-autonomous retinoid signaling is crucial for renal development
beta-catenin is necessary to keep cells of ureteric bud/wolffian duct epithelium in a precursor state
canonical wnt/beta-catenin signaling is required for ureteric branching
generation of functioning nephrons by implanting human pluripotent stem cell-derived kidney progenitors
renal subcapsular transplantation of psc-derived kidney organoids induces neo-vasculogenesis and significant glomerular and tubular maturation in vivo
note the red signals that do not overlay with dapi from pax and etv staining panels are non-specific signals. scale bars, a, µm; b, µm. c, unsupervised clustering analysis of rna-seq data. d, bright field (bf) and fluorescence images of ub organoid derived from sox -gfp genetic background bright field (bf) and fluorescence images of ub organoid derived from rosa -cas /gfp (r cas /gfp) genetic background. scale bars, µm. f, bright field (bf) and fluorescence images of wnt scale bar, µm. g, bright field images showing single cells embedded into one drop of matrigel and cultured in ubcm for day (d ) and days (d ) efficiency of ub organoid formation from single ub cells. data are presented as mean ± s.d. each group represents biological replicates
extended data fig. expandable ub organoid-based screening for cd differentiation a, summary of conditions tested in the st round of cd differentiation condition screening. b, c, qrt-pcr analyses of the st round of cd differentiation condition screening for pc marker gene aqp (b, in blue) and ic marker gene foxi (c, in orange).
data are presented as mean ± s.d. each column represents technical replicates. d, summary of chemicals tested in the nd round of cd differentiation condition screening with the r - medium as base medium. e, f, qrt-pcr analysis of the nd round of cd differentiation condition screening for pc marker gene aqp (e, in blue) and ic marker gene foxi (f, in orange). data are presented as mean ± s.d. each column represents technical replicates. note that data for r - and r - were not presented here, because dramatic cell death were observed in those conditions and were thus excluded from this analysis. g, h, immunostaining of cryo-section samples of differentiated cd organoids for ureteric lineage marker (gata ) and various pc (aqp and aqp ) and ic markers scale bars, µm. c, representative flow cytometry analysis of ret+ upc cells from dissociated human fetal kidney nephrogenic zone. d, time course bright field images showing the growth of human ub organoid derived from primary human upcs in a typical passage cycle. scale bars, µm. e, qrt-pcr analyses of human ub organoid derived from primary human upcs for various ub markers. human fetal kidney from . -week ( . wk) gestational age was used as control extended data fig. genetic engineering hesc with a dual reporter system by crispr/cas pgk-neo cassette was used for antibiotic selection after gene targeting. this cassette was then excised from the genome by transient expression of cre, before the second reporter system wnt -gfp was knocked in. b, pcrbased genotyping results of pax -mcherry reporter knockin hesc (wt)"is used as a non-editing control. c, pcr-based genotyping result for cre-based excision of pgk-neo cassette. pcr product for pax -mcherry knockin with pgk-neo excised (ki w.o. neo) is bp. *, clones with biallelic excision of pgk-neo ki (homo)" is used as controls. note the very high efficiency in cre-based pgk-neo excision. 
d, schematic of the engineering of wnt -gfp reporter into the pax -mcherry reporter hesc line. pgk-neo cassette was used for antibiotic selection after gene targeting. e, pcr-based genotyping result for wnt -gfp reporter knockin
extended data fig. generation of iub organoids from various hpsc lines a, immunostaining of hesc-derived mesendoderm cells (at the end of me stage) for mesendoderm marker scale bars, µm. c, flow cytometry analysis of mcherry+ and gfp+ cells differentiated from wnt -gfp/pax -mcherry dual reporter hescs. d, bright field (bf) and fluorescence images showing the induction of wnt -gfp expression in the mcherry+ aggregate upon extended culture in hubcm. scale bars, µm. e, flow cytometry analysis of gfp+ cells differentiated from sox -gfp reporter hipscs at the end of the ub-ii stage. f-h, bright field (f,g) and sox -gfp (h) images of branching iub organoid derived from sox -gfp reporter hipscs in a typical passage cycle. the indicated budding structure shown in (f) was dissected and re-embedded
cctttggtgtggtaaagactctc gaccgaactgaacaggtactg aactcactgaccttcaactcct ccatagctgagcatgttggt extended data
key: cord- -pij hbi authors: klein, steffen; wimmer, benedikt h.; winter, sophie l.; kolovou, androniki; laketa, vibor; chlanda, petr title: post-correlation on-lamella cryo-clem reveals the membrane architecture of lamellar bodies date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pij hbi
lamellar bodies (lbs) are surfactant-rich organelles in alveolar cells. lbs disassemble into a lipid-protein network that reduces surface tension and facilitates gas exchange in the alveolar cavity. current knowledge of lb architecture is predominantly based on electron microscopy studies using disruptive sample preparation methods.
we established and validated a post-correlation on-lamella cryo-correlative light and electron microscopy approach for cryo-fib milled cells to structurally characterize and validate the identity of lbs in their unperturbed state. using deconvolution and d image registration, we were able to identify fluorescently labeled membrane structures analyzed by cryo-electron tomography. in situ cryo-electron tomography of a cells as well as primary human small airway epithelial cells revealed that lbs are composed of membrane sheets frequently attached to the limiting membrane through “t”-junctions. we report a so far undescribed outer membrane dome protein complex (omdp) on the limiting membrane of lbs. our data suggest that lb biogenesis is driven by parallel membrane sheet import and by the curvature of the limiting membrane to maximize lipid storage capacity. lamellar bodies (lbs) are specialized organelles exclusively found in alveolar type epithelial cells (aec ) and in keratinocytes . alveolar lbs produce, accumulate and secrete surfactant, a mix of specialized lipids and proteins. upon secretion into the alveolar cavity, it rapidly disassembles into a highly organized network. pulmonary surfactant reduces the surface tension at the air-water interface in the alveoli to facilitate gas exchange during respiration. therefore, it must be constantly replenished to sustain breathing . aec and surfactant are primary players in the pulmonary immune response . defects in surfactant production are associated with increased risk of respiratory infection by pathogens such as influenza a virus , respiratory syncytial virus , pneumonia and mycobacterium tuberculosis . surfactant protein d (sp-d) specifically binds glycosylated pathogens, including sars-cov- . aec fatty acid metabolism and lb ultrastructure are severely disrupted by the pandemic influenza strain h n and the highly pathogenic h n strain . 
in spite of their importance in health and disease, many questions remain open about lb biogenesis, structure and secretion. lbs are composed of a core containing multilamellar membrane sheets surrounded by a limiting membrane, as revealed by thin-section transmission electron microscopy (tem). lbs contain % phospholipids by weight, mostly dipalmitoylphosphatidylcholine (dppc), unsaturated phosphatidylcholines and phosphatidylglycerols, as well as cholesterol and the specialized surfactant proteins a, b and c (sp-a, sp-b, sp-c). the majority of the lb-associated proteins are commonly found in lysosomes; lbs are thus classified as lysosome-related organelles. mass spectrometry identified proteins unique to lung lbs. while the core contains the small hydrophobic proteins sp-b and sp-c, the limiting membrane is enriched in the flippase atp binding cassette subfamily a member (abca ). in the current model of lb biogenesis, lipids are flipped by abca from the cytosolic to the luminal leaflet and are imported into the lb core, where sp-b and sp-c are responsible for further lipid rearrangement into tightly packed membrane sheets. however, this model has been difficult to validate. the lbs' high lipid content is poorly preserved as a result of room temperature tem sample preparation, which relies on chemical fixation and dehydration. in consequence, the concentric membranes inside the lb appear wrinkled. therefore, it is neither understood how they are organized in three dimensions ( d) nor known how the membrane stacks are formed. a study employing cryo-electron microscopy of vitrified sections (cemovis) on rat lungs enabled imaging of frozen-hydrated lbs and showed smooth concentric membranes. however, due to compression artifacts caused by sectioning and a lack of compatibility with cryo-electron tomography (cryo-et), the study provided only little insight into the complex lb architecture.
unlike cemovis, cryo-focused ion beam (cryo-fib) milling enables preparation of thin cellular lamellae of arbitrary thickness with a smooth surface and without compression, such that they are compatible with cryo-et. correlative light and electron microscopy (clem) enables unequivocal identification of the targeted compartments and yields structural details. clem methods have been adapted to cryo-em and have successfully been implemented on in vitro samples or for cryo-et performed on whole cells. however, a correlation of light microscopy (lm) and electron microscopy (em) data in a workflow involving cryo-fib milling is challenging due to the milling geometry and multiple transfers between microscopes: each transfer increases the risk of sample devitrification and ice contamination. so far, available in situ cryo-clem workflows involving cryo-fib milling are aimed at site-specific cryo-fib milling. here, we show that precise knowledge of the lamella position in the context of the entire cell, determined by cryo-lm after cryo-tem imaging, facilitates accurate mapping of the original lm data to cellular structures on the lamella. in the presented workflow, a d correlation is applied to target the region of interest for milling in the x-y plane of the grid. a second, post-correlation step utilizing lm data acquired after cryo-tem imaging, deconvolution and d correlation is then applied to identify the observed structures corresponding to the position of the lamella not only in x-y but also in the z dimension. we show that the latter is essential to increase the correlation precision by computationally removing out-of-lamella fluorescent signals. we applied the post-correlation on-lamella cryo-clem workflow to study lbs within a cells, a model for aec , that were transiently transfected with abca -egfp, a well-characterized lb marker.
after both correlation steps, % of the abca -egfp signal corresponded to membrane-bound organelles containing either vesicles or lamellated membranes typical for lbs. in situ cryo-et allowed us to structurally characterize the membrane organization in abca -egfp positive lbs without sample preparation artifacts. the lb core shows tightly packed membrane sheets with varying curvature. we found parallel bilayer sheets connected perpendicularly to the limiting membrane via "t"-junctions and concentric bilayer sheets as hallmark structures of lbs. in addition, our work revealed a large outer membrane dome protein (omdp) on the limiting membrane of some lbs, presumably involved in their formation and trafficking. to corroborate our findings, we analyzed lbs in primary human lung cells, where we observed both "t"-junctions and omdps. the goal of on-lamella cryo-clem is to precisely localize an organelle of interest within the cell both in the x-y plane and in the z-dimension. while the former is straightforward, the latter is challenging because (i) the z-dimension of the lamella is reduced to under % of the original z-dimension of the cell (cryo-lamellae have a thickness of - nm, while an intact cell has a thickness of - µm) and (ii) the cryo-lm has limited z-resolution. to overcome these difficulties, we implemented a post-correlation on-lamella cryo-clem workflow that uses deconvolved cryo-lm d maps of the cell before and after cryo-lamella preparation ( fig. ). we opted for performing the cryo-lm acquisition of the cryo-lamellae after cryo-tem imaging, as imaging in a cryo-lm increases ice contamination and hence decreases the quality of cryo-et. this results in a loss of fluorescence on the lamellae due to the electron beam damage occurring during cryo-et. we therefore use the unimaged surrounding cell body that retains its fluorescence to register the cryo-lm map acquired before and after cryo-tem acquisition ( supplementary fig. ). 
we use the cryo-lm map acquired after cryo-tem to identify lamella position and tilt, while we use the initial cryo-lm map acquired before milling to obtain the fluorescence signal. after d registration of the two deconvolved cryo-lm maps, it is possible to measure the tilt of the lamella using the brightfield signal in the z-y-plane and tilt the complete composite stack accordingly. this enables us to extract a single slice of the registered and tilt-corrected image stack containing only the fluorescent signal corresponding to the lamella. using rigid transformation, this fluorescence map can finally be correlated to the cryo-tem map, facilitating the identification of the structure of interest analyzed by cryo-et. as the correlation of cryo-lm and cryo-tem maps is performed after cryo-et, we named this approach "post-correlation on-lamella cryo-clem". as a proof of principle, we initially applied the post-correlation on-lamella cryo-clem workflow to localize fluorescently labelled lipid droplets (lds) in a cells because lds are easily distinguishable by cryo-em. we could validate all lds (n = ) identified by cryo-em on cryo-lamellae by our correlation approach (supplementary fig. ) with an average x-y correlation precision of nm (standard deviation (sd) = nm, n = ). we next analyzed the impact of the tilt correction and extraction of a single x-y slice of our workflow. to this end, we repeated the correlation of the lamellae, but instead of an extracted x-y slice of the fluorescent map, we used a maximum intensity projection (mip). the comparison of both correlations (supplementary fig. ) revealed that the tilt correction and extraction of a single x-y slice is essential for the workflow and reduces the out-of-lamella signal by %. the correlation using a mip fluorescent map showed out-of-lamella ld signals which correspond to lds removed during cryo-fib milling.
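The difference between a maximum intensity projection and a single tilt-corrected slice can be illustrated with a small sketch. This is not the authors' Fiji/ImageJ procedure: instead of rotating and reslicing the stack, it samples the tilted lamella plane directly with nearest-neighbour lookup, and it assumes isotropic voxels. The function names, the toy stack and the 45° tilt are hypothetical values chosen only for illustration.

```python
import math

def max_intensity_projection(stack):
    """Collapse a z-stack (indexed stack[z][y][x]) by taking the maximum over z."""
    nz, ny, nx = len(stack), len(stack[0]), len(stack[0][0])
    return [[max(stack[z][y][x] for z in range(nz)) for x in range(nx)]
            for y in range(ny)]

def extract_tilted_slice(stack, z_center, y_center, tilt_deg):
    """Sample the plane z = z_center + tan(tilt) * (y - y_center).

    Nearest-neighbour sampling, isotropic voxels assumed (a simplification).
    Rows whose plane position falls outside the stack are filled with 0.
    """
    nz, ny, nx = len(stack), len(stack[0]), len(stack[0][0])
    slope = math.tan(math.radians(tilt_deg))
    plane = []
    for y in range(ny):
        z = int(round(z_center + slope * (y - y_center)))
        plane.append(list(stack[z][y]) if 0 <= z < nz else [0] * nx)
    return plane

# Toy stack: 5 z-planes, 5 rows, 3 columns, all zero except two point signals.
stack = [[[0] * 3 for _ in range(5)] for _ in range(5)]
stack[1][1][0] = 7   # lies on the tilted plane (z = y for a 45 degree tilt)
stack[4][1][2] = 9   # lies above the plane, i.e. "out-of-lamella"

mip = max_intensity_projection(stack)
lamella = extract_tilted_slice(stack, z_center=2, y_center=2, tilt_deg=45.0)
```

In this toy case the projection keeps both signals, whereas the extracted plane keeps only the one that lies in the lamella, which is the effect the comparison above quantifies.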
by tilt correction and extraction of a single x-y slice we were able to reduce the number from to out-of-lamella signals. to localize lbs in a cells, we used abca -egfp overexpression, which induces the formation of lb-like organelles . at h post-transfection, a cells contained a median of large spherical structures per cell (sd = , range - ). the abca -egfp signal predominantly localized to the limiting membrane and exhibited an average diameter of . µm (sd = . µm, range . µm - . µm) as revealed by confocal microscopy (supplementary fig. ). these measurements are in line with the previously reported lb diameter (range . - . µm), based on em studies performed on lung cells . we transiently transfected a cells grown on gold em-grids and additionally stained them with a nucleus dye and a neutral lipid dye labeling lds with the intention to use them as fiducial markers, thereby increasing the correlation precision. however, abca -egfp overexpression leads to a depletion of lds in the cells. in addition, the neutral lipid dye is also localized to the core of abca -egfp positive organelles ( supplementary fig. ) preventing us from using them as fiducial markers. after vitrification by plunge freezing, the sample was transferred under cryogenic temperatures to a cryo-lm to map the em-grid. z-stacks in transmitted light brightfield (tl-bf) and fluorescent channels were acquired covering a large area (ca. . mm × . mm) of the grid to maximize the number of potential areas for subsequent milling. to improve contrast and reduce the signal of stray light of the wide-field microscope setup, as well as to improve the correlation precision, we performed deconvolution of the cryo-lm data ( supplementary fig. ). after deconvolution, we were able to discriminate closely apposed abca -egfp positive organelles, which prior to deconvolution appeared as one organelle ( supplementary fig. d-j) . 
maximum intensity projection (mip) and stitching of cryo-lm data were used to generate a map, from which suitable areas exhibiting large abca -egfp positive organelles were selected for milling (fig. a). in most cases, stress relief cuts positioned next to the cryo-lamellae were used to reduce bending of cryo-lamellae before the final milling step. grids containing - self-supporting lamellae (fig. b) were transferred to a cryo-tem where they were mapped to judge overall lamella quality and to localize membrane-bound organelles (fig. c). among frequently present organelles such as mitochondria and nuclei, membrane-bound structures with a lamellated membrane architecture were selected for cryo-et as putative lbs. to obtain a fluorescent map of the milled cell for on-lamella post-correlation, grids were mapped a second time by cryo-lm after retrieval from the cryo-tem (fig. d). milled areas on the grid did not show any specific fluorescent signal, likely due to electron-induced beam-damage, whereas the fluorescence in the surrounding body of the cell was preserved (supplementary fig. ). in some cases, ice contamination detected in tl-bf on the lamella resulted in an auto-fluorescent signal in all fluorescent channels (fig d, asterisk).

figure). e, to combine the information of the fluorescent signal from step (a) and the lamella position from step (d), both z-stacks are aligned using an automated d registration algorithm. after image registration, the transmitted light bright field (tl-bf) channel of step (d) is combined with the fluorescent channels of step (a) leading to a combination of lamella position and fluorescent signal in a single composite z-stack. f, to compensate for lamella tilt, the tilt is measured using the tl-bf signal, the z-stack is rotated accordingly and a z-slice corresponding to the lamella is extracted. g, finally, the extracted z-slice is registered to the cryo-tem map from step (c).
a cells grown on em-grids were transiently transfected with abca -egfp and nuclei and lipid droplets were fluorescently labeled prior to plunge freezing. a, mip of the deconvolved cryo-lm stack acquired before fib-milling. abca -egfp is shown in green, and nuclei and lipid droplets are shown in cyan. b, angled fib image showing the same region before and after milling, respectively. c, stitched cryo-tem map of the lamella produced by fib milling acquired at , × magnification. white arrowheads show landmarks used for correlation. d, mip of the deconvolved cryo-lm stack acquired after tem imaging. the fluorescent signal in the cell body remains visible (arrows). ice contamination on the lamella (asterisk) shows autofluorescence which is detectable in all fluorescent channels and thus is not emitted from abca -egfp. e, mip of the registered and aligned image stack. f, z-y slice showing the lamella tilt corresponding to the fib milling angle (top) and z-y slice after rotation of the z-stack by °. the organometallic platinum coated edges of the lamella (arrows) can be seen as dark dots in the z-y-slice. g, extracted z-slice corresponding to the plane of the lamella, with the out-of-lamella fluorescent signal computationally removed. white arrowheads indicate landmarks overlaid with those in (c). h, overlay showing the product of the non-rigid alignment between the tem map in panel (c) and the fluorescence image in panel (g). i, high magnification image of the correlated lamella. j, quantification of correlation with membrane bound organelles. a total of distinct correlated egfp signals of lamellae were assigned to corresponding structures of the cryo-tem map. abca -egfp signals were assigned to lbs, to membrane-bound organelles and could not be assigned to any structure. k, quantification of the correlation precision. for the successfully correlated structures, the distance and angle to the center of the corresponding abca -egfp signal was measured and shown in a radar plot. the average correlation precision is nm (sd = nm). scale bars: (a, d, e, g) µm, (c, f, h) µm, (i) µm.

fluorescent images of the cell before and after cryo-tem for all analyzed lamellae are shown in supplementary fig. . to retrieve the fluorescent signal corresponding to the lamella position we utilized d image registration and rigid transformation using one of the deconvolved fluorescence channels to overlay both image volumes (fig. e). the loss of fluorescent signal by using stress relief cuts was negligible and hence did not impede the correlation. the resulting combined image stack incorporates the fluorescent signal before fib-milling with the tl-bf signal after milling, which contains the precise lamella position. a fluorescent map corresponding to the lamella was extracted from the combined image stack by correcting for the milling angle (fig. f) and finally by extracting the slice corresponding to the plane of the lamella (fig. g). we used the four corners of the lamella as well as clearly identifiable fluorescently labeled markers like the nuclear envelope as landmarks to correlate the extracted z-slice with the cryo-tem map (fig. h) by non-rigid transformation. due to the loss of lds by abca overexpression (supplementary fig. ) we were not able to use lds as fiducial markers which could have further improved the correlation precision in x-y. a detailed view of the correlated cryo-tem map showed that the abca - [...] membrane-bound [...]. we classified structures corresponding to isolated abca -egfp signals (n = ) on lamellae. as a result, % of abca -egfp signal could be assigned to organelles containing lamellated membranes with a displacement smaller than µm. % of the abca -egfp signal was correlated to membrane-bound structures including multivesicular bodies (mvbs), endoplasmic reticulum (er) or vesicles.
the remaining % of the abca -egfp signal could not be assigned to any membrane-bound organelles related to abca (fig. j, supplementary table ). correlated structures showed an average displacement to the local maxima of the corresponding abca -egfp signal of nm (sd = nm) (fig. k, supplementary table ). to evaluate the significance of extracting only the signal corresponding to the lamella on this sample, we also performed a correlation based on the mip of the tilt-corrected assembled stack for lamellae. in contrast to extracted slices, % of cryo-lm signals could not be assigned to any membrane-bound organelles observed by cryo-tem when mip was used for the correlation. this validates our analysis on lds (supplementary fig. ), showing that tilt correction and the extraction of a single x-y slice improves the correlation. in summary, we could correlate % of abca -egfp signal with an average x-y correlation precision of nm to membrane bound organelles from which % showed lamellated morphology. we analyzed the d membrane architecture in tomograms containing abca -egfp positive organelles. all correlated organelles showed intraluminal vesicles and tightly packed membrane sheets of varying curvature and with regular spacing in the core (supplementary fig. ). in lbs showing large concentric lamellated architecture (fig. , supplementary movie ), we observed that the curvature of membrane sheets increases towards the center, concurrent with a spherical d organization of the parallel sheets. detailed analysis of the regular membrane spacing by fast fourier transform (fft) of a central region (fig. b) revealed three major frequencies which correspond to intra-headgroup-distance ( . nm - ), bilayer width ( . nm - ) and bilayer repeat ( nm - ). this is supported by a density line plot of the same area (fig. d). in addition, we observed a crystalline lipid structure with a lateral repeat of . nm as measured by fft (fig. g).

supplementary fig. k, l).
b, detailed view of the parallel-curved membrane organization of the lb core. c, fft analysis of parallel-curved membrane spacing, area used for fft analysis is indicated in (a). fft analysis reveals three major frequencies at . nm - (headgroup-headgroup), .

although membrane sheets in parallel-curved arrangements were most frequently found in lbs, spiral-coiled membrane sheets, as well as membranes forming closed compartments were also observed within the core of the lb (fig. , supplementary movie and ). since cellular membranes predominately form closed compartments, we next analyzed the ends of the parallel membrane sheets to provide insight on how the bilayer terminus is structurally organized. rounded densities at the bilayer termini were frequently observed indicating that the acyl chains of the phospholipids are not exposed to the aqueous surrounding (fig. c-e, arrows). interestingly, membrane sheets were typically found perpendicularly oriented towards the lb limiting membrane. these sheets were often connected to the limiting lb membrane with a thin density (fig. c). based on the perpendicular shape of the connection, we call these direct contacts "t"-junctions. after closer inspection of lbs' limiting membranes, we observed a dome-shaped protein complex, hereafter referred to as outer membrane dome protein (omdp) (fig. ). omdps appeared hollow and were found in the proximity of er cisternae partially decorated with ribosomes (fig. a-d). although omdps were rare (n = ) in abca -egfp transfected a cells, they were frequently observed (n = ) on the limiting membrane of lbs and mvbs in a long-term culture from tomograms (supplementary fig. ) as well as on the limiting membrane of lbs in hsaepc cells (n = ) (fig. e-h). to obtain insights into the omdp structure, we manually extracted subtomograms (n = ) containing omdps from tomograms and performed subtomogram averaging (sta) in dynamo. the sta of omdp revealed a hollow dome-shaped structure (fig.
i-l) with a base diameter of nm, height of nm with a kink at nm and top diameter of nm. symmetry search performed on the calculated average indicated that omdp has a c symmetry. to validate that the architecture of lbs induced by abca -egfp expression in a cells represents the architecture of physiologically formed lbs, we used primary human small airway epithelial cells (hsaepc) isolated from the distal portion of the human respiratory tract, and also included long-term a cultures, which were previously shown to recover a differentiated phenotype with enhanced lb production as found in primary aec cell. in situ cryo-et of milled hsaepc revealed lamellated organelles (fig e-h, supplementary fig. ) in analyzed tomograms consistent with lbs observed in a cells overexpressing abca . importantly, lbs found in hsaepc contained membrane sheets attached to the lb limiting membrane via t-junctions (fig e) and the curved lamellated membranes were less condensed (fig f). lbs in the hsaepc cells were frequently found in the proximity to intermediate filaments (fig e). in addition, whole-cell cryo-et of long- [...] fig. ) in out of analyzed tomograms. they exhibited similar morphologies to lbs produced by abca -egfp overexpression and had an average diameter of . µm (sd = . µm, range . - . µm).

supplementary fig. a, b). b, manual rendering of the tomogram (a). the limiting membrane of the lb is labelled in yellow, membrane sheets are labelled in green. to distinguish individual membrane sheets, alternating shades of green were used. continuous membranes are labelled in red. a spiral-coiled membrane sheet is labelled in orange. c-e, detailed views of reconstructed tomogram. perpendicularly oriented membrane sheets towards the limiting membrane are often connected via a thin density ("t"-junction) to the limiting membrane of the lb (arrowheads). membrane sheets show a rounded density at the membrane termini (arrows).
consistent with previously published data , we found lamellated structures in only out of analyzed tomograms of short-term non-transfected a cells, indicating that frequent splitting of a cells is impeding lb formation. in this study, we present a workflow for postcorrelation on-lamella cryo-clem and employ it to study the d ultrastructure of lbs in unperturbed conditions. clem approaches are extensively used with chemically fixed or high-pressure frozen-freeze substituted (hpf-fs) samples to identify target structures and rare cellular events . however, such methods are inevitably compromised by sample preparation artifacts from dehydration, post-staining and sample sectioning. in particular, the structure of membranes and lipid-rich organelles is poorly retained due to lipid extraction occurring during sample dehydration by organic solvents. cryo-em allows direct visualization of membranous structures close to their native state. correlative approaches such as cryo-clem can be applied to identify molecules within the vitreous sample under the condition that the sample remains vitrified and free of ice contamination during the transfer between cryo-lm and cryo-tem . until now, cryo-clem workflows involving cryo-fib milling rely on high-precision d correlation for site-specific milling to target the region of interest. a publication by arnold et al. reported lamella preparation with d targeting based on cryo-lm and cryo-fib/sem correlation. this workflow was established on suspended cells mixed with fluorescent beads and required a custom-built cryo-stage for a spinning disk confocal microscope. in such an approach, the correlation accuracy is sensitive to the number and distribution of large fiducial beads and drift occurring during the milling . recently, a prototype microscope with a cryo-lm integrated into the cryo-fib microscope chamber was developed . this allows monitoring the fluorescent signal during milling and on the finished lamella. 
this signal can then be directly correlated without a sample transfer. in this study, we implemented an alternative strategy and developed a robust on-lamella cryo-clem approach, taking advantage of post-correlation. it is suitable for adherent cells cultured directly on em grids without the introduction of fiducial beads, utilizing a commercially available leica cryo-lm. although tilt series cannot be acquired based on a priori acquired fluorescence information, the postcorrelation approach gives the opportunity to correlate only vitreous, high-quality lamellae judged by cryo-tem. most of our attempts to perform cryo-lm after milling, but prior to cryo-tem, resulted in increased ice contamination on the lamellae, thus preventing high-quality cryo-et. in addition, we rarely detected fluorescence directly on lamellae, which could be due to the sensitivity limits of the leica wide-field cryo-lm. although deconvolution of widefield cryo-lm has been successfully used for tissue imaging , deconvolution has not been fully applied in cryo-lm of vitrified samples for cryo-clem applications. we show that the deconvolution of cryo-lm wide-field data is a necessary step for successful correlation and thus should be applied also to other cryo-clem strategies. computationally performed tilt correction and slice extraction are critical to the workflow even for large organelles like lbs. when applied together, quantification showed that % of abca -egfp signal localized to membrane-bound structures and lb organelles. a residual % of signals could not be assigned to any mvb/lb-like structures, likely because of the limited axial resolution of widefield microscopy. even though deconvolution can improve both lateral and axial resolution, it remains highly anisotropic and much larger than the - nm lamella thickness. the correlation precision of the described microscope setup is limited by the aberrations and the numerical aperture (na) of . 
of the × objective used in the cryo-lm, which in particular affects resolution in the z-direction. the development of lenses with improved aberration-correction, cryo-immersion lenses with a higher na and the implementation of confocal or super-resolution cryo-lm can further improve the z-resolution and thereby facilitate more accurate removal of out-of-lamella fluorescent signal in the post-correlation workflow. overexpression of abca -egfp resulted in fewer lds compared to naïve a cells (supplementary fig. ). it is possible that the abca overexpression results in mobilization of neutral lipids in lds for phospholipid synthesis. thus, we were not able to use lds as correlation markers and the final correlation was performed using the corners of the lamella which limited the x-y correlation precision to nm (fig. k). in comparison, we were able to correlate fluorescently labelled lds with an x-y precision of nm. because of the characteristic spherical morphology of lds and their increased electron density recognizable by cryo-em, lds can be used as fiducial markers to increase the correlation precision. cryo-lm performed at low relative humidity ( - %) improves both cryo-lm data and cryo-fib milling. therefore, integrating cryo-lm imaging into the cryo-fib/sem chambers, which allows obtaining fluorescence data before and after cryo-fib milling, is of great benefit to avoid sample transfers. lb ultrastructure has long been subject to debate. both parallel straight and concentric membrane sheets have been proposed as major structural components of lbs and both types were observed by cemovis. however, cutting artifacts and a lack of d data prevented a further understanding of the organelle architecture. detailed analysis of membrane spacing revealed a repeat of nm, compared to . nm found by cemovis of rat lungs.
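A membrane repeat of this kind is read from a power spectrum as the reciprocal of the dominant spatial frequency. The sketch below shows that conversion on a one-dimensional density profile using a plain discrete Fourier transform. It is an illustration only, not the authors' analysis: the function name, the synthetic profile and the pixel size of 0.5 nm are hypothetical.

```python
import cmath
import math

def dominant_repeat_nm(profile, pixel_size_nm):
    """Return the real-space repeat (nm) of the strongest non-zero frequency.

    profile: density values sampled along a line perpendicular to the membranes.
    pixel_size_nm: sampling interval of the profile (hypothetical input).
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]          # remove the DC component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):                  # positive frequencies only
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    frequency_per_nm = best_k / (n * pixel_size_nm)  # cycles per nm
    return 1.0 / frequency_per_nm                    # repeat distance in nm

# Synthetic profile: 64 samples with a period of 8 pixels.
profile = [math.cos(2 * math.pi * i / 8) for i in range(64)]
repeat = dominant_repeat_nm(profile, pixel_size_nm=0.5)
```

With a period of 8 pixels and 0.5 nm per pixel the function returns a repeat of 4 nm. Any compression of the section along the profile direction would shorten the measured repeat proportionally.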
the difference in the membrane spacing could either be explained by the compression up to % that occurs during sectioning , by maturation stage differences of the lbs, or by differences between human and rat lbs. the fact that primary human lung cells showed less compact packing of lamellar membranes than abca overexpressing a cells indicates that lbs membrane spacing is variable and the lb maturation might be dependent on abca levels and its regulation in a primary or immortalized cell line. post-correlation on-lamella cryo-clem combined with in situ cryo-et of abca -egfp positive organelles allowed us to unravel lb architecture in d. the core exhibits multiple concentric membrane sheets with increasing curvature towards the center. however, none of the lbs observed were exactly centrosymmetric, which had previously been proposed based on polarized light microscopy . "t"junctions can be observed on the inner leaflet of the limiting membrane, where membranes are pushed into the lipidic core as parallel sheets consistent with a model proposed by pérez-gil . in addition to parallel-curved membrane sheets, we occasionally observed crystalline lipid structures inside the lb core with a regular spacing of . - nm as revealed by fft. this spacing is in accordance with cholesterol ester crystals found in lipid droplets . our finding that abca -egfp signal localizes not only to lbs but also to other membrane-bound organelles like mvbs, er or vesicles is concurrent with previous data on trafficking and proteolytic processing of abca in lamp positive vesicles . however, abca distribution and its fraction on lb might be different in naive a cells and in primary lung cells. since lbs found in abca -egfp expressing a cells do not only contain lamellated membranes but also crystalline lipidic structures and vesicles, these organelles rather represent 'composite bodies', a precursor form of lbs. 
lbs found in primary lung cells were morphologically similar to those induced by abca -egfp overexpression, indicating that neither the lamellar organization nor "t"-junctions found in the abca -egfp positive organelles are artifacts of overexpression. notably, we did not observe many vesicles inside lbs of hsaepc cells, thus vesicles might not play a major role in lb biogenesis in primary lung cells. the a long-term culture showed an increased number of organelles compared to short-term non-transfected cells. since no lb marker was used to localize lbs in a long-term culture, we performed cryo-et on both cryo-fib-milled cells and the periphery of whole cells. interestingly, the majority of large lbs was found in the cell periphery of a long-term culture, where lb maturation and surfactant secretion occur. the post-correlation on-lamella cryo-clem workflow can be applied to study localization of other proteins involved in the maturation of lb such as rab , which plays an indispensable role in maintaining the morphology of lbs . we identified a dome-shaped protein complex on the lb outer membrane both in a cells and in primary lung cells that to our knowledge has not been reported yet. sta revealed that omdps exhibit a hollow cage-like structure, however, the limited number of the averaged omdps prevented obtaining molecular details of the complex. nevertheless, based on the indicated c symmetry we speculate that the complex is built of multiple subunits composed of protein assemblies symmetrically organized around a single rotation axis. notably, we frequently observed the structurally well-characterized mda major vault protein (mvp) in the cytosol of a cells, which forms large cage structures with -fold dihedral symmetry . however, the dimensions of the terminal caps of the mvp are different from the omdp, indicating that omdp is not derived from mvp ( supplementary fig. ). 
although omdps were rarely detected in abca -egfp expressing cells, they were frequently observed in a long-term culture. we believe that the low occurrence of omdps in abca -egfp overexpressing a cells can be explained by high abca occupancy on the limiting membrane, causing a displacement of other proteins. since omdps have also been observed on mvbs, we assume that this complex plays a more general role in endo-lysosomal trafficking. the composition of the omdp is beyond the scope of this study and warrants further investigation. since mature lbs are exocytosed, we speculate that omdps could be involved in organelle tethering or in the regulation of fusion between the limiting membrane and the plasma membrane. an omdp can be observed in a tomogram deposited by ben engel (emdb ), indicating that the complex might be conserved across several domains of life and could be involved in membrane biology in general. based on structural analysis of abca -positive organelles as well as lbs found in the a long-term culture, we propose a model for lb biogenesis (fig. ): abca -mediated membrane lipid asymmetry causes the inner leaflet to push bilayers into the lumen forming "t"-junctions. it is conceivable that abca itself or other proteins are responsible for the concentration of phospholipids into membrane sheets. sp-b is a small protein that contains amphipathic helices and thus might be responsible for stabilizing the membrane sheets' termini and preventing their fusion into vesicles. these sheets are detached from the lb limiting membrane and shaped according to the principal spherical curvature of the lb to maximize the lipid storage capacity per given volume. finally, omdp observed on the limiting membrane might regulate lb trafficking or be involved in exocytosis.
in conclusion, we developed and implemented a robust post-correlation on-lamella cryo-clem workflow which can be applied to prove identity of cellular organelles using a well-defined fluorescent marker to reveal the structural details of organelles in the context of the native cellular environment. a cells were purchased from ecacc (lot j ) and cultured in dmem/f medium (thermofisher scientific) supplemented with % (v/v) fetal bovine serum (fbs) (thermofisher scientific) and u/ml penicillin-streptomycin (thermofisher scientific) at °c, % co . hsaepc primary human lung cells were purchased from promocell (lot z . ) and used at passage . the hsaepc cells were maintained in small airway epithelial cell basal medium (promocell, c- ) supplemented with small airway epithelial cell growth medium supplement mix (promocell, c- ) and u/ml penicillin-streptomycin (thermofisher scientific) for weeks (the medium was exchanged times a week) and detach kit (promocell, c- ) was used to detach the cells from the culture dish before seeding the cells on em grids. a long-term culture was maintained without subculturing according to a protocol by cooper et al. in ham's f (sigma-aldrich) supplemented with % fbs and u/ml penicillin-streptomycin (thermofisher scientific) for days. the medium was exchanged twice a week. to prepare samples for plunge-freezing, mesh quantifoil™ au r / grids were plasma-cleaned in a gatan solarus (gatan) for sec, sterilized by dipping into % ethanol, placed in a mm dish (thermofisher scientific) coated with a thin layer of polydimethylsiloxane (pdms, dow europe gmbh) and incubated in ml of medium at °c for min. subsequently, the medium was removed and . x cells were seeded on grids placed in the pdms-coated dish containing ml of medium. the next day, cells were transiently transfected with µg pe-habca -egfp plasmid (habca -egfp expression controlled by cmv promoter) in trans-it lt transfection reagent (mirus).
h post-transfection the medium was removed, and the cells were incubated in fresh medium containing µg/ml hoechst (sigma-aldrich) and nm lipiblue (dojindo europe) to stain nucleus and lipid droplets, respectively. after min incubation, cells were washed three times with medium. cells were plunge-frozen into liquid ethane immediately after fiducial staining using a leica em gp automatic plunge-freezer. cryogen temperature was set to - °c and the chamber to °c and % humidity. grids were blotted from the back with whatman® type paper for - sec. µl medium was added to the grid just before plunge-freezing. for samples that were used for whole-cell tomography, protein-a gold fiducials with a nominal diameter of nm (aurion) were added. grids were inserted into autogrids™ (thermofisher scientific) designed for fib milling. fluorescent maps of vitreous samples were acquired using an em cryo-clem widefield microscope (leica) equipped with a × air objective (na . ), metal halide light source (el ), an air-cooled detector (dfc gt) and a cryostage cooled to - °c in the room with relative humidity ranging between - %. a square of approximately . mm by . mm was acquired in the center of the grid using the las x navigator. for each field of view, a symmetrical µm z-stack with a nm spacing was acquired around the autofocus point. the following channels were used: tl-bf, dapi (ex: bp / , em: / ) and gfp (ex: bp / , em: / ). tif files were exported from las x and a mip map was stitched using the grid/collection stitching plugin in imagej/fiji. to increase the clarity of the fluorescence signal, image stacks containing cells of interest were subjected to deconvolution in autoquant x (media cybernetics). a theoretical point-spread function (psf) was utilized based on microscope and lens specifications (na . , × lens, refraction index ) and refined over iterations.
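To give a feeling for why the axial direction is the limiting one in such a widefield setup, the following sketch evaluates two standard textbook estimates: the Rayleigh lateral resolution, 0.61 λ / NA, and the widefield axial resolution, 2 λ n / NA². It is not part of the authors' deconvolution pipeline, and the wavelength, numerical aperture and refractive index used in the example call are placeholder values, not the specifications of the microscope described here.

```python
def lateral_resolution_nm(wavelength_nm, numerical_aperture):
    """Rayleigh criterion for lateral (x-y) resolution: 0.61 * lambda / NA."""
    return 0.61 * wavelength_nm / numerical_aperture

def axial_resolution_nm(wavelength_nm, numerical_aperture, refractive_index=1.0):
    """Common widefield estimate for axial (z) resolution: 2 * lambda * n / NA^2."""
    return 2.0 * wavelength_nm * refractive_index / numerical_aperture ** 2

# Placeholder optics (hypothetical): 500 nm emission, NA 0.5, air (n = 1.0).
xy = lateral_resolution_nm(500.0, 0.5)
z = axial_resolution_nm(500.0, 0.5, 1.0)
anisotropy = z / xy   # how much worse z is than x-y for these placeholder values
```

Because the axial estimate scales with 1/NA² while the lateral one scales with 1/NA, a higher numerical aperture improves z-resolution disproportionately.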
after cryo-fluorescence imaging, self-supporting lamellae were prepared using cryo-fib milling on an aquilos dualbeam fib-sem microscope in a room with controlled relative humidity between - % (thermofisher scientific) as first described by rigort et al. transfected cells were selected using correlation with a stitched mip fluorescence map obtained in the previous step using the maps . software (thermofisher scientific). cells were coated with an organo-metallic platinum layer for - sec and gradually thinned in steps at a stage angle of - ° using a ga + beam to yield lamellae with - nm thickness after the final milling step. if possible, microexpansion joints were used as described by wolff et al. to improve lamella stability. progress was monitored by sem imaging at . - kv with etd and t detectors. after lamella preparation, grids were transferred to a titan krios cryo-tem operated at kv (thermofisher scientific) equipped with a k direct electron detector and a gatan imaging filter (gatan). image acquisition was controlled using serialem. first, low-magnification maps were acquired at × magnification to find the lamellae, then medium-magnification maps (mmm) were acquired at , × for correlation and the identification of sites for tilt series (ts) acquisition at , × or , × (corresponding pixel sizes at the specimen level: . and . Å, respectively). ts were acquired using a dose-symmetric scheme with a constant electron dose of approx. e -/Å per projection, target defocus - μm, energy filter slit at ev, covering the range from ° to - ° in ° increments. on-lamella tomography was done with a stage tilted to ° to compensate for the pre-tilt of the lamella with respect to the grid. each projection was acquired in counting mode using dose fractionation and - individual frames, which were aligned and summed using the semccd plugin in serialem.
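The acquisition order implied by a dose-symmetric scheme with a stage pre-tilt can be sketched as follows. This is one common ordering (start at the central angle, then alternate sides while moving outward) and is only an assumption about the general idea; the exact grouping used by SerialEM in this study is not stated in the text. The function name and all angles in the example are hypothetical.

```python
def dose_symmetric_order(center_deg, max_offset_deg, step_deg):
    """Return tilt angles in acquisition order, moving outward from center_deg.

    center_deg: starting angle, e.g. the stage pre-tilt that brings the
                lamella perpendicular to the beam (hypothetical value).
    max_offset_deg: largest tilt away from the center on either side.
    step_deg: tilt increment.
    The side acquired first flips at every step: c, c+s, c-s, c-2s, c+2s, ...
    """
    angles = [center_deg]
    n_steps = int(max_offset_deg // step_deg)
    for i in range(1, n_steps + 1):
        offset = i * step_deg
        pair = [center_deg + offset, center_deg - offset]
        if i % 2 == 0:
            pair.reverse()      # start on the side where the previous step ended
        angles.extend(pair)
    return angles

def cumulative_dose(n_projections, dose_per_projection):
    """Total exposure for a constant dose per projection."""
    return n_projections * dose_per_projection

# Hypothetical example: pre-tilt of 10 degrees, +/- 6 degrees in 3 degree steps.
order = dose_symmetric_order(10, 6, 3)
total = cumulative_dose(len(order), 3.0)
```

The point of such an ordering is that the low-tilt projections, which carry the most information, are recorded before the specimen has accumulated much beam damage.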
whole-cell tomography was performed as stated above with the exception that the ts were acquired at magnification , × (pixel size at the specimen level: . Å) using a k direct electron detector instead of the k direct electron detector used for on-lamella tomography. ts were processed using the imod package. ts were aligned using patch-tracking. before reconstruction, the contrast transfer function (ctf) was corrected by phase-flipping and the micrographs were dose-filtered. tomograms were reconstructed using the weighted backprojection algorithm with a sirt-like filter equivalent to iterations. for power spectral analysis in fig. , tomograms were reconstructed using the sirt algorithm implemented in dmod. renderings were created in amira . (thermofisher scientific). first, a membrane enhancement filter was applied. using the top hat segmentation tool, a first d rendering was created. based on this initial model, all membranes were manually segmented. subtomogram averaging (sta) was performed in dynamo. particles (n = ) were picked manually in tomograms and voxel subtomograms were extracted using a crop-on-wall function with the initial particle orientation assigned normal to the segmented limiting membrane of the lb. an initial reference was generated by averaging all particles and the average was iteratively refined using a half-dome shaped mask. a symmetry scan was performed using a focused mask on the top of the cage assembly. subsequently, the average was further refined using the c symmetry. after vitrification of fluorescently labeled cells by plunge freezing, grids were mapped on the leica cryo-lm. the grid was transferred to a dual-beam cryo-fib/sem. by low-precision correlation of the mip fluorescent map with the sem map of the grid, cells of interest were selected and fib-milled. after a second transfer of the sample to the kv cryo-tem, the lamellae were mapped, areas of interest were selected, and ts were acquired. 
to facilitate high-precision post-correlation of the lamellae, the grids were recovered after the cryo-et acquisition and the lamellae were mapped a second time in the cryo-lm to reveal the exact lamella positions on the grid. to merge the fluorescent signal from the first cryo-lm map with the information of the lamella position on the second cryo-lm map, the deconvolved stacks were registered in matlab b (mathworks) using the imregtform function with a custom script (see code availability). a rigid transformation matrix including translation and rotation was calculated based on the same fluorescent channel of both cryo-lm maps. the remaining channels including tl-bf from the second cryo-lm map were transformed by applying the calculated transformation matrix. the fluorescent channels of the first cryo-lm map were merged with the transformed tl-bf channel of the second cryo-lm map in fiji/imagej and saved as a composite tif. to compensate for the fib-milling angle, the composite stack was resliced using fiji/imagej. using the tl-bf channel, the actual lamella tilt in the stack was measured, and the composite stack was rotated accordingly to compensate for the tilt. the z-slice of the lamella was identified and extracted after a second reslice in fiji/imagej. this single tilt-corrected slice was correlated with the stitched tem-map using a non-rigid d transformation in ec-clem, utilizing the four corners of the lamella as well as clearly identifiable fluorescently labeled organelles like lipid droplets as landmarks. to evaluate the correlation and the x-y precision of the post-correlation on-lamella cryo-clem workflow, the final correlation of the mmm and cryo-lm map was analyzed. local maxima of the cryo-lm map were calculated in fiji/imagej with a tolerance threshold of . for each local maximum, ultrastructures of the correlated mmm in a spherical area (radius = µm) were classified either as ld, lb or membrane-bound organelle (mvbs, er, vesicle). 
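the rigid registration step described above (a rotation plus translation estimated on one fluorescent channel and then applied to the remaining channels) can be illustrated with a minimal python sketch. this is not the authors' matlab imregtform script: the function names are hypothetical, the transform is estimated here from matched landmark coordinates with a least-squares (kabsch) fit rather than from image intensities, and resampling uses nearest-neighbour lookup for brevity.

```python
import numpy as np


def estimate_rigid_2d(src_pts, dst_pts):
    """Least-squares rotation r and translation t such that dst ~ r @ src + t.

    Points are (row, col) coordinates of the same landmarks in both maps.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)        # 2x2 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against reflections
    r = vt.T @ np.diag([1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t


def apply_rigid_2d(image, r, t):
    """Resample a 2D channel into the reference frame (nearest neighbour)."""
    rows, cols = np.indices(image.shape)
    out_coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    # Inverse mapping: each output pixel o reads the input at r.T @ (o - t).
    src_coords = (out_coords - t) @ r
    idx = np.rint(src_coords).astype(int)
    valid = (
        (idx[:, 0] >= 0) & (idx[:, 0] < image.shape[0])
        & (idx[:, 1] >= 0) & (idx[:, 1] < image.shape[1])
    )
    out = np.zeros(image.size, dtype=image.dtype)
    out[valid] = image[idx[valid, 0], idx[valid, 1]]
    return out.reshape(image.shape)
```

in such a sketch the transform would be estimated once and then reused for every other channel, for example `[apply_rigid_2d(ch, r, t) for ch in channels]`, which mirrors the idea of computing the matrix on one channel and applying it to the rest.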
mitochondrial or nuclear membranes as well as areas without any membrane-bound structures were classified as unspecific. for each assigned ultrastructure, the distance and angle to the center of the local maximum were measured and plotted as a radar plot to estimate the correlation precision. confocal microscopy of a cells transfected with abca -egfp was done using a sp tcs laser scanning confocal microscope (leica) equipped with a × . na oil immersion objective and an environmental control chamber heated to °c. image analysis was performed in imagej/fiji. automated d segmentation of abca -egfp structures was done in imaris . . (bitplane) after deconvolution in autoquant x . (media cybernetics). the fiji/imagej macro for fluorescent map assembly and the matlab code used for d stack registration are available on github under https://github.com/chlanda-lab. structure and function of lamellar bodies, lipid-protein complexes involved in storage and secretion of cellular lipids pulmonary surfactant: the key to the evolution of air breathing surfactant lipids at the host-environment interface. metabolic sensors, suppressors, and effectors of inflammatory lung disease surfactant protein a genetic variants associate with severe respiratory insufficiency in pandemic influenza a virus infection surfactant protein c-deficient mice are susceptible to respiratory syncytial virus infection surfactant protein genetics in community-acquired pneumonia: balancing the host inflammatory state structure, genetics and function of the pulmonary associated surfactant proteins a and d: the extra-pulmonary role of these c type lectins the sars coronavirus spike glycoprotein is selectively recognized by lung surfactant protein d and activates macrophages lethal h n influenza a virus infection alters the murine alveolar type ii cell surfactant lipidome fatty acid metabolism is associated with disease severity after h n infection the closer we look the more we see? 
quantitative microscopic analysis of the pulmonary surfactant system structural studies on lamellated osmiophilic bodies isolated from pig lung. p nmr results and water content biosynthetic routing of pulmonary surfactant proteins in alveolar type ii cells a small key unlocks a heavy door: the essential function of the small hydrophobic proteins sp-b and sp-c to trigger adsorption of pulmonary surfactant lamellar bodies identification of lbm , a lamellar body limiting membrane protein of alveolar type ii cells, as the abc transporter protein abca abca -mediated choline-phospholipids uptake into intracellular vesicles in a cells structure of pulmonary surfactant membranes and films: the role of proteins and lipidprotein interactions lamellar body ultrastructure revisited: high-pressure freezing and cryo-electron microscopy of vitreous sections cutting artefacts and cutting process in vitreous sections for cryo-electron microscopy focused ion beam micromachining of eukaryotic cells for cryoelectron tomography focused-ion-beam thinning of frozenhydrated biological specimens for cryo-electron microscopy correlated fluorescence and d electron microscopy with high sensitivity and spatial precision correlated cryofluorescence and cryo-electron microscopy with high spatial precision and improved sensitivity fluorescence-based detection of membrane fusion state on a cryo-em grid using correlated cryo-fluorescence and cryo-electron microscopy correlative cryo-electron microscopy reveals the structure of tnts in neuronal cells correlative cryo-light microscopy and cryo-electron tomography: from cellular territories to molecular landscapes high-precision correlative fluorescence and electron cryo microscopy using two independent alignment markers correlated fluorescence microscopy and cryo-electron tomography of virusinfected or transfected mammalian cells site-specific cryo-focused ion beam sample preparation guided by d correlative microscopy a selective autophagy pathway 
for phase separated endocytic protein deposits a continuous tumor-cell line from a human lung carcinoma with properties of type ii alveolar epithelial cells mind the gap: micro-expansion joints drastically decrease the bending of fib-milled cryolamellae dynamo: a flexible, user-friendly development tool for subtomogram averaging of cryo-em data in high-performance computing environments long term culture of the a cancer cell line promotes multilamellar body formation and differentiation towards an alveolar type ii pneumocyte phenotype pie-scope, integrated cryocorrelative light and fib/sem microscopy removal of subsurface fluorescence in cryoimaging using deconvolution aberration-corrected cryoimmersion light microscopy towards correlative super-resolution fluorescence and electron cryo-microscopy polarized light microscopy reveals physiological and drug-induced changes in surfactant membrane assembly in alveolar type ii pneumocytes liquid-crystalline phase transitions in lipid droplets are related to cellular states and specific organelle association the surfactant lipid transporter abca is n-terminally cleaved inside lamp -positive vesicles rab targets to lamellar bodies and normalizes their sizes in lung alveolar type ii epithelial cells the structure of rat liver vault at . angstrom resolution globally optimal stitching of tiled d microscopic image acquisitions fiji: an open-source platform for biological-image analysis automated electron microscope tomography using robust prediction of specimen movements automated tilt series alignment and tomographic reconstruction in imod ec-clem: flexible multidimensional registration software for correlative microscopies we would like to thank dr. surafel mulugeta for kindly providing the plasmid habca -egfp and to prof. dr. hans-georg kräusslich for providing the hsaepc cells. we thank dr. stefan pfeffer for critical reading of the manuscript and to dr. ben engel and dr. 
lars anders carlson for a fruitful discussion about omdp. we would like to acknowledge microscopy support from the infectious diseases imaging platform (idip) at the center for integrative infectious disease research heidelberg. we would like to acknowledge access to the infrastructure and support provided by the cryo-em network at the heidelberg university (hd-cryonet). funded by the deutsche forschungsgemeinschaft (dfg, german research foundation) projektnummer sfb . key: cord- -k xhbssu authors: norwood, jordan n.; gharpure, akshay p.; kumal, raju; turner, kevin l.; pistone, lauren ferrer; vander wal, randy; drew, patrick j. title: intranasal administration of functionalized soot particles disrupts olfactory sensory neuron progenitor cells in the neuroepithelium date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k xhbssu exposure to air pollution has been linked to the development of neurodegenerative diseases and anosmia, but the underlying mechanism is not known. additionally, the loss of olfactory function often precedes the onset of neurodegenerative diseases. chemical ablation of olfactory sensory neurons blocks the drainage of cerebrospinal fluid (csf) through the cribriform plate and alters normal csf production and/or circulation. damage to this drainage pathway could contribute to the development of neurodegenerative diseases and could link olfactory sensory neuron health and neurodegeneration. here, we investigated the impact of intranasal treatment of combustion products (laboratory-generated soots) and their oxygen functionalized derivatives on mouse olfactory sensory neurons, olfactory nerve cell progenitors, and the behavior of the mouse. we found that after a month of every-other-day intranasal treatment of soots, there was minimal effect on olfactory sensory neuron anatomy or exploratory behavior in the mouse. however, oxygen-functionalized soot caused a large decrease in globose basal cells, which are olfactory progenitor cells. 
these results suggest that exposure to air pollution damages the olfactory neuron progenitor cells, and could lead to decreases in the number of olfactory neurons, potentially disrupting csf drainage. air pollution, particularly small combustion particles (< . µm, pm . ), is a large contributor to global mortality (burnett et al., ). these small particles are produced by combustion in internal combustion engines, jet aircraft engines, and during cooking. once generated, these particles can be oxidized over time (rattanavaraha et al., ; li et al., ; pourkhesalian et al., ), generating surface functionalized oxygen groups which can increase their cellular toxicity (li et al., ; holder et al., ; li et al., ). in addition to the many other adverse health effects of air pollution, there is a strong epidemiological link between exposure to air pollution, particularly pm . , and the development of neurodegenerative diseases (wang et al., ; forman and finch, ; peters et al., ) and mental disorders (atanasova et al., ; hummel et al., ; buoli et al., ). exposure to air pollution, particularly fine particulate matter (pm . ), also leads to a reduced sense of smell and anosmia (ajmani et al., a; ajmani et al., b) and can damage nasal tissue (calderon-garciduenas et al., ). interestingly, anosmia and a decline in the sense of smell precede the onset of neurodegenerative disorders (doty, ; wilson et al., ; rahayel et al., ; growdon et al., ; ottaviano et al., ; roberts et al., ; murphy, ) and are also associated with depressive disorders (croy et al., ; kohli et al., ). similar damage and sensory deficits have been implicated in covid- pathology (cooper et al., ). the observed associations between particulate exposure, decreased olfactory function, and development of neurodegenerative and mental disorders suggest that some of the observed degeneration might originate from damage to olfactory sensory neurons (osns) in the nasal epithelium. 
the movement of cerebrospinal fluid (csf) is thought to remove waste from the brain (iliff et al., ; nedergaard, ), and disruption of normal csf turnover and circulation has been hypothesized to lead to the development of neurodegenerative diseases (albeck et al., ; stoquart-elsankari et al., ; simon and iliff, ; benveniste et al., ). in addition to csf drainage pathways through meningeal lymphatics and arachnoid granulations (boulton et al., ), csf drains through the cribriform plate along olfactory neuron axons, and chemical ablation of osns blocks this normal outflow, leading to decreased csf production and/or altered csf circulation (norwood et al., ). thus, any damage to olfactory sensory neurons by air pollutants, in addition to impairing the sense of smell, might lead to disruption of normal csf circulation, which can then contribute to the development of neurodegenerative diseases. olfactory sensory neuron cell bodies are located in the nasal epithelium and send their axons to the olfactory bulb through the holes (foramina) in the cribriform plate. because these neurons are exposed to the environment, they have a relatively short lifetime (several months (gogos et al., )), and are constantly replenished throughout the lifetime of the organism. olfactory sensory neurons are generated from a population of nearby stem cells (brann and firestein, ; liberia et al., ), and the ongoing neurogenesis of olfactory sensory neurons continues throughout the life of the animal. there are two classes of stem cells in the nasal epithelium that give rise directly and indirectly to osns: horizontal basal cells (hbcs) and globose basal cells (gbcs) (child et al., ). gbcs generate olfactory sensory neurons, while hbcs are usually quiescent and are involved in regenerating the nasal epithelium in response to injury. the capacity for regeneration has limits and is reduced with aging or repeated insults (child et al., ). 
chronic nasal inflammation causes degeneration of olfactory neurons and their progenitor cells in both humans and animals (chen et al., ; hasegawa-ishii et al., ). insults that kill olfactory sensory neurons and their progenitor cells will lead to shrinkage and loss of the nasal csf outflow pathways. insults that kill either gbcs or hbcs will decrease the population of stem cells, potentially resulting in a decrease in the number of osns later in life. to better understand the effects of air pollution on olfactory sensory neurons and their progenitor cells, we investigated the impact of intranasal treatment with surrogates for combustion-generated 'soots' synthesized from carbon black precursors. carbon black is primarily composed of elemental carbon, but like combustion-produced soot, it is formed by the partial combustion or thermal decomposition of hydrocarbons (donnet, ). the morphology consists of primary particles that are partially merged or appear "fused" into aggregates (fig. a). such synthetic soots are also free of variable combustion-derived contaminants such as metals, ash, or condensed organics. we treated the mice with either non-functionalized soots, which resemble the combustion products immediately after their production, or functionalized soots that have been subject to oxygen functionalization, which modifies their surface chemistry and mimics the oxidation processes that would take place during atmospheric aging. we found that relative to vehicle controls, neither non-functionalized soots nor oxygen-functionalized soots had an appreciable impact on olfactory sensory neurons. the effects of soot exposure on exploratory behavior were also minimal. however, oxygen-functionalized soots greatly decreased the levels of olfactory progenitor cells, suggesting that exposure to these particles can set up a long-term decrease in the number of osns. such a decrease could lead to anosmia and decreased csf movement. 
synthesis of soots: synthetic soot was produced by functionalizing commercial carbon black (regal , cabot corp.). carbon black was selected for its chemical purity and size similarity to diesel engine-produced soot. to introduce oxygen functional groups such as phenolic, carbonyl and carboxylic groups, we used wet-chemical treatment based on acid etching (romanos et al., ). in this preparation a gram of carbon black was treated with ml laboratory grade concentrated nitric acid (hno , > %) under reflux for a duration of hours at ˚c, just below the acid's boiling point of ˚c. the carbon-acid mixture was continuously stirred using a magnetic stirrer to ensure uniform exposure and functionalization. the mixture was maintained at a consistent simmer and was thereafter washed with distilled water, filtered, and dried to obtain functionalized carbon black as a synthetic, oxidized soot. any potential residual organic or aromatic compound present on the manufactured material as supplied would be oxidized and removed under these conditions. to visualize soot particles, we used transmission electron microscopy (fei talos f x instrument equipped with quad element eds detector capable of both transmission electron microscopy (tem) and scanning transmission electron microscopy (stem)). for imaging, a beam acceleration voltage of kev was used. beam current was kept below na, for which sample damage or alteration is negligible at these magnifications (< kx). image defocus was one or two steps before the eucentric position. images were captured using a ceta-cooled ccd. samples were dispersed and sonicated in methanol before being dropped onto mesh c/cu lacey tem grids. high angle annular dark field (haadf) images were obtained using an annular detector. eds for elemental analysis and mapping was performed in the tem. we also used the stem mode, which has a high spatial resolution on the order of the minimum probe size ( . Å). 
the instrument was fitted with a -quadrant sdd super-x eds detector for eds. the detection limit is typically < atomic percent (at. %) depending on collection parameters. typically, - regions of each material (nascent and functionalized forms) were sampled to gauge elemental representation. eds was performed in stem mode with a sample holder designed to provide low background signal for eds. xps experiments were performed using a physical electronics versaprobe ii instrument equipped with a monochromatic al kα x-ray source (hν = , . ev) and a concentric hemispherical analyzer. charge neutralization was performed using both low energy electrons (< ev) and argon ions. peaks were charge referenced to c-c band in the carbon s spectra at . ev. measurements were made at a takeoff angle of ° with respect to the sample surface plane. this resulted in a typical sampling depth of - nm ( % of the signal originated from this depth or shallower). quantification was done using instrumental relative sensitivity factors (rsfs) that account for the x-ray cross section and inelastic mean free path of the electrons. a thermogravimetric analyzer (ta , ta instruments) coupled to a discovery mass spectrometer (ms) was used to analyze mass loss and the composition of the evolved gases as a function of temperature. the temperature was ramped up at ˚c/min in an inert atmosphere. the tga features low volume, maximum temperature to ˚c and has an inert quartz liner. the ms is a quadrupole mass spectrometer with a heated capillary interface, offering a - amu range, unit m/z resolution. a horiba labram raman microscope was used to obtain raman spectra for the samples when exposed to a nm mw laser with a grooves/mm grating, providing a spectral resolution of cm - . xps was applied to dispersed powder to quantify both surface oxygen atom content (at. 
% basis) and distribution of oxygen functional groups (-c-oh, phenolic, -c=o, carbonyl, and -cooh, carboxylic); the nominal c s (energy loss) positions were , and . kev. casa was applied to deconvolve the high-resolution spectra, with group contributions ratioed to the total oxygen elemental content. as a baseline, nascent (untreated) carbon black was also subject to the same analytical procedure as a "blank" sample. wet acid reflux treatment of carbon black yielded ~ atomic % (near-surface) oxygen, compared to the untreated carbon black, which registered negligible surface oxygen content (< at. %). by curve-fitting the c s spectral loss profile, the calculated distribution across functional groups was determined as . % (phenolic, c-oh); . % (carbonyl, c=o); and . % (carboxylic, -cooh) (vander wal et al., ). (the good agreement (± %) in the measured and calculated value of atomic oxygen indicates appropriate curve fitting for functional group identification.) the tga curve shows distinct regions of mass loss owing to functional groups leaving as temperature increases. the wt.% net mass loss corresponds to the gasification of the carbon by the chemisorbed oxygen groups. resolved by temperature, the tga spectrum supported xps identification of functional groups by successive mass loss stages for the oxygen group classes. temperature resolved mass loss curves reveal m/z peaks at amu (co ) arising predominantly from carboxylic groups and at amu (co) arising from carbonyl and phenol groups (kundu et al., ). soot treatment protocol for mice: after the mouse had been rendered unconscious by a brief exposure to isoflurane, µl of soot (functionalized or non-functionalized, % in sterile h o) or vehicle control (sterile h o) was administered to the left nare dropwise using a pipette. the animal was then inverted to allow excess fluid to exit the nasal cavity. this treatment was repeated every other day ( days a week) for one month. 
the animals were monitored and weighed daily after treatment. histology: mice were sacrificed via isoflurane overdose and perfused intracardially with confocal and images were processed using imagej (nih). cell quantification procedures: to quantify the mean fluorescence of pax and p antibody expression, images were first obtained on the olympus fluoview confocal. imaging settings were kept constant across samples to enable quantification of fluorescence. using imagej (nih), a rectangular roi was drawn ( µm in width and µm in height) along the apical side of the neuroepithelium. for every animal, the roi was drawn µm in the rostral direction from the cribriform plate within the neuroepithelium located on the dorsal side of the medial olfactory nerve. the mean fluorescence of the roi for each color channel (corresponding to each of the antibodies used) was obtained and averaged together for each treatment group. data was plotted and analyzed in graphpad prism , using one-way anova to test for significance. to measure any effects of intranasal soot treatment on behavior, mice were individually placed in a x x cm (l x w x h) plastic box one month after the start of the treatment. all experiments were performed between and zt. the acquisition and analysis were done with the experimenter blinded to the treatment, and the order of animals was randomized. mice were placed in the enclosure for minutes, and the behavior was quantified over this entire period. the enclosure was cleaned with % ethanol between mice. the amount of locomotion and rearing behavior were monitored using an intel® realsense™ depth camera d (hong et al., ) . this camera provides simultaneous visible light and depth information used to calculate the animal's distance from the camera. images were acquired at a nominal rate of frames/second using matlab (https://github.com/intelrealsense/librealsense). 
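given a stack of depth frames such as those acquired above, the two behavioral readouts used in this analysis reduce to simple per-frame summaries: centroid displacement summed over successive frames, and rearing flagged when the mean height of the tallest mouse pixels exceeds a threshold. the python sketch below is a hypothetical illustration and not the authors' matlab code. it assumes that a boolean mouse mask and a height-above-floor map are already available for each frame, and the pixel size, top-pixel fraction and height threshold are placeholder parameters rather than values taken from this study.

```python
import numpy as np


def mask_centroid(mask, cm_per_pixel):
    """Centroid (x, y) of a boolean mouse mask, converted to cm."""
    rows, cols = np.nonzero(mask)
    return np.array([cols.mean(), rows.mean()]) * cm_per_pixel


def path_length(centroids_cm):
    """Sum of Euclidean distances between successive centroid positions."""
    c = np.asarray(centroids_cm, dtype=float)
    return float(np.linalg.norm(np.diff(c, axis=0), axis=1).sum())


def is_rearing(height_cm, mask, top_fraction, threshold_cm):
    """True when the mean of the tallest `top_fraction` of mouse pixels
    exceeds `threshold_cm` above the enclosure floor."""
    h = np.sort(height_cm[mask])[::-1]            # tallest pixels first
    n_top = max(1, int(round(top_fraction * h.size)))
    return bool(h[:n_top].mean() > threshold_cm)


def count_events(flags):
    """Number of rearing events, counted as False -> True transitions."""
    f = np.asarray(flags, dtype=bool)
    if f.size == 0:
        return 0
    return int(f[0]) + int(np.count_nonzero(f[1:] & ~f[:-1]))
```

total rearing time would then follow from the number of flagged frames divided by the frame rate, and the per-animal totals are the quantities that enter the statistical model.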
to track the distance the animal traveled, the displacement of the centroid of the mouse was calculated between each pair of successive frames. this distance between frames was then summed over the course of the minutes. rearing events were defined as when the mean of the highest % of pixels of the mouse exceeded cm from the bottom of the enclosure. a generalized linear mixed-effects model (matlab function fitglme) was used to evaluate the differences in rearing events, rearing duration, and distance traveled. each treatment (vehicle, non-functionalized and functionalized soot) was a fixed effect, with sex treated as a random effect. data availability: code for the acquisition, analysis and plotting of the behavioral data, as well as the behavioral data plotted in figure , is available here: https://github.com/drewlab/norwood_gharpure_turner_ferrer-pistone_vanderwal_drew_manuscript a transmission electron micrograph of a carbon black aggregate and primary particle is shown in fig. . the aggregate consists of pseudo-spherical primary particles, partially merged or fused together forming a fractal aggregate. a raman spectrum of the nascent carbon black is shown in fig. c. raman spectroscopy has been developed as a standard method for determining the planar coherence lengths (la) in graphitic carbon, which possesses limited long-range order (tuinstra and koenig, ). the lower frequency "d" peak at ~ cm - arises from disorder-induced raman activity of zone-boundary a g phonons, whereas the "g" peak at ~ cm - reflects the in-plane stretching motion of the aromatic rings, designated as e g motions. their comparable intensity reveals considerable disordered carbon, characteristic of furnace blacks and representative of combustion-produced soot emissions (dennison et al., ; sadezky et al., ). their intensity ratio is an accepted measure for determining la in disordered graphitic materials, given by the relation . (id/ig) - = la, calculated here as . 
nm, a value commensurate with the short lamellae viewed by hrtem (sadezky et al., ). the asymmetry of the d-peak due to the extended low frequency (shift) tail is consistent with further disorder of the carbon lattice such as sp and sp carbon at the periphery of the crystallites, contributing vibrations of a g symmetry (sadezky et al., ; parent et al., ). fluorescence from the oxygen groups and their auxochromic interactions with the electrons of the sp carbon network dwarfed the raman signature of the functionalized material, preventing its comparison to the nascent material. figure d shows a high angle annular dark field tem image of soot particles for reference and the respective eds map displaying carbon (blue) and oxygen (red) for the nitric acid functionalized carbon black. while eds cannot point to a definitive volumetric vs. surface oxygen presence given its -d nature, nitric acid-treated carbon black shows oxygen appearing to be concentrated along the particle perimeter, reflecting a higher near-surface contribution near the particle edge along the beam path. it must be noted that, in the -d image, eds shows relative amounts of elemental carbon and oxygen and does not give information on what functional groups are present. an image of the % solution of soot in water that is applied intranasally is shown in figure e.
a) schematic showing the structure of a soot particle, which is an aggregate of smaller particles. b) tem image of a soot aggregate supported by the lacy mesh of the tem grid, illustrating the morphological structure of the particles. pseudo-spherical primary particles are coalesced, forming a branched aggregate whose d projection is shown in the image. c) raman spectrum of r- carbon black. the two peaks of similar intensity are indicative of unstructured (non-graphitic) carbon. d) left, a high angle, annular dark field (haadf) image of soot particles. formed by scattered (rather than transmitted) intensity, the uniformity illustrates the lack of crystallinity and absence of heavy elements such as metals. right, corresponding energy dispersive spectroscopy (eds) map. the elemental map reveals the spatial distribution of carbon and oxygen, integrated through the particle. the higher intensity at the particle perimeters shows that the oxygen is at the particle surfaces (the grid lacy mesh appears as an arched support). e) photograph of suspended soot ( %) in water. the oxygen surface functionalization makes the particle hydrophilic, enabling stable dispersion in aqueous media.
soot accumulates in the nasal passageway and lungs, but does not change the structure of the olfactory nerve or osns
we treated mice intranasally with vehicle, non-functionalized soot, or functionalized soot for one month. mice were briefly anaesthetized with isoflurane and an intranasal solution of soot (functionalized or non-functionalized, % in sterile water) or vehicle (sterile h o) was administered to the left nare. this treatment was repeated every other day (three times a week) for one month. we saw no appreciable differences in the weight of mice across the different treatment groups (data not shown). after sacrifice, the skulls were rapidly decalcified (norwood et al., ) and sectioned. examples of thin sections of the olfactory bulb/nasal cavity area are shown in fig a- c; accumulation of functionalized soot, but not non-functionalized soot, was observed in the olfactory epithelium. soot could be seen in the lungs of treated animals (fig d-f). a total of . mg of soot particles was applied each day, though we conservatively estimate that < % remained in the nose after inverting the mouse. given that an average mouse respiratory volume over a day is ~ . l ( . ml tidal volume with breaths a minute), this works out to an effective dose equivalent to breathing air with a pm . level of ~ µg/m , comparable to the air quality in beijing (zíková et al., ) or new delhi (pant et al., ). 
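the dose comparison above is simple arithmetic: the particle mass retained per day divided by the volume of air breathed per day gives an equivalent airborne concentration. the python sketch below is a hypothetical illustration of that calculation. the function names are our own, and every numeric input in the usage example is a placeholder chosen for illustration, not a value from this study.

```python
def daily_respiratory_volume_l(tidal_volume_ml, breaths_per_min):
    """Volume of air breathed in 24 hours, in liters."""
    minutes_per_day = 60 * 24
    return tidal_volume_ml * breaths_per_min * minutes_per_day / 1000.0


def equivalent_pm_ug_per_m3(applied_mg_per_day, retained_fraction, daily_volume_l):
    """Airborne concentration (ug/m^3) that would deliver the same retained
    particle mass per day through normal breathing."""
    retained_ug = applied_mg_per_day * 1000.0 * retained_fraction
    daily_volume_m3 = daily_volume_l / 1000.0
    return retained_ug / daily_volume_m3


if __name__ == "__main__":
    # Placeholder inputs only: tidal volume (ml), breathing rate (per min),
    # applied mass (mg/day) and retained fraction are NOT from the paper.
    volume = daily_respiratory_volume_l(tidal_volume_ml=0.2, breaths_per_min=200)
    conc = equivalent_pm_ug_per_m3(0.1, 0.01, volume)
    print(f"{volume:.1f} l/day, {conc:.1f} ug/m^3 equivalent")
```

the estimate scales linearly with the assumed retained fraction, so the conservative upper bound on retention stated in the text translates directly into an upper bound on the equivalent concentration.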
we also examined the status of olfactory sensory neurons to see if either type of soot had a detectable effect on their health. to assess any damage to the olfactory bulb or nerve caused by exposure to the soot particles, the area was examined histologically utilizing rapid decalcification and sectioning. olfactory sensory nerves enter the cranial compartment through the cribriform plate (bird et al., ) . we visualized the nerve in soot and vehicle-treated mice using an antibody against olfactory marker protein (omp) (fig. ) . we found no discernable difference in olfactory nerve labeling among the treatments, indicating that soot treatment does not have any obvious effect on olfactory sensory nerves for the treatment duration used. this is markedly different from intranasal treatment with zinc sulfate, a single treatment which causes the rapid and irreversible ablation of olfactory sensory neurons (norwood et al., ) . one important question is to what extent soot treatment impacts the behavior of the mice. we quantified locomotion and rearing behavior using an intel realsense d depth-sensing camera (hong et al., ) after intranasal soot exposure (fig. ) . treated mice from all three treatment groups were individually placed in a novel environment (white plastic container) after the one month of treatment and their movement and rearing behaviors were monitored for minutes. rearing behavior is a measure of anxiety (sturman et al., ) , and locomotion can be used to assay sickness and malaise (engeland et al., ) . no significant differences were observed in total rearing events or total rearing time for all treatment groups. however, a significant difference in total distance traveled was observed between the vehicle and both the functionalized and non-functionalized soot treatment groups. if the soot treatment causes pronounced health problems, we might expect large decreases in the amount of time rearing or locomotion behavior. 
as we did not observe pronounced changes in behaviors, this suggests that the soot treatment does not cause any generalized decreases in health. comparisons of the effects of vehicle, non-functionalized, and functionalized soot on spontaneous rearing and locomotion behaviors over minutes. the data from each individual mouse are shown as a square (males) or circle (females). the mean of each group is shown as a diamond, and standard deviation is denoted with error bars. a) plot of total number of rearing events for each treatment type. there was no significant difference in the number of rearing events between the control and either of the soot treatment groups (p < . non-functionalized, p < . functionalized). b) plot of total rearing time. there was no significant difference in the total rearing time between groups (p < . non-functionalized, p < . functionalized). c) probability distribution of individual rearing event durations for each of the treatment groups. d) plot of total distance travelled by each mouse. treatment with non-functionalized soot (p < . ) and functionalized soot (p < . ) both significantly decreased the total distance travelled relative to the vehicle-treated group. as we saw no obvious changes in olfactory sensory neurons and their axons, we then asked how soot treatment might affect other cell types in the nasal epithelium, particularly the progenitor cells that directly and indirectly give rise to olfactory sensory neurons. if these cells are damaged, then this could lead to a long-term decline in the number of osns as the animals age. we used immunofluorescence staining of the neuroepithelium to visualize changes in progenitor cells (fig. a-b). the expression of the anti-pax or anti-p primary antibody in the neuroepithelium was quantified to assess any disruptions in the number of globular basal cells . , p < . ) relative to non-functionalized soot (post-hoc unpaired t-test, t( ) = . , p < . ) and vehicle control (post-hoc unpaired t-test, t( ) = . 
, p < . ). g) no significant difference between group means of p fluorescence (one-way anova, f( , ) = . , p < . ). exposure. decreases in the number of gbcs could lead to decreases in the number of olfactory sensory neurons in the long term. in order to understand how air pollution might affect olfactory sensory neurons and their progenitor cells, we treated mice intranasally with surrogate soot-like particles that either had oxygen- functionalized surfaces or non-functionalized surfaces. we found that these compounds had minimal effects on behavior, the olfactory sensory nerve, or horizontal basal cells. however, oxygen functionalized soot greatly reduced the population of globular basal cells. our results are consistent with many other studies that have found that oxidized soots are more cytotoxic than un-oxidized soots (li et al., ; holder et al., ; pourkhesalian et al., ) . our results suggest a potential model of how long-term exposure to air pollutants could drive anosmia and decreased csf outflow into the nasal cavity (fig. ). exposure to oxidized soot particles reduces the number of gbcs. as olfactory sensory neurons senesce, the reduced population of gbcs leads to incomplete replacement of osns. the decrease in osns could then potentially lead to decreases in olfactory sensitivity seen with exposure to air pollution (ajmani et al., a; ajmani et al., b; hummel et al., ) . the decrease in osn axons could also reduce fluid outflow through the cribriform there are several limitations to our study. we do not know the mechanism by which oxygen functionalized soot preferentially damages gbcs. it could be that oxygen functionalized soot is more prone to accumulating in the nasal epithelium (fig. ) ( ) dna damage in nasal and brain tissues of canines exposed to air pollutants is associated with evidence of chronic brain inflammation and neurodegeneration. 
human and nonhuman primate meninges harbor lymphatic vessels that can be visualized noninvasively by mri effects of ambient air pollution exposure on olfaction: a review. lipopolysaccharide treatment in mice: a multivariate assessment of behavioral tolerance a critical review of assays for hazardous components of air pollution odor identification and alzheimer disease biomarkers in clinically normal elderly neuroplastic changes in the olfactory bulb associated with nasal inflammation in mice increased cytotoxicity of oxidized flame soot automated measurement of mouse social behaviors using depth sensing, video tracking, and machine learning position paper on olfactory dysfunction a paravascular pathway facilitates csf flow through the brain parenchyma and the clearance of interstitial solutes, including amyloid β the association between olfaction and depression: a systematic review functional groups on multiwalled carbon nanotube surfaces: a quantitative high-resolution xps and tpd/tpr study oxidant generation and toxicity enhancement of aged-diesel exhaust physicochemical characteristics and toxic effects of ozone-oxidized black carbon particles lymphatics in neurological disorders: a neuro-lympho-vascular component of multiple sclerosis and alzheimer's disease? outflow of cerebrospinal fluid is predominantly through lymphatic vessels and is reduced in aged mice rapid lymphatic efflux limits cerebrospinal fluid flow to the brain blocking cerebrospinal fluid absorption through the cribriform plate increases resting intracranial pressure olfactory and other sensory impairments in alzheimer disease quantification of cerebrospinal fluid transport across the cribriform plate into lymphatics in rats neuroscience. 
garbage truck of the brain anatomical basis and physiological role of cerebrospinal fluid transport through the murine cribriform plate olfaction deterioration in cognitive disorders in the elderly exposure in highly polluted cities: a case study air pollution and dementia: a systematic review effect of atmospheric aging on volatility and reactive oxygen species of biodiesel exhaust nano-particles the effect of alzheimer's disease and parkinson's disease on olfaction: a meta-analysis the reactive oxidant potential of different types of aged atmospheric particles: an outdoor chamber study association between olfactory dysfunction and amnestic mild cognitive impairment and alzheimer disease dementia silva amt, falaras p ( ) controlling and quantifying oxygen functionalities on hydrothermally and thermally treated single-wall carbon nanotubes raman microspectroscopy of soot and related carbonaceous materials: spectral analysis and structural information regulation of cerebrospinal fluid (csf) flow in neurodegenerative, neurovascular and neuroinflammatory disease aging effects on cerebral blood and cerebrospinal fluid flows exploratory rearing: a context-and stress-sensitive behavior recorded in the open-field test raman spectrum of graphite xps analysis of combustion aerosols for chemical composition, surface chemistry, and carbon chemical state toxicity of inhaled particulate matter on the central nervous system: neuroinflammation, neuropsychological effects and neurodegenerative disease olfactory impairment in presymptomatic alzheimer's disease longitudinal profiling of oligomeric abeta in human nasal discharge reflecting cognitive decline in probable alzheimer's disease cerebral oxygenation during locomotion is modulated by respiration on the source contribution to beijing pm . 
concentrations key: cord- -egtbzhg authors: richards, gareth; baron-cohen, simon; stokes, holly; warrier, varun; mellor, ben; winspear, ellie; davies, jessica; gee, laura; galvin, john title: assortative mating, autistic traits, empathizing, and systemizing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: egtbzhg it has been suggested that the children of parents with particular interests and aptitude for understanding systems via input-operation-output rules (i.e. systemizing) are at increased likelihood of developing autism. furthermore, assortative mating (i.e. a non-random pattern in which individuals are more likely to pair with others who are similar to themselves) is hypothesised to occur in relation to systemizing, and so romantic couples may be more similar on this variable than chance would dictate. however, no published study has yet tested this hypothesis. we therefore examined intra-couple correlations for a measure of autistic traits (autism spectrum quotient [aq]), self-report measures of empathizing (empathy quotient [eq]), and systemizing (systemizing quotient-revised [sq-r]), as well as the reading the mind in the eyes test (rmet) and embedded figures task (eft). we observed positive intra-couple correlations of small-to-medium magnitude for all measures except eq. further analyses suggest that these effects are attributable to people pairing with those who are more similar to themselves than chance (initial assortment) rather than becoming more alike over the course of a relationship (convergence), and to seeking out self-resembling partners (active assortment) rather than pairing in this manner due to social stratification increasing the likelihood of similar people meeting in the first place (social homogamy). additionally, we found that the difference in scores for the aq, sq-r, rmet and eft of actual couples were smaller (i.e. 
more similar) than the average difference scores calculated from all other possible male-female pairings within the dataset. the current findings therefore provide clear evidence in support of the assortative mating theory of autism. autism spectrum conditions are characterised by unusually routine behaviours, narrow interests, sensory differences, and social and communicative difficulties (american psychiatric association, ) . there is a marked sex difference in autism diagnosis, with approximately four males being diagnosed per every one female (baio et al., ; fombonne, ) , an effect that may (at least in part) be explainable in terms of biological factors such as atypical foetal sex hormone exposure (auyeung et al., ; and social stereotyping processes (bargiela et al., ; geelhand et al., ; whitlock et al., ) . the prevalence of autism is approximately - % (baio et al., ) , which is a marked increase from lotter's ( ) early estimate of . %. this increase in prevalence may be explained by better recognition of autistic symptoms, growth of relevant services, diagnostic upgrading (e.g. providing a diagnosis when uncertain and/or to ensure school support or access to services), and broadening of diagnostic criteria (arvidsson et al., ; dawson, ; fombonne, fombonne, , roelfsema et al., ) . however, it remains unclear whether such processes can explain the rise in prevalence in its entirety or whether the actual incidence of autism has also increased in recent years (fombonne, ). with excellent attention to detail, and the ability to remain focussed on repetitive tasks, autistic people often have an aptitude for working in science, technology, engineering, and mathematics (stem) industries (baron- cohen et al., baron-cohen, wheelwright, skinner, et al., ; ruzich et al., ; wei et al., ) . 
this effect also appears to extend to the broader autism phenotype (bap) -those people who display higher than average levels of autistic traits but do not warrant a clinical diagnosis (baron-cohen, wheelwright, skinner, et al., ; hoekstra et al., ; pisula et al., ; . furthermore, it has been suggested that autism might be subject to positive assortative mating (i.e. people who measure highly in autistic traits may be more likely than chance to have children together; baron-cohen, b , a , . however, there is more than one process by which this could operate. for instance, it may be that individuals with similar levels of autistic traits consciously or unconsciously seek each other out as romantic partners (active assortment) or that individuals with similar levels of autistic traits are more likely than chance to share other characteristics, such as a working environment, which may lead to an increased likelihood of a relationship starting (social homogamy); in addition, individuals may begin relationships with others who are more similar to themselves than expected by chance as regards autistic traits (initial assortment) or become more similar to their partner over the course of their relationship (convergence) (kardum et al., ; luo, ) . evidence for autism being subject to assortative mating comes from the observation that a person diagnosed with autism is - times more likely to marry or have a child with another autistic person than is someone without such a diagnosis (nordsletten et al., ) . it should, however, be noted that due to matching five controls to every case, the partner correlation (r = . ) observed by nordsletten et al. ( ) has been estimated to reflect a smaller effect (~r = . ) in the general population (peyrot et al., ) . furthermore, although a study utilising genome wide association study (gwas) data (yengo et al., ) lacked the statistical power required to detect a significant genotypic correlation, a more recent study by connolly et al. 
( ) reported greater genetic similarity than expected by chance in the parents of autistic children. additionally, several studies have reported positive intra-couple correlations for phenotypic autistic traits measures (connolly et al., ; constantino & todd, ; duvekot et al., ; hoekstra et al., ; lyall et al., ; schwichtenberg et al., ; virkud et al., ), although others have reported null (hoekstra et al., ; losh et al., ; pollmann et al., ; van steijn et al., ) or ambiguous findings (seidman et al., ). meta-analysis of the available literature (k= , n= , [including the current study]) shows a small but highly statistically significant positive correlation, r = . ( % ci = . , . ), p < . (see appendix). interestingly, there appears to be no published research examining intra-couple correlations for measures of empathizing (baron-cohen & wheelwright, ) and systemizing, despite it having been hypothesised that autism may result from the pairing of two high systemizers (baron-cohen, ). the current study aims to increase our understanding of the processes that might underpin assortative mating as it relates to autism. more specifically, we examined intra-couple correlations for quantitative self-report measures of autistic traits (autism spectrum quotient [aq]; baron-cohen, wheelwright, skinner, et al., ), empathizing (empathy quotient [eq]; baron-cohen & wheelwright, ), and systemizing (systemizing quotient-revised [sq-r]; wheelwright et al., ), as well as the standardised difference between empathizing and systemizing (d scores). additionally, we examined behavioural measures that broadly map onto empathizing (reading the mind in the eyes test [rmet]; baron-cohen, wheelwright, hill, et al., ) and systemizing (embedded figures test [eft]; witkin et al., ). 
we predicted that: ( ) sex differences would be observed for each of these measures (m>f for aq, sq-r, d score, and eft; f>m for eq and rmet), ( ) variables associated with social homogamy (age, educational attainment, and stem status) would correlate positively within couples, ( ) autism-related measures would be positively correlated within couples; we also predicted ( ) that intra-couple correlations for autism-related variables would reflect initial assortment rather than convergence, ( ) that intra-couple similarity for autism-related variables might reflect social homogamy for age and attainment or active assortment (no directional prediction was made here), and ( ) that intra-couple correlations for autism-related variables would be stronger in couples for whom both partners worked/studied in stem. we conducted an a priori power analysis using g*power . (faul et al., , ) to determine the sample size. assuming a medium effect size (r = . ) for intra-couple correlations on personality variables (e.g. kardum et al., ) and % power, this analysis determined that a sample size of n= couples would be required to observe a statistically significant effect (p < . ) with a one-tailed pearson's correlation test. participants first reported their sex ('male', 'female', 'prefer not to say', 'other (please specify below)'), age (years), and ethnicity ('white', 'mixed / multiple ethnic groups', 'asian / asian british', 'black / african / caribbean / black british', 'other ethnic group'). they were then asked questions regarding their relationship, specifically their cohabiting status ('living with partner', 'not living with partner'), length of relationship (years and months), and marital status ('not married', 'engaged to be married', 'married'). 
they were also asked to confirm their educational level ('no qualifications', 'completed gcse level (or equivalent)', 'completed a level, access course (or equivalent)', 'bachelor's degree', 'master's degree', 'doctorate degree', 'other, please specify'), current student status ('yes', 'no'; if 'yes', then area and year of study were also recorded), whether they were employed ('yes', 'no'; if 'yes', then place of work and job role were also recorded), and whether they had an autism diagnosis ('yes', 'no') or suspected they were autistic ('yes', 'no'). see table for descriptive statistics relating to the sample's demographics. we used the -item self-report autism spectrum quotient (aq) to measure autistic traits. for each item, participants are asked to specify to what extent they consider a statement to relate to them (response options: strongly agree, slightly agree, slightly disagree, strongly disagree). half of the items are reverse-coded, and one point is given for each response (either slight or strong) endorsing an autistic trait. the sum of all items (possible range = - ) is calculated as an indicator of one's level of autistic traits (higher scores indicate more autistic traits). cronbach's alpha was considered satisfactory (i.e. > . ; bland & altman, ; see also tavakol & dennick, ) in the current study (α = . ). we measured self-reported empathizing via the -item empathy quotient (eq; baron-cohen & wheelwright, ). although the response options are the same as those of the aq, in this case, participants are assigned point for each response that slightly endorses an empathizing tendency and points for a response that strongly endorses an empathizing tendency. approximately half of the items are reverse-scored, and the possible range of scores is - (higher scores indicate higher empathizing). internal consistency for this measure was satisfactory (α = . ). 
self-reported systemizing was measured via the item systemizing quotient-revised (sq-r; wheelwright et al., ) . as with the eq, point is assigned for each response slightly endorsing a systemizing tendency and points are assigned for each response strongly endorsing a systemizing tendency; scores can range from to , and higher scores indicate higher systemizing. the sq-r showed satisfactory internal consistency in the current study (α = . ). in addition to examining eq and sq-r scores, we also standardised these scores and calculated the difference as: d = s -e. d scores provide an indication of one's cognitive style, with positive scores indicating relatively strong systemizing compared to empathizing. in addition to the questionnaires, two behavioural measures were administered. firstly, the reading the mind in the eyes test (baron- cohen, wheelwright, hill, raste, & plumb, ) was used to provide an indication of participants' ability to correctly infer mental states in others, a skill that broadly maps onto empathizing. for this task, a picture of the eye region is shown along with four adjectives, one of which correctly describes the emotion portrayed; a practice trial is completed before the items that comprise the measure. internal consistency was satisfactory (α = . ). in addition, we used the embedded figures test (witkin et al., ) as a behavioural task that taps into abilities prerequisite to systemizing. for this measure, participants are shown a rectangular stimulus consisting of horizontal, vertical, and diagonal lines, within which they are tasked with identifying a particular shape (i.e. the embedded figure) . a practice trial is conducted prior to the trials that comprise the measure. participants are timed on each trial by a research assistant using a stopwatch, and the time taken to identify the correct shape is recorded (if the participant does not identify the correct shape within seconds, the task proceeds to the next trial). 
the mean score across all trials is computed as the variable of interest. the internal consistency for this measure was satisfactory (α = . ). the current study utilised a correlational design. participants were invited to attend a lab session in which they and their partner would independently complete several measures relating to autism. each couple was offered a £ . amazon voucher as an incentive to participate. the study protocol, hypotheses, and analysis plan were pre-registered on the open science framework (osf) prior to data analysis (osf.io/ jg p). however, due to restrictions imposed by the covid- pandemic, a minority of participants completed the study via an online survey hosted by qualtrics. digit ratio ( d: d) was also measured for those participants who attended the lab, and results from that aspect of the study have been published elsewhere (richards, baron-cohen, van steen, & galvin, ). we first tested for sex differences in the autism-related variables by using independent samples t tests. we then examined intra-couple associations for age (pearson's correlation), educational attainment (spearman's correlation) and stem study/occupational status (chi-square test) before using pearson's correlations to determine the strength and direction of intra-couple correlation for each of the autism-related variables. next, we compared the sex-standardised difference scores for autism-related variables for actual couples with those calculated by pairing each male in the dataset with each female other than his partner. this analysis was used to determine whether actual couples' scores for these variables were more similar than expected assuming a pattern of random mating. we then used pearson's correlations to determine the level of association between length of relationship and the within-couple difference scores for autism-related variables. 
the assumption of this analysis was that a significant negative correlation would imply that couples' scores become more similar over time and so indicate convergence effects rather than initial assortment. we then performed similar analyses in relation to age and educational attainment under the assumption that a significant positive correlation could indicate that couple similarity for autism-related variables is explainable in terms of social homogamy rather than active assortment. finally, we used independent samples t tests to determine whether couples for whom both members worked/studied in stem areas were more similar in terms of autism-related variables than couples for whom only one or neither member worked/studied in stem. we considered p < . (two-tailed) to suggest statistical significance and interpret effect sizes in accordance with cohen ( ). data analyses were conducted in r version . . . as predicted, males scored higher than their female partners on aq, sq-r, and d score, and achieved faster times on the eft; females had higher scores on the eq and rmet, although the latter effect was not statistically significant (table ). partners' ages were very strongly positively correlated, r( ) = . , p < . . a spearman's correlation also demonstrated that partners' level of educational attainment was positively correlated, r s ( ) = . , p < . , and a chi-square test showed that those studying/working in stem were more likely than chance to have a partner who was also working/studying in stem, χ ( , ) = . , p < . , φ = - . . positive intra-couple correlations were observed for aq, r ( ) to determine whether actual couples within our dataset were more similar to each other on autism-related variables than would be expected by chance, we first calculated (unsigned) difference scores for each of the relevant variables as male score - female score. next, we calculated the difference scores between any given male and all females that were not his partner and took the average. 
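The comparison of actual couples with all other possible male-female pairings can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names are hypothetical, scores are assumed to be already standardised within sex, and the example values are made up purely to show the mechanics.

```python
from math import sqrt
from statistics import mean, stdev


def actual_and_random_differences(male_scores, female_scores):
    """male_scores[i] and female_scores[i] belong to couple i.

    Returns two lists of unsigned difference scores, one entry per couple:
    the actual within-couple difference, and the average difference between
    that couple's male and every female who is not his partner.
    """
    n = len(male_scores)
    if n != len(female_scores) or n < 2:
        raise ValueError("need matched score lists for at least two couples")
    actual = [abs(m - f) for m, f in zip(male_scores, female_scores)]
    random_avg = [
        mean([abs(male_scores[i] - female_scores[j]) for j in range(n) if j != i])
        for i in range(n)
    ]
    return actual, random_avg


def paired_t_statistic(x, y):
    """t statistic of a paired-samples t test on x - y (df = n - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up standardised scores for three couples (illustration only).
males = [1.0, 0.0, -1.0]
females = [0.9, 0.1, -1.2]
actual_diff, random_diff = actual_and_random_differences(males, females)
t_stat = paired_t_statistic(actual_diff, random_diff)
```

A negative t statistic here means that actual couples differ less than the random pairings do, which is the direction predicted under positive assortment. A p value would then come from the t distribution with n - 1 degrees of freedom.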
paired-samples t tests determined that the aq, sq-r, rmet and eft difference scores for actual couples were smaller (i.e. more similar) than those calculated from random pairings, and that the observed differences were small in size; no such effects were observed for eq and d scores (table ). to investigate whether the intra-couple correlations for autism-related variables were explainable by initial assortment or convergence, we correlated the standardised within-couple difference scores for autism-related variables with length of relationship. essentially, if length of relationship is correlated with the difference score, it suggests that partners become more similar (negative correlation) or more dissimilar (positive correlation) over the course of their relationship, and so provides evidence against there being initial assortment (kardum et al., ). as both males and females reported the length of their relationships, we correlated these two sets of scores to check for similarity ( findings therefore indicate that intra-couple correlations for autism-related variables are attributable to initial assortment rather than convergence effects. to investigate whether intra-couple correlations for autism-related variables may be better explained by active assortment or social homogamy, we first correlated the within-couple standardised difference scores for the autism-related variables with the couples' absolute difference in age. the idea here is that if couples that are dissimilar in age are also more dissimilar for autism-related variables, the intra-couple correlation may be explained by social homogamy effects. we observed no correlation between absolute (i.e. 
unsigned) age difference and within-couple differences for aq, r ( ) the current study aimed to provide an empirical examination of the assortative mating theory of autism by determining if (and to what extent) traits associated with autism are correlated within heterosexual partners in the general population. the main finding was that quantitative measures of autistic traits (aq) and systemizing (sq-r), the standardised difference between empathizing and systemizing (d), the ability to read emotions in the eye region (rmet), and spatial skills (eft) (but not empathizing; eq) are all positively correlated within partners, and that these effects are better explained in terms of active assortment than social homogamy, and by initial assortment rather than convergence. furthermore, when we compared within-couple standardised difference scores for these variables with standardised difference scores calculated as the average of all other possible heterosexual pairings within the dataset, we found that actual couples were more similar for aq, sq-r, rmet, and eft than would be expected under the assumption of random mating. a number of studies have previously examined intra-couple correlations for autistic trait variables, the majority of which have focused on very specific samples such as the parents of autistic children (connolly et al., ; lau et al., ; losh et al., ; lyall et al., ; schwichtenberg et al., ; seidman et al., ; van steijn et al., ; virkud et al., ), parents of twins (constantino & todd, ; hoekstra et al., ; hoekstra et al., ) , and parents of children with various mental health conditions (duvekot et al., ) . although two of these have also included samples of parents of typically developing children (lyall et al., ; schwichtenberg et al., ) and another study reported on a sample of newlyweds (pollmann et al., ) , it is clear that the nature of these effects within the general population remains relatively unexplored. 
as quantitative autistic traits share a genetic architecture with autism in terms of a diagnostic construct, children of two parents who both have high levels of autistic traits can be predicted to receive a double dose in terms of polygenetic susceptibility to autism. this may have important implications as regards our understanding of geographical trends in autism prevalence. this point is made clearer by the finding that rates of autism can sometimes cluster (mazumdar et al., ; van meter et al., ) , with children in eindhoven (i.e. the 'silicon valley of europe') being twice as likely to be autistic as children from two similar sized cities (utrecht and haarlem) in the netherlands (roelfsema et al., ) . furthermore, peyrot et al. ( ) suggested that "current trends in assortative mating might lead to a considerable increase in the prevalence of rare disorders with high heritability". a novel aspect of the current study is that we demonstrated statistically significant positive intra-couple correlations (and increased within-couple similarity) for self-reported systemizing (sq-r scores) as well as for a behavioural/cognitive skill that likely underpins systemizing ability (speed on the eft). this is particularly important considering that systemizing shares a genetic architecture with autism (warrier et al., ) , and, of course, because it is the trait upon which assortative mating in relation to autism was initially hypothesised to act (baron- cohen, a cohen, , b cohen, , . these findings may therefore be informative regarding why autism is associated with interest and aptitude for stem (baron- cohen et al., baron-cohen, wheelwright, skinner, et al., ; ruzich et al., ; wei et al., ) , and why children in geographical regions enriched with stem industry are at elevated likelihood of developing the condition (roelfsema et al., ) . 
it remains unclear exactly why autistic traits and systemizing are correlated within couples, though it may be that increased similarity for these variables improves the 'flow' of a relationship and therefore helps it survive and progress. the observation that these associations were not moderated by length of relationship or social homogamy variables suggests that they likely reflect initial and active assortment (i.e. people seek out similar individuals as partners and do not become more alike over the course of their relationship). although pollmann et al. ( ) reported that autistic traits were not associated with relationship satisfaction in wives, husbands with high levels of autistic traits had lower relationship satisfaction, an effect that was entirely mediated by low trust in and responsiveness to their partner, and low intimacy in the relationship. perhaps somewhat counterintuitive though is the observation of jobe and williams white ( ) that autistic traits are actually positively correlated with relationship length, though this might be explained by those with high levels of autistic traits typically being resistant to change and so less likely to choose to end a relationship. however, although speculative, this resistance to change might also result in couples being less likely to converge on these measures over the course of their relationship, which provides further support for the presence of initial assortment. it is relevant to note that similarity in people's levels of autistic traits extends beyond romantic relationships, as it has also been observed in friendship dyads. wainer, block, donnellan, and ingersoll ( ) reported that autistic traits (as measured by the bapq) were significantly correlated between same-sex friendship pairs, and that this effect was present when examined for both self-report (r = . ) and informant-(i.e. partner) report measures (r = . ). 
of particular relevance here is the finding that concordance on autism-related traits (specifically the 'aloofness' scale of the bapq) predicted increased relationship satisfaction in newly-formed college roommate dyads when measured at - -week follow-up (faso et al., ) . furthermore, autistic adults appear to be more comfortable during first interactions with other autistic (as opposed to neurotypical) adults (morrison et al., ) . taken together, these findings imply that individuals with high levels of autistic traits find it easier to begin relationships with people who show concordance in this regard, and that such relationships are more likely to progress. this process may therefore also be implicated in the development of romantic relationships, with individuals more closely matched on autistic traits being more likely than discordant dyads to pursue and maintain them. an important consideration in explaining the underpinnings of assortative mating is the possible role that online dating may play. in particular, evidence suggests that online dating relates to reduced assortative mating in regard to occupation and geographical proximity but increased couple similarity for educational level, age, and marital history (lee, ) . the internet provides a novel environment in which to seek potential romantic partners. like never before in our evolutionary history, it provides opportunities for individuals to seek each other out based on very specific characteristics that can be vanishingly rare in the population as a whole. in the current context, autistic individuals may dramatically increase their chances of meeting (and therefore also of forming romantic relationships) by becoming members of autism-related online discussion boards, support groups and social media pages. 
such processes may dramatically increase the likelihood of autistic individuals meeting and having children (nordsletten et al., ) , and could therefore be an explanation (amongst many others) for why autism prevalence has risen notably in recent years. there are several limitations to the current research that should be acknowledged. firstly, although the sample examined is arguably more representative of the general population than most of those previously examined in this area, we did not record information relating to the presence/absence or number of offspring. therefore, although positive assortment appears to exist within this sample, it remains unclear exactly what effects this could have on the gene pool of subsequent generations. our study is also correlational in nature, meaning that it is not possible to assess the development of relationships over time in order to determine causal inferences (see blossfeld & timm, ) . additionally, due to covid- -related restrictions, only a subsample of our study participants was administered the rmet and eft, meaning that analyses relating to these variables achieved lower statistical power than those for the aq, eq, and sq-r. although we notably still demonstrated statistically significant positive intra-couple correlations for the rmet and eft, replication and extension of these findings will be necessary for firmer conclusions to be drawn. for instance, future studies will be required to determine whether assortative mating processes apply specifically to these variables or whether such effects are explained by within-couple similarity for iq score. the current study demonstrates small-to-moderate levels of partner similarity for a range of traits associated with autism spectrum conditions, and so implies that assortative mating may play an important role in terms of the maintenance and transmission of genes related to autistic phenotypes. 
in particular, we demonstrate here for the first time that systemizing is positively correlated within heterosexual partners, and that actual couples are more similar on this trait than would be expected under the assumption of a random mating pattern. these findings strongly support the assortative mating theory of autism (baron-cohen, a, b, , ), suggest that future behavioural genetics studies should consider the influence of assortative mating when deriving heritability estimates for autism-related measures, and may be informative regarding spatial and chronological trends in autism prevalence.

none.

diagnostic and statistical manual of mental disorders: dsm- secular changes in the symptom level of clinically diagnosed autism fetal testosterone and autistic traits prevalence of autism spectrum disorder among children aged years -autism and developmental disabilities monitoring network, sites the experiences of late-diagnosed women with autism spectrum conditions: an investigation of the female autism phenotype the hyper-systemizing, assortative mating theory of autism two new theories of autism: hyper-systemising and assortative mating the evolution of empathizing and systemizing: assortative mating of two strong systemizers and the cause of autism elevated fetal steroidogenic activity in autism foetal oestrogens and autism the empathy quotient: an investigation of adults with asperger syndrome or high functioning autism, and normal sex differences mathematical talent is linked to autism the "'reading the mind in the eyes'" test revised version: a study with normal adults, and adults with asperger syndrome or high-functioning autism the autism-spectrum quotient (aq): evidence from asperger syndrome/high-functioning autism, males and females, scientists and mathematicians is there a link between engineering and autism?
using self-report to identify the broad phenotype in parents of children with autistic spectrum disorders: a study using the autism-spectrum quotient statistics notes: cronbach's alpha who marries whom? educational systems as marriage markets in modern societies statistical power analysis for the behavioral sciences evidence of assortative mating in autism spectrum disorder intergenerational transmission of subthreshold autistic traits in the general population dramatic increase in autism prevalence parallels explosion of research into its biology and causes trim and fill: a simple funnel plot based method of testing and adjusting for publication bias in meta-analysis symptoms of autism spectrum disorder and anxiety: shared familial transmission and cross-assortative mating the broad autism phenotype predicts relationship outcomes in newly formed college roommates statistical power analyses using g*power . : tests for correlation and regression analyses a flexible statistical power analysis program for the social, behavioral, and biomedical sciences epidemiology of pervasive developmental disorders editorial: the rising prevalence of autism the role of gender in the perception of autism symptom severity and future behavioral development factor structure, reliability and criterion validity of the autism-spectrum quotient (aq): a study in dutch population and patient groups heritability of autistic traits in the general population disentangling genetic and assortative mating effects on autistic traits: an extended twin family study in adults loneliness, social relationships, and a broader autism phenotype in college students assortative mating for dark triad: evidence of positive, initial, and active assortment autistic traits in couple dyads as a predictor of anxiety spectrum symptoms effect of online dating on assortative mating: evidence from south korea defining key features of the broad autism phenotype: a comparison across parents of multiple- and
single-incidence autism families epidemiology of autistic conditions in young children assortative mating and couple similarity: patterns, mechanisms, and consequences. social and personality psychology compass parental social responsiveness and risk of autism spectrum disorder in offspring the spatial structure of autism in california outcomes of real-world social interaction for autistic adults paired with autistic compared to typically developing partners patterns of nonrandom mating within and across major psychiatric disorders exploring boundaries for the genetic consequences of assortative mating for psychiatric traits autistic traits in male and female students and individuals with high functioning autism spectrum disorders measured by the polish version of the autism-spectrum quotient mediators of the link between autistic traits and relationship satisfaction in a non-clinical sample assortative mating and digit ratio ( d: d): a pre-registered empirical study and meta-analysis are autism spectrum conditions more prevalent in an information-technology region? 
a school-based study of three regions in the netherlands sex and stem occupation predict autism-spectrum quotient (aq) scores in half a million people can family affectedness inform infant sibling outcomes of autism spectrum disorders the broad autism phenotype questionnaire: mothers versus fathers of children with an autism spectrum disorder geographic distribution of autism in california: a retrospective birth cohort analysis the co-occurrence of autism spectrum disorder and attention-deficit/ hyperactivity disorder symptoms in parents of children with asd or asd with adhd conducting meta-analyses in r with the metafor package familial aggregation of quantitative autistic traits in multiplex versus simplex autism the broader autism phenotype and friendships in non-clinical dyads the autism-spectrum quotient (aq) in japan: a cross-cultural comparison social and non-social autism symptoms and trait domains are genetically dissociable science, technology, engineering, and mathematics (stem) participation among college students with an autism spectrum disorder predicting autism spectrum quotient (aq) from the systemizing quotient-revised (sq-r) and empathy quotient (eq) recognition of girls on the autism spectrum by primary school educators: an experimental study a manual for the embedded figures test imprint of assortative mating on the human genome

the authors would like to thank prof. robin dunbar for providing useful feedback on an earlier version of the manuscript.

we identified studies in which intra-couple correlations for autistic traits were reported or could be determined from the available descriptive statistics (table s ). although not specified in our pre-registration document, we meta-analysed the available literature (k= , n= , ) in order to provide a more reliable effect size estimate (see figure s for forest plot).
we conducted random effects meta-analyses using the r package metafor (viechtbauer, ) so as to allow for the possibility of the true effect size of the correlation differing depending on moderating factors. the model revealed a statistically significant positive correlation, z = . ( % ci = . , . [r = . ; % ci = . , . ]), p < . , and significant heterogeneity was observed, q ( ) = . , p < . , τ = . , i = . %. removal of any one sample did not noticeably change these results (in all cases, p < . ), the funnel plot appeared reasonably symmetrical ( figure s ), and the trim and fill procedure (duval & tweedie, ) did not estimate the presence of missing studies. key: cord- -hroxg u authors: megremis, spyridon; walker, thomas d. j.; he, xiaotong; o’sullivan, james; ollier, william e.r.; chinoy, hector; pendleton, neil; payton, antony; hampson, lynne; hampson, ian; lamb, janine a. title: microbial and autoantibody immunogenic repertoires in tif γ autoantibody positive dermatomyositis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hroxg u we investigate the accumulated microbial and autoantigen antibody repertoire in adult-onset dermatomyositis patients sero-positive for tif γ (trim ) autoantibodies. we use an untargeted high-throughput approach which combines immunoglobulin disease-specific epitope-enrichment and identification of microbial and human antigens. increased microbial diversity was observed in dermatomyositis. viruses were over-represented and species of the poxviridae family were significantly enriched. the autoantibodies identified recognised a large portion of the human proteome, including interferon regulated proteins; these proteins were clustered in specific biological processes. apart from trim , autoantibodies against eleven further trim proteins, including trim , were identified. some of these trim proteins shared epitope homology with specific viral species including poxviruses. 
our data suggest antibody accumulation in dermatomyositis against an expanded diversity of microbial and human proteins and evidence of non-random targeting of specific signalling pathways. our findings indicate that molecular mimicry and epitope spreading events may play a significant role in the pathogenesis of dermatomyositis. the idiopathic inflammatory myopathies (iim) are a heterogeneous spectrum of rare autoimmune musculoskeletal diseases characterised clinically by muscle weakness and systemic organ involvement. iim are thought to result from immune activation following environmental exposures in genetically susceptible individuals. viral and bacterial infections have been reported in individuals with iim (miller et al., ) , but their role in disease pathology is unclear. autoantibodies are a key feature of iim, in common with other autoimmune rheumatic diseases such as rheumatoid arthritis, systemic lupus erythematosus and systemic sclerosis. myositis-specific autoantibodies, present in approximately %- % of individuals with iim, are directed against a range of cytoplasmic or nuclear components involved in key intracellular processes, including protein synthesis and chromatin re-modelling . these myositis-specific autoantibodies are often associated with particular clinical features. individuals with autoantibodies targeting the cytoplasmic nucleic acid sensor mda (also called interferon induced with helicase c domain , ifih ) can present with rapidly progressive interstitial lung disease which is associated with high mortality (abe et al., ; betteridge et al., ) . several members of the tripartite motif (trim) protein family are known autoantibody targets in iim, and there is a strong temporal association between adult-onset dermatomyositis and malignancy onset in individuals with antibodies to transcription intermediary factor γ (tif γ, trim ) (oldroyd et al., ) . 
many trim proteins are important immune regulators (van tol et al., ; versteeg et al., ) , and dysregulation of trim proteins leading to reduced ability to restrict viral infection has been reported in several autoimmune diseases, including systemic lupus erythematosus and inflammatory bowel disease (oke et al., ; zhou et al., ) . the inducible type i interferon (t -ifn) cytokine system, part of the innate immune response, also plays a role in autoimmune rheumatic diseases including iim (muskardin and niewold, ) . the t -ifn antiviral response is initiated when pathogen-associated molecular patterns are recognized by host pattern recognition receptors and cytosolic receptors for viral nucleic acid. this broad viral recognition process triggers downstream signalling pathways which lead to interferon transcription, protein production and expression of interferon-induced genes; this further enhances the anti-viral machinery (muskardin and niewold, ) . such a response is critical for host protection against pathogen expansion, and limits infection during the window of time needed to mount an effective specific (adaptive) immune response. improved understanding of iim pathogenesis is required to improve both patient stratification and disease management. in adult-onset dermatomyositis, we propose the following potential molecular mechanism of disease pathogenesis: anti-tif autoantibodies reduce the ability to restrict viral infection, which leads to either an increased susceptibility to a wider diversity of viral pathogens or increased exposure to specific anti-tif -related viruses. to test this hypothesis, we applied a novel high-resolution and high-throughput comparative screening pipeline (serum antibody repertoire analysis, sara) [manuscript in preparation] to plasma from both anti-tif autoantibody-positive dermatomyositis patients and matched healthy controls.
using this approach, high-throughput antigen epitope-sequencing was integrated with bioinformatic modules to de-convolute accumulated immunogenic responses against the total microbial 'exposome' (including viruses, bacteria, archaea and fungi) and human proteins. we report the identification of disease-specific microbial and human protein epitopes which have clinical and aetiological relevance to anti-tif autoantibody-positive dermatomyositis. to describe the accumulated antibodies present in dermatomyositis we use the sara pipeline which integrates an escherichia coli flitrxtm random amino acid (aa) peptide display system with epitope signature enrichment through competitive bio-panning and high-throughput dna sequencing ( figure ). competitive bio-panning was applied to pooled total immunoglobulin fractions (iga, igg, igm) purified from the plasma of twenty anti-tif positive adult-onset dermatomyositis patients (dm) and twenty healthy controls (hc) ( table s ). four sample pools were generated and paired: the first pair contained pooled samples (p ) of dm (dm p ) used for competitive biopanning against pooled samples of hc (hc p ) ( table s ). the second pair included pooled samples (p ) of dm (dm p ) used for competitive biopanning against pooled samples of hc (hc p ) (table s ). figure and figure s detail achieved metrics while defining the microbial and autoantibody immunogenic repertoires in dm and hc. we retrieved ≈ million (dm) and ≈ million (hc) next generation sequencing (ngs) reads which represent . million (dm) and . million (hc) expressed epitopes. cohorts presented highly enriched epitope sequences that are unique to dm ( , ) and hc ( , ) respectively. our epitope cohorts retrieved . million (dm) and . million (hc) microbial or human protein annotations respectively which mapped to , (dm) and , (hc) distinct species. , unique species were identified in dm while , unique species were identified in hc. 
in the dm group, linear epitopes were identified for a total of , microbial species ( figure s a) compared to , in the hc group ( figure s b ). in both groups the highest level of richness (number of different species) was observed for bacteria, followed by viruses, fungi and archaea (figure s a & s b) . we defined distinct epitopes as being any discrete epitope sequence retrieved from the dm and hc groups. unique distinct epitopes represent any discrete epitope sequences which were observed exclusively in either dm or hc. common distinct epitopes were discrete epitope sequences that were observed in both dm and hc groups by sequence and at a minimum of five-fold the ngs enrichment frequency in one group vs the other group. unique distinct epitopes represented the vast majority of sequences observed (fig ) . based on the distribution of the corresponding microbial species in the pools of and the majority of distinct species are shared between the p and p samples, whereas the majority of unique species are observed either in p or in p , in both dm and hc ( figure s c & s d, respectively). these data demonstrate the presence of a stably-enriched microbial component in dm ( figure s a ) and also in hc ( figure s b ). the increase in the number of plasma samples within the pool used for cross-panning, from a pool of (p ) to a pool of (p ), had a differential effect between the two groups: in dm, an increase in the number of microbial species was observed from (dm p ) to (dm p ), whereas in the hc group a decrease was observed from (hc p ) to (hc p ) species. the number of epitopes per microbial species in both the dm and the hc significantly increased with increasing pooling size ( figure a ). the number of ngs reads per microbial species significantly increased in the dm p ( % ci: . - . ) compared to dm p ( % ci: . - . ), whereas it did not change in the hc p ( % ci: . - . ) compared to the p ( % ci: . - . ) ( figure b ). 
the number of distinct epitopes per microbe (epitopes present in both groups at minimum five-fold enrichment) significantly increased in the p s compared to p s in both groups ( figure c ); (n= , %ci: . - . ) to (n= , %ci: . - . ) in dm and (n= , %ci: . - . ) to (n= , %ci: . - . ) in hc. the unique microbial epitopes which were identified only in one of the two groups significantly increased in dm p (n= , %ci: . - . ) compared to dm p (n= , %ci: . - . ) and decreased in hc p (n= , %ci: . - . ) compared to hc p (n= , %ci: . - . ) ( figure e ). collectively, these data indicate an expansion of the dm-specific microbial diversity with increasing dm plasma sample size, whereas the opposite effect was observed in hc. the number of enriched ngs reads per distinct microbial species significantly increased in dm p ( %ci: . - . ) compared to dm p ( %ci: . - . ), while no significant difference was observed in hc p ( %ci: . - . ) compared to hc p ( %ci: . - . ) ( figure d ). the number of sequencing reads per unique microbial species did not differ significantly between dm p and dm p (adjusted p value > . ); however, it significantly decreased in the hc p ( %ci: . - . ) compared to the hc p ( %ci: . - . ) ( figure f ). overall, the differential effect of increased sample size on the observed microbial exposure between dm and hc suggests there is a higher biological variability in dm than in hc. dm p exhibited a significantly higher number of microbial aa epitopes per microbial species compared to the hc p ( figure a ) and a higher number of total identified microbial species ( versus ). the same difference was observed in both distinct ( figure c ) and unique epitope sequences ( figure e ). overall, we demonstrate that plasma from anti-tif dm patients contains antibodies against a higher number of microbial epitopes per species and against a wider microbial repertoire. we focused our further analysis on the dm p and hc p subgroups.
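The epitope categories defined above (unique distinct epitopes seen in only one group, and common distinct epitopes seen in both groups with at least five-fold enrichment in one of them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the label strings, the dictionary input format and the toy sequences are hypothetical, and the sara pipeline's actual implementation is not described in the text.

```python
from typing import Dict

FOLD_THRESHOLD = 5.0  # "minimum of five-fold" enrichment, as stated in the text


def classify_epitopes(dm: Dict[str, float],
                      hc: Dict[str, float],
                      fold: float = FOLD_THRESHOLD) -> Dict[str, str]:
    """Label each epitope sequence by group membership.

    dm / hc map an epitope sequence to its NGS enrichment frequency in that
    group (hypothetical input format). A missing key or a frequency of 0 is
    treated as "not observed in this group".
    """
    labels: Dict[str, str] = {}
    for seq in dm.keys() | hc.keys():
        f_dm = dm.get(seq, 0.0)
        f_hc = hc.get(seq, 0.0)
        if f_dm == 0 and f_hc == 0:
            continue  # not observed anywhere, nothing to label
        if f_hc == 0:
            labels[seq] = "unique_dm"      # observed exclusively in DM
        elif f_dm == 0:
            labels[seq] = "unique_hc"      # observed exclusively in HC
        elif f_dm >= fold * f_hc:
            labels[seq] = "common_dm"      # shared, >= five-fold higher in DM
        elif f_hc >= fold * f_dm:
            labels[seq] = "common_hc"      # shared, >= five-fold higher in HC
        else:
            labels[seq] = "not_enriched"   # shared, below the fold threshold
    return labels


# Toy frequencies (invented for illustration, not data from the study).
toy_dm = {"AAAA": 10.0, "BBBB": 50.0, "CCCC": 4.0}
toy_hc = {"BBBB": 10.0, "CCCC": 3.0, "DDDD": 7.0}
toy_labels = classify_epitopes(toy_dm, toy_hc)
```

Treating a zero frequency as absence keeps the fold comparison free of division, which is why the threshold is applied as a multiplication rather than a ratio.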
to stratify the microbial species based on their potential significance in dm, we first normalised the number of ngs reads against the total number of epitopes per microbial species (ngsre-norm) ( figure s a -s d), and studied the relative ranking of viruses and cellular microbes ( figure a & b) . secondly, we evaluated the mean ngsre-norm at the viral family taxonomy level ( figure d & e) , and, thirdly, we recorded the total number of species contributing to each family ( figure d & e) (table s ). thus, we tested for potential high taxonomy-level organisation of microbes in dm. ranking the microbial species based on decreasing ngsre-norm, we observed that viruses were over-represented in the top % of dominant microbial species relative to the total number of viral species present in both dm and hc ( figure a & b). specifically, . % and . % of viral species were present in the top % of microbes in the dm p and hc p , respectively ( figure ). the hc viral igome profile contained a higher proportion of total double stranded dna (dsdna) (ngsre-norm . % vs. . %) and single stranded rna (ssrna) (ngsre-norm . % vs. . %) than dm ( figure a & b). the dm viral igome contained a higher proportion of ssdna viruses (ngsre-norm . % vs. . %) and rna reverse-transcribing retroviruses (ngsre-norm . % vs. . %) ( figure a & b) . to test whether these observations were due to a generalised differentiating signal amongst viral species within each viral group we compared the species-specific ngsre-norm per viral category between dm and hc ( figure c ). the dsdna and ssrna ngsre-norm were elevated in the hc plasma compared to dm, whereas no significant change was observed in ssdna viruses or rna reverse-transcribing viruses ( figure c ). overall, . % ( out of ) of viral families were represented in both groups. we compared the viral igome at the family level, and we recorded the mean ngsre-norm with increasing enrichment for each viral family (figure d & e) .
in dm, the five richest viral families were coronaviridae ( species), geminiviridae ( species), herpesviridae ( species), orthomyxoviridae ( species) and poxviridae ( species) ( figure d ) (table s ). of these, geminiviridae infect plants, whereas all other viruses can infect and cause disease in humans. in the hc, picornaviridae ( species) was the most enriched virus group followed by caliciviridae ( species), orthomyxoviridae ( species), coronaviridae ( species), and retroviridae ( species) ( figure e ) (table s ) . we evaluated the mean ngsre-norm for each family. in dm, nairoviridae (n= ), poxviridae (n= ), secoviridae (natural host: plants) (n= ), caliciviridae (n= ) and adenoviridae (n= ) were the top ranked families (table s ) . poxviridae was the one family that ranked highly regarding both the ngsre-norm (table s ) and the family richness (n= , %ci: . - . ) ( table s ). all pox viruses had high ngsre-norm and were amongst the dominant dm viral species ( figure f ). specifically, variola virus had the highest ngsre-norm amongst all identified viral species ( figure f ). in the hc p , polyomaviridae (n= ), iflaviridae (n= ), podoviridae (n= ), tymoviridae (n= ), and myoviridae (n= ) were the viral families with the highest mean ngsre-norm. podoviridae and myoviridae are prokaryotic viruses infecting bacteria, tymoviridae infect plants and iflaviridae infect insects (viralzone root-expasy) (hulo et al., ) . we also used z transformation of ngsre-norm to define the precise location and rank of each viral family within the dm and hc distribution ( figure d & e). in dm, multiple viral families (nairoviridae, poxviridae, secoviridae, alloherpesviridae, caliciviridae and adenoviridae) had ngsre-norm z scores at least one standard deviation above the group mean. in hc, only polyomaviridae (n= ) had a high ngsre-norm z score. in figure s we provide a cladogram of dm-specific viral species.
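The ranking steps described in this section (reads normalised by epitope count per species, averaged per viral family, then z-transformed across families) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the tuple input format and the toy numbers are hypothetical, and the use of the population standard deviation across family means is an assumption, since the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def ngsre_norm(reads: float, epitopes: float) -> float:
    """NGS reads divided by the number of epitopes for one species."""
    return reads / epitopes


def family_z_scores(species: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Z score of each family's mean ngsre-norm within the group.

    species: (family, ngs_reads, n_epitopes) per species (hypothetical format).
    Assumption: z scores use the population SD of the family means.
    """
    per_family: Dict[str, List[float]] = defaultdict(list)
    for family, reads, epitopes in species:
        per_family[family].append(ngsre_norm(reads, epitopes))

    family_means = {fam: mean(vals) for fam, vals in per_family.items()}
    values = list(family_means.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All families identical: no family stands out.
        return {fam: 0.0 for fam in family_means}
    return {fam: (m - mu) / sigma for fam, m in family_means.items()}


def families_above(z: Dict[str, float], cutoff: float = 1.0) -> List[str]:
    """Families at least `cutoff` standard deviations above the group mean."""
    return sorted(fam for fam, score in z.items() if score >= cutoff)


# Toy input (invented numbers, placeholder family names).
toy_species = [
    ("family_a", 100, 10), ("family_a", 60, 10),
    ("family_b", 20, 10),
    ("family_c", 20, 10),
    ("family_d", 40, 10),
]
toy_z = family_z_scores(toy_species)
toy_high = families_above(toy_z)
```

Computing the z score on family means rather than on individual species keeps species-rich families from dominating the distribution; whether the study did the same is not stated.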
we queried the ngs reads against the human proteome to identify accumulated autoantibodies in the pooled plasma. in dm p we identified trim (mean log : . ), which was not present in hc p , in accordance with the autoimmune profile of our selected plasma ( figure a ). we also identified additional trim proteins in dm p which were trims , , , , , , , , , and triml ( figure a ). of these, trim was also detected in the healthy samples but with significantly lower ngs reads, and trim (mean log : . ) was observed only in the hc p ( figure a ). these results confirm the presence of trim autoantibodies in the selected twenty anti-trim -positive dm patients and demonstrate the presence of autoantibodies against other members of the trim protein family exclusively in dm.

an expanded dermatomyositis-specific and ifn-regulated human proteome

from our antibody analysis, in dm p we identified a total of human protein targets, of which were highly specific sequence annotation hits (mean sig. < . ). in hc p we detected human protein targets, with high specificity (mean sig. < . ). autoantibodies identified in both dm and hc constituted only % (n= ) of total proteins and . % (n= ) of highly specific proteins, suggesting a strong disease-specific proteome signature. the : (total proteins) and : (high specificity) ratios of identified autoantibodies in dm over hc suggest that an expanded subset of the human proteome is targeted by autoantibodies in dm patients compared to hc. due to the role of trim proteins in ifn signalling, we asked whether the autoantibody protein-targets are regulated by interferons. in the dm dataset , proteins were predicted to be regulated by ifns compared to in hc (interferome v . ) (rusinova et al., ) . the vast majority of these proteins are regulated by interferons type i, type ii or both ( figure s a ).
regardless of the ifn type, we observed a higher number of ifn-regulated proteins in dm compared to hc, suggesting a dm-specific enrichment of immunoglobulins against ifn-regulated human proteins. we focused our search directly for ifn autoantibodies expressed in our samples. in dm, ifngr autoantibodies were present, whereas they were absent in hc ( figure s b ). we expanded our search for autoantibodies against known proteins that are highly ranked within the ifng signalling pathway (gene rank within the superpath; genecardssuite: pathcards) (belinky et al., ) . autoantibodies against ifng-related proteins were observed in dm, whereas only were found in hc ( figure s b ). the protein-protein interaction (ppi) enrichment score in dm was < . e- versus . in the healthy sample (string . ) (szklarczyk et al., ) ( figure s c & d). overall, these data suggest the accumulation of antibodies in dm against proteins that strongly contribute to ifng signalling and the broader antiviral mechanism by ifn-regulated proteins. to describe the biological functions of all the identified proteins the gene ontology (go) framework was used ( figure b & c). in dm p eight go biological processes were highly enriched ( figure b ). these processes were represented by an average of . % of go-specific proteins (n= , %ci: . %- . %) ( figure s a ). in hc p , ten go processes were enriched with an overall lower average coverage of . % (n= , range: . %- . %), as fewer proteins mapped to the same go unit. in the top-ranked go processes enriched in dm, the go coverage was higher in dm than hc regardless of the go process ( figure s a ). more than one third ( . %: out of ) of the dm autoantibody targets were part of the identified biological processes; . % of them ( out of ) were also present in hc p .
dm processes involved structural elements including microtubule-based processes, actin-filament functions, cell junction organisation, extracellular structure organisation, cell morphogenesis in differentiation and small gtpase mediated signal transduction ( figure b ). ptk and rock were shared between out of functions, and kif , ndel , srgap , app, prkcz, and clasp were shared amongst out of functions ( figure d ). none of these proteins were observed in hc. the fact that the identified dm-specific and hc-specific proteins were robustly clustered in biological functions suggests non-random targeting of specific signalling pathways by autoantibodies. in dm, these processes share multiple autoantibody protein targets, suggesting the presence of a dm-specific autoantibody-targeted proteome module ( figure s b ). we identified that epitope sequences against the poxviridae family of viruses were significantly enriched in dm, including variola virus which had the highest ngsre-norm (fig. f ). given that our dm patients were trim autoantibody positive and that trim proteins, ifng and the ifn antiviral mechanism were significantly enriched in our proteome data, we searched for potential links between variola virus, other members of the poxviridae family and trim proteins. since molecular mimicry is a potential mechanism of autoantibody generation, we aligned the identified variola virus and trim epitopes; the sequence annotation thresholds that we used in our bioinformatics pipeline guaranteed robust annotation of the epitope sequences. this would affect cross-kingdom epitope alignment, since we have maximized the phylogenetic distance between each microbial epitope and human proteins. to account for this, we started our epitope sequence analysis with the widest possible diversity of identified variola and trim epitopes, sacrificing specificity, i.e. epitopes with more than % match to variola virus, and trim epitopes with an identity of more than %.
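The percent-match and percent-identity thresholds mentioned above can be illustrated with a simple per-position comparison of two epitopes of equal length. The sketch below is an assumption-laden illustration: the pipeline's actual identity calculation is not described, the function names are hypothetical, and the comparison is ungapped. The two example strings are 12-residue epitope sequences reported in this section for trim and variola virus.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity between two equal-length epitopes."""
    if not a or len(a) != len(b):
        raise ValueError("epitopes must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mismatches(a: str, b: str) -> List[Tuple[int, str, str]]:
    """1-based positions where the two epitopes differ, with both residues."""
    if len(a) != len(b):
        raise ValueError("epitopes must be of equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Epitope strings as written in the text (lower case, 12 residues each).
trim_epitope = "ripddvrrrpgc"
variola_epitope = "riqddvrrrpgc"

identity = percent_identity(trim_epitope, variola_epitope)
differences = mismatches(trim_epitope, variola_epitope)
```

A single substitution in a 12-residue epitope leaves 11 of 12 positions identical, which shows how small changes move a sequence across an identity threshold.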
the phylogenetic distances are shown in the circular cladogram in figure . we identified two different clades containing leaf nodes of trim and variola epitopes of high similarity (branch lengths < . ). the first clade involved the trim epitope "ripddvrrrpgc" and three additional epitopes "ri(q)ddvrrrpgc", "ri(q)ddv(h)rrpgc" and "ri(q)dd(v)(s)rrpgc" each of which mapped to varv ger hdlg and varv gui . the second clade contained the trim epitope "ssharyksvrfs", and "ssharyksvrfs", "ssharyks(m)rfs", "ssharykslrfs", "ssharyks(l)rf", and "ssharykslrf(t)" of a variola virus (unnamed protein product and viral dna polymerase processivity factor). we reinstated the annotation specificity thresholds for trim and variola epitopes to the default levels (strict, high specificity, maximisation of phylogenetic distances) ( figure s a ). the epitope sequence "ri(p)ddvrrrpgc" was retained and shared between trim (branch length: . ) and varv ger hdlg (branch length: . ). the epitope sequences "ssharykslrfs" and "ssharykslrf" were specific for variola virus but were no longer annotated as trim epitopes ( figure s a ). we asked whether the above epitope sequences are shared amongst different poxviridae species that we identified in dm p ( figure f ). the epitopes ("ssharykslrfs", "ssharykslrf", and "riqddvrrrpgc") were shared with high homology between variola, vaccinia, ectromelia, cowpox and camelpox viruses ( figure s b ). we aligned the dm trim proteins that we identified to test the conservation level of the above epitopes in trims ( figure a ). trim and trim shared high similarity of aa sequence and seem to diverge from the rest of trims across the region of interest ( figure a ). finally, we aligned all dm-enriched viral and trim epitopes ( figure s c ) and identified a third epitope "khkgalggggne" of trim which is shared by human immunodeficiency virus "khkgalgggg(n)e", and "khkg(d)lgggg(y)e" shared by synechococcus phage syn "khkg(d)lgggg(y)e" ( figure s d ). 
the epitope was poorly conserved amongst the dm-specific trim proteins ( figure b ). overall, we have identified two epitope sequences that are shared between variola virus and trim but also between members of the poxviridae family. a third epitope that is shared between trim , hiv- and syn was also observed. these findings suggest that molecular mimicry events, at least amongst these specific viral and trim epitopes, are a potential mechanism of pathogenesis in dm. this is the first study to investigate the accumulated microbial and autoantigen antibody repertoire in adult-onset dermatomyositis (dm) patients positive for antibodies against tif (trim ). we used an untargeted high-throughput approach which combines immunoglobulin disease-specific enrichment of immunogenic epitopes, and subsequent identification of anti-microbial antibody and autoantibody richness and abundance. key findings include that ( ) the dm-specific igome was characterised by high microbial diversity, ( ) antibodies against viruses were over-represented and species of the poxviridae family were significantly enriched, ( ) dm-specific accumulated autoantibodies target a significant portion of the human proteome, including proteins from specific biological processes and interferon regulated proteins, ( ) autoantibodies against trim and eleven further trim proteins were identified, and ( ) specific trim proteins share epitope homology with viral species identified from the plasma of dm patients. the identified microbial epitopes in both dm and healthy controls (hc) mapped to bacteria, viruses, fungi and archaea, demonstrating the human capacity for building protection against a diverse group of species. the presence of a stably-enriched microbial component in dm was identified, characterised by a higher number of epitopes per species and against a wider microbial repertoire.
the effect of sample size in our measurements suggests increased microbial exposure and interpersonal variability in the dm compared to the healthy group, leading to expansion of the identified dm-specific microbial signature upon screening additional samples. even though our observations provide a static snapshot of the igome, the immune system has a memory and the accumulation of antibodies takes place from birth over the entire lifetime of the individual. thus, our data suggest that dm as a clinical entity is characterised by diverse microbial exposure. we observed that viruses are over-represented amongst the microbial species with the highest measured abundance in dm and hc. due to the competitive design of the epitope enrichment process this observation suggests that viral exposure has a significant role in dm. we observed enrichment of the coronaviridae, herpesviridae, orthomyxoviridae, geminiviridae and poxviridae families in persists for up to years (taub et al., ) , therefore given the age of our donors, it is very likely that the dm associated antibodies to poxviridae were raised through the vaccination process. this is supported by the fact that vacv antibodies were observed in the healthy sample without being (gan and miller, ; marie et al., ) . interestingly, there are case studies which report development of dm after vaccination against some of these viruses, notably hbv, hpv, influenza and vacv (ignasi rodriguez-pinto, ) . overall, the dm-specific viral signature includes viruses which can either directly infect muscle tissue and/or indirectly sabotage immune homeostasis. the observation that the respiratory and gastrointestinal systems are the primary physiological targets of these viruses agrees with previous observations regarding the high frequency of this type of infection in patients with juvenile-and adult-onset dm (pachman et al., ; svensson et al., ) . 
moreover, the age-range of juvenile and adult onset dm coincides with the epidemiological peaks of respiratory infections during life (kuan et al., ) . in dm, we identified autoantibodies against a very large number of human proteins (> ), representing about % of the current human proteome (uniprotkb/swiss-prot database, last modified july ). this was double the proportion observed for the hc-enriched proteome, with only % of autoantibodies being found in both dm and healthy controls, suggesting accumulation of autoantibodies in dm against a wider group of proteins. interestingly, three times as many autoantibody-targeted proteins were regulated by type i and ii interferons in dm as in hc. this underscores the relevance of interferons in this disease (neely, ; somani et al., ) , and indicates the potential of interferon blockade as a therapy. the identification with high confidence of specific biological processes enriched in both dm and hc suggests an organisational structure within the targets of these accumulated autoantibodies. although deconvolution to individual samples is not possible with our data, this is consistent with an infection (exposure)-driven pathological autoantibody-producing mechanism which progressively expands its repertoire (antigens) under the chronic burden of accumulated stimuli (chan and gack, ; cusick et al., ; munz et al., ; panoutsakopoulou et al., ) . biological processes enriched in dm are orientated around cytoskeletal organisation, microtubule movement, actin filaments and cell junctions. eight of the identified protein targets participated in most of these processes, indicating a key role in the regulation of multiple signalling pathways including the formation and disassembly of focal adhesions. 
the enrichment of antibodies against proteins with a role in the regulation of focal cell-cell adhesion in dm is particularly interesting, since disruption of these structures in epithelial cells is known to facilitate the spread of viruses by allowing their release from the basal to the apical surface or the external environment (spear, ; torres-flores and arias, ) . interestingly, some of the dm-associated viruses described here, namely adenoviridae, flaviviridae, herpesviridae, retroviridae, paramyxoviridae and picornaviridae, are known to manipulate junctional proteins, which provides a functional link between dm and increased viral exposure (mateo et al., ; mothes et al., ; zhong et al., ) . we identified autoantibodies against trim in dm plasma, confirming the anti-tif positive profile of our patients. furthermore, we identified autoantibodies against other trim proteins in dm, of which only trim was identified in healthy controls but with much lower abundance. trim (ro ) is a known autoantigen in iim and related autoimmune disorders, and acts as the highest-affinity fc receptor in humans. trim binds cytosolic antibodies bound to non-enveloped viral pathogens, to trigger antibody-dependent intracellular neutralization and proteasomal degradation of the immune complex (lee, ; rajsbaum et al., ) . the autoantibodies we identified against trim proteins, other than trim and trim , have not been observed previously in iim. trim inhibits vesicular stomatitis virus transcription and interacts with dengue virus non-structural protein to mediate its poly-ubiquitination and degradation (kueck et al., ; wang et al., ) . trims also regulate antiviral pathways indirectly by mediating innate immunity, influencing the transcription of type i ifns, pro-inflammatory cytokines and interferon-stimulated genes (van tol et al., ) . 
notably, the trim protein family expanded very rapidly in evolution, coinciding with development of the adaptive immune system, suggesting that trims may have evolved to fine-tune interactions between the increasingly complex innate and adaptive immune systems (versteeg et al., ) . deregulated activity of approximately one third of the trim proteins has been associated with the development of various types of human cancer (vunjak and versteeg, ) . for example, trim and trim have an oncogenic role in colorectal, prostate, oesophageal, ovarian and non-small cell lung cancer, promoting proliferation and metastasis (han et al., ; liang et al., ; zhang et al., ) . conversely, trim has tumour suppressor activity, inhibiting the growth of liver, colorectal and gastric cancer (boulay et al., ) . the presence of autoantibodies against cancer-associated trims accords with the strong temporal association between myositis and the development of malignancies in adult-onset anti-tif γ positive dm (oldroyd et al., ) , consistent with the high proportion of cancer-associated myositis cases in the current study ( %). overall, the expanded targeting of trim proteins we observed in dm plasma supports the role of the trim ring-type e ubiquitin ligase subfamily as powerful regulators of the immune system through post-translational modification (hage and rajsbaum, ; van tol et al., ) . since poxviridae epitopes were enriched in dm, we investigated a potential link between pox viruses and trim proteins. we identified epitope sequences with high similarity between trim and variola virus in addition to other members of the poxviridae family. moreover, trim had epitopes in common with hiv and synechococcus phage. these high similarity epitopes were identified in the c-terminus of the trim proteins; between the filamin and nhl domains for trim , and proximal to the b . domain in trim . 
the c-terminus of trim proteins is under evolutionary positive selection and determines ligand binding specificity, function and subcellular localization; notably the b . domain is associated with ability to restrict viral infection, particularly retroviruses (van tol et al., ) . this finding of high similarity epitope sequences suggests that molecular mimicry, at least amongst specific viral and trim epitopes, is a potential mechanism of pathogenesis in dm. the finding of antibodies against a human protein does not mean that this was the antigen against which the antibody was raised, as pathogen-derived proteins can mimic endogenous epitopes (panoutsakopoulou et al., ; walker and jeffrey, ) . this might explain inconsistencies between the universal cellular distribution of specific antigens and the specific skeletal muscle target of the disease (walker and jeffrey, ) . in support of this, we observed that certain dm-specific trim epitopes are shared amongst different viruses. trim epitopes did not form a separate phylogenetic cluster but were dispersed throughout the viral epitope phylogenetic tree, suggesting that accumulated antibodies against trim proteins might be the product of a primal molecular mimicry event and subsequent epitope spreading rather than increased trim protein production. this is supported by the observation that the gene expression levels of autoantigens in myositis muscle biopsies do not correlate with the levels of circulating autoantibodies recognizing each cognate endogenous autoantigen; instead they are directly associated with the expression of muscle regeneration markers (pinal-fernandez et al., ) . we propose a model whereby pathogenesis of dm is dependent on dynamic and personalised viral exposure patterns, which start very early in life. the main two gateways of physiological transmission of viruses to humans are the respiratory and gastrointestinal systems, while non-physiological presentation of viruses is through immunisation. 
the link between dm and viruses is complex but strikingly present in our data, since there is systematic enrichment for human proteome modules that are directly exploited by, or respond to, most of the viruses that we identified. we propose that an initial viral infection or virus delivery system presents the main antigen that induces antigen-specific antibody production in a predisposed immune environment. since trim proteins share epitope homology with multiple viral species, these antibodies can also bind to trims and potentially other homologous proteins. the outcome of this process is influenced by factors including the antibody populations at the site of infection (local antibodies), the status of the innate immune system, the virus-specific antibody concentration in serum (which usually peaks several weeks post infection), and the history of previous exposure primarily to the same or related viral strains (and epitopes) (rojas et al., ) . since hla class ii molecules are major regulators of the adaptive immune response to antigenic challenge through t cell repertoire selection, the level of antibody production to different viral antigens may also be mediated by hla type. we previously identified that adult-onset anti-tif positive dm is associated with hla-dqb * : (rothwell et al., ) . for the samples with available data included in this study, we observed that / ( %) dm and / ( %) hc are heterozygous for hla-dqb * : . the finding that in myositis, autoantigen expression correlates with expression of muscle regeneration markers (pinal-fernandez et al., ) suggests that after initial viral presentation, the increased concentrations of muscle proteins may then be sufficient, even in the absence of viral proteins, to invoke periodic rises of autoantibodies. 
moreover, the continuous activation of innate immunity observed in dm (neely, ; pinal-fernandez et al., ; walsh et al., ; wong et al., ) can also potentiate autoimmunity through chronic immune-mediated tissue damage resulting in autoantigen release without the need for specific activation of auto-reactive t cells by a microbial mimic (panoutsakopoulou et al., ) . therefore, following the initial viral exposure there is a high chance that this effect will spread amongst related proteins within specific signalling pathways (since protein homology is related to function) of the initial hit. in this model, in specific autoantibody-positive myositis subgroups, accumulated autoantibodies against proteins that participate in specific biological processes and signalling pathways would be observed. the finding that in healthy controls the protein coverage of the go processes significantly decreases suggests more random targeting of human proteins by autoantibodies, opposite to that observed in dm and in support of our hypothesis. the use of pooled plasma is a limitation in our study since we cannot deconvolute the data into individual patients. however, the enrichment process against healthy adults provides a unique opportunity to study dm as a disease system capturing the breadth of microbial antibody and autoantibody accumulation against a very large number of linear epitopes. apart from antibodies against tif which is a shared feature of the patients in this study, we do not anticipate that our findings will be equally distributed among patients. indeed, if the shared feature in dm is a molecular mimicry event combined with epitope spreading, then the stochastic nature of microbial exposure, genetic predisposition and the dynamic nature of immunity would not support a single pathotype. thus, our study provides the first dm-specific "map" of potential routes to disease. 
for the future, we recommend: clinical records are kept of short and long-term microbial exposure, infections history including type of pathogen, severity scores, number of hospitalisations and vaccination history; which were not available in this study. records could be expanded in cases of juvenile dm, to include maternal infection and vaccination history to account for acquired transplacental immunity. lifelong exposure to viruses and probably to other microbes contributes not only to accumulation of virus-specific antibodies and protection, but generation of autoantibodies against trims and other cellular proteins, which at some point may reach a critical mass and induce disease most likely as a result of seemingly mild or non-symptomatic infection. molecular mimicry and epitope spreading may play a significant role, instantly raising questions not only concerning the extent but also the sequence of events. this also means that autoantibodies identified in dm to date might only be the tip of the iceberg. the authors declare no competing interests. and trim (sp|q ld ); cross-marked. conservation level is presented as a colour gradient (high; red, low; blue). (a) partial alignment of the region where the trim -variola high similarity epitopes were identified. the trim protein sequence is presented next to the alignment plot. yellow: the two trim epitopes with high amino acid (aa) similarity to variola and poxviruses ( figure ). black font: identical aas between trim and variola virus. red font; different aas between trim and variola virus. both epitopes are located only aas apart (relative to trim protein sequence). trim presents high similarity with trim in respect to the specific aa epitopes. (b) partial alignment of the region where the trim -hiv-syn high similarity epitope was identified. the trim protein sequence is presented next to the alignment plot. yellow: the trim epitopes with high aa similarity to hiv and syn phage ( figure s d ). 
black font: identical aas amongst trim and the two viruses. red font; different aas amongst trim and the two viruses. the epitope is poorly conserved amongst the dm-specific trim proteins. further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, janine lamb (janine.lamb@manchester.ac.uk). this study did not generate new unique reagents. plasma samples were collected from anti-tif positive adult-onset dermatomyositis (dm) patients through the uk myositis network, as described previously (rothwell et al., ) (table s ). all individuals fulfilled definite or probable bohan and peter classification criteria for dermatomyositis. anti-tif autoantibody positivity was identified by immunoprecipitation, and confirmed by elisa, as described previously (rothwell et al., ) . gender and age matched (at time of sample collection) healthy controls (hc) were identified through the university of manchester longitudinal study of cognition in normal healthy old age cohort (rabbitt et al., ) we implemented the "serum antibody repertoire analysis (sara)" pipeline (manuscript in preparation). sara comprises a comprehensive workflow that integrates molecular biology peptide display and epitope signature enrichment through competitive bio-panning and high-throughput dna next generation sequencing (ngs), alongside in-house computational scripts to reverse engineer high resolution epitope signatures that reflect original in vivo epitopes present in patient sera. sara was applied to predict the identity and abundance of antibody epitope repertoires enriched in plasma from anti-tif autoantibody-positive dm patients versus plasma from matched hc (see supplementary methods) . this pipeline provides a digital triage of infectious organism epitopes and autoantibodies predicted to be uniquely present or highly enriched in each sera pool. all informatics analysis was carried out using r(r development core team, ) . - . 
and python (sanner, ) . . we briefly summarise the seven sara pipeline modules (m -m ) below in turn. plasma total immunoglobulin purification: m total immunoglobulins iga, igg, igm were purified from twenty anti-tif positive adult-onset dm and twenty hc plasma by adamtech total ig extraction kits (pesac, fr). for the sample-pools of (p ), µg of isolated ig per donor were used for a total of µg. for the sample-pools of (p ), µg of isolated ig per donor were used for a total of µg. ig yields and purity were assessed by nanodrop spectrophotometry and polyacrylamide gel electrophoresis. competitive biopanning: m separate purified ig pools from dm and hc were used for competitive bio-panning with the flitrxtm random amino acid (aa) peptide surface display system (lu et al., ) . briefly, we first conducted a pre-panning stage to incubate separate dm and hc immobilised ig pools with induced flitrxtm e. coli cells (lu et al., ) in order to sequester expressed epitopes relevant to the specific plasma cohort. unbound bacteria per cohort were further incubated with the alternative immobilised ig pools for the main panning stages (cross-panning). tethered bacteria were eluted and expanded to repeat the biopanning process times. od was measured after each round of biopanning to ensure comparable efficiency between dm and hc ( figure s e ). after competitive biopanning immobilised bacteria were expanded and polyvalent plasmid cohorts were purified (maxiprep, qiagen, uk). epitope ngs: m , ngs data processing: m polyvalent plasmid variance regions were pcr amplified and gel-purified. pcr products were validated by sanger sequencing (figure s f) , and nanodrop. the dna primer sequences are: forward: '-attcacctgactgacgac- ', flitrxtm reverse: '-ccctgatattcgtcagcg- '. multiplexed dna fragments were sequenced on the nextseq platform (illumina, uk). we retrieved ≈ million and ≈ million paired end fastq reads for hc and dm respectively (figure ). 
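the sequenced variance regions are subsequently translated in frame and screened for premature stop codons. as a rough illustration only, the sketch below translates a dna string with the standard genetic code and flags peptides that contain an internal stop. the function names and the example dna strings are hypothetical and chosen for illustration; this is not the sara implementation, which has not been published.

```python
from itertools import product

# Standard genetic code, written in the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna in reading frame 0, 1 or 2; unknown codons become 'X'."""
    seq = dna.upper()[frame:]
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def has_premature_stop(peptide: str) -> bool:
    """True if a stop ('*') occurs anywhere before the final residue."""
    return "*" in peptide[:-1]


if __name__ == "__main__":
    # Hypothetical coding sequence for the first six residues of an epitope.
    print(translate("CGTATTCCGGATGATGTT"))
    print(has_premature_stop(translate("CGTTAAGAT")))
```

reads whose translated variance region returns true from has_premature_stop would be counted towards the premature-stop fraction; how such reads are handled downstream in sara is not stated in the text.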
the -bp variance region sequences were translated with respect to reading frame. < % of dm and < % of hc comprised premature stop codons, indicative of immunologically-relevant bio-panning enrichment ( figure s a -s d). non-specific hts sequence noise from residual bio-panning solution was controlled for by our -σ- noise floor. this retains all expressed epitopes within each respective hc or dm pool that surpass a threshold of standard deviations above the lowest % distinct peptide sequence enrichment when ranked by read count (further described in supplementary figure s a-s d). epitope signature set analysis: m distinct epitope signature set analysis presented unique and common aa epitope sequence sets with associated ngs enrichment scores. minimum thresholds of aa sequence length and the -σ- ngs noise filter provided sensitive annotation power while minimising annotation noise ( figure s d -s f). cleaned distinct epitope sequences were collated. unique epitope pools were determined by symmetric difference of the hc and dm pools (i.e. specific epitope sequences uniquely found in hc or in dm patients). the intersect between hc and dm pools was probed and a minimum five-fold change threshold applied to partition relevant epitope collections to supplement the hc or dm pools. these epitope collections were utilised in downstream microbiome and autoantibody annotations. > . x , and > . x distinct epitopes with a minimum length of aa length were retrieved for dm and hc pools respectively. the highest ngs read counts achieved were ≈ , (dm) and ≈ , (hc) (figure ). epitope annotation and rank scoring: m , dm-associated and , hc-associated unique epitopes were retrieved for dm patients. only . % of the dm pool comprised peptide sequences common to hc. of these epitopes common by sequence, the dm enrichment score ranged between -fold and -fold the healthy control score ( figure ) ( figure s a -s f). 
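the noise floor and set analysis described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' code: the noise floor is interpreted here as the mean plus n_sigma population standard deviations of the read counts in the lowest-ranked fraction of distinct peptides, the parameters floor_fraction and n_sigma are left open because their values are not recoverable from the text, and fold change is computed on raw read counts without normalising for sequencing depth. all function names are hypothetical.

```python
from statistics import mean, pstdev


def noise_floor(counts: dict[str, int], floor_fraction: float, n_sigma: float) -> float:
    """Threshold = mean + n_sigma * sd of the lowest-ranked read counts (assumed reading)."""
    ranked = sorted(counts.values())
    n_low = max(1, int(len(ranked) * floor_fraction))
    low = ranked[:n_low]
    return mean(low) + n_sigma * pstdev(low)


def filter_pool(counts: dict[str, int], floor_fraction: float, n_sigma: float) -> dict[str, int]:
    """Keep only epitopes whose read count exceeds the noise floor."""
    threshold = noise_floor(counts, floor_fraction, n_sigma)
    return {pep: c for pep, c in counts.items() if c > threshold}


def partition_pools(dm: dict[str, int], hc: dict[str, int], min_fold: float = 5.0):
    """Split epitopes into dm and hc collections.

    Epitopes found in only one pool (the symmetric difference) go to that pool.
    Shared epitopes supplement a pool only if enriched at least min_fold there.
    """
    dm_pool = {p: c for p, c in dm.items() if p not in hc}
    hc_pool = {p: c for p, c in hc.items() if p not in dm}
    for pep in dm.keys() & hc.keys():
        if dm[pep] >= min_fold * hc[pep]:
            dm_pool[pep] = dm[pep]
        elif hc[pep] >= min_fold * dm[pep]:
            hc_pool[pep] = hc[pep]
    return dm_pool, hc_pool
```

shared epitopes that do not reach the five-fold threshold in either direction are left out of both collections in this sketch; the text does not say how such sequences were treated.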
epitopes were annotated by our modified blastp (henikoff and henikoff, ) approach at % minimum identity ( figure s g -s i). this facilitates short aa sequence inputs and resolves protein annotation confidence scores per epitope with statistical control of epitope-protein annotation confidence, intra-protein dis-contiguous sequence matches, redundancy of protein isoforms and common sequence matches, single-epitope multi-protein parsing, and set analysis of the resultant patient cohort organism lists. these were tuned to each epitope's ngs read count. . x (dm patients) and . x (hc) total protein annotations were retrieved ( figure ). phylogenetic and taxonomic analysis: m our scoring system generates aggregate ngs read count enrichment and annotation stringency scores. we retrieved every known upstream taxonomic ranking level for all microbe ids and anchored these via ncbi rest and the taxonomy repository (benson et al., ; e., ; federhen, ) . phylogenetic trees were generated by phylot (biobyte) (letunic, ) and microbial annotations were assessed by ngs enrichment, epitope mapping confidence, annotation scores, taxonomic ranks, and phylogenetic clustering ( figure s g -s i). we utilised the interactive tree of life itol (letunic and bork, ; letunic and bork, ) to taxonomically cluster microbial agents for the dm or hc pools with incorporated annotation layers representing the ngs enrichment and annotation scores ( figure s i ). , distinct and , unique species were annotated for dm patients ( , distinct and , species were annotated for hc) (figure ). we identified any autoantibodies against human proteins that may be present. annotation scores occupy unit interval space [ , ] as surrogate significance values. protein cohorts were inspected by metscape (tripathi et al., ) , kegg (kanehisa and goto, ) , uniprot (apweiler et al., ) , and the human protein atlas (uhlen et al., ) , to explore functional predictions based on aberrant autoimmune targeting. 
(dm) and (hc) enriched human biomarkers from autoantibodies were retrieved ( figure ). the code supporting the current study has not been deposited in a public repository but will be deposited following submission of our methods manuscript (in preparation). in the intervening period it will be available from the corresponding author on request. (fig. s a ). these were highly enriched by associated ngs read counts and represent polyvalent flitrxtm peptide aa coding sequences from dm and hc biopanned total ig pools. dm ngs data processing (m ); dm epitope signature set analysis (m ) we retrieved , , distinct expressed aa epitopes that represent the epitope signatures present within the dm and hc ig pools (fig. s a ). associated ngs read frequencies reflected the epitope enrichment process achieved during biopanning, and fig. s b & c confirm that the -σ- ngs filter optimally controlled biopanning-induced sequencing noise to successfully provide , dm-associated and , hc-associated unique epitope sequences. preferential enrichment for epitopes of length and aas (fig. s d ) confirmed genuine biopanning capture and is concordant with published peptide epitope ranges of to aas (buus et al., ; hopp and woods, ) . minimum epitope lengths of aa applied with the -σ- ngs filter (above) proved optimal for annotation ( fig. s e ) using the unique sequence epitope sets of , for dm and , for hc individuals (fig. s f ). ig repertoire sequence overlap determined that only . % of the dm epitope set was present in healthy controls (fold-change (fc) values of >= x to x abundance vs hc individuals) while . % of the healthy epitope set was retained in the dm group (fc values of >= . x to x vs dm) (fig. s f ). our modified blastp approach retrieved . million (dm) and . million (hc) microbial or human protein annotations respectively, which mapped to ≈ , microbial organisms. enrichment plots revealed , distinct infectious agents in hc individuals and , infectious agents in dm patients (fig. 
s g ) of which , and , unique infectious agents were identified in healthy individual and dm patients respectively (fig. s h) . these signatures presented highly enriched infectious agents in dm and highly enriched agents in healthy individuals ranked by phylogeny ( fig. s i ). supplementary figure : immunoglobulin epitope enrichment in dm and hc. next generation sequencing run data processing: the enrichment fold-change differences of these common sequences ranged from x: , x compared with the opposite set ( . % and . % the proportion of the respective unique epitope sets). enriched distinct and unique microbial agents in dm: distinct and unique organisms at a % blast annotation threshold. epitope sets comprised - aa sequence lengths following the -σ- clinical characteristics and change in the antibody titres of patients with anti-mda antibody-positive inflammatory myositis update on activities at the universal protein resource (uniprot) in pathcards: multi-source consolidation of human biological pathways frequency, mutual exclusivity and clinical associations of myositis autoantibodies in a combined european cohort of idiopathic inflammatory myopathy patients loss of heterozygosity of trim in malignant gliomas navigator: network analysis, visualization and graphing toronto highresolution mapping of linear antibody epitopes using ultrahigh-density peptide microarrays viral evasion of intracellular dna and rna sensing bacterial, fungal, parasitic, and viral myositis nonbacterial myositis molecular mimicry as a mechanism of autoimmune disease e-utilities quick start entrez programming utilities help ncbi (ncbi) the ncbi taxonomy database state of the art: what we know about infectious agents and myositis to trim or not to trim: the balance of host-virus interactions mediated by the ubiquitin system trim overexpression is a poor prognostic factor and contributes to carcinogenesis in non-small cell lung carcinoma amino-acid substitution matrices from protein blocks 
prediction of protein antigenic determinants from amino acid sequences viralzone: a knowledge resource to understand virus diversity myositis and vaccines kegg: kyoto encyclopedia of genes and genomes a chronological map of physical and mental health conditions from million individuals in the english national health service vesicular stomatitis virus transcription is inhibited by trim in the interferon-induced antiviral state a review of the role and clinical utility of anti-ro /trim in systemic autoimmunity phylot -a phylogenetic tree generator, based on ncbi taxonomy interactive tree of life (itol): an online tool for phylogenetic tree display and annotation interactive tree of life (itol) v : an online tool for the display and annotation of phylogenetic and other trees trim is up-regulated in colorectal cancer, promoting ubiquitination and degradation of smad using bio-panning of flitrx peptide libraries displayed on e. coli cell surface to study protein-protein interactions opportunistic infections in polymyositis and dermatomyositis connections matter--how viruses use cell-cell adhesion components risk factors and disease mechanisms in myositis virus cell-to-cell transmission antiviral immune responses: triggers of or triggered by autoimmunity? 
type i interferon in rheumatic diseases gene expression meta-analysis reveals concordance in gene activation, pathway, and cell-type enrichment in dermatomyositis target tissues high ro expression in spontaneous and uv-induced cutaneous inflammation the temporal relationship between cancer and adult onset anti-transcriptional intermediary factor antibody-positive dermatomyositis history of infection before the onset of juvenile dermatomyositis: results from the national institute of arthritis and musculoskeletal and skin diseases research registry analysis of the relationship between viral infection and autoimmune disease myositis autoantigen expression correlates with muscle regeneration but not autoantibody specificity r: a language and environment for statistical computing practice and drop-out effects during a -year longitudinal study of cognitive aging trimmunity: the roles of the trim e -ubiquitin ligase family in innate antiviral immunity molecular mimicry and autoimmunity focused hla analysis in caucasians with myositis identifies significant associations with autoantibody subgroups interferome v . : an updated database of annotated interferon-regulated genes the evolution of poxvirus vaccines. viruses python: a programming language for software integration and development severe dermatomyositis triggered by interferon beta- a therapy and associated with enhanced type i interferon signaling viral interactions with receptors in cell junctions and effects on junctional stability infections and respiratory tract disease as risk factors for idiopathic inflammatory myopathies: a population-based case-control study string v : protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets immunity from smallpox vaccine persists for decades: a longitudinal study tight junctions go viral! 
viruses meta-and orthogonal integration of influenza "omics'' data defines a role for ubr in virus budding towards a knowledge-based human protein atlas the trimendous role of trims in virus-host interactions. vaccines (basel) intrimsic immunity: positive and negative regulation of immune signaling by tripartite motif proteins trim proteins polymyositis and molecular mimicry, a mechanism of autoimmunity type i interferon-inducible gene expression in blood is present and reflects disease activity in dermatomyositis and polymyositis critical role for cholesterol in lassa fever virus entry identified by a novel small molecule inhibitor targeting the viral receptor lamp interferon and biologic signatures in dermatomyositis skin: specificity and heterogeneity across diseases necrotizing myositis causes restrictive hypoventilation in a mouse model for human enterovirus infection trim functions as an oncogene by activating epithelial-mesenchymal transition and p-akt in colorectal cancer cell-to-cell transmission of viruses tripartite motif-containing (trim) negatively regulates intestinal mucosal inflammation through inhibiting th /th cell differentiation in patients with inflammatory bowel diseases key: cord- - sssm zk authors: milanez-almeida, pedro; martins, andrew j.; torabi-parizi, parizad; franco, luis m.; tsang, john s.; germain, ronald n. title: blood gene expression-based prediction of lethality after respiratory infection by influenza a virus in mice date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sssm zk lethality after respiratory infection with influenza a virus (iav) is associated with potent immune activation and lung tissue damage. in a well-controlled animal model of infection, we sought to determine if one could predict lethality using transcriptional information obtained from whole blood early after influenza virus exposure. 
we started with publicly available transcriptomic data from the lung, which is the primary site of the infection and pathology, to derive a multigene transcriptional signature of death reflective of innate inflammation associated with tissue damage. we refined this affected tissue signature with data from infected mouse and human blood to develop and validate a machine learning model that can robustly predict survival in mice after iav challenge using data obtained from as little as μl of blood from early time points post infection. furthermore, in genetically identical, cohoused mice infected with the same viral bolus, the same model can predict the lethality of individual animals but, intriguingly, only within a specific time window that overlapped with the early effector phase of adaptive immunity. these findings raise the possibility of predicting disease outcome in respiratory virus infections with blood transcriptional data and pave the way for translating such approaches to humans. influenza a virus (iav) infection of the respiratory tract can lead to severe immune activation and lung tissue damage in mice, ferrets, macaques and humans ( - ). intrinsic virulence, replication capacity and initial infectious dose, together with the host's genetic background, immune status and overall health, determine the extent to which host immunity is triggered ( , ) . substantial evidence indicates that lethality is associated with an excessive innate immune response, with lung dysfunction arising from epithelial and endothelial damage induced by infiltrating leukocytes, in particular monocytes and neutrophils ( ) ( ) ( ) ( ) ( ) . 
our previous work in a murine model of infection with iav uncovered clusters of co-regulated genes in the lung associated with lethal influenza infection; one of those lethal clusters was highly associated with an overwhelming neutrophil response and, consistently, early post-infection partial neutrophil depletion rescued animals from lethality, establishing a direct link between the innate response in the tissue and death ( ) . while that earlier work focused on gene expression signatures in the infected pulmonary tissue, here we sought to identify a blood-based signature for prediction of lethality. previous attempts to develop gene expression signatures in blood in the context of iav-induced illness focused mostly on distinguishing iav from non-iav infections, symptomatic from asymptomatic iav carriers, or low from high influenza vaccine responders ( ) ( ) ( ) ( ) ( ) ( ) ( ) . a notable exception was the recent description of blood transcriptomics data from a large cohort of subjects enrolled in the mechanisms of severe acute influenza consortium (mosaic) study ( ) . in that report, severity of infection, measured in terms of need for mechanical ventilation, was associated with a weak transcriptional "viral response" signal and a strong transcriptional "bacterial response" (and activated-neutrophil) signal in blood in comparison with non-severe cases. however, these transcription patterns were also strongly associated with duration of illness, and the authors emphasized the importance of timing in the interpretation of immune activity in response to iav infection ( ), which, for obvious reasons, can be hard to control for in natural human infection studies. we aimed to develop a blood biomarker for early identification of individuals at risk of adverse disease outcome after influenza infection in a well-controlled mouse model of iav infection, where a precise delineation of the evolution of the response associated with severity could be achieved. 
we utilized transcriptomic data from the lungs, the focal point of infection, and also data from blood post infection, to derive a transcriptional signature of lethality across various iav and mouse strains. this early-response signature could distinguish mice at high risk of death with different influenza a virus strains. these results provide an impetus for seeking to translate this approach to human respiratory virus infection characterized by damaging inflammatory responses. the first question was how to select a panel of candidate genes whose expression in blood could be used to predict lethality after iav exposure. we focused on genes whose expression early after infection was associated with eventual lethal outcome, rather than with infection ( , ) . we reasoned that the focal point of infection, the lungs, where lethal processes unfold, would contain the relevant biological signal. several whole-genome transcriptomics datasets from mouse lung tissues after infection with iav are publicly available, including from our own previous work ( , , ) . we utilized these resources (i.e., transcriptomic data from the mouse lungs at several time points after infection) to derive a gene signature of lethality from the lung across several iav and mouse strains, followed by integration with blood data from influenza-infected mice and humans. briefly, to be considered for inclusion in the signature of lethality, on days to after infection genes had to be differentially expressed (de) , in comparison to pbs-treated animals, in the lungs of lethally infected mice but not de in the lungs of non-lethally infected animals (see fig. and methods for more details) ( , , ) . furthermore, genes were included in the signature only if they were in at least one of the gene clusters associated with lethal, but not with non-lethal, iav infections that we uncovered previously in the lungs ( ). 
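the inclusion rules for the lung signature described above amount to set operations on gene lists. the following is a minimal sketch under stated assumptions: the function name and the gene symbols are hypothetical placeholders, and the real analysis used differential expression calls and gene clusters from the cited datasets rather than hand-written sets.

```python
def lung_lethality_signature(de_lethal, de_nonlethal, lethal_cluster_genes):
    """Return genes that are DE in lethally infected lungs, not DE in
    non-lethally infected lungs, and members of at least one
    lethal-associated gene cluster."""
    lethal_only = set(de_lethal) - set(de_nonlethal)
    return lethal_only & set(lethal_cluster_genes)


# Hypothetical gene symbols, for illustration only.
de_lethal = {"geneA", "geneB", "geneC", "geneD"}
de_nonlethal = {"geneB"}
lethal_clusters = {"geneA", "geneC", "geneE"}

signature = lung_lethality_signature(de_lethal, de_nonlethal, lethal_clusters)
print(sorted(signature))  # ['geneA', 'geneC']
```

in this toy example geneB is dropped because it is also de in non-lethal infection, and geneD is dropped because it does not belong to any lethal-associated cluster.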
the above procedure (the lung signature of lethal influenza infection) yielded , genes, which were enriched for several biological processes, including regulation of epithelial cell proliferation, cell adhesion, morphogenesis of an epithelium, regulation of vasculature development and regulation of to refine this lung lethality signature with blood data, we retained only genes that were de in the blood of mice two days after infection with a lethal dose of the highly pathogenic h n /pr iav strain (fig. s ) , and excluded genes that were de in the blood of humans upon low pathogenicity infection ( ). genes passed these pre-selection filters. as a positive control for detection of iav infection itself, independent of lethal outcome, genes previously used as classifiers of respiratory virus infection in blood were selected ( , ) as well as reference "housekeeping" genes for normalization ( ). primers against genes were designed for high-throughput rt-qpcr (table s ). since (a) our past work used non-lethal infection data to develop the specific set of lethality-associated gene clusters, (b) we excluded genes using human low pathogenicity data, and (c) we excluded genes de in the lungs of non-lethally infected mice, the hope was that the candidate genes selected into the integrated tissue and blood signature would have better specificity for association with lethality. to determine whether the expression of these candidate genes in blood could be used as a starting point to train a machine learning model of lethality in mice after challenge with iav, gene expression data were collected from rna isolated from µl of blood drawn from animals four days after treatment with pbs or a range of lethal and sublethal influenza strain a/puerto rico/ / h n (pr ) doses ( fig. a-b) . we used a statistical learning method known as elastic net and leave-one-out cross-validation to assess whether predictive models could be built from our data ( - ). briefly, survival (fig. 
b ) and gene expression data (fig. s ) were fitted via elastic net-regularized multinomial logistic regression to generate a model to classify mice into three categories: (1) not infected, (2) surviving upon infection, or (3) dying upon infection. during training (i.e., model selection via cross-validation), the algorithm learned that genes had expression values in blood that could be linearly combined to determine the probability of an individual animal being in one of the training categories ( fig. c and fig. s a ). some of the positive control genes were selected by the algorithm to help differentiate between infected and non-infected animals, and fewer to separate infected survivors from non-survivors (fig. s a) , suggesting that the death-associated candidate genes were indeed enriched for detecting signals of lethality in blood. in the approach described above using logistic regression, the algorithm attempts to learn how the expression of each gene can be combined to discriminate mice in one category from another. in a hypothetical scenario of resource prioritization, however, one might also be interested in estimating the time to a relevant clinical event (death, in this case). this can be achieved with cox proportional hazards regression, where time to event is taken into consideration and the relative risk of death for each subject can be derived. hence, to examine whether the expression of lethal candidate genes in blood would distinguish mice at high risk of early death after challenge, we combined survival and gene expression data from the lethal candidate genes in cox regression regularized via the elastic net. 
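both models described above turn a linear combination of gene expression values into a per-animal quantity: class probabilities for the multinomial model and a relative risk for the cox model. the sketch below illustrates only that final step, assuming coefficients have already been fitted; the gene name, coefficient values and function names are hypothetical and are not taken from the fitted models.

```python
import math


def linear_predictor(coefs, expression, intercept=0.0):
    """Weighted sum of expression values over genes with a coefficient."""
    return intercept + sum(coefs[g] * expression[g] for g in coefs)


def class_probabilities(per_class_coefs, intercepts, expression):
    """Multinomial logistic model: softmax over per-class linear predictors."""
    scores = {c: linear_predictor(per_class_coefs[c], expression, intercepts[c])
              for c in per_class_coefs}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def relative_risk(cox_coefs, expression):
    """Cox model: relative risk is exp(linear predictor), with no intercept."""
    return math.exp(linear_predictor(cox_coefs, expression))


# Hypothetical coefficients and expression values, for illustration only.
expression = {"gene_x": 2.0}
per_class = {"not_infected": {"gene_x": -1.0},
             "survivor": {"gene_x": 0.0},
             "non_survivor": {"gene_x": 1.0}}
intercepts = {"not_infected": 0.0, "survivor": 0.0, "non_survivor": 0.0}

probs = class_probabilities(per_class, intercepts, expression)
print(max(probs, key=probs.get))                  # non_survivor
print(relative_risk({"gene_x": 0.5}, expression))
```

genes whose coefficients are shrunk to zero by the elastic net can simply be left out of the coefficient dictionaries, which is why a regularized model can end up using only a subset of the measured panel.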
during training (i.e., model selection via cross-validation), the algorithm learned that genes had expression values in blood that could be linearly combined to generate a scoring system of infected animals as a function of the day of death. in both the multinomial and the lethal cox models, high levels of expression of genes associated with monocytes and neutrophils, together with low levels of transcripts associated with lymphocytes, indicated high risk of death (fig. s ), consistent with previously described analyses of the immune response to iav infection ( , , ) and also recent data from covid- patients ( ). art , the gene most positively associated with lethality in both models, is highly expressed in hematopoietic stem cells and immature lymphocytes according to the immgen database ( ), indicating a potential association of lethality with dysregulated hematopoiesis and release of immature cells into circulation. an important aspect of machine learning-derived models is whether their performance is generalizable, that is, whether they perform well on unseen test data that have not been used in training. here, the models were tested for their ability to predict lethality based on gene expression data from independent cohorts of mice that were not available for training. performance was tested in three different ways: (1) on mice infected with high doses of either a low or a high pathogenicity influenza strain (i.e., non-lethal high dose strain a/texas/ / h n (tx ) vs. lethal high dose pr ); (2) by training our models on this second dataset (low vs. high pathogenicity data) while testing on our first dataset described above from infection with different doses of pr ; and (3) by testing on the scenario of infection at one lethal dose (ld ), where mice are challenged with virtually the same viral dose but only half of them survive the infection. 
in the first round of validation, an independent cohort of mice bled on day two after treatment with pbs or with a high dose of either tx or pr provided the data ( fig. a-b) . although the test set was from an earlier time point of infection and included one virus strain and one dose not used in training, both the multinomial and the lethal cox models showed good predictive accuracy ( fig. c-d) . in the second round of validation, after reversing the roles of each dataset (i.e., training on the set with two different iav strains on day two of infection and testing on the set with five different doses of pr on day four of infection), the multinomial model did not perform as well as the lethal cox model, which showed good accuracy for predicting outcome ( fig. s a-b) . considering only the lethal cox model, for our third round of validation we turned our attention to the challenging scenario of predicting lethality among genetically identical, sex and age matched, cohoused mice given the same infectious bolus at an ld of pr (n = mice). in this setting, typically about half of the mice succumb while the other half survives viral exposure. importantly, it is not known mechanistically what drives this outcome dichotomy, since measuring putative candidate factors such as lung viral titer and immune infiltration requires sacrificing the mice and, thus, precludes assessment of the actual outcome of disease. to us, the most obvious candidate mechanism was that differences in initial infectious bolus effectively received by each animal during the infection procedure would determine the dichotomy. in that case, our lethal model should be able to predict the outcome of infection very early on, and thus help shed light on the biological mechanisms of life or death under these particular conditions. incubation period is the time that transpires from the moment of exposure to a pathogenic agent to when the host starts showing signs of infection. 
during this period, a virus needs to arrive at accessible tissues and bind to receptors to enter and replicate in susceptible cells. from infected cells, new infectious viral particles are released, and the cycle starts again, with viral titers temporarily following an exponential growth curve at least while virus replication is unperturbed by the host immune system. naturally, the time to trigger effective host immunity is influenced by viral virulence, replication capacity and initial infectious dose. s c) . considering that only a very small number of pr virions is required to induce pathogenic lung disease, these results indicate that on day two of ld pr infection the virus had not yet reached high enough levels in the respiratory tree for the early anti-viral response to become detectable in blood, and, thus, our model was not able to detect any signs of lethality in the blood of mice on day two post ld pr infection. by day four, however, de of all positive control of infection genes could be detected in the blood of animals infected with ld pr (fig. s c) , likely reflecting the spread of pr in the lungs and evolving host immunity. in addition, on day four, our lethal model correctly placed the animals at an intermediate level of risk of death -higher than mice infected with low pathogenicity tx , where no mice were at any risk of death, and lower than high dose lethal pr , where every animal would eventually succumb (fig. s d) . however, on day four, our model was unable to predict accurately which of the individual mice within the ld group would survive and which would die (fig. s e ). retraining our model on ld blood gene expression data also failed, as did predictions based on loss of body weight up to day four, suggesting that the fate of these mice might be indistinguishable at this early point after infection. 
in an attempt to further characterize the infection dynamics, we devised a longitudinal experiment in which ethically small samples of blood (~ µl) were taken daily from the same animals in a different cohort of mice during days four to eight of infection with ld pr (fig. a) , followed by rna isolation, high-throughput rt-qpcr, and testing based on the existing lethality model (i.e., without retraining). the model had statistically significant power to distinguish within-group, inter-individual differences in relative risk of death after ld challenge on days five and six, but not on days four, seven or eight (fig. b) . these results suggest a specific time window within which processes associated with ld lethality are reflected by our signature in blood. while the lack of accuracy later in the course of infection (i.e., days seven and eight) was likely due to the fact that the model was trained for early detection of lethality, before adaptive immune cells fully developed and reached the circulation, these data suggest that pamps and damps eventually reach different levels in the lungs of different mice, impacting their blood cell composition and lethal score, which can be used for prediction of lethality of individual animals even in the challenging scenario of experimental infection with an ld iav dose. prediction tools for infectious disease outcomes would enable evidence-based treatment decisions and resource allocation, in particular during pandemics. we show that a transcriptional signature predictive of lethality can be developed via machine learning in a mouse model of influenza infection early after exposure, from as little as µl of blood. this was achieved by integrating tissue and blood signatures of lethal influenza infection. the model also revealed a specific time window for prediction of lethality in the challenging scenario of ld pr infection. 
earlier in the course of infection, our model tended to place mice infected with ld pr at an intermediate risk level in comparison to mice infected with high dose pr or tx , suggesting that our model can delineate effects associated with infection severity. in the scenario of ld pr infection of inbred and cohoused littermates with virtually the same infectious bolus, our model revealed a later split (~days and ) between animals who would eventually die or survive. these intriguing results raise the possibility that differences in initial infectious bolus were not the primary determinants of life or death of mice infected with ld . rather, these mice seem to behave like one group undergoing similar responses until they reach the boundaries of a tipping point, with survival or death being the result of biological variation around that tipping point (schematic model in fig. s f) . the time when these differences were observed (i.e., days and ) overlaps with the early effector phase of adaptive immunity. since adaptive immunity, in particular cytotoxic t cells, plays an important role in reducing viral load and, hence, in stopping the feedforward stimulation of the damaging innate response ( , ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) , the timing when predictions can be made suggests that early inter-individual differences in the adaptive response may make a critical contribution to differences in the outcome of disease. although the average t cell response to infection is remarkably efficient and constant ( - ), formation of the naïve repertoire of antigen-specific t cells is a semi-stochastic process that varies in each host, resulting in a range of precursor frequencies ( - ) and, presumably, leading to a spread among the mice in the kinetics of reaching an adequate effector t cell number for effective viral clearance. 
therefore, the timing of the split between survivors and non-survivors in the ld scenario potentially reflects differential interference with virus production during the early phase of the effector t cell response, which may be the determinant of life or death at the tipping point of infection. however, a cause/effect relationship is difficult to test directly due to the lack of tools to accurately examine, for example, virus titer, lung infiltration by antigen-specific cells, or the size of their precursor population in live mice, because these would require terminal experiments that would preclude determination of which animals would eventually die several days later due to the actual infection. nonetheless, preliminary experiments adding gene probes to our panel that report activated cd + t cell abundance in the blood samples suggest that there is a relationship between the strength of these lymphocyte-related signals in day or samples and survival after infection. further tests along these lines are planned to refine the signature and relate these blood findings in a large cohort of infected animals to effector function and viral abundance in the lungs as directly assessed by multiplex tissue imaging. our experiments have several limitations, such as the lack of validation of the models on human iav infection data. unfortunately, to our knowledge, the only report to contain both whole blood whole-genome transcriptomics and disease severity data that includes severe human iav infection contained data from only three severe cases beginning five days after onset of symptoms or earlier ( ). besides the fact that such a low number of early cases precludes attempts to validate our models, one also needs to take into consideration the incubation time, the time from exposure until development of symptoms, when considering the temporal evolution of infection. 
the models presented here were developed for prediction very early after exposure, and, while animal models of infection can be used to clearly delineate the evolution of the immune response along its temporal component, it remains to be determined which time window post exposure in mice most closely translates to time after onset of symptoms in human infection. experimental iav challenge in humans with relatively mild strains indicates a peak of symptoms (mostly runny nose and sore throat but no need for mechanical ventilation) around day - after exposure ( ), but it is not clear how long it takes, on average, for a symptomatic carrier to seek medical treatment, in particular in case of severe disease with highly virulent and/or highly pathogenic strains. such questions would need to be addressed to enhance preparedness for the next pandemic and before predictive models developed in animal experiments could be realistically deployed to human populations, including with respect to the ongoing sars-cov pandemic. although mice and humans have substantial differences, including different distributions of influenza viral receptors in the respiratory tree, which is crucial to create an inter-species barrier for transmission, the immunobiology of highly pathogenic influenza infection, once transmission has been established, is quite similar across the several species that have been studied with regard to strong innate immune activation ( , , ) . genes that are part of the lethal cox model are highlighted (fig. s b) . data visualization and analysis was performed in r . . and rstudio ( ), using packages downloaded from cran and bioconductor . ( ) . data were handled, plotted and visualized with the foreach ( ), dplyr ( ) , ggfortify ( ), ggplot ( ) and gridextra ( ) packages. 
for rna-seq, blood was collected postmortem via heart puncture on day two after infection (n = mice/treatment) and kept at - °c in rnalater (thermo fisher scientific a list of candidate genes for the affected lung tissue signature of lethal influenza infection, as well as respiratory virus infection positive controls and reference genes was created by checking for overlap between the gene lists that were either downloaded from the original publications or created using the geo data with the geo r tool (bh-fdr = . ; see text and fig. for details) ( , ( ) ( ) ( ) . when entrez gene ids were not available, the mouse genome informatics database (jackson) was used to find current and old gene symbols as well as synonyms. a final list with selected genes and primers is shown in table s . bioanalyzer. only samples with rin score above were further analyzed. primer design and gene expression analysis were performed with the . ifc using delta gene assays software (d assay design) and protocols for pre-amplified samples for use with biomark hd, following the manufacturer's instructions (fluidigm) as described previously ( ) . to generate a standard curve for assessment of primer performance, samples from infected and non-infected animals were pooled and applied in a standard curve in duplicates (two no-sample controls were also included). initial quality control was performed on the biomark hd using real-time pcr analysis v . . (fluidigm), with automatic threshold generation. genes with primers unable to generate high quality results across the range of dilutions of the standard curve (r > . in ct value vs. dilution plots) or generating more than one peak in the melting curve were excluded from further analysis. data was imported to r and normalized with htqpcr ( ) . normalization was based on the standard delta ct method (subtraction of the mean of the reference "housekeeping" genes from all other values). 
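the delta ct normalization described above (subtracting the mean of the reference "housekeeping" genes from all other values) can be sketched as follows. this is an illustration only, not the htqpcr implementation; the gene names and ct values are hypothetical.

```python
from statistics import mean


def delta_ct(ct_values, reference_genes):
    """Subtract the mean Ct of the reference genes from every other gene.

    ct_values: dict mapping gene name -> raw Ct value for one sample.
    reference_genes: names of the "housekeeping" genes used for normalization.
    """
    reference_mean = mean(ct_values[g] for g in reference_genes)
    return {gene: ct - reference_mean
            for gene, ct in ct_values.items()
            if gene not in reference_genes}


# Hypothetical Ct values for one blood sample, for illustration only.
sample = {"ref1": 18.0, "ref2": 20.0, "target1": 25.0, "target2": 17.5}
normalized = delta_ct(sample, ["ref1", "ref2"])
print(normalized)  # {'target1': 6.0, 'target2': -1.5}
```

because a lower ct means more template, a lower delta ct indicates higher expression relative to the reference genes.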
in the machine learning algorithm, training was performed with glmnet ( - ) using leave-one-out cross-validation to tune the regularization parameter lambda (alpha = . , family = "multinomial" for the test error of the scoring model was estimated both using leave-one-out cross-validation within the training cohort (also known as nested cross-validation, in which the leave-one-out procedure is used (a) in a sub-cohort of the training samples to tune the hyperparameter lambda, followed by predicting on the one sample not included in that sub-cohort, and (b) repeated for every sample in the training cohort) and on an independent cohort of mice. a cox proportional hazards regression model was fitted to the predicted score (i.e., the relative risk derived using the predict function of the glmnet package [type = response]) of each mouse using the survival package, and the fitted model was checked for violations of the proportional hazards assumption at alpha = . using the cox.zph function, also in the survival package ( ). importantly, the low number of mice per category precluded the use of cross-validation to estimate the test error of the classification model, which was therefore estimated using only the independent cohort of mice. 
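the nested leave-one-out procedure described above can be sketched in a few lines. to keep the example self-contained, the elastic net is replaced by a deliberately simple stand-in (a mean shrunk toward zero by a penalty lam); this is an assumption made for illustration and not the model used in the study, and all function names are hypothetical. what the sketch preserves is the structure: the inner loop tunes the hyperparameter without ever seeing the sample held out by the outer loop.

```python
def fit_shrunken_mean(y_train, lam):
    """Toy regularized model: the training mean, shrunk toward zero by lam."""
    return sum(y_train) / (len(y_train) + lam)


def loo_error(y, lam):
    """Leave-one-out mean squared error for a fixed hyperparameter."""
    errors = []
    for i in range(len(y)):
        train = y[:i] + y[i + 1:]
        errors.append((y[i] - fit_shrunken_mean(train, lam)) ** 2)
    return sum(errors) / len(errors)


def tune_lambda(y, lambdas):
    """Inner loop: choose the lambda with the lowest leave-one-out error."""
    return min(lambdas, key=lambda lam: loo_error(y, lam))


def nested_loo_error(y, lambdas):
    """Outer loop: hold out each sample, tune on the rest, then predict it."""
    errors = []
    for i in range(len(y)):
        outer_train = y[:i] + y[i + 1:]
        best_lam = tune_lambda(outer_train, lambdas)  # never sees y[i]
        prediction = fit_shrunken_mean(outer_train, best_lam)
        errors.append((y[i] - prediction) ** 2)
    return sum(errors) / len(errors)


# Hypothetical outcome values, for illustration only.
scores = [1.8, 2.1, 2.4, 1.9, 2.2]
print(nested_loo_error(scores, [0.0, 0.5, 1.0]))
```

tuning lambda on all samples and then reporting the same cross-validated error would tend to be optimistic, which is the reason for nesting the two loops.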
fatal outcome of human influenza a (h n ) is associated with high viral load and hypercytokinemia genomic analysis of increased host immune and cell death responses induced by influenza virus avian influenza virus (h n ): a threat to human health aberrant innate immune response in lethal infection of macaques with the influenza virus severe seasonal influenza in ferrets correlates with reduced interferon and increased il- induction in vitro and in vivo characterization of new swine-origin h n influenza viruses into the eye of the cytokine storm innate immunity to influenza virus infection ccr + monocyte-derived dendritic cells and exudate macrophages produce influenza-induced pulmonary immune pathology and mortality lung epithelial apoptosis in influenza virus pneumonia: the role of macrophageexpressed tnf-related apoptosis-inducing ligand innate immune responses to influenza a h n : friend or foe? tnf/inos-producing dendritic cells are the necessary evil of lethal influenza virus infection a systems analysis identifies a feedforward inflammatory circuit leading to lethal influenza infection gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans pathological findings of covid- associated with acute respiratory distress syndrome suppressive myeloid cells are a hallmark of severe covid- disease severity-specific neutrophil signatures in blood transcriptomes stratify covid- patients orchestrating high-throughput genomic analysis with bioconductor ggfortify: unified interface to visualize statistical results of ggplot : elegant graphics for data analysis pathogen-related differences in the abundance of presented antigen are reflected in cd + t cell dynamic behavior and effector function in the lung fast gapped-read alignment with bowtie moderated estimation of fold change and dispersion for rnaseq data with deseq panther in : modeling the evolution of gene function, and other gene attributes, in the context of 
phylogenetic trees distinct nf-κb and mapk activation thresholds uncouple steady-state microbe sensing from anti-pathogen inflammatory responses htqpcr: high-throughput analysis and visualization of quantitative real-time pcr data in r predicted relative risk (log) for each mouse is shown for comparison, with the red dot and red line representing mean and standard deviation in each group. (f) schematic model of risk of lethal inflammation rising with time early after infection with different doses and strains of iav. n = in a/b (all infected mice new expression profiling by high throughput sequencing data reported here are available on geo with accession number gse . we would like to thank emily condiff, dr. antonio p. table s : selected genes whose expression was measured in the blood of mice upon treatment for training and testing the model. class defines whether a gene is part of the lethal affected tissue signature (lethal), positive controls of infection (infection) or reference "housekeeping" genes (reference); fp: forward primer; rp: reverse primer. key: cord- -b q gbd authors: mickael, alexandra; klimovich, pavel; henckel, patrick; kubick, norwin; mickael, michel e title: asip (agouti-signaling protein) aggression gene regulate auditory processing genes in mice date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: b q gbd the covid- lockdown strategy has affected the lives of millions. the strict measures taken to contain the epidemic have exposed many households to internal tensions. domestic violence has been reported to increase during the lockdown. however, the reasons for this phenomenon have not been thoroughly investigated. the contribution of the melanocortin gpcr family to aggression is well documented. the asip (nonagouti) gene plays a vital role in regulating the function of the melanocortin gpcr family, and it is responsible for regulating aggression in mice. we conducted a selection analysis of asip. we found that it has been under negative (purifying) selection from shark to humans. 
in order to better assess the effect of this gene in mammals, we performed rna-seq analysis of an asip knockout mouse model generated with crispr-cas . we found that asip ko in mice upregulates several genes controlling auditory function, including phox b, mpk , fat , neurod , slc a , gon l , gbx , slc a (dat ), aldh a , tyrp and lbx . interestingly, we found that slc a and lamp , as well as il , which are associated with startle disease, are also upregulated in response to knocking out asip. these findings are indicative of a direct autoimmune effect between aggression-associated genes and startle disease. furthermore, in order to validate the link between aggression and auditory input processing, we conducted psychological tests of persons who experienced lockdown. we found that aggression rose by % during the lockdown. furthermore, % of the subjects interviewed reported a change in their hearing abilities. our data shed light on the importance of auditory input in aggression and open new perspectives for interpreting how hearing and aggression interact at the molecular neural circuit level. the link between social lockdown and aggression is well established. aggression can be categorized as a beneficial evolutionary trait, as it might promote survival when individuals are competing for resources. conversely, aggressiveness might also impede social cohesion. social lockdown can lead to psychological problems, including anxiety, depression, and antisocial and violence-related behaviors. for example, orcas confined in closed spaces exhibit aggressive behaviors [ ] . in humans, release from mandatory confinement indoors was correlated with decreases in both verbal and physical aggression [ , ] . during covid- lockdown, . % of participants of a study investigating the effect of lockdown reported worsened sleep and increased levels of irritation, anger, and aggression compared to pre-pandemic times [ ] . 
furthermore, % of all participants reported experiencing interpersonal violence (ipv) [ ] . however, the factors causing aggression are not yet well investigated. the melanocortin system plays an important role in regulating aggression. it is widely documented that the peptide hormone pro-opiomelanocortin (pomc) acts as a precursor for various (neuro)peptides including α-melanocyte-stimulating hormone (α-msh), adreno-corticotrophin (acth) and β-endorphin. the function of melanocortins is regulated by the activation of a family of melanocortin receptor subtypes. α-msh binds to mcrs in the brain, where it regulates social behavior, appetite and stress physiology. α-msh acts as a neurotransmitter in the brain, where it can modulate behavior mainly via central mc r and mc r in a variety of ways, including regulating the dopaminergic reward system. melanocortin- receptor deficiency reduces a pheromone signal for aggression in male mice [ ] . interestingly, agouti signaling peptide (asip) and agouti-related peptide (agrp) have diverse functional roles in feeding, pigmentation and background adaptation mechanisms. asip serves as an antagonist of the mc r and the mc r receptors. the melanocortin system appears to have diverged from adenosine receptors around the time of divergence of hydra vulgaris [ ] . serotonin receptors, which are known to play a distinctive role in aggression, seem to have evolved early, as they have been found in trichoplax adhaerens [ ] , while dopamine receptors diverged afterwards, around the time of emergence of ciona intestinalis [ ] . however, the inter-regulation mechanism between these four families is still not clear. hearing as a sensory modality in the context of aggressive behavior has been shown to play a major role in controlling behavior [ ] . precise integration and processing of sensory inputs are crucial to evoke a suitable behavioral response [ ] . in crickets, aggression songs are associated with cricket fights [ ] . 
mice lacking asic show reduced anxiety-like behavior on the elevated plus maze and reduced aggression [ ] . asic was also shown to affect hearing [ ] . these reports indicate that aggression and hearing are possibly interlinked. previous reports have shown that various sensory modalities, including olfactory, gustatory, and visual neural networks, mediate aggressive behavior in drosophila melanogaster [ ] . furthermore, it was found that neuronal silencing and targeted knockdown of hearing genes such as d trpl (transient receptor potential-like) and the ca + signaling-related genes arr and inad in the fly's auditory organ elicit abnormal aggression [ ] . these observations indicate that hearing could regulate aggressive behavior. however, whether aggression controls hearing is not yet known. interestingly, melanocytes present in the cochlea have an essential function in inner ear physiology. they protect against various types of hearing loss, including age-related hearing loss (arhl) and noise-induced hearing loss (nihl), by means of calcium buffering, heavy metal scavenging, and antioxidant activities [ ] . however, whether aggressiveness directly affects hearing abilities is still not known. furthermore, no earlier investigations of asip have reported a direct link between asip and hearing. in this study, we investigated the role of asip in linking aggression and the hearing system. to confirm that our mouse model study would be representative of the aggression-hearing link, we conducted an evolutionary study, which revealed that asip is under negative selection between mice and humans. we analyzed rna-seq data from the japanese wild-derived msm/ms strain in which asip was knocked out using crispr/cas genome editing. we found that numerous hearing-associated genes, including those linked to startle disease, were upregulated in the ko mouse model. 
to validate our results, we conducted behavioral tests to investigate whether the rise in aggression during the covid- period has affected hearing patterns. we found that % of the individuals interviewed experienced a change in their hearing abilities while engaging in arguments during lockdown. phylogenetic investigation was done in three stages. first, asip family amino acid sequences were aligned using mafft via the iterative refinement method (fft-ns-i). next, we employed prottest to determine the best amino acid replacement model [ ]. prottest results based on the akaike information criterion (aic) suggested that the best substitution model is lg+i+g+f, where lg is the substitution model supplemented by a fraction of invariable amino acids ('+i'), with each site assigned a probability of belonging to given rate categories ('+g') and observed amino acid frequencies ('+f'). the third stage involved using the protein alignment and the resulting substitution model in two different phylogenetic methods to construct the tree, namely, ( ) maximum likelihood and ( ) bayesian inference. we performed the maximum likelihood analysis using phyml [ ] implemented in seaview with random starting trees [ ]. we applied bayesian inference analysis using mrbayes, where we implemented a markov chain monte carlo analysis with , generations to approximate the posterior probability and took a standard deviation of split frequencies below . to indicate convergence, as previously described. we used the coding dna alignment and our final tree to investigate the ratio of non-synonymous (dn) to synonymous (ds) substitutions using the paml program. likelihood ratio tests (lrt), evaluated against the χ-square distribution, were used to compare selective-pressure models against neutral models. one level of analysis was investigated. this level calculates the global ω for the tree using the one-ratio model m [ ], where ω = dn/ds, with trees under purifying selection ( ).
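the likelihood ratio test mentioned above can be sketched in a few lines of python. this is a minimal illustration and not the paml implementation: the function names `chi2_sf_df1` and `likelihood_ratio_test`, the single degree of freedom, and the example log-likelihoods are all assumptions introduced here. the statistic is twice the log-likelihood difference between the alternative (selection) model and the null (neutral) model, and its p-value is read from a χ-square distribution.

```python
import math
from typing import Tuple


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Return (test statistic, p-value) for two nested models.

    lnl_null: log-likelihood of the simpler (null) model.
    lnl_alt:  log-likelihood of the richer (alternative) model.
    Assumption: the models differ by exactly one free parameter.
    """
    # The statistic cannot be negative for properly nested models;
    # clamp small negative values caused by optimizer noise.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf_df1(statistic)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-998.0)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

with these hypothetical inputs the statistic is 4.0 and the p-value is about 0.0455, so the null model would be rejected at the 0.05 level. models that differ by more than one parameter need the χ-square distribution with the matching number of degrees of freedom, which this sketch does not cover.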
the overall design of the rna experiment was as follows: the midbrain section was isolated from japanese wild-derived msm/ms mice ( control and ko) [ ]. total rna was purified from dissected midbrain using trizol (thermo fisher scientific). purified total rna was amplified and labeled with cy using the low-input quickamp labeling kit (agilent technologies). cy -labeled rnas were hybridized to the sureprint g mouse gene expression v x k microarray kit (agilent technologies) at °c for h. the scanned images were processed with feature extraction software (agilent technologies) to extract the signal intensities of each probe. the extracted signal data were imported into the gene spring gx . . software (agilent technologies) and normalized using the default settings. rnaseq analysis was then performed in r using limma [ ]. briefly, we employed the limma rna-seq differential gene expression method to compute non-parametric approximations of mean-variance relationships. this allowed us to calculate the weights for a linear model analysis of log-transformed counts in conjunction with empirical bayes shrinkage of variance parameters. differential expression analysis was performed to determine the differences in gene expression between +lps cells and nontreated samples by fitting a linear model to compute the variability in the data with lmfit [ , ]. pathway enrichment was done using the library fgsea [ ]. the network between chosen genes was calculated using the glasso module of the webserver geneck [ ] with default settings. data were downloaded from http://www.proteinatlas.org [ ]. for a protein to be a candidate biomarker, it should show medium or high expression in normal brain. we arbitrarily set our selection criterion to candidate genes that were found to be upregulated in the rna-seq study. the present study included participants who reported no prior aggression or depression diagnosis. the sample size was determined by the g power analysis.
the participants' reported ages ranged from to years. the education level of participants varied from uneducated to master's degree. informed written consent was obtained from the participants after the purpose of the research had been explained to them. as an exclusion criterion, participants with a serious general medical condition were not selected. aggression was measured with the buss-perry aggression questionnaire [ ]. this scale consists of four subscales: physical aggression, verbal aggression, anger and hostility. the participants were asked to evaluate each item on a -point likert scale ranging from (not true) to (true). the consistency of each category was confirmed with cronbach's alphas ranging from . to . [ ]. in this study, the alphas for the subscales were . for physical aggression, . for verbal aggression, . for anger, and . for hostility. the descriptive scores of the four subscales were computed by averaging the item scores. we examined whether aggressive behavior could be caused by social stress. we employed the bergen social relationship scale (bsrs), which measures interpersonal relationship problems [ ]. it is a six-item self-report measure. answers were given on a four-point scale ranging from "describe me well ( )" to "do not describe me at all". scores ranged from to , where a higher score signifies higher interpersonal conflict. cronbach's alpha for the bsrs was reported to be . . the test-retest correlation was reported to be . . the construct validity of the bsrs ranged from . to . , all statistically significant at p < . . our results indicate that the conserved aggression-inducing gene asip is responsible for downregulating several genes responsible for hearing and acoustic processing in mice. our results also show that this gene is strongly conserved between mice and humans.
when we analyzed rna-seq data for the asip ko mouse model, we found that several genes controlling hearing were upregulated in the ko samples. specifically, phox b, mpk , fat , neurod , slc a , gon l, gbx , slc a , aldh a , tyrp and lbx genes were down regulated in asip (ko). interestingly, we found that slc a and lamp , as well as il , which are associated with startle disease, are also upregulated in response to knocking out asip. these results imply that asip plays a fundamental role in startle disease and that the startle disease pathology is connected to the patient's response to aggression. we found the link between hearing abilities and aggression mirrored in human samples, where people who experienced aggressive behavior during lockdown also reported a change in hearing abilities. our results shed more light on the link between aggression and the processing of acoustic signals in humans. agouti evolution was under negative selection. our results extend what has been reported by saeed et al. we were able to locate human asip gene homologs in chimps, orangutan, mice, and chicken (figure x). we found two homologs of the gene in zebrafish as well as in elephant shark (faa . , and faa . ). interestingly, we were only able to find a single copy of the gene in the reed fish erpetoichthys calabaricus (xp_ . ), suggesting that reed fish lost one homolog. we were not able to locate any asip genes in lampreys. however, we were able to locate several melanocortin (e.g. xp_ . ), serotonin (xp_ . ), adenosine (xp_ . ) and dopamine (xp_ . ) receptors, suggesting that lampreys have evolved their own pathways for the regulation of aggression. we were not able to locate asip in hagfish hyperotreti, ciona intestinalis, hydra vulgaris, drosophila melanogaster, trichoplax adhaerens or caenorhabditis elegans. these observations indicate that asip first emerged during the divergence of gnathostomata and hyperoartia (lampreys). identity and similarity analysis indicated that asip is highly conserved.
for example, the similarity between human and chimp for asip was . %, while with the mouse it was . % (figure). furthermore, ω (dn/ds) showed a value of . , confirming that the genes evolved under negative selection (figure x). we analyzed the geo dataset gse . in this dataset, crispr/cas -mediated genome editing in wild-derived mice was performed to generate tamed wild-derived strains by mutation of the a (nonagouti) gene. these tamed mice show non-aggressive behavior when tested through the () test. we found genes related to hearing and acoustic signal processing upregulated in the ko mice. we also found that the serotonin level was directly downregulated by knocking out the agouti gene (figure ). other aggressiveness-related genes that were downregulated include (). interestingly, however, we found that several genes associated with aggressive behavior were not affected, such as cfos, and…. notably, we did not see any change in the inflammatory pathway in this particular model. however, using the geo data set () that uses (), we found that the cd pathway was affected (figure ) as well as the () pathway (figure ). (figure =ihc) to investigate the role of asip in the auditory sensory mechanisms, we analyzed the public microarray of cochlear sensory epithelia versus embryonic stem cells. we found that asip is upregulated in cochlear sensory epithelia but not in embryonic stem cells (fold change . ). interestingly, mcr was not upregulated ( . ), indicative of a regulatory role of asip on melanocortin receptors under homeostasis. fat , slc a , gon l, dat and aldh a were also not upregulated, confirming our hypothesis of a regulatory effect of asip on these hearing-related genes. to investigate the role of asip in the cochlear sensory epithelia, we analyzed the public microarray of cochlear sensory epithelia treated with lpl. we found that asip is upregulated in cochlear sensory epithelia but not in embryonic stem cells (fold change . ).
interestingly, il , known to regulate the inflammatory pathway, was not upregulated. mcr was also not upregulated ( . ), indicative of a regulatory role of asip on melanocortin receptors under homeostasis. fat , slc a , gon l, dat and aldh a were also not upregulated, confirming our hypothesis of a regulatory effect of asip on these hearing-related genes. our studies suggest a direct link between acoustic processing and aggression. we have investigated the relationship between aggression induced by lockdown and hearing. we found that % of individuals who answered the questionnaire reported some difference in their hearing ability. this includes both positive and negative changes. we found these results reflected in the molecular pathway of melanocortin receptors and , where knocking out their negative agonist, asip, resulted in the upregulation of various hearing-associated genes such as fat , among others. we also noticed that the hearing-associated genes affected by knocking out asip include slc a and il , which showed upregulation; both of these genes are associated with startle disease. we investigated asip further and found that it is expressed in cochlear sensory epithelial cells. from an evolutionary point of view, asip is more recent than melanocortin receptors. taken together, our results suggest that asip divergence represents the evolution of a new mechanism linking hearing and aggression in higher animals. our analysis has shown that the asip gene diverged around mya. ingo et al. (pmid: ) have shown that agouti exists in teleosts. we have found it in elasmobranchs, indicating that it is more ancient than previously thought. the endocrine-related genes that play an important role in aggression include those of the serotonin system, with low levels of serotonin associated with violent behavior and suicidal thoughts.
serotonin performs its role in aggression through a network of genes including maoa and maob (which play a role in the metabolism of biogenic amines), slc a (which acts as the serotonin transporter) and the tryptophan hydroxylase enzyme (tph ) (which catalyzes the rate-limiting step in the synthesis of serotonin). it has been shown that polymorphisms in metabolic enzymes, carrier proteins, and receptors of the serotonergic system are associated with an increased aggressive behavior pattern. interestingly, serotonin receptors diverged during the time of trichoplax adhaerens. in human studies, a positive relationship between aggressiveness and -methoxy- -hydroxyphenylglycol (mhpg), a norepinephrine product, has been established. the − t allele of the noradrenaline transporter (slc a ) is more common in adhd in americans of european descent, thus providing a questionable link with aggression. interestingly, β-type noradrenergic receptor blockers have been shown to reduce aggressive behavior in some but not all patients tested, suggesting that aggression is controlled by a host of gene networks. adrenergic receptors also evolved during the time of evolution of trichoplax adhaerens. it seems the melanocortin system diverged from adenosine receptors during the divergence of hydra vulgaris (pmid: ). conversely, arginine vasopressin, whose levels are positively correlated with a life history of aggression, evolved during the emergence of ciona intestinalis. another important endocrine mechanism is the dopamine reward system; for example, the gene dbh encodes a key enzyme in the synthesis of norepinephrine, which is associated with conduct disorder. however, dopamine receptors first appeared during the emergence of ciona intestinalis. acetylcholine receptors, also implicated in aggressive behavior, diverged during the emergence of danio rerio. we could not find asip in ciona intestinalis. we have noticed that knocking out asip increased acetylcholine receptor slc a ( . fold increase).
interestingly, knocking out asip did not affect dopaminergic or serotonergic receptor expression. since its emergence, asip has been subjected to tight negative selection (ω = . ). this indicates that, from a chronological point of view, serotonin and adrenergic receptors are responsible for regulating the basic mechanisms controlling behavior in lower organisms, melanocortin and dopamine receptors emerged as the need for better control of aggression appeared in hydra and ciona, and acetylcholine receptors play a role in higher animals. finally, asip emerged to play the role of a regulator in higher invertebrates and vertebrates. asip is controlling the melanocortin role in hearing. effect of asip on genes associated with auditory signal processing. asip downregulates genes responsible for sound processing and regulation: phox b, mpk , fat , neurod , slc a , gon l, gbx , slc a (dat ), aldh a , tyrp and lbx . phox b is expressed in the brain stem; mutations in this gene have been associated with brainstem dysfunction and brainstem auditory evoked potentials in % of the congenital central hypoventilation syndrome (cchs) patients investigated ( ). phox b is involved in the development of several major noradrenergic neuron populations, including the locus coeruleus in the pons of the brainstem, which is known to play a major role in aggressive behavior ( ). fat plays an important role in hair cell regeneration in zebrafish ( ). it was demonstrated that the lateral and basolateral amygdala nuclei fail to form in neurod null mice and that neurod heterozygotes have fewer neurons in this region. neurod heterozygous mice show profound deficits in emotional learning as assessed by fear conditioning ( ). human neurod can induce neurogenic differentiation in non-neuronal cells in xenopus embryos, and is thought to play a role in the determination and maintenance of neuronal cell fates. gbx is required for the morphogenesis of the mouse inner ear.
in particular, absence of the endolymphatic duct and swelling of the membranous labyrinth are common features in gbx -/- inner ears. more severe mutant phenotypes include absence of the anterior and posterior semicircular canals, and a malformed saccule and cochlear duct ( ). in slc a :egfp larvae, it was found that egfp-positive dopaminergic fibers were located within the supporting cell layer and not within the hair cell layer. it was demonstrated that dopamine receptors are present in sensory hair cells at synaptic sites that are required for signaling to the brain. when nearby neurons release dopamine, activation of the dopamine receptors increases the activity of these mechanosensitive cells. a mutation in aldh a has been suggested to contribute to meniere's disease (md), an inner ear disorder characterized by tinnitus, vertigo, and hearing loss (lynch et al., ). acoustic overstimulation upregulates tyrp in rats ( ). lbx acts as a selector gene in the fate determination of somatosensory and viscerosensory relay neurons in the hindbrain. interestingly, we found that asip is upregulated in the cochlear sensory epithelium (cse), which is involved in the perception of sound and contains hair cells and supporting cells. hair cells are the transducers of auditory stimuli into neural signals, and are surrounded by supporting cells (pmid: ). taken together, these reports indicate that asip is a key regulator of several genes in different regions of the brain that play various roles in developing and maintaining acoustic pathways. our results indicate a direct link between hereditary hyperekplexia and aggression. startle disease is characterized by an exaggerated startle response, evoked by tactile or auditory stimuli, leading to hypertonia and apnea episodes. missense, nonsense, frameshift, splice site mutations, and large deletions in the human glycine receptor α subunit gene (glra ) are the major known cause of this disorder.
however, mutations are also found in the genes encoding the glycine receptor β subunit (glrb) and the presynaptic na+/cl−-dependent glycine transporter glyt (slc a ). in this study, systematic dna sequencing of slc a in new unrelated human hyperekplexia patients revealed sequence variants in index cases presenting with homozygous or compound heterozygous recessive inheritance. five apparently unrelated cases had the truncating mutation r x. genotype-phenotype analysis revealed a high rate of neonatal apneas and learning difficulties associated with slc a mutations. from the slc a sequence variants, we investigated glycine uptake for novel mutations, confirming that all were defective in glycine transport. although the most common mechanism of disrupting glyt function is protein truncation, new pathogenic mechanisms included splice site mutations and missense mutations affecting residues implicated in cl− binding, conformational changes mediated by extracellular loop , and cation-π interactions. detailed electrophysiology of mutation a t revealed that this substitution results in a voltage-sensitive decrease in glycine transport caused by lower na+ affinity. this study firmly establishes the combination of missense, nonsense, frameshift, and splice site mutations in the glyt gene as the second major cause of startle disease. lamp lamp plays a pivotal role in sensorimotor processing in the brainstem and spinal cord. it is highly expressed in several brainstem nuclei involved with auditory processing including the cochlear nuclei, the superior olivary complex, nuclei of the lateral lemniscus and grey matter in the spinal cord. it was localized exclusively in inhibitory synaptic terminals, as has been reported in the forebrain. lamp knockout mice showed an increased startle response to auditory and tactile stimuli. in addition, lamp deficiency led to a larger intensity-dependent increase of wave i, ii and v peak amplitude of auditory brainstem response. (pmid: ). 
we also found that il −/− animals had deficits in acoustic startle response, a sensorimotor reflex mediated by motor neurons in the brainstem and spinal cord (fig. j , k; ( )). auditory acuity and gross motor performance were normal (fig. s i-j). taken together, these data demonstrate that il- is required for normal synapse numbers and circuit function in the thalamus and spinal cord (pmid: ). important (pmid: ) a number of controlled experiments and clinical investigations have demonstrated roles for glucocorticoids in auditory function and protection. as early as the s, clinical studies revealed that patients with adrenocorticosteroid deficiency presented with greater auditory sensitivity compared to normal volunteers (henkin et al., ). moreover, treatment with prednisone brought hearing thresholds up to normal levels, demonstrating that the observed hypersensitivity was related to levels of circulating corticosteroids. similarly, other studies revealed that patients with meniere's disease, an inner ear disorder affecting both cochlear and vestibular function, exhibited low levels of circulating corticosteroids. administration of adrenal cortex extract improved auditory function in these patients (powers, ).
orca behavior and subsequent aggression associated with oceanarium confinement. animals (basel)
confined to barracks: the effects of indoor confinement on aggressive behavior among inpatients of an acute psychogeriatric unit
the german covid- survey on mental health: primary results
melanocortin- receptor deficiency reduces a pheromonal signal for aggression in male mice
an optimised phylogenetic method sheds more light on the main branching events of rhodopsin-like superfamily
hearing regulates drosophila aggression
emotional responses to multisensory environmental stimuli. sage open
aggressive behavior in the antennectomized male cricket gryllus bimaculatus
mice lacking asic show reduced anxiety-like behavior on the elevated plus maze and reduced aggression
asic (-/-) female mice with hearing deficit affects social development of pups
progressive hearing loss in vitamin a-deficient mice which may be protected by the activation of cochlear melanocyte
prottest : fast selection of best-fit models of protein evolution
new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml . . syst biol
seaview version : a multiplatform graphical user interface for sequence alignment and phylogenetic tree building
paml : phylogenetic analysis by maximum likelihood
crispr/cas -mediated genome editing in wild-derived mice: generation of tamed wild-derived strains by mutation of the a (nonagouti) gene. sci rep
limma powers differential expression analyses for rna-sequencing and microarray studies
geneck: a web server for gene network construction and visualization
the human protein atlas: a spatial map of the human proteome
the generalizability of the buss-perry aggression questionnaire
a review on sample size determination for cronbach's alpha test: a simple guide for researchers
measuring interpersonal stress with the bergen social relationships scale
we would like to acknowledge professor macrious abraham for his ideas and advice.
key: cord- -mcit luk authors: gupta, chitrak; cava, john kevin; sarkar, daipayan; wilson, eric; vant, john; murray, steven; singharoy, abhishek; karmaker, shubhra kanti title: mind reading of the proteins: deep-learning to forecast molecular dynamics date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mcit luk molecular dynamics (md) simulations have emerged to become the backbone of today’s computational biophysics. simulation tools such as namd, amber and gromacs have accumulated more than , users.
despite this remarkable success, now also bolstered by compatibility with graphics processor units (gpus) and exascale computers, even the most scalable simulations cannot access biologically relevant timescales: the number of numerical integration steps necessary for solving differential equations in a million-to-billion-dimensional space is computationally intractable. recent advancements in deep learning have made it possible to find patterns in high-dimensional data. in addition, deep learning has also been used for simulating physical dynamics. here, we utilize lstms to predict future molecular dynamics from current and previous timesteps, and examine how this physics-guided learning can benefit researchers in computational biophysics. in particular, we test fully connected feed-forward neural networks and recurrent neural networks with lstm / gru memory cells, using the tensorflow and pytorch frameworks, trained on data from namd simulations to predict conformational transitions in two different biological systems. we find that non-equilibrium md is easier to train and that performance improves under the assumption that each atom is independent of all other atoms in the system. our study represents a case study for high-dimensional data that switches stochastically between fast and slow regimes. resolving these data sets will enable real-world applications in the interpretation of data from atomic force microscopy experiments. molecular dynamics or md simulations have emerged to become the cornerstone of today's computational biophysics, enabling the description of structure-function relationships at atomistic detail [ ]. these simulations have brought forth milestone discoveries including resolving the mechanisms of drug-protein interactions, protein synthesis and membrane transport, molecular motors and biological energy transfer, and viral maturation, encompassing a number of our contributions [ ].
more recently, we have employed molecular modeling to predict mortality rates from sars-cov- [ ], showcasing its application in epidemiology. in md simulations, the chronological evolution of an n-particle system is computed by solving newton's equations of motion. methodological developments in md have pushed the limits of computable system sizes to hundreds of millions of interacting particles, and timescales from femtoseconds ( − second) to microseconds ( − second), allowing all-atom simulations of an entire cell organelle [ ]. high performance computing, parallelized architecture, speciality hardware and gpu-accelerated simulations have made notable contributions towards this progress. however, in spite of significant advancements in both development and applications, the computational resources required to achieve biologically relevant system sizes and timescales in brute-force md simulations remain prohibitively "expensive". notably, md involves solving newtonian dynamics by integrating over millions of coupled linear equations. a universal bottleneck arises from the time span chosen to perform the numerical integration. akin to any paradigm in dynamic systems, the time span for numerical integration is limited by the dynamics of the fastest mode. in biological systems, this span is femtoseconds (fs) or lower, owing to the physical limitations of capturing fast vibrations of hydrogen atoms. thus, md simulation of at least microsecond, wherein biologically relevant events occur, requires the computation of million fs-resolved time steps. each step involves the calculation of the interaction between every particle and its neighbors, which scales as n or n logn . when n = - million atoms, these simulations are only feasible on peta- to exascale supercomputers. several techniques have been employed to accelerate atomistic simulations, which can broadly be classified into two categories: coarse-graining and enhanced sampling.
in the former, the description of the system under study is simplified in order to reduce the number of particles required to completely define the system [ ]. in the latter, either the potential energy surface and gradients (or forces) that drive the molecular dynamics are made artificially long-range so as to accelerate the movements, or multiple short replicas of the system are simulated in order to sample a broader range of molecular movements than a long brute-force md [ ]. a major contention with these techniques is that the simulated protein movements cannot be attributed either chemical precision or a realistic time label [ ]. we explore machine-learning methodologies for predicting the outcomes of md simulations while preserving their accurate time labels. this idea will greatly reduce the computational expense associated with performing md, making it broadly accessible beyond the current user-base of scientific researchers to high schools and colleges, where computational resources are sparse. the developments will imminently expedite the efforts of nearly , users of our open-source md engine namd [ ].
figure caption (fragment): trajectory (green: high dimension, red: reduced dimension) visualized in d and rendered in d using the molecular visualization software vmd [ ]. (c) deviation from gaussian behavior (quantified by kurtosis, where a higher value denotes larger deviation) of the distribution of x, y, and z positions of each of the particles (shown in red in b).
in this resource paper, we present two types of data sets, the dynamic correlations within which pose a significant challenge to existing machine-learning techniques for predicting the real-time nonlinear dynamics of proteins. the underlying physics of these data sets represents out-of-equilibrium and in-equilibrium conditions, wherein the n-particle systems evolve in the presence vs. absence of external perturbations.
beyond tracking the nonlinear transformations, these examples also create an opportunity to study whether the prediction accuracy of future outcomes with fs-resolution improves if prior knowledge is utilized to enhance the signal-to-noise ratio of key features in the training set. a number of works in the past have focused on predicting protein structures from protein sequence/compositional information by training on the so-called sequence-structure relationship using massive data sets accrued over the pdb and psi databases [ ]. however, knowledge of stationary d coordinates offers little to no information on how the system evolves in time following the laws of classical or quantum physics. little data is available to train algorithms on such time series information despite the imminent need to predict molecular dynamics [ ]. the presented data sets capture both the linear and nonlinear movements of molecules, resolved contiguously across millions of time points. these time series data enable the learning of the spatio-temporal correlation or memory effect that underpins the newtonian dynamics of large biomolecules, a physical property that remains obscure to the popular sequence-structure models constructed from stationary data. we establish that the success of any deep learning strategy towards predicting the dynamics of a molecule with fs precision is contingent on accurately capturing these many-body correlations. thus, the resolution of our md data sets will result in novel training strategies that decrypt an inhomogeneously evolving time series. as a publicly accessible resource, our md simulation trajectories of even larger systems ( - particles) [ ] will be provided in the future to seek generalizable big-data solutions of fundamental physics problems. in what follows, we use equilibrium and non-equilibrium md to create high-dimensional time series data with atom-scale granularity.
for simplicity, we derive a sub-space of intermediate size composed only of carbon atoms. in this intermediate-dimensional space, where the data distribution is dense and highly correlated, we train state-of-the-art time sequence modeling techniques, including recurrent neural networks (rnns) with long short term memory (lstms), to predict the future state of the system (fig. ). we explore how a kirchhoff decomposition [ ] of the many-body problem dramatically enhances the learning accuracy for both equilibrium and non-equilibrium data, even when the number of hidden layers is much smaller than the number of atoms. the hardness of the time series is captured in terms of root mean square deviation (rmsd) errors, computed at different lead times. the rmsd between two n-dimensional data points a and b is defined as:
$$\mathrm{RMSD}(a, b) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_i - b_i)^2} \quad ( )$$
where a and b could be either real or predicted points. we also define history time and lead time to be a moving window of cumulative time steps (in units of fs), respectively in the past and in the future of a given data point in the time series, over which training and predictions are achieved. modeling accuracy was evaluated by varying the amount of historical data points incorporated during the training phase, and then comparing its prediction accuracy against that of a static or linear model. surprisingly, we find that the equilibrium md time series is more challenging to learn, despite the non-gaussian distribution of atoms associated with the non-equilibrium md. henceforth, we discuss how these new data-set resources can be used for future research on modeling high-dimensional, high-frequency, event-driven md time series data. in the recent past, machine learning approaches have been successful in analyzing the results of md simulations. support vector machines and variational auto-encoders have been developed to extract free energy information from md simulations [ ]. kinetic properties of small molecules have also been extracted using neural networks.
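the rmsd error and the history/lead-time windows defined in this section can be made concrete with a small sketch. it assumes the standard rmsd (square root of the mean squared coordinate difference) and treats the "static" baseline as simply repeating the last observed frame; the function names `rmsd`, `make_windows` and `static_baseline` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # flattened coordinates of one time step


def rmsd(a: Frame, b: Frame) -> float:
    """Root mean square deviation between two equally sized coordinate vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def make_windows(
    series: Sequence[Frame], history: int, lead: int
) -> List[Tuple[Sequence[Frame], Frame]]:
    """Pair each block of `history` past frames with the frame `lead` steps ahead.

    The target for a window ending at index t-1 is series[t + lead - 1],
    so lead=1 means "predict the very next frame".
    """
    if history < 1 or lead < 1:
        raise ValueError("history and lead must be at least 1")
    pairs = []
    for t in range(history, len(series) - lead + 1):
        pairs.append((series[t - history:t], series[t + lead - 1]))
    return pairs


def static_baseline(past: Sequence[Frame]) -> Frame:
    """Assumed 'static' model: the prediction is the last observed frame."""
    return past[-1]


if __name__ == "__main__":
    # Toy one-coordinate trajectory, for illustration only.
    trajectory = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    for past, target in make_windows(trajectory, history=2, lead=1):
        error = rmsd(static_baseline(past), target)
        print(f"last seen {past[-1]}, target {target}, rmsd {error:.2f}")
```

a learned model would replace `static_baseline`, and its rmsd at each lead time would be compared against the baseline's rmsd on the same windows.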
it has also been shown that neural networks trained on limited data selected from very expensive md simulations can recover the entire boltzmann distribution for small proteins [ ]. however, none of these approaches aims at recovering the real-time (i.e. fs-resolution) molecular movements of biological molecules, one of the central goals of md simulations [ ]. rnns and lstms have been used to predict md [ ], but the tested data sets fail to wholly capture the dynamical complexity of a biological molecule. a key observation made therein, which inspires our current investigations, is that training on molecular dynamics beyond particles is improbable. the data sets we present in the next section challenge this bottleneck, which must be overcome to forecast md simulations of real biological systems. from a computational perspective, any dynamically evolving system can be regarded as event-driven time series data; in this sense, md simulations are essentially high-dimensional, high-frequency time series data, and sequence modeling techniques such as recurrent neural networks [ ], hidden markov models, and arma can be used to model md trajectories. deep learning has recently emerged as a popular paradigm for modeling dynamically evolving time series and predicting future events. these techniques have also been widely studied in application areas such as business and finance [ ], healthcare [ ], and power and energy [ ].

at room temperature, where biology exists, the newtonian mechanics of molecules becomes stochastic, as described by the fluctuation-dissipation theorem. the ensuing molecular trajectories converge to boltzmann-distributed ensembles at infinitely long times. it has been established that protein dynamics in cells can be modeled as motions of molecules within a highly viscous medium.
imposing this so-called friction-dominated condition on the stochastic newton's equations, and assuming that a complete set of the degrees of freedom for describing the dynamical system is known, molecular dynamics is deemed to be a markovian process. in simpler terms, it is a process for which predictions regarding future outcomes can be made based solely on its present state and, most importantly, such predictions are just as good as the ones that could be made knowing the process's full history. the equation of motion of a particle of mass (m), at position (x) in time (t) within an environment of friction coefficient (γ) becomes: where the random force ζ is constrained by requiring the integral of its autocorrelation function to be inversely related to the friction coefficient. however, we often cannot find a complete set of descriptors to probe the molecular dynamics of proteins. the problem becomes particularly challenging once the number of amino acids in the protein sequence exceeds [ ] (i.e. roughly n = atoms). the associated phase space (of n positions, $x = (x_1, x_2, \ldots, x_n)$, and n momenta) for systems of these sizes (or higher) becomes too extended for physics-based methods such as md to visit all the possible points in the n-dimensional space. this incomplete description of the phase space, together with the well-known finite-size artifacts [ ], introduces "memory" into any realistic md simulation. introduced originally by zwanzig and used in ref. [ ], this memory shows up as a "long-time" tail in the auto-correlation functions of atoms undergoing simulation. in a fully equilibrated system, this memory is short-term, vanishing within picoseconds ( − seconds) for the carbon, hydrogen and oxygen atoms that primarily compose proteins [ ]. in non-equilibrium simulations, which are often employed to accelerate md [ ], the long-time tail stretches to nanoseconds ( − seconds).
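as a minimal sketch of the markovian, friction-dominated picture described above, the following python snippet integrates an overdamped langevin equation for a single particle and estimates the autocorrelation of its position. the harmonic potential, the euler-maruyama integrator, the reduced units, and all function and parameter names are assumptions made for illustration; they are not taken from the simulations reported here. the noise amplitude is chosen so that its variance scales inversely with the friction coefficient, in line with the constraint on ζ stated above.

```python
import math
import random


def simulate_overdamped(n_steps, dt=0.01, gamma=1.0, k=1.0, kT=1.0, seed=0):
    """Euler-Maruyama integration of gamma * dx/dt = -k * x + noise (assumed toy model)."""
    rng = random.Random(seed)
    # noise variance per step is 2 * kT * dt / gamma, i.e. inversely related to friction
    noise_scale = math.sqrt(2.0 * kT * dt / gamma)
    x = 0.0
    trajectory = []
    for _ in range(n_steps):
        # markovian update: the next position depends only on the current one
        x += -(k / gamma) * x * dt + noise_scale * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory


def autocorrelation(series, lag):
    """Normalized autocorrelation of a 1-d series at a given lag (in steps)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var


trajectory = simulate_overdamped(20000)
for lag in (1, 100, 500):
    print(lag, round(autocorrelation(trajectory, lag), 3))
```

in this toy model the autocorrelation decays on a timescale of roughly `gamma / k`, so short lags remain strongly correlated while long lags do not.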
noting that every integration time step in md is - fs ( − seconds), there exists at least order of magnitude in time within which the memory of the system is relevant, offering the opportunity to leverage deep learning techniques for making predictions. computational modeling of any complex dynamics essentially boils down to a multivariate time series forecasting task; hence, time series trajectory data capturing an evolving biological system is necessary to analyze and computationally learn the underlying molecular dynamics. below we first present some basic definitions and notation that we will use to characterize the md time series.
- lead time: for a forecasting problem, the lead time specifies how far ahead the user wants to predict the future positions of atoms. predicting far ahead (high lead) enables faster md simulation and, at the same time, makes the forecasting task more challenging.
- history size: the amount of historical data used to predict the future positions of atoms.
- prediction window: the discrete time window in the future used for creating the prediction outcome. for simplicity, in this paper, we always use a prediction window of fs.
- prediction error: the root mean square deviation (rmsd, eq. ) between real and predicted structures at a given time point. during the learning stages, the error across individual interactions is denoted loss.
we present two new data sets to introduce subtleties in the equilibrium and non-equilibrium molecular dynamics from the perspective of time series forecasting. an analysis of these data sets will bring to light how the effects of history (or correlation) in the time series data can be described at different lead times and prediction windows to model a real-time, dynamically evolving md time series.
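the definitions of lead time, history size, and prediction window above can be made concrete with a small sketch that slices a one-dimensional series into training pairs. the function name `make_windows`, its parameters, and the convention that all quantities are counted in saved time steps (rather than fs) are assumptions for illustration only.

```python
def make_windows(series, history, lead, window=1):
    """Build (inputs, targets) training pairs from a 1-d time series.

    inputs  : the `history` most recent values up to the current index t
    targets : `window` values starting `lead` steps after t
    """
    if history < 1 or lead < 1 or window < 1:
        raise ValueError("history, lead and window must be positive")
    n_samples = len(series) - history - lead - window + 2
    pairs = []
    for start in range(max(n_samples, 0)):
        t = start + history - 1  # index of the current (last known) point
        inputs = series[start:t + 1]
        targets = series[t + lead:t + lead + window]
        pairs.append((inputs, targets))
    return pairs


series = list(range(10))  # stand-in for one coordinate sampled at fixed time steps
pairs = make_windows(series, history=3, lead=2)
print(pairs[0])    # ([0, 1, 2], [4])
print(len(pairs))  # 6
```

a larger `lead` shrinks the number of usable pairs and moves each target further from its inputs, which is one way to see why forecasting becomes harder at high lead.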
the training objective here is to minimize the prediction error for a sufficiently large batch of training instances over a historical time span. we introduce two data sets from two distinct kinds of md simulation systems. as illustrated in fig. , the first data set is an equilibrium simulation of the enzyme adenosine kinase (adk). the second is a steered molecular dynamics (smd), or non-equilibrium, simulation of the -alanine polypeptide helix (fig. ). in smd, an external force is applied to the system along a chosen direction. we applied a force of nanonewton along one end of the -alanine helix, unfolding the protein [ ]. we have generated high- as well as low-dimensional data for both systems. in high dimension, the position of every atom is explicitly defined, resulting in × for adk. in both data sets, the shape transformation of a -dimensional ( d) many-body system is recorded over time. for adk, a transition from an open to a closed d shape is observed due to concomitant rearrangements of particles (fig. b), while in -alanine, a more non-linear helix-to-coil transition is probed by tracking the changes in position of particles (fig. a). beyond the high dimensionality of the data sets, the uniqueness of the equilibrium md time series lies in its dynamical evolution: the kinetic behavior stochastically switches between fast and slowly evolving regimes. using rmsd values of all the particle positions with respect to the very first, t = , position, we showcase these sudden changes in single-particle as well as collective dynamics in fig. a. for the non-equilibrium time series data of -alanine, the movements occur in the presence of an external force. these simulations produce less noisy data than the equilibrium md of adk (fig. b vs. a). however, given that the shape changes are highly directed, we find that there are multiple classes of single-particle dynamics hidden under a collective behavior.
unlike the equilibrium md simulations, where the positions of all the particles are gaussian-distributed about a mean, at least two different classes of particle distributions are observed in the non-equilibrium time series (fig. c vs. c). the distribution of the significant majority of atoms is non-gaussian, reflecting the positional biases from the high external forces to which they are subjected.

during protein structure determination experiments, the atomic positions of a target protein are assigned by averaging the observed electron densities [ ]. while this assignment offers a good starting model, the derived protein structure is typically in a non-biological (or non-native) state, which severely limits biological application. such artifacts can be resolved by bringing the starting model into thermal equilibrium at room temperature. once in equilibrium, the protein adopts its native structure ( d shape) and dynamics. by numerically integrating eq. , equilibrium md simulations monitor the real-time evolution of native proteins. the challenges involved in modeling equilibrium md data can be presented in terms of the lead times of the associated time series. the hardness of the time series data is quantified by tracking how the rmsd values between the data points change at different lead times, namely at leads of , .. fs (fig. ). the change in rmsd at different lead times also serves as a direct probe of the correlation in the data. if the lead time is short ( or fs), then it is simple to computationally probe the . - . Å scale changes in molecular position (fig. , black and red traces) by analyzing the associated short-time correlations (fig. d). in contrast, if the lead time is too long ( and fs), then key short-time correlations within the data are missed. thus, the associated small d shape changes may not be accurately learnt at this scale.
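the lead-time analysis described above can be sketched as follows: for each lead, average the rmsd between every saved frame and the frame that many steps later. the representation of a frame as a flat list of coordinates, the function names, and the toy trajectory are assumptions for illustration; they do not reproduce the published figures.

```python
import math


def frame_rmsd(a, b):
    """rmsd between two frames given as flat coordinate lists of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rmsd_by_lead(frames, leads):
    """Mean rmsd between each frame and the frame `lead` steps later."""
    profile = {}
    for lead in leads:
        if not 0 < lead < len(frames):
            raise ValueError("lead must be between 1 and len(frames) - 1")
        values = [frame_rmsd(frames[t], frames[t + lead])
                  for t in range(len(frames) - lead)]
        profile[lead] = sum(values) / len(values)
    return profile


# toy trajectory: one coordinate drifting by 1.0 per saved frame
frames = [[float(t)] for t in range(6)]
print(rmsd_by_lead(frames, [1, 2]))  # {1: 1.0, 2: 2.0}
```

a profile that grows slowly with lead indicates strongly correlated frames; a profile that saturates quickly indicates that short-time correlations are lost at that lead.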
one advantage of this data set is that all the particles are "well-behaved" and their dynamics are gaussian distributed (fig. c). thus, an optimal lead time is desired that is sufficiently large (far into the future) to be interesting from a biological standpoint and, at the same time, can be used to train a machine learning model aimed at replacing computationally expensive md.

data preparation. a starting d protein model of adk was generated using an x-ray diffraction crystal structure obtained from the pdb [ ]. the atomic coordinates of adk are encoded in the traditional pdb format, presenting the x, y, z positions. x-ray diffraction is unable to resolve hydrogen atom positions. thus, the positions of hydrogen atoms were inferred using the run adk.py script located in the equilibrium md simulation section of the github repository for this project [ ], yielding a complete initial model. the goal of equilibrium md is to recreate the native dynamics of a protein of interest. therefore, the forces acting on each atom of the protein are defined using a potential energy function, or force field. the amber force field, ff sbonlysc, was used for the adk simulation [ ]. an implicit water model, gb-neck , was chosen to capture the equilibrium adk environment; it is computationally efficient and enhances conformational sampling through decreased friction (γ in eq. ) [ ]. after force field and water model selection, the energy of the protein model is minimized. the energy minimization corrects atoms that are in erroneously close contact due to artifacts from structural determination. if left uncorrected, such contacts can produce unrealistic forces that cause the simulation to become unstable. once minimized using conjugate gradients, the all-atom model is ready for production simulation. the adk simulation was performed for timesteps with a periodic update frequency of fs, and atomic models were saved every fs. this results in a .
nanosecond ( steps× fs/step) simulation of the adk protein, providing a time series of data points. the simulation of adk was performed using the openmm python library [ ]. five copies of the simulation were performed at a temperature of k. the collective dynamics of adk were monitored by computing its rmsd relative to the t = time point (fig. a). a plateau in this profile suggests that equilibrium is attained at . × fs. the trajectory data, containing time points or snapshots, was initially stored in single-precision binary fortran files known as dcd files. the positional coordinates (x, y, z) of all atoms in each snapshot were extracted from the dcd file, resulting in a rank- tensor of shape ( × × ) for the high-dimensional space and ( × × ) for the low-dimensional data. the entire simulation can be reproduced with a single openmm python script located in the equilibrium md simulation section of the github repository [ ].

life, as we know it, exists out of equilibrium. traditionally, experiments focusing on the non-equilibrium behavior of proteins were performed by either adding heat or inducing chemical perturbations. another factor that can bring proteins out of equilibrium is mechanical stress (e.g. stretching of the molecules). such stretching arises naturally in proteins located in muscle tissue. the response of these proteins to mechanical stress can be studied by investigating the response of individual domains to stretching in atomic force microscopy (afm) experiments [ ]. these molecular events are analogous to pulling a rubber band while holding one end fixed in the hand (fig. a). here, we employ non-equilibrium md simulations to computationally recreate the afm experiments. in particular, steered md (smd) is used to generate a relevant and challenging data set on which learning algorithms can be trained and validated.
it is notable that events from such non-equilibrium pulling experiments, or their equivalent smd simulations, have never been used within an rnn framework, in particular an lstm framework, for time series forecasting. the challenge in smd is commensurate with that of equilibrium md in that an optimal lead time should be derived respecting the correlation limits of the data. however, the subtleties are twofold. first, for the same lead time steps, the rmsd error bars in smd are much higher (fig. ), consistent with more prominent d shape changes than those observed for equilibrium md simulations of adk (fig. a vs. b). yet, the longer correlation times (fig. e) indicate smoother shifts within the time series. second, there are multiple classes of atoms with different dynamics distributions (fig. c).

data preparation. the -alanine helix was prepared using the avogadro software on a single cpu. the external force acts on the c-terminus of the long helical protein, while the n-terminus region remains constrained. as the molecule is stretched, it undergoes a gradual conformational change, transitioning from an α-helix to a random coil (fig. a). typically, there are two variants of smd: constant-force and constant-velocity pulling. the external pulling force acting on the atom in the c-terminal region of the protein is given by $F_{\mathrm{spring}} = k\,(vt - x)$. here, x is the displacement of the pulled atom from its original position, v is the prescribed pulling velocity, t is time, and k is the spring constant. in the presence of this external force, the equation of motion (eq. ) gains f spring as an additional force term. for our data set, we adopt the smd with constant velocity (smd-cv) protocol from our open-source namd tutorial [ ]. the smd-cv simulations are performed using the langevin dynamics scheme of md at a constant temperature of k in generalized-born implicit solvent with the charmm m force field [ ].
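the constant-velocity pulling described above is commonly modeled as a harmonic spring whose anchor point moves at velocity v, giving a force of the form k (v t − x). the sketch below assumes this form and uses hypothetical values in arbitrary but consistent units; it is not the namd implementation.

```python
def smd_cv_force(k, v, t, x):
    """Harmonic pulling force k * (v * t - x) on the steered atom.

    k : spring constant, v : pulling velocity, t : elapsed time,
    x : displacement of the pulled atom from its original position.
    """
    return k * (v * t - x)


# hypothetical values in arbitrary but consistent units
k, v = 2.0, 0.5
print(smd_cv_force(k, v, t=4.0, x=1.0))  # 2.0: the atom lags the moving anchor
print(smd_cv_force(k, v, t=4.0, x=2.0))  # 0.0: the atom has caught up
```

the force vanishes when the atom keeps pace with the anchor and grows linearly with the lag, which is why a stiff spring or a fast pulling velocity drives the system further from equilibrium.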
one end of the molecule (n-terminus) is constrained, while the other end (c-terminus) is free to move along the z-axis with a constant speed of . Å/ps and a force constant of kcal/mol/Å², exerting an overall force of nanonewton (fig. a) [ ]. a set of copies of smd is used to generate an ensemble of conformations under smd-cv pulling. all simulations are performed using a recent build of namd (version nightly build) with a time step of fs, a dielectric constant of , and a user-defined cut-off for coulomb forces with a switching function starting at a distance of Å, which plateaus to zero at Å. a simulation time of fs is required for extension of the helix to a random coil. here, we save the trajectory every fs, mainly to generate a large data set of points to train an lstm model in sect. . the data presented in figs. and are saved at even longer time intervals, namely fs, to reduce the number of time points to for computing lead times and correlations. the full data set of ( or × × ) points, which is used in the lstms below, is accessible through the google drive link provided on github [ ]. a tcl script, smd.constvel.namd, is used to implement the outlined simulation protocol. the script includes all the standard namd parameters outlined above for performing smd. this script, together with all other input files, is freely available through github [ ] and the namd website [ ] to reproduce our data set for non-equilibrium md simulations.

our md data is documented in tutorial files, scripts, and an openly accessible github page [ ], so any user with access to a single cpu or gpu node will be able to reproduce the results. the full time series can be loaded, visualized in d, and analyzed for rmsd using the molecular visualization tool vmd (figs. and ). the presented data set is arguably a first attempt at capturing the entire range of time series variations typical of a biomolecule.
we describe two broad classes of data with distinct correlation timescales. more importantly, the data clearly show how external physical forces can alter time series correlations and provide an avenue to experiment with machine learning models for probing such external factors. accordingly, a data scientist can choose from a suite of different learning algorithms to model these fast-evolving, high-dimensional md trajectory data. the equilibrium data at a single-particle level appear to be well behaved, with relatively uniform kurtosis values (fig. c), but pose difficulties in training because of the rapid variability in rmsds (fig. a, multiple shaded regions). in contrast, the non-equilibrium data show non-gaussian statistics at a single-particle level (fig. c), eliciting complexity at that level, but manifest smooth changes in the time series when treated together (fig. b). a key question these data sets pose is whether a common learning algorithm can ever be introduced to work with all the limits of biomolecular dynamics. a second question the data sets raise pertains to the identification of limits that are easier to model using popular sequence modeling techniques such as rnns with lstm or gru cells, either in isolation or in concert. finally, will the learning algorithms scale if the dimensions of the data sets increase from the hundred-to-thousand variables chosen here for simplicity to the more realistic million-to-billion dimensional spaces? these three questions also offer the opportunity to think about the use of existing petascale or upcoming exascale resources for handling convoluted biomolecular problems with data science methodologies. put together, these data sets place a machine learning expert in a position to address one of the central questions at the interface of life sciences and computer sciences, namely, to what extent can numerical simulation schemes be bypassed using machine learning tools?
the computational biophysics community, with nearly , namd users and a - fold larger cadre of researchers applying md, will immediately benefit from answering this question. the findings from this data set are further generalizable to any domain with quantitative data on high-dimensional, rapidly fluctuating time series.

due to the recent success of recurrent neural networks (rnns) in modeling time series data [ ], we conducted an exploratory study with rnns to model the two new dynamically evolving md trajectory data sets. we used long short-term memory (lstm) cells in the hidden layers and trained rnns on both equilibrium and non-equilibrium md simulations to determine which data set is more amenable to learning. more specifically, we conducted a series of experiments to produce baseline accuracy numbers for lstms as well as to tune the associated hyperparameters. below we present a brief summary of the experiments and report our findings to facilitate in-depth future research in this direction. as a starting point, we set the static model as our baseline, in which we assume that the position of an atom at a future timestamp, $x_{t+\mathrm{lead}}$, does not change relative to its last known position $x_t$, where t is the current timestamp. this assumption is incorrect, but it still helps us set a realistic baseline for evaluating the performance of advanced machine learning techniques such as lstms. figures a,b (adk) and a,b (smd) show the rmsd distributions of the static model for lead time steps and , respectively. to begin, we trained an rnn with lstm units in the hidden layer, a learning rate of . , history size , and lead time steps varying over { , , , , }. the output layer used linear activation, and mean squared error was used as the training loss function. below we report some of our key observations from the experiments.
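the static baseline defined above is simple enough to write down directly, and doing so gives a yardstick against which any learned forecaster can be compared. in the sketch below, the function names and the toy `model_forecast` values (a stand-in for the output of a trained model) are hypothetical and chosen only for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def static_forecast(series, lead):
    """Static model: the value `lead` steps ahead is assumed equal to the current value."""
    if lead < 1:
        raise ValueError("lead must be at least 1")
    return series[:len(series) - lead]


def error_ratio(model_forecast, series, lead):
    """Model error divided by static-model error; below 1 means the model beats the baseline."""
    actual = series[lead:]
    return rmse(model_forecast, actual) / rmse(static_forecast(series, lead), actual)


series = [0.0, 1.0, 2.0, 3.0, 4.0]
lead = 2
# hypothetical model output for x[t + lead], t = 0..2 (stand-in for a trained forecaster)
model_forecast = [1.0, 2.0, 3.0]
print(error_ratio(model_forecast, series, lead))  # 0.5
```

reporting the ratio rather than the raw error makes results at different lead times directly comparable, since the static error itself grows with lead.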
curse of dimensionality and kirchhoff decomposition: we found that learning by treating the entire protein structure at a given timestamp as a single training instance is very challenging due to the high dimensionality of the problem, generating higher errors than the static model. to deal with this issue, we assumed that the position of each atom within the protein structure is independent of the others and can be modeled as a separate one-dimensional time series. this so-called kirchhoff decomposition scheme boosted the performance of the lstm significantly.
adk vs -alanine: we report the rmsd of each simulated system, i.e., adk and smd (figs. a and b). we found that the rmsd of the smd simulation of -alanine is one order of magnitude higher than that of the equilibrium md simulation of adk. this is due to the non-equilibrium nature of the former, where an external force is used to pull the system. this difference is also reflected in the static model error at varying lead time steps (figs. and ).
effect of lead time: increasing lead time makes time series forecasting harder, which we expected would justify the use of complex sequence modeling techniques such as lstms. in other words, we hypothesized that an increase in lead time would cause the lstm error to increase less than the static model error. we found this to be true for the -alanine smd simulation. with lead time steps of and , the lstm loss was higher than the static model error. however, with a lead time step of , the lstm performed better than the static model, and the improvement over the static model increased further at even higher lead time steps ( and ). due to lack of space, we only present the results for lead times and (fig. a,b). in contrast, we have not been able to achieve lower lstm losses than the static model loss for the equilibrium md simulation of adk, for lead time steps through . the equilibrium md simulation of adk decorrelates much faster than smd, in the picosecond regime (fig. a).
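the decomposition described above, in which atomic positions are treated as independent one-dimensional series rather than as one high-dimensional training instance, can be sketched as a simple regrouping of the trajectory. the sketch assumes that each cartesian coordinate of each atom becomes its own series; the function name and the toy trajectory are hypothetical.

```python
def decompose(trajectory):
    """Split a (time, atom, xyz) trajectory into independent 1-d series.

    returns a dict mapping (atom_index, axis) to the list of values over time.
    """
    n_atoms = len(trajectory[0])
    series = {}
    for atom in range(n_atoms):
        for axis in range(3):
            series[(atom, axis)] = [frame[atom][axis] for frame in trajectory]
    return series


# toy trajectory: 3 frames, 2 atoms, (x, y, z) per atom
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.1, 0.0, 0.0), (1.0, 1.2, 1.0)],
    [(0.2, 0.0, 0.0), (1.0, 1.4, 1.0)],
]
series = decompose(trajectory)
print(len(series))     # 6
print(series[(1, 1)])  # [1.0, 1.2, 1.4]
```

each resulting series can then be windowed and fed to the same small network, so the model size no longer has to grow with the number of atoms.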
the result is both interesting and surprising: equilibrium md trajectories were more difficult to model than the non-equilibrium md trajectories, which is counterintuitive.
effect of history: for this set of experiments, we hypothesized that an increase in history size would reduce the lstm training error, as more information from the past is used. indeed, the results confirm our hypothesis (figs. ab and a,b). more specifically, we varied the history size among { , , } and found that increasing the history reduces the lstm training errors for both adk and -alanine trajectories.
effect of learning rate: we trained the lstm network separately while varying the learning rate among { . , . , . , . , . }. we found that rates of . , . were unstable, while . , . were too slow to converge for smd simulations (figs. ) (results for md simulations were similar and are provided on github [ ]). thus, we recommend . as the learning rate.
summary of hyper-parameter tuning study: based on our exploratory study, we recommend the set of empirical values for each hyper-parameter shown in table .

regarding future directions for methods applied to these data sets, there remain several ways to improve lstm training. one possible improvement is to use more stacked lstms, which would be able to learn more nonlinear dynamical relationships between the points. beyond lstms, we can also borrow from deep learning in natural language processing by utilizing attention models, which have recently achieved state-of-the-art results without the use of a recurrent hidden layer [ ]. another consideration for future work is the ability to reformulate the d structural input of the data as a d point cloud. recent deep learning architectures have been used for d point cloud segmentation and classification, such as voxelnet and pointnet [ ].
both architectures leverage the underlying d relationship between points and objects in d space for the supervised task. with voxelnet, the data is voxelized into fixed voxels on which a d convolutional neural network is used. however, with architectures like pointnet, the input size can be variable. in this case, a future direction could be the addition of a data set in which the number of atoms per dynamical system can be varied. architectures that deal with data in d space also raise the consideration of new loss functions. here, we utilized mse loss in optimizing our lstm. earth mover's distance (emd) and chamfer loss are the two most notable losses used for d point generation [ ]. moreover, emd can be extended to graphs, which can be useful for learning not only the d geometrical relationships but also the graph relationships between atoms. the external information sought in the current data sets from afm or force measurements to improve temporal correlation can also be derived from other experimental modalities, such as x-ray crystallography [ ] or cryo-electron microscopy [ ]. finally, recovery of the all-atom description from an lstm-predicted reduced space of only heavy atoms opens the door to inverse-boltzmann approaches for reverse coarse-graining [ ].

in the present study, we report two new data sets describing equilibrium and non-equilibrium protein dynamics produced by physics-based simulations. these data sets fill a much-needed knowledge gap in the protein-learning field, providing a synergistic augmentation to the popular existing data sets used for learning molecular structure [ ]. protein dynamics were represented as time-series data and modeled through a recurrent neural network with lstm cells in the hidden layer. we found that the learning of both data sets was improved when using a kirchhoff decomposition on models with a constant number of hidden layers.
the ability to forecast future structure was shown to depend on the correlation among the recent past structures. specifically, dynamics within the non-equilibrium molecular dynamics simulations were highly correlated, and thus protein dynamics were effectively learned. conversely, the movements of a protein at thermal equilibrium were poorly correlated, making accurate forecasting more difficult. increasing history size improved the prediction accuracy for both data sets, and the lstm outperformed the static baseline when forecasting at higher lead times. overall, lstms provide an exciting tool to model non-equilibrium protein dynamics. virtually all biologically relevant actions occur at non-equilibrium; therefore, these results indicate an exciting advance with far-reaching implications.

on the range of applicability of the reissner-mindlin and kirchhoff-love plate bending models
the protein data bank
pointnet: deep learning on point sets for d classification and segmentation
recurrent neural networks for multivariate time series with missing values
a thorough review on the current advance of neural network structures
openmm : rapid development of high performance algorithms for molecular dynamics
shear viscosity of the hard-sphere fluid via nonequilibrium molecular dynamics
a point set generation network for d object reconstruction from a single image
computational methodologies for real-space structural refinement of large macromolecular complexes
reconstructing potentials of mean force through time series analysis of steered molecular dynamics simulations
cikm md prediction
vmd: visual molecular dynamics
generalized scalable multiple copy algorithms for molecular dynamics simulations in namd
ensemble of multi-headed machine learning architectures for time-series forecasting of healthcare expenditures
boltzmann generators: sampling equilibrium states of many-body systems with deep learning
calculating potentials of mean force from steered molecular dynamics simulations
accelerating molecular simulations of proteins using bayesian inference on weak information
predicting improved protein conformations with a temporal deep recurrent neural network
time series forecasting of petroleum production using deep lstm recurrent networks
financial time series forecasting with deep learning: a systematic literature review
order parameters for macromolecules: application to multiscale simulation
atoms to phenotypes: molecular design principles of molecular dynamics-based refinement and validation for sub- Å cryo-electron microscopy maps
total predicted mhc-i epitope load is inversely associated with mortality from sars-cov- . medrxiv p

key: cord- - igfuaw authors: boonyaratanakornkit, jim; singh, suruchi; weidle, connor; rodarte, justas; bakthavatsalam, ramasamy; perkins, jonathan; stewart-jones, guillaume b.e.; kwong, peter d.; mcguire, andrew t.; pancera, marie; taylor, justin j. title: protective antibodies against human parainfluenza virus type (hpiv ) infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: igfuaw

human parainfluenza virus type iii (hpiv ) is a common respiratory pathogen that afflicts children and can be fatal in vulnerable populations, including the immunocompromised. unfortunately, an effective vaccine or therapeutic is not currently available, resulting in tens of thousands of hospitalizations per year. in an effort to discover a protective antibody against hpiv , we screened the b cell repertoires from peripheral blood, tonsils, or spleen from healthy children and adults. these analyses yielded five monoclonal antibodies that potently neutralized hpiv in vitro. these hpiv neutralizing antibodies targeted two non-overlapping epitopes of the hpiv f protein, with most targeting the apex. importantly, prophylactic administration of one of these antibodies, named pi -e , resulted in potent protection against hpiv infection in cotton rats.
additionally, pi -e could also be used therapeutically to suppress hpiv in immunocompromised animals. these results demonstrate the potential clinical utility of pi -e for the prevention or treatment of hpiv in both immunocompetent and immunocompromised individuals. hpiv is a common cause of respiratory illness in infants and children. over , hospitalizations per year in the us occur for fever or acute respiratory illness due to hpiv . hpiv , like respiratory syncytial virus (rsv), infects early in life and frequently causes severe bronchiolitis and pneumonia in infants under six months of age who are unable to mount a robust antibody response , . hpiv is also an important cause of mortality, morbidity, and health care costs in other vulnerable populations, such as immunocompromised hematopoietic stem cell transplant (hct) recipients . up to a third of hct recipients acquire a respiratory viral infection within six months of transplant [ ] [ ] [ ] [ ] [ ] [ ] [ ] . in up to a third of those patients, the virus progresses from the upper to the lower respiratory tract , . once the virus gains a foothold in the lower tract, little can be done for most viruses beyond supportive care; up to % of patients with lower tract disease die within three months. hpiv is an important cause of serious respiratory viral infections after hct, with a cumulative incidence of % post-transplant at our center , [ ] [ ] [ ] . in the absence of any vaccine or therapy, there is significant need for preventive and therapeutic interventions against hpiv . neutralizing monoclonal antibodies have been correlated with protection against several respiratory viruses, including rsv and influenza [ ] [ ] [ ] [ ] [ ] . the monoclonal antibody palivizumab is a humanized antibody targeting the fusion (f) protein of rsv and was licensed for use as immunoprophylaxis to prevent severe disease in high-risk infants . 
the f protein of rsv is an essential surface glycoprotein and therefore a major neutralizing antibody target. as a class i fusion protein, f mediates viral entry by transitioning between a metastable prefusion (pref) conformation and a stable postfusion (postf) conformation. since pref is the major conformation on infectious virus, antibodies to pref are the most potent at neutralizing virus, whereas antibodies targeting postf generally are not , . similar to rsv, the f protein of hpiv also adopts pref and postf conformations , . hpiv f was recently stabilized in the pref conformation and induced higher serum neutralizing titers than the hpiv postf . to focus upon b cells producing neutralizing antibodies, we modified our assay to sort individual b cells onto irradiated t feeder cells expressing cd l, il- , and il- to allow for higher throughput screening of culture supernatants for neutralization prior to antibody cloning, as described . in general, over half of sorted b cells and % of sorted igd -b cells produced antibody levels detectable by elisa (fig. a) . we applied this assay to stimulate single hpiv pref-binding b cells and excluded igd-expressing cells since these cells would be the least likely to have undergone the somatic hypermutation and affinity maturation necessary for potent neutralization. using this approach, we found that % of igd -hpiv pref-binding b cells sorted from tonsils produced hpiv neutralizing antibodies, as compared to % from the spleen and % from peripheral blood (fig. b) . from these cultures we cloned four additional hpiv neutralizing monoclonal antibodies named pi -a , pi -b , pi -a , and pi -a (fig. c) . of these, pi -a had the highest apparent binding affinity to hpiv pref, and its affinity was comparable to pi -e (fig. d) . the neutralization potency of these antibodies ranged from . to . ng/ml (fig. c) . 
each neutralizing monoclonal antibody used different immunoglobulin heavy and light chain alleles except for pi -a and pi -b , which both utilized the kappa allele - * (supplemental table ). none of the alleles matched those of the previously described hpiv antibody pia . the similarity to germ-line sequences of the variable genes from these neutralizing antibodies ranged from - % (supplemental table ). all of these newly described antibodies bound strongly to the pref conformation without any detectable binding to the postf conformation (fig. d, e) , as expected given the exclusion of b cells binding postf during the sort. in contrast, the previously described antibody pia bound weakly to the postf conformation in addition to strong pref binding (fig. e) . in anticipation of administering these antibodies in vivo, we confirmed that none bound to permeabilized hep- cells (fig. f) , a common assessment of autoreactivity , . we next performed cross-competition binding experiments to gauge the antigenic sites on hpiv pref allowing for neutralization. three of these five new neutralizing monoclonal antibodies (pi -e , -a , and -b ) fully competed with each other and the previously described antibody pia (fig. a) . pi -a also competed with this group, but only partially with pi -e (fig. a) . based on the known binding site of pia , this antigenic site is likely located at the apex of hpiv pref . we propose calling this antigenic site Ø on hpiv pref for consistency, since the apices of rsv and hmpv pref are also called antigenic site Ø , . the fifth neutralizing monoclonal, pi -a , only weakly competed with pi -a and not at all with the others, suggesting the presence of an antigenic site vulnerable to neutralization by antibodies outside of antigenic site Ø (fig. a) . we performed negative stain electron microscopy (nsem) of pi -e fab, pi -a fab, and pi -c fab in complex with hpiv pref (fig. b, c) to confirm the binding location of these antibodies. 
although we could form a complex by size exclusion chromatography (sec) of hpiv pref with pi -c fab, nsem did not show bound pi -c fab molecules. d classifications and d reconstruction indicated that pi -a fab was bound to the side of hpiv pref in a : ratio, confirming that its epitope does not overlap with that of the previously described pia antibody and defining a new site of neutralization on hpiv pref (fig. b) . of note, this site is reminiscent of site v on rsv f . as predicted earlier, d classifications and d reconstruction showed that the pi -e fab bound at the apex of hpiv pref in a : ratio (fab:trimer). however, pi -e fab appears to bind hpiv pref with a different angle of approach compared to pia (fig. c) . we obtained a . Å cryo-em structure of the pi -e fab in complex with hpiv pref (fig. d) . using the previously solved hpiv -pref structure (pdb id mjz) and a . Å structure of pi -e fab that we obtained using x-ray crystallography (fig. d, fig. s ), we were able to fit the coordinates in our low resolution cryo-em map. the structure confirmed the slightly different angle of approach and that all the cdrs interact with the different protomers in a non-symmetrical manner. we superimposed the structure of hpiv pref (root mean square deviation: . Å ) bound to pi -e fv or pia (fig. e ), which confirmed that their epitopes overlapped. additionally, pi -e used its longer cdrl to make quaternary contacts with protomers whereas the cdrh reaches towards the trimer axis and makes quaternary contacts with the three hpiv pref protomers (fig. d, f) . together, our results indicate that the hpiv pref apical antigenic site Ø is a common target of neutralizing antibodies that can be accessed by antibodies using different gene segments and with different angles of approach. 
figure legend: plaque reduction neutralization screen of supernatant from hpiv pref-specific b cells individually sorted and expanded on feeder cells. n= donors in each group with a total of cells from pbmc, cells from tonsils, , cells from spleen. (c) neutralizing titers of hpiv -specific monoclonal antibodies were determined by % plaque reduction neutralization tests on vero cells using gfp-labeled hpiv . (d) apparent affinity (kd) of antibody binding with hpiv pref at . , . , . , and . µm was measured by bli. lod stands for limit of detection. error bars represent standard deviation. r represents the coefficient of determination. (e) penta-his probes were loaded with either the pref or postf conformation of hpiv f. association with each mab was then measured by bli. all measurements are normalized against a negative control antibody. the positive control antibody is a human mab known to bind hpiv postf. (f) antinuclear antibody assay in hep- cells using mabs targeting hpiv pref. binding was detected using a secondary alexa fluor (af )-conjugated goat anti-human antibody. the mab palivizumab was used as a negative control and f as a positive control for autoreactivity. the average fluorescence intensity was calculated from two independent experiments and error bars represent standard deviation. 
figure legend: (a) epitope binning of mabs using the octet system. penta-his probes were coated with his-tagged hpiv pref. the mab listed on the left side of the chart was loaded first onto the coated probe followed by the mab listed on the top of the chart. values represent the level of competition between antibodies for the same binding site on hpiv pref. this is expressed as the percent drop in maximum signal of the top mab in the presence of the left mab compared to the maximum signal of the top mab alone. red boxes represent - % competition for the same binding site, yellow boxes represent - % competition, and white boxes represent - % competition. (b, c) negative stain electron microscopy (nsem) d classifications of hpiv pref in complex with pi -a fab (b) and pi -e fab (c) with d reconstruction. coordinates of hpiv pref trimer (blue, pdb id mjz), trimeric domain gcn (orange, pdb id dme), and crystal structure of pi -e fab (green, this paper) were fitted in the d map. (d) cryo-em structure at . Å resolution of prefusion hpiv f bound to pi -e fv. the complex was most ordered at the core of the hpiv f trimer and at its interface with the pi -e fv. insets show antibody-apex interactions. protomers of the hpiv pref trimer are colored dark blue, light blue, and purple. (e) surface representation of hpiv pref bound to pi -e fv (green) and pia fv (grey, pdb id mjz). (f) sequence alignment of pi -e and pia . alignments with germline (gl) cdrl sequences are also shown with mutations from germline highlighted in red. 
we next investigated the potential clinical utility of pi -e in an animal challenge model of hpiv infection. although the human parainfluenza viruses do not replicate in mice, lower respiratory tract pathology and viral replication can be demonstrated in cotton rats infected intranasally with hpiv , . the cotton rat model was used in the past to predict not only the efficacy of antibody immunoprophylaxis but also the exact dose of palivizumab, mg/kg, that would be effective against rsv in human infants . therefore, we adopted a similar experimental design and injected . - mg/kg of pi -e intramuscularly one day prior to intranasal infection of cotton rats with pfu of hpiv (fig. a) . none of the animals that received pi -e developed significant peribronchiolitis. a trend towards higher histopathologic scores of peribronchiolitis was detected days after infection in control animals that received pbs but this was not statistically significant (fig. b) . consistent with decreased peribronchiolitis, the amount of hpiv detected in the lung was reduced ~ -fold at the lowest tested dose of . mg/kg pi -e and was below the limit of detection in / animals injected with . mg/kg or more (fig. c) . 
more modest reductions in hpiv replication in the nose were also detected (fig. c) , which was expected given the relatively poor ability of igg antibodies to enter this compartment , . together, the data indicate an ec of . mg/kg and an ec of . mg/kg for pi -e -mediated prevention of hpiv in the lungs. since patients receiving cytotoxic therapy for cancer or autoimmune diseases and other immunocompromised groups are at the highest risk for severe disease and mortality due to hpiv infection, we tested the efficacy of pi -e as treatment in immunosuppressed animals. for this we adopted an experimental design similar to that used to model rsv in immunocompromised cotton rats, in which the drug cytoxan is administered to deplete lymphocytes [ ] [ ] [ ] . animals were treated with mg/kg of cytoxan injected every three days for days prior to intranasal infection with pfu of hpiv (fig. a) . five days after infection, ~ pfu/g could be detected in the lungs and nose of control animals that did not receive pi -e (fig. b) . in contrast, viral titers were diminished -fold in the lungs and -fold in the nose when mg/kg of pi -e was injected one day after infection (fig. b) . together, our data indicate that pi -e can both prevent and treat hpiv infection. the ability to isolate neutralizing monoclonal antibodies and to identify their antigenic binding sites has revolutionized our ability to understand, prevent, and treat viral infections, including hiv, ebola, rsv, influenza, and the newly emerged sars-cov- , responsible for the current covid- pandemic [ ] [ ] [ ] . one of the goals of this study was to quantify the frequency of b cells capable of producing neutralizing antibodies against hpiv . we designed our b cell probes and flow cytometry panel to allow the selection of b cells that bind specifically to the pref but not the postf conformation of hpiv f protein. 
even with this selection strategy, we found that the majority of hpiv pref-specific b cells failed to neutralize virus. this is not surprising, since b cells undergo positive selection based on signals that stem from the affinity of binding between their immunoglobulin receptor and cognate antigen, regardless of neutralization , . in the circulating peripheral blood of healthy individuals, the frequency of hpiv neutralizing b cells was only % of hpiv pref-specific b cells. as a result, our original strategy of sorting and directly cloning antibodies from individual hpiv pref-specific b cells from peripheral blood identified predominantly low-affinity naïve b cells, was laborious, expensive, and inefficient, and yielded only a single neutralizing monoclonal antibody. therefore, we switched to a higher-throughput neutralization screening strategy for hpiv based upon an assay developed for hiv . this allowed us to scan thousands of individual b cells from peripheral blood, tonsils, and spleens and select only those that produced neutralizing antibodies against hpiv for subsequent monoclonal antibody cloning. using this method, we confirmed the low frequency of hpiv pref-specific b cells able to produce neutralizing antibodies and isolated four additional potent hpiv neutralizing monoclonal antibodies. many human-derived neutralizing monoclonal antibodies to date are based on b cells found in peripheral blood , , . we decided to compare the frequency of b cells capable of producing neutralizing antibodies in readily accessible secondary lymphoid organs that might be enriched for b cells that had undergone affinity maturation. we therefore sampled human tonsils from children undergoing elective tonsillectomy and human spleens from previously healthy adult deceased organ donors. since virtually all children by the age of three demonstrate serologic evidence of infection by parainfluenza virus, we did not need to screen donors for evidence of previous infection . 
we found that tonsils were significantly enriched with b cells capable of producing hpiv neutralizing monoclonal antibodies. although tonsillectomy is a common procedure and has long been thought to have negligible long-term costs, more recent data suggests tonsillectomy may be associated with increased long-term risks for respiratory infections . human spleens were also enriched for b cells capable of neutralizing hpiv , although the magnitude of enrichment was much lower than in tonsils. the majority of neutralizing antibodies against hpiv appeared to target the apex of the f protein in a similar fashion to neutralizing antibodies against rsv , . the ability of pi -e to bind the apex of hpiv pref in a ratio of fab : trimer is also reminiscent of the binding mode of vrc . , the most potent hiv v v -recognizing antibody isolated to date . pi -e showed high specificity for the prefusion conformation of hpiv f, unlike the previously isolated apical binding monoclonal antibody pia which bound weakly to the post-fusion conformation. the ability of pia to bind the post-fusion conformation was unexpected since the antigenic site at the apex is unique to the prefusion conformation. in the related fusion protein of rsv, the apical antigenic site consists of an unstructured region and an alpha helix which are displaced by more than Å in the post-fusion conformation . it is possible that pia is able to bind weakly to small stretches of linear epitopes found in both the pre-and post-fusion conformations. interestingly, the monoclonal antibodies we isolated all utilized different immunoglobulin heavy and light chain alleles. a similar phenomenon was previously described in which human antibodies targeting the receptor binding site of hemagglutinin were found to arise from nearly unrestricted germ-line origins from multiple donors, and as a result viral resistance to one antibody did not confer resistance to all . 
this suggests a wide variety of evolved solutions to the problem of blocking viral attachment and binding are available in the general population, making this antigenic site an appealing vaccine target . we focused our efforts on determining the in vivo efficacy of pi -e , because it was among the most potent neutralizing antibodies against site Ø. given the in vitro potency of pi -e , we anticipated a low ec . similar to palivizumab for rsv, a dose of at least . mg/kg reduced hpiv levels in the lungs in virtually all animals. additionally, mg/kg of pi -e significantly reduced hpiv replication in the nose, in contrast to palivizumab which failed to suppress rsv replication in the nose even at a dose of mg/kg , . given the ability of pi -e to suppress hpiv replication in the nose and lungs of immunocompromised animals when given after infection, pi -e could play a role in both prophylaxis and therapy against hpiv infections in the hct population. testing the in vivo efficacy of potent hpiv neutralizing antibodies at additional, later timepoints in cotton rats and non-human primates would provide further insights into the therapeutic window. we have recently described a method of engineering b cells using crispr/cas to express palivizumab and conferred protection against rsv in naïve animals by adoptive transfer of these cells . in this revolutionary new age of cellular therapy and immunotherapy against cancer, it is conceivable that b cells could be engineered to produce a variety of protective antibodies against multiple pathogens and transferred along with the stem cell product during transplant as part of treatment for an underlying disease. 
the size of experimental groups is specified in figure legends. peripheral blood was obtained by venipuncture from healthy, hiv-seronegative adult volunteers enrolled in the seattle area control study, which was approved by the fred hutchinson cancer research center institutional review board. 
pbmcs were isolated from whole blood using accuspin system histopaque- (sigma-aldrich). institutional review board approval for studies involving human tonsils was obtained from seattle children's hospital. studies involving human spleens were deemed non-human subjects research since tissue was de-identified, otherwise discarded, and originated from deceased individuals. tissue fragments were passed through a basket screen, centrifuged at × g for minutes, incubated with ack lysis buffer (thermo fisher) for . minutes, resuspended in rpmi (gibco), and passed through a stacked µm and µm cell strainer. cells were resuspended in % dimethylsulfoxide in heatinactivated fetal calf serum (gibco) and cryopreserved in liquid nitrogen before use. f cells (thermo fisher) were cultured in freestyle media (thermo fisher). vero cells (atcc ccl- ), llc-mk cells (atcc ccl- . ), and hep- (atcc ccl- ) were cultured in dmem (gibco) supplemented with % fetal calf serum and u/ml penicillin plus μg/ml streptomycin (gibco). t cd l/il- /il- feeder cells were cultured in dmem supplemented with % fetal calf serum, penicillin and streptomycin, plus . mg/ml geneticin as described . irradiation was performed with , rads. wild-type rhpiv was a recombinant version of strain js (genbank accession number z ) and modified as previously described to express enhanced gfp . virus was cultured on llc-mk cells and subsequently purified by centrifugation in a discontinuous %/ % sucrose gradient with . m hepes and . m mgso (sigma-aldrich) at , × g for min at °c. virus titers were determined by infecting vero cell monolayers in -well plates with serial -fold dilutions of virus, overlaying with dmem containing % methylcellulose (sigma-aldrich), and counting fluorescent plaques using a typhoon scanner at five days postinfection (ge life sciences). expression plasmids for his-tagged hpiv pref and postf antigens are previously described . 
hpiv pref contained the following mutations including two disulfide linkages, q c-l c, i c-g c, a v, and i v . f cells were transfected at a density of cells/ml in freestyle media using mg/ml pei max (polysciences). transfected cells were cultured for days with gentle shaking at °c. supernatant was collected by centrifuging cultures at , × g for minutes followed by filtration through a . µm filter. the clarified supernatant was incubated with ni sepharose beads overnight at °c, followed by washing with wash buffer containing mm tris, mm nacl, and mm imidazole. his-tagged protein was eluted with an elution buffer containing mm tris, mm nacl, and mm imidazole. the purified protein was run over a / superose size-exclusion column (ge life sciences). fractions containing the trimeric hpiv f proteins were pooled and concentrated by centrifugation in an amicon ultrafiltration unit (millipore) with a kda molecular weight cut-off. two units of biotinylated thrombin (millipore) were mixed with each mg of protein overnight to cleave off tags, streptavidin agarose (millipore) was added for another hour to remove thrombin and the cleaved tags, and the mixture was centrifuged through a pvdf filter (millipore) to remove the streptavidin agarose. the concentrated sample was stored in % glycerol at - °c. purified hpiv f was biotinylated using an ez-link sulfo-nhs-lc-biotinylation kit (thermo fisher) using a : . molar ratio of biotin to f. unconjugated biotin was removed by centrifugation using a kda amicon ultra size exclusion column (millipore). to determine the average number of biotin molecules bound to each molecule of f, streptavidin-pe (prozyme) was titrated into a fixed amount of biotinylated f at increasing concentrations and incubated at room temperature for minutes. 
samples were run on an sds-page gel (invitrogen), transferred to nitrocellulose, and incubated with streptavidin-alexa fluor (thermo fisher) at a dilution of : , to determine the point at which there was excess biotin available for the streptavidin-alexa fluor reagent to bind. biotinylated f was mixed with streptavidin-apc at the ratio determined above to fully saturate streptavidin and incubated for min at room temperature. unconjugated f was removed by centrifugation using a k nanosep centrifugal device (pall corporation). apc/dylight tetramers were created by mixing f with streptavidin-apc pre-conjugated with dylight (thermo fisher) following the manufacturer's instructions. on average, apc/dylight contained - dylight molecules per apc. the concentration of each f tetramer was calculated by measuring the absorbance of apc ( nm, extinction coefficient = . µm^-1 cm^-1 ). tetramer enrichment - × frozen pbmcs, - × frozen tonsil cells, or - × frozen spleen cells were thawed into dmem with % fetal calf serum and u/ml penicillin plus µg/ml streptomycin. cells were centrifuged and resuspended in µl of ice-cold facs buffer composed of pbs and % newborn calf serum (thermo fisher). postf apc/dylight conjugated tetramers were added at a final concentration of nm in the presence of % rat and mouse serum (thermo fisher) and incubated at room temperature for min. pref apc tetramers were then added at a final concentration of nm and incubated on ice for min, followed by a ml wash with ice-cold facs buffer. next, μl of anti-apc-conjugated microbeads (miltenyi biotec) were added and incubated on ice for min, after which ml of facs buffer was added and the mixture was passed over a magnetized ls column (miltenyi biotec). the column was washed once with ml ice-cold facs buffer and then removed from the magnetic field and ml ice-cold facs buffer was pushed through the unmagnetized column twice using a plunger to elute the bound cell fraction. 
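as an illustration of the tetramer concentration calculation described above (absorbance of apc divided by its extinction coefficient), the following is a minimal sketch of the beer-lambert relationship. the function name, the 1 cm path length, and the numeric values are assumptions chosen for illustration; they are not the wavelength or extinction coefficient used in this study.

```python
def tetramer_concentration_um(absorbance, extinction_per_um_cm, path_length_cm=1.0):
    """Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l).

    absorbance            -- measured absorbance of the APC fluorochrome
    extinction_per_um_cm  -- extinction coefficient in uM^-1 cm^-1
    path_length_cm        -- cuvette path length (assumed 1 cm here)
    Returns the concentration in uM.
    """
    if extinction_per_um_cm <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (extinction_per_um_cm * path_length_cm)


if __name__ == "__main__":
    # Hypothetical reading and coefficient, for illustration only.
    conc = tetramer_concentration_um(absorbance=0.35, extinction_per_um_cm=0.7)
    print(f"tetramer concentration: {conc:.3f} uM")
```

because each apc-streptavidin complex carries the bound f molecules, the apc concentration obtained this way is taken as the tetramer concentration when calculating staining dilutions.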
cells were incubated in μl of facs buffer containing a cocktail of antibodies for minutes on ice prior to washing and analysis on a facs aria (bd). antibodies included anti-igm fitc (g - , bd), anti-cd buv (sj c , bd), anti-cd bv (ucht , bd), anti-cd bv (m p- , bd), anti-cd bv ( g , bd), anti-cd buv ( h , bd), anti-igd bv (ia - , bd), and a fixable viability dye (tonbo biosciences). absolute counts within each specimen were calculated by adding a known amount of accucheck counting beads (thermo fisher). b cells were individually sorted into either ) empty -well pcr plates and immediately frozen, or ) flat-bottom -well plates containing feeder cells that had been seeded at a density of , cells/well one day prior in µl of imdm media (gibco) containing % fetal calf serum, u/ml penicillin plus µg/ml streptomycin, and . µg/ml amphotericin. b cells sorted onto feeder cells were cultured at °c for days. nunc maxsorp -well plates (thermo fisher) were coated with ng of goat anti-human fab (jackson immunoresearch) for minutes at °c. wells were washed three times with xdpbs and then blocked with xdpbs containing % bovine serum albumin (sigma-aldrich) for one hour at room temperature. antigen coated plates were incubated with culture supernatants for minutes at °c. a standard curve was generated with serial two-fold dilutions of palivizumab. wells were washed three times with xdpbs followed by a one hour incubation with horse radish peroxidaseconjugated goat anti-human total ig at a dilution of : (invitrogen). wells were then washed four times with xdpbs followed by a - minute incubation with tmb substrate (seracare). absorbance was measured at nm using a softmax pro plate reader (molecular devices). the concentration of antibody in each sample was determined by reference to the standard curve and dilution factor. for neutralization screening of culture supernatants, vero cells were seeded in -well flat bottom plates and cultured for hours. 
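the elisa quantification step described above (reading each sample against the palivizumab standard curve and applying the dilution factor) can be sketched as follows. the text does not state how the standard curve was fitted, so this sketch assumes simple log-linear interpolation between neighbouring standards; the function name and the example values are hypothetical.

```python
import math


def concentration_from_standard_curve(od, standards, dilution_factor=1.0):
    """Estimate antibody concentration from an ELISA absorbance reading.

    od              -- absorbance of the sample well
    standards       -- list of (concentration, absorbance) pairs from the
                       two-fold dilution series of the reference antibody
    dilution_factor -- fold dilution of the supernatant before assay
    Returns None when the reading falls outside the standard curve.
    Interpolation is linear in absorbance and logarithmic in concentration
    (an assumption; the source does not specify the curve model).
    """
    pts = sorted(standards, key=lambda p: p[1])  # order by absorbance
    if od < pts[0][1] or od > pts[-1][1]:
        return None
    for (c_lo, od_lo), (c_hi, od_hi) in zip(pts, pts[1:]):
        if od_lo <= od <= od_hi:
            if od_hi == od_lo:
                return c_lo * dilution_factor
            frac = (od - od_lo) / (od_hi - od_lo)
            log_c = math.log(c_lo) + frac * (math.log(c_hi) - math.log(c_lo))
            return math.exp(log_c) * dilution_factor
    return None


if __name__ == "__main__":
    # Hypothetical two-fold standard series: (ng/ml, absorbance).
    curve = [(100.0, 2.0), (50.0, 1.5), (25.0, 1.0), (12.5, 0.5)]
    print(concentration_from_standard_curve(1.25, curve, dilution_factor=2.0))
```

readings outside the range of the standards return `None` rather than an extrapolated value, since extrapolating beyond a standard curve is generally unreliable.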
after days of culture, µl of culture supernatant was mixed with µl of sucrose-purified gfp-hpiv diluted to , plaque forming units (pfu)/ml for one hour at °c. vero cells were then incubated with µl of the supernatant/virus mixture for one hour at °c to allow viral adsorption. next, each well was overlaid with µl dmem containing % methylcellulose. fluorescent plaques were counted at five days post-infection using a typhoon imager. titers of hpiv -specific monoclonal antibodies were determined by a % plaque reduction neutralization test (prnt ). vero cells were seeded in -well plates and cultured for hours. monoclonal antibodies were serially diluted : in µl dmem and mixed with µl of sucrose-purified hpiv diluted to , pfu/ml for one hour at °c. vero cells were incubated with µl of the antibody/virus mixture for one hour at °c to allow viral adsorption. each well was then overlaid with µl dmem containing % methylcellulose. fluorescent plaques were counted at five days post-infection using a typhoon imager. prnt titers were calculated by linear regression analysis. for individual b cells sorted and frozen into empty -well pcr plates, reverse transcription (rt) was directly performed after thawing plates using superscript iv (thermo fisher) as previously described , . briefly, µl rt reaction mix consisting of µl of µm random hexamers (thermo fisher), . µl of mm deoxyribonucleotide triphosphates (dntps; thermo fisher), µl ( u) superscript iv rt, . µl ( u) rnaseout (thermo fisher), . µl of % igepal (sigma-aldrich), and µl rnase free water was added to each well containing a single sorted b cell and incubated at °c for hour. for individual b cells sorted onto feeder cells, supernatant was removed after days of culture, plates were immediately frozen on dry ice, stored at - °c, thawed, and rna was extracted using the rneasy micro kit (qiagen). the entire eluate from the rna extraction was used instead of water in the rt reaction. 
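the prnt titer calculation mentioned above ("calculated by linear regression analysis") is not spelled out in the text. one common way to do it, shown here as a sketch under stated assumptions, is to regress percent plaque reduction on log10 antibody concentration and solve for the concentration at the chosen reduction threshold. the function name, the choice of log10 concentration as the regressor, and the example numbers are all assumptions.

```python
import math


def prnt_titer(concentrations, plaque_counts, virus_only_count, threshold_pct):
    """Concentration giving `threshold_pct` plaque reduction.

    concentrations   -- antibody concentrations of the dilution series
    plaque_counts    -- plaques counted at each concentration
    virus_only_count -- plaques in the no-antibody control wells
    threshold_pct    -- percent reduction that defines the titer
    Fits percent reduction = slope * log10(concentration) + intercept by
    ordinary least squares, then inverts the line at the threshold.
    """
    if len(concentrations) != len(plaque_counts) or len(concentrations) < 2:
        raise ValueError("need at least two matched dilution points")
    xs = [math.log10(c) for c in concentrations]
    ys = [100.0 * (1.0 - n / virus_only_count) for n in plaque_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("dilution series shows no dose response")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** ((threshold_pct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical series: 20%, 50%, and 80% reduction at 1, 10, 100 ng/ml.
    titer = prnt_titer([1.0, 10.0, 100.0], [80, 50, 20],
                       virus_only_count=100, threshold_pct=50.0)
    print(f"titer: {titer:.2f} ng/ml")
```

in practice only the points in the roughly linear part of the dose-response curve would be included in the fit; points at the upper and lower plateaus would bias the slope.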
following rt, µl of cdna was added to µl pcr reaction mix so that the final reaction contained . µl ( . u) hotstartaq polymerase (qiagen), . µl of µm ′ reverse primers, . µl of µm ′ forward primers, . µl of mm dntps, . µl of x buffer (qiagen), and . µl of water. the pcr program was cycles of °c for s, °c for s, and °c for s, followed by °c for min for heavy and kappa light chains. the pcr program was cycles of °c for s, °c for s, and °c for s, followed by °c for min for lambda light chains. after the first round of pcr, µl of the pcr product was added to µl of the second-round pcr reaction so that the final reaction contained . µl ( . u) hotstartaq polymerase, . µl of µm ′ reverse primers, . µl of µm ′ forward primers, . µl of mm dntps, . µl x buffer, and . µl of water. pcr programs were the same as the first round of pcr. μl of the pcr product was run on an agarose gel to confirm the presence of a ∼ -bp heavy chain band or -bp light chain band. μl from the pcr reactions showing the presence of heavy or light chain amplicons was mixed with µl of exosap-it (thermo fisher) and incubated at °c for min followed by °c for min to hydrolyze excess primers and nucleotides. hydrolyzed second-round pcr products were sequenced by genewiz with the respective reverse primer used in the nd round pcr, and sequences were analyzed using imgt/v-quest to identify v, d, and j gene segments. paired heavy chain vdj and light chain vj sequences were cloned into ptt -derived expression vectors containing the human igg , igk, or igl constant regions using in-fusion cloning (clontech) as previously described . monoclonal antibody production secretory igg was produced by co-transfecting f cells at a density of cells/ml with the paired heavy and light chain expression plasmids at a ratio of : in freestyle media using mg/ml pei max. transfected cells were cultured for days with gentle shaking at °c. 
supernatant was collected by centrifuging cultures at , × g for minutes followed by filtration through a . µm filter. clarified supernatants were then incubated with protein a agarose (thermo scientific) followed by washing with igg binding buffer (thermo scientific). antibodies were eluted with igg elution buffer (thermo scientific) into a neutralization buffer containing m tris-base ph . . purified antibody was concentrated and buffer exchanged into xdbps using an amicon ultrafiltration unit with a kda molecular weight cut-off. bli assays were performed on the octet.red instrument (fortebio) at room temperature with shaking at rpm. anti-human igg capture sensors (fortebio) were loaded in kinetics buffer (pbs with . % bsa, . % tween , and . % nan , ph . ) containing µg/ml purified monoclonal antibody for s. after loading, the baseline signal was recorded for s in kinetics buffer. the sensors were then immersed in kinetics buffer containing µm purified hpiv f for a s association step followed by immersion in kinetics buffer for an additional s dissociation phase. the maximum response was determined by averaging the nanometer shift over the last s of the association step after subtracting the background signal from each analyte-containing well using a negative control monoclonal antibody at each time point. curve fitting was performed using a : binding model and fortebio data analysis software. for competitive binding assays, penta-his capture sensors (fortebio) were loaded in kinetics buffer containing µm his-tagged hpiv f for s. after loading, the baseline signal was recorded for s in kinetics buffer. the sensors were then immersed for s in kinetics buffer containing µg/ml of the first antibody followed by immersion for another s in kinetics buffer containing µg/ml of the second antibody. 
percent competition was determined by dividing the maximum increase in signal of the second antibody in the presence of the first antibody by the maximum signal of the second antibody alone. hep- cells were seeded into -well plates at a density of , cells/well one day prior to fixation with % acetone and % methanol for minutes at - °c. cells were then permeabilized and blocked with xdpbs containing % triton x- (sigma-aldrich) and % bovine serum albumin for minutes at room temperature. µl of each monoclonal antibody at . mg/ml was added for minutes at room temperature. the f positive control was obtained from the nih aids reagent program. wells were then washed four times in xdpbs followed by incubation with goat anti-human igg alexa fluor (thermo fisher) at a dilution of : in xdpbs for minutes at room temperature in the dark. after washing four times with x dpbs, images were acquired using the evos cell imaging system (thermo fisher). fab preparation pi -e , pi -c , and pi -a fab were produced by incubating each mg of igg with µg of lysc (new england biolabs) overnight at °c followed by incubating with protein a for hour at room temperature. the mixture was then centrifuged through a pvdf filter, concentrated in pbs with a kda amicon ultra size exclusion column, and purified further by sec using superdex (ge healthcare life sciences) in mm hepes and mm nacl. crystallization, data collection, and refinement crystals of pi -e fab were obtained using a nt dispensing robot (formulatrix), and screening was done using commercially available screens (rigaku wizard precipitant synergy block # , molecular dimensions proplex screen ht- , hampton research crystal screen ht) by mixing . µl/ . µl (protein/reservoir) by the vapor diffusion method. crystals used for diffraction data were grown in the following conditions in solution containing . m ammonium phosphate monobasic, . m tris, ph . , and % (+/−) -methyl- , -pentanediol. 
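the percent competition calculation from the competitive binding assay described above can be sketched as follows. the ratio defined in the methods (signal of the second antibody in the presence of the first, divided by its signal alone) is the fraction of binding that remains; this sketch assumes, following the wording of the epitope binning figure legend, that competition is reported as the percent drop in that signal. function names and example values are hypothetical.

```python
def percent_competition(signal_with_first, signal_alone):
    """Percent drop in the second antibody's maximum BLI signal.

    signal_with_first -- maximum signal increase of the second antibody
                         when the first antibody is already bound
    signal_alone      -- maximum signal of the second antibody alone
    """
    if signal_alone <= 0:
        raise ValueError("signal of the second antibody alone must be positive")
    residual_fraction = signal_with_first / signal_alone
    return 100.0 * (1.0 - residual_fraction)


def competition_matrix(paired_signals, alone_signals):
    """Build a {first: {second: percent competition}} table.

    paired_signals -- {(first, second): signal of `second` after `first`}
    alone_signals  -- {second: signal of `second` alone}
    """
    table = {}
    for (first, second), signal in paired_signals.items():
        table.setdefault(first, {})[second] = percent_competition(
            signal, alone_signals[second])
    return table


if __name__ == "__main__":
    # Hypothetical nanometer shifts for two antibodies, mabA and mabB.
    alone = {"mabA": 1.0, "mabB": 0.8}
    paired = {("mabA", "mabB"): 0.08, ("mabB", "mabA"): 0.9}
    for first, row in competition_matrix(paired, alone).items():
        for second, pct in row.items():
            print(f"{first} then {second}: {pct:.0f}% competition")
```

the table is not necessarily symmetric, because loading order can affect how much of the second antibody still binds.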
crystals were cryoprotected in parabar oil (hampton). crystals diffracted to . Å (supplemental table ). data were collected on the fred hutch x-ray home source and processed using hkl . the structure was solved by molecular replacement using phaser in ccp (collaborative computational project, number ), and the fab portion of pdb accession number mft was used as a search model , . iterative rounds of structure building and refinement were performed in coot and phenix . structural figures were made with pymol and chimera . negative stain electron microscopy complex of pi -e fab + hpiv pref was formed by mixing both components at a : molar ratio and incubating overnight at °c. complexes were purified by sec using superdex in mm hepes and mm nacl at ph . . negative staining was performed using formvar/carbon grids (electron microscopy sciences) of mesh size. µl of hpiv -pi -e fab complex protein was negatively stained at a concentration of µg/ml on the grids using % uranyl formate staining solution. data were collected using a fei tecnai t electron microscope operating at kev equipped with a gatan ultrascan ccd camera. the images were collected using an electron dose of . e − Å − and a magnification of , × that resulted in a pixel size of . Å. the defocus range used was - . µm to - . µm. the data were collected using the leginon interface . image processing was carried out using cistem . the final reconstruction was performed using ~ , unbinned particles, refining for iterations with c symmetry applied. pi -e fab (this paper), hpiv (pdb id mjz) and gcn (pdb id dme) fitting was carried out using the fit function in chimera . cryo-em data collection, processing, and model fitting the hpiv pref protein was incubated with . molar excess of pi -e fab and the complex was run on sec in mm hepes ( . ), mm nacl buffer. the sample was incubated in .
mm dn-dodecyl β-d-maltoside (ddm) to prevent aggregation during vitrification, followed by vitrification using a chameleon; spt labtech, formerly ttp labtech [ ] [ ] [ ] . the grids used were nanowire self-blotting grids. the sample was dispensed onto the nanowire grids using a picoliter piezo dispensing head. a total of ~ nl sample, dispensed as pl droplets, was applied in a stripe across each grid, followed by a pause of a few milliseconds, before the grid was plunged into liquid ethane. the grids were imaged using a titan krios g tem (fei) at -kv accelerating voltage and liquid nitrogen temperature. the images were recorded (fei) on a falcon ec direct electron detector (fei) operated in electron-counting mode using serial em . the collected frames were motion corrected and ctf estimated using cryosparc (punjani et al., ) . particles were picked and extracted and further d classification, ab initio reconstruction, heterogeneous refinement and final map refinement was performed using cryosparc . the local resolution of the refined map was estimated using cryosparc and ucsf chimera. the model of hpiv pref for fitting into the cryo-em map was the same as modelled and used while generating pdb mjz . the crystal structure of pi -e fab was used for fitting into the cryo-em map. the fitting of the hpiv pref trimer and fab coordinates to the cryo-em reconstructed maps were performed using ucsf chimera . figures were generated in ucsf chimera. map-fitting cross correlations were calculated using the fit-in-map feature in ucsf chimera. cotton rat challenge experiments were performed by sigmovir. animals in groups of n= - were infected intranasally with µl of pfu hpiv . this sample size is consistent with previously published experiments testing the efficacy of rsv monoclonal antibodies in the cotton rat model , , , . monoclonal antibody was either administered intramuscularly one day prior to infection or one day after infection. 
cyclophosphamide ( mg/kg) was administered intramuscularly at days prior to infection and readministered every three days for the duration of experiments involving immunosuppression. nasal turbinates were removed for viral titration by plaque assay at day four post-infection. lungs were removed for viral titration by plaque assay and histopathology at day four post-infection. lung and nose homogenates were clarified by centrifugation in emem (gibco). confluent hep- monolayers were inoculated in duplicate with diluted homogenates in -well plates. after incubating for two hours at °c, wells were overlaid with . % methylcellulose. after four days, the cells were fixed and stained with . % crystal violet for one hour, and plaques were counted to determine titers as pfu per gram of tissue. histopathology was performed by inflating dissected lungs with % formalin, immersing in % formalin, embedding in paraffin, sectioning, and staining with hematoxylin and eosin. slides were scored blind on a - severity scale. the scores are subsequently converted to a - % histopathology scale as previously described , . statistical analysis was performed using graphpad prism . pairwise statistical comparisons were performed using unpaired two-tailed t-test. p < . was considered statistically significant. data points from individual samples are displayed. sequencing and structural data that support the findings of this study have been deposited in the protein data bank (pdb) and are accessible through pdb accession number wrp. all other relevant data are available from the corresponding author on request. expertise with cotton rat experiments; steve voght and jessica schembri for proof-reading the manuscript; rebecca putnam, paula culver, russell eberts and laura yates for administrative support; the taylor lab for helpful discussions. 
j.b. conceived the study, designed and conducted the experiments, analyzed the data, and wrote the manuscript. s.s., c.w., j.r. and m.p coordinated and performed the structural analysis. a.m. provided t cd l/il- /il- cells. r.b. provided spleens. j.p. provided tonsils. g.b.e.s-.j and p.d.k. provided pref-stabilized hpiv f. j.t. conceived the study, designed experiments, analyzed the data, and edited the manuscript. work described in this manuscript has been included in a provisional patent application. the authors have no other competing financial interests in relation to the work described. resolutions are reported according to the fsc . gold-standard criterion.
key: cord- -ka resa authors: aguilar, césar; verdel-aranda, karina; ramos-aboites, hilda e.; morales, marco antonio; licona-cassani, cuauhtémoc; barona-gómez, francisco title: convergent evolution of streptomyces protease inhibitors involving a trna-mediated condensation-minus nrps date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ka resa small peptide aldehydes (spas) with protease inhibitory activity are natural products typically synthesized by nonribosomal peptide synthetases (nrps). spas are widely used in biotechnology and as therapeutic agents; they are physiologically relevant and regulate the development of their natural hosts. during genome evolutionary analysis of streptomyces lividans we identified an nrps-like biosynthetic gene cluster (bgc) that lacked a condensation (c) domain but included a trna-utilizing enzyme (true) belonging to the leucyl/phenylalanyl (l/f) transferase family. this system was predicted to direct the synthesis of a novel spa with protease inhibitory activity, called livipeptin. following genome mining and phylogenomic analyses we confirmed the presence of trues within diverse streptomyces genomes, including fusions with a c-minus nrps-like protein. we further demonstrate functional cooperation between these enzymes and provide the biosynthetic rules for the synthesis of livipeptin, expanding the known universe of acetyl-leu/phe-arginal spas. the l/f-transferase c-minus nrps productive interaction was shown to be trna-dependent after semisynthetic assays in the presence of rnase, which contrasts with leupeptin, an acetyl-leu-arginal spa that we show to be produced by streptomyces roseus atcc via a true-minus bgc with multiple complete nrpss. thus, livipeptin and leupeptin are the result of convergent evolution, which has driven the appearance of unprecedented biosynthetic logics directing the synthesis of protease inhibitors thought to be at the core of streptomyces colony biology.
our results pave the way for understanding this streptomyces trait, as well as for the discovery of novel natural products following evolutionary genome mining approaches. abstract importance convergent evolution in microbiology is believed to be highly recurrent, yet examples that have been comprehensively characterized are scarce. protease inhibition by small peptide aldehydes is at the core of many microbiological processes, both within the cell and during colony development, and in microbial ecology. here we report the biosynthetic foundations of leupeptin, the main streptomyces protease inhibitor, and of livipeptin, a protease inhibitor produced by streptomyces lividans. although these peptides belong to the same chemical class, here we show that their biosynthetic routes result from convergent evolution, as they involve unrelated biosynthetic mechanisms, including the recruitment of a trna-utilizing enzyme that functionally replaces the condensation domain of a nonribosomal peptide synthetase during livipeptin biosynthesis. thus, these results pave the way for understanding streptomyces protease inhibitors as a trait and provide unprecedented knowledge for genome mining of natural products and synthetic biology where protease inhibition is desirable. small peptide aldehydes (spas) are metabolites with protease inhibitory activity. their production has been reported in several bacterial species belonging to the phyla actinobacteria, cyanobacteria and firmicutes, as well as in fungal species belonging to the families aspergillaceae and apiosporaceae (sabotič and kos ) .
the molecular weight of spas ranges between and da, and they are characterized by unique chemical features, including (i) n-terminal groups capped with acyl groups or with ureido-amino acid groups, giving rise to acylated or aminoalkyl ends with a terminal carboxylic group; and (ii) a terminal aldehyde group derived from the carboxyl terminal modification of the peptide chain by a reductive process (bullock et al. ) . biosynthetically, spas have been shown to be produced by nonribosomal peptide synthetases (nrps) with a reductase (r) domain, which mediates the release of the nascent peptide and, at the same time, the formation of the aldehyde group. these reactions lead to the chemical warhead by which spas covalently interact with the active-site residues of serine or cysteine proteases, giving rise to hemiacetals or hemithioacetals, and irreversibly inhibiting protease enzymatic activity (brayer et al. ; wlodawer et al. ) . based on their functional groups, spas can be divided into two sub-classes: (i) those with a terminal group protected by an acyl group, including flavopeptin, tyrostatin, tyropeptin, nerfilin, strepin, leupeptin, thiolstatin, acetyl-leu-arginal and bacithrocins a-d (figure ); and (ii) those in which the n-terminal group binds to an ureido moiety that is attached to an amino acid by means of an amide linkage, such as chymostatin, mapi, ge , antipain and elastatinal (umezawa et al. ; suda et al. ; tatsuta et al. ; watanabe et al. ; stefanelli et al. ; murao and watanabe ) . the chemical configuration of spas allows alteration of the peptide chain, wherein the ureido group acts as an adapter that changes the order of the peptide, from n-terminal to c-terminal into c-terminal to c-terminal, making them distinct from traditional ribosomal peptides. amongst spas, the most widely used is the acetyl-leu/phe-arginal metabolite leupeptin, which was discovered by umezawa and co-workers more than fifty years ago (kondo et al. ).
yet, not much is known about the genetic basis and concomitant mechanisms driving the chemical structural diversity and biomolecular specific activities of leupeptin, and overall, of spas. the predicted biosynthetic system includes (i) an nrps-like protein, lacking both canonical condensation (c) and thioesterase (te) domains, but with one adenylation (a) domain, a peptidyl carrier protein (pcp) and a reductase (r) domain (sli gene); (ii) a homolog of a leucyl/phenylalanyl (l/f) trna transferase (sli gene), a trna-utilizing enzyme hitherto never implicated in natural products biosynthesis; and (iii) an n-acyltransferase (sli gene). these enzymes are located within the slp linear plasmid, a mobile genetic element of s. lividans absent from the industrial strain tk (cruz-morales et al. ). our previous proposal implicated these genes in the synthesis of an n-acylated leu -arginal dipeptide, which we called livipeptin. since this prediction suggests common structural features with previously described spas, such as leupeptin (kondo et al. ) , thiolstatin (murao et al. ; kamiyama et al. ) , bacithrocins (kamiyama et al. ) and acetyl-leu-arginal (kondo et al. ) , it was hypothesized that livipeptin could have protease inhibitory activity. here, we characterize the putative livipeptin bgc of s. lividans following a combined phylogenomics, genome mining and synthetic biology approach. as predicted, our results show that this unprecedented biosynthetic system directs the synthesis of livipeptin, which shows strong protease inhibitory activity. livipeptin was found to be chemically related to bacithrocins and thiolstatin, reported previously to be produced by brevibacillus laterosporus (kamiyama et al. ) ; and to leupeptin, whose bgc in streptomyces is also identified herein.
interestingly, leupeptin and livipeptin were shown to be synthesized by unrelated pathways, demonstrating convergent evolution within streptomyces and with other taxonomically unrelated leupeptin-producing proteobacteria (li et al. ) . genetic and semisynthetic chemical experiments in the presence of rnase, as done previously for other trues (ortega and van der donk ) , confirmed the involvement during livipeptin biosynthesis of an l/f transferase, an enzyme typically involved in protein tagging during proteolysis (n-end rule), an essential process carried out by many organisms that regulates the half-life of proteins and morphological differentiation (leibowitz and soffer ; varshavsky ) . we anticipate that the discovery of this true, which functionally interacts with a c-minus nrps, will pave the way for exploiting this unprecedented biosynthetic logic to uncover novel natural products, and will assist in the development of auxiliary synthetic biology tools to inhibit proteolysis (cruz-morales et al. a & b). similar implications are envisaged for leupeptin once its bgc is fully characterized. moreover, our results are the first step towards investigating the evolution and role of protease inhibitors during streptomyces development, colony biology and ecology, an old hypothesis that remains to be experimentally tested (chater et al. ). the presence of an l/f transferase homolog in the livipeptin bgc suggested that this enzyme family could have been recruited for the synthesis of a natural product (np). this possibility encouraged us to mine for l/f transferases in the context of np biosynthesis, with an emphasis on actinobacteria, as close homologs of sli linked to sli could not be found beyond this phylum. thus, combined evomining (sélem-mojica et al. ) and corason (navarro-muñoz et al. ) phylogenomic analyses of actinobacterial l/f transferases allowed us to identify l/f transferase homologs within , good quality and well annotated actinobacterial genomes.
remarkably, despite the fact that l/f transferases are known to play a central role as housekeeping enzymes involved in proteolytic metabolism (mogk et al. ) , only % of the actinobacterial genomes investigated include a homolog of this gene, suggesting that these organisms may have alternative proteolytic tagging strategies, such as the ssra (tmrna) tagging system (braud et al. ). both the evomining and corason output revealed two discrete clades, consistent with one function possibly devoted to housekeeping metabolism, and a second function predicted to be involved in np biosynthesis. the large branch includes enzymes from diverse actinobacteria species, but without a conserved gene neighborhood (supplementary figure s ) , supporting a single-enzyme housekeeping biochemical function. the second clade is less populated yet shows a larger degree of gene order conservation amongst its entries. these enzymes are mostly from the genus streptomyces, but also from nocardiopsis ( ), kitasatospora ( ), streptacidiphilus ( ) and actinopolyspora ( ), with a frankia non-conserved entry at its root (figures a and b) . this so-called small streptomyces conserved clade consists of l/f transferases probably recruited into np biosynthesis, as revealed by evomining in the first instance, and independently confirmed with the use of antismash (blin et al. ). these analyses also revealed potentially translational fusions between the l/f transferase true and the c-minus nrps-like genes (shown with an asterisk, figure b ). as in s. lividans, the latter were always found to encode for an adenylation domain predicted to have specificity towards an arginine residue, a peptidyl carrier protein (pcp) and a reductase (r) domain. the domain organization confirmed above resembles that of a canonical nrps involved in the synthesis of other protease inhibitors (winn et al. 
) , but with the l/f transferase replacing (and thus possibly playing a similar role to) the expected condensation domain needed for formation of an amide bond. the r domain could be functionally equivalent to that in flavopeptin, a streptomyces spa of a different class, whose biosynthesis involves an nrps with an r domain responsible for reductive release of the nascent peptide (chen et al. ) . within the small streptomyces sub-clade, the livipeptin bgc branch includes hits from four other streptomyces species, and one from actinopolyspora erythreae jpmv. yet, although these species are from different genera, their bgcs share high sequence similarity and conserved gene order. this includes two other potentially biosynthetic elements: an o-methyltransferase and a butyryl-coa dehydrogenase (sli and sli in s. lividans). altogether, these observations support our previous prediction related to the synthesis of the putative protease inhibitor livipeptin in s. lividans (cruz-morales et al. ) , but also include leucine in addition to phenylalanine as an alternative biosynthetic precursor ( figure a ). owing to the predicted chemical structure of livipeptin we hypothesized it may chemically classify as an acetyl-leu-arginal, a class that includes other related molecules, the bacithrocins and thiolstatin, previously isolated from brevibacillus laterosporus (kamiyama et al. ) , and leupeptin (kondo et al. ) (figure ) . therefore, using the l/f transferase as a beacon, we mined the sixty-one brevibacillus spp. genome sequences available at the time of our analyses, without positive results. likewise, after genome sequencing and mining of the leupeptin-producing strain s. roseus atcc (pridham et al. ) we were unable to find evidence of a homologous livipeptin bgc. instead, we identified a bgc rich in nrpss with a domains specific to diverse amino acids, such as l-threonine and l-serine (leup ), l-isoleucine (leup ) and l-ornithine (leup ).
additionally, homologs of argininosuccinate lyase, cysteine synthase, and threonine kinase, previously implicated in the biosynthesis of diaminobutyric acid (daba), were also found ( figure b and supplementary figure s ). although we hypothesized this bgc to be implicated in the synthesis of leupeptin, it is unclear how exactly its biosynthesis proceeds. these observations suggest that despite their chemical similarities, leupeptin and livipeptin bgcs are evolutionarily unrelated, as they follow different biosynthetic logics for their synthesis. while for leupeptin further analysis is needed for postulating a complete nrps biosynthetic pathway, including its biosynthetic precursors (figure b ), for livipeptin we propose a biosynthetic mechanism involving a small assembly line that catalyzes amide bond formation between arginine (a domain encoded by the nrps-like gene) and phenylalanine or leucine residues provided by their cognate aminoacyl-trnas as substrates (true of the l/f transferase family). a reductive release mechanism leading to the aldehyde moiety of livipeptin would also be part of this pathway. the resulting dipeptide aldehyde may undergo acetylation catalyzed by an n-acetyltransferase (sli ). these three enzymes may represent the minimal biosynthetic core necessary to synthesize a metabolite with protease inhibitory activity. however, additional methylation (sli ) and further dehydrogenation (sli ) could also be in place ( figure a ). in the absence of transcriptional regulatory genes from both bgcs, how these enzymes potentially interact to provide a regulatory mechanism controlling protease inhibitory activity, as previously suggested in streptomyces species (kim and lee ; kim and lee ) , remains unknown. to identify the livipeptin metabolite(s) potentially produced by sli - , proposed to have protease inhibitory activity, an s. lividans mutant was constructed (s. lividans Δlvp, sli - minus; table ). s.
lividans strains (wt) and Δlvp (mutant) grown in modified r medium were used for the identification of livipeptin after comparative hplc analysis of the aqueous extracts of the resulting cultures. these conditions were suboptimal for comparative chromatographic analysis, as they resulted in far too many metabolites, but only one differential peak eluting at a retention time of . min with a distinctive absorption at nm. the compound associated with this peak could be isolated after extensive hplc optimization and fractionation ( figure a) , and high-resolution ms analysis of the isolated compound(s) showed that this peak corresponds to a metabolite with an m/z of . (supplementary figure s ) . further experiments with an ms ion trap targeted toward this mass reproducibly led to a distinctive fragmentation that includes a derived ion with an m/z of ( figure b ), equivalent to loss of a water molecule, in addition to ions with m/z values of and ( figure c , see also methods). the abovementioned compound is close to the theoretical molecular formula of thiolstatin (c n o h ), but only if incorporation of l-phenylalanine over l-leucine, together with l-arginine, is considered. indeed, the ms data obtained coincide with previously reported data for other acetyl-leu-arginal metabolites (kondo et al. ) , the bacithrocins and thiolstatin produced by brevibacillus laterosporus (kamiyama et al. ) . moreover, although more distantly, these data confirm the expected chemical connection via leucine residues between livipeptin and leupeptin ( and fragments, see table ). therefore, we concluded that livipeptin is equivalent to thiolstatin and bacithrocin d, despite our inability to identify a homologous locus in the currently available brevibacillus genome sequences. this could be due to the lack of taxonomic resolution and detailed information about the brevibacillus laterosporus strain (kamiyama et al.
) , leading to misclassification of the original thiolstatin- and bacithrocin-producing strain(s). after genome mining analyses (data not shown) we favor this explanation rather than the existence of an analogous livipeptin bgc in species belonging to brevibacillus, although a different biosynthetic route yet to be identified cannot be ruled out. in order to identify the leupeptin bgc, we first confirmed by hplc-ms the ability of s. roseus atcc to produce this metabolite. the metabolite found was consistent with the leupeptin standard from a commercial source. we then sequenced and mined the genome of this organism, which consists of . mb encoding , orfs and putative bgcs. as mentioned previously, our efforts failed to identify a livipeptin bgc in this streptomycete. yet, we identified alternative bgcs that could account for the synthesis of leupeptin (cruz-morales et al. ; cruz-morales et al. ) . a combined evomining, corason and antismash analysis gave rise to the hypothesis that a bgc with five nrpss and a diaminobutyric acid (daba) biosynthetic cassette (supplementary figure s ) could direct the synthesis of leupeptin. full functional characterization of this bgc is beyond the scope of this report, but to confirm the involvement of this locus in leupeptin biosynthesis we constructed an insertional mutant by targeting the leupa gene with a suicidal plasmid, leading to a strain termed s. roseus leup (see table and methods). this strain was unable to synthesize leupeptin when compared with the wild type parental atcc strain. moreover, heterologous expression of pesac -a constructs (tocchetti et al. ) bearing the putative leupeptin bgc in e. coli led to production of leupeptin. the ms fragmentation pattern of isolated leupeptin coincides with a commercial standard and with some fragments of livipeptin spectral data ( and , see table and supplementary figure s ).
our results therefore establish a chemical relationship between livipeptin and leupeptin as acetyl-leu/phe-arginal metabolites, despite involving distinct biosynthetic pathways, and potentially different precursors. while leupeptin is proposed to be synthesized by nine genes (including five nrpss and a subcluster potentially involved in the synthesis of the non-proteinogenic amino acid daba), livipeptin is produced by a c-minus nrps-like protein functionally linked with a true l/f transferase, plus an acyltransferase. on one hand, the identification of distinct biosynthetic pathways to produce very similar nps highlights the increasing versatility observed among nrps-derived natural products in terms of their biosynthetic logics and enzyme interactions within multifunctional protein templates. on the other hand, the occurrence of convergent evolution in natural products biosynthesis is a recurring theme (fischbach ), although its significance and rate remain to be determined (chevrette et al. ). it is nevertheless tempting to speculate that the degree of convergent evolution correlates with chemical scaffolds that contribute towards the fitness of the host organisms, as previously suggested for protease inhibitors (chater et al. ; guo et al. ; li et al. ). to further characterize livipeptin, and to be able to determine its presumed protease inhibitory activity, we used commercial leupeptin and antipain as positive controls against trypsin and papain, respectively. for leupeptin, three different sources with identical results were used: (i) commercial standard produced after bacterial fermentations, (ii) s. roseus atcc and (iii) e. coli cultures expressing pesac -a constructs (tocchetti et al. ), namely pesac_ , harboring the presumed leupeptin bgc (table ). for livipeptin, we opted for the use of synthetic sli - genes optimized for e. coli and their expression from the pask vector (adams et al. ). the resulting expression plasmid, plvp, was used to transform e. 
coli c . the resulting strain, c lvp, was cultivated in minimal medium. the conditions used for these experiments allowed us to easily perform a chromatographic comparative analysis of aqueous extracts, without the residual metabolites found in the s. lividans native system. as above, comparison of the hplc profiles of e. coli expressing plvp and e. coli bearing the empty plasmid pask showed a differential peak at . min absorbing at nm (figure a). this metabolite shows up in the e. coli cultures with less background than that seen in the s. lividans samples (figures b and c). yet, these e. coli data also failed to identify the presumed l-leucine livipeptin analog with a mass of . (m/z), but did rule out the wt s. lividans-specific mass (m/z) as related to livipeptin, given that it could not be detected within the e. coli plvp spectra. the possibility that this mass corresponds to a larger version of livipeptin was further ruled out after inspection of the ms spectra of the (m/z) metabolite (supplementary figure s ). thus, although these data do not confirm that all three sli - genes are needed for synthesis of livipeptin, they unequivocally establish a link between the lvp genes sli - and a putative spa with protease inhibitory activity. previous data suggest that thiolstatin has weak inhibitory activity against serine proteases, such as trypsin, but potent proteolytic inhibitory activity towards cysteine proteases, such as papain. these observations explain the use of the prefix 'thiol' in its name (murao et al. ). unfortunately, the absence of thiolstatin or bacithrocin standards hampered our ability to compare the activity of these metabolites with that of livipeptin. thus, we focused on hplc-purified products from s. lividans and e. coli c plvp cultures. the results obtained from these experiments are included in table and shown in figure a . 
although strong inhibitory activity equivalent to leupeptin and antipain could be found for the hplc-purified metabolite(s) from the two experimental sources investigated, the material obtained from the e. coli heterologous system showed less activity, especially towards papain. this may contradict early data on thiolstatin activity (murao et al. ; kamiyama et al. ), but given that the overall conditions of these experiments are not equivalent, this is an unfair comparison and we cannot explain this difference. interestingly, as can be seen in figure , the extracts from s. lividans have a larger metabolite diversity than those from e. coli. similar to leupeptin (suzukake et al. ), under the rich streptomyces growth conditions, livipeptin could be produced and co-purified as a molecular cluster, leading to synergistic protease inhibitory activity. a similar situation has been previously shown for other biomolecular activities of natural products (gutiérrez-garcía et al. ). indeed, commercial high quality leupeptin isolated from s. roseus cultures includes several related species, which contrasts with the less active synthetic leupeptin (mcconnell et al. ). consistent with this, the hplc-purified metabolite(s) obtained from the s. lividans mutant Δlvp show some residual inhibitory activity against both proteases (figure a). in addition to codon usage optimization for e. coli expression, the plvp plasmid was designed such that each of the three lvp genes (sli - ) could be excised and the plasmid circularized with different yet compatible cohesive-end restriction enzymes, easing construction of expression plasmids with all possible combinations (supplementary figure s ). in addition to plvp (psli - - ), as shown in table , five different constructs were obtained: three single-gene plasmids (psli , psli and psli ) and two double-gene plasmids (psli - and psli - ). 
the resulting plasmids were confirmed by dna sequencing and used in assays consisting of mixtures of different cell-free extracts expressing the plasmids, such that all possible cis and trans gene-expression combinations could be addressed. this synthetic biology strategy allowed us to evaluate the potential functional impact that co-expression of two different enzymes could have, when compared with independent enzyme expression, providing a sense of the predicted enzyme-enzyme interactions between the c-minus nrps-like protein, the l/f-transferase and the n-acetyl transferase, and their relationship with protease inhibitory activity (figure b and c). these experiments showed that only the three-gene combinations, obtained after independent expression (in trans) from independent plasmids or co-expression (in cis) from the same plasmid, yielded livipeptin synthesis (figure c). unexpectedly, in trans expression of sli , sli and sli provided the highest yield of livipeptin. the combination sli , and sli was not tested as the former plasmid was not obtained, whereas all combinations with only two or one gene did not lead to production of livipeptin (table ). more importantly, as expected, reconstituted livipeptin showed protease inhibitory activity, similar to that found in the s. lividans native system (figure a) and the heterologous e. coli system (figure b) against trypsin, and to a lesser extent, papain. although the reason for these differences remains unknown, these results establish the sli - genes as the minimal biosynthetic core for the synthesis of livipeptin with protease inhibitory activity. this experimental layout also provided us with the opportunity to test the hypothesized aminoacyl-trna dependency of the l/f transferase true. in vitro synthesis of livipeptin with the gene combinations containing the entire cluster was repeated. 
however, for these experiments, the cell-free extracts were pre-incubated with rnase prior to generating the reaction mixture. as shown in figure c and table , hplc analysis shows that the addition of rnase suppresses the formation of livipeptin, which can be explained by the degradation of the aminoacyl-trna substrate. this approach has been previously adopted during characterization of the trna-dependent lantibiotic dehydratase nisb, involved in nisin biosynthesis (ortega et al. ). as we could not detect protease inhibitory activity in these cell-free extracts, sli - genes are strictly necessary for livipeptin synthesis. these results add to the increasing number of reports that implicate central metabolism trues in natural products biosynthesis (hong et al. ; garg et al. ; zhang et al. ; belin et al. ; bougioukou et al. ; ortega et al. ). (ichetovkin et al. ; watanabe et al. ; fung et al. ; zhang et al. ; ortega et al. ), the catalytic mechanism and residues involved in substrate-assisted catalysis (watanabe et al. ; fung et al. ; zhang et al. ) and in the innovative role of trna-dependent enzymes in pacidamycin and lantibiotic biosynthetic pathways (zhang et al. ; ortega et al. ). however, the role of an l/f trna transferase in peptide bond formation during biosynthesis of a spa, in association with an nrps-like protein, has not been shown until now. the combinatorial potential provided by the interaction between a domains and trues warrants further investigation in many ways. for instance, we envisage exploitation of this discovery for untapping novel natural products diversity through genome mining efforts, as well as for the development of synthetic biology approaches targeting proteolysis in different settings (hines et al. ), such as within the present covid pandemic crisis. al. ) algorithms were used as previously. s. roseus atcc was obtained from the atcc collection, and its genomic dna was extracted using common protocols (kieser et al. 
) and sequenced at the genomic sequencing facilities of langebio, cinvestav-ipn (irapuato, mexico), using an illumina miseq platform in paired-end format with read lengths of bases and insert length of bases. in total, mbp of sequence was obtained. the raw reads were filtered using trimmomatic (bolger et al. ) and assembled with velvet (zerbino and birney ), obtaining a . mb assembly in contigs with a coverage of x and a gc content of %. this assembly was annotated using rast (aziz et al. ), antismash (weber et al. ) and evomining (sélem-mojica et al. ). the genome includes , , bp, and it was assembled in a total of contigs and deposited in ncbi under the genome accession number nz_lfml . for the identification and characterization of livipeptin in s. lividans , a mutant deficient for the sli - genes was constructed. these genes were replaced by the apramycin resistance cassette aac( )iv marker in-frame within a pesac -a construct. a . kb region flanking the sli - genes replaced by the apramycin cassette was then amplified by pcr and cloned into the plasmid pwhm , which contains a thiostrepton resistance gene. pwhm is an unstable streptomyces vector that is lost after some rounds of cultivation of the transformed strain without selection (vara et al. ). double crossovers after integration of the pwhm sli - ::aac( )iv construct were screened for apramycin resistance ( μg/ml) and thiostrepton ( μg/ml) sensitivity. the genotype of several transformants was confirmed by pcr, leading to the s. lividans Δlvp strain used for experimentation. the synthetic livipeptin bgc, employing e. coli codon usage, was obtained from genscript (new jersey, usa). the design of the genetic construct includes restriction sites flanking each gene, as follows: ndei-sli -ecori-sli -xbai-sli -bglii-hindiii. different combinations of these genes were cloned into pask (adams et al. ), resulting in six different plasmids (table and supplementary figure s ). 
the streptomyces roseus leupa mutant was constructed following an insertional mutagenesis strategy using atcc as the parental strain. for this, a leupa bp fragment was pcr amplified and cloned into pcr . -topo using a ta cloning kit (invitrogen, carlsbad, usa). the resulting suicide plasmid, termed pleupa, was introduced into s. roseus via protoplast fusion (kieser et al. ). the transformants were selected with kanamycin ( μg/ml) and the genotype of the insertional mutant s. for production of leupeptin, s. roseus atcc was grown in shake flask cultures containing a medium designed by us, as follows: glucose g, nh no . g, mgso ( h o) . g, kcl . g, l-leucine . g, l-arginine . g, glycine . g, casamino acids . g, yeast extract . g per liter. cultures were incubated at °c for h prior to supernatant analysis. heterologous leupeptin production was performed in shake flask cultures using e. coli dh b bearing pesac_ , in m minimal medium (glucose g/l plus casamino acids g/l) supplemented with apramycin ( μg/ml) for selection of the cosmid at °c for h. the cell extract of e. coli c pgroel/groes was prepared following a previously described method (kigawa et al. ) with modifications. e. coli c pgroel/groes was inoculated in ml of lb medium and grown at °c to an od of . before induction with ng/µl of anhydrotetracycline. after induction, the culture was grown at °c for h. cells were harvested by centrifugation ( rpm, min, °c) and the pellet was washed three times by resuspension in s buffer ( mm tris-acetate buffer ph . , mm magnesium acetate, mm potassium acetate, mm dithiothreitol (dtt), . mm edta, . mm mgcl ) (ortega et al. ) followed by centrifugation ( rpm, min, °c). cells were then resuspended in ml of s buffer per gram of wet cells and lysed with a sonicator. the cell lysate was centrifuged twice ( rpm, min, °c) and the supernatant was dialyzed four times against volumes of s buffer (without dtt) using amicon ultra- (merck millipore) tubes for dialysis with a molecular mass cutoff of kda. 
the cell extract was then centrifuged ( rpm, min, °c) and the supernatant was frozen and stored in ml samples at - °c for future use. supernatants of cultures and in vitro reaction mixtures were evaporated to dryness. the dry residues were dissolved in a . volume of hplc grade h o and injected into a c discovery supelco column with a particle size of μm connected to an hplc-agilent equipped with a diode array detector and a fraction collector. the mobile phase comprised a binary system of eluent a, h o, and eluent b, % meoh. the run consisted of an h o/meoh gradient ( - min: % b; - min: % b; - min: % b). differential peaks were detected between the wild-type and mutant strains (or empty vectors) by monitoring absorbance at a wavelength of nm, and the selected fractions were collected for bioactivity assays and ms analysis in an ion trap ltq velos mass spectrometer (thermo scientific, waltham, usa). ms/ms analysis of selected ions was performed with a collision energy of ev. fractions collected in hplc analysis were analyzed in vitro using a fluorometric assay with excitation at nm and emission at nm. the chromogenic compound nα-benzoyl-dl-arginine nitroanilide hydrochloride (bapna, sigma-aldrich) was used as substrate for trypsin and papain atp ( mm) and nadph ( mm). the assay was incubated at °c for h and then centrifuged to remove insoluble material ( , rpm, min, °c). in addition, the cell-free extract ( µl) was treated with rnase in the presence of cacl ( µm). the activity was calculated as a percentage, taking as % the relative fluorescence units (rfu) of the proteases without inhibitors at the end of the reaction ( min). the slope of the curve (initial rate) of the time progress of the reaction in each experiment was also calculated in triplicate.

a hplc retention time (rt) and mass spectrometry (ms, ms ) data with common fragments between livipeptin and leupeptin shown in bold. 
ni, not identified (see figure and its associated supplementary figures s and s for spectral analysis). b t.i.a. and p.i.a., trypsin and papain inhibitory activity, respectively, which was recorded as activity curves for minutes, and reported as a percentage of the activity found for leupeptin and papain, respectively (see figure a). data were generated from three independent experiments, leading to standard deviations measured as relative fluorescence units (rfu). d reported activity is from commercial leupeptin of microbial source with ≥ % purity (sigma, no. l ). e pesac_ construct (or pleup) is a pesac -a derivative (tocchetti et al. ) containing the predicted leupeptin bgc, including the leup - genes.

a the reaction mixtures were prepared with e. coli cell-free extracts (c.f.e.) with or without addition of rnase. b genotype provided is for plasmids harboring the s. lividans genes sli - . ni, livipeptin could not be identified. as in figure , sli - genes are shown only with their final digit, i.e. , and . a comma (,) and a hyphen (-) are used to denote expression in trans or in cis, respectively. trypsin (t.i.a.) and papain (p.i.a.) inhibitory activity, together with hplc and ms analysis, were determined and expressed as in table .

respectively. standard deviations shown were calculated from three independent experiments (table ). b. activity reconstitution using a synthetic biology approach of livipeptin bgc. metabolites, proteases and controls are as in panel a (data not shown). only the three three-gene reaction mixtures, irrespective of their genetic organization (expression in trans or in cis), showed protease inhibitory activity. cell-free extracts with one- or two-gene constructs did not show inhibitory activity (table ). rnase (dotted bars) reduces trypsin and papain inhibitory activity by around % and %, respectively. c. 
[figure panel c residue: axis label "% rfu at min"; bars labelled by sli gene combination]

promiscuous and adaptable enzymes fill "holes" in the tetrahydrofolate pathway in chlamydia species leupeptins, new protease inhibitors from actinomycetes the rast server: rapid annotations using subsystems technology the nonribosomal synthesis of diketopiperazines in trna-dependent cyclodipeptide synthase pathways : updates to the secondary metabolite genome mining pipeline trimmomatic: a flexible trimmer for illumina sequence data revisiting the biosynthesis of dehydrophos reveals a trna-dependent pathway effect of ssra (tmrna) tagging system on translational regulation in streptomyces crystallographic and kinetic investigations of the covalent complex formed by a specific tetrapeptide aldehyde and the serine protease from streptomyces griseus peptide aldehyde complexes with wheat serine carboxypeptidase ii: implications for the catalytic mechanism and substrate specificity the complex extracellular biology of streptomyces proteomics guided discovery of flavopeptins: anti-proliferative aldehydes synthesized by a reductase domain-containing nonribosomal peptide synthetase evolutionary dynamics of natural product biosynthesis in bacteria the genome sequence of streptomyces lividans reveals a novel trna-dependent peptide biosynthetic system within a metal-related genomic island genetic system for producing a proteases inhibitor of a small peptide aldehyde type. usa patent no. 
us a recapitulation of the evolution of biosynthetic gene clusters reveals hidden chemical diversity on bacterial genomes phylogenomic analysis of natural products biosynthetic gene clusters allows discovery of arseno-organic metabolites in model streptomycetes antibiotics from microbes: converging to kill an alternative mechanism for the catalysis of peptide bond formation by l/f transferase: substrate binding and orientation investigations of valanimycin biosynthesis: elucidation of the role of seryl-trna discovery of reactive microbiota-derived metabolites that inhibit host proteases phylogenomics of , -diacetylphloroglucinolproducing pseudomonas and novel antiglycation endophytes from piper auritum proteasome inhibition by fellutamide b induces nerve growth factor synthesis characterization of an inducible vancomycin resistance system in streptomyces coelicolor reveals a novel gene (vank) required for drug resistance substrate recognition by the leucyl/phenylalanyl-trna-protein transferase bacithrocins a, b and c, novel thrombin inhibitors mibig . 
: a repository for biosynthetic gene clusters of known function practical streptomyces genetics preparation of escherichia coli cell extract for highly productive cell-free protein expression regulation of production of leupeptin, leupeptin-inactivating enzyme, and trypsin-like protease in streptomyces exfoliatus smf trypsin-like protease of streptomyces exfoliatus smf , a potential agent in mycelial differentiation isolation and characterization of leupeptins produced by actinomycetes a soluble enzyme from escherichia coli which catalyzes the transfer of leucine and phenylalanine from trna to acceptor proteins making and breaking leupeptin protease inhibitors in pathogenic gammaproteobacteria new leupeptin analogs: synthesis and inhibition data the n-end rule pathway for regulated proteolysis: prokaryotic and eukaryotic strategies novel thiol proteinase inhibitor, thiolstatin, produced by a strain of bacillus cereus novel microbial alkaline protease inhibitor, mapi, produced by streptomyces sp a computational framework to explore large-scale biosynthetic diversity structure and mechanism of the trna-dependent lantibiotic dehydratase nisb new insights into the biosynthetic logic of ribosomally synthesized and post-translationally modified peptide natural products comparison of some dried holotype and neotype specimens of streptomycetes with their living counterparts microbial and fungal protease inhibitors-current and potential applications evomining reveals the origin and fate of natural product biosynthetic enzymes. microb ge factor a and b: new hiv- protease inhibitors, produced by streptomyces sp. atcc antipain, a new protease inhibitor isolated from actinomycetes biosynthesis of leupeptin. iii. 
isolation and properties of an enzyme synthesizing acetyl-l-leucine the structure of chymostatin, a chymotrypsin inhibitor large inserts for big data: artificial chromosomes in the genomic era chymostatin, a new chymotrypsin inhibitor produced by actinomycetes cloning of genes governing the deoxysugar portion of the erythromycin biosynthesis pathway in saccharopolyspora erythraea (streptomyces erythreus) the n-end rule pathway and regulation by proteolysis protein-based peptide-bond formation by aminoacyl-trna protein transferase structural elucidation of alpha-mapi, a novel microbial alkaline proteinase inhibitor, produced by streptomyces nigrescens wt- antismash . -a comprehensive resource for the genome mining of biosynthetic gene clusters recent advances in engineering nonribosomal peptide assembly lines inhibitor complexes of the pseudomonas serine-carboxyl proteinase velvet: algorithms for de novo short read assembly using de bruijn graphs trna-dependent peptide bond formation by the transferase pacb in biosynthesis of the pacidamycin group of pentapeptidyl nucleoside antibiotics

we thank the hplc and ms units of cinvestav-ipn, irapuato, for technical support. we also thank pablo cruz-morales for strain construction and nelly selem-mojica for bioinformatics support.

key: cord- -zult f authors: nguyen, thin; le, hang; quinn, thomas p.; nguyen, tri; le, thuc duy; venkatesh, svetha title: graphdta: predicting drug–target binding affinity with graph neural networks date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: zult f

the development of new drugs is costly, time consuming, and often accompanied by safety issues. drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. in order to repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. 
computational models that estimate the interaction strength of new drug--target pairs have the potential to expedite drug repurposing. several models have been proposed for this task. however, these models represent the drugs as strings, which is not a natural way to represent molecules. we propose a new model called graphdta that represents drugs as graphs and uses graph neural networks to predict drug--target affinity. we show that graph neural networks not only predict drug--target affinity better than non-deep learning models, but also outperform competing deep learning methods. our results confirm that deep learning models are appropriate for drug--target binding affinity prediction, and that representing drugs as graphs can lead to further improvements. availability of data and materials the proposed models are implemented in python. related data, pre-trained models, and source code are publicly available at https://github.com/thinng/graphdta. all scripts and data needed to reproduce the post-hoc statistical analysis are available from https://doi.org/ . /zenodo. . contact thin.nguyen@deakin.edu.au it costs about . billion us dollars to develop a new drug [ ] , and can take up to years for fda approval [ , ] . finding new uses for already approved drugs avoids the expensive and lengthy process of drug development [ , ] . for example, nearly existing fda-approved drugs are currently being investigated to see if they can be repurposed to treat covid- [ ] . in order to repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. high-throughput screening experiments are used to examine the affinity of a drug toward its targets; however, these experiments are costly and time-consuming [ , ] , and an exhaustive search is infeasible because there are millions of drug-like compounds [ ] and hundreds of potential targets [ , ] . 
as such, there is a strong motivation to build computational models that can estimate the interaction strength of new drug-target pairs based on previous drug-target experiments. several computational approaches have been proposed for drug-target affinity (dta) prediction [ , , ]. one approach is molecular docking, which predicts the stable 3d structure of a drug-target complex via a scoring function [ ]. even though the molecular docking approach is potentially more informative, it requires knowledge about the crystallized structure of proteins, which may not be available. another approach uses collaborative filtering. for example, the simboost model uses the affinity similarities among drugs and among targets to build new features. these features are then used as input to a gradient boosting machine to predict the binding affinity for unknown drug-target pairs [ ]. alternatively, the similarities could come from other sources (rather than the training data affinities). for example, kernel-based methods use kernels built from molecular descriptors of the drugs and targets within a regularized least squares regression (rls) framework [ , ]. to speed up model training, the kronrls model computes a pairwise kernel k from the kronecker product of the drug-by-drug and protein-by-protein kernels [ , ] (for which any similarity measure can be used). dta prediction may also benefit from adopting methods for predicting drug-target interactions (dti). approaches in this line of work include dti-cdf [ ], a cascade deep forest model, and dti-mlcd [ ], a multi-label learning method supported by community detection. another approach uses neural networks trained on 1d representations of the drug and protein sequences. for example, the deepdta model uses 1d representations and layers of 1d convolutions (with pooling) to capture predictive patterns within the data [ ]. 
the final convolution layers are then concatenated, passed through a number of hidden layers, and regressed against the drug-target affinity scores. the widedta model is an extension of deepdta in which the sequences of the drugs and proteins are first summarized as higher-order features [ ]. for example, the drugs are represented by the most common sub-structures (the ligand maximum common substructures (lmcs) [ ]), while the proteins are represented by the most conserved sub-sequences (the protein domain profiles or motifs (pdm) from prosite [ ]). while widedta [ ] and deepdta [ ] learn a latent feature vector for each protein, the padme model [ ] uses fixed-rule descriptors to represent proteins, and performs similarly to deepdta [ ]. the deep learning models are among the best performers in dta prediction [ ]. however, these models represent the drugs as strings, which is not a natural way to represent molecules. when using strings, the structural information of the molecule is lost, which could impair the predictive power of a model as well as the functional relevance of the learned latent space. already, graph convolutional networks have been used in computational drug discovery, including interaction prediction, synthesis prediction, de novo molecular design, and quantitative structure prediction [ , , , , , , ]. however, graph neural networks have not been used for dta prediction. of the works above, [ , , ] are closest to ours, but they address binary prediction, whereas our model predicts a continuous binding affinity value. also, in [ ], the input is a drug descriptor (single input), while our model takes as input both a drug descriptor and a sequence (dual input). in this article, we propose graphdta, a new neural network architecture capable of directly modelling drugs as molecular graphs, and show that this approach outperforms state-of-the-art deep learning models on two drug-target affinity prediction benchmarks. 
the approach is based on the solution we submitted to the idg-dream drug-kinase binding prediction challenge , where we were among the top ten performers from registered participants . in order to better understand how our graph-based model works, we performed a multivariable statistical analysis of the model's latent space. we identified correlations between hidden node activations and domain-specific drug annotations, such as the number of aliphatic oh groups, which suggests that our graph neural network can automatically assign importance to well-defined chemical features without any prior knowledge. we also examine the model's performance and find that a handful of drugs contribute disproportionately to the total prediction error, and that these drugs are inliers (i.e., not outliers) in an ordination of the model's latent space. taken together, our results suggest that graph neural networks are highly accurate, abstract meaningful concepts, and yet fail in predictable ways. we conclude with a discussion about how these insights can feed back into the research cycle. we propose a novel deep learning model called graphdta for drug-target affinity (dta) prediction. we frame the dta prediction problem as a regression task where the input is a drug-target pair and the output is a continuous measurement of binding affinity for that pair. existing methods represent the input drugs and proteins as 1d sequences. our approach is different; we represent the drugs as molecular graphs so that the model can directly capture the bonds among atoms.

figure : this figure shows the graphdta architecture. it takes a drug-target pair as the input data, and the pair's affinity as the output data. it works in stages. first, the smiles code of a drug is converted into a molecular graph, and a deep learning algorithm learns a graph representation. meanwhile, the protein sequence is encoded and embedded, and several 1d convolutional layers learn a sequence representation. finally, the two representation vectors are concatenated and passed through several fully connected layers to estimate the output drug-target affinity value.

smiles (simplified molecular input line entry system) was invented to represent molecules in a form readable by computers [ ], enabling several efficient applications, including fast retrieval and substructure searching. from the smiles code, drug descriptors like the number of heavy atoms or valence electrons can be inferred and readily used as features for affinity prediction. one could also view the smiles code as a string. then, one could featurize the strings with natural language processing (nlp) techniques, or use them directly in a convolutional neural network (cnn). instead, we view drug compounds as a graph of the interactions between atoms, and build our model around this conceptualization. to describe a node in the graph, we use a set of atomic features adapted from deepchem [ ]. here, each node is a multi-dimensional binary feature vector expressing five pieces of information: the atom symbol, the number of adjacent atoms, the number of adjacent hydrogens, the implicit valence of the atom, and whether the atom is in an aromatic structure [ ]. we convert the smiles code to its corresponding molecular graph and extract atomic features using the open-source chemical informatics software rdkit [ ]. one-hot encoding has been used in previous works to represent both drugs and proteins, as well as other biological sequences like dna and rna. this paper tests the hypothesis that a graph structure could yield a better representation for drugs, and so only drugs were represented as a graph. although one could also represent proteins as graphs, doing so is more difficult because the tertiary structure is not always available in a reliable form. 
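the five atom-level features listed above can be illustrated with a minimal sketch. it is not the authors' featurizer: the vocabularies `ATOM_SYMBOLS` and `COUNT_BINS`, the helper names, and the hand-written ethanol example are all assumptions made for illustration, and in practice the per-atom values would be read from rdkit rather than typed in.

```python
# Hypothetical vocabularies; the paper adapts its feature set from DeepChem
# and does not list the exact categories in this section.
ATOM_SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "Unknown"]
COUNT_BINS = [0, 1, 2, 3, 4, 5, 6]  # shared bins for degree, H count, implicit valence


def one_hot(value, choices):
    """One-hot encode value over choices; unseen values map to the last slot."""
    if value not in choices:
        value = choices[-1]
    return [1 if value == c else 0 for c in choices]


def atom_features(symbol, degree, num_hs, implicit_valence, is_aromatic):
    """Concatenate the five pieces of information into one binary vector."""
    return (
        one_hot(symbol, ATOM_SYMBOLS)
        + one_hot(degree, COUNT_BINS)
        + one_hot(num_hs, COUNT_BINS)
        + one_hot(implicit_valence, COUNT_BINS)
        + [1 if is_aromatic else 0]
    )


# Ethanol (SMILES "CCO"), heavy atoms only, written out by hand.
# Tuples: (symbol, degree, attached hydrogens, implicit valence, aromatic)
ethanol_atoms = [
    ("C", 1, 3, 3, False),
    ("C", 2, 2, 2, False),
    ("O", 1, 1, 1, False),
]
ethanol_edges = [(0, 1), (1, 2)]  # undirected bonds between atom indices

# Node feature matrix: one binary vector per atom.
node_features = [atom_features(*atom) for atom in ethanol_atoms]

# Symmetric adjacency matrix built from the bond list.
n = len(ethanol_atoms)
adjacency = [[0] * n for _ in range(n)]
for i, j in ethanol_edges:
    adjacency[i][j] = 1
    adjacency[j][i] = 1
```

the pair (`node_features`, `adjacency`) is the kind of input a graph neural network consumes; the feature length here (30) follows only from the illustrative vocabularies above.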
as such, we elected to use the popular one-hot encoding representation of proteins instead. for each target in the datasets used in our experiments, a protein sequence is obtained from the uniprot database using the target's gene name. the sequence is a string of ascii characters which represent amino acids. each amino acid type is encoded with an integer based on its associated alphabetical symbol (e.g., alanine (a) is , cysteine (c) is , aspartic acid (d) is , and so on), allowing the protein to be represented as an integer sequence. to make it convenient for training, the sequence is cut or padded to a fixed length of residues. if a sequence is shorter, it is padded with zero values. these integer sequences are used as input to the embedding layers, which return a -dimensional vector representation. next, three d convolutional layers are used to learn different levels of abstract features from the input. finally, a max pooling layer is applied to get a representation vector of the input protein sequence.

having the drug compounds represented as graphs, the task now is to design an algorithm that learns effectively from graphical data. the recent success of cnns in computer vision, speech recognition, and natural language processing has encouraged research into graph convolution. a number of works have been proposed to handle two main challenges in generalizing cnns to graphs: ( ) the formation of receptive fields in graphs whose data points are not arranged as euclidean grids, and ( ) the pooling operation to down-sample a graph. these new models are called graph neural networks. in this work, we propose a new dta prediction model based on a combination of graph neural networks and conventional cnns. figure shows a schematic of the model. for the proteins, we use a string of ascii characters and apply several d cnn layers over the text to learn a sequence representation vector.
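The integer encoding and fixed-length padding of protein sequences described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 20-letter alphabet, the 1-based codes, the mapping of unknown symbols to zero, and the `max_len` parameter are all assumptions made for the example, since the actual codes and the fixed length are not recoverable from the text.

```python
# Sketch only: the alphabet, the 1-based codes, and max_len are illustrative
# assumptions, not values taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, ordered by symbol
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # 0 is reserved for padding


def encode_protein(sequence, max_len):
    """Integer-encode a protein sequence, then cut or zero-pad it to max_len."""
    # Symbols outside the assumed alphabet are mapped to 0 (same as padding).
    codes = [CODE.get(residue, 0) for residue in sequence.upper()[:max_len]]
    return codes + [0] * (max_len - len(codes))


short = encode_protein("ACD", 5)       # shorter than max_len: zero-padded
long_ = encode_protein("ACDEFG", 4)    # longer than max_len: truncated
```

The resulting fixed-length integer lists are what an embedding layer would consume; reserving zero for padding keeps padded positions distinguishable from real residues.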
specifically, the protein sequence is first categorically encoded, then an embedding layer is added to the sequence where each (encoded) character is represented by a -dimensional vector. next, three d convolutional layers are used to learn different levels of abstract features from the input. finally, a max pooling layer is applied to get a representation vector of the input protein sequence. this approach is similar to the existing baseline models. for the drugs, we use the molecular graphs and trial graph neural network variants, including gcn [ ] , gat [ ] , gin [ ] , and a combined gat-gcn architecture, all of which we describe below.

in this work, we focus on predicting a continuous value indicating the level of interaction between a drug and a protein sequence. each drug is encoded as a graph and each protein is represented as a string of characters. to this end, we use the gcn model [ ] to learn a graph representation of drugs. note, however, that the original gcn was designed for the semi-supervised node classification problem, i.e., the model learns node-level feature vectors. for our goal of estimating the drug-protein interaction, a graph-level representation of each drug is required. common techniques to aggregate learned node features into a whole-graph feature include sum, average, and max pooling. in our experiments, using a max pooling layer in gcn-based graphdta usually results in better performance than the alternatives.

formally, denote a graph for a given drug as $G = (V, E)$, where $V$ is the set of $N$ nodes, each represented by a $C$-dimensional vector, and $E$ is the set of edges, represented as an adjacency matrix $A$. a multi-layer graph convolutional network (gcn) takes as input a node feature matrix $X \in \mathbb{R}^{N \times C}$ ($N = |V|$, $C$: the number of features per node) and an adjacency matrix $A \in \mathbb{R}^{N \times N}$, then produces a node-level output $Z \in \mathbb{R}^{N \times F}$ ($F$: the number of output features per node).
a propagation rule can be written in the normalized form for stability, as in [ ] :

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$

where $\tilde{A} = A + I_N$ is the adjacency matrix of the undirected graph with added self-connections, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $H^{(l)} \in \mathbb{R}^{N \times C}$ is the matrix of activations in the $l$-th layer, $H^{(0)} = X$, $\sigma$ is an activation function, and $W^{(l)}$ is a matrix of learnable parameters. a layer-wise convolution operation can be approximated, as in [ ] :

$$Z = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,X\,\Theta$$

where $\Theta \in \mathbb{R}^{C \times F}$ ($F$: the number of filters or feature maps) is the matrix of filter parameters. note, however, that the gcn model learns node-level outputs $Z \in \mathbb{R}^{N \times F}$. to make the gcn applicable to the task of learning a representation vector of the whole graph, we add a global max pooling layer right after the last gcn layer. in our gcn-based model, we use three consecutive gcn layers, each activated by a relu function. then a global max pooling layer is added to obtain the graph representation vector.

unlike graph convolution, the graph attention network (gat) [ ] proposes an attention-based architecture to learn hidden representations of nodes in a graph by applying a self-attention mechanism. the building block of a gat architecture is a graph attention layer. the gat layer takes the set of nodes of a graph as input, and applies a linear transformation to every node by a weight matrix $W$. for each input node $i$ in the graph, the attention coefficients between $i$ and its first-order neighbors are computed as $a(W x_i, W x_j)$. this value indicates the importance of node $j$ to node $i$. these attention coefficients are then normalized by applying a soft-max function, then used to compute the output features for nodes as

$$\sigma\Big(\sum_{j \in N(i)} \alpha_{ij}\, W x_j\Big)$$

where $\sigma(\cdot)$ is a non-linear activation function and $\alpha_{ij}$ are the normalized attention coefficients. in our model, the gat-based graph learning architecture includes two gat layers, activated by a relu function, followed by a global max pooling layer to obtain the graph representation vector.
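To make the graph-convolution-plus-readout idea concrete, the following is a minimal pure-Python sketch of one normalized graph convolution followed by global max pooling. It is an illustration under stated assumptions rather than the authors' implementation: the function names (`gcn_layer`, `matmul`, `global_max_pool`) are hypothetical, matrices are plain nested lists, and the weights are supplied by the caller instead of being learned.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph convolution: relu(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-connections so each node keeps its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Node degrees of the self-connected graph (row sums).
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization of the adjacency matrix.
    norm = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


def global_max_pool(node_feats):
    """Collapse node-level features (N x F) into one graph-level vector (F)."""
    return [max(column) for column in zip(*node_feats)]


# Toy two-atom graph with one bond, one feature per node, identity weight.
nodes = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
graph_vector = global_max_pool(nodes)
```

Max pooling makes the graph-level vector independent of node ordering and of the number of atoms, which is what allows drugs of different sizes to share one fixed-length representation.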
for the first gat layer, multi-head attention is applied with the number of heads set to , and the number of output features set to the number of input features. the number of output features of the second gat layer is set to .

the graph isomorphism network (gin) [ ] is a newer method that supposedly achieves maximum discriminative power among graph neural networks. specifically, gin updates the node features as

$$x_i' = \mathrm{MLP}\Big((1 + \epsilon)\, x_i + \sum_{j \in B(i)} x_j\Big)$$

where $\epsilon$ is either a learnable parameter or a fixed scalar, $x$ is the node feature vector, and $B(i)$ is the set of nodes neighboring $i$. in our model, the gin-based graph neural net consists of five gin layers, each followed by a batch normalization layer. finally, a global max pooling layer is added to obtain the graph representation vector.

we also investigate a combined gat-gcn model. here, the graph neural network begins with a gat layer that takes the graph as input, then passes a convolved feature matrix to the subsequent gcn layer. each layer is activated by a relu function. the final graph representation vector is then computed by concatenating the global max pooling and global mean pooling layers from the gcn layer output.

to compare our model with the state-of-the-art deepdta [ ] and widedta [ ] models, we use the same datasets from the [ , ] benchmarks: davis contains the binding affinities for all pairs of drugs and targets, measured as k d constants and ranging from . to . [ ] . kiba contains the binding affinities for , drugs and targets, measured as kiba scores and ranging from . to . [ ] .

[table caption] we compare graph neural network variants: gin [ ] , gat [ ] , gcn [ ] , and combined gat-gcn [ , ] . italics: best for baseline models, bold: better than baselines.
to make the comparison as fair as possible, we use the same set of training and testing examples from [ , ] , as well as the same performance metrics: mean squared error (mse, the smaller the better) and concordance index (ci, the larger the better). for all baseline methods, we report the performance metrics as originally published in [ , ] . the hyper-parameters used for our experiments are summarized in table . the hyper-parameters were not tuned, but chosen a priori based on our past modelling experience.

the activations of nodes within layers of a deep neural network are called latent variables, and can be analyzed directly to understand how a model's performance relates to domain knowledge [ ] . we obtained the latent variables from the graph neural network layer, and analyzed them directly through a redundancy analysis. this multivariable statistical method allows us to measure the percent of the total variance within the latent variables that can be explained by an external data source. in our case, the external data source is a matrix of molecular joelib features/descriptors [ ] for each drug (available from chemmine tools [ ] ). we also compare the value of the principal components from these latent variables with the per-drug test set error. here, the per-drug (or per-protein) error refers to the median of the absolute error between the predicted dta and the ground-truth dta for all test set pairs containing that drug (or that protein). for these analyses, we focus on the gin model [ ] (because of its superior performance) and the kiba dataset [ ] (because of its larger drug catalog).

. graphical models outperform the state-of-the-art

table compares the performance of variant graphdta models with the existing baseline models for the davis dataset. here, all variants had the lowest mse. the best variant had an mse of . which is . % lower than the best baseline of . .
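For readers unfamiliar with the two metrics used in these comparisons, the sketch below shows one common way to compute them. It is an assumption that the benchmark code uses exactly this pairwise definition of the concordance index (ground-truth ties skipped, tied predictions counted as one half); the function names are hypothetical and the quadratic loop is written for clarity, not speed.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered like the truth."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal ground truth are not comparable
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5  # tied predictions count as half
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    # Raises ZeroDivisionError if every ground-truth value is identical.
    return concordant / comparable


perfect = concordance_index([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
reversed_ = concordance_index([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the concordance index depends only on the ranking of predictions, a model can improve its mse without changing its ci, which is one way the two metrics can disagree in a comparison.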
the improvement is less obvious according to the ci metric, where only of the variants had the highest ci. the best ci for a baseline model was . . by comparison, the gat and gin models achieved a ci of . and . , respectively. table compares the performance of the graphdta models with the existing baseline models for the kiba dataset. here, of the variants had the lowest mse and the highest ci, including gin, gcn, and gat-gcn. of note, the best mse here is . , which is . % lower than the best baseline of . . of all variants tested, gin is the only one that had the best performance for both datasets and for both performance measures. for this reason, we focus on the gin in all post-hoc statistical analyses.

a graph neural network works by abstracting the molecular graph of each drug into a new feature vector of latent variables. in our model, there are latent variables which together characterise the structural properties of the drug. since the latent variables are learned during the dta prediction task, we assume that they represent graphical features that contribute meaningfully to dta. unfortunately, it is not straightforward to determine the molecular substructures to which each latent variable corresponds. however, we can regress the learned latent space with a matrix of known molecular descriptors to look for overlap.

[figure caption] the blue dots represent drugs, the green dots represent latent variables (the furthest from origin are labelled), and the arrows represent molecular descriptors (the longest are labelled). the right panel of the figure shows the activation of two latent variables plotted against the number of aliphatic oh groups in that drug. these results suggest that the graph convolutional network can abstract known molecular descriptors without any prior knowledge.

[figure caption] here, we see that the errors are not distributed evenly across the drugs. it is harder to predict the target affinities for some drugs than others.
figure shows a redundancy analysis of the latent variables regressed with molecular descriptors [ ] (available from chemmine tools [ ] ). from this, we find that . % of the latent space is explained by the known descriptors, with the "number of aliphatic oh groups" contributing most to the explained variance. indeed, two latent variables correlate strongly with this descriptor: hidden nodes v and v both tend to have high activation when the number of aliphatic oh groups is large. this finding provides some insight into how the graphical model might "see" the drugs as a set of molecular sub-structures, though most of the latent space is orthogonal to the known molecular descriptors.

although the graphdta model outperforms its competitors, we wanted to know more about why its predictions sometimes failed. for this, we averaged the prediction error for each drug (and each protein), for both the davis and kiba test sets. figures and show the median of the absolute error (mae) for affinity prediction, sorted from smallest to largest. interestingly, we see that a handful of drugs (and a handful of proteins) contribute disproportionately to the overall error. of note, chembl (an alk inhibitor), chembl (a pdk inhibitor) and the protein csnk e all had an mae above .

[figure caption] here, we see that the errors are not distributed evenly across the proteins. it is harder to predict the target affinities for some proteins than others.

we examined the latent space with regard to the prediction error, but could not find any obvious pattern that separated hard-to-predict drugs from easy-to-predict drugs. the only trend we could find is that the easy-to-predict drugs are more likely to appear as outliers in a pca of the latent space. supplemental figure (https://github.com/thinng/graphdta/blob/master/supplement.pdf) shows the median errors plotted against the first six principal components, where we see that the hard-to-predict drugs usually appear close to the origin.
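The per-drug (or per-protein) error used in this analysis, the median absolute error over all test pairs containing a given drug, can be computed with a simple grouping step. The sketch below is illustrative only: the record layout, the function name, and the identifiers `drugA` and `drugB` are hypothetical, and the affinity values are made up for the example.

```python
from collections import defaultdict
from statistics import median


def per_group_median_abs_error(records):
    """records: iterable of (group_id, true_affinity, predicted_affinity).

    Returns (group_id, median absolute error) pairs sorted from the smallest
    to the largest error, mirroring the ordering used in the error figures.
    """
    errors = defaultdict(list)
    for group_id, y_true, y_pred in records:
        errors[group_id].append(abs(y_true - y_pred))
    return sorted(((g, median(e)) for g, e in errors.items()),
                  key=lambda item: item[1])


# Made-up test-set pairs: (drug id, measured affinity, predicted affinity).
toy_records = [
    ("drugA", 5.0, 5.5),
    ("drugA", 6.0, 6.0),
    ("drugA", 7.0, 9.0),
    ("drugB", 5.0, 5.0),
]
ranking = per_group_median_abs_error(toy_records)
```

Grouping by protein instead of drug only requires passing the protein identifier as the first element of each record. Using the median rather than the mean keeps a single badly predicted pair from dominating a drug's score.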
we interpret this to mean that drugs with unique molecular sub-structures are always easy to predict. on the other hand, the hard-to-predict drugs tend to lack unique structures, though this is apparently true for many easy-to-predict drugs too.

knowing how a model works and when a model fails can feed back into the research cycle. in the post-hoc statistical analysis of our model, we find that a graph neural network can learn the importance of known molecular descriptors without any prior knowledge. however, most of the learned latent variables remain unexplained by the available descriptors. yet, the model's performance implies that these learned representations are useful in affinity prediction. this suggests that there are both similarities and differences in how machines "see" chemicals versus how human experts see them. understanding this distinction may further improve model performance or reveal new mechanisms behind drug-target interactions.

meanwhile, the distribution of the test set errors suggests that there are "problem drugs" (and "problem proteins") for which prediction is especially difficult. one could act on this insight either by collecting more training data for these drugs (or proteins), or by using domain knowledge to engineer features that complement the molecular graphs. indeed, knowing that the pca outliers are the easiest to predict suggests that some additional feature input may be needed to differentiate between drugs that lack distinct molecular sub-graphs. although d graphs contain more information than d strings, our model still neglects the stereochemistry of the molecules. future experiments could test whether representing drugs in d (or proteins in d) further improves model performance. interestingly, under-representation of proteins in the training set does not seem to be the reason for the "problem proteins". the supplement shows an analysis of the effect of homologous proteins on test set performance.
although we see that test set error varies across clustered protein groups, the training set represents all protein clusters equally well. this suggests that the variation in test set performance is not simply explained by asymmetrical representation of protein groups within the training set.

we test graphdta with four graph neural network variants, including gcn, gat, gin, and a combined gat-gcn architecture, for the task of drug-target affinity prediction. we benchmark the performance of these models on the davis and kiba datasets. we find graphdta performs well for two separate benchmark datasets and for two key performance metrics. in a post-hoc statistical analysis of our model, we find that graphdta can learn the importance of known molecular descriptors without any prior knowledge. we also examine the model's performance and find that a handful of drugs contribute disproportionately to the total prediction error. although we focus on drug-target affinity prediction, our graphdta model is a generic solution for any similar problem where either data input can be represented as a graph. it may be possible to improve performance further by representing proteins as graphs too, for example by a graph of their d structure. however, determining the d structure of a target protein is very challenging. we chose to use the primary protein sequence because it is readily available. the use of d sequences, instead of d structures, also reduces the number of parameters that we need to learn, making it less likely that we over-fit our model to the training data. still, for some problem applications, it may make sense to use structural information, as well as binding site, binding conformation, and solution environment information, to augment the model.

new drugs cost us $ .
billion to develop
drug repositioning: identifying and developing new uses for existing drugs
pharmacogenetics in drug discovery and development: a translational perspective
overcoming drug development bottlenecks with repurposing: old drugs learn new tricks
a sars-cov- -human protein-protein interaction map reveals drug targets and potential drug-repurposing. biorxiv
protein kinases-the major drug targets of the twenty-first century?
protein kinase inhibitors: insights into drug design from structure
frequent substructure-based approaches for classifying chemical compounds
the protein kinase complement of the human genome
maximizing diversity from a kinase screen: identification of novel and selective pan-trk inhibitors for chronic pain
a machine learning-based method to improve docking scoring functions and its application to drug repurposing
drug discovery in the age of systems biology: the rise of computational approaches for data integration
the drug repurposing hub: a next-generation drug library and information resource
an overview of scoring functions used for protein-ligand interactions in molecular docking
simboost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines
computational-experimental approach to drug-target interaction mapping: a case study on kinase inhibitors
learning with multiple pairwise kernels for drug bioactivity prediction
dti-cdf: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features
predicting drug-target interactions using multi-label learning with community detection method (dti-mlcd). biorxiv
deepdta: deep drug-target binding affinity prediction
widedta: prediction of drug-target binding affinity
linguistic measures of chemical diversity and the 'keywords' of molecular collections
prosite, a protein domain database for functional characterization and annotation
padme: a deep learning-based framework for drug-target interaction prediction
djork-arné clevert, and sepp hochreiter. large-scale comparison of machine learning methods for drug target prediction on chembl
graph convolutional neural networks for predicting drug-target interactions
chemi-net: a molecular graph convolutional network for accurate drug property prediction
convolutional neural network based on smiles representation of compounds for detecting chemical motif
molecular graph convolutions: moving beyond fingerprints
graph convolutional networks for computational drug development and discovery
large-scale learnable graph convolutional networks
interpretable drug target prediction using deep neural representation
smiles, a chemical language and information system. . introduction to methodology and encoding rules
deep learning for the life sciences: applying deep learning to genomics
rdkit: open-source cheminformatics
semi-supervised classification with graph convolutional networks
graph attention networks
how powerful are graph neural networks?
comprehensive analysis of kinase inhibitor selectivity
making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis
deep in the bowel: highly interpretable neural encoder-decoder networks predict gut metabolites from gut microbiome
feature selection for descriptor based classification models. .
human intestinal absorption (hia)
chemmine tools: an online service for analyzing and clustering small molecules

key: cord- - zgd fl authors: khodakov, dmitriy; li, jiaming; zhang, jinny x.; zhang, david yu title: donut pcr: a rapid, portable, multiplexed, and quantitative dna detection platform with single-nucleotide specificity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zgd fl

current platforms for molecular analysis of dna markers are either limited in multiplexing (qpcr, isothermal amplification), turnaround time (microarrays, ngs), quantitation accuracy (isothermal amplification, microarray, nanopore sequencing), or specificity against single-nucleotide differences (microarrays, nanopore sequencing). here, we present the donut pcr platform that features high multiplexing, rapid turnaround times, single nucleotide discrimination, and precise quantitation of dna targets in a portable, affordable, and battery-powered instrument using closed consumables that minimize contamination. we built a bread-board instrument prototype and three assays/chips to demonstrate the capabilities of donut pcr: ( ) a -plex mammal identification panel, ( ) a -plex bacterial identification panel, and ( ) a -plex human snp genotyping assay. the limit of detection of the platform is under genomic copies in under minutes, and the quantitative dynamic range is at least logs. we envision that this platform would be useful for a variety of applications where rapid and highly multiplexed nucleic acid detection is needed at the point of care.

dna and rna sequence information uniquely identify biological organisms, from human to microbe. consequently, detection of specific dna sequences has become a critical part of precision medicine, from pathogen identification to human genetic disease risk assessment to disease prognosis.
it is evident that as our understanding of disease genomics improves, translation of this scientific knowledge into actionable clinical practice will be facilitated by dna diagnostic platforms that are simultaneously fast, affordable, sensitive, massively multiplexed, quantitative, and easy to operate. since the early 's, however, dna detection technologies have bifurcated into either massively multiplexed but slow platforms (ngs [ , ] and microarrays [ , ] ) or rapid but low multiplexing platforms (qpcr [ , ] and isothermal amplification [ , ] ). two notable exceptions to the tradeoff between slow but powerful and fast but limited platforms are the biofire filmarray multiplex pcr platform [ ] and the oxford nanopore high-throughput sequencing platform [ ] (table ) . however, both platforms are unable to quantitate accurately and unable to reliably recognize single-nucleotide differences, which serve as critical dna biomarkers for genetic and metabolic risk assessment [ , ] , pharmacogenetic drug dosing [ ] , cancer therapy selection [ ] , and infectious disease antimicrobial resistance [ , ] .

here, we present the donut pcr platform for dna detection that combines scalable and massive multiplexing, rapid turnaround times, single nucleotide discrimination, and precise quantitation in a portable, affordable, and battery-powered instrument using closed consumables that minimize contamination risks (table ). the donut pcr system is enabled by two inventions: ( ) reliable convection pcr using an annular reaction chamber, and ( ) a pre-quenched microarray that allows multiplexed readout via spatial separation. convection pcr achieves thermal cycling of a pcr reaction mixture using passive movement of fluid due to temperature-induced density differences, enabling an affordable, portable, low-power, and rapid turnaround instrument.
the pre-quenched microarray uses spatial separation of fluorescent probes that become unquenched upon hybridization by dna amplicons, enabling scalable multiplex readouts without open-tube wash steps. importantly, the pre-quenched microarray is integrated in the consumable allowing probe hybridization to occur concurrently with convection pcr amplification. consequently, unlike standard dna microarrays that visualize spot endpoint fluorescence after hours of hybridization, the donut pcr platform performs real-time detection and quantitation of dna in under minutes. the quantitative dynamic range is over logs, with a limit of detection of under genomic dna copies. use of toehold probes [ ] or x-probes [ ] for the microarray further provides single nucleotide discrimination that enables robust single nucleotide polymorphism (snp) genotyping. donut pcr mechanism and chip design. rayleigh-benard thermal convection is the physical principle that as an aqueous solution is heated, it becomes less dense and rises due to gravity, whereas, in contrast, a colder solution is denser and falls. in the donut pcr platform, we designed a chip that includes an annular (donut-shaped) reaction chamber in which the dna sample and pcr reagents are loaded. the chip is then sealed and vertically mounted on one side to a ∘ c heater and on the other side to a ∘ c heater (fig. ab) . fluid in the reaction chamber is heated at the ∘ c zone and rises to the top of the chamber where it is carried by momentum to the ∘ c zone. the fluid is then cooled at the ∘ c zone, and falls to the bottom of the chamber, where it is carried by momentum to the ∘ c zone, thus completing the thermal cycle. on the inner surface of the reaction chamber of the chip, we print a pre-quenched dna microarray (fig. c) to allow highly multiplexed probe-based readout. 
microarrays allow detection of up to hundreds of thousands of different nucleic acid targets using a single fluorescence channel [ , ] by spatially separating different probes. however, traditional microarrays are unsuitable for in vitro diagnostic (ivd) use because they require labor-intensive and open-tube wash steps to suppress fluorescence background. the team has developed a new pre-quenched microarray chemistry, in which unlabeled amplicons induce an increase of the corresponding spot's fluorescence via displacement of a quencher-functionalized oligonucleotide. because the pcr amplicons are unlabeled, no wash steps are needed to reduce fluorescent background from excess amplicons, and a highly multiplexed readout for many different dna targets can be achieved in a closed tube reaction without specialized opto-fluidic equipment. once the loaded chip is mounted against the heaters, the pcr reaction begins, and an external camera is used to periodically take pictures of the microarray area ( fig. d ). at early time points, probe spots on the microarray are dark except for positive control spots (fig. e) ; at the end of the reaction, the probes that are hybridized to dna amplicons become bright. importantly, in the donut pcr system, pcr amplification occurs concurrently with probe hybridization, so the progress of the amplification reaction can be tracked in real time, unlike standard microarrays (fig. f) . this real-time readout allows robust and accurate dna target quantitation based on the time at which the fluorescence significantly increases. in contrast, endpoint fluorescence quantitation for standard microarrays is known to be less stable due to sample contents, probe synthesis impurities, illumination non-uniformity, optical aberrance, physical smudges, and other factors. specificity, speed, sensitivity, and quantitation dynamic range. 
convection pcr was initially proposed and experimentally demonstrated in [ ] , using capillary tubes that are heated from the bottom. however, previous implementations of convection pcr have not entered mainstream use because they exhibited poor temperature uniformity that resulted in significant primer dimer formation and nonspecific genomic amplification. thus, one of the first priorities in building a convection-based massively multiplex qpcr platform is to ensure pcr amplification specificity. based on our understanding, the fluid circulation in a reaction tube or chamber adopts laminar flow, with the flow velocity dependent on the temperature differential of the different circuit paths. in many reaction chamber designs, there will be regions containing fluid paths with minimal temperature differential that have minimal fluid movement (supplementary section s ). these regions may result in disproportionate formation of primer dimers and nonspecific amplification. by engineering a donut-shaped reaction chamber in the pcr chip, we remove most of the dead volume, and are able to achieve similar pcr specificity on human genomic dna as the commercial bio-rad cfx instrument (fig. a) . in contrast, a "pizza" shaped reaction chamber and a capillary tube both result in significant primer dimer and nonspecific amplicon formation. next, we aimed to improve the amplification speed within the donut pcr chamber because rapid turnaround is highly desirable for point-of-care applications such as pathogen identification. the speed of fluid circulation in the donut pcr is impacted by the thickness of the reaction chamber, because in laminar flow, the fluid velocity near a surface is close to zero. fig. b shows that mean circulation velocity increases in thicker chambers, consistent with expectations, and plateauing at roughly m chamber thickness. 
at this thickness, for our standard chamber dimensions ( mm outer diameter, mm inner diameter), the chamber volume is approximately l, consistent with commercial qpcr reaction volumes. with a s circulation time, a standard -cycle pcr protocol can be completed within minutes.

[figure caption] pcr reaction mixture comprising dna template, primers, dna polymerase, and dntps is loaded via the bottom green loading port into the donut-shaped reaction chamber on the donut pcr chip. subsequently, the left side of the chip is heated to ∘ c and the right side to ∘ c. because water is less dense at higher temperatures, the reaction solution will rise on the left side as it is heated and fall on the right side as it is cooled, achieving autonomous thermal cycles with only two constant-temperature heaters. for each lap around the donut-shaped "racetrack", amplicons have a chance to hybridize to the surface-functionalized probe array printed in the ∘ c zone.

to evaluate the speed of pcr amplification and the dynamic range of the donut pcr platform, we next ran the donut pcr chip using different quantities of the na human cell line genomic dna (fig. d) , ranging from haploid genomic copies ( cell equivalents, pg) to haploid genomic copies. triplicate repeat experiments showed consistent transition time (t t ), defined as the time at which fluorescence first exceeds the threshold of . rfu. furthermore, the t t values for lower input dna quantities increased as predicted in a log-linear fashion, similar to qpcr, demonstrating a dynamic range of at least logs. none of the three negative control samples (water) had spot fluorescence exceeding the threshold, confirming specific pcr amplification and probe detection. next, we performed simultaneous quantitation of mouse and human dna on a donut pcr chip, in order to characterize potential interference with quantitation accuracy due to the presence of variable quantities of background dna (fig. ef) .
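The quantitation logic described above (a transition time defined by a fluorescence threshold, and a log-linear relationship between transition time and input copies) can be sketched as a small standard-curve calculation. This is a hedged illustration, not the authors' analysis pipeline: the function names are hypothetical, the threshold is a caller-supplied parameter, and all numbers in the usage lines are invented for the example.

```python
import math


def transition_time(times, fluorescence, threshold):
    """Return the first sampled time at which fluorescence exceeds threshold."""
    for t, f in zip(times, fluorescence):
        if f > threshold:
            return t
    return None  # the spot never crossed the threshold (e.g. a negative control)


def fit_standard_curve(copies, tt_values):
    """Least-squares fit of T_t = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(tt_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, tt_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_copies(tt, intercept, slope):
    """Invert the standard curve to estimate input copies from a measured T_t."""
    return 10 ** ((tt - intercept) / slope)


# Invented calibration points: more input DNA gives an earlier transition time.
intercept, slope = fit_standard_curve([10, 100, 1000], [30.0, 25.0, 20.0])
unknown = estimate_copies(25.0, intercept, slope)
```

The slope of the fitted line is negative because higher input quantities cross the threshold sooner, which is the same convention used for cycle-threshold standard curves in qpcr.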
we observe that as the stoichiometric ratio of mouse dna to human dna ranges from . to . , the value of t t for human dna is essentially unaffected. simultaneously, the log-linearity of the mouse dna t t value is also unaffected by the presence of human dna. collectively, these results suggest that the donut pcr platform could be used for multiplexed gene expression profiling.

[figure caption] ... pcr specificity than alternative implementations of convection pcr, enabling diagnostics-grade dna analysis. shown here is % agarose gel electrophoresis of single-plex pcr amplification products (human tfrc gene) using different platforms. lane shows results of convection pcr in a capillary tube using the pockit instrument (genereach, taiwan), based on principles described in ref. [ ] . hardly any intended amplicon molecules are generated, and the vast majority of amplification products are nonspecific amplicons or primer dimers. lane shows the results on the donut pcr platform using a chip without the "donut hole." there is significant nonspecific amplification and primer dimer formation due to the dead space in the middle with low temperature and circulation velocity.

spot position independence. the donut pcr platform leverages the embedded pre-quenched microarray to achieve massive multiplexing. for these multiplexing capabilities to be realized in a diagnostic setting, it is necessary that the observed results are reproducible and consistent across different spots. to characterize inter-spot consistency, we constructed a -spot donut pcr chip that includes quintuplicate repeat spots for probes against human genes, rat genes, and mouse genes, plus positive controls and negative control (fig. a) . we observe that although there is significant difference in endpoint spot fluorescence intensity, suspected to be primarily due to non-uniform illumination, the values of t t are well conserved across all replicate spots for each of the genes.

portable instrument.
experiments presented thus far have used a commercial fluorescence microscope as the readout instrument. to facilitate the adoption of the donut pcr platform for a variety of applications that require portability and rapid turnaround, we next designed and built a portable donut pcr instrument (fig. a). importantly, commercial qpcr instruments require wall power, preventing them from being rapidly deployed to point-of-care or field-use settings where rapid diagnostic testing may be needed. a major component of the power requirement is the need to rapidly cool pcr reaction mixtures from ∘ c (denaturing step) to ∘ c (annealing step). because donut pcr achieves rapid thermal cycling using passive fluidic flow, it requires only constant temperature heaters with low power consumption. our breadboard prototype (fig. b) is thus able to run off a mid-size v battery ( v× ah) for - experiments, and does not require connection to an external power source. there are main modules in the instrument: ( ) a mechanical mechanism to mount the chip against the heaters with sufficient force to ensure good thermal contact, ( ) a closed-loop thermal system to ensure the chip is differentially heated to the desired temperatures, ( ) an optics setup to illuminate the chip and filter out scattered light to reduce fluorescence background, ( ) a camera to acquire fluorescence images, and ( ) a microcontroller to coordinate timing of all components. in total, the cost of the components of this breadboard prototype was roughly $ , , with the bulk of the cost coming from the camera (an iphone s, $ ), the optical filters (thorlabs mf - and chroma at lp, $ ), the led light source (thorlabs led b, $ ), and the led driver (thorlabs m l , $ ). to validate the functionality of this prototype instrument, we constructed a -plex bacterial identification panel on the donut pcr chip.
the dna sequences encoding the s ribosomal rna and s ribosomal rna in bacteria are mostly conserved but contain hyper-variable regions whose sequences differ across species but are conserved within strains of the same species. for this -plex bacterial panel, we constructed different probes that target distinct s sequences (located in the v hyper-variable region) that serve as signatures for important bacterial species frequently implicated in nosocomial (hospital-acquired) infections, including the most common eskape [ ] bacteria. nosocomial infections cause roughly , deaths in the united states per year [ ], and rapid pathogen identification could help inform timely targeted antibiotic treatment that can improve patient outcomes and limit both spread and drug resistance [ , ]. our eskape panel performed as we expected on the breadboard prototype instrument, successfully identifying both a single bacterial species and a combination of two bacterial species. because the current chip and instrument design does not include sample preparation and dna extraction modules, we performed validation using dna input from clinically derived isolates and reference strains purchased commercially from the atcc collection and zeptometrix corp. (buffalo, ny). we individually tested each dna sample obtained (see supplementary section s for additional experimental data on the -plex bacterial panel using the breadboard prototype). because the breadboard prototype is open to the air, it is vulnerable to ambient dust that can occlude or distort the optics and to ambient light that reduces the image signal-to-noise ratio. we next contracted a third-party engineering firm to build a closed instrument with similar functionality to our breadboard prototype (fig. e). as expected, the signal-to-noise ratio of the fluorescent spots was improved, and the -plex bacterial identification panel was able to clearly identify all bacterial species in a mixed sample of clinical isolates (fig. f).
see supplementary section s for the revised chip layout for the production prototype instrument.

single nucleotide polymorphism (snp) genotyping. single nucleotide polymorphisms are natural variations in the human genome; over million distinct snp loci have been reported in the human genome [ , ]. genome-wide association studies (gwas) have discovered a wide range of snps that correlate with disease risk, including diabetes [ ], neurodegenerative disease [ , ], hereditary breast [ ] and colorectal cancers [ ], and coronary disease [ ]. in addition to human disease-related applications, snp genotyping can also be used for dna forensics applications [ ] as well as agricultural seed selection [ ]. currently, microarrays are the preferred technology for snp genotyping applications, due to their massive multiplexing, high automation, and acceptable economics (typically ≤$ per affymetrix array chip). however, analyzing a sample using a microarray takes approximately hours. furthermore, because microarray analysis requires large specialized instruments for hybridization, washing, and imaging, samples need to be transported to central labs for analysis, adding more days to the total turnaround time. while a - day turnaround is acceptable for non-time-sensitive applications, other snp genotyping applications require rapid turnaround. one prominent example is pharmacogenetics [ ], the use of snp genotyping information to inform dosage of drugs such as warfarin based on individualized drug metabolism rates [ ]. information to guide accurate dosage of warfarin to treat patients suffering from stroke or deep vein thrombosis needs to be provided immediately. for snp genotyping, we use the x-probe architecture [ ] to achieve probe-based single-nucleotide discrimination while limiting the reagent costs of chemically modified dna (fig. a; see also supplementary section s ). for each snp locus, we design two separate surface-bound x-probes, one for each allele.
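with one surface-bound probe per allele, a genotype call for a locus reduces to asking which of its two spots crosses the detection threshold over the course of the run. the sketch below illustrates that logic only; it is not the authors' calling algorithm, which may apply additional criteria. the function names, the allele labels "a" and "b", the threshold value, and the traces are all hypothetical.

```python
from typing import Optional, Sequence


def transition_time(times: Sequence[float], rfu: Sequence[float],
                    threshold: float) -> Optional[float]:
    """Return the first time at which fluorescence exceeds the threshold, else None."""
    for t, f in zip(times, rfu):
        if f > threshold:
            return t
    return None


def call_genotype(times, rfu_allele_a, rfu_allele_b, threshold):
    """Call a genotype from the time traces of the two allele-specific spots."""
    tt_a = transition_time(times, rfu_allele_a, threshold)
    tt_b = transition_time(times, rfu_allele_b, threshold)
    if tt_a is not None and tt_b is not None:
        return "heterozygous a/b"
    if tt_a is not None:
        return "homozygous a/a"
    if tt_b is not None:
        return "homozygous b/b"
    return "no call"


# Hypothetical traces (minutes, arbitrary fluorescence units) and threshold.
times = [0, 5, 10, 15, 20, 25]
rising = [0.0, 0.1, 0.4, 1.2, 2.5, 3.0]
flat = [0.0, 0.05, 0.05, 0.1, 0.1, 0.1]
THRESHOLD = 1.0  # assumed value for illustration only

het_call = call_genotype(times, rising, rising, THRESHOLD)
hom_call = call_genotype(times, rising, flat, THRESHOLD)
empty_call = call_genotype(times, flat, flat, THRESHOLD)
print(het_call, "|", hom_call, "|", empty_call)
```

working from the full time trace rather than a single endpoint image keeps the call tied to when each spot crosses the threshold, not to its final brightness.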
human dna samples that are homozygous at the snp locus will only have the corresponding allele spot light up, while samples that are heterozygous will have both spots light up (fig. b) . the overall workflow from buccal (cheek) swab sample collection takes less than hour, including less than minutes of hands-on time (fig. c ). see http://demovideo.torus.bio for a -minute video of the entire workflow. to showcase the validation and accuracy of snp genotyping by the donut pcr platform, we constructed a -spot array corresponding to the alternate alleles for different human snp loci. initial panel testing showed correct snp genotype calls for all loci for different human cell line gdna samples. next, this panel was applied to buccal swab samples from a family trio of mother, father, and child under informed consent, using the production prototype instrument shown in fig. e . endpoint fluorescence images of the samples are shown in fig. d , but as usual snp genotype calls are made using time-based fluorescence traces. the donut pcr platform presented here achieves rapid, sensitive, and quantitative detection of many dna targets from a single sample using a closed, portable, and affordable instrument. although we have limited our studies in this work to arrays of to spots, probe printing is a highly scalable process and we have demonstrated that over probes can be printed on a single donut pcr chip (supplementary section s ). we believe that the donut pcr platform will be competitive in a range of applications where rapid and highly multiplexed dna detection is needed in decentralized settings. with the recent coronavirus pandemic, one apt use case for the donut pcr may be in disease surveillance, e.g. at airports. rna viruses such as covid- , influenza, and hiv are particularly prone to genetic drift due to the high error rates of reverse transcriptase, and the nextstrain database reports thousands of genetic variants for each virus [ ] . 
consequently, single-plex isothermal amplification assays [ , ] are vulnerable to low and decreasing clinical sensitivity as virus genomes evolve away from the target sequences that single-plex assays are designed to detect. the high multiplexing of the donut pcr platform can overcome potential clinical false negatives by redundantly detecting many conserved pathogen-specific rna or cdna sequences. the single nucleotide specificity demonstrated by the donut pcr platform in fig. allows simultaneous detection of antibiotic resistance with pathogen identification. antibiotic resistances are typically caused either by gain of a gene (e.g. meca for methicillin resistance) or mutation of a gene (e.g. gyra for fluoroquinolone resistance [ ]). while other multiplex pcr platforms such as biofire [ ] or statdx [ ] can detect drug resistance due to gain of a gene, they typically lack the molecular specificity needed to identify resistance caused by point mutations. the ability to rapidly identify infections and prescribe effective antimicrobial therapies is especially needed to reduce the mortality and morbidity rates of nosocomial infections in the united states. the core innovations in the donut pcr platform are the donut-shaped chamber to allow reliable and low-power convection-based qpcr, and the integrated pre-quenched microarray to allow massively multiplexed readout through spatial separation. as a result, our experimental demonstrations in this manuscript were "dna in, answer out" workflows. we did not consider sample preparation and dna extraction modules. extensive prior work has reported a variety of different lab-on-a-chip approaches to process blood, nasal/buccal swab, urine, and cerebrospinal fluid. integration with a pre-analytical module will likely be necessary for adoption of the donut pcr platform in point-of-care settings for infectious disease diagnostic applications. another potential application area for donut pcr is multigene expression analysis.
cancer prognosis tests such as oncotype dx [ ] use tumor gene expression profiles to stratify patients for more aggressive treatment such as chemotherapy. recently, significant research on host biomarkers [ , ] also suggests that human gene expression can be used to differentiate exposure to bacteria vs. viruses, and could serve as a complementary set of markers to pathogen dna. we believe that donut pcr's ability to rapidly and simultaneously detect and quantitate many different nucleic acid markers positions it well as a platform for performing complex dna and rna diagnostics in settings convenient to the patient. acknowledgements. this work was supported by nih grant r ca to dyz. the authors thank jianyi nie for editorial assistance. the authors thank david walt for suggestions on instrumentation design. the authors thank torus biosystems for lending production prototype instruments used to collect data in fig. ef and fig. . author contributions. dk and dyz conceived the project. dk and dyz performed primer and probe design. dk performed chip design and construction. dk, jl, and jxz performed instrument design and construction. dk and dyz wrote the manuscript with input from all authors. additional information. we have complied with all relevant ethical regulations. correspondence may be addressed to dyz (dyz @rice.edu). there are patents issued on toehold probes and x-probes used in this work. there are patents pending on the donut pcr chip and donut pcr instrument presented in this work. dk, jl, and dyz declare competing interests in the form of employment (dk) or consulting (jl and dyz) for torus biosystems. jxz and dyz declare competing interests in the form of employment (jxz) or consulting (dyz) for nuprobe. dyz declares a competing interest in the form of consulting for avenge bio. software and data availability. raw fluorescence images/movies and code for image analysis are available upon request. 
coming of age: ten years of next-generation sequencing technologies
next-generation sequencing platforms
dna microarray technology: devices, systems, and applications
revisiting global gene expression analysis
comparing whole genomes using dna microarrays
normalization of microarray data: single-labeled and dual-labeled arrays
kinetic pcr analysis: real-time monitoring of dna amplification reactions
a practical approach to rt-qpcr publishing data that conform to the miqe guidelines
nucleic acid isothermal amplification technologies, a review
isothermal nucleic acid amplification technologies for point-of-care diagnostics: a critical review
multicenter evaluation of the biofire filmarray gastrointestinal panel for etiologic diagnosis of infectious gastroenteritis
the oxford nanopore minion: delivery of nanopore sequencing to the genomics community
genome-wide association studies for complex traits: consensus, uncertainty and challenges
the new nhgri-ebi catalog of published genome-wide association studies (gwas catalog)
pharmacogenomics of gpcr drug targets
genetics, genomics, and cancer risk assessment: state of the art and future directions in the era of personalized medicine
whole-genome sequencing to control antimicrobial resistance
antimicrobial resistance: the example of staphylococcus aureus
optimizing the specificity of nucleic acid hybridization
simulation-guided dna probe design for consistently ultraspecific hybridization
pcr in a rayleigh-benard convection cell
rapid dna amplification in a capillary tube by natural convection with a single isothermal heater
clinical relevance of the eskape pathogens
health care-associated infection - an overview. infection and drug resistance
dbsnp: the ncbi database of genetic variation
kaviar: an accessible system for testing snv novelty. bioinformatics
the genetic architecture of type diabetes
genome-wide meta-analysis identifies new loci and functional pathways influencing alzheimer's disease risk
a meta-analysis of genome-wide association studies identifies new parkinson's disease risk loci
olaparib for metastatic breast cancer in patients with a germline brca mutation
milestones of lynch syndrome
a comprehensive genomes-based genome-wide association meta-analysis of coronary artery disease
inter-laboratory evaluation of snp-based forensic identification by massively parallel sequencing using the ion pgm
a robust, simple genotyping-by-sequencing (gbs) approach for high diversity species
genetics and the clinical response to warfarin and edoxaban: findings from the randomised, double-blind engage af-timi trial
nextstrain: real-time tracking of pathogen evolution
high proportion of fluoroquinolone-resistant mycobacterium tuberculosis isolates with novel gyrase polymorphisms and a gyra region associated with fluoroquinolone susceptibility
multicenter evaluation of the qiastat respiratory panel - a new rapid highly multiplexed pcr based assay for diagnosis of acute respiratory tract infections
comparison of pam risk of recurrence score with oncotype dx and ihc for predicting risk of distant recurrence after endocrine therapy
host biomarkers for distinguishing bacterial from non-bacterial causes of acute febrile illness: a comprehensive review
circulating micrornas as potential biomarkers of infectious disease

key: cord- -n vdxdhp authors: rhea, elizabeth m.; logsdon, aric f.; hansen, kim m.; williams, lindsey; reed, may; baumann, kristen; holden, sarah; raber, jacob; banks, william a.; erickson, michelle a. title: the s protein of sars-cov- crosses the blood-brain barrier: kinetics, distribution, mechanisms, and influence of apoe genotype, sex, and inflammation date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: n vdxdhp evidence strongly suggests that sars-cov- , the cause of covid- , can enter the brain. sars-cov- enters cells via the s subunit of its spike protein, and s can be used as a proxy for the uptake patterns and mechanisms used by the whole virus; unlike studies based on productive infection, viral proteins can be used to precisely determine pharmacokinetics and biodistribution. here, we found that radioiodinated s (i-s ) readily crossed the murine blood-brain barrier (bbb). i-s from two commercial sources crossed the bbb with unidirectional influx constants of . ± . μl/g-min and . ± . μl/g-min and was also taken up by lung, spleen, kidney, and liver. i-s was uniformly taken up by all regions of the brain and inflammation induced by lipopolysaccharide reduced uptake in the hippocampus and olfactory bulb. i-s crossed the bbb completely to enter the parenchymal brain space, with smaller amounts retained by brain endothelial cells and the luminal surface. studies on the mechanisms of transport indicated that i-s crosses the bbb by the mechanism of adsorptive transcytosis and that the murine ace receptor is involved in brain and lung uptake, but not that by kidney, liver, or spleen. i-s entered brain after intranasal administration at about / th the amount found after intravenous administration and about . % of the intranasal dose entered blood. apoe isoform or sex did not affect whole brain uptake, but had variable effects on olfactory bulb, liver, spleen, and kidney uptakes. in summary, i-s readily crosses the murine bbb, entering all brain regions and the peripheral tissues studied, likely by the mechanism of adsorptive transcytosis. graphical abstract radioactive labeling of proteins the s proteins (raybiotech, cat no - , peachtree corners, ga; amsbio, ams.s n-c h , abingdon, uk) were provided by the manufacturer dissolved in phosphate buffered saline (pbs, ph . ) at a concentration ranging from . - . mg/ml. 
upon receipt, the s proteins were thawed and aliquoted into µg portions and either used immediately or stored at - oc until use. the µg of thawed s protein was radioactively labeled with mci i (perkin elmer, waltman, ma) using the chloramine-t method , as described and the radioiodinated s (i-s ) purified on a column of g sephadex (ge healthcare, uppsala, se), eluting with phosphate buffer (pb) into glass tubes containing % bovine serum albumin in lactated ringer's solution (bsa-lr). bovine serum albumin (sigma, st. louis, mo) was labeled with mtc (ge healthcare, seattle, wa) using the stannous tartrate method and the mtclabeled albumin (t-alb) purified on a column of g- sephadex. both of the i-s 's and the t-alb were more than % acid precipitable. the molecular weight of the labeled proteins was further confirmed by running , - , cpm activity in x lds buffer (invitrogen) with or without reducing agent (invitrogen) on a - % bis-tris gel (genescript) in -(n-morpholino)propanesulfonic acid (mops) buffer (invitrogen). the gel was then fixed for min in % acetic acid/ % methanol, washed x with water, and then dried using a dryease® mini-gel drying system (invitrogen). dried gels were exposed on autoradiography film for hours and then developed. supplemental figure shows that major bands of i-s from raybiotech and amsbio migrated at their predicted molecular weight patterns, based on manufacturer data. in anesthetized mice, the left jugular vein was exposed for an intravenous (iv) injection of . ml bsa-lr containing x cpm of i-s and x cpm of t-alb. at time points between and min, blood was collected from the carotid artery. blood was centrifuged at xg for min and ul serum was collected. the whole brain, kidney, and spleen and portions of the lung and liver were removed and weighed. tissues and serum were placed into a wizard gamma counter (perkin elmer), and the levels of radioactivity were measured. 
results for serum were expressed as the percent of the injected dose per ml of blood (%inj/ml). results for brain and tissues were expressed as the tissue/serum ratio in units of µl/g. for each individual tissue, its ratio for t-alb was subtracted from its ratio for i-s , yielding the "delta" value which reflects values corrected for vascular space and any nonspecific leakage into tissue. the delta brain/serum ratios were plotted against exposure time (expt), a calculation that corrects for clearance from blood:

$$\mathrm{Expt} = \frac{\int_0^t C_p(\tau)\,d\tau}{C_{pt}}$$

where $t$ is the time between the iv injection and sampling, $C_p$ is the cpm/ml of arterial serum, $C_{pt}$ is the cpm/ml of arterial serum at time $t$, and $\tau$ is the dummy variable for time. the slope of the linear portion of the relation of tissue/serum ratio vs expt measures the unidirectional influx rate (ki in units of µl/g-min) and the y-intercept measures vi, the vascular space , . the area under the curve for the level of radioactivity in blood from - min was calculated using prism . software (graphpad inc, san diego, ca). the percent of the injected dose per gram of brain (%inj/g) was calculated by multiplying the delta brain/serum value by the %inj/ml for i-s .

anesthetized mice received an injection into the jugular vein of . ml of bsa-lr containing x cpm of i-s plus x cpm of t-alb. ten minutes after the injection, arterial blood was collected from the abdominal aorta, the thorax opened and the descending thoracic artery clamped, both jugular veins severed, and ml of lactated ringer's solution perfused through the left ventricle of the heart to wash out the vascular space of the brain. the whole brain was then removed. whole blood was centrifuged and µl of serum was added to µl bsa-lr and combined with an equal part of % trichloroacetic acid, mixed, centrifuged, and the supernatant and pellet counted.
whole brain was homogenized in bsa-lr plus complete mini protease inhibitor (roche, mannheim, de; one tab per ml buffer) with a hand-held glass homogenizer, and centrifuged. the resulting supernatant was combined with equal parts of % trichloroacetic acid, mixed, centrifuged, and the supernatant and pellet counted. to determine the amount of degradation that occurred during processing, i-s and t-alb were added to brains and arterial whole blood from animals that had not been injected with radioactivity and processed immediately as above. the percent of radioactivity that was precipitated by acid (%p) in all of these samples was calculated by the equation:

$$\%P = 100 \times \frac{P}{S + P}$$

where $S$ is the cpm in the supernatant and $P$ is the cpm in the pellet.

the capillary depletion method as adapted to mice was used to separate cerebral capillaries and vascular components from brain parenchyma , . we used the variant of the technique that also estimates reversible binding to the capillary lumen. mice were anesthetized and received an iv injection of x cpm t-alb with x cpm i-s in . ml bsa-lr. , , and min later, blood was obtained from the carotid artery, and the brain (non-washout) was extracted. in other mice, blood was taken from the abdominal aorta at , , and min, the thorax was opened and the descending thoracic artery clamped, both jugulars severed, and ml of lactated ringer's solution infused via the left ventricle of the heart in order to wash out the vascular contents of the brain and to remove any material reversibly associated with the capillary lumen. each whole brain was homogenized in glass with physiological buffer ( mm hepes buffer, mm nacl, mm kcl, . mm cacl , mm mgso , mm nah po -h o, mm d-glucose, ph . ) and mixed thoroughly with % dextran. the homogenate was centrifuged at xg for min at °c. the pellet, containing the capillaries, and the supernatant, representing the brain parenchymal space, were carefully separated.
radioactivity levels in the capillary pellet, the brain supernatant, and the arterial serum were determined for both t-alb and i-s and expressed as the capillary/serum and brain parenchyma/serum ratios. the i-s parenchymal ratios were corrected for vascular contamination by subtracting the corresponding ratios for t-alb; these results are reported as the delta brain parenchyma/serum ratios. the amount of s in the brain parenchymal space was taken as the delta brain parenchymal space from the washout group (pw), the amount in the capillary as the capillary from the washout group (cw), and the amount of material loosely binding to the luminal surface (luminal) as:

$$\mathrm{luminal} = (C + P) - (C_w + P_w)$$

where $P$ and $C$ are the delta brain parenchymal space and the capillary space, respectively, both from the nonwashout groups.

mice were anesthetized, after which the jugular vein and right carotid artery were exposed. they were then given a jugular vein injection of . ml bsa-lr containing x cpm of i-s and x cpm of t-alb. for some mice, μg/mouse of a plant lectin (wheat germ agglutinin [wga], sigma, st. louis, mo) was included in the iv injection. brain, tissues, and serum samples were collected min later and tissue/serum ratios calculated.

in anesthetized mice, the left jugular vein was exposed for an iv injection of . ml bsa-lr containing x cpm of i-s (raybiotech) and x cpm of t-alb. for some mice, the injection contained µg/mouse of unlabeled s (amsbio), mouse acyl ghrelin (cbio, menlo park, ca), angiotensin ii (tocris, bristol, uk), or human ace (r&d, minneapolis, mn). ten min after iv injection, whole blood was obtained from the carotid artery and centrifuged after clotting. the whole brain, olfactory bulb, kidney, spleen, and portions of the liver and lung were removed. the levels of radioactivity in the arterial serum and the tissues were determined and the results expressed as the percent of the injected i-s per ml for serum and delta tissue/serum ratios (µl/g) for the tissues.
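the capillary depletion bookkeeping described earlier in this section (parenchyma and capillary values from the washout group, plus a loosely bound luminal fraction) can be sketched as follows. this is an illustration under stated assumptions, not the authors' code: the luminal fraction is assumed to be the non-washout total minus the washout total, the function and argument names are hypothetical, and the input ratios are made-up values in µl/g.

```python
def partition_brain_uptake(p_nonwashout, c_nonwashout, p_washout, c_washout):
    """Split brain-associated tracer into parenchymal, capillary, and luminal parts.

    p_* are delta brain parenchyma/serum ratios and c_* are capillary/serum
    ratios (µl/g). The luminal part is assumed to be whatever the vascular
    washout removed: (C + P) from non-washout minus (Cw + Pw) from washout.
    """
    luminal = (c_nonwashout + p_nonwashout) - (c_washout + p_washout)
    total = p_washout + c_washout + luminal  # equals c_nonwashout + p_nonwashout
    return {
        "parenchyma": p_washout,
        "capillary": c_washout,
        "luminal": luminal,
        "percent_parenchyma": 100.0 * p_washout / total,
        "percent_capillary": 100.0 * c_washout / total,
        "percent_luminal": 100.0 * luminal / total,
    }


# Hypothetical ratios in µl/g, for illustration only.
result = partition_brain_uptake(p_nonwashout=8.0, c_nonwashout=4.0,
                                p_washout=7.0, c_washout=2.0)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

expressing the three compartments as percentages of the non-washout total makes it easy to see whether most of the tracer has fully crossed into parenchyma or remains associated with the vasculature.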
male cd- mice aged - weeks were given an ip injection of mg/kg lps from salmonella typhimurium (sigma, st. louis, mo) dissolved in sterile normal saline at , , and h. at h, mice were anesthetized and the left jugular vein and right carotid artery exposed. the mice were given an iv injection of x cpm of i-s and x cpm t-alb in . ml of bsa/lr into the left jugular vein. arterial blood was collected from the right carotid artery min later, the mouse immediately decapitated, the brain removed, dissected into regions (olfactory bulb, frontal cortex, occipital cortex, parietal cortex, thalamus, hypothalamus, striatum, hippocampus, pons-medulla, cerebellum, midbrain), and the regions weighed. kidney, spleen, and portions of the liver and lung were also removed and weighed. serum was obtained by centrifuging the carotid artery blood for min at xg. levels of radioactivity in serum, brain regions, and tissues were measured in a gamma counter. whole brain values were calculated by summing levels of radioactivity and weight for all brain regions except for olfactory bulb. the levels of radioactivity in the arterial serum and the brain regions and tissues were determined and the results expressed as the percent of the injected i-s per ml for serum and delta tissue/serum ratios (µl/g) for the tissues. anesthetized mice were placed in the supine position and received a µl injection of x - x cpm i-s in bsa/lr administered to each naris, delivered to the level of the cribriform plate ( mm depth), using a µl multi-flex tip (thermo fisher scientific, waltham, ma). after administration, the mouse remained in the supine position for s before being placed on the left side. 
arterial blood was collected from the right carotid artery or min later, the mouse immediately decapitated, the brain removed, dissected into regions (olfactory bulb, frontal cortex, occipital cortex, parietal cortex, thalamus, hypothalamus, striatum, hippocampus, pons-medulla, cerebellum, midbrain), and the regions weighed. serum was obtained by centrifuging the carotid artery blood for min at xg. levels of radioactivity in serum and brain regions were measured in a wizard gamma counter. whole brain values were calculated by summing levels of radioactivity and weight for all brain regions except for olfactory bulb. the levels of radioactivity in the arterial serum and the tissues were determined and the results expressed as the percent of the injected i-s per ml for serum and the percent of the injected i-s per g of brain region calculated. multiple-time regression analysis as described above was performed in female and male homozygous human e -and e -targeted replacement (tr) mice. two-way anova used sex and apoe isoform as independent variables. transport by ipsc-derived brain endothelial-like cells (ibecs) the ibecs were derived from the gm ipsc line (coriell institute) using the method of neal et al with a seeding density of , cells per well for differentiation which was found to be optimal for this cell line. briefly, ipscs were grown to optimal density on plates coated with matrigel (vwr cat no. - ) in e flex medium (thermo fisher scientific cat no. a ), and then passaged using accutase (thermo fisher scientific cat no. a ) onto matrigel-coated plates in e flex medium plus µm rock inhibitor y- (r&d systems, cat no. ). the next day, the medium was changed to e (thermo fisher scientific, cat no. a ) and e changes continued daily for more days. next, the medium was changed to human endothelial serum-free medium (hesfm, thermo fisher scientific, cat no. ) supplemented with ng/ml bfgf (peprotech, cat no. - b), µm retinoic acid (sigma, cat no. 
r ), and % b supplement (thermo fisher scientific, cat no. ). h later, ibecs were subcultured onto -well transwell inserts (corning cat no. ) coated with mg/ml collagen iv (sigma, cat no. c ) and mm fibronectin (sigma, cat no. f ) in hesfm + ng/ml bfgf, µm retinoic acid, and % b . h after subculture, the medium was changed to hesfm + % b without bfgf or retinoic acid, and transendothelial electrical resistance (teer) was recorded using an end evom voltohmmeter (world precision instruments, sarasota, florida) coupled to an endohm cup chamber. teer measurements occurred daily and s transport experiments were conducted when teer stabilized, between - days in vitro. prior to transport studies, the medium was changed and cells were equilibrated in the incubator for min. warm hesfm + % b , million cpm t-alb, and , cpm of i-s were then added in a volume of µl to the luminal chamber. after incubation times of , , , and min at °c, µl volumes of medium from the abluminal chamber were collected and replaced with fresh pre-warmed medium. samples were then acid precipitated by adding a final concentration of % bsa to visualize the pellet and % trichloroacetic acid to precipitate the proteins in solution. the samples were centrifuged at xg for min at °c. radioactivity in the pellet was counted in the gamma counter and the permeability-surface area coefficients for t-alb and i-s were calculated according to the method of dehouck et al . clearance was expressed as microliters (µl) of radioactive tracer that was transported from the luminal to abluminal chamber, and was calculated from the initial level of acid-precipitable radioactivity added to the luminal chamber and the final level of radioactivity in the abluminal chamber:

$$\mathrm{clearance}\ (\mu\mathrm{l}) = \frac{[C]_C \times V_C}{[C]_L}$$

where $[C]_L$ is the initial concentration of radioactivity in the luminal chamber (in units of cpm/µl), $[C]_C$ is the concentration of radioactivity in the abluminal chamber (in units of cpm/µl) and $V_C$ is the volume of the abluminal chamber in µl.
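the clearance calculation, and its conversion to a permeability coefficient for the cell monolayer alone, can be sketched as below. this is a simplified illustration rather than the authors' procedure: it assumes clearance is the abluminal concentration times abluminal volume divided by the initial luminal concentration, it assumes the usual series correction 1/psapp = 1/psmembrane + 1/pse, and it ignores any correction for tracer removed at each sampling. all names and numbers are hypothetical.

```python
def clearance_ul(c_abluminal_cpm_per_ul, v_abluminal_ul, c_luminal_cpm_per_ul):
    """Volume of luminal medium (µl) effectively cleared into the abluminal chamber."""
    return c_abluminal_cpm_per_ul * v_abluminal_ul / c_luminal_cpm_per_ul


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def endothelial_permeability(ps_app, ps_membrane, area_cm2):
    """Remove the bare-membrane contribution, then normalise by insert area."""
    ps_e = 1.0 / (1.0 / ps_app - 1.0 / ps_membrane)  # µl/min for the monolayer alone
    return ps_e / area_cm2                            # µl/min/cm^2


# Hypothetical time course (min) and cleared volumes (µl), for illustration only.
minutes = [10.0, 20.0, 30.0, 40.0]
cleared_with_cells = [1.0, 2.0, 3.0, 4.0]     # monolayer plus membrane
cleared_membrane = [5.0, 10.0, 15.0, 20.0]    # membrane without cells

ps_app = least_squares_slope(minutes, cleared_with_cells)
ps_membrane = least_squares_slope(minutes, cleared_membrane)
pe = endothelial_permeability(ps_app, ps_membrane, area_cm2=0.5)
print(f"PSapp = {ps_app:.3f}, PSmembrane = {ps_membrane:.3f}, Pe = {pe:.3f} µl/min/cm^2")
```

subtracting reciprocals treats the monolayer and the bare membrane as two resistances in series, so a leaky membrane barely changes the result while a tight one is properly discounted.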
the volume cleared was plotted vs. time, and the slope was estimated by linear regression. the slope of the clearance curve for the ibec monolayer plus transwell® membrane was denoted by psapp, where ps is the permeability × surface area product (in μl/min). the slope of the clearance curve for a transwell® membrane without ibecs was denoted by psmembrane. the ps value for the ibec monolayer (pse) was calculated from

$$\frac{1}{PS_{app}} = \frac{1}{PS_{membrane}} + \frac{1}{PS_e}$$

the pse values were divided by the surface area of the transwell® inserts ( . cm²) to generate the endothelial permeability coefficient (pe, in µl/min/cm²).

means are reported with their se and n. two-tailed t-tests were used to compare two means, and analysis of variance (anova) followed by a multiple comparisons test was used when more than two means were compared. the prism . statistical software package was used for all statistical calculations (graphpad inc, san diego, ca). linear regression lines were calculated and their slopes and intercepts statistically compared using the program in prism . software. as required by multiple-time regression analysis, only the linear portion of the slope was used to calculate ki. outliers whose exclusion improved the r > . were excluded from analysis. more than two slopes were compared by anova; the standard error of the slope computed by the prism program was used, with the degrees of freedom being n - .

figure compares the i-s proteins from two sources, raybiotech and amsbio, for their abilities to cross the bbb as assessed by mtra. also shown are values for the vascular space marker t-alb, which acted as an internal control for each animal studied. for this figure only, to facilitate comparison to t-alb, brain/serum ratios rather than delta brain/serum ratios are shown. there was a strong, positive relation between brain/serum ratios and expt for both i-s proteins: r = . , p< . (raybiotech); r = . , p< . (amsbio), demonstrating transport across the bbb.
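the regression behind ki and vi in multiple-time regression analysis can be sketched as follows. it is a minimal illustration, not the authors' prism workflow: exposure time is assumed to be the integral of serum cpm/ml from time zero to the sampling time divided by serum cpm/ml at the sampling time, the integral is approximated with the trapezoid rule, and a single serum curve is used for all time points for simplicity. function names and data are hypothetical.

```python
def exposure_times(times_min, serum_cpm_per_ml):
    """Exposure time at each sample: running serum AUC divided by current serum level.

    times_min must start at 0 so that the integral starts at the injection.
    """
    expt = []
    auc = 0.0
    for i, (t, cp) in enumerate(zip(times_min, serum_cpm_per_ml)):
        if i > 0:
            dt = t - times_min[i - 1]
            auc += 0.5 * (serum_cpm_per_ml[i - 1] + cp) * dt  # trapezoid rule
        expt.append(auc / cp)
    return expt


def linear_fit(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical serum curve (cpm/ml) falling over 30 min, for illustration only.
times = [0.0, 5.0, 10.0, 20.0, 30.0]
serum = [100.0, 80.0, 60.0, 40.0, 25.0]
expt = exposure_times(times, serum)

# Synthetic brain/serum ratios (µl/g) built from a known influx rate and intercept.
true_ki, true_vi = 0.5, 10.0
ratios = [true_vi + true_ki * e for e in expt]

ki, vi = linear_fit(expt, ratios)
print("Expt (min):", [round(e, 3) for e in expt])
print(f"Ki = {ki:.3f} µl/g-min, Vi = {vi:.3f} µl/g")
```

because serum levels fall after injection, exposure time runs ahead of clock time, which is why the ratios are regressed against expt rather than against minutes since injection.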
the unidirectional influx rates (ki's) for both compounds were nearly identical: . ± . µl/g-min (raybiotech) vs . ± . µl/g-min (amsbio). the vi values were also similar: . ± . µl/g (raybiotech) vs . ± . µl/g (amsbio). comparison of brain/serum ratio vs expt lines revealed no statistical difference between the ki's, but there was a statistical difference in the vi's: f( , ) = . , p = . , with the vi for amsbio i-s being lower. t-alb showed no correlation between its brain/serum ratios and expt and so its values only reflected the vascular space of the brain. figure shows the clearance of i-s (raybiotech) from blood and its uptake by brain and other tissues. clearance from blood was linear for the first min with a half-time clearance of about . min ( figure a ). for the entire min of the experiment, data were fit to a one phase decay model with y = . , plateau = . , k = . , and half life = . min, r = . . the delta tissue/serum ratios vs expt for brain and other tissues, that is, tissue/serum ratios that have been corrected for t-alb, are shown in figure b -f. t-alb measures both the vascular space of the tissue as well as any leakage into the tissue bed. since i-s is about twice the size of t-alb and leakage is usually a function of the inverse square root of the molecular weight, t-alb is an appropriate control for leakage of i-s as well. delta tissue/serum ratios, therefore, represent the selective uptake of i-s . brain had a ki = . ± . µl/g-min, r = . , p< . ( figure b ). uptake patterns and rates are shown in figure for lung ( c, r = . , p< . ), spleen ( d, r = . , p = . ), kidney ( e, r = . , p = . ), and liver ( f, r = . , p = . ). brain, lung, and kidney showed linear uptake throughout the study. spleen and liver had nonlinear relations with time, consistent with an efflux of s back into the blood stream. table illustrates the stability of i-s from raybiotech in blood and brain as measured by acid precipitation.
processing controls show the methodology did not itself result in degradation. some degradation did occur in blood with an acid precipitation value (uncorrected by the processing control) of about %. in contrast, brain showed good acid precipitation (~ %) at min but was at about % by min. this likely represents, at least in part, deiodinase activity, which is very concentrated in the brain, as well as degradation of the s protein itself in brain. nevertheless, the results indicate that brain kinetics are best calculated based on data taken within a short time after iv administration. figure shows the clearance of i-s (amsbio) from blood and its uptake by brain and other tissues. in general, patterns were very similar to the i-s from raybiotech, but there were a few differences. the half-time clearance rate during the first min after injection was . min. data for the entire min also were fit to a one phase decay model: y = . , plateau = . , k = . , and half life = . min, r = . . thus, clearance was somewhat faster for amsbio i-s and the volume of distribution in the body somewhat less. brain had a ki = . ± . µl/g-min, r = . , p< . ( figure b ). uptake patterns and rates are shown in figure for lung ( c, r = . , p< . ), spleen ( d, r = . , p = . ), kidney ( e, r = . , p = . ), and liver ( f, r = . , p = . ). only liver showed a nonlinear pattern of uptake, and uptake was nearly times greater in liver for amsbio i-s compared to raybiotech s . some of the delta values for amsbio i-s were negative at the early time points, indicating that the s brain/serum ratio was lower than the t-alb brain/serum ratio at those time points. for amsbio i-s , values for lung, spleen, and kidney were very similar, whereas i-s from raybiotech showed values for spleen that were about times greater than that of lung and kidney.
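The one phase decay model used for the blood clearance curves has a simple closed form, sketched here for orientation. The parameterization below (starting value, plateau, rate constant k) follows the usual definition of this model and is an assumption about how the reported parameters map onto the curve; the numeric values in the usage lines are hypothetical, not the fitted values from the study.

```python
import math


def one_phase_decay(t, y0, plateau, k):
    """Y = (Y0 - plateau) * exp(-k * t) + plateau."""
    return (y0 - plateau) * math.exp(-k * t) + plateau


def half_life(k):
    """Time for the decaying span (Y0 - plateau) to fall by half: ln(2) / k."""
    return math.log(2) / k


if __name__ == "__main__":
    # Hypothetical parameters: %inj/ml at time 0, plateau, and k in 1/min.
    y0, plateau, k = 50.0, 10.0, 0.2
    print(half_life(k))
    print(one_phase_decay(half_life(k), y0, plateau, k))
```

At one half-life the curve sits midway between its starting value and the plateau, which is why a faster k corresponds to a shorter half-life and faster clearance from blood.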
capillary depletion is a quality control method to assure that material taken up by the bbb is completely crossing the bbb rather than being sequestered by the capillary bed. we used a modified version of this method that also allows us to measure the amount of material that is reversibly adhering to the luminal side of the capillary bed. figure shows that the amount of i-s (raybiotech) increased with time in the parenchymal (m = . ± . µl/g-min, r = . , n = , p< . ) and capillary (m = . ± . µl/g-min, r = . , n = , p = . ) fractions, whereas it remained constant for luminal binding (r = . ) with a mean of . ± . µl/g. wheatgerm agglutinin (wga) is a plant lectin that facilitates uptake of many glycoproteins, including those of viruses, and crosses the bbb by the mechanism of adsorptive transcytosis , . here, we used -way anovas followed by multiple comparisons test to compare effects of wga treatment on the two sources of s proteins (n = for each vehicle-treated group, n = for wga-treated raybiotech group, n = for wga-treated amsbio group). figure shows the effect of wga on uptake by brain and peripheral tissues for i-s from both raybiotech and amsbio. figure a also shows serum values; the effect of wga on i-s from raybiotech suggests increased clearance from blood compared to i-s from amsbio (s source, treatment, interaction all significant at p< . ; multiple comparisons showed raybiotech i-s different from all other groups at p< . with no other groups being different). for brain ( figure b ), s source, treatment, and interaction were all significant at p< . . wga increased brain uptake for both i-s 's (raybiotech: p = . ; amsbio: p< . ), with a greater effect on amsbio s (p< . ). two-way anova for lung ( figure c ) showed an effect for wga-treatment (p< . ), and multiple comparisons test showed an increase for each i-s with wga treatment (both at p< . ), but no difference between the two i-s 's.
for spleen ( figure d ), two-way anova showed an effect for s source and treatment, both at p< . and a trend for interaction (p = . ). wga increased uptake for raybiotech i-s (p< . ) but only induced a trend for amsbio i-s (p = . ); the difference between the two wga-treated i-s groups was significant (p< . ). for kidney ( figure e ), two-way anova showed differences for s source and treatment, both at p< . . wga increased uptake by kidney of both raybiotech i-s (p = . ) and amsbio i-s (p = . ) with a difference between the two wga-treated groups (p = . ). for liver ( figure f ), two-way anova showed a significant effect for source of s , wga treatment and interaction, all at p< . . multiple comparisons test showed wga significantly decreased uptake of amsbio i-s (p< . ) but not raybiotech i-s with no difference between the two wga-treated groups. it was next determined whether s transport into the brain could be saturated by excess unlabeled s protein, or if transport could be affected by ace or its substrates angii or ghrelin. the effects of µg/mouse of unlabeled proteins were compared by one-way anova followed by multiple comparisons test with comparison to the vehicle groups only. none of these substances affected the level of delta i-s or t-alb in blood, indicating that these substances did not affect the volume of distribution or clearance of either s or albumin. ace did increase the renal level of t-alb: f( , ) = . , p = . ; vehicle: ± . µl/g (n = ); ace : ± . µl/g (n = ), p = . ; (data not shown). ace also increased uptake of delta i-s by whole brain: f( , ) = . , p = . ; vehicle: . ± . µl/g (n = ); ace : . ± . µl/g (n = ), p = . ( figure a ). additionally, by t-test, unlabeled s decreased uptake of i-s by whole brain (p = . , f = . , df = ). lung uptake of delta i-s was affected by all compounds except angii ( figure b ; note: one outlier of µl/g was removed from the ghrelin group): f( , ) = . , p = . ; vehicle: + . µl/g (n = ); ace : + . µl/g (n = ), p = . ; s : + .
µl/g (n = ), p = . ; ghrelin: + . µl/g (n = ), p = . . none of the substances affected the uptake of i-s by liver, kidney, or spleen (data not shown). figure shows uptake of i-s (raybiotech) by brain regions; these values are the vehicle controls for the inflammation study (see figure ). one-way anova showed a trend (p = . ) and multiple comparisons tests comparing all regions to one another or comparing to whole brain did not show any statistically significant differences or trends among the regions. results of treating mice with lps are shown in figure . figure a shows that lps treatment increased the %inj/ml for raybiotech i-s (t = . , df = , p = . ), indicating a reduction in volume of distribution or clearance. there was no effect of lps treatment on t-alb (p = . ), demonstrating that the lps effect on i-s was not caused by changes in vascular space or leakage. lps treatment significantly increased lung uptake of i-s (t = . , df = , p = . ), while decreasing uptake for spleen (t = . , df = , p< . ) and liver (t = . , df = , p = . ) with a trend for decreasing it for kidney (p = . ) ( figure b ). two-way anova showed a brain region effect (p = . ) accounting for . % of the variability, and an effect of lps treatment (p = . ) accounting for . % of the variability ( figure c ). multiple comparisons test that compared each vehicle-treated group to its lps-treated group found a significant decrease with lps treatment for olfactory bulb (p = . ). comparing vehicle- and lps-treated groups by t-test showed decreases with lps treatment for olfactory bulb (p = . ) and hippocampus (p = . ) with a trend for parietal cortex (p = . ). radioactivity appeared in blood after the intranasal administration of i-s , indicating that some of the material entered the blood stream (data not shown). the auc for blood after intranasal administration was . (%inj/ml)-min, compared with an auc for blood after iv injection of (%inj/ml)-min, giving a bioavailability for the nasal route of . %.
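The bioavailability figure for the nasal route is the ratio of the two blood AUCs. A minimal sketch follows, assuming the AUC is taken by the trapezoidal rule over the sampled blood levels; the integration method is not stated in the text, and the sample values below are hypothetical.

```python
def auc_trapezoid(times_min, levels):
    """Area under a sampled curve by the trapezoidal rule, in (%inj/ml)-min."""
    area = 0.0
    for i in range(1, len(times_min)):
        width = times_min[i] - times_min[i - 1]
        area += 0.5 * (levels[i] + levels[i - 1]) * width
    return area


def bioavailability_percent(auc_route, auc_iv):
    """Relative bioavailability of a route as a percentage of the iv AUC."""
    return 100.0 * auc_route / auc_iv


if __name__ == "__main__":
    # Hypothetical blood levels (%inj/ml) after intranasal and iv dosing.
    times = [0, 10, 30]
    intranasal = [0.0, 0.02, 0.03]
    intravenous = [50.0, 20.0, 8.0]
    print(bioavailability_percent(auc_trapezoid(times, intranasal),
                                  auc_trapezoid(times, intravenous)))
```

Both curves must cover the same time window for the ratio to be meaningful.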
radioactivity was found in all brain regions at both min and min after the intranasal administration of i-s . a t-test showed significant increases between and minutes for whole brain, frontal cortex, cerebellum, midbrain, and pons, but not the other regions. anova for min values showed a significant effect, f( , ) = . , p = . , with olfactory bulb (p = . ) and hypothalamus (p = . ) being different from whole brain ( figure a ). anova at minutes was also significant, f( , ) = . , p = . , but only the olfactory bulb (p = . ) was different from whole brain ( figure b ). figure a also shows for comparison the value for whole brain expressed as %inj/g after iv injection. the bbb was not disrupted in male or female apoe or apoe mice as evidenced by t-alb spaces ranging from a mean of . to . µl/g in whole brain (figure , panel a) . clearance of i-s from blood ( figure , panel b) and transport into whole brain ( figure , panel c) and uptake by lung ( figure , panel e) also did not vary as a function of apoe genotype or sex. comparison of slopes and intercepts with the prism program showed a difference in slopes for the olfactory bulb, demonstrating that the unidirectional influx constant for i-s differed ( figure , panel d): f( , ) = . , p = . . a two-way anova of slopes showed a significant effect of sex (p = . ) accounting for % of the variability; apoe showed a trend (p = . ). multiple comparisons for the olfactory bulb showed a significantly larger ki for apoe males vs apoe females (p = . ) and apoe males vs apoe females (p = . ). for liver (figure , panel f), a significant difference occurred among the slopes: f( , ) = . , p = . . two-way anova showed a significant effect for apoe (p< . ) accounting for % of the variability, and multiple comparisons showed a significantly lower uptake rate for apoe females vs apoe females (p = . ), apoe females vs apoe males (p = . ), and apoe males vs apoe males (p = . ).
for spleen (figure , panel g), a significant difference occurred among slopes: f( , ) = . , p < . . two-way anova showed a significant effect occurred for sex (p = . ) accounting for % of the variability, for apoe (p< . ) which accounted for % of variability, and for interaction (p= . ) which accounted for % of variability, and multiple comparisons showed faster uptake rates for apoe females vs apoe females (p< . ), apoe females vs apoe males (p = . ), and apoe females vs apoe males (p< . ). for kidney ( figure , panel h), comparison of slopes showed a difference: f( , ) = . , p = . . two-way anova showed a difference for apoe (p = . ) accounting for % of variability, and multiple comparisons showed faster uptake by apoe males vs apoe males (p = . ) and apoe males vs apoe females (p = . ). to determine whether the s transport system was intact in a human in vitro model of the bbb, we compared transport of i-s vs t-alb across monolayers of ibecs seeded on transwells. for all studies, teer values exceeded Ω*cm and teer means were confirmed to be equal among groups just prior to starting the transport study. three independent experiments were conducted with raybiotech i-s to compare transport in the luminal to abluminal direction vs. t-alb ( figure , panel a) . single experimental replicates evaluated raybiotech i-s transport saturability with excess unlabeled s , which was found to have no effect on transport of either i-s or t-alb ( figure , panel b) . wga also had no effect on raybiotech s transport ( figure , panel c). raybiotech i-s had a significantly higher permeability coefficient when compared with amsbio i-s in vitro ( figure , panel d), however this difference was not significant after correcting for t-alb transport. these results clearly show that i-s from two different commercial sources readily and robustly cross the bbb. other major findings of these studies were that uptake was rather uniform throughout brain, entering every region examined. 
i-s was also readily taken up by spleen, lung, and kidney with the highest uptake being by liver, indicating that to be the chief organ of clearance. the s protein completely crossed the bbb to enter the brain parenchymal space, with smaller amounts internalized by the brain endothelial cells and binding to their luminal surfaces. wga greatly increased brain uptake of i-s , indicating that s binds to sialic acid and/or n-acetylglucosamine and likely crosses the bbb by the mechanism of adsorptive transcytosis. inhibition studies support adsorptive transcytosis and ace as a binding site on brain endothelial cells. our studies indicated ace to also be important for lung, but not for liver, kidney, or spleen uptake. inflammation tended to decrease clearance of i-s from blood, uptake by whole brain and brain regions, and the peripheral tissues except for lung, whose uptake was increased by inflammation. apoe isoform and sex had no effect on uptake by whole brain, but males tended to transport i-s into the olfactory bulb faster than females. apoe was associated with a higher uptake of i-s for liver, spleen, and kidney. intranasally administered i-s entered both the brain and blood, but at much lower levels than after intravenous injection and the kinetics pattern indicates that the i-s in brain directly crossed the cribriform plate as opposed to entering from blood. finally, in vitro studies showed similar, but attenuated, patterns to those found in vivo, suggesting that in vitro models of the bbb may be too dedifferentiated to be used in exploratory studies of s and sars-cov- . to improve specificity and statistical robustness of these studies, t-alb was used as a control, being co-injected into every animal, thus measuring vascular space for brain and vascular space plus leakage for peripheral tissues in each individual, and allowing for delta tissue/serum levels to be calculated. these delta values, therefore, measure the uptake that is specific to i-s . 
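The delta ratios and the unidirectional influx rate can be illustrated with a short sketch of multiple-time regression analysis. It assumes the standard form of that analysis, in which the tissue/serum ratio is regressed on exposure time (the integral of the serum level up to time t divided by the serum level at t), with slope Ki and intercept Vi. That definition of exposure time is not spelled out in this passage, and every name and number below is hypothetical.

```python
def exposure_times(times_min, serum_levels):
    """Expt at each sample: trapezoidal integral of serum level up to t, divided by serum level at t.

    Assumes the first sample marks the start of exposure (integral = 0 there).
    """
    expt = []
    integral = 0.0
    for i, (t, c) in enumerate(zip(times_min, serum_levels)):
        if i > 0:
            integral += 0.5 * (c + serum_levels[i - 1]) * (t - times_min[i - 1])
        expt.append(integral / c)
    return expt


def delta_ratios(s1_ratios, albumin_ratios):
    """Tissue/serum ratio of labeled S1 minus that of co-injected albumin (µl/g)."""
    return [s - a for s, a in zip(s1_ratios, albumin_ratios)]


def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept), read here as (Ki, Vi)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    times = [0, 5, 10, 20, 30]                   # min, hypothetical
    serum = [40.0, 30.0, 24.0, 17.0, 13.0]       # %inj/ml, hypothetical
    s1 = [10.0, 11.5, 13.2, 17.0, 21.5]          # brain/serum, µl/g
    albumin = [9.5, 9.6, 9.4, 9.7, 9.5]          # brain/serum, µl/g
    ki, vi = fit_line(exposure_times(times, serum), delta_ratios(s1, albumin))
    print(ki, vi)
```

Because albumin is subtracted animal by animal, the fitted slope reflects uptake beyond vascular space and leakage; as the methods note, only the linear portion of the relation should be used for the fit.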
the transport rates, or ki's, as measured by the delta brain/serum ratios differed by less than %, with ki = . + . l/g-min for the s from raybiotech and a ki = . + . l/g-min for the s from amsbio. likewise, the two i-s 's had similar patterns for clearance from blood and uptake by peripheral tissues, with liver being by far the highest for both i-s 's and having an early nonlinear component to uptake. there were subtle differences between the kinetics of the two i-s 's which are likely caused by differences in glycosylation since this is critical to the tissue uptakes of viral proteins , . the s from raybiotech was cleared less rapidly from blood and had an early non-linear component to uptake by spleen and liver. the nonlinearity is usually caused by an efflux component from tissue back into the blood stream. this suggests that liver and spleen may be rereleasing previously sequestered i-s back into the blood stream. the i-s from raybiotech showed more variation among uptake rates for lung, spleen, and kidney, and had a lower uptake by liver in comparison to the i-s from amsbio. one consistent finding was that the amsbio i-s often produced negative values for tissue/serum ratios, especially at early time points and for brain. this literally means that the vascular space for i-s was less than that for t-alb at those time points. given the larger size of s in comparison to albumin, it is not surprising that albumin would be more leaky at the peripheral tissues. however, it is less clear how negative values occurred for brain uptake. capillary depletion was used to follow the temporal pattern of uptake of i-s from raybiotech into the brain space. occasionally, substances accumulate either on the luminal surface of the brain or are sequestered by the brain capillaries without subsequent passage into brain tissue. if those processes occur over time, then multiple-time regression analysis can give the false impression that penetration of the bbb is occurring. 
here, capillary depletion showed a strong time-dependent uptake of i-s into brain tissue and a much slower increase in the amount of i-s retained by brain capillaries. in contrast, the amount of i-s reversibly binding to the luminal surface of brain endothelial cells was constant. this shows that luminal binding to brain endothelial cells quickly reached a steady state with blood levels, and that most of the i-s taken up by brain endothelial cells quickly entered the brain parenchymal space. thus, brain cells are exposed to s which could then bind to or activate them. similarly, s internalization could affect brain endothelial cell function and provides a mechanism by which s could facilitate infection of brain endothelial cells. wga is a glycoprotein plant lectin that binds to sialic acid and to n-acetylglucosamine, inducing adsorptive transcytosis in brain endothelial cells, a vesicular process that results in transport across the bbb , . co-injection of wga increased uptake by brain, lung, spleen, and kidney for i-s from either raybiotech or amsbio. wga decreased serum values for i-s from raybiotech, indicating increased volume of distribution among tissues or clearance from blood, and decreased liver uptake for i-s from amsbio. the endosomes induced by adsorptive transcytosis are often routed to the lysosomal compartment and viruses which cross the bbb by the mechanism of adsorptive transcytosis, including hiv- and rabies, can survive the lysosomal compartment , - . an interesting feature of adsorptive transcytosis is that when a robust stimulator such as wga is co-injected with a less robust stimulator, wga promotes, rather than inhibits, the transport of the less robust stimulating substance. for example, co-injection of wga with hiv- gp greatly increases the transport of gp across the bbb . thus, the finding that wga increased uptake by brain is consistent with adsorptive transcytosis as a mechanism of entry. 
these results suggest that s , and by extension possibly sars-cov- , may cross the bbb or be taken up by peripheral tissues by binding to membrane bound glycoproteins that contain sialic acid and n-acetylglucosamine. subsequent studies used s from raybiotech except as noted. the major cell surface protein that s is thought to bind to is ace , . based on experience with sars, it has been assumed that sars-cov- will bind to human, but not murine, ace . sars can infect mice, but doesn't produce severe symptoms and death except in transgenic mice overexpressing human ace . however, at least part of this effect in transgenic mice may be caused by their expressing - times more ace . the spike protein of sars and sars-cov- share % sequence identity with the s from sars-cov- having more charged residues . these extra positive charges would facilitate transport across the bbb by adsorptive transcytosis , . mouse and human ace are . % homologous . therefore, it should not be prematurely assumed that the s from sars-cov- cannot effectively bind to mouse ace . however, as ace expression is low in brain , , it may be that other receptors are used to enter brain. this would be analogous to hiv- , which does not use its classic binding sites to cross the bbb , . s binds off-target from the ace catalytic site . here we tested the ability of ace to compete with brain uptake of i-s . we co-injected with i-s either soluble ace , unlabeled s , or the ace substrates ghrelin or angiotensin ii. only ace affected transport across the bbb and its presence stimulated uptake. one would have predicted that soluble ace would have decreased i-s transport across the bbb by competing with the ace present on brain endothelial cells. short term, soluble receptors for a substance typically inhibit bbb transport, although they can ultimately increase total brain uptake . 
lung produced more paradoxical findings in that unlabeled s and ghrelin as well as ace produced statistically significant increases in uptake. these findings, although paradoxical, suggest that ace is involved in uptake of i-s by both the bbb and lung and the paradoxical increases they induce are consistent with transport by the mechanism of adsorptive transcytosis. angii may have been ineffective in altering i-s uptake by any tissue since i-s binds to a different location on ace . this is typical of viral proteins which usually don't bind to the active site of a transporter or enzyme, but target some less specific aspect of the sugar-protein complex. viral proteins are less discriminating than the ligands of the glycoproteins they target and so may bind to a variety of membrane glycoproteins. this is exemplified by rabies and herpes simplex viruses, each of which binds to multiple receptors , . the lack of effect of the tested substances on kidney, liver, and spleen may suggest that other receptors are involved in s sequestration. receptors besides ace can bind sars-cov- , including basigin, cyclophilins, and dipeptidyl peptidase- and grp . uptake of i-s was very uniform throughout brain regions with no statistically significant differences among brain regions. the region with the highest uptake was the hypothalamus at . µl/g and the region with the lowest uptake was the midbrain at . µl/g. this widespread uptake throughout the brain supports the consideration that the central nervous system may be contributing to many of the diverse effects of s and/or sars-cov- such as encephalitis, respiratory difficulties, and anosmia , , . inflammation can increase the passage of substances across the bbb by a variety of mechanisms, including stimulation of adsorptive transcytosis. for example, the lps treatment used here greatly increases the uptake of both gp and hiv- by brain , .
lps releases many of the same cytokines associated with the cytokine storm of covid- , including interleukin (il)- , il- , granulocyte colony-stimulating factor, granulocyte-macrophage colony-stimulating factor, interferon gamma-induced protein , monocyte chemoattractant protein- , and tumor necrosis factor-alpha , . here, lps increased uptake of i-s by lung. this would suggest that the cytokine storm of covid- could be associated with increased uptake of s and sars-cov- by lung. otherwise, lps tended to decrease, not increase, uptake by tissues. lps increased blood levels of i-s , consistent with a reduction of uptake by tissues or decreased clearance. indeed, treatment with lps reduced uptake by spleen and liver with a trend for kidney. most brain regions had an arithmetic, non-statistically significant decrease in i-s uptake with lps treatment, although two regions (olfactory bulb and hippocampus) did reach statistical significance. these results suggest that the cytokine storm produced by lps may, at least as far as s actions are concerned, be beneficial for peripheral tissues such as liver and spleen, neutral to beneficial for brain, and harmful for lung. viruses present in the nasopharyngeal cavity can enter the brain. viruses in the nasal passages can be absorbed directly into the blood stream, especially by the very vascular turbinates, using the hematogenous route to infect peripheral tissues and the brain . other substances can enter the central nervous system by being absorbed at the cribriform plate and by nerve terminals of the trigeminal nerve . here, we administered i-s at the level of the cribriform plate where it has been shown substances can be taken up by the cerebrospinal fluid, olfactory bulb, and subsequently other brain regions . we found i-s present in all brain regions at min, with olfactory bulb and hypothalamus having the highest levels and brain levels even higher at min. 
radioactivity was also found in blood at and min after the intranasal administration. both brain and blood levels after intranasal administration are much lower than after intravenous administration, with brain levels after intranasal administration being / th and blood levels about / th of levels after intravenous administration. i-s could have first entered into the blood stream with subsequent blood-to-brain transmission, or have first entered the brain with subsequent entry into blood with the reabsorption of cerebrospinal fluid. however, the blood levels produced after intranasal administration are far too low to account for more than about / th of the radioactivity found in brain. therefore, the results support i-s entry into the central nervous system across the cribriform plate or by the trigeminal nerves. however, the amount of radioactivity entering the brain was much greater after intravenous than after intranasal administration. this suggests that although s and sars-cov- can enter the brain directly from the nasal passages, the major route is likely hematogenous. the apoe genotype and being male have been associated with increased risk of severe covid- symptoms , . here, sex and apoe genotype exerted no statistically significant effect on i-s clearance from blood or uptake by whole brain or lung. sex did affect uptake of i-s for olfactory bulb, with males having a higher uptake, and for spleen, with males having a lower uptake. apoe genotype affected uptake by spleen and kidney with a trend for olfactory bulb. in all cases where significant, apoe was associated with a higher uptake of i-s than was apoe . ace levels are lower in humans expressing apoe and so would be expected to have lower levels of viral uptake . it should be noted that the apoe studies were conducted in mo old mice and most of the effects associated with apoe occur with advancing age. therefore, it is important to examine s and viral uptake in aged apoe isoform mice.
in comparison to in vivo studies, the in vitro studies did not produce robust results for i-s uptake. transport of i-s was not observed to be significantly greater than that of albumin in a human in vitro model of the bbb, nor was it affected by unlabeled s protein or wga. therefore, the transport mechanisms that are intact for i-s in vivo in mice are not present or are attenuated for i-s in our ibec model despite findings of other bbb-characteristic features such as high teer and consistent expression of brain endothelial cell markers such as claudin- , glut- , and pecam, similar to what has been shown by others . our findings indicate that model optimization is required for studies of s transport in vitro using ibecs. they also show that the transport of i-s across the bbb found in vivo depends on more selective mechanisms such as glycoprotein binding rather than less selective ones such as leakage. in summary, these studies clearly show that i-s glycoprotein crosses the bbb and is taken up by peripheral tissues. the tissue with the highest uptake by far is liver, suggesting it is the tissue that primarily clears s . i-s completely crosses the bbb to enter brain tissue and enters all regions of the brain. wga enhances uptake of i-s by brain and by peripheral tissues, suggesting transport across the bbb involves sialic acid or n-acetylglucosamine and likely uses the mechanism of adsorptive transcytosis. ace given intravenously with i-s paradoxically increases brain uptake of i-s as well as that of lung, but not that of kidney, liver, or spleen, suggesting s may use other membrane glycoproteins in addition to ace as functional receptors. inflammation tends to reduce uptake of i-s by brain regions and by peripheral tissues except lung, suggesting a protective effect against s .
sex and apoe have no effect on i-s uptake by whole brain and lung, but do have tissue dependent effects for olfactory bulb, liver, spleen, and kidney, with apoe status associated with increased uptake of i-s . taken together, the findings here demonstrate and characterize the uptake of s by brain and peripheral tissues. comparison of bbb transport of s proteins from two sources, raybiotech and amsbio. co-injected t-alb (triangles) measured vascular space of brain and leakage and showed no evidence of bbb transport. both s proteins (circles) crossed the bbb with no differences in the transport rates (ki), but with a lower vi for amsbio s . see results for statistical and pharmacokinetic parameters. figure . uptake by brain and peripheral tissues and clearance from blood of i-s protein (raybiotech). panel a shows clearance from blood fitted to a one phase decay model. panel b shows brain/serum ratios corrected for t-alb (delta brain/serum ratios) plotted against exposure time according to multiple-time regression analysis. the slope of the linear portion of this relation measures ki, the unidirectional influx rate. panels c-f show the unidirectional influx rates for peripheral tissues (see results for statistics). . uptake by brain and peripheral tissues and clearance from blood of i-s protein (amsbio). panel a shows clearance from blood fitted to a one phase decay model. panel b shows brain/serum ratios corrected for t-alb (delta brain/serum ratios) plotted against exposure time according to multiple-time regression analysis. the slope of the linear portion of this relation measures ki, the unidirectional influx rate. panels c-f show the unidirectional influx rates for peripheral tissues (see results for statistics). . capillary depletion for i-s protein. this method showed that i-s completely crossed the bbb to enter the parenchymal space of the brain and that this value increased with time. 
i-s was also detected in the capillary bed (blood-brain barrier) and also slightly increased with time, whereas luminal binding was steady at . l/g (dashed line) during the course of the experiment. figure . effects of wheatgerm agglutinin (wga) on i-s protein uptake. values have been corrected for albumin space using t-alb that was co-injected with i-s . * , **, *** = difference between wga vs no wga for the indicated s at the p< . , < . , or < . level; #, ##, ### = difference between raybiotech vs s from amsbio for wga at the p< . , < . , or < . level. for spleen, there was a trend (p = . ) for a difference between wga and no wga for the amsbio i-s . figure . competition and modulation of i-s protein uptake. panel a shows that co-injection of unlabeled human ace enhanced transport of i-s across the bbb. panel b shows that i-s uptake by lung was enhanced by unlabeled s , ghrelin, and ace . uptake by olfactory bulb, liver, spleen, and kidney were not affected by any of these agents (data not shown). figure . brain regional uptake of i-s protein across the bbb. data is from the controls from the inflammation experiment in figure . anova showed no statistically significant difference among brain regions. figure . effect of inflammation induced by lps on i-s uptake. panel a shows that treatment with lps increased level of i-s in blood, indicating a decrease in the volume of distribution and/or clearance, but with no effect on t-alb levels. panel b shows that lps treatment resulted in increased uptake of i-s by lung and decreases in uptake by spleen and liver, with a trend towards a decrease by kidney. panel c shows levels in brain regions were arithmetically decreased, but reached statistical significance only in olfactory bulb and hippocampus with at trend for the occipital cortex. in all panels, * , **, *** = p< . , < . , or < . , and t = . >p> . , respectively. figure . brain regional uptake of i-s protein following intranasal administration. 
panel a shows levels of i-s expressed as %inj/g min after intranasal administration and panel b shows %inj/g min after administration for whole brain and brain regions. for comparison, panel a also shows results for whole brain expressed as %inj/g after intravenous injection of i-s (solid bar). in each panel, * , ** = p< . , < . , respectively. figure . effects of sex and apoe genotype on clearance and uptake of i-s (raybiotech). bbb integrity as measured by t-alb space (panel a), clearance of i-s from blood (panel b), and uptake by whole brain (panel c) and lung (panel e) were not statistically affected by sex or apoe status. uptake by olfactory bulb was affected significantly by sex (panel d). liver (panel f) and kidney (panel h) were affected by apoe status. spleen (panel g) was affected by apoe status, sex, and their interaction. figure . transport of s proteins across human ibec monolayers. i-s (raybiotech or amsbio) and t-alb transport in the luminal to abluminal direction were studied. panel a: raybiotech i-s vs. t-alb transport summarized from three independent differentiations, n= wells/group, average teer ( ± . Ω*cm ). panel b: saturability of raybiotech i-s transport with . µg/well excess unlabeled s , n= - wells/group. panel c: effects of wga ( . g/ml) on raybiotech i-s transport, n= - wells/group. panel d: comparison of raybiotech and amsbio i-s transport, n= wells/group, *p < . . supplemental figure . gel autoradiography of i-s from raybiotech and amsbio. in each case, on both the reduced and non-reduced gels, the protein migrated at the molecular weights predicted by the manufacturers.
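the multiple-time regression described in the uptake captions above (delta brain/serum ratio plotted against exposure time, with the slope of the linear portion read as ki) can be sketched as an ordinary least-squares line fit. this is a minimal illustration, not the authors' analysis: the function name `fit_influx` and the numbers are hypothetical, the y-intercept is labelled vi on the assumption that it follows the usual convention for this method, and the choice of which points belong to the linear portion is left to the caller.

```python
def fit_influx(exposure_times, delta_ratios):
    """Least-squares line through (exposure time, delta brain/serum ratio).

    Returns (ki, vi): the slope, read as the unidirectional influx rate,
    and the y-intercept, read here as the initial volume of distribution.
    Only points from the linear portion of the curve should be passed in.
    """
    n = len(exposure_times)
    if n != len(delta_ratios) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(exposure_times) / n
    mean_r = sum(delta_ratios) / n
    sxx = sum((t - mean_t) ** 2 for t in exposure_times)
    if sxx == 0:
        raise ValueError("exposure times must not all be equal")
    sxy = sum((t - mean_t) * (r - mean_r)
              for t, r in zip(exposure_times, delta_ratios))
    ki = sxy / sxx
    vi = mean_r - ki * mean_t
    return ki, vi


# Made-up values for illustration only (not data from the study).
example_times = [1.0, 2.0, 4.0, 8.0]
example_ratios = [2.5, 3.0, 4.0, 6.0]
ki, vi = fit_influx(example_times, example_ratios)
```

comparing two protein sources then amounts to comparing the fitted slopes (ki) and intercepts (vi) of their respective lines.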
supported by the va (wab, mae), nih r ag - (mae), and nih rf- ag (wab, jr). we wish to thank kristen baumann, riley weaver, sarah pemberton, and danny quaranta for technical assistance. graphical abstract created with biorender.com key: cord- -obig mu authors: alić, ivan; goh, pollyanna a; murray, aoife; portelius, erik; gkanatsiou, eleni; gough, gillian; mok, kin y; koschut, david; brunmeir, reinhard; yeap, yee jie; o’brien, niamh l; groet, jurgen; shao, xiaowei; havlicek, steven; dunn, n ray; kvartsberg, hlin; brinkmalm, gunnar; hithersay, rosalyn; startin,
carla; hamburg, sarah; phillips, margaret; pervushin, konstantin; turmaine, mark; wallon, david; rovelet-lecrux, anne; soininen, hilkka; volpi, emanuela; martin, joanne e; foo, jia nee; becker, david l; rostagno, agueda; ghiso, jorge; krsnik, Željka; Šimić, goran; kostović, ivica; mitrečić, dinko; francis, paul t; blennow, kaj; strydom, andre; hardy, john; zetterberg, henrik; nižetić, dean title: “patient-specific alzheimer-like pathology in trisomy cerebral organoids reveals bace as a gene-dose-sensitive ad-suppressor in human brain” date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: obig mu people with down syndrome (ds, caused by trisomy (t )) form a population of > million people worldwide at high risk of alzheimer’s disease (ad); % of them develop dementia during their lifetime, caused by an extra copy of the β-amyloid-(aβ)-precursor-protein gene. we report ad-like pathology in cerebral organoids grown in vitro from non-invasively sampled strands of hair from % of ds donors. the pathology consisted of extracellular diffuse and fibrillar aβ deposits, hyperphosphorylated/pathologically conformed tau, and premature neuronal loss. presence/absence of ad-like pathology was donor-specific (reproducible between individual organoids/ipsc lines/experiments). pathology could be triggered in pathology-negative t organoids by crispr/cas -mediated elimination of the third copy of chromosome- -gene bace , but prevented by combined chemical β and γ-secretase inhibition. we found that t -organoids secrete increased proportions of aβ-preventing (aβ - ) and aβ-degradation products (aβ - and aβ - ). we show that these profiles are mirrored in the cerebrospinal fluid of people with ds. we demonstrate that this protective mechanism is mediated by bace -trisomy and cross-inhibited by clinically trialled bace -inhibitors. combined, our data prove the physiological role of bace as a dose-sensitive ad-suppressor gene, potentially explaining the dementia delay in ∼ % of people with ds.
we also show that ds cerebral organoids could be explored as pre-morbid ad-risk population detector and a system for hypothesis-free drug screens as well as identification of natural suppressor genes for neurodegenerative diseases. production - , and degradation of β-amyloid peptides (aβ) are among the central processes in the pathogenesis of alzheimer's disease (ad). the canonical aβ peptide is produced after sequential cleavage of the β-amyloid precursor protein (app) by β-secretase and γ-secretase, generating a peptide that most often begins amino acids (aa) from the c-terminus of app with asp and contains the next - aa of the app sequence, generating a range of peptides (aβ - , , , and ) . the longer of these peptides can be detected in toxic amyloid aggregates in the brain, associated with ad and other neurodegenerative disorders . as app gene is located on human chromosome , people with down syndrome (ds, caused by trisomy (t )) are born with one extra copy of this gene, which increases their risk of developing ad. non-ds (euploid) people inheriting triplication of the app gene alone (dupapp) develop ad symptoms by age with % penetrance. paradoxically, only ~ % of people with ds develop clinical dementia by age , suggesting the presence of other unknown chromosome -located genes that modulate the age of dementia onset , . a number of secretases participate in the physiological cleavage of app , , generating various peptides involved in neuronal pathology. bace is the main β-secretase in the brain , while the expression and function of its homologue bace (encoded by a chromosome gene) remain less clear , . at least different activities of bace were recorded with regards to app processing: as an auxiliary β-secretase (proamyloidogenic), as a θ-secretase (degrading the β-ctf and preventing the formation of aβ), and as aβ-degrading protease (aβdp) (degrading synthetic aβ-peptides at extremely acidic ph). 
it remains unclear which of these activities reflect the role of bace in ad. the potential activity of bace as an anti-amyloidogenic θ-secretase can be predicted from studies on a variety of transfected cell lines that overexpress app, and artificially manipulate the dose of bace [ ] [ ] [ ] [ ] . we compared organoids from isogenic ipsc lines, derived from the same individual with ds, mosaic for t and normal disomy (d ) cells . cerebral organoids were derived following a standard protocol , and shown to contain neurons expressing markers of all layers of the human cortex ( supplementary fig. ) and no significant difference in the proportions of neurons and astrocytes between the d and t organoids ( supplementary fig. ) . the integrity and copy number of the ipsc lines were validated at the point of starting the organoid differentiation, for chromosome ( supplementary fig. ), and the whole genome (available on request). t /d status was further verified by interphase fish on mature organoid slices, ( supplementary fig. a ). the c-terminal region of app can be processed by the sequential action of different proteases to produce a range of protein fragments and peptide species, including aβ ( supplementary fig. ). aβ peptide profiles were analysed from organoidconditioned media (cm) whereby each cm sample was taken from a cm dish culturing a pool of - organoids derived from one ipsc line, in total: n= cm samples for exp ( trisomic isogenic lines, disomic isogenic lines, timepoints each), n= cm samples for exp ( trisomic isogenic lines, disomic isogenic lines, timepoints each) and n= cm samples for exp ( trisomic isogenic line, disomic isogenic line, dupapp line, line each for two different unrelated ds individuals, timepoints each). cm was collected at a timepoints between days - of culturing and analysed using immunoprecipitation in combination with mass spectrometry (ip-ms) . 
please see "methods" and "supplementary data" sections for more detailed explanations, and statistical controls used for individual ipsc line-to-line comparisons. ( fig. a) . relative ratios were calculated of areas under the peak between the peptides of interest within a single mass spectrum (raw data example in supplementary fig. d) , therefore unaffected by the variability in the total cell mass between wells growing organoids. the proportions of non-amyloidogenic peptides with the signature of bace cleavage products, both as a putative θ-secretase (as reflected by the aβ - product) and putative aβdp or aβclearance products (aβ - & - ), or combined, (relative to the sum of aβ amyloidogenic peptides (aβ - & - & - & - )) were approximately doubled in cm from t organoids, compared to isogenic normal controls, and reached levels of > % of the amyloidogenic peptide levels (fig. a) . this result was fully reproduced in independent experiments, each starting from undifferentiated ipscs ( vertical columns of graphs in fig. a ). in experiment , more recently generated ipsc lines from different individuals were introduced; from a euploid patient with feoad caused by dupapp , and from unrelated people with ds (supplementary figs. - ). the - & - /amyloidogenic ratios were not significantly different between d and dupapp lines, suggesting the third copy of the app gene alone did not cause any change in this ratio. ratios of - & - /amyloidogenic peptides and combined bace products/amyloidogenics were significantly increased in t lines (combining all t individuals) compared to d or dupapp lines (fig. a) . the ratio of - /amyloidogenics was significantly higher in t lines from the isogenic model, compared to its disomic isogenic control, and compared to dupapp, but it was unchanged in the other two unrelated ds ipsc lines (see also supplementary information for a more detailed explanation). 
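the within-spectrum ratio calculation described above can be sketched as follows. this is a minimal illustration under stated assumptions: the peptide labels and peak areas are hypothetical placeholders (not the peptides or values from the study), `relative_ratio` is an illustrative name, and variation in total cell mass is modelled as a single factor that scales every peak in a spectrum equally.

```python
def relative_ratio(peak_areas, numerator, denominator):
    """Ratio of summed peak areas for two peptide groups within one spectrum."""
    num = sum(peak_areas[name] for name in numerator)
    den = sum(peak_areas[name] for name in denominator)
    if den == 0:
        raise ValueError("denominator peptides have zero total peak area")
    return num / den


# Hypothetical peptide labels and made-up peak areas (not values from the study).
spectrum = {
    "nonamyloidogenic_a": 30.0,
    "nonamyloidogenic_b": 20.0,
    "amyloidogenic_a": 60.0,
    "amyloidogenic_b": 40.0,
}
non_amyloidogenic = ["nonamyloidogenic_a", "nonamyloidogenic_b"]
amyloidogenic = ["amyloidogenic_a", "amyloidogenic_b"]

ratio = relative_ratio(spectrum, non_amyloidogenic, amyloidogenic)

# Assumption: a well with three times the cell mass scales every peak by the
# same factor, so the within-spectrum ratio does not change.
scaled = {name: 3.0 * area for name, area in spectrum.items()}
scaled_ratio = relative_ratio(scaled, non_amyloidogenic, amyloidogenic)
```

because numerator and denominator come from the same spectrum, the common scaling factor cancels, which is why such ratios can be compared between wells that differ in total organoid mass.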
as the proportions of bace -unrelated α-site cleavage products were not different between t and isogenic d organoids (in any of the experiments) (fig. a), it can be predicted that the increased presence of - , - and - peptides in t contributes towards an overall increase in soluble peptides that are non-amyloidogenic. the validity of this prediction was tested by an independent biochemical method (elisa), by measuring the aβ-peptide concentrations within the isogenic t :d organoid cm comparison, which showed an increase in absolute concentrations caused by t for each aβ - , - and - , with no difference in the aβ - / - ratio between t and isogenic d lines, mirroring the readout in the absolute levels of ip-ms peaks ( supplementary fig. ). analysis of ip-ms area under peak (used in fig. to calculate relative ratios) showed a near linear correlation when plotted against absolute peptide concentrations measured by elisa, for each aβ - , - and - ( supplementary fig. ), validating our relative ratio calculations by an independent biochemical method. to estimate the contribution of bace towards the anti-amyloidogenic pathway relative to other anti-amyloidogenic cleavages at the α-site, we calculated the peptide ratios of - / - or - (θ secretase/α secretase products) and - / - or - (bace aβdp/ α secretase products). we observed that t organoids produce highly statistically significant increases in all four of these ratios, relative to isogenic d , or non-isogenic dupapp organoids (fig. b). therefore, we conclude that t causes these effects in our organoid system. the d ratios were not significantly different to dupapp, suggesting that the third copy of genes other than app causes these effects, though this needs to be tested on a larger number of individuals.
as the peptide profiling data strongly favour the hypothesis of a genetic-dose-sensitive anti-amyloidogenic action of bace , we sought to zoom in on the bace genetic locus in a systematic snp-array analysis of individuals recruited through the londowns consortium who had undergone detailed assessment for dementia , ; single nucleotide polymorphisms (snps) located within the bace locus +/- kb were genotyped, and dementia age-of-onset determined, as described in methods. we detected two new bace snps (purple, supplementary fig. ) correlating with age of dementia onset in the ds cohort of the londowns consortium, located in close proximity to a previously reported snp (red, supplementary fig. ) . all of these snps cluster in a < kbp segment, which is fully contained within a kbp deletion (blue line, supplementary fig. ) that caused a de novo eoad in a euploid patient ( supplementary fig. ) . these data corroborate the notion that subtle genotypic variation in bace levels may play an important role in affecting the age of dementia onset in both ds and non-ds individuals. in order to assess whether the peptide ratio differences from fig. b have any relevance in vivo, we analysed the aβ-peptide profiles immunoprecipitated from human cerebrospinal fluid (csf). we have previously produced ip-ms data on csf from people with ds and age-matched controls . we repeated the calculations shown for organoids in fig. b , on ip-ms results from csf samples from ds (n= ) and age-matched euploid people (n= ). all four relative ratio calculations showed an increase in peptide ratios in csf from people with ds, compared to age-matched euploid controls, of which three comparisons were highly statistically significant (fig. c ). this suggests that in ds brains, the third copy of bace skews the anti-amyloidogenic processing significantly towards bace -cleavages, relative to other anti-amyloidogenic enzymes cleaving at the α-site.
importantly, these csf results validate the in vivo relevance of the peptide ratios obtained using cm from ipsc-derived cerebral organoids (comparison of fig. b and fig. c ). chemical inhibition of bace remains an attractive therapeutic strategy for ad. as bace is a homologous protein, most inhibitors tested in clinical trials also cross-inhibit the (proamyloidogenic) β-secretase activity of bace , which has been proven as the cause of several unwanted side-effects, such as skin pigmentation changes. as our data suggest that the opposite, aβ-degrading, activity of bace plays an important role, we designed a new fret-based in vitro assay, in which efficient aβdp-cutting after aβ aa by bace at ph= . could be measured (fig. ), while no activity by bace was detectable under the same conditions (supplementary information). we demonstrated that at least two bace inhibitor compounds (one of which was recently used in clinical trials) inhibit the aβdp (aβ-clearance) activity of bace in a dose-dependent manner (fig. ). to our knowledge, this has not been shown before; it could provide an additional explanation for the failure of some bace -inhibitor clinical trials, and should be taken into consideration when testing new inhibitors. as in vitro experiments showed that bace can very efficiently cleave the aβ -site in the fret peptide (fig. ) and synthetic aβ - peptide in solution at an acidic ph , we sought to visualize whether the substrate (aβ - ), enzyme (bace ) and one of the products of this reaction (aβ - ) can be detected in our organoids, in a sub-cellular compartment known to be acidic. firstly, by immunofluorescence (i.f.) using pan-anti-aβ ( g ), anti-bace , or neoepitope-specific antibodies against aβx- and aβx- , we detected significantly higher signals (normalized to pan-neuronal marker) in t organoid neurons, compared to isogenic d ones ( supplementary fig. b-d) . pearson's coefficient showed a high level of colocalisation (> .
) of both the main substrate (aβx- ) and its putative degradation product (aβx- ) with bace in neurons of cerebral organoids, in the lamp + compartment (known to be a subset of lysosomes, and therefore low-ph vesicles) ( fig. & supplementary fig. ). in comparison, the pearson's coefficient for bace with aβx- was only . ( fig. & supplementary fig. ) , and its pattern of sub-cellular localization was different to bace (high colocalization with rab and sortilin, much lower with lamp ). using i.f. on human brain sections, a similar highly significant difference was observed (fig. a, b) : aβx- colocalised with bace ( . (± . sem)) as opposed to bace ( . (± . sem)). the colocalised signal of aβx- and bace was seen in categories of objects ( fig. ) , in all analysed samples: individual ds-ad brains ( fig. a-c) , euploid sporadic ad subjects (example in supplementary fig. a , for complete list of brain samples see supplementary table ) and (in the fine vesicle compartment only) in non-demented control euploid subjects' neurons (age - ), as well as in ds brain from a yr old with no plaques or dementia (examples in fig. d , for complete list of brain samples see supplementary table ) . lambda scanning and sudan black b stainings were independently used to subtract the autofluorescence of lipofuscin granules ( supplementary fig. f, g) . this has proven that the fine vesicular pattern and large amorphous extra-cellular aggregates are not autofluorescent lipofuscin granules, but real colocalisations of bace and aβx- ( supplementary fig. ). colocalised signals of aβx- and bace were particularly strong in areas surrounding neuritic plaques (fig. a-c) . as aβdp cleavage by bace is efficient only at low ph, we sought to analyse in more detail the bace and aβx- co-localisation in highly acidic cellular compartments. for this reason, we co-stained lysosome markers lamp or lamp with aβx- .
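the pearson's coefficient used above as a colocalisation measure can be sketched as the correlation between the pixel intensities of two fluorescence channels. this is a minimal illustration, not the authors' image-analysis pipeline: the text does not state the software, background subtraction or region-of-interest selection, so none is modelled here, and the function name and intensity grids are hypothetical.

```python
import math


def pearson_colocalisation(channel_a, channel_b):
    """Pearson's correlation between two images given as lists of pixel rows."""
    a = [v for row in channel_a for v in row]
    b = [v for row in channel_b for v in row]
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("channels need the same number of pixels (at least two)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant-intensity channel has no defined coefficient")
    return cov / math.sqrt(var_a * var_b)


# Made-up 2x3 intensity grids for illustration only.
enzyme = [[10, 20, 30], [40, 50, 60]]
product = [[12, 22, 32], [42, 52, 62]]    # same spatial pattern, offset intensity
inverted = [[60, 50, 40], [30, 20, 10]]   # opposite spatial pattern

r_same = pearson_colocalisation(enzyme, product)
r_inverted = pearson_colocalisation(enzyme, inverted)
```

values near 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutually exclusive patterns; the coefficient is insensitive to a uniform offset or gain difference between channels.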
additionally, macroautophagic vacuoles containing aβ were shown to accumulate in ad distended neurites , which is why we also stained with the macro-autophagosome marker lc a. as we further found that aβx- did not colocalise with lamp or lc a, but colocalised strongly with lamp (fig. , supplementary figure and supplementary information), we tested colocalisation with the components of an alternative autophagy pathway: chaperone-mediated autophagy (cma), and found a very high level of colocalisation (fig. ) . using crispr/spcas -hf , we eliminated a single copy of bace in the trisomic ipsc line c (t c ∆ , a ∆ bp in bace exon , knocking out of copies of the gene), while maintaining the trisomy of the rest of chromosome ( fig. a -c, supplementary fig. , supplementary information). total actin-normalised bace signal showed a %- % reduction in Δ compared to t unedited line, and no significant difference compared to d control (fig. c, supplementary fig. ). total protein level of app in ∆ remained at trisomic levels, significantly increased compared to the disomic control ( supplementary fig. ). the crispr correction of bace gene dose from to , resulted in a significant decrease in levels of putative bace -aβdp (aβ-clearance) products ( - & - ), as well as total bace -related non-amyloidogenic peptides ( - & - & - ), relative to amyloidogenic peptides (fig. d ). this pinpoints the triplication of bace as a likely cause of specific anti-amyloidogenic t effects we observed in fig. a . furthermore, we used two different dyes to detect any presence of amyloid deposits (the traditional thioflavine s, and a newer, more sensitive dye amyloglo ) in organoid sections. remarkably, elimination of the third bace copy caused the t organoids (that had not shown any overt amyloid deposits at div, see t c in supplementary fig. , top row) to develop extremely early ad-plaque like deposits (amyloglo+ and thioflavine s+) in the cortical part of the organoid by div ( supplementary fig. 
, middle row), that progressed aggressively and became much stronger and denser by div, accompanied by massive cell death ( supplementary fig. , bottom row, supplementary fig. ). in order to prove that extracellular deposits staining positively with amyloid dyes really are related to hyperproduction of aβ amyloidogenic peptides, we cultured t c ∆ organoids in media containing high concentrations of β and γ secretase inhibitors. early t c and t c ∆ organoids were treated with a combination of β-secretase inhibitor iv and compound e (γ secretase inhibitor xii) (supplementary table ) from div to div (fig. ). amyloid-like deposits were readily detected with amyloglo in the untreated and vehicle only treated t c ∆ organoids (fig. b ), but were completely absent from t c ∆ organoids treated with β and γ secretase inhibitors. inhibitor treatment also significantly reduced the number of neurons expressing pathologically conformed tau (tg -positive cells) in the t c ∆ compared to untreated controls (fig. c) . no amyloglo positive aggregates or tg -positive cells were detected in t c organoids under any treatment conditions at div (fig. a , c) and were also absent in the same organoids at div (fig. g, l, supplementary fig. ). also, no obvious deleterious effects of the inhibitors, or vehicle control, could be seen in early unedited t c organoids. further histo-pathological verification showed that elimination of one copy of bace triggered progressive accumulation of extracellular deposits that co-stain with thioflavine s and antibodies against aβ, both g and neo-epitope specific aβx- &aβx- . the antibody signal intensity in colocalisations with thioflavine s drastically increased upon pre-treatment with % formic acid (fig. a-d) , proving that the deposits contain insoluble aβ material. this is further corroborated by the isolation of fibrillary material from the detergent-insoluble fraction of the crispr-edited organoid. 
when viewed by transmission electron microscopy (tem) the filaments found exhibited a straight morphology of < nm diameter ( supplementary fig. a ), closely resembling fibrils grown in vitro from synthetic aβ - peptide (supplementary fig. c ). furthermore, neuritic plaque-like features were detected by ihc co-staining with gallyas in crispr-edited organoids (fig. m, n) , but not their unedited t control (fig. l) . human brain from an ad patient is shown for comparison stained with gallyas (fig. k) . tau pathology was also observed by ihc using the hyper-phosphorylated tau antibody at (fig. e , f), and by i.f. for conformationally altered tau (tg , fig. g -j). the relative increase in the amount of conformationally altered (pathological) tau in crispr-edited organoids t c Δ , compared to unedited t control organoids, was also independently confirmed by immunoblotting using tg antibody. as shown in fig. o , the protein material isolated from t c Δ organoids produced significantly more tg signal than unedited controls, albeit having a weaker signal with the general r-tau antibody (consistent with the observed neuronal loss, supplementary fig. ). our data in figs. , , show that severing the bace dose by a third, using crispr/cas , might tip the balance against the anti-amyloidogenic activity, and provoke ad-like pathology. our data in fig. suggest that anti-amyloidogenic activity of bace is gene-dose dependent, and its level varies between individuals, with snp allelic differences in bace gene correlating with age of dementia onset. we therefore hypothesized that organoids grown from some people with ds may develop ad-like pathology without any crispr-cas intervention. we then tested this hypothesis using ipsc lines from different individuals with ds, and one dupapp patient (table ) . we detected amyloid-like aggregates (both diffuse and compact in appearance) in / unedited ipsc-derived organoids from people with ds, and one with dupapp (fig. ) . 
the two donors whose ipsc-organoids did not show pathology are (i) the t ipsc from our isogenic model (whose clinical status is unknown) and (ii) qm-ds , a donor who remains free from dementia symptoms at age (table ). organoids from another ds donors, and one dupapp patient, (all diagnosed with clinical dementia) all showed the presence of diffuse and compact amyloid-like deposits (fig. ) as well as the presence of neuritic plaque-like features (focal hyper-phosphorylated tau (at +), conformationally altered tau (tg +) and filamentous tau (at +)) within neuropil neurites within plaque-like circular foci ( fig. a-n) . this was corroborated by gallyas intra-neuronal positivity ( fig. o-t) . as for t c Δ , we were able to isolate fibrillary material from the detergent-insoluble fraction of qm-dupapp organoid ( supplementary fig. b ), which on tem resembled fibrils grown in vitro from synthetic aβ - peptide ( supplementary fig. c ). most importantly, tested individual organoids from one donor (from multiple ipsc lines and multiple independent experiments) either all did (dupapp, qm-ds - ), or all did not (isogenic t , qm-ds ) show ad-like pathology (table ) , proving the pathology is donor dependent. this opens possibilities for developing assays for pre-therapy risk stratification and individualized drug-response quantitation. several human brain studies show detectable expression and β-secretase activity of bace , though at much lower levels than that of bace . chemical inhibition of β-secretase activity is an attractive therapeutic approach aimed at reducing the production of aβ . complete knock-out of bace abolished all β-secretase activity in mouse neurons, while leaving some degree of β-secretase activity in astrocytes .
this activity was abolished by the complete knockout of both bace and bace , leading to the hypothesis that a bace -driven β-secretase activity in astrocytes may contribute to accelerating aβ production and ad pathology in ds . in human brain, the β-secretase activity of bace correlated positively with the amount of aβ, whereas the β-secretase activity of bace did not . on the other hand, snps at the bace locus (and not bace ) correlate with the age of onset of dementia in people with ds , as well as with sporadic load in euploid people in the finnish population , and a recent report showed that a de novo intronic deletion within one allele of bace caused eoad in a year old euploid person . all of the above data (and the new data we show in supplementary fig. ) imply that a single-allele alteration in the genetic dose of bace is capable of affecting the risk of ad-dementia, but do not resolve the question of whether bace per se acts predominantly as an accelerator or a suppressor of ad pathology. this question needs to be resolved, as most chemical inhibitors used in clinical trials have dual activity against bace and bace , . the increased ratios of - & - (bace -aβdp) to the amyloidogenic and α-site products are among our most consistent and robust observations in t organoid cm and ds-csf (fig. b- c). the - generating cleavage can only occur after the cuts by both β- and γ-secretases have released aβ, because the hidden transmembrane site between aa and aa is inaccessible to any proteolytic enzymes until the soluble aβ ( - to - ) molecules are released from the membrane , . therefore, the aβ - species can only be a product of an aβdp activity (a catabolic degradation or clearance of already made aβ - to - peptides).
besides bace , the only enzymes with the potential to cleave the peptide bond leu -met are bace , , and extracellular matrix (ecm) metalloproteinases (mmp and mmp ) , since no other aβ degrading enzymes (ide, nep or ece) are known to cleave at this site . bace action is unlikely to cause the increased ratios we observe, as bace can only generate this cut in solution at very high enzyme concentration and after prolonged incubation . to further corroborate this point, we designed a novel fret-assay and established the conditions in which bace can efficiently cleave at aβ site . we also demonstrated that two bace inhibitors, β-secretase inhibitor iv (cas - - ; calbiochem, originally a merck compound) and ly (an eli lilly compound recently used in clinical trials), both inhibit the aβdp activity of bace in vitro, while the γ-secretase inhibitor (dapt) had no effect. this suggests that the aβdp activity (cutting the peptide bond leu -met ) has a different enzymatic preference, conditions, and ph, as compared to the classical β-secretase cleavage that both bace and bace are capable of. as fret assays cleaving this classical (before asp ) site are generally used to measure the bace inhibitors' selectivity for bace or bace , our data suggest that the degree of selectivity calculated this way for any given inhibitor does not necessarily reflect its cross-inhibition of the leu -met site cleavage (aβdp) activity. interestingly, the aβx- degradation product, both alone and co-localising with bace (fig. ), shows elevated levels in cells and extracellular aggregates immediately surrounding neuritic plaques, suggesting bace degradation not only of newly produced aβ, but also of aβ that is released from, and re-deposited to, existing deposits.
a recent report on widespread somatic changes in individual neurons suggests an additional mechanism for the production of toxic aβ species, including products that do not require secretase cleavage , underscoring the importance of efficient aβ degrading mechanisms that protect from ad, such as the one exerted by bace that we describe here. a recent mouse model has shown that introducing a third dose of chromosome to a mouse that over-expresses aβ several hundred fold worsens the amyloid plaque load, and this correlates with an unexpected decrease in the aβ / ratio . this unfavourable ratio effect (the cause of which is unknown) is expected to worsen the plaque load and ad pathology, and a mere . x increase of bace dose in this mouse model has no chance of protecting the mouse against a > x overload of aβ. in another mouse model, where transgenic bace was artificially overexpressed together with transgenic wtapp, it actually decreased aβ and to the wt mouse control levels, and the presence of the bace transgene reversed behavioural pathologies seen in the tgapp mouse . this indicates that the balance of doses of app and bace affects levels of soluble aβ and , and, as a consequence, their oligomerization and aggregation. our results in figs , and further corroborate that a significant disturbance of this balance by a reduction in bace copy number is sufficient to cause an early ad-like pathology in t cerebral organoids. we did not see any amyloid plaque-like structures in > div organoids from three independent t ipsc lines (or normal disomic lines) of our isogenic system (supplementary figs. , , , , , , figs. , , ). surprisingly, crispr/cas elimination of the third copy of bace in the same t line caused widespread amyloglo+ deposits at div, and widespread neuritic plaque-like structures with profound neuron loss (supplementary fig. , ) and tau pathology at div (figs. , ). our data in figs. and supplementary fig.
suggest that anti-amyloidogenic activity of bace is gene-dose dependent, and its level varies between individuals, with snp allelic differences in bace correlating with age of dementia onset. we therefore hypothesized that organoids grown from some people with ds may develop ad-like pathology without any crispr-cas intervention. a diffuse amyloid plaque-like appearance with tau pathology was recently reported in days old cerebral organoids, but so far from only a single ds-hipsc line. we subsequently analysed ipsc-derived organoids at approximately the same cell culture age from a total of different individuals with ds and one with dupapp. we found flagrant ad-like pathological changes in / ds tested ( %), as well as in the one dupapp. very interestingly, when this assessment was repeated in independent experiments, and when individual organoids from a single experiment were compared, the picture was black and white: either they all had ad-like pathology, or none did, driven solely by the genotype of the donor (table ). our data, though not conclusive, are illustrative of the stratifying potential of this technology. for example, the cerebral organoids from individual qm-ds showed the worst ad-like pathology with fibrillary amyloid deposits (fig. f , i, j, table ), and this individual was diagnosed with dementia at age . in contrast, organoids from individual qm-ds showed no pathology (fig. b , table ), and this individual was also dementia-free at age . this opens up possibilities for finding correlations with clinical parameters, for which a much larger number of individuals would have to be tested.
to confirm that the amyloglo deposits were in fact aggregated β-amyloid containing material, early organoids were treated with a combination of β-secretase inhibitor iv (βi-iv) and gamma secretase inhibitor xii (compound e) (fig. a, b). the combination of these inhibitors should prevent any production of aβ, and therefore eliminate amyloglo positivity.
after treatment for days, the inhibitor treatment did indeed prevent the formation of plaque-like deposits within t c Δ organoids, confirming that such deposits are comprised of β-amyloid. the same treatment conditions also significantly reduced the number of tg -positive cells in t c Δ organoids (fig. c) , highlighting the ability to modulate both amyloid and tau pathology in the cerebral organoid system. this also demonstrates the feasibility of using this ad-like organoid pathology in future hypothesis-free drug screens for chemical compounds that may prevent/inhibit amyloid production or aggregation. in view of our results, it becomes inviting to hypothesize that triplication of bace may be the cause of the delayed onset of dementia in % of people with ds compared to dupapp , and (because of the predicted abundance of bace mrna in endothelial cells) also the cause of a significantly lower degree of cerebral amyloid angiopathy (caa) in the brains of people with ds compared to those of dupapp . our organoid system is not informative in this regard, as we could not detect any endothelial cells in our organoids (not shown). this, however, is also an advantage, as it allows uncovering the mechanisms that are specific to neurons in the absence of endothelial or blood cell derived tissue components. in neurons, a recent report also found that an increased app dose may act (through an unknown mechanism) as a transcriptional repressor of several chromosome genes, including bace . this observation needs further verification and mechanistic explanation, but if true, it would imply that the protective effect of the third copy of bace in ds that we observe is actually quenched by the third copy of app, which opens up possibilities of chemically intervening to inhibit this transcriptional repression and potentially unleash a much greater degree of bace protection. 
an integration of the two observations (the one in and the one in our report) suggests this could be exploited as an additional new protective/therapeutic strategy for ad in general. we found, surprisingly, an equally high or higher level of colocalisation of aβx- with lamp a, as with the general lamp (fig. ). the high level of colocalisation with lamp a and the absence of colocalisation with either lc a or lamp (fig. ) suggest that the aβdp activity of bace that generates aβ is not related to classical lysosomal degradation or macroautophagy, but rather could be related to a cma-like process , . the only published study that linked cma with app processing found a motif that satisfies the criteria for a cma-recognition kferq motif at the very c-terminus of app (kffeq), and this paper demonstrated that c (β-ctf) can bind hsc . however, paradoxically, when this motif is deleted from the β-ctf, the binding to hsc is not abolished, but rather increased, suggesting the presence of another, alternative cma-recognition motif within the β-ctf peptide . the association of the aβdp x- product with the lamp a/cma compartment is a provocative new observation that requires further studies.
in conclusion, we found that relative levels of specific non-amyloidogenic and aβdp (aβ clearance) products are higher in t organoids and ds-csf, and they respond to the dose of bace (and not app). we also demonstrated that the bace -aβdp activity generating one of these products can be cross-inhibited in solution by recently clinically tested bace -inhibitors.
all components of the aβdp degradation reaction (hitherto only demonstrated in solution in vitro), namely the main substrate (aβx- ), the enzyme (bace ) and its putative degradation product (aβx- ), were found highly colocalised in discrete intracellular vesicles in human brain neurons (and not astrocytes), suggesting that at least some of the aβdp activity generating aβx- takes place intra-neuronally and physiologically during lifetime, before the onset of ad pathology, in both normal and ds brains. furthermore, we directly demonstrated that the third copy of bace protected t -hipsc organoids from early ad-like amyloid plaque pathology, therefore proving the physiological role of bace as an ad-suppressor gene. the θ-secretase antiamyloidogenic cleavage and the aβdp degradation actions of bace could both be contributing to an overall ad-suppressive effect. regardless of the contribution of each of these modes of action, our combined data suggest that increasing the action of bace could be exploited as a therapeutic/protective strategy to delay the onset of ad, whereas cross-inhibition of bace -aβdp activity by bace -inhibitors would have an unwanted worsening effect on disease progression. we also show that cerebral organoids from genome-unedited ipscs could be explored as a system for pre-morbid detection of populations at high risk of ad, as well as for identification of natural dose-sensitive ad-suppressor genes.
human subjects were participants in the "the london down syndrome consortium table . upon specific informed consent, three to six individual strands of hair were non-invasively plucked from the scalp hair of donor subjects, and placed in transport medium [dmem (sigma d ), mm glutamine (sigma g ), x pen/strep (sigma, p ), % foetal calf serum]. upon arrival at the lab, hair follicles were placed in collagen coated t flasks in kgm medium (lonza cc- ) and incubated at °c, % co . primary keratinocyte cultures were split after reaching - % confluency using .
% trypsin/ . % edta. primary keratinocyte cultures were expanded to % confluency, electroporated with plasmids encoding reprogramming factors in episomal vectors (non-integrational reprogramming), and (life technologies) supplemented with penicillin/streptomycin. passaging was carried out using relesr and μm rock inhibitor was included in culture media for hours after passaging.
cerebral organoids. cerebral organoids were generated following the standard protocol with the following changes . ipsc lines were first transitioned into feeder free conditions using either mtesr or e media with geltrex. to form embryoid bodies (ebs), hipscs were washed once with pbs, then incubated with gentle cell dissociation solution (stemcell technologies) for min. this solution was then removed, and accutase was added and incubated for a further min. mtesr /e medium at double the volume of accutase was added to the cells and a single cell suspension was generated by trituration. cells were centrifuged to remove accutase and then resuspended in hesc medium supplemented with ng/ml fgf and μm rock inhibitor. cells were used to form a single eb in each well using either a v shaped ultra low attachment well plate (corning). specifically, ipscs were allowed to form embryoid bodies (ebs) in suspension by culturing for days in hesc medium with low fgf, in non-adherent culture dishes. after - days, ebs were transferred into a well ultra low attachment plate for neural induction. neural induction was achieved by culturing for a further - days in dmem-f supplemented with % of each: n , glutamax and mem-neaa, plus μg/ml heparin. neurally induced ebs showing neuroectodermal "clearing" in bright-field microscopy were embedded in matrigel droplets and transferred to cm dishes containing organoid differentiation medium-a (for - days), followed by organoid differentiation medium+a . organoid maturation was carried out with - organoids per cm dish on an orbital shaker at °c, % co .
aliquots of conditioned medium (cm) were collected from mature organoids ( - days old from day of eb formation), - days after feeding (to allow time for cells to secrete products into the culture media). three completely independent experiments were carried out each time starting from undifferentiated ipsc stage, and cm was collected at - timepoints in each experiment. cm was immediately frozen and stored at - °c. for inhibitor treatment, organoids were treated from div ( days after embedding in matrigel) to div. βi-iv and compound e were added freshly to the media before use at final concentrations of . μm and nm respectively. media was replaced every - days during treatment. dmso of the same volume was used as a vehicle only control. cm from organoids was analysed by ip-ms, using a previously described method . the team performing the ms was blinded to the genotypes in all experiments. in exp , all three independent trisomic lines (t c , t c , and t c ) were compared to two independent disomic lines (d c and d c ), whereas in exp , two independent trisomic lines (t c and t c ) were compared to two independent disomic lines (d c and d c ). in exp , a t c line was compared to the isogenic d c line, and to hipsc lines from unrelated individuals: a dupapp feoad patient (qm-dupapp), and two unrelated adult people with ds (qm-ds and qm-ds ). in all experiments, ip-ms results for all ipsc lines that were used in a particular experiment are shown. ip-ms results were used to calculate the relative ratios of peptides and these ratios were taken as data points for the statistical comparisons. ip-ms spectra were also obtained from the csf samples of people with ds and age-matched normal controls. peak ratios calculated as described above. the cohorts, methods and spectra behind these data were previously described . . fish on organoid cryosections was performed as described . 
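the peptide-ratio step described above (ip-ms peak values converted into relative ratios, with each ratio then used as one data point) can be sketched as follows. this is a minimal illustration and not the authors' analysis code: the function name `relative_ratio`, the peptide labels and the intensity values are all hypothetical placeholders.

```python
from typing import Mapping, Sequence


def relative_ratio(peaks: Mapping[str, float],
                   numerator: Sequence[str],
                   denominator: Sequence[str]) -> float:
    """Ratio of summed numerator peak intensities to summed denominator peaks."""
    num = sum(peaks[name] for name in numerator)
    den = sum(peaks[name] for name in denominator)
    if den <= 0:
        raise ValueError("denominator peaks must sum to a positive value")
    return num / den


# hypothetical peak intensities from one cm sample (arbitrary units)
spectrum = {"peptide_a": 120.0, "peptide_b": 80.0, "peptide_c": 400.0}

# one data point for the statistical comparison: (a + b) / c
ratio = relative_ratio(spectrum, ["peptide_a", "peptide_b"], ["peptide_c"])
print(ratio)
```

working with ratios inside a single spectrum, rather than with absolute peak heights, makes each data point independent of how much total material was recovered in that sample.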
briefly, slides were rinsed in pbs, rehydrated in mm sodium citrate buffer and incubated in the same buffer at ⁰c for min. slides were cooled down and incubated in x saline sodium citrate (ssc) for min and in % formamide in x ssc for h. after incubation slides were covered with previously prepared hybridization chamber and incubated with μl of fig. ) . western blot. for western blots, whole cell lysates of crispr edited or unedited ipscs (fig. c ) or organoids (fig. o) were separated in a % acrylamide gel by sds-page and transferred to a nitrocellulose membrane according to the manufacturers protocols (bio-rad). following a min incubation in % non-fat milk in tbs-t the membrane was incubated with primary and secondary antibodies (supplementary tables , ) . for the stainings shown in fig. o quantitations were done strictly on the same membranes re-stained using the antibodies shown. for the protein of interest (bace or tg ), the signal was adjusted to corresponding βactin loading control for all samples. such adjusted values for unedited c (wt) (n= ) were set to , and used to calculate the fold change for c Δ (n= ) replicates, and the resulting fold-change values for pairs run on the same gel were averaged and analysed by student's t-test. membrane stripping between stainings was carried out using thermo-fisher stripping solution, following manufacturer's instructions. amyloglo and thioflavine s staining. for amyloglo staining, oct embedded slices were rinsed with pbs, and incubated in % ethanol for min at rt, followed by washing with milli q water for min at rt. slices were then incubated with amyloglo solution for min in the dark at rt, followed by washing in . % saline solution for min at rt, and counterstaining with draq for min at rt. thioflavine s staining was performed as described supplementary fig. a ), pre-incubation with bace specific immunogenic peptide ( supplementary fig. b-e) and lambda (λ) scan function on confocal microscope ( supplementary fig. 
f, g) . three different samples (ds-ad , ds ( yrs) pre-ad and euploid sporadic ad ( yrs)) after ihc were stained with . % sudan black b in % ethanol for min at rt and analysed on a confocal microscope with airyscan. sample ds-ad was stained with antibody solution, pre-absorbed for hrs with bace specific immunogenic peptide, and analysed on a confocal microscope and slide scanner. a lambda scan records a series of individual images within a defined wavelength range (in our case from nm to the end of the spectrum), with each image detected at a specific emission wavelength, at nm intervals. for lambda scan analysis, samples were stained with one primary antibody and labelled with a far-red secondary antibody ( ). as a negative control, we used the secondary antibody ( ) alone and, as an additional negative control, one sample was counterstained with dapi only, without secondary antibody. as we used a far-red ( ) antibody, we analysed expression from nm to the end of the spectrum at nm intervals. aβx- and bace antibodies showed specific peaks, significantly over and above the autofluorescent signal, in all three specific roi indicated in fig. fig. ).
gallyas staining. for gallyas staining, samples were deparaffinised and/or rinsed in pbs, then treated with ammonium-silver nitrate ( . g nh no , . g agno , . ml % naoh) solution for min protected from the light, rinsed with . % acetic acid ( x min) and placed in developer solution for - min. developer solution was made from three stock solutions: ml of solution a ( g na co + ml distilled water), . ml of solution b ( g nh no + g agno + g tungstosilicic acid hydrate + ml distilled water) and . ml of solution c ( g nh no + g agno + g tungstosilicic acid hydrate + . ml % formaldehyde solution + ml distilled water). after the developer solution, samples were rinsed in water and placed in destaining solution ( g k co + g edta-na + g fecl + g na s o + g kbr + ml distilled water). finally, samples were rinsed two times in . % acetic acid.
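the western blot quantification described earlier in these methods (signal adjusted to the β-actin loading control, unedited control set to one, fold change calculated per gel and then averaged) can be sketched as below. the band intensities are invented for illustration, the helper name `fold_change` is hypothetical, and the one-sample form of the t statistic is an assumption, because the text states only that student's t-test was used.

```python
from math import sqrt
from statistics import mean, stdev


def fold_change(wt_target: float, wt_actin: float,
                edited_target: float, edited_actin: float) -> float:
    """Actin-normalised edited signal relative to the unedited control on the same gel."""
    wt = wt_target / wt_actin              # control, defined as 1 after scaling
    edited = edited_target / edited_actin
    return edited / wt


# hypothetical band intensities, one tuple per gel:
# (wt target, wt actin, edited target, edited actin)
gels = [
    (100.0, 50.0, 60.0, 60.0),
    (80.0, 40.0, 45.0, 50.0),
    (90.0, 45.0, 50.0, 50.0),
]

fold_changes = [fold_change(*gel) for gel in gels]
mean_fc = mean(fold_changes)

# assumed test variant: one-sample t statistic against the control value of 1
t_stat = (mean_fc - 1.0) / (stdev(fold_changes) / sqrt(len(fold_changes)))
print(fold_changes, mean_fc, t_stat)
```

computing the fold change within each gel before averaging keeps gel-to-gel differences in exposure and transfer efficiency out of the comparison.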
after staining samples were rinsed in water, dehydrated in a graded series of ethanol, cleared in histo-clear and mounted with histomount mounting medium. samples were scanned by shown for the lack of space, data available on request). the genome integrity of the isogenic ipsc lines was previously published (but was repeated here as described above). no additional rearrangements due to re-programming or passaging were observed. bace locus snps: the cohort of people with ds has been described in recent reports , . in brief, participants donated dna samples and had detailed cognitive and clinical assessments to determine dementia status . age of dementia diagnosis was established and used in snp analysis. bace snp genotyping for the londowns cohort was undertaken as previously supplementary fig. ) were nominally associated with aoo in the londowns cohort, but were not significant after correction for multiple testing. quantitative paralogous amplification-pyrosequencing was carried out based on the published method . this method takes advantage of the existence of identical sequences on chromosome and one other autosome, allowing amplification of both loci with a single primer pair. paralogous sequence mismatches in amplified products from chromosome (gabpa and itsn) can be quantified relative to their paralogous regions on chromosome and respectively. as such, trisomic cells show a : ratio for the paralogous sequence, while disomic cells produce a : ratio. primers used are listed below, and pyrosequencing was performed on the pyromark q machine (qiagen) following standard procedures. crispr/spcas -hf cas editing of the bace locus. the guide-rna (grna) targeting bace exon was cloned into a vector containing the high fidelity spcas -hf and blasticidin s resistance gene. the complete plasmid was delivered via lipofectamine to a trisomic ipsc line t c (full official name nizedsm it -c ), which was described and characterized in a previous report . 
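the copy-number logic behind the paralogous amplification-pyrosequencing described above can be illustrated with a short calculation. the sketch assumes that the paralogous locus on the other autosome is present in two copies, so that the chromosome-derived share of the amplified product is n / (n + 2) for n copies of the chromosome locus. the function names and the observed values are hypothetical.

```python
from typing import Sequence


def expected_fraction(target_copies: int, paralog_copies: int = 2) -> float:
    """Expected share of amplified product carrying the target-chromosome sequence."""
    return target_copies / (target_copies + paralog_copies)


def call_copy_number(observed_fraction: float,
                     candidates: Sequence[int] = (2, 3)) -> int:
    """Pick the candidate copy number whose expected fraction is closest to the observation."""
    return min(candidates,
               key=lambda n: abs(expected_fraction(n) - observed_fraction))


# hypothetical pyrosequencing read-outs (fraction of the target-chromosome allele)
for observed in (0.51, 0.59):
    print(observed, call_copy_number(observed))
```

because both loci are amplified by the same primer pair in the same reaction, the read-out is a within-reaction ratio and does not depend on the amount of input dna.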
untransfected ipscs were removed by treatment with blasticidin ( μg/ml for h). individual colonies were picked and further subcloned by limiting dilution to achieve clonal cell lines. dna was purified from individual clones, pcr amplified and sequenced by sanger sequencing. sequences were analysed in mutation surveyor (v . . ) and "tracking indels by decomposition (tide)" (tide v . . , desktop genetics). tide analysis of the crispr-targeted clone . . dna sequence gave a score of % of the wt read remaining (not shown). the quality of the grna was assessed using two different prediction software platforms: cctop online software , and the mit online platform (http://crispr.mit.edu/). the same two software platforms were used to predict the off-target sites. neither platform found any off-targets with , or mismatches. the top cctop-predicted sites were pcr amplified in both Δ and wt clones, then sequenced by sanger sequencing to rule out off-target events. no differences in the sequence were found.
protein isolation from cortical organoids. organoids were collected at specified durations in culture (expressed as days in vitro (div)) and washed twice with ice-cold pbs. the samples were resuspended in ice-cold np- buffer ( mm nacl, % np- , mm tris ph ) containing edta free protease inhibitors (complete cocktail, roche) and lysed using a ml tissue homogenizer (fisher). each sample was centrifuged at , rpm for minutes at ˚c and the homogenates were stored at - ˚c. protein concentration was determined using the bicinchoninic acid method (bca, pierce).
(tem). organoids were lysed following the same procedure as for protein extraction; however, samples were initially spun at , g for minutes at ˚c. following the first centrifugation, supernatants were removed and kept on ice. the remaining cell pellets were resuspended in x weight/volume buffer ( mm tris-hcl ph . , . m nacl and % sucrose) containing protease inhibitors and spun at , g for minutes at ˚c.
an equal volume of supernatant was added to the supernatant from the second centrifugation step. % n-lauroylsarcosinate (weight/volume) was added and the samples were rocked at room temperature for one hour. the samples were ultra-centrifuged at , g for one hour at ˚c. the supernatant was decanted and the sarkosyl-insoluble pellet was resuspended in ice cold pbs prior to imaging. the samples were deposited on to glow-discharged mesh formvar/carbon film-coated copper grids, negatively stained with a % aqueous (w/v) uranyl acetate solution and then immediately analysed at kv using a jeol tem equipped with a gatan orius camera.
tem analysis of synthetic aβ - fibrils in vitro. synthetic aβ peptide powder (china peptides) was treated with , , , , , -hexafluoro- -propanol (hfip) and lyophilized. the peptide was then dissolved in µl of mm naoh and then diluted with buffer. a µm stock of this monomeric aβ peptide was incubated at ˚c with shaking at rpm for - hours before recording the tem images. µl of extract was added to a nm thick, lacey carbon on mesh grid (glow-discharged) for minutes, followed by negative staining with % uranyl acetate for minute, and then air dried. the grids were then viewed under an fei t , kv transmission electron microscope equipped with a k ccd camera (fei) at x magnification under low dose conditions.
all data that support the findings described in this study are available within the manuscript and the related supplementary information, and from the corresponding authors upon reasonable request.
and after digestion with hpych iv(cut), for the initial clone . , and its colony-purified sub-clone . . (renamed further below as "Δ "). the bp fragment in . . is reduced to % of the wt value (normalized to the bp band), and a de novo bp fragment appears in the crispr-targeted line (red asterisk). c western blot stained with anti-bace antibody of the lysates of the ipsc line Δ compared to the wt t c ipsc line.
quantification of the total actin-normalised bace signal showed a significant reduction in Δ compared to tau. β-actin was used as a loading control. human brain tissue of a year old is shown for comparison. comparison of the average values (n= ) for crispr-edited t c Δ showed a highly significant relative increase in tg compared to unedited (n= ) t c organoids, as indicated in the graph, p= . . scale bar: μm. the secretases: enzymes with therapeutic potential in alzheimer disease the amyloid hypothesis of alzheimer's disease: progress and problems on the road to therapeutics amyloid plaque core protein in alzheimer disease and down syndrome decreased clearance of cns beta-amyloid in alzheimer's disease alzheimer's disease association of dementia with mortality among adults with down syndrome older than years a genetic cause of alzheimer disease: mechanistic insights from down syndrome eta-secretase processing of app inhibits neuronal activity in the hippocampus beta-secretase cleavage of alzheimer's amyloid precursor protein by the transmembrane aspartic protease bace function, therapeutic potential and cell biology of bace proteases: current status and future prospects physiological functions of the beta-site amyloid precursor protein cleaving enzyme and identification of bace as an avid ss-amyloid-degrading protease a non-amyloidogenic function of bace- in the secretory pathway beta-secretase cleavage at amino acid residue in the amyloid beta peptide is dependent upon gamma-secretase activity bace , as a novel app theta-secretase, is not responsible for the pathogenesis of alzheimer's disease in down syndrome increased app expression in a mouse model of down's syndrome disrupts ngf transport and causes cholinergic neuron degeneration presence of soluble amyloid beta-peptide precedes amyloid plaque formation in down's syndrome isogenic induced pluripotent stem cell lines from an adult with mosaic down syndrome model accelerated neuronal ageing and 
neurodegeneration generation of cerebral organoids from human pluripotent stem cells characterization of amyloid beta peptides in cerebrospinal fluid by an automated immunoprecipitation procedure followed by mass spectrometry app locus duplication causes autosomal dominant early-onset alzheimer disease with cerebral amyloid angiopathy cognitive markers of preclinical and prodromal alzheimer's disease in down syndrome neurofilament light as a blood biomarker for neurodegeneration in down syndrome polymorphisms in bace may affect the age of onset alzheimer's dementia in down syndrome de novo deleterious genetic variations target a biological network centered on abeta peptide in early-onset alzheimer disease altered cerebrospinal fluid levels of amyloid beta and amyloid precursor-like protein peptides in down's syndrome abeta truncated species: implications for brain clearance mechanisms and amyloid plaque deposition macroautophagy--a novel beta-amyloid peptide-generating pathway activated in alzheimer's disease introducing amylo-glo, a novel fluorescent amyloid specific histochemical tracer especially suited for multiple labeling and large scale quantification studies bace and bace enzymatic activities in alzheimer's disease expression analysis of bace in brain and peripheral tissues bace expression increases in human neurodegenerative disease altered beta-secretase enzyme kinetics and levels of both bace and bace in the alzheimer's disease brain a promising, novel, and unique bace inhibitor emerges in the quest to prevent alzheimer's disease the bace- inhibitor cnp for prevention trials in alzheimer's disease bace inhibitor drugs in clinical trials for alzheimer's disease phenotypic and biochemical analyses of bace -and bace -deficient mice chromosome bace haplotype associates with alzheimer's disease: a two-stage study future therapeutics in alzheimer's disease: development status of bace inhibitors sequential amyloid-beta degradation by the matrix metalloproteases 
mmp- and mmp- proteolytic degradation of amyloid beta-protein. cold spring harb perspect med , a somatic app gene recombination in alzheimer's disease and normal neurons trisomy of human chromosome enhances amyloid-beta deposition independently of an extra copy of app in vivo effects of app are not exacerbated by bace co-overexpression: behavioural characterization of a double transgenic mouse model modeling amyloid beta and tau pathology in human cerebral organoids patterns and severity of vascular amyloid in alzheimer's disease associated with duplications and missense mutations in app gene, down syndrome and sporadic alzheimer's disease the impact of app on alzheimer-like pathogenesis and gene expression in down syndrome ipsc-derived neurons unique properties of lamp a compared to other lamp isoforms the coming of age of chaperone-mediated autophagy regulation of amyloid precursor protein processing by its kferq motif d-fish on cultured cells combined with immunostaining hallmarks of alzheimer's disease in stem-cell-derived human neurons transplanted into mouse brain bonferroni sequential correction: an excel calculator ( . ) the londowns adult cognitive assessment to study cognitive abilities and decline in down syndrome detection of aneuploidies by paralogous sequence quantification high-fidelity crispr-cas nucleases with no detectable genome-wide offtarget effects cctop: an intuitive, flexible and reliable crispr/cas target prediction tool tau proteins of alzheimer paired helical filaments: abnormal phosphorylation of all six brain isoforms quantification of the total actin-normalised app signal showed no significant difference between Δ and unedited t line, whereas they both had significantly higher app protein levels compared to the disomic control line. error bars: standard error, p-values after standard one way anova and tukey's multiple comparisons test staining with amyloid specific dye (amyloglo) and nuclear dye (draq ) supplementary fig. . 
cell death and neuronal loss in crispr-edited t c Δ organoids number of dapi+ nuclei are shown in the volume of µm . graph show decreased number of nuclei in crispr-edited t c Δ (div ) organoids compared to parental t c organoids and significantly decreased number of nuclei in div organoids (p< . ) electron micrographs of negatively stained filaments isolated from insoluble fraction of the ad-like pathology containing organoid lysates. a, b representative straight filaments found in the lysates from the organoids t c Δ and qm-dupapp, respectively. c aβ - synthetic peptide fibrils grown in vitro secondary antibody alone controls for organoid immunostaining. dapi staining confirms the presence of cells, but no unspecific signal from secondary antibodies both antibodies show the same pattern of expression and colocalisation after sudan black b staining (white arrows: intraneuronal fine-vesicular pattern and black arrows with white arrowhead: amorphous extra-cellular aggregates) except for a loss of the large intraneuronal spherical granules (white arrowheads, fig. ), which are likely lipofuscin. scale bar: μm. b and c chromogenic, immunohistochemical analysis of the human brain sections of ds-ad , stained using polymer-hrp/ap doublestaining kit. b the primary antibody against bace was labelled with dab (brown) b(i) is a zoomed-in inset of the rectangle in b. c same as b, but both antibodies were pre-absorbed for hours, and incubated overnight, with the excess of immunogenic peptide for the bace antibody; c(i) is a zoomed f and g in order to distinguish the contribution of lipofuscin autofluorescence to the colocalised signals, specificity of primary antibodies (aβx- and bace ) has been validated using lambda (λ) scan function on confocal microscope (see methods). f aβx- shows specific peak in different roi as negative control of staining, dapi and secondary antibody alone were used. g bace also shows specific peak in different roi and uniform pattern in human brain. 
h secondary antibody alone control
supplementary fig. . crispr/spcas -hf -mediated reduction of bace copy number from to in the t c hipsc line, reduced bace protein expression to disomic levels, but does not alter the level of app protein. western blot stained with anti-bace antibody or anti-app antibody of the lysates of the ipsc line Δ compared to the wt t c , and d c ipsc lines. quantification of the total actin-normalised bace signal showed a % reduction in Δ compared to t unedited line, and no significant difference compared to
supplementary fig. . cerebral organoids express cortical neuronal layer-specific and astrocyte markers.
supplementary fig. . comparison of the proportions of neurons and astrocytes to total cells in cerebral organoids. isogenic d and t cerebral organoids generated mostly neurons and a small proportion of astrocytes, with no differences in the proportion of astrocytes or neurons in d compared to t . similar proportions were also detected in organoids from dupapp, qm-ds and qm-ds ipscs.
fig. . two new single nucleotide polymorphisms in bace intron correlate with age-of-dementia-onset among individuals with ds, and co-localize with a de novo deletion causing non-ds eoad.
supplementary fig. . aβx- colocalises with bace much more than with bace in t cerebral organoids.
supplementary fig. . validation of crispr-edited ipscs by snp array and paralogous-loci-amplification-quantitative pyrosequencing.
supplementary fig. . crispr/spcas -hf -mediated reduction of bace copy number from to in the t c hipsc line, reduced bace protein expression to disomic levels, but does not alter the level of app protein.
supplementary fig. . staining of extracellular β-amyloid deposits in organoids with two different methods.
related to fig. : fig. a : variability between individual ipsc lines (representing individual re-programming events) was tested by anova in exp , where all independent trisomic lines of our isogenic model were used in a single experiment.
no significant differences between individual lines were found in any of the calculations shown in fig. , demonstrating that our peptide-ratio-readout parameter is driven by the genotype, and not re-programming artefacts or culture history of the ipsc lines (data did not fit the allowed space, available on request).
as peptide-ratio readouts differed slightly between three independent experiments, we are showing complete data here for each experiment individually. as shown in fig. a , the difference (or the absence of difference) caused by t in an isogenic comparison remained stable in each of experiments. in exp , for the ratio of - /amyloidogenics, the isogenic comparison of t v d showed a p= . ( -tailed t-test), which dropped to p= . after anova comparison with all individual samples. also in exp , we further performed an analysis by genotype groups. for the aβdp/amyloidogenics ratio, the combined t samples (n= ) were significantly higher than d (anova p= . ), and significantly higher than dupapp (anova p= . ), whereas d is not significantly different from dupapp. the same result was obtained for the total bace /amyloidogenics ratio: combined t (n= ) v d , anova p= . ; combined t (n= ) v dupapp, anova p= . , and d v dupapp shows no significant difference. the comparison of α-site cleavages ( - & - )/amyloidogenics never showed any significant difference irrespective of how the samples were grouped.
fig. : fig. : the fret assay positive control was performed using recombinant human bace at ⁰c, ph= . for h in the r&d systems assay buffer, as specified in the manufacturer's protocol, using the r&d systems fret control peptide (es ). in three technical replicates the blank-subtracted raw fluorescence readings obtained were , (± sem). bace with the new fret peptide for the aβdp cleavage after aa (in the absence of any inhibitors) gave blank-subtracted readings , (± sem). this was taken as the % value for the graphs shown in fig. .
for comparison, bace incubated with the same fret peptide, using the manufacturer's assay buffer for bace , gave the readings of (± sem) in the same experiment. fig. and supplementary fig. : we compared the degree of colocalisation between either bace or bace , and aβx- clearance product in organoids, along with other markers of intra-neuronal compartments: flotillin (general marker of lipid rafts), rab (late endosome marker), sortilin (a major apoe receptor linked to aβ catabolism), and lamp (one of the lysosomal membrane proteins often used to visualize lysosomes in studies of aβ-processing). both bace and bace , as well as aβx- highly colocalised with flotillin , suggesting that this type of aβ degradation takes place in lipid raft containing vesicles ( fig. and supplementary fig. ). however, bace and bace differed in vesicular sub-compartment distribution: bace was highly colocalised (> . ) with each sortilin and rab and only weakly with lamp ( . ), whereas bace did not co-localise with sortilin(< . ), but colocalised moderately with rab ( . ) and highly with lamp (> . ) (supplementary fig. ) . interestingly, the localisation of the aβx- fragment closely resembles the pattern of bace , and not of bace : (pearson coefficient of . with each sortilin and rab , and > . with lamp ), further supporting the observation of aβx- (> . ) localisation with bace and less so with bace , in both organoids ( fig. and supplementary fig. ) and human brain (fig. ) . in order to define the compartment with the highest concentration of aβx- within the endo-lysosomal system more precisely, we co-stained the aβx- neoepitope-specific antibody with other markers associated with aβ processing: lc a (macroautophagosome marker), eea (early endosome marker) and lamp (a classical lysosome marker). surprisingly, none of these markers showed any colocalisation, demonstrating that aβx- is not present in either early endosomes, macro-autophagosomes, or classical lysosomes (fig. ) . 
as aβx- did not colocalise with lamp or lc a, but colocalised strongly with lamp , we tested colocalisation with the components of an alternative autophagy pathway that would be compatible with this pattern of colocalisations: chaperone-mediated autophagy (cma). unexpectedly, we detected an extremely high level of colocalization of aβx- with both hsc (chaperone in cma) and lamp a (the isoform of lamp that is the main protein controlling the levels of cma activity) (fig. ). some intraneuronal lamp a+ vesicles appear to contain both hsc and aβx- (fig. ). these data suggest that aβdp activity of bace is linked with the cma pathway.
fig. : fig. a -d: as immunofluorescence on brain sections is susceptible to bright and false positive autofluorescent signals from lipofuscin granules, we confirmed the colocalisation of aβx- and bace using non-fluorescent, chromogenic dual labelled immunohistochemistry (supplementary fig. b), where the specificity of the bace antibody was further verified by pre-absorption control with the immunogenic peptide (supplementary fig. c ). this method confirmed the intra-neuronal co-localization of aβx- and bace signals.
the bp deletion causes a frameshift at aa of bace protein sequence. this introduces a stop codon within the protease cleavage domain at aa . the potential off-target effects of the crispr guide rna used were tested using two prediction software tools: cctop and http://crispr.mit.edu/. no target sequences were found with , or mismatched nucleotides. no targets that had three or more mismatches overlapped between the two software predictions. in cctop, only two sites with three mismatches, and more sites with four mismatches were found. top loci from this prediction were amplified with the putative target sequence in the middle, and sequenced in the t wt ipsc compared to the ∆ ipsc line. no off-target effects of the crispr/spcas -hf intervention were detected.
key: cord- - n h w authors: willforss, jakob; siino, valentina; levander, fredrik title: omicloupe: facilitating biological discovery by interactive exploration of multiple omic datasets and statistical comparisons date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n h w
visual exploration of gene product behavior across multiple omic datasets can pinpoint technical limitations in data and reveal biological trends. the omicloupe software was developed to facilitate such exploration and provides more than interactive cross-dataset visualizations for omic data. it expands visualizations to multiple datasets for quality control, statistical comparisons and overlap and correlation analyses, while allowing for rapid inspection and downloading of selected features. the usage of omicloupe is demonstrated in three diverse studies, including an analysis of sars-cov- infection across omic layers, based on previously published proteomics and transcriptomics studies. omicloupe is available at quantitativeproteomics.org/omicloupe.
omic analysis carries the potential to reveal new biological understanding and serve as a source of biomarkers. still, omic data are challenging to work with, in part because they often contain considerable variation within and between experiments driven by both biological and technical factors, such as differing experimental conditions or sampling procedures. this variation needs to be considered to correctly interpret the data. furthermore, choices of algorithms and statistical procedures for processing the data cause additional differences in the final results [ , ] . the variation seen in the data can represent valuable biological trends, but can also be caused by nuisance factors, such as batch effects [ ] or sample-to-sample technical variation. if the sources of trends in a dataset are understood, the dataset's reliability can be assessed, and robust approaches of analysis and follow-up studies can be designed.
visualization is a critical tool for developing this understanding. in comparative studies, one commonly overlooked aspect is the in-depth analysis of how individual features, such as transcripts or proteins, detected in one set of samples behave in other samples, datasets, or types of omics. quality visualizations such as principal component analysis (pca), and visualizations based on the outcome of statistical comparisons such as volcano plots and p-value histograms, are often used to study trends within datasets. as an extension, several approaches to multiomics have been presented where the aim is to project multiple sets of data down to the same low-dimensional space, such that they can be jointly visualized and inspected [ ] [ ] [ ] . these provide useful overviews of multiomic datasets, but do not offer a detailed view of how individual features behave across multiple datasets in statistical comparisons. here, we propose an approach where single-dataset visualization approaches are expanded to allow direct comparisons across datasets. use cases are, for example, ( ) biomarker studies where an initial set of candidates is to be validated, ( ) time-series experiments where the global expression is inspected, for instance, at different times after infection, ( ) multiomics experiments where multiple types of data are produced for the same or similar biological systems and ( ) detailed studies of comparisons between methods or software approaches. to facilitate such analyses, we here introduce the interactive software omicloupe, which leverages additions to standard visualizations to allow for explorations of features and conditions across datasets beyond simple thresholds, giving insight which otherwise might be lost. the tool aims to be easy to use, to interface directly with upstream software and to enable exploration and export of the parts of the data that are of particular interest.
in the present work, we further demonstrate how omicloupe can be used to rapidly explore complex datasets in three different use cases.
to improve the accessibility and capability of analysis of complex datasets, we developed omicloupe. it is an interactive piece of software accessible through any web browser (quantitativeproteomics.org/omicloupe), which can either be accessed online or installed and launched locally as an r package (https://github.com/computationalproteomics/omicloupe). a singularity container for execution without any prior dependency installation and video tutorials are available to increase its accessibility. the code follows a modular design, promoting the extension of omicloupe with additional visualizations in the future.
omicloupe is built as a collection of modules, each performing a certain part of the analysis ( figure ). it is built to fit into upstream workflows and can handle any combination of one or two expression datasets where the data are presented as tables with samples as columns and features (genes, proteins, transcripts or other measured features) as rows (illustrated in supplementary materials s ). statistical visualizations require columns with p-values, false discovery rate (fdr) corrected p-values, fold change (difference of means between the two compared groups) and average feature level. these values are provided by upstream software such as normalyzerde [ ] or r packages such as limma [ ] for most types of omics or deseq [ ] for rna-seq expression data. after loading the data in the web interface, the visualizations can be accessed immediately.
the general analysis workflow is shown in figure . the workflow starts with the user assessing their data using the sample-wide quality visualizations, including boxplots, density plots, bar plots, dendrograms, histograms, and principal component plots. these visualizations commonly reveal outlier samples and the presence of systematic effects in the data.
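the statistics columns described above (fold change as a difference of group means, average feature level, and fdr-corrected p-values) can be sketched in a few lines. this is a minimal illustration only, not omicloupe's or limma's code: it assumes log-scale abundances, takes the raw p-values as given, and assumes the benjamini-hochberg procedure for the fdr correction, which the text does not specify. the function names `bh_adjust` and `summarize` are hypothetical.

```python
from statistics import mean


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: the source only says "FDR corrected"; BH is one common choice.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease with increasing raw p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def summarize(table, group_a, group_b):
    """Fold change and average level per feature.

    table: {feature: {sample name: log-scale abundance}} (features as rows,
    samples as columns). Fold is the difference of group means (b minus a).
    """
    rows = {}
    for feature, values in table.items():
        a = [values[s] for s in group_a]
        b = [values[s] for s in group_b]
        rows[feature] = {"fold": mean(b) - mean(a), "average": mean(a + b)}
    return rows
```

a table prepared this way, with one row per feature and the p-value, adjusted p-value, fold and average columns next to the sample columns, matches the input layout the text describes.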
further, for studies involving two datasets, omicloupe provides a side-by-side view of whether these effects are present in only one or in both of the datasets. these visualizations help the user to decide how best to perform the analysis, for example whether to omit outliers or which statistical comparisons to perform, and to judge the reliability of the data. next, the user can screen overlaps between statistical comparisons and sample conditions by inspecting whether features pass specific statistical cutoffs (p-value, fdr, optionally in combination with fold change) in one or several statistical comparisons (i.e. specific treatments or time points) or datasets. this overlap is illustrated by venn diagrams for pairwise comparisons and upset plots [ ] for a higher number of comparisons, with the upset visualization designed to efficiently compare a high number of overlaps. further, for statistical comparisons, overlaps can be split by fold direction, giving a better sense of whether overlaps indicate a shared abundance pattern. a novel visualization illustrates the fraction of features that change abundance in the same direction for low- and high-p-values, and the fold patterns of shared features are highlighted. these illustrations jointly provide a detailed view of similarities between contrasts. for both statistical and qualitative upset plots and for the venn comparisons, subsets can be directly inspected and exported. single features can be chosen for closer inspection in the feature check module to evaluate in detail how their abundance values are distributed over any sample condition, shown either as raw data points or by using box- or violin plots. beyond the previously mentioned, further specialized visualizations are provided. an analysis approach for identifying features uniquely present in certain conditions is provided as an upset plot, which can highlight features for which the abundance is below detection limit for certain samples.
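the overlap screening behind a venn or upset view can be sketched as counting, for every combination of comparisons, the features that pass the cutoffs in exactly that combination. the sketch below is an illustration under stated assumptions and not omicloupe's implementation: the function names are hypothetical, and the default cutoffs (fdr below 0.05, no fold filter) are placeholders, since the thresholds are a user choice.

```python
from collections import Counter


def passes(row, fdr_cutoff=0.05, min_abs_fold=0.0):
    """True if a feature passes the (assumed, user-chosen) cutoffs."""
    return row["fdr"] < fdr_cutoff and abs(row["fold"]) >= min_abs_fold


def exclusive_intersections(contrasts, **cutoffs):
    """Count features per exact combination of contrasts, as in an UpSet plot.

    contrasts: {contrast name: {feature: {"fdr": float, "fold": float}}}
    Returns a Counter keyed by frozenset of contrast names; each feature is
    counted once, under the exact set of contrasts in which it is significant.
    """
    membership = {}
    for name, rows in contrasts.items():
        for feature, row in rows.items():
            if passes(row, **cutoffs):
                membership.setdefault(feature, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())
```

keeping the `membership` mapping instead of only the counts would give the feature lists for each subset, which corresponds to the inspection and export of subsets mentioned in the text.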
a correlation plot allows direct illustration of feature correlation patterns between data layers based on the same set of samples, for instance multiomics or alternative software processing of the same dataset. all the plots discussed above can be downloaded in png or vector format and can be customized. to summarize, omicloupe provides a tool to rapidly assess datasets for technical trends and for in-depth studies of statistical comparisons and individual analytes.
case : effects of data processing software on differential expression analysis outcome.
to assess the utility and validity of the approaches introduced in omicloupe, we started by analyzing spike-in proteomic data that have previously been explored extensively in a comparison of data processing software for data-independent acquisition (dia) lc-ms/ms data [ ] . this dataset consists of e. coli and yeast proteins spiked at two different concentrations into a human proteome background. the two mixtures were analyzed in triplicate. in the original work, the data were processed using five different software tools, allowing for a comparison of their relative performance. the tools used were peakview, skyline, openswath, dia umpire, and spectronaut, where only dia umpire was used without matching to a previously generated spectral library. this dataset was employed as an example with known ground truth where concentrations of proteins from different organisms were known, allowing assessment of how well the visualizations illustrate the known underlying trends. further, it demonstrates how omicloupe can be used to assess the impact of different dia software methods for processing the same set of samples. upon inspection in omicloupe, the quality control visualizations show that the choice of software impacts the resulting absolute values, as illustrated in the density plots shown in figure a .
less obvious differences are seen between the spike-in levels, although upon inspection in a dendrogram, the difference between openswath and peakview appeared smaller than their respective spike-in levels (supplementary s ). it can be noted that the intensity values were scaled to reach similar levels in subsequent analyses in the original study for this reason. qualitative inspection was performed to identify proteins only detected by certain software processing methods. here, the majority of proteins ( ) were detected by all five methods, and proteins were detected by all methods except dia umpire. conversely, dia umpire identified proteins that were not detected by any other method (highlighted in blue, figure b ), although peakview identified a higher number of proteins uniquely ( , figure b ). upon statistically comparing the abundances between the spike-in levels, a considerable number of proteins were also uniquely identified as differentially expressed by peakview ( spike-ins). eleven proteins were found to be significant but with opposing direction of change when comparing peakview and spectronaut output (highlighted in blue, figure c ). all eleven of these were yeast proteins, correctly identified as downregulated by peakview. to further elucidate the underlying differences in the processed data from dia umpire and peakview, a closer inspection of the statistical distributions was made in omicloupe. out of the six statistical figures, the volcano plots are illustrated in figure (all six can be found in the supplementary material s ). inspection of features passing the thresholds fdr < . and fold change (log ) > ( figure a) shows a considerable number of common statistical features with the same abundance change directions (blue), but it is notable that a larger number of features are identified only after peakview processing.
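the coloring used when two processing methods are compared (significant in both with the same direction, significant in both with opposing directions, or significant in only one) amounts to a per-feature categorization. the following sketch shows one way to express it; the category labels, function names and default cutoffs are assumptions made for illustration and are not taken from omicloupe.

```python
def categorize(row_a, row_b, fdr_cutoff=0.05, min_abs_fold=1.0):
    """Categorize one feature across two statistical comparisons.

    row_a, row_b: {"fdr": float, "fold": float}, or None if the feature is
    missing from that dataset. Cutoff defaults are illustrative assumptions.
    """
    def significant(row):
        return (row is not None
                and row["fdr"] < fdr_cutoff
                and abs(row["fold"]) > min_abs_fold)

    sig_a, sig_b = significant(row_a), significant(row_b)
    if sig_a and sig_b:
        same = (row_a["fold"] > 0) == (row_b["fold"] > 0)
        return "shared_same_direction" if same else "shared_opposite_direction"
    if sig_a:
        return "only_first"
    if sig_b:
        return "only_second"
    return "not_significant"


def categorize_all(first, second, **cutoffs):
    """Apply categorize to every feature seen in either dataset."""
    features = sorted(set(first) | set(second))
    return {f: categorize(first.get(f), second.get(f), **cutoffs) for f in features}
```

features in the opposite-direction category are the ones worth checking against expected regulation, as was done for the eleven yeast proteins discussed above.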
these are distributed across all significance levels and folds, with no evident trends for a higher concentration of lower abundance values. a handful of features are found to be changing in opposite directions between the groups (green) and were compared to the expected regulation pattern. here, it was found that all features with a positive log fold in both methods were true positives, while out of the two with negative fold-change, one was a false positive. this indicates how omicloupe could be used to qualitatively give indications of the reliability of features across comparisons by using fold change. interestingly, sets of features found to be changing in one group are clustering around zero-fold change in the other method, indicating a different ability of the software to handle these features. further inspection of the ground truth ( figure b ) illustrates the respective types of spike-in. for peakview, there seems to be, in particular for yeast, a considerable number of false negatives identified, while for dia umpire these are less common. to illustrate the joint use of the two methods, a subset of features identified after processing by both methods was inspected ( figure c ). here, a set of features only identified as differentially abundant after peakview processing (yellow in figure a ) was highlighted and their distribution after dia umpire processing inspected. one exception was significant in both cases, seen as the green point with lowest p-value (highest along the y-axis). interestingly, the features represented a mix of true positives (e. coli) and false positives (human). the true positives were found with greater fold change in dia umpire (two rightmost green points in figure c , dia umpire panel). from these observations, we conclude that omicloupe allows for fine-grained analysis of differences resulting from data processing using different software and allows careful inspection of specific data points across multiple datasets.
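with a spike-in design, each statistical call can be checked against the expected regulation of the organism it belongs to. the sketch below tallies true and false positives and negatives in that way. the expected directions are assumptions inferred from the description above (human as unchanged background, yeast expected to be downregulated, e. coli expected to be upregulated), and the labels and function names are hypothetical.

```python
from collections import Counter

# Assumed expected direction of change per organism: 0 means unchanged
# background, -1 downregulated, +1 upregulated. Adjust to the actual design.
EXPECTED_DIRECTION = {"human": 0, "yeast": -1, "ecoli": 1}


def classify_call(organism, fold, significant):
    """Compare one differential-abundance call with the spike-in ground truth."""
    expected = EXPECTED_DIRECTION[organism]
    if not significant:
        return "true_negative" if expected == 0 else "false_negative"
    if expected == 0:
        return "false_positive"
    observed = 1 if fold > 0 else -1
    return "true_positive" if observed == expected else "wrong_direction"


def tally(calls):
    """calls: iterable of (organism, fold, significant) tuples."""
    return Counter(classify_call(*call) for call in calls)
```

running `tally` separately on the output of each processing method gives counts that can be compared between methods, for example the number of false negatives among yeast proteins.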
multiomics studies of the same biological samples are becoming increasingly frequent, but how to integrate the data types and find important features remains challenging. we thus investigated how omicloupe can be used for direct comparisons of different data types taken from the same set of samples, to reveal features only detected in certain conditions, and common patterns of observed abundance level changes. for this purpose, a comprehensive multiomics dataset from endometrial cancer samples was downloaded [ ] . multiple types of data, including proteomics and rna-seq, were acquired for the samples in the original study, and the features had been mapped to common gene identifiers. the samples are classified into different histological types, including copy number variation (cnv) high, which includes serous samples and cancers penetrating at least % of the endometrial wall, and cnv low, consisting of samples penetrating less than % of the endometrial wall. here, the statistical analysis focused on the comparison between these groups. a first view using the pca module revealed a primary separation between most normal samples and tumor samples ( figure a ). this separation was similar in both proteomics and rna-seq datasets, with few noticeable differences. further, a partial separation between cnv high and cnv low can be seen along the second principal component. pca without the control samples group was also performed using the function available in omicloupe (supplementary s ). to study the similarity of the statistical comparisons across the two data types, features with positive abundance change and with low p-values were highlighted in the rna-seq contrast (by dragging directly in the figure) between cnv high and cnv low to see how these distribute in the corresponding contrast in the proteomics dataset ( figure b ). the majority of features were also upregulated in the proteomics data with three exceptions, namely folr , steap b and tbl d .
genes of interest, including tp , which was discussed in the original study, were inspected using the feature check ( figure c ), which showed a reversed pattern in transcriptomics and proteomics. finally, the correlation between transcriptomics and proteomics was studied. pearson and spearman correlations are illustrated in figure d and showed similar median values ( . pearson) to those presented in the original study, with a small number of inversely correlated features. this demonstrates how omicloupe can be used to inspect similarities and differences between layers of omic data generated from the same set of samples, providing an improved understanding of both the general expression profiles and individual gene products. for the proteomics study, the initial quality control revealed two aspects in the data influencing the subsequent analysis. quality control visualizations using pca and dendrograms revealed clustering of samples according to a sample name-based categorization, thought to be the plating numbers of the cell lines ( figure a ). to compensate for this effect, this number was included as a covariate in the statistical tests, and the impact of including it was investigated in omicloupe. the inclusion of this covariate led to a considerably higher number of features detected as significantly different at the thresholds employed (fdr < . and log fold change > . ), as exemplified for the comparison of infected samples at hours versus infected samples at hours ( figure b ). next, omicloupe was used to study control and infected samples independently (as illustrated in supplementary s ). here, a clear pattern was seen in the infected samples, with the hours infected samples separating out along pc , while the hours samples showed a weaker separation. in the control samples, the trend was less clear, and the control hours samples appeared as weak outliers. 
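the per-feature correlation between transcript and protein levels reported above for the endometrial cancer data can be sketched as follows. this is a self-contained illustration with hypothetical function names, not the code behind the correlation module; it assumes both layers use the same gene identifiers and list values in the same sample order.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def feature_correlations(rna, protein, method=pearson):
    """rna, protein: {feature: [values in the same sample order]}."""
    shared = sorted(set(rna) & set(protein))
    return {f: method(rna[f], protein[f]) for f in shared}
```

the median of `feature_correlations(rna, protein).values()` gives a summary comparable to the median correlation mentioned in the text, and features with negative values are the inversely correlated ones.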
in order to study the potential impact of these group comparisons, control and infected samples at and hours were compared, as depicted in figure c . a strong effect of decreasing abundance is seen in control hours, while in infected hours the trend is smaller, with known viral proteins being clearly upregulated. this unexpected distribution of the hours control samples led us to focus on comparisons between infected samples, and the hours infected versus control comparison. to study the viral distribution between the infected conditions we highlighted proteins with known annotation related to either virus or virus receptors (according to uniprot https://covid- .uniprot.org, downloaded th of july ) in comparisons of , and hours infected samples compared to hours infected samples (figure a ).
furthermore, to study the potential of omicloupe, the results from the proteomic study were compared to the transcriptomic dataset. here, the two datasets can easily be uploaded and compared based on their time points. to make a similar comparison in the two datasets, we decided to compare the proteomic and the transcriptomic data at hours, despite one outlier sample being identified in the transcriptomics data at this time point (the pca plot for the transcriptomics data illustrating this outlier is shown in supplementary s ). the distribution of the significant genes in the proteomic study in comparison with the transcriptomic study (expansion media) is depicted in the volcano plot in figure b . of particular interest are the significant genes that are shared between the datasets at hours after infection. at the set threshold (fdr < . and log fold change > . ), differentially expressed genes are shared between the proteomic and the transcriptomic data after hours of infection in the differentiation media. for the expansion media, genes are significantly shared between the proteomic and transcriptomic datasets after hours of viral infection.
the overlap between the ids in both datasets is displayed in the upset plot in figure c . interestingly, genes were overlapping between the proteomic dataset and the transcriptomic study (differentiation and expansion media). as an example, one of those shared genes, cd , is depicted in figure d . cd is a leukocyte surface antigen, which has been shown to be upregulated after a viral infection, including sars-cov- infection, as a host response to the infection [ ] . these overlapping groups were further analyzed in string [ ] to investigate the relevant pathways connected to the significant genes identified. of biological interest is that one of the main regulated reactome [ ] pathways is neutrophil degranulation, which in many studies has been reported as a key biological process during sars-cov- infection [ ] [ ] [ ] . in summary, in this third case study, omicloupe was used to perform a parallel analysis of two datasets from different types of omics (proteomics and transcriptomics) to investigate the response to infection over time. both these datasets were obtained from published studies. by straightforward visualizations, we demonstrate the feasibility of using this tool to easily identify significantly changing gene products, common to both datasets, which can be used for further analysis, such as go enrichment and pathway analysis.
visualization is an important tool to fully explore omics datasets and to highlight features that can be difficult to assess with numbers alone. consequently, many new software solutions for omic data visualization have been presented over the past few years. these include a range of user-friendly stand-alone software for omics visualization such as perseus [ ] for proteomics, or shiny-based software such as shinyomics [ ] , which provides a flexible quality-oriented interface to omic data, and wilson [ ] , which provides high-quality interactive figures based on an open file format but only limited abilities to compare features.
intervene is a software tool focusing on comparisons [ ] , aiming to provide various types of overlap information, but only based on fold change information and not allowing for feature-by-feature examination. furthermore, software dedicated to incorporating multiple layers of omics such as mixomics [ ] has extensive multiomics integration capabilities, but does so on a sample-wide scale rather than focusing on the behavior of single features. despite these notable examples and several more new visualization software packages now being available, we have not found software that performs several of the functionalities for multiple comparisons across datasets provided by omicloupe. key features in omicloupe, such as the side-by-side comparison of data distributions in volcano and ma plots with labeling of key features across comparisons and datasets, as well as the ability to rapidly switch to individual feature views across samples, enable a deeper understanding of the individual features in the data. the implementation of upset plots with optional splitting based on changes in abundance direction can rapidly help in determining reproducibility across datasets. while standard statistical comparison using strict thresholds is in many cases the default option, underlying trends can be found in plots such as upset with less strict thresholds when the data are lacking power. here, we explored three diverse datasets to highlight different aspects of omicloupe's functionality. by comparing the impact of different proteomics software processing methods, we could study in detail differences in outcome between the methods, and identify specific features handled correctly only by one or some of the methods. next, multiomics exploration with both transcriptomic and proteomic data obtained from the same samples gave the opportunity to explore features across omic layers.
here, we identified an overall similarity of trends across the omic data layers, and rapidly illustrated the correlations of transcripts and proteins. further, we visualized key features in detail, including tp , a key protein discussed in the original study, and detected differences at transcript and protein level. this demonstrates how omicloupe can confirm and provide extended knowledge for existing data. finally, we used two separate sars-cov- studies to profile intestine cells during infection. omicloupe was used to identify and navigate technical limitations, including a batch effect, and a seemingly lower reliability of one set of control samples. the overall regulation patterns were relatively different, as expected due to the different types of samples, but still subsets of features with joint abundance changes were identified. these were downloaded and enriched, revealing biological trends in line with what had been observed in prior studies. the cases presented in this manuscript are common examples of challenges encountered when analyzing omics data. beyond these, the omicloupe software has the potential to be used in a wide range of scenarios, to better understand both single- and multiple-omics datasets. to this end, we believe usability is of critical importance for this kind of software, and omicloupe has a straightforward interface, with user help text complemented with video tutorials at the website. having these at hand may mean the difference in how extensively the data could be explored, and thus how well they can be understood. we thus encourage users to test the software, provide feedback about its functionality and to comment on possible useful new extensions. here, we have presented omicloupe, which both introduces novel approaches for comparative visualization across datasets and presents these in an interactive, easy-to-use software.
we have demonstrated its utility on three diverse datasets, starting with a technical dataset to demonstrate how omicloupe can be used for comparing processing methods and how the cross-comparison fold can provide important information. secondly, we explored a multi-omic cancer dataset illustrating how same-sample cross-omics can be readily illustrated. finally, to demonstrate its versatility, we reanalyzed two recently published sars-cov- datasets, performed comparative explorations of these datasets and rapidly identified proteins and rna transcripts showing the same abundance change trends across both studies. based on these results and usage on other datasets, we propose that omicloupe can be a versatile tool in many expression omics-based analyses, both for novice and expert users. we provide it for usage by the community, as an r-package and as an online server. omicloupe is implemented using r (v . . ) and rshiny (v . . . ), using packages providing interactive visualizations: plotly (v . . . ), dt (v . ) and packages for data visualization: ggplot (v . . ), ggally (v . . ), upsetr (v . . , [ ] ) and dplyr (v . . ) for data processing. the code is developed in modules to facilitate reusability. further, a singularity container [ ] was prepared allowing immediate local execution without being required to install the r package dependencies. omicloupe was evaluated using three datasets covering different use cases. an r notebook containing the code used for preprocessing the datasets together with an html-document with the code output is outlined in the supplementary s and accessible on the doi: https://doi.org/ . /zenodo. . a technical dataset was employed where proteins from human, e. coli and yeast had been spiked in at controlled concentrations [ ] and subsequently analyzed using five different dia methods. 
the data was downloaded from proteomexchange [ ] under the id pxd , selecting the data generated on the tripletof instrument with fixed-size windows for all five methods. the hye dataset was used, with a spike-in difference of log fold . the raw data matrices were preprocessed both into five separate data matrices, and into a single merged matrix consisting of all joint protein entries. they were subsequently log -transformed and rolled up to protein level using an r-reimplementation (github.com/computationalproteomics/proteinrollup) of the danter rrollup [ ] , using default settings and excluding proteins supported by a single peptide. statistical contrasts between the two concentration levels were subsequently calculated using limma [ ] as provided by normalyzerde [ ] , and resulting p-values adjusted for multiple hypothesis testing using the benjamini-hochberg procedure [ ] . a multiomics dataset was used from a study investigating prospectively collected endometrial carcinoma tumors [ ] divided into four histological groups, including copy number (cnv)-high (serous cancer, a rare aggressive variant, and cancers with more than % penetrance of the endometrial wall) and cnv-low (less than % penetrance of the endometrial wall). the data matrices and meta information were downloaded from the supplementary information of the original study. samples omitted from the original study were similarly omitted, as specified in the matrix obtained from the original supplementary. further, upon inspection with omicloupe the set of normal samples was identified as a strong outgroup and omitted to avoid influence in the statistical procedure. the original dataset contains multiple layers of omic data, out of which the following two were used in the present study: proteomics and mrna levels. for the proteomics matrix, statistical contrasts were calculated using limma [ ] in normalyzerde and benjamini-hochberg [ ] corrected fdr values were calculated.
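for readers unfamiliar with the benjamini-hochberg adjustment used for the contrasts above, the following is a minimal sketch of the procedure. in the study the adjustment was carried out within the r-based workflow; the function name here is illustrative and the code is not taken from normalyzerde or limma.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a larger one.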
for the transcriptomics matrices the data was provided as rsem estimated counts. it was first transformed with voom with quality weights [ ] and subsequently processed using limma [ ] . statistical contrasts were for both data types calculated between samples classified as cnv-high and cnv-low. the first dataset analyzed in this case study is a recently published sars-cov- proteomic dataset [ ] , where a human colon epithelial carcinoma cell line (caco- ) was infected by sars-cov- and proteomic analyses were performed on samples at four time points ( , , , hours after infection), both for infected samples, and samples treated with a mock infection. the proteomic data and metadata were generously provided by the authors in the supplementary materials of the study. zero-values were replaced with na, and the protein abundance values were log transformed. statistical contrasts were calculated using limma [ ] in normalyzerde [ ] and resulting p-values fdr-corrected using the benjamini-hochberg procedure [ ] . initially, statistical comparisons were made between infected and control samples at each of the four time points ( , , and hours after infection). after initial explorations in omicloupe a batch effect was identified, which was subsequently included as a covariate in the statistical test, as described in the results section. the second dataset was from human intestinal organoids infected with sars-cov- in both differentiation and expansion media and analyzed at two time points after infection ( and omicloupe is designed to easily interface with the upstream data generation process and to work on any expression data matrix. it provides the ability to explore up to two datasets, and provides comparative views between statistical contrasts performed either within one dataset or across multiple.
it is organized in modules allowing rapidly shifting from a sample-wide view, to inspect individual statistical comparisons, overlaps between multiple comparisons, to understanding single features. (adapted from schematics shown at http://quantitativeproteomics.org/omicloupe). upset plots of the features that were found as significantly differentially abundant (fdr < . , log fold change > ) when data had been processed using the different software. in c, features that are changing upwards or downwards in the comparison are displayed separately to visualize contradictory abundance changes due to differential processing. eleven proteins that were deemed significantly changing, but with opposing direction of change after processing in peakview and spectronaut are highlighted in blue. these panels show part of the statistical interface in omicloupe. panel a) shows features passing the significance threshold fdr < . and log fold > . in individual datasets, and in both. green points ("contra" in the legend), are passing the significance threshold in both datasets, but with reversed log fold direction. panel b) shows coloring based on the spike-in source. panel c) show the outcome of interactively highlighting a set of five features only significant in peakview and one significant in both. this reveals their distribution in dia umpire, showing that the features upregulated in both are true positives, while one of the two found in lower abundance in dia umpire is a false hit. a) principal component illustration present in the quality module, comparing proteomics and transcriptomics. as can be seen, the major trends are similar between the two data types. b) distribution of how high-significance features upregulated in rna-seq distribute in the proteomics dataset. c) boxplots of tp , identified in the dataset across the four studied sample classifications using the feature check module. 
d) correlation distributions between the rna-seq and proteomics features using the correlation module. a) inspection revealed a separation of samples along the second principal component likely related to a plating effect. this was compensated for in subsequent statistical tests by including it as a covariate. b) the impact of performing differential expression analysis without and with including the putative plating number as a covariate. the inclusion of the covariate yielded new statistical features while losing six as compared to not including the covariate. c) comparison of control samples h and h shows many features with a decrease in abundance, indicating that the mock treatment might influence the data. comparison between infected samples at h and h show more limited differences, with seven detected viral proteins among those with increased abundance at hours indicated in red circles (log fold change > . ) out of which two passed the fdr threshold. statistical analysis performed using the following settings: fdr < . and log fold change > . . for both datasets, data from hours after viral infection are used a) inspection of infected samples, hours and hours compared to hours, colored by proteins known as virus proteins and virus receptors revealed a clear upregulation among virus proteins. b) direct comparison of infected and control at hours after infection in proteomics and comparison between the proteomic dataset and the transcriptomic dataset (expansion media). the coloring is based on in which dataset the gene products pass the significance threshold. green points ("contra" in the legend), are passing the significance threshold in both datasets, but with reversed log fold direction. this comparison revealed a set of shared proteins, both changing abundance in the same or opposite direction. c) illustration of the shared significant genes between proteomic and transcriptomic (differential and expansion media) datasets. 
d) cd distribution at different time points in the proteomic study and in the transcriptomic data. the data was retrieved from the ncbi gene expression omnibus (geo) database, from the accession number gse . the data were tmm normalized comparing infected samples at and hours after infection to the uninfected reference p-values were fdr-corrected using the benjamini-hochberg procedure funding jw and fl were supported by the swedish foundation for environmental strategic research (mistra biotech). vs and fl were supported by olle engkvist byggmästare ( - ) and the technical faculty at lund university (proteoforms@lu) a multicenter study benchmarks software tools for label-free proteome quantification a comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation why batch effects matter in omics data, and how to avoid them mint: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms diablo: an integrative approach for identifying key molecular drivers from multi-omics assays sparse pls discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems online tool for improved normalization of omics expression data and high-sensitivity differential expression analysis limma powers differential expression analyses for rna-sequencing and microarray studies moderated estimation of fold change and dispersion for rna-seq data with deseq upsetr: an r package for the visualization of intersecting sets and their properties proteogenomic characterization of endometrial carcinoma sars-cov- productively infects human gut enterocytes. 
science ( -) upregulation of cd is a host checkpoint response to pathogen recognition string -a global view on proteins and their functional interactions in organisms the reactome pathway knowledgebase clinical features of patients infected with novel coronavirus in wuhan crucial, or harmful immune cells involved in coronavirus infection: a neutrophil extracellular traps in covid- the perseus computational platform for comprehensive analysis of (prote)omics data collaborative exploration of omics-data web-based interactive omics visualization intervene: a tool for intersection and visualization of multiple gene or genomic region sets lê cao ka. mixomics: an r package for 'omics feature selection and multiple data integration singularity: scientific containers for mobility of compute proteomexchange provides globally coordinated proteomics data submission and dissemination a statistical tool for quantitative analysis of -omics data controlling the false discovery rate: a practical and powerful approach to multiple testing voom: precision weights unlock linear model analysis tools for rna-seq read counts sars-cov- infected host cell proteomics reveal potential therapy targets we want to thank victor lindh whose master's thesis project allowed us to explore ideas which later were further developed here. we also thank omicloupe users who have tested and provided valuable feedback. finally, we would like to thank the authors behind the four datasets used in the three case studies who generously provided their data for further use in an accessible way. the authors declare that they have no competing interests. the study was conceived by jw and fl. the software development was carried out by jw with input about functionality from vs and fl. data analysis was performed by jw and vs. the manuscript was drafted by jw and was expanded, edited and approved by all authors. 
key: cord- -lh rl authors: valentine, charles c.; young, robert r.; fielden, mark r.; kulkarni, rohan; williams, lindsey n.; li, tan; minocherhomji, sheroy; salk, jesse j. title: direct quantification of in vivo mutagenesis and carcinogenesis using duplex sequencing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lh rl the ability to accurately measure mutations is critical for basic research and identification of potential drug and chemical carcinogens. current methods for in vivo quantification of mutagenesis are limited because they rely on transgenic rodent systems that are low-throughput, expensive, prolonged, and don’t fully represent other species such as humans. next generation sequencing (ngs) is a conceptually attractive alternative for mutation detection in the dna of any organism; however, the limit of resolution for standard ngs is poor. technical error rates (~ × − ) of ngs obscure the true abundance of somatic mutations, which can exist at per-nucleotide frequencies ≤ × − . using duplex sequencing, an extremely accurate error-corrected ngs (ecngs) technology, we were able to detect mutations induced by carcinogens in tissues of strains of mice within days following exposure. we observed a strong correlation between mutation induction measured by duplex sequencing and the gold-standard transgenic rodent mutation assay. we identified exposure-specific mutation spectra of each compound through trinucleotide patterns of base substitution. we observed variation in mutation susceptibility by genomic region, as well as by dna strand. we also identified the primordial signs of carcinogenesis in a cancer-predisposed strain of mice, as evidenced by clonal expansions of cells carrying an activated oncogene, less than a month after carcinogen exposure. these findings demonstrate that ecngs is a powerful method for sensitively detecting and characterizing mutagenesis and the early clonal evolutionary hallmarks of carcinogenesis.
duplex sequencing can be broadly applied to chemical safety testing, basic mutational research, and related clinical uses. significance statement error-corrected next generation sequencing (ecngs) can be used to rapidly detect and quantify the in vivo mutagenic impact of environmental exposures or endogenous processes in any tissue, from any species, at any genomic location. the greater speed, higher scalability, richer data outputs, as well as cross-species and cross-locus applicability of ecngs compared to existing methods make it a powerful new tool for mutational research, regulatory safety testing, and emerging clinical applications. carcinogenesis is rooted in somatic evolution. cell populations bearing stochastically-arising genetic mutations undergo iterative waves of natural selection that enrich for mutants which confer a phenotype of preferential survival or proliferation . the probability of cancer can be increased by carcinogens-exogenous exposures that either increase the abundance of mutations or facilitate a cell's ability to proliferate upon selective pressures. many chemicals induce dna damage, thereby increasing the rate of potentially oncogenic dna replication errors . the same is true for many forms of radiation . non-mutagenic/non-genotoxic carcinogens act through a variety of secondary mechanisms such as inhibition of the immune system, cell cycle overdrive to bypass normal dna replication checkpoints, and induction of inflammation which may lead to both increased cellular proliferation and dna damage, among others . preclinical genotoxicity and carcinogenicity testing of new compounds is often required before regulatory authority approval and subsequent human exposure , . however, this standard is slow and expensive; even in rodents, it takes years to reach the endpoint of tumor formation. 
over the past fifty years, a variety of approaches have been developed to more quickly assess biomarkers of genotoxicity or cancer risk by assaying dna reactivity or mutagenic potential as surrogate endpoints for regulatory decision-making . the most rapid and inexpensive of such methods include in vitro bacterial-based mutagenesis assays (e.g. the ames test). other in vitro and in vivo assays for mutation, chromosomal aberration induction, strand breakage, and formation of micronuclei are also available, however, their sensitivity and specificity for predicting human cancer risk is only modest. in vivo, internationally accepted mutagenesis assays using transgenic rodents (tgr) provide a powerful approximation of oncogenic risk, as they reflect whole-organism biology, but are also highly complex test systems . transgenic rodent mutagenesis assays require maintenance of multiple generations of animals bearing an artificial reporter gene, animal exposure to the test compound, necropsy several weeks after exposure, isolation of the integrated genetic reporter by phage packaging, and transfection of the phage into e. coli for plaque-counting on many petri dishes under permissive and non-permissive selection conditions to finally obtain a mutant frequency readout. although effective, the infrastructure and expertise required for managing a protocol which carries host dna through three different kingdoms of life has not been amenable to ubiquitous adoption. directly measuring ultra-rare somatic mutations from extracted dna while not being restricted by genomic locus, tissue, or organism (i.e. could be equally applied to rodents or humans) is appealing yet is currently impossible with conventional next-generation dna sequencing (ngs). standard ngs has a technical error rate (~ × - ) well above the true per-nucleotide mutant frequency of normal tissues (< × - ) . 
new technologies for error-corrected next generation sequencing (ecngs) have shown great promise for low frequency mutation detection in fields such as oncology and, conceptually, could be applied to genetic toxicology , . duplex sequencing (ds), in particular, is an error correction method that achieves extremely high accuracy by comparing reads derived from both original strands of dna molecules to produce duplex consensus sequences that better represent the true sequence of the source dna molecule. ds achieves a sensitivity and specificity several orders of magnitude greater than other methods that do not leverage paired-strand information; it is uniquely able to resolve mutants at the real-world frequencies produced by mutagens: on the order of one-in-ten-million . in this study we tested the feasibility of using duplex sequencing to measure the effect of genotoxicants in vivo. we assessed the dna of two strains of mice which were treated with three different mutagenic carcinogens, each with distinct modes of action, and examined five different tissue types, to generate a total of almost ten-billion error-corrected nucleotides worth of data. in addition to comparing mutant frequencies with those obtained from a gold-standard tgr assay, we explored data types not possible with traditional assays, including mutant spectra, trinucleotide signatures, and variations in the relative mutagenic sensitivity around the genome. our findings illustrate the richness of genotoxicity data that can be obtained directly from genomic dna. finally, we highlight an opportunity in applying ecngs to drug and chemical safety testing, as well as broader applications in the research or clinical settings, for the detection of phenomena related to both mutagenesis and carcinogenesis. current in vivo transgenic rodent (tgr) mutagenesis detection assays measure the potential of a test article to generate mutations in a selectable reporter gene. 
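the paired-strand principle behind duplex sequencing, described above as comparing reads derived from both original strands of a dna molecule, can be sketched as follows. this is a simplified illustration and not the published ds pipeline: it assumes reads that are already grouped by source molecule and strand, aligned, of equal length and expressed in reference orientation, and the `min_fraction` cutoff and the use of "N" for masked positions are assumptions of the sketch.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.9):
    """Per-position majority base across the reads of one strand family.

    Positions where no base reaches min_fraction (a hypothetical cutoff) become 'N'.
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top_reads, bottom_reads, min_fraction=0.9):
    """Keep a base only when the consensus of both original strands agrees."""
    top = strand_consensus(top_reads, min_fraction)
    bottom = strand_consensus(bottom_reads, min_fraction)
    return "".join(t if t == b and t != "N" else "N" for t, b in zip(top, bottom))
```

a change seen in only one strand family, such as a damaged base or a polymerase error, is masked rather than reported, whereas a change present in both strand families is retained in the duplex consensus.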
traditional two-year carcinogenicity studies measure the ability of an agent to induce gross tumors in mice and rats. we designed two parallel mouse cohort studies to assess whether duplex sequencing (ds) of genomic dna could be used as an alternate method of quantifying both mutagenesis and early tumor-precursor formation (fig. ) . we selected two transgenic strains of mice: big blue ® , a c bl/ -derived strain bearing approximately integrated copies of lambda phage per cell and tg-rash , a cancer-predisposed strain carrying approximately copies of the human hras proto-oncogene , . the big blue mouse is one of the three most frequently used strains in the tgr mutagenesis assay and the tg-rash mouse is used for accelerated -month carcinogenicity studies. animals were dosed for up to twenty-eight days with one of three mutagenic chemicals (or vehicle control) and then were sacrificed and necropsied. genomic dna was isolated from various frozen tissues for subsequent mutational analysis ( table ). the rationale for the selection of the specific strains, chemicals, tissues, and genic targets to be sequenced is detailed in the sections that follow. we compared the frequencies of chemically induced mutations measured by a conventional plaque-based tgr assay (big blue) against those obtained by duplex sequencing of the big blue reporter gene (cii) after isolation from mouse gdna in the absence of any in vitro selection. the gold-standard for chemical cancer risk assessment involves exposing rodents to a test compound and assessing an increase in tumors relative to controls. while effective, the approach takes two years. mutagenicity is one accepted surrogate for carcinogenicity risk which is more rapidly assessable. at present, transgenic rodent assays are the most validated in vivo mutagenesis measurement approach. a genetically modified strain carrying an artificial reporter gene is mutagen-exposed, necropsied, and the reporter is recovered from extracted dna. 
reporter dna is then packaged into phage and transfected into bacteria for readout by counting total and mutation-bearing plaques following incubation at mutant-selectable temperature conditions. a simple mutant frequency can be determined within months, but the assay is complex. an ideal in vivo mutagenesis assay would be dna sequencing-based and able to directly measure mutation abundance and spectrum in any tissue of any species without the need for genetic engineering or selection. eighteen big blue mice were treated with either vehicle control (vc, olive oil), benzo[α]pyrene (b[α]p) or n-ethyl-n-nitrosourea (enu) for up to days (see materials and methods). we selected b[α]p and enu based on their historical use as positive controls in many early studies of mutagenesis and because they are recommended by oecd tg for demonstration of proficiency at detecting in vivo mutagenesis with tgr assays . we evaluated bone marrow and liver tissue. the former was selected based on its high cell division rate and the latter based on its slower cell division rate and the presence of enzymes necessary for converting some non-reactive mutagenic compounds into their dna-reactive metabolites. corresponding plaque-based cii gene mutant frequency data using the big blue tgr plaque-based assay were collected on all samples (supplemental table ). genomic dna was ultrasonically sheared and processed using a previously reported ds approach , which included enrichment for genic targets via hybrid capture (see materials and methods). all samples were initially investigator-blinded with regard to treatment group. in this first experiment, we sequenced the multi-copy cii transgene to a mean duplex depth (i.e. single duplex source molecule, deduplicated, coverage) of , x per sample.
duplex sequencing mutant frequency per sample was calculated as the total number of unique non-reference nucleotides detected among all duplex reads of the cii gene divided by the total number of duplex base pairs of the cii gene sequenced. the mean per-nucleotide mutant frequency measured by ds in the vc, b[α]p, and enu-exposed groups was . × - , . × - ( . -fold increase over vc) and . × - ( . -fold increase over vc) respectively. the mean fold-increase detected between vehicle control and mutagen-exposed groups was similar to that measured by the conventional plaque assay, with per-gene mutant frequencies for vc, b[α]p, and enu averaging . × - , . × - ( . -fold increase over vc), and . × - ( . -fold increase over vc), respectively ( fig. a) . the extent of induction by both assays was dependent on the tissue type. bone marrow cells, with their higher proliferation rate, accumulated mutations at . and . times the rate of the slower-dividing cells from the liver for b[α]p and enu, respectively. the extent of correlation between the fold change mutation induction of the two methods (r = . ) was encouraging given that the assays measure mutant frequency via two fundamentally different approaches. duplex sequencing genotypes millions of unique nucleotides to assess the proportion that are mutated whereas the plaque assay measures the proportion of phage-packaged cii genes that bear at least one mutation that sufficiently disrupts the function of the cii protein to result in phenotypic plaque formation. put another way, mutations that are disruptive enough to prevent packaging or phage expression in e. coli, or those that are synonymous or otherwise have no functional impact on the cii protein, will not be scored. blue mouse tgr assay yields a similar fold-induction of mutations in response to chemical mutagenesis as the readouts from the plaque-based assay. error bars reflect % confidence interval. (b) standard dna sequencing has an error rate of between . 
% and % which obscures the presence of genuine low frequency mutations. shown is conventional sequencing data from a representative base pair section of the human hras transgene from the lung of a tg-rash mouse in the present study. each bar corresponds to a nucleotide position. the height of each bar corresponds to the allele fraction of non-reference bases at that position when sequenced to > , x depth. every position appears to be mutated at some frequency; nearly all of these are errors. (c) when the same sample is processed with ds, only a single authentic mutation remains. one difference observed between the two methods was an attenuation of response to b[α]p in the marrow group by ds. this might be explained by an artificial skew due to the fold-increase calculation used, whereby slight variations in the frequency of vc will have disproportionately large effects on fold-increase measures, but could also be wholly biological. it is also conceivable that dna adducts, or sites of true in vivo mismatches, could be artefactually "fixed" into double-stranded mutations when passaging reporter fragments through e. coli in the tgr assay and that this effect is amplified as overall mutant frequency increases. duplex sequencing, based on its fundamental error-correction principle, will not call adducted dna bases as mutations when directly sequencing the cii genomic dna, since a mutation has not yet formed on both strands of the dna molecule. nevertheless, the overall correlation between ds and tgr assays was high and the mutant frequency measured in the vc samples by ds, on the order of one-per-ten-million mutant nucleotides sequenced, was ten-thousand-fold below the average technical error rate of standard ngs (fig. b, c) .
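the mutant frequency and fold-increase calculations discussed above can be written out as a short sketch. the function names are illustrative, and the counts in the usage note are hypothetical values chosen only to show how sensitive the fold-increase is to the vehicle control denominator; they are not data from the study.

```python
def mutant_frequency(unique_mutant_nucleotides, duplex_bases_sequenced):
    """Unique non-reference nucleotides divided by total duplex base pairs sequenced."""
    if duplex_bases_sequenced <= 0:
        raise ValueError("duplex_bases_sequenced must be positive")
    return unique_mutant_nucleotides / duplex_bases_sequenced


def fold_increase(treated_frequency, control_frequency):
    """Ratio of the treated mutant frequency to the vehicle control frequency."""
    if control_frequency <= 0:
        raise ValueError("control_frequency must be positive")
    return treated_frequency / control_frequency
```

as a hypothetical example, a control with 10 mutant nucleotides in 100 million duplex bases has a frequency of one per ten million, and a treated sample with 100 mutants at the same depth shows a 10-fold increase. if the control had yielded 8 mutants instead of 10, the same treated sample would show a 12.5-fold increase, which illustrates why small variations in the control frequency have large effects on fold-increase measures.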
no difference in mutant frequency or spectrum between control and exposed samples could be detected when analyzing the data from either raw sequencing reads or ecngs methods that do not account for complementary strand information (single-strand consensus sequencing) (fig. s ) . the type of base substitution changes induced is an important element of mutagenesis testing. what might appear as a lack of overall effect on mutant frequency can sometimes be clearly appreciated when analyzing a dataset for the individual frequencies of specific transitions and transversions. mutation spectra can also provide mechanistic insight into the nature of a mutagen. although laborious, it is possible with plaque assays to characterize mutation spectra by picking and sequencing the clonal phage populations of many individual plaques or plaque pools , . because mutations in plaques have been functionally selected and the transgenic target is relatively small, it is possible that the spectral representation is skewed relative to a non-selection-based assay. to assess whether mutation spectra are consistent between ds and tgr assays, we physically isolated, pooled, and sequenced (also with ds) , cii mutant plaques derived from big blue rodents exposed to vc, b[α]p, and enu. we then compared the mutation spectra between the ds-analyzed mutant plaques and the ds-analyzed gdna. the base substitution spectra detected in the cii gene by both approaches were highly similar between methods (p-value > . , chi-squared test) (fig. s ) . % by tgr). the normally uncommon base substitutions with adenine or thymine as reference were increased in all enu exposed samples. the canonical transition that identifies enu mutagenesis, c·g→t·a, was present at . % by ds and . % by tgr. these data add further weight of evidence that the mutations identified by ds reflect authentic biology and not technical artifacts.
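a base substitution spectrum of the kind compared above can be tabulated with a small sketch. it collapses the twelve possible single-base changes into six strand-independent classes and reports their proportions. the function names and the ascii labels (for example `C:G>T:A` standing in for c·g→t·a) are choices made for this illustration, not the notation or code used in the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a single-base change into one of six strand-independent classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a base substitution: {ref}>{alt}")
    if ref in "AG":
        # Purine reference: describe the change from the complementary strand instead,
        # so that G>A and C>T fall into the same class.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def spectrum(mutations):
    """Proportion of each substitution class among (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}
```

the resulting class proportions for two assays could then be arranged as a contingency table of counts and compared with a chi-squared test, as was done in the text.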
the eponymously named tgr assays rely on a transgenic reporter cassette which can be recovered from genomic dna. it is the ratio of mutant to wild-type genes, as inferred through phenotypically scoreable plaques, which permits the calculation of a mutant frequency [ ] [ ] [ ] [ ] . while these systems readily identify a subset of mutations in the reporter, others will not disrupt protein function and remain undetectable. given that the primary use of tgr assays has been for relative, rather than absolute, mutational comparison between exposed and unexposed animals, the non-functional subset of mutants has historically been considered irrelevant. yet with the increasing interest in more complex multi-nucleotide mutational spectra , the functional scoring of every base becomes essential, given that a specific sequence may rarely, or never occur in a small reporter region. duplex sequencing does not have this limitation since there is no selection post-dna extraction; all possible single nucleotide variants, multi-nucleotide variants, and indels can be equally well identified. to illustrate the impact of tgr selection on mutant recovery, we plotted the functional class of all cii mutations identified by duplex sequencing of either genomic dna obtained directly from mouse samples (fig. a) or from a pool of , individual mutant plaques that were isolated post-selection (fig. b) . in the tgr plaque assay, the mutations were almost exclusively nonsense or missense across the entire nucleotides of the cii gene (i.e. expected to result in the loss of cii protein function). only a small number of synonymous base changes were identified, and these were always accompanied by a concomitant disruptive mutation elsewhere in the gene. exceptionally few mutations were found at the n-and c-termini of the cii gene, presumably due to their lesser importance to protein function. 
in contrast, duplex sequencing detected mutations of all functional classes at the expected non-synonymous to synonymous (dn/ds) ratio along the entire length of the gene, including the termini regions. tgr assays rely on the assumption that the mutability of the cii lambda phage transgene is a representative surrogate for the entire mammalian genome. we hypothesized that local genomic features and functions of the genome such as transcriptional status, chromatin structure, and sequence context may modulate mutagenic sensitivity. to test this idea, we used ds to measure the exposure-induced spectrum of mutations in endogenous genes with different transcriptional status in different tissues: beta catenin (ctnnb ), dna-directed rna polymerases i and iii subunit rpac (polr c), haptoglobin (hp) and rhodopsin (rho), as well as the cii transgene in big blue mouse liver and marrow of animals exposed to olive oil (vehicle control), b[α]p, or enu. we assessed mutations in the same four endogenous loci in the lung, spleen, and blood of tg-rash mice exposed to saline (vehicle control) or urethane to investigate ds performance in a second mouse model. the duplex sequencing single nucleotide variant (snv) per-nucleotide mutant frequencies across mouse model, tissue, treatment group, and genomic locus are shown in figure . vehicle control (vc) mutant frequencies averaged . × - in the big blue mouse model ( fig. a ) and . × - in the tg-rash mouse model (fig. b) . the number of unique mutant nucleotides detected per vc sample ranged from to (mean . ) and were always non-zero (fig. s .) these frequencies are comparable to chawanthayatham et al. ( ) where a dmso vehicle-exposed transgenic gptΔ mouse was measured to have a mutant frequency of . × - in liver samples after duplex sequencing of the reporter recovered from gdna . we observed a mean background mutant frequency in the marrow ( . × - ) nearly twice that of peripheral blood ( . × - ), liver ( . × - ), lung ( . 
× - ), and spleen ( . × - ), which may relate to differences in relative cell-cycling times in these tissues. in all mutagen-exposed samples, the mutant frequency was increased over the respective vehicle control samples. however, the fold-induction across tissue types varied considerably, as each compound has a different mutagenic potential, presumably related to varying physiologic factors such as tissue distribution, metabolism, and sensitivity to cell-turnover rate , . the cii and rho genes had highest mutant frequencies among all tested loci in bone marrow. other genes, such as ctnnb and polr c, exhibited frequencies as much as -fold lower. this disparity is potentially due to the differential impact of transcription levels and transcription-coupled repair (tcr) of lesions and/or local chromatin structure . ctnnb and polr c are thought to be transcribed in all tissues we tested, and therefore benefit from tcr, whereas rho and cii are thought to be non-transcribed, and thus should not be impacted by tcr. hp was selected as a test gene because it is transcribed in the liver but not significantly in other tissues. the aforementioned logic cannot explain why hp exhibited an elevated mutation rate compared to other genomic loci in the mouse liver. an additional genomic process related to the transcriptional status is dna methylation. it is known that lesions on nucleotides immediately adjacent to a methylated cytosine have a lower probability of being repaired due to the relative bulk and proximal clustering of the adducts . this or other factors, such as differential base composition between sites could also be at play. mechanisms aside, the widely variable mutant frequency we observe across different genomic loci indicates that no single locus is ever likely to be a comprehensive surrogate of the genome-wide impact of chemical mutation induction. tg-rash mouse study evaluating lung, spleen, and peripheral blood in animals exposed to vc or urethane. 
there is no cii transgene in the tg-rash mouse model. note the different y-axis scaling between the two studies. to further investigate the potential role of tcr as a contributor to the observed differential regional sensitivity to mutagens, we examined the strandedness of mutations identified by duplex sequencing at each locus. mutational strand bias is defined as a difference in the relative propensity for a particular type of nucleotide change (e.g. a→c) to occur on one dna strand versus the other. this bias may result from multiple factors including transcription, epigenetic influences (e.g. methylation), proximity to replication origins, and nucleotide composition, among others , . we compared the per-nucleotide mutant frequency for each base substitution against its reciprocal substitution in our urethane-exposed mouse cohort. if a strand bias were to exist, then these frequencies would be unequal . we then correlated the extent of strand differences observed by genic region with predicted transcriptional status of each tissue. . , respectively). in humans, the genes hp and rho are largely nonexpressed in spleen, lung, and blood. mutational strand bias was seen for urethane exposed tissues in ctnnb and polr c (expressed in tissues examined) but not in hp or rho (not expressed in tissues examined). single nucleotide variants (snv) are normalized to the reference nucleotide in the forward direction of the transcribed strand. individual replicates are shown with points, and % confidence intervals with line segments. mutant frequencies were corrected for the nucleotide counts of each reference base in the target genes. the observed bias is evident in ctnnb and polr c as elevated frequencies of a→n and g→n variants relative to their complementary mutation (i.e. asymmetry around the vertical line), in contrast to the balanced spectrum of hp and rho. 
this difference is likely due to the mutation-attenuating effect of transcription-coupled repair on the template strand of transcribed regions of the genome. two genomic regions, ctnnb and polr c, showed high urethane-mediated strand bias (fig. ), which is consistent with a model of transcription-coupled repair since tcr predominantly repairs lesions on the transcribed strands of active genes . the majority of observed strand bias fell into two base substitution groups (t·a→a·t and t·a→g·c) in genes expressed in lung tissue (ctnnb and polr c). the mean reciprocal snv fold-difference of these mutation types across all tissue types was . and . in ctnnb and polr c versus . and . in (non-transcribed) hp and rho. the highest bias existed in lung tissue, which is consistent with a tcr-related mechanism given that lung has the highest predicted transcription rate among the tissue types assayed. we next sought to classify each sample into a mutagen class based solely on the simple spectrum of single nucleotide variants observed within the endogenous regions examined in both the big blue and tg-rash animals. the technique of unsupervised hierarchical clustering can resolve patterns of spectra as distinct clusters with common features . figure a shows a strong spectral distinction between enu and both vc and b[α]p. however, the simple spectra of vc and b[α]p resolve poorly. a gradient of similarity is apparent in the vc and b[α]p cluster, which suggests that, with deeper sequencing, it may be possible to fully resolve the two. no statistically valid clusters emerged that correlated with tissue type, suggesting that the patterns of mutagenesis for both b[α]p and enu are similar in the liver and marrow of the big blue mouse. figure b shows perfect clustering by exposure due to the orthogonal patterns of urethane mutagenesis as compared to the unexposed tissues in tg-rash mice. we similarly saw no correlated clustering at the level of tissue type in tg-rash mice. 
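the reciprocal comparison behind the strand-bias analysis above can be illustrated with a minimal sketch. it assumes that each substitution count is divided by the number of sequenced bases of its reference nucleotide, and that the fold-difference is a simple ratio of a substitution to its complementary-strand partner. the function names and input layout are hypothetical and are not taken from the study's own code.

```python
# Sketch of a reciprocal-substitution comparison for strand bias.
# All names and the input layout are illustrative assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reciprocal(substitution):
    """Return the complementary-strand substitution, e.g. 'T>A' -> 'A>T'."""
    ref, alt = substitution.split(">")
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


def substitution_frequencies(mutant_counts, sequenced_bases):
    """Divide each substitution count by the sequenced bases of its reference nucleotide."""
    return {sub: n / sequenced_bases[sub[0]] for sub, n in mutant_counts.items()}


def reciprocal_fold_difference(frequencies, substitution):
    """Ratio of a substitution's frequency to its reciprocal's; None if undefined."""
    partner = frequencies.get(reciprocal(substitution), 0.0)
    if partner == 0:
        return None
    return frequencies[substitution] / partner


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = {"T>A": 30, "A>T": 10}
    bases = {"A": 1024, "C": 1024, "G": 1024, "T": 1024}
    freqs = substitution_frequencies(counts, bases)
    print(reciprocal_fold_difference(freqs, "T>A"))
```

a ratio near one indicates a balanced spectrum, while a ratio far from one indicates asymmetry between strands. the counts in the usage example are invented and carry no biological meaning.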
- probabilities was performed with the weighted (wgma) method and cosine similarity metric. (a) liver and marrow in big blue animals exposed to enu, b[α]p, or vehicle control (vc). (b) lung, spleen, and blood samples from the tg-rash cohort exposed to urethane or vc. clustering was near-perfect except for distinguishing b[α]p from vehicle exposure in liver tissue where fewer mutational events were observed due to its lower proliferation rate. to further classify the patterns of single nucleotide variants by treatment group, we considered all possibilities of the ′ and ′ bases adjacent to the mutated base to create trinucleotide spectra , , . when enumerating all possible single nucleotide variants within a unique trinucleotide context, a distinct pattern for each treatment group becomes apparent (fig. a-d) that show similarities to mutational signatures as extracted from thousands of human cancers (fig. e) . the vc trinucleotide spectrum (fig. a) is most similar to signature from the cosmic catalog of somatic mutation signatures in human cancer (cosine similarity of . ), which has a proposed etiology of c·g→t·a transitions in cpg sites resulting from unrepaired spontaneous deamination events at -methyl-cytosines. the most notable difference between the bulk trinucleotide spectrum of vc and signature is the extent of c·g→a·t and c·g→g·c transversions which most likely reflect endogenous oxidative damage, an age-related process . 
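the comparison of a trinucleotide spectrum against a reference signature can be sketched as follows. the example enumerates the 96 pyrimidine-normalized trinucleotide channels (6 substitution types times 16 flanking-base combinations) and computes cosine similarity between two vectors over those channels. the channel labels and function names are illustrative assumptions, and no real signature values are included.

```python
# Sketch: enumerate trinucleotide channels and compare two spectra.
# Label format and function names are illustrative assumptions.
from itertools import product
from math import sqrt

BASES = "ACGT"
PYRIMIDINE_SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_channels():
    """Return the 96 labels of the form 5'-base[ref>alt]3'-base."""
    return [
        f"{five}[{sub}]{three}"
        for sub in PYRIMIDINE_SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for an all-zero spectrum")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    channels = trinucleotide_channels()
    print(len(channels), channels[:3])
```

because cosine similarity depends only on the shape of the vectors, two spectra that differ by a constant scale factor score as identical, which is why it suits comparisons between samples of different mutation burden.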
[figure residue: trinucleotide mutation spectra; x-axis, trinucleotide context (aca through ttt); legend, substitution type]
the b[α]p trinucleotide spectrum (fig. b) is predominantly driven by c·g→a·t mutations with a higher affinity for cpg sites. this observation is consistent with previous literature indicating that b[α]p adducts, when not repaired by tcr, lead to mutations most commonly found in sites of methylated cpg dinucleotides , . this spectrum is highly similar to signature ( . cosine similarity) and signature ( . 
cosine similarity), both of which have proposed etiologies related to human exposure to tobacco where b[α]p and other polycyclic aromatic hydrocarbons are major mutagenic carcinogens. the spectrum for in vivo murine exposure to b[α]p is equally comparable to signature and signature ( . cosine similarity), likely due to similar mutagenic modes of action between b[α]p and aflatoxin (the proposed etiology of signature ) . the urethane trinucleotide spectrum (fig. d) has no confidently assignable analog in the cosmic signature set. as compared to the simple spectrum of urethane in figure b , a periodic pattern of t·a→a·t in ′-ntg- ′ emerges. this pattern of highly residue-specific mutagenicity has been previously observed in the trinucleotide spectra of whole genome sequencing data from adenomas of urethane-exposed mice as well as, very recently, in urethane-exposed lung tissue of mice weeks after exposure, as detected by another ecngs method . the tg-rash mouse model contains tandem copies of human hras with an activating enhancer mutation to boost oncogene expression . the combination of enhanced transcription and boosted proto-oncogene copy number predisposes the strain to cancer. use of these mice in a -month cancer bioassay is accepted under ich s b guidelines as an accelerated substitute for the traditional -year mouse cancer bioassay used for pharmaceutical safety assessment . exposure to urethane, a commonly used positive control mutagen, results in splenic hemangiosarcomas and lung adenocarcinomas in nearly all animals by weeks post-exposure. we examined the effect of urethane exposure on the hras transgene, as well as the endogenous hras, kras, and nras genes, at dna residues most commonly mutated in human cancers (fig. ) . pooled data from all tissues in the experiment (lung, spleen, blood) are included. the height of each point (log scale) corresponds to the vaf of each single nucleotide variant (snv). 
the size of the point corresponds to the number of counts observed for the mutant allele. a cluster of multiplet a·t→t·a transversions at the human oncogenic hras codon hotspot is seen in out of urethane exposed lung samples and out of urethane exposed splenic samples (supplemental table ). the observation of an identical mutation in independent samples with high frequency multiplets in a well-established cancer driver gene likely indicates positive selection. notably, these clones are defined by the transversion a·t→t·a in the context ntg which is characteristic of urethane mutagenesis. in contrast to the endogenous ras family genes, the human hras transgene is present in four copies per haploid genome-each under the control of a tandem promoter/enhancer, but without the repression system that is present at the native human hras locus. we postulated that the mechanism of activation of human hras in the tg-rash model would positively influence selection of the cells harboring the activating mutations and would be observable as outgrowth of clones bearing mutations at hotspot residues relative to residues not under positive selection. indeed, we observed compelling signs of selection, as evidenced by focally high variant allele frequencies (vaf) of activating mutations at the canonical codon hotspot in exon in the human hras transgene, but not at other sites in that gene, nor at homologous sites in the endogenous mouse ras family. sizable clonal expansions of this mutation were detected in out of lung samples, out of spleen samples, and in no blood samples which is consistent with the historically known relative frequency of tumors in each tissue. 
moreover, not only are the variant allele frequencies as much as -fold higher than seen for any other endogenous gene variant, but the absolute counts of mutant alleles at this locus is very high (> ), which offers strong statistical support for these clones existing as authentic expansions and not as independent mutated residues occurring by chance (supplemental table ). notably, all clonal mutations observed at codon are a·t→t·a transversions in the context ′-ctg- ′, which conforms to the context ′-ntg- ′ (where n is the iupac code for any valid base), which is highly mutated across all genes in the urethane exposed mouse samples (fig d) . other types of mutations at codon could lead to the same amino acid change, so the combination of the specific nucleotide substitution observed, the clone size relative to that of other loci, and the repeated observation across independent samples of the most tumor prone tissues paints a comprehensive picture of both a urethane-mediated mutagenic trigger and a carcinogenic process that follows. we have demonstrated that duplex sequencing, an extremely accurate error-corrected ngs (ecngs) method, is a powerful tool for the field of genetic toxicology that can be used to assess both mutagenesis and carcinogenesis in vivo. unlike conventional in vivo mutagenesis assays, duplex sequencing does not rely on selection, but rather on unbiased digital counting of billions of individual nucleotides directly from the dna region of interest. this yields data that is both richer and more broadly representative of the genome than current tools and allows fundamentally new types of biological questions to be asked. from sequence data it is possible to mine a wealth of information including mutation spectrum, trinucleotide mutation signatures, and predicted functional consequences of mutations. 
by virtue of not being limited to a specific reporter, we showed that the relative susceptibility to chemical mutagenesis varies significantly by genomic locus and is further influenced by tissue. we could infer this to be (at least partially) the result of non-uniform transcription-coupled repair, as evidenced by the consistent asymmetry of certain mutation types between transcribed and nontranscribed strands. the examples shown here are limited by the modest number of loci and tissues and by the inference of transcriptional status from another species, and can be improved upon in future studies. it is likely that many other factors beyond transcriptional status shape the relative plasticity of the genome and can be uncovered with careful investigations. the ability to directly observe subtle regional mutant frequency differences, on the order of one-in-ten-million, is extraordinary in terms of biological study opportunities, but also raises practical questions for regulatory usage. for example, what would define the optimal subset of the genome to be used for drug and chemical safety testing? for some applications, a diverse, genome-representative panel makes the most sense; for others it might be preferable to enrich for regions that are predisposed to certain mutagenic processes or have unique repair biology . not all carcinogens are mutagens. those which are not mutagenic will not produce a signal in mutagenesis assays, either conventional or sequencing-based. however, as shown here, it appears possible to use ecngs to infer carcinogenesis via detection of clonal expansions carrying oncogenic driver mutations as a marker of a neoplastic phenotype . this concept is more complex to design, insofar as it necessitates some a priori knowledge about the common drivers that are operative in different tissues in response to different classes of carcinogens. 
however, there is simply no other approach, convenient or not, that can quantitate these signals in less than a month from exposure. the proof-of-concept illustrated here relied on a mutagenic chemical in a cancer-predisposed mouse strain; future efforts will be needed to demonstrate the same with nongenotoxic carcinogens in wild-type animals. a further advantage of ecngs is the breadth of applicability, in vivo or in vitro, to any tissue from any species. in vivo selection-based assays are organism and reporter specific; the former restricts testing to rodents, and the latter confers potential biases to mutational spectrum and does not allow targeting of specific genomic regions. the only in vivo mutagenesis assay that does not depend on in vitro selection, the pig-a gene mutation assay, is classically restricted to erythrocytes, requires bioavailability to the bone marrow compartment, cannot be used for spectrum analysis, and necessitates access to flow cytometry equipment . in contrast, next generation dna sequencing platforms are widely available and can be automated to handle thousands of samples per day, thus rendering the approach tractable for many different types of labs. we are not the first to apply ngs to mutagenesis applications . sequencing the reporter gene from pooled clones from tgrs has been used to identify in vivo mutagenic signatures . single cell cloning of mutagen-exposed cultured cells and patient-derived organoids has been used to identify in vitro and in vivo mutagenic signatures , , . in each case, cloning followed by biological amplification was required to resolve single-cell mutational signals, which would otherwise be undetectable in a background of sequencing errors. we have previously used duplex sequencing to measure trinucleotide signatures in phage-recovered reporter dna of mutagen-exposed transgenic mice without the need for cloning . 
others have characterized mutational spectra directly from human dna using a form of very low depth, whole genome, duplex sequencing without added molecular tags . however, each of these methods has factors that limit its practicality for broad usability. the cost of any ngs-based technique is an important consideration, particularly when compared to something as routine as the bacterial ames assay. duplex sequencing further multiplies sequencing costs because of the need for multiple redundant copies of each source strand as part of the consensus-based error correction strategy. however, over the last years, the cost of ngs has fallen nearly orders of magnitude, whereas the cost of conventional genetic toxicology assays has remained largely unchanged. extrapolating forward, we anticipate that equipoise will eventually be reached. savings by virtue of not needing to breed genetically engineered animals, the ability to repurpose tissue or cells already generated for other assays (supporting the r concept of replacement, reduction, and refinement ), decreased labor, and greater automatability should also serve to increase efficiencies and lessen animal use. beyond being undesirable, animal testing is simply not possible for some applications. new forms of mutagenesis, such as crispr and other gene editing technologies, are highly sequence-specific and cannot be easily de-risked in alternative genomes or using reporter genes , . being able to carry out rapid in-human genotoxicity assessment as a part of early clinical trials may also be important for applications where there is urgency to develop therapies, such as drugs being tested against the pandemic coronavirus and those needed in future public health emergencies. controlled drug and chemical safety testing are not the only reasons to screen for mutagenic and carcinogenic processes. humans are inadvertently exposed to many environmental carcinogens , . 
the ability to identify biomarkers of mutagenic exposures using dna from tissue, or noninvasive samples such as blood, urine, or saliva is an opportunity for managing individual patients via risk-stratified cancer screening efforts as well as public health surveillance to facilitate carcinogenic source control . deeper investigations into human cancer clusters , monitoring those at risk of occupational carcinogenic exposures , such as firefighters and astronauts , and surveilling the genomes of sentinel species in the environment as first-alarm biosensors are all made possible when dna can be analyzed directly. almost four decades have passed since it was envisioned that the entirety of one's exposure history might be gleaned from a single drop of blood . while this remains a lofty ambition, the data we have shown here suggest that it is not wholly implausible. our work indicates that there is a much greater amount of information recorded in the somatic genome than we have previously been able to appreciate, or access. future studies are needed to determine how best to capitalize on this data for basic research applications, preclinical safety testing, and in-human studies. all animals used in this study were housed at aaalac international accredited facilities and all research protocols were approved by these facilities respective to their institutional animal care and use committees (iacuc). tg-rash male mice [cbyb f -tg(hras) jic] from taconic biosciences received a total of three intraperitoneal injections of vehicle control (saline) or urethane ( mg/kg per injection) at a dose volume of ml/kg per injection on days , and . animals were necropsied on study day . liver, lung, and spleen samples were collected and then flash frozen. bone marrow was flushed from femurs with saline, centrifuged, and the resulting pellet was flash frozen. blood was collected in k edta tubes and flash frozen. 
studies were generally consistent with oecd tg guidelines except that enu and urethane were dosed less than daily but at a frequency known to produce systemic mutagenic exposures. the sampling time for the urethane study was at day and not day . high molecular weight dna was isolated from frozen big blue and tg-rash tissues using methods as described in the recoverease product use manual rev. b (cat # , agilent inc, santa clara ca, usa). vector recovery from genomic dna, vector packaging into infectious lambda phage particles, and plating for mutant analysis was performed using methods described in the λ select-cii mutation detection system for big blue rodents product use manual rev. a (cat # , agilent, santa clara ca, usa) . phage dna was purified from phage plaques punched from the e. coli lawn on agar mutant selection plates following days of incubation at ° c. agar plugs were pooled by mutagen treatment group in sm buffer and then frozen for storage. dna was purified using the qiaex ii gel extraction kit (cat # , qiagen, hilden, germany). mouse genomic dna was purified from liver, bone marrow, lung, spleen, and blood. approximately mm x mm x mm tissue sections were pulverized with a tube pestle in a microfuge tube. dna was extracted using the qiagen dneasy blood and tissue kit (cat # , qiagen, hilden, germany). extracted genomic dna was ultrasonically sheared to a median fragment size of approximately base pairs using a covaris system. sheared dna was further processed using a prototype cocktail of enzymes with glycosylase and lyase activity for the purpose of excising certain forms of dna damage and cleaving phosphodiester backbones at resulting abasic sites to render damaged, or incomplete duplex templates un-amplifiable (twinstrand biosciences, seattle wa, usa). 
dna was end-polished, a-tailed, and ligated to duplex sequencing adapters containing semi-degenerate unique molecular identifiers (twinstrand biosciences, seattle wa, usa) via the general method described previously , . adapter-ligated dna fragments were then pcr amplified with primers containing dual unique indexes. after the initial pcr, samples were individually subjected to tandem hybrid capture using mer biotinylated dna oligo probes (idt, coralville, ia, usa), for a total of two captures. the first (indexing) and second pcr respectively entailed and cycles. the third pcr involved a variable number of cycles until the library could be accurately quantified. resulting libraries were quantified, pooled, and sequenced on an illumina nextseq using base pair paired-end reads with vendor supplied reagents. where necessary, sybr-based qpcr was used to determine appropriate dna input by normalizing phage and mouse dna across library preparations by total genome equivalents. library input, before shearing, of plaque dna was ~ pg and the genomic dna input for all mouse samples was ~ ng. a summary of sequencing data yields for big blue and tg-rash samples is listed in supplemental tables and . hybrid selection baits for all targets were designed to intentionally avoid capturing any nucleotide sequence within bp of a repeat-masked interval as defined in repbase (fig. s ) . intronic regions adjacent to the exons of the target genes were baited to provide a functionally neutral and non-coding view on the pressures of mutagenesis near exonic targets. duplex consensus base pairs and subsequent variant calls were only reported over a region defined by the same repeat-mask rule as for the bait target design. all libraries achieved . % alignment of duplex consensus bases over the target territories with less than . % of off-panel alignment. 
all targets were of expected uniform coverage given that no off-target alignment to pseudo-genes or repetitive genomic sequences was observed. baits were also designed to target the cii transgene in the big blue mouse model and the human hras transgene in the tg-rash mouse model. the multi-copy cii transgene was sequenced to a median target coverage of , x and the multi-copy human hras transgene to , x. consensus calling was carried out as described in "calling duplex consensus reads" from the fulcrum genomics fgbio tool suite . generally, the algorithm proceeds with aligning the raw reads with bwa. after alignment, read pairs were grouped based on the corrected unique molecular identifier (umi) nucleotide bases and their shear point pair as determined through primary mapping coordinates. the read pairs within their read pair groups were then unmapped and oriented into the direction they were in as outputted from the sequencing instrument. quality trimming using a running-sum algorithm was used to eliminate poor quality three-prime sequence. bases with low quality were masked to `n` for an ambiguous base assignment. cigar filtering and cigar grouping was performed within each read pair group to help mitigate the poisoning effect of artifactual indels in individual reads introduced in library preparation or sequencing. finally, consensus reads were created, from which duplex consensus reads meeting pre-specified confidence criteria were filtered. barcode error-correction was performed using a known whitelist of barcodes, a maximum number of mismatches between a barcode and an expected barcode of , and a minimum hamming distance to the next most likely known barcode of . after duplex consensus calling, the read pairs underwent balanced overlap hard clipping to eliminate biases from double counting bases due to duplicate observation within an overlapping paired-end read. 
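the whitelist-based barcode correction described above can be sketched as follows. the sketch assumes one reading of the two thresholds: the closest whitelisted barcode is accepted only if it lies within the maximum number of mismatches, and only if the next closest barcode is at least a minimum number of additional mismatches away. the threshold values are not recoverable from the text, so they are left as parameters; the function names are hypothetical and this is not the fgbio implementation.

```python
# Sketch of whitelist-based UMI correction using Hamming distance.
# Thresholds are parameters because the source values are not given;
# the acceptance rule is an assumed interpretation, not fgbio's code.


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must be the same length")
    return sum(x != y for x, y in zip(a, b))


def correct_umi(observed, whitelist, max_mismatches, min_distance_to_next):
    """Return the corrected barcode, or None if no confident assignment exists."""
    scored = sorted(
        (hamming(observed, known), known)
        for known in whitelist
        if len(known) == len(observed)
    )
    if not scored:
        return None
    best_distance, best = scored[0]
    if best_distance > max_mismatches:
        return None  # too far from every known barcode
    if len(scored) > 1 and scored[1][0] - best_distance < min_distance_to_next:
        return None  # ambiguous: the runner-up is nearly as close
    return best


if __name__ == "__main__":
    known = ["AAAA", "CCCC", "GGGG"]
    print(correct_umi("AAAT", known, max_mismatches=1, min_distance_to_next=2))
```

rejecting ambiguous barcodes trades a small loss of reads for a lower chance of merging reads from different source molecules into one family, which would undermine the consensus step that follows.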
duplex consensus reads were then end-trimmed and inter-species decontamination was performed using a k-mer based taxonomic classifier (supplementary methods). variants were called using vardictjava with all parameters optimized to collect variants of any alternate allelic count greater than, or equal to, one . there are two polar interpretations one can make when an identical canonical variant is observed multiple times in the same sample. the first assumption is that the observations were independent; that they were acquired during unrelated episodes in multiple independent cells and are not the product of a clonal expansion and shared cell lineage. the second assumption is that the alternate allele observations are a clonal expansion of a single mutagenic event and can all be attributed to one initial mutagenic event. when classifying variant calls as either independent observations, or from a clonal origin, we first fit a long-tailed distribution to all variants that were not germline. any outliers to this distribution with multiple observations are deemed to have arisen from a single origin. this method may serve to undercount multiple independent mutations at the same site under extraordinary specific mutagenic conditions. for example, the clonally expanded a·t→t·a transversion at codon in the hras transgene was a significant outlier to this model and was highly correlated with urethane exposure. the variant allele frequency (vaf) of these expanded mutations varied x in urethane-exposed lung tissues, however, our calculation of per-nucleotide mutant frequency varied only ~ x indicating that this one tissue-specific residue was under the highest selective pressures for expansion beyond any other residue in any other tissue within the panel territory. all clustering was performed using the wald method and the cosine distance metric. leaves were ordered based on a fast-optimal ordering algorithm . 
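the two polar interpretations of repeated observations described above bound the per-nucleotide mutant frequency from below and above. the following minimal sketch computes both bounds from a list of alternate-allele counts and the total number of duplex bases sequenced. it does not reproduce the long-tailed distribution fit used to classify individual variants, and the function name and inputs are illustrative assumptions.

```python
# Sketch: bounds on per-nucleotide mutant frequency under the two
# counting assumptions. Names and inputs are illustrative assumptions.


def mutant_frequency_bounds(alt_counts, duplex_bases):
    """Return (clonal, independent) mutant frequencies.

    clonal: each distinct variant counted once (all repeat observations
            attributed to a single mutagenic event).
    independent: every observation counted as a separate event.
    """
    if duplex_bases <= 0:
        raise ValueError("duplex_bases must be positive")
    observed = [count for count in alt_counts if count > 0]
    clonal = len(observed) / duplex_bases
    independent = sum(observed) / duplex_bases
    return clonal, independent


if __name__ == "__main__":
    # Made-up example: two singleton variants and one variant seen 40 times.
    print(mutant_frequency_bounds([1, 1, 40], duplex_bases=1_000_000))
```

when the two bounds diverge widely, as in the invented example with a single variant observed 40 times, the gap itself indicates that clonal expansion rather than new mutation accounts for most of the observations.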
simple base substitution spectrum clustering was achieved by first normalizing all base substitutions into pyrimidine-space and then normalizing by the frequencies of nucleotides in the target region. clustering of trinucleotide spectra was achieved in a similar manner where base substitutions were normalized into pyrimidine-space and then partitioned into categories based on all the combinations of five and three prime adjacent bases . subsequent normalization of trinucleotide spectra was performed using the frequencies of -mers in the target regions. a consequence of extremely accurate error-correction next generation sequencing (ecngs) technologies is the detection of ultra-rare intra-species contamination and how false positive alignments of those sequencing reads can bias per-nucleotide mutant frequency (mf) calculation by more than -fold. the false positive alignment of short reads not from the target species is particularly likely when sample processing is done near samples that are of an alternate species. this issue is exacerbated when targeting regions of high homology among all species, such as in conserved or exonic regions of the genome (fig. s ) . the solution we developed for handling inter-species read-pair decontamination relies on taxonomic classification of all error-corrected sequences from the entire study to ensure only the read pairs that match the target species with high confidence are kept for downstream analysis. a taxonomy database was constructed with k-mers from human, rat, cow, and mouse. the taxonomic classifier kraken was used to identify error-corrected paired-end contaminating reads, as well as confidently indicating which reads were only from mus musculus origin. reads that are left unassigned due to this method are often true sequences from the source genomes, however, they contain an `n`-call or variant base often enough such that a single k-mer cannot exist that indicates a positive classification to the target genome. 
reads of ambiguous assignment were discarded as they did not contain enough sequence information to positively assign them to any of the organisms at the species level. to eliminate confounding assignment due to the human hras transgene in the tg-rash mouse model, a masked human genome was used for all classification where the mask territory was the exact sequence copy as integrated into tg-rash. out of a total of , , error-corrected paired-end reads across all ( . × - %) murine tissue samples, , , were taxonomically classified as mus musculus, to rattus norvegicus albus, ( . × - %) to homo sapiens, and to bos taurus ( %). exactly , ( . %) paired-end reads were unclassified and , , ( . %) were from an ambiguous taxonomic origin. only sequence data that could be positively identified as originating from the mouse genome was reserved for downstream analysis. furthermore, every error-corrected paired-end read supporting a variant call in this cohort underwent manual review and blast+ alignment using the blast nucleotide (nt) collection to confirm the true positive rate of taxonomic classification on this error-corrected dataset as being a perfect . %. tissue samples from vehicle control exposed mouse id contained paired-end reads from homo sapiens and a tissue sample from the benzo[a]pyrene exposed mouse id contained paired-end reads from rattus norvegicus albus, suggesting that most contaminating events in both mouse cohorts were punctuated and private to just a few samples. the mean per-nucleotide mutant frequency for mouse is . × - and, if contaminating reads were not removed, the mean per-nucleotide mutant frequency would have risen to a rate equivalent to, or greater than, the mutant frequencies detected in the positive control samples.
figure s . mf comparison in a mutagen exposed sample with and without duplex consensus level error-correction.
alternative forms of error-corrected next generation sequencing (ecngs) may perform the error-correction on single strands without resolving a complete duplex consensus. these single-strand error-correction forms of ecngs are not sensitive enough for resolving small effect sizes in mutant frequency induction from experiments like those in the tgr assays. to illustrate this, we performed single-strand error-correction using duplex sequencing adapters on two tg-rash mouse lung samples, one treated with urethane and one treated with the vehicle control. the per-nucleotide mutant frequencies for the vehicle control and urethane-exposed samples are . × - and . × - using duplex sequencing. when measuring the same metric using only single-strand consensus sequencing (sscs), the two mutant frequencies rise to . × - and . × - , respectively. the mutant frequencies of the exposed and control tissues measured using duplex sequencing are significantly different, with a p-value less than . × - . this is in contrast to the single-strand error-correction measurements of mutant frequency, whose difference is not significant (p-value . ). both statistical tests were performed using fisher's exact test for count data. error bars reflect % confidence intervals.
figure s . consensus alignment data and probe design over endogenous and transgenic targets in the big blue mouse. hybrid selection targets were carefully designed to abut no closer than base pairs from repeat-masked (green) or pseudogene (pink) intervals. individual baits are colored as blue intervals underlying the read coverage track. the four coverage tracks shown in all panels are from four randomly selected library preparations to illustrate the relatedness of coverage profile and bait layout.
table s . sequencing summary of the tg-rash mouse samples including consensus duplex bases and read pairs assayed.
similar to the experimental design for sequencing of the big blue mouse samples, nearly billion duplex base pairs were generated from million duplex consensus read pairs from the tg-rash sample set. from these samples, contaminating read pairs were detected from both rattus norvegicus albus and homo sapiens species. these reads were removed prior to downstream analysis. errors in dna replication as a basis of malignant changes overview of biological mechanisms of human carcinogens mutational signatures in tumours induced by high and low energy radiation in trp deficient mice overview of genotoxic carcinogens and non-genotoxic carcinogens guidlines for testing of chemicals: oecd test guideline -transgenic rodent somatic and germ cell gene mutation assays, adopted genotoxicity and carcinogenicity testing of pharmaceuticals mutation as a toxicological endpoint for regulatory decision-making a compendium of mutational signatures of environmental agents detection of ultra-rare mutations by next-generation sequencing enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations highthroughput sequencing in mutation detection: a new generation of genotoxicity tests? 
next-generation genotoxicology: using modern sequencing technologies to assess somatic mutagenesis and cancer risk validation of transgenic mice harboring the human prototype c-haras gene as a bioassay model for rapid carcinogenicity testing analysis of spontaneous and induced mutations in transgenic mice using a lambda zap/lacl shuttle vector detailed review of transgenic rodent mutation assays detecting ultralow-frequency mutations by duplex sequencing characterizing benzo[a]pyrene-induced lacz mutation spectrum in transgenic mice using next-generation sequencing chemically induced mutations in a mutamouse reporter gene inform mechanisms underlying mapping the binding site of aflatoxin b in dna: systematic analysis of the reactivity of aflatoxin b with guanines in different dna sequences mechanism of the inhibition of mutagenicity of a benzo enu) increased brain mutations in prenatal and neonatal mice but not in the adults efficient repair of o -ethylguanine, but not o -ethylthymine or o -ethylthymine, is dependent upon o -alkylguanine-dna alkyltransferase and nucleotide excision repair activities in human cells efficient rescue of integrated shuttle vectors from transgenic mice: a model for studying mutations in vivo other transgenic mutation assays: a new transgenic mouse mutagenesis test system using spi-and -thioguanine selections the use of shuttle vectors for mutation analysis in transgenic mice and rats mutational signatures are jointly shaped by dna damage and repair mutational spectra of aflatoxin b in a mouse model of cancer establish biomarkers of exposure for human hepatocellular carcinoma short title: mutational spectra of aflatoxin b in mice transgenic mouse mutation assay systems can play an important role in regulatory mutagenicity testing in vivo for the detection of site-of-contact mutagens detailed review paper on transgenic rodent mutation assays, series on testing and assessment whole-genome sequencing of organoid cultures the genome as a 
record of environmental exposure tissue-specific mutation accumulation in human adult stem cells during life genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing optimizing drug discovery by investigative toxicology: current and future trends guide-seq enables genome-wide profiling of off-target cleavage by crispr-cas nucleases bliss is a versatile and quantitative method for genome-wide profiling of dna double-strand breaks therapeutic options for the novel coronavirus ( -ncov) aristolochic acids and their derivatives are widely implicated in liver cancers in taiwan and throughout asia aflatoxin: a -year odyssey of mechanistic and translational toxicology mutation signatures of carcinogen exposure: genome-wide detection and new opportunities for cancer prevention carcinogenicity of some aromatic amines, organic dyes, and related exposures cancer incidence and mortality among firefighters space radiation triggers persistent stress response, increases senescent signaling, and decreases cell migration in mouse intestine chronic toxicity of environmental contaminants: sentinels and biomarkers mutation spectra from a drop of blood repbase update, a database of repetitive elements in eukaryotic genomes calling duplex consensus reads vardict: a novel and versatile variant caller for next-generation sequencing in cancer research fast optimal leaf ordering for hierarchical clustering kraken: ultrafast metagenomic sequence classification using exact alignments this work was partially funded by nih r es to jjs. tg-rash table s . early neoplastic evolution is detected with duplex sequencing in the cancerpredisposed mouse tg-rash . the variant allele counts of a·t→t·a mutations at codon in the human hras transgene in the tg-rash mouse model. the variant allele counts observed at this locus are those of a·t→t·a in the context ctg for urethane exposed tissues. 
all but one urethane exposed lung tissue harbors a variant at significant clonality. a single urethane exposed splenic sample has a small clone of two counts ( . %) at this locus. database s . tabular text file of all variant calls for the big blue samples in mut format. database s . tabular text file of all variant calls for the tg-rash samples in mut format. key: cord- -hly ne authors: danko, david; bezdan, daniela; afshinnekoo, ebrahim; ahsanuddin, sofia; bhattacharya, chandrima; butler, daniel j; chng, kern rei; donnellan, daisy; hecht, jochen; kuchin, katerina; karasikov, mikhail; lyons, abigail; mak, lauren; meleshko, dmitry; mustafa, harun; mutai, beth; neches, russell y; ng, amanda; nikolayeva, olga; nikolayeva, tatyana; png, eileen; ryon, krista; sanchez, jorge l; shaaban, heba; sierra, maria a; thomas, dominique; young, ben; abudayyeh, omar o.; alicea, josue; bhattacharyya, malay; blekhman, ran; castro-nallar, eduardo; cañas, ana m; chatziefthimiou, aspassia d; crawford, robert w; de filippis, francesca; deng, youping; desnues, christelle; dias-neto, emmanuel; dybwad, marius; elhaik, eran; ercolini, danilo; frolova, alina; gankin, dennis; gootenberg, jonathan s.; graf, alexandra b; green, david c; hajirasouliha, iman; hernandez, mark; iraola, gregorio; jang, soojin; kahles, andre; kelly, frank j; knights, kaymisha; kyrpides, nikos c; Łabaj, paweł p; lee, patrick k h; leung, marcus h y; ljungdahl, per; mason-buck, gabriella; mcgrath, ken; meydan, cem; mongodin, emmanuel f; moraes, milton ozorio; nagarajan, niranjan; nieto-caballero, marina; noushmehr, houtan; oliveira, manuela; ossowski, stephan; osuolale, olayinka o; Özcan, orhan; paez-espino, david; rascovan, nicolas; richard, hugues; rätsch, gunnar; schriml, lynn m; semmler, torsten; sezerman, osman u; shi, leming; shi, tieliu; song, le huu; suzuki, haruo; tighe, scott w; tong, xinzhao; udekwu, klas i; ugalde, juan a; valentine, brandon; vassilev, dimitar i; vayndorf, elena; velavan, thirumalaisamy p; 
wu, jun; zambrano, maría m; zhu, jifeng; zhu, sibo; mason, christopher e title: global genetic cartography of urban metagenomes and anti-microbial resistance date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: hly ne although studies have shown that urban environments and mass-transit systems have distinct genetic profiles, there are no systematic worldwide studies of these dense, human microbial ecosystems. to address this gap in knowledge, we created a global metagenomic and antimicrobial resistance (amr) atlas of urban mass transit systems from cities, spanning , samples and , taxonomically-defined microorganisms collected for three years. this atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance markers, and novel genetic elements, including , novel predicted viral species, novel bacteria, and novel archaea. urban microbiomes often resemble human commensal microbiomes from the skin and airways, but also contain a consistent “core” of species which are predominantly not human commensal species. samples show distinct microbial signatures which may be used to accurately predict properties of their city of origin including population, proximity to the coast, and taxonomic profile. these data also show that amr density across cities varies by several orders of magnitude, including many amrs present on plasmids with cosmopolitan distributions. together, these results constitute a high-resolution, global metagenomic atlas, which enables the discovery of new genetic components of the built human environment, highlights potential forensic applications, and provides an essential first draft of the global amr burden of the world’s cities. introduction limitations in sequencing depth, or by missing annotations in the reference databases used for taxonomic classification, which is principally problematic with phages. 
it is worth noting that potentially prevalent presence and absence of species (which is robust to noise from relative abundance) and performed a dimensionality reduction of the data using umap (uniform manifold approximation and projection, mcinnes et al. ( )) for visualization (figure a) . jaccard distance was correlated with distance based on jensen-shannon divergence (which accounts for relative abundance) and k-mer distance calculated by mash (which is based on the k-mer distribution in a sample, so cannot be biased by a database) (supp. figure s a , b, c). in principle, jaccard distance could be influenced by read depth as low abundance species drop below detection thresholds. however we expect this issue to be minor as the total number of species identified stabilized at , reads (supp. figure s b ) compared to an average of . m reads per sample. samples collected from north america and europe were distinct from those collected in east asia, but the separation between other regions was less clear. a similar trend was found in an analogous analysis based on functional pathways rather than taxonomy (supp. fig s d) , which indicates geographic stratification of the metagenomes at both the functional and taxonomic levels. subclusters identified by umap roughly corresponded to city and climate but not surface type (supp. figure we quantified the degree to which metadata covariates influence the taxonomic composition of our samples using mavric, a statistical tool to estimate the sources of variation in a count-based dataset (moskowitz and greenleaf, ). we identified covariates which influenced the taxonomic composition of our samples: city, population density, average temperature in june, region, elevation above sea-level, surface type, surface material, elevation above or below ground and proximity to the coast. 
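the two taxonomic distances compared above can be sketched as follows. this is a minimal illustration with hypothetical function names: jaccard distance uses only presence and absence, while jensen-shannon divergence (base 2, so bounded in [0, 1]) uses relative abundance; the jensen-shannon distance is the square root of the divergence.

```python
import math


def presence_set(profile, min_abundance=0.0):
    """Taxa whose abundance exceeds a detection threshold (assumed parameter)."""
    return {taxon for taxon, ab in profile.items() if ab > min_abundance}


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of taxa."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def to_proportions(profile):
    """Normalize raw abundances so that they sum to one."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile has no abundance")
    return {taxon: ab / total for taxon, ab in profile.items()}


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two proportion profiles."""
    total = 0.0
    for taxon in set(p) | set(q):
        pi, qi = p.get(taxon, 0.0), q.get(taxon, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total
```

because low-abundance taxa can fall below the detection threshold at shallow depth, the presence-based distance is the one that read depth could in principle distort.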
the most important factor, which could explain % of the variation in isolation, was the city from which a sample was taken, followed by region, which explained %. the other four factors ranged from explaining % to % of the possible variation in taxonomy in isolation (supp. table s ). we note that many of the factors were confounded with one another, so they can explain less diversity than their sum. one metadata factor tested, the population density of the sampled city, had no significant effect on taxonomic variation overall. to quantify how the principal covariates (climate, continent, and surface material) impacted the taxonomic composition of samples, we performed a principal component analysis (pca) on our taxonomic data normalized by proportion and identified principal components (pcs) which were strongly associated with a metadata covariate in a positive or negative direction (pcs were centered so an average direction indicates an association). we found that the first two pcs (representing . % and . % of the variance of the original data, respectively) associated strongly with the city climate while continent and surface material associated less strongly (figure b). next, we tested whether geographic proximity (in km) of samples to one another had any effect on the variation, since samples taken from nearby locations could be expected to more closely resemble one another. indeed, for samples taken in the same city, the average jsd (jensen-shannon distance) was weakly predictive of the taxonomic distance between samples, with every increase of km in distance between two samples representing an increase of . % in divergence (p < e , r = . , supp. figure s b ). this suggests a "neighborhood effect" for sample similarity analogous to the effect described by meyer et al. ( ), albeit a very minor one. to reduce bias that could be introduced by samples taken from precisely the same object, we excluded all pairs of samples within km of one another.
colored by the region of origin for each sample. axes are arbitrary and without meaningful scale. the color key is shared with panel b. b) association of the first principal components of sample taxonomy with climate, continent, and surface material. c) distribution of major phyla, sorted by hierarchical clustering of all samples and grouped by continent. d) distribution of high-level groups of functional pathways, using the same order as taxa (c). e) distribution of amr genes by drug class, using the same order as taxa (c). note that mls is macrolide-lincosamide-streptogramin.
at a global level, we examined the prevalence and abundance of taxa and their functional profiles between cities and continents. these data showed a fairly stable phyla distribution across samples, but the relative abundance of these taxa is unstable (figure c ) with some continental trends. in contrast to taxonomic variation, functional pathways were much more stable across continents, showing relatively little variation in the abundance of high-level categories (figure d ). this pattern may also be due to the more limited range of pathway classes and their essential role in cellular function, in contrast to the much more wide-ranging taxonomic distributions examined across metagenomes. classes of antimicrobial resistance were observed to vary by continent as well. clusters of amr classes were observed to occur in groups of taxonomically similar samples (figure e ). we quantified the relative variation of taxonomic and functional profiles by comparing the distribution of pairwise distances in taxonomic and functional profiles. both profiles were equivalently normalized to give the probability of encountering a particular taxon or pathway. taxonomic
to facilitate characterization of novel sequences, we created geodna, a high-level web interface (figure a) to search raw sequences against our dataset.
users can submit sequences to be processed against a k-mer graph-based representation of our data. query sequences are mapped to samples and a set of likely sample hits is returned to the user. this interface will allow researchers to probe the diversity in this dataset and rapidly identify the range of various genetic sequences. we sought to determine whether a sample's taxonomy reflected the environment in which it was collected. to this end we trained a random forest classifier (rfc) to predict a sample's city of origin from its taxonomic profile. we trained an rfc with components on % of the samples in our dataset and evaluated its classification accuracy on the remaining %. we repeated this procedure with multiple subsamples of our data at various sizes and with replicates per size to achieve a distribution (fig. b ). the rfc achieved % accuracy on held out data, which compares favorably to the . % that would be achieved by a randomized classifier. these results from our rfc demonstrate that city-specific taxonomic signatures exist and can be predictive. we expanded our analysis of environmental signatures in taxonomy to the prediction of features in cities not present in our training set. to do this we collated a set of features for each city: population, surface material, elevation, proximity to the coast, population density, region, average june temperature, and köppen climate classification. for each feature and each city, we trained an rfc to predict the feature using all samples that were not taken from that city, then used that rfc to predict the feature for samples from the held out city and recorded the classification accuracy (figure d ). while not all features and cities were equally predictable (in particular, features for a number of british cities were roughly similar and could be predicted effectively), in general the predictions exceeded random chance by a significant margin (supp. figure s a ).
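the held-out-city evaluation described above amounts to a leave-one-group-out split in which the group is the city. the sketch below shows only the splitting logic, with a hypothetical function name; any classifier could then be trained on the training indices and scored on the held-out indices.

```python
from collections import defaultdict


def leave_one_city_out(cities):
    """Yield (held_out_city, train_indices, test_indices) for every city.

    `cities` holds one city label per sample, in sample order. No sample
    from the held-out city ever appears in the training indices.
    """
    by_city = defaultdict(list)
    for idx, city in enumerate(cities):
        by_city[city].append(idx)
    for held_out in sorted(by_city):
        test_idx = by_city[held_out]
        train_idx = [i for i, c in enumerate(cities) if c != held_out]
        yield held_out, train_idx, test_idx
```

splitting by city rather than by sample prevents a model from scoring well simply by recognizing the city-specific signature of the test samples.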
this suggests that certain features of cities generate microbial signatures that are present globally and distinct from city-specific signatures. the successful geographic classification of samples demonstrates distinct city-specific trends in the detected taxa that may enable future forensic biogeographical capacities. however, unique, city-specific taxa are not uniformly distributed (figure b ). to quantify this, we developed a score to reflect how endemic a given taxon is within a city, which reflects upon the forensic usefulness of a taxon. we define the endemicity score (es) of a taxon as term-frequency inverse document frequency, where the document consists of samples from some metadata-defined group such as a city or region. this score is designed to simultaneously reflect the chance that a taxon could identify a given city and that that taxon could be found within the given city. a high es for a taxon in a given city could be evidence of the evolutionary advantage that the taxon has in a particular city's environment. however, neutral evolution of microbes within a particular niche is also possible and the es alone does not distinguish between these two hypotheses. note that while the es only considers taxa which are found in a city, a forensic classifier could also take advantage of the absence of taxa for a similar metric. es show a roughly bimodal distribution for regions (fig. c). each region possesses a number of taxa with es scores close to and a slightly larger number close to (note that es is not bounded in [ , ]). some cities, like offa (nigeria), host many unique taxa while others, like zurich (switzerland), host fewer endemic species (supp. figure s b ). large numbers of endemic species in a city may reflect geographic bias in sampling. however, some cities from well sampled continents (e.g., lisbon, hong kong) also host many endemic species, which would suggest that es may indicate interchangeability and local pockets of microbiome variation for some locations.
samples for all cities are transformed into lists of unique k-mers (left). after filtration, the k-mers are assembled into a graph index database. each k-mer is then associated with its respective city label and other informative metadata, such as geo-location and sampling information (top middle). arbitrary input sequences (top right) can then be efficiently queried against the index, returning a ranked list of matching paths in the graph together with metadata and a score indicating the percentage of k-mer identity (bottom right). the geo-information of each sample is used to highlight the locations of samples that contain sequences identical or close to the queried sequence (middle right). b) classification accuracy of a random forest model for assigning city labels to samples as a function of the size of training set. c) distribution of endemicity scores (term frequency inverse document frequency) for taxa in each region. d) prediction accuracy of a random forest model for a given feature (rows) in samples from a city (columns) that was not present in the training set. rows and columns sorted by average accuracy. continuous features (e.g. population) were discretized.
quantification of antimicrobial diversity and amrs is a key component of global antibiotic stewardship. yet, predicting antibiotic resistance from genetic sequences alone is challenging, and detection accuracy depends on the class of antibiotics (i.e., some amr genes are associated with main metabolic pathways while others are uniquely used to metabolize antibiotics). as a first step towards a global survey of antibiotic resistance in urban environments, we mapped reads to known antibiotic resistance genes, using the megares ontology and alignment software.
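the endemicity score introduced earlier is described only as term-frequency inverse document frequency over groups of samples. one way to instantiate that description is sketched below; the exact weighting is an assumption, not the formula used in this study: term frequency is taken as the fraction of a group's samples containing the taxon, and inverse document frequency as the natural log of the number of groups divided by the number of groups in which the taxon occurs.

```python
import math


def endemicity_scores(group_samples):
    """Compute a tf-idf style score for every taxon found in each group.

    group_samples maps a group label (e.g. a city) to a list of samples,
    each sample being the set of taxa detected in it. Only taxa that
    occur in a group receive a score for that group.
    """
    n_groups = len(group_samples)
    group_taxa = {g: set().union(*samples) for g, samples in group_samples.items()}

    # document frequency: in how many groups does each taxon occur at all?
    doc_freq = {}
    for taxa in group_taxa.values():
        for taxon in taxa:
            doc_freq[taxon] = doc_freq.get(taxon, 0) + 1

    scores = {}
    for group, samples in group_samples.items():
        scores[group] = {}
        for taxon in group_taxa[group]:
            tf = sum(taxon in s for s in samples) / len(samples)
            idf = math.log(n_groups / doc_freq[taxon])
            scores[group][taxon] = tf * idf
    return scores
```

under this weighting a taxon found in every group scores zero everywhere, while a taxon present in all samples of a single group receives the maximum score for that group.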
we quantified their relative abundance using reads/kilobase/million mapped reads (rpkm) for classes of antibiotic resistance genes detected in our samples (figure a b). , samples had some sequences which were identified as belonging to an amr gene, but no consistent core set of genes was identified. the most common classes of antibiotic resistance genes were for macrolides, lincosamides, streptogramins (mls), and beta-lactams, yet the most common class of antibiotic resistance genes, mls, was found in only % of the samples where amr sequence was identified. despite being relatively common, antibiotic resistance genes were universally in low abundance compared to functional genes, with rpkm values for resistance classes typically ranging from . - compared to values of - for typical housekeeping genes (amr classes contain many genes so rpkm values may be lower than they would be for individual genes). in spite of the low abundance of the genes themselves, some samples contained sequences from hundreds of distinct amr genes. clusters of high amr diversity were not evenly distributed across cities (figure c ). some cities had more resistance genes identified on average ( - x) than others (e.g. bogota) while other cities had bimodal distributions (e.g. san francisco) where some samples had hundreds of genes while others had very few. we note that % of the cases where we detected an amr gene had an average depth of . x, indicating that our global distribution would not dramatically change with altered read depth (supp. figure s e ). as with taxa, amr genes can be used to classify samples to cities, albeit with much less accuracy. a random forest model analogous to the one trained to predict city classification from taxonomic profiles was trained to predict from profiles of antimicrobial resistance genes. this model achieved . % accuracy on held out test data (supp. figure s a ). while poor for actual classification this accuracy far exceeds the .
% that would be achieved by randomly assigning labels and indicates that there are possibly weak, city-specific signatures for antimicrobial resistance genes. multiple amr genes can be carried on a single plasmid and ecological competition may cause multiple taxa in the same sample to develop antimicrobial resistance. as a preliminary analysis into these phenomena we identified clusters of amr genes that co-occurred in the same samples (figure d ). we measured the jaccard distance between all pairs of amr genes found in at least % of samples and performed agglomerative clustering on the resulting distance matrix. we identified three large clusters of genes and numerous smaller clusters. of note, these clusters often consist of genes from multiple classes of resistance. at this point we do not posit a specific ecological mechanism for this co-occurrence, but we note that the large clusters contain far more genes than are typically found on plasmids. we performed a rarefaction analysis on the set of all resistance genes in the dataset, which we call the "panresistome" (figure ; supp. figure s b ). similar to the rate of detected species, the panresistome also shows an open slope with an expected rate of discovery of previously unobserved amr gene per samples. given that amr gene databases are rapidly expanding and that no amr genes were found in some samples, it is likely that future analyses will identify many more resistance genes in this data. additionally, amr genes show a "neighborhood" effect within samples that are geographically proximal, analogous to the effect seen for taxonomic composition (supp. figure s c ). excluding samples where no amr genes were detected, the jaccard distance between sets of amr genes increases with distance for pairs of samples in the same city. as with taxonomic composition, the overall effect is weak and noisy, but significant.
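the rarefaction analysis of the panresistome mentioned above can be sketched as a permutation-averaged accumulation curve. this is a minimal illustration with a hypothetical function name and assumed parameters (number of permutations, random seed); it is not the pipeline used in this study.

```python
import random


def rarefaction_curve(sample_genes, n_permutations=100, seed=0):
    """Mean number of distinct genes seen after 1, 2, ..., n samples.

    sample_genes is a list of sets, one set of detected amr genes per
    sample. Samples are accumulated in random order and the cumulative
    gene count is averaged over `n_permutations` random orderings.
    """
    rng = random.Random(seed)
    n = len(sample_genes)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(order)
        seen = set()
        for depth, idx in enumerate(order):
            seen |= sample_genes[idx]
            totals[depth] += len(seen)
    return [t / n_permutations for t in totals]
```

a curve that is still rising at its right-hand end corresponds to the "open slope" described above: additional samples are still expected to reveal previously unobserved genes.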
to examine these samples for novel genetic elements, we assembled and identified metagenome assembled genomes (mags) for viruses, bacteria, and archaea and analyzed them with several algorithms. this includes thousands of novel crispr arrays that reflect the microbial biology of the cities and , genomes from our data, of which did not match any known reference genome within % average nucleotide identity (ani). of the genomes were classified as bacteria, and as archaea. bacterial genomes came predominantly from four phyla: the proteobacteria, actinobacteria, firmicutes, and bacteroidota. novel bacteria were evenly spread across phyla ( figure a ). assembled bacterial genomes were often identified in multiple samples. several of the most prevalent bacterial genomes were novel species ( figure b ). some assembled genomes, both novel and not, showed regional specificity while others were globally distributed. the taxonomic composition of identifiable genomes roughly matched the composition of the core urban microbiome (section ). the number of identified bacterial mags was somewhat based on read depth and the sample count per city (supp. figure s a ). the number of bacterial mags discovered in a city which did not match a known species was closely correlated to the total number of bacterial mags discovered in that city (supp. figure s b ). bacterial mags were roughly evenly distributed geographically with the notable exception of offa, which had dramatically more novel bacterial species than other cities. we investigated assembled contigs from our samples to identify , predicted uncultivated viral genomes (uvigs). taxonomic analysis of predicted uvigs to identify viral species yielded , clusters containing a total of , uvigs and , singleton uvigs for a total of , predicted viral species. 
we compared the predicted species to known viral sequences in the jgi img/vr system, which contains viral genomes from isolates, a curated set of prophages and k viral mags from other studies. of the , species discovered in our data . % did not match any viral sequence in img/vr (paez-espino et al., ) at the species level for a total of , novel viruses. we note that this number is surprisingly high but was obtained using a conservative pipeline ( . % precision) and corresponded well with our identified crispr arrays (below). this suggests that urban microbiomes contain significant diversity not observed in other environments. next, we attempted to identify possible bacterial and eukaryotic hosts for our predicted viral mags. for the species with similar sequences in img/vr, we projected known host information onto , metasub viral mags. additionally, we used crispr-cas spacer matches in the img/m system to assign possible hosts to a further , predicted viral species. finally, we used a database of million metagenome-derived crispr spacers to provide further rough taxonomic assignments. our predicted viral hosts aligned with our taxonomic profiles: % of species in the core microbiome (section ) had predicted viral-host interactions. many of our viral mags were found in multiple locations (figure d ). many viruses were found in south america, north america and africa. viral mags in japan often corresponded to those in europe and north america. we identified , crispr arrays in our data of which , could be annotated for specific systems. the annotated crispr arrays were principally type -e and -f but a number of type two and three systems were identified as well (figure e , f). a number of arrays had unclear or ambiguous type assignment. critically, the spacers in our identified crispr arrays closely matched our predicted viral mags. we aligned spacers to both our viral mags and all viral sequences in refseq.
the total fraction of spacers which could be mapped to our viral mags and refseq was similar (supp. figure s c) but the mapping rate to our viral mags dramatically exceeded the mapping rate to refseq (figure c). we present this as additional evidence supporting these novel viral mags. (tables and s ) , constituting the first large scale metagenomic study of the urban microbiome. we also identified species that are geographically constrained and showed that these can be used to determine a sample's city of origin (section . ). many of these species are associated with commensal microbiomes from human skin and airways, but we observed that urban microbiomes are nevertheless distinct from both human and soil microbiomes. notably, no species from the bacteroidetes, a prominent group of human commensal organisms (eckburg et al., ; qin et al., ), was identified in the core urban microbiome. we conclude that there is a consistent urban microbiome core (figure , ), which is supplemented by geographic variation (figure ) and microbial signatures based on the specific attributes of a city (figure ). our data also indicate that significant diversity remains to be characterized and that novel taxa may be discovered in the data (figure ), that environmental factors affect variation, and that sequences associated with amr are globally widespread but not necessarily abundant (figure ). in addition to these results, we present several ways to access and analyze our data, including interactive web-based visualizations, search tools over raw sequence data, and high-level interfaces to computationally access results. unique taxonomic composition and association with covariates specific to the urban environment suggest that urban microbiomes should be treated as ecologically distinct from both surrounding soil microbiomes and human commensal microbiomes.
though these microbiomes undoubtedly interact with the urban environment, they nonetheless represent distinct ecological niches with different genetic profiles. while our metadata covariates were associated with the principal variation in our samples, they do not explain a large proportion of the observed variance. it remains to be determined whether variation is essentially a stochastic process or whether a deeper analysis of our covariates would prove more fruitful. we have observed that less important principal components (roughly pcs - ) are generally less associated with metadata covariates but that pcs - do not adequately describe the data alone. this is a pattern that was observed in the human microbiome project as well, where minor pcs (such as our figure b ) were required to separate samples from closely related body sites. much of the urban microbiome likely represents novel diversity as our samples contain a significant in addition to general purpose data analysis tools, essentially all analysis in this paper is available as a series of jupyter notebooks. our hope is that these notebooks allow researchers to reproduce our results, build upon our results in different contexts, and better understand precisely how we arrived at our conclusions. by providing the exact source used to generate our analyses and figures, we also hope to be able to quickly incorporate new data or correct any mistakes that might be identified. for less technical purposes, we also provide web-based interactive visualizations of our dataset (typically broken into city-specific groups). these visualizations are intended to provide a quick reference for major results as well as an exploratory platform for generating novel hypotheses and serendipitous discovery. the web platform used, metagenscope, is open source, permissively licensed, and can be run on a moderately powerful machine (though its output relies on results from the metasub cap).
our hope is that by making our dataset open and easily accessible to other researchers, the scientific community can more rapidly generate and test hypotheses. one of the core goals of the metasub consortium is to build a dataset that benefits public health. as the project develops, we want to make our data easy to use and access for clinicians and public health officials who may not have computational or microbiological expertise. we intend to continue to build tooling that supports these goals. fields collected were the location of sampling, the material being sampled, the type of object being sampled, the elevation above or below ground, and the station or line where the sample was collected. however, several cities were unable to use the provided apps for various reasons and submitted their metadata as separate spreadsheets. additionally, certain metadata features, such as those related to sequencing and quality control, were added after initial sample collection. to collate various metadata sources, we built a publicly available program which assembled a large master spreadsheet with consistent sample uuids. after assembling the originally collected data at- ), which we added to an empty sterile urine cup followed by swabbing for min (fig. s ). furthermore, the working space was swabbed for . min / min before and after treatment with % bleach (fig. s ) figure s ). distributions of dna concentration and the number of reads were as expected. gc content was broadly distributed for negative controls while positive controls were tightly clustered, as expected since positive controls have a consistent taxonomic profile. comparing the number of reads before and after quality control did not reveal any major outliers. . . batch effect appears minimal a major concern for low-biomass and large-scale studies is batch effects. the median flowcell used in our study contained samples from cities and continents.
however, two flowcells covered cities from or continents respectively. when samples from these flowcells were plotted using umap (see section . for details) the major global trends we described were recapitulated (supp. we used blastn to align nucleotide assemblies from case samples to control samples. we used a threshold of , base pairs and . % identity as a minimum to consider two sequences homologous. this threshold was chosen to be sensitive without solely capturing conserved regions. we identified all connected groups of homologous sequences and found approximate taxonomic identifications by aligning contigs to ncbi-nt using blastn, searching for % nucleotide identity over half the length of the longest contig in each group. section ). our dilemma was that a microbial species that is common in the urban environment might also reasonably be expected to be common in the lab environment. in general, negative controls had lower k-mer complexity, fewer reads, and lower post-pcr qubit scores than case samples and no major previous studies have reported that microbial species whose relative abundance is negatively correlated with dna concentration may be contaminants. we observed a number of species that were negatively correlated with dna concentration (supp. figure s a ), but this distribution followed the same shape (with a greater magnitude) as a null distribution of uniformly randomly generated relative abundances (supp. figure s b ), leading us to conclude that negative correlation may simply be a statistical artifact. we also plotted correlation with dna concentration against each species' mean relative abundance across the entire dataset (supp. figure s c ). species that were negatively correlated with dna concentration were clearly more abundant than uncorrelated species; this suggests that there may be a jackpot effect for prominent species in samples with lower concentrations of dna but is not generally consistent with contamination.
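the null comparison described above can be sketched as follows. this is an illustration under stated assumptions rather than the analysis code used in the study: the text does not say which correlation coefficient was used, so pearson correlation is assumed, each random profile is renormalized to sum to one, and the helper names `pearson` and `null_correlations` are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def null_correlations(dna_conc, n_species, seed=0):
    """Correlate DNA concentration with random relative abundances.

    One random profile is drawn per sample (uniform values, renormalized
    so each sample sums to 1), then every simulated species is correlated
    with the per-sample DNA concentration. Assumption: Pearson correlation.
    """
    rng = random.Random(seed)
    profiles = []
    for _ in dna_conc:
        raw = [rng.random() for _ in range(n_species)]
        total = sum(raw)
        profiles.append([v / total for v in raw])
    return [
        pearson(dna_conc, [profile[j] for profile in profiles])
        for j in range(n_species)
    ]
```

comparing the distribution returned by `null_correlations` with the observed per-species correlations is one way to judge whether negative correlations exceed what random profiles would produce.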
we analyzed the total complexity of case samples in comparison to control samples. case samples had a significantly higher taxonomic diversity (supp. figure s a ) than any type of negative control sample. we also compared the confidence of taxonomic assignments to control assignments for prominent taxa (supp. figure s b ) using the number of unique marker k-mers to compare assignments. we found that case samples had more and higher quality assignments than could be found in controls. one species, bradyrhizobium sp. btai , was not clearly better in case samples than controls, but in this case we were able to assemble genomes for this species in several unique samples, so we feel it is ambiguous. finally, we compared assemblies from negative controls to assemblies from our case samples, searching for regions of high similarity that could be from the same microbial strain. we reasoned that uncontaminated samples may contain the same species as negative controls but were less likely to contain identical strains. only case samples were observed to have any sequence with high similarity to an assembled sequence from a negative control ( , base pairs minimum of . % identity). the identified sequences were principally from bradyrhizobium and cutibacterium. since these genera are core taxa (see section ) observed in nearly every sample but high similarity was only identified in a few samples, we elected not to remove species from these genera from case samples. we generated -mer profiles for raw reads using jellyfish. all k-mers that occurred at least twice in a given sample were retained. we also generated mash sketches from the non-human reads of each sample with million unique minimizers per sketch. we calculated the shannon's entropy of k-mers by sampling -mers from a uniform , reads per sample. shannon's entropy of taxonomic profiles was calculated using the capalyzer package (section ).
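as a rough sketch of the k-mer entropy calculation described above: k-mers are counted across a set of reads and shannon's entropy is computed from their frequencies. the value of k, the number of sampled reads, and the logarithm base are not recoverable from the text, so `k` is left as a parameter and base 2 is assumed. the function names are hypothetical, and this is not the jellyfish or capalyzer implementation.

```python
from collections import Counter
from math import log2


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_shannon_entropy(reads, k):
    """Shannon entropy (in bits; base 2 is an assumption) of k-mer frequencies."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

a sample dominated by one repeated k-mer scores near zero, while a sample whose k-mers are evenly spread scores close to the log of the number of distinct k-mers.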
k-mer based metrics correlate with taxonomic metrics we found clear correlations between three pairwise distance metrics (supp. figure s a , b, c): k-mer based jaccard distance (mash), taxonomic jaccard distance, and taxonomic jensen-shannon divergence. this suggests that taxonomic variation reflects meaningful variation in the underlying sequence in a sample. we also compared alpha diversity metrics (supp. figure s d ): shannon entropy of k-mers, and shannon entropy of taxonomic profiles. as with pairwise distances, these metrics were correlated, though noise was present. this noise may reflect sub-species taxonomic variation in our samples. and mapped these to a set of large databases using blastn (see for details). this resulted in . % reads which could not be mapped to any external database compared to . % of reads mapped using our approach with krakenuniq. we note that our approach to estimate the fraction of reads that could be classified using blastn does not account for hits to low quality taxa which would ultimately be discarded in our pipeline, and so represents a worst-case comparison. explanation ( ) as it has been demonstrated to have comparable or higher sensitivity than the best tools identified in a recent benchmarking study (mcintyre et al. ( ) ) on the same comparative dataset. in addition, krakenuniq allows for tunable specificity and identifies k-mers that are unique to particular taxa in a database. reads are broken into k-mers and searched against this database. finally, the taxonomic makeup of a sample is given by identifying the taxa with the greatest leaf to ancestor weight. krakenuniq reports the number of unique marker k-mers assigned to each taxon, as well as the total number of reads, the fraction of available marker k-mers found, and the mean copy number of those k-mers.
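for reference, the two taxonomic distance metrics compared at the start of this section have simple definitions, sketched below. this is an illustration only: representing a taxonomic profile as a dictionary from taxon name to relative abundance is an assumption, as are the function names and the use of base-2 logarithms. note also that mash estimates the k-mer jaccard index from sketches rather than computing it exactly as done here.

```python
from math import log2


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of taxa (or k-mers)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence of two relative-abundance profiles.

    p and q map taxon -> relative abundance and should each sum to 1.
    With base-2 logarithms (an assumption) the result lies in [0, 1].
    """
    taxa = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in taxa}

    def kl(x):
        # Kullback-Leibler divergence of x from the mixture m
        return sum(x[t] * log2(x[t] / m[t]) for t in taxa if x.get(t, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

jaccard distance only uses presence or absence of taxa, whereas jensen-shannon divergence also weighs their abundances, which is one reason the two can disagree for samples that share taxa in different proportions.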
we found that requiring more k-mers to identify a species resulted in a roughly linear decrease at a minimum we required three reads assigned to a taxon with unique marker k-mers. this setting captures a group of taxa with low abundance but reasonable (∼ - %) coverage of the k-mers in their marker set (supp. figure s c ). however, this also allows for a number of taxa with very high ( ) duplication of the identified marker k-mers and very few k-mers per read, which we believe is biologically implausible (supp. figure s d ). we removed these taxa by applying a further filter which required that the number of reads not exceed times the number of unique k-mers, unless the set of unique k-mers was saturated (> % completeness). we include a full list of all taxonomic calls from all samples, including diagnostic values for each call. we do not attempt to classify reads below the species level in this study. we further evaluated prominent taxonomic classifications for sequence complexity and genome cov- figure s b ). we chose sub-sampling (sometimes referred to as rarefaction in the literature) based on the study by weiss et al. ( ), showing that sub-sampling effectively estimates relative abundance. note that we use the term prevalence to describe the fraction of samples where a given taxon is found at any abundance and we use the term relative abundance to describe the fraction of dna in a sample from a given taxon. we compared our samples to metagenomic samples from the human microbiome project and a metagenomic study of european soil samples using mash (ondov et al., ), a fast k-mer based comparison tool. we built mash sketches from all samples with million unique k-mers to ensure a sensitive and accurate comparison. we used mash's built-in jaccard distance function to generate distances between our samples and hmp samples.
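the two-step filter on taxonomic calls described earlier in this section can be sketched as a single predicate. the numeric thresholds other than the three-read minimum are not recoverable from the text, so they are required arguments here rather than defaults. `TaxonCall`, `keep_call`, and the field names are hypothetical and do not correspond to krakenuniq's output format.

```python
from dataclasses import dataclass


@dataclass
class TaxonCall:
    name: str
    reads: int              # reads assigned to the taxon
    unique_kmers: int       # unique marker k-mers observed
    marker_coverage: float  # fraction of the taxon's marker k-mers observed


def keep_call(call, *, min_kmers, max_reads_per_kmer, saturation, min_reads=3):
    """Return True if a taxonomic call passes both filters.

    1. Require at least `min_reads` reads and `min_kmers` unique marker k-mers.
    2. Reject calls whose read count exceeds `max_reads_per_kmer` times the
       number of unique k-mers, unless the marker set is saturated
       (coverage above `saturation`).
    All thresholds except min_reads=3 are placeholders supplied by the caller.
    """
    if call.reads < min_reads or call.unique_kmers < min_kmers:
        return False
    if call.marker_coverage > saturation:
        return True
    return call.reads <= max_reads_per_kmer * call.unique_kmers
```

the saturation exception matters for very abundant taxa: once nearly all marker k-mers have been seen, additional reads can only repeat them, so a high read-to-k-mer ratio is expected rather than suspicious.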
we then took the distribution of distances to each particular human commensal community as a proxy for the similarity of our samples to a given human body site. we also compared our samples to hmp and soil samples using taxonomic profiles generated by metaphlan v . (segata et al., ). we generated taxonomic profiles from non-human reads using metaphlan v . and found the cosine similarity between all pairs of samples. we used the microbe directory (shaaban et al., ) we analyzed the metabolic functions in each of our samples by processing non-human reads with humann (franzosa et al., ). we aligned all reads to uniref using diamond (v . . , (buchfink et al., ) ) and used humann to produce estimates of pathway abundance and completeness. we filtered all pathways that were less than % covered in a given sample but otherwise took the reported pathway abundance as is after relative abundance normalization (using humann 's attached script). high level categories of functional pathways were found by grouping positively correlated pathways and manually annotating resulting clusters. figure s : ecological relationships with taxa. a) correlation between species richness and latitude. richness decreases significantly with latitude. b) neighbourhood effect. taxonomic distance weakly correlates with geographic distance within cities. c) fraction of reads assigned to different databases by blast for each region, at different levels of average nucleotide identity. figure s : antimicrobial resistance genes, supplemental. a) classification accuracy of a random forest model predicting city labels for held out samples from antimicrobial resistance genes. b) rarefaction analysis of antimicrobial resistance genes. curve does not flatten, suggesting we would identify more amr genes with more samples. c) neighbourhood effect. jaccard distance of amr genes weakly correlates with geographic distance within cities. d) number of amr genes detected for samples in each region.
e) distribution of reads per gene (normalized by kilobases of gene length) for amr gene calls. the vertical red line indicates that % of amr genes have more than . reads per kilobase and would still be called at a lower read depth. gaussian distributions to sampling locations where the taxon was found, with standard deviations based on the geographic distance between observations. top row) sampling sites in three major cities. rows - ) estimated distribution of different example species in major cities. row ) estimated distribution of three species together in major cities. a) number of species detected as k-mer threshold increases for randomly selected samples. b) number of species detected as number of sub-sampled reads increases. c) k-mer counts compared to number of reads for species level annotations in randomly selected samples, colored by coverage of marker k-mer set. d) k-mer counts compared to number of reads for species level annotations in randomly selected samples, colored by average duplication of k-mers. e) comparison of mean sequence entropy and coverage equality for core and sub-core taxa. a) jensen-shannon divergence of taxonomic profiles vs mash jaccard distance of k-mers. b) divergence of taxonomic profiles vs jaccard distance of taxonomic profiles. c) jaccard distance of taxonomic profiles vs mash jaccard distance of k-mers. d) shannon's entropy of taxonomic profiles vs shannon's entropy of k-mers. e) taxonomic richness (number of species) vs shannon's entropy of taxonomic profiles. figure s : a) mash k-mer jaccard similarity to representative hmp samples. b) metaphlan v . cosine similarity to representative hmp samples, colored by continent. c) fraction unclassified dna by surface material. d) cosine similarity to metaphlan v .
skin microbiome profile by surface e) jensen-shannon distance between pairs of taxonomic profiles vs geographic distance f) mash k-mer jaccard similarity to representative soil samples key: cord- -c ajfvt authors: sundqvist, martina; holdfeldt, andré; wright, shane c.; møller, thor c.; siaw, esther; jennbacken, karin; franzyk, henrik; bouvier, michel; dahlgren, claes; forsman, huamei title: barbadin selectively modulates fpr -mediated neutrophil functions independent of receptor endocytosis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: c ajfvt formyl peptide receptor (fpr ), a member of the family of g protein-coupled receptors (gpcrs), mediates neutrophil migration, a response that has been linked to β-arrestin recruitment. β-arrestin regulates gpcr endocytosis and can also elicit non-canonical receptor signaling. to determine the poorly understood role of β-arrestin in fpr endocytosis and in nadph-oxidase activation in neutrophils, barbadin was used as a research tool in this study. barbadin has been shown to bind the clathrin adaptor protein (ap ) and thereby prevent β-arrestin/ap interaction and β-arrestin-mediated gpcr endocytosis. in agreement with this, ap /β-arrestin interaction induced by an fpr -specific agonist was inhibited by barbadin. unexpectedly, however, barbadin did not inhibit fpr endocytosis, indicating that a mechanism independent of β-arrestin/ap interaction may sustain fpr endocytosis. this was confirmed by the fact that fpr also underwent agonist-promoted endocytosis in β-arrestin deficient cells, albeit at a diminished level as compared to wild type cells. dissection of the barbadin effects on fpr -mediated neutrophil functions, including nadph-oxidase activation-mediated release of reactive oxygen species (ros) and chemotaxis, revealed that barbadin had no effect on chemotactic migration whereas the release of ros was potentiated/primed.
the effect of barbadin on ros production was reversible, independent of β-arrestin recruitment, and similar to that induced by latrunculin a. taken together, our data demonstrate that endocytic uptake of fpr occurs independently of β-arrestin, while barbadin selectively augments fpr -mediated neutrophil ros production independently of receptor endocytosis. given that barbadin binds to ap and prevents the ap /β-arrestin interaction, our results indicate a role for ap in fpr -mediated ros release from human neutrophils. neutrophils, the most abundant leukocytes in human peripheral blood, form the frontline of our innate host immune defense, and are rapidly recruited from the circulation to damaged or infected body tissues, where they contribute to bacterial clearance and tissue repair [ ] [ ] [ ] [ ] . the formyl peptide receptor (fpr ), belonging to the family of g protein-coupled receptors (gpcrs), regulates directional neutrophil migration (chemotaxis), granule secretion (degranulation), formation of f-actin filaments (through polymerization of g-actin), and activation of the reactive oxygen species (ros) producing nadph-oxidase [ , , ] . fpr recognizes not only n-formyl peptides of both bacterial and host mitochondrial origin, but also neutrophil surface expression of fpr or cd b was examined by flow cytometry. neutrophils ( × /ml) were equilibrated for min at °c, and then stimulated with either krg (control), an fpr agonist or with barbadin with or without an fpr agonist in the presence of catalase ( units/ml) to avoid potential agonist inactivation via oxidation. cells the plasmid encoding human fpr with an n-terminal snap-tag was obtained from genscript (piscataway, nj, usa); it was constructed by replacing glp r in the previously described pcdna . (+)-flag-snap-glp r plasmid [ ] with human fpr . 
hek a cells (thermo neutrophil nadph-oxidase activity nadph-oxidase activity was determined by using isoluminol/luminol-enhanced chemiluminescence (cl) [ ] in a six-channel biolumat lb (berthold co., wildbad, germany). polypropylene tubes containing a µl reaction mixture of neutrophils in krg, isoluminol ( × - m) and hrp ( units/ml), were equilibrated at °c for min before addition of µl of stimulus. for latrunculin a or barbadin dilution experiments, neutrophils ( /ml) were pre-incubated at °c with latrunculin a ( ng/ml) or barbadin ( µm) in a separate tube and then µl cell suspension were transferred to polypropylene tubes containing µl °c pre-heated reaction mixture (to obtain cells in the assay system) of krg, hrp and isoluminol in the absence or presence of latrunculin a ( ng/ml) or barbadin ( µm) followed by agonist stimulation ( µl). for phagocytosis-induced intracellular nadph-oxidase activity, yeast particles ( x /ml) were opsonized in % normal human serum ( min, °c) followed by washing and dilution in krg to a concentration of x yeast particles/ml. opsonized yeast solution ( µl) was added to polypropylene tubes containing µl reaction mixture containing x neutrophils in krg, luminol ( × - m), superoxide dismutase ( units/ml) and catalase ( units/ml) in the absence and presence of barbadin ( µm) or latrunculin a ( ng/ml) that had been equilibrated at °c for min; multiplicity of infection (moi): : . the light emission of yeast phagocytosis-induced ros production is expressed as mega counts per min (mcpm). neutrophils ( /ml) were equilibrated for min in the absence and presence of barbadin ( µm) at °c followed by agonist stimulation. after s of treatment, μl cell suspension were transferred to ice-cold fixation/permeabilization solution (bd cytofix/cytoperm solution, . ml) and incubated on ice for min. the cells were then washed twice with bd wash buffer before staining with af -conjugated phalloidin ( min, °c). 
a minimum of , gated neutrophils (forward scatter; size versus side scatter; density) per sample were collected on an accuri c flow cytometer (bd biosciences, sparks, md, usa). the af -conjugated phalloidin intensity was determined by the geometric mean fluorescence intensity (mfi) as analyzed by flowjo software version . (tree star inc., ashland, or, usa). neutrophils were loaded with fura- -am ( µm) ( min, rt, in darkness) before washing and resuspension in krg. measurements of the transient rise in [ca + ]i were carried out at °c by using a perkinelmer lc fluorescence spectrophotometer with excitation wavelengths of nm and nm, an emission wavelength of nm, and slit widths of nm and nm. the transient rise in intracellular calcium is presented as the fluorescence intensities for both the excitation wavelengths ( and nm) as measured in parallel. for reactivation experiments (fig c), isoluminol ( × - m) and hrp ( units/ml) were included in the assay system to avoid agonist inactivation by the radicals generated through the myeloperoxidase (mpo)-h o enzyme system [ ] . neutrophils ( × /ml) in krg supplemented with bsa ( . %) were loaded on top of a µm pore size filter and allowed to migrate towards stimuli loaded into the bottom wells of a chemotaxis plate (chemotx, neuro probe, uk) for min at °c, % co . barbadin was present in both upper and lower chambers to avoid any gradient effect. the migrated cells were quantified by measuring myeloperoxidase activity of cell lysates [ ] . the level of migration without any attractant in the lower compartment (negative control) was subtracted from all migration (myeloperoxidase activity) values, and the resulting data are presented as the number of cells recovered in the lower compartment, relative to the total number of cells applied to the migration system (determined by adding neutrophils directly to the bottom well of the chemotaxis plate). data analysis was performed using graphpad prism . .
a (graphpad software, san diego, ca, usa). curve fitting was performed by non-linear regression using the sigmoidal dose-response equation (variable-slope). independent bret experiments were normalized to the span of reference compound (wkymvm) as determined by non-linear regression analysis and pooled together. data represent mean values ± sem. statistical analysis was performed using a paired student's t-test (fig d; d, f and h; c and f; b and c; b and d and f) or a repeated measurement one-way anova followed by tukey's multiple comparison test (fig a, a and h). statistically significant differences are indicated by *p < . , **p < . , ***p < . . barbadin (structure is shown in fig ) was recently described as a selective inhibitor of the machinery responsible for ap /β-arrestin-dependent endocytic internalization of some gpcrs including the vasopressin receptor (v r), the β -adrenergic receptor (β ar) and the angiotensin ii receptor type (at r) [ ] . to study the effect of barbadin on agonist-induced fpr internalization, we used the peptide agonist wkymvm and two lipopeptides, the peptidomimetic cmp and the pepducin f pal , that are functionally biased (structures are shown in fig ). all three agonists (wkymvm, cmp and f pal ) are potent in inducing an fpr -mediated transient rise in [ca + ]i, erk / phosphorylation, and ros production in human neutrophils, but cmp and f pal are biased away from β-arrestin recruitment and chemotaxis [ - ]. by using an enhanced bystander bioluminescence resonance energy transfer (ebbret) assay system, we here confirmed that cmp and f pal are poor inducers of β-arrestin recruitment (fig a). both wkymvm and f pal triggered fpr internalization, but barbadin lacked effect on fpr internalization as determined by ebbret (fig b).
the inability of barbadin to inhibit agonist-triggered removal of fpr from the hek cell surface was also evident when a time-resolved förster resonance energy transfer assay (tr-fret) [ ] was used to study fpr internalization (fig c, d). these data clearly show that internalization of fpr induced by agonists is unaffected by the presence of barbadin. in contrast to fpr , and in agreement with previously published data [ ] , barbadin reduced v r internalization as measured by ebbret in hek cells that had been transiently transfected with donor-tagged v r (v r-rlucii) and an acceptor anchored to the membrane (rgfp-caax) (fig e). although barbadin did not appear to block agonist-induced fpr internalization, ebbret experiments investigating fpr trafficking from the plasma membrane (rgfp-caax) to early endosomes (rgfp-fyve) revealed that the internalization was largely (but not completely for wkymvm) dependent on the presence of β-arrestin / , as illustrated by the reduced responses in hek cells devoid of β-arrestin / (arr, fig f, g). in order to confirm that barbadin did indeed block the interaction between β-arrestin- and ap , we performed bret experiments to determine the wkymvm- and f pal -induced interaction between the two proteins. barbadin clearly blocked the wkymvm-induced interaction between β-arrestin- and ap , whereas the low level of interaction induced by f pal alone makes it hard to determine the ability/inability of barbadin to inhibit its effect (fig h). taken together, these findings demonstrate that barbadin does not affect agonist-triggered internalization of surface-exposed fpr s and that a mechanism independent of β-arrestin/ap interaction can sustain wkymvm-induced receptor endocytosis. we next determined whether barbadin affects agonist-induced internalization of fpr when it is endogenously expressed in human neutrophils.
although ap was clearly present in neutrophils (fig a), the reduction in the number of surface-exposed fpr s was identical in wkymvm-activated control neutrophils (not treated with barbadin) and in wkymvm-activated cells treated with barbadin (fig b). hence, these data strongly imply that barbadin has no effect on agonist-induced fpr internalization in primary human neutrophils in which the receptor is endogenously expressed. neutrophils possess an electron transporting enzyme system, the neutrophil nadph-oxidase, which upon activation generates ros [ ] . despite the insensitivity of the agonist-triggered fpr internalization to barbadin, pre-incubation of neutrophils with barbadin substantially increased the neutrophil release of ros upon wkymvm stimulation (fig c, d). as expected, the fpr -selective antagonist rhb-(lys-βnphe) -nh (structure is shown in fig , where it is denoted as fpr ant.) completely blocked the neutrophil response (fig c). also, the priming effect (i.e., potentiation of ros production) of barbadin was evident at a concentration of the peptide agonist wkymvm ( nm) too low to trigger ros release in non-primed neutrophils (fig d). the priming effect was concentration-dependent, reaching its full effect at µm of barbadin with an ec value of µm (fig e). similarly, barbadin also significantly primed the ros production induced by f pal and cmp (fig f). the fact that f pal and cmp cannot induce β-arrestin recruitment at these concentrations ([ - ], fig a) strongly suggests that the priming effect of barbadin on fpr -mediated ros production is independent of the ability of the activating agonist to recruit β-arrestin. in addition, the ros release following activation with pma, a compound that bypasses receptors and directly activates protein kinase c (pkc), was unaffected by barbadin (fig g, h), which suggests that barbadin lacks a direct effect on the ros-producing nadph-oxidase machinery.
it is also important to note that activation of the nadph-oxidase could not be triggered by barbadin alone (fig g, h). collectively, these results suggest that barbadin primes neutrophils in their response to fpr agonists, and the increased nadph-oxidase activation is regulated independently of fpr internalization and fpr -induced β-arrestin recruitment. barbadin treatment significantly primed fpr agonist-induced ros production in human neutrophils, and further characterization revealed that only a very short interaction time between neutrophils and barbadin was needed for barbadin to exert its priming effect. in fact, the increase in neutrophil ros production was the same when barbadin and the fpr agonist wkymvm were added simultaneously as when the cells were incubated with barbadin prior to addition of the activating peptide wkymvm (fig a). in addition, we found that the priming effect of barbadin on the neutrophil wkymvm response was rapidly reduced when the barbadin concentration was lowered significantly. neutrophils were first incubated for min with an effective priming concentration of barbadin ( µm) and then transferred to two different ros measurement reagents, one containing a new addition of barbadin ( µm) and the other not (the barbadin concentration was thus lowered x to . µm), followed by stimulation with wkymvm. a significantly lower degree of priming was observed in cells that were exposed to a reduced concentration (from µm to . µm) than in cells exposed constantly to an effective priming concentration ( µm) of barbadin (fig b and c). this suggests that the neutrophil priming induced by barbadin is a reversible process, unlike many other priming processes that involve an irreversible process of neutrophil granule secretion [ , , ] . the effect of barbadin on fpr -mediated ros production resembled the effect induced by the f-actin-disrupting agent latrunculin a.
barbadin and latrunculin a both prolonged and increased the magnitude of fpr -mediated neutrophil ros production as compared to the corresponding ros production in non-treated control cells ( fig d) . in addition, very similar to barbadin, the priming effect of latrunculin a was rapidly reduced as shown when latrunculin a was removed prior to activation with the fpr agonist wkymvm ( fig e, f ), using the same dilution protocol, to show that the priming effect of barbadin is reversible, as described above. fpr -induced ros production is a process rapidly initiated after addition of an activating agonist, and within a time period of minutes, the ros production is terminated and the cells are homologously desensitized. the desensitized cells are non-responsive to a second stimulation with fpr agonists but fully responsive to an fpr selective agonist or pma [ , ] . the desensitized state could be transferred to an active signaling state to produce ros again (i.e., fpr resensitization or reactivation) when the cytoskeleton was disrupted by latrunculin a (fig a) . the ros production induced by latrunculin a in fpr -desensitized cells was completely abolished by the fpr -selective antagonist rhb-(lys-βnphe) -nh ( fig a) . to determine the ability of barbadin to resensitize desensitized fpr , we reversed the order by which the sensitizer (barbadin) and fpr agonist (wkymvm) were added to the neutrophils. barbadin was added to wkymvm-activated neutrophils at a time point when ros production had returned to a background level; interestingly, these fpr -desensitized cells could be resensitized to produce ros also by barbadin, similar to the effect of latrunculin a ( fig a) . this response was also inhibited by an fpr -selective antagonist (fig a) , clearly demonstrating an fpr -mediated neutrophil resensitization. 
very similar results were obtained in fpr -desensitized cells when f pal or cmp (at concentrations that could not recruit - fig a) replaced wkymvm as the agonist used to desensitize fpr (fig a) . although the resensitization effects of latrunculin a and barbadin appeared very similar, it should be noted that the lag phase before any ros were generated was shorter when barbadin was used for resensitization, and the time to reach maximal (peak) ros production was also shorter as compared to the response following addition of latrunculin a ( fig b) . however, the amounts of ros produced during resensitization (as measured by the area under the curve) were comparable for barbadin and latrunculin a (fig c) . in summary, we show that barbadin, similarly to latrunculin a, not only potentiates the ros production induced by the different fpr agonists, but also resensitizes fpr signaling when added to fpr desensitized neutrophils. barbadin treatment potentiated and resensitized fpr -mediated signaling leading to ros production. to further investigate the effect of barbadin on fpr signaling and function in neutrophils, we measured the transient rise in cytosolic calcium [ca + ]i mediated through fpr . in contrast to the potentiating effect on ros production, the rise in [ca + ]i induced by wkymvm was not affected by barbadin (fig a, b) . similarly, resensitization by barbadin of agonist-desensitized fpr leading to an activation of the ros producing nadph-oxidase, was not associated with a corresponding rise in [ca + ]i (fig c) . at the functional level, we also observed a biased activity of barbadin in favor of neutrophil ros production over directional cell migration/chemotaxis. neutrophil chemotaxis was measured by using a transwell migration system; neutrophils were placed on top of the filter and allowed to migrate towards different concentrations of fpr agonists that were placed in the bottom well of the chambers. 
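the kinetic comparison of barbadin and latrunculin a described above relies on three summary measures of a ros trace: the lag phase, the time to peak, and the area under the curve. a minimal python sketch of how such measures could be computed from a sampled trace is given here. the function name, the threshold used to define the lag phase and the example values are assumptions for illustration and are not taken from the study.

```python
def summarize_trace(times, signal, baseline=0.0, lag_fraction=0.1):
    """Return (lag_time, time_to_peak, auc) for one kinetic trace.

    lag_time is defined here (an assumption) as the time until the signal
    first reaches `lag_fraction` of the baseline-corrected peak height.
    auc is the baseline-corrected area computed with the trapezoidal rule.
    """
    if len(times) != len(signal) or len(times) < 2:
        raise ValueError("times and signal must have equal length >= 2")

    peak_value = max(signal)
    peak_index = signal.index(peak_value)
    time_to_peak = times[peak_index] - times[0]

    threshold = baseline + lag_fraction * (peak_value - baseline)
    lag_time = next(t - times[0] for t, s in zip(times, signal) if s >= threshold)

    auc = 0.0
    for i in range(len(times) - 1):
        height = 0.5 * ((signal[i] - baseline) + (signal[i + 1] - baseline))
        auc += height * (times[i + 1] - times[i])

    return lag_time, time_to_peak, auc


if __name__ == "__main__":
    # Made-up trace: time in minutes, signal in arbitrary light units.
    times = [0, 1, 2, 3, 4]
    signal = [0, 0, 4, 8, 0]
    print(summarize_trace(times, signal))
```

two traces can then differ in lag time and time to peak while having comparable areas, which is the pattern reported for barbadin and latrunculin a.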
barbadin ( m) was added to both compartments, so that it was present in both the upper chamber together with the cells and the lower chamber containing the agonist. in line with earlier data [ , ], wkymvm, but not f pal , triggered a chemotactic migration of neutrophils. further, neutrophil chemotaxis towards wkymvm was unaffected by barbadin, and the inability of f pal to trigger chemotaxis was retained in the presence of barbadin (fig d). in summary, these data demonstrate that the effect of barbadin in neutrophils favors fpr -mediated ros production over the rise in [ca + ]i and chemotaxis. the functional similarities between barbadin and latrunculin a, both being priming agents affecting the response induced by fpr agonists and agents that resensitize desensitized fpr , prompted us to examine whether barbadin could directly affect the integrity of the actin cytoskeleton and granule secretion. wkymvm induced a rapid polymerization of g-actin monomers into filamentous actin (f-actin) in human neutrophils (fig a). however, the presence of barbadin did not affect the formation of f-actin induced by wkymvm (fig a), indicating that barbadin does not affect the integrity of the actin cytoskeleton. the observation that barbadin does not directly interfere with the integrity of the actin cytoskeleton gained further support from the results obtained with barbadin on two neutrophil responses previously found to be regulated by the actin cytoskeleton, i.e., the atp receptor p y r-mediated ros production and the phagocytosis process [ , ]. it is known that atp, upon binding to its neutrophil receptor p y r, triggers ros production provided that the f-actin structure is disrupted (fig b). in line with this, latrunculin a-treated neutrophils produce ros when activated with atp, but no such effect was obtained with barbadin (fig b).
activation of the ros-generating nadph-oxidase system during uptake (phagocytosis) of microbes is a process regulated by the cytoskeleton, and accordingly, latrunculin a inhibited the activation process ( fig c) . however, barbadin exerted no inhibitory effect, neither when added before addition of the phagocytosis prey nor when added during the ongoing activation process (fig c, d) . taken together, these data show that barbadin has no direct effects on basic neutrophil functions regulated by the actin cytoskeleton, supporting the conclusion that barbadin lacks a general effect on the assembly of the ros-producing oxidase. our data reveal that barbadin is able to prime neutrophils for enhanced fpr -mediated ros production. neutrophil priming is a well-known process both in vitro and in vivo [ , , , , ] , and an increased exposure of intracellular granule-localized receptors to the plasma membrane as a result of granule secretion has been suggested to be one of the main mechanisms that augments neutrophil nadph-oxidase activity [ , , ] . we next determined whether the receptor mobilizing effect with increased surface fpr expression could account for the barbadin-induced priming effect. however, no increased surface exposure of fpr was induced by incubation of neutrophils with barbadin for up to ten minutes (fig e, f) . in addition to fpr expression, the ability of barbadin to upregulate the surface expression of cd b (complement receptor ; cr ), a marker protein stored in easily mobilized neutrophil granule compartments that can be mobilized to the surface by many secretagogues or priming agents [ ] was investigated. however, similar to the fpr expression, barbadin also lacked effect to upregulate cd b on the plasma membrane, whereas a profound increase of cd b surface expression was induced by the classical secretagogue fmlf (fig g, h) . 
in summary, our data show that even though barbadin affects fpr signaling in a way that resembles actin cytoskeleton-disrupting agents, barbadin lacks direct effects on the reorganization of the actin cytoskeleton in neutrophils and on receptor mobilization from intracellular granule stores. in the present study, we assessed the role of -arrestin in endocytosis of fpr and in receptor down-stream functional responses in human neutrophils using barbadin, an ap -binding inhibitor that blocks the interaction between -arrestin and ap and prevents agonist triggered endocytosis of many gpcrs [ ] . our data show that the ap protein targeted by barbadin indeed is expressed in neutrophils, yet, barbadin did not block fpr endocytosis. these results imply that fpr can be internalized through a -arrestin/ap -independent process, an assumption in line with the observation that only residual endocytosis of fpr occurs in cells lacking -arrestin. interestingly, barbadin treatment potentiated fpr -mediated ros production and resensitization of fpr -desensitized human neutrophils in a manner similar as an inhibitor of actin polymerization (i.e., latrunculin a). however, barbadin did not interfere with other processes in neutrophils involving the actin cytoskeleton machinery. in addition, the potentiating effect of barbadin on fpr -mediated ros production was found to involve biased functional/signaling as neither fpr -promoted intracellular calcium mobilization nor chemotaxis was affected when ap -binding was inhibited by barbadin. previously, barbadin has been shown to affect several gpcr-mediated functions, including hormone secretion mediated by gonadotropin-releasing hormone (gnrh) receptors [ ] , and uptake/entry of influenza a viruses facilitated by short chain fatty acid receptor (ffar ) signaling [ ] . 
as described, barbadin prevents ap /β-arrestin-mediated receptor endocytosis, which has been deemed to be the canonical molecular mechanism behind the functional effects of this ap inhibitor. however, the data presented in the current study suggest that alternative endocytosis-and β-arrestin-independent mechanisms can mediate the effects by this ap binding inhibitor with regards to fpr expressing human neutrophils. recent data infer that arrestin appears to be involved in non-canonical and endosomal signaling, besides playing roles in receptor desensitization and endocytosis [ , ] . however, the exact functional role of β-arrestin in fpr signaling needs to be further investigated. it has been suggested that polymerized actin rather than -arrestin constitutes the basis for physical separation of g proteins from activated fprs, resulting in termination of signaling and receptor desensitization [ , , ] . the role of the actin cytoskeleton in the regulation of gpcr signaling in neutrophils was originally defined by measurements of ros generated by the phagocyte nadph-oxidase. involvement of the actin cytoskeleton in the termination/desensitization of fpr signaling became evident from experiments in which actin cytoskeleton disrupting agents prolong fpr signaling and have the ability to resensitize the desensitized receptors [ , , ] . we now show that barbadin lacks effect on agonist-induced endocytosis of fpr as examined in several assay systems. intriguingly, these data strongly indicate that fpr can undergo endocytosis through a -arrestin/ap -independent process. this is an internalization pattern shared with receptors for transferrin and endothelin-a, which are both endocytosed independently of β-arrestin and ap , respectively [ ] . although barbadin did not affect fpr internalization, it convincingly potentiated fpr -mediated ros production and promoted resensitization of desensitized fpr s. 
a similar augmentation of the ros response was also obtained at concentrations of fpr agonists (wkymvm) that do not recruit β-arrestin or by fpr agonists (f pal and cmp ) that are very poor in recruiting β-arrestin, suggesting that this novel priming effect of barbadin is achieved without β-arrestin recruitment. as mentioned above, the effect of barbadin on fpr -mediated ros production resembled the effect induced by actin cytoskeleton-disrupting agents [ , , ]. despite this deviation from the prototypical mode of action for barbadin, several lines of evidence suggest that barbadin does not directly disrupt the actin cytoskeleton. in contrast to latrunculin a, barbadin had no effect on (i) the increase in f-actin polymerization induced by the fpr agonist wkymvm, (ii) the actin cytoskeleton-dependent ros production induced during phagocytic uptake of yeast particles, or (iii) the signals downstream of atp-activated p y rs that are generated only when the actin cytoskeleton has been disrupted. altogether, these observations indicate that barbadin primes neutrophils and resensitizes/reactivates desensitized receptors through a mechanism resembling that of actin cytoskeleton-disruptive agents. however, in contrast to actin cytoskeleton-disruptive agents, the effects exerted by barbadin do not appear to involve a direct effect on the integrity of the actin cytoskeleton. at present, the precise mechanism underlying the influence of barbadin on fpr activity is not known, but as barbadin lacks effect on the fpr -induced transient rise in [ca + ]i, a general modulation of downstream signaling of agonist-occupied fpr is unlikely. regarding assembly and activation of the electron-transporting nadph-oxidase, a large number of stimuli (including many gpcr agonists) can induce ros production in neutrophils, but not all signaling pathways that regulate these activation processes have been identified yet.
however, it has been established that there is no direct link between gpcr-mediated activation of the oxidase and the transient rise in [ca + ]i [ , , , , ], which gains further support from the data presented in this study. hence, even though it is difficult to directly correlate nadph-oxidase activity and calcium signaling during receptor reactivation induced by barbadin or latrunculin a, our data corroborate previous studies demonstrating that a rise in [ca + ]i is not a requirement for activation of the nadph-oxidase. furthermore, our data support the notion that the neutrophil nadph-oxidase can be activated in the absence of arrestin recruitment [ - ]. it follows that barbadin is a biased and functionally selective regulator of fpr signaling, as it influences ros production by activating nadph-oxidase without affecting calcium mobilization and neutrophil granule secretion. although β-arrestin modulated functions are inhibited by barbadin, the ap inhibitor lacks direct effects on receptor-mediated β-arrestin recruitment [ ]. our earlier reports have demonstrated that fpr agonists that are potent stimuli in triggering calcium signaling, erk / phosphorylation and ros production, but differ in their ability to recruit β-arrestin, also vary in their ability to induce neutrophil chemotaxis [ ] [ ] [ ]. the present study shows that the functionally selective deviation linked to the ability to recruit β-arrestin is retained in the presence of barbadin, further supporting the proposed mode of action of barbadin in that it lacks a direct effect on receptor-mediated β-arrestin recruitment [ ]. the observation that barbadin potentiates fpr -mediated ros production, regardless of whether this was triggered by the β-arrestin-recruiting wkymvm peptide or by fpr agonists that are very poor in recruiting β-arrestin, suggests that the effect of barbadin on fpr -mediated ros production is not dependent on β-arrestin.
several in vitro as well as in vivo processes potentiate fpr-mediated ros production, and increased surface receptor expression as a result of granule secretion has been suggested as an important mechanism underlying the potentiation [ , , ]. however, this is not the mechanism involved in the priming effect of barbadin, as demonstrated by its inability to induce the mobilization of granules. this conclusion is supported by the fact that, while granule mobilization is an irreversible process, the effects of barbadin are reversible. thus, the mechanism by which barbadin potentiates fpr -mediated ros production remains to be elucidated. given that barbadin binds to ap to prevent β-arrestin binding, the role of ap in the observed effects on ros priming needs to be further investigated. with respect to the role of ap , it is interesting to note that a comparison between ec values reveals that the potency of barbadin-mediated augmentation of ros production in neutrophils is the same as that found for its inhibition of the β-arrestin-ap interaction. this suggests that the barbadin-mediated effect on ros production could be a result of its action on ap (this study and [ ]). however, other target proteins, including other binding partners for ap such as ap , arh and scr [ ], cannot be excluded at this point, until the modulating effect (if any) of barbadin on these ap binding molecules has been determined. in summary, this study demonstrates some novel effects of the ap binding compound barbadin. although barbadin did not affect agonist-induced endocytosis of fpr , a process shown to be independent of whether the agonist recruits β-arrestin or not, barbadin both increased fpr agonist-induced ros production and resensitized agonist-desensitized fpr to produce ros. notably, these effects of barbadin on fpr also proved independent of whether the agonist recruited β-arrestin or not.
the effect of barbadin on fpr induced neutrophil ros production is very similar to the actin cytoskeleton-disrupting agent latrunculin a, albeit without altering other neutrophil functions regulated by a dynamic polymerization of the actin cytoskeleton. elucidation of the precise mechanism(s) of barbadin regarding its priming effect on the fpr -mediated ros production in neutrophils would lead to an increased understanding of the underlying molecular mechanisms regulating inflammatory reactions that are dependent on redox reactions. barbadin and structurally related analogs of this ap inhibitor are expected to serve as useful molecular tools for further mechanistic studies of gpcr regulation in neutrophils. neutrophil recruitment and function in health and inflammation formyl peptide receptor orchestrates mucosal protection against citrobacter rodentium infection formyl-peptide receptors in infection, inflammation, and cancer formylpeptide receptor- contributes to colonic epithelial homeostasis, inflammation, and tumorigenesis measurement of respiratory burst products, released or retained, during activation of professional phagocytes basic characteristics of the neutrophil receptors that recognize formylated peptides, a danger-associated molecular pattern generated by bacteria and mitochondria international union of basic and clinical pharmacology. lxxiii. 
nomenclature for the formyl peptide receptor (fpr) family dual modulation of formyl peptide receptor by aspirin-triggered lipoxin contributes to its anti-inflammatory activity neutrophil chemoattractant receptors and the membrane skeleton direct or c a-induced activation of heterotrimeric gi proteins in human neutrophils is associated with interaction between formyl peptide receptors and the cytoskeleton similarities and differences between the responses induced in human phagocytes through activation of the medium chain fatty acid receptor gpr and the short chain fatty acid receptor ffa r reactivation of formyl peptide receptors triggers the neutrophil nadph-oxidase but not a transient rise in intracellular calcium a new inhibitor of the betaarrestin/ap endocytic complex reveals interplay between gpcr internalization and signalling the g protein-coupled receptor ffar promotes internalization during influenza a virus entry combining elements from two antagonists of formyl peptide receptor generates more potent peptidomimetic antagonists studies on acid stability and solid-phase block synthesis of peptide-peptoid hybrids: ligands for formyl peptide receptors monitoring g protein-coupled receptor and beta-arrestin trafficking in live cells using enhanced bystander bret isolation of mononuclear cells and granulocytes from human blood. 
isolation of monuclear cells by one centrifugation, and of granulocytes by combining centrifugation and sedimentation at g isolation of lymphocytes, granulocytes and macrophages real-time trafficking and signaling of the glucagon-like peptide- receptor translating in vitro ligand bias into in vivo efficacy a methodological approach to studies of desensitization of the formyl peptide receptor: role of the read out system, reactive oxygen species and the specific agonist used to trigger neutrophils lipopolysaccharide-induced granule mobilization and priming of the neutrophil response to helicobacter pylori peptide hp( - ), which activates formyl peptide receptor-like galectin- activates the nadph-oxidase in exudated but not peripheral blood neutrophils the synthetic peptide trp-lys-tyr-met-val-met-nh specifically activates neutrophils through fprl /lipoxin a receptors and is an agonist for the orphan monocyte-expressed chemoattractant receptor fprl p y receptor signaling in neutrophils is regulated from inside by a novel cytoskeleton-dependent mechanism an intact cytoskeleton is required for prolonged respiratory burst activity during neutrophil phagocytosis multiple phenotypic changes define neutrophil priming priming and de-priming of neutrophil responses in vitro and in vivo granulopoiesis and granules of human neutrophils beta-arrestin-dependent signaling in gnrh control of hormone secretion from goldfish gonadotrophs and somatotrophs gpcr-g proteinbeta-arrestin super-complex mediates sustained g protein signaling reactivation of desensitized formyl peptide receptors by platelet activating factor: a novel receptor cross talk mechanism regulating neutrophil superoxide anion production neutrophil signaling that challenges dogmata of g protein-coupled receptor regulated functions after  min stimulation, barbadin ( m) or la ( ng/ml) was added (indicated by an arrow). (e-h) analysis of fpr and cd b surface expression was performed by flow cytometry. 
(e-f) neutrophils were incubated in the absence (buffer) and presence of barbadin ( m) at °c for different time points as indicated on the x-axis, prior fixation and staining with an anti-fpr antibody. (g-h) neutrophils were incubated in the absence (buffer, min) and presence of barbadin ( m; min) or fmlf ( nm; min) at °c prior staining with an anti-cd b antibody key: cord- - ilforzm authors: litviňuková, monika; talavera-lópez, carlos; maatz, henrike; reichart, daniel; worth, catherine l.; lindberg, eric l.; kanda, masatoshi; polanski, krzysztof; fasouli, eirini s.; samari, sara; roberts, kenny; tuck, liz; heinig, matthias; delaughter, daniel m.; mcdonough, barbara; wakimoto, hiroko; gorham, joshua m.; nadelmann, emily r.; mahbubani, krishnaa t.; saeb-parsy, kourosh; patone, giannino; boyle, joseph j.; zhang, hongbo; zhang, hao; viveiros, anissa; oudit, gavin y.; bayraktar, omer; seidman, j. g.; seidman, christine; noseda, michela; hübner, norbert; teichmann, sarah a. title: cells and gene expression programs in the adult human heart date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ilforzm cardiovascular disease is the leading cause of death worldwide. advanced insights into disease mechanisms and strategies to improve therapeutic opportunities require deeper understanding of the molecular processes of the normal heart. knowledge of the full repertoire of cardiac cells and their gene expression profiles is a fundamental first step in this endeavor. here, using large-scale single cell and nuclei transcriptomic profiling together with state-of-the-art analytical techniques, we characterise the adult human heart cellular landscape covering six anatomical cardiac regions (left and right atria and ventricles, apex and interventricular septum). 
our results highlight the cellular heterogeneity of cardiomyocytes, pericytes and fibroblasts, revealing distinct subsets in the atria and ventricles indicative of diverse developmental origins and specialized properties. further we define the complexity of the cardiac vascular network which includes clusters of arterial, capillary, venous, lymphatic endothelial cells and an atrial-enriched population. by comparing cardiac cells to skeletal muscle and kidney, we identify cardiac tissue resident macrophage subsets with transcriptional signatures indicative of both inflammatory and reparative phenotypes. further, inference of cell-cell interactions highlight a macrophage-fibroblast-cardiomyocyte network that differs between atria and ventricles, and compared to skeletal muscle. we expect this reference human cardiac cell atlas to advance mechanistic studies of heart homeostasis and disease. the heart is a complex organ, composed of four morphologically and functionally distinct chambers ( figure a ), that perpetually pumps blood throughout our lives. deoxygenated blood enters the right atrium and is propelled into the low pressure vascular beds of the lungs by the right ventricle. oxygenated pulmonary blood enters the left atrium and then to the left ventricle, which propels blood across the body at systemic vascular pressures. right and left chambers are separated by the atrial and interventricular septa and unidirectional flow is established by the atrio-ventricular valves (tricuspid and mitral) and ventricular-arterial valves (pulmonary and aortic). the heart contains an intrinsic electrophysiologic system, composed of the sinoatrial node in the right atria where depolarization begins and spreads to the atrioventricular node located at the top of the interventricular septum. this electrical impulse is then rapidly propagated by purkinje fibers to the apex where contraction begins. 
orchestration of the anatomical and functional complexity of the heart requires highly organized and heterogeneous cell populations that enable continuous contraction and relaxation across different pressures, strains, and biophysical stimuli in each chamber. the importance of these variables is reflected in the differences in wall thickness and mass (left, ± g; right ± g) of adult ventricular chambers . specialized properties of cells that enable adaptation to different biophysical stimuli in each chamber are established early in development. the heart is derived from multipotent progenitor cells residing within two heart fields. cells of the first heart field primarily populate the left ventricle and second heart field-derived cells populate the right ventricle; both heart fields contribute to atrial cells. the distinct gene regulatory networks operant in these heart fields likely establish and prime the patterns of gene expression observed in adult cardiac cells which are further impacted by the establishment of postnatal circulation . the cellular composition of the adult human heart, their anatomical specificities, molecular signatures, intercellular networks and spatial relationships between the various cardiac cells remain largely unknown. single-cell and single-nuclei transcriptomics (scrna-seq, snrna-seq) and multiplex smfish imaging now enable us to address these issues at unprecedented resolution . these technologies illuminate the coordinated communication of cells within their microenvironments that in the heart enable electromechanical connectivity, biophysical interactions, and autocrine/paracrine signaling required for tissue homeostasis but are perturbed in disease. while previous studies using a combination of conventional bulk genomics and microscopy have hinted at the cellular complexity of the myocardium, limitations of these techniques have allowed definition of very few distinct cell populations . 
bulk rna-seq analysis is unable to assign gene expression to defined cell subpopulations, light microscopy fails to define features beyond morphology of cell subpopulations and immunostainings are limited to the analysis of few markers at once. moreover, the large size of cardiomyocytes (length/width:~ / µm) limits the unbiased capture of single cells requiring analyses of single nuclei transcriptomics to ensure a comprehensive approach. here, we present a broad transcriptomic census of multiple regions of the adult human heart. we profiled rna expression of both single cells and nuclei, capturing them from six distinct cardiac anatomical regions. we also analysed the spatial distribution of selected cell populations using multiplex smfish imaging with rnascope probes. our anatomically defined resolved adult human heart cell census provides a reference framework for studies directed towards understanding the cellular and molecular drivers that enable functional plasticity in response to varying physiological conditions in the normal heart, and will inform the heart's responses to disease. overview of the cellular landscape of the adult human heart samples were obtained from six cardiac regions including the free wall of each chamber (left/right ventricle, left/right atrium), denoted as lv, rv, la, ra, and from the lv apex (ax) and interventricular septum (sp). to capture the heterogeneity of cardiac cell populations, samples were collected as transmural tissue segments that span the three cardiac layers (epicardium, myocardium and endocardium; figure a ) from normal hearts (seven females, seven males) of north american and british organ donors (ages - years; figure b and supplementary table a ). we isolated single cells and single nuclei, as the large sizes of cardiomyocyte (cm) are not captured by the x genomics chromium platform. 
fresh tissues were mechanically and enzymatically processed to dissociate single cells, and subsequently cardiac immune cells were enriched from the cell fraction using cd + magnetic selection. single nuclei were isolated from frozen heart tissues and purified by fluorescent activated cell sorting. the transcriptome of single cells and nuclei were profiled using the x genomics single cell gene expression solution ( figure a ). after processing, all data from nuclei, cells and cd + enriched cells were batch-aligned using a generative deep variational autoencoder , prior to unsupervised clustering. we found differences in the distribution of cell types across donors, even within the same region, and correlations between different cell types at the same site ( supplementary table a ). for example, in lv, ax, sp and rv tissues, the proportions of vcm and fb were negatively correlated ( p -value= . e- ), while there was a positive correlation among the proportions of pc, smc, and nc cells ( p -value< e- ). we suggest that these data reflect random sampling that included vessels with ec, pc, smc, and concurrently fewer cm. however, the observed correlation between nc, smc and pc implies a potential functional organization. the cell distributions were generally similar in tissues of male and female hearts, but left ventricular regions (ax, sp, lv) from female donors had significantly higher mean percentages of cm ( . ± %) compared to male donors ( . ± %; p -value= . ). this is unexpected given the average smaller heart mass of women and may reflect the small donor pool. if replicated, these data may explain higher cardiac stroke volume in women and lower rates of cardiovascular disease. table b ), including hcn , myh , myl and nppa . in addition, we identified a higher acm expression of aldh a ( -fold increased), the catalytic enzyme required for synthesis of retinoic acid, that may reflect acm derived from the second heart field . 
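the correlations between cell-type proportions reported above (for example, the negative correlation between vcm and fb proportions) can be illustrated with a short python sketch that converts per-sample cell counts into proportions and computes a pearson correlation across samples. the function names and the counts are hypothetical and serve only to show the calculation; they are not the study's data or pipeline.

```python
import math


def cell_type_proportions(counts):
    """Convert a {cell_type: count} mapping for one sample into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no cells")
    return {cell_type: n / total for cell_type, n in counts.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up counts for three ventricular samples.
    samples = [
        {"vcm": 60, "fb": 20, "pc": 20},
        {"vcm": 45, "fb": 30, "pc": 25},
        {"vcm": 30, "fb": 45, "pc": 25},
    ]
    props = [cell_type_proportions(s) for s in samples]
    vcm = [p["vcm"] for p in props]
    fb = [p["fb"] for p in props]
    print(pearson_r(vcm, fb))
```

because proportions within a sample sum to one, some negative correlation between abundant cell types is expected by construction, which is consistent with the authors' caution that part of the pattern may reflect sampling.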
acm also showed higher levels of ror ( -fold increased), which participates in cardiomyocyte differentiation via wnt-signaling, as well as synpr ( -fold increased), a synaptic vesicle membrane protein with functions in trp-channel mechanosensing by atrial volume receptors , . in contrast, vcm expressed genes with -fold higher expression than acm ( supplementary table b ). genes with significantly enriched expression included prototypic sarcomere protein genes myh , myl , and transcription factors ( irx , irx , irx , masp , hey ). vcm also had -fold higher expression of prdm than acm, which harbors damaging variants associated with lv non-compaction . similarly highly expressed were pcdh ( supplementary figure b a ,b ), a molecule with strong calcium-dependent adhesive properties and smyd , which promotes the formation of protein stabilizing complexes in the z-disc and i-band of sarcomeres , . expression of these genes is likely to promote tissue integrity under the conditions of high ventricular pressure and strain. clustering of vcm data identified five subpopulations ( figure a we also identified two subpopulations (vcm and vcm ) across all ventricular regions. the transcriptional profile of vcm was remarkably similar to a prominent ra subpopulation (acm , discussed below), and suggestive that these are derived from the second heart field . these cells had higher levels of transcripts associated with retinoic-acid responsive smooth muscle cell genes, including myh , cnn , , and nexn . vcm also expressed stress-response genes including ankrd , fhl , dusp , xirp and xirp . the xirp proteins interact with cardiac ion channel proteins nav . and kv . within intercalated discs, and have been implicated in lethal cardiac arrhythmias prevalent in cardiomyopathies , . vcm contributed - % to vcm populations, and expressed nuclear-encoded mitochondrial genes ( ndufb , ndufa , cox c , and cox b ; figure e ) suggestive of a high energetic state. 
indeed, gene ontology analyses of vcm transcripts identified significant terms of "atp metabolic process" and "oxidative phosphorylation" ( supplementary figure b c ). these cm also had high levels of cryab , encoding a heat shock protein with cytoprotective roles and antioxidant responses by cm . given the concomitant high expression of genes encoding sarcomere components ( figure e ) and pln ( supplementary table b ), we deduced that these vcm are outfitted to perform a higher workload than other vcm. unlike scrna-seq analysis that identified prominent rv expression of pln in embryonic mouse hearts , vcm were similar in both ventricles. a small proportion (~ %) of cells comprised vcm and expressed high levels of dlc and ebf . these molecules participate in regulating brown adipocyte differentiation and may be involved in cardiac pacemaker activity. in addition, vcm nuclei had higher levels of transcripts also expressed in neural lineages ( sox , ebf , and kcnab ) . notably, mice with deleted ebf have profound hypoplasia of the ventricular conduction system . further confirmation and investigation of this subpopulation is required, given the small number of cells and shared marker genes with other cell types. we identified six subpopulations of acm, indicating considerable heterogeneity ( figure b ), particularly between the right and left chambers. notably, hamp , a master regulator of iron homeostasis, was significantly enriched in over % of ra cm compared to % of la cm ( supplementary table b ), consistent with prior studies of ra tissues and confirmed by smfish ( figure g ). hamp has unknown roles in cardiac biology, but hamp-null mice have electron transport chain deficits and lethal cardiomyopathy . the ra enrichment of hamp may imply energetic differences between right and left acm. ligand for robo receptors in the heart , aldh a and brinp , involved in retinoic acid signaling, and grxcr , a molecule that supports cilia involved in mechanosensing . 
as noted above, acm shared a remarkably similar transcriptional profile with vcm , including the smooth muscle cell gene cnn , which we confirmed by smfish with rnascope probes ( figure g, supplementary figure b a ). the transcriptional profiles of acm , acm and vcm likely indicate their derivation from the second heart field that forms the right heart chambers and associated vascular structures . we captured a small subpopulation of mesothelial cells expressing msln , wt and bnc , but not ec, fb or mural lineage genes, indicating that these are likely epicardial cells , , . table c ). this is in line with in vitro models predicting that while arterial smc are more contractile, venous smc seem more dedifferentiated and potentially proliferative . ec_art and smc_art ( figure c, d ) . we also predicted differences between the figure c ). transdifferentiation of stromal populations has been described but remains controversial in the literature . immune cells are known to play key roles in cardiac homeostasis as well as in inflammation, repair and remodeling; however, they were underrepresented in our single nuclei dataset. therefore, we used the pan-immune marker cd to enrich for this cell population. we defined cardiac-resident versus circulating cells by calculating the enrichment in to evaluate cell-cell interactions of the immune cells in cardiac homeostasis and remodelling, we used cellphonedb . thus we predicted putative cross-talk among myeloid and lymphoid cells, cm, and fb ( figure c and supplementary table e ) . this analysis predicted that dc and mØ_trtmsb x+ interacted with fb in a distinct manner in atria and ventricles, as shown in figure d . the fb subpopulation signaled to acm and vcm in turn, forming a cellular circuitry which may be relevant for healthy cardiac homeostasis. 
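the kind of ligand-receptor prediction described above can be illustrated with a minimal sketch. it assumes a simplified scoring scheme in the spirit of cellphonedb, not the tool's actual code: the score for a ligand-receptor pair between two cell populations is taken as the mean of the ligand's average expression in the sender population and the receptor's average expression in the receiver population, and an empirical p-value is obtained by shuffling cell-type labels. the function names, the gene labels "LIG" and "REC", the toy expression values and the number of permutations are all hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def pair_score(expr, labels, ligand, receptor, sender, receiver):
    """Mean of average ligand expression in sender cells and
    average receptor expression in receiver cells."""
    lig = [cell[ligand] for cell, lab in zip(expr, labels) if lab == sender]
    rec = [cell[receptor] for cell, lab in zip(expr, labels) if lab == receiver]
    if not lig or not rec:
        return 0.0
    return (mean(lig) + mean(rec)) / 2


def permutation_pvalue(expr, labels, ligand, receptor, sender, receiver,
                       n_perm=1000, seed=0):
    """Empirical p-value: how often a random relabelling of cells gives a
    score at least as high as the observed one."""
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    observed = pair_score(expr, labels, ligand, receptor, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pair_score(expr, shuffled, ligand, receptor, sender, receiver) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return observed, (hits + 1) / (n_perm + 1)


# toy data (hypothetical): fibroblasts express the ligand, cardiomyocytes the receptor
expr = ([{"LIG": 5.0, "REC": 0.0}] * 6      # fb
        + [{"LIG": 0.0, "REC": 4.0}] * 6    # cm
        + [{"LIG": 0.0, "REC": 0.0}] * 6)   # ec
labels = ["fb"] * 6 + ["cm"] * 6 + ["ec"] * 6

score, p_value = permutation_pvalue(expr, labels, "LIG", "REC", "fb", "cm")
print(score, p_value)
```

in this toy case the observed score is high because expression is perfectly segregated by cell type, so almost no relabelling can match it; real data would show far weaker separation, and multi-subunit complexes would need additional handling that this sketch omits.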
the heart is innervated by both sympathetic and parasympathetic components of the autonomic nervous system, which reside in ganglionated plexuses on the epicardial surface and contribute to regulation of heart rhythm . neural cells (nc) innervate the sinoatrial and atrioventricular nodes, from which an activating wave-front propagates throughout myocardial tissue via nerve bundles and purkinje fibers. we identified nc from cardiac tissues, predominantly ( %) from left chambers, presumably due to multiple left ventricular sites. all nc expressed transcripts ( nrxn , nrxn , kcnmb ) typically found in the central nervous and cardiac conduction systems, including sodium and potassium channels ( supplementary table f ) . lgi , which is required for glia development and axon myelination . nc was predominantly derived from atrial tissues and expressed gria , encoding a glutamate receptor which has altered expression levels in ischemic heart disease . additionally, gins , which is known to participate in the regulation of cardiac repolarization , , is enriched in these cells. nc and nc subpopulations shared a broadly similar transcriptional profile including lgr , a g-protein-coupled receptor involved in wnt-signaling that promotes cm differentiation and demarcates a subset of cm in the outflow tract ; this region often becomes arrhythmogenic in heart disease . these cells also expressed genes associated with coronary artery disease, ppp r b , lsamp and lpl , an endothelial enzyme involved in transporting lipoproteins into the heart . additional smaller subpopulations (nc - ) shared marker genes with other cell types and required further analyses to define their identities. cellphonedb.org predicted different interactions among arterial ecs and skeletal or heart muscle cells ( figure a ). while notch interactions were shared by both, ec-pc/smc interactions were predicted to involve jag / notch and notch jag in the heart but not in skeletal muscle. 
additionally, the efna -epha receptor-ligand complex pair was consistently present in the heart but not detected in skeletal muscle. these differences may reflect distinct microvasculatures in the two tissues due to different oxygen requirements. we also compared the immune compartment of heart, skeletal muscle and adult kidney . we suggest that these tissue-resident populations have developed transcriptional circuits tailored to the heart that differ from other tissues ( figure b , c and supplementary table g ). myeloid cells in the skeletal muscle are known to interact with satellite cells to promote myogenesis . as we did not find a cardiac cell population analogous to skeletal muscle satellite cells, we considered whether the heart had other potential repair mechanisms. to identify these, we compared predicted interactions ( supplementary table g ) among immune, fibroblast and myocyte populations in heart and skeletal muscle. these analyses showed that cardiac fb secrete fn and tnc , which are bound by cm-expressed integrins. in skeletal muscle, predicted cell-cell interactions between prg + fibroblasts and skeletal muscle myocytes involved col a , col a and a b integrins ( figure d ). skeletal muscle fibroblasts and monocytes appear to interact via the cxcr _cxcl chemokine, while cardiomyocytes have a distinct interaction with macrophages as described above in figure d . altogether, these results imply that interactive mechanisms are driven by different transcriptional circuits in heart and skeletal muscle. the circuit between cxcr _cxcl , which has been described to promote repair after myocardial infarction , appears to be primed by myeloid populations that initiate fibrotic repair. fb in atrial and ventricular tissues exhibited different transcriptional profiles and subpopulations, suggesting distinct functions. 
we found that ecm-producing and -organising fibroblasts, while present in atria and ventricles, differed in their mode of action, as exemplified by region-specific expression of different collagen and ecm remodelling factor genes. together with differences in the frequency of other cell types, we suggest that region-specific fb heterogeneity is critical to support cm across varying biophysical stimuli. related coronavirus infections (sars and mers) also had cardiac involvement. here, we found that expression of the viral receptor ace is higher in pericytes than in cm, as previously reported , but also that neither pericytes nor cm express the protease that primes viral entry, tmprss . instead, cm and pericytes express ctsb and ctsl , which may also promote viral entry. ace expression correlated with agtr (angiotensin receptor- ), which was highest in pericytes, consistent with a role in renin-angiotensin-aldosterone system (raas) signalling in cardiac hemodynamics. ace cleaves the vasoconstrictor angiotensin ii that binds agtr. ace -null mice have reduced cardiac contractility, myocardial ischemia and hypoxia , suggesting a profound role of ace in the regulation of cardiovascular hemodynamics. we expect that our results will allow further insights into other cardiac disease processes. going forward, our results will also be of value for deconvolution of existing bulk transcriptomic data, for transfer learning in analyses of other cardiac regions (valves, papillary muscle, conduction system), and to enable interpretation of the cellular responses to human heart disease. all of our data can be explored at www.heartcellatlas.org . this publication is part of the human cell atlas -www.humancellatlas.org/publications . 
key: cord- -wfuzk dp authors: meza, diana k.; broos, alice; becker, daniel j.; behdenna, abdelkader; willett, brian j.; viana, mafalda; streicker, daniel g. title: predicting the presence and titer of rabies virus neutralizing antibodies from low-volume serum samples in low-containment facilities date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wfuzk dp serology is a core component of the surveillance and management of viral zoonoses. virus neutralization tests are a gold standard serological diagnostic, but requirements for large volumes of serum and high biosafety containment can limit widespread use. here, focusing on rabies lyssavirus, a globally important zoonosis, we developed a pseudotype micro-neutralization rapid fluorescent focus inhibition test (pmrffit) that overcomes these limitations. 
specifically, we adapted an existing micro-neutralization test to use a green fluorescent protein–tagged murine leukemia virus pseudotype in lieu of pathogenic rabies virus, reducing the need for specialized reagents for antigen detection and enabling use in low-containment laboratories. we further used statistical analysis to generate rapid, quantitative predictions of the probability and titer of rabies virus neutralizing antibodies from microscopic imaging of neutralization outcomes. using serum samples from domestic dogs with neutralizing antibody titers estimated using the fluorescent antibody virus neutralization test (favn), pmrffit showed moderate sensitivity ( . %) and high specificity ( . %). despite small conflicts, titer predictions were correlated across tests repeated on different dates both for dog samples (r = . ), and for a second dataset of sera from wild common vampire bats (r = . , n = ), indicating repeatability. our test uses a starting volume of . μl of serum, estimates titers from a single dilution of serum rather than requiring multiple dilutions and end point titration, and may be adapted to target neutralizing antibodies against alternative lyssavirus species. the pmrffit enables high-throughput detection of rabies virus neutralizing antibodies in low-biocontainment settings and is suited to studies in wild or captive animals where large serum volumes cannot be obtained. the last few decades have seen a surge in newly emerging human viruses that originate from wildlife ( of gfp fluorescence. next, the command "analyze particles" was used to count the total number of fluorescent cells per field (i.e. infected cells). this command grouped and counted the white neighboring pixels with a predetermined size area and circularity to be a single cell (size area: - circularity: . - . ), so counts corresponded to the number of infected cells. cell count outputs were converted into a standardized spreadsheet using a python version . . 
script (python core team, ) (script available in supplementary materials). at the end of the image processing step, each serum sample was described by data points consisting of the number of the fluorescent cells in each of fields (photographs) in the : and : dilutions (figure b) . all statistical analyses were executed in r (r core team, (scaled to improve model convergence) and ) the serum dilution level (two factors: : and : dilution). random slope and intercept terms were also considered for the date the test was run ("test date") to account for observed variation in the relationships between srig titers and infected cell counts across dates (figure ) and for the field number ( to ) within each microscope well ("field") to account for variation in cell counts between fields (the middle field, field , had more agglomerated cells in particular). to evaluate whether a simpler, single dilution test produced comparable results, the full dataset was then subset to the : dilution only. the binomial and log-normal models fit to this data subset included only the fixed effect of the virus-infected n a cell counts, but the random effects were identical to those explained above (i.e. test date and field). models were fit using the 'lme ' package (bates et al., ) . the 'predict' function was used to generate the predicted probability that a srig concentration or serum sample was seropositive (binomial model) and its corresponding rvna titer (log-normal model). predictions per field were averaged to obtain results per sample (figure c) with a threshold of > . iu/ml (positives) were used as the benchmark reference. to understand the variability of the pmrffit, replicate srig titer concentration curves were produced on different dates between / / and / / (hereafter "test " through "test "). as expected, the number of infected cells declined at higher srig titers in all replicates; however, the shape of the antibody decay curve varied across test dates (figure ) . at the . 
iu/ml srig concentration, infected cell counts were more dispersed in the : dilution than in the : dilution, as indicated by higher interquartile range (iqr) within each of the test dates. across all the srig concentrations in all test dates (n = ), . % of the count comparisons were less dispersed in the : dilution, suggesting this dilution could be more precise for downstream statistical analysis (si ). binomial glmms accurately predicted seropositive and seronegative srig concentrations (figure ) . the best random effects for the binomial model included a random slope and intercept for test date ( table ) . the models built with the : dilution data only ("one-dilution model") and from both the : and : dilution data ("two-dilution model") had equivalent specificity ( %), but the one-dilution model was more sensitive ( % versus . %, figure a, b) . furthermore, the two-dilution binomial model failed to correctly predict the seropositive controls on out of the test dates, confirming improved performance of the one-dilution model (figure b) . the log-normal glmms gave repeatable predictions of rvna titers from the datasets generated through our protocol across test dates (figure c) . the best log-normal model included a random intercept and slope for test date. although the most complex model had the lowest aicc, the simpler model (without the random intercept of field) had a Δaicc < ( table ) . observed and predicted srig titers were highly correlated for both the one and two-dilution models (r = . , figure c). when comparing test dates (i.e. one-to-one comparison between correlations of the one-dilution and the two-dilution model from the same test dates), the correlation coefficients were similar, suggesting the simpler one-dilution model is sufficient for titer prediction (figure d) . the most commonly applied serological tests to detect rvna titers challenge a range of serial dilutions of serum with infectious rv. 
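the way per-field predictions feed into sensitivity and specificity can be sketched as follows. this is a minimal illustration, assuming that each microscope field already has a predicted probability of seropositivity from a fitted binomial model, that per-field probabilities are averaged per sample as described above, and that a sample is called positive when its averaged probability reaches a cutoff. the cutoff of 0.5, the function names, the sample identifiers and the toy probabilities are hypothetical and do not come from the study.

```python
from collections import defaultdict
from statistics import mean


def average_by_sample(field_predictions):
    """Average per-field predicted probabilities into one value per sample.

    field_predictions: iterable of (sample_id, predicted_probability) pairs,
    one pair per photographed field.
    """
    grouped = defaultdict(list)
    for sample_id, prob in field_predictions:
        grouped[sample_id].append(prob)
    return {sample_id: mean(probs) for sample_id, probs in grouped.items()}


def sensitivity_specificity(sample_probs, reference_positive, cutoff=0.5):
    """Compare predicted status against a reference test (e.g. favn)."""
    tp = fp = tn = fn = 0
    for sample_id, prob in sample_probs.items():
        predicted = prob >= cutoff              # hypothetical decision rule
        actual = reference_positive[sample_id]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# toy per-field predictions (hypothetical values)
fields = [("A", 0.9), ("A", 0.8), ("A", 0.7),
          ("B", 0.4), ("B", 0.3), ("B", 0.2),
          ("C", 0.1), ("C", 0.2), ("C", 0.0),
          ("D", 0.2), ("D", 0.1), ("D", 0.3)]
reference = {"A": True, "B": True, "C": False, "D": False}

sens, spec = sensitivity_specificity(average_by_sample(fields), reference)
print(sens, spec)
```

averaging before thresholding means a single atypical field cannot flip a sample's status on its own, which is one reason to aggregate fields rather than classify each one separately.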
this process is labor-intensive and requires laboratory capacity to grow large quantities of pathogenic rv. here, we provide an alternative serological framework that uses a combination of digital image analysis and statistical analysis to estimate the presence and titer of rvna from a single dilution using only . µl of serum. the pmrffit differs from other lyssavirus neutralization tests in several key aspects. it uses an mlv(rg) pseudotype rather than pathogenic rv, allowing the pmrffit to be performed in any low-containment laboratory with appropriate cell culture and microscopy facilities. the addition of gfp expression is significant, since it removes the need for fitc-conjugated antibody (reducing reagent costs) and the fixation and staining steps used in traditional rffit or favn. one potential drawback of using gfp expression to measure infectivity is the prolonged neutralization period ( h versus h rffit and h favn) required to gain sufficient fluorescence for image processing (aubert, ; smith et al., ) . longer neutralization requirements ( h) were also required in a favn modification using a gfp expressing recombinant cvs- -egfp but did not alter results relative to the test run with cvs- (xue et al., ) . fortunately, extended incubations are unlikely to alter neutralization outcomes since mlv(rg) pseudotype is replication incompetent, preventing infection of additional cells during the incubation (temperton et al., ) . the pmrffit also uses an imaging pipeline that combines systematic photography of microscope fields with automated digital image processing to count infected cells. microscopy in neutralization tests is time consuming and presents challenges for interlaboratory comparisons due to multiple sources of variation, especially those that affect the manual readout (e.g. 
laboratory user, manual pipetting, uneven cell monolayer) ( the pmrffit standardized approach minimizes these sources of error while potentially reducing microscope operator time. moreover, the imaging process generates traceable and permanent electronic records of the raw data, eliminating the need to manually digitize records of field counts. several investigators have previously incorporated image processing into rv neutralization tests. to count pixels using a microrffit but did not make full use of the quantitative nature of imaging data to obtain rvna titers and used pathogenic rv rather than a viral pseudotype. a final distinction is that instead of scoring microscope field or wells as virus positive or negative, the pmrffit predicts serological status and rvna titer from infected cell counts in a single serum dilution using statistical modelling. the efficacy of this approach highlights the value of historically underutilized quantitative data on cellular infectivity for lyssavirus serology. model selection indicated substantial day-to-day variation in the srig dilution series. this was unsurprising, since virus neutralization tests are biologically dynamic systems that can be influenced by many factors (e.g. variability in the humidity of the incubator, technical manipulation, light condition of the microscope, variability in gfp expression in the cells) (briggs et al., ; hammami et al., ; kostense et al., ) . since our statistical approach handles this variability through the random effect of test date, the pmrffit is best suited for large numbers of serum samples that require testing to be carried out across multiple batches. however, performance is only marginally reduced when running single models for each test date, implying the pmrffit may still be useful when fewer samples are available for testing (see si , ). 
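the automated counting step discussed above, in which neighboring white pixels are grouped and filtered by size and circularity, can be sketched in a few lines. this is an illustrative stand-in, not the imagej "analyze particles" implementation: it assumes a thresholded binary image given as a list of rows, groups pixels by 8-connectivity, estimates the perimeter crudely as the number of exposed pixel edges, and uses circularity = 4π × area / perimeter². the thresholds `min_area`, `max_area` and `min_circularity`, like the toy image, are hypothetical, since the values used in the study are not recoverable from the text.

```python
import math


def count_particles(mask, min_area=2, max_area=50, min_circularity=0.6):
    """Count connected groups of white pixels that pass size and shape filters."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # flood fill one particle using 8-connectivity
            stack = [(r, c)]
            seen[r][c] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            area = len(pixels)
            pixel_set = set(pixels)
            # crude perimeter: pixel edges that face background or the image border
            perimeter = sum(
                (ny, nx) not in pixel_set
                for y, x in pixels
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            )
            circularity = 4 * math.pi * area / perimeter ** 2
            if min_area <= area <= max_area and circularity >= min_circularity:
                count += 1
    return count


# toy thresholded field: two compact "cells", one speck, one elongated streak
field = [
    [1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
]
print(count_particles(field))
```

with these hypothetical thresholds the single-pixel speck is rejected by the size filter and the streak by the circularity filter, so only the two compact blobs are counted; loosening the filters would count all four objects.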
surprisingly, fitting the glmms to data from a single : dilution of srig predicted both seropositivity and rvna titer more accurately than models fit to both the : and : dilutions. the reduced performance of the two-dilution model reflected higher variability in the : dilution compared to the : dilution, as evidenced by greater iqr values (si ). ultimately, this variability likely reflects both higher stochasticity in infected cell counts at lower serum concentrations and pipetting error. regardless, the ability to detect low titers (< . the authors declare no conflict of interest. perform the cell count of the fluorescent cells to construct a database to fit the statistical models. c) construction of the statistical models with two different types of prediction. antibodies to rabies virus in terrestrial wild mammals in native rainforest on the north coast of são paulo state, brazil practical significance of rabies antibodies in cats and dogs rabies virus exposure of brazilian free-ranging wildlife from municipalities without clinical cases in humans or in terrestrial wildlife mumin: multi-model inference measurement of rabies-specific antibodies in carnivores by an enzyme-linked immunosorbent assay fitting linear mixed-effects models using lme temporal and spatial limitations in global surveillance for bat filoviruses and henipaviruses the use of pseudotypes to study viruses, virus sero-epidemiology and vaccination resolving the roles of immunity, pathogenesis and immigration for rabies persistence in vampire bats supporting online material contents : s seroprevalence data s estimation of seasonal birth rate s methods for parameter estimation use of a transient assay for studying the genetic determinants of fv restriction estimating time of infection using prior serological and individual information can greatly improve incidence estimation of human and wildlife infections a comparison of two serological methods for 
detecting the immune response after rabies vaccination in dogs and cats being exported to rabies-free areas rabies surveillance in wild mammals in south of brazil. transboundary and emerging diseases development of a fluorescent antibody virus neutralisation test (favn test) for the quantitation of rabies-neutralising antibody development of a qualitative indirect elisa for the measurement of rabies virus- specific antibodies from vaccinated dogs and cats principles of virology. fields virology rabies in new mexico cavern bats serological investigation of rabies virus neutralizing antibodies in bats captured in the eastern brazilian amazon one health, emerging infectious diseases and wildlife: two decades of progress? emerging infectious diseases of wildlife -threats to biodiversity and human health serological methods used for rabies post vaccination surveys: an analysis bioecological drivers of rabies virus circulation in a neotropical bat community. plos neglected tropical diseases rapid detection of neutralizing antibodies against bovine viral diarrhoea virus using quantitative high-content screening on the histogram as a density estimator:l theory. zeitschrift für wahrscheinlichkeitstheorie und verwandte gebiete next-generation serology: integrating cross-sectional and capture-recapture approaches to infer disease dynamics host and viral ecology determine bat rabies seasonality and maintenance deciphering serology to understand the ecology of infectious diseases in wildlife rabies virus-neutralising antibodies in healthy, unvaccinated individuals: what do they mean for rabies epidemiology? 
plos neglected tropical diseases integrating landscape hierarchies in the discovery and modeling of ecological drivers of zoonotically transmitted disease from wildlife convergence of humans, bats, trees, and culture in nipah virus transmission emerging infectious diseases vaccination of tunisian dogs with the lyophilised sag oral rabies vaccine incorporated into the dbl dog bait virus isolation and quantitation between roost contact is essential for maintenance of european bat lyssavirus type- in myotis daubentonii bat reservoir: 'the swarming hypothesis printed in great britain studies on the different conditions for rabies virus neutralization by monoclonal antibodies f - - and f - - journal of wildlife diseases outbreak of seoul virus among rats and rat owners-united states and canada validation of the rapid fluorescent focus inhibition test for rabies virus-neutralizing antibodies in clinical samples lagos bat virus in kenya a semi-quantitative serological method to assess the potency of inactivated rabies vaccine for veterinary use rabies surveillance in the united states evaluation of a new serological technique for detecting rabies virus antibodies following vaccination. 
vaccine, current progress with serological assays for exotic emerging/re-emerging viruses origins of the h n influenza pandemic in swine in mexico use of serological surveys to generate key insights into the changing global landscape of infectious disease quantification of lyssavirus-neutralizing antibodies using vesicular stomatitis virus pseudotype particles rabies-specific antibodies: measuring surrogates of protection against a fatal disease experimental infection of artibeus intermedius with a vampire bat rabies virus comparison of visual microscopic and computer-automated fluorescence detection of rabies virus neutralizing sampling to elucidate the dynamics of infections in reservoir hosts transmission or within-host dynamics driving pulses of zoonotic viruses in reservoir- host populations python: a dynamic, open source programming language. python software foundation r: a language and environment for statistical computing usa: national institutes of health serologic evidence of lyssavirus infection in bats lyssaviruses and rabies: current conundrums, concerns, contradictions and controversies. f research, , fiji: an open-source platform for biological-image analysis rabies transmitted by vampire bats to humans: an emerging zoonotic disease in latin america? detection of rabies virus antibodies in brazilian free-ranging wild a rapid reproducible test for determining rabies neutralizing antibody prevalence of rabies specific antibodies in the mexican free-tailed bat (tadarida brasiliensis mexicana) at lava cave, new mexico anthropogenic roost switching and rabies virus dynamics in house-roosting big brown bats. 
vector-borne and zoonotic diseases ecological and anthropogenic drivers of rabies exposure in vampire bats: implications for transmission and control retroviral pseudotypes -from scientific tools to clinical utility rapid fluorescent focus inhibition test optimization and validation: improved detection of neutralizing antibodies to rabies virus a conserved mechanism of retrovirus restriction in mammals host immunity to repeated rabies virus infection in big brown bats evaluation of elisa for detection of rabies antibodies in domestic carnivores an evaluation of two commercially available elisas and one in-house reference laboratory elisa for the determination of human anti-rabies virus antibodies human rabies: updates and call for data who expert consultation on rabies, third report. world health organization technical report series virus neutralising activity of african fruit bat (eidolon helvum) sera against emerging lyssaviruses a robust lentiviral pseudotype neutralisation assay for in-field serosurveillance of rabies and lyssaviruses in africa investigating antibody neutralization of lyssaviruses using lentiviral pseudotypes: a cross- species comparison generation of recombinant rabies virus cvs- expressing egfp applied to the rapid virus neutralization test a pneumonia outbreak associated with a new coronavirus of probable bat origin mixed effects models and extensions in ecology with r tables table . random key: cord- -fqwks rb authors: liao, yan shin j.; kuan, shin ping; guevara, maria v.; collins, emily n.; atanasova, kalina r.; dadural, joshua s.; vogt, kevin; schurmann, veronica; reznikov, leah r. title: acid exposure impairs mucus secretion and disrupts mucus transport in neonatal piglet airways date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: fqwks rb tenacious mucus produced by tracheal and bronchial submucosal glands is a defining feature of cystic fibrosis (cf). 
although airway acidification occurs early in cf, whether transient acidification is sufficient to initiate mucus abnormalities is unknown. we studied mucus secretion and mucus transport in piglets forty-eight hours following an intra-airway acid challenge. acid-challenged piglet airways were distinguished by increased mucin b (muc b) in the submucosal gland but decreased lung lavage fluid muc b, following in vivo cholinergic stimulation, suggesting a failure in submucosal gland secretion. concomitantly, intrapulmonary airways were obstructed with glycoprotein rich material under both basal and methacholine-stimulated conditions. to mimic a cf-like environment, we also studied mucus secretion and transport under diminished bicarbonate and chloride transport conditions ex vivo. cholinergic stimulation in acid-challenged piglet airways induced extensive mucus films, greater mucus strand formation, increased dilation of submucosal gland duct openings and decreased mucociliary transport. finally, to elucidate potential mediators of acid-induced mucus defects, we investigated diminazene aceturate, a small molecule that inhibits the acid-sensing ion channel (asic). diminazene aceturate restored surface muc b in acid-challenged piglet airways under basal conditions, mitigated acid-induced airway obstruction, and magnified the number of dilated submucosal gland duct openings. these findings suggest that even transient airway acidification early in life might have profound impacts on mucus secretion and transport properties. further they highlight diminazene aceturate as an agent that might be beneficial in alleviating certain mucus defects in cf airway disease. one sentence summary early life airway acidification has profound impacts on mucus secretion and transport. cystic fibrosis (cf) is a life-shortening autosomal recessive disorder caused by mutations in the gene encoding the cystic fibrosis transmembrane conductance regulator (cftr) . 
cf airways are distinguished by frequent infection, inflammation and adherent mucus. over time, these factors lead to progressive airway destruction and respiratory demise ( - ). several lines of evidence suggest that airway mucus is abnormal in early cf disease ( , ) . for example, pathologic findings in neonates with cf described bronchiolar obstruction due to "thick" mucus produced by the tracheal and bronchial submucosal glands ( ) . similarly, hoegger and colleagues found a failure of mucus detachment from airway submucosal glands in newborn pigs with cf, resulting in impaired mucociliary transport ( ) . likewise, rats with cf displayed submucosal gland duct plugging prior to onset of airway infection ( ) , whereas ferrets with cf showed excessive mucus accumulation even in a sterile airway environment ( ) . combined, these studies highlight the submucosal gland as a key node in cf pathogenesis ( ) , and suggest that abnormal mucus might precede infection and inflammation. consistent with this, a recent study in preschool children with cf suggested that mucus accumulates in the airways prior to infection and airway remodeling ( ) . the cause of abnormal mucus in cf has received significant attention ( ) . numerous mechanisms have been proposed, including dehydration of the airway ( ) and a lack of cftr-mediated bicarbonate transport ( ) , which acidifies the airway surface ( , ) . both hypotheses are attractive. the hypothesis of absent bicarbonate transport and an acidic airway ph has received scrutiny, as studies describing acidification find that it occurs early in disease. for example, decreased airway ph has been found in cf pigs ( ) and cf rats ( ) prior to the onset of infection and inflammation, whereas in humans, decreased airway surface ph has been described in neonates with cf ( ) , but not in children ( , ) . 
our previous data suggested that neonatal piglets challenged with intra-airway acid exhibit airway obstruction ( ) , suggesting defects in mucus secretion, production and/or transport. here we tested the hypothesis that transient airway acidification is sufficient to induce defects in mucus secretion and mucociliary transport in neonatal piglets. secondarily, we tested the hypothesis that pharmacological inhibition of acid detection would mitigate acid-induced secretion defects. we performed multi-level analyses of the secretory gel-forming mucins, muc ac and muc b ( , ) . we first examined basal secretion by measuring the amount of muc ac and muc b in the bronchoalveolar lavage fluid. detectable levels of proteins were found in all treatment groups. however, no statistically significant differences were noted under basal conditions ( figure a , b). bronchoalveolar lavage fluid is limited in that it primarily captures non-adherent proteins and is a mixed fluid retrieved from alveolar and bronchial spaces. therefore, we also measured muc ac and muc b protein expression in tracheal cross sections using antibody-specific labeling and signal intensity analyses ( ) (supplemental figure a -c). analysis of the airway surface revealed no significant differences in muc ac signal intensity under basal conditions ( figure c ). in contrast, acid-challenged piglets had a significant decrease in the signal intensity for airway surface muc b compared to saline-treated piglets ( figure d ). diminazene aceturate restored muc b surface expression in acid-challenged piglets ( figure d ), suggesting a protective effect. decreased surface muc b could be due to a decreased basal secretion rate or decreased amount of muc b in the submucosal gland or in the airway epithelium. thus, we also assessed signal intensity for muc b and muc ac in the submucosal glands but observed no significant differences among treatment groups ( figure e , f). 
quantitative rt-pcr on whole trachea also supported no significant differences in basal expression (supplemental figure ). these data suggested that decreased surface muc b was not likely due to a decrease in the amount of muc b in the submucosal gland or in the surface epithelium, but highlighted secretion as the likely defect. further, they suggested that the protective effect of diminazene aceturate was not likely through enhancing production of muc b. having observed a modest defect in muc b secretion under basal conditions, we reasoned that stimulation of submucosal glands should magnify the defect. thus, we administered the cholinergic agonist methacholine in vivo to a smaller cohort of piglets to stimulate robust submucosal gland secretion. we found no significant differences in muc ac concentrations in the bronchoalveolar lavage fluid retrieved from methacholine-stimulated piglets (figure a ). in contrast, muc b concentrations were significantly decreased in acid-challenged piglets compared to saline-treated controls ( figure b ). concentrations of muc b in acid-challenged animals provided diminazene aceturate were indistinguishable from those in acid-challenged piglets ( figure b ). similar to the basal secretion studies, we also assessed muc b and muc ac abundance in tracheal cross sections using antibody labeling. no statistically significant differences in the amount of surface muc ac ( figure c ) or muc b ( figure d ) were found. however, when examining the submucosal glands, we found a significant increase in the amount of muc b labeling present in acid-challenged piglets compared to saline-treated controls ( figure we previously reported that intra-airway acid induces mucus obstruction in the small airways ( ) . similar to our previous studies, we found that intra-airway acid induced airway obstruction under both basal ( figure a - d) and methacholine-stimulated conditions ( figure e -h). 
diminazene aceturate significantly attenuated airway obstruction in both conditions ( figure c , d, g, h). no differences in the amount of mucin mrna were noted (supplemental figure ), suggesting that the airway obstruction and alleviation of airway obstruction were not due to differences in mucin expression, but perhaps related to defective secretion. indeed, mice lacking muc b show airway obstruction ( ) . all of our in vivo studies were performed under normal physiological conditions. however, in cf, lack of bicarbonate and chloride transport impacts the morphology and secretion of mucus. thus, we also investigated mucus morphology and secretion in tracheal segments stimulated with methacholine under diminished bicarbonate and chloride transport conditions. as a surrogate for muc ac, we stained mucus with a jacalin lectin ( ) . wheat germ agglutinin (wga) was used as a surrogate for muc b ( ) . in saline-treated piglet airways, mucus was found in discrete packets that decorated the surface like ornaments ( figure a ). in contrast, mucus in acid-challenged piglets formed film-like sheets ( figure b ). scoring indices indicated greater mucus sheet formation in acid-challenged piglets for jacalin-labeled mucus but not wga-labeled mucus compared to saline-treated controls ( figure c , d, supplemental figure ). in a smaller cohort of samples, we also assessed sheet formation using imaris software. consistent with our scoring methods, we found a decrease in the number of discrete mucus particles in acid-challenged piglets compared to saline-treated controls (supplemental figure a ). diminazene aceturate did not significantly alter or prevent the formation of sheet-like structures ( figure c , d). finding that acid-challenge augmented the formation of mucus sheets in diminished bicarbonate and chloride transport conditions was consistent with greater accumulation of mucus sheets in cf airways ( ) . 
mucus strands ( ) and bundles ( ) emanating from submucosal gland ducts have been observed in cf airways. thus, we examined tracheal segments stimulated ex vivo with methacholine for strand formation ( figure , supplemental figure a - c). we found no significant differences in the abundance of strands formed by mucus labeled with jacalin ( figure c ). however, a significant increase in strand formation for wga-labeled mucus was observed in acid-challenged piglets compared to saline-treated controls ( figure d ). in a smaller cohort of samples, we performed a secondary analysis that consisted of tracing the strand area manually in imagej and expressing it as a percentage of the total image field of view. consistent with our scoring method, manual tracing also suggested significantly more strands in acid-challenged piglets compared to saline-treated controls (supplemental figure b ). diminazene aceturate did not significantly alter mucus strand formation ( figure c , d). dilated submucosal gland duct openings are commonly observed in cf airways ( ) . thus, we also examined the tracheal surfaces for the presence of submucosal gland duct openings. submucosal gland duct openings were only marginally visible in saline-treated piglets ( figure a ). in stark contrast, acid-challenged piglets displayed a significant elevation in submucosal gland duct openings that appeared dilated ( figure b , c, supplemental figure d - f). diminazene aceturate magnified the effect of acid and increased the number of dilated submucosal gland openings ( figure c ). cf airways are distinguished by impaired mucociliary transport ( , , , ) . thus, to determine whether intra-airway acid induced mucociliary transport defects, we utilized freshly excised tracheal segments stimulated with methacholine under diminished bicarbonate and chloride transport conditions. 
for these studies, we utilized methods developed by hoegger and colleagues ( ), in which fluorescent nanospheres bind and attach to mucus, allowing for real-time visualization of mucus production and movement. to assess the movement of fluorescently-labeled mucus, we utilized computer-assigned particle tracking ( figure a , b). we found that both the average speed and the maximal speed of mucus movement were decreased in acid-challenged piglet airways ( figure c , d). because speed is equal to distance over time, we also examined computer-assigned particle track length and found it was decreased ( figure e ). no effect of diminazene aceturate was observed. in early cf pathogenesis, the airway is acidic ( , , ) . although mucus abnormalities precede airway infection and inflammation ( , , ) , how and whether transient airway acidification impacts the development of mucus abnormalities is unknown. to characterize the significance of acute and transient early life airway acidification on mucus properties, we challenged neonatal piglets with intra-airway acid or saline control. secondarily, we blocked detection of acid with diminazene aceturate to investigate potential sensors of airway ph ( ) . using multi-level approaches, we provided new insight into the consequence of early life airway acidification on mucus secretion and transport properties. our results showed that under basal conditions, acid-challenged piglet airways showed decreased airway surface muc b, suggesting defective submucosal gland secretion. consistent with that, in vivo methacholine stimulation resulted in less muc b protein in the bronchoalveolar lavage fluid, and more muc b retained in the submucosal gland. diminazene aceturate restored airway surface muc b protein levels under basal conditions but did not prevent the acid-induced defect in methacholine-stimulated secretion. 
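the particle-tracking measures reported in the results (average speed, maximal speed and track length) follow directly from the relation that speed equals distance over time. a minimal sketch of that arithmetic is given here, assuming each track is a list of (x, y) positions sampled at a fixed frame interval; the function name, the input layout and the units are illustrative assumptions, not the tracking software's actual output format.

```python
import math


def track_metrics(points, seconds_per_frame):
    """Summarize one particle track.

    points: list of (x, y) positions, one per frame (hypothetical layout).
    seconds_per_frame: time between consecutive frames.
    Returns (track_length, mean_speed, max_speed) in the units of the inputs.
    """
    # Distance moved between each pair of consecutive frames.
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    track_length = sum(steps)
    duration = seconds_per_frame * len(steps)
    # Mean speed is total distance over total time; guard empty tracks.
    mean_speed = track_length / duration if duration else 0.0
    # Maximal speed is the largest single-frame displacement per frame time.
    max_speed = max(steps, default=0.0) / seconds_per_frame
    return track_length, mean_speed, max_speed


if __name__ == "__main__":
    example = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
    print(track_metrics(example, seconds_per_frame=60.0))
```

because mean speed is track length divided by a fixed imaging duration, a decrease in one implies a decrease in the other, which is why track length serves as a consistency check on the speed measurements.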
to mimic a cf-like environment, we investigated mucus secretion and mucus transport under diminished bicarbonate and chloride transport conditions. we observed profound differences in the transport and secretion of mucus. specifically, greater mucus sheet formation was observed in acid-challenged piglet airways. additionally, there was a greater number of mucus strands in acid-challenged piglet airways, as well as an elevation in the number of submucosal gland duct openings. these features paralleled what has been described in cf pig airways ( ) . thus, combined, our data suggest that intra-airway acidification greatly impacts mucus secretion and transport properties, inducing many features that mimic cf. diminazene aceturate is widely used for the treatment of protozoan diseases and reportedly has no adverse side effects ( ) . its low cost and availability in numerous regions of the world make it an attractive drug. diminazene aceturate blocks acid-sensing ion channel a (asic a) ( ) . asic a is largely present in nerves innervating the airway ( , ) . in our studies, we found a marginal effect of diminazene aceturate in preventing acid-induced mucus defects in the trachea, whereas a strong protective effect of diminazene aceturate was observed in the intrapulmonary airways. it is possible that asic a is more concentrated in the small airways compared to the large airways. it is also possible that asic a is expressed on additional cell types in the small airways compared to the large airways, thus amplifying its protective effect. further, the degree of acidification might be airway tree-dependent, and thus other sensors in addition to asic a might be more critical in the larger airways. surprisingly, we also found that diminazene aceturate magnified the number of submucosal gland duct openings under diminished bicarbonate and chloride transport conditions. 
however, this was not associated with a greater number of mucus strands, nor a further impairment (or alleviation) of mucus transport defects. thus, the significance of augmented submucosal gland duct openings induced by diminazene aceturate is unknown and requires additional studies. although direct measurements of airway ph in children with cf have shown that ph is not different ( , ) , other studies suggest that the airways are acidified in neonates with cf ( ) . these findings have raised debate whether acidification could be an initiating factor in cf pathogenesis. our studies suggest that even a transient airway acidification early in life is sufficient to induce profound impairments in mucus secretion and transport. these studies therefore highlight that changes in airway ph may be an initiating defect in cf. this is consistent with reports from many cf animal models. for example, in rats with cf, the airways are acidic and submucosal gland duct plugging is observed prior to onset of airway infection ( ). hoegger and colleagues found a failure of mucus detachment from airway submucosal glands in newborn pigs with cf, where the airways are also acidic, resulting in impaired mucociliary transport ( ) . therefore, our studies challenge those that dismiss airway ph as being a critical factor in cf pathogenesis. we previously examined inflammation extensively in acid-challenged piglets using inflammatory-directed gene arrays and elisas, but only found evidence for transcriptional inflammation ( ) . thus, although it is possible that airway acidification induces mucus abnormalities through local inflammatory-mediated mechanisms, it is unlikely that large-scale and active inflammation is a major driver of acid-induced mucus abnormalities. mice lacking muc b develop airway obstruction and decreased mucociliary transport ( ) . these features mimic features of cf airway disease. 
in our studies, acid-challenged piglets showed an impaired muc b secretion under basal and methacholine-stimulated conditions. while our findings are consistent with several other studies suggesting defects in submucosal gland secretion ( ) ( ) ( ) ) , they are slightly at odds with recent data suggesting that cf airways are distinguished by enhanced presence of airway mucins early in life ( ). however, it is possible that with time, acid-challenged piglets might exhibit increased mucin abundance. it is also possible that inflammation on top of acidification might be required to increase mucin abundance ( ). our study has limitations. we did not assess mucus secretion under basal and stimulated conditions within the same subject. this was largely due to the inability to take airway samples from a subject before and after methacholine stimulation. further, our study was transient and therefore lacked information regarding long-term consequences of airway acidification. the transient nature of our study, however, might also be an advantage, because it allowed for the effects of acidification to be isolated from potential secondary complications, such as infection and/or prolonged inflammation. finally, we did not identify a definitive mechanism responsible for acid-mediated defects in mucus secretion and mucus transport, although our data highlight asic a as a contributor. thus, how acidification induces mucus abnormalities requires additional studies. in summary, early life airway acidification impaired mucus secretion, evoked mucus obstruction, and decreased mucociliary transport. diminazene aceturate was partially protective in mitigating acid-induced defects. these findings suggest that even transient airway acidification early in life might have profound impacts on mucus secretion and transport properties. further, they highlight diminazene aceturate as a potential agent beneficial for alleviating some features of cf airway disease. 
the research objective of this study was to define the impacts of early life airway acidification on mucus secretion and mucus transport. endpoints measured include mucin protein concentrations in bronchoalveolar lavage fluid, mucin protein expression in tracheal cross-sections, mucin mrna in whole trachea and lung, airway obstruction in lung samples, mucus secretion and morphology ex vivo using lectins, and mucus transport ex vivo using live imaging and computer-assisted particle tracking. subjects were male and female piglets. data were collected across separate experiments in a controlled laboratory experimental design. sexes and treatments were balanced within an experiment whenever possible. male and female piglets were assigned to treatment randomly. studies were performed by individuals blinded to treatment; only an animal id was associated with post-mortem samples. our previous work for histological scoring indicated that an "n" of per sex was required to achieve statistical significance ( , ) . we observed no sex differences in airway obstruction induced by acid and therefore combined male and female data. similarly, for qrt-pcr experiments, our previous data indicated that an "n" of was required ( ); therefore, we planned for one additional animal per group per sex. for animals that received methacholine in vivo, we performed a smaller study on saline and acid-challenged animals prior to investigating the effects of diminazene aceturate (n = saline-challenged, acid-challenged). for elisa analysis, we previously reported that an n of was required ( ) . prospectively established exclusion criteria included any piglet that required antibiotics or anti-inflammatories due to gastrointestinal illness. data were not assessed for outliers. a total of piglets (yorkshire-landrace, - days of age) were obtained from a commercial vendor, fed commercial milk replacer (liqui-lean), and allowed a -hour acclimation period prior to interventions. 
data were collected from separate cohorts of piglets across approximately - . years. the university of florida animal care and use committee approved all procedures. care was in accordance with federal policies and guidelines. after acclimation, piglets were anesthetized with % sevothesia (henry schein). the piglets' airways were accessed with a laryngoscope; a laryngotracheal atomizer (madgic) was passed directly beyond the vocal folds as previously described ( , ) to aerosolize either a µl . % saline control or % acetic acid in . % saline solution to the airway. this procedure results in widespread distribution of aerosolized solutions throughout the piglet airway, including the lung ( , ) . usp grade acetic acid (fisher scientific) was dissolved in . % saline to a final concentration of % and sterilized with a . µm filter (millex gp). the ph of the % acetic acid solution measured . using an accumet ae ph probe (fisher scientific). the estimated ph once applied to the airway surface is ~ . to . ( ) . acetyl-beta-methacholine-chloride (sigma) was dissolved in . % saline for intravenous delivery and ex vivo application. diminazene aceturate (selleckchem) was dissolved in . % saline containing % dmso (fisher scientific). approximately minutes prior to airway instillations, piglets were provided an intramuscular injection of either vehicle ( % dmso + % . % saline) or vehicle containing . mg/ml diminazene aceturate. vehicle or diminazene aceturate was dosed at . mg/kg. the dose was chosen based upon previous studies showing efficacy in trypanosome infections in dogs ( ) . the time point was selected based upon previous studies showing peak serum concentrations at minutes and peak interstitial fluid concentrations at hours in adult rabbits ( ) . the caudal left lung of each piglet was excised and the main bronchus cannulated; three sequential ml lavages of . % sterile saline were administered as previously described ( , ) . 
the recovered material was pooled, spun at x g, and supernatant removed. porcine muc ac (lsbio, ls-f - ) and porcine muc b (lsbio, ls-f - ) elisas were performed according to the manufacturer's instructions. the elisas were read using a filter-based accuskan fc micro photometer (fisher scientific). the limits of sensitivity were < . ng/ml and < . ng/ml for muc ac and muc b, respectively. the intra-assay cv and inter-assay cv were < . % and < . % for muc ac and < . % and < . % for muc b. lung tissues were fixed in % neutral buffered formalin (~ - days), processed, paraffin-embedded, sectioned (~ µm) and stained with periodic-acid-schiff stain (pas) to detect glycoproteins as previously described ( , ) . digital images were collected with a zeiss axio zoom v microscope. indices of obstruction were assigned as previously described ( , ) . rna from the whole trachea and whole lung was isolated using rneasy lipid tissue kit (qiagen) with optional dnase digestion (qiagen). rna concentrations were assessed using a nanodrop spectrophotometer (thermo fisher scientific). rna was reverse transcribed for the whole trachea and whole lung ( ng) using superscript vilo master mix (thermofisher). briefly, rna and master mix were incubated for min at °c, followed by min at °c, followed by min at °c. muc ac and muc b transcript abundance were measured as previously described ( , ) . all qrt-pcr data were acquired using fast sybr green master mix (applied biosystems) and a lightcycler (roche). standard ΔΔct methods were used for analysis ( ) . containing tissue-tek oct (electron microscopy sciences). molds were placed in a container filled with dry ice until frozen and stored at - °c long-term. tissues were sectioned at a thickness of μm and mounted onto superfrost plus microscope slides (thermofisher scientific). storage until immunofluorescence also occurred at - °c. we used immunofluorescence procedures similar to those previously described ( , ) . 
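for the qrt-pcr analysis mentioned above, the standard ΔΔct calculation can be sketched as follows. this is a generic illustration of the method rather than the authors' analysis script: it assumes a single reference (housekeeping) gene and roughly 100% amplification efficiency, and the function and argument names are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the delta-delta-Ct method.

    Each argument is a threshold cycle (Ct) value. Assumes one reference
    gene and ~100% amplification efficiency (a doubling per cycle).
    """
    # Normalize the target gene to the reference gene within each group.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated group with the control group.
    delta_delta = delta_treated - delta_control
    # A lower Ct means more transcript, hence the negative exponent.
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Illustrative Ct values only; not data from the study.
    print(fold_change_ddct(24.0, 18.0, 25.0, 18.0))
```

a value of 1 indicates no difference from the control group, values above 1 indicate higher transcript abundance, and values below 1 indicate lower abundance.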
briefly, representative cross-sections from a single cohort of pigs were selected and fixed in % paraformaldehyde for minutes. tissues were then permeabilized in . % triton x- , followed by blocking in pbs superblock (thermofisher scientific) containing - % normal goat serum (jackson laboratories). tissues were incubated with primary antibodies for hours at °c. tissues were washed thoroughly in pbs and incubated in secondary antibodies for hour at room temperature. tissues were washed and a hoechst stain performed as previously described ( ) . a : glycerol/pbs solution was used to cover the sections and cover glass added. sections were imaged on a zeiss axio zoom v microscope. identical microscope settings within a single cohort of piglet tracheas were used and applied. three to five images encompassing the posterior, anterior, and lateral surfaces of the trachea were taken. images were exported and analyzed using imagej. the trachea surface epithelia and entire submucosal gland regions were traced and the mean signal intensity muc b and muc ac recorded. background signal intensity was measured and subtracted manually. the final signal intensities were averaged to identify a mean signal intensity for muc b and muc ac per region per each piglet. we used the following anti-mucin antibodies: rabbit anti-muc b ( : ; santa cruz, cat.# ) and mouse anti-muc ac (clone m ) ( : , , thermofisher scientific, cat # ma ), followed by goat anti-rabbit and goat anti-mouse secondary antibodies conjugated to alexa-fluor or (thermofisher scientific, : , dilution). we used wga-rhodamine (vector laboratories) and jacalin-fitc (vector laboratories) at : , dilution. tracheas were submerged in pbs and visualized using a zeiss axio zoom v microscope. we have previously described the ventilation and in vivo methacholine procedures in piglets ( , ) . briefly, h post instillation, animals were anesthetized with ketamine ( mg/kg), and xylazine ( . 
mg/kg), and intravenous propofol ( mg/kg) (henry schein animal health). a tracheostomy was performed, and a cuffless endotracheal tube (covidien, . - . mm od) was placed. piglets were connected to a flexivent system (scireq); paralytic (rocuronium bromide, novaplus) was administered. piglets were ventilated at breaths/min at a volume of ml/kg body mass. increasing doses of methacholine were administered intravenously at approximately min intervals in the following doses (in mg/kg): . , . , . , . , . . piglets were ventilated to ensure animal well-being and patency of airway tissues. we used methods adopted from ostedgaard et al ( ) . briefly, - rings of trachea were removed post-mortem and the outside of the tracheas was wrapped in gauze soaked with ml of the following: mm nacl, . mm k hpo , . mm kh po , . mm cacl , . mm mgcl , mm dextrose, mm hepes, ph . (naoh), . mg per ml of methacholine, and μm bumetanide. we determined that the amount of methacholine the tissue was exposed to was . mg total (e.g., external surface of the trachea is covered by approximately μl). thus, the methacholine dose delivered to the trachea was . mg/ . cm (tissue dimensions of trachea: radius = . cm, height = . cm). using a conversion chart for dogs ( ) , we estimated the body surface area of a - kg piglet to be . - . m . therefore, the estimated dose of methacholine delivered to the tissue was ~ - fold higher than the cumulative in vivo dose of methacholine to account for diffusion of methacholine across multiple tissue layers and in the absence of blood circulation. tracheas were then placed in a temperature-controlled humidified incubator for hours. following stimulation, tracheas were fixed overnight in % paraformaldehyde and permeabilized for minutes using a triton solution ( . %), followed by blocking in superblock pbs. jacalin-fitc and wheat-germ agglutinin-rhodamine were used to visualize mucus ( , ) and incubated overnight with tracheas at a concentration of : , . 
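the ex vivo methacholine dose per unit of tissue area described above can be sketched as a short calculation. this is a sketch under stated assumptions: the trachea segment is treated as a cylinder and only its lateral surface (2πrh) is counted, which the text does not confirm; the function name and the example numbers are hypothetical and are not the study's values.

```python
import math


def dose_per_surface_area(conc_mg_per_ml, volume_ml, radius_cm, height_cm):
    """Estimate applied dose per unit area (mg/cm^2) for a cylindrical segment.

    Assumption: the covered surface is the lateral surface of a cylinder,
    2 * pi * r * h. The source gives a radius and a height but does not
    state which area formula was used.
    """
    total_dose_mg = conc_mg_per_ml * volume_ml        # mass in the applied volume
    area_cm2 = 2.0 * math.pi * radius_cm * height_cm  # lateral surface area
    return total_dose_mg / area_cm2


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(dose_per_surface_area(conc_mg_per_ml=1.0, volume_ml=0.1,
                                radius_cm=0.5, height_cm=1.0))
```

expressing both the ex vivo and the in vivo doses per unit area (tissue area in one case, estimated body surface area in the other) is what allows the fold difference between them to be stated.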
tracheas were then washed, cut along the posterior surface, and pinned to wax-covered petri dishes, followed by submersion in pbs. tracheas were imaged with a zeiss axio zoom v microscope, posterior to anterior, using identical microscope settings. images were assigned scoring indices for sheet formation and strand formation by two observers blinded to conditions. the number of dilated submucosal gland ducts was also measured by two observers blinded to conditions. scoring for sheet formation was as follows: = no sheet formation; = - % of image field shows sheet formation; = - % of image field shows sheet formation; = - % of image field shows sheet formation; = - % of image field shows sheet formation. sheet formation was defined as the loss of discrete cellular packets of mucus. secondary analysis and validation of scoring was performed with imaris software (detailed below). scoring for strand formation was similar to that for sheet formation and was as follows: = no strand formation; = - % of image field shows a strand; = - % of image field shows a strand; = - % of image field shows a strand; = - % of image field shows a strand. secondary analysis, consisting of tracing the strand area and expressing it as a percentage of the total image field of view, was also performed on a subset of images in imagej. mucociliary transport was measured using methods similar to those described by hoegger et al ( ) . briefly, rings of tracheas were submerged in ml of prewarmed solution containing the following: mm nacl, . mm kh po , . mm cacl , . mm mgcl , . mm kcl, . mm na hpo - h , mm hepes, ph . (naoh), and μm bumetanide. tracheas were placed onto a heated stage and kept at °c. images were acquired every minute for minutes. after minutes of baseline, methacholine was administered directly into the solution covering the basolateral and apical sides of tracheas at a dose of . mg/ml. this dose matched the estimated cumulative dose of methacholine administered to piglets in vivo (e.g., . mg/kg = . mg/l = . 
mg/ml). mucus transport was assessed for an additional minutes. imaris software was used to track mucus transport across time. details about imaris software and processing are described below. computer particles based upon signal intensity above background were automatically generated with imaris software (bitplane). the particles were then tracked through time using a custom imaris algorithm that utilized principles of the well-validated algorithms published by jaqaman and colleagues ( ) . for each trachea, the average mean speeds and max speeds of the computer-assigned particles were reported. the length of each particle track was also computed automatically, and the mean track length of all the particles per trachea was calculated and reported. for determination of sheet formation, an image was chosen at random from a subset of tracheas stained with lectins. particles were assigned to the images automatically as described above. the number of particles (representing the jacalin-labeled mucus with intensity above background) was reported. in tracheas that exhibit robust mucus sheet formation, the number of computer-assigned particles is decreased because of the significant jacalin-labeled mucus that covers the field of view (e.g., less signal to noise), as opposed to the tracheas in which there are discrete packets of jacalin-labeled mucus (e.g., greater signal to noise). for parametric data that compared three groups, we used a one-way anova followed by a holm-sidak multiple comparison test. for non-parametric data that compared three groups, we used a kruskal-wallis test (non-parametric one-way anova) followed by dunn's multiple comparison test. for analyses that compared two groups, we used a two-tailed unpaired student's t-test. all tests were carried out using graphpad prism . a. statistical significance was determined as p < . . sheet index for jacalin-labeled mucus (c) and wheat germ agglutinin-labeled mucus (d). 
n = saline-challenged piglets ( females, males), n = acid-challenged pigs ( females, males), n = acid-challenged + diminazene aceturate pigs ( females, males). data points represent the mean score for each piglet calculated from - analyzed images (encompassing the anterior, middle and posterior regions of the trachea). abbreviations: wga, wheat germ agglutinin; dz, diminazene aceturate. for panel c, * p = . compared to saline-challenged pigs. data were assessed with a non-parametric one-way anova (kruskal-wallis) followed by dunn's multiple comparison test. mean ± s.e.m. shown. n = saline-challenged piglets ( females, males), n = acid-challenged pigs ( females, males), n = acid-challenged + diminazene aceturate pigs ( females, males). data points represent the mean score for each piglet calculated from - analyzed images (encompassing the anterior, middle and posterior regions of the trachea). abbreviations: wga, wheat germ agglutinin; dz, diminazene aceturate. for panel c, * p = . compared to saline-challenged pigs, # p = . compared to acid-challenged piglets. data were assessed with a parametric one-way anova followed by a holm-sidak post hoc test. mean ± s.e.m. shown. with imaris software. computer particles are assigned based upon fluorescence intensity and appear as blue/aqua in color. mean mucus transport speed (c) and the maximum mucus transport speed (d). (e) computer-assigned particle-track length. n = saline-challenged piglets ( females, males), n = acid-challenged pigs ( females, males), n = acid-challenged + diminazene aceturate pigs ( females, males). abbreviations: dz, diminazene aceturate. for panel c, * p = . compared to saline-challenged pigs. for panel d, * p = . compared to saline-challenged pigs. for panel e, * p = . compared to saline-challenged pigs. data were assessed with a parametric one-way anova followed by a holm-sidak post hoc test. mean ± s.e.m. shown. 
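the legends above report one data point per piglet (the mean of several image scores) and a group mean ± s.e.m. a minimal python sketch of that two-step summary follows. the piglet labels and scores are hypothetical placeholders, not data from the study, and the original analysis was run in graphpad prism rather than with this code.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the s.e.m.")
    mean = statistics.fmean(values)
    # s.e.m. = sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


def summarize_group(scores_by_piglet):
    """Average the image scores within each piglet, then summarize across piglets."""
    piglet_means = [statistics.fmean(scores) for scores in scores_by_piglet.values()]
    return mean_sem(piglet_means)


# Hypothetical sheet-index scores: three piglets, three images each.
example_group = {
    "piglet_a": [1, 2, 3],
    "piglet_b": [2, 2, 2],
    "piglet_c": [3, 4, 2],
}
group_mean, group_sem = summarize_group(example_group)
print(f"{group_mean:.3f} +/- {group_sem:.3f}")
```

averaging within each piglet first keeps the piglet, not the image, as the unit of replication, which is what the legends describe.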
defining a pulmonary exacerbation in cystic fibrosis role of epithelial hco (-) transport in mucin secretion: lessons from cystic fibrosis a morphometric study of mucins and small airway plugging in cystic fibrosis the pathogenesis of fibrocystic disease of the pancreas; a study of cases with special reference to the pulmonary lesions impaired mucus detachment disrupts mucociliary transport in a piglet model of cystic fibrosis development of an airway mucus defect in the cystic fibrosis rat infection is not required for mucoinflammatory lung disease in cftr-knockout ferrets hyposecretion, not hyperabsorption, is the basic defect of cystic fibrosis airway glands mucus accumulation in the lungs precedes structural changes and infection in children with cystic fibrosis future directions in early cystic fibrosis lung disease research: an nhlbi workshop report a periciliary brush promotes the lung health by separating the mucus layer from airway epithelia increased airway epithelial na+ absorption produces cystic fibrosis-like lung disease in mice cystic fibrosis airway secretions exhibit mucin hyperconcentration and increased osmotic pressure pathological mucus and impaired mucus clearance in cystic fibrosis patients result from increased concentration, not altered ph cystic fibrosis: impaired bicarbonate secretion and mucoviscidosis acidic ph increases airway surface liquid viscosity in cystic fibrosis reduced airway surface ph impairs bacterial killing in the porcine cystic fibrosis lung neonates with cystic fibrosis have a reduced nasal liquid ph; a small pilot study airway surface liquid ph is not acidic in children with cystic fibrosis airway surface ph in subjects with cystic fibrosis sex-specific airway hyperreactivity and sex-specific transcriptome remodeling in neonatal piglets challenged with intra-airway acid cftr, mucins, and mucus obstruction in cystic fibrosis muc ac and muc b mucins increase in cystic fibrosis airway secretions during pulmonary 
exacerbation muc b is required for airway defence gel-forming mucins form distinct morphologic structures in airways the mucus bundles responsible for airway cleaning are retained in cystic fibrosis and by cholinergic stimulation impairment of mucociliary transport in cystic fibrosis mucus clearance as a primary innate defense mechanism for mammalian airways diminazene is a slow pore blocker of acid-sensing ion channel a (asic a) the efficacy of berenil (diminazene aceturate) against trypanosoma evansi infection in mice acid-sensing by airway afferent nerves characterization of acid signaling in rat vagal pulmonary sensory neurons submucosal gland dysfunction as a primary defect in cystic fibrosis solitary cholinergic stimulation induces airway hyperreactivity and transcription of distinct pro-inflammatory pathways widespread airway distribution and short-term phenotypic correction of cystic fibrosis pigs following aerosol delivery of piggybac/adenovirus lentiviral-mediated phenotypic correction of cystic fibrosis pigs diminazene aceturate residues in the tissues of healthy, trypanosoma congolense and trypanosoma brucei brucei infected dogs studies in rabbits on the disposition and trypanocidal activity of the antitrypanosomal drug, diminazene aceturate (berenil) acid-sensing ion channel a contributes to airway hyperreactivity in mice appendix bsa conversion charts studying mucin secretion from human bronchial epithelial cell primary cultures robust single-particle tracking in live-cell time-lapse sequences we thank dr. dave meyerholz and dr. mark hoegger for helpful comments and suggestions. the authors also thank dr. igancio aguirre for providing technical support and resources. representative antibody-labeling of mucin ac (green) and mucin b (red) in tracheal cross sections from piglets challenged with intra-airway saline (a), intra-airway acid (b), intra-airway acid + diminazene aceturate (c). 
arrow heads represent labeling of mucus on the surface; arrows key: cord- - n mnjb authors: węglarz-tomczak, ewelina; tomczak, jakub m.; talma, michał; brul, stanley title: ebselen as a highly active inhibitor of plprocov date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n mnjb since december a novel coronavirus identified as sars-cov- or cov has been spreading around the world. as of the th of may, around . million people had been infected and over , had died from cov infection. effective treatment remains a challenge, and targeted therapeutics are still under investigation. the papain-like protease (plpro) from the human sars-cov- coronavirus is a cysteine protease that plays a critical role in virus replication. its activity is required to process the viral polyprotein into functional, mature subunits. moreover, cov uses this enzyme to modulate the host’s immune system to its own benefit. therefore, it represents a highly promising target for the development of antiviral drugs. in this work, we discovered that ebselen, a synthetic organoselenium drug molecule with anti-inflammatory, anti-oxidant and cytoprotective activity in mammalian cells and cytotoxicity in lower organisms, is a highly active inhibitor of plprocov . we showed that ebselen is a covalent, fast-binding inhibitor of plprocov with low micromolar potency. furthermore, we identified a difference between plpro from sars-cov- (the coronavirus that caused the – outbreak, sars) and sars-cov- that may explain the difference in replication dynamics and, thus, in disease progression. namely, kinetic and molecular docking studies show that the two enzymes differ in their binding affinity for substrates. using a novel approximate bayesian computation method we were able to find kinetic constants for both enzymes. 
molecular modeling of the structure of the active site and the binding mode of ebselen with sars and cov also showed significant differences that could explain our observation that ebselen is less active against, and binds more slowly to, sars than cov . in conclusion, we show that ebselen inhibits the activity of the essential viral enzyme papain-like protease (plpro) from sars-cov- in the low micromolar range. more than years after the global pandemic due to the sars (severe acute respiratory syndrome) coronavirus (cov), no anti-coronaviral medications have been developed for the treatment of infection caused by human coronaviruses (hcov). sars-cov- (sars) was identified as the causative agent of the fatal global outbreak of respiratory disease in humans during - , which resulted in , cases with a case fatality rate (cfr) of % [ ] . in september another sars-like respiratory virus (termed middle east respiratory syndrome coronavirus, mers-cov) was identified. mers coronavirus spread around the world with a total of laboratory-confirmed cases, but a cfr of as much as % [ ] , showing how deadly the interspecies transmission potential of covs can be. in december , a novel coronavirus, severe acute respiratory syndrome coronavirus (sars-cov- , cov ), formerly known as the novel coronavirus ( -ncov), was discovered in wuhan, china and was sequenced and isolated by january [ ] . the disease, now termed coronavirus disease , rapidly spread within china and infected a much larger number of people, causing world economic and social paralysis. on may , nearly . million laboratory-confirmed infections were reported around the world, including over , deaths [ ] . sars-cov- and sars-cov- are closely related, with studies highlighting that sars-cov- genes share > % nucleotide identity and . % nucleotide similarity with sars-cov genes [ ] . 
the development of anti-coronaviral drugs remains challenging although a number of coronaviral proteins have been identified as potential drug targets [ , ] . two of the most promising are papain-like protease (pl pro ) and main protease (m pro , also known as chymotrypsin-like protease cl pro ) [ ] [ ] [ ] . these proteases play an essential role in polypeptide processing during virus replication. pl pro , in addition to being crucial during replication via processing of the viral polyprotein [ ] , is proposed to be a key enzyme in the sustained pathogenesis of sars-cov. its activities include deubiquitination [ ] (the removal of ubiquitin) and deisgylation [ ] (the removal of isg ) from host-cell proteins. these last two enzymatic activities result in the antagonism of the host antiviral innate immune response [ ] . as a result, pl pro is an important potential target for antiviral drugs that may inhibit viral replication and limit the dysregulation of signalling cascades in infected cells that may lead to cell death in surrounding, uninfected cells [ ] . very recent studies led by dikic confirmed the pl pro from sars-cov- (pl pro cov ) to be an essential viral enzyme and potential weak spot [ ] . they proposed pl pro to be the achilles' heel of sars-cov- . pl pro from sars-cov- (pl pro sars) and pl pro cov are closely related, with . % sequence identity, and relatively distant from pl pro from mers ( . % identity). the removal of ubiquitin has also recently been confirmed by rut et al. in [ ] . ebselen is a low-molecular-weight organoselenium drug that shows a pleiotropic mode of action and, because of its very low toxicity, there are no barriers to using it in humans [ ] . it is a well-known agent with therapeutic activity in neurological disorders [ ] and cancers [ ] . it also showed an antiviral effect on neurotropic viruses [ ] and hepatitis c virus [ ] . 
in a very recent work ebselen has been shown to attenuate inflammation and promote microbiome recovery in mice after antibiotic treatment for cdi [ ] . another recently published work proposed ebselen, via virtual screening, as a possible inhibitor of m pro from sars-cov- [ ] . several lines of evidence demonstrated that the biological effects of ebselen are mainly due to its antioxidant properties and its capability of forming selenenyl-sulfide bonds with the cysteine residues in proteins [ , [ ] [ ] [ ] [ ] . here, we demonstrate that ebselen inhibits the activity of the essential viral enzyme, namely, papain-like protease (pl pro ) from sars-cov- (pl pro cov ), in the low micromolar range. moreover, we have identified the mechanism of inhibition as fast and irreversible, and we propose the binding mode of ebselen by molecular docking. we have found a difference in the mechanism of catalysis, the inhibition and the active sites between pl pro from sars-cov- and sars-cov- . furthermore, we used the recently published approximate bayesian computation (abc) methodology [ ] to find kinetic parameters of the hydrolysis of ubiquitin conjugated with fluorophore by pl pro sars and pl pro cov . our findings help to understand differences between sars-cov- and sars-cov- by analysing pl pro , and further indicate that ebselen is a highly active inhibitor of pl pro cov and indeed is a potential drug against covid- . following the high potential of ebselen as a promising drug, we sought to test its efficacy in the inhibition of the enzyme that is crucial in viral replication, namely, pl pro . detailed analysis of the active site and the mechanism of catalysis of coronavirus pl pro led us to conclude that small molecules with planar phenyl moieties, which are also able to modify the active-site cysteine residue, could be effective ( figure ). pl pro from coronaviruses belongs to the peptidase clan ca (family c ). the active site contains a classic catalytic triad composed of cys-his-asp. 
we analysed the active site and the mechanism based on the crystal structure of pl pro from sars-cov- published by báez-santos et al. in [ , ] that showed pl pro sars has a catalytic triad composed of cys -his -asp . this catalytic triad transforms the -sh group of the cysteine into a strong nucleophilic ion that attacks the carbonyl carbon of the peptide bond and leads to hydrolysis (figure (left) ). the side chain sulfur atom of cys is positioned . Å from the nitrogen in position of the imidazole ring in catalytic histidine (his ). one of the oxygen atoms of the side chain of catalytic aspartic acid (asp ) is located . Å from the nitrogen in position of the same histidine (figure ). the side chain of trp is located within the oxyanion hole. the indole-ring nitrogen was proposed to participate in the stabilization of the negatively charged tetrahedral transition state of the reaction intermediates produced throughout catalysis [ , , ] . analysis of the active site and mechanism of the catalysis (left) and possible inhibition by ebselen (right) of coronaviruses pl pro based on the active site of pl pro sars [ ] . ebselen ( figure ) meets the active site requirements of size and conformation, and has the ability to modify the -sh group. moreover, ebselen has shown a clean safety profile in human clinical trials, so positive results could lead directly to an effective drug against covid- . encouraged by our analysis, we decided to test ebselen against both pl pro enzymes from sars-cov- and sars-cov- . as a substrate for our study we chose ubiquitin conjugated with fluorophore (ub-amc). progress curves (figures a and c ) at different levels of substrate concentration showed an interesting difference between the two enzymes. pl pro sars catalyzed the reaction faster and achieved saturation. 
we estimated kinetic parameters of hydrolysis of ub-amc (table ) using the recently published approximate bayesian computation (abc) computational tool for calculating kinetic constants in the michaelis-menten equation [ ] . this framework allows us to find the turnover number ( k cat ), the michaelis-menten constant ( k m ) and, as a consequence, the catalytic efficiency of the enzyme ( k cat / k m ) without using high concentrations of ub-amc ( figures b and d) . the catalytic efficiency ( k cat / k m ) is often used as a specificity constant to compare the relative rates of reactions. here we show that this ratio is three times higher for pl pro sars compared to pl pro cov , which indicates its higher capability to hydrolyze ub-amc. because pl pro is required for the processing of viral polypeptides and for modulating the host's immune response, the higher efficiency may well contribute to the fact that, once a host was infected, sars-cov- was overall more aggressive and the disease developed faster. we applied ebselen as a possible inhibitor and, indeed, it suppresses pl pro activity from cov with inhibition constants approximately equal to μm (table and figure ). we determined the mechanism of inhibition of ebselen as irreversible, with steady-state binding being achieved immediately. ebselen appeared to be an irreversible inhibitor of the studied pl pro sars as well, although in this case inhibition was weaker and the kinetics of binding was slow (table ) . irreversibility seems to confirm our first assumption that the inhibition of both enzymes can be associated with covalent bonds between se from ebselen and s from cysteine. we confirmed irreversibility via dialysis and attempted reactivation of the enzymes. the inhibitors were screened in tris buffer. the release of the fluorophore was monitored continuously. the linear portion of the progress curve was used to calculate the velocity. 
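the quantities named above (velocity from the linear portion of a progress curve, the michaelis-menten rate, and the catalytic efficiency k cat / k m ) can be illustrated with a short python sketch. this is not the abc estimation method used in the paper; it is a plain least-squares slope plus the textbook michaelis-menten formula, and every number below is a hypothetical placeholder.

```python
def linear_slope(times, signals):
    """Least-squares slope of signal versus time (velocity of the linear portion)."""
    n = len(times)
    if n != len(signals) or n < 2:
        raise ValueError("need at least two paired points")
    t_mean = sum(times) / n
    s_mean = sum(signals) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, signals))
    return sxy / sxx


def michaelis_menten_rate(substrate, enzyme, k_cat, k_m):
    """Michaelis-Menten velocity: v = k_cat * [E] * [S] / (K_m + [S])."""
    return k_cat * enzyme * substrate / (k_m + substrate)


def catalytic_efficiency(k_cat, k_m):
    """Specificity constant k_cat / K_m."""
    return k_cat / k_m


# Hypothetical progress curve (arbitrary time and fluorescence units).
velocity = linear_slope([0, 1, 2, 3], [5, 7, 9, 11])

# Hypothetical constants; at [S] = K_m the rate is half of k_cat * [E].
rate_at_km = michaelis_menten_rate(substrate=2.0, enzyme=0.5, k_cat=4.0, k_m=2.0)

# Comparing two hypothetical enzymes by their specificity constants.
efficiency_ratio = catalytic_efficiency(6.0, 2.0) / catalytic_efficiency(2.0, 2.0)
print(velocity, rate_at_km, efficiency_ratio)
```

a ratio of specificity constants, as in the last line, is the kind of comparison the text makes between the two enzymes.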
each experiment was repeated at least three times and the results are presented as the average with standard deviation. for more details, please see the materials and methods section. our results were further illustrated by the use of molecular modeling to study the binding mode of ebselen with pl pro sars ( figure ) and pl pro cov (figure and ) . this study supported our primary assumption that ebselen binds to the pl pro cov active site covalently and, thus, our hypothesis of an irreversible mechanism of inhibition. in pl pro sars, the catalytic triad cys -his -asp is exposed on the exterior of the protein. the model with the most favorable thermodynamic stability shows that ebselen occupies an intersection between the putative catalytic triad cys -his -asp and trp , as we hypothesized. the n -phenyl ring of ebselen is surrounded by the aromatic ring of trp (edge-to-face interaction) and lys . the amine group from lys additionally interacts with oxygen from ebselen. we identified another edge-to-face interaction between the se -ring and the imidazole side chain of his . ebselen is buried deeper in the active site of pl pro cov than of pl pro sars. this results from a different conformation of the enzyme. in pl pro cov , the catalytic triad cys -his -asp in each of the three subunits is directed toward the center of the protein. the most preferable conformation shows that the molecule occupies the same intersection between the catalytic cys -his -asp triad and trp but is additionally wrapped by other amino acids (tyr , ala , leu ). here, we identified a different binding mode where the se -phenyl is directed to the oxyanion hole and trp , not the n -phenyl as was observed for pl pro sars. the se -phenyl (from the ebselen fragment) and an indole (from trp ) formed π-π stacking interactions, while face-to-edge stacking interactions were observed with an aromatic ring from tyr . 
the n -phenyl ring of ebselen adopts a bent-shaped conformation that fits well to the space between leu and ala , forming π-alkyl interactions with them. the smaller distance between the oxyanion hole and ligand as well as more interactions with amino acids surrounding the active site in pl pro cov could explain the better binding affinity observed in our experiments. a model of the complex of ebselen with human sars-cov- pl pro (pdb: fe [ ] ). a model of the complex of ebselen with human sars-cov- pl pro (pdb: w c [ ] ). the present study provides an understanding of differences between sars-cov- and sars-cov- by analysing pl pro and further highlights the high potential of ebselen as a treatment for the novel coronavirus that appeared in december . pl pro cov is an essential viral enzyme that is required for the processing of viral polypeptides and the assembly of new viral particles within human cells [ ] . we first estimated the kinetic constants and the catalytic efficiency of the hydrolysis of ub-amc by pl pro sars and pl pro cov (see figure and table ). our results suggest that the capability to hydrolyze ub-amc is three times higher for pl pro sars compared to pl pro cov . this observation is well-aligned with the fact that sars-cov- is more aggressive than sars-cov- and leads to a faster development of disease. further, we showed that ebselen inhibits the enzyme pl pro cov and suppresses its activity with inhibition constants approximately equal to μm (see figure and table ). moreover, we found that ebselen is an irreversible inhibitor of both pl pro cov and pl pro sars. however, it is weaker in the case of pl pro sars. finally, we studied the binding mode of ebselen with pl pro sars ( figure ) and pl pro cov ( figure ) with the use of molecular modeling. the obtained results corroborated our primary assumption that ebselen binds to the pl pro cov active site covalently. 
this observation reinforces our view regarding the irreversibility of the mechanism of pl pro cov enzyme inhibition by ebselen and will aid in finding variants of the compound with further improved efficacy. recombinant sars-cov- pl pro , sars-cov- pl pro and ubiquitin-amc were purchased as , and μm solutions, respectively, from r&d systems. ebselen ( -phenyl- , -benzisoselenazol- ( h)-one ) is commercially available from sigma aldrich. the enzymes were dissolved in a mm tris-hcl buffer containing dtt ( mm), nacl ( mm) and . % albumin, at ph . , and preincubated for min. spectrofluorimetric measurements were performed in a -well plate format working at two wavelengths: excitation at nm and emission at nm. the release of the fluorophore was monitored continuously at the enzyme concentration of nm. the linear portion of the progress curve was used to calculate the velocity of hydrolysis. the inhibitor was screened against recombinant pl pro sars and pl pro cov at °c in the assay buffer as described above. for steady-state measurements the enzymes were incubated for min at °c with an inhibitor before adding the substrate to the wells. eight different inhibitor concentrations were used. the inhibitor concentration that achieved % inhibition ( ic ) was taken from the dependence of the hydrolysis velocity on the logarithm of the inhibitor concentration [i]. molecular modeling studies were performed using discovery studio (dassault systemes biovia corp). the crystal structures of the sars-cov- and sars-cov- enzymes (pdb id fe [ ] and w c [ ] , respectively) with protons added (assuming the protonation state of ph . ) were used as the starting point for calculations of the enzyme complexed with ebselen. the partial charges of all atoms were computed using the momany-rone algorithm. minimization was performed using the smart minimizer algorithm and the charmm force field up to an energy change of . or rms gradient of . . the generalized born model was applied. 
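the ic determination described above (reading the half-inhibition point off the dependence of hydrolysis velocity on the logarithm of inhibitor concentration) can be sketched in python as a linear interpolation on the log scale. this is an assumed, simplified procedure for illustration only; the paper does not state which fitting routine was used, and the concentrations and velocities below are hypothetical.

```python
import math


def ic50_log_interpolation(concentrations, velocities, uninhibited_velocity):
    """Estimate IC50 by interpolating fractional activity linearly against log10([I])."""
    points = sorted(zip(concentrations, velocities))
    half = 0.5
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        a_lo = v_lo / uninhibited_velocity  # fractional activity at lower [I]
        a_hi = v_hi / uninhibited_velocity  # fractional activity at higher [I]
        if a_lo >= half >= a_hi and a_lo != a_hi:
            # position of the 50% crossing between the two bracketing points
            frac = (a_lo - half) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("activity does not cross 50% within the tested range")


# Hypothetical dose-response data (arbitrary concentration and velocity units).
example_ic50 = ic50_log_interpolation(
    concentrations=[0.1, 1.0, 10.0, 100.0],
    velocities=[90.0, 75.0, 25.0, 10.0],
    uninhibited_velocity=100.0,
)
print(example_ic50)
```

a full sigmoidal fit over all eight concentrations would use more of the data; the interpolation only shows how the log-concentration axis enters the estimate.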
the nonbond radius was set to Å. summary table of sars cases by country middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china who novel coronavirus potential treatments for covid- ; a narrative literature review analysis of therapeutic targets for sars-cov- and discovery of potential drugs by computational methods the sars-coronavirus papain-like protease: structure, function and inhibition by designed antiviral compounds identification of severe acute respiratory syndrome coronavirus replicase products and characterization of papain-like protease activity the papain-like protease of severe acute respiratory syndrome coronavirus has deubiquitinating activity selectivity in isg and ubiquitin recognition by the sars coronavirus papain-like protease regulation of irf- -dependent innate immunity by the papain-like protease domain of the severe acute respiratory syndrome coronavirus inhibition of papain-like protease plpro blocks sars-cov- spread and promotes anti-viral immunity activity profiling and structures of inhibitor-bound sars-cov- -plpro protease provides a framework for anti-covid- drug design a promising antioxidant drug: mechanisms of action and targets of biological pathways ebselen as template for stabilization of a v mutant dimer for motor neuron disease therapy ebselen inhibits qsox enzymatic activity and suppresses invasion of pancreatic and renal cancer cell lines synthesis of new alkylated and methoxylated analogues of ebselen with antiviral and antimicrobial properties ebselen inhibits hepatitis c virus ns helicase binding to nucleic acid and prevents viral replication the clinical drug ebselen attenuates inflammation and promotes microbiome recovery in mice after antibiotic treatment for cdi structure of mpro from covid- virus and discovery of its inhibitors 
a multifunctional compound ebselen reverses memory impairment, apoptosis and oxidative stress in a mouse model of sporadic alzheimer's disease cancer-preventive selenocompounds induce a specific redox modification of cysteine-rich regions in ca( +)-dependent isoenzymes of protein kinase ebselen induces reactive oxygen species (ros)-mediated cytotoxicity in saccharomyces cerevisiae with inhibition of glutamate dehydrogenase being a target caloric restriction controls stationary phase survival through protein kinase a (pka) and cytosolic ph x-ray structural and biological evaluation of a series of potent and highly selective inhibitors of human coronavirus papain-like proteases structural basis for the ubiquitin-linkage specificity and deisgylating activity of sars-cov papain-like protease estimating kinetic constants in the michaelis-menten model from one enzymatic assay using approximate bayesian computation the crystal structure of papain-like protease of sars cov- we gratefully acknowledge dassault systemes for the free license for the biovia discovery studio package provided for our research. ewt is co-financed by a grant mobilność plus v from the polish ministry of science and higher education (grant no. /mob/v/ / ). ewt conceived the project. ewt designed the research and experiments with contributions from jt and sb. experimental work was done by ewt. parameter estimation was carried out by jt. molecular docking was done by jt, ewt and mt. ewt, jt and sb drafted and revised the manuscript. the authors declare no competing interests. key: cord- - z ppn authors: wren, brandi; ray, ian s.; remis, melissa; gillespie, thomas r.; camp, joseph title: social contact behaviors are associated with infection status for whipworm (trichuris sp.) in wild vervet monkeys (chlorocebus pygerythrus) date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: z ppn social grooming in the animal kingdom is common and serves several functions, from removing ectoparasites to maintaining social bonds between conspecifics. we examined whether time spent grooming with others in a highly social mammal species was associated with infection status for gastrointestinal parasites. of six parasites detected, one (trichuris sp.) was associated with social grooming behaviors, but more specifically with direct physical contact with others. individuals infected with trichuris sp. spent significantly less time grooming conspecifics than those not infected, and time in direct contact with others was the major predictor of infection status. one model correctly predicted infection status for trichuris sp. with a reliability of . % overall when the variables used were time spent in direct contact and time spent grooming others. this decrease in time spent grooming and interacting with others is likely a sickness behavior displayed by individuals with less energy or motivation for non-essential behaviors. this study highlights the need for an understanding of a study population’s parasitic infections when attempting to interpret animal behavior. we chose chlorocebus pygerythrus as the study species because individuals exhibit variation in grooming behaviors [ ] , allowing us to examine differences in the relationship between social behaviors and parasite infection status. groups of ch. pygerythrus in ldnr (and much of the surrounding region) typically vary in size from - individuals [ , , ] . six groups of ch. pygerythrus at ldnr are habituated, and researchers have been conducting studies of these groups semi-regularly for more than a decade [ - ]. we collected data from three of the six habituated groups at ldnr: blesbok group, donga group, and bay group. 
at the commencement of the study there were individuals in the blesbok group, in the donga group, and in the bay group; the total study population fluctuated due to births, migrations, and deaths, and was at the conclusion of the study. here we present data on a total of subjects as well as a subset of of those study subjects. information on group composition for each social group can be found in wren [ ] and wren et al. [ , ] . we located groups using known sleeping sites and home ranges. data were recorded for only the blesbok group from july -october because other researchers were studying the donga and bay groups during that time. data were collected from all three social groups for
the social group differences are presented in table . these differences are likely due to the different sampling efforts for each social group as noted in the methods section.
(table ). however, because anova is robust with respect to violations of homogeneity of variance, analyses could still be performed. there were statistically significant differences among groups for: total seconds observed (f ( , ) = . , p < . ); number of grooming partners (f ( , ) = . , p < . ); number of grooming partners giving (f ( , ) = . , p = . ); number of total partners (f ( , ) = . , p < . ); time self-grooming (f ( , ) = . , p = . ) (table ). planned contrasts revealed specific differences among groups and combinations of groups (tables & ). there were statistically significant differences for
all monkeys were lost from sight, but we kept data from all follows longer than min. 
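as an illustration of the group comparisons reported above, the following python sketch computes a one-way anova f statistic and its degrees of freedom by hand. the three lists are hypothetical values, not the study's observations, and the function name `one_way_anova_f` is an assumption introduced only for this example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical numbers of grooming partners for three social groups.
blesbok = [4, 5, 6]
donga = [6, 7, 8]
bay = [8, 9, 10]

f_stat, df_b, df_w = one_way_anova_f([blesbok, donga, bay])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

the f statistic is reported together with both degrees of freedom, in the same f ( , ) form used in the text; a p value would additionally require the f distribution, which the python standard library does not provide.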
we used continuous recording for all behavioral data.
we recorded data on the following variables for all bouts of social grooming: start and stop times, whether grooming was given or received, and identity of grooming
or more, the direction of grooming switched (i.e., the individual being groomed began grooming its partner or vice versa).
we also recorded data on start and stop times for direct physical contact with another individual and identification of direct social contact partners.
we collected fecal samples non-invasively from identified individuals directly following defecation, and samples were immediately preserved in a % buffered formalin solution. we recorded data on the following variables for
we used three methods to detect parasite eggs and cysts in samples in order to reduce the risk of false negatives: fecal flotation, fecal sedimentation, and immunofluorescence microscopy. we isolated helminth eggs and protozoan cysts and oocysts from fecal material using fecal flotation with double centrifugation (at rpm for min) in nano solution and fecal sedimentation with dilute soapy water.
we equation, based on a higher - log likelihood ( . ) and lower nagelkerke
trichuris sp. and . % of observed presence of trichuris sp
infected individuals at ldnr spent an average of % of their observed time grooming others, while those not infected with this parasite spent an average of % of their observed time grooming others. however, no differences existed in time spent being groomed by others. overall, for the entire sample (n = ), study subjects spent about % of their time in direct contact with another individual. the subset used for parasitological analysis spent . % (n = ) of their time in direct contact with another individual. this large difference is primarily influenced by the inclusion of infants and mothers with infants in the entire sample of n = , but only mothers in the smaller subset of n = . 
these mother-infant dyads remain in almost constant contact for the first weeks of a monkey's life and this inflates the overall mean for the group. because there were not enough fecal samples from these infants, their behavioral data were not included in hypothesis testing.
these results do not support the hypothesis that social grooming facilitates transmission of this type of gastrointestinal parasite. one possible explanation for these results is that individuals that are infected with trichuris sp. experience degraded health and/or less motivation to groom others and interact with others. red colobus monkeys (procolobus rufomitratus) in uganda that were infected with trichuris sp. decreased their time spent performing a number of behaviors. those same individuals spent more time resting as well as ingesting plant species and/or parts that suggest self-medicative behavior. whipworm is known to cause anemia, chronic dysentery, rectal prolapse, and poor growth in humans with symptomatic infections [ ], so less energy, motivation, or interest for behaviors like social grooming should not be surprising in other species.
another possible explanation is that trichuris sp. more directly alters host behavior in vervet monkeys. gastrointestinal parasites are known to alter host behavior in some host-parasite relationships, an idea referred to as the manipulation hypothesis. for example, toxoplasma gondii causes intermediate rodent hosts to be more attracted to the scent of felid predators. dicrocoelium dendriticum causes infected ants to wait on the tips of blades of grass where they can be ingested by sheep, the parasite's definitive host. because manipulation
host to a definitive host, and vervet monkeys do not serve as intermediate hosts for trichuris sp
other studies have found multiple morphotypes of trichuris sp. in nonhuman primate hosts in captivity in nigeria [ , ], suggesting that multiple species of trichuris may infect nonhuman primates. 
the major implication of this has been seen as relevant for public health because it may mean that the species of trichuris sp
] noted that ill or infected animals display altered behavior, and argued that these sickness behaviors can be adaptive. one study of chimpanzees (pan troglodytes schweinfurthii) revealed that infected individuals exhibit altered behavior, most fittingly described as lethargy. the ghai et al. [ ] study, which revealed that trichuris sp. was associated with a reduction in grooming and mating, also found that individuals infected with this parasite took longer to switch behaviors than those that were not infected.
this study suggests that the gastrointestinal parasite trichuris sp. is associated with behavioral differences, specifically decreased time spent grooming others and time spent in direct contact with others, in vervet monkey hosts. these behavioral differences are extreme enough to influence group means when assessing behavior. further, if an individual is less likely to groom or interact with conspecifics, then they may also experience lower social status and thus lower reproductive fitness.
we would like to thank the mpumalanga parks and tourism agency
we are also grateful to katie dean, claire detrich, ruby malzoni, liz sperling moses for assistance in the laboratory
grooming systems of insects: structure, mechanics pollen transport and deposition by bumble bees in erythronium: influences of floral nectar and bee grooming grooming patterns in the primitively eusocial wasp polistes dominulus a video-tracking method to identify and understand circadian patterns in drosophila grooming hoxb is required for normal grooming behaviour in mice self-grooming by rodents in social and sexual contexts costs and constraints of anti-parasitic grooming in adult and juvenile rodents neurobiology of rodent self-grooming and its value for translational neuroscience relationship of bill morphology to grooming behaviour in birds 
comparative analysis of time spent grooming by birds in relation to parasite load of great tits and fleas: sleep baby sleep do grooming behaviours affect visual properties of feathers in male domestic canaries, serinus canaria? grooming in primates: implications for its utilitarian function the distribution of grooming and related behaviours among adult female vervet monkeys intragroup cohesion and intergroup hostility: the relation between grooming distributions and intergroup competition among female primates grooming down the hierarchy: allogrooming in captive brown capuchin monkeys the development of grooming and its expression in adult animals elimination of external parasites (lice) is the primary function of grooming in free-ranging japanese macaques insects groom their antennae to enhance olfactory acuity grooming behaviour as a mechanism of insect disease defense beta-endorphin concentrations in cerebrospinal fluid of monkeys are influenced by grooming relationships seyfarth rm. a model of social grooming among adult female monkeys functional significance of social grooming in primates silk jb. 
social components of fitness in primate groups grooming reciprocation among female primates: a meta- analysis infection strategies of retroviruses and social grouping of domestic cats gorilla susceptibility to ebola virus: the cost of sociality dynamics of a multihost pathogen in a carnivore community integrating contact network structure into tuberculosis epidemiology in meerkats in south africa: implications for control social network analysis of wild chimpanzees provides insights for predicting infectious disease risk making connections: insights from primate-parasite networks number of grooming partners is associated with hookworm infection in wild vervet monkeys (chlorocebus pygerythrus) social bonds between unrelated females increase reproductive success in feral horses social bonds of female baboons enhance infant survival fitness increases with partner and neighbour allopreening heart rate responses to social interactions in free-moving rhesus macaques (macaca mulatta): a pilot study grooming in barbary macaques: better to give than to receive? the social transmission of disease between adult male and female reproductives of the dampwood termite zootermopsis angusticollis infectious diseases in primates: behavior, ecology and evolution primate parasite ecology: the dynamics and study of host-parasite relationships no effects of a feather mite on body condition, survivorship, or grooming behaviour in the seychelles warbler, acrocephalus sechellensis emerging infectious disease and the challenges of social distancing in human and non- human animals who infects whom? social networks and tuberculosis transmission in wild meerkats social transfer of pathogenic fungus promotes active immunisation in ant colonies a vegetation classification and management plan for the a floristic description and utilisation of two home ranges by vervet monkeys in loskop dam nature wren bt. 
behavioural ecology of primate-parasite interactions social and ecological influences on activity budgets of vervet monkeys, and their implications for group living when females trade grooming for grooming: testing partner control and partner choice models of cooperation in two primate species . van de waal e, renevey n, favre cm, bshary r. selective attention to philopatric models causes directed social learning in wild vervet monkeys pansini r. induced cooperation to access a shareable reward increases the hierarchical segregation of wild vervet monkeys helminths of vervet monkeys chlorocebus aethiops, from loskop dam nature reserve measuring behaviour: an introductory guide noninvasive assessment of gastrointestinal parasite infections in free-ranging primates a legacy of low-impact logging does not elevate prevalence of potentially pathogenic protozoa in free-ranging gorillas and chimpanzees in the republic of congo: logging and parasitism in african apes sickness behaviour associated with non-lethal infections in wild primates the public health significance of trichuris trichiura modification of intermediate host behaviour by parasites host specificity parasites and the behaviour of animals parasite manipulation of host behaviour: an update and frequently asked questions the effect of toxoplasma gondii on animal behaviour: playing cat and mouse prevalence and morphotype diversity of trichuris species and other soil-transmitted helminths in captive non-human primates in northern nigeria gastrointestinal helminths of resident wildlife at the federal university of agriculture zoological park biological basis of the behaviour of sick animals wrangham rw. 
noninvasive monitoring of the health of pan troglodytes schweinfurthii in the kibale national park, uganda
key: cord- - izdji authors: siddiqui, nabil a.; houson, hailey a.; thomas, shindu c.; blanco, jose r.; o’donnell, robert e.; hassett, daniel j.; lapi, suzanne e.; kotagiri, nalinikanth title: radiolabelled bacterial metallophores as targeted pet imaging contrast agents for accurate identification of bacteria and outer membrane vesicles in vivo date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: izdji
modern technologies such as s dna sequencing, which can identify microbes and provide taxonomic resolution at species and strain-specific levels, are destined to be transformative . likewise, there is an emerging need to accurately identify both infectious and non-infectious microbes non-invasively in the body at the genus and species level to guide diagnosis and treatment strategies. here, we report the development of radiometal-labelled bacterial chelators, known as metallophores, that allow non-invasive and selective imaging of bacteria and bacterial products in vivo. we show that these novel contrast agents are able to identify e. coli with strain level specificity and other bacteria, such as k. pneumoniae, based on expression of distinct cognate transporters on the bacterial surface. the probe is also capable of tracking probiotic, engineered bacteria and bacterial products, outer membrane vesicles (omvs), in unique niches such as tumours. moreover, we report that this novel targeted imaging approach is applicable to monitoring antibiotic treatment outcomes in patients with pulmonary infections, thereby providing the ability to optimize individualized therapeutic approaches. compared to traditional techniques used to manufacture probes, this strategy simplifies the process considerably by combining the function of metal attachment and cell recognition into a single molecule. 
thus, we anticipate that these probes will be widely used in both clinical and investigative settings in living systems for non-invasive imaging of infectious and non-infectious organisms.
the body, and distinguish pathogenic from commensal species to minimize indiscriminate use of broad-spectrum antibiotics thereby adopting a more focused treatment strategy targeting specific pathogens. an application of equal importance is the precise mapping of resident microbiota and the tracking of engineered bacteria and their products, such as virulence factor-harbouring outer membrane vesicles (omvs), for therapeutic applications. microbiota are increasingly recognized to play a major role in the aetiology of human disease and in treatment modifications . the emergence of next-generation sequencing techniques has allowed extensive characterization of the microbiome in various pathologic conditions, from neurologic and endocrine, to cancer [ ] [ ] [ ] . however, there is an urgent need for non-invasive tools to detect and identify bacteria using endogenous markers. bacteria are also commonly used for desirable, biotechnological applications due to the ease of genetic manipulation and heterologous protein expression. several broad categories of bacteria-based approaches have been explored in preclinical research focusing on a wide range of diseases such as cancer, diabetes, obesity, hypertension, infection and inflammatory dysfunction [ ] [ ] [ ] [ ] . however, functional stability is important for biomedical applications of engineered bacteria. development of convenient and realistic in vivo testing environments will be important for achieving improved accuracy. such monitoring would provide information regarding presence, location, quantity, proliferation, survival and status of the current microbiome, engineered bacteria and omvs. 
numerous molecular imaging techniques such as ultrasound (us), computed tomography (ct), magnetic resonance imaging (mri), single-photon emission ct (spect), and positron emission tomography (pet) have been developed for preclinical and clinical research in the last three decades. however, current clinical probes cannot reliably distinguish bacteria from mammalian cells in vivo . thus, a comprehensive understanding of bacterial physiology and genetics is required to develop probes for targeted imaging. as bacteria are evolutionarily and phylogenetically distinct from mammalian cells, fundamental differences in metabolism and cellular structures can be leveraged to develop bacteria-specific imaging agents. some of the recent approaches that have focused on targeting metabolic pathways and proteins that are only present in bacteria include - c-para-aminobenzoic acid (paba) and f- -paba , which are involved exclusively in the bacterial folate pathway; f-maltohexaose and - f-fluoromaltotriose , which specifically target the maltodextrin transporter in bacteria; and f-fluorodeoxysorbitol (fds), a synthetic analogue of f-fdg that selectively localizes in gram-negative enterobacteriaceae . however, these radiotracers do not have the ability to distinguish one genus/species/strain from another. aside from requiring high specificity and selectivity to the target bacterial population, an ideal imaging agent should be generated by labelling techniques that are simple, rapid and inexpensive. current techniques employ a multi-step process to label metabolites and ligands, which can lead to unnecessary delays and avoidable expenses considering the short decay half-lives (t / ) of some radionuclides. our study takes advantage of unique natural molecules that serve as both metal chelators and ligands, thus simplifying the labelling protocol to just one step. 
metal transport is a distinct pathway in bacteria that can be exploited to develop specific probes. bacteria have developed sophisticated mechanisms for metal acquisition and transport to maintain metal homeostasis within a specified microenvironment. entire pathways for metal acquisition, comprising (i) de novo synthesis of metal-chelating metallophores (mtps) and (ii) dedicated membrane transporters that selectively bind the mtps, have evolved to precisely regulate this process . mtps are small peptide-like molecules that possess a high affinity for transition metals . the central role that mtps play as chemical ligands in shuttling metals, together with the unique biology facilitating this transport into bacteria, offers a platform that can be harnessed to develop highly versatile and specific contrast agents. recently, two independent research teams underscored the roles of transport proteins, fyua and cnta, in copper homeostasis of escherichia coli and staphylococcus aureus respectively. in the s. aureus study, a novel metallophore, staphylopine (stp) (fig. a) was shown to chelate metals before the complex is selectively "captured" by the cnta domain and transported into the bacteria by the cntbcdf domains of this abc transporter protein . on the other hand, pathogenic e. coli uti (urinary tract infection isolate) uses the metallophore yersiniabactin (ybt) (fig. b) to sequester cu (ii) from the extracellular environment inside the bacteria . metal bound ybt (fig. c ) is first selectively "recognized" by its cognate outer membrane protein receptor, fyua (ferric yersiniabactin uptake a), before the inner membrane atp-binding cassette transporters, ybtpq, allow cytoplasmic entry. we surmised that these bacterial metal transporters can be targeted by clinically relevant radioisotope labelled metallophores (fig. d) . 
since pet is considered the most sensitive among spect, pet, mri, us and ct imaging technologies , we focused on developing targeted pet imaging contrast agents. to optimize the one-step radiolabelling process, an accurate selection of radiometals, mtps and buffering conditions is crucial for an efficient metal-metallophore complexation. consequently, we performed in vitro radiolabelling of stp and ybt in various buffered and temperature-controlled conditions with four different transition metals: co, cu, ga and zr (fig. e) . we observed that ybt had the highest complexation with both cu and zr. since stp manifested moderately high affinity for zr, we proceeded to investigate the stability of cu and zr labelled stp and ybt. when we incubated zr-ybt in saline ( . % nacl) and in diethylenetriamine pentaacetate (dtpa, a metal-chelator) supplemented saline, zr remained bound to ybt in both media for hrs (fig. f) , but the radiometal dissociated significantly from stp within the first hour in the same conditions (fig. g) in vitro. we then assessed the stability of ybt probes in vivo and observed that both cu (fig. h ) and zr (fig. i) labelled ybt probes accumulated more significantly in the liver than any other body location, a feature that is likely attributable to the lipophilic nature of ybt. since cu and zr-labelled ybt proved to have the highest stability amongst all the probes investigated thus far, we wanted to examine whether the ybt probes could selectively differentiate their target, e. coli uti (gram-negative), from representative gram-negative and gram-positive bacteria that lack the fyua receptor in vivo. we executed our one-step radiolabelling protocol to demonstrate the ease, simplicity and rapidity of the radiolabelling process from radionuclide acquisition to animal injection (extended data fig. ) . we selected pseudomonas aeruginosa (gram-negative) and s. 
aureus (gram-positive) as control bacteria, since both are widely studied and highly infectious nosocomial bacterial pathogens. indeed, both p. aeruginosa and s. aureus lack the cognate fyua transporters.
extended data figure . schematic representing the experimental approach to establish the suitability of radiolabelled metallophores for bacteria-specific imaging.
tail-vein injections of cu-ybt revealed significantly higher accumulation of the probe in uti infected muscles than in s. aureus (fig. a) or p. aeruginosa (fig. b) infected muscles, indicating fyua-mediated e. coli species-specific imaging. however, it is likely that there could be a limited degree of crosstalk between strains, particularly within e. coli sp. that have the most genetic/pathogenic variants. to determine whether e. coli may possess alternative mechanisms to import cu-ybt other than the fyua, we elected to compare the localization of cu-ybt in another highly pathogenic strain of e. coli, o :h (enterohemorrhagic, ehec), which does not encode the fyua receptor . intravenous (i.v.) administration of the cu-ybt revealed pet signals (fig. c) , again exclusively in uti , thus confirming strain-specific imaging ability of the probe as well. furthermore, when we administered cu-ybt in mice infected with live and heat-killed ( °c for min) uti , we observed signals only from muscles harbouring live bacteria (fig. d) . ex vivo biodistribution revealed results concordant with all the pet/ct images (fig. e) . any new diagnostic probe must be evaluated against a reliable "gold standard", so we compared the specificity of cu-ybt with the clinically available pet probe, fluorine- labelled deoxyglucose ( fdg). we noticed that the latter accumulated in almost equal concentrations in all the pathogens we have investigated to date (fig. g-i) . it is well-known that f-fdg is taken up by cells that are involved in both septic and aseptic inflammation [ ] [ ] [ ] . 
hence, we believe that f-fdg accumulated within immune cells that infiltrated the infected muscles. the lack of a substantial signal from the muscle injected with the non-pathogenic commensal e. coli k (fig. h) indicates that the bacteria were not able to cause an inflammatory response. when we attempted to repeat our study with zr-ybt, our results showed that the probe failed to selectively identify uti and accumulated mostly in the bones of diseased mice (fig. a and b) , though previous results indicated sufficient in vivo stability of zr-ybt in naïve mice (fig. f and i) . this led us to question whether the intact probe might have dissociated within and eventually effluxed from the bacteria. to test this postulate, we performed in vitro uptake and retention studies for both the probes. we grew an overnight culture of uti in rpmi supplemented with cu-ybt or zr-ybt for hrs to mimic the initial bacterial uptake of probes in a physiologically relevant environment. several studies have shown that bacteria require hrs to accumulate nearly all of the incubated probes , , . subsequently, we pelleted the bacteria and washed with pbs multiple times before we resuspended the pellets in fresh rpmi media for and hrs to investigate the relative amount of each probe that was retained by the bacteria. fig. c shows that while uti was able to retain most of the cu-ybt over a period of hrs, the bacteria lost a significant amount of zr-ybt over the same period. besides e. coli uti , probiotic e. coli nissle and pathogenic klebsiella pneumoniae also transport metals using ybt via the fyua receptor [ ] [ ] [ ] . we next reasoned that if radiometal-labelled ybt is truly selective for fyua only, then our probes should be able to accumulate in nissle and in k. pneumoniae as well. to test this hypothesis, we injected cu-ybt in mice that received intranasal administration of pbs (fig. a) , p. aeruginosa (fig. b) , e. coli nissle (fig. c ) and k. pneumoniae (fig. 
d) , with the former two serving as negative controls. the goal was to selectively identify fyua-expressing bacteria in mice with infected respiratory tracts. after we performed pet/ct imaging and ex vivo biodistribution (fig. e) of various harvested tissues, we observed significantly higher signals from lungs and trachea of mice that received k. pneumoniae and nissle compared to those that received p. aeruginosa and pbs. this illustrates the ability of our probe to trace and identify live bacteria in the pulmonary niche as well. clinically, healthcare professionals would prefer to have a "real-time" assessment of therapeutic success or failure for the management of seriously ill patients. to demonstrate whether our probe can potentially be used in such circumstances, we injected cu-ybt in mice infected with e. coli uti and k. pneumoniae to image changes in signal intensity following the administration of the antibiotic, ciprofloxacin. we generated uti clones with a luciferase reporter to track bacterial growth and burden using bioluminescence imaging (bli). bli was accurately able to indicate decreases in bacterial burden in response to antibiotic treatment. we used cu-ybt to co-register pet signals with bli, and were able to show a decrease in pet signal corresponding to a proportional decrease in bacterial burden in mice that received two doses of ciprofloxacin (fig. a ) compared to untreated mice (fig. b) . when we harvested the thigh muscles from mice post-euthanasia, gamma counter analysis revealed significantly lower radioactive counts from tissues of mice that received ciprofloxacin compared to the control (fig. c ). the specificity of these distinct bacterial metal transporters for their corresponding metal bound mtps can be leveraged for in vivo visualization of therapeutic bacteria and bacterial virulence factor carrying outer membrane vesicles (omvs) as well. since fyua-expressing e. 
coli species have been proven to naturally possess the metal transporter in their secreted omvs , we postulated that both nissle and its secreted omvs can be selectively tracked by radiolabelled ybt without the need for further genetic manipulation of the bacteria. we tested our theory by injecting nissle i.v. in subcutaneously developed t tumour-bearing mice. usually, greater than % of the administered bacteria are cleared from the animals, leaving only a small percentage to colonize the tumour , . hence, we allowed days for the bacteria to localize and proliferate in the hypoxic core of the tumour before administering cu-ybt in the mice. the control mice did not receive nissle (fig. a) . pet/ct and ex vivo biodistribution analysis revealed strong signals in tumours of nissle-administered mice only (fig. b and c) . to achieve our next goal of imaging omvs in tumours, which accumulate as a result of the enhanced permeability and retention (epr) effect in leaky solid tumours, we first incubated nissle omvs and bovine milk exosomes, as the negative control lacking fyua transporters, with cu-ybt to compare the difference in probe uptake between the two types of nanoparticulates. subsequent purification to eliminate unbound pet probe yielded activities of approximately µci/ml and µci/ml for radiolabelled omvs and exosomes, respectively. at hrs after administration of the radiolabelled vesicles in our mouse models, we performed pet/ct imaging and observed higher radioactive concentration in tumours of the mice injected with omvs (fig. d) than in those of exosome-administered subjects (fig. e) . we believe that the presence of fyua on the lipid bilayer of the omvs allowed selective entry and statistically significant retention of cu-ybt within the omvs (fig. f) . 
since the exosomes lack the metal transporter, cu-ybt is merely bound to the outer regions of the nanostructure without being selectively incorporated inside, and eventually dissociated in the bloodstream in vivo (fig. g ).
in this study, we performed a comprehensive analysis of multiple metal-mtps and successfully identified ybt complexed to cu as a highly stable pet probe that can selectively target bacterial metal transport proteins. though numerous bacteria-specific pet probes have been developed within the last decade, most of these involve complex and time-consuming reaction steps to synthesize the tracer , . furthermore, most tracers do not have high complexation (> %) with the radioisotope, which necessitates an additional purification step before administration. these might prove to be barriers to clinical translation, as hospital/clinical staff members would prefer to be able to prepare the probes in a facile manner. in some of the initial preclinical and clinical investigations, radiolabelled antibiotics such as m tc-ciprofloxacin seemed promising, not only for their ability to specifically kill (or disable) bacteria while being nontoxic to human cells, but also for the ease of probe preparation using manufactured kits . however, in later investigations, these spect probes proved to accumulate not only in bacterial lesions but also in sterile inflammatory sites . metallophore-based probes eliminate the need for sophisticated synthetic chemical reactions since these molecules are readily synthesized and secreted by bacteria, and can easily be obtained using simple purification strategies from culture supernatants , (extended data fig. ). moreover, we succeeded in optimizing the radiolabelling procedure to a simple and single step of mixing radiometals with these highly selective bioinorganic ligands.
thus critical to ensure that cu-ybt was taken up by live and not non-specifically by dead bacteria. 
first, we showed that heat-killed e. coli uti did not import our probe. next, we showed that live bacteria that initially took up the probe were later neutralized by an antibiotic (ciprofloxacin) regimen and eventually cleared from the infected site. this information could be particularly important in a healthcare setting, where a false positive signal from dead bacteria could mislead physicians to over-prescribe antibiotics in patients, which might eventually lead to the development of multidrug-resistant (mdr) bacteria. there is precedent for using metallophores as nuclear contrast agents. ga coordinated to ferricrocin and desferrioxamine (dfo) has been used in the past to image fungal infections, including aspergillus fumigatus , . zr coordinated to dfo and coupled to various targeting vectors, including antibodies, peptides and nanoparticles, is a useful strategy for imaging tumour specific receptors. recently, pyoverdine- ga probes were used to selectively image p. aeruginosa infections in animal models . however, due to the short half-life of ga (t / = min), tracking bacteria longitudinally in vivo with ga-pyoverdine can be challenging. the . -hours half-life of cu provides the flexibility to image at both shorter and longer (up to h) time scales, which allows for optimal clearance of the probe and images with high contrast. logistically, the longer half-life also allows cu-radionuclides to be easily distributed for pet imaging studies at sites remote from the production facility with the loss of approximately one half-life . zr has a significantly longer t / of . d, which means that some levels of radioactivity can remain in the body for up to a month. this is of particular concern with zr-based probes, as preclinical studies often report dissociation of the metal from the tracer leading to free zr accumulation in the bones . 
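the half-life comparison above follows from the exponential decay law, in which the fraction of activity remaining after an elapsed time t is 0.5 raised to the power t / t_half. a minimal python sketch is given below. the half-life values are assumptions taken from standard nuclear data for cu-64, zr-89 and ga-68 (the mass numbers and half-lives are not legible in this text), and the function names are hypothetical.

```python
# Physical half-lives in hours. These are assumed values from standard
# nuclear data tables, not numbers quoted from the text above.
HALF_LIFE_H = {
    "Cu-64": 12.7,
    "Zr-89": 78.4,
    "Ga-68": 67.7 / 60.0,  # about 67.7 minutes
}


def fraction_remaining(elapsed_h, half_life_h):
    """Fraction of the initial activity left after `elapsed_h` hours."""
    return 0.5 ** (elapsed_h / half_life_h)


def decay_corrected(measured, elapsed_h, half_life_h):
    """Scale a measured activity back to the reference time (t = 0)."""
    return measured / fraction_remaining(elapsed_h, half_life_h)


if __name__ == "__main__":
    # Compare how much of each radiometal survives a 24 h imaging window.
    for isotope, t_half in HALF_LIFE_H.items():
        left = fraction_remaining(24.0, t_half)
        print(f"{isotope}: {left:.4f} of initial activity after 24 h")
```

the same correction is what allows gamma-counter readings taken at different times to be compared on a common scale; the 24 h window here is only an illustrative choice.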
although zr-ybt showed promise in our stability investigations in naïve mice, the bacterial presence in infection models destabilized the probe, which resulted in significant uptake in the bones and joints in subsequent studies. indeed, the ybtpq transporter complex has been shown to be essential for metal-ybt complex dissociation to yield metal-free ybt . mtps have generally been studied for their affinity for and import of metals such as cobalt, copper, iron, manganese, nickel and zinc, which are deemed "nutritional" by bacteria , , . however, zirconium has no known nutritional value to bacteria. hence it is highly likely that only cu was retained inside bacteria and yielded significant pet signals, whereas zr was effluxed from bacteria after hrs, resulting in loss of pet signal from the infection site and subsequent uptake in the bones (fig. d) . this is interesting because zr is known to be a residualizing radionuclide in mammalian cells, which means that upon internalization in cells it is trapped inside the cell cov- , millions of patients globally have required ventilator-assisted breathing in intensive care units (icus). hospitals in developing countries and make-shift icus even in developed countries run the risk of exposing patients to common pathogens such as k. pneumoniae, p. aeruginosa and methicillin-resistant s. aureus that are known to cause ventilator-associated pneumonia (vap) . this pathology can be exacerbated by carbapenemase-producing k. pneumoniae (kpc) and often proves fatal to patients. as scientists begin to unravel the exact pathogenesis of sars-cov- and emerging respiratory illnesses, early diagnosis of secondary infections caused by bacteria in the respiratory tract will be critical in saving significant numbers of lives globally. we have also validated how cu-ybt can potentially be used to advance bacterial therapies using engineered bacteria.
advances in genetic engineering have enabled us to explore the potential of using bacteria, particularly for cancer therapy in the last two decades , - . one of the most notable initial studies tested genetically engineered salmonella for its anticancer activity in mouse models of subcutaneously implanted b f melanoma tumors . the results were so promising that a phase ii clinical trial was conducted in patients with metastatic melanoma. however, the research ended at that stage because the engineered bacteria did not yield sufficient tumour-targeting efficacy in humans. selective tracking could have allowed scientists to better map its biodistribution, and hence optimize the therapeutic potential of the engineered salmonella. whilst numerous optical imaging studies have been conducted, very few nuclear imaging studies have selectively located bacteria in tumors . optical imaging techniques based on bioluminescence and fluorescence have the inherent drawback of limited light penetration, unlike pet imaging which is depth independent. recently, tumour localization of omvs from e. coli was confirmed via photoacoustic imaging . while the study described a novel strategy to specifically image the presence of omvs in the body, it involved an additional step of vesicle design to make it amenable for optoacoustic imaging. our study illustrates that cu-ybt can image bacteria and their products using their endogenous proteins as reporters, without the need for additional genetic modifications. moreover, the probe can detect bacteria in unique niches, such as tumours, with minimal nonspecific uptake in tumour tissue, thus maximizing the signal to background ratio. current imaging agents such as fdg, ga-citrate lack specificity as they are unable to distinguish between infection and inflammation . 
importantly, imaging agents such as fdg would not be suitable in this scenario since tumour cells would also take up the tracer, making it impossible to separate signals from bacteria and tumour. in conclusion, we have demonstrated the successful use of bacterial metallophores as "dual-role" compounds: (i) as a chelator as well as a targeting ligand for imaging and (ii) for tracking pathogenic and commensal bacteria as well as omvs using a single probe. we have shown the versatility of the probe in detecting bacteria in three different models: myositis, pulmonary infection and an intratumoural niche. importantly, this technique facilitates precise labelling of "live" bacteria in vivo. the small probe size (~ - da), when compared to conventional antibody or peptide-based pet probes, will potentially allow access through the lining of blood vessels to analyse extravascular structures with relative ease. we anticipate that this will open opportunities to explore the diverse array of natural metallophores, with over examples known to date, as tailored contrast agents for imaging a wide range of wild-type and engineered bacteria. the same probes can be used across diagnostic platforms, for in vitro assays using highly sensitive liquid scintillation and gamma counters as well as in vivo pet/spect/mri imaging using appropriate radionuclides and metals. for instance, mtp complexation with radiometals such as m tc and in could facilitate imaging with spect scanners that are more ubiquitous and less expensive to operate; and complexation with mn metal or mn radionuclide could facilitate bacterial imaging with mri or pet/mri scanners, respectively, to improve spatial resolution. in addition, combining metallophores with novel radiometals can provide insights into unique radiochemistry techniques.
such innovations will reveal new and invaluable information based on metallophore-metal, metallophore-bacteria, metallophore-host and bacteria-host interactions in living systems, which in turn will help design advanced tools and future therapeutic strategies. all reagents were purchased from commercial sources as analytical grade and used without further purification. a sequential differential centrifugation protocol was developed to isolate exosomes and omvs. in brief, high molecular weight (hmw) proteins and fats were removed from raw bovine milk and the supernatant was centrifuged at g for hrs to pellet the exosomes. in a similar manner, an overnight culture of e. coli nissle was centrifuged to remove the bacteria, and the supernatant was then concentrated using pierce™ concentrators ( kda, thermo scientific™). the overnight culture of uti was diluted to an od of . in rpmi supplemented with cu-ybt or zr-ybt ( µci each) and grown for hrs while shaking. bacteria were subsequently pelleted at , r.p.m for mins and washed with × pbs (sigma) multiple times. cell-associated cu or zr levels were measured from pelleted bacteria using a gamma counter. the pellets were resuspended in fresh rpmi media for and hrs, after which the centrifugation and washing steps were repeated as above. experiments were repeated three times from independent bacterial cultures. animal models. - weeks old balb/cj and athymic nude (nu/j) mice were used for all experiments. for infection studies, x cfu of bacteria were administered either intramuscularly or intranasally hrs or hrs respectively before probe administration. to monitor antibiotic treatment efficacy, ciprofloxacin at a dose of mg/kg was given to mice once every hrs via oral gavage. this regimen was thought to provide a peak plasma level above the range required to achieve efficacy after an oral administration of mg of ciprofloxacin in humans . to image tumour-infiltrating bacteria, .
x t cells were subcutaneously injected and allowed to develop into a noticeable tumour (size > mm ) on the right flank of the mice. this was followed by a single i.v. administration of e. coli nissle ( x cfu) days before pet probe administration. to track omvs and exosomes in t tumours, radiolabelled nanovesicles were injected immediately after the tumour reached the appropriate size. all animal experiments were performed under anaesthesia ( % isoflurane) following protocols approved by the university of cincinnati biosafety, radiation safety, and animal care and use committees. - µci of pet probes were injected i.v. to selectively image bacteria in both infection and tumour models. - µci of radiolabelled vesicles were injected i.v. to study their tumour localization. small-animal pet scans were performed or hrs post injection on a μpet scanner (siemens inveon). animals were placed in the supine position on the imaging gantry with continued warming for the duration of the scan. a ct scan ( kvp, μa, at projections) was acquired for anatomical reference overlay with pet images for a -min acquisition with real-time reconstruction. the pet images were acquired over mins as well and spatial resolution in the entire field of view was determined by ordered subset expectation maximization in dimensions. histogramming and reconstruction were applied using siemens inveon software. post-processing was carried out with inveon research workplace, and general analysis and d visualization were used for contouring volumes of interest (voi). these voi values were considered active infection volumes and used for further analyses. bioluminescence images were acquired for mins using an ivis imaging system for quantification of radiance (total flux, photons per second, p/s) of the bioluminescent signals from the regions of interest. after the imaging studies, the mice were euthanized via carbon dioxide inhalation and cervical dislocation.
organs and tissues of interest were removed and weighed. residual radioactivity in the samples was measured with a gamma counter and results were expressed as percentage of injected dose per gram of organ (% id/g). all statistical analyses were performed using the graphpad prism software. a two-tailed unpaired student's t-test was performed to compare the means between two groups, whereas one-way anova was used to compare the means among three or more groups. values of p < . were considered statistically significant. the data that support the findings of this study are available from the corresponding author upon request. national institute of general medical sciences of the national institutes of health (r gm ) and national institutes of performed all in vitro and in vivo bacterial experiments. h.a.h. performed radiometal-metallophore complexation and stability studies. s.c.t. prepared exosomes and omvs analysed data and wrote the manuscript with input from all authors evaluation of s rrna gene sequencing for species and strainlevel microbiome analysis the antibiotic resistance crisis: part : management strategies and new agents. p & t : a peer-reviewed journal for formulary management cancer and the microbiota how gut microbes talk to organs: the role of endocrine and nervous routes microglial control of astrocytes in response to microbial metabolites two-step enhanced cancer immunotherapy with engineered salmonella typhimurium secreting heterologous flagellin reversal of diabetes in nod mice by clinical-grade proinsulin and il- -secreting lactococcus lactis in combination with low-dose anti-cd depends on the induction of foxp -positive t cells a phase i trial with transgenic bacteria expressing interleukin- in crohn's disease.
clinical gastroenterology and hepatology : the official clinical practice journal of the prevention of vaginal shiv transmission in macaques by a live recombinant lactobacillus molecular imaging of bacterial infections: overcoming the barriers to clinical translation )c]para-aminobenzoic acid: a positron emission tomography tracer targeting bacteria-specific metabolism positron emission tomography imaging with -[( )f]f-p-aminobenzoic acid detects staphylococcus aureus infections and monitors drug response investigation of -[¹⁸f]-fluoromaltose as a novel pet tracer for imaging bacterial infection pet imaging of bacterial infections with fluorine- -labeled maltohexaose imaging enterobacteriaceae infection in vivo with f-fluorodeoxysorbitol positron emission tomography structural biology of bacterial iron uptake biosynthesis of a broad-spectrum nicotianamine-like metallophore in staphylococcus aureusklebsiella pneumoniae yersiniabactin promotes respiratory tract infection through evasion of lipocalin probiotic bacteria reduce salmonella typhimurium intestinal colonization by competing for iron comparative analysis of the uropathogenic escherichia coli surface proteome by tandem massspectrometry of artificially induced outer membrane vesicles escherichia coli nissle facilitates tumor detection by positron emission tomography and optical imaging tumor-specific colonization, tissue distribution, and gene induction by probiotic escherichia coli nissle in live mice bacterial infection imaging with [( )f]fluoropropyl-trimethoprim in vivo detection of staphylococcus aureus endocarditis by targeting pathogen-specific prothrombin activation comparison of mtc infecton imaging with radiolabelled white-cell imaging in the evaluation of bacterial infection inability of mtc-ciprofloxacin scintigraphy to discriminate between septic and sterile osteoarticular diseases an iron-mimicking, trojan horse-entering fungi--has the time come for molecular imaging of fungal infections? 
plos pathogens siderophore-mediated mechanism of gallium uptake demonstrated in the microorganism ustilago sphaerogena imaging of pseudomonas aeruginosa infection with ga- labelled pyoverdine for positron emission tomography copper- radiopharmaceuticals for pet imaging of cancer: advances in preclinical and clinical research in vivo biodistribution and accumulation of zr in mice metal selectivity by the virulence-associated yersiniabactin metallophore system long-lived positron emitters zirconium- and iodine- for scouting of therapeutic radioimmunoconjugates with pet identification of an abc transporter required for iron acquisition and virulence in mycobacterium tuberculosis the ferrichrome uptake pathway in pseudomonas aeruginosa involves an iron release mechanism with acylation of the siderophore and recycling of the modified desferrichrome the siderophore yersiniabactin binds copper to protect pathogens during infection uropathogenic enterobacteria use the yersiniabactin metallophore system to acquire nickel executive summary: management of adults with hospital-acquired and ventilator-associated pneumonia: clinical practice guidelines by the infectious diseases society of america and the bioengineered bacterial vesicles as biological nano-heaters for optoacoustic imaging bacterial outer membrane vesicles suppress tumor by interferon-γmediated antitumor response programmable bacteria induce durable tumor regression and systemic antitumor immunity lipid a mutant salmonella with suppressed virulence and tnfalpha induction retain tumor-targeting in vivo fdg pet of infection and inflammation broad host range fluorescence and bioluminescence expression vectors for gram-negative bacteria ciprofloxacin treatment failure in a murine model of pyelonephritis due to an aac( ′)-ib-cr-producing escherichia coli strain susceptible to ciprofloxacin in vitro % of total monocytes, clusters , , and ) and relatively rare cluster (< % of total monocytes, cluster ). 
to assess the distinguishing features of these clusters, we performed differential gene expression analysis for each cluster within the monocyte group (clusters , , , ) . we identified genes with significantly elevated expression in each cluster (dataset s ), including the monocyte/macrophage marker cd (cluster ), and numerous genes implicated in cell adhesion and migration such as mmp (cluster ), thrombospondin (thbs in clusters , ), l-selectin (sell in cluster , ), p-selectin (selp in cluster ), and icam (cluster ) (dataset s ). of note, the top ranked (adjusted p value and log fold-change) differentially expressed gene in cluster is ensecag , annotated as fcgr a/b or cd , a canonical marker for non-classical monocytes (cd locd + by flow cytometry) in human pbmc ( ) . indeed, hierarchical clustering (fig. c) , and heat map (fig. d ) visualizations suggest that clusters , , and exhibit somewhat similar and/or overlapping gene expression patterns, while cluster is notably transcriptionally distinct. clusters , , and demonstrate varying expression of genes associated with classical monocytes (cd hi cd in humans, ly c hi cd + in mice) and/or intermediate monocytes (cd ++ cd + in humans), including cd , cd , sell, and the mhcii components dra and dqa (fig. e ). additional genes with significantly elevated expression levels in cluster include nr a (transcription factor necessary for differentiation of non-classical monocytes in mice) ( ) , cx cr (chemokine receptor characteristic of nonclassical monocytes in humans and mice) ( , ) , and hes (target of notch signaling implicated in non-classical monocyte generation) ( ) (fig. e ). taken together, these results suggest that equine monocyte populations are analogous to those described in humans and mice, with clusters , , and most similar to classical monocytes, and cluster to non-classical monocytes. presumptive dc clusters ( , , ) were also analyzed by differential gene expression analysis (dataset s ). 
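the per-cluster marker ranking described above can be illustrated with a minimal python sketch. it computes only a log2 fold-change of mean expression in one cluster versus all other cells in the group; the function names, the pseudocount and the toy data are assumptions made for illustration, and no significance test or adjusted p value is computed, so this is not the pipeline used in the study.

```python
import math
from statistics import mean


def log2_fold_change(expr, labels, cluster, pseudocount=1.0):
    """log2 ratio of mean expression inside `cluster` versus all other cells.

    expr   : per-cell expression values for one gene
    labels : per-cell cluster labels, same length as expr
    The pseudocount (an assumption) avoids division by zero for silent genes.
    """
    inside = [x for x, lab in zip(expr, labels) if lab == cluster]
    outside = [x for x, lab in zip(expr, labels) if lab != cluster]
    if not inside or not outside:
        raise ValueError("need cells both inside and outside the cluster")
    return math.log2((mean(inside) + pseudocount) / (mean(outside) + pseudocount))


def rank_markers(matrix, labels, cluster):
    """Sort genes by descending log2 fold-change for the given cluster.

    matrix : dict mapping gene name -> list of per-cell expression values
    """
    scores = {gene: log2_fold_change(values, labels, cluster)
              for gene, values in matrix.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: four cells in two hypothetical clusters, two hypothetical genes.
labels = ["a", "a", "b", "b"]
matrix = {
    "gene_x": [7.0, 7.0, 1.0, 1.0],   # enriched in cluster "a"
    "gene_y": [1.0, 1.0, 1.0, 1.0],   # flat across clusters
}
print(rank_markers(matrix, labels, "a"))  # gene_x ranks first
```

in practice a fold-change like this would be paired with a statistical test and multiple-testing correction before a gene is called differentially expressed.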
differentially expressed genes in cluster included clec a, cadm , and btla (fig. f , dataset s ), all of which are immunophenotyping markers for cdc in humans and mice ( ) (in mice, clec a is also expressed on plasmacytoid dc ( ) ). genes with significantly enriched expression in cluster included fcer a and sirpa (fig. f , dataset s ), which are flow cytometric markers of cdc in humans and mice (reviewed in ( ) ). dc subsets are also defined by the transcription factors that regulate their development and function, particularly by relative levels of irf and irf ( ) . although irf transcripts were sparsely detected across all dc clusters (likely due to the incomplete sampling depth characteristic of droplet scrna-seq), irf was expressed at high levels in cluster (cdc ) and significantly lower levels in cluster (cdc ). cluster also exhibited high expression of batf , another characteristic transcription factor of cdc ( ) . in addition, top ranked differentially expressed genes in cluster included irf and tcf (e - ) (fig. f , dataset s ), both of which are fundamental to plasmacytoid dc (pdc) development and function ( , ) . to further support our cell type annotations and assess potential differences in monocyte/dc subsets between horses and humans, we performed cross-species hierarchical clustering with a human pbmc public reference scrna-seq data set ( fig. s a-b, fig. g ). equine clusters annotated as classical monocytes clustered first with each other, and next with human classical monocytes (defined by scrna-seq gene expression and confirmed with corresponding cd /cd immunophenotyping feature barcoding data). this pattern likely reflects the heterogeneous transcriptional states of classical monocytes defined by cd expression, and suggests broad similarities of this cell type across species. equine non-classical monocytes clustered with human intermediate and non-classical monocytes. 
remarkably, each dc subgroup clustered by cell type rather than species, indicating strong similarities of gene expression patterns between horse and human. these results further support three distinct dc subpopulations in horse peripheral blood that correspond with cdc (cluster ), cdc (cluster ) , and pdc (cluster ) in humans. we next performed an in-depth analysis of b cell clusters, as defined by their expression of ms a (cd ), cd a, mhc-ii components (i.e. dra), and/or immunoglobulin transcripts. in humans, t-bet + b cells have been described as "atypical memory b cells," appearing in the peripheral blood during chronic infection and/or inflammation ( , ) . although specific markers and/or gene expression patterns vary in different datasets, these b cells are often found to express itgam (cd b), itgax (cd c), as well as genes that modulate bcr signaling (including fcrl , fgr, and hck) ( ) ( ) ( ) ( ) ( ) . moreover, a recent study of t-bet + b cell populations in the context of chronic hiv infection demonstrated expression of genes associated with germinal center b cells ( , ) . we assessed expression of several of these characteristic genes in b cell clusters, and observed patterns consistent with multiple reports in humans (fig. a ). among b cells, itgam (cd b) expression was restricted to clusters and , while fcer (cd ) was virtually absent from these clusters. although sampling for itgax (cd c) and ensecag (annotated as cr /cd ) was insufficient for differential expression testing, we detected itgax (cd c) positive cells in t-bet + clusters and (fig. a) . moreover, we detected significantly elevated expression of fcrl , fgr, and hck in these clusters (fig. a , dataset s ). based on these scrna-seq expression patterns, we developed a flow cytometry panel to identify equine t-bet + b cells by protein expression. since t-bet exhibits % amino acid sequence identity between horses and humans (genpept accessions xp_ . and np_ .
, respectively), we selected an anti-human t-bet antibody for intracellular labeling. we then adapted a previously validated equine pbmc immunophenotyping panel ( ) to identify cd -cd -panig + b cells (fig. s ). as predicted from our scrna-seq data, we detected an abundant cd b + b cell population with high expression of t-bet ( fig. b ) that did not express surface cd or cd (fig. c) . t-bet expression was not detected in other b cell gates. we also assessed surface isotype usage of t-bet + b cells by flow cytometry; ± % of t-bet + b cells were igm hi , ± % were igg + , and ± % expressed neither igg nor igm (fig. d , e). it is unclear whether igm hi t-bet + b cells reflect an antigen-inexperienced naïve subset, a recently activated subset, or a memory cell subset that did not undergo class switch recombination. by flow cytometry, t-bet + b cells comprised ± % of total b cells, a percentage that was correlated with, but consistently lower than, the percentage observed in scrna-seq data (fig. f ), perhaps reflecting incomplete sensitivity of panig antibody labeling for all b cells. taken together, our flow cytometry results validate the existence of a novel population of t-bet + b cells initially identified by scrna-seq analysis, which also demonstrated similarities with human t-bet + b cells associated with chronic infection and inflammation. the cd + prf + major cell group is composed of five clusters ( fig. a ; clusters , , , , and ) , which were represented at similar frequencies across all horses examined (fig. b ). all clusters were characterized by expression of the cytotoxic effector prf and ctsw, a cathepsin whose expression is associated with cytotoxic capacity ( ) (fig. b) . based on hierarchical clustering of integrated pca data, we partitioned our annotations into three distinct transcriptional programs (fig. c ). although all clusters expressed high levels of cd transcripts (cd d, cd e, cd g, fig.
s a ), based on differential gene expression results (dataset s ), this major cell group likely includes both cytotoxic t cells and nk cells. transcriptional profiling studies of human and mouse cells often describe challenges in distinguishing cytotoxic lymphocyte subpopulations, with memory αβ cd + t cells, nk cells, nkt cells, and γδ t cells exhibiting considerable overlap in gene expression patterns ( ) ( ) ( ) ( ) ( ) . our data suggest similar overlap exists among equine cytotoxic lymphocyte subpopulations. we annotated clusters and as cd + "antigen experienced" or "non-naïve" t cells. while overall quite similar, cluster exhibited features more consistent with cd + t central memory cells in humans (gzmk/gzmm protein expression, absence of gzma protein), while cluster exhibited features more consistent with cd + t effector memory cells (gzmk/gzmm/gzma protein expression) ( ) . we emphasize that these expression patterns are not fully distinct and are unlikely to correspond perfectly to subpopulations defined by traditional flow cytometric markers. furthermore, although these cells appear to share common cytotoxic gene expression programs, we observed notable within-cluster heterogeneity. indeed, cluster contained mutually exclusive subgroups of cells which expressed either cd a or cd (fig. s b) . although expressing many of the same cytotoxic effector genes, cluster appeared distinct from other cytotoxic lymphocyte clusters (fig. a, c ). differential gene expression analysis revealed highly significant elevated expression of trdc. cells in cluster also expressed lower levels of trac, trbc , and trbc relative to other cytotoxic lymphocyte clusters (fig. s c ). based on these expression patterns, we annotated cluster as cytotoxic γδ t cells. interestingly, this cluster demonstrated high and specific (or nearly specific) expression of several genes associated with cytotoxicity, including gnly, klrb , and klrf . 
these results support the existence of equine γδ t cells, which have not been definitively characterized. moreover, they suggest that these cells employ unique cytolytic mechanisms compared to other equine cytotoxic lymphocytes. znf has also been described in human cytotoxic t cell subsets ( ) . we annotated cluster as nk cells based on specific expression of ensecag (annotated as klrd /cd ), which encodes the cell surface lectin central to nkg functions (fig. e ). this cluster also exhibited specific expression (within the cytotoxic lymphocyte major cell group) of fcer g and cd , both of which are important for nk cell activation signal transduction ( ) . in addition, and in contrast to cluster , this putative nk cell cluster exhibited diminished or absent expression of cd and cd , genes frequently used as t cell markers in humans ( ) (fig. e ). of note, multiple descriptions of equine nk cells by flow cytometry or immunohistochemistry have purposefully excluded cd + cells ( ) ( ) ( ) . however, consistent with scrna-seq, our flow cytometric analysis identified a well-defined cd + cd + lymphocyte population (fig. s a) . given their expression of tcr transcripts, it remains unclear whether these cells have the capacity to respond to specific antigen presented by traditional mhc-i or mhc-ii. although cluster has gene expression patterns consistent with both cytotoxic t cells and nk cells, the absence of definitive marker genes and/or genes associated with nk cell-restricted functions made it challenging to annotate this cluster. based on the overlapping gene expression programs described in cytotoxic lymphocytes in better characterized species, we suspect this cluster could include an additional type of cd + cytotoxic t cells, semi-invariant tcr cytotoxic t cells (e.g. mucosal-associated invariant t cells, nkt cells), and/or an additional type of nk cell. the latter possibilities are further supported by cross-species comparison to human cytotoxic lymphocytes (fig. 
f ). alternatively, cluster may represent a novel type of cytotoxic lymphocyte unique to horses. the cd + prf -t cell major cell group is composed of clusters, including proliferating t cells (cluster , fig. a ), which were represented at similar frequencies across all horses examined (fig. b) . although the most abundant in both cell and cluster numbers, these subpopulations were also the most challenging to effectively annotate, due in large part to the relatively subtle transcriptional differences detected between most clusters. in our experience, resting t cell populations can be difficult to distinguish by droplet microfluidics scrna-seq data. despite these limitations, we were able to make several observations regarding the non-cytotoxic t cell constituent clusters. first, we distinguished naïve t cells (clusters , , ) based on elevated expression of ccr , sell (l-selectin), and the lef transcription factor (fig. c ). naïve t cells could be further partitioned into cd + (clusters , , not significant by differential gene expression) and cd + (cluster ) subpopulations (fig. c) . although the remaining "non-naïve" clusters (presumably antigen experienced) exhibited significant gene expression differences, we were not able to confidently assign clusters to t cell subsets traditionally defined by flow cytometry (e.g. memory th , memory th , memory th , etc.). given the improved resolution and novel cell populations identified by scrna-seq, we grouped annotated cell clusters into summary populations and provided "reference ranges" for their frequency in healthy horses (fig. a) . 
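one way to derive such per-population frequencies is sketched below in python. the source does not state how the "reference ranges" were computed, so the min-max range across horses used here, along with all function names and the toy data, is an assumption for illustration only.

```python
from collections import Counter, defaultdict


def population_frequencies(cells):
    """Percentage of each summary population within each horse.

    cells : iterable of (horse_id, population) pairs, one per cell
    Returns {horse_id: {population: percent_of_that_horse's_cells}}.
    """
    counts = defaultdict(Counter)
    for horse, population in cells:
        counts[horse][population] += 1
    freqs = {}
    for horse, counter in counts.items():
        total = sum(counter.values())
        freqs[horse] = {pop: 100.0 * n / total for pop, n in counter.items()}
    return freqs


def observed_ranges(freqs):
    """(min, max) percentage of each population across horses.

    A population missing from a horse counts as 0 % for that horse.
    Using min-max as the "reference range" is an assumption of this sketch.
    """
    populations = {pop for per_horse in freqs.values() for pop in per_horse}
    return {
        pop: (min(f.get(pop, 0.0) for f in freqs.values()),
              max(f.get(pop, 0.0) for f in freqs.values()))
        for pop in populations
    }


# Toy data: two hypothetical horses, four annotated cells each.
cells = [
    ("horse_1", "b cell"), ("horse_1", "b cell"),
    ("horse_1", "b cell"), ("horse_1", "t cell"),
    ("horse_2", "b cell"), ("horse_2", "t cell"),
    ("horse_2", "t cell"), ("horse_2", "t cell"),
]
freqs = population_frequencies(cells)
print(observed_ranges(freqs))
```

with only a handful of animals, a min-max range is easy to interpret but sensitive to outliers; a percentile-based interval would be an alternative when more animals are available.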
we also evaluated how the frequencies of major cell groups defined by scrna-seq compared to those that can be resolved by current state-of-the-art flow cytometry equine pbmc immunophenotyping (adapted from ( ) taken together, these results demonstrate that our scrna-seq cell cluster annotations are consistent with state-of-the-art flow cytometry methods, but can resolve cell populations at much higher resolution and sensitivity. here, we used scrna-seq to define the cellular landscape of equine peripheral blood immune cells. this study demonstrates proof-of-concept for the utility of high throughput scrna-seq as a tool to characterize distinct cell types in species for which traditional flow cytometric tools are limited. furthermore, interspecies analyses demonstrate the potential of scrna-seq for comparative studies exploring the evolution of cellular specialization of the immune system. prior to this work, state of the art equine pbmc immunophenotyping methods ( ) ( )), have been described as generally conserved across mammalian species ( ) , and cd + monocytes have been previously observed in horses ( ) . indeed, our results support the existence of similar monocyte subsets in horse and human peripheral blood. moreover, the gene expression data supplied by scrna-seq offer insight into the putative functions of these cellular subsets. these include distinct trafficking programs (based on adhesion molecules and chemokine receptors) and antigen presentation capacity (expression of mhc-ii genes is generally anti-correlated with cd , with highest expression in the classical mono cluster, followed by non-classical mono). these data also include potential novel surface markers (e.g. cd a for nonclassical monocytes) for improved immunophenotyping by flow cytometry, though rna transcript levels may not necessarily correspond with surface protein expression. 
we also identified three dendritic cell clusters, with gene expression consistent with the cdc , cdc , and pdc subsets described in humans ( ) ). our data also indicate that, although they share many features with corresponding dc subsets in other species, equine dc subsets exhibit notable differences. for example, we did not detect itgam (cd b), a marker for murine cdc ( ), at appreciable levels in equine cdc . informed by the gene expression programs defined here, additional experimental characterizations are necessary to definitively assign functions to these different subsets, each of which is likely to play a critical role in equine immunity. in addition to identifying and/or validating cell types that we anticipated would be present in equine peripheral blood, the unsupervised clustering approach to scrna-seq data also revealed previously undescribed and unexpected cell populations. within the b cell compartment, we expected to detect clusters consistent with naïve b cells, memory b cells, and antibody secreting cells, based in part on human pbmc scrna-seq data and equine pbmc flow cytometry data ( ) . in addition to these clusters, we were surprised to observe two additional clusters of apparent b cells (as defined by high expression of ms a /cd and immunoglobulin transcripts) with gene expression patterns notably distinct from presumptive naïve and memory clusters. cells in these clusters, characterized by specific expression (within the b cell major cell group) of the t-bet transcription factor (tbx ), were the most abundant b cells across all seven horses under investigation. corresponding populations of t-bet + b cells were not observed in human pbmc scrna-seq data, and have not been previously described in horses. in mice, t-bet + b cells have been shown to be important for antiviral humoral immunity ( , , ) . 
in humans, t-bet + b cells have been detected in peripheral blood in a variety of chronic inflammatory contexts including systemic lupus erythematosus ( , ) , chronic malaria exposure ( , ) , and chronic viral infection ( , , ) . although a universal definition and designated function for these cells remain elusive, t-bet + b cells are often classified as atypical memory b cells and, at least in some contexts, are thought to arise from repetitive bcr stimulation ( ) . indeed, human t-bet + ( ) . the equine t-bet + b cell populations identified in the present study share many features with the atypical memory b cell populations described in humans. in addition to tbx /t-bet expression, horse t-bet + b cell populations exhibit similar gene expression patterns, including enriched expression of fcrl , fgr, and hck, the protein products of which modulate bcr signaling ( ) . furthermore, a recent study of t-bet + b cells in hiv demonstrated that they express genes characteristic of germinal center b cells ( ) . interestingly, we detected specific expression of aicda (encoding activation-induced cytidine deaminase) and elevated expression of apex in equine t-bet + b cells, suggesting that the t-bet + b cells identified here may represent equine equivalents of the t-bet + atypical memory b cells described in humans. if these cells are elicited by chronic antigenic stimulation, it is plausible that horses chronically exposed to numerous pathogens common in standard boarding conditions (e.g. equine alpha and gamma herpesviruses, influenza, rhinitis viruses, hepacivirus, parvovirus-hepatitis, coronavirus, etc.) could expand this population. while viral exposure burdens are likely to be largely similar to burdens experienced by humans, horses in the northeastern u.s. 
are also frequently exposed to borrelia burgdorferi (agent of lyme disease) ( ) and sarcocystis neurona (agent of equine protozoal myeloencephalitis) ( ) , and are continuously infested with or re-exposed to gastrointestinal nematodes ( ) . the horses in this study did not show signs of active infection or inflammation, as they all had normal complete blood count, serum amyloid a, iron indices, and globulins. moreover, the surprisingly high frequency of these t-bet + cells within the b cell compartment suggests that they may provide scrna-seq is molecularly compatible with presumably any animal species, as most droplet microfluidics scrna-seq platforms select mrnas for barcoding and downstream sequencing based on poly(a) tails, a feature common across metazoans. additional requirements for scrna-seq analysis include a genome (or at minimum, transcriptome) sequence to which reads are mapped, and gene/transcript annotations against which mapped reads can be quantified. should transcriptome annotations be insufficient for robust scrna-seq analysis, as may be the case for less commonly studied organisms, read assignment/quantification strategies can be modified with specialized software tools (e.g. esat ( ) , as implemented here) and/or annotations can be supplemented/replaced with custom annotations derived from bulk rna-seq data. interpretation of scrna-seq results can be greatly facilitated by gene/transcript annotations with comprehensive orthologue annotations for multiple species, but this is not a requirement. without the need for species-specific reagents, and with a constantly growing catalog of species with sequenced and annotated genomes, we anticipate that scrna-seq will be an increasingly useful research tool for non-traditional model organisms. despite the many insights gained from our pbmc analyses, scrna-seq, particularly for characterizing cell mixtures from diverse animals, is not without limitations. 
in the present study, although defining subpopulations with unsupervised clustering methods was reasonably straightforward, assigning putative cell types to each cluster presented challenges. ideally, automated cell type classification based on external datasets and/or prior knowledge could minimize biases introduced by supervised annotation ( , ) . recently developed scrna-seq data integration and cluster annotation tools have begun to implement this functionality ( ) ( ) ( ) . we made attempts to apply several of these strategies in comparing equine pbmc to human pbmc, but observed generally poor performance, which we attributed to insufficient interspecies orthologue annotations (data not shown). instead, we adopted a supervised approach based on prior knowledge of human and mouse immune cells to assign likely cell types. we, therefore, emphasize that our presumptive cell type annotations are not definitive, and ultimately require experimental validation by complementary methods, as we pursued with flow cytometry for t-bet + b cells (fig. ) . furthermore, for many clusters, most notably in the cd + prf lymphocytes major cell group, we were unable to confidently assign cell types due to limited detection of informative differentially expressed genes. this could be a result of suboptimal clustering (i.e. heterogeneous clusters), relatively low transcript sampling depth, and/or discrepancies in mrna and corresponding protein expression by which t cell subsets have been previously defined. many of these issues are likely to be mitigated in the future by perennially improving genome and orthologue annotations, scrna-seq methodologies with increased per cell sampling depth, and novel software tools for intra-and inter-species data analyses. horses studied here consisted of mares and geldings, to (mean . ) years old, warmbloods, thoroughbreds, and one quarter horse. 
horses were determined to be healthy by physical examination, serum biochemistry (including globulins and iron indices), complete approximately ml of blood was collected from each horse by standard jugular venipuncture. immediately following collection, pbmc were isolated by ficoll gradient centrifugation, as previously described ( ) . residual erythrocytes were removed by ammonium chloride lysis. all studies were conducted under approval of cornell university institutional animal care and use committee. within one hour of isolation, fresh pbmc were processed for single cell rna-seq on the scrna-seq data will be made available in the geo repository, accession number pending. the equcab . reference genome ( ) was used in all analyses. reference transcript annotations (ensembl v ) were supplemented by manual annotation of the immunoglobulin heavy chain and light chain loci as described by wagner, et al (supplemental dataset s , ( )). reads were assigned to cell barcodes, mapped and quantified per gene using the cellranger workflow (v . . , x genomics) with default parameters ("standard workflow"). in our optimized workflow, bam files generated by cellranger were reformatted (appending cellular barcode and umi sequence to alignment read names) and were input to the end sequence analysis toolkit (esat, ( ) ). briefly, esat evaluates reads mapped immediately downstream of annotated genes for potential quantification with the adjacent gene, an approach particularly relevant to ' scrna-seq data with reference transcriptomes with incomplete 'utr annotations. to eliminate ambiguous read assignments due to "overlapping genes" (i.e. exons from two different genes on + and -strands sharing the same genomic coordinates), the immunoglobulin-supplemented reference transcriptome (ensembl v ) was additionally modified to remove overlapping exon intervals on opposite strands. reformatted cellranger bam files were processed through esat in two rounds. 
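the read-name reformatting step mentioned above can be sketched as follows. this is a minimal illustration, not the authors' script: it assumes the alignments are available as sam text records carrying the `CB` (cell barcode) and `UB` (umi) optional tags, and the function name `append_barcode_umi`, the `:` separator, and the order of the appended fields are hypothetical choices, since the exact read-name layout expected by esat is not stated here.

```python
def append_barcode_umi(sam_line, sep=":"):
    """Append the cell barcode (CB) and UMI (UB) tag values to the read name.

    Assumptions (not taken from the source text): records are SAM text lines,
    and the separator / field order in the new read name are illustrative.
    Returns the rewritten line, the unchanged line for headers, or None when
    a record lacks either tag.
    """
    if sam_line.startswith("@"):
        # Header lines are passed through untouched.
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional SAM fields start at column 12 and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    barcode, umi = tags.get("CB"), tags.get("UB")
    if barcode is None or umi is None:
        # Reads without a barcode or UMI cannot be assigned to a cell.
        return None
    fields[0] = f"{fields[0]}{sep}{barcode}{sep}{umi}"
    return "\t".join(fields)
```

records that lack either tag are dropped here because they cannot be attributed to a cell; whether such reads should instead be kept is a design choice that depends on the downstream tool.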
first, esat was run ( nt extension window) with the modified transcriptome reference and set to ignore any duplicated genes. next, to recover quantification of genes duplicated in the ensembl v reference (n = duplicated genes), esat was run ( nt extension window) a second time with a filtered reference containing only duplicated genes; resulting read counts were divided across gene duplications and appended to the initial gene x cell count matrix. putative "doublet" cell barcodes were identified and removed from downstream analyses with the doubletdetection tool ( ) . gene-cell count matrices processed in the above workflow were analyzed in seurat (v . . , ( , ) ) as follows. data were filtered to exclude genes detected in less than cells (per subject), to exclude cells with less than umis, and to exclude cells with greater than % umis assigned to mitochondrial genes (e.g. dead or dying cells). gene-cell count matrices were independently normalized with sctransform ( ) , and the top most variable genes (variance-stabilizing transformation) were selected for each subject. to minimize subject-and/or batch specific effects, datasets from all subjects were integrated on the first canonical correlation components identified on the union of highly variable genes identified per subject. immunoglobulin heavy chain and light chain genes were excluded from integration and clustering analysis. differential gene expression analyses were conducted using edger v . . ( , ) , with additional modifications for scrna-seq data ( ) . gene expression linear models included factors for cellular gene detection rate, subject, and cluster (as identified in seurat analysis above). specific contrasts are detailed in relevant results sections and/or figures. for analyses other than comparisons among cd + prf + and cd + prf cell clusters, differential gene expression was defined as adjusted p-value < . 
(benjamini-hochberg correction) and moderated log foldchange > (as determined in edger model). differential gene expression for cd + prf + and cd + prf cell comparisons used a less stringent fold-change cutoff (moderated log foldchange > . ) to account for reduced dynamic range of gene expression observed in these clusters. for all analyses, genes expressed (i.e. greater than or equal to umi) in less than % of cells for at least one group within a contrast were excluded from differential expression results. resulting differential gene expression lists were further annotated for putative surface protein expression by intersecting one-to-one gene orthologs with the human surface protein atlas ( ) . human pbmc scrna-seq datasets (pbmc_ k_v ; pbmc_ k_protein_v ) were cross-species scrna-seq correlation analyses were conducted using an approach based on zilionis et al. ( ) . human and horse gene-cell count matrices were filtered to keep only those genes with high confidence -to- orthologues across species (as defined by ensembl v ). for each species and each major cell group (monocyte/dendritic cells, b cells, cd + prf + lymphocytes, cd + prf lymphocytes), following normalization with sctransform ( ) genes were ranked by pearson residual, and genes above the . *inflection point were selected as highly variable genes. lists of highly variable genes in human and horse datasets were intersected, and the resulting list of orthologs present in both species was used for clustering analysis. clustering was performed on natural log normalized gene x cell count matrices and clustered on pearson correlation distance by ward's method ( ) . results were visualized by dendrogram with the dend function in r. the flow cytometric phenotyping protocol was adapted from ( ) . a list of primary antibodies is included in table s . , , , ) and dcs ( , , ) . (d) heatmap of genes differentially expressed (adjusted p-value < . 
, log fold-change > for each cluster versus all other clusters) by each cluster, with select genes labeled at left. (e) dot plot of select genes differentially expressed across monocyte clusters. dot size is proportional to number of cells with detectable expression of indicated gene. dot color intensity indicates gene expression values scaled across plotted clusters. *gene id ensecag is labeled fcgr a/b based on ensembl/ncbi annotations. (f) dot plot of select genes differentially expressed across dc clusters. additional details as in (e). (g) hierarchical clustering of equine pbmc scrna-seq data (monocyte/dc clusters) and human pbmc scrna-seq data (monocyte/dc clusters). median-normalized average expression values for highly variable human/horse one-to-one orthologues were calculated for each cluster, and clustering was performed on pearson distances by ward's method. non-model model organisms of mice, dirty mice, and men: using mice to understand human immunology choosing the right animal model for infectious disease research one health perspectives on emerging public health threats onehealth: oie -world organisation for animal health flow cytometry: basic principles and applications standardizing immunophenotyping for the human immunology project mass cytometry: single cells, many features highly parallel genome-wide expression profiling of individual cells using nanoliter droplets massively parallel digital transcriptional profiling of single cells single-cell transcriptomics to explore the immune system in health and disease zoonotic pathogens transmitted from equines: diagnosis and control multispectral fluorescence-activated cell sorting of b and t cell subpopulations from equine peripheral blood overexpression of t-bet in hiv infection is associated with accumulation of b cells outside germinal centers and poor affinity maturation t-bet+ memory b cells: generation, function, and fate comparative analysis of droplet-based ultra-high-throughput single-cell 
rna-seq systems utrdb and utrsite (release ): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mrnas end sequence analysis toolkit (esat) expands the extractable information from single-cell rna-seq data monoclonal antibodies to equine cd monocyte differentiation and antigenpresenting functions the transcription factor nr a (nur ) controls bone marrow differentiation and the survival of ly c− monocytes human cd dim monocytes patrol and sense nucleic acids and viruses via tlr and tlr receptors regulation of monocyte cell fate by blood vessels mediated by notch signalling human dendritic cell subsets: an update the dendritic cell subtype-restricted c-type lectin clec a is a target for vaccine enhancement batf deficiency reveals a critical role for cd + dendritic cells in cytotoxic t cell immunity comparative analysis of irf and ifn-alpha expression in human plasmacytoid and monocyte-derived dendritic cells transcription factor e - is an essential and specific regulator of plasmacytoid dendritic cell development immunoglobulins and immunoglobulin genes of the horse. 
developmental & comparative immunology adjuvant-specific regulation of long-term antibody responses by zbtb t-box transcription factor t-bet, a key player in a unique type of b-cell activation essential for effective viral clearance expression of the immunoregulatory molecule fcrh defines a distinctive tissue-based population of memory b cells toll-like receptor (tlr )-driven accumulation of a novel cd c+ bcell population is important for the development of autoimmunity age-associated b cells: a t-bet-dependent effector with roles in protective and pathogenic immunity involvement of the hck and fgr src-family kinases in fcrl -mediated immune regulation role of cd c+ t-bet+ b cells in human health and disease t-bet+ b cells are induced by human viral infections and dominate the hiv gp response cathepsin w expressed exclusively in cd + t cells and nk cells, is secreted during target cell killing but is not essential for cytotoxicity in human ctls transcriptional profiling of γδ t cells molecular definition of the identity and activation of natural killer cells impact of genetic polymorphisms on human immune cell gene expression single-cell transcriptomic landscape of nucleated cells in umbilical cord blood panglaodb: a web server for exploration of mouse and human single-cell rna sequencing data label-free analysis of cd + t cell subset proteomes supports a progressive differentiation model of human-virus-specific t cells the transcription factor znf /hobit regulates human nk-cell development transcriptome analysis of immune genes in peripheral blood mononuclear cells of young foals and adult horses blimp- homolog hobit identifies effector-type lymphocytes in humans up on the tightrope: natural killer cell activation and inhibition atlas of hematopathology: morphology, immunophenotype, cytogenetics, and molecular approaches generation and characterization of monoclonal antibodies to equine cd generation and characterization of monoclonal antibodies to equine nkp 
abnormal patterns of equine leucocyte differentiation antigen expression in severe combined immunodeficiency foals suggests the phenotype of normal equine natural killer cells monocyte subsets in man and other species development and function of dendritic cell subsets ex vivo generation of mature equine monocyte-derived dendritic cells characterization of respiratory dendritic cells from equine lung tissues monocyte-derived dendritic cells from horses differ from dendritic cells of humans and mice identification and characterization of equine blood plasmacytoid dendritic cells cutting edge: b cell-intrinsic t-bet expression is required to control chronic viral infection c-myb regulates the t-bet-dependent differentiation program in b cells to coordinate antibody responses distinct effector b cells induced by unregulated toll-like receptor contribute to pathogenic responses in systemic lupus erythematosus il- drives expansion and plasma cell differentiation of autoreactive cd chit-bet+ b cells in sle malaria-induced interferon-γ drives the expansion of tbethi atypical memory b cells malaria-associated atypical memory b cells exhibit markedly reduced b cell receptor signaling and effector function circulating and intrahepatic antiviral b cells are defective in hepatitis b borrelia burgdorferi infection and lyme disease in north american horses: a consensus statement: lyme disease in horses equine protozoal myeloencephalitis: an updated consensus statement with a focus on parasite biology, diagnosis, treatment, and prevention anthelmintic resistance and novel control options in equine gastrointestinal nematodes reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage singlecellnet: a computational tool to classify single cell rna-seq data across platforms and across species integrating single-cell transcriptomic data across different conditions, technologies, and species single-cell multi-omic integration compares and 
contrasts features of brain cell identity improved reference genome for the domestic horse increases assembly contiguity and composition github: doubletdetection (zenodo, ) https normalization and variance stabilization of single-cell rna-seq data using regularized negative binomial regression a smart local moving algorithm for large-scale modularitybased community detection umap: uniform manifold approximation and projection differential expression analysis of multifactor rna-seq experiments with respect to biological variation bioconductor package for differential expression analysis of digital gene expression data bias, robustness and scalability in single-cell differential expression analysis a mass spectrometric-derived cell surface protein atlas single-cell transcriptomics of human and mouse lung cancers reveals conserved myeloid populations across individuals and species hierarchical grouping to optimize an objective function key: cord- -wnsgjdcp authors: love, r. rebecca; pombi, marco; guelbeogo, moussa w.; campbell, nathan r.; stephens, melissa t.; dabire, roch k.; costantini, carlo; della torre, alessandra; besansky, nora j. title: inversion genotyping in the anopheles gambiae complex using high-throughput array and sequencing platforms date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wnsgjdcp chromosomal inversion polymorphisms have special importance in the anopheles gambiae complex of malaria vector mosquitoes, due to their role in local adaptation and range expansion. the study of inversions in natural populations is reliant on polytene chromosome analysis by expert cytogeneticists, a process that is limited by the rarity of trained specialists, low throughput, and restrictive sampling requirements. to overcome this barrier, we ascertained tag single nucleotide polymorphisms (snps) that are highly correlated with inversion status (inverted or standard orientation). 
we compared the performance of the tag snps using two alternative high-throughput molecular genotyping approaches versus traditional cytogenetic karyotyping of the same individual an. gambiae and an. coluzzii mosquitoes sampled from burkina faso, west africa. we show that both molecular approaches yield comparable results, and that either one performs as well as or better than cytogenetics in terms of genotyping accuracy. given the ability of molecular genotyping approaches to be conducted at scale and at relatively low cost without restriction on mosquito sex or developmental stage, molecular genotyping via tag snps has the potential to revitalize research into the role of chromosomal inversions in the behavior and ongoing adaptation of an. gambiae and an. coluzzii to environmental heterogeneities. we recently described a strategy that exploited the an. gambiae and an. coluzzii database of natural variation (ag g; www.malariagen.net/projects/ag g) (miles et al. ) to identify tag snps predictive of inversion orientation for all six common inversion polymorphisms in these species. using these tags, we developed an algorithm capable of in silico inversion genotyping based on snps called from whole genome resequencing data (love et al. ). this is a rapid and powerful approach assuming that whole genome sequence data are already available or will be produced for other reasons. however, it does not satisfy experimental designs in which genomic sequence data are not otherwise required, and where their procurement would be cost-prohibitive. for the requisite statistical power, studies aimed at finding significant associations between inversions and behavioral or physiological phenotypes will likely require thousands of specimens of known inversion genotype. to ensure that we retained at least six tag snps per inversion for genotyping, we were compelled to reduce the genotypic concordance threshold below the . level imposed in love et al. ( ) . even lowering the threshold to . 
for rc tags in an. gambiae failed to yield more than three candidates, and we declined to reduce that threshold further. minimum genotypic concordances for each inversion were . for rb, rc and ru; . for rd; . for rj; and . for la. after filtering, we retained tag snps in total, ranging from to per inversion except rc in an. gambiae, with only tag snps. based on these tags, we selected a custom -assay taqman openarray genotyping plate design whose , reaction through-holes are divided into sub-arrays, each with through-holes ( of which were preloaded with a single custom assay). one such plate genotypes mosquitoes at tags ( , genotypic assays). for genotyping assays are provided in table s . core facility (gbcf) from a subset of specimens (n= ) to refine preparation techniques and identify primers that produced pcr artefacts or were overrepresented. following optimization, primer pools were re-made to include only the optimized panel of pcr primers. tag snps and pcr primers for gt-seq genotyping are listed in table s . biotech) according to manufacturer's instructions. following normalization, ul of each sample per well plate (up to ul total) was then combined into a . -ml eppendorf tube, for a total of tubes. from each tube, ul was transferred to a fresh . -ml eppendorf tube for two rounds of purification using ampure xp paramagnetic beads (beckman coulter, inc.) with ratios of . x and . x respectively. purified libraries were eluted in ul xte and transferred to fresh . -ml tubes before adding . ul buffer eb containing a % tween solution. the procedures for filtering and calling molecular inversion genotypes were the same for both oa and gt-seq platforms. filtering steps were as follows. for each tag snp, we calculated the percentage of mosquito specimens in the sample with a genotype call at that tag (the snp call rate). if snp call rates were < %, the underperforming tag snps were eliminated from further analysis. 
in addition, for each mosquito specimen analyzed, we calculated the percentage of tag snps with a genotype call (the specimen call rate). if the specimen call rate was < %, that specimen was excluded from further analysis. note that the mosquito specimens in the sample varied according to the inversion under consideration: la, rb and ru tags perform in both species, rj and rd tags are an. gambiae-specific, and defined subsets of rc tags (referred to in this work as rc_col and rc_gam) apply respectively to an. coluzzii or an. gambiae. to calculate the multilocus inversion genotype for each specimen, we converted the raw genotype data for individual tag snps to the count of alternate alleles (if necessary), where ' ' is a homozygote for the reference allele, ' ' is a heterozygote carrying one reference allele, and ' ' is a homozygote for the alternate allele. next, we averaged the number of alternate alleles present across all tag snps in a given inversion, and binned this average to produce a predicted inversion genotype ( - . , ; . - . the inversion genotypes inferred for each specimen by the three methods (cytogenetics, oa, and gt-seq) are provided in table s . we compared these genotypes to assess their concordance. due to our filtering rules, not every specimen had genotype calls by both molecular methods. we focused our assessment on the subset of specimens that were successfully genotyped with all three methods ( for an. gambiae, and for an. coluzzii). as summarized in figure and both oa and gt-seq that agreed, but were jointly discordant with cytogenetics. except for rj with negligible discordance (and correspondingly low levels of polymorphism in our sample), the cytogenetic versus multilocus molecular discordance affected from to mosquito specimens per inversion, representing % to % of the mosquito samples (mean, %). 
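the call-rate filters and the averaging-and-binning rule described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' implementation: the function name `call_inversion_genotypes`, the default call-rate cutoffs, and the bin edges (2/3 and 4/3) are hypothetical placeholders, because the published thresholds are not legible in this text. alternate-allele counts are coded 0, 1, or 2, and a missing call is coded `None`.

```python
def call_inversion_genotypes(calls_by_specimen,
                             min_snp_rate=0.8,        # hypothetical cutoff
                             min_specimen_rate=0.8,   # hypothetical cutoff
                             bin_edges=(2 / 3, 4 / 3)):  # hypothetical bins
    """Predict one inversion genotype (0, 1, or 2) per specimen.

    calls_by_specimen maps a specimen id to a list of alternate-allele counts
    (0, 1, 2, or None for a missing call), one entry per tag SNP, with tags in
    the same order for every specimen. Specimens failing the specimen call
    rate filter map to None.
    """
    specimens = list(calls_by_specimen)
    n_tags = len(calls_by_specimen[specimens[0]])

    # Step 1: drop tag SNPs whose call rate across specimens is too low.
    kept = []
    for j in range(n_tags):
        called = sum(calls_by_specimen[s][j] is not None for s in specimens)
        if called / len(specimens) >= min_snp_rate:
            kept.append(j)

    low, high = bin_edges
    result = {}
    for s in specimens:
        calls = [calls_by_specimen[s][j] for j in kept]
        called = [c for c in calls if c is not None]
        # Step 2: drop specimens with too few called tags among retained SNPs.
        if not calls or len(called) / len(calls) < min_specimen_rate:
            result[s] = None
            continue
        # Step 3: average the alternate-allele counts and bin the average.
        avg = sum(called) / len(called)
        result[s] = 0 if avg < low else (1 if avg <= high else 2)
    return result
```

averaging across several tags means that a single discordant tag shifts the mean only slightly, so the binned genotype is more robust than any one snp call.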
although cytogenetic karyotyping may be considered the gold standard for inversion genotyping, two important considerations lend considerable confidence to molecular genotypes, particularly when both molecular approaches concur. first, while none of the tag snps are deterministic (i.e., none is perfectly and invariably correlated with inversion orientation), oa and gt-seq infer genotypes based on multiple predictive tags scored per inversion, thus providing weight of numbers. second, the final sets of tags used for oa and gt-seq are almost completely non-overlapping, an outcome produced by distinct filters imposed on the initial list of candidate tags during assay development (see methods; figure ) . accordingly, agreement between both molecular methods is even stronger evidence in favor of the inferred molecular genotype than that provided by one or the other molecular method by itself. table shows that the two molecular methods agree at least % of the time (an average of %), except in the case of rc in an. gambiae. we hypothesized that some genotypic discordances, specifically those in which the two molecular methods agree but conflict with cytogenetics, are caused by cytogenetic errors rather than systematic biases in the molecular approaches. this is difficult to demonstrate conclusively, because the specimens used to compare the three genotyping methods have not been subjected to whole genome sequencing. furthermore, it was not possible to double-check the cytogenetic karyotypes of the specimens with discordances in the majority of cases, because neither slides nor ovaries were available. however, there is strong evidence consistent with cytogenetic error. during sampling in the field, specimens were assigned numerical identifiers that incremented by " " throughout the process; cytogenetic karyotyping was later split between two institutions on the basis of even- or odd-numbered identifiers (see methods). 
this procedure virtually eliminates the possibility of biases owing to temporal or spatial heterogeneities during the course of the mosquito sampling. because even- and odd-numbered specimens should be random subsamples from the same populations, we would expect no difference in discordance rates between them. this was not what we found. of the specimens assayed by both molecular methods in this study, were odd-numbered and even-numbered. focusing on the inversions with the largest numbers of specimens whose cytogenetic and joint molecular genotypes disagreed ( la, ; rb, ; ru, ; table ), the combined such discordances occurred disproportionately in even-numbered specimens: of the . analyses of x contingency tables demonstrated highly significant departures from the null hypothesis (by chi-square and fisher exact probability tests), consistent with the notion that cytogenetic error disproportionately affecting the even-numbered specimens is responsible for these genotypic discrepancies (~ %). if this is the case, then based on the fact that both molecular approaches agree > % of the time, we suggest that the true error rate for either molecular approach is < %, probably closer to ~ %. it is important to recognize that although the tag snps assayed by the two molecular approaches are largely non-overlapping for technical reasons, the assumptions underlying the ascertainment of the initial set of candidate tags were the same. the implication is that if those assumptions are violated in natural populations, both approaches may agree on the wrong genotype. the tags were ascertained in the ag g variation database, whose content was heavily biased toward an. gambiae at and/or (iii) violation of the assumption that the focal inversion arose uniquely (i.e., has a monophyletic origin). we suspect that at least one of these scenarios applies to rc tags ascertained in an. 
gambiae, probably explaining their lower rate of apparent success in genotyping (based on lower concordance values across the board; table ). previous work has shown that applying candidate tags to a taxon in which they are not (table s ) . based on this, we expect the systematic underestimation of the number of alternate alleles to produce a distinctive pattern where 'true' standard homozygotes are correctly identified, but heterozygotes and inverted homozygotes would be incorrectly genotyped molecularly as standard homozygotes. table shows the distribution of discordant genotypes between cytogenetics and joint molecular methods when broken down by genotypic class: standard homozygotes, heterozygotes, and inverted homozygotes. while we have no objective measure of which specimens are 'true' heterozygotes and 'true' inverted homozygotes, it is noteworthy that the discordances for all inversions other than rc in an. gambiae either skew toward molecular genotypes of ' ' or ' ', or they are roughly equally distributed between ' ' or ' ' and ' '. the pattern for rc in an. gambiae is distinctive, in that the skew is strongly toward molecular genotypes of ' ', which is consistent with tags that may not be appropriately suited for the an. gambiae population in which they are applied. further study is both required and merited, to understand the cause(s) of population structure between the populations used to develop the rc tags for in an. although we find good concordance between both molecular genotyping approaches, gt-seq more often agreed with cytogenetics than did oa (table ) . this is not surprising, given the hybridization-based nature of oa and the extremely high levels of nucleotide diversity found in an. coluzzii and an. gambiae (miles et al. ) . 
in addition, highly concordant candidate tag snps that also had low polymorphism in the ~ bp immediately surrounding the tag, as required by oa, were sufficiently rare that we were compelled to lower the concordance threshold to find enough candidates suitable for assay design, and consequently the total number of oa tags per inversion is smaller than for gt-seq (table ) reference genome assemblies and powerful functional genomics tools, the potential exists to probe molecular mechanisms and deepen our understanding. yet, a major limitation to progress in this area has been the strict requirement for polytene chromosome analysis, which not only limits samples but also demands rare cytogenetic expertise whose throughput is low. here we demonstrate that tag snps highly correlated with inversion status can be used for joint molecular genotyping of common inversions in an. gambiae and an. coluzzii across the genome (i.e., for karyotyping). and gt-seq. each row is an individual mosquito, and each column compares inversion genotypes derived from three genotyping approaches for a given inversion (a, la; j, rj; b, rb; c, rc; d, rd; u, ru). rows are grouped by species; rj and rd tags are not applicable in an. coluzzii. green represents -way genotypic concordance; yellow, concordance between oa and gt-seq; purple, concordance between cyt and gt-seq; black, concordance between cyt and oa; gray is missing data; red is -way discordance. revisiting the impact of inversions in evolution: from population genetic markers to drivers of adaptive shifts and speciation? annual review of how and why chromosome inversions evolve chromosome inversions, local adaptation and speciation breakpoint structure of the anopheles gambiae rb chromosomal inversion silico karyotyping of chromosomally polymorphic malaria mosquitoes in the anopheles gambiae complex. 
g (bethesda) chromosomal inversions and ecotypic differentiation in anopheles gambiae: the perspective from whole-genome sequencing complex genome evolution in anopheles coluzzii associated with increased insecticide usage in mali a test of the chromosomal theory of ecotypic speciation in anopheles gambiae genetic diversity of the african malaria vector anopheles gambiae the garki project. research on the epidemiology and control of malaria in the sudan savanna of west africa world health organization highly specific pcr-rflp assays for karyotyping the widespread rb inversion in malaria vectors of the anopheles gambiae complex intraspecific chromosomal polymorphism in the anopheles gambiae complex as a factor affecting malaria transmission in the kisumu area of kenya chromosomal plasticity and evolutionary potential in the malaria vector anopheles gambiae sensu stricto: insights from three decades of rare paracentric inversions cuticular differences associated with aridity acclimation in african malaria vectors carrying alternative arrangements of inversion la the anopheles gambiae la chromosome inversion is associated with susceptibility to plasmodium falciparum in africa seasonal variations in indoor resting anopheles gambiae and anopheles arabiensis in kaduna ) la chromosomal inversion enhances thermal tolerance of anopheles gambiae larvae selection in heterogeneous environments maintains the gene arrangement polymorphism of drosophila pseudoobscura identification of single specimens of the anopheles gambiae complex by the polymerase chain reaction the distribution and inversion polymorphism of chromosomally recognized taxa of the anopheles gambiae complex in mali eco-evolutionary genomics of chromosomal inversions molecular karyotyping of the la inversion in anopheles gambiae ayala d, acevedo p, pombi key: cord- -eyyd ent authors: rizvi, vaseef a.; sarkar, maharnob; roy, rahul title: translation regulation of japanese encephalitis virus revealed by ribosome 
profiling date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: eyyd ent japanese encephalitis virus (jev), a neurotropic flavivirus, is the leading cause of viral encephalitis in endemic regions of asia. although the mechanisms modulating jev virulence and neuroinvasiveness are poorly understood, several acquired mutations in the live attenuated vaccine strain (sa - - ) point towards translation regulation as a key strategy. using ribosome profiling, we identify multiple mechanisms including frameshifting, trna dysregulation and alternate translation initiation sites that regulate viral protein synthesis. a significant fraction (~ %) of ribosomes undergo frameshifting on ns coding sequence leading to early termination, translation of ns ′ protein and modulation of viral protein stoichiometry. separately, a trna subset (glutamate, serine, leucine and histidine) was found to be associated in high levels with the ribosomes upon jev infection. we also report a previously uncharacterised translational initiation event from an upstream uug initiation codon in jev ′ utr. a silent mutation at this start site in the vaccine strain has been shown to abrogate neuroinvasiveness suggesting the potential role of translation from this region. together, our study sheds light on distinct mechanisms that modulate jev translation with likely consequences for viral pathogenesis. downstream polyprotein, ) significant modulation in levels of a distinct subset of ribosome-bound trnas that cannot be explained by virtue of codon usage and ) translation from an upstream orf (uorf) using a non-canonical initiation codon in the utr region of jev. these events signify several strategies of translational regulation during viral polyprotein synthesis along with features which could aid the virus in neuroinvasion. overall, our findings display the potential of translation governing factors in +ssrna viruses' pathogenesis by evaluating their molecular underpinnings. 
at h post infection (pi), cells were treated with either cycloheximide (sigma-aldrich, µg/ml, min) or harringtonine (lkt laboratories, µg/ml, min) followed by cycloheximide treatment for min. cells were rinsed with ice-cold pbs containing µg/ml cycloheximide. dishes were submerged in a liquid nitrogen reservoir for s followed by scraping over dry ice in polysome lysis buffer ( mm tris-hcl ph . , mm nacl, mm mgcl , mm dtt, % (v/v) triton x- , µg/ml cycloheximide and u/ml turbo dnase i (life technologies)). cells were collected and triturated with a g needle times and clarified by centrifugation ( g, min, °c). supernatant was collected and treated with u/µl of rnase i (ambion) for h at room temperature with gentle mixing followed by inactivation with u of superase-in rnase inhibitor (ambion). cell extracts were passed through sephacryl s spin columns (ge) after pre-equilibration with polysome lysis buffer and ribosome-bound mrna eluates were collected by centrifugation at g, min, °c [ ]. library preparation rna was extracted from eluates and total lysate using trizol reagent (thermoscientific). for rna-seq, total rna was fragmented using x fragmentation reagent (ambion) according to manufacturer's protocol. libraries were prepared according to original protocols of ingolia and colleagues [ ] with minor modifications. briefly, rna was resolved over % polyacrylamide tbe-urea gel using electrophoresis and a broad range of fragments ( - nts) were purified from the gel to capture both ribosome bound mrna and trna segments. rna fragments were further dephosphorylated using t polynucleotide kinase (neb) for h at °c followed by heat inactivation for min at °c. fragments were ligated to microrna preadenylated linker (neb) using t rnl (tr) ligase (neb) for . h at room temperature. 
ligated products were size selected and purified from page gel followed by reverse transcription [ ] using superscript iii (thermoscientific) and eliminating rna by naoh hydrolysis for min at °c. cdna was again size selected on a denaturing page, gel purified and circularised using circligase (epicentre) for h at °c followed by heat inactivation for min at °c. the circularised product was subjected to two rounds of rrna depletion using biotinylated oligos [ ] and streptavidin-coated magnetic beads (neb) according to manufacturer's protocol. the rrna-subtracted circular product was subjected to a final round of pcr amplification with illumina adaptor primers [ ] using phusion polymerase (neb). all the libraries were then quantified, and quality checked by genotypic technology services using qubit fluorimeter, real time qrt-pcr and bioanalyzer followed by sequencing on raw sequences were quality filtered and adaptors were trimmed using fastx-toolkit [ ]. reads ≥ nt were mapped to mus musculus rrna (accession numbers nr , nr , nr , nr , nr , gu ) and trna (gtrna database) using bowtie (very-sensitive-local alignment) [ ] with a maximum of one mismatch and sorted into separate files. unmapped reads were aligned to jev strain p (af ) and mus musculus refseq rna database (mm ) without any mismatches for analyzing ribosomal footprint distribution. p-site offsets were determined by metagene analysis of host mrna using ribogalaxy tool [ ] for corresponding footprint lengths. reads aligning to jev were further mapped to single nucleotide by setting respective p-site offset (+ for − nts and + for nt). individual fragment not accounted for rna levels [ ]. trna mapping was executed using sports . [ ] with no mismatches and default parameters for cytoplasmic and mitochondrial trnas. 
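the p-site assignment described above can be sketched in a few lines. this is a minimal illustration and not the authors' pipeline: the offset table, the function names and the example reads are hypothetical, and the real offsets would come from the metagene analysis of host mrna.

```python
from collections import Counter

# Hypothetical offsets (footprint length in nt -> distance from the read's
# 5' end to the P-site). The source's actual values are not reproduced here.
P_SITE_OFFSETS = {28: 12, 29: 12, 30: 13}


def p_site_positions(reads, offsets=P_SITE_OFFSETS):
    """Yield one P-site coordinate per read.

    reads: iterable of (five_prime_position, read_length) tuples, with
    0-based positions on the positive strand of the viral genome.
    Reads whose length has no calibrated offset are skipped.
    """
    for five_prime, length in reads:
        if length in offsets:
            yield five_prime + offsets[length]


def frame_counts(reads, cds_start, offsets=P_SITE_OFFSETS):
    """Count P-sites falling in frame 0, 1 and 2 relative to cds_start."""
    counts = Counter((pos - cds_start) % 3 for pos in p_site_positions(reads, offsets))
    return [counts[frame] for frame in range(3)]


# Toy input: four usable footprints and one read of an uncalibrated length.
example_reads = [(88, 28), (91, 28), (90, 30), (89, 29), (50, 25)]
example_frames = frame_counts(example_reads, cds_start=100)
```

collapsing each footprint to a single nucleotide in this way is what makes frame-wise read densities comparable between regions of the polyprotein.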
as mitochondria employ an independent translation system and serve as an internal control for relative quantification [ ], individual cytoplasmic trnas were first normalised to total mitochondrial trna levels and further quantile normalised to derive relative fold changes between the samples. sequence comparison of utr from various jev strains was carried out using kalign with default parameters [ ]. statistical and correlation analyses were performed using in-house written scripts. a dual reporter vector, pcmv-sluc-ires-gfp [ ], was employed to validate expression activity from at ns c-terminus, with the exception of ns b possibly due to low ribosomal velocities (fig. b). this frameshift region compared to frame-wise read densities immediately upstream and downstream is consistent with − prf near the ns terminus (fig. c). with similar estimates reported from wnv ( - %), this frameshifting can result in higher levels of structural proteins and will lead to deviations suggested in viral polyprotein stoichiometry [ ]. however, considering the involvement of vrna in parallel lifecycle processes, accurate estimates of t.e for individual viral proteins remain to be evaluated. since a conserved rna pseudoknot structure is shown to stimulate − prf on a slippery heptanucleotide motif and generate ns in jev [ ], we also scanned for frameshift-associated pausing near the frameshift site. although no ribopausing was observed at the slippery heptanucleotide sequence, we detected significant accumulation of rpfs ∼ nt upstream of the frameshift site ( nt, fig. a inset). this could either be a consequence of higher representation of charged and proline residues near this region or limited nuclease accessibility due to closely stacked ribosomes upstream of the frameshift site. in addition, we also observed significant pause sites in ns ( nt, asn) and membrane cds ( nt, ala). 
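the two-step trna normalisation described in the methods above (scaling to total mitochondrial trna, then quantile normalisation) can be sketched as follows. this is an illustrative sketch only: the function names and toy counts are hypothetical, ties are ranked arbitrarily and not averaged, and the authors' in-house scripts may differ.

```python
import numpy as np


def normalise_trna(cyto_counts, mito_counts):
    """Return quantile-normalised cytoplasmic tRNA levels.

    cyto_counts: array of shape (n_cytoplasmic_trnas, n_samples)
    mito_counts: array of shape (n_mitochondrial_trnas, n_samples)
    """
    cyto = np.asarray(cyto_counts, dtype=float)
    # Step 1: scale each sample by its total mitochondrial tRNA signal.
    mito_total = np.asarray(mito_counts, dtype=float).sum(axis=0)
    scaled = cyto / mito_total
    # Step 2: quantile normalisation. Each value is replaced by the mean,
    # across samples, of the values sharing its within-sample rank.
    ranks = np.argsort(np.argsort(scaled, axis=0), axis=0)
    reference = np.sort(scaled, axis=0).mean(axis=1)
    return reference[ranks]


def log2_fold_change(normalised, treated_col, control_col):
    """Per-tRNA log2 ratio between two sample columns."""
    return np.log2(normalised[:, treated_col] / normalised[:, control_col])


# Toy data: three cytoplasmic tRNAs, two mitochondrial tRNAs, two samples.
cyto_example = [[10, 80], [20, 20], [30, 60]]
mito_example = [[5, 10], [5, 10]]
normalised_example = normalise_trna(cyto_example, mito_example)
```

after the second step every sample shares the same distribution of values, so fold changes reflect shifts in rank between samples and not differences in sequencing depth.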
however, these sites do not represent commonly associated studies on rna viruses have suggested adaptations in codon usage of viral genes to the host translation [ ]. this coupling is modulated by trna concentrations of specific codons and regulates ribosome elongation dynamics as well as co-translational folding of proteins [ , ]. in order to understand the differential (fig. a). as trnas contain modified bases which generate truncated cdnas during the reverse transcription step of library preparation [ , ], we performed stringent alignments to quantify ribosome-protected trnas [ ] (see methods). the trna levels agree well between chx and har-chx treated infected lysates (r = . , fig. b), suggesting that our results are independent of the procedure or protocols used for ribosome associated trnas [ , ]. a subset of trnas was also represented at higher abundance in the ribosome associated fraction upon jev infection (figs. b and c). these include glutamic acid (uuc), serine (cga, gcu, uga), histidine (gug) and leucine (cag) trnas (fig. c). such high levels of trna-glu(uuc) could signify a strong enrichment of the gaa codon, which is reported to be associated with ribopause sites in mammalian translatomes [ ]. in trna profiles generated using targeted trna library preparations [ , ]. since mice infected with jev and wnv show upregulation of certain aminoacyl trna synthetases [ ], it is possible that translation start codon (fig. c). a recent study on the riboprofile of zikv also showed ribosomal initiation scanning from out-of-frame non-canonical start codons present in the utr -cug (uorf ) and uug (uorf ) [ ]. however, unlike the zikv riboprofile, where initiating ribosomes tend to stall more at uorf start codons compared to aug, the riboprofile of jev (chx) in n a cells shows that the rpf density is ∼ . x higher at the canonical start codon compared to the uorf start site (figs. a and b ). 
translation initiation from both jev start codons was confirmed by luciferase-based translation reporter constructs (fig. b). consistent with our ribosome profile findings, polyprotein orf (pporf) expresses . − x more efficiently than uorf (fig. c), as expected from the poor initiation context of uug [ ]. interestingly, jev infection appears to stimulate expression from the uug start site by almost %, suggesting viral or virus-induced host trans-regulatory factors promoting uorf translation (fig. d). upon sequence comparison of utr across jev strains, the alternate start codon exhibits % conservation except for the attenuated strain-jev sa - - harbouring u a mutation and generating a previously overlooked stop codon (fig. e). implications of this disrupted alternate start site in the vaccine strain remain to be evaluated. are distinct from those reported in mammalian systems [ ]. we also identify a subset of ribosome associated trnas whose levels are modulated globally upon jev infection (fig. ). however, such trna abundances remain to be critically examined as recent studies indicate interference of cycloheximide in quantifying bound trna fractions due to ribosomal conformational locks [ ]. nevertheless, this discriminating trna subset during jev infection could provide a possible antiviral intervention strategy. for example, a recent study demonstrated impairment in wnv infectivity upon depletion of schlafen- which prevents wnv-induced changes in a trna subset translating . % of viral polyprotein [ ]. also, schlafen- was shown to bind trnas essential for hiv protein synthesis during later stages of infection in a codon usage dependent manner [ ]. it is additionally possible that these trnas might be involved in translation-independent processes like viral replication (e.g. retro- and bromoviruses [ ]) or novel functions with their site-specific interactions on the viral genome (e.g. trna-glu(ctc) and trna-gly(ccc) with zikv [ ]). 
although studies have indicated adaptation in viral codon usage to host organism or tissue [ , ], our findings suggest its re-evaluation in the context of the anticodon counterpart (trna) dynamics that will arise due to virus-induced cellular rearrangements. manipulating virus codon usage to attenuate the virus has recently shown promising advancements in vaccination of mouse models [ ] [ ] [ ] [ ] [ ]. combined with trna perturbation, efficacy of these putative attenuated vaccine targets can be further improved. despite the employment of translation inhibitors to enrich ribosome associated fragments, both utrs display unique nuclease protected regions. while reads in utr fail to exhibit phasing, rcs region of db displays significant nuclease protection. this region, arising from duplication of cs region in db, is observed across jev and denv serotypes and mutations in cs region have also been shown to decrease translation and replication in the latter [ ]. despite the absence of a polya tail, it was shown that a-rich sequences flanking these db structures interact with polya binding protein (pabp) and enhance translation by serving as a circularisation factor [ ]. we speculate that cs and rcs duplicate sequences in utr facilitate a similar interaction and possibly modulate vrna translation by circularisation and ribosome recycling. we also observe a striking nuclease protection profile in utr upon comparing chx and har-chx datasets which exhibits characteristics of a uorf (fig. a ). translational activity of the uorf was validated using reporter constructs (fig. b ). while the functional significance of the putative peptide remains unknown, a close overlap of uorf stop codon ( th nt) with main orf's start codon ( th nt, fig. c ) could directly impact ribosome loading and usage rates, key parameters in regulating protein synthesis [ ]. 
although a canonical structure of utr is shared amongst flaviviruses [ ] , this strategy of translation initiation might suggest the enhanced translational efficiency of jev and zikv [ ] over denv [ ] . this site exhibits high conservation across jev strains with the exception of vaccine strain (sa - - ) harbouring a silent mutation. a reverse genetic approach incorporating u a mutation in wild type jev sa backbone reduced mice neuroinvasiveness but not neurovirulence of the virus with no significant differences in viral titres under in vitro conditions [ ] . however, it remains to be tested whether this intervention is a consequence of the putative short peptide encoded from the uorf or a cis acting element important for rna transport as suggested in tick borne encephalitis virus [ ] . a similar strategy of uorf and polyprotein expression is employed by enteroviruses where the uorf encoded protein was shown to facilitate virus growth in gut epithelial cells [ ] . this remarkable display of tropism using short uorfs, along with other cis acting estimated global incidence of japanese encephalitis: a systematic review the viruses and their replication. 
fields virology ns of flaviviruses in the japanese encephalitis virus serogroup is a product of ribosomal frameshifting and plays a role in viral neuroinvasiveness non-canonical translation in rna viruses genetic determinants of japanese encephalitis virus vaccine strain sa - - that govern attenuation of virulence in mice comparison of the live-attenuated japanese encephalitis vaccine sa - - strain with its pre-attenuated virulent parent sa strain: similarities and differences in vitro and in vivo a single nucleotide mutation in ns a of japanese encephalitis-live vaccine virus (sa - - ) ablates ns formation and contributes to attenuation genomewide analysis in vivo of translation with nucleotide resolution using ribosome profiling dengue virus selectively annexes endoplasmic reticulum-associated translation machinery as a strategy for co-opting host cell protein synthesis cellular gene expression during hepatitis c virus replication as revealed by ribosome profiling high-resolution analysis of coronavirus gene expression by rna sequencing and ribosome profiling tomer israely, nir paran, michal schwartz, and noam stern-ginossar. the coding capacity of sars-cov- the translational landscape of zika virus during infection of mammalian and insect cells an upstream protein-coding region in enteroviruses modulates virus infection in gut epithelial cells protein-directed ribosomal frameshifting temporally regulates gene expression isolate and sequence ribosome-protected mrna fragments using size-exclusion chromatography the ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mrna fragments fast gapped-read alignment with bowtie ribogalaxy: a browser based platform for the alignment, analysis and visualization of ribosome profiling data sports . 
: a tool for annotating and profiling non-coding rnas optimized for rrna-and trna-derived small rnas tracking the missing footprints of idle ribosomes kalign -an accurate and fast multiple sequence alignment algorithm functional incompatibility between the generic nf-kb motif and a subtype-specific sp iii element drives the formation of the hiv- subtype c viral promoter genome-wide translation profiling by ribosome-bound trna capture small noncoding rna modulates japanese encephalitis virus replication and translation in trans integrated in vivo and in vitro nascent chain profiling reveals widespread translational pausing ribosome pausing, a dangerous necessity for co-translational events ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes repertoires of trnas: the couplers of genomics and proteomics codon optimality, bias and usage in translation and mrna decay characterizing expression and processing of precursor and mature human trnas by hydro-trnaseq and par-clip landscapes of ribosome dwell times and relationships with aminoacyl-trna levels in mammals virus-induced transcriptional changes in the brain include the differential expression of genes associated with interferon, apoptosis, interleukin receptor a, and glutamate signaling as well as flavivirus-specific upregulation of trna synthetases membranous replication factories induced by plus-strand rna viruses quantitative analysis of the hepatitis c virus replication complex schlafen restricts flavivirus replication codon-usage-based inhibition of hiv protein synthesis by human schlafen parallels among positive-strand rna viruses, reverse-transcribing viruses and doublestranded rna viruses comrades determines in vivo rna structures and interactions the extent of codon usage bias in human rna viruses and its evolutionary origin codon usage bias and the evolution of influenza a viruses. 
codon usage biases of influenza virus modulation of poliovirus replicative fitness in hela cells by deoptimization of synonymous codon usage in the capsid region virus attenuation by genome-scale changes in codon pair bias reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity live attenuated influenza virus vaccines by computer-aided rational design deliberate reduction of hemagglutinin and neuraminidase expression of influenza virus leads to an ultraprotective live vaccine in mice identification of cis-acting elements in the -untranslated region of the dengue virus type rna that modulate translation and replication poly(a)-binding protein binds to the nonpolyadenylated untranslated region of dengue virus and modulates translation efficiency the key parameters that govern translation efficiency viral rna switch mediates the dynamic control of flavivirus replicase recruitment by genome cyclization. elife dendritic transport of tick-borne flavivirus rna by neuronal granules affects development of neurological disease the authors thank prof. p.n. rangarajan (bc, iisc) for providing jev p virus and neuro a cells; key: cord- -ltuqoa b authors: tsai, hsiang-yu; rubenstein, dustin r.; chen, bo-fei; liu, mark; chan, shih-fan; chen, de-pei; sun, syuan-jyun; yuan, tzu-neng; shen, sheng-feng title: antagonistic effects of intraspecific cooperation and interspecific competition on thermal performance date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ltuqoa b understanding how climate-mediated biotic interactions shape thermal niche width is critical in an era of global change. yet, most previous work on thermal niches has ignored detailed mechanistic information about the relationship between temperature and organismal performance, which can be described by a thermal performance curve. 
here, we develop a model that predicts the width of thermal performance curves will be narrower in the presence of interspecific competitors, causing a species’ optimal breeding temperature to diverge from that of a competitor. we test this prediction in the asian burying beetle nicrophorus nepalensis, confirming that the divergence in actual and optimal breeding temperatures is the result of competition with blowflies. however, we further show that intraspecific cooperation enables beetles to outcompete blowflies by recovering their optimal breeding temperature. ultimately, linking direct (abiotic factors) and indirect effects (biotic interactions) on niche width will be critical for understanding species-specific responses to climate change. recent anthropogenic climate warming makes understanding species vulnerability to changing climates one of the most pressing issues in modern biology. a cornerstone for understanding the distribution and associated ecological impacts of climate change on organismal fitness is the concept of the ecological niche, which describes a hyperspace with permissive conditions and requisite resources under which an organism, population, or species has positive fitness (hutchinson ), it has been developed largely independently from niche-based studies. yet, characterizing the tpc is essentially a way to mechanistically quantify a species' thermal niche. since the tpc describes the detailed relationship between temperature and fitness, the concept may actually be more informative than that of the thermal niche, which is typically defined as the range of temperatures over which organisms occur in nature (i.e. thermal niche width) (huey & stevenson ; hillaert et al. 
burying beetles (silphidae, nicrophorus) are ideal for investigating how social interactions influence both fundamental and realized tpcs because the potentially antagonistic effects of interspecific competition and intraspecific cooperation on the realized tpc can be studied simultaneously. burying beetles rely on vertebrate carcasses for reproduction and often face intense intra-and interspecific competition for using these limiting resources (pukowski ; scott ; rozen et al. here, we extend classic ecological niche theory by introducing the concepts of fundamental and realized tpcs. we first construct a theoretical model by using a hypothetical tpc to predict how interspecific competition influences the width and optimal temperature of realized tpcs in order to provide a general understanding of the relationship between fundamental and realized tpcs. we then describe a series of laboratory and field experiments designed to test the predicted relationship between fundamental and realized tpcs in the asian burying beetle nicrophorus nepalensis ( fig. ) . we began our empirical work by measuring locomotor and breeding performance without interspecific competitors in the laboratory and field to determine n. nepalensis's fundamental tpc. we then quantified beetle breeding performance in the presence of interspecific competitors, mainly blowflies (putman ; putman ; scott ; sun et al. ) , to determine the beetle's realized tpc. finally, we used a group size manipulation in the presence of interspecific competitors in the field to explicitly examine the role of intraspecific cooperation and interspecific competition on the beetle's realized tpc. experimentally distinguishing between fundamental and realized tpcs will not only serve as a starting point for better understanding the relationships between direct and indirect drivers of organismal performance and fitness, but also for better predicting responses to climate change as the earth continues to warm. 
we used hanging pitfall traps baited with rotting pork (mean ± se: ± g) to collect adult beetles at mt. hehaun, taiwan ranging from m ( ° ′ e, ° ′ n) to m ( ° ′ e, ° ′ n) in . we checked traps and collected beetles on the fourth day after traps were set. a maximum of two males and two females from each trap were brought back to the lab to reduce any influence of capture on the population structure in the field. in every generation, we established at least families, approximately individuals in total, to maintain population structure within the lab. to ensure that beetles in the lab strains were unrelated to each other, we used beetles collected from different traps. we then put one female and one male in a × × cm box with cm of soil and a rat carcass ( ± . g). approximately two weeks after introducing adult beetles, all of the dispersing larvae that were ready to pupate from each breeding box were collected and allocated to a small, individual pupation box. after roughly days, beetles that emerged from pupae were housed individually in ml transparent plastic cups and fed with superworms (zophobas morio) once a week. all breeding experiments were conducted in walk-in growth chambers that imitated natural conditions at m on mt. hehaun. temperature was set to daily cycles between ℃ at noon and ℃ at midnight, and relative humidity was set to - %. we completed all of the laboratory experiments within three generations. to investigate breeding tpcs, we conducted solitary pairing experiments in six temperature conditions- , , , , and ºc-in a common garden with no temperature variation in the lab (n = replicates at each temperature). for each replicate, one wildtype male and one wildtype female were arbitrarily chosen from different lab strains to avoid inbreeding. we chose adult beetles that were sexually mature, roughly to weeks after their emergence. each individual was weighed to the nearest . mg. we then placed the pair with a mouse carcass ( ± . 
g) under each temperature condition in a transparent plastic container ( × × cm with cm of soil depth) for two weeks. cases in which pairs fully buried the carcass and produced offspring were regarded as successful breeding attempts. cases in which pairs failed to bury a carcass, or they buried it but did not produce offspring, were regarded as failed breeding attempts. to determine the tpc for locomotor behaviors, we conducted a series of treadmill experiments under three temperature conditions- , and ºc-in a common garden with no temperature variation in the lab. we set replicates in total ( ºc: replicates; ºc: replicates; ºc: replicates). for each replicate, we randomly picked one beetle from different lab strains and measured its weight and width of pronotum. the beetles were brought to the experimental chamber one day before data collection began. monofilament glued to the pronotum by uv glue attached each beetle to the treadmill, where it was allowed to walk at a stable speed of . m per min (see figure ). we turned off the treadmill if a beetle's abdomen began to drag or if it started to fly, both behaviors that indicated that the beetle could no longer walk. an individual was tested only once per day. after each experiment, beetles were returned to the transparent container with cm of soil for recovery. we measured each beetle's pronotum and the ambient temperature during running with a thermal imaging infrared camera (flir systems, inc., sc ; thermal sensitivity of < . ºc) at a resolution of * pixels. pronotum temperature was measured at the center of the thorax and calculated as the average pronotum body temperature each minute until an individual dragged its abdomen or started flying. the ambient temperature was the average temperature of a x cm surface of the treadmill located near where the beetle was tested. the temperature difference was depicted by the difference between the beetle's body and ambient temperatures. 
since our previous study showed that blowflies are the beetle's main interspecific rat carcass was placed on the soil to attract beetles and covered with a × × cm (length x width x height) iron cage with × cm mesh to prevent vertebrate scavengers from accessing the carcass. we checked each carcass daily until it began to decay due to microbial activity (payne ) , was consumed by maggots or other insects, or was buried under the soil by beetles. if burying beetles completely buried the carcass, we checked the experiment after days to determine if third-instar larvae appeared. cases in which pairs produced third-instar larvae were regarded as successful breeding attempts. cases in which pairs failed to produce larvae were regarded as failed breeding attempts. breeding experiments without blowflies were conducted in the same experimental sites in and (may-october). the experimental design was the same as that described above, but we used screen mesh above the pots to also keep blowflies out. to record air temperature at every site we placed ibutton® devices approximately cm above the ground within a t-shaped pvc pipe to prevent direct exposure to the sun but allow for air to circulate. one male and one female beetle that were reared in the lab were released into the pot to record fundamental breeding performance. after days, we checked the pots to determine whether the burying beetles' third-instar larvae appear. cases in which pairs fully buried the carcass and produced larvae after days were regarded as successful breeding attempts. cases in which pairs failed to bury a carcass, or they buried it but did not produce larvae, were regarded as failed breeding attempts. to investigate how cooperative behavior influences tpcs, we manipulated the group each experiment was recorded by a digital video recorder (dvr) to determine whether n. nepalensis successfully buried the carcass. we placed the same temperature measurement device as described above at every site. 
cases in which beetles buried the carcass completely and produced larvae after days were regarded as successful breeding attempts. cases in which beetles failed to produce larvae were regarded as failed breeding attempts. we used generalized linear mixed models (glmms) with binomial error structure to compare thermal performance curves among treatments (with/without interspecific competitors; with/without intraspecific cooperation) in the field. the outcome of breeding success ( = success, = failure) was fitted as a binomial response term to test for differences in the probability of breeding successfully. the variables of interest (i.e. mean daily temperature, type of experimental treatment) were fitted as fixed factors. environmental factors (elevation, daily minimum air temperature) were fitted as covariates of interest. to account for repeated sampling in the same plot, we set the field plot id as a random factor (coded as |plot id) in the r package lme . where $T$ is environmental temperature, $\sigma_p$ is the shape parameter describing the steepness of the curve at the lower end, $T_{opt}$ is the optimal environmental temperature at which organisms have their highest performance, and $CT_{max}$ is the upper critical temperature. we assumed that performance becomes zero when $T > CT_{max}$. since environmental conditions also directly influence a species' average performance, we used a gaussian function to describe the chance of encountering a particular temperature: where $p(T \mid T_{env})$ represents the probability of getting $T$ given $T_{env}$, $T_{env}$ represents the mean environmental temperature, and the spread of the gaussian describes the environmental temperature variability. we combined equations ( ) and ( ) we began by addressing how interspecific competition influences the realized tpc of a focal species, finding that when a low temperature specialist species (e.g. burying beetle) competes with a high temperature generalist species (e.g.
blowfly), the optimal temperature of the realized tpc of the thermal specialist shifts towards a lower temperature and the width of the tpc decreases (fig. b) . in other words, our model predicts that the optimal temperature of the realized tpc will decrease to below that of the optimal of fundamental tpc when a low temperature specialist competes with a high temperature generalist. to make the theoretical framework complete, we also explored the scenario of a high temperature specialist competing with a low temperature generalist. we found that if a high temperature specialist species competes with the low temperature generalist species (fig. s a) , the optimal temperature of the realized tpc of the thermal specialist shifts towards a higher temperature and the width of the realized tpc decreases (fig. s b) , which suggests that a shift in the optimal realized tpc away from the optimal temperature of the competing species is a general result. we first explored n. nepalensis's fundamental tpc of breeding in a controlled lab environment. we found that the beetle's optimal breeding temperature or fundamental tpc-defined as the mean temperature at which breeding success was highest-was . ℃ (fig. a , glm, χ² = . ,p< . ,n= ). to determine the physiological basis of this optimal tpc, we measured locomotion ability at different temperatures by performing a treadmill running experiment. we found that beetles had a greater likelihood of flying at ℃ while running at a stable speed on the treadmill (fig. b , glm, χ² = . ,p< . ,n= ). in other words, n. nepalensis took less energy to raise its body temperature enough to begin flying at ℃ than at other temperatures ( fig. c and d, glm, χ² = . ,p< . ,n= ). next, we investigated n. nepalensis's realized and fundamental breeding tpcs by studying breeding performance along an elevational gradient ( to m above sea level). 
as predicted by our model, in the presence of interspecific competitors (blowflies) in the wild, the optimal breeding temperature (i.e. the realized tpc) of n. nepalensis was roughly . ℃, which is lower than the optimal temperature in the lab in the absence of blowflies (i.e. the fundamental tpc) (fig. a , glmm, χ² = . , p< . ,n= ). intriguingly, when excluding blowflies and removing the threat of intraspecific competition in the field, the optimal breeding temperature of n. nepalensis increased to approximately . ℃ such that the realized tpc began to approach the fundamental tpc in the absence of blowflies, ultimately becoming broader than when in the presence of interspecific competitors (fig. b, glmm , χ² = . ,p< . ,n= ). thus, our experiment confirmed the causal relationship between interspecific competition and the shift in the realized tpc under natural conditions. since our previous study found that n. nepalensis will cooperate at carcasses to compete against blowflies, particularly in warmer environments (sun et al. ) , we predicted that a group of n. nepalensis in a warm environment would have a better chance of expanding its realized tpc towards the fundamental tpc than would an individual pair. to test this prediction, we performed a group size manipulation to determine the realized tpcs of cooperative groups and solitary beetles. we found that n. nepalensis in cooperative groups had an optimal breeding temperature (i.e. realized tpc) of . ℃, which is identical to their optimal temperature from the fundamental tpc in the lab. in contrast, the optimal breeding temperature of solitary pairs was . ℃, similar to that of the realized tpc in the field (fig. , group size × temperature interaction, χ = . , p= . , n = ; for large groups, χ = . , p= . , n = ; for small groups, χ = . , p< . , n = ). 
these results suggest that beetles that cooperate are able to expand their realized tpcs such that they converge on their fundamental tpcs, whereas those that do not cooperate have divergent realized and fundamental tpcs. by combining the concepts of fundamental and realized niches from ecological niche theory with that of the thermal performance curve (tpc), we found that a species' realized thermal performance curve is likely to change in time and space in response to biotic factors such as interspecific competition. our theoretical model suggests that if thermal specialist species compete with thermal generalist species adapted to higher or lower temperatures, the optimal performance temperature of specialists will decrease or increase, respectively. our empirical results examining competition between burying beetles (thermal specialists) and blowflies (thermal generalists) for access to carcasses support this theoretical prediction: blowflies push the beetle's optimal breeding temperature lower and narrow its realized tpc. intriguingly, our experiment also showed that cooperation in this facultatively social species not only enables beetles to overcome interspecific competition (sun et al. the idea that interspecific competition will reduce the realized niche width of a species is well accepted in ecology. however, our study further suggests that a more mechanistic understanding of how interspecific competitors affect the optimal temperature performance of species will be critical for understanding how climate change affects species' vulnerability. since it is generally assumed in studies of macroecology and climate change that thermal performance is largely influenced by physiology, a single function is often used to describe a species' thermal performance curve (sinclair et al. ).
however, if biotic interactions are key to indirectly influencing the thermal performance of a species as we have shown here, the realized tpc of a species is likely to change in time and space and should not be described by a single function to represent the thermal performance of a species. integrating the idea of tpcs into the ecological niche concept helps bridge two rich, but largely independent, traditions of studying thermal adaptation. by simply recognizing the concept of realized tpcs, it becomes clear that we know little about how realized and fundamental tpcs differ in most species. we show that the realized tpc provides a way to quantify how temperature mediates species interactions, which also influence organismal fitness. thus, the realized tpc extends the realized thermal niche concept, which only considers the temperature ranges in which a species can likely occur in n. nepalensis in warmer environments. therefore, conserving high population densities, especially at lower elevations, will be crucial for n. nepalensis to compete against blowflies under increased climate warming. by examining the relationship between cooperative behavior and interspecific competition, our study thus helps understand the pressing issue of how habitat destruction affects the vulnerability of social organisms to climate change (travis ) . when population density influences the likelihood of intraspecific cooperation in social species, habitat destruction will not only decrease habitat availability but also weaken a species' competitive ability against interspecific competitors, which in turn will lower the realized thermal performance of social organisms. our study has implications beyond interspecific competition in insects. many classic studies of tpcs investigate how changes in body temperature influence physiological or behavioral performance (chen et al. ; zhang & ji ) . body temperature is often assumed to be the same as the environmental temperature in ectotherms. 
however, accumulating evidence suggests that many ectotherms can at least partially regulate their own body temperature behaviorally or physiologically (heinrich the concept of the tpc has received renewed interest because the earth has been warming rapidly for the past few decades. yet, apparent gaps exist between studies of physiological function and those examining fitness consequences in changing environments. our study shows that employing the concepts of fundamental and realized tpcs can help us predict the ecological impacts of climate change, especially because environmental change will likely reshuffle ecological communities and alter the strength of species interactions (alexander et al. ). the importance of biotic interactions in shaping species distributions and community composition is intuitively obvious, yet it has historically been difficult to quantify. we believe that the concept of realized tpcs can help fill this important knowledge gap and, ultimately, deepen our understanding of the ecological impact of climate change.
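the combination of a thermal performance curve with a gaussian distribution of environmental temperatures, as described in the methods, can be illustrated with a short python sketch. the original equations did not survive in this text, so the curve shape used here (a gaussian rise up to the optimum and a quadratic decline to zero at the upper critical temperature) is an assumption borrowed from a form commonly used in the tpc literature, not necessarily the authors' exact model. all function names and the parameter values in the usage note are hypothetical.

```python
import math


def tpc(temp, t_opt, sigma_p, ct_max):
    """Relative performance (0..1) at temperature `temp`.

    Assumed shape (not taken from the paper): Gaussian rise up to t_opt,
    quadratic decline from t_opt down to zero at ct_max.
    """
    if temp >= ct_max:
        return 0.0  # performance is zero at or above the upper critical temperature
    if temp <= t_opt:
        return math.exp(-(((temp - t_opt) / (2.0 * sigma_p)) ** 2))
    return 1.0 - ((temp - t_opt) / (t_opt - ct_max)) ** 2


def gaussian_pdf(temp, mean, sd):
    """Probability density of encountering `temp` given a mean and variability."""
    z = (temp - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def expected_performance(mean, sd, t_opt, sigma_p, ct_max, steps=2000):
    """Midpoint-rule integral of tpc(T) * pdf(T) over mean +/- 6 sd."""
    low, high = mean - 6.0 * sd, mean + 6.0 * sd
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        t = low + (i + 0.5) * width
        total += tpc(t, t_opt, sigma_p, ct_max) * gaussian_pdf(t, mean, sd) * width
    return total
```

calling `expected_performance(16.0, 1.0, t_opt=16.0, sigma_p=4.0, ct_max=24.0)` with these purely illustrative values gives the average performance of a hypothetical species whose environment is centred on its optimum. moving `mean` towards `ct_max` lowers the result, which is the qualitative behaviour the discussion relies on.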
when climate reshuffles competitors: a call for experimental macroecology animal aggregations, a study in general sociology contrasting forms of competition set elevational range limits of species ecological niches: linking classical and contemporary approaches a chemically triggered transition from conflict to cooperation in burying beetles influence of body temperature on food assimilation and locomotor performance in white-striped grass lizards, takydromus wolteri (lacertidae) principles of thermal ecology: temperature, energy and life climatic predictors of temperature performance curve parameters in ectotherms imply complex responses to climate change the thermal mismatch hypothesis explains host susceptibility to an emerging infectious disease hutchinson's duality: the once and future niche inverse density dependence and the allee effect global buffering of temperatures under forest canopies systematic variation in the temperature dependence of physiological and ecological traits impacts of climate warming on terrestrial ectotherms across latitude global metabolic impacts of recent climate warming larval growth rates of the blowfly, calliphora vicina, over a range of temperatures the role of temperature variability on insect performance and population dynamics in a warming world direct and indirect effects of interspecific competition in a highly partitioned guild of reef fishes behavior influences range limits and patterns of coexistence across an elevational gradient in tropical birds rapid adjustment of bird community compositions to local climatic variations and its functional consequences crossing regimes of temperature dependence in animal movement multiple stressors in a changing world: the need for an improved perspective on physiological responses to the dynamic marine environment the hot-blooded insects: strategies and mechanisms of thermoregulation the evolution of thermal performance can constrain dispersal during range shifting body temperature 
distributions of active diurnal lizards in three deserts: skewed up or skewed down? integrating thermal physiology and ecology of ectotherms: a discussion of approaches using field data to estimate the realized thermal niche of aquatic vertebrates concluding remarks mechanistic niche modelling: combining physiological and spatial data to predict species' ranges heat accumulation and development rate of massed maggots of the sheep blowfly, lucilia cuprina (diptera: calliphoridae) resolving the paradox of environmental quality and sociality: the ecological causes and consequences of cooperative breeding in two lineages of birds ecological transitions in grouping benefits explain the paradox of environmental quality and sociality decoupling of behavioural and physiological thermal performance curves in ectothermic animals: a critical adaptive trait latitudinal patterns of phenology and age-specific thermal performance across six coenagrion damselfly species mechanisms underpinning climatic impacts on natural populations: altered species interactions are more important than direct effects a summer carrion study of the baby pig sus scrofa linnaeus Ökologische untersuchungen an necrophorus f. zeitschrift für morphologie und Ökologie der tiere the role of carrion-frequenting arthropods in the decay process carrion and dung: the decomposition of animal wastes rates of projected climate change dramatically exceed past rates of climatic niche evolution among vertebrate species quantifying the abundance of co-occurring conifers along inland northwest (usa) climate gradients antimicrobial strategies in burying beetles breeding on carrion field experiments on interspecific competition competition with flies promotes communal breeding in the burying beetle, nicrophorus tomentosus the ecology and behavior of burying beetles the ecology of cooperative breeding behaviour can we predict ectotherm responses to climate change using thermal performance curves and body temperatures? 
consequences of the allee effect for behaviour, ecology and conservation climate-mediated cooperation promotes niche expansion in burying beetles. elife, , e using a sound field to reduce the risks of birdstrike: an experimental approach seasonal reproductive endothermy in tegu lizards. science advances r: a language and environment for statistical computing. r foundation for statistical computing vienna climate change and habitat destruction: a deadly anthropogenic cocktail locally-adapted reproductive photoperiodism determines population vulnerability to climate change in burying beetles increased temperature variation poses a greater risk to species than climate warming climate change, species distribution models, and physiological performance metrics: predicting when biogeographic models are likely to fail the thermal dependence of food assimilation and key: cord- -n mpdhyu authors: alam, md. nafis ul; chowdhury, umar faruq title: short k-mer abundance profiles yield robust machine learning features and accurate classifiers for rna viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n mpdhyu high throughout sequencing technologies have greatly enabled the study of genomics, transcriptomics and metagenomics. automated annotation and classification of the vast amounts of generated sequence data has become paramount for facilitating biological sciences. genomes of viruses can be radically different from all life, both in terms of molecular structure and primary sequence. alignment-based and profile-based searches are commonly employed for characterization of assembled viral contigs from high-throughput sequencing data. recent attempts have highlighted the use of machine learning models for the task but these models rely entirely on dna genomes and owing to the intrinsic genomic complexity of viruses, rna viruses have gone completely overlooked. 
here, we present a novel short k-mer based sequence scoring method that generates robust sequence information for training machine learning classifiers. we trained classifiers for the task of distinguishing viral rna from human transcripts. we challenged our models with very stringent testing protocols across different species and evaluated performance against blastn, blastx and hmmer searches. for clean sequence data retrieved from curated databases, our models display near perfect accuracy, outperforming all similar attempts previously reported. on de-novo assemblies of raw rna-seq data from cells subjected to ebola virus, the area under the roc curve varied from . to . depending on the software used for assembly. our classifier was able to properly classify the majority of the false hits generated by blast and hmmer searches on the same data. the outstanding performance metrics of our model lays the groundwork for robust machine learning methods for the automated annotation of sequence data. author summary in this age of high-throughput sequencing, proper classification of copious amounts of sequence data remains to be a daunting challenge. presently, sequence alignment methods are immediately assigned to the task. owing to the selection forces of nature, there is considerable homology even between the sequences of different species which draws ambiguity to the results of alignment-based searches. machine learning methods are becoming more reliable for characterizing sequence data, but virus genomes are more variable than all forms of life and viruses with rna-based genomes have gone overlooked in previous machine learning attempts. we designed a novel short k-mer based scoring criteria whereby a large number of highly robust numerical feature sets can be derived from sequence data. these features were able to accurately distinguish virus rna from human transcripts with performance scores better than all previous reports. 
our models were able to generalize well to distant species of viruses and mouse transcripts. the model correctly classifies the majority of false hits generated by current standard alignment tools. these findings strongly imply that this k-mer score based computational pipeline forges a highly informative, rich set of numerical machine learning features and similar pipelines can greatly advance the field of computational biology.
viruses are numerous [ ] and only a handful have been thoroughly characterized thus far [ ] . as we enter and slowly progress through the age of automated, high-throughput genomics, generation of sequence data itself is no longer of much concern, but rather accurate annotation of the bacterial srna gene tremendously aids the study of bacterial phylogeny [ ] . the lack of an omnipresent gene or genome segment that can be ubiquitously used to extract phylogenetically relevant data about viruses complicates the study of virus evolution [ , ] . there are additional levels to the structural complexity of viral genotypes. viral genomes stand out from those of all other genomic entities as they can be either single, double or gapped dna or rna molecules with positive, negative or ambi-sense mechanisms of genomic encoding, whereas all known realms of life exclusively rely on dna as the genetic material. these categories of classification are built on top of the original baltimore scheme [ ] that sought to characterize viral genomes with regard to expression of genetic information [ , ] . in the search for viruses in metagenomic data, researchers have noted that the usefulness of homology-based search tools is quickly exhausted [ , ] . most assembled contigs that are likely to be of viral origin are short and fragmented, with no guarantee of containing the coding regions that some of these tools rely on [ ] . a number of software programs have been developed for the purpose of identifying viral sequences that have integrated into host genomes [ ] [ ] [ ] [ ] . all machine learning models built for identification and characterization of viral sequences from cell samples or metagenomic data thus far rely on dna sequences [ , ] . rna viruses comprise a major group having great clinical and scientific importance [ ] , made even more evident at present by the global pandemic caused by the novel sars coronavirus [ ] .
it has been demonstrated that rna-seq data can be a very promising avenue for improving knowledge on rna viruses when leveraged by tactful algorithms [ ] . here, we present a novel short k-mer based scoring function that can be applied to sequences of different types to achieve impressive discriminatory capacity from classic machine learning models. we constructed a number of classifiers that rely on the profile of short genetic elements across the genome to classify rna viruses from cellular mrna and further classify them into positive or negative strand rna viruses. we tested the models rigorously using stringent cutoff parameters and many variable test sets including noisy de-novo assembly data of cells cultured with ebola virus. we evaluated the performance of our selected model against nucleotide blast, protein blast and protein family hmmer search. in spite of the stiff testing protocols, our k-mer based numerical features stood out to be informative and robust datapoints for the classification of biological sequences by machine learning algorithms. a total of features were contrived using the sequence scoring formula for k-mer elements of length to . the feature columns were tested for data normality by several methods. q-q plots of the data columns were suggestive of a normal distribution (not shown but can be found on git repository), but more robust statistical normality tests such as the kolmogorov-smirnov test and shapiro-wilk test strongly delineated a non-normal distribution for all feature columns. our features were therefore not suitable for univariate statistical analysis for feature selection. scikit-learn's in-built tree-based feature importance values derived from our trained random forest classifier were applied for feature selection on two levels. similarly, scikit-learn's optimized lasso model applied to our trained logistic regression classifier with l penalty was also used for feature selection on two succeeding levels. 
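the k-mer features that the selection steps above operate on can be illustrated with a minimal python sketch. it counts, for each k-mer element, the windows of a sequence that differ from the element at exactly a given number of positions, and divides by the sequence length. this is only an approximation of the scoring formula described in the methods: it handles mismatches but not gaps, the normalisation by length is an assumption, and the k range used as a default (3 and 4) is a placeholder because the range used in the study is not legible in this text. all function names are hypothetical.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Every possible k-mer of length k over the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmer_score(seq, element, mismatches=0):
    """Windows of `seq` with exactly `mismatches` differences from `element`,
    divided by len(seq). Gaps are not modelled in this sketch."""
    k = len(element)
    if len(seq) < k:
        return 0.0
    hits = sum(
        1
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], element) == mismatches
    )
    return hits / len(seq)


def feature_vector(seq, ks=(3, 4), mismatches=0):
    """Map each k-mer element to its pre-scaled score for one sequence."""
    return {e: kmer_score(seq, e, mismatches) for k in ks for e in all_kmers(k)}
```

one such dictionary per sequence yields a fixed-width numerical table (one column per k-mer element) that a classifier or a feature selection routine can consume after scaling.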
using a combination of both our tree- based selections and lasso, we obtained a total of levels of feature importance categories ( from the trees and lasso themselves and more for each method leading into the other on a second level), each assigning a value of to selected features and to omitted ones (fig ) . the top feature sets ranked by importance from these combined levels of selection are presented in caatc, tccaat, gcaatc, ttgac, cgtagt, acggtt, ttga, gttggt, catcaa, atacgg, atcgta, cgataa, cgatta, gttgat, cgat, cgatc, cggtt, gcgtac, cgcgtt, ttgcg, cgtcga, caatct, atcaac, attgg, atagcg, gatc, ttgcgc, ggttg, tcggtt, gtcaat, atcaat, ttcgac, ccgtta, ccgata, tcga, gcgtta atcata, acaatc, cgta, tcgt, atggta, ggtagt, tggt, ctgt, gtcga, agctg, gtcaa, aaataa, tcctg, cagc, tatccg, ctgga, caatt, tcggta, aggaa, atcaa, ataacg, tccaac, aattgg, aaatg, tggttc, aacc, ttgatg, ggcgta, tggtt, cgttgc, gtcgac, tcaac, caga, tcaatt, cagag, tcaat, aggttg, tcaacc, cgcgta, atccaa, ggtacg, gtcata, attggt, gcatac, ggtatg, gattgg, gtacgc, aaccga, tcgcaa, ttttta, acatcg, gcgtaa, tttttt, accggt, gttgag, taacac, ataggg, cgcaa, ccgcaa, cggttg, tatggt, aacggt, tagggt, ctaacg, cggta, gttgt, ggttc, acgata, ttcgca, ctcgat, tcgagt, aaaatg, tccat, gtcaaa, aataaa, ttggcg, gttgc, ggtcaa, ctgta, the summed feature score is obtained by summing the total number of times the feature appears in the six filtered feature sets we obtained from our selection algorithm. the macro-averaged f -score for the model was % and the binary f -score was . %. among the others, the decision tree models with and features and the naïve bayes model trained with features were the only models to obtain binary f -scores below %. all other models displayed resounding accuracy on both their cross-validation and test sets. auroc of . (fig ) . the f -macro score was . and the f -binary score was . (table part ). 
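the summed feature score described above, where each selection level assigns a binary vote to a feature and the votes are added across the filtered feature sets, can be sketched in a few lines of python. the function names and the toy feature sets in the usage note are hypothetical; only the voting scheme itself is taken from the text.

```python
from collections import Counter


def summed_feature_scores(selected_sets):
    """Count, for each feature, how many of the selected feature sets contain it."""
    counts = Counter()
    for features in selected_sets:
        # set() ensures a feature contributes at most one vote per selection level
        counts.update(set(features))
    return counts


def rank_features(selected_sets):
    """Features sorted by descending vote count, then alphabetically."""
    counts = summed_feature_scores(selected_sets)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

for example, `rank_features([{"cgat", "tcga"}, {"cgat"}, {"cgat", "gatc"}])` places `cgat` first, because it is the only element retained by all three toy selections.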
performance of this magnitude on novel divergent data concerns one to be critical of the to stand out to be good discriminators in biological data [ , ] , particularly because of their propensity against overfitting. our -feature random forest model displayed impressive classification accuracy and an auroc of . on this divergent test set (fig ) . correctly classified of these false positives giving us an accuracy of % on the unary data. when feeding the entirety of the human cell line assembly into our classification pipeline we achieved an accuracy of % for transcripts of all sizes and a much-improved value of % when taking transcripts of the same length range as our training data (table ) owing to the minimized skew of the separate classes (fig ) . the classification metrics from all instances are presented in table . (table ) . to further investigate the despondence of the model, we assessed the performance of the assembly software separately. this revealed that performance varied significantly between the different assembly software that joined the contigs (fig ) . genomes. by plotting the frequency of proteins discovered against the protein length cutoff threshold, it was evident that at a length of about amino acids, non-genic random orf numbers significantly subside (fig ) . using this -amino-acid cutoff for -frame in-silico (table ) . in the same manner, we sorted out uninformative vfams from the high performance vfams vfam a available at http://derisilab.ucsf.edu/software/vfam/. using hmmsearch on our informative vfams against the de-novo assembly dataset yielded false hits. our model was able to correctly classify of these false hits as human transcripts as noted in table . scale the model, we went beyond tree-based methods and employed lasso regularization. feature selection by lasso stems from linear regression models by penalizing the coefficients of the variables. 
using a combination of these selection methods on multiple levels, we were able to shrink the model down to only a few features while only minimally compromising performance. overfitting is always a concern when applying ml models to biological data. at first impression, the high accuracy of the classifiers suggested a case of overfitting in the model. the bootstrap aggregating or bagging meta-algorithm employed in random forest classification is particularly resistant to high variance and overfitting. many random sampling events generate subsets of data known as bootstrap samples. isolated decision trees are trained on these bootstrap samples to construct the random forest ensemble. a majority vote from the members of the ensemble generates predicted classifications for test sets. a single decision tree, by contrast, lacks this ensemble averaging and, when grown deep, is more prone to high variance and overfitting. since our random forest models performed noticeably better than the decision tree model, we believe that overfitting does not account for the fidelity of our models. and the prevalence of many viral genomic remnants in the genomes of host organisms. we can also reasonably conclude that protein-based methods such as protein blast or pfams are not suited to the task of taxonomic classification of greatly divergent rna sequences. given the nonuniform overlap between the results from our classifier and the tools it was compared to, we propose that a combinatorial approach employing multiple tools simultaneously is currently the best option for segregating sequence data by taxonomy (fig ) . upgrade and models such as ours that facilitate the annotation of the assembled data will be expected to catch up to these advancements.
contriving features using k-mer abundance profiles
we define $q \in S$, where $S$ is our set of all dna sequences $q$.
for all k-mers from k = to k = , $e \in E_k$, and hence $E_k$ is defined as the set of all possible k-mers of length k. in the equation that follows, j is the total number of allowed mismatches or gaps in a score we obtain for any given sequence q and for every element e, as the sum of all instances where $q_i$ is a slice of q from the i-th letter of the word that matches element e, while allowing exactly j total mismatches or gaps between $q_i$ and e. here, $l$ is the length for each sequence q. this final score is used as the pre-scaled value for feature e in sequence q in all constructed machine learning models in this study. table ). the reverse complement here a virus, there a virus, everywhere the same virus? trends microbiol viruses in soil ecosystems: an unknown quantity within an annual review of virology emerging view of the human virome ecological significance of microdiversity: identical s rrna gene sequences can be found in bacteria with highly divergent genomes and ecophysiologies unraveling the viral tapestry (from inside the capsid out). isme j metagenomic contrasts of viruses in soil and aquatic environments expression of animal virus genomes towards the system of viruses genome replication/expression strategies of positive-strand rna viruses: a simple version of a combinatorial classification and prediction of new strategies isolation independent methods of characterizing phage communities : characterizing a metagenome virfinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data.
microbiome phage_finder: automated identification and classification of prophage regions in complete bacterial genome sequences prophinder: a computational tool for prophage prediction in prokaryotic genomes phaster: a better, faster version of the phast phage search tool prophage hunter: an integrative hunting tool for active prophages viraminer: deep learning on raw dna sequences for identifying viral genomes in human samples identifying viruses from metagenomic data using deep learning the evolutionary history of vertebrate rna viruses genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding. the lancet a bioinformatics approach reveals seven nearly rna-virus genomes in bivalve rna-seq data machine learning for detection of viral sequences in human metagenomic datasets profile hidden markov models for the detection of viruses within gene selection and classification of microarray data using random forest de novo transcriptome assembly: a comprehensive cross- species comparison of short-read rna-seq assemblers. gigascience blast+: architecture and applications origins and challenges of viral dark matter discovering viral genomes in human metagenomic data by predicting unknown protein families virsorter: mining viral signal from microbial genomic data discovering viral genomes in human metagenomic data by predicting unknown protein families s classifier: a tool for fast and accurate taxonomic classification of s rrna hypervariable regions in metagenomic datasets why are rna virus mutation rates so damn high? 
viral metagenomics third generation sequencing: technology and its potential impact on evolutionary biodiversity research virus taxonomy: the database of the international committee on nucleic acids research accelerated profile hmm searches binpacker: packing-based de novo transcriptome assembly from rna- seq data bridger: a new framework for de novo transcriptome assembly using rna-seq data shannon: an information-optimal de novo rna-seq assembler rnaspades: a de novo transcriptome assembler and its application to rna-seq data idba-tran: a more robust de novo de bruijn graph assembler for transcriptomes with uneven expression levels soapdenovo-trans: de novo transcriptome assembly with short rna-seq reads de novo assembly and analysis of rna-seq data full-length transcriptome assembly from rna-seq data without a reference genome oases: robust de novo rna-seq assembly across the dynamic range of expression levels s fig. performance of the classifiers when trained on different numbers of features. (a) logistic regression classifier on feature sets. (b) linear svm classifier on feature sets. (c) rbf-kernel svm classifier on feature sets. (d) decision tree classifier on feature sets. (e) random forest classifier on feature sets. (f) gaussian naïve bayes classifier on feature sets all k-meric features ranked by feature importance instated by our feature selection flow-chart complete results for leave-one-family-out cross-validation procedure information of sequences used to test the model's ability to generalize false nucleotide blast hits across whole human transcriptome when queried against the rna virus sequence database false nucleotide blast hits across whole mouse transcriptome when queried against the rna virus sequence database s table. detailed information on training set sequences this research received no specific grant from any funding in the public, commercial, or not-for- profit sectors. 
the findings and views expressed are that of the authors alone and none else. key: cord- -yazl usb authors: lobet, guillaume; descamps, charlotte; leveau, lola; guillet, alain; rees, jean-françois title: quovidi: a open-source web application for the organisation of large scale biological treasure hunts date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yazl usb learning biology, and in particular systematics, requires learning a substantial amount of specific vocabulary, both for botanical and zoological studies. while crucial, the precise identification of structures serving as evolutionary traits and systematic criteria is not per se a highly motivating task for students. teaching this in a traditional teaching setting is quite challenging especially with a large crowd of students to be kept engaged. this is even more difficult if, as during the covid- crisis, students are not allowed to access laboratories for hands-on observation on fresh specimens and sometimes restricted to short-range movements outside their home. here we present quovidi, a new open-source web platform for the organisation of large scale treasure hunts. the platform works as follows: students, organised in teams, receive a list of quests that contain morphologic, ecologic or systematic terms. they have to first understand the meaning of the quests, then go and find them in the environment. once they find the organism corresponding to a quest, they upload a geotagged picture of their finding and submit this on the platform. the correctness of each submission is evaluated by the staff. during the covid- lockdown, previously validated pictures were also submitted for evaluation to students that were locked in low-biodiversity areas. from a research perspective, the system enables the creation of large image databases by the students, similar to citizen-science projects. 
besides the enhanced motivation of students to learn the vocabulary and perform observations on self-found specimens, this system allows faculties to remotely follow and assess the work performed by large numbers of students. the interface is freely available, open-source and customizable. it can be used in other disciplines with adapted quests and we expect it to be of interest in many classroom settings. teaching biology to first-year bachelor students is a challenge. as educators, our aim is usually threefold. first, we want the students to learn a new set of knowledge and integrate it. second, and this is for us equally important, we want the students to engage with the topic at hand. we want to transmit our passion and curiosity about the topic that we teach. third, we also want students to learn to observe the world around them. it is one thing to learn a topic from a textbook, it is another to observe it in real life. however, the main issue is that the classroom is, often by design, completely disconnected from the natural world. the challenge is therefore to find a way for students to learn and engage with biology despite this disconnection. last but not least, in the spring semester of (january to june) it was necessary for us to adapt the learning activities to the containment measures related to covid- . the formal aim of our biology course, given in the bioengineering faculty, uclouvain, belgium, is to discover plant and animal structures, organs and their function at the individual scale. to achieve this, students need to learn specific vocabulary related to these structures. the classic way to present this vocabulary to a student audience is to review a series of slides illustrating these different characteristics. this vocabulary is usually very boring for teachers to describe (imagine the slides showing all the different shapes of leaves) and the content is not very interesting for students to listen to either.
yet this vocabulary is an important prerequisite for describing any biological structure and for later systematic identification of taxa using dichotomous keys . learning it is essential. the question is therefore how to make this learning process motivating for the students and give them the opportunity to learn over time instead of memorising a list of words. the additional difficulty is that this learning activity must be feasible with more than students and few teaching resources. to create this learning activity, we decided to draw inspiration from the pedagogical techniques that aim to place students at the centre of their learning. student-centred learning and active learning emerged as important pedagogical techniques during the last century [ref] . active learning is characterised by (i) involving the student in the construction of his or her learning, (ii) engaging the student in an in-depth treatment of the subject matter, (iii) constructing learning through interaction (with the teacher or other students), (iv) conceiving of learning as the evolution of knowledge and skills [ , ] . studies have shown that the more cognitively and socially engaged the student is in a learning task, the more lasting the learning becomes [ , ] . active learning improves the performance of students and reduces the achievement gap between advantaged and disadvantaged students [ ] . in order to stimulate learning through interaction and create a collective emulation around this activity, the idea of creating a campus-wide biological treasure hunt finally emerged from the discussions. beyond simply being active through the manipulation of information, the student has to transform and produce new information that is not provided in the learning material. gamification is another recent technique to better engage the students in a learning activity.
gamification is defined by [ ] as "game-based mechanics, aesthetics, and game thinking to engage people, motivate action, promote learning, and solve problems". in many studies, students' levels of engagement increased significantly following the introduction of game elements, such as points, challenges, quests or progress bars [ ] . the gamified environment can foster intrinsic motivation and engagement, which are also targeted by active learning. to assemble these different elements (biological vocabulary, observation, active learning and gamification) in a comprehensive learning activity, we created a large scale biological treasure hunt for our students. in short, we provided students with a list of specific biological vocabulary. they had to understand the list and find the different elements outside of the classroom, in the natural world. external resources (books, selected websites, wiki pages) describing this vocabulary were available to them. complexity of understanding (some words are more difficult than others) as well as the difficulty of identification in the field were rewarded with different points. to manage the treasure hunt, we designed a new web-based platform, quovidi (which would loosely translate from latin as "where did you see"), for the organisation of large scale, decentralised, biological treasure hunts. quovidi is an open-source project available at www.quovidi.xyz . the objective of this publication is to describe the project, to show how we were able to adapt this learning activity to the covid- crisis, and finally, to show the results and success of the activity with the students. quovidi is a web application for the organisation and management of large scale biological treasure hunts. it was created to help students learn new biological terms (both in zoology and botany) and to teach them to observe the natural world surrounding them. first, educators have to prepare a list of quests to find in the natural world.
these quests should be tailored and adapted for the target public. for instance, in our experience with first year biology students, the quests revolved around biological structures and families (tab. ). each quest is given a specific reward (points) depending on its intrinsic difficulty and rarity. quests can be sorted in different groups (for instance "animal" and "plant") and subgroups (for instance "animal species" and "leaf shapes") to help students navigate them. tab. : example quests, listed as quest / group / subgroup. find a siphonaptera / animal / animal groups. find an example of aposematism / animal / animal physical attributes. second, educators have to assign students to groups to perform the activity. students in the same group will be able to share pictures and collaborate on the data collection. when logging into the web interface, students will be able to see the collected pictures and rewards from their own group. they will also be able to see the total number of points of the competing groups. educators also have the possibility to define specific game parameters, such as specific geographic regions in which the game takes place or restrictions on the number of submissions in each quest group (adding for instance a point penalty below a certain number of "animal" or "plant" submissions). once the list of quests, users and groups are defined, the activity can start. two main activities are available for the students: an in situ treasure hunt and an ex situ photo quiz activity. the main activity of the platform is the biological treasure hunt. students have to go outside (although some of the creatures may also be found in their home, such as food parasites, e.g. lepisma sp. or flies) to find the different quests set up by the educators. once they find a specific quest, they have to take a picture of it with their smartphone. we ask the student to take unambiguous pictures, where the subject of the quest is clearly identified and visible.
we also ask them to leave the natural environment intact, without killing any plant or animal in the process. they can then store the picture on the quovidi web interface. when stored, pictures are automatically resized (for efficiency) and added to the activity database. localisation information and date are extracted from the picture exif metadata. any other information is erased at this step. once pictures are stored on the web interface, students can assign them to a specific quest and submit them for evaluation. the web application allows users to follow their progress in detail (which picture was submitted for which quest, what is the evaluation status, etc.) as well as the global progress of the other groups (the total number of collected points). it is worth noting that in belgium, where the web application was first used, the lockdown due to the covid- pandemic still allowed citizens to go outside for some walking and exercise, although within a limited range. as such, the treasure hunt could still be performed by the students, either in their own garden or in neighbouring areas. however, not everyone lives in the countryside or close to a natural environment, or had the opportunity to leave their home during the lockdown. this is why we created a second module in the interface, the photo quiz, which allowed students to learn from photos contributed by other students, without having to submit their own photos. the second module of the interface allows students to evaluate pictures submitted by other students (a modified version of peer evaluation). more precisely, in the photo quiz module, students are presented with pictures submitted by other groups and validated by the educators (see below, "expert evaluation"). they have to assess whether the picture corresponds to its assigned quest. their assessment is then compared to the assessment of the educators. if it matches, the students gain points that are added to their global group tally.
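the photo quiz scoring rule described above can be sketched in a few lines of python. this is only an illustration, not the platform's actual r implementation: the function name, the data layout and the number of points awarded per matching assessment are assumptions, since the text does not specify them.

```python
def score_photo_quiz(answers, expert_verdicts, points_per_match=1):
    """Tally photo quiz points per group (hypothetical data layout).

    answers: iterable of (group_id, picture_id, student_says_correct)
    expert_verdicts: dict picture_id -> bool, the educators' assessment
    points_per_match: assumed value; the source does not state it
    """
    tally = {}
    for group_id, picture_id, student_says_correct in answers:
        tally.setdefault(group_id, 0)
        if picture_id not in expert_verdicts:
            # Picture not validated by an educator yet: nothing to compare.
            continue
        if student_says_correct == expert_verdicts[picture_id]:
            # The student's assessment matches the educators' one.
            tally[group_id] += points_per_match
    return tally


example_answers = [
    ("g1", "p1", True),
    ("g1", "p2", True),
    ("g2", "p1", False),
    ("g2", "p3", True),
]
example_verdicts = {"p1": True, "p2": False}
example_tally = score_photo_quiz(example_answers, example_verdicts)
```

in this toy run, only the first answer of group g1 matches the educators' verdict, so g1 gains one point and g2 none.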
when performing this activity for the first time, it is necessary to have a sufficient number of submitted (and corrected) pictures. without a database large enough, the activity loses some of its interest, as students might all review the same pictures. the third important module of the interface, central to the activity, is the expert evaluation. each submitted picture needs to be manually assessed by the educators. different feedback can be given for each submission, such as "correct", "correct and nice picture", "incorrect", "not visible" (e.g. the object is not visible in the picture) or "out of rules" (e.g. picture of a houseplant, picture taken outside of the prescribed geographical zones). the interface was designed to easily navigate the different quests and quickly correct the submitted images. the web application was created using the r shiny framework , using the shinydashboard [ ] , shiny [ ] , shinywidgets [ ] , shinybs [ ] , miniui [ ] packages for the user interface design. the data is stored in a sqlite database, hosted on the server. the database management is done using the dbi [ ] and rsqlite [ ] packages . pictures are transformed and managed using the magick [ ] package. exif information is extracted using the exifr [ ] package. data manipulation and visualisation is done using the tidyverse [ ] , lubridate [ ] , cowplot [ ] , formattable [ ] , dt [ ] , plyr [ ] , leaflet [ ] packages. the text sentiment analysis was performed using the rfeel package [ ] . in our example, the web application was deployed on the university server with the following specifications: quovidi is an open source project, released under an apache licence [ ] . everyone is free to re-use and modify it, with attribution. the interface was created to be as user-friendly as possible so that neither students nor staff need technical training . because it is web based, it can be used on any platform, whatever the operating system .
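as an illustration of the geotag handling mentioned above, the sketch below converts an exif-style gps coordinate (degrees, minutes, seconds plus a hemisphere reference) to decimal degrees and checks it against a prescribed zone. it is written in python rather than with the r exifr package used by the platform, and the rectangular zone, its name and its bounds are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    ref is the EXIF hemisphere letter: "N", "S", "E" or "W".
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if ref in ("S", "W") else value


def propose_region(lat, lon, regions):
    """Return the first region whose bounding box contains the point.

    regions: dict name -> (lat_min, lat_max, lon_min, lon_max).
    Rectangular zones are an assumption made for this sketch.
    """
    for name, (lat_min, lat_max, lon_min, lon_max) in regions.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None  # outside every prescribed zone


# Hypothetical zone; the bounds are illustrative, not taken from the platform.
zones = {"campus": (50.6, 50.7, 4.5, 4.7)}
lat = dms_to_decimal(50, 40, 0, "N")
lon = dms_to_decimal(4, 36, 36, "E")
region = propose_region(lat, lon, zones)
```

a picture whose coordinates fall outside every zone returns `None`, which an educator could then flag as "out of rules".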
it scales on mobile devices as well, allowing users to store and submit pictures directly from the field (if they have an internet connection). figure shows the different panels of the web interface. figure a shows the "store" panel, where students can store pictures, before submitting them for evaluation. this allows students from the same group to share and visualise their pictures. at this step, students can already assign a quest to the picture, which can be changed later on. they can also assign a geographic region, if this is required by the educators. a default region will be automatically proposed, based on the metadata of the picture. figure b shows the "submit" panel. at this stage, students see all the pictures from their group. they can select a stored picture, assign it to a quest and submit it for evaluation. groups can only submit one picture for each quest. or not as well as their validation status. in the same panel, students can also see the global scores of each group taking part in the activity. this adds a strong gamification aspect to the activity. figure d shows the "quests" panel. in that panel students can navigate through the different quests proposed by the educators. they can sort them by groups, subgroups or rewards. in this panel, no explanation is given for the different quests. for instance if the quest is "find an achene", we do not define achene. this is done by design. we want students to look up the different biological terms by themselves. we do provide them with resources to do so. when an educator logs into the web application, the "quests" panel becomes the "admin" panel. in this panel, educators can follow the evolution of the activity ( fig. a) , change the activity parameters ( fig. b ) or correct the student submissions ( fig. c) . depending on the number of participating students and allowed submissions, the number of corrections can quickly become quite large.
therefore we designed the corrections interface to be fast and efficient. the educator first chooses one quest to correct. he·she will in spring , we organised the activity with a roster of first year bachelor students from the bioengineering faculty of the uclouvain (belgium). students were spread in groups (it was therefore set up as an individual activity). although students had to do the activity individually, we encouraged them to discuss the different quests and collect them together, as long as everyone took their own pictures. each group was allowed to submit a maximum of pictures. quests were created, divided in plant quests and animal quests. specific restrictions were added to the game. a minimal number of animal and plant quests had to be collected by each group. groups were also asked to the activity started on february . we had to pause the activity for days at the beginning of the lockdown due to the covid crisis. during that pause, we implemented the peer-evaluation in the web interface (it was not part of the interface initially). the activity resumed on the d of april and finished on the th of may. for the second phase of the activity, during the lockdown, all restrictions (quest groups and zones) were lifted as many students had returned to their home far away from the campus. at the end of the activity, we sent an anonymous feedback form to the students and received answers. a total of pictures were submitted by students during the activity. figure shows the distribution of the pictures submitted by the students during the activity. figure a & b show the difference before and after the lockdown imposed during the covid- crisis. before the lockdown, as we asked students to take pictures around the university, most of them were taken in louvain-la-neuve. during the lockdown, almost no pictures were taken in louvain-la-neuve, as students went back home. the lockdown reduced the number of collected pictures, but did not stop the collection.
this is due to several reasons. at the beginning of the activity, we encouraged students to look for quests in groups, to foster peer-learning between them. this was not possible anymore during the lockdown. the collection of biological data was also influenced by the direct surroundings of the students. students living in an urban area were potentially at a disadvantage compared to students in the countryside. however, because we included the photo quiz module at the beginning of the lockdown, every student could continue the activity. figure shows, for every group, the proportion of points acquired either with the quests collection or the photo quiz. we can see that the dual system allowed students to choose different strategies, to adapt to their individual lockdown conditions. we also observed a strong trend toward the collection of plant-related quests by the students ( fig. c & d) . this is probably due to the fact that, in an urban setup, plants are easier to find than animals. for an inexperienced naturalist, it is also probably easier to take pictures of plants than of animals, which have a tendency to escape. all the pictures can be viewed interactively at the address http:// .quovidi.xyz overall, we observed a high correctness in the students' picture submissions ( fig. a) . for the treasure hunt and the picture collection, only % and % of the quests (for the animal and plant, respectively) were assessed as incorrect by us. one reason for such a high accuracy from the students might be the high level of engagement required by the activity. they have to learn the vocabulary and discuss with other students, and go outside, often in groups, to find what they have identified as appropriate for a quest submission. in the icap framework [ ] , we believe this corresponds to the "interactive learning" level, enabling the highest learning capabilities. interestingly, we also observed a much lower accuracy for the photo quiz ( fig. b ).
for that activity, % and % of the evaluations by the students (for the animal and plant, respectively) were incorrect. this can be due to several factors. first, contrary to the treasure hunt itself, the evaluation activity requires a lower level of engagement from the student. the activity is indeed "reduced" to clicking on a button in front of a computer screen. second, depending on the quality of the picture to evaluate, said evaluation could be challenging. we tried to keep only good pictures for that activity, but the quality remained nonetheless variable. overall, the activity was very well appreciated by the students. with a few exceptions, students liked going outside to observe their surroundings and collect the quests. in a survey performed after the activity ( fig. ), students reported that they liked the activity and felt that they had learned during it. many students spontaneously expressed their enthusiasm for this activity (tab. ). selected comments from the students: "great activity to learn new concepts and look at our environment in a different way. " "i think the game is fun and interactive, it's a great way to learn by seeing things "in real life" and also to decipher the quests. " "very nice way to propose the course, it pushes the students to discover the surrounding nature in a playful way. " the quovidi platform was created for several reasons. we wanted students to learn and know specific plant and animal vocabulary, but we did not want to just give them a list of words to be memorized and repeated. we also wanted them to explore and learn to observe their direct environment. we wanted to show them that you do not need to go to a tropical forest to be able to see a great diversity of plant and animal forms and species. we wanted to spark a strong interest in their surrounding natural world. finally, we were also working with strong practical constraints.
we needed to design an activity that was scalable for hundreds of students, without the need to increase the number of educators. this was possible, thanks to the current technologies (camera, mobile network and gps localisation) available in almost every mobile phone. with the creation of the web-platform for quovidi, we have met all those goals. the treasure hunt (and to a lesser extent the photo quiz) strongly motivates students to learn and remember the different technical terms used in the quests. then they have to apply these new terms directly in the field. the gamification process (quests, score points, personal progress panel and scoreboard between all the groups) is also a strong incentive to engage in the activity. the activity is also highly scalable. the number of participants is, from a technical point of view, only limited by the capacity of the server on which the platform is installed. the main limitation remains the expert correction step. as every single picture needs to be validated, the evaluation can quickly require a lot of time from the educator, even though we tried to make the process as efficient as possible. we hope in the future that the platform would benefit from advances in artificial intelligence algorithms to help correct the images (see below). finally, the activity is completely decentralised, which has been a great asset during the covid- crisis. students can collect quests at any time and place, making it easy to adapt to every individual situation. if they cannot go outside, or are not in a nature-rich environment, they can still participate in the activity via the peer evaluation module. from the educator point of view, all the management and corrections can be done from anywhere, as long as they have access to a computer and an internet connection. as such, the platform was a real asset during the lockdown period ( march to of june in belgium), as it enabled us to continue the activity almost seamlessly. 
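the game mechanics discussed here (a reward per quest, a single counted submission per quest, expert feedback, and an optional penalty when too few animal or plant quests are collected) can be summarised in a small python sketch. the reward values, the penalty and the choice to count only validated submissions toward the minimum are assumptions made for illustration; the platform itself is written in r and may apply these rules differently.

```python
# Feedback labels that earn the quest reward (labels taken from the text).
ACCEPTED = {"correct", "correct and nice picture"}


def group_score(submissions, rewards, categories,
                min_per_category=0, penalty=0):
    """Compute one group's treasure hunt score (hypothetical rules).

    submissions: list of (quest, feedback) in submission order
    rewards: dict quest -> points
    categories: dict quest -> quest group, e.g. "animal" or "plant"
    min_per_category, penalty: assumed game parameters
    """
    seen = set()
    total = 0
    validated = {}
    for quest, feedback in submissions:
        if quest in seen:
            continue  # only one picture per quest is counted
        seen.add(quest)
        if feedback in ACCEPTED:
            total += rewards[quest]
            category = categories[quest]
            validated[category] = validated.get(category, 0) + 1
    for category in set(categories.values()):
        # Assumption: the minimum applies to validated submissions.
        if validated.get(category, 0) < min_per_category:
            total -= penalty
    return total


# Illustrative reward values only.
rewards = {"achene": 3, "siphonaptera": 5, "aposematism": 4}
categories = {"achene": "plant", "siphonaptera": "animal",
              "aposematism": "animal"}
submitted = [
    ("achene", "correct"),
    ("siphonaptera", "incorrect"),
    ("achene", "correct and nice picture"),  # duplicate quest, ignored
    ("aposematism", "not visible"),
]
score_plain = group_score(submitted, rewards, categories)
score_with_penalty = group_score(submitted, rewards, categories,
                                 min_per_category=1, penalty=2)
```

here the group earns the achene reward once; with a minimum of one validated quest per category it also loses the penalty for having no validated animal quest.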
similarly to citizen science projects, the use of our platform allows the collection of large numbers of geotagged, dated images of plant and animal structures. by helping create such a database over the years, the students are taking an active role in creating a valuable research resource. this in itself is viewed by the students as a motivational element of the activity. such databases could be re-used in different ways. from an educational point of view, the images collected could be used to create a quiz to rehearse the vocabulary the following year. the students would therefore create their own teaching and rehearsal material. an example of a quiz created with the students' pictures is visible here: http://quiz.quovidi.xyz . from a research point of view, an ever growing database of annotated plant and animal pictures (describing either organ, species or groups), on a limited and well defined area would be a valuable resource. as each record of the database has been validated by an expert (the educators), such a database could be used in research projects. another interesting use of the database would be to train deep learning recognition algorithms. given the size and potential growth of the database, it will be an interesting resource to train machine learning models to recognise plant and animal structures. such models could, in turn, be integrated into the platform to help with the correction. so far, we have used the quovidi framework within a single classroom (even if it was a very large one). since the activity is run entirely online, we could imagine collaboration between remote classrooms. students from different regions, countries or continents could participate in the same activity, hence increasing the degree of diversity of the observations. here we exemplified the use of our platform with a biological treasure hunt. students were asked to find, in the field, plant and animal structures.
however, due to its flexibility, the platform could be used to organise large scale treasure hunts in any context. it could be used in architecture, design or geology classrooms, with quests related to different building structures, street art or rocks, respectively. it could be used with children, with simplified quests, or with more advanced students, with more complex ones. in short, we expect the concept could be used in any context dealing with structures present in the "outside" world. we presented in this manuscript a new open-source web platform for the organisation of large treasure hunts, quovidi. during the spring , in the midst of the covid- crisis, we successfully used the quovidi platform with more than students, and allowed the collection of more than geotagged plant and animal pictures. the decentralised nature of the platform enabled us to ensure continuity in our teaching, despite the nation-wide lockdown. we expect quovidi to be of interest for any teaching activity focused on the identification of real-world structures. quovidi is available at the address http://www.quovidi.xyz active learning increases student performance in science, engineering, and mathematics translating the icap theory of cognitive engagement into practice the icap framework: linking cognitive engagement to active learning outcomes increased structure and active learning reduce the achievement gap in introductory biology kapp bkm. games, gamification, and the quest for learner engagement.
in: main [internet the effect of gamification on motivation and engagement create dashboards with shiny: web application framework for r shinywidgets: custom inputs widgets for shiny twitter bootstrap components for shiny shiny ui widgets for small screens dbi: r database interface sqlite" interface for r advanced graphics and image-processing in r exif image data in r easily install and load the "tidyverse dates and times made easy with lubridate streamlined plot theme and plot annotations for "ggplot formattable: create "formattable" data structures dt: a wrapper of the javascript library "datatables tools for splitting, applying and combining data leaflet: create interactive web maps with the javascript "leaflet" library feel: a french expanded emotion lexicon. language resources and evaluation apache license, version . | open source initiative quovidi, then called biogo, was one of the laureates of the "prix wernaers pour la vulgarisation scientifique" in . key: cord- - s authors: bhadra, sanchita; maranhao, andre c.; ellington, andrew d. title: one enzyme reverse transcription qpcr using taq dna polymerase date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: s taq dna polymerase, one of the first thermostable dna polymerases to be discovered, has been typecast as a dna-dependent dna polymerase commonly employed for pcr. however, taq polymerase belongs to the same dna polymerase superfamily as the moloney murine leukemia virus reverse transcriptase and has in the past been shown to possess reverse transcriptase activity. we report optimized buffer and salt compositions that promote the reverse transcriptase activity of taq dna polymerase, and thereby allow it to be used as the sole enzyme in taqman rt-qpcr reactions. we demonstrate the utility of taq-alone rt-qpcr reactions by executing cdc sars-cov- n , n , and n taqman rt-qpcr assays that could detect as few as copies/µl of input viral genomic rna.
rt-qpcr remains the gold standard for the detection of sars-cov- . however, the complexity of this assay sometimes limits its use in either resource-poor settings or in circumstances where reagent availability has become limited. to overcome these limitations, we have previously advocated the use of a thermostable reverse transcriptase (rt) / dna polymerase (dnap), which we have termed rtx (and which is distinct from rtx, from new england biolabs) for use as the rt component of rt-qpcr. rtx is an evolved variant of the high-fidelity, thermostable dna polymerase from thermococcus kodakaraensis (kod dnap) with relaxed substrate specificity allowing it to perform as both a dna-and rna-directed dna polymerase . rtx has been shown to serve as the rt component in standard taqman probe-based rt-qpcr and as the sole enzyme component for dye-based rt-qpcr (https://www.biorxiv.org/content/ . / . . . v ). the basis for the directed evolution of rtx is that many dna polymerases have reverse transcriptase activity, with some of them, such as the polymerase from thermus thermophilus (tth), having fairly substantial activity . this is a discovery that appears to resurface every so often, but that can be traced back to . the use of a single enzyme for both reverse transcription and dna polymerization can simplify molecular diagnostic assays, including isothermal amplification assays . in the midst of reagent supply issues, this led us to wonder whether it might prove possible to use readily available thermostable dna polymerases as single enzyme solutions for rt-qpcr. we find that a relatively simple buffer ("gen a") allows robust detection of sars-cov- in taqman probe-based rt-qpcr. we have validated this buffer mix with taq dnap from several commercial sources. all chemicals were of analytical grade and were purchased from sigma-aldrich (st. louis, mo, usa) unless otherwise indicated. 
all enzymes and related buffers were purchased from new england biolabs (neb, ipswich, ma, usa), thermo fisher scientific (waltham, ma, usa), or promega (madison, wi, usa) unless otherwise indicated. all oligonucleotides and taqman probes ( table ) were obtained from integrated dna technologies (idt, coralville, ia, usa) . sars-cov- n gene armored rna was obtained from asuragen (austin, tx, usa). sars-cov- viral genomic rna was obtained from american type culture collection (manassas, va, usa). rt-qpcr assays were assembled in a total volume of µl containing the indicated buffer at x strength ( table ). the buffer was supplemented with . mm deoxyribonucleotides (dntp), nm each of forward and reverse pcr primer pairs, nm of the taqman probe, and . units of taq dna polymerase from indicated commercial vendors. indicated copies of sars-cov- viral genomic rna, sars-cov- n gene armored rna or rnasep armored rna prepared in te buffer ( mm tris-hcl, ph . , . mm edta, ph . ) immediately prior to use were added to rt-qpcr reactions containing corresponding pcr primers. negative control reactions did not receive any specific templates. amplicon accumulation was measured in real-time by incubating the reactions in a lightcycler qpcr machine (roche, basel, switzerland) programmed to hold °c for min followed by °c for min prior to undergoing cycles of °c for sec and °c for sec. taqman probe fluorescence was measured during the amplification step ( °c for sec) of each cycle using the fam channel. in some experiments, the initial reverse transcription step ( °c for min) was eliminated and the reactions were directly subjected to denaturation at °c for min followed by cycles of pcr amplification as described above. in some experiments, rt-qpcr tests were subjected to a heat kill step prior to reverse transcription by incubating the reactions at °c for min. in some experiments, the rna templates were treated with dnase i prior to rt-qpcr analysis. 
briefly, x copies/µl ( x total copies) of armored n gene rna or x copies/µl ( total copies) of sars-cov- viral genomic rna were incubated with . units of dnase i for min at °c. dnase i was then inactivated by adding edta to a final concentration of mm and heating at °c for min. while taq dna polymerase has previously been shown to possess reverse transcriptase activity, it is not commonly thought of as an enzyme that would be readily used for reverse transcription in many assays. to verify this activity and to determine whether taq polymerase-mediated reverse transcription might be leveraged for single-enzyme rna detection, we carried out the cdc-approved sars-cov- -specific n taqman rt-qpcr assay using only taq dna polymerase and its accompanying commercial reaction buffer, thermopol (new england biolabs), seeded with different copy numbers of n gene armored rna (asuragen), a commercial template preparation that is devoid of dna. even though the only polymerase present in these reactions was taq dna polymerase (neb), and no dedicated reverse transcriptase was added, amplification curves were generated in response to x , x , and x copies of the sars-cov- n gene armored rna templates (figure ). we hypothesized that the buffers in which taq dna polymerase is commonly used have been optimized for dna amplification and likely would not support robust reverse transcription. in addition, previous work had explored optimization of buffer conditions for taq stoffel dna polymerase (i.e., "klentaq"). we therefore undertook a series of buffer optimizations in which we sequentially varied buffer ph, tris concentration, the concentration of monovalent cations ((nh ) so and kcl) and divalent cations (mgso ), and the concentration of the non-ionic detergent triton x- (table ). each variable was chosen because of its known effect on the reaction. the optimum ph for taq dna polymerase activity is reported to be between . and .
, with highest activity observed in tris-hcl-based buffer at ph . , and increasing ph is one of the factors that decreases taq fidelity. monovalent cations, such as k + , are known to stimulate the catalytic activity of taq dna polymerase, with optimal activity being observed at about mm kcl, and higher ionic strength can promote primer annealing. ammonium ions (nh + ), on the other hand, may have a destabilizing effect, especially on weak hydrogen bonds between mismatched primers and templates. the divalent cation mg + is essential for the catalytic activity of taq dna polymerase, and its concentration is frequently varied to obtain optimum amplification. finally, the non-ionic detergent triton x- is thought to reduce nucleic acid secondary structure and may influence rt-pcr specificity. when sars-cov- n taqman rt-qpcr assays were performed in the generation buffers a, b, and c, taq polymerase generated distinct amplification curves with -fold improvement in sensitivity relative to the commercial thermopol buffer (figure ). in fact, in generation , buffer b, amplification curves were observed with as few as copies of armored rna templates. to further refine these parameters, we tested the same taqman rt-qpcr assay in four generation buffers with lowered tris concentrations that matched that of the thermopol buffer. in addition, mgso was replaced with mgcl , a more commonly used source of mg + ions in pcr. however, the reduction in tris concentration caused a significant drop in rt-qpcr sensitivity compared to both generation and thermopol buffers (figure ). therefore, the tris concentration and ph in x buffers were held constant at mm and ph . from generation onwards. in generations - we sequentially varied the concentration of (nh ) so from to mm, kcl from to mm, mgcl from to mm, and triton x- from to . %, and arrived at an optimized generation buffer a (gen a) that contained mm tris, ph . , mm (nh ) so , mm kcl, mm mgcl , and no triton x- .
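the sequential optimization described above is a one-factor-at-a-time search: one buffer component is varied while the others are held at the current best values, and the best level is carried into the next round. a minimal python sketch of that bookkeeping follows. the function `ofat_optimize`, the component names, the candidate levels, the starting recipe, and the `toy_score` function (which stands in for a measured readout such as the ct at a fixed template input, lower being better) are all hypothetical placeholders in arbitrary units, not values or methods from this study.

```python
from typing import Callable, Dict, List

Recipe = Dict[str, float]


def ofat_optimize(
    start: Recipe,
    levels: Dict[str, List[float]],
    score: Callable[[Recipe], float],
) -> Recipe:
    """One-factor-at-a-time search; a lower score is better (e.g. a Ct value)."""
    best = dict(start)  # copy so the caller's starting recipe is not modified
    for component, candidates in levels.items():
        trials = []
        # Always re-test the current level so a round can never make things worse.
        for value in [best[component], *candidates]:
            trial = dict(best)
            trial[component] = value
            trials.append((score(trial), value))
        best[component] = min(trials)[1]  # keep the level with the lowest score
    return best


def toy_score(recipe: Recipe) -> float:
    """Hypothetical readout: squared distance from an arbitrary 'ideal' recipe."""
    target = {"ammonium_sulfate": 2.0, "kcl": 3.0, "mgcl2": 1.0, "triton": 0.0}
    return sum((recipe[name] - target[name]) ** 2 for name in target)


# Arbitrary units, chosen only so the demo has a known answer.
start = {"ammonium_sulfate": 0.0, "kcl": 0.0, "mgcl2": 0.0, "triton": 1.0}
levels = {
    "ammonium_sulfate": [1.0, 2.0, 3.0],
    "kcl": [1.0, 3.0, 5.0],
    "mgcl2": [1.0, 2.0],
    "triton": [0.0, 0.5],
}
print(ofat_optimize(start, levels, toy_score))
```

a search of this kind is cheap in the number of reactions but cannot detect interactions between components, which is one reason re-optimization may be needed when the template or assay changes.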
robust amplification curves were generated and as few as copies/µl ( copies total) of the rna template could be detected when gen a buffer was used. to confirm this novel application of taq dna polymerase, we tested two different commercial sources of taq dna polymerase, neb and thermo fisher, using additional rna templates and taqman rt-qpcr assays: (i) cdc n , n , and n taqman rt-qpcr assays of sars-cov- genomic rna purified from infected cells (atcc) and (ii) cdc rnasep taqman rt-qpcr assay of rnasep armored rna. similar to our results with armored n gene rna, the neb taq dna polymerase was able to perform taqman rt-qpcr analysis of sars-cov- viral genomic rna with all three cdc assays (figure ). as expected, gen a buffer improved rt-qpcr performance of taq dna polymerase allowing detection of as few as copies of viral rna in all three n gene assays. the rt-qpcr ability was not restricted to the neb taq dna polymerase. taq dna polymerase from a different commercial vendor, thermo fisher, could also support rt-qpcr analysis of viral genomic rna. similar to the neb enzyme, thermo fisher taq dna polymerase also demonstrated better activity in gen a buffer and was able to detect viral genomic rna using all three n gene assays, albeit with a higher detection limit (figure ) . both neb and thermo fisher taq dna polymerases were also able to perform taqman rt-qpcr analysis of rnasep armored rna using the cdc assay (figure ) . these results suggest that taq dna polymerase can support taqman rt-qpcr analyses of rna in one-enzyme reactions. table . amplification curves resulting from x (black traces), x (red traces), x (blue traces), x (pink traces), (green traces), and (gray) copies of sars-cov- n gene armored rna are depicted. taqman rt-qpcr analysis of sars-cov- viral genomic rna and rnasep armored rna using taq dna polymerase-based one-enzyme assays. 
cdc sars-cov- n gene assays, n , n , and n , and rnasep assay were performed using taq dna polymerase from either neb (panels a-h) or thermo fisher (panels i-p). assays were performed either using the companion commercial buffer (panels a-d and panels i-l) or using gen a buffer (panels e-h and panels m-p). amplification curves from (black traces), (red traces), (blue traces), (pink traces), and (gray traces) copies of viral genomic rna are depicted in panels a-c, e-g, i-k, and m-o. amplification curves from x (black traces), x (red traces), x (blue traces), x (pink traces) and (gray traces) copies of armored rnasep rna are depicted in panels d, h, l, and p. to further demonstrate that the reverse transcriptase activity is inherent in taq polymerase itself, we incubated rt-qpcr assays at °c for min prior to reverse transcription, which should inactivate any contaminating mesophilic reverse transcriptases (supplementary figure ). taq polymerase was still fully capable of rt-qpcr. the armored rna templates and viral genomic rna templates used in these studies are theoretically devoid of dna templates. to demonstrate that the taqman rt-qpcr amplification signals generated by taq dna polymerase are not due to amplification of contaminating dna templates, we treated the rna templates with dnase i prior to rt-qpcr amplification. as shown in figure , taq dna polymerase-mediated taqman rt-qpcr assays generated amplification signals from both dnase-treated genomic rna and armored rna. this was true not only of neb and thermo fisher taq dna polymerase but also of a preparation of taq dna polymerase purchased from promega (data not shown). while n and n assays could still detect the smallest template quantities tested ( copies of viral genomic rna or copies of armored n rna), most amplification curves showed a small increase of about - ct in their time to detection (figure ).
in contrast, the n assay demonstrated anywhere between a and ct delay with dnase treatment of rna templates, and failed to generate signal from the lowest template concentrations. to confirm that the reverse transcription step in the taqman rt-qpcr assay was indeed necessary in order to generate amplification curves in response to template rna, we executed cdc sars-cov- n , n , and n assays using neb or thermo fisher taq dna polymerases without performing the reverse transcription step prior to qpcr amplification. eliminating the min incubation at °c (the putative reverse transcription step) prior to pcr thermal cycling also eliminated robust amplification of viral rna templates (supplementary figure ). the taqman probe signal remained at background levels in all reactions except the n assays performed in gen a buffer. even in this case, amplification curves were significantly delayed and only apparent in the presence of relatively high numbers of rna templates. for instance, taq obtained from neb detected and copies of viral genomic rna (with ct values of . and . , respectively). these same copy numbers of template rna, when subjected to a taq-mediated reverse transcription step prior to qpcr amplification, typically yielded ct values of and (figure ). these results demonstrate that taq polymerase can generate amplicons from rna during a normal thermal cycling reaction, and that a pre-incubation step greatly improves detection. figure . effect of dnase i treatment on taq dna polymerase-mediated rt-qpcr assay. taq dna polymerase purchased from neb was used to run cdc sars-cov- n , n , and n taqman rt-qpcr assays using sars-cov- viral genomic rna (panels a-c) or n gene armored rna (panels d-f) treated with dnase i. amplification curves shown in panels a-c resulted from (black traces), (red traces), (blue traces), (pink traces), and (gray traces) copies of sars-cov- genomic rna.
amplification curves in panels d-f resulted from , (black traces), , (red traces), (blue traces), (pink traces) and (gray traces) copies of n gene armored rna. representative ct values for rt-qpcr amplification of indicated copies of untreated and dnase i treated sars-cov- genomic rna and n gene armored rna are tabulated. these results suggest that, given the correct buffer conditions, taq dna polymerase can perform reverse transcription and taqman qpcr in one-pot reactions. while the use of a proficient reverse transcriptase along with a proficient thermostable dna polymerase may still be the best option for many applications, the ability to use only a single enzyme in assays for the detection of sars-cov- rna opens the way to new diagnostic approaches, especially in resource-poor settings where reagent availability or production may be issues. however, it is also evident from our experiments that the variability in amplification efficiency observed as buffer conditions were iteratively improved may mean that similar buffer optimizations will need to be routinely carried out with other templates. it is nonetheless possible that the reverse transcriptase activity of taq dna polymerase, once fully optimized, will prove useful in many different applications, up to and including those where many different templates are present, such as rnaseq. 
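the ct shifts reported in the results (for example after dnase i treatment, or when the pre-incubation step is omitted) can be read as approximate fold-changes in effective template. the sketch below assumes ideal exponential amplification, in which product doubles every cycle (an efficiency of 1.0); real assays often fall short of this, so the efficiency is exposed as a parameter. the function name and the example ct values are illustrative assumptions, not data from this study.

```python
def fold_change_from_delta_ct(
    ct_reference: float,
    ct_test: float,
    efficiency: float = 1.0,
) -> float:
    """Apparent fold reduction in effective template implied by a Ct delay.

    Product is assumed to grow by a factor of (1 + efficiency) per cycle,
    so a delay of d cycles corresponds to (1 + efficiency) ** d less template.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    delta_ct = ct_test - ct_reference
    return (1.0 + efficiency) ** delta_ct


# Hypothetical example: a 3-cycle delay at perfect efficiency.
print(fold_change_from_delta_ct(ct_reference=25.0, ct_test=28.0))
```

under these assumptions a delay of about three cycles corresponds to roughly an eight-fold loss in effective template, while a delay of ten cycles corresponds to roughly a thousand-fold loss.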
synthetic evolutionary origin of a proofreading reverse transcriptase reverse transcription and dna amplification by a thermus thermophilus dna polymerase reverse transcription of mrna by thermus aquaticus dna polymerase innate reverse transcriptase activity of dna polymerase for isothermal rna direct detection reverse transcription, amplification and sequencing of poliovirus rna by taq dna polymerase reverse transcription and direct amplification of cellular rna transcripts by taq polymerase fusion of taq dna polymerase with single-stranded dna binding-like protein of nanoarchaeum equitans-expression and characterization deoxyribonucleic acid polymerase from the extreme thermophile thermus aquaticus dna polymerase fidelity and the polymerase chain reaction a mechanism for all polymerases this work was supported by grants from the welch foundation (f- ), the national institutes of health ( -r -eb - a ), and the national aeronautics and space administration (nnx af g). key: cord- - yc ok authors: siddiqui, shoib s.; dhar, chirag; sundaramurthy, venkatasubramaniam; sasmal, aniruddha; yu, hai; bandala-sanchez, esther; li, miaomiao; zhang, xiaoxiao; chen, xi; harrison, leonard c.; xu, ding; varki, ajit title: acidosis, zinc and hmgb in sepsis: a common connection involving sialoglycan recognition date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yc ok blood ph is tightly regulated between . - . , with values below . during sepsis being associated with lactic acidosis, low serum zinc, and release of proinflammatory hmgb from activated and/or necrotic cells. using an ex vivo whole blood system to model lactic acidosis, we show that while hmgb does not engage leukocyte receptors at physiological ph, lowering ph with lactic acid facilitates binding. at normal ph, micromolar zinc supports plasma sialoglycoprotein binding by hmgb , which is markedly reduced when ph is adjusted with lactic acid to sepsis levels. 
glycan array studies confirmed zinc- and ph-dependent hmgb binding to sialoglycans typical of plasma glycoproteins. thus, proinflammatory effects of hmgb are suppressed via plasma sialoglycoproteins until drops in ph and zinc release hmgb to trigger downstream immune activation. significance statement: hmgb sequestered by plasma sialoglycoproteins at physiological ph is released when ph and zinc concentrations fall in sepsis. the ph of body fluids in healthy individuals spans a very broad range in different tissue types and organs, ranging from ph . (stomach contents) to . (urine). human cells in tissue culture can also tolerate a wide range of ph values. in contrast, blood ph is tightly regulated between . - . ( ) , and departure from this range (acidosis or alkalosis) can be very detrimental. for example, in the recent covid- pandemic, % of non-survivors had acidosis, compared to % among survivors ( ) . acidosis in sepsis is partly due to lactic acid release from anoxic tissues, which overwhelms the buffering capacity of circulating blood ( ) . a "cytokine storm" of proinflammatory mediators in sepsis triggers a cascade of destructive outcomes such as multiple organ failure ( ) ( ) ( ) ( ) ( ) as currently seen in severe cases of covid- infection ( ) . the mechanisms underlying lethality associated with low blood ph are not clear, but include low zinc levels and release from apoptotic or necrotic cells of hmgb , a damage-associated molecular pattern (damp) defined as one of the late mediators of sepsis, which further upregulates many other proinflammatory cytokines ( ) ( ) ( ) . importantly, a recent study indicates hmgb levels are strongly associated with mortality in patients infected with sars-cov- ( ) . here we show that sialylated plasma glycoproteins bind hmgb to suppress its ability to promote inflammatory responses in a zinc- and ph-dependent manner. this finding provides an avenue for developing a new therapeutic strategy for treating sepsis.
mimicking lactic acidosis ex vivo in hirudin-anticoagulated whole blood. in vivo studies of acidosis and sepsis involve many complex factors and interactions. on the other hand, ex vivo reconstitution of purified blood components can result in artifacts, e.g., neutrophils get activated when separated away from erythrocytes and plasma ( ) . to study the significance of tightly regulated blood ph ex vivo, we sought to create a whole blood system mimicking lactic acidosis. conventional anticoagulation with edta or citrate abrogates divalent cation functions, and heparin has many biological effects independent of anticoagulation. we have previously shown that the leech protein hirudin can be used to obtain whole blood anticoagulation in vitro ( ) . when lactic acid was added to freshly collected hirudin-anticoagulated whole blood, the ph first rose until a concentration of about mm lactic acid was reached. further addition then caused a sharp drop in blood ph. such an initial rise in blood ph followed by a subsequent drop is seen in patients with sepsis ( ) . to further develop this model, we introduced hmgb , a damp ( ) ( ) ( ) associated with poor prognosis in late sepsis ( , ) . which is partially attenuated by an hmgb -blocking antibody: cd b expression was determined by flow cytometry after incubating whole blood with/without hmgb ( µg/ml). a) neutrophils are activated when incubated with hmgb in whole blood at ph . (chromatograms: red-control, blue-whole blood at ph . , orange-whole blood at ph . , green-whole blood at ph . with hmgb , cyan-whole blood at ph . with hmgb ). b) activation is partially attenuated with an hmgb -blocking antibody ( µg/ml) (chromatograms: red-isotype control, blue-whole blood at ph . , green-whole blood at ph . with hmgb , orange-whole blood at ph . with hmgb and an hmgb -block neutrophils in whole blood are activated by hmgb at low ph due to better binding, and activation is attenuated with an hmgb blocking-antibody. 
interaction of hmgb with toll-like receptors (tlrs) during sepsis is well documented ( ) . the proinflammatory activity of hmgb is due to binding to targets such as tlr- , tlr- , tlr- and rage that are expressed on leukocytes and endothelial cells ( , ) . we, therefore, introduced exogenous hmgb into our whole blood acidosis model and tracked cd b expression on neutrophils, as a sensitive marker of activation triggered by hmgb . increased neutrophil activation was noted when hmgb was incubated with whole blood at low ph as compared to physiological ph ( figure a ). this effect was partially attenuated by adding hmgb blocking antibody ( figure b) . enhanced activation at low ph coincides with increased hmgb binding to neutrophils and monocytes (compare upper and lower panels of figure a and b). thus, physiological blood ph limits interaction of hmgb with leukocyte receptors, suggesting natural inhibitor(s) of hmgb interaction in blood. looking for candidate inhibitors, we noted earlier evidence that hmgb can interact with cd and cd , two heavily sialylated proteins ( , ) in a trimolecular complex with siglec- , a known sialic acid-binding protein. cd -fc bound specifically to the proinflammatory box b domain of hmgb , and this, in turn, promoted binding of the cd n-linked glycan sialic acid with . furthermore, sialidase treatment abolished cd binding to hmgb , indicating that it might be a sialic acid-binding lectin. since normal blood plasma contains ~ mm sialic acid attached to glycans on plasma proteins, ( ) , we hypothesized that the unknown natural inhibitor might be the sialome (the total sum of all sialic acids presented on plasma glycoproteins). ability of hmgb to bind to different cell types of the blood (erythrocytes, monocytes and neutrophils) was determined by using different concentrations ( ng/ml, ng/ml and µg/ml) of hmgb at physiological conditions. 
b) different cell types of blood were used for binding with hmgb ( ng/ml) at physiological and lower ph (ph . , adjusted with lactic acid). among divalent cations, only zinc supported the robust binding of hmgb with sialylated glycoproteins at physiological ph. the binding buffer used in prior hmgb studies included millimolar concentrations of manganese cation (mn + ), a feature likely carried over from the unrelated function of nuclear hmgb binding to dna. looking at earlier studies of the interaction of hmgb with cd and cd , we noticed that all those experiments were performed in a buffer containing millimolar mn + concentrations ( , ( ) ( ) ( ) . these concentrations were very high in comparison with the physiological levels of mn + in the blood ( - µg/l). we predicted that there might be other divalent cation(s) that are better co-factor(s) for hmgb and facilitate its binding with sialic acids. indeed, upon testing micromolar concentrations of many divalent cations, we found that only zinc cation (zn + ) supported robust binding with sialylated glycoproteins ( figure a ). we tested α -acid glycoprotein and '-sialyllactose as binding partners for hmgb in the presence of different cations and again found that only zn + facilitated binding. there was a modest binding of '-sialyllactose with hmgb in the presence of mn + , but robust binding was only seen with zn + -containing buffer ( figure b ). replacing plasma with buffer at physiological ph allows hmgb to activate neutrophils, suggesting sequestration by plasma sialoglycoproteins. we next asked which whole blood components were preventing neutrophil activation under physiological conditions. hirudin-anticoagulated whole blood at physiological ph was spun down and plasma either replaced with hepes buffer (ph . ) supplemented with zn + or with the same plasma that had been removed. 
after incubating with hmgb , neutrophils were in a more activated state when incubated in the buffer as compared to when plasma was added back ( figure a ). independent studies have shown that hmgb binds to sialic acid on glycoproteins ( , ) and we posited that the ~ mm bound sialic acid present on plasma glycoproteins might lead to sequestration of hmgb under physiological condition. we also tested the effect of ph on the binding of hmgb to α -acid glycoprotein and found that optimal binding was at physiological ph, with less binding at ph . with buffer containing zn + ( figure b ). replacing plasma with a buffer at physiological ph allows hmgb to activate neutrophils a) ml of blood was drawn from a healthy individual and spun down. the plasma was replaced with hepes buffer containing zinc ( µm of zn + ) or plasma was added back. the cd b expression as a marker of neutrophil activation was measured. b) the binding of hmgb to α -acid glycoprotein was checked with a binding buffer using different ph ranging from . to . . sialoglycan array studies of hmgb confirm that it is a sialic acid-binding lectin with optimal binding at physiological blood ph in the presence of zinc cations. we previously reported a sialoglycan microarray platform used to identify, characterize, and validate the sia-binding properties of proteins, lectins, and antibodies ( ) ( ) ( ) . after identifying zn + -dependent hmgb binding to sialoglycoproteins, we next investigated the ability of hmgb to bind with multiple sialoglycans abundantly found in plasma proteins. we performed sialoglycan array studies of hmgb under four different conditions: ) at physiological ph with zn + , ) at physiological ph without zn + , ) at ph . with zn + ) at ph . without zn + . these array studies further confirmed the binding of hmgb with multiple sialylated glycan sequences that are typically found on plasma glycoproteins, in ph-and zn + -dependent fashion ( figure a and b respectively). 
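the four array conditions form a 2 × 2 design (ph × zinc). a minimal sketch of enumerating the conditions in the order listed above, and of summarizing the zinc effect at each ph, follows. the labels "physiological" and "acidic" stand in for the two ph values used in the text, and the replicate signal values in the example are hypothetical placeholders rather than array data from this study.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

PH_LEVELS = ("physiological", "acidic")
ZINC_LEVELS = (True, False)  # with zinc, without zinc

Condition = Tuple[str, bool]


def conditions() -> List[Condition]:
    """All pH x zinc combinations, with pH varying slowest."""
    return list(product(PH_LEVELS, ZINC_LEVELS))


def zinc_effect_by_ph(signals: Dict[Condition, List[float]]) -> Dict[str, float]:
    """Mean signal with zinc minus mean signal without zinc, at each pH."""
    return {
        ph: mean(signals[(ph, True)]) - mean(signals[(ph, False)])
        for ph in PH_LEVELS
    }


# Hypothetical replicate signals in arbitrary fluorescence units.
example_signals = {
    ("physiological", True): [10.0, 12.0],
    ("physiological", False): [1.0, 3.0],
    ("acidic", True): [2.0, 4.0],
    ("acidic", False): [1.0, 3.0],
}
print(conditions())
print(zinc_effect_by_ph(example_signals))
```

a zinc effect that is large at physiological ph and small at acidic ph is the pattern that would indicate joint ph and zinc dependence.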
additionally, we checked the binding of hmgb to sialic acids in the sialoglycan microarray using , and µm concentrations of zn + and observed a dose-dependent effect ( figure b ). this assay showed the relevance of zn + in this binding phenomenon at a physiological concentration (~ µm). on resolving the binding of hmgb at physiological ph and in the presence of zinc, the binding on the microarray was exclusively to sialylated glycans, confirming our findings ( figure c ). a heat map representation of all these findings and hmgb binding to individual glycosides is provided in supplementary figures and respectively. **** represents p-value < . ) heparin, a previously known anionic glycan binding partner of hmgb , does not exhibit ph sensitivity, and zn + only partially facilitates binding. hmgb is known to bind heparin, a heavily sulfated glycan carrying many negatively charged groups ( , ) . we checked the binding of hmgb with heparin at different ph values and found that, unlike binding with sia, it was not ph-sensitive (supplementary figure a) . moreover, there was appreciable baseline binding of hmgb with heparin that only increased partially with zn + supplementation (supplementary figure b) . these data indicate that hmgb binds heparin and sialic acid in very different ways. the b-box of hmgb that mediates sialic acid binding ( ) has three arginine residues ( ) that might be involved in sialic acid recognition. we made single mutants of arginine residues at positions , and . when we checked the sialic acid binding, we could not find any difference between any of the mutants and wt hmgb (supplementary figure ) . we suspect that other positively charged residues and/or multiple arginines mediate sialic acid binding. here we report one plausible explanation for the tight regulation of blood ph between . - . , showing that even a slight reduction to ph .
abolishes the zinc-dependent sequestration of hmgb by plasma sialoglycoproteins, releasing it to bind to activating receptors on neutrophils. hmgb was originally discovered in the cell nucleus ( ) ( ) ( ) ( ) , playing a role in dna bending, replication and transcription ( , ) . much later, hmgb was found to be passively or actively released in conditions like sepsis, leading to inflammation ( , , ) , i.e., it acts as a damp ( ) . hmgb retention inside the nucleus is dictated by conserved lysine residues ( ) . inflammatory stimuli trigger acetylation of these lysine residues and trafficking of hmgb to the cytosol, and eventually to the extracellular space. the different domains of hmgb are box a, box b and an acidic tail. while box a and box b possess many arginine and lysine residues, the acidic tail is enriched with glutamic and aspartic acid residues. box b is proinflammatory whereas box a behaves like an antagonist and mimics an anti-hmgb antibody ( , ) . while tnf-α and il- β are released early during sepsis, hmgb is a late mediator expressed only after about hours and remains at elevated levels before death occurs ( ) . many preclinical studies show protection against sepsis upon injection of blocking antibodies of hmgb or just injection of box a protein ( ) . the proinflammatory activity of hmgb is well studied. however, the anti-inflammatory activity of hmgb has also been documented in multiple studies ( ) ( ) ( ) . recently, it was shown that hmgb binds soluble cd and this complex binds with siglec- on t-cells, leading to shp- (phosphatase) recruitment that dephosphorylates lck and zap , thus activating an anti-inflammatory cascade ( , ) . in addition, haptoglobin ( ) , c q and tim also show anti-inflammatory activity of hmgb ( , ) . in this study, we found that at physiological blood ph, there is no interaction of hmgb with its receptors on leukocytes.
surprisingly, when we lowered the ph using lactic acid (to mimic lactic acidosis, a characteristic feature of sepsis), the interaction was restored. furthermore, the high concentration of sialic acids in plasma glycoproteins was found to be the likely inhibitor of interactions between hmgb and tlrs. we further characterized the role of hmgb as a sialic acid-binding lectin and found that zinc is a required co-factor. moreover, we confirmed all our findings with lipopolysaccharide (lps)-free hmgb and used a glycan array that detected the binding of hmgb with several sialic acid probes (see supplemental table ) in a ph- and zinc-dependent manner. taken together, our findings lead us to propose that under physiological conditions (ph . - . ) and normal zinc concentrations, there is potent binding of hmgb with plasma sialoglycoproteins ( figure a ). under septic conditions, drops in ph and zinc concentration decrease interactions between hmgb and plasma sialoglycoproteins, liberating hmgb to bind with tlrs and enhance inflammation ( figure b ). therefore, proinflammatory and anti-inflammatory activities of hmgb are two sides of the same coin and depend on the prevailing physiological conditions. while the proinflammatory role of hmgb is very well studied, recent studies have reported an anti-inflammatory role for hmgb ( , ( ) ( ) ( ) . the exact mechanism that enables hmgb to switch from its proinflammatory to anti-inflammatory role and vice versa is not very well described. one factor known to enable its switch from being proinflammatory to anti-inflammatory is its oxidative state. the disulfide form of hmgb is proinflammatory, and the sulfonate form is involved in the resolution of inflammation ( ) ( ) ( ) . in the current study, we have identified another mechanism by which hmgb switches from its proinflammatory to anti-inflammatory role in a ph- and zinc-dependent manner.
sepsis is characterized by a decrease in ph and zinc concentration of the blood. we hypothesize that under physiological conditions, hmgb binds with sialoglycoproteins of blood keeping it in a quiescent state. during sepsis, the drop in ph and zinc concentration of the blood leads to disruption of hmgb 's binding with sialic acid, enabling the free hmgb to bind with tlrs and rage present on immune cells and the endothelium. this activates a cascade of the inflammatory response, which if untreated, might lead to multiple organ failure or even death. also consistent with our hypothesis are the findings that survival in mouse models of sepsis can be improved by infusion of soluble cd ( ) , and that the sialic acid binding feature of hmgb is restricted to the disulfide-form of hmgb ( ) , which is expected to be formed when the cytosolic reduced form is released into the oxidizing environment of the bloodstream. we suggest that the potent proinflammatory effects of hmgb are normally kept in check via sequestration by plasma sialoglycoproteins at physiological ph and zinc levels and is triggered when ph and zinc levels fall in the late stages of sepsis. in this regard, it is notable that the acute phase response to inflammation results in high production of hypersialylated molecules such as α -acid glycoprotein from the liver and endothelium, which may then act as a negative feedback loop ( ) ( ) ( ) . current clinical trials that are independently studying zinc supplementation (clinicaltrials.gov identifier: nct nct ) or ph normalization (nct ) may be more successful if these approaches are combined, and perhaps supplemented by infusions of heavily sialylated molecules like cd . additionally, studies evaluating plasma exchange in subjects with septic shock (example nct ) may show superior efficacy if supplemented with zinc infusions and ph correction. pre-clinical studies are presently evaluating a function blocking anti-hmgb antibody ( ) . 
we performed our assays with hmgb purchased from hmg biotech, also produced it in e. coli, and finally confirmed findings using hmgb expressed in freestyle cells. in order to recapitulate the characteristics of hmgb in septic conditions, we used the disulfide-linked form in all our assays. future studies should address whether other post-translational modifications such as acetylation, methylation, phosphorylation or oxidation have any further effect on hmgb 's propensity to bind sialic acids. numerous studies have shown that zinc is protective against sepsis ( - ). additionally, blood zinc levels usually decrease during inflammation because zinc is sequestered to the nucleus, where it is required as a cofactor for expression of proinflammatory genes and proteins ( , , ) . thus, lowering of the zinc level in blood is detrimental. the mechanisms underlying the anti-inflammatory effect of zinc have been extensively studied. these include effects on the microbiome, lowering of nf-κb levels, chemotaxis and phagocytosis by immune cells, anti-oxidative stress and adaptive immune response ( ) . in this regard, it is notable that a recent study also shows the role of zinc, ph and ionic strength in the oligomerization of hmgb ( ). we did not investigate any role of zinc or ph in structural changes or oligomerization of hmgb . it seems that at a particular ph and zinc concentration, a positively charged residue of hmgb is exposed for binding with sialic acid. this residue may not be surface-accessible at lower ph and low zinc concentration. in this study, we could not pinpoint the critical residue that is important for sialic acid binding. hmgb has been reported to bind many ligands, some of which are highly negatively charged molecules such as heparin/heparan sulfate ( ) . we wanted to determine if the interaction of hmgb with sialic acid, which is also negatively charged, is a generic electrostatic charge-based interaction.
therefore, we tested the binding of hmgb with the acidic glycosaminoglycan, hyaluronan, but could not detect any binding (data not shown). upon testing with heparin, we found that while hmgb did bind with heparin, it did not show any ph dependency. moreover, binding was only partially enhanced in the presence of zinc. this suggests that a different set of amino acids might be required for binding to heparin and to sialic acid. notably, under physiological conditions, sialic acid is present in the blood, but the concentrations of other anionic glycans (heparan sulfate, hyaluronic acid etc.) are low. our findings, if confirmed in randomized clinical trials, have broad implications in the management of sepsis and possibly other types of acidosis. sepsis is a significant cause of mortality, with a recent study implicating it as the cause of twice as many deaths as earlier estimated ( ) . these findings are of particular importance in light of the present covid- pandemic and survivorship in these patients. acute respiratory distress syndrome (ards), a deadly complication of sars-cov- and sars-cov- , has been linked with hmgb production ( ) ( ) ( ) . recent articles suggest a potential link between hmgb and the pathogenesis of covid- ( , ) . a recent study showed that hmgb strongly correlates with mortality in covid- patients ( ) . additionally, another recent study showed % of covid- non-survivors had sepsis and % of these had acidosis ( ) . while the surviving sepsis campaign does not suggest the use of convalescent plasma in critically ill patients ( ) , the fda has approved its use as an investigational new drug. a small study of five critically ill covid- patients treated with convalescent plasma showed improvements in sepsis related sofa scores ( ). a clinicaltrials.gov search for "covid" and "convalescent plasma" on april , yielded results of trials ranging from phase to phase . 
while the circulating antibodies are likely to be beneficial on their own, the hmgb -sequestering properties of plasma sialoglycoproteins may also contribute to suppressing the "cytokine storm". these effects are likely to be further enhanced if plasmapheresis is supplemented with aggressive ph correction and zinc supplementation. elisa for binding of hmgb with α -acid glycoprotein or '-sialyllactose: ng- µg of hmgb recombinant protein (hmg biotech) diluted with the binding buffer ( mm hepes, mm nacl, µm and the absorbance was acquired at nm with a plate reader. for the elisa with different divalent cations, the binding buffer was prepared using the particular cation containing salt instead of zncl . each incubation and wash was performed using the respective binding buffer. informed consent was obtained from healthy individuals as per a protocol approved by the ucsd human research protection programs institutional review board and venous blood was collected in hirudin coated tubes (thermofisher catalogue number-nc ). hirudin was chosen as the anticoagulant because edta and heparin interfere with normal bioprocesses (chelation by edta, and binding to and modulation of cell-surface proteins by heparin). the ph of blood, when measured at the start of various assays, varied between . and . and is referred to as the "physiological" ph. to test for neutrophil activation, µl of whole blood was incubated with µg/ml of hmgb for minutes at °c. cd b expression was measured by flow cytometry as described earlier ( , ) . blocking with an anti-hmgb antibody (clone e , biolegend, catalogue number- ) was performed with µg/ml antibody as described earlier ( ) . for plasma addback studies, whole blood was spun down at x g for minutes and the plasma was replaced with hepes buffer supplemented with µm zncl . binding assays were performed with µl of whole blood. the required amount of hmgb ( , , , ng/ml) was added to µl of blood and incubated at °c for minutes with rotation. 
after centrifuging at x g for minutes, the cells were washed with ml of pbs and finally resuspended in µl of facs buffer ( % bsa in pbs with ca + /mg + ) with anti-hmgb antibody ( µg/ml, biolegend, catalogue number- ). the cells were incubated at °c for minutes on ice and were washed with ml pbs (containing ca + /mg + ). the cells were subsequently resuspended in µl of facs buffer with a secondary anti-mouse-apc antibody (biolegend, catalogue number- ). the cells were incubated at °c for minutes on ice and washed with pbs as before. µl was taken from each sample for rbc analysis and the rest of the sample was fixed with % paraformaldehyde (pfa) and incubated on ice for min. the sample was then washed with pbs and subsequently treated with ack lysis buffer (gibco, catalogue number-a - ) to perform analysis of rbcs. the sample was washed and resuspended in µl of facs buffer. monocytes and neutrophils were gated for the analysis using the forward and side scatter profile; surface markers were not used for this gating. chemoenzymatically synthesized sialyl glycans were quantitated utilizing dmb-hplc analysis and were dissolved in mm sodium phosphate buffer (ph . ) to a final concentration of µm. arrayit spotbot ® extreme was used for printing the sialoglycans on nhs-functionalized glass slides (polyan d-nhs slides from automate scientific; catalogue number-po- ). purified mouse anti-hmgb antibody (biolegend; catalogue number- , lot# b ) and cy -conjugated goat anti-mouse igg (jackson immunoresearch; catalogue number- - - ) were used. fresh hepes buffer ( mm hepes, mm nacl ± µm zncl ) was prepared immediately before starting the microarray experiments. the method described in ( ) was adapted to perform the microarray experiment. each glycan was printed in quadruplicate. the temperature ( °c) and humidity ( %) inside the arrayit ® printing chamber were rigorously maintained during the printing process. 
the slides were left for drying for an additional h. printed glycan microarray slides were blocked with pre-warmed . m ethanolamine solution (in . m tris-hcl, ph . ), washed with warm milli-q water, dried, and then fitted in a multi-well microarray hybridization cassette (arrayit, ca) to divide it into subarrays. each subarray well was treated with µl of ovalbumin ( % w/v) dissolved in freshly prepared hepes blocking buffer ± µm of zn + (ph adjusted for individual experiments) for h at ambient temperature in a humid chamber with gentle shaking. subsequently, the blocking solution was discarded, and a solution of hmgb ( µg/ml) in the same hepes buffer (± zn + , defined ph) was added to the subarray. after incubating for hours at room temperature with gentle shaking, the slides were extensively washed (first with pbs buffer with . %tween and then with only pbs, ph . ) to remove any non-specific binding. the subarray was further treated with a : dilution (in pbs) of cy -conjugated goat anti-mouse igg (fc specific) secondary antibody and then gently shaken for hour in the dark, humid chamber followed by the same washing cycle described earlier. the developed glycan microarray slides were then dried and scanned with a genepix b (molecular devices corp., union city, ca) microarray scanner (at nm). data analysis was performed using the genepix pro . analysis software (molecular devices corp., union city, ca). expression and purification of full-length murine his-hmgb in e. coli were performed as described before ( ) . mutagenesis was performed using a quikchange site-directed mutagenesis kit (agilent). 
acid-base homeostasis clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study lactic acidosis sepsis-associated hyperlactatemia novel insights for systemic inflammation in sepsis and hemorrhage sepsis: something old, something new, and a systems view harmful molecular mechanisms in sepsis the immunopathology of sepsis and potential therapeutic targets the covid- cytokine storm; what we know so far targeting hmgb in the treatment of sepsis hmgb , a potent proinflammatory cytokine in sepsis high mobility group protein (hmg- ) stimulates proinflammatory cytokine synthesis in human monocytes elevated serum levels of s a /a and hmgb at hospital admission are correlated with inferior clinical outcomes in covid- patients erythrocyte sialoglycoproteins engage siglec- on neutrophils to suppress activation selectin-mucin interactions as a probable molecular explanation for the association of trousseau syndrome with mucinous adenocarcinomas sepsis and septic shock damage-associated molecular patterns in inflammatory diseases damp-sensing receptors in sterile inflammation and inflammatory diseases complexity of danger: the diverse nature of damage-associated molecular patterns high hmgb level is associated with poor outcome of septicemic melioidosis hmg- as a late mediator of endotoxin lethality in mice hmgb signals through toll-like receptor (tlr) and tlr a critical cysteine is required for hmgb binding to tolllike receptor and activation of macrophage cytokine release hmgb and rage in inflammation and cancer cd and siglec- selectively repress tissue damage-induced immune responses cd glycan binds the proinflammatory b box of hmgb to engage the siglec- receptor and suppress human t cell function occurrence of sialic acids in healthy humans and different disorders hmg-box domain stimulation of rag / cleavage activity is metal ion dependent the effect of manganese(ii) on dna structure: electronic and 
vibrational circular dichroism studies specific sialoforms required for the immune suppressive activity of human soluble cd a sialylated glycan microarray reveals novel interactions of modified sialic acids with proteins and viruses human xeno-autoantibodies against a non-human sialic acid serve as novel serum biomarkers and immunotherapeutics in cancer cross-comparison of protein recognition of sialic acid diversity on two novel sialoglycan microarrays heparan sulfate is essential for high mobility group protein (hmgb ) signaling by the receptor for advanced glycation end products (rage) design of anti-inflammatory heparan sulfate to protect against acetaminophen-induced acute liver failure a new group of chromatin-associated proteins with a high content of acidic and basic amino acids emerging roles for hmgb protein in immunity, inflammation, and cancer hmgb in health and disease hmgb interacts with many apparently unrelated proteins by recognizing short amino acid sequences hmgb as biomarker and drug target the evolution of high mobility group box (hmgb) chromatin proteins in multicellular animals release of chromatin protein hmgb by necrotic cells triggers inflammation tolerance, danger, and the extended family monocytic cells hyperacetylate chromatin protein hmgb to redirect it towards secretion the many faces of hmgb : molecular structurefunctional activity in inflammation, apoptosis, and chemotaxis therapeutic potential of hmgb -targeting agents in sepsis reversing established sepsis with antagonists of endogenous high-mobility group box identification of cd as an antiinflammatory receptor for hmgb -haptoglobin complexes c q and hmgb reciprocally regulate human macrophage polarization tumor-infiltrating dcs suppress nucleic acid-mediated innate immune responses through interactions between the receptor tim- and the alarmin hmgb t cell regulation mediated by interaction of soluble cd with the inhibitory receptor siglec- mutually exclusive redox forms of hmgb 
promote cell recruitment or proinflammatory cytokine release regulation of posttranslational modifications of hmgb during immune responses high-mobility group box protein orchestrates responses to tissue damage via inflammation, innate and adaptive immunity, and tissue repair cd inhibits toll-like receptor activation of nf-κb and triggers apoptosis to suppress inflammation alpha- -acid glycoprotein hernàndez pando r. alpha- -acid glycoprotein, its local production and immunopathological participation in experimental pulmonary tuberculosis detailed structural features of glycan chains derived from alpha -acid glycoproteins of several different animals: the presence of hypersialylated, o-acetylated sialic acids but not disialyl residues generation of monoclonal antibodies against highly conserved antigens mechanistic insights into the protective impact of zinc on sepsis persistent low serum zinc is associated with recurrent sepsis in critically ill patients -a pilot study low zinc and selenium concentrations in sepsis are associated with oxidative damage and inflammation zinc dyshomeostasis during polymicrobial sepsis in mice involves zinc transporter zip and can be overcome by zinc supplementation interleukin- regulates the zinc transporter zip in liver and contributes to the hypozincemia of the acute-phase response the effect of physicochemical factors on the self-association of hmgb : a surface plasmon resonance study global, regional, and national sepsis incidence and mortality, - : analysis for the global burden of disease study involvement of high mobility group box and the therapeutic effect of recombinant thrombomodulin in a mouse model of severe acute respiratory distress syndrome platelet-derived hmgb : critical mediator of sars related to transfusion pathogenic role of hmgb in sars extracellular hmgb : a therapeutic target in severe pulmonary inflammation including covid- hmgb : a possible crucial therapeutic target for covid- surviving sepsis campaign: 
guidelines on the management of critically ill adults with coronavirus disease (covid- ) we thank sandra diaz and patrick secrest for their excellent technical help with the work. key: cord- -mhawyect authors: desirò, daniel; hölzer, martin; ibrahim, bashar; marz, manja title: silentmutations (sim): a tool for analyzing long-range rna-rna interactions in viral genomes and structural rnas date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: mhawyect background a single nucleotide change or mutation in the coding region can alter the amino acid sequence of a protein. in consequence, natural or artificial sequence changes in viral rnas may have various effects not only on protein stability, function and structure but also on viral replication. in the last decade, several tools have been developed to predict the effect of mutations in structural rna genomes. some tools employ multiple point mutations and also take coding regions into account. however, none of these tools was designed to specifically simulate the effect of mutations on viral long-range interactions. results here, we developed silentmutations (sim), an easy-to-use tool to analyze the effect of multiple point mutations on the secondary structures of two interacting viral rnas. the tool can simulate destructive and compensatory mutants of two interacting single-stranded rnas. this will facilitate a fast and accurate assessment of key regions possibly involved in functional long-range rna-rna interactions and ultimately help virologists to design appropriate experiments. sim only needs two interacting single-stranded rna regions as input. the output is a plain text file containing the most promising mutants and a graphical representation of all interactions. conclusion we applied our tool to two experimentally validated influenza a virus and hepatitis c virus interactions and we were able to predict potential double mutants for in vitro validation experiments. 
availability the source code and documentation of sim are freely available at github.com/desiro/silentmutations in the last decades, several computational tools for the analysis of rna secondary structures have been developed. however, tools specifically targeting the needs of virus research are still rare and underresearched [ ] [ ] [ ] . for example, long-range rna-rna interactions (lris) play an important role in the life cycle of rna viruses. many lris are already known to function directly as activators or inhibitors of viral replication and translation . for instance, several interactions were recently identified as possibly new lris in the hepatitis c virus (hcv) genome using lriscan . some of these computationally identified interactions have already been verified experimentally [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . apart from long-range interactions forming in a single sequence, lri-like structures can also occur between two separate sequences. whereas the general function of such interactions is still under investigation, they can be seen as viral lris and are presumably responsible for the correct packaging of all segments into the viral capsid of segmented viruses such as the influenza a virus (iav) [ ] [ ] [ ] [ ] . due to the rapidly growing number of possible functional lris identified, it is essential to have an effective verification method. technically, such interactions can be destroyed by mutation or removal of the interacting parts using in vitro experiments , . such sequence changes can result in secondary structure changes and manifest as different viral titers. however, altering the sequence can cause unwanted effects that affect more than the lri. to cope with this issue, we can alter both interacting rna parts simultaneously. ideally, the combination of both mutated parts should result in an interaction strength similar to that of the interaction between the wild type (wt) sequences. 
a single mutated sequence part combined with the opposing wt sequence should destroy the interaction. such a technique was recently used to verify a possible lri between two iav segments . several computational tools are available that can alter a sequence and report alternative secondary structures with one or even multiple mutations [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . however, these tools are designed for certain applications and do not meet the requirements of the previously proposed experimental technique: the combinatorial in vitro analysis of rna-rna interactions. in this study, we present a tool called silentmutations (sim) that effectively simulates synonymous (silent) compensatory mutations in two single-stranded viral rnas and is therefore appropriate for the in vitro assessment of predicted lris. here, we present a command-line tool, called silentmutations (sim), that can simulate synonymous structure-destroying and structure-preserving mutation pairs within coding regions for long-range rna-rna interaction experiments. the tool has been written in python (v . . ) and relies heavily on the rnacofold python site-package of the viennarna package (v . ) . as the viennarna package is available for linux, windows, and macos, sim runs on all three platforms. the various parameters of sim are fully adjustable and several filters (fig. ) allow a fast and accurate prediction, so the tool can be run on a standard notebook. a simple run takes about seconds when using the standard parameters, two sequences of length as input, and a single core. generally, the runtime highly depends on the parameter setup and the length of the input sequences. the main computational steps of sim are ( ) preprocessing, ( ) permutation, ( ) attenuation, ( ) recovery, and ( ) sampling. a special challenge with negative single-stranded viruses (such as ebola and marburg viruses) is that most databases only contain the positive strand of the viral genomes. 
while the protein-coding sequence is encoded on the positive strand, the folding of these viruses happens on the negative strand. depending on the user input, sim can automatically create the reverse and complement strand of negative single-stranded rna viruses for folding while maintaining codon integrity on the positive strand. this can be easily achieved with the --virusclass=ssrna-, --reverse, and --complement options. the user can directly provide the positive strand sequence from the database in a fasta file, with the reading frame and regions on the positive strand. the default output of silentmutations prints all relevant information of the folding in a plain text file and additionally creates a varna command for the visualization of each folding. additionally, if the tool can find installations of varna and inkscape , it will directly generate high-quality vector graphics of the various foldings. a default binary of varna (v . ) is provided with the tool's source code. silentmutations is primarily designed for two sequence snippets (hereafter called snips) in coding regions but can be used to simulate interactions between coding and non-coding regions as well as interactions between two non-coding regions. this can be controlled with the --noncoding and --noncoding parameters, denoting the first or second sequence as non-coding, respectively. we use the following notation for the minimum free energy (mfe) obtained by folding sequence x with sequence y via rnacofold: mfe(x, y). the mfe is represented as a negative value in kcal/mol, so a lower mfe corresponds to a more stable structure. the first step of sim extracts sequence snips from a longer sequence template while keeping codons intact (fig. ) . in particular, the otherwise difficult extraction of snips in the exact reading frame from negative sense single-stranded rnas is handled by the built-in functionality of the tool. 
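as an illustration of the codon-preserving extraction and strand handling described for this step, the following python sketch expands a requested range to full codons on the positive strand and then builds the reverse complement for folding. it is a minimal sketch under stated assumptions and not sim's actual implementation: the function names `codon_aligned_range`, `extract_snip` and `reverse_complement`, the 0-based half-open coordinates, and the meaning of `frame` (offset of the first codon) are all hypothetical choices made for this example.

```python
# Illustrative sketch only: names and coordinate conventions are assumptions,
# they are not taken from the SIM source code.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A<->U, C<->G)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def codon_aligned_range(start: int, end: int, frame: int) -> tuple[int, int]:
    """Expand the 0-based half-open range [start, end) to codon boundaries.

    `frame` is the offset (0, 1 or 2) of the first codon in the sequence.
    """
    new_start = start - ((start - frame) % 3)  # move left to a codon start
    new_end = end + ((frame - end) % 3)        # move right to a codon end
    return new_start, new_end


def extract_snip(seq: str, start: int, end: int, frame: int = 0) -> str:
    """Cut a snip from the positive strand without splitting any codon."""
    new_start, new_end = codon_aligned_range(start, end, frame)
    if new_start < 0 or new_end > len(seq):
        raise ValueError("requested range cannot be extended to full codons")
    return seq[new_start:new_end]


if __name__ == "__main__":
    positive = "AUGGCUAAAGGGUUU"          # codons: AUG GCU AAA GGG UUU
    snip = extract_snip(positive, 4, 8)   # requested range splits GCU and AAA
    print(snip)                           # GCUAAA
    print(reverse_complement(snip))       # UUUAGC
```

working on the positive strand first and reverse-complementing afterwards keeps the codon bookkeeping in one place, which is the design idea the text attributes to the tool; the real command-line options are the ones named in the text above.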
sim thus directly accepts the whole sequence as input, together with a start and end position for the snips and the reading frame. the tool first extracts the requested interacting parts from each sequence and automatically extends them in both directions if the provided range would otherwise split a codon based on the predefined reading frame (fig. ) . this ensures that all codons are preserved. both sequences, pre-snip (ps ) and pre-snip (ps ), are then folded together via rnacofold to retain only the codons involved in the folding. unpaired codons at the termini are removed to get the final snip (s ) and snip (s ) from which silentmutations calculates the two mutant sequences mut (m ) and mut (m ). figure : overall workflow of the silentmutations tool, shown as an example for two sequences from a negative single-stranded rna virus genome (ssrna-) (a) the first step will extract the defined range in each sequence and possibly increase the range to preserve codons in the given reading frame. both so-called pre-snips are folded with rnacofold to remove unpaired codons at the ends. we refer to an extracted single-stranded rna snippet as snip. (b) all possible codon permutations (here, reverse complements due to ssrna-) are generated for both snips (called perms) to conserve the amino acid sequence. permutations with too many mutations defined by the mutations (-mut) parameter are discarded. (c) to further decrease the total number of combinations, sim will fold each snip with all permutations of the other snip and keep permutations with an mfe higher than or equal to mfe(snip ,snip ) times the filter percentage (-prc) parameter, denoted as upper limit l. (d) all remaining snip permutations will then be folded against all snip permutations to find snip and snip mutations with a similar fold mfe. 
a double-mutant mfe(mut ,mut ) fold is considered to be similar if its mfe is lower than the wild-type mfe(snip ,snip ) times the lower deviation (-ldv) parameter and higher than mfe(snip ,snip ) times the upper deviation (-udv) parameter. (e) the last step will minimize a combined single-mutant mfe(snip ,mut ) and mfe(mut ,snip ) by keeping both mutations in a similar range depending on the mutation range (-mrg) parameter. the range for each mutation combination is defined by (mfe(snip , mut ) + mfe(mut , snip )) · . · ( + -mrg) for the lower threshold (l-mrg) and (mfe(snip , mut ) + mfe(mut , snip )) · . · ( − -mrg) for the upper threshold (u-mrg). different sets and their abbreviations are given in brackets. details can be found in the methods. in this step all possible permutations pre-perms (p*) and pre-perms (q*) of synonymous codons are created from each sequence in order to find the most suitable mutations that maximize the distance between mfe(s ,s ) and mfe(s ,m ) as well as the distance between mfe(s ,s ) and mfe(m ,s ), while keeping the wild-type mfe(s ,s ) and the double-mutant mfe(m ,m ) similar. the number of these permutations can be vast, depending on the number of synonymous codons and the length of the sequence. in order to reduce the computational complexity, this step also removes every permutation sequence with more mutations than predefined by the --mutations parameter, resulting in the sets perms (p) and perms (q), see fig. . having fewer alterations in a sequence with a strong attenuation effect on the viral titer additionally improves the authenticity of the mutational experiment. here, a further reduction of the computational complexity is applied while already creating valid sequences with reduced mfe folding scores. a small difference between the wild-type mfe(s ,s ) and the single-mutant mfe(s ,m ) or mfe(m ,s ), respectively, would diminish the mutational effect on the interaction. 
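the permutation step described above (all synonymous codon permutations, discarding those with too many mutations) can be sketched in python as follows. this is a hedged illustration rather than sim's code: the function name `synonymous_permutations`, the use of the standard genetic code, the counting of mutations as nucleotide differences, and the exclusion of the unmutated wild-type sequence are assumptions made here.

```python
# Illustrative sketch only: the helper names and the choice to count mutations
# as nucleotide differences are assumptions, not SIM's documented behaviour.
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Map every amino acid to the list of codons that encode it.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq: str) -> str:
    """Translate an in-frame RNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def synonymous_permutations(seq: str, max_mutations: int) -> list[str]:
    """List synonymous variants of `seq` with 1..max_mutations changed bases."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    choices = [SYNONYMS[CODON_TABLE[c]] for c in codons]
    perms = []
    for combo in product(*choices):
        candidate = "".join(combo)
        n_mut = sum(a != b for a, b in zip(seq, candidate))
        if 0 < n_mut <= max_mutations:
            perms.append(candidate)
    return perms


if __name__ == "__main__":
    # Lys-Phe: AAA/AAG and UUU/UUC are the synonymous choices.
    print(sorted(synonymous_permutations("AAAUUU", 1)))
```

the exhaustive product grows quickly with sequence length, which is why the text stresses the mutation limit as a way to keep the permutation sets manageable.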
it is therefore vital to increase the distance between these mfe scores. to this end, the user can directly specify the required distance with the filter percentage (--filterperc) parameter. the defined value is then multiplied with the mfe(s ,s ) value to obtain an upper limit (denoted l) for mfe(s ,m ) and mfe(m ,s ). then, the tool creates all foldings between s and q as well as p and s . each folding mfe(s , q j ∈ q) and mfe(p i ∈ p, s ) (with i and j defining the i-th and j-th element of set p and q, respectively) is only considered for the next recovery step if the mfe is higher than l. the new sets mut (u) and mut (v ) have a drastically reduced size compared to p and q, which greatly reduces the computational complexity of finding the two mutant sequences m and m where mfe(m ,m ) is similar to the wild-type mfe(s ,s ). while it seems to be convenient to set a very low --filterperc parameter, this could also result in an empty permutation set. on the other hand, setting a high --filterperc could result in longer running times. it is therefore recommended to start with the default . and later adjust this parameter depending on the performance. to find double-mutants (mut) with a folding mfe similar to the wild-type (wt) folding mfe, each sequence u i ∈ u has to be folded against each sequence v j ∈ v with rnacofold. the runtime of this folding process is highly dependent upon the previous attenuation step. when using the unprocessed p* and q* sets from the example in fig. , the number of required folding operations would be around million, which is reduced to only , for the filtered sets u and v. some lris function as inhibitors of viral replication, and a stronger interaction would decrease functional viral reproduction. it is therefore necessary for the simulation of the double-mutant interaction to set a lower but also an upper similarity threshold. each of the calculated mfe(u i ∈ u, v j ∈ v ) values is therefore compared with the wild-type mfe(s ,s ). 
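the attenuation and recovery filters described above can be sketched in python as simple threshold tests on mfe values. this is a minimal sketch under stated assumptions, not sim's implementation: the function names `attenuation_filter` and `recovery_pairs` are hypothetical, the `fold` callable stands in for an rnacofold call, and the numeric parameter values in the example are invented for illustration (they are not the tool's defaults). the comparison with the limit l uses "higher than or equal", following the figure caption.

```python
# Illustrative sketch only: `fold` is a stand-in for an RNAcofold call and all
# numbers below are made-up toy values, not SIM defaults or results.
from typing import Callable, Iterable

Fold = Callable[[str, str], float]  # returns an MFE in kcal/mol (negative)


def attenuation_filter(wt_snip: str, perms: Iterable[str], wt_mfe: float,
                       filter_perc: float, fold: Fold) -> list[str]:
    """Keep permutations that fold weakly with the wild-type partner.

    MFEs are negative, so "weaker" means an MFE at or above the limit
    L = wt_mfe * filter_perc.
    """
    limit = wt_mfe * filter_perc
    return [p for p in perms if fold(wt_snip, p) >= limit]


def recovery_pairs(muts1: Iterable[str], muts2: Iterable[str], wt_mfe: float,
                   ldv: float, udv: float, fold: Fold) -> list[tuple[str, str, float]]:
    """Keep double mutants whose MFE lies in a window around the wild type.

    A pair is kept if wt_mfe * udv < MFE < wt_mfe * ldv.
    """
    lower, upper = wt_mfe * udv, wt_mfe * ldv
    muts2 = list(muts2)
    kept = []
    for u in muts1:
        for v in muts2:
            mfe = fold(u, v)
            if lower < mfe < upper:
                kept.append((u, v, mfe))
    return kept


if __name__ == "__main__":
    # Toy lookup table replacing real folding; keys are unordered pairs.
    toy = {
        frozenset(("S1", "Q1")): -5.0,
        frozenset(("S1", "Q2")): -15.0,
        frozenset(("P1", "Q1")): -19.0,
    }
    fold = lambda a, b: toy[frozenset((a, b))]
    wt_mfe = -20.0
    print(attenuation_filter("S1", ["Q1", "Q2"], wt_mfe, 0.5, fold))
    print(recovery_pairs(["P1"], ["Q1"], wt_mfe, 0.9, 1.1, fold))
```

because the attenuation filter runs one fold per permutation while the recovery step runs one fold per pair, shrinking the sets first is what keeps the pairwise step affordable, as the text explains.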
for each mutant tuple. the most suitable mutant pair, holding all predefined requirements, is finally found by calculating: silentmutations is primarily designed as a supporting tool in the assembly step of an in vitro experiment, such as previously performed by gavazzi et al. in influenza a viruses . with the help of sim, experimentalists can simulate for the first time specific rna-rna interactions of viral coding and non-coding sequence snippets, potentially forming stable long-range interactions. using sim, the search space of possible double-mutant sequences that form a stable secondary structure with an mfe comparable to the wild-type structure, while the corresponding single-mutant structures are disrupted, can be drastically reduced. therefore, sim can be used prior to time- and cost-consuming wet-lab experiments to simulate promising double-mutant sequences that preserve the wild-type lri but are not functional as single-mutants. to validate sim and to show that the tool is able to predict biologically relevant mutants, we used two validated interaction examples, one in the influenza a virus h n a/finch/england/ / strain and another in the hepatitis c virus type b strain (accession: aj . ). the influenza a virus (iav) genome consists of viral ribonucleoproteins (vrnps) and each vrnp segment includes one of different negative sense and single-stranded viral rnas. it is hypothesized that these segments are packed selectively through rna-rna interactions between the segments , [ ] [ ] [ ] . to this end, gavazzi et al. performed an in vitro mutation experiment in iav that is well suited to validate our tool. they took an as yet unconfirmed interaction from one of their previous experiments between the pb and the ns segment and introduced four transcomplementary point substitutions by hand. an interaction between the pb mutant and ns mutant resulted in a viral titer similar to that obtained with the pb wt and ns wt segment. 
introducing only the pb mutant or ns mutant resulted in both cases in an attenuation of the viral replication. to show that we are able to obtain the same experimental results of gavazzi et al. computationally, we adjusted some key parameters of sim. importantly, we limited the number of possible mutation pairs to four and calculated the mfe between the wt and mutant (mut) combinations in mere seconds. the sim results of the secondary structure predictions and mfe values for the two iav segment combinations are shown in fig. . by only allowing a maximum of four mutation pairs, we were able to calculate the same structures and mutations as previously proposed by gavazzi et al. . moreover, our results show an mfe difference of . kcal/mol between wild-type and double-mutant (fig. a and b) and an mfe difference of . kcal/mol between single-mutants (fig. c and d) . again, we want to point out that for the in silico experiment it is preferable to have a similar mfe between the two wild-type iav segments mfe(ns wt , pb wt ) and the double-mutant mfe(ns mut , pb mut ) as well as between the single-mutants mfe(ns wt , pb mut ) and mfe(ns mut , pb wt ). in a next step and by using sim with default parameters, we were able to calculate a double-mutant that not only reflects the results of gavazzi et al. , but also shows slightly lower mfe differences between the wild-type and the double-mutant ( . kcal/mol) as well as between the two single-mutants ( . kcal/mol), see fig. . therefore, the interaction strengths of mfe(ns wt , pb wt ) and mfe(ns mut , pb mut ), as well as mfe(ns wt , pb mut ) and mfe(ns mut , pb wt ), are closer to each other in comparison to the results of gavazzi et al. . as a possible drawback, our simulation needs to introduce one more point mutation in each single-stranded rna snip (fig. ) . the hepatitis c virus (hcv) genome consists of a positive single-stranded rna of about kb length. 
this rna is translated into a single polyprotein that is later cleaved into four structural (c, e , e , p ) and six nonstructural (ns , ns , ns a, ns b, ns a, rdrp) proteins by viral and host proteases , . both utr regions of the viral genome are highly structured and have been extensively studied in the past , [ ] [ ] [ ] [ ] . for our validation of sim, we have chosen a well-studied interaction between the ' utr and the end region of the orf encoding the hcv polyprotein. several studies , , [ ] [ ] [ ] [ ] have already shown that the rna replication highly depends on the conserved structures of the x-tail , sequence contained in the ' utr. this sequence is presumed to contain three experimentally verified stem-loops (sli, slii, sliii) which may interact with other parts of the viral genome through long-range rna-rna interactions to regulate replication . the lri between the free nucleotides of the hairpin from the ' slii structure and the free nucleotides of the hairpin from the bsl . structure has been the subject of many hcv studies. the interaction was first verified by friebe et al. and was also computationally found with lriscan by fricke et al. . another reason for selecting this interaction as an example was the duality of having one interacting part in a coding and one in a non-coding region. applying sim to this interaction resulted in two compensatory point mutations in each snip (fig. ) . as the ' slii region is non-coding, only the point mutations in the snip from the bsl . structure are silent. our results show that the wild-type structure mfe( bsl . wt , slii wt ) should have a similar strength compared to the calculated double-mutant structure mfe( bsl . mut , slii mut ), see fig. . additionally, both mfe( bsl . wt , slii mut ) and mfe( bsl . mut , slii wt ) weaken the interaction significantly. we propose that our simulated mutations in hcv may be used to verify the given long-range interaction. 
taking all the new long-range interactions found by fricke et al. into account, we propose that our tool can be used to design mutation experiments for every predicted lri to provide evidence for a biological function of these interactions. such experiments would be especially interesting for iav, where the exact packaging process of the vrnp segments is not yet fully understood. by presenting sim, we provide an easy and fast way to analyze possible interactions between vrnps. the tool can be used to heavily reduce the search space of possible synonymous mutation interactions between two rnas. another difficulty when creating silent iav mutations lies in preserving the codons on the positive strand while mutating the negative strand, which is also handled by sim. furthermore, our tool provides a significant speedup, not only in the verification of interactions in these two viruses, but also for many other virus families. our simulations will help to gain a deeper understanding of the translation and replication processes in viruses and of how long-range interactions regulate them. a promising future approach would be the combined application of lriscan and silentmutations to detect currently unknown lris and, in the same step, to provide possible mutational verification experiments for each lri.
author contributions dd performed the design, development and programming of the tool and wrote the main draft of the paper. mh, bi, and mm contributed in writing, discussions and proofreading of the final manuscript. all authors read and approved the final manuscript.
key: cord- -bkyk or authors: burns, c. sean; nix, tyler; shapiro, robert m.; huber, jeffrey t. title: methodological issues with search in medline: a longitudinal query analysis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bkyk or this study compares the results of data collected from a longitudinal query analysis of the medline database hosted on multiple platforms that include pubmed, ebscohost, ovid, proquest, and web of science in order to identify variations among the search results on the platforms after controlling for search query syntax. we devised twenty-nine sets of search queries comprised of five queries per set to search against the five medline database platforms. we ran our queries monthly for a year and collected search result count data to observe changes. we found that search results vary considerably depending on medline platform, both within sets and across time. the variation is due to trends in scholarly publication that include publishing online first versus publishing in journal issues, which leads to metadata differences in the bibliographic record; to differences in the level of specificity among search fields provided by the platforms; to database integrity issues that lead to large fluctuations in monthly search results based on the same query; and to database currency issues that arise due to when each platform updates its medline file. specific bibliographic databases, like pubmed and medline, are used to inform clinical decision-making, create systematic reviews, and construct knowledge bases for clinical decision support systems.
since they serve as essential information retrieval and discovery tools that help identify and collect research data and are used in a broad range of fields and as the basis of multiple research designs, this study should help clinicians, researchers, librarians, informationists, and others understand how these platforms differ and inform future work in their standardization. to answer these questions, our analytical framework is based on the concepts of methods and results reproducibility [ ] . methods reproducibility is "the ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results" and results reproducibility is "the production of corroborating results in a new study, having followed the same experimental methods" (a new lexicon for research reproducibility section, para. ). we do not apply the concept of inferential reproducibility in this paper, since it pertains to the conclusions that a study draws from the reproduced methods. it would largely be applicable if we investigated the relevance of the results to an information need rather than, as we do, focusing solely on the reproducible sets of search queries and the records produced by executing those queries. the search queries, tested in the pilot studies, were designed to be semantically and logically equivalent to each other on a per-set basis. differences between queries within sets were made only to adhere to the query syntax required by each platform. note: column meanings: the keyword column indicates how many keywords were used in the query, not counting field-specific keywords, such as document title, journal title, or author name. the latter are counted in the fieldspecific column, which indicates the number of field-specific terms used in the query. the mesh column indicates how many mesh terms were used in the query. the branches column indicates how many trees a mesh term belongs to.
the pubdate column is a binary column to indicate whether a query does not include a publication date ( ) or includes a publication date ( ). the explode column indicates whether a mesh term was not exploded ( ), exploded ( ), or, in queries with multiple mesh terms, at least one term was exploded and one was not ( ). the and, or, and not columns indicate a count of how many of these boolean operators were used in the query. in legacy pubmed, queries require an 'and medline [sb] ' tag in order to limit results to medline only and to exclude the rest of pubmed. these ands were not counted in this column. we did count ands when used to join terms or when including publication date ranges in our searches, even for ovid/medline, although ovid/medline uses the limit operator and not technically the and operator.
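the counting rule for the and, or, and not columns can be sketched as follows. this is a hypothetical helper, not the procedure used in the study: it assumes operators are written in upper case, that quoted phrases never contain operators that should be counted, and that the pubmed subset filter is written as `AND medline[sb]`.

```python
import re


def count_boolean_operators(query):
    """Count standalone AND / OR / NOT operators in a search query string."""
    # Drop the legacy PubMed subset filter first, so its AND is not counted.
    without_subset = re.sub(r"\bAND\s+medline\s*\[sb\]", " ", query)
    # Remove quoted phrases so operator words inside a phrase are ignored.
    unquoted = re.sub(r'"[^"]*"', " ", without_subset)
    tokens = re.findall(r"\b(AND|OR|NOT)\b", unquoted)
    return {op: tokens.count(op) for op in ("AND", "OR", "NOT")}
```

a call such as `count_boolean_operators('"neoplasms"[MeSH Terms] AND ("therapy" OR "surgery") NOT review[pt] AND medline[sb]')` would yield one count per operator, with the subset filter excluded.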
key: cord- -tc z nm authors: zhang, yuan; wei, yanqiu; li, yunlong; wang, xuan; liu, yang; tian, deyu; jia, xiaojuan; gong, rui; liu, wenjun; yang, limin title: igy antibodies against ebola virus possess post-exposure protection and excellent thermostability date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tc z nm ebola virus (ebov) is one of the most virulent pathogens, causing hemorrhagic fever with high mortality rates in humans and nonhuman primates. postexposure antibody therapies to prevent ebov infection are considered efficient. however, due to the poor thermal stability of mammalian antibodies, their application in the tropics has been limited. here, we developed a thermostable therapeutic antibody against ebov based on chicken immunoglobulin y (igy). the igy antibodies demonstrated excellent thermal stability and retained their neutralizing activity at °c for one year, in contrast to conventional polyclonal or monoclonal antibodies (mabs). we immunized laying hens with a variety of ebov vaccine candidates and confirmed that vsv Δ g/ebovgp encoding the ebov glycoprotein could induce high-titer neutralizing antibodies against ebov. the therapeutic efficacy of immune igy antibodies in vivo was evaluated in the newborn balb/c mouse model. mice challenged with a lethal dose of virus were treated or h post-infection with different doses of anti-ebov igy. the group receiving a high dose of nau/kg (neutralizing antibody units/kilogram) achieved complete protection with no signs of disease, while the low-dose group was only partially protected. in contrast, all mice receiving naïve igy died within days.
in conclusion, the anti-ebov igy exhibits excellent thermostability and protective efficacy, and is a promising candidate for development as an alternative therapeutic. author summary although several ebola virus therapeutic antibodies have been reported in recent years, their application in tropical endemic areas has been limited by the poor thermal stability of mammalian antibodies. we developed a highly thermostable therapeutic antibody against ebov based on chicken immunoglobulin y (igy). the igy antibodies demonstrated excellent thermal stability and retained their neutralizing activity at °c for one year. newborn mice receiving passive transfer of igy achieved complete protection against a lethal dose of virus challenge, indicating that the anti-ebov igy provides a promising countermeasure to solve the current clinical application problems of ebola antibody-based treatments in africa. ebola virus (ebov) belongs to the filoviridae family and is a known cause of severe hemorrhagic fever in humans and nonhuman primates (nhps). since the epidemic of zaire the ongoing outbreak in the drc is the second-largest ebola epidemic on record, with lives lost and confirmed infections since august , which prompted who to declare this epidemic a public health emergency of international concern. pandemic potential, high mortality, high infectivity, and lack of preventive and therapeutic approaches make ebov a class a pathogen that seriously threatens public health. the intermittent and continuous outbreak of ebola disease (evd) poses a challenge for lethal challenge in newborn mice. our results suggest that the potent igy warrants further development as a prophylactic and therapeutic reagent for evd. preparation of immunogens vaccine-elicited neutralizing antibodies (nabs) are associated with protection against disease mediated by members of the filoviridae family.
in order to obtain the most potent anti-ebov antibody, we prepared several ebov immunogens based on multiple different platforms, including a dna vaccine (pcaggs/ebovgp), recombinant protein (rebovgp) or virus-like particle (ebov- vlp) subunit vaccines, and two viral vector vaccines (vsvΔg/ebovgp, ad /ebovgp). western blot confirmed that these immunogens express or contain ebov gp, which can induce nabs in animals (fig ). because different vaccines induce different humoral immune responses, we needed to screen for the most suitable immunogen for igy antibody production. to obtain high-titer ebov nabs, -month-old laying hens were vaccinated with five different immunogens, including or tcid vsvΔ g/ebovgp, μ g rebovgp, μ g pcaggs/ebovgp, μg ebov-vlp, and virus particles (vp) ad /ebovgp (fig a). thirty-five laying hens were randomly divided into seven groups, which were immunized intramuscularly (i.m.) four times with each immunogen or pbs control at a -day interval. eggs were collected at , , , , weeks, and igy antibodies were purified from egg yolk for elisa and nab tests. both titers increased gradually in all groups after the first immunization. the results showed that all immunogens except the dna vaccine induced potent gp-specific elisa antibodies (fig b). for the nabs, the geometric mean titers (gmts) in the tcid vsvΔ g/ebovgp group reached : (vsv pseudoneutralisation, vsv-psn) and : (lentiviral vectors pseudoneutralisation, lvv-psn) after the third boost, which were significantly higher than in the other groups (fig c- passive transfer of igy protects newborn balb/c mice from lethal challenge to determine whether the anti-ebov igy antibodies are protective against ebov, a passive protection experiment was performed in newborn balb/c mice (within the first days of life). forty newborn balb/c mice were divided into eight groups, which were challenged subcutaneously (s.c.) with tcid vsvΔg/ebovgp.
two hours or day post-infection (dpi), each mouse was adoptively transferred with igy twice daily for days, and control group mice were treated with naive igy (fig a). to determine the correlation between the transferred igy dosage and therapeutic efficacy, three different dosages with , , or nau/kg (nab year. however, the titer of the antibody stored at ℃ gradually decreased from the second month, and only about % of the antibody activity remained by the end. in contrast, the activity of the igy stored at ℃ was lost faster, and the nab titer could no longer be measured by the third month. these results show that the anti-ebov igy has excellent thermal stability: it can be stored at room temperature (rt) for up to one year and maintains its activity for one month at ℃ without significant changes. even at a high temperature of ℃, it can still retain activity in the short term (fig ). it is suggested that this anti-ebov igy can be used as an emergency immunizations seven groups of -month-old laying hens (n = per group) were inoculated i.m. with immunogens prepared as described above. the detailed scheme is or tcid vsvΔ g/ebovgp, μg rebovgp, μg pcaggs/ebovgp, μg ebov-vlp, vp ad /ebovgp, or an equivalent volume of pbs as a sham control at weeks , , , ( times). eggs were collected at weeks , , , , for elisa and nab tests. purification of yolk igy antibody igy was isolated from the egg yolk using the water dilution method, a rapid and simple way to separate igy from the yolk. the separation method is improved based on previous for western blot analysis, the proteins were electrically transferred onto a polyvinylidene difluoride (pvdf) membrane using a semi-dry blotting apparatus ( v, min, rt). the membrane was then blocked with tris-buffered saline containing . % tween (tbs-t) and % non-fat dry milk for h at rt and incubated overnight at °c with a : dilution of mouse anti-
key: cord- -tgka pl authors: tovo, anna; menzel, peter; krogh, anders; lagomarsino, marco cosentino; suweis, samir title: taxonomic classification method for metagenomics based on core protein families with core-kaiju date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: tgka pl characterizing species diversity and composition of bacteria hosted by biota is revolutionizing our understanding of the role of symbiotic interactions in ecosystems. however, determining microbiome diversity implies the classification of taxa composition within the sampled community, which is often done via the assignment of individual reads to taxa by comparison to reference databases. although computational methods aimed at identifying microbial taxa are available, it is well known that inferences using different methods can vary widely depending on various biases. in this study, we first apply and compare different bioinformatics methods based on s ribosomal rna gene and whole genome shotgun sequencing for taxonomic classification to three small mock communities of bacteria, of which the compositions are known. we show that none of these methods can infer both the true number of taxa and their abundances. we thus propose a novel approach, named core-kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method similar to s, but based on emergent statistics of core protein domain families. we then test the proposed method on the three small mock communities and also on medium- and highly complex mock community datasets taken from the critical assessment of metagenome interpretation challenge. we show that core-kaiju reliably predicts both the number of taxa and the abundances of the analysed mock bacterial communities. finally, we apply our method to human gut samples, showing how core-kaiju may give a more accurate ecological characterization and a fresh view of real microbiomes. modern high-throughput genome sequencing techniques revolutionized ecological studies of microbial communities at an unprecedented range of taxa and scales ( , , , , ) .
it is now possible to massively sequence genomic dna directly from incredibly diverse environmental samples ( , ) and gain novel insights about structure and metabolic functions of microbial communities. * correspondence should be addressed to dr. suweis. email: suweis@pd.infn.it one major biological question is the inference of the composition of a microbial community, that is, the relative abundances of the sampled organisms. in particular, the impact of microbial diversity and composition for the maintenance of human health is increasingly recognized ( , , , ) . indeed, several studies suggest that the disruption of the normal microbial community structure, known as dysbiosis, is associated with diseases ranging from localized gastroenterologic disorders ( ) to neurologic illnesses ( ) . however, it is impossible to define dysbiosis without first establishing what normal microbial community structure means within the healthy human microbiome. to this purpose, the human microbiome project has analysed the largest cohort and set of distinct, clinically relevant body habitats ( ) , characterizing the ecology of healthy human-associated microbial communities. however there are several critical aspects. the study of the structure, function and diversity of the human microbiome has revealed that even healthy individuals differ remarkably in the contained species and their abundances. much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. characterizing a microbial community implies the classification of species/genera composition within the sampled community, which in turn requires the assignment of sequencing reads to taxa, usually by comparison to a reference database. 
although computational methods aimed at identifying microbial taxa have an increasingly long history within bioinformatics ( , , ) , it is well known that inferences based on s ribosomal rna (rrna) or shotgun sequencing vary widely ( ) . moreover, even if data are obtained via the same experimental protocol, the use of different computational methods or algorithm variants may lead to different results in the taxonomic classification. the two main experimental approaches for analyzing microbiomes are based on s rrna gene amplicon sequencing and whole genome shotgun sequencing (metagenomics). sequencing of amplicons from a region of the s rrna gene is a common approach used to characterize microbiomes ( , ) and many analysis tools are available (see materials and methods section). besides the biases in the experimental protocol, a major issue with s amplicon sequencing is the variance in copy number of the s genes between different taxa. therefore, abundances inferred from read counts of the amplicons should be corrected by taking into account the copy number of the different genera detected in the sample ( , , ) . however, the average number of s rrna copies is only known for a restricted selection of bacterial taxa. as a consequence, different algorithms have been proposed to infer from data the copy number of those taxa for which this information is not available ( , ) . in contrast, whole genome shotgun sequencing of all the dna present in a sample can inform about both diversity and abundance as well as metabolic functions of the species in the community ( ) .
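the copy-number correction mentioned above amounts to dividing each taxon's amplicon read count by its marker gene copy number and renormalizing. the sketch below is a generic illustration, not any specific published tool; the function name and the choice of a default copy number for taxa with unknown values are assumptions.

```python
def correct_for_copy_number(read_counts, copy_numbers, default_copies=1.0):
    """Convert amplicon read counts into copy-number-corrected relative abundances.

    read_counts:  dict mapping taxon -> number of amplicon reads
    copy_numbers: dict mapping taxon -> marker gene copies per genome
    Taxa missing from copy_numbers fall back to default_copies (an assumption).
    """
    corrected = {
        taxon: count / copy_numbers.get(taxon, default_copies)
        for taxon, count in read_counts.items()
    }
    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}
```

with counts of 70 and 30 reads for two taxa carrying 7 and 3 gene copies respectively, the corrected abundances are equal, although the raw read fractions are not.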
the accuracy of shotgun metagenomics species classification methods varies widely ( ) . in particular, these methods can typically result in a large number of false positive predictions, depending on the sequence comparison algorithm used and its parameters. for example, in k-mer based methods such as kraken ( ) and kraken ( ) the choice of k determines the sensitivity and precision of the classification, such that sensitivity decreases and precision increases with increasing values of k, and vice versa. as we will show, false positive predictions often need to be corrected heuristically by removing all taxa with abundance below a given arbitrary threshold (see materials and methods section for an overview of different algorithms for taxonomic classification). we highlight that the protocols for s-amplicon and shotgun methods are different and each has its own batch effects. importantly, while shotgun taxonomic analysis gives classification results at the species level, s taxonomic profilers most often need to stop at the genus level. however, in the end, both aim at answering the same question: "what are the relative abundances of taxa in the sample?" therefore it is not methodologically wrong to compare their answers for the same community. to do that, it is possible to aggregate lower level (e.g. species) counts towards higher levels (e.g. genus), as has been done in many benchmark studies before (see, e.g., ( , , , ) ). in fact, several studies have compared taxa inferred from s amplicon and shotgun sequencing data, with samples ranging from humans to water and soil. logares and collaborators ( ) studied communities of marine planktonic bacteria and found that shotgun approaches had an advantage over amplicons, as they rendered more truthful community richness and evenness estimates by avoiding pcr biases, and provided additional functional information. chan et al.
( ) analyzed thermophilic bacteria in hot spring water and found that amplicon and shotgun sequencing allowed for comparable phylum detection, but shotgun sequencing failed to detect three phyla. in another study ( ) s rrna and shotgun methods were compared in classifying community bacteria sampled from freshwater. the taxonomic composition of each s rrna gene library was generally similar to its corresponding metagenome at the phylum level. at the genus level, however, there was a large amount of variation between the s rrna sequences and the metagenomic contigs, which had a ten-fold higher resolution and sensitivity for genus diversity. more recently jovel et al. ( ) compared bacterial communities from different microbiomes (human, mice) and also from mock communities. they found that shotgun metagenomics offered a greater potential for the identification of strains, which however still remained unsatisfactory. it also allowed increased taxonomic and functional resolution, as well as the discovery of new genomes and genes. while shotgun metagenomics has certain advantages over amplicon sequencing, its higher price point is still prohibitive for many applications. therefore amplicon sequencing remains the established, cost-effective tool of choice for determining the taxonomic composition of microbial communities. in fact, the use of the s rrna gene as a universal marker throughout the entire bacterial kingdom made it easy to collect sequence information from a wide distribution of taxa, which is as yet unmatched by whole genome databases. several curated databases exist to date, with silva ( , ), greengenes ( , ) and ribosomal database project (rdp) ( ) being the most prominent. additionally, ncbi provides a curated collection of s reference sequences in its targeted loci project (https://www.ncbi.nlm.nih.gov/refseq/targetedloci/).
when benchmarking protocols for taxonomic classification from real samples of complex microbiomes, the "ground truth" of the contained taxa and their relative abundances is not known (see ( ) ). therefore, the use of mock communities or simulated datasets remains the basis for a robust comparative evaluation of the prediction accuracy of a method. in the first part of this work we apply three widely used taxonomic classifiers for metagenomics, kaiju ( ), kraken ( ) and metaphlan ( ) , and two common methods for analyzing s amplicon sequencing data, dada ( ) and qiime ( ) , to three small mock communities of bacteria, of which we know the exact composition ( ) . we show that s rrna data allow the number of taxa to be detected efficiently, but not their abundances, while shotgun metagenomics classifiers such as kaiju and kraken give a reliable estimate of the most abundant genera, but the nature of the algorithms makes them predict a very large number of false-positive taxa. the central contribution of this work is thus to develop a method to overcome the above limitations. in particular, we propose an updated version of kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method, similar to s rrna, but based on core protein domain families ( , , , ) from the pfam database ( ) . our choice of marker domain families rests on the existence of a set of core families that are typically present in at most one or very few copies per genome, but that together uniquely cover all bacterial species in the pfam database with an overall quite short sequence. using the presence of these core pfams (mostly related to ribosomal proteins) as a filter criterion allows for detecting the correct number of taxa in the sample.
we tested our approach in a protocol called "core-kaiju" and show that it has a higher accuracy than other classification methods not only on the three small mock communities, but also on intermediate and highly biodiverse mock communities designed for the st critical assessment of metagenome interpretation (cami) challenge ( ) . in fact, we will show how in all these cases core-kaiju overcomes, for the most part, the problem of false-positive genera and accurately predicts the abundances of the different detected taxa. we finally apply our novel pipeline to classify microbial genera in the human gut from the human microbiome project (hmp) ( ) dataset, showing how core-kaiju may allow for a more accurate biodiversity characterization of real microbial communities, thus laying the basis for a more solid dysbiosis analysis in microbiomes.

taxonomic classification: amplicon versus whole genome sequencing

many computational tools are available for the analysis of both amplicon and shotgun sequencing data ( , , , , , , ) . one of the differences among the various software packages for s rrna analysis is how the next-generation sequencing error rate per nucleotide is taken into account when associating each sampled s sequence read with taxa. indeed, errors along the nucleotide sequence could lead to an inaccurate taxon identification and, consequently, to misleading diversity statistics. the traditional approach to overcome this problem is to cluster amplicon sequences into so-called operational taxonomic units (otus), which are based on an arbitrary shared similarity threshold usually set equal to % for classification at the genus level. of course, these approaches lead to a reduction of the phylogenetic resolution, since gene sequences that differ by less than the fixed threshold cannot be distinguished from one another. that is why, sometimes, it may be preferable to work with exact amplicon sequence variants (asvs), i.e.
sequences recovered from a high-throughput marker gene analysis after the removal of spurious sequences generated during pcr amplification and/or sequencing. the next step in these approaches is to compare the filtered sequences with reference libraries such as those cited above. in this work, we chose to conduct the analyses with the following two open-source platforms: dada ( ) and qiime ( ) . dada is an r package optimized to process large datasets (from s of millions to billions of reads) of amplicon sequencing data with the aim of inferring the asvs from one or more samples. once the spurious s rrna gene sequences have been removed, dada allows for comparison with the silva, greengenes and rdp libraries. we performed the analyses for all three possible choices. qiime is another widely used bioinformatic platform for the exploration and analysis of microbial data, which allows the user to choose between different methods for the sequence quality control step. for our comparisons, we performed this step by using deblur ( ) , a novel sub-operational-taxonomic-unit approach which exploits information on error profiles to recover error-free s rrna sequences from samples. as shown in ( ) , where different amplicon sequencing methods are tested on both simulated and real data and the results are compared to those obtained with metagenomic pipelines, the whole genome approach outperformed the amplicon-based ones in terms of the number of identified strains, taxonomic and functional resolution, and reliability of the estimates of the microbial relative abundance distribution in samples. similar comparisons have also been performed with analogous results in ( , , , ) (see ( ) for a comprehensive summary of studies comparing different sequencing approaches and bioinformatic platforms). standard widespread taxonomic classification algorithms for metagenomics (e.g.
kraken ( ) and kraken ( ) ) extract all contained k-mers (all the possible strings of length k that are contained in the whole metagenome) from the sequencing reads and compare them with the index of a genome database. however, the choice of the length k strongly influences the classification: when k is too large, it is easy not to find a correspondence in the reference database, whereas if k is too small, reads may be wrongly classified. recently, a novel approach has been proposed for the classification of shotgun data based on sequence comparison to a reference database comprising protein sequences, which are much more conserved than nucleotide sequences ( ) . kaiju indexes the reference database using the burrows-wheeler transform (bwt), and translated sequencing reads are searched in the bwt using maximum exact matches, optionally allowing for a certain number of mismatches via a greedy heuristic approach. it has been shown ( ) that kaiju is able to classify more reads in real metagenomes than nucleotide-based k-mer methods. therefore, previous studies on the composition and structure of microbial communities in humans may actually be very biased by earlier metagenomic analyses that were missing up to % of the reconstructed species (i.e. most of the species they found were not present in the gene catalog). we therefore chose to work with kaiju (with mem option ( )) for our taxonomic analysis. although it gave better estimates of sample biodiversity composition than amplicon sequencing techniques, we found that it generally overestimates the number of genera actually present in our community (see results section) by two orders of magnitude, i.e. there is a long tail of low-abundance false-positive taxa.
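the effect of k on exact k-mer matching can be seen in a small sketch. this is not the kraken algorithm, only a toy comparison of the k-mer sets of one read and one reference; the sequences and function names are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k contained in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_fraction(read, reference, k):
    """Fraction of the read's distinct k-mers that also occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & kmers(reference, k)) / len(read_kmers)


# Hypothetical reference and a read carrying a single mismatch (G -> C at index 6).
reference = "ACGTACGTTAGC"
read = "ACGTACCTTAGC"

short_k = shared_kmer_fraction(read, reference, 3)  # many k-mers still match
long_k = shared_kmer_fraction(read, reference, 8)   # every 8-mer spans the mismatch
print(short_k, long_k)
```

with k = 3 most k-mers of the read are still found in the reference, whereas with k = 8 a single mismatch is enough to lose every match, which is the loss of sensitivity at large k mentioned above. conversely, short k-mers are shared by many unrelated genomes, which is why a small k favours wrong assignments.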
to overcome this, we implemented a new release of the program, core-kaiju, which contains an additional preliminary step where read sequences are first mapped against a new protein reference library we created, containing the amino-acid sequences of the proteomes' core pfams (see following section). we also compared standard kaiju and core-kaiju results with those obtained via kraken and via another widely used program for shotgun data analysis, metaphlan ( , ) . after downloading the pfam database (version . ), we selected only bacterial proteomes and we tabulated the data into an $F \times P$ matrix, where each column represents a different proteome and each row a different protein domain family. in particular, our database consisted of $P$ = bacterial proteomes and $F$ = protein families. in each matrix entry $(f,p)$, we inserted the number of times the family $f$ recurs in the proteins of the proteome $p$, $n_{f,p}$. by summing over the column $p$, one gets the proteome length, i.e. the total number of family occurrences of which it is constituted, which we denote by $l_p$. similarly, by summing over the row $f$, we get the family abundance, i.e. the number of times the family $f$ appears in the pfam database, which we call $a_f$. figure shows the frequency histogram of the proteome sizes (left panel) and of the family abundances (right panel). our primary goal was to find the so-called core families ( ), i.e. the protein domains which are present in the overwhelming majority of the bacterial proteomes but occur just a few times in each of them ( , ) . in order to analyze the occurrences of pfams in proteomes, we converted the original $F \times P$ matrix into a binary one, giving information on whether each pfam was present or not in each proteome.
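the quantities defined above (proteome length, family abundance and the binary presence matrix) can be written down directly. the sketch below uses a tiny hypothetical count matrix with three families (rows) and four proteomes (columns); the function names are ours.

```python
def proteome_lengths(counts):
    """Column sums: total number of family occurrences in each proteome."""
    return [sum(column) for column in zip(*counts)]


def family_abundances(counts):
    """Row sums: number of times each family appears over all proteomes."""
    return [sum(row) for row in counts]


def binarize(counts):
    """Presence/absence matrix: 1 if the family occurs in the proteome, else 0."""
    return [[1 if n > 0 else 0 for n in row] for row in counts]


def family_occurrences(counts):
    """Number of proteomes in which each family is present at least once."""
    return [sum(row) for row in binarize(counts)]


# Hypothetical matrix: rows are families, columns are proteomes.
counts = [
    [1, 1, 1, 1],  # present once in every proteome
    [0, 3, 0, 0],  # present in a single proteome, three copies
    [2, 0, 1, 1],  # present in three proteomes
]
print(proteome_lengths(counts))    # [3, 4, 2, 2]
print(family_abundances(counts))   # [4, 3, 4]
print(family_occurrences(counts))  # [4, 1, 3]
```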
in the left panel of figure we show the histogram of the family occurrences, which displays the typical u-shape already observed in the literature ( , , , ) : a huge number of families are present in only a few proteomes (first peak in the histogram), whilst another smaller peak occurs at large values, meaning that there is also a percentage of domains occurring in almost all the proteomes. in the right panel, we plot the number of rare pfams (having abundance less than or equal to four in each proteome) versus the percentage of proteomes in which they have been found. we thus selected the pfams found in more than % of the proteomes and such that $\max_p n_{f,p}$ = (see zoom panel of figure ). since we wish to have at least one representative core pfam for each proteome in the database, we checked whether with these selected core families we could 'cover' all bacteria. unfortunately, none of them was present in proteomes and , corresponding to actinospica robiniae dsm and streptomyces sp. nrrl b- , respectively. we therefore looked for the most prevalent pfam(s) present in these proteomes. we found that pfam pf , occurring in % of the proteomes, was present in both actinospica robiniae and streptomyces, and we therefore added it to our core-pfam list.
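a minimal sketch of this selection procedure follows. the prevalence cut-off, the maximum copy number and the family names are hypothetical placeholders (the actual values are those given in the text and table), and the copy-number condition is written here as an upper bound, which is an assumption.

```python
def prevalence(row):
    """Fraction of proteomes in which a family occurs at least once."""
    return sum(1 for n in row if n > 0) / len(row)


def select_core_families(counts, min_prevalence, max_copies):
    """Families found in more than min_prevalence of proteomes, with few copies each."""
    return [
        family
        for family, row in counts.items()
        if prevalence(row) > min_prevalence and max(row) <= max_copies
    ]


def uncovered_proteomes(counts, core):
    """Indices of proteomes that contain none of the selected core families."""
    n_proteomes = len(next(iter(counts.values())))
    return [p for p in range(n_proteomes) if all(counts[f][p] == 0 for f in core)]


def most_prevalent_family_in(counts, proteomes):
    """Most prevalent family present in every one of the given proteomes."""
    candidates = [f for f, row in counts.items() if all(row[p] > 0 for p in proteomes)]
    return max(candidates, key=lambda f: prevalence(counts[f]))


# Hypothetical counts: family -> copies in each of four proteomes.
counts = {
    "fam_a": [1, 1, 1, 0],
    "fam_b": [1, 5, 1, 1],
    "fam_c": [0, 0, 1, 1],
}
core = select_core_families(counts, min_prevalence=0.7, max_copies=1)
missing = uncovered_proteomes(counts, core)
extra = most_prevalent_family_in(counts, missing)
print(core, missing, extra)
```

here fam_a is the only family passing both criteria, the last proteome is left uncovered, and fam_b is the most prevalent family that would cover it, mirroring the way an additional pfam was added to the core list above.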
finally, in order to minimize the number of pfams to work with (and the related computational cost), we kept in our final core-pfam list only the minimum number of domains with which we were able to cover the whole list of proteomes in the database. in particular, the selected core protein domains for bacterial proteomes are the ten pfams pf , pf , pf , pf , pf , pf , pf , pf and pf (see table ).

table . descriptions of the core pfams:
ribosomal protein l
ribosomal protein l
nusb family (involved in the regulation of rrna biosynthesis by transcriptional antitermination)
ribosomal protein l
ribosomal protein s (bacterial ribosomal protein s interacts with s rrna)
mraw methylase family (sam dependent methyltransferases)
ribosomal proteins l , c-terminal domain
domain of unknown function (duf )
ef-p (elongation factor p) translation factor required for efficient peptide bond synthesis on s ribosomes
ribosomal proteins s l /mitochondrial s l

principal coordinate analysis. in order to explore whether the expression of the core pfam protein domains is correlated with taxonomy, we did the following. first, we downloaded from the uniprot database ( ) the amino-acid sequence of each pfam across the different proteomes (see supporting information for details). their sequence lengths l, averaged over proteomes, turned out to be highly peaked around specific values ranging from l = to l = (see supporting information, figure s , for the corresponding frequency histograms). second, for each family we computed the damerau-levenshtein (dl) distance between all its corresponding amino-acid sequences. dl measures the edit distance between two strings in terms of the minimum number of allowed operations needed to modify one string to match the other. such operations include insertions, deletions and substitutions of single characters and the transposition of two adjacent characters, which are common errors made by dna polymerase.
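for illustration, the edit distance just described can be computed with a short dynamic-programming routine. the sketch below implements the restricted variant (often called optimal string alignment), in which a transposed pair is not edited again; whether the original analysis used this or the unrestricted damerau-levenshtein distance is not stated, so this choice is an assumption. the toy sequences are hypothetical.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def distance_matrix(sequences):
    """Symmetric matrix of pairwise distances between all sequences."""
    return [[osa_distance(s, t) for t in sequences] for s in sequences]


# Hypothetical short amino-acid strings.
seqs = ["MKV", "MVK", "MKVL"]
matrix = distance_matrix(seqs)
print(matrix)
```

a matrix of this kind is the input to the multidimensional scaling step described next.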
this analogy makes the dl distance a suitable metric for the variation between protein sequences. for simplicity, and to gain more immediate insight, we conducted the analysis only for sequence points corresponding to the five most abundant phyla, i.e. proteobacteria, firmicutes, actinobacteria, bacteroidetes and cyanobacteria. after computing the dl distance matrices between all the amino-acid sequences of each pfam across proteomes, we performed multidimensional scaling (mds), or principal coordinate analysis (pcoa), on the dl distance matrix. this step allows us to reduce the dimensionality of the space describing the distances between all pairs of core pfam sequences of the different taxa and to visualize it in a two-dimensional space. in the last two columns of table we report the percentage of the variance explained by the first two principal coordinates for the ten different core families, where the first one ranges from . to . % and the second one from . to . %. we then plotted the sequence points in the new principal coordinate space, colouring them by phylum. in general, we observed a two-case scenario. for some families, such as pf (see figure , left panel), actinobacteria and proteobacteria sequences are grouped in one or two highly visible clusters each, whereas the other three phyla do not form well-distinguished structures, their sequence points being close to one another, especially for cyanobacteria and firmicutes. for other families, such as pf (see figure , left panel), all five phyla are clustered, suggesting a higher correlation between taxonomy and amino-acid sequences (see supporting information, figure s , for the plots of the other core families). these results suggest that some core families (e.g. ribosomal ones) are phylum dependent, while others are not directly correlated with taxa.

we started by testing shotgun versus s taxonomic pipelines on three small artificial bacterial communities generated by jovel et al.
( ) , whose raw data are publicly available (sequence read archive (sra) portal of ncbi, accession number srp ). these mock populations contain dna from eleven species belonging to seven genera: salmonella enterica, streptococcus pyogenes, escherichia coli, lactobacillus helveticus, lactobacillus delbrueckii, lactobacillus plantarum, clostridium sordellii, bacteroides thetaiotaomicron, bacteroides vulgatus, bifidobacterium breve, and bifidobacterium animalis. for the taxonomic analysis at the genus level through s amplicon sequencing, we evaluated the performance of the dada ( ) and qiime ( ) pipelines. in particular, as shown in ( ), qiime produced more reliable results in terms of relative abundance of bacteria for all three mock communities when compared to mothur ( ), another widely used s pipeline, and to the miseq reporter v . , a software package developed by illumina to analyze miseq instrument output data. as for shotgun libraries, we tested standard kaiju ( ), kraken ( ), the improved version of kraken ( ) , and metaphlan ( ), the improved version of metaphlan ( ) . the latter relies on unique clade-specific marker genes and has been shown to have higher precision and speed than other programs ( ) . finally, we tested core-kaiju on these mock communities and compared its performance with the above taxonomic classification methods.

table . we report, for each core family (pfam id, first column), the percentage of proteomes in which it appears (prevalence, second column), the maximum number of times it occurs in one proteome (maximal occurrence, third column), the total number of times it is found among proteomes in the pfam database (total occurrence, fourth column) and the percentage of variance explained by the first two coordinates (pco and pco , last two columns) when mds is performed on sequences belonging to the five most abundant phyla (see figure ).
after defining the core pfams, we created two protein databases for kaiju: the first database only contains the protein sequences from the core families, whereas the second database is the standard kaiju database based on the bacterial subset of the ncbi nr database. the protocol then follows these steps:

1. classify the reads with kaiju using the database with the core protein domains.
2. classify the reads with kaiju using the nr database to get the preliminary relative abundances for each genus.
3. discard from the list of genera detected in (2) those having an absolute abundance of less than or equal to twenty reads in the list obtained in point (1). this threshold represents our confidence level on the sequencing pipeline (see below).
4. re-normalize the abundances of the genera obtained in point (3).

we evaluated the performance of both shotgun and s pipelines for the taxonomic classification of the three mock communities. in the top panels of figure we show the true relative genus abundance composition of the three small mock communities versus the ones predicted via the different tested taxonomic pipelines. we then applied the core-kaiju pipeline to detect the biodiversity composition of the same three mock communities. in figure , bottom panels, we plot the linear fit performed on the relative abundances predicted via core-kaiju versus the theoretical ones, known a priori. as we can see, in all three cases the predicted community composition was satisfactorily captured by our method, with an r value higher than . .

figure . comparison between theoretical and predicted relative abundances in small mock communities. top panels: predicted relative abundance composition of the three small mock communities via different taxonomic classification methods. bottom panels: red points represent data of relative abundance predicted for the genus level by core-kaiju on the three mock communities versus the true ones, known a priori. the green line is the linear fit performed on the obtained points which, in the best scenario, should coincide with the quadrant bisector (dotted red line). in all three cases the predicted community composition was satisfactorily captured by our method, with an r-squared value of . , . and . , respectively.

our goal was to quantitatively compare the performance of the different methods in terms of both biodiversity and relative abundances. as for the first, we chose to measure it via the f score applied at the genus level. more precisely, we define the recall of a given taxonomic classification method as the number of true-positive detected genera (present in a community and thus correctly detected by the method), $TP$, over the sum of $TP$ and $FN$, the number of false-negative genera (present in a community, but missed by the classification). in contrast, we define the precision to be the ratio between $TP$ and the sum of $TP$ and $FP$, the number of false-positive genera (not present in a community and thus incorrectly detected as present). finally, the f biodiversity score is twice the ratio between the product of recall and precision and their sum, i.e.

$$F = 2\,\frac{\mathrm{recall}\cdot\mathrm{precision}}{\mathrm{recall}+\mathrm{precision}} = \frac{2\,TP}{2\,TP + FN + FP}.$$

f score values obtained via the different methods for the three analysed mock communities are presented in table . while f describes the overall accuracy in detecting the correct number of genera in the sample, r gives the correlation between the taxa abundances measured by the pipeline and the real composition of the microbial sample. finally, we also indicated the number of genera each method predicts, Ĝ. table summarizes the results of the analysis, together with the r-squared values, r , obtained for the linear fit performed between true and predicted relative abundances. as we can see, both core-kaiju and metaphlan gave a good estimate of the number of genera in the communities (which is equal to seven), whereas all s methods slightly overestimated it.
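the filtering and re-normalization steps of the protocol, together with the genus-level scores defined above, can be sketched as follows. the read counts are hypothetical, the function names are ours, and the two input dictionaries stand in for the per-genus read counts that would be obtained from the two kaiju runs; this is not the core-kaiju implementation itself.

```python
def core_kaiju_filter(nr_reads, core_reads, min_core_reads=20):
    """Keep genera from the nr run that have more than min_core_reads reads
    in the core-pfam run, then re-normalize their abundances."""
    kept = {
        genus: n
        for genus, n in nr_reads.items()
        if core_reads.get(genus, 0) > min_core_reads
    }
    total = sum(kept.values())
    return {genus: n / total for genus, n in kept.items()} if total else {}


def genus_scores(predicted, truth):
    """Precision, recall and F score on sets of genus names."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * tp / (2 * tp + fn + fp) if tp else 0.0
    return precision, recall, f_score


# Hypothetical read counts per genus from the two classification runs.
nr_reads = {"bacteroides": 5000, "lactobacillus": 3000,
            "escherichia": 1500, "spurious_genus": 500}
core_reads = {"bacteroides": 120, "lactobacillus": 80,
              "escherichia": 20, "spurious_genus": 3}
truth = {"bacteroides", "lactobacillus", "escherichia"}

abundances = core_kaiju_filter(nr_reads, core_reads)
precision, recall, f_score = genus_scores(abundances, truth)
print(abundances, precision, recall, f_score)
```

in this toy example the spurious genus is removed, but a true genus with exactly twenty core reads is lost as well, giving a precision of 1, a recall of 2/3 and an f score of 0.8.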
finally, both standard kaiju and kraken predicted a number of genera much higher than the true one. moreover, the fits of the abundances predicted with standard kaiju and core-kaiju displayed a higher coefficient of determination than all other pipelines, with the exception of kraken , which gave comparable values. however, if we focus on the f score, we can notice that core-kaiju outperformed all the other methods in terms of precision and recall. in particular, since the pipeline led to zero false-positive genera and only one false-negative genus (e. coli in all three communities), the resulting precision and recall were and . for all the sampled mocks. with core-kaiju, we were therefore able to produce a reliable estimate of both the number of genera within the communities and their relative abundances.

table . f score, r-squared values and number of predicted genera. for all three analysed mock communities, we report the f score (twice the ratio between the product of recall and precision and their sum), the r value of the linear fit performed between estimated and true abundances, together with the number of predicted genera, Ĝ, for the various taxonomic methods. the true number of genera is g = for each community.

as stated in the introduction and observed above, metagenomic classification methods, such as kaiju, often give a high number of false-positive predictions. in principle, one could set an arbitrary threshold on the detected relative abundances, for example . % or %, to filter out low-abundance taxa that are likely false positives. however, different choices of the threshold typically lead to very different results. the top panels of figure show the empirical taxa abundance distribution of the genera, or if one considers only genera accounting for more than . %, . % and % of the total number of sample reads, respectively.
moreover, looking at the empirical pattern, one can notice the main gap between genera covering a fraction of less than · − of the total number of reads (black points) and those covering a fraction higher than · − (green points), which correspond to the genera actually present in the artificial community. one could therefore hope that, whenever such a gap is detected in the taxa abundance distribution, it corresponds to the one between false-positive and truly present taxa. however, as will be clear in the following section, this is not the case and it is not possible to set a relative threshold for the shotgun methods that works for all the mock communities.

we also tested and compared standard kaiju, kraken and core-kaiju on medium and high complexity mock bacterial communities obtained from the st cami challenge ( ) , in terms of biodiversity (recall, precision, f score, Ĝ) and abundance composition (linear fit r-squared). in table we show the results for samples and of the high-complexity dataset (see supporting information for the results of the other samples). as we can see, core-kaiju strongly outperformed the other methods in terms of precision. indeed, it only slightly overestimated the true number of genera, by around taxa in sample and taxa in sample (see table ), which is two orders of magnitude lower than the other methods (which predicted > of taxa). on the other hand, as also shown in the bottom panels of figure , when using in standard kaiju (or kraken ) a relative threshold of % so as to reduce the number of false-positive taxa, as suggested by the previous analysis on the small mock communities, the number of predicted taxa is in this case around , therefore strongly underestimating the real biodiversity of the samples. as for the recall, the performance of core-kaiju (values around %) lies between standard kaiju (values around %) and kraken (values around %).
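the sensitivity of the biodiversity estimate to the choice of a relative threshold can be reproduced on a toy abundance table. the read counts and the three thresholds below are hypothetical and are not those used in the analysis.

```python
def genera_above_relative_threshold(read_counts, threshold):
    """Genera whose share of the total reads is strictly above the threshold."""
    total = sum(read_counts.values())
    return sorted(g for g, n in read_counts.items() if n / total > threshold)


# Hypothetical long-tailed read counts per genus (10000 reads in total).
read_counts = {"a": 9000, "b": 900, "c": 90, "d": 9, "e": 1}

# Hypothetical relative thresholds: 0.1 %, 1 % and 10 % of the reads.
richness = [
    len(genera_above_relative_threshold(read_counts, t))
    for t in (0.001, 0.01, 0.1)
]
print(richness)  # [3, 2, 1]
```

even in this small example the estimated number of genera changes with every threshold, so a cut-off tuned on one community does not transfer to another with a different abundance distribution.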
the combination of recall and precision led to an f score around %, much higher than the other two pipelines ( %). finally, as shown in figure , core-kaiju also gave a very good estimate of the microbial composition, with an r-squared for the fit between theoretical and predicted relative abundances above . , a value comparable to standard kaiju and much higher than the one obtained with kraken ( . ). in the supporting information we present all the results for the other high-complexity samples as well as the analyses performed on the medium-complexity challenge dataset and the sensitivity of the classification to the absolute thresholds.

we finally applied the core-kaiju taxonomic classification method to an empirical dataset. we analysed a cohort of healthy human fecal samples from the study ( ) (metagenomic sequencing data are publicly available at the ncbi sra under accession number srp ). we applied standard kaiju and found on average (over the samples) bacterial genera. a similar overestimation of the number of taxa to that of kaiju . would also be obtained with kraken , highlighting the above-mentioned problem of setting the correct threshold in order to have a realistic estimate of the sample biodiversity. the right panel of figure shows the empirical taxa abundance distribution of one individual (sample id: srr ). as we can see, in this case the only apparent gap occurs between relative abundances of less than − and those above . , with only one genus. it is therefore quite unrealistic that all the taxa but one should be considered false positives. the same plot shows the vertical lines corresponding to thresholds on the relative population of . %, . % and %, above which we have , and taxa, respectively. in contrast, with core-kaiju we did not need to tune a relative threshold. instead, by removing false positives through the (fixed) absolute abundance of reads we ended up with genera (orange diamonds in figure ) , which is compatible with previous estimates.
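the r-squared values quoted for the fit between theoretical and predicted relative abundances can be obtained from an ordinary least-squares line. the sketch below is a plain-python version under that assumption, with hypothetical abundance vectors; the fitting routine actually used for the figures is not specified in the text.

```python
def linear_fit_r_squared(true_values, predicted_values):
    """Least-squares line predicted = slope * true + intercept, with its R-squared."""
    n = len(true_values)
    mean_x = sum(true_values) / n
    mean_y = sum(predicted_values) / n
    sxx = sum((x - mean_x) ** 2 for x in true_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_values, predicted_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(true_values, predicted_values))
    ss_tot = sum((y - mean_y) ** 2 for y in predicted_values)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical true and predicted relative abundances for four genera.
true_abundances = [0.4, 0.3, 0.2, 0.1]
predicted_abundances = [0.38, 0.33, 0.18, 0.11]
slope, intercept, r_squared = linear_fit_r_squared(true_abundances,
                                                   predicted_abundances)
print(slope, intercept, r_squared)
```

a perfect prediction lies on the quadrant bisector, i.e. slope 1, intercept 0 and r-squared equal to 1.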
in fact, the available amplicon-sequencing datasets from stool samples of healthy participants of the human microbiome project ( ) suggest that there are on average different bacterial genera per sample (based on samples with at least > k reads per sample using % otu clustering). however, in terms of taxa composition, core-kaiju predicted abundances are different from those obtained using s classification methods ( ).

figure caption (continued). the red triangle corresponds to the unique false-negative genus (e. coli) undetected with the newly proposed method. dashed lines represent relative abundance thresholds on standard kaiju output of . %, . % and %, respectively, which would have led to a biodiversity estimate of , and genera, respectively. imposing an absolute abundance threshold of twenty reads on standard kaiju output directly would instead lead to an overestimation of genera. bottom panels: the same analyses have been performed on the cami high-complexity sample . again, green diamonds represent the out of genera present in the community and correctly detected by our pipeline. in this case, in addition to the remaining false-negative genera (red triangles) we also have the presence of false-positive genera, here represented by gray triangles. setting a threshold on the relative abundance of reads produced by standard kaiju gives a number of genera of for the . %, for the . % and for the % threshold, respectively. left and right panels represent, respectively, log-log absolute frequency and cumulative patterns of the taxa abundances in the mock communities.

an important source of errors in the performance of any algorithm working on shotgun data is the high level of plasticity of bacterial genomes, due to widespread horizontal transfer ( , , , , , ) . indeed, most highly abundant gene families are shared and exchanged across genera, making them both a confounding factor and a computational burden for algorithms attempting to extract species presence and abundance information.
thus, while having access to the sequences from the whole metagenome is very useful for functional characterization, restriction to a smaller set of families may be a very good idea when the goal is to identify the taxa and their abundances.

table . performance comparison on cami high-complexity samples and . in the first four columns, we report the values for the precision, the recall, the f score, the r value of the linear fit performed between estimated and true abundances, and the number of predicted genera Ĝ with core-kaiju, standard kaiju and kraken . the true number of genera is g = for each sample. in the last column we also report the number of genera one would predict with standard kaiju and kraken by setting a relative threshold of %, i.e. by considering as false positives all those genera having a relative abundance of less than . in the sample. we denote this quantity by Ĝ % .

figure . linear fit between theoretical and predicted relative abundances with core-kaiju. red points represent data of relative abundance predicted for the genus level by core-kaiju on sample and from the cami high-complexity dataset versus the ground-truth abundances, known a priori. the green line is the linear fit performed on such values which, in the case of perfect matching between data and core-kaiju output, should coincide with the quadrant bisector (dotted red line). in both cases, the predicted community composition was satisfactorily captured by our method, with a correlation with the real taxa abundances of r = . and r = . for sample and , respectively.

to summarize, we have presented a novel method for the taxonomic classification of microbial communities which exploits the specific advantages of both whole-genome and s rrna pipelines. indeed, while the former are recognised to better estimate the relative taxa composition of samples, the latter are much more reliable in predicting the true biodiversity of a community, since the comparison between taxa-specific hyper-variable regions of the bacterial s ribosomal gene and comprehensive reference databases in general avoids the phenomenon of false-positive taxa detection. indeed, the identification of a threshold in shotgun methods to remove most of the false positives is of course a critical problem, because in general the true taxa composition is not known, and thus setting the wrong threshold may lead to a huge over- (or under-) estimation of the sample biodiversity, as shown in this work. inspired by the role of the s gene as a taxonomic fingerprint and by the knowledge that proteins are more conserved than dna sequences, we proposed an updated version of kaiju, an open-source program for the taxonomic classification of whole-genome high-throughput sequencing reads in which sample metagenomic dna sequences are first converted into amino-acid sequences and then compared to microbial protein reference databases. we identified a class of ten domains, here denoted core pfams, which, analogously to the s rrna gene, on the one hand are present in the overwhelming majority of proteomes, therefore covering the whole domain of known bacteria, and on the other hand occur just a few times in each of them, thus allowing for the creation of a novel reference database where a fast search can be performed between sample reads and pfam amino-acid sequences. tested against mock microbial communities of different levels of complexity, generated in other studies ( , ) and available online, the proposed updated version of kaiju, core-kaiju, outperformed popular s rrna and shotgun methods for taxonomic classification in the estimation of both the total biodiversity and the relative abundance distribution of taxa.
in fact, by fixing an absolute threshold with core-kaiju (by only considering abundances greater than twenty reads), we are able to correctly classify the biodiversity in all samples of different size and complexity, while keeping a very good performance in the prediction of taxa abundances. we highlight that other technologies exist beyond metagenomics or s amplicons on a miseq (integrated instrument performing clonal amplification and sequencing), as for example pacbio ( ). earl and collaborators ( ) used a cami dataset to test the accuracy of this method and it is therefore possible to indirectly compare core-kaiju with pacbio through their results. also in this case we found that our method gives a slightly higher r score for the genera abundance composition, confirming the competitiveness of core-kaiju even with long-read technology such as pacbio. however, a deeper comparison with these methods goes beyond the scope of this work because, although they might perform better than miseq next-generation sequencing approaches, they are quite rare and available only at a much higher price. our promising results pave the way for the application of the newly proposed pipeline in the field of microbiota-host interactions, a rich and open research field which has recently attracted the attention of the scientific world due to the hypothesised connection between the human microbiome and health/disease status ( , ). having a trustworthy tool for the detection of microbial biodiversity, as measured by the number of genera and their abundances, could have a fundamental impact on our knowledge of human microbial communities and could therefore lay the foundations for the identification of the main ecological properties modulating the healthy or ill status of an individual, which, in turn, could be of great help in preventing and treating diseases on the basis of the observed patterns.

nevertheless, estimates from a reference cohort of stool microbiomes ( ) from healthy hmp participants ( s v -v region, > k reads per sample, % otu clustering), report an average number of genera per sample of (max= , min= ) ( ). setting a threshold on the relative abundance of reads produced by standard kaiju gives a number of genera of for the . %, for the . % and for the % threshold, respectively. in contrast, considering false-positive all genera with less than or equal to twenty reads in the standard kaiju output, we end up with genera. orange diamonds in the plot correspond to the genera detected with core-kaiju, a number compatible with the reported estimates. left and right panels represent log-log absolute frequency and cumulative patterns, respectively.

all data and codes used for this study are available online or upon request to the authors. raw data for the three in-silico mock communities ( ) are publicly available at the sequence read archive (sra) portal of ncbi under accession number srp . metagenomic sequencing data of the healthy human fecal samples from the study ( ) are publicly available at the ncbi sra under accession number srp . cami medium and high complexity datasets are available at https://data.cami-challenge.org/participate under request.

this work was supported by the stars grant unipd react to s.s. mcl, s.s. and a.k. acknowledge cariparo foundation visiting program .

the human microbiome project the human microbiome project: a community resource for the healthy human microbiome tara oceans studies plankton at planetary scale viral to metazoan marine plankton nucleotide sequences from the tara oceans expedition.
scientific data emergent simplicity in microbial community assembly the application of ecological theory toward an understanding of the human microbiome universality of human microbial dynamics community ecology as a framework for human microbiome research the integrative human microbiome project the human intestinal microbiome in health and disease the role of microbiome in central nervous system disorders structure, function and diversity of the healthy human microbiome shotgun sequencing of the human genome microbial community profiling for human microbiome projects: tools, techniques, and challenges phylophlan is a new method for improved phylogenetic and taxonomic placement of microbes large-scale differences in microbial biodiversity discovery between s amplicon and shotgun sequencing predictive functional profiling of microbial communities using s rrna marker gene sequences evaluation of general s ribosomal rna gene pcr primers for classical and next-generation sequencing-based diversity studies incorporating s gene copy number information improves estimates of microbial diversity and abundance quantitative microbiome profiling links gut community variation to microbial load copyrighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction microbiology: metagenomics evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities kraken: ultrafast metagenomic sequence classification using exact alignments improved metagenomic analysis with kraken genome biology characterization of the gut microbiome using s or shotgun metagenomics fast and sensitive taxonomic classification for metagenomics with kaiju metagenomic s rdna i llumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities diversity of thermophiles in a malaysian hot spring determined using s rrna and shotgun 
metagenome sequencing strengths and limitations of s rrna gene amplicon sequencing in revealing temporal microbial community dynamics the silva ribosomal rna gene database project: improved data processing and web-based tools the silva and all-species living tree project (ltp) taxonomic frameworks greengenes, a chimera-checked s rrna gene database and workbench compatible with arb an improved greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea ribosomal database project: data and tools for high throughput rrna analysis metaphlan for enhanced metagenomic taxonomic profiling dada : high-resolution sample inference from illumina amplicon data reproducible, interactive, scalable and extensible microbiome data science using qiime joint scaling laws in functional and evolutionary categories in prokaryotic genomes cross-species gene-family fluctuations reveal the dynamics of horizontal transfers familyspecific scaling laws in bacterial genomes statistics of shared components in complex component systems the pfam protein families database in critical assessment of metagenome interpretationa benchmark of metagenomics software metagenomic microbial community profiling using unique cladespecific marker genes deblur rapidly resolves single-nucleotide community sequence patterns analysis of the intestinal microbiota using solid s rrna gene sequencing and solid shotgun sequencing estimating the size of the bacterial pan-genome zipf and heaps laws from dependency structures in component systems universal distribution of component frequencies in biological and technological systems a neutral theory of genome evolution and the frequency distribution of genes gene frequency distributions reject a neutral model of genome evolution uniprot: a hub for protein information introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities inflammation, antibiotics, and 
diet as environmental stressors of the gut microbiome in pediatric crohns disease the phylogenetic forest and the quest for the elusive tree of life. cold spring harbor symposia on quantitative biology search for a 'tree of life' in the thicket of the phylogenetic forest the tree and net components of prokaryote evolution genome-wide comparative analysis of phylogenetic trees: the prokaryotic forest of life genomic fluidity: an integrative view of gene diversity within microbial populations pacbio sequencing and its applications genomics species-level bacterial community profiling of the healthy sinonasal microbiome using pacific biosciences sequencing of full-length s rrna genes microbiome the gut microbiome in health and in disease rakoff-nahoum s. the evolution of the host microbiome as an ecosystem on a leash none declared. key: cord- -nt zoktv authors: sweeney, blake a.; hoksza, david; nawrocki, eric p.; ribas, carlos eduardo; madeira, fábio; cannone, jamie j.; gutell, robin; maddala, aparna; meade, caeden; williams, loren dean; petrov, anton s.; chan, patricia p.; lowe, todd m.; finn, robert d.; petrov, anton i. title: r dt: computational framework for template-based rna secondary structure visualisation across non-coding rna types date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nt zoktv non-coding rnas (ncrna) are essential for all life, and the functions of many ncrnas depend on their secondary ( d) and tertiary ( d) structure. despite proliferation of d visualisation software, there is a lack of methods for automatically generating d representations in consistent, reproducible, and recognisable layouts, making them difficult to construct, compare and analyse. here we present r dt, a comprehensive method for visualising a wide range of rna structures in standardised layouts. r dt is based on a library of , templates representing the majority of known structured rnas, from small rnas to the large subunit ribosomal rna. 
r dt has been applied to ncrna sequences from the rnacentral database and produced > million diagrams, creating the world’s largest rna d structure dataset. the software is freely available at https://github.com/rnacentral/r dt and a web server is found at https://rnacentral.org/r dt. introduction based template are available for the same rna, the d-based template is preferentially selected. the ribovore software is used to search against all models except for trna. if no hits are detected, trnascan-se . is then used to compare the sequences against the bacterial, archaeal, and eukaryotic domain-specific trna models. once a top scoring domain-specific trna model is chosen, the sequence is compared with the isotype- specific trna models for that domain. . the input sequence is folded with the infernal cmalign program using the top scoring covariance model. this ensures that the predicted d structure is compatible with the template d structure. it is important to note that r dt does not attempt to fold the unstructured regions found in some templates or predict the structure of the insertions relative to the template. for each sequence, the pipeline produces a text file with the d structure in dot-bracket notation and a d diagram in svg format. the diagrams are coloured depending on the identity of the individual nucleotides in the input sequence relative to the template. identical nucleotides are shown in black, while inserted nucleotides are displayed in red. if a nucleotide is modified compared to the template reference sequence, it is shown in green. if the location of the nucleotides was automatically repositioned relative to its corresponding position in the template, the nucleotide is coloured blue. the svg diagrams can be scaled to any resolution and edited using text editors or specialised vector graphics editing software. 
when viewed with a web browser, additional information is shown when hovering the mouse over individual nucleotides (for example, hovering over modified nucleotides reveals the identity of the nucleotide in the corresponding position of the reference sequence). further interactivity can be added to the svg visualisations using javascript and css web technologies. comprehensive d structure template library we compiled a library of , templates aggregating rna d structure layouts from different sources (table ) while the majority of the , templates were integrated from the existing sources (table ) , templates have been manually curated specifically for this project, as described below (also see supplementary table ) . new d structure based templates model rrna expansion segments the availability of the experimentally determined ribosomal d structures enabled us to improve the traditional rrna diagrams available from the crw , . specifically, the d structural data assessed the accuracy of the covariation-based s and s rrna secondary structures, removed the few incorrect base pairs, added new base pairs with both watson-crick and non- canonical base pair conformations, and provided detailed modelling of the species-specific expansion segments that were not present in the covariation-based expansion segments. the revised lsu d templates are outlined using single page layouts and explicitly depict h a , a helix that connects the ′ and ′ halves of the lsu rrna. this irregular helix, which is now known to be the loop-e motif was initially suggested by gutell and fox , and had been indicated by arrows connecting the two halves of the historical lsu rrna layouts . all non- canonical interactions were explicitly depicted when the first d structural model of the lsu particle became available . 
the single page lsu layouts enable r dt to visualise the lsu d structures automatically, which has not been possible until now (figure a structures, we prepared isotype-specific templates for bacterial, archaeal, and eukaryotic trnas that include those decoding the standard twenty amino acids, initiator methionine/n- formylmethionine (trna imet in archaea/eukaryotes or trna fmet in bacteria), isoleucine for the aua codon in bacteria and archaea, and selenocysteine ( figure ). consensus trna primary sequence with d structure for each isotype of each taxonomic domain was generated based on the trna alignments used for building the isotype-specific covariance models in trnascan- se . . the isotype-specific trna d structure templates were created using the corresponding consensus sequences and structures. in addition, we generated six domain- specific templates for more general application. due to the structural difference of variable loop in type i and type ii trnas , alignments for building the domain-specific covariance models in trnascan-se . were divided into two sets. similar to the isotype-specific ones, the domain- specific templates were built with the consensus sequences and structures for both type categories of trnas. together, the isotype-specific templates can be used to visualise d structures of trnas with typical features while the domain-specific templates can be applied for the atypical predictions with undetermined or inconsistent isotypes. the r dt pipeline is designed to be extendable as new templates are added to the library. notably, r dt can also serve as a tool for the development of new templates where the r dt output is used as a starting point for manual refinement of the d layouts. to facilitate the workflow, we provide a modified version of the xrna software called xrna-gt that supports the import of the r dt-generated svg files and can be used to adjust the d layouts (for example, change the orientation of rna helices or edit base pairs). 
using xrna-gt it is also possible to add custom annotations, such as helix or nucleotide numbers, in order to produce publication-ready images. the updated d layouts can be submitted to the r dt library where they become new templates, upon review by the r dt team. this workflow has been successfully used internally to produce the d-based ssu templates. we welcome new contributions from the community and provide detailed documentation on github (https://github.com/rnacentral/r dt#how-to-add-new-templates). validation of d diagrams at the time of writing, there are no alternative methods that enable template-based rna d structure visualisation at a comparable scale. the only related method, implemented in rpredictordb , has a small number of templates ( as of july ) and a limited support for alternative templates from the same rna type (for example, species-specific rrna templates). as this is a unique dataset, we developed global benchmarks to assess both accuracy of the template selection and the quality of the resulting d diagrams. evaluation of template selection we tested r dt with a diverse set of rrna sequences to evaluate the template selection process, focusing on the rrna templates as they are annotated at the species level, making it possible to compare the taxonomic lineages of the input sequence and the template. we selected all rrna sequences from refseq shorter than , nucleotides ( , sequences as of july ). the sequences were visualised with r dt and the taxonomic trees of the sequences and the selected templates were compared by identifying the most specific taxonomic rank common to the templates and the refseq sequences. for example, if an rrna from photorhabdus caribbeanensis was drawn using a template from escherichia coli, their respective phylogenies share the order enterobacteriales, thus the sequence and the template agree at the level of order. the majority of sequences match the templates at the level of kingdom ( . %), phylum ( . 
%), or class ( . %) (supplementary table ), indicating that the selected templates can be taxonomically distant from the input sequences. this effect is due to the preferential use of the d-based ssu and lsu rrna templates, as only a relatively small number of d structures is available. however, when we classified each nucleotide in the d diagrams based on whether it matched a template for each taxonomic rank separately, we found that at least % of all nucleotides were in the same position as the template for all taxonomic ranks, confirming that the sequences closely matched the selected templates despite the phylogenetic distance between the template and sequence.

r dt templates model the conserved core of most structured rnas

we classified each nucleotide in the resulting diagrams according to whether it matched a template and found that . % of nucleotides were displayed using the nucleotide locations encoded in the templates, while . % of nucleotides represented insertions compared to the templates, and . % of nucleotides matched the templates but required automatic repositioning by the traveler software (table ). overall . % of the nucleotides were visualised using the templates. to further confirm the agreement between the templates and the diagrams, we manually inspected , d diagrams from human and e. coli (based on the hgnc and ecocyc sequences) to identify any modes of failure, such as overlapping structural regions. this process identified only suboptimal diagrams ( . %) that were characterised by overlapping helices and other artifacts (all diagrams can be seen in supplementary information), while the remaining , ( . %) diagrams were error-free, indicating a close correspondence between the template and rendered sequence.
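the template selection evaluation described above compares the lineage of each input sequence with the lineage of its selected template and records the most specific rank they share. a minimal sketch of that comparison follows. the function name, the rank list and the hand-typed lineages are assumptions made for illustration (they are not taken from the r dt code or pulled from a taxonomy database), and ranks missing from either lineage are simply skipped.

```python
# Ranks ordered from least to most specific; the list itself is an assumption.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def most_specific_shared_rank(lineage_a, lineage_b, ranks=RANKS):
    """Return the deepest rank at which two lineages still agree, or None.

    Each lineage is a dict mapping rank name -> taxon name. Ranks absent
    from either lineage are skipped; the walk stops at the first mismatch.
    """
    shared = None
    for rank in ranks:
        taxon_a = lineage_a.get(rank)
        taxon_b = lineage_b.get(rank)
        if taxon_a is None or taxon_b is None:
            continue
        if taxon_a != taxon_b:
            break
        shared = rank
    return shared


# Hand-typed, illustrative lineages for the example given in the text.
photorhabdus = {
    "superkingdom": "bacteria",
    "phylum": "proteobacteria",
    "class": "gammaproteobacteria",
    "order": "enterobacteriales",
    "family": "morganellaceae",
    "genus": "photorhabdus",
    "species": "photorhabdus caribbeanensis",
}
e_coli = {
    "superkingdom": "bacteria",
    "phylum": "proteobacteria",
    "class": "gammaproteobacteria",
    "order": "enterobacteriales",
    "family": "enterobacteriaceae",
    "genus": "escherichia",
    "species": "escherichia coli",
}

print(most_specific_shared_rank(photorhabdus, e_coli))
```

with these lineages the function returns the order, matching the photorhabdus/escherichia example in the text. tallying the returned rank over many sequence/template pairs would give the kind of per-rank percentages summarised in the supplementary table.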
to eliminate bias from the use of model organisms (which tend to have the most experimental data), and to also demonstrate the scalability of r dt, the nucleotide classification analysis was extended to a broad range of sequences from a wide taxonomic distribution by processing all ncrna sequences from rnacentral, aiming to test as many realistic use cases as possible. we present a comprehensive framework for the ongoing development of consistent, standardised visualisations of rna d structures. as new d structure templates are introduced, the pipeline can be extended to cover new rna types, including structured viral rnas. for example, as the coronavirus-specific rna families were added to the rfam database in response to the covid- pandemic , their d structures were included in the template library to enable consistent visualisation of sars-cov- structured rnas (figure ) , such as the ' and ' utrs and frameshifting signal (rfam accessions rf , rf , and rf , respectively). the d structure diagrams produced by the pipeline represent computational predictions. r dt establishes a framework that can be further extended and refined. importantly, r dt can be used to generate starting versions of new templates that can be manually refined and incorporated into the template library. for example, new rrna sequences can be submitted to r dt, the species-specific expansion segment regions can be manually edited, and the resulting diagram can be submitted to r dt as described above. in addition, we identified two areas for future development and improvements: ) expanding and refining the template library. as new detailed d structures are published, we will integrate them as templates into the r dt library. in addition, r dt will benefit from the ongoing development isotype-specific consensus trna sequences and d structures were generated using r-scape from the alignments that were used to train and build the corresponding covariance models in trnascan-se . 
alignments for training the domain-specific covariance models were split into two subsets: ) type i trnas (all except type, and ) type ii trnas (leucine, serine in bacteria, archaea and eukaryotes, and tyrosine in bacteria). the bacterial trna alignments were further filtered to include only one representative trna with the same anticodon in each genus due to the original extra large training set (over , trnas). consensus sequences and the d structures of type i and ii trnas for each domain were then generated using r-scape as the isotype-specific ones. r r was used for the initial image creation using consensus sequence. custom adjustments were then made to convert the positions of the images into typical trna cloverleaf structure orientation. the templates correspond to trnascan-se . covariance models that are used to score input sequences against each isotype-specific set and pick the highest scoring domain/template combination. the pseudogene trnas, as identified by trnascan-se . , are not currently visualised. rfam d structure templates for rna families without a standard, community-accepted d structure layout, we adopted the rfam consensus d structures displayed using the r-scape and r r software. the r r software uses a set of rules that lead to consistent diagrams with the standard position of the ′ and ′ ends of the sequence. we excluded the lncrna rfam families, as well as families that each sequence's best-matching model is used in the second round of cmsearch, executed with the "--hmmonly" option, that again uses a profile hmm to score sequence only, but this time executing the full hmmer filter pipeline such that accurate hit boundaries in sequence and model coordinates are reported. while the second round of cmsearch is slower per model/sequence comparison than the first, only one model is compared to each sequence instead of all models. 
if the second cmsearch round identifies that there are multiple hits to the model, this indicates that at least some of the input sequence (the intervening sequence between adjacent hits) is either inserted relative to the model, or dissimilar from the expected homologous model region. in this case, the sequence is not evaluated further and no structure diagram will be drawn for the sequence. typically, profile hmms and covariance models are built from multiple sequence alignments, but the ssu and lsu rrna models used in r dt were built from the single sequence templates. r dt uses the rfam covariance models built from the rfam seed alignments. if, for a given sequence, the first round of ribotyper.pl cmsearch results in zero models with a score above bits, indicating that no significant similarity has been detected to any models, then the second cmsearch round is skipped and the sequence will be analyzed in a subsequent step by trnascan-se . to identify possible similarity against the trna models.

visualising d structures using traveler

to produce a layout for an input (target) structure, the traveler software requires the target and template d structures accompanied by the template layout. both the target and template structures are turned into a tree-based representation; a minimum mapping between the trees is then found and the template layout is modified based on this mapping to fit the target structure. to support the r dt pipeline, two major modifications were made to the traveler software: i) the ability to provide custom mapping and ii) optimised hairpin rotation. since the target d structure is generated by infernal within the r dt pipeline, the target-template structure mapping is already known and the original traveler's mapping procedure is not needed.
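the tree-based representation mentioned above can be illustrated with one common encoding of a dot-bracket structure, in which every base pair becomes an internal node and every unpaired nucleotide becomes a leaf. the sketch below is an assumption about what such a tree might look like, not a description of traveler's internal data structures. the function names and the dictionary layout are hypothetical, and pseudoknot brackets are not handled.

```python
def dot_bracket_to_tree(structure):
    """Convert a dot-bracket string into a nested tree of dicts.

    Base pairs become "pair" nodes holding (opening index, closing index);
    unpaired positions become "unpaired" leaves holding their index.
    Raises ValueError on unbalanced or unexpected characters.
    """
    root = {"kind": "root", "pos": None, "children": []}
    stack = [root]
    for index, symbol in enumerate(structure):
        if symbol == "(":
            node = {"kind": "pair", "pos": (index, None), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif symbol == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at position {index}")
            node = stack.pop()
            node["pos"] = (node["pos"][0], index)
        elif symbol == ".":
            leaf = {"kind": "unpaired", "pos": index, "children": []}
            stack[-1]["children"].append(leaf)
        else:
            raise ValueError(f"unexpected character {symbol!r} at position {index}")
    if len(stack) != 1:
        raise ValueError("unmatched '(' in structure")
    return root


def count_nodes(node, kind):
    """Count the nodes of a given kind in the subtree rooted at node."""
    own = 1 if node["kind"] == kind else 0
    return own + sum(count_nodes(child, kind) for child in node["children"])


# A small hairpin followed by one unpaired nucleotide.
tree = dot_bracket_to_tree("((..)).")
print(count_nodes(tree, "pair"), count_nodes(tree, "unpaired"))
```

once target and template are both expressed this way, a node-to-node mapping between the two trees (computed, or supplied from an alignment as in the modification described in the text) tells a layout routine which template coordinates can be reused and which target nodes need new positions.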
therefore, for the purpose of r dt, a new process was implemented that uses the infernal output with the target-template sequence mapping and produces an infernal-informed tree mapping which is used by traveler. although in most cases the resulting layout is overlap-free, sometimes the target and template differ in such a way that it is not easily possible to fit the target-specific portions of the structure into the template. therefore, a new overlap detection process was implemented in traveler allowing to rotate the overlapping parts of the structure so that the number of overlaps is the r dt source code is available on github under the apache . license (https://github.com/rnacentral/r dt). an r dt web server can be found at https://rnacentral.org/r dt and its source code is available at https://github.com/rnacentral/r dt- web. a custom version of xrna-gt is available at https://github.com/ldwlab/xrna-gt. trnascan-se . software, and wrote the manuscript. rdf coordinated the project and wrote the manuscript. aip conceived and implemented the r dt software, wrote the manuscript, and coordinated the project. 
predicting and modeling rna architecture the comparative rna web (crw) site: an online database of comparative sequence and structure information for ribosomal, intron, and other rnas varna: interactive drawing and editing of the rna secondary structure forna (force-directed rna): simple and effective online rna secondary structure diagrams tools for the automatic identification and classification of rna base pairs pseudoviewer: web application and web service for visualizing rna pseudoknots and secondary structures r r--software to speed the depiction of aesthetic consensus rna secondary structures rna drawer: geometrically strict drawing of nucleic acid structures with graphical structure editing and highlighting of complementary subsequences web servers for rna secondary structure prediction and analysis mfold web server for nucleic acid folding and hybridization prediction structural rna homology search and alignment using covariance models rnacentral: a hub of information for non-coding rna sequences infernal . : -fold faster rna homology searches trnascan-se . : improved detection and functional classification of transfer rna genes traveler: a tool for template-based rna secondary structure secondary structure and domain architecture of the s and s rrnas a common motif organizes the structure of multi-helix loops in s and s ribosomal rnas additional watson-crick interactions suggest a structural core in large subunit ribosomal rna secondary structure model for s ribosomal rna the complete atomic structure of the large ribosomal subunit at . a resolution evolutionary characteristics of s and s rrna structures rpredictordb: a predictive database of individual secondary structures of rnas and their formatted plots reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation dictybase : integrating multiple dictyostelid species flybase . 
: the next generation : knowledgebase for the laboratory mouse pombase : updates to the fission yeast database saccharomyces genome database: the genomics resource of budding yeast the arabidopsis information resource: making and mining the 'gold standard' annotated reference plant genome wormbase : more genomes, more data, new website genenames.org: the hgnc and vgnc resources in xrna: auto-interactive program for modeling rna. the center for molecular biology of rna /pub/xrna secondary structures of rrnas from all three domains of life nonredundant d structure datasets for rna knowledge extraction and benchmarking the protein data bank fr d: finding local and composite recurrent structural motifs in rna d structures ribovision suite for visualization and analysis of ribosomes. faraday discuss a statistical test for conserved rna structure shows lack of evidence for structure in lncrnas accelerated profile hmm searches dna homology search with profile hmms the authors would like to thank the rnacentral consortium for contributing data to rnacentral the authors declare no competing interests. key: cord- -rlee x authors: leis, jonathan; luan, chi-hao; audia, james e.; dunne, sara f.; heath, carissa m. title: ilaprazole and other novel prazole-based compounds that bind tsg inhibit viral budding of hsv- / and hiv from cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rlee x in many enveloped virus families, including hiv and hsv, a crucial, yet unexploited, step in the viral life cycle is releasing particles from the infected cell membranes. this release process is mediated by host escrt complex proteins, which is recruited by viral structural proteins and provides the mechanical means for membrane scission and subsequent viral budding. the prazole drug, tenatoprazole, was previously shown to bind to escrt complex member tsg and quantitatively block the release of infectious hiv- from cells in culture. 
in this report we show that tenatoprazole and a related prazole drug, ilaprazole, effectively block infectious herpes simplex virus (hsv)- / release from vero cells in culture. by electron microscopy, we found that both prazole drugs block the release of hsv particles from the cell nuclear membrane, resulting in their accumulation in the nucleus. ilaprazole also quantitatively blocks the release of hiv- from t cells with an ec of . μm, which is more potent than tenatoprazole. finally, we synthesized and tested multiple novel prazole-based analogs that demonstrate both binding to tsg and inhibition of hiv- egress from t cells in the nanomolar range. our results indicate that prazole-based compounds may represent a class of drugs with potential to be broad-spectrum antiviral agents against multiple enveloped viruses, by interrupting cellular tsg interaction with maturing virus, thus blocking the budding process that releases particles from the cell.

importance

these results provide the basis for the development of drugs that target enveloped virus budding and that can ultimately be used to control multiple virus infections in humans. the advent of antibiotics had a major impact on controlling bacterial infections in patients worldwide, with a single drug being used to treat multiple infections. unfortunately, antivirals have not had the same success. there are many contributing factors to this shortcoming, foremost that few mechanisms are shared by different viruses, which limits the targets available for a broad-spectrum antiviral. consequently, approved antivirals generally act against individual viruses rather than groups of viruses, limiting a single drug's potential. however, this may change with the finding that multiple classes of enveloped viruses share the same budding mechanism, which relies on host-cell endosomal sorting complex required for transport (escrt) proteins ( , ).
an inhibitor of this pathway could represent a potential broad-spectrum antiviral and have a positive impact on our ability to treat multiple enveloped virus infections with a single therapeutic agent. enveloped viruses bud from the host cell membranes and use the acquired lipid layer as a protective coat that also contains the glycoproteins required for infection of other cells. enveloped viruses do not encode the machinery needed for budding and must recruit host-cell proteins to bud from cells. in hiv, escrt proteins are recruited to virus budding complexes through an interaction between the l-domain (ptapp motifs) in virus structural proteins ( ) and the cellular protein tsg (tumor susceptibility gene ), a homolog of the e ubiquitin conjugating enzyme and a member of the escrt-i complex ( ). tsg recruits the cellular escrt-iii complex, which provides the mechanical means for viruses to pass through cell membranes and be released ( ). another enveloped virus, herpes simplex virus (hsv), assembles particles in the nucleus and relies on escrt proteins for passage through the nuclear membrane ( ). thus, this pathway may present a common target for treating multiple virus infections. in support of targeting this pathway, a recent seminal discovery in our lab established that an interferon-induced protein, interferon stimulated gene (isg ), specifically targets the escrt proteins in budding complexes to block the release of viruses ( ). this indicates that the human immune system evolved to target the escrt pathway to control infections and supports the view that this is a natural target. another group identified single-nucleotide polymorphic sites in the ' region of tsg , located at positions - and + relative to the translation start signal, which affect the rate of aids progression among caucasians ( ).
these data support the hypothesis that variation in tsg affects efficiency of tsg -mediated release of viral particles from infected cells, altering plasma viral load levels and subsequent disease progression. taken together, these investigations indicate that tsg and escrt proteins present a natural antiviral target. currently the prazole family of drugs is best known for their role as proton pump inhibitors (ppis) and a few, namely omeprazole (prilosec), esomeprazole (nexium) and ilaprazole (adiza, noltec, yi li an), are marketed to control symptoms of gastroesophageal reflux disease (gerd) in either the us or abroad. ppis form a covalent bond with the active site of proton pumps, inhibiting their ability to acidify the stomach and reducing symptoms associated with over-acidification ( ) . recent reports indicate that drugs from the prazole family, including tenatoprazole and esomeprazole, form a disulfide linkage to tsg , which results in blocking the release of hiv- from cells in culture ( ) . other groups recently reported that prazoles may have potential as an antiviral therapeutic in hsv and sars-cov in combination with acyclovir or remdesivir, respectively ( , ) . however, the prazole compound used in these studies, omeprazole, was not potent enough to be predicted to have a therapeutic effect in vivo. this highlights a gap in the ability for current prazoles to have therapeutic potential, and the need for further research on prazoles as antivirals. in the present manuscript, we demonstrate that multiple prazole drugs block the budding of hsv- and hsv- from vero cells in culture, strengthening the case for the broad-spectrum potential of this mechanism/drugs. most notably, we identified one prazole drug, ilaprazole, which blocks the release of both hiv- and hsv- / from cells at an efficiency more potent than reported for tenatoprazole. ilaprazole acts in the low µm range without detectable cell toxicity at inhibitory concentrations. 
additionally, we designed and synthesized novel prazole analogs that act in the nanomolar range to block virus release, a major step forward in creating a vbi that can be brought to the clinic. to further define the mechanism of action of these prazole drugs on hsv infections, we identified the site of blockage of herpes virus release, which appears to be different from hiv- . while the blockage to hiv- particle release is at the outer cell membrane ( ), the prazole drugs appear to first block the passage of the herpes virus through the nuclear membrane. this prevents particles being released into the cytosol, where maturation of their envelope membrane occurs to produce infectious virus. with the prazole-based inhibitors being effective in both hiv and hsv, targeting tsg could lead to a broadspectrum antiviral therapy. identification of prazole compounds that bind the uev ubiquitin-binding domain of tsg . we screened chemical compounds using a fluorescence thermal shift (fts) assay ( , ) to identify small molecules that bind directly to a truncated form of tsg (amino acids - ) which contains the ubiquitin e variant (uev) ubiquitin-binding domain (fig. ) . this truncated tsg , called tsg -uev, was used because full-length tsg has significant solubility issues in aqueous solution. tsg is an adaptor protein and thus lacks a readily deployable functional assay, making fts a tractable approach to identify interacting compounds. fts monitors protein thermal denaturation using sypro® orange, a dye which fluoresces when bound to hydrophobic surfaces, which allows monitoring of the changes in hydrophobic surface exposure during protein denaturation ( ) . since ligand binding affects protein thermal stability, it can be detected through modulation of protein thermal denaturation (melting) as a shift in melting temperature (tm). tsg -uev has a well-defined melting curve suitable for fts. we used the fts assay to identify compounds that bind to tsg -uev. 
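The Tm readout described above can be illustrated with a short sketch. It assumes a two-state sigmoidal melt curve and takes Tm as the temperature of the steepest fluorescence rise; the curve parameters, the function names, and the 4 °C destabilizing shift are hypothetical values chosen for illustration, not data or analysis code from this study.

```python
import math

def simulate_melt_curve(temps, tm, slope=1.5, f_min=100.0, f_max=1000.0):
    """Simulate dye fluorescence for a two-state unfolding transition (illustrative only)."""
    return [f_min + (f_max - f_min) / (1.0 + math.exp((tm - t) / slope)) for t in temps]

def estimate_tm(temps, fluorescence):
    """Estimate Tm as the midpoint of the interval with the steepest fluorescence increase."""
    best_slope, best_tm = float("-inf"), None
    for i in range(len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope, best_tm = slope, 0.5 * (temps[i] + temps[i + 1])
    return best_tm

# Hypothetical scan from 25 to 95 degrees C in 0.5 degree steps.
temps = [25.0 + 0.5 * i for i in range(141)]
apo = simulate_melt_curve(temps, tm=55.25)    # protein alone (assumed Tm)
bound = simulate_melt_curve(temps, tm=51.25)  # protein plus a destabilizing ligand (assumed Tm)

delta_tm = estimate_tm(temps, bound) - estimate_tm(temps, apo)
print(f"delta Tm = {delta_tm:.2f} C")  # a negative value indicates destabilization
```

A negative shift corresponds to the destabilization reported for tenatoprazole; a stabilizing ligand would give a positive shift under the same calculation.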
we compared thermal denaturation profile for tsg -uev in the presence and absence of tenatoprazole and found that it destabilizes the native protein structure, indicating that it binds tsg -uev ( fig. ) . we also tested tenatoprazole against proteins unrelated to tsg , including dhph, eno , mek , and did not observe a tm shift (data not shown), indicating that the tm shift of tsg -uev was due to specific interaction of the prazole compound. this specific binding is consistent with a previous nmr structure in which tenatoprazole forms a covalent disulfide bond to cys in the uev domain of the protein ( ). this disulfide bond formation can be prevented by including the reducing agent dtt in the assay (fig. s ). the addition of dtt abolished the tsg -uev tm shift caused by the prazoles. therefore, the addition of dtt to the fts assay is a facile means to ascertain if prazole analogs interact with tsg -uev in a covalent manner. tenatoprazole and esomeprazole were shown to quantitatively inhibit the release of infectious hiv- from t cells in culture, and it was suggested that these effects may be mediated via changes in viral interaction with tsg , a key component of the cellular escrt complex ( , ) . given multiple reports suggesting that herpes viruses also use cellular escrt proteins in their replication process ( ) ( ) ( ) ( ) we tested if the tsg -binding prazole drugs, which blocked budding of hiv- , would also block the release of herpes viruses from cells. we infected vero cells with hsv- and hsv- for two hours at a multiplicity of infection (moi) of . to assay the antiviral activity of tenatoprazole. following infection, cells were treated with different concentrations of tenatoprazole. after or h the media fractions were collected and released virus titers were determined by standard plaque assays ( ) . 
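The arithmetic behind a plaque-assay titer and a log drop in titer can be sketched as follows. The plaque counts, dilutions, and plated volume are hypothetical, and the formula is the generic plaque-forming-unit calculation rather than a protocol taken from this study.

```python
import math

def plaque_titer(plaque_count, dilution, plated_volume_ml):
    """Titer in PFU/mL = plaques counted / (dilution of the sample x volume plated in mL)."""
    return plaque_count / (dilution * plated_volume_ml)

def log_reduction(control_titer, treated_titer):
    """Log10 drop in titer of a treated sample relative to the vehicle control."""
    return math.log10(control_titer / treated_titer)

# Hypothetical counts: the same number of plaques at a 100-fold less dilute sample.
control = plaque_titer(plaque_count=45, dilution=1e-5, plated_volume_ml=0.1)
treated = plaque_titer(plaque_count=45, dilution=1e-3, plated_volume_ml=0.1)

print(f"control: {control:.2e} PFU/mL, treated: {treated:.2e} PFU/mL")
print(f"log drop: {log_reduction(control, treated):.1f}")
```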
tenatoprazole caused a -log drop in released hsv- titer and a -log drop in released hsv- titer from vero cells in a dose-dependent manner (table , column , , and ) with calculated ec values ranging from - µm. total virus titer was also determined to differentiate between virus released into the media and infectious particles present in cell lysate. total infectious virus particles were also reduced by tenatoprazole.
we next imaged herpes virus-infected vero cells using transmission electron microscopy to determine the site at which virus release is inhibited and whether it resembles that observed for hiv- release from t cells. vero cells grown on glass cover slips were infected with hsv- at moi of . for h and then treated for h with µm tenatoprazole or vehicle control. using electron microscopy, we examined eighty cells with virus particles, and representative images are shown in fig. . in the control, virus particles were in both the nucleus and cytoplasm near the cell surface ( fig. a ). in the tenatoprazole-treated cells the cytosol of the intact cells was largely devoid of virus particles (fig. b ). instead, we observed large pockets of granular material accumulated in the nucleus and immature virus particles lined up on the inside of the nuclear membrane (inset, b). these results suggest that tenatoprazole blocks the passage of herpes virus particles through the nuclear membrane. this result is different from that observed with hiv- . because tenatoprazole binds tsg , this suggests that the escrt-i protein complex is involved in transport of hsv- through the nuclear membrane. despite the lack of detectable cell toxicity at effective tenatoprazole concentrations, the effective concentration is too high for use as a clinical therapy. therefore, more potent analogs are required to explore antiviral therapeutic potential. we set out to identify and test more potent prazole analogs. we began by searching pubchem for analogs of tenatoprazole.
we identified and obtained a dozen such compounds from commercial sources and prioritized these for testing based on structural similarities around the sites where tenatoprazole covalently linked to cys of tsg . to this end, tenatoprazole, lansoprazole, rabeprazole, dexlansoprazole, pantoprazole, esomeprazole, desmethoxy-omeprazole (an omeprazole analog, -methoxy- [[( , -dimethyl- -methoxy-pridin- -yl-noxide)methyl]sulfinyl]- h-benzimidazole; labelled o-omeprazole), omeprazole, and ilaprazole were assessed in the fts assay for their ability to change the tm of tsg -uev as described above (data not shown). we determined the dose response plots of tsg melting temperature shifts caused by these prazole compounds binding to tsg ( - ) (fig. ) . o-omeprazole is the only compound predicted not to form the covalent bond with tsg , since it has an oxygen linked to a ring nitrogen that is normally a hydrogen in the active prazoles (table , right column). as expected, o-omeprazole did not cause a detectable thermal shift. the smallest thermal shift was observed with pantoprazole (gray) and the largest thermal shift was observed with ilaprazole (green). ilaprazole's ability to cause a thermal shift with tsg was blocked by the addition of dtt (fig. s ), consistent with the idea that the compound forms a disulfide linkage to tsg . next, we tested the anti-herpes virus activity of these prazole compounds (table ) . to examine the effects of the compounds on the release of hsv- from vero cells, we infected the cells with virus two hours prior to treatment with media containing different concentrations of compound. we incubated the cells for or hours and then collected the cell media fractions. several of the analogs were inactive, including o-omeprazole, pantoprazole, and rabeprazole. we identified a number of active compounds, in which there was a -fold spread of inhibition activity against hsv- , ranging from an ec of µm (for esomeprazole) to - µm (for ilaprazole). 
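The text reports calculated EC values without stating how they were derived. One common approach, assumed here purely for illustration, is to convert titers to percent inhibition relative to the untreated control and interpolate the 50% crossing on a log-concentration scale. All concentrations and titers below are hypothetical.

```python
import math

def ec50_by_interpolation(concs_um, titers, control_titer):
    """Return the concentration giving 50% inhibition, interpolated on log10(concentration).

    concs_um must be in ascending order; returns None if 50% inhibition is never bracketed.
    """
    inhibition = [100.0 * (1.0 - t / control_titer) for t in titers]
    for i in range(len(concs_um) - 1):
        low, high = inhibition[i], inhibition[i + 1]
        if low <= 50.0 <= high and high > low:
            frac = (50.0 - low) / (high - low)
            log_lo, log_hi = math.log10(concs_um[i]), math.log10(concs_um[i + 1])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None

# Hypothetical dose series: 10%, 30%, and 70% inhibition at 1, 10, and 100 uM.
ec50 = ec50_by_interpolation([1.0, 10.0, 100.0], [9e6, 7e6, 3e6], control_titer=1e7)
print(f"EC50 ~ {ec50:.1f} uM")

# An inactive compound never reaches 50% inhibition, so no EC50 is reported.
inactive = ec50_by_interpolation([1.0, 10.0, 100.0], [9.8e6, 9.5e6, 9e6], control_titer=1e7)
```

A four-parameter logistic fit would be the more rigorous choice when enough dose points are available; interpolation is shown only because it needs no fitting library.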
thus, we identified prazole analogs that are more potent than tenatoprazole. we provide the structures of prazole compounds tested in this analysis ( table , column ). of note, ilaprazole contains an additional ring structure compared to tenatoprazole that is predicted to lie in a solvent exposed area of the tsg structure that may serve to strengthen the interaction with tsg . in examining the thermal shift capacity of the prazoles, we found that these roughly correlated to their hsv- antiviral activity. this correlation indicates that the fts assay is useful in evaluating structureactivity-relationships (sar) to inform the design of new compounds (fig , table ). based on these hsv- antiviral assay results, we selected ilaprazole for further antiviral profiling. first, we tested it against hsv- (table , columns - ). ilaprazole was slightly more effective against hsv- than against hsv- with ec calculations ranging from . - µm (table ) . ilaprazole's potency is a large improvement over tenatoprazole, which inhibited in the high µm range (table & ) . additionally, ilaprazole was even more effective in inhibiting virus release at h as at h after a single application of the drug ( h ec < µm; compare table , columns & ). the inhibition caused by tenatoprazole against either virus began to fall off after h (data not shown). we also tested for toxicity in the range of effective concentrations and did not observe cell toxicity using the ® aqueous one solution cell proliferation assay reagent and wst- reagent over a h period ( table , right columns). thus, ilaprazole is more potent and has longer lasting effects than tenatoprazole. we next carried out a transmission electron microscopic examination of cells infected with hsv- at a moi . in the presence and absence of µm ilaprazole to determine if this drug causes the accumulation of virus particles in the nucleus of cells similar to tenatoprazole. 
without drug, we observed particles in the cytoplasm and in the nucleus (fig a & c); in the presence of drug, few or no viral particles were found in the cytoplasm (fig b & d). in both heavily infected cells (fig a & b) and mildly infected cells (fig c & d), treatment led to particles being detected in the nucleus arrayed along the nuclear membrane (fig c & d). this indicates that the location of particles in the cell in the presence of drug is independent of the number of particles observed. these results are similar to the effect of tenatoprazole on hsv- infected cells (fig. ).
finally, to confirm the broad-spectrum potential of ilaprazole, we tested whether ilaprazole would inhibit the release of hiv- from t cells. to this end, cells were transfected with pr -hiv- ba-l plasmid and release of virus into the media fraction was detected by monitoring the capsid (ca) protein (p ) via enzyme-linked immunosorbent assay (elisa). the drug was tested at concentrations between and µm and its effect on release of virus was assessed (table ). ilaprazole was effective at inhibiting the release of hiv- from cells with a calculated ec of . µm. we did not detect toxicity to the cells at the drug concentrations that inhibited the release of hiv- over the course of these experiments (data not shown).
in an effort to design and identify more potent viral budding inhibitors, we synthesized novel prazole-based analog compounds and assessed these for binding to tsg -uev using fts. of the compounds screened, eight demonstrated a tm shift greater than or equivalent to that observed with ilaprazole (selected examples, fig ), indicating that these compounds bind to tsg . based on the results described above, we are currently assessing the antiviral activity of these compounds in multiple antiviral assays.
initial testing of four analog compounds against hiv- reveals further increases in potency above that seen with ilaprazole (table ) with ec calculations ranging from - nm. this supports the pursuit of a medicinal chemistry campaign to apply the sar learned from the prazole drugs to identify and develop potent compounds with broad-spectrum antiviral potential.
taken together, our results indicate that prazole-based drugs block release of hiv- and herpes viruses (hsv- and hsv- ), two families of viruses with different replication mechanisms that share the common pathway of tsg -mediated release of virus particles. we were also able to identify more potent prazole analogs, in particular ilaprazole as well as novel compounds that are now in development. we found that ilaprazole and our analogs demonstrated antiviral effects significantly more potent, and potentially longer lasting, than other prazoles we tested. we are developing a novel strategy to treat viral infections affecting humans by disrupting a common mechanism used by many enveloped viruses to bud from cells. viral budding inhibitors (vbi) have the potential to be broad-spectrum antiviral therapeutics, potentially being effective against herpes ( - , ( , , ) . vbis would require testing for antiviral activity towards these different viruses before clinical use but nonetheless present a strong starting point for identifying therapeutics. in this work we demonstrate antiviral activity of prazole compounds, with no detectable cell toxicity at effective concentrations, against two viruses that use different mechanisms for viral replication. of particular note is that the viral genomes are very different, with hiv being rna-based and hsv being dna-based. that one compound works against viruses with such starkly different life cycles supports the potential of these compounds as broad-spectrum antiviral agents for current and emerging viruses.
this aspect gives this approach advantage over other potential broad-spectrum antivirals, such as remdesivir, which is targeted to rna viruses, limiting its potential as a broadspectrum antiviral ( ) . tsg binding to the proline-rich ptapp viral l-domains in gag ( , , , , , ) is required for virus particles to be released from cell membranes of infected cells. tsg is a member of the escrt-i complex of proteins involved in cell endosomal sorting. the escrt-i complex recruits proteins from the escrt-iii complex with aip ( ) , which provides the mechanical means for scission of virus particles from cell membranes. thus, blocking the ptapp l-domain sequence from interacting with the host proteins causes the virus budding defect and several lines of independent evidence support this idea. first, drugs targeted to this specific interaction in hiv cause budding defects in cells without detectable off-target effects ( ) . second, a research group identified noncoding snps in the ' region of tsg , which affect the degree of tsg -mediated release of viral particles ( ) . third, viral infections activate a host innate immunity mechanism, through interferon stimulated gene (isg ), that specifically disrupts virus budding complexes ( ) . in response to this immune system defense, many viruses encode enzymes that prevent or reverse isg conjugation to cellular proteins to avoid the budding blockade ( ) ( ) ( ) ( ) ( ) ( ) . while the prazole analogs block the release of hiv- , hsv- , and hsv- , the inhibition is manifested in different areas of the cell. the drugs block the release of hiv- at the outer cell membrane by preventing pinching of virus particles from the membrane ( ). in contrast, herpes viruses appear to be first blocked at the passage of the virus through the nuclear membrane. 
because the prazole drugs form a covalent bond to tsg , it strongly suggests that the escrt proteins are important for the herpes virus particles to be released from the nucleus of the cell where they are formed. this is consistent with the recent report by arii et al., ( ) that the escrt-iii protein complex mediates herpes virus movement across the nuclear membrane and regulates its integrity. the finding that the prazole drugs cause a significant drop in total infectious herpes viruses can be explained by the trapping of immature particles in the nucleus, which prevents them from migrating into the cytoplasm to exchange enveloped membranes, which makes them infectious. the accumulation of the dense material in the nucleus observed in the electron micrographs suggests that the drug may interfere with normal particle assembly in addition to blocking the release of the particles from the nuclear membrane. the use of prazoles as an antiviral represents an exciting potential case of repurposing existing drugs to act as antiviral agents. currently, omeprazole is marketed as a prodrug for treatment of acid reflux disease. the prodrugs are acid-activated into derivatives that form disulfide linkages with proton pumps ( , , ) . the prodrug, but not the charged sulfonamide derivative, can cross the plasma membrane barrier. the antiviral activity of tenatoprazole has been suggested to be the result of forming a covalent disulfide bond with tsg ( ). while the binding site for tenatoprazole is near the ubiquitin (ub) binding pocket and not the l-domain binding site, biochemical and confocal imaging data independently demonstrated that tenatoprazole disrupts the binding of tsg to the ptapp sequence ( ) . while the precise biochemical mechanism remains to be clarified, our fts results support that it may be related to allosteric changes in tsg after the drug forms its covalent linkage with cys . 
one of our most potent lead compounds is another prazole drug, ilaprazole, which is also marketed for treatment of acid reflux disease in china, india, and south korea (yi li an, adiza, noltec, respectively) indicating that it has reasonable bioavailability and a known clinical safety profile. previous reports did not detect offtarget effects of the prazole drugs affecting tsg metabolism inside of cells ( ) . the prazoles we tested here also appear to be nontoxic to vero, hela, and t cells at the concentrations used to inhibit budding of herpes viruses and hiv- . a recent report highlighted the potential of prazole compounds to have a therapeutic effect on sars-cov- when combined with remdesivir ( ) . however, the authors did not definitively identify the mechanism of action of the prazoles and also concluded that the potency of the prazole compound used, omeprazole, is too low to reach therapeutic levels in vivo. a potential mechanism posed by the authors is that the prazoles lead to an increase in lysosomal ph, which is the potential mechanism for lysosomotropic drugs such as chloroquine ( ) . in contrast to omeprazole, we hypothesize that ilaprazole and our more potent novel prazole compounds may allow for therapeutic levels to be reached in vivo. in the case of ilaprazole, which is marketed in several asian countries as discussed above, our strong in vitro results lay the foundation for a potential fast-track to broad-spectrum antiviral clinical testing, alone or in combination with other drugs, in these countries. we are currently working to determine if ilaprazole or our novel compounds have activity against sars-cov- with or in combination with remdesivir. this would further the potential broad-spectrum antiviral capacity of the prazole compounds described in this report. fts using thermal shift elicited by the small molecule binding effect to protein stability. 
fts monitors protein thermal denaturation using environment-sensitive dye sypro® orange which fluoresces when bound to hydrophobic surfaces, taking advantage of the changes in hydrophobic surface exposure in protein denaturation. discovery of small molecule binding to target protein utilizes the observation that ligand binding affects protein thermal stability, and therefore can be detected through a shift in the protein's thermal denaturation (melting) temperature (tm). we have employed fts to reveal changes in thermodynamic properties of tsg elicited by its interaction with a small molecule. the recombinant tsg fragment (amino acids - ), prepared as described above in materials and methods (but without label) has a thermal unfolding profile suitable for using fts as a primary screen assay in hts. a fluorescence dye sypro® orange (invitrogen) was used for assay detection. the dye is excited at nm and has a fluorescence emission at nm. the dye binds to hydrophobic regions of a protein that are normally buried in a native protein structure. when a protein is unfolded, the dye interacts with exposed hydrophobic surfaces and the fluorescence intensity increases significantly over that observed in aqueous solution. the tsg fragment was premixed at a concentration of μm with a x concentration of sypro® orange in hepes buffer ( mm hepes, mm nacl, ph . ). then µl of the protein-dye mix was added to an assay plate and to nanoliters of compound, equal to to µm, were added with an acoustic transfer robot echo (labcyte, ca). the plate was shaken to ensure proper mixing, then sealed with an optical seal and centrifuged. the thermal scan was performed from to °c with a temperature ramp rate of . °c/min. fluorescence was detected on a real-time pcr machine cfx (bio-rad laboratories). 
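The relationship between the nanoliter volume of compound transferred and the final assay concentration is a simple dilution calculation. The sketch below uses hypothetical stock concentration and volumes, because the actual values are not recoverable from the text above.

```python
def final_concentration_um(stock_mm, transfer_nl, assay_ul):
    """Final compound concentration (uM) after adding transfer_nl of stock to assay_ul of mix."""
    stock_um = stock_mm * 1000.0        # mM -> uM
    transfer_ul = transfer_nl / 1000.0  # nL -> uL
    return stock_um * transfer_ul / (assay_ul + transfer_ul)

# Hypothetical example: 50 nL of a 10 mM stock dispensed into 10 uL of protein-dye mix.
conc = final_concentration_um(stock_mm=10.0, transfer_nl=50.0, assay_ul=10.0)
print(f"final concentration ~ {conc:.1f} uM")
```

Because the transferred volume is tiny relative to the assay volume, the final concentration scales almost linearly with the number of nanoliters dispensed, which is what makes a dose series by acoustic transfer convenient.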
comparison of the thermal denaturation profile for tsg -uev in the presence and absence of tenatoprazole and other prazoles revealed destabilization of the native protein structure, indicating that the compound interacted with tsg -uev.
table . effect of tenatoprazole on hsv- and hsv- release from vero cells. tenatoprazole was incubated with vero cells infected with hsv- or hsv- at a range of concentrations. the virus released into the media fraction at stated times was determined as described in materials and methods. total virus is the amount of virus released from cells plus virus inside of the cells. viability of vero cells incubated with increasing concentrations of tenatoprazole was determined using the ® aqueous one solution cell proliferation assay reagent as described in materials and methods.
figure . dose-response plots of tsg melting temperature shift caused by prazole compounds. different concentrations of prazole compounds were incubated with tsg (aa - ) and subjected to fluorescent thermal shift analysis as described in materials and methods.
table . effect of prazole analogs on the release of hsv- from vero cells. different concentrations of the listed prazole drugs were incubated with hsv- infected vero cells for hours and then virus released into the media was quantified by plaque assays. data presented include the ec value (concentration at which virus release is inhibited by %). methods are as described in the legend of table .
effect of ilaprazole on release of hsv- and hsv- from vero cells. different concentrations of ilaprazole were incubated for the times indicated with hsv- or hsv- infected cells similar to that described in the legend to table . virus titer released into the media and total virus were determined. viability of vero cells incubated with increasing concentrations of tenatoprazole was determined using the ® aqueous one solution cell proliferation assay reagent as described in materials and methods.
table column headings: titer of hsv- in media, h; titer of hsv- in media, h; titer of hsv- in media, h; titer of hsv- in media, h; total hsv- in media + cell lysate, h
budding of enveloped viruses: interferon-induced isg -antivirus mechanisms targeting the release process
parallels between cytokinesis and retroviral budding: a role for the escrt machinery
effect of mutations affecting the p gag protein on human immunodeficiency virus particle release
avian sarcoma virus and human immunodeficiency virus, type use different subsets of escrt proteins to facilitate the budding process
tsg chaperone function revealed by hiv- assembly inhibitors
an assembly domain of the rous sarcoma virus gag protein required late in budding
fine mapping and characterization of the rous sarcoma virus pr gag late assembly domain
tsg can replace nedd function in asv gag release but not membrane targeting
ubiquitin depletion and dominant-negative vps inhibit rhabdovirus budding without affecting alphavirus budding
the functionally exchangeable l domains in rsv and hiv- gag direct particle release through pathways linked by tsg
tsg , the prototype of a class of dominant-negative ubiquitin regulators, binds human immunodeficiency virus type pr gag: the l domain is a determinant of binding
nedd l overexpression rescues the release and infectivity of human immunodeficiency virus type constructs lacking ptap and ypxl late domains
functional role of alix in hiv- replication
tsg and the vacuolar protein sorting pathway are essential for hiv- budding
tsg control of human immunodeficiency virus type gag trafficking and release
divergent retroviral late-budding domains recruit vacuolar protein sorting factors by using alternative adaptor proteins
structure and functional interactions of the tsg uev domain
the protein network of hiv budding
aip /alix is a binding partner for hiv- p and eiav p functioning in virus budding
human cytomegalovirus exploits escrt machinery in the process of virion maturation
herpes simplex virus type production requires a functional escrt-iii complex but is independent of tsg and alix expression
herpes simplex virus type cytoplasmic envelopment requires functional vps
intracellular trafficking and maturation of herpes simplex virus type gb and virus egress require functional biogenesis of multivesicular bodies
mechanism of inhibition of retrovirus release from cells by interferon-induced gene isg
the mechanism of budding of retroviruses from cell membranes
the interferon-induced gene isg blocks retrovirus release from cells late in the budding process
consistent effects of tsg genetic variability on multiple outcomes of exposure to human immunodeficiency virus type
pharmacokinetics and pharmacodynamics of the proton pump inhibitors
omeprazole increases the efficacy of acyclovir against herpes simplex virus type and
sars-cov- and sars-cov differ in their cell tropism and drug sensitivity profiles
ligand screening using fluorescence thermal shift analysis (fts)
high-density miniaturized thermal shift assays as a general strategy for drug discovery
selective targeting of virus replication by the epstein-barr virus glycoprotein carboxy-terminal tail domain is essential for lytic virus replication
functional interaction between the escrt-i component tsg and the hsv- tegument ubiquitin specific protease
the small ring finger protein z drives arenavirus budding: implications for antiviral strategies
cellular factors required for lassa virus budding
the escrt system is required for hepatitis c virus production
vps and the escrt-iii complex are required for the release of infectious hepatitis c virus particles
small-molecule probes targeting the viral ppxy-host nedd interface block egress of a broad range of rna viruses
a ppxy motif within the vp protein of ebola virus interacts physically and functionally with a ubiquitin ligase: implications for filovirus budding
a host-oriented inhibitor of junin argentine hemorrhagic fever virus egress
the multifunctional ebola virus vp matrix protein is a promising therapeutic target
hiv- and ebola virus encode small peptide motifs that recruit tsg to sites of particle assembly to facilitate egress
involvement of vacuolar protein sorting pathway in ebola virus release independent of tsg interaction
ebola virus matrix protein vp interaction with human cellular factors tsg and nedd
interaction of tsg with marburg virus vp depends on the pppy motif, but not the pt/sap motif as in the case of ebola virus, and tsg plays a critical role in the budding of marburg virus-like particles induced by vp , np, and gp
hepatitis b virus maturation is sensitive to functional inhibition of escrt-iii, vps , and γ -adaptin
mumps virus matrix, fusion, and nucleocapsid proteins cooperate for efficient production of virus-like particles
evidence for a new viral late-domain core sequence, fpiv, necessary for budding of a paramyxovirus
requirements for budding of paramyxovirus simian virus virus-like particles
l-domain flanking sequences are important for host interactions and efficient budding of vesicular stomatitis virus recombinants
ppey motif within the rabies virus (rv) matrix protein is essential for efficient virion release and rv pathogenicity
the antiviral compound remdesivir potently inhibits rna-dependent rna polymerase from middle east respiratory syndrome coronavirus
ub surprised: viral ovarian tumor domain proteases remove ubiquitin and isg conjugates
ovarian tumor domain-containing viral proteases evade ubiquitin- and isg -dependent innate immune responses
antiviral activity of innate immune protein isg
role of nedd and ubiquitination of rous sarcoma virus gag in budding of virus-like particles from cells
structural basis for ubiquitin-like isg protein binding to the ns protein of influenza b virus: a protein-protein interaction function that is not shared by the corresponding n-terminal domain of the ns protein of influenza a virus
influenza b virus ns protein inhibits conjugation of the interferon (ifn)-induced ubiquitin-like isg protein
escrt-iii mediates budding across the inner nuclear membrane and regulates its integrity
general pharmacological properties of the new proton pump inhibitor
restoration of acid secretion following treatment with proton pump inhibitors
targeting endosomal acidification by chloroquine analogs as a promising strategy for the treatment of emerging viral diseases
figure . thermal shift data of tsg by lead compound tenatoprazole (n ). the compound caused a dose-dependent shift in the tm for tsg -uev indicating binding to the key domain of tsg as described in materials and methods.
supplemental fig. s . effect of adding dtt to the thermal shift of tsg -uev in the fts assay. the addition of dtt abolishes the tm shift in the fts assay. this is consistent with all of these prazole compounds forming a disulfide bond to tsg .
key: cord- - dxwbmd authors: shengjuler, djoshkun; chan, yan mei; sun, simou; moustafa, ibrahim m.; li, zhen-lu; gohara, david w.; buck, matthias; cremer, paul s.; boehr, david d.; cameron, craig e. title: the rna-binding site of poliovirus c protein doubles as a phosphoinositide-binding domain date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: dxwbmd
some viruses use phosphatidylinositol phosphate (pip) to mark membranes used for genome replication or virion assembly. pip-binding motifs of cellular proteins do not exist in viral proteins. molecular-docking simulations revealed a putative site of pip binding to poliovirus (pv) c protein that was validated using nmr spectroscopy. the pip-binding site was located on a highly dynamic α-helix that also functions in rna binding. broad pip-binding activity was observed in solution using a fluorescence polarization assay or in the context of a lipid bilayer using an on-chip, fluorescence assay. all-atom molecular dynamics simulations of the c protein-membrane interface revealed pip clustering and perhaps pip-dependent conformations.
pip clustering was mediated by interaction with residues that interact with the rna phosphodiester backbone. we conclude that c binding to membranes will be determined by pip abundance. we suggest that the duality of function observed for c may extend to rna-binding proteins of other viruses. highlights: a viral pip-binding site identified, validated and characterized; pip-binding site overlaps the known rna-binding site; pip-binding site clusters pips and perhaps regulates conformation and function; duality of pip- and rna-binding sites may extend to other viruses. in brief: the absence of conventional pip-binding domains in viral proteins suggests unique structural solutions to this problem. shengjuler et al. show that a viral rna-binding site can be repurposed for pip binding. pip clustering can be achieved. the nature of the pip may regulate protein conformation. interactions between viral proteins would then recruit the full complement of the genome-replication machinery. in , altan-bonnet and colleagues discovered that a picornavirus induced the synthesis of the phosphoinositide, phosphatidylinositol -phosphate (pi p), and showed that pi p tracked with sites of genome replication (hsu et al., ) . this observation inspired the hypothesis that a cellular lipid rather than a cellular protein might be the lure used to recruit viral proteins (hsu et al., ) . ninety-four percent of the docking solutions in the major cluster exhibited a similar head group orientation. red spheres indicate the position of the phosphate at position four of the inositol ring ( figure a ). in contrast, the orientation of the fatty acyl chains varied for each docking solution. docking results indicated that k , r , and r of the major cluster interact with pi p ( figure a) . similarly, k and r of the minor cluster are positioned to interact with pi p ( figure a) . 
these residues are highly conserved in the c proteins of all prototype viruses in the enterovirus genus of the picornaviridae family ( figure b) . even though the sequence similarity between members of the genus ranges from - %, the basic nature of k , r , r , and r were conserved, whereas that of k was not ( figures b) . validation of pip-binding sites by nmr in order to test the validity of our docking observations, we titrated n-labeled c protein with soluble dibutyl-pi p to observe potential nmr chemical shift perturbations (csps), which would indicate chemical environment changes in the presence of pi p (figure ) . out of the three basic residues of the major cluster that were predicted to interact with the pi p, r showed the largest csp (figure a ). there were also smaller csps associated with k and r (figure a) . however, we did not observe any csps for the minor cluster residues k and r . as seen in figure b , the addition of pi p resulted in a number of csps beyond that of k , r , and r . csps at least one standard deviation above the mean were considered to be significantly different (csp > . ppm). many of these residues were not in the immediate vicinity of a docked pi p molecule. for example, a , m , and n are located at the n-terminus of c at a distance from pi p but all showed significant csps ( figure b ). these and other perturbations could be caused by other structural changes conferred by pi p binding. broad pip-binding activity of c observed in solution in order to assess pip binding to c empirically and evaluate specificity of any observed binding, we used a fluorescence polarization (fp)-based pip-binding assay (ceccarelli et al., ; contrast, the bound form of the fluorescent ligand has a larger molecular volume, which reduces the rotation such that the emitted light remains in the same plane as that used for excitation for a longer period of time ( figure a ). 
this approach was validated by using the plc-δ ph domain as a positive control, which showed pi( , )p binding specificity, as expected ( figure s ) (harlan et al., ) . we measured binding of phosphatidylinositol (pi) and mono-, bis-, or tris-phosphorylated pips to c ( figure b ). binding of pi and all mono-phosphorylated pips was very weak. however, binding of bis-and tris-phosphorylated pips was substantially stronger and without any apparent order or preference ( figure b ). given the absence of an observed binding preference of c, we determined the apparent dissociation constant (kd,app) for pi( , )p binding to c. the value was ± µm ( figure c ). to assess the contribution of c residues implicated in pip-binding to the actual binding event monitored computationally (figure a) , we changed k , r , or r to leu. we used leu instead of ala to limit any collateral damage that might be caused by creating a void in the protein that could be filled by water. one expectation was that loss of any of these residues would diminish pip binding. however, we observed that the k l derivative behaved as wt (kd,app = ± µm) ( figure d ). the r l derivative reduced binding by ~ -fold relative to wt (kd,app = ± µm) ( figure e ). the r l derivative actually increased pip-binding affinity by ~ -fold (kd,app = nmr experiment, dibutyl-pi p was used instead. we monitored residues that were implicated in pip binding: k , r , and r . it should be noted that the csps induced by addition of pi p (figure a ) or rna ( figure i ) exhibited different trajectories. the addition of pi p to the c-rna complex ( figure i ) led to resonance positions similar to that of the c-pi p complex (figure a) , suggesting that pi p competed out rna and that pi p and rna binding were mutually exclusive. broad pip-binding activity of c observed on membranes it is possible that the context in which a pip is presented contributes to binding affinity and/or specificity. 
to address this question, we employed a ph-modulation assay to monitor protein binding to a planar supported lipid bilayer (slb) in a microfluidic device (jung et al., ; shengjuler et al., ) . the slb is a stable model membrane system, and the composition of the membrane can be easily tailored to probe interactions of specific lipids with proteins. the slb used here employed three lipids, the structures of which are illustrated in figure a . a ph-sensitive dye, ortho-sulforhodamine b-pope (osrb-pope), is embedded into the slb and used to probe protein binding by detecting changes in the local electric field (figure b). upon binding of c, which is positively charged at the experimental ph, the interfacial potential increases and more hydroxide ions are recruited locally. as a result, the fluorescence signal decreases ( figures b and s ) . in these studies, we used -palmitoyl- -oleoyl-sn-glycero- -phosphatidylcholine (popc) as the primary component of the slb. we performed experiments in the presence of mg + divalent cations to better mimic the cellular environment. c did not bind to pure popc membranes ( figure c ). the addition of pi p to the popc membrane revealed the capacity of c to bind to this mono-phosphorylated pip ( figure c ). the value for the kd,app was . ± . µm (table ). in addition to pi p, c bound to popc membranes containing pi p and pi( , )p , all with essentially the same affinity (table ) . pip binding to c in the context of a membrane also required the phosphates because phosphatidylinositol (pi) binding was weak ( table ) . it is worth noting that pi binding did occur when compared to popc alone; however, we were unable to produce a complete binding isotherm because of the extremely high concentration of protein required. the presence of pi will produce a net negative charge on the membrane, so the interaction with the basic surface of c makes sense. 
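The apparent dissociation constants quoted for the bilayer and solution assays come from hyperbolic fits of signal against protein concentration. The following is a minimal sketch of such a fit, assuming a one-site model, signal = amplitude × [c] / (kd,app + [c]). The function names, the search bounds, and the grid-refinement strategy are illustrative assumptions and are not the authors' procedure; the source states only that nonlinear regression was performed in GraphPad Prism.

```python
import math

def hyperbola(conc, amplitude, kd):
    """One-site binding model: signal = amplitude * conc / (kd + conc)."""
    return amplitude * conc / (kd + conc)

def _best_amplitude(concs, signals, kd):
    """For a fixed kd the model is linear in amplitude, so solve it in closed form."""
    f = [c / (kd + c) for c in concs]
    amplitude = sum(s * fi for s, fi in zip(signals, f)) / sum(fi * fi for fi in f)
    sse = sum((s - amplitude * fi) ** 2 for s, fi in zip(signals, f))
    return amplitude, sse

def fit_hyperbola(concs, signals, kd_min=1e-3, kd_max=1e3, rounds=6, points=50):
    """Least-squares fit by scanning kd on a log grid and refining around the best value.

    kd_min and kd_max are hypothetical search bounds in the same units as concs.
    Returns (amplitude, kd_app).
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best_kd = None
    for _ in range(rounds):
        step = (hi - lo) / (points - 1)
        grid = [10 ** (lo + i * step) for i in range(points)]
        best_kd = min(grid, key=lambda kd: _best_amplitude(concs, signals, kd)[1])
        center = math.log10(best_kd)
        lo, hi = center - step, center + step  # zoom in around the current best kd
    amplitude, _ = _best_amplitude(concs, signals, best_kd)
    return amplitude, best_kd
```

When the data stay in the linear range, as reported for the pi and pops membranes, amplitude and kd,app cannot be determined separately; fixing the amplitude to a value from another ligand then yields only a lower limit for kd,app.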
to test the possibility that negative charge alone can promote binding of c to the membrane, we performed an experiment in a membrane containing -palmitoyl- -oleoyl-sn-glycero- -phospho-l-serine (pops). the presence of pops will also produce a net negative charge on the membrane. the affinity of c for the pops-containing membrane was on par with that observed for the pi-containing membrane ( table ) . (figures a and b) . although the same residues of c were used to interact with both pips, the orientation of c differed slightly when bound to pi p compared to that bound to pi( , )p (figures a and b) . for reference, residues of the protease active site have been indicated by black spheres. the orientation of the active site in figure a is clearly different from that observed in figure b . this analysis expanded the number of residues interacting with pip headgroups to include residues in regions observed by nmr, for example d , k , and r ( figures a and b) . interestingly, the contribution of d to c-pip interface was mediated by two na + ions in the case of pi p ( figure a ) and one na + ion in the case of pi( , )p ( figure b ). a surface representation of c interacting with pips in the context of the membrane showed that c is propelled above the membrane by the headgroups (figures c and d) . the n-terminus of c was an exception. rotation of c to bring the n-terminus into view showed that the n-terminus has a more substantive interaction with the outer leaflet of the bilayer, albeit of a non-specific nature (figures e and f ). the conformation and interaction of the n-terminus with membrane was also influenced by the pip present (figures e and f ). for the n-terminal residues of c to exhibit such conformational flexibility and diverse interactions with membranes, the solution behavior would have to be more dynamic than predicted by the crystal structure (mosimann et al., ) . 
in order to evaluate the solution behavior of c, we performed small-angle x-ray scattering (saxs) experiments. figure a shows the scaled scattering profiles of c at concentrations ranging from . mg/ml to . mg/ml. the overlap of the scaled curves at the low ends of scattering angles was consistent with the protein remaining monomeric over this concentration range in agreement with the nmr and dls studies (chan et al., ) . the scattering data at the low-angle end showed a linear correlation, satisfying the guinier approximation (qrg < . ), from which we determined a radius of gyration (rg) of . Å ( figure a ). the rg derived from the guinier approximation was in good agreement with that estimated from the pair-distance distribution function p(r) calculated by gnom (real-space rg = . Å, reciprocal-space rg = . ± . Å) ( figure b ). from the calculated p(r), the maximum particle dimension of the protein was estimated to be Å ( figure b). the asymmetric shape of the distribution function suggested the presence of a tail in the solution structure of c. the values of rg and dmax from saxs data also suggested that the protein exists as a monomer in solution. furthermore, the estimated molecular mass of . ± . kda (derived from porod volume) or . ± . kda (derived from saxs mow calculation) was in very good agreement with the calculated molecular mass of a monomer ( . kda). we compared the calculated scattering profiles of the crystal structure and the average md structure to the experimental data. as shown in figure c , both structures agreed well with the experimental data. however, the calculated curve from the average md structure showed a slightly better fit to the saxs data relative to the crystal structure (χ = . for crystal structure vs. χ = . for md structure). next, we built the ab initio low-resolution saxs model using the dammin and dammif programs; independent models from each program were generated and averaged. 
the models were highly similar; the average values of the normalized spatial discrepancy (nsd), which represents the similarity among the models, were . ± . and . ± . for dammin and dammif models, respectively. superimposing the average md structure onto the saxs model showed a very good agreement between these two models with an nsd of . . the crystal structure showed less agreement with the saxs model with an nsd of . . the reconstructed low-resolution saxs model clearly revealed the presence of an extended and highly dynamic n-terminal helix of c, the volume of which exceeded the volume of a helix that is residues in length ( figure d ). worth noting, the amphipathic nature of this helix ( figure s ) may also contribute to the alternative conformations observed on membranes (figures e and f ). the membranes forming the genome-replication organelle of enteroviruses are enriched in pi p. it is known that the rna-dependent rna polymerase domain, d, of enteroviruses binds to pi p, although it is not clear where on the protein this binding occurs (hsu et al., ) . because the cd protein accumulates to a much higher level in pv-infected cells than d and cd has been implicated in formation and/or function of the genome-replication organelle (oh et al., ), we used the available structural information for these proteins (marcotte et al., ; mosimann et al., ; thompson and peersen, ) to screen for pi p-binding sites by using a molecular-docking algorithm (goodsell et al., ; morris et al., ) . the observation of a putative pi p-binding site on c was unexpected but exciting as it was now very possible that trafficking of pv proteins to the site of genome replication might be facilitated by use of new structural classes of pi p-binding domains (figure ). 
our focus on c was motivated by the arsenal of structural and computational resources available to study pv c that would facilitate identification and characterization of the pi p-binding site (amero et al., ; marcotte et al., ; mosimann et al., ; moustafa et al., ) , including a near-complete backbone nmr resonance assignment of the d (amero et al., ) . using solution- and bilayer-based assays for pip binding, we observed pip binding by c (figures and ) . titration of pi p into a solution containing c caused csps that were consistent with the major pi p-binding site observed computationally (figure a) . however, the csps also included other residues (figure b) , in particular residues known to contribute to the rna-binding activity of c (amero et al., ) . pip-binding was indeed competitive with rna binding (figures h and i) . for a small molecule like pi p to cause so many csps or to compete with a substantially larger rna, either multiple molecules of pi p bind to the protein or a single molecule of pi p induces a large change in the conformation and/or dynamics of the protein. analysis of nmr spectra was inconsistent with large-scale changes in conformation or dynamics as most of the resonance positions and intensities were not affected by pi p binding (figure ) . md simulations were consistent with multiple pips binding to pv c along the shallow cleft used for rna binding (figure ) . while the negative charge of the phospholipid headgroup contributed to pip binding, the display of the charge on the inositol ring contributed to high-affinity binding ( table ) . in this regard, neither the location of the phosphate on the inositol ring nor the number of phosphates on the inositol ring mattered for c binding ( table ) , although the nature of the pip influenced the conformation of c bound to membrane observed computationally ( figure ) . 
md simulations indicated that one contributor to the pip-dependent conformation is the flexibility of the n-terminus (compare figures e and f ). such conformational flexibility of the c n-terminus was also observed in solution using saxs (figure ). the pip-binding site structure, corresponding specificity and extreme capacity to cluster pips differ substantially from known cellular pip-binding domains. most cellular pip-binding domains contain a solvent-accessible, basic cavity to which a pip binds (lemmon, ) . in certain cases, spatial restriction within the cavity enables stereospecificity to the interaction with a pip (chukkapalli et al., ; johnson et al., ; saad et al., ) . the role of viral matrix proteins is to bridge the genome or some genome-containing ribonucleoprotein complex to the membrane used to envelop the virus as it buds from the cell. both of these viral proteins appear to bind specifically to pi( , )p . pip-binding signals to the protein that it has reached the destination for assembly. in the case of ma, pip binding triggers the exposure and insertion of its myristoylated n-terminus into the plasma membrane (saad et al., ) . in the case of vp , pip binding promotes oligomerization (johnson et al., ) . both hiv ma and ebola vp assemble into oligomers, which cluster pips, but not to the same extreme as predicted for c (johnson et al., ; saad et al., ) . both of these matrix proteins bind rna. non-structural protein a (ns a) from hepatitis c virus binds pi( , )p and, to a lesser extent, pi( , )p through its n-terminal amphipathic α-helix (cho et al., ) . the motif used has been referred to as the basic amino acid pip pincer (baapp) domain (cho et al., ) . this motif is nothing more than two basic amino acids flanking a series of hydrophobic residues displayed on a helix that presumably penetrates into the bilayer (cho et al., ) . 
the baapp motif can be found in amphipathic α-helices of cellular and viral proteins; however, the function is not known. indeed, such a motif might exist in the n-terminus of c ( figure s ). in the case of c, we propose that this helix just augments membrane binding without any specificity (figures e and f). how the baapp domain confers specificity is not known. it is worth noting that ns a is an rna-binding protein. the organization described for c, amphipathic α-helix followed by an rna-binding domain, therefore applies to ns a. it is compelling to speculate that this rna- binding domain might also interact with pips. our computational studies suggested that c interacted with a single pip mediated by k and r . (figure a) . although r was nearby, a role for this residue in pip binding was not suggested ( figure a) . removal of the positive charge from k had no impact on pip binding, removal from r increased pip-binding affinity, but removal from r weakened pip-binding affinity ( figures c- f) . a role for r and r in pip binding was supported by nmr experiments, but the evidence for k was weak ( figure b) . so, while the molecular-docking simulation pointed us in the right direction, the molecular details of the putative interaction were not deliberate interrogation of this hypothesis is warranted. in conclusion, in an effort to begin to identify proteins and mechanisms that pv uses to recognize pips, we have discovered that nature has been creative and adapted a viral rna- binding surface to perform this additional task. the outcome is a new structural class of pip- binding proteins with the capacity for multivalent binding to any mono-, bis-or tris-phosphorylated pip, but with a unique conformational state determined by the nature of the bound pip. it is likely that a similar mechanism exists in noroviruses and coronaviruses, suggesting a role for pips in the biology of these viruses even though empirical support of this possibility is needed. 
csps at least one standard deviation above the average were considered to be substantial (Δtotal = . ppm). these results are in agreement with the docking analyses, which show pi p-binding towards the n-terminus of c. chemical structures of the supported lipid bilayer (slb) components; ph-sensitive ortho-sulforhodamine b- -palmitoyl- -oleoyl-sn-glycero- -phosphoethanolamine (osrb-pope); l-αphosphatidylinositol -phosphate (pi p), and -palmitoyl- -oleoyl-sn-glycero- -phosphocholine (popc). "r" represents the fatty acyl chains. (b) schematic diagram illustrating the principles of the slb-binding experiment. in the absence of c, the fluorescent probe is in its "on state" (left). upon binding to the membrane, the interfacial potential is increased, causing the fluorescent probe to switch to its "off state" (right). the ph response curve of the fluorescent probe in a bilayer containing mol% popc, . mol% pi p, and . mol% osrb-pope is shown. (c) c binding to pi p-containing slbs. change in fluorescence intensity was observed as a function of c concentration, which was normalized to a reference channel. shown is a hyperbolic fit of the data set. the apparent dissociation constant for c-pi p interaction is . ± . µm. c was unable to bind to pure popc membranes. all experiments were conducted at mm hepes at ph . , mm nacl, and mm magnesium acetate. error bars represent the sem (n = ). to the experimental saxs data (grey). both structures fit well; however, the average md structure showed slightly better fitting compared to the crystal structure as indicated by its lower χ-value. the reconstructed saxs filtered model calculated by dammin is shown as an orange transparent surface with the crystal structure superimposed. the saxs model clearly shows an extended n-terminus compared to the crystal structure. the calculated scattering profile of the saxs model (orange) fitted to the experimental data (grey) is also shown. . ± . . ± . . ± . pi( , )p . ± . . ± . . ± . 
a each membrane also contained . mol% osrb-pope. b pi and pops binding as a function of c concentration was in the linear range. assuming an amplitude similar to that of pi p binding, yields the indicated values for kd,app, which represent a lower limit. further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, craig e. cameron (cec @psu.edu). cell lines rosetta™(de ) competent cells (millipore sigma) were used for plc-δ protein purification and grown in nzcym media, ph . , by iptg induction. poliovirus proteins were expressed and purified from e. coli bl (de )pcg (shen et al., ) in nzcym media, ph . , by iptg induction. materials. bodipy-fl-labeled and unlabeled water soluble phosphoinositides were from echelon biosciences, inc. construction of expression plasmids. the c construct used in this study, pet ub-pv- c-c a-c s-chis , was previously described (amero et al., ) . the quikchange site-directed mutagenesis kit (stratagene) was used to introduce mutations into the c coding sequence. the oligonucleotides employed were as follows: the docking study of pi p to c and d proteins was carried out using autodock . suite of programs (goodsell et al., ; morris et al., ) . the crystal structure of the c protein (pdb l n) (mosimann et al., ) was obtained from the protein data bank. only chain a of the two identical monomers in the crystal structure was chosen for the docking. the protein was prepared for the docking runs as follows: the structural water molecules and any non-protein atoms were deleted from the crystal structure; explicit hydrogen atoms were added to the protein and the structure was subjected to quick minimizations ( steps) using amber force field ff sb in chimera (hornak et al., ; pettersen et al., ) . 
next, the autodocktools (adt) was used to complete the preparation of the target protein by merging the non-polar hydrogen atoms, adding kollman charges, and creating the pdbqt files for the docking runs. the structure of the pi p ligand was prepared by modifying the structure of pi( , )p , extracted from the nmr complex structure of hiv- matrix protein bound to pi( , )p (pdb h z), in chimera (saad et al., ) . the ligand was subjected to quick energy minimizations ( steps) in chimera after adding hydrogen atoms and gasteiger-marsili atomic partial charges; adt was then used to prepare the ligand pdbqt file, the flexible ligand has active torsions. for the target protein, the affinity grid field was generated using the auxiliary program autogrid. small angle x-ray scattering (saxs) analysis. a non-his-tagged version of c protein was expressed and purified. protein samples were prepared at concentrations of . , . , and . mg/ml in mm tris-hcl buffer at ph . containing mm nacl, % glycerol, mm edta and mm dtt. synchrotron saxs data were collected on the f -line station at macchess at k using dual pilatus k-s detector and a wavelength of . Å. the data were collected using exposure times of minutes in ten -second frames and covered a momentum transfer range (q-range) of . < q < . Å - . the program raw (nielsen et al., ) was used for data reduction and background subtraction. the radius of gyration (rg) and forward scattering i( ) were calculated using guinier approximation. the gnom program (svergun, ) was used to calculate the pair-distance distribution function p(r) from which the maximum particle dimension (dmax) and rg were estimated. the ab initio low-resolution models were reconstructed using dammin (svergun, ) and dammif (franke and svergun, ) programs using data in the range . < q < . Å - ; ten independent models generated from each program were averaged using damaver (volkov and svergun, ) . foxs expression and purification of pv c. 
c protein was expressed in bl (de )pcg competent cells as previously described (shen et al., ) . cells were harvested by centrifugation at , x g at °c for min and washed with a buffer containing mm tris and mm edta at ph . and then re-centrifuged. the cell pellet was resuspended in buffer a [ mm hepes, % glycerol, mm imidazole, mm β-mercaptoethanol (bme), mm edta, mm nacl, . μg/ml pepstatin a, . μg/ml leupeptin, at ph . ] at a ml buffer a per -gram cell pellet ratio. isotopic labeling and sample preparation for nmr. small unilamellar vesicle (suv) preparation. lipids were mixed at the desired mole ratio in chloroform in a glass scintillation vial. the chloroform was removed by continuously purging the vial with n gas. desiccation was performed under vacuum for - hours to remove any residual organic solvent. the dried lipid films were hydrated with mm hepes, mm nacl, at ph . followed by sonication in a water bath at room temperature to obtain . mg/ml lipid suspensions. these suspensions were then subjected to freeze−thaw cycles with liquid n and °c water bath, and extrusion cycles through a nm track-etched polycarbonate membranes glass cleaning procedure. the glass substrates used for supporting fluid lipid bilayers were first boiled in a -fold diluted x cleaning solution (mp biomedicals) and water for one hour. this was followed by rinsing the glasses with copious amounts of purified water before drying thoroughly with nitrogen gas. the coverslips were then annealed for five hours at °c before being stored until use. photomask design. the microfluidic device was designed with a drafting software (autocad v. ). this device consists of eight channels with independent inlets and outlets. each channel has a dimension of cm x µm (length x width) and each channel is separated from each other by a µm gap. the design with black background and clear features was printed at a , dpi resolution on a transparent mask ( x in) by cad/art services. silicon mold fabrication. 
the mold containing the microfluidic patterns was fabricated in the nanofabrication laboratory at penn state in university park, pa. a -inch silicon wafer was dehydrated ( min at °c). the su - (microchem corp.), negative tone photoresist, was poured onto the center of the wafer via static dispensing method, spun ( sec at rpm, sec at , rpm), pre-baked ( min at °c), soft-baked ( min at °c), exposed to uv light with the ma/ba mask aligner for min ( x sec/exposure) at a power density of . mw/cm to produce a positive relief of photoresist on the wafer, and then postbaked ( min at °c and min at °c) to selectively cross-link the uv-exposed portions of the film. the wafer was developed in an su developer for min (without agitation), rinsed with isopropyl alcohol (ipa), and dried with n gas. the wafer with photoresist pattern was hard-baked for sec at °c, sec at °c, and min at °c. in the simulations, a membrane consisting of popc and pi p lipid molecules, and a membrane composed of popc and pi( , )p were prepared. the model membranes were created by the charmm-gui (wu et al., ) and equilibrated for ns. the protein was placed above the membrane with the center of mass of the protein about nm away from the center of membrane in the z direction. the closest distance between the protein and the lipid molecules is around . nm. the starting orientation of the protein towards the membrane is based on the docking prediction, with few vital pipinteracting residues such as r and r facing the membrane. the protein and membrane were solvated in a box of tip p water. sodium and chloride ions were added at a near-physiological ion concentration of mm. the charmm force field was used for the simulation atoms (huang and mackerell, ; klauda et al., ) . a time step of fs was employed. the van der waals interactions was cut at . nm and the electrostatic interactions was treated with particle-mesh ewald (pme) method. 
the shake algorithm was used for length constraint on bonds involving hydrogen. the simulations were carried out under conditions of constant temperature at k and constant pressure at bar. the simulation system was subjected to energy minimization for steps, followed by ns constraint simulation with a harmonic potential applied on protein atoms. the production simulations of ns in length were then performed. the initial ns of all simulations was simulated using the namd/ . package (phillips et al., ) . then it was moved to the anton supercomputer that is optimized for md simulation for another ns simulation (shaw et al., ) . the trajectories of the last ns were used for analysis. statistical analysis and nonlinear regression was provided by graphpad prism v. . error bars represent the sem. the n-terminus of poliovirus c. two clusters were identified by docking dibutyl-pi p (red spheres) on c surface (grey ribbon). the major cluster ( % of the trials) encompasses residues k , r , and r ; the minor cluster ( % of the trials) encompasses residues k and r . total of docking runs were performed. the crystal structure of c (pdb l n) was obtained from the protein data bank (mosimann et al., ) . the structure of the pi p ligand was prepared by modifying the structure of phosphatidylinositol , -bisphosphate (pi( , )p ), extracted from the nmr complex structure of hiv- matrix protein bound to pi( , )p (saad et al., ) . phosphates on the inositol head group are depicted as red spheres; basic residues in the major and minor clusters are depicted as sticks. (b) multiple-sequence alignment of enteroviral c proteins shows that the majority of the residues predicted by docking (k , r , r , and r ) and/or the basic charge are conserved; k is not conserved (colored in blue). 
residues implicated in phosphoinositide-binding ( ); conserved residues (*); pv, poliovirus; cv, coxsackievirus; ev, enterovirus; bev, bovine enterovirus; pev, porcine enterovirus; sev, simian enterovirus; sv, simian virus. n heteronuclear single quantum correlation (hsqc) spectra of free c (black) and c in a complex with dibutyl-pi p (red). (right) close up view of the chemical shift perturbations (csps) for k , r , and r , which are implicated in pi p-binding by docking. the nmr experiments were conducted at °c in a buffer containing mm hepes at ph . , mm nacl, and % d o. the final concentration of c and pi p were . mm and . mm, respectively. (b) a bar graph showing csps induced throughout c by pi p binding. the total chemical shift change (Δtotal) depends on the chemical shift changes in the h (Δh) and n (Δn) dimensions according to Δtotal = (Δh + . Δn ) . . residues with significant csps (above the red line) are indicated. csps at least one standard deviation above the average were considered to be substantial (Δtotal = . ppm). these results are in agreement with the docking analyses, which show pi p-binding towards the n-terminus of c. the principles of the fluorescence polarization (fp)-based pip-binding assay. unbound form of the fluorescent ligand (pip-bodipy-fl) yields low polarization value due to its small size and rapid rotation. in contrast, the bound form yields a higher polarization value due to the overall size of the complex and its slow rotation. all fp-based pip-binding experiments were conducted in a buffer containing mm hepes at ph . and mm nacl. c-pi( , )p samples were incubated for s inside a chamber at °c prior to measuring the millipolarization (mp). (b) pip-binding specificity of c. c binds to both bis-and tris-phosphorylated pip species. binding was tested at a fixed c concentration ( µm) using . nm of each pip-probe as indicated. error bars represent the sem (n = ). 
(c) wt c and c variants: (d) k l, (e) r l, and (f) r l bind to phosphatidylinositol , -bisphosphate (pi( , )p ) with varying affinities. shown is a hyperbolic fit for each data set from which each apparent dissociation constant (kd,app) was obtained. values are provided in table . (g) the principles of the fluorescence polarization (fp)-based competition assay. c-rna (fluorescently-labeled -mer) complex is pre-formed, which yields chemical structures of the supported lipid bilayer (slb) components; ph-sensitive ortho-sulforhodamine b- palmitoyl- -oleoyl-sn-glycero- -phosphoethanolamine (osrb-pope); l-α-phosphatidylinositol phosphate (pi p), and -palmitoyl- -oleoyl-sn-glycero- -phosphocholine (popc). "r" represents the fatty acyl chains. (b) schematic diagram illustrating the principles of the slb-binding experiment. in the absence of c, the fluorescent probe is in its "on state" (left). upon binding to the membrane, the interfacial potential is increased, causing the fluorescent probe to switch to its "off state" (right). the ph response curve of the fluorescent probe in a bilayer containing mol% popc, . mol% pi p, and . mol% osrb-pope is shown. (c) c binding to pi p-containing slbs. change in fluorescence intensity was observed as a function of c concentration, which was normalized to a reference channel. shown is a hyperbolic fit of the data set. the apparent dissociation constant for c-pi p interaction is . ± . µm. c was unable to bind to pure popc membranes. all experiments were conducted at mm hepes at ph . , mm nacl, and mm magnesium acetate. error bars represent the sem (n = ). figure . c interacts with multiple pi p and pi( , )p ligands. all-atom molecular dynamics simulations of (a) c binding to a pi p-containing membrane or (b) to a pi( , )p -containing membrane. snapshots of selected simulations show that c interacts with five, clustered pi p or pi( , )p molecules. 
pi p and pi( , )p are shown as dark grey sticks with phosphates colored orange (phosphorus) and red (oxygen). c is shown as a light grey ribbon; residue side chains are colored as follows: light blue (positive side chains) and red (negative side chains) sticks. also shown are: sodium ions as magenta spheres; the amino- and carboxy-termini as blue and red spheres, respectively; c protease active-site residues as black spheres. specific interactions are as follows, with the number in parentheses referring to the headgroup in the panel: ( ) r and r ; ( ) d , mediated by sodium ions; ( ) d for pi p in panel a mediated by sodium, or r for pi( , )p in panel b; ( ) k and r ; ( ) α-amino group of g for pi p in panel a, or k and r for pi( , )p in panel b. (c) and (d) surface representations of c (light grey) with pi p-containing membrane in panel c or pi( , )p -containing membrane in panel d (blue, red and dark grey sticks); phosphatidylcholine (light grey lines). charged interactions between c and the respective ligands are shown as blue (positive) or red (negative) spheres to correspondingly colored sticks. (e) and (f) same as panels (c) and (d) rotated degrees counter-clockwise about the vertical axis of the page to reveal the amino terminus (blue) interacting with the membrane (light grey lines). pi p-containing membrane was composed of phosphatidylcholine (pc) and pi p lipids; pi( , )p -containing membrane was composed of pc and pi( , )p lipids. all lipids were equally distributed on each monolayer of the membrane. the simulation was conducted at k for ns; the last ns of the trajectories were used for analysis. figure . the n-terminus of c is dynamic as revealed by saxs. (a) scattering profiles of c obtained at three different protein concentrations ( . mg/ml, green; . mg/ml, red; . mg/ml, grey). scaling factors were applied for the low-concentration data. the inset shows the linear fitting of the guinier plot from which radius of gyration (rg) was determined.
(b) the transformed scattering profile calculated by gnom from the pair-distance-distribution function, p(r), that is shown in the inset. the maximum particle dimension and the rg obtained from p(r) are indicated. the estimated rg from gnom is in good agreement with that obtained from guinier fitting. (c) the calculated scattering profiles of the crystal structure of poliovirus c monomer (cyan) (mosimann et al., ) and the average md structure (black) (moustafa et al., ) fitted to the experimental saxs data (grey). both structures fit well; however, the average md structure showed slightly better fitting compared to the crystal structure as indicated by its lower χ-value. (d) the reconstructed saxs filtered model calculated by dammin is shown as an orange transparent surface with the crystal structure superimposed. the saxs model clearly shows an extended n-terminus compared to the crystal structure. the calculated scattering profile of the saxs model (orange) fitted to the experimental data (grey) is also shown. purification was performed as described above. purified c fractions were concentrated using vivaspin- centrifugal concentrators (sartorius stedium biotech) and buffer exchange was performed using zeba spin desalting columns ( , mwco) (thermo scientific) n hsqc nmr spectra were recorded at °c on a mhz bruker avance iii spectrometer equipped with a mm nmrpipe and nmrview software were used to process and analyze the nmr spectra Δωhn is the change in chemical shift in the amide proton dimension and Δωn is the change in chemical shift in the nitrogen dimension. chemical shift perturbations at least one standard deviation above average were considered to be substantial fluorescence polarization-based phosphoinositide binding assay to test pip-binding specificity, µm of c or its mutants were added into a solution containing . nm of bodipy-fl-labeled pips in a binding buffer [ mm hepes and mm nacl, at ph . ] at a μl final reaction volume. 
due to tight binding, plcδ ph domain pip-specificity was assessed at mm nacl and in the presence of nm protein. for competition experiments p- ) were titrated into a solution containing a pre-formed c-rna complex. μm of c and . nm of '-fluorescein (fl) labeled -mer rna ( '-agu uca aga gc- '-fl), corresponding to the pv orii sequence, were added to the binding buffer )p ) competes out the bound rna, the millipolarization (mp) value decreases. (h) fp-based competition experiments. the unlabeled pi( , )p efficiently competes out the bound rna, suggesting that these interactions are mutually exclusive. the competition experiment with phosphatidylinositol (pi) competitor is used to show that binding to c is required for rna displacement. binding reactions contained µm c-wt, . nm '-fluorescein (fl)-labeled rna changes in the direction of k , r , and r resonances (green arrow) in two-dimensional space suggest that rna and pip interactions are mutually exclusive. the nmr-based experiments were conducted at °c in a solution containing mm hepes at ph . , mm nacl, and % d o. the final concentration of c, rna, and key: cord- -z q wo v authors: sang, eric r.; tian, yun; gong, yuanying; miller, laura c.; sang, yongming title: integrate structural analysis, isoform diversity, and interferon-inductive propensity of ace to refine sars-cov susceptibility prediction in vertebrates date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: z q wo v the current new coronavirus disease (covid- ) has caused globally near . / million confirmed deaths/infected cases across more than countries. as the etiological coronavirus (a.k.a. sars-cov ) may putatively have a bat origin, our understanding about its intermediate reservoir between bats and humans, especially its tropism in wild and domestic animals, are mostly unknown. this constitutes major concerns in public health for the current pandemics and potential zoonosis. 
previous reports using structural analysis of the viral spike protein (s) binding its cell receptor of angiotensin-converting enzyme (ace ), indicate a broad sars-cov susceptibility in wild and particularly domestic animals. through integration of key immunogenetic factors, including the existence of s-binding-void ace isoforms and the disparity of ace expression upon early innate immune response, we further refine the sars-cov susceptibility prediction to fit recent experimental validation. in addition to showing a broad susceptibility potential across mammalian species based on structural analysis, our results also reveal that domestic animals including dogs, pigs, cattle and goats may evolve ace -related immunogenetic diversity to restrict sars-cov infections. thus, we propose that domestic animals may be unlikely to play a role as amplifying hosts unless the virus has further species-specific adaptation. these findings may relieve relevant public concerns regarding covid- -like risk in domestic animals, highlight virus-host coevolution, and evoke disease intervention through targeting ace molecular diversity and interferon optimization. erupting in china last december, the novel coronavirus disease (covid- ) has become a worldwide pandemic and caused near . million confirmed deaths and million infected cases across countries by the end of may [ , ] . the etiological virus, designated as severe acute respiratory syndrome coronavirus (sars-cov ) has been identified [ ] and related to the viruses previously causing sars or middle east respiratory syndrome (mers) in humans in and , respectively [ ] . these three human-pathogenic coronaviruses putatively evolve from bat coronaviruses, but have different animal tropisms and intermediate reservoirs before transmission to humans [ , ] . 
whereas civet cats and camels were retrospectively determined to be reservoirs for sars and mers, respectively, there is no conclusion about which animal species passed sars-cov to humans [ , ] . investigations indicated that carnivora animals including raccoon dogs, red foxes, badgers and minks, as well as swine to a lesser extent, are susceptible to sars virus infections [ , ] . although the viral nucleic acids and antibodies to mers were detectable in multiple ruminant species including sheep, goat, and donkeys, the virus inoculation studies did not result in a productive infection for mers disease in these domestic ruminants, nor in horses [ , ] . as a group of obligate pathogens, viruses need to engage cell receptors for entering cells and race with the host immunity for effective replication and spreading to initiate a productive infection [ ] . in this context, the spike proteins protruding on the coronavirus surface are responsible for cell receptor binding and mediating viral entry [ ] [ ] [ ] . for example, mers-cov adopts the dipeptidyl peptidase (dpp , a.k.a. cd ) and sars-cov uses angiotensin-converting enzyme (ace ) as primary receptors for cell attachment and entry [ ] [ ] [ ] [ ] [ ] [ ] . several groups have reported that sars-cov uses the same ace receptor as sars-cov, but exerts higher receptor affinity to human ace , which may contribute to the efficacy of sars-cov infection in humans [ , ] . after cell attachment via the receptor binding domain (rbd) in the n-terminal s region of the s protein, the c-terminal s region engages in membrane fusion. further cleavage of s from s by a furin-like protease will release and prime the virus for entering the recipient cells. several furin-like proteases, especially a broadly expressed trans-membrane serine protease (tmprss ), are adopted for priming sars-cov entry [ , ] .
compared with sars-cov, studies showed that sars-cov spike protein also evolutionarily obtains an additional furin-like proteinase cleavage site within the s /s junction region for efficient release from the cell surface and entry into the cells [ , [ ] [ ] [ ] . because tmprss is widely expressed, the tissue-specific expression of ace has been shown to determine sars-cov cell tropism in humans [ , ] . namely, human nasal secretory cells, type ii pneumocytes, and absorptive enterocytes are ace -tmprss double positive and highly permissive to sars-cov infection [ , ] . for cross-species animal tropism, the potential infectivity of sars-cov in both wild and domestic animals raises a big public health concern after the prevalence of sars-cov infections in humans [ , ] . this concern involves two aspects: ( ) screening to identify the animal species that serve as a virus reservoir originally passing sars-cov to humans; and ( ) the existing risk of infected people passing the virus to animals, particularly domestic species, thus potentially amplifying the zoonotic cycle to worsen sars-cov evolution and prevalence [ , ] . by diagnosis of animals in close contact with covid- patients or screening of animal samples in some covid- epidemic zones, studies detected that domestic cats and dogs could be virally or serologically positive for sars-cov [ ] [ ] [ ] [ ] [ ] [ ] [ ] , as was a reported infection in a zoo tiger [ ] . using controlled experimental infection of human sars-cov isolates, several studies demonstrated that ferrets, hamsters, domestic cats and some non-human primate species are susceptible to human sars-cov strains [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . obviously, it is impractical to test sars-cov susceptibility experimentally in all animal species. 
by adopting a structural simulation based on published structures of the viral s-rbd/ace complex, studies have predicted a broad spectrum of vertebrate species with high potential for sars-cov susceptibility, which, if true, entails unexpected risks in both public and animal health, and warrants further critical evaluation [ ] [ ] [ ] . ace is a key enzyme catalyzing the further conversion of angiotensin (agt) into numerous active forms of agt - , which are hormonal mediators in the body's renin-angiotensin system (ras) [ , ] . thus, ace plays a regulatory role in the blood volume/pressure, body fluid balance, sodium and water retention, as well as immune effects on apoptosis, inflammation, and generation of reactive oxygen species (ros) [ , ] . in this line, the expression of ace is also inter-regulated by immune mediators pertinent to its systemic function. multiple physio-pathological factors, including pathogenic inflammation, influence ras through action on ace expression [ ] [ ] [ ] . interferon (ifn) response, especially that mediated by type i and type iii ifns, comprises a frontline of antiviral immunity to restrict viral spreading from the initial infection sites, and therefore primarily determines whether a viral exposure becomes controlled or a productive infection [ ] . several recent studies revealed that the human ace gene behaves like an interferon-stimulated gene (isg) and is stimulated by a viral infection and ifn treatment; however, the mouse ace gene does not [ , , ] . therefore, to determine the cell tropism and animal susceptibility to sars-cov , the cross-species ace genetic and especially epigenetic diversity in regulation of ace expression and functionality should be evaluated [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] .
in this study, through integration of structural analysis and key immunogenetic factors that show species-dependent differences, we critically refine the sars-cov susceptibility prediction to fit recent experimental validation [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . along with showing a broad susceptibility potential across mammalian species based on structural analysis [ ] [ ] [ ] , our results further reveal that domestic animals including dogs, pigs, cattle and goats may evolve previously unexamined immunogenetic diversity to restrict sars-cov infections. protein and promoter sequence extraction and alignment: the amino acid sequences of ace proteins and dna sequences of the proximal promoters of each ace genes were extracted from ncbi gene and relevant databases (https://www.ncbi.nlm.nih.gov/gene). ace genes and corresponding transcripts have been well annotated in most representative vertebrate species. in most cases, the annotations were double verified through the same gene entries at ensembl (https://www.ensembl.org). the protein sequences were collected from all non-redundant transcript variants and further verified for expression using relevant rna-seq data (ncbi geo profiles). the proximal promoter region spans ~ . kb before the predicted transcription (or translation) start site (tss) of ace or other genes. the protein and dna sequences were aligned using the multiple sequence alignment tools of clustalw or muscle through an embl-ebi port (https://www.ebi.ac.uk/). other sequence management was conducted using programs at the sequence manipulation suite (http://www.bioinformatics.org). sequence alignments were visualized using jalview (http://www.jalview.org) and megax (https://www.megasoftware.net). sequence similarity calculations and plotting were done using sdt . (http://web.cbio.uct.ac.za/~brejnev). other than indicated, all programs were run with default parameters. 
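as a rough illustration of the sequence similarity calculation described above, the following python sketch computes percent identity between two rows of an existing alignment. it is not the sdt implementation: the function name and the gap handling (columns gapped in both sequences are skipped, and a gap against a residue counts as a mismatch) are assumptions made only for illustration.

```python
def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of the same alignment.

    Assumptions (not taken from the paper or from SDT):
    - both rows have equal length and use '-' for gaps;
    - columns gapped in both rows are ignored;
    - a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # toy aligned fragments, invented for the example
    print(pairwise_identity("MSSS-WLL", "MSGSSWLL"))
```

different tools treat gapped columns differently, so identity values from a sketch like this are only comparable with each other, not with values reported by sdt or other programs.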
phylogenic analysis: the phylogenic analysis and tree visualization were performed using megax and an online program, evoview. the evolutionary history was inferred using the neighbor-joining method. the percentage of replicate trees in which the associated taxa clustered together in the bootstrap test ( , replicates) was also calculated. the evolutionary distances were computed using the p-distance method and are in units of the number of amino acid differences per site. unless otherwise indicated, all programs were run with default parameters as the programs suggested. structural simulation and analysis: the structure files of human ace protein and its interaction with sars-cov s-rbd were extracted from the protein data bank under the files of m and m j. the residue mutation and structure simulation were performed using ucsf chimera and pymol, available at https://www.cgl.ucsf.edu/chimera/ and https://pymol.org/, respectively. structural visualization was performed using pymol. the binding affinity energy (Δg), dissociation constant (kd) and interfacial contacts between s-rbd and each ace were calculated using the prodigy algorithm at https://bianca.science.uu.nl/prodigy/. profiling transcription factor binding sites in ace promoters and pwm scoring: the regulatory elements (and pertinent binding factors) in the ~ . kb proximal promoter regions were examined against the human/animal tfd database using the program nsite (version . , at http://www.softberry.com). the mean position weight matrix (pwm) of key cis-elements in the proximal promoters was calculated using pwm tools through https://ccg.epfl.ch/cgi-bin/pwmtools, and the binding motif matrices of examined tfs were extracted from jaspar core vertebrates (http://jaspar.genereg.net/).
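to make the pwm scoring step above more concrete, here is a minimal python sketch of log-odds pwm scanning. it is a generic textbook formulation, not the pwm tools implementation used in the study: the pseudocount, the uniform background, the function names and the toy count matrix are all assumptions chosen for illustration, and a real analysis would start from a jaspar count matrix.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, e.g. {"A": 3, "T": 7}.
    pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the best-scoring window, or None if no window fits."""
    sequence = sequence.upper()
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(col[base] for col, base in zip(pwm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    # hypothetical 3-bp motif that strongly prefers "TTC" (not a real tf matrix)
    toy_counts = [{"T": 10}, {"T": 10}, {"C": 10}]
    pwm = log_odds_pwm(toy_counts)
    print(best_hit(pwm, "AATTCGG"))
```

scanning only the forward strand keeps the sketch short; a promoter analysis would normally also score the reverse complement and apply a score threshold before counting a site.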
for expression confirmation, several sets of rna-seq data from ncbi gene databases, and one of ours generated from porcine alveolar macrophages (bioproject with an accession number of srp ), were analyzed for verification of the differential expression of ace genes in most annotated animal species. in particular, the expression of porcine ace isoforms and other relevant genes was examined in the porcine lung macrophage datasets. significantly differentially expressed genes (degs) between two treatments were called using the edger package and visualized using heatmaps or bar charts as previously described [ ]. . . vertebrate ace orthologs share a functional constraint but experience intra-species diversification in livestock with unknown selective pressure. sequence comparison among ace orthologs across representative vertebrate species shows a pairwise identity range at - % ( fig. a and supplemental fig. s and excel sheet), which is - % higher than the average value generated through a similarity analysis at - % on gene orthologs at a genome-wide scale [ ] . this indicates that ace exerts a similar and basic function across species, consistent with its systemic and regulatory role as a key enzyme in ras, an essential regulatory axis underlying the body circulatory and excretory systems in vertebrates [ ] [ ] [ ] . a comparison of evolutionary rates of major genes within ras including angiotensinogen (agt), ace, and several receptors of the processed angiotensin hormones showed that ace actually evolves slightly faster than ace [ , and unpublished data] . this implies that ace may bear pressure for ras adaptive evolution according to species-dependent physiological and pathological requirements [ ] [ ] [ ] . this evolutionary adaptability of ace genes is demonstrated by the existence of numerous genetic polymorphisms [ ] and several transcript isoforms, particularly in humans and major livestock species ( fig. b and supplemental fig. s and excel sheet).
we identified (and verified by rna-seq annotation) four transcripts of ace isoforms in humans (fig. b ) that primarily differ in the c-terminal residues within the collectrin domain. particularly, - short ace isoforms were identified in dogs, pigs, cattle, and goats in addition to the longer ace consensus to the human's (designated as -s or -l, respectively after the animal common names in fig. b and thereafter). these livestock ace -s isoforms have a - residual truncation at their n-terminal peptidase domains, which also span the region interacting with sars-cov spike protein. the selective mechanisms driving the evolution of these short ace isoforms in livestock are unknown, but may relate to previous pathogenic exposure or unprecedented physiopathological pressure. to support this reasoning, short ace isoforms are detected in both domestic bos taurus and hybrid cattle, but not in the wild buffalo and bison; and ace isoforms from each species are generally paralogous and sister each other within a clade in the phylogenic tree (fig. b ). phylogenic analysis of vertebrate ace orthologs/paralogs reveals a general relationship aligning to the animal cladistics (fig. b) . in this context, homologs from the fish, frog and chicken conform to a primitive clade. all ungulate homologs form into parallel clades next to each other. the homologs from the glires, primates and carnivores cluster into a big clade (marked with yellow triangle node), which contains all the sars-cov susceptible species that have been verified via natural exposure or experimental infections (fig. b , marked with red/orange circles). we examined and merged several previous studies about the prediction of sars-cov susceptibility in vertebrates based on the simulated structural analysis of s-rbd-ace complex [ ] [ ] [ ] . as numerous vertebrate species were predicted to be high or low potential (fig. 
b , labeled as red h or green l) for sars-cov susceptibility, incongruence between the predicted sars-cov susceptibility and validated infection is apparent in pangolin, ferret, tiger, cat and horseshoe bat, indicating that some other factors besides ace -rbd affinity should be considered [ , [ ] [ ] [ ] . we, therefore, refined the prediction matrix to include the rbd-binding evasion of some ace orthologs identified in major livestock species and the interferon-stimulated ace expression underlying sars-cov infections [ , [ ] [ ] [ ] . several recent studies have elegantly demonstrated the structural interaction of the viral s protein or its rbd in complex with human ace receptor [ , ] . these studies show that the contacting residues at the rbd/ace interface ( fig. a) involve at least residues in ace (fig. b , listed in the table cells and referred to the aligned residue positions in human ace ) and residues in the sars-cov rbd (fig. b , blue circles with residue labels above the table) [ , , ] . the cross-species residue identity (%) of these interacting residues in ace is dispersed over a broader range ( - %) than the whole ace sequence identity rate at - % [ ] , indicating a faster evolution rate of this virus-interacting region. notably, the s-binding region spans a large part of the n-terminal peptidase domain and s-binding may competitively block a majority of active sites of the enzyme (fig. c ). using a similar structural analysis procedure [ , ] , we modeled the ace structures of animal species of interest and simulated their interaction with sars-cov s-rbd based on a published rbd-human ace structure (protein data bank file m j) [ ] . fig. demonstrates the s-rbd interaction with the simulated structures of ace long isoforms from the dog, pig and cattle, respectively. the major changes of the rbd-ace interacting interfaces are from the residue exchanges in ace from other species compared with human ace (fig. b - d, highlighted in red).
in addition, the exchange of n t (in pigs) and n y (in cattle and sheep) would destroy the n-glycosylation site in human ace . ace from goat (supplement fig. s ) exhibits identical amino acid exchanges as in cattle in the rbd-ace interfacial contacts. in contrast,when compared with human ace , ace from cats (supplement fig. s ) conserves all relevant glycosylation sites in human ace [ , ] . we also calculated the interfacial contacts using parameters of protein-protein interaction including the predictable binding affinity energy (Δg), dissociation constant (kd) and number of different interfacial contacts within the s-rbd and ace contact. although the exact numbers may differ from previous reports [ ] , they provide a very comparable matrix generated using the same algorithm ( fig. e ) [ ] . data show that the ace of most domestic animals, including that from mouse and rat (species known to be unsusceptible to human sars-cov ), have a binding affinity (Δg) at - . to - . kcal/mol. this is within the binding affinity range ( . - . kcal/mol) between the rbd and the ace from known susceptible species (fig. e , underlined in the left part of the table). this indicates that other factors, conceivably from genetic divergence and/or natural immunity, also contribute to sars-cov susceptibility in animal species. therefore, an effective prediction matrix should include the critical immunogenetic factors to further determine virus susceptibility in addition to the sequence/structural similarity of ace receptors ( fig. and fig. s ) [ , , ] . we detected several short ace isoforms in the domestic animals including dog, pig, goat and cattle that have an n-terminal truncation spanning - key residues in the contacting network to s-rbd but retain the enzyme active sites (fig. a ). most of the splicing isoforms of ace genes, such as in zebrafish, cats and humans, share a common proximal promoter and encode ace proteins containing all key rbd-interacting residues [ , ] . 
however, these short ace -s isoforms in domestic animals are truncated by (cattle/goat ace -s) or (dog/pig ace -s) residues at their n-termini compared with the long ace isoforms in the same species ( fig. and fig. s ). therefore, these short ace isoforms lose - key residues in the contacting network to s-rbd but likely retain ace enzymatic function in ras. paired structural comparison between the human ace structure (extracted from m ) and each simulated ace -s structure from the pig, dog, and cattle/goat reveals that all these ace -s orthologs from domestic animals, particularly the porcine one, show high structural similarity to the human ace except for the n-terminal truncations ( fig. b- d ). this indicates that these short ace isoforms in domestic animals have little chance to be engaged by the viral s-binding, and suggests an unexpected evolutionary advantage to allay potential covid- risk resulting from viral engagement and functional distortion on the classical long ace isoforms in these animal species [ , ] . sars-cov infection induces a weak ifn response but the production of high amounts of inflammatory cytokines including interleukin (il)- and chemokine cxcl in most severe covid- patients [ ] [ ] [ ] [ ] . studies of sars and mers showed that these pathogenic coronaviruses share similar viral antagonisms, including the endoribonuclease (endou) encoded by nonstructural protein (nsp ), which directly blunts cell receptors responding to viral dsrna and in turn weakens the acute antiviral response [ ] . several recent studies revealed that sars-cov seems more cunning in not only evading or antagonizing but also in exploiting the ifn response for efficient cell attachment [ , , , ] . as a key enzyme in ras, the expression of ace gene has been primarily investigated for physiological response to circulatory regulations, and a response to pathological inflammation is also expected [ ] [ ] [ ] .
however, the expression of the ace gene was highly responsive to both viral infection and host ifn response, i.e. human ace gene seems an unstudied ifn-stimulated gene (isg) [ , ] . surprisingly, the isg propensity of ace genes is species-dependent, for example: the mouse ace gene is less ifn responsive which may partly explain the mouse insusceptibility to sars-cov infection [ ] . to categorize the different ifn-inductive propensity of ace genes in vertebrates, particularly in major livestock species, we profiled the regulatory cis-elements and relevant transcription factors in the proximal promoter regions of each ace genes ( . kb before tss or atg). figure illustrates major regulatory cis-elements located in ace genes from major livestock animals and several reference animal species. data show that animal ace gene promoters are evolutionally different in containing ifn-or virus-stimulated response elements (isre, prdi, ifrs, and/or stat / factors) and cis-elements responsive to pro-inflammatory mediators. all these cis-elements recruit corresponding transcription factors (tf) to mediate differential ace responses to antiviral ifns and inflammation that is associated with covid- disease [ , , ] . we discover that ace genes obtain species-different isg propensity responsive to ifn and inflammatory stimuli. in most (if not all) of the sars-cov susceptible species the ace genes obtained the ifn response between the typical robust and tunable ifn-stimulated genes (isg) [ ] . in general, the robust isgs (isg is an example here) are stimulated in the acute phase of viral infection and play a more antiviral role; in contrast, the later responsive tunable isgs (irf is an example) contribute more to anti-proliferation of ifn activity [ ] . 
in addition, unlike the promoters of the short ace isoforms in cattle and goats, which share most common promoter regions with their paralogous long isoforms, the short ace isoforms of dogs (dog-s) and pigs (pig-s) have proximal promoter regions (and ifn responsivity) distinct from those of the paralogous long ace isoforms ( fig. and fig ) . results indicate that the short ace isoforms in pigs and dogs diversify from their long paralogs at both the levels of genetic coding and epigenetic regulation to adapt to some evolutionary pressure, such as that from pathogenic interaction (fig. ) [ , ]. the position weight matrix (pwm) stands as a position-specific scoring model for the binding specificity of a transcription factor (tf) on the dna sequences [ ]. using pwm toolsets online (https://ccg.epfl.ch/cgi-bin/pwmtools), we evaluated the mean pwm of key cis-elements in the proximal promoters of ace genes that contain binding sites for canonical ifn-dependent transcription factors, which include isre/stat, irf . irf / and irf , as well as c/ebp representing a core transcription factor for pro-inflammation. these ifn-dependent transcription factors, particularly irf / and isre/stat for ifn stimulation, are differentially enriched in the promoter regions of ace genes in a species-dependent way. higher enrichment of isre/stat / and/or irf / binding sites is detected in most sars-cov /covid susceptible species (indicated with solid orange or red circles, respectively). in contrast, the pwm for irf and c/ebp, which regulate inflammation, are less differentiated in ace promoters from animal species, indicating that ace genes are more universally regulated by inflammation than by viral infection or ifn-induction in a species-dependent way (fig. ).
as compared with the promoters of a typical human robust isg and tunable irf genes, these data indicate that ace genes (particularly the primate ones) are not typical robust or tunable isgs as represented by isg or irf , but respond differently to viral infection (through irf / ) or ifn auto-induction (via isre/stat) in a species-dependent manner ( fig. ) [ ]. higher enrichment of isre/stat / and/or irf / corresponds to sars-cov susceptibility in experimentally validated mammalian species especially primates, but not to the phylogenically distant species such as zebrafish, which has very low potential for sars-cov susceptibility due to the high disparity of ace structures ( fig. and fig. s ). in addition, the proximal promoters of the pig and dog ace -s genes differ much in their ifn-responsive elements from most ace promoters in mammals ( fig. and fig. ). however, they are phylogenically sister to the ace promoters from the primitive vertebrates (frog, chicken and zebrafish) (fig. , phylogenic tree) . this indicates that the expression of these short ace isoforms is more conservative than that of the long ace isoforms, which represent a more recent evolution obtaining ace epigenetic regulation by ifn-signaling (fig. ) [ ]. studies show that affinity adaptation of the viral s-rbd and ace receptor determines the cellular permissiveness to the virus [ , , ] . sars-cov not only adapts a high binding affinity to human ace for cell attachment, but also antagonizes host antiviral interferon (ifn) response and utilizes the ifn-stimulated property of the human ace gene to boost spreading [ , , , ] .
in addition to structural analysis of the simulated s-rbd-ace interaction, we propose that several immunogenetic factors, including the evolution of s-binding-void ace isoforms in some domestic animals, the species-specific ifn system, and epigenetic regulation of the ifn-stimulated property of host ace genes, contribute to viral susceptibility and the development of covid- -like symptoms in certain animal species [ , , , ]. a computational program, currently in development, that incorporates this multifactorial prediction matrix, together with in vitro validation of sars-cov susceptibility in major vertebrate species, will be necessary to address public concerns relevant to sars-cov infections in animals (fig. ). it will also lead to the development of better animal models for anti-covid investigations [ ]. in addition, several ifn-based therapies for treatment of covid have been proposed and are in clinical trials [ - ]. considering that the virus hijacks the ifn-stimulated property of human ace , a timely and subtype-optimized ifn treatment should be delivered rather than a general injection of typical human ifn-α/β subtypes [ - ]. along this line, domestic livestock such as pigs and cattle have the most evolved ifn systems, containing numerous unconventional ifn subtypes. some of these unconventional ifn subtypes could be utilized for developing effective antiviral therapies: some porcine ifn-ω, for example, exert much higher antiviral activity than ifn-α even in human cells, and most ifn-λ retain antiviral activity with less pro-inflammatory activity [ , ]. in summary, a prediction matrix, which integrates the structural analysis of the s-rbd-ace interface and the species-specific immunogenetic diversity of ace genes, was used to predict sars-cov susceptibility and fits current knowledge about the infectious potential already validated in different animal species (fig. ). more extensive validation experiments are needed to further improve this prediction matrix.
our current results demonstrate several previously unstudied immunogenetic properties of animal ace genes and imply that some domestic animals, including dogs, pigs and cattle/goats, may have acquired some immunogenetic diversity to confront sars-cov infection and may face less covid- risk than previously thought. however, immediate biosecurity practices should be applied in animal management to reduce animal exposure to the virus and prevent potential species-specific adaptation (fig. ). for livestock breeding programs targeting disease resistance to respiratory viruses, consideration of the genetic and epigenetic diversity of ace genes, as well as of antiviral isgs, is highly recommended [ , , , ]. in conclusion, sars-cov has evolved to fit well with the human (and non-human primate) ace receptor through structural interfacial affinity, immunogenetic diversity and epigenetic expression regulation, which results in high infectivity [ ] [ ] [ ] [ , , , ]. most mammals, especially those that belong to glires, primates and carnivores, have a higher potential for sars-cov susceptibility, but in a species-dependent manner based on the s-binding-void ace isoforms and differences in the ifn-inductive propensity of the major ace genes. most ungulates appear to have a low susceptibility potential, whereas horses and sheep have a high potential (fig. ).
the phylogenic tree of identified ace orthologs/variants from different species was built with a neighbor-joining approach and visualized using an evoview program under default parameter setting. the prediction of sars-cov susceptibility is based on the sequence similarity of each ace to human ace in the s-rbd binding region and simulated using a published human ace -rbd structure ( m j) and refers to two recent publications using similar procedures but different structural models [ , ] . compared with the currently available experimental data, incongruence of the predicted sars-cov susceptibility is clearly demonstrated in pangolin, ferret, tiger, cat and horseshoe bat, indicating that some other factors besides ace -rbd affinity should be considered. we emphasize the need to integrate other factors, including the rbd-binding evasion of some short ace orthologs identified in some major livestock species and the recently identified ace -interferon association [ ] , to refine the sars-cov susceptibility prediction. figure panel label: predicted permissiveness by sequence similarity and ace -rbd binding energy. figure : prediction of sars-cov susceptibility in major livestock species based on the conservation of key interacting residues and binding capacity between the viral spike (s) protein and the host ace receptor.
(a) sars-cov- uses the cell receptor, angiotensin-converting enzyme (ace ) for entry and the serine protease tmprss and furin for s protein priming. (b) as tmprss is broadly expressed and active with a furin-like cleavage activity, the affinity adaption of the s receptor binding domain (rbd) and ace receptor determines the viral permissiveness. the contacting residues of human ace (a distance cutoff . Å) at the sars-cov- rbd/ace interfaces are shown, and the contacting network involves at least residues in ace (listed in the table cells and referring to the aligned residual positions in human ace ) and residues in the sars-cov- rbd (blue circles with residue labels), which are listed and connected with black lines (indicating hydrogen bonds) and red line (represents salt-bridge interaction). the cross-species residual identity (%) of these interacting residues in ace are listed in a broad range ( - %) [ ] [ ] [ ] . (c) we also detected several short ace isoforms (underlined) in the domestic animals including dog, pig, goat and cattle, which have a n-terminal truncation spanning - key residues in the contacting network to s-rbd but keeping the enzyme active sites (indicated by yellow triangles), thus resulting in little engagement by the viral s protein and predicting an unexpected evolutionary advantage for relieving potential covid- risk caused by the viral engagement and functional distortion on the classical long ace isoforms in these animal species. the ncbi accession numbers of the ace orthologs are listed as in fig. s-rbd with ace orthologs of major livestock species simulated using the human ace /cov -rbd structure ( m j). most residues involved in binding are highlighted as magenta (ace ) or orange (s) sticks and labeled as one-letter amino-acid codes plus residual numbers in bold or regular font respectively for s or ace residues. 
the dotted/blue lines indicate intermolecular salt bridge or hydrogen bonds between interacting residues (generated and visualized with ucsf chimera and pymol from protein data bank file m j). (b) to (d) rbd interaction with the simulated structures of ace long isoforms from the dog, pig and cattle, respectively. amino acid exchanges in ace from another species compared with human ace are highlighted in red. e) prediction of binding affinity energy (Δg), dissociation constant (kd) and interfacial contacts of the sars-cov s-rbd with ace orthologs of major livestock species. most domestic animals ace including that from mouse and rat (species known not to be susceptible to sars-cov ) have a binding affinity (Δg) at - . to - . kcal/mol that is within the range ( . - . kcal/mol) between the rbd and the ace from the known susceptible species (underlined in the left part of the table), indicating that some other factors, especially those from genetic divergence and natural immunity, contribute to the sars-cov susceptibility of different animal species. in contrast to most splicing isoforms such as in cats and humans, which share a common proximal promoter and encode ace proteins with similar sequences containing all key rbd-interacting residues, these short ace -s isoforms in domestic animals truncate for (cattle/goat ace -s) or (dog/pig ace -s) residues at their n-termini compared with human ace or the long ace isoforms in these species, thus destroying - key residues in the contacting network to s-rbd but retaining all enzyme active sites (yellow triangles in the blue ace domain bar). this results in little chance to be engaged by the viral s protein binding and predicts an unexpected evolutionary advantage to relieve potential covid- risk caused by the viral engagement and functional distortion on the classical long ace isoforms in these animal species. 
(b), (c) and (d) paired structural comparison between the human ace structure ( m ) with each simulated ace -s structure from pig (b), dog (c) and cattle/goat (d). human ace structure are in green, and each compared animal ace -s structure in magenta. the n-terminal residues of both compared structures are in cyan (arrows indicating n-termini of the ace -s isoforms) and shared c-termini are in red. putative proximal promoter region (ace -p) figure . categorizing ace genes based on regulatory cis-elements predicted in their proximal promoter regions (< kb before tss or atg). the regulatory elements (and pertinent binding factors) in the ~ kb proximal promoter regions were examined against both human/animal tfd database using a program nsite (version . , at http://www.softberry.com), including ace genes identified in major livestock animals and several reference animal species. data show that animal ace gene promoters are evolutionally different in containing ifn-or virus-stimulated response elements (isre, prdi, ifrs, and/or stat / factors) and cis-elements responsive to proinflammatory mediators, which mediate different ace responses to antiviral interferons (ifns) and inflammation associated with covid- disease. legend: ○, gata- regulating constitutive expression; acute (◊) or secondary (◊) ifn-stimulated response element (isre) and prdi that interact with irf, isgf and stat factors, respectively; □, cis-elements interacting with factors to mediate immune/ inflammatory responses including c/ebp, nf-kb, nf-il , and p ; •, cis-elements reacting with other factors significant in other developmental/physiological responses. the promoter features of two typical human interferon-stimulated genes (isg), the robust isg and tunable irf are shown as references to indicate that ace genes obtain species-different isg propensity responsive to ifn and inflammatory stimuli. 
covid- dashboard by the center for systems science and engineering (csse) at johns hopkins university (jhu) a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster a new coronavirus associated with human respiratory disease in china the emergence of sars, mers and novel sars- coronaviruses in the st century a genomic perspective on the origin and emergence of sars-cov- origin and evolution of pathogenic coronaviruses animal origins of the severe acute respiratory syndrome coronavirus: insight from ace -s-protein interactions middle east respiratory syndrome coronavirus (mers-cov): animal to human interaction mers-cov: the intermediate host identified? how viral and intracellular bacterial pathogens reprogram the metabolism of host cells to allow their intracellular replication sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor a highly conserved cryptic epitope in the receptor binding domains of sars-cov- and sars-cov the proximal origin of sars-cov- sars-cov- reverse genetics reveals a variable infection gradient in the respiratory tract sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus covid- : animals, veterinary and zoonotic links transmission of sars-cov- in domestic cats pathogenesis and transmission of sars-cov- in golden hamsters serological survey of sars-cov- for experimental, domestic, companion and wild animals excludes intermediate hosts of different species of animals animal models of mechanisms of sars-cov- infection and covid- pathology first detection and genome sequencing of sars-cov- in an infected cat in france simulation of the clinical and pathological manifestations of coronavirus disease (covid- ) in golden syrian hamster 
model: implications for disease pathogenesis and transmissibility infection and rapid transmission of sars-cov- in ferrets complete genome sequence of sars-cov- in a tiger from a u spike protein recognition of mammalian ace predicts the host range and an optimized ace for sars-cov- infection predicting the angiotensin converting enzyme (ace ) utilizing capability as the receptor of sars-cov- covid- : epidemiology, evolution, and cross-disciplinary perspectives the pivotal link between ace deficiency and sars-cov- infection physiological and pathological regulation of ace , the sars-cov- receptor renin-angiotensin system at the heart of covid- pandemic type i and type iii interferons -induction, signaling, evasion, and application to combat covid- increasing host cellular receptor-angiotensin-converting enzyme (ace ) expression by coronavirus may facilitate -ncov (or sars-cov- ) infection sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes evolutionary constraints on structural similarity in orthologs and paralogs a genomic survey of angiotensin-converting enzymes provides novel insights into their molecular evolution in vertebrates ace receptor polymorphism: susceptibility to sars-cov- , hypertension, multi-organ failure, and covid- disease outcome structural basis for the recognition of sars-cov- by full-length human ace structure of the sars-cov- spike receptor-binding domain bound to the ace receptor prodigy: a web server for predicting the binding affinity of protein-protein complexes imbalanced host response to sars-cov- drives development of covid- weak induction of interferon expression by sars-cov- supports clinical trials of interferon lambda to treat early covid- key: cord- -sr j z c authors: mersmann, sophia f.; johns, emma; yong, tracer; mcewan, will a.; james, leo c.; cohen, edward a.k.; grove, joe title: learning to count: determining the stoichiometry of bio-molecular complexes using fluorescence 
microscopy and statistical modelling date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sr j z c cellular biology occurs through myriad interactions between diverse molecular components, many of which assemble in to specific complexes. various techniques can provide a qualitative survey of which components are found in a given complex. however, quantitative analysis of the absolute number of molecules within a complex (known as stoichiometry) remains challenging. here we provide a novel method that combines fluorescence microscopy and statistical modelling to derive accurate molecular counts. we have devised a system in which a given biomolecule is differentially labelled with spectrally distinct fluorescent dyes (label a or b), which are then mixed such that b-labelled molecules are vastly outnumbered by those with label a. complexes, containing this component, are then simply scored as either being positive or negative for label b. the frequency of positive complexes is directly related to the stoichiometry of interaction and molecular counts can be inferred by statistical modelling. we demonstrate this method using complexes of adenovirus particles and monoclonal antibodies, achieving counts that are in excellent agreement with previous estimates. beyond virology, this approach is readily transferable to other experimental systems and, therefore, provides a powerful tool for quantitative molecular biology. the statistical models used in our analysis are available here: https://github.com/sophiamersmann/molecular-counting, the raw data used for molecular counting can be found here: . /zenodo. . all cellular processes are driven by coordinated networks of dynamically interacting molecular partners. to successfully function, these components typically need to be assembled into specific complexes or clusters. 
for example, receptor signalling generally requires the co-location of various sensory, regulatory and stimulatory partners; the precise makeup of these assemblies can tune the nature of the signal and resultant physiological output. molecular cell biology research has, thus far, largely focused on determining the identity of the components found in a given complex. however, it is becoming clear that the quantity of any given component is also vitally important. quantifying the number of molecules, or stoichiometry, within an assembly can be used to understand its ultrastructure and, ultimately, to create complete molecular models of entire cellular structures, as has been demonstrated for hiv capsids and the neurological synapse (briggs et al., ; wilhelm et al., ) . there are various approaches to investigate the number of molecules within a given complex; for example calibrated biochemical analysis or cryo-electron microscopy (cryo-em). however, such methods pose practical and/or technological barriers and, by their very nature, obscure heterogeneity within the sample. single molecule localisation microscopy (smlm) modalities, such as storm and palm, are potentially powerful techniques for counting (lee et al., ; coltharp et al., ; fricke et al., ; veatch et al., ; stein et al., ) . nonetheless, these approaches typically require detailed a priori knowledge of the experimental system (for instance, a thorough evaluation of the 'blinking' characteristics of the fluorophores (patel et al., ) ) and/or a well understood reference sample with which to calibrate the measurement (thevathasan et al., ) . these steps need to be performed independently for each different experimental model and microscope set up; this creates a high barrier to implementation and can make these methods vulnerable to experimental variation and artefacts. 
here we outline an alternative, and potentially complementary, approach that combines (non-smlm) fluorescent microscopy and statistical modelling to extract estimates of molecular numbers within a complex. the method requires differential binary labelling of a constituent (i.e. a protein of interest is labelled with fluorescent dye a or b); by appropriate mixing of the labelled components, any individual molecular complex can be simply scored as being positive or negative for a given label. the frequency of positive complexes is then analysed by statistical modelling to extract estimates of stoichiometry. this approach is simple and requires minimal calibration or a priori understanding of the experimental system. we demonstrate the feasibility and accuracy of this approach by studying the stoichiometry of virus-antibody complexes. adenovirus (adv) is a non-enveloped dna virus; its genome is enclosed within a proteinous shell, called a capsid (nemerow et al., ) . the major adv capsid protein is hexon; this assembles into trimeric subunits, that are hexagonal in shape, which in turn arrange to form an icosahedron with triangular faces (a molecular cartoon of the adv particle is provided in figure a ). the adv particle has vertices, each of which are occupied by a pentameric subunit (formed of the penton base protein) and a receptor binding 'spike' (formed of the fibre protein). antibodies (ab) that bind sites such as the spike can directly neutralise adv by blocking receptor interactions, therefore preventing the virus particle from entering the cell. however, antibodies targeting the hexon protein (which makes up the majority of the capsid) do not necessarily interfere with the mechanics of virus entry (fender et al., ; wohlfart et al., ) . 
nonetheless, anti-hexon antibodies can prevent virus infection by recruiting the intracellular antibody-sensor trim , which targets the virus for degradation and activates cell-intrinsic immune responses (keeble et al., ; mallery et al., ) . anti-hexon monoclonal antibody c inhibits adv infection via this mechanism and is used as a prototypical system to investigate trim . the stoichiometries of antibody and trim recruitment to incoming virus particles remain unclear and are likely to be a determinant of the nature of the resulting cellular response. previous studies, using alternative techniques, have provided estimates of the stoichiometry of adv- c complexes. each adv particle possesses identical hexon proteins, each of which represents a potential binding site for c . however, the hexon subunits are assembled as trimers, and are arranged in a specific geometry (as described above). moreover, antibodies are bivalent; therefore each c molecule has two hexon binding pockets. consequently, it is highly unlikely that each hexon molecule will be occupied by a single c molecule (i.e. antibodies per virus particle). analysis by cryo-em, immuno-gold em staining and calibrated fluorescence measurements suggests a true maximum stoichiometry of - antibodies per particle (mcewan et al., ; varghese et al., ) ; this maximum is likely determined by the limits to antibody binding and packing enforced by the geometry of the particle. we have applied our counting method to the adv- c complex and generated estimates that are in good agreement with these previous studies, therefore validating our approach. strategy. we used differential binary labelling and statistical modelling to extract estimates of stoichiometry; our strategy is outlined in figure . note that this approach can be generalised to apply to many other multi-component systems (i.e. how many protein x are found in assembly y?).
the hidden truth is the number of antibodies bound to a virus particle; the ab:virus stoichiometry is expected to increase with antibody concentration until it reaches a saturation point where the maximum number of abs are bound (figure a). in our method (figure b), both components (virus and antibody) are fluorescently labelled; however, two separate batches of the otherwise identical antibody are given spectrally distinct dyes (resulting in ab a or ab b ). mixing of the differentially labelled antibody batches at appropriate proportions results in only a minority of virus particles receiving a particular fluorescent dye (b in the example cartoon). therefore, when imaged, we detect three distinct fluorescent signals: each virus particle can be identified by its reference dye (green in the cartoon example); every virus particle has also received antibodies with dye a (magenta); however, very few particles are positive for dye b (blue), and these can be scored as positive or negative in a binary fashion. the frequency of virus particles that are positive for ab b is a function of the a:b mixing proportion and the stoichiometry of ab:virus interaction; this relationship between the data and the hidden truth can be modelled.

figure : a. the ground truth: the number of antibody molecules (k) per virus particle increases with antibody concentration up to a saturation point (k = n sat ). b. extracting truth from data: adv particles (labelled with a green fluorescent dye) are incubated with a defined mixture of two batches of antibody; one batch has received fluorescent label a (magenta), whilst the second has received label b (blue). when viewed by microscopy, every virus particle has received at least one molecule of ab a , whereas only a minority have received any ab b and can be scored in a binary fashion. note, antibody molecules are not drawn to scale.

consider a single virus to be capable of binding n sat antibodies at saturation.
under the assumption that antibodies bind to the same virus independently of each other, $K$, the total number of (labelled and unlabelled) antibodies bound to a virus, can be modelled as a binomial random variable

$$K \sim \mathrm{Bin}(n_{\mathrm{sat}}, p),$$

where $p$ is the probability of an antibody binding to a particular binding site of the virus. if $n_{\mathrm{sat}}$ and $p$ are known, then the expected number of antibodies bound to a single virus is simply $E[K] = n_{\mathrm{sat}} \cdot p$. however, in most cases $n_{\mathrm{sat}}$ is not known and $p$ cannot be expressed easily, since it depends not only on the antibody concentration used but also on the composition of the virus particle and the geometry of interaction, which can be difficult to obtain. to extrapolate an antibody count it is, therefore, necessary to estimate both $n_{\mathrm{sat}}$ and $p$. as described above, our experimental design utilizes antibody labelled with spectrally distinct dyes, allowing binary scoring of individual virus particles as positive if they interact with at least one ab b molecule (figure ). here, we describe this state as a bernoulli random variable $S$ that takes the value 1 if the virus is in the positive state, and 0 if it is in the negative state, i.e.

$$S \sim \mathrm{Ber}(q),$$

where $q$ is the probability that a virus interacts with at least one labelled antibody. since $q = P(S = 1) = 1 - P(S = 0)$, we can derive a closed form for $q$ by finding an expression for $P(S = 0)$, the probability of a virus not being complexed with any labelled antibody. to this end, we simply sum over all possible virus-antibody configurations under the constraint of all antibodies being unlabelled, i.e. a virus could bind zero, one, two, ... up to $n_{\mathrm{sat}}$ unlabelled antibodies. the marginal probability of a virus being in a negative state is thus given by

$$P(S = 0) = \sum_{k=0}^{n_{\mathrm{sat}}} P(S = 0 \mid K = k)\, P(K = k),$$

where $P(S = 0 \mid K = k)$ is the probability that, given the virus binds to $k$ antibodies, exactly zero of them are labelled. given $K = k$, the number of labelled antibodies is itself binomial with $k$ trials and success probability $f_l$, where $f_l$ is the proportion of antibodies that are fluorescently labelled. therefore

$$P(S = 0 \mid K = k) = \binom{k}{0} f_l^{0} (1 - f_l)^{k} = (1 - f_l)^{k},$$

and combining with the probability mass function of $K$, namely

$$P(K = k) = \binom{n_{\mathrm{sat}}}{k} p^{k} (1 - p)^{n_{\mathrm{sat}} - k},$$

gives

$$P(S = 0) = \sum_{k=0}^{n_{\mathrm{sat}}} \binom{n_{\mathrm{sat}}}{k} \left[ p (1 - f_l) \right]^{k} (1 - p)^{n_{\mathrm{sat}} - k} = (1 - p f_l)^{n_{\mathrm{sat}}},$$

where the last equality is the binomial theorem. a closed form for $q$ directly follows as

$$q = 1 - (1 - p f_l)^{n_{\mathrm{sat}}}.$$

note that here $P(S = 1)$ is expressed in terms of $p$ and $n_{\mathrm{sat}}$, and will hereafter be referred to as $P(S = 1; \theta)$, where $\theta = (p, n_{\mathrm{sat}})$. consider a single experiment (performed at a specific antibody concentration) to describe $v$ viruses with states $\mathbf{s} = (s_1, \ldots, s_v)$, where $v_+$ of these states are positive, i.e. $v_+$ viruses have been observed to interact with at least one labelled antibody. assuming independence among viruses, the likelihood of $\theta$ is then

$$L(\theta; \mathbf{s}) = \prod_{i=1}^{v} q^{s_i} (1 - q)^{1 - s_i} = q^{v_+} (1 - q)^{v - v_+},$$

where $q$ is the closed form given above. we are then interested in the posterior distribution of $\theta$ given by the central result of bayesian statistics,

$$\pi(\theta \mid \mathbf{s}) \propto L(\theta; \mathbf{s})\, \pi(\theta),$$

where $\pi(\theta)$ is a suitable prior for the unknown parameters $\theta$. in most applications, more than one experiment is performed; consider $m$ experiments to be conducted at varying antibody concentrations $c_1, \ldots, c_m$, producing $m$ sets of virus state observations $\mathcal{S} = \{\mathbf{s}_1, \ldots, \mathbf{s}_m\}$. parameter inference using the described single-experiment model would entail building $m$ independent models, each estimating $p_j$ and $n_{\mathrm{sat},j}$ for experiment $j$. while estimating concentration-specific antibody binding probabilities is desired, inferring multiple $n_{\mathrm{sat}}$ values is unintuitive since $n_{\mathrm{sat}}$ is fixed for a particular virus-antibody pair, i.e. $n_{\mathrm{sat}}$ should be common to all experiments regardless of the antibody concentration used. it is, therefore, preferable to develop a general model accounting for multiple experiments that estimates all $p_1, \ldots, p_m$ simultaneously while yielding only a single $n_{\mathrm{sat}}$ estimate. such a general model contains a single likelihood $L(\theta; \mathcal{S})$, where $\theta = (p_1, \ldots, p_m, n_{\mathrm{sat}})$. this can be expressed as

$$L(\theta; \mathcal{S}) = \prod_{j=1}^{m} L_j(\theta_j; \mathbf{s}_j),$$

where $L_j(\theta_j; \mathbf{s}_j)$ is the likelihood function for $\theta_j = (p_j, n_{\mathrm{sat}})$ as defined above for a single experiment. hence, $m + 1$ unknown parameters are estimated: a probability $p_j$ specific to each concentration $c_j$ for $j = 1, \ldots, m$, and crucially a single $n_{\mathrm{sat}}$ shared over all experiments.
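the model described above can be sketched in a few lines of python. summing the binomial mixture over k gives p(s = 0) = (1 - p·f_l)^n_sat, so q = 1 - (1 - p·f_l)^n_sat; the sketch checks this closed form against the explicit sum and evaluates the joint log-likelihood for several experiments sharing one n_sat. the function names and all numerical inputs (n_sat = 50, f_l = 0.05, the virus counts) are hypothetical and are not taken from the study.

```python
import math


def prob_positive(p, n_sat, f_l):
    """Closed form q = 1 - (1 - p * f_l) ** n_sat."""
    return 1.0 - (1.0 - p * f_l) ** n_sat


def prob_negative_by_sum(p, n_sat, f_l):
    """P(S = 0) by explicit summation over k = 0..n_sat bound antibodies."""
    return sum(
        math.comb(n_sat, k) * p ** k * (1.0 - p) ** (n_sat - k) * (1.0 - f_l) ** k
        for k in range(n_sat + 1)
    )


def log_likelihood(p_list, n_sat, f_l, v_list, v_pos_list):
    """Joint log-likelihood over m experiments that share a single n_sat.

    p_list[j]     binding probability at concentration j (0 < p <= 1)
    v_list[j]     number of viruses scored in experiment j
    v_pos_list[j] number of those viruses scored positive
    Requires 0 < f_l < 1 so that 0 < q < 1 and both logarithms are defined.
    """
    total = 0.0
    for p, v, v_pos in zip(p_list, v_list, v_pos_list):
        q = prob_positive(p, n_sat, f_l)
        total += v_pos * math.log(q) + (v - v_pos) * math.log(1.0 - q)
    return total


# Hypothetical example: one experiment with 1000 viruses, 636 scored positive.
ll_near = log_likelihood([0.4], 50, 0.05, [1000], [636])
ll_far = log_likelihood([0.1], 50, 0.05, [1000], [636])
```

with these illustrative inputs the log-likelihood is higher at p = 0.4 than at p = 0.1, because the observed positive fraction is close to the q implied by p = 0.4.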
to sample from the posterior distributions, markov chain monte carlo (mcmc) is used, specifically the metropolis-hastings algorithm as implemented in pymc (patil et al., ) . no prior knowledge is incorporated: a beta distribution beta( , ) is imposed on all p j , j = , ..., m, and a uniform distribution with a sufficiently large upper bound, uniform( , ), on n sat . the convergence of mcmc is checked by visual inspection of trace and autocorrelation plots for each parameter. model verification via simulations. the proposed method makes use of experimental parameters including the proportion of antibodies labelled (f l ) and the number of viruses sampled (v ). formally, these are not required to comply with specific bounds. however, certain configurations of an experimental set-up are not expected to yield data that lead to a sensible model. labelling almost all antibodies (i.e. f l close to 1), for example, would result in a data set with low information content, because nearly every virus would then be scored positive. to explore how different experimental design choices impact the model's ability to reliably estimate parameters, we analysed simulated data that assumed a range of possible experimental settings. we simulated a single experiment at a time and assumed the number of antibodies bound at saturation to be known. for this we used an upper limit estimate of adv- c stoichiometry that we previously derived from an alternative method, n sat = (mcewan et al., ) . in each simulation, the binding probability p of an antibody is thus the only parameter estimated. we considered a range of possible experimental settings by varying the total number of viruses sampled in an experiment (from to ) and the proportion of antibodies labelled (from . to . ). for an assumed true antibody binding probability p ∈ [ . , . ], data are simulated by drawing v virus states from s ∼ ber(q), with q as described in ( ).
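the simulation procedure above (draw virus states from ber(q), then estimate p with n sat fixed) can be sketched as follows. this is a hand-written random-walk metropolis sampler in pure python, swapped in for illustration; it is not the pymc model used by the authors. the flat prior on p, the proposal step size, the burn-in length and every numerical value (n_sat = 50, f_l = 0.05, 2000 viruses, true p = 0.4) are assumptions.

```python
import math
import random


def simulate_states(p_true, n_sat, f_l, n_viruses, rng):
    """Draw binary virus states from Bernoulli(q), q = 1 - (1 - p*f_l)**n_sat."""
    q = 1.0 - (1.0 - p_true * f_l) ** n_sat
    return [1 if rng.random() < q else 0 for _ in range(n_viruses)]


def log_posterior(p, n_sat, f_l, v, v_pos):
    """Log posterior of p up to a constant, under an assumed flat prior on (0, 1)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    log_neg = n_sat * math.log1p(-p * f_l)    # log P(S = 0)
    log_pos = math.log(-math.expm1(log_neg))  # log P(S = 1), numerically stable
    return v_pos * log_pos + (v - v_pos) * log_neg


def metropolis(n_sat, f_l, states, n_iter, step, rng):
    """Random-walk Metropolis sampler for p with n_sat treated as known."""
    v, v_pos = len(states), sum(states)
    p = 0.5
    current = log_posterior(p, n_sat, f_l, v, v_pos)
    samples = []
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, n_sat, f_l, v, v_pos)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


rng = random.Random(0)  # fixed seed so the sketch is reproducible
states = simulate_states(0.4, 50, 0.05, 2000, rng)
samples = metropolis(50, 0.05, states, 5000, 0.05, rng)
kept = samples[1000:]  # discard an assumed burn-in of 1000 iterations
p_estimate = sum(kept) / len(kept)
```

with a small labelled fraction and a few thousand viruses, the posterior mean of p should land close to the value used to simulate the data; repeating the run with f_l near 1 illustrates the loss of information described in the text, since almost every simulated virus is then positive.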
the antibody binding probability was then blindly estimated using mcmc and convergence checked using the gelman-rubin statistic (gelman and rubin, ) . a total of simulations were carried out that assess the model's ability to reliably estimate p in various experimental settings. as expected, high proportions of labelled antibodies produced data of low information content, reflected in the model's inability to accurately estimate p, even if the number of viruses used in an experiment was high ( fig. a) . by contrast, if f l is less than or equal to . , p was estimated with low bias and low variance ( fig. a) . simulations also suggested that for low f l , as a rule of thumb, at least viruses per experiment should be sampled (fig. b-c) . however, if the proportion of antibodies labelled is . (or higher), then the proposed model failed to produce a reasonable estimate of p for most underlying true values; higher number of viruses seemed to compensate this effect to some extent (fig. d ). in summary, these simulations put empirical bounds on experimental parameters and show that, if compliant, the method yields sensible estimates. experimental setup. successful implementation of our strategy requires sensitive and unambiguous measurements of individual virus-antibody complexes. we achieved this by immobilising adv particles onto coverslips for analysis by total internal reflection fluorescence microscopy (tirf-m; a detailed description of experimental methodology is provided in the supplementary information). prior to immobilisation, purified adv particles were directly labelled with alexa fluor ; when observed by tirf-m (figure bi ) they appear as monodisperse diffraction limited spots (the particle being ∼ nm in diameter; figure a ). 
to further verify that each object is a single virus particle we used srrf analysis (super-resolution by radial fluctuations (gustafsson et al., ) ); for each object we resolved a single maximum of fluorescence that was ∼ nm in diameter (figure bii and iii), consistent with the ultrastructure of adv particles ( figure a) . immobilised adv particles were incubated with monoclonal antibody c conjugated to alexa fluor dye, and imaged by tirf-m. each adv particle was positive for antibody, indicating the assembly of virus-antibody complexes ( figure c ). moreover, individual adv- c complexes could be analysed in a quantitative manner over a > fold range in antibody concentration ( figure d ). we automated this process to allow measurements of whole populations of virus particles at varying concentrations of antibody ( figure e & f). c signal intensity was proportional to antibody concentration and reached a plateau at high values; this indicates increasing virus-antibody stoichiometries up to a saturation point at which maximum antibody binding is achieved (as outlined in figure a) . moreover, the populations of virus particles were quite homogeneous in c signal; this suggests a relative uniformity of assembly. in conclusion, we were able to quantitatively analyse the assembly of individual virus-antibody complexes. to achieve differential labelling a second batch of c was directly conjugated with biotin; this served as the 'b' batch of antibody to be mixed with the c 'a' batch ( figure ) . it is possible that conjugation with either biotin or alexa fluor has unexpected detrimental effects on antibody binding; therefore, to have confidence in our binary labelling system we needed to demonstrate fair competition between our differentially labelled antibody batches. to test this we incubated immobilised adv particles with a high concentration of antibody ( µg/ml) composed of different proportions of c or c biotin (e.g. . : . , . : . ).
we then monitored the c fluorescence signal under each condition. as the proportion of c dropped we measured a stepwise reduction in fluorescence signal (figure a ). if both batches of antibody possess equivalent binding to adv we would expect a linear relationship between the proportion of c and fluorescent signal. indeed, when normalised for units, we observed a near perfect linear relationship ( figure b , slope = . , r = . ), indicating a fair competition in binding between our a and b batches of antibody. as depicted in figure b , our approach requires detection of single antibody molecules (of b batch) within individual virus-antibody complexes. to explore this we incubated immobilised adv particles with µg/ml c spiked with % c biotin (i.e. f l . : . ). molecules of c biotin were detected using streptavidin (an ultra-high affinity biotin binding protein) conjugated to a quantum dot (qdot ). the photostability of quantum dots permits signal accumulation over prolonged exposure times (resch-genger et al., ; algar et al., ) , therefore increasing the sensitivity of detection. analysis by tirf-m revealed that whilst every particle was positive for c only a subset possessed c biotin -qdot signal ( figure c ); this suggests a population of adv particles receiving one, or very few, c biotin antibody molecules (as outlined in figure b ). automated analysis revealed well-separated populations of adv particles that were positive or negative for c biotin ( figure d ). note that all particles were positive for c . these data are consistent with the assembly of virus-antibody complexes in which the vast majority of antibody molecules are from batch a ( c ), but a subset of complexes contain ∼ molecules of batch b antibody ( c biotin ); the proportion of c biotin positive particles will serve as the output data for statistical modelling.
we have demonstrated quantitative analysis of c interaction with individual adv particles ( figure ) ; we have confirmed that differential labelling of antibody does not bias binding ( figure a & b) ; and that we could detect single molecules of c biotin allowing discrimination of positive and negative adv- c complexes ( figure c & d). therefore, we have satisfied the necessary technical requirements to implement the strategy outlined in figure . we proceeded with a series of experiments to generate data for statistical modelling and stoichiometric estimates. to achieve this we performed four independent overlapping titrations over a > fold range in antibody concentration ( . - µg/ml). guided by the simulations in figure , we explored a range of c biotin proportions, from . - . ( . - % ), and, where possible, collected > particles per sample (the average number of particles collected was > ). the proportion of positive particles was assessed for each sample; details of data analysis are provided in the supplementary information. supplementary figure provides representative data: scatter plots display adv , c and c biotin intensities for control samples (treated with unmixed c biotin or c ) and four representative test samples incubated with a range of antibody concentrations, containing . c biotin . bar charts provide summary statistics for each channel: the particles have uniform adv reference signal, whereas the c fluorescence decreases with antibody concentration; likewise, the proportion of c biotin positive particles decreases with concentration. these data are consistent with the expected concentration-dependent stepwise reduction in virus-antibody stoichiometry. the four experiments generated sets of binary scores (supplementary figure and table ) which were integrated into our statistical model, as outlined in the methods.
this generated posterior distributions of p (probability of an antibody binding site being occupied) for each sample (supplementary figure a ) and an estimated n sat (the maximum number of antibodies that can bind per particle) of (maximum a posteriori (map) estimate; ci (credible interval) [ , ]; supplementary figure b ). absolute antibody numbers for each sample can be derived by multiplying the map estimate of each posterior distribution (for p) by n sat (supplementary figure c) . figure a displays the mean number of bound antibody at increasing c concentrations; adv- c stoichiometries range from to across the titration of antibody. for any given sample, alongside the proportion of particles that are positive for c biotin , the experimental setup provides fluorescent intensity values of c in each adv-ab complex (supplementary figure ) . this provides an internal reference for adv- c interaction; the stoichiometric estimates should therefore correlate with their experimentally-matched fluorescent intensity values. figure b demonstrates a near perfect linear correlation between stoichiometry and fluorescence intensities for an example experiment; this would suggest that our statistical modelling faithfully reports the relationship between antibody concentration and adv binding occupancy. the stoichiometric estimates generated by our method derive from an ensemble measurement and, therefore, represent the antibody interactions of the average virus particle; this obscures heterogeneity within the population. however, given the excellent agreement between the estimated antibody counts and c intensity values ( figure b ) we used the stoichiometric estimates to calibrate the c signals, therefore allowing us to infer population heterogeneity.
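the two conversions in this passage (occupancy probability to absolute counts, and ensemble counts used to calibrate per-particle fluorescence) can be illustrated with a short sketch. it assumes a purely proportional relationship between fluorescence and antibody number, with no background offset, which is a simplification introduced here; the function names and all numbers are hypothetical and are not the study's values.

```python
from statistics import median


def antibody_count(p_map, n_sat):
    """Absolute antibody number: MAP estimate of p multiplied by n_sat."""
    return p_map * n_sat


def calibrate_intensities(intensities, stoichiometric_estimate):
    """Convert per-particle intensities to inferred antibody counts.

    The sample median intensity is matched to the ensemble stoichiometric
    estimate and every particle is scaled by the same factor (assumes
    intensity is proportional to the number of bound antibodies).
    """
    scale = stoichiometric_estimate / median(intensities)
    return [value * scale for value in intensities]


# Hypothetical example: p_map = 0.5 and n_sat = 200 give 100 antibodies
# for the average particle; three particles are then calibrated.
mean_count = antibody_count(0.5, 200)
inferred = calibrate_intensities([10.0, 20.0, 40.0], mean_count)
```

because a single scale factor is applied, the shape of the intensity distribution is preserved, so any skew in the inferred counts reflects skew already present in the fluorescence data.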
to achieve this, for any given sample we matched the median c fluorescence intensity to its associated stoichiometric estimate (generated by the model); extrapolating from this we then converted the c fluorescence values to inferred antibody counts for individual virus particles. figure c provides histograms and frequency distributions of inferred antibody counts for a range of concentrations. being slightly skewed to the right, the frequency data were best fitted using a log-normal distribution; this is particularly apparent at higher antibody concentrations. this would suggest that a significant proportion of virus particles are binding more antibodies than the average particle. however, no particles bound more than antibody molecules. we provide an interpretation of these observations in the discussion. in summary, we have used statistical modelling to derive stoichiometric estimates of adv- c complexes. this suggests that the most probable antibody binding maximum is molecules ( % ci - ). however, using stoichiometric estimates to calibrate fluorescent data revealed population heterogeneity with a small proportion of virus particles binding ∼ antibody molecules. notably, these values are in excellent agreement with previous estimates that we, and others, have derived using alternative methods (mcewan et al., ; varghese et al., ). various methods permit the investigation of molecular stoichiometries within biological assemblies but technical limitations often make it difficult to obtain reliable estimates. for example, smlm, by its very nature, identifies single molecules and, if properly calibrated, can deliver accurate stoichiometries; however, successful counting by smlm requires a very detailed understanding of the photochemical behaviour of the chosen fluorescent dyes.
here we outline a robust, and relatively facile, experimental framework for extracting accurate molecular counts using (non-smlm) fluorescent microscopy and statistical modelling. labelling of a component of interest with spectrally distinct fluorescent dyes (label a or b), and mixing them at defined ratios such that b-labelled molecules are in the minority, allowed assembled complexes to be simply identified as being either positive or negative. the frequency of positive complexes was then related to the underlying stoichiometry of interaction through statistical modelling. by creating a scheme in which complexes need only be qualitatively scored for a particular label, our method negates the necessity for carefully calibrated measurements and an a priori understanding of the system. the only requirement is that the chosen label is clearly discerned from background; this is easily achievable with bright/stable fluorescent dyes and relatively inexpensive cameras; in this case we used quantum dots for sensitive detection, but many standard fluorescent dyes should also suffice. an obvious limitation is that our method relies on an ensemble measurement and obscures heterogeneity within the population of complexes. however, this information remains accessible via the a-label fluorescent signals measured from each complex. consequently, the ensemble-based stoichiometric estimates of the average complex can be used to calibrate these fluorescent signals and infer approximate molecular counts for individual complexes, thereby restoring heterogeneity. we were able to make robust measurements of adv- c interactions by fluorescence microscopy and successfully implemented the differential labelling strategy. we performed multiple independent measurements at various antibody concentrations to derive molecular counts for adv- c complexes. our analysis estimated that the average adv particle interacts with a maximum of c antibody molecules.
moreover, through examination of population heterogeneity we revealed that a small proportion of adv particles may bind up to antibody molecules. these data can be reconciled with a molecular model of adv- c complexes. adv particles possess identical hexon subunits, each providing a potential binding site for c ; however, previous estimates indicate an absolute maximum of antibody molecules per virion. this would suggest that particle geometry places packing constraints on the arrangement of antibody molecules. cryo-em analysis indicates a complex and heterogeneous interaction network in which particle geometry creates potential antibody clashes and, therefore, prevents binding to every site simultaneously (varghese et al., ) . whilst there are consistently five c molecules at each of the twelve vertices of the adv particle, additional antibody interactions occur through heterogeneous packing across the surface; the pattern of which is likely dictated by the random order in which binding sites become occupied on any given virus particle. as a consequence, with optimal antibody packing there is likely to be an absolute maximum binding occupancy of ∼ molecules. our data indicate that the average particle binds fewer molecules ( ) than the threshold dictated by purely geometric limitations. this would suggest that the majority of particles do not achieve optimal antibody packing and saturate at lower occupancies. this model also offers an explanation for our observation that no particles bind more than ∼ molecules ( figure c ). this interpretation of our data exposes another potential flaw in our approach. our modelling strategy assumes that components bind independently, but in this test case the geometry of adv particles creates clashes between adjacent c molecules such that complete saturation is not possible. consequently, antibody binding events can be influenced by prior antibody interactions and, therefore, are not independent.
although this may have introduced inaccuracies in our estimates, the molecular counts derived from our approach are in good agreement with previous values. moreover, we maintain that the assumption of independent binding is appropriate for a generalisable method that can be applied to other biological assemblies. for example, within virology, we intend to extend this method to investigate the molecular composition and antibody-mediated neutralisation of enveloped viruses such as human immunodeficiency virus, hepatitis c virus and sars-coronavirus- . beyond the confines of virology, our method could be applied to a variety of other biological assemblies of various scales, for example: bacteria; purified cellular organelles; cellular vesicles, such as exosomes; and supramolecular complexes such as ribosomes or inflammasomes. in conclusion, we have developed a novel and robust method for counting components within biomolecular complexes. this approach has provided accurate counts for a previously characterised system and could be applied in a variety of other contexts. moreover, we expect this system could be integrated with other complementary methods to enhance quantitative analysis; for example differential labelling and statistical modelling may provide a means of internally calibrating smlm-based counting schemes.
a. mean stoichiometric estimates at increasing concentrations of c ; error bars indicate standard error of the mean; data fitted with a binding curve, r = . (graphpad, prism). b. stoichiometric estimates were compared to c fluorescent values from an individual experiment; data fitted with a linear regression, r = . (graphpad, prism). c. stoichiometric estimates were used to calibrate fluorescent intensity values allowing inference of heterogeneity. histograms display the frequency of inferred antibody counts as a proportion of total particles. the frequency data were fitted with a log-normal distribution, r values were all > . (graphpad, prism).
semiconductor quantum dots in bioanalysis: crossing the valley of death
the stoichiometry of gag protein in hiv-
accurate construction of photoactivated localization microscopy (palm) images for quantitative measurements
one, two or three? probing the stoichiometry of membrane proteins by single-molecule localization microscopy
fast live-cell conventional fluorophore nanoscopy with imagej through super-resolution radial fluctuations
counting single photoactivatable fluorescent molecules by photoactivated localization microscopy (palm)
antibodies mediate intracellular immunity through tripartite motif-containing (trim )
regulation of virus neutralization and the persistent fraction by trim
structure of human adenovirus
a hidden markov model approach to characterizing the photoswitching behavior of fluorophores
pymc: bayesian stochastic modelling in python
quantum dots versus organic dyes as fluorescent labels
toward absolute molecular numbers in dna-paint
nuclear pores as versatile reference standards for quantitative superresolution microscopy
postentry neutralization of adenovirus type by an antihexon antibody
correlation functions quantify super-resolution images and estimate apparent clustering due to over-counting
interaction between hela cells and adenovirus type virions neutralized by different antisera
we would like to thank ricardo henriques for initiating some important conversations and providing technical assistance during image analysis. we would like to thank niall adams for his help in supervising sfm during the development of the statistical methodology. molecular cartoon images were taken from the wellcome
key: cord- -dx tg authors: mahajan, lakshmi s.; kim, grace eunyoo; kwak, hojoong title: mapping rna dependent rna polymerase activity and immune gene expression using pro-seq date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: dx tg positive strand, single strand rna viruses ((+)ssrna viruses) are viruses with an rna genome that have broad impacts on a wide range of hosts, including sars-cov- human respiratory infections. their replication and gene expression are driven by rna dependent rna polymerases (rdrp). detecting active rna synthesis by rdrp is critical for assessing the infectivity and pathogenicity of (+)ssrna viruses. current approaches rely on viral rna detection, which cannot distinguish viral titer from rdrp activity. precision run-on sequencing (pro-seq) is a nuclear run-on based nascent rna sequencing method, widely used to map eukaryotic rna polymerases by using labeled nucleotide analogues. here we provide evidence that pro-seq also detects rdrp activity and can serve as a highly sensitive rdrp mapping method. coupled to pro-seq in human blood samples, we propose to use pro-seq as a single package method to detect (+)ssrna virus rdrp activity and its interaction with host immune response through transcriptome-wide profiling of leukocyte gene expressions at once. positive strand, single strand rna viruses ((+)ssrna viruses) are viruses with an rna genome that have broad impacts on a wide range of hosts, including coronaviruses in human respiratory infections , . their replication and gene expression are driven by rna dependent rna polymerases (rdrp) , . detecting active rna synthesis by rdrp is critical for assessing the infectivity and pathogenicity of (+)ssrna viruses. current approaches rely on viral rna detection, which cannot distinguish viral titer from rdrp activity . for example, immune cell tropism of coronavirus is a factor related to the severity of the disease outcome currently under debate , , and it will be critical to distinguish the simple presence of viruses in blood from active viral expression inside blood cells. 
pro-seq is a nuclear run-on based nascent rna sequencing method, widely used to map eukaryotic rna polymerases (rnap) by using labeled nucleotide analogs , . here we provide evidence that pro-seq can also detect rdrp activity, and propose that pro-seq can serve as a highly sensitive rdrp mapping method. coupled to pro-seq in human samples, we anticipate the detection of rna viruses and their activities through rdrp mapping. pro-seq in blood samples of (+)-strand ssrna virus infected patients will further define if there is immune cell tropism of the virus, and host-virus interactions: if the virus interferes with expression levels of the host factors, and if host immune gene expression affects the severity of the infection. drosophila a virus (dav) is a (+)-strand ssrna virus with low pathogenicity, infecting up to % of natural drosophila populations . we previously found that it is also transmitted in experimental drosophila cell lines such as schneider- (s ) cells. dav is among the picornavirus family that encodes rdrp as the replicating and transcribing rna polymerase . drosophila s cells were among the first cells to which pro-seq methods were applied to map rnap in high resolution . we reanalyzed the pro-seq data in s cells, and investigated whether any of the pro-seq reads were derived from non-drosophila sequences. in the pro-seq data with all biotin-ntps supplied in the run-on reaction, . million reads were sequenced, . million reads ( %) were aligned at least once to the drosophila melanogaster genome (dm ), and . million reads were not mapped to the dm genome. we re-aligned these unmapped reads to the collection of ncbi all virus sequences ( , virus refseq sequences), and , reads mapped to at least one virus sequence ( . % of unmapped reads). , of those reads mapped to the dav genome (nc_ ; % of viral sequences), indicating that pro-seq reads from viral origin in s cells are predominantly from dav.
the pro-seq reads from dav covered the entire span of the dav genome mostly on the plus strand ( figure a ). while pro-seq is highly specific for nascent, biotin-labeled rnas, it is also possible that dav pro-seq reads may have derived from the mature dav rna genome that contaminated the sample at low frequency. the referenced pro-seq study was originally designed to rule out this possibility by including run-on libraries with one type of biotin-ntp supplied at a time without the presence of any other ntp substrates. for example, the pro-seq biotin-ctp library is produced after washing out other ntp pools and supplying only the biotin-ctp in the run-on reaction. this allows only the incorporation of biotin-ctp at the active site of the rna polymerases if the rna polymerase is positioned at the c base, because no other ntp substrates are available for the run-on reaction. therefore, ′ ends of the pro-seq rna reads should predominantly be the c base ( figure b ). in the original analysis, up to % of all reads ended with the supplied biotin-ntp base (a), supporting that pro-seq detects nascent rna transcribed by rna polymerase. we performed the last base composition analysis on the pro-seq reads aligned to the dav genome ( figure c ). overall, - % of the pro-seq reads mapped to the dav genome ended with the same base as the run-on base. if the reads were derived from rna molecules in viral particles, the base composition would have been the same as the dav genome composition (a:c:g:u = : : : ), which is not observed. this indicates that rdrp can incorporate biotin-ntps similarly to nuclear run-ons, and the majority of pro-seq reads from the dav genome are the products of rdrp activity. while all single base run-ons show expected last base frequencies, the absolute amount of reads mapped to the dav genome differs between each single base run-on ( table , figure a ).
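the last base composition analysis described above can be sketched in a few lines. this is an illustrative reconstruction, not the authors' pipeline: it assumes that reads are supplied as plain sequence strings already oriented so that the final character corresponds to the end of the nascent rna, and that sequencing output uses t in place of u. the function name and the example reads are hypothetical.

```python
from collections import Counter


def last_base_fractions(reads):
    """Fraction of reads ending in each base (A, C, G, T).

    Assumes each read is oriented so that its final character is the base
    incorporated during the run-on; empty reads are ignored.
    """
    counts = Counter(read[-1].upper() for read in reads if read)
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: counts.get(base, 0) / total for base in "ACGT"}


# Hypothetical reads from a biotin-ctp single-base run-on library:
# most reads are expected to end in C if they derive from nascent rna.
example_reads = ["ACGC", "TTC", "GGA", "AC"]
fractions = last_base_fractions(example_reads)
```

comparing these fractions with the base composition of the viral genome gives the contrast used in the text: reads from packaged genomic rna would be expected to mirror the genome composition, whereas run-on products are enriched for the supplied base.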
biotin-ctp and biotin-utp pro-seq libraries show up to times more reads mapped to the dav genome than biotin-atp and biotin-gtp libraries. this suggests that rdrp may be less efficient in using biotin-purine ntps, and that biotin-pyrimidine ntps (biotin-ctp and biotin-utp) are the preferred substrates. this is different from drosophila rna polymerase ii, which appears to have comparable substrate specificity for all biotin-ntps. the orf-proximal accumulation coincides with quadruplet rich regions of the template-strand of the dav genome ( figure d ). for example, c-quadruplets are more frequent in orf-proximal pro-seq peaks at - base regions ( figure e ). this suggests the complementary g-quadruplets on the template strand are associated with rdrp stalling. g-quadruplets are known to form stable secondary rna structures, and may provide physical barriers to rdrp elongation. therefore, sequence features of the dav genome can play a critical role in rdrp processivity, and these rna sequence features can become therapeutic targets for clinical ssrna virus infections. we then examined if pro-seq can be used to detect rdrp activity in human samples. we previously developed a pro-seq procedure for a small amount of peripheral whole blood, which will detect the transcriptional landscape of nucleated leukocytes. this procedure uses biotin-ctp in the presence of other ntps as the substrates, and should be compatible with rdrp. one of the hypotheses on ssrna virus infections is that they may drive immune cell response to aggravate symptoms of infection, possibly by directly infecting the immune cells. for example, a recent study suggested that sars-cov- can directly infect t cells and induce t cell exhaustion . pro-seq in peripheral blood cells can not only test such hypotheses on viral activity by detecting rdrp, but also can provide comprehensive profiles of gene expression and transcriptional enhancer activities at the same time. 
this will allow us to examine host-ssrna virus interactions with quantitative measures. we analyzed peripheral blood chromatin run-on (pchro-seq) datasets, a modified version of pro-seq, using . - . ml blood samples from de-identified individuals, whose conditions regarding any viral infection were unknown. we collected pro-seq reads that did not map to the hg genome, and aligned them to the ncbi all virus database as described above. from this data, we did not detect significant pro-seq sequences from (+)ssrna viral genomes, indicating that none of the individuals had direct viral infections in the blood immune cells. for example, unlike the s cell data, we did not find any sequences specific to drosophila a virus genome. this can serve as a baseline for evaluating human samples with active (+)ssrna virus infections. viral infections lead to a diverse range of severity in clinical symptoms, and gene expression in immune cells such as cytokine response is an important factor in host-virus interaction. our pro-seq data show expression levels of immune-response related genes from human peripheral blood leukocytes. out of , "immune" related genes classified by gene ontology, we found differentially expressed genes in at least individual (false discovery rate < . , deseq). typically, pro-seq levels are two or more fold higher in one or more individuals in these differentially expressed genes ( figure a ). these expression patterns form clusters of immune related genes and clusters of individuals ( figure b ) . the co-clustering of biologically replicated data from one individual (p , p n, p nh) shows the reproducibility of pro-seq in peripheral leukocytes (pchro). the gene expression patterns suggest that individuals can be classified into groups with different immunity patterns if pro-seq in peripheral leukocytes were conducted on a larger scale. 
overall, pro-seq can serve as an extremely efficient method to both map rdrp activity and evaluate host immunity at once in human samples. the transcription/replication landscape of the (+)ssrna viruses, as we report in dav by pro-seq, will provide a mechanistic basis of initiation and elongation by rdrp that can be directly applied to viruses that infect humans, such as sars-cov- . rdrp is the direct target of the only approved drug for sars-cov- infection, remdesivir, a chain terminating nucleoside analogue of rdrp. identifying rdrp initiation and pausing sites in addition to defining sequence elements that regulate rdrp initiation and elongation can play a critical role in developing sars-cov- and other (+)ssrna virus therapeutics. transcription/replication of the dav genome by rdrp is not completely uniform throughout the whole genome, and there appear to be sites of rdrp slowdown, pausing, premature termination, and internal re-initiation. pro-seq and ′ end cap analysis of pro-seq (pro-cap) can precisely define these sites and sequence elements on the viral genome. antisense rna that interferes with these elements would be a straightforward therapeutic option that can synergize with nucleoside analog inhibitors such as remdesivir. in addition, host factors that contribute to post-translational modification of eukaryotic transcription, replication, or rna processing machinery can also play a role in rdrp processivity on the viral genome. testing known small molecules using pro-seq on a human cell based system, and directly measuring specific stages of rdrp initiation and elongation, will provide detailed information to identify drug candidates to be used in combination with other rdrp inhibitors. for example, secondary structures of the rna genome may cause rdrp stalling as we observed in dav, and drugs stabilizing rna structures or inhibiting rna helicases may also inhibit rdrp elongation. 
such synergisms targeting rdrp from multiple mechanistic aspects will increase the efficiency of existing drugs such as remdesivir and allow the use of lower tolerable doses in combinatorial therapies. finally, pro-seq can provide measures of the expression levels of host factors while also measuring viral activity. while many (+)ssrna viruses manifest as respiratory tract infections, there has been strong suspicion that severe cases may affect immune cells, either directly or indirectly. the versatility of pro-seq to use blood leukocytes from untreated whole blood samples provides us with an opportunity to inspect the presence of actively infecting viruses in immune cells, and immune gene expression profiles that interact with viral infections at systematic levels at the same time. pro-seq has the advantage of distinguishing inactive viral nucleic acids from actively transcribing and replicating viral genomes. therefore, we propose to repurpose pro-seq as an efficient strategy to map rdrp activities along with host gene expression profile to evaluate host-virus interactions simultaneously with high genomic resolution. geo srx ;srr , srr , srr , srr . none declared. drosophila pro-seq sample sequences from a dataset included in kwak et al, science were retrieved from ncbi's gene expression omnibus (geo) under the accession gse . the samples are located in the sequence read archive srx . the four runs (srr , srr , srr , srr ) were downloaded, adaptor sequences were removed using cutadapt, and the trimmed sequences were mapped to the reference drosophila genome sequence (dm ) using star aligner. we aligned unmapped reads to the dav sequence retrieved from the ncbi virus repository using the bwa aligner. peripheral blood cell pro-seq was performed as described previously . briefly, frozen blood sample is thawed and lysed in nun buffer ( . m nacl, m urea, % np- , mm hepes, ph . , . mm mgcl , . 
mm edta, x protease inhibitor cocktail, mm dtt, u/ml rnase inhibitor) and chromatin is pelleted by centrifugation at , g for min, °c. pelleted chromatin was washed and used for the ultrashort pro-seq procedure (upro). chromatin or cells were incubated in the nuclear run-on reaction condition with biotin-ntps and rntps supplied for min at °c. run-on rna was extracted using trizol, and fragmented. ′ rna adaptors are ligated followed by consecutive streptavidin bead bindings and extractions. extracted rna is converted to cdna using template switch reverse transcription. after a spri bead clean-up, the cdna is pcr amplified using primers compatible with illumina small rna sequencing. downstream analyses were performed as described previously .
pro-seq density on dav genome and base-quadruplet counts. the read count is indicated along the left side of each graph. the read densities for the pro-seq data are much higher (some in the thousands) than the quadruplet read densities (less than ten). the first graph contains pro-seq density data in red. the second graph shows a-quadruplet density data in green. the third graph shows c-quadruplet density data in blue. the position in bases is indicated along the bottom of each graph. regions that show increased densities of pro-seq are indicative of rdrp pausing. this can be seen at positions - , , - , , and , - , . e. enrichment analysis of template strand g-quadruplets on high pro-seq density regions. g-quadruplets on the template strand are associated with rdrp stalling because c-quadruplets are more frequent in orf-proximal pro-seq peaks at the - base regions. the left side of each graph indicates read density. the bottom of the figure indicates the rank of each position by read density. in the bottom graph, g-quadruplet positions are mapped. density score is . in the red region, - . in the yellow region, and - . in the blue region, as indicated by the key on the right-hand side. 
the enrichment score is graphed in the top graph in green, with a p-value of . indicating a significant difference in densities. transcriptome-wide profiles of leukocyte gene expression using pro-seq. a. expression levels of the most differentially expressed immune related genes in each individual. expression levels were normalized using deseq , and the most differentially expressed genes were selected based on their p-values of the deseq analysis. b. heatmap of pro-seq expression levels of the immune related genes and hierarchical clustering of the genes and individuals. the y-axis shows the , immune related genes clustered. colormaps reflect the z-score normalized expression levels within the same genes across different individuals. the trinity of covid- : immunity, inflammation and intervention host cell proteases: critical determinants of coronavirus tropism and pathogenesis structure of the rna-dependent rna polymerase from covid- virus one severe acute respiratory syndrome coronavirus protein complex integrates processive rna polymerase and exonuclease activities detectable -ncov viral rna in blood is a strong indicator for the further clinical severity sars-cov- infects t lymphocytes through its spike protein-mediated membrane fusion comparative replication and immune activation profiles of sars-cov- and sars-cov in human lungs: an ex vivo study with implications for the pathogenesis of covid- precise maps of rna polymerase reveal how promoters direct initiation and pausing base-pair-resolution genome-wide mapping of active rna polymerases using precision nuclear run-on (pro-seq) the discovery, distribution, and evolution of viruses associated with drosophila melanogaster rna-dependent rna polymerases of picornaviruses: from the structure to regulatory mechanisms nascent rna sequencing of peripheral blood leukocytes reveal gene expression diversity biorxiv key: cord- -q jamg authors: hahka, taija m.; xia, zhiqiu; hong, juan; kitzerow, oliver; nahama, alexis; 
zucker, irving h.; wang, hanjun title: resiniferatoxin (rtx) ameliorates acute respiratory distress syndrome (ards) in a rodent model of lung injury date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q jamg acute lung injury (ali) is associated with cytokine release, pulmonary edema and in the longer term, fibrosis. a severe cytokine storm and pulmonary pathology can cause respiratory failure due to acute respiratory distress syndrome (ards), which is one of the major causes of mortality associated with ali. in this study, we aimed to determine a novel neural component through cardiopulmonary spinal afferents that mediates lung pathology during ali/ards. we ablated cardiopulmonary spinal afferents through either epidural t -t dorsal root ganglia (drg) application or intra-stellate ganglia delivery of a selective afferent neurotoxin, resiniferatoxin (rtx) in rats days post bleomycin-induced lung injury. our data showed that both epidural and intra-stellate ganglia injection of rtx significantly reduced plasma extravasation and reduced the level of lung pro-inflammatory cytokines providing proof of principle that cardiopulmonary spinal afferents are involved in lung pathology post ali. considering the translational potential of stellate ganglia delivery of rtx, we further examined the effects of stellate rtx on blood gas exchange and lung edema in the ali rat model. our data suggest that intra-stellate ganglia injection of rtx improved po and blood acidosis days post ali. it also reduced wet lung weight in bleomycin treated rats, indicating a reduction in lung edema. taken together, this study suggests that cardiopulmonary spinal afferents play a critical role in lung inflammation and edema post ali. this study shows the translational potential for ganglionic administration of rtx in ards. 
respiratory failure due to acute respiratory distress syndrome (ards) is one of the major causes of mortality associated with acute lung injury (ali) including covid- . [ ] [ ] [ ] [ ] [ ] most forms of ali/ards are also associated with acute cytokine release, pulmonary edema and in the longer term, fibrosis. , however, the mechanisms underlying these pathological changes in the lungs during ali/ards are not fully understood. in particular, a neural component that mediates lung pathology during ali/ards has been less considered. sensory neurons innervating the heart and lung enter the central nervous system by one of two routes; through the vagus nerve into the brain stem (medulla) with cell bodies residing in the nodose ganglia and directly into the spinal cord where cell bodies reside in the dorsal root ganglia (drg). afferents are composed of elements that respond to a variety of sensory modalities including mechanical deformation, heat, cold, ph, and inflammatory mediators, just to name a few. the reflex effects following stimulation of these afferents depends on the type of stimulus and the neural pathway involved. activation of vagal afferent pathways tends to be sympatho-inhibitory and antiinflammatory. , on the other hand, activation of spinal afferents tends to be sympathoexcitatory and pro-inflammatory. [ ] [ ] [ ] [ ] [ ] [ ] it is well known that small diameter spinal transient receptor vanilloid (trpv )-positive afferent c-fibers contain neuropeptides such as substance p (sp) and calcitonin gene related peptide (cgrp). these peptides tend to dilate adjacent vasculature and increase microvascular permeability. in the lung, this can cause pulmonary edema resulting in reduced oxygen diffusion and promote immune cell infiltration resulting in neural inflammation. 
therefore, in the current study we hypothesized that ablation of lung afferent innervation (thoracic spinal) by application of an ultrapotent, selective afferent neurotoxin, resiniferatoxin (rtx) will modify the course of the pathology including lung edema and local pulmonary inflammation associated with progressive ali. committee of the university of nebraska medical center and performed in accordance with the national institutes of health's guide for use and care of laboratory animals and with arrive guidelines. , experiments were performed on adult, male, - g sprague-dawley rats purchased from the charles river laboratories. animals were housed on-site and given a one-week acclimation period prior to experimentation. food and water were supplied ad libitum, and rats were on -hour light/dark cycles. rats were randomized into three groups and evaluated at -week post-instillation as follows: sham rats, bleomycin (bleo)-exposed rats with saline (epidural or intrastellate injection), and bleo-exposed rats with rtx (epidural or intra-stellate injection). bleo ( . mg/kg, ~ . ml) was instilled intra-tracheally to the lungs under % isoflurane anesthesia. sham control rats underwent intra-tracheal instillation of saline. animals were treated with rtx or vehicle (phosphate buffered saline) by either the epidural t -t drgs route ( µg/ml, µl/per ganglia) or intra-stellate ganglia administration ( µg/ml, µl/per side) days following bleomycin delivery ( figure ). in a pilot experiment, the upper thoracic spinal afferents were ablated by epidural application of rtx as previously described. briefly, rats were anesthetized using %- % isoflurane:oxygen mixture. rats were placed in the prone position and a small midline incision was made in the region of the t -l thoracic vertebrae. following dissection of the superficial muscles, two small holes (approximately mm x mm) were made in the left and right sides of t vertebrae. 
a polyethylene catheter (pe- ) was inserted into the subarachnoid space via one hole and gently advanced about cm approximating the t level. the upper thoracic sympathetic afferent ganglia were ablated by injecting resiniferatoxin (rtx; sigma aldrich), an ultra-potent agonist of the trpv receptor, into the subarachnoid space via the catheter. rtx ( mg; sigma aldrich) was dissolved in a : : mixture of ethanol, tween (sigma-aldrich), and isotonic saline. the first injection of rtx ( µg/ml, µl) was made at a very slow speed (~ minute) to minimize the diffusion of the drug. the catheter was then pulled back to t , t and t , respectively to perform serial injections ( µl/each) at each segment. the catheter was withdrawn and the same injections were repeated on the other side. silicone gel was used to seal the hole in the t vertebra. the skin overlying the muscle was closed with a - polypropylene simple interrupted suture, and betadine was applied to the wound. for post-procedure pain management, buprenorphine ( . mg/kg) was subcutaneously injected immediately after surgery and twice daily for days. rats were anesthetized using %- % isoflurane:oxygen mixture. after the trachea was cannulated, mechanical ventilation was started (model , harvard apparatus, south natick, ma). the skin from the rostral end of the sternum to the level of the third rib was incised. portions of the superficial and deep pectoral muscles and the first intercostal muscles were cut and dissected. to localize the left or right stellate ganglion, the left or right precava vein was separated with a hooked glass or steel rod laterally away from the brachiocephalic artery to expose the internal thoracic artery and the costocervical artery, which are descending branches of the right subclavian artery. stellate ganglia and ansa subclavia are located medially to the origins of the internal thoracic and costocervical arteries. 
then, rtx ( µl, mg/ml) was injected into the ganglia with a µl hamilton syringe (microliter # , hamilton, reno, nv, usa) over s bilaterally. an image of this procedure is shown in figure . following these maneuvers, the thorax between the st and nd intercostal spaces was closed with continuous - dexon ii coated braided absorbable polyglycolic acid suture, the skin was closed with - polypropylene suture, and the chest was evacuated. betadine was applied to the wound and the rats were allowed to recover from the anesthesia. for post-procedure pain management, buprenorphine ( . mg/kg) was injected subcutaneously immediately after surgery and twice daily for days. the artery on the ventral aspect of the rat tail was used for the collection of small amounts of blood (~ . ml) for analyzing arterial blood gas at day post bleomycin. the animal was restrained with a commercial restrainer so that its tail was accessible. the tail was prepared aseptically by alternating alcohol prep pads and iodine prep pads three times and the artery was punctured using a g needle. a small volume of blood (~ . ml) was gently aspirated into the syringe for blood gas analysis (istat, abbott, chicago, il, usa). after sample collection, the needle was removed, and a gauze swab was pressed firmly on the puncture site to stop bleeding. rats were anaesthetized with pentobarbitone ( mg/kg). evans blue, mg/kg ( mg/ml, dissolved in saline + ie per ml heparin) was administered intravenously. right panel: at day , bleomycin or saline was given intra-tracheally; at day , resiniferatoxin or vehicle was given into epidural space or into stellate ganglia; at day , the rats were sacrificed. the procedure by which the stellate ganglia were exposed and injected in rats under anesthesia is shown in figure . plasma extravasation (evans blue) was used to assess vascular permeability after acute lung injury. 
as shown in figure , bleomycin-treated lungs exhibited a wide distribution of evans blue areas on both sides. the highest intensity of evans blue was seen at the medial aspect of each lung. the evans blue areas were largely reduced following epidural rtx treatment at the -day time point after bleomycin administration (figure ) . three pro-inflammatory tissue cytokines that were prevalent in the lung following bleo treatment are shown in figure . il- , il- β and ifnγ were markedly elevated following bleo treatment. these cytokine levels were normalized in epidural rtx treated rats. plasma extravasation in response to bleo was also reduced after stellate injection of rtx ( figure ) . as can be seen, there was a marked reduction in evans blue dye in the lung following stellate injection of rtx. arterial blood gas data were evaluated in rats treated with vehicle vs rtx intra-stellate. compared to sham rats, wet lung weight (wlw) as well as the ratio of wlw to body weight (bw) was significantly higher in the bleo+veh rats, and both were significantly reduced by intra-stellate injection of rtx. these data suggest that intra-stellate injection of rtx reduces lung edema post bleo. the data from this study provide proof of principle and are highly suggestive of an important role for trpv -positive spinal afferent-mediated neuroinflammation in acute lung injury and progression to acute respiratory distress. the evidence provided demonstrates that ablation of trpv sensory afferents in the presence of acute lung injury using rtx delivered by either of two routes that target cardiopulmonary afferents leads to a rapid reduction in lung microvascular permeability and a reduction in tissue and plasma inflammatory markers. while pulmonary function per se was not directly measured in this series of experiments, arterial blood gas data strongly suggest an improvement in gas exchange. 
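the wet lung weight to body weight ratio used above as an index of lung edema can be illustrated with a minimal sketch. the function name, the choice of reporting the ratio in mg of lung per g of body weight, and the example weights are assumptions made for illustration only; they are not values or code from this study.

```python
def wlw_to_bw_ratio(wet_lung_weight_g, body_weight_g):
    """Return wet lung weight normalized to body weight, in mg lung per g body.

    Both inputs are in grams. The scaling to mg/g is an illustrative choice,
    not a convention taken from the source text.
    """
    if wet_lung_weight_g < 0:
        raise ValueError("wet lung weight must be non-negative")
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return wet_lung_weight_g / body_weight_g * 1000.0


if __name__ == "__main__":
    # Hypothetical animal: 1.5 g wet lung weight, 300 g body weight.
    print(wlw_to_bw_ratio(1.5, 300.0))
```

normalizing to body weight matters here because bleomycin-treated animals lose body weight, so a raw lung weight comparison alone could understate or overstate edema.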
the improved body weight and reduced lung weight in rats with lung injury after receiving stellate ganglia administration of rtx suggest potential clinical benefits from reduced lung edema, and protective effects for nonpulmonary organs that would otherwise be impacted by the pulmonary triggered systemic inflammatory process. the lung is innervated by a dual sensory system including vagal and spinal afferents. both vagal and spinal afferent fibers are composed of a (high conduction velocity) and c-fiber (low conduction velocity) axons. these fibers and their sensory endings express a variety of membrane receptors that mediate ion channel function including traditional na + , k + and ca + channels (both voltage and ligand gated). importantly, non-specific cation channels that are highly permeable to calcium are expressed mostly in small diameter c-fibers. , these include at least members of the transient receptor potential family including transient receptor potential a (trpa) and trpv receptors. trpv receptors transduce sensations of heat and neuropathic pain in the periphery. estimates are that approximately percent of thoracic drg neurons are positive for trpv . upon activation, trpv channels are highly permeable to calcium. , high levels of intracellular calcium are toxic and thus damage or kill these specific afferent neurons. thus, a unique strategy has been developed to modulate the pathological effects of trpv afferent neurons. the ultrapotent neurotoxin, rtx binds avidly to the trpv receptor. after initial stimulation, high intracellular levels of calcium mediate inhibition of neuronal function. site-specific delivery of rtx can be used to intervene in various conditions to alleviate pain, inflammation, fibrosis and plasma extravasation. it has been shown that rtx-induced trpv sensory afferent deletion can block the afferent-contained neuropeptide release and reduce inflammatory pain. 
cardiopulmonary spinal afferents can also be targeted with rtx by either application into the epidural space at thoracic levels t -t (with some spread to higher and lower segments) or by injection into the stellate ganglia. while drgs are considered exclusively sensory in nature, the stellates contain soma for sympathetic efferent fibers and fibers of passage for thoracic afferents as they course through drgs and enter the spinal cord. it should be noted that in humans the stellate ganglia can be easily identified, and that this type of transcutaneous procedure can be performed with fluoroscopic or ultrasound guidance (intra-ganglionic or nerve 'block' approach). compared to epidural delivery that requires relatively larger injection volume (~ µl for bilateral injection) to sufficiently cover the t -t drgs, intra-stellate injection requires a much smaller volume ( µl for bilateral injection), which reduces the risk of systemic absorption of rtx and allows a higher dose of rtx to be used for local injection. the current data clearly show that intra-stellate ganglia injection of rtx markedly attenuated lung extravasation post ali, suggesting that a large proportion of thoracic afferents passing through the stellates innervate the lungs. taken together, we believe that intra-stellate ganglia delivery of rtx should be a clinically feasible intervention to treat acute lung injury compared to the epidural approach. the preliminary data presented here for epidural administration of rtx provides proof of principle that it reduces plasma extravasation in the lung. the main focus of this study was on the therapeutic effect of the stellate ganglia approach on lung pathology in our ali rat model. prior work from this laboratory has demonstrated that ablation of cardiac trpv positive afferents reduces sympathetic nerve activity and cardiac remodeling in a post myocardial infarction model of chronic heart failure. 
trpv -expressing cardiopulmonary afferents participate in a sympatho-excitatory reflex that has been termed the cardiac afferent sympathetic reflex (csar) and the pulmonary afferent sympathetic reflex (psar) . the csar is augmented in heart failure along with cardiac afferent discharge in response to bradykinin or capsaicin. epicardial administration of rtx reduces sympathetic outflow to the heart and kidneys and improves cardiac diastolic function while reducing fibrosis and cytokine content in the heart. furthermore, cardiac application of local anesthetic lowers sympathetic nerve activity in anesthetized vagotomized animals suggesting tonic input from these spinal afferents in heart failure. , on the other hand, it has been widely reported that activation of trpv expressing afferents causes secretion of neuropeptides such as substance p (sp) and calcitonin gene-related peptide (cgrp). [ ] [ ] [ ] [ ] [ ] released sp, but not cgrp, in sensory endings binds neurokinin (nk) receptors on blood vessels and causes vasodilation and increased vascular permeability that allows loss of proteins and fluid (plasma extravasation) thus promoting the regional accumulation of monocytes and leukocytes contributing to inflammation. [ ] [ ] [ ] in the lung, this process not only impairs alveolar gas exchange but may initiate and exacerbate a fulminant cytokine storm from adjacent cells and from circulating macrophages. the current study supports the idea that selective ablation of trpv afferents mitigates neuroinflammation in the lung by inhibiting trpv afferent-mediated plasma extravasation, at least in the bleomycin model of ali. 
importantly, although we did not directly measure respiratory parameters such as minute ventilation and respiratory rate in vehicle and rtx treated bleomycin rats, we observed significantly reduced lung weight and improved blood gas parameters including blood ph, po and so in bleomycin rats treated with intra-stellate ganglia injection of rtx, suggesting an improvement in lung function. combined with the results shown in the current model of ali, the potential of rtx to rescue lung function and protect multiple organs from collateral damage due to lung injury triggered inflammation is very encouraging and warrants additional studies as a rescue therapy for patients with lung injuries or infections resulting in inflammatory-mediated pneumonia. clinical assessments including potential acceleration for return to normal function or long-term protective effects on lung fibrosis should also be explored in upcoming studies. in the current study, we chose the bleomycin-induced lung injury model to study the role of pulmonary spinal afferent ablation in lung pathology after acute lung injury. intratracheal injection of bleomycin has been widely used to evoke pulmonary interstitial lesions in animal models. , bleomycin-induced lung injury is primarily mediated by alveolar epithelial damage resulting in the release of a large number of inflammatory cells and cytokines. following pulmonary insult with bleomycin at day , inflammation progresses to peak levels around day three. pulmonary edema, respiratory distress, and body and organ weight loss associated with systemic inflammation are observed up to day . the model was chosen as it closely reproduces important aspects of ards including local inflammation, cytokine storm, progression to respiratory distress, and multi-organ impact. 
the timing of therapeutic intervention (day ) was also carefully selected to coincide with high levels of inflammatory mediators and lung damage that would be found in infectious disorders including in covid- patients progressing to ventilatory support. we acknowledge a limitation that since the clinical etiology of lung injury is variable (e.g. viral/bacterial infection, chemical and surgical), the bleomycin model does not completely mimic all pathological characteristics of lung injury in humans. while we considered the lipopolysaccharide (lps)-induced lung injury model, lps has been shown to have a direct effect on sympathetic and parasympathetic afferent and efferent neurons in addition to its effect on the lung itself, , - and thus was less suitable for use in this study. as far as a viral infection model is concerned, we have not attempted to directly apply our findings to relevant diseases such as covid , although there may be some phenotypic overlap between the bleomycin lung injury and viral infection models. more work needs to be done to validate the efficacy of rtx in other lung injury models. our data suggest that pulmonary spinal afferent ablation by intra-stellate injection of rtx reduces plasma extravasation and local pulmonary inflammation post bleomycin-induced lung injury, which results in improved blood gas exchange. these findings suggest that local stellate application of rtx could be used as a potential clinical intervention to mitigate lung pathology after ali. this study was supported by sorrento therapeutics inc. dr. hanjun wang is also supported by the margaret r. larson professorship in anesthesiology. dr. irving h. zucker is supported in part by the theodore f. hubbard professorship for cardiovascular research. the lung/rtx project is currently sponsored by sorrento therapeutics inc. 
lb: pathogenesis of acute respiratory distress syndrome clinical features of patients infected with novel coronavirus in uk: covid- : consider cytokine storm syndromes and immunosuppression sa: respiratory support for patients with covid- infection clinical predictors of mortality due to covid- based on an analysis of data of patients from wuhan, china cs: acute respiratory distress syndrome the trp superfamily of cation channels. science&# ;s stke d: vagal afferent activation suppresses systemic inflammation via the splanchnic antiinflammatory pathway anti-inflammatory properties of the vagus nerve: potential therapeutic implications of vagus nerve stimulation hj: trpv (transient receptor potential vanilloid ) cardiac spinal afferents contribute to hypertension in spontaneous hypertensive rat hj: sympatho-excitatory response to pulmonary chemosensitive spinal afferent activation in anesthetized, vagotomized rats the paradoxical role of the transient receptor potential vanilloid receptor in inflammation p: the insulin receptor is colocalized with the trpv nociceptive ion channel and neuropeptides in pancreatic spinal and vagal primary sensory neurons the innervation of the kidney in renal injury and inflammation: a cause and consequence of deranged cardiovascular control ih: cardiac sympathetic afferent reflex control of cardiac function in normal and chronic heart failure states chapter -nociceptive physiology dg: improving bioscience research reporting: the arrive guidelines for reporting animal research nr: guide for the care and use of laboratory animals role of calcium ions in the positive interaction between trpa and trpv channels in bronchopulmonary sensory neurons intrathecal resiniferatoxin in a dog model: efficacy in bone cancer pain hj: identification of cardiac expression pattern of transient receptor potential vanilloid type (trpv ) receptor using a transgenic reporter mouse model mj: deletion of vanilloid receptor -expressing primary afferent neurons 
for pain control ih: cardiac sympathetic afferent denervation attenuates cardiac remodeling and improves cardiovascular dysfunction in rats with heart failure hl: cardiac vanilloid receptor -expressing afferent nerves and their role in the cardiogenic sympathetic reflex in rats hj: sympathoexcitation in response to cardiac and pulmonary afferent stimulation of trpa channels is attenuated in rats with chronic heart failure w: interaction between cardiac sympathetic afferent reflex and chemoreflex is mediated by the nts at receptors in heart failure tc: impact of neuropeptide substance p an inflammatory compound on arachidonic acid compound generation hydrogen sulfide and substance p in inflammation sensory-nerve-derived neuropeptides: possible therapeutic targets the role of neuromodulators (substance p and calcitonin gene-related peptide) in the development of neurogenic inflammation in the oral mucosa h]resiniferatoxin autoradiography in the cns of wild-type and trpv null mice defines trpv (vr- ) protein distribution kl: nk- receptor mediation of neurogenic plasma extravasation in rat skin nw: desensitization of the neurokinin- receptor (nk -r) in neurons: effects of substance p on the distribution of nk -r, galphaq/ , g-protein receptor kinase- / , and betaarrestin- / je: pharmacologic differentiation of inflammation and fibrosis in the rat bleomycin model chronic interstitial pulmonary fibrosis produced in hamsters by endotracheal bleomycin: pathology and stereology w: lipopolysaccharide-mediated inflammatory priming potentiates painful post-traumatic trigeminal neuropathy cg: lipopolysaccharides and trophic factors regulate the lps receptor complex in nodose and trigeminal neurons novel pathway for lps-induced afferent vagus nerve activation: possible role of nodose ganglion gm: lipopolysaccharide induces substance p in sympathetic ganglia via ganglionic interleukin- production key: cord- -oo q ml authors: gomes, fabio m.; tyner, miles d.w.; barletta, ana 
beatriz f.; yenkoidiok-douti, lampougin; canepa, gaspar e.; molina-cruz, alvaro; barillas-mury, carolina title: “proliferation of dblox peroxidase-expressing oenocytes maintains innate immune memory in primed mosquitoes” date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: oo q ml immune priming in anopheles gambiae mosquitoes following infection with plasmodium parasites is mediated by the systemic release of a hemocyte differentiation factor (hdf), a complex of lipoxin a bound to evokin, a lipid carrier. hdf increases the proportion of circulating granulocytes and enhances mosquito cellular immunity. we found that evokin is constitutively produced by hemocytes and fat-body cells, but expression increases in response to infection. insects synthesize lipoxins, but lack lipoxygenases. here, we show that the double peroxidase (dblox) enzyme, present in insects but not in vertebrates, is essential for hdf synthesis. dblox is highly expressed in oenocytes in the fat body tissue, and these cells proliferate in response to plasmodium challenge. we provide direct evidence that modifications mediated by the histone acetyltransferase agtip (agap ) are essential for sustained oenocyte proliferation, hdf synthesis and immune priming. we propose that oenocytes function as a population of “memory” cells that continuously release lipoxin to orchestrate and maintain a broad, systemic and long-lasting state of enhanced immune surveillance. there is growing evidence that a previous infection can "train" or "prime" the innate immune system, allowing it to respond more effectively to subsequent infections ( - ). this is an ancient response that has been documented in plants ( ) , insects ( ) and humans ( ) . it is not clear whether individual immune cells maintain the training response, or if there are specific "memory keeper" cell populations that orchestrate a persistent re-training of effector immune cells. 
in anopheles gambiae, the primary vector of malaria in africa, plasmodium infection induces a long-lasting priming response that enhances antiplasmodial immunity ( ) . plasmodium midgut invasion allows direct contact between the gut microbiota and midgut epithelial cells, triggering a burst of prostaglandin e (pge ) production by midgut cells. the transient systemic release of pge establishes a long-lasting release of hemocyte differentiation factor (hdf) which, in turn, induces an increase in the proportion of circulating granulocytes, a hallmark of immune priming ( , ) . hdf is a complex of evokin, a lipocalin lipid carrier protein, and the lipid mediator lipoxin a (lxa ) ( ). thus, immune priming can be defined as the enhanced ability of mosquitoes to convert arachidonic acid into lxa that results in a permanent functional state of enhanced immune surveillance ( ) . an effective antiplasmodial response requires the coordinated activation of epithelial, cellular, and complement components of the mosquito immune system ( ) . pge attracts hemocytes to the basal surface of the midgut ( ), and primed hemocytes release more microvesicles, enhancing complement-mediated ookinete lysis ( ) . in vertebrates, cyclooxygenases (cox) and lipoxygenases (lox) catalyze the synthesis of prostaglandins and lipoxins, respectively ( ) . although eicosanoids have been detected in mosquitoes and other insects ( , , ) , neither cox nor lox enzymes are present in insects. we recently showed that two heme peroxidases, hpx and hpx , are necessary for midgut pge synthesis and are essential to establish immune priming in response to plasmodium infection ( ) . however, the enzyme(s) mediating lxa synthesis in insects remain unknown. in this manuscript, we identify the double peroxidase (dblox) enzyme as essential for hdf synthesis, and show that dblox is highly expressed in oenocytes, a cell population involved in lipid processing. 
furthermore, we discovered that oenocytes proliferate in primed mosquitoes. in vertebrates, monocyte trained immunity is mediated by epigenetic modifications ( , ) . here, we identify a histone acetyltransferase (hat) that is an essential mediator of oenocyte proliferation and mosquito immune priming. mrna expression of evokin, an essential component of hdf ( ) , was significantly induced in the hemocyte-like immune-responsive a. gambiae sua . cell line in response to bacterial challenge (fig. a ) (p= . , mann-whitney test). additionally, antibodies to recombinant evokin recognized a single band of the expected size ( kda) in the fraction containing membrane and insoluble proteins, but not in the cytoplasm (fig. b) . evokin mrna expression also increased significantly in hemocytes (p= . , unpaired t-test) and body wall (p= . , unpaired t-test) of primed mosquitoes days post-infection ( fig c) . in naïve bloodfed females, evokin is highly expressed in sessile hemocytes associated with the body wall surface ( fig d) . it is also present in oenocytes, with a dotted vesicle-like pattern (fig e) , while in fat body trophocytes a string pattern is observed, suggestive of evokin clustering in specific membrane regions that may represent lipid rafts ( fig f) . a similar expression pattern was observed in plasmodium challenged females (fig. s ). given the critical role of hpx and hpx in pge (table s ). both treatments resulted in a prolonged and significant increase in dblox expression in the body wall (p= . and p= . , respectively, mann-whitney test) while hpx was only induced following pge injection (p= . , mann-whitney test) (fig. b ). hpx silencing did not affect priming, as the characteristic increase in granulocytes was observed in response to plasmodium infection (p= . , mann whitney test) (fig. c ). in contrast, dblox silencing completely abolished the priming response to infection (fig. d ) and to pge injection (fig. s ) . 
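several of the comparisons above are reported with the mann-whitney test. as a minimal sketch of what that test ranks, the u statistic can be computed in plain python as follows. the function name and the example values are hypothetical and are not the authors' analysis code or data; the sketch returns only the two u statistics and does not compute a p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples, using midranks for ties."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")

    # Pool the observations, remembering which group each one came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])

    # Assign 1-based ranks; tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    # U for group a is its rank sum minus the smallest rank sum it could have.
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


if __name__ == "__main__":
    # Hypothetical relative expression values for control and treated groups.
    control = [1.0, 1.2, 0.9, 1.1]
    treated = [1.8, 2.1, 1.5, 2.4]
    print(mann_whitney_u(control, treated))
```

a u value of zero for one group means every observation in that group ranks below every observation in the other, which is the strongest separation the test can register for a given sample size.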
furthermore, hemolymph of mosquitoes in which dblox was silenced no longer had hdf activity when transferred to naïve mosquitoes (fig. e ). proliferation is long-lasting, as the increased number of clusters and cells per segment were still present days post-feeding (fig. s ) . given the essential role of dblox in hdf synthesis and immune priming, the high expression of dblox in oenocytes, and their sustained proliferation in response to priming, we infer that the documented enhanced ability of primed mosquitoes to synthesize lxa from arachidonic acid ( ) is achieved by increasing the number of oenocytes, the cells in the mosquito that are a major site of lxa synthesis. dblox and evokin mrna levels remain persistently high in mosquitoes primed through p. berghei challenge or by pge injection. we explore whether the establishment and persistence of immune priming is mediated by histone acetyltransferases (hat), enzymes known to catalyze epigenetic chromatin modifications associated with long-lasting changes in transcriptional regulation ( , ) . we evaluated the effect of silencing each of the ten hats present in the a. gambiae genome (aghats) on immune priming following p. berghei infection (fig. a) . silencing agap resulted in high mortality after blood-feeding and, thus, it could not be further evaluated. of the other nine aghats, only silencing of the homolog of drosophila tip (agap -agtip ) abolished immune priming (fig. a) . we confirmed that dslacz injection had no effect on priming, while the proportion of granulocytes no longer increased when agtip was silenced (fig. b) . furthermore, hemolymph of primed agtip -silenced females no longer had hdf activity when transferred to naïve mosquitoes ( fig. c ). 
in agreement with our proposed model, the loss of hdf activity following agtip silencing was also associated with a lack of proliferation of dblox-expressing oenocytes, as the number of clusters and cells per segment no longer increased following a challenge with p. berghei (fig. d-f) . dblox is a unique enzyme with two duplicated heme peroxidase domains that is present in insects, but not in vertebrates. the first domain has the predicted substrate binding sites but lacks the functional residues present in catalytically active enzymes and has two integrin-binding motifs, typical of peroxinectins ( ) . in contrast, the second domain has all the features of a functional heme peroxidase and one integrin-binding motif ( ) . dblox is essential for lxa synthesis, but the biochemical mechanism of lipoxygenase-independent lipoxin synthesis in insects, and the potential involvement of other enzymes besides dblox remain to be explored. we speculate that the systemic burst of pge when ookinete invasion allows direct contact between the microbiota and gut epithelial cells, or when mosquitoes are injected with pge , triggers modification mediated by agtip that establish and maintain proliferation of oenocytes. these cells express high levels of dblox and their proliferation is essential for hdf synthesis and to maintain the priming response. the function of oenocytes is poorly understood, but they are known to be involved in biosynthesis of cuticular hydrocarbon and pheromones ( ) . taken together, our findings suggest that oenocytes are a major site of lipoxin synthesis. immune training of human monocytes involves transcriptional and metabolic reprogramming mediated by epigenetic modifications that allow challenged monocytes to maintain a prolonged state of enhanced immune function ( , ) . we propose that oenocytes function as "memory keeper" cells that persistently release lxa , which constantly re-trains mosquito hemocytes. 
recently, trained immunity in humans induced by vaccination with the anti-tuberculosis bacillus calmette-guérin (bcg) vaccine has received much attention in light of the covid pandemic ( ) . there is strong epidemiological evidence that bcg vaccination in infants has a broad protective effect and reduces the overall mortality from respiratory infections not related to tuberculosis ( ) ( ) ( ) , and there are ongoing clinical studies to establish if the bcg vaccine also protects from severe covid ( , ) . furthermore, there is growing epidemiological evidence of reduced covid mortality ( ) and improved outcomes ( ) in countries with strong bcg vaccination programs. however, little is known about the mechanism to establish and maintain trained immunity. in a. gambiae, prostaglandins are key to establish the priming response, and lipoxins maintain a broad general state of enhanced immune surveillance that is not pathogen-specific. this begs the question of whether eicosanoids may also be important mediators of trained immunity in humans. 
innate immune memory in invertebrate metazoans: a critical appraisal immune memory in invertebrates memory and specificity in the insect immune system: current perspectives and future challenges plant innate immunity: an updated insight into defense mechanism hemocyte differentiation mediates innate immune memory in anopheles gambiae mosquitoes therapeutic targeting of trained immunity a mosquito lipoxin/lipocalin complex mediates innate immune priming in anopheles gambiae activation of mosquito complement antiplasmodial response requires cellular immunity synopsis of arachidonic acid metabolism: a review eicosanoid-mediated immunity in insects prostaglandin e modulates the expression of antimicrobial peptides in the fat body and midgut of anopheles albimanus epigenetic programming of monocyte-to-macrophage differentiation and trained innate immunity trained immunity: a program of innate immune memory in health and disease histone acetylation: a switch between repressive and permissive chromatin transcription: gene control by targeted histone acetylation identification, characterization and expression analysis of anopheles stephensi double peroxidase the development and functions of oenocytes trained immunity: a smart way to enhance innate immune defence bcg-induced trained immunity: can it offer protection against covid- ? acute lower respiratory tract infections and respiratory syncytial virus in infants in guinea-bissau: a beneficial effect of bcg vaccination for girls community based case-control study nonspecific (heterologous) protection of neonatal bcg vaccination against hospitalization due to respiratory infection and sepsis bcg scar and positive tuberculin reaction associated with reduced child mortality in west africa. a non-specific beneficial effect of bcg? 
nlm ( ) reducing health care workers absenteeism in covid- pandemic through bcg vaccine (bcg-corona) nlm vaccine trials ( ) bcg vaccination to protect healthcare workers against covid- (brace) bcg vaccine protection from severe coronavirus disease (covid- ) significantly improved covid- outcomes in countries with higher bcg vaccination coverage: a multivariable analysis the authors gratefully acknowledge asher kantor and mark johnson for editorial assistance. animal studies were done according to the nih animal study protocol (asp) approved by the nih animal care and user committee (acuc), with approval id asp-lmvr . public health service animal welfare assurance #a - guidelines were followed according to the national institutes of health animal (nih) office of animal care and use (oacu). key: cord- -t lcd authors: cai, guoshuai; xiao, feifei title: scanner: a web resource for annotation, visualization and sharing of single cell rna-seq data date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t lcd motivation in recent years, efficient scrna-seq methods have been developed, enabling the transcriptome profiling of single cells massively in parallel. meanwhile, its high dimensionality brought challenges in data modeling, analysis, visualization and interpretation. available analysis tools require extensive knowledge and training of data properties, statistical modeling and computational skills. it is challenging for biologists to efficiently view, browse and interpret the data. results here we developed scanner, as a public webserver resource to equip the biologists and bioinformatician to share and analyze scrna-seq data in a comprehensive and collaborative manner. it is effort-less and host-free without requirement on software setup or coding skills, and enables a user-friendly way to compare the activation status of gene sets on single cell basis. 
also, it is equipped with multiple data interfaces for easy data sharing and currently provides a database for studying the smoking effect on single cell gene expression in lung. using scanner, we have identified larger proportions of cancer-associated fibroblasts and higher activeness of fibroblast growth related genes in melanoma tissues in females compared to males. moreover, we found ace is mainly expressed in pneumocytes, secretory cells and ciliated cells with disparity in gene expression by smoking behavior. availability and implementation scanner is available at https://www.thecailab.com/scanner/. supplementary information supplementary data are available online. contact gcai@mailbox.sc.edu or xiaof@mailbox.sc.ecu key points scanner provides a new web server resource for promoting scrna-seq data analysis. scanner enables comprehensive and dynamic analysis and visualization, novel functional annotation and activeness inference, online databases and easy data sharing. scanner bridges the data analysis and the biological experiment units. in recent years, single cell rna sequencing (scrna-seq) methods have enabled efficient transcriptome profiling of individual cells (kolodziejczyk, et al., ). they provide opportunities to identify cell clusters and their specific biomarkers, and insights into cell developmental trajectory, gene bursting activities, cell interactions and others (hwang, et al., ). however, the high dimensionality of scrna-seq data also brings challenges in its analysis. a widely used strategy is to reduce the high-dimensional data to a low-dimensional space of two or three dimensions, in which data analysis, visualization and interpretation can then be performed. for fast browsing of scrna-seq data, applications such as scv (wang, et al., ) and cerebro (hillje, et al., ) have been developed. 
however, they require local installation and implementation, as well as a multi-step pre-processing procedure including raw data processing, dimension reduction and data input standardization. this makes it hard for biologists to explore the data easily and to communicate efficiently with bioinformaticians, whose knowledge of data modeling and computational skills are much needed. moreover, due to the rapid development of new technologies, the volume and complexity of data are increasing fast and call for an effective gateway for data sharing, storage, annotation and visualization. the broad institute single cell portal (https://portals.broadinstitute.org/single_cell) and scrnaseqdb (cao, et al., ) are available databases for scrna-seq studies. however, their implemented functions are static and limited for data visualization and analysis. a new web server resource with comprehensive and dynamic analysis and visualization functions is needed to bridge the data analysis and the biological experiment units. in this study, we developed the single cell transcriptomics annotated viewer (scanner), as a public web resource for scrna-seq data management, analysis and interpretation in a comprehensive, flexible and collaborative manner. scanner is based on the framework of scv, which is a useful local r shiny application for scrna-seq data visualization. scanner enables a unique set of functions highlighted in figure . ( ) scanner provides novel functional annotation and activeness inference. gene set analysis can provide valuable insights into the biological mechanisms underlying a phenotype of interest. to enable this, scanner infers the activation status of a particular gene set involved in a pathway by four scores (supplementary methods): the average expression abundance of the involved genes, the average rank of the involved genes, the expression of an eigengene (langfelder and horvath, ), and enrichment scores (barbie, et al., ). 
( ) scanner provides an online database for scrna-seq data. currently, scanner hosts a database of smoking lungs for studying the smoking effect on a single cell basis. studies have been reviewed and three publicly available datasets were included: the dataset from bronchial epithelial cells, alcam+ epithelial cells and cd + white blood cells from six never and six current smokers (duclos, et al., ), the dataset from lung tissue of one former smoker, two current smokers and five non-smokers (reyfman, et al., ), and the dataset from lung tissue of one former smoker, one current smoker and three non-smokers (madissoon, et al., ). we will continue to collect datasets and update scanner to provide a comprehensive resource of single cell transcriptomics for public use. ( ) scanner provides multiple data interfaces which enable easy data sharing. three options are available: (i) a scanner data object that users can generate following the instructions (availability and implementation); (ii) access credentials, for which users can contact the authors to obtain password-controlled access; and (iii) the database, which provides the most efficient method for data sharing and exploring. here, we demonstrate the application of scanner with two case studies: (a) sex disparity in melanoma-associated fibroblasts. we analyzed the melanoma dataset (tirosh, et al., ) and found that tumors from female patients had significantly larger proportions of cancer-associated fibroblasts (caf) and endothelial cells than those from males (fig. s a). correspondingly, in caf and endothelial cells of the female patients, the fibroblast growth factor (fgf) binding function inferred by enrichment score was highly activated (fig. s b, c) with higher detection rates and activeness scores (fig. s d), which was confirmed by the average expression, average rank and eigengene expression methods (fig. s ). 
consistently, the fgf genes fgf and fgf were highly expressed in caf cells in the female melanoma tissue (fig. s ). such over-expression was also found in most of the genes involved in the fgf binding function (fig. s , ). given that caf is a promising target to treat melanoma (zhou, et al., ), this gender difference in caf may help to explain why male patients usually have worse survival outcomes compared to females (joosse, et al., ). (b) tobacco-use disparity in ace lung expression. exploring our database of smoking lung, we found that the gene of the sars-cov- receptor, ace , is mainly expressed in pneumocytes, secretory cells and ciliated cells (fig. s ), which is consistent with the recent study of ziegler et al. (ziegler, et al., ). also, among bronchial epithelial cells, we found that the ace gene is mainly expressed in club cells in never smokers. in smokers, by contrast, goblet cells are extensively proliferated and harbor most of the expressed ace , which may indicate a complex effect of smoking on the covid- risk (cai, et al., ). 
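the gene set activity scores named above (average expression, average rank and eigengene expression) can be illustrated with a minimal python sketch. this is not scanner's implementation, which is described only in its supplementary methods; the function name `gene_set_scores`, the cells-by-genes matrix layout, and the choice to define the eigengene as the first principal component of the standardized gene-set expression (following the general wgcna idea) are assumptions made for illustration. the enrichment score of barbie et al. is omitted.

```python
import numpy as np

def gene_set_scores(expr, gene_idx):
    """Hypothetical helper: per-cell activity scores for one gene set.

    expr     : 2-D array, rows = cells, columns = genes (assumed layout)
    gene_idx : column indices of the genes in the set
    Returns (mean_expr, mean_rank, eigengene), one value per cell each.
    """
    expr = np.asarray(expr, dtype=float)
    sub = expr[:, gene_idx]

    # Score 1: average expression abundance of the set's genes in each cell.
    mean_expr = sub.mean(axis=1)

    # Score 2: average within-cell rank of the set's genes
    # (1 = lowest expressed gene in that cell; ties broken arbitrarily).
    ranks = expr.argsort(axis=1).argsort(axis=1) + 1
    mean_rank = ranks[:, gene_idx].mean(axis=1)

    # Score 3: eigengene, taken here as the first principal component of
    # the standardized gene-set expression (an assumption of this sketch).
    centered = sub - sub.mean(axis=0)
    sd = centered.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant genes
    z = centered / sd
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # Orient the sign so that the eigengene follows the average expression.
    if np.dot(eigengene, mean_expr - mean_expr.mean()) < 0:
        eigengene = -eigengene

    return mean_expr, mean_rank, eigengene


if __name__ == "__main__":
    toy = np.array([[1.0, 0.0, 5.0],
                    [2.0, 0.0, 3.0],
                    [4.0, 1.0, 0.0]])
    for name, score in zip(("mean", "rank", "eigengene"),
                           gene_set_scores(toy, [0, 2])):
        print(name, score)
```

comparing a score between groups of cells (for example, caf cells from female versus male patients) would then be a matter of splitting the returned per-cell vector by the group label.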
systematic rna interference reveals that oncogenic kras-driven cancers require tbk tobacco smoking increases the lung gene expression of ace , the receptor of sars-cov- a database for rna-seq based gene expression profiles in human single cells characterizing smoking-induced transcriptional heterogeneity in the human bronchial epithelium at single-cell resolution interactive visualization of scrna-seq data single-cell rna sequencing technologies and bioinformatics pipelines sex is an independent prognostic indicator for survival and relapse/progression-free survival in metastasized stage iii to iv melanoma: a pooled analysis of five european organisation for research and treatment of cancer randomized controlled trials the technology and biology of single-cell rna sequencing wgcna: an r package for weighted correlation network analysis scrna-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis dissecting the multicellular ecosystem of metastatic melanoma by single-cell rna-seq single cell viewer (scv): an interactive visualization data portal for single cell rna sequence data perspective of targeting cancer-associated fibroblasts in melanoma sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues we acknowledge ben torkian and jun zhou from research computing program of university of south carolina for the assistance on gateway application, allocation and implementation. this study was supported by the nsf xsede startup allocation award (mcb ). figure . scanner application scenario. scanner facilitates the sharing, visualization, analysis and interpretation of scrna-seq data and the communication between biologists and bioinformatician in a flexible, multi-functional and user-friendly manner. 
its group side-by-side view is useful to explore differences between groups, of ) cell clusters, ) single cell gene expression or pathway activity, ) expression/activity distribution, ) expression/activity detection and ) expression pattern. key: cord- -jlr vb u authors: baumeister, sebastian e; karch, andré; bahls, martin; teumer, alexander; leitzmann, michael f; baurecht, hansjörg title: physical activity and risk of alzheimer’s disease: a two-sample mendelian randomization study date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: jlr vb u introduction evidence from observational studies for the effect of physical activity on the risk of alzheimer’s disease (ad) is inconclusive. we performed mendelian randomization analysis to examine whether physical activity is a protective factor for ad. methods summary data of genome-wide association studies on physical activity and ad were identified using pubmed and the gwas catalog. the study population included , ad cases and , cognitively normal controls. eight single nucleotide polymorphisms (snp) known at p < × − to be associated with accelerometer-assessed physical activity served as instrumental variables. results genetically predicted accelerometer-assessed physical activity had no effect on the risk of ad (inverse variance weighted odds ratio [or] per standard deviation (sd) increment: . , % confidence interval: . - . , p= . ). discussion the present study does not support a relationship between physical activity and risk of ad, and suggests that previous observational studies might have been biased. alzheimer's disease (ad) is the main cause of dementia and one of the great health-care challenges of the st century [ ] . 
research since the discoveries of amyloid-β and tau, the main components of plaques and tangles, has provided considerable knowledge about molecular pathways of ad development; however, this knowledge has not yet been translated into the implementation of effective prevention measures for modifiable risk factors of ad [ ]. considerable research has focused on the potentially protective role of physical activity for ad. several meta-analyses of observational studies suggested a protective effect of physical activity for cognitive decline and risk of dementia and ad [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]. although intervention studies have shown that exercise improves cognitive performance, they have not revealed changes in the risk of dementia or ad [ , ]. for example, the large multidomain lifestyle finger (finnish geriatric intervention study to prevent cognitive impairment and disability) trial comprised an exercise program and demonstrated beneficial effects on cognition after two years [ ]. more recently, long-term observational studies have suggested that the inverse association of physical activity and dementia might be subject to reverse causation due to a decline in physical activity during the preclinical phase of dementia [ , ]. mendelian randomization (mr) is a method that uses genetic variants as instrumental variables to uncover causal relationships in the presence of observational study bias such as unobserved confounding and reverse causation [ ]. in the current study, we performed two-sample mr analyses to provide further evidence for the association between accelerometer-assessed physical activity and ad. 
the mr study design had three components: ( ) identification of genetic variants to serve as instrumental variables for accelerometer-assessed physical activity; ( ) the acquisition of summary data for the genetic instruments from genome-wide association studies on accelerometer-assessed physical activity; ( ) acquisition of instrumenting snp-outcome summary data for the effect of genetic instruments from genome-wide association studies on the risk of ad. we selected eight snps associated with accelerometer-based physical activity (mean acceleration in milli-gravities) at a genome-wide significance level (p < x - ), using a plink clumping algorithm (r² threshold = . and window size = mb), from a genome-wide study of , uk biobank participants [ ] (supplementary table ). summary data for the association of snps for accelerometer-based physical activity with ad were obtained from a gwas of , clinically-confirmed ad cases and , cognitively normal controls [ ]. that gwas for ad did not include the data from the uk biobank. the a priori statistical power was calculated using an online tool at http://cnsgenomics.com/shiny/mrnd/ [ ]. we assumed that the eight accelerometer-based physical activity snps explained . % of the phenotypic variability [ , , ]. given a type error of %, we had sufficient statistical power (> %) for an expected odds ratio (or) per standard deviation of ≤ . between ad and genetically instrumented accelerometer-based physical activity. cochran's q was computed to quantify heterogeneity across the individual effect estimates of the selected snps, with p≤ . indicating the presence of pleiotropy (supplemental table ). consequently, a random effects inverse-variance weighted (ivw) mr analysis was used as the principal analysis [ ]. other mr methods addressing violations of specific instrumental variable analysis assumptions included: weighted median, mr-egger and mr-pleiotropy residual sum and outlier (mr-presso) [ , ]. 
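the inverse-variance weighted estimate and cochran's q described above can be sketched from summary statistics as follows. this is a minimal illustration and not the authors' analysis, which used the twosamplemr and mrpresso packages in r. it assumes per-snp wald ratios with first-order standard errors and a multiplicative random-effects correction (the fixed-effect standard error is inflated by the square root of q divided by its degrees of freedom when that ratio exceeds one); the function name `ivw_estimate` and the example numbers are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Hypothetical helper: IVW Mendelian randomization from summary data.

    beta_exposure : SNP effects on the exposure (e.g. physical activity)
    beta_outcome  : SNP effects on the outcome (log odds of disease)
    se_outcome    : standard errors of beta_outcome
    """
    # Per-SNP Wald ratio and its first-order standard error.
    ratios = [bo / be for be, bo in zip(beta_exposure, beta_outcome)]
    ratio_se = [so / abs(be) for be, so in zip(beta_exposure, se_outcome)]
    weights = [1.0 / s ** 2 for s in ratio_se]

    # Inverse-variance weighted mean of the ratios.
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se_fixed = math.sqrt(1.0 / total_w)

    # Cochran's Q: weighted squared deviations of the per-SNP ratios
    # from the pooled estimate; large values indicate heterogeneity.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1

    # Multiplicative random effects (assumption of this sketch):
    # never shrink the standard error below the fixed-effect value.
    phi = max(1.0, q / df) if df > 0 else 1.0
    se_random = se_fixed * math.sqrt(phi)

    return {
        "log_or": estimate,
        "odds_ratio": math.exp(estimate),
        "se_fixed": se_fixed,
        "se_random": se_random,
        "cochran_q": q,
        "df": df,
    }


if __name__ == "__main__":
    # Made-up summary statistics for three SNPs, for illustration only.
    result = ivw_estimate([0.10, 0.20, 0.15],
                          [0.02, 0.03, 0.04],
                          [0.01, 0.01, 0.02])
    print(result)
```

an approximate confidence interval for the odds ratio can be obtained by exponentiating the log odds ratio plus or minus a normal quantile times `se_random`.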
the results were presented as ors and % confidence intervals (cis) per -sd increment in accelerometer-based physical activity. we tested for potential directional pleiotropy by testing the intercepts of mr-egger models [ ]. finally, we looked up each instrument snp and its proxies (r² > . ) in phenoscanner [ ] and the gwas catalog [ ] to assess any previous associations (p< x - ) with potential confounders. we performed leave-one-out analyses and exclusion of potentially pleiotropic snps to rule out possible pleiotropic effects. analyses were performed using the twosamplemr (version . . ) [ ] and mrpresso (version . ) packages in r (version . . ). reporting follows the strobe-mr statement [ ]. we found that genetically predicted accelerometer-based physical activity was not associated with ad (ivw or per -sd increment: . , % ci: . - . , p= . ; table ). this finding was confirmed using alternative mr methods and leave-one-snp-out analysis (table , supplementary table ). the f-statistics for the strength of the genetic instruments were all ≥ and ranged from to (supplementary table ). the intercept test from the mr-egger regression was not statistically significant (supplementary table ). in phenoscanner and the gwas catalog, we did not find an indication of possible pleiotropy of any of the eight snps for accelerometer-based physical activity. this mr study with gwas data on accelerometer-based physical activity from , individuals and gwas data from , ad cases found little evidence for an effect of physical activity on the risk of developing ad. previous observational studies concluded that higher levels of physical activity are associated with reduced risk of dementia and ad [ , , ]. the most comprehensive meta-analysis comprising cohort studies found a % (relative risk: . ; % ci: . - . ) relative reduction in risk of ad when comparing the highest and lowest levels of physical activity [ ]. 
these conclusions are in contrast to meta-analyses of intervention studies, which do not show a protective effect of exercise interventions on the risk of ad [ , ]. similarly, recent observational studies have found that when physical activity assessment and diagnosis of ad are ≥ years apart there was no association between physical activity and risk of dementia and ad [ , ]. a pooled analysis [ ] of studies including , ad cases found a hazard ratio of . ( ci: . ; . ) when comparing physically active and inactive individuals and restricting follow-up time to ≥ years. furthermore, the latter studies also have indicated that a decline in physical activity levels occurs during the subclinical or prodromal phase of dementia. previous observational studies might therefore have overestimated the dementia risk associated with insufficient levels of physical activity, as many were based on short follow-up times and thus may have been subject to reverse causation caused by a decline in physical activity prior to the diagnosis of dementia [ ]. we conducted an mr analysis, which is less susceptible to reverse causation, to further shed light on the association between physical activity and ad. the findings of the present study do not suggest a causal effect of physical activity on ad. our study had several strengths. the use of two-sample mr enabled us to use the largest gwas on ad to date. our mr study also incorporated the largest gwas on physical activity to increase the precision of snp-physical activity estimates, to reduce the potential for weak instrument bias and to increase statistical power. we used genetically predicted objectively measured physical activity, using accelerometry, which is less prone to recall and response bias than measurement of self-reported physical activity [ ]. 
furthermore, because some genetic loci for self-reported physical activity are also related to cognitive function, self-reported physical activity measures may be prone to information bias, and snps instrumenting self-reported physical activity might have induced horizontal pleiotropy [ , ]. in contrast, snp associations based on accelerometer-assessed physical activity are unrelated to cognitive performance or other potential pathways to ad, which essentially rules out any impact that cognitive biases or pleiotropy could have had on our results [ , ]. however, our study also had certain limitations. first, the genetic instruments for accelerometer-assessed physical activity explained only a small fraction of the phenotypic variability. second, for the two-sample mr to provide unbiased estimates, the risk factor and outcome sample should come from the same underlying population. the discovery genome-wide association study of physical activity consisted of uk biobank participants of european descent, aged to years [ ]. the snp-ad associations were derived from cohort and case-control studies of men and women of european descent aged years and older [ ]. by using non-specific effects, our analyses assumed that the effects of snps on physical activity do not vary by age. however, this may not be an entirely tenable assumption given that the heritability of physical activity has been shown to decrease with age [ ]. thus, given the limited age range of the uk biobank and inclusion of european ancestry individuals only, our results may not be generalizable to other age groups or ancestral populations. therefore, replication of our findings in other age groups and non-european populations is warranted. given the increase in life expectancy, ad is increasingly a public health challenge and measures to prevent or delay the onset of dementia are urgently needed. 
however, in combination with previous literature [ , ] , the present study provides little evidence that recommending physical activity would help to prevent ad. all authors disclose no conflict. funding/support: the authors did not receive funding for this study. funding information of the genome-wide association studies is specified in the cited studies. data availability: data supporting the findings of this study are available within the paper and its supplementary information files. table mendelian randomization estimates between accelerometer-based physical activity and alzheimer's disease global, regional, and national burden of alzheimer's disease and other dementias, - : a systematic analysis for the global burden of disease study defeating alzheimer's disease and other dementias: a priority for european science and society does physical activity prevent cognitive decline and dementia?: a systematic review and meta-analysis of longitudinal studies physical activity can improve cognition in patients with alzheimer's disease: a systematic review and meta-analysis of randomized controlled trials physical activity and risk of neurodegenerative disease: a systematic review of prospective evidence can diet and physical activity limit alzheimer's disease risk? physical activity and alzheimer disease: a protective association physical activity and risk of cognitive decline: a meta-analysis of prospective studies physical activity and alzheimer's disease: a systematic review. 
the journals of gerontology series a, biological sciences and medical sciences leisure time physical activity and dementia risk: a dose-response meta-analysis of prospective studies physical activity interventions in preventing cognitive decline and alzheimer-type dementia: a systematic review multidomain lifestyle intervention benefits a large elderly population at risk for cognitive decline and dementia regardless of baseline characteristics: the finger trial physical inactivity, cardiometabolic disease, and risk of dementia: an individual-participant metaanalysis physical activity, cognitive decline, and risk of dementia: year follow-up of whitehall ii cohort study inferring causal relationships between risk factors and outcomes from genome-wide association study data. annual review of genomics and human genetics genome-wide association study of habitual physical activity in over , uk biobank participants identifies multiple variants including cadm and apoe genetic metaanalysis of diagnosed alzheimer's disease identifies new risk loci and implicates abeta, tau, immunity and lipid processing calculating statistical power in mendelian randomization studies assessment of bidirectional relationships between physical activity and depression among adults: a -sample mendelian randomization study physical activity and risks of breast and colorectal cancer: a mendelian randomization analysis invited commentary: detecting individual and global horizontal pleiotropy in mendelian randomization-a job for the humble heterogeneity statistic? 
evaluating the potential role of pleiotropy in mendelian randomization studies the mr-base platform supports systematic causal inference across the human phenome phenoscanner v : an expanded tool for searching human genotype-phenotype associations the nhgri-ebi gwas catalog of published genome-wide association studies, targeted arrays and summary statistics guidelines for strengthening the reporting of mendelian randomization studies a systematic literature review of reviews on techniques for physical activity measurement in adults: a dedipac study. the international journal of behavioral nutrition and physical activity information bias in measures of self-reported physical activity variance components models for physical activity with age as modifier: a comparative twin study in seven countries key: cord- -yci kq x authors: liu, haiming; luo, jiaohua; guillory, bobby; chen, ji-an; zang, pu; yoeli, jordan k.; hernandez, yamileth; lee, ian (in-gi); anderson, barbara; storie, mackenzie; tewnion, alison; garcia, jose m. title: ghsr- a is not required for ghrelin’s anti-inflammatory and fat-sparing effects in cancer cachexia date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: yci kq x adipose tissue (at) atrophy is a hallmark of cancer cachexia contributing to increased morbidity/mortality. ghrelin has been proposed as a treatment for cancer cachexia partly by preventing at atrophy. however, the mechanisms mediating ghrelin’s effects are incompletely understood, including the extent to which its only known receptor, ghsr- a, is required for these effects. this study characterizes the pathways involved in at atrophy in the lewis lung carcinoma (llc)-induced cachexia model and those mediating the effects of ghrelin in ghsr+/+ and ghsr−/− mice. we show that llc causes at atrophy by inducing anorexia, and increasing at inflammation, thermogenesis and energy expenditure. these changes were greater in ghsr−/−. 
ghrelin administration prevented llc-induced anorexia only in ghsr+/+, but prevented wat inflammation and atrophy in both genotypes, although its effects were greater in ghsr+/+. llc-induced increases in bat inflammation, wat and bat thermogenesis, and energy expenditure were not affected by ghrelin. in conclusion, ghrelin ameliorates wat inflammation, fat atrophy and anorexia in llc-induced cachexia. ghsr- a is required for ghrelin’s orexigenic effect but not for its anti-inflammatory or fat-sparing effects. every year, over , , individuals in the us are diagnosed with cancer. cachexia (involuntary loss of muscle and adipose tissue) is present in up to % of cancer patients, is strongly associated with higher morbidity and mortality, and is reported as the direct cause of death in - % of these patients (dewys, begg et al., , fearon, strasser et al., . adipose tissue, once considered only a high-energy fuel reserve, has emerged recently as an active metabolic organ modulating inflammation, energy expenditure and food intake in non-cancer settings (you & nicklas, ) . accelerated loss of adipose tissue plays an important role in cancer cachexia contributing significantly to the increased morbidity and mortality seen in this setting (fouladiun, korner et al., ) . increased inflammation is common in the setting of cancer (garcia, garcia-touza et al., ) and is associated with adipose tissue wasting in human studies (lerner, hayes et al., ) . white adipose tissue (wat) is a significant source of inflammatory cytokines accounting for more than % of circulating interleukin (il)- (michaud, boulet et al., ) and this and other inflammatory cytokines have been linked to wat atrophy in the setting of cancer (petruzzelli, schweiger et al., , tsoli & robertson, , tsoli, swarbrick et al., . 
also, a phenotypic switch from wat to brown adipose tissue (bat) known as "browning" is thought to contribute to the overall increase in energy expenditure and wat atrophy seen in cancer cachexia (petruzzelli et al., ) . nevertheless, the mechanisms regulating adipose tissue atrophy and dysfunction in this setting are incompletely understood. ee levels in response to llc tumor implantation when compared to ghsr +/+ . animals co-administered ghrelin were not statistically different from vehicle-treated, tumor-bearing animals. tumor implantation also decreased spontaneous locomotor activity in both genotypes and ghrelin administration did not prevent these changes . the respiratory quotient (rq), was significantly decreased by tumor implantation and was not affected by genotype or ghrelin administration (fig g-i) . adipose tissue atrophy is a central component of the cancer anorexia and cachexia syndrome (cacs) leading to increased morbidity and mortality (das, eder et al., ) . recently, emerging roles for inflammation, wat browning and increased bat thermogenesis have been demonstrated in this setting (daas, rizeq et al., , dalal, , han, meng et al., , kir, white et al., kliewer, ke et al., , petruzzelli et al., , rohm, schafer et al., , wang, zhu et al., ) ; however, the pathways involved and their potential as therapeutic targets are not well-known. ghrelin and agonists of its only known receptor, show potential to ameliorate cacs at least in part by preventing fat atrophy, but the specific mechanisms mediating these effects have not been fully characterized. given that there are no treatments for cancer cachexia and that several clinical trials targeting this pathway have failed to meet their primary endpoints (garcia et al., , temel, abernethy et al., , there is a pressing need to improve our understanding of the mechanisms of action of ghrelin in this setting. 
in this study we show that ghrelin prevents llc tumor-induced weight loss, fat atrophy and wat inflammation without affecting tumor-induced bat inflammation, wat browning, and increased bat uncoupling and whole-body energy expenditure. we confirmed that its orexigenic effects are ghsr- a-dependent, and also show that other novel ghsr- a-independent mechanisms are involved given the partial improvements in fat atrophy and wat inflammation seen in ghrelin-treated, ghsr -/- animals. also, this is the first report of macrophages as the source of inflammatory cytokines in wat and bat in the setting of cacs. weight loss and survival rates are correlated with il- levels in cancer patients (garcia et al., , moses, maingay et al., , scott, mcmillan et al., ). these observations and several mechanistic studies support the premise that inflammation plays a central role in cacs. increases in il- β and tnf contribute to anorexia (baracos, martin et al., , braun, zhu et al., , khatib, gaidhane et al., ), and tnf and il- promote lipolysis and inhibit lipogenesis in wat leading to weight loss (fearon, glass et al., , han et al., , jeanson, carriere et al., , ruan, hacohen et al., ) . in non-cancer settings, one third of the circulating il- is produced by wat (mohamed-ali, goodrick et al., ) and most of this wat-derived il- comes from the stroma-vascular fraction composed of endothelial cells, monocytes/macrophages, myocytes, and fibroblasts (fain, madan et al., ) , although it can also be derived from adipocytes (fain, ) . macrophages in wat are known to be the source of proinflammatory cytokines in conditions leading to at hypertrophy including obesity (di gregorio, yao-borengasser et al., , divoux, tordjman et al., , lumeng, deyoung et al., ) but this has not been previously shown in cacs. here we show that llc tumor implantation induces an increase in inflammatory cytokines in circulation as well as in bat and wat. 
moreover, these at cytokines appear to be derived exclusively from macrophages residing in these tissues. adipose tissue atrophy in cancer patients with cacs has been associated with an increase in subcutaneous at macrophages (batista, henriques et al., , de matos-neto, lima et al., , henriques, sertie et al., and tissue inflammation (batista, olivan et al., , de matos-neto et al., henriques et al., ). although, macrophage infiltration has also been described in wat from tumor-bearing rodents (henriques et al., , machado, costa rosa et al., ), to our knowledge this is the first report of macrophages as the source of pro-inflammatory cytokines in adipose tissue in cacs. these findings may explain why at remains an important source of pro-inflammatory cytokines even when the adipocyte mass is significantly reduced in this setting. also, this may be clinically relevant to cancer patients since knowing the source of inflammation may allow us to target these pathways more effectively (henriques, lopes et al., ). previously, we have shown that activation of ghsr- a by ghrelin or ghsr- a agonists (ghs) increases food intake and body weight ( , , ) . our group and others also have shown that ghrelin reduces fat oxidation and lipolysis and increases lipogenesis and adiposity in a rodent model of cisplatin-induced cachexia by a combination of food intake-dependent and independent mechanisms (chen et al., , garcia et al., b , porporato, filigheddu et al., . ghrelin is thought to have anti-inflammatory effects in other settings (deboer, zhu et al., , dixit, schaffer et al., , tsubouchi, yanagi et al., but this is not yet clear in cacs. some reports suggest an anti-inflammatory effect of native ghrelin administration, but this was not confirmed in other studies using ghsr- a agonists (chen et al., , garcia, friend et al., a . in the current study, we report that ghrelin modulates inflammation in a tissue-specific manner. 
ghrelin did not prevent tumor-induced increases in circulating inflammatory cytokines or in bat il- β or mcp- protein levels. however, it mitigated llc-induced inflammation in wat. this effect was seen in both genotypes although it was clearer in wild type animals partly because ghsr -/- mice appear to be resistant to tumor-induced inflammation. ghsr- a is not expressed in adipocytes (sun, garcia et al., ) but is present in macrophages (ma, lin et al., ) and our findings are consistent with a previous report showing that old, non-tumor-bearing ghsr -/- mice have reduced macrophage infiltration, a shift in macrophage differentiation towards a more anti-inflammatory phenotype, and decreased inflammation in adipose tissue (lin, lee et al., ) . however, a ghsr- a-independent effect of ghrelin on macrophages is also possible, as has been proposed in other settings (avallone, demers et al., , bulgarelli, tamiazzo et al., , lucchi, costa et al., ). taken together, our data are consistent with a wat-specific, anti-inflammatory effect of ghrelin that is partly ghsr- a dependent. this is clinically relevant as ghsr- a agonists are in clinical development for cacs and their effect on these ghsr- a independent pathways is not known (garcia et al., ) . also, the differences we report between serum, wat and bat levels underscore the limitations of relying exclusively on circulating cytokine levels when trying to determine the potential role of inflammation in other tissues. energy expenditure is an important mechanism in the regulation of body weight and is increased in cacs (garcia et al., a , kir, komaba et al., , rohm et al., ). factors contributing to ee include physical activity and resting ee (ree) (silver, dietrich et al., , vazeille, jouinot et al., ) and adipose tissue can lead to an increase in ree by uncoupling oxidative phosphorylation in mitochondria thereby releasing heat through activation of a proton leak (nicholls, , okamatsu-ogura, kitao et al., ) . 
in wat, browning has been noted in multiple cancer cachexia models with adipocytes showing an upregulation of the main regulator of thermogenesis, ucp- (dong, lin et al., , vaitkus & celi, ). in bat, increased thermogenesis has been reported in cachectic animals (kir et al., ) independently of decreased food intake or their ability to maintain their body temperature (tsoli, moore et al., ) . proinflammatory cytokines have been suggested as key drivers of wat browning (han et al., , petruzzelli et al., ) and of bat thermogenesis through activation of the sympathetic nervous system or targeting bat directly (arruda, milanski et al., , dascombe, rothwell et al., , li, klein et al., , tsoli et al., ). here we show that llc-tumor implantation led to an increase in total ee in spite of a significant decrease in physical activity, suggesting an increase in ree. this was associated with an increase in ucp- expression in wat (browning) and in bat. moreover, these effects were more marked in ghsr -/- mice, suggesting a protective role of ghsr- a in this setting. these results agree with previous reports in aged, non-tumor-bearing ghsr -/- mice showing higher levels of thermogenesis and energy expenditure when compared to age-matched, wild-type mice (lin, saha et al., ) . the effect of ghrelin or ghsr- a agonists on energy expenditure is unclear, with some studies showing a decrease in ee (borner, loi et al., , villars, pietra et al., ) while others showed no effect (adachi, takiguchi et al., , tschop, smiley et al., , vestergaard, djurhuus et al., ). in this study, we did not see a significant effect of ghrelin on preventing llc-induced fat browning, bat thermogenesis, increased ree or decreased physical activity in the setting of cacs despite the fact that ghrelin prevented fat and weight loss and anorexia. we hypothesize that differences in the models, route of administration and treatment regimen and agents used (llc mice vs. c mice or hepatoma model in rats, administration via s.q. vs. 
oral gavage vs. osmotic mini pump, ghrelin vs. ghsr- a agonists) could account for these discrepancies. more studies will be needed to test this hypothesis. calderon-dominguez, mir et al., ) . in cacs the aforementioned tumor-induced inflammation is thought to play an important role in bat thermogenesis (petruzzelli et al., , tsoli et al., ); however, the source of inflammation in bat is not known. similar to wat, we found that bat il- and tnf come exclusively from macrophages in the setting of cachexia. however, their expression in bat was lower than in wat and no significant changes were found in response to tumor implantation or ghrelin. we found a significant tumor-effect on increasing il- β levels in bat although ghrelin did not prevent this increase, suggesting tissue-specific differences in inflammation between bat and wat in response to tumor and ghrelin. taken together, these results are important because they show that tumor-induced wat browning and bat thermogenesis are associated with significant increases in ree and appear to be independent of inflammation given that downregulating inflammation does not prevent uncoupling in wat and that bat il- and tnf levels were not upregulated upon tumor implantation. in addition, our data suggest that wat is a significant source of inflammatory cytokines, showing the highest levels of il- β, il- , and tnf when compared to bat and circulating levels. there were limitations to our approach. this study was not set up to establish the safety of ghrelin administration in the setting of cancer. nevertheless, none of the studies published to date using ghrelin or ghsr- a agonists in mice or humans have shown an increase in tumor progression (sever, white et al., ) . also, the experiments were not designed to characterize other mechanisms contributing to the protective role of ghsr- a in this setting. 
lastly, our data suggest that there is an alternative receptor for ghrelin although identification of this receptor remains elusive and is the focus of other studies. in summary, ghrelin prevents llc tumor-induced body weight and fat loss by a combination of ghsr- a-dependent mechanisms including preventing anorexia, and other mechanisms that are partly ghsr- a-independent. the increase in inflammation in at induced by tumor implantation is prevented by ghrelin only in wat; however, tumor-induced wat browning, and increased bat inflammation, uncoupling and whole body energy expenditure are not prevented by ghrelin even when the presence of ghsr- a appears to contribute to maintaining energy balance in this setting. tumor-induced wat browning and bat thermogenesis are associated with significant increases in ree and these seem to be independent of inflammation given that downregulating it does not prevent these changes. these results are clinically relevant because they show that ghrelin five to seven-month-old male c bl/ j growth hormone (gh) secretagogue receptor wild type (ghsr +/+ ) and knockout (ghsr -/-) congenic mice were used for all experiments. briefly the ghsr +/+ and ghsr -/mice were originally from dr. roy g. smith ph.d's laboratory (sun, butte et al., ) the procedures of tumor implantation (ti) and ghrelin intervention were described previously (chen et al., ) . in brief, mice were injected subcutaneously (s.q.) with lewis lung carcinoma (llc) cells ( × cells, crl , american type culture collection, manassas, va) into the right flank or with equal volume and number of heat-killed llc cells (hk). approximately days after tumor implantation (ti), when the tumor was palpable (~ cm in diameter), the tumor-bearing mice were treated with either acylated ghrelin (as- , anaspect, fremont, ca) at a dose of . mg/kg or vehicle ( . 
% sodium chloride, , covidien, dublin, ireland), s.q., twice daily, while mice in hk group received vehicle (saline, same volume), s.q., twice daily for two weeks. the comprehensive laboratory animal monitoring system (clams™, columbus instruments, columbus, oh) was used to identify metabolic parameters of the animals as we previously described (guillory, chen et al., ) . ghsr +/+ and ghsr -/mice were individually housed in clams cages for hours before ti as well as at the endpoint (see the supplemental fig. , timeline for the study). the first hours of clams was considered as the acclimation phase and the data for the next hours were analyzed. oxygen consumption (vo ) (ml/h), carbon dioxide production (vco ) (ml/h), and locomotor activity (infrared beam-break counts) were recorded automatically by the clams system every min. the respiratory exchange ratio (rq) and energy expenditure (ee, or heat generation) were calculated from vo and vco gas exchange data as follows: rq = vco /vo and ee = ( . + . × rq) × vo , respectively. energy expenditure was then normalized to lbm for statistical analysis using two-way analysis of variance (anova). alternatively, we also analyzed ee value by ancova with lbm as a covariate. locomotor activity was measured on x-and z-axes by the counts of beam-breaks during the recording period. the data shown in the results was summarized as the mean of every hours in a -hour-period. for iwat and bat samples, ug of the protein lysate was diluted with diluent and loaded onto each well. the plate was incubated at room temperature (rt) with shaking for h followed by times of wash in phosphate buffered saline with . % tween (pbs/t). sulfo-tag labeled detection antibody was then added to plates and incubated for . h. after another washes in pbs/t, read buffer t( x) was added and the plate was read on msd sector imager (msd). 
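the rq and ee calculation described above for the clams gas-exchange data can be sketched in python as follows. this is a minimal illustration, not the authors' analysis code. the numeric coefficients are missing from the formula in the text, so the values below are an assumption: they are the ones commonly quoted for this type of indirect calorimetry (kcal per litre of oxygen). the conversion of vo from ml/h to l/h, the class name and the function names are likewise assumptions made for the example.

```python
from dataclasses import dataclass

# ASSUMPTION: the digits of the coefficients are not present in the text.
# These are the values commonly quoted for indirect calorimetry of this kind
# (calorific value in kcal per litre of O2 = 3.815 + 1.232 * RQ).
CV_INTERCEPT = 3.815
CV_SLOPE = 1.232


@dataclass
class GasExchange:
    """One gas-exchange reading (hypothetical container for this sketch)."""
    vo2_ml_per_h: float   # oxygen consumption
    vco2_ml_per_h: float  # carbon dioxide production


def respiratory_quotient(sample: GasExchange) -> float:
    """RQ = VCO2 / VO2, as stated in the methods."""
    if sample.vo2_ml_per_h <= 0:
        raise ValueError("VO2 must be positive")
    return sample.vco2_ml_per_h / sample.vo2_ml_per_h


def energy_expenditure(sample: GasExchange,
                       intercept: float = CV_INTERCEPT,
                       slope: float = CV_SLOPE) -> float:
    """EE = (intercept + slope * RQ) * VO2, returned in kcal/h.

    ASSUMPTION: VO2 is converted from ml/h to l/h so that the calorific
    value (kcal per litre of O2) yields kcal/h.
    """
    rq = respiratory_quotient(sample)
    vo2_l_per_h = sample.vo2_ml_per_h / 1000.0
    return (intercept + slope * rq) * vo2_l_per_h


def ee_per_lbm(sample: GasExchange, lbm_g: float) -> float:
    """EE normalized to lean body mass (kcal/h per gram of LBM)."""
    if lbm_g <= 0:
        raise ValueError("lean body mass must be positive")
    return energy_expenditure(sample) / lbm_g
```

dividing ee by lbm corresponds to the ratio normalization used for the anova. the ancova with lbm as a covariate is a separate model and is not shown here.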
immunohistochemistry. the iwat and bat were mounted with oct (vwr - , vwr, radnor, pa) and flash frozen in liquid nitrogen-chilled isopentane immediately after tissue collection. the oct-mounted iwat and bat blocks were sliced at μm using a cryostat (leica cm s, nussloch, germany) at - °c. before staining, slides were dehydrated at rt for minutes followed by incubation in methanol for minutes at - °c. to identify the colocalization of f / and il- or tnfα in iwat and bat, slides were blocked with % donkey serum for hour at rt and then incubated with primary antibodies (f / monoclonal antibody : , mf , thermo fisher scientific; anti-il- antibody : , ab , abcam; tnf alpha monoclonal antibody, fitc, ebioscience™ : , thermo fisher scientific) signaling). the stained slides were dehydrated by %, %, % ethanol, and % xylene sequentially and mounted with coverslips by using permount (sp - , thermo fisher scientific). all stained slides were imaged with a nikon nie microscope at x (iwat) or x (bat). the positive cells (immunofluorescence) or positive area (dab stain) in the section were quantified and normalized to the total area of the section (mm ) using imagej analysis software (national institutes of health, http://rsb.info.nih.gov/ij/). two-way anova was performed to identify differences between genotypes (ghsr +/+ vs. ghsr -/-) across treatments (hk, tv, and tg) followed by fisher's lsd post hoc test. for inflammatory cytokines, a kruskal-wallis test was performed to identify the differences between groups. for energy expenditure, ancova with lbm as a covariate was used in addition to anova to identify differences between genotypes across treatments, followed by fisher's lsd post hoc test. values are presented as mean ± sem. 
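to make the kruskal-wallis comparison of cytokine levels concrete, the test statistic can be sketched in plain python as below. this is an illustrative sketch under stated assumptions, not the authors' procedure: the study reports using spss, and the function names and example groups here are hypothetical. the sketch computes the h statistic with the usual correction for tied values. under the null hypothesis, h is compared against a chi-square distribution with (number of groups - 1) degrees of freedom, a step omitted here.

```python
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with tie correction."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied value counts t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction
```

for example, `kruskal_wallis_h([hk, tv, tg])` would take three lists of cytokine concentrations, one per treatment group (hypothetical variable names).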
all statistical testing was performed using ibm spss version adachi s, takiguchi s, okada k, yamamoto k, yamasaki m, miyata h, nakajima k, fujiwara y, hosoda h, kangawa k, mori m, doki y ( ) effects of ghrelin administration after total gastrectomy: a prospective, randomized, placebo-controlled phase ii study. and thermogenic responses to il- beta in mice ghrelin-induced adiposity is independent of orexigenic effects a switch from white to brown fat increases energy expenditure in cancer-associated cachexia acylated and unacylated ghrelin impair skeletal muscle atrophy in mice coactivator of nuclear receptors linked to adaptive thermogenesis diet-induced obesity causes insulin resistance in mouse brown adipose tissue an amp-activated protein kinase-stabilizing peptide ameliorates adipose tissue wasting in cancer cachexia in mice energy metabolism in cachexia what we talk about when we talk about fat tumor necrosis factor-alpha suppresses adipocyte-specific genes and activates expression of preadipocyte genes in t -l adipocytes: nuclear factor-kappab activation by tnf-alpha is obligatory the relationship between weight loss and interleukin in non-small-cell lung cancer is there an effect of ghrelin/ghrelin analogs on cancer? 
a systematic review changes in body mass, energy balance, physical function, and inflammatory state in patients with locally advanced head and neck cancer treated with concurrent chemoradiation after low-dose induction chemotherapy peptidomimetic regulation of growth hormone secretion characterization of adult ghrelin and ghrelin receptor knockout mice under positive and negative energy balance ghrelin and growth hormone secretagogue receptor expression in mice during aging anamorelin in patients with non-small-cell lung cancer and cachexia (romana and romana ): results from two randomised, double-blind, phase trials ghrelin induces adiposity in rodents a guide to analysis of mouse energy metabolism activation of thermogenesis in brown adipose tissue and dysregulated lipid metabolism associated with cancer cachexia in mice cancer cachexia: malignant inflammation, tumorkines, and metabolic mayhem lipolytic and thermogenic depletion of adipose tissue in cancer cachexia ghrelin relieves cancer cachexia associated with the development of lung adenocarcinoma in mice the role of adipose tissue in cancer-associated cachexia immune modulation of brown(ing) adipose tissue in obesity relation between hypermetabolism, cachexia, and survival in cancer patients: a prospective study in cancer patients before initiation of anticancer therapy acute effects of ghrelin administration on glucose and lipid metabolism receptor agonist hm attenuates cachexia in mice bearing colon- (c ) tumors multiple roles of adipose tissue in cancer formation and progression beige adipocytes are a distinct type of thermogenic fat cell in mouse and human chronic inflammation: role of adipose tissue and modulation by weight loss hk: heat-killed + vehicle; tv: tumor + vehicle; tg: tumor + ghrelin. 
changes in (a) body weight (carcass weight, n = - ) and (b) fat body mass by nmr expressed as % change from baseline average cumulative food intake (fi) normalized to baseline fi (g/g, black areas represent food intake in the nighttime, and the bottom areas in the bars represent food intake in the daytime, n = - ). * p < . compared to hk within the same genotype. # p < . compared to tv within the same genotype hk: heat-killed + vehicle; tv: tumor + vehicle; tg: tumor + ghrelin. protein levels of inflammatory markers (a)il- β, (b) il- , and (c) tnf; and (d) macrophage marker mcp- in iwat (pg/mg) compared to hk within the same genotype. # p < . compared to tv within the same genotype. no genotype difference was detected. data are shown as mean ± se. n = - /group. (e-f) colocalization of inflammation and macrophages in iwat. (e) representative images of colocalization of inflammatory marker il- and macrophage marker f f / in fitc green; nuclei in dapi blue). (f) representative images of colocalization of inflammatory marker tnf and macrophage marker f / in iwat (tnf in fitc green positively stained inflammatory markers and colocalizations with macrophages are indicated by the white arrows. scale bars hk: heat-killed + vehicle; tv: tumor + vehicle; tg: tumor + ghrelin. protein levels of inflammatory markers (a)il- β, (b) il- , and (c) tnf; and (d) macrophage marker mcp- in iwat (pg/mg) compared to hk within the same genotype. # p < . ; ### p < . compared to tv within the same genotype. no genotype difference was detected. data are shown as mean ± se. n = - /group. (e-f) colocalization of inflammation and macrophages in bat representative images of colocalization of inflammatory marker il- and macrophage marker f / f / in fitc green; nuclei in dapi blue). (f) representative images of colocalization of inflammatory marker tnf and macrophage marker f / in bat positively stained inflammatory markers and colocalizations with macrophages are indicated by the white arrows. 
scale bars hk: heat-killed + vehicle; tv: tumor + vehicle; tg: tumor + ghrelin. (a) representative ihc images of ucp- in iwat. (b) ucp- positive area is expressed as % of the total analyzed area in iwat ucp- in bat. (d) ucp- positive area is expressed as % of the total analyzed area in bat compared to hk within the same genotype. genotype effects are shown as p-values above the corresponding figures (p < . ). data are shown as mean ± se. scale bars hk: heat-killed + vehicle; tv: tumor + vehicle; tg: tumor + ghrelin. (a-c) energy expenditure adjusted by lbm is expressed (a) compared to the baseline; (b) every hours; and (c) average of every hours. (d-f) ambulatory activity is expressed (d) compared to baseline g-i) respiratory quotient (rq) is expressed (g) compared to baseline every hours; and (i) average of every hours. *p< . compared to hk within the same genotype genotype effects are shown in p-values above the corresponding figures (p < . ). n = for hk groups and n = for the rest of the groups. data are shown as mean ± se data is expressed as box-and-whisker plot showing the median (middle line), mean (middle cross), upper and lower quartiles (box), maximum and minimum (whiskers) supplemental fig. . high resolution images of immunohistochemistry staining in iwat representative images of colocalization of inflammatory marker il- and macrophage marker f / f / in fitc green; nuclei in dapi blue). (b) representative images of colocalization of inflammatory marker tnf and macrophage marker f / in iwat positively stained inflammatory markers and colocalizations with macrophages are indicated by the white arrows. scale bars supplemental fig. . high resolution images of immunohistochemistry staining in bat representative images of colocalization of inflammatory marker il- and macrophage marker f / f / in fitc green; nuclei in dapi blue). 
(b) representative images of colocalization of inflammatory marker tnf and macrophage marker f / in bat positively stained inflammatory markers and colocalizations with macrophages are indicated by the white arrows. scale bars effects of ghrelin on llc-induced protein-level changes in inflammation (il- β mcp- ) in plasma (pg/mg, n = - ). *, **: different than hk within the same genotype (*: p < . ; **: p < . ). genotype effects are shown in p-values above the corresponding figures timeline of current study. ghsr +/+ and -/-mice were injected with llc (t, × cells, s.q.) into the right flank or with equal volume and number of heat-killed llc cells (hk) the tumor-bearing mice were treated with either acylated ghrelin, . mg/kg (tg) or vehicle ( . % sodium chloride, tv), s.q., twice daily, while mice in hk group received vehicle (saline, same volume), s.q., twice daily for two weeks days before tumor noted, baseline) and weekly till the endpoint. all the mice were individually housed in clams cages for hours before ti ( - days before tumor noted, baseline) as well as at the endpoint borner t, loi l, pietra c, giuliano c, lutz ta, riediger t ( ) the ghrelin receptor agonist hm mimics the neuronal effects of ghrelin in the arcuate nucleus and attenuates anorexia-cachexia syndrome in tumor-bearing rats. am j physiol regul integr comp physiol : r - braun tp, zhu x, szumowski m, scott gd, grossberg aj, levasseur pr, graham k, khan s, damaraju s, colmers wf, baracos key: cord- - ome h authors: levinson, maxwell adam; niestroy, justin; manir, sadnan al; fairchild, karen; lake, douglas e.; moorman, j. randall; clark, timothy title: fairscape: a framework for fair and reproducible biomedical analytics date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ome h results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves often can be very large scale and involve multiple processing steps separated in time. evidence for the correctness of any analysis consists of accessible data and software with runtime parameters, environment, and personnel involved. evidence graphs - a derivation of argumentation frameworks adapted to biological science - can provide this disclosure as machine-readable metadata resolvable from persistent identifiers for computationally generated graphs, images, or tables, that can be archived and cited in a publication including a persistent id. we have built a cloud-based, computational research commons for predictive analytics on biomedical time series datasets with hundreds of algorithms and thousands of computations using a reusable computational framework we call fairscape. fairscape computes a complete chain of evidence on every result, including software, computations, and datasets. an ontology for evidence graphs, evi (https://w id.org/evi), supports inferential reasoning over the evidence. fairscape can run nested or disjoint workflows and preserves the provenance graph across them. it can run apache spark jobs, scripts, workflows, or user-supplied containers. all objects are assigned persistent ids, including software. all results are annotated with fair metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software. fairscape is a reusable computational framework, enabling simplified access to modern scalable cloud-based components. it fully implements the fair data principles and extends them to provide fair evidence, including provenance of datasets, software and computations, as metadata for all computed results. 
computation is an integral part of the preparation and content of modern biomedical scientific publications, and the findings they report. computations can range in scale from simple statistical routines run in excel spreadsheets to massive orchestrations of very large primary datasets, computational workflows, software, cloud environments, and services. they typically produce data and generate images as output. scientific claims of the authors are supported by reference both to the existing domain literature and to the experimental or observational data and its analysis represented in the figure or image. the ideal recommended practice is now to archive and cite one's own experimental data (cousijn et al. ; data citation synthesis group ; fenner et al. ; groth et al. ) ; to make it fair (wilkinson et al. ) ; and to archive and cite software used in analysis (smith et al. ) . that is, increasingly strict requirements are imposed to leave a digital footprint of each preparation and analysis step in derivation of a finding to support reproducibility and reuse of both data and tools. this is a welcome development, now extended by many journals into the realm of critical research reagents (a. bandrowski ; a. e. bandrowski and martone ; prager et al. ) . how do we facilitate it? and how do we make the recorded digital footprints most useful? our notion, inspired by a large body of work in abstract argumentation frameworks, and analysis of biomedical publications (tim clark et al. ; greenberg , ), is that the evidence for correctness of any finding can be represented as a directed acyclic support graph, an evidence graph. when combined with a graph of challenges to statements, or their evidence, this becomes a bipolar argument graph, or argumentation system (cayrol and lagasquie-schiex ). we have abstracted core elements of our micropublications model (clark et al. 
) to create evi (http://w id.org/evi), an ontology of evidence relationships that extends the w c provenance ontology, prov (gil et al. ; lebo et al. ; moreau et al. ) , to support specific evidence types found in biomedical publications, reasoning across deep evidence graphs, and propagation of evidence challenges deep in the graph, such as: retractions, reagent contamination, errors detected in algorithms, disputed validity of methods, challenges to validity of animal models, and others. (al manir & clark, in preparation; w id .org/evi#). evi is based on the fundamental idea that scientific findings or claims are not facts, but assertions backed by some level of evidence, i.e., they are defeasible components of argumentation. therefore, evi focuses on the structure of evidence chains that support or challenge a result, and on providing access to the resources identified in those chains. evidence in a scientific article is in essence, a record of the provenance of the finding, result, or claim asserted as likely to be true. if the data and software used in analysis are all registered and receive persistent identifiers (pids) with appropriate metadata, a provenance-aware computational data lake, i.e., a data lake with provenance-tracking computational services, can be built that attaches evidence graphs to the output of each process. at some point, a citable object -a dataset, image, figure, or table will be produced as part of the research. if this, too, is archived with its evidence graph as part of the metadata and the final supporting object is either directly cited in the text, or in a figure caption, then the complete evidence graph may be retrieved as a validation of the object's derivation and as a set of uris resolvable to reusable versions of the toolsets and data. evidence graphs are themselves entities that can be consumed and extended at each transformation or computation. 
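the idea of retrieving a complete evidence graph for a cited object, and of propagating a challenge such as a retraction to everything that depends on the challenged item, can be sketched in python as follows. this is a toy illustration only: it does not use the evi ontology terms or the fairscape api, and the identifiers, the dictionary representation and the function names are all hypothetical. each key is a result, and its list holds the items (computations, software, datasets) that directly support it.

```python
from typing import Dict, List, Set

# Hypothetical representation: result identifier -> identifiers that
# directly support it. Identifiers below are made up for illustration.
EvidenceGraph = Dict[str, List[str]]


def evidence_for(graph: EvidenceGraph, pid: str) -> Set[str]:
    """Collect every item that directly or indirectly supports `pid`."""
    seen: Set[str] = set()
    stack = list(graph.get(pid, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


def affected_by(graph: EvidenceGraph, challenged: str) -> Set[str]:
    """Find every result whose evidence includes the challenged item."""
    return {pid for pid in graph if challenged in evidence_for(graph, pid)}


graph: EvidenceGraph = {
    "ark:/example/figure-1": ["ark:/example/computation-2"],
    "ark:/example/computation-2": [
        "ark:/example/software-b",
        "ark:/example/dataset-derived",
    ],
    "ark:/example/dataset-derived": ["ark:/example/computation-1"],
    "ark:/example/computation-1": [
        "ark:/example/software-a",
        "ark:/example/dataset-raw",
    ],
}

# Everything a reader would need to inspect in order to validate figure-1.
support = evidence_for(graph, "ark:/example/figure-1")

# Everything that would need review if the raw dataset were retracted.
at_risk = affected_by(graph, "ark:/example/dataset-raw")
```

because the support graph is acyclic, the traversal terminates after visiting each item once. the `seen` set also keeps the sketch safe if a cycle were introduced by mistake.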
a cogent use case for this treatment of evidence comes from the recent surgisphere retractions in covid- research (mehra, mandeep r et al. ), and earlier, the obokata "stimulus-triggered acquisition of pluripotency" (stap) retractions (aizawa ; ishii et al. ; haruko obokata, wakayama, et al. ). many more such cases could be cited, including the wakefield paper in the lancet, which claimed that mmr vaccination caused autism (deer ; the editors of the lancet ; wakefield et al. ). in these well-publicized cases, research that initially appeared to have groundbreaking promise was shown to be invalid based on examination of the underlying data and methods. while the obokata and surgisphere retractions occurred relatively quickly, due no doubt to the egregiousness of the scientific misconduct involved, it is reasonable to believe that less obtrusive or better-concealed errors, malfeasance, or simple hyped-up claims with a poor (or no) basis in evidence are much more prevalent. we set out to construct a provenance-aware computational data lake, as described above, by significantly extending and refactoring the identifier and metadata services framework we and our colleagues developed in the nih data commons pilot project consortium (timothy clark et al. ; fenner et al. ). this framework successfully demonstrated interoperability across several nih "data commons" environments, providing the identifier, authn/authz, and metadata management elements of grossman's "data ecosystem" concept (grossman ). we extended and re-engineered this framework over time to track and visualize computations and their evidence, to manage the computational objects (such as data and software) as well as their metadata, to analyze very large datasets with horizontal scale-out, to support neuroimaging workflows, and to make it generally easier for scientists and computational analysts to use, by providing binder and notebook services (jupyter et al. ; kluyver et al.
) , and a python client. end-users do not need to learn a new programming language to use services provided by fairscape. they require no additional special expertise, other than basic familiarity with python and the skillsets they already possess in statistics, computational biology, machine learning, or other data science techniques. fairscape provides an environment that makes large-scale computational work easier and results fairer. fairscape is a reusable framework, suitable for installation in private, public, or hybrid cloud environments. we have also installed it on a high-end laptop. it focuses on ease of use for computational researchers, while capturing an extensible record of provenance, transparently to the user. fairscape provenance is rendered as named evidence graphs. these provide a complete record of any series of computations, with fair access to every digital object in a series of computations and transformations, whether or not connected in a workflow, or done by different users, including both datasets and software source code. the remainder of this article describes the approach, microservices architecture, and interaction model of the fairscape framework in detail. fairscape is built on a multi-layer set of components using a containerized microservice architecture (msa) (balalaie et al. ; larrucea et al. ; lewis and fowler ; wan et al. ) running under kubernetes (burns et al. ) in an openstack (adkins ) private cloud environment, with a devops deployment model (balalaie et al. ; leite et al. ). an architectural sketch of this model is shown in figure . ingress to microservices in the various layers is through a reverse proxy using an api gateway pattern. the top layer provides an interface to the end users with raw data and the associated metadata. 
the mid layer is a collection of tightly coupled services that allow end users with proper authorization to submit and view their data, metadata, and various types of computations performed on them. the bottom layer is built with special purpose storage and analytics platforms for storing and analyzing data, metadata and provenance information. all objects are assigned pids using local ark assignment for speed, with global resolution for generality. the user interface layer in fairscape offers end users various ways to use the functionality of the framework. a reproducible, interactive, executable environment using binders gives users with proper authorization easy access to these features. a python client simplifies calls to the microservices. data, metadata, software, scripts, workflows, containers, etc. are all submitted and registered by the end users from the ui layer. access to the fairscape environment is through an api gateway, mediated by a reverse proxy. our gateway is mediated by traefik, which dispatches calls to the various microservice endpoints. accessing the services requires user authentication, which we implement using the globus auth authentication broker (tuecke et al. ). users of globus auth may be authenticated via a number of permitted authentication services, and are issued a token which serves as an identity credential. in our current installation we require use of the commonshare authenticator, with site-specific two-factor authentication necessary to obtain an identity token. this token is then used by the microservices to determine a user's permission to access various functionality. the microservices layer is composed of seven services and two interfaces: authentication, authorization, transfer, metadata, evidence, computation, search, and visualization services; and the object and cluster compute apis to lower level services.
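as a rough illustration of how a client might present its identity token when calling a microservice through the gateway, the following python sketch builds an authenticated request without sending it. the gateway address, the url layout, and the function `authorized_request` are hypothetical; the text does not give the real endpoints, and passing the token as an oauth 2.0 bearer token is an assumption.

```python
from urllib.request import Request

# Hypothetical gateway address; the text does not give real endpoint URLs.
GATEWAY = "https://fairscape.example.org"


def authorized_request(service: str, path: str, token: str) -> Request:
    """Build (but do not send) a GET request to one microservice."""
    url = f"{GATEWAY}/{service}/{path.lstrip('/')}"
    request = Request(url, method="GET")
    # Assumption: the identity token travels as an OAuth 2.0 bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

each service behind the gateway could then inspect the token to decide what the caller is permitted to do, as the text describes.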
fairscape currently uses minio for object storage, mongodb for basic metadata storage, and stardog for graph storage. computations are managed by kubernetes, apache spark, and the nipype neuroinformatics workflow engine. the transfer service transfers and registers digital research objects (datasets, software, etc.) and their associated metadata to the commons. these objects are sent to the transfer service as binary data streams, which are then stored in minio object storage. these objects may include structured or unstructured data, application software, workflows, or scripts. the associated metadata contains essential descriptive information about these objects, such as context, type, name, textual description, author, location, and checksum. metadata are expressed as json-ld and sent to the metadata service for further processing. hashing is used to verify correct transmission of the object: users are required to specify a hash, which is then recomputed by minio after the object is stored. hash computation is currently based on the sha- secure cryptographic hash algorithm (dang ). upon successful execution, the service returns a pid of the object in the form of an ark, which resolves to the metadata. the metadata includes, as is normal in pid architecture, a link to the actual data location. an openapi description of the interface is here: https://app.swaggerhub.com/apis/fairscape/transfer/ . the metadata service handles metadata registration and resolution, including identifier minting in association with the object metadata. the metadata service takes user-posted json-ld metadata, uploads it to mongodb and stardog, and returns a pid. to retrieve metadata for an existing pid, a user makes a get call to the service. a put call to the service will update an existing pid with new metadata. while other services may read from mongodb and stardog directly, the metadata service handles all writes to mongodb and stardog.
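the hash check on ingress and the json-ld metadata record described above can be sketched in a few lines of python. this is a minimal sketch under stated assumptions: the sha-256 variant, the function `register_object`, and the metadata field names are chosen for illustration and are not taken from the fairscape code or its schema.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def register_object(data: bytes, claimed_hash: str, name: str, author: str) -> str:
    """Verify the user-supplied hash, then build a JSON-LD metadata record."""
    actual = sha256_hex(data)
    if actual != claimed_hash:
        raise ValueError("hash mismatch: object was altered or corrupted in transit")
    # Field names below are illustrative, not the FAIRSCAPE metadata schema.
    record = {
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Dataset",
        "name": name,
        "author": author,
        "sha256": actual,
    }
    return json.dumps(record, sort_keys=True)
```

rejecting the object before any metadata is written keeps a corrupted upload from ever receiving an identifier, which is one plausible reason to verify at this step.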
an openapi description of the interface is here: https://app.swaggerhub.com/apis/fairscape/metadata-service/ . the compute service executes user-uploaded scripts, workflows, or containers on uploaded data. it currently offers two compute engines (spark, nipype) in addition to native kubernetes container execution, to meet a variety of computational needs. to complete jobs, the service spawns specialized pods on kubernetes, designed to perform domain-specific computations that can be scaled to the size of the cluster. this service provides the essential ability to recreate computations based solely on identifiers. for data to be computed on, it must first be uploaded via the transfer service and be issued an associated pid. the service accepts a pid for a dataset, a script, software, or a container as input, and produces a pid representing the activity to be completed. the request, if successful, returns a job identifier from which job progress can be followed. upon completion of a job, all outputs are automatically uploaded and assigned new pids, with provenance-aware metadata. at job termination, the service performs a 'cleanup' operation, in which the completed job is removed from the queue. an openapi description of the interface is here: https://app.swaggerhub.com/apis/fairscape/compute/ . the visualization service allows users to visualize evidence graphs interactively in the form of nodes and directed edges, offering a consolidated view of the entities and the activities supporting correctness of the computed result. our current visualization engine is cytoscape (shannon ). each node displays its relevant metadata information, including its type and pid, resolved in real time. the visualization service renders the graph on an html page. an openapi description of the interface is here: https://app.swaggerhub.com/apis/fairscape/visualization/ . the evidence graph service creates a json-ld evidence graph of all provenance-related metadata for a pid of interest.
the evidence graph documents all entities such as datasets, software, and workflows, and the activities performed involving these entities. the service accepts a pid as its input and runs a specialized path query, built on top of the sparql query engine in stardog, with the pid as its source, to retrieve all supporting nodes that can be reached. to retrieve an evidence graph for a pid, a user may make a get call to the service. an openapi description of the interface is here: https://app.swaggerhub.com/apis/fairscape/evidence-graph/ . the search service allows users to search for object metadata containing strings of interest. it accepts a string as input, performs a search over all literals in the metadata for exact string matches, and returns a list of all pids with a literal containing the query string. it is invoked with a get call to the service's api endpoint, with the search string as argument. an openapi description of the interface is here: https://app.swaggerhub.com/apis/fairscape/search/ . fairscape orchestrates a set of containers to provide the services in these layers, using kubernetes. the services support a pattern composed of the following steps: (a) api ingress, (b) user authentication and authorization, (c) service dispatch, (d) object acquisition, (e) computation, (f) object resolution and access. these steps rely on further components: (g) identifier minting and resolution, (h) object access, (i) object verification, and (j) evidence graph visualization. the authn/authz service authenticates users and issues them a token, which is then used at the service level to determine what permissions they have in that service. metadata access is authorized separately from data access, and separately from service execution. a user may be authorized to read an object's metadata, but not its data.
this is accomplished by preventing return of the downloadurl term by the metadata service, and as a second level assurance, by blocking access to the object's s bucket in minio. the transfer service provides import of an object -software, container, or dataset -into fairscape, documenting its origin, and enabling descriptive metadata to be attached. once the object is stored robustly, it can be computed upon. objects are automatically registered with a persistent identifier (pid) upon acquisition. these are currently limited to archival resource keys (arks), generated locally. we plan to enable datacite doi registration shortly. this was an original feature of the object registration system we developed in the nih data commons pilot, however since that time, changes have been made to the datacite api which we need to review and address in our code. the compute service executes computations using either a container specified by the user, or the apache spark service, or the nipype workflow engine. objects (again, datasets, software, containers) are passed to the compute service by their pid, retrieved from the object store, and acted upon using the facilities indicated. at end, the result is written back to the object store, the metadata store is updated, and the evidence graph updates the support graph. for nipype jobs, the metadata includes all prov records for each step of the workflow. for spark jobs, data from the object store is written to the hfs file system, which maintains a direct interface with minio, separate from and below the level of the compute service, for efficiency. the metadata service mints pids using the appropriate internal or external service. in the current deployment, that is local ark minting with global resolution. multiple alternative pids may exist for any object, and doi registration is a planned near-term feature. 
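the separation of metadata access from data access described above, in which the downloadurl term is withheld from users who may not read the data, can be sketched as a filter applied when a pid is resolved. the permission names, the function `resolve_metadata`, and the example record are hypothetical; only the `downloadurl` term comes from the text.

```python
def resolve_metadata(record: dict, permissions: set) -> dict:
    """Return the metadata a user may see for one object.

    `permissions` is a hypothetical set such as {"read_metadata", "read_data"}.
    """
    if "read_metadata" not in permissions:
        raise PermissionError("user may not read this object's metadata")
    visible = dict(record)  # copy so the stored record is not modified
    if "read_data" not in permissions:
        visible.pop("downloadurl", None)  # withhold the data location
    return visible


# Placeholder record; the ARK and URL are invented for illustration.
EXAMPLE_RECORD = {
    "@id": "ark:/99999/example",
    "name": "demo dataset",
    "downloadurl": "https://example.org/object",
}
```

a user holding only the metadata permission would see the name and identifier but not the location; as the text notes, blocking the storage bucket itself provides a second level of assurance if the url leaks by another route.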
pids are resolved to their associated object level metadata, including the object's evidence graph and location, with appropriate permissions. objects are accessed by their location, after prior resolution of the object's pid to its metadata and authorization of the user's authentication token for data access on that object. object access is either directly from the object store, or from wherever else the object may reside. certain large objects residing in robust external archives, may not be acquired into local object storage, but remain in place, up to the point of computation. objects are issued hashes when they are created, and these hashes are also required metadata on ingress. the original user-supplied hashes are verified whenever an object is ingested, and internally computed hashes are provided for re-verification when the object is accessed. evidence graphs of any object acquired by the system may be visualized at any point in this workflow using the visualization service. nipype provides a chart of the workflows it executes using the graphviz package. our evidence graph service is interactive, using the cytoscape package (shannon ) , and allows evidence graphs of multiple workflows in sequence to be displayed whether or not they have been combined into a single flow. service testing and deployment is automated following modern continuous integration / continuous deployment (ci/cd) devops practices. when code is committed to the github repository, unit and integration tests are automatically invoked. if the tests are passed, automated deployment of the microservice containers is invoked using jenkins pipelines (soni ) and helm charts. this allows for rapid evolution of the platform with reasonable integrity. we have installed fairscape both in a large private cloud cluster computing environment at our university, and on laptops. 
we used fairscape services to analyze ten years of neonatal icu vital signs data from over , babies with over different highly comparative time series analysis (hctsa) methods taken from the literature (fulcher et al. ; fulcher and jones ) , recoding many of them from matlab into python. we analyzed the data with operations computed using several parameter sets amounting to > , separate computations (niestroy et al., in preparation) . one key step in the analysis was to cluster the algorithms by the similarity of the results. the results were represented in the heat map shown in figure . the evidence graph for this result is quite large. a visualization of a section for one patient is shown in figure . the full evidence graph for the clustering computation has , nodes. the json-ld for this individual patient example is shown in figure . metadata for the archived image includes the json-ld evidence graph. in this set of computations, all steps required authentication and authorization within the university of virginia computing infrastructure. we then used the following service calls to do the analysis: (a) transfer service to register all the objects with metadata and pids; (b) compute service to perform the individual computations, using apache spark; (c) evidence graph service to compute and retrieve the evidence graph and create the visualization. internally, services call each other in a more complex way, but this is masked from the user. for example, transfer service calls the metadata service to mint identifiers and register metadata, and it performs object verification against the inbound sha hash. we ran neuroimaging workflows using test data provided for the nipype workflow engine (gorgolewski et al. ) . metadata for the archived computational result includes this evidence graph. a visualization of the evidence graph is shown in figure . intermediate results for such workflows have time-limited utility. 
per data citation guidelines (data citation synthesis group ; fenner et al. ; starr et al. ), it is acceptable to clear this data if the useful metadata describing the procedure is preserved, which we do here. the service calls to perform this work were similar to those in use case above, with the exception that the compute service was called using the nipype option. scientific rigor depends on the transparency of methods and materials. the historian of science steven shapin described the approach developed with the first scientific journals as "virtual witnessing" (shapin ), and this is still valid today. the typical scientific reader does not actually reproduce the experiment but is invited to review mentally every detail of how it was done, to the extent that s/he becomes a "virtual witness" to an envisioned live demonstration. that is clearly how most people read scientific papers, except perhaps when they are citing them, in which case less care is often taken. scientists are not really incentivized to replicate experiments; their discipline rewards novelty. the ultimate validation of any claim, once it has been accepted as reasonable on its face, comes with support from multiple distinct angles, by different investigators, and with re-use of the materials and methods upon which it is based. if the materials and methods are sufficiently transparent and thoroughly disclosed as to be reusable, and they cannot be made to work, or give bad results, that debunks the original experiments. this is precisely the way in which the promising-sounding stap phenomenon was discredited (haruko obokata, wakayama, et al. ), before the elaborate formal effort of riken to replicate the experiments. as a first step then, it is not only a matter of reproducing experiments but also of producing transparent evidence that the experiments have been done correctly.
this permits challenges to the procedures to develop over time, especially through re-use of materials (including data) and methods -which today significantly include software and computing environments. we definitely view these methods as being extensible to materials such as reagents, using the rrid approach (prager et al. ) . fairscape is a reusable framework for biomedical computations that provides a simplified interface for research users to an array of modern, dynamically scalable, cloud-based componentry. our goal in developing fairscape was to provide an ease-of-use (and re-use) incentive for researchers, while rendering all the artifacts marshalled to produce a result, and the evidence supporting them, findable, accessible, interoperable, and reusable. fairscape can be used to construct, as we have done, a provenance-aware computational data lake or commons. it supports transparent disclosure of the evidence graphs of computed results, with access to the persistent identifiers of the cited data or software, and to their stored metadata. we plan several enhancements in future research and development with this project, including support for doi and software heritage identifier registration, metadata transfer to dataverse instances, and integration of the galaxy workflow engine for genomic analysis, for release later this year. many efforts involving overlapping groups have attempted to address parts of this problem, which is in large part an outcome of the transition of biomedical and other scientific research from print to digital, and our increasing ability to generate data and to compute on it at enormous scale. we make use of many of these in our fairscape framework, providing an integrated model for fairness and reproducibility, with ease of use openstack: cloud application development results of an attempt to reproduce the stap phenomenon microservices architecture enables devops: migration to a cloud-native architecture rrid's are in the wild! 
thanks to jcn and peerj. the nif blog: neuroscience information framework rrids: a simple step toward improving reproducibility through rigor and transparency of experimental methods bipolar abstract argumentation systems coalitions of arguments: a tool for handling bipolar argumentation frameworks bipolarity in argumentation graphs: towards a better understanding micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications national institutes of health, data commons pilot phase consortium a data citation roadmap for scientific publishers. scientific data secure hash standard (no. nist fips - ) (p. nist fips - ). national institute of standards and technology joint declaration of data citation principles how the case against the mmr vaccine was fixed tracking the growth of the pid graph core metadata for guids. national institutes of health, data commons pilot phase consortium a data citation roadmap for scholarly data repositories hctsa : a computational framework for automated time-series phenotyping using massive feature extraction highly comparative time-series analysis: the empirical structure of time series and their methods prov model primer: w c working group note nipype: a flexible, lightweight and extensible neuroimaging data processing framework how citation distortions create unfounded authority: analysis of a citation network understanding belief using citation networks progress toward cancer data ecosystems data lakes, clouds, and commons: a review of platforms for analyzing and sharing genomic data fair data reuse -the path through data citation report on stap cell research paper investigation the nci genomic data commons as an engine for precision medicine binder . -reproducible, interactive, sharable environments for science at scale jupyter notebooks-a publishing format for reproducible computational workflows microservices. 
ieee software prov-o: the prov ontology w c recommendation a survey of devops concepts and challenges microservices: a definition of this new architectural term retraction: cardiovascular disease, drug therapy, and mortality in covid- retraction-hydroxychloroquine or chloroquine with or without a macrolide for treatment of covid- : a multinational registry analysis prov-dm: the prov data model: w c recommendation bidirectional developmental potential in reprogrammed cells with acquired pluripotency retraction note: bidirectional developmental potential in reprogrammed cells with acquired pluripotency stimulustriggered fate conversion of somatic cells into pluripotency improving transparency and scientific rigor in academic publishing cytoscape: a software environment for integrated models of biomolecular interaction networks pump and circumstance: robert boyle's literary technology software citation principles jenkins essentials: continuous integration, setting up the stage for a devops culture achieving human and machine accessibility of cited data in scholarly publications retraction-ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children globus auth: a research identity and access management platform retracted: ileallymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children application deployment using microservice and docker containers: framework and optimization the fair guiding principles for scientific data management and stewardship information sharing statement all code developed for this framework is provide a link to the creative commons license, and indicate if changes were made. the images or other third-party material in this article are included in the article's creative commons license, unless indicated otherwise in a credit line to the material. 
we thank satra ghosh, maryann martone, john kunze, neal magee, and chris baker, for several helpful discussions; and neal magee for technical assistance with the university of virginia computing infrastructure. this work was supported in part by the u.s. national institutes of health, grants nih ot od - and nih u hg ; and by a grant from the coulter foundation. maxwell adam levinson, orcid: - - - sadnan al manir, orcid: - - - key: cord- -vxkbctiz authors: mao, kai; breen, peter; ruvkun, gary title: induction of rna interference by c. elegans mitochondrial dysfunction via the drh- /rig-i homologue rna helicase and the eol- /rna decapping enzyme date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vxkbctiz rna interference (rnai) is an antiviral pathway common to many eukaryotes that detects and cleaves foreign nucleic acids. in mammals, mitochondrially localized proteins such as mavs, rig-i, and mda mediate antiviral responses. here, we report that mitochondrial dysfunction in caenorhabditis elegans activates rnai-directed silencing via induction of a pathway homologous to the mammalian rig-i helicase viral response pathway. the induction of rnai also requires the conserved rna decapping enzyme eol- /dxo. the transcriptional induction of eol- requires drh- as well as the mitochondrial unfolded protein response (uprmt). upon mitochondrial dysfunction, eol- is concentrated into foci that depend on the transcription of mitochondrial rnas that may form dsrna, as has been observed in mammalian antiviral responses. the enhanced rnai triggered by mitochondrial dysfunction contributes to the increase in longevity that is induced by mitochondrial dysfunction.
many rna viruses carry an rna-dependent rna polymerase (rdrp) to replicate their rna genome in the host, entirely bypassing information storage and replication with dna. in this way, rna viruses can replicate in non-dividing cells with lower deoxyribonucleotide levels than those required for dna viruses or retroviruses but with the substantial ribonucleotide levels needed for transcription of rna and mrnas in terminally differentiated cells. a double-stranded rna (dsrna) viral replication intermediate is a strong clue to the host cell that an rna virus infection is underway. recognition of the dsrna replication intermediate is an initial step in antiviral immune responses. in mammals, the rna helicases rig-i or mda recognize the rna signatures of rna virus replication (chow et al., ). rig-i binds to dsrna, as well as to single-stranded rna (ssrna) with a '-triphosphate that is a signature of the products of the rna-dependent rna polymerases that mediate viral rna replication (lassig and hopfner, ). these helicases then associate with the mitochondrial antiviral signaling protein (mavs) and elicit the downstream nfkappab and other interferon immune signaling that mediate systemic antiviral immune defenses (zevini et al., ). the nematode caenorhabditis elegans uses an rna interference (rnai) pathway to mediate antiviral defense instead of the interferon signaling of vertebrates and most invertebrates (felix et al., ; lu et al., ; schott et al., ; wilkins et al., ). rna interference is a highly conserved mechanism for antiviral defense and is broadly deployed by eukaryotes, including fungi, nematodes, insects, plants and vertebrates (gammon and mello, ; szittya and burgyan, ). rna interference was initially discovered to mediate silencing triggered by engineered double-stranded rna triggers, but was soon discovered to produce natural small interfering rnas (sirnas) that target specific mrnas for degradation.
rna interference is mediated by nt to nt single stranded short interfering or sirnas that are produced and presented to target mrnas by the argonaute proteins and the dicer dsrna ribonuclease, conserved across many but not all eukaryotes (tabach et al., ) . many species of fungi and plants and a few animal species such as nematodes, ticks, and scorpions, have a second stage amplifier for sirnas, rna dependent rna polymerases that extend off of the primary sirnas produced by dicer and argonaute proteins to render their rnai pathways especially potent. in fact, rnai was simultaneously discovered in fungal, plant, and nematode strains that all possess such rdrp genes to potentiate their rnai response pathways. rna viruses of course depend on such rdrp genes for their replication so the defense pathway may have evolved from the viral weapon, or vice versa. the fact that rdrp genes mediate rnai functions across eukaryotic phylogeny suggests that these pathways have been lost for example in most of the vertebrate lineages that only have the first stage of rnai, the argonautes and dicer, rather than rdrps evolving or being acquired by horizontal transfer from viruses independently in so many eukaryotic lineages. the sting and nfkappab rna virus surveillance pathway of vertebrates and many invertebrates are anticorrelated with the presence of rdrp genes in the genomes of the peculiar animals that have retained them and may have displaced the rdrp second stage of the rnai pathway. some of the genes in the vertebrate and insect nfkappab pathway are conserved between nematodes and other animals without rdrp genes. like its mammalian orthologues rig-i and mda , the c. elegans drh- mediates antiviral rnai and is essential for neutralization of an invading virus (gammon et al., ; guo et al., ) . drh- was initially identified as a binding partner with the dicer protein dcr- , the argonaute protein rde- and the rna helicase rde- (tabara et al., ) . 
in addition to drh- , drh- is a pseudogene, and drh- encodes a component of an rdrp protein complex (duchaine et al., ) . the dcr- ribonuclease is the first step in sirna generation in most of the c. elegans rnai pathways, including exogenous rnai, endogenous rnai and antiviral rnai duchaine et al., ) . sirnas generated by these pathways engage the mutator proteins that mediate sirna-guided repression in both exogenous and endogenous rnai (phillips et al., ; zhang et al., ) . mutations in the dicer, argonaute, and accessory factors that generate and present sirnas to target mrnas are generally defective for rnai or antiviral defense. but the argonaute and rdrp gene families are expanded in c. elegans compared to most animals, and surprisingly, loss of function mutations in some of those genes cause an increase in the response to sirnas: mutations in the rdrp rrf- , the specialized argonaute ergo- , the rna helicase eri- / , or the exoribonuclease eri- enhance silencing by sirnas (fischer et al., ; kennedy et al., ) . many of these enhanced rnai mutations desilence retroviral elements in the c. elegans genome so that antiviral response pathways are triggered to in turn induce the expression of rnai-based antiviral defense (fischer and ruvkun, ) . a mutation in the c. elegans mitochondrial chaperone gene hsp- causes induction of a suite of drug detoxification and defense genes (mao et al., ) . here we show that in addition to the induction of these defense pathways, rna interference pathways are also activated. because rna interference is a key feature of c. elegans antiviral defense, and because of the association of mammalian viral defense pathways such as mavs and rig-i with the mitochondrion, we explored more fully how mitochondrial homeostatic pathways connect to rna interference and antiviral defense in c. elegans. 
we find that reduction-of-function mutations in a wide range of mitochondrial components robustly enhance rna interference-mediated silencing of endogenous genes as well as a variety of reporters of rnai. these antiviral responses to mitochondrial dysfunction are homologous to the rig-i-based mitochondrial response in mammals because they depend on the rig-i homologue, the drh- rna helicase. comparing the c. elegans transcriptional response of a mitochondrial mutant with the response to infection with the orsay rna virus, we found a striking overlap of expression of multiple members of c. elegans pals-genes implicated in anti-viral and anti-pathogen response pathways (leyva-diaz et al., ; reddy et al., ) and the eol- /dxo rna decapping enzyme gene. we found that eol- transcription is dramatically induced by mitochondrial dysfunction, and an eol- null mutation strongly suppresses the drh- -mediated antiviral rnai response normally induced by mitochondrial dysfunction. we showed that the eol- protein forms foci in the cytosol only if the mitochondrion is stressed, and the production of these foci is dependent on production of rna from the mitochondrial genome. this is reminiscent of the central role that dsrna released from the mitochondria plays in mammalian antiviral response pathways. during a viral infection, mitochondrial disruption by mavs and other mitochondrial-associated viral immunity factors releases mitochondrial dsrna and dna to the cytosol, where dsrnas trigger an mda -dependent interferon response (dhir et al., ; pajak et al., ) and dna via cgas and sting activates nfkappab immune signaling (west et al., ). gene inactivation or mutations in a wide variety of nuclear-encoded mitochondrial genes cause one of the strongest increases in lifespan in genome screens for lifespan extension (lee et al., ).
we find that uncoupling the antiviral defense response via drh- or eol- mutations abrogates the increase in lifespan, suggesting that the antiviral axis of mitochondrial dysfunction is critical to the lifespan extension. the hsp- (mg ) allele is a reduction-of-function mutation (p s) in the mitochondrial hsp chaperone in a region that is conserved between mammals and c. elegans (vqeifgkvpskavnpdeava). hsp- (mg ) causes induction of a suite of drug detoxification and defense genes that are also induced by a variety of mitochondrial mutations or toxins that disrupt mitochondrial function (mao et al., ). c. elegans hsp- and human mthsp are orthologues of bacterial and archaeal dnak; these mitochondrial chaperones were bacterial chaperones before the mitochondrial endosymbiosis event more than a billion years ago and the migration of these bacterial genes to the eukaryotic nuclear genome (mao et al., ). null alleles of hsp- cause developmental arrest, presumably because of the defects in the folding or import of many mitochondrial client proteins (kim et al., ; wiedemann and pfanner, ). but the viable hsp- (mg ) allele allows mitochondrial roles in other pathways to be studied without the associated pleiotropic lethality. in addition to activating detoxification and immune responses, the mitochondrial defects caused by hsp- (mg ) also unexpectedly cause enhanced rna interference, which is an antiviral defense pathway. rna interference in c. elegans is gene silencing by mrna degradation or heterochromatin induction that can be induced by injection or ingestion of approximately kb of double stranded rna corresponding to a particular gene. for feeding rnai, e. coli have been engineered to produce any of , dsrnas corresponding to any c. elegans gene . the specificity of rnai in c. elegans is superior to the single sirna approaches common in most animal systems, probably because, almost uniquely among animals, c.
elegans uses rna-dependent rna polymerase genes to amplify the primary short interfering rnas (sirnas) produced by the argonaute and dicer proteins that nearly all eukaryotes also use for rnai, and unlike in other animal systems, rna interference in c. elegans can be induced by kb segments of dsrna without the induction of interferon-related responses to dsrna. thus thousands of sirnas produced from feeding c. elegans with an e. coli expressing a kb dsrna sum for on-target effects on mrna inactivation and average for off-target mrna inactivations. the specificity of dsrna-induced rnai was validated with genetic loci that had previously been studied by genetics: most dsrnas corresponding to those genes with known phenotypes recapitulated the phenotypes predicted from genetics. but a minority of genes expected to generate easily scored phenotypes by rna interference did not silence in wild type. these dsrnas were used in genetic screens for c. elegans mutations that enhance rnai, or eri-mutations (kennedy et al., ). for example, mutations in the conserved exonuclease eri- or the rdrp rrf- cause enhanced rnai (duchaine et al., ). most mutants that enhance rnai responses also cause silencing of transgenes, because the transgene is detected as bearing some foreign signatures (lack of introns is a major signature of a viral origin) and silenced in the enhanced rnai state of these mutant strains. a set of dsrna tester genes has been developed that cause strong phenotypes in enhanced rnai mutants, but no rnai phenotype in wild type (guang et al., ). we found that the mitochondrial hsp- (mg ) mutant showed the strong phenotype seen in enhanced rnai mutants on many of these enhanced rnai tester dsrnas expressed from e. coli.
for example, a dsrna targeting the lir- (where lir is an abbreviation for lin- -related) gene causes no phenotype in wild type, but causes lethal arrest in, for example, the eri- (mg ) enhanced rnai mutant, as well as in the mitochondrial hsp- (mg ) mutant (figure a). the enhanced lethality after exposure to lir- dsrna in enhanced rnai mutants is due to sirnas from this dsrna targeting the duplicated and diverged genes with regions of high nucleotide homology on the same primary transcript of the lir- , lir- , and lin- operon (pavelec et al., ). this enhanced response to lir- was not due to strain background mutations in hsp- (mg ) or the pleiotropy of a mitochondrial chaperone that may affect the function of many imported mitochondrial proteins: lir- rnai also caused a lethal arrest on a variety of other nuclearly-encoded mitochondrial protein point mutants, including nadh dehydrogenase nuo- /ndufb , ubiquinone biosynthesis clk- /coq , and the iron-sulfur cluster isp- /uqcrfs (figure a). many mutations in nuclearly-encoded mitochondrial proteins are lethal, so that amino acid substitution, reduction-of-function alleles constitute most of the available viable mitochondrial mutations. a distinct dsrna that targets a histone b gene, his- , which causes larval arrest in enhanced rnai mutants but no lethality on wild type, also caused larval arrest in the hsp- /mthsp , nuo- /ndufb , clk- /coq , and isp- /uqcrfs mitochondrial mutants (figure a). his- maps to a cluster of histone genes including multiple histone b genes with nucleic acid homology; in the enhanced rnai mutants, the initial sirnas produced from the his- dsrna may spread to adjacent histone b genes. one explanation for the enhanced response to lir- and his- rnai is that these genes have genetic interactions with hsp- (mg ) that caused synthetic enhancement of certain phenotypes.
however because multiple mitochondrial mutations enhance response to these enhanced rnai tester dsrnas, another hypothesis was that hsp- (mg ) and the other mitochondrial mutants trigger an enhanced rnai (eri) phenotype. three other mitochondrial point mutants that we tested (out of six tested), nduf- (et ), mev- (kn ) and gas- (fc ) did not cause an enhanced rnai phenotype. there was no obvious distinction between the types of mitochondrial mutations that caused enhanced rnai and those that did not, in terms of growth rate or severity of the phenotype. nuo- , gas- , and nduf- are components of complex i, mev- is from complex ii, isp- is a component of complex iii, and clk- produces the ubiquinone that also functions in electron transport. it is possible that only particular mitochondrial insults activate the rnai pathway. but because this was not a rare response to a peculiar mitochondrial dysfunction, but rather associated with many different mitochondrial mutations we explored this with other measures of the intensity of rna interference. a hallmark of an enhanced rnai phenotype is the silencing of transgenes, as increased rna interference detects the foreign genetic signatures (the fusion of non-c. elegans genes such as gfp, and the synthetic introns, and other engineered features) of transgenes (fischer et al., ) . we asked if the hsp- (mg ) mutant displayed enhanced transgene silencing by introducing a transgene containing rol- (su ), a dominant mutation of a hypodermal collagen that causes an easily scored rolling movement phenotype (kramer and johnson, ) . lin- b is a class b synthetic multivulva (synmuv b) gene and loss-of-function of lin- b significantly enhances transgene silencing (wang et al., ) . 
a chromosomally-integrated transgene carrying multiple copies of the rol- (su ) mutant collagen gene causes % of transgenic animals to roll (a rol phenotype) in wild type animals, but in strains with enhanced rna interference, this transgene is now silenced so that % of animals are rol in the lin- b(n ) mutant and % rol in the hsp- (mg ) homozygous mutant (figure b). the suppression of the rol phenotype from the transgene carrying rol- (su ) was not due to a genetic interaction between hsp- (mg ) and the rol- (su ) mutant collagen gene, because the hsp- (mg ); rol- (su ) double mutant, with the rol- collagen mutation located in its normal chromosomal location and not subject to enhanced rnai silencing of a transgene, was still % rol (figure c). rather, the enhanced rnai of the mitochondrial mutants, including hsp- (mg ), causes a silencing of the rol- (su ) mutant collagen allele on the multicopy transgene to suppress the rol phenotype. to evaluate transgene silencing in other tissues, a ubiquitously expressed sur- ::gfp fusion gene was monitored in all somatic cells (yochem et al., ). the bright gfp signal of sur- ::gfp was dramatically decreased in the enhanced rnai mutant lin- b(n ) as well as in the hsp- (mg ) mitochondrial mutant (figure d and e). thus hsp- (mg ) enhances exogenous rnai and silences somatic transgenes. the somatic transgene silencing and enhanced rnai in the eri- mutant is associated with a failure to nuclearly-localize the argonaute transcriptional silencing factor nrde- that acts downstream of sirna generation (guang et al., ; wu et al., ). the synmuvb enhanced rnai mutants cause a somatic misexpression of the normally germline p granules implicated in sirna (wu et al., ). however, neither nrde- nuclear delocalization nor the somatic expression of p granules occurred in the hsp- (mg ) mutant, suggesting that mitochondrial dysfunction does not induce the eri- or synmuvb classes of enhanced rnai.
the enhanced rnai of hsp- (mg ) most resembled the eri- / rna helicase and ergo- argonaute enhanced rnai phenotypes, associated with desilencing of recently acquired viral genes and induction of viral immunity, without nuclear nrde- or somatic p granule expression (fischer et al., ; fischer and ruvkun, ). in mammalian cells, the rna helicase mda mediates an interferon antiviral immune response that is strongly enhanced by a mitochondrial rna degradation mutation that enhances production of mitochondrial dsrnas (dhir et al., ). intriguingly, mutations in the c. elegans homologue of mda , drh- , suppress the synthetic lethality of enhanced rnai and dsrna editing double mutants (fischer and ruvkun, ; reich et al., ). c. elegans drh- contains three domains, including a conserved helicase domain and c-terminal domain (ctd) (figure f), and an n-terminal domain (ntd) that is only conserved in nematodes (supplemental figure ). the caspase activation and recruitment domains (card) of rig-i and mda associate with the mitochondrial protein mavs, which is unique to mammals (zevini et al., ). conversely, the ntd of drh- is nematode-specific and essential for the inhibition of viral replication (guo et al., ). to test if drh- is required for hsp- (mg ) induced transgene silencing, two transgenes, rol- (su ) and sur- ::gfp, were tested for silencing in an hsp- ; drh- double mutant. the drh- (tm ) allele disrupts the ntd and renders the strain susceptible to viral infection (gammon et al., ). drh- (tm ) suppressed the transgene silencing induced by hsp- (mg ): hsp- (mg ) animals carrying the rol- (su ) transgene were % rol, but the hsp- (mg ); drh- (tm ) double mutant was % rol (figure b), showing that the transgene silencing induced by hsp- (mg ) was suppressed by drh- (tm ). similarly, the gfp signal of sur- ::gfp was significantly brighter in the hsp- (mg ); drh- (tm ) double mutant compared to hsp- (mg ) (figure d and e).
the lir- and his- enhanced rnai lethal or larval arrest phenotypes of hsp- (mg ) were also suppressed by drh- (tm ) (figure a). drh- mutant animals carrying a distinct mutant allele from a wild strain of c. elegans with a c-terminal truncation are competent for rnai but show defects in antiviral rnai . therefore, hsp- (mg ) enhanced silencing of transgenes and of his- and lir- enhanced rnai requires drh- gene activity. upon virus infection, human rig-i and mda mediate the upregulation of interferon genes. although the interferon signaling pathway is not conserved in c. elegans, we suspected drh- , like its mammalian orthologue, might promote the transcriptional activation of downstream response genes. comparison of the expression profile of hsp- (mg ) (mao et al., ) and wild type animals infected with the rna virus orsay showed that of the genes upregulated by orsay virus infection, were also upregulated in the hsp- (mg ) mutant (figure a and supplemental table s ). of these genes, are members of the pals- to pals- genes, which are located in a few clusters and, remarkably, a null allele in pals- causes an enhanced rnai phenotype of transgene silencing. thus the increased expression of pals-genes in virus infected and mitochondrial defective animals is not just associated with enhanced rna interference, but a mutation in one pals- gene can cause increased rna interference and transgene silencing (leyva-diaz et al., ; reddy et al., ). the eol- rna decapping gene was also strongly induced in virally infected and mitochondrial mutant c. elegans (supplemental table s ). we selected eol- for detailed analysis because a. it is conserved from yeast to mammals (figure b) infection is dependent on drh- (sowa et al., ), and d. the human orthologue of eol- , dxo represses hepatitis c virus replication (amador-canizares et al., ). c. elegans eol- was initially identified as a mutant that enhanced olfactory learning after pseudomonas aeruginosa infection (shen et al., ).
eol- encodes a decapping exoribonuclease orthologous to yeast rai and mammalian dxo (figure b). mammalian dxo processes the ' end of mrna (kramer and mclennan, ), including a decapping activity that removes the unmethylated guanosine cap and the first nucleotide (gpppn), the denading activity that removes the nicotinamide adenine dinucleotide (nad) cap, the pyrophosphohydrolase (pph) activity that releases pyrophosphate (ppi) from ' triphosphorylated rna, and '- ' exonuclease ( '- ' exo) activity to degrade the entire rna (figure c). the removal of the m g cap and subsequent degradation of mammalian mrnas are directed by the decapping exoribonuclease dxo. expression of mus musculus dxo rescued the enhanced olfactory learning phenotype in the c. elegans eol- mutant, indicating that not only the amino acid sequence but also the function of eol- is conserved (shen et al., ). eol- is one of c. elegans dxo homologues. interestingly, of these genes (m g . , m g . , m g . , y h a. , y h a. , and y h a. ) are clustered as tandemly duplicated genes, adjacent to rrf- , one of four c. elegans rdrps, and one (c h . ) is adjacent to hsp- . to verify the transcriptional upregulation of eol- in the hsp- (mg ) mutant and test if the induction requires drh- , a transcriptional fusion reporter peol- ::gfp containing the eol- promoter, gfp and eol- 'utr was constructed. in wild type animals treated with control rnai, peol- ::gfp was barely detectable (figure d and e); in the hsp- (mg ) mitochondrial mutant, peol- ::gfp was strongly induced (figure d and e). the induction of peol- ::gfp was abrogated in hsp- (mg ) treated with drh- rnai (figure d and e). since the peol- ::gfp transcriptional fusion reporter is a transgene that is subject to enhanced rnai transgene silencing by hsp- (mg ) and suppressed by drh- , it might not reflect the accurate expression level of eol- in these mutants.
to evaluate the actual mrna level of chromosomal eol- , quantitative reverse transcription pcr (rt-qpcr) was performed. the mrna level of chromosomal eol- was fold increased in the hsp- (mg ) single mutant, and this induction was almost completely abolished in the hsp- (mg ); drh- (tm ) double mutant (figure f). thus, mitochondrial dysfunction of hsp- (mg ) triggers the transcriptional activation of eol- in a drh- -dependent manner. in c. elegans, the mitochondrial unfolded protein response (upr mt ) is a transcriptional program responding to mitochondrial dysfunction and essential for mitochondrial recovery, immunity, detoxification and aging (lin and haynes, ). rnai of the genes atfs- or dev- , which encode transcription factors that mediate the expression of upr mt genes (nargund et al., ; tian et al., ), suppressed the induction of peol- ::gfp (figure d and e). we then tested if drh- contributes to the activation of upr mt by observing the induction of phsp- ::gfp, the canonical reporter of upr mt . phsp- ::gfp was strongly induced in the hsp- (mg ) mutant, and suppressed by rnai of atfs- or dev- as expected (figure g and h). rnai of drh- in the hsp- (mg ); phsp- ::gfp strain did not disrupt induction of phsp- ::gfp; in fact, induction of phsp- ::gfp by hsp- (mg ) after drh- (rnai) was stronger than in hsp- (mg ) alone, perhaps because drh- inactivation inhibited hsp- (mg )-induced enhanced rnai and transgene silencing. thus, the transcriptional activation of eol- requires upr mt signaling, whereas drh- is not part of the general upr mt . the drh- dependent upregulation of eol- implies that eol- might act in the induction of the enhanced rna interference pathway in the hsp- (mg ) mutant. to test this possibility, a loss-of-function mutant eol- (mg ) was generated by crispr-cas (arribere et al., ).
in eol- (mg ), a six-nucleotide sequence, "tgatca", which contains a stop codon and a bcl i endonuclease recognition sequence to facilitate genotyping, was inserted into the eol- locus, resulting in a "lys to stop" nonsense mutation (figure a). the rol- (su ) and sur- ::gfp transgenes were tested for silencing in the hsp- ; eol- double mutant. for the rol- (su ) transgene, the hsp- (mg ); eol- (mg ) double mutant showed % rolling compared to % in the hsp- (mg ) single mutant (figure b). for the sur- ::gfp transgene, the gfp signal in the hsp- (mg ); eol- (mg ) double mutant was substantially increased relative to the hsp- (mg ) single mutant (figure c and d). moreover, the eri phenotype of hsp- (mg ) as assessed using the lethality of the lir- or his- dsrnas was also suppressed by eol- (mg ) (figure e). because mammalian dxo, a homologue of c. elegans eol- , is involved in mrna decay, we tested whether eol- mediates the degradation of mrnas from transgenes. in such a model, eol- might also be required for the somatic transgene silencing caused by other eri mutants, for example, by the large synmuv b class of eri mutants. although drh- (tm ) slightly suppressed the transgene silencing of a lin- b null mutant, eol- did not contribute to transgene silencing by the synmuvb mutants (figure f, g and h). thus, eol- does not act in the synmuvb rnai pathway that silences transgenes and is specifically required for mitochondrial dysfunction-induced transgene silencing. in order to understand the function of drh- and eol- in silencing of transgenes, we monitored their subcellular localization by fusing mscarlet at the n-terminus of drh- and c-terminus of eol- , respectively, under the control of the rpl- promoter in a minimos vector (frokjaer-jensen et al., ). the minimos generated single copy transgene is able to avoid multicopy transgene silencing by the rnai pathway, and the ribosome promoter prpl- is a constitutive promoter with universal expression in all tissues.
in wild type animals, mscarlet::drh- and eol- ::mscarlet were localized diffusely in the cytosol without any notable pattern (figure a). in the hsp- (mg ) mitochondrial mutant, strikingly, eol- ::mscarlet formed massive puncta in many cell types (figure a). mscarlet::drh- remained diffusely localized in hsp- (mg ) (figure a), just as no change was observed in drh- ::gfp subcellular distribution upon orsay virus infection . the formation of eol- ::mscarlet foci in the hsp- mitochondrial mutant, the specificity of eol- in the suppression of hsp- (mg ) induced silencing, and the mammalian finding that mda recognizes mitochondrial dsrnas together suggested the hypothesis that the targets of eol- decapping are rnas derived from the mitochondrial genome. first, we examined whether the cytoplasmic eol- foci localized at the mitochondrial surface. because loss-of-function eol- (mg ) suppresses the mitochondrial defect-induced rol- (su ) transgene silencing in the hypodermis, the hypodermal-specific col- promoter driven tomm- ::gfp minimos single copy transgene was used to monitor mitochondria in hypodermal cells. however, eol- ::mscarlet foci were not specifically associated with the mitochondria (figure b). therefore, eol- accumulates as foci in response to mitochondrial dysfunction, but these foci are not mitochondrially-associated and may function in the cytosol. we explored whether c. elegans mitochondria release rna into the cytosol, as has been observed in the mda mitochondrial surveillance pathway in human and d. melanogaster (dhir et al., ; pajak et al., ). the human mtdna encodes genes, including mrnas encoding subunits of the electron transport chain (etc), ribosomal rnas (rrna) and transfer rnas (trna), residing on both the heavy and light strands of the mitochondrial dna (hallberg and larsson, ). transcription on both strands generates two genome size polycistronic transcripts that are processed into individual genes.
during this processing, the complementary noncoding rnas in mammals are degraded by the mitochondrial degradosome formed by rna helicase suv and polynucleotide phosphorylase pnpase (borowski et al., ). loss of suv or pnpase stabilizes the noncoding rnas to form dsrnas with the genes derived from the opposite strand (dhir et al., ). the c. elegans mitochondrial genome encodes genes with one fewer etc subunit (okimoto et al., ). however, these genes are encoded exclusively on the heavy strand of c. elegans mtdna and no transcription of the light strand has been detected (blumberg et al., ), which disfavors the mitochondrial dsrna model. mrna-seq analysis of hsp- (mg ) revealed that of mitochondrial mrnas encoding etc subunits were upregulated (figure a). the increased level of mitochondrial mrna was further verified by rt-qpcr analysis of nduo- /nd and ctb- /cytb (figure b). in order to examine if the increased abundance of mitochondrial mrnas within the mitochondrial matrix caused their release into the cytosol, the cytosolic fraction (actin antibody was used as a control protein for this fractionation) was separated by centrifugation from the pellet fraction containing mitochondria (atp a antibody used as a control for this fractionation) (figure c). rt-qpcr analyses showed that the mrna levels of nduo- /nd and ctb- /cytb were substantially increased in the cytosol of hsp- (mg ) compared to wild type (figure d). the strategy of single strand transcription from the mitochondrial genome is not just parochial to c. elegans; it is a strategy used by many members of the genus caenorhabditis .
if the unique strand transcriptional architecture of the caenorhabditis mitochondrial genome is linked with production of dsrna from the mitochondria, and if the release of mitochondrial rna into the cytosol in mitochondrial mutants is the trigger for the formation of eol- foci, we would expect a suppression of eol- focus formation by shutting down mitochondrial rna biogenesis. the human gene polrmt encodes a mitochondrial dna-directed rna polymerase that catalyzes the transcription of mitochondrial dna (hallberg and larsson, ). rnai of rpom- , the c. elegans orthologue of human polrmt, potently reduced the number of eol- foci in the hsp- (mg ) mutant (figure c and d). therefore, the formation of eol- foci required the biogenesis of mitochondrial rna. mitochondrial rna polymerase is central to all gene expression in the mitochondrion, including the - protein coding genes, the trna genes and the rrna genes. so a defect in rpom- is expected to be pleiotropic and thus difficult to ascribe to production of dsrna from mitochondrial transcripts. but paradoxically, with client genes in the mitochondrion, rpom- has a rather circumscribed sphere of genes it controls, so its function upstream of eol- foci could very well be via dsrnas produced from mitochondria but only released in mitochondrial mutant strains. while the majority of mitochondrial components are essential for eukaryotes and null alleles of most mitochondrial genes are lethal, c. elegans reduction-of-function mitochondrial mutants such as hsp- (mg ) (figure a and d) increase longevity, in some cases dramatically (dillin et al., ; lee et al., ). we explored whether the enhanced rnai output of the hsp- (mg ) mitochondrial mutant contributes to its longevity. drh- (tm ) or eol- (mg ) single mutants did not shift the survival curve compared to wild type (figure b and d), showing that these mutants are not short-lived or sickly in some way.
but drh- (tm ) or eol- (mg ) mutations strongly suppressed the lifespan extension of hsp- (mg ), showing that drh- or eol- gene activities are required for the extension of lifespan by hsp- (mg ) (figure c and d). because drh- and eol- mediate the enhanced rna interference, an antiviral response, of mitochondrial mutants, and because these outputs also mediate the increased longevity of mitochondrial mutants, but not the normal longevity of wild type, these data support the model that enhanced antiviral defense is a key anti-aging output from mitochondrial mutations. the rnai pathway drives gene silencing of mrnas that have signatures of being foreign. for example, many of the enhanced rnai mutants mediate the silencing of integrated viruses in the c. elegans genome that are recently acquired (fischer and ruvkun, ). these viral genes are recognized as foreign by their small number of introns, for example, and by poor splice sites (newman et al., ). the antiviral defense of c. elegans is more closely related to that of fungi and plants than to that of most animals, for example drosophila and vertebrates. nematodes, plants, and most fungi use the common pathway genes dicer, specialized argonaute/piwi proteins, and most distinctively, rna dependent rna polymerases to produce primary sirnas and secondary sirnas against invading viruses. because these pathways are common across many (but not all) eukaryotes, it is likely that the common ancestor of animals, plants, fungi, and some protists carried the dicer, argonaute, and rdrp genes that mediate rna interference defense. rna dependent rna polymerase genes, paradoxically, are almost always carried by rna viruses to mediate their replication (smith et al., ). thus a key protein in antiviral defense in most eukaryotes is a protein that may have spread across eukaryotes from viral infections.
the loss of rdrp genes in most animal species appears to be associated with the evolution of the sting and nfkappab interferon rna virus defense pathway of most insects and vertebrates but not in nematodes or plants or fungi. in addition to the interferon pathway, the pirna pathway has specialized in insects and vertebrates to mediate antiviral defense (aravin et al., ). the pirna pathway is not present in fungi or plants and appears to not have become as central to viral defense in c. elegans, which has not lost its rna dependent rna polymerase, unlike most other animal species such as vertebrates and insects (tabach et al., ). viral defense in c. elegans is strongly induced in the many mutants that cause enhanced rna interference, eri- to eri- , the many synmuvb mutations, and the mitochondrial mutants we characterize here. these mutations may induce antiviral defense because viral infections may compromise these genetic pathways or because these mutations genetically trigger an antiviral state that is normally under physiological regulation. a surveillance system to monitor the states of these pathways may be an early detection system for viruses to induce expression and activities of dicer, argonaute proteins and rna dependent rna polymerases in sirna production, targeting, and amplification (govindan et al., ; melo and ruvkun, ). regardless of whether antiviral defense is mediated by rdrps in nematodes or by mavs and interferon signaling in vertebrates and most insects, the mitochondria are woven into the pathways. for flock house virus and other nodaviruses, the mitochondrion is a center of viral replication; mutations that compromise the mitochondrion in yeast, for example, cause defects in antiviral responses (hao et al., ; miller and ahlquist, ; miller et al., ). it is possible that viruses generally need the oxidizing power of o for the massive production of disulfides in virion assembly in the er.
thus mitochondrial dysfunction may be a key pathogenic feature of viral infection and systems have evolved to trigger antiviral defense if mitochondrial dysfunction is detected. the mitochondrion has also been implicated in the drosophila and mammalian antiviral pirna pathway: in drosophila, retroviruses are recognized by the dedicated pirna pathway that generates pirnas from integrated flamenco as well as other retroviral elements (malone et al., ) to target newly encountered viruses that are related to these integrated viruses. a variety of mitochondrial outer membrane proteins couple pirnas to target loci (huang et al., ; munafo et al., ). and the sting interferon pathway also uses the mavs and rig-i interaction with mitochondria to signal antiviral responses. we have shown that enhanced rna interference is also induced by many mutations that compromise mitochondrial function. human mitochondrial mutations also trigger antiviral immunity (dhir et al., ; west et al., ). the human rig-i rna helicase recognizes viral dsrna or ssrna 'triphosphate (chow et al., ). our data suggest that the signals that trigger the homologous rna helicase, drh- , are mitochondrial dsrnas that are released to the cytosol. the eol- /dxo exonuclease may remove the ' cap of mitochondrial rna, such as a '-nad modification (bird et al., ), to facilitate the recognition by drh- . the ' ends of the majority of messenger rnas (mrna) are chemically modified or capped to protect from rna nucleases (grudzien-nogalska and kiledjian, ). prokaryotic and eukaryotic cells have different strategies to modify their mrna, with di- or triphosphate in bacteria and n methyl guanosine (m g) cap in eukaryotes. thus the eukaryotic ' cap is a conspicuous sign to identify self, rather than the non-self uncapped rna from virus or bacteria. a decrement in mitochondrial function is one of the most potent mechanisms to increase longevity in a variety of species (lee et al., ).
our analysis suggests that the eol- and drh- pathway from mitochondrial dysfunction to enhanced rna interference and antiviral activity is a key anti-aging output from mitochondria. this would suggest that the dramatic increase in frailty in old age could reflect viral vulnerability. in fact, one of the most dramatic hallmarks of the recent covid- epidemic is that the virus is x more lethal to the elderly than to young adults. in support of this model, in mammals, a reduction in dicer expression in heart, adipose and brain is a hallmark of aging (kaneko et al., ; tarallo et al., ). a loss-of-function mutant of dcr- /dicer shortens c. elegans lifespan (mori et al., ). anti-bacterial immune responses are also activated upon mitochondrial disruption (liu et al., ; mao et al., ; pellegrino et al., ), suggesting that the c. elegans antiviral and antibacterial responses may have common features. eol- was discovered based on its effect on olfactory learning upon p. aeruginosa infection (shen et al., ), and it is induced by particular bacterial pathogens (sowa et al., ). gene expression profiles suggest that after virus infection, loss of drh- causes a shift from an antiviral response to an antibacterial response. thus, the immune network in c. elegans may be coupled to the surveillance of many cellular systems to trigger a variety of defense responses, including enhanced rna interference. the enhanced rnai mutants may constitutively express these normally transient states of antiviral or antibacterial defense.
all c. elegans strains were cultured at °c. strains used in this study are listed in supplemental table s . for peol- ::gfp, the plasmid harboring [peol- ::gfp::eol- 'utr + cbr-unc- (+)] was injected into unc- (ed ). for the single copy transgenes prpl- ::mscarlet::drh- , prpl- ::eol- ::mscarlet and pcol- ::tomm- ::gfp, the plasmid was injected into unc- (ed ) following the minimos protocol (frokjaer-jensen et al., ).
for crispr of eol- (mg ), we used dpy- (cn ) as the co-crispr marker (arribere et al., ) and pjw (addgene) to express both the guide rna (grna) and the cas enzyme (ward, ).
the fluorescent signals of sur- ::gfp, peol- ::gfp and phsp- ::gfp transgenic animals were photographed with a zeiss ax zoom.v microscope. the subcellular patterns of prpl- ::mscarlet::drh- , prpl- ::eol- ::mscarlet and pcol- ::tomm- ::gfp were imaged on a leica tcs sp confocal microscope. images were analyzed with fiji-imagej.
to estimate the mrna levels of eol- , nduo- and ctb- , around l larvae were hand-picked from a mixed population and frozen in liquid nitrogen. total rna was isolated by trizol extraction (thermo fisher # ). to analyze the cytosolic mrna levels of nduo- and ctb- , approximately worms were synchronized by bleach preparation of eggs, hatched to l larvae, grown to the l stage, and frozen in liquid nitrogen. worm lysates were generated with a tissuelyser and steel beads (qiagen # ) and resuspended in μl of lysis buffer containing . m sucrose, mm tris-hcl, mm edta, x protease inhibitor cocktail (roche ) and u/μl murine rnase inhibitor (new england biolabs, #m l). the lysate was centrifuged at g for min at °c to pellet debris, and the supernatant was centrifuged at g for min at °c. the resulting pellet was collected for western blot, and the supernatant was centrifuged again at g for min at °c. of this final supernatant, μl was taken for western blot and μl for rna isolation with trizol. cdna was generated with the protoscript ® ii first strand cdna synthesis kit (new england biolabs, #e l). qpcr was performed for eol- , nduo- and ctb- , with act- as the control, using iq tm sybr ® green supermix (bio-rad, # ).
the pellet or supernatant from the previous centrifugation was mixed with nupage tm lds sample buffer (thermo fisher #np ) and heated at °c for minutes.
samples were loaded onto nupage™ - % bis-tris protein gels (thermo fisher #np box) and run with nupage tm mes sds running buffer (thermo fisher #np ). after semi-dry transfer, the pvdf membrane (millipore #ipvh ) was blocked with % nonfat milk and probed with anti-atp a (abcam ab ) or anti-actin (abcam #ab ) primary antibodies. the membrane was developed with supersignal™ west femto maximum sensitivity substrate (thermo fisher # ) and visualized on amersham hyperfilm (ge healthcare # ).
animals were synchronized by egg-laying and grown until the l stage, which was counted as day . adults were separated from their progeny by manual transfer to new plates. survival was examined daily and survival curves were generated with graphpad prism.
g.r. supervised the study. k.m. and g.r. designed the experiments and wrote the manuscript. k.m. performed most experiments and analyzed results. p.b. performed microinjection.
(a) the enhanced rnai response to lir- rnai or his- rnai in the eri- control enhanced rnai mutant or any of the mitochondrial mutants (hsp- , nuo- , clk- , or isp- ) causes lethality/arrest in the mitochondrial mutants but not in wild type. the enhanced rnai of the hsp- mitochondrial mutant is suppressed by a drh- mutation. (b) transgene silencing test with the rol- (su ) transgene. this transgene causes a rolling behavior due to a twist in the collagen cuticle in wild type.
the expression of the rol- collagen mutation from the transgene mir- does not impact recognition of the hcv genome by innate sensors of rna but rather protects the ' end from the cellular pyrophosphatases, dom z and dusp the piwi-pirna pathway provides an adaptive defense in the transposon arms race efficient marker-free recovery of custom genetic modifications with crispr/cas in caenorhabditis elegans a deletion polymorphism in the caenorhabditis elegans rig-i homolog disables viral rna dicing and antiviral immunity highly efficient ' capping of mitochondrial rna with nad(+) and nadh by yeast and human mitochondrial rna polymerase initiation of mtdna transcription is followed by pausing, and diverges across human cell types and during evolution human mitochondrial rna decay mediated by pnpase-hsuv complex takes place in distinct foci an evolutionarily conserved transcriptional response to viral infection in caenorhabditis nematodes rig-i and other rna sensors in antiviral immunity mitochondrial double-stranded rna triggers antiviral signalling in humans rates of behavior and aging specified by mitochondrial function during development antiviral rna interference in mammals functional proteomics reveals the biochemical niche of c. elegans dcr- in multiple small-rna-mediated pathways natural and experimental infection of caenorhabditis nematodes by novel viruses related to nodaviruses trans-splicing in c. elegans generates the negative rnai regulator eri- / the eri- / helicase acts at the first stage of an sirna amplification pathway that targets recent gene duplications multiple small rna pathways regulate the silencing of repeated and foreign genes in c. 
elegans rnai pathway silence endogenous viral elements and ltr retrotransposons random and targeted transgene insertion in caenorhabditis elegans using a modified mos transposon the antiviral rna interference response provides resistance to lethal arbovirus infection and vertical transmission in caenorhabditis elegans rna interference-mediated antiviral defense in insects lipid signalling couples translational surveillance to systemic detoxification in caenorhabditis elegans new insights into decapping enzymes and selective mrna decay an argonaute transports sirnas from the cytoplasm to the nucleus homologous rig-i-like helicase proteins direct rnai-mediated antiviral immunity in c. elegans by distinct mechanisms making proteins in the powerhouse genome-wide analysis of host factors in nodavirus rna replication pirna-associated germline nuage formation and spermatogenesis require mitopld profusogenic mitochondrial-surface lipid signaling dicer deficit induces alu rna toxicity in age-related macular degeneration a conserved sirna-degrading rnase negatively regulates rna interference in c. elegans lipid biosynthesis coordinates a mitochondrial-to-cytosolic stress response functional genomic analysis of rna interference in c. elegans analysis of mutations in the sqt- and rol- collagen genes of caenorhabditis elegans the complex enzymology of mrna decapping: enzymes of four classes cleave pyrophosphate bonds discrimination of cytosolic self and non-self rna by rig-i-like receptors a systematic rnai screen identifies a critical role for mitochondria in c. 
elegans longevity silencing of repetitive dna is controlled by a member of an unusual caenorhabditis elegans gene family comparative mitochondrial genomics reveals a possible role of a recent duplication of nadh dehydrogenase subunit in gene regulation metabolism and the upr(mt) caenorhabditis elegans pathways that surveil and defend mitochondria animal virus replication and rnai-mediated antiviral silencing in caenorhabditis elegans specialized pirna pathways act in germline and somatic tissues of the drosophila ovary mitochondrial dysfunction in c. elegans activates mitochondrial relocalization and nuclear hormone receptor-dependent detoxification genes inactivation of conserved c. elegans genes engages pathogen-and xenobiotic-associated defenses flock house virus rna polymerase is a transmembrane protein with amino-terminal sequences sufficient for mitochondrial localization and membrane insertion flock house virus rna replicates on outer mitochondrial membranes in drosophila cells role of microrna processing in adipose tissue in stress defense and longevity daedalus and gasz recruit armitage to mitochondria, bringing pirna precursors to the biogenesis machinery mitochondrial import efficiency of atfs- regulates mitochondrial upr activation the surveillance of pre-mrna splicing is an early step in c. elegans rnai of endogenous genes the mitochondrial genomes of two nematodes, caenorhabditis elegans and ascaris suum defects of mitochondrial rna turnover lead to the accumulation of double-stranded rna in vivo requirement for the eri/dicer complex in endogenous rna interference and sperm development in caenorhabditis elegans mitochondrial upr-regulated innate immunity provides resistance to pathogen infection mut- promotes formation of perinuclear mutator foci required for rna silencing in the c. elegans germline an intracellular pathogen response pathway promotes proteostasis in c. elegans c. 
elegans adars antagonize silencing of cellular dsrnas by the antiviral rnai pathway competition between virus-derived and endogenous small rnas regulates gene expression in caenorhabditis elegans an antiviral role for the rna interference machinery in caenorhabditis elegans activation of daf- /foxo by reactive oxygen species contributes to longevity in long-lived mitochondrial mutants in caenorhabditis elegans eol- , the homolog of the mammalian dom z, regulates olfactory learning in c. elegans thinking outside the triangle: replication fidelity of the largest rna viruses the caenorhabditis elegans rig-i homolog drh- mediates the intracellular pathogen response upon viral infection rna interference-mediated intrinsic antiviral immunity in plants identification of small rna pathway genes using patterns of phylogenetic conservation and divergence the dsrna binding protein rde- interacts with rde- , dcr- , and a dexh-box helicase to direct rnai in c. elegans dicer loss and alu rna induce age-related macular degeneration via the nlrp inflammasome and myd mitochondrial stress induces chromatin reorganization to promote longevity and upr(mt) somatic misexpression of germline p granules and enhanced rna interference in retinoblastoma pathway mutants regulation of caenorhabditis elegans rna interference by the daf- insulin stress and longevity signaling pathway rapid and precise engineering of the caenorhabditis elegans genome with lethal mutation co-conversion and inactivation of nhej repair mitochondrial dna stress primes the antiviral innate immune response mitochondrial machineries for protein import and assembly rna interference is an antiviral defence mechanism in caenorhabditis elegans repression of germline rnai pathways in somatic cells by retinoblastoma pathway chromatin complexes a new marker for mosaic analysis in caenorhabditis elegans indicates a fusion between hyp and hyp , two major components of the hypodermis crosstalk between cytoplasmic rig-i and sting 
sensing pathways mut- and other mutator class genes modulate g and g sirna pathways in caenorhabditis elegans
is silenced by enhanced rnai in mitochondrial mutants, and this transgene silencing depends on drh- gene activity.
the mitochondrial mutations do not simply suppress the collagen defect of a rol- mutation: if the rol- (su ) mutation is not on a transgene that is under rnai control, the rol phenotype is not suppressed by hsp- (mg ), and an hsp- (mg ) single mutation does not have any rol phenotype.
transgene silencing test with the sur- ::gfp transgene. this transgene is ubiquitously expressed in all somatic cells. the expression of sur- ::gfp from the transgene is silenced by enhanced rnai in mitochondrial mutants, and this transgene silencing depends on drh- gene activity.
diagram of human rig-i, mda and c. elegans drh- . the helicase domain and ctd are conserved. card: caspase recruitment domain; ntd: n-terminal domain; ctd: c-terminal domain; hs: homo sapiens; ce: caenorhabditis elegans.
increased eol- expression depends on drh- .
venn diagram for genes upregulated in hsp- (mg ) and orsay virus infection.
phylogenetic tree of eol- /dxo. eol- is conserved from yeast to mammals. hs: homo sapiens; mm: mus musculus; dm: drosophila melanogaster; ce: caenorhabditis elegans.
the enzyme activities of human dxo. mammalian dxo modifies the ' end of mrnas: decapping, denading, pyrophosphohydrolase and '- ' exonuclease.
the induction of the peol- ::gfp transcriptional fusion reporter requires drh- and upr mt . peol- ::gfp is strongly induced by the hsp- (mg ) mitochondrial mutant, and this induction is abrogated by drh- rnai as well as by rnai of genes involved in upr mt (atfs- and dve- ).
drh- does not contribute to upr mt . the benchmark reporter of upr mt , phsp- ::gfp, is induced by the hsp- (mg ) mutant, and the induction is suppressed by rnai of atfs- or dve- , but not drh- .
animals were imaged in (g) and the fluorescence was quantified in (h).
the eol- (mg ) mutant allele generated by crispr-cas .
transgene silencing test with the rol- (su ) transgene. the expression of the rol- collagen mutation from the transgene is silenced by enhanced rnai in the hsp- (mg ) mutant, and this transgene silencing depends on eol- gene activity.
this transgene was ubiquitously expressed in all somatic cells. the expression of sur- ::gfp from the transgene is silenced by enhanced rnai in the hsp- (mg ) mutant, and this transgene silencing requires eol- .
the enhanced rnai response to lir- rnai or his- rnai in the hsp- (mg ) mutant requires eol- . rnai of lir- or his- causes lethality/arrest in the hsp- (mg ) mutant but not in wild type. the enhanced rnai is suppressed by the eol- (mg ) mutation.
silencing of the rol- (su ) transgene caused by synmuvb enhanced rnai mutations does not depend on eol- . the rol- (su ) transgene is silenced by the synmuvb lin- (n ) mutation, and this transgene silencing was slightly suppressed by drh- (tm )
silencing of the sur- ::gfp transgene caused by synmuvb enhanced rnai mutations does not require eol- . the sur- ::gfp transgene is silenced by the synmuvb lin- (n ) mutation, and this transgene silencing was slightly suppressed by drh- (tm ), but not eol- (mg ).
the mrna level of nduo- /nd or ctb- /cytb was induced by the hsp- (mg ) mitochondrial mutation. both nduo- /nd (nadh:ubiquinone oxidoreductase) and ctb- /cytb (cytochrome b) are transcribed from the mitochondrial genome, and their expression level was evaluated by rt-qpcr assays.
the mrna level of nduo- /nd or ctb- /cytb in the cytosol was increased in the hsp- (mg ) mitochondrial mutant. the pellet and cytosolic fractions were separated by centrifugation and probed with atp a and actin antibodies, respectively, to assess purification in (c). the mrna level of nduo- /nd or ctb- /cytb in the mitochondria-free cytosolic fraction was evaluated by rt
c: cytosolic fraction
figure .
eol- protein forms cytoplasmic foci if mitochondria are dysfunctional
eol- protein, but not drh- protein, forms easily seen puncta in the hsp- (mg ) mitochondrial mutant but not in wild type. mitochondrial dysfunction caused by the hsp- (mg ) mutation triggered the formation of eol- ::mscarlet foci.
eol- foci are not associated with the mitochondria. in the hsp- (mg ) mutant, the eol- ::mscarlet foci do not colocalize with mitochondria that were visualized by the mitochondrial outer membrane protein
formation of eol- cytoplasmic foci requires the transcription of mitochondrial rna from the mitochondrial genome by rpom- rna polymerase. the eol- ::mscarlet foci formed in the
drh- (tm ) or eol- (mg ) single mutants do not extend lifespan.
drh- (tm ) or eol- (mg ) suppresses the extension of lifespan caused by the hsp- (mg ) mutation.
protein alignment of drh- ntd in nematode species. cr: caenorhabditis remanei; cl: caenorhabditis latens; ce: caenorhabditis elegans; ac: ancylostoma ceylanicum.
we thank the caenorhabditis genetics center and the national bioresource project (tokyo, japan) for providing strains. k.m. is a damon runyon fellow supported by the damon runyon cancer research foundation (drg- - ). this work is supported by a grant from the national institutes of health awarded to g.r. (nih gm and gm ). the authors declare no competing interests.
key: cord- -cx elpb authors: hassani-pak, keywan; singh, ajit; brandizi, marco; hearnshaw, joseph; amberkar, sandeep; phillips, andrew l.; doonan, john h.; rawlings, chris title: knetminer: a comprehensive approach for supporting evidence-based gene discovery and complex trait analysis across species date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cx elpb
generating new ideas and scientific hypotheses is often the result of extensive literature and database reviews, overlaid with scientists’ own novel data and a creative process of making connections that were not made before.
we have developed a comprehensive approach to guide this technically challenging data integration task and to make knowledge discovery and hypothesis generation easier for plant and crop researchers. knetminer can digest large volumes of scientific literature and biological research to find and visualise links between the genetic and biological properties of complex traits and diseases. here we report the main design principles behind knetminer and provide use cases for mining public datasets to identify unknown links between traits such as grain colour and pre-harvest sprouting in triticum aestivum, as well as an evidence-based approach to identify candidate genes under an arabidopsis thaliana petal size qtl. we have developed knetminer knowledge graphs and applications for a range of species including plants, crops and pathogens. knetminer is the first open-source gene discovery platform that can leverage genome-scale knowledge graphs, generate evidence-based biological networks and be deployed for any species with a sequenced genome. knetminer is available at http://knetminer.org.
which is prone to information being overlooked and subjective biases being introduced. even when the task of gathering information is complete, it is demanding to assemble a coherent view of how each piece of evidence might come together to "tell a story" about the biology that can explain how multiple genes might be implicated in a complex trait or disease. new tools are needed to provide scientists with a more fine-grained and connected view of the scientific literature and databases than the conventional information retrieval tools currently at their disposal. scientists are not alone in facing these challenges. search systems form a core part of the duties of many professions.
studies have highlighted the need for search systems that give confidence to the professional searcher and therefore trust, explainability, and accountability remain a significant
knetminer provides search term suggestions and real-time query feedback. from a search, a user is presented with the following views: gene view is a ranked list of candidate genes along with a summary of related evidence types. map view is a chromosome-based display of qtl, gwas peaks and genes related to the search terms. evidence view is a ranked list of query-related evidence terms and enrichment scores along with linked genes. by selecting one or multiple elements in these three views, the user can get to the network view to explore a gene-centric or evidence-centric knowledge network related to their query and the subsequent selection.
(nilsson-ehle, ) and that the red pigmentation of wheat grain is controlled by r genes on the long arms of chromosomes a, b, and d (sears, figure a ). this network is displayed in the network view, which provides interactive features to hide or add specific evidence types from the network. nodes are displayed in a defined set of shapes, colors and sizes to distinguish different types of evidence. a shadow effect on nodes indicates that more information is available but has been hidden. the auto-generated network, however, is not yet telling a story that is specific to our traits of interest and is limited to evidence that is phenotypic in nature. to further refine and extend the search for evidence that links tt to grain color and phs, we can provide additional keywords relevant to the traits of interest. seed germination and dormancy are the underlying developmental processes that activate or prevent pre-harvest sprouting in many grains and other seeds. the colour of the grain is known to be determined through accumulation of proanthocyanidin, an intermediate in the flavonoid pathway, found in the seed coat.
these terms and phrases can be combined using boolean operators (and, or, not) and used in conjunction with a list of genes. thus, we search for traescs d g (tt ) and the keywords: "seed germination" or "seed dormancy" or color or flavonoid or proanthocyanidin. this time, knetminer filters the extracted tt knowledge network ( nodes) down to a smaller subgraph of nodes and relations in which every path from tt to another node corresponds to a line of evidence to phenotype or molecular characteristics based on our keywords of interest (figure b). overall, the exploratory link analysis has generated a potential link between grain color and phs due to the tt -mft interaction and suggested a new hypothesis linking two traits (phs and root hair density) that were not part of the initial investigation and were previously thought to be unrelated. furthermore, it raises the possibility that tt mutants might lead to increased root hairs and to higher nutrient and water absorption, and therefore cause early germination of the grain. more data and experiments will be needed to address this hypothesis and close the knowledge gap.
biologists would generally agree to be informative when studying the function of a gene. searching a kg for such patterns is akin to searching for relevant sentences containing evidence that supports a particular point of view within a book. such evidence paths can be short, e.g. gene a was knocked out and phenotype x was observed; alternatively, the evidence path can be longer, e.g. gene a in species x has an ortholog in species y, which was shown to regulate the expression of a disease-related gene (with a link to the paper). in the first example, the relationship between gene and disease is directly evident and experimentally proven, while in the second example the relationship is indirect and less certain but still biologically meaningful. there are many evidence types that should be considered for evaluating the relevance of a gene to a trait.
in a kg context, a gene is considered to be, for example, related to 'early flowering' if any of its biologically plausible graph patterns contain nodes related to 'early flowering'. in this context, the word 'related' doesn't necessarily mean that the gene in question will have an effect on 'flowering
shown to a user; let alone if combining gcss for tens to hundreds of genes. there is therefore a need to filter and visualise the subset of information in the gcss that is most interesting to a specific user. however, the interestingness of information is subjective and will depend on the biological question or the hypothesis that needs to be tested. a scientist with an interest in disease biology is likely to be interested in links to publications, pathways, and annotations related to diseases, while someone studying the biological process of grain filling is likely more interested in links to physiological or anatomical traits. to reduce information overload and visualise the most interesting pieces of information, we have devised two strategies.
1) in the case of a combined gene and keyword search, we use the keywords as a filter to show only paths in the gcs that connect genes with keyword-related nodes, i.e. nodes that contain the given keywords in one of their node properties. in the special case where too many publications remain even after keyword filtering, we select the most recent n publications (default n= ). nodes not matching the keyword are hidden but not removed from the gcs.
2) in the case of a simple gene query (without additional keywords), we initially show all paths between the gene and nodes of type phenotype/trait, i.e. any semantic motif that ends with a trait/phenotype, as this is considered the most important relationship to many knetminer users.
gene ranking
we have developed a simple and fast algorithm to rank genes and their gcs for their importance.
we give every node in the kg a weight composed of three components, referred to as sdr, standing for the specificity to the gene, distance to the gene and relevance to the search terms.
specificity reflects how specific a node is to a gene in question. for example, a publication that is cited (linked) by hundreds of genes receives a smaller weight than a publication which is linked to one or two genes only. we define the specificity of a node x as:
where n is the frequency of the node occurring in all n gcs.
distance assumes that information associated more closely with a gene can generally be considered more certain than information that is further away; e.g. information inferred through homology and other interactions increases the uncertainty of annotation propagation. a short semantic motif is therefore given a stronger weight, whereas a long motif receives a weaker weight. thus, we define the second weight as the inverse shortest path distance of a gene g and a node x:
both weights s and d are not influenced by the search terms and can therefore be pre-computed for every node in the kg.
relevance reflects the relevance or importance of a node to user-provided search terms using the well-established measure of inverse document frequency (idf) and term frequency (tf) (salton & yang, ).
we define the knetscore of a gene as:
the sum considers only gcs nodes that contain the search terms. in the absence of search terms, we sum over all nodes of the gcs with r= for each node. the computation of the knetscore
biologists, such as tables and chromosome views, allowing them to explore the data, make choices as to which gene to view, or refine the query if needed. these initial views help users to reach a certain level of confidence with the selection of potential candidate genes. however, they do not tell the biological story that links candidate genes to traits and diseases.
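as a minimal sketch of the sdr weighting described in the gene ranking section (not the knetminer implementation), the python below combines the three components into a gene score. the functional forms are assumptions, because the formulas themselves are not given in the text: specificity is taken as an idf-like log(N / n), distance as the inverse shortest-path length, and the score as the sum of s * d * r over the contributing nodes. the default relevance of 1.0 used when there are no search terms, and all class, function and field names, are likewise hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EvidenceNode:
    gcs_count: int       # n: number of GCS in which this node occurs
    path_length: int     # shortest path length from the gene to this node (>= 1)
    relevance: float     # e.g. a TF-IDF score against the search terms
    matches_query: bool  # True if the node contains the search terms


def specificity(gcs_count: int, total_gcs: int) -> float:
    # ASSUMED form: log(N / n), so nodes shared by many genes weigh less.
    return math.log(total_gcs / gcs_count)


def distance_weight(path_length: int) -> float:
    # Inverse shortest-path distance: closer evidence weighs more.
    return 1.0 / path_length


def knet_score(nodes, total_gcs, has_search_terms=True, default_relevance=1.0):
    # ASSUMED combination: sum of S * D * R over the contributing nodes.
    score = 0.0
    for node in nodes:
        if has_search_terms and not node.matches_query:
            continue  # only nodes containing the search terms contribute
        r = node.relevance if has_search_terms else default_relevance
        s = specificity(node.gcs_count, total_gcs)
        d = distance_weight(node.path_length)
        score += s * d * r
    return score


# Hypothetical example: one rare, adjacent, matching node and one
# widely shared, distant, non-matching node.
nodes = [
    EvidenceNode(gcs_count=2, path_length=1, relevance=3.0, matches_query=True),
    EvidenceNode(gcs_count=500, path_length=3, relevance=1.0, matches_query=False),
]
print(knet_score(nodes, total_gcs=1000))
print(knet_score(nodes, total_gcs=1000, has_search_terms=False))
```

under these assumptions, a node that is linked to few genes and sits directly next to the gene dominates the score, while a widely shared, distant node contributes little. because s and d do not depend on the query, they could be cached per node and only r recomputed for each search.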
in a second step, to enable the stories and their evidence to be investigated in full detail, the network view visualises highly complex information in a concise and connected format, helping facilitate biologically meaningful conclusions. consistent graphical symbols are used for representing evidence types throughout the different views, so that users develop a certain level of familiarity, before being exposed to networks with complex interactions and rich content. scientists spend a considerable amount of time searching for new clues and ideas by synthesizing many different sources of information and using their expertise to generate hypotheses. knetminer is a user-friendly platform for biological knowledge discovery and exploratory data mining. it allows humans and machines to effectively connect the dots in life science data and literature, search the connected data in an innovative way, and then return the results in an accessible, explorable, yet concise format that can be easily interrogated to generate new insights. 
we discovering protein drug targets using the monarch initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species a wheat homolog of mother of ft and tfl acts in the regulation of germination zur kenntnis der mit der keimungsphysiologie des weizens in zusammenhang stehenden inneren faktoren bioinformatics meets user-centred design: a perspective meta-analysis of the heritability of human traits based on fifty years of twin studies information retrieval in the workplace: a comparison of professional search practices progress in biomedical knowledge discovery: a -year on the specification of term values in automatic indexing cytogenetic studies with polyploid species of wheat knowledge graphs and knowledge networks: the story in brief knetmaps: a biojs component to visualize biological knowledge networks identification of loci governing eight agronomic traits using a gbs-gwas approach and validation by qtl mapping in soya bean big data: astronomical or genomical? sensitivity to "sunk costs" in mice, rats, and humans iwgsc whole-genome assembly principal investigators whole-genome sequencing and assembly shifting the limits in wheat research and breeding using a fully annotated reference genome trend analysis of knowledge graphs for crop pest and diseases mother of ft and tfl regulates seed germination through a negative feedback loop modulating aba signaling in arabidopsis use of graph database for the integration allelic variation and transcriptional isoforms of wheat tamyc gene regulating anthocyanin synthesis in pericarp the authors declare that they have no competing interests. key: cord- -lpwwlrqm authors: fenn, gareth d.; waller-evans, helen; atack, john r.; bax, benjamin d. title: crystallization and structure of ebselen bound to cysteine of human inositol monophosphatase (impase) date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: lpwwlrqm
inositol monophosphatase (impase) is inhibited by lithium, the most efficacious treatment for bipolar disorder. several therapies have been approved, or are going through clinical trials, aimed at the replacement of lithium in the treatment of bipolar disorder. one candidate small molecule is ebselen, a selenium-containing antioxidant, which has been demonstrated to produce lithium-like effects, both in a murine model and in clinical trials. here we present the crystallization and first structure of human impase covalently complexed with ebselen, a . Å crystal structure (pdb entry zk ). in the human impase complex, ebselen, in a ring-opened conformation, is covalently attached to cys , a residue located away from the active site. impase is a dimeric enzyme and, in the crystal structure, two adjacent dimers share four ebselen molecules, creating a tetramer with ∼ symmetry. in the crystal structure presented in this publication, the active site in the tetramer is still accessible, suggesting that ebselen may function as an allosteric inhibitor, or may block the binding of partner proteins.
synopsis
here we present a . Å crystal structure of human inositol monophosphatase (impase) bound to the inhibitor ebselen (pdb entry zk ). in the structure, ebselen forms a seleno-sulfide bond with cysteine and ebselen-mediated contacts between two dimers give a ∼ tetramer.
bipolar disorder is a chronic and debilitating psychiatric disorder, characterised by cycles of mania followed by severe depression, frequently accompanied by bouts of psychosis. although antipsychotic agents are the preferred short-term method of treatment, more efficacious mood stabilising drugs, such as lithium, are used in long-term clinical management (geddes & miklowitz, ). lithium is the gold-standard treatment for bipolar disorder; however, it has several serious side effects, such as nausea and cognitive impairment, in addition to a narrow therapeutic window (rybakowski, ).
because of these liabilities, other, less efficacious mood stabilisers (e.g. lamotrigine) are now often used in the treatment of bipolar disorder (won & kim, ). one enzyme inhibited by lithium is inositol monophosphatase (impase) (gill et al., ), which has led to rational drug design targeting impase as a strategy for developing novel therapies for bipolar disorder (brown & tracy, ). impase is a key enzyme in the phosphatidylinositol intracellular (pi) signalling pathway, whereby impase dephosphorylates inositol -, -, or -phosphate, collectively known as insp , to produce myo-inositol, also known as free inositol (atack et al., ). cleavage of insp into myo-inositol by impase is required for the recycling of inositol for subsequent use in the pi signalling pathway (atack et al., ). inositol is an essential precursor for the synthesis of pi, subsequently utilised in the synthesis of phosphatidylinositol phosphates (pips). these include pi( , )p , which is cleaved by phospholipase c following gpcr signalling to release the second messengers diacylglycerol (dag) and inositol , , -trisphosphate (ip ) (phiel & klein, ). the observed depletion of free inositol and accumulation of the substrate of impase, insp , coupled with a reduction in agonist-invoked ip formation in cells and animals treated with lithium, led to the development of the inositol depletion hypothesis to explain the mechanism by which lithium exerts its effects (berridge et al., ). the inositol depletion hypothesis suggests that lithium produces a reduction in free inositol, primarily via blocking the recycling of inositol from insp , which leads to a decrease in pi( , )p and a slowing of the pip signalling pathways that are postulated to be hyperactive in bipolar disorder (harwood, ). 
further evidence to support the inositol depletion hypothesis comes from observations that the mood stabilisers carbamazepine and valproic acid also lead to depletions in free inositol and attenuation of pi( , )p signalling pathways (williams et al., ). therefore, targeting impase as a means of depleting free inositol has been of scientific interest, and has led to the search for new impase inhibitors. one such impase inhibitor is ebselen ( -phenyl- , -benzisoselenazol- ( h)-one), an organoselenium compound, which functions as a glutathione peroxidase mimic (nakamura et al., ). ebselen is believed to act through reduction of reactive oxygen species (ros), by binding covalently to cysteine residues or thiols to form seleno-sulfide bonds that lead to the pharmacological effect (azad & tomar, ). however, it is not known whether the covalent binding of ebselen to specific groups is directly or indirectly responsible for its mechanism of action (ulrich et al., ). ebselen has been demonstrated to inhibit impase in a covalent manner, with effects consistent with those of lithium, through depletion of free inositol in mouse brain (singh et al., ). subsequent trials in a healthy cohort demonstrated that ebselen leads to decreased myo-inositol in the anterior cingulate cortex, in addition to effects consistent with attenuation of pip signalling (singh et al., ). ebselen is currently in a stage clinical trial for the treatment of bipolar disorder; however, results from the trial have not been released at the time of writing. ebselen is known to bind to several proteins; crystal structures show ebselen covalently bound to cysteine residues of proteins including sod (capper et al., ; chantadul et al., ) and the transpeptidase ldtmt from mycobacterium tuberculosis (de munnik et al., ). ebselen has also been reported to inhibit the main protease from sars-cov- (jin et al., ). 
these multiple targets suggest several potential therapeutic uses for ebselen, but also that there are likely to be off-target side effects. whilst inhibition of impase by ebselen in vitro has been demonstrated, the ebselen binding loci have not been confirmed and the exact mechanism of action on impase remains unclear. structures of impase have been published with a variety of ligands, including a structure of human impase with the lithium mimetic l- , (kraft et al., ). in this paper, we present a . Å structure of ebselen covalently bound to cysteine residue of human impase (pdb entry zk ). this is the first structure of impase to be published demonstrating direct covalent binding of ebselen to impase. all reagents were purchased from sigma-aldrich or thermofisher unless otherwise stated. the impase construct was described by kraft et al. , with rosetta (de ) e. coli used for impase production. a starter culture ( ml) of transformed e. coli was grown overnight and used to inoculate l of lb medium containing . mm betaine, mm sorbitol, mg/ml chloramphenicol and mg/ml ampicillin. the culture was grown at ºc to an od . . impase expression was induced by the addition of . mm iptg, and the culture was grown overnight at ºc. table ). cleaved impase was purified initially by incubating with glutathione sepharose fast flow resin that had been pre-equilibrated with two column volumes of size-exclusion chromatography (sec) buffer ( mm tris-hcl ph . , mm nacl). the flow through from the column was collected, concentrated to ml and loaded onto a hiload / superdex prep-grade sec column pre-equilibrated with sec buffer. proteins were gel-filtered at a flow rate of ml min - and the fractions containing impase were pooled, buffer exchanged into storage buffer [ mm tris-hcl ph . , mm nacl, mm edta, %(v/v) glycerol] and concentrated to mg/ml using kda protein concentrators at , g, prior to storage at - ºc. this protocol gave a typical yield of mg impase per litre of culture. 
the sample was transferred to a µl microcentrifuge tube, placed on a roller and incubated at room temperature for min, prior to setting up crystallisation plates. a reservoir solution comprising . m mnso , . m mes, % peg , ph . , was added to the impase and ebselen solution at a : ratio and incubated at ºc (see table ). crystals appeared after days and continued to grow until being harvested for data collection on day ; they were cryoprotected by transfer into % (v/v) glycerol + reservoir solution ( % v/v) and flash cooled prior to data collection. an x-ray diffraction dataset was collected from a single cryo-cooled crystal on beam-line i - at the diamond light source synchrotron (table ). the data ( , . degree images) were processed with the xia pipeline at diamond (winter, ) to give a . Å dataset in p with cell dimensions a=b= . Å, c= . Å, α=β= °, γ= °. there are nine crystal structures of human impase in the pdb with similar cell dimensions, all in space-group p ; thus, the data were reindexed and remerged in p . the data were merged with version . . of the program aimless (evans & murshudov, ). the rcp statistic, used to estimate cumulative radiation damage in aimless (diederichs, ), did not increase significantly over the frames. test datasets were also produced, merging the first and last images from the dataset, to check for radiation damage effects (see refinement, section . , for details). the analysis suggested that the best dataset was obtained from using all data, and that although some radiation damage appeared to be present in the data, this damage was not reduced by removing the later frames from the dataset. the structure was solved by rigid body refinement from the . Å crystal structure of impase with mn (pdb entry gj ; kraft et al., ). 
initial structure solution used the dimple pipeline; this pipeline provides the user with a quick method to identify datasets that have a bound ligand or drug candidate in their crystal (http://ccp .github.io/dimple/; wojdyr et al., ). initial maps showed clear electron density (figure ) for a single ebselen molecule attached to cys in both subunits a and b of the dimer (the p cell has one dimer in the asymmetric unit). the ebselen was built onto cys a and cys b in coot (emsley et al., ; emsley, ). restraints for the covalently bound ebselen were generated in acedrg (long et al., ) and the structure was refined using refmac (murshudov et al., ).

additionally allowed (%) . outliers (%)** .
* ebselen attached to cys a had only one atom with two positions (the selenium atom). ebselen attached to cys b had every atom in two positions. there are sixteen non-hydrogen atoms in ebselen.
** lys is the only residue (just) outside allowed region in both subunits (phi/psi - °/- °).

although the electron density was very clear for the terminal phenyl ring of the compound (figure c and d) and electron density maps have a large peak for the selenium (covalently bonded to the sulfur of cys ), the electron density suggests some radiation damage has occurred to the sulfur-selenium bond (weik et al., ; garman, ). the data are consistent with a model in which some initial radiation damage occurred to the sulfur-selenium (within the first few degrees), after which a steady state occurred (bond reforming after breaking due to radiation) (gerstel et al., ). each active site contains three mn + ions in gj (kraft et al., ), while in bji (gill et al., ) each active site contains three mg + ions. in our structure, site (gill et al., ) does not have enough electron density for a mn + ion (mn + ions have twenty-three electrons). 
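the electron counts quoted in this argument follow from simple arithmetic: a monatomic ion has as many electrons as its atomic number minus its charge. the short sketch below is purely illustrative and is not part of the refinement workflow; the function name `ion_electrons` is hypothetical, and the charges (2+ for manganese and magnesium, 1+ for sodium) are the usual oxidation states assumed here to match the twenty-three and ten electron counts given in the text.

```python
# Illustrative arithmetic only: electrons in a monatomic ion = atomic number - charge.
# Atomic numbers are standard periodic-table values; charges are assumed
# (Mn 2+, Mg 2+, Na 1+) to match the counts quoted in the text.
ATOMIC_NUMBER = {"Na": 11, "Mg": 12, "Mn": 25}


def ion_electrons(element: str, charge: int) -> int:
    """Return the number of electrons in a monatomic ion of the given charge."""
    return ATOMIC_NUMBER[element] - charge


counts = {
    "Mn2+": ion_electrons("Mn", 2),  # 23 electrons
    "Mg2+": ion_electrons("Mg", 2),  # 10 electrons
    "Na+": ion_electrons("Na", 1),   # 10 electrons
}
print(counts)
```

because na+ and mg2+ are isoelectronic, electron density alone cannot separate them, which is why the choice between them rests on coordination geometry and on what is present in the crystallisation experiment.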
we modelled a similarly coordinated sodium ion at this site, because we have no mg + ions in our crystallisation experiment (the protein comes in mm nacl and the crystallisation buffer contains mm mnso ). however, we cannot rule out the possibility that this is a mg + ion, rather than a na + ion (both na + and mg + ions have ten electrons). this 'site ' is the position where lithium is postulated to bind with tetrahedral coordination geometry (gill et al., ) . however, the coordination geometry in our structure is consistent with two mn + ions and one mg + ion each with 'standard' octahedral coordination geometry. most of the active site metal atoms are modelled in two positions and have temperature factors similar to those of surrounding residues (masmaliyeva & murshudov, ) . in the deposited structure (pdb zk ), the ebselens on cys in subunits a and b each have an occupancy of . , but the selenium atoms are modelled in two positions (supplementary figure ) . in the crystal, the selenium-sulfur bond has been modelled with an occupancy of . or . (see supplementary figure ). a second selenium position, observed in electron density maps, is some . Å further away from the sulfur of cys and this second position is likely caused by radiation induced cleavage of the seleno-sulfide bond (gerstel et al., ) . the . Å crystal structure of human impase with ebselen (pdb entry zk ) was solved from a structure of human impase in the same unit cell and space-group ( gj - kraft et al., ) . electron density maps (figure ) clearly showed ebselen attached only to a single cysteine, cys . the binding of ebselen to the cys residue in each monomer does not lead to noticeable changes in conformation in the active site that would prevent the catalytic activity of impase. additionally, the binding of ebselen to cys does not appear to prevent dimer formation, as evidenced by the dimers (and tetramers) present in this structure ( zk ). 
however, whilst this structure clearly shows ebselen bound to cys on each monomer of impase, our structure does not rule out the possibility that cys could be modified in-vivo.

(wojdyr et al., ) fo-fc map ( sigma - light blue), and difference map fo-fc ( sigma - orange). for subunit a the dimple refined structure with waters (small red spheres) refined into the density for the ebselen is shown. for the a' subunit the 'final' coordinates (including ebselen) are shown.

impase contains seven cysteine residues, amino acids , , , , , and ; of these residues, only cys is near the active site. four of the cysteine residues are buried and would not be expected to be accessible to modification by ebselen (cysteines , , and ). of the three cysteines that have some surface accessibility in the monomer, one of them, cys , is largely buried in the dimer interface as shown in figure (pdb entry zk ). the procedure used to co-crystallise ebselen with impase allowed partial oxidation to form the cys -ebselen seleno-sulfide bond (fig .). in our structure cys has some surface accessibility when reduced, but when oxidised forms a disulphide with cys ( figure ). cys has no surface accessibility whether oxidised or reduced. a partial cys -cys disulphide is also observed in gj (kraft et al., ).

cysteine residues in impase with ebselen bound to cys (pdb entry zk ). impase is shown with a ca ribbon trace and the side-chains of the seven cysteines are shown in stick on the 'red' subunit. the second subunit in the dimer is shown in cyan. a semi-transparent surface is shown; note that where the sulphurs of the cysteine residues are on the surface of the protein, the surface is yellow (cys and cys ). cys also has some surface accessibility in the monomer, but is largely buried at the dimer interface, so no yellow is visible for cys in this figure. 
previous research suggested that cys is the primary reactive cysteine residue (knowles et al., ) , supported by reduced inhibition of c a impase by ebselen (singh et al., ) . in the structure reported here (pdb entry zk ), and other human impase structures, cys is largely buried, and therefore seems an unlikely target for modification, as it is unclear as to how ebselen would gain access. if the side-chain of cys is modified by ebselen it would likely lead to substantial reorganisation of the protein structure; asp coordinates an active site metal ion. from our structure, it appears that cys is likely to be the primary binding site of ebselen. the sulfur group of this residue is exposed on the surface of impase (figure ) , and this residue is not in close proximity to another cysteine residue, in either the monomeric or dimer form, so unlikely to form a disulphide bond (pdb entry zk ). cys is conserved in mammals (knowles et al., ; singh et al., ) , and an analogous cysteine residue (cys ) is present in staphylococcus aureus impase. given this level of conservation, it is probable that cys is a functionally important residue in impase (dutta et al., ) . cys has previously been shown to be a reactive cysteine residue, demonstrated through affinity for the thiol probes pyrene-maleimide (greasley et al., ) and n-ethylmaleimide (knowles et al., ) . in this structure, ebselen is in an open ring conformation with the selenium atom forming a seleno-sulfide bond with the sulfur group of cys , which is consistent with the known binding mechanism of ebselen (capper et al., ) . each monomer of impase in this structure has a single ebselen molecule bound to cys (pdb entry zk ). whilst cys conservation would suggest a critical role of this residue, the exact function remains unclear. the residue is not in close proximity to the active site, therefore the residue is unlikely to be involved in catalytic activity. 
one possibility is that the residue may undergo post translational modification in-vivo, and may be redox active (marino & gladyshev, ), linking ebselen to therapeutic effects as a known antioxidant. figure shows the position of cys on the subunits a and b in an impase dimer. the two ebselen molecules covalently bound to each dimer localise with two other ebselen molecules on a second dimer. the ebselen molecules on each dimer increase contacts with a neighbouring dimer, which gives a tetramer with ~ symmetry in the crystal. however, as shown in figure (panels g and h), the active site still appears accessible. interestingly, other members of the impase superfamily, including fructose- , -bisphosphatase (fbpase), have been observed to have both dimer and tetramer forms (hines et al., ). tetrameric forms of impase have also been observed in the anaerobic hyperthermophilic eubacterium t. maritima (stieglitz et al., ).

showing that the three metal ions (grey/black spheres) at each active site are still accessible in the tetramer. in view (h), a surface is shown on both dimers.

the impase crystal structure that is presented (pdb entry zk ) has ebselen covalently attached to cys ; however, it is not clear to what extent this binding brings about ebselen's inhibitory effects on impase. it is possible that the modification on cys is biologically relevant, and that this is a cysteine residue that is modified in vivo by ebselen, in preference to the previously suggested residue cys (singh et al., ). cys has been shown to be a reactive cysteine residue (greasley et al., ), so it is possible that ebselen binding at cys causes inhibition of impase. however, the binding of ebselen does not affect the conformation of the active site or prevent dimer formation; with the high conservation of cys across species, it is likely to be a functional residue with a potential regulatory redox role. 
there is evidence that modulation of impase away from the active site and dimer interface can affect activity; synthetic peptides that disrupt impase-calbindin interactions prevent calbindin mediated activation of impase (noble et al., ) and mediate antidepressant-like effects in mice (levi et al. ) . it is possible that ebselen interferes with accessory protein binding, possibly by formation of the tetramer seen in the crystal, to moderate the activity of impase in-vivo. however, the possibility that the conditions used in crystallisation do not reflect physiological conditions and that, in vivo, cys and cys could be modified differently by ebselen, cannot be ruled out. the binding of ebselen to cys does not appear to have significantly altered the structure of the dimer or the active site. should the ebselen/impase tetramers observed prove to be biologically relevant, this suggests a new mechanism for the regulation and subsequent inhibition of impase that could be utilised in the development of novel therapeutics. we are especially grateful to dr. pierre rizkullah and his colleagues for transporting and carrying out data collection on our crystals. we also thank diamond light source ltd (didcot, uk) for access to synchrotron radiation on beamline i . we thank gareth wright for helpful discussions. key: cord- -zh cjk authors: ferraro, francesco; costa, joana r.; ketteler, robin; kriston-vizi, janos; cutler, daniel f. title: modulation of endothelial organelle size as an antithrombotic strategy date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zh cjk it is long-established that von willebrand factor (vwf) is central to haemostasis and thrombosis. endothelial vwf is stored in cell-specific secretory granules, weibel-palade bodies (wpbs), uniquely rod-like exocytic organelles generated in a wide range of lengths ( . to . µm). 
it has been shown that wpb size responds to physiological cues and pharmacological treatment and that, under flow, vwf secretion from shortened wpbs produces a dramatic reduction of platelet and plasma vwf adhesion to an endothelial surface. wpb-shortening therefore represents a novel target for antithrombotic therapy acting via modulation of vwf adhesive activity. to this aim, we screened a library of licenced drugs and identified several that prompt wpb size reduction. these compounds therefore constitute a novel set of potentially antithrombotic compounds. summary the size of the endothelial secretory granules that store von willebrand factor correlates with its activity, central to haemostasis and thrombosis. here, human-licenced drugs that reduce the size of these secretory granules are identified, providing a set of novel potential anti-thrombotic compounds. endothelial von willebrand factor (vwf) plays a fundamental role in haemostasis, with deficiencies in its activity causing von willebrand disease (vwd), the most common inherited human bleeding disorder (sadler, ) . vwf is a large multi-domain glycoprotein, whose function in haemostasis depends on its multimeric status. vwf multimers act as mechano-transducers, which respond to shear forces in the circulation by stretching open and exposing binding sites for integrins, collagen, platelets and homotypic interaction (i.e., between vwf multimers) (ruggeri and mendolicchio, ) . endothelial cells secrete vwf in a highly multimerized form, known as ultra-large (ul)-vwf, highly sensitive to haemodynamic forces and thus very active in platelet binding (zhang et al., ) . ul-vwf's potential to cause spontaneous thrombus formation is controlled by a circulating protease, adamts , which generates the less-multimerised, less active forms of vwf seen in plasma (zhang et al., ) . 
persistence of ul-vwf in the circulation leads to microvascular thrombosis and the highly morbid and potentially life threatening clinical manifestations observed in a host of infectious and non-infectious diseases, such as sepsis and thrombotic thrombocytopenic purpura (ttp) (chang, ; tsai, ) . although these may be the most extreme examples of excess vwf function, many common disorders including hypertension and diabetes are characterised by increased vwf plasma levels (apostolova et al., ; westein et al., ) . while vwf is stored in the secretory granules of both platelets and endothelial cells, most of the vwf circulating in plasma derives from endothelial wpbs and is fundamental to haemostasis (kanaji et al., ) . vessel injury, in either the macro-or microcirculation, triggers localized stimulated exocytosis of wpbs, mediated by a variety of agonists (mccormack et al., ) . ul-vwf secreted from activated endothelium forms cable-like structures built of multiple multimers, both in vitro and in vivo (arya et al., ; rybaltowski et al., ) . these vwf "strings" provide scaffolds for the recruitment of circulating platelets and soluble plasma vwf, contributing to the formation of the primary haemostatic plug, but also potentially promoting microangiopathy (nicolay et al., ; ruggeri and mendolicchio, ) . the length of vwf strings generated upon exocytosis reflects the size of the wpbs in which vwf was stored. wpbs are cigar-shaped organelles, whose length ranges ten-fold, between . and . µm. their size depends on the structural status of the endothelial golgi apparatus where they form, and experimental manipulations causing golgi fragmentation consistently result in the formation of only short wpbs (ferraro et al., ) . short wpbs also form when vwf biosynthesis by endothelial cells is reduced, or following statin treatment, via golgi fragmentation-independent and -dependent mechanisms, respectively (ferraro et al., ; ferraro et al., ) . 
the metabolic status of endothelial cells also regulates wpb size through an ampk-mediated signalling pathway (lopes-da-silva et al., ). importantly, in vitro experiments have revealed that reducing wpb size results in the shortening of the vwf strings they generate and in much-reduced recruitment of platelets and soluble circulating vwf to the endothelial surface (ferraro et al., ; ferraro et al., ). conversely, endothelial cells respond to raised glucose levels (mimicking hyperglycemia) by producing longer wpbs, suggesting a link between long vwf strings and thrombotic manifestations in diabetes (lopes-da-silva et al., ), which is often associated with high levels of plasma vwf and microangiopathy (westein et al., ). wpb size is, therefore, plastic and responds to physiological cues and pharmacological treatments. such findings suggest that drug-mediated reduction of wpb size might provide alternative or coadjuvant therapeutic approaches to current clinical interventions in thrombotic pathologies where dysregulated formation and/or prolonged persistence of vwf strings on vascular walls play a triggering role. we designed a screen to identify drugs that reduce wpb size and thus can potentially reduce endothelial pro-thrombotic capacity. out of human licensed drugs we found compounds fitting our criteria, with a variety of mechanisms of action, suggesting that a number of pathways influence biogenesis of wpbs. a quantitative high-throughput microscopy-based workflow, dubbed high-throughput morphometry (htm), allows rapid quantification of the size of tens to hundreds of thousands of wpbs within thousands of endothelial cells (ferraro et al., ). htm has been applied for analytical purposes and in phenotypic screens (ferraro et al., ; ferraro et al., ; ketteler et al., ; lopes-da-silva et al., ; stevenson et al., ). 
in the present report, htm was deployed to identify compounds that can induce a reduction of wpb size in human umbilical vein endothelial cells (huvecs). wpbs longer than µm represent ~ % of these vwf-storing organelles, but contain roughly % of all endothelial vwf. thus, while a minority, these long wpbs are disproportionally important with respect to secreted endothelial vwf (ferraro et al., ) . for the purpose of the screen, we quantified wpb size as the ratio between the area covered by wpbs longer than µm and the area covered by all these organelles; we define this parameter as "fractional area (fa) of long wpbs" (figure a ; (ferraro et al., ) . the effect of each compound in the library on wpb size was compared to two controls: treatment with dmso and nocodazole, the negative and positive controls, respectively, for wpb shortening (ferraro et al., ; ferraro et al., ) (figure b and c) . huvecs were incubated with the compounds of the prestwick library at µm for h in single wells of -well plates, in duplicate (two separate plates; figure d ). after fixation, immuno-staining and image acquisition, the fa of long wpbs in treated cells was quantified. per plate and inter-plate normalization to dmso-controls was implemented in order to rank the effects of the library compounds on wpb size by z-score, (figure e ). hit selection. statins are cholesterol-lowering drugs that inhibit the enzyme -hydroxy, methylglutaryl coa reductase (hmgcr) in the mevalonate pathway, upstream of the biosynthesis of cholesterol. we have previously shown that treatment of endothelial cells with two statins, simvastatin and fluvastatin, induces wpb size shortening, resulting in reduced adhesive properties of the vwf released by activated endothelial cells (huvec), measured by the reduced size of platelet-decorated vwf strings and by the recruitment of vwf from a flowing plasma pool (ferraro et al., ) . fluvastatin and simvastatin are present in the prestwick library. 
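the screen metric described above (fractional area of long wpbs, ranked by z-score against dmso controls) can be sketched in a few lines. this is a minimal illustration under stated assumptions, not the htm pipeline itself: the function names, the input layout (one list of `(length, area)` pairs per well) and the z-score definition (deviation from the mean of the dmso wells on the same plate, in units of their sample standard deviation) are all assumptions, and the length cutoff is left as a parameter because its value is not recoverable here.

```python
from statistics import mean, stdev

# Arbitrary placeholder for the example only; NOT the cutoff used in the screen.
EXAMPLE_CUTOFF_UM = 1.0


def fractional_area_long(wpbs, length_cutoff_um):
    """Fractional area (FA) of long WPBs for one well.

    wpbs: iterable of (length_um, area) pairs, one per segmented organelle.
    Returns the area of organelles longer than the cutoff divided by the total area.
    """
    wpbs = list(wpbs)
    total_area = sum(area for _, area in wpbs)
    if total_area == 0:
        return float("nan")  # no organelles detected in this well
    long_area = sum(area for length, area in wpbs if length > length_cutoff_um)
    return long_area / total_area


def z_scores_vs_dmso(fa_by_well, dmso_wells):
    """Z-score each well's FA against the DMSO control wells of the same plate.

    Assumed definition: (FA - mean of DMSO wells) / sample SD of DMSO wells.
    Negative scores indicate a shift towards shorter WPBs.
    """
    controls = [fa_by_well[w] for w in dmso_wells]
    mu, sd = mean(controls), stdev(controls)
    return {well: (fa - mu) / sd for well, fa in fa_by_well.items()}


# Toy well: one short and one long organelle.
fa = fractional_area_long([(0.5, 1.0), (3.0, 3.0)], EXAMPLE_CUTOFF_UM)

# Toy plate: two DMSO wells and one treated well.
plate = {"dmso_1": 0.5, "dmso_2": 0.7, "drug_x": 0.2}
z = z_scores_vs_dmso(plate, ["dmso_1", "dmso_2"])
print(fa, z["drug_x"])
```

expressing each well relative to the controls on its own plate is one simple way to make wells from different plates comparable before ranking; the inter-plate normalization actually used in the screen may differ from this sketch.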
we therefore used their z-score to establish a stringent cutoff for selection of positive hits. this approach identified compounds, . % of the library (figure e, orange box, simvastatin and fluvastatin). of note, aside from simvastatin and fluvastatin, the other three statins present in the prestwick library were among the selected hits (table ), indicating that both our screening approach and the criterion for hit selection are robust. hit pharmacology. forty-one ( %) of the drugs identified could be allocated to ten pharmacological classes, each including at least three compounds (table ; and supplemental table ), consistent with a variety of mechanisms of wpb size control. some of the compounds are shared by more than one class (table and figure a), an indication that, aside from their known molecular targets, these drugs may exert their effect on wpb size through a yet unidentified common cellular pathway. among the pharmacological classes capable of inducing wpb shortening, we found microtubule (mt) depolymerizing agents, histone deacetylase (hdac) inhibitors, topoisomerase inhibitors and, as mentioned earlier, statins (hmgcr inhibitors). compounds with these mechanisms of action have been shown to induce unlinking of the ribbon architecture of the golgi apparatus, i.e. golgi "fragmentation" (farber-katz et al., ; ferraro et al., ; gendarme et al., ; thyberg and moskalewski, ). since an intact golgi ribbon is required for the biogenesis of long wpbs (ferraro et al., ) (see nocodazole in figure b), identification of these classes of molecules in our screen was expected. work from our lab also showed that neutralization of the acidic lumen of wpbs disrupts the tubular structure of vwf, shifting the organelle shape from cylindrical to spherical, which is therefore detected as shortening (michaux et al., ); and, indeed, we identified transmembrane ph gradient depleting agents. 
reduction in vwf biosynthesis results in shorter wpbs without affecting the golgi architecture (ferraro et al., ) and one of the screen hits, cycloheximide, is a classic protein synthesis inhibitor. aside from those expected, entirely novel wpb-shortening drug classes were also identified, including neurotransmitter receptor antagonists and cardiac glycosides. interestingly, several compounds, besides their known mechanism of action, also inhibit multidrug resistance protein , mrp /abcb (table , figure a and supplemental table ), which may hint at the common cellular pathway discussed above. the cardiac glycosides and their cardenolide precursors were prominent among these novel pharmacological classes. these molecules, which inhibit na + /k + -atpase (table and (figures b and a) instead of its fragmentation, suggestive of a novel wpb size-reducing mechanism. in most cases, sub-micromolar concentrations of these compounds were sufficient to reduce wpb size (figure b). upon exocytosis, endothelial ul-vwf self-assembles into strings, which serve as a recruiting platform for platelets and circulating vwf, thus promoting the formation of the primary haemostatic plug (ferraro et al., ; ruggeri, ; varga-szabo, ). vwf-strings also mediate pathological processes such as tumour metastasis, endocarditis and microangiopathies (bauer et al., ; nicolay et al., ; pappelbaum et al., ). interventions reducing the persistence and/or activity of vwf-strings are therefore of interest as potential anti-thrombotic therapies. modulation of organelle size has been suggested as a potential strategy to regulate biological functions and correct pathological states (marshall, ). in this context, wpbs represent a paradigmatic example. reduction of wpb size has no effect on ul-vwf formation, but blunts generation of long platelet-decorated vwf-strings following exocytosis and recruitment of plasma vwf to the endothelial surface (ferraro et al., ; ferraro et al., ). 
interventions that shorten wpbs could therefore provide alternative or coadjuvant therapies to clinical interventions in thrombotic pathologies where dysregulated formation and/or prolonged persistence of vwf-strings play a triggering role. apart from pharmacological treatments and other experimental manipulations disrupting the integrity of the golgi apparatus (ferraro et al., ; ferraro et al., ), formation of short wpbs is mediated by endogenous signalling pathways. we have uncovered a pathway involving ampk-dependent regulation of the arf-gef gbf , which is (dekker et al., ; fledderus et al., ; kumar et al., ; lin et al., ). treatment with statins also up-regulates klf expression; and their anti-inflammatory, anti-coagulant and antithrombotic effects are believed to be mediated by this transcription factor (sen-banerjee et al., ). however, wpb size reduction induced by statin treatment does not require klf (ferraro et al., ). altogether, a significant body of experimental evidence indicates that the size of wpbs is subject to regulation and represents a target for pharmacological intervention in haemostatic function and thrombotic risk. we therefore screened human-licenced drugs with the aim of identifying wpb size-reducing molecules that could be rapidly repurposed as antithrombotics. we found fifty-eight drugs with this activity, the majority of which can be grouped into pharmacological classes. some of these classes, such as microtubule depolymerizing agents and statins, have been identified by previous work (ferraro et al., ; ferraro et al., ). others might be expected, due to their effects on the golgi ribbon, as in the case of hdac and topoisomerase inhibitors (farber-katz et al., ; gendarme et al., ). our screen also identified compounds with pharmacology previously unknown to affect wpb biogenesis and size. together, these findings suggest that several cellular pathways can modulate the size of wpbs produced by endothelial cells. 
multidrug resistance protein (mrp ) is an organic anion transporter. its upregulation is responsible for the development of tumor resistance to chemotherapy, hence its name. while this activity towards xenobiotics was the first to be identified, it has become clear that mrp is also involved in the cellular efflux of endogenous molecules, mediates pro-inflammatory signalling pathways and may act as an oxidative stress sensor (cole, ). interestingly, twenty-two compounds, with varied mechanisms of action (table and supplemental table ), have also been described as mrp inhibitors. this suggests the possibility that, in addition to their main molecular target, these drugs could affect the efflux of endogenous mrp signalling substrates, which regulate wpb size; a mechanism worth future investigation. since the screen endpoint was h, the drugs listed are relatively fast-acting. except for statins (ferraro et al., ) and cardiac glycosides (see above), drug activity was documented only at the single concentration used in our screen ( µm). while such concentrations are unlikely to be used for patient administration, it is worth noting that submicromolar simvastatin treatment for h does reduce wpb size, with dramatic effects on both platelet recruitment and plasma vwf adhesion to the stimulated endothelium (ferraro et al., ). it therefore cannot be ruled out that several of the drugs identified by the screen would maintain wpb size-reducing activity at lower concentrations, compatible with their use in the clinic. statins rapidly produce anti-inflammatory and anticoagulant effects on the endothelium (greenwood and mason, ) and their acute administration in the context of percutaneous angioplasty greatly reduces post-operative myocardial infarctions (leoncini et al., ).
the compounding of these fast-acting effects, wpb size-reduction included, suggests that statins may represent a promising tool for acute, emergency treatment in endotheliopathies involving inflammation, coagulation and thrombosis. as a class, the most potent wpb-shortening drugs identified in the screen, active also at sub-micromolar concentrations (figure b and b) , were the cardiac glycosides and cardenolides. with the caution due to their known dose-dependent cytotoxicity (kanji and maclean, ) , cardiac glycosides may therefore be worth exploring in acute and chronic antithrombotic therapies. further to the potential toxicity associated with administration of drugs at high concentrations, we note that in vitro combination of wpb-size reducing treatments, acting through different mechanisms, can display synergy in the abatement of plasma vwf recruitment to the endothelial surface (supplemental figure ) . in a clinical setting, wpb shortening and its consequent antithrombotic effects might therefore be achieved by administering combinations of drugs at lower, non-toxic concentrations. in conclusion, here, we report a set of licenced drugs with potential antithrombotic activity, via a novel mechanism: the reduction of wpb size and its consequence in terms of reduced adhesion of platelets and circulating plasma vwf to the ul-vwf they release. µg/ml endothelial cell growth supplement from bovine neural tissue and u/ml heparin (both from sigma-aldrich). cells were cultured at °c, % co , in humidified incubators. reagents. the library compounds ( fda-approved drugs), from prestwick chemical, were stored at - °c as mm stock solutions in dmso. for the screen, compounds were transferred to echo qualified -well low dead volume microplates using the echo® acoustic dispenser (both from labcyte). 
antibodies used in this study were: rabbit polyclonal anti-vwf pro-peptide region (hewlett et al., ) huvecs were seeded in -well plates and pre-processed as described for the screen. cardiac glycosides were added to cells using the echo® acoustic dispenser (labcyte) and cells were treated as described above. the concentration of each compound ranged from nm to µm in -fold increments across the wells of the plate column. duplicate plates were prepared. each plate contained dmso and nocodazole controls as detailed for the screen. treatment was for h, at the end of which cells were fixed as described for the screen. immunostaining and image acquisition. fixed cells were processed for immunostaining as previously described (ferraro et al., ). wpbs were labelled using a rabbit polyclonal antibody to the vwf pro-peptide region. the golgi apparatus was labelled with an anti-gm mab. primary antibodies were detected with alexa fluor dye-conjugated antibodies (life technologies). nuclei were counterstained with (life technologies). images ( fields of view per well) were acquired with an opera high content screening system (perkin elmer) using a x air objective (na . ). high-throughput morphometry (htm) workflow. image processing and extraction of wpb morphological parameters (high-throughput morphometry, htm) have been described in detail elsewhere (ferraro et al., ). wpb size was expressed per well (i.e., summing the values measured in the fields of view) as the fraction of the total wpb area covered by wpbs > µm. dose-response data were analysed in r, using the drc package by christian ritz and jens c. streibig (https://cran.r-project.org/package=drc). the drm() function was used to fit a dose-response model, a four-parameter log-logistic function (ll. ), to each dataset; four parameter values were calculated: slope, lower limit, upper limit and ec value. plasma vwf recruitment assays.
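the four-parameter log-logistic model named in the dose-response analysis can be written out explicitly. the sketch below is not the authors' r code: it only evaluates the curve in the parameterisation that the drc package documents for its `LL.4` function, f(x) = c + (d - c) / (1 + exp(b (log x - log e))), with slope b, lower limit c, upper limit d and half-maximal dose e. the function name `ll4` and all numeric values are hypothetical and chosen purely for illustration.

```python
import math


def ll4(dose, slope, lower, upper, ec50):
    """Four-parameter log-logistic response at a strictly positive dose.

    With slope > 0 the response falls from `upper` (low dose) towards
    `lower` (high dose), and equals their midpoint when dose == ec50.
    """
    if dose <= 0:
        raise ValueError("dose must be strictly positive")
    return lower + (upper - lower) / (
        1.0 + math.exp(slope * (math.log(dose) - math.log(ec50)))
    )


# Hypothetical parameters: fraction of WPB area in long organelles
# dropping from 0.6 to 0.1, with a half-maximal dose of 0.05 (arbitrary units).
params = dict(slope=1.5, lower=0.1, upper=0.6, ec50=0.05)

# Illustrative dilution series (not the concentrations used in the study).
for dose in (0.001, 0.01, 0.05, 0.25, 1.0, 10.0):
    print(f"dose={dose:<6} response={ll4(dose, **params):.3f}")
```

fitting the four parameters to measured well values is a separate optimisation step, which in the study was delegated to drm(); the sketch only makes the fitted curve's shape and the meaning of each parameter concrete.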
sirna nucleofections, drug treatment and human plasma perfusion experiments on huvec monolayers were carried out as previously described (ferraro et al., ).
table . licenced drugs with wpb-shortening activity. individual drugs were assigned to pharmacological classes, indicated by numbers, based on their mechanism of action (see appendix table). , cardiac glycoside or cardenolide (na+/k+-atpase inhibitor); , statin ( -hydroxy, -methylglutaryl coa reductase, hmgcr, inhibitor); , topoisomerase inhibitor/dna-intercalating agent; , serotonin ( -ht) receptor antagonist; , histone deacetylase (hdac) inhibitor; , dopamine receptor antagonist; , histamine receptor antagonist; , microtubule depolymerizing compound; , ph gradient-depleting compound; , multidrug resistance protein (mrp /abcb ) inhibitor.
two plates with identical drug layout (biological duplicates) were analyzed. e. screen results. z-scores of the library drugs are plotted. hit drugs were selected based on the effects of fluvastatin and simvastatin, which we previously showed to reduce wpb size. fluvastatin, with the lowest z-score, was used as cut-off for the selection of hit compounds.
graphical summary of the known mechanism of action (moa) of the hits reported in the supplemental table . each pharmacological class is depicted by a square, whose area is proportional to the number of compounds it includes. lines connecting the squares represent drugs shared between pharmacological classes. the thickness of the lines indicates the number of drugs shared. b. drug classes ranked by potency, using their median z-score in the screen. each data-point represents one drug and classes are color-labeled based on their known, inferred from the literature and previously unknown (novel) effects on wpb size.
table pharmacology of hit compounds. information regarding the mechanism of action of hit drugs was searched in pubmed and drugbank (https://www.drugbank.ca/).
forty-one compounds can be assigned to ten pharmacological classes, each containing at least drugs (see table ). no drugbank record. it is an ionophore antibiotic (na+/h+ antiporter) produced by the fungus streptomyces cinnamonensis. depletes the transmembrane ph gradient of the golgi apparatus and acidic organelles. drugbank; agonist of serotonin receptor ht . antagonist of ht a, ht b and ht c serotonin receptors. inhibitor of sodium-dependent serotonin transporter slc a . drugbank: it inhibits tubulin beta- chain and tubulin beta chain(tubb , tubb) and microtubule polymerization. inhibits several cyp enzymes and transporters. inhibits multidrug resistance protein (abcb ) drugbank: cardiac glycoside. it consists of three sugars and the aglycone digoxigenin. inhibits na/k atpase alpha- (atp a ). substrate, inhibitor and inducer of multidrug resistance protein (abcb ). drugbank: antiemetic, it is a substance p/neurokinin (nk ) receptor antagonist. drugbank: a non-benzodiazepine that is used in the management of anxiety. binds to benzodiazepine receptors, which interact allosterically with gaba receptors, potentiating the effects of the inhibitory neurotransmitter gaba. binds and is an agonist of transporter protein (tspo) which promotes the transport of cholesterol across mitochondrial membranes and may play a role in lipid metabolism (pubmed: ), but its precise physiological role is controversial. tspo is apparently not required for steroid hormone biosynthesis, was initially identified as peripheral-type benzodiazepine receptor and can also bind isoquinoline carboxamides drugbank; mucolytic agent. excessive nitric oxide (no) is associated with inflammation of airways. no enhances the activation of soluble guanylate cyclase and cgmp accumulation. ambroxol has been shown to inhibit the nodependent activation of soluble guanylate cyclase. wikipedia; ambroxol is a potent inhibitor of the neuronal na+ channels. it also activates lysosomal enzyme glucocerebrosidase. 
ambroxol can also diffuse into lysosomes and induce ph neutralization. ambroxol and its parent drug bromhexine have been shown to induce autophagy in several cell types. adenosine 'monophosphate drugbank; several functions, as it is a central molecule in metabolism and signalling. activates ampk methyldopa (l,-) drugbank; agonist of alpha- adrenergic receptor with both central and peripheral nervous system effects. its primary clinical use is as an antihypertensive agent. targets alpha- a adrenergic receptor (adra a) and inhibits aromatic-l-amino-acid decarboxylase (ddc) and the transporter solute carrier family member (scl a ) specific for dipeptides. drugbank; semisynthetic derivative of podophyllotoxin. exhibits antitumor activity. it inhibits dna synthesis by forming a complex with topoisomerase ii and dna, inducing breaks in double stranded dna and preventing repair by topoisomerase ii binding. accumulated breaks in dna prevent entry into the mitotic phase of cell division and lead to cell death. inhibits dna topoisomerase -alpha (top a) and dna topoisomerase -beta (top b). cyclosporin a drugbank: lipophilic cyclic polypeptide formed by amino acids with immunosuppressive and immunomodulatory properties. essentially, it is a calcineurin inhibitor and this activity allows for inhibition of t cell activation. it binds to the intracellular receptor cyclophilin- forming a ciclosporincyclophilin complex that inhibits calcineurin preventing the dephosphorylation and activation of the nuclear factor of activated t cells (nf-at). the nf-at is a transcription factor that regulates the production of pro-inflammatory cytokines such as il- , il- , interferon-gamma and tnf-alpha. cyclosporin inhibits calcineurin regulatory subunit b type (ppp r ) and binds peptidylprolyl cis-trans isomerase a and f (ppia, ppif), calcium signal-modulating cyclophilin ligand camlg. it is a substrate, inhibitor and inducer of multidrug resistance protein (abcb ). 
drugbank: hydroxymethylglutaryl coenzyme a (hmg-coa) reductase inhibitor. also inhibits dipeptidyl peptidase (dpp ). agonist of aryl hydrocarbon receptor (ahr). inhibits multidrug resistance protein (abcb ). it has been shown that other statins inhibit this transporter (pubmed: ). inhibits solute carrier organic anion transporter family member a , b (slco a , slco b ). it is a substrate of several transporters. drugbank: inhibitor of hydroxymethylglutaryl coenzyme a (hmg-coa) reductase (hgmcr). the first statin discovered. isolated from penicillium citinium. it also inhibits liver carboxylesterase , ces (a triglyceride lipase). drugbank: triazole antifungal agent. inhibits cytochrome p- -dependent enzymes resulting in impaired ergosterol synthesis. it has been used against histoplasmosis, blastomycosis, cryptococcal meningitis & aspergillosis. inhibits lanosterol -alpha demethylase (cyp a ) and several other cyp enzymes. inhibits multidrug resistance proteins (abcb ) and slco b . wikipedia. among the triazole antifungal agents is the only one shown to inhibit the hedgehog pathway and angiogenesis. the antiangiogenic activity was shown to be linked to inhibition of glycosylation, vegfr phosphorylation, trafficking and cholesterol biosynthesis pathways. like cyclosporine, quinidine and clarithromycin (and statins), can inhibit p-glycoproteins (multidrug resistance proteins, abcbs) causing drug-drug interactions by reducing elimination and increasing absorption of organic cation drugs. drugbank: vorinostat, known also as suberoylanilide hydroxamic acid (saha), is currently under investigation for the treatment of cutaneous t cell lymphoma (ctcl), a type of skin cancer. it is the first in a new class of agents known as histone deacetylase inhibitors. inhibits histone deacetylase , , , and (hdac , hdac , hdac , hdac and hdac ). hdac inhibitors and dna-damaging (dna-intercalating) agents were identified as novel golgi disruptors (see pmid: ). 
note; clemizole has been identified as a potential anti-epileptic through its action on serotonin receptors (see annotation of clemizole for reference); among the papers citing this study, one identifies vorinostat as an anti-epileptic (pmid: ) clomiphene drugbank: it is an estrogen agonist or antagonist depending on the target tissue. estrogen receptor (esr ) agonist or antagonist. inhibits some cytochrome p (cyps) enzymes. drugbank: long-acting, non-sedating second generation antihistamine used in the treatment of allergy symptoms. antagonist of histamine h receptor (hrh ). inhibitor of potassium voltage-gated channel subfamily h member (kcnh ). also acts (with unclarified pharmacological action) on potassium voltage-gated channel subfamily h member (kcnh ) and microtubuleassociated protein tau (mapt). inhibits multidrug resistance protein (abcb ) perhexiline drugbank: coronary vasodilator used especially for angina. it may cause neuropathy and hepatitis. inhibits carnitine o-palmitoyltransferase , liver isoform (cpt a), carnitine o-palmitoyltransferase , mitochondrial (cpt ). acts (no established pharmacology) on potassium voltage-gated channel subfamily h member (kcnh ). drugbank: monocationic surface-active agent with surfactant and detergent properties. it is widely used to enhance dispersion and penetration of cellular debris and exudate, thereby promoting tissue contact of the administered medication. inhibits v-atpase, v-type proton atpase subunit c (atp v c ). no drugbank record alclometasone drugbank: synthetic glucocorticoid steroid for topical use in dermatology as anti-inflammatory, antipruritic, antiallergic, antiproliferative and vasoconstrictive agent. agonist of glucocorticoid receptor (nr c ). drugbank: anthelmintic effective for pinworms; no targets reported. wikipedia: it is an anthelmintic and has been shown that the pamoate salt has preferential toxicity for various cancer cell lines during glucose starvation. 
drugbank: a medication used as a photosensitizer for photodynamic therapy to eliminate the abnormal blood vessels in the eye associated with conditions such as the wet form of macular degeneration. no molecular targets reported. no drugbank record. wikipedia: antagonist of hrh . recently, it was identified through a phenotypic screen as described as a ht receptor antagonist (pmid: ) fluvastatin drugbank: hydroxymethylglutaryl coenzyme a (hmg-coa) reductase inhibitor. inhibits -hydroxy- -methylglutaryl-coenzyme a reductase (hmgcr). figure . synergistic effects of wpb-shortening treatments on plasma vwf adhesion. wpb size was reduced by two treatments, simvastatin incubation and reduced vwf synthesis (ferraro et al., ; ferraro et al., ) , alone or in combination. huvecs were nucleofected with sirnas targeting luciferase (negative control) or vwf and seeded in µ-slides vi (ibidi). at h postnucleofection, cells were treated with dmso or . µm simvastatin. after h drug treatment, and h post-nucleofection, cells were exposed to histamine to stimulate vwf secretion, while perfused with pooled human plasma and then fixed under flow (as described in ferraro et al., ) . vwf on the endothelial surface was detected by immunofluorescence; the area covered by its signal measures the extent of its adhesion. each data-point represents the quantification of a field of view; median and interquartile ranges are shown. ****, p < . , mann-whitney test. 
blue asterisks:
platelet aggregation in malignant melanoma of mice and humans
sepsis and septic shock: endothelial molecular pathogenesis associated with vascular microthrombotic disease
multidrug resistance protein (mrp , abcc ), a "multitasking" atp-binding cassette (abc) transporter
klf provokes a gene expression pattern that establishes functional quiescent differentiation of the endothelium
dna damage triggers golgi dispersal via dna-pk and golph
a two-tier golgi-based control of organelle size underpins the functional plasticity of endothelial cells
weibel-palade body size modulates the adhesive activity of its von willebrand factor cargo in cultured endothelial cells
prolonged shear stress and klf suppress constitutive proinflammatory transcription through inhibition of atf
image-based drug screen identifies hdac inhibitors as novel golgi disruptors synergizing with jq
statins and the vascular endothelial inflammatory response
temperature-dependence of weibel-palade body exocytosis and cell surface dispersal of von willebrand factor and its propolypeptide
contribution of platelet vs. endothelial vwf to platelet adhesion and hemostasis
cardiac glycoside toxicity: more than years and counting
image-based sirna screen to identify kinases regulating weibel-palade body size control using electroporation
tumor necrosis factor alpha-mediated reduction of klf is due to inhibition of mef by nf-kappab and histone deacetylases
statin treatment before percutaneous coronary intervention
kruppel-like factor (klf ) regulates endothelial thrombotic function
a gbf -dependent mechanism for environmentally responsive regulation of er-golgi transport
organelle size control systems: from cell geometry to organelle-directed medicine
weibel-palade bodies at a glance
the physiological function of von willebrand's factor depends on its tubular storage in endothelial weibel-palade bodies
cardiac glycosides as novel cancer therapeutic agents
cellular stress induces erythrocyte assembly on intravascular von willebrand factor strings and promotes microangiopathy
ultralarge von willebrand factor fibers mediate luminal staphylococcus aureus adhesion to an intact endothelial cell layer under shear stress
statins exert endothelial atheroprotective effects via the klf transcription factor
the role of von willebrand factor in thrombus formation
interaction of von willebrand factor with platelets and the vessel wall
in vivo imaging analysis of the interaction between unusually large von willebrand factor multimers and platelets on the surface of vascular wall
biochemistry and genetics of von willebrand factor
kruppel-like factor as a novel mediator of statin effects in endothelial cells
g protein-coupled receptor kinase moderates recruitment of thp- cells to the endothelium by limiting histamine-invoked weibel-palade body exocytosis
microtubules and the organization of the golgi complex
identification and characterization of cardiac glycosides as senolytic compounds
pathophysiology of thrombotic thrombocytopenic purpura
the shear stress-induced transcription factor klf affects dynamics and angiopoietin- content of weibel-palade bodies
cell adhesion mechanisms in platelets
thrombosis in diabetes: a shear flow effect?
mechanoenzymatic cleavage of the ultralarge vascular protein von willebrand factor
affects (no defined mechanism) integrin beta- (itgb ) and inhibits integrin alpha-l (itgal)
high affinity nerve growth factor receptor (ntrk ), macrophage colony-stimulating factor receptor (csf r), platelet-derived growth factor receptor alpha (pdgfra), platelet-derived growth factor receptor beta (pdgfrb)
it is a substrate and inhibitor of multidrug resistance protein (abcb ) and other transporters of the abc family and slc family
has no effect on the adrenergic system or central nervous system, but may antagonize histamine and interfere with acetylcholine release locally
wikipedia: a vasodilator that acts as an adenosine reuptake inhibitor.
used for the treatment of cardiopathy and renal disorders
used in treatment of leukaemia and other neoplasms. targets dna (dna intercalating agent). inhibits dna topoisomerase -alpha (top a) and dna topoisomerase -beta (top b)
inhibitor of multidrug resistance-associated protein (abcc ) and other abc transporters
also used as an anthelmintic and in the treatment of giardiasis and malignant effusion and in cell biological experiments as an inhibitor of phospholipase a . targets dna (dna intercalating agent).
inhibits / kda calcium-independent phospholipase a (pla g ), cytosolic phospholipase a (pla g a) antagonist of d( ) dopamine receptor (drd ), d( a) dopamine receptor (drd ), -hydroxytryptamine receptor a (htr a), -hydroxytryptamine receptor a (htr a), alpha- a adrenergic receptor (adra a), alpha- b adrenergic receptor (adra b) fluspirilene drugbank: long-acting injectable antipsychotic agent used for chronic schizophrenia fluphenazine drugbank: phenothiazine used in the treatment of psychoses with properties and uses generally similar to those of chlorpromazine ciclesonide drugbank: glucocorticoid used to treat obstructive airway diseases. agonist of glucocorticoid receptor (nr c ) antagonist of -hydroxytryptamine receptor a (htr a), -hydroxytryptamine receptor b (htr b), -hydroxytryptamine receptor c (htr c) doxorubicin drugbank: cytotoxic anthracycline antibiotic isolated from cultures of streptomyces peucetius var. caesius. targets dna (dna intercalating agent). inhibits dna topoisomerase -alpha (top a). inhibitor and inducer of multidrug resistance protein (abcb ) and other abc transporters antagonist of d( ) dopamine receptor (drd ), neuron-specific vesicular protein calcyon (caly; interacts with clathrin light chain and stimulates clathrin self assembly and endocytosis), alpha- a adrenergic receptor (adra a) azithromycin drugbank: broad-spectrum macrolide antibiotic with a long half-life and a high degree of tissue penetration. like other macrolides, it blocks bacterial protein synthesis. in human cells inhibits protein-arginine deiminase type- (padi ; involved in arginine modification of histones epirubicin drugbank: '-epi-isomer of doxorubicin. the compound exerts its antitumor effects by interference with the synthesis and function of dna. targets dna (dna intercalating agent). 
inhibits dna topoisomerase -alpha (top a) inhibits cytosolic phospholipase a (pla g a) inhibits nadh dehydrogenase [ubiquinone] subunit c (ndufc ), potassium voltage-gated channel subfamily h member (kcnh ), vascular cell adhesion protein (vcam ), e-selectin (sele) and multidrug resistance protein (abcb ). modulates hypoxia prochlorperazine drugbank: a phenothiazine antipsychotic used principally in the treatment of nausea, vomiting and vertigo antagonist of d( ) dopamine receptor (drd ), -hydroxytryptamine receptor a (htr a), -hydroxytryptamine receptor c (htr c), -hydroxytryptamine receptor (htr )
key: cord- -sh w mye authors: lu, shuai; li, yuguang; wang, fei; nan, xiaofei; zhang, shoutao title: leveraging sequential and spatial neighbors information by using cnns linked with gcns for paratope prediction date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sh w mye
antibodies, consisting of variable and constant regions, are a special type of protein playing a vital role in the immune system of vertebrates. they have the remarkable ability to bind a large range of diverse antigens with extraordinary affinity and specificity. this malleability of binding makes antibodies an important class of biological drugs and biomarkers. in this article, we propose a method to identify which amino acid residues of an antibody directly interact with its associated antigen, based on features from sequence and structure. our algorithm uses convolutional neural networks (cnns) linked with graph convolutional networks (gcns) to make use of information from both sequential and spatial neighbors and so learn more about the local environment of the target amino acid residue. furthermore, we process the antigen partner of an antibody by employing an attention layer. our method improves on the state-of-the-art methodology.
an antibody, also known as an immunoglobulin, is a y-shaped protein consisting of two light chains and two heavy chains [ ], and can bind to a specific surface of the antigen, named the epitope. the amino acid residues of an antibody directly involved in binding the epitope are called the paratope [ ]. the accurate recognition of the paratope on a given antibody would greatly improve antibody affinity maturation [ ] - [ ] and de novo design [ ] - [ ]. high-resolution structures of antibody-antigen complexes can be obtained by experimental methods such as x-ray [ ], nmr [ ] and cryo-em [ ]. however, these methods remain time-consuming and empirical [ ]. as more and more protein structures, including antibody-antigen complexes, are analyzed, machine learning-based methods can be used to predict the paratope by learning paratope-epitope interaction patterns from known antibody-antigen complex structures. according to how the neighbors of the target residue are selected for representation and prediction, machine learning-based methods can be divided into two categories: those leveraging sequential neighbors and those leveraging spatial neighbors. methods leveraging sequential neighbors use a part of the antibody sequence consisting of the target residue and additional forward and backward sequential neighbors. some select sequential neighbors from the whole antibody sequence, like the methods in [ ] - [ ], while others use only the sequence of the cdr regions [ ], [ ]. although the sequence is always available at earlier stages of an antibody discovery campaign than the structure, machine learning methods using spatial neighbors can provide a more precise definition of the paratope. in [ ], the antibody surface patch, a set of amino acid residues adjacent to each other on the antibody surface, was represented by d zernike descriptors.
the state-of-the-art method [ ] represented an antibody as a graph in which each amino acid residue was a node and the k nearest spatial neighbors were used in the convolution operator. in this work, we utilize both the sequential and the spatial neighbors of the target antibody residue by using convolutional neural networks (cnns) linked with graph convolutional networks (gcns) for paratope prediction. first, we construct an antibody residue feature matrix from sequence-based and structure-based information. next, we employ cnns, which take the residue feature matrix as input with a fixed window size, to account for the influence of sequential neighbors. then, the output of the cnns is fed directly to the gcns to learn the local environment of the spatial neighbors. finally, our program predicts the binding probability of each antibody residue. we also compare our results with those of other paratope predictors, and our framework achieves the best performance. moreover, we add an attention layer to our best-performing model in an attempt to gain more information from the antigen partner. we use the same dataset as [ ]. all the complexes in the training set were collected by [ ] from the training sets used to train the paratome [ ], antibody i-patch [ ] and parapred [ ] predictors. the complexes in the test set are fetched from the abdb database [ ]. the antibody-antigen complexes present in abdb are split into two categories depending on whether their antigen is a protein or not. in both the training and test sets, complexes with a resolution worse than Å and antibody sequences sharing more than % sequence identity are removed. the training set is further split into two disjoint sets, a reduced training set and a validation set, and the validation set is used to tune the hyperparameters of the predictive model. structures with nonprotein-binding antibodies are removed in the state-of-the-art method [ ], resulting in complexes for training, for validation and for testing.
specifically, the complexes with pdb ids ap and kve, whose antibodies have only one chain, are still retained. to construct the input matrix, we encode the d antibody sequence as a d numerical matrix with dimension (l, n ), where l is the length of the antibody sequence and n is the dimension of the residue feature vector ( here). as shown in fig. , the feature representation for amino acid residue a is denoted by x a . different components of the feature representation are denoted by superscripts. each box indicates the program used to extract a given set of features. all these features can be classified into two classes according to their source: sequence-based and structure-based. one-hot encoding: the type of amino acid residue (only the possible natural types are considered) is encoded as a dimensional vector, where each element is either or and indicates the presence of the corresponding amino acid residue. seven physicochemical parameters: these parameters describe the physicochemical properties of residues summarized by [ ]. profile features: we run psi-blast [ ] against the nonredundant (nr) [ ] database for every antibody sequence. we then obtain the pssm and psfm matrices, both with dimension (l, ), as well as a d vector of column entropies with dimension l, where l is the length of the antibody sequence. relative accessible surface area, secondary structure, and phi and psi torsion angles for each residue: these features are computed using dssp [ ]. the secondary structure has eight classes in total and is represented by one-hot encoding. half sphere amino acid composition: hsaac captures the amino acid residue composition in the direction of the side chain of a residue, defined as the number of times a particular amino acid occurs in that direction within a minimum atomic distance threshold of . Å from the residue of interest. residue depth: we calculate the average distance of the atoms of a residue from the solvent accessible surface using msms [ ].
protrusion index: the protrusion index of a non-hydrogen atom is calculated using psaia [ ] and is defined as the proportion of the volume of a sphere with a radius of . Å centered at that atom that is not filled with atoms [ ]. each element of this vector is normalized to the range from to as in [ ]. b-factor: the b-factor (or temperature factor) is an indicator of the thermal motion of an atom. we use the maximum b-factor of any atom for each residue. we represent an antibody as a graph [ ], where each residue is a node whose features represent the properties of the residue. we define the spatial neighbors of a residue as the set of k ( , in our work) closest residues, determined by the mean distance between their heavy atoms [ ]. fig. shows the sequential and spatial neighbors of a target residue. from the analyzed d structure of an antibody-antigen complex, a residue of the antibody is judged to belong to the paratope if at least one of its atoms is located within . Å of any antigen atom, as in previous methods [ ], [ ]. the sequence of the input antibody with length $l$ is considered as a set of sequential nodes $s$, and each node is represented as a d vector $s_i$. all the nodes of the antibody sequence compose a d feature matrix as described in section . . our cnns use one filter function, where the input is the window $s_{i-w:i+w}$ taken from $\{s_i\}_{i=1}^{l} = s^{(0)}$ and the output is a hidden vector, where $f$ is a non-linear activation function (e.g. relu), $w_c$ is the weight matrix, and $b_c$ is the bias vector. here we use residual connections, which act as shortcut connections between the inputs and outputs of part of a network by adding the inputs to the outputs. as a result, we apply the function to obtain a set of hidden vectors, one for every position of the residue sequence. we use the graph convolution [ ], which enables order-independent aggregation of properties over the spatial neighbors of the target residue that together contribute to the formation of a binding interface.
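the two geometric definitions above (spatial neighbors by mean heavy-atom distance, and paratope membership by a distance cut-off to the antigen) can be sketched in a few lines. this is a minimal illustration under stated assumptions, not the authors' implementation: a residue is represented simply as a list of heavy-atom coordinates, "mean distance between heavy atoms" is taken to mean the average over all atom pairs of the two residues, and the function names, the value of k and the cut-off are hypothetical placeholders (the source values are not reproduced here).

```python
import math
from itertools import product


def mean_atom_distance(res_a, res_b):
    """Mean Euclidean distance over all heavy-atom pairs of two residues."""
    dists = [math.dist(p, q) for p, q in product(res_a, res_b)]
    return sum(dists) / len(dists)


def spatial_neighbors(residues, k):
    """For each residue, indices of the k closest other residues."""
    neighbors = []
    for i, res_i in enumerate(residues):
        ranked = sorted(
            (mean_atom_distance(res_i, res_j), j)
            for j, res_j in enumerate(residues)
            if j != i
        )
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def paratope_labels(antibody, antigen_atoms, cutoff):
    """Label 1 if any atom of the residue lies within `cutoff` of any antigen atom."""
    return [
        1 if any(math.dist(a, b) <= cutoff for a in res for b in antigen_atoms) else 0
        for res in antibody
    ]


# Toy antibody: four residues on a line, the last one with two heavy atoms.
antibody = [[(0, 0, 0)], [(1, 0, 0)], [(5, 0, 0)], [(6, 0, 0), (7, 0, 0)]]
antigen_atoms = [(9, 0, 0)]

print(spatial_neighbors(antibody, k=2))               # k=2 is illustrative only
print(paratope_labels(antibody, antigen_atoms, 4.5))  # 4.5 is a hypothetical cut-off
```

the neighbor lists produced this way define the receptive field of each node in the graph, and the labels are the per-residue targets that the network is trained to predict.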
for a node s_i, the receptive field consists of k spatial neighbors g_i = {g_ij}, j = 1, ..., k, from the input graph. after processing by the cnns, the representation of a node is s_i^(t) and those of its spatial neighbors are g_i^(t). the parameters of this operation include the aggregation weight matrix w_t for the target node, the aggregation weight matrix w_g for the neighboring nodes, and the bias vector b_g. multiple layers can thus be stacked to produce high-level representations for each node. finally, two fully connected layers perform classification on the representation z_i^(t) of each antibody residue after processing by cnns and gcns. an inverse logit function transforms each residue's output y_i into the probability of belonging to the paratope, p_i = 1 / (1 + exp(−y_i)). we implement our model using pytorch [ ] v . . validation sets are used to find the optimal set of network training parameters for final evaluation. the training details of these neural networks are as follows: optimization: momentum optimizer with nesterov accelerated gradients; learning rate: . ; batch size: ; dropout: . ; sequential neighbors size: (fixed); spatial neighbors in the graph: (fixed); number of layers in gcns: , or ; number of layers in cnns: , or . the training time of each epoch varies from roughly - minutes depending on network depth, using a single nvidia rtx gpu. for each combination, networks are trained until the performance on the validation set stops improving or for a maximum of epochs. gcns have the following number of filters for , and layers, respectively: ( ), ( , ), ( , , ). all weight matrices are initialized as in [ ] and biases are set to zero. training is carried out by minimizing the weighted cross-entropy loss function [ ] . in this section, we compute precision and recall by predicting residues as paratope with probability above . [ ] .
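the graph convolution and the inverse logit output described above can be illustrated with a small standard-library sketch. the exact aggregation formula is lost from the text, so the form used here (target-node term plus the mean of the transformed neighbor vectors plus a bias, followed by relu) is an assumption modelled on common node-average graph convolutions; the names `graph_conv`, `W_t`, `W_g` and `b_g` are illustrative, and this is not the pytorch implementation used by the authors.

```python
import math


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


def graph_conv(features, neighbors, W_t, W_g, b_g):
    """One assumed node-average graph convolution layer.

    features:  list of per-node feature vectors
    neighbors: neighbors[i] lists the spatial-neighbor indices of node i
    """
    out = []
    for i, s_i in enumerate(features):
        center = matvec(W_t, s_i)
        agg = [0.0] * len(b_g)
        for j in neighbors[i]:
            for d, val in enumerate(matvec(W_g, features[j])):
                agg[d] += val
        k = max(len(neighbors[i]), 1)  # avoid division by zero for isolated nodes
        out.append(relu([c + a / k + b for c, a, b in zip(center, agg, b_g)]))
    return out


def sigmoid(y):
    """Inverse logit: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-y))


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    nbrs = [[1, 2], [0, 2], [0, 1]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(graph_conv(feats, nbrs, identity, identity, [0.0, -1.0]))
    print(sigmoid(0.0))
```

averaging over neighbors makes the result independent of the order in which neighbors are listed, which is the order-independence property the text attributes to graph convolution.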
as the area under the receiver operating characteristic curve (auc roc) is threshold-independent and increases in direct proportion to the overall prediction performance, we use it to assess overall predictive ability. in addition, we consider the area under the precision-recall curve (auc pr). to provide a robust evaluation of performance, we trained and tested all networks five times and computed the mean and standard error. results comparing the auc roc and auc pr of various combinations of cnn and gcn layers are shown in table and table . our first observation is that all the methods linking cnns with gcns, with auc roc around . and auc pr around . , outperform the individual cnn or gcn methods, which have distinctly lower auc pr values, showing that combining information from a residue's sequential and spatial neighbors improves the accuracy of interface prediction. this matches the biological intuition that the region around a residue should impact its binding affinity [ ] . we also observe that the effect of the number of cnn and gcn layers is not linear, i.e. more layers do not necessarily achieve better performance. indeed, in protein interface prediction, networks with more than four layers performed worse in [ ] . in addition, a one-layer gcn achieves better performance than two-layer gcns for paratope prediction in task-specific learning in [ ] . our results agree with these findings. as described in section . , residue features are classified into two classes according to their source: sequence-based and structure-based. furthermore, sequence-based features can be divided into three parts according to their different properties: residue type one-hot encoding (a), profile features (b) and the seven physicochemical parameters (c). all the structure-based features are considered as a single part (d). because the residue type is the most basic feature, every combination must include its one-hot encoding, e.g.
a+b, a+c, a+d, a+b+c, a+b+d, a+c+d, a+b+c+d. we obtain the best performance from the model with layers of cnns linked with layers of gcns, as shown in table and table . hence, we train this model again using the other residue feature combinations. each combination was evaluated by averaging the auc roc and auc pr over all the antibodies in the testing set. both the mean value and the standard deviation are reported in fig. and fig. . from fig. , we can see that three residue feature combinations (a+b: . ± . , a+b+c: . ± . , a+b+d: . ± . ) almost achieve the optimal performance. all of them contain the profile features (b). as for the auc pr in fig. , we can see that performance varies across the combinations. the model using all the features still works best. as shown in fig. . and fig. . , we compare our method to other existing methods designed specifically for paratope prediction, i.e. antibody i-patch, which pays attention to energetic importance (auc roc: . , auc pr: . ) [ ] , parapred, which consists of cnn- and rnn-based networks (auc roc: . , auc pr: . ) [ ] , models using d zernike descriptors (auc roc: . , auc pr: . ) [ ] and graph convolution with an attention mechanism (auc roc: . , auc pr: . ) [ ] . note that these methods consider only the sequential or the spatial neighbors of the target antibody residue. our model achieves competitive or greater performance compared to these methods on both auc roc ( . ± . ) and auc pr ( . ± . ). an attention layer was used to explore the specific interaction between antibody and antigen pairs in paratope and epitope prediction [ ] . the contribution of the attention layer alone was assessed only for the epitope predictor. in this study, we add an attention layer to our best model, resulting in lower performance (auc roc: . ± . , auc pr: . ± . ) for paratope prediction. fig. shows the heatmap of attention scores between every pair of residues from the complex on which our model performed best (pdb id k ).
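the per-antibody evaluation described above (an auc roc for each antibody, then a mean and standard deviation over the test set) can be sketched as follows. this is an illustration under stated assumptions: the auc is computed with the pairwise rank formulation, ties are counted as one half, the sample standard deviation is used because the text does not say which variant was reported, and the function names are hypothetical.

```python
import statistics


def auc_roc(labels, scores):
    """AUC ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC ROC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def summarize(per_antibody):
    """Mean and sample standard deviation of per-antibody AUC ROC values.

    per_antibody: list of (labels, scores) pairs, one per antibody;
    at least two antibodies are needed for the standard deviation.
    """
    aucs = [auc_roc(labels, scores) for labels, scores in per_antibody]
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    # Invented toy predictions for two hypothetical antibodies.
    test_set = [
        ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),
        ([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]),
    ]
    print(summarize(test_set))
```

the same loop structure applies to auc pr, with the per-antibody metric swapped for a precision-recall area.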
however, we do not see the outstanding performance observed for the epitope predictor, which could be caused by the different environment components of epitope and paratope [ ] , [ ] . in this study, we design and implement a new structure-based paratope predictor leveraging the sequential and spatial neighbors of the target antibody residue. our model is trained on antibody-antigen complex structures collected from the datasets of existing paratope predictors, and it includes the most structures. moreover, we utilize more residue features, both sequence-based and structure-based. experimental results with a training dataset and an independent validation dataset demonstrate the efficiency of our method. the superior performance of our method is due to several reasons, including a rich dataset, more comprehensive feature selection, and careful construction of the prediction model considering sequential and spatial neighbors at the same time. we note that our program has two potential disadvantages. first, the predictor needs the antibody structure, as it takes structure-based residue features as input. second, extracting residue features is time-consuming, as psi-blast [ ] needs to be run. in future work, we will take adjacency information from the antibody sequence so that the predictor can make use of gcns without a structure. we will also attempt to accelerate the computation by using several servers to run psi-blast [ ] concurrently. mining biomolecule binding motifs is a long-term challenge for understanding their function. the formation of incorrect interactions between some critical molecules has been revealed as one of the important causes of diseases like covid- [ ] . the method proposed in this study is specifically for identifying antibody-antigen binding residues. in future work, we will further investigate the applicability of our model to binding residue prediction for other types of molecules, e.g., drug-target interaction prediction [ ] .
structure, function and properties of antibody binding sites antibody and antigen contact residues define epitope and paratope size and structure effective optimization of antibody affinity by phage display integrated with high-throughput dna synthesis and sequencing technologies insights into the structural basis of antibody affinity maturation from next-generation sequencing the effects of framework mutations at the variable domain interface on antibody affinity maturation in an hiv- broadly neutralizing antibody lineage in silico methods for design of biological therapeutics computational design of epitope-specific functional antibodies epitope-directed antibody selection by site-specific photocrosslinking watching a protein as it functions with -ps time-resolved x-ray crystallography methodological advances in protein nmr towards atomic resolution structural determination by single-particle cryo-electron microscopy computer-aided antibody design paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure prediction of site-specific interactions in antibody-antigen complexes: the proabc method and server antibody i-patch prediction of the antibody binding site improves rigid local antibody-antigen docking parapred: antibody paratope prediction using convolutional and recurrent neural networks attentive cross-modal paratope prediction antibody interface prediction with d zernike descriptors and svm learning context-aware structural representations to predict antigen and antibody binding interfaces abdb: antibody structure database-a database of pdb-derived antibody structures generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks gapped blast and psi-blast: a new generation of protein database search programs blast: at the core of a powerful and diverse set of sequence analysis tools dictionary of protein secondary structure: pattern 
recognition of hydrogen-bonded and geometrical features reduced surface: an efficient way to compute molecular surfaces psaia -protein structure and interaction analyzer cx, an algorithm that identifies protruding atoms in proteins pairpred: partner-specific prediction of interacting residues from sequence and structure protein interface prediction using graph convolutional networks pytorch: an imperative style, high-performance deep learning library progress and challenges in predicting protein interfaces structural basis for the recognition of sars-cov- by full-length human ace drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework origins of specificity and affinity in antibody-protein interactions this work was supported by the 'created major new drugs' of major national science and technology (no. zx - ), and leading talents fund in science and technology innovation in henan province( ). xiaofei nan and shoutao zhang are the corresponding authors for this paper. key: cord- -nly vojr authors: fletcher, nicola f.; meredith, luke w.; tidswell, emma; bryden, steven r; gonçalves-carneiro, daniel; chaudhry, yasmin; lowe, claire shannon; folan, michael a.; lefteri, daniella a; pingen, marieke; bailey, dalan; mckimmie, clive s.; baird, alan w. title: a novel antiviral formulation inhibits a range of enveloped viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nly vojr some free fatty acids derived from milk and vegetable oils are known to have potent antiviral and antibacterial properties. however, therapeutic applications of short to medium chain fatty acids are limited by physical characteristics such as immiscibility in aqueous solutions. we evaluated a novel proprietary formulation based on an emulsion of short chain caprylic acid, virosal, for its ability to inhibit a range of viral infections in vitro and in vivo. 
in vitro, virosal inhibited the enveloped viruses epstein-barr, measles, herpes simplex, zika and orf parapoxvirus, together with ebola, lassa, vesicular stomatitis and sars-cov- pseudoviruses, in a concentration- and time-dependent manner. evaluation of the components of virosal revealed that caprylic acid was the main antiviral component; however, the virosal formulation significantly inhibited viral entry compared with caprylic acid alone. in vivo, virosal significantly inhibited zika and semliki forest virus replication in mice following the inoculation of these viruses into mosquito bite sites. in agreement with studies investigating other free fatty acids, virosal had no effect on norovirus, a non-enveloped virus, indicating that its mechanism of action may be via surfactant disruption of the viral envelope. we have identified a novel antiviral formulation that is of great interest for prevention and/or treatment of a broad range of enveloped viruses. the antimicrobial properties of fatty acids have been extensively reported in the literature (for review, see (thormar and hilmarsson, ) and (churchward et al., )). a previous study (thormar et al., ) of the antiviral effects of different free fatty acids and lipid extracts from human milk against vesicular stomatitis virus (vsv), herpes simplex virus (hsv) and visna virus revealed that short chain saturated fatty acids (butyric, caproic and caprylic) together with long chain saturated fatty acids (palmitic and stearic) had no or very little antiviral activity, whereas medium chain saturated fatty acids including capric, lauric and myristic acids and long chain unsaturated oleic, linoleic and linolenic acids were antiviral, albeit at different concentrations. another study (hilmarsson et al., ) reported similar trends in the antiviral activity of six medium chain fatty acids together with their alcohol and mono-glyceride derivatives against herpes simplex viruses and .
in contrast, dichtelmuller et al ( ) reported that caprylic acid had antiviral activity against enveloped viruses including human immunodeficiency virus, bovine viral diarrhoea virus, sindbis virus and pseudorabies virus (dichtelmuller et al., , pingen et al., ). studies investigating the antiviral properties of whole milk reported no antiviral properties of fresh human milk, whereas milk that had been stored at °c possessed potent antiviral activity against several viruses in vitro. refrigeration disrupts the milk fat globule membrane, allowing ingress of milk serum lipase, which results in hydrolysis of milk fat triglyceride (thormar et al., , isaacs et al., ). it was concluded that release of fatty acids from milk triglycerides in stored milk, and in milk recovered from neonatal (achlorhydric) stomachs, was responsible for generating antiviral factor(s) (thormar et al., ) . we investigated the effect of a specifically formulated emulsion of free fatty acids, virosal, on infectivity of enveloped and non-enveloped viruses. caprylic acid delivered in the virosal emulsion exhibited significant antiviral effects. a range of enveloped viral infection systems was utilized, and complete inhibition of viral infection was observed without any evidence of cytotoxicity. virosal had no effect on the infectivity of a non-enveloped virus, norovirus, which is in agreement with previous studies demonstrating that free fatty acids are ineffective against non-enveloped viruses (thormar et al., , kohn et al., ). furthermore, virosal inactivated the enveloped mosquito-borne viruses semliki forest virus (sfv) and zika virus (zikv) in vitro. prophylactic topical treatment of viral infection in mosquito bites with virosal inhibited local replication and dissemination of sfv and plasma levels of zikv in mice. transmission electron microscopy analysis indicated that virosal disrupts orf parapoxvirus envelope integrity, with higher concentrations completely disrupting virion morphology.
these data indicate that virosal has antiviral activity against a range of enveloped viruses in vitro and in vivo. hepg , caco- , a , vero cells, raw . murine macrophages, t human embryonic kidney cells and bv murine microglial cells were propagated in dmem supplemented with % fetal bovine serum (fbs), mm l-glutamine (gibco), antibiotics ( u of penicillin/ml and μg of streptomycin/ml) and % non-essential amino acids (gibco). baby hamster kidney- (bhk- ) cells and c / mosquito cells were grown as previously described (pingen et al., ) . vero cells engineered to over-express signaling lymphocyte activation molecule (slam) were supplemented with . mg/ml of geneticin (g , sigma). human foreskin fibroblasts (hff- ) cells were obtained from atcc (atcc® scrc- ™), and were propagated in dmem supplemented with % fbs, mm lglutamine, antibiotics ( u of penicillin/ml and μg of streptomycin/ml) and % non-essential amino acids). primary tonsil epithelial cells were isolated from normal tonsil tissue obtained during hypophysectomy, as previously described (feederle et al., ) . primary foetal lamb skin (fls) cells were isolated as previously described (scagliarini et al., ) and cultured in medium with % heat inactivated fbs. all cells were maintained at o c, % co . anti-hsv- gd antibody was a generous gift from colin crump, university of cambridge (minson et al., ) . anti-mouse alexa- antibody was obtained from gibco, u.k. pseudoviruses were generated by transfecting t cells with plasmids encoding a hiv- provirus expressing luciferase (pnl - -luc-r-e-) and vesicular stomatitis virus (vsv-g), zika, ebola, lassa or sars coronavirus- (sars-cov- ) envelope or a no envelope control (env-), as previously described (fletcher et al., (broer et al., . zika virus pseudoviruses were generated by amplifying prem-m-env region (positions - ) from the pe brazil zika strain (accession no. kx ), with flanking restriction sites 'nhei-insert-xbai- ' then inserting into pcdna . (+) plasmid. 
virus containing media was incubated with the indicated concentrations of virosal, at ph . , for minutes at room temperature unless stated otherwise, and an equal concentration of neutralizing buffer was added to restore ph to , and virus/virosal solutions were added to target cells for hours. each preparation of virosal was titrated in the relevant media used to culture each cell type (dmem, williams e and m ). to control for ph treatment, virus was treated with ph . buffer that did not contain virosal for minutes and then ph restored to as described for virosal treatments and virus added to target cells. after hours, unbound virus was removed and the media replaced with dmem/ % fbs. at h post-infection the cells were lysed, luciferase substrate added (promega, u.k.) and luciferase activity measured for seconds in a luminometer (lumat lb ). specific infectivity was calculated by subtracting the mean env-pp rlu signal from the pseudovirus rlu signals. infectivity is presented relative to untreated control cells by defining the mean luciferase value of the replicate untreated cells as %. to assess cell viability following virosal treatment, an mtt assay was performed on replicate wells in every experiment (gibco, u.k.). ic carrying the open reading frame of the enhanced green fluorescent protein (egfp) was generated as described previously (hashimoto et al., ) . after initial recovery, mev was produced in vero-slam cells using an initial multiplicity of infection of . . when infection was fully developed, flasks were vigorously shaken and supernatants collected and clarified at rpm for min at ⁰c. supernatant was collected, aliquoted and stored at - ⁰c. recombinant wild-type epstein-barr virus (ebv ) with a gfp insert (delecluse et al., ) was generated from t cells carrying the recombinant ebv b . genome, as previously described (shannon-lowe et al., ) . 
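the pseudovirus infectivity calculation described earlier in this section (background subtraction using the no-envelope control, then normalisation to untreated cells) can be sketched as below. this is a minimal illustration: the function names are hypothetical, the rlu values in the usage example are invented, and the untreated control is assumed to be defined as 100%, since the percentage itself is missing from the text.

```python
from statistics import mean


def specific_infectivity(rlu_values, env_minus_rlu):
    """Subtract the mean no-envelope (env-) background from each RLU reading."""
    background = mean(env_minus_rlu)
    return [value - background for value in rlu_values]


def relative_infectivity(treated_rlu, untreated_rlu, env_minus_rlu):
    """Express treated readings as a percentage of the mean untreated signal.

    The untreated mean (after background subtraction) is assumed to be 100%.
    """
    untreated = mean(specific_infectivity(untreated_rlu, env_minus_rlu))
    if untreated <= 0:
        raise ValueError("untreated control does not exceed background")
    treated = specific_infectivity(treated_rlu, env_minus_rlu)
    return [100.0 * value / untreated for value in treated]


if __name__ == "__main__":
    # Invented luminometer readings for illustration only.
    env_minus = [100, 100]
    untreated = [1100, 1100]
    treated = [600, 100]
    print(relative_infectivity(treated, untreated, env_minus))
```

subtracting the no-envelope signal first means that a treated well reading at background level is reported as zero infectivity rather than as a small positive percentage.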
encapsidated and enveloped virus was purified from the culture supernatants by centrifugation on an optiprep (axis shield) self-generated gradient (shannon-lowe et al., ) and quantitated by quantitative pcr (qpcr) of a single-copy gene, balf , as previously described (shannon-lowe et al., ) . herpes simplex virus (hsv- )(strain i ) was propagated in vero cells as previously described (ren et al., ) . titration was performed on confluent cultures of vero cells with a % agarose overlay in dmem/ % fcs, and infectious titre determined in plaque forming units (pfu). for in vitro work, zika virus (brazil, pe ) was propagated in vero cells, as previously described (chavali et al., ) . titration was performed on confluent cultures of vero cells with a % agarose overlay in dmem/ % fcs and infectious titre quantified in plaque forming units (pfu). virus stocks were pooled and titred on vero cells to determine ic values. orf virus (mri-scab) was propagated by scarification of lambs as previously described (mcinnes et al., ) and then cultured by infection of primary foetal lamb skin (fls) cells. infection experiments were performed by direct inoculation of orf virus onto primary cultures of fls cells (scagliarini et al., ) . murine norovirus (mnv- .cw strain) was propagated in murine raw . cells (wobus et al., ) . the yield of infectious virus was determined at h posttransfection of cdna or capped rna. titres of virus were determined as % tissue culture infectious dose (tcid ) in raw . cells, using microscopic visualization for the appearance of cytopathic effect. the pcmv-sfv and pcmv-sfv backbone for production of sfv has been previously described (ulper et al., ) . sfv plasmids were electroporated into bhk- cells to generate infectious virus. sfv is the prototypic, less virulent strain of the virus, whereas sfv is a copy of a virulent strain (ferguson et al., ) . 
for all in vivo experiments with sfv and zikv (brazil, pe ), viruses were grown once in bhk- cells and then passaged once in c / aedes mosquito cells and titrated before use because mosquito cell-derived virus has distinct glycosylation and because insect cells impose distinct evolutionary constraints on viral progeny (moser et al., ) . sfv and sfv were used at passage . westgate biomedical ltd, donegal, ireland, (folan m. patent wo / ). all components used in the construction of virosal are pharmaceutical grade constituents of greater than % purity. the virosal emulsion used in these studies was constructed using lipoid s lecithin (lipoid ag, steinhausen, switzerland) . % w/w which had been de-lipidised using solvent extraction to remove any extraneous lipid that remained conjugated to its lipophilic sites (de-lipidised lecithin is amphipathic). a co-surfactant pluronic f- (basf cork, ireland) was used to enhance stability and a di-palmitoyl, , -dipalmitoyl-sn-glycero- -phosphorylglycerol sodium salt (dppg from lipoid ag) was used to amplify surface charge on the emulsion droplet. pharmaceutical grade caprylic acid (merck, nottingham, uk) % w/w was emulsified in the de-lipidised lecithin, pluronic and dppg mix . % w/w and . % w/w respectively using an emulsiflex c- (avestin, ottowa, canada) at , bar pressure. the emulsion was diluted to % w/w ( . % caprylic acid), or used at the concentrations specified in each experiment, in sodium citrate buffer, ph . , with a physiological isotonicity of mosm, optimized for each culture medium used, and autoclaved before use (laverty et al., ) . for topical application to mouse skin, a gel formulation was constructed using %w/v carbopol p (lubrizol inc, cleveland, ohio, usa), dispersed and hydrated in an aqueous solution of % glycerol (merck) which was then ph adjusted to . before addition of % virosal emulsion. the gel was applied liberally to test sites as described. 
virosal has an acidic ph which necessitated minimal time exposure to cell lines followed by a neutralizing buffer to restore ph to . . in this study, virosal at the indicated concentrations was mixed with an equal volume of viral inoculum (mev:original tcid = . /ml, hsv- : pfu/ml, ebv: moi= , zika: moi= , orf: pfu/ml) or pseudoviruses bearing vsv, ebola, lassa or sars-cov- envelope glycoproteins, and incubated at room temperature for minutes. the same volume of a buffered neutralizing solution was added and mixed thoroughly to restore the ph to . virosal and neutralizing solution was optimized for each culture medium used in in vitro studies. as a control for the effect of low ph on infectivity, µl of ph . buffer solution was added to the virus, incubated for min and neutralized as before. virus/virosal or control treated virus was inoculated onto appropriate target cells and incubated for h at °c, then fixed and infection enumerated, or, for pseudovirus assays, lysed and luciferase activity quantified as previously described (fletcher et al., ) . for mev, tcid s were calculated in veroslam cells in triplicate using the reed-muench method (reed, ) . for non-enveloped viral infectivity assays, a similar approach was used to enveloped virus assays. virosal was mixed with murine norovirus (mnv-cw ) in a : ratio, incubated at room temperature for minutes and the ph was restored to . virus infection of permissive raw . (mouse macrophage) and bv (murine microglial) cells were conducted immediately after neutralization in triplicate, incubated at °c and quantified h post infection using tcid via microscopic visualization for the appearance of cytopathic effect. - -week-old female c bl/ j mice were derived from a locally bred colony maintained in a pathogen-free facility, in filter-topped cages, and maintained in accordance with local and governmental regulations. to prevent genetic drift, mice have been rederived using externally supplied mice (charles river). 
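the reed-muench endpoint calculation used for the tcid titrations mentioned above can be sketched as follows. this is a textbook-style illustration, not the authors' worksheet: the function name, the well counts in the example and the tenfold dilution series are assumptions, and the result is the log10 dilution at which half of the cultures would be infected.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, total):
    """Return the log10 dilution giving 50% infected wells (Reed-Muench).

    Rows must be ordered from the most concentrated to the most dilute,
    e.g. log10_dilutions = [-1, -2, -3, -4].
    """
    uninfected = [t - i for i, t in zip(infected, total)]
    n = len(infected)
    # Infected wells accumulate from the most dilute row upward,
    # uninfected wells accumulate from the most concentrated row downward.
    cum_infected = [sum(infected[i:]) for i in range(n)]
    cum_uninfected = [sum(uninfected[: i + 1]) for i in range(n)]
    fraction = [
        ci / (ci + cu) for ci, cu in zip(cum_infected, cum_uninfected)
    ]
    for i in range(n - 1):
        if fraction[i] >= 0.5 > fraction[i + 1]:
            # Proportionate distance between the two bracketing dilutions.
            distance = (fraction[i] - 0.5) / (fraction[i] - fraction[i + 1])
            step = log10_dilutions[i] - log10_dilutions[i + 1]
            return log10_dilutions[i] - distance * step
    raise ValueError("the 50% endpoint is not bracketed by the dilution series")


if __name__ == "__main__":
    # Invented plate: four wells per dilution, tenfold steps.
    endpoint = reed_muench_log10_endpoint([-1, -2, -3, -4], [4, 3, 1, 0], [4, 4, 4, 4])
    print(endpoint)
```

the titre per unit volume follows by taking the reciprocal of the endpoint dilution and dividing by the inoculum volume per well, which is left out here because that volume is not stated in the text.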
because arbovirus infection of the skin always occurs in the context of an arthropod bite, we used a mouse model that additionally incorporates biting aedes aegypti mosquitoes. host response to mosquito bites includes oedema and an influx of leukocytes that enhances host susceptibility to infection with virus (pingen et al., ) . therefore, we used our previously established model of arbovirus infection at mosquito bites. this model was specifically developed to model natural infection by arbovirus, including mimicking the same dose delivered by mosquitoes, using mosquito cell-derived virus, injecting a small ul inoculum volume, and by including the presence of a mosquito bite at the site of inoculation. to ensure that mosquitoes bit a defined area of skin (upper side of the left foot), anesthetized mice were placed for up to min into a mosquito cage containing a. aegypti mosquitoes (locally bred colony derived from the liverpool strain). biting was restricted to a defined area of the left foot by covering all other mouse skin with an impenetrable barrier. viral rna and host gene transcripts in tissues were quantified by reverse transcription qpcr, and infectious virus was quantified by end point titration, as described previously. tissue generated up to ug of total rna, of which ug of rna was used to create cdna, of which % was used per qpcr assay. qpcr primers for sfv amplified a section of e and primers for zikv amplified a section of the env gene. for sfv and zikv, qpcr assays measured the sum value of both genome and subgenomic rna (bryden et al., ) . each result represents the median of three or four technical replicates of one biological replicate. for plaque assay, viral stocks and biological samples were serially diluted, and each dilution was assayed in duplicate on bhk- cells with an avicell overlay for hours as previously described (pingen et al., ) . 
biological replicates from mice were excluded from analysis if injection of virus inadvertently punctured a blood vessel (although this was rare and occurred with a frequency of < %). all experiments involving mice had been subject to rigorous review by the university of leeds welfare and ethical review committee and additionally approved by the u.k. home office (license pa cf e ). zika infected cells were fixed with % paraformaldehyde, stained with primary anti-zika-e-protein antibody ( : , g ), and secondary antimouse alexa- ( : ). cells were then resuspended in pbs, and , cells were screened by flow cytometry using a facscalibur flow cytometer (bd biosciences, germany). data was analyzed using flowjo software (flowjo llc, usa). orf (mri-scab) was harvested from supernatants of primary cultures of foetal lamb skin (fls) cells, filtered through a . μm filter and then centrifuged for hours at , rpm in a bench-top centrifuge (eppendorf). virus was placed on formvar coated copper electron microscopy grids for minutes; excess supernatant was removed and the grids were stained with % uranyl acetate. grids were visualized using a tecnai transmission electron microscope. in vitro results are expressed as the mean ± standard deviation of the mean (sd), except where stated. statistical analyses were performed using student's t-test or mann-whitney u test in prism . (graphpad, san diego, ca) with a p < . being considered statistically significant. for all in vivoderived data, data were analyzed using graphpad prism version software. copy numbers of viral rna and infectious titres from virus-infected mice were not normally distributed (with data points often spread over orders of magnitude) and were accordingly analyzed using the nonparametric-based mann-whitney test or kolmogorov-smirnov test. to assess the ability of virosal to inhibit viral infection, we utilized a pseudovirus system, which allows high throughput assays to assess viral entry to cells. 
virosal inhibited entry of viral pseudoparticles bearing the envelope glycoproteins of vsv, lassa, ebola and sars-cov- viruses. pseudoviral particles mirror the entry pathways of their respective wild type viruses and use the same viral receptors and entry pathways (for review, see (li et al., ) ). these constructs generate high titre pseudoviruses that can be used to infect a range of cell types. inhibition of pseudovirus entry by virosal occurred in a concentration-dependent manner, when pseudoviruses were incubated with virosal for minutes ( figure a) . treatment of pseudoviruses with % virosal resulted in complete neutralization of pseudovirus infectivity on t human embryonic kidney cells ( figure a ). there was no alteration in cellular proliferation or cytotoxicity when cells were treated with pseudovirus or virosal, assessed using an mtt assay (data not shown). virosal is stable at ph . , so pseudovirus treatment was carried out at ph . and then restored to ph before infection of eukaryotic cells. treatment of vsv and lassa pseudoviruses with ph . control buffer caused no significant change in infectivity; however, a significant reduction in ebola pseudovirus infectivity was observed ( figure a ). to establish whether virosal inhibits pseudovirus infection in other cell types, virosal-treated lassa and vsv pseudoparticles were used to infect human intestinal (caco- ) and hepatoma (hepg ) epithelial cell monolayers. similar levels of neutralization with virosal were observed compared with that of t cells. the inhibition of viral infectivity with ph . control treatment was perhaps due to the less efficient entry of viral pseudoparticles to these cells compared with the highly permissive t cell line (supplementary figure ) . 
because many aspects of enveloped virus lifecycles are ph sensitive (ruigrok et al., ) and ph is the most common variable when assessing the antimicrobial activity of fatty acids (churchward et al., ) , in all cases the controls against which virosal treatments were compared were those exposed to virus at ph . . to test whether virosal inhibits pseudovirus infection following short treatment times, vsv pseudoviruses were incubated with virosal at concentrations from . % to . % for , or . minutes, and then used to infect t cells. a time-dependent decrease in infectivity was observed ( figure b) , and virosal inhibited vsv pseudovirus at concentrations greater than . % following seconds of treatment. to test whether virosal is capable of inhibiting full-length infectious viruses, we used a range of infectious virus systems and selected cell culture systems. pseudoviruses were generated bearing the envelope glycoproteins of zika virus strain pe , a brazilian strain, and the ability of virosal to inhibit pseudovirus entry was compared with its ability to inhibit the full-length infectious clones of zika virus strains pe and mr , an african strain. virosal inhibited both zika virus pseudovirus and similar to zika virus, a higher concentration of virosal ( %) was required for complete neutralization of full-length viruses compared to pseudoviruses. treatment with % virosal did not result in cytotoxicity, measured using mtt assays (data not shown). this indicates that virosal is capable of fully neutralizing a range of enveloped viruses in vitro, at a concentration that does not cause cytotoxicity in eukaryotic cells. since virosal potently inhibits infection of a range of enveloped viruses, we sought to establish whether this effect was limited to enveloped viruses or whether virosal also influenced non-enveloped viruses. murine norovirus, a non-enveloped virus, was treated with virosal and added to cultures of raw . murine macrophages or bv murine microglial cells.
there was no significant difference between control treated virus and virosal treated virus infectivity in either cell type tested ( figure a ). this indicates that, in contrast to virosal's effect on enveloped viruses, virosal does not inhibit infection with norovirus, a non-enveloped virus. activity. to investigate the components of virosal that are responsible for its activity, we evaluated individual components of virosal (caprylate, dppg, lecithin and kolliphor), at concentrations equivalent to amounts present in % virosal, together with % virosal and ng/ml ifn-y as a positive antiviral control. only caprylate (caprylic acid) had a significant effect on vsv-g pseudovirus entry to vero cells ( figure b ). however, % virosal fully neutralized viral entry, indicating that caprylic acid within the virosal formulation was more potent as an antiviral than caprylic acid alone. to examine the potential perturbation of enveloped virus membranes, we used transmission electron microscopy to visualize virosal-treated orf virions. orf is a parapoxvirus, a family of viruses that includes pseudocowpox and bovine papular stomatitis virus, and it was selected because of the large size of the virions ( - nm in diameter) and the relatively high titres that can be achieved following growth in cell culture. negatively stained virions exhibited typical poxvirus morphology ( figure a) , with an ovoid structure surrounded by an external envelope. treatment of orf virions with % virosal, which neutralized approximately % infection in infection assays ( figure b ), appeared to disrupt the external viral envelope, which remained associated with the virions ( figure b ). treatment with % virosal, which completely neutralized in vitro infectivity ( figure b ) disrupted virion morphology and the lattice-like structure present on intact virions ( figure c ). these data indicate that virosal disrupts orf surface structure, and virion morphology. 
arboviruses are transmitted to their mammalian hosts via the bite of an insect vector. given that virosal has optimal activity at a slightly acidic ph, such as that seen on skin and mucosal surfaces, we investigated the effects of virosal treatment on mouse skin following viral inoculation at the site of aedes aegypti bites, according to previously published protocols (bryden et al., ). arbovirus replication in the skin at mosquito bites is a key stage of infection, during which virus replicates rapidly before disseminating to the blood and other tissues (pingen et al., ). previous work has suggested that therapeutically targeting this site may be efficacious in reducing severity of infection (bryden et al., ). here, zikv was chosen, as it is a medically important emerging arbovirus, whereas sfv was used as a model virus that replicates efficiently in immunocompetent mice. (supplementary figure ) in addition to zikv (figure ), sfv was incubated at increasing infectious titres with % virosal at ph . for minutes. following immediate restoration of physiological ph, the solution was applied to monolayers of bhk- cells and the infectious titre was assessed by plaque assay. as a control, sfv was incubated similarly at ph . in the absence of virosal. virosal reduced the amount of infectious sfv to below the limit of detection, suggesting this virus is highly sensitive to treatment (supplementary figure ). similarly, when virosal-treated sfv was inoculated subcutaneously into mice, the titre of virus at hours post infection was significantly reduced (supplementary figure ). we next determined whether topical application of virosal can also suppress virus infection when virus is inoculated subcutaneously into a mosquito bite. in this experiment, virosal was applied immediately after the mosquito bite and left for minutes to allow it to penetrate the skin, or the bite was left untreated as a control.
sfv was then inoculated subcutaneously into the mosquito bite as previously described (pingen et al., ) . re-application of virosal to the inoculation site was repeated at hours post infection. at hours post infection, those mice receiving virosal had a significantly reduced amount of virus rna at the inoculation/bite site and limited dissemination of virus to the spleen hours post infection ( figure a ). infectious viral titres within plasma were also significantly reduced ( figure b ). topical application of virosal to mosquito bites was similarly efficacious in reducing zikv serum viraemia by hours post zikv infection ( figure c ). the virosal used in these studies is a proprietary emulsion of free fatty acids that has previously been shown to have antibacterial properties, including against a range of veterinary periodontopathogens (laverty et al., ) , staphylococcus species including methicillin-resistant staphylococcus aureus (hogan et al., ) as well as a wide range of culturable species from human colon (mcdermott et al., ) . the antimicrobial activity of medium chain saturated and long chain unsaturated free fatty acids has been reported previously (thormar and hilmarsson, ) , with both antibacterial and antiviral activity observed (hilmarsson et al., ) . caprylic acid has been used in the purification process of immunoglobulins due to its inactivation of lipid enveloped viruses (dichtelmuller et al., ) , a process which is ph-dependent and optimal at low ph. poor solubility in aqueous solutions has limited antimicrobial applications of fatty acids in their free form (jackman et al., ) and dissolution in solvents such as dmso can cause skin irritation. formulation as nanoparticles has been described to overcome the challenges of antimicrobial fatty acid bioavailability (jackman et al., , yoon et al., . in this study we used patented technology (folan, m.a. international patent no. wo / ) to build and use an emulsion to evaluate its antiviral potential. 
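the ph dependence of caprylic acid activity noted above can be made concrete with the henderson-hasselbalch relation, which gives the fraction of a weak acid that remains protonated (non-ionised) at a given ph. the sketch below is illustrative only: the pka value is an assumed approximate literature figure for octanoic acid in dilute aqueous solution, it is not reported in this study, and the apparent pka of a fatty acid held in an emulsion may differ.

```python
def unionised_fraction(ph, pka):
    """Fraction of a monoprotic weak acid in the protonated (non-ionised) form.

    Follows the Henderson-Hasselbalch relation: [HA] / ([HA] + [A-]).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: approximate aqueous pKa of octanoic (caprylic) acid.
ASSUMED_PKA = 4.89

# Illustrative pH values only; they are not the values used in the study.
for ph in (4.0, 5.5, 7.4):
    fraction = unionised_fraction(ph, ASSUMED_PKA)
    print(f"pH {ph}: {fraction:.3f} non-ionised")
```

under these assumptions the non-ionised fraction falls steeply once the ph rises above the pka, which is consistent with restoring the ph before infection of cells.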
recent publications have noted the potential antimicrobial activity of nanoemulsions and microemulsions (churchward et al., , buranasuksombat et al., , ma et al., , donsi and ferrari, . in each of these studies there are no data that clearly evaluate the contribution of the individual components to the overall properties which, in a mixed system may be synergistic or antagonistic. however, each of the components of virosal is generally regarded as safe (gras) by the united states food & drug administration. the antiviral activity of virosal emulsion may be due to amplified surface area of the water insoluble fatty acid together with the amphipathic nature of the de-lipidised lecithin which facilitates delivery to lipophilic cell surfaces and/or the viral envelope (laverty et al., ) . we observed a significant concentration-and time-dependent (rapid) decrease in infectivity of all enveloped viruses tested in the present study, with no effect on non-enveloped norovirus. milk-based free fatty acids, as well as fatty acid emulsions, have been shown to inhibit infection of vero cells with vsv and hsv- , with no antiviral effect on poliovirus, a non-enveloped virus (thormar et al., ) . moreover, isaacs (isaacs et al., ) reported no antiviral activity of human milk or milk formula, but potent antiviral activity of milk aspirated from the stomachs of infants one hour after feeding, which occurred due to the release of free fatty acids by lipolysis. similar observations were reported by thormar et al. (thormar et al., ) , after lipolysis by storage of human milk at °c. following treatment of orf virus with virosal, we observed a dissociation of the viral envelope from virions, with higher concentrations disrupting virion integrity and morphology. in agreement with the present study, fatty acids were found to affect the viral envelope, with high concentrations causing complete disintegration of viral particles (thormar et al., , thormar et al., . 
in agreement with the present study, previous studies have also reported no antiviral effect of fatty acids on nonenveloped viruses (thormar et al., , churchward et al., , indicating that the antiviral mechanism of free fatty acids and virosal may involve the viral envelope and we anticipate that to be due to a surfactant effect (yoon et al., , yoon et al., . the non-ionised form of caprylic acid dissociates at basic ph into the ionized form and only the non-ionised form is capable of virus inactivation (lundblad and seng, ) . virosal has optimal activity at ph . and is consequently suitable for use in environments close to this ph, such as skin, oral cavity and mucous membranes which are also relevant portals of infection. the present study demonstrated that virosal has antiviral activity against hsv- , ebv and orf viruses, all of which are pathogens of skin or mucous membranes and have a tropism for epithelial cells. since virosal has optimal activity at ph . , we also tested a range of epithelial cell systems that would physiologically have a lower ph than that of plasma, such as skin which has a ph of approximately (lambers et al., ) and the oral cavity which can have a ph of . or lower depending on the level of oral health. virosal had strong antiviral activity against semliki forest and zika viruses, which are transmitted via mosquitoes into the skin dermis. this is an important stage of the virus life cycle, in which it replicates in dermal cells before disseminating to the blood and then throughout the body (hamel et al., ; pingen et al., ) . the potential of virosal as a topical application immediately following mosquito bites is of great interest, as established and emerging mosquito-borne viruses constitute an increasing threat to human health. treatment modalities that incorporate virosal to target skin arbovirus infection should be explored further as e.g. 
treatment of the mosquito bite inoculation site with an immunomodulator has recently been shown to be an effective strategy in suppressing arbovirus infection (bryden et al., ) . moreover, virosal also neutralized measles virus and sars-cov- pseudovirus infection. measles is an aerosol transmitted infection that initially targets macrophages and dendritic cells of the upper airway, which are then transported across the respiratory epithelium to the lymphatic organs (muhlebach et al., ) . similarly, sars-cov- causes an acute interstitial pneumonia and was the causative agent of the sars coronavirus epidemic. virosal, therefore, has potential applications as a topical or inhaled therapy to treat epitheliotropic viral infections or those that infect the airway or other mucous membranes. the fact that virosal has applications against a wide range of enveloped viruses is of great interest, and while many antivirals have activity against specific viral families, there are few broad-spectrum antivirals currently available. therefore, these data provide a rationale for future pre-clinical studies to investigate the antiviral properties of virosal in viral infections of mucous membranes including the airway and skin. virosal may have applications in current and future viral outbreaks, including the current pandemic caused by sars-cov- . ebola or the spike protein of sars-cov- was treated in a : dilution with concentrations of virosal ranging from % to . % (final concentrations ranged from % to . %) for minutes. buffer was then added to restore the ph to . to control for the effect of ph on viral infectivity, virus was treated with a ph . buffer, equivalent to that of virosal, for minutes and then the ph of the virus was restored to . pseudoviruses were used to infect t human embryonic kidney cells. ng/ml ifn-y was included as a positive antiviral control. data are presented as mean infectivity ± sd relative to the untreated virus control. 
(b) pseudovirus bearing the envelope glycoproteins of vsv were treated for , or . minutes with virosal, the ph restored to , and used to infect t cells. control virus was treated with a ph . buffer for minutes and the ph restored to . statistical differences are presented relative to the ph . control (n = independent experiments). ****p < . , ***p < . , **p < . , ns: not significant. were treated with a range of dilutions of virosal for minutes and the viral load was quantified by qrt-pcr. (c) zika virus (pe ) was treated with virosal as above and used to infect human foreskin cells for h. cells were stained with g antiflavivirus antibody or an isotype control and anti-mouse alexa- secondary antibody. infected cells were quantified by flow cytometry. statistical differences are presented relative to the ph . control (n = independent experiments). **p < . , *p< . . infection. hsv- (strain i ) was treated with virosal for minutes, the ph restored to , and used to infect vero cells. control virus was treated with a ph . buffer for minutes and the ph restored to . after hours, cells were fixed and stained with an antibody to detect hsv gd glycoprotein. images are shown with a representative phase image. (b) measles, ebv or orf virus was treated with virosal for minutes, the ph restored to , and used to infect vero, primary tonsil epithelial cells or primary foetal lamb skin cells, respectively. control virus was treated with a ph . buffer for minutes and the ph restored to . after hours, viral infectivity was enumerated. data are presented as % neutralization of infection relative to the untreated virus control. murine norovirus was treated with virosal for minutes, the ph restored to , and used to infect either raw . murine macrophages or bv murine microglial cells. after h, viral infectivity was measured and quantified as tcid via microscopic visualization. no significant difference was observed following virosal treatment compared with control treated virus. 
(b) vsv-g pseudovirus was treated with % virosal or the equivalent concentrations of caprylate, dppg, lecithin and kolliphor for minutes, and then ph restored to . to control for the effect of ph on viral infectivity, virus was treated with a ph . buffer, equivalent to that of virosal, for minutes and then the ph of the virus was restored to . ng/ml ifn-y was included as a positive antiviral control. data are presented as mean infectivity ± sd relative to the untreated virus control (n = independent experiments). ***p < . , **p < . , ns: not significant. orf virus (mri-scab) was treated with a ph . buffer (a), . % virosal (b) or % virosal (c) for minutes, and ph was restored to . mock treated virions exhibited typical poxvirus morphology with a prominent outer envelope (a). virions treated with a low concentration of virosal ( %) displayed altered morphology and external envelope integrity was disrupted (arrows)(b). virions treated with a high concentration of virosal ( %) exhibited heterogeneous morphology with extensive disruption of virion structure (arrows). infectious virus from sfv and zika infected mice was enumerated by plaque forming assay (pfu) hours post-infection. *p < . , **p < . , ***p < . (mann-whitney test). influence of emulsion droplet size on antimicrobial properties important role for the transmembrane domain of severe acute respiratory syndrome coronavirus spike protein during entry pan-viral protection against arboviruses by activating skin macrophages at the inoculation site neurodevelopmental protein musashi- interacts with the zika genome and promotes viral replication alternative antimicrobials: the properties of fatty acids and monoglycerides propagation and recovery of intact, infectious epstein-barr virus from prokaryotic to human cells inactivation of lipid enveloped viruses by octanoic acid treatment of immunoglobulin solution essential oil nanoemulsions as antimicrobial agents in food epstein-barr virus b . 
produced in cells shows marked tropism for differentiated primary epithelial cells and reveals interindividual variation in susceptibility to viral infection ability of the encephalitic arbovirus semliki forest virus to cross the blood-brain barrier is determined by the charge of the e glycoprotein hepatitis c virus infection of cholangiocarcinoma cell lines biology of zika virus infection in human skin cells slam (cd )-independent measles virus entry as revealed by recombinant virus expressing green fluorescent protein virucidal activities of medium-and long-chain fatty alcohols, fatty acids and monoglycerides against herpes simplex virus types and : comparison at different ph levels eradication of staphylococcus aureus catheter-related biofilm infections using virosal and citrox antiviral and antibacterial lipids in human milk and infant formula feeds nanotechnology formulations for antibacterial free fatty acids and monoglycerides unsaturated free fatty acids inactivate animal enveloped viruses natural skin surface ph is on average below , which is beneficial for its resident flora antimicrobial efficacy of an innovative emulsion of medium chain triglycerides against canine and feline periodontopathogens current status on the development of pseudoviruses for enveloped viruses inactivation of lipid-enveloped viruses in proteins by caprylate antimicrobial properties of microemulsions formulated with essential oils, soybean oil, and tween gnotobiotic human colon ex vivo genomic comparison of an avirulent strain of orf virus with that of a virulent wild type isolate reveals that the orf virus g l gene is non-essential for replication an analysis of the biological properties of monoclonal antibodies against glycoprotein d of herpes simplex virus and identification of amino acid substitutions that confer resistance to neutralization growth and adaptation of zika virus in mammalian and mosquito cells adherens junction protein nectin- is the epithelial receptor for 
measles virus host inflammatory response to mosquito bites enhances the severity of arbovirus infection mosquito biting modulates skin response to virus infection a simple method of estimating fifty per cent endpoints glycoprotein m is important for the efficient incorporation of glycoprotein h-l into herpes simplex virus type particles low ph deforms the influenza-virus envelope antiviral activity of hpmpc (cidofovir) against orf virus infected lambs features distinguishing epstein-barr virus infections of epithelial cells and b cells: viral genome expression, genome maintenance, and genome amplification resting b cells as a transfer vehicle for epstein-barr virus infection of epithelial cells the role of microbicidal lipids in host defense against pathogens and their potential as therapeutic agents inactivation of enveloped viruses and killing of cells by fatty acids and monoglycerides inactivation of visna virus and other enveloped viruses by free fatty acids and monoglycerides construction, properties, and potential application of infectious plasmids containing semliki forest virus full-length cdna with an inserted intron replication of norovirus in cell culture reveals a tropism for dendritic cells and macrophages spectrum of membrane morphological responses to antibacterial fatty acids and related surfactants correlating membrane morphological responses with micellar aggregation behavior of capric acid and monocaprin antibacterial free fatty acids and monoglycerides: biological activities, experimental testing, and therapeutic applications key: cord- -k ev qkn authors: janosevic, danielle; myslinski, jered; mccarthy, thomas; zollman, amy; syed, farooq; xuei, xiaoling; gao, hongyu; liu, yunlong; collins, kimberly s.; cheng, ying-hua; winfree, seth; el-achkar, tarek m.; maier, bernhard; ferreira, ricardo melo; eadon, michael t.; hato, takashi; dagher, pierre c. 
title: the orchestrated cellular and molecular responses of the kidney to endotoxin define the sepsis timeline date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k ev qkn clinical sepsis is a highly dynamic state that progresses at variable rates and has life-threatening consequences. staging patients along the sepsis timeline requires a thorough knowledge of the evolution of cellular and molecular events at the tissue level. here, we investigated the kidney, an organ central to the pathophysiology of sepsis. single cell rna sequencing revealed the involvement of various cell populations in injury and repair to be temporally organized and highly orchestrated. we identified key changes in gene expression that altered cellular functions and can explain features of clinical sepsis. these changes converged towards a remarkable global cell-cell communication failure and organ shutdown at a well-defined point in the sepsis timeline. importantly, this time point was also a transition towards the emergence of recovery pathways. this rigorous spatial and temporal definition of murine sepsis will uncover precise biomarkers and targets that can help stage and treat human sepsis. acute kidney injury (aki) is a common complication of sepsis that doubles the mortality risk. in addition to failed homeostasis, kidney injury can contribute to multi-organ dysfunction through distant effects. indeed, the injured kidney is a significant mediator of inflammatory chemokines, cytokines, and reactive oxygen species that can have both local as well as remote deleterious effects - . therefore, understanding the complex pathophysiology of kidney injury is crucial for the comprehensive treatment of sepsis and its complications. we have recently shown that renal injury in sepsis progresses through multiple phases. these include an early inflammatory burst followed by a broad antiviral response and culminating in translation shutdown and organ failure . 
in a non-lethal and reversible model of endotoxemia, organ failure was followed by spontaneous recovery. the exact cellular and molecular contributors to this multifaceted response remain unknown. indeed, the kidney is architecturally a highly complex organ in which epithelial, endothelial, immune and stromal cells are at constant interplay. therefore, we now examined the spatial and temporal progression of endotoxin injury to the kidney using single cell rna sequencing (scrnaseq). our data revealed that cell-cell communication failure is a major contributor to organ dysfunction in sepsis. remarkably, this phase of communication failure was also a transition point where recovery pathways were activated. we believe this spatially and temporally anchored approach to sepsis pathophysiology is crucial for identifying potential biomarkers and therapeutic targets. we harvested a cumulative amount of , renal cells obtained at , , , , , and hours after endotoxin (lps) administration. the majority of renal epithelial, immune and endothelial cell types were represented (fig. a) . note the absence of podocyte and mesangial cells, which can be a limitation of single cell rnaseq renal dissociation procedures . cluster identities were assigned and grouped using known classical phenotypic markers (fig. b, supplementary fig. a ) [ ] [ ] [ ] [ ] [ ] . interestingly, the umap-based computational layout of epithelial clusters recapitulated the normal tubular segmental order in the nephron. this indicates that gene expression gradually changes among neighboring tubular segments along the nephron. note that the expression of cluster-defining markers varied significantly during the injury and recovery phases of sepsis ( fig. s b; supplementary table ). therefore, we also identified a set of genes that are conserved across time for a given cell type (fig. s c) . in the integrated umap (fig. a) , we noted the presence of a proliferative cell cluster (cdk and ki expression). 
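the idea of marker genes that are conserved across time for a given cell type can be sketched as a set intersection over per-time-point marker lists. this is a simplified stand-in, not the procedure used in the study, which is not specified here; the time labels and gene names below are hypothetical placeholders.

```python
def conserved_markers(markers_by_time):
    """Return genes flagged as markers of one cell type at every sampled time point."""
    marker_sets = [set(genes) for genes in markers_by_time.values()]
    if not marker_sets:
        return set()
    return set.intersection(*marker_sets)


# Hypothetical marker lists for a single cell type at three time points.
example = {
    "t0": {"gene_a", "gene_b", "gene_c"},
    "t1": {"gene_a", "gene_c", "gene_d"},
    "t2": {"gene_a", "gene_c"},
}
stable = conserved_markers(example)
print(sorted(stable))
```

a strict intersection drops any gene that fails the marker threshold at a single time point, so a real analysis might instead rank genes by how consistently they pass.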
by back mapping to time-specific unintegrated umaps, we determined that these proliferating cells could be traced to specific cell types at various points along the sepsis timeline (fig. c) . for example, within the first hour after lps, these proliferative indices were expressed primarily in s cells. these cells are the site of lps uptake in the kidney as we have previously shown - . at later time points, proliferative indices are seen in macrophages ( hours) and s cells ( hours) (fig. c) . these proliferative indices reflect cell cycle activity which may be involved in injury, repair or recovery processes . we also noted the presence of a proximal tubular cluster expressing unique gene identifiers: agt, rnf , slc a and slc a (fig. a) . this is likely the proximal tubular s -type (s t ) reported by others . this cluster maintained a separate and distinct identity throughout the sepsis timeline (fig. c) . because the location of s t is currently unknown, we performed in-situ spatial transcriptomics on septic mouse kidneys . we then integrated our scrnaseq with the in-situ rnaseq in order to map our scrnaseq clusters onto the tissue (supplementary fig. a, s b) . we found that the classical s cluster localizes to the cortex while s t is in the outer stripe of the outer medulla (fig. b, supplementary fig. b) . we confirmed the location of s t to the os-om with single molecular fish (supplementary fig. c) . the differential gene expression between s and s t is likely dictated by regional differences in the microenvironments of the cortex and the outer stripe. because angiotensinogen (agt) was strongly expressed in s t , we examined the expression of other components of the renin-angiotensin system (ras). we first noted the absence of ace expression in s tubular cells (fig. c, supplementary fig. ) . in contrast, ace was strongly expressed in s , s and s t cells. 
there is currently great interest in understanding the biology of ace because of its role in sars-cov- cellular invasion. other essential components of the sars-cov- entry mechanism include tmprss and slc a - . while tmprss was expressed in all proximal tubular segments, slc a was more strongly expressed in s throughout the sepsis timeline. this may point to the s tubular segment as one point of entry of sars-cov- into the kidney. the immune cell profile in the septic kidney was time-dependent and showed a five-fold increase in immune cells, primarily macrophages (fig. a, b) . we noted two distinct macrophage clusters denoted as macrophage a and macrophage b (mϕ-a, mϕ-b). both of these clusters expressed classical macrophage markers such as cd b (itgam) (fig. c) . accumulated macrophages were predominantly mϕ-a. we noted the absence of proliferation markers (cdk , ki ) in this cluster, raising the possibility that this may be an infiltrative macrophage type (fig. d) . the mϕ-b cluster, located between mϕ-a and conventional dendritic cells (cdc) expressed also cdc markers such as mhc-ii subunit genes (h -ab ) and cd c (itgax) indicating that it is an intermediary macrophage type (fig. c) . this continuum between macrophages and dendritic cells in the kidney has been reported - . interestingly, mϕ-b cells expressed proliferation markers (cdk , ki ) and thus, may be differentiating towards a mϕ-a or cdc phenotype (fig. c) . pseudotime and velocity field analysis suggested that at earlier time points ( hour) mϕ-b was differentiating toward mϕ-a phenotype. at later time points ( hours) the velocity field suggested that mϕ-b was differentiating towards cdc but pseudotime analysis was inconclusive (fig. e) . similarly, the mϕ-a cluster also showed two subclusters on the rna velocity map (supplementary fig. a) . 
one of the subclusters showed increased expression of alternatively activated macrophages (m ) markers such as arg (arginase ) and mrc (cd ) at later time points ( hours, supplementary fig. b) . therefore, rna velocity analysis may be a useful tool in distinguishing macrophage subtypes in scrnaseq data. in t-cells, while cd expression was minimal at all time points, the expression of cd was robust and relatively preserved over time (fig. s c) . we also noted an increase of a distinct plasmacytoid dendritic cell cluster at one hour (pdc). these pdcs, along with natural killer (nk) cells, are known to signal through the interferon-gamma pathway and stimulate cd expression , . this supports the early antiviral response we have previously reported in this sepsis model defined with pseudotime analysis. we note that at any given time point, directional progression of states along pseudotime correlated well with real time state changes (fig. a) . note that the endothelium exhibited changes in states as early as hour, while s showed changes at later time points ( hours). these sequential state changes may reflect the spatial and temporal propagation of lps signaling in the kidney. as sepsis progressed, many cell types lost function- defining markers while acquiring novel ones. for example, s and s lost classical markers like slc a (sglt ) and aqp and expressed new genes involved in antigen presentation such as h -ab (mhc-ii) and cd (fig. b) . moreover, the highly distinct phenotypes that differentiated s from s /s at baseline merged into one phenotype for all three sub-segments by hours after lps (fig. c) . however, despite the apparent convergent phenotype at hours, additional analytical approaches such as rna velocity revealed significant differences in rna splicing kinetics between s and s segments at this time point. in addition, rna velocity revealed the presence of two subclusters within the s segment at hours (fig. d) . 
these two velocity subclusters did not correlate with the two states seen in pseudotime analysis. this indicates that multiple analytic approaches are needed to fully characterize cellular changes along the sepsis timeline. we next show gene expression profiles in select cell types along the sepsis timeline. in this analysis, we included endothelial cells, pericyte/stromal cells, macrophages and s tubular cells. within hour of lps exposure, most cell types showed decreased expression of select genes involved in ribosomal function, translation and mitochondrial processes such as eef and rpl genes (fig. a, supplementary fig. a ). this reduction peaked at hours and recovered by hours. concomitantly, most cell types exhibited increased expression of several genes involved in inflammatory and antiviral responses such as tnfsf , cxcl , ifit , and irf . however, this increase was not synchronized among all cell populations. indeed, it occurred as early as hour in endothelial cells, macrophages and pericyte/stromal cells, all acting as first responders. in contrast, epithelial cells were late responders, with increases in inflammatory and antiviral responses occurring between and hours. in fact, four hours after lps administration, cluster-specific go terms were indistinguishable among the majority of cell types with enrichment in terms related to defense, immune and bacterium responses (fig. b) . one noted exception was the s t cells (outer stripe s ) which did not enrich as robustly as other cell types in these terms. it mostly maintained an expression profile related to ribosomes, translation and drug transport throughout the sepsis timeline (supplementary fig. ). other players of interest in sepsis pathophysiology such as prostaglandin and coagulation factors are described in supplementary figure b . 
at the -hour time point, while s cells partially recovered to baseline, the macrophages showed increased expression of genes involved in phagocytosis, cell motility and leukotrienes, broadly representative of activated macrophages (e.g. csf r, lst , capzb, s a , cotl , alox ap, fig. a) . intriguingly, at this late time point, the pericyte/stromal cells are enriched in unique terms related to specific leukocyte and immune cell types such as lymphocyte-mediated immunity, t cell mediated cytotoxicity and antigen processing and presentation. this suggests that the pericyte may function as a transducer between epithelia and other immune cells. therefore, we next examined comprehensively cell-cell communication along the sepsis timeline. we show select examples of cell type-specific receptor ligand pairs. for example, we found that s and endothelial cells communicate with the angpt (angiopoetin ) and tek (tie ) ligand-receptor pair at baseline and throughout the sepsis timeline ( fig. a-b, supplementary fig. a ). in contrast, c was strongly expressed in pericyte/stromal cells, while its receptor c ar localized to macrophage/dcs. this communication, present at baseline, did increase along the sepsis timeline with additional players such as s participating in the cross talk (supplementary fig. ) . another strong communication was noted between endothelial cells and macrophage/lymphocytes using the ccl and ccr receptor-ligand pair. the architectural layout of these four cell types, with pericytes and endothelial cells residing between proximal tubule and macrophage/dcs may explain these complex communication patterns . such communication patterns among these four cell types may also explain macrophage clustering around s tubules at later time points in sepsis as we previously reported . when examined comprehensively, receptor-ligand signaling progressed from a broad pattern at baseline into a more discrete and specialized one hours after lps (fig. c, supplementary fig. 
b-c) the murine sepsis timeline allows staging of human sepsis finally, we asked whether our mouse sepsis timeline could be used to stratify human sepsis aki. to this end, we selected the differentially expressed genes from all cells combined (pseudo bulk) for each time point across the mouse sepsis timeline (supplementary table ) . we then examined the orthologues of these defining genes in human kidney biopsies of patients with sepsis and aki. the clinical data associated with these human biopsies did not allow further stratification or staging of the sepsis timeline (supplementary table ). as shown in figure d , our approach using the mouse data succeeded in partially stratifying the human biopsies into early, mid and late sepsis-related aki. these findings suggest that underlying injury mechanisms are conserved, and the mouse timeline may be valuable in staging and defining biomarkers and therapeutics in human sepsis. in this work, we provide comprehensive transcriptomic profiling of the kidney in a murine sepsis model. to our knowledge, this is the first description of spatial and temporal transcriptomic changes in the septic kidney that extend from early injury well into the recovery phase. our data cover nearly all renal cell types and are time-anchored, thus providing a detailed and precise view of the evolution of sepsis in the kidney at the cellular and molecular level. using a combination of analytical approaches, we identified marked phenotypic changes in multiple cell populations along the sepsis timeline. the proximal tubular s segment exhibited significant alterations consisting of early loss of traditional function-defining markers (e.g., sglt ). similar losses of function-defining markers along the nephron may explain the profound derangement in solute and fluid homeostasis seen in sepsis. concomitantly, we observed novel epithelial expression of immune-related genes such as those involved in antigen presentation. 
this indicates a dramatic switch in epithelial function from transport and homeostasis to immunity and defense. these phenotypic changes were reversible, thus underscoring the remarkable resilience and plasticity of the renal epithelium. in addition, our combined analytical tools clearly identified unique subclusters within each epithelial cell population (e.g., cortical s and os s ). these subclusters likely represent novel populations that may be in part influenced by the complex microenvironments in the kidney. it is likely that such microenvironments define unique features in epithelial subpopulations such as the expression of complete sars-cov- machinery in s . similarly, we also identified unique features in immune-cell populations. for example, the combined use of rna velocity field and pseudotime analyses uncovered differences in macrophage subtypes relating to rna kinetics and cell differentiation trajectories. of note, these subtypes only partially matched the traditional flow cytometry-based classification of macrophages (e.g., m /m ). therefore, the use of single-cell rna seq is a powerful approach that will add to and complement our current understanding of the immune cell repertoire in the kidney. additional approaches such as receptor-ligand crosstalk and gene regulatory network analyses identified unique cell- and time-dependent players involved in sepsis pathophysiology. our work points to the urgent need for defining a more accurate and precise timeline for human sepsis. such definition will guide the development of biomarkers and therapies that are cell and time specific. we show evidence supporting the relevance of murine models and their usefulness in staging human sepsis. these precisely time- and space-anchored data will provide the community with rich and comprehensive foundations that will propel further investigations into human sepsis. animal model: male c bl/ j mice were obtained from the jackson laboratory.
mice were - biopsies were used in this study, the institutional review board determined that informed consent was not required. murine kidneys were transported in rpmi (corning), on ice immediately after surgical procurement. kidneys were rinsed with pbs (thermofisher) and minced into eight sections. each sample was then enzymatically and mechanically digested with reagents from multi- tissue dissociation kit and gentlemacs dissociator/tube rotator (miltenyi biotec). the samples were prepared per protocol "dissociation of mouse kidney using the multi tissue dissociation kit " with the following modifications: after termination of the program "multi_e_ ", we added ml rpmi (corning) and % bsa (sigma-aldrich) to the mixture, filtered and homogenate was centrifuged ( g for minutes at °c). cell pellet was resuspended in ml of rbc lysis buffer (sigma), incubated on ice for minutes, and cell pellet washed three times ( g for minutes at °c ). annexin v dead cell removal was performed using magnetic bead separation after final wash, and the pellet resuspended in rpmi /bsa . %. viability and counts were assessed using trypan blue (gibco) and brought to a final concentration of million cells/ml, exceeding % viability as specified by x genomics processing platform. the sample was targeted to , cell recovery and applied to a single cell master mix with lysis buffer and reverse transcription reagents, following the chromium single cell ' reagent kits v user guide, cg rev a ( x genomics, inc.). this was followed by cdna synthesis and library preparation. all libraries were sequenced in illumina novaseq platform in paired-end mode ( bp + bp). fifty thousand reads per cell were generated and the x genomics cellranger (v. . . ) pipeline was utilized to demultiplex raw base call files to fastq files and reads aligned to the mm murine genome using star . cellranger computational output was then analyzed in r (v. . . ) using the seurat package v. . . . , . 
seurat objects were created for non-integrated and integrated (inclusive of all time points) datasets using the following filtering metrics: gene counts were set between - and mitochondrial gene percentages less than to exclude doublets and poor quality cells. gene counts were log transformed and scaled to . the top principal components were used to perform unsupervised clustering analysis, and visualized using umap dimensionality reduction (resolution . ). using the seurat package, annotation and grouping of clusters to cell type was performed manually by inspection of differentially expressed genes (degs) for each cluster, based on canonical marker genes in the literature - , , . in some experiments, we used edger negative binomial regression to model gene counts and performed differential gene expression and pathway enrichment analyses (topkegg, topgo, fig. , supplementary fig. a, supplementary fig. , and david . fig. b) . , . the immune cell subset was derived from the filtered, integrated seurat object and included the macrophage/dc (cluster ), neutrophil (cluster ) and lymphocyte (cluster ) cells. gene counts were log transformed and scaled, and principal component analysis was performed as for the integrated object above. umap resolution was set to . , which yielded clusters. the clusters were manually assigned based on inspection of degs for each cluster, and cells were grouped if canonical markers were biologically redundant. we confirmed manual labeling with an automated labeling program in r, singler . scenic analysis was performed using the default settings, and the mm - bp-upstream- species.mc nr.feather database was used for data display. we performed pseudotime analysis on the integrated seurat object containing all cell types as well as the immune cell subset. cells from each of the seven time points were included and were split into individual gene expression data files organized by previously defined cell type.
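the filtering and log-normalization steps described above can be sketched as follows. this is a minimal illustration in plain python, not the authors' r/seurat code. the threshold values and the scale factor are hypothetical placeholders, because the actual cutoffs are not recoverable from the text; the normalization formula follows the log(1 + count / total × scale factor) form used by seurat's lognormalize method.

```python
import math

# Hypothetical placeholder values: the study's actual cutoffs are not
# recoverable from the text, so these defaults are assumptions.
MIN_GENES = 200
MAX_GENES = 3000
MAX_PCT_MITO = 50.0
SCALE_FACTOR = 10_000  # assumed; this is Seurat's default for LogNormalize


def passes_qc(counts, mito_genes, min_genes=MIN_GENES,
              max_genes=MAX_GENES, max_pct_mito=MAX_PCT_MITO):
    """Return True if one cell passes the gene-count and mitochondrial filters.

    counts: dict mapping gene name -> raw count for a single cell.
    mito_genes: set of gene names treated as mitochondrial.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    # Number of genes detected (non-zero counts); a very high value can flag doublets.
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Percentage of counts from mitochondrial genes; a high value flags poor-quality cells.
    mito = sum(c for g, c in counts.items() if g in mito_genes)
    pct_mito = 100.0 * mito / total
    return min_genes <= n_genes <= max_genes and pct_mito < max_pct_mito


def log_normalize(counts, scale_factor=SCALE_FACTOR):
    """Scale each count by the cell total, multiply by scale_factor, then log1p."""
    total = sum(counts.values())
    return {g: math.log1p(c / total * scale_factor) for g, c in counts.items()}
```

cells failing `passes_qc` would be dropped before normalization; the upper bound on detected genes is what removes likely doublets, while the mitochondrial cutoff removes damaged cells.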
a septic mouse kidney was immediately frozen in optimal cutting temperature media. a µm frozen tissue section was cut and affixed to a visium spatial gene expression library preparation slide ( x genomics). the specimen was fixed in methanol and stained with hematoxylin-eosin reagents. images of hematoxylin-eosin-labeled tissues were collected as mosaics of x fields using a keyence bz-x fluorescence microscope equipped with a nikon x cfi plan fluor objective. the tissue was then permeabilized for minutes and rna was isolated. the cdna libraries were prepared and then sequenced on an illumina novaseq . using seurat . . , we identified anchors between the integrated single cell object and the spatial transcriptomics datasets and used those to transfer the cluster data from the single cell to the spatial transcriptomics. for each spatial transcriptomics spot, this transfer assigns a score to each single cell cluster. we selected the cluster with the highest score in each spot to represent its single cell associated cluster. using a loupe browser, expression data was visualized overlying the hematoxylin-eosin image. formalin-fixed paraffin-embedded cross sections were prepared with a thickness of µm. the slides were baked for minutes at °c. tissues were incubated with xylene for minutes x , % etoh for minutes x , and dried at room temperature. rna in situ hybridization was fluorescein plus evaluation kit (perkinelmer, inc) was used as secondary probes for the detection of rna signals. all slides were counterstained with dapi and coverslips were mounted using fluorescent mounting media (prolong gold antifade reagent, life technologies). the images were collected with a lsm confocal microscope (carl zeiss). no blinding was used for animal experiments. all data were analyzed using r software packages, with relevant statistics described in results, methods and fig. legends . data will be deposited to ncbi geo. 
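the spot-labeling rule described above for the spatial transcriptomics data (each spot receives a score for every single cell cluster, and the highest-scoring cluster is kept) can be sketched as a simple argmax. the data layout and the names `assign_spots`, `spot_1` and `spot_2` are hypothetical and used only for illustration; the scores themselves would come from the seurat anchor-based transfer.

```python
def assign_spots(scores):
    """Pick the highest-scoring single cell cluster for every spot.

    scores: dict mapping spot id -> dict mapping cluster name -> transfer score.
    Ties resolve to the first cluster encountered in the inner dict.
    """
    return {spot: max(per_cluster, key=per_cluster.get)
            for spot, per_cluster in scores.items()}


# Illustrative (made-up) scores for two spots.
example = {
    "spot_1": {"endothelial": 0.10, "macrophage": 0.65, "pericyte": 0.25},
    "spot_2": {"endothelial": 0.70, "macrophage": 0.20, "pericyte": 0.10},
}
labels = assign_spots(example)
```

because only the top score is retained, a spot covering a mixture of cell types is reported as a single cluster, which is worth keeping in mind when reading the overlay.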
the authors declare that all relevant data supporting the findings of this study are available on request. r scripts for performing the main steps of analysis are available from the lead contact on request. correspondence and requests for resources and reagents should be directed to and will be fulfilled by the lead contact takashi hato (thato@iu.edu). supplemental fig. - : refer to "supplemental_fig - .pdf" supplemental table : cell-type specific differentially expressed genes from - hours, related to fig. , supplemental fig. . lung-kidney cross-talk in the critically ill patient distant organ dysfunction in acute kidney injury: a review sepsis: current dogma and new perspectives sepsis associated acute kidney injury bacterial sepsis triggers an antiviral response that causes translation shutdown rna sequencing of adult kidney: rare cell types and novel cell states revealed in fibrosis representation and relative abundance of cell-type selective markers in whole- kidney rna-seq data a single-nucleus rna-sequencing pipeline to decipher the molecular anatomy and pathophysiology of human kidneys deep sequencing in microdissected renal tubules identifies nephron segment-specific transcriptomes single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease single-cell profiling reveals sex, lineage, and regional diversity in the mouse kidney the macrophage mediates the renoprotective effects of endotoxin preconditioning endotoxin preconditioning reprograms s tubules and macrophages to protect the kidney endotoxin uptake by s proximal tubular segment causes oxidative stress in the downstream s segment epithelial cell cycle arrest in g /m mediates kidney fibrosis after injury joint profiling of chromatin accessibility and gene expression in thousands of single cells visualization and analysis of gene expression in tissue sections by spatial transcriptomics structural basis for the recognition of sars-cov- by full-length human ace 
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor multiorgan and renal tropism of sars-cov- renal histopathological analysis of postmortem findings of patients with covid- in china sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes identification and functional characterization of dendritic cells in the healthy murine kidney and in experimental glomerulonephritis quantification of dendritic cell subsets in human renal tissue under normal and pathological conditions macrophages in renal injury and repair the debate about dendritic cells and macrophages in the kidney distinct macrophage phenotypes contribute to kidney injury and repair pivotal role of plasmacytoid dendritic cells in inflammation and nk-cell responses after tlr triggering in mice interferon-lambda modulates dendritic cells to facilitate t cell immunity during infection with influenza a virus star: ultrafast universal rna-seq aligner comprehensive integration of single-cell data a transcriptional map of the renal tubule: linking structure to function single-cell transcriptomics of a human kidney allograft biopsy specimen defines a diverse inflammatory response bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage scenic: single-cell regulatory network inference and clustering the dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells characterization of cell fate probabilities in single-cell data with palantir rna velocity of single cells cellphonedb: inferring cell- cell communication from combined expression of multi-subunit ligand-receptor complexes circlize implements and enhances circular visualization in r we thank the kidney precision medicine project for making data available for human kidney 
reference nephrectomy specimens. we thank daria barwinska for assistance with specimen validation. this work was supported by nih k -dk to th, nih r -dk , key: cord- - cgih o authors: mead, benjamin e.; hattori, kazuki; levy, lauren; vukovic, marko; sze, daphne; matute, juan d.; duan, jinzhi; langer, robert; blumberg, richard s.; ordovas-montanes, jose; shalek, alex k.; karp, jeffrey m. title: high-throughput organoid screening enables engineering of intestinal epithelial composition date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cgih o barrier tissue epithelia play an essential role in maintaining organismal homeostasis, and changes in their cellular composition have been observed in multiple human diseases. within the small intestinal epithelium, adult stem cells integrate diverse signals to regulate regeneration and differentiation, thereby establishing overall cellularity. accordingly, directing stem cell differentiation could provide a tractable approach to alter the abundance or quality of specialized cells of the small intestinal epithelium, including the secretory paneth, goblet, and enteroendocrine populations. yet, to date, there has been a lack of suitable tools and rigorous approaches to identify biological targets and pharmacological agents that can modify epithelial composition to enable causal testing of disease-associated changes with novel therapeutic candidates. to empower the search for epithelia-modifying agents, we establish a first-of-its-kind high-throughput phenotypic organoid screen. we demonstrate the ability to screen thousands of samples and uncover biological targets and associated small molecule inhibitors which translate to in vivo. this approach is enabled by employing a functional, cell-type specific, scalable assay on an organoid model designed to represent the physiological cues of in vivo paneth cell differentiation from adult intestinal stem cells. 
further, we miniaturize and adapt the organoid culture system to enable automated plating and screening, thereby providing the ability to test thousands of samples. strikingly, in our screen we identify inhibitors of the nuclear exporter xpo modulate stem cell fate commitment by inducing a pan-epithelial stress response combined with an interruption of mitogen signaling in cycling intestinal progenitors, thereby significantly increasing the abundance of paneth cells independent of known wnt and notch differentiation cues. we extend our observation in vivo, demonstrating that oral administration of xpo inhibitor kpt- at doses , -fold lower than conventionally used in hematologic malignancies increases paneth cell abundance. in total, we provide a framework to identify novel biological cues and therapeutic leads to rebalance intestinal stem cell differentiation and modulate epithelial tissue composition via high-throughput phenotypic screening in rationally-designed organoid model of differentiation. epithelial barrier tissues enable interaction, exchange, and protection from the external environment. these vital functions are accomplished by specialized epithelial cells integrated with stromal and immune cell populations. within the small intestine, adult intestinal stem cells (iscs), conventionally identified by markers including lgr (barker et al., ) and olfm (van der flier et al., ) , provide a source of constant regeneration from which an ordered process of differentiation into secretory and absorptive epithelial cells determines composition, thereby steering barrier function. under homeostatic conditions, wnt, bmp, and notch signaling actively maintain the isc niche (kim et al., ; pinto et al., ) . however, iscs have a demonstrated capacity to integrate dietary and immune-derived signals to modulate their self-renewal and differentiation into specific secretory lineages (beyaz et al., ; biton et al., ; von moltke et al., ) . 
further, following injury, the isc niche has a remarkable capacity to regenerate from nonstem or quiescent stem pools (ayyaz et al., ; tetteh et al., ; yan et al., ). cellular identity in the stem cell niche is fluid and responsive to both physiologic and pathologic stimuli (roulis et al., ). in the barrier tissues of the upper respiratory tract and skin, changes in epithelial composition arising from aberrant stem cell differentiation (naik et al., ; ordovas-montanes et al., ) may be a precipitating factor in inflammatory diseases. additionally, shifts in the composition and quality of mature epithelial cells descendant from iscs are known to occur in the colon and the small intestine of patients suffering from inflammatory bowel disease (graham and xavier, ; smillie et al., ). given that barrier tissue stem cells possess significant control over cellular composition and may provide for the restoration of tissue homeostasis in a broad spectrum of human disease, they are a compelling target for therapeutic development. to support discovery efforts focused on modulating or restoring epithelial barrier composition, there is a need for tools to identify druggable biological targets involved in regulating epithelial differentiation. such a tool should employ a physiologically-representative model of the barrier to provide the best opportunity to identify biological targets that may translate to the in vivo context, while also recapitulating cell differentiation processes in a scalable and robustly measurable fashion. intestinal organoids, broadly defined as three-dimensional, stem cell-derived, tissue-like cellular structures, have proven to be valuable models of the adult stem cell niche, and preserve known developmental pathways in stem cell differentiation (sato et al., ; yin et al., ).
however, because organoid models are dynamic, cellularly and structurally heterogeneous, and typically require complex and costly experimental manipulations, their application in phenotypic high-throughput screens has been limited. screening activities in organoid models of multiple tissues have so far either focused on clear, genetically-driven behaviors (dekkers et al., ; korving et al., ) , or had screening capacity demonstrated on the order of tens of perturbations (czerniecki et al., ) . while these are foundational steps towards harnessing organoids for screening, these models have not yet provided for novel biological target discovery which translates to the native in vivo tissue. one approach to adapt organoids for phenotypic high-throughput screening is through the reduction of model complexity around a well-structured hypothesis which incorporates links to in vivo tissue biology (mead and karp, ) . existing chemical tools for small intestinal organoids may enable this approach. the addition of well-characterized small molecules to culture media provides for both enriching intestinal organoids for iscs, and driving differentiation down specific lineages, by providing physiologically-meaningful cues, such as the modulation of wnt and notch signaling (yin et al., ) . use of such rationally-directed differentiation has been applied to induce functional paneth cells from enriched iscs in vitro (mead et al., ) . by employing small molecules to mimic known in vivo signaling cues of physiological differentiation from a stem cell, it becomes possible to construct organoids representative of lineage-specific stem cell differentiation and frame a screening campaign around identifying new biological targets which may modulate such differentiation. for instance, identification of biological targets and accompanying small molecules which enhance paneth cell differentiation from iscs. 
searching for novel targets which enhance paneth cell differentiation and increase abundance in the native tissue may be therapeutically valuable. declines in paneth cell quality and number are observed in inflammatory bowel disease (ibd) (gassler, ; khor et al., ; liu et al., ; mcguckin et al., ; xavier and podolsky, ). similar paneth cell aberrations occur in necrotizing enterocolitis (nec), corresponding with intestinal immaturity and excessive inflammation and systemic infection (mcelroy et al., ; sherman et al., ; tanner et al., ; white et al., ). emerging evidence suggests that certain viral pathogens, including a subset of coronaviruses, may mediate their disruption of the intestinal barrier by targeting and depleting paneth cells (wu et al., ). finally, patients with graft versus host disease (gvhd) can exhibit a loss in paneth cell number and quality, and associated microbial dysbiosis (eriguchi et al., ). additionally, treatment with r-spondin , a potentiator of wnt signaling, can elevate the secretion of paneth-specific alpha-defensins and resolve dysbiosis seen in mice with gvhd by stimulating iscs to differentiate into paneth cells (hayase et al., ). while treatment with r-spondin illustrates the importance of stem cell cues driving barrier tissue reconstitution, it faces a major challenge in clinical translation because wnt activation is implicated in precancerous hyperplasia (okubo and hogan, ; sansom et al., ). other signaling pathways known to drive paneth cell differentiation, including notch signaling, face similar challenges. activation of notch signaling amplifies the proliferative progenitor population and promotes an absorptive cell lineage (fre et al., ; jensen et al., ; vandussen et al., ). conversely, deactivation of notch signaling amplifies differentiation to all secretory cell types and leads to secretory cell hyperplasia (vandussen and samuelson, ).
a more specific, and niche factor-independent treatment to accomplish isc-to-paneth differentiation could provide for safer modulation of tissue composition. directing iscs to preferentially differentiate to paneth cells independent of known niche signaling (wnt and notch) offers both a specific hypothesis and a compelling therapeutic target. accordingly, we have sought to adapt and scale an organoid model to screen for pharmaceutically-actionable biological targets which mediate isc-to-paneth differentiation independent of major niche-associated pathways. to screen for novel biological targets with established pharmacophores which may regulate paneth cell differentiation in vitro and translate in vivo, we require a physiological model system and a scalable screening approach to reasonably scan annotated small molecule space (up to thousands of samples) and measure specific changes of a single cell type (paneth cells), within a dynamic (differentiation) and heterogeneous (organoid) system. to model physiologically-driven paneth cell differentiation, we employ a method of small molecule-mediated enrichment and differentiation of intestinal organoids from iscs to paneth cells. murine intestinal organoids are typically expanded in a -d matrigel scaffold with supplemented culture media containing growth factors intended to mimic the isc niche -epidermal growth factor (e), bmp-antagonist noggin (n), and the aforementioned wnt-pathway enhancer r-spondin (r). spontaneous differentiation of organoids grown in enr media can be mitigated, and the population of iscs enriched by the addition of small molecules chir (c), an activator of wnt signaling, and valproic acid (v), an activator of notch signaling. these isc-enriched cultures can then be differentiated towards paneth cells with the small molecules (c) and dapt (d), an inhibitor of notch signaling, as we have previously shown (mead et al., ; yin et al., ) . 
to measure changes in paneth cell abundance or quality, we employ a previously demonstrated assay of paneth cell-specific function and relative-abundance with a commercially-available assay for secreted lysozyme (lyz) (mead et al., ) . to scale this model system and measurement, we adapt conventional -d organoid culture into a . -d pseudo-monolayer, where isc-enriched organoids are partially embedded on the surface of a thick layer of matrigel (at the matrigel-media interface), rather than fully encapsulated in the matrigel structure -an approach similar to others previously reported (langhans, ) . this technique enables matrigel plating, cell seeding, and media additions to be performed in a high-throughput, fully-automated, -well plate format and allows for lyz secretion directly into cell culture media, thereby enhancing measurement sensitivity. to test our approach, we screened a small molecule library containing well-annotated and specific small molecule inhibitors of a diverse range of biological targets (supp . table ) over a -day differentiation starting from isc-enriched organoid precursors (n = biological replicates originating from unique murine donors). small molecules were added into distinct wells at concentrations per compound ( nm to µm range) at day and day , and at day we measured the functional secretion of lyz in media supernatants, as a specific marker of paneth cell enrichment (fig. a) . paneth cell abundance was determined by measuring basally secreted lyz (lyz.ns) followed by carbachol (cch)-induced secretion (lyz.s), along with cellular atp as a measure of relative cell number per well. we chose to assay both stimulant-induced (lyz.s -total cellular lyz) and basal (lyz.ns -constitutively secreted lyz) secretion to distinguish compounds which may mediate changes in paneth cell quality or secretion (lyz.ns and lyz.s uncorrelated) versus changes in paneth cell abundance (lyz.ns and lyz.s correlated). 
the target-selective inhibitor library contained compounds with high specificity to unique molecular targets, many implicated in stem cell differentiation. in total, our proof-of-concept screen assayed , unique samples with a -measure functional assay. we first sought to demonstrate that our screening approach is reproducible, and that our assays measure meaningful function at scale. following normalization of all measured wells, each assay had an approximately normal distribution, with lower-value tails corresponding to toxic compounds (supp. fig. a). assay values across biological replicates were well correlated, with pearson correlation values between screen plates ranging from . to . (supp. fig. b). to validate our multiplexed assay's performance in the screen, we assessed each assay's performance based on untreated control wells, which are randomly distributed in each screening plate. control wells had significantly higher atp readings than no-cell wells (adj. p < . ), and in the lyz.ns and subsequent lyz.s assays, supernatant lyz was significantly higher in µm cch-stimulated control wells than in basal control wells (lyz.ns adj. p < . , lyz.s adj. p < . ), which in turn were significantly higher than no-cell wells (lyz.ns adj. p < . , lyz.s adj. p < . ) (supp. fig. c). following confirmation of reasonable plate reproducibility and assay performance, we next sought to define which molecules meaningfully increased paneth cell abundance. we defined primary screen 'hits' as having replicate strictly standardized mean differences (ssmds) in both lyz.ns and lyz.s assays greater than the calculated optimal critical value (βα = . ) (fig. b). βα was determined as the intersection minimizing false positive and false negative levels (fpl & fnl error = . ) for up-regulation of ssmd-based decisions (zhang, ).
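the ssmd-based hit rule above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' code: it uses the simple method-of-moments form for paired replicate differences (mean of the differences divided by their sample standard deviation), whereas the text does not say which ssmd estimator was applied. the critical value is passed in as a parameter because the numeric value of βα is not recoverable from the text, and the function names are hypothetical.

```python
from statistics import mean, stdev


def ssmd_paired(diffs):
    """Method-of-moments SSMD for paired replicate differences.

    diffs: per-replicate differences (treated minus control) for one
    treatment-dose condition. Needs at least two replicates, and raises
    ZeroDivisionError if all differences are identical.
    """
    return mean(diffs) / stdev(diffs)


def is_primary_hit(lyz_ns_diffs, lyz_s_diffs, critical_value):
    """A condition is a hit only if SSMD exceeds the cutoff in BOTH LYZ assays."""
    return (ssmd_paired(lyz_ns_diffs) > critical_value
            and ssmd_paired(lyz_s_diffs) > critical_value)
```

requiring both assays to pass is what ties a hit to increased paneth cell abundance rather than to a change in secretion alone, as described for the correlated lyz.ns and lyz.s readouts.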
the hits correspond to treatment-dose (grouped by biological replicate) combinations that had a statistically significant increase in lyz.ns and lyz.s without regard to viability (though most hits per these criteria had positive effects on cellular atp). hits were narrowed down to treatment-dose combinations using the z-scored fold-change to select for combinations that elicited a biological effect in the top % of values for both lyz assays relative to the plate (z score > . ). thus, drugs (covering treatment-dose conditions) from unique molecular targets were identified as primary screen hits (fig. c). for molecular targets with more than one hit, only the most robust treatment-dose was selected for further investigation. to validate primary screen hits against an enr+cd (not plate) control, while identifying appropriate dose-response ranges and narrowing hits to only the most potent activator(s) of paneth cell differentiation, we performed a secondary screen with the primary screen hit compounds. compounds were tested at a narrowed dose range around each treatment's identified optimal dose from the primary screen ( x below, x below and above). hits in the validation screen were chosen by ssmds in both lyz.ns and lyz.s assays greater than the calculated optimal critical value (βα = . ), with compounds passing this threshold (supp. fig. d). the same treatment-dose conditions passing the ssmd threshold also had the greatest biological effect, and in particular one compound, kpt- , a known xpo inhibitor, had two doses representing the greatest and near-greatest biological effect (~ - % increases in lyz.ns and lyz.s relative to enr+cd control) (fig. d). the results of primary and secondary screening reflect a mixture of potential effects which may cause increases in total lyz secretion. this includes contributions from: ( ) enhanced paneth cell differentiation, ( ) altered paneth cell quality, and ( ) changes in total cell number concurrent with differentiation.
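the effect-size filter used to narrow the primary hits (z-scored fold-change relative to the plate, required in both lyz assays) can be sketched as below. this is an illustrative sketch only: the z cutoff is a parameter because its value is not recoverable from the text, the use of the sample standard deviation is an assumption, and the function and condition names are hypothetical.

```python
from statistics import mean, stdev


def plate_z_scores(fold_changes):
    """Z-score each condition's fold-change against all wells on the plate.

    fold_changes: dict mapping condition id -> fold-change for one assay.
    Uses the sample standard deviation (an assumption).
    """
    values = list(fold_changes.values())
    mu, sd = mean(values), stdev(values)
    return {cond: (fc - mu) / sd for cond, fc in fold_changes.items()}


def narrow_hits(fc_lyz_ns, fc_lyz_s, z_cut):
    """Keep conditions whose z-score exceeds z_cut in both LYZ assays."""
    z_ns = plate_z_scores(fc_lyz_ns)
    z_s = plate_z_scores(fc_lyz_s)
    return sorted(c for c in z_ns if z_ns[c] > z_cut and z_s.get(c, float("-inf")) > z_cut)


# Illustrative (made-up) fold-changes for four conditions on one plate.
ns = {"cmpd_a": 1.0, "cmpd_b": 1.0, "cmpd_c": 1.0, "cmpd_d": 5.0}
s = {"cmpd_a": 1.0, "cmpd_b": 5.0, "cmpd_c": 1.0, "cmpd_d": 5.0}
selected = narrow_hits(ns, s, z_cut=0.8)
```

in this made-up example only `cmpd_d` is high in both assays; `cmpd_b` is high in the stimulated assay alone and is excluded, which mirrors the requirement that a hit show an effect in both lyz.ns and lyz.s.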
to better inform how the compounds increased total secreted lyz, and to isolate only those which enhance paneth differentiation robustly, we utilized flow cytometry to measure changes in paneth cell representation within treated organoids during differentiation. concurrently, to ensure that we do not select for compounds which manifest their behavior only in specific in vitro settings, we performed the analyses in the conventional -d culture method ( ), thus providing control for . -d culture system-specific effects. single live cells were selected by several gating strategies and paneth cells were identified as lyz-high, cd -mid, side scatter-high (ssc-high) (supp. fig. e ). only one compound, kpt- -the most potent compound in validation screening -significantly enhanced the mature paneth cell population within differentiating organoids, suggesting kpt- induces paneth differentiation ( ) (fig. e) . of the remaining compounds, nilotinib excluded, none changed organoid composition and are likely driving changes in paneth quality ( ) or are mediating effects dependent on . -d culture ( ). nilotinib significantly decreased paneth abundance, while significantly increasing total cell number, suggesting the overall increase in bulk lyz secretion is an effect of increased proliferation ( ), or . -d-mediated effect ( ). to examine whether our hits are dependent or independent of canonical stem cell niche signaling, we next evaluated whether the culture media supplements c and d (which mimic physiological paneth differentiation through wnt activation and notch inhibition) may alter the effects of our hits. we measured paneth cell abundance in the canonical enr culture condition in -d (nb because paneth cells exist in an immature state within enr, we were unable to robustly quantify paneth cell number via flow cytometry, and instead used the lyz secretion assay). 
this result mirrored our flow cytometry findings in the enr+cd condition, suggesting that the identified compounds act independently of strong wnt and notch drivers, and that only kpt- is enhancing paneth cell-specific activity in the conventional organoid culture condition (suppl. fig. f). collectively, these results led us to focus solely on kpt- and its potential mechanism of xpo inhibition.
we next sought to confirm the predicted on-target activity of kpt- and investigate the dose-dependency of treatment in enhancing paneth cell differentiation. kpt- is a first-in-class, orally administered, fda-approved drug against multiple myeloma, targeting the nuclear transporter xpo (also known as crm- ). administration of kpt- below nm for days (nb higher concentrations proved toxic in primary screening) showed lyz secretion increasing in a dose-dependent manner, with nm of kpt- as the most effective dose among tested concentrations.
kpt- is a selective inhibitor of nuclear export (sine); these molecules act by suppressing the xpo -regulated nuclear export of multiple proteins and mrnas from the nucleus to the cytoplasm, including genes involved in stem cell maintenance and differentiation as well as inflammatory stress response (sendino et al., ). proteins shuttled by xpo are marked with a nuclear export signal (nes). additionally, xpo is known to regulate cell cycle through its export-independent role in the regulation of mitosis (forbes et al., ). based on this evidence, we hypothesized that xpo inhibition might provide for enhanced paneth cell differentiation by directing iscs to modulate their differentiation trajectories through alterations in developmental signaling within the nucleus and/or interference with cell cycle.
to test the hypothesis that kpt- drives paneth differentiation by altering isc behavior, we utilized single-cell rna-sequencing (scrna-seq) via seq-well s^ (hughes et al., ).
we performed a longitudinal comparison between untreated and kpt- treated organoids over a -day differentiation, with a particular emphasis on early timepoints ( fig. a) . we collected samples at following timepoints: hours ( . days), and , , , , or days. each sample consists of single cells from > , organoids from pre-differentiation enr+cv organoids and both enr+cd and enr+cd + kpt- ( nm) conditions. for time points beyond days, media was refreshed every other day. the resulting dataset consists of , cells. unique molecular identifier (umi), percent mitochondrial, and detected gene distributions are similar across samples, within acceptable quality bounds (genes > , umi < , , percent mitochondrial < ) (supp. fig. a ). table ) . each cluster possessed similar quality metrics, suggesting that clusters are driven by biological and not technical differences (supp. fig. b ). to contextualize and provide a more robust measure of cellular identity of our clusters, we used lineage-defining gene sets from a murine small intestinal scrna-seq atlas (haber et al., ) to score for enrichment in gene set expression (supp. fig. c ). the eight clusters include three stem-like, two enterocyte, two paneth, and one enteroendocrine, aligning with our expectation that enr+cd differentiation should enrich for secretory epithelium cells -principally paneth and to a lesser extent enteroendocrine (fig. e) . to distinguish the three stem-like clusters, and assess physiological relevance, we performed module scoring over gene sets identified to correspond to known isc subsets in vivo (biton et al., ) (supp. fig. d) . we see alignment with the type iii and type i iscs per the nomenclature of biton et al., along with slight enrichment for a distinct type ii (supp. fig. e ), though this population may also be an intermediate between stem i and iii populations, sharing markers with both (fig. d) . accordingly, we adopted the naming scheme of biton et al. 
to we next explored changes in cell type representation between organoids treated with kpt- versus control. importantly, in the combined dataset, we do not observe cell clusters unique to kpt- treatment, but rather shifts in cluster composition (supp. fig. f ). both conditions begin with over % of cells either stem ii or stem iii. by day , stem i emerges, accounting for approximately % of the cells in the control condition, but a smaller proportion in kpt- -treated organoids. early enterocytes emerge at day , with the continued differentiation to enterocytes, peaking at day and becoming less prevalent by day . early paneth population appears to crest with enterocytes followed by a transition to paneth cells continuing to day ( fig. e) . to better quantify the differences in representation between the kpt- and control conditions over time, we performed fisher exact testing for each cell type relative to all others. this was done for each timepoint where that cell type accounted for at least % of cells in both kpt- and control samples. we present the relative enrichment or depletion of a cell population with kpt- treatment over time as the odds ratio (or) with a corresponding % confidence interval. kpt- treatment leads to a depletion of stem i, ii, iii and enteroendocrine cells over time along with the corresponding enrichment of enterocytes and paneth cells (fig. g ). the observed two-fold enrichment in paneth cells at day mirrors our flow cytometry observations of a two-fold increase in mature paneth cells, while also showing the unexpected early enrichment of enterocytes and longer-term depletion of a subset of stem cells -the quiescent stem i population. compositional changes during differentiation with kpt- are consistent with xpo inhibition acting in a pro-differentiation manner, and our data suggest that the stem ii / iii populations may be a primary target. 
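as an illustration of the per-cell-type comparison described above, the following minimal python sketch computes a two-sided fisher exact p-value and an odds ratio for one cell type versus all other cells at a single timepoint. the function names, the wald interval on the log scale, the 1.96 multiplier (an assumed 95% level), and the example counts are all assumptions made for illustration; the text does not state how the confidence interval was computed.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: treated / control. Columns: cell type of interest / all other cells.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x cells of interest in the treated sample.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Sample odds ratio with a Wald interval on the log scale.

    Assumes all four counts are non-zero; z=1.96 is an assumed 95% level.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


# Hypothetical counts: 8 of 10 treated cells vs 1 of 6 control cells
# belong to the cell type of interest.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
or_, ci_low, ci_high = odds_ratio_with_ci(8, 2, 1, 5)
print(f"p={p_value:.4f}, OR={or_:.1f} ({ci_low:.2f} to {ci_high:.2f})")
```

an odds ratio above one with an interval excluding one would correspond to enrichment of that cell type under treatment, and below one to depletion. timepoints where the cell type falls under the minimum abundance cutoff would simply be skipped before calling these functions.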
in untreated organoids, the expression of xpo is significantly enriched in the cycling stem iii population (fig. a & supp. fig a) , and the expression of genes known to contain a nes (which is required for the nuclear efflux via (fu et al., ) . more specifically, we know that xpo mediates nuclear signaling processes including the mitogen-activated protein kinase (mapk) pathway, nfat, ap- , and aurora kinase activity during cell division (sendino et al., ; . with this in mind, we observe the expression of many key mediators in these pathways within the stem populations, and see particular stem iii-enrichment in members of mapk (mapk , mapk , mapk , mapk ), nfat (nfatc ), ap- (atf ), and aurora kinases (aurka, aurkb) (supp. fig. c ). to explore whether the stem ii / iii population is the principal cellular target of xpo inhibition, we leveraged the dynamic nature of our system and exposed organoids to kpt- over every , , and -day interval in the -day differentiation and measured final abundance and function of mature paneth cells at day , thereby inferring the relative effect of xpo inhibition on each cell type (fig. c) . of all the -day kpt- treatments, day - results in the greatest enrichment in mature paneth cells, with longer exposure following day providing additional, albeit lesser enrichment. further, day - treatment produces moderate enrichment, while day - is no different than untreated (by flow cytometry) or slightly enriched (by lyz secretion assay) (fig. d & supp. fig. d ). using an additional sine, kpt- , we observe similar enrichment behavior as kpt- (supp. fig. e ). this data is consistent with xpo inhibition altering stem ii / iii differentiation -the largest effects of xpo inhibition are concurrent with periods in the differentiation course where stem ii / iii populations are most abundant. 
however, this data also suggests that xpo inhibition may not be entirely stem-dependent, given the lesser, but significant increases in paneth cell number and function with later treatment, where stem ii / iii populations are greatly diminished. to better understand the pleiotropic effects of xpo inhibition which may mediate differentiation, we examined the differentially expressed genes between kpt- treated and untreated stem ii / iii populations in the earliest stages of differentiation when they are most abundant (day . - ). both the most significantly enriched (xpo ) and depleted table ). additional notable genes with significantly increased expression include arrdc (regulates proliferative processes), (growth inhibitor), and atf (regulates stress response in iscs) (cheng et al., ; draheim et al., ; jadhav and zhang, ; zhou et al., ) . genes down-regulated by kpt- treatment appear related to proliferation and cell cycle, including the marker mki . in addition to substantial changes within early stem ii / iii populations, genes regulated by xpo inhibition -including xpo , atf , trp (p ), ccnd , cdk / , and cdkn a (p ) -have increased expression across all cell types (at all times), but with significant differences in the fraction of cells which express each gene (supp. fig. f ). this suggests that there are both stem ii / iii -specific responses and pan-epithelial responses to xpo inhibition. to better contextualize the transcriptional response to kpt- treatment in stem ii / iii cells, we performed gene set enrichment analyses (gsea) using the v molecular signatures database (msigdb) hallmark collection, which represent specific well-defined biological states or processes across systems (liberzon et al., ; subramanian et al., ) . significant gene sets with fdr < . reveal two major programs differentially we next examined whether the responses embodied by the significant differentiallyexpressed genes in stem ii / iii (day . 
- ) may be pan-epithelial or restricted to the cycling stem ii / iii populations. the stress response module (differentially increased in stem ii / iii) is substantially increased across all cells during differentiation, with the greatest effect in the stem ii / iii as well as early mature cell populations, and lowest effect in the mature paneth cells (fig. g) . conversely, the mitogen signaling module (differentially decreased in stem ii / iii) is selectively decreased in stem ii / iii and early enterocyte populations relative to all others. this selectivity corresponds with our observation that the majority of mitogen signaling occurs within the proliferative stem ii / iii populations relative to the mature populations. as further evidence of altered mitogen signaling impacting stem ii / iii cells following xpo inhibition, we observe a decrease in we sought to clarify this conceptual model with the use of additional small molecule inhibitors known to modulate discrete components of our hypothesized differentiation process, namely: signaling through xpo -associated stress response including ap- and p , signaling within the mapk pathway, and finally xpo -mediated effects on mitosis through association with aurora kinases. we began by treating organoids along the enr+cd differentiation course with sr , a small molecule inhibitor of ap- , to test whether ap- is critical to the sine-induced stress response -both alone and in combination with kpt- . we observe that sr significantly decreases functional lyz secretion at the end of the -day differentiation, both in combination with and alone (fig. i ). this suggests that ap- signaling is an important mediator of paneth differentiation from iscs. we next tested whether p is a downstream mediator of xpo inhibition by repeating the above assay with two known p modulators: a p inhibitor pifithrin-a (pfta), and p agonist serdemetan (serd.). 
across a wide dose-range, both p modulators tested did not alter paneth cell differentiation -neither alone nor in combination with kpt- suggesting that the kpt- stress response is not dependent on p signaling modulated by either compound (supp. fig. g ). with the same assay, we began to probe the mitogen signaling response by adding the mek inhibitor, cobimetinib (shown by basak et al. to induce the quiescent isc population), in combination with kpt- . interestingly, cobimetinib alone did not significantly alter paneth cell differentiation, but gave a different result in combination with kpt- (fig. i) . we next sought to test whether the regulation of cell cycle via mitogen signaling may be an important downstream mediator following xpo inhibition. inhibition of cdk / with palbociclib both alone and in combination with kpt- did not alter paneth cell differentiation (supp. fig. h ), but inhibition of aurora kinase b with zm did significantly increase paneth cell differentiation (notably, zm was also a lower-effect size hit in our primary screen) (supp. fig. i) . combined, these experiments suggest that the sine-induced stress response may be mediated by ap- but not p , while suppression of mitogen signaling is not dependent on erk, but is further enhanced by erk inhibition. additionally, the non-exporter-related action of xpo during cell cycle (which interacts with aurora kinase) may further contribute to the observed pro-differentiation effect. in total, our analyses suggest that xpo inhibition drives paneth cell enrichment through the modulation of cell state within cycling iscs (stem ii / iii). further, this modulation includes a confluence of pan-epithelial stress response and suppression of mitogen signaling within stem ii / iii. 
we observe the cycling stem population becoming transiently quiescent, thereby favoring differentiation towards the paneth and enterocyte lineages (the latter being a short-lived population relative to the former) over a more balanced transition to the mature lineages and the quiescent stem pool (stem i) (fig. j).
based on our understanding of xpo inhibition in stem-enriched organoids, we hypothesized that sine compounds may selectively enrich the epithelium for paneth cells in vivo. our findings in organoids suggest that sine treatment is independent of the niche cues of wnt and notch, and acts specifically on cycling stem cells (which are abundant in the epithelial crypts). while xpo inhibition may enrich both for paneth cells and enterocytes, by virtue of the relatively long paneth cell lifespan (ireland et al., ) we would expect a longer-term accumulation of paneth cells in vivo relative to enterocytes. additionally, because xpo inhibition in organoids does not expand the stem cell pool but rather rebalances patterns of differentiation, we expect an increase in paneth cell number following sine treatment in vivo to be restricted to the spatial constraints of non-hypertrophic crypts and proportional to the initial number of cycling progenitors. this suggests that in vivo increases in paneth cell number may be modest, thus requiring a particularly sensitive method of quantification.
following a similar protocol as previously reported for sine treatment in the context of cancer (arango et al., ; azmi et al., ; hing et al., ; zheng et al., ), kpt- was administered at a dose of mg/kg via oral gavage every other day over a two-week span in c bl/ wild-type mice, and body weight was monitored for any clear toxicity. within the treatment group, we observed significant weight loss indicative of toxicity (supp. fig. a).
given animal weight loss on the standard chemotherapeutic dosage regimen, and additional evidence that sustained dosage of sines adversely impacts t cell populations (tyler et al., ) , we sought to explore dosing regimens well below mg/kg, to assess if a pro-paneth phenotype may exist below potential toxicities. we repeated the two-week study with oral gavage of kpt- every other day at doses corresponding to -fold ( . mg/kg), -fold ( . mg/kg), and , -fold ( . mg/kg) decrease in the mg/kg dose conventionally used in a cancer setting. because paneth cell number and quality is known to physiologically change along the length of the small intestine, and diseases associated with paneth cells most frequently present distally (abraham and cho, ), we sought to profile how xpo inhibition may differentially affect proximal and distal small intestine. we tracked animal weight every other day and collected the proximal and distal thirds of the small intestine at day for histological quantification of paneth, stem, and goblet populations (fig. a) . in this lower-dose regimen, we observe no significant changes in animal weight, suggesting that we are outside the gross toxicity range (supp. fig. b ). paneth cells were counted within well preserved crypts -with at least crypts quantified per animal (representative images supp. fig. c ) -and the counts were averaged. compellingly, within this lower dose regimen, we observe significant increases in paneth cell abundance both in the proximal and distal small intestine at doses of . mg/kg, and proximally at . mg/kg. we additionally quantified the abundance of olfm + stem cells as well as pas+ goblet cells within the same animals to ascertain whether the effect of sine treatment was restricted to the paneth cell compartment (representative images supp. fig. d,e) . interestingly, we observe a significant increase in olfm + stem cells within the distal si at doses of . 
mg/kg corresponding to the group with the greatest increase in paneth cells (fig. c), suggesting a potential expansion of the stem cell niche commensurate with increased paneth cell abundance. we did not identify any significant changes in the developmentally-related goblet cell population (fig. d). in total, these data suggest that sine treatment may be a meaningful approach to specifically increase paneth cell abundance in vivo, and further validate our framework for using models of organoid differentiation in small molecule screening.
here, we demonstrate that, by employing phenotypic small molecule screening in a rationally and physiologically motivated organoid model, we can uncover novel biological targets and clinically-relevant small molecules which translate to and inform in vivo tissue stem cell biology. further, this approach to small molecule phenotypic screening enables a specific, functional readout in a dynamic and heterogeneous organoid model. our approach provides perturbation capacity nearly two orders of magnitude greater than existing examples of non-genetic perturbations in organoid models, thereby enabling screens within the space of annotated small molecule libraries and empowering novel biological target discovery. by using a model that focuses on differentiation to a specific lineage (the paneth cell), we are able to resolve a pathway and compounds which direct isc fate decisions to drive subtle but significant effects on the in vivo tissue. we identify a series of compounds (known to inhibit the nuclear exporter xpo ) acting on cycling iscs by inducing a program of stress response and decreased mitogen signaling. this isc response rebalances self-renewal and differentiation towards paneth cell differentiation. recent work on mitogen and stress response control of re-entry into cell cycle may provide important context on the necessity of overlap of these two responses (yang et al., ).
specifically, mother cells will transmit p protein and ccnd transcripts to daughter cells, which, based on the abundance of transmitted signal, will either immediately re-enter cell cycle or commit to a quiescent state. transitions between quiescence and proliferation within the isc niche have important roles in tissue homeostasis and regeneration. quiescent pools of crypt-residing or adjacent cells serve as reserve populations which upon injury-dependent depletion of cycling stem cells will re-establish cycling progenitors and maintain homeostatic tissue regeneration (ayyaz et al., ; yousefi et al., ) . (glal et al., ; zhou et al., ) , where xpo may be one way to access these observed responses for therapeutic use. because xpo inhibition is inherently pleiotropic, there are many possible leads to consider as downstream mediators within iscs driving the observed shift in fate decision making. future work exploring these many complex interacting effects is warranted. additionally, exploring the potential local cellular niches of what we describe as stem ii /iii within these perturbed and evolving organoids may provide great clarity on the extent to which the organoid stem niche does and does not recapitulate in vivo. as previously shown, events of symmetry-breaking are well-modeled within intestinal organoids and spatial structuring is important in cell fate decisions (serra et al., ) . while tools to profile these kinds of changes are presently limited, they will become increasingly available. studies of this nature will better illuminate biological targets which may better translate to large tissue effects in vivo. the authors j.m.k. and r.l. hold equity in frequency therapeutics, a company that has an option to license ip generated by j.m.k. and r.l. and that may benefit financially if the ip is licensed and further validated. the interests of j.m.k. and r.l. 
were reviewed and are subject to a management plan overseen by their institutions in accordance with their conflict of interest policies.
all studies were performed under animal protocols approved by the massachusetts institute of technology (mit) committee on animal care. proximal and/or distal small intestine was isolated from wild-type c bl/ mice of both sexes, aged between one and six months in all experiments.
small intestinal crypts were isolated as previously described. briefly, the small intestine was harvested, opened longitudinally, and washed with ice-cold dulbecco's phosphate-buffered saline without calcium chloride and magnesium chloride (pbs ) (sigma-aldrich) to clear the luminal contents. the tissue was cut into - mm pieces with scissors and washed repeatedly by gently pipetting the fragments using a -ml pipette until the supernatant was clear. fragments were rocked on ice with crypt isolation buffer ( mm edta in pbs ; life technologies) for min. after isolation buffer was removed, fragments were washed with cold pbs by pipetting up and down to release the crypts. crypt-containing fractions were combined, passed through a -μm cell strainer (bd bioscience), and centrifuged at rcf for min. the cell pellet was resuspended in basal culture medium ( mm glutamax (thermo fisher scientific) and mm hepes (life technologies) in advanced dmem/f (invitrogen)) and centrifuged at rcf for min to remove single cells. crypts were then cultured in a matrigel culture system (described below) in small intestinal crypt medium ( x n supplement (life technologies), x b supplement (life technologies), mm n-acetyl-l-cysteine (sigma-aldrich) in basal culture medium) supplemented with differentiation factors at °c with % co . pen/strep ( x) was added for the first four days of culture post-isolation only.
small intestinal crypts were cultured as previously described.
briefly, crypts were resuspended in basal culture medium at a : ratio with corning™ matrigel™ membrane matrix -gfr (fisher scientific) and plated at the center of each well of -well plates. following matrigel polymerization, μl crypt culture medium (enr+cv) containing growth factors egf ( ng/ml, life technologies), noggin ( ng/ml, peprotech) and r-spondin ( ng/ml, peprotech) and small molecules chir ( μm, lc laboratories or selleckchem) and valproic acid ( mm, sigma-aldrich) was added to each well. rock inhibitor y- ( μm, r&d systems) was added for the first two days of isc culture only. cell culture medium was changed every other day. after days of culture, crypt organoids were expanded and enriched for iscs under the enr+cv condition. expanding iscs were passaged every - days in the enr+cv condition.
after to days of culture under the enr+cv condition, iscs were differentiated to paneth cells. briefly, isc culture gel and medium were homogenized via mechanical disruption and centrifuged at rcf for min at °c. supernatant was removed and the pellet resuspended in basal culture medium repeatedly until the cloudy matrigel was almost gone. on the last repeat, the pellet was resuspended in basal culture medium, the number of organoids counted, and centrifuged at rcf for min at °c. the cell pellet was resuspended in basal culture medium at a : ratio with matrigel and plated at the center of each well of -well plates (~ - organoids/well). following matrigel polymerization, μl crypt culture medium (enr+cv) was added to each well. cell culture medium was changed every - days depending on seeding density.
for -well plate high-throughput screening, isc-enriched organoids were passaged and split to single cells with tryple (thermo fisher scientific) and cultured for - days in enr+cvy prior to transfer to a " . d" -well plate culture system. to prepare for " . d" plating, cell-laden matrigel and media were homogenized via mechanical disruption and centrifuged at rcf for min at °c.
supernatant was removed and the pellet washed and spun in basal culture medium repeatedly until the cloudy matrigel above the cell pellet was gone. on the final wash, pellet was resuspended in basal culture medium, the number of organoids counted, and the cell pellet was resuspended in enr+cd medium at ~ clusters/μl. -well plates were first filled with μl of % matrigel ( % basal media) coating in each well using a tecan evo liquid handling deck, and allowed to gel at °c for minutes. then μl of cell-laden media was plated at the center of each well of -well plates with the liquid handler, and the plates were spun down at rcf for minutes to embed organoids on the matrigel surface. compound libraries were pinned into prepped cell plates using nl pins into μl media/well. cells were cultured at °c with % co for six days in enr+cd medium supplemented with the tested compounds with a media change at three days. on day six, lysozyme secretion and cell viability were assessed using lysozyme assay kit (enzchek) and celltiter-glo d (ctg d) cell viability assay (promega), respectively, according to the manufacturers' protocols. briefly, screen plates were washed x with fluorobrite basal media ( mm glutamax and mm hepes in fluorobrite dmem (thermo fisher scientific)) using a biotek plate washer with min incubations followed by a min centrifugation at rcf to settle media between washes. after removal of the third wash, μl of non-stimulated fluorobrite basal media was added to each screen well using a tecan evo liquid handling deck from a non-stimulated treatment master plate, and plates were incubated for min at °c. after minutes, the top μl of media from each well of the screen plate was transferred to a non-stimulated lyz assay plate containing μl of x dq lyz assay working solution using a tecan evo liquid handling deck. 
the non-stimulated lyz assay plate was covered, shaken for min, incubated for min at °c, then fluorescence was measured (shake s; mm/ nm) using a tecan m plate reader. after the media transfer to the non-stimulated lyz assay plate, the remaining media was removed from the screen plate and μl of stimulated fluorobrite basal media (supplemented with μm cch) was added to each screen well using a tecan evo liquid handling deck from a stimulated treatment master plate, and plates were incubated for min at °c. after minutes, the top μl of media from each well of the screen plate was transferred to a stimulated lyz assay plate containing μl of x dq lyz assay working solution using a tecan evo liquid handling deck. the stimulated lyz assay plate was covered, shaken for min, incubated for min at °c, then fluorescence was measured (shake s; mm/ nm) using a tecan m plate reader. finally, μl of ctg d was added to each well of the screen plate, which was shaken for min at room temperature, then luminescence was read (shake s; integration time . - s) to measure atp.
primary screens were performed using the target selective inhibitor library (selleck chem). assays were performed in triplicate using four compound concentrations ( . , . , , and μm).
a custom r script and pipeline was used for analysis of all screen results. results (excel or .csv files) were converted into a data frame containing raw assay measurements corresponding to metadata for plate position, treatments, doses, cell type, and stimulation. raw values were log transformed, then a loess normalization was applied to each plate and assay to remove systematic error and column/row/edge effects using the formula (mpindi et al., ):
$$\hat{x}_{ij} = x_{ij} - \mathrm{loess.fit}_{ij}$$
where $\hat{x}_{ij}$ is the loess fit result, $x_{ij}$ is the log transformed value at row i and column j, and $\mathrm{loess.fit}_{ij}$ is the value from loess smoothed data at row i and column j calculated using r loess function with span .
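to make the plate-level correction concrete, the sketch below smooths a plate of log-transformed values over its row and column coordinates and subtracts the smoothed surface from each well. it is a simplified stand-in, not the authors' r pipeline: it fits a tricube-weighted local plane (degree 1) in pure python, whereas r's loess function fits local quadratics by default, and it assumes the normalized value is the log value minus the smoothed value. the function names and the default span of 1.0 are illustrative assumptions.

```python
import math


def solve(a, b):
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def loess_surface(plate, span=1.0):
    """Smoothed value for every well from a tricube-weighted local plane fit.

    span is the fraction of wells used per local fit; it must be large enough
    that each fit sees several non-collinear wells.
    """
    rows, cols = len(plate), len(plate[0])
    wells = [(i, j) for i in range(rows) for j in range(cols)]
    q = min(len(wells), math.ceil(span * len(wells)))  # neighbours per local fit
    surface = [[0.0] * cols for _ in range(rows)]
    for i0, j0 in wells:
        dist = {(i, j): math.hypot(i - i0, j - j0) for i, j in wells}
        bandwidth = sorted(dist.values())[q - 1]
        ata = [[0.0] * 3 for _ in range(3)]
        aty = [0.0] * 3
        for (i, j), d in dist.items():
            weight = (1 - (d / bandwidth) ** 3) ** 3 if d < bandwidth else 0.0
            feats = (1.0, i - i0, j - j0)
            for r in range(3):
                aty[r] += weight * feats[r] * plate[i][j]
                for c in range(3):
                    ata[r][c] += weight * feats[r] * feats[c]
        surface[i0][j0] = solve(ata, aty)[0]  # intercept = fit at the well itself
    return surface


def loess_normalize(plate, span=1.0):
    """Subtract the smoothed surface from the log-transformed plate values."""
    surface = loess_surface(plate, span)
    return [[x - s for x, s in zip(row, srow)] for row, srow in zip(plate, surface)]


# Hypothetical 4x6 plate: a smooth row/column gradient plus one active well.
plate = [[2 + 0.1 * i + 0.05 * j for j in range(6)] for i in range(4)]
plate[1][2] += 1.0
normalized = loess_normalize(plate)
print(round(normalized[1][2], 2))
```

because the gradient is absorbed by the smoothed surface, the active well stands out in the residuals while wells that only carry the positional trend sit near zero.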
following loess normalization, a plate-wise fold change (fc) calculation was performed on each well to normalize plates across the experiment. this was calculated by subtracting the median of the plate (as control) from the loess normalized values:
$$\mathrm{fc}_{ij} = \hat{x}_{ij} - \mathrm{median}(\hat{x}_{\mathrm{plate}})$$
replicate strictly standardized mean difference (ssmd) was used to determine the statistical effect size of each treatment in each assay (treatment and dose grouped by replicate, n= ) relative to the plate using the formula for the robust uniformly minimal variance unbiased estimate (umvue) (zhang, ):
$$\mathrm{ssmd}_i = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)} \sqrt{\frac{2}{n-1}} \, \frac{\bar{d}_i}{\sqrt{w_i s_i^2 + w_0 s_0^2}}$$
where $\bar{d}_i$ and $s_i$ are respectively the sample mean and standard deviation of the $d_{ij}$s, where $d_{ij}$ is the fc for the ith treatment on the jth plate. $\Gamma(\cdot)$ is a gamma function. $s_0^2$ is an adjustment factor equal to the median of all $s_i^2$s to provide a more stable estimate of variance. $w_i$ and $w_0$ are weights equal to . with the constraint of $w_i + w_0$ = . n is the replicate number.
mean fc (the arithmetic mean of all samples grouped by treatment and dose across replicates) was used to determine the z-score for each treatment and dose, computed relative to sdpop, the standard deviation of all mean-fc's. all calculated statistics were combined in one finalized data table and exported as a .csv file for hit identification. a primary screen "hit" was defined as having ssmds for both lyz assays greater than the optimal critical value (βα = . ) and being in the top % of a normal distribution of fc values for both assays with a z-score cutoff > . . hit treatments were thus selected to have a well-powered statistical effect size as well as a strong biological effect size. optimal dose per hit treatment was determined by ssmd for both lyz assays.
confirmatory secondary screening with primary hits was performed using the above well plate method. the screen was conducted with -plate replicates with a base media of enr+cd.
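the primary-screen statistics described above (plate-wise fold change, replicate ssmd, z-score, and the two-part hit rule) can be sketched in python as follows. this is an illustrative reading rather than the authors' r script: it assumes the paired umvue form of ssmd with a pooled variance term, equal weights of 0.5, and the standard z-score (mean-centred, divided by the population standard deviation). the cutoffs passed to is_hit and all example numbers are hypothetical placeholders, since the actual thresholds are not legible in this text.

```python
from math import gamma, sqrt
from statistics import mean, median, pstdev, variance


def plate_fold_change(normalized):
    """Log-scale fold change: each well minus the plate median."""
    plate_median = median(normalized)
    return [value - plate_median for value in normalized]


def robust_ssmd(d, s0_sq, w_i=0.5, w_0=0.5):
    """Robust UMVUE-style SSMD for one treatment-dose from replicate fold changes d.

    s0_sq is the median of the per-treatment variances across the screen.
    The equal 0.5 weights are an assumption. Requires at least 3 replicates.
    """
    n = len(d)
    scale = gamma((n - 1) / 2) / gamma((n - 2) / 2) * sqrt(2 / (n - 1))
    return scale * mean(d) / sqrt(w_i * variance(d) + w_0 * s0_sq)


def z_scores(mean_fcs):
    """Standard z-score of each mean fold change (assumed form)."""
    centre, sd_pop = mean(mean_fcs), pstdev(mean_fcs)
    return [(value - centre) / sd_pop for value in mean_fcs]


def is_hit(ssmd_ns, ssmd_s, z_ns, z_s, ssmd_cutoff, z_cutoff):
    """A hit must pass both cutoffs in both lysozyme assays."""
    return min(ssmd_ns, ssmd_s) > ssmd_cutoff and min(z_ns, z_s) > z_cutoff


# Hypothetical triplicate fold changes for one treatment-dose in one assay.
replicate_fc = [1.0, 1.2, 0.8]
print(round(robust_ssmd(replicate_fc, s0_sq=0.04), 3))
```

requiring both statistics keeps treatments whose effect is consistent across replicates (ssmd) and also large relative to the rest of the screen (z-score), which matches the stated aim of combining statistical and biological effect size.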
media was supplemented with compound at day and day (n= well replicates per dose) at four different doses: two-fold above, two-fold below, and four-fold below the optimal final dose for each respective treatment. additionally, each plate carried a large number of enr+dmso or enr+cd+dmso (vehicle) control wells (n= for atp, and n= for lyz.ns and lyz.s) for robust normalization. atp, non-stimulated lysozyme activity and cch-stimulated lysozyme activity were again measured, and the collected data were again processed in a custom r script, as for the primary screen with slight modification. values were log transformed, and a plate-wise fc was calculated for each well based on the median value of enr+cd+dmso (vehicle) control wells to normalize plate-to-plate variability. the following formula was used:
$$\mathrm{fc}_{ij} = x_{ij} - \mathrm{median}(x_{\mathrm{vehicle}})$$
where $x_{ij}$ is the log transformed value at row i and column j and $x_{\mathrm{vehicle}}$ are the log transformed values of the vehicle control wells on the same plate.
the small intestine (si) was collected from mice and divided into three parts. only proximal and distal si were kept in pbs, and medial si was discarded. each si was opened longitudinally and washed in pbs. si was rolled using the swiss-rolling technique and incubated in % neutral buffered formalin (vwr, - ) for h at rt. fixed tissues were embedded in paraffin, and μm sections were mounted on slides. for immunohistochemistry, slides were deparaffinized, antigen retrieved using heat-induced epitope retrieval at °c for min using citrate buffer ph , and probed with appropriate antibodies followed by dab staining. an antibody against lysozyme was purchased from abcam (ab ), ki from bd biosciences (# ), and olfm from cell signaling technology (# ). for mcmanus periodic acid schiff (pas) reaction, slides were deparaffinized, oxidized in periodic acid, and stained with schiff reagent (poly scientific, s ) followed by counterstaining with harris hematoxylin. slides were scanned by aperio slide scanner (leica) and cells were counted on aperio eslide manager.
slides were blinded and randomized before counting, and all cell types were counted in all well-preserved crypts along the longitudinal crypt-villus axis.

a single-cell suspension was obtained from organoids cultured under either enr+cd or enr+cd + nm kpt- for the differentiation time course as detailed in fig. a . briefly, organoids at each sampling were harvested from - pooled matrigel domes, totaling > , organoids per sample. excess matrigel was removed per the previously described washing protocol, and organoids were resuspended in tryple at c for min, with vigorous homogenization through a p pipette tip every min. after min, the suspension was passed through a um cell strainer twice, and counted under brightfield microscopy with trypan blue staining for viable single cells. we utilized seq-well s^ for massively parallel scrna-seq, for which full methods are published (hughes et al., ) and available on the shalek lab website (www.shaleklab.com). briefly, ~ - , cells were loaded onto a functionalized polydimethylsiloxane (pdms) array preloaded with ~ , uniquely-barcoded mrna capture beads (chemgenes; macosko- - ). after cells had settled into wells, the array was sealed with a hydroxylated polycarbonate membrane with pore sizes of nm, facilitating buffer exchange while confining biological molecules within each well. following membrane-sealing, buffer exchange across the membrane permits cell lysis, mrna transcript hybridization to beads, and bead removal before proceeding with reverse transcription. the obtained bead-bound cdna product then underwent exonuclease i treatment (new england biolabs; m m) to remove excess primer before proceeding with second strand synthesis. following exonuclease i treatment, the beads were mixed with . m naoh for min at room temperature to denature the mrna-cdna hybrid product on the bead.
second strand synthesis was performed with a mastermix consisting of ul x maxima rt buffer, ul % peg solution, ul mm dntps, ul mm dn-smart oligo, ul klenow exo-, and ul of di ultrapure water, which was added to the beads and incubated for hour at °c with end-over-end rotation. after second strand synthesis, pcr amplification was performed using kapa hifi pcr mix (kapa biosystems kk ). specifically, a ul pcr mastermix consisting of ul of kapa x mastermix, . ul of um ispcr oligo, and . ul of nuclease-free water was combined with , beads per reaction. following pcr amplification, whole transcriptome products were isolated through two rounds of spri purification using ampure spri beads (beckman coulter, inc.) at both . x and . x volumetric ratio and quantified using a qubit. sequencing libraries were constructed from whole transcriptome product using the nextera tagmentation method on a total of pg of pooled cdna library per sample. tagmented and amplified sequences were purified through two rounds of spri purification ( . x and . x volumetric ratios) yielding library sizes with an average distribution of - base pairs in length as determined using the agilent hsd screen tape system (agilent genomics). arrays were sequenced within multi-sample pools on an illumina nova-seq through the broad institute walk-up sequencing core. the read structure was paired end with read starting from a custom read primer containing bases with a bp cell barcode and bp unique molecular identifier (umi) and read being bases containing transcript information. sequencing read alignment was performed using version . . of the drop-seq pipeline previously described in (macosko et al., ) . for each nova-seq sequencing run, raw sequencing reads were converted from bcl files to fastqs using bcl fastq based on nextera n indices that corresponded to individual samples. 
demultiplexed fastqs were then aligned to the mm genome using star and the dropseq pipeline on a cloud-computing platform maintained by the broad institute. individual reads were tagged with a -bp barcode and -bp unique molecular identifier (umi) contained in read of each sequencing fragment. following alignment, reads were grouped by the -bp cell barcodes and subsequently collapsed by the -bp umi for digital gene expression (dge) matrix extraction and generation. prior to analysis, dge matrices were pre-processed to remove cellular barcodes with fewer than unique genes, greater than % of unique molecular identifiers (umis) corresponding to mitochondrial genes, low outliers in standardized house-keeping gene expression (tirosh et al., ) , barcodes with greater than , umis, and cellular doublets identified through manual inspection and use of the doubletfinder algorithm (mcginnis et al., ) . these pre-processed dges are deposited as geo gse . after quality and doublet correction, we performed integrated analysis on a combined dataset of , cells, with quality metrics for gene number, captured umis, and percent mitochondrial genes reported in fig. s . to better control for potential batch effects that may arise in sample handling and library preparation, dimensional reduction and clustering were performed following normalization with regularized negative binomial regression as implemented in seurat v via sctransform (hafemeister and satija, ) . we performed variable gene identification and dimensionality reduction utilizing the first principal components based on the elbow method to identify cell type clusters using louvain clustering (resolution = . ). following umap visualization, we used log-normalized rna expression for all differential gene expression tests, gene set enrichment analyses, and gene module scoring.
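The threshold-based part of the barcode pre-processing described above can be sketched as a simple filter. This is an illustrative Python sketch only: the analysis itself was run in R with Seurat, the `BarcodeStats` record and function names are hypothetical, and the thresholds are left as parameters because the actual cutoffs are not recoverable from the text. The house-keeping outlier step and doublet removal are not covered here.

```python
from dataclasses import dataclass


@dataclass
class BarcodeStats:
    """Per-barcode summary (hypothetical record, for illustration only)."""
    barcode: str
    n_genes: int      # unique genes detected
    n_umis: int       # total UMIs captured
    n_mito_umis: int  # UMIs mapping to mitochondrial genes


def passes_qc(cell, min_genes, max_mito_fraction, max_umis):
    """Keep a barcode only if it clears all three threshold filters."""
    if cell.n_umis == 0:
        return False
    mito_fraction = cell.n_mito_umis / cell.n_umis
    return (
        cell.n_genes >= min_genes                # not "fewer than min_genes unique genes"
        and mito_fraction <= max_mito_fraction   # not "greater than X% mitochondrial UMIs"
        and cell.n_umis <= max_umis              # not "greater than max_umis UMIs"
    )


def filter_barcodes(cells, min_genes, max_mito_fraction, max_umis):
    """Return the barcodes that survive threshold-based QC."""
    return [c for c in cells if passes_qc(c, min_genes, max_mito_fraction, max_umis)]


# Example with made-up barcodes and made-up thresholds.
cells = [
    BarcodeStats("AAAC", n_genes=900, n_umis=3000, n_mito_umis=150),
    BarcodeStats("CCGT", n_genes=120, n_umis=400, n_mito_umis=20),     # too few genes
    BarcodeStats("GGTA", n_genes=800, n_umis=2500, n_mito_umis=1500),  # high mito fraction
]
kept = filter_barcodes(cells, min_genes=500, max_mito_fraction=0.35, max_umis=30000)
```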
we identified genes enriched across clusters using the wilcoxon rank sum test, with genes expressed in at least % of cells, and a minimum log-fold change of . , to identify generic cell types, and corroborated these cell type identities relative to gene signatures coming from an established murine small intestinal scrna-seq atlas (haber et al., ) . gene modules were scored within each cell based on enrichment in gene set expression relative to randomly selected genes of comparable expression levels in each cell (tirosh et al., ) , via the addmodulescore function within seurat v . in addition to cell-type module scoring from haber et al., we incorporated gene sets for isc sub-typing from (biton et al., ) , in addition to gene sets representing isc activity (basak et al., ) , and genes known to contain nes from the validness database (fu et al., ) . to quantify enrichments in cell populations between treatment and control within the dataset, we utilized fisher's exact test for each cell type relative to all others at each timepoint. we only considered populations for testing where that cell type accounted for at least % of cells in both kpt- and control samples. we present the relative enrichment or depletion of a cell population with kpt- treatment over time as the odds ratio (or) with a corresponding % confidence interval, and fdr-adjusted p values with significance as '*''s denoted in corresponding figure legend. gene set enrichment analysis (gsea) was performed on the full rank-ordered list of differentially-expressed genes (without fold-change or p value cutoffs) using the piano r package (väremo et al., ) , and the msigdb hallmark v gene sets (liberzon et al., ; subramanian et al., ) . gene sets with at least and no more than matching genes were considered, and only gene sets with an fdr-corrected p value of < . were retained. methodology for statistical analysis of screens is detailed above. 
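The population-enrichment test described above (one cell type versus all others, treated versus control, at a single timepoint) can be sketched with a hand-rolled Fisher's exact test. This is a minimal Python illustration under stated assumptions: the sample odds ratio is reported rather than the conditional estimate returned by some statistics packages, the confidence interval is omitted, and Benjamini-Hochberg is assumed as the FDR procedure because the text does not name one. The `min_fraction` eligibility threshold is a caller-supplied parameter, since the value used is not recoverable from the text.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """2x2 table [[a, b], [c, d]] -> (sample odds ratio, two-sided p value).

    a: treated cells of the type     b: treated cells of all other types
    c: control cells of the type     d: control cells of all other types
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d
    denom = math.comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Two-sided p: total probability of tables no more likely than the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    odds_ratio = math.inf if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, min(1.0, p_value)


def eligible(a, b, c, d, min_fraction):
    """Test a cell type only if it reaches min_fraction of cells in both arms."""
    return a / (a + b) >= min_fraction and c / (c + d) >= min_fraction


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An odds ratio above 1 corresponds to enrichment of the cell type under treatment and a value below 1 to depletion.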
for each subsequent experiment, replicate type and number are reported in corresponding figure legends, along with statistical tests performed and either '$' classifications for cohen's d effect sizes, or '*' classifications for p values.

the supplemental information described below is available upon request. the single-cell rna sequencing data reported in this paper will be made available shortly through geo.

supplementary table results from primary and secondary lysozyme (lyz) secretion screening grouped by compound and dose, reported as log fold change (fc), standard z score, and strictly standardized mean difference (ssmd).

marker genes from organoid differentiation time course single-cell rna-seq, as determined by wilcoxon differential expression testing of cluster versus rest.

differentially expressed genes and gene set enrichment analysis (gsea) over differentially expressed genes between kpt- treated and untreated stem ii / iii cells over days . , , in organoid differentiation time course single-cell rna-seq.

a. violin plots of single-cell rna-seq log normalized (transcripts per , -tp k) expression of xpo in all un-treated control cells split by non-stem, and stem i / ii / iii annotations. wilcoxon rank sum test, bonferroni correction stem i / ii / iii vs. non-stem; ****p < . . b. violin plots of module scores over all cells derived from genes with known nuclear export signals (nes) in all un-treated control cells split by non-stem, and stem i / ii / iii annotations, each score scaled on a range from to . one-way anova post-hoc dunnett's multiple comparisons test; ****p < . , ***p < . , **p < . . c. time course kpt- treatment of enr+cd differentiating organoids, with treatments over every continuous , , and -day interval. d. flow cytometry analyses of d-cultured intestinal organoids, treated with kpt- for the indicated time frame during days culture in enr+cd media. paneth cells were identified as lysozyme-positive and cd -positive cells.
means and individual values are shown (n= ), and the dotted line represents the average of paneth cell fraction in control samples. one-way anova post-hoc dunnett's multiple comparisons test; ****p < . , ***p < . , *p < . . e. volcano plot of differentially expressed single-cell rna-seq log normalized genes between kpt- -treated and control cells within stem ii / iii populations in early timepoints (day . - ). red points are enriched in kpt- -treated, grey enriched in control. differential expression based on wilcoxon rank sum test with significant log fold changes based on +/- standard deviations of all genes, fdr (bonferroni correction) cutoff p < . . f. gene set enrichment analysis (gsea) normalized enrichment score over all differentially expressed genes between kpt- -treated and control cells within stem ii / iii populations in early timepoints (day . - ). gene sets shown from msigdb hallmark v with fdr < . , red enriched in kpt- -treatment, grey enriched in control. g. split violin plots between kpt- -treated and control of module scores over all cells derived from significantly enriched (stress response) and depleted (mitogen signaling) genes in kpt- -treated and control cells within stem ii / iii populations in early timepoints (day . - ), each score scaled on a range from to . effect size (cohen's d) for each module between kpt- -treated and control within each cell type represented in bar chart below violin plots. h. violin plots of module scores derived from genes expressed in active and quiescent intestinal stem cells between kpt- -treated and control cells within stem ii / iii populations in early timepoints (day . - ), each score scaled on a range from to . two-sided t test; ****p < . . i. lyz secretion assay for organoids differentiated in enr+cd, treated with μm sr (ap- inhibitor) or nm cobimetinib (mek inhibitor) for days. organoids were incubated in fresh basal media with or without μm carbachol (cch) for h on day . 
all data normalized to atp abundance and standardized to the control in each experiment. means and individual values are shown (n= ), dotted line represents the control value ( ). one-way anova post-hoc tukey's multiple comparisons test; ****p < . , **p < . . j. proposed mechanism for xpo inhibition driving transcriptional changes manifesting as increased stress responses and reduced mitogen signaling, resulting in rebalanced cycling stem cell fate decisions towards secretory paneth cells and absorptive enterocytes. selinexor (kpt- ) demonstrates anti-tumor efficacy in preclinical models of triple-negative breast cancer singlecell transcriptomes of the regenerating intestine reveal a revival stem cell selective inhibitors of nuclear export block pancreatic cancer cell proliferation and reduce tumor growth in mice identification of stem cells in small intestine and colon by marker gene lgr induced quiescence of lgr + stem cells in intestinal organoids enables differentiation of hormone-producing enteroendocrine cells high-fat diet enhances stemness and tumorigenicity of intestinal progenitors t helper cell cytokines modulate intestinal stem cell renewal and differentiation ketone body signaling mediates intestinal stem cell homeostasis and adaptation to diet high-throughput screening enhances kidney organoid differentiation from human pluripotent stem cells and enables automated multidimensional phenotyping erk signalling rescues intestinal epithelial turnover and tumour cell proliferation upon erk / abrogation a functional cftr assay using primary cystic fibrosis intestinal organoids arrdc suppresses breast cancer progression by negatively regulating integrin Β graft-versushost disease disrupts intestinal microbial ecology by inhibiting paneth cell production of α-defensins nuclear transport factors: global regulation of mitosis notch signals control the fate of immature progenitor cells in the intestine validness: a database of validated leucine-rich nuclear 
export signals paneth cells in intestinal physiology and pathophysiology atf sustains il- -induced stat phosphorylation to maintain mucosal immunity through inhibiting phosphatases pathway paradigms revealed from the genetics of inflammatory bowel disease a single-cell survey of the small intestinal epithelium normalization and variance stabilization of single-cell rna-seq data using regularized negative binomial regression r-spondin chromosome rearrangements drive wnt-dependent tumour initiation and maintenance in the intestine r-spondin expands paneth cells and prevents dysbiosis induced by graft-versushost disease shp /mapk signaling controls goblet/paneth cell fate decisions in the intestine next-generation xpo inhibitor shows improved efficacy and in vivo tolerability in hematological malignancies highly efficient, massively-parallel single-cell rna-seq reveals cellular states and molecular features of human skin pathology cellular inheritance of a creactivated reporter gene to determine paneth cell longevity in the murine small intestine activating transcription factor in immune response and metabolic regulation control of endodermal endocrine development by hes- genetics and pathogenesis of inflammatory bowel disease mitogenic influence of human r-spondin on the intestinal epithelium a living biobank of breast cancer organoids captures disease heterogeneity three-dimensional in vitro cell culture models in drug discovery and drug repositioning the molecular signatures database hallmark gene set collection paneth cell defects in crohn's disease patients promote dysbiosis highly parallel genome-wide expression profiling of individual cells using nanoliter droplets paneth cells and necrotizing enterocolitis: a novel hypothesis for disease pathogenesis doubletfinder: doublet detection in single-cell rna sequencing data using artificial nearest neighbors intestinal barrier dysfunction in inflammatory bowel diseases all models are wrong, but some organoids may be 
useful harnessing single-cell genomics to improve the physiological fidelity of organoid-derived cell types impact of normalization methods on highthroughput screening data with high hit rates and drug testing with dose-response data inflammatory memory sensitizes skin epithelial stem cells to tissue damage hyperactive wnt signaling changes the developmental potential of embryonic lung endoderm allergic inflammatory memory in human respiratory epithelial progenitor cells canonical wnt signals are essential for homeostasis of the intestinal epithelium paracrine orchestration of intestinal tumorigenesis by a mesenchymal niche loss of apc in vivo immediately perturbs wnt signaling, differentiation, and migration single lgr stem cells build crypt-villus structures in vitro without a mesenchymal niche hitting a moving target: inhibition of the nuclear export receptor xpo /crm as a therapeutic approach in cancer self-organization and symmetry breaking in intestinal organoid development paneth cells and antibacterial host defense in neonatal small intestine gene set enrichment analysis: a knowledge-based approach for interpreting genomewide expression profiles inhibiting cancer cell hallmark features through nuclear export inhibition pathogenesis of necrotizing enterocolitis: modeling the innate immune response replacement of lost lgr -positive stem cells through plasticity of their enterocyte-lineage daughters dissecting the multicellular ecosystem of metastatic melanoma by single-cell rna-seq clinical dosing regimen of selinexor maintains normal immune homeostasis and t-cell effector function in mice: implications for combination with immunotherapy notch signaling modulates proliferation and differentiation of intestinal crypt base columnar stem cells mouse atonal homolog directs intestinal progenitors to secretory cell rather than absorptive cell fate enriching the gene set analysis of genomewide data by incorporating directionality of gene expression and combining 
statistical hypotheses and methods tuft-cell-derived il- regulates an intestinal ilc -epithelial response circuit the past, present, and future of crm /xpo inhibitors paneth-cell-disruption-induced necrotizing enterocolitis in mice requires live bacteria and occurs independently of tlr signaling transmissible gastroenteritis virus targets paneth cells to inhibit the self-renewal and differentiation of lgr intestinal stem cells via notch signaling unravelling the pathogenesis of inflammatory bowel disease intestinal enteroendocrine lineage cells possess homeostatic and injury-inducible stem cell activity competing memories of mitogen and p signalling control cell-cycle entry niche-independent high-purity cultures of lgr + intestinal stem cells and their progeny hierarchy and plasticity in the intestinal stem cell compartment hit selection in genome-scale rnai screens with replicates kpt- inhibitor of xpo -mediated nuclear export has anti-proliferative activity in hepatocellular carcinoma atf acts as a rheostat to control jnk signalling during intestinal regeneration

a. distributions of all sample data (n= wells) for each assay following data transformation and normalization, dotted line indicates median of distribution from which fold change calculations are determined. b. pearson correlation (r) between all sample wells by screen plate and biological replicate (m , m , m ), with representative correlation plots shown for assay plates through . c. atp, lyz.ns, lyz.s assay controls across all plates and replicates, welch's t test for atp, one-way anova post-hoc dunnett's multiple comparison test * adj. p< . , **** adj. p< . . d. replicate strictly standardized mean difference (ssmd) for each assay in secondary validation screen, each point represents the ssmd from well replicates relative to dmso control, orange signifies treatments passing cutoffs in both lyz.ns and lyz.s assays, red marking most potent compound, kpt- . e.
flow cytometry gating strategy to select viable mature paneth cells, final gate outlined in red. f. lyz secretion assay for organoids differentiated in enr with hit compounds for days. organoids were incubated in fresh basal media with μm carbachol (cch) for h on day . all data normalized to atp abundance and standardized to the control in each experiment. means and individual values are shown (n= ), dotted line represents the control value ( ). one sample t-test compared to , followed by the two-stage linear step-up method of benjamini, krieger and yekutieli for adjusting p-values; **p < . , *p < . . g. lyz secretion assay for organoids differentiated in enr+cd with nm kpt- , nm kpt- or ng/ml leptomycin b for days. organoids were incubated in fresh basal media with or without μm carbachol (cch) for h on day . all data normalized to atp abundance and standardized to the control in each experiment. means and individual values are shown (n= ), dotted line represents the control value ( ). one sample t-test compared to , followed by the two-stage linear step-up method of benjamini, krieger and yekutieli for adjusting p-values; ***p < . , **p < . , *p < . . h. lyz secretion assay for organoids differentiated in enr with nm kpt- , nm kpt- or ng/ml leptomycin b for days. organoids were incubated in fresh basal media with μm carbachol (cch) for h on day . all data normalized to atp abundance and standardized to the control in each experiment. means and individual values are shown (n= ), dotted line represents the control value ( ). one sample t-test compared to , followed by the two-stage linear stepup method of benjamini, krieger and yekutieli for adjusting p-values; ***p < . , *p < . . i. western blotting of intracellular lyz in d-cultured intestinal organoids, cultured in enr+cd media for days. j. western blotting of intracellular lyz in d-cultured intestinal organoids, cultured in enr media for days. a. 
single-cell rna-seq quality metrics on a per-sample basis, including final cell number (barcodes) per array, and distributions of unique molecular identifiers per barcode (umi), unique gene number per barcode, and percent of total umis corresponding to mitochondrial genes per barcode. b. single-cell rna-seq quality metrics on a per-cell type basis, including final cell number (barcodes) per array, and distributions of unique molecular identifiers per barcode (umi), unique gene number per barcode, and percent of total umis corresponding to mitochondrial genes per barcode. c. feature plots over organoid differentiation umap representing module scores derived from gene sets enriched in in vivo stem cells, enterocytes, goblet cells, paneth cells, and enteroendocrine cells, each score scaled on a range from to . d. feature plots over organoid differentiation umap restricted to stem i / ii / iii populations representing module scores derived from gene sets enriched in in vivo type i / ii / iii intestinal stem cells (iscs), each score scaled on a range from to . e. violin plots for stem i / ii / iii populations representing module scores derived from gene sets enriched in in vivo type i / ii / iii intestinal stem cells (iscs), each score scaled on a range from to . effect size measured as cohen's d, $ . < d < . , $$$ . < d < , $$$$ d > . f. organoid differentiation umap labeled by annotated cell type, and split by day (enr+cv), day . - control (enr+cd), and day . - nm kpt- treated (enr+cd + kpt- ). a. single-cell rna-seq log normalized (transcripts per , -tp k) expression of xpo in all un-treated control cells split by cell type annotations. b. violin plots representing module scores derived from genes with known nuclear export signals (nes) in all un-treated control cells split cell type annotations, each score scaled on a range from to . c. 
single-cell rna-seq log normalized expression of genes involved in mapk, nfat, ap- , and aurora kinase signaling in all un-treated control cells split by non-stem, and stem i / ii / iii annotations. d. lyz secretion assay for organoids treated with kpt- for the indicated time frame during days culture in enr+cd media. organoids were incubated in fresh basal media with μm carbachol (cch) for h on day . all data normalized to atp abundance and standardized to the control in each experiment. means and individual values are shown (n= ), dotted line represents the control value ( ). one sample t-test compared to , followed by the two-stage linear step-up method of benjamini, krieger and yekutieli for adjusting p-values; **p < . , *p < . . e. lyz secretion assay for organoids treated with kpt- for the indicated time frame during days culture in enr+cd media. organoids were incubated in fresh basal media with μm carbachol (cch) for h on day . all data normalized to atp abundance and standardized to the control in each experiment. means and individual values are shown (n= ), dotted line represents the control value ( ). one sample t-test compared to , followed by the two-stage linear step-up method of benjamini, krieger and yekutieli for adjusting p-values; **p < . , *p < . . f. single-cell rna-seq log normalized expression of genes known to be regulated by xpo signaling between kpt- -treated and control cells split by cell-type annotations. color scale is relative to un-treated, purple-to-grey increasing relative-expression. g. lyz secretion assay for organoids treated with p modulators, inhibitor pifithrinα (pfta) and activator serdemetan (serd.) over -day culture in enr+cd media with or without nm kpt- . organoids were incubated in fresh basal media with μm carbachol (cch) for h on day . all data were normalized to atp abundance and further standardized to the control in each experiment. 
means and individual values are shown (n= ), and the dotted line represents the control value ( ). h. lyz secretion assay for organoids treated with cdk / inhibitor palbociclib over -day culture in enr+cd media with or without nm kpt- . organoids were incubated in fresh basal media with μm carbachol (cch) for h on day . all data were normalized to atp abundance and further standardized to the control in each experiment. means and individual values are shown (n= ), and the dotted line represents the control value ( ). i. lyz secretion assay for organoids treated with aurora kinase inhibitor zm over -day culture in enr+cd media with or without nm kpt- . organoids were incubated in fresh basal media with μm carbachol (cch) for h on day . all data were normalized to atp abundance and further standardized to the control in each experiment. means and individual values are shown (n= ), and the dotted line represents the control value ( ).

a. animal body weight over -day study, normalized per-animal to day , of vehicle or mg/kg kpt- . two-way anova, treatment variation **p < . . b. animal body weight over -day study, normalized per-animal to day , of vehicle or . , . or . mg/kg kpt- . two-way anova, treatment variation ns p > . . c. representative histological images of small intestinal crypts with to paneth cells, stained with anti-lysozyme antibody. d. histograms of paneth cell number in proximal or distal small intestine of vehicle-treated animals. cumulative frequency in proximal and distal small intestine of vehicle-treated animals was used for determining the cut-off value of fig. b . e. representative histological staining images of olfm + stem cells in small intestine. f. representative histological staining images of pas + goblet cells in small intestine.
key: cord- - a b x authors: rathbun, li; aljiboury, aa; bai, x; manikas, j; amack, jd; bembenek, jn; hehnly, h title: plk - and plk -mediated asymmetric mitotic centrosome size and positioning in the early zebrafish embryo date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: a b x factors that regulate mitotic spindle positioning have been elucidated in vitro, however it remains unclear how a spindle is placed within the confines of extremely large cells. our studies identified a uniquely large centrosome structure in the early zebrafish embryo ( . ± . μm mitotic centrosome in a . ± . μm diameter cell), whereas c. elegans centrosomes are notably smaller ( . ± . μm mitotic centrosome in a . ± . μm diameter cell). during early embryonic cell divisions, cell size changes rapidly in c. elegans and zebrafish embryos. notably, mitotic centrosome area scales closely with changing cell size compared to changes in spindle length for both organisms. one interesting difference between the two is that mitotic centrosomes are asymmetric in size across embryonic zebrafish spindles, with the larger mitotic centrosome being . ± . -fold larger in size than the smaller. the largest mitotic centrosome is placed towards the embryo center in a polo-like kinase (plk) and plk dependent manner . ± . % of the time. we propose a model in which uniquely large centrosomes direct spindle placement within the disproportionately large zebrafish embryo cells to orchestrate cell divisions during early embryogenesis. during early embryogenesis, rapid cell divisions increase the number of cells in an embryo to ensure that proper tissue and organ formation can proceed during later development. however, it remains unclear how the mitotic spindle is able to position itself within the confines of a cell that is disproportionately large. previous studies have proposed that spindle size scales to cell size in embryos [ , ] . 
however, this poses the question whether the mitotic spindle adapts to the rapidly changing cell size during early embryonic cell divisions. this study aims to understand the previously unknown mechanism by which cell division is regulated during early development in extremely large cells. one proposed model is that large embryonic cells use acentrosomal microtubule nucleation sites so that astral microtubules can reach the cortex in large cells [ ] . the mitotic centrosome/spindle pole assembles the microtubule-based spindle, and one spindle pole consists of two centrioles surrounded by pericentriolar material (pcm) that contains microtubule nucleation sites [ ] . typically, astral microtubules emanate from the centrosome and project towards the cell cortex, where they anchor and facilitate pulling forces to position the spindle and undergo cell division [ ] . in the proposed acentrosomal model, microtubule nucleation sites exist outside of the centrosome through branched microtubules, positioning astral microtubules closer to the cell cortex in large cells [ ] . however, we find that large dividing zebrafish embryo cells have notably large mitotic centrosomes that scale with cell size. we hypothesize that large centrosomes are used to assist astral microtubules in reaching the cortex in large cells. we present a model where mitotic centrosome size scales with cell size, and that this scaling requires polo-like kinase (plk) and plk . transitioning to an asynchronous wave [ ] . during the first five cell divisions, blastomeres create a cellular monolayer on top of the yolk. each division during this stage occurs perpendicular to the plane of the previous division, leading to the construction of a monolayer grid (model, figure b ) [ ] . 
this is clearly visualized through the use of a fluorescent microtubule transgenic zebrafish line (emtb- xgfp) [ ] , where the -cell stage embryo contains mitotic spindles oriented perpendicular to the previous division at the

in early development, rapid rounds of division result in a stark decrease in cell size during the cleavage stage [ ] . we measured cell area during the first five cell cycles in each respective embryo ( -to -cell stage embryo in c. elegans, -to -cell stage in zebrafish). since imaging and quantitative analysis are difficult in the -cell to -cell stage zebrafish embryo, we focused on the -to -cell stage. both organisms had a significant decrease in cell area during these divisions. in c. elegans, the change in cell area became less drastic over time. in contrast, the decrease in cell area remained constant in zebrafish embryos out to -cells (supplementary figure e ). this suggests that while a marked decrease in cell size occurs during the first several rounds of cell division in many organisms, the magnitude of this change is not always similar. we next questioned whether the spindle and/or mitotic centrosome scaled to the longest cell axis (e.g. cell length) in c. elegans and zebrafish embryos. spindle, mitotic centrosomes, and cell length were measured in c. elegans embryos that stably expressed a centrosome marker (g-tubulin::gfp), cell membrane marker (ph::mcherry) and/or a nuclear marker (h b::mcherry) ( figure a , d, supplementary figure a ). metaphase mitotic spindle length was measured from mitotic centrosome to mitotic centrosome, cell length was measured from cell membrane to cell membrane along the same plane of the metaphase spindle, and metaphase mitotic centrosome area was measured (modeled in figure h ). cell length decreased with every division over time in c. elegans and zebrafish ( figure c , gray), similar to the trend identified in cell area (supplementary figure e) .
this decrease was not as drastic as the decrease in spindle length ( figure c , orange). when considered as a ratio between spindle and cell length, mitotic spindles occupy a higher percentage of the cell length in later cell divisions compared to earlier divisions in both organisms leading to a significant decrease in the distance from mitotic centrosomes to cell membrane (supplementary figure f -g). despite the stark size difference between cells in c. elegans and zebrafish embryos ( figure h) , these data suggest a conserved trend of disproportional changes in cell and spindle dimensions during early cell divisions. when measuring mitotic centrosome size in c. elegans and zebrafish embryos ( figure d -e), a significant decrease in mitotic centrosome area was identified from one round of division to the next ( figure f ). strikingly, mitotic centrosomes in zebrafish embryos were extremely large with g-tubulin organized into a wheel-like structure ( . ± . μm at -cell stage, figure e ), compared to g-tubulin in c. elegans ( . ± . μm at -cell stage, figure d ) or in zebrafish at the -cell stage ( . ± . μm , supplementary figure h ). the values obtained from cell length, spindle length, and mitotic centrosome area measurements were normalized to determine their relative change. size values were normalized to the -cell stage in c. elegans and to the -cell stage in zebrafish. in both organisms, we determined that the change in cell length scaled more closely with the change in mitotic centrosome area than that of spindle length ( figure g ). in both c. elegans and zebrafish embryos, cell length and mitotic centrosome area decreased by - % over time. spindle length, however, decreased < % during this time in both organisms ( figure g ). taken together, these data suggest that decreases in cell size scale more closely with mitotic centrosome size than spindle length ( figure h , percentages represent size decrease). 
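the normalization described above can be sketched as follows. this is a minimal illustration, assuming each measurement series is ordered from the earliest stage to the latest; the function names and the example numbers are hypothetical and are not values from the study.

```python
def normalize_to_first_stage(values):
    """Express each measurement as a fraction of the first-stage value."""
    if not values:
        raise ValueError("need at least one measurement")
    first = values[0]
    if first == 0:
        raise ValueError("first-stage value must be non-zero")
    return [v / first for v in values]


def percent_decrease(values):
    """Percent decrease of each stage relative to the first stage."""
    return [100.0 * (1.0 - r) for r in normalize_to_first_stage(values)]


# Hypothetical measurements in arbitrary units (NOT data from the paper).
cell_length = [100.0, 80.0, 60.0, 40.0]
spindle_length = [30.0, 28.0, 26.0, 24.0]

for name, series in (("cell length", cell_length), ("spindle length", spindle_length)):
    print(name, [round(p, 1) for p in percent_decrease(series)])
```

comparing the normalized series side by side is what allows quantities with different units (length versus area) to be placed on one relative scale.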
to characterize spindle and centrosome dynamics in the early embryo we focused on the zebrafish embryo due to its uniquely large centrosomes. to do this we employed bactin::emtb- xgfp [ ] and bactin::centrin-gfp [ ] embryos to mark microtubules and centrosomes. volumetric projections of embryos from these transgenic lines were acquired over time (figure a figure c . at prophase, the mitotic centrosomes are placed on either side of the nucleus ( figure e , f) and begin to nucleate a robust microtubule-based spindle for metaphase ( figure d ). during anaphase, the mitotic centrosomes begin to fragment and disperse, and reform during telophase to prepare for immediate cell cycle re-entry ( figure e -f, supplementary video ). notably, centrin normally marks centrioles [ , ] , but in this case it marks a uniquely large structure that colocalizes with the pcm protein g-tubulin [ ] figure a ). this > -fold difference in mitotic centrosome size is maintained from prophase/prometaphase to anaphase (supplementary figure b , d). this asymmetry in mitotic centrosome size was consistent in cells placed next to the midline or further away from the midline (representative shown in figure a ). when calculating centrosome size at metaphase in the centrosome transgenic line, bactin::centrin-gfp, an approximate . -fold change was calculated at the -cell and -cell stage ( figure d , supplementary figure e ). even though centrin-gfp organization at mitotic centrosomes is asymmetric ( figure d ), a metaphase spindle with the mitotic centrosome organizing the largest area of centrin-gfp was not as consistently positioned towards the center of the embryo ( . ± . % of the time across n= embryos at the -and -cell stage measured, figure e , ratios between mitotic centrosomes shown for inner to outer in supplemental figure f ). taken together, these data suggest a model in which zebrafish mitotic centrosomes present with an asymmetry in pcm components (e.g. 
γ-tubulin) starting at prophase/prometaphase (supplementary figure b , d) and this asymmetry biases the positioning of the larger centrosome towards the midline at the -and -cell stage (modeled in figure f ). positioning. as cells progress through the cell cycle, they normally require plk to duplicate their centrosome and plk for robust pcm assembly during bipolar spindle construction [ ] . the assembly of pcm components that interact with g-tubulin, such as pericentrin and cep , is facilitated by the phosphorylation activity of plk [ , ] . with plk inhibition, centriole duplication is disrupted, causing spindles to assemble through acentriolar organization of pcm [ ] [ ] [ ] . however, the role of plk and/or plk at mitotic centrosomes in the early zebrafish embryo is unknown. transcripts for plk and plk have been detected as early as the -cell stage in zebrafish embryos, indicating that they are maternally supplied prior to zygotic genome activation, albeit plk transcript levels are significantly lower [ ] . due to this, we tested the hypothesis that plk and/or plk regulate g-tubulin organization at mitotic centrosomes in zebrafish embryos. the plk and plk small molecule inhibitors, bi [ , , ] and centrinone [ ] , were injected into cell stage embryos (concentrations used µm or nm, bi described with zebrafish in [ , ] ). an injection approach was utilized versus soaking the embryos in drug due to the low permeability of the zebrafish chorion and embryo fragility at this stage. control embryos were injected with % dmso at the -cell stage and analyzed at the -cell stage. in . ± . % of these control embryos, the larger mitotic centrosome was positioned towards the midline ( . ± . μm ) whereas the smaller was positioned away ( . ± μm , figure a -b). this directional positioning of the larger mitotic centrosome towards the midline was significantly decreased when embryos were injected with bi ( . ± . % with nm and . ± . % with μm), or centrinone ( . ± . 
%, with nm and . ± . % with μm, figure a ). this suggests that plk and plk not only regulate mitotic centrosome structure and asymmetry, but also regulate the directionality of larger centrosome placement towards the midline of the embryo's grid of cells ( figure e ). we were surprised that plk inhibition caused an increase in the area occupied by γ-tubulin in mitotic centrosomes, given its known role in recruiting the pericentrin-cep complex that anchors the γ-turc at the centrosome [ , ] . one possible explanation is that plk has been proposed to regulate pcm architecture by facilitating its phase separation in c. elegans [ , ] , and that inhibiting plk in zebrafish embryos may change the physical state of the pcm, causing it to increase in size. this could explain why mitotic centrosome area significantly increases in a dosage-dependent manner with bi treatment, as losing a pcm architecture regulator may cause the surrounding pcm to lose its tight matrix configuration and expand in size ( figure a , d). centrinone treatment did not exhibit the same dosage-dependent change in mitotic centrosome area ( figure d ). a possible explanation is that plk is present at much lower concentrations in the early zebrafish embryo compared to plk [ ] . it is therefore likely that lower drug concentrations are sufficient to target the small pool of plk , leading to a similar phenotype at any drug concentration above this low threshold. to determine the importance of plk / -dependent asymmetric mitotic centrosome size placement in early zebrafish divisions, we raised embryos after injection of % dmso, μm bi , or μm centrinone ( figure f ). we found that, compared to control embryos, plk - or plk -inhibition resulted in a lower survival rate over the first five days post-fertilization.
at five days, we noted heart edema, embryo elongation defects, yolk elongation defects, and small eyes in the small fraction of embryos that survived drug treatment ( figure f , supplementary figure d ), which are all common defects seen after early development perturbation. given that the injections of bi or centrinone likely diffuse out when the chorion starts to become more permeable, it is probable that the earliest cell divisions are impacted the most from this treatment. this led us to conclude that plk -and plk -dependent asymmetric mitotic centrosome placement in early embryos impacts later development. through these studies, a unique centrosome structure has been characterized that may contribute to a better understanding of how mitotic spindles are able to coordinate cell division in disproportionately large cells. we demonstrate that mitotic centrosome size adapts to the decreasing cell size during the cleavage stage of zebrafish development. during this time in development, we found that zebrafish mitotic centrosomes are asymmetric in size and display a directionality, where the larger mitotic centrosome within a spindle is positioned towards the embryonic midline in a plk -and plk -dependent manner. furthermore, we identified an ability for mitotic centrosomes to scale with cell size and maintain a -fold asymmetry across the spindle while doing so. when centrosome size scaling and asymmetry is disrupted an increase in embryonic lethality and developmental defects occurs, suggesting that early zebrafish embryonic cell divisions are not only important for early embryogenesis but likely also impact later developmental processes. animal lines. zebrafish lines were maintained using standard procedures approved by the syracuse university iacuc committee (protocol # - ). embryos were staged as described in kimmel for live cell imaging of c. elegans embryos, a spinning disk confocal system was used. 
the system is equipped with a nikon eclipse and is an inverted microscope with a x . na objective, a csu- spinning disc system and a photometrics em-ccd camera from visitech international. images were obtained every minutes with a micron z-stack step size. pharmacological treatments. zebrafish embryos were injected with either % dmso, or bi or centrinone (final concentration nm or µm) at the - -cell stage. embryos are incubated at °c until they reach the developmental stage of interest, at which time they are fixed with % paraformaldehyde in pbs. immunohistochemistry then proceeds as detailed below. zebrafish immunohistochemistry. zebrafish embryos were fixed using % pfa containing . % triton-x overnight at °c. zebrafish were then dechorionated and incubated in pbst (phosphate buffered saline + . % tween) for minutes. embryos were blocked using a fish wash buffer (pbs + % bsa + % dmso + . % triton-x ) for minutes followed by primary antibodies incubation (antibodies diluted : in fish wash buffer) overnight at °c or hours at room temperature. embryos are then washed five times in fish wash buffer and incubated in secondary antibodies (diluted : in fish wash buffer) for hours at room temperature. after five more washes, embryos were incubated with ', -diamidino- -phenylindole (nucblue® fixed cell readyprobes® reagent) for minutes. for imaging, embryos were either halved and mounted on slides using prolong diamond (thermo fisher scientific cat. # p ) or whole-mounted in % agar (thermo-fisher cat. # ). image and data analysis. images were processed using both fiji/imagej software and adobe photoshop. all graphs and statistical analysis were produced using graphpad prism software. -d images, movies, and surface rendering were performed using bitplane imaris software (surface, smoothing, masking, and thresholding functions). to calculate two-dimensional area, a boundary was drawn around the structure of interest (cell, spindle pole, etc.) 
in imagej/fiji and the area within this shape was calculated. to calculate spindle length, cell length, aspect ratio, etc., a line was drawn in imagej/fiji from one end of the structure of interest to the other. this length was then measured and recorded. phenotypic characterization. wildtype zebrafish embryos were injected as described in the pharmacological section described above. the embryos were maintained at °c and assessed for abnormality in development and the number of deaths every minutes for - hours post injection then once hours post injection. at days post fertilization, the phenotypes of injected embryos were characterized and the number of embryos with developmental defects were recorded. to generate death curves for the pharmacological treatments, the number of embryos treated with each drug were standardized to the starting number of embryos and were displayed as ratios over time. statistical analysis. unpaired, two-tailed student's t-tests and one-way anova analyses were performed using graphpad prism software. **** depicts a p-value < . , *** p-value < . , **p-value< . , *p-value < . . see methods tables for detailed information regarding statistics. a comparative analysis of spindle morphometrics across metazoans scaling, selection, and evolutionary dynamics of the mitotic spindle microtubule nucleation remote from centrosomes may explain how asters span large cells re-evaluating centrosome function cell cycle timing regulation during asynchronous divisions of the early c. 
elegans embryo
stages of embryonic development of the zebrafish
a model for cleavage plane determination in early amphibian and fish embryos
measuring time during early embryonic development
polarization and orientation of retinal ganglion cells in vivo
chromosome misalignment is associated with plk activity at cenexin-positive mitotic centrosomes
centrin- is required for centriole duplication in mammalian cells
subdiffraction imaging of centrosomes reveals higher-order organizational features of pericentriolar material
regulating a key mitotic regulator, polo-like kinase (plk )
the centrosome-specific phosphorylation of cnn by polo/plk drives cnn scaffold assembly and centrosome maturation
increased lateral microtubule contact at the cell cortex is sufficient to drive mammalian spindle elongation
sak/plk is required for centriole duplication and flagella development
numa assemblies organize microtubule asters to establish spindle bipolarity in acentrosomal human cells
the zebrafish transcriptome during early development
cytokinetic bridge triggers de novo lumen formation in vivo
subcellular drug targeting illuminates local kinase action
reversible centriole depletion with an inhibitor of polo
mitosis-specific anchoring of γ-tubulin complexes by pericentrin controls spindle organization and mitotic entry
regulated changes in material properties underlie centrosome disassembly during mitotic exit
regulated assembly of a supramolecular centrosome scaffold in vitro

we thank lilianna solnica-krezel (uw) for sharing the bactin::centrin-gfp zebrafish line. this work was supported by national institutes of health grants no.

key: cord- -vf qyvft authors: seitz, christian; casalino, lorenzo; konecny, robert; huber, gary; amaro, rommie e.; mccammon, j. andrew title: multiscale simulations examining glycan shield effects on drug binding to influenza neuraminidase date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: vf qyvft

influenza neuraminidase is an important drug target. glycans are present on neuraminidase, and are generally considered to inhibit antibody binding via their glycan shield. in this work we studied the effect of glycans on the binding kinetics of antiviral drugs to the influenza neuraminidase. we created all-atom in silico systems of influenza neuraminidase with experimentally-derived glycoprofiles, consisting of four systems with different glycan conformations and one system without glycans. using brownian dynamics simulations, we observe a two- to eight-fold decrease in the rate of ligand binding to the primary binding site of neuraminidase due to the presence of glycans. these glycans are capable of covering much of the surface area of neuraminidase, and the inhibition of ligand binding arises from glycans sterically occluding the primary binding site on a neighboring monomer. our work also indicates that drugs preferentially bind to the primary binding site (i.e. the active site) over the secondary binding site, and we propose a binding mechanism illustrating this. these results help illuminate the complex interplay between glycans and ligand binding on the influenza membrane protein neuraminidase.

statement of significance

the influenza glycoprotein neuraminidase is the target for three fda-approved influenza drugs in the us. however, drug resistance and low drug effectiveness merit further drug development towards neuraminidase, which is hindered by our limited understanding of glycan effects on ligand binding. generally, drug developers do not include glycans in their development pipelines. here, we show that glycans can reduce drug binding to neuraminidase; even so, we recommend that future drug development focus on strong binders with a long lifetime.
furthermore, we examine the binding competition between the primary and secondary binding sites on neuraminidase, leading us to propose what is, to the best of our knowledge, a new multivalent binding mechanism.

it has long been appreciated that glycans on influenza membrane proteins help shield the virus from the host immune system's antibodies ( ) ( ) ( ) ( ) ( ) ( ) ( ) . unrecognized glycosylation differences can also attenuate influenza vaccines ( ) . in one study, glycans were shown to reduce epitope accessibility and drug binding to receptor proteins ( ) . glycans can clearly influence antibody binding due to their presence in the antibody binding site. however, it remains to be seen whether this glycan shielding and glycoprofile variability is also a concern for influenza drugs, given that these drugs are smaller than human antibodies and that glycans are located near, but not directly inside, the catalytic sites. currently there are three fda-approved influenza neuraminidase (na) antivirals in the us: tamiflu (oseltamivir), relenza (zanamivir) and rapivab (peramivir), all of which face lingering questions about their efficacy, side effects, and drug resistance ( , ) . this necessitates further drug development against influenza ( ) . drug developers have many hurdles to clear when designing a new influenza drug: classical admet characteristics, clinical trials and governmental regulations, among others. what is not often considered is the viral glycosylation state. the glycosylation state is the assemblage of glycans, chains of linked sugars found on the surface of about half of all proteins ( ) . influenza contains n-linked glycosylation sites, defined by the asn-x-ser/thr sequon, where x can be any residue besides proline ( ) .
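the sequon rule above can be expressed as a short pattern search. the following is a minimal sketch, assuming a one-letter amino acid sequence as input; the function name and the example sequence are hypothetical and are not taken from the neuraminidase sequence used in this work. a match only marks a potential site, since sequon occupancy varies experimentally.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide used only to exercise the pattern.
example = "MNGSANPTNNTS"
print(find_sequons(example))
```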
this leads to the so-called glycan shield, where glycans on the protein surface are capable of accessing much of the protein's surface area, potentially shielding it from outside interactions ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . among their many biological functions, glycans play a crucial but complex role in viral infection ( ) . one salient example of glycan function in influenza is how they help the virus evade the immune system ( ) ( ) ( ) ( ) ( ) ( ) ( ) . furthermore, glycans are capable of affecting receptor binding in influenza ( , , ( ) ( ) ( ) ( ) . traditionally, glycans have been difficult to study due to their flexibility and heterogeneity. most glycan characterization studies are done through mass spectrometry, which can yield highly variable glycoprofile data, such as differences in the degree of post-translational modifications, sequon occupancy, and type of glycan, for different strains of influenza ( ) ( ) ( ) ( ) ( ) ( ) . similarly, glycan occupancy levels are not consistent across studies, even when using the same cell line and strain of influenza ( , ) . these discrepancies may arise from differences in system setup, sample preparation, cell culturing and/or analysis method, which makes it harder to determine the transferability of experimental glycan results. though not well understood, the number and position of the glycosylation sites on influenza can change over time as a result of antigenic drift ( ) ( ) ( ) ( ) . this increases the glycoprofile variability, producing irregular but significant changes in the glycan shield over time. considering the variability and immune evasion function of the glycan shield, it remains to be seen what effect this shield has on small-molecule antiviral drug binding to viral surface proteins. previous work has shown that, depending on the viral strain and receptor mimetic used, removing viral glycans can improve binding to cell receptor mimetics ( ) ( ) ( ) .
other studies have shown that these viral glycans decrease binding of other cell receptor mimetics ( , ( ) ( ) ( ) ( ) . regardless, antiviral drugs will be much smaller than a receptor mimetic, and it is not clear whether this size difference means antiviral drugs will still be affected by the viral glycans. an earlier study by kasson and pande, using ns molecular dynamics (md) simulations, showed reduced binding of α - -sialyllactose trisaccharides to hemagglutinin due to glycans ( ) . a recent review concluded that the viral glycosylation state should be considered when designing small molecule antivirals ( ) . focusing on how small molecule antivirals are affected by the glycan shield, we combine results from distinct bd and md simulations into an integrated multiscale simulation study. we have utilized bd to estimate the rates of binding of small molecules to the primary (i.e. active/catalytic) and secondary (i.e. hemadsorption) binding sites of influenza neuraminidase in glycosylated and unglycosylated states. we see that the glycan shield is capable of moderately inhibiting drug association to the primary binding site of na on the order of two to eight times. small molecule association is faster to the primary binding site than the secondary binding site. ligand binding is independent between the primary and secondary sites -the presence of one site does not influence binding at the other site. overall, this work provides insights into the impact of glycans on small-molecule binding to na. in this study, we use brownian dynamics (bd), which has been previously used to simulate protein-small molecule association ( ) ( ) ( ) ( ) . specifically, it has also been used to simulate the association of small molecules to influenza neuraminidase ( ) ( ) ( ) . bd makes the implicit assumption that long-range electrostatics and stochastic collisions with solvent molecules are the driving forces behind protein-ligand binding ( ) . 
therefore, it is an efficient method to simplify binding to describe electrostatically-influenced diffusion. using bd allows for a reduction in system complexity and a focus on specific modulations of ligand association. to assess whether glycans affect small molecule binding to na, we created an in silico na model using the strain of influenza a virus, a/viet nam/ / (h n ) and tetrameric pdb hty, with uniprot id q dpl ( ) . building on this structure, we generated five na constructs: (i) one unglycosylated model; (ii) one glycosylated model with web server-derived glycan conformations; (iii) three glycosylated models, each with unique, biologically-relevant glycan conformations derived from all-atom md simulations that were based on (ii) as the starting structure. finally, we ran bd simulations using these models to examine binding characteristics of oseltamivir, zanamivir and sialic acid. to note, the bd input and results files are provided on github (https://github.com/cgseitz). the unglycosylated model was built using an avian h n strain and was used as a basis for the other models. we picked this strain of influenza because it contains a glycosylation site at n , a member of the -loop that hangs over the primary binding site, as shown in figure . this close proximity provides a good test of whether glycans were capable of interfering with ligand association to influenza neuraminidase. as the bd simulations used here keep bonds rigid, it was necessary to select ligand conformations that represented a bound state and protein conformations that represented an open state, to properly approximate the initial binding contact. thus, we selected a crystallized apo head region of the strain mentioned above (pdb hty) ( ) . the stalk region has not been crystallized for any influenza na and is unlikely to influence ligand association due to its large distance from the distal binding sites, so it was not modelled. 
the crystallized calcium ions were retained throughout, while the crystallized glycan fragments were removed ( ) . hty was crystallized with a y h mutation (pdb numbering), which was reversed for this project through pymol ( ) . the histidine rotamer was chosen to be the one with the highest occurrence in proteins, according to pymol. the crystal structure contained a broken backbone between p and n which was fixed through schrodinger maestro; subsequently residues - (on each side of the fixed bond) were minimized through maestro ( ) . the same procedure was done for the broken backbone between v and q : the bond was created and residues - were minimized. this refitting was done for each monomer in the tetramer. the ph was set to ph . , as this was done in the reference k on experiments ( , ) . using this ph, protonation states on the neuraminidase were assigned using propka ( ) . the protonation assignments were done through the pdb pqr server ( ) . partial charges on the protein were assigned according to the amber force field ( ) . parameterizing the glycans needed special treatment as there are not glycan parameters in the amber force field. we used the glycam_ h- parameters as these would be consistent with the amber force field ( ) . to build the first glycosylated construct (with web server-derived glycan conformations), the unglycosylated na structure was uploaded to the glyprot server, and three representative glycans were added to each na monomer, for a total of glycans on the na homotetramer ( ) . though there is experimental variability in glycosylation site occupancy, we decided to place a glycan in each glycosylation site to see the maximum potential effect the glycoprofile can have on ligand association. considering most of the human h n transmission came directly from avian sources, the glycans used to model this structure came from an avian (hen egg) source for growing these glycans ( ) . 
additionally, this dataset is the only one containing structures experimentally found on influenza na ( ) . we chose representative glycans from this dataset, however we note that both larger and smaller glycans will exist in nature; these size differences may slightly affect the results presented here. the exact glycans were selected as shown in table . table . glycan structures from the glyprot web server. the "glycan structure" entries came from experimental results ( ) . these structures consist of n-acetylglucosamine (glcnac), mannose (man), n-acetylhexosamine (hexnac) and hexose (hex). hexnac and hex were interpreted according to their corresponding "glyprot identifier" and the structures shown in to better diversify our system, three glycosylation sites (termed as site # , site # and site # ) present on each monomer were linked to three different glycan types. importantly, the four monomers (termed as monomer a, monomer b, monomer c, and monomer d) of our homotetrameric na model were symmetrically glycosylated, meaning that sites # , # and # were populated with the same glycan across monomers. starting with the structure containing web server-derived glycan conformations, md was then used to generate representative glycan conformations, with the assumption that md would provide realistic conformations of glycans within a microsecond's worth of sampling ( , ) . the first step was porting the structure with web server-derived glycans into charmm-gui to prepare the structure for md ( ) ( ) ( ) ( ) ( ) ( ) . the disulfide bonds were taken from uniprot id q dpl . the system was embedded into a box described with explicit water molecules using the tip p model ( ). an ion model was used as described previously ( ). the full system had a size of , atoms. an ionic solution of . m nacl was used, and the charmm all-atom additive force fields were used for the protein and the glycans ( ). 
molecular dynamics simulations were run using gpu-accelerated amber with an npt ensemble ( , ). the system was initially minimized for a total of cycles using a combination of steepest descent and conjugate gradient methods ( , ). equilibration in an npt ensemble was performed for ps, using a timestep of fs and the shake algorithm to constrain all bonds involving hydrogen ( ) . the equilibration temperature was set at k and regulated through a langevin dynamics thermostat ( , ) . the pressure was fixed at bar through a monte carlo barostat ( ) . these simulations were run using the extreme science and engineering discovery environment (xsede), specifically the comet supercomputer housed at the san diego supercomputer center ( ) . periodic boundary conditions were used with a non-bonded short-range interaction cutoff of Å and force-based switching at Å. particle mesh ewald was used for the long-range electrostatic interactions ( ) . for the production runs, the temperature was set at . k ( ). after equilibration, this system was cloned into identical replicates. each replicate was run in parallel for ns with a unique starting velocity, totaling µs of sampling. once the md simulations finished, the trajectory of each glycan was concatenated independently of the rest of the system. each of these individual glycan trajectories was then clustered using gromacs-based gromos clustering with an rmsd cutoff of . Å ( ). this cutoff was chosen so that the three most populated clusters would represent at least % of the total glycan conformations in each of the simulations. the central structure, defined as the structure with the smallest average rmsd from all other members of the cluster, was then selected from the top cluster of each glycan; the pyranose ring at the reducing end of the glycan was then aligned to the analogous pyranose ring of the corresponding glycan from the glyprot-glycosylated structure, as this should be the most stable part of the glycan ( ) .
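the definition of the central structure given above can be illustrated with a small sketch. it assumes a precomputed, symmetric pairwise rmsd matrix and a list of frame indices belonging to one cluster; the function name and the toy matrix are hypothetical, and this is not the gromacs implementation, only the selection rule as stated in the text.

```python
def central_structure(rmsd, members):
    """Return the cluster member with the smallest average RMSD to all other members.

    rmsd    -- square matrix (list of lists) of pairwise RMSD values
    members -- indices of the frames that belong to the cluster
    """
    if not members:
        raise ValueError("cluster has no members")
    if len(members) == 1:
        return members[0]
    best_index, best_average = None, float("inf")
    for i in members:
        average = sum(rmsd[i][j] for j in members if j != i) / (len(members) - 1)
        if average < best_average:
            best_index, best_average = i, average
    return best_index


# Toy RMSD matrix in angstroms (hypothetical values); frame 3 is an outlier
# that belongs to a different cluster.
toy_rmsd = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.0, 9.0],
    [2.0, 1.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]
print(central_structure(toy_rmsd, [0, 1, 2]))
```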
the glyprot glycans were removed and the glycans from the md simulations were attached through schrodinger maestro, to create a new na system with each glycosylation site inhabited by the central structure of the most representative conformation from the md simulations. this was then repeated for the second and third most representative glycan clusters from the md simulations. the sialic acid structures used were drawn from pdb mwe, which crystallized the boat conformation in the active site and the chair conformation in the secondary site ( ) . the chair conformation of sialic acid was crystallized with a missing carboxylate group, which was added through schrodinger maestro to model an energetically-favorable gauche conformation. zanamivir was extracted from the ckz crystal structure ( ) . oseltamivir was extracted from the cl crystal structure ( ) . a d comparison of these ligands can be seen in figure s , showing their structural similarities; we note that all mentions in this study of oseltamivir pertain to tamiflu's active metabolite oseltamivir carboxylate. these ligands were then uploaded to the prodrg server to add hydrogens ( ) . charges according to the amber force field were added through the pdb pqr server ( , ) . bd simulations were run using browndye ( ) . even though the md was run with the amber force field and the bd was run with the amber force field, we assume these to be sufficiently independent steps and the slight force field differences should not appreciably affect the results, especially as our k on numbers are relative, not absolute. the charges for the protein and ligands were reassigned according to the amber force field ( ) . the temperature was set to . k, which was the temperature for the referenced k on experiments ( , ) . the ions used are shown in table s . these ions were selected to mimic the ion and buffer concentration of the reference k on experiments ( , ). the experimental assay used mm cacl and . mm mes buffer ( ) . 
the ca2+ and cl- concentrations were simply calculated by finding their ionic strengths. mes buffer is prepared with na+; the concentrations of the buffer and na+ at ph . were calculated with the henderson-hasselbalch equation ( , ). this resulted in an overall ionic strength of . m. the calcium, chloride and sodium van der waals radii were taken from the literature ( , ). the mes radius was determined by building it in schrodinger maestro and measuring it in vmd ( ). apbs was used to create the electrostatic grids needed by browndye for these simulations ( ). the grid spacings are listed in table s . the solvent dielectric was set to while the protein dielectric was set to . desolvation forces were turned off. the debye length, determined from the concentration and charges of the ions in the solution, was set to . Å. in browndye, the b radius is defined as the starting radius for the ligand trajectories, at a distance where the force between the protein and ligand is independent of orientation. this distance is measured from the hydrodynamic center of the receptor. because of the different glycan conformations used, the b radius differed slightly between systems. if a ligand reaches what is known as the q radius, the trajectory either ends as a non-association or is restarted from the b radius according to browndye's algorithm. the q radius is defined as . times the b radius distance. the b radius ranged between Å and Å depending on the system, and the q radius ranged from Å to Å. the exact b and q radius values for each system are shown in table s . bd simulations were run on all five na models generated (i.e. unglycosylated, glycosylated with web server-derived glycans, and the three systems with md-derived glycan conformations). these simulations totaled million trajectories for each ligand/binding site pair, consistently giving reproducible rates within the small level of error reported and resulting in million trajectories total.
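as a rough illustration of how an ionic strength and a debye length follow from ion concentrations and charges, the sketch below uses the standard definitions I = ½ Σ c_i z_i² and λ_D = sqrt(ε0 εr k_B T / (2 N_A e² I)), plus the henderson-hasselbalch ratio for the deprotonated buffer fraction. all numerical inputs (concentrations, ph, pka, temperature, dielectric) are placeholder assumptions chosen for the example, not the values used in this study, and the function names are hypothetical.

```python
import math


def ionic_strength(concentrations_molar, charges):
    """Ionic strength I = 1/2 * sum(c_i * z_i^2), in mol/L."""
    return 0.5 * sum(c * z * z for c, z in zip(concentrations_molar, charges))


def deprotonated_fraction(ph, pka):
    """Henderson-Hasselbalch: fraction of buffer in the deprotonated form."""
    ratio = 10.0 ** (ph - pka)  # [A-]/[HA]
    return ratio / (1.0 + ratio)


def debye_length_angstrom(ionic_strength_molar, temperature_k, solvent_dielectric):
    """Debye length in angstroms for a given ionic strength (mol/L)."""
    eps0 = 8.8541878128e-12        # vacuum permittivity, F/m
    k_b = 1.380649e-23             # Boltzmann constant, J/K
    e_charge = 1.602176634e-19     # elementary charge, C
    n_a = 6.02214076e23            # Avogadro constant, 1/mol
    i_si = ionic_strength_molar * 1000.0  # convert mol/L to mol/m^3
    length_m = math.sqrt(
        eps0 * solvent_dielectric * k_b * temperature_k
        / (2.0 * n_a * e_charge ** 2 * i_si)
    )
    return length_m * 1e10


# Placeholder inputs (assumptions for illustration only).
cacl2_molar = 0.001    # hypothetical CaCl2 concentration
buffer_molar = 0.010   # hypothetical total MES concentration
frac = deprotonated_fraction(ph=6.0, pka=6.1)  # pKa value is an assumption
mes_anion = buffer_molar * frac                # MES- balanced by Na+

total_i = ionic_strength(
    [cacl2_molar, 2 * cacl2_molar, mes_anion, mes_anion],  # Ca2+, Cl-, MES-, Na+
    [2, -1, -1, 1],
)
debye = debye_length_angstrom(total_i, temperature_k=298.15, solvent_dielectric=78.5)
```

the protonated form of the buffer is treated here as net neutral, so only the deprotonated buffer and its sodium counter-ion contribute to the ionic strength; that treatment is itself an assumption of the sketch.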
reproducible rates are obtained with a binding probability of around one in a million trajectories; we found we could roughly obtain these probabilities by using million trajectories for each ligand/binding site pair, as has been reported previously ( ). this number of trajectories produced error values comparable to those seen in the reference experimental studies, as seen in figure s . for systems where we saw at least one binding event, the number of binding events ranged from two to (see supporting material for details). bd simulations using browndye require the creation of reaction criteria, consisting of a list of protein-ligand atom pairs and a cutoff distance. if any three of these pairs simultaneously come closer than the cutoff distance, we assume the ligand will associate. the cutoff distance was empirically determined to be . Å; this distance approximately yielded the experimental k on rates for both oseltamivir and zanamivir ( ). there are no other experimental k on rates towards the primary site of h n , and no referenced rates at all for the secondary site. the referenced k on experiments were done with glycans attached to na and measured to the full tetramer; this was confirmed in personal correspondence with the corresponding author (stephen martin of the mrc national institute for medical research, correspondence on july , ). considering that the reaction criteria and reaction distance were created for oseltamivir and no significant changes were made before applying them to zanamivir, we can safely assume that they are generalizable to sialic acid, an analog of both oseltamivir and zanamivir ( figure s ). the protein-ligand atom pairs were taken from crystal structures of ligands in the primary and secondary sites of neuraminidase for each monomer, and simulations were run for the full tetramer. the primary binding site was determined according to the crystallized binding pocket for our strain of neuraminidase ( ).
this pocket is noted to have a surface area of . Å and a volume of . Å ( ) . the secondary site contacts were determined from a structure of influenza a/tern/australia/g c/ ( ) . however, all the secondary site residues are conserved between that strain and the strain used in our simulations. the combined site simulations are defined as simulations with criteria allowing for association to either the primary or secondary site; it is simply a simulation run with a concatenation of the binding criteria for these sites. in this work, we define binding site contacts to be those protein-ligand contacts seen in crystal structures. from these contacts, we created protein-ligand atom pairs in browndye to determine when a reaction has occurred in our bd trajectories. there are seven primary binding site contacts reported between oseltamivir and the cl crystal structure ( ) . these binding site contacts are reported in table s and figure s . there are five primary binding site contacts reported between sialic acid and the mwe crystal structure; all five of these are analogous to those seen for oseltamivir ( ) . the binding site contacts for sialic acid are registered in table s and figure s . there is one primary binding site contact reported between zanamivir and the ckz crystal structure; this one is analogous to one seen in oseltamivir ( ) . the binding site contacts from oseltamivir were transferred to zanamivir retaining the one contact seen in the ckz crystal structure and are reported in table s and figure s . using the structural similarities of sialic acid and zanamivir to oseltamivir, analogous primary binding site atom pairs were created so that each ligand had seven primary protein-ligand atom pairs. there are five secondary binding site contacts reported between sialic acid and the mwe crystal structure ( ) . these contacts are reported in table s . 
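the reaction criterion described in the methods (a list of protein-ligand atom pairs, a cutoff distance, and association once any three pairs are simultaneously within the cutoff) can be sketched as follows. this is an illustrative stand-alone check, not browndye's input format or code; the function name, atom labels, and coordinates are hypothetical.

```python
import math


def reaction_satisfied(protein_xyz, ligand_xyz, atom_pairs, cutoff, n_required=3):
    """Return True if at least `n_required` atom pairs are closer than `cutoff`.

    protein_xyz / ligand_xyz map atom labels to (x, y, z) coordinates in
    angstroms for a single trajectory snapshot, so the pairs are tested
    simultaneously.
    """
    close_pairs = 0
    for protein_atom, ligand_atom in atom_pairs:
        distance = math.dist(protein_xyz[protein_atom], ligand_xyz[ligand_atom])
        if distance < cutoff:
            close_pairs += 1
    return close_pairs >= n_required


# Hypothetical snapshot: three of the four pairs are 1 angstrom apart.
protein = {"P1": (0, 0, 0), "P2": (10, 0, 0), "P3": (0, 10, 0), "P4": (0, 0, 10)}
ligand = {"L1": (1, 0, 0), "L2": (10, 1, 0), "L3": (0, 10, 1), "L4": (0, 0, 20)}
pairs = [("P1", "L1"), ("P2", "L2"), ("P3", "L3"), ("P4", "L4")]

bound = reaction_satisfied(protein, ligand, pairs, cutoff=2.0)
```

under this sketch, a combined-site criterion of the kind described above would amount to testing the primary-site pair list and the secondary-site pair list separately and accepting a reaction if either one is satisfied.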
there are no published reports of crystal structures of oseltamivir or zanamivir in the secondary binding site, so five analogous secondary binding site protein-ligand atom pairs were created for oseltamivir (table s ) and zanamivir (table s ) to match those seen in sialic acid, so that each ligand had five secondary binding site protein-ligand atom pairs. to pare down the data from µs of cumulative md sampling and pick out biologically-relevant glycan conformations, we clustered each glycan from the md simulations. the glycan trajectories were extracted and affixed on the static na crystal structure, to reveal the conformational space they can access (figure ) . visualizing these glycan trajectories on the na structure gives a qualitative representation of how much volume and surface area the glycans are capable of accessing. keeping in mind the primary and secondary binding sites are located just beneath the glycans (figure ) , the size and flexibility of the glycans here shows that they have the capability to "shield" the binding sites from ligand association. the three most representative clusters for each glycan were extracted from the md simulations. the central structure from each cluster was compared with the conformation generated from glyprot. these clusters show some conformational diversity, but none show a particularly similar conformation to the glyprot structure. however, the third glycan in each monomer shows a markedly decreased conformational diversity compared to the other two monomers. the clustering results from each monomer show the same trends; the results from monomer a are shown in figure , while the results from monomer b (figure s ), monomer c (figure s ) , and monomer d (figure s ) are shown in the supporting material. the glycans bend away from the binding sites on their monomer towards the binding sites on the neighboring monomer. this is seen for each monomer. 
[figure caption: the primary binding sites are in purple and the secondary binding sites are in orange. the linkage between the glycans and the protein is in yellow. the na structure is in teal. the glyprot conformation is in gray, the first conformation from the md simulations is in orange, the second conformation is in blue, and the third conformation is in green.]
to be confident in our computed association rates, we first needed to benchmark our system against experimental results. we created empirically-derived system criteria for the association of oseltamivir to the primary binding site of glycosylated na, as described in the methods. after matching the experimental association rate with oseltamivir, the same parameters were applied to zanamivir. these are the only two experimental association rates for h n na. subsequently, we investigated the association of oseltamivir and zanamivir to the primary sites of glycosylated na, obtaining association rates of . ± . /µm·s for oseltamivir and . ± . /µm·s for zanamivir. these are in agreement with the experimentally-measured rates of . ± . /µm·s and . ± . /µm·s, respectively, as visualized in figure s ( ). considering the experimental systems were glycosylated, we had to pick one glycan conformation to use for computing these benchmarks in our glycosylated system; for reproducibility we chose the conformation generated from the glyprot server. we note that choosing a different conformation for our computed benchmark would change the absolute association rates by a scaling factor, but the trends would remain the same. since the predicted k on for oseltamivir and zanamivir both matched up well with the experimental rates, the system proved to be transferable to ligand analogs for the primary site. we then applied the same criteria to two different conformations of sialic acid, boat and chair, to probe if the association rate was dependent on conformation.
this was done in addition to analyzing how association rate was modulated by different functional groups, via comparisons of ligand analogs such as oseltamivir, zanamivir and sialic acid. with the binding criteria set up, we calculated the association rates of each of the ligands to the primary site ( figure a). these results show two important findings. first, there is not a large difference in association rates between the system with glyprot glycans and the unglycosylated system. this shows that a glycan may adopt a conformation where it does not inhibit ligand binding much at all. the second finding is that the glycans from the md simulations all show a moderate level of inhibition, more than the system with glyprot glycans. this shows that biologically-relevant glycan conformations will likely exhibit a moderate level of inhibition towards ligand binding. combining the first and second findings discussed in this paragraph, glycans are capable of perturbing ligand binding to na.
[figure caption: conf is the glycan structure from the most populated cluster from the md simulations. conf is from the second most populated cluster, and conf is from the third most populated cluster. the association rates using glycan structures downloaded from glyprot are shown in gray. the association rates using structures derived from the md simulations are in bright, colorful shades whereas the others are in grayscale. the association rates without using any glycans are shown in black. (a) the glycan structures from the md simulations show a moderate association rate inhibition to the primary binding site irrespective of ligand chosen. (b) little association is seen to the secondary binding site. note the different y-axis used to be able to see the small amount of binding. (c) association rates of trajectories run with either the primary site or secondary site as the trajectory end point. similar to (a), the glycan structures from the md simulations in (c) show a moderate inhibition of ligand association. the raw data for this figure are given in table s (oseltamivir), table s (zanamivir), table s (sialic acid boat conformation), and table s (sialic acid chair conformation).]
there are no experimental association rates for ligands to the secondary site, so criteria were chosen based on crystal structure data, as discussed in the methods. only sialic acid has been crystallized in the secondary site of avian na, so binding site criteria for the secondary site were extracted from that structure and used to create the criteria for oseltamivir and zanamivir, as discussed in the methods ( ). a previous bd study suggested that oseltamivir can bind to the avian na secondary site ( ). a follow-up nmr study also suggested that oseltamivir binds to the avian na secondary site ( ). however, a more recent experimental study disagreed with these findings and did not see oseltamivir binding to the avian na secondary site ( ). considering the disagreement over oseltamivir binding to the secondary site, we decided to test this and secondary site binding for zanamivir as well. the computed association rates towards the secondary site tell a markedly different story than those to the primary site ( figure b). none of the ligands exhibited noticeable binding towards the secondary site, with the exception of the boat conformation of sialic acid. even with this conformation, there is no consistent trend when compared to primary site binding. although the boat conformation of sialic acid displays a small amount of binding, the chair conformation does not show binding. these results show that we can differentiate between these two sialic acid conformations at the bd level of theory. finally, trajectories were run where the ligand could associate to either the primary site or the secondary site ( figure c).
intriguingly, the results are essentially a concatenation of the rates seen for the primary and secondary sites individually. considering the low level of secondary site binding, the trends here are the same as seen for the primary site. as can be seen in figure s , there is a formal charge difference between the ligands: sialic acid contains a formal charge of - while oseltamivir and zanamivir are neutral. running test bd trajectories without charge treatment (results not shown), we saw analogous results to those seen in figure . this meant that only the sterics of the systems affected binding, not electrostatics. clearly one or a few of the structural differences between the ligands play outsized roles in affecting the association rates. in this work we did not further probe which exact atoms in the ligands change the association rates. biologically, the influenza replication cycle is propagated through na recognizing and cleaving sialic acid. this study compares the interplay between that molecular recognition process and na's aforementioned glycan shielding capabilities. this interplay is simplified here by approximating ligand binding as a diffusion-governed association process, modulated by protein electrostatics. previous studies have shown that viral proteins can exhibit a degree of glycosylation large enough to partially protect a variety of viruses from immune system antibodies; this is termed the viral glycan shield ( , , ( ) ( ) ( ) . from static structures one can envision the shielding that glycans can provide, but a dynamic representation better depicts the steric barrier encountered by immune system antibodies and drugs ( ). in our single na protein, we see that glycans are capable of covering most of the na surface area, as shown in figure . this is consistent with studies explaining how the influenza glycan shield can cloak the influenza virion from the immune system ( - ).
the glycans can access a large volume, allowing for a considerable shielding potential. however, it is worthwhile to note that influenza glycoproteins are usually not as extensively glycosylated as on some other viral proteins, such as the hiv envelope protein or the sars-cov- s protein ( , , ( ) ( ) ( ) . the exact h n construct prepared here contains a glycosylation site at n . this is part of the loop that borders the primary binding site (figure ) . the representation in figure shows that the glycans present at site n on each monomer have the combined capability to cover both na binding sites, potentially thwarting the binding of small molecules. the results shown here display a moderate inhibitory effect due to glycans, but this effect would likely not be present in proteins whose glycans only reside far from the ligand binding sites, i.e. if the setup in figure only contained the glycans at site and site on the bottom of the na head. when examining the effect of glycan conformation on binding inhibition, the glyprot glycans display a fairly vertical conformation. on the other hand, the glycans from the md simulations bend backwards, away from the primary binding site on their own monomer and towards the secondary binding site of the adjacent monomer, as shown in figure . interestingly, this bend appears to be enough to inhibit primary site binding. it has been previously shown that specific chemical modifications on the glycans can significantly change their flexibility ( ) ( ) ( ) . it has also been hypothesized that glycan flexibility plays a role in protein-receptor binding equilibria ( ) . considering the scale of biological interactions that glycans participate in, it is likely that they would exploit their flexibility to facilitate these interactions. however, the glycan environment, and nearby steric clashes would conceivably affect this flexibility as well, introducing competing effects. 
revisiting the input na structure in figure , we hypothesized that the glycan on top of each na monomer (the oligomannose type glycans linked to site # ) would achieve a higher degree of flexibility than the two on the bottom of each monomer (the complex and hybrid type glycans linked to sites # and # , respectively). our reasoning was that these two may find steric restrictions on their flexibility, and that the placement of the glycan on the na head would be more important than the type of glycan examined. our results show this is not quite the case. the clusters in figure , figure s , figure s , and figure s show that, similar to the complex-type glycans (a-d ), the oligomannose-type glycans (a-d ) were quite flexible even though they were situated near the hybrid-type glycans (a-d ) on the bottom of the na surface; this large degree of conformational freedom is backed up by previous work specifying that this flexibility is driven by the mannose( )-α( - )-mannose( ) and the mannose( )-α( - )-mannose( ) linkages ( ). these are the linkages connecting the chitobiose glycan "stalk" to the two glycan "branches". finally, the hybrid-type glycans showed noticeably less conformational flexibility than either the oligomannose-type glycans or the complex-type glycans. overall, the type of glycan and its specific linkages seemed to govern its flexibility more than potential nearby steric clashes. this agrees with previous work showing that unless there is a direct steric clash, inter-residue hydrogen bonds may have a larger effect governing glycan conformations ( , ). the results shown in figure are consistent with diffusion-controlled reactions, and show relatively high association rates. the space explored is consistent with the random walk nature of diffusion. the randomness of the ligand trajectories (from brownian motion) and the small sizes of the ligands considered here minimize the effects of the glycans on binding.
the rates for each ligand are mostly of similar orders of magnitude, with or without glycosylation. however, the glycan structures from the md simulations show a moderate inhibition compared to the unglycosylated na structure and the na structure with glycan structures taken directly from the glyprot web server. the extent of this inhibition ranges from a factor of about two to eight. in general, glycans can decrease binding activity of viral proteins ( , , , ) . due to their bulk and proximity to the primary ligand binding site, we hypothesized that, irrespective of conformation, the presence of glycans, particularly those near the binding sites, could substantially reduce ligand binding and removing these glycans would restore binding. what we found was a more nuanced picture. the na constructs with glycan conformations from the glyprot server showed similar binding rates to unglycosylated constructs. however, more realistic glycan conformations, extracted from the md simulations, showed a moderate but noticeable decrease in association rate, k on , on the order of two to eight times. one may naturally question whether glycans would have the same effect on dissociation rate, k off . one previous study testing antibody binding to cancer cells showed that antibody binding was relatively insensitive to the presence of glycans, indicating a similar dampening of k on and k off due to the presence of glycans ( ) . in this study mentioned, the overall equilibrium constant k d changed by less than a factor of two irrespective of the presence or absence of glycans ( ) . however, a different study done in the influenza membrane protein hemagglutinin showed that trimming the glycans from a standard length seen in hek cells to a single monosaccharide decreases the equilibrium constant k d by a factor of two to , depending on the receptor mimic used ( ) . this meant that the k on and the k off were not affected in the same way by the presence of glycans ( ) . 
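the reasoning linking k d to k on and k off in the two studies above can be made explicit with the standard relations for simple one-step binding. the slowdown factors a and b below are hypothetical symbols introduced only for this illustration; they do not appear in the cited studies.

```latex
\begin{align}
K_d &= \frac{k_\mathrm{off}}{k_\mathrm{on}},
  \qquad \Delta G^{\circ}_\mathrm{bind} = RT \ln K_d \\
k_\mathrm{on}' &= \frac{k_\mathrm{on}}{a}, \quad
k_\mathrm{off}' = \frac{k_\mathrm{off}}{b}
  \;\Longrightarrow\;
K_d' = \frac{k_\mathrm{off}'}{k_\mathrm{on}'} = \frac{a}{b}\, K_d
\end{align}
```

if glycans slow association and dissociation by the same factor (a = b), k d and the binding free energy are unchanged, which matches the near-constant k d of the antibody study. a measurable change in k d, as in the hemagglutinin study, requires a ≠ b, that is, k on and k off must respond differently to the glycans.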
glycans are present in the antibody binding sites of both of the studies mentioned above; this is in contrast to our system where glycans are situated near the catalytic sites, but not directly inside them. with this in mind, it seems likely that the slight slowing of binding small ligands by the glycans would be similarly reflected in a slight slowing of release, so that the equilibrium constants for binding these molecules are relatively insensitive to the presence of glycans. in effect, this is because the presence of glycans near the binding sites should not change the Δ g in accordance with the gibbs relationship. we hypothesize that our observed decrease in association rate is due to the glycans at glycosylation site n (site # ) as only those glycans are capable of sterically inhibiting the binding sites (figure ) , and we assume the glycans at sites n (site # ) and n (site # ) do not impair binding. taking the inhibition results discussed here with a different binding study using larger ligands for influenza na, there appears to be a size dependence on this inhibitory potential: smaller ligands are not as affected as larger ligands ( ) . the key points here are that small molecules are not seriously impeded from binding by the glycans; future drug discovery efforts can be focused on the development of strong binders with correspondingly long lifetimes of binding. modeling studies focused on small inhibitors are likely to be helpful, even when glycans are not included. the results seen in figure highlight the importance of using biologically-relevant glycan conformations relaxed on the protein structure as opposed to simply generating a glycan conformation and attaching it to the protein. though this study did use static structures as per the bd setup, we would expect similar trends if this study were repeated using a dynamic md environment since our bd trajectories already used the most highly-accessed glycan conformations gleaned from extensive md sampling. 
moreover, a study using mixed bd-md simulations analyzing the association of oseltamivir and zanamivir to na actually showed a less accurate k on rate than our coarser study using only bd ( ) . we can rationalize that the slower binding kinetics seen in our systems with biologically-relevant glycan conformations ( figure ) are due to the ligands having to maneuver around the glycans, even after running into them, and then continuing with the trajectory until reaching the binding site. this type of maneuverability can be seen in figure . we generated bd trajectories that could end with the ligand binding to the primary site ( figure a) , the secondary site (figure b) , or either site ( figure c ) on any monomer. using this setup, we were able to differentiate binding between the primary and secondary sites, and in fact found an additive binding mode when examining both sites concurrently. by simply adding up the association rates observed for the primary site ( figure a ) to the analogous simulation run to the secondary site (figure b) , the association rate to both sites ( figure c ) can be roughly obtained. we do not see any evidence of a further increase in association rate using both sites, showing that the presence of a proximal binding site does not influence association rate, either for the primary site or the secondary site. our primary site binding results show two conclusions supported by literature. in figure a we see that oseltamivir associates faster than zanamivir, as has been seen in experimental kinetics studies ( ) . moreover, we see faster binding of oseltamivir than sialic acid. this is qualitatively in agreement with an nmr study showing that oseltamivir outcompetes α ( , )-sialyllactose in binding to the avian na active site ( ) . it is not immediately clear which atoms on the ligands drive their binding differences. ligand binding to the secondary site has not been extensively studied, but it does not appear to have catalytic activity ( , ) . 
focusing on the secondary site, our results show three important findings. we first see that binding to the secondary site is slower than to the primary site, if binding is seen at all (figure ) . we do not see secondary site binding for oseltamivir and very little for zanamivir, though this may be as they are at the lower detection limit of our method. furthermore, we see that sialic acid binds faster to the secondary site than oseltamivir, which is in agreement with one study showing that α ( , )-sialyllactose outcompetes oseltamivir for binding to the avian na secondary site ( ) . a more recent study goes further and does not show any binding of oseltamivir to the avian na secondary site ( ) . however, we caution that a small amount of drug binding, likely only with zanamivir, may occur to the secondary site, as seen with zanamivir bound in the secondary site in the unpublished crystal structure pdb cml, and also seen in figure b . secondly, in the small amount of secondary site binding seen (figure b) , glycans are actually capable of enhancing or inhibiting binding, foreshadowing the complex role glycans play in ligand binding. finally, there appears to be a small conformational dependence on association rate, but this is only seen towards the secondary site ( figure b) . we used two different conformations of sialic acid for these binding studies. the boat conformation was crystallized in the active site and the chair conformation was crystallized in the secondary site. in our results we see the sialic acid chair conformation actually shows fractionally higher binding to the primary site than the boat conformation ( figure a) . conversely, only the boat conformation shows binding to the secondary site; the chair conformation does not register binding at all (figure b) . however, we caution that these results may be because sialic acid was crystallized in a different strain of avian na than we used in our studies. 
taken together, these results show that the exact ligand conformation upon approach to the binding site may not match the crystallized binding pose, but the results we present here do not permit us to explore this note or further explain a conformational dependence on binding. comparing the association rates in figure one may naturally query the competition in association rates between the primary and secondary sites. we see faster association to the primary site than the secondary site, which is not in agreement with two previous bd simulation studies ( , ) . however, the methodology of our study differs from these two studies, and from this we can unify the difference. those bd studies showed that ligands reach a distance of . Å away from the secondary site faster than to the primary site. we then show that ligands reach a distance of . Å away from the primary site faster than the secondary site, though we would like to note that our paper and the sung et al. paper assigned charges for the bd trajectories according to the amber force field and the amaro et al. paper assigned charges according to the charmm force field ( , ) . taken together, the secondary site appears to contain stronger long-range electrostatics to draw in ligands, but when the ligands approach the binding sites and sterics come into play, it appears to be more favorable for ligands to move closer to the primary site than the secondary site, assuming the rigid body approximations applied herein. considering the fact that the realistic substrates na encounters will exhibit multivalent binding, one previous study showed that the secondary site improved avian na enzymatic activity in removing sialic acid both from soluble macromolecular substrates and from cells ( ) . another study confirmed that the binding in the secondary site improved catalytic activity against multivalent substrates ( ) . 
other previous studies have suggested that the secondary site enhances the overall na catalytic activity by binding substrates and bringing them close to the catalytic primary site ( , ( ) ( ) ( ) ( ) . taking the studies above with our results, we postulate that multivalent cleavage will occur in a stepwise manner (figure ). in the first association event, the multivalent substrate, such as a sialylated cell surface receptor, will bind to the primary site, and then to the secondary site. after sialidase cleavage occurs in the primary site, the cleaved glycan branch will dissociate. then the sialylated glycan branch bound in the secondary site will be transferred to the primary site, as suggested previously ( , ( ) ( ) ( ) . after this passage, cleavage will again occur, and the full glycan will be released, finishing the enzymatic cycle. this mechanism is in disagreement with a previously proposed mechanism, which postulates that both binding sites will not be bound simultaneously ( ). however, we feel there is a greater body of literature suggesting that binding both sites simultaneously increases catalytic activity. we note that our proposed binding mechanism may be muddied in the case of multivalent ligands with viral glycans situated near the binding sites; in this case, the glycans may sterically inhibit multivalent binding, slowing down enzymatic activity and attenuating the replication cycle. in the case of monovalent binders, such as the inhibitors oseltamivir and zanamivir, we show in figure that association will happen to the primary site faster than to the secondary site. this appears to be biologically viable considering that previous studies have shown that the secondary site has no effect on enzymatic activity for monovalent substrates ( , ( ) ( ) ( ) .
as the primary site is the main site of enzymatic activity, it is reasonable to assume that ligands would preferentially bind to the primary site over the secondary site; reducing transfers of ligands between the binding sites would ostensibly increase catalytic activity and efficiency. taken together, abolishing the secondary site in avian na will not affect monovalent substrates such as influenza drugs, as these associate faster to the primary site anyway, which our results confirm. put differently, influenza drugs will preferentially block primary site binding over secondary site binding.
[figure caption: a free monovalent binder will associate to the primary site over the secondary site (a). this monovalent binder will release from the primary site before a second monovalent binder will associate to the secondary site. glycans, with their sialic acid tips, are an example of a multivalent binder (b). similar to the monovalent binders, the first multivalent binding event will occur to the primary site. next, the second sialic acid tip binds to the secondary site. with both sites bound, the sialic acid in the primary site is cleaved and released. the sialic acid bound in the secondary site is then transferred to the primary site. finally, the second sialic acid is cleaved and released, and the enzymatic cycle is complete.]
in this work, we created na systems with varying glycan conformations, and also without the presence of glycans. these glycans are capable of covering much of the surface area of na. their conformational flexibility is dependent on their glycan type, not necessarily their spatial position. the glycosylated systems showed moderate inhibition of ligand association to the primary binding site. finally, we propose a new binding mechanism for multivalent binders to na, such as cell surface receptors. these results have implications for future drug development, the overall understanding of glycans, and the na enzymatic mechanism.
much sustained effort has gone into developing na inhibitors, and this effort will continue in the future. measuring the binding of a potential drug is an important step in the drug discovery process. however, most drug discovery efforts have not taken viral glycans into account. neglecting this effect can lead to a surprising drop in drug binding ( ) . our work shows that glycans can have an inhibitory effect on influenza na primary site binding. there have already been a number of studies using multivalent binders as na antivirals ( ) ( ) ( ) ( ) ( ) ( ) . with the results shown here, we recommend that future work on multivalent na drugs focus on developing strong binders with a long lifetime, regardless of the presence or absence of glycans. given the detection limitations of our study, we cannot conclude how glycans affect secondary site binding, although we believe binding to the secondary site will be slower than binding to the primary site ( figure ) . however, it follows from these results that glycans could inhibit secondary site binding in a manner similar to their inhibition of primary site binding. in summary, this work examines glycan inhibition of drug binding, compares the drug binding interplay between two binding sites, and proposes a new mechanism of ligand binding to na. 
antibody determinants of influenza immunity exploitation of glycosylation in enveloped virus pathobiology effect of the addition of oligosaccharides on the biological activities and antigenicity of influenza a/h n virus hemagglutinin effects of glycosylation on the properties and functions of influenza virus hemagglutinin fitness costs limit influenza a virus hemagglutinin glycosylation as an immune evasion strategy glycosylation as a target for recognition of influenza viruses by the innate immune system playing hide and seek how glycosylation of the influenza virus hemagglutinin can modulate the immune response to infection contemporary h n influenza viruses have a glycosylation site that alters binding of antibodies elicited by egg-adapted vaccine strains cellular glycosylation affects herceptin binding and sensitivity of breast cancer cells to doxorubicin and growth factors multisystem failure: the story of anti-influenza drugs drug resistance in influenza a virus: the epidemiology and management current advances in anti-influenza therapy mammalian protein glycosylation -structure versus function structural requirements of n-glycosylation of proteins structure and immune recognition of the hiv glycan shield site-specific glycan analysis of the sars-cov- spike vulnerabilities in coronavirus glycan shields despite extensive glycosylation arenavirus glycan shield promotes neutralizing antibody evasion and protracted infection hepatitis c virus envelope glycoprotein e glycans modulate entry, cd binding, and neutralization structure of the epstein-barr virus major envelope glycoprotein glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy the hiv glycan shield as a target for broadly neutralizing antibodies bitter-sweet symphony: glycan-lectin interactions in virus biology effect of addition of new oligosaccharide chains to the globular head of influenza a/h n virus haemagglutinin on the intracellular transport and 
biological activities of the molecule antigenic structure of the haemagglutinin of human influenza a/h n virus genetic requirement for hemagglutinin glycosylation and its implications for influenza a h n virus evolution n-glycan profiles in h n avian influenza viruses from chicken eggs and human embryonic lung fibroblast cells integrated omics and computational glycobiology reveal structural basis for influenza a virus glycan microheterogeneity and host interactions targeted n-linked glycosylation analysis of h n influenza hemagglutinin by selective sample preparation and liquid chromatography/tandem mass spectrometry glycan analysis in cell culture-based influenza vaccine production: influence of host cell line and virus strain on the glycosylation pattern of viral hemagglutinin comparative glycomics analysis of influenza hemagglutinin (h n ) produced in vaccine relevant cell platforms glycosylation characterization of an influenza h n hemagglutinin series with engineered glycosylation patterns: implications for structure-function relationships characterization of site-specific glycosylation in influenza a virus hemagglutinin produced by spodoptera frugiperda insect cell line comparative characterization of the glycosylation profiles of an influenza hemagglutinin produced in plant and insect hosts changing selective pressure during antigenic changes in human influenza h human influenza a virus hemagglutinin glycan evolution follows a temporal pattern to a glycan limit glycosylation site alteration in the evolution of influenza a (h n ) viruses antigenic drift of the influenza a(h n )pdm virus neuraminidase results in reduced effectiveness of a/california/ / (h n pdm )-specific antibodies glycosylation at asn of h n haemagglutinin affects binding to glycan receptors recent avian h n viruses exhibit increased propensity for acquiring human receptor specificity glycans on influenza hemagglutinin affect receptor binding and immune response structural basis for 
influence of viral glycans on ligand binding by influenza hemagglutinin n-linked glycosylation of the hemagglutinin protein influences virulence and antigenicity of the pandemic and seasonal h n influenza a viruses influenza h n a/solomon island/ / virus receptor binding specificity correlates with virus pathogenicity, antigenicity, and immunogenicity in ferrets alterations in receptor binding properties of recent human influenza h n viruses are associated with reduced natural killer cell lysis of infected cells receptor binding by influenza virus: using computational techniques to extend structural data molecular simulations of diffusion and association in multimacromolecular systems bimolecular diffusion association brownian dynamics with hydrodynamic interactions role of secondary sialic acid binding sites in influenza n neuraminidase hemagglutinin/neuraminidase functional balance reveals the neuraminidase secondary site as a novel anti-influenza target multiscale simulation of receptor-drug association kinetics: application to neuraminidase inhibitors free energy decomposition of protein-protein interactions the structure of h n avian influenza neuraminidase suggests new opportunities for drug design influenza virus sialidase: effect of calcium on steady-state kinetic parameters the pymol molecular graphics system crystal structures of oseltamivir-resistant influenza virus neuraminidase mutants pandemic influenza virus: resistance of the i r neuraminidase mutant explained by kinetic and structural analysis propka : consistent treatment of internal and surface residues in empirical pka predictions pdb pqr: an automated pipeline for the setup of poisson-boltzmann electrostatics calculations how well does a restrained electrostatic potential (resp) model perform in calculating conformational energies of organic and biological molecules glycam : a generalizable biomolecular force field glycosciences.db: an annotated data collection linking glycomics and proteomics 
data reaching biological timescales with all-atom molecular dynamics simulations conformational analysis of furanoside-containing mono-and oligosaccharides charmm gui: a web based graphical user interface for charmm charmm: the biomolecular simulation program charmm-gui input generator for namd, gromacs, amber, openmm, and charmm/openmm simulations using the charmm additive force field glycan reader: automated sugar identification and simulation preparation for carbohydrates and glycoproteins glycan reader is improved to recognize most sugar types and chemical modifications in the protein data bank charmm-gui glycan modeler for modeling and simulation of carbohydrates and glycoconjugates comparison of simple potential functions for simulating liquid water numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamcis of n-alkanes langevin dynamics of peptides: the frictional dependence of isomerization rates of n acetylalanyl n ′ methylamide an analysis of the accuracy of langevin and molecular dynamics algorithms isothermal-isobaric molecular dynamics simulations with monte carlo volume sampling particle mesh ewald: an n⋅log(n) method for ewald sums in large systems gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers distinct glycan topology for avian and human sialopentasaccharide receptor analogues upon binding different hemagglutinins: a molecular dynamics perspective structural evidence for a second sialic acid binding site in avian influenza neuraminidases prodrg: a tool for high-throughput crystallography of protein-ligand complexes browndye: a software package for bronwian dynamics studier i affiniteten. 
cm forhandlinger: videnskabs-selskabet i christiana Über die chemische affinität ion-water interaction potentials derived from free energy perturbation simulations vmd: visual molecular dynamics improvements to the apbs biomolecular solvation software suite structure of influenza virus n : the last piece of the neuraminidase a secondary sialic acid binding site on influenza virus neuraminidase: fact or fiction? role of neuraminidase in influenza a(h n ) virus receptor binding antibody neutralization and escape by hiv- antibody evasion by a gammaherpesvirus o-glycan shield the hepatitis c virus glycan shield and evasion of the humoral immune response beyond shielding: the roles of glycans in the sars-cov- spike protein structure of an hiv gp envelope glycoprotein in complex with the cd receptor and a neutralizing human antibody the hiv- envelope glycoproteins: fusogens, antigens, and immunogens structural, glycosylation and antigenic variation between novel coronavirus ( -ncov) and sars coronavirus effect of bisecting glcnac and core fucosylation on conformational properties of biantennary complex-type n-glycans in solution glycan flexibility: insights into nanosecond dynamics from a microsecond molecular dynamics simulation explaining an unusual nuclear overhauser effect conformational flexibility of nglycans in solution studied by remd simulations sequence-to-structure dependence of isolated igg fc complex biantennary n-glycans: a molecular dynamics study regulation of receptor binding affinity of influenza virus hemagglutinin by its carbohydrate moiety the nd sialic acidbinding site of influenza a virus neuraminidase is an important determinant of the hemagglutinin-neuraminidase-receptor balance influenza virus-glycan interactions functional significance of the hemadsorption activity of influenza virus neuraminidase and its alteration in pandemic viruses mutation of the second sialic acid-binding site, resulting in reduced neuraminidase activity, preceded the 
emergence of h n influenza a virus mesoscale all-atom influenza virus simulations suggest new substrate binding mechanism substrate binding by the second sialic acid-binding site of influenza a virus n neuraminidase contributes to enzymatic activity n neuraminidase of influenza virus a/fpv/rostock/ has haemadsorbing activity neuraminidase hemadsorption activity, conserved in avian influenza a viruses, does not influence viral replication in ducks antigenic structure and variation in an influenza virus n neuraminidase synthesis and anti-influenza evaluation of polyvalent sialidase inhibitors bearing -guanidino-neu ac en derivatives dimeric zanamivir conjugates with various linking groups are potent, long-lasting inhibitors of influenza neuraminidase including h n avian influenza attaching zanamivir to a polymer markedly enhances its activity against drug-resistant strains of influenza a virus polymerattached zanamivir inhibits synergistically both early and late stages of influenza virus infection synthesis of multivalent difluorinated zanamivir analogs as potent antiviral inhibitors multivalent zanamivir-bovine serum albumin conjugate as a potent influenza neuraminidase inhibitor key: cord- -iqjksoim authors: marinaik, chandranaik b.; kingstad-bakke, brock; lee, woojong; hatta, masato; sonsalla, michelle; larsen, autumn; neldner, brandon; gasper, david j.; kedl, ross m.; kawaoka, yoshihiro; suresh, m. title: programming multifaceted pulmonary t-cell immunity by combination adjuvants date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: iqjksoim induction of protective mucosal t-cell memory remains a formidable challenge to vaccinologists. using a novel adjuvant strategy that elicits unusually potent cd and cd t-cell responses, we have defined the tenets of vaccine-induced pulmonary t-cell immunity. 
an acrylic acid-based adjuvant (adj), in combination with tlr agonists glucopyranosyl lipid adjuvant (gla) or cpg promoted mucosal imprinting but engaged distinct transcription programs to drive different degrees of terminal differentiation and disparate polarization of th /tc /th /tc effector/memory t cells. combination of adj with gla, but not cpg, dampened tcr signaling, mitigated terminal differentiation of effectors and enhanced the development of cd and cd trm that protected against h n and h n influenza viruses. mechanistically, vaccine-elicited cd t cells played a vital role in optimal programming of cd trm and anti-viral immunity. taken together, these findings provide new insights into vaccine-induced multi-faceted mucosal t-cell immunity with significant implications in the development of vaccines against respiratory pathogens. one sentence summary adjuvants induce multipronged t-cell immunity in the respiratory tract. viral mucosal infections such as influenza cause considerable morbidity, and even mortality in very young and geriatric patients ( ). protection afforded against influenza a virus (iav) by antibodies is typically virus type/subtype specific; however, t cells are believed to provide broad heterosubtypic immunity ( ) ( ) ( ) ( ) . iav infection elicits strong effector cd and cd t-cell responses in the lungs leading to the development of protective lung and airway-resident memory t cells ( , ) . however, influenza-specific mucosal memory t cells exhibit attrition and t-cell-based protection wanes in a span of - months ( , ) . therefore, unlike systemic viral infections that typically engender enduring immunity ( , ) , mucosal viral infections fail to program durable t-cell immunity in the respiratory tract (rt). 
while engagement of multiple innate receptors early in the response might be key to long-lived immunological memory following systemic infections ( , ) , there is a lack of understanding of why mucosal infections lead to shorter duration of cellular immunity. there is a general paucity of adjuvants that induce strong t-cell responses, and we have limited knowledge of mucosal t-cell responses to adjuvanted subunit vaccines, especially in the rt. these knowledge gaps pose daunting constraints in the development of immunization strategies targeted at the establishment of durable protective t-cell immunity in the rt ( ) ( ) ( ) ( ) . adjuplex (adj) is a polyacrylic acid-based (carbomer) adjuvant that is a component of some current veterinary vaccines and also known to induce neutralizing antibodies against hiv and malaria ( ) ( ) ( ) ( ) . here, we report that adj, in combination with toll-like receptor (tlr) / agonists, elicits unexpectedly potent and functionally diverse cd and cd t-cell responses to a subunit viral protein in the rt. studies with this adjuvant system provided the means to differentially program distinct patterns of effector and memory t-cell differentiation in the rt. further, these studies provided the first glimpse of the evolution of t-cell responses to adjuvanted vaccines in the lungs to define the quantitative, phenotypic and functional attributes of mucosal effector/memory cd and cd t cells that are associated with effective viral control in the lungs, and protection against h n and h n influenza infections. collectively, these findings provide novel insights into the immunological apparatus underlying the generation and establishment of protective and durable t-cell immunity in the rt in response to adjuvanted subunit vaccines. (p< . ) than in the gla group. interestingly, comparison of adj, gla and adj+gla groups suggested that gla limited the development of cx cr hi cd t cells. 
as another surrogate marker for effector differentiation, we quantified granzyme b levels in cd t cells directly ex vivo (fig. c) . the percentages of granzyme b hi cd t cells among np specific cd t cells in adj, cpg and adj+cpg groups were significantly (p< . ) higher than in gla or adj+gla groups. clearly, adj and cpg promoted granzyme b expression, but gla antagonized the granzyme b-enhancing effects of adj. studies to determine the transcriptional basis for the disparate differentiation of effector cd t cells in different adjuvant groups showed that the expressions of t-bet, irf- and batf were substantially greater in adj and adj+cpg groups, compared to gla and adj+gla groups ( fig. d) . while adj appeared to be the primary driver of t-bet, irf- and batf expression, gla effectively negated this effect in adj+gla mice (fig. d) . the levels of eomes did not differ between adjuvants, but analysis of t-bet and eomes co-expression showed that a higher percentage of cd t cells co-expressed t-bet and eomes (t-bet hi eomes hi ) in the cpg and adj+cpg groups (fig. s c) . by contrast, a greater proportion of cd t cells in gla and adj+gla groups expressed eomes, but not t-bet (t-bet lo eomes hi ) (fig. s c) . taken together, terminal differentiation of effector cd t cells in adj and/or cpg was linked to high levels of t-bet, irf- and batf. next, we assessed expression of cd and cd to ask whether adjuvants affected mucosal imprinting of cd t cells in the rt. the majority of np -specific cd t cells in lungs and bal expressed cd but not cd in all groups. the percentages of cd hi cd hi cd t cells in adj, adj+cpg and adj+gla groups were higher than in cpg and gla groups, which suggested that adj was a potent inducer of cd (fig. e) . altogether, fig. shows that adj and/or cpg promoted different facets of cd t-cell terminal differentiation. 
remarkably however, when combined with adj, gla selectively antagonized the adj-driven terminal differentiation program without affecting mucosal imprinting of cd t cells. thus, the adj-driven cd t-cell differentiation program can be augmented or antagonized by the tlr agonists cpg and gla, respectively. next, we characterized np-specific cd t-cell responses to various adjuvants following mucosal immunization. at day pv, high percentages of np -specific cd t cells were detected in lungs and airways of all groups of mice ( fig. a) . the percentages and total numbers of np -specific cd t cells in lungs and airways were comparable between adj, cpg, gla and adj+cpg groups. however, the total numbers of np -specific cd t cells in the lungs and airways of the adj+gla group were significantly higher than in other groups ( fig. a) . phenotypically, adj and cpg promoted the expression of terminal differentiation markers cx cr and klrg- , respectively (fig. b) . by contrast, expression of cx cr and klrg- was lowest in the gla group (fig. b) and gla tempered adj-induced expression of cx cr in the adj+gla group. np -specific cd t cells from adj and/or cpg groups contained greater levels of t-bet, as compared to other groups (fig. c) , but eomes levels were not different between groups. gla with or without adj induced the lowest levels of t-bet, which resulted in greater percentages of t-bet lo eomes hi cd t cells in gla and adj+gla groups (fig. d) . thus, adj and cpg promoted terminal differentiation of cd t cells by inducing t-bet expression, as compared to gla or adj+gla groups. analysis of mucosal imprinting markers cd and cd showed that adj-containing adjuvants elicited higher percentages of cd hi and cd hi cd hi cd t cells in lungs (fig. e) . thus, in contrast to adj and cpg, combining adj with gla promoted the development of less differentiated mucosally-imprinted cd t cells in the lungs and airways. 
we then asked whether adjuvants regulated functional programming of effector cd and cd t cells into tc /tc or th /th subsets respectively, in lungs. np -specific ifn--producing tc cd t cells were induced in all groups and the percentages of such cells among cd t cells were generally higher in the adj+gla group (fig. a) . interestingly however, il- -producing np -specific tc cd t cells were strongly induced only in the gla and adj+gla groups. to further elucidate the relative dominance of tc versus tc in different adjuvant groups, we calculated the relative proportions of these cells among total cytokine-producing (il- +ifn- producing cells) peptide-stimulated np -specific cd t cells (fig. b) ; ~ - % of np -specific cytokine-producing cd t cells produced ifn- in the cpg and adj+cpg groups, and only %, % and % of such cells produced ifn- in adj, gla and adj+gla groups, respectively. reciprocally, while only a relatively small fraction ( - %) of np specific cytokine-producing cd t cells produced il- or il- +ifn- in cpg and adj+cpg groups, - % of np -specific cd t cells produced il- or il- +ifn- in gla and adj+gla groups. thus, cpg and adj+cpg promoted functional polarization of tc cells, and adj, gla and adj+gla drove a balanced differentiation of tc and tc cells. evaluation of the ability of np -specific cd t cells to co-produce ifn-, tnf- and il- ( fig. c) showed that all adjuvants induced polyfunctional cd t cells, but a significantly higher percentage of np -specific cd t cells in the gla group were polyfunctional, as compared to other groups (fig. c) . np -specific th and th cd t cells were induced to varying levels by different adjuvants (fig. d) . adj promoted th polarization of effector cd t cells but cpg promoted th differentiation and negated the th skewing effects of adj in the adj+cpg group. th differentiation dominated over th development in gla and adj+gla groups (fig. e) . 
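the relative-proportion calculation described above, in which each cytokine-producing subset is expressed as a share of all cytokine-producing antigen-specific cells, can be sketched as follows. this is a minimal illustration under assumed inputs: the function name `cytokine_proportions`, the three count categories, and the example counts are hypothetical and are not taken from the data in this study.

```python
def cytokine_proportions(ifn_only: float, il_only: float, both: float) -> dict:
    """Return each subset as a percentage of all cytokine-producing cells.

    ifn_only: cells producing only the interferon cytokine
    il_only:  cells producing only the interleukin cytokine
    both:     cells producing both cytokines
    """
    counts = {"ifn_only": ifn_only, "il_only": il_only, "both": both}
    if any(value < 0 for value in counts.values()):
        raise ValueError("counts must be non-negative")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cytokine-producing cells to normalize against")
    # Normalize to the cytokine-producing pool, not to all T cells.
    return {name: 100.0 * value / total for name, value in counts.items()}
```

for made-up counts of 80, 15 and 5 cells, `cytokine_proportions(80, 15, 5)` assigns 80% to the first subset. the denominator is the cytokine-producing pool only, so the three percentages always sum to 100 regardless of how many antigen-specific cells were non-producers.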
in summary, while cpg and adj+cpg promoted the development of th effector cells, adj, gla and adj+gla favored the differentiation of th cells (fig. e) . polyfunctionality among np -specific cd t cells was largely comparable between groups (fig. f) . antigenic stimulation and the inflammatory milieu govern effector differentiation during infections ( ) ( ) ( ) . in order to determine whether adjuvants differed in terms of antigenic stimulation in draining lymph nodes (dlns) and lungs, early after vaccination (day and ), we adoptively transferred x tcr transgenic ot-i cd t cells that express gfp under the control of nur promoter; nur expression faithfully reports specific tcr signaling in t cells ( ) . subsequently, mice were vaccinated with chicken ovalbumin (ova) mixed with different adjuvants, and gfp expression by ot-i cd t cells was assessed at days and pv. ot-i cd t cells expressed readily detectable levels of gfp in dlns and lungs at different days pv (fig. a) . overall, gfp levels were not significantly different for ot-i cd t cells (p< . ) in dlns between various groups (except between gla and adj+gla) at day pv (fig. a) . ot-i cd t cells were not detectable in lungs until day pv; at day pv, significantly (p< . ) higher levels of gfp were detected in ot-i cd t cells from the lungs of adj mice, compared to cpg, gla and adj+gla groups (fig. a) . adoptive transfer of x tcr transgenic cd t cells was technically essential to assess t-cell signaling early after vaccination (fig. a ), but transfer of such unphysiologically high numbers of t cells might affect their differentiation ( ) . therefore, for assessment of tcr signaling at day pv, we adoptively transferred nur -gfp ot-i tcr transgenic cd t cells prior to vaccination. the pattern of gfp fluorescence in donor ot-i cd t cells in dlns and lungs of vaccinated mice at day pv, is shown in fig. s . 
on the th day pv, ot-i cd t cells in the dlns of adj mice expressed higher levels of gfp, compared to other groups, but the differences did not reach statistical significance. by contrast, on day pv, gfp levels in ot-i cd t cells from lungs of adj mice were significantly higher (p< . ) than in ot-i cd t cells from lungs of cpg, gla and adj+gla mice (fig. a) . collectively, a greater percentage of ot-i cd t cells in the lungs of adj group showed evidence of active tcr signaling in the lungs at days and after vaccination, and, notably, this effect of adj was dampened by gla but not cpg. enhanced tcr signaling in adj group (and to a lesser extent in cpg group) was consistent with elevation of irf- and batf (fig. d) , whose expressions are known to be driven by tcr signaling ( ) . transcription factor klf plays a key role in regulating t-cell trafficking, and tcr signaling downregulates klf expression ( , ) . using klf -gfp reporter mice ( ), we assessed whether high tcr signaling in adj-vaccinated mice led to klf downregulation in polyclonal np -specific cd t cells in the dlns and lungs, at day pv. in all groups, np -specific cd t cells downregulated klf expression in lungs, relative to klf levels in their respective lymph nodes (fig. b) . in lungs of adj, cpg and adj+cpg groups, np -specific cd t cells expressed lower levels of klf than in cd t cells from gla and adj+gla groups ( fig. b) . these data suggested that adj and/or cpg might enhance tcr signaling-induced klf downregulation in lungs, as compared to adj+gla. during influenza virus infection in mice, tcr signaling drives pd- expression in lungs ( ) . therefore, we investigated whether pd- expression was linked to varying levels of tcr signaling induced by different adjuvants. at day pv, higher percentages of np -specific cd t cells in adj mice expressed pd- , as compared to those in cpg and gla mice (fig. c) . 
interestingly, addition of gla but not cpg to adj significantly reduced adj-driven pd- expression on np -specific cd t cells (fig. c) . to elucidate the possible relationship between the frequency of np -specific cd t cells and their pd- expression levels in the lungs, we calculated the correlation coefficient between the two parameters (fig. d) . strikingly, there was a significant linear inverse correlation between pd- expression and the frequency of np -specific cd t cells in lungs of mice vaccinated with adj, cpg and gla adjuvants. these findings suggested that tcr signaling-induced pd- expression might limit the accumulation of cd t cells (clonal burst size) in the lungs. in summary ( fig. and ) , terminal differentiation of effector cd t cells in adj and adj+cpg groups was associated with enhanced tcr signaling in the lungs. reciprocally, gla might protect effector cd t cells from adj-driven terminal differentiation, by limiting tcr signaling in the lungs. to explore whether tcr signaling in cd t cells in adj+gla mice is governed by the abundance of antigen-presenting cells in lungs, we first quantified innate immune cells including dcs in lungs at day and pv (fig. s a) . adj+gla and adj+cpg increased the infiltration of neutrophils in lungs at day and respectively. only at day but not at day pv, lungs of adj, adj+cpg and adj+gla mice contained higher numbers of monocytes and monocyte-derived dcs than lungs of cpg and gla mice. there were no differences between the groups in the numbers of cd +ve dcs or alveolar macrophages on either day after vaccination. we determined the abundance and type of antigen-processing cells in lungs by vaccinating mice with dq-ova, which emits green/red fluorescence upon degradation by proteases (fig. s b) . as compared to cpg and gla groups, lungs of adj and adj+cpg (and adj+gla to a slightly lesser degree) contained significantly higher numbers of dq-ova-bearing monocyte-derived dcs, monocytes and cd +ve dcs at day pv, but not at day pv. 
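the correlation analysis mentioned above, relating marker expression to the frequency of antigen-specific cells, can be illustrated with a pearson coefficient. the text does not state which correlation statistic or software was used, so this is an assumption made for illustration; the function name `pearson_r` and any input values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-deviations about the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

a coefficient near -1 indicates a linear inverse relationship of the kind reported here, where higher marker expression accompanies lower cell frequency. a significance test, which the text reports, is a separate step and is not covered by this sketch.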
these data suggested that dampened tcr signaling in the adj+gla group, as compared to augmented signaling in adj and adj+cpg groups ( fig. a -d) cannot be simply explained by reduced abundance of specific antigen-bearing cells in the lungs. to determine whether the early inflammatory response influenced the phenotypic and functional differentiation of effector t cells, we quantified cytokine expression in the lungs. at ( fig. s a ) and hours (fig. s b ) pv, the levels of cytokines/chemokines il- , il- , il- , kc, rantes, g-csf and gm-csf were higher in lungs of gla and/or adj+gla mice. however, the levels of ifn-, ifn-, il- , il- p , il- p , mip , mip , mcp, tgf- and tnf in lungs were largely comparable between groups, except for the adj group (fig. s ) . thus, terminal differentiation of effector cd t cells in adj, cpg and adj+cpg mice was not associated with excessive inflammation in the lungs, relative to other groups. notably however, tc and th cell development in gla and adj+gla groups was associated with elevated il- in the lungs. further, development of th effectors and enhanced t-bet induction ( fig. and fig. ) in cpg and adj+cpg groups was not associated with elevated levels of il- p in the lungs. thus, the differences in accumulation and terminal differentiation of effector t cells in the lungs of vaccinated mice cannot be explained by the degree of early inflammation. at days pv, we quantified np -specific memory cd t cells in lungs, airways and spleen. all adjuvants elicited robust cd t-cell memory in the rt (fig. ) . notably, both frequencies and total numbers of np -specific memory cd t cells in lungs, airways and spleen of the adj+gla group were significantly (p< . ) higher than in other groups (fig. a) . 
intravascular staining showed that - % of np -specific memory cd t cells in the lungs localized to the non-vascular compartment in adj, cpg, gla and adj+gla groups; the percentages of non-vascular memory cd t cells were slightly reduced in the adj+cpg group ( fig. b) . the percentages of cd +ve cd +ve lung resident memory (trm) cells among np -specific cd t cells were comparable for various adjuvants (fig. c) . however, lungs of adj+gla group contained significantly (p< . ) greater numbers of both non-vascular and vascular cd +ve np -specific cd t cells, as compared to other groups (fig. d) . thus, adj+gla was the most effective adjuvant that elicited high numbers of cd +ve trm cd t cells in the airways and the non-vascular compartment of the lungs. at days after vaccination, all adjuvants induced strong cd t-cell memory and the percentages of np -specific memory cd t cells ranged from . - % in the lungs (fig. e) . the percentages of memory cd t cells in lungs of adj+gla group were consistently higher than in other groups (fig. e) . regardless of adjuvants - % of memory cd t cells localized to the non-vascular compartment in the lungs (fig. f) . likewise, the percentages ( - %) of lung cd +ve trm-like cd t cells were comparable for various adjuvants. we determined whether polarization of th versus th was maintained in memory cd t cells of vaccinated mice. at days after vaccination, ifn- and/or il- -producing np-specific memory cd t cells were detectable in the lungs of vaccinated mice (fig. g) . fig. h illustrates that the percentages of np-specific cytokine-producing cd t cells that produce ifn- and/or il- differed amongst various groups. ifn--producing cd t cells were only dominant (~ %) in the cpg group, but il- -producing cd t cells formed the dominant subset (~ %) in the gla and adj+gla groups. about - % of np-specific memory cd t cells produced il- in the adj and adj+cpg groups. 
therefore, functional programming in effector cells is largely preserved in memory t cells. mice were vaccinated twice with np protein formulated in various adjuvants. at day after the booster vaccination, we investigated whether np-specific t-cell memory protected against respiratory challenge with the virulent pr /h n influenza a virus (iav). on the th day after challenge, viral burden was high in lungs of mice that were unvaccinated or vaccinated with np alone (without adjuvants) (fig. a) . compared to the unvaccinated and np-only groups, other groups exhibited varying degrees of protection. the adj+gla vaccine provided the most effective protection, followed by gla and adj+cpg vaccines ( fig. a and s a ). although relatively less effective, adj and cpg vaccines still reduced viral titers by > %. kinetically, at days pv, viral burden was reduced in the lungs of all vaccinated mice within - days after pr /h n challenge (fig. s b) , but clear differences in viral control among adjuvants emerged between days and post-challenge ( fig. a and fig. s b ). protection against iav afforded by various adjuvant groups was durable and was maintained at least until day pv (fig. s c ). to elucidate correlates of protection afforded by various adjuvanted vaccines, we quantified recall cd and cd t-cell responses in the lungs at day after pr /h n challenge. interestingly, despite varying levels of protection afforded by various vaccines (fig. a) , the numbers and extra-vascular localization of np -specific cd t cells and np -specific cd t cells in the lungs were comparable between the groups (fig. s d) . the percentages of np specific ifn--producing cd t cells were also comparable for all groups of mice (fig. b) . in striking contrast, percentages of np -specific il- -producing tc cells were considerably higher in the lungs of adj+gla and gla groups (fig. b) . 
the percentages of np -specific ifn--producing cd t cells in the cpg and adj+cpg groups were significantly higher than in other groups (fig. c). in addition, lungs of gla and adj+gla mice contained higher percentages of il- -producing np -specific th cd t cells than in other groups. in this adjuvant system, all adjuvants afforded considerable protection. however, differences in viral control between groups appeared to associate with disparate levels of tc and/or th cells, but not tc or th cells. for example, better viral control by gla and adj+gla groups was associated with increased percentages of il- -producing np -specific tc and np specific th cells (fig. b and c ). cpg and adj+cpg groups also differed in the percentages of np -specific th cells but not th or tc cells. these data suggest that stimulation of tc /th cells in parallel with tc /th cells might constitute a correlate of enhanced immunity conferred by adj and gla, as compared to adj and cpg groups. to test this inference, we assessed the importance of il- a in mediating protective immunity to iav in mice vaccinated with np formulated in adj+gla. at days after vaccination, adj+gla-vaccinated mice were treated with isotype control antibodies or anti-il- a antibodies, just prior to viral challenge. data in fig. d show that treatment with anti-il- a antibodies did not affect the accumulation of np -specific cd t cells or np -specific cd t cells in lungs following viral challenge. in mice treated with isotype control antibodies, but not in anti-il- a-treated mice, lung viral titers were significantly lower than in unvaccinated control mice (fig. d ). these data suggested that il- a might have contributed to viral control in lungs in mice vaccinated with adj+gla. although il- production is known to be protective against certain fungal and bacterial infections, it is also linked to immune pathology ( , ).
in order to evaluate whether vaccine-induced protective immunity in adj+gla mice was associated with lung pathology, we analyzed histopathological changes in lungs after viral challenge (fig. s ). with the exception of the adj group, moderate necrotizing bronchiolitis was present in all mice, and was most severe in the cpg group, where it progressed to early-stage bronchiolitis obliterans and organizing pneumonia. very mild extension to the surrounding alveoli was present in the gla and adj+gla groups. thus, we did not find any evidence of augmented lung pathology in adj+gla mice following viral challenge. next we assessed whether np-based adjuvanted vaccines conferred heterosubtypic immunity against a highly lethal infection with h n avian influenza virus at days pv. in the unvaccinated and np-vaccinated group, % of mice lost significant weight and succumbed to h n infection (fig. e). by contrast, % of adj+gla and adj+cpg mice lost little weight and survived h n challenge, while other groups showed excellent protection ranging from - % (fig. e). in order to determine whether cd t cells regulate the quality of cd t-cell memory and protective immunity induced by the adj+gla vaccine, we depleted cd t cells, only at the time of prime and boost vaccination. at days pv, we examined cd and cd t-cell memory in the rt (fig. ). np -specific memory cd t cells were only detected in lungs and airways of non-depleted mice (fig. a). cd t-cell depletion had no adverse effect on the numbers of np -specific memory cd t cells in the rt (fig. b). among trm markers, only the expression of cd , but not cd or cd a, was significantly reduced by cd t-cell depletion (fig. c). coincident with impaired cd expression, memory cd t cells in cd t-cell-depleted mice poorly localized to the lung parenchyma (fig. d). functionally, the percentages of np -specific ifn--producing cd t cells were significantly (p< . 
) increased in the lungs of cd t cell-depleted mice, with no effect on il- -producing cd t cells ( fig. e and f) . in summary, loss of cd t cells impaired cd expression and extravascular localization of memory cd t cells but increased the percentages of np -specific memory tc cells in the lungs. to assess whether depletion of cd t cells affected protective immunity, we challenged undepleted and cd t-cell-depleted vaccinated mice with the pr /h n virus. on the th day after challenge, we assessed recall cd t-cell responses and viral control in the lungs. the percentages of np -specific cd t cells in lungs of cd t-cell-depleted mice were higher than in un-depleted mice (fig. g) , and the majority of these effector cells localized to the nonvascular compartment (fig. h) . np -specific cd t cells were only detected in the lungs of un-depleted mice (fig. i) . cd t-cell depletion had no effect on the percentages of cd +ve cd t cells, but the percentages of cd a +ve cells were significantly (p< . ) increased in cd t-cell-depleted mice. a small percentage of np -specific cd t cells in the lungs of undepleted mice expressed cd , and this fraction was significantly (p< . ) reduced by cd t-cell depletion (fig. j) . in the cd t-cell-depleted group, the percentages of cxcr +ve np -specific cd t cells were significantly reduced, but the percentages of np -specific cd t cells that expressed elevated levels of cx cr , t-bet and eomes were higher in cd t-cell-depleted group than in undepleted group (fig. k) . increased accumulation of cx cr hi cells and reduced expression of cxcr on cd t cells in cd t-cell-depleted mice is consistent with elevated expression of t-bet ( , ) . significantly, the percentages of ifn-producing and granzyme b +ve cd t cells were higher, but there was a concurrent reduction in il- -producing cd t cells in the lungs of cd t-cell-depleted mice (fig. l-m) . 
strikingly, > % of np -specific cd t cells produced ifn- in the cd t-cell-depleted group as opposed to ~ % in undepleted mice (fig. n). undepleted mice effectively controlled viral replication in the lungs (fig. o ; > % reduction in lung viral titers). thus, surprisingly, despite markedly increased development of ifn--producing granzyme b +ve tc cd t cells, cd t-cell-depleted mice showed poor control of influenza virus in the lungs (fig. o) and also exhibited exaggerated weight loss (fig. p). taken together data in
in order to dissect whether impaired viral control in cd t-cell-depleted mice was due to defective programming of cd t cells and/or due to loss of cd t-cell-dependent viral control, we depleted cd or cd t cells just prior to influenza virus challenge (fig. s ). as shown in
mucosal viral infections such as influenza fail to induce durable t-cell immunity and therefore, studies of t-cell responses to iav have failed to provide clues about how to induce durable t-cell immunity in the rt ( , ). here, we report an adjuvant system comprised of a polyacrylic acid-based adjuvant adj and tlr agonists that elicits surprisingly potent, durable and functionally diverse mucosal t-cell immunity to disparate strains of iav. as a mucosal adjuvant, adj afforded protection against iav in mice ( ). here, we define ways to broaden t-cell immunity and enhance the protective efficacy of adj by combining it with tlr and tlr agonists, gla and cpg respectively. this adjuvant system's ability to elicit impressive numbers of antigen-specific cd and cd t cells enabled us to perform in-depth characterization of vaccine-elicited effector and memory t cells directly ex vivo, without the need for tetramer enrichment. following iav infection, activated cd t cells migrate from dlns to the lungs, undergo another round of antigenic stimulation and differentiate into effector cells ( ).
likewise, antigen-specific cd t cells in all adjuvant groups experienced varying levels of tcr signaling in the lungs. significantly, adjuvants differed in terms of the degree of effector differentiation, for both cd and cd t cells. adj and/or cpg-adjuvanted vaccines drove terminal differentiation into cx cr hi klrg- hi effector cells; as in iav-infected mice ( , ) , the pathway to terminal differentiation is attributed at least in part to higher tcr signaling in the lungs, leading to induction of transcription factors t-bet, irf- and batf ( , , ) . notably, high tcr signaling also induced pd- expression in adj, cpg and adj+cpg groups, and likely limited the accumulation of cd t cells in lungs. pd- might restrain rt inflammation ( ) , but it would be worthwhile determining whether pd- limits vaccine-induced memory and protective immunity. it is noteworthy that despite the presence of similar numbers of antigen-bearing cells in lungs of adj, adj+cpg and adj+gla mice, effector cd t cells in adj+gla mice displayed substantially lower levels of tcr signaling in lungs. it is possible that gla-induced tlr stimulation antagonized antigen-triggered tcr signaling in adj+gla mice ( ) . by dampening tcr signaling, gla might have mitigated terminal differentiation of effectors and promoted the development of trms in adj+gla mice. high levels of inflammation and il- early in the response have been linked to t-bet induction and terminal differentiation of cd t cells in spleen ( , ) , but the rules that govern t cell differentiation in lungs versus spleen are likely different and worthy of further exploration. we find that adj enhances cd expression in responding cd and cd t cells. tcr signaling, il- and exposure to tgf- promote cd expression and mucosal imprinting in t cells ( , ) . however, we find that at and hours after vaccination, the levels of tgf or il- in lungs did not explain differences in cd expression. 
adj promotes cross-presentation of antigen to cd t cells ( ) and hence, the adj-induced increase in the number of antigen-bearing cells in lungs likely enhances tcr signaling and cd expression on effector cd t cells. interestingly, gla inhibited tcr signaling in adj+gla mice without abrogating the cd -inducing effects of adj. it is possible that the residual tcr signaling in adj+gla mice is sufficient to induce cd , or that other mechanisms, including ifn production by cd t cells, might have contributed to cd expression on cd t cells ( ). in summary, we infer that the magnitude of tcr signaling in lungs is a key factor that controls accumulation, mucosal imprinting and effector/memory differentiation. a salient feature of adj-based adjuvants is the diverse functional programming of effector and memory t cells. for cd t cells, all adjuvants induced comparable levels of il- and elicited a strong tc response. however, gla, by virtue of its ability to induce il- and il- , also enabled a significant tc /th response, and induction of th cells by gla is consistent with published work ( ). importantly, from a vaccination perspective, we have discovered the means to tailor an adjuvant based on pathogen-specific correlates of protection. for example, adj formulated with cpg elicits strong tc /th memory, which protects against viruses and protozoan pathogens (e.g. leishmania). alternatively, adj formulated with gla stimulates balanced differentiation of tc /th and th cells, which is protective against fungi, tuberculosis and other bacterial pathogens ( , ). the hallmark of effective adjuvants is their ability to elicit protective immunity. effective t-cell-based protection against iav requires a critical number of trms in the airways and the lung parenchyma ( , ). in this study, all adjuvants elicited readily detectable cd and cd trms in the rt.
adj+gla induced the largest number of trms and vascular memory cd /cd t cells in the lungs, which is likely a sequel to less terminal differentiation and larger clonal burst size during the effector phase ( ). trms are known to reside primarily in the tissue parenchyma and in the dlns, but not as circulating cells ( ). we find that lungs of adj+gla mice contained cd +ve memory cd t cells in the vasculature, which are likely similar to circulating skin-resident cd +ve memory t cells in humans ( ). parabiosis studies are needed to elucidate whether vascular cd +ve memory cd t cells in adj+gla mice are circulating cells or lung vasculature-resident memory t cells. the numbers of memory t cells in lungs of other adjuvant groups were comparable, but the differential polarity (th vs. th ) programmed by each during the effector phase was preserved in memory t cells; cpg and adj+cpg displayed th dominance and adj, gla and adj+gla showed skewed th differentiation. upon challenge with the pr /h n iav, all vaccinated groups afforded significant protection in the lungs. interestingly, the extent of protection varied between the groups; adj+gla provided the most effective protection, and the descending order of adjuvants in terms of protection is gla ≥ adj+cpg > cpg ≥ adj. upon challenge, all vaccinated groups mounted a strong recall response, and the accumulations of np -specific cd t cells and np -specific cd t cells in lungs were comparable. the percentages of ifn-producing np -specific cd t cells were similar between the groups and the percentages of ifn-producing np -specific cd t cells showed no correlation with viral control. however, interestingly, differences in viral control tended to associate with the combined percentages of il- -producing cd and cd t cells. further, blocking il- a modestly affected iav control in mice vaccinated with adj+gla. these data are consistent with a report that th cells can provide some degree of protection against iav ( ).
in addition to il- a, tc /th cells also produce cytokines such as il- f, il- and gm-csf, whose role in iav control is unknown. we postulate that adjuvants (adj+gla) that stimulate tc /th memory provide an additional layer of immune defense that augments other mechanisms of cd /cd t cell immunity, leading to enhanced protection. it is also possible that tc /th programming, and not il- -mediated antiviral functions per se, might be important in engendering protective immunity, because th programming is associated with stem cell-like functionally plastic memory t cells ( ). it is likely that a battery of redundant mechanisms including but not limited to il- , ifn- and mhc i/mhc ii-restricted cytotoxicity orchestrate vaccine-induced protective immunity to influenza a virus ( ) ( ) ( ) ( ). our investigations into the cd t cells' role in programming vaccinal immunity to iav provided further insights into the mechanisms of protection in adj+gla-vaccinated mice. depletion of cd t cells during vaccination precluded priming of np -specific cd t cells, but had no adverse effect on ifn-or il- -producing np -specific memory cd t cells in lungs. importantly, however, cd t cell depletion reduced cd expression and the number of non-vascular cd trms in the lungs, as reported before ( ). upon iav challenge, despite
requests for further information, resources and reagents should be directed to the lead contact: m. suresh (sureshm@vetmed.wisc.edu). no unique or new materials or reagents were developed in this study. all materials or reagents used in this manuscript are available commercially or were obtained from other researchers. all data generated in this study are presented in figures
for intracellular cytokine staining, x cells were stimulated for hours at c in the presence of human recombinant il- ( u/well), and brefeldin a ( μl/ml, golgiplug, bd biosciences), with one of the following peptides: siinfekl, np or np (thinkpeptides ® , proimmune ltd. oxford, uk) at . μg/ml.
after stimulation, cells were stained for surface markers, and then processed with cytofix/cytoperm kit (bd biosciences, franklin lakes, nj). to stain for transcription factors, cells were first stained for cell surface molecules, fixed, permeabilized and subsequently stained for transcription factors using the transcription factors staining kit (ebioscience). all samples were acquired on a lsrfortessa (bd biosciences) flow cytometer. data were analyzed with flowjo software (treestar, ashland, or). statistical analyses were performed using graphpad software (la jolla, ca). all comparisons were made using a one-way anova test with tukey corrected multiple comparisons or student's t test, where p< . = *, p< . = **, p< . = *** were considered significantly different among groups. in some experiments (fig. ), we used two-way anova, student's t test and simple regression analysis. in fig. , we used non-linear regression for analyzing weight loss data. data are presented as mean ± sem for biological replicates. viral titers were log transformed prior to analysis. no data or outliers were excluded from analyses.
funding: this study was supported by a phs grant from the national institutes of health (grant# u ai and r ai ) and funds from the john e. butler professorship to m. suresh. woojong lee was supported by a pre-doctoral fellowship from the american heart association. djg's contribution was supported by national institutes of health training grant (grant# t od ).
calculated proportions of ifn- and/or il- -producing cells among cytokine-producing peptide-stimulated ifn-+il- np -specific cd t cells. (g-p) at day after booster vaccination, non-depleted and cd t cell-depleted mice were challenged intranasally with pr /h n influenza a virus percentages of np -specific tetramer-binding cells among cd t cells in lungs. (h) percentages of np -specific tetramer-binding cd t cells in vascular and nonvascular lung compartment.
(i) percentages of np -specific tetramer-binding cells among cd t cells in lungs. (j) expression of tissue residency markers on np -specific tetramer-binding cd t cells. (k) chemokine receptor and transcription factor expression in np -specific cd t cells in lungs. (l) granzyme b expression by np -specific cd t cells directly ex vivo. (m) percentages of ifn- and il- producing np -specific cd t cells. (n) relative proportions of ifn- and/or il- producing cells among total ifn- plus il- -producing peptide body weight, measured as a percentage of starting body weight prior to challenge. data are pooled from two independent experiments. *, **, and *** indicate significance at influenza pathogenesis: the effect of host factors on severity of disease airway-resident memory cd t cells provide antigen-specific protection against respiratory virus challenge through rapid ifn-gamma production establishment and maintenance of conventional and circulation-driven lung-resident memory cd (+) t cells following respiratory virus infections the way forward: potentiating protective immunity to novel and pandemic influenza through engagement of memory cd t cells heterosubtypic t-cell immunity to influenza in humans: challenges for universal t-cell influenza vaccines the effector t cell response to influenza infection dynamics of influenza-induced lung-resident memory t cells underlie waning heterosubtypic immunity influenza-induced lung trm: not all memories last forever cd t-cell memory differentiation during acute and chronic viral infections immunological memory and protective immunity: understanding their relation translating innate immunity into immunological memory: implications for vaccine development vaccine adjuvants: putting innate immunity to work immunity to viruses: learning from successful human vaccines vaccine adjuvants: putting innate immunity to work license to kill: formulation requirements for optimal priming of cd (+) ctl responses with particulate vaccine 
delivery systems accelerating next-generation vaccine development for global disease prevention production and preclinical evaluation of plasmodium falciparum msp- and msp- chimeric protein, pfmsp-fu robust neutralizing antibodies elicited by hiv- jrfl envelope glycoprotein trimers in nonhuman primates the ability by different preparations of porcine parvovirus to enhance humoral immunity in swine and guinea pigs antigenicity and immunogenicity of equine influenza vaccines containing a carbomer adjuvant effective respiratory cd t-cell immunity to influenza virus induced by intranasal carbomer-lecithin-adjuvanted non-replicating vaccines the chemokine receptor cx cr defines three antigen-experienced cd t cell subsets with distinct roles in immune surveillance and homeostasis th differentiation drives the accumulation of intravascular, non-protective cd t cells during tuberculosis functional and genomic profiling of effector cd t cell subsets with distinct memory fates inflammation directs memory precursor and short-lived effector cd (+) t cell fates via the graded expression of t-bet transcription factor diversity in t cell memory: an embarrassment of riches inflaming the cd + t cell response t cell receptor signal strength in treg and inkt cell development demonstrated by a novel fluorescent reporter mouse initial t cell receptor transgenic cell precursor frequency dictates critical aspects of the cd (+) t cell response to infection quality of tcr signaling determined by differential affinities of enhancers for the composite batf-irf transcription factor complex lklf: a transcriptional regulator of singlepositive t cell quiescence and survival transcriptional downregulation of s pr is required for the establishment of resident memory cd + t cells pd- (hi) cd (+) resident memory t cells balance immunity and fibrotic sequelae il- and il- in immunity: driving protection and pathology insight into non-pathogenic th cells in autoimmune diseases lung airway-surveilling 
cxcr (hi) memory cd (+) t cells are critical for protection against influenza a virus editorial: pulmonary resident memory cd t cells: here today, gone tomorrow cutting edge: contribution of lung-resident t cell proliferation to the overall magnitude of the antigen-specific cd t cell response in the lungs following murine influenza virus infection graded levels of irf regulate cd + t cell differentiation and expansion, but not attrition, in response to acute virus infection tcr signaling via tec kinase itk and interferon regulatory factor (irf ) regulates cd + t-cell differentiation tlr signaling in effector cd + t cells regulates tcr activation and experimental colitis in mice monocytes acquire the ability to prime tissue-resident t cells via il- -mediated tgf-beta release cd + t cell help guides formation of cd + lung-resident memory cd + t cells during influenza viral infection mucosal delivery switches the response to an adjuvanted tuberculosis vaccine from systemic th to tissue-resident th responses without impacting the protective efficacy sting-activating adjuvants elicit a th immune response and protect against mycobacterium tuberculosis infection helper t-cell responses and pulmonary fungal infections virus-specific cd + t-cell memory determined by clonal burst size human cd (+)cd (+) cutaneous resident memory t cells are found in the circulation of healthy individuals il- deficiency unleashes an influenza-specific th response and enhances survival against high-dose challenge th cells are long lived and retain a stem cell-like molecular signature new insights into the generation of cd memory may shape future vaccine strategies for influenza multiple redundant effector mechanisms of cd + t cells protect against influenza infection tc , a unique subset of cd t cells that can protect against lethal influenza challenge memory cd + t cells protect against influenza through multiple synergizing mechanisms prolonged antigen presentation by immune complex-binding 
dendritic cells programs the proliferative capacity of memory cd t cells rectification of age-associated deficiency in cytotoxic t cell response to influenza a virus by immunization with immune complexes replication-incompetent influenza a viruses that stably express a foreign gene
we thank dr. jameson for providing the klf -gfp mice and advanced bioadjuvants for providing adjuplex. thanks to amulya suresh for preparing the graphic abstract. we wish to acknowledge sincere appreciation for the efforts of the veterinary and animal care staff at uw-madison. the authors declare no competing interests
key: cord- -w bm p authors: suarez, david l.; pantin-jackwood, mary j.; swayne, david e.; lee, scott a.; deblois, suzanne m.; spackman, erica title: lack of susceptibility of poultry to sars-cov- and mers-cov date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w bm p chickens, turkeys, ducks, quail and geese were challenged with sars-cov- or mers-cov. no disease was observed, no virus replication was detected, and antibodies were not detected in serum. neither virus replicated in embryonating chicken’s eggs. poultry are unlikely to serve a role in the maintenance of either virus. which it has % identity across the genome ( ). sars-cov- is highly transmissible among humans and particularly virulent for elderly individuals and those with certain underlying health conditions. multiple studies have examined the susceptibility of domestic animals to cov- to establish the risk of zoonotic transmission and two studies have shown chickens and pekin ducks were not susceptible to infection ( , ). middle east respiratory syndrome coronavirus (mers-cov), another coronavirus of high concern associated with zoonotic infection, was first detected in patients with severe acute lower respiratory tract disease in saudi arabia in . mers-cov causes lower respiratory disease similar to the sars-covs ( ).
unlike sars-cov- , mers-cov transmits poorly to humans and does not exhibit sustained human-to-human transmission; however, it has a high case fatality rate of around %. although the mers-cov case count is low, human cases continue to be reported; therefore, there is a possibility for the virus to adapt to humans. based on sequence similarity, the closest relatives of sars-cov- and mers-cov are believed to be bat beta-coronaviruses ( ), but because of the amount of sequence difference between human and bat isolates an intermediary host likely exists. for mers-cov, dromedary camels appear to be the primary natural reservoir of infection to humans, but other domestic animals seem to be susceptible to infection ( , ). there is only a single study of mers-cov in chickens that looked for antibodies, but all samples were negative ( ). because poultry are so widespread and have close and extended contact with humans and other mammals in many production systems, including live animal markets, susceptibility studies were conducted with sars-cov- and mers-cov in five common poultry species. additionally, embryonating chicken eggs (ece) have been used for virus isolation and as a laboratory host system, including in vaccine production, for diverse avian and mammalian viruses. therefore, ece were tested for their ability to support the replication of both viruses.
hosts and sources of endemic human genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan transmission studies of sars-cov- in fruit bats, ferrets, pigs and chickens the lancet susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus . 
science mers-cov: a global challenge human coronavirus infections-severe acute middle east respiratory syndrome (mers), and sars- reference module in biomedical sciences identification of mers-cov in dromedary camels middle east respiratory syndrome coronavirus infection in non-camelid domestic mammals. emerg microbes infect respiratory syndrome (mers) coronavirus seroprevalence in domestic livestock in saudi arabia key: cord- -ymr e k authors: wu, yue; hengen, keith b.; turrigiano, gina g.; gjorgjieva, julijana title: homeostatic mechanisms regulate distinct aspects of cortical circuit dynamics date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: ymr e k homeostasis is indispensable to counteract the destabilizing effects of hebbian plasticity. although it is commonly assumed that homeostasis modulates synaptic strength, membrane excitability and firing rates, its role at the neural circuit and network level is unknown. here, we identify changes in higher-order network properties of freely behaving rodents during prolonged visual deprivation. strikingly, our data reveal that pairwise functional correlations and their structure are subject to homeostatic regulation. using a computational model, we demonstrate that the interplay of different plasticity and homeostatic mechanisms can capture the initial drop and delayed recovery of firing rates and correlations observed experimentally. moreover, our model indicates that synaptic scaling is crucial for the recovery of correlations and network structure, while intrinsic plasticity is essential for the rebound of firing rates, suggesting that synaptic scaling and intrinsic plasticity can serve distinct functions in homeostatically regulating network dynamics. neural circuits are faced with a fundamental problem: how to allow experience to alter and refine network connectivity during learning and experience-dependent plasticity, while still maintaining stability of function. 
generating a neural system that is both stable and flexible is a non-trivial challenge and requires a prolonged period of development when multiple mechanisms at the level of single neurons and networks of neurons interact. two powerful and fundamentally different forms of plasticity involved in this process are hebbian mechanisms, which alter synaptic connectivity in a synapse-specific manner, and homeostatic mechanisms that maintain stable function by globally adjusting overall synaptic weights and neuronal excitability. the development and refinement of visual response properties in primary visual cortex (v ) involves classic synapse-specific mechanisms implementing the bidirectional form of hebbian plasticity, such as long-term potentiation (ltp) and long-term depression (ltd), considered to be the cellular substrate for learning and memory (smith et al., ) . however, associative hebbian plasticity drives positive feedback processes leading to unstable network dynamics and some form of homeostasis is needed to compensate for the inherent instability (abbott and nelson, ; turrigiano and nelson, ) . a large body of evidence shows that various homeostatic plasticity mechanisms, including synaptic scaling and intrinsic plasticity (turrigiano et al., ; desai et al., ) , operate in the brain to maintain stability despite various internal and external perturbations. more specifically, homeostatic plasticity can elevate neural activity in response to sensory deprivation (hengen et al., ; keck et al., ) , and suppress activity under conditions of overexcitation (seeburg et al., ; evers et al., ) . despite great efforts to describe homeostatic mechanisms at the single cell level, how network properties are homeostatically regulated is largely unknown. 
furthermore, while hebbian and homeostatic mechanisms operate at different timescales and can be induced by distinct cues (stellwagen and malenka, ; shepherd et al., ; joseph and turrigiano, ; daoudal and debanne, ) , how they interact within complex, highly recurrent microcircuits, as those found in the cortex, to refine and maintain circuit function has remained elusive. a critical challenge has been the lack of detailed measurements of individual synaptic strengths and their potential impact on large-scale network dynamics, especially in a highly recurrent network like the cortex. here we investigate two main questions. first, which aspects of network function are under homeostatic control? second, why are there so many homeostatic mechanisms, and do they serve redundant or unique functions? to address these questions, we combine analysis of in vivo electrophysiological data during sensory deprivation in rodent visual cortex and computational modeling of cortical synaptic plasticity and network dynamics. first, we analyzed the collective activity of multiple neurons in the monocular region of primary visual cortex (v m) during a classic monocular deprivation (md) paradigm (lid suture), in freely behaving rats over days during the critical period (hengen et al., ) . earlier work demonstrated that md induces an initial drop in firing followed by the rates' homeostatic recovery despite long-lasting deprivation (hengen et al., ) . here we reanalyzed these datasets to characterize the temporal evolution of higher-order network properties over the same nine-day period. individual pairwise correlations, including correlation structure, weakened during brief md, but recovered during prolonged md. 
second, to understand how the cortical network exploits diverse homeostatic mechanisms to return firing rates and correlations to baseline after prolonged md, we took advantage of a plastic spiking recurrent network model equipped with known plasticity and homeostatic mechanisms. our work suggests that synaptic scaling is crucial for the recovery of correlations and network structure, whereas intrinsic plasticity is essential for the rebound of firing rates. these results indicate that different homeostatic mechanisms act in the brain to independently regulate distinct network features. we first confirmed previous analysis of individual neurons recorded in vivo in the primary visual cortex during the critical period of plasticity (postnatal days to ). in these experiments, md was performed after days of baseline activity and continued for the rest of the recordings. while firing rates of individual neurons remained relatively stable during normal development, brief two-day monocular deprivation (md) caused the firing rates to decrease to % of their baseline values (fig. a, left) (hengen et al., , ). however, despite prolonged md, over the next three to four days firing rates gradually recovered to baseline after an initial overshoot (fig. a, right) (hengen et al., , ). these effects were observed not only at the population level, but also at the level of individual neurons (hengen et al., ). here, we investigated higher-order network properties during normal development and following prolonged md by calculating the next statistical moment beyond the firing rates, namely the pairwise spiking correlations between different neuron types (methods). specifically, we quantified the temporal evolution of the correlation coefficient of individual neuron pairs and of the average correlation across all pairs, both during normal development and after perturbing visual input through md.
figure caption: a. the average firing rates of neurons from five control hemispheres (left) and neurons from five deprived hemispheres (right) normalized to the firing rates at p in the light (horizontal dashed line). b. the average pairwise correlations of pairs from five control hemispheres (left) and pairs from five deprived hemispheres (right) normalized to the correlations at p in the light (horizontal dashed line). c. correlation comparisons between baseline (bl) and early md (left), between early md and late md (middle), and between bl and late md (right) at the single cell-pair level of one control hemisphere. different colors represent the correlations between different neuron types. dashed lines are fitted regression lines crossing the origin. upper left histograms indicate the distributions of correlation differences. *** p < . , wilcoxon signed-rank test. d. same as c but for one deprived animal. here, for two hemispheres we used md and for the other three hemispheres md as early md because different animals showed the biggest drop in correlations at different times. *** p < . , wilcoxon signed-rank test. e. slopes of fitted regression lines for the correlation comparisons in c and d. data are shown as means ± sem.
in control hemispheres, unlike firing rates, correlations increased slightly as a function of age (n = animals, fig. b, left). by contrast, in deprived hemispheres, correlations initially dropped over the first two days and then gradually rebounded to pre-deprivation levels (n = animals, fig. b, right), displaying a pattern similar to that of the firing rates (fig. a). as previously reported, we observed light-dark oscillations in the correlation amplitudes, with higher correlations in the light and lower correlations in the dark (pacheco et al., ).
to assess the degree to which correlations of individual neuron pairs changed beyond the population level, we evaluated single cell-pair correlations on different days. as in earlier analysis, neurons were separated into putative pv+ fast-spiking units (pfs) or regular-spiking units (rsus) based on waveform and spiking characteristics (hengen et al., (hengen et al., , . specifically, we focused on three different -hour periods recorded in the light: ( ) baseline (bl) corresponding to p , ( ) a period that we called "early md" when the largest drop of firing rates and correlations occurred, typically two or three days after baseline i.e. p or p , and ( ) a period that we called "late md" corresponding to the time when the firing rates and correlations nearly recovered to baseline i.e. p . as observed already for the average correlation combining all neuron pairs and animals, correlations increased during normal development covering the -day period during which recordings were performed. the increase between bl and early md was small (n = pairs, fig. c , left, r = . , p < − , wilcoxon signed-rank test). correlations at late md were significantly greater than at early md (n = pairs, fig. c , middle, r = . , p < − , wilcoxon signed-rank test). the developmental increase in correlations during the critical period became most obvious when we compared bl versus late md (n = pairs, fig. c , right, r = . , p < − , wilcoxon signed-rank test). we did not observe any obvious differences in correlations among different cell types in that they all showed similar patterns of temporal evolution. moreover, almost all neuronal pairs in the control hemispheres demonstrated an increase in correlation ( fig. c, right) . conversely, in deprived hemispheres, correlations of the majority of individual cell pairs, independent of their type, underwent an initially significant drop during early md (n = pairs, fig. d , left, r = . 
, p < − , wilcoxon signed-rank test) followed by a subsequent increase during late md (n = pairs, fig. d, middle, r = . , p < − , wilcoxon signed-rank test). the correlations during late md recovered to a higher level than baseline (n = pairs, fig. d, right, r = . , p < − , wilcoxon signed-rank test). we summarized the gradual increase of correlations in control hemispheres, and the drop followed by recovery in deprived hemispheres, by the fitted regression lines of the individual pair data for each animal (fig. e). remarkably, despite a degree of variability across animals, the drop and recovery of correlations induced by md were ubiquitous (fig. s ). while correlations at the single cell-pair level recovered during late md, the difference between correlations at late md and bl (fig. d, right) raised the possibility that the recovered network might have a different structure. to examine the evolution of network structure during normal development over the critical period and during prolonged md, we examined the correlation matrices on different days. an example experiment shows that in the control hemisphere, the structure of the correlation matrix remained consistent over time (n = neurons, fig. a), whereas in the deprived hemisphere, the correlation structure initially weakened and then recovered to a structure similar to baseline (n = neurons, fig. b). to quantify the similarity between the structure of correlation matrices at distinct time points, we used a metric known as the l distance (methods), which measures the average absolute distance between the vectors of correlations in a high-dimensional space. this metric allowed us to summarize the results for multiple animals, and revealed that in control hemispheres the correlation matrix at bl was more similar to the correlation matrix at late md than to a randomly shuffled version of the latter (n = pairs, fig. c, left, p < − , wilcoxon signed-rank test). similarly, in deprived hemispheres, the correlation matrix at bl was more similar to the correlation matrix at late md than to a randomly shuffled version of the latter (n = pairs, fig. c, right, p < − , wilcoxon signed-rank test). this demonstrates a high degree of similarity between correlation structure at baseline and at late md.
figure caption: c. l distance between correlation matrix at baseline and at late md, relative to baseline and shuffled late md for control and deprived hemispheres. data are shown as means ± sem. *** p < . , wilcoxon signed-rank test.
interestingly, the correlation matrices were composed of several assemblies, i.e. groups of neurons exhibiting stronger pairwise correlations (fig. a, b), reminiscent of the clustered network structure reported in previous studies (ko et al., ; cossell et al., ; perin et al., ; yoshimura et al., ). in conclusion, our analysis of v cortical activity recorded in vivo demonstrates that the correlation structure of these networks is homeostatically regulated following perturbation of normal sensory experience. we next asked what mechanisms underlie the observed neuronal and network-level changes during normal development, and following a perturbation like md. to understand how neural circuits exploit various synaptic plasticity and homeostatic mechanisms to decrease and recover both firing rates and correlations during md, we built a plastic recurrent network model consisting of randomly connected excitatory and inhibitory spiking neurons (methods). model neurons receive thalamic inputs, with thalamocortical synaptic efficacy onto inhibitory neurons set higher than onto excitatory neurons, consistent with previous experimental studies (cruikshank et al., ; ji et al., ; miska et al., ). neuronal and network parameters were chosen to generate in vivo-like firing rates, with excitatory neurons firing at hz and inhibitory neurons firing at hz (hengen et al., ). to generate clustered correlation structure as observed experimentally (fig.
a, b), we followed previous modeling studies and included several experimentally characterized plasticity mechanisms (litwin-kumar and doiron, ) (methods). we first tasked the network with the imprinting of connectivity assemblies starting from an initially random connectivity. in contrast to previous computational modeling studies that used random, uncorrelated poisson inputs (litwin-kumar and doiron, ), and in line with our observation that the networks show stronger pairwise correlations in the light than in the dark (pacheco et al., ), we postulated that input correlations, as would be generated during natural vision, matter for the generation of clustered connections. therefore, we trained the network by stimulating the recurrent cortical network with thalamocortical poisson spiking inputs that had identical firing rates, but differed in their correlation structure. for the training, excitatory neurons were randomly grouped into assemblies, although the exact number of assemblies was not important. before training with correlated inputs, the initial synaptic connections in the entire network were weak and identical between any pair of neurons of the same type (fig. a, left), resulting in asynchronous irregular network activity (fig. a, middle) and low correlations (fig. a, right). during training, neurons within the targeted assembly received correlated inputs (methods), which strengthened connectivity between them through recurrent hebbian plasticity. after training, the network became structured, with stronger synaptic connections within assemblies (fig. b, left). as a result of this structure, the network no longer exhibited asynchronous irregular activity but instead showed blocks of activity, defined as occasional periods of high firing rate (fig. b, middle). the structured connectivity and block activity selectively increased correlations within assemblies (fig. b, right).
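the training inputs described above (poisson inputs with identical rates but different correlation structure) can be illustrated with a minimal sketch. the construction used here, thinning a shared "mother" poisson process, is one standard way to obtain a target pairwise count correlation and is an assumption, not necessarily the procedure in the paper's methods. the function name, rate, correlation value and bin width are all hypothetical.

```python
import numpy as np

def correlated_poisson_counts(n_neurons, rate_hz, corr, duration_s, dt_s, rng):
    """Binned Poisson spike counts with a common rate and pairwise
    spike-count correlation approximately equal to `corr`.

    Assumed construction (not taken from the paper): every input train is
    a random thinning of one shared "mother" Poisson process.
    """
    n_bins = int(round(duration_s / dt_s))
    if corr <= 0.0:
        # Uncorrelated case: independent Poisson counts at the same rate.
        return rng.poisson(rate_hz * dt_s, size=(n_neurons, n_bins))
    # Mother process runs at rate_hz / corr so that thinned trains run at rate_hz.
    mother = rng.poisson(rate_hz / corr * dt_s, size=n_bins)
    # Each train keeps every mother spike independently with probability corr.
    return rng.binomial(mother, corr, size=(n_neurons, n_bins))

# Hypothetical usage: one targeted assembly receives correlated input,
# the remaining neurons receive uncorrelated input at the same rate.
rng = np.random.default_rng(0)
assembly_input = correlated_poisson_counts(20, 10.0, 0.3, 2000.0, 0.1, rng)
other_input = correlated_poisson_counts(20, 10.0, 0.0, 2000.0, 0.1, rng)
```

because thinning leaves each train poisson at the requested rate, the two input groups differ only in their correlation structure, which is the property the training protocol relies on.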
using the structured model network as a baseline following normal cortical development after eye opening, we next wanted to investigate how this network responds to a sensory perturbation resembling md. to achieve this, we needed to know how the inputs to the network are modified during md. previous experimental studies have reported that md induces no change in the average firing rates of lgn, the visual area of the thalamus (linden et al., ) . therefore, to simulate md in our model network, we kept the firing rates of lgn inputs identical to that at baseline, but assumed that eye closure during md considerably diminished input correlations. in the model, the excitatory neurons received uncorrelated poisson inputs to denote the start of md (fig. a ). in addition to these changes in input correlations, recent experiments have revealed that brief md ( days) induces long-term depression at thalamocortical synapses onto excitatory and inhibitory neurons, with thalamocortical synapses onto inhibitory neurons depressed more than synapses onto excitatory neurons (miska et al., ) . the process of long-term depression is not instantaneous, so we assumed that synaptic connections from thalamus to excitatory and inhibitory neurons undergo a linear decrease during the first two days of md. to match experimental findings, the decrease in thalamocortical connections onto inhibitory neurons was larger (fig. a , methods). it is currently unknown when during md this thalamocortical depression saturates, but since deprived-eye responsiveness reaches its minimum - days after the onset of md (frenkel and bear, ) , we assumed that the feedforward connections did not further decrease after this point, while keeping the inputs uncorrelated for the entire md (fig. a ). how does the recurrent network respond to these changes in input correlation structure and depression of feedforward connectivity strength that occur following md? 
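the deprivation protocol assumed above (unchanged input rates, a linear decrease of thalamocortical weights over the first two days that is larger onto inhibitory neurons, and no further decrease afterwards) can be written as a simple schedule. this is only a sketch: the final weight fractions below are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical parameters; only the two-day ramp is stated in the text.
LTD_DAYS = 2.0            # duration of the linear decrease after MD onset
FINAL_FRACTION_E = 0.9    # assumed remaining thalamus-to-excitatory weight
FINAL_FRACTION_I = 0.7    # assumed remaining thalamus-to-inhibitory weight (larger depression)

def thalamocortical_scale(t_days, ltd_days, final_fraction):
    """Multiplicative factor on the feedforward weight at time t_days after
    MD onset: linear ramp from 1 to final_fraction, then held constant."""
    progress = np.clip(np.asarray(t_days, dtype=float) / ltd_days, 0.0, 1.0)
    return 1.0 - (1.0 - final_fraction) * progress

days = np.linspace(0.0, 6.0, 13)
scale_e = thalamocortical_scale(days, LTD_DAYS, FINAL_FRACTION_E)
scale_i = thalamocortical_scale(days, LTD_DAYS, FINAL_FRACTION_I)
```

in this sketch the input correlations would be set to zero for the whole deprivation period, independently of the weight schedule.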
although there are potentially multiple ways to achieve network stability and regulate network function, there are two fundamentally different mechanisms that are known to homeostatically control neuronal firing: homeostatic adjustment of synaptic strengths and of intrinsic excitability (turrigiano and nelson, ; turrigiano, ; lambo and turrigiano, ) . neurons can regulate their activity by scaling incoming synaptic strengths in response to perturbations -a process known as synaptic scaling (turrigiano et al., ) . this scaling is bidirectional in that it can increase and decrease synaptic strengths, it is global and operates in a multiplicative manner. in addition to synaptic scaling, neurons can alter the number of ion channels to adjust their intrinsic excitability, and consequently modify their firing thresholds, in response to perturbations (desai et al., ; daoudal and debanne, ; grubb and burrone, ) . based on these experimental findings, besides hebbian plasticity during training, we modeled these two distinct homeostatic mechanisms following md: ( ) synaptic scaling which acts only on excitatory synapses (turrigiano et al., ; hengen et al., ) , and ( ) intrinsic plasticity which modifies the intrinsic excitability of both excitatory and inhibitory neurons (grubb and burrone, ; campanac et al., ) (methods) . in the presence of persistent thalamocortical ltd as during training, and both homeostatic mechanisms, the average firing rates of excitatory and inhibitory neurons in the model network first decreased to % of baseline, because slow homeostatic mechanisms could not overcome the feedforward synaptic depression and input decorrelation to recover firing rates. at the time that feedforward ltd saturated, firing rates started to increase due to homeostatic plasticity, resembling the recovery to baseline observed experimentally during late md (compare fig. b and fig. b ). next, we investigated the evolution of higher-order aspects of network dynamics. 
similar to the analysis of our data, we focused on two key time points after md onset in the model: early md corresponding to the largest drop of firing rates (fig. b, orange) , and late md corresponding to the time when the firing rates recovered close to baseline (fig. b, yellow) . the network showed irregular spiking dynamics with different firing rates during these two periods (fig. c, d, left) . the correlations first decreased during the period modeling early md as observed experimentally (compare fig. b to fig. c , right), but did not recover during the period corresponding to late md (compare fig. b to fig. d, right) . we speculated that this failure to recover correlations in the model network, despite the recovery of firing rates, could be the result of perturbing the structured connectivity between excitatory neurons within assemblies generated through training (fig. b) . indeed, the average weights between excitatory neurons within an assembly depressed during the period corresponding to late md (fig. s ) . to reveal the origin of this depression in the model network, we investigated the specific contribution of hebbian plasticity and synaptic scaling to the average weight change within assemblies. despite the overall potentiation within assemblies induced by synaptic scaling during the period corresponding to early md, continued ltd from hebbian plasticity dominated over homeostatic plasticity, depressing all weights within assemblies and preventing the recovery of correlations ( fig. s ). in conclusion, this dominance of depression after md prevented the recovery of structured connectivity, and consequently correlations, in a model with persistent hebbian ltd despite homeostatic plasticity. this suggests that the relative timing and resulting competition between the two homeostatic mechanisms and ongoing hebbian plasticity could be important for recovering different aspects of network dynamics. 
previous work involving ocular dominance plasticity has shown that blocking nmdar, the main receptor involved in hebbian plasticity, during the homeostasis-dominant phase causes no significant change in response strength (toyoizumi et al., ). this suggests that the effect of hebbian plasticity during homeostatic recovery of network activity is negligible. motivated by this, we asked whether the recovery of correlations during the period corresponding to late md can be rescued by reducing the effect of hebbian ltd. we proposed that the attenuation of hebbian plasticity might occur through a metaplastic process in which the amplitude of ltd dynamically adapts to the history of neuronal activity (methods). to adapt the magnitude of ltd, we followed previous theoretical work (pfister and gerstner, ; gjorgjieva et al., ). implementing metaplastic ltd preserved the recovery of average firing rates of both excitatory and inhibitory neurons (fig. a). similarly, the spiking rasters during the period corresponding to the early md phase showed asynchronous irregular activity (fig. b, left). in contrast to the model with persistent ltd, however, the metaplastic reduction in ltd enabled the return of structured activity during late md (fig. c, left). importantly, the correlation structure in the model homeostatically recovered during late md after its initial dilution during early md (fig. b and fig. c, right). we further investigated what other properties of the network changed as we modeled md. along with firing rates and correlations, the average weights within assemblies manifested the same pattern of drop and rebound (fig. d), unlike the weights in the initial model with persistent ltd (fig. s ).
figure caption (fragment): average excitatory-to-excitatory weights for each assembly and across assemblies. average inhibitory-to-excitatory weights, which target all excitatory neurons independent of assembly membership.
average inhibitory onto excitatory weights also decreased during early md in the model (fig. e) , suggesting that the network reduced the amount of inhibition to elevate the decreased firing rates of excitatory neurons. during the period corresponding to late md, overall inhibition increased to balance the gradually recovered excitation, to keep excitatory-inhibitory balance and avoid winner-take-all dynamics where a single strongly-connected assembly dominates the entire network (litwin-kumar and doiron, ) . furthermore, the average firing thresholds of excitatory and inhibitory neurons in the model network decreased as we modeled prolonged md, and reached steady state as the firing rates approached their baseline values (fig. f ). therefore, metaplastic regulation of ltd together with synaptic scaling and intrinsic plasticity, is sufficient to capture both the recovery of firing rates and correlations during md. this suggests that homeostatic modifications of overall synaptic weights and intrinsic excitability cooperate with hebbian ltd to maintain several aspects of network function following input perturbations. to determine the distinct contributions of the different homeostatic mechanisms for the recovery of firing rates and correlations during prolonged md, we selectively eliminated each mechanism. when deactivating synaptic scaling during the entire period of md in the model, we found that firing rates still recovered (fig. a ), but the correlations did not (fig. c ). since synaptic scaling affects synaptic strengths, we hypothesized that the correlations failed to recover due to the inability of the network to recover its structured connectivity. indeed, the average weights between excitatory neurons within assemblies remained low in the absence of synaptic scaling (fig. s ) , eliminating structured block activity (fig. b ) and preventing the recovery of correlation structure during late md (fig. c ). 
this suggests that synaptic scaling is indispensable for the recovery of correlations. similarly, without intrinsic plasticity during the entire md period firing rates in the model did not recover (fig. d ). this result was independent of the recovery of correlations. when the overall excitatory drive received by a single neuron within the same assembly was weak, low firing rates were accompanied by a poor degree of synchrony within assemblies (fig. e) , resulting in weak correlations (fig. f) . increasing the overall excitation to a neuron, for instance by increasing the connectivity probability within assemblies, generated structured block activity resulting in high correlations within assemblies (not shown), but still failed to recover firing rates to baseline levels, especially for inhibitory neurons. in conclusion, we demonstrate that two important forms of homeostatic plasticity, synaptic scaling and intrinsic homeostatic plasticity, are able to regulate distinct aspects of network activity. even though these mechanisms are operating within the context of a specific network architecture implementation here, we propose that they will apply more generally since synaptic and intrinsic plasticity influence different aspects of network function. a key question in the field of homeostatic plasticity is which aspects of neuronal activity are under homeostatic control. recent studies have shown that despite a high degree of synaptic plasticity during the critical period (levelt and hübener, ) , firing rates of individual neurons remain remarkably constant during normal development (hengen et al., ) , and when perturbed by sensory deprivation rebound back to an individual set point despite continued deprivation (hengen et al., ) . here we used in vivo data in rodent visual cortex to investigate whether higher-order cortical network properties are under homeostatic control. 
we found that -distinct from firing rates -correlations in the control hemispheres increased slightly during early development. in contrast, in deprived hemispheres correlations initially decreased over the first two days and then gradually recovered to pre-deprivation levels, including their structure. modeling of this process revealed that this restoration of correlation structure could be accomplished through synaptic scaling, while firing rate homeostasis was dependent on intrinsic homeostatic plasticity. together, these experimental findings provide the first evidence that functional correlation structures are subject to homeostatic regulation. recovery of stimulus preference at the single cell level, as well as network correlation structure, has also been reported during repeated episodes of monocular deprivation in the binocular region of visual cortex, each followed by eye reopening (rose et al., ) . however, in these ocular dominance plasticity studies, recovery occurring following eye reopening is trkb-dependent and mediated by hebbian ltp (kaneko et al., ) . this is mechanistically distinct from our work where recovery is governed by homeostatic mechanisms, and where there is no competition between the closed and open eye. what might be the purpose of the recovered network correlations? following lid suture to induce md, the transmitted light through the closed eye lids is relatively weaker compared to the pre-deprivation condition. therefore, we propose that the network's homeostatic recovery of correlations might be a way to amplify weak signals, promoting successful signal propagation to other cortical regions (vogels and abbott, ) , which is essential for animals' perception of the sensory environment (van vugt et al., ) . we predict that the recovery of correlation structure also has important functional implications for information transmission across cortical hierarchies. 
for instance, neurons in layer / sum inputs from neurons in layer and are highly influenced by its connectivity and correlation structure. if the recovered network in one layer undergoes a profound remodeling and ends up having a completely different correlation structure, adjustments in successive layers would be needed to keep the cortical network functional. we cannot conclude from our data whether neurons with higher correlations are more strongly connected. however, as previously shown, functionally correlated neurons are more likely to be connected and if so, more strongly (ko et al., ; cossell et al., ) . we therefore assume that correlation strength is indicative of connection strength. in that sense, the identified clusters with strong correlations giving rise to assemblies are consistent with previous experimental work (ko et al., ; cossell et al., ; barnes et al., ) . however, this is only the case for excitatory neurons (identified rsus); since the number of sorted pfs cells was significantly lower than rsus, we could not investigate their correlation structure. to dissect the role of various homeostatic mechanisms to restore firing rates and correlations to baseline despite prolonged md, we built and analyzed a computational model with spiking neurons and biologically realistic plasticity rules. upon training with correlated input patterns (ocker and doiron, ) , imitating the baseline condition in which animals perceive normal visual inputs, the network exhibited structured spontaneous activity and developed stronger correlations within assemblies. our model showed that decreasing thalamocortical connection strength (miska et al., ) and decorrelating input patterns during md, degraded synaptic weights and decreased firing rates and correlations. this was accompanied by a depression in excitatory synaptic weights within assemblies and overall inhibitory synaptic weights in the model. 
although experiments have not found significant changes in the strength of recurrent excitation within layer (maffei et al., ) , in layer / there is a generalized depression of excitatory input ; a more systematic analysis that includes measurements within and across assemblies would be necessary to reveal selective depression of some connections. the modeling results indicated that attenuating the depression effect of hebbian plasticity was required to maintain clustered network structure during the process of recovery. this suggests that the effect of hebbian plasticity becomes attenuated during prolonged md, which then allows homeostatic plasticity to "catch up" and restore network properties. this is consistent with several experimental findings. for example, brief md leads to occlusion of ltd in layer in primary visual cortex (crozier et al., ; miska et al., ) , while homeostatic strengthening of ca synapses in hippocampus is accompanied by a reduced ability of synapses to exhibit ltp (soares et al., ) . furthermore, blocking the nmdars involved in hebbian plasticity during recovery does not significantly change response strength, suggesting that the total effect of hebbian plasticity has already attenuated when homeostasis is at its peak (toyoizumi et al., ) . importantly, in the face of ongoing plasticity, we found that two different forms of homeostatic plasticity can serve distinct functions in recovering network function. first, intrinsic plasticity as a mechanism that affects individual neuron properties, such as the firing threshold, is essential for the rebound of firing rates. since it does not act directly on the synaptic weights, it has no significant impact on the recovery of correlations. we implemented intrinsic plasticity by simply adjusting the firing threshold which effectively shifts the neuronal input/output function to keep the model sufficiently general. 
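a threshold-based form of intrinsic plasticity, as described above, can be illustrated with a toy example. the update rule, the rectified-linear rate function and all parameter values below are assumptions chosen for illustration; the paper's actual rule is given in its methods and may differ.

```python
def update_threshold(u_thr_mv, rate_hz, target_hz, dt_s, tau_s, gain_mv_per_hz):
    """One Euler step of an assumed homeostatic rule: the firing threshold
    drops when the rate is below target and rises when it is above."""
    return u_thr_mv + (dt_s / tau_s) * gain_mv_per_hz * (rate_hz - target_hz)

def toy_rate(mean_input_mv, u_thr_mv, slope_hz_per_mv):
    """Toy input/output function (hypothetical): rectified-linear in the
    distance between mean input and threshold."""
    return max(0.0, slope_hz_per_mv * (mean_input_mv - u_thr_mv))

def run_homeostasis(n_steps=1000):
    """Start below the target rate and let the threshold adapt."""
    u_thr = -50.0          # initial threshold (mV), hypothetical
    mean_input = -48.0     # fixed drive (mV), hypothetical
    target = 8.0           # target rate (Hz), hypothetical
    for _ in range(n_steps):
        rate = toy_rate(mean_input, u_thr, 2.0)
        u_thr = update_threshold(u_thr, rate, target, 1.0, 100.0, 1.0)
    return u_thr, toy_rate(mean_input, u_thr, 2.0)

final_threshold, final_rate = run_homeostasis()
```

lowering the threshold shifts the input/output function without touching any synaptic weight, which is why a rule of this kind can restore firing rates while leaving the weight structure, and hence the correlations, to other mechanisms.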
biophysically, intrinsic plasticity can be implemented by changes in the density and function of voltage-gated channels (lemasson et al., ; turrigiano et al., ; desai et al., ) . unlike intrinsic plasticity, synaptic scaling regulates synaptic strengths directly and is crucial for the recovery of correlations and network structure in the model. mechanistically, this regulation is fundamentally distinct from hebbian plasticity. the regulating process involves an enhanced accumulation of ampar in the postsynaptic membrane, which can be mediated by the pro-inflammatory cytokine tumour-necrosis factor-α (tnf-α) produced by glia (stellwagen and malenka, ) , the immediate-early gene arc (shepherd et al., ) , β integrins (cingolani et al., ) and other molecules. crucially, the scaling is bidirectional, global and operates in a multiplicative manner (turrigiano et al., ) , although there is some evidence for dendritic-branch specific scaling in some neocortical cell types (barnes et al., ) . during recovery, multiplicative scaling potentiates synaptic weights within assemblies more than across assemblies in our model, preserving the relative strength of synaptic inputs and enabling the recovery of correlation structure. the distinct functional roles fulfilled by synaptic scaling and intrinsic plasticity apply to the context of the present constellation of plasticity rules. we found that synaptic scaling alone is insufficient to recover the firing rates in our model, especially inhibitory firing rates. however, increasing synaptic strengths also boosts neuronal responses, which raises the possibility that synaptic scaling alone might be able to recover firing rates with a different combination of plasticity rules. 
one straightforward possibility to recover the firing rates of inhibitory neurons is either by increasing the total excitation, for example by upscaling the e-to-i connection, or by decreasing the total inhibition to inhibitory neurons, for example by downscaling the i-to-i connection. interestingly, synaptic scaling onto inhibitory neurons was recently found to organize model recurrent networks around criticality, independently of firing rates (ma et al., ) . this suggests that homeostatic plasticity in excitatory elements might be important for the recovery of firing rates and correlations, while plasticity in inhibitory elements for the recovery of criticality. we highlight that including spiking neurons in our model and training the baseline network with correlated inputs enabled us to study the emergence, dilution and recovery of correlation structure during prolonged md, which is not possible in unstructured randomly connected networks despite the recovery of firing rates. furthermore, our implementation of hebbian and homeostatic plasticity with appropriate biologically motivated timescales suggests a non-trivial cooperation between hebbian and homeostatic plasticity with the first being attenuated while the latter is in full operation. in conclusion, our analysis reveals an important, previously unidentified network feature that is homeostatically regulated during perturbation of normal circuit dynamics in the visual cortex. the finding that not only the average correlations, but also the correlation structure, recover has interesting implications for the recovery of computations in these circuits that might be encoded in non-random connectivity patterns. moreover, our network model with spiking neurons and experimentally characterized homeostatic mechanisms allowed us to dissect the role of each on different aspects of network dynamics, suggesting that different homeostatic mechanisms serve unique, rather than redundant, functions. 
to obtain the normalized firing rate evolution for different animals, the firing rates of each animal were normalized to the average firing rate at p during the light period. note that here the analysis of firing rates was restricted to md because, for the higher-order network feature analysis (the pairwise correlations), the number of available, continuously recorded cells beyond this period was insufficient. therefore, although the firing rates still seem to be above baseline at md (a trend identical to that reported in the previous study; hengen et al., ), they eventually return to baseline by md (hengen et al., ). each spike train was binned into spike counts of bin size ms, generating a vector of spike counts for each cell. the spike count correlation coefficient ρ for a pair of neurons was computed in -minute episodes using a sliding window of minutes. we averaged these values for each pair over every half ( -hour) day, thus computing the correlation coefficient for light and dark conditions separately: $\rho = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y}$, where $x$ and $y$ represent the spike count vectors of two cells, respectively; $\mu_x$ and $\mu_y$ are the means of $x$ and $y$; $\sigma_x$ and $\sigma_y$ denote the standard deviations of $x$ and $y$; and $E$ is the expectation. this produced the matrices of pairwise spike count correlations on different half days. just like the firing rates, to generate the normalized correlation curve across animals, the correlations of each animal were normalized to the average correlation at p during the light period. the correlation matrices in figure a,b were clustered using hierarchical clustering during baseline, and the same neuron order was preserved at later time points. the shuffled matrix a was generated by redistributing the off-diagonal entries of the original correlation matrix while keeping the result symmetric. 
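the binning, pairwise correlation and symmetric shuffling steps described above can be sketched as follows. this is a minimal illustration, not the authors' analysis code: the function names `spike_count_correlations` and `shuffle_symmetric`, the argument layout and any bin size passed in are assumptions made for the example, and the sliding-window averaging over half days is omitted.

```python
import numpy as np

def spike_count_correlations(spike_times, t_start, t_stop, bin_size):
    """Pairwise Pearson correlations of binned spike counts.

    spike_times: list of 1-D arrays of spike times, one per cell.
    bin_size: bin width in the same time unit (value is an assumption
    of this sketch; the source text does not preserve it).
    """
    n_bins = int(round((t_stop - t_start) / bin_size))
    edges = t_start + bin_size * np.arange(n_bins + 1)
    # One row of spike counts per cell.
    counts = np.array([np.histogram(st, bins=edges)[0] for st in spike_times])
    # np.corrcoef treats each row as one variable.
    return np.corrcoef(counts)

def shuffle_symmetric(matrix, rng):
    """Redistribute off-diagonal entries while keeping the matrix symmetric."""
    n = matrix.shape[0]
    upper = np.triu_indices(n, k=1)
    # Permute the upper-triangular values, then mirror them.
    permuted = rng.permutation(matrix[upper])
    out = np.zeros_like(matrix)
    out[upper] = permuted
    out = out + out.T
    # The diagonal is left as in the original matrix.
    out[np.diag_indices(n)] = np.diag(matrix)
    return out
```

shuffling only the upper triangle and mirroring it is one simple way to keep the matrix symmetric; to preserve animal identity as described in the text, the same function would be applied separately to each animal's block.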
the similarity was quantified by computing the absolute difference between the shuffled matrix $A$ and the correlation matrix at baseline $B$: $M = |A - B|$. the elements of the upper triangular part of $M$ were used to form a vector of l distance. vectors from different animals were then concatenated into a single vector. during shuffling, only the elements corresponding to a given animal were shuffled, i.e. animal identity was preserved. single neurons were modeled as leaky integrate-and-fire units, with the membrane potential $u_i$ of neuron $i$ given by (zenke et al., ) : where $\tau_m$ is the membrane time constant and $u_{rest}$ is the resting potential. the neuron elicited a spike when its membrane potential reached the spiking threshold $u_{thr}$. after a spike, the membrane potential was reset to $u_{rest}$. the neuron also had a refractory period $\tau_{ref}$ after a spike. inhibitory neurons followed the same integrate-and-fire formalism, but with a shorter membrane time constant. the values of all neuron model-related parameters are listed in table . the network model consisted of excitatory and inhibitory leaky integrate-and-fire neurons, which were randomly connected with a probability of %. excitatory neurons were randomly grouped into four non-overlapping groups. each excitatory and inhibitory neuron received external excitatory input from neurons firing with poisson statistics at an average firing rate of hz, with synaptic strength $w^{ext \to e}$ and $w^{ext \to i}$, respectively. excitatory synapses have a fast ampa component and a slow nmda component. dynamics of excitatory conductances are given by: here $w_{ij}$ is the synaptic strength from neuron $j$ to neuron $i$. if the connection does not exist, $w_{ij}$ was set to . $S_j(t)$ is the spike train of neuron $j$, defined as $S_j(t) = \sum_k \delta(t - t_j^k)$, where $\delta$ is the dirac delta function and $t_j^k$ are the spike times of neuron $j$. $\alpha$ is a weighting parameter. 
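the threshold, reset and refractory behaviour of the leaky integrate-and-fire neuron described above can be illustrated with a forward-euler sketch. this is a deliberately simplified, current-based version: the model in the text is conductance-based with ampa, nmda and inhibitory conductances, which are replaced here by a single hypothetical `input_drive` term. the function name and all parameter values used with it are assumptions for illustration only.

```python
import numpy as np

def simulate_lif(input_drive, dt, tau_m, u_rest, u_thr, tau_ref):
    """Forward-Euler integration of a simplified LIF neuron.

    input_drive: array with the total drive (in mV) at each time step;
    this stands in for the synaptic conductance terms of the full model.
    Returns the spike times and the membrane potential trace.
    """
    u = u_rest
    refractory_left = 0.0
    spike_times = []
    trace = np.empty(len(input_drive))
    for step, drive in enumerate(input_drive):
        if refractory_left > 0.0:
            # Hold the neuron at rest during the refractory period.
            refractory_left -= dt
            u = u_rest
        else:
            # Euler step of tau_m * du/dt = (u_rest - u) + drive.
            u += dt / tau_m * ((u_rest - u) + drive)
            if u >= u_thr:
                spike_times.append(step * dt)
                u = u_rest  # reset after a spike
                refractory_left = tau_ref
        trace[step] = u
    return np.array(spike_times), trace
```

with a constant drive whose steady state lies above `u_thr` the neuron fires regularly, and with a weaker drive it stays silent, which is a quick sanity check before adding conductances or plasticity.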
dynamics of inhibitory conductances are given by: the values of all network-related parameters are listed in table . we implemented the network model in three stages: an initialization stage, a training stage and an md stage. all plasticity except for excitatory-to-excitatory plasticity was present in the first seconds of the simulation to initialize the network and obtain network activity before training. subsequently, the training process started. during training, correlated stimuli were presented sequentially for second, with second gaps in between stimulus activations. while correlated stimuli were presented to one assembly, the remaining neurons received inputs from independent neurons firing with poisson statistics at an average firing rate of hz. the firing rate of the correlated inputs was also hz. correlated inputs for the training were generated following previous studies (brette, ; gjorgjieva et al., ) . specifically, we used a copying probability . from individual uncorrelated poisson source trains and copying probability . from a common poisson source, all with the same firing rates. the weight matrix obtained after training was used to induce md in the simulations. md simulations started with seconds without plasticity when inhibitory stdp was activated, while other plasticity and homeostatic mechanisms were activated at seconds. at the same time, the feedforward connections onto excitatory and inhibitory neurons linearly decreased by % and % from seconds to seconds. from seconds onwards, feedforward connections were kept fixed. to form the clustered correlation structure as observed experimentally, we followed previous modeling studies (litwin-kumar and doiron, ) and modeled the plasticity of excitatory-to-excitatory synapses using triplet stdp (pfister and gerstner, ), the plasticity of inhibitory-to-excitatory synapses using inhibitory stdp (vogels et al., ), and heterosynaptic plasticity operating on excitatory-to-excitatory synapses. 
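the generation of correlated training inputs by copying spikes from a common and from individual poisson sources, as described above, can be sketched in discrete time as follows. this is an illustrative approximation, not the authors' implementation: the function name, the bernoulli-per-bin approximation of poisson firing, and the choice of copying from the common source with probability `p_common` and from the individual source with probability `1 - p_common` are assumptions of the example (the actual copying probabilities are not preserved in the text).

```python
import numpy as np

def correlated_poisson_trains(n_trains, rate, duration, dt, p_common, rng):
    """Boolean spike rasters with shared spikes copied from a common source.

    rate: firing rate in Hz; duration and dt in seconds.
    p_common: probability of copying each common-source spike
    (hypothetical parameter of this sketch).
    """
    n_steps = int(round(duration / dt))
    p_spike = rate * dt  # per-bin spike probability, valid for rate * dt << 1
    common = rng.random(n_steps) < p_spike
    individual = rng.random((n_trains, n_steps)) < p_spike
    # Each train keeps a random subset of common and of its own spikes.
    copy_common = rng.random((n_trains, n_steps)) < p_common
    copy_individual = rng.random((n_trains, n_steps)) < (1.0 - p_common)
    return (common[None, :] & copy_common) | (individual & copy_individual)
```

because the two copying probabilities sum to one, each output train keeps approximately the source firing rate, while a larger `p_common` yields stronger pairwise correlations.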
we used the triplet stdp rule, which describes synaptic plasticity based on triplets of spikes and captures experiments where the rate of pre- and postsynaptic neurons varies (sjöström et al., ) . the triplet stdp rule enables the formation of bi-directional connections, a necessity for the formation of clustered architectures (pfister and gerstner, ; gjorgjieva et al., ; ocker and doiron, ) . according to triplet stdp, the synaptic strength from excitatory neuron $j$ to excitatory neuron $i$ evolves as $\frac{dw_{ij}}{dt} = A^+ z_j^+(t)\, z_i^{slow}(t - \epsilon)\, S_i(t) - A_i^- z_i^-(t)\, S_j(t)$. here, $A^-$ and $A^+$ are the amplitudes of the weight change induced by a post-pre pair or a post-pre-post triplet of spikes, and $\epsilon$ is a small positive constant. the synaptic traces $z_n^+(t)$, $z_n^-(t)$ and $z_n^{slow}(t)$ evolved according to $\frac{dz_n^x}{dt} = -\frac{z_n^x}{\tau_x} + S_n(t)$ with different time constants $\tau_x$. unlike excitatory-to-excitatory plasticity, there is significantly less experimental evidence for the form of learning rules operating at other types of synapses. therefore, inhibitory-to-excitatory plasticity, known as inhibitory stdp, was proposed to counteract the self-feedback loop that potentiates or depresses synaptic strength arising from excitatory-to-excitatory plasticity (vogels et al., ; d'amour and froemke, ) . inhibitory synapses on excitatory neurons are governed by: where $x_i$ and $x_j$ are the synaptic traces of the postsynaptic excitatory and presynaptic inhibitory neuron, which are described by $\frac{dx_i}{dt} = -\frac{x_i}{\tau_{istdp}} + S_i(t)$, with $r_i$, $\tau_{istdp}$ and $\eta$ denoting the target firing rate of neuron $i$, the time constant of the synaptic trace and the learning rate of istdp, respectively. excitatory-to-inhibitory connections and inhibitory-to-inhibitory connections were non-plastic to reduce the number of parameters in the model. all plastic weights were subject to upper bounds. we also modeled a second form of normalization in the form of heterosynaptic plasticity, which ensures that the sum of all incoming excitatory synaptic weights at each postsynaptic excitatory neuron is kept below a target (fiete et al., ) . 
this form of normalization has been found to be essential in maintaining clustered structures upon their formation (litwin-kumar and doiron, ). the evolution of synaptic strength from excitatory neuron j to excitatory neuron i via heterosynaptic plasticity follows: where n e i is the number of nonzero elements. as heterosynaptic plasticity also imposed a constraint on the excitatory-to-excitatory synaptic weight, β was set to . so that w ij becomes approximately w ee max . heterosynaptic plasticity was implemented every s, and only acting when the j w ee ij (t) was larger than β * j w ee ij ( ). the amplitude of ltd for neuron i, a − i follows here, x est i denotes the firing rate estimator defined as with τ est being the integration time constant of x est i . if the firing rate of a neuron was close to its target, r i , then x est i τ est r i ≈ . metaplasticity was implemented every s. furthermore, a − i was bounded below by % of its initial value to reduce the effect of hebbian plasticity during the homeostatic recovery phase as shown previously (toyoizumi et al., ) . the evolution of synapse strength from excitatory neuron j to excitatory neuron i via synaptic scaling is given by: where τ ss represents the time constant of synaptic scaling. the firing threshold of neuron i regulated by intrinsic plasticity is given by: where η ip is the learning rate of intrinsic plasticity. initial firing threshold was set to - mv. the values of all plasticity-related parameters are listed in table . data analysis and numerical simulations were performed in python and julia. all differential equations were implemented by euler integration with a time step of . ms. figure s : change in correlations of individual hemispheres. a. correlation differences relative to baseline of five control hemispheres. same as a but for five deprived hemispheres. note that for deprived hemispheres and , correlations at md are used for the slope analysis (see fig. e , bottom). 
average weight change within assemblies induced by the different synaptic plasticity and homeostatic mechanisms. figure s : average e-to-e weight without synaptic scaling. average excitatory-to-excitatory weights for each assembly and across assemblies without synaptic scaling. the vertical dashed line indicates the onset of md. synaptic plasticity: taming the beast deprivation-induced homeostatic spine scaling in vivo is localized to dendritic branches that have undergone recent spine loss subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo generation of correlated spike trains enhanced intrinsic excitability in basket cells maintains excitatory-inhibitory balance in hippocampal circuits activity-dependent regulation of synaptic ampa receptor composition and abundance by β integrins functional organization of excitatory synaptic strength in primary visual cortex deprivation-induced synaptic depression by distinct mechanisms in different layers of mouse visual cortex synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex long-term plasticity of intrinsic excitability: learning rules and mechanisms plasticity in the intrinsic excitability of cortical pyramidal neurons inhibitory and excitatory spike-timing-dependent plasticity in the auditory cortex plk attachment to nsf induces homeostatic removal of glua during chronic overexcitation spike-time-dependent plasticity and heterosynaptic competition organize networks to produce long scale-free sequences of neural activity how monocular deprivation shifts ocular dominance in visual cortex of young mice a triplet spike-timing-dependent plasticity model generalizes the bienenstock-cooper-munro rule to higher-order spatiotemporal correlations activity-dependent relocation of the axon initial segment fine-tunes neuronal excitability firing rate homeostasis in visual cortex of freely behaving rodents neuronal firing rate homeostasis is inhibited by sleep and 
promoted by wake thalamocortical innervation pattern in mouse auditory and visual cortex: laminar and cell-type specificity all for one but not one for all: excitatory synaptic scaling and intrinsic excitability are coregulated by camkiv, whereas inhibitory synaptic scaling is under independent control trkb kinase is required for recovery, but not loss, of cortical responses following monocular deprivation synaptic scaling and homeostatic plasticity in the mouse visual cortex in vivo functional specificity of local synaptic connections in neocortical networks synaptic and intrinsic homeostatic mechanisms cooperate to increase l / pyramidal neuron excitability during a late phase of critical period plasticity activity-dependent regulation of conductances in model neurons critical-period plasticity in the visual cortex thalamic activity that drives visual cortical plasticity formation and maintenance of neuronal assemblies through synaptic plasticity critical dynamics are a homeostatic set point of cortical networks in vivo. biorxiv potentiation of cortical inhibition by visual deprivation sensory experience inversely regulates feedforward and feedback excitation-inhibition ratio in rodent visual cortex training and spontaneous reinforcement of neuronal assemblies by spike timing plasticity rapid and active stabilization of visual cortical firing rates across light-dark transitions a synaptic organizing principle for cortical neuronal groups triplets of spikes in a model of spike timing-dependent plasticity cell-specific restoration of stimulus preference after monocular deprivation in the visual cortex critical role of cdk and polo-like kinase in homeostatic synaptic plasticity during elevated activity arc/arg . 
mediates homeostatic synaptic scaling of ampa receptors rate, timing, and cooperativity jointly determine cortical synaptic plasticity bidirectional synaptic mechanisms of ocular dominance plasticity in visual cortex metaplasticity at ca synapses by homeostatic control of presynaptic release dynamics synaptic scaling mediated by glial tnf-α modeling the dynamic interaction of hebbian and homeostatic plasticity too many cooks? intrinsic and synaptic homeostatic mechanisms in cortical circuit refinement selective regulation of current densities underlies spontaneous changes in the activity of cultured neurons activitydependent scaling of quantal amplitude in neocortical neurons homeostatic plasticity in the developing nervous system the threshold for conscious report: signal loss and response bias in visual and frontal cortex signal propagation and logic gating in networks of integrateand-fire neurons inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks excitatory cortical neurons form finescale functional networks synaptic plasticity in neural networks needs homeostasis with a fast rate detector we thank all members of the gjorgjieva lab for comments and discussions. this work was supported by the max planck society (yw, jg), the european research council (stg to jg), ro ey and r ns (ggt), and r ns (kh). yw and jg designed the research and developed the model. kh and ggt provided experimental data and contributed to the interpretation of the data analysis. yw analyzed the data with input from all authors. yw performed and analyzed the simulations together with jg. yw and jg prepared the figures and wrote the manuscript with input from ggt. 
key: cord- -my n vye authors: valle-casuso, josé carlos; gaudaire, delphine; martin-faivre, lydie; madeline, anthony; dallemagne, patrick; pronost, stéphane; munier-lehmann, hélène; zientara, stephan; vidalain, pierre-olivier; hans, aymeric title: replication of equine arteritis virus is efficiently suppressed by purine and pyrimidine biosynthesis inhibitors date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: my n vye rna viruses are responsible for a large variety of animal infections. equine arteritis virus (eav) is a positive single-stranded rna virus member of the family arteriviridae from the order nidovirales like the coronaviridae. eav causes respiratory and reproductive diseases in equids. although two vaccines are available, the vaccination coverage of the equine population is largely insufficient to prevent new eav outbreaks around the world. in this study, we present a high-throughput in vitro assay suitable for testing candidate antiviral molecules on equine dermal cells infected by eav. using this assay, we identified three molecules that impair eav infection in equine cells: the broad-spectrum antiviral and nucleoside analog ribavirin, and two compounds previously described as inhibitors of dihydroorotate dehydrogenase (dhodh), the fourth enzyme of the pyrimidine biosynthesis pathway. these molecules effectively suppressed cytopathic effects associated to eav infection, and strongly inhibited viral replication and production of infectious particles. since ribavirin is already approved in human and small animal, and that several dhodh inhibitors are in advanced clinical trials, our results open new perspectives for the management of eav outbreaks. 
viruses with an rna genome infect both animals and plants, and dominate the eukaryotic virome by their diversity and deleterious effects. previous work exploring eav inhibitors has proposed using morpholino or peptide-conjugated morpholino oligomers specifically designed against the eav genome as a therapy to impair viral replication . however, this promising therapy is not easily applicable for use in vivo in large animals such as horses. although two effective eav vaccines have been available since , the proportion of horses vaccinated against eav in the equine population remains insufficient to control the disease . we think that antiviral molecules would be an ideal complement to vaccination, helping to reduce viral spreading in equine structures and contain eav outbreaks. this is supported by a recent study showing that a global strategy based on vaccination combined with drug therapies enhances protection against foot-and-mouth disease . in our study, we have developed a high-throughput cell-viability assay based on infected equine dermal cells. then, we took advantage of this new assay to explore the potential of broad-spectrum antivirals ( replication through multiple direct and indirect mechanisms. in particular, ribavirin inhibits inosine '-monophosphate dehydrogenase (impdh), a key enzyme catalyzing the first committed and rate-limiting step in the de novo synthesis of guanine nucleotides. as such, impdh inhibition interferes with the production of gtp that is necessary for the synthesis of viral rna molecules , but other mechanisms have been reported, such as the direct inhibition of viral polymerases or the lethal mutagenesis of viral genomes , . besides ribavirin, we also tested two newly designed bsa that target the pyrimidine biosynthesis pathway, namely gac and ippa -a . 
these two molecules are inhibitors of dihydroorotate dehydrogenase (dhodh), the fourth and rate-limiting enzyme of the de novo pyrimidine biosynthesis pathway, and were shown to impair the replication of many different positive-strand and negative-strand rna viruses [ ] [ ] [ ] . in this study, we have shown, for the first time, eav is a cytopathic virus. in this study, we took advantage of this specificity to develop a microplate assay for the identification of eav inhibitors. as a cellular model, we selected the ed cell line that originates from equine dermis and thus matches the host specificity of eav. to set up our in vitro assay, we first analyzed the proliferation of non-infected ed cells in -well plates. to determine the number of viable cells in each culture well, we quantified atp, which parallels the number of metabolically active, living cells. assays were performed using a commercial reagent based on the principle that firefly luciferase luminescence is proportional to, and dependent on, atp concentrations (celltiter-glo luminescent cell viability assay; promega). in non-infected cells, atp levels increased by % in the first h as a consequence of cellular proliferation, but then plateaued at h as cell cultures reached confluence ( figure a ). the same experiment was performed using cell cultures infected with the bucyrus strain of eav at a multiplicity of infection (moi) of . . as shown in figure a previous studies on broad-spectrum antiviral molecules showed that ribavirin inhibits the replication of different nidoviruses, including arteriviridae such as porcine reproductive and respiratory syndrome virus (prrsv) . we thus tested the antiviral effects of ribavirin on eav using the cell-viability assay described above. as a prerequisite, we first evaluated the cytotoxicity of ribavirin on ed cells. cells were incubated with ribavirin at different concentrations ( . 
, , , , , , , and µg/ml), and the number of viable cells was determined after h without treatment or after h with or without treatment. no significant cell death was observed at µg/ml or lower concentrations of ribavirin when compared to the initial number of viable cells seeded in the culture wells ( figure a ). however, ribavirin was clearly cytostatic at these concentrations. cell viability at µg/ml was %, and decreased to % and % when treating cells with µg/ml and µg/ml of ribavirin, respectively ( figure a ). the half maximal toxic concentration (tc ), which is the concentration that kills % of the cells in culture, was estimated to be > µg/ml for ribavirin. then, we tested the effect of ribavirin in our in vitro eav infection model at the non-cytotoxic concentration range of . to µg/ml. as shown in figure a , atp quantification in infected culture wells at h showed that only % of cells were alive in the absence of ribavirin as opposed to % in the presence of the drug at . µg/ml. the viability of infected cells was above % when treated with ribavirin at , or µg/ml ( figure a ). these results were used to calculate the half maximal inhibitory concentration (ic ) of ribavirin, which was estimated at . µg/ml, i.e. . µm. this demonstrates that ribavirin is able to protect ed cells against eav-associated cytopathic effects. we were also interested in exploring the antiviral capacity of two new bsa, ippa - and gac , that impair viral replication through inhibition of de novo pyrimidine biosynthesis in host cells. using our in vitro eav infection assay, we first tested the cytotoxicity of both compounds. ippa -a and gac- treatment did not show any sign of cytotoxicity on ed cells. at µg/ml, cell viability was above % after h of treatment, but cytostatic effects were observed, as expected for this class of drug ( figure b and c ). these results indicate that tc for ippa -a and gac- is over µg/ml in ed cells. 
we then explored the cytoprotective effect of these two compounds in eav-infected cultures at different concentrations. our results showed that after h of culture, the viability of infected ed cells reached % when treated with µg/ml of ippa -a , and was above % when treated with concentrations > µg/ml ( figure b ). the ic of ippa -a was estimated at . µg/ml, i.e. . µm. gac is not as effective since viability of ed infected cells treated with gac is > % only at the highest concentrations of or µg/ml ( figure c ) with an ic at . µg/ml, i.e. . µm. these results show that cell cultures treated with pyrimidine biosynthesis inhibitors did not exhibit cytopathic effects associated to eav infection, suggesting that eav replication is also impaired. to verify that ribavirin, ippa -a and gac actually block eav replication, viral genome copies were quantified at h.p.i. culture supernatants of infected ed cells treated or not with the compounds at different concentrations were analyzed by rt-qpcr. results showed that ribavirin at and µg/ml significantly reduced the number of eav genome copies in culture supernatants compared to the untreated infected cells ( figure a ). eav replication was totally blocked at µg/ml as determined by comparison with the initial inoculum quantified at h.p.i. these results confirmed that ribavirin impairs eav replication in ed cells, thus explaining the positive effect on cell survival. in parallel, we also measured viral genome copies present in the supernatant of infected cells treated with pyrimidine biosynthesis inhibitors. as expected, ippa -a and gac both reduced the number of viral genome copies in culture supernatants when treated with concentrations > µg/ml and fully blocked eav replication at µg/ml ( figure b and c) . 
to rank the three molecules that we characterized as eav inhibitors in this study, we calculated for each molecule the selectivity index (si) that corresponds to the tc /ic ratio, and thus reflects the activity of a drug while taking into account cytotoxic effects. tc and ic values were determined from the dose response experiments presented in figures and . si values for ribavirin, ippa -a and gac were > , > and > respectively. in conclusion, this confirms that ippa -a is, over ribavirin and gac , a lead molecule of interest for developing potential antiviral therapies against eav. equine viral arteritis: a respiratory and reproductive disease of significant economic importance to the equine industry. equine veterinary education curing of hela cells persistently infected with equine arteritis virus by a peptide-conjugated morpholino oligomer serological evidence of equine arteritis virus infection and phylogenetic analysis of viral isolates in semen of stallions from serbia synergistic effect of ribavirin and vaccine for protection during early infection stage of foot-and-mouth disease primary resistance of hepatitis b virus to nucleoside and nucleotide analogues resistance to nucleotide analogue inhibitors of hepatitis c virus ns b: mechanisms and clinical relevance. current opinion in virology resistance to the nucleotide analogue cidofovir in hpv(+) cells: a multifactorial process involving ump/cmp kinase ribavirin efficiently suppresses porcine nidovirus replication ribavirin quantification in combination treatment of chronic hepatitis c. ribavirin and boceprevir are able to reduce canine distemper virus growth in vitro oseltamivir-ribavirin combination therapy for highly pathogenic h n influenza virus infection in mice toxicologic effects of ribavirin in cats ribavirin as an inhibitor of nitric oxide synthesis is broadened by macromolecular prodrugs. 
effectiveness of the ribavirin in treatment of hantavirus infections in the americas and eurasia: a meta-analysis emerging and neglected infectious diseases: insights, advances, and challenges the predominant mechanism by which ribavirin exerts its antiviral activity in vitro against flaviviruses and paramyxoviruses is mediated by inhibition of imp dehydrogenase in vivo evidence for ribavirin-induced mutagenesis of the hepatitis e virus genome ribavirin and lethal mutagenesis of poliovirus: molecular mechanisms, resistance and biological implications original -( -alkoxy- h-pyrazol- -yl)azines inhibitors of human dihydroorotate dehydrogenase (dhodh) inhibition of pyrimidine biosynthesis pathway suppresses viral growth through innate immunity respiratory syncytial virus infection in macaques is not suppressed by intranasal sprays of pyrimidine biosynthesis inhibitors effects of ribavirin on the replication and genetic stability of porcine reproductive and respiratory syndrome virus antiviral activity of k against members of the order nidovirales development and use of a polarized equine upper respiratory tract mucosal explant system to study the early phase of pathogenesis of a european strain of equine arteritis virus species-related inhibition of human and rat dihydroorotate dehydrogenase by immunosuppressive isoxazol and cinchoninic acid derivatives the establishment of an antiviral state by pyrimidine synthesis inhibitor is cell type-specific treatment with interferon-alpha b and ribavirin improves outcome in mers-cov-infected rhesus macaques broad-spectrum inhibition of common respiratory rna viruses by a pyrimidine synthesis inhibitor with involvement of the host antiviral response original chemical series of pyrimidine biosynthesis inhibitors that boost the antiviral interferon response therapeutic options for the novel coronavirus ( -ncov) we would like to thank yves janin from the institut pasteur, paris, france and daniel dauzonne from cnrs, umr , inserm, u 
, institut curie, centre de recherche, paris, france who kindly provided ippa-a and gac- , respectively. we are also very grateful to fanny lecouturier and gabrielle bouet for their technical assistance. key: cord- -m cg lz authors: schloer, sebastian; brunotte, linda; goretzko, jonas; mecate-zambrano, angeles; korthals, nadia; gerke, volker; ludwig, stephan; rescher, ursula title: targeting the endolysosomal host-sars-cov- interface by clinically licensed functional inhibitors of acid sphingomyelinase (fiasma) including the antidepressant fluoxetine date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: m cg lz the corona virus disease (covid- ) pandemic caused by the severe acute respiratory syndrome related coronavirus (sars-cov- ) is a global health emergency. as only very limited therapeutic options are clinically available, there is an urgent need for the rapid development of safe, effective, and globally available pharmaceuticals that inhibit sars-cov- entry and ameliorate covid- . in this study, we explored the use of small compounds acting on the homeostasis of the endolysosomal host-pathogen interface, to fight sars-cov- infection. we find that fluoxetine, a widely used antidepressant and a functional inhibitor of acid sphingomyelinase (fiasma), efficiently inhibited the entry and propagation of sars-cov- in the cell culture model without cytotoxic effects and also exerted potent antiviral activity against two currently circulating influenza a virus subtypes, an effect which was also observed upon treatment with the fiasmas amiodarone and imipramine. mechanistically, fluoxetine induced both impaired endolysosomal acidification and the accumulation of cholesterol within the endosomes. 
as the fiasma group consists of a large number of small compounds that are well-tolerated and widely used for a broad range of clinical applications, exploring these licensed pharmaceuticals may offer a variety of promising antivirals for host-directed therapy to counteract enveloped viruses, including sars-cov- and covid . the current outbreak of the severe acute respiratory syndrome related coronavirus (sars-cov- ) and the resulting corona virus disease (covid- ) pandemic clearly added to the cells for h. the supernatant was aspirated, dmso was added to the cells for min, and subsequently, the od was measured . filipin staining and colocalization analysis. cells the secondary alexafluor -coupled anti-mouse antibody was from thermo fisher limited. 
confocal microscopy was performed with an lsm microscope (carlzeiss, jena, germany) equipped with a plan-apochro-mat x/ . oil immersion objective. the colocalization of filipin and cd signals was analyzed using the jacop plugin for fiji . endosomal ph measurement. ratiometric fluorescence microscopy and calculation of endosomal ph were done as described . in brief, cells were exposed to oregon green (og )-labeled and tetramethylrhodamine (tmr)-labeled dextran (invitrogen) ( kda) for min, followed by a min chase. cells were then washed and kept in hepes-buffered hanks' balanced salt solution (hbss) at °c during image acquisition. the individual epifluorescence signals of each dye were acquired at intervals of s, and the endosomal ph values of the cells were calculated according to the calibration curve generated by applying standard solutions ranging from ph . to . to cells. were evaluated using one-way anova followed by dunnett's multiple comparison test. **p< . , ***p< . , ****p ≤ . . results building on our previous work on the lel cholesterol balance as a promising therapeutic target cov- titers up to % in both cell lines (fig. b) . reproduced in the polarized cell model. of note, u a treatment was also antiviral, with a lower concentration of µg/ml reducing the viral titers to %, and the higher µg/ml dose displaying a more pronounced antiviral effect of a % reduction. because unwanted cytotoxic effects are one of the main causes for the withdrawal of approved drugs, we measured the impact of u a and fluoxetine treatments on calu- and vero cell viability. as shown in suppl. fig. , both drug treatments were well tolerated, and cell viability was unaffected over a key feature of the late endosomal compartment is the acidic ph. we, therefore, assessed whether fluoxetine treatment impacted endolysosomal ph values in calu- cells, utilizing a quantitative ratiometric fluorescence microscopy assay , . in control cells, the measured ph was at a value of . 
, whereas cells exposed to µg/ml u a displayed a significantly reduced acidification, well in agreement with our previously reported data of the impact of higher u a concentrations on ph values in lels . of note, the endolysosomal acidification was also affected upon fluoxetine treatment, and this was already observed in cells treated with the low µm concentration ( figure c) . of note, even a low µm fluoxetine treatment could reduce the number of visibly infected vero cells to %, and the higher dose of µm reduced the levels of infected cells to %, indicating a % inhibition of cells with detectable sars-cov- infection. in contrast to the dose- response assays, this assay does not discriminate between different signal strengths, and ec values are, therefore, not directly comparable. in this study, we explored the use of small compounds that target the endolysosomal host- approved, generally well-tolerated, and widely used in human medicine for the treatment of a broad spectrum of pathological conditions . the fiasma fluoxetine, trade-named prozac, is a selective serotonin reuptake inhibitor that boomed in the s and s in the us and is commonly used to treat major depression and related disorders. our results show that fluoxetine treatment was capable of inhibiting sars-cov- infection in a dose-dependent manner, with an ec value below µm, and that the application of µm fluoxetine severely reduced viral titers up to %. fluoxetine-mediated ph neutralization was already seen at a low dose, whereas enhanced endolysosomal cholesterol pools were only visible when a higher dose was used. our results support the hypothesis that although there is quite some variation in the actual escape mechanisms , targeting the viral entry might serve as a target for antiviral therapy . 
the intricate regulatory circuits that underlie endolysosomal lipid balance and functionality are key elements functioning at the endolysosomal host-virus interface and are promising druggable targets for a wide variety of viruses, and targeting them might be a fast and versatile approach to fight a broad range of pathogens with functionally similar modes of action. because of the essential need for the host cell components, the infection cycle would have to be drastically altered to circumvent such host-directed therapeutics, and this approach is therefore considered much less likely to cause the development of resistance. the large variety of fiasma pharmaceuticals offers a toolbox of potential antivirals for host-directed therapy, and exploring their use, including their combination with drugs that directly target viral enzymes, might constitute a promising approach to repurpose these drugs as antivirals to counteract sars-cov- and covid . calu- cells grown on semipermeable supports were infected with sars-cov- isolate at . moi for h. cells were treated h p.i. with µm fluoxetine, and or µg/ml u a. bar graphs represent the mean viral titers ± sem of three independent experiments. one-way anova followed by dunnett's multiple comparison test. **p ≤ . , ***p ≤ . . 
estimating clinical severity of covid- from the transmission dynamics in wuhan, china; association and membrane localization; influenza a virus entry by blocking the formation of fusion pores following virus-endosome hemifusion; the v-type h+-atpase in vesicular trafficking: targeting, regulation and function; niemann-pick disease type c is a sphingosine storage disease that causes deregulation of lysosomal calcium; direct perturbation of lens membrane structure may contribute to cataracts caused by u a, an oxidosqualene cyclase inhibitor; targeting viral entry as a strategy for broad-spectrum antivirals; rapid colorimetric assay for cellular growth and survival: application to proliferation and cytotoxicity assays; a guided tour into subcellular colocalization analysis in light microscopy; an open-source platform for biological-image analysis; power : a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. the authors declare no competing interests.
key: cord- -hfd ur a authors: frost, h. robert title: variance-adjusted mahalanobis (vam): a fast and accurate method for cell-specific gene set scoring date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hfd ur a single cell rna sequencing (scrna-seq) is a powerful tool for analyzing complex tissues with recent advances enabling the transcriptomic profiling of thousands to tens-of-thousands of individual cells. although scrna-seq provides unprecedented insights into the biology of heterogeneous cell populations, analyzing such data on a gene-by-gene basis is challenging due to the large number of tested hypotheses, high level of technical noise and inflated zero counts. one promising approach for addressing these challenges is gene set testing, or pathway analysis. by combining the expression data for all genes in a pathway, gene set testing can mitigate the impacts of sparsity and noise and improve interpretation, replication and statistical power. 
unfortunately, statistical and biological differences between single cell and bulk expression measurements make it challenging to use gene set testing methods originally developed for bulk tissue on scrna-seq data, and progress on single cell-specific methods has been limited. to address this challenge, we have developed a new gene set testing method, variance-adjusted mahalanobis (vam), that seamlessly integrates with the seurat framework and is designed to accommodate the technical noise, sparsity and large sample sizes characteristic of scrna-seq data. the vam method computes cell-specific pathway scores to transform a cell-by-gene matrix into a cell-by-pathway matrix that can be used for both exploratory data visualization and statistical gene set enrichment analysis. because the distribution of these scores under the null of uncorrelated technical noise has an accurate gamma approximation, inference can be performed at both the population and single cell levels. as we demonstrate using both simulation studies and real data analyses, the vam method provides superior classification accuracy at a lower computational cost relative to existing single sample gene set testing approaches. despite the diversity of cell types and states present in multicellular tissues, high-throughput genome-wide profiling has, until recently, been limited to assays performed on bulk tissue samples. for bulk tissue assays, the measured values reflect the average across a large number of cells and, when significant heterogeneity exists, only approximate the true biological state of the tissue. 
to address the shortcomings of bulk tissue analysis, researchers have developed a range of techniques for the genome-wide profiling of individual cells [ , ] with single cell rna sequencing (scrna-seq) [ ] generating particular scientific interest due to the rapid development of the underlying laboratory techniques, which can now cost-effectively quantify genome-wide transcript abundance for thousands to tens-of-thousands of cells. single cell genomic assays, in combination with techniques that infer transcription rates [ ] , spatial information [ ] or temporal dynamics [ , ] , provide scientists with a detailed picture of cellular biology. such cell-level genomic resolution is especially important for the study of tissues whose structure and function is defined by complex interactions between multiple distinct cell types that can occupy a range of phenotypic states, e.g., the tumor microenvironment [ , ] , immune cells [ , ] , and the brain [ ] . important scientific questions that can be addressed by single cell transcriptomics include the identification and characterization of the cell types present within a tissue [ , ] , the discovery of novel cell subtypes [ ] , the analysis of dynamic processes such as differentiation [ ] , or the cell cycle [ ] , and the reconstruction of the spatial distribution of cells within a tissue [ ] . although single cell data provides unprecedented insights into the structure and function of complex tissues and cell populations, technical and biological limitations make statistical analysis challenging [ ] . single cell methods analyze very small amounts of genomic material, leading to significant amplification bias and inflated zero counts relative to bulk tissue assays [ ] . single cell-specific approaches for quality control, normalization and statistical analysis (e.g., zero-inflated models) only partially address these challenges [ , ] . 
in addition to the challenges of increased noise and missing data, important biological differences exist between bulk tissue and single cell data. as the average over a large number of cells, bulk tissue measurements are typically unimodal and, in many cases, approximately normally distributed. in contrast, single cell data sets reflect a heterogeneous mixture of cell types and cell states resulting in multi-modal and non-normal distributions [ ] . the diverse mixture of cell types and states found in complex tissues also leads to significant differences in gene expression patterns between bulk tissue and single cell data. as evidenced by projects such as the human protein atlas (hpa) [ ] , gene activity measured on bulk tissue samples can differ substantially from the activity occurring within the cell subpopulations comprising the tissue. figure provides a simplified illustration of the marginal and joint distribution characteristics of single cell and bulk tissue gene expression data. in this figure, the marginal distribution is represented by density plots for a single gene while the joint distribution is represented by covariance matrices. collectively, the distributional differences between single cell and bulk tissue genomic data make it challenging to successfully analyze single cell expression data using methods originally developed for bulk tissue, which assume non-sparse data and lower levels of technical noise. although high-dimensional genomic data provides a molecular-level lens on biological systems, the gain in fidelity obtained by testing thousands of genomic variables comes at the price of impaired interpretation, loss of power due to multiple hypothesis correction and poor reproducibility [ ] [ ] [ ] [ ] . to help address these challenges for bulk tissue data, researchers developed gene set testing, or pathway analysis, methods [ , ] . 
gene set testing is an effective hypothesis aggregation technique that lets researchers step back from the level of individual genomic variables and explore associations for biologically meaningful groups of genes. by focusing the analysis on a small number of functional gene sets, gene set testing can substantially improve power, interpretation and replication relative to an analysis focused on individual genomic variables [ ] [ ] [ ] [ ] . the benefits that gene set-based hypothesis aggregation offers for the analysis of bulk tissue data are even more pronounced for single cell data given increased technical variance and inflated zero counts. gene set testing methods can be categorized according to whether they support supervised or unsupervised analyses (i.e., test for association with a specific clinical endpoint or test for enrichment in the variance structure of the data), whether they provide results for each sample or for an entire population, whether they test a self-contained or competitive null hypothesis (i.e., the h that none of the genes in the set has an association with the outcome or the h that the genes in the set are not more associated with the outcome than genes not in the set) and whether they test each gene set separately (uniset) or jointly evaluate all sets in a collection (multiset). in this paper, we focus on single sample gene set testing methods, i.e., those that compute a cell-specific statistic for each analyzed gene set to transform a cell-by-gene scrna-seq matrix into a cell-by-pathway matrix. this class of techniques is of particular interest because the cell-level pathway scores can be leveraged for both exploratory data visualization, e.g., shading of cells in a reduced dimensional plot according to inferred pathway activity, as well as the full range of population-level statistical gene set tests, i.e., supervised or unsupervised tests of either the uniset or multiset flavor. 
existing single sample gene set testing methods can be grouped into three general categories: random walk methods, principal component analysis (pca)-based methods and z-scoring methods. random walk methods (e.g., gsva [ ] and ssgsea [ ] ) generate sample-level pathway scores using a kolmogorov-smirnov (ks) like random walk statistic computed on the gene ranks within each sample, often following some form of gene standardization across the samples. pca-based methods (e.g., pagoda [ ] and plage [ ] ) perform a pca on the expression data for each pathway and use the projection of each sample onto the first pc as a sample-level pathway score. z-scoring methods (e.g., technique of lee et al. [ ] , scsva [ ] , and vision [ ] ) generate pathway scores based on the standardized mean expression of pathway genes within each sample. while these methods have proven effective for the analysis of bulk expression data, with gsva and ssgsea among the most popular techniques, the application of these methods to scrna-seq data is limited by three main factors: poor classification performance in the presence of sparsity and technical noise, lack of inference support on the single cell level, and high computational cost (esp. for the random walk methods when the number of samples/cells is large). gsva, ssgsea, plage and the lee et al. z-scoring methods were all developed for the analysis of bulk gene expression data and were therefore optimized for, and evaluated on, non-sparse data with moderate levels of technical noise. although scsva and vision are both targeted at single cell expression data, they are methodologically similar to the lee et al. z-scoring technique and make no special provision for sparsity or elevated noise. 
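as a rough illustration of the z-scoring category described above, the following sketch standardizes each gene across cells and then averages the standardized values of the set members within each cell. it is a simplified stand-in written for this explanation, not the implementation of lee et al., scsva or vision; the function name `zscore_set_scores` and the list-of-rows data layout are assumptions.

```python
from statistics import mean, pstdev


def zscore_set_scores(x, gene_idx):
    """Average per-gene z-scores of the set members within each cell.

    x        : list of rows, one row of expression values per cell
    gene_idx : column indices of the genes that belong to the gene set
    """
    n_cells = len(x)
    scores = [0.0] * n_cells
    used = 0
    for j in gene_idx:
        column = [row[j] for row in x]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # a constant gene carries no information about the set
        used += 1
        for i in range(n_cells):
            scores[i] += (column[i] - mu) / sd
    # Average over the genes that could be standardized
    return [s / used if used else 0.0 for s in scores]
```

because every gene is weighted equally after standardization, a score of this kind makes no special provision for sparsity or for genes dominated by technical noise, which is the limitation noted in the text.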
as we demonstrate through simulation studies later in the manuscript, these methods all have poor classification performance relative to the vam technique on sparse and noisy data, i.e., they are not able to effectively identify cells whose transcriptomic profile is enriched for specific pathways. in contrast to the other existing single sample methods, pagoda was designed for single cell analysis and specifically addresses the scrna-seq features of sparsity and technical noise. in the case of pagoda, however, the primary focus is an unsupervised and population-level analysis; the generation of sample-level scores is a secondary output which lacks inference support and, relative to the random walk and z-scoring approaches, is particularly poor at identifying cells with elevated expression of specific pathways. the practical utility of the pagoda method is also hindered by a fragile installation procedure (we were unable to install it successfully), the requirement for a specialized normalization process and lack of direct integration with popular scrna-seq frameworks like seurat [ , ] . although the pathway scores generated by the z-scoring methods should have a standard normal distribution when the expression data follows an uncorrelated multivariate normal distribution, this distributional assumption does not hold for sparse scrna-seq data. neither the random walk nor the pca-based method generate scores with a well characterized null distribution. while the lack of a null distribution does not prevent the cell-specific scores generated by these techniques from being used for visualization or as predictors in regression models, it does preclude cell-level inference and the use of scores as dependent variables in parametric models. given experimental and cost constraints, most bulk gene expression data sets have sample sizes in the hundreds; bulk data sets with more than one thousand samples are rare. 
single cell data sets, by contrast, typically profile thousands of cells and data sets containing tens-of-thousands to hundreds-of-thousands of cells are becoming increasingly common. these large sample sizes make computational cost an important factor, especially for techniques that are used in an exploratory and interactive context. relative to the vam approach, all of the existing single sample methods have significantly worse computational performance on even small ( cells, genes) data sets. for very large scrna-seq data sets (i.e., , + cells), the use of methods like gsva and ssgsea will be impractical for most users.
the vam method generates cell-specific gene set scores from scrna-seq data using a variation of the classic mahalanobis multivariate distance measure [ ] . vam takes as input two matrices:
• $X$: $n \times p$ matrix that holds the positive normalized counts for $p$ genes in $n$ cells as measured via scrna-seq. as detailed in section . below, vam provides direct support for both seurat [ ] normalization techniques: log-normalization (i.e., log of plus the unnormalized count divided by an appropriate scale factor for the cell) and the sctransform method [ ] . other scale factor-based normalization techniques that are equivalent to seurat log-normalization (e.g., normalization supported by the scater framework [ ] ) can also be used.
• $A$: $m \times p$ matrix that represents the annotation of $p$ genes to $m$ gene sets as defined by a collection from a repository like the molecular signatures database (msigdb) [ ] ($a_{i,j}$ indicates whether gene $j$ belongs to gene set $i$).
vam generates as output one matrix:
• $S$: $n \times m$ matrix that holds the cell-specific scores for each of the $m$ gene sets defined in $A$.
given $X$ and $A$, vam computes $S$ using the following steps:
1. estimate technical variances: let $\hat{\sigma}_{tech}$ be a length $p$ vector holding the technical component of the sample variance of each gene in $X$. for the vam-seurat integration, two approaches are supported for computing $\hat{\sigma}_{tech}$ depending on whether log-normalization or sctransform is employed (see section . below for details). similar variance decomposition approaches are supported by other scrna-seq normalization pipelines (e.g., scater [ ] ). vam can also be used under the assumption that the observed marginal variance of each gene is entirely technical. in this case, $\hat{\sigma}_{tech}$ is simply estimated by the sample variances of each gene in $X$.
2. compute modified mahalanobis distances: let $M$ be the $n \times m$ matrix of squared modified mahalanobis distances. column $k$ of $M$, which holds the cell-specific squared distances for gene set $k$, is calculated as:
$$M_{\cdot,k} = \mathrm{diag}\left(X_k \left(I_g \hat{\sigma}_{g,tech}\right)^{-1} X_k^T\right)$$
where $g$ is the size of gene set $k$, $X_k$ is an $n \times g$ matrix containing the $g$ columns of $X$ corresponding to the members of set $k$, $I_g$ is a $g \times g$ identity matrix, and $\hat{\sigma}_{g,tech}$ holds the elements of $\hat{\sigma}_{tech}$ corresponding to the $g$ genes in set $k$.
3. compute modified mahalanobis distances on permuted $X$: to capture the distribution of the squared modified mahalanobis distances under the h that the normalized expression values in $X$ are uncorrelated with only technical variance, the distances are recomputed on a version of $X$ where the row labels of each column are randomly permuted. let $X_p$ represent the row-permuted version of $X$ and let $M_p$ be the $n \times m$ matrix that holds the squared modified mahalanobis distances computed on $X_p$ according to ( ) .
4. fit gamma distribution to each column of $M_p$: a separate gamma distribution is fit using the method of maximum likelihood (as implemented by the fitdistr() function in the mass r package [ ] ) to the non-zero elements in each column of $M_p$. let $\hat{\alpha}_k$ and $\hat{\beta}_k$, $k = 1, \ldots, m$, represent the gamma shape and rate parameters estimated for gene set $k$ using this procedure. as detailed in section . , the normal $\chi^2$ approximation for standard squared mahalanobis distances does not hold for the values generated according to ( ) ; however, the null distribution of these values can be well characterized by a gamma estimated on each column of $M_p$. note that if computational efficiency is a major concern, the gamma distributions can be fit directly on $M$ to avoid the cost of generating $X_p$ and $M_p$; this will impact the power to detect deviations from h but will not inflate the type i error rate.
5. compute cell-specific scores: the cell-specific gene set scores are set to the gamma cdf value for each element of $M$. specifically, each column $k$ of $S$, which holds the cell-specific scores for gene set $k$, is calculated as:
$$S_{\cdot,k} = F_{\gamma(\hat{\alpha}_k,\hat{\beta}_k)}\left(M_{\cdot,k}\right)$$
where $F_{\gamma(\hat{\alpha}_k,\hat{\beta}_k)}()$ is the cdf for the gamma distribution with shape $\hat{\alpha}_k$ and rate $\hat{\beta}_k$. under the h of uncorrelated technical noise, valid p-values can be generated by subtracting the elements of $S$ from 1. section . explores the statistical properties of the elements of $M$ and inference using p-values generated via $1 - S$ in greater detail.
the use of $F_{\gamma(\hat{\alpha}_k,\hat{\beta}_k)}()$ to generate the elements of $S$ has several important benefits in addition to support for cell-level inference. first, it transforms the squared modified mahalanobis distances for gene sets of different sizes into a common scale, which is important if values in $S$ are used together in statistical models, e.g., as regression predictors. second, it generates a statistic that is bounded between 0 and 1 and is robust to very large expression values, i.e., the cdf converges quickly to 1 as the squared distances increase. such robustness is particularly important for the analysis of noisy scrna-seq data; many existing scrna-seq analysis methods such as sctransform artificially clip normalized data to eliminate extreme values. lastly, the fact that the distribution of values is often bimodal with most values close to 0 or 1 improves the utility of $S$ for both visualization and statistical modeling.
for the scenario represented by ( ) , the squared mahalanobis distance is normally defined as:
$$M_{\cdot,k} = \mathrm{diag}\left(\left(X_k - \bar{X}_k\right) \hat{\Sigma}_k^{-1} \left(X_k - \bar{X}_k\right)^T\right)$$
where $\bar{X}_k$ is a matrix whose rows contain the mean values of the columns of $X_k$ and $\hat{\Sigma}_k$ is the estimated sample covariance matrix for $X_k$.
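before turning to the comparison with the standard distance, the scoring steps described above can be sketched for a single gene set in plain python. this is a minimal illustration under stated assumptions, not the vam r package: all function names are hypothetical, the gamma parameters are estimated by the method of moments rather than by maximum likelihood (the text uses fitdistr()), and the gamma cdf is evaluated with a textbook series expansion of the regularized incomplete gamma function.

```python
import math
import random


def sample_variances(x):
    """Per-gene sample variances (the 'variance is entirely technical' option)."""
    n = len(x)
    out = []
    for col in zip(*x):
        mu = sum(col) / n
        out.append(sum((v - mu) ** 2 for v in col) / (n - 1))
    return out


def squared_distances(x, genes, sigma_tech):
    """Squared distance from the origin, each gene scaled by its technical variance."""
    return [sum(row[j] ** 2 / sigma_tech[j] for j in genes) for row in x]


def permute_columns(x, rng):
    """Shuffle the row labels of every column independently."""
    cols = [list(c) for c in zip(*x)]
    for c in cols:
        rng.shuffle(c)
    return [list(r) for r in zip(*cols)]


def fit_gamma_moments(values):
    """Method-of-moments shape and rate from the non-zero values (stand-in for MLE)."""
    nz = [v for v in values if v > 0]
    mu = sum(nz) / len(nz)
    var = sum((v - mu) ** 2 for v in nz) / (len(nz) - 1)
    return mu * mu / var, mu / var


def gamma_cdf(d, shape, rate):
    """Gamma CDF via the series for the regularized lower incomplete gamma function."""
    if d <= 0:
        return 0.0
    z = rate * d
    term = total = 1.0 / shape
    log_scale = 0.0
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (shape + n)
        total += term
        if total > 1e200:  # rescale to avoid floating point overflow
            term /= 1e200
            total /= 1e200
            log_scale += math.log(1e200)
    log_p = shape * math.log(z) - z - math.lgamma(shape) + log_scale + math.log(total)
    return min(1.0, math.exp(log_p))


def vam_scores(x, genes, sigma_tech, seed=0):
    """Cell-specific scores for one gene set: gamma CDF of the squared distances."""
    rng = random.Random(seed)
    m = squared_distances(x, genes, sigma_tech)
    m_perm = squared_distances(permute_columns(x, rng), genes, sigma_tech)
    shape, rate = fit_gamma_moments(m_perm)
    return [gamma_cdf(d, shape, rate) for d in m]
```

under these assumptions, one minus each returned score plays the role of the cell-level p-value described in the text; swapping in a maximum likelihood fit would change the parameter estimates but not the overall structure.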
there are two important differences between the modified mahalanobis distance in ( ) and the standard mahalanobis distance in ( ). first, the standard mahalanobis distance uses the full sample covariance matrix whereas the modified mahalanobis distance accounts for just the technical variance of each gene and ignores covariances. second, the standard mahalanobis measure computes the distances from the multivariate mean whereas the modified mahalanobis distance in ( ) computes distances from the origin. a key feature of the vam method, and the basis for the "variance-adjusted" portion of the name, is the use of $I_g \hat{\sigma}_{g,tech}$ instead of the sample covariance matrix included in the typical formulation of the mahalanobis distance. the practical impact of this change is that deviations in directions of large estimated technical variance are discounted (i.e., larger deviations are expected due to the higher variance) but deviations in directions of large biological variation (or covariance) are not discounted (i.e., these deviations are not expected if the variation in expression is purely technical). use of the origin instead of the multivariate mean in ( ) generates a more biologically meaningful distance measure for scrna-seq data. with the standard mahalanobis distance, it is possible for samples whose elements are all above the mean, all below the mean or a mixture of above and below to have the exact same distance value. computing distances from the origin for positive data eliminates this ambiguity: larger distances correspond to larger positive sample values, i.e., elevated gene expression in the cell, and a distance of 0 corresponds to lack of expression in all genes. measuring distances from the origin will also assign more extreme values to sets whose members show coordinated expression. 
when distances are measured from the multivariate mean, it is not possible to distinguish between sets with a mixture of up and down-regulated genes and sets whose members show coordinated expression. prioritizing coordinated expression is advantageous since such pathways are usually more biologically interesting. as a simple example, imagine a two gene set with mean ( , ) and identity covariance matrix. for this set, cells with expression values of ( , ), ( , ), ( , ), and ( , ) all have the same squared mahalanobis distance of when distances are measured from the multivariate mean. by contrast, the squared distance from the origin for these cells is , , , and , which better reflects the combined expression of these genes. it should be noted that the difference between the mean and origin will be minor for the large number of genes in an scrna-seq data set that have mean values very close to zero. if the values in $X_k$ follow a multivariate normal distribution, the squared mahalanobis distances computed according to the standard definition in ( ) can be approximated by a $\chi^2$ distribution with $g$ degrees-of-freedom, where $g$ is the size of gene set $k$. if $\bar{X}_k$ is replaced by the zero vector in ( ) , the resulting squared distances are instead approximated by a non-central $\chi^2$ distribution with $g$ degrees-of-freedom and non-centrality parameter $\bar{x}_k^T \hat{\Sigma}_k^{-1} \bar{x}_k$. the modified squared mahalanobis measure used by vam and defined in ( ) can also be approximated by a non-central $\chi^2$ distribution under the h of uncorrelated technical noise if the data in $X_k$ is not too sparse, i.e., ∼ % or fewer of the elements are zero, and the non-zero values in $X_k$ have an approximately normal distribution. figure illustrates the density estimate for values computed using ( ) on scrna-seq data simulated under the h of uncorrelated technical noise for sparsity values of both . and . (see section . for more details on the simulation model, which assumes a log-normal distribution for the non-zero elements in $X_k$). 
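the contrast between measuring distances from the multivariate mean and from the origin, discussed above for a two gene set with identity covariance, can be checked numerically. the mean and the four cells below are illustrative values chosen for this sketch (the figures used in the text are not legible here), and `sq_dist` is a hypothetical helper.

```python
def sq_dist(cell, center):
    """Squared Euclidean distance, i.e. squared Mahalanobis distance
    when the covariance matrix is the identity."""
    return sum((v - c) ** 2 for v, c in zip(cell, center))


# Illustrative values (assumed for this sketch, not taken from the text)
set_mean = (1.0, 1.0)
cells = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

# Every cell is equally far from the mean ...
from_mean = [sq_dist(c, set_mean) for c in cells]
# ... but distances from the origin grow with the combined expression
from_origin = [sq_dist(c, (0.0, 0.0)) for c in cells]
```

with these values the cell with no expression and the cell with both genes elevated are indistinguishable when measured from the mean, while the distance from the origin orders them by total expression.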
figure also includes the density for the non-central $\chi^2$ distribution with the appropriate degrees-of-freedom and non-centrality parameter. as shown in this figure, the non-central $\chi^2$ distribution provides an accurate approximation for a sparsity of . (panel a), but overestimates the mean and significantly underestimates the variance of the squared distances when the sparsity increases to . (panel b). given the poor fit of a non-central $\chi^2$ distribution for realistic sparsity levels, we instead model the null distribution of elements in $M$ by a gamma distribution whose parameters are estimated via maximum likelihood as described in section . above. as shown in figure , the estimated gamma distribution provides a very good fit for the observed squared modified mahalanobis distances at both the . and . sparsity levels. the type i error control and power provided by the estimated gamma distribution are detailed in section . below. the vam implementation supports direct integration with the seurat framework [ ] with the integration details varying based on whether seurat log-normalization is followed by variable feature detection using a mean/variance trend or the sctransform [ ] method is used to perform both normalization and variable feature detection. the $S$ matrix generated by vam is saved as a new seurat assay, which enables the visualization and further analysis of these cell-specific pathway scores using the seurat framework, e.g., the featureplot() and findmarkers() functions. the seurat log-normalization method implemented by the normalizedata() r function starts by dividing the unique molecular identifier (umi) counts for each gene in a specific cell by the sum of the umi counts for all genes measured in the cell and multiplying this ratio by the scale factor × . the normalized scrna-seq values are then generated by taking the natural log of this relative value plus . 
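the log-normalization just described can be sketched for a single cell as follows. the function name `log_normalize` is hypothetical, and the default scale factor of 10,000 is seurat's documented default for normalizedata(), assumed here because the value is not legible in the text; the sketch uses the natural log of one plus the scaled relative count.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the UMI counts of one cell.

    counts       : raw UMI counts for every gene measured in the cell
    scale_factor : assumed default; pass the value used in your analysis
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # an empty cell stays all zero
    # log(1 + count / total * scale_factor) keeps zero counts at zero
    return [math.log1p(c / total * scale_factor) for c in counts]
```

because zero counts map to zero, the normalized matrix stays non-negative, which is the property that makes distances from the origin meaningful in the scoring step.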
when log-normalization is used, variable features are detected using the findvariablefeatures() function, which fits a non-linear trend to the log scale variance/mean relationship (the seurat vst method). the estimated trend models the expected technical variance based on mean gene expression; observed variance values above this expected trend reflect biological variance. given this trend, the proportion of technical variance is computed as the ratio of the expected technical variance to the observed variance. note that it is possible for this ratio to be greater than 1 if the observed variance is less than the expected variance. in this scenario, vam sets the $X$ matrix to the log-normalized values and the technical variance vector $\hat{\sigma}_{tech}$ is computed as the product of the variance of the normalized counts and the proportion of technical variance as estimated by the vst method. the seurat sctransform normalization method [ ] fits a regularized negative binomial regression model on the umi counts for each gene using approximate cell sequencing depth as a covariate. the pearson residuals from these regression models capture the biological component of the scrna-seq data and should have a mean of 0 and variance of 1 if expression is due solely to technical noise. the reciprocal of the pearson residual variance therefore estimates the proportion of technical variance. in this scenario, vam sets the $X$ matrix to plus the natural log of the corrected umi counts (i.e., counts that have been adjusted using the pearson residuals to reflect the counts that would be observed if all cells had the same sequencing depth) and the technical variance vector $\hat{\sigma}_{tech}$ is computed as the ratio of the variance of corrected counts in $X$ and the variance of the pearson residuals. to explore the statistical properties of the vam method, we used scrna-seq data simulated to reflect the characteristics of the pbmc log-normalized data. 
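the two technical variance estimates described above, one for log-normalized data and one for sctransform-normalized data, can be sketched for a single gene as follows. both function names are hypothetical, the expected technical variance and the pearson residuals are assumed to come from the normalization pipeline, and the optional cap of the technical proportion at one is an assumption of this sketch, since the text does not spell out how ratios above one are handled.

```python
from statistics import variance


def tech_var_lognorm(norm_values, expected_tech_var, cap=True):
    """Technical variance of one gene for log-normalized data.

    norm_values       : log-normalized values of the gene across cells
    expected_tech_var : variance predicted by the mean/variance trend
    cap               : assumption of this sketch, limits the proportion to 1
    """
    observed = variance(norm_values)
    if observed == 0:
        return 0.0
    proportion = expected_tech_var / observed
    if cap:
        proportion = min(proportion, 1.0)
    return observed * proportion


def tech_var_sctransform(x_values, pearson_residuals):
    """Technical variance of one gene for sctransform-normalized data:
    variance of the values in X divided by the Pearson residual variance."""
    return variance(x_values) / variance(pearson_residuals)
```

in both cases the result shrinks toward zero for genes whose observed variance is mostly biological, so those genes contribute larger scaled deviations in the distance calculation.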
to simulate normalized scrna-seq data, i.e, the contents of x, we took advantage of the fact that the non-zero log-normalized values in real scrna-seq data sets can be effectively modeled by a log-normal distribution. figure illustrates this distributional approximation for the non-zero lognormalized counts from the peripheral blood mononuclear cell (pbmc) scrna-seq data set (see section . for more details on this data set). based on this result, we simulated normalized scrna-seq data under the h of uncorrelated technical noise by first populating x for , cells and genes with independent log-normal rvs with mean and variance set to the sample estimates for the non-zero normalized counts in the pbmc scrna-seq data. the generated x was then sparsified by setting a random selection of elements to , with the number of zero elements matching the desired sparsity level. data sets simulated according to this procedure were used to generate figure as well as the type i error control results in section . . to assess power and classification performance (sections . and . ), data sets simulated under h were modified to elevate the normalized expression of genes in a hypothetical gene set of size for the first cells while preserving the overall sparsity level. the elevated values were computed by setting all non-zero counts in the original null data to log-normal rvs with a larger mean (the variance was set to the same value used to simulate the null data). classification performance was assessed for a range of null variance, set size and inflated mean values. to assess the performance of vam on real scrna-seq data, we analyzed two data sets that are both freely available from x genomics: the . k human pbmc data set used in the seurat guided clustering tutorial [ ] , and an . k mouse brain cell data set generated on the combined cortex, hippocampus and sub ventricular zone of an e mouse [ ] . 
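the simulation procedure described above (independent log-normal values, random sparsification, then inflation of a gene set in the first cells) can be sketched as follows. the function names, the parameterization by the mean and standard deviation of the underlying normal distribution, and the seeds are assumptions made for illustration; the actual sizes and parameter values used in the paper are not reproduced here.

```python
import random


def simulate_null(n_cells, n_genes, mu, sigma, sparsity, seed=0):
    """Independent log-normal values with a fixed fraction of entries set to zero."""
    rng = random.Random(seed)
    x = [[rng.lognormvariate(mu, sigma) for _ in range(n_genes)]
         for _ in range(n_cells)]
    n_zero = round(sparsity * n_cells * n_genes)
    # Choose the zeroed entries without replacement so the sparsity is exact
    for flat in rng.sample(range(n_cells * n_genes), n_zero):
        x[flat // n_genes][flat % n_genes] = 0.0
    return x


def inflate(x, n_cells_up, set_genes, mu_up, sigma, seed=1):
    """Redraw the non-zero values of the set genes in the first cells from a
    log-normal with a larger mean; zeros are left alone to preserve sparsity."""
    rng = random.Random(seed)
    for i in range(n_cells_up):
        for j in set_genes:
            if x[i][j] > 0:
                x[i][j] = rng.lognormvariate(mu_up, sigma)
    return x
```

keeping the zero pattern fixed during inflation means that any gain in classification performance comes from the size of the non-zero values rather than from a change in sparsity.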
these two data sets are representative of small and medium sized single cell experiments and capture transcriptomic measurements for distinct heterogeneous cell populations (immune cells and neural cells) for the two organisms (human and mouse) that comprise a large fraction of existing scrna-seq data sets. preprocessing, quality control (qc), normalization and clustering of the pbmc data set followed the exact processing steps used in the seurat guided clustering tutorial. specifically, the seurat log-normalization method is used followed by application of the vst method for decomposing technical and biological variance. preprocessing and qc of the pbmc data yielded an $X$ matrix of normalized counts for , genes and , cells. processing of the mouse brain data followed similar quality control metrics (at least features per cell, non-zero values in at least cells for genes, proportion of mitochondrial reads less than % [ ] ) with uniform manifold approximation and projection (umap) [ ] used for dimensionality reduction and clustering performed with seurat's implementation of shared nearest neighbor (snn) modularity optimization [ ] . normalization of the mouse brain data was performed using sctransform rather than log-normalization to explore the performance of vam for both of the supported seurat normalization approaches. preprocessing and qc of the mouse brain data yielded an $X$ matrix of normalized counts for , genes and , cells. for the vam analysis of these two scrna-seq data sets, the gene set matrix $A$ was populated using the c .cp.biocarta (biocarta, gene sets), and the c .bp (gene ontology biological processes, , gene sets) collections from version . of the molecular signatures database (msigdb) [ ] . these msigdb collections contain gene sets from three well known and widely used repositories of curated gene sets: biocarta [ ] , and the biological process branch of the gene ontology [ ] . 
prior to running vam, the entrez gene ids used by msigdb were converted to ensembl ids using logic in the bioconductor org.hs.eg.db r package [ ] . for analysis of the mouse brain data, the human ensembl ids were mapped to murine orthologs using logic in the biomart r package [ ] . the x and a matrices were then filtered to only contain genes present in both matrices ( , genes for the pbmc data and , genes for the mouse brain data). finally, the a matrix was filtered to remove all gene sets containing fewer than or more than members. to determine enrichment of gene sets for specific scrna-seq clusters, a wilcoxon rank sum test was performed using the seurat findmarkers() method. for comparative evaluation of the vam method on both simulated and real scrna-seq data, we used methods from each of the existing categories of single sample gene set testing methods. for the random walk category, we used both gsva [ ] and ssgsea [ ] given the popularity of these two techniques; for the class of z-scoring methods, we used the technique of lee et al. [ ] ; and for the class of pca-based methods, we used plage [ ] . for all of these comparison methods, the implementations available in the gsva r package were employed. unless otherwise noted, analyses were performed using default values for method parameters. type i error control was assessed using scrna-seq data simulated according to the process detailed in section . with the technical variances set to the sample variance of the simulated genes. the vam method was applied to a set composed of randomly selected genes. the type i error rate at α = . for simulated scrna-seq data sets ( , p-values per data set for , total hypothesis tests) was . . to assess power, a random group of genes was given inflated log-normal values for the first cells with the mean value ranging from . to . (the non-inflated mean was . to align with the pbmc data). 
for each inflated mean value, data sets were simulated and power was computed on the non-null cells for a total of hypothesis tests. as displayed in figure , the estimated power values ranged from . for an inflated mean of . to . for an inflated mean of . . to compare the performance of vam against existing single sample gene set testing methods, we measured the classification accuracy of each method (i.e., how well the method is able to highly rank cells that have inflated values for the genes in a specific set) on scrna-seq data sets simulated according to the procedure outlined in section . . use of classification accuracy vs. statistical power for the comparative evaluation had two motivations: ) vam is the only method in the comparison group that generates valid p-values, and ) we envision vam being used primarily as a means to rank order cells according to pathway activity rather than as a tool for cell-level statistical inference. figure illustrates the relative classification performance (as measured by the area under the receiver operating characteristic curve (auc)) of vam, gsva [ ] , ssgsea [ ] , and representative methods from the z-scoring and pca-based categories (the technique of lee et al. [ ] for z-scoring and plage [ ] for pca-based methods) across a range of sparsity, noise, effect size and set size values. for each distinct combination of parameter values, data sets were simulated according to the procedure outlined in section . and figure displays the average auc for each method across these data sets with error bars representing the standard error of the mean. the general trends in performance follow the expected trajectories, e.g., auc values fall as sparsity or noise increase and auc values increase as the effect size or set size increases. 
importantly, the vam method provides superior classification performance relative to the other evaluated methods across the full range of evaluated parameter values with the difference particularly pronounced for the sparsity and variance found in the pbmc scrna-seq data. table : relative execution time as compared to the vam method on simulated scrna-seq data, the pbmc scrna-seq data set for the msigdb c .cp.biocarta collection and the mouse brain scrna-seq data set for the msigdb c .bp collection. the gsva method failed to process the mouse brain data so a specific relative performance is not available. table displays the relative execution time of gsva, ssgsea and representative z-scoring and pca-based methods as compared to vam. relative times are shown for the analysis of the simulated data sets ( , cells and genes) used to generate the classification results shown in figure , for the analysis of the pbmc scrna-seq data set using the msigdb biocarta (c .cp.biocarta) collection (see section . for detailed results on the pbmc data set), and for the analysis of the mouse brain scrna-seq data set using the msigdb gene ontology biological process (c .bp) pathway collection (see section . for detailed results on the mouse brain data set). a specific result for the gsva method on the mouse brain data is not available since this method failed to complete the analysis due to memory issues. the vam method had a much faster average execution on the simulated data set relative to the other methods with the difference particularly dramatic for the two most popular single sample methods, gsva and ssgsea. although the pca-based method was faster than vam on the pbmc data and both the z-scoring and pca-based methods were faster than vam on the mouse brain data, the difference in execution time between vam and both gsva and ssgsea on these real data sets was still over an order of magnitude. 
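the simulation and evaluation procedure used for the comparisons above (log-normal null values, random sparsification, inflation of a gene set in the first cells, and ranking quality measured by auc) can be sketched in python as follows. this is a minimal sketch under stated assumptions and not the authors' code: all function names and parameter values are hypothetical, the mean-based scorer is only a stand-in for a real single sample method, and `random.lognormvariate` takes the mean and standard deviation of the underlying normal distribution, which is assumed here to play the role of the log-scale estimates described in the text.

```python
import random


def simulate_null(n_cells, n_genes, mu, sigma, sparsity, rng):
    """Fill a cell-by-gene matrix with independent log-normal values,
    then set a random selection of entries to zero."""
    x = [[rng.lognormvariate(mu, sigma) for _ in range(n_genes)]
         for _ in range(n_cells)]
    n_zero = round(sparsity * n_cells * n_genes)
    for idx in rng.sample(range(n_cells * n_genes), n_zero):
        x[idx // n_genes][idx % n_genes] = 0.0
    return x


def inflate(x, gene_set, n_active, mu_high, sigma, rng):
    """Redraw the non-zero values of the set genes in the first n_active
    cells from a log-normal with a larger mean. Zeros stay zero, so the
    overall sparsity level is preserved."""
    for i in range(n_active):
        for j in gene_set:
            if x[i][j] != 0.0:
                x[i][j] = rng.lognormvariate(mu_high, sigma)


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    positive cell outranks a negative cell (ties count one half)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy settings, chosen only to keep the example fast.
rng = random.Random(0)
gene_set = list(range(10))
x = simulate_null(n_cells=200, n_genes=50, mu=0.5, sigma=0.5,
                  sparsity=0.8, rng=rng)
inflate(x, gene_set, n_active=50, mu_high=2.0, sigma=0.5, rng=rng)

# Placeholder scorer (mean expression of the set genes), NOT the vam score.
scores = [sum(row[j] for j in gene_set) / len(gene_set) for row in x]
labels = [i < 50 for i in range(len(x))]
print("AUC of the placeholder scorer:", auc(scores, labels))
```

any scoring method that returns one value per cell can be dropped in place of the placeholder scorer, which is how several methods could be compared on the same simulated matrix.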
gsva ssgsea z-scoring pca il ctcf ctcf il bbcell bbcell bbcell asbcell bbcell asbcell asbcell asbcell bbcell blymphocyte tcra blymphocyte th th il mhc csk inflam il th th ctcf th th table : top five biocarta pathways found to have higher pathway activity scores in the b cell cluster relative to other cells in the pbmc data set according to a wilcoxon rank sum test. pathways are ordered according to p-value from wilcoxon test. the columns reflect the method used to compute the cell-specific pathway scores. as detailed in section . , we applied the vam method and comparison techniques to the x . k human pbmc data set used in the seurat guided clustering tutorial [ ] . figure is a reduced dimensional visualization of the , cells remaining after quality control filtering. cluster cell-type labels match the assignments in the seurat guided clustering tutorial. for this analysis, the cell-specific pathway scores were used to identify pathways with elevated activity within cell-type specific clusters. as an illustrative example, we highlight the results for the b cell cluster. table lists the five msigdb biocarta pathways most significantly up-regulated in the b cell cluster according to a wilcoxon rank sum test applied to the cell-specific scores computed by vam and other comparison methods. all of the evaluated methods correctly associate b cell-related pathways with the b cell cluster, which is not surprising given the very distinct transcriptomic profile of b cells. while all of the methods offer similar classification performance in this scenario, vam still has the benefits of low computational cost and support for cell-level inference. for more complex cell populations, e.g., the mouse brain scrna-seq data detailed in section . , vam appears to offer superior classification performance relative to the other techniques. 
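the cluster-level comparison described above, in which cell-specific pathway scores inside one cluster are tested against the scores of all other cells, can be illustrated with a small rank sum test. the sketch below is an assumption-laden illustration rather than the seurat findmarkers() implementation: it uses a one-sided alternative (scores higher in the cluster), a normal approximation with tie correction and no continuity correction, and the pathway names and score values are made up.

```python
import math
from collections import Counter


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_greater(in_cluster, rest):
    """One-sided Wilcoxon rank sum (Mann-Whitney U) test of whether the
    scores in `in_cluster` tend to be larger than those in `rest`.
    Returns (U statistic, approximate p-value)."""
    n1, n2 = len(in_cluster), len(rest)
    n = n1 + n2
    combined = list(in_cluster) + list(rest)
    ranks = average_ranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0.0:
        return u, 1.0  # all values tied: no evidence of a shift
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    return u, 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical per-cell pathway scores: (cells in cluster, all other cells).
pathway_scores = {
    "pathway_a": ([0.9, 0.8, 0.95, 0.7], [0.2, 0.3, 0.1, 0.4, 0.25]),
    "pathway_b": ([0.5, 0.4, 0.6, 0.3], [0.45, 0.5, 0.35, 0.55, 0.4]),
}
ranked = sorted(pathway_scores,
                key=lambda name: rank_sum_greater(*pathway_scores[name])[1])
print("pathways ordered by p-value:", ranked)
```

ordering pathways by the resulting p-value mirrors how the top five lists in the table are described as being built, although the exact test options used by the authors may differ.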
an important use for the cell-specific scores generated by vam is the visualization of pathway activity across all cells profiled in a given scrna-seq data set. figure illustrates such a visualization for the four biocarta pathways most significantly up-regulated in the b cell cluster according to cell-specific scores generated by the vam method. this type of visualization provides important information regarding the range of pathway activity across all profiled cells, e.g., il- activity is also up-regulated in monocytes. figure : projection of mouse brain scrna-seq data onto the first two umap dimensions. cells are labeled according to the output from unsupervised clustering. as detailed in section . , we applied the vam method and comparison techniques to the x . k mouse brain scrna-seq data set. for this example, we used the sctransform normalization technique instead of log-normalization and explored a much larger pathway collection (the msigdb gene ontology biological process (c .bp) collection with , gene sets). figure is a reduced dimensional visualization of the , cells remaining after quality control filtering with cells labeled according to the output from unsupervised clustering. similar to the pbmc analysis, the cell-specific pathway scores were used to identify pathways with elevated activity within specific clusters. we highlight the results for cluster , which appears to represent glial cells including a population of astrocytes, a glial cell subtype. table lists the five msigdb c .bp gene sets most significantly up-regulated in cluster according to a wilcoxon rank sum test applied to the cell-specific scores computed by vam and other comparison methods. figure : visualization of the vam generated cell-specific scores for the pathways most significantly enriched in cluster (as seen in figure ) according to a wilcoxon rank sum test on the vam scores. 
as seen in table , vam clearly associates this cluster with glial cells, with glial-cell-differentiation the top-ranked set and both astrocyte-differentiation and glial-cell-development also in the top five list. figure provides a visualization of the vam-generated scores for the top four gene sets up-regulated in cluster . by contrast, neither the z-scoring nor pca-based methods included glial cell-related sets in the top and ssgsea only identified one, glial-cell-fate-commitment, at rank . none of these other methods identified an astrocyte-related gene set. although it is not possible to say with certainty that cluster captures the glial (and potentially astrocyte-specific) sub-population in this scrna-seq data, the top five most significantly up-regulated genes in cluster according to a wilcoxon test on the sctransform-corrected counts all have a known association with astrocytes: dbi [ ] , ptn [ ] , tubb b [ ] , hopx [ ] , igfbp [ ] . single cell rna-sequencing is a powerful experimental tool for exploring the biology of heterogeneous cell populations. the significant sparsity and technical noise associated with scrna-seq data, however, make statistical analysis challenging, especially for tests conducted on the level of individual genes. one promising approach for addressing the statistical challenges of scrna-seq data is gene set testing or pathway analysis, a hypothesis aggregation technique that can mitigate the issues of sparsity and technical noise to improve power, replication and interpretability. the class of single sample gene set testing methods, which transform a cell-by-gene matrix into a cell-by-pathway matrix, is particularly effective for single cell analyses since it enables the full range of standard downstream processing (visualization, clustering, differential expression testing, etc.) to be performed on the pathway-level rather than on the gene-level. 
unfortunately, almost all existing single sample gene set testing methods were designed for the analysis of bulk tissue gene expression data, which is non-sparse and, compared to scrna-seq data, has a small sample size and limited technical noise. to remedy the lack of effective single sample gene set testing methods for scrna-seq data, we developed the variance-adjusted mahalanobis (vam) method, a novel modification of the standard mahalanobis multivariate distance measure that generates cell-specific pathway scores that account for the inflated noise and sparsity of scrna-seq data. although we expect the scores generated by vam to be primarily used in contexts that do not assume a specific statistical model, e.g., as predictor variables, the fact that the distribution of the vam-generated scores has an accurate gamma approximation under the null of uncorrelated technical noise enables inference regarding pathway activity for individual cells. as demonstrated on both simulated and real scrna-seq data, the vam method provides superior classification performance at low computational cost relative to existing single sample techniques. the utility of vam is also aided by direct integration with the popular seurat framework, which makes it easy to incorporate vam into existing scrna-seq analysis pipelines. these features combine to make the vam method an effective and practical tool for the visualization and statistical analysis of scrna-seq data. 
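to make the idea of a variance-adjusted distance with a gamma null approximation more concrete, the following python sketch computes a squared distance per cell, builds a null sample by permuting each gene across cells, fits a gamma distribution to the null distances, and reports the gamma cdf value as the score. this is a simplified illustration based only on the summary above and should not be read as the authors' implementation: the diagonal covariance built from per-gene technical variances, the measurement of distances from the origin, the method-of-moments gamma fit, and all function names are assumptions made for this example.

```python
import math
import random


def squared_distances(x, gene_set, tech_var):
    """Per-cell squared distance from the origin over the set genes,
    with each gene scaled by its (assumed known) technical variance."""
    return [sum(row[j] ** 2 / tech_var[j] for j in gene_set) for row in x]


def gamma_cdf(value, shape, scale):
    """Gamma CDF via the power series for the regularized lower
    incomplete gamma function."""
    z = value / scale
    if z <= 0.0:
        return 0.0
    if z > shape + 10.0 * math.sqrt(shape) + 40.0:
        return 1.0  # far upper tail; also avoids overflow in the series
    term = 1.0 / shape
    total = term
    n = 1
    while term > total * 1e-16 and n < 100000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def vam_like_scores(x, gene_set, tech_var, rng):
    """Scores in [0, 1]: gamma CDF of each cell's squared distance, with
    the gamma fitted (by moments) to distances from permuted data."""
    observed = squared_distances(x, gene_set, tech_var)
    # Null sample: permute each set gene independently across cells,
    # which breaks any coordinated elevation of the set within a cell.
    permuted = [list(row) for row in x]
    for j in gene_set:
        column = [row[j] for row in permuted]
        rng.shuffle(column)
        for row, value in zip(permuted, column):
            row[j] = value
    null = squared_distances(permuted, gene_set, tech_var)
    mean = sum(null) / len(null)
    var = sum((d - mean) ** 2 for d in null) / (len(null) - 1)
    shape, scale = mean * mean / var, var / mean
    return [gamma_cdf(d, shape, scale) for d in observed]


# Toy data: 40 cells, 5 genes, the first 10 cells have elevated values.
rng = random.Random(0)
x = [[rng.lognormvariate(2.0 if i < 10 else 0.0, 0.5) for _ in range(5)]
     for i in range(40)]
scores = vam_like_scores(x, gene_set=range(5), tech_var=[1.0] * 5, rng=rng)
print("mean score, elevated cells:", sum(scores[:10]) / 10)
print("mean score, other cells:", sum(scores[10:]) / 30)
```

one minus the score can be read as an approximate upper-tail p-value for each cell under the fitted null, which is the sense in which a gamma approximation supports cell-level inference.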
scaling single-cell genomics from phenomenology to mechanism revealing the vectors of cellular identity with single-cell genomics highly parallel genome-wide expression profiling of individual cells using nanoliter droplets rna velocity of single cells spatial reconstruction of singlecell gene expression data wishbone identifies bifurcating developmental trajectories from single-cell data reversed graph embedding resolves complex single-cell trajectories dissecting the multicellular ecosystem of metastatic melanoma by single-cell rna-seq recent progress in single-cell cancer genomics single-cell profiling of breast cancer t cells reveals a tissue-resident memory subset associated with improved prognosis global characterization of t cells in non-small-cell lung cancer by single-cell sequencing brain structure. cell types in the mouse cortex and hippocampus revealed by single-cell rna-seq visne enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia bayesian approach to single-cell differential expression analysis an immune atlas of clear cell renal cell carcinoma reconstructing cell cycle pseudo time-series via single-cell transcriptome data challenges and emerging directions in single-cell analysis design and computational analysis of single-cell rna-sequencing experiments scater: pre-processing, quality control, normalization and visualization of single-cell rna-seq data in r single-cell mrna quantification and differential analysis with census proteomics. 
tissue-based map of the human proteome microarray data analysis: from disarray to consolidation and consensus gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles analyzing gene expression data in terms of gene sets: methodological issues ten years of pathway analysis: current approaches and outstanding challenges gene set enrichment analysis: performance evaluation and usage guidelines gsva: gene set variation analysis for microarray and rna-seq data systematic rna interference reveals that oncogenic kras-driven cancers require tbk characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis pathway level analysis of gene expression using singular value decomposition inferring pathway activity toward precise disease classification scsva: an interactive tool for big data visualization and exploration in single-cell omics functional interpretation of single cell similarity maps integrating single-cell transcriptomic data across different conditions, technologies, and species comprehensive integration of single-cell data on the generalized distance in statistics normalization and variance stabilization of single-cell rnaseq data using regularized negative binomial regression molecular signatures database (msigdb) . 
modern applied statistics with s, th edn seurat: seurat guided clutering tutorial x genomics: k brain cells from an e mouse (v chemistry a single-cell transcriptional atlas of the developing murine cerebellum umap: uniform manifold approximation and projection for dimension reduction a smart local moving algorithm for large-scale modularity-based community detection the gene ontology in : extensions and refinements org.hs.eg.db: genome wide annotation for human mapping identifiers for the integration of genomic datasets with the r/bioconductor package biomart astrocytes potentiate gabaergic transmission in the thalamic reticular nucleus via endozepine signaling upregulation of pleiotrophin gene expression in developing microvasculature, macrophages, and astrocytes after acute ischemic brain injury neural circuitspecialized astrocytes: transcriptomic, proteomic, morphological, and functional evidence gliogenesis in the outer subventricular zone promotes enlargement and gyrification of the primate cerebrum enhanced production and proteolytic degradation of insulin-like growth factor binding protein- in proliferating rat astrocytes funding: national institutes of health grants k lm and p gm .conflict of interest: none declared. key: cord- -vmze mdx authors: vanheer, lotte; schiavo, andrea alex; van haele, matthias; haesen, tine; janiszewski, adrian; chappell, joel; roskams, tania; cnop, miriam; pasque, vincent title: revealing the key regulators of cell identity in the human adult pancreas date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vmze mdx cellular identity during development is under the control of transcription factors that form gene regulatory networks. however, the transcription factors and gene regulatory networks underlying cellular identity in the human adult pancreas remain largely unexplored. 
here, we integrate multiple single-cell rna sequencing datasets of the human adult pancreas, totaling cells, and comprehensively reconstruct gene regulatory networks. we show that a network of transcription factors forms distinct regulatory modules that characterize pancreatic cell types. we present evidence that our approach identifies key regulators of cell identity in the human adult pancreas. we predict that heyl and jund are active in acinar and alpha cells, respectively, and show that these proteins are present in the human adult pancreas as well as in human induced pluripotent stem cell-derived pancreatic cells. the comprehensive gene regulatory network atlas can be explored interactively online. we anticipate our analysis to be the starting point for a more sophisticated dissection of how transcription factors regulate cell identity in the human adult pancreas. furthermore, given that transcription factors are major regulators of embryo development and are often perturbed in diseases, a comprehensive understanding of how transcription factors work will be relevant in development and disease biology. highlights - reconstruction of gene regulatory networks for human adult pancreatic cell types - an interactive resource to explore and visualize gene expression and regulatory states - predicting putative transcription factors driving pancreatic cell identity - heyl and jund as candidate regulators of acinar and alpha cell identity, respectively a fundamental question in biology is how a single genome gives rise to the great diversity of cell types that make up organs and tissues. a key goal is to map all cell types of developing and mature organs such as the pancreas, an essential organ at the basis of multiple human disorders including diabetes and cancer (kahn, cooper and del prato, ; kamisawa et al., ; han et al., ) . 
single-cell rna sequencing (scrna-seq) provides a powerful tool to resolve cellular heterogeneity, identify cell types and capture high-resolution snapshots of gene expression in individual cells (grün et al., ) . with the advent of single-cell transcriptomics, great progress has been made toward the creation of a reference cell atlas of the pancreas (baron et al., ; segerstolpe et al., ; wang et al., ; xin et al., ; enge et al., ; lawlor et al., ; augsornworawat and millman, ; han et al., ) . work from several groups provided cellular atlases of the pancreas during mouse development (stanescu et al., ; byrnes et al., ; scavuzzo et al., ) , in adult mice (muraro et al., ) and in human fetal and adult pancreas (liu et al., ; baron et al., ; muraro et al., ; segerstolpe et al., ; wang et al., ; enge et al., ; han et al., ) . efforts have also been made to map cellular identity during pancreas development starting from human pluripotent stem cells (hrvatin et al., ; zhu et al., ; han et al., ; hogrebe et al., ; peterson et al., ) . taken together, these studies provide an opportunity to better understand the maintenance and establishment of cellular identity among different pancreatic cell types. work over the past decades indicated that cellular identity is established by combinations of transcription factors (tfs) that recognize and interact with cis-regulatory elements in the genome. these transcription factors, together with chromatin modifiers, give rise to gene expression programs. a small number of core tfs are thought to be sufficient for the establishment and maintenance of gene expression programs that define cellular identity during and after development (ohno, ) . studies conducted in both mouse and human have successfully identified tfs that are pivotal for the acquisition and maintenance of pancreatic cell fates (dassaye, naidoo and cerf, ) . 
these include pdx (zhou et al., ; shih et al., ) , mafa (nishimura, takahashi and yasuda, ) , ngn (gradwohl et al., ) , nkx . (sussel et al., ) , pax (sosa-pineda et al., ) , nkx . , neurod (mastracci et al., ) , arx (collombat et al., ) , mafb (artner et al., ) , rfx (smith et al., ) , gata (ketola et al., ) , foxa and sox (shroff et al., ; shih et al., ) . conditional deletion of tfs such as foxa and pdx in adult beta cells results in the loss of cellular identity and function (sund et al., ; gao et al., ) . genetic evidence for the role of these tfs in establishing human pancreatic cell identity is provided by the identification of tf loss-of-function mutations that cause pancreatic agenesis (stoffers et al., ; sellick et al., ; allen et al., ; shaw-smith et al., ; de franco et al., ) or neonatal or young-onset diabetes (senée et al., ; solomon et al., ; rubio-cabezas et al., ; smith et al., ; bonnefond et al., ; flanagan et al., ) . in addition, tf overexpression can reprogram somatic cells to adopt alternative identities (takahashi and yamanaka, ; zhou et al., ; vierbuchen et al., ; lima et al., ) . for example, the induced expression of ngn , pdx , and mafa was shown to reprogram mouse alpha cells into beta-like cells in vivo (zhou et al., ) . however, how key tfs underlie the maintenance of cellular identity in the human pancreas remains incompletely understood. over the past decade, multiple approaches to reconstruct gene regulatory networks (grns) from bulk and single-cell omics data have been developed (ghazanfar et al., ; lim et al., ; matsumoto et al., ; fiers et al., ) . in particular, it is now possible to combine single-cell transcriptomic data with cis-regulatory information to infer grns (janky et al., ; aibar et al., ; van de sande et al., ) . because tfs recognize dna motifs in the genome, one can measure if inferred target genes are expressed within single cells, and therefore quantify the activity of tfs. 
such approaches have revealed the regulatory programs in distinct systems including the drosophila brain (davie et al., ) , cancer (wouters et al., ) , during early mouse embryonic development (peng et al., ) , in a mouse cell atlas (suo et al., ) and a human cell atlas (han et al., ) . analysis of grns in the human adult pancreas has identified distinct endocrine and exocrine regulatory states with multiple stable cell states for alpha, beta and ductal cells (kumar and vinod, ) . type diabetes and body mass index (bmi) were shown not to impact grn activity of alpha and beta cells (kumar and vinod, ) . previous data show that type diabetic (faerch et al., ; dennis et al., ; dybala and hara, ; zaharia et al., ) and non-diabetic human islet preparations vary greatly depending on age (enge et al., ; westacott et al., ) and bmi (henquin, ) , warranting the exploration of grns in larger cohorts. hence, it remains unclear whether previous grn findings can be extrapolated to a broader, highly heterogeneous non-diabetic and type diabetes patient population. the development of integration methods provides an opportunity to analyze multiple scrna-seq studies from multiple laboratories and patients (butler et al., ; luecken et al., ) . additional knowledge on how grns maintain cellular identity in the human adult pancreas may further the understanding of disease states as well as improve efforts to convert patient cells into functional, mature beta cells for diabetes treatment. here, we build an integrated human pancreas gene regulatory atlas. in this resource, we use single-cell transcriptomes of the human adult pancreas, taking advantage of integration strategies and computational tools to reconstruct grns. our analysis identifies the grn landscape and candidate regulators that are critical for cellular identity in the human adult pancreas. 
integrating multiple human adult pancreas scrna-seq datasets can further improve the power of scrna-seq analyses to create a human adult pancreas cell atlas. we set out to analyse and integrate five publicly available datasets covering a total of non-diabetic, type diabetic and type diabetic individuals using seurat v . cca integration tools ( figure a , detailed donor information can be found in table s ) (segerstolpe et al., ; wang et al., ; xin et al., ; enge et al., ; lawlor et al., ; hafemeister and satija, ; stuart et al., ) . after filtering out low quality transcriptomes and data integration, uniform manifold approximation and projection for dimension reduction (umap) visualisation revealed that cells localize into distinct clusters ( figure b) . cells from each original dataset localize together, suggesting that the location of cells on the umap is not driven by the dataset of origin ( figure c ). we next sought to identify pancreatic cell types ( figure d ). clustering analyses based on the expression of well-established cell type specific markers led to the identification of eight cell types in the human adult pancreas: beta, alpha, gamma, delta, acinar, ductal, stellate and endothelial cells ( figure b , table s ). umap visualization allowed for the segregation of endocrine, exocrine and other lineages ( figure s a ). beta cells grouped together, away from other clusters and were marked by ins expression (figure e ). other distinct clusters corresponded to alpha, gamma and delta cells based on global transcriptional similarity and elevated expression of gcg, ppy and sst, respectively, and other markers ( figure e , table s ). using a similar approach, we reliably detected other, previously described, major pancreatic cell types (acinar, ductal, endothelial and stellate, figure b ). all cell types were detected in both non-diabetic and type diabetic pancreases ( figure f ). 
an additional four rare cell populations, which cannot be robustly identified through clustering analyses, were identified manually by assessing the expression of ghrl (epsilon cells), tps ab (mast cells), sox (schwann cells) and cd (major histocompatibility complex (mhc) class cells) (wierup et al., ; segerstolpe et al., ) (figure g ). these rare cell types often cluster with other common cell types. importantly, our annotation recapitulated previous annotations to a large extent (figure s b-c). in summary, we reconstructed an integrated single-cell atlas of the human adult pancreas, and annotated pancreatic cell types. next, we set out to comprehensively reconstruct grns for all pancreatic cell types from single-cell transcriptomic data, applying single-cell regulatory network inference and clustering (pyscenic) (aibar et al., ; van de sande et al., ) . pyscenic links cis-regulatory sequence information together with single-cell transcriptomes in three sequential steps by ) co-expression analysis, ) target gene motif and chip-seq track enrichment analysis, and ) regulon activity evaluation (figure a) . each regulon consists of a tf together with its predicted target genes (co-expressed genes with an enriched tf motif). pyscenic identified regulons that characterize the grns of the human adult pancreas ( figure b /c, table s ). multiple regulons identified here as active in the pancreas correspond to tf binding motifs enriched in accessible chromatin in the pancreas (assessed by atac-seq in facs-purified pancreatic cells, (arda et al., ) ), supporting the validity of the approach ( figure s a ). umap visualization based on the activity of regulons in non-diabetic and type diabetic pancreata revealed groups of cells that differ from one another based on their regulatory activity ( figure b -c). in particular, there are distinct regulatory states for exocrine and endocrine pancreatic lineages, stellate and endothelial cells (figure b /c). 
endocrine cell types clustered together, indicating shared regulatory states, while exocrine cell types formed two distinct clusters. stellate and endothelial cells differed most from other cell types in their regulatory states. these results are consistent with previous analyses (baron et al., ; lawlor et al., ; kumar and vinod, ) and are also in line with our findings above based on gene expression analysis ( figure d ). as expected, regulons active in endocrine cell types include rfx , pax and neurod ( figure d -g). these tfs have reported roles in endocrine cell fate commitment and maintenance of cell identity throughout adult life (smith et al., ; hart et al., ; mastracci et al., ) . using iregulon for visualization, many of the neurod target genes identified here have been previously linked to beta cell survival and function, such as snap , tspan (gierl et al., ; haumaitre et al., ; hart et al., ; churchill et al., ) . clustering all cells based on the activity of all regulons identifies regulatory modules ( figure d , red squares). in the exocrine pancreas, one regulatory module, containing nr a , was shared between acinar and ductal cells, although with a tendency for increased regulon activity in ductal cells ( figure d /h). other exocrine regulons included onecut , rest and hnf b with reported roles in exocrine development (nissim et al., ; kropp, zhu and gannon, ) and the adult exocrine pancreas (quilichini et al., ; bray et al., ) ( figure d ). in summary, this analysis confirms the expected separation of exocrine and endocrine cells with distinct gene regulatory programs, and identifies known and novel candidate regulators of pancreas cell states. several regulatory modules are shared between different cell types within the endocrine and exocrine pancreas. additionally, each cell type is defined by cell-type specific regulatory modules ( figure d ). 
in the endocrine pancreas, alpha and beta cells shared endocrine regulons (mafb, meis ), whereas we observed distinct activities for arx and irx regulons in alpha cells and rxrg and pdx in beta cells ( figure d /h), expanding previous findings (kumar and vinod, ) . using iregulon for visualization, pdx target genes include slc a , pdia and abhd , which have been reported to control insulin release (thomas, brown and brown, ; eletto et al., ; rorsman and ashcroft, ) ( figure s c ). interestingly, gamma and delta cells overlapped with alpha and beta cells, respectively, suggesting a shared regulatory state ( figure c/d) . this includes shared regulon activity for arx in gamma and alpha and pdx in beta and delta cells ( figure d /h), consistent with their reported expression in published scrna-seq studies (baron et al., ; segerstolpe et al., ; lawlor et al., ) . gata and rbpjl, known acinar-specific tfs, were highly active in acinar cells (masui et al., ; carrasco et al., ) ( figure h) . similarly, ductal cells were characterised by highly active sox and pou f regulons, in line with previous literature (shroff et al., ; yamashita et al., ) (figure h ). in sum, this analysis confirms that alpha, beta, acinar and ductal cells are defined by the activity of distinct tf combinations that form gene regulatory modules. in conclusion, the network approach recovers many of the expected regulators of pancreatic cellular identity allowing for the comprehensive characterisation of the gene regulatory state of all major human adult pancreatic cell types. a comprehensive network analysis provides an opportunity to predict and identify critical regulators of cell identity. to identify regulons with highly cell type-specific activities within the human adult non-diabetic pancreas, we calculated regulon specificity scores (rss) (a complete list of rsss can be found in table s ) (suo et al., ) . 
the rss utilises jensen-shannon divergence to measure the similarity between the probability distribution of the regulon's enrichment score and cell type annotation wherein outliers receive a higher rss and are therefore considered cell type-specific (suo et al., ) . it can therefore be used to rank the activity of tfs within specific cell types. among the top regulons identified in alpha cells, we recover well known regulators of alpha and endocrine cell fate such as arx, irx , pax , mafb, neurod and rfx ( figure a /b) (collombat et al., ; artner et al., ; delporte et al., ; smith et al., ; dorrell et al., ; mastracci et al., ) . in addition, we identified jund, egr , srebf and stat , which have not yet been implicated in alpha cell identity. egr (but not egr ) has been shown to transcriptionally regulate glucagon expression (leung- theung-long et al., ) as well as the pdx promoter in beta cells (eto, kaur and thomas, ) . stat and jund have been described in pancreatic tissue in general and beta cells, respectively, but not in alpha cells (yu and kim, ; good et al., ) (figure b /c). these tfs respond to the jnk and egfr signalling pathways and may have important physiological functions. both jund and the jund/jnk signaling pathway have been implicated in pancreatic cancer (shin et al., ; recio-boiles et al., ) . immunocytochemistry of the human adult pancreas confirmed the presence of nuclear jund in islets ( figure di ). we also detected nuclear jund protein in a subset of human induced pluripotent stem cells (ipscs) subjected to beta cell differentiation (figure e/f) . surprisingly, we also detected jund protein in ductal cells, despite lower jund regulon activity in this cell type (figures c, dii) . thus, protein expression does not always predict regulatory activity. nevertheless, these results show that jund is present and active in a subset of pancreatic cell types in the human adult pancreas and human pluripotent stem cell derived islet cells. 
altogether, this analysis predicts tfs active in human alpha cells, recovering known as well as new candidate tfs. among the top regulons identified in beta cells, we retrieved well-known as well as new candidate regulators of beta and endocrine cell identity. known tfs include rxrg, pdx , neurod , pax and rfx (zhou et al., ; miyazaki et al., ; smith et al., ; mastracci and sussel, ; hart et al., ) (figure g /h). in addition, we found that znf d, ascl and hoxd were highly ranked regulons ( figure h /i). hoxd and bhlhe have been shown to be present in the exocrine pancreas (cantile et al., ; sato et al., ) . interestingly, ascl has been reported to interact with β-catenin of the wnt pathway, the latter has an established role in endocrine fate specification during in vitro differentiation (schuijers et al., ; sharon et al., ; vethe et al., ) . many putative target genes of ascl including pdx , ins, abcc , foxa , kcnk , fxyd are directly related to glucose sensing and beta cell identity, in line with the beta cell-specific regulatory activity of ascl ( figure s a ) (gao et al., ; arystarkhova et al., ; vierra et al., ; park, lee and park, ) . fxyd γa, a regulatory subunit of the na + -k + -atpase, is a transcript exclusively expressed in human beta cells (flamez et al., ) . immunohistochemistry of human adult pancreas sections showed that ascl is expressed in ins + beta and islet cells ( figure j) . surprisingly, ascl was mainly localized to the cytoplasm ( figure j) , which is unexpected for tfs which tend to localize to the nucleus (baranek, sock and wegner, ) . cytoplasmic localization of ascl has been reported in the context of colon and breast cancer (zhu et al., ; xu et al., ) . these results implicate additional tfs including ascl in the regulation of beta cell identity. they also illustrate the value of network analyses to increase our understanding of the biology of the human pancreas. 
in summary, grn analysis and regulon ranking allowed us to pinpoint both known and novel candidate regulators of pancreatic endocrine cell identity, providing a resource for further investigation of their roles in cellular identity and function. similarly, the comprehensive network analysis provides an opportunity to predict and identify critical regulators of exocrine cell identity. we also identify known and new tfs in acinar cells. among the top acinar-specific regulons, we recovered well known regulators of acinar and exocrine cell identity such as ptf a, rbpjl, gata and nr a (ketola et al., ; masui et al., ; nissim et al., ; sakikubo et al., ) (figure a/b) . these findings are in line with a recent study that used single-nucleus rna-seq on pancreatic acinar tissue (tosti et al., ) . furthermore, we identified mecom, heyl and tgif as highly ranked regulons ( figure b /c). interestingly, aberrant mecom expression has been linked to the induction of gastric genes in acinar cells, which disrupts acinar cell identity and increases susceptibility to malignancy (hoang et al., ) . the loss of tgif has been linked to pancreatic ductal adenocarcinoma progression making further exploration of these regulons interesting in the context of cancer biology (weng et al., ) . ectopic expression of tgif (but not tgif ) reprograms mouse liver cells towards a pancreas progenitor state (cerdá-esteban et al., ) . heyl is a reported notch signalling target gene in ngn + exocrine cells (gomez et al, ) . we confirmed nuclear expression of heyl in human acinar and islet cells (figure di /ii, detailed donor information can be found in table s ) by immunohistochemistry, in agreement with elevated heyl regulon activity in acinar cells ( figure c ). further functional studies in the healthy pancreatic context (matsumoto et al., ; coleman et al., ) . 
in summary, grn analysis and regulon ranking allowed us to pinpoint both known and novel candidate regulators of pancreatic exocrine cell identity. specifically, we identified heyl as a candidate tf that might be important for acinar cell identity, warranting further investigation. to enable users to easily navigate the human pancreatic cell network atlas, we provide a loom file that allows for the visualisation and exploration of the data using the web-based portal scope (davie et al., ) (.loom file and tutorial available at http://scope.aertslab.org/#/pancreasatlas/*/welcome and https://github.com/pasquelab/scpancreasatlas). features such as cell type annotation as defined in this paper, gene expression and regulon activity can be explored on the regulon and gene expression based umap. this resource enables users to select and visualize up to three genes or regulons simultaneously and select subsets of cells for downstream analyses. for example, the expression of covid- related genes can be interactively explored (yang et al., ) . target genes of a specific regulon can be downloaded to facilitate further exploration, for example in iregulon or gene ontology analysis (janky et al., ) . a list of predicted target genes of all regulons can also be found in table s . furthermore, a list of target genes can be manually defined to compute the activity of a custom regulon. this resource can be used to further study cell identity and gene regulation in the context of the pancreas, diabetes and cancer. in this resource, we take advantage of integration strategies and new computational tools to reconstruct an integrated cell and grn atlas of the human adult pancreas from single-cell transcriptome data. this approach provides a comprehensive analysis of the gene regulatory logic underlying cellular identity in the human adult pancreas in a broad range of individuals, limiting the influence of inter-donor variability. 
we recovered known regulators of pancreatic cell identity and uncovered novel candidate regulators that can be further investigated for their roles in cellular identity and function. by validating regulon analyses and creating an easily accessible interactive online resource which allows for the exploration of the gene regulatory state of cells from individuals, this approach extends beyond previous gene regulatory studies in the human adult pancreas (augsornworawat and millman, ). the present analysis identified regulators of pancreatic development, function and survival that are known to be critical in humans because loss of gene function causes pancreatic agenesis or young onset diabetes. for example, ptf a (sellick et al., ; weedon et al., ) and gata (shaw-smith et al., ), whose loss of function is linked to pancreatic agenesis and neonatal diabetes, were among the top acinar-specific regulons (figure ). in addition, the monogenic diabetes related genes pdx (nicolino et al., ), neurod (rubio-cabezas et al., ), pax (solomon et al., ), rfx (smith et al., ; patel et al., ) and glis (senée et al., ) were among the top beta cell-specific regulons (figure and table s ). stress signalling (table s ) . creb and creb l are non-canonical er stress transducers that are induced in human islets and clonal beta cells upon exposure to the saturated fatty acid palmitate (cnop et al., ). interestingly, srebf and - undergo similar er exit and proteolytic processing in the golgi as these er stress transducers, but they do so in response to changes in er cholesterol content; both also have high regulon activity in alpha and beta cells. xbp is abundantly expressed in the exocrine and endocrine pancreas (cnop et al., ), but the xbp regulon has its highest specificity in beta cells. atf and atf are tfs that are activated upon eif α phosphorylation, an er stress response pathway to which no less than monogenic forms of diabetes belong (eizirik, pasquali and cnop, ).
our data underscore the importance of these tfs for endocrine pancreatic cell identity. given that we predict novel regulators of cell identity in the human pancreas, it will be interesting to extend this analysis to embryonic pancreas development. our work may also help to understand and improve the in vitro derivation of pancreatic cell types. for example, the emergence of sst-positive cells together with beta-like cells at the end of in vitro differentiation could be explained by the overlap in regulatory states between beta and delta cells (baron et al., ). grn analyses are particularly interesting for in vitro derived beta cells, since a better understanding of the regulatory logic underlying control of beta cell fate may improve or facilitate future applications in regenerative medicine (pagliuca et al., ; rezania et al., ; nostro et al., ; russ et al., ; baeyens et al., ). alternatively, many observed grns such as ascl , mecom, ppard, gata and cdx are linked to pancreatic cancer, making the additional exploration of grns interesting in the context of cancer biology (matsumoto et al., ; zhu et al., ; coleman et al., ; hoang et al., ; xu et al., ; weng et al., ; brunton et al., ). finally, recent reports have stratified type diabetes patients based on age at diagnosis, bmi, hba c and insulin secretion and sensitivity, and identified subtypes with different genetic predisposition, treatment response, disease progression and complication rates (ahlqvist et al., ). hence, it would be interesting to assess differences in gene regulatory state and gene expression profiles of alpha and beta cells between different type diabetic subgroups. it is important to note that pyscenic is a stochastic algorithm that does not produce precisely the same regulons for repeated applications, limiting reproducibility when comparing different datasets (huynh-thu et al., ; van de sande et al., ).
to mitigate this uncertainty, we ran the full pyscenic pipeline five times and only kept consistent regulons with the highest regulon activity. the performance of pyscenic, and of other grn inference methods, suffers from the large number of drop-out events in scrna-seq data, warranting caution when interpreting results (chen and mar, ). this could explain the absence of well-established pancreas tfs such as mafa (olbrot et al., ), mnx (flanagan et al., ), neurog (krentz et al., ), foxa (lee et al., ) and nkx - (mastracci et al., ) in this analysis. nevertheless, in support of the validity of our findings, atac-seq, literature and immunohistochemistry of human pancreas sections corroborate several pyscenic predictions. chen and colleagues underline the importance of using large sample sizes to derive the most accurate network inference possible (chen and mar, ), highlighting the importance of dataset integration to increase the number of cells analysed. in the future, it will be interesting to extend our analyses to include many more cells and patients. in spite of current caveats, grn analysis has enabled the capture of biologically relevant information (butte et al., ). one additional limitation of this study is the assumption that all tfs bind their binding motifs in the promoters of expressed genes. however, tf binding can be restricted to a subset of tf motifs in the genome due to the influence of chromatin processes, including the presence of nucleosomes as well as dna methylation. therefore, additional approaches such as single cell multi-omics that capture additional layers of genome regulation will be helpful to increase our understanding of gene regulation in the context of the human pancreas. taken together, our grn atlas, containing individuals, provides a valuable resource for future studies of human pancreas development, donor variability, homeostasis and disease including type diabetes and pancreatic cancer.
finally, our results provide new insights into the activity of tfs and gene regulation in the human adult pancreas from a gene regulatory perspective.
questions about data analysis should be directed to the lead contact, vincent pasque (vincent.pasque@kuleuven.be). the reviewer token for this geo repository is cfsjciumfferjkd.
motif discovery of bulk atac-seq data
paired-end raw reads for bulk atac-seq (see key resource table) were downloaded from sra using sra toolkit (v . . ). reads were aligned and further analyzed using the encode atac-seq pipeline with default parameters and the encode human reference genome grch . (lee, ). bed files containing the global open chromatin landscape of adult alpha (alpha_ ; ea and alpha_ ; ea ), beta (beta_ ; ea and beta_ ; ea ), acinar (acinar_ ; ea and acinar_ ; ea ) and ductal (ea ) cells or cell type specific differentially accessible regions were used as input for motif discovery by homer (v . . ) using the 'findmotifsgenome.pl' script with the hg genome and the size option set to 'given' (heinz et al., ). the tfs whose motifs were identified by homer and that correspond to tfs identified by pyscenic are visualized in figure s a .
analysis of publicly available scrna-seq data
raw reads for five publicly available scrna-seq datasets (see key resource table) were downloaded from sra using sra toolkit (v . . ). afterwards, reads were aligned to the human reference genome grch . using star (v . . a) with default parameters, followed by conversion to the coordinate sorted bam format. next, the featurecounts command from the "rsubread" (v . . ) package in r (v . . ) was used to assign mapped reads to genomic features. low quality transcriptomes with a mitochondrial contamination greater than % and fewer than expressed genes per cell were excluded from subsequent analyses. the resulting raw count matrix was batch corrected using the findintegrationanchors and integratedata functions from the "seurat" package (v . . 
) after which subsequent analyses were carried out in the r package "seurat" (v . . ). gene expression was used to cluster all cells with umap, using seurat's function runumap. clusters for cell type annotation were defined using seurat's shared nearest neighbour algorithm findclusters function after which differential expression analysis was performed using wilcoxon's rank sum test with a minimum cutoff of . average log fold change and min.pct of . . pyscenic grns were inferred using pyscenic (python implementation of scenic, v . . ) in python version . . (aibar et al., ) . integrated read counts were used as input to run genie (huynh-thu et al., ) which is part of arboreto (v . . ). grns were subsequently inferred using pyscenic with the hg _refseq-r motif database and default settings. to control for the stochasticity, which is inherent to pyscenic, a consensus grn was generated by merging results from five repeat pyscenic runs. if regulons were identified in multiple pyscenic runs, only the regulon with the highest auc value was retained. regulon activity represented by aucell values was used to cluster all cells with umap, using seurat's runumap function. all regulons within non-diabetic cell types were visualized using the 'clustermap' function of the python package "seaborn" (v . . ). the z-score for each regulon across all cells was calculated using the z-score parameter of the seaborn 'clustermap' function. extended analysis of the target genes of specific regulons was conducted in cytoscape (v . . ) using the iregulon application (v . ). the list of target genes of a specific regulon was downloaded from the loom file through the scope platform (https://github.com/pasquelab/scpancreasatlas) (davie et al., ) . to quantify the cell-type specificity of a regulon, we utilized an entropy-based strategy as described previously (suo et al., ) using the aucell matrix as input in matlab r b. 
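the entropy-based specificity score described above can be illustrated with a short python sketch. this is not the matlab script used in this study; it is a minimal example that assumes the commonly used formulation in which the score equals one minus the square root of the jensen-shannon divergence (base 2) between the normalised regulon activity across cells and the normalised cell type indicator. the function names `jensen_shannon_divergence` and `regulon_specificity_score`, and the toy aucell values, are hypothetical and chosen for illustration only.

```python
import numpy as np


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    probability distributions of equal length; the value lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability entries of a
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def regulon_specificity_score(auc, labels, cell_type):
    """Specificity of one regulon for one cell type (assumed formulation:
    1 - sqrt(JSD)). `auc` holds per-cell AUCell values of the regulon and
    `labels` the per-cell annotations. Assumes the activity sums to a
    positive value and that at least one cell carries `cell_type`."""
    auc = np.asarray(auc, dtype=float)
    indicator = (np.asarray(labels) == cell_type).astype(float)
    p_regulon = auc / auc.sum()                # activity as a distribution
    p_celltype = indicator / indicator.sum()   # cell type as a distribution
    return 1.0 - np.sqrt(jensen_shannon_divergence(p_regulon, p_celltype))


if __name__ == "__main__":
    # toy data: four cells, regulon active only in the two alpha cells
    toy_labels = ["alpha", "alpha", "beta", "beta"]
    toy_auc = [0.4, 0.4, 0.0, 0.0]
    for ct in ("alpha", "beta"):
        print(ct, regulon_specificity_score(toy_auc, toy_labels, ct))
```

under this formulation, a regulon whose activity is confined to, and evenly spread over, one cell type scores 1 for that cell type and 0 for a cell type in which it is completely inactive; ranking regulons by this score within each cell type gives the ordering used to select the most specific regulons.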
the top most specific regulons were subsequently visualized using the r package ggplot (v . . ). the complete regulon ranking list is available in table s . control ipsc line hel . (cosentino et al., ) was differentiated into beta cells using a previously published -step protocol (cosentino et al., ). at the end of stage , cells were seeded into -well aggrewell microwell plates (stem cell technologies) at a density of . · cells per well, after which differentiation was carried out as described previously (cosentino et al., ). stage differentiated beta cells were washed twice with pbs containing . mm edta and incubated in accumax (sigma #a ) for min at °c, after which % volume of knockout serum replacement (thermofisher # ) was added to stop the reaction. after centrifugation at g for min at room temperature, cells were resuspended in ml ham's f- medium, supplemented as indicated before (demine et al., ). , cells in μl medium were seeded per square of a nunc lab-tek ii icc chamber (thermofisher). immunohistochemistry analyses were carried out largely as described previously (demine et al., ; ceulemans et al., ), using primary antibodies against the following proteins: ascl (merck, mab , clone e , / ), jund (atlas antibodies, hpa , / ), heyl (atlas antibodies, hpa , / ) and ins (agilent, ir , / ). pictures were taken using a leica dmlb (leica microsystems). the integrated single-cell rna-seq data and pyscenic results can be explored interactively in scope (davie et al., ). loompy (v . . ) (linnarsson lab., ) was used to create the loom files which were uploaded to scope. the embeddings of the regulon based and integrated gene expression based umap clustering, as seen in this article, were added to the loom file. this table is related to figure , , and . this table is related to figure . this table is related to figure , and . 
list of regulon specificity scores for all regulons of non-diabetic alpha, beta, acinar and ductal cells. this table is related to figure and . list of putative target genes for each regulon. this table is related to figure , and . novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables', the lancet diabetes and endocrinology scenic: single-cell regulatory network inference and clustering', nature methods reduced expression of plcxd associates with disruption of glucose sensing and insulin signaling in pancreatic β-cells', frontiers in endocrinology gata haploinsufficiency causes pancreatic agenesis in humans a chromatin basis for cell lineage and disease risk in the human pancreas an activator of the glucagon gene expressed in developing islet α-and β-cells hyperplasia of pancreatic beta cells and improved glucose tolerance in mice deficient in the fxyd subunit of na,k-atpase single-cell rna sequencing for engineering and studying human islets', current opinion in biomedical engineering re)generating human beta cells: status, pitfalls, and perspectives the pou protein oct- is a nucleocytoplasmic shuttling protein a single-cell transcriptomic map of the human and mouse pancreas reveals inter-and intra-cell population structure.', cell systems transcription factor gene mnx is a novel cause of permanent neonatal diabetes in a consanguineous family loss of re- silencing transcription factor accelerates exocrine damage from pancreatic injury', cell death and disease hnf a and gata loss reveals therapeutically actionable subtypes in pancreatic cancer integrating single-cell transcriptomic data across different conditions, technologies, and species', nature biotechnology discovering functional relationships between rna expression and chemotherapeutic susceptibility using relevance networks lineage dynamics of murine pancreatic development at single-cell resolution', nature communications hox d 
expression across tumor tissue types gata and gata control mouse pancreas organogenesis stepwise reprogramming of liver cells to a pancreas progenitor state by the transcriptional regulator tgif ', nature communications rna-sequencing-based comparative analysis of human hepatic progenitor cells and their niche from alcoholic steatohepatitis livers', cell death and disease evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data genetic evidence that nkx . acts primarily downstream of neurog in pancreatic endocrine lineage development', elife. elife sciences publications ltd rna sequencing identifies dysregulation of the human pancreatic islet transcriptome by the saturated fatty acid palmitate', diabetes endoplasmic reticulum stress and eif α phosphorylation: the achilles heel of pancreatic β cells role of peroxisome proliferator-activated receptor β/δ and b-cell lymphoma regulation of genes involved in metastasis and migration in pancreatic cancer cells opposing actions of arx and pax in endocrine pancreas development pancreatic -cell trna hypomethylation and fragmentation link trmt a deficiency with diabetes snap- b-deficiency increases insulin secretion and changes spatiotemporal profile of ca + oscillations in β cell networks', scientific reports a single-cell transcriptome atlas of the aging drosophila brain expression of zebrafish pax b in pancreas is regulated by two enhancers containing highly conserved cis-elements bound by pdx , pbx and prep factors pro-inflammatory cytokines induce cell death, inflammatory responses, and endoplasmic reticulum stress in human ipsc-derived beta cells disease progression and treatment response in data-driven subgroups of type diabetes compared with models based on simple clinical features: an analysis using clinical trial data', the lancet diabetes and endocrinology transcriptomes of the major human pancreatic cell types heterogeneity of the human 
pancreatic islet', diabetes pancreatic β-cells in type and type diabetes mellitus: different pathways to failure pdia regulates insulin secretion by selectively inhibiting the ridd activity of ire single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns regulation of pancreas duodenum homeobox- expression by early growth response- ' heterogeneity of pre-diabetes and type diabetes: implications for prediction, prevention and treatment responsiveness mapping gene regulatory networks from single-cell omics data', briefings in functional genomics a genomic-based approach identifies fxyd domain containing ion transport regulator (fxyd )γa as a pancreatic beta cell-specific biomarker analysis of transcription factors key for mouse pancreatic development establishes nkx - and mnx mutations as causes of neonatal diabetes in man a specific cnot mutation results in a novel syndrome of pancreatic agenesis and holoprosencephaly through impaired pancreatic and neurological development foxa and foxa maintain the metabolic and secretory features of the mature β-cell pdx maintains β cell identity and function by repressing an α cell program integrated single cell data analysis reveals cell specific networks and novel coactivation markers the zinc-finger factor insm (ia- ) is essential for the development of pancreatic β cells and intestinal endocrine cells neurogenin expressing cells in the human exocrine pancreas have the capacity for endocrine cell fate jund regulates pancreatic β cell survival during metabolic stress', molecular metabolism neurogenin is required for the development of the four endocrine cell lineages of the pancreas de novo prediction of stem cell identity using single-cell transcriptome data normalization and variance stabilization of single-cell rna-seq data using regularized negative binomial regression', biorxiv. 
cold spring harbor laboratory construction of a human cell landscape at single-cell level', nature the developmental regulator pax is essential for maintenance of islet cell function in the adult mouse pancreas histone deacetylase inhibitors modify pancreatic cell fate determination and amplify endocrine progenitors simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities influence of organ donor attributes and preparation characteristics on the dynamics of insulin secretion in isolated human islets transcriptional maintenance of pancreatic acinar identity, differentiation, and homeostasis by ptf a', molecular and cellular biology targeting the cytoskeleton to direct pancreatic differentiation of human pluripotent stem cells', nature biotechnology inferring regulatory networks from expression data using tree-based methods', plos one tetraspanin- promotes glucotoxic apoptosis by regulating the jnk/β-catenin signaling pathway in human pancreatic β cells iregulon: from a gene list to a gene regulatory network using large motif and track collections neuron-enriched rna-binding proteins regulate pancreatic beta cell function and survival pathophysiology and treatment of type diabetes: perspectives on the past, present, and future', the lancet pancreatic cancer', the lancet transcription factor gata- is expressed in the endocrine and gata- in the exocrine pancreas research resource: the pdx cistrome of pancreatic islets phosphorylation of neurog links endocrine differentiation to the cell cycle in pancreatic progenitors regulation of the pancreatic exocrine differentiation program and morphogenesis by onecut /hnf ', cmgh single-cell transcriptomic analysis of pancreatic islets in health and type diabetes single-cell transcriptomes identify human islet cell signatures and reveal cell-typespecific expression changes in type diabetes foxa controls pdx gene expression in pancreatic β-cells in 
vivo kundajelab/atac_dnase_pipelines: atac-seq and dnase-seq processing pipeline foxa is required for enhancer priming during pancreatic differentiation essential interaction of egr- at an islet-specific response element for basal and gastrin-dependent glucagon gene transactivation in pancreatic α-cells btr: training asynchronous boolean models using single-cell expression data generation of functional beta-like cells from human exocrine pancreas a mettl -mettl complex mediates mammalian nuclear rna benchmarking atlas-level data integration in single-cell genomics', biorxiv. cold spring harbor laboratory nkx . and arx genetically interact to regulate pancreatic endocrine cell development and endocrine hormone expression regulation of neurod contributes to the lineage potential of neurogenin + endocrine precursor cells in the pancreas the endocrine pancreas: insights into development, differentiation, and diabetes replacement of rbpj with rbpjl in the ptf complex controls the final maturation of pancreatic acinar cells scode: an efficient regulatory network inference algorithm from single-cell rna-seq during differentiation cdx expression in pancreatic tumors: relationship with prognosis of invasive ductal carcinomas nuclear hormone retinoid x receptor (rxr) negatively regulates the glucosestimulated insulin secretion of pancreatic β-cells neurexin- α contributes to insulin-containing secretory granule docking a single-cell transcriptome atlas of the human pancreas a novel hypomorphic pdx mutation responsible for permanent neonatal diabetes with subclinical exocrine deficiency', diabetes mafa is critical for maintenance of the mature beta cell phenotype in mice iterative use of nuclear receptor nr a regulates multiple stages of liver and pancreas development', developmental biology efficient generation of nkx - + pancreatic progenitors from multiple human pluripotent stem cell lines the number of genes in the mammalian genome and the need for master regulatory genes 
we thank stein aerts, kristofer davie and the stein aerts lab for discussions and creating the permanent scope link, shengbao suo for sharing the matlab script for calculating the regulon specificity score, the
key: cord- - hkoeca authors: furstenau, tara n.; cocking, jill h.; hepp, crystal m.; fofanov, viacheslav y. title: sample pooling methods for efficient pathogen screening: practical implications date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hkoeca due to the large number of negative tests, individually screening large populations for rare pathogens can be wasteful and expensive. sample pooling methods improve the efficiency of large-scale pathogen screening campaigns by reducing the number of tests and reagents required to accurately categorize positive and negative individuals. such methods rely on group testing theory which mainly focuses on minimizing the total number of tests; however, many other practical concerns and tradeoffs must be considered when choosing an appropriate method for a given set of circumstances. here we use computational simulations to determine how several theoretical approaches compare in terms of (a) the number of tests, to minimize costs and save reagents, (b) the number of sequential steps, to reduce the time it takes to complete the assay, (c) the number of samples per pool, to avoid the limits of detection, (d) simplicity, to reduce the risk of human error, and (e) robustness, to poor estimates of the number of positive samples. 
we found that established methods often perform very well in one area but very poorly in others. therefore, we introduce and validate a new method which performs fairly well across each of the above criteria making it a good general use approach. for targeted surveillance of rare pathogens, screenings must be performed on a large number of individuals from the host population to obtain a representative sample. for pathogens present at low carriage rates of % or less, a typical detection scenario involves testing hundreds to thousands of samples before a single positive is identified. although advances in molecular biology and genomic testing techniques have greatly lowered the cost of testing, the large number of negative results still renders any of specimens in order to detect just a few thousand cases. the large number of negative tests struck dorfman as being extremely wasteful and expensive and he proposed that more information could be gained per test if many samples were pooled together and tested as a group [ ] . if the test performed on the pooled samples was negative (which was very likely), then all individuals in the group could be cleared using a single test. if the pooled sample was positive, it would mean that at least one individual in the sample was positive and further testing could be performed to isolate the positive samples. this procedure had the potential to dramatically reduce the number of tests required to accurately screen a large population and it sparked an entirely new field of applied mathematics called group testing. due to practical concerns, dorfman's group testing approach was never applied to syphilis screening because the large number of negative samples had a tendency to dilute the antigen in positive samples below the level of detection [ ] . despite this, sample pooling has proven to be highly effective when using a sufficiently sensitive, often pcr-based, diagnostic assay. 
in fact, ad hoc pooling strategies have long been used to mitigate the costs of pathogen detection in disease surveillance programs. for example, surveillance of mosquito vector populations in the u.s. involves combining multiple mosquitoes of the same species (typically - ) into a single pool, prior to testing for the presence of viral pathogens [ ] [ ] [ ] [ ] . elsewhere, such pooling techniques have been successful in reducing the total number of tests in systems ranging from birds [ ] , to cows [ ] , to humans [ ] [ ] [ ] . in many wildlife/livestock surveillance programs, sample pooling is used to simply determine a collective positive or negative status of a population (e.g. a herd or flock) without identifying individual positive samples. while this is often appropriate and sufficient for small-to-medium scale research experiments or surveillance programs, a well designed pooling scheme can easily provide this valuable information with little additional cost. for the purposes of this paper, we will focus on pooling methods that provide accurate classification of each sample so that infected individuals can be identified. group testing theory primarily focuses on minimizing the number of tests required to identify positive samples and many nearly-optimal strategies for sample pooling have been described. from a combinatorial perspective, a testing scheme begins by examining a sample space which includes all possible arrangements of exactly k positive samples in n total samples. because the positive samples are indistinguishable from negative samples, a test must be performed on a sample or a group of samples in order to determine their status. the test is typically assumed to always be accurate, even when many samples are tested together (in practice, this is often not the case and approaches that consider test error and constraints on the number of samples per pool have been examined [ , ] ). 
in the worst case, all of the samples would need to be tested individually, requiring n tests. the goal of group testing is to devise a strategy which tests groups of samples together in order to identify the positive samples in fewer than n tests. group testing methods are generally more efficient when positive samples are sparse. as the number of positive samples increases, the number of tests will eventually exceed that of individual testing for all of the methods. this point has been previously estimated to be roughly when the number of positives is greater than n for sufficiently large n [ , ] . in order to establish the most optimal testing procedure, non-adaptive and adaptive pooling approaches and, in each case, we assume that the test applied to the pools is noiseless (the test will always be positive if a positive sample is present in the pool and negative otherwise) and it produces only a binary or two-state outcome (e.g. positive/negative or biallelic snp typing). number of times any two samples are included in the same pool [ ] . this is achieved by staggering the samples that are added to each pool in different sized windows or intervals ( fig ) ; importantly, the size of the windows must be greater than √n and co-prime to minimize the intersections between samples. the number of different pooling windows (the weight) should be one greater than the expected number (upper bound) of positive samples, w = k + 1, to ensure accurate results. fig . dna sudoku pooling example. in this example, there are a total of n = samples. the -well plates show which samples are combined into each pool for the two different window sizes (w = and w = which are greater than √n and co-prime). by using two different window sizes, the weight of this pooling design is w = 2, meaning that k = w − 1 = 1 positive sample can be unambiguously identified in a single step using t = w + w = tests. the positive samples are decoded by finding the samples that appear most often in the positive pools. 
for example, if g is the only positive sample, we can detect this from the pooling results by noticing that g was added to both of the positive (red) pools while other samples in those pools were added to only one or the other. alternatively, if both g and d are positive, four samples occur with equal frequency (d , g , e , and f ) in the positive pools (red and purple) and it is impossible to determine which are the true positive samples. this ambiguity is introduced because the test was designed to handle only one positive sample. multidimensional pooling is another non-adaptive approach that is generally easier to perform than dna sudoku but can be more prone to producing ambiguous results. as the name implies, this procedure can be extended to many dimensions [ , ] ; however, it becomes more difficult to perform without robotics when more than two dimensions are used. in the two-dimensional (2d) case, n samples are arranged in a perfectly square 2d grid or in several smaller but still square sub-grids [ ] . for example, when testing samples (as in fig ) , this could be achieved through a single x grid or dorfman's original pooling design for syphilis screening was an adaptive two-stage test. following this method, samples are partitioned and tested in g groups of size n/g. all of the samples in groups with negative results are considered to be negative and all of the samples in groups with positive results are tested individually. ignoring the constraints of the actual assay, the optimal group size that minimizes the number of tests depends on the number of positive samples, k. specifically, there should be roughly √(nk) groups of size √(n/k) [ , ] . 
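The two-stage rule can be made concrete with a short sketch. It assumes a noiseless test and consecutive, equal-sized groups; the function names `dorfman_group_size` and `dorfman_tests` are illustrative and do not come from the paper.

```python
import math


def dorfman_group_size(n, k):
    """Approximate optimal group size sqrt(n / k) for n samples and k expected positives."""
    if k <= 0:
        return n  # no positives expected: one pool covers everything
    return max(1, round(math.sqrt(n / k)))


def dorfman_tests(status, group_size):
    """Count the tests used by two-stage pooling.

    status is a list of booleans (True = positive sample). Stage 1 tests each
    consecutive group once; stage 2 retests every member of a positive group.
    """
    tests = 0
    for start in range(0, len(status), group_size):
        group = status[start:start + group_size]
        tests += 1  # stage 1: pooled test
        if any(group) and len(group) > 1:
            tests += len(group)  # stage 2: individual tests
    return tests


# Hypothetical plate: 96 samples, one positive at position 5.
plate = [False] * 96
plate[5] = True
size = dorfman_group_size(96, 1)
print(size, dorfman_tests(plate, size))
```

With one positive among 96 samples this uses 10 pooled tests plus 10 individual retests, which is close to 2√(nk), the total implied by the group count and group size given above.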
dorfman's two-stage approach was later generalized to any number of stages using li's s-stage algorithm [ ] , which can reduce the number of tests sobel and groll [ , ] , introduced several adaptive group testing algorithms based on recursively splitting samples into groups and maximizing the information from each test result. they demonstrated that this class of algorithm is robust to inaccurate estimates of k, particularly in the case of the binary splitting by halving algorithm which can be performed without any knowledge of the number of positive samples. binary splitting by halving (fig ) begins by testing all of the samples in a single pool. if the test is negative, all of the samples are negative and testing is complete, if the test is positive, the samples are split into two roughly equal groups and only one of the groups is tested in each step. if the tested half is negative, we know that all of the samples in the tested group are negative and testing is now complete for those samples. we also know that testing [ , ] . this is the only approach discussed here that does not rely on an step ). if the tested half is negative, then all of the samples in the tested half are considered to be negative and at least one negative sample is known to be present in the other non-tested half of the samples. if the tested half is positive, then it contains at least one positive sample and no information is gained about the other untested half. in either case, the method continues by halving and testing whichever group is known to contain a positive sample until a single positive sample is identified (either by individual testing, as seen in step , or by elimination, as seen in step ) . once a single positive sample is identified, the remaining unresolved samples (non-grey wells) are pooled and tested to determine if any positive samples remain and the process continues until all positive samples are identified. 
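The halving procedure described above can be sketched as follows. This is a minimal simulation that assumes a noiseless pooled test; the function name and the choice to always test the first half are illustrative assumptions, not details taken from the paper.

```python
def binary_splitting_by_halving(status):
    """Identify all positives by repeated halving.

    status is a list of booleans (True = positive). Returns the sorted
    positive indices and the number of pooled tests performed.
    """
    tests = 0

    def pool_test(indices):
        nonlocal tests
        tests += 1
        return any(status[i] for i in indices)

    unresolved = list(range(len(status)))
    positives = []
    while unresolved:
        if not pool_test(unresolved):
            break  # everything still unresolved is negative
        candidates = unresolved[:]  # known to contain at least one positive
        while len(candidates) > 1:
            mid = len(candidates) // 2
            half, rest = candidates[:mid], candidates[mid:]
            if pool_test(half):
                candidates = half  # nothing is learned about the other half
            else:
                cleared = set(half)  # tested half is entirely negative
                unresolved = [i for i in unresolved if i not in cleared]
                candidates = rest  # so the positive must be in the other half
        found = candidates[0]  # identified by testing or by elimination
        positives.append(found)
        unresolved.remove(found)
    return sorted(positives), tests


print(binary_splitting_by_halving([False] * 5 + [True] + [False] * 2))
```

No estimate of the number of positives is needed, but every test depends on the result of the previous one, so the tests cannot be run in parallel.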
only one test is required per round, and in this example, it takes sequential rounds to recover both positive samples. fewer tests [ ] . as the ratio of samples to positive samples ( n k ) increases, the number of tests required to identify k positive samples approaches k log n k which is nearly optimal; however, like binary splitting by halving, the generalized binary splitting approach requires many sequential steps to complete testing. here we are introducing a new approach that we developed with the goal of finding a good balance between the number of tests, the number of steps, simplicity, and robustness. we found that many of the methods described previously focus on optimizing only one of these features usually to the detriment of the others. instead of attempting to perform the best in a single area, we wanted to take a more balanced approach and find tradeoffs that allow good performance across each of these areas. our modified -stage approach (fig ) is based on the s-stage approach but it is modified so that the number of steps is constrained to a maximum of three. at three steps, this approach requires only one additional step than ambiguous non-adaptive approaches that require two steps for complete validation. because the s-stage algorithm is already fairly robust, constraining the number of steps does not have a large impact on the number of tests required. we also modified our method to be simpler and easier to perform by borrowing the recursive subdividing used in the binary splitting approaches. in the s-stage approach, the remaining samples in each step are arbitrarily redivided into pools. not only does this make it difficult to keep track of the remaining samples spread across the plate, it can also make it more difficult to collect the samples for a pool using a multichannel pipette (e.g. step in fig ) . instead, we opted to recursively subdivide the samples from positive pools. 
this makes it easier to keep track of the samples that should be pooled at each stage and, because the samples are always in close proximity, they are easier to collect using a multichannel pipette (compare and ). is the number of groups tested at each step. the number of pipettings for a single channel pipette was equal to the number of samples in each of the pools that were tested. for multichannel pipettes, the number of samples in each pool was divided by the number of channels and rounded up. in cases where the samples in the pool were not in adjacent wells, additional pipettings were required.
experimental validation of modified -stage approach
we set up rare pathogen detection experiments in complex microbiome backgrounds to test our modified -stage approach. we used a total of samples (eight -well plates) that contained a background of µl of dna extraction from cow's milk and µl of molecular grade water. these samples originated from distinct cow milk samples and were replicated ( replicates each) to fill eight -well plates -a total of unique microbiome backgrounds. c. burnetii dna ( µl) was added to randomly chosen background samples (∼ . % carriage rate) as we verified that the spike-in was successful using a highly sensitive taqman assay designed to target the is repetitive element in coxiella burnetii [ ] . using the same taqman assay, we also verified that the target pathogen was not present in any of the unique microbiome backgrounds prior to the spike-in. to ensure a consistent amount of background dna, the milk extractions were tested to determine the amount of bacteria with a real-time pcr assay that detects the s gene and compares it to a known standard [ ] . the pooling procedure was carried out by a typical researcher looking to identify the number of sequential steps is one of the major factors that differentiates pooling methods. 
the major benefit of non-adaptive pooling methods is that, in some cases, all of the tests can be run at the same time which means that testing can be completed faster. clearly, the nonadaptive tests required the fewest number of steps even when the results were ambiguous, necessitating a second round of validation . for samples, the highest weight that we tested was which meant that any simulation with or the number of samples that are combined in a single pool is a very important practical concern because it can determine whether the assay can produce accurate results. in most methods, using a multichannel pipette reduced the number of pipettings by an order of magnitude in some cases. compared to the -channel pipette, the likely because the method requires many samples to be pooled at each step for many steps. the performance is slightly improved when multichannel pipettes are used but it is still the least efficient in many cases. using a single pipette, dna sudoku was not the most inefficient compared to the other methods. however, because the samples that are combined in each pool are spaced out in different intervals instead of in consecutive groups, the number of pipettings did not improve by using multichannel pipettes. this means that, in the best case, a laboratory technician would need to correctly pipette table which shows that method is more sensitive to overestimation than to underestimation. of the methods that depended on an estimate of the number of positive samples, the s-stage (fig , left) and our modified -stage approach (fig , second from left) were the most robust to misestimations of k. the number of steps was more robust in the modified -stage approach than the s-stage due to the -step constraint; however, the modified -stage was more sensitive in the number of tests in some cases (table ) . dna sudoku (fig , left) was the most sensitive method overall. 
overestimating of the number of positive samples caused the weight of the pooling design (w =k + ) to be set higher than it needed to be. when this happened, all of the positive samples were still unambiguously identified but each unnecessary increase in the weight required more than √ n additional tests. when the number of positive samples was underestimated, fewer tests were performed but the pooling scheme was no longer able to unambiguously identify the positive samples in a single step and a second round of verification was required. a similar pattern occurred in the d pooling simulations (fig , right) . while the grid dimensions did not directly depend on k, generally larger grids were more efficient when the number of positive samples was low and smaller grids reduced ambiguous results when the number of positive samples was high but at the cost of many more tests. however, because d pooling was constrained to two dimensions, the number of tests did not vary as drastically as dna sudoku. table . although the expected number of positive samples per plate was ∼ given the . % carriage rate, the actual number of positives ranged from to and none of the plates had exactly positive samples ( table ). the taqman assay was able to accurately identify the positive pools without any false positives or false negatives even during the first step when the number of samples per pool was the largest at . using an -channel pipette where appropriate, a total of pipettings was required to pool the samples. a total of taqman assays were performed which is ∼ % fewer than would be required to individually test samples. picking the right pooling approach for a given pathogen surveillance campaign can be a complicated decision, which is often driven by a set of conflicting constraints and complexity. dna sudoku, however, is far from optimal for monitoring rapidly changing pandemics due to its extreme sensitivity to misestimation of the carriage rate of the pathogen in population. 
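To make the weight-dependence of the dna sudoku design concrete, the sketch below builds pairwise co-prime windows larger than √n and decodes a noiseless pooling result. It assumes that sample i is placed in pool i mod w for each window w, which is one reading of the "staggering" described earlier. All function names are illustrative.

```python
import math


def coprime_windows(n, weight):
    """Pick `weight` pairwise co-prime window sizes, each greater than sqrt(n)."""
    windows = []
    w = math.isqrt(n) + 1  # smallest integer strictly greater than sqrt(n)
    while len(windows) < weight:
        if all(math.gcd(w, v) == 1 for v in windows):
            windows.append(w)
        w += 1
    return windows


def sudoku_decode(status, windows):
    """Simulate the pooled tests and return the candidate positive samples.

    For window w, pool r holds every sample i with i % w == r. A sample is a
    candidate if it sits in a positive pool for every window.
    """
    n = len(status)
    positive_pools = [{i % w for i in range(n) if status[i]} for w in windows]
    return [i for i in range(n)
            if all(i % w in pos for w, pos in zip(windows, positive_pools))]


n = 96
for weight in (2, 3):
    windows = coprime_windows(n, weight)
    print(weight, windows, "tests:", sum(windows))
```

Under these assumptions each extra unit of weight adds one more window of more than √n pools. A design of weight 2 resolves a single positive exactly but can return extra candidates when two positives are present, which is when a second round of verification becomes necessary.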
a good middle ground between the adaptive and non-adaptive pooling approaches is the modified -stage approach -our preference in our own surveillance applications. while it is never the absolute best in any one category, it is always nearly optimal in terms of number of serial steps ( nd best), complexity ( nd best), number of tests ( th best), and extremely resilient to misestimation of the carriage rate ( nd best). the latter is particularly important, as it allows this approach to be useful for surveillance in situations with rapidly changing pathogen carriage rates (e.g. in pandemic or seasonal outbreaks), while keeping number of serial steps as low as possible for an adaptive method. testing pooled sputum with xpert mtb/rif for diagnosis of pulmonary tuberculosis to increase affordability in low-income countries estimating community prevalence of ocular chlamydia trachomatis infection using pooled polymerase chain reaction testing impediments to wildlife disease surveillance, research, and diagnostics evaluating and testing persons for coronavirus disease (covid- ) the detection of defective members of large populations kwang-ming hwang f. 
combinatorial group testing and its applications searching for the proverbial needle in a haystack: advances in mosquito-borne arbovirus surveillance phylogenetic analysis of west nile virus in maricopa county, arizona: evidence for dynamic behavior of strains in two major lineages in the american southwest detection of west nile virus in large pools of mosquitoes west nile virus in the united states: guidelines for surveillance, prevention, and control active surveillance for avian influenza virus infection in wild birds by analysis of avian fecal samples from the environment pooled-sample testing as a herd-screening tool for detection of bovine viral diarrhea virus persistently infected cattle sample pooling as a strategy to detect community transmission of sars-cov- high-throughput pooling and real-time pcr-based strategy for malaria detection real-time, universal screening for acute hiv infection in a routine hiv counseling and testing population group testing: an information theory perspective. foundations and trends® in communications and information theory to pool or not to pool? 
guidelines for pooling samples for use in surveillance testing of infectious diseases in aquatic animals a boundary problem for group testing sharper bounds in adaptive group testing dna sudoku-harnessing high-throughput sequencing for multiplexed specimen analysis discovery of rare mutations in extensively pooled dna samples using multiple target enrichment screening of a brassica napus bacterial artificial chromosome library using highly parallel single nucleotide polymorphism assays a two-dimensional pooling strategy for rare variant detection on next-generation sequencing platforms a sequential method for screening experimental variables group testing to eliminate efficiently all defectives in a binomial sample binomial group-testing with an unknown proportion of defectives a method for detecting all defective members in a population by group testing rickettsial agents in egyptian ticks collected from domestic animals bactquant: an enhanced broad-coverage bacterial quantitative real-time pcr assay key: cord- -e q e v authors: mishra, shreya; srivastava, divyanshu; kumar, vibhor title: improving gene-network inference with graph-wavelets and making insights about ageing associated regulatory changes in lungs date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: e q e v using gene-regulatory-networks based approach for single-cell expression profiles can reveal un-precedented details about the effects of external and internal factors. however, noise and batch effect in sparse single-cell expression profiles can hamper correct estimation of dependencies among genes and regulatory changes. here we devise a conceptually different method using graph-wavelet filters for improving gene-network (gwnet) based analysis of the transcriptome. our approach improved the performance of several gene-network inference methods. 
most importantly, gwnet improved consistency in the prediction of generegulatory-network using single-cell transcriptome even in presence of batch effect. consistency of predicted gene-network enabled reliable estimates of changes in the influence of genes not highlighted by differential-expression analysis. applying gwnet on the single-cell transcriptome profile of lung cells, revealed biologically-relevant changes in the influence of pathways and master-regulators due to ageing. surprisingly, the regulatory influence of ageing on pneumocytes type ii cells showed noticeable similarity with patterns due to effect of novel coronavirus infection in human lung. inferring gene-regulatory-networks and using them for system-level modelling is being widely used for understanding the regulatory mechanism involved in disease and development. the interdependencies among variables in the network is often represented as weighted edges between pairs of nodes, where edge weights could represent regulatory interactions among genes. gene-networks can be used for inferring causal models [ ] , designing and understanding perturbation experiments, comparative analysis [ ] and drug discovery [ ] . due to wide applicability of network inference, many methods have been proposed to estimate interdependencies among nodes. most of the methods are based on pairwise correlation, mutual information or other similarity metrics among gene expression values, provided in a different condition or time point. however, resulting edges are often influenced by indirect dependencies owing to low but effective background similarity in patterns. in many cases, even if there is some true interaction among a pair of nodes, its effect and strength is not estimated properly due to noise, background-pattern similarity and other indirect dependencies. hence recent methods have started using alternative approaches to infer more confident interactions. 
such alternative approaches could be based on partial correlations [ ] or aracne's method of statistical thresholding of mutual information [ ] . single-cell expression profiles often show heterogeneity in expression values even in a homogeneous cell population. such heterogeneity can be exploited to infer regulatory networks among genes and identify dominant pathways in a cell-type. however, due to the sparsity and ambiguity about the distribution of gene expression from single-cell rna-seq profiles, the optimal measures of gene-gene interaction remain unclear. hence recently, skinnider et al. [ ] evaluated measures of association to infer gene co-expression based networks. in their analysis, they found that two measures of association, namely phi and rho, had the best performance in predicting co-expression based gene-gene interaction using scrna-seq profiles. in another study, chen et al. [ ] performed an independent evaluation of a few methods proposed for gene-network inference using scrna-seq profiles, such as scenic [ ] , scode [ ] and pidc [ ] . chen et al. found that for single-cell transcriptome profiles either generated from experiments or simulations, these methods had a poor performance in reconstructing the network. performance of such methods can be improved if gene-expression profiles are denoised. thus the major challenge of handling noise and dropout in scrna-seq profiles remains an open problem. the noise in single-cell expression profiles could be due to biological and technical reasons. the biological sources of noise could include thermal fluctuations and a few stochastic processes involved in transcription and translation, such as allele specific expression [ ] and irregular binding of transcription factors to dna, whereas technical noise could be due to amplification bias and stochastic detection due to the low amount of rna. raser and o'shea [ ] used the term noise in gene expression to mean the measured level of its variation among cells supposed to be identical. 
raser and o'shea categorised potential sources of variation in gene-expression into four types: (i) the inherent stochasticity of biochemical processes due to small numbers of molecules; (ii) heterogeneity among cells due to cell-cycle progression or a random process such as partitioning of mitochondria; (iii) subtle micro-environmental differences within a tissue; (iv) genetic mutation. overall, noise in gene-expression profiles hinders reliable inference about regulation of gene activity in a cell-type. thus, there is demand for pre-processing methods which can handle noise and sparsity in scrna-seq profiles such that inference of regulation can be reliable. the predicted gene-network can be analyzed further to infer salient regulatory mechanisms in a cell-type using methods borrowed from graph theory. calculating gene-importance in terms of centrality, and finding communities and modules of genes, are common downstream analysis procedures [ ] . just like a gene-expression profile, an inferred gene-network could also be used to find differences between two groups of cells (samples) [ ] to reveal changes in the regulatory pattern caused by disease, environmental exposure or ageing. in particular, a comparison of regulatory changes due to ageing has gained attention recently due to a high incidence of metabolic disorder and infection based mortality in the older population. especially in the current pandemic due to novel coronavirus (sars-cov- ), in which older individuals have a higher risk of mortality, a pressing question is why old lung cells have a higher risk of developing severe disease due to sars-cov- infection. however, understanding regulatory changes due to ageing using gene-network inference with noisy scrna-seq profiles of lung cells is not trivial. 
thus there is a need for a noise and batch effect suppression method for investigation of the scrna-seq profile of ageing lung cells [ ] using a network biology approach. here we have developed a method to handle noise in gene-expression profiles for improving gene-network inference. our method is based on graph-wavelet based filtering of gene-expression. our approach is not meant to overlap or compete with existing network inference methods; its purpose is to improve their performance. hence, we compared the output of other network inference methods with and without graph-wavelet based pre-processing. we have evaluated our approach using several bulk sample and single-cell expression profiles. we further investigated how our denoising approach influences the estimation of graph-theoretic properties of the gene-network. we also asked a crucial question: how does the gene regulatory-network differ between lung cells of young and old individuals? further, we compared the pattern of changes in the influence of genes due to ageing with differential expression in covid infected lung. our method uses the logic that cells (samples) which are similar to each other would have a more similar expression profile for a gene. hence, we first make a network such that two cells are connected by an edge if one of them is among the top k nearest neighbours (knn) of the other. after building the knn-based network among cells (samples), we use a graph-wavelet based approach to filter the expression of one gene at a time (see fig. ). for a gene, we use its expression as a signal on the nodes of the graph of cells. we apply a graph-wavelet transform to perform spectral decomposition of the graph-signal. after graph-wavelet transformation, we choose the threshold for wavelet coefficients using sureshrink, bayesshrink or a default percentile value determined after thorough testing on multiple data-sets. 
we use the retained values of the coefficients for inverse graph-wavelet transformation to reconstruct the filtered expression of the gene. the filtered gene-expression is used for gene-network inference and for downstream analysis of regulatory differences. for evaluation purposes, we calculated inter-dependencies among genes using different co-expression measurements, namely pearson and spearman correlations, φ and ρ scores and aracne. both biological and technical noise can exist in a bulk sample expression profile [ ] . in order to test the hypothesis that graph-based denoising could improve gene-network inference, we first evaluated the performance of our method on bulk expression data-sets. we used data-sets made available by the dream challenge consortium [ ] . three data-sets were based on the original expression profiles of the bacteria escherichia coli and staphylococcus aureus and the single-celled eukaryote saccharomyces cerevisiae. the fourth data-set was simulated using an in silico network with the help of genenetweaver, which models molecular noise in transcription and translation using the chemical langevin equation [ ] . the true positive interactions for all four data-sets are also available. we compared graph-fourier based low-pass filtering with graph-wavelet based denoising using three different approaches to threshold the wavelet coefficients. we achieved - % improvement in score over raw data based on dream criteria [ ] with correlation, aracne and rho based network prediction. with φ s based gene-network prediction, there was an improvement in out of dream data-sets ( fig. a) . all the network inference methods showed improvement after graph-wavelet based denoising of simulated data (in silico) from the dream consortium ( fig. a) . moreover, graph-wavelet based filtering had better performance than chebyshev filter-based low-pass filtering in the graph fourier domain.
it highlights the fact that even bulk sample data of gene-expression can have noise and denoising it with graph-wavelet after making knn based graph among samples has the potential to improve gene-network inference. moreover, it also highlights another fact, well known in the signal processing field, that wavelet-based filtering is more adaptive than low pass-filtering. in comparison to bulk samples, there is a higher level of noise and dropout in single-cell expression profiles. dropouts are caused by non-detection of true expression due to technical issues. using low-pass filtering after graph-fourier transform seems to be an obvious choice as it fills in a background signal at missing values and suppresses high-frequency outlier-signal [ ] . however, in the absence of information about cell-type and cellstates, a blind smoothing of a signal may not prove to be fruitful. hence we applied graph-wavelet based filtering for processing gene-expression dataset from the scrna-seq profile. we first used scrna-seq data-set of mouse embryonic stem cells (mescs) [ ] . in order to evaluate network inference in an unbiased manner, we used gene regulatory interactions compiled by another research group [ ] . our approach of graph-wavelet based pre-processing of mesc scrna-seq data-set improved the performance of gene-network inference methods by - percentage (fig. b) . however, most often, the gold-set of interaction used for evaluation of gene-network inference is incomplete, which hinders the true assessment of improvement. figure : the flowchart of gwnet pipeline. first, a knn based network is made between samples/cell. a filter for graph wavelet is learned for the knn based network of samples/cells. gene-expression of one gene at a time is filtered using graph-wavelet transform. filtered gene-expression data is used for network inference. the inferred network is used to calculate centrality and differential centrality among groups of cells. 
figure : improvement in gene-network inference by graph-wavelet based denoising of gene-expression (a) performance of network inference methods using bulk gene-expression data-sets of dream challenge. three different ways of shrinkage of graph-wavelet coefficients were compared to graph-fourier based low pass filtering. the y-axis shows fold change in area under curve(auc) for receiver operating characteristic curve (roc) for overlap of predicted network with golden-set of interactions. for hard threshold, the default value of % percentile was used. (b) performance evaluation using single-cell rna-seq (scrna-seq) of mouse embryonic stem cells (mescs) based network inference after filtering the gene-expression. the gold-set of interactions was adapted from [ ] (c) comparison of graph wavelet-based denoising with other related smoothing and imputing methods in terms of consistency in the prediction of the gene-interaction network. here, phi (φ s ) score was used to predict network among genes. for results based on other types of scores see supplementary figure s . predicted networks from two scrna-seq profile of mesc were compared to check robustness towards the batch effect. hence we also used another approach to validate our method. for this purpose, we used a measure of overlap among network inferred from two scrna-seq data-sets of the same cell-type but having different technical biases and batch effects. if the inferred networks from both data-sets are closer to true gene-interaction model, they will show high overlap. for this purpose, we used two scrnaseq data-set of mesc generated using two different protocols(smartseq and drop-seq). for comparison of consistency and performance, we also used a few other imputation and denoising methods proposed to filter and predict the missing expression values in scrna-seq profiles. 
we evaluated the following such methods: graph-fourier based filtering [ ] , magic [ ] , scimpute [ ] , dca [ ] , saver [ ] , randomly [ ] and knn-impute [ ] . graph-wavelet based denoising provided a better improvement in auc for overlap of the predicted network with known interactions than other methods meant for imputing and filtering scrna-seq profiles (supplementary figure s a ). similarly, in comparison to graph-wavelet based denoising, the other methods did not provide substantial improvement in auc for overlap among gene-networks inferred from two data-sets of mesc (fig. c , supplementary figure s b ). in contrast, graph-wavelet based filtering improved the overlap between networks inferred from different batches of scrna-seq profiles of mesc even when they were denoised separately (fig. c , supplementary figure s b ). with φ s based edge scores the overlap among predicted gene-networks increased by % due to graph-wavelet based denoising (fig. c ). the improvement in overlap among networks inferred from two batches hints that graph-wavelet denoising is different from imputation methods and has the potential to substantially improve gene-network inference using their expression profiles.

improved gene-network inference from single-cell profiles reveals age-based regulatory differences

improvement in overlap among gene-networks inferred from two expression data-sets for a cell type also hints that, after denoising, predicted networks are closer to true gene-interaction profiles. hence using our denoising approach before estimating the difference in inferred gene-networks due to age or external stimuli could reflect true changes in the regulatory pattern. such a notion inspired us to compare gene-networks inferred for young and old pancreatic cells using their scrna-seq profiles filtered by our tool [ ] . martin et al. defined three age groups, namely juvenile ( month- years), young adult ( - years) and aged ( - years) [ ] .
we applied graph-wavelet based denoising to pancreatic cells from the three different groups separately. in other words, we did not mix cells from different age groups while denoising. graph-wavelet based denoising of the single-cell profiles of pancreatic cells improved performance in terms of overlap with protein-protein interaction (ppi) (fig. a , supplementary figure s a ). even though, like chen et al. [ ] , we have used ppi to measure improvement in gene-network inference, it may not be reflective of all gene-interactions. hence we also used the criterion of increase in overlap among predicted networks for the same cell-types to evaluate our method for scrna-seq profiles of pancreatic cells. denoising scrna-seq profiles also increased the overlap between gene-networks inferred for pancreatic cells of old and young individuals (fig. b , supplementary figure s b ). we performed quantile normalization of the original and denoised expression matrices, taking all age groups together to bring them onto the same scale, and then calculated the variance of expression of every gene across cells. the old and young pancreatic alpha cells had a higher level of median variance of expression of genes than juvenile cells. however, after graph-wavelet based denoising, the variance level of genes across all the age groups became almost equal and had a similar median value (fig. c ). notice that it is not trivial to estimate the fraction of variance due to transcriptional or technical noise. nonetheless, graph-wavelet based denoising seemed to have reduced the noise level in single-cell expression profiles of old and young adults. differential centrality in the co-expression network has been used to study changes in the influence of genes. however, noise in single-cell expression profiles can cause spurious differences in centrality. hence we visualized the differential degree of genes in networks inferred using scrna-seq profiles of young and old cells.
the networks inferred from non-filtered expression had a much higher number of non-zero differential degree values in comparison to the denoised version (fig. d, supplementary figure s c ). thus denoising seems to reduce differences in centrality which could be due to randomness of noise. next, we analyzed the properties of genes whose variance dropped most due to graph-wavelet based denoising. surprisingly, we found that the top genes with the highest drop in variance due to denoising in old pancreatic beta cells were significantly associated with diabetes mellitus and hyperinsulinism, whereas the top genes with the highest drop in variance in young pancreatic beta cells had no or insignificant association with diabetes (fig. e) . a similar trend was observed with pancreatic alpha cells (supplementary figure s d ) . such a result hints that ageing causes an increase in stochasticity of the expression level of genes associated with pancreas function, and that denoising could help in properly elucidating their dependencies with other genes.

improvement in gene-network inference for studying regulatory differences among young and old lung cells

studying cell-type-specific changes in regulatory networks due to ageing has the potential to provide better insight into predisposition for disease in the older population. hence we inferred gene-networks for different cell-types using scrna-seq profiles of young and old mouse lung cells published by kimmel et al. [ ] . the lower lung epithelium, where a few viruses seem to have the most deteriorating effect, consists of multiple types of cells such as bronchial epithelial and alveolar epithelial cells, fibroblasts, alveolar macrophages, endothelial and other immune cells. the alveolar epithelial cells, also called pneumocytes, are of two major types. the type alveolar (at ) epithelial cells form the major gas exchange surface of the lung alveolus and have an important role in the permeability barrier function of the alveolar membrane.
type alveolar cells (at ) are the progenitors of type cells and have the crucial role of surfactant production. at cells (or pneumocytes type ii) are a prime target of many viruses; hence it is important to understand the regulatory patterns in at cells, especially in the context of ageing. we applied our method of denoising on scrna-seq profiles of cells derived from old and young mouse lungs [ ] . graph-wavelet based denoising led to an increase in consistency among gene-networks inferred for young and old mouse lungs for multiple cell-types (fig. a) . graph-wavelet based denoising also led to an increase in consistency in gene-networks predicted from data-sets published by two different groups (fig. b) . the increase in overlap of gene-networks predicted from scrna-seq profiles of old and young cells, despite their being denoised separately, hints at a higher likelihood of predicting true interactions. hence the gene-network based differences found among old and young cells were less likely to be dominated by noise. we studied ageing-related changes in pagerank centrality of nodes (genes). since pagerank centrality provides a measure of "popularity" of nodes, studying its change has the potential to highlight the change in the influence of genes. first, we calculated differential pagerank of genes among young and old at cells (supporting file- ) and performed gene-set enrichment analysis using enrichr [ ] . the top genes with higher pagerank in young at cells had enriched terms related to integrin signalling, ht type receptor mediated signalling, h histamine receptor-mediated signalling pathway, vegf, cytoskeleton regulation by rho gtpase and thyrotropin activating receptor signalling (fig. c) . we ignored oxytocin and thyrotropin-activating hormone-receptor mediated signalling pathways as an artefact, as the expression of oxytocin and trh receptors in at cells was low.
moreover, genes appearing for the terms "oxytocin receptor-mediated signalling" and "thyrotropin activating hormone-mediated signalling" were also present in the gene-set for the ht type receptor-mediated signalling pathway. we found literature support for activity in at cells for most of the enriched pathways, although very few studies have shown their differential importance in old and young cells. for example, bayer et al. demonstrated mrna expression of several -htr including -ht , ht and ht in alveolar epithelial type ii (at ) cells and their role in calcium ion mobilization. similarly, chen et al. [ ] showed that histamine receptor antagonist reduced pulmonary surfactant secretion from adult rat alveolar at cells in primary culture. the vegf pathway is active in at cells, and it is known that ageing has an effect on vegf mediated angiogenesis in lung. moreover, vegf based angiogenesis is known to decline with age [ ] .

for comparing two networks it is important to reduce differences due to noise. hence the plot here shows the similarity of predicted networks before and after graph-wavelet based denoising. the results shown here are for the correlation-based co-expression network, while similar results are shown using ρ score in supplementary figure s . (c) variances of expression of genes across single-cells before and after denoising (filtering) are shown here. variances of genes in a cell-type were calculated separately for different stages of ageing (young, adult and old). the variance (estimate of noise) is higher in older alpha and beta cells compared to young. however, after denoising, the variance of genes in all ageing stages becomes equal. (d) effect of noise in estimated differential centrality is shown here. the difference in the degree of genes in networks estimated for old and young pancreatic beta cells is shown. the number of non-zero differential-degree values estimated using denoised expression is lower than for networks based on unfiltered expression. (e) enriched panther pathway terms for top genes with the highest drop in variance after denoising in old and young pancreatic beta cells.

we further performed gene-set enrichment analysis for genes with increased pagerank in older mice at cells. for top genes with higher pagerank in old at cells, the terms which appeared among the most enriched in both kimmel et al. and angelidis et al. data-sets were t cell activation, b cell activation, cholesterol biosynthesis, fgf signaling pathway, angiogenesis and cytoskeletal regulation by rho gtpase (fig. d) . thus, there was % overlap in results from kimmel et al. and angelidis et al. data-sets in terms of enrichment of pathway terms for genes with higher pagerank in older at cells (supplementary figure s a , supporting file- , supporting file- ). overall in our analysis, inflammatory response genes showed higher importance in older at cells. the increase in the importance of cholesterol biosynthesis genes, hand in hand with a higher inflammatory response, points towards the influence of ageing on the quality of pulmonary surfactants released by at . al saedy et al. recently showed that a high level of cholesterol amplifies defects in surface activity caused by oxidation of pulmonary surfactant [ ] . we also performed enrichr based analysis of differentially expressed genes in old at cells (supporting file- ). for genes up-regulated in old at cells compared to young, terms which reappeared were cholesterol biosynthesis, t cell and b cell activation pathways, angiogenesis and inflammation mediated by chemokine and cytokine signalling, whereas a few terms like ras pathway, jak/stat signalling and cytoskeletal signalling by rho gtpase did not appear as enriched for genes up-regulated in old at cells ( figure b , supporting file- ).
however, it has previously been shown that the increase in age changes the balance of the pulmonary renin-angiotensin system (ras), which is correlated with aggravated inflammation and more lung injury [ ] . the jak/stat pathway is known to be involved in the oxidative-stress induced decrease in the expression of surfactant protein genes in at cells [ ] . overall, these results indicate that even though the expression of genes involved in relevant pathways may not show significant differences due to ageing, their regulatory influence could be changing substantially. in order to gain further insight, we analyzed the changes in the importance of transcription factors in ageing at cells. among top genes with higher pagerank in old at cells, we found several relevant tfs. however, to make a stringent list, we considered only those tfs which had a non-zero value for change in degree between the gene-networks for old and young at cells. overall, with the kimmel et al. data-set, we found tfs with a change in pagerank and degree (supplementary table- ) due to ageing for at cells (fig. e) . the changes in centrality (pagerank and degree) of tfs with ageing were coherent with the pathway enrichment results. for example, etv , which has higher degree and pagerank in older cells, is known to be stabilized by ras signalling in at cells [ ] . in the absence of etv at cell differentiate to at cells [ ] . another tf, jun (c-jun), which has stronger influence in old at cells, is known to regulate inflammation in lung alveolar cells [ ] . we also found jun to be co-expressed with jund and etv in old at cells (supplementary figure s ) . jund, whose influence seems to increase in aged at cells, is known to be involved in cytokine-mediated inflammation. among the tfs stat - which are involved in jak/stat signalling, stat showed higher degree and pagerank in old at . androgen receptor (ar) also seems to have a higher influence in older at cells (fig. e ). androgen receptor has been shown to be expressed in at cells [ ] .
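the differential pagerank and differential degree used above can be illustrated with a minimal sketch. this is not the gwnet implementation: the function names, the damping factor of 0.85, the fixed number of power iterations and the toy gene labels g1 to g4 are all assumptions made for illustration. each network is given as a dictionary mapping a gene to its neighbours and edge weights.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected weighted graph.

    adj maps each node to a dict {neighbour: weight}; every neighbour
    must itself be a key of adj. damping and iters are assumed defaults.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(iters):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] * w / strength[u]
                           for u, w in adj[v].items() if strength[u] > 0)
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def differential_centrality(adj_old, adj_young):
    """Return {gene: (pagerank_old - pagerank_young, degree_old - degree_young)}."""
    pr_old, pr_young = pagerank(adj_old), pagerank(adj_young)
    shared = set(adj_old) & set(adj_young)
    return {g: (pr_old[g] - pr_young[g], len(adj_old[g]) - len(adj_young[g]))
            for g in shared}


# Toy networks with hypothetical genes: g1 is the hub in "young", g2 in "old".
adj_young = {"g1": {"g2": 1, "g3": 1, "g4": 1},
             "g2": {"g1": 1}, "g3": {"g1": 1}, "g4": {"g1": 1}}
adj_old = {"g2": {"g1": 1, "g3": 1, "g4": 1},
           "g1": {"g2": 1}, "g3": {"g2": 1}, "g4": {"g2": 1}}
diff = differential_centrality(adj_old, adj_young)
```

in this toy case the gene that becomes a hub in the old network gains both pagerank and degree, which mirrors the stringent filter described above (a change in pagerank together with a non-zero change in degree).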
we further performed a similar analysis for the scrna-seq profile of interstitial macrophages(ims) in lungs and found literature support for the activity of enriched pathways (supporting file- ). whereas gene-set enrichment output for important genes in older ims had some similarity with results from at cells as both seem to have higher pro-inflammatory response pathway such as t cell activation and jak/stat signalling. however, unlike at cells, ageing in ims seem to cause an increase in glycolysis and pentose phosphate pathway. higher glycolysis and pentose phosphate pathway activity levels have been previously reported to be involved in the pro-inflammatory response in macrophages by viola et al. [ ] . in our results, ras pathway was not enriched significantly for genes with a higher importance in older macrophages. such results show that the pro-inflammatory pathways activated due to aging could vary among different cell-types in lung. for the same type of cells, the predicted networks for old and young cells seem to have higher overlap after graph-wavelet based filtering. the label "raw" here means that, both networks (for old and young) were inferred using unfiltered scrna-seq profiles. wheres the same result from denoised scrna-seq profile is shown as filtered. networks were inferred using correlation-based co-expression. in current pandemic due to sars-cov- , a trend has emerged that older individuals have a higher risk of developing severity and lung fibrosis than the younger population. since our analysis revealed changes in the influence of genes in lung cells due to ageing, we compared our results with expression profiles of lung infected with sars-cov- published by blanco-melo et al. [ ] . recently it has been shown that at cells predominantly express ace , the host cell surface receptor for sars-cov- attachment and infection [ ] . thus covid infection could have most of the dominant effect on at cells. 
we found that genes with significant upregulation in sars-cov- infected lung also had higher pagerank in gene-network inferred for older at cells (fig. a) . we also repeated the process of network inference and calculating differential centrality among old and young using all types of cells in the lung together (supporting file- ). we performed gene-set enrichment for genes up-regulated in sars-cov- infected lung. majority of the panther pathway terms enriched for genes up-regulated in sars-cov- infected lung also had enrichment for genes with higher pagerank in old lung cells (combined). total out of significantly enriched panther pathways for genes up-regulated in covid- infected lung, were also enriched for genes with higher pagerank in older at cells in either of the two data-sets used here ( in angelids et al., in kimmel et al. data-based results). among the top enriched wikipathway terms for genes up-regulated in covid infected lung, has significant enrichment for genes with higher pagerank in old at cells (supporting file- ). however, the term type-ii interferon signalling did not have significant enrichment for genes with higher pagerank in old at cells. we further investigated enriched motifs of transcription factors in promoters of genes up-regulated in covid infected lungs (supplementary methods). for promoters of genes up-regulated in covid infected lung top two enriched motifs belonged to irf (interferon regulatory factor) and ets family tfs. notice that etv belong to sub-family of ets groups of tfs. further analysis also revealed that most of the genes whose expression is positively cor-related with etv in old at cells is up-regulated in covid infected lung. in contrast, genes with negative correlation with etv in old at cells were mostly down-regulated in covid infected lung. a similar trend was found for stat gene. however, for erg gene with higher pagerank in young at cell, the trend was the opposite. 
in comparison to genes with negative correlation, genes positively correlated with erg in old at cells had more down-regulation in covid infected lung. such a trend shows that a few tfs like etv and stat , with higher pagerank in old at cells, could have a role in poising or activation of genes which gain a higher expression level on covid infection.

shown for erg, which has higher pagerank in young at cells. most of the genes which had a positive correlation with etv and stat expression in old murine at cells were up-regulated in covid infected lung, whereas for erg the trend is the opposite. genes positively correlated with erg in old at had more down-regulation than genes with negative correlation. such results hint that tfs whose influence (pagerank) increases during ageing could be involved in activating or poising the genes up-regulated in covid infection.

inferring regulatory changes in pure primary cells due to ageing and other conditions using single-cell expression profiles has tremendous potential for various applications. such applications could be understanding the cause of development of a disorder or revealing signalling pathways and master regulators as potential drug targets. hence, to support such studies, we developed gwnet to assist biologists in the work-flow for graph-theory based analysis of single-cell transcriptomes. gwnet improves inference of regulatory interactions among genes using a graph-wavelet based approach to reduce noise due to technical issues or cellular biochemical stochasticity in gene-expression profiles. we demonstrated the improvement in gene-network inference using our filtering approach with benchmark data-sets from the dream consortium and several single-cell expression profiles. using different ways of inferring networks, we showed how our approach for filtering gene-expression can help gene-network inference methods. our comparison with other imputation and smoothing methods and with graph-fourier based filtering showed that graph-wavelet filtering is more adaptive to changes in the expression level of genes with the changing neighborhood of cells. thus graph-wavelet based denoising is a conceptually different approach for pre-processing of gene-expression profiles. there is a huge body of literature on inferring gene-networks from bulk gene-expression profiles and utilizing them to find differences between two groups of samples. however, applying classical procedures on single-cell transcriptome profiles has not proved to be effective. our method seems to resolve this issue by increasing consistency and overlap among gene-networks inferred using expression from different sources (batches) for the same cell-type, even if each data-set was filtered independently. such an increase in overlap among networks predicted from independently processed data-sets from different sources hints that estimated dependencies among genes reach closer to true values after graph-wavelet based denoising of expression profiles. having network predictions closer to true values increases the reliability of comparison of a regulatory pattern between two groups of cells. moreover, chow and chen [ ] have recently shown that age-associated genes identified using bulk expression profiles of the lung are enriched among those induced or suppressed by sars-cov- infection. however, they did not perform analysis with a systems-level approach. our analysis highlighted ras and jak/stat pathways to be enriched for genes with stronger influence in old at cells and genes up-regulated in covid infected lung. ras/mapk signalling is considered essential for self-renewal of at cells [ ] . similarly, the jak/stat pathway is known to be activated in the lung during injury [ ] and to influence surfactant quality [ ] .
although we have used murine aging-lung scrna-seq profiles, our analysis provides an important insight: regulatory patterns and master-regulators in old at cells are in such a configuration that they could be predisposing these cells to a higher level of ras and jak/stat signalling. androgen receptor (ar), which has been implicated in male pattern baldness and in the increased risk of males towards covid infection [ ] , had higher pagerank and degree in old at cells. however, further investigation is needed to associate ar with severity on covid infection due to ageing. on the other hand, in young at cells, we find a high influence of genes involved in histamine h receptor-mediated signalling, which is known to regulate allergic reactions in lungs [ ] . another benefit of our approach of analysis is that it can highlight a few specific targets for further study for therapeutics. for example, jnk, a kinase that binds and phosphorylates c-jun, is being tested in clinical trials for pulmonary fibrosis [ ] . androgen deprivation therapy has been shown to provide partial protection against sars-cov- infection [ ] . in the same vein, our analysis hints that etv could also be considered as a drug-target to reduce the effect of ageing induced ras pathway activity in the lung.

we used the term noise in gene-expression according to its definition by several researchers such as raser and o'shea [ ] , namely the measured level of variation in gene-expression among cells supposed to be identical. hence we first made a base-graph (network) where supposedly identical cells are connected by edges. for every gene we use this base-graph and apply the graph-wavelet transform to get an estimate of the variation of its expression in every sample (cell) with respect to other connected samples at different levels of graph-spectral resolution. for this purpose, we first calculated distances among samples (cells).
to get a better estimate of distances among samples (cells) one can perform dimension reduction of the expression matrix using tsne [ ] or principal component analysis. we considered every sample (cell) as a node in the graph and connected two nodes with an edge only when one of them was among the k-nearest neighbours of the other. here we decide the value of k in the range of - , based on the number of samples (cells) in the expression data-sets. thus we calculated the preliminary adjacency matrix using k-nearest neighbours (knn) based on the euclidean distance metric between samples of the expression matrix. we used this adjacency matrix to build a base-graph. thus each vertex in the base-graph corresponds to a sample, and each edge weight to the euclidean distance between the two samples it connects. the weighted graph $G$ built using the knn based adjacency matrix comprises a finite set of vertices $V$ which correspond to cells (samples), a set of edges $E$ denoting connections between samples (where they exist) and a weight function which gives non-negative weighted connections between cells (samples). this weighted graph can also be defined by an $N \times N$ ($N$ being the number of cells) weighted adjacency matrix $A$, where $A_{ij} = 0$ if there is no edge between cells $i$ and $j$, and $A_{ij} = \mathrm{weight}(i, j)$ if there exists an edge between $i$ and $j$. the degree of a cell in the graph is the sum of the weights of the edges incident on that cell. the diagonal degree matrix $D$ of this graph has entries $D_{ij} = d(i)$ if $i = j$, and $0$ otherwise. a non-normalized graph laplacian operator $L$ for a graph is defined as $L = D - A$. the normalized form of the graph laplacian operator is defined as

$$L^{norm} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}.$$

the two laplacian operators produce different eigenvectors [ ] . we have used the normalized form of the laplacian operator for the graph between cells. the graph laplacian is further used for graph fourier transformation of signals on nodes (see supplementary methods) ([ ] [ ] ).
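the construction of the base-graph described above can be sketched as follows. this is a minimal illustration with numpy, not the gwnet code: the function name `knn_graph_laplacian` and the toy coordinates are assumptions, edge weights are taken as euclidean distances as stated in the text, and duplicate cells at distance zero would receive a zero weight (that is, no edge) in this simple version.

```python
import numpy as np


def knn_graph_laplacian(X, k):
    """Build a symmetric kNN adjacency matrix and its normalized Laplacian.

    X is an (n_cells, n_features) array; k is the number of neighbours.
    Returns (A, L_norm) with L_norm = D^{-1/2} (D - A) D^{-1/2}.
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # pairwise Euclidean distances
    A = np.zeros((n, n))
    for i in range(n):
        order = [j for j in np.argsort(dist[i]) if j != i][:k]
        for j in order:
            # Edge if one cell is among the k nearest neighbours of the other.
            A[i, j] = dist[i, j]
            A[j, i] = dist[i, j]
    deg = A.sum(axis=1)                          # weighted degree d(i)
    L = np.diag(deg) - A                         # non-normalized Laplacian
    inv_sqrt = np.zeros(n)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    L_norm = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    return A, L_norm


# Toy "cells" in a two-dimensional reduced space (hypothetical values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
A, L_norm = knn_graph_laplacian(X, k=2)
```

the eigenvalues of the normalized laplacian lie between 0 and 2, which is convenient when the same kernel is to be reused across graphs of different sizes.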
for filtering in the fourier domain, we used a chebyshev filter on the gene expression profile. we took the expression of one gene at a time, considered it as a signal and projected it onto the raw graph object (where each vertex corresponds to a sample) [ ] . we took the forward fourier transform of the signal, filtered the signal using the chebyshev filter in the fourier domain and then inverse transformed the signal to calculate the filtered expression. this procedure was repeated for every gene, which finally gave us the filtered gene expression. the spectral graph wavelet transform entails choosing a non-negative real-valued kernel function $g$ which can behave as a band-pass filter and is similar to the fourier transform. the re-scaled kernel function of the graph laplacian gives the wavelet operator, which eventually produces graph wavelet coefficients at each scale. using continuous functional calculus one can define a function of a self-adjoint operator on the basis of the spectral representation of the graph; for a graph with a finite-dimensional laplacian, this can be achieved using the eigenvalues and eigenvectors of the laplacian $L$ [ ] . the wavelet operator is given by $T_g = g(L)$, and $T_g f$ gives the wavelet coefficients for a signal $f$ at scale $s = 1$. this operator acts on the eigenvectors $u_\ell$ as $T_g u_\ell = g(\lambda_\ell) u_\ell$. hence, for any graph signal, the operator $T_g$ acts on the signal by adjusting each graph fourier coefficient as

$$\widehat{T_g f}(\ell) = g(\lambda_\ell)\, \hat{f}(\ell),$$

and the inverse fourier transform is given as

$$(T_g f)(m) = \sum_{\ell} g(\lambda_\ell)\, \hat{f}(\ell)\, u_\ell(m).$$

the wavelet operator at every scale $s$ is given as $T_g^s = g(sL)$. these wavelet operators are localized to obtain individual wavelets by applying them to $\delta_n$, with $\delta_n$ being a signal with value 1 on vertex $n$ and zero otherwise [ ] . thus, considering the coefficients at every scale, the signal can be recovered through the inverse wavelet transform. here, instead of filtering in the fourier domain, we took the wavelet coefficients of each gene expression signal at different scales. thresholding was applied on each scale to filter the wavelet coefficients.
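to make the operator $T_g^s = g(sL)$ concrete, the sketch below computes wavelet coefficients of one signal through a full eigendecomposition of the laplacian and recovers the signal by least squares. several choices here are assumptions for illustration and are not taken from gwnet: the band-pass kernel $g(x) = x e^{1-x}$, the low-pass scaling response $e^{-\lambda}$, the scale values, and the least-squares inverse. for large graphs a polynomial (for example chebyshev) approximation would normally replace the eigendecomposition.

```python
import numpy as np


def bandpass_kernel(x):
    # Hypothetical kernel g(x) = x * exp(1 - x): zero at x = 0, maximum 1 at x = 1.
    return x * np.exp(1.0 - x)


def wavelet_frame(L, scales):
    """Stack a low-pass operator and the wavelet operators g(sL) row-wise."""
    lam, U = np.linalg.eigh(L)                   # L = U diag(lam) U^T
    responses = [np.exp(-lam)]                   # assumed low-pass scaling response
    responses += [bandpass_kernel(s * lam) for s in scales]
    # (U * r) @ U.T equals U diag(r) U^T, one N x N operator per response.
    return np.vstack([(U * r) @ U.T for r in responses])


def forward(W, f):
    """Wavelet coefficients of signal f for all vertices and scales."""
    return W @ f


def inverse(W, coeffs):
    """Least-squares reconstruction of the signal from (filtered) coefficients."""
    f_rec, *_ = np.linalg.lstsq(W, coeffs, rcond=None)
    return f_rec


# Normalized Laplacian of a 5-node path graph used as a toy base-graph.
A = np.diag(np.ones(4), 1)
A = A + A.T
d = A.sum(axis=1)
L = np.eye(5) - A / np.sqrt(np.outer(d, d))

f = np.array([1.0, 1.2, 0.9, 5.0, 1.1])          # expression of one gene across cells
W = wavelet_frame(L, scales=[1.0, 2.0, 4.0])
coeffs = forward(W, f)
f_back = inverse(W, coeffs)
```

without any thresholding the reconstruction returns the original signal; denoising would set small entries of `coeffs` to zero before calling `inverse`.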
we applied both hard and soft thresholding on wavelet coefficients. for soft thresholding, we implemented the well-known methods sureshrink and bayesshrink. finding an optimal threshold for wavelet coefficients for denoising linear signals and images has remained a subject of intensive research. we evaluated both soft and hard thresholding approaches and tested an information-theoretic criterion known as the minimum description length principle (mdl). using our tool gwnet, a user can choose from multiple options for finding the threshold, such as visushrink, sureshrink and mdl. here, we have used hard thresholding for most of the data-sets, as proper soft thresholding of graph-wavelet coefficients is itself a topic of intensive research and may need further fine-tuning. one can also use a hard-threshold value based on the best overlap between the predicted gene-network and protein-protein interactions (ppi). while applying it on multiple datasets, we realized that the threshold cutoffs estimated by the mdl criterion and by the best overlap of the predicted network with known interactions and ppi were in the range of - percentile. for comparing predicted networks from multiple data-sets, we needed a uniform percentile cutoff to threshold graph-wavelet coefficients. hence, for uniform analysis of several datasets, we have set the default threshold value of percentile. in default mode, every wavelet coefficient with absolute value less than percentile was therefore made equal to zero. the gwnet tool is flexible, and any network inference method can be plugged into it for making regulatory inferences using a graph-theoretic approach. here, for single-cell rna-seq data, we have used gene-expression values in the form of fpkm (fragments per kilobase of exon model per million reads mapped). we pre-processed single-cell gene expression by quantile normalization and log transformation. to start with, we used spearman and pearson correlation to obtain a simple estimate of the inter-dependencies among genes.
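the hard and soft thresholding rules discussed in this section can be written compactly. the sketch below is an illustration rather than the gwnet code: the function names are hypothetical, the percentile is computed separately for each scale (an assumption based on the statement that thresholding was applied on each scale), and no default percentile is set because the value is not recoverable from the text.

```python
import numpy as np

def hard_threshold(coeffs, percentile):
    """zero every coefficient whose magnitude is below the percentile of its scale.

    coeffs has shape (n_scales, n_cells); percentile is between 0 and 100.
    """
    mag = np.abs(coeffs)
    cut = np.percentile(mag, percentile, axis=1, keepdims=True)
    return np.where(mag >= cut, coeffs, 0.0)

def soft_threshold(coeffs, t):
    """shrink every coefficient towards zero by t, zeroing those smaller than t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
```

hard thresholding keeps the surviving coefficients unchanged, whereas soft thresholding also shrinks them; choosing `t` for the soft rule is what methods such as sureshrink address.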
we also used aracne (algorithm for the reconstruction of accurate cellular networks) to infer the network among genes. aracne first computes the mutual information for each gene pair. it then considers all possible triplets of genes and applies the data processing inequality (dpi) to remove indirect interactions. according to the dpi, if gene $i$ and gene $j$ do not interact directly with each other but show dependency via gene $k$, the following inequality holds: $I(g_i, g_j) \le \min\big(I(g_i, g_k), I(g_j, g_k)\big)$, where $I(g_i, g_j)$ represents the mutual information between gene $i$ and gene $j$. aracne also removes interactions with mutual information less than a particular threshold eps. we have used an eps value of . recently, skinnider et al. [ ] showed the superiority of two measures of proportionality, rho ($\rho$) and phi ($\phi_s$) [ ], for estimating gene co-expression networks using single-cell transcriptome profiles. hence we also evaluated the benefit of graph-wavelet based denoising of gene-expression with the measures of proportionality $\rho$ and $\phi_s$. the measure of proportionality $\phi$ can be defined as $\phi(g_i, g_j) = \mathrm{var}(g_i - g_j) / \mathrm{var}(g_i)$, where $g_i$ is the vector containing log values of the expression of gene $i$ across multiple samples (cells) and $\mathrm{var}()$ represents the variance function. the symmetric version of $\phi$ can be written as $\phi_s(g_i, g_j) = \mathrm{var}(g_i - g_j) / \mathrm{var}(g_i + g_j)$, whereas rho can be defined as $\rho(g_i, g_j) = 1 - \mathrm{var}(g_i - g_j) / \big(\mathrm{var}(g_i) + \mathrm{var}(g_j)\big)$. to estimate both measures of proportionality, $\rho$ and $\phi$, we used the 'propr' package [ ]. the networks inferred from filtered and unfiltered gene-expression were compared to the ground truth. the ground truth for the dream challenge dataset was already available, while for single-cell expression we assembled the ground truth from the hippie (human integrated protein-protein interaction reference) database [ ]. we considered all possible edges in the network and sorted them based on the significance of their edge weights. we calculated the area under the receiver operator curve for both raw and filtered networks by comparing against edges in the ground truth.
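the proportionality measures named above are simple functions of variances, so they can be illustrated directly. the sketch below is not the 'propr' package and does not reproduce its log-ratio transformation step; it assumes the inputs are already log-transformed expression vectors and uses the commonly stated definitions $\phi = \mathrm{var}(g_i - g_j)/\mathrm{var}(g_i)$, $\phi_s = \mathrm{var}(g_i - g_j)/\mathrm{var}(g_i + g_j)$ and $\rho = 1 - \mathrm{var}(g_i - g_j)/(\mathrm{var}(g_i) + \mathrm{var}(g_j))$. the function names are hypothetical.

```python
import numpy as np

def phi(gi, gj):
    # asymmetric: scaled by the variance of the first gene only
    return np.var(gi - gj, ddof=1) / np.var(gi, ddof=1)

def phi_s(gi, gj):
    # symmetric version of phi
    return np.var(gi - gj, ddof=1) / np.var(gi + gj, ddof=1)

def rho(gi, gj):
    # ranges from -1 to 1; 1 means perfectly proportional on the log scale
    return 1.0 - np.var(gi - gj, ddof=1) / (np.var(gi, ddof=1) + np.var(gj, ddof=1))
```

two genes whose log expression differs only by a constant offset have a log-ratio with zero variance, giving $\phi = \phi_s = 0$ and $\rho = 1$.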
the receiver operator curve is a standard performance evaluation metric from the field of machine learning, which has been used in the dream evaluation method with some modifications. the modification to the receiver operating curve here is that for the x-axis, instead of the false-positive rate, we used the number of edges sorted according to their weights. for evaluation, all possible edges, sorted based on their weights in the network, are taken from the gene-networks inferred from the filtered and raw graphs. we calculated improvement by measuring the fold change between raw and filtered scores. we compared the results of our approach of graph-wavelet based denoising with other methods meant for imputation or for reducing noise in scrna-seq profiles. for comparison we used graph-fourier based filtering [ ], magic [ ], scimpute [ ], dca [ ], saver [ ], randomly [ ] and knn-impute [ ]. brief descriptions of the other methods and the corresponding parameters used are given in the supplementary methods. the bulk gene-expression data used here for evaluation was downloaded from the dream portal (http://dreamchallenges.org/project/dream- network-inference-challenge/). the single-cell expression profile of mesc generated using different protocols [ ] was downloaded from the geo database (geo id: gse ). the single-cell expression profile of pancreatic cells from individuals of different age groups was downloaded from the geo database (geo id: gse ). the scrna-seq profile of murine aging lung published by kimmel et al. [ ] is available with geo id: gse , while the aging lung scrna-seq data published by angelidis et al. [ ] is available with geo id: gse . the code for graph-wavelet based filtering of gene-expression is available at http://reggen.iiitd.edu.in: /graphwavelet/index.html. the code is also present at https://github.com/reggenlab/gwnet/ and supporting files are present at https://github.com/reggenlab/gwnet/tree/master/supporting_files.
an integrative approach for causal gene identification and gene regulatory pathway inference
single-cell transcriptomics unveils gene regulatory network plasticity
chemogenomic profiling of plasmodium falciparum as a tool to aid antimalarial drug discovery
supervised, semi-supervised and unsupervised inference of gene regulatory networks
reverse engineering cellular networks
evaluating measures of association for single-cell transcriptomics
evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data
scenic: single-cell regulatory network inference and clustering
scode: an efficient regulatory network inference algorithm from single-cell rna-seq during differentiation
gene regulatory network inference from single-cell data using multivariate information measures
characterizing noise structure in single-cell rna-seq distinguishes genuine from technical stochastic allelic expression
noise in gene expression: origins, consequences, and control, science
comparative assessment of differential network analysis methods
murine single-cell rna-seq reveals cell identity- and tissue-specific trajectories of aging
wisdom of crowds for robust gene network inference
genenetweaver: in silico benchmark generation and performance profiling of network inference methods
enhancing experimental signals in single-cell rna-sequencing data using graph signal processing
comparative analysis of single-cell rna sequencing methods
a gene regulatory network in mouse embryonic stem cells
recovering gene interactions from single-cell data using data diffusion
an accurate and robust imputation method scimpute for single-cell rna-seq data
single-cell rna-seq denoising using a deep count autoencoder
saver: gene expression recovery for single-cell rna sequencing
a random matrix theory approach to denoise single-cell data
missing value estimation methods for dna microarrays
single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns
enrichr: interactive and collaborative html gene list enrichment analysis tool
histamine stimulation of surfactant secretion from rat type ii pneumocytes
aging impairs vegf-mediated, androgen-dependent regulation of angiogenesis
dysfunction of pulmonary surfactant mediated by phospholipid oxidation is cholesterol-dependent
age-dependent changes in the pulmonary renin-angiotensin system are associated with severity of lung injury in a model of acute lung injury in rats
mapk and jak-stat signaling pathways are involved in the oxidative stress-induced decrease in expression of surfactant protein genes
transcription factor etv is essential for the maintenance of alveolar type ii cells, proceedings of the national academy of sciences of the united states of
targeted deletion of jun/ap- in alveolar epithelial cells causes progressive emphysema and worsens cigarette smoke-induced lung inflammation
androgen receptor and androgen-dependent gene expression in lung
the metabolic signature of macrophage responses
imbalanced host response to sars-cov- drives development of covid-
single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses
the aging transcriptome and cellular landscape of the human lung in relation to sars-cov-
jak-stat pathway activation in copd, the european
androgen hazards with covid-
the h histamine receptor regulates allergic lung responses
late breaking abstract -evaluation of the jnk inhibitor, cc- , in a phase b pulmonary fibrosis trial
androgen-deprivation therapies for prostate cancer and risk of infection by sars-cov- : a population-based study (n = )
visualizing data using t-sne
discrete signal processing on graphs: frequency analysis
wavelets on graphs via spectral graph theory
how should we measure proportionality on relative gene expression data?
propr: an r-package for identifying proportionally abundant features using compositional data analysis
hippie v . : enhancing meaningfulness and reliability of protein-protein interaction networks
an atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics

we thank dr gaurav ahuja for providing us valuable advice on analysis of single-cell expression profile of ageing cells.

none declared.

vibhor kumar is an assistant professor at iiit delhi, india. he is also an adjunct scientist at genome institute of singapore. his interests include genomics and signal processing.
divyanshu srivastava completed his thesis on graph signal processing for his masters degree at the computational biology department in iiit delhi, india. he has applied graph signal processing on protein structures and gene-expression data-sets.
shreya mishra is a phd student at the computational biology department in iiit delhi, india. her interests include data sciences and genomics.

• we found that graph-wavelet based denoising of gene-expression profiles of bulk samples and single-cells can substantially improve gene-regulatory network inference.
• more consistent prediction of gene-networks due to denoising leads to reliable comparison of predicted networks from old and young cells to study the effect of ageing using single-cell transcriptomes.
• our analysis revealed biologically relevant changes in regulation due to aging in lung pneumocyte type ii cells, which had similarity with the effects of covid infection in human lung.
• our analysis highlighted influential pathways and master regulators which could be a topic of further study for reducing severity due to ageing.

key: cord- -xe ccw j authors: mayer, david; russell, seth; wilson, melissa p.; kahn, michael g.; wiley, laura k. title: developing and deploying a scalable computing platform to support mooc education in clinical data science date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: xe ccw j one of the challenges of teaching applied data science courses is managing individual students’ local computing environment. this is especially challenging when teaching massively open online courses (moocs) where students come from across the globe and have a variety of access to and types of computing systems. there are additional challenges with using sensitive health information for clinical data science education. here we describe the development and performance of a computing platform developed to support a series of moocs in clinical data science. this platform was designed to restrict and log all access to health datasets while also being scalable, accessible, secure, privacy preserving, and easy to access. over the months the platform has been live it has supported the computation of more than students from countries. one of the major challenges faced by data science educators is managing student computing environments. typically educators must choose between teaching students how to set up an environment on their own computer or hosting a pre-configured server. setting up a local environment on each student computer, while authentic, is challenging and often time-consuming because of the variety of operating systems and sometimes insufficient user permissions (e.g., for students using employer-provided computers). server-based solutions shift managing the complexity of the computing environment to instructors, which reduces the authenticity of learning to manage the entire data science pipeline. a number of commercial solutions, like rstudio cloud, have emerged to support educators providing a hosted solution without having to manage servers directly. while server-based solutions have some costs associated, for students with internet access they can increase equity of education as all students have equal computational power regardless of their own computing hardware. 
these technology challenges are magnified for those teaching data science focused massively open online courses (moocs). moocs are typically offered to thousands of learners across the globe completely asynchronously, increasing the number of unique computing environments and reducing instructor contact for individual-level support. previous data science moocs have devoted an entire week (~ hour) course to setting up students' computational environments. others use technology embedded in the platform's learning management system (e.g., shared jupyterhub). importantly, these data science moocs have not had a particular domain focus and thus can use openly available or non-sensitive data in their courses. while moocs are an attractive solution to the increasing demand for a clinical data science workforce, it is not clear how to support student computing environments when working with sensitive healthcare data. we developed a series of moocs ("specialization") on clinical data science that uses real clinical data (mimic-iii demo database). at the time, all individuals seeking access were required to complete data use agreements. setting up a student's local environment would not allow us to restrict or track data download and potential sharing with external entities. built-in data science solutions on the course hosting site had the same limitation and additionally would not allow instructors to restrict data access to only those students who had completed a data use agreement. in response to these challenges, we sought to create a hosted computing platform that would both manage student access to restricted materials and accommodate the unique challenges posed by moocs. the primary goal of developing the computing platform was to create a system that would allow instructors to restrict and monitor access to sensitive clinical data to only those students who had signed required data use agreements.
the secondary goals of the computing platform were to support challenges inherent in mooc education and hosted computing, namely: ) availability and scalability, ) secure and privacy preserving, and ) easy independent access. given the world-wide access and scope of moocs, the platform had to be available across the globe hours a day, days a week, and be able to support potentially hundreds of thousands of learners. as with all server-based solutions, especially with those hosting clinical data, the platform needed to be secure and have full logging of user activity. additionally, while it is not legally clear the extent to which moocs are subject to the family educational rights and privacy act (ferpa) in order to comply with the course hosting company's privacy policies, the computing platform needed to preserve student privacy. finally, given the limited contact with instructors and large student to instructor ratio, the platform onboarding process and use had to be as simple as possible. during development of the computing platform we had a number of resources that shaped the final product. first, we had previously run a version of the specialization as a regular university course that used our university approved google cloud platform (gcp) infrastructure to host student computation. this experience led us to develop a partnership with google cloud healthcare to provide financial support for the creation of the specialization and the hosting of student computation. second, the google cloud healthcare team routinely hosts healthcare datathons using gcp and mimic-iii datasets. the hosting guide, system configurations, and other pipelines to support these datathons are all publicly available on github. importantly, outside of the choice to use gcp, these resources informed, but did not dictate, the final computing platform created. 
gcp is a suite of cloud computing tools that includes support for computing, data storage and databases, networking, identity and security management tools, and advanced analytics for big data applications with all resources organized, managed, and billed to individual projects. organizations may have multiple projects and resources can be easily moved between projects as needed. we created a google organization (learnclinicaldatascience) for the computing platform that consisted of two sets of projects -one for developing and prototyping platform improvements (development) and the other used in production for hosting student work (production). within each set of projects, one is devoted to managing student enrollment and platform monitoring (management) and the other for hosting student work (computing). all projects use google compute engine (i.e., virtual servers), cloud operations logging and monitoring, and bigquery. the management project handles student enrollment and access with custom r-scripts, google forms, google groups, cloud identity and access management (iam), and sendgrid email service for student communication. the computing project hosts student computing using rstudio server pro (v . . - ) and r (v . . ). we developed a complete version of the computing platform in the development projects and then performed a series of beta tests. initial beta tests consisted of project team members (dm, lw) creating user accounts and performing basic computational tasks to ensure that the account management process performed as designed. basic security checks and penetration tests were conducted by sr to identify any obvious security risks in the platform. we then conducted a group beta test of local users to test simultaneous user registration and to develop an understanding of computing resources required for sample computational workloads similar to those used in the course. 
changes to the computing platform and course materials were made following the group beta test and the platform re-tested by project team members (dm, lw). after completion of all beta testing informed improvements, copies of the development machines were created in the production projects. the final computing platform was put into production in january . after moving to production, the development projects were suspended (e.g., shut down, but available for access as needed) and used intermittently to identify the impact of software updates and prototype new platform modifications. we analyzed data from the first months of computing platform usage (january , -august , ) to understand overall platform performance and associated costs. we identified the number of distinct students who registered for the computing platform (and accepted the data use agreement), and created a frequency map (by country) of all registered students using the city and/or country reported in their signed data use agreement. computing logs (e.g., unique r sessions, r code inputs, and queries run in bigquery) were analyzed to determine overall student usage (e.g., number with at least one entry of each type). we also investigated the average volume of platform usage across each day of the week, both with respect to a constant timezone (e.g., overall computing load at any single time point), and student timezone (e.g., what time of day students access course work). all computing logs are captured in utc. we inferred student local time by mapping student's reported country/city combination to the regionally observed local time zone. when city level data was not provided, a best attempt to assign a timezone was made by assuming the most prevalent timezone for the area. system performance and reliability were assessed by analyzing all system performance and reliability logs to identify types of errors encountered and total system downtime. 
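the conversion from utc log timestamps to student local time described above can be sketched as follows. this is an illustration only, not the authors' r code: the lookup table `UTC_OFFSETS`, the function names and the sample timestamps are hypothetical, and fixed utc offsets are an assumption that only holds for regions with a single time zone and no daylight saving time (a fuller version would need a time zone database).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# hypothetical lookup from reported country to a fixed utc offset
UTC_OFFSETS = {
    "india": timedelta(hours=5, minutes=30),
    "japan": timedelta(hours=9),
}

def local_day_hour(utc_stamp, country):
    """map an iso-formatted utc timestamp to (weekday, hour) in local time."""
    utc_time = datetime.fromisoformat(utc_stamp).replace(tzinfo=timezone.utc)
    local = utc_time.astimezone(timezone(UTC_OFFSETS[country]))
    return DAYS[local.weekday()], local.hour

def usage_by_local_time(log_rows):
    """count log entries per (weekday, hour) cell; log_rows holds (utc_stamp, country)."""
    return Counter(local_day_hour(stamp, country) for stamp, country in log_rows)
```

counting the same rows with the raw utc timestamp instead gives the overall computing load at a single point in time, which is the other view the analysis describes.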
finally, all available discussion forums on the main course hosting site (coursera), were manually reviewed (dm) and categorized by type of question asked to assess the overall frequency of questions generated by the computing platform. all platform metrics were analyzed with r version . . and a variety of packages for data processing, graphing and reporting. [ ] [ ] [ ] [ ] the clinical data science specialization and the associated course computing platform launched on january , . course programming assignments consist of html-based tutorials with associated rmarkdown documents requiring ~ mb of storage. as of august , a total of , students had registered for the first course in the specialization -where students are on boarded to the computing platform. a diagram of the final computing platform implemented is shown in figure . each google project (management, computing) consists of one or more virtual servers and a set of associated bigquery datasets. the management server has vcpu, . gb ram, and a gb ssd boot disk running ubuntu . . this machine is used to process student enrollments and manage all platform logging activities. the associated bigquery datasets consist of student management logs ( . gb), data use agreements ( mb), and r-session, r-console, and bigquery web ui logs ( . tb). earlier versions of the computing platform also used a license management server ( . vcpu, . gb ram) to host the license key for rstudio server pro within the management project. all servers in the management project have firewalls limiting access to the university of colorado campus. the computing server has vcpu, . gb ram, a gb ssd boot disk and an additional gb ssd disk for student file storage. students complete their coursework on this machine using r and a professional license of rstudio server. 
the rstudio server is configured to have all course related packages preinstalled, restrict terminal access, and set limits on file upload sizes (to attempt to limit loading of non-course data). a firewall is configured such that only https encrypted web traffic on the rstudio interface is accepted. students' browsers must support a minimum tls version of . and http strict transport security is enabled. each student , gold-standard labels for computational phenotyping and nlp courses, and a series of sample clinical notes generated from medical transcription training samples. every server in the platform has full logging of all system commands and resource usage (i.e., free memory, free cpu cycles, and free disk space) and os patches are applied automatically at regular intervals. students register for access through a vanity url (tech-registration.learnclinicaldatascience.org) that connects them to a google form where they sign the mimic-iii data use agreement and platform terms of service (e.g., only use the platform for course work, do not attempt to identify other students). these form responses are stored in a google sheet that is also accessible from bigquery. within the management project, the management server runs a custom r-script at -minute intervals checking for new registrations. this script determines if the student is new or returning. new students are assigned a unique student id number, provisioned a linux user account on the computing server (which is linked to their google account), added to a private google group that has been assigned the appropriate iam roles for accessing course bigquery datasets, and given a calculated platform expiration date (~ mo). for returning students (i.e., those with an expired account), their original student id number is identified, their prior linux account is re-associated with their google account, and they are re-added to the google group.
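the new-versus-returning logic of the registration script can be illustrated with a small in-memory sketch. it is not the authors' r-script and does not call any google or linux account api: the function name, the record fields and the 180-day validity period are hypothetical, and account provisioning and group membership are reduced to a single `active` flag.

```python
from datetime import date, timedelta

def process_registrations(batch, students, today, valid_days=180):
    """apply one batch of registrations.

    batch: list of google account emails collected since the last run.
    students: dict mapping email to a record with student_id, active, expires.
    """
    next_id = max((s["student_id"] for s in students.values()), default=0) + 1
    # normalize and de-duplicate within the batch, keeping first-seen order,
    # so a repeated registration cannot produce a second, id-less entry
    for email in dict.fromkeys(e.strip().lower() for e in batch):
        record = students.get(email)
        if record is None:
            # new student: assign the next unused student id
            record = {"student_id": next_id, "active": False, "expires": None}
            students[email] = record
            next_id += 1
        # new and returning students are both (re)activated with a fresh expiry;
        # returning students keep their original student id
        record["active"] = True
        record["expires"] = today + timedelta(days=valid_days)
    return students
```

handling each distinct account once per batch is a deliberate choice here, since registrations that arrive together are otherwise easy to treat inconsistently.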
when these onboarding processes are complete, an email is sent to the student confirming their registration and providing links to the course resources. students can access course data through bigquery.learnclinicaldatascience.org (redirects to the computing project on bigquery) and r/rstudio through rstudio.learnclinicaldatascience.org. students log in to rstudio using their google account, no passwords or personally identifying information is available on the computing server. six months after user registration, the student's linux account-google link is removed. while students are not notified of this change, if they try to login they will receive a custom error that asks them to reregister (at this time all their data is preserved). from the launch of the computing platform on january , through august , , , students have requested access to the platform and datasets. of these registered students, , ( . %) have logged in to the computing platform, , ( . %) queried course data, and ( . %) have run r code on the platform. registered students' location reported on their data use agreement shows that they live in more than different countries. the overall volume of student registrants by country is provided in figure . figure a shows a summary of the weekly average number of logins by time/day of the week with respect to mt. figure b shows a summary of the weekly average number of code commands executed by time/day of the week with respect to the students' time zone. the most frequent time/ day combination was thursday at noon, which was driven by work performed by unique students from india (gmt+ : ) across different thursdays in the evaluation period. manual review confirmed that this activity was related to course content. overall, among students who ran r code on the platform, students entered a median of . [iqr: , . ] code commands with a single student entering , code commands. the median amount of storage used by students was < mb with a range of - . mb. 
over the weeks the computing platform has been running there were unexpected alerts or other errors, of which resulted in downtime where students were unable to access the system. the majority of alerts originated on the computing server (n= ); alerts were related to cpu and memory utilization (i.e., available cpu < %, free ram < %). five uptime alerts on the computing server were related to: dns outages (n= , downtime only for affected regions), rstudio server errors (n= ), and a license management server outage (n= ). the license management server had unexpected outages totaling just under hours of downtime; however, the majority (n= ) of these outages lasted less than min, the interval the computing server uses to confirm an active license. one incident occurred overnight in mt, accounted for approximately an hour and a half of downtime on the computing server, and left at least two students unable to access the machine. the management server had no unscheduled downtime. table provides a summary of the total number of unexpected alerts and computing platform outage durations. overall the computing platform has had a system uptime percentage of . %. there have been instances of user onboarding errors. first, although we require all students to register with an "@gmail.com" address, some accounts are treated in iam as an "@googlemail.com" address. one student registered with such an account and was unable to access the platform until we adjusted our onboarding process to account for this scenario. three students registered with the same google account multiple times within minutes (the onboarding script interval), which resulted in them unexpectedly being assigned to the same user account. when multiple registrations occurred in the same batch, a student id was assigned only to the first registration, with subsequent registrations labelled as na; however, account creation and account mapping were still performed for both accounts.
as the rstudio-google authentication map uses the most recently added mapping, all students accessed a single (shared) "na" user account. finally, students were not initially granted data access when registering. this was due to exceeding a google limit on iam users that can share a single role (n= , ). across the four launched courses of the clinical data science specialization there were forum posts available for analysis, with questions/issues raised (students can comment on forum posts to answer a question or echo concerns). of these issues, ( . %) related to the computing platform with the majority (n= , . %) due to the students not registering for access. the remaining included technical issues with students' registration (n= ) or students who had issues locating the resources they needed (n= ). managing student computing environments is a major challenge in data science education which is magnified when dealing with moocs and sensitive clinical data. when developing a series of clinical data science moocs we identified the need to develop a computing platform with the primary objective of restricting access to and logging access of sensitive health data. here we present the results of that work -a computing platform that can only be accessed by students who have completed all requirements for data access (signed data use agreement) where all data access (e.g., sql queries) and analysis (e.g., r-console commands) are logged and available for review. in addition to these fundamental objectives, we also had three technical requirements related to perceived challenges with hosted computing in moocs: ) availability and scalability, ) secure and privacy preserving, and ) easy independent access. although we prepared our courses with the potential for an international audience, it wasn't well known whether clinical data science coursework with a particular focus on customs/regulations in the united states would be of global interest. 
however given the global popularity of other data science moocs , we designed the platform to be accessible worldwide. indeed, we had students from around the globe register for access and since deployment of the platform at least one student has started an r session at every single hour of every day of the week. the global use of the platform is confirmed by analyzing platform usage relative to the students' time zone, where most students access the platform during more traditional learning hours ( am-midnight). this platform has also proven to be very stable with only incidents resulting in system-wide outages for only . hours across the , hours the platform has been available. one of the benefits of using a cloud infrastructure is that the platform has proven to be scalable on demand. initial deployments used larger servers and directory storage ( gb), as actual platform usage started we were able to initially reduce our storage to gb and eventually increase it to gb after a year of additional student access. similarly, we can adjust the server specifications (cpus and ram) as needed with limited downtime (simply requires restarting the server). this flexibility has had the additional benefit of allowing for cost control measures as we can shrink and grow the machine as needed. an additional benefit has been the query caching function within bigquery where identical repeated queries are not charged. these types of repeated queries are common for students working through set exercises. even without aggressive system size optimization, the computing platform development cost $ , and continuous access for more than students across months on the production platform only cost $ , . we took multiple steps to ensure the system security and student privacy. first, our platform design uses standard web-based security steps including enabling ssl and limiting server access for students as much as possible beyond those resources required for completing course content. 
in addition to logging all access and activity, we routinely monitor those logs for unapproved access attempts, including attempts to run system commands. even if a student somehow bypassed our permissions restricting access for viewing/modifying system files, all student files are labeled by student number only. additionally, although we use google groups for managing data access roles, by using an organization google account the membership of this group (and group enrollment status) is kept entirely hidden from students. to our knowledge, outside of the single instance of three students getting mapped to the same user account, no private student information has been inappropriately accessed. given the exceptionally high student to instructor ratios in moocs, it was critical that our platform be accessible without extensive instruction or hands-on attention. to this end students interact with only three sites, all with custom vanity urls: one for registration and two for platform access. by all available measures our approach has worked well. technical issues accounted for a minority of forum posts across the courses. the majority of the issues raised were related to students not having attempted to register, suggesting that the primary improvement is needed within our learning management system to highlight the need to access the external site. although a minority of students did have actual technical issues accessing the site due to registration errors, these issues were usually fixed on the same or the following day after being reported on the forum. although available data supports the easy accessibility of the platform, the number of students registered for the platform is dramatically lower than the number of course registrants. it is possible that this drop is due to unreported student issues with the platform. alternatively, moocs experience a well-known phenomenon where the number of students registering far exceeds the number who perform any course work, with even fewer completing the course.
while we have been pleasantly surprised by the overall efficacy and efficiency of the system, there are numerous limitations to our approach. first, and most importantly, this approach was not without cost. although, as highlighted above, the expenses are manageable by right-sizing the system, there is a non-trivial cost associated with hosting student computation. our numbers are also much smaller than would be expected for data science courses that use much larger datasets or computationally intensive algorithms. the mimic-iii demo data contains the records of only patients and, though the sample note corpus is larger, most of the computation performed on the platform happens within the database. second, although there is renewed interest in online computing solutions due to the covid- pandemic requiring remote education, this solution is likely over-engineered for the majority of university courses. additionally, due to platform security concerns we have not made the processing scripts available publicly. finally, we have found a number of unexpected benefits from the course platform. first, logistically, having full logging of all issued commands has been invaluable when answering student questions, both technical and around course content. when students report that they can't access course resources it's easy to pinpoint whether they simply haven't registered or if they are having some other issue. for course problems we are able to see what commands they are running and provide targeted recommendations for "common issues" students encounter. second, as educators and content creators, these logs also allow for valuable insight into how students interact with course materials. we hope to perform more robust studies of these data in the future to inform both clinical data science education best practices and to improve our own course materials.
we have created a computing platform to support clinical data science mooc education that has been scalable, globally available, secure, privacy preserving, and generally supported independent access by a large number of students. practitioners teaching data science in industry and academia: expectations, workflows, and challenges rstudio cloud -do, share, teach, and learn data science equity, scalability, and sustainability of data science infrastructure the data scientist's toolbox the democratization of data science education introduction to data science in python lack of talent, direction afflict healthcare data analytics plans learn clinical data science mimic-iii clinical database demo educational privacy in the online classroom: ferpa, moocs, and the big data conundrum hosting services & apis datathon support by google cloud healthcare compute engine: virtual machines (vms) cloud monitoring & logging cloud data warehouse google forms: free online surveys for personal use cloud identity and access management r: a language and environment for statistical computing austria: r foundation for statistical computing welcome to the tidyverse colorbrewer palettes ggplot " based publication ready plots a new r package for mapping global data the mooc pivot we thank our google partners, especially marianne slight, kate strasburger, and stuart o'brian for their support and their team's technical advice. we are especially grateful to the mit laboratory for computational physiology whose work and support allowed us to provide students with real clinical data. our beta testers and students who have provided feedback on the platform have all significantly improved our final production platform. 
finally we would like to thank the extraordinary team who have contributed to the creation of the clinical data science mooc: chan voong, christine mousavi, jay billups, janet corral, deborah keyek-franssen, jill taylor, jill lester, jaimie henthorn, aileen sanders, alesia blanchard, ashley boshoven, and benita bazemore-cook. computing platform use and development costs were supported by our partnership with google cloud healthcare, and complimentary rstudio server pro licenses were provided by the rstudio education team. key: cord- -nc yf x authors: wichmann, stefan; scherer, siegfried; ardern, zachary title: computational design of genes encoding completely overlapping protein domains: influence of genetic code and taxonomic rank date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nc yf x overlapping genes (olgs) with long protein-coding overlapping sequences are often excluded by genome annotation programs, with the exception of virus genomes. a recent study used a novel algorithm to construct olgs from arbitrary protein domain pairs and concluded that virus genes are best suited for creating olgs, a result which fitted with common assumptions. however, improving sequence evaluation using hidden markov models shows that the previous result is an artifact originating from dataset-database biases. when parameters for olg design and evaluation are optimized we find that . % of the constructed olg pairs score at least as highly as naturally occurring sequences, while . % of the artificial olgs cannot be distinguished from typical sequences in their protein family. constructed olg sequences are also indistinguishable from natural sequences in terms of amino acid identity and secondary structure, while the minimum nucleotide change required for overprinting an overlapping sequence can be as low as . % of the sequence. 
separate analysis of datasets containing only sequences from either archaea, bacteria, eukaryotes or viruses showed that, surprisingly, virus genes are much less suitable for designing olgs than bacterial or eukaryotic genes. an important factor influencing olg design is the structure of the standard genetic code. success rates in different reading frames strongly correlate with their code-determined respective amino acid constraints. there is a tendency indicating that the structure of the standard genetic code could be optimized in its ability to create olgs while conserving mutational robustness. the findings reported here add to the growing evidence that olgs should no longer be excluded in prokaryotic genome annotations. determining the factors facilitating the computational design of artificial overlapping genes may improve our understanding of the origin of these remarkable genetic constructs and may also open up exciting possibilities for synthetic biology. the triplet nature of the standard genetic code and double-stranded configuration of dna together enable more than one protein to be encoded within the same nucleotide sequence in different reading frames. this property of the code has long been known to be utilised in viruses [ , ] and there is increasing evidence for overlapping encoding in other organisms [ , , ] , including many genes fully embedded within other coding sequences in alternate reading frames [ ] . while a mutation in a stop codon can easily create a short, trivial overlap in neighbouring genes as a chance event, longer, non-trivial overlaps should only be maintained in a genome if the overlapping region encodes a part of the protein essential for its function for both genes. there are a few hypothetical reasons why genes might overlap, and the evidence for functional antisense overlaps in prokaryotes has been discussed in a recent review [ ] . 
while the reduction of genome size is particularly relevant only for some viruses [ , ] , it has also been studied in bacteria [ ] . effects on gene regulation [ ] conceivably could affect all organisms, for instance there is the possibility of co-expression of same-strand overlapping genes (olgs) with the mother gene, given that they are potentially expressed from the same mrna. genes within an antisense overlapping pair could also influence each other, for instance in a way similar to what has recently been termed a "noncontiguous operon", where genes in antisense to each other are nonetheless co-expressed as an operon [ ] . other proposed benefits of overlapping genes relate to templating structure based on the existing 'mother gene', namely, for genes directly in antisense ("- frame"), the creation of proteins with a complementary polarity structure to the gene on the antisense strand [ , , ] or, in the case of sense overlaps, a similar hydrophobicity profile [ ] . overlapping open reading frames may play an important role in the origin of de novo genes, exploring new territory in the total space of sequences and functions [ , , , ] . while most currently extant olgs are not taxonomically conserved and therefore appear to be evolutionarily young [ ] , one claimed example of an ancient olg pair is comprised of the two classes of aminoacyl-trna synthetases which can be encoded in an overlapping manner [ , , ] . despite the many possible effects of overlapping genes (olgs), they are generally not considered a significant phenomenon outside of viruses, due perhaps to perceived difficulties in their evolution for some or all reading frames [ , ] . the idea that they have been more widespread has long been theorized [ , , ] . as a consequence, most gene prediction algorithms still exclude non-trivially overlapping genes [ ] , especially outside of bacteriophages and other viruses. 
the ncbi rules for annotation of prokaryotic genes do not allow genes completely embedded in another gene in a different frame without individual justification [ ] . even in viruses, relatively few overlapping genes have been annotated, particularly antisense gene pairs, although more are regularly being discovered including, for instance, in the pandemic viruses hiv and sars-cov- [ , , ] . a recent study [ ] quantified the difficulty of constructing olgs by picking random pairs of protein domains and rewriting them so as to overlap, with an algorithm minimizing the amino acid changes in each domain. this is a new approach, as previous studies tried to create overlaps without changing the amino acid sequence of the two genes, which resulted in either a very limited overlap length [ ] or could only be done for very specific genes [ ] . they found that, remarkably, % of arbitrary protein domain pairs were able to successfully overlap in at least one of the reading frames they investigated, and one of two positions tested. virus domains were much more likely to create putatively functional overlaps than domains from prokaryotes or eukaryotes, as determined by blast searches of the swiss-prot database. this result suggests that creating overlaps is not as difficult as might be expected, implying that an abnormally high threshold of evidence as compared to other gene types should not be required for verifying their existence. this high success rate also opens up many possibilities for synthetic biology. for instance, mutations in overlapping regions are expected to be more deleterious on average, so an artificial genome with many olgs is not only smaller but also expected to be more stable over time on a population level, as mutations are more likely to be strongly selected against. 
a recent method for stabilizing synthetic genes [ ] , where an arbitrary orf was constructed to overlap a gene of interest and was concatenated with an essential gene downstream, could be taken a large step forward by overlapping whole genes thereby creating a system where not only 'polar' mutations are selected against but also more minor mutations, if they also affect the mother gene. genome size has become a significant limiting factor for biomolecular computing, in which genetic programs are inserted into cells [ ] . existing compression methods [ ] could be greatly improved by using olgs, making more complex systems possible. in this context a well designed stable synthetic genome could include fail-safe measures, such that faulty genetic programs would shut down. here the algorithm provided in [ ] is used but improved in the evaluation of the constructed sequences as the analysis in the previous study has some weaknesses resulting in incorrect claims. determining whether an artificial sequence has a specific function from its amino acid sequence only is a very hard problem and not possible today. progress is being made in predicting the protein structure from amino acid sequence [ ] , but protein structure does not determine function as essential binding sites can be rendered useless if the amino acid is changed while not changing the overall protein structure. ultimately only experiment can definitively determine the function of a given amino acid sequence. in order to aid the design of expensive experimental setups however, it can at least be determined bioinformatically how similar an artificial sequence is to sequences with known functions. in this study the artificially designed sequences are compared to their original sequences in terms of amino acid identity, amino acid similarity, hidden markov model profile and secondary structure in order to determine the impact of olg construction and which sequences are potentially functional. 
firstly, it is explained how some technical artifacts arose and how they can be avoided. in order to further improve the analysis, hidden markov models rather than blast are used in this study. while the previous study [ ] tried to estimate an upper limit of how many domains can be successfully overlapped in at least one reading frame and position, here the average success rate for olg construction is determined instead, which is more relevant both for understanding constraints on the formation rate of naturally occurring olgs and for assessing the likelihood of successful synthetic creation of olgs. these results in one sense give an upper estimate of the ease of creating overlaps, as the difficulty of obtaining an overlapping gene pair naturally is not directly addressed here. on the other hand, overlapping functional domains directly is a "worst case scenario", as there is some evidence that the critical functional domains of one protein in an olg pair tend to overlap less constrained regions of the other protein [ ] , and this segregation is also intuitively plausible. in order to estimate the difficulty of achieving overprinting naturally, the minimal number of nucleotide changes needed to create the olg sequence is determined. whether functional domains do in fact overlap in nature, however, deserves further attention. by expanding the analysis of the previous study [ ] from the reading frames '+ ', '- ' and '- ' to all reading frames (see fig. for reading frame definitions), the observed differences between reading frames can be related to the structure of the standard genetic code. by constructing olgs using randomly generated genetic codes it can be studied whether the standard code shows evidence of optimisation regarding olgs. using the improved evaluation of the designed olgs it can be shown that virus genes, surprisingly, are less suited for designing olgs than bacterial and eukaryotic genes.
figure : illustration of the alternative reading frames. the '+ ' frame is the standard or reference reading frame and '+ '/'+ ' the sense overlaps, while frames '- ' to '- ' are on the antisense strand. in [ ] constructed sequences were evaluated with a blast search against the swiss-prot database. if both overlapping sequences had a match to the best hit with at most an e-value of ^(- ) and a match length of %, the overlap was considered successful. however, the initial sequences were picked from the pfam seed database and it can be shown that most of the chosen sequences are not well represented in the swiss-prot database (see left panel in fig. ) , with the exception of virus genes. in a search against the swiss-prot database, identities of over % were only found for % of the non virus genes, while % of the virus genes could be found in this category. a curated set in which all sequences have a % match in the swiss-prot database but otherwise the same properties has a remarkable % success rate for overlaps and the virus vs non-virus difference vanishes (see right panel in fig. ). the advantage reported for virus genes is thus fully explained by dataset-database biases. in any case, the extremely high overall success rate obtained should be investigated. either creating overlaps is indeed unexpectedly easy or the evaluation of functionality used in [ ] is not conservative enough. it can be shown that both factors appear to contribute to the surprising result. [ ] with different match identities in swiss-prot -virus genes from this dataset have a higher average identity to a swiss-prot entry than non-virus genes. right: percentage of functional olgs for the original dataset used in [ ] and the average of curated datasets grouped into virus and non virus genes. in curated datasets all original sequences have an exact match in swiss-prot. each curated dataset has sequences with - amino acids. 
the virus versus non-virus difference observed in the dataset of [ ] vanishes for the curated datasets. when introducing the minimal number of changes required for two random sequences to fully overlap each other, a similar percentage of each sequence is expected to change. in such a case the e-values of the constructed sequences would be strongly length dependent, as a longer sequence with the same similarity has a lower probability of being found by chance in a database of a given size. when picking datasets with different sequence lengths, such a length dependence can be found in the blast evaluation (supplementary fig. s ). a fixed e-value cutoff cannot adequately evaluate sequences in such a situation, as the cutoff value fully determines the result and is chosen arbitrarily. the sequences used in [ ] have a length of - amino acids, and the high success rate for the curated set can be explained by a combination of the sequence length and the choice of the cutoff value. in order to find a reasonable alternative to the fixed e-value cutoff, hidden markov models (hmms) can be used to score the constructed sequences. here hmmer (v . . ) [ ] is used to create profiles for each protein domain family in the pfam database [ ] in order to score the constructed sequences. the pfam database consists of a 'seed' database, containing trusted sequences for each family which are used to create hmm profiles, and a 'full' database, containing all the sequences of the uniprot database sorted into the different families according to the previously constructed profiles. here the hmm profiles are also constructed from the 'seed' sequences, and in order to find the sequence most closely representing the profile, all 'full' sequences are tested against the profiles. the highest scoring sequences are used to construct olgs. the rest of the 'full' sequences are used as a comparison for the overlapping region of the constructed sequences.
a constructed sequence is judged successful if it has a higher score than a sequence at a defined threshold percentile of the 'full' sequences, thereby creating a threshold value which is individual for each protein family. here results for different threshold percentiles are discussed, while highlighting two particular percentile values. the first is the th percentile (median), which marks the score of a typical sequence in the protein family. in this analysis, sequences meeting this threshold cannot be distinguished from the naturally occurring protein domains and they will be categorised as typical proteins. the second is the biologically relevant threshold: since all sequences in the 'full' group are naturally occurring sequences, scoring at least as highly as any of these sequences renders a sequence biologically relevant. in order to avoid extreme outliers which may be misclassified, the th percentile is used as the biologically relevant threshold. a relative threshold could alternatively be established with e.g. blast, by first picking a single sequence as a starting point for construction and then comparing the rest of the protein family to it in order to find the threshold score as described above. in this case, however, it is not clear which sequence to choose as a starting point. a randomly picked sequence could be an outlier of the protein family, resulting in unreliable comparison scores and a higher chance of losing function after constructing olgs. hmms on the other hand provide a profile reflecting the 'average' sequence, which is a better representative for the whole protein family. choosing a family-specific threshold value takes care of most of the length dependencies, but in order to be sure and to be able to compare sequences of different lengths, each score resulting from a comparison between a sequence and an hmm profile is divided by the sequence length. here scores are used instead of e-values, as the latter also depend on the database size, an arbitrary factor in this analysis.
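the family-specific threshold described above can be illustrated with a minimal sketch. it assumes that hmm scores and sequence lengths are already available as plain numbers; the function names, the linear-interpolation percentile and the use of "at least as high" as the success criterion are illustrative assumptions, not the implementation used in the study.

```python
from typing import Iterable, List, Tuple


def percentile(values: Iterable[float], p: float) -> float:
    """Percentile with linear interpolation, p in [0, 100] (assumed convention)."""
    ordered: List[float] = sorted(values)
    if not ordered:
        raise ValueError("at least one value is required")
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def family_threshold(full_group: Iterable[Tuple[float, int]], p: float) -> float:
    """Threshold for one protein family.

    full_group holds (hmm_score, sequence_length) for the 'full' sequences;
    each score is divided by the sequence length before the percentile is taken.
    """
    return percentile([score / length for score, length in full_group], p)


def is_successful(score: float, length: int,
                  full_group: Iterable[Tuple[float, int]], p: float) -> bool:
    """True if the length-normalised score reaches the family threshold."""
    return score / length >= family_threshold(full_group, p)


# toy numbers, purely hypothetical
full = [(10.0, 10), (20.0, 10), (30.0, 10), (40.0, 10), (50.0, 10)]
median_threshold = family_threshold(full, 50)
```

passing p = 50 gives the median ('typical sequence') threshold; the lower percentile used as the biologically relevant threshold would be passed in the same way.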
aligning the best sequence with the 'seed' sequences using mafft (v . ) [ ] , weights used for sequence construction can be determined just as in [ ] . a more detailed description of the calculation of the weights and their influence can be found below. when studying the influence of a protein family's taxonomic classification on the construction of olgs, the 'seed' and the 'full' database are first filtered by the four major taxonomic groups -archaea, bacteria, eukaryotes and viruses -before creating the profiles and the thresholds. muscle (v . . ) [ ] was used for realigning the 'seed' sequences after taxonomic filtering. for subsequent analyses, random sets from the ~ pfam families were chosen, with the condition that each family must have at least 'seed' sequences and 'full' sequences in order for the weights and the thresholds to be reasonably defined. each dataset consists of families since the variance of the resulting olg success rate barely declines for larger sets (see supplementary fig. s ). fig. summarizes the workflow. hmm profiles are constructed from the seed sequences. the sequence with the highest score from the full group is used for olg design. the remaining sequences in the full group are used to construct threshold scores used to evaluate the designed olgs. in order to estimate the expected success rate of an individual overlap attempt, the domains are overlapped at random positions such that one domain is fully embedded into the other. just as in [ ] the sequence with the lower quality of the two constructed olgs is used as a conservative representative of the pair. after determining the success for each position, the percentage of successful positions for each olg pair, the average success rate in each reading frame, and the overall success rate averaged across reading frames are calculated. 
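the aggregation steps just described can be sketched as follows. the record layout, the frame labels and the function name are hypothetical; a position counts as successful only if both sequences of the pair pass, which is equivalent to letting the lower-quality sequence represent the pair.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (pair_id, reading_frame, first_sequence_passes, second_sequence_passes)
Record = Tuple[str, str, bool, bool]


def summarise(records: Iterable[Record]):
    """Return per-pair, per-frame and overall success rates in percent."""
    outcomes = defaultdict(list)
    for pair_id, frame, ok_a, ok_b in records:
        # the weaker sequence decides: both must pass at this position
        outcomes[(pair_id, frame)].append(ok_a and ok_b)

    # percentage of successful positions for each pair in each frame
    pair_rate: Dict[Tuple[str, str], float] = {
        key: 100.0 * sum(flags) / len(flags) for key, flags in outcomes.items()
    }

    # average success rate in each reading frame
    by_frame = defaultdict(list)
    for (_, frame), rate in pair_rate.items():
        by_frame[frame].append(rate)
    frame_rate = {frame: sum(r) / len(r) for frame, r in by_frame.items()}

    # overall success rate averaged across reading frames
    overall = sum(frame_rate.values()) / len(frame_rate)
    return pair_rate, frame_rate, overall


# toy records with hypothetical frame labels
records = [
    ("p1", "frame_a", True, True),
    ("p1", "frame_a", True, False),
    ("p2", "frame_a", True, True),
    ("p1", "frame_b", False, False),
]
pair_rate, frame_rate, overall = summarise(records)
```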
the number of possible positions for each olg pair is equal to their difference in length plus one, so using more than one overlap position in each pair is only possible for genes with different lengths. increasing the number of positions for each gene does not change the expected success rate but reduces its variation between different sets (see supplementary fig. s ). comparing the variation caused by choosing random positions and the variation caused by choosing random pfam families, the former turns out to be negligible, and consequently only a single randomly chosen position for each olg pair is used for subsequent analyses. the distribution of the percentage of successful positions in each olg pair is calculated from up to different positions (see fig. ). . % of all olg pairs form biologically relevant sequences at all positions in every reading frame while only . % cannot form a biologically relevant sequence at any position (see fig. ). . % of the pairs even form typical proteins, as determined by the th percentile threshold, at every position in any reading frame (see right panel in fig. ). this result is strongly dependent on the threshold percentile chosen, but due to the wide range of possible results it can still be concluded that the chance of success of a constructed olg pair depends strongly on the particular genes used, as might be expected. in each olg pair sets of up to random positions were tested against the pfam group hmms using the 'biologically relevant' threshold ( th percentile) and the 'typical sequence' threshold ( th percentile) for a successful overlap. while . % of the pairs can be overlapped at any position and . % in no position using the biological threshold, only . % can be overlapped at any position and . % in no position using the threshold of typical sequences. the sequence threshold strongly influences the result.
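the count of possible overlap positions (difference in length plus one) can be made concrete with a short sketch. lengths are taken in amino acids, offsets are counted from the start of the longer domain, and the function names and the sampling cap are illustrative assumptions.

```python
import random
from typing import List


def embedding_offsets(len_a: int, len_b: int) -> range:
    """All offsets at which the shorter domain is fully embedded in the longer one."""
    longer, shorter = max(len_a, len_b), min(len_a, len_b)
    return range(longer - shorter + 1)


def sample_offsets(len_a: int, len_b: int, max_positions: int,
                   rng: random.Random) -> List[int]:
    """Pick up to max_positions distinct offsets at random."""
    offsets = list(embedding_offsets(len_a, len_b))
    if len(offsets) <= max_positions:
        return offsets
    return sorted(rng.sample(offsets, max_positions))


# two domains of equal length can only be overlapped at a single position
same_length = list(embedding_offsets(100, 100))
# a length difference of 20 gives 21 possible positions
n_positions = len(embedding_offsets(120, 100))
picked = sample_offsets(120, 100, 5, random.Random(0))
```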
in order to determine whether the relative evaluation of olgs really removed the length dependency, the average quality 'q' of an olg pair is determined and compared for olg pairs with different lengths. q is defined as the ratio of the scores of the constructed sequence (s) over the original sequence (s_max) times . the quality is therefore the percentage score loss due to the overlap. supplementary fig. s shows the mean quality for datasets with different sequence lengths. starting from around amino acids, q is indeed mostly independent of sequence length. the low q values of smaller sequences are because these sequences are less frequently matched to their respective hmm-profile, which results in a score of zero. the reason is probably that the shorter sequences fall below internal detection thresholds of hmmer more easily. changing a single amino acid in a short gene changes its quality to a greater extent than in a long gene, resulting in larger fluctuations, which can lower the sequence below detection thresholds. lowering internal thresholds of hmmer did not lead to more sequences being recognized by their respective profile. in further analysis the minimum sequence length of amino acids is used so that the percentage of olg pairs in which at least one sequence is not recognised is below % (see supplementary fig. s ). when taking both sequences of each pair and not only the one with the lower quality, the quality distribution converges to a broad peak at around % with increasing sequence length (see supplementary fig. s ) . since the quality also depends on the flexibility of the hmm profiles used to score the sequences the peak is not expected to get any narrower with increasing sequence length and thus to reduce variations in sequence similarities between the constructed and the original sequences. the algorithm to construct olg sequences from [ ] uses an exchange matrix (blosum [ ] ) to find the closest overlapping sequences to the original ones. 
it determines the codon with the highest sum of the scores for the exchanges in both sequences at each position. sequence weights can prioritise the score of either one or the other sequence at different positions in order to increase the chance of obtaining functional sequences. in [ ] , the weight w_i at position i of the sequence is w_i=e^(-s_i), where s_i is the entropy calculated at position i in the alignment. the weights could be defined differently such that their influence on olg construction is stronger or weaker. in order to optimize the weight strength a factor k is added to the entropy in their calculation such that w_i=e^(-ks_i). varying k> , the optimal weight strength for constructing olgs can be determined, while k= means no weights are being used. in the hmm evaluation the influence of k is very weak. a value of k= . is used in order to maximise the quality, q (see supplementary fig. s ). picking very high k values q goes to zero. in this case at each position the sequence with the higher conservation maintains its amino acid. this indicates that it is crucial that at each position both sequences are changed in order to create functional olgs. in the blast evaluation k= is optimal (see supplementary fig. s ), such that no better value can be found for k> . blast does not take special account of conserved regions of a sequence, so weights can improve one sequence but at the same time will reduce the score of the other. since the lowest scoring of the two sequences is taken to represent the olg pair, introducing weights has a high chance of reducing the success rate in an evaluation using blast, despite increasing biological relevance. this makes an evaluation using hmm or any other method that takes into account sequence conservation significantly preferable for judging constructed olg pairs. 
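the weight formula w_i=e^(-ks_i) can be sketched as below. the text does not state the logarithm base or how alignment gaps are treated, so the natural logarithm and the skipping of gap characters are assumptions, as are the function names and the toy alignment.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: str) -> float:
    """Shannon entropy of one alignment column (natural log, gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def position_weights(alignment: Sequence[str], k: float = 1.0) -> List[float]:
    """Weights w_i = exp(-k * s_i) for every column of an equal-length alignment."""
    return [
        math.exp(-k * column_entropy("".join(column)))
        for column in zip(*alignment)
    ]


# toy alignment: the first column is fully conserved, the second is split evenly
toy_alignment = ["AC", "AG", "AC", "AG"]
weights_k1 = position_weights(toy_alignment, k=1.0)
weights_k0 = position_weights(toy_alignment, k=0.0)
```

a fully conserved column has zero entropy and keeps weight 1 for any k, while a variable column is down-weighted more strongly as k grows; with k equal to zero every weight is 1, which corresponds to using no weights.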
the five alternative reading frames differ strongly in the combinatorial constraints imposed by the reference gene (mother gene) via the standard genetic code [ ] , e.g. the sequence n|gcn|, with n being any nucleotide, always translates to alanine in the + and the - frame. an interesting question is whether this difference in constraint carries over to the success rate for designing olgs. for olgs resembling typical proteins of their respective families, the success rate for olg construction varies from . % in the '- ' frame to . % in the '- ' frame with an average value of . % across all reading frames (see fig. ). calculating the e-value just as in [ ] as a reference, the constructed olgs have a median e-value of ^- ( ) to ^(- ), decreasing with increasing threshold percentile. the result is strongly threshold dependent, as . % of the constructed sequences score at least as highly as the worst sequence in the full group, while only . % score better than % of the full group. considering the combinatorial restrictions of the different reading frames [ ] , the ranking of frames by success rate is exactly as expected, insofar as the success rate of each reading frame is inversely related to the extent of combinatorial restrictions found in [ ] (see fig. ): the '- ' frame is the least successful reading frame and has the highest restrictions, followed by the '- ' frame, which is the second most restricted frame. next are reading frames '+ ' and '+ ', which have exactly the same restrictions and, surprisingly, almost the same success rates, not only in their average value but also in every single dataset (data not shown), despite expected stochastic fluctuations due to some genes simply fitting better to each other. last is the '- ' frame, which has no combinatorial restrictions and the highest success rate.
plotting the different success rates in the different reading frames as a function of the number of combinatorial constraints found in [ ] , results in a linear relation for the lowest possible threshold, namely that all sequences which are at least as good as the worst in the comparison group are judged successful. as the threshold is increased the linear relation is gradually lost (see supplementary fig. s ). for higher thresholds most of the sequences are below the threshold and very little data is left, which might lead to the observed behaviour. in summary, the structure of the standard genetic code appears to strongly influence the construction of olgs. whether the observed relationship between predicted constraints in different frames and the difficulty of constructing olgs is borne out by the proportion of natural olgs found across frames deserves attention across diverse taxa. the threshold chosen within the pfam group has a very strong influence on success rates. the ordering of the reading frames by success rates, namely '- ', '+ '/'+ ', '- ' and '- ', matches the ordering by combinatorial restrictions in the standard genetic code, beginning with the least restricted frame [ ] . determining the impact of olg construction on an amino acid sequence identity is another indicator of its functionality. it has been argued that a % amino acid identity between naturally occurring sequences ensures that both sequences have the same structure [ ] . comparing the altered part due to olg construction with the original sequence, in . % of cases both olg sequences share at least % of amino acids with their original sequence. in some olg pairs both sequences have an amino acid identity of up to % compared to their original sequence. in the biologically more relevant property of amino acid 'similarity', the worst-scoring of the two olgs can be even up to % similar to its respective original sequence (c.f. left panel of fig. ). 
determining the average amino acid identity and similarity between the two olg sequences, the average olg design impact can be determined. the average amino acid identity is % in most cases (right panel of fig. ) showing that in almost all olg pairs one sequence is above and one is below % amino acid identity. the average amino acid similarity is % in most cases (right panel of fig. ) which again shows that in almost all cases one of the two olg sequences is above and one below % identity. the double peak structure of both panels in fig. can be explained by differences for olg pairs in different relative reading frames, which are pooled here (c.f. supplementary fig. s ). it follows that in an average olg design, in % of all overlap positions the amino acids of both sequences can be maintained, in % one sequence maintains its amino acid while the amino acid in the other sequence is changed to a similar one and in % one sequence maintains its amino acid and the other sequence cannot maintain a similar amino acid. how well the two sequences can be maintained after the overlap is determined by the standard genetic code and the two specific sequences, the overlap position, their amino acid composition and the amino acid order. while the standard genetic code is a constant factor across all overlaps, all other factors are specific in each case and create the variability in the results. figure : probability density for different amino acid identities and similarities in constructed olg pairs. the data is calculated from , olg pairs. left: the sequence with the lower identity is representative of the pair. the black line indicates the % amino acid identity threshold. right: the mean similarity of both olg sequences represents the pair. the impact of olg design on secondary structure is the last factor studied here. comparing the secondary structure of the olg sequence with its original sequence, a secondary structure similarity is determined. 
secondary structure is predicted using porter [ ] with the "--fast" flag. it can distinguish between the eight different secondary structure motifs of the dictionary of protein secondary structure (dssp) [ , , ] , which are _ -, alpha-, and phi-helices, hydrogen bonded turns, beta sheets, beta bridges, bends and coils. determining the same secondary structure similarity for all sequences in the seed group of the pfam database yields a control group. this way the typical deviations between domains with the same function can be determined. comparing probability densities for different secondary structure identities in both groups it can be seen that the constructed olg sequences barely deviate from the seed sequences (c.f. fig. ). in conclusion, in regards to secondary structure the change inflicted on a sequence to create olgs is no more than the differences within naturally occurring protein domain families. it is noteworthy that only amino acid identity and similarity have a strong correlation (r= . ) so combined with the other parameters, namely the relative hmm score and the secondary structure identity, there is a set of three more or less independent properties for evaluating constructed olgs, and probably for protein homologs in general. the relative hmm score is the hmm score of the olg sequence divided by the hmm score of a sequence at any threshold percentile as discussed above. between each pair of parameters the pearson's correlation is below . , with the exception of the correlation between secondary structure identity and hmm score being r= . or r= . for thresholds of % or % respectively. olgs are as similar to their original sequences in secondary structure as observed for comparisons of seed sequences of naturally occurring protein domains to the sequence best representing the respective domain family. 
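the identity measures used in this section, for amino acid sequences as well as for predicted secondary structure, reduce to a position-wise comparison of two strings. the following is a minimal sketch under stated assumptions: both strings are already aligned position by position and have the same length, and secondary structure is given as one state letter per residue. the function name `percent_identity` is hypothetical and the example strings are toy data, not output of the prediction tool.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length strings agree.

    Works for amino acid sequences and for per-residue secondary structure
    strings alike; no gap handling is attempted in this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    original_ss = "HHHHEEEE"  # toy secondary structure of an original domain
    olg_ss = "HHHTEEEC"       # toy secondary structure of the designed olg
    print(percent_identity(original_ss, olg_ss))
```

applying the same function to every seed sequence of a family against a reference sequence would give a control distribution of the kind described above; how the reference sequence is chosen is not specified in this sketch.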
by comparing olg sequences constructed with the standard genetic code (sgc) to sequences constructed with artificial codes, the level of optimality of the sgc can be inferred. since such an approach depends strongly on the code set used [ ] , four different versions with increasing restrictions are tested. there are two factors defining a genetic code, namely its amino acid composition and the arrangement of amino acids on the codons. the first code set is the random code set, which does not restrict either of the two factors. each code can have any of the amino acids used in the sgc at any codon. the second set only restricts the composition of its codes and is called the degeneracy code set. all codes in this set contain the same number of codons for each amino acid as the sgc, thus conserving its amino acid composition. the third set is the blocks code set, whose codes have a very similar structure to the sgc; while it also restricts the composition of the codes to some degree, it mostly determines their arrangement. this code set is created by grouping all codons of the sgc that code for the same amino acid into blocks and shuffling the amino acids assigned to each block, and thus conserves the degeneracy structure of the sgc on the third nucleotide. lastly, a code set that maintains the mutational robustness of the sgc as calculated in [ ] is tested. in short, the mutational robustness is the average change of amino acids due to point mutations and has been shown to be extremely optimal in the sgc relative to similar codes [ ] . this set contains block codes as in the third set, but only the codes whose mutational robustness is at least as high as that of the sgc are kept. since these codes are fundamentally block codes they are partly restricted in their amino acid composition, but the arrangement of amino acids in these codes is even more restricted, as point mutations from any codon should result in similar amino acids. 
this code set reflects the fact that different properties of the sgc have a different impact on the fitness or biological optimality of the sgc, with the mutational robustness most likely being one of the most important features. here this code set is called the mutational robustness blocks set (mr-blocks set) and it tests the optimality of constructing olgs as an additional property of the sgc after taking into account the mutational robustness. comparing the degeneracy, the blocks and the mr-blocks code sets to the random set, the influence of code composition and arrangement can be determined (see left panel of fig. ). the degeneracy code set, reflecting the composition of the sgc, has the codes with the highest average success rates, indicating that the composition of the sgc is a major factor for this property. the sgc itself, however, has a very low success rate in comparison, indicating that the amino acid arrangement is an even stronger, in this case negative, factor, as the sgc is worse than both the random codes and the degeneracy codes. the block structure of the sgc has a strong negative impact on successful olg design and the sgc is a typical member of this set. enforcing even more structure on the artificial codes in order to maintain the mutational robustness of the sgc further reduces the ability of the sgc to create successful olgs. studying the optimalities of each of the four code sets for flexibility in olg design, it is apparent that the more restricted the code set is, the more optimal the sgc is relative to the set (see right panel of fig. ) . especially in the mr-blocks code set only a few codes are better than the sgc; however, no code set or reading frame has fewer than % of codes doing better (see fig. s -s ), which has been a recommended threshold for inferring optimality [ ] . 
this is an expected result even if the code has been optimised for olgs as the success rate for constructing olgs reflects merely the 'flexibility' of a code system, but olg sequences also need to be conserved, which is an almost directly opposing property which also has not been found to be strongly optimal by itself [ ] ; it might indeed be expected that overall optimality involves a trade-off between the two. if the sgc has been optimized in this way this could indicate a turning point at which a further increase in mutational robustness results in a smaller fitness increase compared to an increase in the flexibility to create olgs -how to measure fitness for a genetic code is however not clear. while the code composition of the sgc is beneficial for both the ability to create successful olgs and the mutational robustness, the code arrangement of the sgc is only beneficial for mutational error robustness and the sgc (see fig. of [ ] ), indicating that the mutational robustness is the more important property. only in the set of codes with the same mutational robustness the optimality for olg design becomes stronger, supporting the turning point hypothesis. figure : olg design success rates for different alternative code sets. the average is calculated from sets of alternative codes, except for the mr-blocks set with sets of codes. the error bars indicate the standard deviation. left: the average success rates compared to the sgc. while the composition of the sgc is a positive factor, the arrangement of the sgc is a negative factor. right: the optimality of different code sets. the black line indicates the % threshold. the more restricted the code set the more optimal the sgc appears indicating that the ability to successfully create olgs has only been optimized while maintaining other properties. 
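the degeneracy and blocks code sets compared above can be illustrated with a short sketch. this is not the authors' code: it assumes that stop codons are left untouched in every alternative code, it builds the standard genetic code from the usual tcag-ordered translation table, and the function names `degeneracy_code`, `blocks_code` and `codon_counts` are hypothetical. the random and mr-blocks sets are omitted because their exact construction (and the robustness measure) is not specified in enough detail here.

```python
import random
from collections import Counter

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(code):
    """Number of codons assigned to each amino acid (the code's composition)."""
    return Counter(code.values())


def degeneracy_code(rng):
    """Shuffle amino acids over all sense codons, keeping codon counts fixed."""
    sense = [codon for codon, aa in SGC.items() if aa != "*"]
    amino_acids = [SGC[codon] for codon in sense]
    rng.shuffle(amino_acids)
    code = dict(SGC)  # stop codons stay in place (assumption of this sketch)
    code.update(zip(sense, amino_acids))
    return code


def blocks_code(rng):
    """Keep the synonymous codon blocks of the SGC, permute their amino acids."""
    amino_acids = sorted({aa for aa in SGC.values() if aa != "*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in SGC.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print(codon_counts(degeneracy_code(rng)) == codon_counts(SGC))
    print(sorted(codon_counts(blocks_code(rng)).values()))
```

in this sketch a degeneracy code keeps the composition of the sgc but scatters synonymous codons, whereas a blocks code keeps the synonymous blocks intact and only relabels them, so its composition can differ from that of the sgc.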
besides the four basic taxonomic groups (three domains of cellular life: archaea, bacteria, eukaryotes, plus viruses), old genes can also be studied by picking only families which have at least one sequence in all four taxonomic groups, since it is expected that these families were already present in luca or another ancient ancestor (although this high-level categorisation is not perfect due to widespread horizontal gene transfer). surprisingly, bacterial and eukaryotic genes are generally significantly better suited to olg construction than virus and archaeal genes, with only minimal dependence on the threshold percentile, c.f. fig. s . the largest dependence on the threshold percentile is found for the "found in all" genes, for which only a total of sequences can be found in the pfam database, so higher stochastic fluctuations are to be expected. using the 'biologically relevant' threshold, the biggest difference is between eukaryotic and archaeal genes, which have a % difference in their success rate (see left panel of fig. ). for olgs which are typical proteins of their respective family, eukaryotic genes are almost twice as likely to be successful as virus genes (see right panel of fig. ). eukaryotic and "found in all" genes are typically the easiest to overlap, which is somewhat unexpected, as eukaryotic genes would perhaps be expected to have the youngest protein families, and so to appear less 'flexible' due to having sampled less of the functional space through mutations. more understandable, however, is that "found in all" genes, being closer to mutational saturation (if more ancient on average) and therefore having explored a larger proportion of functional sequence space, might appear more 'flexible', resulting in lower weights and thresholds. 
in order to estimate the difficulty of naturally forming olg sequences, the minimum number of nucleotide changes needed in order to reach the olg sequence from any of the two original sequences is determined (see fig. ). by only taking olgs in which both sequences are above a certain hmm threshold, extreme outliers are gradually removed with increasing threshold but the rest of the distribution stays the same. this indicates that this property is independent of the threshold value, just as for the amino acid identity and similarity, as fewer and fewer designed olgs pass a higher threshold which makes extreme outliers less likely to occur. on average a designed olg sequence has a % difference in nucleotides to its original, with half of constructed sequences in the range of - % change. most interesting are outliers on the lower end of the distribution as they indicate whether olgs exist that are potentially reachable by naturally occurring mutations. the lowest nucleotide difference observed is . %, which was for an olg pair that scores better than % of the domains in the comparison group. . % of olgs required less than % nucleotide change, i.e. sequences of the sequences created in this dataset that scored at least as highly as the worst sequence in the comparison group. this suggests intuitively that creating overlaps of the sort constructed here could be possible naturally through accumulation of random mutations. the population genetics of such a hypothetical process is a potential topic for further study, as is an experimental evaluation of functionality. figure : percentage nucleotide change of olgs as a function of hmm threshold %. the minimal nucleotide distance of each of olg sequences (two per pair), with a minimal length of nucleotides, to their respective original sequence is determined. there are many aspects of the synthetic construction of olg pairs which can be studied. 
here factors such as sequence length and the influence of sequence conservation are taken into account. the analysis shows that an evaluation with blast and a fixed e-value cutoff cannot accurately assess the potential functionality of the designed olgs. while the combination of sequence length and an e-value cutoff completely determines the success rate of the constructed olgs, adding in positional weights can only negatively influence the sequences constructed with this method. both problems can be solved however by instead using hmm profiles to determine sequence similarity and then using these to define a threshold for successful olgs derived from sequences in the same protein family. the hmm profiles and the thresholds are though both derived from the pfam database [ ] , which makes these results strongly dependent on the database quality. for example, if in one taxonomic group sequences are very similar due to being mostly from the same species or genus, thresholds would appear to be higher and it would be harder for designed olgs to pass these thresholds. further optimization of the construction algorithm can be achieved by determining the optimal weight strength (influence of sequence conservation), which is k= . . . % of the constructed olg sequences score at least as highly as the worst-scoring biological sequences in pfam groups, while . % of the sequences cannot be distinguished from naturally occurring domains in their respective protein family. this indicates that the typical variation inside protein families is of the same order of magnitude as the change needed in order to construct artificial olgs by arbitrary pairing of protein domains. this result also holds true for other bioinformatic factors like amino acid identity and secondary structure, since the constructed olgs are typically very similar to naturally occurring domains in these properties. 
studying artificial olg design success from the perspective of an even more constricting biological parameter like tertiary structure would be an important next step; but besides the amino acid sequence, also codon usage can impact protein structure [ ] , along with environmental factors such as the presence of chaperone proteins, which together make it a much harder problem. ultimately, proof of the functionality of artificial sequences cannot yet be realised bioinformatically, and experimental verification is essential. to this end all known independent protein properties available from the sequence should be tested in order to create a gold standard for possibly functional sequences. from this study it is clear that sequence similarity (or identity), hmm-scores and secondary structure should be part of the judged properties. determining relative hmm scores for high thresholds could be used to prefilter sequences for secondary structure prediction as it is the computationally most intensive part of this analysis. considering that domain-domain overlaps are expected to be much harder than overlapping a domain with a less conserved region in another gene, it appears that de novo origin of genes from overlapping orfs may be much less difficult than widely assumed. some constructed olg sequences varied only by . % from their original sequence, and there might be other natural sequences from the same domain that are even closer to the olg sequence. this result could be a starting point for estimating the difficulty of evolving olgs from different starting sequences in natural systems, which is still relatively unexplored despite some early work [ ] . the structure of the standard genetic code explains differences between reading frames and is a strong factor in the overall success rate of olg construction. olgs can maintain an average % amino acid identity and an average % amino acid similarity, which is mostly due to the genetic code. 
the structure of the standard genetic code is defined by its composition, namely how many codons code for each amino acid, and its arrangement, namely which codons code for each amino acid. it is known that the composition alone cannot explain the strong optimality of the standard genetic code for mutational robustness, as it stands out among codes with the same composition as the standard genetic code [ , ] . considering that the arrangement of the standard genetic code creates such high mutational robustness values [ ] , it is remarkable that designing olgs also works so well. another factor which deserves further exploration is the age of a protein family, i.e. the time since gene birth. this may correlate with apparent 'sequence flexibility', which is the strongest influence on the result via the threshold values, due to increasing mutational saturation in older protein families. being able to distinguish genuine sequence flexibility from mutational saturation, even in broad terms, could be very useful here. the analysis presented here depends primarily on the reliability of hmm profiles of pfam groups as a guide to biological functionality in constructed sequences. reliability for classifying biological protein sequences into ortholog families, the main use of these hmms, may not correlate well with reliability in scoring artificially constructed sequences for functionality. in other words, it may well be that these profiles fail to capture important requirements for protein tertiary structure and/or functionality. future research ought to test the best candidates experimentally, and if the best candidates from the methods developed here are not successful, additional factors could be considered in comparing constructed sequences and their natural precursors. 
for instance, many protein characteristics can be assessed using servers or packages incorporating multiple bioinformatic tools such as predictprotein, for various secondary structural elements [ ] , and many sequence properties, such as hydrophobicity profiles, can be computed using the volpes server [ ] , which has been applied to the related case of frame-shifted sequences compared to their mother genes [ ] . other properties required for functional protein sequences can be inferred from the evolutionary information contained in sequence alignments of protein families. for instance, it has been calculated based on a study of residue-residue co-evolution in ten well-characterized protein families that the proportion of all sequences which fold to the family's structure ranges from approx - to - [ ] . these principles have recently been successfully used in the design of functional proteins [ ] , and could conceivably also be applied to olg construction. factors facilitating the existence of olgs may possibly help in predicting olgs in sequenced genomes and should be explored further. for instance, a careful study of relatively 'flexible' sequence regions in taxonomically widespread genes may help find more overlapping genes. most interestingly, bacterial and eukaryotic genes can be overlapped more easily than virus genes, contrary to the findings in [ ] . these earlier results can be explained entirely with dataset-database biases, so this algorithm gives no support for the common assumption of a higher intrinsic olg formation capacity of viruses compared with bacteria or eukaryotes. two of the main differences between the taxonomic groups are the expected mutation rates and the average length of a protein. while genomes with higher mutation rates explore sequence space faster and therefore their proteins should appear to be more flexible, virus domains do not appear to be very flexible, despite having the highest mutation rate. 
the length of the sequences on the other hand has been removed as a factor in this analysis. an artificial factor not considered could be database biases or an exchange matrix (blosum ) biased towards certain kinds of proteins. the latter could be tested by using different matrices created from sequences from different taxonomic groups. it would be important to use the new matrix not only in the construction of the olgs but also in the evaluation by the hmms. so far it is not clear why protein families from different taxonomic groups are so different in their calculated ability to create olgs. a better theoretical understanding of overlapping genes will be extremely useful in microbial genome annotation methods, the study of evolution, and in synthetic biology, and therefore deserves renewed attention.
overlapping genes in bacteriophage φx
concomitant emergence of the antisense protein gene of hiv- and of the pandemic
the novel ehec gene asa overlaps the tegt transporter gene in antisense and is regulated by nacl and growth phase
overlapping genes in parasitic protist giardia lamblia
overlapping genes in vertebrate genomes
are antisense proteins in prokaryotes functional?
the evolution of genome compression and genomic novelty in rna viruses
gene overlapping and size constraints in the viral world
comparative study of overlapping genes in bacteria, with special reference to rickettsia prowazekii and rickettsia conorii
overlapping genes in bacterial and phage genomes
noncontiguous operon is a genetic organization for coordinating bacterial gene expression
is genetic code redundancy related to retention of structural information in both dna strands?
complementarity of peptides specified by 'sense' and 'antisense' strands of dna
genetic coding algorithm for sense and antisense peptide interactions
frameshifting preserves key physicochemical properties of proteins
viral proteins originated de novo by overprinting can be identified by codon usage: application to the "gene nursery" of deltaretroviruses
origins of genes: "big bang" or continuous creation
gene birth contributes to structural disorder encoded by overlapping genes
evolution of viral proteins originated de novo by overprinting
a minimal trprs catalytic domain supports sense/antisense ancestry of class i and ii aminoacyl-trna synthetases
two types of aminoacyl-trna synthetases could be originally encoded by complementary strands of the same nucleic acid
functional class i and ii amino acid-activating enzymes can be coded by opposite strands of the same gene
the combinatorics of overlapping genes
overlapping genes: more than anomalies?
do overlapping genes violate molecular biology and the theory of evolution?
missing genes in the annotation of prokaryotic genomes
a case for a negative-strand coding sequence in a group of positive-sense rna viruses
computational design of fully overlapping coding schemes for protein pairs and triplets
two proteins for the price of one: the design of maximally compressed coding sequences
designing of a single gene encoding four functional proteins
engineering gene overlaps to sustain genetic constructs in vivo
biomolecular computing systems: principles, progress and potential
genetic programs can be compressed and autonomously decompressed in live cells
functional segregation of overlapping genes in hiv
profile hidden markov models
pfam: a comprehensive database of protein domain families based on seed alignments
mafft multiple sequence alignment software version : improvements in performance and usability
muscle: multiple sequence alignment with high accuracy and high throughput
amino acid substitution matrices from protein blocks
deeper profiles and cascaded recurrent and convolutional neural networks for state-of-the-art protein secondary structure prediction
optimality in the standard genetic code is robust with respect to comparison code sets
the genetic code is one in a million
a neutral origin for error minimization in the genetic code
evolution of overlapping genes
dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features
a series of pdb related databases for everyday needs
the structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain
a previously uncharacterized gene in sars-cov- illuminates the functional dynamics and evolutionary origins of the covid- pandemic
properties and abundance of overlapping genes in viruses
codon usage regulates protein structure and function by affecting translation elongation speed in drosophila cells
critical assessment of methods of protein structure prediction (casp)-round xiii
twilight zone of protein sequence alignments
extreme genetic code optimality from a molecular dynamics calculation of amino acid polar requirement
evolution by gene duplication
predictprotein-an open resource for online prediction of protein structural and functional features
volpes: an interactive web-based tool for visualizing and comparing physicochemical properties of biological sequences
how many protein sequences fold to a given structure? a coevolutionary analysis
co-evolutionary fitness landscapes for sequence design
key: cord- - m ajc a authors: okada, megan; guo, ping; nalder, shai-anne; sigala, paul a. title: doxycycline has distinct apicoplast-specific mechanisms of antimalarial activity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: m ajc a doxycycline (dox) is a key antimalarial drug thought to kill plasmodium parasites by blocking protein translation in the essential apicoplast organelle. clinical use is primarily limited to prophylaxis due to delayed second-cycle parasite death at - μm serum concentrations. dox concentrations > μm kill parasites with first-cycle activity but have been ascribed to off-target mechanisms outside the apicoplast. we report that μm dox blocks apicoplast biogenesis in the first cycle and is rescued by isopentenyl pyrophosphate, an essential apicoplast product, confirming an apicoplast-specific mechanism. exogenous iron rescues parasites and apicoplast biogenesis from first-but not second-cycle effects of μm dox, revealing that first-cycle activity involves a metal-dependent mechanism distinct from the delayed-death mechanism. these results critically expand the paradigm for understanding the fundamental antiparasitic mechanisms of dox and suggest repurposing dox as a faster-acting antimalarial at higher dosing whose multiple mechanisms would be expected to limit parasite resistance. malaria remains a serious global health problem, with hundreds of thousands of annual deaths due to plasmodium falciparum parasites. 
the absence of a potent, long-lasting vaccine and parasite tolerance to frontline artemisinin combination therapies continue to challenge malaria elimination efforts. furthermore, there are strong concerns that the current covid- pandemic will disrupt malaria prevention and treatment efforts in africa and cause a surge in malaria deaths that unravels decades of progress ( ) . deeper understanding of basic parasite biology and the mechanisms of current drugs will guide their optimal use for malaria prevention and treatment and facilitate development of novel therapies to combat parasite drug resistance. tetracycline antibiotics like dox are thought to kill eukaryotic p. falciparum parasites by inhibiting prokaryotic-like s ribosomal translation inside the essential apicoplast organelle ( figure ) ( ). although stable p. falciparum resistance to dox has not been reported, clinical use is largely limited to prophylaxis due to delayed activity against intraerythrocytic infection ( , ) . parasites treated with - µm dox, the drug concentration sustained in human serum with current - mg dosage ( ), continue to grow for - hours and only die after the second -hour intraerythrocytic growth cycle when they fail to expand into a third cycle ( ). slow antiparasitic activity is believed to be a fundamental limitation of dox and other antibiotics that block apicoplast-maintenance pathways ( , ). first-cycle anti-plasmodium activity has been reported for dox and azithromycin concentrations > µm, but such activities have been ascribed to targets outside the apicoplast ( , , ). a more incisive understanding of the mechanisms and parameters that govern first versus second-cycle dox activity can inform and improve clinical use of this valuable antibiotic for antimalarial treatment. we therefore set out to test and unravel the mechanisms and apicoplast specificity of first-cycle dox activity. first-cycle activity by µm dox has an apicoplast-specific mechanism. 
prior studies have shown that µm isopentenyl pyrophosphate (ipp), an essential apicoplast product, rescues parasites from the delayed-death activity of - µm dox, confirming an apicoplast-specific target ( ). to provide a baseline for comparison, we first used continuous-growth and -hour growth- inhibition assays to confirm that ipp rescued parasites from µm dox ( figure a ) and that dox concentrations > µm killed parasites with first-cycle activity ( ( ). to test the apicoplast specificity of first-cycle dox activity, we next asked whether µm ipp could rescue parasites from dox concentrations > µm. we observed that ipp shifted the -hour ec value of dox from ± to ± µm (average ± sd of independent assays, p = . by unpaired t-test) ( figure c and figure - figure supplement ), suggesting that first-cycle growth defects from - µm dox reflect an apicoplast-specific mechanism but that dox concentrations > µm cause off-target defects outside this organelle. we further tested this conclusion using continuous growth assays performed at constant dox concentrations. we observed that ipp fully or nearly fully rescued parasites from first-cycle growth inhibition by µm but not or µm dox ( with first-cycle activity by an apicoplast-specific mechanism. µm dox blocks apicoplast biogenesis in the first cycle: inhibition of apicoplast biogenesis in the second intraerythrocytic cycle is a hallmark of - µm dox-treated p. falciparum, resulting in unviable parasite progeny that fail to inherit the organelle ( ). ipp rescues parasite viability after the second cycle without rescuing apicoplast inheritance, such that third-cycle daughter parasites lack the organelle and accumulate apicoplast-targeted proteins in cytoplasmic vesicles ( ). 
we treated synchronized ring-stage d ( ) or nf ( ) parasites expressing the acyl carrier protein leader sequence fused to gfp (acpl-gfp) with µm dox and assessed apicoplast morphology what is the molecular mechanism of faster apicoplast-specific activity by µm dox? we first considered the model that both and µm dox inhibit apicoplast translation but that µm dox kills parasites faster due to more stringent translation inhibition at higher drug concentrations. this model predicts that treating parasites simultaneously with multiple distinct apicoplast-translation inhibitors, each added at a delayed death-inducing concentration, will produce additive, accelerated activity that kills parasites in the first cycle. to test this model, we treated synchronized d parasites with combinatorial doses of µm dox, µm clindamycin, and nm azithromycin and monitored growth over intraerythrocytic cycles. treatment with each antibiotic alone produced major growth defects at the end of the second cycle, as expected for delayed-death activity at these concentrations ( ). two- and three-way drug combinations caused growth defects that were indistinguishable from individual treatments and provided no evidence for additive, first-cycle activity ( figure a and results contradict a simple model that and µm dox act via a common translation-blocking mechanism and suggest that the first-cycle activity of µm dox is due to a distinct mechanism. exogenous iron rescues parasites from first- but not second-cycle effects of µm dox. tetracycline antibiotics like dox tightly chelate a wide variety of di- and trivalent metal ions via their siderophore-like arrangement of exocyclic hydroxyl and carbonyl groups (figure ) , with a reported affinity series of fe + >fe + >zn + >mg + >ca + ( , ). indeed, tetracycline interactions with ca + and mg + ions mediate cellular uptake and binding to biomolecular targets such as the tetracycline repressor and s rrna ( , ). 
we next considered a model that first-cycle effects of µm dox reflect a metal-dependent mechanism distinct from ribosomal inhibition causing second-cycle death. to test this model, we investigated whether exogenous metals rescued parasites from µm dox. we failed to observe growth rescue by µm zncl (toxicity limit ( )) or µm cacl in continuous-growth ( figure b and we also observed that µm fecl but not cacl rescued first-cycle apicoplast- branching in µm dox ( figure e and figure -figure supplement ). these observations contrast with ipp, which rescued parasite viability in µm dox but did not restore apicoplast branching ( figure e ). we further noted that fecl selectively rescued parasites from the apicoplast-specific, first-cycle growth effects of µm dox but did not rescue parasites from the we first considered whether exogenous fecl might selectively rescue µm dox activity by blocking or reducing its uptake into the parasite apicoplast, since metal chelation has been reported to influence the cellular uptake of tetracycline antibiotics in other organisms ( ). however, µm fecl or mgcl did not rescue second-cycle parasite death in continuous growth assays with µm (figures d) or µm dox ( figure f ). furthermore, exogenous iron resulted in only a small, . -µm shift in ec value from . to µm in a -hour growth inhibition assay, in contrast to the . -µm shift provided by ipp (figure -figure supplement ) . these results strongly suggest that dox uptake into the apicoplast is not substantially perturbed by exogenous iron. the inability of µm fecl to rescue first-cycle activity by ≥ -µm dox ( figure g ) further suggests that general uptake of dox into parasites is not substantially affected by exogenous iron. we propose two distinct models to explain the metal-dependent effects of µm dox, both of which could contribute to apicoplast-specific activity. 
first, dox could directly bind and sequester labile iron within the apicoplast, reducing its bioavailability for fe-s cluster biogenesis and other essential iron-dependent processes in this organelle. indeed, prior work has shown that apicoplast biogenesis requires fe-s cluster synthesis apart from known essential roles in isoprenoid biosynthesis ( ). in this first model, rescue by exogenous fecl would be due to restoration of iron bioavailability, while modest rescue by µm mgcl may reflect competitive displacement of dox-bound iron to restore iron bioavailability. rpmi growth medium already contains ~ µm mg + prior to supplementation with an additional µm mgcl , and thus mg + availability is unlikely to be directly limited by µm dox. consistent with a general mechanism that labile-iron chelation can block apicoplast biogenesis, we observed in preliminary studies that the anti-plasmodium growth inhibition caused by the highly-specific iron chelator, deferoxamine in a second model, dox could bind to additional macromolecular targets within the apicoplast (e.g., a metalloenzyme) via metal-dependent interactions that inhibit essential functions required for organelle biogenesis. exogenous µm fe + would then rescue parasites by disrupting these inhibitory interactions via competitive binding to dox. this second model would be mechanistically akin to diketo acid inhibitors of hiv integrase like raltegravir that bind to active site mg + ions to inhibit integrase activity but are displaced by exogenous metals ( , ) . to test this model, we are developing a dox-affinity reagent to identify apicoplast targets that interact with doxycycline and whose inhibition may contribute to first-cycle dox activity. there has been a prevailing view in the literature that delayed-death activity is a fundamental limitation of antibiotics like dox that block apicoplast maintenance ( , ). 
our results emphasize that dox is not an intrinsically slow-acting antimalarial drug and support the emerging paradigm ( - ) that inhibition of apicoplast biogenesis can defy the delayed-death phenotype to kill parasites on a faster time-scale. the first-cycle, iron-dependent impacts of µm dox or µm dfo on apicoplast biogenesis also suggest that this organelle may be especially susceptible to therapeutic strategies that interfere with acquisition and utilization of iron, perhaps due to limited uptake of exogenous iron and/or limited iron storage mechanisms in the finally, this work suggests the possibility of repurposing dox as a faster-acting antiparasitic treatment at higher dosing, whose multiple mechanisms would be expected to limit parasite resistance. prior studies indicate that - mg doses in humans achieve sustained serum dox concentrations ≥ µm for - hours with little or no increase in adverse effects ( , ). dox is currently contraindicated for long-term prophylaxis in pregnant women and young children, two of the major at-risk populations for malaria, due to concerns about impacts on fetal development and infant tooth discoloration, respectively, based on observed toxicities for other tetracyclines ( ). recent studies suggest that these effects are not associated with short-term dox use ( , ), but more work is needed to define the safety parameters that would govern short-term imaging experiments were independently repeated twice. parasite nuclei were visualized by incubating samples with - µg/ml hoechst (thermo scientific pierce ) for - minutes at room temperature. the parasite apicoplast was visualized in d ( ) or nf mevalonate-bypass ( ) cells using the acpleader-gfp expressed by both lines. images were taken on dic/brightfield, dapi, and gfp channels using either a zeiss axio imager or an evos m imaging system. fiji/imagej was used to process and analyze images. all image adjustments, hoechst stain). 
- parasites were examined for each treatment condition on each given day for duplicate experiments, and data were plotted as the average percentage of parasites in each population that displayed an elongated, punctate, or dispersed apicoplast gfp signal. for clarity, error bars are not displayed but standard deviations were < % in all conditions. cell-percentage differences were analyzed by unpaired t-test (p values in parentheses, ns = not significant) . later (green = acpl-gfp, blue = nuclear hoechst stain). indirect effects of the covid- pandemic on malaria intervention coverage, morbidity, and mortality in africa: a geospatial modelling analysis dis tetracyclines specifically target the apicoplast of the malaria parasite plasmodium falciparum antimalarial drug resistance in africa: the calm before the storm? tetracyclines in malaria pharmacokinetics of oral doxycycline during combination treatment of severe falciparum malaria multiple antibiotics exert delayed effects against the plasmodium falciparum apicoplast chemical rescue of malaria parasites lacking an apicoplast defines organelle function in blood-stage plasmodium falciparum macrolides rapidly inhibit red blood cell invasion by the human malaria parasite, plasmodium falciparum protein trafficking to the plastid of plasmodium falciparum is via the secretory pathway a mevalonate bypass system facilitates elucidation of plastid biology in malaria parasites avidity of the tetracyclines for the cations of metals chemical and biological dynamics of tetracyclines structural basis of gene regulation by the tetracycline inducible tet repressor-operator system fluxes in "free" and total zinc are essential for progression of intraerythrocytic stages of plasmodium falciparum bioavailable iron and heme metabolism in plasmodium falciparum iron chelation therapy for malaria: a review the suf iron-sulfur cluster synthesis pathway is required for apicoplast maintenance in malaria parasites intracellular labile iron 
diketo acid inhibitor mechanism and hiv- integrase: implications for metal binding in the active site of phosphotransferase enzymes retroviral intasome assembly and inhibition of dna strand transfer inhibitors of nonhousekeeping functions of the apicoplast defy delayed death in plasmodium falciparum delayed death by plastid inhibition in apicomplexan parasites disruption of apicoplast biogenesis by chemical stabilization of an imported protein evades the delayed-death phenotype in malaria parasites small molecule inhibition of apicomplexan ftsh disrupts plastid biogenesis in human pathogens targeting drugs using a chemical supplementation assay in cultured human malaria parasites pharmacokinetics and tolerability of a single oral -mg dose of doxycycline serum levels of doxycycline in normal subjects after a single oral dose has doxycycline, in combination with anti-malarial drugs, a role to play in intermittent preventive treatment of plasmodium falciparum malaria infection in pregnant women in africa? no visible dental staining in children treated with doxycycline for suspected rocky mountain spotted fever in vitro and in vivo antimalarial efficacies of optimized tetracyclines deconvoluting heme biosynthesis to target blood-stage malaria parasites magnet plus sorbitol-synchronized parasites were treated as rings with µm dox ± µm ipp and imaged or days later (green = acpl-gfp, blue = nuclear hoechst stain). - parasites were examined for each treatment condition on each given day for duplicate experiments, and data were plotted as the average percentage of parasites in each population that displayed an elongated, punctate, or dispersed apicoplast gfp signal. for clarity, error bars are not displayed but standard deviations were < % in all conditions. cell-percentage differences were analyzed by unpaired t- test (p values in parentheses, ns = not significant) . 
key: cord- - rreoh o authors: smith, sydni caet; gribble, jennifer; diller, julia r.; wiebe, michelle a.; thoner, timothy w.; denison, mark r.; ogden, kristen m. title: reovirus rna recombination is sequence directed and generates internally deleted defective genome segments during passage date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rreoh o for viruses with segmented genomes, genetic diversity is generated by genetic drift, reassortment, and recombination. recombination produces rna populations distinct from full-length gene segments and can influence viral population dynamics, persistence, and host immune responses. viruses in the reoviridae family, including rotavirus and mammalian orthoreovirus (reovirus), have been reported to package segments containing rearrangements or internal deletions. rotaviruses with rna segments containing rearrangements have been isolated from immunocompromised and immunocompetent children and in vitro following serial passage at high multiplicity. reoviruses that package small, defective rna segments have established chronic infections in cells and in mice. however, the mechanism and extent of reoviridae rna recombination are undefined. towards filling this gap in knowledge, we determined the titers and rna segment profiles for reovirus and rotavirus following serial passage in cultured cells. the viruses exhibited occasional titer reductions characteristic of interference. reovirus strains frequently accumulated segments that retained ′ and ′ terminal sequences and featured large internal deletions, while similar segments were rarely detected in rotavirus populations. using next-generation rna-sequencing to analyze rna molecules packaged in purified reovirus particles, we identified distinct recombination sites within individual viral gene segments. recombination junction sites were frequently associated with short regions of identical sequence. 
taken together, these findings suggest that reovirus accumulates defective gene segments featuring internal deletions during passage and undergoes sequence-directed recombination at distinct sites. importance viruses in the reoviridae family include important pathogens of humans and other animals and have segmented rna genomes. recombination in rna virus populations can facilitate novel host exploration and increased disease severity. the extent, patterns, and mechanisms of reoviridae recombination and the functions and effects of recombined rna products are poorly understood. here, we provide evidence that mammalian orthoreovirus regularly synthesizes rna recombination products that retain terminal sequences but contain internal deletions, while rotavirus rarely synthesizes such products. recombination occurs more frequently at specific sites in the mammalian orthoreovirus genome, and short regions of identical sequence are often detected at junction sites. these findings suggest that mammalian orthoreovirus recombination events are directed in part by rna sequences. an improved understanding of recombined viral rna synthesis may enhance our capacity to engineer improved vaccines and virotherapies in the future. genetic drift and reassortment are typically considered the primary mechanisms by which segmented rna viruses acquire genetic diversity. however, recombination also occurs regularly during the replication of these viruses and yields non-canonical rna molecules that differ from full-length genome segments ( , ) . 
these noncanonical rnas may be packaged and can influence viral population dynamics, persistence, and host immune responses. a subset of non-canonical rnas resulting from viral recombination events are known as defective viral genomes (dvgs), because they are unable to replicate in the absence of functional trans-complementation by the full-length parental rna. the most frequently reported type of rna dvg arises from recombination events resulting in large deletions that may remove much of the coding region of a viral rna while retaining elements required for polymerase binding, replication, and packaging ( ) ( ) ( ) . functions of dvgs are poorly defined for rna viruses, although some configurations have demonstrated roles in viral replication interference, innate immune antagonism, and virus evolution ( ) . for segmented reoviridae viruses, the need to successfully package a multi-partite genome likely imposes additional restrictions on the variety of non-canonical rnas that are tolerated. overall for the reoviridae, there have been limited studies of recombination events and the recombined non-canonical rnas they generate. for the reoviridae family of segmented, double-stranded (ds) rna viruses, rna segment termini are predicted to direct packaging and other viral replication processes. rotavirus is the leading cause of diarrheal mortality among unvaccinated children under five years of age ( ) . mammalian orthoreovirus (reovirus) has been linked to loss of oral tolerance associated with celiac disease and is in advanced clinical trials as an oncolytic ( ) . reoviridae particles are non-enveloped, multi-layered, and encapsidate nine to twelve dsrna genome segments ( ) . following entry into target cells and outer capsid removal, subvirion particles function as nanoscale factories that transcribe capped, positive-sense viral rna (+rna) species that are ~ . to kb in length ( ) ( ) ( ) ( ) ( ) . 
the ten reovirus segments are classified as large (l), medium (m), or small (s), while the eleven rotavirus segments are numbered. most segments contain a single open reading frame (orf) flanked by ′ and ′ untranslated regions (utrs) ( , ) . extensive base-pairing between ′ and ′ terminal regions, interrupted by secondary structures, has been predicted for many segments ( ) ( ) ( ) ( ) . reoviridae +rna sequences required for packaging, assortment, transcription, or translation encompass the utr and extend into the orf ( ) ( ) ( ) ( ) ( ) . experimental and computational evidence suggests inter-segment complementarity drives assortment of reoviridae +rnas, with shorter gene segments nucleating +rna complexes that are packaged into assembling viral capsids ( ) ( ) ( ) ( ) ( ) . since reoviridae viruses can package non-canonical segments, at least some level of divergence from the consensus sequence is tolerated ( , , ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . however, the extent of recombination and degree to which non-canonical segments are packaged are unknown. for the reoviridae, non-canonical gene segments have been reported. rotaviruses with rna segments containing rearrangements have been isolated from immunocompromised and immunocompetent children and in vitro following serial passage at high multiplicity ( , , ) . most reported rearrangements involve partial head-to-tail duplications inserted after the functional orf ( ) . in some cases, rotavirus rearrangements occur at preferred sites, with direct repeat sequences or rna secondary structures proposed as recombination hot spots ( , , ) . rearranged segments have also been detected in orbiviruses ( , ) . finally, reoviruses establishing chronic infections in vitro and in vivo may package small, defective rna segments ( , , , , ) . together, these observations suggest that recombination is not a rare occurrence during reoviridae replication. 
however, previous work often has not approached studies of reoviridae recombination and non-canonical segments in a systematic manner. the mechanism of reoviridae recombination, the frequency with which recombined rnas are synthesized or packaged, and the functions of recombined rnas remain poorly understood. in the current study, we sought to elucidate the type and frequency of noncanonical rna synthesis by reoviridae viruses. we determined the titers and rna segment profiles of reovirus (rst l and rst d i ) and rotavirus (rssa ) laboratory strains that had been rescued by reverse genetics then serially passaged ten times, each in triplicate lineages, in cultured cells. viruses exhibited occasional reductions in titer that are characteristic of interference by defective-interfering viral genes. the two reoviruses accumulated non-canonical rnas that retain ′ and ′ termini and feature one or more large internal deletions, while the rotavirus rarely accumulated such dvgs. analyses of next-generation rna-sequencing data sets from purified rst l reovirus rna revealed many junctions, with hot spots for recombination in specific viral gene segments. further, an overlap of - bp of identical sequence was favored at the recombination junction sites, suggesting that dvg formation is primarily determined by sequence complementarity. these findings suggest that reovirus frequently synthesizes and packages dvgs that contain internal deletions and undergoes recombination at distinct sites across the genome that encode key sequence features. this work provides rationale for future studies that will reveal detailed mechanisms of dvg synthesis and dvg effects on viral population dynamics and host responses. virus titer patterns differ among serially passaged reoviridae viruses. 
to compare virus population replication patterns over multiple infections between reovirus and rotavirus in cultured cells, we utilized a serial passage approach with two strains of reovirus, t l and t d i , and one strain of rotavirus, sa . t l and t d represent distinct human reovirus serotypes ( ) . t d induces necrosis and substantially more apoptosis than t l in murine l fibroblasts (l cells) ( , ) . recombinant strain (rs) t d i is identical to t d, except for a t i mutation in attachment protein σ that renders it resistant to proteolysis ( ) . sa is a simian rotavirus strain that induces necrosis in african green monkey kidney epithelial (ma ) cells ( ) . for each passage series, the viruses were recovered using plasmid-based reverse genetics, and working virus stocks were made after one additional round of amplification in cultured cells (fig. a) . rst l, rst d i , and rssa were serially passaged ten times, each in three lineages, and infectious virus titers were determined. we found that while the three lineages of each virus strain exhibited relatively similar virus titer patterns, rst l, rst d i , and rssa exhibited notable differences in virus titer pattern across the passage series (fig. b-d) . virus titers for rst l reovirus climbed steadily from ~ × pfu/ml at passage (p ) to ~ pfu/ml at p , remained high through p , decreased by ~ -fold to -fold at p , then rebounded through p (fig. b) . for serially passaged rst d i reovirus, virus titers climbed from ~ × pfu/ml at p to ~ × pfu/ml at p , then fluctuated between ~ × and × pfu/ml through p , with some variability observed among lineages (fig. c ). by p , rssa rotavirus had already reached titers of ~ × pfu/ml (fig. d ). titers then decreased by ~ fold to -fold at p and p , with a rebound to peak titers approaching pfu/ml during intervening passages. 
thus, titer patterns in serially passaged rssa rotavirus lineages resembled those of rst d i reovirus in that peak titers were lower and stayed within a narrower range, but they resembled those of rst l reovirus in that distinct titer dips and rebounds were observed concurrently for all three lineages ( fig. b-d) . reovirus and rotavirus produce small non-canonical segments during serial passage. alterations to virus titer patterns during serial passage can be due to the presence of dvgs, which may interfere with viral replication ( , ) . to detect the presence of non-canonical segments in lysates from reovirus and rotavirus passage series, we used reverse transcription polymerase chain reaction (rt-pcr). rna molecules containing both the ′ and ′ termini of a specific viral gene segment were amplified and visualized (table ) . using this approach, we consistently detected full-length viral gene segments ≤ . kb in length, which included m and s reovirus segments and all rotavirus segments (g -g ) (figs. - ). longer full-length segments, such as reovirus l segments, were amplified infrequently, due to limitations of the enzymes used for amplification. thus, our assay conditions permitted detection of small rna molecules that contained native viral ′ and ′ segment termini. for rst l reovirus lineage (lin ), rt-pcr products smaller than the full-length segment were amplified from multiple passages for each segment, except s ( fig. a) . non-canonical segments derived from segments l , m , and s were detected beginning at about p and continuously for the remaining passages. for segments l , l , m , and m , non-canonical segments were detected only transiently. for segments m , s , and s , an rna product slightly longer than the full-length genome segment was detected. there is no non-canonical segment whose presence or absence clearly correlates with the changes in titer shown in fig. b . 
however, a non-canonical l segment was detected from p -p , preceding the sharp decrease in titer at p , and a faint, transient, noncanonical m segment was detected only in p and p (figs. b and a). for rst d i reovirus lin , rt-pcr products smaller than the full-length segment were amplified from multiple passages for seven of ten segments (fig. b ). like the patterns observed for rst l, these products often were detected by p and then continuously across the majority of rst d i passages. however, some products were detected transiently. products derived from the rst d i l segment provide the most striking example of this property. the smallest non-canonical l segment was detected in p -p . a non-canonical l segment of ~ kb was detected strongly in p -p and p but weakly in some other passages, and two intermediately sized non-canonical l segments were detected only in p or only in p -p . for segment s , a non-canonical segment was detected that is slightly longer than the full-length genome segment. in contrast to the frequent detection of non-canonical segments smaller than the full-length segment for serially passaged reoviruses, such a product was detected for only one of eleven rotavirus segments, g , using primers that bind near the termini (fig. ). this non-canonical segment was present across all serial passages. for several segments, indistinct bands high on the gels suggested the presence of rt-pcr products that were longer than the full-length products. to determine whether the failure to detect small non-canonical segments for rssa rotavirus was due to primer bias, we repeated the rt-pcr for g , for which the initial primer bound nucleotides internal to the ′ end, using an extreme terminal primer ( table , alt primer). using the new g primer pair, we still failed to detect any rt-pcr products smaller than the full-length segment (fig. ). 
the segment for which we had detected a small non-canonical segment was g , and the ′ primer bound a position beginning nucleotides internal to the terminus. taken together, lin rna profiles suggest non-canonical rna species that differ in length but have identical termini to the parental segments accumulate variably during reovirus and rotavirus serial passage in cultured cells. after examining rna profiles for all ten or eleven segments of lin , we compared rna profiles among all three lineages for each virus passage series following amplification by rt-pcr and resolution in . % agarose, as in figs. - . for rst l and rst d i reoviruses, we used primers that anneal to the termini of l , m , and s , and for rssa rotavirus, we used primers that anneal to g , g , and g (table ) . these segments were chosen to compare rna profiles between representative small, medium, and large viral gene segments. for rst l reovirus, we found that profiles of putative dvgs were mostly conserved across the three independently passaged lineages (fig. a ). for rst l l , in all three lineages, a single, bright band of ~ . kb was detected beginning at p or p and continuing through p , and a smaller, faint band was detected in some central passages. for rst l m , non-canonical rna segments were detected at ~ . kb and ~ . kb in several passages. for rst l s , non-canonical rna segments just slightly larger than the full-length segment and at ~ . kb were detected in several passages. in contrast to rst l reovirus rna profiles, those of rst d i reovirus were less well conserved across the three lineages (fig. b ). for rst d i l , all three lineages contained a small non-canonical rna segment, ~ . kb in size, present in many passages. however, lin contained transient, larger non-canonical rna segments, as described above. lin contained smaller, transient rst d i l non-canonical rna segments instead of the ~ . kb segment in p , p , and p . 
lin contained a larger non-canonical rst d i l segment in later passages. for rst d i m , many passages of all three lineages contained a small putative non-canonical rna segment of ~ . kb, but this segment appeared absent from the last two passages of lin and lin , and additional larger non-canonical m segments were detected in p -p for lin . the rst d i s rna profiles were more consistent among lineages than those of the m and l segments and were similar to those of rst l s , with non-canonical rna segments just slightly larger than the full-length segment and at ~ . kb detected in several passages (fig. ) . for rst d i lin , an additional non-canonical s segment of < . kb also was detected in several passages. rna profiles for rssa rotavirus were very similar across the three lineages, featuring a single non-canonical rna segment of ~ . kb only for g (fig. ). as noted above for lin , for some segments in lin and lin , indistinct bands high on the gels suggested the presence of rt-pcr products that were longer than the full-length products. the observation that similarly sized products that differ from full-length segments, amplified by primers specific to the ′ and ′ segment termini, arose in multiple independent passages of rst l reovirus and rssa rotavirus suggests that these viruses preferentially accumulate certain non-canonical rnas. non-canonical reovirus segments feature large deletions. to gain insight into the identities of non-canonical rna products detected in lysates of serially passaged reovirus, we excised bands from agarose gels on which rt-pcr products had been resolved ( fig. ) and used the sanger method to determine their sequences. we selected bright, low-molecular weight bands amplified using primers specific for the termini of rst l and rst d reovirus l , m , and s (table ). the sequenced segments ranged from ~ % to % of the length of the parental segments. 
comparison with reference sequences revealed that each product was a dvg that contained relatively short intact termini and one or two large internal deletions (fig. ). the largest single deletion, in the rst l l dvg, was of , nucleotides and removed the majority of the orf and about half of the ′ utr. in every case, the ′ utr was left intact, and for five of the six sequenced dvgs, more than nucleotides of the ′ end of the orf also was left intact (fig. s ). the shortest terminal regions on either side of a novel dvg junction were only - nucleotides in length and were observed at the ′ terminus of the rst d i m dvg and the ′ termini of the rst l l dvg and the rst d i s dvg. for the rst l l dvg and the rst d i s dvg, part of the ′ utr was deleted during its emergence. for the rst l m dvg, less than nucleotides of the orf preceding the ′ utr was retained, but at nucleotides this utr is relatively long. together, these observations suggest that serially passaged reoviruses regularly undergo recombination events resulting in generation of dvgs that retain ′ and ′ termini and feature one or multiple large internal deletions. to better understand reovirus recombination, we examined sequences immediately surrounding the new junctions created in dvgs (table ). in several cases, a pattern was elusive. however, for the rst l l dvg, the first rst d i l dvg junction, and the rst d i s dvg, short regions of sequence similarity (six to seven nucleotides) were detected downstream from the recombination site. for the rst l l dvg, the upstream four nucleotides also were identical. thus, these junctions were identical before and after recombination, or, in the case of the rst l l dvg, contained a single inserted nucleotide. reovirus recombination occurs at distinct sites. 
after noting the short regions of identical sequence immediately downstream or spanning the junction sites for some of the dvgs we excised and sequenced, we wondered whether analysis of larger numbers of junction sites would reveal reovirus recombination biases. therefore, we extracted rna from benzonase-treated (to remove extra-virion nucleic acids) rst l particles, generated randomly primed, directional libraries, and sequenced them using illumina technology. these viruses had been amplified only a few times following recovery using plasmid-based reverse genetics to make laboratory stocks and were not anticipated to have accumulated numerous dvgs. we aligned high-quality reads to reference rst l sequences for each individual gene segment using virema (virus recombination mapper), a platform that detects recombination events in viral genomes from input next-generation sequencing data, and analyzed recombination events with a custom bioinformatic pipeline ( , ) . our sequencing averaged ~ reads per site across most genome segments, with reduced coverage at the extreme ′ and ′ termini (figs. a and s ). approximately % of reads mapped to viral gene segments, demonstrating that the virion particles had been sufficiently purified (table ) . we detected a genome-wide recombination frequency of approximately . % and identified many non-canonical junctions within each viral gene segment (figs. b and s , table ). in accordance with previous reports, a recombination junction was defined as a deletion greater than base-pairs flanked both upstream and downstream by a base-pair high-quality alignment ( , ) . only forward ( ' to ' ) junctions were retained for each segment. small, internal deletions of less than bp with an average size of ~ bp were especially prevalent in the l segments, as indicated by the concentration of points along the diagonal axes of junction plots in figures b and s , though similar deletions were detected in all segments. 
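the junction definition and filtering described above can be illustrated with a minimal python sketch. this is not the authors' pipeline: the `Junction` record, the `keep_junction` and `recombination_frequency` helpers, and the threshold value are hypothetical, and the frequency is assumed here to be the percentage of mapped reads that support a retained junction, which may differ from the exact definition used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical record for one detected junction (not a ViReMa type)."""
    segment: str   # gene segment name, e.g. "L1"
    start: int     # last retained position before the deletion (1-based)
    stop: int      # first retained position after the deletion (1-based)
    depth: int     # number of reads supporting this junction
    forward: bool  # True if both sides are joined in the same orientation


# Placeholder threshold; the value used in the study is not stated here.
MIN_DELETION = 5


def keep_junction(j: Junction, min_deletion: int = MIN_DELETION) -> bool:
    """Keep forward junctions whose deletion is longer than the threshold."""
    deleted = j.stop - j.start - 1
    return j.forward and deleted > min_deletion


def recombination_frequency(junctions, total_mapped_reads: int) -> float:
    """Assumed definition: percent of mapped reads supporting a kept junction."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    supporting = sum(j.depth for j in junctions if keep_junction(j))
    return 100.0 * supporting / total_mapped_reads
```

in this sketch, short deletions and reverse-orientation joins are dropped before the frequency is computed, mirroring the filtering order described in the text.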
the s segments, particularly s , more frequently exhibited fusion of one segment terminus to the other than did the larger segments, as indicated by the concentration of points in the lower right corners of junction plots in figures b and s . several segments exhibited hot spots for recombination (figs. c and s ). a hot spot was defined as a position or clustering of positions where the recombination frequency exceeded that of the overall genome. further, positions of interest were identified if the recombination frequency was higher than that of the rest of the segment. for example, in segment m , recombination frequencies at nucleotides and were at least four times higher than those at most other positions across the segment. in segment s , hot spots were detected around nucleotides - , - , - , , and - . however, for a few segments (l , s , and s ), recombination frequency was low across the entire gene segment length (fig. s ) . thus, the identification of positions in the reovirus gene segments where the recombination frequency was up to -fold higher than the global recombination frequency demonstrates that reovirus recombination may occur at key, distinct sites across gene segments. to test whether reovirus dvgs encode specific sequence preference, we extracted and quantified the upstream and downstream sequences flanking both the junction start and stop sites (fig a) . the percent adenosine (a), cytosine (c), guanosine (g), and uracil (u) was calculated and plotted for each position in a -nt window flanking the junction site represented by a red line and caret (^) (figs b and s ). to exclude potential biases for overrepresented species, junctions were not weighted by depth. the global composition including all segments was compared to the l , m , and s gene segments individually (fig. b) . there was not a strong enrichment or depletion of any nucleotide across all segments. 
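the hot-spot definition given earlier in this section (a position whose recombination frequency exceeds that of the overall genome) can be sketched as follows. the function name `hot_spots`, the dictionary inputs, and the per-position frequency (junction-supporting reads divided by coverage at that position) are assumptions made for illustration, not the authors' code.

```python
def hot_spots(junction_depth, coverage, global_frequency):
    """Return sorted positions whose local frequency exceeds the global one.

    junction_depth: dict mapping position -> reads supporting a junction there
    coverage: dict mapping position -> total read depth at that position
    global_frequency: genome-wide recombination frequency as a fraction
    """
    hits = []
    for position, depth in junction_depth.items():
        cov = coverage.get(position, 0)
        if cov <= 0:
            continue  # no coverage, so no frequency can be estimated
        if depth / cov > global_frequency:
            hits.append(position)
    return sorted(hits)
```

clusters of adjacent positions returned by such a function would correspond to the ranges of nucleotides reported as hot spots for individual segments.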
similarly, the l , m , and s segments did not display strong nucleotide preferences with a few notable exceptions. the m junction start sites were enriched for cytosine. both the start and stop sites of s were relatively enriched for adenosine immediately upstream of the junction sites. finally, both the start and stop sites of l were relatively enriched for guanosine immediately upstream of the junction. thus, while the reovirus junction sites did not demonstrate a strong overall sequence preference for a particular nucleotide or sequence motif, recombination at sites with specific nucleotide compositions may vary in a segment-dependent manner. we further tested whether reovirus recombination occurs in regions of sequence microhomology, similar to other rna viruses, such as dengue virus and flock house virus (fhv) ( ) . microhomology was defined as a region of identical sequence between and base pairs (bp) in length. when microhomology at virema-detected junctions was compared to an expected probability distribution, there was significant enrichment for - bp of overlap at junction sites for segments l , m , and s (fig. c) . to ascertain any preference for sequence homology in the most frequently detected recombination junctions, we examined sequences upstream and downstream from the start and stop sites of the most frequently identified s recombination junctions (table ). at least four nucleotides of identical sequence were detected for all but one of the top-ranking recombination junctions (table ). typically, regions of sequence similarity spanned the recombination junction, such that sequences immediately surrounding the newly recombined junction site were identical to those at both the start and stop junction sites. together, these observations suggest that rna sequence homology plays a role in reovirus recombination site selection. 
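one way to make the notion of microhomology at a junction concrete is to count how many nucleotides could be assigned to either side of the deletion without changing the recombined sequence. the sketch below is an illustrative assumption, not the virema or uhomology.py implementation: `microhomology_length`, its 0-based indices, and the `max_len` cap are hypothetical.

```python
def microhomology_length(reference: str, a: int, b: int, max_len: int = 10) -> int:
    """Length of identical sequence shared by the two sides of a junction.

    The recombined molecule is reference[:a] + reference[b:] (0-based, a < b).
    Nucleotides are counted when they match at the start site and the stop
    site, so the junction could be shifted without changing the product.
    """
    if not 0 <= a < b <= len(reference):
        raise ValueError("require 0 <= a < b <= len(reference)")
    span = b - a  # number of deleted nucleotides

    # Extend to the right: deleted bases after a versus retained bases after b.
    right = 0
    while (right < max_len and right < span
           and b + right < len(reference)
           and reference[a + right] == reference[b + right]):
        right += 1

    # Extend to the left: retained bases before a versus deleted bases before b.
    left = 0
    while (left < max_len and left < span
           and a - left - 1 >= 0
           and reference[a - left - 1] == reference[b - left - 1]):
        left += 1

    return left + right
```

because the count combines both directions, the same value is obtained whichever of the equivalent junction coordinates an aligner happens to report.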
while the synthesis and packaging of non-canonical segments, including dvgs, has been reported for multi-segmented, dsrna viruses of the reoviridae family ( , ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) , the frequency and mechanism of recombination yielding these rna products are unknown. here, we serially passaged two strains of reovirus and one strain of rotavirus in cultured cells and compared the virus titers and rna segment profiles. we demonstrated that small, non-canonical rna products that retain the termini and feature large internal deletions are synthesized frequently by both strains of reovirus (figs. , , and ). some small reovirus dvgs were maintained throughout the passage series, while other dvgs were transient (figs. and ) . rna-seq analysis of rna extracted from two independent preparations of purified, low-passage, benzonase-treated rst l reovirus revealed specific junction locations and recombination hot spots in individual gene segments (figs. and s -s ) . sequences surrounding the recombination junction sites in packaged rst l rna revealed significant nucleotide overlap and regions of sequence microhomology at junction sites for multiple gene segments ( fig. c and table ). together, these findings suggest that reovirus frequently synthesizes and packages defective gene segments featuring internal deletions and that recombination can be directed by sequence and occurs at distinct sites. multiple factors may influence virus titer during serial passage. in the current study, virus titer patterns for serially passaged rst l and rssa were similar in that distinct titer dips and rebounds were observed concurrently for all three lineages, but titer patterns for passaged rst d i and rssa lineages were similar in that peak titers were lower and stayed within a narrower range (fig. b-d) . t d reovirus induces necrosis and substantially more apoptosis than t l in l cells, and sa induces necrosis in ma cells ( ) ( ) ( ) . 
once most cells in the culture have become infected, cell death may limit the peak titer that can be achieved for rst d i and rssa , which may in part explain their narrower titer ranges. a repeating pattern of dips and rebounds in titer has been reported for serially and continuously cultured viruses and associated with generation of defective interfering particles ( , , , , ) . dvgs may interfere with viral replication through competition with full-length genomes for the viral polymerase and structural proteins ( ) . small dvgs may be transcribed or replicated with higher efficiency than the full-length genome, permitting eventual monopolization of these resources. particles encapsidating defective genomes can also potently activate innate immune responses, which may interfere with viral replication ( ) . for all three lineages, rst l reovirus exhibited a noteworthy decrease in titer at p , and rssa rotavirus exhibited dips in titer at p and p , each followed by a rebound (fig. b and d) . in contrast, rst d i reovirus exhibited smaller fluctuations in titer that were not always parallel among lineages (fig. c) . the frequency of the titer dips and rebounds of rst d i is more consistent with that previously reported for serially passaged bovine rotavirus than the frequency exhibited by the other viruses ( ) . additional experiments are needed before any causal relationship between dvg presence or absence and virus titer changes can be established for the viruses in this study. however, some small dvgs were maintained throughout the passage series, suggesting they have no negative impact on the virus population, while other dvgs were transient (figs. - ) . 
viruses with the most consistent titer patterns among lineages, rst l and rssa , also had the most consistent rt-pcr profiles, while the virus with the most variability in titer among lineages, rst d i , also exhibited the most diverse rna segment profiles, suggesting a potential link between these phenotypes (figs. b-d and - ). we anticipate that the true diversity and abundance of dvgs present in reovirus and rotavirus populations are much greater than revealed by rt-pcr and electrophoretic analysis, which likely was limited to the most abundant populations of small dvgs with termini that match those of a specific gene segment. differences in dvg profiles among reoviridae genera and reovirus strains detected in this study suggest underlying differences in recombination properties. it is unclear why serially passaged rssa rotavirus lysates contained so few dvgs with matched segment termini and internal deletions, while such dvgs were detected for most rst l and rst d i reovirus segments in most passages (figs. - ). for both reovirus and rotavirus, we detected putative non-canonical segments of greater length than the parental segment. in the literature, head-to-tail duplications (concatemers) resulting in longer segments have been most often reported for rotavirus, whereas internal deletions have been reported for reoviruses ( , , ) . it is possible that reoviridae virion structure influences non-canonical rna length. reoviruses have turrets at the icosahedral vertices, through which newly synthesized viral +rnas exit the particle, whereas rotaviruses are non-turreted reoviridae family members. like rotavirus, non-turreted orbiviruses package concatemers that are significantly longer than full-length segments ( ) . while mechanisms underlying differences in non-canonical viral segment synthesis remain unknown, infrequent detection of small rotavirus dvgs and frequent detection of small reovirus dvgs in this study are consistent with published findings. 
it is unclear why rst l reovirus exhibited similar dvg profiles across lineages, while rst d i exhibited much greater variability (fig. ) . consistency in rt-pcr profiles among rst l lineages and the finding that junction sites in packaged rst l rna from two different low-passage virus preparations were extremely well conserved support a model in which rst l recombination occurs at distinct sites and yields specific dvgs (figs. , b, and s ) . alternatively, rst l replication may tolerate dvg diversity poorly and limit the range of detected variants through purifying selection. the variability in rna segment profiles among rst d i lineages indicates that the products are not random artifacts of detection and suggests reduced recombination specificity or increased recombination frequency when compared to rst l (fig. ) . for influenza virus, a segmented negative-strand rna virus, the abundance of dvgs can be significantly affected by specific amino acid substitutions in the viral rna-dependent rna polymerase (rdrp), suggesting an important role for this enzyme in dvg synthesis ( , ) . for reovirus, the viral rdrp also likely plays a prominent role in recombination potential, with other viral and host factors potentially influencing recombination properties. for example, viral rdrp co-factor µ has been shown to differentially influence t l and t d transcriptional activity and replication efficiency in specific cell types ( , ) . thus, identifying key determinants of recombination and non-canonical segment synthesis will be important areas of future study. in comparison to other recent reports on rna virus recombination, reovirus demonstrated ~ -fold lower recombination frequency than coronavirus and ~ -fold lower recombination frequency than fhv ( , ) . 
while microhomology can occur as an artifact of illumina rna-seq library preparation, artifactual recombination typically joins fragments in opposite orientations, involves short deletions, and lacks site specificity across the genome, whereas biological recombination by the virus often will join fragments in the same orientation and may occur at hot spots in the genome ( , , ) . potential artifacts were removed by filtering for forward recombination junctions and removing short deletions from datasets presented here. thus, despite the low reovirus recombination frequency detected in this study, the bioinformatic quality controls, orientation of joined fragments, and presence of hot spots at specific sites in the genome support the biological relevance of recombination detected by rna-seq (figs. and s -s ). detection and sequencing of dvgs containing large internal deletions following reovirus passage in cultured cells further support the validity of these findings (figs. , , and ). recombination frequency for fhv increases during passage and also may do so for passaged reoviruses ( ) . the molecular mechanism of reovirus recombination is unknown. for influenza virus, dvgs containing internal deletions are common, and the ′ and ′ sequences of a dvg are always derived from the same segment and polarity ( ) . while our methodology cannot exclude the presence of other types of dvgs, our deep-sequencing data and previously reported findings suggest reovirus synthesizes dvgs that feature internal deletions and ′ and ′ termini from a single segment of the same polarity (fig. ) ( ) . influenza dvgs containing internal deletions are proposed to arise from pausing of the rdrp during nascent strand synthesis while continuing to process along the template molecule and then resuming synthesis at a downstream point on the template ( ). downstream rdrp reinitiation may be guided by complementarity between the nascent strand and the site of reinitiation on the template ( , ) . 
given the similarity in dvg composition and conservation of sequences at junction sites detected for multiple segments (figs. and c and table ), reovirus may synthesize dvgs using a mechanism similar to that proposed for influenza virus. rt-pcr products differing in size from full-length rst l s , which exhibited low recombination frequency across the segment, were not detected (figs. a and s ) . however, such rt-pcr products were detected for l and s , which also exhibited low overall recombination frequencies. aside from the presence of small stretches of homologous sequence, it is unclear why recombination occurs more frequently at some sites than others throughout the rst l reovirus genome (figs. c and s ) . it is possible that, in addition to sequence, +rna secondary structure also contributes to recombination site selection. effects of non-canonical rna species, including dvgs, during reoviridae infection are poorly understood. in addition to their effects on viral populations and the host, studies of non-canonical segments will reveal principles governing assortment and packaging, which may enable more sophisticated engineering of reoviridae-based vaccines and therapeutics using reverse genetics platforms. while passaging at relatively high moi and detection by rt-pcr suggest the existence of small dvgs in reovirus populations, these approaches are relatively insensitive (figs. - ) . the frequent detection of novel junctions resulting from recombination events via next-generation sequencing in relatively low-passage reovirus stocks suggests these events are common even in the absence of high-passage conditions (figs. - and s -s ). continued application of sensitive approaches will enable a deeper understanding of recombination frequency, the identities of non-canonical rna species, the mechanisms by which they are generated, and their impacts on virus populations and the host. cells. 
spinner-adapted l cells were grown in suspension culture in joklik's minimum essential medium (jmem; us biological) supplemented to contain % fetal bovine serum (fbs) (gibco), and mm l-glutamine (corning). all media were additionally supplemented with units/ml penicillin, µg/ml streptomycin (corning), and ng/ml amphotericin b (corning). all cells were maintained at °c with % co . baby hamster kidney cells expressing t rna polymerase under control of a cytomegalovirus promoter (bhk-t ) ( ) were maintained in dulbecco's minimum essential medium (dmem; corning) supplemented to contain % fbs (gibco), with mg/ml geneticin (gibco) added during alternate passages. ma cells were grown in minimum essential medium with earle's salts (emem; corning) supplemented to contain % fbs. viruses. laboratory stocks of reovirus strains rst l and rst d i were engineered using plasmid-based reverse genetics ( , ) . rst d i differs from rst d in that it contains an engineered t i mutation that prevents proteolytic cleavage of attachment protein σ ( ) . the t i mutation was introduced in the pbact -s t d plasmid by 'round the horn pcr ( ) with mutagenic primers. monolayers of bhk-t cells at ~ % confluency in -well plates were co-transfected using transit-lt transfection reagent (mirus bio llc) with . µg of each of ten plasmid constructs representing the t l or t d i reovirus genome. after five days of incubation, cells were lysed by two rounds of freezing and thawing, and rst l or rst d i in lysates was amplified once in l cells prior to use in serial passage experiments. viral titers were determined by plaque assay using l cells, as described ( ) . a laboratory stock of rssa was engineered using reverse genetics ( ) . confluent monolayers of bhk-t cells in -well plates were co-transfected using transit-lt transfection reagent (mirus bio llc) with . 
µg of each of eleven plasmid constructs representing the sa rotavirus genome and of each of two plasmids encoding vaccinia virus mrna capping enzymes, and . µg of a plasmid encoding the nelson bay virus fusion-associated small transmembrane protein. after one day of incubation, supernatants were removed and replaced with serum-free dmem. after one additional day of incubation, supernatants were removed, and . × ma cells in serum-free dmem containing . µg/ml of trypsin were added to each well, followed by incubation at °c for three more days. cells were then lysed by three rounds of freezing and thawing, and rssa in lysates was amplified once in ma cells prior to use in serial passage experiments. virus titer was determined by plaque assay using ma cells, as described ( ) . for p , x l cells, which support vigorous reovirus replication, were pelleted and adsorbed with rst l or rst d i in ml jmem at an moi of pfu/cell for h at °c. cell-virus mixtures were transferred to l bottles containing a magnetic stirring rod, and jmem was added to a total volume of ml. spinner bottles were incubated at °c with stirring for h prior to two rounds of freezing and thawing and removal of large cellular debris by centrifugation at × g for min. for subsequent passages (p -p ), an identical protocol was followed, except that x l cells were pelleted and resuspended in an adsorption inoculum of ml of cleared lysate from the previous passage. for each virus, passages were conducted in triplicate lineages, which were maintained throughout the series. virus titer in each passage for each lineage was determined by plaque assay ( ) . for p , rssa was activated by incubation with µg/ml of trypsin for h at °c then diluted to a final concentration of < µg/ml of trypsin. a confluent monolayer of ma cells (~ . × ), which support vigorous rotavirus replication, in a t flask was washed with serum-free emem and adsorbed with rssa in ml emem at an moi of . pfu/cell for h at °c. 
the inoculum was removed, and the monolayer was washed prior to the addition of ml serum-free emem containing . µg/ml of trypsin. cells were incubated at °c for h prior to two rounds of freezing and thawing and removal of large cellular debris by centrifugation at × g for min. for subsequent passages (p -p ), an identical protocol was followed, except that ml of cleared rssa lysate from the previous serial passage was activated by incubation with µg/ml of trypsin for h at °c, then used as the adsorption inoculum for the subsequent passage. passages were conducted in triplicate lineages, which were maintained throughout the series. virus titer in each passage for each lineage was determined by fluorescent focus assay (ffa). rotavirus fluorescent focus assay. ma cells were seeded into -well, blackwalled plates to achieve a density of ~ . × . rotavirus stocks were activated by incubation with µg/ml trypsin for h at °c then diluted serially in serum-free emem. cells were washed twice with serum-free emem then adsorbed with serial dilutions of rotavirus at °c for h. inocula were removed, and cells were washed and incubated in fresh medium at °c for - h. cells were then fixed with cold methanol, and rotavirus proteins were detected by incubation with polyclonal rotavirus antiserum (invitrogen) at a : , dilution in pbs containing . % triton x- at °c, followed by incubation with alexa fluor -labeled secondary igg (invitrogen) and dapi. images were captured for four fields of view per well using an imagexpress micro xl automated microscope imager (molecular devices). total and infected cells were quantified using metaxpress high-content image acquisition and analysis software (molecular devices). fluorescent focus units per milliliter of virus stock were calculated based on the surface area of the well quantified and virus inoculum volume and dilution. rt-pcr and electrophoretic analysis of dvg profiles. 
rna was isolated from cleared serial passage lysates using trizol ls reagent (invitrogen), according to the manufacturer's protocol. gene segments and dvgs were amplified using a onestep rt-pcr kit (qiagen), using the isolated serial passage rna as a template and gene-specific primers that bind the ′ and ′ rna termini (table ) . reaction and thermal cycler conditions were as described by the manufacturer. an extension time of min was used for gene segments < kb, and min was used for segments > kb. rt-pcr products were resolved by electrophoresis in . % agarose in the presence of ethidium bromide and imaged with a vwr photo imager and variable intensity uv transilluminator. rt-pcr product sizes were analyzed in comparison to migration of markers in bp and kb ladders (new england biolabs). dvg sequencing. selected dna fragments were excised from agarose gels, purified using a qiaquick gel extraction kit (qiagen), and sequenced using the sanger method (genewiz) and the same primers used for rt-pcr amplification. sequences were aligned to the reference viral genome using clc sequence viewer (qiagen). rst l purification, library preparation, and next-generation sequencing. in two independent preparations, l cells ( × ) were pelleted and adsorbed with laboratory stocks of rst l reovirus in ml jmem at an moi of pfu/cell for h at °c. cell-virus mixtures were transferred to l bottles containing a magnetic stirring rod, and jmem was added to a total volume of ml. spinner bottles were incubated at °c with stirring for h, then cells were pelleted by centrifugation at , rpm for min at °c. cell pellets were resuspended in homogenization buffer ( mm nacl, mm tris ph . , mm β-mercaptoethanol) and frozen at - °c. cell pellets were thawed and incubated with . % (v/v) deoxycholate for min on ice prior to addition of vertrel xf and sonication. virus particles were concentrated by centrifugation at , × g in a . - . g/cm cesium chloride density gradient. 
the reovirus-containing fraction was collected and dialyzed at °c in virion storage buffer ( mm nacl, mm mgcl, mm tris-hcl, ph . ). purified rst l particles were treated with a final concentration of u/µl benzonase (millipore) for h at °c to degrade extra-particle nucleic acids. rna was isolated from benzonase-treated particles using trizol ls reagent (invitrogen), according to the manufacturer's protocol. rna libraries were prepared from rna isolated from cesium chloride gradient-purified, benzonase-treated rst l virions. contaminating dna was degraded by treatment with rnase-free dnase i (new england biolabs) for min at °c. rna was isolated by using trizol ls reagent (invitrogen), according to the manufacturer's protocol, and the concentration and quality of rna were assessed using a bioanalyzer (agilent). library preparation for illumina sequencing was performed using the nebnext ultra ii rna library prep kit for illumina (new england biolabs) and a minimum of ng of rna, according to the manufacturer's instructions. briefly, rna was fragmented prior to first-strand and second-strand synthesis and rnaclean xp (beckman coulter) purification. pcr enrichment of adaptor-ligated dna was performed using nebnext multiplex oligos for illumina (new england biolabs) to produce illumina-ready libraries. libraries were sequenced by base pair paired-end sequencing on a novaseq sequencing system (illumina). assistance with quality control and next-generation sequencing was provided by the vanderbilt technologies for advanced genomics (vantage) research core. illumina rna-seq processing and alignment. raw reads were processed by first removing the illumina truseq adapter using trimmomatic ( ). nucleotide composition at each position surrounding dvg junctions was determined. to avoid bias of highly replicated dvgs and to more closely reflect the stochastic nature of rna recombination, each unique detected junction was counted equally rather than weighting by read count ( ) . 
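the unweighted composition calculation described above, in which each unique junction contributes once regardless of read depth, can be sketched in python. the study reports using the biostrings package in r for this step; the function below is a hypothetical stand-in, and the input layout (a dictionary keyed by segment, start, and stop) is an assumption made for illustration.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def percent_composition(junction_windows):
    """Percent A, C, G, and U at each window position, one vote per junction.

    junction_windows: dict mapping (segment, start, stop) -> flanking sequence.
    Keying by junction coordinates means each unique junction is counted once,
    so read depth does not weight the result.
    """
    sequences = list(junction_windows.values())
    if not sequences:
        return []
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all windows must have the same length")

    composition = []
    for i in range(width):
        counts = Counter(seq[i] for seq in sequences)
        composition.append(
            {nt: 100.0 * counts[nt] / len(sequences) for nt in NUCLEOTIDES}
        )
    return composition
```

applying the function separately to start-site and stop-site windows, and to each gene segment, would give per-position percentages comparable to those plotted around the junction sites.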
sequences were extracted from a sorted bed file listing the junctions using rec_site_extraction.py with a -base pair window. start site and stop site sequences were separated in microsoft excel, and the nucleotide frequency at each position was calculated using the biostrings ( ) package in rstudio ( ) . lengths of microhomology at junction sites were extracted from the virema sam file using the compiler_module.py of virema and -fuzzentry --defuzz flags. the frequency of overlaps ranging from - bp was calculated and compared to an expected probability distribution using uhomology.py. for junction analyses shown in tables and , regions of sequence similarity were defined as stretches of at least four sequential, identical nucleotides located in the same position or removed by a single nucleotide, relative to the junction site. a single mismatch, shown in black, was tolerated in identical sequences longer than four nucleotides, if at least two additional identical nucleotides were adjacent to it. data availability. data generated from illumina rna-seq can be accessed at ncbi sequence read archive (sra) under the bioproject accession prjna . code utilized in this report can be accessed at https://github.com/denisonlabvu/rna-seqpipeline. filled triangles indicate non-canonical rna products smaller than the full-length segment. red triangles indicate products that were excised and sequenced. open triangles indicate products larger than the full-length segment. filled triangles indicate products smaller than the full-length segment. open triangles indicate products larger than the full-length segment. figure . serial passage lineage - rotavirus segment profiles. rna extracted from serial passage lysates was used as template in rt-pcr reactions, along with primers that bind to the ' and ' terminal sequences of rotavirus gene segments g , g , or g . products from reactions in which template rna was extracted from rssa rotavirus lin - p - were resolved on . 
% ethidium-bromide stained agarose gels. gels from lin are identical to those shown in figure but are included for comparison. black asterisks indicate the position of the full-length segment. filled triangles indicate products smaller than the full-length segment. figure . schematics of reovirus dvgs. rt-pcr products amplified using primers that bind the ′ and ′ termini of the l , m , and s reovirus segments, smaller than the full-length segments, and indicated in figure were excised from agarose gels and sequenced. schematics, drawn to scale, of reovirus l , m , and s segments and sequenced rst l and rst d i reovirus dvgs are shown. the orf is colored purple (l ), blue (m ), or green (s ), and the ′ and ′ utrs are colored gray. deletions are indicated by dotted lines, and deleted nucleotides are described. influenza virus di particles: defective interfering or delightfully interesting? defective viral genomes are key drivers of the virus-host interaction virus population dynamics during infection the defective component of viral populations global, regional, national, and selected subnational levels of stillbirths, neonatal, infant, and under- mortality, - : a systematic analysis for the global burden of disease study reovirus infection triggers inflammatory responses to dietary antigens and development of celiac disease part ii: the viruses -the double stranded rna viruses -family reoviridae p - mechanism of intraparticle synthesis of the rotavirus double-stranded rna genome assortment and packaging of the segmented rotavirus genome bluetongue virus structure and assembly structural insights into the coupling of virion assembly and rotavirus replication fields virology rotaviruses physicochemical analysis of rotavirus segment supports a 'modified panhandle' structure and not the predicted alternative trna-like structure (trls) sequence diversity within the reovirus s gene: reovirus genes reassort in nature, and their termini are predicted to form a panhandle 
m (blue), and s (green) gene segments were compared to an expected probability distribution (gray). the frequency of each overlap is displayed as a mean (n = ), and error bars represent standard error. we thank the staff at vantage for assistance with next-generation sequencing and dr.
james chappell for critical reading of the manuscript. this research was supported in part by the national center for research resources, grant ul rr - , and is now at the national center for advancing translational sciences, grant ul tr - . the content is solely the responsibility of the authors and does not necessarily represent the official views of the national institutes of health. gctacacgttccacgacaat gatgagttgacgcaccacg primers for rssa rt-pcr
key: cord- -gt upfri authors: hölzer, martin; marz, manja title: poseidon: a nextflow pipeline for the detection of evolutionary recombination events and positive selection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gt upfri summary poseidon is an easy-to-use pipeline that helps researchers to find recombination events and sites under positive selection in protein-coding sequences. by entering homologous sequences, poseidon builds an alignment, estimates a best-fitting substitution model, and performs a recombination analysis followed by the construction of all corresponding phylogenies. finally, significantly positively selected sites are detected according to different models for the full alignment and possible recombination fragments. the results of poseidon are summarized in a user-friendly html page providing all intermediate results and the graphical representation of recombination events and positively selected sites. availability and implementation poseidon is freely available at https://github.com/hoelzer/poseidon. the pipeline is implemented in nextflow with docker support and processes the output of various tools. contact hoelzer.martin@gmail.com selection pressure continuously influences the evolution of genes and can be studied in many ways (vitti et al., ). for example, positive or diversifying selection can be detected by comparing the rates of non-synonymous (dn) and synonymous substitutions (ds) in an alignment of orthologous genes.
over several sites (codons), the dn/ds ratio (or ω) can reach values well above 1 (yang, ), and such sites are likely to be positively selected. for instance, specific amino acid changes are favored if they increase the host's fitness against a pathogen (fumagalli et al., ). alternatively, the genes of a pathogen are affected, as in the covid pandemic, where positively selected sites in the spike protein of the virus gave cause for concern (korber et al., ). the detection of positive selection enables researchers to gain insights into the evolution of genes and thus develop countermeasures against pathogens that are in a constant 'arms-race' with their host. since recombination can have a profound influence on evolutionary processes and can adversely affect phylogenetic reconstruction and the accurate detection of positive selection (shriner et al., ), screening for breakpoints to define recombinant parts within an alignment should be a standard step in any comparative evolutionary study. a comprehensive evolutionary analysis of significantly positively selected sites consists of several complicated steps, including (1) in-frame alignment; (2) indel correction; (3) phylogenetic tree calculation; (4) selection of a best-fitting nucleotide substitution model; (5) detection of topological incongruence and breakpoint selection to describe putative recombination events; (6) calculation of positively selected sites (ω > 1) under varying models; and (7) their impact on the selective pressure acting on the whole alignment. thus, such an analysis involves dozens of different tools and parameter settings. in addition, the results of many well-established and widely used tools in this field of evolutionary science are not easy to interpret and process. in particular, the accurate detection and handling of putative recombination events is a challenging but essential task.
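as a minimal illustration of the dn/ds reasoning above, the following python sketch computes ω from per-site substitution rates and labels the selection regime. the function names and the example rates are hypothetical and chosen only for illustration; they are not part of poseidon or of any tool it wraps, which estimate ω by maximum likelihood under codon models rather than by a plain ratio.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio (omega) for one site or gene.

    dn and ds are assumed to be already-estimated non-synonymous and
    synonymous substitution rates (hypothetical inputs in this sketch).
    """
    if ds <= 0:
        raise ValueError("ds must be positive to form the ratio")
    return dn / ds


def classify(w: float, tol: float = 1e-9) -> str:
    """Map omega to the usual qualitative interpretation."""
    if w > 1 + tol:
        return "positive (diversifying) selection"
    if w < 1 - tol:
        return "purifying selection"
    return "neutral evolution"


if __name__ == "__main__":
    # Made-up example rates, for illustration only.
    for dn, ds in [(0.6, 0.2), (0.1, 0.5), (0.3, 0.3)]:
        w = omega(dn, ds)
        print(f"dn={dn} ds={ds} omega={w:.2f} -> {classify(w)}")
```

a ratio alone says nothing about statistical significance, which is why the analysis steps listed above include model comparison rather than a simple threshold on ω.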
currently, only a few tools for the comprehensive detection of positive selection exist. these tools either do not automatically combine all of the described steps (delport et al., ), do not take possible recombination events into account (doron-faigenboim et al., ; stern et al., ; webb et al., ), or focus only on the detection of positive selection in prokaryotic genomes (su et al., ). here we present poseidon, a pipeline that allows researchers to perform comprehensive evolutionary studies by automatically taking care of all tasks mentioned above. poseidon detects not only positively selected sites in an alignment of homologous sequences but also possible recombination events that could otherwise adversely affect the detection of positive selection. the input is a single fasta file consisting of protein-coding dna sequences with a correct open reading frame. the output is summarized in tex and pdf format and visualized in a user-friendly html page, providing access to all results and intermediate files. poseidon comprises an assembly of different scripts and tools (fig. ) that allow for the detection of recombination and positive selection in protein-coding sequences. each third-party tool is encapsulated in a docker container, and all steps are connected in a nextflow (di tommaso et al., ) implementation for full parallelization and simple execution. if nextflow and docker are configured, poseidon can be installed and run with a single command: nextflow run hoelzer/poseidon --help. different profiles allow the reliable execution of poseidon in the cloud or on a cluster system. starting from homologous coding sequences provided by the user, we build a multiple sequence alignment guided by amino acid information with translatorx (v . ) (abascal et al., ), using muscle (v . . ) (edgar, ) to align the sequences. the resulting in-frame nucleotide alignment is cleaned for indels. a best-fitting substitution model is selected
using modeltest (posada and crandall, ), which is part of the hyphy suite (v . . ) (pond et al., ). possible recombination events and corresponding breakpoints in the alignment are detected using gard (pond et al., b,a) under the previously selected substitution model. all breakpoints are tested for significant topological incongruence using a kishino-hasegawa (kh) test (kishino and hasegawa, ). kh-insignificant breakpoints most frequently arise from variation in branch lengths between segments. however, we observed interesting positively selected sites in fragments without any significant topological incongruence (fuchs et al., ). thus, kh-insignificant breakpoints can be taken into account and are marked in the final output, as they might not arise from real recombination events. positions of putative breakpoints that would destroy the open reading frame are adjusted to the next valid position. phylogenetic reconstructions on the full alignment and all fragments are performed with raxml (v . . ) (stamatakis, ) using the gtrgamma model for nucleotide sequences and protgammawag for amino acids. all calculations are performed with bootstrap replicates. the user can apply optional outgroup rooting. the newick utilities suite (v . ) (junier and zdobnov, ) is used to visualize the calculated trees in different formats. positive selection is analyzed on the full alignment and each of the fragments separately. maximum-likelihood tests to detect positive selection under varying site models are performed with codeml (m a vs. m a, m vs. m ) implemented within the paml suite (v . ) (yang, ). furthermore, we implemented the m a vs. m test proposed by swanson et al. ( ) as an additional model test in poseidon. the statistical site models that do not allow (neutral models) or do allow (selection models) a class of codons to evolve with ω > 1 are compared. furthermore, varying codon frequency models are applied to simulate different nucleotide substitution rates.
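the comparison of a neutral site model against a selection model can be sketched as a likelihood ratio test. the python snippet below is an illustration only: the log-likelihood values are made up, the function name is hypothetical, and it assumes two degrees of freedom, a common choice for nested site-model pairs that the text here does not state explicitly. for two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no external library is needed.

```python
import math


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for nested models, assuming df = 2.

    lnl_null: log-likelihood of the neutral (null) model.
    lnl_alt:  log-likelihood of the selection (alternative) model.
    Returns (test statistic, p-value). For a chi-square distribution
    with 2 degrees of freedom the survival function is exp(-x / 2).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    # Numerical noise can make the statistic slightly negative.
    stat = max(stat, 0.0)
    return stat, math.exp(-stat / 2.0)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, not taken from any real codeml run.
    stat, p = lrt_pvalue_df2(lnl_null=-1000.0, lnl_alt=-990.0)
    print(f"2*delta_lnL = {stat:.2f}, p = {p:.3g}")
    print("reject neutral model" if p < 0.05 else "cannot reject neutral model")
```

the 0.05 cutoff in the usage lines is likewise an assumption for the example, not a setting reported for the pipeline.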
a gene is declared to be positively selected if the neutral model can be rejected in favor of the positive selection model based on a likelihood ratio test. then, a bayes empirical bayes (beb) approach (yang et al., ) is applied to calculate posterior probabilities (pp) that a codon comes from the site class with ω > 1. positively selected sites with an assigned pp > . are depicted as significant. we graphically summarize all positively selected sites under varying frequency models in the output (fig. ). thus, we allow the user to investigate sites that would be dismissed from the output when using a pp threshold. for example, such sites could be located in regulatory domains of the final protein, yielding a lower pp value due to insufficient species sampling (mcbee et al., ). the final output of poseidon is based on a heavily modified version of the translatorx html output. the amino acid color code is adapted from translatorx. here we present poseidon, an easy-to-use nextflow pipeline for the accurate detection of site-specific positive selection and recombination events in protein-coding sequences. the input is a multiple fasta file of homologous coding sequences that is automatically converted into a codon-based alignment. since recombination can have a profound impact on the evolutionary history of sequences, we initially check the alignment for topological incongruence to define putative recombination breakpoints. the whole evolutionary analysis of poseidon is performed independently for the full alignment and all possible fragments. poseidon automatically calculates maximum likelihood-based phylogenetic trees for all alignments, estimates ω values at each site, and computes their impact on the positive selection. all identified sites and their significance values are projected onto the codon and amino acid alignment of the input sequences to allow visual identification of evolutionary hot-spots with high ω values.
additionally, publication-ready pdf and latex tables are provided, including all breakpoints and significantly positively selected sites. all results are summarized in a user-friendly and clear manner, allowing researchers to study positive selection. funding: this work has been supported by the deutsche forschungsgemeinschaft (dfg) through the priority program spp- ma / - and was partly conducted within the collaborative research centre aquadiva (crc aquadiva) of the friedrich schiller university jena, also funded by the dfg. mh appreciates the support of the joachim herz foundation by the addon fellowship for interdisciplinary life science. we would like to thank lasse faber for his help in implementing the dockers. conflict of interest: none declared.
translatorx: multiple alignment of nucleotide sequences guided by amino acid translations
datamonkey : a suite of phylogenetic analysis tools for evolutionary biology
nextflow enables reproducible computational workflows
selecton: a server for detecting evolutionary forces at a single amino-acid site
muscle: multiple sequence alignment with high accuracy and high throughput
evolution and antiviral specificities of interferon-induced mx proteins of bats against ebola, influenza, and other rna viruses
signatures of environmental genetic adaptation pinpoint pathogens as the main selective pressure through human evolution
the newick utilities: high-throughput phylogenetic tree processing in the unix shell
evaluation of the maximum likelihood estimate of the evolutionary tree topologies from dna sequence data, and the branching order in hominoidea
the effect of species representation on the detection of positive selection in primate gene data sets
hyphy: hypothesis testing using phylogenies
automated phylogenetic detection of recombination using a genetic algorithm
gard: a genetic algorithm for recombination detection
modeltest: testing the model of dna substitution
potential impact of recombination on sitewise approaches for detecting positive natural selection
raxml version : a tool for phylogenetic analysis and post-analysis of large phylogenies
selecton : advanced models for detecting positive and purifying selection using a bayesian inference approach
psp: rapid identification of orthologous coding genes under positive selection across multiple closely related prokaryotic genomes
pervasive adaptive evolution in mammalian fertilization proteins
detecting natural selection in genomic data
vespa: very large-scale evolutionary and selective pressure analyses
paml : phylogenetic analysis by maximum likelihood
bayes empirical bayes inference of amino acid sites under positive selection
key: cord- -l n ocf authors: carozza, jacqueline a; brown, jenifer a.; böhnert, volker; fernandez, daniel; alsaif, yasmeen; mardjuki, rachel e.; smith, mark; li, lingyin title: structure-aided development of small molecule inhibitors of enpp , the extracellular phosphodiesterase of the immunotransmitter cgamp date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: l n ocf cancer cells initiate an innate immune response by synthesizing and exporting the small molecule immunotransmitter cgamp, which activates the anti-cancer stimulator of interferon genes (sting) pathway in the host. an extracellular enzyme, ectonucleotide pyrophosphatase phosphodiesterase (enpp ), hydrolyzes cgamp and negatively regulates this anti-cancer immune response. small molecule enpp inhibitors are much needed as tools to study basic biology of extracellular cgamp and as investigational cancer immunotherapy drugs. here, we surveyed structure-activity relationships around a series of cell-impermeable and thus extracellular-targeting phosphonate inhibitors of enpp . additionally, we solved the crystal structure of an exemplary phosphonate inhibitor to elucidate the interactions that drive potency.
this study yielded several best-in-class compounds with ki < nm and excellent physicochemical and pharmacokinetic properties. finally, we demonstrate that an enpp inhibitor delays tumor growth in a breast cancer mouse model. together, we have developed enpp inhibitors that are excellent tool compounds and potential therapeutics. adaptive immune checkpoint inhibitors such as anti-pd- , anti-pd-l , and anti-ctla- are now curing cancer patients who were previously considered terminally ill. these inhibitors work by removing the immunological brakes that cancer cells place on tumor-infiltrating lymphocytes (tils), therefore increasing the cancer-killing efficacy of tils. however, only "hot" tumors -those that already have high numbers of tils -respond to checkpoint inhibitor therapy. most tumors do not exhibit this til inflammation and thus are immunologically "cold." [ ] [ ] [ ] turning "cold" tumors "hot" by activating the innate immune detection of cancer, which is upstream of the recruitment of adaptive immune tils, could revolutionize cancer immunotherapy. the cytosolic double-stranded dna (dsdna) sensing-stimulator of interferon genes (sting) pathway is the key innate immune pathway that responds to cancerous cells. chromosomal instability and extrachromosomal dna are hallmarks of cancer that can lead to leakage of dsdna into the cytosol. [ ] [ ] [ ] [ ] [ ] the cytosolic dsdna is detected by the enzyme cyclic-gmp-amp synthase (cgas), which synthesizes the cyclic dinucleotide ', '-cyclic gmp-amp (cgamp). , cgamp then binds and activates sting, which leads to production of type i interferons (ifns) and downstream til infiltration. we recently discovered that cancer cell lines basally synthesize cgamp and export it to the extracellular space. extracellular cgamp is then internalized by host cells to trigger anti-cancer innate immune responses. 
however, we also discovered that the ubiquitously expressed extracellular enzyme enpp , which was previously known only as an atp hydrolase, is the dominant hydrolase for extracellular cgamp and dampens innate responses to cancer. we previously developed phosphorothioate cgamp analogs that are resistant to enpp hydrolysis. since then, other stable cgamp analogs have entered clinical trials in combination with anti-pd- checkpoint blockers (trial ids nct and nct , respectively). however, these cgamp analogs need to be injected directly into the tumors to achieve efficacy and to avoid systemic interferon response. alternatively, we propose to inhibit enpp to boost the efficacy of the endogenous extracellular cgamp that is only locally exported by cancer cells. indeed, genetic knockout and pharmacological inhibition of enpp both increased tumor-infiltrating dendritic cells and controlled tumor growth, without any safety concerns. this work suggests that enpp inhibitors could turn "cold" tumors "hot" and render them more sensitive to adaptive immune checkpoint inhibitors. outside of its role in cancer immunotherapy, cgamp has shown excellent adjuvant activities in vaccination against influenza and may aid in the current urgent demand to develop vaccines against sars-cov- . we, therefore, propose that enpp inhibitors may also stabilize cgamp when administered as an adjuvant. enpp is a single-pass transmembrane protein that is anchored with the catalytic domain outside of the cell. it can also be exported extracellularly in a soluble form that is abundant in the circulation. enpp was known to convert nucleotide triphosphates (preferentially atp) to nucleotide monophosphates. we reported that enpp also converts cgamp to amp and gmp by hydrolyzing first the '- ' phosphodiester bond, and then the '- '.
similar to the other npp family members, its catalytic site coordinates two zinc ions that hold the substrate phosphate in place for threonine-mediated nucleophilic attack. the adenosine base of enpp substrates stacks with aromatic residues that form a tight pocket in the active site. later co-crystal structures of enpp with papg, the degradation intermediate of ' '-cgamp, and with ' '-cgamp provided mechanistic insight into why enpp degrades ' '-cgamp, but not ' '-cgamp. however, all natural enpp substrates have km values higher than μm. , despite enpp being a highly sought-after target, [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] developing drug-like enpp inhibitors has proved difficult. here, we report the development of the most potent enpp inhibitors to date and the co-crystal structure of an exemplary phosphonate inhibitor with enpp . inspired by the molecular scaffold of a previous inhibitor, qs , , which lacks potency at physiological conditions, we build structure-activity relationships (sar) around the three sections of the molecule -the zinc-binding head, the core, and the tail -and develop several inhibitors with nanomolar ki values. our crystal structure revealed extensive interactions between the inhibitor and enpp and explains its - , -fold improvement in affinity over the natural substrates. finally, these inhibitors have desirable physicochemical and pharmacokinetic properties, which enables their systemic use in mouse studies. indeed, we demonstrate that treatment with one of our top enpp inhibitors delays tumor growth in a breast cancer mouse model. given that a soluble form of enpp is present in the circulation, we first asked how much enpp activity is present in freshly drawn mouse and human plasma by measuring the half-life of radiolabeled cgamp. cgamp was degraded rapidly (t / = minutes for mouse; t / = - minutes for human, based on healthy donors) (fig a-c) . 
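a half-life such as the one reported for cgamp in plasma can be estimated from a time course by assuming first-order decay, where the fraction remaining is exp(-k t) and the half-life is ln(2)/k. the python sketch below fits k by least squares on the log-transformed fractions. the time points and fractions are synthetic, the function name is hypothetical, and the source does not say which fitting procedure was actually used.

```python
import math


def half_life(times: list[float], fractions: list[float]) -> float:
    """Estimate a first-order half-life from fraction-remaining data.

    Fits ln(fraction) = -k * t + c by ordinary least squares and
    returns ln(2) / k, in the same time unit as `times`.
    """
    if len(times) != len(fractions) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    if any(f <= 0 for f in fractions):
        raise ValueError("fractions must be positive to take logarithms")

    logs = [math.log(f) for f in fractions]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    k = -sxy / sxx  # decay rate constant
    if k <= 0:
        raise ValueError("data do not show decay")
    return math.log(2) / k


if __name__ == "__main__":
    # Synthetic time course (minutes, fraction of substrate remaining).
    t = [0.0, 10.0, 20.0, 30.0]
    f = [0.5 ** (x / 15.0) for x in t]  # generated with a 15 min half-life
    print(f"estimated half-life: {half_life(t, f):.1f} min")
```

real degradation data would be noisy and may deviate from single-exponential behavior, so this is only a starting point.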
detectable hydrolysis of cgamp ex vivo occurs only in wt, and not enpp -/mouse plasma , and inhibiting enpp activity using edta to chelate the catalytic zinc ions abrogated cgamp degradation in both mouse and human plasma (fig a-c) . rapid degradation of extracellular cgamp by circulating enpp underscores the need to develop potent and systemic enpp inhibitors. assays used to assess the potency of previously attempted enpp inhibitors are inconsistent - ; however, developing an appropriate assay is key to determining the utility of the molecules in inhibiting cgamp degradation under physiological conditions. the most common model substrate is p-nitrophenyl- '-tmp (p-nptmp), but ic values measured using this substrate have been shown to deviate significantly from ic values measured using a natural substrate such as atp. in addition, perhaps to increase the speed of the assay, most reported assays are conducted at ph , where enpp is most active. however, since enpp is active in serum at physiological ph ( fig. a-c) , an effective enpp inhibitor needs to be potent at ph . or even lower, as can occur in the tumor microenvironment. , therefore, we evaluated inhibitors with an assay using cgamp as a substrate at ph . (fig. d) . there were only two non-nucleotide inhibitor hits from the literature which had the potential to be developed into a lead compound for inhibiting enpp . first, a thioacetamide inhibitor (compound , fig. e ) was reported to have a ki of nm against human enpp using p-nitrophenyl- '-tmp (p-nptmp) as the substrate, but the ki increased to µm when using atp as the substrate, both assays conducted at ph . when we tested the potency of compound using mouse enpp and cgamp as a substrate at both ph . and ph , we detected no activity at concentrations below µm (fig. e) , suggesting that it cannot block cgamp degradation activity specifically compared to other substrates, or that it may be specific to human enpp . 
regardless, lack of efficacy against cgamp and/or mouse enpp disqualifies this molecule as a scaffold to use for further development. second, patel et al. reported a quinazoline-piperidine-sulfamide inhibitor (qs , compound ) with an ic of nm against enpp using atp as a substrate. however, we found that the potency of qs dropped -fold when we adjusted the ph down to . instead of ph (ki = . μm using cgamp as a substrate, fig. e), making it unsuitable for experiments at physiological ph and in vivo. in addition, we have previously shown that qs non-specifically blocks cgamp export, limiting its use as a tool for studying extracellular cgamp biology. we, therefore, sought to develop more potent and specific enpp inhibitors. we hypothesized that qs lacked potency at ph . because deprotonation of the sulfamide group (pka around - ) is important for efficient binding to the zinc atoms at the catalytic site of enpp . therefore, we tested several other classes of zinc-binding head groups on the qs scaffold (fig. a-b). we also tried methylene, ethylene, and/or propylene linkages between the head groups and the piperidine core. ureas (compounds and ) and carboxylic acid (compound ) were less potent than sulfamides (compounds and ), but we observed improvement in potency with boronic acid (compound ), hydroxamic acids (compounds and ), and especially phosphates (compounds and ), the natural zinc-binding group of the enpp substrates cgamp and atp. to increase compound stability while preserving the core properties of phosphates, we tested thiophosphates (compounds and ), phosphonates (compounds - ), and a thiophosphonate (compound ), all of which are known to be more stable against enzymatic degradation than phosphates. the phosphonates showed ki values less than nm independent of the ph (fig. a-b). we chose to proceed with phosphonates because they are potent, stable, and synthetically tractable.
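the ph argument for the sulfamide head group can be made concrete with the henderson-hasselbalch relation, which gives the fraction of an acidic group that is deprotonated as 1 / (1 + 10^(pka - ph)). the python sketch below is illustrative only: the pka values and ph values in the usage lines are hypothetical placeholders, since the actual numbers are not recoverable from the text here, and the function name is invented for this example.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of an acidic group (HA <-> A- + H+) present as A-.

    Follows from the Henderson-Hasselbalch relation:
    pH = pKa + log10([A-] / [HA]).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend: a weakly acidic
    # head group versus a strongly acidic one, at two assay pH values.
    for label, pka in [("weak acid (assumed pKa 9)", 9.0),
                       ("strong acid (assumed pKa 2)", 2.0)]:
        for ph in (9.0, 7.0):
            frac = fraction_deprotonated(ph, pka)
            print(f"{label}: pH {ph} -> {frac:.3f} deprotonated")
```

under these assumed values the weak acid loses most of its anionic form when the ph drops by two units, while the strong acid stays almost fully deprotonated, which is the qualitative pattern the hypothesis above relies on.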
in addition, there are several phosphonate drugs on the market including tenofovir and pradefovir as antivirals, fosfomycin as an antibiotic, and bisphosphonates (pamidronic acid, zoledronic acid, alendronic acid) for osteoporosis and bone disease. finally, they are negatively charged at all physiological ph values, which is crucial for zinc binding and keeping them cell impermeable to act on the extracellular target enpp . we also chose the ethylene linker (compound , ki = nm, previously reported as stf- ) because it has been shown previously that for the sulfamides, analogs with shorter linker lengths have less affinity for the cardiac potassium channel herg, a detrimental off-target.
co-crystal structure of enpp and compound reveals molecular determinants of potency
enpp is a > kda multidomain glycoprotein with three glycosylation chains. it has a catalytic domain, a nuclease-like domain which provides structural support important for catalysis, and two disordered smb domains. we used a mouse enpp construct where the transmembrane anchor was truncated and replaced by a signal peptide, and expressed the construct in hek s gnt cells to generate secreted soluble enpp with simplified glycosylation chains as previously described. we then determined the structure of enpp in complex with compound to . Å resolution using x-ray crystallography (fig. , fig. s , tables s - ). compound occupies the substrate-binding pocket (fig. a-d) and forms extensive interactions with zinc ions and residues in the catalytic site, demonstrating that compound is a competitive inhibitor. the phosphonate oxygens bind both zinc ions, and the third phosphonate oxygen forms a hydrogen bond with n , a residue previously determined to be important for catalytic activity. the piperidine group is engaged in hydrophobic interactions with l , and the quinazoline group sits between y and f to form π-π interactions with both (fig. b-d).
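because the text reports both ic and ki values and describes the inhibitor as competitive, it may help to recall one standard way the two quantities are related. for a classical competitive inhibitor, the cheng-prusoff relation gives ki = ic50 / (1 + [s]/km). the python sketch below applies that relation to made-up numbers; it is not a description of how the values in this study were derived, the function name is hypothetical, and the relation is known to break down for tight-binding inhibitors whose ki approaches the enzyme concentration.

```python
def ki_from_ic50(ic50: float, substrate_conc: float, km: float) -> float:
    """Cheng-Prusoff conversion for a classical competitive inhibitor.

    ic50, substrate_conc and km must share consistent units
    (ic50 and the returned Ki in one unit, substrate_conc and km in one unit).
    """
    if ic50 <= 0 or km <= 0 or substrate_conc < 0:
        raise ValueError("ic50 and km must be positive; substrate_conc >= 0")
    return ic50 / (1.0 + substrate_conc / km)


if __name__ == "__main__":
    # Hypothetical assay: IC50 of 100 nM measured with substrate at its Km.
    ki = ki_from_ic50(ic50=100.0, substrate_conc=5.0, km=5.0)
    print(f"Ki = {ki:.1f} nM")
```

when the substrate concentration is far below km the correction factor approaches one and ic50 approximates ki, which is one reason assay conditions matter when comparing published potencies.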
although compound adopts a binding conformation similar to that of the reaction product amp (pdb gtw, fig. e-f), there are two significant differences between the ligands that suggest why compound has a much higher affinity than the substrates atp or cgamp. first, the quinazoline ring of compound sits ~ . Å further back in the catalytic pocket compared to the purine ring of amp, perhaps facilitating hydrophobic interactions with residues in the back of the pocket, and possibly a polar interaction between n of the quinazoline and y . second, the -methoxy group of compound makes a direct hydrogen bond with d , whereas amp makes a water-mediated hydrogen bond (fig. e). in addition, residue k forms a hydrogen bond with the -methoxy oxygen, further strengthening this direct hydrogen bond network. together, the potency of compound is driven by its zinc ion binding, hydrophobic interactions with the back of the binding pocket, and direct hydrogen bonds with active site residues of enpp . in addition, this structure guided our sar for the core and tail parts of the inhibitor. since our crystal structure suggests that the zinc-binding phosphonate head and quinazoline tail form the most important interactions with enpp , we next sought to explore the core region to achieve optimal geometry between these two functional groups (fig. a-b). inverting the nitrogen in the piperidine from aryl to alkyl (compound ) resulted in more than a -fold loss in potency, as did installing a piperazine. using the phosphonate zinc-binding head and the piperidine core as a scaffold, we then sought to determine optimal substitution of the quinazoline tail (fig. a-b). based on our co-crystal structure of enpp with compound , we predicted that the -methoxy would be critical for binding to enpp , while the -methoxy would be dispensable since it is solvent-exposed.
indeed, when we deleted the methoxy groups individually, we found that the -methoxy alone (compound ) was -fold less potent. we then moved to substituents at the -position of the quinazoline ring. the -methoxy (compound ) and -ethoxy (compound ) quinazolines displayed impressive potency with ki values less than nm, which is the limit of quantification of the assay when using a nm concentration of enpp . we reasoned that combining other methoxy groups with the -methoxy could make additional interactions with the binding pocket, since the -methoxy (compound ) alone was also potent. however, combinations of -methoxy with substituents in either the , , or positions (compounds - and - ) all decreased potency. from these data and the crystal structure, we hypothesize that the -methoxy (compound ) and -ethoxy (compound ) quinazolines make hydrogen bonds with d and/or k , similar to the -methoxy in compound . this would suggest that compounds and shift in the pocket to accommodate these interactions. adding more substituents to the quinazoline does not necessarily lower potency further, as the inhibitor-interacting residues may already be occupied. in addition, similar to previous sar, the -substituted vinyl- -pyridyl combined with the , -dimethoxy (compound ) showed identical potency to its parent , -dimethoxy (compound ). this is surprising, since the crystal structure shows limited space in this area of the pocket, suggesting that the pyridine may make a specific interaction with a protein residue. however, this trend did not hold for the -methoxy (compound ), where the addition of the -substituted vinyl- -pyridyl decreased potency by more than -fold. this further supports the hypothesis that different substitutions cause shifts in the inhibitor binding to the pocket, possibly excluding space that was previously available.
from the crystal structure of compound bound to enpp , we noted that the quinazoline tail of compound is close to residues near the back of the binding pocket. we perturbed the tail aromatic ring structure itself to investigate these possible interactions (fig. a,c). working from the quinazoline compound , we deleted the nitrogens on the ring one by one. deleting n (quinoline, compound ) led to merely a -fold decrease in potency, but deleting n (isoquinoline, compound ) led to a more than -fold decrease in potency. we tested whether we could replace the inhibitor-protein interaction that involved the n or n with a nitrile, a more electronegative group and possible hydrogen bond acceptor, as has been previously suggested. indeed, the , -dimethoxy, -nitrile quinoline (compound ) was more potent than the quinazoline itself (compound ), and this pattern repeated with all of the other methoxy substituents (compounds - ). compound was the most potent molecule with a ki less than our limit of quantification of nm. when we attempted to substitute back the missing nitrogen with a nitrile group on the isoquinoline scaffold (compound ), we lost potency, possibly due to molecular clashes between inhibitor and protein. in summary, the n nitrogen is critical for potency, possibly due to polar interactions with the protein backbone ~ Å away. it cannot be replaced by a nitrile since space is limited. the n nitrogen can be replaced by a more electronegative nitrile, leading to much more potent compounds. taking into account all of our sar data, we then synthesized hybrid molecules composed of the most potent heads, cores, and tails ( -methoxy quinazoline, e.g., compound , and -methoxy quinoline nitrile, e.g., compound ) (fig. ). for all of these molecules, ki values fell below nm. combining the benzyl amine core with the -methoxy quinoline -nitrile tail (compound ) yielded another inhibitor with a ki less than our limit of quantification of nm.
attaching other zinc binding head groups, including thiophosphate (compound ), boronic acid (compounds and ), and hydroxamic acid (compound ), to the most potent core/tail combinations also yielded potent inhibitors. we further evaluated the potency and in vitro adme properties of the seven most potent inhibitors. first, we evaluated the protein-shifted ic , which can help predict what the efficacy will be in vivo, as this value is generally higher than that observed in vitro due to protein binding. we observed that all of the inhibitors displayed a shift in potency when we did the assay in the presence of human serum albumin, although several still stayed below nm (fig. a, table , fig. s ). we then tested their potency in both mouse and human plasma, and obtained similar values, confirming that mid-nanomolar concentrations are sufficient in biological fluids to prevent degradation of cgamp (fig. b, table ). although enpp protein is present in our circulation, these data demonstrate that we can completely block serum enpp activity with as little as nm of our most potent inhibitors, showing that protein binding does not negate the inhibition, and suggesting that serum enpp will not sequester all the systemically administered enpp inhibitor. finally, all of our inhibitors are stable and cell impermeable. although we expect these phosphonate inhibitors to have few intracellular off-targets due to their impermeability, we also confirmed that the inhibitors are non-toxic to primary human peripheral blood mononuclear cells (pbmcs) ( table , fig. s ). since qs has an off-target herg liability, we tested two phosphonate inhibitors (compounds and ) and observed no inhibition at μm (table ) . we then assessed the pharmacokinetic (pk) profile of one of the top inhibitors, compound , in mice. since compound has an ic value of nm in mouse plasma, we aim to achieve serum concentrations of nm (ic value). 
first, we administered intravenous (iv) and subcutaneous (sc) doses of compound at mg/kg to mice and analyzed the concentration of compound in the serum, kidney, and liver for the next hours. we found that serum concentrations declined to nm or less within hours for both iv and sc administration, and the half-life was only - minutes. concurrent with the rapid decline in serum concentrations, we saw concentration plateaus in the kidney and liver in the micromolar range, suggesting that compound is rapidly excreted through these organs. this is in agreement with our previously reported pk study of compound performed at mg/kg via sc administration where we observed that the serum concentration drops to ~ nm at the end of hours. therefore, we were only able to test the efficacy of compound in that study dosing via an osmotic pump to maintain serum concentration of compound above its ic . to achieve sustained serum concentrations without surgically implanting osmotic pumps, we repeated the mg/kg sc dosing daily for days. we collected serum hours after the previous dose, which corresponds to the expected lowest point in the trough. we observed on average ~ nm of compound across several days, and the lowest concentration measured was nm. we also observed no adverse effects and the mice maintained healthy weights, suggesting that this dosage amount and timing is tolerated well. together, we succeeded in achieving convenient, systemic, once daily dosing of compound that results in sustained serum concentrations above the ic . after optimization of compound pk, we then asked if compound has anti-tumor efficacy in the orthotopic and syngeneic e triple negative breast cancer model. we chose this model because e cells basally export cgamp in cell culture. in addition, we previously saw that e cells implanted into mice grow more slowly in enpp -/mice than in wild type mice. 
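the relationship between a serum half-life and the time for which a single dose stays above a target concentration can be sketched with a one-compartment, first-order elimination model. this is an illustrative assumption, not the authors' pk analysis: the starting concentration, target concentration, and half-life below are hypothetical, and the model ignores the tissue accumulation described above for kidney and liver.

```python
import math


def elimination_rate(half_life_min: float) -> float:
    """First-order elimination rate constant k = ln(2) / t_half."""
    return math.log(2.0) / half_life_min


def concentration(c0_nm: float, t_min: float, half_life_min: float) -> float:
    """Serum concentration at time t for C(t) = C0 * exp(-k * t)."""
    return c0_nm * math.exp(-elimination_rate(half_life_min) * t_min)


def time_to_threshold(c0_nm: float, threshold_nm: float, half_life_min: float) -> float:
    """Minutes until the concentration falls to the threshold (0 if already below)."""
    if threshold_nm >= c0_nm:
        return 0.0
    return math.log(c0_nm / threshold_nm) / elimination_rate(half_life_min)


if __name__ == "__main__":
    c0 = 10_000.0      # hypothetical peak serum concentration (nM)
    target = 100.0     # hypothetical target concentration (nM)
    half_life = 20.0   # hypothetical half-life (minutes)
    minutes = time_to_threshold(c0, target, half_life)
    print(f"time above target: {minutes:.1f} min")
    print(f"concentration at 60 min: {concentration(c0, 60.0, half_life):.1f} nM")
```

under these assumptions a short half-life clears a bolus dose quickly, so any concentration that persists between daily doses would have to come from processes this simple model leaves out.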
we treated mice with established e tumors (~ mm ) with compound using the optimized dosing schedule for seven consecutive days. remarkably, we observed a delay in tumor growth in mice treated with compound relative to the controls, which also led to prolonged survival. the tumor growth delay observed with pharmacological inhibition of enpp mirrors that from genetic ablation of enpp we previously reported, demonstrating that our enpp inhibitors are therapeutically beneficial. here we report the development of highly potent phosphonate enpp inhibitors. we demonstrate that they are active against the physiological substrate cgamp under physiological conditions, including an in vitro assay at ph . and an ex vivo assay against enpp in mouse and human plasma. our compounds are by far the most potent inhibitors reported to date. although some of the boronic acids (compounds and ) and hydroxamic acids (compound ) are also potent, further investigation of their cell permeability and possible off-target effects would be needed. the lead phosphonate compounds, however, are cell impermeable through passive diffusion, avoiding all potential intracellular off-targets. we, therefore, nominate them as specific tool compounds to study enpp and extracellular cgamp biology. to understand the potency of our compounds, we solved the crystal structure of enpp with compound . we found that compound adopts a similar binding pose as amp, the product of cgamp and atp hydrolysis, but there are extra interactions that explain its enhanced potency including a direct hydrogen bond between the -methoxy and d (instead of water-mediated), extra hydrogen bond interactions with k , and other hydrophobic and polar interactions with l and y . our sar of other phosphonate inhibitors allowed us to build a model of the drivers of potency. 
the π-π stacking interactions are key, as only a couple of the cores we tried resulted in potent inhibitors and the core region could be important for positioning of the zinc-binding phosphonate with respect to the π-π stacking tail. we hypothesize that moving the methoxy groups to different positions on the ring would also engage d and/or k in hydrogen bonding. even though some of the positions on the ring are solvent-exposed in the compound structure in complex with enpp , combinations of more substituents are detrimental to potency, suggesting that these compounds could be shifted in the binding pocket leading to steric hindrance when extra substituents are added. it is interesting to note that enpp makes better interactions with the methoxy group, in contrast to the hydroxy or amine (e.g., compare compound to compounds and ; compare compound to compound ), suggesting that aliphatic carbons are important in addition to hydrogen bonding. in addition, we can speculate about the importance of the -nitrile group off the quinoline, e.g. compound and related analogs. compared to the n on the quinazoline, it is possible that the nitrile can make stronger polar interactions with the protein backbone or surrounding residues (e.g., d , y , or e ), act as a hydrogen bond acceptor, or polarize the quinoline ring for better π-π stacking. previous modeling showed that nitriles can replace water-mediated azomethine-protein interactions. this could extend to our scenario in replacing a weak polar interaction with a stronger one. in addition, it is surprising that the -vinyl- -pyridyl substituent (compound ) is potent, given that the position faces the protein backbone. it is possible that this large substituent can slide into the narrow pocket between h and y or could make a specific interaction with one of the residues.
it also suggests that this space is not as available with the -methoxy substituent, as compound (with -vinyl- -pyridyl) was more than one hundred-fold less potent than compound (without -vinyl- -pyridyl). in addition to unmatched potency, our enpp inhibitors also have desirable adme profiles in vitro, and we have measured the pharmacokinetics of one of the top inhibitors to demonstrate efficacious concentrations in mice with once daily systemic dosing. compared to jump starting the anti-cancer innate immune response by treating with direct sting agonists, our enpp inhibitors provide two advantages. first, although endogenous extracellular cgamp enhances the anti-tumor immune response, the same may not be true for cgamp analogs. because cgamp is specifically transported into different cell types through different transporters, , it is difficult to design cgamp analogs that match the cell-targeting profile of cgamp itself. it is important to target specific cells because sting activation in cancer cells promotes metastasis, and sting activation in t cells leads to t cell death, [ ] [ ] [ ] both of which would be detrimental to cancer patients. in contrast, enpp inhibitors should increase the half-life of endogenous cgamp and thus enhance the natural anti-tumor response. indeed, treatment of mice with compound delayed tumor growth in the e breast cancer mouse model, which is a promising result for enpp inhibitors as cancer therapeutics. second, sting agonists can only be introduced intratumorally to achieve efficacy and to avoid systemic inflammation, which limits treatment to palpable and injectable tumors. since our enpp inhibitors can be administered systemically, we hypothesize that enpp inhibitors could treat a wide variety of cancers and may even be effective against undetectable micrometastasis. they could also avoid causing widespread toxic interferon production, since they only enhance endogenous cgamp. 
this hypothesis is supported by the fact that enpp -inactivating mutations in both humans and mice do not cause interferonopathy. our enpp inhibitors are biological tools as well as candidate investigational drugs that have shown efficacy in mouse tumor models, and they hold the promise to potentiate the efficacy of radiation and other immune checkpoint inhibitors. finally, since cgamp has shown stunning results as an adjuvant for influenza vaccination, our lead enpp inhibitor that prevents its extracellular degradation may have the potential to maximize cgamp's adjuvancy effect in developing vaccines for pandemic threats.
all procedures to obtain plasma after blood draw were performed at ºc. blood from c b /j mice was obtained by cardiac puncture and deposited in heparin-coated tubes (bd microtainer with pst additive). mouse enpp ( nm; expressed and purified as described previously ) was incubated with µm cgamp (synthesized as described previously
the extracellular region (residues - ) of mouse enpp was expressed, purified, and crystallized as multiple needle-like crystals were tested for x-ray diffraction which, when x-ray exposed, showed weak diffracting power. one needle crystal was isolated and used to collect a data set to a minimum d-spacing of around . Å. data were collected at cryogenic temperature ( k) at beamline . . of the advanced light source (als) synchrotron (berkeley, ca, usa) at a single . Å wavelength. data were reduced with mosflm and scaled with scala within the ccp suite. the crystal belonged to the trigonal space group p and contained two polypeptide chains per asymmetric unit. data collection and refinement statistics are listed in table . the structure was solved by the molecular replacement method with phaser using mouse enpp (pdb code: gtw) polypeptide chain stripped from ligands, ions, and glycan chains, as the search model.
structural refinement was done using refmac iteratively with visual inspection of electron density maps and manual adjustment of atomic coordinates in coot until progression to convergence. the final refined structure shows an excellent agreement with reference protein data as shown by ramachandran statistics (table ). data collection statistics are derived from scala. to calculate rfree, % of the reflections were excluded from the refinement. rsym is defined as $R_{\mathrm{sym}} = \sum_{hkl}\sum_{i} \lvert I_i(hkl) - \langle I(hkl) \rangle \rvert \,/\, \sum_{hkl}\sum_{i} I_i(hkl)$, where $\langle I(hkl) \rangle$ is the mean intensity of the symmetry-related observations of reflection hkl. data refinement statistics are derived from refmac. the final quality check was done with procheck. graphic renderings were prepared with pymol.
as previously observed, ligands are shown as sticks/spheres, protein residue d is shown as sticks, and water is shown as sticks/spheres. density for water molecule is present only in amp-enpp crystal structure. (f) chemical structure of amp.
dotted line represents the minimum ic value ( nm) measurable with the given enzyme concentration. chemical structures of inhibitors are displayed below. dots represent the mean of two independent replicates, and shaded areas around the fitted curves represent the % confidence interval of the fit.
cell viability of primary human peripheral blood mononuclear cells (pbmcs) after incubation with indicated compounds for hours measured by celltiterglo. data is normalized to no compound ( % cell viability). two cell culture replicates are plotted.
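the rsym definition given with the crystallographic statistics can be sketched as a short calculation. this is a minimal illustration of the standard merging r-factor, not the scala implementation; the reflection indices and intensities below are made-up values, and the dictionary layout is an assumption chosen for clarity.

```python
from typing import Dict, List, Tuple

# Hypothetical container: each unique reflection (h, k, l) maps to the list of
# intensities measured for its symmetry-related observations.
Observations = Dict[Tuple[int, int, int], List[float]]


def r_sym(observations: Observations) -> float:
    """Rsym = sum_hkl sum_i |I_i(hkl) - <I(hkl)>| / sum_hkl sum_i I_i(hkl)."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if not intensities:
            continue  # skip reflections with no measurements
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("no intensity data to merge")
    return numerator / denominator


if __name__ == "__main__":
    example: Observations = {
        (1, 0, 0): [100.0, 110.0],  # made-up intensities
        (0, 1, 0): [50.0, 50.0],
    }
    print(f"Rsym = {r_sym(example):.4f}")
```

perfectly consistent repeat measurements give an rsym of zero, and the value grows as symmetry-related observations disagree.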
safety, activity, and immune correlates of anti-pd- antibody in cancer innate immune recognition of cancer innate immune signaling and regulation in cancer immunotherapy host type i ifn signals are required for antitumor cd + t cell responses through cd α + dendritic cells the multifaceted role of chromosomal instability in cancer and its microenvironment cgas surveillance of micronuclei links genome instability to innate immunity mitotic progression following dna damage enables pattern recognition within micronuclei chromosomal instability drives metastasis through a cytosolic dna response extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity cyclic gmp-amp synthase is a cytosolic dna sensor that activates the type i interferon pathway cyclic gmp-amp is an endogenous second messenger in innate immune signaling by cytosolic dna. science ( -. ) cgas produces a ′- ′-linked cyclic dinucleotide second messenger that activates sting extracellular cgamp is a cancer-cell-produced immunotransmitter involved in radiation-induced anticancer immunity slc a is an importer of the immunotransmitter cgamp the lrrc a:c heteromeric channel is a cgamp transporter and the dominant cgamp importer in human vasculature cells tumor-derived cgamp triggers a sting-mediated interferon response in nontumor cells to activate the nk cell response blockade of the phagocytic receptor mertk on tumor-associated macrophages enhances p x r-dependent sting activation by tumor-derived cgamp hydrolysis of ' '-cgamp by enpp and design of nonhydrolyzable analogs pulmonary surfactant-biomimetic nanoparticles potentiate heterosubtypic influenza immunity. science ( -. 
) identification and characterization of a soluble form of the plasma cell membrane glycoprotein pc- variants of enpp are associated with childhood and adult obesity and increase the risk of glucose intolerance and type diabetes structure of npp , an ectonucleotide pyrophosphatase/phosphodiesterase involved in tissue calcification crystal structure of enpp , an extracellular glycoprotein involved in bone mineralization and insulin signaling structural insights into cgamp degradation by ecto-nucleotide pyrophosphatase phosphodiesterase quinazolin- -piperidin- -methyl sulfamide pc- inhibitors: alleviating herg interactions through structure based design imidazopyridine-and purine-thioacetamide derivatives: potent inhibitors of nucleotide pyrophosphatase/phosphodiesterase (npp ) quinazoline- -piperidine sulfamides are specific inhibitors of human npp and prevent pathological mineralization of valve interstitial cells -a]benzimidazol- ( h)-one derivatives: structure-activity relationships of selective nucleotide pyrophosphatase/phosphodiesterase (npp ) inhibitors substrate-dependence of competitive nucleotide pyrophosphatase / phosphodiesterase (npp ) inhibitors synthesis of novel substituted pyrimidine derivatives bearing a sulfamide group and their in vitro cancer growth inhibition activity synthesis and biological evaluation of novel quinazoline- -piperidinesulfamide derivatives as inhibitors of npp highly selective and potent ectonucleotide pyrophosphatase- (npp ) inhibitors based on uridine '-pa,a-dithiophosphate analogues deazapurine analogues bearing a h-pyrazolo[ , -b]pyridin- ( h)-one core: synthesis and biological activity blood flow, oxygen and nutrient supply, and metabolic microenvironment of human tumors: a review the acidic tumor microenvironment as a driver of cancer development of cgamp-luc, a sensitive and precise coupled enzyme assay to measure cgamp in complex biological samples expression, purification, crystallization and preliminary x-ray 
crystallographic analysis of enpp optimization of , -disubstituted- -(arylamino)quinoline- -carbonitriles as orally active, irreversible inhibitors of human epidermal growth factor receptor- kinase activity nitrile-containing pharmaceuticals: efficacious roles of the nitrile pharmacophore intrinsic antiproliferative activity of the innate sensor sting in t lymphocytes cutting edge: activation of sting in t cells induces type i ifn responses and cell death signalling strength determines proapoptotic functions of sting evolving methods for macromolecular crystallography: processing diffraction data with mosflm an introduction to data reduction: space-group determination, scaling and intensity statistics d concentration); the maximum is µm. (e) chemical structures and ki values of compounds and (mean of at least independent replicates residues in...* number of residues (%) most favored regions ( . %) additional allowed regions ( . %) generously allowed regions ( . %) disallowed regions ( . %) *according to procheck for non-proline and non-glycine residues ( residues). key: cord- -krann ir authors: barber-axthelm, isaac m; kelly, hannah g; esterbauer, robyn; wragg, kathleen; gibbon, anne; lee, wen shi; wheatley, adam k; kent, stephen j; tan, hyon-xhi; juno, jennifer a title: coformulation with tattoo ink for immunological assessment of vaccine immunogenicity in the draining lymph node date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: krann ir characterisation of germinal centre b and t cell responses yields critical insights into vaccine immunogenicity. non-human primates are a key pre-clinical animal model for human vaccine development, allowing both lymph node and circulating immune responses to be longitudinally sampled for correlates of vaccine efficacy. 
however, patterns of vaccine antigen drainage via the lymphatics after intramuscular immunisation can be stochastic, driving uneven deposition between lymphoid sites, and between individual lymph nodes within larger clusters. in order to improve the accurate isolation of antigen-exposed lymph nodes during biopsies and necropsies, we developed and validated a method for co-formulating candidate vaccines with tattoo ink, which allows for direct visual identification of vaccine-draining lymph nodes and evaluation of relevant antigen-specific b and t cell responses by flow cytometry. this approach improves the assessment of vaccine-induced immunity in highly relevant non-human primate models. peripheral lymphoid tissues, including lymph nodes (ln), tonsils and mucosal associated lymphoid tissues are critical sites for the generation of adaptive immunity and immunological memory. after intramuscular (im) administration, vaccine antigens drain via the lymphatics to be concentrated and retained within the ln, where they are subject to immune surveillance. antibodies are a key protective correlate for most human vaccines, with high-affinity variants generated within germinal centres (gc) via tightly regulated interactions of antigen, gc b (b gc ) cells, and t follicular helper (t fh ) cells ( ) . efficient generation of gc by immunisation is therefore a key determinant of vaccine success or failure, controlling the kinetics, magnitude and quality of the resultant serological response ( , ) .direct characterisation of antigen-specific b gc or t fh cells can provide important insights into vaccine immunogenicity and the biogenesis of protective immune responses. however, interrogation of ln b gc and t fh cells in humans is challenging, requiring invasive surgical excision or the collection of a small number of cells by fine needle aspirates (fna) ( , ) . 
in contrast, pre-clinical animal models such as non-human primates (nhps) offer the opportunity to collect longitudinal ln and peripheral blood samples during vaccine studies ( , ) . a factor critical to the detection of these immune responses is the accurate sampling of lns that drain the injection site, which can be technically challenging due to the sporadic route of antigen trafficking in vivo. there are multiple factors that can confound accurate sampling of vaccine draining lns. in humans, im vaccination into the deltoids sees antigen drain predominately to the axillary ln, with vaccine responses largely restricted to draining lns in anatomic proximity to the injection site ( - ) . vaccination in the quadriceps is expected to drain predominately to the deep inguinal lymph nodes, which subsequently drains to the external iliac ln. . in some individuals, lymphatic drainage from the proximal pelvic limb musculature may bypass the ipsilateral inguinal lns, and drain directly into the iliac lns . while lymphatic drainage patterns of the thoracic limb are conserved in rodents and nhps , pelvic limb lymphatics predominately drain to the iliac ln, with inconsistent drainage to the ipsilateral inguinal ln . we and others observed substantial variability in vaccine induced responses when random ln in the draining region are sampled in nhps ( ) and humans, likely in part due to not sampling the responding ln. the ability to directly track antigen drainage following vaccination would substantially improve the accuracy of ln biopsies and assessment of gc immune responses, particularly in large animals such as nhps. while this can be partially mitigated by substituting subcutaneous (sc) for im vaccine administration ( ) , the majority of human vaccines are given im and it is desirable to maintain comparable delivery in pre-clinical animal models. 
previous studies have used tracking dyes to broadly identify ln drainage patterns in rodent and nhp animal models. tracking dyes have also been utilised clinically to identify sentinel lns in cancer patients for biopsy or surgical resection. however, the potential for mixing vaccine antigens with tracking dyes for long-term demarcation of draining lns in vivo is untested. here we show that co-formulating influenza haemagglutinin (ha) or severe acute respiratory syndrome coronavirus (sars-cov- ) spike immunogens with adjuvant and tattoo ink allows for the ready visual identification of draining ln without compromising downstream analyses of cellular and humoral immunity. we propose tattoo ink-based vaccine tracking is an effective method for the differentiation of vaccine-draining lns during extended periods of time post-vaccination, and facilitates a more accurate quantification and phenotypic characterisation of vaccine-specific b and t fh cells in draining lns, in both murine and nhp animal models.
tracking dyes used in vaccinations should (i) visibly stain the draining ln with co-deposition of antigen, and (ii) not affect vaccine immunogenicity or downstream analyses, such as flow cytometric quantification of antigen-specific b gc and t fh populations. we first tested the impact of candidate tracking dyes (evans blue dye, india ink, and tattoo ink) on cell viability and autofluorescence in vitro. culturing human pbmc in media with . % evans blue dye for hour resulted in substantial cytotoxicity and loss of lymphocytes (fig a). in contrast, culture with % india ink or % tattoo ink did not affect lymphocyte viability (fig b). despite the short duration of co-culture ( hr), incubation of pbmc with % india ink demonstrated alterations in cellular autofluorescence as measured by flow cytometry on channels off the blue, violet and uv lasers (fig c).
given tattoo ink demonstrated less autofluorescence (fig c) , we proceeded to test the utility of tattoo ink co-formulated with vaccine antigens in mice. c bl/ j mice were immunised im in the right gastrocnemius and left quadriceps with a/puerto rico/ / haemagglutinin (pr -ha; μg) formulated with addavax adjuvant and tattoo ink ( . %). based on previous studies , antigens delivered to the right gastrocnemius will predominately drain to the right popliteal ln and the right iliac ln, with variable drainage to the right inguinal ln (fig d) . antigens delivered in the left quadriceps will predominately drain to the ipsilateral iliac ln, with variable drainage to the ipsilateral inguinal ln (fig d) . lymph nodes were harvested and assessed visually for ink staining days post-vaccination. on the left quadricep side, non-draining ln (left popliteal and axillary ln) showed no evidence of ink uptake, with the left iliac ln consistently exhibiting ink uptake (fig e, table ). in contrast, right popliteal and right iliac lns exhibited obvious ink uptake by eye following right gastrocnemius injection, while absent in the right axillary ln (fig e, table ). these observations are consistent with previous reports that lymphatics from the pelvic limbs drain into the iliac lns in mice . dye labelling of inguinal ln was variable, with approximately % of the left inguinal lns, and % of the right inguinal lns labelled (fig e, table ). this may reflect differences in lymphatic drainage patterns between proximal and distal muscle groups of the pelvic limb. overall, our results indicate that tattoo ink can label draining lns when administered in combination with antigen without substantially impacting lymphocyte viability and autofluorescence. to assess the extent to which ink staining tracks with vaccine-induced gc activity in mice, we quantified both total b gc (b + igd -gl + cd dim ) or ha-specific b gc cell frequency ( ) in lns with and without gross ink uptake (fig a) . (fig a, b) . 
these data indicate that the tattoo ink vaccine formulation does not hinder the identification of total or antigen-specific b gc by flow cytometry, and that ink-dyed draining ln are more likely to contain vaccine responses than suggesting that the degree of ink accumulation mirrors antigen load in the ln following vaccination. while mice generally have only - lns at each lymphoid site, primates and humans commonly exhibit chains or clusters of - lns. this creates additional complexities when evaluating the adaptive immune response, as vaccine antigen may drain to a small subset of the lns at a given site. we tested if sampling accuracy, and the characterisation of vaccine-elicited immune responses ex vivo, could be improved using tattoo ink to label vaccine-draining lns in nhps. pigtail macaques (macaca nemestrina) were immunised in the right quadriceps with sars-cov- spike ( μg) formulated with monophosphoryl lipid a (mpla) liposomal adjuvant ( ) , and were boosted im in the right and left quadriceps with sars-cov- spike ( μg) formulated with mpla and tattoo ink ( . %). administration at these sites was expected to drain primarily to the left and right iliac lns, with inconsistent drainage to the left and right inguinal lns (fig a). animals were additionally immunised in the right and left deltoids with human immunodeficiency virus- (hiv- ) fixed trimeric envelope protein (sosip) vaccines ( μg) formulated with mpla and . % tattoo ink (ink in the right deltoid only), with expected drainage to the axillary lns (fig a). animals were humanely euthanized - days after the second immunisation, and necropsies were performed to evaluate draining and nondraining lns for the presence of tattoo ink. among the draining lns, tattoo ink was visible in at least one ln within the right and left iliac chains in of animals (table ).
dye labelling of inguinal lns was variable, with lns containing tattoo ink identified in the left and right inguinal lymphoid sites in of and of animals, respectively (fig b and c, table ). dye labelling was observed in the right axillary lns in of animals, while dye labelling in the left axillary lns was observed in of animals (fig d-f, table ). stochastic drainage to the ipsilateral inguinal ln aligns with previous reports ( ) , and is important to consider as the inguinal ln is a common and readily accessible site for ln sampling via fna or biopsy. no tattoo ink was grossly visible in any nondraining ln clusters, including the popliteal, para-aortic, mesenteric, mediastinal, tracheobronchial and submandibular lns (data not shown). tattoo ink was generally observed in only a single ln, or a limited number of lns, recovered from a given lymphoid site (fig b-f), suggesting the widely used practice of pooling all ln for immunological analysis could result in significant dilution of vaccine-specific responses and unpredictable effects on reported frequencies of antigen-specific b and t cell responses. longitudinal studies involving ln biopsies in macaques commonly sample the more readily accessible inguinal ln, to which antigen drainage is highly stochastic ( ) . among the animals . however, long term stability in vivo, and the influence of dye on downstream immunological analysis is unclear. prolonged labelling of vaccine draining lns is a relevant consideration, as it allows serial sampling to evaluate changes in the immune response over weeks to months after initial vaccination. visible dye staining in rats was reported - days after intraperitoneal administration of pontamine sky blue dye ( ) . tattoo ink, by design, is a stable, relatively inert compound that can persist in lns for extended periods of time. in humans, several case reports have noted tattoo ink being incidentally identified in draining lns up to years after the tattoo was originally placed.
the longevity of tattoo ink in draining lns when formulated with vaccines still needs to be determined. however, our data demonstrate that draining lns containing tattoo ink can be readily identified weeks after administration in both mice and nhps, with the likelihood of it persisting for considerably longer. published methods for identifying vaccine-draining lns include administration of fluorescently labelled immunogens that can be identified by in vivo imaging systems (ivis), and administration of tc sulfur colloid that can be identified with a gamma probe ( ) . vaccines may also be administered in specific anatomical locations, or by specific routes, to increase the likelihood of antigen drainage to specific lns. examples include injection in the sc flank in mice for selective drainage to the ipsilateral inguinal ln, or sc immunisation in the anterior thigh for selective drainage to the inguinal and iliac lns in nhps ( ) . the route of vaccination can impact the associated immune response; for example, sc immunisation has been reported to elicit stronger neutralising antibody responses than im immunisation in nhps. identification of vaccine draining lns with tattoo ink provides a simple approach for active ln identification without requiring specialised equipment, while permitting im vaccination routes of greatest clinical relevance for human vaccines. one consideration with our proposed method is the potential for both endogenous and exogenous pigments to confound the identification of ink containing lymph nodes. hemosiderin, an iron-containing haemoglobin breakdown product, can accumulate in draining lymph nodes from congested, haemorrhagic, or inflamed areas.
similarly, carbon-containing, particulate debris can accumulate in tracheobronchial and mediastinal lymph nodes, following inhalation and drainage from the pulmonary tree
mouse studies and related experimental procedures were approved by the university of
full length influenza h n a/puerto rico/ / hemagglutinin (pr -ha) , sars-cov- spike protein (s) ( ) , and the receptor binding domain of the sars-cov- spike protein (rbd)
frequency of tattoo ink labelling in murine lns.
the lymph node at a glance -how spatial organization optimizes the immune response rapid germinal center and antibody responses in non-human primates after a single nanoparticle vaccine immunization direct probing of germinal center responses reveals immunological features and bottlenecks for neutralizing antibody responses to hiv env trimer quantification of residual germinal center activity and hiv- dna and rna levels using fine needle biopsies of lymph nodes during antiretroviral therapy normal human lymph node t follicular helper cells and germinal center b cells accessed via fine needle aspirations rhesus macaque b-cell responses to an hiv- trimer vaccine revealed by unbiased longitudinal repertoire analysis serial study of lymph node cell subsets using fine needle aspiration in pigtail macaques vaccine priming is restricted to draining lymph nodes and controlled by adjuvant-mediated antigen uptake cancer treatment and research deep lymphatic anatomy of the upper limb: an anatomical study and clinical implications axillary lymph node accumulation on fdg-pet/ct after influenza vaccination lymph node activation by pet/ct following vaccination with licensed vaccines for human papillomaviruses the normal anatomy of the lymphatic system in the human leg three dimensional demonstration of the lymphatic system in the lower extremities with multi detector row computed tomography: a study in a cadaver model perforating and deep lymphatic vessels in the knee region: an
anatomical study and clinical implications the lymph system in mice the lymphatics of japanese macaque lymph node mapping in the mouse selfassembling influenza nanoparticle vaccines drive extended germinal center activity and memory b cell maturation the lymphatic system in rhesus monkeys (macaca mulatta) outlined by lower limb lymphography tracking the luminal exposure and lymphatic drainage pathways of intravaginal and intrarectal inocula used in nonhuman primate models of hiv transmission patterns of lymphatic drainage in the adult laboratory rat triple injection" lymphatic mapping technique to determine if parametrial nodes are the true sentinel lymph nodes in women with cervical cancer how long will i be blue? prolonged skin staining following sentinel lymph node biopsy using intradermal patent blue dye efficacy of methylene blue dye in localization of sentinel lymph node in breast cancer patients safety and technical success of methylene blue dye for lymphatic mapping in breast cancer the predictive value of methylene blue dye as a single technique in breast cancer sentinel node biopsy: a study from dharmais cancer hospital subdominance and poor intrinsic immunogenicity limit humoral immunity targeting influenza ha-stem safety evaluation of monophosphoryl lipid a (mpl): an immunostimulatory adjuvant tattoo pigment in an axillary lymph node simulating metastatic malignant melanoma tattoo pigment mimicking axillary lymph node calcifications on mammography tattoo pigment in axillary lymph node mimicking calcification of breast cancer targeting hiv env immunogens to b cell follicles vaccine-draining lymph nodes of cancer patients for generating anti-cancer antibodies vaccine draining lymph nodes are a source of antigenspecific b cells immunization with tumor neoantigens displayed on t phage nanoparticles elicits plasma antibody and vaccinedraining lymph node b cell responses elicitation of robust tier neutralizing antibody responses in nonhuman primates by hiv 
envelope trimer immunization using optimized approaches histopathology of the lymph nodes pathologic basis of veterinary disease common reactive erythrophagocytosis in axillary lymph nodes primary nodal anthracosis identified by ebus-tbna as a cause of fdg pet/ct positive mediastinal lymphadenopathy cleavage strongly influences whether soluble hiv- envelope glycoprotein trimers adopt a native-like conformation structure and immunogenicity of a stabilized hiv- envelope trimer based on a group-m consensus sequence flow cytometry reveals that h n vaccination elicits cross-reactive stem-directed antibodies from multiple ig heavy-chain lineages humoral and circulating follicular helper t cell responses in recovered patients with covid- fiji: an open-source platform for biological-image analysis nih image to imagej: years of image analysis the authors would like to thank the staff at the monash animal research platform (marp) andgippsland field station, including irwin ryan, for their assistance with the non-human primate study. we thank robin shattock ( key: cord- -a n authors: mitchell, evan; wild, geoff title: prophylactic host behaviour discourages pathogen exploitation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: a n much work has considered the evolution of pathogens, but little is known about how they respond to changes in host behaviour. we build a model where hosts are able to choose to engage in prophylactic measures that reduce the likelihood of disease transmission. this choice is mediated by costs and benefits associated with prophylaxis, but the fraction of hosts engaged in prophylaxis is also affected by population dynamics. we identify a critical cost threshold above which hosts do not engage in prophylaxis. 
below the threshold, prophylactic host behaviour does occur and pathogen virulence, measured by the extent to which it exploits its host, is reduced by the action of selection relative to the level that would otherwise be predicted in the absence of prophylaxis. our work emphasizes the significance of the dual nature of the trade-off faced by the pathogen between balancing transmission and recovery, and creating new infections in hosts engaging or not engaging in prophylaxis.

at a fixed per-capita rate, γ, independent of their group. as a result of recovery, individuals are imbued with life-long immunity to future infection. individuals can also switch groups. for now we use τ ij , φ ij and η ij to represent the per-capita rates at which susceptible, infective, and recovered individuals, respectively, switch from group j to i. we expand on the details surrounding group switching later. we can summarize the description above using a system of differential equations. scaling time so that one time unit is equivalent to the average lifetime of an individual in the population, and matching birth and death rates, we get

note that the differential equations in ( ) sum to zero, and so total population size n is constant. this, along with the fact that group membership among recovered individuals is of no consequence, allows us to omit ( e) and ( f). we now use u i = s i /n and v i = i i /n to denote the fraction of susceptible and infective individuals, respectively, in group i. similarly, we use u = u + u and v = v + v to denote the total fraction of susceptible and infective or may not hold. the linear stability analysis presented in appendix a shows that the well-known endemic equilibrium remains stable as long as where r = β /( + γ) is the basic reproductive number [ ].

we consider the evolution of the level of pathogen exploitation of its host, denoted ξ > . exploitation affects disease transmission, with a greater ξ value corresponding to a greater β ij .
to reflect this, we now write β ij (ξ) where in words, we are treating β ij as an increasing function of ξ that saturates at a value of [ ] by assuming the trade-off faced by the pathogen involves recovery, e.g., through increased viral load making it more likely that the pathogen is detected by the host's immune system. we use an adaptive dynamics approach to model the evolution of pathogen exploitation under the primary influence of natural selection [ , , ] . we introduce a rare mutant pathogen with exploitation trait ξ m into a resident pathogen population with exploitation trait ξ. it is assumed that the resident system has reached equilibrium prior to introducing as we do with the resident population, we can also track the proportion of individuals infected with the mutant strain engaging in prophylaxis. denoting this proportion by y m , we can describe its dynamics by if the mutant strain becomes common, the approximate dynamics in ( ) break down and we say that the mutant has successfully invaded the resident population. provided the system is sufficiently close to an evolutionarily steady state, a mutant who successfully invades will become the new resident [ ] . since v m does not appear in ( b), we can first solve for the equilibrium valueȳ m of y m , substitute that value into ( a), and study ( a) alone. if the right-hand side of ( a) is positive (resp. negative), the mutant invades (resp. is eliminated) because it is favoured (resp. disfavoured) by natural selection. the sign of the right-hand side of ( a) is the same as the sign of the difference between and unity, where β avgū new infections are created during this time. if this quantity is larger than one (resp. smaller than one), then the mutant population will grow (resp. shrink). writing w as we have done in equation ( ) also highlights the two trade-offs faced by the pathogen. one is the transmission-recovery trade-off described above, captured through the β avg and γ terms. 
by unpacking the β avg term, though, we can also see a second trade-off between the type of infection created. through theȳ m terms, the pathogen is able to influence whether a new infection occurs in a protector or a non-protector. we can, therefore, use w as a payoff function in the adaptive dynamics analysis. following the discussion above, the direction of evolution of ξ that is favoured by natural an equilibrium valueξ, i.e., one that satisfies condition ( ) there are two special cases that can be analyzed with relative ease. the first special case the mutant strain is then able to invade (resp. is eliminated) if w (ξ m , ξ) exceeds (resp. is less than) unity; equivalently, if r (ξ m ) exceeds or is less than r (ξ). more importantly, the css level of exploitation,ξ, will maximize r (ξ) and soξ = √ κ. this will serve as a benchmark against which more general results will be compared. the second special case assumes that prophylactic measures are cost-free, i.e., χ = . to the pathogen losing access to the trade-off in the type of infection created. in general, the model cannot be explored analytically. however, if we are near the critical cost χ c we can derive quasi-analytic results. when the cost χ is slightly below its critical threshold χ c , we can approximate the css exploitation level asξ if σ is positive (resp. negative), thenξ is above (resp. below) the benchmark value of √ κ. as equation ( ) shows, whether we are above or below this benchmark depends on how small changes in cost lead to small changes in the proportion of susceptible protectors which, in turn, lead to changes in the selection gradient acting on exploitation. we can show with a quasi-analytic approach that equation ( ) is always negative. our evidence relies on first choosing feasible values of our parameters. in particular, we need to ensure that ε < , we then need to choose k > r − . 
using feasible parameters and working to zeroth order in χ − χ c , we use the computer algebra software (cas) maple (version . ) to investigate the sign of σ as described in equation ( ). we find that the requirement that r > necessarily restricts our choices of , and the region of parameter space where χ c > and r > . dotted grey curves represent the roots of the right-hand side of equation ( ) and panel b shows that these all occur on or below the black and dashed grey curves (note that vertical dotted grey lines are an artifact of jump discontinuities). choosing parameter values in the region above the curves in panel a results in the right-hand side of equation ( ) being negative, as noted in panel b. the implication is that the pathogen exploitation level (virulence) will decrease relative to the benchmark level of √ κ close to the cost threshold χ c . our results can be extended numerically for costs that are possibly much smaller than the critical value, χ c , using a matlab (version r a) procedure detailed in appendix d. we build the procedure around the observation that locally asymptotically stable equilibrium solutions to dξ/dt = (∂w/∂ξ m )| ξm=ξ are also convergence-stable evolutionary equilibria as defined by conditions ( ) and ( ), respectively. as a result, numerical iteration of this differential equation can be used to find candidate css strategies. the evolutionary stability of candidate css strategies can be confirmed with a centred finite-difference approximation of ( ). since the error is on the order of the square of the distance between ξ values used in the approximation, we consider any value within this error to satisfy the ess condition. the results of our numerical procedure confirm that the benchmark css level ofξ = √ κ is obtained when χ = and χ = χ c . second, numerical results confirm the reduction in the css level of pathogen exploitation for χ slightly smaller than χ c . 
third, and most important, numerical results indicate that the css exploitation rate ξ changes in a simple way as cost is reduced from its critical value to its natural lower limit at zero (figure ). in choose five values of k (k = /(r − ) + , k = /(r − ) + , k = /(r − ) + , k = /(r − ) + , and k = /(r − ) + ) and five values of ε spread evenly between the threshold r /((k + )(r − )) and , for a total of combinations of parameter values.

pathogen with a virulence ξ > √κ. when the cost is above its critical value so that no one is engaging in prophylaxis, then this mutant cannot invade a resident population at the css ξ = √κ. if we then decrease the cost below its critical value so that individuals begin to take prophylactic measures, the css exploitation level decreases away from ξ = √κ and so the mutant is still unable to invade the resident population. the discrepancy between our prediction and those of pharaon and bauch is due to the

moreover, the pathogen's virulence is always below the level predicted in the absence of individuals engaging in prophylactic behaviour. there is a balance, then, between the level of cost that is optimal for minimizing prevalence and the level optimal for minimizing pathogen virulence. it is important to recognize that efforts to minimize the cost of prophylaxis will result in a pathogen strain that is more virulent than it otherwise might be. more broadly,

which, when evaluated at the disease-free equilibrium (dfe) (ū,v) = ( , ), has eigenvalues λ = − and λ = β − − γ. the dfe is stable when both eigenvalues are negative, which leads us to the condition that r = β /( + γ) < for stability. when r exceeds this threshold, the dfe becomes unstable and the standard two-dimensional system moves towards an endemic equilibrium (ū,v) = ( /r , ( − /r )/( + γ)).
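as a quick numerical check of the stability statements above, the following python sketch evaluates the jacobian of the standard two-dimensional system at both equilibria. it assumes the scaled form du/dt = 1 − βuv − u, dv/dt = βuv − (1 + γ)v, with unit birth and death rates and a single transmission rate β; the function names and the parameter values are illustrative and are not taken from the original analysis.

```python
import numpy as np


def sir_rhs(u, v, beta, gamma):
    """Right-hand side of the scaled SIR model (assumed form, unit birth/death rate)."""
    du = 1.0 - beta * u * v - u
    dv = beta * u * v - (1.0 + gamma) * v
    return du, dv


def jacobian(u, v, beta, gamma):
    """Jacobian of (du/dt, dv/dt) with respect to (u, v)."""
    return np.array([
        [-beta * v - 1.0, -beta * u],
        [beta * v, beta * u - (1.0 + gamma)],
    ])


def endemic_equilibrium(beta, gamma):
    """Endemic equilibrium written in terms of R0 = beta / (1 + gamma)."""
    r0 = beta / (1.0 + gamma)
    return 1.0 / r0, (1.0 - 1.0 / r0) / (1.0 + gamma)


# Illustrative parameter values only (they give R0 = 3).
beta, gamma = 6.0, 1.0
r0 = beta / (1.0 + gamma)

# Disease-free equilibrium (u, v) = (1, 0): unstable once R0 > 1.
dfe_eigs = np.linalg.eigvals(jacobian(1.0, 0.0, beta, gamma))

# Endemic equilibrium: the right-hand side should vanish and the
# eigenvalues should all have negative real part.
u_bar, v_bar = endemic_equilibrium(beta, gamma)
residual = sir_rhs(u_bar, v_bar, beta, gamma)
endemic_eigs = np.linalg.eigvals(jacobian(u_bar, v_bar, beta, gamma))

print("R0 =", r0)
print("DFE eigenvalues:", dfe_eigs)
print("endemic equilibrium:", (u_bar, v_bar), "residual:", residual)
print("endemic eigenvalues:", endemic_eigs)
```

with these values one eigenvalue at the dfe is positive, while both eigenvalues at the endemic equilibrium have negative real part, in line with the threshold behaviour described above.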
to determine the region in which this equilibrium remains stable after we incorporate the host behavioural dynamics, we need to consider the jacobian matrix of our four-dimensional where χ c = r − k+ k β β (r − ) + and asterisks denote entries that are possibly non- (ū,v,x,ȳ), which we know has negative eigenvalues whenever r > . the × matrix in the bottom right is lower triangular, so its eigenvalues are the entries on the main diagonal. the second of these entries is always negative, while the first is negative as long as χ > χ c . this defines a critical cost threshold where for χ > χ c the endemic equilibrium (ū,v,x,ȳ) is stable, while for χ < χ c our system tends towards an endemic equilibrium that contains protectors in some non-zero quantities.

tion g(r, s; χ). we know that below the critical cost threshold χ c the endemic equilibrium (r,ŝ) = (û,v,x,ŷ,ξ) is stable, while above this threshold our system tends towards the protector-free equilibrium (r,s) = (ū,v, , ,ξ). the critical cost level represents a bifurcation point where these two equilibria coincide and undergo an exchange of stability. to study how our system reacts as we decrease the cost away from this threshold, we introduce a perturbation parameter δ = χ − χ c and take a first-order approximation to our new equilibrium point (r,ŝ) = (ū + ρ δ,v + ρ δ, ρ δ, ρ δ,ξ + σδ). knowing that this equilibrium point must satisfy f(r,ŝ; χ) = and g(r,ŝ; χ) = , our goal is to find expressions for the perturbation coefficients ρ , ρ , ρ , ρ , and σ. if we treat (r,ŝ) as a function of χ, we can make a first-order taylor series approximation where subscripts denote partial derivatives and dr = (ρ , ρ , ρ , ρ ) and dŝ = σ are the derivatives with respect to χ ofr andŝ, respectively. we know that (r,s) is an equilibrium point, so the first term in equations ( a) and ( b) evaluates to zero.
we also observe that every term in f and g involving χ is multiplied by at least one of x or y, and so the partial derivatives with respect to χ vanish when we evaluate at (r,s). this simplifies ( ) with asterisks denoting entries that are possibly non-zero. since j is a block triangular matrix, the eigenvalues are given by the eigenvalues of the matrices on the main diagonal. the × matrix in the upper left is the jacobian matrix arising from the linearization of the standard sir model around the protector-free endemic equilibrium. since the lower-right × block is lower triangular and has a zero entry on its main diagonal, we can see that zero is an eigenvalue of j. this allows us to interpret [ dr dŝ ] as the eigenvector of j associated with the zero eigenvalue, and so shows that there is a non-trivial solution for our perturbation coefficients. while an analytic expression for this eigenvector can be found, it is unwieldy. of more interest is the sign of the perturbation coefficient σ, as this tells us in which directionξ moves as we decrease the cost below its critical value. the third row of ( ) tells us thatx is a free variable, and the last row tells us that there is a simple relationship between this free variable andξ. in particular, if we consider finding the eigenvector [ dr dŝ ] by solving the expression j [ dr dŝ ] = , then the last row of ( ) tells us that we know that the denominator of ( ) is always negative sinceξ = √ κ is convergence stable (see ( )), and we know that ρ > since the proportion of susceptible protectors increases as the cost is decreased below its critical value. it follows, then, that the sign of σ is controlled only by the numerator of ( ) and so we arrive at equation ( ) in the main text. 
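the block-triangular argument can be illustrated numerically. the sketch below builds a hypothetical 5 × 5 matrix with the same structure: an invertible 2 × 2 upper-left block, a lower-triangular 3 × 3 lower-right block with one zero on its main diagonal, and a zero third row. the block sizes and all entries are assumptions chosen for illustration and are not the entries of j.

```python
import numpy as np

# Upper-left block: stands in for the stable 2x2 linearization (values are made up).
A = np.array([[-2.0, 1.0],
              [1.0, -3.0]])

# Lower-right block: lower triangular with a zero diagonal entry (values are made up).
B = np.array([[0.0, 0.0, 0.0],
              [0.7, -1.0, 0.0],
              [0.2, 0.5, -0.8]])

# Off-diagonal block: the "possibly non-zero" entries; its first row is zero
# so that the third row of the full matrix vanishes.
C = np.array([[0.0, 0.0],
              [0.3, 0.2],
              [1.0, 0.4]])

J = np.block([[A, np.zeros((2, 3))],
              [C, B]])

# Eigenvalues of a block-triangular matrix are those of its diagonal blocks.
eigs_J = np.sort(np.linalg.eigvals(J).real)
eigs_blocks = np.sort(np.concatenate([np.linalg.eigvals(A).real, np.diag(B)]))

# The eigenvector for the zero eigenvalue spans the null space of J.
# The last right-singular vector corresponds to the smallest singular value.
_, singular_values, vt = np.linalg.svd(J)
w = vt[-1]
w = w / w[2]  # scale so that the free (third) component equals one

print("eigenvalues of J:     ", eigs_J)
print("eigenvalues of blocks:", eigs_blocks)
print("null vector (third component = 1):", w)
```

in this toy example the first two components of the null vector vanish because the upper-left block is invertible, and the remaining components are fixed multiples of the free third component, which mirrors how the sign of one perturbation coefficient can be read off from another.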
since the first of these quantities is negative and the second is positive, this shows that the first root is the stable equilibrium: we now define the differential equations for u, v, x, and y, as well as the partial derivative using this, we put together the matrix j described in appendix b: if we also choose ε values so that χ c > and k values so that ε < , we can generate a series if we use the ε and k values chosen above, we can also look at the value of the numerator of σ as described in equation ( ). this gives a second set of β max -κ expressions that we can this produces the plot shown in panel b of figure and shows that all of these curves that represent ( ) we also need functions to define the resident system and the partial derivative with respect to ξ m of the fitness function: ( kappa + xi )*( epsil^ * xbar - )* cost + ubar^ * xi^ * bmax^ *... ( epsil^ * xbar - * xbar * epsil + )^ )*( + xi )^ *( kappa ... + xi )^ ))*( -((( ubar *( epsil^ * xbar - * xbar * epsil + )* bmax ... -k * cost )* xi^ - * cost * k * kappa * xi -( ubar *( epsil^ * xbar ... - * xbar * epsil + )* bmax + k * kappa * cost )* kappa )*... sqrt ( k^ *( kappa + xi )^ * cost^ - * k * ubar * xi * bmax *( kappa ... + xi )*( epsil^ * xbar - )* cost + ubar^ * xi^ * bmax^ *... ( epsil^ * xbar - * xbar * epsil + )^ ) + ( ubar^ * bmax^ *... ( epsil^ * xbar - * xbar * epsil + )^ - * k * ubar * cost *... ( epsil^ * xbar - )* bmax + cost^ * k^ )* xi^ + * k * cost *... (( -epsil^ * xbar + )* ubar * bmax + k * cost )* kappa * xi^ ... + *( -( / )* ubar^ * bmax^ *( epsil^ * xbar - * xbar * epsil ... + )^ -( / )* k * ubar * cost *( kappa - )*( epsil^ * xbar ... - )* bmax + cost^ * k^ * kappa )* kappa * xi + k *( ubar * bmax *... 
( epsil^ * xbar - ) + k * kappa * cost )* cost * kappa^ ));

finally, we use all of these to define a function that numerically approximates the css level

perspectives on the basic reproductive ratio

dres = zeros ) = -( b ( xi , epsil , bmax , kappa )* res ( ) res ( ) + b ( xi , epsil , bmax , kappa ) -res ( ))* res ( ) + b ( xi , epsil bmax , kappa )* res ( )*( -res ( )) b ( xi , bmax , kappa )*( -res -res ( )))* res ( )* res ( ) -res dres ( ) = ( b ( xi , epsil , bmax , kappa )* res ( ) res ( ) + b ( xi , epsil , bmax , kappa ) * res ( ) + b ( xi , epsil bmax , kappa )* res ( )*( -res ( )) b ( xi , bmax , kappa )*( -res -res ( )))* res ( )* res ( ) + g ( xi ))* res ) = -res ( )/ res ( ) + ( k + )* res ( ) ))*( res ( ) ( b ( xi , epsil , bmax , kappa ) b ( xi , epsil , bmax , kappa )) + ( -res ( )) b ( xi , epsil , bmax , kappa )))* res ( ) -k * res ( )* cost ) = ( b ( xi , epsil , bmax , kappa )* res ( )* res ( ) -res ( )) -b ( xi , epsil , bmax , kappa ) ))*( res ( )) b ( xi , epsil , bmax , kappa )* res ( ) * res ( )*( -res ( )))* res ( )

key: cord- - stnx dw authors: widrich, michael; schäfl, bernhard; pavlović, milena; ramsauer, hubert; gruber, lukas; holzleitner, markus; brandstetter, johannes; sandve, geir kjetil; greiff, victor; hochreiter, sepp; klambauer, günter title: modern hopfield networks and attention for immune repertoire classification date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: stnx dw

a central mechanism in machine learning is to identify, store, and recognize patterns. how to learn, access, and retrieve such patterns is crucial in hopfield networks and the more recent transformer architectures. we show that the attention mechanism of transformer architectures is actually the update rule of modern hopfield networks that can store exponentially many patterns.
we exploit this high storage capacity of modern hopfield networks to solve a challenging multiple instance learning (mil) problem in computational biology: immune repertoire classification. accurate and interpretable machine learning methods solving this problem could pave the way towards new vaccines and therapies, which is currently a very relevant research topic intensified by the covid- crisis. immune repertoire classification based on the vast number of immunosequences of an individual is a mil problem with an unprecedentedly massive number of instances, two orders of magnitude larger than currently considered problems, and with an extremely low witness rate. in this work, we present our novel method deeprc that integrates transformer-like attention, or equivalently modern hopfield networks, into deep learning architectures for massive mil such as immune repertoire classification. we demonstrate that deeprc outperforms all other methods with respect to predictive performance on large-scale experiments, including simulated and real-world virus infection data, and enables the extraction of sequence motifs that are connected to a given disease class.

source code and datasets: https://github.com/ml-jku/deeprc

figure (caption fragment): a pooling function f is used to obtain a repertoire-representation z for the input object. finally, an output network o predicts the class label ŷ. b) deeprc uses stacked d convolutions for a parameterized function h due to their computational efficiency. potentially, millions of sequences have to be processed for each input object. in principle, recurrent neural networks (rnns), such as lstms (hochreiter et al., ), or transformer networks (vaswani et al., ) may also be used but are currently computationally too costly. c) attention-pooling is used to obtain a repertoire-representation z for each input object, where deeprc uses weighted averages of sequence-representations. the weights are determined by an update rule of modern hopfield networks that allows the retrieval of exponentially many patterns.

transformer architectures (vaswani et al., ) and their attention mechanisms are currently used in many applications, such as natural language processing (nlp), imaging, and also in multiple instance learning (mil) problems. in mil, a set or bag of objects is labelled rather than the objects themselves as in standard supervised learning tasks (dietterich et al., ). examples of mil problems are medical images, in which each sub-region of the image represents an instance, video classification, in which each frame is an instance, text classification, where words or sentences are instances of a text, point sets, where each point is an instance of a d object, and remote sensing data, where each sensor is an instance (carbonneau et al., ; uriot, ). attention-based mil has been successfully used for image data, for example to identify tiny objects in large images (ilse et al., ; pawlowski et al., ; tomita et al., ; kimeswenger et al., ), and transformer-like attention mechanisms have been used for sets of points and images. however, in mil problems considered by machine learning methods up to now, the number of instances per bag is in the range of hundreds or a few thousands (carbonneau et al., ; lee et al., ) (see also tab. a ). at the same time the witness rate (wr), the rate of discriminating instances per bag, is already considered low at % − %. we will tackle the problem of immune repertoire classification with hundreds of thousands of instances per bag without instance-level labels and with extremely low witness rates down to . % using an attention mechanism. we show that the attention mechanism of transformers is the update rule of modern hopfield networks (krotov & hopfield, ; demircigil et al., ) that are generalized to continuous states in contrast to classical hopfield networks (hopfield, ). a detailed derivation and analysis of modern hopfield networks is given in our companion paper (ramsauer et al., ).
these novel continuous state hopfield networks allow to store and retrieve exponentially (in the dimension of the space) many patterns (see next section). thus, modern hopfield networks with their update rule, which are used as an attention mechanism in the transformer, enable immune repertoire classification in computational biology. immune repertoire classification, i.e. classifying the immune status based on the immune repertoire sequences, is essentially a text-book example for a multiple instance learning problem (dietterich et al., ; maron & lozano-pérez, ; wang et al., ) . briefly, the immune repertoire of an individual consists of an immensely large bag of immune receptors, represented as amino acid sequences. usually, the presence of only a small fraction of particular receptors determines the immune status with respect to a particular disease (christophersen et al., ; emerson et al., ) . this is because the immune system has already acquired a resistance if one or few particular immune receptors that can bind to the disease agent are present. therefore, classification of immune repertoires bears a high difficulty since each immune repertoire can contain millions of sequences as instances with only a few indicating the class. further properties of the data that complicate the problem are: (a) the overlap of immune repertoires of different individuals is low (in most cases, maximally low single-digit percentage values) (greiff et al., ; elhanati et al., ) , (b) multiple different sequences can bind to the same pathogen (wucherpfennig et al., ) , and (c) only subsequences within the sequences determine whether binding to a pathogen is possible (dash et al., ; glanville et al., ; akbar et al., ; springer et al., ; fischer et al., ) . 
in summary, immune repertoire classification can be formulated as multiple instance learning with an extremely low witness rate and large numbers of instances, which represents a challenge for currently available machine learning methods. furthermore, the methods should ideally be interpretable, since the extraction of class-associated sequence motifs is desired to gain crucial biological insights. the acquisition of human immune repertoires has been enabled by immunosequencing technology (georgiou et al., ; brown et al., ) which allows to obtain the immune receptor sequences and immune repertoires of individuals. each individual is uniquely characterized by their immune repertoire, which is acquired and changed during life. this repertoire may be influenced by all diseases that an individual is exposed to during their lives and hence contains highly valuable information about those diseases and the individual's immune status. immune receptors enable the immune system to specifically recognize disease agents or pathogens. each immune encounter is recorded as an immune event into immune memory by preserving and amplifying immune receptors in the repertoire used to fight a given disease. this is, for example, the working principle of vaccination. each human has about - unique immune receptors with low overlap across individuals and sampled from a potential diversity of > receptors (mora & walczak, ) . the ability to sequence and analyze human immune receptors at large scale has led to fundamental and mechanistic insights into the adaptive immune system and has also opened the opportunity for the development of novel diagnostics and therapy approaches (georgiou et al., ; brown et al., ) . immunosequencing data have been analyzed with computational methods for a variety of different tasks (greiff et al., ; shugay et al., ; miho et al., ; yaari & kleinstein, ; wardemann & busse, ) . 
a large part of the available machine learning methods for immune receptor data has been focusing on the individual immune receptors in a repertoire, with the aim to, for example, predict the antigen or antigen portion (epitope) to which these sequences bind or to predict sharing of receptors across individuals (gielis et al., ; springer et al., ; jurtz et al., ; moris et al., ; fischer et al., ; greiff et al., ; sidhom et al., ; elhanati et al., ) . recently, jurtz et al. ( ) used d convolutional neural networks (cnns) to predict antigen binding of t-cell receptor (tcr) sequences (specifically, binding of tcr sequences to peptide-mhc complexes) and demonstrated that motifs can be extracted from these models. similarly, konishi et al. ( ) use cnns, gradient boosting, and other machine learning techniques on b-cell receptor (bcr) sequences to distinguish tumor tissue from normal tissue. however, the methods presented so far predict a particular class, the epitope, based on a single input sequence. immune repertoire classification has been considered as a mil problem in the following publications. a deep learning framework called deeptcr (sidhom et al., ) implements several deep learning approaches for immunosequencing data. the computational framework, inter alia, allows for attention-based mil repertoire classifiers and implements a basic form of attention-based averaging. ostmeyer et al. ( ) already suggested a mil method for immune repertoire classification. this method considers -mers, fixed sub-sequences of length , as instances of an input object and trained a logistic regression model with these -mers as input. the predictions of the logistic regression model for each -mer were max-pooled to obtain one prediction per input object. 
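the k-mer based mil scheme described above can be sketched in a few lines of python. this is a minimal illustration, not the published implementation: the k-mer length is left as a parameter k, each k-mer is scored by a logistic model over hypothetical position-specific residue weights (an assumed encoding), and the weights in the example are set by hand rather than learned.

```python
import math


def kmers(sequence, k):
    """All contiguous sub-sequences of length k (the instances of the bag)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_score(kmer, weights, bias):
    """Logistic-regression score of one k-mer.

    `weights` maps (position, residue) pairs to coefficients; this one-hot
    style encoding is an assumption made for the sketch.
    """
    z = bias + sum(weights.get((pos, aa), 0.0) for pos, aa in enumerate(kmer))
    return 1.0 / (1.0 + math.exp(-z))


def predict_repertoire(sequences, k, weights, bias):
    """Max-pool the k-mer scores to obtain one prediction per input object."""
    scores = [kmer_score(km, weights, bias)
              for seq in sequences
              for km in kmers(seq, k)]
    return max(scores)


# Hand-set toy weights that reward the made-up motif "ACD".
weights = {(0, "A"): 2.0, (1, "C"): 2.0, (2, "D"): 2.0}
bias = -3.0

positive = ["GGACDGG", "QWERTY"]   # contains the motif once
negative = ["GGGGGG", "QWERTY"]    # does not contain the motif

p_pos = predict_repertoire(positive, 3, weights, bias)
p_neg = predict_repertoire(negative, 3, weights, bias)
print(p_pos, p_neg)
```

because only the single top-ranked k-mer determines the output, a gradient-based learner would receive a training signal from one k-mer per input object and pass, which is the rigidity discussed in the text.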
this approach is characterized by (a) the rigidity of the k-mer features as compared to convolutional kernels (alipanahi et al., ; zhou & troyanskaya, ; zeng et al., ), (b) the max-pooling operation, which constrains the network to learn from a single, top-ranked k-mer for each iteration over the input object, and (c) the pooling of prediction scores rather than representations (wang et al., ). our experiments also support that these choices in the design of the method can lead to constraints on the predictive performance (see table ). our proposed method, deeprc, also uses a mil approach but considers sequences rather than k-mers as instances within an input object and a transformer-like attention mechanism. deeprc sets out to avoid the above-mentioned constraints of current methods by (a) applying transformer-like attention-pooling instead of max-pooling and learning a classifier on the repertoire rather than on the sequence-representation, (b) pooling learned representations rather than predictions, and (c) using less rigid feature extractors, such as d convolutions or lstms. in this work, we contribute the following: we demonstrate that continuous generalizations of binary modern hopfield networks (krotov & hopfield, ; demircigil et al., ) have an update rule that is known as the attention mechanism in the transformer. we show that these modern hopfield networks have exponential storage capacity, which allows them to extract patterns among a large set of instances (next section). based on this result, we propose deeprc, a novel deep mil method based on modern hopfield networks for large bags of complex sequences, as they occur in immune repertoire classification (section "deep repertoire classification").
we evaluate the predictive performance of deeprc and other machine learning approaches for the classification of immune repertoires in a large comparative study (section "experimental results").

exponential storage capacity of continuous state modern hopfield networks with transformer attention as update rule

in this section, we show that modern hopfield networks have exponential storage capacity, which will later allow us to approach massive multiple-instance learning problems, such as immune repertoire classification. see our companion paper (ramsauer et al., ) for a detailed derivation and analysis of modern hopfield networks. we assume patterns $x_1, \ldots, x_N \in \mathbb{R}^d$ that are stacked as columns to the matrix $X = (x_1, \ldots, x_N)$ and a query pattern $\xi$ that also represents the current state. the largest norm of a pattern is $M = \max_i \|x_i\|$. the separation $\Delta_i$ of a pattern $x_i$ is defined as its minimal dot product difference to any of the other patterns:

$$\Delta_i = \min_{j,\, j \neq i} \left( x_i^T x_i - x_i^T x_j \right) = x_i^T x_i - \max_{j,\, j \neq i} x_i^T x_j .$$

we consider a modern hopfield network with current state $\xi$ and the energy function

$$E = -\operatorname{lse}(\beta, X^T \xi) + \tfrac{1}{2}\, \xi^T \xi + \beta^{-1} \ln N + \tfrac{1}{2}\, M^2 ,$$

where $\operatorname{lse}(\beta, a) = \beta^{-1} \ln \sum_i \exp(\beta a_i)$ is the log-sum-exp function. for energy $E$ and state $\xi$, the update rule

$$\xi^{\mathrm{new}} = X \operatorname{softmax}(\beta X^T \xi)$$

is proven to converge globally to stationary points of the energy $E$, which are local minima or saddle points (see (ramsauer et al., ) , appendix, theorem a ). surprisingly, this update rule is also the formula of the well-known transformer attention mechanism. to see this more clearly, we simultaneously update several queries $\xi_i$. furthermore, the queries $\xi_i$ and the patterns $x_i$ are linear mappings of vectors $y_i$ into the space $\mathbb{R}^d$. for matrix notation, we set $x_i = W_K^T y_i$, $\xi_i = W_Q^T y_i$ and multiply the result of our update rule with $W_V$. using $Y = (y_1, \ldots, y_N)^T$, we define the matrices

$$X^T = K = Y W_K , \qquad Q = Y W_Q , \qquad V = Y W_K W_V = X^T W_V ,$$

and the patterns are now mapped to the hopfield space with dimension $d = d_k$. we set $\beta = 1/\sqrt{d_k}$ and change softmax to a row vector. the update rule multiplied by $W_V$ and performed for all queries simultaneously becomes, in row vector notation:

$$\operatorname{softmax}\!\left( \tfrac{1}{\sqrt{d_k}}\, Q K^T \right) V .$$

this formula is the transformer attention. if the patterns $x_i$ are well separated, the iterate of the update rule converges to a fixed point close to a pattern to which the initial $\xi$ is similar. if the patterns are not well separated, the iterate converges to a fixed point close to the arithmetic mean of the patterns. if some patterns are similar to each other but well separated from all other vectors, then a metastable state between the similar patterns exists. iterates that start near a metastable state converge to this metastable state. for details see ramsauer et al. ( ) , appendix, sect. a . typically, the update converges after one update step (see ramsauer et al. ( ) , appendix, theorem a ) and has an exponentially small retrieval error (see ramsauer et al. ( ) , appendix, theorem a ). our main concern for application to immune repertoire classification is the number of patterns that can be stored and retrieved by the modern hopfield network, equivalently to the transformer attention head. the storage capacity of an attention mechanism is critical for massive mil problems. we first define what we mean by storing and retrieving patterns from the modern hopfield network.

definition (pattern stored and retrieved). we assume that around every pattern x i a sphere s i is given. we say x i is stored if there is a single fixed point x * i ∈ s i to which all points ξ ∈ s i converge,

for randomly chosen patterns, the number of patterns that can be stored is exponential in the dimension d of the space of the patterns (x i ∈ r d ).

theorem . we assume a failure probability < p and randomly chosen patterns on the sphere with radius m = k √ d − .
we define a := d− ( + ln( β k p (d − ))), b := k β , and c = b w (exp(a + ln(b)) , where w is the upper branch of the lambert w function and ensure then with probability − p, the number of random patterns that can be stored is examples are c ≥ . for β = , k = , d = and p = . (a + ln(b) > . ) and c ≥ . for β = k = , d = , and p = . (a + ln(b) < − . ). see ramsauer et al. ( ) , appendix, theorem a for a proof. we have established that a modern hopfield network or a transformer attention mechanism can store and retrieve exponentially many patterns. this allows us to approach mil with massive numbers of instances from which we have to retrieve a few with an attention mechanism. deep repertoire classification problem setting and notation. we consider a mil problem, in which an input object x is a bag of n instances x = {s , . . . , s n }. the instances do not have dependencies nor orderings between them and n can be different for every object. we assume that each instance s i is associated with a label y i ∈ { , }, assuming a binary classification task, to which we do not have access. we only have access to a label y = max i y i for an input object or bag. note that this poses a credit assignment problem, since the sequences that are responsible for the label y have to be identified and that the relation between instance-label and bag-label can be more complex (foulds & frank, ) . a modelŷ = g(x) should be (a) invariant to permutations of the instances and (b) able to cope with the fact that n varies across input objects (ilse et al., ) , which is a problem also posed by point sets (qi et al., ) . two principled approaches exist. the first approach is to learn an instance-level scoring function h : s → [ , ], which is then pooled across instances with a pooling function f , for example by average-pooling or max-pooling (see below). 
the second approach is to construct an instance representation $z_i$ of each instance by $h : \mathcal{S} \to \mathbb{R}^{d_v}$ and then encode the bag, or the input object, by pooling these instance representations (wang et al., ) via a function $f$. an output function $o : \mathbb{R}^{d_v} \to [0, 1]$ subsequently classifies the bag. the second approach, the pooling of representations rather than scoring functions, is currently best performing (wang et al., ). in the problem at hand, the input object $X$ is the immune repertoire of an individual that consists of a large set of immune receptor sequences (t-cell receptors or antibodies). immune receptors are primarily represented as sequences $s_i$ from a space $s_i \in \mathcal{S}$. these sequences act as the instances in the mil problem. although immune repertoire classification can readily be formulated as a mil problem, it is as yet unclear how well machine learning methods solve the above-described problem with a large number of instances $N$, and with instances $s_i$ being complex sequences. next we describe currently used pooling functions for mil problems.

pooling functions for mil problems. different pooling functions equip a model $g$ with the property to be invariant to permutations of instances and with the ability to process different numbers of instances. typically, a neural network $h_\theta$ with parameters $\theta$ is trained to obtain a function that maps each instance onto a representation, $z_i = h_\theta(s_i)$, and then a pooling function $z = f(\{z_1, \ldots, z_N\})$ supplies a representation $z$ of the input object $X = \{s_1, \ldots, s_N\}$. the following pooling functions are typically used: average-pooling: $z = \frac{1}{N} \sum_{i=1}^{N} z_i$, max-pooling: $z = \sum_{m=1}^{d_v} e_m \max_{i, 1 \leq i \leq N} \{ z_{im} \}$, where $e_m$ is the standard basis vector for dimension $m$, and attention-pooling: $z = \sum_{i=1}^{N} a_i z_i$, where $a_i$ are non-negative ($a_i \geq 0$), sum to one ($\sum_{i=1}^{N} a_i = 1$), and are determined by an attention mechanism. these pooling functions are invariant to permutations of $\{1, \ldots, N\}$ and are differentiable. therefore, they are suited as building blocks for deep learning architectures.
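the three pooling functions listed above can be illustrated with a few lines of plain python. this is only a minimal sketch, assuming instance representations are given as equal-length lists of floats; the function names and the use of raw attention scores passed through a softmax are illustrative assumptions, not the implementation used for deeprc.

```python
import math


def average_pool(instances):
    """Average-pooling: element-wise mean over all instance representations."""
    n = len(instances)
    return [sum(column) / n for column in zip(*instances)]


def max_pool(instances):
    """Max-pooling: element-wise maximum over all instance representations."""
    return [max(column) for column in zip(*instances)]


def softmax(scores):
    """Numerically stable softmax; the result is non-negative and sums to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, scores):
    """Attention-pooling: weighted mean with weights a_i = softmax(scores)_i.

    `scores` is a hypothetical list of one raw attention score per instance.
    """
    weights = softmax(scores)
    d_v = len(instances[0])
    return [sum(w * z[m] for w, z in zip(weights, instances)) for m in range(d_v)]
```

all three functions return a vector of length $d_v$ regardless of the number of instances, and reordering the instances (together with their scores) leaves the result unchanged, which is the permutation invariance required of a mil pooling function.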
we employ attention-pooling in our deeprc model as detailed in the following. modern hopfield networks viewed as transformer-like attention mechanisms. the modern hopfield networks, as introduced above, have a storage capacity that is exponential in the dimension of the vector space and converge after just one update (see ramsauer et al. ( ), appendix). additionally, the update rule of modern hopfield networks is known as the key-value attention mechanism, which has been highly successful through the transformer (vaswani et al., ) and bert (devlin et al., ) models in natural language processing. therefore, using modern hopfield networks with the key-value attention mechanism as update rule is the natural choice for our task. in particular, modern hopfield networks are theoretically justified for storing and retrieving the large number of vectors (sequence patterns) that appear in the immune repertoire classification task. instead of using the terminology of modern hopfield networks, we explain our deeprc architecture in terms of key-value attention (the update rule of the modern hopfield network), since it is well known in the deep learning community. the attention mechanism assumes a space of dimension $d_k$ in which keys and queries are compared. a set of $N$ key vectors is combined into the matrix $K$. a set of $d_q$ query vectors is combined into the matrix $Q$. similarities between queries and keys are computed by inner products, so queries can search for similar keys that are stored. another set of $N$ value vectors is combined into the matrix $V$. the output of the attention mechanism is a weighted average of the value vectors for each query $q$. the $i$-th vector $v_i$ is weighted by the similarity between the $i$-th key $k_i$ and the query $q$. the similarity is given by the softmax of the inner products of the query $q$ with the keys $k_i$. all queries are calculated in parallel via matrix operations.
consequently, the attention function $\mathrm{att}(Q, K, V; \beta)$ maps queries $Q$, keys $K$, and values $V$ to $d_v$-dimensional outputs: $\mathrm{att}(Q, K, V; \beta) = \mathrm{softmax}(\beta Q K^T) V$ (see also the transformer attention formula above). while this attention mechanism was originally developed for sequence tasks (vaswani et al., ), it can be readily transferred to sets (ye et al., ). this type of attention mechanism will be employed in deeprc. the deeprc method. we propose a novel method, deep repertoire classification (deeprc), for immune repertoire classification with attention-based deep massive multiple instance learning and compare it against other machine learning approaches. for deeprc, we consider immune repertoires as input objects, which are represented as bags of instances. in a bag, each instance is an immune receptor sequence and each bag can contain a large number of sequences. note that we will use $z_i$ to denote the sequence-representation of the $i$-th sequence and $z$ to denote the repertoire-representation. at the core, deeprc consists of a transformer-like attention mechanism that extracts the most important information from each repertoire. we first give an overview of the attention mechanism and then provide details on each of the sub-networks $h_1$, $h_2$, and $o$ of deeprc. attention mechanism in deeprc. this mechanism is based on the three matrices $K$ (the keys), $Q$ (the queries), and $V$ (the values) together with a parameter $\beta$. values. deeprc uses a 1d convolutional network $h_1$ (lecun et al., ; hu et al., ; kelley et al., ) that supplies a sequence-representation $z_i = h_1(s_i)$, which acts as the values $V = Z = (z_1, \ldots, z_N)$ in the attention mechanism (see figure ). keys. a second neural network $h_2$, which shares its first layers with $h_1$, is used to obtain keys $K \in \mathbb{R}^{N \times d_k}$ for each sequence in the repertoire. this network uses self-normalizing layers (klambauer et al., ) with units per layer (see figure ). query. we use a fixed $d_k$-dimensional query vector $\xi$ which is learned via backpropagation.
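to make the roles of keys, values, and the fixed query concrete, the following sketch evaluates $\mathrm{softmax}(\beta \, \xi^T K^T) V$ for a single query in plain python. it is an illustration under stated assumptions: keys and values are passed in as ready-made lists (in deeprc they would come from the key and sequence-embedding networks), the function name `attention_pool_fixed_query` is hypothetical, and $\beta$ defaults to $1/\sqrt{d_k}$ as in the transformer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool_fixed_query(query, keys, values, beta=None):
    """Single-query key-value attention: softmax(beta * q K^T) V.

    query  : list of d_k floats (the fixed query vector)
    keys   : N lists of d_k floats, one key per sequence
    values : N lists of d_v floats, one value per sequence
    Returns the pooled d_v-dimensional vector and the N attention weights.
    """
    if beta is None:
        beta = 1.0 / math.sqrt(len(query))  # transformer scaling, beta = 1/sqrt(d_k)
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    d_v = len(values[0])
    pooled = [sum(w * v[m] for w, v in zip(weights, values)) for m in range(d_v)]
    return pooled, weights
```

with a larger $\beta$ the softmax becomes sharper, so the output approaches the value whose key has the largest inner product with the query; this is the retrieval behaviour that the hopfield view describes.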
for more attention heads, each head has a fixed query vector. with the quantities introduced above, the transformer attention mechanism of deeprc is implemented as follows:

$$z = \mathrm{att}\left( \xi^T, K, Z; \tfrac{1}{\sqrt{d_k}} \right) = \mathrm{softmax}\left( \frac{\xi^T K^T}{\sqrt{d_k}} \right) Z,$$

where $Z \in \mathbb{R}^{N \times d_v}$ are the sequence-representations stacked row-wise, $K$ are the keys, and $z$ is the repertoire-representation and at the same time a weighted mean of sequence-representations $z_i$. the attention mechanism can readily be extended to multiple queries; however, computational demand could constrain this depending on the application and dataset. the storage capacity theorem above demonstrates that this mechanism is able to retrieve a single pattern out of several hundreds of thousands. attention-pooling and interpretability. each input object, i.e. repertoire, consists of a large number $N$ of sequences, which are reduced to a single fixed-size feature vector of length $d_v$ representing the whole input object by an attention-pooling function. to this end, a transformer-like attention mechanism adapted to sets is realized in deeprc, which supplies $a_i$, the importance of the sequence $s_i$. this importance value is an interpretable quantity, which is highly desired for the immunological problem at hand. thus, deeprc allows for two forms of interpretability methods. (a) a trained deeprc model can compute attention weights $a_i$, which directly indicate the importance of a sequence. (b) deeprc furthermore allows for the usage of contribution analysis methods, such as integrated gradients (ig) (sundararajan et al., ) or layer-wise relevance propagation (montavon et al., ; arras et al., ). see sect. a for details. classification layer and network parameters. the repertoire-representation $z$ is then used as input for a fully-connected output network $\hat{y} = o(z)$ that predicts the immune status, where we found it sufficient to train single-layer networks. in the simplest case, deeprc predicts a single target, the class label $y$, e.g. the immune status of an immune repertoire, using one output value.
however, since deeprc is an end-to-end deep learning model, multiple targets may be predicted simultaneously in classification or regression settings or a mix of both. this allows for the introduction of additional information into the system via auxiliary targets such as age, sex, or other metadata. table with sub-networks h , h , and o. d l indicates the sequence length. network parameters, training, and inference. deeprc is trained using standard gradient descent methods to minimize a cross-entropy loss. the network parameters are θ , θ , θ o for the sub-networks h , h , and o, respectively, and additionally ξ. in more detail, we train deeprc using adam (kingma & ba, ) with a batch size of and dropout of input sequences. implementation. to reduce computational time, the attention network first computes the attention weights a i for each sequence s i in a repertoire. subsequently, the top % of sequences with the highest a i per repertoire are used to compute the weight updates and prediction. furthermore, computation of z i is performed in -bit, others in -bit precision to ensure numerical stability in the softmax. see sect. a for details. in this section, we report and analyze the predictive power of deeprc and the compared methods on several immunosequencing datasets. the roc-auc is used as the main metric for the predictive power. methods compared. we compared previous methods for immune repertoire classification, (ostmeyer et al., ) ("log. mil (kmer)", "log. mil (tcrb)") and a burden test (emerson et al., ) , as well as the baseline methods logistic regression ("log. regr."), k-nearest neighbour ("knn"), and support vector machines ("svm") with kernels designed for sets, such as the jaccard kernel ("j") and the minmax ("mm") kernel (ralaivola et al., ) . 
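the two set kernels named above for the svm baselines can be sketched as follows. this is a minimal illustration, assuming each repertoire is summarized by the counts of its k-mers; the helper `kmer_counts` and the choice of k are hypothetical and stand in for whatever feature construction the baselines actually use. the jaccard kernel compares the sets of distinct k-mers, and the minmax kernel compares their counts.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count all overlapping k-mers over the sequences of one repertoire."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def jaccard_kernel(counts_a, counts_b):
    """Jaccard kernel: |A intersect B| / |A union B| on the sets of distinct k-mers."""
    set_a, set_b = set(counts_a), set(counts_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def minmax_kernel(counts_a, counts_b):
    """MinMax kernel: sum of element-wise minima over sum of element-wise maxima."""
    kmers = set(counts_a) | set(counts_b)
    numerator = sum(min(counts_a[x], counts_b[x]) for x in kmers)
    denominator = sum(max(counts_a[x], counts_b[x]) for x in kmers)
    return numerator / denominator if denominator else 0.0
```

both kernels take values in [0, 1] and equal 1 when a non-empty repertoire is compared with itself, which makes them convenient similarity measures for bags of different sizes.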
for the simulated data, we also added baseline methods that search for the implanted motif either in binary or continuous fashion ("known motif b.", "known motif c.") assuming that this motif was known (for details, see sect. a ). datasets. we aimed at constructing immune repertoire classification scenarios with varying degree of difficulties and realism in order to compare and analyze the suggested machine learning methods. to this end, we either use simulated or experimentally-observed immune receptor sequences and we implant signals, specifically, sequence motifs or sets thereof weber et al., ) , at different frequencies into sequences of repertoires of the positive class. these frequencies represent the witness rates and range from . % to %. overall, we compiled four categories of datasets: (a) simulated immunosequencing data with implanted signals, (b) lstm-generated immunosequencing data with implanted signals, (c) real-world immunosequencing data with implanted signals, and (d) real-world immunosequencing data with known immune status, the cmv dataset (emerson et al., ) . the average number of instances per bag, which is the number of sequences per immune repertoire, is ≈ , except for category (c), in which we consider the scenario of low-coverage data with only , sequences per repertoire. the number of repertoires per dataset ranges from to , . in total, all datasets comprise ≈ billion sequences or instances. this represents the largest comparative study on immune repertoire classification (see sect. a ). hyperparameter selection. we used a nested -fold cross validation (cv) procedure to estimate the performance of each of the methods. all methods could adjust their most important hyperparameters on a validation set in the inner loop of the procedure. see sect. a for details. table : results in terms of auc of the competing methods on all datasets. 
the reported errors are standard deviations across cross-validation (cv) folds (except for the column "simulated"). real-world cmv: average performance over cv folds on the cmv dataset (emerson et al., ) . real-world data with implanted signals: average performance over cv folds for each of the four datasets. a signal was implanted with a frequency (=witness rate) of % or . %. either a single motif ("om") or multiple motifs ("mm") were implanted. lstm-generated data: average performance over cv folds for each of the datasets. in each dataset, a signal was implanted with a frequency of %, %, . %, . %, or . %, respectively. simulated: here we report the mean over simulated datasets with implanted signals and varying difficulties (see tab. a for details). the error reported is the standard deviation of the auc values across the datasets. results. in each of the four categories, "real-world data", "real-world data with implanted signals", "lstm-generated data", and "simulated immunosequencing data", deeprc outperforms all competing methods with respect to average auc. across categories, the runner-up methods are either the svm for mil problems with minmax kernel or the burden test (see table and sect. a ). results on simulated immunosequencing data. in this setting the complexity of the implanted signal is in focus and varies throughout simulated datasets (see sect. a ). some datasets are challenging for the methods because the implanted motif is hidden by noise and others because only a small fraction of sequences carries the motif, and hence have a low witness rate. these difficulties become evident by the method called "known motif binary", which assumes the implanted motif is known. the performance of this method ranges from a perfect auc of . in several datasets to an auc of . in dataset ' ' (see sect. a ). deeprc outperforms all other methods with an average auc of . ± . , followed by the svm with minmax kernel with an average auc of . ± . (see sect. a ). 
the predictive performance of all methods suffers if the signal occurs only in an extremely small fraction of sequences. in datasets, in which only . % of the sequences carry the motif, all auc values are below . . results on lstm-generated data. on the lstm-generated data, in which we implanted noisy motifs with frequencies of %, %, . %, . %, and . %, deeprc yields almost perfect predictive performance with an average auc of . ± . (see sect. a and a ). the second best method, svm with minmax kernel, has a similar predictive performance to deeprc on all datasets but the other competing methods have a lower predictive performance on datasets with low frequency of the signal ( . %). results on real-world data with implanted motifs. in this dataset category, we used real immunosequences and implanted single or multiple noisy motifs. again, deeprc outperforms all other methods with an average auc of . ± . , with the second best method being the burden test with an average auc of . ± . . notably, all methods except for deeprc have difficulties with noisy motifs at a frequency of . % (see tab. a ) . results on real-world data. on the real-world dataset, in which the immune status of persons affected by the cytomegalovirus has to be predicted, the competing methods yield predictive aucs between . and . (see table ). we note that this dataset is not the exact dataset that was used in emerson et al. ( ) . it differs in pre-processing and also comprises a different set of samples and a smaller training set due to the nested -fold cross-validation procedure, which leads to a more challenging dataset. the best performing method is deeprc with an auc of . ± . , followed by the svm with minmax kernel (auc . ± . ) and the burden test with an auc of . ± . . the top-ranked sequences by deeprc significantly correspond to those detected by emerson et al. 
( ) , which we tested by a mann-whitney u-test with the null hypothesis that the attention values of the sequences detected by emerson et al. ( ) would be equal to the attention values of the remaining sequences (p-value of . · − ). the sequence attention values are displayed in tab. a . we have demonstrated how modern hopfield networks and attention mechanisms enable successful classification of the immune status of immune repertoires. for this task, methods have to identify the discriminating sequences amongst a large set of sequences in an immune repertoire. specifically, even motifs within those sequences have to be identified. we have shown that deeprc, a modern hopfield network and an attention mechanism with a fixed query, can solve this difficult task despite the massive number of instances. deeprc furthermore outperforms the compared methods across a range of different experimental conditions. impact on machine learning and related scientific fields. we envision that with (a) the increasing availability of large immunosequencing datasets (kovaltsuk et al., ; corrie et al., ; christley et al., ; zhang et al., ; rosenfeld et al., ; shugay et al., ) , (b) further fine-tuning of ground-truth benchmarking immune receptor datasets (weber et al., ; olson et al., ; marcou et al., ) , (c) accounting for repertoire-impacting factors such as age, sex, ethnicity, and environment (potential confounding factors), and (d) increased gpu memory and increased computing power, it will be possible to identify discriminating immune receptor motifs for many diseases, potentially even for the current sars-cov- (covid- ) pandemic minervina et al., ; galson et al., ) . such results would greatly benefit ongoing research on antibody and tcr-driven immunotherapies and immunodiagnostics as well as rational vaccine design (brown et al., ) . 
in the course of this development, the experimental verification and interpretation of machine-learning-identified motifs could receive additional focus, as for most of the sequences within a repertoire the corresponding antigen is unknown. nevertheless, recent technological breakthroughs in high-throughput antigen-labeled immunosequencing are beginning to generate large-scale antigen-labeled single-immune-receptor-sequence data, thus resolving this longstanding problem (setliff et al., ). from a machine learning perspective, the successful application of deeprc on immune repertoires with their large number of instances per bag might encourage the application of modern hopfield networks and attention mechanisms on new, previously unsolved or unconsidered, datasets and problems. impact on society. if the approach proves itself successful, it could lead to faster testing of individuals for their immune status w.r.t. a range of diseases based on blood samples. this might motivate changes in the pipeline of diagnostics and tracking of diseases, e.g. automated testing of the immune status at regular intervals. it would furthermore make the collection and screening of blood samples for larger databases more attractive. in consequence, the improved testing of immune statuses might identify individuals that do not have a working immune response towards certain diseases to government or insurance companies, which could then push for targeted immunisation of the individual. similarly to compulsory vaccination, such testing for the immune status could be made compulsory by governments, possibly violating privacy or personal self-determination in exchange for increased overall health of a population.
ultimately, if the approach proves itself successful, the insights gained from the screening of individuals that have successfully developed resistances against specific diseases could lead to faster targeted immunisation, once a certain number of individuals with resistances can be found. this might strongly decrease the harm done by e.g. pandemics and lead to a change in the societal perception of such diseases. consequences of failures of the method. as is common with methods in machine learning, potential danger lies in the possibility that users rely too much on our new approach and use it without reflecting on the outcomes. however, the full pipeline in which our method would be used includes wet lab tests after its application, to verify and investigate the results, gain insights, and possibly derive treatments. failures of the proposed method would lead to unsuccessful wet lab validation and negative wet lab tests. since the proposed algorithm does not directly suggest treatment or therapy, human beings are not directly at risk of being treated with a harmful therapy. substantial wet lab and in-vitro testing would indicate wrong decisions by the system. leveraging of biases in the data and potential discrimination. as for almost all machine learning methods, confounding factors, such as age or sex, could be used for classification. this might lead to biases in predictions or uneven predictive performance across subgroups. as a result, failures in the wet lab would occur (see paragraph above). moreover, insights into the relevance of the confounding factors could be gained, leading to possible therapies or counter-measures concerning said factors. furthermore, the amount of data available with respect to relevant confounding factors could lead to better or worse performance of our method. e.g.
a dataset consisting mostly of data from individuals within a specific age group might yield better performance for that age group, possibly resulting in better or exclusive treatment methods for that specific group. here again, the application of deeprc would be followed by in-vitro testing and development of a treatment, where all target groups for the treatment have to be considered accordingly. all datasets and code is available at https://github.com/ml-jku/deeprc. the cmv dataset is publicly available at https://clients.adaptivebiotech.com/pub/emerson- -natgen. in section a we provide details on the architecture of deeprc, in section a we present details on the datasets, in section a we explain the methods that we compared, in section a we elaborate on the hyperparameters and their selection process. then, in section a we present detailed results for each dataset category in tabular form, in section a we provide information on the lstm model that was used to generate antibody sequences, in section a we show how deeprc can be interpreted, in section a we show the correspondence of previously identified tcr sequences for cmv immune status with attention values by deeprc, and finally we present variations and an ablation study of deeprc in section a . input layer. for the input layer of the cnn, the characters in the input sequence, i.e. the amino acids (aas), are encoded in a one-hot vector of length . to also provide information about the position of an aa in the sequence, we add additional input features with values in range [ , ] to encode the position of an aa relative to the sequence. these positional features encode whether the aa is located at the beginning, the center, or the end of the sequence, respectively, as shown in figure a . we concatenate these positional features with the one-hot vector of aas, which results in a feature vector of size per sequence position. 
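the input encoding described above (a one-hot amino acid vector concatenated with positional features) can be sketched as follows. this is a hedged illustration: it assumes the 20 standard amino acids as alphabet and three piecewise-linear positional features for beginning, center, and end of the sequence. the exact shape of the positional features is shown in the paper's figure and is not reproduced in the text, so the function `position_features` below is a hypothetical choice.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: the 20 standard amino acids


def position_features(index, length):
    """Three features in [0, 1] for beginning, center, and end of a sequence.

    Hypothetical piecewise-linear shape: 'start' falls from 1 to 0 over the
    first half, 'end' rises from 0 to 1 over the second half, and 'center'
    is whatever remains, so the three features always sum to one.
    """
    t = index / (length - 1) if length > 1 else 0.5
    start = max(0.0, 1.0 - 2.0 * t)
    end = max(0.0, 2.0 * t - 1.0)
    center = 1.0 - start - end
    return [start, center, end]


def encode_sequence(sequence):
    """Encode a sequence as one feature vector (one-hot + position) per position."""
    encoded = []
    for index, amino_acid in enumerate(sequence):
        one_hot = [1.0 if amino_acid == candidate else 0.0 for candidate in AMINO_ACIDS]
        encoded.append(one_hot + position_features(index, len(sequence)))
    return encoded
```

for example, `encode_sequence("CASSL")` yields five feature vectors, one per amino acid, each consisting of the one-hot part followed by the three positional values.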
each repertoire, now represented as a bag of feature vectors, is then normalized to unit variance. since the cytomegalovirus dataset (cmv dataset) provides sequences with an associated abundance value per sequence, which is the number of occurrences of a sequence in a repertoire, we incorporate this information into the input of deeprc. to this end, the one-hot aa features of a sequence are multiplied by a scaling factor of log(c a ) before normalization, where c a is the abundance of a sequence. we feed the sequences with features per position into the cnn. sequences of different lengths were zero-padded to the maximum sequence length per batch at the sequence ends. d cnn for motif recognition. in the following, we describe how deeprc identifies patterns in the individual sequences and reduces each sequence in the input object to a fixed-size feature vector. deeprc employs d convolution layers to extract patterns, where trainable weight kernels are convolved over the sequence positions. in principle, also recurrent neural networks (rnns) or transformer networks could be used instead of d cnns, however, (a) the computational complexity of the network must be low to be able to process millions of sequences for a single update. additionally, (b) the learned network should be able to provide insights in the recognized patterns in form of motifs. both properties (a) and (b) are fulfilled by d convolution operations that are used by deeprc. we use one d cnn layer (hu et al., ) with selu activation functions (klambauer et al., ) to identify the relevant patterns in the input sequences with a computationally light-weight operation. the larger the kernel size, the more surrounding sequence positions are taken into account, which influences the length of the motifs that can be extracted. we therefore adjust the kernel size during hyperparameter search. 
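the reduction of a variable-length sequence to a fixed-size feature vector can be illustrated with a small, dependency-free sketch: each kernel is slid over the sequence positions, a selu activation is applied, and the maximum over positions is kept. this is a simplified illustration, not the deeprc implementation; the function `embed_sequence`, the absence of a bias term, and the "valid" (unpadded) convolution are assumptions made for brevity. the selu constants are the standard ones from self-normalizing networks.

```python
import math

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def embed_sequence(features, kernels):
    """1D convolution over positions, SELU, then global max-pooling.

    features : list of per-position feature vectors (length >= kernel size)
    kernels  : list of kernels, each a list of `kernel_size` weight vectors
    Returns one value per kernel, independent of the sequence length.
    """
    kernel_size = len(kernels[0])
    n_windows = len(features) - kernel_size + 1
    embedding = []
    for kernel in kernels:
        activations = []
        for start in range(n_windows):
            total = 0.0
            for offset in range(kernel_size):
                weights = kernel[offset]
                inputs = features[start + offset]
                total += sum(w * x for w, x in zip(weights, inputs))
            activations.append(selu(total))
        embedding.append(max(activations))  # global max over sequence positions
    return embedding
```

because only the maximum activation per kernel survives, each kernel acts as a detector for one motif-like pattern anywhere in the sequence, and a larger kernel size lets it cover longer patterns.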
in prior works (ostmeyer et al., ) , a k-mer size of yielded good predictive performance, which could indicate that a kernel size in the range of may be a proficient choice. for d v trainable kernels, this produces a feature vector of length d v at each sequence position. subsequently, global max-pooling over all sequence positions of a sequence reduces the sequence-representations z i to vectors of the fixed length d v . given the challenging size of the input data per repertoire, the computation of the cnn activations and weight updates is performed using -bit floating point values. a list of hyperparameters evaluated for deeprc is given in table a . a comparison of rnn-based and cnn-based sequence embedding for motif recognition in a smaller experimental setting is given in sec. a . regularization. we apply random and attention-based subsampling of repertoire sequences to reduce over-fitting and decrease computational effort. during training, each repertoire is subsampled to , input sequences, which are randomly drawn from the respective repertoire. this can also be interpreted as random drop-out (hinton et al., ) on the input sequences or attention weights. during training and evaluation, the attention weights computed by the attention network are furthermore used to rank the input sequences. based on this ranking, the repertoire is reduced to the % of sequences with the highest attention weights. these top % of sequences are then used to compute the weight updates and the prediction for the repertoire. additionally, one might employ further regularization techniques, which we only partly investigated further in a smaller experimental setting in sec. a due to high computational demands. such regularization techniques include l and l weight decay, noise in the form of random aa permutations in the input sequences, noise on the attention weights, or random shuffling of sequences between repertoires that belong to the negative class. 
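the two subsampling steps from the regularization paragraph (random subsampling of a repertoire during training, followed by keeping only the sequences with the highest attention weights) can be sketched as below. the function names and the arguments `n_keep` and `fraction` are hypothetical placeholders; the concrete values used for deeprc are not restated here.

```python
import random


def random_subsample(sequences, n_keep, rng):
    """Randomly draw at most `n_keep` sequences from a repertoire, without replacement."""
    if len(sequences) <= n_keep:
        return list(sequences)
    return rng.sample(sequences, n_keep)


def keep_top_attention(sequences, attention_weights, fraction):
    """Keep the `fraction` of sequences with the highest attention weights.

    The original order of the kept sequences is preserved; at least one
    sequence is always kept.
    """
    n_keep = max(1, int(round(fraction * len(sequences))))
    ranked = sorted(range(len(sequences)), key=lambda i: attention_weights[i], reverse=True)
    kept_indices = sorted(ranked[:n_keep])
    return [sequences[i] for i in kept_indices]
```

a usage pattern under these assumptions would be to call `random_subsample` once per training step with a seeded `random.Random`, compute attention weights for the drawn sequences, and pass only the output of `keep_top_attention` on to the more expensive part of the forward and backward pass.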
the last of these regularization techniques, the shuffling of sequences between negative-class repertoires, assumes that the sequences in positive-class repertoires carry a signal, such as an aa motif corresponding to an immune response, whereas the sequences in negative-class repertoires do not. hence, the sequences can be shuffled randomly between negative-class repertoires without obscuring the signal in the positive-class repertoires. hyperparameters. for the hyperparameter search of deeprc for the category "simulated immunosequencing data", we only conducted a full hyperparameter search on the more difficult datasets with motif implantation probabilities below %, as described in table a . this process was repeated for all folds of the -fold cross-validation (cv) and the average score on the test sets constitutes the final score of a method. table a provides an overview of the hyperparameter search, which was conducted as a grid search for each of the datasets in a nested -fold cv procedure, as described in section a . computation time and optimization. we took measures on the implementation level to address the high computational demands, especially gpu memory consumption, in order to make the large number of experiments feasible. we train the deeprc model with a small batch size of samples and perform computation of inference and updates of the 1d cnn using -bit floating point values. the rest of the network is trained using -bit floating point values. the adam parameter $\epsilon$ for numerical stability was therefore increased above its default value. training was performed on various gpu types, mainly nvidia rtx ti. computation times were highly dependent on the number of sequences in the repertoires and the number and sizes of cnn kernels. a single update on an nvidia rtx ti gpu took approximately . to . seconds, while requiring approximately to gb gpu memory.
taking these optimizations and gpus with larger memory (≥ gb) into account, it is already possible to train deeprc, possibly with multi-head attention and a larger network architecture, on larger datasets (see sec. a ). our network implementation is based on pytorch . . (paszke et al., ) . incorporation of additional inputs and metadata. additional metadata in the form of sequencelevel or repertoire-level features could be incorporated into the input via concatenation with the feature vectors that result from taking the maximum of the d cnn outputs w.r.t. the sequence positions. this has the benefit that the attention mechanism and output network can utilize the sequence-level or repertoire-level features for their predictions. sparse metadata or metadata that is only available during training could be used as auxiliary targets to incorporate the information via gradients into the deeprc model. limitations. the current methods are mostly limited by computational complexity, since both hyperparameter and model selection is computationally demanding. for hyperparameter selection, a large number of hyperparameter settings have to be evaluated. for model selection, a single repertoire requires the propagation of many thousands of sequences through a neural network and keeping those quantities in gpu memory in order to perform the attention mechanism and weight update. thus, increased gpu memory would significantly boost our approach. increased computational power would also allow for more advanced architectures and attention mechanisms, which may further improve predictive performance. another limiting factor is over-fitting of the model due to the currently relatively small number of samples (bags) in real-world immunosequencing datasets in comparison to the large number of instances per bag and features per instance. 
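The metadata concatenation described above can be illustrated with a small sketch. It assumes a toy layout in which `activations[p][k]` holds the output of kernel `k` at sequence position `p`; the function names and the single metadata value are hypothetical and not part of the deeprc implementation.

```python
from typing import List, Sequence


def global_max_pool(activations: Sequence[Sequence[float]]) -> List[float]:
    """Take the maximum over sequence positions separately for each kernel."""
    n_kernels = len(activations[0])
    return [max(position[k] for position in activations) for k in range(n_kernels)]


def embed_with_metadata(activations: Sequence[Sequence[float]],
                        metadata: Sequence[float]) -> List[float]:
    """Concatenate pooled cnn features with sequence-level metadata (hypothetical layout)."""
    return global_max_pool(activations) + list(metadata)


# toy example: 3 sequence positions, 2 kernels, 1 metadata feature
toy_activations = [[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]]
toy_embedding = embed_with_metadata(toy_activations, [0.05])
```

The resulting vector has one entry per kernel plus one entry per metadata feature, so any downstream attention or output network would see both kinds of information.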
we aimed at constructing immune repertoire classification scenarios with varying degrees of realism and difficulty in order to compare and analyze the suggested machine learning methods. to this end, we use either simulated or experimentally observed immune receptor sequences and implant signals, which are sequence motifs (weber et al., ), into sequences of repertoires of the positive class. it has been shown previously that interactions of immune receptors with antigens occur via short sequence stretches. thus, implantation of short motif sequences simulating an immune signal is biologically meaningful. our benchmarking study comprises four different categories of datasets: (a) simulated immunosequencing data with implanted signals (where the signal is defined as sets of motifs), (b) lstm-generated immunosequencing data with implanted signals, (c) real-world immunosequencing data with implanted signals, and (d) real-world immunosequencing data. each of the first three categories consists of multiple datasets with varying difficulty depending on the type of the implanted signal and the ratio of sequences with the implanted signal. the ratio of sequences with the implanted signal, where each sequence carries at most implanted signal, corresponds to the witness rate (wr). we consider binary classification tasks to simulate the immune status of healthy and diseased individuals. we randomly generate immune repertoires with varying numbers of sequences, where we implant sequence motifs in the repertoires of the diseased individuals, i.e. the positive class. the sequences of a repertoire are also randomly generated by different procedures (detailed below). each sequence is composed of different characters, corresponding to amino acids, and has an average length of . aas. in the first category, we aim at investigating the impact of the signal frequency, i.e. the wr, and the signal complexity on the performance of the different methods.
to this end, we created datasets, whereas each dataset contains a large number of repertoires with a large number of random aa sequences per repertoire. we then implanted signals in the aa sequences of the positive class repertoires, where the datasets differ in frequency and complexity of the implanted signals. in detail, the aas were sampled randomly independent of their respective position in the sequence, while the frequencies of aas, distribution of sequence lengths, and distribution of the number of sequences per repertoire, i.e. the number of instances per bag, are following the respective distributions observed in the real-world cmv dataset (emerson et al., ) . for this, we first sampled the number of sequences for a repertoire from a gaussian n (µ = k, σ = k) distribution and rounded to the nearest positive integer. we re-sampled if the size was below k. we then generated random sequences of aas with a length of n (µ = . , σ = . ), again rounded to the nearest positive integers. each simulated repertoire was then randomly assigned to either the positive or negative class, with , repertoires per class. in the repertoires assigned to the positive class, we implanted motifs with an average length of aas, following the results of the experimental analysis of antigenbinding motifs in antibodies and t-cell receptor sequences by . we varied the characteristics of the implanted motifs for each of the datasets with respect to the following parameters: (a) ρ, the probability of a motif being implanted in a sequence of a positive repertoire, i.e. the average ratio of sequences containing the motif, which is the witness rate. in this way, we generated different datasets of variable difficulty containing in total roughly . billion sequences. see table a for an overview of the properties of the implanted motifs in the datasets. 
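The generation procedure for the simulated category can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian parameters, the motif `SFEN`, the uniform amino acid frequencies, and the uniformly random implantation position are all assumptions made for the example, since the concrete values are not available here.

```python
import random

# assumption: the 20 standard amino acids in one-letter code
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_positive_int(rng, mu, sigma, minimum=1):
    """Draw from N(mu, sigma), round to the nearest integer, re-sample if below `minimum`."""
    while True:
        value = round(rng.gauss(mu, sigma))
        if value >= minimum:
            return value


def simulate_repertoire(rng, n_mu, n_sigma, n_min, len_mu, len_sigma, aa_weights=None):
    """Generate one repertoire of random aa sequences, sampled independently of position."""
    n_sequences = sample_positive_int(rng, n_mu, n_sigma, n_min)
    repertoire = []
    for _ in range(n_sequences):
        length = sample_positive_int(rng, len_mu, len_sigma)
        # aa_weights=None gives uniform frequencies; pass observed frequencies to mimic real data
        residues = rng.choices(AMINO_ACIDS, weights=aa_weights, k=length)
        repertoire.append("".join(residues))
    return repertoire


def implant_with_witness_rate(rng, repertoire, motif, rho):
    """Implant `motif` into each sequence with probability rho (random position is an assumption)."""
    result = []
    for seq in repertoire:
        if len(seq) >= len(motif) and rng.random() < rho:
            start = rng.randrange(len(seq) - len(motif) + 1)
            seq = seq[:start] + motif + seq[start + len(motif):]
        result.append(seq)
    return result


rng = random.Random(0)
negative = simulate_repertoire(rng, n_mu=200, n_sigma=50, n_min=100, len_mu=14, len_sigma=2)
positive = implant_with_witness_rate(rng, negative, motif="SFEN", rho=0.1)
```

With `rho` set to the witness rate, the expected fraction of motif-carrying sequences in a positive repertoire equals that rate, up to motifs that also occur by chance.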
in the second dataset category, we investigate the impact of the signal frequency and complexity in combination with more plausible immune receptor sequences by taking into account the positional aa distributions and other sequence properties. to this end, we trained an lstm (hochreiter & schmidhuber, ) in a standard next character prediction (graves, ) setting to create aa sequences with properties similar to experimentally observed immune receptor sequences. in the first step, the lstm model was trained on all immuno-sequences in the cmv dataset (emerson et al., ) that contain valid information about sequence abundance and have a known cmv label. such an lstm model is able to capture various properties of the sequences, including positiondependent probability distributions and combinations, relationships, and order of aas. we then used the trained lstm model to generate , repertoires in an autoregressive fashion, starting with a start sequence that was randomly sampled from the trained-on dataset. based on a visual inspection of the frequencies of -mers (see section a ), the similarity of lstm generated sequences and real sequences was deemed sufficient for the purpose of generating the aa sequences for the datasets in this category. further details on lstm training and repertoire generation are given in section a . after generation, each repertoire was assigned to either the positive or negative class, with repertoires per class. we implanted motifs of length with varying properties in the center of the sequences of the positive class to obtain different datasets. each sequence in the positive repertoires has a probability ρ to carry the motif, which was varied throughout datasets and corresponds to the wr (see table a ). each position in the motif has a probability of . to be implanted and consequently a probability of . that the original aa in the sequence remains, which can be seen as noise on the motif. 
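The centered implantation with per-position noise can be sketched as below. The motif `GSW`, the sequence, and the probability values are placeholders chosen for illustration; only the mechanism (each motif position is written with some probability, otherwise the original aa remains) follows the description above.

```python
import random


def implant_centered(rng, sequence, motif, position_prob):
    """Implant `motif` in the center of `sequence`.

    Each motif position is written with probability `position_prob`;
    otherwise the original amino acid is kept, which acts as noise on the motif.
    """
    if len(motif) > len(sequence):
        return sequence  # sequence too short to carry the motif
    start = (len(sequence) - len(motif)) // 2
    chars = list(sequence)
    for offset, aa in enumerate(motif):
        if rng.random() < position_prob:
            chars[start + offset] = aa
    return "".join(chars)


rng = random.Random(0)
clean = implant_centered(rng, "AAAAAAAAAA", "GSW", position_prob=1.0)
noisy = implant_centered(rng, "AAAAAAAAAA", "GSW", position_prob=0.5)
```

Whether a given sequence of a positive repertoire receives the motif at all would be decided separately with probability ρ before calling such a function.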
table a : properties of simulated repertoires, variations of motifs, and motif frequencies, i.e. the witness rate, for the datasets in categories "simulated immunosequencing data", "lstm-generated data", and "real-world data with implanted signals". noise types for * are explained in paragraph "real-world data with implanted signals".

in the third category, we implanted signals into experimentally obtained immuno-sequences, where we considered dataset variations. each dataset consists of repertoires for each of the two classes, where each repertoire consists of k sequences. in this way, we aim to simulate datasets with a low sequencing coverage, which means that only relatively few sequences per repertoire are available. the sequences were randomly sampled from healthy (cmv negative) individuals from the cmv dataset (see the paragraph below for an explanation). two signal types were considered: (a) one signal with one motif. the aa motif ldr was implanted in a certain fraction of sequences. the pattern is randomly altered at one of the three positions with probabilities . , . , and . , respectively. (b) one signal with multiple motifs. one of the three possible motifs ldr, cas, and gl-n was implanted with equal probability. again, the motifs were randomly altered before implantation. the aa motif ldr was altered as described above. the aa motif cas was altered at the second position with probability . and with probability . at the first position. the pattern gl-n, where - denotes a gap location, is randomly altered at the first position with probability . and the gap has a length of , , or aas with equal probability. additionally, the datasets differ in the values for ρ, the average ratio of sequences carrying a signal, which were chosen as % or . %. the motifs were implanted at positions , , and according to the imgt numbering scheme for immune receptor sequences (lefranc et al., ) with probabilities . , . and . , respectively. with the remaining .
chance, the motif is implanted at any other sequence position. this means that the motif occurrence in the simulated sequences is biased towards the middle of the sequence. we used a real-world dataset of repertoires, each of which containing between , to , (avg. , ) tcr sequences with a length of to (avg. . ) aas, originally collected and provided by emerson et al. ( ) . out of repertoires were labelled as positive for cytomegalovirus (cmv) serostatus, which we consider as the positive class, repertoires with negative cmv serostatus, considered as negative class, and repertoires with unknown status. we changed the number of sequence counts per repertoire from − to for sequences. furthermore, we exclude a total of repertoires with unknown cmv status or unknown information about the sequence abundance within a repertoire, reducing the dataset for our analysis to repertoires, of which with positive and with negative cmv status. we give a non-exhaustive overview of previously considered mil datasets and problems in table a . to our knowledge the datasets considered in this work pose the most challenging mil problems with respect to the number of instances per bag (column ). table a : mil datasets with their numbers of bags and numbers of instances. "total number of instances" refers to the total number of instances in the dataset. the simulated and real-world immunosequencing datasets considered in this work contain a by orders of magnitudes larger number of instances per bag than mil datasets that were considered by machine learning methods up to now. we evaluate and compare the performance of deeprc against a set of machine learning methods that serve as baseline, were suggested, or can readily be adapted to immune repertoire classification. in this section, we describe these compared methods. this method serves as an estimate for the achievable classification performance using prior knowledge about which motif was implanted. 
note that this does not necessarily lead to perfect predictive performance, since motifs are implanted with a certain amount of noise and could also be present in the negative class by chance. the known motif method counts how often the known implanted motif occurs per sequence for each repertoire and uses this count to rank the repertoires. from this ranking, the area under the receiver operating characteristic curve (auc) is computed as performance measure. probabilistic aa changes in the known motif are not considered for this count, with the exception of gap positions. we consider two versions of this method: (a) known motif binary: counts the occurrence of the known motif in a sequence and (b) known motif continuous: counts the maximum number of overlapping aas between the known motif and all sequence positions, which corresponds to a convolution operation with a binary kernel followed by max-pooling. since the implanted signal is not known in the experimentally obtained cmv dataset, this method cannot be applied to this dataset. the support vector machine (svm) approach uses a fixed mapping from a bag of sequences to the corresponding k-mer counts. the function $h_{\mathrm{kmer}}$ maps each sequence $s_i$ to a vector representing the occurrence of k-mers in the sequence. to avoid confusion with the sequence-representation obtained from the cnn layers of deeprc, we denote $u_i = h_{\mathrm{kmer}}(s_i)$, which is analogous to $z_i$. specifically, $u_{im} = \#\{p_m \in s_i\}$, where $\#\{p_m \in s_i\}$ denotes how often the k-mer pattern $p_m$ occurs in sequence $s_i$. afterwards, average-pooling is applied to obtain $u = \frac{1}{N} \sum_{i=1}^{N} u_i$, the k-mer representation of the input object $x$. for two input objects $x^{(n)}$ and $x^{(l)}$ with representations $u^{(n)}$ and $u^{(l)}$, respectively, we implement the minmax kernel (ralaivola et al., ) as follows:

$$k(x^{(n)}, x^{(l)}) = k_{\mathrm{MinMax}}(u^{(n)}, u^{(l)}) = \frac{\sum_m \min\left(u^{(n)}_m, u^{(l)}_m\right)}{\sum_m \max\left(u^{(n)}_m, u^{(l)}_m\right)},$$

where $u^{(n)}_m$ is the m-th element of the vector $u^{(n)}$. the jaccard kernel (levandowsky & winter, ) is identical to the minmax kernel except that it operates on binary $u^{(n)}$.
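The k-mer representation and the two kernels can be sketched in a few lines. This is an illustrative sketch under assumptions: sparse dictionaries stand in for the k-mer count vectors, the minmax kernel is taken in its usual form (sum of element-wise minima divided by sum of element-wise maxima), and the function names and the choice k = 3 are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count how often each k-mer occurs in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def repertoire_representation(sequences, k):
    """Average the per-sequence k-mer count vectors (sparse dictionary form)."""
    total = Counter()
    for seq in sequences:
        total.update(kmer_counts(seq, k))
    n = len(sequences)
    return {kmer: count / n for kmer, count in total.items()}


def minmax_kernel(u, v):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    keys = set(u) | set(v)
    numerator = sum(min(u.get(m, 0.0), v.get(m, 0.0)) for m in keys)
    denominator = sum(max(u.get(m, 0.0), v.get(m, 0.0)) for m in keys)
    return numerator / denominator if denominator > 0 else 0.0


def jaccard_kernel(u, v):
    """Minmax kernel applied to binarized (presence/absence) representations."""
    binary_u = {m: 1.0 for m, count in u.items() if count > 0}
    binary_v = {m: 1.0 for m, count in v.items() if count > 0}
    return minmax_kernel(binary_u, binary_v)


u_a = repertoire_representation(["CASS", "CASG"], k=3)
u_b = repertoire_representation(["CASS"], k=3)
similarity_minmax = minmax_kernel(u_a, u_b)
similarity_jaccard = jaccard_kernel(u_a, u_b)
```

Both kernels return values in [0, 1] and equal 1 when the two representations are identical, which is what allows them to be turned into distances for neighbor-based methods.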
we used a standard c-svm, as introduced by cortes & vapnik ( ). the corresponding hyperparameter c is optimized by random search. the settings of the full hyperparameter search as well as the respective value ranges are given in table a a . the same k-mer representation of a repertoire, as introduced above for the svm baseline, is used for the k-nearest neighbor (knn) approach. as this method classifies samples according to the distances between them, the previous kernel definitions cannot be applied directly. it is therefore necessary to transform the minmax as well as the jaccard kernel from similarities to distances by constructing the following (levandowsky & winter, ): $d_{\mathrm{MinMax}}(u^{(n)}, u^{(l)}) = 1 - k_{\mathrm{MinMax}}(u^{(n)}, u^{(l)})$ and $d_{\mathrm{Jaccard}}(u^{(n)}, u^{(l)}) = 1 - k_{\mathrm{Jaccard}}(u^{(n)}, u^{(l)})$. the number of neighbors is treated as the hyperparameter and optimized by an exhaustive grid search. the settings of the full hyperparameter search as well as the respective value ranges are given in table a . we implemented logistic regression on the k-mer representation u of an immune repertoire. the model is trained by gradient descent using the adam optimizer (kingma & ba, ) . the learning rate is treated as the hyperparameter and optimized by grid search. furthermore, we explored two regularization settings using combinations of l and l weight decay. the settings of the full hyperparameter search as well as the respective value ranges are given in table a . we implemented a burden test (emerson et al., ; li & leal, ; wu et al., ) in a machine learning setting. the burden test first identifies sequences or k-mers that are associated with the individual's class, i.e., immune status, and then calculates a burden score per individual. concretely, for each k-mer or sequence, the phi coefficient of the contingency table for absence or presence and positive or negative immune status is calculated. then, the j k-mers or sequences with the highest phi coefficients are selected as the set of associated k-mers or sequences.
j is a hyperparameter that is selected on a validation set. additionally, we consider the type of input features, sequences or k-mers, as a hyperparameter. for inference, a burden score per individual is calculated as the sum of associated k-mers or sequences it carries. this score is used as raw prediction and to rank the individuals. hence, we have extended the burden test by emerson et al. ( ) to k-mers and to adaptive thresholds that are adjusted on a validation set. the logistic multiple instance learning (mil) approach for immune repertoire classification (ostmeyer et al., ) applies a logistic regression model to each k-mer representation in a bag. the resulting scores are then summarized by max-pooling to obtain a prediction for the bag. each amino acid of each k-mer is represented by features, the so-called atchley factors (atchley et al., ) . as k-mers of length are used, this gives a total of × = features. one additional feature per -mer is added, which represents the relative frequency of this -mer with respect to its containing bag, resulting in features per -mer. two options for the relative frequency feature exist, which are (a) whether the frequency of the -mer (" mer") or (b) the frequency of the sequence in which the -mer appeared ("tcrβ") is used. we optimized the learning rate, batch size, and early stopping parameter on the validation set. the settings of the full hyperparameter search as well as the respective value ranges are given in table a . for all competing methods a hyperparameter search was performed, for which we split each of the training sets into an inner training set and inner validation set. the models were trained on the inner training set and evaluated on the inner validation set. the model with the highest auc score on the inner validation set is then used to calculate the score on the respective test set. here we report the hyperparameter sets and search strategy that is used for all methods. deeprc. 
the set of hyperparameters of deeprc is shown in table a . these hyperparameter combinations are adjusted via a grid search procedure. table a : deeprc hyperparameter search space. every · updates, the current model was evaluated against the validation fold. the early stopping hyperparameter was determined by selecting the model with the best loss on the validation fold after updates. * : experiments for { ; ; } kernels were omitted for datasets with motif implantation probabilities ≥ % in the category "simulated immunosequencing data". known motif. this method does not have hyperparameters and has been applied to all datasets except for the cmv dataset. the corresponding hyperparameter c of the svm is optimized by randomly drawing values in the range of [− ; ] according to a uniform distribution. these values act as the exponents of a power of and are applied for each of the two kernel types (see table a a ). knn. the amount of neighbors is treated as the hyperparameter and optimized by grid search operating in the discrete range of [ ; max{n, }] with a step size of . the corresponding tight upper bound is automatically defined by the total amount of samples n ∈ n > in the training set, capped at (see table a ). number of neighbors { ; max{n, }} type of kernel {minmax; jaccard} table a : settings used in the hyperparameter search of the knn baseline approach. the number of trials (per type of kernel) is automatically defined by the total amount of samples n ∈ n > in the training set, capped at . logistic regression. the hyperparameter optimization strategy that was used was grid search across hyperparameters given in table a . learning rate −{ ; ; } batch size max. updates coefficient β (adam) . coefficient β (adam) . weight decay weightings {(l = − , l = − ); (l = − , l = − )} table a : settings used in the hyperparameter search of the logistic regression baseline approach. burden test. 
the burden test selects two hyperparameters: the number of features in the burden set and the type of features, see table a .

number of features in burden set: { , , , }
type of features: { mer; sequence}
table a : settings used in the hyperparameter search of the burden test approach.

logistic mil. for this method, we adjusted the learning rate as well as the batch size as hyperparameters by randomly drawing different hyperparameter combinations from a uniform distribution. the corresponding range of the learning rate is [− . ; − . ], which acts as the exponent of a power of . the batch size lies within the range of [ ; ]. for each hyperparameter combination, a model is optimized by gradient descent using adam, where the early stopping parameter is adjusted according to the corresponding validation set (see table a ).

learning rate: {− . ; − . }
batch size: { ; }
relative abundance term: { mer; tcrβ}
number of trials:
max. epochs:
coefficient β (adam): .
coefficient β (adam): .
table a : settings used in the hyperparameter search of the logistic mil baseline approach. the number of trials (per type of relative abundance) defines the quantity of combinations of random values of the learning rate as well as the batch size.

in this section, we report the detailed results on all four categories of datasets: (a) simulated immunosequencing data (table a ), (b) lstm-generated data (table a ), (c) real-world data with implanted signals (table a ), and (d) real-world data on the cmv dataset (table a ), as discussed in the main paper.

[table body lost in extraction; surviving row labels: svm (minmax), known motif b.]
table a : auc estimates based on -fold cv for all datasets in category "simulated immunosequencing data". the reported errors are standard deviations across the cross-validation folds except for the last column "avg.", in which they show standard deviations across datasets. wildcard characters in motifs are indicated by z, characters with % probability of being removed by d .

table a : auc estimates based on -fold cv for all datasets in category "lstm-generated data". the reported errors are standard deviations across the cross-validation folds except for the last column "avg.", in which they show standard deviations across datasets. characters affected by noise, as described in a , paragraph "lstm-generated data", are indicated by r .

table a : results on the cmv dataset (real-world data) in terms of auc, f score, balanced accuracy, and accuracy. for f score, balanced accuracy, and accuracy, all methods use their default thresholds. each entry shows mean and standard deviation across cross-validation folds.

we trained a conventional next-character lstm model (graves, ) based on the implementation in https://github.com/spro/practical-pytorch (access date st of may, ) using pytorch . . (paszke et al., ) . for this, we applied an lstm model with lstm blocks in layers, which was trained for , epochs using the adam optimizer (kingma & ba, ) with learning rate . , an input batch size of character chunks, and a character chunk length of . as input we used the immuno-sequences in the cdr column of the cmv dataset, where we repeated sequences according to their counts in the repertoires, as specified in the templates column of the cmv dataset. we excluded repertoires with unknown cmv status and unknown sequence abundance from training. after training, we generated , repertoires using a temperature value of . .
the number of sequences per repertoire was sampled from a gaussian n (µ = k, σ = k) distribution, where the whole repertoire was generated by the lstm at once. that is, the lstm can base the generation of the individual aa sequences in a repertoire, including the aas and the lengths of the sequences, on the generated repertoire. a random immuno-sequence from the trained-on repertoires was used as initialization for the generation process. this immuno-sequence was not included in the generated repertoire. finally, we randomly assigned of the generated repertoires to the positive (diseased) and to the negative (healthy) class. we then implanted motifs in the positive class repertoires as described in section a . . as illustrated in the comparison of histograms given in fig. a , the generated immuno-sequences exhibit a very similar distribution of -mers and aas compared to the original cmv dataset.

deeprc allows for two forms of interpretability methods. (a) due to its attention-based design, a trained model can be used to compute the attention weights of a sequence, which directly indicate its importance. (b) deeprc furthermore allows for the usage of contribution analysis methods, such as integrated gradients (ig) (sundararajan et al., ) or layer-wise relevance propagation (montavon et al., ; arras et al., ; montavon et al., ; preuer et al., ) . we apply ig to identify the input patterns that are relevant for the classification. to identify aa patterns with high contributions in the input sequences, we apply ig to the aas in the input sequences. additionally, we apply ig to the kernels of the d cnn, which allows us to identify aa motifs with high contributions. in detail, we compute the ig contributions for the aas and positional features in the kernels for every repertoire in the validation and test set, so as to exclude potential artifacts caused by over-fitting. averaging the ig values over these repertoires then results in concise aa motifs.
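To make the idea behind integrated gradients concrete, the following sketch applies it to a toy logistic model with an analytic gradient. The model, its weights, the all-zero baseline, and the number of integration steps are assumptions chosen for illustration; this is not the deeprc network or its configuration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def model(x, w, b):
    """Toy stand-in for a trained network: logistic model on a feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model output with respect to its input."""
    y = model(x, w, b)
    return [y * (1.0 - y) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=50):
    """Approximate the path integral of gradients from baseline to x (midpoint rule)."""
    summed = [0.0] * len(x)
    for step in range(steps):
        alpha = (step + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        grad = model_grad(point, w, b)
        summed = [s + g for s, g in zip(summed, grad)]
    return [(xi - b0) * s / steps for xi, b0, s in zip(x, baseline, summed)]


weights, bias = [2.0, -1.0, 0.5], -0.5
x_input, x_baseline = [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x_input, x_baseline, weights, bias)
```

A useful sanity check is the completeness property: the attributions should sum approximately to the difference between the model output at the input and at the baseline, and features that do not differ from the baseline receive zero attribution.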
we include qualitative visual analyses of the ig method on different datasets below. here, we provide examples for the interpretation of trained deeprc models using integrated gradients (ig) (sundararajan et al., ) as contribution analysis method. the following illustrations were created using ig steps, which we found sufficient to achieve stable ig results. a visual analysis of deeprc models on the simulated datasets, as illustrated in tab. a and fig. a , shows that the implanted motifs can be successfully extracted from the trained model and are straightforward to interpret. in the real-world cmv dataset, deeprc finds complex patterns with high variability in the center regions of the immuno-sequences, as illustrated in figure a .

[table flattened in extraction; the extracted-motif images are not recoverable. surviving entries, including the column header "real-world data with implanted signals":]
implanted motif(s): g^r s^r a^r f^r | l^r d^r r^r | {l^r d^r r^r ; c^r a^r s; g^r l-n}
motif freq. ρ: . % | . % | . %
table a : visualization of motifs extracted from trained deeprc models for datasets from categories "simulated immunosequencing data", "lstm-generated data", and "real-world data with implanted signals". motif extraction was performed using integrated gradients on the d cnn kernels over the validation set and test set repertoires of one cv fold. wildcard characters are indicated by z, random noise on characters by r , characters with % probability of being removed by d , and gap locations of random lengths of { ; ; } by -. larger characters in the extracted motifs indicate higher contribution, with blue indicating positive contribution and red indicating negative contribution towards the prediction of the diseased class. contributions to positional encoding are indicated by < (beginning of sequence), ∧ (center of sequence), and > (end of sequence). only kernels with relatively high contributions are shown, i.e. with contributions roughly greater than the average contribution of all kernels.
figure a : integrated gradients applied to input sequences of positive class repertoires. three sequences with the highest contributions to the prediction of their respective repertoires are shown. a) input sequence taken from "simulated immunosequencing data" with implanted motif sz d z d n and motif implantation probability . %. the deeprc model reacts to the s and n at the th and th sequence position, thereby identifying the implanted motif in this sequence. b) and c) input sequence taken from "real-world data with implanted signals" with implanted motifs {l r d r r r ; c r a r s; g r l-n} and motif implantation probability . %. the deeprc model reacts to the fully implanted motif cas (b) and to the partly implanted motif aas c and a at the th and th sequence position (c), thereby identifying the implanted motif in the sequences. wildcard characters in implanted motifs are indicated by z, characters with % probability of being removed by d , and gap locations of random lengths of { ; ; } by -. larger characters in the sequences indicate higher contribution, with blue indicating positive contribution and red indicating negative contribution towards the prediction of the diseased class.

figure a : visualization of the contributions of characters within a sequence via ig. each sequence was selected from a different repertoire and showed the highest contribution in its repertoire. the model was trained on the cmv dataset, using a kernel size of , kernels and repertoires for early stopping. larger characters in the extracted motifs indicate higher contribution, with blue indicating positive contribution and red indicating negative contribution towards the prediction of the disease class.

table a : tcrβ sequences that had been discovered by emerson et al. ( ) with their associated attention values by deeprc. these sequences have significantly (p-value . e- ) higher attention values than other sequences.
the column "quantile" provides the quantile values of the empirical distribution of attention values across all sequences in the dataset.

in this section we investigate the impact of different variations of deeprc on the performance on the cmv dataset. we consider both a cnn-based sequence embedding, as used in the main paper, and an lstm-based sequence embedding. in both cases we vary the number of attention heads and the β parameter for the softmax function of the attention mechanism (see eq. in main paper). for the cnn-based sequence embedding we also vary the number of cnn kernels and the kernel sizes used in the d cnn. for the lstm-based sequence embedding we use a single unidirectional lstm layer, of which the output values at the last sequence position (without padding) are taken as embedding of the sequence. here we vary the number of lstm blocks in the lstm layer. to counter over-fitting due to the increased complexity of these deeprc variations, we added an l weight penalty to the training loss. the factor with which the l weight penalty contributes to the training loss is varied over orders of magnitude, where suitable value ranges were manually determined on one of the training folds beforehand. to reduce the computational effort, we do not consider all numbers of kernels that were considered in the main paper. furthermore, we only compute the auc scores on of the cross-validation folds. the hyperparameters, which were used in a grid search procedure, are listed in tab. a for the cnn-based sequence embedding and tab. a for the lstm-based sequence embedding. results. we show performance in terms of auc score with single hyperparameters set to fixed values so as to investigate their influence in tab. a for the cnn-based sequence embedding and tab. a for the lstm-based sequence embedding.
we note that due to restricted computational resources this study was conducted with fewer different numbers of cnn kernels, with the auc estimated from only of the cross-validation folds, which leads to a slight decrease of performance in comparison to the full hyperparameter search and cross-validation procedure used in the main paper. as can be seen in tab. a and a , the lstm-based sequence embedding generalizes slightly better than the cnn-based sequence embedding. table a : impact of hyperparameters on deeprc with lstm for sequence encoding. mean ("mean") and standard deviation ("std") for the area under the roc curve over the first folds of a -fold nested cross-validation for different sub-sets of hyperparameters ("sub-set") are shown. the following sub-sets were considered: "full": full grid search over hyperparameters; "beta=*": grid search over hyperparameters with reduction to specific value * of beta value of attention softmax; "heads=*": grid search over hyperparameters with reduction to specific number * of attention heads; "lstms=*": grid search over hyperparameters with reduction to specific number * of lstm blocks for sequence embedding. table a : impact of hyperparameters on deeprc with d cnn for sequence encoding. mean ("mean") and standard deviation ("std") for the area under the roc curve over the first folds of a -fold nested cross-validation for different sub-sets of hyperparameters ("sub-set") are shown. 
the following sub-sets were considered: "full": full grid search over hyperparameters; "beta=*": grid search over hyperparameters with reduction to specific value * of beta value of attention softmax; "heads=*": grid search over hyperparameters with reduction to specific number * of attention heads; "ksize=*": grid search over hyperparameters with reduction to specific kernel size * of d cnn kernels for sequence embedding; "kernels=*": grid search over hyperparameters with reduction to specific number * of d cnn kernels for sequence embedding.
the ellis unit linz, the lit ai lab and the institute for machine learning are supported by the land oberösterreich, lit grants deeptoxgen ( in the following, the appendix to the paper
"modern hopfield networks and attention for immune key: cord- -k o gxe authors: johannsen, leif; potwar, karna; saveriano, matteo; endo, satoshi; lee, dongheui title: robotic light touch assists human balance control during maximum forward reaching date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: k o gxe objective we investigated how light interpersonal touch (ipt) provided by a robotic system supports human individuals performing a challenging balance task compared to ipt provided by a human partner. background ipt augments the control of body balance in contact receivers without a provision of mechanical body weight support. the nature of the processes governing the social haptic interaction, whether they are predominantly reactive or predictive, is uncertain. method ten healthy adult individuals performed maximum forward reaching (mfr) without visual feedback while standing upright. we evaluated their control of reaching behaviour and of body balance during ipt provided by either another human individual or by a robotic system in two alternative control modes (reactive vs predictive). results changes in reaching behaviour under the robotic ipt, such as lower speed and straighter direction, were linked to reduced body sway. mfr of the contact receiver was influenced by the robotic control mode such that a predictive mode reduced movement variability and increased postural stability to a greater extent in comparison to human ipt. the effects of the reactive robotic system, however, more closely resembled the effects of ipt provided by a human contact provider. conclusion the robotic ipt system was as supportive as human ipt. robotic ipt seemed to afford more specific adjustments, such as trading reduced speed for increased accuracy, to meet the intrinsic demands and constraints of the robotic system.
possibly, ipt provided by a human contact provider reflected reactive interpersonal postural coordination more similar to the robotic system’s follower mode. précis interpersonal touch support by a robotic system was evaluated against support provided by a human partner during maximum forward reaching. human contact receivers showed comparable benefits in their reaching postural performance between the support conditions. coordination with the robotic system, nevertheless, afforded specific adaptations in the reaching behaviour. if robotic systems are envisaged as the solution to future shortages in clinical staff and caregivers for the purpose of augmenting patients' mobility by a provision of balance support, they must show responsiveness to the social constraints and demands which govern any routine physical interaction between a patient and a human carer. from a scientific and engineering point of view, therefore, the principles of human-human interactions during physical interactions need to be extracted and evaluated in terms of their transferability to human-robot interactions, as exoskeletal approaches may be unsuitable for frail individuals due to the weight added to the body. in physical rehabilitation, caregivers and therapists routinely provide physical assistance to balance-impaired individuals during postural mobilization and transfer manoeuvres.
in order to prevent long-term habitual dependency of a patient on external balance aids and other forms of support, a therapist needs to adopt an optimum level of postural assistance that maximizes a patient's movement autonomy ('assist-as-needed'). one possible approach is the provision of deliberately light interpersonal touch (ipt) by a caregiver, which can be to explore the interdependencies between cr and cp during ipt in more detail, we evaluated performance in maximum forward reaching (mfr) with and without light ipt applied to the ulnar side of the wrist of the blindfolded cr's extended arm, intended to provide a social haptic cue and impose social coordinative constraints on both the cr and the cp (steinl & johannsen, ). interestingly, ipt reduced sway more effectively when the cp had the eyes closed and their perception of cr's motion was based on haptic feedback alone. in contrast, ipt with open eyes did not result in reduced sway compared with a condition in which ipt was not provided (steinl & johannsen, ). we speculated, therefore, that minimization of the interaction forces and their variability at the contact location during ipt acts as an implicit task constraint and shared goal. in the present study, we intended to contrast the effects of human ipt (hipt) on cr's postural performance against the effects of two different modes of robotic ipt (ript) and expected specific costs and benefits on body sway and postural performance due to the robotic response modes. similar to hipt, ript was applied in a "fingertip touch" fashion to cr's wrist without any mechanical coupling or weight support. the robotic system either followed a participant reactively or predicted a participant's movement trajectory.
as the coupling between two humans with ipt in terms of the interaction forces is intrinsically more noisy due to each individual's motion dynamics and response delays, we expected that a predictive mode of the robotic system would result in a less noisy haptic coupling and therefore enhance performance in the mfr task, such as greater reaching distance with less body sway. in addition, the reactive mode of the robot was supposed to be advantageous over hipt due to the fixed response delay, which would enable participants to extract own movement-related information from the interaction forces for balance control. participants parallel to the reaching direction. cp provided ipt with the right extended arm by lightly contacting the ulnar side of the cr's wrist. during ipt, cp kept the eyes open to receive visual cues of the cr's motion, as would the robotic system by optical motion tracking. during the robotic ipt conditions, a single kuka lwr + manipulator (augsburg, germany) served as cp. the cp kept light contact with the ulnar side of the cr's wrist. cr's body sway was determined in terms of the anteroposterior (ap) and mediolateral (ml) components of the center of pressure (cop), as derived from the six components of the ground reaction forces and moments. in the human-robot interaction conditions, the cr's wrist was tracked by the end effector of the robotic system without any mechanical coupling (fig. b). the robotic system provided contact via a hemispherical rubber pad attached to a force sensor (optoforce d omd, onrobot, odense, denmark; hz) on the end-effector, which kept the relative orthogonal distance constant. the force sensor was used to measure force at the contact location. the cr's wrist position, required to control the robotic system, was measured by an optoelectronic motion capture system (optitrack, naturalpoint, corvallis, or, usa; hz).
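the derivation of the cop from the six force-plate components mentioned above can be sketched as follows. this is a generic textbook formulation, not the authors' processing pipeline: it assumes a right-handed plate coordinate system with z vertical and the plate origin a distance `dz` below the surface, and sign conventions differ between manufacturers. which axis corresponds to ap or ml depends on how the plate is oriented, and taking sway variability as the standard deviation of the cop position is likewise an assumption. all function names are hypothetical.

```python
from statistics import stdev


def center_of_pressure(fx, fy, fz, mx, my, dz=0.0):
    """Return (cop_x, cop_y) in metres for one force-plate sample.

    fx, fy, fz: force components in newtons; mx, my: moments in newton-metres.
    dz: vertical offset of the plate origin below the surface (assumed 0 here).
    """
    cop_x = -(my + fx * dz) / fz
    cop_y = (mx - fy * dz) / fz
    return cop_x, cop_y


def sway_variability(cop_series):
    """Standard deviation of a cop component over a time window (assumed measure)."""
    return stdev(cop_series)


# a purely vertical 700 N load applied at x = 0.05 m, y = -0.02 m on the surface
# produces mx = y * fz and my = -x * fz about the plate origin.
cop = center_of_pressure(fx=0.0, fy=0.0, fz=700.0, mx=-14.0, my=-35.0)
```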
to give the cr nearly the same tactile sensation in both touch conditions, the human cp wore a thin rubber glove, matching the rubber surface of the robot's end effector in ript (fig. b). participants' movements of the right hand were tracked with a marker-based optical motion capture system by placing three reflective markers on the right hand (one on the caput ulnae/processus styloideus radii/basis and two on the ossa metacarpi). tracked hand position was sent to the robot to control the robot's movements but also recorded to calculate reaching distance in the mfr end-state. the robotic control scheme required high control frequencies to avoid unstable behaviors (siciliano, sciavicco, villani, & oriolo, ). for this reason, the robot was controlled at hz. interaction forces were measured at the same frequency of hz, while the cr's hand was tracked at hz. hence, it was necessary to up-sample the motion tracking system to match the robot control frequency. the unit of the iop is bit/s and thus expresses the informational "throughput" of a participant during the movement. trajectory. a constant velocity lkf assumes that the motion is generated by the discrete linear system was generated at hz and used to control the robotic system. the lkf was exploited to realize two different robotic modes, i.e. the robotic follower and the robotic anticipatory modes. more specifically, in the riptfollow mode the robot passively followed the wrist motion while providing a light touch. to implement a passive follower, the position (position error: riptfollow ap - . m, ml - . m) (fig. b) predicted by the lkf at the actual time instant t was used to generate the control command described in the previous section. in this way, the robotic system followed the wrist position with one sample delay ( ms). in the riptanticip mode, the robot predicted the future wrist position to lead the motion while providing a light touch.
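the constant-velocity filter and the two command modes described above can be sketched in one dimension as follows. this is a minimal illustration under stated assumptions, not the controller used in the study: the noise parameters `q` and `r`, the time step, the simplified diagonal process noise, and all class and function names are hypothetical. up-sampling is represented by running the prediction step at the control rate and applying the measurement update only when a new tracking sample is available.

```python
class ConstantVelocityKF:
    """1-d linear kalman filter with state [position, velocity]."""

    def __init__(self, dt, q=1e-3, r=1e-4):
        self.dt = dt      # control period in seconds (assumed value)
        self.q = q        # process noise added to both states (assumed value)
        self.r = r        # measurement noise variance (assumed value)
        self.p = 0.0      # position estimate
        self.v = 0.0      # velocity estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance

    def predict(self):
        dt, P = self.dt, self.P
        self.p += dt * self.v  # position advances with the current velocity
        # covariance propagation A P A^T + Q with A = [[1, dt], [0, 1]]
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]

    def update(self, z):
        P = self.P
        s = P[0][0] + self.r               # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s  # kalman gain for position, velocity
        y = z - self.p                     # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]


def command(kf, measurement, mode):
    """Advance the filter by one control step and return the position command.

    measurement is None on control steps without a new tracking sample.
    "follow" returns the current estimate; "anticip" adds a one-step prediction.
    """
    kf.predict()
    if measurement is not None:
        kf.update(measurement)
    if mode == "follow":
        return kf.p
    return kf.p + kf.dt * kf.v


# toy wrist motion at a constant 0.1 m/s, tracked on every 5th control step
dt, speed = 0.01, 0.1
kf_follow, kf_anticip = ConstantVelocityKF(dt), ConstantVelocityKF(dt)
for k in range(1, 1001):
    z = speed * dt * k if k % 5 == 0 else None
    follow_cmd = command(kf_follow, z, "follow")
    anticip_cmd = command(kf_anticip, z, "anticip")
true_position = speed * dt * 1000
```

in this sketch the two modes differ only in whether the command is the current estimate or the estimate extrapolated by one control period.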
to realize the leading mode, the lkf was exploited to make a one-step prediction of the wrist position. in particular, the predicted future (position error: riptanticip ap - . , ml - . m) (fig. a) was used to generate the control command. in this way, the robot was anticipating the human motion by one sample ( ms), thereby leading the movement execution. during the mfr task, the robotic system provided a light touch along the contact directions, while following (or predicting) the participant's right wrist trajectory in the ap direction. ). the desired contact force table summarizes all statistical comparisons. the mfr amplitude in the horizontal plane was not affected by the ipt condition. all three ipt conditions resulted in comparable amplitudes (hipt: mean= . cm, sd . ; riptanticip: mean= . cm, sd . ; riptfollow: mean= . cm, sd . ). average (fig. a) and peak planar reaching velocity (fig. b) were slower in both ript conditions compared to hipt. reaching in the horizontal plane was directed more straight ahead in the riptfollow condition (av angle=- . deg, sem . ), with a tendency towards less lateral drift in riptanticip (av angle=- . deg, sd . ), compared to hipt (av angle=- . deg, sd . ). orthogonal deviation from a straight line, in terms of both the average (fig. c) and summed deviation (fig. d) as well as the variability, was lower in hipt than in riptanticip. path length was not altered by the ipt conditions but the normalized path length indicated less curvature in riptfollow compared to riptanticip (fig. e). sway variability in either the ap or ml directions was not different between the three ipt conditions in the baseline phase and the mfr end-state. during the reaching phase, however, ap sway variability was reduced in both conditions involving ript compared to hipt (fig. a) and in riptanticip compared to riptfollow. in contrast, only riptanticip showed reduced ml sway variability compared to hipt (fig. b).
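the trajectory measures reported above (directional angle, orthogonal deviation from a straight line, path length and normalized path length) can be sketched as follows. the exact definitions used in the study are not given in this text, so the sketch assumes common ones: deviation is the perpendicular distance to the line connecting start and end point, normalized path length is path length divided by that straight-line distance, and the angle is measured relative to the forward (y) axis. the function name `reach_metrics` and the axis assignment are hypothetical.

```python
import math


def reach_metrics(points):
    """Planar trajectory measures for a list of (x, y) hand positions.

    x is assumed lateral (ml) and y forward (ap); units follow the input.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    chord = math.hypot(x1 - x0, y1 - y0)  # straight-line start-to-end distance
    path_length = sum(
        math.hypot(bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(points, points[1:])
    )
    # perpendicular distance of every sample to the start-end line
    deviations = [
        abs((x1 - x0) * (y0 - py) - (x0 - px) * (y1 - y0)) / chord
        for px, py in points
    ]
    return {
        "angle_deg": math.degrees(math.atan2(x1 - x0, y1 - y0)),  # 0 = straight ahead
        "mean_deviation": sum(deviations) / len(deviations),
        "summed_deviation": sum(deviations),
        "path_length": path_length,
        "normalized_path_length": path_length / chord,  # 1.0 = perfectly straight
    }


# toy reach: 1 m forward with a 0.1 m lateral bulge halfway
metrics = reach_metrics([(0.0, 0.0), (0.1, 0.5), (0.0, 1.0)])
```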
the iod differed between the three conditions in the ap direction, with the lowest scores in hipt compared to both ript conditions. in the ml direction, hipt had a lower iod score compared to riptanticip only (fig. c). in contrast, no difference in the informational "throughput" (iop) was observed between the three conditions (fig. d). movements in a reactive fashion as well, potentially in follower mode due to visual dominance or as the more optimal strategy due to the inability to stem the computational complexity of predicting cr's trajectory. in our current study, the provision of ipt by the cp involved visual feedback of cr and his or her movements, as this would be more similar to the optical tracking of cr's motion used by the robotic system. in human pairs, the presence of visual feedback with habitual visual dominance is likely to turn the cp into a follower of cr's movement (steinl & johannsen, ). assessing hhi as well as hri in a single degree of between two human individuals leader-follower relationships are not necessarily fixed. it seems to be the case, however, that the more adaptive individual, for example the person on whom fewer requirements to fulfill specific movement constraints are imposed, is more likely to take a follower role (skewes, skewes, michael, & konvalinka, ). despite impressive advances in the recent decade, current robotics engineering is still distant from developing robotic systems able to assist human individuals socially, especially during postural activities and balance exercises (sheridan, ). in both ript conditions of the current study, the dynamics of the robotic system were not independent but in one way or another a direct consequence of cr's movements. despite the lack of any real "social cognitive" capabilities of the robotic system, this fact can nevertheless be interpreted as highly precise responsiveness, which a real human cp could never match.
we assume that participants were not able to consciously perceive any difference between the anticipatory and follower ript modes, just an absolute timing difference of ms, and therefore would not change their behaviour voluntarily. possibly due to a shift in participants from less to more reactive, feedback-dependent postural control, crs reduced their reaching velocity to adjust their movements more precisely to the current position of the robotic end-effector so that it stayed in contact with their wrist. these concerns could have been even more prominent in the riptanticip condition than in riptfollow. forms of ipt, it means that ipt provided by a robotic system does not disrupt or distract the human cr. during the reaching phase, the facilitation of stabilization of body sway by ript tended to surpass the effect of hipt, especially in a robotic control mode involving anticipation. this shows that ript does not destabilize cr's postural behaviour but can lead to further reductions in behavioural variability. nevertheless, human crs altered their mfr behaviour when ipt was provided not by the human partner but by the robotic system. the most obvious changes were general reductions in the average and peak planar mfr velocity with ript. as body sway tended to be reduced in these situations, these adjustments to the robotic cp could reflect a trade-off between speed and accuracy (fitts, ). according to this interpretation, participants may have effectively controlled sway variability in order to meet any perceived difficulty increase in ript resulting from "hardware" constraints imposed by technical limitations of the robotic system and "soft" constraints in terms of fulfilling the assist-as-needed robotic approaches translate into corrective forces keeping an individual's body or limbs within an initially defined "normal" range.
in contrast to such "positive" force feedback, in which a robotic system aims to guide a participant's limb along a specific trajectory by applying a corrective force, our deliberately light interpersonal touch paradigm could be described as acting with "negative" force feedback. this means that if participants stray from a reaching trajectory, they will perceive a momentary reduction in touch, which might cue them to perform a subtle correction with the intention to minimize contact force variability. the robotic system in our study was controlled according to this principle, and we believe it imitated cr's behaviour more naturally. at the same time, the reaching trajectory was not prespecified within the robotic system but emerged as a compromise between the cr and the respective cp. in this sense, the cr's movement range remains completely unconstrained. any constraints result from the "social" context of the hhi or hri. in this context it is remarkable that riptfollow led to the straightest forward reaching trajectories with the least amount of medial drift. this could mean that a robotic system that emphasizes a reactive follower strategy is a better haptic "communicator" in the sense that it made participants "listen" more closely to the haptic feedback they received. possibly, participants interpreted ript as more reliable as a relative spatial reference and therefore adjusted their reaching movements more in a feedback-driven manner. in contrast, although riptanticip also tended towards a more straight ahead reaching movement, the condition showed the greatest and most variable orthogonal deviation from a straight line connecting the start and end point. the robotic system in leader mode could have actually "misguided" participants in the sense that it tried to anticipate a participant's next position and so reinforced a participant's tendency to deviate from their current trajectory.
beneficial deliberately light interpersonal touch for balance support during maximum forward reaching is easily provided by a robotic system even when it is mechanically uncoupled from the human contact receiver. this effect does not rely on the system's capability to predict the future position of the contact receiver's wrist. the effects of the uncoupled robotic ipt in reactive following mode were comparable to human ipt on most parameters. although the robotic system itself was not designed for any form of "social" cognition or explicit haptic communication, our study demonstrates that robotic ipt can be used to implicitly "nudge" human contact receivers to alter their postural strategy for adapting to the robotic system without any decrements in their postural performance during maximum forward reaching. perspectives on human-human sensorimotor interactions for the design of rehabilitation robots the uncontrolled manifold concept: identifying control variables for a functional task human-robot interaction: status and challenges robotics: modelling, planning and control synchronised and complementary coordination mechanisms in an asymmetric joint aiming task assist-as-needed robot-aided gait training improves walking function in individuals following stroke robotic assist-as-needed as an alternative to therapist-assisted gait rehabilitation interpersonal interactions for haptic guidance during maximum forward reaching haptic communication between humans is tuned by the hard or soft mechanics of interaction we thank prof. g. cheng, prof. m. buss, prof. s. hirche, dr. k. ramirez-amaro, and j. r. guadarrama olvera for providing the experimental infrastructure and s. m. steinl key: cord- - lrfna v authors: northrop, amanda c.; avalone, vanessa; ellison, aaron m.; ballif, bryan a.; gotelli, nicholas j. title: clockwise and counterclockwise hysteresis characterize state changes in the same aquatic ecosystem date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: lrfna v incremental increases in a driver variable, such as nutrients or detritus, can trigger abrupt shifts in aquatic ecosystems. once these ecosystems change state, a simple reduction in the driver variable may not return them to their original state. because of the long time scales involved, we still have a poor understanding of the dynamics of ecosystem recovery after a state change. a model system for understanding ecosystem recovery is the aquatic microecosystem that inhabits the cup-shaped leaves of the pitcher plant sarracenia purpurea. with enrichment of organic matter, this system flips within to days from an oxygen-rich state to an oxygen-poor (hypoxic) state. in a replicated greenhouse experiment, we enriched pitcher plant leaves at different rates with bovine serum albumin (bsa), a molecular substitute for detritus. changes in dissolved oxygen ([o ]) and undigested bsa concentration were monitored during enrichment and recovery phases. at low enrichment rates, ecosystems showed a substantial lag in the recovery of [o ] (clockwise hysteresis). at intermediate enrichment rates, [o ] tracked the levels of undigested bsa with the same profile during the enrichment and recovery phases (no hysteresis). at high enrichment rates, we observed a novel response: changes in [o ] were proportionally larger during the recovery phase than during the enrichment phase (counter-clockwise hysteresis). these experiments demonstrate that detrital enrichment rate can modulate a diversity of hysteretic responses in a single aquatic ecosystem. with counter-clockwise hysteresis, rapid reduction of a driver variable following high enrichment rates may be a viable restoration strategy. matter to pitchers causes an abrupt increase in bod resulting from decomposition by carbon-limited bacteria. pitchers in all enrichment treatments saw a rapid decline in [o ] following the initial enrichment phase, suggesting that bod increased rapidly (fig. ).
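the distinction between clockwise, absent and counter-clockwise hysteresis described above can be made concrete as the sign of the area enclosed by the enrichment-recovery loop in the plane of driver (undigested bsa) versus response ([o ]). the following is a minimal sketch of that idea using the shoelace formula; it is not the analysis used in the study, and the function names, tolerance and toy trajectories are hypothetical.

```python
def signed_loop_area(points):
    """Shoelace area of a closed loop of (driver, response) points in time order.

    positive values mean the loop is traversed counter-clockwise,
    negative values mean it is traversed clockwise.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return 0.5 * total


def hysteresis_direction(points, tol=1e-9):
    """Classify a loop as 'clockwise', 'counter-clockwise' or 'none'."""
    area = signed_loop_area(points)
    if abs(area) <= tol:
        return "none"
    return "counter-clockwise" if area > 0 else "clockwise"


# toy loops with x = driver and y = response, both scaled to [0, 1].
# lagged recovery: the response stays low while the driver is already reduced.
lagged = [(0.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
# response recovers faster than it declined.
fast_recovery = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
# enrichment and recovery follow the same path: no enclosed area.
same_path = [(0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

directions = [hysteresis_direction(p) for p in (lagged, fast_recovery, same_path)]
```

with the driver on the horizontal axis, a lag in recovery produces a negative (clockwise) area, which matches the naming used in the text.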
this rapid change in bod may have contributed to clockwise hysteresis at low levels of enrichment, but does not account for the counterclockwise hysteresis at high enrichment levels. in other systems, clockwise hysteresis is generally the result of positive feedback loops. in shallow lakes, a simple reduction in phosphorus input does not lead to a proportional recovery in macrophyte cover or community structure. further, these communities do not fully recover in the time frames in which they are studied and may effectively remain permanently degraded. these examples and our work highlight the importance of applying a dynamic regime concept to ecosystem management and restoration. such an approach would include testing for hysteresis, characterizing feedbacks that maintain undesirable regimes, and identifying if and how system variables change as a result of a regime shift. in ecosystems where hysteresis is counter-clockwise, rapid reduction in a driver variable from high to low levels may be a successful restoration strategy. in contrast, systems that have experienced chronic low levels of enrichment may exhibit clockwise hysteresis that requires more extreme reductions of the driver variable, or alternative restoration strategies, to restore. although there is still an important need for early warning signals, past histories of high versus low enrichment may dictate different restoration strategies for collapsed ecosystems. efficiency of insect capture by sarracenia purpurea (sarraceniaceae), the northern pitcher plant preparation and purification of dna from insects for aflp analysis key: cord- -v l sovt authors: bloom, david c.; tran, robert k.; feller, joyce; voellmy, richard title: immunization by replication-competent controlled herpesvirus vectors date: - - journal: biorxiv doi: . 
/ sha: doc_id: cord_uid: v l sovt replication-competent controlled virus vectors were derived from virulent hsv- wildtype strain syn+ by placing one or two replication-essential genes under the stringent control of a gene switch that is co-activated by heat and an antiprogestin. upon activation of the gene switch, the vectors replicate in infected cells with an efficacy that approaches that of the wildtype virus from which they were derived. essentially no replication occurs in the absence of activation. when administered to mice, localized application of a transient heat treatment in the presence of systemic antiprogestin results in efficient but limited virus replication at the site of administration. the immunogenicity of these viral vectors was tested in a mouse footpad lethal challenge model. unactivated viral vectors - which may be regarded as equivalents of inactivated vaccines - induced detectable protection against lethality caused by wildtype virus challenge. single activation of the viral vectors at the site of administration (rear footpads) greatly enhanced protective immune responses, and second immunization resulted in complete protection. once-activated vectors also induced far better neutralizing antibody and hsv- -specific t cell responses than unactivated vectors. to find out whether the immunogenicity of a heterologous antigen was also enhanced in the context of efficient transient vector replication, a virus vector constitutively expressing an equine influenza virus hemagglutinin was constructed. immunization of mice with this recombinant induced detectable antibody-mediated neutralization of equine influenza virus as well as a hemagglutinin-specific t cell response. single activation of viral replication resulted in a several-fold enhancement of this immune response. importance we hypothesized that vigorous replication of a pathogen may be critical for eliciting the most potent and balanced immune response against it.
hence, attenuation/inactivation (as in conventional vaccines) should be avoided. instead, necessary safety should be provided by placing replication of the pathogen under stringent control and of activating time-limited replication of the pathogen strictly in an administration region in which pathology cannot develop. immunization will then occur in the context of highly efficient pathogen replication and uncompromised safety. we found that localized activation in mice of efficient but limited replication of a replication-competent controlled herpesvirus vector resulted in a greatly enhanced immune response to the virus or an expressed heterologous antigen. this finding supports the above hypothesis as well as suggests that the vectors may be promising novel agents worth exploring for the prevention/mitigation of infectious diseases for which efficient vaccination is lacking, in particular in immunocompromised patients.
that attenuated replicating viruses induce more complete and more potent immune responses to autologous or heterologous antigens than corresponding non-replicating viruses. we hypothesized that a virus vector that could replicate in a controlled fashion with nearly the same efficiency as the respective wildtype virus (referred to herein as "replication-competent controlled virus vector") would induce an even more potent and complete immune response to itself or an expressed heterologous antigen than a corresponding attenuated vector ( ). our hypothesis is in part based on the rational expectation that an efficiently replicating virus will produce a stronger inflammatory response than an attenuated virus, which inflammatory response will result in a potent activation of the innate immune system and, consequently, in strong and lasting b and t cell responses ( ) . to realize such an immunization strategy, a regulation system must be employed that reliably and stringently controls viral replication as well as is capable of being turned on and off at will. however, virus disseminates after administration. simply restricting the number of replication cycles will not be enough: depending on the number of cycles allowed, there will be a more or less pronounced manifestation of the toxicity typical for the viral vector used. for example, allowing an hsv- vector to replicate for a certain period of time may cause unacceptable neurotoxicity. hence, the regulation system must be capable of exerting regional control over viral replication so that the immunizing virus only replicates in a locale in which it is certain not to cause a disease phenotype. 
hsv-gs replication was also tightly controlled in vivo, two of three groups of mice were administered hsv-gs virus ( , pfu per mouse) to the footpad, and the mice of one of the latter groups were given ulipristal intraperitoneally (i.p.) and, h later, were subjected to a heat treatment of the footpads at c for min. one day later, all mice were euthanized, and dna and rna were extracted from their feet and analyzed by qpcr and rt-qpcr, respectively. substantially larger amounts of hsv- dna were detected in the feet of heat/ulipristal-treated mice than in untreated mice (fig. d). expression of several viral genes was observed for activated virus but not for unactivated virus, strongly suggesting that viral replication and gene expression only occurred subsequent to heat/ulipristal activation. it is noted that stocks of the recombinant viruses were prepared in cells that were subjected to daily heat treatment in the presence of antiprogestin until maximal cytopathic effect was reached. titrations of the viruses were carried out on cells that provided missing proteins in trans. protective immunity induced by activated hsv-gs. induction of protective immunity was evaluated in a mouse footpad lethal challenge model ( ). in a first experiment, virus vectors were administered under anesthesia to the plantar surfaces of both rear feet of adult swiss webster outbred female mice ( , pfu per animal; animals per group). concurrently, and again h later, the animals of one group received an intraperitoneal injection of . mg/kg of mifepristone. three hours after inoculation, the mice of the latter group were subjected to heat treatment ( . c for min) by immersion of their hindfeet in a temperature-controlled water bath. twenty-two days later, all animals were challenged by a -fold lethal dose of hsv- wildtype strain syn+ administered by the same route as the original virus inoculum. 
survival of the animals was followed until no more lethal endpoints were reached, i.e., until all surviving animals had fully recovered (fig. ). icp (-) replication-incompetent hsv- recombinant kd ( ) induced a modest level of immunity. as had been expected, because it did not replicate and also did not express the major transcriptional regulator icp , unactivated hsv-gs provided a comparable degree of protection. activated to determine whether immunization with activated hsv-gs reduced replication of the challenge virus more effectively than unactivated hsv-gs or replication-defective kd virus, additional groups of animals ( animals per group) were immunized and challenged by wildtype virus as described above. four days after challenge, the animals were euthanized, feet were dissected and homogenized, and virus present in the homogenates was titrated. results revealed that activated hsv-gs virus reduced challenge virus replication by nearly two orders of magnitude (table ). unactivated second immunization further enhances protective immunity. we next investigated whether a second activation treatment applied two days after the first treatment (i.e., at table . results of a meta-analysis of the hsv-gs data are presented graphically in fig. . the difference in immunization efficacy between activated and unactivated hsv-gs was found to be highly significant. also significant was the increase in protection afforded by second immunization with an activated hsv-gs vector. the effect of second activation two days after virus administration and first activation was not statistically significant. second activation tended to modestly enhance protective immunity in a majority of experiments in which this was addressed. however, in some experiments, e.g., in the experiment reported in fig. a, an effect was not apparent. it is noted that the conditions for second activation had not been optimized. subjected to activation treatment. 
both ha rna and protein levels appeared to be somewhat higher in twice-activated animals than in once-activated animals. to assess immune responses (three weeks after immunization), additional groups of mice were inoculated with saline (mock immunization), or with , pfu of hsv-gs or hsv-gs (three groups). as in the above-described part of the experiment, all animals in the hsv-gs group as well as in two of the hsv-gs groups were subjected to an activation treatment. the animals of one of the latter groups received a second activation treatment two days later. serum samples were tested for their ability to neutralize eiv prague/ . as expected, neutralizing antibodies were not detected in unimmunized (not shown), mock-immunized or vector-immunized animals (fig. d ). unactivated hsv-gs was capable of inducing a neutralizing antibody response. activation of hsv-gs shortly after inoculation resulted in a several-fold magnified response. it is noted that twice-activated hsv-gs elicited an only marginally better response than once-activated virus. ha-specific t cells present in pbmcs were quantified by the same type of responder cell frequency assay that had been used to assess numbers of hsv- -specific t cells. ha-specific t cells were not detected in unimmunized (not shown) or mock-immunized animals (fig. e ). induction of a t cell generalizing these findings, it appears that replication-competent controlled vectors are promising new agents that may better protect against diseases caused by the viruses from which they were derived than replication-incompetent/inactivated vaccines and, possibly, also live attenuated vaccines. what is perhaps even more important, our observations suggest that replication-competent controlled vectors can be excellent immunization platforms that may be exploited for the elaboration of new "vaccines". 
distinguish the possibilities that the second activation was only marginally effective because the immune system was already maximally engaged by the initial immunization or because optimal conditions for the second activation treatment had not been identified. additional research is required to resolve this question. immunization by replication-competent controlled vectors represents a novel paradigm that may be elaborated in various ways. therefore, the work presented herein should be in this report, we have focused on immunization effects of once-activated replication- competent controlled vectors that undergo one round of replication. we note that viral vectors that were intended to be limited to one cycle of replication were advanced before as potential vaccines ( ). these so-called disabled infectious single-cycle gs virion dna. subsequent to the addition of mifepristone to the medium, the co-transfected cells were exposed to . c for min and then incubated at c. subsequently, on days and , the cells were again incubated at . c for min and then returned to c. picking and amplification of plaques, screening and plaque purification was performed essentially as described for hsv-gs ( , ). the resulting neutralizing antibody assay. after collection, the blood was allowed to clot for min. table . the proteins were used to coat a well elisa plate which was allowed to air-dry. table and fig. for comparable experiments. for each hsv-gs or hsv-gs group; n = for mock; *** p ≤ . ). see table fig. for comparable experiments. ("activated + boost"; n = ), and activated on day , readministered , pfu/mouse of hsv-gs days after the first inoculation and reactivated h later ("activated + boost + nd activation"; n = ). twenty-one days after the last treatment, all mice were challenged with a -fold lethal dose of wild-type hsv- strain syn+ applied to both rear feet. ** p ≤ . ; *** p < . . 
(b) a similar experiment except that both initial and second immunizations were with , pfu/mouse of hsv-gs (n = ; for second immunization n = ; *** p ≤ . ). the data are presented as percent survival for each treatment group. see table and fig. for comparable experiments. vector replication was activated in some treatment groups by administration of heat and ulipristal as described in fig. . one treatment group received a second activation treatment that was administered two days after the first activation treatment. mice feet were harvested hours after the last treatment, and rna was isolated and cdna prepared. samples were analyzed by qpcr with eiv prague/ ha-specific taqman primers/probe (table ) global and regional estimates of prevalent and incident herpes simplex virus key: cord- -efd dgn authors: mahan, margaret; rafter, daniel; casey, hannah; engelking, marta; abdallah, tessneem; truwit, charles; oswood, mark; samadani, uzma title: tbiextractor: a framework for extracting traumatic brain injury common data elements from radiology reports date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: efd dgn objective the manual extraction of valuable data from electronic medical records is cumbersome, error-prone, and inconsistent. by automating extraction in conjunction with standardized terminology, the quality and consistency of data utilized for research and clinical purposes would be substantially improved. here, we set out to develop and validate a framework to extract pertinent clinical conditions for traumatic brain injury (tbi) from computed tomography (ct) reports. materials and methods we developed tbiextractor, which extends pycontextnlp, a regular expression algorithm using negation detection and contextual features, to create a framework for extracting tbi common data elements from radiology reports. the algorithm inputs radiology reports and outputs a structured summary containing clinical findings with their respective annotations. 
development and validation of the algorithm was completed using two physician annotators as the gold standard. results tbiextractor displayed high sensitivity ( . - . ) and specificity ( . ) when compared to the gold standard. the algorithm also demonstrated a high equivalence ( . %) with the annotators. a majority of clinical findings ( %) had minimal errors (f score ≥ . ). when compared to annotators, tbiextractor extracted information in significantly less time ( . sec vs . min per report). discussion and conclusion tbiextractor is a validated algorithm for extraction of tbi common data elements from radiology reports. this automation reduces the time spent to extract structured data and improves the consistency of data extracted. lastly, tbiextractor can be used to stratify subjects into groups based on visible damage by partitioning the annotations of the pertinent clinical conditions on a radiology report. fig . graphical outline of the methods. purple rectangle shapes correspond to methods subsections, meaning they represent steps in the processing workflow, orange parallelogram shapes represent data, blue diamond shapes represent binary decisions on data, gray rectangle shapes represent excluded data, and green isosceles trapezoid shapes correspond to subcomponents of the algorithm. data capture and cleaning hospital admission radiology reports from non-contrast head ct scans were extracted from emrs for subjects participating in the classify-tbi study (details in s protocol). each radiology report was converted to a spacy [ ] container for assessing linguistic annotations and partitioned into sentences. sentences before "findings" and after "impressions" sections were removed. then, the sentences were concatenated with newline characters replaced with a space, symbols removed, and whitespace stripped. radiology reports that did not contain "findings" or "impressions" sections were removed along with radiology reports containing multiple scan types. 
using scikit-learn [ ] tfidfvectorizer, the corpus was converted into a matrix of tf-idf (term-frequency times inverse document-frequency) features using n-grams with n-range from one to ten. cosine similarities were calculated between each pair of radiology reports by multiplying the tf-idf matrix by its transpose. using the cosine similarity for each pair of radiology reports, one radiology report was randomly selected and all radiology reports with at least . cosine similarity to that radiology report were collected in a set. from this set, one radiology report was randomly selected to keep for further analysis and the remainder were removed. this was applied recursively for each set until each radiology report was retained for further analysis or marked for removal. the purpose of this removal was to reduce the data requiring human annotation. details in s appendix. dataset partitioning a random deck of three numbers the same size as the number of radiology reports retained for analysis was created. the three numbers represented the proportion of radiology reports to be assigned to each of the datasets: % initialization, % training, and % validation. from the set of radiology reports retained for analysis, one radiology report was randomly selected along with up to three most similar radiology reports, based on cosine similarity. from this subset, each radiology report was assigned the next number in the shuffled deck. this was applied recursively until each radiology report was assigned to one dataset. the initialization dataset was solely used for training annotators and was not used by the
as a lexicon-based method, pycontextnlp inputs tab-separated files for lexical targets (indexed events) and lexical modifiers (contextual features). it then converts these into itemdata, which contains a literal, category, regular expression, and rule (the latter two are optional). 
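the near-duplicate removal step described above (tf-idf features over word n-grams, pairwise cosine similarity, then keeping one randomly chosen report from each set of similar reports) can be sketched as follows. this is a minimal, standard-library-only illustration and not the authors' code: the study used scikit-learn's tfidfvectorizer, whereas the sketch computes a smoothed tf-idf by hand. the function names, the `threshold` argument (the study's cut-off value is not reproduced here), and the random seed are assumptions made for the example.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max):
    """Return all word n-grams of length 1..n_max as strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, n_max=10):
    """Build L2-normalised sparse tf-idf vectors (dicts), one per document."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed idf (an assumption of this sketch): log((1+n)/(1+df)) + 1
        vec = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def drop_near_duplicates(docs, threshold, n_max=10, seed=0):
    """Return sorted indices of the reports kept for further analysis."""
    rng = random.Random(seed)
    vectors = tfidf_vectors(docs, n_max)
    remaining = set(range(len(docs)))
    kept = []
    while remaining:
        # Pick a random report and collect everything similar to it.
        pivot = rng.choice(sorted(remaining))
        similar = sorted(i for i in remaining
                         if i == pivot
                         or cosine(vectors[pivot], vectors[i]) >= threshold)
        # Keep one report from the set; mark the rest for removal.
        kept.append(rng.choice(similar))
        remaining -= set(similar)
    return sorted(kept)
```

calling `drop_near_duplicates(reports, threshold=t)` on a list of cleaned report strings returns the indices to retain; which member of each similar set survives depends on the seed, which mirrors the random selection described in the text.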
the literal, belonging to a category (e.g., absent), is the lexical phrase (e.g., is negative) in the text. the regular expression allows for variant text phrases (e.g., was negative) giving rise to the same literal and is generated from the literal if not provided. further, the rule provides context to the span of the literal (e.g., backward). for text data, pycontextnlp marks the text with lexical modifiers and lexical targets according to their representative itemdata. the pycontextnlp algorithm outputs a directed graph via networkx [ ] which represents these markups. nodes in the graph represent the concepts (i.e., lexical modifiers and lexical targets) in the text and edges in the graph represent the relationship between the concepts. the following three subsections describe the details used for extending pycontextnlp. absent, normal, and abnormal. henceforth, the term "annotation" will be used when referencing the category to maintain consistency between annotators and algorithm vocabulary. lexical targets were adapted from the common data elements in radiologic imaging of tbi [ ]. in deriving the lexical targets, the literal represents a clinical condition relevant to tbi on a non-contrast head ct scan (e.g., microhemorrhage) and the category, in this study, is the same (e.g., microhemorrhage). the regular expression for each literal (e.g., microhemorrhage(s)?) was added and updated during the training stage. two examples (fig and fig ) are provided for detailed explanation of the application of lexical modifiers and lexical targets during the algorithm process. at this stage of processing, each sentence in the radiology report will be marked with lexical targets and linked lexical modifiers. there will be one lexical modifier assigned to one lexical target. 
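to make the roles of literal, category, regular expression, and rule concrete, the sketch below marks one sentence with lexical targets and lexical modifiers using only python's `re` module. it does not use the pycontextnlp api. the tuples, the modifier phrases other than "is negative", the choice of the nearest applicable modifier, the fallback to absent when no modifier applies, and the reading of "backward" as "applies to targets that precede the modifier" are assumptions made for illustration.

```python
import re

# Hypothetical itemdata-like entries: (literal, category, regex)
TARGETS = [
    ("microhemorrhage", "microhemorrhage", r"microhemorrhage(s)?"),
]

# Hypothetical modifier entries: (literal, category, regex, rule)
MODIFIERS = [
    ("is negative", "absent", r"(is|was) negative", "backward"),
    ("no evidence of", "absent", r"no evidence of", "forward"),
    ("there is", "present", r"there (is|are)", "forward"),
]


def annotate(sentence, default="absent"):
    """Map each target category found in the sentence to one modifier category."""
    text = sentence.lower()
    found_modifiers = [
        (m.start(), m.end(), category, rule)
        for _, category, pattern, rule in MODIFIERS
        for m in re.finditer(pattern, text)
    ]
    annotations = {}
    for _, target_category, pattern in TARGETS:
        for target in re.finditer(pattern, text):
            best = None  # (distance in characters, modifier category)
            for start, end, mod_category, rule in found_modifiers:
                if rule == "forward" and end <= target.start():
                    distance = target.start() - end   # modifier precedes target
                elif rule == "backward" and start >= target.end():
                    distance = start - target.end()   # modifier follows target
                else:
                    continue
                if best is None or distance < best[0]:
                    best = (distance, mod_category)
            # Exactly one modifier per target; default when none applies.
            annotations[target_category] = best[1] if best else default
    return annotations
```

for example, `annotate("microhemorrhage is negative.")` links the backward-acting modifier to the preceding target, while a sentence with a target but no matching modifier receives the default annotation.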
there were radiology reports extracted: was removed because it did not have both "findings" and "impressions" sections, were removed because they contained more than one scan type, and were removed for high cosine similarity. the remaining reports were split into initialization, training, and validation datasets (table ). in the training dataset, annotators took an average of . minutes per radiology report. between % and % of annotations across radiology reports were selected from default (table ). in the validation dataset, annotators took an average of . minutes per radiology report. similar to the training dataset, % of annotations across radiology reports were selected from default (table ). for the validation dataset, there was high equivalence in annotations between the annotators (n = ), with an additional similar annotations, and only divergent annotations. overall, the two annotators were in high agreement (κ = . ). equivalent was considered the gold standard (fig dashed line). the evaluation revealed high performance across all metrics (table ). six false positives were produced for intracranial pathology and four for hemorrhage (nos), meaning tbiextractor identified these lexical targets as present, while the annotators marked these as absent. this is due to the derivation of these lexical targets in relation to other lexical targets (i.e., if extraaxial fluid collection is present, then by decision rules, so is intracranial pathology). the remaining lexical targets produced minimal false positives. overall, the errors are minimal as measured by the high f scores for the majority of lexical targets. further examination of divergent cases (i.e., annotators annotated absent and tbiextractor annotated present, or vice versa) revealed the most common diverged lexical targets to be intracranial pathology, facial fracture, intraparenchymal hemorrhage, hemorrhage (nos), and herniation. 
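the agreement and performance figures reported here rest on a few standard formulas, sketched below for a single lexical target. the sketch assumes binary present/absent labels and assumes that κ denotes cohen's kappa; the function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(gold, predicted, positive="present"):
    """Sensitivity, specificity, and F1 of predictions against a gold standard."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    tn = sum(g != positive and p != positive for g, p in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both annotators labelled independently.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

a false positive in this sketch corresponds to the situation described above, where the algorithm marks a lexical target as present while the gold standard marks it as absent.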
the remaining lexical targets exhibited fewer than four diverged responses. the most the derived-from-decision-rules intracranial pathology, indicating that most errors were from tbiextractor missing the lexical targets outright. in most divergent cases where this was not the reason, there were more complex structures to the sentences. first, cosine similarities across the four subsets of data were not different and indicated a normal, albeit slender, distribution of radiology report similarity. second, the average number of sentences in each radiology report approached the minimum, indicating a right-skewed distribution where the majority of radiology reports will have low numbers of sentences. the same holds true for the number of words. taken together, this could be reflective of the findings generally found in ct reports on tbi subjects, where the prevalence of ct findings is less than % in mild tbi cases. in cases where annotators were not equivalent, data entry issues tended to be the culprit. mostly, this was a result of overlooking the lexical target and not selecting an annotation different from default. the overlooking could be a result of annotator fatigue, which may be attributed to length and/or complexity of the radiology report. there was also a difference in whether "parenchymal contusion" was considered an intraparenchymal hemorrhage. however, the differences between the annotators were minimal and therefore provided a valid gold standard to develop and validate tbiextractor. standard assessment metrics for evaluating tbiextractor were exceptionally high, demonstrating the utility of the algorithm for extracting accurate clinical conditions relevant to tbi research. one particularly error-prone case was facial fracture. often, radiology reports with facial fractures are lengthy and involve compound sentence structures, which are missed by the regular expressions and span pruning. 
second, there were several cases where the lexical modifier was absent or at a distance further away than another lexical modifier. for example, "cerebellar volume loss" would indicate atrophy is present, but in this sentence there is no lexical modifier available, which therefore results in a default lexical modifier of absent. third, there were cases where derived lexical targets were not accurately annotated by tbiextractor extraaxial fluid collection to be absent. further examination of these errors is an avenue for future work. the dataset used for this study was from a single institution, which limits the style of radiology reports and decreases heterogeneity in the sample. furthermore, the dataset was limited in size as there were only two annotators available for annotation. in addition, there were data entry issues from extracting the radiology report from the emrs. for example, a subsequent radiology report was used instead of the admission report. lastly, the only scan considered in this dataset is the admission non-contrast head ct. 
with the nature of tbis, some visible pathologies are only seen on follow-up cts
developed to automate the extraction of tbi common data elements from radiology reports
multivariable prognostic analysis in traumatic brain injury: results from the impact study
optimizing clinical research participant selection with informatics
common data elements in radiologic imaging of traumatic brain injury
common data elements in radiology
the diagnosis of head injury requires a classification based on computed axial tomography
prediction of outcome in traumatic brain injury with computed tomographic characteristics: a comparison between the computed tomographic classification and combinations of computed tomographic predictors
what can natural language processing do for clinical decision support
natural language processing in pathology: a systematic review
natural language processing technologies in radiology research and clinical applications
natural language processing: an introduction
automated encoding of clinical documents based on natural language processing
medication extraction from electronic clinical notes in an integrated health system: a study on aspirin use in patients with nonvalvular atrial fibrillation
extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system
identifying primary and recurrent cancers using a sas-based natural language processing algorithm
automated outcome classification of emergency department computed tomography imaging reports
automated outcome classification of computed tomography imaging reports for pediatric traumatic brain injury
information extraction from multi-institutional radiology reports
natural language processing techniques for extracting and categorizing finding measurements in narrative radiology reports
a natural language processing pipeline for pairing measurements uniquely across free-text ct reports
assessing the feasibility of an automated suggestion system for communicating critical findings from chest radiology reports to referring physicians
characterization of change and significance for clinical findings in radiology reports through natural language processing
python software foundation
data structures for statistical computing in python
a guide to numpy
open source scientific tools for python
scikit-learn: machine learning in python
exploring network structure, dynamics, and function using networkx. proceedings of the th python in
a d graphics environment
a simple algorithm for identifying negated findings and diseases in discharge summaries
context: an algorithm for identifying contextual features from clinical text
context: an algorithm for determining negation, experiencer, and temporal status from clinical reports
document-level classification of ct pulmonary angiography reports based on an extension of the context algorithm
inter-coder agreement for computational linguistics
a coefficient of agreement for nominal scales
imaging evidence and recommendations for traumatic brain injury: conventional neuroimaging techniques
prospective validation of a proposal for diagnosis and management of patients attending the emergency department for mild head injury
diagnostic procedures in mild traumatic brain injury: results of the who collaborating centre task force on mild traumatic brain injury
mild head injury -mortality and complication rate: meta-analysis of findings in a systematic literature review
indications for computed tomography in patients with minor head injuries
epidemiology of traumatic brain injury
congenital and acquired brain injury. . epidemiology, pathophysiology, prognostication, innovative treatments, and prevention. 
archives of physical key: cord- -k v ksvq authors: naim, nikki; amrit, francis rg; ratnappan, ramesh; delbuono, nicholas; loose, julia a; ghazi, arjumand title: nhr- acts in distinct tissues to promote longevity versus innate immunity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k v ksvq aging and immunity are inextricably linked and many genes that extend lifespan also enhance immunoresistance. however, it remains unclear if longevity-enhancing factors modulate immunity and longevity by distinct or shared mechanisms. here, we demonstrate that the caenorhabditis elegans pro-longevity factor, nhr- , also promotes resistance against pseudomonas aeruginosa, but modulates immunity and longevity by spatially and mechanistically distinct mechanisms. fenofibrate, an agonist of nhr- ’s mammalian functional homolog, pparα, enhanced worm immunoresistance in an nhr- -dependent manner. nhr- expression is increased by germline ablation, an intervention that extends lifespan, but lowered by pathogen exposure. nhr- acted in multiple somatic tissues to promote longevity, whereas its pro-immunity function was mediated by neuronal expression. the canonical nhr- target genes, acs- and fmo- , were upregulated by germline loss, but infection triggered fmo- downregulation and acs- upregulation. interestingly, neither gene conferred resistance against gram-negative pseudomonas, unlike their reported roles in immunity against gram-positive pathogens. thus, nhr- is differentially regulated by interventions that bring about long-term changes (lifespan extension) vs. short-term stress (pathogen exposure) and in response it orchestrates distinct outputs, including pathogen-specific transcriptional programs. overall, our study demonstrates the independent control of immunity and longevity by a conserved regulatory protein. introduction down-regulated by daf- and tcer- too, the shared group (joint down) showed a much higher overlap with the nhr- down class (~ %, / , r . 
, p < . e- ) as compared to genes specifically down-regulated by either factor alone (fig. s b) . thus, nhr- targets share the largest overlap with genes whose in this study, we demonstrate that nhr- is a pro-longevity factor that modulates lifespan and immunity through distinct mechanisms. nhr- is influenced differently by, and orchestrates distinct responses to, the acute stress of pathogen attack vs. a we found that nhr- expression in neurons alone could rescue the immunity deficits of germline-less, nhr- ;glp- mutants, whereas, their lifespan was substantially restored by expression in any somatic tissue. in fertile, nhr- single mutants, immunity was restored by presence in neurons or intestine, but lifespan could be rescued from other tissues as well. this suggests that pathogen response may be more sensitive to nhr- 's location as compared to longevity. interestingly, nhr- expression in muscles provided little or no immunity benefit in any of the genetic backgrounds we tested, irrespective of the presence or absence of the germline in the animal. in fact, expression in muscles mostly diminished immunity. this is in noticeable contrast to the broad immunity advantages conferred by tissues including the liver, brain, heart, muscles and immune cells (e.g., macrophages, monocytes, and lymphocytes). importantly, pparα performs distinct functions in these tissues and its activity is governed by tissue-specific mechanisms. for instance, it mediates fatty acid oxidation in liver and heart, but its endogenous ligands and their sources are different in the two organs. in liver, a lipid species, : / : -gpc, derived by fatty acid synthase (fas)-mediated de novo lipogenesis, serves as its endogenous ligand (chakravarthy et al., ) generated. transgene-carrying strains were maintained and selected for lifespan and pathogen stress assays using a leica m c microscope with a fluorescence attachment. 
a complete listing of all strains created for this study is provided in table s . fenofibrate supplementation assay: µl of µm fenofibrate (sigma f ) in . % dmso were placed onto both ngm and slow-killing plates before seeding with op or pa , respectively, as described above (brandstädt et al., ; leiteritz et al., ). upon the drying and growth of the bacterial lawn, eggs were grown to l on either the fenofibrate or . % dmso control plates, then transferred to pa plates (similarly supplemented with fibrate or dmso) at l larval stage and survival monitored. worms were transferred to fresh plates as described above.
fig. trial trial
context is everything: aneuploidy in cancer
metformin enhances autophagy and normalizes mitochondrial function to alleviate aging-associated inflammation
molecular actions of pparalpha in lipid metabolism and inflammation
lipid-lowering fibrates extend c. elegans lifespan in a nhr- /pparalpha-dependent manner
identification of a physiologically relevant endogenous ligand for pparalpha in liver
spectroscopic coherent raman imaging of caenorhabditis elegans reveals lipid particle diversity
the role of peroxisome proliferator-activated receptors (ppar) in immune responses
ppar-alpha as a key nutritional and environmental sensor for metabolic adaptation
immune response with aging: immunosenescence and its potential impact on covid-
nhr- transcription factor regulates immunometabolic response and survival of caenorhabditis elegans during enterococcus faecalis infection
key: cord- -jj pc a authors: tang, pingtao; das, jharna r.; li, jinliang; yu, jing; ray, patricio e. title: an hiv-tat inducible mouse model system of childhood hiv-associated nephropathy date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jj pc a background modern antiretroviral therapies (art) have decreased the prevalence of hiv-associated nephropathy (hivan). nonetheless, we continue to see children and adolescents with hivan all over the world. 
furthermore, once hivan is established in children, it is difficult to revert its long-term progression, and we need better animal models of childhood hivan to test new treatments. objectives to define whether the hiv- trans-activator (tat) gene precipitates hivan in young mice, and to develop an inducible mouse model of childhood hivan. design/methods an hiv-tat gene cloned from a child with hivan was used to generate recombinant adenoviral vectors (rad-tat). rad-tat and lacz control vectors ( × ) were expressed in the kidney of newborn wild type and hiv-transgenic (tg ) fvb/n mice without significant proteinuria (n = - per group). mice were sacrificed and days later to assess their renal outcome, the expression of hiv-genes and growth factors, and markers of cell growth and differentiation by rt-qpcr, immunohistochemistry, and/or western blots. results hiv-tat induced the expression of hiv- genes (env) and heparin binding growth factors in the kidney of hiv-tg mice, and precipitated hivan in the first month of life. no significant renal changes were detected in wild type mice infected with rad-tat vectors, suggesting that hiv-tat alone does not induce renal disease. conclusion this new mouse model of childhood hivan highlights the critical role that hiv-tat plays in the pathogenesis of hivan, and could be used to study the pathogenesis and treatment of hivan in children and adolescents. summary statement we developed a new inducible mouse model system of childhood hiv-associated nephropathy, and demonstrated that hiv-tat plays a critical role in this renal disease acting in synergy with other hiv- genes and heparin binding cytokines. modern combined antiretroviral therapies (cart) have improved the clinical outcome of children and adolescents living with hiv and decreased the prevalence of hiv-associated nephropathy (hivan) in a significant manner. 
however, physicians have had less success providing chronic cart to children and adolescents living with hiv, and we continue to see hivan in this group of people all over the world. over % of the estimated . million hiv-infected children are living in sub-saharan africa [ ], and it is anticipated that approximately % of these children will develop hivan if they do not receive appropriate art [ ] . furthermore, we have noticed that once the typical renal histological features of hivan are established in children, it is difficult to prevent its long-term progression to end stage kidney disease (eskd) with current treatments available. in addition, previous reports in adults [ , ] and children [ ] suggest that hivan can occur in people with suppressed viral load. these studies suggest that inflammatory cytokines released by hiv-infected cells can play a role in the pathogenesis of hivan independently of the viral load. taken together, all these findings underscore the importance of acquiring a better understanding of the pathogenesis and treatment of childhood hivan during the modern cart era. childhood hivan is a renal disease seen predominantly in black children and adolescents who acquired hiv- through vertical transmission and do not receive appropriate antiretroviral therapy [ , ] . from the clinical point of view it is characterized by persistent proteinuria, often in the nephrotic range, and in the late stages is associated with edema, reduced gfr, hypertension, and rapid progression to eskd [ , , ] . the renal histological lesions of childhood hivan reveal mesangial hyperplasia, focal segmental or collapsing glomerulosclerosis, and multicystic tubular dilatation leading to renal enlargement [ , , ] . several hiv-transgenic (hiv-tg) animal models are available to study the pathogenesis and treatment of hivan [ ] [ ] [ ] [ ] [ ] . 
however, these animals develop hivan at different time points, usually after they reach adulthood, and we lack a reliable mouse model system to study the pathogenesis of childhood hivan. therefore, we carried out this study to determine whether the hiv- trans-activator (tat) gene precipitates hivan in young mice, and to define whether this approach could be used to generate an inducible mouse model system of childhood hivan. to accomplish this goal, we infected newborn wild type and heterozygous hiv-tg the protein sequence of the hiv-tat derived from a child with hivan (tat-hivan) was aligned with tat protein sequences derived from the lymphotropic virus hiv- iiib and the monocyte-tropic hiv- virus ada (nih aids research and reference reagent program). as shown in figure a , tat-hivan contains the basic domain that is essential for hiv- activation, but is missing the rgd motif that interacts with cytokines and integrin receptors [ , ] . using an adenovirus gene transfer technique developed in our laboratory [ ] , we were able to express hiv-tat in the kidney and liver of wild type and hiv-tg mice ( figure b ). as expected, by western blots, higher tat protein expression levels were detected in the liver compared to kidneys [ ] ( figure b ). hiv-tg newborn mice infected with rad-tat showed higher tat mrna expression levels when compared to transgenic mice injected with rad-lac-z vectors ( figure b) . the tat mrna detected in hiv-tg mice injected with rad-lac-z vectors was transcribed from the hiv pro-viral dna d transgenic construct. figure a -b). furthermore, the bun levels of these mice were elevated, when compared to hiv-tg mice injected with rad-lac-z vectors ( ± mg/dl* vs. . ± . mean± sem; *p< . ). overall, these findings suggest that tat plays an important role in precipitating hivan in hiv-tg mice. figures - ) . 
taken together, these findings suggest that hiv-tat interacts with other hiv- genes and/or cytokines to induce the proliferation of renal epithelial cells [ , , ] . alternatively, using an in situ apoptosis detection kit and western blots to to develop the mouse model of childhood hivan we took advantage of the hiv-tg mouse line [ , , ] . these mice carry a . -kb hiv- construct lacking -kb sequence overlapping the gag/pol region of the provirus pnl - [ , , ] and express hiv- transcripts in many tissues, including kidney glomerular and tubular epithelial cells. homozygous hiv-tg mice are born sick and usually died with multiple systemic lesions during the first days or weeks of life [ , ] . in contrast, heterozygous mice can be followed until they reach adulthood, and have been used by several investigators to explore the pathogenesis of hivan [ , , , ] . because the majority of heterozygous hiv-tg mice develop hivan at different time points after they reach adulthood, currently we do not have a reliable mouse model system to study the pathogenesis of childhood hivan. therefore, we carried out this study to test the hypothesis that the induction of hiv-genes in the kidney of newborn mice precipitates hivan during the first month of life. to accomplish this goal, we used an adenovirus gene transferring technique developed in our laboratory, which is based on the principle that the retention of adenoviral vectors in the circulation improves the transduction of renal glomerular cells in rodents [ , ] in previous studies, we showed that newborn mice have delayed clearance of rad vectors from the circulation, and therefore, more efficient transduction of glomerular cells after a systemic injection of adenoviruses via the retro-orbital plexus [ ] . 
following this approach, we expressed the coding sequences of a tat gene derived from a child with hivan (tat-hivan) in the kidney of newborn hiv-tg mice, and precipitated the development of hivan during the first month of life. our findings support the results of previous studies showing that hiv- genes expressed in the kidney play a critical role in this process, although we do not yet understand the exact mechanisms involved. further studies are warranted to explore this issue. the hiv-tat protein is a powerful transcriptional factor encoded by two exons. the first exon encodes the hiv activation and basic binding domains, which are required for hiv-transcription and nuclear localization of tat [ ] . the second exon encodes the rgd motif (c-terminal amino acids - ), which enhances the angiogenic activity of tat acting through cytokines and integrin receptors [ ] . tat plays an essential role in hiv-replication by recruiting a cellular human protein called cyclin t , which efficiently increases the transcription of the hiv-ltr via nf-κb [ ] . however, cloning and characterization of the murine cyct protein revealed that mouse cyclin lacks a critical cysteine residue that is needed to form a complex with tat and induce its full transcriptional activity [ , ] . for this reason, tat has limited direct transcriptional activity in mice, but it can induce the expression of tnf-α [ ] and other cytokines that increase the transcription of hiv- via nf-κb dependent mechanisms [ ] . our study showed that the activation and basic binding domains of tat are sufficient to induce the renal expression of hiv-genes and precipitate hivan in young mice. in contrast, we found tat's rgd motif is not essential in this process. in addition to being a powerful activator of hiv- transcription, tat is released into the circulation by infected cells and can be taken up by uninfected cells [ , ] . 
in this manner, tat mimics the action of several cytokines involved in the pathogenesis of aids, including sdf- α, rantes, and mif -β [ , , ] . furthermore, acting in synergy with fgf- , tat can induce the de-differentiation and proliferation of cultured human podocytes [ ] [ ] [ ] . for these reasons, we explored the effects of tat in wild type mice, but were unable to detect significant renal lesions in these mice. our findings suggest that tat alone cannot induce renal disease in wild type mice. however, we should mention that tat-hivan has an incomplete rgd sequence, and its ability to interact with cytokines and integrin receptors in vivo might be impaired [ , ] . thus, further studies are needed to determine whether tat proteins containing rgd sequences can cause kidney damage per se in wild type mice. we speculate that an additional mechanism through which tat could precipitate hivan in hiv-tg mice is by increasing the production and/or activity of tnf-α [ ] , since high levels of tnf-α are detected in the circulation of hiv-tg mice [ ] , and tnf-α worsens the outcome of hivan in adult mice [ ] . alternatively, tat could act in synergy with fgf- and vegf-a [ , , , ] , considering that both heparin binding growth factors were up-regulated by tat in day old hiv-tg overall, our mouse model reproduces all the renal histological features characteristic of childhood hivan [ , ] . interestingly, the expression levels of the podocyte specific proteins, nephrin, wt- , and synaptopodin, did not change in correlation with the induction of hiv- genes during the first week of life. these findings suggest that the podocyte de-differentiation changes characteristic of hivan might be a secondary event associated with the regeneration of these cells. 
it is possible that podocytes that express high levels of hiv- genes died, and were replaced by parietal epithelial or renal progenitor cells [ , ] , which do not express podocyte markers, and are sensitive to the growth promoting effects of several growth factors [ ] . in addition, we noted a reduced number of renal epithelial cells undergoing apoptosis in days old hiv-tg mice infected with rad-tat vectors, when compared to the controls. it is tempting to speculate that tat, in combination with bfgf- and vegf-a, may have an antiapoptotic effect [ ] , since both heparin binding growth factors were up-regulated at this time point. however, more studies are needed to test this hypothesis. finally, we found that tat induced direct renal epithelial proliferative changes in and days old hiv-tg mice. these changes appear to be driven by the mapk signaling pathway, which can be activated directly by hiv-nef [ ] [ ] [ ] [ ] , as well as fgf- [ ] or vegf-a [ ] . in humans, the risk variants of the apolipoprotein- (apol ) increase the lifetime risk of untreated hiv+ people to develop hivan by ~ % [ ] [ ] [ ] . therefore, one limitation of our animal model is that hiv-tg mice do not express the apol- gene. this limitation could be overcome by generating dual transgenic hiv-tg / apol mice [ ] , and infecting newborn mice with rad-tat vectors. in addition, a significant number of black children living with hiv develop hivan independently of the apol risk variants [ , ] , and previous studies suggest that the apol risk variants may play a more relevant role in adults, when compared to young children [ , ] . thus, young kidneys might be more sensitive to the cytotoxic effects of hiv- genes, tnf-α, and heparin binding growth factors, and less dependent on the apol risk variants to develop hivan. alternatively, it is possible that other unknown genetic factors may play an additional role in precipitating hivan in black children, as reported in hiv-tg mice [ ] . 
taken together, these studies show that a strong genetic influence modulates the outcome of hivan both in mice and humans, and more work needs to be done to define these factors in children. in conclusion, we have developed an inducible mouse model system of childhood hivan that reproduces the full hivan phenotype during their first month of life. in addition, we showed that tat plays a relevant role in this process by inducing the renal expression of hiv- genes, fgf- , and vegf-a, leading to the activation of the mapk pathway. hopefully, this animal model will facilitate the discovery of new therapeutic targets to prevent the progression of hivan in children and adolescents. care and use committee. heterozygous newborn hiv-tg fvb/n mice [ , ] and their respective wild type (wt) littermates were injected through the retro-orbital plexus with x pfu/mouse of recombinant adenoviral (rad) vectors carrying either hiv-tat derived from a child with hivan (rad-tat vector) or the e. coli lacz gene (rad-lac-z). to express the hiv-tat rad vector in the kidney of newborn mice, we used a gene transferring technique developed in our laboratory [ ] . wild type and hiv-tg mice expressing the pro-viral dna construct d [ , ] , were divided in groups (n = mice each) and sacrificed at days (peak of rad-tat expression) and days (renal clearance of the viral vectors) after the adenoviral injections. all mice had free access to water and standard food and were treated in accordance with the national institutes of health (nih) guidelines for care and use of research animals. the generation of the rad-tat vector derived from a child with hivan was described in detail before [ ] . briefly, a cdna fragment encoding the full-length tat protein was cloned into the pcxn -flag vector and used to generate e -deleted recombinant adenoviruses carrying tat-hivan-flag [ ] . 
the protein sequence of the tat-hivan gene was aligned and compared to other tat genes using the clustal omega multiple sequence alignment program (http:// www.ebi.ac.uk/tools/msa/clustalo/). both tat-flag and lac-z control adenoviruses were amplified, purified, desalted, and titrated as previously described [ , , ] . the particle-to-plaque-forming unit (pfu) ratio of the virus stock used in these experiments was . blood, urine and kidney sample collection. mice were sacrificed and days after the rad injections. urine, blood and kidney samples were harvested and kept frozen at - c. blood urea nitrogen was assessed using the quantichrom urea assay kit from bioassay systems (catalog no. diur- ) as described before [ ] . the urinary creatinine levels were measured using colorimetric assay from r&d systems (catalog no. kge ). urinary protein was measured using the bayer multistix sg reagent strips for urinalysis. in addition, microliters of urine were run on - % sodium dodecyl sulfate polyacrylamide gel electrophoresis (sds-page) and stained with coomassie blue stain solution (bio-rad) to detect proteinuria. the protein band corresponding to albumin was quantified by densitometry analysis using adobe photoshop . (adobe systems, san jose, ca). results were expressed in arbitrary optical density units adjusted to the urinary creatinine values as described before [ ] . amplification reaction, we used µl and µl cdna respectively. the densitometry analysis was conducted using adobe photoshop . as described before [ , ] . western blot analysis. the kidneys were lysed using ripa lysis buffer containing protease inhibitors and phosphatase inhibitor cocktail (sigma-aldrich), and processed by western blots as described before [ ] . the following primary antibodies were used: phospo-p / mitogen- (progen biotechnik gmbh). all primary antibodies were diluted : except wt- , which was diluted : dilution and incubated overnight at o c. 
protein bands were detected using supersignal west pico chemiluminescent substrate (thermo scientific) according to the manufacturer's instruction. all membranes were exposed to a kodak film (x-omat) and developed using an automated developer. densitometry analysis of the data expressed as a β actin ratio was conducted using adobe photoshop . as described before immunohistochemistry. paraffin embedded sections were cut at µm, deparaffinized, rehydrated, and stained as previously described [ ] . immunostaining was performed with a commercial streptavidin-biotin-peroxidase complex (histostain sp kit, zymed, san francisco, ca) according to the manufacturer's instructions as described before [ ] . the peroxidase activity was monitored after the addition of substrate using a dab kit (vector laboratories, catalog no sk- ) or aec substrate kit (catalog no. ,invitrogen, frederick, md). sections were counterstained with hematoxylin. the pcna staining kit from invitrogen was used to detect pcna. ki- and wilms' tumor (wt- ) staining were assessed using a : dilution of a monoclonal rat anti-mouse ki- antibody (clone tec- ) and a mouse monoclonal anti-human wt- antibody (clone f-h ) respectively, both from dako north american inc). synaptopodin was detected with a ready-to-use a mouse monoclonal antibody (clone g d , if not specified otherwise, the data were expressed as mean + sem. differences between two groups were compared using an unpaired t test. multiple sets of data were compared by the expression of pcna, wt- and nephrin was quantified as a ratio of beta actin. the graphs show the results of the densitometry analysis and quantification of the results in optical density units (mean ± sem), as described in the methods sections. all changes between groups were not statistically significant (p > . ). 
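the summary statistics described above (band densities expressed as a ratio of beta actin, reported as mean ± sem, and compared between two groups with an unpaired t test) can be sketched as follows. this is a minimal illustration and not the authors' analysis code: the function names and the optical density values are hypothetical, and the t statistic shown is the classical pooled-variance form, which is an assumption because the text does not say which variant was used.

```python
import math
import statistics


def actin_ratio(target_od, actin_od):
    """Normalize target band optical densities to the matching beta actin bands."""
    if len(target_od) != len(actin_od):
        raise ValueError("each target band needs a matching actin band")
    return [t / a for t, a in zip(target_od, actin_od)]


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def unpaired_t(a, b):
    """Pooled-variance unpaired t statistic; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# hypothetical optical density values, for illustration only
control = actin_ratio([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
treated = actin_ratio([2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
print(mean_sem(control), mean_sem(treated), unpaired_t(control, treated))
```

the p-value would then be read from a t distribution with the returned degrees of freedom; that step is left out here to keep the sketch free of external libraries.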
kidney disease in hiv-positive children hiv-associated kidney glomerular diseases: changes with time and haart viral load and hiv-associated nephropathy hiv-associated nephropathy in the setting of maximal virologic suppression renal disease in children with the acquired immunodeficiency syndrome human immunodeficiency virus (hiv)-associated nephropathy in children from the hivassociated nephropathy in transgenic mice expressing hiv- genes nephropathy in hiv-transgenic mice a novel hiv- transgenic rat model of childhood hiv- -associated nephropathy expression of hiv- genes in podocytes alone can lead to the full spectrum of hiv- -associated nephropathy transgenic and infectious animal models of hiv-associated nephropathy the tat protein of human immunodeficiency virus type- promotes vascular cell growth and locomotion by engaging the alpha beta and alphavbeta integrins and by mobilizing sequestered basic fibroblast growth factor inflammatory cytokines synergize with the hiv- tat protein to promote angiogenesis and kaposi's sarcoma via induction of basic fibroblast growth factor and the alpha v beta integrin adenovirus-mediated gene transfer to glomerular cells in newborn mice bfgf and its low affinity receptors in the pathogenesis of hivassociated nephropathy in transgenic mice hiv- upregulates vegf in podocytes a urinary biomarker profile for children with hiv-associated renal diseases synergy between basic fibroblast growth factor and hiv- tat protein in induction of kaposi's sarcoma progressive glomerulosclerosis and enhanced renal accumulation of basement membrane components in mice transgenic for human immunodeficiency virus type genes liver bypass significantly increases the transduction efficiency of recombinant adenoviral vectors in the lung, intestine, and kidney efficient gene transfer to rat renal glomeruli with recombinant adenoviral vectors mutational analysis of the conserved basic domain of human immunodeficiency virus tat protein cutting edge: a 
short polypeptide domain of hiv- -tat protein mediates pathogenesis tat trans-activates the human immunodeficiency virus through a nascent rna target the interaction between hiv- tat and human cyclin t requires zinc and a critical cysteine residue that is not conserved in the murine cyct protein recruitment of a protein complex containing tat and cyclin t to tar governs the species specificity of hiv- tat the tat protein of hiv- induces tumor necrosis factor-alpha production. implications for hiv- -associated neurological diseases hiv- tat protein induces production of proinflammatory cytokines by human dendritic cells and monocytes/macrophages through engagement of tlr -md -cd complex and activation of nf-kappab pathway effects of the human immunodeficiency virus type tat protein on the expression of inflammatory cytokines human immunodeficiency virus- tat induces hyperproliferation and dysregulation of renal glomerular epithelial cells the basic domain of hiv-tat transactivating protein is essential for its targeting to lipid rafts and regulating fibroblast growth factor- signaling in podocytes isolated from children with hiv- -associated nephropathy circulating fibroblast growth factor- , hiv-tat, and vascular endothelial cell growth factor-a in hiv-infected children with renal disease activate rho-a and src in cultured renal endothelial cells elevated levels of tumor necrosis factor alpha (tnf-alpha) in human immunodeficiency virus type -transgenic mice: prevention of death by antibody to tnf-alpha interposes the proliferative and nf-kappab-mediated inflammatory response by podocytes to tnf-alpha role of fibroblast growth factor-binding protein in the pathogenesis of hiv-associated hemolytic uremic syndrome hiv-associated nephropathy proliferating cells in hiv and pamidronateassociated collapsing focal segmental glomerulosclerosis are parietal epithelial cells regeneration of glomerular podocytes by human renal progenitors ureteric bud cells secrete multiple 
factors, including bfgf, which rescue renal progenitors from apoptosis fibroblast growth factor- and the hiv- tat protein synergize in promoting bcl- expression and preventing endothelial cell apoptosis: implications for the pathogenesis of aids-associated kaposi's sarcoma hiv- nef disrupts the podocyte actin cytoskeleton by interacting with diaphanous interacting protein critical role for nef in hiv- -induced podocyte dedifferentiation nef stimulates proliferation of glomerular podocytes through activation of src-dependent stat and mapk , pathways hiv- viral protein r induces erk and caspase- -dependent apoptosis in renal tubular epithelial cells association of trypanolytic apol variants with kidney disease in african americans hiv-associated nephropathy in african americans apol risk variants are strongly associated with hiv-associated nephropathy in black south africans apol -g or apol -g transgenic models develop preeclampsia but not kidney disease apol -associated glomerular disease among african-american children: a collaboration of the chronic kidney disease in children (ckid) and nephrotic syndrome study network (neptune) cohorts brief report: apol renal risk variants are associated with chronic kidney disease in children and youth with perinatal hiv infection renal and cardiovascular morbidities associated with apol status among african-american and non-african mapping a locus for susceptibility to hiv- -associated nephropathy to mouse chromosome adenovirus-mediated correction of the genetic defect in hepatocytes from patients with familial hypercholesterolemia a novel role of fibroblast growth factor- and pentosan polysulfate in the pathogenesis of intestinal bleeding in mice role of circulating fibroblast growth factor- in lipopolysaccharide-induced acute kidney injury in mice transmembrane tnf-alpha facilitates hiv- infection of podocytes cultured from children with hiv-associated nephropathy fibroblast growth factor- increases the renal recruitment and 
attachment of hiv-infected mononuclear cells to renal tubular epithelial cells acknowledgements. we thank dr. marina jerebtsova phd. for her advice expanding adenoviral vectors and performing injections in mice, and dr. xuefang xie phd. for her contribution performing the sequencing and analysis of the tat-hivan gene. competing interests. all authors declare no conflicts or competing interest related to this manuscript. funding sources. this study was supported by national institutes of health awards: r dk- ; r dk- ; and r dk- . key: cord- - wc f rl authors: sengupta, sourodip; addya, sankar; biswas, diptomit; sarma, jayasri das title: matrix metalloproteinases and tissue inhibitors of metalloproteinases in murine coronavirus-induced neuroinflammation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wc f rl mouse hepatitis virus (mhv) belongs to the same beta-coronavirus family as sars-cov- , mers-cov, and sars-cov. studies have shown the requirement of host cellular proteases for priming the surface spike protein during viral entry and transmission in coronaviruses. the metzincin family of metal-dependent endopeptidases called matrix metalloproteinases (mmps) is involved in virus encephalitis, enhanced blood-brain barrier permeability, or cell-to-cell fusion upon viral infection. here we show the role of mmps as mediators of virus-induced host neuroinflammatory response in the mhv model. infection of mice with wild-type mhv-a or its isogenic recombinant strains, rsa or rsmhv significantly upregulated mmp- , mmp- , and mmp- transcript levels. functional network assessment with ingenuity pathway analysis revealed a direct involvement of these mmps in disrupting junctional assembly between endothelial cells via interaction with junctional adhesion molecules and thereby facilitating transmigration of peripheral lymphocytes. 
our findings also suggest mrna upregulation of park , which is involved in nadph oxidase-dependent ros production, following rsa infection. rsa infection resulted in elevated mrna levels of rela, a subunit of nf-κb. infection with mhv-a is known to generate ros, and oxidative stress can activate nf-κb. thus, our findings indicate the existence of a possible nexus between ros, nf-κb, and mmps in rsa -induced neuroinflammation. we also assessed the expression of endogenously produced regulators of mmp activities. elevated mrna and protein levels of tissue inhibitors of metalloproteinases (timp- ) in mhv-a infection are suggestive of a timp- mediated host antiviral response. importance the newly emergent coronavirus has brought the world to a near standstill. in the past, studies have focused on the function of host proteases in virus attachment and entry. our research indicates the involvement of a group of metal-dependent host proteases in inflammation associated with coronavirus infection. inflammation is the first response of the host to virus infection. while it helps in restricting the spread and clearance of viral particles, uncontrolled inflammation results in several inflammatory consequences. therefore, it becomes vital to limit unchecked host immune response. the inhibition of specific metalloproteases represents a potential new therapeutic approach in coronavirus infection and disease outcome. as the number of plaques times dilution factor (df) per ml per gram of tissue per ml [pfu= (no. of plaques*df per ml)/ (tissue weight in gram per ml)]. gene expression analysis. 
total rna was extracted from brain tissues of mhv-a , rsa or rsmhv infected as well as mock-infected mice using trizol reagent (invitrogen) following the manufacturer's instructions. rna concentration was measured using a nanodrop / c spectrophotometer (thermo fisher scientific), and cdna was prepared with µg of total rna using a cdna reverse transcription kit. quantitative real-time pcr (rt-qpcr) was performed using sybr green dye-based assay in a quantstudio real-time pcr system (thermo fisher scientific) with the following reaction conditions: initial denaturation at °c for min, cycles of °c for s and °c for s, and melting curve analysis at °c for s. reactions were performed in triplicates (n= ). primer sequences are provided in table . the comparative threshold (ΔΔct) method was used for relative quantification. the mrna levels of target genes were normalized with the housekeeping gapdh gene and represented as the relative fold change values compared to their respective mock-infected controls. western blotting. brain tissues ( mg) were harvested from mice following transcardial pbs perfusion and flash-frozen in liquid n . tissues were homogenized (using qiagen homogenizer) and lysed in ul of ripa buffer containing protease-cocktail inhibitor and phosphatase inhibitors ( mm navo and mm naf) for hr mins with intermittent vortex every mins. the samples were kept on ice during the entire process. samples were then centrifuged for mins at , rpm at °c to separate the supernatant. the total protein content in the supernatant was estimated with a bca protein assay kit. for immunoblotting, µg of total protein per sample was resolved by sds-page on a % polyacrylamide gel followed by transfer onto pvdf membranes using transfer buffer ( mm tris, mm glycine, and % methanol) . membranes were blocked for hr at room temperature in % v/v goat serum prepared in tbst (tris-buffered saline containing . 
% v/v tween ) and subsequently incubated overnight at °c in polyclonal anti-mouse timp- antibody at : dilution in blocking solution. the membranes were washed in tbst and incubated for hr at room temperature with hrp-conjugated donkey anti-goat secondary igg antibody. as an internal loading control, g-actin was used, and membranes were blocked separately in % w/v non-fat skimmed milk in tbst. polyclonal anti-mouse g-actin antibody ( : dilution) and hrp-conjugated goat anti-rabbit secondary igg antibody ( : , dilution) were used. the blots were washed in tbst, and the immunoreactive bands were visualized using the chemiluminescent hrp substrate. non-saturated bands were visualized with syngene g: box chemidoc system using gensys software. to the median of all samples used as the baseline option. data were filtered by percentile, and a lower cut off was set at . a fold change of ≥ . -fold was considered for differential expression of a gene. statistical analysis using unpaired student t-tests was performed to compare two groups, with p-values ≤ . considered significant. the list of mmp and timp genes from the microarray data was loaded into qiagen's ingenuity pathway analysis software (ipa®, qiagen, usa) to perform biological network and functional analyses. statistical analysis. data shown are mean ± standard error of the mean (sem) for all graphs. unpaired student t-test with welch's correction, assuming unequal standard deviations, was performed to examine significant differences between two groups. multiple comparisons were made using ordinary one-way anova, followed by dunnett's multiple comparison test. a p-value < . was considered statistically significant. availability of data. all the data sets used and analyzed in the current study are available from the corresponding author on request. 
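the comparative threshold (ΔΔct) quantification described in the gene expression analysis above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' code: the function names and ct values are hypothetical, technical replicates are averaged before subtraction, and the fold change uses the common 2^(-ΔΔct) form, which assumes close to 100% amplification efficiency for both the target gene and gapdh.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene (e.g. gapdh)."""
    return mean(target_cts) - mean(reference_cts)


def relative_fold_change(infected_target, infected_ref, mock_target, mock_ref):
    """Fold change versus mock-infected control as 2^(-ddCt).

    Assumes roughly 100% amplification efficiency for both genes.
    """
    ddct = delta_ct(infected_target, infected_ref) - delta_ct(mock_target, mock_ref)
    return 2.0 ** (-ddct)


# hypothetical triplicate ct values, for illustration only
infected_mmp = [22.1, 21.9, 22.0]
infected_gapdh = [18.0, 18.1, 17.9]
mock_mmp = [25.0, 25.1, 24.9]
mock_gapdh = [18.0, 17.9, 18.1]

fold = relative_fold_change(infected_mmp, infected_gapdh, mock_mmp, mock_gapdh)
print(round(fold, 2))
```

a value above 1 indicates higher transcript levels in infected tissue than in the mock-infected control, and a value below 1 indicates lower levels.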
four weeks old, male c bl/ mice inoculated with mhv-a ( pfu) or mock- infected were sacrificed at day - (acute), (acute-chronic), and (chronic) post-infection (p.i), and brains were harvested. routine plaque assay was performed with serially diluted brain homogenates to estimate viral replication. mhv-a titer was significant between day - p.i (fig. , a; p< . ) and viral particles were below the detection limit at later time points (data not shown). total rna was isolated from mock and virus-infected brain tissues for expression analysis of viral nucleocapsid and mmp genes through rt-qpcr. primer sequences are given in table . levels of viral nucleocapsid mrna ( fig. , b ; p< . ) coincided with viral replication reaching its peak between day - p.i, which also marks the acute phase of inflammation. mhv-a infected mice exhibited neuroinflammation reaching its peak by - days p.i, which is associated with meningitis, encephalitis, perivascular cuffing, and macrophage/microglia nodule formation ( ). we found similar results in paraffin-embedded brain sections stained by hematoxylin-eosin and immunohistochemistry (data not shown). transcript levels of mmp , mmp , mmp , and mmp were significantly upregulated at day - p.i ( tissue inhibitors of metalloproteinases or timps are endogenous protein regulators of mmps. to understand the regulation of mmps upon mhv-a infection, we also considered the gene expression of timps. as described above, total rna from brain samples of mock and mhv-a infected mice were subjected to rt-qpcr using specific primers (table ) to determine the transcript levels of timp , timp , timp , and timp . mhv-a infection resulted in significant upregulation of timp mrna at day - p.i (fig. , a; p< . ), while mrna levels of timp , timp , and timp remained significantly downregulated (fig. , c-e; p-values varies as < . to < . ). 
while timp mrna followed a similar expression pattern as the mmps following mhv-a infection-induced inflammation, its protein levels remained high throughout post-infection, as shown in the representative figure (fig. , b) . overall, mhv-a resulted in elevated timp- levels in the brain of infected mice. to determine whether the spike (s) protein has any role in inducing mmp cord. to validate the findings from microarray data, we performed rt-qpcr from brain samples of rsa , rsmhv , and mock-infected mice sacrificed at day - , , and p.i. brain samples were also harvested for titer assay to estimate viral replication. like the parental mhv-a strain, viral titer and nucleocapsid mrna levels peaked by day - p.i in both the recombinant strains as demonstrated in the representative graphs (fig. , a-c) . although we detected no nucleocapsid mrnas between - days p.i in rsmhv , its presence was observed at day p.i in rsa (data not shown). this data corroborates with previous findings that demyelinating rsa persists in the brain while rsmhv does not persist or, if present, nucleocapsid level is significantly low compared with rsa ( ). biological and functional network analysis were performed for mmp , mmp , and mmp genes using qiagen's ingenuity pathway analysis (ipa) software. ipa analysis identified that these mmps could influence several canonical pathways associated with an immune response such as leukocyte extravasation signaling, granulocyte and agranulocyte adhesion and diapedesis (fig. , a) . also, the top disease pathways involved both inflammatory response and immune cell trafficking (fig. , b) . furthermore, ipa revealed that mmps facilitate the transmigration of firmly adhered granulocytes (fig. , a) and agranulocytes (fig. , b) pathway genes brain samples from mice infected with rsa and sacrificed at day - and p.i, were harvested for total rna isolation followed by cdna synthesis. mock-infected samples were kept in parallel. 
we performed rt-qpcr using primers (table ) for genes in the oxidative and anti-oxidative pathways. transcript levels of the parkinson's disease (park ) gene were significantly upregulated following rsa infection and remained elevated p.i compared to mock-infected samples (fig. , a; p< . ). rela, a subunit of nf-κb, also showed elevated mrna levels during the acute infection, i.e., - days p.i (fig. , b; p< . ). in contrast, mrna levels of nfκb , a negative regulatory subunit of nf-κb, remained unchanged p.i (fig. , c) . consistent with another study from our lab (unpublished data), we detected significantly high mrna levels of nuclear factor erythroid -related factor (nrf ) and heme oxygenase- (hmox ) genes during the acute disease phase (fig. levels. in our current study, rsa infection increased transcript levels of park , which is involved in oxidative stress. park has a double-edged effect. at lower ros concentrations, it can affect nadph oxidase by phosphorylating its p phox subunit during nadph oxidase activation, which is crucial for nadph oxidase-dependent ros production ( ). in one of our studies (unpublished data), we show that park also induces the anti-oxidative pathway via nrf and hmox activation at higher cellular ros concentrations. activation of metalloproteases via oxidative pathways has been demonstrated in the past ( - ). ros-induced oxidative stress can also activate nf-κb signaling ( ). the nuclear factor kappa-light-chain-enhancer of activated b cells (nf-κb) acts as a transcription factor and is known to induce inflammation-related genes. rela, a subunit of nf-κb that is activated in the canonical pathway via toll-like receptors that recognize pathogenic patterns ( ), showed increased mrna levels following rsa infection. on the other hand, nfκb , which acts as both a precursor and suppressor of nf-κb ( ), showed unchanged mrna levels upon infection. previously, it has been documented that nf-κb can induce mmp genes ( , ). 
therefore, our results indicate that park mediated ros generation leads to the induction of mmp genes via nf-κb signaling during mhv-induced acute disease (fig. ) . we also found that rsa infection induced upregulation of nrf and hmox genes. the anti-oxidative pathway mediated by nuclear factor erythroid -related factor (nrf ) and its dependent heme oxygenase- (hmox ) ( ), could therefore play an essential role in restoring homeostasis through inhibition of ros overproduction. one limitation of this study that will be addressed in our future experiments is that the interplay between ros and mmps has not been validated using inhibitors of ros as positive controls. in previous studies ( - ) involving mhv-a , it has been shown that virus infection reduced expression of connexins (cxs) that form intercellular gap junctional channels and thereby disrupted functional communication between cns glial cells and fibroblasts. however, the mechanism through which cx trafficking is altered is not well understood. the ( ) infected mice at different days post-infection (p.i) were harvested, and viral replication was estimated by routine plaque assay. rna isolated from brain tissues was subjected to cdna synthesis. an equal amount of cdna template was used for rt-qpcr. gene expression was normalized to gapdh and fold-change values were obtained using the ∆∆ct method. a: mhv-a titer peaked between day - p.i. b: viral nucleocapsid mrna levels peaked at day - p.i, like viral replication. c-f: mmp , mmp , mmp , and mmp mrna levels were elevated between - days p.i. and coincided with the viral replication peak. g: membrane-associated mmp- mrna levels peaked only at later stages p.i. data shown are mean ± sem from two independent biological experiments with nine technical replicates. differences between two groups were tested with the student t-test. multiple group comparison was made with ordinary one-way anova followed by dunnett's test. a p-value of < . 
was considered statistically significant (**, p< . ; ***, p< . ; ****, p< . ). of rt-qpcr revealed a significant upregulation in timp- mrna levels at day - p.i in mhv-a compared with mock-infected samples (a). representative immunoblot assay from two independent experiments showed elevated protein levels of timp- at all examined days following mhv-a infection (b). in contrast, timp- , - , and - mrnas remained downregulated throughout p.i (c-e). graphs show mean ± sem values from two independent experiments with nine technical replicates. ordinary one-way anova followed by dunnet's test was performed for multiple group comparisons. statistical significance was considered for p values < . (*, p< . ; ***, p< . ; ****, p< . ). strains of the wild-type mhv-a and differs only in the spike gene. brain samples from mice infected with rsa ( pfu) or rsmhv ( pfu) were harvested at different days p.i for routine plaque assay and total rna extraction. a: both rsa and rsmhv showed peak viral replication between day - p.i. no detectable viral particles were observed at later time points in a routine plaque assay (data not shown). rt-qpcr data showed significantly elevated mrna levels of viral nucleocapsid gene in both rsa (b) and rsmhv (c) at early days p.i, coinciding with peak levels of viral particles as detected in a plaque assay. data shown are mean ± sem from three independent experiments having three technical replicates each. multiple group comparison was made with ordinary one-way anova followed by dunnet's test. a p-value of < . was considered statistically significant (***, p< . ; ****, p< . ). china novel coronavirus i, research t. . 
a novel coronavirus from patients with pneumonia in china coronavirus as a possible cause of severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia structure, function, and evolution of coronavirus spike proteins sars-cov- cell entry depends cathepsin l functionally cleaves the severe acute respiratory syndrome coronavirus class i fusion protein upstream of rather than adjacent to the fusion peptide endosomal proteolysis by cathepsins is necessary for murine coronavirus mouse hepatitis virus type spike-mediated entry cleavage and activation of the severe acute respiratory syndrome coronavirus spike protein by human airway trypsin-like protease protease-mediated enhancement of severe acute respiratory syndrome coronavirus infection elastase-mediated activation of the severe acute respiratory syndrome coronavirus spike protein at discrete sites within the s domain co-infection of respiratory bacterium with severe acute respiratory syndrome coronavirus induces an exacerbated pneumonia in mice experimental demyelination produced by the a strain of mouse hepatitis virus cleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell-cell but not virus-cell fusion zinc metalloproteases for virus entry and cell-cell fusion expression of matrix metalloproteinases and their tissue inhibitor during viral encephalitis matrix metalloproteinase expression correlates with virulence following neurotropic mouse hepatitis virus infection metalloproteinases: mediators of pathology and regeneration in the cns intracellular substrate cleavage: a novel dimension in the biochemistry, biology and pathology of matrix metalloproteinases chemokine and cytokine processing by matrix metalloproteinases and its effect on leukocyte migration and inflammation biochemical and biological attributes of matrix metalloproteinases demyelination determinants map to the spike glycoprotein gene of coronavirus mouse 
hepatitis virus mechanisms of primary axonal damage in a viral model of multiple sclerosis enhanced green fluorescent protein expression may be used to monitor murine coronavirus spread in vitro and in the mouse central nervous system experimental optic neuritis induced by a demyelinating strain of mouse hepatitis virus mouse hepatitis virus type- infection in mice: an experimental model system of acute meningitis and hepatitis different mechanisms of inflammation induced in virus and autoimmune-mediated models of multiple sclerosis in c bl mice a proline insertion-deletion in the spike glycoprotein fusion peptide of mouse hepatitis virus strongly alters neuropathology mouse hepatitis virus infection upregulates genes involved in innate immune responses demyelinating strain of mouse hepatitis virus infection bridging innate and adaptive immune response in the induction of demyelination demyelinating and nondemyelinating strains of mouse hepatitis virus differ in their neural cell tropism matrix metalloproteinase facilitates west nile virus entry into the brain dengue-virus-infected dendritic cells trigger vascular leakage through metalloproteinase overproduction sirt activating compounds reduce oxidative stress mediated neuronal loss in viral induced cns demyelinating disease park interacts with p (phox) to direct nadph oxidase-dependent ros production and protect against sepsis hypochlorous acid oxygenates the cysteine switch domain of pro-matrilysin (mmp- ). 
a mechanism for matrix metalloproteinase activation and atherosclerotic plaque rupture by myeloperoxidase activation of matrix metalloproteinase- and - by -and -hydroxyestradiol matrix metalloproteinase- in pneumococcal meningitis: activation via an oxidative pathway nf-kappab in oxidative stress transcriptional regulation via the nf-kappab signaling module hypoxia induces connexin dysregulation by modulating matrix metalloproteinases via mapk signaling inhibition of transcription factor nf-kappab reduces matrix metalloproteinase- , - and - production by vascular smooth muscle cells role of nrf /ho- system in development, oxidative stress response and diseases: an evolutionarily conserved mechanism mouse hepatitis virus infection remodels gap junction intercellular communication in vitro and in vivo microtubule-assisted altered trafficking of astrocytic gap junction protein connexin is associated with depletion of connexin during mouse hepatitis virus infection loss of cx -mediated functional gap junction communication in meningeal fibroblasts following mouse hepatitis virus infection we thank the ministry of education, india, and the department of mice infected with rsa ( pfu) or mock-infected was subjected to cdna synthesis, and subsequently, rt-qpcr was performed. a: data analysis revealed elevated mrna levels of park during acute ( - days p.i) and acute-chronic (day p.i) disease phase. b: mrna upregulation was detected for rela, a subunit of the nf-b transcription factor. c: in contrast, no change was observed in the mrna level of nfb , a negative regulator of nf-b. d & e: moreover, rsa infection also induced increased transcription of nrf and hmox genes. data shown are mean ± sem from two independent experiments. a significant difference between multiple groups was compared with ordinary one-way anova, followed by dunnet's test. a p-value of < . was considered statistically significant (*, p< . ; ***, p< . ; ****, p< . ). 
key: cord- - rpr aph authors: bhandari, bikash k.; gardner, paul p.; lim, chun shen title: solubility-weighted index: fast and accurate prediction of protein solubility date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rpr aph motivation recombinant protein production is a widely used technique in the biotechnology and biomedical industries, yet only a quarter of target proteins are soluble and can therefore be purified. results we have discovered that global structural flexibility, which can be modeled by normalised b-factors, accurately predicts the solubility of , recombinant proteins expressed in escherichia coli. we have optimised b-factors, and derived a new set of values for solubility scoring that further improves prediction accuracy. we call this new predictor the ‘solubility-weighted index’ (swi). importantly, swi outperforms many existing protein solubility prediction tools. furthermore, we have developed ‘sodope’ (soluble domain for protein expression), a web interface that allows users to choose a protein region of interest for predicting and maximising both protein expression and solubility. availability the sodope web server and source code are freely available at https://tisigner.com/sodope and https://github.com/gardner-binflab/tisigner-reactjs, respectively. the code and data for reproducing our analysis can be found at https://github.com/gardner-binflab/sodope_paper . high levels of protein expression and solubility are two major requirements of successful recombinant protein production (esposito and chatterjee ) . however, recombinant protein production is a challenging process. almost half of recombinant proteins fail to be expressed and half of the successfully expressed proteins are insoluble ( http://targetdb.rcsb.org/metrics/ ). these failures hamper protein research, with particular implications for structural, functional and pharmaceutical studies that require soluble and concentrated protein solutions (kramer et al. 
; hou et al. ) . therefore, solubility prediction and protein engineering for enhanced solubility is an active area of research. notable protein engineering approaches include mutagenesis, truncation (i.e., expression of partial protein sequences), or fusion with a solubility-enhancing tag (waldo ; esposito and chatterjee ; trevino, martin scholtz, and nick pace ; chan et al. ; kramer et al. ; costa et al. ) . protein solubility, at least in part, depends upon extrinsic factors such as ionic strength, temperature and ph, as well as intrinsic factors-the physicochemical properties of the protein sequence and structure, including molecular weight, amino acid composition, hydrophobicity, aromaticity, isoelectric point, structural propensities and the polarity of surface residues (wilkinson and harrison ; chiti et al. ; tartaglia et al. ; diaz et al. ) . many solubility prediction tools have been developed around these features using statistical models (e.g., linear and logistic regression) or other machine learning models (e.g., support vector machines and neural networks) (hirose and noguchi ; habibi et al. ; hebditch et al. ; sormanni et al. ; heckmann et al. ; z. wu et al. ; yang, wu, and arnold ) . in this study, we investigated the experimental outcomes of , recombinant proteins expressed in escherichia coli from the 'protein structure initiative:biology' (psi:biology) (chen et al. ; acton et al. ) . we showed that protein structural flexibility is more accurate than other protein sequence properties in predicting solubility (craveur et al. ; m. vihinen, torkkila, and riikonen ) . flexibility is a standard feature that appears to have been overlooked in previous solubility prediction attempts. on this basis, we derived a set of values for the standard amino acid residues and used them to predict solubility. we call this new predictor the 'solubility-weighted index' (swi). 
swi is a powerful predictor of solubility, and a good proxy for global structural flexibility. in addition, swi outperforms many existing de novo protein solubility prediction tools. we sought to understand what makes a protein soluble, and develop a fast and accurate approach for solubility prediction. to determine which protein sequence properties accurately predict protein solubility, we analysed , target proteins from over species that were expressed in e. coli (the psi:biology dataset; see supplementary fig s and table s a ) (chen et al. ; acton et al. ) . these proteins were expressed either with a c-terminal or n-terminal polyhistidine fusion tag (pet _nesg and pet _nesg expression vectors, n= , and , , respectively). they were previously curated and labeled as 'protein_soluble' or 'tested_not_soluble' (seiler et al. ) , based on the soluble analysis of cell lysate using sds-page (r. xiao et al. ) . a total of , recombinant proteins were found to be soluble, in which , of them belong to the pet _nesg dataset. both the expression system and solubility analysis method are commonly used (costa et al. ) . therefore, this collection of data captures a broad range of protein solubility issues. protein structural flexibility, in particular, the flexibility of local regions, is often associated with function (craveur et al. ) . the calculation of flexibility is usually performed by assigning a set of normalised b-factors-a measure of vibration of c-alpha atoms (see supplementary notes)-to a protein sequence and averaging the values by a sliding window approach (ragone et al. ; karplus and schulz ; m. vihinen, torkkila, and riikonen ; smith et al. ) . we reasoned that such sliding window approach can be approximated by a more straightforward arithmetic mean for calculating global structural flexibility (see supplementary notes). we determined the correlation between flexibility (vihinen et al. 
's sliding window approach as implemented in biopython) and solubility scores calculated as follows:

$$\text{score} = \frac{1}{L}\sum_{i=1}^{L} B_i$$

where $B_i$ is the normalised b-factor of the amino acid residue at position $i$, and $L$ is the sequence length. we obtained a strong correlation for the psi:biology dataset (spearman's rho = . , p-value below machine's underflow level). therefore, we reasoned that the sliding window approach is not necessary for our purpose. we applied this arithmetic mean approach (i.e., sequence composition scoring) to the psi:biology dataset and compared four sets of previously published, normalised b-factors (bhaskaran and ponnuswamy ; ragone et al. ; m. vihinen, torkkila, and riikonen ; smith et al. ). among these sets of b-factors, sequence composition scoring using the most recently published set of normalised b-factors produced the highest auc score ( to improve the prediction accuracy of solubility, we iteratively refined the weights of amino acid residues using the nelder-mead optimisation algorithm (nelder and mead ) . to avoid testing and training on similar sequences, we generated cross-validation sets with a maximised heterogeneity between these subsets (i.e. no similar sequences between subsets). we first clustered all , psi:biology protein sequences at a % similarity threshold using usearch to produce , clusters with remote similarity (see methods and supplementary fig s ) . the clusters were manually grouped into cross-validation sets of approximately , sequences each. we did not select a representative sequence for each cluster as about % of clusters contain a mix of soluble and insoluble proteins (supplementary fig s c) . more importantly, to address the issues of sequence similarity and imbalanced classes, we performed , bootstrap resamplings for each cross-validation step (fig a and supplementary fig s ) . we calculated the solubility scores using the optimised weights as in equation and the auc scores for each cross-validation step. 
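the sequence composition scoring described above (an arithmetic mean of per-residue values over the whole sequence) can be sketched as follows. this is a minimal illustration under stated assumptions: the function name is hypothetical, and the weights are toy values chosen for the example, not the published normalised b-factors or the optimised weights.

```python
def composition_score(sequence, weights):
    """Arithmetic mean of per-residue weights over a protein sequence.

    The score depends only on amino acid composition, so it is independent
    of residue order and is normalised by sequence length.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    missing = set(sequence) - set(weights)
    if missing:
        raise ValueError(f"no weight for residues: {sorted(missing)}")
    return sum(weights[residue] for residue in sequence) / len(sequence)


# Hypothetical toy weights for five residues, for illustration only.
TOY_WEIGHTS = {"A": 0.5, "C": 0.2, "D": 0.9, "K": 0.8, "W": 0.3}

toy_score = composition_score("ACDKW", TOY_WEIGHTS)
```

because the score is a plain mean, no sliding window is needed and the cost is linear in sequence length; swapping in a different weight table changes the score without changing the code.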
our training and test auc scores were . ± . and . ± . , respectively, showing an improvement over flexibility in solubility prediction (mean ± standard deviation; fig b and supplementary table s ). the final weights were derived from the arithmetic means of the weights for individual amino acid residues obtained from cross-validation (supplementary table s ) . we observed over a % change in the weights for cysteine (c) and histidine (h) residues (fig c and supplementary table s ). these results are in agreement with the contributions of cysteine and histidine residues as shown in supplementary fig s b. we call the solubility score of a protein sequence calculated using the final weights the solubility-weighted index (swi). (a) flow chart shows an iterative refinement of the most recently published set of normalised b-factors for solubility prediction (smith et al. ) . the solubility score of a protein sequence was calculated using a sequence composition scoring approach (equation , using optimised weights, $W$, instead of normalised b-factors, $B$). these scores were used to compute the auc scores for training and test datasets. (b) training and test performance of solubility prediction using optimised weights for amino acid residues in a -fold cross-validation (mean auc ± standard deviation). related data and figures are available as supplementary table s and supplementary fig s and s . (c) comparison between the initial and final weights for amino acid residues. the final weights are derived from the arithmetic mean of the optimised weights from cross-validation. these weights are used to calculate swi, the solubility score of a protein sequence, in the subsequent analyses. filled circles, which represent amino acid residues, are colored by hydrophobicity (kyte and doolittle ) . solid black circles denote the aromatic amino acid residues phenylalanine (f), tyrosine (y) and tryptophan (w). the dotted diagonal line represents no change in weight. see also supplementary table s and fig s . 
auc, area under the roc curve; roc, receiver operating characteristic; $W$, arithmetic mean of the weights of an amino acid residue optimised from , bootstrap samples in a cross-validation step. to validate the cross-validation results, we used a dataset independent of the psi:biology data known as esol (niwa et al. ) . this dataset consists of the solubility percentages of e. coli proteins determined using an e. coli cell-free system (n = , ) . our solubility scoring using the final weights showed a significantly improved correlation with e. coli protein solubility over the initial weights (smith et al. 's normalised b-factors) [spearman's rho of . (p = . ✕ - ) versus . (p = . ✕ - )]. we repeated the correlation analysis after removing extra amino acid residues, including his-tags, from the esol sequences (mrgshhhhhhtdpalra and glcgr at the n- and c-termini, respectively). this artificial dataset was created based on the assumption that his-tags have little effect on solubility. we observed a slight decrease in correlation for this artificial dataset (spearman's rho = . , p= . ✕ - ), which may be due to the effects of the his-tag on solubility and/or the limitation(s) of our approach, which may overfit to his-tag fusion proteins. we performed spearman's correlation analysis for both the psi:biology and esol datasets. swi shows the strongest correlation with solubility compared to the standard and , protein sequence properties (fig and supplementary fig s , respectively) . swi also strongly correlates with flexibility, suggesting that swi is also a good proxy for global structural flexibility. we asked whether protein solubility can be predicted by surface amino acid residues. to address this question, we examined a previously published dataset for the protein surface 'stickiness' of e. coli proteins (levy, de, and teichmann ) . this dataset has the annotation for surface residues based on previously solved protein crystal structures. 
we observed little correlation between the protein surface 'stickiness' and the solubility data from esol (spearman's rho = . , p = . , n = ; supplementary fig s a) . next, we reasoned that if amino acid composition scoring using surface residues is sufficient, then optimising only the weights of surface residues should achieve similar or better results than swi. as above, we iteratively refined the weights of surface residues using the nelder-mead optimisation algorithm. the method was initialised with smith et al. 's normalised b-factors, and the target was to maximise the correlation coefficient. however, a low correlation was obtained upon convergence (spearman's rho = . , p = . ✕ - ; supplementary fig s b) . in contrast, the swi of the full-length sequences has a much stronger correlation with solubility (spearman's rho = . , p = . ✕ - ; supplementary fig s c) . these results suggest that the full length of a sequence, not just its surface residues, contributes to protein solubility, consistent with solubility being modulated by cotranslational folding (natan et al. ) . to understand the properties of soluble and insoluble proteins, we determined the enrichment of amino acid residues in the psi:biology targets relative to the esol sequences (see methods). we observed that the psi:biology targets are enriched in the charged residues lysine (k), glutamate (e) and aspartate (d), and depleted in the aromatic residue tryptophan (w), albeit to a lesser extent for insoluble proteins (supplementary fig s a) . as expected, cysteine residues (c) are enriched in the psi:biology insoluble proteins, supporting previous findings that cysteine residues contribute to poor solubility in the e. coli expression system (diaz et al. ; wilkinson and harrison ) . in addition, we compared the swi of random sequences with the psi:biology and esol sequences. we included an analysis of random sequences to confirm whether swi can distinguish between biological and random sequences. 
we found that the swi scores of soluble proteins are higher than those of insoluble proteins (supplementary fig s b) , and that true biological sequences also tend to have higher swi scores than random sequences, highlighting a potential evolutionary selection for solubility. to confirm the usefulness of swi in solubility prediction, we compared it with the existing tools protein-sol (hebditch et al. ) , camsol v . (sormanni, aprile, and vendruscolo ; sormanni et al. ) , parsnip ) , deepsol v . (khurana et al. ) , the wilkinson-harrison model (davis et al. ; harrison ; wilkinson and harrison ) , and ccsol omics (agostini et al. ) . we did not include the specialised tools that model protein structural information such as surface geometry, surface charges and solvent accessibility because these tools require prior knowledge of protein tertiary structure. for example, aggrescan d and solart accept only pdb files that can be downloaded from the protein data bank or produced using a homology modeling program (kuriata et al. ; hou et al. ) . swi outperforms other tools except for protein-sol in predicting e. coli protein solubility (table , fig a) . our swi c program is also the fastest solubility prediction algorithm (table , fig b and supplementary table s ). prediction accuracy of solubility prediction tools using the above cross-validation sets (fig a) . for swi, the test auc scores were calculated from a -fold cross-validation (i.e., a boxplot representation of fig b) protein structural flexibility has been associated with conformal variations, functions, thermal stability, ligand binding and disordered regions (mauno vihinen ; teague ; ma ; radivojac ; schlessinger and rost ; yuan, bailey, and teasdale ; yin, li, and li ) . however, the use of flexibility in solubility prediction has been overlooked although their relationship has previously been noted (tsumoto et al. ) . in this study, we have shown that flexibility strongly correlates with solubility (fig ) . 
based on the normalised b-factors used to compute flexibility, we have derived a new set of position- and length-independent weights to score the solubility of a given protein sequence (i.e., a sequence composition based score). we call this protein solubility score swi. upon further inspection, we observe some interesting properties in swi. swi anti-correlates with helix propensity, gravy, aromaticity and isoelectric point (fig c and ) , suggesting that swi incorporates the key propensities affecting solubility. amino acid residues that are less aromatic or more hydrophilic are known to improve protein solubility (trevino, martin scholtz, and nick pace ; niwa et al. ; kramer et al. ; warwicker, charonis, and curtis ; han et al. ; wilkinson and harrison ) . consistent with previous studies, the charged residues aspartate (d), glutamate (e) and lysine (k) are associated with high solubility, whereas the aromatic residues phenylalanine (f), tryptophan (w) and tyrosine (y) are associated with low solubility (fig c and supplementary fig s a) . the cysteine residue (c) has the lowest weight, probably because disulfide bonds cannot be properly formed in the e. coli expression hosts (stewart, aslund, and beckwith ; rosano and ceccarelli ; jia and jeon ; aslund and beckwith ) . the weights would likely differ if the solubility analysis were done using the reductase-deficient e. coli origami host strains, or eukaryotic hosts. higher helix propensity has been reported to increase solubility (idicula-thomas and balaji ; huang et al. ) . however, our analysis has shown that helical and turn propensities anti-correlate with solubility, whereas sheet propensity lacks correlation with solubility, suggesting that disordered regions may tend to be more soluble (fig ) . in accordance with these observations, swi has stronger negative correlations with helix and turn propensities. 
these findings also suggest that protein solubility can be largely explained by overall amino acid composition, not just the surface amino acid residues. this idea aligns with our understanding that protein solubility and folding are closely linked, and that folding occurs cotranslationally, a complex process that is driven by various intrinsic and extrinsic factors (wilkinson and harrison ; chiti et al. ; tartaglia et al. ; diaz et al. ) . however, it is unclear why sheet propensity contributes little to solubility, given that β-sheets have been shown to be closely linked with protein aggregation (idicula-thomas and balaji ) . we conclude that swi is a well-balanced index that is derived from a simple sequence composition scoring method. to demonstrate the usefulness of swi, we developed a web server called sodope (soluble domain for protein expression; https://tisigner.com/sodope ). sodope calculates the swi-based probability of solubility of a user-selected region, which can be either a full-length or a partial sequence (see methods and supplementary table s ). this implementation is based on our observation that some protein domains tend to be more soluble than others. to demonstrate this point, we have analysed three commercial monoclonal antibodies and the severe acute respiratory syndrome coronavirus proteomes (sars-cov and sars-cov- ) (wang et al. ; marra et al. ; f. wu et al. ) ( supplementary fig s and s ). these soluble domains may enhance protein solubility as a whole. sodope also provides options for solubility prediction in the presence of solubility fusion tags. similarly, solubility tags may act as soluble 'protein domains' that can outweigh the aggregation propensity of insoluble proteins. however, some soluble fusion proteins may become insoluble after proteolytic cleavage of solubility tags (lebendiker and danieli ) . in addition, sodope is integrated with tisigner, a gene optimisation web service for protein expression.
this pipeline provides a holistic approach to improve the outcome of recombinant protein expression. the standard protein sequence properties were calculated using the bio.sequtils.protparam module of biopython v . (cock et al. ) . all miscellaneous protein sequence properties were computed using the r package protr v . - (n. xiao et al. ) . we used the standard and miscellaneous protein sequence properties to predict the solubility of the psi:biology and esol targets (n= , and , , respectively) (seiler et al. ; niwa et al. ) . for method comparison, we chose the protein solubility prediction tools that are scalable (table ) . default configurations were used for running the command line tools. to benchmark the wall time of solubility prediction tools, we selected sequences that span a large range of lengths from the psi:biology and esol datasets (from to residues). all the tools were run and timed using a single process without gpus on a high performance computer [ /usr/bin/time -f '%e' ; centos linux (core) operating system, cores in × broadwell nodes (e - v , . ghz, dual socket cores per socket), gib memory]. single-sequence fasta files were used as input files. to improve protein solubility prediction, we optimised the most recently published set of normalised b-factors using the psi:biology dataset (smith et al. ) (fig ) . to avoid including homologous sequences in both the test and training sets, we clustered the psi:biology targets using usearch v . . , -bit (edgar ) . his-tag sequences were removed from all sequences before clustering to avoid false cluster inclusions. we obtained , clusters using the parameters: -cluster_fast -id . -msaout -threads . these clusters were manually divided into subsets with approximately , sequences per subset. the subsequent steps were done with his-tag sequences. we used smith et al.'s normalised b-factors as the initial weights to maximise auc using these subsets with a -fold cross-validation.
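The cluster-aware cross-validation and the auc objective described above can be sketched as follows. This is a minimal illustration, not the authors' code: the greedy fold assignment is a stand-in for the manual split reported in the text, and the function names `auc` and `cluster_folds` are hypothetical.

```python
from collections import defaultdict


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: per-protein solubility scores.
    labels: 1 for soluble, 0 for insoluble.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one soluble and one insoluble protein")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # soluble protein ranked above insoluble one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def cluster_folds(cluster_of, k):
    """Assign whole clusters to k folds so homologues never straddle folds.

    cluster_of maps sequence id -> cluster id. Clusters are placed largest
    first into whichever fold is currently smallest (an assumed heuristic).
    """
    members = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        members[cluster_id].append(seq_id)
    folds = [[] for _ in range(k)]
    for cluster_id in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[cluster_id])
    return folds
```

Because auc depends only on the ranking of scores, it is piecewise constant in the weights, which is why a derivative-free optimiser is a natural fit for this objective.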
since auc is non-differentiable, we used the nelder-mead optimisation method (implemented in scipy v . . ), which is a derivative-free, heuristic, simplex-based optimisation (oliphant ; millman and aivazis ; nelder and mead ) . for each step in cross-validation, we used , bootstrap resamplings containing , soluble and , insoluble proteins. optimisation was carried out for each sample, giving , sets of weights. the arithmetic mean of these weights was used to determine the training and test auc for the cross-validation step (fig a) . to examine the enrichment of amino acid residues in soluble and insoluble proteins, we computed the bit scores for each amino acid residue in the psi:biology soluble and insoluble groups ( supplementary fig s a) . we normalised the count of each residue in each group $(x)$ by the total number of residues in that group. we used the normalised count of amino acid residues in the esol e. coli sequences as the background. the bit score of residue $i$ for the soluble or insoluble group is then given by the following equation: $\text{bit score}_i^{(x)} = \log_2 \left( f_i^{(x)} / f_i^{(\mathrm{esol})} \right)$, where $f_i^{(x)}$ is the normalised count of residue $i$ in the psi:biology soluble or insoluble group $(x)$ and $f_i^{(\mathrm{esol})}$ is the normalised count in the esol sequences. for a control, random protein sequences were generated by incrementing the length of sequence, starting from a length of residues to , residues with a step size of residues. a hundred random sequences were generated for each length, giving a total of , unique random sequences. to estimate the probability of solubility using swi, we fitted the following logistic regression to the psi:biology dataset: $\text{probability of solubility} = 1 / \left( 1 + e^{-(ax + b)} \right)$, where $x$ is the swi of a given protein sequence, and $a$ and $b$ are the fitted coefficients. the p-value of the log-likelihood ratio test was less than machine precision. this equation can be used to predict the solubility of a protein sequence given that the protein is successfully expressed in e. coli ( supplementary table s ).
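The residue enrichment calculation can be illustrated with a short sketch. It assumes the bit score is the base-2 log ratio of the group's normalised residue count to the background's normalised count, as the definitions above suggest; the function names are hypothetical and the toy sequences in the usage note are not real data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def normalised_counts(sequences):
    """Fraction of each standard residue among all standard residues in a group."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def bit_scores(group, background):
    """log2 enrichment of each residue in `group` relative to `background`.

    Residues absent from either set are skipped, because the log ratio
    is undefined when a normalised count is zero.
    """
    f_group = normalised_counts(group)
    f_background = normalised_counts(background)
    return {
        aa: math.log2(f_group[aa] / f_background[aa])
        for aa in AMINO_ACIDS
        if f_group[aa] > 0 and f_background[aa] > 0
    }
```

For example, `bit_scores(["DDDK"], ["DK"])` gives a positive score for D (enriched relative to background) and a negative score for K (depleted).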
on this basis, we developed a solubility prediction webservice called the soluble domain for protein expression (sodope). our web server accepts either a nucleotide or amino acid sequence. upon sequence submission, a query is sent to the hmmer web server to annotate protein domains ( https://www.ebi.ac.uk/tools/hmmer/ ) (potter et al. ) . once the protein domains are identified, users can choose a domain or any custom region (including full-length sequence) to examine the probability of solubility, flexibility and gravy. this functionality enables protein biochemists to plan their experiments and opt for the domains or regions with high probability of solubility. furthermore, we implemented a simulated annealing algorithm that maximised the probability of solubility for a given region by generating a list of regions with extended boundaries. users can also predict the improvement in solubility by selecting a commonly used solubility tag or a custom tag. we linked sodope with tisigner, which is our existing web server for maximising the accessibility of translation initiation sites (bhandari, lim, and gardner ) . this pipeline allows users to predict and optimise both protein expression and solubility for a gene of interest. the sodope web server is freely available at https://tisigner.com/sodope . jupyter notebook of our analysis can be found at https://github.com/gardner-binflab/sodope_paper_ . the source code for our solubility prediction server (sodope) can be found at https://github.com/gardner-binflab/tisigner-reactjs . 
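The region scoring that sodope performs can be sketched as below, assuming swi is the arithmetic mean of per-residue weights and that the probability follows the logistic model from the methods. The weights and the coefficients `a` and `b` used in the usage note are placeholders chosen for illustration only, not the published values, and `score_region` is a hypothetical helper rather than part of the sodope code base.

```python
import math


def swi(sequence, weights):
    """Mean per-residue weight; residues without a weight are ignored."""
    values = [weights[aa] for aa in sequence.upper() if aa in weights]
    if not values:
        raise ValueError("no scorable residues in sequence")
    return sum(values) / len(values)


def probability_of_solubility(x, a, b):
    """Logistic model: 1 / (1 + exp(-(a * x + b)))."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))


def score_region(sequence, start, end, weights, a, b, tag=""):
    """Score residues [start, end) (0-based), optionally prefixed by a fusion tag."""
    region = tag + sequence[start:end]
    return probability_of_solubility(swi(region, weights), a, b)


# Placeholder values for illustration only (NOT the published weights or fit).
TOY_WEIGHTS = {"D": 0.9, "E": 0.9, "K": 0.9, "F": 0.6, "W": 0.6, "C": 0.5}
TOY_A, TOY_B = 10.0, -7.5
```

With these toy values, `score_region("FFFFDDDD", 4, 8, TOY_WEIGHTS, TOY_A, TOY_B)` scores the aspartate-rich half higher than the phenylalanine-rich half, and prefixing a highly weighted tag raises the score of an otherwise low-scoring region, mirroring the tag option described above.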
robotic cloning and protein production platform of the northeast structural genomics consortium ccsol omics: a webserver for solubility prediction of endogenous and heterologous expression in escherichia coli the thioredoxin superfamily: redundancy, specificity, and gray-area genomics highly accessible translation initiation sites are predictive of successful heterologous protein expression positional flexibilities of amino acid residues in globular proteins learning to predict expression efficacy of vectors in recombinant protein production targetdb: a target registration database for structural genomics projects rationalization of the effects of mutations on peptide and protein aggregation rates biopython: freely available python tools for computational molecular biology and bioinformatics fusion tags for protein solubility, purification and immunogenicity in escherichia coli: the novel fh system protein flexibility in the light of structural alphabets new fusion protein systems designed to give soluble expression in escherichia coli prediction of protein solubility in escherichia coli using logistic regression search and clustering orders of magnitude faster than blast enhancement of soluble protein expression through the use of fusion tags prediction of peptide and protein propensity for amyloid formation a review of machine learning methods to predict the solubility of overexpressed recombinant proteins in escherichia coli improve protein solubility and activity based on machine learning models expression of soluble heterologous proteins via fusion with nusa protein protein-sol: a web tool for predicting protein solubility from sequence machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models espresso: a system for estimating protein expression and solubility in protein expression systems computational analysis of the amino acid interactions that promote or decrease protein solubility solart: a 
structure-based method to predict protein solubility and aggregation prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition understanding the relationship between the primary structure of proteins and its propensity to be soluble on overexpression in escherichia coli high-throughput recombinant protein expression in escherichia coli: current status and future perspectives prediction of chain flexibility in proteins deepsol: a deep learning framework for sequence-based protein solubility prediction toward a molecular understanding of protein solubility: increased negative surface charge correlates with increased solubility aggrescan d (a d) . : prediction and engineering of protein solubility a simple method for displaying the hydropathic character of a protein production of prone-to-aggregate proteins cellular crowding imposes global constraints on the chemistry and evolution of proteomes usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes the genome sequence of the sars-associated coronavirus data structures for statistical computing in python python for scientists and engineers cotranslational protein assembly imposes evolutionary constraints on homomeric proteins a simplex method for function minimization bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of escherichia coli proteins python for scientific computing scikit-learn: machine learning in python protein flexibility and intrinsic disorder protein engineering, design and selection parsnip: sequence-based protein solubility prediction using gradient boosting machine recombinant protein expression in escherichia coli: advances and challenges protein flexibility and rigidity predicted from sequence statsmodels: econometric and statistical modeling with python dnasu plasmid and psi:biology-materials repositories: resources to accelerate biological research improved 
amino acid flexibility parameters rapid and accurate in silico solubility screening of a monoclonal antibody library the camsol method of rational design of protein mutants with enhanced solubility disulfide bond formation in the escherichia coli cytoplasm: an in vivo role reversal for the thioredoxins the role of aromaticity, exposed surface, and dipole moment in determining protein aggregation rates implications of protein flexibility for drug discovery amino acid contribution to protein solubility: asp, glu, and ser contribute more favorably than the other hydrophilic amino acids in rnase sa practical considerations in refolding proteins from inclusion bodies relationship of protein flexibility to thermostability accuracy of protein flexibility predictions genetic screens and directed evolution for protein solubility the numpy array: a structure for efficient numerical computation potential aggregation prone regions in biotherapeutics: a survey of commercial monoclonal antibodies lysine and arginine content of proteins: computational analysis suggests a new tool for solubility design predicting the solubility of recombinant proteins in escherichia coli complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in wuhan, china proceedings of the national academy of sciences of the united states of america protr/protrweb: r package and web server for generating various numerical representation schemes of protein sequences the high-throughput protein sample production platform of the northeast structural genomics consortium machine-learning-guided directed evolution for protein engineering on the relation between residue flexibility and residue interactions in proteins prediction of protein b-factor profiles we evaluated nine standard and , miscellaneous protein sequence properties using the biopython's protparam module and 'protr' r package, respectively (cock et al. ; n. xiao et al. ) . 
for example, the standard properties include the grand average of hydropathy (gravy), secondary structure propensities, protein structural flexibility etc., whereas miscellaneous properties include amino acid composition, autocorrelation, etc. we thank new zealand escience infrastructure for providing a high performance computing platform. we are grateful to harry biggs for proofreading our manuscript and providing feedback for the web server. this work was supported by the ministry of business, innovation and employment, new zealand (mbie grant: uoox ). key: cord- -oinumv z authors: kalantar, katrina l.; carvalho, tiago; de bourcy, charles f.a.; dimitrov, boris; dingle, greg; egger, rebecca; han, julie; holmes, olivia b.; juan, yun-fang; king, ryan; kislyuk, andrey; mariano, maria; reynoso, lucia v.; cruz, david rissato; sheu, jonathan; tang, jennifer; wang, james; zhang, mark a.; zhong, emily; ahyong, vida; lay, sreyngim; chea, sophana; bohl, jennifer a.; manning, jessica e.; tato, cristina m.; derisi, joseph l. title: idseq – an open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: oinumv z background metagenomic next generation sequencing (mngs) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mngs data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. existing mngs data analysis tools typically require bioinformatics expertise and access to local server-class hardware resources. for many research laboratories, this presents an obstacle, especially in resource limited environments. findings we present idseq, an open source cloud-based metagenomics pipeline and service for global pathogen detection and monitoring (https://idseq.net). 
the idseq portal accepts raw mngs data, performs host and quality filtration steps, then executes an assembly-based alignment pipeline which results in the assignment of reads and contigs to taxonomic categories. the taxonomic relative abundances are reported and visualized in an easy-to-use web application to facilitate data interpretation and hypothesis generation. furthermore, idseq supports environmental background model generation and automatic internal spike-in control recognition, providing statistics which are critical for data interpretation. idseq was designed with the specific intent of detecting novel pathogens. here, we benchmark novel virus detection capability using both synthetically evolved viral sequences, and real-world samples, including idseq analysis of a nasopharyngeal swab sample acquired and processed locally in cambodia from a tourist from wuhan, china, infected with the recently emergent sars-cov- . conclusion the idseq portal reduces the barrier to entry for mngs data analysis and enables bench scientists, clinicians, and bioinformaticians to gain insight from mngs datasets for both known and novel pathogens. infectious diseases remain a leading cause of morbidity and mortality worldwide. despite significant advancement in our understanding of infectious disease biology, existing microbiological tests often fail to identify etiologic pathogens in cases of suspected infection. this can be due to a number of causes -failure to isolate an appropriate sample type, preemptive antibiotic exposure precluding growth in culture, lack of suspicion of a particular infection precluding the ordering of an appropriate test, or lack of available specific diagnostic tests due, in part, to limited knowledge of circulating pathogens. this is compounded further by the fact that novel, previously uncharacterized pathogens may also be present. this fact was illustrated vividly by the recent emergence of covid- in wuhan, china, in early december . 
metagenomic next-generation sequencing (mngs) of nucleic acid from biological samples offers the potential for a universal pathogen detection method, including the detection of novel species. mngs has great potential as a broad spectrum surveillance or patient monitoring tool, especially in low and middle income countries where the infectious disease burden remains high [ ] . while the expense of sequencing continues to drop, the challenge of mngs data analysis, the lack of bioinformatics expertise, and limited access to sufficient compute and storage remain major obstacles. mngs experiments result in millions of sequencing reads generated from the nucleic acid present within a biological sample, which may include complex microbial populations. a primary goal of mngs data analysis is to determine what nucleic acid derives from the host (for example, a patient), and what cannot be attributed to the host or environmental contaminants. further analysis of the non-host sequence may then attempt to determine the relative abundances of different taxa present in a particular sample, as this may provide insight into the presence and relevance of potentially pathogenic microbes. this is typically done via alignment of sequencing reads to a reference database. in the context of infectious diseases, identification of pathogens via this approach obviates the need for pathogen-specific reagents or the ability to culture the microbe. this is especially important for microbes that are difficult or impossible to culture, including many viruses, fungal species, eukaryotic parasites, and bacteria [ ] . additional downstream analysis may then be employed to understand trends in the abundances and relatedness of organisms across samples. there are several tools available for estimating relative abundance of microbial populations from mngs data [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] .
however, running these tools requires bioinformatics expertise and fluency with command-line tools. additionally, pathogen detection in the context of a host organism presents unique informatics challenges beyond microbial abundance estimation. as noted, a substantial fraction of the sample may consist of host sequences that are secondary to the goal of pathogen detection [ ] . existing tools do not perform sensitive removal of host sequences or quality control (qc) steps, thus requiring the use of separate qc and alignment tools, and therefore additional computational experience in pipelining. a number of tools exist to incorporate multiple pipeline steps alongside reporting capabilities, including onecodex [ ] , sunbeam [ ] , and surpi [ ] . however, these tools require paid subscription or significant computational resources to build the underlying databases and run the analyses. consequently, existing tools are not sufficient to support new applications of mngs in poorly host-agnostic and allows researchers to select from several available hosts including human, mouse, pig, ticks, and mosquito, among others. for example, human host samples are aligned to the hg reference database (gca_ . ), while mosquito samples are aligned to a combined collection of reference genomes from culex and aedes species as well as other diptera. reads that align to the selected host genome are removed from the analysis. for hosts with well-annotated genomes, individual gene counts may be saved for offline transcriptome analysis, provided appropriate consent in the case of human subject research. such host-based analyses have been shown to complement metagenomic analysis for pathogen detection [ ] . for all host organisms, sequences for optional spike-in rna controls developed by the external rna controls consortium (erccs) are automatically recognized for downstream steps. next, idseq performs a series of quality control (qc) steps, as outlined in figure . 
first, trimmomatic [ ] trims illumina adapters. low-quality reads, duplicates, and low-complexity reads are then removed using the paired-read iterative contig extension (price) computational package [ ] , the cd-hit-dup tool (v . . ) [ ] , and a filter based on the lempel-ziv-welch (lzw) compression score, respectively. regardless of the host genome, the data is scoured to remove all remaining human sequences using bowtie against the hg reference database [ ] and gmap-gsnap against a more stringent database combining sequences from both hg and chimpanzees (pan troglodytes) [ ] . this step is especially important in the case of vector research, where blood meals may contain human sequences. at each step, the total number of reads remaining in the analysis is computed and these basic qc metrics (including total non-host reads, % passing qc, and duplicate compression ratio) are provided both in the user interface and via download. while the host filtering and qc steps performed by the idseq pipeline serve primarily to reduce the computational burden and noise in downstream alignment steps, these metrics can also provide a resource for evaluating and troubleshooting sample preparation steps. the proportion of reads lost at each step may provide insight into possible sample degradation, fragment size, sequencing quality, or library complexity. idseq's automatic estimation of ercc abundances enables back-calculation of the total input nucleic acid content and estimation of the lower limit of detection, and increases the ability to distinguish contaminants [ ] . ercc spike-ins are increasingly recognized as a best practice for addressing the challenges associated with distinguishing background contamination from true microbial populations (methods) [ ] . to assign taxonomic identities to each read, an assembly-based alignment procedure is used.
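To illustrate the idea behind an lzw-based low-complexity filter, the sketch below counts the codes a textbook lzw encoder emits and divides by the read length. The exact score definition and cutoff used by idseq are not given in the text, so the codes-per-base ratio and the `min_score` parameter here are assumptions for illustration.

```python
def lzw_code_count(sequence):
    """Number of codes emitted by a textbook LZW encoder for `sequence`."""
    # Start with one dictionary entry per distinct symbol in the read.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(sequence)))}
    current = ""
    codes = 0
    for ch in sequence:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate          # keep extending the known phrase
        else:
            codes += 1                   # emit the code for `current`
            dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes += 1                       # flush the final phrase
    return codes


def lzw_score(sequence):
    """Codes emitted per base: lower values indicate more repetitive reads."""
    if not sequence:
        raise ValueError("empty read")
    return lzw_code_count(sequence) / len(sequence)


def passes_complexity_filter(sequence, min_score):
    """Keep a read only if its LZW score is at least `min_score` (assumed cutoff)."""
    return lzw_score(sequence) >= min_score
```

Under this definition a homopolymer such as eight consecutive A bases compresses to 4 codes (score 0.5), while `ACGTACGT` needs 6 codes (score 0.75). The score also depends on read length, so a practical filter would need a length-aware cutoff.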
first, filtered short-read sequences are aligned to the ncbi nucleotide (nt) and nonredundant protein (nr) databases (ftp://ftp.ncbi.nlm.nih.gov/blast/db/fasta/) using gsnapl [ ] and rapsearch [ ] , respectively ( figure a) . gsnapl is a specialized instance of the gmap-gsnap package written by tom wu, intended for very large genome databases. the ncbi database indices are updated biannually, or as needed, via direct pull from ncbi. the index version is tracked for each pipeline run providing for versioned results. putative accessions are assigned to each read using the ncbi accession taxid database [ ] and a blast+ (v . . ) [ ] database is constructed on-the-fly from the set of putative accessions (one database for each, nt and nr). in parallel, short reads are de novo assembled into contigs using spades [ ] . raw reads are mapped back to the resulting contigs using bowtie , in order to identify the contig to which each raw read belongs. finally, each contig is aligned to the set of possible accessions represented by the blast database generated in the previous step, thereby improving the specificity of alignments to all the underlying reads, especially for homologous regions where short reads may align equally well to multiple different accessions. idseq is optimized for scalable amazon web services (aws) cloud deployment ( figure b) . bioinformatics data processing jobs are orchestrated by the idseq pipeline directed acyclic graph (dag, https://github.com/chanzuckerberg/idseq-dag) and carried out on demand as docker containers using aws batch. alignments to the national center for biotechnology information (ncbi) database are executed on dedicated auto scaling groups (asg) of amazon elastic compute cloud (ec ) instances, with the number of server instances varied with job load. fast downloads of the ncbi database from the amazon simple storage service to each new server instance are enabled by the open-source tool s mi (https://github.com/chanzuckerberg/s mi). 
where alignments exist, taxonomic identifiers (taxids) for each of nt and nr are assigned to each read. if there exist alignments with equivalent scores to multiple species taxids, then a single taxid is selected at random. if a read was incorporated into a contig, it is assigned the taxid belonging to the ncbi accession to which its parent contig was assigned, as described above. if the read does not assemble into a contig, it is assigned the taxid of the ncbi nt and nr accessions it mapped to in the initial short-read alignment phase. the results are then aggregated to produce nt and nr counts for each taxid at both the species and genus level. reads matching genbank records in the superphylum deuterostomia are removed, given the high likelihood that such residual reads are of host origin. the idseq portal provides a number of different methods for interpretation of the pipeline results ( figure ) . first, relevant qc metrics and pipeline run information, including the number of reads remaining at each step of the host and quality filtering steps as well as estimates of internal control abundances, are provided for each sample (figure ab, methods) . the single-sample report tables provide key metrics for each taxon identified in the sample, including the total number of reads aligning to the taxon (in both nt and nr) as well as contig stats from the assembly-based alignment step ( figure c) . the tree view enables rapid assessment of taxonomic relatedness of microbes identified in the sample ( figure d ). for all views of the data, a wide range of user-selectable compound query and filtering tools are made available, enabling facile investigation of the data. for each taxonomic category, idseq also provides one-click downloads of the corresponding underlying reads and contigs. furthermore, coverage plots for contigs relative to all corresponding accessions to which they map are automatically generated ( figure f ).
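The read-level assignment rules above (contig assignment takes precedence, ties are broken at random, then counts are aggregated per taxid) can be sketched as follows. The data structures and function names are hypothetical and do not reflect the idseq-dag implementation; the taxids in the usage note are arbitrary examples.

```python
import random
from collections import Counter


def pick_taxid(hits, rng):
    """Return the top-scoring taxid from (taxid, score) pairs.

    Ties between equally scoring taxids are broken at random;
    returns None when the read has no alignments.
    """
    if not hits:
        return None
    best = max(score for _, score in hits)
    tied = sorted({taxid for taxid, score in hits if score == best})
    return rng.choice(tied)


def assign_reads(read_hits, read_to_contig, contig_taxid, rng=None):
    """Assign one taxid per read, preferring the parent contig's taxid."""
    rng = rng or random.Random(0)   # fixed seed keeps the sketch reproducible
    assigned = {}
    for read_id, hits in read_hits.items():
        contig = read_to_contig.get(read_id)
        if contig is not None and contig in contig_taxid:
            assigned[read_id] = contig_taxid[contig]     # contig-level call
        else:
            assigned[read_id] = pick_taxid(hits, rng)    # short-read call
    return assigned


def count_by_taxid(assigned):
    """Aggregate assigned reads into per-taxid counts, skipping unassigned reads."""
    return Counter(taxid for taxid in assigned.values() if taxid is not None)
```

For instance, a read whose own best hit is taxid 1280 but whose parent contig was assigned taxid 562 is counted under 562, which is how the contig step can override an ambiguous short-read alignment.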
to assist with distinguishing microbial signal from reagent and environmental contamination, idseq supports background model generation, which allows researchers to evaluate the significance (reported in z-scores) of relative abundance estimates for taxa in samples of interest as compared to water-only or other environmental control sample collections. altogether, the single sample report and associated filtering functionality enable evaluation of taxonomic hits. more documentation on specific metrics can be found at https://help.idseq.net. to facilitate visualization and hypothesis generation across multiple samples, the idseq portal provides user customizable taxon heatmaps ( figure e ). for advanced users, the pipeline visualization tool clearly documents the input parameters at each step of the analysis pipeline and provides download access to the input and output files at each step so data can be made available for offline analysis (figure s ), such as phylogenetics. idseq is an open source software tool under continued development across two github repositories: one which hosts the web interface (https://github.com/chanzuckerberg/idseqweb), and one which hosts the pipeline code (https://github.com/chanzuckerberg/idseq-dag). modifications to the web interface, which are deployed twice-weekly, do not affect the analysis results. to provide a record of features and how to use them, full documentation is provided at https://help.idseq.net. updates to the pipeline code may impact analysis results. therefore, idseq has adopted a semantic versioning system. changes implemented to each version are listed in the readme file. for each pipeline run, the pipeline and ncbi database versions are also tracked. major changes to the pipeline outputs result in a major version number update ( .x to .x) and are communicated broadly to researchers via email updates. the change from idseq v .x to v.
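The background model z-score mentioned above can be illustrated with a small sketch. It assumes each taxon's abundance in the control samples is summarised by a mean and sample standard deviation, and that scores are capped when the background has no variance; the cap value, the treatment of taxa absent from the controls, and the function names are assumptions, since the text does not specify how idseq computes them.

```python
import statistics


def background_model(control_abundances):
    """Summarise control samples as {taxid: (mean, standard deviation)}.

    control_abundances maps taxid -> list of abundances, one per control sample.
    """
    model = {}
    for taxid, values in control_abundances.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        model[taxid] = (mean, sd)
    return model


def z_score(abundance, mean, sd, max_abs=100.0):
    """Standard score of a sample abundance against the background.

    When the background has zero variance the score is capped at
    +/- max_abs (an assumed convention) instead of dividing by zero.
    """
    if sd == 0.0:
        if abundance == mean:
            return 0.0
        return max_abs if abundance > mean else -max_abs
    z = (abundance - mean) / sd
    return max(-max_abs, min(max_abs, z))


def score_sample(sample_abundances, model):
    """Z-score every taxon in a sample; taxa unseen in controls use mean 0, sd 0."""
    return {
        taxid: z_score(value, *model.get(taxid, (0.0, 0.0)))
        for taxid, value in sample_abundances.items()
    }
```

A taxon seen at 20 units in a sample, against controls of 10, 12 and 14 units, scores z = 4, whereas a taxon never observed in any control receives the capped maximum.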
.x involved the incorporation of the current assembly steps to refine alignment results, which improved the ability to resolve taxonomic identities in potentially homologous regions. small changes to the pipeline that may still affect downstream results are indicated by an increased minor version number. for example, addition of a minimum alignment length filter to improve specificity of nt alignments caused a version change from . . to . . . changes to the pipeline which do not affect the results are indicated by incremental minor version (i.e. . . to . . ). continued development on idseq aims to ) improve the computational efficiency and accuracy of the results; ) expand the integration with other tools to enable researchers' flexibility in the downstream analysis of their processed results; ) support the expanding number of mngs sequencing platforms that will be used by researchers for pathogen detection globally. a suite of benchmarking samples is used for analysis of additional pipeline updates as discussed below. software and data availability additional documentation and guides for getting started with idseq can be found at https://help.idseq.net. the code is open source and available in the github repositories listed in table s . a recent study evaluated the performance of taxonomic classifiers for mngs data on ten reference datasets that are commonly used for benchmarking, containing computationally simulated reads from between and bacterial species [ ] . it evaluated performance using two metrics: the area under the precision recall curve (aupr) and the l distance. the aupr evaluates the ability to detect the presence of microbes (binary presence/absence) above a relative abundance threshold, taking into consideration the precision and recall rates as said threshold is adjusted. a species-level aupr of . indicates that there is a threshold (proportion of reads) above which all true positive species can be identified with no false positive species.
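The threshold sweep behind the aupr can be sketched as an average-precision calculation: species are ranked by reported relative abundance, and precision is accumulated each time the threshold drops far enough to admit another true species. The benchmarking study may have used a different interpolation of the curve, so this is an illustrative assumption rather than its exact procedure, and `aupr` is a hypothetical function name.

```python
def aupr(predicted, truth):
    """Average-precision estimate of the area under the precision-recall curve.

    predicted: {species: reported relative abundance}.
    truth: set of species truly present in the sample.
    Species in `truth` that were never reported contribute zero, so
    missed species lower the score.
    """
    truth = set(truth)
    if not truth:
        raise ValueError("truth set must not be empty")
    # Lowering the abundance threshold admits species in this order;
    # equal abundances are ordered by name for determinism.
    ranked = sorted(predicted, key=lambda s: (-predicted[s], s))
    true_positives = 0
    area = 0.0
    for rank, species in enumerate(ranked, start=1):
        if species in truth:
            true_positives += 1
            area += true_positives / rank   # precision at this recall step
    return area / len(truth)
```

Under this definition, a result in which every true species outranks every false positive scores 1.0, which corresponds to the existence of an abundance threshold that separates the two groups perfectly.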
the l distance provides a complementary metric that considers the similarity in relative abundances between the results and the ground truth. we evaluated the performance of the idseq pipeline on these same datasets (methods , table s ). samples took an average of hours (min = . hours, max = hours) to process on idseq pipeline version . , with the ncbi database version from september . performance metrics (aupr and l distance) were computed separately for the ncbi nt and nr results and compared to those published recently by ye et al. (idseq_nt and idseq_nr, figure ) [ ] . idseq provides an automated pipeline, but at the cost of inability to easily swap in new databases. therefore, we compared our results against those reported by ye et al. for the "default database" of other tools. the performance metrics may inherit biases due to differences in the reference database contents as well as recency of input sequences. the idseq pipeline demonstrated comparable performance to the other mngs tools tested (figure ) . the unambiguously mapped datasets demonstrated limited resolution for distinguishing the tools when evaluated by aupr and l , as most tools show relatively high performance (with aupr scores above . at the species level, figure a ). consistent with ye et al., we observed that the greatest differences between tools were in the reduced precision at high recall. idseq protein alignments (nr) demonstrated greater aupr than idseq nucleotide (nt) across most datasets, but consistently identified more taxa at low abundance (less than %), therefore resulting in reduced precision ( figure c) . meanwhile, idseq nt exhibited increased specificity. idseq nt and nr had a mean aupr across all the datasets of . and . , respectively. the top mean aupr of any single tool was achieved by metaothello ( . ), followed by kraken ( . ).
given kraken 's performance on the unambiguous benchmark datasets and its wide adoption for relative microbial abundance estimation, additional analyses focused on comparison against kraken ( figure bc ). another distinguishing factor between the tools was in the number of reads that were "unclassified" across multiple datasets. mmseq , metaothello, kaiju, and bracken consistently left > % of reads "unclassified". idseq (nt and nr) removed an average of % of the reads during host filtering and qc steps, but of the remaining sequences, an average of less than % of reads were unmapped across the ten datasets. this can be attributed in part to idseq always assigning reads to a species when an alignment exists (increasing sensitivity at the expense of specificity) and secondly to the use of assembled contigs to refine alignments where short reads may have been unmapped. to further investigate differences between the tools, we evaluated the results for each dataset independently ( figure b ). idseq nr demonstrated lower precision across all datasets than many other tools, including idseq nt and kraken ( figure c, figure s ). the atcc staggered dataset, which includes several microbes present at very low abundance, yields the lowest aupr of all samples tested via idseq nr, consistent with findings in ye et al. that protein-based classifiers consistently struggled to identify the low-abundance taxa amongst other low-abundance false positives. meanwhile, idseq nt demonstrated reduced performance on the nycsm dataset (supplemental text). idseq's usage of the full ncbi nt and nr databases resulted in relatively high performance for the buccal dataset. ye et al. discuss that the buccal dataset was a low-performing outlier for most evaluated classifiers due to inclusion of reads from a species with only contig-quality reference, which is not included in most default databases. 
the idseq web portal is designed to provide researchers with the choice of utilizing either nt or nr results, or both in conjunction with each other. for example, the impact of spurious nr alignments can be mitigated by requiring a corresponding alignment with idseq nt. using this strategy, the performance of idseq was evaluated, considering the nt relative abundances reported for taxa with both nt r > and nr r > (idseq_ntnr, figure , figure s ). we observed that requiring concordance resulted in the greatest mean aupr across all other tested tools ( . ) and increased the precision of idseq above that of either nt or nr alone. altogether, these results highlight some key trade-offs with respect to relative abundance estimation of bacterial species. idseq is capable of identifying organisms with respect to the latest versions of ncbi and demonstrates relatively high recall ( figure d ). but use of the full ncbi database may result in false-positive alignments at low abundance which can reduce precision ( figure c ). in the context of pathogen-identification, it has been observed that infecting agents may comprise the majority of sequencing reads in certain circumstances [ ] . for such data sets, the reduced precision for abundance estimation at low levels is less impactful. meanwhile, researchers interested in evaluating highly complex microbiome composition at the species-and strain-level may need to bring in other tools to supplement their analyses [ ] [ ] [ ] or rely on genus-level estimates provided by idseq. to address the gaps between the existing benchmark datasets and the idseq pipeline's primary use-case for pathogen detection, we tested idseq's performance on three additional datasets specifically designed to evaluate detection of divergent viruses (methods) and common clinical microbes (supplemental text). for each dataset, we evaluated the performance of idseq (nt and nr), as compared to kraken [ ] , using per-species recall. 
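the nt/nr concordance strategy described earlier in this section (idseq_ntnr) can be sketched as a simple filter. this is an illustrative python sketch rather than the idseq implementation: the function and parameter names are hypothetical, the two read-count thresholds are left as arguments because their values are not given here, and the sketch assumes that nt relative abundance is computed against all nt-assigned reads rather than renormalized after filtering.

```python
def concordant_nt_abundance(nt_reads, nr_reads, min_nt_reads, min_nr_reads):
    """Report nt relative abundance only for taxa supported by both nt and nr.

    nt_reads, nr_reads: dicts mapping taxon -> read count (r)
    min_nt_reads, min_nr_reads: a taxon is kept only if its counts are
        strictly greater than these thresholds in nt and nr respectively.
    """
    total_nt = sum(nt_reads.values())
    if total_nt == 0:
        return {}
    return {
        taxon: count / total_nt  # denominator: all nt reads (assumption)
        for taxon, count in nt_reads.items()
        if count > min_nt_reads and nr_reads.get(taxon, 0) > min_nr_reads
    }
```

a taxon with a spurious nr-only alignment never appears in the output, and a taxon seen only in nt is dropped as well, which is the trade of some recall for precision that the text describes.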
viruses are known to evolve rapidly and therefore their sequences may diverge from sequences in the known ncbi database over relatively short timescales [ ] . maintaining the ability to detect divergent viruses is of paramount concern, given their role in numerous recent outbreaks, including the recent emergence of sars-cov- , the coronavirus responsible for the covid- outbreak [ ] [ ] [ ] [ ] . the idseq-bench tool was used to generate simulated ngs samples from rhinovirus c genomes at varying levels of divergence (after in-silico forward evolution from a reference sequence obtained from the ncbi database), ranging from % identical to the reference sequence to % similar (at the nucleotide level) (methods, figure a , table s ). the resulting samples were uploaded to idseq (project hrhinoc simulation). meanwhile, the same raw .fastq files (prior to host filtering), were analyzed using kraken (methods). both idseq nt and kraken identified reads aligning to rhinovirus c down to % sequence divergence ( figure b) . meanwhile, idseq nr recalled rhinovirus c alignments down to % sequence divergence, demonstrating a greater sensitivity for divergent virus detection. we note that idseq nr experienced a rapid drop in total recall ( , reads correctly mapping to rhinovirus c, of , total and , passing qc steps at % sequence similarity vs. reads detected at % sequence similarity). this highlights an artifact of the computational cost-saving mechanisms employed by idseq -whereby a blast database is constructed from only the subset of accessions identified in the initial short-read gsnap and rapsearch alignments to the ncbi database. in cases where the highly divergent short-read sequences don't match to nt or nr in the initial alignment, the blast database will be empty and none of the reads or contigs will map. however, idseq does provide the ability to download all assembled contigs, enabling offline interrogation of this divergent "dark matter". 
manual blastx of contigs assembled by spades in idseq to the full ncbi database, was able to recover the rhinovirus c identity down to % sequence identity. future iterations of the idseq pipeline may aim to automate the manual follow-up steps for divergent viral contigs as well as incorporating other tools for dark matter investigation to probe for pathogen motifs. further comparison of the idseq (nt and nr) results to kraken shows that kraken initially recovered more of the simulated reads than idseq ( , of , vs. , for both idseq nt and nr). this is explained by the qc steps in the idseq pipeline, which removed ~ % of reads at the priceseq filtering step due to low quality -an expected outcome given that the simulated reads mimic error models of illumina sequencers (methods). of the reads remaining after host filtering, idseq identified % as aligning to rhinovirus c. this pattern persists down to % sequence similarity, at which point kraken begins to identify fewer reads. while some rhinovirus c reads are identified down to % sequence similarity (same as idseq nt), idseq nt identified a significantly greater number of reads mapping to rhinovirus c at increasing levels of divergence. specifically, at % divergence, , reads mapped by idseq nt while only , reads mapped by kraken . altogether, these benchmark results are consistent with existing reports of the utility of idseq nr in detecting divergent viruses [ ] and are within the ranges of nucleotide divergence associated with emerging human pathogens (supplemental text). the idseq pipeline is sample-type agnostic, allowing researchers interested in a broad range of scientific questions across a diverse array of host organisms (humans, mice, mosquitos, ticks, plants, environmental, etc.) to obtain relevant microbial information from any sample type (blood, csf, respiratory fluids, tissue, etc.) [ , [ ] [ ] [ ] . 
there are many challenges for data interpretation that are common across mngs applications, such as the impact of pcr amplification on samples with low amounts of input rna, background contamination, and genomic similarity between short regions of related organisms. here, through a re-analysis of the idseq results for three cerebrospinal fluid (csf) samples from a recent study investigating etiologies of pediatric meningitis in bangladesh [ ] , we highlight specific idseq features to address these challenges. the original study, conducted by saha et al., included csf samples ( positive, negative, and idiopathic) and water controls, processed on idseq v . . we focus on one known infection (streptococcus pneumoniae, chrf_ ), one idiopathic sample that was later confirmed to have chikungunya virus (chrf_ ), and one water control (chrf_ ) (figure ) . these samples, for demonstrative purposes, were rerun on idseq v . and are available in idseq project chrf rr example. key pipeline run metrics for these three samples are provided in table . chrf_ was a pediatric encephalitis case of unknown etiology that was later determined to be a case of neuroinvasive chikungunya virus. figure a shows the number of reads removed by each host filtration and qc step. one challenge for mngs-based pathogen detection is that host sequences dominate the mngs library. notably, chikungunya virus reads in sample chrf_ represented less than % of the total sequencing reads. however, after idseq's host filtering and qc steps, they represented % of the remaining non-host reads. a second, widely acknowledged challenge for mngs data interpretation is the presence of environmental contaminants. best practices suggest including at least one water control with every sequencing experiment [ , ] . to assist with interpretation of results with respect to control samples, idseq implements a z-score approach (methods) first described in wilson et al. [ ] .
z-score statistics computed by idseq indicate the significance of relative abundance estimates in a sample as compared to the user-selected background controls, which may include water controls or healthy control samples. z-score thresholds can be imposed to remove taxa that are prevalent in the water or healthy controls. in sample chrf_ , rows ( species from genera) were reported with nt reads per million (rpm) greater than , nr rpm > , and z-score > (a relatively stringent threshold employed to remove many of the low-abundance taxa for first-pass evaluation). , . rpm were associated with chikungunya virus, of which many were associated with the contigs aligning to chikungunya virus. by using the idseq portal coverage visualization, which displays reads and contigs in association with their top matched genbank accession, we observe that the longest contig, approximately kb, represented full-genome coverage of the nearest genbank accession ( figure f ). in sample chrf_ , idseq associated , , reads by nt with the independently verified pathogen, streptococcus pneumoniae, of which . % were assembled into contigs ( table ) . the average alignment length across all contigs and reads was , . bp, driven largely by alignment of long contigs. despite the large number of contigs and long alignment lengths, the genbank accession with the greatest coverage ( . mb lr . streptococcus pneumoniae strain stdy genome assembly, chromosome: ) had . % coverage breadth. this exemplifies a frequently observed pattern (which is even more pronounced in lower-coverage samples), whereby coverage of larger bacterial genomes is lower than that of viral genomes, even for samples with a high proportion of mngs reads associated with a particular microbe. in many cases, low coverage from mngs data can preclude confident strain identification in bacterial species that may be useful in a clinical context.
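coverage breadth, as used above, is the fraction of a reference accession covered by at least one aligned read or contig. a minimal python sketch follows. it assumes alignments are given as half-open (start, end) intervals on the reference; the function name and input format are hypothetical and do not reflect how the idseq coverage visualization is implemented.

```python
def coverage_breadth(alignments, genome_length):
    """Fraction of reference positions covered by at least one alignment.

    alignments: iterable of (start, end) half-open intervals on the reference
    genome_length: length of the reference accession in bases
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length
```

because overlapping alignments are merged before counting, many reads piled onto a conserved region (for example rrna) add depth but not breadth, which is one reason a large bacterial genome can show low breadth despite a high read count.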
furthermore, low coverage of the transcriptome (via rna mngs) may produce a large proportion of alignments in conserved rrna regions which may be challenging to disambiguate. in sample chrf_ , the duplicate compression ratio (dcr) of . indicates the possibility of over-amplification of low biomass nucleic acid input ( table ) . this is common for water samples where low input nucleic acid is expected. the use of ercc controls in the library preparation of these samples enabled back-calculation of the total input rna concentration. this sample was determined to have . pg of total input rna, while the two infected samples (chrf_ and chrf_ ) had . pg and . pg, respectively. thus, while the relative abundance values appear comparable to those in the infected sample, they represent significantly smaller quantities of raw nucleic acid (figure e ). in the original study all water and non-infectious controls (which had low white cell counts and therefore little host or pathogen nucleic acid) had input rna quantities < pg, enabling the use of an input nucleic acid threshold for inclusion in downstream analyses. additionally, the top four organisms (by nt rpm) include providencia, cutibacterium, streptococcus, and escherichia -many of which are known environmental contaminants [ , , ] . the total rows (with nt rpm > , nr rpm > , and nt z-score > ) are all present at relatively similar and low abundance levels, characteristic of background contaminants [ , ] . idseq is a globally accessible pipeline for mngs analysis that has been shown through simulation and practice to be effective in identifying novel and divergent viruses. as an additional real-world example of this utility, we provide a vignette from the recent sars-cov- coronavirus outbreak. 
on january , , a team of researchers from the cnm-niaid (national center for parasitology, entomology, and malaria control - national institute of allergy and infectious diseases) collaboration in cambodia obtained a nasopharyngeal swab sample from a patient with pcr-confirmed sars-cov- infection. the library preparation and sequencing were completed in-country by february , [ ] . analysis of the sample ( . million single reads) using idseq against an ncbi database version from - - , which did not contain the known sequences for sars-cov- that have since been deposited on ncbi, identified reads aligned to the genus betacoronavirus, with an average amino acid percent identity of . % (by nr). the sample took minutes to analyze end-to-end and the most abundant species was severe acute respiratory syndrome-related coronavirus, with nt reads ( contigs) and nr reads ( contigs), representing ~ % genome coverage. to quantify the idseq pipeline's recall for sars-cov- sequences, we built a blast database from the sequences associated with sars-cov- , which had been deposited in ncbi between january and february , as a result of widespread efforts by the global science community. by blasting all non-host reads from the sample against the known sars-cov- sequence database, we identified reads mapping to sars-cov- . as compared to this ground-truth value, idseq demonstrated . % read-level recall. this indicates that for an emerging threat, idseq was able to successfully provide information on the presence of a pathogen prior to the existence of full reference genomes associated with the organism. this identification was of paramount public health importance given the unclear diagnostic accuracy in the early, pre-pandemic stage of the outbreak. we have introduced idseq, an open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring.
we described the pipeline analysis steps and demonstrated that the idseq pipeline achieves performance comparable to that of other tools in the field for taxonomic identification and relative abundance estimation. we showed that idseq is uniquely suited for detection of divergent viruses and has high sensitivity for detecting human pathogens. finally, we have shown through two case studies how the idseq portal enables researchers to rapidly generate insights into their samples' quality, microbial content, and cohort trends. we further highlighted its real-time utility by describing how idseq was used to analyze sequences associated with the emerging coronavirus sars-cov- prior to deposition of sars-cov- sequences into public data repositories. the idseq web portal provides an easy-to-use access point for computationally intensive analysis of mngs data. its sample-type agnostic implementation enables its application for a broad range of research questions related to understanding distribution of microbes in a sample. the idseq pipeline has been a key component in recent studies to understand undiagnosed causes of infection and survey the landscape of circulating pathogens, both in humans and animals [ , ] . benchmarking of mngs tools is a well-recognized challenge within the field [ , , ] . the choices of tools, parameters, databases, and datasets may all influence the conclusions. our aim in this study was simply to test performance relative to other tools. we compare idseq's default database (ncbi nt and nr) against the default databases for all other tools included by ye et al. though it is possible that other tools' performance would improve given a comparably large database, configuring these details requires computational expertise that directly opposes the readily usable nature of idseq. idseq continues to use the full ncbi nt and nr databases given their advantages for detecting divergent viruses and incorporating data on novel bacterial pathogens.
however, the large database size results in longer run-times, and the lack of curation introduces the potential for noise in alignment results due to errant sequence assignments made upon upload to the ncbi databases. there is ongoing work by many researchers to evaluate curated databases for mngs analyses, but for now idseq continues to update its database biannually. to support continued benchmarking of idseq and empower researchers to test idseq's performance for their particular applications, we have released the open-source idseq-bench tool, which was used to generate the divergent virus dataset and to evaluate the per-read recall results. beyond the informatics nuances between tools, idseq provides clear advantages for researchers new to mngs and computational data analysis. first, idseq is designed and maintained by a team of engineers and managed as a software-as-a-service product, where user support is a key component. user support enables researchers to have confidence that they will obtain results in a timely fashion. secondly, the tool's user interface provides a series of advantages for users with limited computational expertise by reducing the challenges associated with installation and configuration as well as providing meaningful metrics for quality control and interpretation. it maintains transparency on individual pipeline steps through documentation (https://help.idseq.net), the pipeline visualization tool (figure s ) , and the availability of intermediate files for download. together, these resources help researchers new to mngs get started quickly, while also providing tools to enhance skills in computational biology. thirdly, the pipeline provides assurance of computational reproducibility, which is an increasingly appreciated priority within the scientific community as dataset sizes and analytical complexity increase.
lastly, the web-based user interface provides an access point for collaboration and networking, enabling researchers to collaborate seamlessly across countries and institutions, thereby building global networks of expertise that can be accessed by those in resource-scarce settings. finally, we highlight that idseq is not a clinical tool and is intended for research purposes only. idseq aims to be a valuable resource for researchers in the infectious diseases field but does not intend to become clinically validated. while idseq can yield insights that inform public health policies, laboratory testing priorities, and real-time decisions for confirmatory clinical testing, clinical validation of the pipeline requires locking of the system for adherence to strict guidelines. idseq will remain under continued development in order to ) improve the computational efficiency and accuracy of the results; ) expand the integration with other tools to enable researchers' flexibility in the downstream analysis of their processed results; and ) support the expanding number of mngs sequencing platforms that will be used by researchers for pathogen detection globally. some possible future directions for improvements to the idseq pipeline have been discussed throughout. notably, idseq's current assembly-based alignment steps result in failure to automatically identify divergent viruses beyond % divergent, while blastx of idseq-generated contigs can enable detection down to % divergent. automating full ncbi blastx of putative viral contigs would simplify offline analyses. similarly, we showed that idseq nr had reduced precision, which made relative abundance estimation of low-abundance taxa challenging. allowing for non-species-specific mappings or propagating estimates of species-level ambiguity to increase species-level resolution for low-abundance taxa may provide another avenue for continued development.
finally, continued integration with other analysis tools and sequencing technologies will further enhance the usability of idseq for mngs data analysis. idseq reduces the need for much of the computational expertise and access to large-scale computing resources that have traditionally been barriers for conducting mngs data analysis. the idseq portal provides an easy-to-use interface that enables researchers around the world to upload samples and generate hypotheses with relevant implications for global health and infectious disease tracking as diseases emerge. the idseq pipeline uses several publicly available academic bioinformatics tools. the raw commands and parameters used for each step in the pipeline are available for each pipeline version in the pipeline visualization (figure s ), which can be viewed for any sample in idseq. technical documentation is available here: https://github.com/chanzuckerberg/idseqdag/wiki. the external rna controls consortium (ercc) developed a common set of external rna controls that can be used to control for a variety of sources of variation in rna expression attributed to experimental factors (including the quality of the starting material, the level of cellularity and rna yield, the sequencing platform, and the person performing the experiment). in the context of pathogen detection, mngs libraries often contain extremely low quantities of rna input. it has been shown that during library preparation, samples with low input experience amplification of background contaminants [ ] . ercc controls can be used to mitigate the effect of low input libraries and to quantify the total input. to enable researchers to rapidly assess the quality of their libraries and the limit of detection, idseq provides ercc counts for each sample. during the host filtering steps, the raw sequencing reads are aligned to the ercc reference sequences and counts are generated by star -genecounts option [ ] .
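as a rough illustration of how ercc counts can be used to quantify total input, the python sketch below back-calculates input rna mass from the ratio of non-ercc to ercc reads. it rests on an assumption that is not spelled out in the text: that the read ratio mirrors the mass ratio between sample rna and the spiked-in ercc controls. the function name, parameter names, and the spike-in mass used in the usage note are hypothetical.

```python
def estimate_input_rna_pg(total_reads, ercc_reads, ercc_spike_in_pg):
    """Estimate sample input RNA mass (pg) from ERCC read counts.

    Assumes reads are produced in proportion to input mass, so that
    sample_mass / ercc_mass == non_ercc_reads / ercc_reads.
    """
    if ercc_reads <= 0:
        raise ValueError("no ERCC reads: input mass cannot be estimated")
    if total_reads < ercc_reads:
        raise ValueError("total_reads must include the ERCC reads")
    non_ercc_reads = total_reads - ercc_reads
    return ercc_spike_in_pg * non_ercc_reads / ercc_reads
```

for example, a library of 1,000,000 reads in which 250,000 map to ercc sequences, prepared with a hypothetical 25 pg spike-in, would be estimated at 75 pg of input rna under this assumption. a water control dominated by ercc reads yields a correspondingly small estimate, which is the basis for an input-mass threshold.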
these values are then available for download, as well as visualized in the user interface ( figure b) . given the sensitivity of mngs, it is common to identify contaminating microbial sequences derived from laboratory contaminants, reagents, collection tubes, etc. there exist numerous approaches to assist in distinguishing background contaminants from true microbes [ , , ] . idseq implements a previously described z-score method for background correction [ ] . researchers can create a background model by selecting control samples sequenced via their standard laboratory protocols or select from a default set of publicly available water controls. from the selected set of samples, the distribution of reads for each taxon is computed. the z-score field of the idseq sample report is calculated as the z-score for each taxonomic id based on its prevalence in the selected background model. if a particular taxonomic id is not found in the set of control samples, then the z-score will be set to . if the taxonomic id is not found in the sample, the z-score will be set to - . the z-score metric also feeds into the "aggregate score", which combines information from nt rpm, nr rpm, nt z-score, and nr z-score to provide an estimate of "microbial importance" for a particular sample based on the relative abundance both within the sample and in the background. this experimental metric aims to rank rare organisms that may be implicated in an infection higher, even if they are present only at low abundance. datasets evaluated by ye et al. in their benchmark analysis of mngs tools were downloaded from . the raw .fastq files were uploaded to idseq ( table s ). the truth files for each of the datasets were obtained from https://github.com/yesimon/metax_bakeoff_ and are available in the notes field of the idseq metadata. the code developed by ye et al. was downloaded from the github repository.
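the z-score background correction described above can be sketched in python as follows. this is an illustration under stated assumptions, not the idseq code: it assumes the background model holds one rpm value per control sample for each taxon, uses the population standard deviation (the text does not say which estimator is used), and treats a background with no spread like a missing background. the two sentinel values for taxa absent from the background or from the sample are left as required arguments, and all names are hypothetical.

```python
from statistics import mean, pstdev


def taxon_zscore(sample_rpm, background_rpms, *,
                 absent_in_background, absent_in_sample):
    """Z-score of one taxon's abundance relative to a background model.

    sample_rpm:      reads per million for the taxon in the sample
    background_rpms: rpm of the same taxon in each selected control sample
    absent_in_background / absent_in_sample: sentinel values to return when
        the taxon is missing from the controls or from the sample.
    """
    if sample_rpm == 0:
        return absent_in_sample
    if not any(background_rpms):
        # Taxon never observed in the controls (or no controls selected).
        return absent_in_background
    mu = mean(background_rpms)
    sigma = pstdev(background_rpms)
    if sigma == 0:
        # No spread in the background: z is undefined (assumption: treat
        # this like a missing background rather than dividing by zero).
        return absent_in_background
    return (sample_rpm - mu) / sigma
```

a report filter such as the one applied to the csf samples could then keep only rows whose rpm and z-score both exceed user-chosen thresholds.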
idseq sample reports were downloaded upon completion and processed to produce species-level relative abundance estimates for each sample; specifically, the proportion of total reads (by nt and nr) was computed and used as input to the script. the idseq results were processed in parallel with the data analyzed for the ye et al. paper. the scripts used to run this analysis are available here https://github.com/katrinakalantar/idseq-benchmark-manuscript. modifications to the original script are annotated as "##idseq edit". the computed metrics (aupr, l distance, precision, recall, and f -score) were then output as .csv files and plotted (figure , figure s ). the idseq-bench tool (https://github.com/chanzuckerberg/idseq-bench) was developed as a resource to enable the idseq team to benchmark datasets internally [ ] . the tool is open source and available for external users to generate benchmarks appropriate for their particular use case. full documentation can be found on github. briefly, the tool enables users to simulate ngs sequencing data from known microbes. given a genbank reference accession, idseq-bench uses the insilicoseq simulation tool [ ] to generate reads in accordance with known sequencing error models. each simulated read carries a tag indicating the known accession and the species-, genus-, and family-level taxonomic ids of the organism from which it was simulated. the idseq-bench tool then uses this information to characterize performance of the idseq pipeline results. the tool provides metrics for read-level recall at the species-, genus-, and family-level, as well as sample-level aupr, l , precision, recall, etc. for samples that were not simulated internally, the tool enables users to supply a gold standard file (comparable to those obtained for the cell benchmark datasets) and compute sample-level metrics against that file. a reference genome for rhinovirus c (refseq nc_ .
) was identified and the associated coding sequence .fasta file was downloaded from refseq. virapops forward viral simulation [ ] was used to simulate generations of viral evolution using default parameters. from the simulated data, sequences were selected at intervals of % nucleotide sequence identity to the original reference and compiled into a fasta file. this was then used as input to the idseq-bench simulation tool for benchmark simulation, which used insilicoseq [ ] to simulate , sequencing reads of length for each divergent virus genome according to a hiseq error model. this resulted in . x coverage of each divergent viral genome, consistent with the relatively high coverage of viral genomes seen by idseq analysis of samples with high viral load. the simulated fastq files were then uploaded to idseq project hrhinoc simulation (samples hrc_ , hrc_ , hrc_ , … hrc_ , table s ). to evaluate the limit of detection for divergent viruses, the total recall of rhinovirus c reads was evaluated at each level of simulated divergence, for each tool. additionally, the number of reads aligning to false-positive species was tracked. offline analysis was done using the contigs generated by idseq for samples where idseq failed to identify rhinovirus c. for simulated samples hrc_ through hrc_ , the "unmapped contigs" were downloaded and aligned via blastx in the ncbi blast web interface using default parameters [ ] . samples for which the blastx result returned rhinovirus c were marked as "potentially possible" and the greatest level of divergence was recorded. to compare internal benchmark samples against kraken [ ] , a kraken database was generated from the ncbi nt sequence database [ ] . the following command line parameters were used to download and build the reference database. finally, simulated sequencing files were run via the following commands. in collaboration with saha et al. 
[ ] , three samples were identified (chrf_ , chrf_ , chrf_ , from the original ncbi sequence read archive dataset under bioproject prjna ) and re-run on pipeline version . . the pipeline results were filtered using a conservative set of filters, which required nt_rpm > and nt_zscore > . the z-score was computed with respect to the public background model chrf_rna_negative, which was used in the original manuscript. the background model was generated based on rna-seq data from water samples and negative controls. metrics were compiled into table and a heatmap was generated using idseq, with the same filters ( figure d) . in collaboration with manning et al. [ ] , rna was extracted from a sample obtained from a symptomatic patient meeting criteria for possible covid- pneumonia. libraries were prepared for sequencing as described in manning et al and sequenced on an illumina iseq . the raw .fastq files were uploaded to idseq from the cnm-nih lab in phnom penh, via illumina basespace, on january , using an ncbi index from september . an ncbi database update was then done on february , by the idseq team and the results were evaluated. these samples were run on idseq pipeline version . . the data was deposited in public repositories by the original authors and is available at gisaid accession epi_isl_ . idseq results for the associated samples are available at http://public.idseq.net/covid- ?utm_source=bioarxiv&utm_medium=paper&utm_campaign=benchmark-paper. project name: idseq portal project home page: https://idseq.net operating system(s): platform independent programming language: python, ruby, javascript other requirements: web browser license: mit license availability of supporting data: data referenced in this manuscript has been previously published. sra accession ids are included in the original manuscripts [ , , ] . 
supplemental_text.docx, contains supplemental methods and results associated with two benchmark datasets listed in the main text, as well as supplemental figures (figure s -s ) and supplemental tables indicating idseq data availability (table s -s ). table of reads remaining during each step of the host filtration step (for chrf_ ) -interpretation of the relative loss at each step in can provide insight into the quality of the library preparation and sequencing run. b) automatic quantification of ercc counts from sample chrf_ ; ercc quantification enables back-calculation of input rna concentration. c) the results for a single sample (chrf_ ) are presented as a table, with key metrics for interpreting taxon alignment quality. d) the tree view indicates the relative abundance of sequences and their taxonomic relationship within a particular sample, shown is the relative abundance of chikungunya virus reads in chrf_ . e) the results from multiple samples can be compared using the idseq heatmap view, with associated metadata (purple = csf, blue = water control). the interactive heatmap visualization can be viewed at https://idseq.net/zlfl . the heatmap is especially powerful when analyzing trends across a larger number of samples. f) coverage of chikungunya virus in chrf_ ; the coverage visualization enables rapid interrogation of genome coverage. including metrics obtained when evaluating basic threshold filters integrating both idseq nt and nr (idseq_ntnr). c) the precision and recall of the same three tools for detecting known taxa. graphic representation of genomic similarity for simulated divergent rhinovirus c genomes, at %, %, and % similarity to reference sequence nc_ . . mutations are shown in dark blue. b) performance of idseq (nt and nr) as compared to kraken for recovery of reads from simulated divergent rhinovirus c genomes at varying levels of divergence. 
the dotted yellow line indicates the theoretical limit for detection of rhinovirus c achieved by manual blastx of idseq-produced contigs.

table : idseq provides key metrics enabling

the host filtering and qc stage of the idseq pipeline is composed of several individual steps. the proportion of reads lost at each step can provide insight into sample quality and library preparation. interpretation of these metrics may be valuable for labs evaluating new sample storage techniques, library preparation protocols, etc. these three samples, provided as an example, can be investigated in the idseq portal in project chrf rr example. csf: cerebrospinal fluid; dcr: duplicate compression ratio; "number of rows": total number of species and genus-level rows in the idseq sample report; nt: results based on ncbi nucleotide (nt) database; nr: results based on ncbi non-redundant protein (nr) database; rpm: reads per million; l: average alignment length across all reads and contigs mapping to that taxon.

many closely related bacterial species have different relative pathogenicity in humans or other host organisms [ ]. the unambiguous benchmark datasets, evaluated in the main text, do not emphasize performance at distinguishing common clinically relevant pathogens and closely related bacterial species. here, we provide the results of an analysis of two simulated datasets containing common clinical microbes. for each dataset, we evaluated the performance of idseq (nt and nr), as compared to kraken [ ], using per-species recall.

several microbes commonly found in samples from humans and in vector samples were identified in collaboration with researchers and incorporated into the ccm benchmark dataset. the selected species represent a range of bacterial, viral, fungal, and parasitic pathogens.
the idseq-bench simulation tool was used to pull reference genomes from

for idseq nt and nr results, the idseq-bench score function was used to evaluate the per-read, species-level recall across each ground truth organism. then, to characterize false positives, the total number of species identified was determined, along with the percentage of reads mapping to incorrect taxa. for kraken , the read-level recall was determined at the species level, and reads mapping to higher taxonomic levels were not considered as positive matches for the species-level recall. again, the total number of unique species and the proportion of reads mapping incorrectly at the species level were determined.

many bacterial pathogens are closely related, resulting in possible genomic overlap and ambiguity in aligning short read sequences that may map to multiple species. however, distinction of pathogens at the species level has implications for treatment and outcomes. bacterial species from genera within the enterobacteriaceae family were identified and the idseq-bench simulation tool was used to pull reference genomes (species name, taxonomic id; salmonella enterica, ; citrobacter koseri, ; citrobacter freundii, ; klebsiella aerogenes, ; enterobacter cloacae, ; escherichia coli, ; klebsiella oxytoca, ; klebsiella pneumoniae, ; shigella boydii, ; shigella flexneri, ) and simulate reads using the insilicoseq [ ] sequence simulator with uniform coverage and sequencing error models derived from hiseq data. the simulated sample is available on idseq (table s , https://idseq.net/ wns). analysis of the crb benchmark dataset followed the procedure used for the ccm dataset (above).

the nycsm dataset had the lowest aupr of all datasets tested by idseq nt. overall, we note that, because the tools perform comparably on these particular datasets, distinctions in aupr may be the result of just one or two missed taxa.
in this case, idseq nt failed to identify enterobacter asburiae (taxid = ) and identified only a small number (< ) of reads aligning to pseudoalteromonas haloplanktis (taxid = ). meanwhile, idseq nr successfully recovered reads from e. asburiae and produced a comparably low number of reads mapping to p. haloplanktis. idseq nt did identify many reads mapping to enterobacter soli, which has been noted to have high genomic similarity to e. asburiae and e. aerogenes (p-distance: . and . %, respectively [ ]), both of which were included in the simulated dataset. this highlights the challenge of disambiguating short reads from taxa with a high degree of genomic similarity. we emphasize the practical importance of orthogonal validation of hits via assays (e.g., pcr) targeting unique regions of the genome. however, it is possible that the expansion of reference databases since the original dataset simulation has improved specificity. the particular genbank accession to which all e. soli reads map (cp . enterobacter soli strain lf a, complete genome) was added to the database in april , while reads were simulated in [ ]. the same trend is observed for p. haloplanktis. idseq nt and nr both identified significantly more reads mapping to p. arctica, a member of the p. haloplanktis-like group [ ]. the associated genbank accession to which the majority of reads map (cp . -pseudoalteromonas arctica a - - chromosome i, complete sequence) was added in september of . the idseq coverage visualization makes interrogation of microbial hits simple by linking to the ncbi taxonomy database and genbank accessions.

the ability to detect divergent viruses is a function of genomic similarity to other organisms in the database as well as genomic coverage, which influences assembled contig lengths. many recent, emerging diseases affecting humans do have some sequence similarity to organisms in the ncbi databases.
for example, since the emergence of enterovirus ev-d , numerous outbreaks caused by divergent sub-clades have been reported with nt similarity to other strains ~ % [ ] . similarly, recent outbreaks of dengue virus have been reported to be the result of introduction of novel denv lineages, which are defined based on nucleotide divergence of - % within each denv serotype [ , ] . the set of nine known west nile virus lineages (which genomic analysis indicated diverged in the early th century) have an average pairwise percent identity of . % (nucleotide) and . % (amino acid) [ ] . these are well within the range for detection by idseq. meanwhile, zika virus, the viral species which caused the recent epidemic, was discovered in and shares, on average, . % amino acid sequence identity with dengue virus and . % with west nile virus [ ] . had the sequence of the first zika isolate not been in the database, idseq would not have been able to flag the presence of this virus and researchers would be required to evaluate assembled, unmapped, contigs offline [ ] . two additional datasets were simulated to evaluate idseq's performance for detection of common clinical microbes (ccm) as well as for disambiguating closely related bacterial species (crb) (methods, table s ). the ccm dataset contains reads simulated from total species, including six viral, four bacterial, and three eukaryotic pathogens. the crb dataset contains reads simulated from species of bacteria, all from the enterobacteriaceae family. given that idseq's pipeline returns a species-level assignment for each mapped read and the web interface presents results at the species level, we evaluate each of these benchmarks only at the species level. the kraken algorithm assigns reads with ambiguous mappings to higher taxonomic levels. therefore, the results shown include only the species-specific alignment results. for the ccm dataset, idseq filtered out . % of reads during the host and qc filtering pipeline steps. 
of the remaining , non-host reads, idseq nt showed the greatest per-species recall across the species included ( . , iqr = . - . ). idseq nt per-species recall was significantly higher than that of both idseq nr ( . , iqr = . - . , p = . ) and kraken ( . , iqr = . - . , p = . , wilcoxon rank sum) (figure s a). all tools successfully identified the presence of all microbial species. in addition to these species, idseq nt, nr, and kraken identified , , and false positive species, respectively. since the idseq pipeline returns a species-level assignment for all mapped reads, even in cases where a read may align equally well to two different species, it had a notably greater portion of the total (post-qc) reads mapping across those false positive organisms ( . % by nt, . % by nr) than kraken , which had only . % of reads mapping to the false positive species. kraken avoids associating larger percentages of reads with false-positive species calls by calling a significant portion of ambiguously mapped reads at higher levels of the taxonomic tree.

similar trends were observed for the crb dataset. idseq filtered out . % of reads during the host and qc filtering pipeline steps, leaving , non-host reads for downstream analysis. notably, idseq nt demonstrated the highest per-species recall ( . , iqr = . - . ), significantly different from nr ( . , iqr = . - . , p = . , wilcoxon rank sum) and kraken ( . , iqr = . - . , p = . , wilcoxon rank sum) (figure s b). all tools successfully identified the presence of all microbial species. in addition to these species, idseq nt, nr, and kraken identified , , and false positive species, respectively. again, idseq nt and nr had greater proportions of total reads mapping to these false-positive species ( . % and . % for nt and nr, respectively) as compared to kraken , with only . % of reads mapping to false-positive species and the majority of ambiguous reads mapping at higher levels of classification ( . %).
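the per-species recall and false-positive summaries reported above can be made concrete with a short sketch. it is not the idseq-bench score function: the function names and the read records are hypothetical, each read is assumed to carry its ground-truth species and the species assigned by the classifier, and a read placed above the species level (as kraken does for ambiguous reads) is represented as None and counted as a miss.

```python
from collections import Counter


def per_species_recall(reads):
    """reads: list of (true_species, assigned_species_or_None) tuples."""
    total = Counter()
    correct = Counter()
    for truth, assigned in reads:
        total[truth] += 1
        if assigned == truth:
            correct[truth] += 1
    # recall = correctly assigned reads / all reads simulated from that species
    return {species: correct[species] / n for species, n in total.items()}


def false_positive_summary(reads):
    """return (false-positive species, fraction of all reads assigned to them).

    a false-positive species is any assigned species that is absent from the
    ground truth; reads left at higher taxonomic levels (None) are ignored here.
    """
    truth_species = {truth for truth, _ in reads}
    fp_reads = [a for _, a in reads if a is not None and a not in truth_species]
    return set(fp_reads), len(fp_reads) / len(reads)
```

under this definition a classifier that defers ambiguous reads to the genus level lowers both its recall and its false-positive fraction, which matches the trade-off described for kraken above.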
the impact of ambiguous reads is exaggerated in cases where we simulate reads from multiple closely related species with known genomic similarity. in many cases of infection with a single dominant organism, the importance of recall may outweigh the identification of lower-level false-positive species. additionally, these simulations sample from across the genome. however, we know that ribosomal sequences can be used for typing of bacteria, and studies have previously shown improved sensitivity with rna-seq, where ribosomal rna comprises the greatest portion of sequenced nucleic acid [ , ]. thus, idseq results must be interpreted by the researcher with respect to the sample type and sequencing prep.

figure s : the idseq pipeline visualization indicates each step in the underlying pipeline and includes a description of the raw command parameters as well as the ability to download intermediate files for offline analysis.

a) area under the precision-recall curve (aupr), ranges from to (best). b) precision, ranges from to (best). c) recall, ranges from to (best). d) f -score, the harmonic mean of precision and recall values. e) l distance, ranges from (best) to .

tables

table s : github repositories containing open-source code for idseq pipeline, web application, and benchmarking resources.
idseq pipeline: https://github.com/chanzuckerberg/idseq-dag
idseq web application: https://github.com/chanzuckerberg/idseq-web
idseq benchmarking tool: https://github.com/chanzuckerberg/idseq-bench

unbiased metagenomic sequencing for pediatric meningitis in bangladesh reveals neuroinvasive chikungunya virus outbreak and other unrealized pathogens understanding the promises and hurdles of metagenomic next-generation sequencing as a diagnostic tool for infectious diseases bracken: estimating species abundance in metagenomics data centrifuge: rapid and sensitive classification of metagenomic sequences gatk pathseq: a customizable computational tool for the discovery and identification of microbial sequences in libraries from eukaryotic hosts taxmaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time fast and sensitive protein alignment using diamond fast and sensitive taxonomic classification for metagenomics with kaiju mmseqs software suite for fast and deep clustering and searching of large protein sequence sets metaphlan for enhanced metagenomic taxonomic profiling microbial abundance, activity and population genomic profiling with motus fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers higher classification sensitivity of short metagenomic reads with clark-s improved metagenomic analysis with kraken kraken: ultrafast metagenomic sequence classification using exact alignments krakenuniq: confident and fast metagenomics classification using unique k-mer counts k-slam: accurate and ultra-fast taxonomic classification and gene identification for large metagenomic data sets -pubmed database indexing for production megablast searches sequence analysis a novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures clinical metagenomic next-generation sequencing for pathogen detection sunbeam: an extensible pipeline for
analyzing metagenomic sequencing experiments a cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from nextgeneration sequencing of clinical samples virus identification in unknown tropical febrile illness cases using deep sequencing sequence analysis star: ultrafast universal rna-seq aligner integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults trimmomatic: a flexible trimmer for illumina sequence data price: software for the targeted assembly of components of (meta) genomic sequence data cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences fast gapped-read alignment with bowtie sequence analysis fast and snp-tolerant detection of complex variants and splicing in short reads towards precision quantification of contamination in metagenomic sequencing experiments simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. 
microbiome rapsearch: a fast protein similarity search tool for short reads basic local alignment search tool spades: a new genome assembly algorithm and its applications to single-cell sequencing benchmarking metagenomics tools for taxonomic classification constrains identifies microbial strains in metagenomic datasets strain-level microbial epidemiology and population genomics from shotgun metagenomics microbial strain-level population structure & genetic diversity from metagenomes rapid evolution of rna viruses assessing the epidemic potential of rna and dna viruses genome microevolution of chikungunya viruses causing the indian ocean outbreak evolution of the h n influenza genotype that facilitated the genesis of the novel h n virus a novel coronavirus from patients with pneumonia in china identification of infectious agents in high-throughput sequencing data sets is easily achievable using free, cloudbased bioinformatics platforms metagenomic next-generation sequencing of samples from pediatric febrile illness in tororo investigating transfusionrelated sepsis using culture-independent metagenomic sequencing a metagenomics -based diagnostic approach for central nervous system infections in hospital acute care setting messages from the second international conference on clinical metagenomics (iccmg ). microbes infect chronic meningitis investigated via metagenomic next-generation sequencing propionibacterium acnes: disease-causing agent or common contaminant? 
detection in diverse patient samples by next-generation sequencing common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes rapid metagenomic characterization of a case of imported covid- in cambodia metagenomic next-generation sequencing reveals miamiensis avidus (ciliophora: scuticocilitida epizootic of leopard sharks (triakis semifasciata single mosquito metatranscriptomics recovers mosquito species, blood meal sources, and microbial cargo, including viral dark matter critical assessment of metagenome interpretation -a benchmark of metagenomics software comprehensive benchmarking and ensemble approaches for metagenomic classifiers clinical infectious diseases pulmonary metagenomic sequencing suggests missed infections in immunocompromised children github -chanzuckerberg/idseq-bench: idseq infectious disease benchmarking tools virapops supports the influenza virus reassortments reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation improved metagenomic analysis with kraken simulating illumina metagenomic data with insilicoseq enterobacter soli sp. nov.: a lignin-degrading γ-proteobacteria isolated from soil higher classification sensitivity of short metagenomic reads with clark-s the pangenome of (antarctic) pseudoalteromonas bacteria: evolutionary and functional insights emergence of divergent enterovirus (ev) d sub-clade d strains, northern italy emergence of new genotypes and lineages of dengue viruses during the - epidemics in southern india early identification of dengue virus lineage replacement in brazil using portable genomic surveillance biological and phylogenetic characteristics of west african lineages of west nile virus systematic analysis of protein identity between zika virus and other arthropod-borne viruses. 
world health organization single mosquito metatranscriptomics recovers mosquito species, blood meal sources, and microbial cargo, including viral dark matter ribosomal rna -an overview | sciencedirect topics advantages of meta-total rna sequencing (metrs) over shotgun metagenomics and amplicon-based sequencing in the profiling of complex microbial communities this research was supported by the chan zuckerberg initiative (czi). the authors would like to thank all czi team members who were involved in support for software development and all researchers who have provided input on the idseq web portal throughout the course of its development. the authors would like to thank hanna retallack for valuable comments on the manuscript text. key: cord- -vhrlrd c authors: mishra, preet; prasad, abhishek; babu, suresh; yadav, gitanjali title: decision support systems based on scientific evidence: bibliometric networks of invasive lantana camara date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vhrlrd c extraction and analysis of useful knowledge from the vast amount of relevant published literature can add valuable insights to any research theme or area of interest. we introduce a simplified bibliometric data analysis protocol for gaining substantial insights into research thematics, which can also serve as a handy practical skill for researchers, while working from home. in this paper, we provide ways of developing a holistic research strategy using bibliometric-data driven approaches that integrate network analysis and information management, without the need of full paper access. this protocol is a comprehensive multi-modular pathway for analysis of metadata obtained from major scientific publishing houses by use of a decision support system (dss). a simple case study on the invasive species lantana camara has been presented as a proof-of-concept to show how one can implement this dss based protocol. 
some perspectives are also provided on how the outcomes can be used directly or scaled up for long-term research interventions. we hope that this work will simplify exploratory literature review, and enable rational design of research objectives for scholars, as well as development of comprehensive grant proposals that address gaps in research.

the problem of optimizing knowledge extraction from published literature

informed policy making necessitates scientific evidence, regardless of system complexity, be it socio-political state priorities during the ongoing novel coronavirus pandemic, funding agency mandates, research strategy development of graduate students, or any other real-world problem. while identifying possible solutions, policy makers require problem-oriented scientific evidence, which, in turn, can be understood as a dynamical system possessing a multimodular nature. accordingly, policy issues and solutions are both embedded in complex, globally interconnected environments that require thorough analysis of knowledge extracted from available published or bibliometric data. any evidence derived from incomplete data can be inaccurate, and this brings in two critical aspects associated with the usage of vast information. firstly, published information can be categorized under the paradigm of big data, possessing all the typical characteristics of high variety and volume. usage of such a large body of information is a complex process. thus, the analysis of information, if implemented without due recognition of the constraints involved, such as copyright, open access, metadata etc., can cause bottlenecks in data adequacy. secondly, data insufficiency may arise from the choice of search parameters. computational tools like citation managers and ai-based search engines harness bibliometric big data using author-designated and user-specified "keywords".
this step, in turn, controls all the later stages of the analysis: organizing, filtering, classifying, predicting, planning and so on. if the keyword is chosen a priori in an ad hoc or static manner, it may decrease the efficiency of the search process, which in turn will strongly impact the outcome. since both these aspects of the problem are inherent and user-dependent, we suggest a two-pronged strategy of building redundancies and optimizations using decision support systems, as described below.

the network approach to deciphering large data sets has long been known as one of the best analytical strategies, and this applies equally well to bibliographic information (shiffrin and börner, ), with the added benefits of being able to explore, model and restructure literature metadata to draw insights from both static and dynamic representations of individuals, organizations or themes of research (newman, ). high-dimensional literature metadata, when visualized efficiently through networks, can reveal communities sharing common node or edge attributes in both coarse-grained and fine-grained routines (babu et al., ). availability of metadata can often overcome constraints of limited access to full text and enable one to focus on a lower ease-of-access threshold, i.e. critical information within the title, abstract and citation. in the paradigm of embedding theory, words embedded within sentences of publications indicate the themes of research, and thus the visual analysis of these embeddings may provide proof of concept (spangler et al., ). such textual embedding of data has high dimensions (griffiths and steyvers, ), and keeping track of the dimensions can be done efficiently through network visualizations of co-occurrences (tshitoyan et al., ). an added benefit is that it can provide a static visual picture of the flow of ideas and the themes in current research scenarios (srinivasan, ).
metadata can thus consist of keywords, title, abstract, authors' information, and publisher and journal information, which can all be projected as node attributes, while edges may represent relationships between nodes, such as coauthorship, professional affiliations, or collaborative interactions (landauer et al., ). coauthorship bibliography networks are undirected, and two of the most informative topological parameters are (a) betweenness centrality, an indicator of hubs in the flow of information, and (b) degree distribution, a measure of the collaborations. a vast number of online and downloadable tools are available to perform bibliometric analyses, but as mentioned above, all such software strictly adheres to the concept of garbage in, garbage out (gigo), being implementations based on the processing of metadata obtained through user-derived "keywords" (krallinger et al., ). if we envisage obtaining metadata from a huge database of published articles as a single process, then it may be possible to optimize critical search parameters through a decision support system (dss) (sprague, ). here we present case studies from invasive plant species bibliographic metadata and share how emerging co-authorship networks can improve and inform decisions, and how diverse network visualizations can be integrated as modules in a dss. the solutions provided by these dsss must be regarded as partial, being iterative and adaptive, subject to existing constraints in time, rather than being steadfast or all-encompassing. our main objective is to increase the efficiency of decision making at the initial levels of strategy design, which may contribute to accuracy improvement at later levels in the hierarchy. figure depicts a schematic of the suggested decision support system (dss) for integration of modules consisting of various bibliography tools, to analyze the metadata of published information.
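the two topological parameters named above for undirected co-authorship networks can be computed with a few lines of standard-library python. the sketch below is illustrative only: the function names and the toy author lists are hypothetical, papers are assumed to be given as lists of author names, and betweenness is the unnormalized shortest-path count obtained with the brandes algorithm rather than the output of any particular bibliometric tool.

```python
from collections import Counter, deque
from itertools import combinations


def coauthor_graph(papers):
    """papers: list of author-name lists; returns adjacency dict of sets."""
    adj = {}
    for authors in papers:
        for a in authors:
            adj.setdefault(a, set())
        for a, b in combinations(sorted(set(authors)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_distribution(adj):
    """map each degree (number of distinct co-authors) to its node count."""
    return Counter(len(neighbours) for neighbours in adj.values())


def betweenness_centrality(adj):
    """unnormalized betweenness for an undirected, unweighted graph (brandes)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every undirected path was counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}
```

an author who is the only link between two otherwise separate groups receives a high betweenness value even with a modest degree, which is why the two measures are read together.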
a priori, the user-selected keyword enters the process, and is at the core of advanced search tools for obtaining metadata from various sources like. this metadata can be analyzed in two distinct ways (dashed line arrows marked as and in fig ); route includes labor-intensive reading and analysis of the problem-oriented search results and then updating, on a manual inference basis, the search keyword in the next iteration. this is a complex and multi-dimensional landscape that may involve new queries arising from parsing of metadata and analytical constraints related to accessibility, time availability, feasibility, collaborations, and so on. this juncture can be identified as the most crucial point in the analysis, where the user has to make decisions regarding the next iteration. several queries need to be addressed before moving ahead in an exploratory investigation. for instance, how much information is sufficient to answer the questions addressed? do the current citations cover an adequate thematic premise? does the metadata reflect the evolution of subject-specific knowledge through space and time? according to our proposed scheme, route can be used to answer these and other relevant questions through a dss with two modules, a and b, both incorporating diverse aspects of network theory (see figure ). article metadata can provide information at the theme level, and thus it can inform decisions about the choice of keywords and databases to search when doing literature surveys and concept reviews broadly. module a in figure involves text co-occurrence networks based on the metadata obtained from initial searches. these visualizations can be obtained from the abstract, title, or keywords provided with the published paper. some of the file formats that are typically used are .txt, .csv, .xml etc. these can then be plugged into the respective tools to obtain relevant visualizations. some of the tools are mentioned in the figure .
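the text co-occurrence networks of module a can be illustrated with a small sketch that counts how often two terms appear in the same title or abstract. this is a simplified stand-in for what dedicated tools do, not their algorithm: the tokenizer, the short stopword list, the function names and the example titles are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# deliberately tiny stopword list; a real analysis would use a fuller one
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this"}


def tokenize(text):
    """lowercase alphabetic terms longer than two letters, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def cooccurrence_edges(documents, min_count=1):
    """return {(term_a, term_b): number of documents containing both}."""
    edges = Counter()
    for doc in documents:
        terms = sorted(set(tokenize(doc)))      # each term counted once per document
        edges.update(combinations(terms, 2))    # pairs are alphabetically ordered
    return {pair: n for pair, n in edges.items() if n >= min_count}
```

raising min_count prunes rare pairs, and the heavily weighted pairs that remain are candidates for the keyword used in the next search iteration.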
through various algorithms, these tools parse the embeddings of the words in the text and yield a mapping of the embeddings, providing users with a clearer picture of the linked concepts that have been published, as well as latent information that can help users select appropriate keywords for the next iteration. the metadata obtained from the search can also yield information about the research scenarios at the author level, with related information as attributes of the author. the module marked b in figure deals with user decisions regarding the choice of the search parameters, taking into account author-based information from metadata. this module involves tools to create coauthorship networks that can be analyzed not only to inform decisions about search parameters but also to guide the exploratory investigation into research themes by answering questions as to the origins and extent of the problem itself, i.e. who, where, what and when etc. (lent et al., ). both dss modules are highly user-centric and involve open-source programming software, with greater levels of accessibility, enabling personalized module design at the user level. module b depends on the interpretation of the user in a more convoluted way than module a, and due care must be taken while implementing module b, as well as while deriving interpretations from the dss outputs at each iteration.

we present a case study for assisting a graduate student aiming to perform a systematic review of biological invasions, focusing on india's most aggressive invasive alien plant species, lantana camara. our objective is to get a rough overview of the status of research in this area and identify major gaps, especially in the indian context, so that a suitable graduate research strategy can be designed for the next three years, addressing gaps in knowledge.
for this case study, we take route in figure and show how combinations of suitable keywords can be developed by multiple iterations of the dss module; the methodology is summarized as a flowchart in figure . for the initial process of obtaining metadata, we focus on the prime objectives of the species in the indian context. we use pubmed (https://www.ncbi.nlm.nih.gov/pubmed), the open-source literature database with over million citations for biomedical literature beginning from , life science journals, and online books (https://www.ncbi.nlm.nih.gov/books/nbk ). we used the 'advanced search' option with the keywords ("lantana"[all fields]) and ("camara"[all fields]) and ("india"[all fields]), which returned over articles by more than authors. we saved the metadata as an xml file, a format compatible with the cytoscape plugin used in subsequent stages (shannon et al., ). it may be noted that formats and compatibility between search engines and downstream tools represent a constraint in the visualization process, and metadata formats with more flexibility can enable more information to be extracted. for instance, although jstor (www.jstor.org) has the largest open collection of published articles ( onwards), it does not enable metadata collection; rather, each article is downloaded as an individual xml file. we are currently working on developing a tool for collating these files and extracting metadata using r without losing links between the metadata. it may also be noted that we have chosen a relatively sparse dataset as our example, and we advise due care in the selection of keywords and in correlating these with the number of articles returned. for example, adding the keyword 'invasive' severely reduces the number of articles, whereas removing the connection to india increases the number to beyond . the xml file feeds into the 'social network' plug-in of the most widely used open-source visualization tool cytoscape (kofia et al.,
), yielding the co-authorship bibliography network shown in figure a. various attributes of the nodes (authors) can be superimposed onto this network as additional informative layers of color or shape, for example the institutions they are affiliated to, as well as the theme of research, to obtain the visualizations in panels b and c respectively (figure ). the foremost observed pattern is that lantana research in india is strongly inclined towards applied phytochemistry, with more than % of authors publishing under this theme (represented by green colored nodes in fig c). the other notable pattern is that, although very few researchers (only %) are involved in addressing the invasion ecology of l. camara in india, this is the theme where potential global interest, and possibly funding opportunities, may be found. this is evident from a comparison of congruent layouts in figures b and c, where the largest proportion of foreign authors (yellow nodes in b) share publications with indian authors under the theme of invasion ecology (represented by red nodes in c). in order to check whether this interpretation is an artefact of network representation, the layouts can be redrawn differently, representing nodes as institutions or publications rather than individual authors. this has been done in figure using the same metadata and color codes as above, with the cytoscape 'thematicmap' plugin (shannon et al., ). all attribute networks in figure represent edges as authors, and the width of these edges depicts the number of authors sharing an affiliation ( a, b) or a publication ( c). colors in figure a reveal collaborations between indian (blue) and foreign (yellow) institutions, while color codes in figure c reveal the theme of the paper.
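to make the step from saved xml metadata to a co-authorship network with affiliation attributes more tangible, the sketch below reads author and affiliation fields from a pubmed-style xml export using the python standard library. it is not the cytoscape plug-in's implementation: the element names follow the pubmed article xml layout (PubmedArticle, AuthorList, Author, AffiliationInfo), while the sample record, the names in it and the helper functions are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# invented two-article sample in the pubmed export layout
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID><Article><AuthorList>
    <Author><LastName>Alpha</LastName><ForeName>A</ForeName>
      <AffiliationInfo><Affiliation>Institute One, India</Affiliation></AffiliationInfo>
    </Author>
    <Author><LastName>Beta</LastName><ForeName>B</ForeName>
      <AffiliationInfo><Affiliation>Institute Two, Elsewhere</Affiliation></AffiliationInfo>
    </Author>
  </AuthorList></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID><Article><AuthorList>
    <Author><LastName>Alpha</LastName><ForeName>A</ForeName></Author>
    <Author><LastName>Gamma</LastName><ForeName>G</ForeName>
      <AffiliationInfo><Affiliation>Institute One, India</Affiliation></AffiliationInfo>
    </Author>
  </AuthorList></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>
"""


def parse_records(xml_text):
    """return [(pmid, [(author_name, affiliation_or_None), ...]), ...]."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        authors = []
        for author in article.iter("Author"):
            last = author.findtext("LastName")
            if last is None:            # e.g. collective/group authors
                continue
            fore = author.findtext("ForeName") or ""
            affiliation = author.findtext("AffiliationInfo/Affiliation")
            authors.append((f"{last} {fore}".strip(), affiliation))
        records.append((pmid, authors))
    return records


def coauthor_network(records):
    """return (node attribute dict author -> affiliation, edge weight dict)."""
    affiliation = {}
    weights = Counter()
    for _pmid, authors in records:
        for name, affil in authors:
            affiliation.setdefault(name, None)
            if affil and affiliation[name] is None:
                affiliation[name] = affil   # keep the first affiliation seen
        names = sorted({name for name, _ in authors})
        weights.update(combinations(names, 2))  # one count per shared paper
    return affiliation, dict(weights)
```

matching authors by name string alone is a simplification; real metadata needs disambiguation of authors who share a name or change affiliation.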
clearly, the pattern observed in the earlier networks is echoed here as well, with % of the network space representing applied phytochemistry, with the added benefit of being able to map organisations to research themes of interest, such as the invasion aspect of lantana camara. additional features of the bibliography networks can be observed in attribute networks, as compared to co-authorship networks; figure b shows the largest subnetwork of a, revealing the most highly associated institutions among existing collaborations, and this subnetwork should be important for decisions regarding setting up new collaborations, or for identification of hubs driving the existing body of work, towards bringing in a fresh outlook or scope. nodes in the b subnetwork are colored by degree, revealing institutions that have had the maximum collaborative interaction in the area of interest. interestingly, this subnetwork consists of organisations that are not only related by geographical proximity (like ivri and csir-ihbt both in palampur, and doon university, dehradun), but also connect a wide diversity of expertise (dept of biochemistry at panjab university and carbohydrate biotechnology at iit guwahati), both being valuable for a new researcher. additional features discernible from both sets of networks are related to linkages of research and development units, towards reduction of multidimensional problems related to research in invasion biology of lantana camara. for instance, agricultural applications are not very strongly represented in lantana bibliography networks (only %), but this is not a niche or gap in research; instead, it is a reflection of the nature of lantana camara, which is not an agricultural weed but one that more widely impacts conserved/protected areas.
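the two network properties used above, the largest connected subnetwork and node degree, can be computed directly from an edge list. the following is a minimal standard-library sketch; the institution names are placeholders, and the helper names (`build_adjacency`, `largest_component`, `degrees`) are assumptions for illustration rather than functions of any tool named in the text.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbours mapping."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def largest_component(adj):
    """Return the node set of the largest connected subnetwork (breadth-first search)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def degrees(adj, nodes):
    """Degree of each node, counting only links inside the given node set."""
    return {n: len(adj[n] & nodes) for n in nodes}


# Placeholder collaboration links between institutions (invented data).
links = [
    ("inst A", "inst B"),
    ("inst B", "inst C"),
    ("inst B", "inst D"),
    ("inst E", "inst F"),
]
adj = build_adjacency(links)
core = largest_component(adj)
core_degrees = degrees(adj, core)
hub = max(core_degrees, key=core_degrees.get)
print(sorted(core), "hub:", hub)
```

coloring nodes by the values in `core_degrees` reproduces the kind of degree-based view described for the largest subnetwork.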
as can be seen in figure c and c, the few studies that involve agricultural applications are also in the area of applied phytochemistry, suggesting an applicability for lantana phytochemicals in toxicity assays or weedicide development. the major research gap from an indian point of view in this field is that ecology and phytochemistry have not been studied together for this invasive species, despite being known to be connected concepts. another insight from these networks is the large number of disconnected subnetworks, suggesting a relative disconnect between researchers, groups and institutions working on lantana camara. this awareness immediately raises the need for a pan-india national network on invasion biology, bridging diverse fields of expertise and knowledge, to address this most aggressive species that has invaded almost all parts of the country. among the options that may enable more detailed literature review and decision making would be to expand the search to larger engines like web of science and jstor, as well as to get a global perspective by dropping the reference to india alone. web of science has over articles while jstor has over articles in this category. we have combined both searches and are presently developing r scripts to link the metadata from all sources for a more comprehensive and global analysis. another option is to perform the next iteration of this dss by adding more theme-specific keywords, phytochemistry being of foremost importance. with the largest number of papers available, new questions could be addressed in phytochemical evolution. another immediate consequence of this small study would be to expand the search to other invasive alien species, an effort that has already been undertaken in our group. the metadata can also be analyzed using co-occurrence visualization software like vosviewer (van eck and waltman, ), which forms the other module of the proposed dss (module b; figure ).
in this work, we have shown that even with a simple dss on a single iteration, the outcome not only enabled identification and optimization of multiple relevant keywords, but also put forth a complete picture of the various aspects of the research theme, which emerged at the initial stages itself. we also provide ways to scale up the search operation with additional databases, as well as to scale it down through selection of a range of keywords. two courses of action emerge from the dss and case study presented here. first, a gui tool could be designed with integrated modules of the above visualization-based analysis as separate programs, which when executed by the user, would act as a source of both quantitative and pictorial depictions of results (skupin, ) . the process of refining the input to such a tool could then be performed iteratively to obtain precise insights at each step of the procedure. higher levels of categorization tools exist beyond a high threshold of ease of access; thus, open-access tools are needed to tackle the issues of a good literature review for usage by a wide spectrum of researchers. second, an individual-based support system can be designed subject to the constraint of knowhow and ease of access to the above-mentioned network visualization tools. for example, searching of metadata from web-based databases can be scripted through text and data mining tools like contentmine, or custom-built in open source platforms like r or python. combined with this, knowledge of advanced network analysis tools like gephi, pajek etc. can then be used to analyse the metadata from various perspectives. this will help gather information that would assist the individual in making a holistic research strategy. in summary, we hope that our work will pave the way for new scientific-evidence based insights, policy decisions and future directions through streamlined network analytics.
in exceptional circumstances such as the present covid- lockdowns, a lot of review work and meta-analysis is being carried out globally. with a few strategic interventions, many of these in silico efforts could provide insights into gaps and opportunities in research themes across a range of disciplines.
author contributions: sb and gy designed the study and proof of concept. pm and ap collected the data. pm built the dss and performed the analysis. all authors contributed to preparation of the manuscript.
co-authorship networks among drdo
social network: a cytoscape app for visualizing co-authorship networks
information retrieval and text mining technologies for chemistry
from paragraph to graph: latent semantic analysis for information visualization proc
discovering trends in text databases
coauthorship networks and patterns of scientific collaboration proc
finding and evaluating community structure in networks
cytoscape: a software environment for integrated models of biomolecular interaction networks
the world of geography: visualizing a knowledge domain with cartographic means
a framework for the development of automated hypothesis generation based on mining scientific literature
text mining: generating hypotheses from medline
unsupervised word embeddings capture latent knowledge from materials science literature
vosviewer, a computer program for bibliometric mapping
the authors declare no conflicts of interest.
key: cord- -j mmx k authors: karasik, agnes; jones, grant d.; depass, andrew v.; guydosh, nicholas r. title: activation of the antiviral factor rnase l triggers translation of non-coding mrna sequences date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: j mmx k
ribonuclease l (rnase l) is activated as part of the innate immune response and plays an important role in the clearance of viral infections. when activated, it endonucleolytically cleaves both viral and host rnas, leading to a global reduction in protein synthesis.
however, it remains unknown how widespread rna decay, and consequent changes in the translatome, promote the elimination of viruses. to study how this altered transcriptome is translated, we assayed the global distribution of ribosomes in rnase l activated human cells with ribosome profiling. we found that rnase l activation leads to a substantial increase in the fraction of translating ribosomes in orfs internal to coding sequences (iorfs) and orfs within ’ and ’ utrs (uorfs and dorfs). translation of these alternative orfs was dependent on rnase l’s cleavage activity, suggesting that mrna decay fragments are translated to produce short peptides that may be important for antiviral activity. the detection of double stranded rna (dsrna) is a critical mechanism by which cells sense and defend against viral infections. in the cell, dsrna activates several pathways important for type i interferon production, host translational shut down, and ultimately viral clearance or apoptosis (hartmann, ; kang et al., ; malathi et al., ; yoneyama et al., ) . rnase l (ribonuclease latent), an endonuclease that broadly targets host and viral rnas at unn motifs (han et al., ) , is an important part of this antiviral defense network that is constitutively expressed in an inactive (monomer) form in differentiated cells (floyd-smith et al., ; zhou et al., ) . activation of rnase l during viral infections begins when oligoadenylate synthetases (oass) bind to and are activated by dsrna. once active, oass synthesize - -oligoadenylate ( - a) from atp (li et al., ; poulsen et al., ) . binding of - a to rnase l stabilizes the dimeric form of the protein and, in turn, activates the enzyme by bringing the catalytic endoribonuclease domains into close proximity (dong and silverman, ; han et al., ; huang et al., ) .
active rnase l is thought to be beneficial to the organism because it stimulates production of interferon (ifn), inhibits cell migration, and can trigger apoptosis of infected cells (banerjee et al., ; brennan-laun et al., ; castelli et al., ; hassel et al., ; zhou et al., ; zhou et al., ) . clearance of certain classes of viruses particularly depends on the activation of rnase l. for instance, mice lacking rnase l are not able to clear mouse encephalitis hepatitis virus, a member of the coronavirus family, resulting in higher mortality (ireland et al., ) . furthermore, mutations in oas genes are found in laboratory mice susceptible to west nile virus infection and in human patients with severe disease outcomes (lucas et al., ; mashimo et al., ; yakub et al., ) . while the activity of rnase l is thought to be antiviral, a role in bacterial infections is also emerging (li et al., ) . rnase l is also known to be a tumor suppressor in hereditary prostate cancer and other cancers (casey et al., ; long et al., ; wang et al., ) and was also identified as an oncogene in chronic myelogenous leukemia (lee et al., ) , further highlighting its role in human health. many aspects of how cleavage of host rnas benefits the organism are unknown. rnase l cleaves a broad spectrum of host rnas, including rrnas, trnas, mrnas and y-rnas (burke et al., ; donovan et al., ; rath et al., ; rath et al., ) . while cleavage of rrna was shown to not inhibit the activity of the ribosome (rath et al., ) , two recent studies established that, on average, mrna levels in the cell can be reduced up to ~ % due to rnase l (burke et al., ; rath et al., ) . this global effect was termed - a mediated decay ( - amd), analogous to nonsense mediated decay (nmd) . some antiviral mrnas, such as ifn-β, were shown to be somewhat less sensitive to repression by rnase l activation.
this was proposed to result from transcriptional compensation or lack of the most favorable cleavage sites (generally ua and uu motifs) (burke et al., ; floyd-smith et al., ; rath et al., ; rath et al., ) . it has also been reported that mrnas encoding ribosomal proteins or mirna binding sites are specifically targeted (andersen et al., ; rath et al., ) . rnase l could therefore be important for generally reshaping the transcriptome and translatome to enhance the response to viral infection. intriguingly, the rnase l cleavage fragments themselves have been proposed to be important since they can activate dsrna sensors, such as rig-i, mda , and pkr, that lead to ifn production or shutdown of translation initiation (malathi et al., ; manivannan et al., ) . it remains unknown whether the fragments themselves have additional roles, such as the ability to be translated into short peptides. there is some evidence that rnase l activation modulates translation. early studies identified rnase l inhibitor (rli) as a protein that binds rnase l. rli was later shown to be the large subunit ( s) ribosome recycling factor, atp binding cassette sub-family e member (abce ) (bisbal et al., ; khoshnevis et al., ; pisarev et al., ; young et al., ) . activated rnase l was also shown to bind to a translation termination factor, eukaryotic release factor (erf ) (le roy et al., ) . dual luciferase reporters in cell lysate suggest that activation of rnase l can induce translation of ' untranslated regions (utrs) (le roy et al., ) , an outcome that could be explained if rnase l inactivated these factors. to answer the question of how rnase l activation impacts translation, we performed ribosome profiling on rnase l activated a human lung epithelial cells. 
we measured changes in ribosome distribution across the transcriptome and found rnase l activation leads to a shift in the distribution of translating ribosomes to ' and ' utrs and internal open reading frames (orfs) within the coding sequence. this alternative translation was dependent on the cleavage activity of rnase l and therefore suggests that rna fragments can be translated, leading to the production of short peptides. since rnase l dramatically reshapes the transcriptome by degrading mrna (burke et al., ; rath et al., ) and may itself affect the translation process, we investigated the distribution of ribosomes in cells where rnase l was active by performing ribosome profiling (ribo-seq) (ingolia et al., ) (figure a ). active rnase l was shown to cleave both s and s rrnas at precise locations (cooper et al., ; rath et al., ; wreschner et al., ) . activation of rnase l is therefore traditionally assessed by using an rrna cleavage assay (wreschner et al., ) . this activation can be specifically achieved by transfection with purified - a (prepared as described in methods) or the double stranded rna mimic, poly i:c, that also activates additional dsrna sensors, such as pkr, rig-i and mda (chitrakar et al., ; de haro et al., ; martinand et al., ) . we performed most experiments in the a lung carcinoma cell line since these cells had demonstrated robust rnase l activation in previous studies (burke et al., ; chitrakar et al., ; rath et al., ) . we found that . hours treatment of wild type (wt) a cells with - µm - a was sufficient to generate readily detectable rrna cleavage products by using electrophoretic analysis (bioanalyzer) of purified total rna ( figure b ). the observed cleavage patterns were very similar when we treated with poly i:c ( . µg/ml, . hours), and no rrna cleavage was detected in rnase l ko cells ( figure b ), confirming cleavage specificity to rnase l. 
next, we performed ribosome profiling by purifying ribosomes over a sucrose cushion and selecting mrna fragments ( - nt) that correspond to ribosome footprints from - a treated (or untreated) wt and rnase l ko a cells (mcglincy and ingolia, ) . after library preparation and deep sequencing, ribosome protected footprints (referred to as footprints throughout the manuscript) were aligned to the human transcriptome and analyzed further (see methods). ribosome footprints are plotted by ' ends, rather than by ' ends or overall coverage, throughout the manuscript to enhance analysis of reading frame. first, we investigated changes in translation during rnase l activation by analyzing the differences between ribosome footprint distribution in - a treated and untreated cells. the mrna pool is known to be greatly reduced by activation of rnase l (burke et al., ; rath et al., ) . however, particular classes of mrnas, including those involved in host defense against viruses, were shown to be somewhat resistant to rnase l-mediated degradation. here, we employed ribosome profiling to evaluate whether ribosome occupancy levels also follow this trend when rnase l is active. we found significant changes in footprint distribution for many genes ( genes > -fold changes), including several pro-inflammatory chemokines and cytokines and regulators of immune response and cell physiology ( figure c and d). in particular, we found that transcripts previously shown to increase in relative abundance during activation of rnase l show a similar relative increase in our ribosome profiling data, such as il (interleukin- ) and egr (early growth response protein ) (burke et al., ) . thus, we confirm that differences in the abundance of ribosome footprints between rnase l activated and normal cells reflect changes in the abundance of these transcripts.
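a per-gene comparison of footprint abundance of the kind described above can be sketched as follows. the normalization (reads per million), the pseudocount, and the gene names and counts are all assumptions made for illustration; the analysis actually used is described in the methods and may differ.

```python
import math


def reads_per_million(counts):
    """Scale raw footprint counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(treated, untreated, pseudocount=1.0):
    """log2(treated / untreated) per gene after library-size normalization.

    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    t = reads_per_million(treated)
    u = reads_per_million(untreated)
    shared = t.keys() & u.keys()
    return {g: math.log2((t[g] + pseudocount) / (u[g] + pseudocount)) for g in shared}


# Invented footprint counts for two hypothetical genes.
treated_counts = {"geneA": 300, "geneB": 700}
untreated_counts = {"geneA": 100, "geneB": 900}

lfc = log2_fold_changes(treated_counts, untreated_counts)
for gene in sorted(lfc):
    print(gene, round(lfc[gene], 3))
```

because footprint counts reflect both transcript abundance and ribosome loading, a change computed this way is only a relative measure within each library.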
we also noted that footprints from the genes encoding interferon-b and g were not observed in - a treated cells during the time course of the treatment, in agreement with previous findings (burke et al., ) . this absence suggests that pathways responsible for interferon production, such as those mediated by mda-i and rig-i, are not activated. since it was reported that rnase l activation increased translation of 'utr regions downstream of stop codons by interfering with factors that promote translation termination (erf ) or ribosome recycling (abce ) (le roy et al., ) , we assessed the level of ribosome footprints in 'utrs. this level was assayed by computing the ratio of footprints in every 'utr relative to its respective main orf within the coding sequence (density ratio, 'utr:orf) for each transcript . we found that the ratios globally increased when wt a cells were treated with - a, ( figure a , red dots above diagonal). this finding was consistent across several replicates as evidenced by boxplot analysis (supplemental figure a -c). we did not observe this trend in rnase l ko cells, showing that rnase l activation is required for the process. these data suggest that activation of rnase l increases translation of 'utrs relative to coding sequences. the increased presence of ribosomes in the 'utr was reproducible in other cell lines, such as hap and hela, indicating that these observations are not cell line specific (supplemental figure d -g). additionally, we tested whether the form of - a used to activate rnase l modulated this observation. while the trimeric form of - a we used for most experiments here is thought to be the shortest - a molecule capable of activating rnase l, longer forms also achieve activation (han et al., ; le roy et al., ) . to verify that other forms of - a don't change the observed effect on translation in 'utrs, we tested whether non-trimeric forms of - a (dimer and tetramer) have the same effect. 
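the density ratio described above (footprint density in the utr divided by footprint density in the main orf of the same transcript) reduces to a short calculation. the sketch below is illustrative only: the tuple layout, the gene names, and the counts are invented, and any read-count thresholds applied in the actual analysis are not modeled.

```python
def density(count, length):
    """Footprints per nucleotide; None when the region has zero length."""
    return count / length if length > 0 else None


def utr_orf_ratio(utr_count, utr_len, orf_count, orf_len):
    """Ratio of UTR footprint density to main-ORF footprint density."""
    utr_d = density(utr_count, utr_len)
    orf_d = density(orf_count, orf_len)
    if utr_d is None or not orf_d:  # undefined when the ORF has no length or no reads
        return None
    return utr_d / orf_d


# (utr_count, utr_len, orf_count, orf_len) per transcript; invented numbers.
untreated = {"geneA": (5, 500, 900, 900), "geneB": (2, 400, 600, 1200)}
treated = {"geneA": (40, 500, 600, 900), "geneB": (12, 400, 300, 1200)}

ratios = {g: (utr_orf_ratio(*untreated[g]), utr_orf_ratio(*treated[g])) for g in untreated}

# Transcripts whose ratio rises on treatment correspond to points above the diagonal
# in a treated-versus-untreated scatter plot.
increased = sorted(
    g for g, (before, after) in ratios.items()
    if before is not None and after is not None and after > before
)
print(ratios, increased)
```

normalizing each utr to its own orf means that a global drop in mrna levels does not by itself change the ratio.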
we found that the tetramer also increased the 'utr:orf density ratio, much as observed for the trimer form of - a (in hela cells, supplemental figure e -g). in contrast, treatment with dimeric - a did not induce rrna cleavage, as expected (data not shown). it also did not increase relative translation of 'utrs, offering further evidence that the observed effects are dependent on the activation of rnase l (supplemental figure h and i). to further investigate the mechanism by which rnase l activation was promoting translation of 'utrs, we aligned genes by their stop codons and averaged their respective ribosome footprints ("metagene" analysis). as expected, the average of footprints mapping to coding sequences (upstream of stop codons) showed strong three nucleotide ( -nt) periodicity, reflecting elongation by the ribosome across the main orf ( figure b , left). in agreement with the 'utr:orf density ratio analysis, we observed an increase in the average level of footprints in the 'utr for wt, but not for rnase l ko cells that were transfected with - a ( figure b , right). in addition to this global analysis, individual analysis of 'utr regions from some transcripts, such as jun, actg and dusp , revealed heavy translation upon rnase l activation ( figure c ) . we then examined several properties of the 'utr average footprints to assess whether existing models (data reproduced in supplemental figure j ) could explain how ribosomes bypass stop codons. it is feasible that lack of the termination factor (erf ) would result in the ribosome "reading through" the stop codon and producing a c-terminally extended protein product by allowing the binding of a near-cognate trna and incorporation of an amino acid by the catalytic center at the stop codon. one of the hallmarks of "readthrough" of stop codons is that ribosome footprints in the 'utr maintain the reading frame established by the main orf (wangen and green, ; young et al., ; young et al., ) (supplemental figure j ). 
if activated rnase l triggered readthrough by inhibiting erf via its proposed interaction (le roy et al., ) , we would expect to observe -nt periodicity in the 'utr. however, the footprints that mapped to the 'utr region, on average, lacked this periodicity ( figure b , right). this suggests that the observed effects of rnase l activation are not caused by readthrough. alternatively, loss of ribosome recycling activity due to lower levels of abce can increase average 'utr footprint levels in all reading frames due to reinitiation of translation . when ribosome recycling is inhibited, the encoded protein is released at the stop codon. however, because the unrecycled ribosome remains on the mrna, it can move into the 'utr, translate downstream sequences, and produce short peptides. since rnase l is known to directly interact with abce (bisbal et al., ) , it is possible that this interaction interferes with abce 's s ribosome recycling function. however, the signature of lowered abce levels is an increased peak at the stop codon, due to the accumulation of unrecycled s ribosomes (mills et al., ; young et al., ) (supplemental figure j ). however, stop codon peaks in the metagene analysis did not change in a way that supports this hypothesis, indicating that - a treatment did not affect ribosome splitting ( figure b ). in addition, other forms of recycling defect, such as loss of s subunit recycling, could explain the increased translation in 'utrs. however, defects in s recycling are known to result in ribosome queuing upstream of the stop codon (young et al., ) (supplemental figure j ), and this defect was not observed in rnase l activated cells ( figure b ). 
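the periodicity argument above can be made concrete by tallying footprint ends by reading frame. in this sketch, positions are measured from a nucleotide known to be in the main orf frame, and a dominant frame indicates in-frame elongation (as expected for readthrough), while roughly equal thirds indicate no shared frame. the input format and the example counts are assumptions for illustration.

```python
def frame_fractions(position_counts):
    """Fraction of footprints in each of the three reading frames.

    position_counts maps a position (relative to an in-frame anchor nucleotide)
    to the number of footprint ends mapped there. Python's % operator returns a
    non-negative remainder, so positions upstream of the anchor are handled too.
    """
    totals = [0, 0, 0]
    for position, count in position_counts.items():
        totals[position % 3] += count
    grand_total = sum(totals)
    if grand_total == 0:
        return None
    return [t / grand_total for t in totals]


# Invented profiles: one ORF-like (periodic) and one without periodicity.
orf_like = {0: 90, 1: 5, 2: 5, 3: 80, 4: 10, 5: 10}
utr_like = {0: 10, 1: 10, 2: 10, 3: 10, 4: 10, 5: 10}

orf_fracs = frame_fractions(orf_like)
utr_fracs = frame_fractions(utr_like)
print(orf_fracs, utr_fracs)
```

a flat distribution in an averaged profile does not rule out translation; it is also what results when individual orfs in all three frames are averaged together.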
more broadly, any translation in the 'utr that results from loss of termination or recycling activity should be characterized by a loss in footprint levels at distal regions of 'utrs since ribosomes will eventually terminate translation and be recycled in the 'utr, even if inefficiently (young et al., ) (supplemental figure j ). however, a metagene plot computed from genes with long 'utrs revealed almost no decreasing trend on average and instead showed steady footprint levels over kb of 'utr (supplemental figure k , compare to j). metagene analysis therefore revealed that the increased translation in 'utrs did not mimic any known ribosome termination or recycling defect. since we observed increased relative ribosome occupancy in 'utrs, we next asked whether the ribosomes in 'utr regions were actively translating or nonproductively bound. we therefore looked for the hallmarks of active translation, such as peaks on start codons (indication of translation initiation) and -nt periodicity downstream from the dorf start codon (indication of elongation by ribosomes) in the 'utr. consistent with a model where ribosomes in the 'utr were translating, we observed ribosome footprint peaks in 'utrs at canonical (aug) or non-canonical (cug, uug and gug) start codons in all three frames with respect to the main orf. as an example, we show putative dorfs with the most prominent peaks in the 'utr of the actg mrna ( figure a ). to further confirm the existence of active translation, we extended the analysis globally by aligning 'utr start codons together and averaging ribosome footprints (metagene plots at dorfs). the dorf metagene analysis confirmed a large ribosome average footprint peak at canonical and noncanonical start codons. importantly, this analysis also revealed -nt periodicity after aug codons alone, consistent with aug codons efficiently initiating translation that leads to elongation across downstream sequences ( figure b , supplemental figure ).
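the metagene averaging described above (aligning many sites at a codon of interest and averaging the footprints around it) can be sketched as follows. this is a simplified illustration: the profiles are invented per-nucleotide footprint counts, the window sizes are arbitrary, and no per-gene normalization or read-count filtering is applied, which a real analysis would likely need.

```python
def metagene(profiles, sites, upstream=3, downstream=6):
    """Average footprint counts in a window around aligned sites.

    profiles: transcript id -> list of footprint counts per nucleotide
    sites:    list of (transcript id, position of the first codon nucleotide)
    Returns a list of length upstream + downstream; index `upstream` is the
    aligned position. Sites whose window runs off the transcript are skipped.
    """
    width = upstream + downstream
    totals = [0.0] * width
    used = 0
    for transcript, position in sites:
        profile = profiles[transcript]
        start, end = position - upstream, position + downstream
        if start < 0 or end > len(profile):
            continue
        for i in range(width):
            totals[i] += profile[start + i]
        used += 1
    if used == 0:
        return None
    return [t / used for t in totals]


# Invented footprint profiles with a peak at position 3 of each transcript.
profiles = {
    "tx1": [0, 0, 0, 9, 1, 0, 3, 0, 0, 3, 0, 0],
    "tx2": [1, 0, 0, 5, 0, 1, 1, 0, 0, 1],
}
sites = [("tx1", 3), ("tx2", 3), ("tx2", 8)]  # the last site is too close to the end

average = metagene(profiles, sites)
print(average)
```

a peak at the aligned position followed by a repeating three-nucleotide pattern is the signature of initiation and elongation referred to in the text.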
while we also observed a strong peak when non-canonical cug start codons were separately analyzed, the -nt periodicity downstream was weaker than that in the dorf metagene analysis for aug start codons ( figure c , supplemental figure ). these trends are consistent with cug being less efficient at initiating translation (mehdi et al., ) . in contrast, a similar analysis of untreated wt and treated or untreated rnase l ko cells did not show these characteristics, as expected ( figure ). despite these clear hallmarks of active dorf translation in rnase l activated cells, we wanted to further confirm that translation of these regions was occurring. we therefore employed a dual luciferase assay to directly measure the synthesis of a peptide encoded in the 'utr (houck-loomis et al., ) . in this assay, the main orf of the reporter gene encoded renilla luciferase (rluc) and the coding sequence for firefly luciferase (fluc) was inserted in the 'utr to act as a dorf (see methods). translation of the reporter was carried out in hela cell extracts in the presence or absence of different concentrations of - a. main orf and dorf translation could therefore be measured by assaying for the activity of each respective luciferase enzyme. we found that dorf fluc activity increased relative to that of the main-orf rluc in a - a concentration dependent manner ( . - µm) ( figure d ). notably, this concentration dependence of 'utr translation was similar to that observed for the dimerization of an rnase l based - a sensor, consistent with the idea that dorf translation is correlated with rnase l dimerization and activation (chitrakar et al., ) . this trend therefore provides added evidence that activation of rnase l increases translation of 'utr dorfs relative to the corresponding main orf. taken together, ribosome profiling and a luciferase reporter assay showed that ribosomes translate dorfs when rnase l is activated.
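the dual luciferase readout described above amounts to a ratio of ratios: dorf (fluc) signal over main-orf (rluc) signal, expressed relative to the reaction without activator. a minimal sketch follows; the luminescence values are invented, and normalizing to the untreated reaction is an assumption about how such data are commonly presented, not a statement of the exact procedure used.

```python
def relative_dorf_translation(fluc, rluc, fluc_control, rluc_control):
    """FLuc/RLuc ratio of a treated reaction relative to the untreated control."""
    return (fluc / rluc) / (fluc_control / rluc_control)


# Invented luminescence readings (arbitrary units).
control = {"fluc": 100.0, "rluc": 10000.0}
treated = {"fluc": 300.0, "rluc": 6000.0}

fold = relative_dorf_translation(
    treated["fluc"], treated["rluc"], control["fluc"], control["rluc"]
)
print(fold)
```

dividing by the rluc signal from the same reaction controls for overall changes in reporter mrna level or translation capacity, so the fold value reflects dorf translation relative to the main orf.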
to offer further context about the possible magnitude of this effect, we computed the ribosome density across individual dorfs (initiated by aug start codons, see methods). using conservative metrics for dorf translation (see legend), we found evidence for over a thousand dorfs being translated in response to rnase l activation. while most dorfs encode short peptides (less than aa), many were found to encode longer products ( figure e ). since we observed increased dorf translation in rnase l activated cells without apparent defects in translation termination or recycling, we asked whether these seemingly anomalous translation events were also occurring in upstream orfs (uorfs) in the 'utr. most uorfs encode short peptides and start with an aug or a near-cognate start codon. they are believed to act as regulators of downstream translation of coding sequences (hinnebusch, ; johnstone et al., ; spealman et al., ) but may have other roles, such as regulating nmd or producing functional peptides (chen et al., ; hinnebusch, ; johnstone et al., ; lin et al., ) . therefore, we computed the ratio of ribosome footprint density between 'utrs and their respective main orfs, similar to the 'utr analysis above. while basal translation of 'utrs in untreated cells is much higher than of 'utrs, we found that 'utr:orf ratios were further increased in rnase l activated cells ( figure a , dots above diagonal, supplemental a-c). as with 'utrs, the trend was eliminated in - a treated cells lacking rnase l, indicating that the effect is attributable to rnase l activation. these increases were also evident in metagene plots where we averaged ribosome footprints from all genes that had been aligned by their canonical start codons ( figure b , supplemental figure d ). 
we noted that the fold increase in 'utr ribosome occupancy was generally smaller as compared to 'utrs, mainly because 'utrs are normally translated at a moderate level, unlike 'utrs (ingolia et al., ) ( figure b vs b, right-hand panels). similar to the increase of ribosome protected footprints in 'utrs, we also noted these effects on individual genes ( figure c ). we further investigated whether these trends were caused by active translation of individual uorfs by computing metagene plots aligned at uorf start codons. we found that overall uorf translation, as evidenced by increased peaks on start codons and overall higher footprint levels across the uorfs, was enhanced in - a treated cells ( figure d -e and supplemental figure e and f). as with dorfs, nt-periodicity downstream of aug-initiated uorf start codons was evident, but less well-resolved for the case of non-canonical cug codons ( figure d -e). one hypothesis that could explain the increased translation of orfs in utr regions is that ribosomes are able to directly load onto mrna fragments that are created by rnase l and begin translation at an available start codon. to test this prediction, we also asked whether alternative translation events take place within the coding sequences. to this end, we examined whether initiation of translation could take place at aug codons found within the main orf. initiation of translation at such internal augs could result in one of two possible outcomes. if the aug was in the same frame as the main orf (frame ), we would expect an n-terminally truncated protein product compared to that encoded by the full main orf. if the aug was in the - or + frame with respect to the main orf, we would expect a ribosome to initiate and then terminate translation within an internal orf ("iorf") and generate a short peptide product. either event (in-frame or out-of-frame initiation) should be detectable by the averaging (metagene) approach we previously used for uorfs and dorfs. 
however, any signal from these events will be confounded by translation of the main orf. in the case of initiation in frame , we would expect a higher peak of ribosome footprints on the internal aug codons due to the additional reads from that initiation event. in the case of initiation in frame + or - , we would expect this peak to occur -nt away from the dominant -nt signal arising from the main orf. we therefore performed metagene analysis at internal aug start codons ( fig f- h), similar to the approach we used for uorf and dorf metagene analysis. however, here we analyzed translation of these events in the different frames (frame , - and + ) individually to clearly distinguish footprints deriving from internal initiation events from the dominant main orf footprints. we observed increased ribosome footprints, on average, at internal start codons in all frames in - a treated cells ( figure f -h). in the case of out-of-frame iorf initiation, we further analyzed translation downstream of the initiation event by plotting only footprint reads in the frame corresponding to the iorf (- or + ). this eliminates the signal from main orf translation and clearly shows continued elongation past the initiation event ( figure f and h, right panels). however, we could not analyze periodicity since this iorf analysis is limited to a single frame (+ or - ). this analysis was consistent across replicates (supplemental figure g -i). as an internal control, we also analyzed out-of-frame reads in one case (- reads for + iorfs) and confirmed that we only observed footprints in the expected frame (supplemental figure i , note peak is only present in + frame). this analysis, combined with our previous analysis of uorfs and dorfs, comprehensively examines translation across the entire mrna and shows that relative increases in alternative orf (altorf) translation during rnase l activation are not limited to any particular region.
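the classification of internal aug codons by reading frame, as used in the analysis above, can be sketched as follows. the frame labels ("0", "+1", "-1", defined by the offset from the main start codon modulo three) follow the usual convention and are an assumption here, as are the function name and the toy sequence, which is written as dna (atg rather than aug).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
FRAME_LABELS = {0: "0", 1: "+1", 2: "-1"}  # offset from the main start codon, modulo 3


def internal_start_codons(cds):
    """List internal ATG codons in a coding sequence.

    cds must begin with the main start codon and end with its stop codon.
    Returns (position, frame label, end) tuples, where end is the index just
    past the first in-frame stop codon found within the CDS, or None if the
    internal ORF would run on into the downstream UTR.
    """
    found = []
    for pos in range(1, len(cds) - 2):  # position 0 is the main start codon
        if cds[pos:pos + 3] != "ATG":
            continue
        end = None
        for i in range(pos, len(cds) - 2, 3):
            if cds[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        found.append((pos, FRAME_LABELS[pos % 3], end))
    return found


# Toy coding sequence: ATG GCC ATG GAT GTA AGC TAA
cds = "ATGGCCATGGATGTAAGCTAA"
internal = internal_start_codons(cds)
print(internal)
```

an in-frame internal start shares the main orf's stop codon and would give a truncated protein, whereas an out-of-frame start usually meets a stop codon quickly and encodes a short peptide, which is the distinction drawn in the text.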
while we found that altorfs in all mrna regions were translated upon the specific activation of rnase l by - a, we asked whether the effect would also manifest during broad activation of the innate immune response by the dsrna mimic, poly i:c. we expect that during poly i:c treatment, rnase l would be activated by naturally synthesized - a because poly i:c activates the oas enzymes and, to varying degrees, other dsrna sensors, such as rig-i, mda , tlr , and pkr (alexopoulou et al., ; hartmann, ; kato et al., ) . however, the activation of all of these pathways could mask the effects of rnase l activation. we therefore transfected a cells with poly i:c ( . µg/ml, . h), and performed ribosome profiling. we found that the effect of poly i:c on altorf translation was similar to - a ( figure a -f, supplemental figure a -f). we observed increased footprints in dorfs in 'utrs ( a-d), uorfs in 'utrs ( e), and iorfs in coding sequences ( f) upon poly i:c transfection. importantly, we observed a strong correlation in relative 'utr footprint levels observed between - a and poly i:c treatment (pearson's r = . and . (replicate and ), supplemental figure b ). we also detected a large increase in average ribosome footprint peaks on aug codons of dorfs when cells were treated with poly i:c ( figure d and supplemental figure d ). all of these effects of poly i:c treatment were substantially diminished in the rnase l ko cells ( figure a -d). the similarity in the effects suggests that translation of non-canonical regions occurs when rnase l is activated via naturally produced - a from broad activation of the antiviral response by double-stranded rnas. it should be noted that poly i:c treatment did result in slightly elevated 'utr ribosome footprints on some genes in a rnase l ko cell line (supplemental figure a ). 
however, this increase across replicates was generally weak (supplemental figure c ), the footprint pattern did not match that observed for - a treated cells (supplemental figure g) , and, in particular, peaks were not evident on aug codons (figures d, supplemental figure d) ). this suggests that poly i:c may affect 'utr translation via a weak, alternative mechanism that is independent of rnase l. having established that activation of rnase l increases the relative translation of altorfs, we further investigated the mechanism of this phenomenon by testing whether the catalytic activity of rnase l was required. by cleaving mrnas, rnase l generates ' ends that could be used by s ribosomal subunits for initiation on mrna fragments. alternatively, it is plausible that rnase l could promote translation of altorfs via interactions with translation factors independent of rna cleavage by, for example, increasing usage of a leaderless ( ' end independent) translation initiation mechanism (andreev et al., ) (see discussion). first, we checked whether the relative increase in ribosome footprints in 'utrs is correlated with levels of rna cleavage by rnase l. we reduced rna cleavage by figure d ). the h n rnase l was shown to be deficient in cleavage activity due to the mutated catalytic histidine in the active site. despite this mutation, it was also shown to maintain the ability to dimerize in the presence of - a and bind rna (han et al., ) . therefore, we expect that this mutant should also maintain the protein-protein and protein-rna interactions of wild-type rnase l. we confirmed the cleavage activity of transiently-expressed wild type rnase l by activating it with µm - a or . µg/ml poly i:c and performing rrna cleavage assays ( figure b ). in contrast, transiently-expressed mutant rnase l could not cleave rrnas during - a and poly i:c treatment ( figure b ). 
as anticipated, we found that the 'utr:orf ribosome footprint density ratio was higher when the transiently-expressed wild type rnase l was activated ( - a or poly i:c) as compared to when the catalytic mutant was expressed ( figure c ). we noted, however, that the extent of the effect was somewhat less than that with endogenous rnase l, likely due to the inherent inefficiency of the transient expression. this demonstrates that the relative increase in altorf translation is dependent on the catalytic activity of rnase l. in further support, local 'utr ribosome footprint patterns ( figure d ) and the characteristic 'utr aug peaks in average footprint data ( figure e and f) were present when wild type, but not mutant h n, rnase l was transiently expressed in rnase l ko cells. we also note that poly i:c treatment on h n expressing cells conferred increased footprints on some individual 'utrs ( figure d , actg data, row ), as we noted in rnase l ko cells previously ( figure s h ). as in rnase l ko cells ( figure s h ), these footprints did not exhibit the same pattern of occupancy as in poly i:c-treated cells transfected with wt rnase l (fig. d, rows and ). in particular, these footprints were not increased on start codons ( figure f ). this inability of the catalytic mutant rnase l to induce alternative mrna translation together with the correlation between rrna cleavage and relative 'utr ribosome occupancy indicates that the observed increase in relative translation of altorfs depends on the catalytic activity of rnase l. since we found that relative changes in altorf translation during rnase l activation are dependent on the cleavage activity of rnase l, we postulate that these effects rely on the presence of rnase l generated mrna fragments. these fragments could potentially be translated if s ribosomal subunits can attach to the exposed ' ends and initiate translation at altorfs ( figure g ). 
one prediction of this model is that these mrna fragments should be stable enough to be identified by rna sequencing methods. in support of this, it has been shown that rapid activation of rnase l leads to the accumulation of 'utr fragments (rath et al., ). we investigated the nature of this phenomenon further on a global level by analyzing published data from a polya + rna-seq experiment performed on poly i:c treated a cells. in this experiment, a cells were treated with µm poly i:c for hours, similar to the conditions that we employed for ribo-seq. much like our analysis of ribosome footprints, we computed the ratio of 'utr:orf polya + rna-seq density and compared it between the untreated and poly i:c treated cells. we observed an enrichment in reads derived from 'utrs (supplemental figure f ), indicating that stable 'utr fragments are present in the cell and could be translated as envisioned in figure g . while the previous experiments did not address the status of fragments derived from 'utrs or coding sequences, we surmise that they are also likely present in the cytoplasm. alternative orfs are translated during rnase l mediated antiviral response. it remains an open question whether these non-canonical translation events also occur during viral infections. to answer this question, we took advantage of existing ribosome profiling data from virus-infected cells. vaccinia virus (vacv), which is part of the poxvirus family, is known to be one of the most potent activators of the - a mediated antiviral response (rice et al., ). while this virus has a large dsdna genome, late viral rnas are thought to form dsrna that activates the oas-rnase l pathway (rice et al., ). we analyzed publicly available ribo-seq datasets from hela cells infected with vacv for variable durations (mock treated and vacv treated, h- h) (dai et al., ).
we found that 'utr:orf ribosome footprint density ratios increased in proportion to the length of treatment, as compared to a mock treated control, with hours treatment having the greatest effect (fig a, supplemental figure a ). the magnitude of the increase was comparable to what we observed in a cells treated with - a ( figure a ). the effects were also detectable, on average, in a metagene plot computed from reads mapping near the stop codon and at the individual gene level ( figure b and c). analysis of the dorf metagene plot for aug codons also indicated specific translation of dorfs, as indicated by the peak at the start codon ( figure d ). we also noted increases in uorf translation in 'utrs ( figure e, supplemental figure b ). these ribo-seq experiments on cells infected with vacv therefore indicate that translation of altorfs during the antiviral response can occur and may play a functional role in defense against certain classes of viruses. (rath et al., ; rath et al., ) , is that the rna fragments cleaved by rnase l can be translated prior to being fully degraded ( figure g ). in support of this model, rnase l dependent mrna fragments were previously found in s hela cell lysates and t d cells by poly a + rna-seq analysis upon - a treatment (rath et al., ). in addition, we showed that data from others demonstrate that these mrna fragments are stable enough to be detected in cells. these observations raise the question of how these fragments are stabilized to avoid rapid degradation. one possibility is that components of the major rna degradation pathways, such as xrn ( ' and ' exonuclease) or subunits of the exosome ( ' to ' exonuclease), are inhibited because they become overwhelmed by the sheer number of cleaved mrna substrates.
xrn was already shown to be an important regulator of the innate immune response due to its ability to degrade viral dsrna fragments and thereby tune the host's ability to detect the virus via other dsrna sensors (burgess and mohr, ; liu and moss, ). thus, inhibition of xrn could play a role in setting the stability of both host and viral rna fragments in the cell during rnase l activation. furthermore, the rnase l cleavage reaction produces rna termini, a '-oh on the ' cleavage fragment and a '- '-cyclic-phosphate on the ' cleavage fragment (cooper et al., ; wreschner et al., ), that are likely to afford some resistance to degradation because they are not ideal substrates for xrn and the exosome, respectively (pellegrini et al., ; shigematsu et al., ; sporn et al., ). another question raised by this model is how s ribosomal subunits are recruited to these fragments in order to initiate translation. canonical translation starts with the loading of the s subunit onto the ' end of the mrna. this process is enhanced by interactions with initiation factors, particularly eif e, which directly binds the ' -methylguanylate (m g) cap that is attached to mrnas in the nucleus (pelletier and sonenberg, ; yue et al., ). in the case of rnase l activation, the absence of a cap on all fragments (except the '-most fragment) would be expected to lower the efficiency of ribosome recruitment. however, many examples of cytoplasmic mrna "recapping" have been demonstrated (otsuka et al., ; trotman and schoenberg, ) and could account for why these fragments are translated. this sort of recapping process was shown to stabilize endonucleolytic decay intermediates that are generated during nmd in a b-thalassemic mouse model (lim et al., ; otsuka et al., ).
on the other hand, translation of these fragments could also occur in the absence of a ' cap by alternative mechanisms, such as use of a-rich sequence elements that have been shown to promote initiation (jia et al., ) or leaderless translation (andreev et al., ) . during leaderless translation, s ribosomes loaded with initiator met-trna are believed to bind directly to start codons without need of initiation factors or a ' cap. additionally, it is conceivable that there would be an overabundance of free ribosomes during rnase l activation because most mrnas are degraded, freeing up ribosomes that would normally be engaged in translating conventional capped mrnas. in this case where "ribosome homeostasis" is perturbed, it is predicted that low-efficiency, capindependent recruitment of s ribosomal subunits would be enhanced simply due to mass action (mills and green, ) . while the observation of altorf translation can be explained by a model where mrna fragments that have escaped degradation are translated ( figure g ), other possibilities can be considered. for example, it is conceivable that rnase l cleavage might damage ribosomal rna (rrna) to an extent that alters the fidelity of translation initiation in a way that permits these damaged ribosomes to translate regions outside of the main orf. it is also possible that changes in the abundance of initiation factors during the broad reprogramming of the transcriptome, or activation of stress pathways as a result of mrna loss, could facilitate translation of altorfs. it was also recently proposed that dorfs in the 'utr can directly recruit ribosomes and lead to initiation (wu et al., b) . thus, it is possible that rnase l activation could also modulate this process. our observation also raises the question of whether translation of these altorfs affects the innate immune response and offers any benefit to the host in clearing virusinfected cells. 
we predict that some of the rnase l cleaved mrna fragments contain a start codon, where translation can be initiated, but lack an in-frame stop codon due to cleavage at the ' end. therefore, we speculate that ribosomes translating such fragments will become trapped at the ' end of the rna and require rescue by ribosome rescue factors, such as pelo (guydosh et al., ) . this highlights a potential role of ribosome rescue factors during rnase l activation in maintaining a pool of free ribosomes. prolonged activation of rnase l could overwhelm the ribosome rescue system and lead to ribosome queues that are, in turn, recognized by sensors of ribosome stalling, such as zakα, that could trigger apoptosis mediated by jnk (vind et al., ; wu et al., a) . such a model could account for the observation that jnk is critical to rnase-l mediated apoptosis (li et al., ) , a process thought to be beneficial for eliminating cells infected by viruses. the peptide products that are produced by altorf translation during rnase l activation could also have functional roles. it is known that translation outside annotated sequences can have important functional outcomes (ingolia et al., ) and, according to the "immunogenic peptide hypothesis," altorfs can be a source of cryptic (unannotated) peptides presented by human leukocyte antigen i (hla-i) molecules (boon and van pel, ; yewdell, ) . if such peptides are seen as "non-self" by the adaptive immune system, they could aid in the clearance of viruses. in support of this idea, prior work demonstrated that cryptic peptides that are encoded by uorfs and dorfs can be effectively loaded onto hla-is (chen et al., ; schwab et al., ) and, in b-cells, . % and . % of the hla-i bound peptidome originated from 'utrs and 'utrs, respectively (laumont et al., ) . it is therefore conceivable that peptides generated from translation of rnase l cleavage fragments could also be presented on hla-i and benefit the host, acting as additional danger signals. 
however, given the short time frame of rnase l activation, detection of cryptic peptides in rnase l activated cells has proven to be difficult (data not shown). thus, methodology development is needed to assess whether altorf-derived peptides are stable and functional. in addition, it is possible that translation of the mrna decay intermediates generated by rnase l could serve other roles. for example, they could sequester ribosomes and thereby prevent translation of viral transcripts. on the other hand, translation of fragments could be detrimental to the host since ribosomes melt mrna secondary structure (mizrahi et al., ) and this structure in the fragments was suggested to be important for activating other dsrna sensors that trigger interferon production and assembly of stress granules (burke et al., ; malathi et al., ; manivannan et al., ) . finally, it is notable that the relative levels of altorf translation in cells infected with vaccinia virus (vacv) mimicked that in rnase l activated cells. this suggests that altorf translation is triggered by at least some viral infections. however, since many viruses are known to suppress activation of rnase l, the potential role of altorf translation in the clearance of the viruses is likely to vary. therefore, more targeted studies are needed to investigate the role of rnase l in altorf translation in the context of viruses. nevertheless, our findings reveal that widespread translation outside of coding sequences is a component of the innate immune response and add to the growing body of examples of alternative translation events. we believe further study of this phenomenon has the potential to identify new mechanistic targets for therapeutic intervention. the authors declare no conflicts of interest. further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, nicholas guydosh (nicholas.guydosh@nih.gov) . tissue culture cells. 
human cell lines hap , hela and a were cultured in dmem (hap , hela) or rpmi (a ) complemented with % fetal bovine serum (fbs). a cells were tested and negative for mycoplasma contamination throughout the study. mycoplasma testing was performed using emyco valid mycoplasma pcr detection kit (biolink). cells were incubated at °c in the presence of % co . synthesis and purification of - a. - a was synthesized enzymatically in vitro by recombinant human oas . first, recombinant oas (p ) containing an n-terminal his tag was expressed in bl (de ) e.coli as described before (poulsen et al., ) or with an alternative protocol using autoinduction media (studier, ) . then cells were collected by centrifugation at g for min at ˚c. then cells were lysed in b-per protein extraction reagent ml/ gram bacterial pellet in the presence of complete mini protease inhibitor cocktail (roche) for minutes at room temperature. next, bacterial lysate was cleared by centrifugation at , g for hour at °c. the supernatant was filtered ( µm pore size filter, millipore) and loaded onto a histrap (ge healthcare) nickel column. the column was then washed with wash buffer ( mm hepes (ph . ), mm nacl, % (vol/vol) glycerol, mm tcep and mm imidazole) and oas was gradient eluted with mm imidazole. protein fractions were then evaluated by sds-page and coomassie staining. then oas containing fractions were pooled, concentrated and buffer exchanged (zeba spin desalting column, k mwco, thermo scientific) in storage buffer ( mm hepes (ph . ), mm nacl, % (vol/vol) glycerol and mm tcep). final protein preparation was stored in storage buffer at - °c. concentration of oas was determined by using nanodrop spectrophotometer (thermo fischer scientific) (mw= . g/mol and e= , m - cm - ). to produce - a, µm purified oas was incubated with . od poly i:c in the presence of mm atp, mm hepes (ph . ), mm nacl, mm mgcl , % (vol/vol) glycerol, mm dtt at °c for hours. 
to stop the reaction, samples were incubated at ˚c for minutes. then the samples were filtered ( µm pore size millipore filter) and the different - a species were separated on a / monoq column as described before (poulsen et al., ). the same - a fractions from several runs were pooled and run again on a / monoq column to achieve higher concentrations. - a concentration was estimated by nanodrop spectrophotometer at nm. the resulting - a was then aliquoted and stored at - ˚c. after incubation ( - . h) cells were washed with ice-cold pbs and then flash frozen in liquid n . - µl of lysis buffer ( mm tris-hcl ph . , mm nacl, mm mgcl , mm dtt, µg/ml cycloheximide, % triton x- , . u turbo dnase) was added to the frozen cells, which were then thawed on ice. cells were scraped from the bottom of the flasks and lysates were transferred to eppendorf tubes and incubated on ice for an additional minutes before passing through gauge needles ten times. lysates were clarified by centrifugation for minutes at , g at °c. supernatants were flash frozen in liquid n and stored at - °c before proceeding to further steps. all other steps and construction of the illumina sequencing libraries were carried out as described before (mcglincy and ingolia, ). rrnas were removed by illumina ribo-zero rrna removal kit. quality of the libraries was assessed by bioanalyzer (agilent) using the high sensitivity dna kit (agilent). sequencing was performed on an illumina hiseq at the nhlbi or niddk dna sequencing and genomics core. ribo-seq experiments with transiently transfected a rnase l ko cells were carried out similarly to the above. wild type and catalytic mutant h n rnase l plasmid cdnas (the backbone is pcdna -frt-to with cmv promoter, invitrogen) used in this study were a generous gift from dr. robert hogg. originally, wild type rnase l was cloned from hek cells and the h n mutation was introduced by quick change mutagenesis.
the cdna sequences were identical to that of the reference sequence encoding the full-length rnase l. for electroporation experiments, confluent t s (~ - * cells) were trypsinized and resuspended in maxcyte electroporation buffer and electroporated in xoc cuvettes (maxcyte) in the presence of µg/ml pcdna -rnase l wild type or h n transfection quality plasmid with standard settings for a in the maxcyte atx electroporator. cells were divided and plated on three t flasks and incubated for h until cells reached % confluency before - a or poly i:c treatment. all other steps were carried out as described above. read processing. the fastq files were de-barcoded by the core facility and reads of - nucleotides were sorted by internal nucleotide sample barcode and trimmed of linkers by cutadapt. then contaminating trnas and remaining rrnas were filtered out by bowtie allowing two mismatches in -v mode. additionally, we used the -y option to increase sensitivity of the alignment. we created the noncoding rna fasta file by downloading rrna sequences from the silva project (release ) (quast et al., ) and trna sequences from gtrnadb (h. sapiens release ) (chan and lowe, ). after filtering out the non-coding rnas, an in-house python script (dedup) was used to remove pcr duplicates by comparing the nucleotide unique molecular identifiers (umi). then, the resulting fastq files were quality checked by fastqc. in particular, we found that the majority of the ribosome protected footprints were - nucleotides long. then umis were trimmed by cutadapt and reads were aligned to the human transcriptome. the hg genome and annotations originated at ucsc on august , and were downloaded from the illumina igenomes project. the tophat . and bowtie . . aligners were used and allowed up to mismatches (-n ). the -t, -g , and --nonovel-juncs options were used as further inputs to allow a single mapping per read to the annotated transcriptome only.
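the umi-based removal of pcr duplicates mentioned above can be sketched as follows. this is not the in-house dedup script, whose logic is not given in the text; it is a minimal illustration that assumes the umi is still part of each read sequence at this stage (the text states that umis are trimmed afterwards) and that two reads with an identical footprint-plus-umi sequence are pcr duplicates. the function names are hypothetical.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes the simple four-lines-per-record layout.
    """
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)              # the '+' separator line is skipped
        qual = next(it)
        yield header.strip(), seq.strip(), qual.strip()

def dedup_by_sequence(records):
    """Keep the first record for each distinct read sequence.

    Because the UMI is assumed to still be embedded in the read,
    identical sequences share both footprint and UMI and are treated
    as PCR duplicates of one original molecule.
    """
    seen = set()
    for header, seq, qual in records:
        if seq not in seen:
            seen.add(seq)
            yield header, seq, qual

if __name__ == "__main__":
    toy = [
        "@r1", "ACGTAA", "+", "IIIIII",
        "@r2", "ACGTAA", "+", "IIIIII",   # same footprint and UMI as r1
        "@r3", "ACGTAC", "+", "IIIIII",   # differs in the last base
    ]
    for rec in dedup_by_sequence(parse_fastq(toy)):
        print(rec[0], rec[1])
```

running the deduplication before umi trimming matters in this sketch: once the umi is removed, independent footprints with the same sequence could no longer be told apart from pcr copies.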
the resulting sam files containing aligned reads were sorted and indexed using samtools for downstream analysis. in order to view reads in a genome browser, wig files were generated by using the make_wiggle function of the plastid suite. we note that we used '-end assignment of footprints throughout the study since it offers a higher fraction of reads mapping to a single reading frame (reading frame of the main orf) than '-end assignment or coverage. initial read processing, bowtie alignments to ncrna, and tophat alignments were performed on the nih biowulf cluster. calculation of utr:orf density ratios. we quantitated the number of reads mapping to defined sub-regions (cds, 'utr, and 'utr) by using the plastid suite (dunn and weissman, ) and cs function. first, we used the generate mode to preprocess the annotation file used above and in the downstream analysis. then we obtained counts of aligned reads in these regions with count mode using options for ' end alignment and offset of nucleotides (corresponds to center of p site). raw read values were normalized to the total number of reads that mapped in the alignment. we then computed the density (in units of rpkm) in each region by dividing the number of counts by the region's length. then the ratio of utr and main orf raw reads (densities) were calculated. genes were excluded where there were less than raw reads in both the utr and orf unless otherwise indicated. metagene analysis. we created average ribosome footprint density (metagene) plots by using plastid's metagene function. first, the generate mode was used to define a window of nucleotides around the start and the stop sites that was free of alternative splicing. then we used count mode to calculate the mean ribosome occupancy across genes. 
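the utr:orf density ratio calculation described above can be sketched in a few lines. this is an illustration rather than the plastid-based workflow itself: the function names are hypothetical, the minimum read threshold is left as a parameter because its value is not legible in the text, and the exclusion rule is read here as "drop a gene only when both the utr and the orf fall below the threshold", which is one possible interpretation.

```python
def rpkm(count: float, length_nt: int, total_mapped: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return count * 1e9 / (length_nt * total_mapped)

def utr_orf_ratio(utr_count, utr_len, orf_count, orf_len,
                  total_mapped, min_reads):
    """Ratio of UTR density to main-ORF density for one gene.

    Returns None when the gene is excluded: either both regions have
    fewer than min_reads raw reads (assumed reading of the filter), or
    the ORF has no reads and the ratio would be undefined.
    """
    if utr_count < min_reads and orf_count < min_reads:
        return None
    if orf_count == 0:
        return None
    utr_density = rpkm(utr_count, utr_len, total_mapped)
    orf_density = rpkm(orf_count, orf_len, total_mapped)
    return utr_density / orf_density

if __name__ == "__main__":
    # toy gene: 10 reads in a 500-nt UTR, 200 reads in a 1000-nt ORF
    print(utr_orf_ratio(10, 500, 200, 1000, 1_000_000, min_reads=5))
```

note that the total number of mapped reads cancels in the ratio for a single sample, so the ratio depends only on the counts and lengths of the two regions; the normalization matters when densities themselves are compared between samples.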
metagene plots were normalized so that the contribution from each gene was equally weighted according to the density within its main orf, specified by positions to and - to - from start and stop codon plots, respectively. data traces are shifted so that peaks coincide with start or stop codons (approximate p site or a site, respectively). note that the peak -nt past the start codon occurs because all reads at this site have atg at their ' end and this amplifies with high efficiency during library creation. to determine changes in ribosome footprint distribution in rnase l activated cells we analyzed two complete and independent ribo-seq datasets (replicate and ). analyses were carried out using dteg.r as described (chothani et al., ), which builds on the widely used differential expression analysis software deseq ; however, dteg.r is able to take other factors, such as batch effects, into account. data input for dteg.r came from the plastid cs function, where raw read count values were used. gene ontology analysis was performed with the gorilla webserver (eden et al., ), where genes with the largest, significant increase (> fold) in ribosome occupancy were tested for enrichment against the background set of genes (two unranked lists of genes and function options). for more detailed analysis, we implemented a more stringent transcriptome-only analysis approach with a single transcript isoform per gene to enable higher precision mapping and eliminate spurious reads. in detail, we took reads that had been digitally subtracted for noncoding rna and aligned them to a transcriptome with bowtie version . . using the parameters -v ( mismatch allowed) and -y. the transcriptome, refseq select+mane (ncbirefseqselect), was downloaded from ucsc on april , and used for alignment after removal of duplicates on alt chromosomes. we then used custom python scripts to perform specialized average (metagene) analysis at start and stop codons. metagene analysis for reduced transcriptome alignment.
metagene analysis (metagene_m) was performed in a few special cases of long windows where the high-quality transcriptome annotation was required, using the transcriptome alignment described above (supplemental figure d). analysis was analogous to that performed by plastid. averages were computed around the window size of choice for each gene after normalizing to the total read density of the main orf. genes with < rpkm read density were not included and traces were shifted to correspond to the approximate a site. altorf metagene analysis. averages (posavg_m) around start codons in utr regions (figures - ) were computed by normalizing reads in a window of nt upstream and downstream of the codon of interest, after shifting them nt to correspond to the approximate p site, to the average read density in the main orf. to minimize bad annotations, we excluded genes where main orf footprint density measured < rpkm and cases where the average read density in the window was -fold greater than the main orf. for averages of start codons within coding sequences (iorfs and n-terminal truncations), each start codon was equally weighted in the average according to the total reads that mapped within the window of interest. note that the peak -nt past the start codon occurs because all reads at this site have atg at their ' end and this amplifies with high efficiency during library creation. catalog of dorf sites. read density on all possible aug-initiated dorfs was computed (dorflist) by summing reads mapping only in the frame of interest. these values were then used to compute the length of those dorfs whose relative expression most increased under - a treatment ( figure e ). gene models. example ribosome profiling data on particular gene models was generated by using writegene _m to show reads mapping to the reduced transcriptome alignment. rrna cleavage assay.
total rna was extracted from µl of lysates prepared for ribosome profiling using direct-zol mini kit (zymo) or trizol reagent (invitrogen) according to the manufacturer's protocol. the amount of total rna was computed by absorbance at nm measured by nanodrop (thermo fisher scientific) and then diluted to - ng/µl. then rna samples were run on bioanalyzer (agilent) using the rna nano kit (agilent). minute run-to-run shifts in band sizes are due to inherent limitations of the instrument. dual luciferase 'utr translation assay. dual luciferase plasmids were previously generated by cloning renilla luciferase (rluc) into the main orf followed by a stop codon of the pt cfe-chis plasmid backbone (see houck-loomis et al., ). to study 'utr translation, firefly luciferase (fluc) was inserted into the 'utr ( 'utr plasmid). these two reporter sequences were in the same frame and were separated by an nt long linker. while fluc did not contain its own aug start codon, the linker encoded a non-canonical cug start codon upstream of the cdna of fluc. as a control, the same cdna sequence was used without the stop codon between the rluc and fluc sequences (control plasmid). the product of the control plasmid is expected to have both rluc and fluc activity and represent the case where every ribosome translates both genes. the assay was performed by incubating the plasmids with or without the trimer form of - a in hela cell lysates at °c for minutes using the -step human coupled ivt kit (see details in key resources table). samples were then incubated on ice for minutes to rapidly terminate the reaction. luciferase activity was assayed by using a dual-glo luciferase assay (promega) to generate luminescence and measuring it with a berthold centro lb plate reader. experiments was evaluated by sds-page electrophoresis and immunoblotting. first, µl of lysate was mixed with sds sample buffer (invitrogen) then run on - % gradient mini-protean tris-hcl gel (biorad).
proteins were transferred to a . µm pvdf membrane using the trans-blot turbo system (biorad) and membranes were blocked in tris buffered saline, . % tween (tbst) and % milk for hour at room temperature. this was followed by incubation with primary antibodies overnight in tbst (tris-buffered saline plus . % tween) and % milk (rnase l antibody : , histone h antibody : dilution). after washing three times for minutes in tbst, secondary antibodies were incubated with the pvdf membrane for hour at room temperature (goat anti-rabbit antibody, biorad, : ). then, after additional washes ( times, minutes each) in tbst, the pvdf membrane was incubated with clarity western ecl substrate (biorad) for minutes and proteins were visualized by amersham imager . ribosome profiling. ribo-seq experiments were repeated at least twice except electroporation experiments ( figure ) where we used two different conditions ( - a and poly i:c treatment) instead of replicates to support our conclusions. each biological replicate dataset was analyzed individually as described above. analysis and comparison of all datasets are provided in the main and supplemental figures. we note that due to the variability in cell confluency and transfection efficiency of - a and poly i:c we also see some differences between utr:orf ratios in biological replicates. however, all of the - a and poly i:c treated wt a datasets exhibited increased utr:orf ratios when compared to their respective untreated dataset (transfection control). in the case of a rnase l ko datasets, while we found that nearly all of the - a and poly i:c treated cells did not exhibit increased utr:orf ratios, we noted one exception (supplemental figure a). in this case, untreated cells (replicate ) showed particularly low 'utr:orf ratios (see boxplot in supplemental figure c ), resulting in an apparent shift above the diagonal when - a treated rnase l ko data were plotted against it (supplemental figure a ).
since all - a treated rnase l ko datasets exhibited lower 'utr:orf ratios than - a treated wt datasets (first boxplots, supplemental figure c ), the shift in this case can be attributed to inherent variability of 'utr occupancy in the data. all plots were created by using prism . table s ). rnase l ko cells also shown as a control. note that y-axis is truncated causing the highest peaks in 'utrs to be higher than shown for jun and dusp . d average ribosome footprints around aug start codons in 'utrs show an increased peak at the start codon followed by footprints that exhibit -nt periodicity in - a treated wt cells, but not in rnase l ko cells. ribosome footprints are normalized to main orf footprint levels prior to averaging. e average ribosome footprint levels as in d, but for cug start codons, also show an increased peak. f average ribosome footprints around aug start codons in the + reading frame of coding sequences show an increased peak in - a treated wt cells (left panel). to further emphasize differences, we eliminate reads from the main orf by plotting frame + reads only (right panel). g average ribosome footprints around aug start codons internal to and in the same frame as the main orf of coding sequences show an increased peak in - a treated wt cells. h average ribosome footprint levels as in f but for the - frame. recognition of double-stranded rna and activation of nf-kappab by toll-like receptor ribosomal protein mrnas are primary targets of regulation in rnase-l-induced senescence a leaderless mrna can bind to mammalian s ribosomes and direct polypeptide synthesis in the absence of translation initiation factors rnase l is a negative regulator of cell migration cloning and characterization of a rnase l inhibitor. 
a new component of the interferon-regulated - a pathway t cell-recognized antigenic peptides derived from the cellular genome are not protein degradation products but can be generated directly by transcription and translation of short subgenic regions. a hypothesis rnase l attenuates mitogen-stimulated gene expression via transcriptional and post-transcriptional mechanisms to limit the proliferative response cellular '- ' mrna exonuclease xrn controls double-stranded rna accumulation and anti-viral responses rnase l reprograms translation by widespread mrna turnover escaped by antiviral mrnas germline mutations in the ribonuclease l gene in families showing linkage with hpc rnasel arg gln variant is implicated in up to % of prostate cancer cases a study of the interferon antiviral mechanism: apoptosis activation by the - a system gtrnadb: a database of transfer rna genes detected in genomic sequence pervasive functional translation of noncanonical human open reading frames real-time - a kinetics suggest that interferons beta and lambda evade global arrest of translation by rnase l deltate: detection of translationally regulated genes by integrative analysis of ribo-seq and rna-seq data ribonuclease l and metal-ion-independent endoribonuclease cleavage sites in host and viral rnas ribosome profiling reveals translational upregulation of cellular oxidative phosphorylation mrnas during vaccinia virus-induced host shutoff the eif- alpha kinases and the control of protein synthesis - a-dependent rnase molecules dimerize during activation by - a rapid rnase ldriven arrest of protein synthesis in the dsrna response without degradation of translation machinery plastid: nucleotide-resolution analysis of nextgeneration sequencing and genomics data gorilla: a tool for discovery and visualization of enriched go terms in ranked gene lists interferon action: rna cleavage pattern of a ( '- ')oligoadenylate--dependent endonuclease interferon action. 
covalent linkage of ( '- ')pppapapa( p)pcp to ( '- ')(a)n-dependent ribonucleases in cell extracts by ultraviolet irradiation regulated ire -dependent mrna decay requires no-go mrna degradation to maintain endoplasmic reticulum homeostasis in s structure of human rnase l reveals the basis for regulated rna decay in the ifn response nucleic acid immunity a dominant negative mutant of - a-dependent rnase suppresses antiproliferative and antiviral effects of interferon gene-specific translational control of the yeast gcn gene by phosphorylation of eukaryotic initiation factor an equilibrium-dependent retroviral mrna switch regulates translational recoding dimeric structure of pseudokinase rnase l bound to - a reveals a basis for interferon-induced antiviral activity ribosome profiling reveals pervasive translation outside of annotated protein-coding genes genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling rnase l mediated protection from virus induced demyelination decoding mrna translatability and stability from the ' utr upstream orfs are prevalent translational repressors in vertebrates expression analysis and genomic characterization of human melanoma differentiation associated gene- , mda- : a novel type i interferon-responsive apoptosis-inducing gene differential roles of mda and rig-i helicases in the recognition of rna viruses the iron-sulphur protein rnase l inhibitor functions in translation termination global proteogenomic analysis of human mhc class i-associated peptides derived from noncanonical reading frames a newly discovered function for rnase l in regulating translation termination regulation of human rnase-l by the mir- family reveals a novel oncogenic role in chronic myelogenous leukemia an apoptotic signaling pathway in the interferon antiviral response mediated by rnase l and c-jun nh -terminal kinase an essential role for the antiviral endoribonuclease, rnase-l, in antibacterial immunity activation of 
rnase l is dependent on oas expression during infection with diverse human viruses novel metabolism of several beta zero-thalassemic beta-globin mrnas in the erythroid tissues of transgenic mice impacts of uorf codon identity and position on translation regulation opposing roles of double-stranded rna effector pathways and viral defense proteins revealed with crispr-cas knockout cell lines and vaccinia virus mutants rnase-l deficiency exacerbates experimental colitis and colitis-associated cancer infection of mouse neurones by west nile virus is modulated by the interferon-inducible '- ' oligoadenylate synthetase b protein small self-rna generated by rnase l amplifies antiviral innate immunity rnase l amplifies interferon signaling by inducing protein kinase r-mediated antiviral stress granules the rnase l inhibitor (rli) is induced by double-stranded rna a nonsense mutation in the gene encoding '- '-oligoadenylate synthetase/l isoform is associated with west nile virus susceptibility in laboratory mice transcriptome-wide measurement of translation by ribosome profiling initiation of translation at cug, gug, and acg codons in mammalian cells ribosomopathies: there's strength in numbers dynamic regulation of a ribosome rescue pathway in erythroid cells and platelets virus-induced changes in mrna secondary structure uncover cis-regulatory elements that directly control gene expression identification of a cytoplasmic complex that adds a cap onto '-monophosphate rna chapter in vitro assays of ′ to ′-exoribonuclease activity the organizing principles of eukaryotic ribosome recruitment the role of abce in eukaryotic posttermination ribosomal recycling enzyme assays for synthesis and degradation of - as and other '- ' oligonucleotides the silva ribosomal rna gene database project: improved data processing and web-based tools human rnase l tunes gene expression by selectively destabilizing the micrornaregulated transcriptome concerted - a-mediated mrna decay and 
transcription reprogram protein synthesis in the dsrna response - a accumulates to high levels in interferon-treated, vaccinia virus-infected cells in the absence of any inhibition of virus replication constitutive display of cryptic translation products by mhc class i molecules generation of ′, ′-cyclic phosphate-containing rnas as a hidden layer of the transcriptome conserved non-aug uorfs revealed by a novel regression analysis of ribosome profiling data studies on nuclear exoribonucleases. . isolation and properties of the enzyme from normal and malignant tissues of the mouse protein production by auto-induction in high density shaking cultures a recap of rna recapping zakα recognizes stalled ribosomes through partially redundant sensor domains analysis of the rnasel gene in familial and sporadic prostate cancer stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides ribosomal rna cleavage, nuclease activation and - a(ppp(a 'p)na) in interferon-treated cells ribosome collisions trigger general stress responses to regulate cell fate translation of small downstream orfs enhances translation of canonical main open reading frames single nucleotide polymorphisms in genes for '- '-oligoadenylate synthetase and rnase l inpatients hospitalized with west nile virus infection drips solidify: progress in understanding endogenous mhc class i antigen processing the rna helicase rig-i has an essential function in double-stranded rna-induced innate antiviral responses rli /abce recycles terminating ribosomes and controls translation reinitiation in 'utrs in vivo tma /mct- , and tma /denr recycle post-termination s mammalian capping enzyme complements mutant saccharomyces cerevisiae lacking mrna guanylyltransferase and selectively binds the elongating form of rna polymerase ii mapping of the human rnasel promoter and expression in cancer and normal cells interferon action and apoptosis are defective in mice devoid of ', 
'-oligoadenylate-dependent rnase l impact of rnase l overexpression on viral and cellular growth and death vacv treated cells, but not in mock treated or h treated cells. e comparison of 'utr:orf density ratios of transcripts in vacv vs mock treated ( h, h) hela cells. genes plotted above the diagonal have increased relative 'utr translation in the treated vs untreated sample. each dot represents one gene model. genes plotted above the diagonal have increased relative 'utr translation in the treated vs untreated sample. data show increased 'utr translation occurs in multiple cell lines and that tetrameric - a also causes it. however, we note that hap cells display lower density ratios as compared to hela and a cells, likely due to the lower transfection efficiency of this cell line. g box plot representation of data shown in d-f, performed as in c. h analysis of 'utr:orf density ratios as in d-f, but for dimeric - a in a cells. data reveal dimeric - a does not increase relative 'utr translation. i box plot representation of h with data from trimeric - a also shown as reference. j different mechanisms of 'utr translation in yeast replicate analysis confirms increased relative translation of dorfs after - a utr metagene analysis of ribosome footprints showed increased aug and cug peaks in - a treated wt cells ( nd and rd replicates), but not in rnase l ko cells utr:orf density ratios of transcripts in trimer - a treated and untreated (a-b) wt (replicate and ) and rnase ko (replicate ). c box plot representation of data from figure a and metagene average analysis of trimer - a treated and untreated wt (replicate and ) aug (f) peaks in trimer - a treated and untreated wt (replicate and ) and rnase ko (replicate ) cells. g and h increased average ribosome footprints in frame + on frame + iorfs (g) or - (h) on - iorfs in - a treated cells in replicate and , similar to that of figure f and h.
i average ribosome footprints around aug start codons in the + reading frame of coding sequences show a peak in - a treated wt cells. data from figure f all plots include data from genes that passed threshold in both datasets (at least raw reads in both 'utrs and in the main orfs). the data are correlated (pearson's r = . and . ), showing that both treatments evoke a similar 'utr translation phenomenon. c box plot representation of data from figure a and s a. d 'utr metagene analysis of ribosome footprints showed increased aug peaks in poly i:c treated wt cells, but not in rnase l ko cells (replicate ). e box plot representation of data from figure e and additional replicates. f frame - ribosome footprints derived from frame - internal orf metagene average plot (generated similarly to figure d) showed increased peaks at iorf aug start sites. g example gene model of actg showing increased 'utr footprints in poly i:c treated wt cells relative to untreated cells (red vs black). however, we noted a small increase in footprints that follow a different pattern in poly i:c treated ko cells (compare purple to figure c). as discussed in the main text b comparison of 'utr:orf density ratios of transcripts in µm - a treated and untreated (treated for h min or h min) wt a cells. c comparison of 'utr:orf ribosome footprint density ratios of transcripts in . or µm - a treated and untreated wt a cells d expression levels of wild type and h n rnase l are comparable (within ~ -fold) in all samples used for ribo-seq. antibody against histone h was used as a loading control. rnase l ko cells (control) have no detectable rnase l expression. e box plot representation of 'utr:orf ribosome profile density ratios data from figure c and d. f comparison of 'utr:orf polya + rna-seq 'utr:orf density ratios for µg/ml poly i:c treated vs untreated a cells. 
data is processed and analyzed from publicly available polya + rna-seq datasets (accession numbers: srx and srx utr:orf density ratios from figure a and other data sets (geo accession number: srp ). b average ribosome footprints around aug start codons in 'utrs show an increased peak at the start codon in h vacv treated cells, but not in mock treated or h treated cells. c box plot representation of 'utr:orf ribosome profile density ratios data from figure e and other data sets key: cord- - udhvl n authors: schierding, william; horsfield, julia; o’sullivan, justin title: low tolerance for transcriptional variation at cohesin genes is accompanied by functional links to disease-relevant pathways date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: udhvl n variants in dna regulatory elements can alter the regulation of distant genes through spatial-regulatory connections. in humans, these spatial-regulatory connections are largely set during early development, when the cohesin complex plays an essential role in genome organisation and cell division. a full complement of the cohesin complex and its regulators is important for normal development, since heterozygous mutations in genes encoding these components are often sufficient to produce a disease phenotype. the implication that genes encoding the cohesin complex and cohesin regulators must be tightly controlled and resistant to variability in expression has not yet been formally tested. here, we identify spatial-regulatory connections with potential to regulate expression of cohesin loci, including linking their expression to that of other genes. connections that centre on the cohesin ring subunits (mitotic: smc a, smc , stag , stag , rad /rad -as; meiotic: smc b, stag , rec , rad l ), cohesin-ring support genes (nipbl, mau , wapl, pds a and pds b), and ctcf provide evidence of coordinated regulation that has little tolerance for perturbation. 
we identified transcriptional changes across a set of genes co-regulated with the cohesin loci that include biological pathways such as extracellular matrix production and proteasome-mediated protein degradation. remarkably, many of the genes that are co-regulated with cohesin loci are themselves intolerant to loss-of-function. the results highlight the importance of robust regulation of cohesin genes, indicating novel pathways that may be important in the human cohesinopathy disorders. the cohesin complex has multiple essential roles during cell division in mitosis and meiosis, genome organisation, dna damage repair, and gene expression . mutations in genes that encode members of the cohesin complex, or its regulators, cause developmental diseases known as the 'cohesinopathies' when present in the germline ; or contribute to the development of cancer in somatic cells [ ] [ ] [ ] . remarkably, cohesin mutations are almost always heterozygous, and result in depletion of the amount of functional cohesin without eliminating it altogether. complete loss of cohesin is not tolerated in healthy individuals . thus, cohesin is haploinsufficient such that normal tissue development and homeostasis requires that the concentrations of cohesin and its regulatory factors remain tightly regulated. the human mitotic cohesin ring contains four integral subunits: two structural maintenance proteins (smc a, smc ), one stromalin heat-repeat domain subunit (stag or stag ), and one kleisin subunit (rad ) . mutation of stag has been linked to at least four tumour types (e.g. ewing sarcoma, glioblastoma and melanoma , and bladder carcinomas) . strikingly, mutations in cohesin components are especially prevalent in acute myeloid leukaemia [ ] [ ] [ ] . in meiotic cohesin, smc a is replaced by smc b; stag / by stag ; and rad by rec or rad l . 
mutations in meiotic cohesin subunits are associated with infertility in men , chromosome segregation errors and primary ovarian insufficiency in women . cohesin is loaded onto dna by the scc /scc complex (encoded by the nipbl and mau genes, respectively) . mutations in the cohesin loading factor nipbl are associated with > % cases of cornelia de lange syndrome (cdls). remarkably, features associated with cdls are observed with less than % depletion in nipbl protein levels . the release of cohesin from dna is achieved by wapl, which opens up the interface connecting the smc and rad subunits. the pds a/pds b cohesin associated subunits affect this process by contacting cohesin to either maintain (with stag and stag ) or remove (with wapl) the ring from dna . spatial organization and compaction of chromosomes in the nucleus involves non-random folding of dna on different scales. the genome is segregated into active a compartments and inactive b compartments , inside which further organisation occurs into topologically associating domains (tads) interspersed with genomic regions with fewer interactions . cohesin participates in genome organisation by mediating 'loop extrusion' of dna to form loops that anchor tad boundaries. at tad boundaries, cohesin colocalizes with the ccctc binding factor (ctcf) to form chromatin loops between convergent ctcf binding sites. fine-scale genomic interactions include chromatin loops that mediate promoter-enhancer contacts. notably, the time-and tissue-specific formation of the fine-scale loops also requires cohesin. the spatial organization of the genome is particularly dynamic and susceptible to disruption during development. for example, changes to tad boundaries are associated with developmental disorders . furthermore, disruption of tad boundaries by cohesin knockdown can lead to ectopic enhancer-promoter interactions that result in changes in gene expression . 
rewiring of the patterns of coarse and fine-scale chromatin interactions also contributes to cancer development , , including the generation of oncogenic chromosomal translocations , . disease-associated gwas variants in non-coding dna likely act through spatially organized hubs of regulatory control elements, each component of which contributes a small amount to the observed phenotype(s), as predicted by the omnigenic hypothesis , . non-coding mutations at cohesin and cohesin-associated factors were found by genome wide association studies (gwas-attributed variants) to track with multiple phenotypes . however, the impact of genetic variants located within cohesin and its associated genes has not yet been investigated with respect to phenotype development. we hypothesised that cohesin-associated pathologies can be affected by subtle, combinatorial changes in the regulation of cohesin genes caused by common genetic variants within control elements. here, we link the d structure of the genome with eqtl data to determine if gwas variants attributed to cohesin genes affect their transcription. we test cohesin gene-associated gwas variants for regulatory connections beyond the cohesin genes (gene enrichment and regulatory hubs). we also identify all variants within each gene locus that had a previously determined cis-eqtl (gtex catalog) to the cohesin gene (eqtl-attributed variant list). as with the gwas variants, we tested these variants for the presence of spatial-regulatory relationships involving genes outside of the locus. only a few of these eqtl-attributed variants are currently implicated in disease pathways, but their regulatory relationships with cohesin suggest that they may be significant for cohesinopathy disorders.
results genetic variants with regulatory potential are associated with cohesin loci mitotic cohesin genes (smc a, smc , stag , stag , and rad ), meiotic cohesin genes (smc b, stag , rec , and rad l ), cohesin support genes (wapl, nipbl, pds a, pds b, and mau ) and ctcf were investigated to determine if they contain non-coding genetic variants (snps) that make contact in d with genes and therefore could directly affect gene expression (gwas-attributed and eqtl-attributed; table , table s ). a total of gwas-attributed genetic variants associated with disease were identified (methods) that mapped to a cohesin gene ( gwas; snps) or cohesin-associated ( gwas; snps) gene (table s ). twelve snps (blue , table s ) are not listed in the current gtex snp dictionary (gtex v ), while fourteen snps have virtually no variation in the gtex per-tissue analysis (i.e. minor allele frequency [maf] < . ; green, table s ) and so were discarded. snps passed all filters and were subsequently analysed using codes d (gwas-attributed list; table ). within the gtex catalogue, eqtl-attributed variants associate with regulation of the cohesin gene set (table s ). these variants associate with modified expression levels of cohesin genes in otherwise healthy individuals. fifty-five of these variants had a maf< . (green , table s ) and were filtered out of the eqtl-attributed set prior to codes d analysis ( variants passed maf filter; eqtl-attributed; table ). only three variants were shared between the gwas-attributed and eqtl-attributed variant lists, resulting in a total of cohesin-associated variants (gwas- and eqtl-attributed combined; table ), but only variants pass all filters (gwas- and eqtl-attributed combined; black, table s ). codes d integrates data on the -dimensional organisation of the genome (captured by hi-c) and transcriptome (eqtl) associations across multiple tissue types (table s and s ).
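The variant filtering and list merging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Variant` class, the function names, the rsIDs and the MAF cutoff passed in the example are all hypothetical (the actual cutoff is not recoverable from this text), and a `maf` of `None` is used here to stand for a SNP that is absent from the eQTL SNP dictionary.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Variant:
    rsid: str
    maf: Optional[float]  # None: SNP not listed in the eQTL SNP dictionary (assumption)


def passes_filters(variant: Variant, maf_threshold: float) -> bool:
    """Keep a variant only if it is in the dictionary and its MAF is not below the cutoff."""
    return variant.maf is not None and variant.maf >= maf_threshold


def combine_lists(gwas: Iterable[Variant], eqtl: Iterable[Variant],
                  maf_threshold: float) -> dict:
    """Filter both lists, then report list-specific, shared and combined rsIDs."""
    kept_gwas = {v.rsid for v in gwas if passes_filters(v, maf_threshold)}
    kept_eqtl = {v.rsid for v in eqtl if passes_filters(v, maf_threshold)}
    return {
        "gwas_only": kept_gwas - kept_eqtl,
        "eqtl_only": kept_eqtl - kept_gwas,
        "shared": kept_gwas & kept_eqtl,
        "combined": kept_gwas | kept_eqtl,
    }


# Toy data; rsIDs and the 0.05 cutoff are placeholders, not values from the study.
gwas_list = [Variant("rsA", 0.20), Variant("rsB", None), Variant("rsC", 0.001)]
eqtl_list = [Variant("rsA", 0.20), Variant("rsD", 0.10)]
result = combine_lists(gwas_list, eqtl_list, maf_threshold=0.05)
print(result)
```

Using sets keyed on rsID means a variant present in both lists is counted once in the combined total, which mirrors how the shared variants are handled in the text.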
we used the codes d algorithm to assign the variants to hubs of regulatory/functional impacts by examining their potential to regulate other genes. of the variants, four had zero significant eqtls, leaving variants with significant eqtls ( eqtl-attributed, gwas-attributed, and overlaps; table s and s ). however, for many of the variants, the eqtl target was not the gwas- or eqtl-attributed cohesin gene in the locus. of the variants with eqtls, / ( . %) eqtl-attributed variants and / ( . %) gwas-attributed variants had a physical (hi-c detected) connection and significant eqtl with their attributed cohesin gene ( total, overlaps; table ). strikingly, most of the variants attributed by gwas to cohesin genes were not confirmed by spatial connection, with only % being cis-eqtls for the attributed cohesin gene ( of , . %). after the codes d analysis, six of the cohesin genes have no gwas-attributed snps with a regulatory connection (stag , nipbl, ctcf, smc , rec , and rad -as ). therefore, the majority of gwas variants tested in proximity to cohesin loci have regulatory effects elsewhere in the genome. remarkably, despite five snps being attributed to nipbl by gwas, none of these were attributed to regulation of nipbl by our spatial eqtl analysis. of the cohesin or cohesin-associated genes with any gwas-attributed variants with cis-eqtls (rad , rad l , smc a, smc b, stag , stag , mau , pds a, pds b, and wapl), only the stag , mau , and pds b loci contain more than two variants with confirmed cis-eqtls. therefore, even those loci with confirmed variant-gene gwas attributions have very few variants with evidence of cis-eqtls. to further characterize the potential for the cohesin-associated variants to alter gene regulation, we analysed histone marks, dnase accessibility, and protein binding motifs (haploreg v . ) at each location (table s ). most variants reside within accessible chromatin (dnase: . %) and almost all ( . 
%) have at least one of three histone marks that are consistent with putative regulatory activity (promoter, . %; enhancer, . %; protein binding sites, . %). intriguingly, haploreg motif prediction identified of the variants ( different loci: mau , pds b, rec , smc b, stag , rad l , stag ) as residing within protein binding domains associated with cohesin-related dna interactions (i.e. rad , smc , and ctcf). therefore, most of the variants lie in regions associated with chromatin marks that highlight putative regulatory capabilities. codes d predicted out of variants to have significant regulatory activity. we compared this to alternative functional variant prediction methods. the deepsea algorithm, which predicts the chromatin effects of sequence alterations by analysing the epigenetic state of a sequence, identified of the variants as having functional significance (< . , table s ). predictsnp , which estimates noncoding variant classification (deleterious or neutral) from five separate prediction tools (cadd, dann, fat, fun, gwava tools), identified of variants as deleterious (table s ). therefore, only / variants have putative functional significance predicted by these tools ( deepsea, predictsnp, overlaps). these variants have support from multiple methods, suggesting a potential for stronger regulatory effects; the contrast with the haploreg chromatin marks and gtex-measured eqtls possibly indicates a heavy weighting against false positives in these prediction methods. in summary, gwas-attributed snps are enriched for chromatin marks (regulatory potential). however, fewer than half the snps in proximity to the cohesin and cohesin-associated genes physically connected with the cohesin genes they are predicted to regulate, suggesting that cohesin genes are not the direct targets of these regulatory variants.
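The comparison summarised above, between the cohesin gene a variant was attributed to and the genes it is actually connected to by a spatial eQTL, can be sketched as a simple lookup. This is an illustrative sketch only: the function name, the input layout (one attributed gene per variant, a set of spatially supported eQTL target genes per variant) and all identifiers in the toy data are assumptions made here, not outputs of the authors' analysis.

```python
from typing import Dict, Set


def confirmed_fraction(attributed_gene: Dict[str, str],
                       spatial_eqtl_genes: Dict[str, Set[str]]) -> float:
    """Among variants with at least one spatial eQTL, return the fraction whose
    eQTL targets include the gene the variant was originally attributed to."""
    with_eqtl = [rsid for rsid in attributed_gene if spatial_eqtl_genes.get(rsid)]
    if not with_eqtl:
        return 0.0
    confirmed = sum(
        1 for rsid in with_eqtl if attributed_gene[rsid] in spatial_eqtl_genes[rsid]
    )
    return confirmed / len(with_eqtl)


# Toy data with placeholder identifiers.
attributed = {"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C", "rs4": "GENE_A"}
spatial = {
    "rs1": {"GENE_A", "GENE_X"},  # attribution confirmed
    "rs2": {"GENE_Y"},            # eQTL exists, but with a different gene
    "rs3": set(),                 # no significant eQTL: excluded from the denominator
    "rs4": {"GENE_Z"},            # eQTL exists, but with a different gene
}
fraction = confirmed_fraction(attributed, spatial)
print(f"{fraction:.2f}")
```

Variants without any significant eQTL are left out of the denominator, in line with the text reporting confirmation rates among variants with eQTLs.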
pathway enrichment implicates coordinated regulation of cohesin with essential cell cycle genes codes d identified variants as being physically connected to, and associated with the expression levels of genes ( genes from eqtl-attributed variants, from gwas-attributed variants, and overlap) across , significant tissue-specific regulatory connections (fdr p< . ). physical connections comprised , fine-scale connections (cis, < mb from the variant), coarse-scale connections (trans-intrachromosomal, > mb), and connections on a different chromosome (trans-interchromosomal) (fig ; tables s , s ). of note, there is one cohesin-to-cohesin regulatory connection: rs , a gwas-associated variant (bipolar disorder) located within the mau locus has a significant trans-interchromosomal eqtl with rec . the gene overlaps between the gwas- and eqtl-attributed analyses are also intriguing. for example, variants in the gwas-attributed and eqtl-attributed lists, each from a different chromosomal location (rec and pds b), modify tcf l expression. as tcf l is part of the wnt pathway and is highly expressed in ovaries in the gtex catalogue, it is notable that this gene is regulated from variants in the rec locus (meiosis-specific cohesin). their co-regulatory relationships exemplify the systems of genome-regulatory hubs, with a total of genes overlapping the gwas- and eqtl-attributed analyses. the significant variants highlighted by the deepsea and predictsnp analyses functionally connected to ( . %) of the genes identified by codes d. thus, while deepsea and predictsnp assigned functionality to just / snps ( . %), these variants represent . % of the codes d-predicted modulatory connections. therefore, deepsea and predictsnp successfully selected for variants with highly enriched regulatory functions. we used g:profiler to assess the functional enrichment of the gwas- and eqtl-attributed genes ( table s ).
the genes are enriched for pathways that support cohesin function, as we currently understand it, within the nucleus. for example, functional enrichment in sister chromatid gene ontology categories includes five non-cohesin genes from our analysis: ctnnb , ppp r a, chmp a, cul , and dis l . moreover, cohesin's meiosis-specific role (smc b, stag , rec ) is enriched by two trans connections revealing regulation of meiosis-related genes (itpr , ppp r a; kegg pathway hsa ). collectively, these results suggest that expression of cohesin genes is coordinated with other genes that are involved in cell cycle control. meiosis-specific cohesin genes are functionally connected to kif and a germ cell pathway there are genes functionally connected to variants within the meiosis-specific cohesin loci smc b, stag , rec , and rad l (table s a) . we identified significant enrichment terms using g:profiler (table s b) , including the gene ontology "male germ cell nucleus" pathway. the gene ontology "male germ cell nucleus" pathway contains kif , stag , and rec (trans-interchromosomal connection from stag locus to kif ). within gtex, kif is highly expressed in brain and testis. the kif gene region was previously identified as significantly associated with a gwas of hypospadias, a birth defect presenting with a urethral opening located on the ventral side of the penis instead of at the tip of the glans. of particular relevance to the cohesinopathies, in which affected individuals present with cognitive defects, a mutation in human kif caused neurodevelopmental defects and intellectual disability . variation in kif expression has been associated with epilepsy , another notable cohesinopathy phenotype . it has also been proposed that kif is a conserved regulator of neurological development . 
previous findings of an association between kif and heart disease are not supported by gtex (kif has low expression in gtex heart tissue) and kif knockout mice have no heart phenotype , suggesting that the heart-associated kif variants might somehow affect the expression or function of other genes. we also identified an enrichment for e f transcription factor binding sites within our genes ( table s b ). the e f transcription factor modulates embryonic development and the cell cycle , , with a role in cancer development . of note, the e f gene is loss-of-function intolerant (pli=. ), consistent with its crucial role in the cell cycle and development. collectively, our results suggest that the genes are enriched for cell cycle-regulated genes, including e f targets, and that these genes might be indirect targets of cancer drug treatments modifying e f transcription factor activity. the cohesin gene regulatory network is intolerant to loss-of-function mutations a subset of human genes, whose activity is crucial to survival, are intolerant to loss-of-function (lof-intolerant) mutations . the gnomad catalog lists , genes: , lof-intolerant, , lof-tolerant, and undetermined ( . % lof-intolerance, defined as pli ≥ . ). all cohesin and cohesin-associated genes (except the meiosis-only rad l , smc b, stag , and rec ) are lof-intolerant (table ). we hypothesized that genes functionally connecting to variants in cohesin genes would also be enriched for loss-of-function intolerance. consistent with our hypothesis, of the genes ( . %) we identified are lof-intolerant ( are lof-tolerant and have undetermined pli; table s ). stratification of the eqtl genes based on the distance between the variant and gene (cis versus trans-acting eqtl gene lists) identified a marked increase in lof-intolerance for genes regulated by trans-acting eqtls (fig ; fig s ). this was especially pronounced for regulatory interactions that occurred between chromosomes (fig ; fig s ).
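The stratification described above, in which each variant-gene connection is labelled cis, trans-intrachromosomal or trans-interchromosomal and the share of LoF-intolerant genes is then compared across labels, can be sketched as below. This is a hedged illustration, not the authors' code: the function names, the use of a single gene coordinate, the handling of a distance exactly equal to the window, and the example values for the cis window and the pLI cutoff are assumptions, since the numeric thresholds are not recoverable from this text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def connection_type(variant_chrom: str, variant_pos: int,
                    gene_chrom: str, gene_pos: int, cis_window_bp: int) -> str:
    """Label a variant-gene connection by chromosome and linear distance."""
    if variant_chrom != gene_chrom:
        return "trans-interchromosomal"
    if abs(variant_pos - gene_pos) < cis_window_bp:
        return "cis"
    # Same chromosome, at or beyond the window (equality handling is an assumption).
    return "trans-intrachromosomal"


def lof_intolerant_fraction(records: Iterable[Tuple[str, Optional[float]]],
                            pli_cutoff: float) -> Dict[str, float]:
    """records holds (connection type, pLI) pairs; genes with undetermined pLI
    (None) are skipped rather than counted as tolerant."""
    counts = defaultdict(lambda: [0, 0])  # type -> [n intolerant, n with a pLI score]
    for ctype, pli in records:
        if pli is None:
            continue
        counts[ctype][1] += 1
        if pli >= pli_cutoff:
            counts[ctype][0] += 1
    return {ctype: n_int / n_scored for ctype, (n_int, n_scored) in counts.items()}


# Toy connections: (variant chrom, variant pos, gene chrom, gene pos, gene pLI).
toy = [
    ("chr1", 100, "chr1", 50_100, 0.95),
    ("chr1", 100, "chr1", 200_100, 0.10),
    ("chr1", 100, "chr1", 5_000_100, 0.99),
    ("chr1", 100, "chr7", 300, 0.97),
    ("chr1", 100, "chr9", 300, None),
]
CIS_WINDOW_BP = 1_000_000  # placeholder value, not taken from the text
PLI_CUTOFF = 0.9           # placeholder value, not taken from the text
records = [(connection_type(vc, vp, gc, gp, CIS_WINDOW_BP), pli)
           for vc, vp, gc, gp, pli in toy]
fractions = lof_intolerant_fraction(records, PLI_CUTOFF)
print(fractions)
```

Skipping genes with undetermined pLI is one possible convention; if the reported percentages use all connected genes as the denominator, those genes would instead need to be counted in the total.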
there was a significant correlation between distance and pli on the same chromosome (r=- . , p< . ). as the cohesin genes are intolerant to even small perturbations in gene expression, we hypothesized that the lof-intolerant genes that were regulated by elements within the cohesin genes would similarly only be tolerant to small allelic fold changes in gene expression. therefore, we tested for a correlation between pli and the allelic fold change (log afc) associated with the eqtl. we observed that pli is significantly correlated with afc (r=- . , p< . ; fig ) . ignoring the direction of the change in expression, by using the absolute value of log afc, we identified an even stronger negative correlation between pli and |afc| (r=- . , p< . ; fig ) . collectively, these results are consistent with the hypothesis that long distance (especially inter-chromosomal) regulatory connections exhibit greater tissue specificity and disease associations , because they are enriched for lof-intolerant gene sets. thus, the inter-chromosomal regulatory connections potentially highlight novel disease pathways associated with the known cohesinopathies. pathway analysis identifies cohesin gene connections with extracellular matrix production and the proteasome we hypothesized that genes connected to regulatory variants within the cohesin loci might be contributing to disease-related phenotypes. a g:profiler transfac analysis identified significant enrichment for target sequences for the egr transcription factor (table s ) . egr transcription is regulated by stress and growth factor pathways; its binding to dna is modulated by redox state, and its transcriptional targets include genes involved in extracellular matrix production . g:profiler enrichment also identified genes as part of the ubiquitin mediated proteolysis pathway (birc , birc , cul , ube f, ube k, ube b, ubr , cul , xiap; tables s [red] and s ). 
the proteasome pathway tags unwanted proteins to be degraded and defects in proteolysis have a causal role in a variety of cancers. notably, of the ( . %) genes identified in the ubiquitin-mediated proteolysis pathway are lof-intolerant. we hypothesized that the genes we identified would have pharmacokinetic interactions with cancer treatments. notably, we identified cyp a as being regulated by a cis interaction with a variant , bp away (intronic, cnpy ). cyp a encodes a protein within the cytochrome p family. defects in p are known to alter cancer treatment outcomes (drug metabolism, kegg hsa ). additionally, we identified a g:profiler enrichment for the e f transcription factor within our egenes. e f is up-regulated in response to treatment with doxorubicin or etoposide, topoisomerase blockers . indeed, many of the genes from the codes d analysis are targeted by drugs (table s ) . by contrast, for the cohesin genes, the drug-gene interaction database only lists known drug interactions for stag and stag . notably, consistent with our earlier observations, the drug-gene targets we identified include two genes targeted by topoisomerase blockers (e.g. papola, and xiap) , both of which are lof-intolerant genes. through the use of the "contextualize developmental snps using d information" (codes d) algorithm , we have leveraged physical proximity (hi-c) and gene regulatory changes (eqtls) to reveal how variation in putative enhancers can alter the regulation of cohesin genes and modifier genes. our analysis has identified eqtls that link genes associated with the mitotic cohesin ring genes (smc a, smc , stag , stag , and rad /rad -as), meiosis-specific cohesin ring genes (smc b, stag , rec , and rad l ), cohesin-associated support proteins (wapl, nipbl, pds a, pds b, and mau ) and ctcf. 
collectively, these results form an atlas of functional connections from cohesin genes to proximal and distal genes, some of which reside on different chromosomes from the regulatory elements. these results agree with previous findings that spatial eqtls mark hubs of activity across a multi-morbidity atlas . only % of variant-gene mappings in the gwas catalogue were supported by spatial cis-eqtls. therefore, although several gwas-associated disease snps have been linked with cohesin, only a minority have supporting evidence that these variants actually regulate the cohesin genes. as such, our results call into question the validity of many of the previous associations between non-coding genetic variants and the cohesin genes that have been made in the gwas catalogue. previous reports have suggested that cis-eqtl gene sets are depleted of lof-intolerant genes when compared to similarly sized sets of non-eqtl genes . against this background, our finding that the genes regulated by variants in cohesin loci are enriched for lof-intolerant genes is notable. the apparent bias in existing studies can be explained by eqtl studies being predominantly focused on nearby (cis) variant-gene transcriptional connections and hi-c studies focusing on local changes in tad structure, due to technical limitations in the current analysis pipelines. we found that the greater the distance separating the eqtl and target gene, the more likely the target gene was to be lof-intolerant (i.e. over % of trans-interchromosomal interactions involved lof-intolerant genes). this finding is consistent with studies that show that long distance connections exhibit greater disease- and tissue-specificity , , . the demonstration that the cohesin genes are also lof-intolerant agrees with their recognised haploinsufficiency in human developmental disease , . notably, the genes that were enriched within pathways of pathological importance (e.g.
of the genes in the ubiquitin-mediated proteolysis pathway gene set) were more likely to be lof-intolerant. this is consistent with previous findings that eqtl-identified genes are enriched for genome-wide disease heritability, and that the subset of eqtl genes with lof-intolerance shows even larger enrichment for genome-wide disease heritability . the codes d method identified eqtl links between cohesin genes and other loci not related to cohesin. two pathways emerged from this analysis. firstly, spatial eqtl connections with cohesin genes identified an enrichment for genes that are regulated by zinc finger transcription factors including egr and znf . egr positively regulates extracellular matrix (ecm) production. interestingly, we recently observed widespread dysregulation of extracellular matrix genes upon deletion of cohesin genes in leukaemia cell lines . this supports the idea that regulation of the cohesin complex is tightly associated with ecm production. additional support for this is derived from the observation that cohesin subunit smc exists in the form of an extracellular chondroitin sulfate proteoglycan known as bamacan . additionally, in asthma, smc upregulation significantly affected ecm components . the ecm facet of cohesin biology is relatively under-explored and is worthy of further investigation. secondly, spatial eqtl connections with cohesin genes identified an enrichment for genes that encode effectors of the proteasome pathway. the stability of many cohesin proteins is regulated by the proteasome pathway , , , but aside from this, genetic interactions between cohesin genes and proteasome pathway genes remain unexplored. in conclusion, many studies of mutations focus on the impact of coding-region variation, relying on natural knockouts (especially missense and loss of function variants) to identify gene function.
our analysis highlights what those studies might be missing: sets of co-ordinated genes important to disease but largely intolerant to lof mutation in healthy individuals. we identified a novel set of genes which are regulated by elements within the cohesin genes. we found that many of the pathways and transcription factor binding sites enriched within these genes were relevant to disease pathways in development and cancer. moreover, drug-gene interactions further reinforce the importance of these connections to cancer drug treatments and in particular topoisomerase-targeting drugs. as such, our results support recent reports of the importance of long-distance regulation as a key driver of phenotype development .

methods

a large number of gwas studies have mapped phenotypic variation to cohesin ring gene loci

we searched the gwas catalogue for snps mapped or attributed to mitotic cohesin ring genes (smc a, smc , stag , stag , and rad /rad -as), meiosis-specific cohesin ring genes (smc b, stag , rec , and rad l ), cohesin-associated support proteins (wapl, nipbl, pds a, pds b, and mau ), and ctcf from gwas studies covering a large assortment of altered phenotypes and pathologies across most tissues in the body (gwas-attributed). genomic positions of snps were obtained from dbsnp for human reference hg . beyond variants with association to disease, we searched the gtex catalogue for cis-regulatory variants (variants within mb) that modify the expression of either cohesin ring genes (smc a, smc b, smc , stag , stag , stag , rad , rec , and rad l ), cohesin support genes (wapl, nipbl, pds a, pds b, and mau ), or ctcf in one or more of tissues across the human body (eqtl-attributed). unlike gwas variants, these variants have no inherent association to a phenotype (except the overlaps), as gtex contains individuals that were relatively healthy prior to death. thus, these variants explain variation in gene expression in a normal, mostly older cohort.
genomic positions of snps were obtained from dbsnp for human reference hg . for all gwas-attributed and eqtl-attributed variants, spatial regulatory connections were identified through genes whose transcript levels depend on the identity of the snp, supported by both spatial interaction (hi-c data) and expression data (expression quantitative trait locus [eqtl]; gtex v ) using the codes d algorithm (https://github.com/genome d/codes d-v ) , . spatial-eqtl association p-values were adjusted using the benjamini-hochberg procedure, and associations with adjusted p-values < . were deemed spatial eqtl-egene pairs. variants not found in the gtex catalogue or variants with a minor allele frequency below % were filtered out due to the sample size of gtex at each tissue. to identify snp locations in the hi-c data, reference libraries of all possible hi-c fragment locations were identified through digital digestion of the hg human reference genome with the same restriction enzyme employed in preparing the hi-c libraries (i.e. mboi, hindiii). digestion files contained all possible fragments, from which a snp library was created, comprising all genome fragments that contain a snp. next, all snp-containing fragments were queried against the hi-c databases to find distal fragments of dna which spatially connect to the snp-fragment. if the distal fragment contained the coding region of a gene, a snp-gene spatial connection was confirmed. there was no binning or padding around restriction fragments to obtain gene overlap. to limit technological challenges, gene transcripts for both the spatial and eqtl analyses used the gencode transcript model.
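the digital digestion and snp-fragment lookup described above can be illustrated with a minimal python sketch. this is not the codes d implementation: the function names (`digest`, `fragment_for_position`, `snp_fragment_library`), the toy sequence, the snp identifiers, and the use of 0-based coordinates with half-open fragment intervals are all assumptions made for illustration. the recognition sites are the standard ones for mboi (^gatc) and hindiii (a^agctt).

```python
import bisect
import re

# Recognition site and cut offset within the site for each enzyme.
# MboI cuts before the G of GATC; HindIII cuts after the first A of AAGCTT.
ENZYMES = {"MboI": ("GATC", 0), "HindIII": ("AAGCTT", 1)}


def digest(sequence, enzyme):
    """Return restriction fragments as half-open (start, end) intervals."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # A lookahead pattern reports every occurrence, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = sorted(set([0] + cuts + [len(seq)]))
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def fragment_for_position(fragments, pos):
    """Return the index of the fragment containing a 0-based position."""
    starts = [start for start, _ in fragments]
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0 or pos >= fragments[i][1]:
        raise ValueError(f"position {pos} is outside the digested sequence")
    return i


def snp_fragment_library(fragments, snps):
    """Map fragment index -> SNP ids, keeping only fragments that hold a SNP."""
    library = {}
    for snp_id, pos in snps.items():
        library.setdefault(fragment_for_position(fragments, pos), []).append(snp_id)
    return library


# Toy example (hypothetical sequence and SNP ids, not real data).
toy_sequence = "AAGATCTTTTGATCCC"
toy_fragments = digest(toy_sequence, "MboI")
toy_library = snp_fragment_library(toy_fragments, {"rsA": 5, "rsB": 12, "rsC": 6})
```

in a full pipeline each snp-containing fragment would then be used as the query key against the hi-c contact tables; because no binning or padding is applied, the fragment boundaries themselves define which distal fragments count as connected.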
spatial connections were identified from previously generated hi-c libraries of various origins (supp table ): ) cell lines gm , hmec, huvec, imr , k , kbm , hela, nhek, and hesc (geo accession numbers gse , gse , and gse ); ) tissue-specific data from encode sourced from the adrenal gland, bladder, dorsolateral prefrontal cortex, hippocampus, lung, ovary, pancreas, psoas muscle, right ventricle, small bowel, and spleen (gse ); and ) tissues of neural origin from the cortical and germinal plate neurons (gse ), cerebellar astrocytes, brain vascular pericytes, brain microvascular endothelial cells, sk-n-mc, and spinal cord astrocytes (gse , gse , gse , gse , gse ), and neuronal progenitor cells (gse ).

the human transcriptome consists of genes with varying levels of redundancy and critical function, resulting in some genes being intolerant to loss-of-function (lof-intolerant) mutation. this subset of the human transcriptome is posited to also be more intolerant to regulatory perturbation. the gnomad catalog lists , genes and their likelihood of being intolerant to loss-of-function mutations (pli), resulting in , lof-intolerant, , lof-tolerant, and undetermined ( . % lof-intolerance, defined as pli ≥ . ) . we tested all cohesin, cohesin-associated genes, and those from our analysis (gwas- and eqtl-attributed) for lof-intolerance, comparing our cis and trans-acting eqtl gene lists for enrichment for lof-intolerance. as pli is bimodal and non-normally distributed, we tested both pli raw values as well as pli grouping (tolerant vs intolerant) for correlation between eqtl effect size (log allelic fold change, afc) and intolerance to disruption (pli). we considered both afc and its absolute value (direction of effect ignored), as it has been suggested that the eqtl effect direction is determined by how the minor allele is defined within the population, not the actual molecular impact of the eqtl on the cohesin connection.
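the correlation tests described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' analysis code: it uses a plain pearson correlation (r values are reported, but the estimator is not named in this passage), toy data, and a hypothetical `threshold` argument for the tolerant versus intolerant grouping, because the cutoff value is not legible in the text above.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def pli_afc_correlations(pli, log_afc, threshold):
    """Correlate pLI (raw and grouped) with signed and absolute log aFC.

    `threshold` is a caller-supplied cutoff for calling a gene intolerant;
    it is an illustrative parameter, not a value taken from the paper.
    """
    abs_afc = [abs(a) for a in log_afc]  # direction of effect ignored
    group = [1.0 if p >= threshold else 0.0 for p in pli]  # 1 = intolerant
    return {
        "pli_vs_afc": pearson_r(pli, log_afc),
        "pli_vs_abs_afc": pearson_r(pli, abs_afc),
        "group_vs_afc": pearson_r(group, log_afc),
        "group_vs_abs_afc": pearson_r(group, abs_afc),
    }


# Toy values only (hypothetical genes; 0.9 is an assumed cutoff for this example).
toy_pli = [0.99, 0.95, 0.10, 0.02]
toy_log_afc = [0.1, -0.2, 1.5, -1.2]
toy_result = pli_afc_correlations(toy_pli, toy_log_afc, threshold=0.9)
```

correlating the binary grouping with effect size is equivalent to a point-biserial correlation, which is one simple way to handle the bimodal pli distribution; a rank-based test would be a reasonable alternative.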
this analysis highlights the significance of long-distance gene regulation on otherwise mutationally-constrained (lof-intolerant) genes. all genes from the gwas-attributed and eqtl-attributed analyses were then annotated for significant biological and functional enrichment using g:profiler , which includes the kyoto encyclopedia of genes and genomes (kegg) pathway database (https://www.kegg.jp/kegg/pathway.html) for pathways and transfac for transcription factor binding enrichment. finally, we identified drugs that target the genes and related mechanisms through the drug gene interaction database (dgidb) . to predict the most phenotypically causal variants within the variant set, we compared variants from the codes d analysis with several tools which leverage deep learning-based algorithmic frameworks to classify functional relevance from dna markers including identified chromatin marks (enhancer marks, etc). we used deepsea to predict the chromatin effects of variants to prioritize regulatory variants and predictsnp to summarise estimates of noncoding variants for classification (deleterious or neutral). deepsea predicts the chromatin effects of sequence alterations by analysing the epigenetic state of a sequence (transcription factor binding, dnase i sensitivities, and histone marks) across multiple cell types. predictsnp predictions are a consensus score from across five separate prediction tools for variant prioritization: cadd .

tables

table the eqtl-attributed variant list consists of variants, which after filtering yields variants with significant spatial eqtls. the gwas-attributed variant list consists of variants, which after filtering yields variants with significant spatial eqtls. overall, the attributed variants yield variants with significant spatial eqtls.

table . pli of cohesin genes shows largely mutation intolerance the genes in the main set comprising cohesin and cohesin-support are all loss-of-function intolerant (pli > . ).
however, the subset of cohesin genes specific to meiosis is not loss-of-function intolerant.

the eqtl-attributed variant list consists of variants, which after filtering yields variants with significant spatial eqtls. the gwas-attributed variant list consists of variants, which after filtering yields variants with significant spatial eqtls. overall, the attributed variants yield variants with significant spatial eqtls. red: variants overlapping each set; green: variants filtered from codes d for minor allele frequency (maf) < . ; blue: variants not in the gtex variant dictionary (no variant in gtex).

codes d scans the gwas-attributed variant list ( variants) for variants with physical proximity to genes that is supported by allele-specific gene expression changes (eqtls). this analysis found variant-gene-tissue connections, involving gwas-attributed variants.

codes d scans the eqtl-attributed variant list ( variants) for variants with physical proximity to genes that is supported by allele-specific gene expression changes (eqtls). this analysis found variant-gene-tissue connections, involving eqtl-attributed variants.

we analysed the genetic variants we identified for patterns of histone marks, dnase accessibility, and protein binding motifs in haploreg v . . most of the variants have marks of accessible chromatin (dnase: . %). in addition, almost all of the variants ( . %) have at least one of the following three: promoter histone marks ( . %), enhancer histone marks ( . %), protein binding sites ( . %). additionally, of the variants modify protein binding or motif predictions in rad ( ), smc ( ), and/or ctcf ( ). green: protein or motif lists including a cohesin gene.

supplementary table .
deepsea estimated effect of the cis-associated and gwas-associated variants deepsea predicts the chromatin effects of sequence alterations by analysing the epigenetic state of a sequence (transcription factors binding, dnase i sensitivities, and histone marks) across multiple cell types. this results in variants identified as functionally significant.

this work was supported by a royal society of new zealand marsden grant to jh and jos ( -uoo- ), and ws was supported by the same grant.

the drug-gene interaction database only lists known drug interactions for stag and stag , and only for cancer treatments. however, dgidb identifies a large number of drug-gene interactions with the genes identified by codes d. notably, the drug-gene interaction analysis identifies two topoisomerase targets: camptothecin (stag interaction, a topoisomerase-i inhibitor used in cancer treatments) and etoposide (papola and xiap interactions, which inhibits the topoisomerase ii enzyme).

supplementary table .
hi-c datasets used in this study spatial connections were identified from previously generated hi-c libraries of various origins: ) cell lines gm , hmec, huvec, imr , k , kbm , hela, nhek, and hesc (geo accession numbers gse , gse , and gse ); ) tissue-specific data from encode sourced from the adrenal gland, bladder, dorsolateral prefrontal cortex, hippocampus, lung, ovary, pancreas, psoas muscle, right ventricle, small bowel, and spleen (gse ); and ) tissues of neural origin from the cortical and germinal plate neurons (gse ), cerebellar astrocytes, brain vascular pericytes, brain microvascular endothelial cells, sk-n-mc, and spinal cord astrocytes (gse , gse , gse , gse , gse ), and neuronal progenitor cells (gse ).

the genes identified by codes d were searched for enrichment in various biological processes through g:profiler, identifying significantly enriched processes. when removing the cohesin genes from the enrichment analysis, processes remain significant (red), including ubiquitin gene ontologies.

supplementary table . gene list and g:profiler enrichment for meiosis-specific cohesin genes (rad l , rec , smc b, stag ; g:profiler ) (a) there are genes functionally connected to variants within the meiosis-specific cohesin loci smc b, stag , rec , and rad l . (b) we identified significantly enriched processes, including the gene ontology "male germ cell nucleus" pathway and transfac targets for the e f transcription factor, which link to meiosis- and cell-cycle-specific mechanisms.

the gnomad database identified lof-intolerant genes (intolerance defined as pli ≥ . ). here we show the pli for the genes identified by spatial eqtls in our gwas- and eqtl-attributed variant analysis. overall, are lof-tolerant, of the genes ( . %) are lof-intolerant, and lack pli.

supplementary table .
highlights of genes identified within the ubiquitin mediated kegg pathway within the g:profiler analysis, two ubiquitin-mediated kegg pathways were significant even when removing cohesin from the analysis. here we highlight the pathways and their associated genes, eqtl distance, variant-attributed locus, and source of variant-attribution. the kegg disease pathways identified above largely identify lof-intolerant genes ( of [ . %] kegg ubiquitin-mediated proteolysis genes).

supplementary table . drug-gene interactions with eqtl-attributed and gwas-attributed eqtl genes (dgidb v . )

key: cord- -jiuqk kg authors: adler, paul n. title: short distance non-autonomy and intercellular transfer of chitin synthase in drosophila date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jiuqk kg

the complex structure of insect exoskeleton has inspired material scientists and engineers. chitin is a major component of the cuticle and it is synthesized by the enzyme chitin synthase. there is a single chitin synthase gene (kkv) in drosophila facilitating research on the function of chitin. previous editing of kkv led to the recovery of a viable hypomorphic allele. experiments described in this paper argue that a reduction in chitin synthase leads to the shafts of sensory bristles becoming fragile and frequently breaking off as the animals age. this is likely due to reduced chitin levels and further suggests that chitin plays a role in resilience of insect cuticle. the different layers in cuticle are continuous across the many epidermal cells that secrete the cuticle that covers the body. little is known about the mechanisms responsible for this continuity. using genetic mosaics and scanning electron microscopy this paper establishes that kkv shows short range cell non-autonomy. it also provides evidence for possible mechanisms.
one is the intercellular transfer of kkv protein from one cell to its neighbors and the second is the deposition of cuticular material across the boundaries of neighboring cells.

insect cuticles. i will use the term envelope to describe the outermost layer that is lipid rich and serves as a barrier to water loss and aids in hydrophobicity. i use the term epicuticle for the next layer. both of these are rather thin in the drosophila cuticles studied in this paper. the innermost layer is the procuticle, which is composed of multiple sub-layers of chitin and protein. the chitin fibrils are arranged as parallel arrays and each layer is rotated with respect to its neighboring layers (bouligand, ; moussian, ; moussian et al., ) .
the array rotation gives rise to the layered appearance of the procuticle. the procuticle is typically the thickest layer and the arrays of chitin are thought to provide strength and perhaps resilience to failure. in addition to these layers various authors have proposed an additional layer that is juxtaposed between the apical surface of the epidermal cells and the basal region of the procuticle. this fourth layer has been given names such as the assembly zone or adhesion layer (fristrom et al., ; locke, ; moussian et al., ; schmidt, ; sobala and adler, ) . the suggested function can be gleaned from their names. my observations suggest that both types of layers can be found in at least some developing cuticles and that they are different from one another (sobala and adler, ) . the insect cuticle is extremely variable in terms of its physical properties. for example, the young's modulus of insect cuticle varies over more than orders of magnitude (vincent and wegst, ) . qualitative and quantitative differences in the molecular components, their degree of cross linking and the degree of hydration are likely responsible for this variation. to a first approximation, it is generally thought that the layers are synthesized in the order envelope, exocuticle and procuticle but there is likely at least some "maturation" that takes place out of order (moussian et al., ; sobala and adler, ) . insect cuticle is a complicated material containing a large number of different components. for example, more than cuticle protein genes are found in the drosophila genome (ioannidou et al., ; willis, ) and it has been suggested that this may be an underestimate (sobala and adler, et al., ; wong and adler, ) . as morphogenesis continues the delayed hairs catch up to the earlier forming ones. to determine if the deposition of cuticle followed a similar time course i used a chitin reporter and examined wing cuticle deposition by in vivo imaging (sobala et al., ) .
at all times examined ( , and hr awp (after white prepupae)) neighboring cells showed similar levels of the reporter ( fig c) . i carried out similar experiments where we localized kkv::ng, and in all cases neighboring cells showed similar levels and localization of kkv (fig b) . we also examined pigmentation in developing pupae and once again, neighboring cells showed a similar level of pigmentation ( fig s ) . hence the mechanism used to initiate hair outgrowth appears to be different from the mechanism used to initiate chitin deposition and kkv synthesis.

the cuticle of a hypomorphic allele of kkv is less robust than wild type cuticle.

as described previously i used crispr/cas to tag the endogenous kkv gene with either neon green or smfp (adler, ) . flies homozygous for each of the edits were viable and showed no dramatic phenotype when examined in a stereomicroscope. however, when we mounted wings for examination in the compound microscope we observed thin, bent and generally deformed wing hairs (trichomes) in the smfp homozygotes but not the ng homozygotes (adler, ) . we further established that the kkv::smfp protein accumulated to a much lower level than kkv::ng. when we examined kkv::smfp/df flies the hair phenotype appeared slightly stronger. these two observations led me to conclude that kkv::smfp was a hypomorph due to the protein not accumulating efficiently (adler, ) . i subsequently noticed that a minority of the kkv::smfp/df flies had deformed wings soon after or at eclosion (fig ) . this suggested a problem in expansion of the wing or eclosion from the pupal case. we found the frequency of flies showing the phenotype was significantly higher for kkv::smfp/df , kkv::smfp/kkv (kkv is listed as an amorphic allele on flybase (dos santos et al., )) and kkv::smfp flies compared to ore-r or kkv::ng flies (fig ) . we also noticed that some of the kkv::smfp containing flies showed a slight downward curve to the wing.
while the kkv hypomorphic individuals are able to fly, by casual observation they appeared less active than normal flies. to test the hypothesis that the cuticle synthesized in a hypomorphic mutant was less robust to the wear and tear of life we followed adult flies that were either wt or kkv hypomorphs over time to see if defects arose more rapidly in the hypomorphs. we scored phenotypes: life span, the loss of thoracic macrochaetae and wing defects. all of the various mutants were compared to wild type ore-r flies. to reduce the length of time that these experiments took we kept the adult flies at . °c or . °c, which shortens lifespan. we carried out separate experiments using slightly different conditions (see methods) and in both cases the ore-r flies lived longer than the kkv edited flies (fig s ) . the data suggested that this was not simply due to decreased kkv levels as kkv::smfp/df flies lived slightly longer than the kkv::smfp flies. the basis for the shorter life span is unclear. a significant difference was seen for the loss of thoracic macrochaetae with aging, and this did appear to be due to kkv levels as the strongest phenotypes were seen for kkv::smfp/df and kkv::smfp/kkv flies (fig h-k). most of those flies that lived for days or longer showed the loss of at least one of those bristles and in most cases multiple bristles were lost. i also observed the loss of microchaetae (fig ) but did not quantify this. in all, or at least most, cases the bristle shaft was lost but the socket cell remained (fig g ). in addition to the two experiments where females and males were cultured together, we also carried out a small scale experiment where we followed individual female flies for days. this experiment established that the loss was progressive. that is, we usually observed the loss of one or two bristles followed by the subsequent loss of additional bristles (fig def) . bristle loss was less frequent but still common in kkv::smfp homozygotes.
in contrast, the loss of bristles associated with aging was rarely seen for ore-r (fig abc, h-k). we also scored the flies in these experiments for loss of sections of the wing. the losses routinely began at the wing margin. surprisingly, this turned out to be substantially more common in ore-r flies than in the flies with edited kkv genes (fig s ). possible reasons for this are explored in the discussion.

in an attempt to assay more directly for non-autonomy of kkv mutant clones i needed an assay that allowed me to identify kkv clones and examine them at a higher resolution than is possible in the light microscope. i found that i could fracture adult cuticle and detect a layered structure by scanning electron microscopy. the layers were very distinctive in abdominal cuticle (fig s ), consistent with the robust layering in abdominal procuticle seen by tem (fig ). i also attached wings to studs in a vertical position, fractured them and then imaged the wings by scanning electron microscopy. in wing cuticle i also detected a layered structure that presumably reflects the banding of chitin in the procuticle (fig bc). the layering was less distinctive (and sometimes hard to detect) than in the abdominal cuticle, consistent with tem observations. we next fractured and observed wings carrying kkv loss of function clones. we were able to identify mutant clones by the presence of the kkv flaccid hair phenotype (fig ). if kkv acted in a strictly cell autonomous manner, we predicted that we would see an abrupt change in wing cuticle thickness at wt-mutant clone boundaries (fig a). in contrast, if there was a small degree of cell non-autonomy we predicted that we would see a smooth change in cuticle thickness near the edge of clones (fig a). in all of the clones (n= ) we examined there was a smooth transition in cuticle thickness (fig de). this transition zone appeared to be restricted to a mutant cell and its direct neighbor.

possible mechanism for the short-range non-cell autonomy of kkv.
in experiments where we imaged kkv::ng in living pupae we noticed fluorescent puncta in the extracellular space between the pupal cuticle and the epidermal cells that were in the process of synthesizing the adult cuticle (fig. bc). to investigate this in more detail we obtained large z stacks that extended from the pupal cuticle to below the apical surface of the epidermal cells. pupal cuticle in hr pupae shows substantial autofluorescence, so we examined and compared ore-r and kkv::ng pupae to determine what, if any, fluorescence was due to the presence of the kkv::ng protein. the autofluorescence of the thoracic pupal cuticle of ore-r was spatially relatively even (fig d). in contrast, the fluorescence of pupal cuticle of kkv::ng flies was much more uneven, with both puncta (arrow) and lines (arrowhead) of bright fluorescence (fig. a). no fluorescence was observed in the region between the pupal cuticle and the apical surface of the epithelial cells in ore-r pupae (fig. ef). in contrast, in this region many fluorescent puncta were observed in kkv::ng pupae (fig. bc, arrows). a majority of these were located close to the pupal cuticle but some were observed throughout the region. most of the puncta located close to the pupal cuticle were stable but many of those located lower were mobile (movie s ). since the fluorescent puncta were only seen when the kkv::ng gene was present, we interpret the puncta as evidence of shed kkv::ng. since the puncta were located above the impermeable adult cuticle (which is in the process of being synthesized), it seems likely that the kkv::ng was shed during or after the synthesis of the pupal cuticle and before the synthesis of the adult cuticle began. consistent with this hypothesis we observed puncta in hr pupae, well before the start of adult cuticle deposition (sobala and adler, ). we also observed puncta in very young pupae ( hr awp) prior to the detachment of the epithelial cells from the pupal cuticle (fig. s b).
we observed similar puncta when we examined ap>kkv::ng pupae (fig s c) but not ap>kkv-r k::ng pupae (this mutant protein does not localize correctly to the apical surface (adler, )) (fig s d), suggesting that kkv needs to be localized apically in order to be shed. the highest concentration of puncta was over the dorsal thoracic midline (fig s abc). there were also a large number of puncta over the dorsal abdomen, and they were seen at a lower frequency in the wing, legs and head. in the abdomen the puncta tended to align parallel to the segment boundary. all of the experiments where we detected puncta in living pupae required imaging of the kkv::ng fusion protein. experiments described earlier established that ng was a valid reporter for kkv in bristles, so it seemed likely that it was also a valid reporter for kkv in puncta (adler, ). to test this hypothesis i immunostained pupal cuticle using both anti-ng and anti-kkv-m antibodies. among the puncta detected . % stained with both antibodies, indicating that most puncta contained both ng and kkv (fig s , arrows) and supporting the idea that ng is a valid reporter for shed kkv::ng. the shedding of kkv::ng could be specific for kkv or it could reflect a process that leads to the shedding of many if not all of the proteins located in the apical plasma membrane. to distinguish between these two possibilities we examined live ap-gal /+; uas-mcd -gfp/+ pupae. these animals showed a large number of fluorescent puncta in the space between the pupal cuticle and the apical surface of the epithelial cells (fig ghi). as was the case for the puncta in kkv::ng pupae, many of the puncta were mobile. i concluded that the shedding of membrane proteins is not specific for kkv or proteins involved in cuticle deposition. the pupal cuticle of drosophila appears relatively transparent and uniform in bright field optics.
however, in carrying out these in vivo imaging experiments we observed that the autofluorescence of the thoracic and abdominal pupal cuticles was quite distinct (fig s ). the autofluorescence of the thoracic pupal cuticle was splotchy but without distinctive morphology. in contrast, the autofluorescence of the abdominal pupal cuticle showed a pattern of bright elongated lines. our first thought when observing this was that the bright lines represented cell boundaries; however, attempts to establish this were unsuccessful.

flip out clones: transfer of kkv::ng puncta.

the observation that kkv::ng was shed raised the possibility that this could be a mechanism to provide for cell non-autonomy of kkv. one possible mechanism is the apical secretion of kkv and its subsequent lateral movement prior to it synthesizing chitin. an alternative possibility is the intercellular movement of kkv to neighboring cells followed by secretion or apical localization and chitin synthesis. in an attempt to get evidence for either of these mechanisms we generated flip out clones comprised largely of single cells that expressed kkv::ng and then looked by in vivo confocal microscopy for lateral movement of kkv::ng beyond the clone cell. we observed such clones during the deposition of the procuticle and for of these we observed kkv::ng puncta beyond the lateral edge of the clone cell (fig ab). these puncta could be localized apically over the neighboring cells or in the neighboring cell. another possible mechanism to explain the short distance cell non-autonomy is the lateral spread of apically secreted chitin. with this in mind we examined thoracic epidermal cells during the synthesis of the pupal cuticle by transmission electron microscopy (tem). we could identify cell boundaries by the presence of junctional complexes (figure , asterisks). we observed that the undulae found on most epidermal cells during cuticle secretion were often bent over the position of the junctional complex (fig. befi).
we also often observed what appeared to be "trains" of secreted material between the undulae and the cuticle. these "trains" were often curved and extended over the cell boundary to above the neighboring cell (fig behfi). these observations suggest that cuticle material secreted from one cell can end up covering part of a neighbor.

the importance of cuticle both as a barrier to the outside and as a support for movement makes its integrity of paramount importance to the success of insects. flight is a standard feature of insects and it is extremely demanding in terms of energy and structural wear on the cuticle. i previously isolated a hypomorphic allele of kkv (kkv::smfp) and here i describe several additional phenotypes. one of these was the loss of thoracic macrochaetae. we established that the loss of the bristle shaft increased over time and it appears to be due to the shaft breaking off, as the surrounding socket cell remained [...]

[...] and were made by the author in his lab and are described in (adler, ). immunostaining of fixed pupal epidermal cells during the deposition of cuticle is complicated by the inability of the antibodies to penetrate cuticle after the early stages of its development. thus, most of the imaging experiments we carried out on kkv in pupae were done by in vivo imaging of kkv::ng. in a small number of experiments we examined kkv::ng in fixed tissue. in these experiments we carried out anti-ng immunostaining. such tissue was only weakly fixed and we did not use animals that were older than around hr after white prepupae (awp). otherwise, immunostaining of pupal and larval tissues was done as described previously (nagaraj and adler, ). imaging of live kkv::ng containing pupae was done on a zeiss confocal microscope in the keck center for cellular imaging. stained dissected samples were examined on the same microscope. wings were removed from two day old adult flies.
in the experiments described in the paper the wings contained flip out clones (aygal ) that expressed an rnai for kkv (trip line -hmc. ). the hair phenotype overlapped with that seen previously in clones homozygous for kkv , but with a smaller fraction showing the strongest phenotype (ren et al., ). the wings were attached with conducting paint to studs with a vertical surface and were then fractured with a tungsten needle.

[...] awp (after white prepupae) animals. panels beh are from animals hr awp and panels cfi are from animals hr awp. arrowheads point to undulae that extend over the junctional complex to be above the neighboring cell. arrows point to secreted material in "trains" that curve and join the forming cuticle displaced laterally from the undulae it appears to derive from. this is most dramatic in the hr awp animals but is also clear in the hr animals, and there are hints in the hr samples. the size markers are µm. panels a and d are shown at a higher magnification than the other images. panel d is shown to illustrate the distinctive dark/light/dark envelope. not all micrographs show this as well.
the localization of chitin synthase mediates the patterned deposition of chitin in developing drosophila bristles ecdysone-responsive transcriptional regulation determines the temporal expression of cuticular protein genes in wing discs of bombyx mori flexibility and control of thorax deformation during hawkmoth flight twisted fibrous arrangements in biological materials and cholesteric mesophases elucidation of the regulation of an adult cuticle gene acp a by the transcription factor broad a genome-wide transgenic rnai library for conditional gene inactivation in drosophila flybase: introduction of the drosophila melanogaster release reference genome assembly and large-scale migration of genome annotations unexpected strength and toughness in chitosan-fibroin laminates inspired by insect cuticle the distribution of ps integrins, laminin a and f-actin during key stages in drosophila wing development the transcription factor grainy head and the steroid hormone ecdysone cooperate during differentiation of the skin of drosophila melanogaster crustacean-derived biomimetic components and nanostructured composites cutprotfam-pred: detection and classification of putative structural cuticular proteins from sequence alone, based on profile hidden markov models. 
insect biochemistry and molecular biology effects of geometry on stresses in discontinuous composite materials drosophila dhr nuclear receptor is required for adult cuticle integrity at eclosion novel transport function of adherens junction revealed by live imaging in drosophila pore canals and realted structures in insect cuticle the drosophila planar polarity gene multiple wing hairs directly regulates the actin cytoskeleton protein equilibration through somatic ring canals in drosophila stages of cell hair construction in drosophila the apical plasma membrane of chitin-synthesizing epithelia involvement of chitin in exoskeleton morphogenesis in drosophila melanogaster cuticle differentiation during drosophila embryogenesis assembly of the drosophila larval exoskeleton requires controlled secretion and shaping of the apical plasma membrane dusky-like functions as a rab effector for the deposition of cuticle during drosophila bristle development genetic control of epidermis differentiation in drosophila. the international journal of developmental biology the transgenic rnai project at harvard medical school: resources and validation the excitation and contraction of the flight muscles of insects numerical investigation of insect wing fracture behaviour gene expression during drosophila wing morphogenesis and differentiation megalin-dependent yellow endocytosis restricts melanization in the drosophila cuticle influenza a virus uses intercellular connections to spread to neighboring cells plasmodesmata at a glance observations on the subcuticular layer in the insect integument the gene expression program for the formation of wing cuticle in drosophila chtvis-tomato, a genetic reporter for in vivo visualization of chitin deposition in drosophila design and mechanical properties of insect cuticle. 
arthropod structure & development in vivo time-resolved microtomography reveals the mechanics of the blowfly flight motor ecdysone directly and indirectly regulates a cuticle protein gene, bmwcp , in the wing disc of bombyx mori. insect biochemistry and molecular biology structural cuticular proteins from arthropods: annotation, nomenclature, and sequence characteristics in the genomics era tissue polarity genes of drosophila regulate the subcellular location for prehair initiation in pupal wing cells adherens junction-associated pores mediate the intercellular transport of endosomes and cytoplasmic proteins

key: cord- -ucqqpra
authors: zhang, zhe; luo, shuo; barbosa, guilherme oliveira; bai, meirong; kornberg, thomas b.; ma, dengke k.
title: the conserved er-transmembrane protein tmem coordinates with copii to promote collagen secretion and prevent er stress
date: - -
journal: biorxiv
doi: . / . . .
sha:
doc_id:
cord_uid: ucqqpra

dysregulation of collagen production and secretion contributes to aging and tissue fibrosis of major organs. how premature collagen proteins in the endoplasmic reticulum (er) route as specialized cargos for secretion remains to be fully elucidated. here, we report that tmem , an er-localized transmembrane protein, regulates production and secretory cargo trafficking of procollagen. we identify the c. elegans ortholog tmem- from an unbiased rnai screen and show that deficiency of tmem- leads to striking defects in cuticle collagen production and constitutively high er stress response. rnai knockdown of the tmem- ortholog in drosophila causes similar defects in collagen secretion from fat body cells. the cytosolic domain of human tmem a binds to sec a, a vesicle coat protein that drives collagen secretion and vesicular trafficking. tmem- regulation of collagen secretion is independent of er stress response and autophagy. we propose that roles of tmem- in collagen secretion and preventing er stress are likely evolutionarily conserved.
collagen is the major molecular component of connective tissues, and the most abundant protein in animals ( ). collagen dysregulation causes many human disorders, including autoimmune diseases, brittle bone diseases (too little collagen), tissue fibrosis (too much collagen) and aging-related disorders ( - ). the multi-step biosynthesis of mature collagen by the cell is a complex process that involves procollagen gene transcription and protein translation, posttranslational modification, assembly into procollagen trimers inside the endoplasmic reticulum (er), vesicular secretion from the er, extracellular peptide cleavage and cross-linking into collagen fibers ( , ). the specific mechanisms underlying the secretion of procollagen remain poorly understood. in general, specialized intracellular vesicles defined by the coat protein complex ii (copii) transport most secreted proteins, including procollagen, from the er to the golgi apparatus ( , ). sec , sec , sec and sec comprise the copii coat proteins, while the transport protein particle (trapp) complex acts as a key tethering factor for copii vesicles en route to the golgi ( - ). typical copii vesicles are to nm in diameter, which is not sufficient for transporting procollagen trimers with up to to nm in length ( ). in mammals, large-size copii-coated vesicles may transport procollagen from the er to the golgi apparatus. tango , a transmembrane protein at the er exit site, mediates formation of specialized collagen-transporting vesicles and recruitment of procollagen ( - ). the n-terminal sh -like domain of tango binds to the collagen chaperone hsp in the er lumen, recruiting procollagens to the er exit site ( ). its c-terminal proline-rich domain (prd) serves as a copii receptor by interacting with the inner shell proteins sec /sec ( ). the coiled-coil domain of tango forms a stable complex with ctae and sec , which is particularly enriched around large copii carriers for procollagen ( ).
through its membrane helices, tango organizes er exit sites by creating a lipid diffusion barrier and an export conduit for collagen ( ). while the requirement of tango for secretion may depend on specific collagen types, it remains unclear whether tango 's functions are broadly conserved in all animals ( , ). caenorhabditis elegans produces over collagen members that constitute the cuticle and basement membranes and encodes conserved homologs of copii/trapp proteins, yet lacks apparent tango homologs ( - ). this indicates that evolutionarily conserved and tango -independent mechanisms may exist in c. elegans to regulate procollagen secretion. from a genome-wide rnai screen for genes affecting er stress response, we previously identified tmem- , which defines a broadly conserved family of proteins important for procollagen assembly and secretion ( ). mutations in specific collagen genes, conserved copii/trapp-encoding homologs, and impairment of collagen biosynthetic pathway components are known to result in a range of phenotypes including er stress response, abnormal cuticle-associated morphology (blister and dumpy), and early death or growth arrest ( ).

[...] table) and found that tmem- rnai knock-down strongly reduced abundance of the col- ::gfp reporter (fig a). col- is a c. elegans exoskeleton collagen that is secreted by the underlying hypoderm and required for the integral structure of the cuticle ( ). the c-terminal gfp-tagged col- reporter enables highly robust and tractable visualization of cuticle morphology and identification of defects in the collagen production machinery ( ). using confocal microscopy to characterize the structure of the hypodermal cuticle, we found that in control rnai animals, col- ::gfp is enriched in the hypoderm, constituting regular annular furrows and lateral alae of the cuticle (fig b). in the tmem- rnai animals, col- ::gfp appeared to be clustered in the intracellular region of the hypoderm, and largely absent from the cuticle (fig b).
we further analyzed the abundance and composition of col- ::gfp proteins by western blot. besides strongly reducing overall col- ::gfp abundance, tmem- rnai markedly increased the soluble "premature" monomeric procollagens, while decreasing the insoluble fraction of cross-linked multimers and "mature" monomers of col- ::gfp (fig c). to examine possible involvement of tmem- in collagen gene transcription, we used rnai to knock down tmem- in animals with the col- p::gfp transcriptional reporter, in which gfp expression is driven by the promoter of col- . in contrast to the striking decrease of overall col- ::gfp protein abundance, the transcriptional activity of the col- promoter was not affected by tmem- rnai (figs b and d). we also evaluated the mrna level of col- by quantitative reverse transcription polymerase chain reaction (qrt-pcr) and found that the dma mutant displayed a mild increase of col- mrna level, likely caused by compensatory feedback regulation of collagen gene transcription (fig e). the dma mutant fully recapitulated the tmem- rnai phenotype of defective col- ::gfp secretion (fig f). there are two main collagen-enriched tissues in c. elegans, the cuticle (exoskeleton) and basement membranes ( ). tmem- rnai had no effect on the production of mcherry-tagged emb- ( ), a collagen iv α on basement membranes (s t fig and s table). we found that loss of tmem- specifically affected collagens in the cuticle, as exemplified by lon- ::gfp and col- ::gfp (figs g-h and s ). furthermore, electron microscopy (em) analysis revealed a striking reduction of cuticle thickness in dma mutants compared with wild type (fig i). we also noticed that tmem- deficient animals were small in size and dumpy, more sensitive to cuticle-disrupting osmotic stresses, and developed more slowly. taken together, these results demonstrate essential roles of tmem- in collagen secretion, proper cuticle formation and preventing er stress likely induced by premature collagen accumulation in c. elegans.
between human sec a c-termini with human wild type and yr mutant tmem a cytoplasmic loop domain.

as predicted by the topcons program, tmem- putatively contains eight transmembrane segments and two large cytoplasmic loops (fig b). we further used the y h screen to search for human proteins that could interact with the conserved first loop domain ( - a.a.) and the second loop domain ( - a.a.) of tmem a (fig c). among the prey cdna clones identified from the y h screen, sec a was confirmed to interact with the second loop domain of tmem a (fig d). the disease manifests with skeletal abnormalities, dysmorphic facial features and calvarial hypomineralization, features thought to result from defects in collagen secretion ( ). consistent with recent studies using the coip assay to demonstrate association between tmem a and sec a ( ), we found that tmem a interacted with sec a but not sec d in y h assays (figs d-e). these results indicate that the tmem a cytoplasmic loop domain interacts specifically with sec a, which forms an inner-shell heterodimer with sec to drive procollagen secretion. we next examined the loss-of-function phenotype of sec- . rnai knock-down of sec- , the c. elegans homolog of sec a, strongly reduced col- ::gfp secretion in the cuticle and increased its aggregation in the intracellular region of the hypoderm (fig f). rnai of sec- also led to strong hsp- p::gfp induction, indicating a constitutively activated er stress response (fig g). [...] conserved among all examined species from invertebrates to vertebrates (s fig). to test whether the conserved yr motif is important for interaction with sec a, we substituted the yr motif of tmem a with alanine-alanine (aa). using y h assays, we found that this substitution in tmem a strongly attenuated its interaction with sec a (fig h). these results show that the second cytoplasmic loop domain of tmem a specifically binds to the copii inner-shell component sec a and its c.
elegans homolog sec- is also essential for collagen production in vivo.

the collagen secretion phenotype of tmem- is independent of er stress and autophagy

we identified both tmem- and tmem- from the genome-wide screen for rnai clones affecting the abundance of asp- p::gfp, which is downregulated by er stress ( ). we examined collagen secretion phenotypes of other genes involved in protein modification and homeostasis in the er identified from the asp- p::gfp screen, including ostb- , nus- , stt- , dlst- , ost- and uggt- (fig a and s table). [...] rnai, but not sac- rnai, caused a marked up-regulation of the autophagy transcriptional reporter tts- p::gfp (fig a). tts- is a long non-coding rna that represses protein synthesis and is activated by hlh- /tfeb, a master transcriptional regulator of autophagy ( , ). however, sac- rnai did not affect the er stress response reporter hsp- p::gfp (fig b) or col- ::gfp (figs c-d). we also examined rnai phenotypes of let- , which encodes an ortholog of human mtor (mechanistic target of rapamycin kinase) and regulates autophagy in c. elegans ( , ). similar to sac- rnai, let- knock-down in c. elegans showed a marked induction of tts- p::gfp but had no apparent effect on collagen secretion (figs e-g). together, these findings indicate that the roles of c. elegans tmem- in collagen secretion are independent of er stress response and autophagy regulation.

[...] copii/trappiii complexes for sequential er-to-golgi cargo transport (fig ). the second cytoplasmic loop domain of tmem a interacts with the core copii coating component sec a. tmem binds to col a to facilitate assembly of procollagen trimers and trapp iii activation of rab gtpase, in coordination with tmem a to promote the er-to-golgi transport of procollagen cargo in copii. uso interacts with the copii vesicle to promote targeting to the golgi apparatus.

by yeast-two-hybrid assays, we found that the tmem a cytoplasmic loop domain can interact with sec a.
rnai knock-down of sec- and most other copii genes recapitulated the tmem- loss-of-function phenotypes of constitutively high er stress response, defective collagen secretion and sensitivity to osmolality stress in c. elegans (table ). we also noticed that rnai knock-down of many copii related genes, such as sec- , sec- . , npp- , sar- , sec- , rab- and trpp- , caused more severe phenotypes than tmem- rnai, leading to lethality or developmental arrest that prevents collagen phenotype analysis (table ). recent work showed that tmem a facilitates the er-to-golgi transport of sac and regulates autophagosome formation ( ). we found that rnai knock-down of autophagy related genes, such as sac- and let- , caused autophagy induction but did not affect the er stress response or collagen secretion (fig ). genes identified from the asp- p::gfp screen that regulate the er stress response also did not affect collagen secretion (s table), further supporting the notion that the roles of tmem- in collagen secretion are independent of er stress response and autophagy. besides sec a, additional interactors were identified from y h screens with the tmem a cytoplasmic domain as bait. we verified the interaction between full-length dctn ( - a.a.) and tmem a ( - a.a.) (fig d). dctn is a subunit of the dynactin protein complex ( ) that acts as an essential cofactor of the cytoplasmic dynein motor to transport a variety of cargos and organelles along the microtubule-based cytoskeleton ( , ). in mammalian cells, er-to-golgi transport proceeds by cargo assembly into copii-coated er export sites (eres) followed by vesicular/tubular transport along microtubule tracks toward the golgi in a dynein/dynactin-dependent manner ( ). sec p directly interacts with the dynactin complex ( ), indicating that tmem a may participate in a sec /dctn complex to facilitate copii coat assembly and subsequent dynein/dynactin-dependent transport.
testing this hypothetical model and determining the underlying mechanism in relation to tmem 's role in collagen secretion await further investigation. mammalian genomes encode two tmem family proteins, tmem a and tmem b. tmem a is a susceptibility locus associated with various autoimmune diseases and is highly up-regulated in brain tumors ( , ). tmem b was recently found to interact with the sars-cov- orf c protein, which localizes to er-derived vesicles ( , ).

hela cells were seeded in -well plates with cover glass, each with three replicates [...]

collagen structure and stability mechanisms of renal fibrosis always cleave up your mess: targeting collagen degradation to treat tissue fibrosis the processes and mechanisms of cardiac and pulmonary fibrosis mechanisms of fibrosis: therapeutic translation for fibrotic disease autoimmune collagen vascular diseases: kids are not just little people dauer- independent insulin/igf- -signalling implicates collagen remodelling in longevity. structure, physiology, and biochemistry of collagens copii-mediated vesicle formation at a glance cargo capture and bulk flow in the early secretory pathway vesicle-mediated export from the er: copii coat function and regulation trapp complexes in membrane traffic: convergence through a common rab twenty-five years after coat protein complex ii the pathway of collagen secretion procollagen export from the endoplasmic reticulum a long noncoding rna on the ribosome is required for lifespan extension mondo complexes regulate tfeb via tor inhibition to promote longevity in response to gonadal signals mtor regulation of autophagy analysis of dynactin subcomplexes reveals a novel actin-related protein associated with the arp minifilament pointed end the structure of the dynactin complex and its interaction with dynein the cytoplasmic dynein transport machinery and its many cargoes coupling of er exit to microtubules through direct interaction of copii with dynactin tmem a and human diseases: a brief 
review cov- protein interaction map reveals targets for drug repurposing sequence analysis and structure prediction of sars-cov- accessory proteins b and orf : evolutionary analysis indicates close relatedness to bat coronavirus efficient gene transfer in c.elegans: extrachromosomal maintenance and integration of transforming sequences oligonucleotide-based targeted gene editing in c. elegans via the crispr/cas system one-step homozygosity in precise gene editing by an improved crispr/cas system the genetics of caenorhabditis elegans genome-wide rnai screening in caenorhabditis elegans evolutionarily conservation of tmem a protein sequences among different species multiple sequence alignment of tmem a from major representative animal species (by cobalt program), with conserved yr residues indicated in the second cytoplasmic loop domains tmem- rnai knock-down for screen of phenotypic defects of different translational fluorescent reporters. (a-v) exemplar fluorescence images showing translational reporters for (a) wrk- c) hmr- , (d) mans, (e) eff- , (f-g) cpl- , (h) mig- , (i) fat- egl- , (l) ubiquitin-v, (m) y e a. , (n) spon- , (o) cat- , (p) sp , (q) lgg- ) emb- and (u) t d . in wild-type animals by control and tmem- rnai at °c (n = - for each reporters) a-b) exemplar fluorescence images showing translational reporters for (a) col- and (b) lon- . in wild-type animals at °c (n = - for each reporters) rnai knock-down cop ii component genes screen for collagen production defection exemplar fluorescence images of col- translational reporter for (a) control, (b) sar- , (c) sec- . , (d) sec- . , (e) sec- , (f) trpp- , (g) trpp- , (h) npp- , (i) sec- , (j) rab- and (k) tmem- rnai knock-down copii component genes for screen genes involved in er stress response a) control, (b) sec- . , (c) sar- , (d) npp- , (e) tmem- , (f) sec- . 
, (g) sec- , (h) pdi- , (i) trpp- and (j) uso rnai knock-down er stress response genes in screen for collagen production defection exemplar fluorescence images of col- translational reporter for (a) control c) uggt- , (d) cdc- . , (e) ire- and (f) xbp- in wild-type animals at °c. scale bars: µm cladogram showing conservation of the sac protein sequences throughout evolution cladogram of phylogenetic tree for the sac protein family from major representative eukaryotic species (adapted from www.treefam.org). domain architectures of sac family proteins

key: cord- -v a ll
authors: taylor, adrian; grapentine, sophie; ichhpuniani, jasmine; bakovic, marica
title: the novel roles of choline transporter-like and in ethanolamine transport
date: - -
journal: biorxiv
doi: . / . . .
sha:
doc_id:
cord_uid: v a ll

we examined a novel function of mammalian choline-transporter-like proteins ctl /slc a and ctl /slc a in ethanolamine transport. we established two distinct ethanolamine transport systems of a high affinity (k = . - . μm), mediated by ctl , and of a low affinity (k = - μm), mediated by ctl . both types of transport are na+-independent and mediated in a ph-dependent manner, as expected for ethanolamine/h+ antiporters. primary human fibroblasts with separate frameshift mutations (m = slc a Δasp and m = slc a Δser ) are devoid of ctl ethanolamine transport but maintain unaffected ctl transport. the lack of ctl or ctl reduced the ethanolamine transport, the flux by the cdp-ethanolamine kennedy pathway and pe synthesis. overexpression of ctl in slc a Δser (m ) cells improved the ethanolamine transport and pe synthesis. the slc a Δser cells are reliant on ctl function and ctl sirna almost completely abolished ethanolamine transport in the whole cells and mitochondria. overexpression of ctl and ctl cdnas increased ethanolamine transport in control and slc a Δser cells.
ctl and ctl facilitated mitochondrial ethanolamine uptake, but the transport mediated by ctl is predominant in the whole cells and mitochondria. these data firmly established that ctl and ctl are the first identified ethanolamine transporters in the whole cells and mitochondria, with intrinsic roles in de novo pe synthesis by the cdp-etn kennedy pathway and compartmentation of intracellular ethanolamine. significance the lack of choline transporter like (slc a /ctl ) is the primary cause of a new neurodegenerative disorder with elements of childhood-onset parkinsonism and mitochondrial dysfunction. slc a /ctl encodes the human neutrophil antigen , causes autoimmune hearing loss and meniere’s disease, and has been recently identified as the main risk factor for thrombosis-the major cause of death in covid- patients. our investigation provides insights into the novel functions of ctl and ctl as intrinsic ethanolamine transporters. ctl and ctl are high and low affinity transporters, with direct roles in the membrane phospholipid synthesis. the work contributes to new knowledge for ctl and ctl independent transport functions and the optimization of prevention and treatment strategies in those various diseases. phosphatidylcholine (pc) and phosphatidylethanolamine (pe) are major components of cellular membranes where they are involved with essential cellular processes ( , ) . pc and pe are synthesized de novo by cdp-cho and cdp-etn branches of the kennedy pathway in which the extracellular substrates choline (cho) and ethanolamine (etn) are actively transported into the cell, phosphorylated and coupled with diacylglycerols (dag) to form the final phospholipid product. while multiple transport systems have been established for cho, etn transport is poorly characterized and there is no single gene/protein assigned a transport function for mammalian etn. cho transport for membrane phospholipid synthesis is mediated by cho transporter like protein ctl /slc a ( ) . 
ctl is the only well-characterized member of a broader family (ctl - /slc a - ) ( , ) . ctl /slc a is a cho/h + antiporter at the plasma membrane and mitochondria ( , ) . the role of plasma membrane ctl is assigned to cho transport for pc synthesis, but the exact function of the mitochondrial ctl is still not clear. in the liver and kidney, mitochondrial ctl transports cho for oxidation to betaine, the major methyl donor in the one-carbon cycle ( ) . in other tissues however, the mitochondrial ctl probably maintains the intracellular pools of cho and as a h + -antiporter and modulates the electrochemical/proton gradient in the mitochondria ( , ) . ctl /slc a is only indirectly implicated in pc synthesis and its exact function is not firmly established in neither whole cells nor mitochondria ( ) . pe is the major inner membrane phospholipid with specific roles in mitochondrial fusion, autophagy and apoptosis ( - ) . pe is also a valuable source of other phospholipids. pc is produced by methylation of pe while phosphatidylserine (ps) is produced by an exchange mechanism whereby the etn moiety of pe is replaced with serine and free etn is released. pc could also produce ps by a similar exchange mechanism, with free cho being released. the metabolically released cho and etn need to be transported in and out of the cytosol and mitochondria or reincorporated into the kennedy pathway ( - ) . that mammalian etn and cho transport may occur through a similar transport system was implicated from early kinetic studies in bovine endothelial cells, human retinoblastoma cells and glial cells ( - ) . here, we demonstrate that ctl /slc a and ctl /slc a are authentic etn transporters at the cell surface and mitochondria. we examine the kinetics of etn transport in ctl and ctl depleted conditions and overexpressing cells. 
we characterize etn transport in human skin fibroblasts that maintain ctl but lack ctl function due to inherited ctl /slc a frameshift mutations (m = slc a Δasp and m = slc a Δser ) ( ) . we employ pharmacological and antibody-induced inhibition to separate the contributions of ctl and ctl to etn transport and pe synthesis. to our knowledge, this study is the first to demonstrate that ctl and ctl are high and low-to-medium affinity cellular and mitochondrial etn transporters and that, as intrinsic etn transporters, they regulate the supply of extracellular etn for the cdp-etn pathway, redistribute intracellular etn and balance the cdp-cho and cdp-etn arms of the kennedy pathway. to assess the magnitude by which ctl / inhibition affects pe and pc levels, two types of cells were characterized for phospholipid metabolism (mcf- and mcf- ) ( , ) . the cells were treated for h with [ h]-glycerol, to label the entire glycerolipid pools (steady-state levels) in the presence and absence of the ctl transport inhibitor hemicholinium- (hc- ) or ctl specific antibody ( fig. a and b) . surprisingly, hc- reduced the steady-state levels not only of pc ( - %) but also of pe ( %) in both cell types. ctl antibody similarly reduced pc and pe levels ( - %), further indicating that ctl could be involved in the transport of etn, in addition to its well-characterized function in cho transport ( , ) . µm, etn, cho and etn + cho were applied respectively (fig. d) . furthermore, ctl antibody inhibited c-etn transport in a concentration-dependent manner with an ic of ng (fig. e ). ctl antibody also inhibited both c-etn and h-cho transport in a concentration-dependent manner with lc ng (fig. f, g) . together, the data showed that etn is a substrate for ctl and ctl -mediated transport, in addition to their already established functions in cho transport.
we studied the kinetics of etn transport in monkey cos- cells and control (ctrl) and ctl deficient (m =slc a Δasp and m =slc a ser ) primary human fibroblasts. as expected, cos- cells and ctrl fibroblasts expressed ctl and ctl proteins while ctl mutant fibroblasts m and m only expressed ctl protein ( fig. a ). c-etn transport rates (v) plotted against [etn] produced a series of saturation curves, as expected for protein mediated transports ( fig. b) . vmax values were nearly identical in ctrl and cos- cells (vmax = . and . nmol/mg protein/min) and m and m cells had reduced but similar vmax = . - . nmol/mg protein/min ( fig. b) , apparently caused by the absence of the ctl transport component. indeed, the eadie-hofstee plots derived from the saturation curves were biphasic in ctrl fibroblasts and cos- cells and linear for m and m cells (fig. c ). this type of behavior indicated the presence of two distinct transport systems in ctrl and cos- cells with two binding constants, of high and low affinity for etn, and one transport system of a lower affinity in m and m cells. as further shown in fig. c , ctrl fibroblasts, high affinity (k = . ± . µm) and low (k = . ± . µm) affinity etn bindings were similar to cos- cells bindings (k = . ± . µm and k = . ± . µm). on the other hand, m and m cells are characterized by a single transport with a binding constant for etn of . - . µm which is the second (k ), low affinity, binding constant as determined in ctrl and cos- cells (fig. c ). m and m cells only express ctl and at levels similar to ctrl and cos cells, and do not have a functional ctl protein ( fig. a,d) , strongly implicating ctl as responsible for the low affinity etn transport. indeed, ctl depletion by sirna knockdown in ctrl cells completely abolished the low affinity transport component while the high affinity component remained intact (fig. e ). 
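the reasoning behind the biphasic versus linear eadie-hofstee plots can be illustrated with a minimal sketch. it assumes two independent michaelis-menten components; all constants and concentrations below are hypothetical placeholders, not values measured in this study, and the function names are introduced here for illustration only. a single saturable transporter gives a straight line with slope -k, whereas the sum of a high and a low affinity transporter gives segment slopes that change along the plot.

```python
# illustrative only: every kinetic constant below is hypothetical, not a value from this study

def michaelis_menten(s, vmax, km):
    """Rate of a single saturable transporter at substrate concentration s."""
    return vmax * s / (km + s)


def two_component_rate(s, vmax1, km1, vmax2, km2):
    """Sum of two independent saturable transporters (high and low affinity)."""
    return michaelis_menten(s, vmax1, km1) + michaelis_menten(s, vmax2, km2)


def eadie_hofstee(concentrations, rates):
    """Return (v/[S], v) pairs, the coordinates of an Eadie-Hofstee plot."""
    return [(v / s, v) for s, v in zip(concentrations, rates)]


def segment_slopes(points):
    """Slopes between consecutive Eadie-Hofstee points."""
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


concentrations = [1.0, 5.0, 20.0, 50.0, 100.0, 500.0]  # hypothetical [etn] values

# one low affinity component only (the situation proposed for the mutant cells)
single = [michaelis_menten(s, 30.0, 200.0) for s in concentrations]
# high plus low affinity components (the situation proposed for control cells)
double = [two_component_rate(s, 10.0, 5.0, 30.0, 200.0) for s in concentrations]

single_slopes = segment_slopes(eadie_hofstee(concentrations, single))
double_slopes = segment_slopes(eadie_hofstee(concentrations, double))
```

in this sketch every entry of `single_slopes` equals the negative of the assumed binding constant, while `double_slopes` drifts from a shallow to a steep value, which is the curvature that a biphasic plot reflects.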
this analysis also confirmed that the high affinity transport (k ), which is absent in m and m cells and remained intact in ctl sirna treated ctrl cells, is ctl -mediated etn transport. since the effects of ph and [na + ] ions on choline transport are well established ( ) , their effects on etn transport were also investigated ( fig. f and g). etn transport in ctrl (ctl + ctl transport) and m fibroblasts (ctl transport) (fig. f ) was reduced when extracellular ph was lowered from to . and stimulated when ph was increased to . . additionally (fig. g ), as expected, the rate of etn transport in ctrl cells was higher than in m cells but the rates were not modified when na + ions were replaced by li + ions in the uptake buffer. altogether, the data established that ctl and ctl act as etn/h + antiporters, driven by a proton gradient, and that both are independent of na + , as in the case of cho transport ( ) . state levels) showed unchanged pc, reduced pe, ps and dag and increased triglycerides (tag) in m cells ( fig. c and d) . therefore, reduced etn transport, slower p-etn and cdp-etn formation, and reduced dag levels collectively slowed the cdp-etn pathway (fig. a ,b) and reduced pe levels (fig. c ) in m cells. the cdp-etn formation from p-etn is usually the rate-regulatory step in the kennedy pathway and is controlled by pcyt (ctp: phosphoethanolamine cytidylyltransferase) ( ) . indeed, in accordance with the reduced cdp-etn formation above, the activity and expression of pcyt were also reduced in m cells (fig. e,f) . the expression of etn kinase (ek) was similarly decreased by % (fig. f) , explaining why the formation of p-etn was reduced in m cells (fig. a , b). in addition to pe, ps levels were reduced in m cells (fig. c) but the expression of ps synthesis (ps synthase / -pss / ) and ps degradation (ps decarboxylase-psd) genes was unaltered (fig. f ).
the expression of the pc synthesis gene pcyt (ctp: phosphocholine cytidylyltransferase) was decreased by % and that of choline kinase (ck) by % in m cells (fig. f ), yet unexpectedly pc levels were unchanged (fig. c ). we previously established ( ) that the constant pc levels in ctl deficient m and m cells are maintained by reduced pc turnover and increased formation from other phospholipids (pc is made at the expense of pe and ps), as the main mechanism to maintain pc as a source of choline in a new neurodegenerative disorder caused by frame-shift mutations in the ctl gene m =slc a Δasp and m =slc a ser ( ) . these data collectively provided strong genetic and metabolic evidence that ctl and ctl independently contribute not only to the cdp-cho but also to the cdp-etn kennedy pathway. ctl is not overexpressed in deficient m and m cells ( fig. a ) and as such it cannot compensate for the absence of ctl in those cells and affected individuals ( ) . to demonstrate that ctl and ctl are both etn and cho transporters, the cells were transiently transfected with ctl cdna or ctl cdna and the protein expression and transport were determined after h. as shown in fig. a , in m cells, which completely lack ctl protein, ctl cdna it is well known that various organic cations can inhibit ctl mediated cho transport ( ) . we assessed the inhibitory effect of organic cations and ctl knockdown on etn transport in ctrl and m cells (fig. ). this helped us understand the magnitude at which ctl and ctl contributed to etn transport. total (ctl +ctl ) transport ( showing that is not a ctl / related transport. finally, comparison of all transport velocities (fig. e ) showed a general order of contributions, from the high affinity ctl (ctrl + ctl sirna), to the low affinity ctl (m ), to the residual very low affinity (m +ctl sirna) transport for etn. ctl (ctrl) contributed %, ctl (m ) . %, and the unrelated residual transport (m + ctl sirna) accounted for . % of the total transport.
the ctrl and m cells express oct and octn and other unspecific cho transporters that could be contributing to this residual transport ( ) . ctl and ctl mediate etn transport to mitochondria. ctl and ctl are present in the mitochondria and are involved in mitochondrial cho transport ( , ) . we used coxiv as a marker of mitochondria isolated from ctrl and m cells and ctl mrna expression for the sirna knockdown of ctl transport (fig. a) . by comparing the contributions of all mitochondrial transport components (fig. b) , ctl +ctl (ctrl) contributed %, ctl (m ) %, and the residual unrelated transport (m + ctl sirna) contributed % to the total mitochondrial etn transport. we also compared the rates of [ c]-etn transport in the isolated mitochondria and the whole cells in the presence and absence of the specific inhibitor hc- . as expected, the ctl and ctl inhibitor hc- blunted etn uptake in a time-dependent manner in the mitochondria (fig. c , e) and the whole cells (fig. d, f) . in the absence of hc- , the rate of ctrl mitochondrial transport was similar to the whole cell ctrl transport ( . and . µmol/mg/min, respectively); m mitochondrial transport was also similar to the whole cell transport ( . and . µmol/mg/min, respectively) (fig. e, f) , demonstrating that the same proteins are responsible for the transport in the whole cell and mitochondria. in addition, the m mitochondrial (ctl only) transport was -fold slower than the total (ctl +ctl ) transport of the ctrl mitochondria. taken together, these data established that ctl and ctl mediate mitochondrial etn transport with the same kinetic properties as in the whole cells. pe and pc are bilayer-forming phospholipids involved in fundamental membrane processes, growth, survival and cell signaling ( ) .
pe and pc are similarly synthesized by cdp-etn and cdp-cho kennedy pathway, in which the extracellular substrates cho and etn are actively transported into the cell, phosphorylated, and coupled with diacylglycerols (dag) to form the final phospholipid product. cho and etn released from pc and pe also need to be transported in and out of the cytosol and mitochondria or reincorporated into the kennedy pathway. the plasma membrane ctl is firmly assigned to cho transport for pc synthesis ( ) , yet the exact function of the mitochondrial ctl is still not clear. in the liver and kidney mitochondria, cho is specifically oxidized to betaine, the major methyl donor in the one-carbon cycle ( ) . since broadly expressed, it is proposed that the mitochondrial ctl could maintain the intracellular pools of cho and as + h-antiporter could regulate the electrochemical/proton gradient in the mitochondria ( , , ) . ctl is only indirectly implicated in cho transport and until this work the exact ctl substrate binding and transport mechanism were not firmly established, in neither the whole cells nor isolated mitochondria. based on our extensive work on ctl and cho transport ( ) ( ) ( ) ( ) ( ) ( ) ) and known similarities between cho and etn transports in various conditions ( ) ( ) ( ) we postulated that ctl could be that long-searched for etn/cho transporter and the last missing link between cdp-cho and cdp-etn pathways for phospholipids synthesis. we conducted an extensive number of kinetic, metabolic, and genetic experiments to solidify this hypothesis. we established that ctl mediated a high affinity etn transport with k = - µm and that ctl mediated a low etn affinity transport with k = - µm in primary human fibroblasts and monkey cos cells. 
importantly, the ctl affinity constant for etn binding is in the range of the physiological etn concentration in rats and humans ( - µm) ( ) , which explains why ctl contributed most ( - %) of the etn transport in the whole cells and mitochondria. we recently described the first human disorder caused by homozygous frame-shift mutations in the ctl gene slc a : m = slc a Δasp , m = slc a Δser and m = slc a Δlys ( ) . after an extensive characterization of transport and metabolism in patient fibroblasts, it was apparent that diminished cho transport is the primary cause of this new neurodegenerative disorder with elements of childhood-onset parkinsonism and mpan (mitochondrial membrane protein-associated neurodegeneration)-like abnormalities. paradoxically, although cho transport and the cdp-cho kennedy pathway were diminished, pc remained preserved in the cerebrospinal fluid and skin fibroblasts of the affected individuals ( ) . the cell membranes were, however, drastically remodeled and depleted of pe and ps, apparently as a homeostatic response to preserve pc and prevent cho deficiency in the affected individuals ( ) . could maintain the intracellular pools of cho and etn and, since they are both proton antiporters, they could be significant regulators of the proton gradient in the mitochondria. recent studies in ctl knockout mice established that ctl -mediated mitochondrial cho transport is critical for atp and ros production, platelet activation and thrombosis ( ) . the ctl gene slc a is well established to encode the human neutrophil antigen ( ) , and is a genetic risk factor for hearing loss, meniere's disease and venous thrombosis ( ) . neutrophil ctl could interact directly with platelets' integrin αiibβ , and induce neutrophil extracellular trap formation (netosis), which then promotes thrombosis ( ) . in studies with ctl and ctl antibodies . x cells/well were seeded in -well plates and grown for h.
after h of growth, various amounts of antibodies were added to cells and incubated for h, and cho and etn uptake assays were then conducted. radiolabeling with h-glycerol as previously described ( ) . pcyt activity assay. the assay was conducted as described ( psd, pss and pss was determined by pcr using the primers and conditions as before ( ) . reactions were standardized by amplifying glyceraldehyde -phosphate dehydrogenase (gapdh) and relative band intensity was quantified using imagej software (nih, bethesda, md, usa). mitochondrial isolation. mitochondria were isolated as initially described ( ) . in brief, cells were incubated for minutes in ice-cold rsb swelling buffer and homogenized; ml ms buffer was added, and the cell homogenate was centrifuged at rpm for minutes. this step was repeated twice, and the final supernatant was centrifuged at , rpm. the resulting pellet was resuspended in ms buffer. the mitochondrial purity was determined using coxiv as a mitochondrial marker and β-tubulin as a whole cell control. statistical analysis. all measurements are expressed as means from quadruplicates ± sem. statistical analysis was performed using graphpad prism software (graphpad, inc.). data were subjected to student's t-test. differences were considered statistically significant at *p < . .
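the summary statistics described above (mean ± sem and a student's t-test) can be sketched as follows. this is a minimal illustration using only the python standard library, not the graphpad prism procedure used in the study; it assumes an unpaired, pooled-variance test, which the text does not specify, and the replicate values are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def student_t(a, b):
    """Unpaired two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


# hypothetical quadruplicate measurements (arbitrary units), not data from this study
ctrl = [1.0, 2.0, 3.0, 4.0]
mutant = [3.0, 4.0, 5.0, 6.0]

ctrl_mean, ctrl_sem = mean_sem(ctrl)
t_stat, df = student_t(ctrl, mutant)
```

the p value would then be read from the t distribution with `df` degrees of freedom, for example from a statistical table or a statistics package; that step is deliberately left out of this sketch.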
all study data are included in the article analyzed data and wrote the paper the relationship between phospholipids and insulin resistance: from clinical to experimental studies phosphatidylethanolamine deficiency in mammalian mitochondria impairs oxidative phosphorylation and alters mitochondrial morphology choline transport for phospholipid synthesis choline transport for phospholipid synthesis: an emerging role of choline transporter-like protein the solute carrier a is a mitochondrial protein and mediates choline transport betaine and choline improve lipid homeostasis in obesity by participation in mitochondrial oxidative demethylation choline supplementation restores substrate balance and alleviates complications of pcyt deficiency palmitic acid and oleic acid differentially regulate choline transporter-like levels and glycerolipid metabolism in skeletal muscle cells de novo synthesis of phospholipids is coupled with autophagosome formation ethanolamine and phosphatidylethanolamine: partners in health and disease regulation of phosphatidylethanolamine homeostasis -the critical role of ctp:phosphoethanolamine cytidylyltransferase (pcyt ) ethanolamine and choline transport in cultured bovine aortic endothelial cells effect of ethanolamine on choline uptake and incorporation into phosphatidylcholine in cells. 
human y retinoblastoma uptake of ethanolamine in neuronal and glial cell cultures choline transporter-like deficiency causes a new type of childhoodonset neurodegeneration isoform-specific and protein kinase c-mediated regulation of ctp:phosphoethanolamine cytidylyltransferase phosphorylation stimulation of the human ctp:phosphoethanolamine cytidylyltransferase gene by early growth response protein mechanism of choline deficiency and membrane alteration in postural orthostatic tachycardia syndrome primary skin fibroblasts functional expression of choline transporter like-protein (ctl ) and ctl in human brain microvascular endothelial cells molecular and functional characterization of choline transporter in human colon carcinoma ht- cells molecular and functional characterization of choline transporter-like proteins in esophageal cancer cells and potential therapeutic targets identification and functional analysis of choline transporter in tongue cancer: a novel molecular target for tongue cancer therapy ethanolamine inhibits choline uptake in the isolated hamster heart effects of amino acids and ethanolamine on choline uptake and phosphatidylcholine biosynthesis in baby hamster kidney- cells functional analysis of [methyl-( )h]choline uptake in glioblastoma cells: influence of anti-cancer and central nervous system drugs ethanolamine ameliorates mitochondrial dysfunction in cardiolipin-deficient yeast cells ethanolamine enhances the proliferation of intestinal epithelial cells via the mtor signaling pathway and mitochondrial function looking beyond structure: membrane phospholipids of skeletal muscle mitochondria cardiolipin and mitochondrial cristae organization effects of lipids on mitochondrial functions the choline transporter slc a controls platelet activation and thrombosis by regulating mitochondrial function molecular genetics of the human neutrophil antigens slc a single nucleotide polymorphisms, isoforms, and expression: association with severity of 
meniere's disease? bercu activated αiibβ on platelets mediates flow-dependent netosis via slc a . elife characterization of hemostasis in mice lacking the novel thrombosis susceptibility gene slc a slc a deficient mice have a reduced response in stenosis but not in hypercoagulability driven venous thrombosis slc a -a novel therapeutic target for venous thrombosis? identification and expression of a mouse muscle-specific ctl gene a rapid method of total lipid extraction and purification developmental and metabolic effects of disruption of the mouse ctp:phosphoethanolamine cytidylyltransferase gene (pcyt ) we thank christina fagerberg (odense university, denmark) and felix distelmaier the authors declare no competing interests. key: cord- -x zzuu v authors: contu, lara; balistreri, giuseppe; domanski, michal; uldry, anne-christine; mühlemann, oliver title: characterisation of the semliki forest virus-host cell interactome reveals the viral capsid protein as an inhibitor of nonsense-mediated mrna decay date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: x zzuu v the positive-sense, single-stranded rna alphaviruses pose a potential epidemic threat. understanding the complex interactions between the viral and the host cell proteins is crucial for elucidating the mechanisms underlying successful virus replication strategies and for developing specific antiviral interventions. here we present the first comprehensive protein-protein interaction map between the proteins of semliki forest virus (sfv), a mosquito-borne member of the alphaviruses, and host cell proteins. among the many identified cellular interactors of sfv proteins, the enrichment of factors involved in translation and nonsense-mediated mrna decay (nmd) was striking, reflecting the virus’ hijacking of the translation machinery and indicating viral countermeasures for escaping nmd by inhibiting nmd at later time points during the infectious cycle. 
in addition to observing a general inhibition of nmd about hours post infection, we also demonstrate that transient expression of the sfv capsid protein is sufficient to inhibit nmd in cells, suggesting that the massive production of capsid protein during the sfv reproduction cycle is responsible for nmd inhibition. as we live through the current sars-cov pandemic, the world is reminded of the unpredictable nature of viral epidemics and the importance of studying potential emerging viral threats. recent studies present valid arguments for the worldwide epidemic threat of alphaviruses (among other arboviruses) that currently circulate endemically in particular regions , . the outbreak potential of alphaviruses has already been showcased by the two worldwide epidemics caused by chikungunya virus (chikv) that affected more than million people in over countries and could be attributed to a single point mutation leading to a -fold increase in infectious virus in the salivary glands of urban mosquitoes , . this demonstrates that small genetic alterations can cause dramatic changes in human transmissibility and infection. semliki forest virus (sfv) is closely related to chikv, and both are grouped evolutionarily within the semliki forest (sf) clade of the old world alphaviruses (family: togaviridae) . sfv causes lethal encephalitis in mice . though mostly associated with mild febrile illness or asymptomatic infection in humans, sfv is endemic to african regions and a handful of studies indicate serious disease-relevant symptoms associated with sfv in humans, including encephalitis, myalgia and arthralgia - . sfv is a small (~ nm in diameter), enveloped virus comprising a nucleocapsid core made up of copies of capsid protein that surrounds its positive-sense single-stranded rna genome (~ . kb).
the genome contains a ´ cap (n mgppp) and poly(a) tail and is organised into two distinct open reading frames (orfs). the first orf encodes the non-structural proteins (nsp , nsp , nsp and nsp ) ( figure a), which are translated as one polyprotein (p ) immediately upon exposure of the viral mrna- genome to the cytoplasm [ ] [ ] [ ] [ ] . the polyprotein is then proteolytically cleaved by the protease activity of nsp to yield functional viral replicase complexes . the first protein to be cleaved from the polyprotein is nsp , comprising rna-dependent rna polymerase activity. the resulting p polyprotein in complex with nsp forms the viral replication complex (rc), responsible for synthesizing minus strand template rna from the genomic viral (v)-rna early during infection . the ensuing double-stranded vrna intermediates can trigger the activation of protein kinase double-stranded rna-dependent (pkr), resulting in phosphorylation of the α-subunit of the eukaryotic translation initiation factor (eif ) and thus causing a decrease in global translation of host cell messenger rnas (mrnas) , , . as proteolytic cleavage of p by nsp progresses, individual nsps form new viral rcs of altered composition, resulting in a shift from synthesis of the minus strand template, to synthesis of new viral genomes and viral subgenomic rna (sgrna) from the s promoter ( figure a ) , . alphavirus replication occurs in membrane invaginations called 'spherules', where high concentrations of rcs are present , . binding of host cell proteins to rcs has been reported, though the abundances and the functions thereof are still not fully understood , , . in addition, individual sfv proteins localise independently of the rc to perform functions separate from viral replication , , , . one example is the nsp protein, which translocates to the nucleus here, we investigated the virus-host protein interactome of sfv. 
a greater understanding of the repertoire of host proteins that may be exploited by viruses is a vital first step toward developing antiviral strategies aimed at targeting or interfering with interactions that may be critical for the infection. while previous studies have reported host interactors of sfv from isolations of rcs from lysosome fractions, as well as affinity purifications and localisation studies of nsp -tagged recombinant virus , [ ] [ ] [ ] , there are so far no sfv studies that assess the complete set of viral-host protein-protein interactors (ppi). using affinity purifications followed by high-throughput quantitative mass spectrometry, we identified host protein interactors of individual sfv proteins in human cells. in addition, using a genome-wide sirna screen we assigned pro or antiviral functions to some of the identified sfv interactors. gene ontology (go) enrichment analyses of protein complexes that could form between the identified host interactors revealed highly significant go terms related to translation and nonsense-mediated mrna decay (nmd). nmd is known to restrict infection of alphaviruses, but whether and how the virus counteracts this cellular intrinsic defence is still not clear . here we show that during the course of infection sfv suppresses nmd. we present evidence that the capsid protein of sfv is sufficient to suppress nmd independently of translation inhibition. as obligatory parasites, all viruses exploit the host cell to favour their own replication. in turn, cells have evolved mechanisms to protect against viral infections. to gain insight into the repertoire of host proteins that could be exploited by sfv, we systematically mapped the interactions between the individual sfv proteins, nsp - , c, and the envelope polyprotein, env (which includes e , e , k and e ), and the host cell proteome using high-throughput quantitative mass spectrometry (figure a) . 
the sfv proteins were n-terminally tagged with xflag and transiently expressed in hela cells, a cell type susceptible to infection by sfv . the proteins were then affinity purified from the respective lysates using anti-flag antibodies, with and without treatment with rnase a to distinguish rna- mediated from protein-mediated interactions (figure b) . western blot analysis of the eluates from each anti-flag affinity purification revealed the successful pulldown of all six transiently expressed sfv proteins (figure c) . the protein compositions of the eluates, from three biological replicates of each affinity purified sfv protein, were analysed by quantitative mass spectrometry (figure a and suppl. figure ). significant interactors (see methods) (figure b, purple circles and table ) were further filtered by abundance, such that proteins whose abundance made up at least . % of the relevant sfv bait protein were figure . strategy for creation of semliki forest virus (sfv) -host protein-protein interaction map. a, schematic illustration of the genomic organisation of sfv. the first orf encoding the non-structural proteins (nsp , nsp , nsp , nsp ) is highlighted in green. the zsg tag inserted within the nsp protein is also depicted. the second orf encoding the structural proteins (capsid, e ,e , k,e ) is highlighted in orange. other viral features depicted include the ´ cap, poly(a) tail, as well as the position of the s subgenomic viral promoter. b, flowchart outlining the experimental approach to transiently express n- terminally flag-tagged (yellow rectangle) sfv proteins in mammalian cells in order to construct a sfv-host protein-protein interactome. nsp -z refers to the nsp protein with the zsg tag, c refers to the capsid, and e e k e refer to the envelope proteins (env) that were expressed as one polyprotein. c, anti-flag western blot of sfv proteins after transient transfection and affinity purification from hela cells (without rnase a treatment). 
red asterisks indicate xflag tagged sfv proteins at their expected sizes. untransfected cells (untr) and cells transfected with a plasmid encoding only the xflag tag with no additional coding region (empty) were included as controls. the expected sizes of the xflag-tagged proteins were: empty ~ kda; nsp ~ kda; nsp ~ kda; nsp -z ~ kda; nsp ~ kda; capsid ~ kda and env (polyprotein ~ kda, cleavage intermediates ~ kda / ~ kda). the affinity purifications were conducted in triplicate (± rnase a treatment), and eluates analysed by mass spectrometry. table ). in the case of the nsp bait (here fused with the fluorescent protein zsgreen, nsp -z), which was very lowly abundant in the sample as it proved difficult to elute from the beads (suppl. figure ) , we retained proteins whose abundance made up at least % of the bait (table ). the heat map in figure c summarises the most abundant significant interactors of each sfv protein in the -rnase a samples, with their corresponding abundance in the +rnase a samples alongside them. many of the host interactors identified in the -rnase a sample were lost upon treatment with rnase a, indicating that these interactions were likely mediated by rna. this was clear for many nsp , nsp -z and capsid interactors, where the heat map ( figure c ) corroborated the observations in the analytical silver stain gel (figure a) . in both the heat map and the gel, we noted proteins in the nsp eluate that were enriched in the +rnase a sample compared to the -rnase a sample. also reflected in the heat map were proteins observed in the gel that were more than or as abundant as nsp (~ kda) (figure a and c, +rnase a samples). these agreements between the quantified lists obtained to create the heat map and the observations in the analytical silver stain gel gave us confidence in the strategy employed to collect host protein interactors from the mass spectrometry datasets. 
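the abundance filter described above, in which an interactor is retained only if its abundance reaches a minimum percentage of the bait, can be sketched as follows. this is a minimal illustration and not the authors' pipeline; the protein names, abundance values and the threshold are hypothetical, since the actual cut-offs are not recoverable from the text here.

```python
def percent_of_bait(abundances, bait):
    """Express every co-purified protein's abundance as a percentage of the bait's abundance."""
    bait_abundance = abundances[bait]
    return {
        protein: 100.0 * value / bait_abundance
        for protein, value in abundances.items()
        if protein != bait
    }


def filter_by_percent_of_bait(abundances, bait, min_percent):
    """Keep only proteins whose abundance is at least min_percent of the bait."""
    return {
        protein: pct
        for protein, pct in percent_of_bait(abundances, bait).items()
        if pct >= min_percent
    }


# hypothetical quantified eluate for one bait (arbitrary intensity units)
eluate = {"bait": 1000.0, "host_a": 50.0, "host_b": 0.5, "host_c": 5.0}

# hypothetical threshold of 0.5 % of bait
kept = filter_by_percent_of_bait(eluate, "bait", min_percent=0.5)
```

a lowly abundant bait, such as the one that was difficult to elute from the beads, would inflate every percentage, which is one reason a separate, higher threshold for that bait is a plausible design choice.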
since many sfv-host protein interactions were dependent on rna, we chose to focus on the lists of interactors from the -rnase a datasets going forward. a summary of these revealed that a large fraction of the interactions for nsp , nsp -z and capsid consisted of ribosomal proteins (figure d ). these stringent lists of interactors are displayed as sfv-host interactome networks (figure a and suppl. figure ). affinity purified sfv proteins are displayed as black circles, while host cell proteins are displayed as smaller, colour-coded circles. host proteins that were identified as unique interactors of one of the sfv proteins are connected to the relevant sfv protein with a grey line. many of the host proteins were identified as interactors of more than one of the sfv proteins. for simplicity, these non-unique interactors are grouped into grey boxes with grey lines connecting the whole group of proteins to the sfv proteins for which they were identified as interactors (figure a ). considering this overlap, the total number of host proteins that were identified as interactors was (figure a and suppl. figure ) , of which were ribosomal proteins and are shown separately (suppl. figure ). host proteins displayed in the networks were manually curated and categorised into colour-coded groups based on descriptions gathered from both gene ontology (go) and string analyses (figure a). interactions that stood out included subunits of the chaperonin-containing t-complex polypeptide (cct complex) (pink) that interacted with both nsp and nsp , a number of cytoskeletal proteins or proteins involved in cytoskeletal signalling (grey) interacting with nsp and nsp (tubulins), er chaperones (pink) bound uniquely to env, and a large number of rna binding proteins (violet) interacting with nsp , nsp -z and capsid.
in addition, a striking presence of rrna processing / ribosome biogenesis factors emerged as interactors, many of which were found bound uniquely to the capsid (dark pink) (figure a ). previously reported human protein-protein interactions were analysed by string and additionally displayed on the networks (pink dashed lines). the dense network of edges (pink dashed lines) that emerged among the rrna processing / ribosome biogenesis factors (dark pink) reflects the known protein-protein interactions that have been reported between them (figure a). this indicates that the capsid protein (and to a lesser extent nsp /nsp -z) may interact with a complex of proteins involved in rrna processing and/or ribosome biogenesis.
depicted. c, using the list of -rnase a interactors, a threshold of abundance of at least . % of the bait protein (in the case of nsp -z, at least % of the bait protein) was applied. a heat map summarising either the top most abundant interactors (or all if there were fewer than interactors identified) for each sfv bait protein without rnase a treatment (-) is shown. the corresponding abundance as % of bait of the interactors that were statistically significantly enriched when treated with rnase a (+) is also shown in the heat map. grey blocks indicate proteins that did not appear or were not statistically significantly enriched in the +rnase a samples. note that due to the low abundance of nsp -z, the abundance as % of bait of nsp -z and all its interactors are presented as a factor of less than what was calculated (for better visual representation of the heat map as a whole). for all values and the complete list of interactors, see table . d, bar graph summary of the
we next set out to determine whether the identified host proteins have pro- or antiviral effects in the context of infection.
to systematically address this question, we used a genome-wide fluorescence microscopy-based sirna screen to identify host proteins that affected sfv replication (table ). an infection index (ii) was calculated for each gene, indicating the fold change of infection upon depletion of the gene product, compared to the control, non-specific sirnas (set as ). a low multiplicity of infection (moi) of . was used to allow for detection of both reduced and increased infection levels. the maximum ii obtained in the screen was . (table ). we therefore chose to set an ii threshold of . to identify proteins having a potential antiviral role against sfv, and an ii threshold of . to indicate proteins having a potential proviral role for sfv (table ). when we compared the proteins identified in the sirna screen with the sfv-host protein interaction networks, a sizable fraction of the interactors overlapped. those with potential pro- or antiviral roles during infection are depicted with turquoise or green outlines, respectively (figure a and suppl. figure ) and collected in two lists: 'proviral' and 'antiviral', with their ii values shown alongside them (figure b ). as expected, all the identified ribosomal proteins affecting sfv replication had a proviral effect (suppl. figure ) and are listed separately (figure b ). in addition to ribosomal proteins, many other rna binding proteins were also identified as having a potential role in sfv replication (figure a (nsp , nsp , nsp -z, nsp , capsid and env) as well as for the merged list (all baits) was also applied. this allowed us to compare the significance of the go terms collected in figure a for the different sfv proteins (figure b ). we observed that the most significantly enriched go terms were those related to translation, nmd and ribosome biogenesis. in addition, it became clear that it was mainly the interactors of nsp , nsp -z and capsid that contributed to the enrichment of these go terms. because of the enriched go terms, we chose to investigate the effect of sfv on translation, nmd and ribosome biogenesis.
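The infection index (II) used in the sirna screen is a fold change of infection relative to control sirnas, with an upper threshold marking potential antiviral factors (depletion increases infection) and a lower threshold marking potential proviral factors (depletion decreases infection). A minimal sketch of that classification is given below. The function names, gene names, infection percentages and both threshold values are hypothetical placeholders and are not the values used in the study.

```python
def infection_index(percent_infected, percent_infected_control):
    """Fold change of infection relative to control-siRNA wells."""
    if percent_infected_control <= 0:
        raise ValueError("control infection rate must be positive")
    return percent_infected / percent_infected_control


def classify_host_factor(ii, antiviral_threshold, proviral_threshold):
    """Label a depleted gene product from its infection index.

    A high II means infection rose when the protein was depleted (antiviral);
    a low II means infection fell when it was depleted (proviral).
    """
    if ii >= antiviral_threshold:
        return "antiviral"
    if ii <= proviral_threshold:
        return "proviral"
    return "unclassified"


# Hypothetical thresholds and measurements, for illustration only.
HYPOTHETICAL_ANTIVIRAL = 1.5
HYPOTHETICAL_PROVIRAL = 0.5
control_percent = 10.0
measured = {"geneA": 30.0, "geneB": 2.0, "geneC": 10.0}

calls = {
    gene: classify_host_factor(
        infection_index(pct, control_percent),
        HYPOTHETICAL_ANTIVIRAL,
        HYPOTHETICAL_PROVIRAL,
    )
    for gene, pct in measured.items()
}
```

Because the index is a ratio to the control, a low baseline infection rate leaves room to detect changes in both directions, which is the stated reason for using a low moi.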
we reported previously that the nmd machinery could target the sfv genome independently of the ´ utr . half-life measurements of the genome of a replication incompetent sfv mutant suggested that this occurred early during infection, upon entry of the viral genome into cells . since viruses are known to evade cellular defence responses against viral infection, we wondered whether the virus could inhibit nmd at later stages of infection. since nmd depends on translation and viruses are known to inhibit translation, it was important to carefully analyse the time course of infection in our system in an attempt to disentangle these two tightly linked cellular processes. we used anti-zsg or anti-nsp antibodies to detect nsp -z, as a representative for the presence of early produced non-structural proteins expressed from the grna, and anti-capsid antibodies to detect the capsid, as a representative for the presence of structural proteins that are expressed from sgrnas later during the virus replication cycle (figure a and b). the nsp -z protein was reproducibly detected at - hours post infection (p.i.) and as early as hours p.i., while the capsid was reproducibly detected at hours p.i. (figure a and b). we measured the presence of phosphorylated (p-)eif α compared to total eif α as an indication of virus-induced translation inhibition and showed that a virus-dependent accumulation of p-eif α was reproducibly detected at - hours p.i. (figure a ). in addition, we performed time course puromycin incorporation assays to assess global translation activity using a more direct method . this assay involves a puromycin pulse for minutes, which causes the release of nascent polypeptides and results in many puromycin- labelled polypeptides of different lengths that can then be visualised by western blotting using anti- puromycin antibodies. the decrease in puromycin - hours p.i. 
is therefore indicative of a decrease in global translation, in agreement with the observed increase in p-eif α ( figure b ). we set aside samples from the time course infections in figure a to assess nmd activity by measuring relevant rna levels by rt-qpcr. to assess nmd activity, we adapted an assay described in , which measures the relative amounts of a nmd-sensitive splice isoform (nmd target) versus a nmd insensitive protein coding isoform (non-nmd target) of the same gene. we showed that at hours started to accumulate at hours p.i., whereas the increase of bag _nmd only became apparent at hours p.i. (suppl. figure ). in addition, we measured the rna levels of the well-known endogenous nmd targets, rp p, ire α and gadd , which also accumulated - hours p.i. (figure c and suppl. figure ). together, these data are indicative of reduced nmd activity - hours p.i., suggesting that sfv can indeed inhibit nmd at later stages of infection. since the timing of the nmd inhibition correlated with that of eif α-dependent inhibition of cellular mrna translation, we were unable to pull apart the effect the viral infection had on the two cellular processes independently. translation inhibition by sfv, and rna viruses in general, is well described to occur through induction of p-eif α, which occurs upon host cell detection of the double-stranded viral rna intermediate that arises during its replication cycle , , . we therefore reasoned that, if one of the sfv proteins was responsible for the nmd inhibitory phenotype, we would be able to disentangle the effect of the virus on the two cellular processes. taking the mass spectrometry data analysis ( figure b ) into account, we reasoned that nsp , nsp -z or capsid could be responsible for the nmd inhibitory phenotype. first, in order to confirm the interactions identified between nsp , nsp -z and capsid with cellular upf (figure were identified in the upf eluates of the infected sample ( figure d ). 
it should be noted that their abundance was higher than that of the nmd factors, upf a and smg . taken together, we were therefore able to confirm the rna-mediated interactions of nsp -z and capsid with upf . the results above suggested that nsp -z and capsid were the most likely candidates to influence nmd activity in cells. nevertheless, we decided to analyse the effect of all individual sfv proteins on nmd activity. to do this, the sfv proteins (from plasmids described in figure b suppress nmd in cells. since translation-related go terms were also enriched for capsid interactors (among other sfv proteins) (figure b) , it was important to investigate whether expression of capsid influenced translation in cells, as this would in turn influence nmd activity. we showed that expression of capsid did not induce p-eif α (figure c ) nor, as judged from the puromycin incorporation assays, effect changes in global translation (figure d ). in addition, polysome profile gradients revealed that cells expressing the sfv capsid retained intact polysomes (suppl. figure ) , indicative of unperturbed translation. these three independent sets of data convincingly show that expression of the capsid protein did not influence global translation in cells. we therefore concluded that the sfv capsid suppresses nmd through a mechanism independent of translation inhibition. since ribosome biogenesis related go terms were most highly enriched among capsid interactors compared to the other sfv proteins, we used capsid expressing cells to look for any indication of altered ribosomes/rrna that could give us any hints on the mechanistic action of capsid. though capsid could be trapped in the nucleus upon blocking of export, we were unable to find any phenotypic changes in polysome gradients (suppl. figure ) (figure a and c) . 
as such, a large number of rna-binding proteins were identified as host interactors of these three sfv proteins (figure a) , raising the question of whether they could play a role in altering and exploiting the compositions of mrnps during infection. some of the identified rna-binding proteins that were previously found in sfv rcs include hnrnpc, hnrnpa , sfpq, dhx , ddx x, pabpc , g bp and g bp . in addition to being found in rcs, the nsp : g bp interaction has been well characterised - . sfv nsp has been reported to bind g bp and suppress the formation of stress granules, thought to have antiviral activity . consistent with these previous findings, we identified g bp , g bp and usp , a deubiquitinase protein known to bind g bp , as interactors of nsp -z. usp was identified as a unique interactor of nsp , while g bp and g bp were identified as also interacting with nsp and capsid. thanks to a nuclear translocation signal, nsp shuttles between cytoplasm and nucleus during infection and interferes with transcription . in line with this, and as observed by others, we observed localisation of the xflag-nsp in the nucleus of hela cells at steady state (data not shown). our study identified a number of nuclear proteins interacting with nsp , many of which are splicing regulators, including hnrnpc, hnrnpa , hnrnpa , hnrnpf, hnrnph and sfpq. hnrnpc was the most abundant significantly enriched protein in the nsp pulldown. it was also significantly enriched and abundant in the rnasea-treated nsp sample, indicating that this interaction may be either direct or mediated by another protein. interestingly, sfpq was found to be one of the most abundant interactors not only of nsp , but also of capsid ( figure c). sfpq was additionally identified through the sirna screen, along with the mrna export factor, nxf- , as being among the strongest proviral interactors (i.e. depletion of these factors inhibited viral infection). 
the binding of nsp and capsid to these nuclear proteins could influence the regulation of rna processing or mrna modification steps in the nucleus or re-localise nuclear proteins to the cytoplasm in order to achieve a productive infection. some cytoplasmic viruses, for example, hijack nuclear proteins including splicing factors (hnrnps and sfpq) from the nucleus to the cytoplasm, increasing infectivity - . we also identified rna-binding interactors exhibiting strong antiviral activity, including rps a, which, interestingly, was the only ribosomal protein found bound not only to nsp but also to nsp , nsp and env (figure c ). we were surprised by the presence of newly identified nuclear interactors involved in rrna processing and ribosome biogenesis, many of which exhibited antiviral activity. interestingly, many of these were uniquely bound to capsid (figure a). evidence of capsid in the nucleus has previously been reported and we were able to trap xflag-capsid in the nucleus of hela cells upon blocking of export (data not shown). even so, little is known about the role of the capsid in the nucleus and how this could affect the cellular ribosome. we therefore assessed polysome gradients (suppl. figure ) and measured s, s and s precursor rrnas in nuclear fractions of capsid expressing cells (data not shown), but found no obvious phenotypic changes compared to cells expressing the 'empty' vector. perhaps the effects of these interactions on the ribosome are more subtle or only affect a small pool of 'specialised' ribosomes, making changes difficult to detect. since viruses rely on the host cell ribosome for translation of their own genomes, the involvement of the viral proteins in recruiting or potentially altering host cell ribosomes, through interaction with specific ribosomal proteins and this novel set of ribosome biogenesis factors, warrants further investigation.
to obtain additional hints about possible functional consequences of the detected interactions between sfv and host cell proteins, we used mcode to analyse the protein complexes that could form between all host interactors, including the ribosomal proteins (figure and suppl. figure ). go enrichment analyses of the protein complexes reinforced many of the cellular processes discussed above and revealed that the most highly enriched go terms were related to translation and nmd (figure a and b) . as a counter defence strategy, viruses are known to inhibit cellular mrna decay factors that can degrade viral rnas and restrict infection , , . we therefore hypothesized and decided to investigate, whether sfv was able to inhibit the nmd pathway, which has antiviral activity against alphaviruses . indeed, we found that starting from - hours after infection, sfv antagonises the nmd pathway, with consequent stabilisation of bona fide nmd mrna transcripts (figure c and suppl. fig. ) . viruses known to inhibit mrna decay pathways do so by different mechanisms. often a viral protein counteracts a key cellular regulator. here, we show novel data that in the case of sfv, it is the capsid protein that inhibits nmd (figure b) . therefore, inhibiting this function of the viral capsid could lead to novel avenues for therapeutic intervention. using both sfv protein affinity purifications in transient expression experiments and upf ips in sfv infected cells, we show that the core nmd factor upf binds to sfv capsid among other sfv proteins in an rna-dependent manner. together, this indicates that the capsid, and potentially other sfv proteins, associate with mrnp molecules that also contain upf . the large number of ribosomal proteins pulled down by capsid (figures d, s , and b) reported , . it was also postulated that the capsid or 'core' protein of hepatitis c virus (hcv) may be responsible for the nmd inhibitory phenotype that was reported upon hcv infection . 
our findings therefore add sfv as the first alphavirus to a growing list of viruses of which the capsid protein is responsible for an nmd inhibitory effect. though the stability of the sfv genome has thus far been attributed to evasion of deadenylation through binding to hur , the virus may require additional strategies to protect itself in order to ensure efficient translation of viral genes and packaging of genomes into new progeny viruses. perhaps the sfv capsid plays a protective role against degradation of its rnas by nmd. in summary, we present here two valuable resources that will aid in the study of sfv: a sfv-host protein interactome as well as a genome-wide sirna screen for host factors influencing sfv infection. the coding sequences for the sfv proteins were cloned into pcdna _frt_to_ xflag(n), to yield pcdna . xflag-nsp , pcdna . xflag-nsp , pcdna . xflag-nsp -zsg, pcdna . xflag-nsp , pcdna . xflag-capsid and pcdna . xflag-env, which were used for all subsequent transfections. pcdna _frt_to_ xflag(n) was linearised using bamhi. pcr products for each of the sfv proteins, were generated from either sfv-zsg(- 'utr) , sfv-capsid or sfv-envelope plasmids using the following primers: buffer and μl of the cleared cell lysate was added, and the mixture was then incubated at °c for one hour with rotation. beads were collected on a magnet and washed three times with lysis buffer. at the third wash, each sample was split into two (for treatment or no treatment with rnase a). for +rnase a samples, rnase a treatment was performed on the beads. the supernatant was removed using the magnet, and μl of rnase a ( . mg/ml, sigma aldrich) containing lysis buffer was added to the beads and incubated at °c for minutes, shaking. thereafter, the rnase a treated beads were washed with lysis buffer, and then samples eluted from the beads using xflag peptide (sciencepeptide.com): μg of flag peptide was incubated with the beads at °c for minutes, shaking. 
the flag peptide elution was also performed for the -rnase a samples. thereafter, the eluates were collected and μl of loading buffer ( xlds + dtt [ mm]) was added to each eluate, while μl of loading buffer was added to the remaining beads samples. the samples were then incubated at °c for minutes, ready for analysis by western blot, silver stain, and coomassie gel for mass spectrometry sample preparation. samples were electrophoresed on -well nupage™ - % bis-tris gradient gels (thermo fisher scientific) in xmops running buffer at v for approx. hour. the gels were fixed in % methanol / % acetic acid for one hour at room temperature, followed by three minutes washes in % ethanol. the gels were sensitised for minutes in . % sodium thiosulfate, followed by three minutes washes in milli-q h o. the gels were stained in . % silver nitrate / . % formaldehyde for minutes, followed by two minute washes in milli-q h o. the gels were developed in . m sodium carbonate + . % formaldehyde / . % sodium thiosulfate and then incubated in % methanol / % acetic acid for minutes. thereafter, the gels were placed in % acetic acid for short term storage at °c. images were taken using the gel documentation system (www.vilber.com) the protein compositions of the eluates from the sfv affinity purifications were analysed by label-free quantitative mass spectrometry. the eluates ( μl) were electrophoresed in xmops running buffer about cm into the -well nupage™ gels and then stained with coomassie-blue ( % phosphoric acid, % ammonium sulfate, . % coomassie g- , % methanol) as described previously . images were taken using the gel documentation system. rectangular segments ( mm x mm) for each lane were cut from the gels using sterile blades. the gel pieces were reduced, alkylated and digested by trypsin as described elsewhere to custom nsp , nsp , nsp -z, nsp , capsid and env sequences. 
the following maxquant settings were used: separate normalisation groups for the +rnase a and -rnase a samples, mass deviation for precursor ions of ppm for the first search, maximum peptide mass of da, match between runs activated with a matching time window of . min only allowed across replicates; cleavage rule was set to strict trypsin, allowing for missed cleavages; allowed modifications were fixed carbamidomethylation of cysteines, variable oxidation of methionines, deamidation of asparagines and glutamines and acetylation of protein n-termini. protein intensities are reported as maxquant's itop values (sum of the intensities of the three most intense peptides). peptide intensities were first normalised by variance stabilisation normalisation and imputed. imputation values were drawn from a gaussian distribution of width . centred at the sample distribution mean minus . x the sample standard deviation, provided there were at least evidences in the replicate group. in order to perform statistical tests, itop values were further imputed at the protein level, following the rule: 'if at least two detections in at least one group' and using the following protein impute parameters: imputation values were drawn from a gaussian distribution of width . centred at the sample distribution mean minus . x the sample standard deviation. potential contaminants and proteins marked 'only identified by site' were removed prior to performing differential expression (de) tests, which were done by applying the empirical bayes test followed by the fdr-controlled benjamini and hochberg correction for multiple testing. a significance curve was calculated based on a minimal log fold change of and a maximum adjusted p-value of . . proteins that were consistently reported as de throughout imputation cycles were flagged as "persistent". in addition, saint analysis was performed according to and a significance threshold of fdr< . was applied.
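The multiple-testing step named above, the Benjamini and Hochberg correction, can be sketched in a few lines. This is an illustration only, under stated assumptions: the raw p-values are taken as given (the empirical Bayes test itself is not reproduced), the function names are hypothetical, and the final call uses a simple rectangular cut-off on log fold change and adjusted p-value as a simplified stand-in for the significance curve described in the text. All numeric values in the example are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases (step-up procedure).
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def is_enriched(log_fc, adj_p, min_log_fc, max_adj_p):
    """Simplified rectangular cut-off (an assumption, not the paper's curve)."""
    return log_fc >= min_log_fc and adj_p <= max_adj_p


# Hypothetical raw p-values and log fold changes, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.005]
log_fcs = [2.0, 0.2, 1.5, 3.0]
adj_p = benjamini_hochberg(raw_p)
enriched = [
    is_enriched(fc, p, min_log_fc=1.0, max_adj_p=0.05)
    for fc, p in zip(log_fcs, adj_p)
]
```

In a real analysis an established implementation would normally be preferred over a hand-written one; the sketch is only meant to make the ranking and monotonicity steps explicit.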
significant interactors for each sfv bait protein were defined as those that were significantly differentially expressed (enriched) compared to the untransfected control and the significance persisted throughout the imputation cycles. in addition, the proteins taken as "significant interactors" had to be considered "true interactors" as determined using saint analysis (with a threshold fdr of ≤ . ). to simplify the lists and attempting to retain potentially biologically relevant interactors, we further filtered the lists, retaining proteins whose abundance made up at least . % of the relevant bait protein. in the case of the nsp -z bait, which was very lowly abundant in the sample as it proved difficult to elute from the beads, we retained proteins whose abundance made up at least % of the sfv bait. the final lists of proteins were used to create sfv-host protein interaction networks using cytoscape_v . . . string analysis (string-db.org, version . ) was performed using a minimum required interaction score of highest confidence ( . ) for both database and experimental evidence and the known protein-protein interactions were overlaid onto the sfv-protein interactome networks, using cytoscape_v . . . were infected for hours with sfv-zsg at a concentration giving an infection rate of approx. % in control sirna-treated cells. following fixation in % formaldehyde and hoechst staining, nine images per well were acquired using high-content automated fluorescence microscopes (imagex-press, molecular devices). infected cells were detected using cell profiler (www.cellprofiler.org) and advanced cell classifier (www.acc.ethz.ch/acc.html) and the % of infected cells per well was determined. an "infection index" value was calculated for each gene, indicating the fold change of infection upon depletion of the gene product, compared to the control sirnas, which was set as . an infection index threshold of . 
was chosen to indicate proteins having a potential antiviral role against sfv, and an infection index threshold of . to indicate proteins having a potential proviral role for sfv ( table , raw data). the proteins identified in the sirna screen were overlaid with the sfv-host protein interaction networks. protein-protein interaction enrichment analysis was performed using the metascape online tool (www.metascape.org) according to . metascape allows for the input of multigene lists. for each given list (ie: interactors of nsp , nsp , nsp -z, nsp , capsid and env) as well as for the merged list, protein-protein interaction enrichment analysis was carried out using the following databases: identified for the merged list of interactors. go / pathway and process enrichment analysis was carried out with the following ontology sources: kegg pathway, go biological processes, reactome gene sets, canonical pathways and corum, using the metascape tool. all genes in the genome were used as the enrichment background. from the top most significant go term descriptions gathered for each mcode network (table , supplementary information) , between - terms were chosen to assign biological meaning to each mcode term (figure a ). the log(q-value) of the terms indicates the significance calculated using the merged list of interactors (table , total rna was extracted from tri-reagent by isopropanol precipitation and resuspended in disodium citrate buffer (ph . ). contaminating dna was degraded by treatment with turbo dnase (ambion). thereafter, reverse transcription was carried out using affinityscript multiple temperature reverse transcriptase (agilent), followed by qpcr using brilliant iii ultra-fast sybr® green qpcr master mix (agilent), according to manufacturer's instructions. the following primers were used: hnrnpl_prot '-caatctcagtggacaaggtg - ' for one hour at °c. 
the coupled beads were washed three times with lysis buffer and then μl of the cleared cell lysate was added to the beads pellet. the cleared lysates and coupled beads were incubated at °c for one hour with rotation. the beads were collected on a magnet and washed three times with lysis buffer. at the third wash, each sample was split into two (for treatment or no treatment with rnasea). the supernatant was removed and the beads were resuspended with µl lysis buffer + µl loading buffer ( x lds+dtt). the samples were incubated at °c for minutes. using the magnet, the supernatants (eluates) were transferred to new tubes, ready to load onto gels for coomassie staining and mass spectrometry sample preparation. the compositions of the eluates were quantified by mass spectrometry, following the same protocol as above. triplicate samples were processed against the same sequence database as above, by transproteomics pipeline (tpp) tools. four database search engines were used (comet , xtandem , msgf and myrimatch , with search parameters as above. each search was followed by the application of the peptideprophet tool; the iprophet tool was then used to combine the search results, which were filtered at the false discovery rate of . ; furthermore, the identification was only accepted if at least two of the search engines agreed on the identification. protein inference was performed with proteinprophet . for those protein groups accepted by a false discovery rate filter of . , a normalized spectral abundance factor (nsaf) was calculated based on the peptide to spectrum match count; shared peptides were accounted for by the method of , giving normalised spectral abundance factor (dnsaf) values. the dnsaf values were used to calculate abundance as a % of the upf bait protein, which were reported for the nmd factors and sfv proteins that were detected in the samples. small-scale sfv protein expression and puromycin incorporation assays . 
x hela cells per well were seeded into -well plates. the following day, . μg of the relevant pcdna . xflag plasmids were transfected using dogtor transfection reagent (oz biosciences), as per manufacturer's recommendations. cells were harvested hours later, after which x cells per condition were collected to make lysates for western blot analysis, while the remaining cells were collected for rna analysis. the cells were harvested at °c for minutes at x g, washed once with x pbs, and resuspended in either x sds-page loading buffer (for protein analysis) or μl of tri reagent (for rna analysis). for the puromycin incorporation assays, cells were transfected as above and prior to harvesting, the medium was aspirated and medium containing either dmso or µg/ml cycloheximide (chx) was added to the cells and incubated for hours at °c. thereafter, the cells were pulse labelled for minutes with puromycin ( μg/ml). after the minutes pulse, as a recovery step, medium containing either dmso or chx was re-added to the cells for minutes at °c. cells were harvested as above and x cells per condition were resuspended in x sds-page loading buffer and incubated at °c for minutes for western blot analyses. lysates were loaded into either % sds-page gels or pre-casted - % bis-tris -well nupage™ or -well bolt™ gels (invitrogen, thermo fisher scientific) and electrophoresed in either x sds-page running buffer or x mops running buffer. proteins were transferred onto nitrocellulose membranes using either the biorad semi-dry blot system ( minutes in xbjerrum buffer + % methanol) for sds-page gels or the iblot ® (p , mins, using iblot® nc regular stacks) (invitrogen, thermo fisher scientific) for pre-casted gels. membranes were blocked for hour at room temperature (rt) in % milk-tbs-t ( . % tween ), or in the case of rb anti-eif α, ms anti-eif α and rb anti-p-eif α blots, % milk-tbs, and then incubated with the indicated primary antibodies at °c overnight. 
primary antibodies were diluted in % milk-tbs-t, apart from rb anti-eif α, ms anti-eif α and rb anti-p- eif α, which were diluted in % bsa-tbs-t ( . % tween ). after three minutes washes in tbs- t, membranes were incubated with the indicated secondary antibodies in % milk-tbs-t for . hours at rt, followed by three minutes tbs-t washes. the blots were then visualised using the odyssey system (licor). the protocol for polysome fractionations was adapted from . hela cells were transfected with the 'empty' (pcdna _frt_to_ xflag) or 'capsid' (pcdna . xflag-capsid) plasmids as described for the large scale transfections above. the cells were treated with µg/ml chx for mins at °c prior to harvesting. cells were washed once with ice-cold pbs containing µg/ml chx, scraped off the surface of the dish with µl per cm dish of pbs-chx and then transferred into tubes. cells were collected by centrifugation at °c for minutes at x g and lysed in µl of lysis buffer ( mm tris-hcl [ph . ], mm nacl, mm mgcl , % triton x- , % sodium deoxycholate, red asterisks indicate xflag tagged sfv proteins (or the xflag alone) at respective sizes. 'untr' refers to an untransfected control condition that underwent the affinity purification procedure, while 'empty' denotes transfection with a plasmid construct containing only the xflag tag (with no additional coding region) followed by affinity purification. the expected sizes of the proteins ( xflag included) were: empty ~ kda; nsp ~ kda; nsp ~ kda; nsp -z ~ kda; nsp ~ kda; capsid ~ kda and env ~ kda. the left side of the wbs indicate the eluates (after flag peptide elution from the beads) of each sfv affinity purification, which were sent for mass spectrometry analysis. the right hand side of the wbs indicate the sfv proteins that were still present on the beads after the flag peptide elution step. identified as interactors to more than one of the sfv proteins. 
in these cases, the solid grey lines connect the grouped set of host proteins to the sfv proteins for which they were identified as interactors. host-host ppi ascertained through string
key: cord- -yufvt t authors: van aalst, marvin; ebenhöh, oliver; matuszyńska, anna title: constructing and analysing dynamic models with modelbase v . - a software update date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: yufvt t background computational mathematical models of biological and biomedical systems have been successfully applied to advance our understanding of various regulatory processes, metabolic fluxes, effects of drug therapies and disease evolution or transmission. unfortunately, despite community efforts leading to the development of sbml or the biomodels database, many published models have not been fully exploited, largely due to lack of proper documentation or the dependence on proprietary software. to facilitate synergies within the emerging research fields of systems biology and medicine by reusing and further developing existing models, an open-source toolbox that makes the overall process of model construction more consistent, understandable, transparent and reproducible is desired. results and discussion we provide here the update on the development of modelbase, a free expandable python package for constructing and analysing ordinary differential equation-based mathematical models of dynamic systems. it provides intuitive and unified methods to construct and solve these systems. significantly expanded visualisation methods allow convenient analyses of structural and dynamic properties of the models. specifying reaction stoichiometries and rate equations, the system of differential equations is assembled automatically. a newly provided library of common kinetic rate laws highly reduces the repetitiveness of the computer programming code, and provides full sbml compatibility. previous versions provided functions for automatic construction of networks for isotope labelling studies. using user-provided label maps, modelbase v . streamlines the expansion of classic models to their isotope-specific versions. finally, the library of previously published models implemented in modelbase is continuously growing. 
ranging from photosynthesis over tumour cell growth to viral infection evolution, all models are available now in a transparent, reusable and unified format using modelbase. conclusion with the small price of learning a new software package, which is written in python, currently one of the most popular programming languages, the user can develop new models and actively profit from the work of others, repeating and reproducing models in a consistent, tractable and expandable manner. moreover, the expansion of models to their label specific versions enables simulating label propagation, thus providing quantitative information regarding network topology and metabolic fluxes. mathematical models are accepted as valuable tools in advancing biological and medical research [ , ] . in particular, models based on ordinary differential equations (ode) found their application in a variety of fields. most recently, deterministic models simulating the dynamics of infectious diseases gained the interest of the general public during our combat of the covid- pandemic, when a large number of ode based mathematical models has been developed and discussed even in nonscientific journals (see for example [ ] [ ] [ ] ). such focus on mathematical modelling is not surprising, because computational models allow for methodical investigations of complex systems under fixed, controlled and reproducible conditions. hence, the effect of various perturbations of the systems in silico can be inspected systematically. importantly, long before exploring their predictive power, the model building process itself plays an important role in integrating and systematising vast amounts of available information [ ] . 
properly designed and verified computational models can be used to develop hypotheses to guide the design of new research experiments (e.g., in immunology to study lymphoid tissue formation [ ] ), support metabolic engineering efforts (e.g., identification of enzymes to enhance essential oil production in peppermint [ ] ), contribute to tailoring medical treatment to the individual patient in the spirit of precision medicine (e.g., in oncology [ ] ), or guide political decision making and governmental strategies (see the review on the impact of modelling for european union policy [ ] ). considering their potential impact, it is crucial that models are openly accessible so that they can be verified and corrected, if necessary. in many publications, modelling efforts are justified by the emergence of extraordinary amounts of data provided by new experimental techniques. however, arguing for the necessity of model construction only because a certain type or amount of data exists ignores several important aspects. firstly, computational models are generally a result of months, if not years, of intense research, which involves gathering and sorting information, simplifying numerous details and distilling out the essentials, implementing the mathematical description in computer code, carrying out performance tests and, finally, validating the simulation results. our understanding of many phenomena could become deeper if, instead of constructing yet another first-generation model, we could efficiently build on the knowledge that was systematically collected in previously developed models. secondly, the invaluable knowledge generated during the model construction process is often lost, mainly because the main developer leaves the research team, but also due to unfavourable funding strategies. it is easier to obtain research funds for the construction of novel, even if perfunctory, models than to support the long-term maintenance of existing ones. 
preservation of the information collected in the form of a computational model has become an important quest in systems biology, and has been to some extent addressed by the community. development of the systems biology markup language (sbml) [ ] for unified communication and storage of biomedical computational models and the existence of the biomodels repository [ ] already ensure the survival of constructed models beyond the academic lifetime of their developers or the lifetime of the software used to create them. but a completed model in the sbml format does not allow the reader to follow the logic of model construction, and the knowledge generated by the building process is not preserved. such knowledge loss can be prevented by providing simple-to-use toolboxes enforcing a universal and readable form of constructing models. we have therefore decided to develop modelbase [ ] , a python package that encourages the user to actively engage in the model building process. on the one hand we fix the core of the model construction process, while on the other hand the software does not make the definitions too strict, and fully integrates the model construction process into the python programming language. this differentiates our software from many available python-based modelling tools (such as scrumpy [ ] or pysces [ ] ) and other mathematical modelling languages (recently reviewed from a software engineering perspective by schölzel and colleagues [ ] ). we report here new features in modelbase v . , developed over the last two years. we have significantly improved the interface to make model construction easier and more intuitive. the accompanying repository of re-implemented, published models has been considerably expanded, and now includes a diverse selection of biomedical models. this diversity highlights the general applicability of our software. essentially, every dynamic process that can be described by a system of odes can be implemented with modelbase. 
implementation modelbase is a python package to facilitate construction and analysis of ode based mathematical models of biological systems. version . introduces changes not compatible with the previous official release . . published in [ ] . all api changes are summarised in the official documentation hosted by readthedocs. the model building process starts by creating a modelling object in the dedicated python class model and adding to it the chemical compounds of the system. then, following the intuition of connecting the compounds, you construct the network by adding the reactions one by one. each reaction requires stoichiometric coefficients and a kinetic rate law. the latter can be provided either as a custom function or selecting from the newly provided library of rate laws. the usage of this library (ratelaws) reduces the repetitiveness by avoiding boilerplate code. it requires the user to explicitly define reaction properties, such as directionality. this contributes to a systematic and understandable construction process, following the second guideline from the zen of python: "explicit is better than implicit". from this, modelbase automatically assembles the system of odes. it also provides numerous methods to conveniently retrieve information about the constructed model. in particular, the get * methods allow inspecting all components of the model, and calculate reaction rates for given concentration values. these functions have multiple variants to return all common data structures (array, dictionary, data frames). after the model building process is completed, simulation and analyses of the model are performed with the simulator class. currently, we offer interfaces to two integrators to solve stiff and non-stiff ode systems. provided you have installed the assimulo package [ ] , as recommended in our installation guide, modelbase will be using cvode, a variable-order, variable-step multi-step algorithm. 
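The construction steps described above (declare compounds, add reactions with stoichiometric coefficients and a rate law, let the right-hand side of the ODE system be assembled automatically) can be illustrated with a small standalone sketch. This is not the modelbase API: the class `TinyModel`, the helpers `rk4_step`, `simulate` and `build_linear_chain`, and all parameter values are hypothetical names chosen for illustration, and a fixed-step Runge-Kutta scheme stands in for the cvode and lsoda integrators mentioned in the text.

```python
# Minimal sketch, assuming a dict-based model; NOT the modelbase API.
class TinyModel:
    """Hypothetical stand-in: compounds plus reactions -> ODE right-hand side."""

    def __init__(self, compounds):
        self.compounds = list(compounds)
        self.reactions = {}  # name -> (rate function, stoichiometry dict)

    def add_reaction(self, name, rate, stoichiometry):
        # Be explicit: every compound in the stoichiometry must be declared.
        unknown = set(stoichiometry) - set(self.compounds)
        if unknown:
            raise ValueError(f"unknown compounds: {sorted(unknown)}")
        self.reactions[name] = (rate, dict(stoichiometry))

    def rhs(self, y):
        """Assemble dy/dt as the stoichiometry-weighted sum of reaction rates."""
        dydt = {c: 0.0 for c in self.compounds}
        for rate, stoichiometry in self.reactions.values():
            v = rate(y)
            for compound, coefficient in stoichiometry.items():
                dydt[compound] += coefficient * v
        return dydt


def rk4_step(model, y, dt):
    """One classical fourth-order Runge-Kutta step (fixed step size)."""
    def shifted(base, slope, h):
        return {c: base[c] + h * slope[c] for c in base}

    k1 = model.rhs(y)
    k2 = model.rhs(shifted(y, k1, dt / 2))
    k3 = model.rhs(shifted(y, k2, dt / 2))
    k4 = model.rhs(shifted(y, k3, dt))
    return {c: y[c] + dt / 6 * (k1[c] + 2 * k2[c] + 2 * k3[c] + k4[c]) for c in y}


def simulate(model, y0, t_end, dt=0.01):
    """Integrate from t = 0 to t_end and return the final state."""
    y = dict(y0)
    for _ in range(int(round(t_end / dt))):
        y = rk4_step(model, y, dt)
    return y


def build_linear_chain(k_in, k1, k2):
    """Toy pathway: -> X -> Y -> with constant influx and mass-action steps."""
    model = TinyModel(["X", "Y"])
    model.add_reaction("influx", lambda y: k_in, {"X": 1})
    model.add_reaction("v1", lambda y: k1 * y["X"], {"X": -1, "Y": 1})
    model.add_reaction("v2", lambda y: k2 * y["Y"], {"Y": -1})
    return model
```

For this toy chain the steady state follows analytically as X = k_in / k1 and Y = k_in / k2, which gives a simple check on the assembled system. Keeping rate laws as plain functions and stoichiometries as explicit dictionaries mirrors the "explicit is better than implicit" guideline quoted above.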
the cvode class provides a direct connection to sundials, the suite of nonlinear and differential/algebraic equation solvers [ ] , which is a powerful industrial solver and robust time integrator with high computing performance. when assimulo is not available, the software will automatically switch to the scipy library [ ] using lsoda as an integrator, which in our experience showed a lower computing performance. sensitivity analysis provides a theoretical foundation to systematically quantify effects of small parameter perturbations on the global system behaviour. in particular, metabolic control analysis (mca), initially developed to study metabolic systems, is an important and widely used framework providing quantitative information about the response of the system to perturbations [ , ] . this new version of modelbase now has a full suite of methods to calculate response coefficients and elasticities, and to plot them as a heat-map, giving a clear and intuitive colour-coded visualisation of the results. an example of such visualisation, for a re-implemented toy model of the upper part of glycolysis (section . . [ ] ), can be found in figure . many of the available relevant software packages for building computational models restrict the users by providing unmodifiable plotting routines with predefined settings that may not suit their personal preferences. with modelbase v . we constructed our plotting functions to accept optional keyword arguments (often abbreviated as **kwargs), so you can still access and change all plot elements, providing a transparent and flexible interface to the commonly used matplotlib library [ ] . the easy access functions to visualise the results of simulations were expanded from the previous version. they now include plotting selections of compounds or fluxes, phase-plane analysis and the results of mca. 
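As a rough illustration of what a response coefficient measures, the sketch below estimates the standard MCA quantity R = d ln(S) / d ln(p), the relative change of a steady-state value S per relative change of a parameter p, by a central finite difference on a log scale. It is a standalone sketch and does not use the modelbase MCA methods; the function names `steady_state` and `response_coefficient`, the one-pool toy system and the step size are assumptions made for illustration.

```python
import math


def steady_state(k_in, k_out):
    """Steady state of dS/dt = k_in - k_out * S (hypothetical one-pool system)."""
    return k_in / k_out


def response_coefficient(func, params, name, rel_step=1e-3):
    """Estimate d ln(func) / d ln(params[name]) by a central difference.

    The parameter is perturbed up and down by the relative amount rel_step
    while all other parameters are kept fixed.
    """
    up = dict(params)
    up[name] = params[name] * (1 + rel_step)
    down = dict(params)
    down[name] = params[name] * (1 - rel_step)
    d_ln_output = math.log(func(**up)) - math.log(func(**down))
    d_ln_param = math.log(1 + rel_step) - math.log(1 - rel_step)
    return d_ln_output / d_ln_param


params = {"k_in": 1.0, "k_out": 0.5}
r_in = response_coefficient(steady_state, params, "k_in")
r_out = response_coefficient(steady_state, params, "k_out")
```

Because S = k_in / k_out in this toy system, the coefficients are +1 for k_in and -1 for k_out, so the numerical estimate can be checked against the analytical result. In a larger model, `func` would run a simulation to steady state, and collecting the coefficients for all parameter and variable pairs yields the matrix that a heat-map visualises.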
models for isotope tracing modelbase has additionally been developed to aid the in silico analyses of label propagation during isotopic studies. in order to simulate the dynamic distribution of isotopes, all possible labelling patterns for all intermediates need to be created. by providing an atom transition map in the form of a list or a tuple, all 2^n isotope-specific versions of a chemical compound are created automatically, where n denotes the number of possibly labelled atoms. changing the name of the previous function carbonmap to labelmap in v . acknowledges the diversity of possible labelling experiments that can be reproduced with models built using our software. sokol and portais derived the theory of dynamic label propagation under the stationary assumption [ ] . in steady state the space of possible solutions is reduced and the labelling dynamics can be represented by a set of linear differential equations. we have used this theory and implemented an additional class linearlabelmodel which allows rapid calculation of the label propagation given the steady state concentrations and fluxes of the metabolites [ ] . modelbase will automatically build the linear label model after providing the label maps. such a model is provided in figure , where we simulate label propagation in a linear non-reversible pathway, as in figure in [ ] . the linear label models are constructed using modelbase rate laws and hence can be fully exported as an sbml file. 
figure . labelling curves in a linear non-reversible pathway. example of label propagation curves for a linear non-reversible pathway of randomly sized metabolite pools, as proposed in the paper by sokol and portais [ ] . circles mark the position at which the first derivative of each labelling curve reaches maximum. in the original paper this information has been used to analyse the label shock wave (lsw) propagation. to reproduce these results run the jupyter notebook from the additional file . 
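To make the bookkeeping behind label-specific models concrete: assuming each atom is either labelled or unlabelled, a compound with n such atoms has 2^n labelling patterns. The sketch below enumerates them and applies an atom transition map to a pattern. It is a standalone illustration, not modelbase code; the function names, the `compound__pattern` naming scheme and the convention that entry i of the map names the substrate atom ending up at product position i are all assumptions.

```python
from itertools import product


def isotopomers(compound, n_atoms):
    """All 2**n_atoms labelling patterns of a compound, '0' = unlabelled, '1' = labelled."""
    return [f"{compound}__{''.join(bits)}" for bits in product("01", repeat=n_atoms)]


def apply_label_map(pattern, label_map):
    """Rearrange a labelling pattern according to an atom transition map.

    Assumed convention: label_map[i] is the index of the substrate atom
    that ends up at position i of the product.
    """
    return "".join(pattern[i] for i in label_map)


variants = isotopomers("GAP", 3)          # eight variants for three atoms
flipped = apply_label_map("100", [2, 1, 0])  # map that reverses the atom order
```

The exponential growth in the number of variants is the reason automatic generation matters: a six-atom compound already requires 64 isotope-specific species, each with its own differential equation.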
many models lose their readability due to the inconsistent, intractable or misguided naming of their components (an example is a model with reactions named as v -v , without referencing them properly). by providing meta data for any modelbase object, you can abbreviate component names in a personally meaningful manner and then supply additional annotation information in accordance with standards such as miriam [ ] via the newly developed meta data interface. this interface can also be used to supply additional important information compulsorily shared in a publication but not necessarily inside the code, such as the unit of a parameter. with the newly implemented changes our package becomes more versatile and user friendly. as argued before, its strength lies in its flexibility and applicability to virtually any biological system with dynamics that can be described using an ode system. there exist countless mathematical models of biological and biomedical systems derived using odes. many of them are rarely re-used, at least not to the extent that could be reached if models were shared in a readable, understandable and reusable way [ ] . as our package can be efficiently used both for the development of new models and for the reconstruction of existing ones, as long as they are published with all kinetic information required for the reconstruction, we hope that modelbase will in particular support users with limited modelling experience in re-constructing already existing work, serving as a starting point for their further exploration and development. we have previously demonstrated the versatility of modelbase by re-implementing mathematical models previously published without the source code: two models of biochemical processes in plants [ , ] , and a model of the non-oxidative pentose phosphate pathway of human erythrocytes [ , ] . 
to present how the software can be applied to study medical systems, we used modelbase to re-implement various models, not published by our group, and reproduced key results of the original manuscripts. it was beyond our focus to verify the scientific accuracy of the corresponding model assumptions. we have selected them to show that despite describing different processes, they all share a unified construct. this highlights that by learning how to build a dynamic model with modelbase, you in fact do not learn how to build a one-purpose model, but expand your toolbox to be capable of reproducing any given ode based model. all examples are available as jupyter notebooks and listed in the additional files. for the purpose of this paper, we surveyed available computational models and subjectively selected a relatively old publication of significant impact, based on the number of citations, published without providing the computational source code, nor details regarding the numerical integration. we have chosen a four compartment model of hiv immunology that investigates the interaction of single virus population with the immune system described only by the cd + t cells, commonly known as t helper cells [ ] . we have implemented the four odes describing the dynamics of uninfected (t), latently infected (l) and actively infected cd + t cells (a) and infectious hiv population (v). we have reproduced the results from figure from the original paper [ ] showing the decrease in the overall population of cd + t-cell (uninfected + latently infected + actively infected cd +) over time, depending on the number of infectious particles produced per actively infected cell (n). to reproduce these results run the jupyter notebook from the additional file . in the figure , we reproduce the results from fig. 
from the original paper, where by changing the number of infectious particles produced per actively infected cell (n) we follow the dynamics of the overall t cell population (t+l+a) over a period of years. the model has been further used to explore the effect of azidothymidine, an antiretroviral medication, by decreasing the value of n after years by % or %, mimicking the blocking of the viral replication of hiv. a more detailed description of the time-dependent drug concentration in the body is often achieved with pharmacokinetic models. mathematical models based on a system of differential equations that link the dosing regimen with the dynamics of a disease are called pharmacokinetic-pharmacodynamic (pk-pd) models [ ] and with the next example we explore how modelbase can be used to develop such models. technological advances have forced a paradigm shift in many fields of our life, including medicine, making more personalised healthcare not only a possibility but a necessity. correctly determining dosing regimens for drugs will play a pivotal role in the success of precision medicine [ ] , and pk-pd models provide a quantitative tool to support this [ ] . pk-pd models have proved quite successful in many fields, including oncology [ ] , and here we used the classical tumour growth model by simeoni and colleagues, originally implemented using the industry standard software winnonlin [ ] . as the pharmacokinetic model has not been fully described, we reproduced only the highly simplified case, where we assume a single drug administration and investigate the effect of drug potency (k ) on simulated tumour growth curves. in figure we plot the simulation results of the modelbase implementation of the system of four odes over the period of days where we systematically changed the value of k , assuming a single drug administration on day . 
with the mca suite available with our software we can calculate the change of the system in response to perturbation of all other system parameters. such quantitative description of the systems response to local parameter perturbation provides support in further studies of the rational design of combined drug therapy or discovery of new drug targets, as described in the review by cascante and colleagues [ ] . finally, compartmental models based on ode systems have a long history of application in mathematical epidemiology [ ] . many of them, including numerous recent publications studying the spread of coronavirus, are based on the classic epidemic susceptible-infected-recovered (sir) model, originating from the theories developed by kermack and mckendrick at the beginning of last century [ ] . one of the most critical information searched for while simulating the dynamics of infectious disease is the existence of disease free or endemic equilibrium and assessment of its stability [ ] . indeed periodic oscillations have been observed for several infectious diseases, including measles, influenza and smallpox [ ] . to provide an overview of more modelbase functionalities we have implemented a relatively simple sir model based on the recently published autonomous model for smallpox [ ] . we have generated damped oscillations and visualised them using the built-in function plot phase plane (see figure ). in the attached jupyter notebook we present how quickly and efficiently in terms of lines of code, the sir model is built and how to add and remove new reactions and/or compounds to construct further variants of this model, such as a seir (e-exposed) or sird (d-deceased) model. figure . compartmental pharmacokinetic-pharmacodynamic model of tumour growth after anticancer therapy. we have reproduced the simplified version of the pk-pd model of tumour growth, where pk part is reduced to a single input and simulated the effect of drug potency (k ) on tumour growth curves. 
the system of four odes describing the dynamics of the system visualised on a scheme above is integrated over the period of days. we systematically changed the value of k , assuming a single drug administration on day . we have obtained the same results as in the figure in the original paper [ ] . to reproduce these results run the jupyter notebook from the additional file . 
sir model with vital dynamics including birth rate has been adapted based on the autonomous model to simulate periodicity of chicken pox outbreak in hida, japan [ ] . to reproduce these results run the jupyter notebook from the additional file . 
we are presenting here updates of our modelling software that has been developed to simplify the building process of mathematical models based on odes. modelbase is fully embedded in the python programming language. it facilitates a systematic construction of new models, and reproducing models in a consistent, tractable and expandable manner. as odes provide a core method to describe dynamical systems, we hope that our software will serve as the base for deterministic modelling, hence its name. with the smoothed interface and clearer description of how the software can be used for medical purposes, such as simulation of possible drug regimens for precision medicine, we expect to broaden our user community. we envisage that by providing the mca functionality, also users new to mathematical modelling will adopt a working scheme where such sensitivity analyses become an integral part of the model development and study. the value of sensitivity analyses is demonstrated by considering how results of such analyses have given rise to new potential targets for drug discovery [ ] . we especially anticipate that the capability of modelbase to automatically generate label-specific models will prove useful in predicting fluxes and label propagation dynamics through various metabolic networks. 
in emerging fields such as computational oncology, such models will be useful to, e.g., predict the appearance of labels in cancer cells. if you have any questions regarding modelbase, you are very welcome to ask them. it is our mission to enable reproducible science and to help putting the theory into action. 
project name: modelbase 
project home page: https://pypi.org/project/modelbase/ 
operating system(s): platform independent 
programming language: python 
other requirements: none 
licence: gnu general public license (gpl), version 
any restrictions to use by non-academics: none 
computational systems biology 
computational oncology - mathematical modelling of drug regimens for precision medicine 
effective containment explains subexponential growth in recent confirmed covid- cases in china 
an updated estimation of the risk of transmission of the novel coronavirus ( -ncov) 
covid- outbreak on the diamond princess cruise ship: estimating the epidemic potential and effectiveness of public health countermeasures 
how computational models can help unlock biological systems 
model-driven experimentation: a new approach to understand mechanisms of tertiary lymphoid tissue formation, function, and therapeutic resolution 
mathematical modeling-guided evaluation of biochemical, developmental, environmental, and genotypic determinants of essential oil composition and yield in peppermint leaves 
modelling for eu policy support: impact assessments 
the systems biology markup language (sbml): a medium for representation and exchange of biochemical network models 
biomodels database: a repository of mathematical models of biological processes 
building mathematical models of biological systems with modelbase 
scrumpy: metabolic modelling with python 
modelling cellular systems with pysces 
required characteristics for modeling languages in systems biology: a software engineering perspective. biorxiv 
assimulo: a unified framework for ode solvers 
sundials: suite of nonlinear and differential/algebraic equation solvers 
algorithms for scientific computing in python 
the control of flux 
a linear steady-state treatment of enzymatic chains. general properties, control and effector strength 
systems biology: a textbook 
matplotlib: a d graphics environment 
theoretical basis for dynamic label propagation in stationary metabolic networks under step and periodic inputs 
minimum information requested in the annotation of biochemical models (miriam) 
short-term acclimation of the photosynthetic electron transfer chain to changing light: a mathematical model 
a mathematical model of the calvin photosynthesis cycle 
comparison of computer simulations of the f-type and l-type non-oxidative hexose monophosphate shunts with p-nmr experimental data from human erythrocytes 
c n.m.r. isotopomer and computer simulation studies of the non-oxidative pentose phosphate pathway of human erythrocytes 
dynamics of hiv infection of cd + t cells 
modeling of pharmacokinetic/pharmacodynamic (pk/pd) relationships: concepts and perspectives 
precision medicine: an opportunity for a paradigm shift in veterinary medicine 
precision dosing in clinical medicine: present and future 
different ode models of tumor growth can deliver similar results 
predictive pharmacokinetic-pharmacodynamic modeling of tumor growth kinetics in xenograft models after administration of anticancer agents 
metabolic control analysis in drug discovery and disease 
the mathematics of infectious diseases 
a contribution to the mathematical theory of epidemics 
qualitative and bifurcation analysis using an sir model with a saturated treatment function 
emergence of oscillations in a simple epidemic model with demographic data 
the authors declare that they have no competing interests. 
jupyter notebook with the modelbase implementation of the minimal pharmacokinetic-pharmacodynamic (pk-pd) model linking the dosing regimen of an anticancer agent to tumour growth, proposed by simeoni and colleagues [ ]. jupyter notebook with the modelbase implementation of the classic epidemic susceptible-infected-recovered (sir) model, parametrised as the autonomous model used to simulate the periodicity of chicken pox outbreaks in hida, japan [ ]. the official documentation is hosted on readthedocs.
key: cord- -asphso s authors: herrgårdh, tilda; li, hao; nyman, elin; cedersund, gunnar title: an organ-based multi-level model for glucose homeostasis: organ distributions, timing, and impact of blood flow date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: asphso s
glucose homeostasis is the tight control of glucose in the blood. this complex control is important, because it malfunctions in serious diseases like diabetes, and it is not yet sufficiently understood. because numerous organs and sub-systems are involved, each with their own intra-cellular control, we have developed a multi-level mathematical model for glucose homeostasis, which integrates a variety of data. over the last years, this model has been used to insert new insights from the intra-cellular level into the larger whole-body perspective. however, the original cell-organ-body translation has never been updated during these years, despite several critical shortcomings, which have also not been resolved by other modelling efforts. for this reason, we here present an updated multi-level model. this model provides a more accurate sub-division of how much glucose is being taken up by the different organs. unlike the original model, we now also account for the different dynamics seen in the different organs. the new model also incorporates the central impact of blood flow on insulin-stimulated glucose uptake.
each new improvement is clear upon visual inspection, and they are also supported by statistical tests. the final multi-level model describes > data points in > time-series and dose-response curves, resulting from a large variety of perturbations, describing both intra-cellular processes, organ fluxes, and whole-body meal responses. we hope that this model will serve as an improved basis for future data integration, useful for research and drug developments within diabetes. a dysfunctional glucose homeostasis is a hallmark for both type and type diabetes mellitus (t d). in type diabetes, the insulin-producing beta-cells are destroyed by the immune system. since the other organs are unaffected, the treatment of t d simply consists of insulin, taken via injections or insulin pumps. in t d, the patient has both a reduced capacity to produce insulin and has developed a resistance to the hormone. this resistance appears in all of the three most metabolically active organs, which all respond to insulin: adipose tissue, muscle, and liver. inside each of these organs, the response to insulin is governed by an interaction between intracellular signaling and metabolic networks. the resistance is spread between the organs, in ways which are not yet fully understood, but which involves numerous hormones, cytokines, and metabolites. to better understand this complex interaction, both in health and in disease, dynamic mathematical models are needed. models for the top-level glucose homeostasis, involving a simple interaction between glucose and insulin, have been developed for decades ((bergman et al. ) . a first more advanced model (dalla man et al. was based on calculated flows of glucose and insulin between organs in response to a meal. a version of this model, trained on data from patients with type diabetes, is approved by the food and drug administration, fda, for replacement of animal experiments in the approval of the algorithm inside new insulin pumps (kovatchev et al. 
. for more general applications, involving t d, the intracellular insulin resistance must be combined with the whole-body interactions. such models are called multi-level models. there have been several efforts to create multi-level models of glucose homeostasis, reviewed in e.g. (ajmera et al. ; nyman et al. ; nyman et al. . one of the more comprehensive efforts is a series of nonlinear mixed effects models (silber et al. ; silber et al. ; jauslin et al. ) developed to describe plasma levels of glucose and insulin after different interventions for single patients with t d. another effort has developed a glucose homeostasis model, based partly on (dalla man et al. ) , to create a simulator to use in education and to simulate scenarios of disease (maas et al. ) . a third effort is the multi-level model of human glucose homeostasis we created years ago (nyman et al. ) . this model contains the dynamic glucose-insulin interaction between organs in response to a meal, based on (dalla man et al. ). in this model, we sub-divided the original insulin-responding uptake in a muscle and a fat component, and linked the fat tissue glucose uptake to intracellular insulin signaling data, coming from our own studies. this link was possible since insulin-stimulated glucose uptake can be measured both in isolated adipocytes and in organs. the adipocyte uptake is measured in vitro together with insulin signaling data; the organ-level uptakes are measured using isotopic labelling and/or arteriovenous (av) difference data, which measures the difference between arterial and venous blood. since the uptake measurements from isolated adipocytes should correlate with the av difference-based uptake-measurements for fat tissue, one can build a translation from in vitro to in situ, in humans. 
however, neither this model, nor any of the previously mentioned multi-level models, has subdivided the glucose uptake into the individual contributions of all of the main insulin-responding and glucose-utilizing organs: adipose tissue, muscle, and liver. in this paper, we have updated the original multi-level connections in (nyman et al. ), and resolved three critical questions or issues (q -q ), regarding the role of each of the metabolically active organs in glucose uptake (fig ). more specifically, we have explicitly included the liver in the model as a glucose-utilizing organ, in contrast to the original models, which only considered it as a glucose-producing organ (q ). secondly, we have included a timing difference between muscle and adipose tissue glucose uptake in the response to a meal (q ). thirdly, we have updated the model to include the impact of blood flow on glucose uptake in adipose tissue (q ). finally, we merge these three improvements together with all of the other already published improvements described above, into an updated multi-level model (q ). this model constitutes an updated view on the multi-level roles that each organ plays in glucose homeostasis, and allows for integration of future data for specific sub-systems into an integrated and more complete picture.
we have used ordinary differential equations (odes) in the standard form to build the models. all of the equations are given in the supplementary files, both as equations and as simulation files, and here we only describe the most central equations, relating to the changes done in this paper. the equations for the dynamics of the amount of glucose in interstitial tissue (g t ) and plasma (g p ) are given by
$$\frac{dg_p}{dt} = \mathrm{egp} + \mathrm{ra} - u_{ii} - e - k_1 \cdot g_p + k_2 \cdot g_t$$
$$\frac{dg_t}{dt} = -u_{id} + k_1 \cdot g_p - k_2 \cdot g_t$$
where u id is insulin and glucose dependent glucose uptake, i.e. in fat, muscle, and liver; where u ii is insulin independent and constant glucose utilization, i.e.
glucose uptake by organs such as brain and kidneys; where egp is endogenous glucose production from the liver; where ra is glucose rate of appearance from the intestine; where e is glucose excretion through the kidneys; and where k · g p and k · g t denote the flux from plasma to interstitial tissue and back, respectively. note that g t and g p are states, while u id , u ii , k · g p , k · g t , egp, ra, and e are the reaction rates that describe flows of glucose. similarly, k and k are parameters (rate constants), which are constant over time.
insulin-dependent and dynamic glucose uptake
the above equations are identical to those in the original dalla man model (dalla man et al. ), and the change that was implemented in (nyman et al. ) was that u id was sub-divided into a muscle and an adipose tissue part. we now sub-divide the insulin-dependent dynamic glucose uptake into three parts, i.e.
$$u_{id} = u_{idm} + u_{idl} + u_{idf}$$
where u idm , u idl , and u idf denote the uptake rates into the muscle, liver, and fat, respectively. all of these uptake descriptions have changed to some extent, so let us now go through them one by one. glucose uptake in muscle is given by
$$u_{idm} = \frac{v_{mmax} \cdot g_t}{k_m + g_t}$$
where v mmax is the non-scaled maximal glucose uptake, and where k m is the corresponding michaelis-menten parameter. the insulin-dependency of the glucose uptake is located in the expression for v mmax where part m is a scaling parameter to balance the uptake of the muscle with the other organs, where v is the basal rate of glucose utilization, and v x is the maximum rate of glucose entering the tissue (here muscle) from the surrounding tissue, and where ins denotes the interstitial insulin concentration. so far, these equations for the muscle uptake are the same as in (nyman et al. ). in contrast, although ins is calculated in almost the same way as in (nyman et al. ), the parameters describing the rate of entry and the rate of degradation are now allowed to be different, i.e.
where v and v describe the rate of transport from the plasma and the rate of degradation, with corresponding rate constants k and k , respectively; where i b denotes the basal plasma insulin concentration; and where i denotes the insulin concentration in plasma. the liver was not included in the previous models, and thus its equations are new. they are similar to the equations for muscle, i.e. where k l is a michaelis-menten constant, and where v lmax represents the maximum rate of glucose utilization in the liver. just as for the equations for muscle, the insulin dependence is incorporated into the expression for v lmax , which is given by where part l represents the relative glucose utilization of the liver in comparison with other tissues. note that the insulin-dependency of the liver glucose uptake is described as being direct, while in reality this dependency is indirect. glucose uptake in the liver is done via the glut transporter, which is not regulated by insulin. in contrast, the glucose uptake in muscle and adipose tissue is done by the glut transporter, which is regulated by insulin. in the liver, insulin instead indirectly affects glucose uptake by up-regulating intracellular glucose phosphorylation and utilization. however, since the model is lacking intracellular reactions, this indirect effect present in the liver is approximated in the same way as the direct effect for the muscle. note, finally, that the egp in the liver is also regulated by insulin, and that this is described as a separate process, in the same way as in (nyman et al. ). glucose uptake in the adipose tissue is the most advanced part of the model, since it is determined by intracellular processes, both regarding metabolism and regarding insulin signaling. the ultimate calculation of the uptake is given by the following expression where part f is a parameter, and where v in and v out describe the rate of glucose transport into, and out of, the cell, respectively.
these two fluxes are given by where p and p are transport parameters, where glu in is the amount of intracellular glucose, and where ins f ,e is the effect of insulin on these transport rates. these two equations show that the glucose uptake in the fat tissue depends on both intracellular metabolism, which alters the value of glu in , and the intracellular signaling, which alters the value of ins f ,e . the intracellular metabolism incorporates the first two steps of glycolysis, i.e. the steps involving intracellular glucose- -phosphate (g p ). the equations are given by where p , p , v g pmax , and k out are rate constants, and where v g p is the rate of phosphorylation of glucose. the intracellular insulin signaling is in itself the same as in (brännmark et al. , and it starts with insulin binding to the receptor (supplementary eq ), and ends with translocation of the glut transporter to the membrane (supplementary eqs and ). what is new compared to (nyman et al. ; brännmark et al. ) instead concerns the usage of the glut transporter to calculate the resulting impact on glucose uptake, ins f ,e . in our updated model, this insulin effect is given by where glut m is the amount of glut in the membrane, where glut is the amount of glut in the membrane, and where b f e f is the effect of blood flow; nc, k , and pf are parameters. the glut and glut terms corresponds to the transport via the two glucose transporters, and bfe f was introduced in (nyman et al. ) as a scaling parameter between the data from the in vitro setting studying isolated adipocytes, and the in situ setting, where the adipose tissue is still located in the human body. in other words, the blood flow effect is not there when simulating in vitro experiments. in (nyman et al. ) , this difference in insulin effect was hypothesized to be dependent on blood flow, and in this paper, we show that such an impact on blood flow is indeed present. 
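The transport and phosphorylation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (their equations are in the supplementary files and are not reproduced in this text), and every functional form in it is an assumption made for illustration: transport in and out proportional to the insulin effect, uptake equal to part_f times (v_in - v_out), phosphorylation slowed by a hypothetical inhibition constant `k_inh`, and first-order consumption of the phosphorylated glucose with rate `k_out`. All parameter values are arbitrary.

```python
def simulate_adipose_uptake(g_t, ins_fe, p_in, p_out, v_g6pmax, k_inh, k_out,
                            part_f=1.0, t_end=200.0, dt=0.01):
    """Forward-Euler simulation of a two-state adipocyte sketch.

    States: glu_in (intracellular glucose) and g6p (phosphorylated glucose).
    Interstitial glucose g_t and the insulin effect ins_fe are held constant,
    so any decline in uptake comes from intracellular accumulation only.
    Returns the net uptake part_f * (v_in - v_out) at every time step.
    """
    glu_in, g6p = 0.0, 0.0
    uptake = []
    for _ in range(int(round(t_end / dt)) + 1):
        v_in = p_in * g_t * ins_fe        # assumed form: transport into the cell
        v_out = p_out * glu_in * ins_fe   # assumed form: transport out of the cell
        # assumed form: phosphorylation with product inhibition by g6p
        v_g6p = v_g6pmax * glu_in / (1.0 + g6p / k_inh)
        v_met = k_out * g6p               # assumed form: downstream consumption
        uptake.append(part_f * (v_in - v_out))
        glu_in += dt * (v_in - v_out - v_g6p)
        g6p += dt * (v_g6p - v_met)
    return uptake


# Same constant insulin effect, strong versus negligible product inhibition.
strong = simulate_adipose_uptake(1.0, 1.0, 1.0, 1.0, 1.0, k_inh=0.1, k_out=0.1)
weak = simulate_adipose_uptake(1.0, 1.0, 1.0, 1.0, 1.0, k_inh=1e6, k_out=0.1)
```

In this sketch the insulin effect never changes, yet the net uptake falls from its initial value as intracellular glucose builds up, and it falls further when the inhibition is strong. That is the qualitative behaviour the product-inhibition mechanism is meant to provide.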
if one does not have data for the blood flow, the model will set bfe f to a constant value, and if there is data for blood flow, we propose to use the new model described in the next section.
equations for the impact of blood flow on glucose uptake in adipose tissue
the impact of blood flow on glucose uptake is dependent on insulin. the same equations for adipose tissue are used for muscle glucose uptake (cf eqs ( )-( )). where i is insulin in plasma and i b is the basal insulin level, and where c b f and c b f are rate constants. second, to calculate the impact of blood flow, we need to have an expression for how the blood flow is calculated. in this study, we only look at blood flow in controls, and in the presence of bradykinin, which increases the blood flow. this increase is also dependent on insulin. this control of blood flow, denoted bf f , is given by where be describes the direct effect of bradykinin on blood flow; where kbf describes the combined effect of insulin and bradykinin, and where ins offset is a small offset introduced to make insulin concentrations positive (same as in (nyman et al. )). the value of bradykinin is in the absence of bradykinin, and in the presence of bradykinin. finally, the blood flow and insulin are combined to impact the glucose uptake via the following expression for bfe f where bf b is the basal blood flow, where p b f is a parameter, and where ins b is the basal insulin level in adipose tissue. these are all the equations that have been changed in the current version of the model. the full set of odes from the final model, including the original simulation files, is found in the supplementary material. an interaction graph of the final model is given in fig s . models were simulated in a modular fashion, by simulating part of the model with curves from other parts as input.
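A minimal Python sketch of this modular set-up is given below. It is not the authors' code (the paper reports using MATLAB and the IQM toolbox). It integrates only the two glucose states, plasma glucose and interstitial glucose, while egp, ra and renal excretion are supplied as pre-computed curves. The balance equations follow the term definitions given in the text; the function names, the forward-Euler integrator, the linear interpolation and all numerical values are illustrative assumptions.

```python
from bisect import bisect_right


def interp(t, times, values):
    """Piecewise-linear interpolation of a sampled input curve, clamped at both ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_right(times, t)
    w = (t - times[j - 1]) / (times[j] - times[j - 1])
    return values[j - 1] + w * (values[j] - values[j - 1])


def michaelis_menten(v_max, k_m):
    """Return an insulin-dependent uptake function u_id(g_t, t) of Michaelis-Menten form."""
    return lambda g_t, t: v_max * g_t / (k_m + g_t)


def simulate(g_p0, g_t0, curves, k1, k2, u_ii, u_id, t_end, dt=0.01):
    """Forward-Euler integration of the two glucose states.

        dg_p/dt = egp + ra - u_ii - e - k1*g_p + k2*g_t
        dg_t/dt = -u_id(g_t, t) + k1*g_p - k2*g_t

    curves: dict with key "t" (sample times) and keys "egp", "ra", "e"
            (sampled inputs produced by other parts of the model).
    Returns a list of (t, g_p, g_t) tuples.
    """
    g_p, g_t, t = g_p0, g_t0, 0.0
    trajectory = [(t, g_p, g_t)]
    for _ in range(int(round(t_end / dt))):
        egp = interp(t, curves["t"], curves["egp"])
        ra = interp(t, curves["t"], curves["ra"])
        e = interp(t, curves["t"], curves["e"])
        exchange = k1 * g_p - k2 * g_t  # net flux from plasma to interstitial tissue
        g_p += dt * (egp + ra - u_ii - e - exchange)
        g_t += dt * (-u_id(g_t, t) + exchange)
        t += dt
        trajectory.append((t, g_p, g_t))
    return trajectory
```

A new module, for example a liver uptake term, can then be tested on its own by passing a different `u_id` function while the input curves stay fixed.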
specifically, the new additions were simulated on their own together with equations for g p and g t with curves for egp, ra, ins, and glut m, simulated by the first model version presented herein (m ), as input. parameter values for existing models are taken from (brännmark et al. ). the agreement between model simulations and experimental data is used to estimate values for new model parameters. this estimation is done by minimizing the distance between estimation data, denoted y, and corresponding simulated data for parameter p, which is denoted ŷ(p). in our case, the estimation data consists of uptake rates of glucose into the adipose tissue and muscle, which are denoted u idf and u idm , respectively. the cost function used is the conventional
$$v(p) = \sum_{i=1}^{n} \frac{\left(y_i - \hat{y}_i(p)\right)^2}{\mathrm{sem}_i^2}$$
where the subscript i denotes the data point, where n denotes the number of data points, and where sem denotes the standard error of the mean for the data uncertainty (cedersund and roll ). we use a χ² test to evaluate the agreement between model simulations and data. to be more specific, we use the inverse of the cumulative χ² distribution function for setting a threshold, and then compare the cost function v(p) with that threshold. in order to set that threshold, we need a significance level and the degrees of freedom. in this study, we use significance level . , and the degrees of freedom used are specified for each analysis in the results. apart from the formal optimization described above, some additional ad hoc requirements were added to the parameter estimation. specifically, to get a good estimate of the proportions of glucose taken up by the different tissues, a term was added that gives a slightly increasing punishment for having a total uptake of glucose in liver higher than % or lower than % of total glucose uptake in all organs. the total glucose uptake of other organs except adipose tissue, muscle and liver (u ii ) was punished in the same way for values higher than % and lower than % of the total glucose uptake of all organs.
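The cost function and the χ² threshold described above can be sketched in standard-library Python as follows. This is an illustration, not the authors' code. The χ² cumulative distribution is evaluated with the series expansion of the regularized lower incomplete gamma function and inverted by bisection. The significance level is left as an explicit argument because its value is not legible in this text, and `range_penalty` is a hypothetical quadratic form for the ad hoc punishment term, whose actual shape and bounds are not given here.

```python
import math


def cost(y, y_hat, sem):
    """Weighted residual sum of squares: sum_i (y_i - y_hat_i)^2 / sem_i^2."""
    return sum(((a - b) / s) ** 2 for a, b, s in zip(y, y_hat, sem))


def chi2_cdf(x, dof):
    """Cumulative chi-square distribution via the series for P(dof/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_inv(q, dof):
    """Inverse cumulative chi-square distribution, found by bisection."""
    lo, hi = 0.0, max(1.0, float(dof))
    while chi2_cdf(hi, dof) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def passes_chi2(v, dof, alpha):
    """True if the cost v is below the threshold, i.e. the model is not rejected."""
    return v < chi2_inv(1.0 - alpha, dof)


def range_penalty(value, lower, upper, weight=1.0):
    """Hypothetical soft bound: zero inside [lower, upper], quadratic growth outside."""
    excess = max(lower - value, 0.0, value - upper)
    return weight * excess ** 2
```

If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, dof)` returns the same threshold as `chi2_inv(1 - alpha, dof)`.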
the simple fitting to the impact of blood flow on glucose uptake was done by hand. a representative simulation was chosen for the comparison to the data uncertainties for total glucose uptake from dalla man (dalla man et al. ) ( ). the uncertainty of the model simulations was estimated by saving, during the optimization process, all found parameters with an acceptable simulation according to the section above. we used matlab r b (mathworks, natick, ma) and the iqm toolbox (intiquan gmbh, basel, switzerland) for modeling. the experimental data as well as the complete code for data analysis and modeling are available at https://gitlab.liu.se/isbgroup/projects/updated-multi-level.
experimental and clinical data
no new data were collected in this study. we therefore refer to the methods sections in the original articles (frayn et al. ; coppack et al. ; gerich ; moore et al. ; iozzo et al. ; brännmark et al. ) for the corresponding experimental methods.
distribution of postprandial glucose uptake between adipose, muscle, and liver (q )
the first improvement made to the original model (nyman et al. ), referred to as m , was to update the distribution of the glucose uptake among the different tissues (fig a). the liver accounts for almost half of the total postprandial glucose uptake (fig b) (gerich ), which was not explicitly accounted for in m (fig a, dotted line). we therefore adapted the fluxes to fit the data in fig b. more specifically, the liver was added as a glucose-consuming organ, with a high net consumption compared to the other organs. in the updated model, referred to as m (table s ), the liver is set to take up % of the total postprandial glucose uptake (fig a-b), while adipose and muscle uptake were reduced to % and %, respectively. furthermore, the glucose uptake by organs whose uptake is not affected by a meal (e.g. brain and kidneys) was increased to %.
note that in fig a, this constant uptake is symbolized by the kidneys and the brain, because those are the most prominent glucose consumers (gerich ) , but that other tissues and organs can be seen as represented in this uptake as well. as a validation of these changes, we compared the resulting model simulations with data from other studies. more specifically, we compared the uptake of glucose in adipose and muscle tissue, as simulated by the two models m and m , with data that measures the uptake in these two organs specifically. such measurements are possible using e.g. av difference data. in fig c, the area under the curve (auc) for m of adipose and muscle combined (dashed, light orange) is approximately times bigger than the auc of the data (solid, brown) in (frayn et al. ; coppack et al. ) . this is clearly beyond the experimental uncertainty, and m is therefore rejected by a χ test (v (θ ) = > . = χ cum,inv ( , . )). in contrast, m has approximately the same auc as the data, and its simulations lies within the experimental uncertainty for most data points. therefore, the time series is not rejected by the test based on these independent data (v (θ ) = . < . = χ cum,inv ( , . )). for these reasons, we reject m , in favor of the new model m . a more detailed check of the quality of the updated model m is obtained by looking at the muscle and adipose tissue glucose uptake one by one ( fig d) . for muscle (red), both the time-dynamics (left) and auc (right) agrees between simulations (light red) and independent data (dark red). this visual observation is supported by a χ test (v (θ ) = . < . = χ cum,inv ( , . ). in contrast, the adipose tissue shows a reasonable agreement with data, but it is not quantitatively acceptable according to a χ test (v (θ ) = . > . = χ cum,inv ( , . ). 
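The AUC comparisons above operate on sampled time series. The text does not state which integration rule was used, so the following Python sketch assumes the trapezoidal rule; the function names are illustrative.

```python
def auc_trapezoid(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def auc_ratio(times_sim, sim, times_data, data):
    """Ratio of simulated to measured AUC; values far from 1 indicate a mismatch."""
    return auc_trapezoid(times_sim, sim) / auc_trapezoid(times_data, data)
```

Because the rule works on arbitrary sample times, the simulated and measured curves do not need to share a time grid, as long as both cover the same interval.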
looking closer at the time-series reveals that the value at the maximal uptake is fine, but that the problem lies in the fact that the dynamics of the uptake in muscle and adipose tissue are different, and that this is not captured in the model.
difference in time-resolved glucose uptake in adipose and muscle tissue (q )
since the timing and agreement with dynamic glucose uptake in the muscle tissue is fine already in the model m , this model was kept essentially intact. however, one minor modification that affects muscle uptake was introduced (fig ). in the previous model (m ), the rate constant of insulin transport into the interstitium (v ) is assumed to be the same (k ) as for the rate of the subsequent degradation of insulin (v ). since there is no reason for these values to be the same, we updated the model to give these two reaction rates their own rate constants (k and k , respectively). we refitted both parameters together with the other new parameters (introduced below) to the data, and the resulting model is referred to as m a. the developments for the adipose tissue glucose uptake needed to be more elaborate, and are shown in fig : the new model structure is depicted in fig a and the comparison with data is included in fig b. as can be seen, the same difference as was introduced for muscle, m a, yields a poor agreement with data for the adipose tissue, since the peak is too late. the main problem is that the glucose uptake in the adipose tissue has gone down to baseline levels already after around min, when insulin levels are still high (dalla man et al. ) ( ). therefore, since the glucose uptake in the current model cannot go down before insulin goes down, an additional mechanism is needed. one such possible mechanism is product inhibition of the hexokinase reaction (may and mikulecky ). this leads to two new states in the next version of the model (m b, fig a, red circle): intracellular glucose, glu in , and phosphorylated glucose, g p.
as seen, there is an inhibition from g p to the rate of phosphorylation of glu in . this modification allows for the following chain-of-events. when glucose uptake begins, the amount of intracellular glucose starts to build up, which is then phosphorylated into g p. when the g p reaches saturation levels, g p inhibits the phosphorylation process from intracellular glucose, which leads to increasing intracellular glucose levels. since the net glucose uptake is driven by the gradient across the cell membrane, this increase in intracellular glucose will decrease the glucose uptake, even though insulin levels still might be high. the resulting simulations of glucose uptake in muscle and adipose tissue (fig b, right) , agrees with the data both according to a visual check, and according to a χ test (v (θ ) = . < . = χ cum,inv ( , . )). improvements in the intracellular adipose tissue model: glucose metabolism and blood flow effects (q ) the final improvement made was the addition of the impact of blood flow on insulin-stimulated glucose uptake in the adipose tissue. this interaction was hypothesised in iozzo et al, where they looked at the effect of blood flow and insulin, separately and combined, on glucose uptake in adipose tissue (iozzo et al. ) (fig a-b) . increased blood flow was achieved with the drug bradykinin. in these experiments, iozzo et al observed that glucose clearance was not significantly changed when only adding bradykinin (fig b, left) . in contrast, when combining both bradykinin and insulin, the glucose uptake is increased compared with only adding insulin (fig. b, right) . the same behaviour is produced by the model in fig c, where the glucose uptake only increases when both bradykinin and insulin is present. the parameter bradykinin was changed from to to represent the addition of bradykinin, and the parameter ins offset is changed from to to represent insulin infusion (fig a) . 
this behaviour also agrees with data according to a χ² test (v (θ ) = . < . = χ cum,inv ( , . ), where the degrees of freedom have been reduced by the number of new parameters, - = ). the updated model is referred to as m , and as for the other model additions, the new equations are briefly depicted in the figure (here fig a), and described in detail in materials and methods and supplementary files. finally, we consider the performance of the resulting final multi-level model, in relation to all of the data that has been generated over the years. the final model can fit to dynamic data of postprandial glucose uptake in both adipose and muscle tissue (fig a, same data as in fig d, from (coppack et al. )). the same figure displays predictions of dynamic uptake in the liver (for which the same type of av difference data is non-existent), and for the tissues with a constant demand of glucose (such as the brain). finally, the right-most sub-figure in fig a shows that the model agrees well with the total dynamic glucose uptake from (dalla man et al. ). furthermore, the aucs for the different tissues in the final combined model are in line with the corresponding auc data (fig b), just as they were in step q (fig b). the two left-most bars, for muscle and adipose tissue, are given by the auc of the corresponding time-series in fig a (cf fig d), and the liver and brain/kidney uptake are the same as in fig b. the final model is also in agreement with data previously used in the model development. the agreement with the most important such data sets is re-plotted in fig (dalla man et al. ), which describes meal responses for the following variables: plasma glucose, plasma insulin, endogenous glucose production, glucose rate of appearance from the intestines, glucose uptake or utilization, and insulin secretion.
as can be seen, the model simulations (lines) are within the experimental uncertainty (grey area) for all these time curves (agreements between simulation and data are similar as in (dalla man et al. ) . similarly, because of the hierarchical way that the multi-level model is constructed, it also still agrees with all of the intracellular signalling data, which we have collected over the years (brännmark et al. ) . the most important such data is depicted in fig . these data (error bars) describe time-series and dose-response curves in response to insulin for a number of intracellular proteins: the insulin receptor (ir), the insulin receptor substrate- (irs ), protein kinase-b (pkb), akt-substrate (as ), ribosomal protein s kinase beta- (s k ), ribosomal protein s (s ), as well as cellular glucose uptake. the model simulations (lines) are in agreement with both data from non-diabetic and lean controls (blue), and from obese people with type diabetes (red), with changes only in a few key parameters (for more details, see (brännmark et al. ). similar agreements for additional proteins -such as extracellular signal-regulated kinases (erk ), ets like- protein elk- (elk ), forkhead box protein o (foxo ), etc -is equally possible to obtain by replacing the intracellular part of the model with those in (nyman et al. ; rajan et al. ). glucose homeostasis is a complex multi-organ and multi-level system, which requires multi-level mathematical modelling for a full understanding. we have herein improved an existing such model (nyman et al. ) for glucose fluxes in the circulation, linked to intracellular pathways in adipocytes, in response to a meal. 
specifically, we have (q ) made a new subdivision of glucose uptake between all relevant organs, to provide more reliable proportions and to include uptake in the liver (fig ); (q ) improved the elimination of interstitial insulin to be tissue-specific (fig ), and included intracellular metabolism of glucose inside adipocytes, to capture an earlier peak in the glucose uptake in adipocytes compared to the corresponding peak in plasma insulin (fig ); and (q ) accounted for the impact of blood flow on glucose uptake (fig ). the final combined model (q ) can fit to all of the new data for glucose uptake in all organs (fig ), as well as to all previous data, such as the postprandial glucose and insulin fluxes and concentrations in (dalla man et al. ) (fig ), and the intracellular data in (brännmark et al. ) (fig ). to the best of our knowledge, this is the most comprehensive description of such a wide variety of data for glucose homeostasis in humans, and we hope that it will also become a useful resource for the integration of future data. one of the main contributions in this work is the addition of glucose uptake in the liver (q ). this addition is important because the liver is the organ that takes up the most glucose: approximately % (fig ). apart from this, the liver has a unique function in glucose homeostasis, since it is the only organ that can produce glucose from other metabolites. these two functions, glucose uptake and endogenous glucose production (egp), are now modeled as separate processes. in other words, the liver can both produce and take up glucose at the same time. while there may be situations when only the net uptake/release is important, there are also situations when one can experimentally resolve the two fluxes. for instance, when labeled metabolites have been ingested, one can see the rate by which these are converted to glucose and secreted, even in postprandial conditions, when the net effect of glucose transport is into the cell.
such data have previously been used to train the egp fluxes (fig ) (dalla man et al. ), and we have now added corresponding data for glucose uptake (fig b). note that this model is only fitted to the data in fig b, and that the agreements seen in fig c- d serve as a simple validation of this part of the model. with this said, it should be emphasized that both the muscle and the new liver module are highly simplified. only the muscle and adipose modules have been tested with respect to dynamic uptake data, and only the adipose module has an intracellular signaling part, based on detailed intracellular data, resolving the complicated intracellular metabolic fluxes. these limitations are present primarily because such data are rare or non-existent. at the heart of resolving both q and q lie measurements of glucose fluxes, which have been obtained in a variety of ways. the glucose fluxes from (dalla man et al. ) were based on a triple tracer protocol, which allows for the simultaneous calculation of plasma glucose, egp, glucose rate of appearance, and glucose utilization (fig ). these data are based on advanced calculations, which in turn are based on various assumptions and mathematical models developed within the field of tracer-based measurements (wolfe et al. ). these particular assumptions are not necessary in the organ-specific glucose utilization curves, available e.g. for muscle and adipose tissue (fig c and d). these data are based on an av difference-based protocol, which samples blood both in an artery and in veins that have passed through either muscle or adipose tissue, and looks at the difference between the incoming and the outgoing blood (coppack et al. ). this is a more direct way of measuring how each organ contributes to the glucose disappearance from the blood. nevertheless, av-difference data also do not measure glucose uptake in the primary cells, myocytes and adipocytes, respectively.
this means that the quick decline in glucose uptake in adipose tissue (fig b) could in fact be the result of a quick equilibration between interstitial and capillary glucose concentrations. one could possibly develop an alternative model based on that equilibration assumption, to explain the quick decline of the glucose uptake in the adipose tissue, either as a replacement for or as a complement to the herein implemented mechanism based on product inhibition (fig a). finally, the fact that the model is based on three different types of measurements of glucose uptake (cellular in vitro, tracer-based, and av-difference-based), and can describe all of these types of data simultaneously, is a reason why a relatively simple validation, such as that in fig c-d, is still of value. the final question addressed herein (q ) concerns the impact of blood flow on glucose uptake, which is highly simplified because the real relationship is bidirectional. the data in fig b show that glucose uptake is increased by increased blood flow, at least when insulin is present. this relationship is captured in the final model. however, that model can only describe situations where the blood flow is altered in a way that is not connected to the metabolic response, such as when adding bradykinin (fig b). in other words, the model cannot describe meal-induced blood flow changes and their associated impact on glucose uptake. the development of a model for blood-flow regulation during e.g. meal responses is an important task for future modelling work. another weakness regarding the blood flow part of the model concerns the lack of validation. the model is only fitted to the data in fig b. in the analysis, we compensate for that by reducing the degrees of freedom from the number of data points ( ) to the number of data points minus the number of parameters ( - = ). however, one could argue that the two baseline bars should not be counted since they are normalized to be %.
in such an interpretation, the degrees of freedom are , a chi-square test cannot be done, and the only possible assessment of the quality of the model is a visual comparison of the differences between fig b and c. for all these reasons, the blood flow part of the model is to be considered as a first step in the development of a model for the blood flow and its function in glucose homeostasis. it is important to compare the model presented herein to other similar models in the literature. in the introduction, we mentioned the now classical nonlinear mixed effects models describing plasma levels of glucose and insulin (silber et al. ; silber et al. ; jauslin et al. ). since these early publications, these models have been used to scale glucose and insulin concentrations from pre-clinical animal data to clinical human data (alskär et al. ), and to describe cross-talk with more long-term processes, such as disease development in mice (choy et al. ) and dynamics of hba c (kjellsson et al. ; møller et al. ). glucose homeostasis-centered models, focusing on the glucose-insulin interplay, lie at the heart of mathematical models developed for type diabetes, e.g. to aid insulin pumps, and to develop a so-called artificial pancreas (huang et al. ; fabris and kovatchev ). another application of glucose homeostasis models is a meal-response t d simulator, developed for pedagogical and motivational purposes (maas et al. ). none of these models has subdivided glucose uptake between the different organs or included intracellular responses, as multi-level and multi-organ models do. there exists one model that does this, developed by uluseker et al (uluseker et al. ). this multi-level model is based on a version of the dalla man model (dalla man et al. ) connected with our intracellular adipocyte model (brännmark et al. ), while also including hormonal effects on glucose intake/appetite (leptin, ghrelin) and insulin levels (incretin).
however, the authors do not compare their whole-body simulations with any data, and the model does not include the liver as a glucose-consuming organ. the model presented in this work only includes intake of glucose, and thus disregards the effects of proteins and fat on the meal response, something that other models do take into account, to some extent. sips et al developed a model that integrates fatty acids with glucose metabolism (sips et al. ), but this model needs a triglyceride curve as input, and lacks protein metabolism. nevertheless, the sips model is another expansion of the dalla man model (dalla man et al. ) and can thus be merged with the developments herein. two models that include protein and fat intake from a meal are the ones developed by hall et al and sarkar et al (hall et al. ; sarkar et al. ). these models are, however, developed for long-term simulations (over several years), and can thus not simulate a meal response. similar to the model presented here, the sarkar model includes liver, muscle, and adipose tissue as glucose-consuming organs, but in contrast also adds the pancreas as a glucose-consuming organ. furthermore, the sarkar model disregards the organs taking up a constant amount of glucose (brain and kidneys). in any case, the sarkar model only describes data for long-term dynamics, and does not describe meal responses. another longitudinal model, describing glucose dynamics on both short and long time-scales, is the one developed by ha et al. (ha and sherman ). this model is, in contrast to the other two longitudinal models mentioned above, multi-scale in that it can describe both changes over years, including the progression towards diabetes in a semi-mechanistic fashion, and meal response dynamics on the scale of hours and minutes. this model does not, however, include the distribution of glucose among different organs. there are also some multi-level and multi-scale models for other systems that should be mentioned.
one such model is the one developed by barbiero et al (barbiero and lió ). this model combines whole-body dynamics with the function of organs and individual cells, and is able to simulate dynamics from seconds up to several days. the model was used to simulate the cardiovascular and inflammatory effects of both t d diabetes and covid- , using personalized parameters. however, this model has an important shortcoming: its simulations are not compared with any data. there also exist interconnected models for e.g. heart function, describing the function of cardiac cells up to the integrated behavior of the intact heart (smith et al. ). in summary, there does not exist any other multi-level model describing the glucose meal response that also distinguishes between the different organs' glucose uptake. in this work, we present such a model, which, due to its modular approach, can be easily expanded in different directions. this possibility for expansion is due both to the modular structure and to the fact that each module can be treated as a separate modelling problem. in other words, as long as the model for each module agrees with the input-output profiles of insulin and glucose, the new model can replace the old model, with little alteration of the whole-body dynamics. in the earlier developed model (nyman et al. ), we took this modularity one step further, by replacing the simpler -state insulin receptor module with a much more detailed -state module for the receptor dynamics, including the possibility for a receptor to bind up to three insulin molecules (kiselyov et al. ). this demonstrates the usefulness of developing a model in modules, so that the right level of detail can be included depending on the data and questions one wants to analyze. since the original publication of our first multi-level model (nyman et al. ), we have built further on this model in several directions, and all of these developments can also be re-used in our new model. we have e.g.
expanded the intracellular part to explain an increasingly comprehensive picture of the alterations in intracellular signaling that occur in t d. this has been done by taking adipose tissue biopsies from both healthy and t d individuals, and characterising their respective insulin signalling. in (brännmark et al. ), we presented a first model of how the insulin resistance occurs, and in subsequent works, we have added additional proteins, such as the foxo transcription factor (rajan et al. ) and insulin control of the mapks erk / (nyman et al. ). because of the modular way that our multi-level model is structured, one can replace the herein used intracellular model with any of these other alternatives. the same expansions can also be done for other organs. we therefore hope that this multi-level model can in the future serve as a hub for connecting data and models together into a useful systems-level understanding.

in model m , liver is added, the amount of glucose utilization in muscle and adipose is reduced, and the uptake of other tissues that is constant during a meal is increased compared to the original m model. (b) glucose distribution among organs observed in data from (gerich ). (c) glucose uptake in muscle and adipose tissue combined for m and m . the area under the curve for m is higher than seen in data from (frayn et al. ; coppack et al. ), and m is thus rejected. (d) comparison between model m 's predictions of adipose and muscle glucose uptake with new data not used for parameter estimation (frayn et al. ; coppack et al. ).

illustration of the new intracellular adipose tissue module and ode equations. the flow of glucose into the cell, v in , is dependent on the amount of glucose in the interstitium (g t ) and inside the cell (glu in ), and on the amount of glut m and glut membrane glucose transporters through ins f ,e . the outflow, v out , is only dependent on glu in , which in turn depends on v in , v out , and the phosphorylation of glucose into g p (v g p ).
the rate of phosphorylation to g p is only dependent on v g p and the usage of g p in metabolism (v met ). (b) timing comparison between uptake seen in data and the two models: m a without phosphorylation, and m b with glucose phosphorylation. in m b, the peak comes earlier and the quantity of glucose taken up is closer to data than in m a. behaviour seen in data as response to insulin and bradykinin. insulin alone has a relatively small effect on glucose clearance, but increases glucose uptake significantly when combined with bradykinin (iozzo et al. ). (c) the same behaviour as in (b) (iozzo et al. ) can be simulated with the model. adding bradykinin is simulated by increasing the value of bradykinin, and adding insulin infusion is simulated by increasing the value of ins offset from . (coppack et al. ) . the total glucose uptake is within the bounds presented in (dalla man et al. ). (b) total glucose uptake for all organs, simulated by the final model and from the data used to fit the model (coppack et al. ) . (brännmark et al. ) . m can describe data for intracellular insulin signaling in adipocytes, both normally (blue) and in t d (red). ir, insulin receptor; irs , insulin receptor substrate- ; pkb, protein kinase-b; as , akt-substrate v a = ir · k a · (ins + ) · e − ( ) v c = k c · pkb p · mt orc a ( ) v e = k e · pkb p · irs p ( ) v f = k f · pkb p p ( ) v h = k h · pkb p ( ) v f = as · (k f · pkb p p + k f · pkb p n (km n + pkb p n )) ( ) v f = s k · k f · mt orc a n km n + mt orc a n ( ) v b = s k p · k b ( ) v f = s · k f · s k p ( ) v b = s p · k b ( ) figure s : interaction graph for model m b. 
original multi-level model rejected by figure updated glucose distribution among organs can describe figure rejected by figure updated glucose dynamic behaviours by improving interstitial fluid insulin can describe figure rejected by figure updated glucose dynamic behaviours by redesigning an intracellular model can describe figure rejected by figure blood flow has an influence on adipose tissue glucose uptake rejected by figure insulin has an influence on adipose tissue glucose uptake the impact of mathematical modeling on the understanding of diabetes and related complications model-based interspecies scaling of glucose homeostasis the computational patient has diabetes and a covid physiologic evaluation of factors controlling glucose tolerance in man. measurement of insulin sensitivity and β -cell glucose sensitivity from the response to intravenous glucose insulin signaling in type diabetes: experimental and modeling analyses reveal mechanisms of insulin resistance in human adipocytes systems biology: model based evaluation and comparison of potential explanations for given biological data modeling the disease progression from healthy to overt diabetes in zdsd rats carbohydrate metabolism in insulin resistance: glucose uptake and lactate production by adipose and forearm tissues in vivo before and after a mixed meal meal simulation model of the glucoseinsulin system the closed-loop artificial pancreas in periprandial regulation of lipid metabolism in insulin-treated diabetes mellitus physiology of glucose homeostasis type diabetes: one disease, many pathways quantification of the effect of energy imbalance on bodyweight modeling impulsive injections of insulin: towards artificial pancreas the interaction of blood flow, insulin, and bradykinin in regulating glucose uptake in lower-body adipose tissue in lean and obese subjects an integrated glucose-insulin model to describe oral glucose tolerance test data in type diabetics harmonic oscillator model of the 
insulin and igf receptors' allosteric binding and activation a model-based approach to predict longitudinal hba c, using early phase glucose data from type diabetes mellitus patients after anti-diabetic treatment in silico model and computer simulation environment approximating the human glucose/insulin utilization a physiology-based model describing heterogeneity in glucose metabolism: the core of the eindhoven diabetes education simulator (e-des) glucose utilization in rat adipocytes. the interaction of transport and metabolism as affected by insulin longitudinal modeling of the relationship between mean plasma glucose and hba c following antidiabetic treatments regulation of hepatic glucose uptake and storage in vivo a single mechanism can explain network-wide insulin resistance in adipocytes from obese patients with type diabetes insulin signaling -mathematical modeling comes of age a hierarchical whole-body modeling approach elucidates the link between in vitro insulin signaling and in vivo glucose homeostasis requirements for multi-level systems pharmacology models to reach endusage: the case of type diabetes systems-wide experimental and modeling analysis of insulin signaling through forkhead box protein o (foxo ) in human adipocytes, normally and in type diabetes a long-term mechanistic computational model of physiological factors driving the onset of type diabetes in an individual an integrated model for glucose and insulin regulation in healthy volunteers and type diabetic patients following intravenous glucose provocations an integrated model for the glucose-insulin system model-based quantification of the systemic interplay between glucose and fatty acids in the postprandial state the cardiac physiome: at the heart of coupling models to measurement a closed-loop multi-level model of glucose homeostasis isotope tracers in metabolic research: principles and practice of kinetic analysis ribosomal protein s kinase beta- s we thank the swedish research 
council ( - , - , - ), ceniit ( . , . ), the heart and lung foundation, the swedish foundation for strategic research (itm - ), scilifelab/kaw national covid- research program project grant ( . ), h (precise q, ), the swedish fund for research without animal experiments, and elliit. this manuscript has been submitted to biorxiv. the authors declare that they have no conflict of interest. all the odes for the final model m : all variables of final model:

key: cord- -q yqnvm authors: macpherson, ailene; louca, stilianos; mclaughlin, angela; joy, jeffrey b.; pennell, matthew w. title: a general birth-death-sampling model for epidemiology and macroevolution date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q yqnvm

birth-death stochastic processes are the foundation of many phylogenetic models and are widely used to make inferences about epidemiological and macroevolutionary dynamics. a large number of birth-death model variants have been developed; these impose different assumptions about the temporal dynamics of the parameters and about the sampling process. as each of these variants was individually derived, it has been difficult to understand the relationships between them as well as their precise biological and mathematical assumptions. and without a common mathematical foundation, deriving new models is non-trivial. here we unify these models into a single framework, prove that many previously developed epidemiological and macroevolutionary models are all special cases of a more general model, and illustrate the connections between these variants. to do so, we develop a novel technique for deriving likelihood functions for arbitrarily complex birth-death(-sampling) models that will allow researchers to explore a wider array of scenarios than was previously possible.
to illustrate the utility of our mathematical approach, we use it to derive an as yet unstudied variant of the birth-death process in which the key rates emerge deterministically from a classic susceptible-infected-recovered (sir) epidemiological model. whether the rates are interpreted as describing pathogen transmission or macroevolutionary diversification. in the model, transmission/speciation results in the birth of a lineage and occurs at rate $\lambda(\tau)$, where $\tau$ ($0 \le \tau \le t_{or}$) is measured in units of time before the present day, such that $\lambda$ can be time-dependent. we make the common assumption that lineages in the viral phylogeny coalesce exactly at transmission events, thus ignoring the pre-transmission interval inferred in a joint phylogeny of within- and between-host [ ]. throughout, we will use $\tau$ as a general time variable and $t_\diamond$ to denote a specific time of an event $\diamond$ time units before the present day (see table s ). lineage extinction, resulting from host recovery or death in the epidemiological case or the death of all individuals in a population in the macroevolutionary case, occurs at time-dependent rate $\mu(\tau)$. we allow for two distinct types of sampling: lineages are either sampled according to a poisson process through time with rate $\psi(\tau)$, or binomially at very short intervals, which we term "concerted sampling attempts" (csas), where lineages at some specified time $t_l$ are sampled with probability $\rho_l$ (f denotes a vector of concerted sampling events at different time points). in macroevolutionary studies based only on extant lineages, there is no poissonian sampling, but a csa at the present (i.e., $\rho > 0$). in epidemiology, csas correspond to large-scale testing efforts (relative to the background rate of testing) in a short amount of time (relative to the rates of viral sequence divergence); for full explanation, see supplementary material section s . . .
we call these attempts rather than events because if $\rho$ is small or the infection is rare in the population, few or no samples may be obtained. csas can also be incorporated into the model by including infinitesimally short spikes in the sampling rate $\psi$ (more precisely, appropriately scaled dirac distributions). hence, for simplicity, in the main text we focus on the seemingly simpler case of pure poissonian sampling through time except at present-day, where we allow for a csa to facilitate comparisons with macroevolutionary models; the resulting formulas can then be used to derive a likelihood formula for the case where past csas are included (supplement s . . ). in the epidemiological case, sampling may be concurrent (or not) with host treatment or behavioural changes resulting in the effective extinction of the viral lineage. hence, we assume that sampling results in the immediate extinction of the lineage with probability $r(\tau)$. similarly, in the case of past csas we must include the probability, $r_l$, that sampled hosts are removed from the infectious pool during the csa at time ary case to explicitly model the collection of samples from the fossil record (e.g., the fossilized birth-death process [ ]). for our derivation, we make no assumption about the temporal dynamics of $\lambda$, $\mu$, $\psi$, or $r$; each may be constant, or vary according to any arbitrary function of time, provided that it is biologically valid (i.e., non-negative, and between 0 and 1 in the case of $r$). we make the standard assumption that at any given time any given lineage experiences a birth, death or sampling event independently of (and with the same probabilities as) all other lineages. we revisit this assumption in box where we discuss how the implicit assumptions of the bds process are well summarized by the diversification model's relationship to the sir epidemiological model. our resulting general time-variable bds process can be fully defined by the parameter set bds = {$\lambda(\tau)$, $\mu(\tau)$, $\psi(\tau)$, $r(\tau)$, f}.
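as a minimal sketch of how this parameter set could be represented in software, the python fragment below stores the birth, death, and sampling rates and the removal probability as functions of time before the present, together with the concerted sampling attempts, and checks the biological-validity conditions stated above on a time grid. the class name `BDSParameters`, its field names, and the grid-based check are illustrative assumptions and do not come from the original work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A rate or probability given as a function of time before the present (tau).
TimeFunction = Callable[[float], float]


@dataclass
class BDSParameters:
    """Hypothetical container for a time-variable birth-death-sampling parameter set."""
    birth: TimeFunction      # birth (transmission/speciation) rate
    death: TimeFunction      # death (extinction) rate
    sampling: TimeFunction   # Poissonian sampling rate
    removal: TimeFunction    # probability that a sampled lineage is removed
    # Concerted sampling attempts: time before present -> sampling probability.
    csa: Dict[float, float] = field(default_factory=dict)

    def is_valid(self, t_or: float, n_grid: int = 101) -> bool:
        """Check non-negative rates and probabilities in [0, 1] on a grid over [0, t_or]."""
        for i in range(n_grid):
            tau = t_or * i / (n_grid - 1)
            rates = (self.birth(tau), self.death(tau), self.sampling(tau))
            if min(rates) < 0.0:
                return False
            if not 0.0 <= self.removal(tau) <= 1.0:
                return False
        return all(0.0 <= p <= 1.0 for p in self.csa.values())


# Example with constant rates and a single sampling attempt at the present day.
constant = BDSParameters(
    birth=lambda tau: 2.0,
    death=lambda tau: 1.0,
    sampling=lambda tau: 0.5,
    removal=lambda tau: 1.0,
    csa={0.0: 0.1},
)
print(constant.is_valid(t_or=10.0))
```

a grid check can miss violations between grid points, so it is only a sanity check; an analytical argument is needed to guarantee validity for a given functional form.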
in order to make inference about the model parameters, we need to calculate the likelihood, $L$, that an observed phylogeny, $\mathcal{T}$, is the result of a given bds process. with respect to the bds process there are two ways to represent the information contained in the phylogeny $\mathcal{T}$, both of which have been used in the literature, which we call the "edge" and "critical time" representations, respectively. we begin by deriving the likelihood in terms of the edge representation and later demonstrate how to reformulate the likelihood in terms of critical times. in the edge representation, the phylogeny is summarized as a set of edges in the mathematical graph that makes up the phylogeny, numbered - in figure b, and the types of events that occurred at each node. we define $g_e(\tau)$ as the probability that the edge $e$, which begins at time $s_e$ and ends at time $t_e$, gives rise to the subsequently observed phylogeny between time $\tau$ ($s_e < \tau < t_e$) and the present day. the likelihood of the tree, then, is by definition $g_{stem}(t_{or})$: the probability density that the stem lineage (stem = in figure b) gives rise to the observed phylogeny from the origin, $t_{or}$, to the present day. although it is initially most intuitive to derive the likelihood in terms of the edge representation, as we show below, it is then straightforward to derive the critical times formulation, which results in mathematical simplification. deriving the initial value problem (ivp) for $g_e(\tau)$: we derive the ivp for the likelihood density $g_e(\tau)$ using an approach first developed by [ ]. for a small time step $\Delta\tau$ the recursion equation for the change in the likelihood density is given by the following expression. ( ) here, $E(\tau)$ is the probability that a lineage alive at time $\tau$ leaves no sampled descendants at the present day. we will examine this probability in more detail below. assuming $\Delta\tau$ is small, we can approximate the above recursion equation as the following difference equation.
$$\Delta g_e(\tau) \approx -\big(\lambda(\tau) + \mu(\tau) + \psi(\tau)\big)\,\Delta\tau\, g_e(\tau) + 2\lambda(\tau)\, g_e(\tau)\, E(\tau)\,\Delta\tau + o(\Delta\tau). \quad ( )$$

by the definition of the derivative we have:

$$\frac{d g_e(\tau)}{d\tau} = -\big(\lambda(\tau) + \mu(\tau) + \psi(\tau)\big)\, g_e(\tau) + 2\lambda(\tau)\, g_e(\tau)\, E(\tau).$$

where the auxiliary function, , is given by: this function, (s, t), maps the value of $g_e$ at time $s$ to its value at $t$, and hence is known as the probability "flow" of the kolmogorov backward equation [ ]. deriving the ivp for $E(\tau)$: we derive the ivp for $E(\tau)$ in a similar manner as above, beginning with a difference equation. by the definition of a derivative we have: where $\rho$ is the probability a lineage is sampled at the present day. the initial condition at time 0 is therefore the probability that a lineage alive at the present day is not sampled. given an analytical or numerical general solution to $E(\tau)$, we can find the likelihood by evaluating $g_{stem}(t_{or})$, as follows. deriving the expression for $g_{stem}(t_{or})$: given the linear nature of the differential equation for $g_e(\tau)$ (and hence the representation in equation ( )), the likelihood $g_{stem}(\tau)$ is given by the product over all the initial conditions times the product over the probability flow for each edge. representing $g_{stem}(t_{or})$ in terms of critical times: equation ( ) can be further simplified by removing the need to enumerate over all the edges of the phylogeny (the last term of equation ( )) and writing $L$ in terms of critical times. the probability flow can be rewritten as the following ratio: this relationship allows us to rewrite the likelihood by expressing the product over the edges as two separate note ( ) = . it is often appropriate to condition the tree likelihood on the tree exhibiting some property, for example the condition of there being at least one sampled lineage. imposing a condition on the likelihood is done by multiplying by a factor s. various conditioning schemes are considered in section s . and listed in table s .
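the initial value problem for the probability that a lineage leaves no sampled descendants can be integrated numerically when no closed form is available. the sketch below is an illustration under a stated assumption: because the equation itself is not legible in this text, it uses the standard birth-death-sampling form $dE/d\tau = \mu - (\lambda + \mu + \psi)E + \lambda E^2$ with $E(0) = 1 - \rho$. the function name `extinction_probability`, its arguments, and the fixed-step fourth-order runge-kutta scheme are illustrative choices, not the authors' implementation.

```python
from typing import Callable

TimeFunction = Callable[[float], float]


def extinction_probability(
    birth: TimeFunction,
    death: TimeFunction,
    sampling: TimeFunction,
    rho: float,
    tau_max: float,
    n_steps: int = 2000,
) -> float:
    """Integrate dE/dtau = mu - (lambda + mu + psi) E + lambda E^2 (assumed form)
    from E(0) = 1 - rho up to tau_max with a fixed-step RK4 scheme."""

    def rhs(tau: float, e: float) -> float:
        lam, mu, psi = birth(tau), death(tau), sampling(tau)
        return mu - (lam + mu + psi) * e + lam * e * e

    h = tau_max / n_steps
    tau = 0.0
    e = 1.0 - rho  # probability that a lineage alive at the present is not sampled
    for _ in range(n_steps):
        k1 = rhs(tau, e)
        k2 = rhs(tau + h / 2.0, e + h * k1 / 2.0)
        k3 = rhs(tau + h / 2.0, e + h * k2 / 2.0)
        k4 = rhs(tau + h, e + h * k3)
        e += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        tau += h
    return e


# Constant rates, no Poissonian sampling, complete sampling at the present:
# E then approaches the classical extinction probability death/birth.
e_long = extinction_probability(lambda t: 2.0, lambda t: 1.0, lambda t: 0.0,
                                rho=1.0, tau_max=20.0)
print(e_long)
```

a fixed step size keeps the example short; for strongly time-dependent rates an adaptive solver would be a more robust choice.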
the resulting likelihood of the general bds model is: provides a consistent notation for unifying a multitude of seemingly disparate models, it also provides a con- figure b ) are time independent. in this case the deterministic compartmental model is given by the following initial value problem. step : model specification -as discussed in box , sir compartmental models can be used to constrain bds rates by setting ⁄(· ) = -s(t or ≠ · ) and µ(· ) = " + " + -, which under the present assumptions is a constant. the sampling rate is also assumed constant Â(· ) = Â. we will assume that all samples are acquired through poissonian sampling at a constant rate, hence we include neither csas at in the past nor at the present-day. upon sampling we will assume that all lineages are removed with a constant probability r. finally, we will condition the likelihood on the observation of at least one sample since the time of the most recent common ancestor. to avoid extending the inference to points very early in the spread of the infection we will condition on observing at least one sampled lineage since the time of most recent common ancestor, s . step : ivp for g e (· ) -the initial value problem for g e (· ) is straightforward to derive: step : ivp for e(· ) -similarly we have: step : expression for g stem (t or ). the expression for g stem (t or ) in the case of no present-day sampling can be obtained from and setting n = . step : g stem (t or ) in terms of the critical times -we can then transform the solution into the critical time step : conditioning the likelihood -finally, we condition the likelihood on observing at least one lineage given the tmrca, here we derive a phylogenetic birth-death-sampling model in a more general form than previously attempted, making as few assumptions about the processes that generated the data as possible. 
while drawing inferences from data will require making additional assumptions and applying mathematical constraints to the param- class (e.g., ⁄(· ) = -s(· )) and the viral death rate to the infectious recovery or removal rate µ(· ) = "+"+-, whereas the sampling rate Â(· ) is identical across models. while constraining the birth, death, and sampling rates in this manner can be used to parameterize compartmental models (e.g., [ ]) doing so is an approxima- tion assuming independence between the exact timing of transmission, recovery or removal from population, and sampling events in the sir model and birth, death, and sampling events in the diversification model. the unifying the epidemiological and evolutionary dynamics of pathogens complex population dynamics and the coalescent under neutrality on the genealogy of large populations an integrated framework for the inference of viral population history from reconstructed genealogies exploring the demographic history of dna sequences using the bayesian coalescent inference of past population dynamics from molecular sequences phylodynamics of infectious disease epidemics how well can the exponential-growth coalescent approximate constant-rate birth-death population dynamics? 
inference of epidemiological dynamics based on simulated phylogenies using birth-death and coalescent models on the generalized "birth-and-death estimating a binary character's e ect on speciation and extinction on incomplete sampling under birth-death models and connections to the sampling-based coalescent sampling-through-time in birth-death trees mathematical models of cladogenesis the reconstructed evolutionary process phylogenetic approaches for studying diversification simulating trees with millions of species the conditioned reconstructed process the fossilized birth-death process for coherent calibration of divergence-time estimates estimating the basic reproductive number from viral sequence data birth-death skyline plot reveals temporal changes of epidemic spread in hiv and hepatitis c virus (hcv) uncovering epidemiological dynamics in heterogeneous host populations using phylogenetic methods simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death sir model bayesian inference of sampled ancestor trees for epidemiology and fossil calibration birth-death models in macroevolution reconciling molecular phylogenies with the fossil record mammalian phylogeny reveals recent diversification rate shifts getting to the root of epidemic spread with phylodynamic analysis of genomic data the spread of hepatitis c virus genotype a in north america: a retrospective phylogenetic study speciation gradients and the distribution of biodiversity extant timetrees are consistent with a myriad of diversification histories timing and order of transmission events is not directly reflected in a pathogen phylogeny Über die analytischen methoden in der wahrscheinlichkeitsrechnung proceedings of the first berkeley symposium on mathematical statistics and a general and e cient algorithm for the likelihood of diversification and discrete-trait evolutionary models e cient comparative phylogenetics on large trees beast . 
: an advanced software platform for bayesian evolutionary analysis recursive algorithms for phylogenetic tree counting inferring epidemiological dynamics with bayesian coalescent inference: the merits of deterministic and stochastic models efficient comparative phylogenetics on large trees general models of multilocus evolution a general consumer-resource population model phylodynamics with migration: a computational framework to quantify population structure from genomic data a multitype birth-death model for bayesian inference of lineage-specific birth and death rates origination and extinction through the phanerozoic: a new approach improved estimation of macroevolutionary rates from fossil data using a bayesian framework explosive evolutionary radiation: decreasing speciation or increasing extinction through time? modeling infectious diseases: in humans and animals density-dependent diversification in north american wood warblers prolonging the past counteracts the pull of the present: protracted speciation can explain observed slowdowns in diversification detection of hiv transmission clusters from phylogenetic trees using a multi-state birth-death model age-dependent speciation can explain the shape of empirical phylogenies inferring epidemic contact structure from phylogenetic trees estimating epidemic incidence and prevalence from genomic data . arising from a bds process, this sampled tree can be summarized in two ways: by the set of edges (labeled - ), or as a set of critical times (horizontal lines) including the times of birth events (solid, $x_i$), terminal sampling times (dashed, $y_j$), and ancestral sampling times (dotted, $z_k$). given the inferred rates from a reconstructed sampled tree, these rates can be used to estimate characteristic parameters of the sir model, for example the basic or effective reproductive number.
key: cord- - uinislh authors: doi, hideyuki; watanabe, takeshi; nishizawa, naofumi; saito, tatsuya; nagata, hisao; kameda, yuichi; maki, nobutaka; ikeda, kousuke; fukuzawa, takashi title: on-site edna detection of species using ultra-rapid mobile pcr date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: uinislh molecular methods, including environmental dna (edna) methods, provide essential information for biological and conservation sciences. molecular measurements are often performed in the laboratory, which limits their scope, especially for rapid on-site analysis. edna methods for species detection provide essential information for the management and conservation of species and communities in various environments. we developed an innovative novel method for on-site edna measurements using an ultra-rapid mobile pcr platform. we tested the ability of our method to detect the distribution of silver carp, hypophthalmichthys molitrix, an invasive fish in japanese rivers and lakes. our method reduced the measurement time to min and provided high detectability of aquatic organisms compared to the national observation surveys using multiple fishing nets and laboratory extraction/detection using a benchtop qpcr platform. our on-site edna method can be immediately applied to various taxa and environments. molecular technologies, such as species identification and gene expression analyses, provide essential information for biological and conservation sciences. however, even with advances in techniques in the last decades, molecular measurements in the laboratory may take a day or more. ultra-rapid methods from dna collection to detection are still not well developed ( ), especially for environmental dna (edna) analysis, which uses water or soil samples to track the presence of target species ( , ) . edna analysis is a useful method to investigate the distribution of aquatic and terrestrial organisms ( - ). 
approaches using edna have provided essential information for ecological management and conservation, facilitating the detection of various kinds of organisms, including endemic, invasive, or parasitic species ( , , ). edna measurements have been mainly performed by quantitative real-time pcr (qpcr, - ). however, qpcr is limited to laboratory analysis, and laboratory processing can take many hours. these time delays often limit the range of uses for on-site edna detection ( , ). field-portable dna extraction and pcr platforms offer the potential to change species detection by edna on site ( ) ( ) ( ). however, these approaches still take a similar time to laboratory measurements. here, we developed a new method for the field processing of edna samples and measurements using an ultra-rapid mobile pcr platform (hereafter, mobile pcr) to reduce the measurement time to min while maintaining high detectability of aquatic organisms. we demonstrated its on-site use to detect the distribution of silver carp, hypophthalmichthys molitrix, an invasive fish in japanese rivers and lakes. to confirm its performance, we compared the on-site edna measurement with laboratory extraction and detection using a benchtop qpcr platform and with the national survey. we detected the edna of h. molitrix by on-site measurement at out of sites ( in survey , we also detected the edna by the laboratory methods using benchtop qpcr at all sites detected by our on-site method (fig. ). the relationship between the cycle timing (ct) of the mobile pcr and edna concentration (or ct) of qpcr was significant (fig. a, b, lm, p < . ). the ct of mobile pcr was larger than that of qpcr, because the dna concentration in the field-extracted samples was lower. our ultra-rapid on-site edna extraction and measurement method using mobile pcr successfully detected the edna of h. molitrix, and analysis took only min.
our method can be applied to many other taxa, including viruses, bacteria, and vertebrates, using specific primers. we sampled water from aquatic ecosystems, but the method can also be applied to terrestrial systems. for example, valentin et al. ( ) evaluated terrestrial insects on forest leaves by spraying and collecting water. the mobile pcr platform can also perform multiplex pcr for a few independent dna measurements. using multiplex pcr, we can detect species co-existence, for example, in close host-parasite interactions. therefore, our method has high potential for use with various taxa in different environments, including terrestrial and marine ecosystems. this ultra-rapid method can immediately be applied to broad scientific fields, such as human health ( ) and food science ( ). for example, medema et al. ( ) detected sars-cov-2 rna in wastewater to evaluate the spread of covid-19. our method can be applied to detect rna viruses, such as sars-cov-2, using reverse-transcription qpcr.
we conducted field surveys in the tone river and lake kasumigaura: survey 1: on-site detection only and survey 2: on-site detection and laboratory measurement. we
on-site dna measurement using mobile pcr
we used a primer and probe set to detect h. molitrix ( ). we checked the primer specificity against related species such as carp in japan using ncbi primer-blast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi) and confirmed the specificity for japanese carp species. a day before sampling, we made a pcr pre-mix by combining the master mix and primer-probe in advance to bring it on site. each taqman the pcr conditions were as follows: °c for s, followed by cycles of °c for . s, and °c for s. in the laboratory, we performed a no-template control (ntc) using dw after the mixture preparation as a reagent control. we performed an ntc using dw after all pcr measurements of the day (pcr control).
for survey 2, we extracted the dna from the rnalater-fixed sterivex filters and purified it using the dneasy blood and tissue kit (qiagen, hilden, germany) according to miya et al. ( ). we quantified the edna using the pikoreal real-time pcr (thermo fisher scientific, waltham, ma, usa). in the laboratory qpcr, we used the same primer-probe set as for the on-site measurements and the same pcr template mix as in our previous studies ( ). each we used a dilution series of , , , and copies per pcr reaction (n = ) for the standard curve using the target dna cloned into a plasmid. the r² values of the standard curves ranged from . to . (pcr efficiencies = . − . %). we did not detect any positives from the controls for mobile pcr and qpcr, and confirmed no cross-contamination in all edna measurements. we performed an lod test for both mobile pcr and qpcr under the above pcr conditions. we used , , , and copies of the positive control per pcr template with four replicates and detected two copies of the positive control ( / replicates). thus, we determined that the lod was two copies for both mobile pcr and qpcr.
we obtained a capture survey dataset from the ministry of land, infrastructure, transport and tourism, japan (http://www.nilim.go.jp/lab/fbg/ksnkankyo/index.html). the national fish survey was conducted using multiple fishing gears (casting, gill, and shin net) in . the survey was conducted in three seasons (spring, summer, and autumn), and h. molitrix was observed in multiple seasons.
all statistical analyses were conducted using r ver. . . . we calculated cohen's kappa to compare the detection of h. molitrix between the edna and national surveys with the r "irr" package ver. . , with equally weighted data. to test the regression between edna concentration estimated by qpcr and the ct of mobile pcr, we fitted linear models (lms) using the "lm" function. all data are available in the supplementary table s .
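as an illustration of the agreement statistic described above, the following python sketch computes unweighted cohen's kappa for two binary detection vectors. it is not the authors' code (the analysis itself used the r "irr" package); the function name `cohens_kappa` and the example detection vectors are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cohens_kappa(method_a, method_b):
    """Unweighted Cohen's kappa for two equally long label sequences.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the agreement expected by chance from the marginals.
    """
    if len(method_a) != len(method_b) or len(method_a) == 0:
        raise ValueError("both sequences must be non-empty and equally long")

    n = len(method_a)
    categories = set(method_a) | set(method_b)

    # Fraction of sites where the two methods give the same result.
    p_observed = sum(a == b for a, b in zip(method_a, method_b)) / n

    # Chance agreement: product of the marginal frequencies per category.
    p_expected = sum(
        (method_a.count(c) / n) * (method_b.count(c) / n) for c in categories
    )

    if p_expected == 1.0:
        # Degenerate case: both methods report one identical category only.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical detections at eight sites (1 = detected, 0 = not detected).
edna_detected = [1, 1, 1, 0, 1, 0, 1, 1]
net_detected = [1, 0, 1, 0, 1, 0, 0, 1]

kappa = cohens_kappa(edna_detected, net_detected)
print(kappa)  # -> 0.5 (observed agreement 0.75, chance agreement 0.5)
```

in this toy case the two methods agree at six of eight sites, but half of that agreement is expected by chance, which is why kappa is lower than the raw agreement rate.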
pcr heads into the field
applications of environmental dna (edna) in ecology and conservation: opportunities, challenges and prospects
the detection of aquatic macroorganisms using environmental dna analysis-a review of methods for collection, extraction, and detection
environmental dna analysis for estimating the abundance and biomass of stream fish
estimation of fish biomass using environmental dna
environmental dna method for estimating salamander distribution in headwater streams, and a comparison of water sampling methods
detection of schistosoma japonicum and oncomelania hupensis quadrasi environmental dna and its potential utility to schistosomiasis japonica surveillance in the philippines
ande™: a fully integrated environmental dna sampling system
a system for rapid edna detection of aquatic invasive species
rapid detection and monitoring of flavobacterium psychrophilum in water by using a handheld, field portable quantitative pcr system
moving edna surveys onto land: strategies for active edna aggregation to detect invasive forest insects
presence of sars-coronavirus-2 rna in sewage and correlation with reported covid-19 prevalence in the early stage of the epidemic in the netherlands
evaluation of a loop-mediated isothermal amplification (lamp) method for rapid on-site detection of horse meat
mitochondrial genome sequencing and development of genetic markers for the detection of dna of invasive bighead and silver carp (hypophthalmichthys nobilis and h. molitrix) in environmental water samples from the united states
use of a filter cartridge for filtration of water samples and extraction of environmental dna
we thank teruhiko takahara for his helpful comments on our manuscript. this study was supported by the environment research and technology development fund ( - , - ).
key: cord- - thh syt authors: carlson, colin j.; albery, gregory f.; merow, cory; trisos, christopher h.; zipfel, casey m.; eskew, evan a.; olival, kevin j.; ross, noam; bansal, shweta title: climate change will drive novel cross-species viral transmission date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: thh syt at least , species of mammal virus are estimated to have the potential to spread in human populations, but the vast majority are currently circulating in wildlife, largely undescribed and undetected by disease outbreak surveillance , , . in addition, changing climate and land use are already driving geographic range shifts in wildlife, producing novel species assemblages and opportunities for viral sharing between previously isolated species , . in some cases, this will inevitably facilitate spillover into humans , —a possible mechanistic link between global environmental change and emerging zoonotic disease . here, we map potential hotspots of viral sharing, using a phylogeographic model of the mammal-virus network, and projections of geographic range shifts for , mammal species under climate change and land use scenarios for the year . range-shifting mammal species are predicted to aggregate at high elevations, in biodiversity hotspots, and in areas of high human population density in asia and africa, driving the cross-species transmission of novel viruses at least , times. counter to expectations, holding warming under °c within the century does not reduce new viral sharing, due to greater range expansions—highlighting the need to invest in surveillance even in a low-warming future. most projected viral sharing is driven by diverse hyperreservoirs (rodents and bats) and large-bodied predators (carnivores). because of their unique dispersal capacity, bats account for the majority of novel viral sharing, and are likely to share viruses along evolutionary pathways that could facilitate future emergence in humans. 
our findings highlight the urgent need to pair viral surveillance and discovery efforts with biodiversity surveys tracking range shifts, especially in tropical countries that harbor the most emerging zoonoses. dispersal limits). even with dispersal limits, these first encounters are predicted to produce almost one hundred new viral sharing events (rcp . : ± . ; rcp . : ± . ) that might include zebov, and which cover a much broader part of africa than the current zoonotic niche of ebola. human spillover risk aside, this could expose several new wildlife species to a deadly virus historically responsible for sizable primate die-offs. moreover, for emerging threats of conservation concern such as ranavirus, pathogen exchange among amphibians may be especially important for conservation practitioners to understand. finally, marine mammals are an important target given their exclusion here, especially after a recent study implicating reduced arctic sea ice in novel viral transmission between pinnipeds and sea otters, a result that may be the first proof of concept for our proposed climate-disease link. because hotspots of cross-species transmission are predictable, our study provides the first template for how surveillance could target future hotspots of viral emergence in wildlife. in the next decade alone, billions could be spent on virological work trying to identify and counteract zoonotic threats before they spread from wildlife reservoirs into human populations. these
to implement the grubbs outlier tests for a given species, we defined a distance matrix between each record and the centroid of all records (in either environmental or geographic space, respectively) and determined whether the record with the largest distance was an outlier with respect to all other distances, at a given statistical significance (p = e − , in order to exclude only extreme outliers).
if an outlier was detected it was removed and the test was repeated until no additional outliers were detected. the worldclim dataset is widely used in ecology, biodiversity, and agricultural projections of potential climate change impacts. worldclim makes data available for current and future for herbivores and omnivores, the maximum is estimated as d = . m . . we used mammalian diet data from the eltontraits database, and used the same cutoff as schloss to identify carnivores as any species with % or less plants in their diet. we used body mass data from eltontraits in the schloss formula to estimate maximum generational dispersal, and converted estimates to annual maximum dispersal rates by dividing by generation length, formula performs notably poorly for bats: for example, it would assign the largest bat in our study, the indian flying fox (pteropus giganteus), a dispersal capacity lower than that of the gray dwarf hamster (cricetulus migratorius). bats were instead given full dispersal in all scenarios: given significant evidence that some bat species regularly cover continental distances, and that isolation by distance is uncommon within many bats' ranges, we felt this was a defensible assumption for modeling purposes. moving forward, the rapid range shifts already observed in many bat species (see main text) could provide an empirical reference point to fit a new allometric scaling curve (after standardizing those results for the studies' many different methodologies). a different set of functional traits likely govern the scaling of bat dispersal, chiefly the aspect ratio (length:width) of wings, which is a strong predictor of population genetic differentiation. migratory status would also be important to include as a predictor although here, we exclude information on long-distance migration for all species (due to a lack of any real framework for adding that information to species distribution models in the literature).
using a linear model, we show that elevation (c), species richness (d), and land use (e) influence the number of new overlaps for bats and non-bats. slopes for the elevation effect were generally steeply positive: a log -increase in elevation was associated with between a . - . log -increase in first encounters. results are averaged across nine global climate models. legends refer to scenarios: cl gives climate and land use change, while cld add adds dispersal limits. a. b. extended data figure : projected viral sharing from suspected ebola reservoirs is dominated by bats. node size is proportional to (left) the number of suspected ebola host species in each order, which connect to (middle) first encounters with potentially naive host species; and (right) the number of projected viral sharing events in each receiving group. (node size denotes proportions out of % within each column total.) while ebola hosts will encounter a much wider taxonomic range of mammal groups than current reservoirs, the vast majority of viral sharing will occur disproportionately in bats. extended data figure : data processing workflow. summary of species inclusion across the modeling pipeline for species distributions and viral sharing models. the final analyses in the main text use , species of placental mammals across all scenarios. extended data figure : species distribution modeling workflow for a single species. a focal species (the sand cat, felis margarita) is displayed as an illustrative example. the present day climate prediction (top left) was clipped to the same continent according to the iucn distribution (top right). this was then clipped according to cervus elaphus land use (second row, left). the known dispersal distance of the red deer was used to buffer the climate distribution (second row, right). the future distribution predictions (rcp . 
shown as an example) are displayed in the bottom four panels, for each of the four pipelines: only climate (third row, left); climate + dispersal clip (third row, right); climate + land use clip (bottom row, left) and climate + land use + dispersal clip (bottom row, right). the four distributions clearly display the limiting effect of the dispersal filter (bottom right panels) in reducing the probability of novel species interactions (bottom left panels). the land use clip had little effect on this species as the entire distribution area was habitable for the red deer. bats as 'special' reservoirs for emerging zoonotic pathogens a comparison of bats and rodents as reservoirs of zoonotic viruses: are bats special? are bats really 'special' as viral reservoirs? what we know and need to know mass extinctions, biodiversity and mitochon- drial function: are bats 'special' as reservoirs for emerging viruses? current opinion in virology viral zoonotic risk is homogenous among taxonomic orders of mammalian and avian reservoir hosts virological factors that increase the transmissibility of emerging human viruses transmissibility of emerging viral zoonoses origins of hiv and the aids pandemic. cold spring harbor perspectives in medicine origin and evolution of pathogenic coronaviruses key: cord- -hqbstgg authors: widrich, michael; schafl, bernhard; ramsauer, hubert; pavlovi'c, milena; gruber, lukas; holzleitner, markus; brandstetter, johannes; sandve, geir kjetil; greiff, victor; hochreiter, sepp; klambauer, gunter title: modern hopfield networks and attention for immune repertoire classification date: - - journal: nan doi: nan sha: doc_id: cord_uid: hqbstgg a central mechanism in machine learning is to identify, store, and recognize patterns. how to learn, access, and retrieve such patterns is crucial in hopfield networks and the more recent transformer architectures. 
we show that the attention mechanism of transformer architectures is actually the update rule of modern hopfield networks that can store exponentially many patterns. we exploit this high storage capacity of modern hopfield networks to solve a challenging multiple instance learning (mil) problem in computational biology: immune repertoire classification. accurate and interpretable machine learning methods solving this problem could pave the way towards new vaccines and therapies, which is currently a very relevant research topic intensified by the covid-19 crisis. immune repertoire classification based on the vast number of immunosequences of an individual is a mil problem with an unprecedentedly massive number of instances, two orders of magnitude larger than currently considered problems, and with an extremely low witness rate. in this work, we present our novel method deeprc that integrates transformer-like attention, or equivalently modern hopfield networks, into deep learning architectures for massive mil such as immune repertoire classification. we demonstrate that deeprc outperforms all other methods with respect to predictive performance on large-scale experiments, including simulated and real-world virus infection data, and enables the extraction of sequence motifs that are connected to a given disease class. source code and datasets: https://github.com/ml-jku/deeprc
a pooling function f is used to obtain a repertoire-representation z for the input object. finally, an output network o predicts the class label ŷ. b) deeprc uses stacked 1d convolutions for a parameterized function h due to their computational efficiency. potentially, millions of sequences have to be processed for each input object. in principle, recurrent neural networks (rnns), such as lstms (hochreiter et al., ), or transformer networks (vaswani et al., ) may also be used but are currently computationally too costly. c) attention-pooling is used to obtain a repertoire-representation z for each input object, where deeprc uses weighted averages of sequence-representations. the weights are determined by an update rule of modern hopfield networks that allows retrieving exponentially many patterns.
transformer architectures (vaswani et al., ) and their attention mechanisms are currently used in many applications, such as natural language processing (nlp), imaging, and also in multiple instance learning (mil) problems. in mil, a set or bag of objects is labelled rather than objects themselves as in standard supervised learning tasks (dietterich et al., ). examples for mil problems are medical images, in which each sub-region of the image represents an instance, video classification, in which each frame is an instance, text classification, where words or sentences are instances of a text, point sets, where each point is an instance of a 3d object, and remote sensing data, where each sensor is an instance (carbonneau et al., ; uriot, ). attention-based mil has been successfully used for image data, for example to identify tiny objects in large images (ilse et al., ; pawlowski et al., ; tomita et al., ; kimeswenger et al., ), and transformer-like attention mechanisms have been used for sets of points and images. however, in mil problems considered by machine learning methods up to now, the number of instances per bag is in the range of hundreds or a few thousand (carbonneau et al., ; lee et al., ) (see also tab. a ). at the same time the witness rate (wr), the rate of discriminating instances per bag, is already considered low at % − %. we will tackle the problem of immune repertoire classification with hundreds of thousands of instances per bag without instance-level labels and with extremely low witness rates down to . % using an attention mechanism.
we show that the attention mechanism of transformers is the update rule of modern hopfield networks (krotov & hopfield, demircigil et al., ) that are generalized to continuous states in contrast to classical hopfield networks (hopfield, ) . a detailed derivation and analysis of modern hopfield networks is given in our companion paper (ramsauer et al., ) . these novel continuous state hopfield networks allow to store and retrieve exponentially (in the dimension of the space) many patterns (see next section). thus, modern hopfield networks with their update rule, which are used as an attention mechanism in the transformer, enable immune repertoire classification in computational biology. immune repertoire classification, i.e. classifying the immune status based on the immune repertoire sequences, is essentially a text-book example for a multiple instance learning problem (dietterich et al., ; maron & lozano-pérez, ; wang et al., ) . briefly, the immune repertoire of an individual consists of an immensely large bag of immune receptors, represented as amino acid sequences. usually, the presence of only a small fraction of particular receptors determines the immune status with respect to a particular disease (christophersen et al., ; emerson et al., ) . this is because the immune system has already acquired a resistance if one or few particular immune receptors that can bind to the disease agent are present. therefore, classification of immune repertoires bears a high difficulty since each immune repertoire can contain millions of sequences as instances with only a few indicating the class. 
further properties of the data that complicate the problem are: (a) the overlap of immune repertoires of different individuals is low (in most cases, maximally low single-digit percentage values) (greiff et al., ; elhanati et al., ) , (b) multiple different sequences can bind to the same pathogen (wucherpfennig et al., ) , and (c) only subsequences within the sequences determine whether binding to a pathogen is possible (dash et al., ; glanville et al., ; akbar et al., ; springer et al., ; fischer et al., ) . in summary, immune repertoire classification can be formulated as multiple instance learning with an extremely low witness rate and large numbers of instances, which represents a challenge for currently available machine learning methods. furthermore, the methods should ideally be interpretable, since the extraction of class-associated sequence motifs is desired to gain crucial biological insights. the acquisition of human immune repertoires has been enabled by immunosequencing technology (georgiou et al., ; brown et al., ) which allows to obtain the immune receptor sequences and immune repertoires of individuals. each individual is uniquely characterized by their immune repertoire, which is acquired and changed during life. this repertoire may be influenced by all diseases that an individual is exposed to during their lives and hence contains highly valuable information about those diseases and the individual's immune status. immune receptors enable the immune system to specifically recognize disease agents or pathogens. each immune encounter is recorded as an immune event into immune memory by preserving and amplifying immune receptors in the repertoire used to fight a given disease. this is, for example, the working principle of vaccination. each human has about - unique immune receptors with low overlap across individuals and sampled from a potential diversity of > receptors (mora & walczak, ) . 
the ability to sequence and analyze human immune receptors at large scale has led to fundamental and mechanistic insights into the adaptive immune system and has also opened the opportunity for the development of novel diagnostics and therapy approaches (georgiou et al., ; brown et al., ) . immunosequencing data have been analyzed with computational methods for a variety of different tasks (greiff et al., ; shugay et al., ; miho et al., ; yaari & kleinstein, ; wardemann & busse, ) . a large part of the available machine learning methods for immune receptor data has been focusing on the individual immune receptors in a repertoire, with the aim to, for example, predict the antigen or antigen portion (epitope) to which these sequences bind or to predict sharing of receptors across individuals (gielis et al., ; springer et al., ; jurtz et al., ; moris et al., ; fischer et al., ; greiff et al., ; sidhom et al., ; elhanati et al., ) . recently, jurtz et al. ( ) used d convolutional neural networks (cnns) to predict antigen binding of t-cell receptor (tcr) sequences (specifically, binding of tcr sequences to peptide-mhc complexes) and demonstrated that motifs can be extracted from these models. similarly, konishi et al. ( ) use cnns, gradient boosting, and other machine learning techniques on b-cell receptor (bcr) sequences to distinguish tumor tissue from normal tissue. however, the methods presented so far predict a particular class, the epitope, based on a single input sequence. immune repertoire classification has been considered as a mil problem in the following publications. a deep learning framework called deeptcr (sidhom et al., ) implements several deep learning approaches for immunosequencing data. the computational framework, inter alia, allows for attention-based mil repertoire classifiers and implements a basic form of attention-based averaging. ostmeyer et al. ( ) already suggested a mil method for immune repertoire classification. 
this method considers -mers, fixed sub-sequences of length , as instances of an input object and trained a logistic regression model with these -mers as input. the predictions of the logistic regression model for each -mer were max-pooled to obtain one prediction per input object. this approach is characterized by (a) the rigidity of the k-mer features as compared to convolutional kernels (alipanahi et al., ; zhou & troyanskaya, ; zeng et al., ) , (b) the max-pooling operation, which constrains the network to learn from a single, top-ranked k-mer for each iteration over the input object, and (c) the pooling of prediction scores rather than representations (wang et al., ) . our experiments also support that these choices in the design of the method can lead to constraints on the predictive performance (see table ). our proposed method, deeprc, also uses a mil approach but considers sequences rather than k-mers as instances within an input object and a transformer-like attention mechanism. deeprc sets out to avoid the above-mentioned constraints of current methods by (a) applying transformer-like attention-pooling instead of max-pooling and learning a classifier on the repertoire rather than on the sequence-representation, (b) pooling learned representations rather than predictions, and (c) using less rigid feature extractors, such as d convolutions or lstms. in this work, we contribute the following: we demonstrate that continuous generalizations of binary modern hopfield-networks (krotov & hopfield, demircigil et al., ) have an update rule that is known as the attention mechanisms in the transformer. we show that these modern hopfield networks have exponential storage capacity, which allows them to extract patterns among a large set of instances (next section). 
based on this result, we propose deeprc, a novel deep mil method based on modern hopfield networks for large bags of complex sequences, as they occur in immune repertoire classification (section "deep repertoire classification"). we evaluate the predictive performance of deeprc and other machine learning approaches for the classification of immune repertoires in a large comparative study (section "experimental results").

exponential storage capacity of continuous state modern hopfield networks with transformer attention as update rule

in this section, we show that modern hopfield networks have exponential storage capacity, which will later allow us to approach massive multiple-instance learning problems, such as immune repertoire classification. see our companion paper (ramsauer et al., ) for a detailed derivation and analysis of modern hopfield networks. we assume patterns $x_1, \ldots, x_N \in \mathbb{R}^d$ that are stacked as columns to the matrix $X = (x_1, \ldots, x_N)$ and a query pattern $\xi$ that also represents the current state. the largest norm of a pattern is $M = \max_i \|x_i\|$. the separation $\Delta_i$ of a pattern $x_i$ is defined as its minimal dot product difference to any of the other patterns:

$$\Delta_i = \min_{j,\, j \neq i} \left( x_i^T x_i - x_i^T x_j \right).$$

we consider a modern hopfield network with current state $\xi$ and the energy function

$$E = -\beta^{-1} \log\left( \sum_{i=1}^{N} \exp(\beta x_i^T \xi) \right) + \beta^{-1} \log N + \frac{1}{2} \xi^T \xi + \frac{1}{2} M^2.$$

for energy $E$ and state $\xi$, the update rule

$$\xi^{\mathrm{new}} = X \, \mathrm{softmax}(\beta X^T \xi)$$

is proven to converge globally to stationary points of the energy $E$, which are local minima or saddle points (see (ramsauer et al., ), appendix, theorem a ). surprisingly, this update rule is also the formula of the well-known transformer attention mechanism. to see this more clearly, we simultaneously update several queries $\xi_i$. furthermore, the queries $\xi_i$ and the patterns $x_i$ are linear mappings of vectors $y_i$ into the space $\mathbb{R}^d$. for matrix notation, we set $x_i = W_K^T y_i$, $\xi_i = W_Q^T y_i$ and multiply the result of our update rule with $W_V$. using $Y = (y_1, \ldots, y_N)^T$, we define the matrices $X^T = K = Y W_K$, $\Xi^T = Q = Y W_Q$, and $V = Y W_K W_V = X^T W_V$, and the patterns are now mapped to the hopfield space with dimension $d = d_k$. we set $\beta = 1/\sqrt{d_k}$ and change softmax to a row vector. the update rule multiplied by $W_V$ and performed for all queries simultaneously becomes in row vector notation:

$$\mathrm{att}(Q, K, V; \beta) = \mathrm{softmax}(\beta Q K^T) V = \mathrm{softmax}\left( \frac{1}{\sqrt{d_k}} Q K^T \right) V.$$

this formula is the transformer attention. if the patterns $x_i$ are well separated, the iterate of the update rule converges to a fixed point close to a pattern to which the initial $\xi$ is similar. if the patterns are not well separated, the iterate converges to a fixed point close to the arithmetic mean of the patterns. if some patterns are similar to each other but well separated from all other vectors, then a metastable state between the similar patterns exists. iterates that start near a metastable state converge to this metastable state. for details see ramsauer et al. ( ), appendix, sect. a . typically, the update converges after one update step (see ramsauer et al. ( ), appendix, theorem a ) and has an exponentially small retrieval error (see ramsauer et al. ( ), appendix, theorem a ). our main concern for application to immune repertoire classification is the number of patterns that can be stored and retrieved by the modern hopfield network or, equivalently, by the transformer attention head. the storage capacity of an attention mechanism is critical for massive mil problems. we first define what we mean by storing and retrieving patterns from the modern hopfield network.

definition (pattern stored and retrieved). we assume that around every pattern $x_i$ a sphere $S_i$ is given. we say $x_i$ is stored if there is a single fixed point $x_i^* \in S_i$ to which all points $\xi \in S_i$ converge. for randomly chosen patterns, the number of patterns that can be stored is exponential in the dimension $d$ of the space of the patterns ($x_i \in \mathbb{R}^d$).

theorem . we assume a failure probability $0 < p \leq 1$ and randomly chosen patterns on the sphere with radius $M = K \sqrt{d - 1}$.
we define $a := \frac{2}{d-1} \left( 1 + \ln\left( 2 \beta K^2 p (d-1) \right) \right)$, $b := \frac{2 K^2 \beta}{5}$, and $c := \frac{b}{W_0(\exp(a + \ln(b)))}$, where $W_0$ is the upper branch of the lambert w function, and ensure

$$c \geq \left( \frac{2}{\sqrt{p}} \right)^{\frac{4}{d-1}}.$$

then with probability $1 - p$, the number of random patterns that can be stored is

$$N \geq \sqrt{p} \, c^{\frac{d-1}{4}}.$$

examples are c ≥ . for β = , k = , d = and p = . (a + ln(b) > . ) and c ≥ . for β = k = , d = , and p = . (a + ln(b) < − . ). see ramsauer et al. ( ), appendix, theorem a for a proof. we have established that a modern hopfield network or a transformer attention mechanism can store and retrieve exponentially many patterns. this allows us to approach mil with massive numbers of instances from which we have to retrieve a few with an attention mechanism.

deep repertoire classification

problem setting and notation. we consider a mil problem, in which an input object $X$ is a bag of $N$ instances $X = \{s_1, \ldots, s_N\}$. the instances have neither dependencies nor an ordering between them, and $N$ can be different for every object. we assume that each instance $s_i$ is associated with a label $y_i \in \{0, 1\}$, assuming a binary classification task, to which we do not have access. we only have access to a label $y = \max_i y_i$ for an input object or bag. note that this poses a credit assignment problem, since the sequences that are responsible for the label $y$ have to be identified, and that the relation between instance-label and bag-label can be more complex (foulds & frank, ). a model $\hat{y} = g(X)$ should be (a) invariant to permutations of the instances and (b) able to cope with the fact that $N$ varies across input objects (ilse et al., ), which is a problem also posed by point sets (qi et al., ). two principled approaches exist. the first approach is to learn an instance-level scoring function $h: \mathcal{S} \to [0, 1]$, which is then pooled across instances with a pooling function $f$, for example by average-pooling or max-pooling (see below).
the second approach is to construct an instance representation $z_i$ of each instance by $h: \mathcal{S} \to \mathbb{R}^{d_v}$ and then encode the bag, or the input object, by pooling these instance representations (wang et al., ) via a function $f$. an output function $o: \mathbb{R}^{d_v} \to [0, 1]$ subsequently classifies the bag. the second approach, the pooling of representations rather than scoring functions, is currently best performing (wang et al., ). in the problem at hand, the input object $X$ is the immune repertoire of an individual that consists of a large set of immune receptor sequences (t-cell receptors or antibodies). immune receptors are primarily represented as sequences $s_i$ from a space $\mathcal{S}$, i.e. $s_i \in \mathcal{S}$. these sequences act as the instances in the mil problem. although immune repertoire classification can readily be formulated as a mil problem, it is as yet unclear how well machine learning methods solve the above-described problem with a large number of instances $N$ and with instances $s_i$ being complex sequences. next we describe currently used pooling functions for mil problems.

pooling functions for mil problems. different pooling functions equip a model $g$ with the property to be invariant to permutations of instances and with the ability to process different numbers of instances. typically, a neural network $h_\theta$ with parameters $\theta$ is trained to obtain a function that maps each instance onto a representation, $z_i = h_\theta(s_i)$, and then a pooling function $z = f(\{z_1, \ldots, z_N\})$ supplies a representation $z$ of the input object $X = \{s_1, \ldots, s_N\}$. the following pooling functions are typically used:

average-pooling: $z = \frac{1}{N} \sum_{i=1}^{N} z_i$,

max-pooling: $z = \sum_{m=1}^{d_v} e_m \max_{i,\, 1 \leq i \leq N} \{ z_{im} \}$, where $e_m$ is the standard basis vector for dimension $m$, and

attention-pooling: $z = \sum_{i=1}^{N} a_i z_i$, where the $a_i$ are non-negative ($a_i \geq 0$), sum to one ($\sum_{i=1}^{N} a_i = 1$), and are determined by an attention mechanism.

these pooling functions are invariant to permutations of $\{1, \ldots, N\}$ and are differentiable. therefore, they are suited as building blocks for deep learning architectures.
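the three pooling functions above can be sketched in a few lines of numpy. this is only an illustration under stated assumptions: the instance representations `Z` and the attention `scores` are random placeholders rather than outputs of a trained network, and the function names are hypothetical.

```python
import numpy as np

def average_pool(Z):
    """Average-pooling: mean over the N instance representations (rows of Z)."""
    return Z.mean(axis=0)

def max_pool(Z):
    """Max-pooling: feature-wise maximum over the N instances."""
    return Z.max(axis=0)

def attention_pool(Z, scores):
    """Attention-pooling: weighted mean with weights a_i >= 0 that sum to one.
    Here the weights are a softmax over arbitrary per-instance scores."""
    a = np.exp(scores - scores.max())
    a = a / a.sum()
    return a @ Z, a

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 3))      # N = 5 instances, d_v = 3 features (placeholder data)
scores = rng.normal(size=5)      # placeholder attention scores
perm = rng.permutation(5)        # a reordering of the instances
```

applying any of the three functions to `Z[perm]` (and `scores[perm]`) gives the same bag representation as applying it to `Z`, and the output length does not depend on the number of instances.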
we employ attention-pooling in our deeprc model as detailed in the following. modern hopfield networks viewed as transformer-like attention mechanisms. the modern hopfield networks, as introduced above, have a storage capacity that is exponential in the dimension of the vector space and converge after just one update (see (ramsauer et al., ), appendix). additionally, the update rule of modern hopfield networks is known as the key-value attention mechanism, which has been highly successful through the transformer (vaswani et al., ) and bert (devlin et al., ) models in natural language processing. therefore, using modern hopfield networks with the key-value attention mechanism as update rule is the natural choice for our task. in particular, modern hopfield networks are theoretically justified for storing and retrieving the large number of vectors (sequence patterns) that appear in the immune repertoire classification task. instead of using the terminology of modern hopfield networks, we explain our deeprc architecture in terms of key-value attention (the update rule of the modern hopfield network), since it is well known in the deep learning community. the attention mechanism assumes a space of dimension d k in which keys and queries are compared. a set of n key vectors is combined into the matrix k. a set of d q query vectors is combined into the matrix q. similarities between queries and keys are computed by inner products, so queries can search for similar keys that are stored. another set of n value vectors is combined into the matrix v . the output of the attention mechanism is a weighted average of the value vectors for each query q. the i-th vector v i is weighted by the similarity between the i-th key k i and the query q. the similarity is given by the softmax of the inner products of the query q with the keys k i . all queries are calculated in parallel via matrix operations.
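a minimal numpy sketch of this key-value attention, used here as a one-step pattern retrieval, is given below. the stored patterns, the noisy query, and the value of β are toy choices made for illustration only; they are not taken from the experiments.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def att(Q, K, V, beta):
    """Key-value attention: each output row is a weighted average of the rows
    of V, weighted by softmax(beta * <query, key>)."""
    return softmax((beta * Q) @ K.T, axis=-1) @ V

# Toy retrieval: three well separated stored patterns act as keys and values.
K = 4.0 * np.eye(3)
V = K.copy()
Q = np.array([[3.5, 0.4, -0.2]])   # noisy version of the first stored pattern
weights = softmax(Q @ K.T)         # attention weights of the single query (beta = 1)
out = att(Q, K, V, beta=1.0)       # one update step
```

because the stored patterns are well separated, almost all attention weight falls on the first key and `out` is very close to the first stored pattern after a single update.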
consequently, the attention function $\mathrm{att}(Q, K, V; \beta)$ maps queries $Q$, keys $K$, and values $V$ to $d_v$-dimensional outputs: $\mathrm{att}(Q, K, V; \beta) = \mathrm{softmax}(\beta Q K^T) V$ (see also eq. ( )). while this attention mechanism was originally developed for sequence tasks (vaswani et al., ), it can be readily transferred to sets (ye et al., ). this type of attention mechanism will be employed in deeprc.

the deeprc method. we propose a novel method, deep repertoire classification (deeprc), for immune repertoire classification with attention-based deep massive multiple instance learning and compare it against other machine learning approaches. for deeprc, we consider immune repertoires as input objects, which are represented as bags of instances. in a bag, each instance is an immune receptor sequence and each bag can contain a large number of sequences. note that we will use $z_i$ to denote the sequence-representation of the $i$-th sequence and $z$ to denote the repertoire-representation. at the core, deeprc consists of a transformer-like attention mechanism that extracts the most important information from each repertoire. we first give an overview of the attention mechanism and then provide details on each of the sub-networks $h_1$, $h_2$, and $o$ of deeprc.

attention mechanism in deeprc. this mechanism is based on the three matrices $K$ (the keys), $Q$ (the queries), and $V$ (the values) together with a parameter $\beta$.

values. deeprc uses a 1d convolutional network $h_1$ (lecun et al., ; hu et al., ; kelley et al., ) that supplies a sequence-representation $z_i = h_1(s_i)$, which acts as the values $V = Z = (z_1, \ldots, z_N)$ in the attention mechanism (see figure ).

keys. a second neural network $h_2$, which shares its first layers with $h_1$, is used to obtain keys $K \in \mathbb{R}^{N \times d_k}$ for each sequence in the repertoire. this network uses self-normalizing layers (klambauer et al., ) with units per layer (see figure ).

query. we use a fixed $d_k$-dimensional query vector $\xi$ which is learned via backpropagation.
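how values, keys, and the fixed query fit together can be sketched as follows. this is an assumption-laden illustration, not the released implementation: the key network is reduced to one selu layer plus a linear layer on top of the sequence-representations, all weights and the query are random instead of learned, and the layer sizes are hypothetical.

```python
import numpy as np

def selu(x):
    """SELU activation as used in self-normalizing networks."""
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def deeprc_attention_pool(Z, W_hidden, W_key, xi):
    """Z: (N, d_v) sequence-representations, which act as the values.
    Keys are computed by a small SELU network on top of Z (assumed layout).
    xi is the single fixed query of dimension d_k."""
    K = selu(Z @ W_hidden) @ W_key            # keys, shape (N, d_k)
    d_k = K.shape[1]
    logits = K @ xi / np.sqrt(d_k)            # beta = 1 / sqrt(d_k)
    a = np.exp(logits - logits.max())
    a = a / a.sum()                           # attention weight per sequence
    z = a @ Z                                 # repertoire-representation, shape (d_v,)
    return z, a

rng = np.random.default_rng(0)
N, d_v, d_hidden, d_k = 2000, 16, 32, 8       # hypothetical sizes
Z = rng.normal(size=(N, d_v))
W_hidden = rng.normal(size=(d_v, d_hidden)) / np.sqrt(d_v)
W_key = rng.normal(size=(d_hidden, d_k)) / np.sqrt(d_hidden)
xi = rng.normal(size=d_k)                     # learned via backpropagation in practice
z, a = deeprc_attention_pool(Z, W_hidden, W_key, xi)
```

the returned weights `a` are the per-sequence importance values, and `z` has a fixed length regardless of how many sequences the repertoire contains.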
for more attention heads, each head has a fixed query vector. with the quantities introduced above, the transformer attention mechanism (eq. ( )) of deeprc is implemented as follows:

$$z = \mathrm{att}\left( \xi^T, K, Z; \frac{1}{\sqrt{d_k}} \right) = \mathrm{softmax}\left( \frac{\xi^T K^T}{\sqrt{d_k}} \right) Z,$$

where $Z \in \mathbb{R}^{N \times d_v}$ are the sequence-representations stacked row-wise, $K$ are the keys, and $z$ is the repertoire-representation and at the same time a weighted mean of the sequence-representations $z_i$. the attention mechanism can readily be extended to multiple queries; however, computational demand could constrain this depending on the application and dataset. theorem demonstrates that this mechanism is able to retrieve a single pattern out of several hundreds of thousands.

attention-pooling and interpretability. each input object, i.e. repertoire, consists of a large number $N$ of sequences, which are reduced to a single fixed-size feature vector of length $d_v$ representing the whole input object by an attention-pooling function. to this end, a transformer-like attention mechanism adapted to sets is realized in deeprc, which supplies $a_i$, the importance of the sequence $s_i$. this importance value is an interpretable quantity, which is highly desired for the immunological problem at hand. thus, deeprc allows for two forms of interpretability methods. (a) a trained deeprc model can compute attention weights $a_i$, which directly indicate the importance of a sequence. (b) deeprc furthermore allows for the usage of contribution analysis methods, such as integrated gradients (ig) (sundararajan et al., ) or layer-wise relevance propagation (montavon et al., ; arras et al., ). see sect. a for details.

classification layer and network parameters. the repertoire-representation $z$ is then used as input for a fully-connected output network $\hat{y} = o(z)$ that predicts the immune status, where we found it sufficient to train single-layer networks. in the simplest case, deeprc predicts a single target, the class label $y$, e.g. the immune status of an immune repertoire, using one output value.
however, since deeprc is an end-to-end deep learning model, multiple targets may be predicted simultaneously in classification or regression settings or a mix of both. this allows for the introduction of additional information into the system via auxiliary targets such as age, sex, or other metadata. table with sub-networks h , h , and o. d l indicates the sequence length. network parameters, training, and inference. deeprc is trained using standard gradient descent methods to minimize a cross-entropy loss. the network parameters are θ , θ , θ o for the sub-networks h , h , and o, respectively, and additionally ξ. in more detail, we train deeprc using adam (kingma & ba, ) with a batch size of and dropout of input sequences. implementation. to reduce computational time, the attention network first computes the attention weights a i for each sequence s i in a repertoire. subsequently, the top % of sequences with the highest a i per repertoire are used to compute the weight updates and prediction. furthermore, computation of z i is performed in -bit, others in -bit precision to ensure numerical stability in the softmax. see sect. a for details. in this section, we report and analyze the predictive power of deeprc and the compared methods on several immunosequencing datasets. the roc-auc is used as the main metric for the predictive power. methods compared. we compared previous methods for immune repertoire classification, (ostmeyer et al., ) ("log. mil (kmer)", "log. mil (tcrb)") and a burden test (emerson et al., ) , as well as the baseline methods logistic regression ("log. regr."), k-nearest neighbour ("knn"), and support vector machines ("svm") with kernels designed for sets, such as the jaccard kernel ("j") and the minmax ("mm") kernel (ralaivola et al., ) . 
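for readers unfamiliar with kernels designed for sets, the two kernels named above can be sketched on k-mer count profiles of two repertoires. the choice of k-mer counts as features, the value k = 3, and the function names are assumptions made for this illustration; the kernel definitions themselves are the standard jaccard and minmax forms.

```python
from collections import Counter

def kmer_counts(sequences, k=3):
    """Count all k-mers over all sequences of one repertoire."""
    counts = Counter()
    for s in sequences:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def jaccard_kernel(a, b):
    """|A ∩ B| / |A ∪ B| on the sets of observed k-mers."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def minmax_kernel(a, b):
    """sum_i min(a_i, b_i) / sum_i max(a_i, b_i) on the k-mer counts."""
    keys = set(a) | set(b)
    num = sum(min(a[key], b[key]) for key in keys)   # Counter returns 0 for missing keys
    den = sum(max(a[key], b[key]) for key in keys)
    return num / den if den else 0.0

rep_a = kmer_counts(["CASSL", "CASSQ"])   # toy repertoires
rep_b = kmer_counts(["CASSL"])
k_jaccard = jaccard_kernel(rep_a, rep_b)  # 3 shared k-mers out of 4 distinct ones
k_minmax = minmax_kernel(rep_a, rep_b)    # (1 + 1 + 1 + 0) / (2 + 2 + 1 + 1)
```

both kernels return 1.0 for identical profiles; the minmax kernel additionally takes into account how often each k-mer occurs, whereas the jaccard kernel only uses presence or absence.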
for the simulated data, we also added baseline methods that search for the implanted motif either in binary or continuous fashion ("known motif b.", "known motif c.") assuming that this motif was known (for details, see sect. a ). datasets. we aimed at constructing immune repertoire classification scenarios with varying degree of difficulties and realism in order to compare and analyze the suggested machine learning methods. to this end, we either use simulated or experimentally-observed immune receptor sequences and we implant signals, specifically, sequence motifs or sets thereof weber et al., ) , at different frequencies into sequences of repertoires of the positive class. these frequencies represent the witness rates and range from . % to %. overall, we compiled four categories of datasets: (a) simulated immunosequencing data with implanted signals, (b) lstm-generated immunosequencing data with implanted signals, (c) real-world immunosequencing data with implanted signals, and (d) real-world immunosequencing data with known immune status, the cmv dataset (emerson et al., ) . the average number of instances per bag, which is the number of sequences per immune repertoire, is ≈ , except for category (c), in which we consider the scenario of low-coverage data with only , sequences per repertoire. the number of repertoires per dataset ranges from to , . in total, all datasets comprise ≈ billion sequences or instances. this represents the largest comparative study on immune repertoire classification (see sect. a ). hyperparameter selection. we used a nested -fold cross validation (cv) procedure to estimate the performance of each of the methods. all methods could adjust their most important hyperparameters on a validation set in the inner loop of the procedure. see sect. a for details. table : results in terms of auc of the competing methods on all datasets. 
the reported errors are standard deviations across cross-validation (cv) folds (except for the column "simulated"). real-world cmv: average performance over cv folds on the cmv dataset (emerson et al., ) . real-world data with implanted signals: average performance over cv folds for each of the four datasets. a signal was implanted with a frequency (=witness rate) of % or . %. either a single motif ("om") or multiple motifs ("mm") were implanted. lstm-generated data: average performance over cv folds for each of the datasets. in each dataset, a signal was implanted with a frequency of %, %, . %, . %, or . %, respectively. simulated: here we report the mean over simulated datasets with implanted signals and varying difficulties (see tab. a for details). the error reported is the standard deviation of the auc values across the datasets. results. in each of the four categories, "real-world data", "real-world data with implanted signals", "lstm-generated data", and "simulated immunosequencing data", deeprc outperforms all competing methods with respect to average auc. across categories, the runner-up methods are either the svm for mil problems with minmax kernel or the burden test (see table and sect. a ). results on simulated immunosequencing data. in this setting the complexity of the implanted signal is in focus and varies throughout simulated datasets (see sect. a ). some datasets are challenging for the methods because the implanted motif is hidden by noise and others because only a small fraction of sequences carries the motif, and hence have a low witness rate. these difficulties become evident by the method called "known motif binary", which assumes the implanted motif is known. the performance of this method ranges from a perfect auc of . in several datasets to an auc of . in dataset ' ' (see sect. a ). deeprc outperforms all other methods with an average auc of . ± . , followed by the svm with minmax kernel with an average auc of . ± . (see sect. a ). 
the predictive performance of all methods suffers if the signal occurs only in an extremely small fraction of sequences. in datasets, in which only . % of the sequences carry the motif, all auc values are below . . results on lstm-generated data. on the lstm-generated data, in which we implanted noisy motifs with frequencies of %, %, . %, . %, and . %, deeprc yields almost perfect predictive performance with an average auc of . ± . (see sect. a and a ). the second best method, svm with minmax kernel, has a similar predictive performance to deeprc on all datasets but the other competing methods have a lower predictive performance on datasets with low frequency of the signal ( . %). results on real-world data with implanted motifs. in this dataset category, we used real immunosequences and implanted single or multiple noisy motifs. again, deeprc outperforms all other methods with an average auc of . ± . , with the second best method being the burden test with an average auc of . ± . . notably, all methods except for deeprc have difficulties with noisy motifs at a frequency of . % (see tab. a ) . results on real-world data. on the real-world dataset, in which the immune status of persons affected by the cytomegalovirus has to be predicted, the competing methods yield predictive aucs between . and . (see table ). we note that this dataset is not the exact dataset that was used in emerson et al. ( ) . it differs in pre-processing and also comprises a different set of samples and a smaller training set due to the nested -fold cross-validation procedure, which leads to a more challenging dataset. the best performing method is deeprc with an auc of . ± . , followed by the svm with minmax kernel (auc . ± . ) and the burden test with an auc of . ± . . the top-ranked sequences by deeprc significantly correspond to those detected by emerson et al. 
( ), which we tested by a mann-whitney u-test with the null hypothesis that the attention values of the sequences detected by emerson et al. ( ) would be equal to the attention values of the remaining sequences (p-value of . · − ). the sequence attention values are displayed in tab. a . we have demonstrated how modern hopfield networks and attention mechanisms enable successful classification of the immune status of immune repertoires. for this task, methods have to identify the discriminating sequences amongst a large set of sequences in an immune repertoire. specifically, even motifs within those sequences have to be identified. we have shown that deeprc, a modern hopfield network and an attention mechanism with a fixed query, can solve this difficult task despite the massive number of instances. deeprc furthermore outperforms the compared methods across a range of different experimental conditions. impact on machine learning and related scientific fields. we envision that with (a) the increasing availability of large immunosequencing datasets (kovaltsuk et al., ; corrie et al., ; christley et al., ; zhang et al., ; rosenfeld et al., ; shugay et al., ), (b) further fine-tuning of ground-truth benchmarking immune receptor datasets (weber et al., ; olson et al., ; marcou et al., ), (c) accounting for repertoire-impacting factors such as age, sex, ethnicity, and environment (potential confounding factors), and (d) increased gpu memory and increased computing power, it will be possible to identify discriminating immune receptor motifs for many diseases, potentially even for the current sars-cov- (covid- ) pandemic (minervina et al., ; galson et al., ). such results would greatly benefit ongoing research on antibody and tcr-driven immunotherapies and immunodiagnostics as well as rational vaccine design (brown et al., ).
in the course of this development, the experimental verification and interpretation of machine-learning-identified motifs could receive additional focus, as for most of the sequences within a repertoire the corresponding antigen is unknown. nevertheless, recent technological breakthroughs in high-throughput antigen-labeled immunosequencing are beginning to generate large-scale antigen-labeled single-immune-receptor-sequence data, thus resolving this longstanding problem (setliff et al., ). from a machine learning perspective, the successful application of deeprc on immune repertoires with their large number of instances per bag might encourage the application of modern hopfield networks and attention mechanisms on new, previously unsolved or unconsidered, datasets and problems. impact on society. if the approach proves itself successful, it could lead to faster testing of individuals for their immune status w.r.t. a range of diseases based on blood samples. this might motivate changes in the pipeline of diagnostics and tracking of diseases, e.g. automated testing of the immune status in regular intervals. it would furthermore make the collection and screening of blood samples for larger databases more attractive. in consequence, the improved testing of immune statuses might identify individuals that do not have a working immune response towards certain diseases to governments or insurance companies, which could then push for targeted immunisation of the individual. similarly to compulsory vaccination, such testing for the immune status could be made compulsory by governments, possibly violating privacy or personal self-determination in exchange for increased overall health of a population.
ultimately, if the approach proves itself successful, the insights gained from the screening of individuals that have successfully developed resistances against specific diseases could lead to faster targeted immunisation, once a certain number of individuals with resistances can be found. this might strongly decrease the harm done by e.g. pandemics and lead to a change in the societal perception of such diseases. consequences of failures of the method. as is common with methods in machine learning, potential danger lies in the possibility that users rely too much on our new approach and use it without reflecting on the outcomes. however, the full pipeline in which our method would be used includes wet lab tests after its application, to verify and investigate the results, gain insights, and possibly derive treatments. failures of the proposed method would lead to unsuccessful wet lab validation and negative wet lab tests. since the proposed algorithm does not directly suggest treatment or therapy, human beings are not directly at risk of being treated with a harmful therapy. substantial wet lab and in-vitro testing would precede any treatment and would indicate wrong decisions by the system. leveraging of biases in the data and potential discrimination. as for almost all machine learning methods, confounding factors, such as age or sex, could be used for classification. this might lead to biases in predictions or uneven predictive performance across subgroups. as a result, failures in the wet lab would occur (see paragraph above). moreover, insights into the relevance of the confounding factors could be gained, leading to possible therapies or counter-measures concerning said factors. furthermore, the amount of data available with respect to relevant confounding factors could lead to better or worse performance of our method. e.g.
a dataset consisting mostly of data from individuals within a specific age group might yield better performance for that age group, possibly resulting in better or exclusive treatment methods for that specific group. here again, the application of deeprc would be followed by in-vitro testing and development of a treatment, where all target groups for the treatment have to be considered accordingly. all datasets and code are available at https://github.com/ml-jku/deeprc. the cmv dataset is publicly available at https://clients.adaptivebiotech.com/pub/emerson- -natgen. in section a we provide details on the architecture of deeprc, in section a we present details on the datasets, in section a we explain the methods that we compared, and in section a we elaborate on the hyperparameters and their selection process. then, in section a we present detailed results for each dataset category in tabular form, in section a we provide information on the lstm model that was used to generate antibody sequences, in section a we show how deeprc can be interpreted, in section a we show the correspondence of previously identified tcr sequences for cmv immune status with attention values by deeprc, and finally we present variations and an ablation study of deeprc in section a . input layer. for the input layer of the cnn, the characters in the input sequence, i.e. the amino acids (aas), are encoded in a one-hot vector of length . to also provide information about the position of an aa in the sequence, we add additional input features with values in range [ , ] to encode the position of an aa relative to the sequence. these positional features encode whether the aa is located at the beginning, the center, or the end of the sequence, respectively, as shown in figure a . we concatenate these positional features with the one-hot vector of aas, which results in a feature vector of size per sequence position.
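the input encoding just described can be sketched as follows. several details are assumptions made for illustration: an alphabet of the 20 standard amino acids, three positional features (beginning, center, end) with values in [0, 1], and a piecewise-linear shape for those features; the exact shape used by deeprc is shown in the referenced figure and may differ.

```python
import numpy as np

AAS = "ACDEFGHIKLMNPQRSTVWY"                 # assumed alphabet: 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AAS)}

def encode(seq):
    """Return an array of shape (len(seq), 23): one-hot amino acids
    concatenated with three positional features (begin, center, end)."""
    L = len(seq)
    one_hot = np.zeros((L, len(AAS)))
    one_hot[np.arange(L), [AA_INDEX[a] for a in seq]] = 1.0
    # relative position t in [0, 1]; a single-residue sequence counts as "center"
    t = np.linspace(0.0, 1.0, L) if L > 1 else np.array([0.5])
    begin = np.clip(1.0 - 2.0 * t, 0.0, 1.0)   # 1 at the first position, 0 from the center on
    end = np.clip(2.0 * t - 1.0, 0.0, 1.0)     # 0 up to the center, 1 at the last position
    center = 1.0 - begin - end                 # peaks in the middle of the sequence
    pos = np.stack([begin, center, end], axis=1)
    return np.concatenate([one_hot, pos], axis=1)

x = encode("CASSLGQAYEQYF")                  # toy sequence of 13 amino acids
```

under these assumptions each sequence position is described by 20 + 3 = 23 features, and sequences of different lengths yield arrays with different numbers of rows but the same number of columns.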
each repertoire, now represented as a bag of feature vectors, is then normalized to unit variance. since the cytomegalovirus dataset (cmv dataset) provides sequences with an associated abundance value per sequence, which is the number of occurrences of a sequence in a repertoire, we incorporate this information into the input of deeprc. to this end, the one-hot aa features of a sequence are multiplied by a scaling factor of log(c a ) before normalization, where c a is the abundance of a sequence. we feed the sequences with features per position into the cnn. sequences of different lengths were zero-padded to the maximum sequence length per batch at the sequence ends. d cnn for motif recognition. in the following, we describe how deeprc identifies patterns in the individual sequences and reduces each sequence in the input object to a fixed-size feature vector. deeprc employs d convolution layers to extract patterns, where trainable weight kernels are convolved over the sequence positions. in principle, also recurrent neural networks (rnns) or transformer networks could be used instead of d cnns, however, (a) the computational complexity of the network must be low to be able to process millions of sequences for a single update. additionally, (b) the learned network should be able to provide insights in the recognized patterns in form of motifs. both properties (a) and (b) are fulfilled by d convolution operations that are used by deeprc. we use one d cnn layer (hu et al., ) with selu activation functions (klambauer et al., ) to identify the relevant patterns in the input sequences with a computationally light-weight operation. the larger the kernel size, the more surrounding sequence positions are taken into account, which influences the length of the motifs that can be extracted. we therefore adjust the kernel size during hyperparameter search. 
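the way a 1d convolution with selu activation turns a variable-length sequence into a fixed-size vector can be sketched in plain numpy. the kernel size, the number of kernels, and the random weights are hypothetical, and the final maximum over sequence positions is the reduction step that produces one vector per sequence.

```python
import numpy as np

def selu(x):
    """SELU activation (klambauer et al.)."""
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def embed_sequence(x, W, b):
    """x: (L, d_in) per-position features, with L >= kernel size.
    W: (d_v, k, d_in) trainable kernels, b: (d_v,) biases.
    Returns a vector of length d_v, independent of the sequence length L."""
    d_v, k, d_in = W.shape
    L = x.shape[0]
    # all windows of k consecutive positions, shape (L - k + 1, k, d_in)
    windows = np.stack([x[i:i + k] for i in range(L - k + 1)])
    acts = selu(np.einsum("pkc,vkc->pv", windows, W) + b)   # (L - k + 1, d_v)
    return acts.max(axis=0)                                  # maximum over positions

rng = np.random.default_rng(0)
d_in, k, d_v = 4, 3, 5                       # hypothetical sizes
W = rng.normal(size=(d_v, k, d_in))
b = np.zeros(d_v)
short = embed_sequence(rng.normal(size=(8, d_in)), W, b)
long = embed_sequence(rng.normal(size=(20, d_in)), W, b)
```

both `short` and `long` have length `d_v`. since the maximum is taken over all positions, a pattern that is matched by a kernel contributes in the same way wherever it occurs in the sequence.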
in prior works (ostmeyer et al., ) , a k-mer size of yielded good predictive performance, which could indicate that a kernel size in the range of may be a proficient choice. for d v trainable kernels, this produces a feature vector of length d v at each sequence position. subsequently, global max-pooling over all sequence positions of a sequence reduces the sequence-representations z i to vectors of the fixed length d v . given the challenging size of the input data per repertoire, the computation of the cnn activations and weight updates is performed using -bit floating point values. a list of hyperparameters evaluated for deeprc is given in table a . a comparison of rnn-based and cnn-based sequence embedding for motif recognition in a smaller experimental setting is given in sec. a . regularization. we apply random and attention-based subsampling of repertoire sequences to reduce over-fitting and decrease computational effort. during training, each repertoire is subsampled to , input sequences, which are randomly drawn from the respective repertoire. this can also be interpreted as random drop-out (hinton et al., ) on the input sequences or attention weights. during training and evaluation, the attention weights computed by the attention network are furthermore used to rank the input sequences. based on this ranking, the repertoire is reduced to the % of sequences with the highest attention weights. these top % of sequences are then used to compute the weight updates and the prediction for the repertoire. additionally, one might employ further regularization techniques, which we only partly investigated further in a smaller experimental setting in sec. a due to high computational demands. such regularization techniques include l and l weight decay, noise in the form of random aa permutations in the input sequences, noise on the attention weights, or random shuffling of sequences between repertoires that belong to the negative class. 
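the two subsampling steps described earlier in this paragraph, random subsampling during training followed by attention-based reduction to the highest-weighted sequences, can be sketched as below. the sample size of 200 and the kept fraction of 0.25 are placeholders chosen for the example and are not the values used in the experiments; the attention weights are random stand-ins for the output of the attention network.

```python
import numpy as np

def random_subsample(n_total, n_keep, rng):
    """Indices of n_keep sequences drawn without replacement from a repertoire."""
    n_keep = min(n_keep, n_total)
    return rng.choice(n_total, size=n_keep, replace=False)

def top_fraction(attention_weights, fraction):
    """Positions of the given fraction of sequences with the highest attention."""
    n_keep = max(1, int(np.ceil(fraction * len(attention_weights))))
    order = np.argsort(attention_weights)[::-1]   # descending by attention weight
    return order[:n_keep]

rng = np.random.default_rng(0)
a = rng.random(1000)                              # placeholder attention weights
idx = random_subsample(len(a), 200, rng)          # step 1: random subsampling
kept = idx[top_fraction(a[idx], 0.25)]            # step 2: attention-based reduction
```

only the sequences indexed by `kept` would then enter the weight update and the prediction, which is where the savings in computation and memory come from.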
the last regularization technique assumes that the sequences in positive-class repertoires carry a signal, such as an aa motif corresponding to an immune response, whereas the sequences in negative-class repertoires do not. hence, the sequences can be shuffled randomly between negative class repertoires without obscuring the signal in the positive class repertoires. hyperparameters. for the hyperparameter search of deeprc for the category "simulated immunosequencing data", we only conducted a full hyperparameter search on the more difficult datasets with motif implantation probabilities below %, as described in table a . this process was repeated for all folds of the -fold cross-validation (cv) and the average score on the test sets constitutes the final score of a method. table a provides an overview of the hyperparameter search, which was conducted as a grid search for each of the datasets in a nested -fold cv procedure, as described in section a . computation time and optimization. we took measures on the implementation level to address the high computational demands, especially gpu memory consumption, in order to make the large number of experiments feasible. we train the deeprc model with a small batch size of samples and perform computation of inference and updates of the d cnn using -bit floating point values. the rest of the network is trained using -bit floating point values. the adam parameter for numerical stability was therefore increased from the default value of = − to = − . training was performed on various gpu types, mainly nvidia rtx ti. computation times were highly dependent on the number of sequences in the repertoires and the number and sizes of cnn kernels. a single update on an nvidia rtx ti gpu took approximately . to . seconds, while requiring approximately to gb gpu memory. 
taking these optimizations and gpus with larger memory (≥ gb) into account, it is already possible to train deeprc, possibly with multi-head attention and a larger network architecture, on larger datasets (see sec. a ). our network implementation is based on pytorch . . (paszke et al., ).

incorporation of additional inputs and metadata. additional metadata in the form of sequence-level or repertoire-level features could be incorporated into the input via concatenation with the feature vectors that result from taking the maximum of the d cnn outputs w.r.t. the sequence positions. this has the benefit that the attention mechanism and output network can utilize the sequence-level or repertoire-level features for their predictions. sparse metadata or metadata that is only available during training could be used as auxiliary targets to incorporate the information via gradients into the deeprc model.

limitations. the current methods are mostly limited by computational complexity, since both hyperparameter selection and model selection are computationally demanding. for hyperparameter selection, a large number of hyperparameter settings have to be evaluated. for model selection, a single repertoire requires the propagation of many thousands of sequences through a neural network and keeping those quantities in gpu memory in order to perform the attention mechanism and weight update. thus, increased gpu memory would significantly boost our approach. increased computational power would also allow for more advanced architectures and attention mechanisms, which may further improve predictive performance. another limiting factor is over-fitting of the model due to the currently relatively small number of samples (bags) in real-world immunosequencing datasets in comparison to the large number of instances per bag and features per instance.
we aimed at constructing immune repertoire classification scenarios with varying degrees of realism and difficulty in order to compare and analyze the suggested machine learning methods. to this end, we use either simulated or experimentally observed immune receptor sequences and we implant signals, which are sequence motifs (weber et al., ), into sequences of repertoires of the positive class. it has been shown previously that interactions of immune receptors with antigens occur via short sequence stretches. thus, implantation of short motif sequences simulating an immune signal is biologically meaningful. our benchmarking study comprises four different categories of datasets: (a) simulated immunosequencing data with implanted signals (where the signal is defined as sets of motifs), (b) lstm-generated immunosequencing data with implanted signals, (c) real-world immunosequencing data with implanted signals, and (d) real-world immunosequencing data. each of the first three categories consists of multiple datasets with varying difficulty depending on the type of the implanted signal and the ratio of sequences with the implanted signal. the ratio of sequences with the implanted signal, where each sequence carries at most implanted signal, corresponds to the witness rate (wr). we consider binary classification tasks to simulate the immune status of healthy and diseased individuals. we randomly generate immune repertoires with varying numbers of sequences, where we implant sequence motifs in the repertoires of the diseased individuals, i.e. the positive class. the sequences of a repertoire are also randomly generated by different procedures (detailed below). each sequence is composed of different characters, corresponding to amino acids, and has an average length of . aas. in the first category, we aim at investigating the impact of the signal frequency, i.e. the wr, and the signal complexity on the performance of the different methods.
to this end, we created datasets, where each dataset contains a large number of repertoires with a large number of random aa sequences per repertoire. we then implanted signals in the aa sequences of the positive class repertoires, where the datasets differ in frequency and complexity of the implanted signals. in detail, the aas were sampled randomly, independent of their respective position in the sequence, while the frequencies of aas, the distribution of sequence lengths, and the distribution of the number of sequences per repertoire, i.e. the number of instances per bag, follow the respective distributions observed in the real-world cmv dataset (emerson et al., ). for this, we first sampled the number of sequences for a repertoire from a gaussian n(µ = k, σ = k) distribution and rounded to the nearest positive integer. we re-sampled if the size was below k. we then generated random sequences of aas with a length of n(µ = . , σ = . ), again rounded to the nearest positive integer. each simulated repertoire was then randomly assigned to either the positive or negative class, with , repertoires per class. in the repertoires assigned to the positive class, we implanted motifs with an average length of aas, following the results of the experimental analysis of antigen-binding motifs in antibodies and t-cell receptor sequences by . we varied the characteristics of the implanted motifs for each of the datasets with respect to the following parameters: (a) ρ, the probability of a motif being implanted in a sequence of a positive repertoire, i.e. the average ratio of sequences containing the motif, which is the witness rate. in this way, we generated different datasets of variable difficulty containing in total roughly . billion sequences. see table a for an overview of the properties of the implanted motifs in the datasets.
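the sampling procedure for the simulated repertoires can be illustrated with a minimal sketch. it is not the authors' implementation: the function names, the uniform amino acid frequencies used by default, the choice to overwrite a random window with the motif, and all numeric values in the usage note are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sample_positive_int(rng, mu, sigma, minimum=1):
    """Draw from N(mu, sigma), round to the nearest integer, re-sample if below `minimum`."""
    while True:
        value = round(rng.gauss(mu, sigma))
        if value >= minimum:
            return value


def random_sequence(rng, length, aa_weights=None):
    """Sample amino acids independently of their position (uniform if no weights are given)."""
    return "".join(rng.choices(AMINO_ACIDS, weights=aa_weights, k=length))


def implant(rng, sequence, motif):
    """Assumption: the motif overwrites a randomly placed window of the sequence."""
    if len(motif) > len(sequence):
        return sequence  # sequence too short to carry the motif
    start = rng.randrange(len(sequence) - len(motif) + 1)
    return sequence[:start] + motif + sequence[start + len(motif):]


def simulate_repertoire(rng, positive, motif, rho,
                        size_mu, size_sigma, size_min, len_mu, len_sigma):
    """Generate one repertoire; in positive repertoires each sequence carries
    the motif with probability `rho` (the witness rate)."""
    n_sequences = sample_positive_int(rng, size_mu, size_sigma, size_min)
    repertoire = []
    for _ in range(n_sequences):
        seq = random_sequence(rng, sample_positive_int(rng, len_mu, len_sigma))
        if positive and rng.random() < rho:
            seq = implant(rng, seq, motif)
        repertoire.append(seq)
    return repertoire
```

a call such as `simulate_repertoire(random.Random(0), True, "SFEN", 0.01, 50, 10, 20, 15, 2)` returns a small positive-class repertoire; the motif `"SFEN"` and every number in this call are placeholders, not values from the paper.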
in the second dataset category, we investigate the impact of the signal frequency and complexity in combination with more plausible immune receptor sequences by taking into account the positional aa distributions and other sequence properties. to this end, we trained an lstm (hochreiter & schmidhuber, ) in a standard next character prediction (graves, ) setting to create aa sequences with properties similar to experimentally observed immune receptor sequences. in the first step, the lstm model was trained on all immuno-sequences in the cmv dataset (emerson et al., ) that contain valid information about sequence abundance and have a known cmv label. such an lstm model is able to capture various properties of the sequences, including position-dependent probability distributions and combinations, relationships, and order of aas. we then used the trained lstm model to generate , repertoires in an autoregressive fashion, starting with a start sequence that was randomly sampled from the trained-on dataset. based on a visual inspection of the frequencies of -mers (see section a ), the similarity of lstm-generated sequences and real sequences was deemed sufficient for the purpose of generating the aa sequences for the datasets in this category. further details on lstm training and repertoire generation are given in section a . after generation, each repertoire was assigned to either the positive or negative class, with repertoires per class. we implanted motifs of length with varying properties in the center of the sequences of the positive class to obtain different datasets. each sequence in the positive repertoires has a probability ρ to carry the motif, which was varied throughout datasets and corresponds to the wr (see table a ). each position in the motif has a probability of . to be implanted and consequently a probability of . that the original aa in the sequence remains, which can be seen as noise on the motif.
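the position-wise noise on the implanted motif can be illustrated as follows. this is a minimal sketch under stated assumptions: the helper names are hypothetical, "center" is taken to mean the window whose left and right margins differ by at most one position, and the probabilities are passed in as parameters because their values are not reproduced here.

```python
import random


def implant_noisy_center(rng, sequence, motif, p_position):
    """Write `motif` into the center of `sequence`. Each motif position is
    written with probability `p_position`; otherwise the original amino acid stays."""
    if len(motif) > len(sequence):
        return sequence  # sequence too short to carry the motif
    start = (len(sequence) - len(motif)) // 2
    chars = list(sequence)
    for offset, aa in enumerate(motif):
        if rng.random() < p_position:
            chars[start + offset] = aa
    return "".join(chars)


def implant_in_repertoire(rng, repertoire, motif, rho, p_position):
    """Each sequence of a positive repertoire carries the (noisy) motif with probability `rho`."""
    return [
        implant_noisy_center(rng, seq, motif, p_position) if rng.random() < rho else seq
        for seq in repertoire
    ]
```

with `p_position = 1.0` the motif is always written in full, and with `p_position = 0.0` the sequence is returned unchanged, which makes the two extremes easy to check.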
table a : properties of simulated repertoires, variations of motifs, and motif frequencies, i.e. the witness rate, for the datasets in categories "simulated immunosequencing data", "lstm-generated data", and "real-world data with implanted signals". noise types for * are explained in paragraph "real-world data with implanted signals".

in the third category, we implanted signals into experimentally obtained immuno-sequences, where we considered dataset variations. each dataset consists of repertoires for each of the two classes, where each repertoire consists of k sequences. in this way, we aim to simulate datasets with a low sequencing coverage, which means that only relatively few sequences per repertoire are available. the sequences were randomly sampled from healthy (cmv negative) individuals from the cmv dataset (see the paragraph below for an explanation). two signal types were considered: (a) one signal with one motif. the aa motif ldr was implanted in a certain fraction of sequences. the pattern is randomly altered at one of the three positions with probabilities . , . , and . , respectively. (b) one signal with multiple motifs. one of the three possible motifs ldr, cas, and gl-n was implanted with equal probability. again, the motifs were randomly altered before implantation. the aa motif ldr was altered as described above. the aa motif cas was altered at the second position with probability . and with probability . at the first position. the pattern gl-n, where - denotes a gap location, is randomly altered at the first position with probability . and the gap has a length of , , or aas with equal probability. additionally, the datasets differ in the values for ρ, the average ratio of sequences carrying a signal, which were chosen as % or . %. the motifs were implanted at positions , , and according to the imgt numbering scheme for immune receptor sequences (lefranc et al., ) with probabilities . , . and . , respectively. with the remaining . 
chance, the motif is implanted at any other sequence position. this means that the motif occurrence in the simulated sequences is biased towards the middle of the sequence. we used a real-world dataset of repertoires, each of which containing between , to , (avg. , ) tcr sequences with a length of to (avg. . ) aas, originally collected and provided by emerson et al. ( ) . out of repertoires were labelled as positive for cytomegalovirus (cmv) serostatus, which we consider as the positive class, repertoires with negative cmv serostatus, considered as negative class, and repertoires with unknown status. we changed the number of sequence counts per repertoire from − to for sequences. furthermore, we exclude a total of repertoires with unknown cmv status or unknown information about the sequence abundance within a repertoire, reducing the dataset for our analysis to repertoires, of which with positive and with negative cmv status. we give a non-exhaustive overview of previously considered mil datasets and problems in table a . to our knowledge the datasets considered in this work pose the most challenging mil problems with respect to the number of instances per bag (column ). table a : mil datasets with their numbers of bags and numbers of instances. "total number of instances" refers to the total number of instances in the dataset. the simulated and real-world immunosequencing datasets considered in this work contain a by orders of magnitudes larger number of instances per bag than mil datasets that were considered by machine learning methods up to now. we evaluate and compare the performance of deeprc against a set of machine learning methods that serve as baseline, were suggested, or can readily be adapted to immune repertoire classification. in this section, we describe these compared methods. this method serves as an estimate for the achievable classification performance using prior knowledge about which motif was implanted. 
note that this does not necessarily lead to perfect predictive performance, since motifs are implanted with a certain amount of noise and could also be present in the negative class by chance. the known motif method counts how often the known implanted motif occurs per sequence for each repertoire and uses this count to rank the repertoires. from this ranking, the area under the receiver operating characteristic curve (auc) is computed as performance measure. probabilistic aa changes in the known motif are not considered for this count, with the exception of gap positions. we consider two versions of this method: (a) known motif binary: counts the occurrence of the known motif in a sequence and (b) known motif continuous: counts the maximum number of overlapping aas between the known motif and all sequence positions, which corresponds to a convolution operation with a binary kernel followed by max-pooling. since the implanted signal is not known in the experimentally obtained cmv dataset, this method cannot be applied to this dataset.

the support vector machine (svm) approach uses a fixed mapping from a bag of sequences to the corresponding k-mer counts. the function $h_{\mathrm{kmer}}$ maps each sequence $s_i$ to a vector representing the occurrence of k-mers in the sequence. to avoid confusion with the sequence-representation obtained from the cnn layers of deeprc, we denote $u_i = h_{\mathrm{kmer}}(s_i)$, which is analogous to $z_i$. specifically,

$$u_{i,m} = \#\{p_m \in s_i\},$$

where $\#\{p_m \in s_i\}$ denotes how often the k-mer pattern $p_m$ occurs in sequence $s_i$. afterwards, average-pooling is applied to obtain $u = \frac{1}{N} \sum_{i=1}^{N} u_i$, the k-mer representation of the input object $x$. for two input objects $x^{(n)}$ and $x^{(l)}$ with representations $u^{(n)}$ and $u^{(l)}$, respectively, we implement the minmax kernel (ralaivola et al., ) as follows:

$$k_{\mathrm{minmax}}(u^{(n)}, u^{(l)}) = \frac{\sum_{m} \min\left(u^{(n)}_m, u^{(l)}_m\right)}{\sum_{m} \max\left(u^{(n)}_m, u^{(l)}_m\right)},$$

where $u^{(n)}_m$ is the m-th element of the vector $u^{(n)}$. the jaccard kernel (levandowsky & winter, ) is identical to the minmax kernel except that it operates on binary $u^{(n)}$.
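the k-mer representation and the two kernels can be sketched in a few lines. this is an illustration rather than the authors' code: it assumes the usual definition of the minmax kernel as the ratio of the summed element-wise minima to the summed element-wise maxima, it stores k-mer vectors sparsely as dictionaries, and all function names are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count how often each k-mer occurs in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def repertoire_representation(sequences, k):
    """Average-pool the per-sequence k-mer counts over all sequences of a repertoire."""
    total = Counter()
    for seq in sequences:
        total.update(kmer_counts(seq, k))
    n = len(sequences)
    return {kmer: count / n for kmer, count in total.items()}


def minmax_kernel(u_n, u_l):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    keys = set(u_n) | set(u_l)
    numerator = sum(min(u_n.get(m, 0.0), u_l.get(m, 0.0)) for m in keys)
    denominator = sum(max(u_n.get(m, 0.0), u_l.get(m, 0.0)) for m in keys)
    return numerator / denominator if denominator > 0 else 0.0


def jaccard_kernel(u_n, u_l):
    """The minmax kernel applied to binarized (presence/absence) representations."""
    b_n = {m: 1.0 for m, v in u_n.items() if v > 0}
    b_l = {m: 1.0 for m, v in u_l.items() if v > 0}
    return minmax_kernel(b_n, b_l)
```

for the toy repertoires `["CASS", "CAS"]` and `["CASL"]` with `k = 3`, the representations are `{CAS: 1.0, ASS: 0.5}` and `{CAS: 1.0, ASL: 1.0}`, so the minmax kernel is 1.0 / 2.5 = 0.4 and the jaccard kernel is 1/3.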
we used a standard c-svm, as introduced by cortes & vapnik ( ). the corresponding hyperparameter c is optimized by random search. the settings of the full hyperparameter search as well as the respective value ranges are given in table a a .

the same k-mer representation of a repertoire, as introduced above for the svm baseline, is used for the k-nearest neighbor (knn) approach. as this method relies on distances between samples, the previous kernel definitions cannot be applied directly. it is therefore necessary to transform the minmax as well as the jaccard kernel from similarities to distances by constructing the following (levandowsky & winter, ):

$$d_{\mathrm{minmax}}(u^{(n)}, u^{(l)}) = 1 - k_{\mathrm{minmax}}(u^{(n)}, u^{(l)}),$$

$$d_{\mathrm{jaccard}}(u^{(n)}, u^{(l)}) = 1 - k_{\mathrm{jaccard}}(u^{(n)}, u^{(l)}).$$

the number of neighbors is treated as the hyperparameter and optimized by an exhaustive grid search. the settings of the full hyperparameter search as well as the respective value ranges are given in table a .

we implemented logistic regression on the k-mer representation u of an immune repertoire. the model is trained by gradient descent using the adam optimizer (kingma & ba, ). the learning rate is treated as the hyperparameter and optimized by grid search. furthermore, we explored two regularization settings using combinations of l and l weight decay. the settings of the full hyperparameter search as well as the respective value ranges are given in table a .

we implemented a burden test (emerson et al., ; li & leal, ; wu et al., ) in a machine learning setting. the burden test first identifies sequences or k-mers that are associated with the individual's class, i.e., immune status, and then calculates a burden score per individual. concretely, for each k-mer or sequence, the phi coefficient of the contingency table for absence or presence and positive or negative immune status is calculated. then, j k-mers or sequences with the highest phi coefficients are selected as the set of associated k-mers or sequences.
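the association step of the burden test described above can be sketched as follows. this is an illustration under assumptions: each individual is represented by the set of k-mers or sequences it carries, the phi coefficient is computed from the standard 2x2 contingency table, ties are broken alphabetically, and all names are hypothetical.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table.
    n11: present & positive, n10: present & negative,
    n01: absent & positive,  n00: absent & negative."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom > 0 else 0.0


def select_associated_features(feature_sets, labels, j):
    """Return the j features (k-mers or sequences) with the highest phi coefficients.
    feature_sets: one set of features per individual; labels: True for the positive class."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    scores = {}
    for feature in set().union(*feature_sets):
        n11 = sum(1 for fs, y in zip(feature_sets, labels) if y and feature in fs)
        n10 = sum(1 for fs, y in zip(feature_sets, labels) if not y and feature in fs)
        scores[feature] = phi_coefficient(n11, n10, n_pos - n11, n_neg - n10)
    # Sort by descending phi; break ties alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:j]
```

a feature that is present in every positive individual and absent from every negative one obtains a phi coefficient of 1 and is therefore ranked first.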
j is a hyperparameter that is selected on a validation set. additionally, we consider the type of input features, sequences or k-mers, as a hyperparameter. for inference, a burden score per individual is calculated as the sum of associated k-mers or sequences it carries. this score is used as raw prediction and to rank the individuals. hence, we have extended the burden test by emerson et al. ( ) to k-mers and to adaptive thresholds that are adjusted on a validation set. the logistic multiple instance learning (mil) approach for immune repertoire classification (ostmeyer et al., ) applies a logistic regression model to each k-mer representation in a bag. the resulting scores are then summarized by max-pooling to obtain a prediction for the bag. each amino acid of each k-mer is represented by features, the so-called atchley factors (atchley et al., ) . as k-mers of length are used, this gives a total of × = features. one additional feature per -mer is added, which represents the relative frequency of this -mer with respect to its containing bag, resulting in features per -mer. two options for the relative frequency feature exist, which are (a) whether the frequency of the -mer (" mer") or (b) the frequency of the sequence in which the -mer appeared ("tcrβ") is used. we optimized the learning rate, batch size, and early stopping parameter on the validation set. the settings of the full hyperparameter search as well as the respective value ranges are given in table a . for all competing methods a hyperparameter search was performed, for which we split each of the training sets into an inner training set and inner validation set. the models were trained on the inner training set and evaluated on the inner validation set. the model with the highest auc score on the inner validation set is then used to calculate the score on the respective test set. here we report the hyperparameter sets and search strategy that is used for all methods. deeprc. 
the set of hyperparameters of deeprc is shown in table a . these hyperparameter combinations are adjusted via a grid search procedure. table a : deeprc hyperparameter search space. every · updates, the current model was evaluated against the validation fold. the early stopping hyperparameter was determined by selecting the model with the best loss on the validation fold after updates. * : experiments for { ; ; } kernels were omitted for datasets with motif implantation probabilities ≥ % in the category "simulated immunosequencing data". known motif. this method does not have hyperparameters and has been applied to all datasets except for the cmv dataset. the corresponding hyperparameter c of the svm is optimized by randomly drawing values in the range of [− ; ] according to a uniform distribution. these values act as the exponents of a power of and are applied for each of the two kernel types (see table a a ). knn. the amount of neighbors is treated as the hyperparameter and optimized by grid search operating in the discrete range of [ ; max{n, }] with a step size of . the corresponding tight upper bound is automatically defined by the total amount of samples n ∈ n > in the training set, capped at (see table a ). number of neighbors { ; max{n, }} type of kernel {minmax; jaccard} table a : settings used in the hyperparameter search of the knn baseline approach. the number of trials (per type of kernel) is automatically defined by the total amount of samples n ∈ n > in the training set, capped at . logistic regression. the hyperparameter optimization strategy that was used was grid search across hyperparameters given in table a . learning rate −{ ; ; } batch size max. updates coefficient β (adam) . coefficient β (adam) . weight decay weightings {(l = − , l = − ); (l = − , l = − )} table a : settings used in the hyperparameter search of the logistic regression baseline approach. burden test. 
the burden test selects two hyperparameters: the number of features in the burden set and the type of features, see table a .

number of features in burden set: { , , , }
type of features: { mer; sequence}

table a : settings used in the hyperparameter search of the burden test approach.

logistic mil. for this method, we adjusted the learning rate as well as the batch size as hyperparameters by randomly drawing different hyperparameter combinations from a uniform distribution. the corresponding range of the learning rate is [− . ; − . ], which acts as the exponent of a power of . the batch size lies within the range of [ ; ]. for each hyperparameter combination, a model is optimized by gradient descent using adam, where the early stopping parameter is adjusted according to the corresponding validation set (see table a ).

learning rate: {− . ; − . }
batch size: { ; }
relative abundance term: { mer; tcrβ}
number of trials:
max. epochs:
coefficient β (adam): .
coefficient β (adam): .

table a : settings used in the hyperparameter search of the logistic mil baseline approach. the number of trials (per type of relative abundance) defines the quantity of combinations of random values of the learning rate as well as the batch size.

in this section, we report the detailed results on all four categories of datasets: (a) simulated immunosequencing data (table a ), (b) lstm-generated data (table a ), (c) real-world data with implanted signals (table a ), and (d) real-world data on the cmv dataset (table a ), as discussed in the main paper.

table a : auc estimates based on -fold cv for all datasets in category "simulated immunosequencing data". the reported errors are standard deviations across the cross-validation folds except for the last column "avg.", in which they show standard deviations across datasets. wildcard characters in motifs are indicated by z, characters with % probability of being removed by d .

table a : auc estimates based on -fold cv for all datasets in category "lstm-generated data". the reported errors are standard deviations across the cross-validation folds except for the last column "avg.", in which they show standard deviations across datasets. characters affected by noise, as described in a , paragraph "lstm-generated data", are indicated by r .

table a : results on the cmv dataset (real-world data) in terms of auc, f score, balanced accuracy, and accuracy. for f score, balanced accuracy, and accuracy, all methods use their default thresholds. each entry shows mean and standard deviation across cross-validation folds.

we trained a conventional next-character lstm model (graves, ) based on the implementation in https://github.com/spro/practical-pytorch (access date st of may, ) using pytorch . . (paszke et al., ). for this, we applied an lstm model with lstm blocks in layers, which was trained for , epochs using the adam optimizer (kingma & ba, ) with learning rate . , an input batch size of character chunks, and a character chunk length of . as input we used the immuno-sequences in the cdr column of the cmv dataset, where we repeated sequences according to their counts in the repertoires, as specified in the templates column of the cmv dataset. we excluded repertoires with unknown cmv status and unknown sequence abundance from training. after training, we generated , repertoires using a temperature value of . .
the number of sequences per repertoire was sampled from a gaussian n (µ = k, σ = k) distribution, where the whole repertoire was generated by the lstm at once. that is, the lstm can base the generation of the individual aa sequences in a repertoire, including the aas and the lengths of the sequences, on the generated repertoire. a random immuno-sequence from the trained-on repertoires was used as initialization for the generation process. this immuno-sequence was not included in the generated repertoire. finally, we randomly assigned of the generated repertoires to the positive (diseased) and to the negative (healthy) class. we then implanted motifs in the positive class repertoires as described in section a . . as illustrated in the comparison of histograms given in fig. a , the generated immuno-sequences exhibit a very similar distribution of -mers and aas compared to the original cmv dataset. real-world data deeprc allows for two forms of interpretability methods. (a) due to its attention-based design, a trained model can be used to compute the attention weights of a sequence, which directly indicates its importance. (b) deeprc furthermore allows for the usage of contribution analysis methods, such as integrated gradients (ig) (sundararajan et al., ) or layer-wise relevance propagation (montavon et al., ; arras et al., ; montavon et al., ; preuer et al., ) . we apply ig to identify the input patterns that are relevant for the classification. to identify aa patterns with high contributions in the input sequences, we apply ig to the aas in the input sequences. additionally, we apply ig to the kernels of the d cnn, which allows us to identify aa motifs with high contributions. in detail, we compute the ig contributions for the aas and positional features in the kernels for every repertoire in the validation and test set, so as to exclude potential artifacts caused by over-fitting. averaging the ig values over these repertoires then results in concise aa motifs. 
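the integrated gradients computation itself can be illustrated on a toy model. the sketch below is not the deeprc attribution pipeline: it assumes a logistic model with an analytic gradient, an all-zero baseline in the usage note, and a midpoint riemann sum with a hypothetical default of 50 steps. it shows the completeness property, i.e. the attributions sum approximately to the difference between the model output at the input and at the baseline.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def model(x, weights, bias):
    """Toy differentiable model: logistic regression on a feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def model_gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to its input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=50):
    """Approximate IG with a midpoint Riemann sum along the straight path
    from `baseline` to `x`."""
    avg_grad = [0.0] * len(x)
    for step in range(steps):
        alpha = (step + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = model_gradient(point, weights, bias)
        for i, g in enumerate(grad):
            avg_grad[i] += g / steps
    # Attribution = (input - baseline) * path-averaged gradient, per feature.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]
```

features that do not differ from the baseline receive zero attribution, and increasing `steps` tightens the approximation of the path integral.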
we include qualitative visual analyses of the ig method on different datasets below.

here, we provide examples for the interpretation of trained deeprc models using integrated gradients (ig) (sundararajan et al., ) as contribution analysis method. the following illustrations were created using ig steps, which we found sufficient to achieve stable ig results. a visual analysis of deeprc models on the simulated datasets, as illustrated in tab. a and fig. a , shows that the implanted motifs can be successfully extracted from the trained model and are straightforward to interpret. in the real-world cmv dataset, deeprc finds complex patterns with high variability in the center regions of the immuno-sequences, as illustrated in figure a .

extracted motif: (shown as images in the original table)
implanted motif(s): g r s r a r f r | l r d r r r | {l r d r r r ; c r a r s; g r l-n}
motif freq. ρ: . % | . % | . %

table a : visualization of motifs extracted from trained deeprc models for datasets from categories "simulated immunosequencing data", "lstm-generated data", and "real-world data with implanted signals". motif extraction was performed using integrated gradients on the d cnn kernels over the validation set and test set repertoires of one cv fold. wildcard characters are indicated by z, random noise on characters by r , characters with % probability of being removed by d , and gap locations of random lengths of { ; ; } by -. larger characters in the extracted motifs indicate higher contribution, with blue indicating positive contribution and red indicating negative contribution towards the prediction of the diseased class. contributions to positional encoding are indicated by < (beginning of sequence), ∧ (center of sequence), and > (end of sequence). only kernels with relatively high contributions are shown, i.e. with contributions roughly greater than the average contribution of all kernels.
b) c) figure a : integrated gradients applied to input sequences of positive class repertoires. three sequences with the highest contributions to the prediction of their respective repertoires are shown. a) input sequence taken from "simulated immunosequencing data" with implanted motif sz d z d n and motif implantation probability . %. the deeprc model reacts to the s and n at the th and th sequence position, thereby identifying the implanted motif in this sequence. b) and c) input sequence taken from "real-world data with implanted signals" with implanted motifs {l r d r r r ; c r a r s; g r l-n} and motif implantation probability . %. the deeprc model reacts to the fully implanted motif cas (b) and to the partly implanted motif aas c and a at the th and th sequence position (c), thereby identifying the implanted motif in the sequences. wildcard characters in implanted motifs are indicated by z, characters with % probability of being removed by d , and gap locations of random lengths of { ; ; } by -. larger characters in the sequences indicate higher contribution, with blue indicating positive contribution and red indicating negative contribution towards the prediction of the diseased class. figure a : visualization of the contributions of characters within a sequence via ig. each sequence was selected from a different repertoire and showed the highest contribution in its repertoire. the model was trained on cmv dataset, using a kernel size of , kernels and repertoires for early stopping. larger characters in the extracted motifs indicate higher contribution, with blue indicating positive contribution and red indicating negative contribution towards the prediction of the disease class. table a : tcrβ sequences that had been discovered by emerson et al. ( ) with their associated attention values by deeprc. these sequences have significantly (p-value . e- ) higher attention values than other sequences. 
the column "quantile" provides the quantile values of the empirical distribution of attention values across all sequences in the dataset.

in this section we investigate the impact of different variations of deeprc on the performance on the cmv dataset. we consider both a cnn-based sequence embedding, as used in the main paper, and an lstm-based sequence embedding. in both cases we vary the number of attention heads and the β parameter of the softmax function in the attention mechanism (see eq. in main paper). for the cnn-based sequence embedding we also vary the number of cnn kernels and the kernel sizes used in the d cnn. for the lstm-based sequence embedding we use a single unidirectional lstm layer, of which the output values at the last sequence position (without padding) are taken as embedding of the sequence. here we vary the number of lstm blocks in the lstm layer. to counter over-fitting due to the increased complexity of these deeprc variations, we added a l weight penalty to the training loss. the factor with which the l weight penalty contributes to the training loss is varied over orders of magnitude, where suitable value ranges were manually determined on one of the training folds beforehand. to reduce the computational effort, we do not consider all numbers of kernels that were considered in the main paper. furthermore, we only compute the auc scores on of the cross-validation folds. the hyperparameters, which were used in a grid search procedure, are listed in tab. a for the cnn-based sequence embedding and tab. a for the lstm-based sequence embedding.

results. we show performance in terms of auc score with single hyperparameters set to fixed values so as to investigate their influence in tab. a for the cnn-based sequence embedding and tab. a for the lstm-based sequence embedding.
we note that due to restricted computational resources this study was conducted with fewer different numbers of cnn kernels, with the auc estimated from only of the cross-validation folds, which leads to a slight decrease of performance in comparison to the full hyperparameter search and cross-validation procedure used in the main paper. as can be seen in tab. a and a , the lstm-based sequence embedding generalizes slightly better than the cnn-based sequence embedding. table a : impact of hyperparameters on deeprc with lstm for sequence encoding. mean ("mean") and standard deviation ("std") for the area under the roc curve over the first folds of a -fold nested cross-validation for different sub-sets of hyperparameters ("sub-set") are shown. the following sub-sets were considered: "full": full grid search over hyperparameters; "beta=*": grid search over hyperparameters with reduction to specific value * of beta value of attention softmax; "heads=*": grid search over hyperparameters with reduction to specific number * of attention heads; "lstms=*": grid search over hyperparameters with reduction to specific number * of lstm blocks for sequence embedding. table a : impact of hyperparameters on deeprc with d cnn for sequence encoding. mean ("mean") and standard deviation ("std") for the area under the roc curve over the first folds of a -fold nested cross-validation for different sub-sets of hyperparameters ("sub-set") are shown. 
the following sub-sets were considered: "full": full grid search over hyperparameters; "beta=*": grid search over hyperparameters with reduction to specific value * of beta value of attention softmax; "heads=*": grid search over hyperparameters with reduction to specific number * of attention heads; "ksize=*": grid search over hyperparameters with reduction to specific kernel size * of d cnn kernels for sequence embedding; "kernels=*": grid search over hyperparameters with reduction to specific number * of d cnn kernels for sequence embedding.
the ellis unit linz, the lit ai lab and the institute for machine learning are supported by the land oberösterreich, lit grants deeptoxgen ( in the following, the appendix to the paper
"modern hopfield networks and attention for immune key: cord- - hbw yo authors: gisbert-muñoz, sandra; quiñones, ileana; amoruso, lucia; timofeeva, polina; geng, shuang; boudelaa, sami; pomposo, iñigo; gil-robles, santiago; carreiras, manuel title: multimap: multilingual picture naming test for mapping eloquent areas during awake surgeries date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hbw yo picture naming tasks are currently the gold standard for identifying and preserving language-related areas during awake brain surgery. with multilingual populations increasing worldwide, patients frequently need to be tested in more than one language. there is still no reliable testing instrument, as the available batteries have been developed for specific languages. heterogeneity in the selection criteria for stimuli leads to differences, for example, in the size, color, image quality, and even names associated with pictures, making direct cross-linguistic comparisons difficult. here we present multimap, a new multilingual picture naming test for mapping eloquent areas during awake brain surgery. recognizing that the distinction between nouns and verbs is necessary for detailed and precise language mapping, multimap consists of a database of standardized color pictures representing both objects and actions. these images have been tested for name agreement with speakers of spanish, basque, catalan, italian, french, english, mandarin chinese, and arabic, and have been controlled for relevant linguistic features in cross-language combinations. the multimap test for objects and verbs represents an alternative to the do monolingual pictorial set currently used in language mapping, providing an open-source, standardized set of up-to-date pictures, where relevant linguistic variables across several languages have been taken into account in picture creation and selection. 
human language is a complex system of communication that supports the decoding, encoding, and transfer of information between individuals. it is a system that allows for communication not only about the here and now, but also about the past, the future, truths, lies, hopes, and desires. it is important for personal growth and socialization, but also for human development as the vehicle for cultural transmission. losing or having impaired language ability can be a traumatic event that incurs hardship for affected individuals and those around them. brain surgery procedures can unintentionally damage the language substrate, inducing impairments that can be irreversible (duffau et al., ) . for this reason, a patient is tested to identify eloquent areas that should not be removed in order to preserve or, in some cases, even improve, their quality of life (ilmberger et al., ) . since the late s, when ojemann and mateer first reported using a visual object naming test during cortical stimulation in awake surgery (ojemann & mateer, ) , the technique has become the gold standard in testing brain lesions involving language-related areas (de witte & mariën, ; miceli, capasso, monti, santini, & talacchi, ) . this language mapping procedure allows for the assessment of different language-related operations (i.e., access, retrieval and production of lexical-semantic information); the patient, presented with a series of drawings or pictures, is asked to name the depicted objects using a noun. although the use of object naming tasks is widespread across surgical teams in many different geographical locations, heterogeneity in the stimuli selection criteria of previous batteries (i.e., differences in picture size, color, image quality, name agreement), in addition to the use of morphologically and typologically different languages across different studies (see supplementary material for a table review), has greatly hindered the comparison and generalization of results. 
to our knowledge, a battery allowing for direct comparison between two languages in awake surgery has never been designed, even if some teams have tested multilingual patients in more than one language (giussani, roux, lubrano, gaini, & bello, ). the critical need for a multilingual approach is evident given reports on bilingual aphasia where, in some cases, patients proficient in two languages prior to the lesion are selectively impaired in one after surgery (fabbro, ). it is also known from the aphasia literature that linguistic variables, such as word frequency and imageability, among others, influence lexical performance (luzzatti et al., ). however, multilingual brain stimulation studies have not reported how they controlled stimuli across languages, despite the fact that this choice of stimuli could influence findings concerning the brain areas that are common or specific to these languages (bello & acerbi, ; cervenka, boatman-reich, ward, franaszczuk, & crone, ; roux, lauwers-cances, trémoulet, mascott, & démonet, ). for these reasons, and given that the multilingual population in our society continues to increase as more and more people know and employ a second language in their daily lives, a multilingual evaluation tool has become necessary (eurostat, ). in this paper, we present multimap, a new multilingual battery of standardized pictures in spanish, with norms for basque, catalan, italian, french, english, mandarin chinese, and arabic. this is a tool that will allow surgical teams to test patients in bilingual contexts in a controlled and comparable manner.
we first conducted a systematic review of the literature on naming tasks for awake surgery (see supplementary material for a table review). out of the articles that reported using an object naming task, included an introductory sentence (e.g. "this is…") printed above the picture to elicit the production of a determiner-noun pair (e.g. "this is… an apple" instead of "apple") (hamberger et al., ; hamberger, seidel, goodman, & mckhann, ; ille et al., ; khan, herbet, moritz-gasser, & duffau, ; lubrano, roux, & démonet, ; moritz-gasser & duffau, ; g. ojemann & mateer, ; roux, borsa, & démonet, ; roux, boukhatem, draper, sacko, & démonet, ; rutten, ). patients are required to overtly produce grammatical information related to the selection of the appropriate determiner, as it encodes number and, in some languages, also gender information (e.g., in spanish "una[f.sg.] manzana[f.sg.]" ["an apple"]). this kind of task allows for the identification of different types of errors, most frequently: speech arrest, in which the patient is unable to speak; and anomia, where the patient can read the introductory phrase, but cannot retrieve and produce the noun. other errors that can be identified using this kind of task are (1) semantic paraphasia if, instead of the target noun, a semantically related word is produced (e.g. "tiger" for "lion"); (2) phonological paraphasia, when the patient produces the target word with phonological deviations (e.g. "fable" for "table"); (3) neologism creation, inventing a new word; or (4) perseveration, if the patient repeats an item, even after a new image is presented. other significant limitations can also be detected using this task, such as delays in producing a response and hesitations.
in addition to the object naming task, some neurosurgery teams have introduced verb tests in their practices. we retrieved studies that reported the use of images to elicit verb production (chen, tan, deng, & xu, ; conner, chen, pieters, & tandon, ; corina et al., ; havas et al., ; herholz et al., ; lubrano, filleron, démonet, & roux, ; j. g. ojemann, ojemann, & lettich, b; papagno et al., ; rofes et al., ; roux et al., ; sierpowska et al., ; skrap, marin, ius, fabbro, & tomasino, ; tomasino et al., ).
in these tasks, patients were presented with a drawing or a picture of an action and they were asked to produce the appropriate verb: either a finite or an infinitive form, depending on the specific requirements of the task. the generation of finite verbs, unlike infinitives, requires the production of inflectional features that, depending on the language, may encode for number, gender, person, and/or time. verbs refer to events and imply the projection of a complex representation in which the agent is associated with a specific thematic role, and they differ from nouns at the lexical, semantic, morphological, and syntactic levels. in addition, the number of morphologically inflected forms is higher for verbs than for nouns. this distinction between nouns and verbs has been demonstrated at the behavioral, electrophysiological, and neuroanatomical levels (vigliocco, vinson, druks, barber, & cappa, ), revealing a double dissociation that should be taken into account when planning an awake surgery. notably, direct cortical stimulation studies also show this double dissociation when object and action naming tasks are used, and allow for the identification of distinct territories in frontal and temporal brain areas in which stimulation selectively impairs verb or noun production (corina et al., ; crepaldi, berlingeri, paulesu, & luzzatti, ; lubrano et al., ; j. g. ojemann, ojemann, & lettich, a; rofes et al., ). from these studies, it seems clear that comparing nouns and verbs is necessary for a detailed and precise language mapping procedure. several instruments have been previously employed in this endeavor. the most common batteries reported for mapping nouns are the oral denomination (metz-lutz, kremin, & deloche, ) (do ); the images included in the boston diagnostic aphasia examination (goodglass & kaplan, ) (bdae); and the pictures from the snodgrass and vanderwart battery (snodgrass & vanderwart, ).
these sets of images were not designed for intraoperative language mapping, and name agreement norms and values for relevant linguistic variables, such as word frequency, length, or familiarity, are not provided or are only available for some of the languages in which they have been employed. in addition, payment is required to access both the do and the bdae, restricting their use. in some other studies using intraoperative object naming, stimuli are only described as "black-and-white drawings", "simple drawings" or just "pictures representing common objects", and no further information is given about their construction or the linguistic variables of the target words. in studies that have carried out language mapping using verbs, some teams have reported using the same pictures for verb action naming that they use in their object naming tasks (papagno et al., ) , while others have presented pictures depicting actions (havas et al., ) . for verbs, as for nouns, only some studies have reported controlling their stimuli for the relevant linguistic variables or have described how these stimuli were constructed and selected. to address these shortcomings and the critical demand for multilingual material, we developed a multilingual picture naming test (multimap) for the mapping of eloquent areas during awake brain surgery. multimap consists of a database of standardized color pictures of common objects and actions, tested for name agreement measures in speakers of spanish, basque, catalan, italian, french, english, mandarin chinese, and arabic. in the multimap object and action naming subtests, items were independently selected for each language and controlled for a number of linguistic variables (i.e. frequency, length, and familiarity). we also equated variables for spanish and each of the other languages separately to facilitate testing in multilingual patients. 
this new battery will help teams plan surgical interventions, providing them with a sensitive, validated instrument for intraoperative language mapping offering better results in terms of the patient's health and overall quality of life. in the following sections, we include a detailed description of the material and its validation in order to facilitate replication and extension to other languages in future research. one hundred and twenty-three healthy adult volunteers aged to years were recruited and paid for the object naming task, and , within the same age range, for the verb naming task. a detailed description of the sample used for each language is included in table (i.e., sample size, ages, gender, educational level, and linguistic profile). they all had normal or corrected to normal vision and no history of psychiatric, neurological diseases, or learning disabilities. a signed informed consent was collected from each participant as stipulated in the ethics approval procedure of the bcbl research ethics committee. all of the participants were from the basque country in spain and were highly proficient spanish-basque bilinguals. centimeters; dpi = pixels/centimeter) and similar styles for the whole set. verbs were depicted with a human agent (see figure for an example). figure : example of stimulus presentation for both object and action naming. an acoustic cue is presented during the fixation cross, ms before the stimulus onset. the fixation cross appears on the screen for s, followed by the target picture which appears for to s, self-paced as determined by the speed of the patient's response. the name agreement data was collected using the lime survey platform phase (validation). once we had the final subset of images, we prepared the images for use in the awake surgery setting. 
above each object, we added the text "esto es…" ["this is…" in spanish] to force participants to produce a short sentence that would have to agree in number and gender with the target noun. above the action pictures, we included a noun phrase to be used as the subject of the sentence, that is either "Él…" or "ella……" ["he…" or "she…" in spanish] depending on the gender of the agent. this introductory text was used as a cue for the production of a sentence that started with the given subject and had a finite verb form in rd person singular. we used matlab version b and cogent toolbox (http://www.vislab.ucl.ac.uk/cogent.php) to present the images, as they would be used in a surgery setting. first, a white screen with a black fixation cross appears for s, followed by a picture presented for s. an acoustic cue is given ms before the onset of each stimulus during the fixation cross (the matlab script and its compiled version are available for use at https://git.bcbl.eu/sgisbert/multimap ). phase (cross-language combinations). seven norming studies were carried out. the first one included the spanish set of stimuli, described in phase , and the other five comprised the different cross-language combinations (i.e., spanish-basque, spanish-catalan, spanish-italian, spanish-french, and spanish-english). for each of these combinations, pictures were tested for name agreement, following the same procedure described in phase , in samples of highly proficient users of the languages. frequency, length, and information about orthographic neighbors were extracted for the words in each language from the following databases: for basque, e-hitz (perea et al., ) arabic, aralex (boudelaa & marslen-wilson, ) . stimuli with at least % name agreement were selected so that nouns and verbs did not show significant differences in frequency, number of letters, picture-name agreement, and h-index, both within and between the two languages of each pair. 
depending on the language, sentence production requires the use of different grammatical devices (e.g., case markings, word order, inflectional morphology, etc.). thus, in addition to orthographical and lexical factors, some syntactic constraints were applied. for all languages, we excluded plural invariant nouns (e.g. "tijeras", scissors) and kept only transitive verbs. for each of the cross-language combinations, we removed cognates. data processing. a native speaker of each language checked the answers for writing/spelling errors, standardized the writing using capital letters, and merged basic variants of the same target word (e.g. hyphenated, pluralized forms). after that, we excluded trials where participants did not know the name or did not recognize the concept from all analyses. name agreement and h-index (snodgrass & vanderwart, ) were then computed for the remaining nouns and verbs. both of these measures reflect the level of agreement across participants: while name agreement represents the percentage of participants who give a certain answer, the h-index reflects response variability in terms of the number of different answers given by participants. given the name agreement data for each language, we extracted items with a score of at least %. with these items, we first created a list for each language, selecting the same number of nouns and verbs, and ensuring that there were no significant differences across the linguistic values we controlled for (i.e. frequency, length, orthographic neighbors) using unpaired two-sample t-tests. next, we paired each of the languages with spanish to create the bilingual tasks.
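the two agreement measures described above can be sketched in a few lines. this is a minimal illustration, not the authors' processing script: it assumes the h statistic in its usual snodgrass and vanderwart form, H = Σ p_i · log2(1/p_i), where p_i is the proportion of participants giving the i-th distinct name, and it assumes that "don't know" trials have already been removed. the function name `name_agreement_stats` is hypothetical.

```python
import math
from collections import Counter


def name_agreement_stats(responses):
    """Return (modal name, percent name agreement, H) for one picture.

    responses : list of cleaned naming responses (strings), one per
                participant, with unknown/unrecognized trials excluded.
    """
    if not responses:
        raise ValueError("at least one valid response is required")
    # Standardize case so that spelling variants in upper/lower case merge.
    counts = Counter(r.strip().upper() for r in responses)
    n = sum(counts.values())
    modal_name, modal_count = counts.most_common(1)[0]
    # Name agreement: share of participants producing the most frequent name.
    agreement = 100.0 * modal_count / n
    # H statistic: 0 when everyone agrees, larger with more varied responses.
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return modal_name, agreement, h
```

for example, eight "manzana" responses and two "fruta" responses give 80 % name agreement for "MANZANA" and an h value of about 0.72, while ten identical responses give 100 % and h = 0.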
here, we selected new items from the % name agreement list to create lists of the same length between languages, again controlled for frequency, h-index, length, and orthographic neighbors (except for the combinations spanish-mandarin chinese, where length and orthographic neighbors were not considered, and spanish-arabic, where orthographic neighbors were not considered). we calculated anovas between the four lists (objects language x objects language x verbs language x verbs language ) to check that there were no significant differences in the relevant variables.
we tested pictures of objects and pictures of actions in a spanish speaking sample. from this picture pool with a constraint of at least % name agreement, we selected objects and verbs that showed no significant differences in name agreement, frequency per million, number of letters, number of phonemes, number of syllables, and familiarity (see table for an overview of the variables and t-test results comparing objects and verbs within languages). the values for imageability and concreteness showed statistically significant differences, with objects higher than verbs in both cases. imageability can be defined as the ease with which a word brings to mind a sensory image, while concreteness is the property of being able to see, hear and touch something (bird, howard, & franklin, ). verbs as a class are by definition less imageable and concrete than objects, given that objects that can be drawn have stronger sensory features than action verbs, which are primarily functional and motoric (bird, howard, & franklin, ). phase (cross-language combinations). for bilingual lists, objects and verbs in both languages were always equated for name agreement (higher than % in all cases) and controlled so that linguistic variables did not show significant differences (see table
language is the main vehicle humans use to communicate and transfer information.
groups of neurons arranged in networks support language functions. any modification to this system (e.g., brain lesions, epilepsy seizure, etc.) may irreversibly impair language capacity, leading to unexpected and problematic consequences for the affected individual. for this reason, brain surgeries are increasingly planned in an awake setting, making it possible to monitor patients' language function and spare brain tissue that is indispensable (de witte & mariën, ) . there is a critical need for specific tools, based on linguistic, neuroscientific, and clinical knowledge about how the brain decodes and encodes linguistic information, that can facilitate precise mapping of these eloquent regions during neurosurgery. to address this need, we developed the multimap test, a multilingual picture naming task including both objects and actions for mapping eloquent areas during awake brain surgery. images included in the multimap test are colored drawings of objects and actions that have in spite of empirical evidence demonstrating neuroanatomical distinctions for nouns and verbs (vigliocco et al., ) and native and second/third languages (giussani et al., ) , there is no available material that tackles both of these issues at the same time. in this regard, multimap constitutes the first tool designed to explore these factors in a structured way, in order to identify and preserve eloquent brain tissue, and to obtain better results in terms of patients' health and overall quality of life. multimap includes two separate tests, one for objects and one for actions to address the noun/verb issue; both tests are controlled so that there are no significant differences for linguistic variables such as frequency, length, and orthographic neighbors. 
the need to map both objects and actions is motivated by evidence of a double dissociation, demonstrated at the behavioral, electrophysiological, and neuroanatomical levels (vigliocco et al., ) and, as pointed out in the introduction, also reported in direct cortical stimulation studies. from the reviewed studies where a picture naming task had been used in the context of awake brain surgery, we found only that included a verb task in addition to object naming. although these tasks varied in their requirements and the nature of the stimuli they employed, they all identified distinct territories, mainly in frontal and temporal brain areas, where stimulation impaired verb and noun production separately (corina et al., ; crepaldi et al., ; lubrano et al., ; ojemann et al., a; rofes et al., ). moreover, our tasks include an extra level of complexity beyond the extraction of morphosyntactic information, as the production of target objects and actions has been embedded in simple sentences, such as "this is a house" for objects, and "he/she sings" in the case of actions. this entails a higher level of complexity since the generation of such sentences requires the projection of representations in which thematic roles are assigned to different elements in the sentence (i.e., he[agent] sings), in addition to matching the target word to its real-world referent. therefore, combining object and verb processing tasks at the sentence level ensures a more accurate and thorough mapping of language functions, helping to identify and preserve the neural linguistic substrate essential for a patient's quality of life. the second improvement offered by multimap is its multilingual nature. neuroimaging studies in bilinguals suggest that there is a common cerebral organization across languages, but also describe activations specific to each of the languages (kim, relkin, lee, & hirsch, ; marian, spivey, & hirsch, ; rueckl et al., ).
direct electrostimulation studies have also revealed language-specific areas (giussani et al., ). this study concluded that multilingual patients should be tested in all languages in which they are fluent during brain mapping procedures so as to avoid selective or preferential impairments. with this objective in mind, the images included in multimap were tested in seven languages taking into account the lexical and morphological features of each language. this resulted in seven separate sets of object and action pictures with at least % name agreement in their target language, accounting for relevant linguistic variables like frequency and word length. these sets can be combined in controlled bilingual sets, enabling researchers from different countries to use the same materials, and to compare results not only from monolingual samples but also in cross-linguistic research on multilingual patients. it will also play a role in the postoperative quality of life for multilingual patients, as these materials will facilitate the identification and preservation of areas where interference impairs only one of their languages (giussani et al., ), areas that might not be detected by monolingual tests.
esploracolfis: un'interfaccia web per le ricerche sul corpus e lessico di frequenza dell'italiano scritto (colfis) intraoperative language localization in multilingual patients with gliomas why is a verb like an inanimate object? grammatical category and semantic category deficits verbs and nouns: the importance of being imageable aralex: a lexical database for modern standard arabic language mapping in multilingual patients: electrocorticography and cortical stimulation during naming syntactic prediction in sentence reading: evidence from eye movements category specific spatial dissociations of parallel processes underlying visual naming dissociation of action and object naming: evidence from cortical stimulation mapping a place for nouns and a place for verbs?
a critical review of neurocognitive data on grammatical-class effects the neurolinguistic approach to awake surgery reviewed espal: one-stop shopping for spanish word properties contribution of intraoperative electrical stimulations in surgery of low grade gliomas: a comparative study between two series without ( - ) and with ( - ) functional mapping in the same institution foreign language skills statistics the bilingual brain: cerebral representation of languages review of language organisation in bilingual patients: what can we learn from direct brain mapping? the assessment of aphasia and related disorders nim: a web-based swiss army knife to select stimuli for psycholinguistic studies functional differences among stimulation-identified cortical naming sites in the temporal region does cortical mapping protect naming if surgery includes hippocampal resection? electrical stimulation mapping of nouns and verbs in broca's area preoperative activation and intraoperative stimulation of languagerelated areas in patients with glioma combined noninvasive language mapping by navigated transcranial magnetic stimulation and functional mri and its comparison with direct cortical stimulation intraoperative mapping of language functions: a longitudinal neurolinguistic analysis the role of left inferior frontooccipital fascicle in verbal perseveration: a brain electrostimulation mapping study distinct cortical areas associated with native and second languages anatomical correlates for category-specific naming of objects and actions: a brain stimulation mapping study writing-specific sites in frontal areas: a cortical stimulation study verb-noun double dissociation in aphasic lexical impairments: the role of word frequency and imageability shared and separate systems in bilingual language processing: converging evidence from eyetracking and brain imaging standardisation d'un test de dénomination orale: contrôle des effets de l'âge, du sexe et du niveau de scolarité chez les 
sujets adultes normaux language testing in brain tumor patients evidence of a large-scale network underlying language switching: a brain stimulation study human language cortex: localization of memory, syntax, and sequential motor-phoneme identification systems cortical stimulation mapping of language cortex by using a verb generation task: effects of learning and comparison to mapping based on object naming cortical stimulation mapping of language cortex by using a verb generation task: effects of learning and comparison to mapping based on object naming what is the role of the uncinate fasciculus? surgical removal and proper name retrieval e-hitz: a word frequency list and a program for deriving psycholinguistic statistics in an agglutinative language (basque) mapping nouns and finite verbs in left hemisphere tumors: a direct electrical stimulation study the mute who can sing": a cortical stimulation study on singing cortical calculation localization using electrostimulation language functional magnetic resonance imaging in preoperative assessment of language areas: correlation with direct cortical stimulation intraoperative mapping of cortical areas involved in reading in mono-and bilingual patients universal brain signature of proficient reading: evidence from four contrasting languages speech hastening during electrical stimulation of left premotor cortex the glasgow norms: ratings of , words on nine scales morphological derivation overflow as a result of disruption of the left frontal aslant white matter tract brain mapping: a novel intraoperative neuropsychological approach a standardized set of pictures: norms for name agreement, image agreement, familiarity, and visual complexity involuntary switching into the native language induced by electrocortical stimulation of the superior temporal gyrus: a multimodal mapping study nouns and verbs in the brain: a review of behavioural, electrophysiological, neuropsychological and imaging studies we would like to 
thank the bcbl lab department for data recordings and magda altman for her useful comments on the manuscript. key: cord- -ys i authors: xu, xinhui; luo, tao; gao, jinliang; lin, na; li, weiwei; xia, xinyi; wang, jinke title: crispr-assisted dna detection, a novel dcas -based dna detection technique date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ys i nucleic acid detection techniques are critical to diagnosis, especially against the background of the present covid- pandemic. simple and rapid detection techniques with high sensitivity and specificity are urgently needed. however, current nucleic acid detection techniques are still limited to traditional amplification and hybridization. to overcome this limitation, we here develop a crispr/cas -assisted dna detection (cadd). in this detection, a dna sample is incubated with a pair of capture sgrnas (sgrnaa and sgrnab) specific to a target dna, dcas , a signal readout-related probe, and an oligo-coated solid support (beads or microplate) at room temperature for min. during this incubation, the dcas -sgrna-dna complex is formed and captured on the solid support by the capture sequence of sgrnaa, and the signal readout-related probe is captured by the capture sequence of sgrnab. finally, the detection result is reported by a fluorescent or colorimetric signal readout. this detection was verified by detecting dna of bacteria, cancer cells and viruses. in particular, by designing a set of sgrnas specific to high-risk human papillomaviruses (hpvs), hpv infection in clinical cervical samples was successfully detected by the method. all detections can be finished in minutes at room temperature. this detection holds promise for rapid on-the-spot detection or point-of-care testing (poct). 
with collateral cleavage activity of single-stranded dna (ssdna) was used to develop detectr (dna endonuclease targeted crispr trans reporter) ( , ) and holmes (a one-hour low-cost multipurpose highly efficient system) ( , ). soon afterwards, holmesv (holmes . ) and cdetection were developed by using the collateral ssdna cleavage activity of cas b (also known as c c ) ( , ). these methods have been demonstrated to have attomolar sensitivity and single-base specificity. their ultrahigh sensitivity partially relies on the signal amplification resulting from the collateral cleavage activity of the related cas enzymes. these methods have therefore been used to detect various pathogens such as zika virus (zikv) ( , ), dengue virus (denv) ( ), human papillomavirus (hpv) ( ), japanese encephalitis virus (jev) ( ), african swine fever virus (asfv) ( ), and mycobacterium tuberculosis (mtb) ( ). recently, these techniques have also been adapted to detect sars-cov- ( ) ( ) ( ) ( ) ( ) ( ). however, these crispr/cas-based detection methods typically require dna pre-amplification with polymerase chain reaction (pcr) ( ), reverse transcription-rpa (rt-rpa) ( ), recombinase polymerase amplification (rpa) ( , , , , ), loop-mediated isothermal amplification (lamp) ( , ), reverse transcription-lamp (rt-lamp) ( , ), or asymmetric pcr ( ). these pre-amplifications also partially contribute to the ultrahigh sensitivity of these methods. additionally, sherlock needs reverse transcription and t in vitro transcription when detecting rna ( ). this dependence on amplification makes it difficult to turn these methods into simple, rapid and portable diagnostic tools. in this study, we develop a new dna detection method based on crispr/cas , named crispr/cas -assisted dna detection (cadd). in this method, a pair of capture sgrnas (sgrnaa and sgrnab) is designed for a target dna. sgrnaa and sgrnab each harbor a different ′ terminal capture sequence. 
when target dna is bound by a pair of dcas -sgrna complexes, the dcas -sgrna-dna complex is captured on the surface of beads or a microplate via annealing between an oligonucleotide coupled to the solid support and the capture sequence of sgrnaa. the captured dcas -sgrna-dna complex is then reported by a signal reporter that is captured by the capture sequence of sgrnab. this method was validated by detecting dna of bacteria, cancer cells and viruses. in particular, by designing a set of sgrnas specific to high-risk human papillomaviruses (hpvs), this study successfully detected hpv in clinical cervical samples with this method. importantly, a target dna can be rapidly detected in less than minutes using a fluorescent hybridization chain reaction (hcr) as signal readout. compared with current methods, this method has unique advantages: it requires no enzyme, no pre-amplification and no modular heaters, and it is simple and rapid. sgrna was designed with the online sgrna design software chopchop (http://chopchop.cbu.uib.no/), using hg as the reference genome. the designed sgrnas are shown in table s . each dna target had a pair of sgrnas, named sgrnaa and sgrnab, respectively. according to the designed sgrna, primers (table s ) were synthesized to amplify the sgrna template by pcr using our previous protocol ( ). the pcr-amplified sgrna template has a t promoter sequence. sgrna was then prepared by in vitro transcription using the sgrna template as previously described ( ). the prepared sgrna had a ′-end bp target dna-specific sequence and a ′-end capture sequence. sgrnaa had a ′-end capture sequence, named capture , that was used to anneal with the capture oligonucleotide immobilized on the surface of the magnetic beads or microplate. sgrnab had a ′-end capture sequence, named capture , that was used to anneal with a signal reporter-associated oligonucleotide. 
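the sgrna layout described above (a target-specific sequence at one end, a capture sequence at the other, and a support-bound oligonucleotide that anneals with the capture sequence) can be sketched in a few lines of python. this is a minimal illustration only: the function names and every sequence below are hypothetical placeholders, not the sequences or software used in the study, and the scaffold is a made-up stand-in.

```python
# Sketch of the capture-sgRNA layout. All sequences are made-up placeholders
# (hypothetical), not sequences from the study.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def build_capture_sgrna_template(spacer: str, scaffold: str, capture: str) -> str:
    """Join the target-specific spacer, the sgRNA scaffold, and the capture tail.

    The result is written in DNA letters; transcription would replace T with U.
    """
    return (spacer + scaffold + capture).upper()


def capture_oligo_for(capture: str) -> str:
    """Oligo able to anneal with a capture tail: its reverse complement."""
    return reverse_complement(capture)


if __name__ == "__main__":
    spacer = "ACGTACGTACGTACGTACGT"  # hypothetical target-specific sequence
    scaffold = "GGGGCCCCAA"          # hypothetical stand-in for the sgRNA scaffold
    capture_a = "AACCGGTTAC"         # hypothetical capture tail for sgRNAa
    print(build_capture_sgrna_template(spacer, scaffold, capture_a))
    print(capture_oligo_for(capture_a))
```

in this sketch the bead- or microplate-bound oligonucleotide would be `capture_oligo_for(capture_a)`, and a second, different capture tail would be used for sgrnab so that the reporter and the support never compete for the same sequence.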
the full-length l fragments of high-risk hpv (hrhpv) were cloned in the pmd plasmid (takara) to prepare hpv plasmids, including: pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , pmd-hpv , and pmd-hpv . oligonucleotides fam-hairpin- and fam-hairpin- (table s ) (table s ) or sgrna. the reaction was kept at rt for min and detected by agarose gel electrophoresis. a biotin-modified oligo re-flanking (table s ) all procedures used in this research were performed according to the declaration of helsinki. this study was approved by the ethics committee of jinling hospital (nanjing, china). all participants were recruited from the jinling hospital with informed consent. the clinical hpv detection was performed by jinling hospital (nanjing, china) using a human papillomavirus genotyping (type ) detection kit (pcr-reverse dot hybridization method) (asia energy biotechnology, shenzhen). the gdna extraction and hpv detection (a pcr-reverse dot hybridization method) were all performed with this kit. dna was first tested by hospital and then the left dna was brought to our laboratory. two batches of clinical dna samples were detected by cadd. the beads-hcr detection reaction ( µl) m fam-hairpin- and × binding buffer) and incubated at rt for min in rotation. the beads were then dropped on slide glass and covered with a cover glass. the beads were imaged with a fluorescence microscope. the beads images were analyzed with image pro. single-target beads-elisa detection reaction ( µl) contained × binding buffer, nm sgrnaa, nm sgrnab, nm dcas protein, - × beads@oligo, . µm oligo re-biotin (table s ) beyotime). the beads were incubated at rt for min. the microplate was read at nm and imaged with a biorad gel imager in staining-free mode. an amino-modified oligo re-nh ( ; where n is the number of targets detected. the reaction was added to the oligo-coated microplate. 
the microplate was incubated at rt for min on a horizontal mixer and then washed three times with the washing buffer. washing buffer ( µl) containing ng of hrp-conjugated streptavidin was then added to the microplate, which was incubated at rt for min. the microplate was then washed three times with the washing buffer, after which µl of washing buffer and µl of tmb chromogenic solution for elisa (beyotime) were added. the microplate was incubated for min. the microplate was read at nm and imaged with a biorad gel imager in staining-free mode. to explore the feasibility of cadd, we first designed a beads-hcr method (fig. a), in which the fluorescent hybridization chain reaction (hcr) was used as signal readout. a pair of sgrnas was designed for a target dna. unlike a traditional sgrna, the sgrna used by cadd was designed to have a short extended ′ terminal capture sequence that can anneal with other functional oligonucleotides. in detection, a pair of dcas -sgrna complexes (dcas -sgrnaa and dcas -sgrnab) first binds the target dna. the dcas -sgrna-dna complex is then captured onto the surface of beads via annealing between the capture sequence on sgrnaa and a complementary oligonucleotide coupled to the beads. the beads are then washed on a magnet and two hcr hairpins (hairpin and hairpin ) are added. the capture sequence of sgrnab can anneal with hairpin to initiate hcr. because the hairpins are labeled with fluorescein, a fluorescent signal is produced on the surface of the beads by hcr. to investigate whether the designed hcr is feasible, we tested the prepared hairpin and hairpin in liquid-phase hcrs with sgrnaa plus sgrnab, sgrnaa, sgrnab, and initiator oligo, respectively. the results show that hcr was only initiated by sgrnab (fig. s ), indicating that the capture sequence of sgrnab annealed with hairpin . the oligo initiator (table s ) is a positive control that can also anneal with hairpin to initiate hcr (fig. s ). 
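the beads-hcr scheme described above needs two events on the same dna molecule: capture on the support through sgrnaa and reporter recruitment through sgrnab. a minimal sketch of that two-site logic follows. the function names are hypothetical, the probabilities are purely illustrative, and treating the two off-target binding events as independent is an assumption made here for illustration, not a result from the study.

```python
def cadd_signal(binds_sgrna_a: bool, binds_sgrna_b: bool) -> bool:
    """A molecule gives signal only if it is captured (sgRNAa) AND reported (sgRNAb)."""
    return binds_sgrna_a and binds_sgrna_b


def joint_off_target_probability(p_a: float, p_b: float) -> float:
    """Chance that both complexes bind the same non-target molecule.

    Assumes the two binding events are independent (an illustrative assumption).
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_a * p_b


if __name__ == "__main__":
    # A molecule bound only by dCas-sgRNAa is captured but carries no reporter.
    print(cadd_signal(True, False))
    # Illustrative off-target rates only; not measured values.
    print(joint_off_target_probability(0.01, 0.02))
```

under the independence assumption the joint off-target probability is never larger than either single-site probability, which is the intuition behind requiring a pair of sgrnas rather than one.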
with the reliable hairpins and sgrnab, we first detected the hpv dna (pmd-hpv ) with beads-hcr cadd using sgrnas targeting hpv (sgrna ). the results show that pmd-hpv can be quantitatively detected by the method (figs. b and c). to investigate the specificity of beads-hcr detection, we then detected pmd-hpv and pmd-hpv with beads-hcr cadd using sgrna , sgrna , and sgrnact, respectively. sgrnact is an equimolar mixture of sgrnas of hrhpvs. the results show that the two genotypes of hpvs can be specifically detected by the method (figs. d and e ). to further explore the feasibility and specificity of in order to investigate whether beads-hcr cadd can be used to detect hpv dna in human gdna, we next detected gdnas of three cervical cancer cell lines using sgrna , sgrna and sgrnact. hela and siha cells are known to carry hpv and hpv , respectively, and c- a is known to be free of hpv infection. the results indicate that the hela gdna was detected by sgrna and sgrnact, the siha gdna was detected by sgrna and sgrnact, and c- a gdna was not detected by any sgrna (fig. s ). these results indicate that beads-hcr cadd is suitable for detecting dna samples more complex than hpv plasmids. to investigate whether beads-hcr cadd can be used to detect clinical samples, we then detected clinical dna samples with sgrnasp of hrhpvs and sgrnact. the results are shown in fig. and supplementary file . we compared the hpv detection results of these clinical samples obtained by beads-hcr cadd with those of the pcr-reverse dot hybridization method performed by jinling hospital (fig. c ). in comparison with the hospital tests, the hrhpv infection (yes or no) and genotype were accurately detected by beads-hcr cadd with % sensitivity and specificity. importantly, beads-hcr cadd also found multiple infections in samples , , and that were not detected by the pcr-reverse dot hybridization (fig. c ). these multiple infections were confirmed by a pcr re-detection (pcr-rd) (fig. 
s ), in which hpv and hpv infection were detected by pcr using primers specific to the two hpvs (supplementary method and table s ). these results indicate that beads-hcr cadd can be used to detect hrhpv infections in clinical samples with high sensitivity and specificity. finally, to investigate whether dna other than virus dna could be detected by beads-hcr cadd, we also detected two other types of dna. one is bacterial dna and the other is human oncogenic dna (supplementary methods). sgrnas targeting t rna polymerase dna and the oncogenic telomerase reverse transcriptase (tert) promoter were designed (table s ). the results indicate that the t rna polymerase dna fragment could be quantitatively detected by beads-hcr ( fig. s a and s b ). in addition, the subsequent detection of gdna from two different e. coli strains (dh α and bl ) indicates that the t rna polymerase dna in bl gdna could also be specifically and quantitatively detected by beads-hcr cadd ( fig. s c and s d ). the dh α gdna, which contains no t rna polymerase dna, did not produce a fluorescence signal even at the highest amount ( fig. s c and s d ). the detection of tert promoter dna indicates that the oncogenic tert promoter can be specifically detected by beads-hcr cadd using sgrnas targeting the mutated tert promoter (fig. s ). the mutant tert promoter causes expression of telomerase, which results in malignant cell proliferation in more than % of cancers. it should be noted that the wild-type and mutant tert promoters differ by only one base (fig. s ), indicating that cadd has high specificity and can discriminate single nucleotide polymorphisms (snps). because beads-hcr cadd depends on a fluorescence microscope, we next sought to realize a cadd with visual readout. we therefore designed a beads-elisa form of cadd (fig. a). 
in this format of cadd, after dcas -sgrna binds target dna and the dcas -sgrna-dna complex is captured on beads surface via sgrnaa, a biotinylated oligonucleotide is captured by sgrnab. the hrp-labeled streptavidin is then associated with biotin. finally a soluble chromogenic substrate tmb is used to develop color signal. the detection results can thus be read either qualitatively with naked eyes or quantitatively with microplate reader. as a pilot assay, we first detected pmd-hpv with beads-elisa cadd. the results indicate that pmd-hpv can be quantitatively detected by this method (fig. b- d ). in this assay, pmd was also used as a negative control. it cannot produce color even at the highest concentration ( pm). we then detected hrhpvs with this method using various sgrnasp and sgrnact. the results indicate that each hrhpv were specifically detected by its cognate sgrnasp and sgrnact ( fig. a and b ). the negative control pmd was not detected by any sgrna. to check if this method can detect multiple infections, we subsequently detected seven mixtures of two different hrhpvs using various sgrnasp and sgrnact. the results reveal that the simulated multiple infections were specifically detected by this method (figs. c and d ). to further check the detection specificity, we finally detected several hrhpvs and clinical dna samples with this method. the results demonstrate that each hrhpv was detected by its cognate sgrnasp and sgrnact (fig. e) ; however, all single or mixed clinical dna samples were not detected by any sgrna (fig. e) . because the clinical dna samples were selected from the beads-hcr cadd-detected samples, we focused on investigating if false positive can be produced by sgrna when detecting clinical samples. the results reveal that all selected clinical samples were not detected by any sgrna, further indicating the high specificity of this method. 
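the agreement between cadd and the hospital pcr-reverse dot hybridization test is reported above as sensitivity and specificity. a minimal sketch of how those two figures are computed from paired per-sample calls follows. the function name and the example calls are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    reference: Sequence[bool], test: Sequence[bool]
) -> Tuple[float, float]:
    """Compare per-sample calls of a new test against a reference test.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same length")
    pairs = list(zip(reference, test))
    tp = sum(1 for ref, obs in pairs if ref and obs)
    fn = sum(1 for ref, obs in pairs if ref and not obs)
    tn = sum(1 for ref, obs in pairs if not ref and not obs)
    fp = sum(1 for ref, obs in pairs if not ref and obs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference call")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Made-up calls for four samples (True = HPV detected); not study data.
    reference_calls = [True, True, False, False]
    new_test_calls = [True, False, False, True]
    print(sensitivity_specificity(reference_calls, new_test_calls))
```

note that a call made by the new test but missed by the reference counts as a false positive in this calculation unless the reference is corrected, which is why the additional multiple infections found by cadd were checked by an independent pcr re-detection.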
to further simplify the beads-elisa cadd and also try a new solid support other than beads, we designed a microplate-elisa form of cadd (fig. a) . in this method, a capture oligonucleotide is coupled in microplate. the dcas -sgrna-dna complex will be captured in microplate through annealing between the '-end capture sequence of sgrnaa and the capture oligonucleotide. the signal reporting system is completely the same as the beads-elisa cadd. to verify this method, we first detected pmd-hpv using sgrna . the results indicate that pmd-hpv can be quantitatively detected by this method (figs. b and c) . the negative control pmd did not generate a signal at the highest concentration (figs. b and c). we then detected new clinical samples with this method using various sgrnasp and sgrnact. the results reveal that all samples were detected by this method (fig. a and b ). in comparison with the results obtained by hospital tests using the pcr-reverse dot hybridization, the microplate-elisa cadd accurately detected the infections of hrhpvs (yes or no) and genotypes with % sensitivity and specificity. additionally, the microplate-elisa cadd also identified more multiple infections in samples , , , and . to confirm the test results, we re-detected the five samples with microplate-elisa. the results confirm that the microplate-elisa cadd can more accurately identify multiple infections in clinical samples than the current method used in clinics ( fig. d and e ). in this study, we designed a new crispr/cas -based dna detection technique and validated it by detecting three types of dna including bacteria dna, human cancer cell dna and virus dna. we also verified three forms of cadd method by detecting hrhpv plasmids and as many clinical cervical samples. these investigations demonstrate the feasibility and reliability of cadd method. in comparison, cadd has its unique advantages over the current crispr-based nucleic acid detection methods. 
unlike the widely known crispr-based nucleic acid detection methods, which mainly rely on cas enzymes with collateral cleavage activity such as cas a, cas a, cas b, and cas , we developed a new dna detection method based on the most widely used cas protein, crispr/cas , which has no collateral cleavage activity. in contrast to all current crispr-based methods, cadd relies on neither the specific enzymatic activity nor the non-specific collateral cleavage activity of crispr/cas proteins. therefore, cadd is mechanistically distinct from the current crispr-based methods (such as sherlock, detectr, and holmes). cadd employs dcas , which has no enzymatic activity, and uses the dcas -sgrna as a dna-binding complex with high sequence specificity. this study indicates that cadd can be used to detect and type various target dna molecules with high simplicity, sensitivity and specificity. although the current crispr-based methods are reported to have ultrahigh detection sensitivity (am), they are all burdened by tedious pre-treatment of dna/rna samples, including amplification by rpa, lamp, pcr or asymmetric pcr and in vitro transcription. these pre-amplifications not only complicate the detection process and increase the detection time and cost, but may also increase false negatives, because pre-amplification with various dna polymerases may introduce mutations into the dna that is then enzymatically detected by cas proteins. importantly, amplicon spread and contamination are a serious issue at any detection site. cadd requires no pre-treatment of the dna sample. beads-hcr cadd is an enzyme-free method. beads/microplate-elisa cadd needs only one widely used enzyme, hrp, to develop the signal. all three forms of cadd can be performed without any modular heaters. another advantage of cadd is that it avoids traditional hybridization at high temperature. the whole cadd process is carried out at room temperature. 
the time-consuming and instrument-dependent hybridization and amplification steps are key limitations to the on-the-spot application of current nucleic acid tests. because it avoids them, cadd should find wide application in future dna detection. the cadd signal readout is versatile. in this study, two forms of cadd signal readout were verified: fluorescent and colorimetric. the beads-hcr cadd uses a fluorescent readout by employing fluorescently labeled hcr hairpins. the beads/microplate-elisa cadd uses a colorimetric readout, in which the tmb color development catalyzed by streptavidin-coupled hrp serves as a visual readout. the beads/microplate-elisa cadd can therefore be read qualitatively without an instrument. in addition, the microplate-elisa assay allows automatic measurement of hundreds of samples on a standard plate reader in a high-throughput format. the whole detection process of beads-hcr and microplate-elisa can be finished in minutes, holding promise for rapid on-the-spot detection or point-of-care testing (poct). other forms of cadd signal readout, such as lateral flow readout, nano-gold colorimetric assay, and molecular beacon-hcr, could be realized with a few changes to the solid support and signal reporter. the high specificity of cadd is closely related to the high sequence specificity of dcas -sgrna as a dna-binding complex. additionally, the high specificity of cadd also depends on its unique detection mechanism, in which a pair of sgrnas jointly determines the final detection result, providing a double-insurance sequence-specific detection. this overcomes the potential false positive results arising from off-target binding of a single dcas -sgrna. this problem still challenges all current crispr-based methods including sherlock, detectr and holmes, in which target cutting at only one site can activate the collateral cleavage activity. 
due to the signal amplification produced by collateral cleavage activity, detections based on such one-site activation may be prone to false positives. hpv is a double-stranded dna virus that is closely related to the pathogenesis of cervical cancer, anal cancer and other cancers ( ) . there are about different types of hpv. according to different carcinogenic capabilities, hpv is divided into high-risk hpv (hrhpv) and low-risk hpv (lrhpv). the most common hrhpvs in the world are hpv and hpv , which cause about % of cervical cancers ( , we fig. s . evaluation of hcr reaction. fig. s . detection of gdna of cervical cancer cell lines using beads-hcr cadd. fig. s . detection of hpv in samples and and hpv in sample using specific pcr amplification. fig. s . detection of t rna polymerase dna using beads-hcr cadd. fig. s . detection of tert promoter dna using beads-hcr cadd. [figure panel labels: sgrnact and genotype-specific sgrnas tested against the pmd-hpv plasmids, and against pmd and pairwise mixtures of pmd-hpv plasmids] the unusual origin of the polymerase chain-reaction basic principles of real-time quantitative pcr advances in digital polymerase chain reaction (dpcr) and its emerging biomedical applications isothermal amplification of nucleic acids review: a comprehensive summary of a decade development of the recombinase polymerase amplification crispr/cas systems towards next-generation biosensing nucleic acid detection using crispr/cas biosensing technologies crispr provides acquired resistance against viruses in prokaryotes rna-programmed genome editing in human cells multiplex genome engineering using crispr/cas 
systems a programmable dual-rna-guided dna endonuclease in adaptive bacterial immunity crispr-cas systems for editing, regulating and targeting genomes rapid, low-cost detection of zika virus using programmable biomolecular components crispr-typing pcr (ctpcr), a new cas -based dna detection method detecting and typing target dna crispr-typing pcr (ctpcr) technique detection of target dna with a novel cas /sgrnas-associated reverse pcr (carp) technique contamination-free loop-mediated isothermal amplification based on the crispr/cas cleavage flash: a next-generation crispr diagnostic for multiplexed detection of antimicrobial resistance sequences an rna-guided cas nickase-based method for universal isothermal dna amplification a crispr-cas -triggered strand displacement amplification method for ultrasensitive dna detection detection of unamplified target genes via crispr-cas immobilized on a graphene field-effect transistor paired design of dcas as a systematic platform for the detection of featured nucleic acid sequences in pathogenic strains next-generation diagnostics with crispr nucleic acid detection with crispr-cas a/c c field-deployable viral diagnostics using crispr-cas multiplexed and portable nucleic acid detection platform with cas , cas a, and csm a crispr-based assay for the detection of opportunistic infections post-transplantation and for the monitoring of transplant rejection crispr-cas a target binding unleashes indiscriminate single-stranded dnase activity programmed dna destruction by miniature crispr-cas enzymes crispr-cas a has both cisand trans-cleavage activities on single-stranded dna crispr-cas a-assisted nucleic acid detection holmesv : a crispr-cas b-assisted platform for nucleic acid detection and dna methylation quantitation crispr-cas b-based dna detection with sub-attomolar sensitivity and single-base specificity rapid detection of african swine fever virus using cas a-based portable paper diagnostics crispr-based rapid and 
ultra-sensitive diagnostic test for mycobacterium tuberculosis crispr-cas -based detection of sars-cov- a scalable, easy-to-deploy, protocol for cas -based detection of sars-cov- genetic material all-in-one dual crispr-cas a (aiod-crispr) assay: a case for rapid, ultrasensitive and visual detection of novel coronavirus sars-cov- and hiv virus point-of-care testing for covid- using sherlock diagnostics. biorxiv a protocol for detection of covid- using crispr diagnostics an ultrasensitive, rapid, and portable coronavirus sars-cov- sequence detection method based on crispr-cas . biorxiv detection of target dna with a novel cas /sgrnas-associated reverse pcr (carp) technique commercially available molecular tests for human papillomaviruses: a global overview role of human papillomavirus in penile carcinomas worldwide a review of methods for detect human papillomavirus infection key: cord- - njndob authors: broggi, achille; ghosh, sreya; sposito, benedetta; spreafico, roberto; balzarini, fabio; lo cascio, antonino; clementi, nicola; de santis, maria; mancini, nicasio; granucci, francesca; zanoni, ivan title: type iii interferons disrupt the lung epithelial barrier upon viral recognition date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: njndob lower respiratory tract infections are a leading cause of mortality driven by infectious agents. rna viruses such as influenza virus, respiratory syncytial virus and the new pandemic coronavirus sars-cov- can be highly pathogenic. clinical and experimental evidence indicate that most severe and lethal cases do not depend on the viral burden and are, instead, characterized by an aberrant immune response. in this work we assessed how the innate immune response contributes to the pathogenesis of rna virus infections. we demonstrate that type iii interferons produced by dendritic cells in the lung in response to viral recognition cause barrier damage and compromise the host tissue tolerance. 
in particular, type iii interferons inhibit tissue repair and lung epithelial cell proliferation, causing susceptibility to lethal bacterial superinfections. overall, our data give a strong mandate to rethink the pathophysiological roles of this group of interferons and their possible use in clinical practice against endemic as well as emerging viral infections. efficiently than mice that bear wt stromal cells, and phenocopy ifnlr -/- → ifnlr -/- chimeras (fig. g). these data further support the direct activity of ifn-λ on epithelial cells. interestingly, the gene that was most downregulated in ifnlr -/- epithelial cells compared to wt cells is the e ubiquitin-protein ligase makorin- (mkrn ) (fig. a), which controls p and p stability by favoring their proteasomal degradation ( ). under conditions of oxidative stress and dna damage, a hallmark of severe viral infections ( ), p is stabilized by phosphorylation, and p degradation, mediated by mkrn , favors apoptosis over dna repair ( ). indeed, ifnlr -/- epithelial cells, which express lower levels of mkrn , have elevated levels of p as measured by flow cytometry (fig. h , i). these data indicate that the capacity of ifn-λ to reduce tissue tolerance stems from its capacity to inhibit tissue repair by directly influencing epithelial cell proliferation and viability. they also indicate that p degradation via mkrn upregulation is potently influenced by ifn-λ signaling. rna viruses can use several strategies to modulate the immune response to their advantage ( , ); it is therefore crucial to understand the molecular pathways involved in the maintenance of sustained ifn-λ production. moreover, the difference between mrna expression and protein levels of interferons suggests that a low-abundance cell type with high secretory capacity may be responsible for long-term ifn-λ production. we thus investigated the cellular source and molecular pathways that drive ifn-λ production in our model.
early after initial influenza virus infection, ifn-λ is expressed by infected epithelial cells, however, at later time points, dcs from the parenchyma of the lung start to express high levels of the ifn-λ transcript( ). we thus hypothesized that lung dcs are the main producers of ifn-λ and are responsible for the secretion of ifn-λ during viral infections. accordingly, sorted lung resident dendritic cells express high levels of ifn-λ transcript after days of poly (i:c) treatment, in contrast to epithelial cells, alveolar macrophages and monocytes (fig. a) , which, instead, express inflammatory cytokines (fig. s a, b) . moreover, diphtheria toxin (dt)-mediated depletion of cd c + cells in cd c-dt receptor (dtr) mice was sufficient to completely abolish ifn-λ transcript and protein upregulation upon days of poly (i:c) treatment (fig. b, c) , while production remained unaltered (fig. s c with the response measured in vivo, tlr stimulation did not induce ifn production while it induced upregulation of pro-inflammatory cytokines, and intracellular delivery of poly (i:c) induced high levels of ifn-i but not ifn-λ (fig. d, fig. s a , b). in agreement with the key role of tlr , ifn-λ production upon extracellular poly (i:c) encounter was abolished by genetic deletion of the signaling adaptor trif (encoded by the gene ticam ) but not by deletion of the rig-i/mda adaptor mavs (mavs) (fig. d) . conversely, ifn-i production in response to intracellular delivery of poly (i:c) was largely dependent on the signaling adaptor mavs (fig. s a ). consistent with our previous data, when the rig-i/mavs pathway was activated by transfection of the influenza a virus derived pathogen-associated molecular pattern (pamp) -phosphate-hairpin-rna ( p- hprna), ifn-i but not ifn-λ, was efficiently induced in a mavs-dependent manner ( fig. s a - e, poly (i:c) was used as a control). 
finally, inhibition of endosomal acidification by treatment with the pharmacological agent chloroquine abolished ifn-λ induction in response to extracellular poly (i:c), while it preserved ifn-i production upon intracellular poly (i:c) delivery (fig. s a, b). this evidence clearly indicates that tlr stimulation potently induces ifn-λ production by dcs in vitro. we thus explored the importance of the tlr -trif pathway in vivo under our experimental conditions. dendritic cells sorted from ticam -/- mice treated with poly (i:c) for six days did not express appreciable levels of ifn-λ transcripts while still producing type i interferons (fig. e, f). moreover, poly (i:c)-treated ticam -/- mice were protected from s. aureus superinfections (fig. g), and the decrease in bacterial burden correlated with lower ifn-λ transcript levels in the lung, although ifn-i levels remained similar to those of wt mice (fig. h, i). confirming the crucial role of tlr signaling in dcs for ifn-λ production, chimeric mice in which ticam -/- bone marrow (bm) cells are transferred into a wt irradiated host (ticam -/- → wt) phenocopied ticam -/- animals (fig. j-l). the immune system evolved to prevent and resist pathogen invasion, but doing so often threatens host fitness and causes disease in the form of immunopathology ( ). rna viruses are the major cause of most severe lower respiratory tract viral infections ( , ). while most virus infections manifest as self-limiting upper respiratory tract infections, influenza viruses, sars-cov, sars-cov- and mers-cov can progress to severe lung disease with potentially lethal outcomes ( , , ). although different viruses vary in their virulence and pathogenic potential, the most severe cases of lung rna viral infections share similar features that suggest an immune pathological etiology. 
in covid- , sars, mers and flu, severe symptoms and death occur late after the initial symptoms onset, and after the peak in viral load ( - ) further indicating a central role for an immune etiology of the most severe forms. while ifn-λ is uniquely equipped to induce a gentler immune response that favors viral clearance in the lungs without inducing overt immune activation ( , , ), its impact on epithelial cell biology and its effect on the maintenance of tissue integrity and tolerance to pathogen invasion is incompletely understood. in a system that allowed us to isolate the effect of immune activation from resistance to viral infection, we demonstrate that sustained ifn-λ production in the lung in response to viral pamps compromises epithelial barrier function, induces lung pathology and morbidity and predisposes to lethal secondary infections by impairing the capacity of the lungs to tolerate bacterial invasion. loss of lung barrier tolerance is sufficient to induce lethality upon bacterial challenge independently of bacterial growth ( ), and alteration of the repair response in the lung can favor bacterial invasion independently from immune cell control ( ). in our model immune cell recruitment is not affected by ifn-λ and neutrophils are dispensable for the impaired control of bacterial infections, while ifn-λ signaling on epithelial cells is necessary and sufficient to cause heightened bacterial invasion. under our experimental conditions, tlr -trif signaling in conventional lung dcs is responsible for the induction of ifn-λ. this is consistent with reports indicating that tlr -deficient mice are protected from influenza-induced immune pathology( ). moreover, tlr detects replication intermediates from necrotic cells ( ) and is, thus, insensitive to viral immune evasion. 
this is of particular interest during highly pathogenic human coronavirus infections, whose success in establishing the initial infection is partly due to their ability to dampen tlr and mavs dependent early ifn responses ( ) ( ). intratracheal instillations (i.t.) were performed as previously described in ( ). rectal temperature and body weights were monitored daily. mice were deemed to have reached endpoint at % of starting weight or after reaching body temperature of °c or lower. to generate mice with hematopoietic-specific deletion of ifnlr or ticam , -week-old cd . + mice were exposed to lethal whole-body irradiation ( rads per mouse) and were reconstituted with × donor bone marrow cells from -week-old wild-type, ifnlr -/- or ticam -/- mice. mice were treated with sulfatrim in the drinking water and kept in autoclaved cages for weeks after reconstitution. after weeks, mice were placed in cages with mixed bedding from wild-type and ifnlr -/- or ticam -/- mice to replenish the microbiome and were allowed to reconstitute for more weeks. a similar procedure was used to generate bone-marrow chimeras with stromal cell-specific deletion of ifnlr . here, recipient wt or ifnlr -/- mice underwent irradiation and were reconstituted with bm cells derived from cd . + mice as described above. to evaluate the percentage of chimerism, a sample of peripheral blood was taken from chimeric mice after weeks of reconstitution, stained for cd . and cd . (antibodies as identified under 'reagents and antibodies') and analyzed by flow cytometry. in order to deplete cd c + cells, cd c-dtr mice received μg/kg diphtheria toxin (dtx) intravenously starting one day before tlr ligand or saline administration and continuing every other day until day post-treatment to maintain depletion. 
in vivo depletion of neutrophils was carried out by injecting anti-ly g antibody ( μg/mouse) intraperitoneally, starting one day before treatments and then continuing every other day through the duration of the treatment. as controls for no depletion, mice were injected with rat igg isotype control. to assess lung permeability, treated mice were administered fitc-dextran ( μg/mouse) i.t. before or after s. aureus infection. hr after dextran instillation, blood was collected from the retro-orbital sinus, and the plasma was separated by centrifugation. leakage of dextran into the bloodstream was measured as fitc fluorescence in the plasma compared to plasma from mock-treated mice. bal was collected as described in ( ). briefly, the lungs of euthanized mice were lavaged through the trachea with ml pbs to collect the bal. samples were centrifuged and the supernatants were used for total protein measurement (pierce bca protein assay, thermo fisher scientific # ) and ldh quantification (pierce ldh cytotoxicity assay, thermo fisher scientific #c ). lungs were excised and used for rna extraction using tri reagent (zymo research #r - - ). the left lobe of the lung was weighed and homogenized in ml of sterile d.i. water in a fisherbrand™ bead mill homogenizer. to calculate bacterial load, homogenate was serially diluted and plated on tsb-agar plates in duplicate. colonies were counted after h incubation, and bacterial burden in the lungs was calculated as cfu normalized to individual lung weight. cytokine production in the lungs was measured in the supernatants collected after centrifuging the lung homogenates. lung cells were isolated as described in ( ). briefly, mice were euthanized and perfused. ml of warm dispase solution ( mg/ml) were instilled into the lungs followed by . ml of % low-melt agarose (sigma #a ) at °c, and allowed to solidify on ice. inflated lungs were incubated in dispase solution for ' at rt. 
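the bacterial-burden calculation described above (serial dilution, duplicate plating, cfu normalized to lung weight) can be sketched in python as follows. this is a minimal illustration, not the authors' script: the function name, the argument names and every numeric value are hypothetical, because the source does not report the dilution factors or plated volumes.

```python
def cfu_per_gram(colony_counts, dilution_factor, plated_volume_ml,
                 homogenate_volume_ml, lung_weight_g):
    """Estimate bacterial burden as CFU per gram of lung tissue.

    colony_counts        -- colonies on the replicate plates of one dilution
    dilution_factor      -- e.g. 1000 for a 1:1000 dilution of the homogenate
    plated_volume_ml     -- volume spread on each plate
    homogenate_volume_ml -- total volume the lobe was homogenized in
    lung_weight_g        -- weight of the homogenized lobe
    """
    if not colony_counts:
        raise ValueError("at least one plate count is required")
    if lung_weight_g <= 0 or plated_volume_ml <= 0:
        raise ValueError("weight and plated volume must be positive")
    # Average the duplicate plates, then undo the dilution.
    mean_colonies = sum(colony_counts) / len(colony_counts)
    cfu_per_ml = mean_colonies * dilution_factor / plated_volume_ml
    # Scale to the whole homogenate and normalize to tissue weight.
    total_cfu = cfu_per_ml * homogenate_volume_ml
    return total_cfu / lung_weight_g


# Hypothetical example: duplicate plates from a 1:1000 dilution.
burden = cfu_per_gram([48, 52], dilution_factor=1000, plated_volume_ml=0.5,
                      homogenate_volume_ml=1.0, lung_weight_g=0.25)
```

normalizing to the weight of each individual lobe, as the text specifies, keeps mice with differently sized lungs comparable.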
the lungs were then physically dissociated, incubated ' with dnase i μg/ml and filtered through μm and μm strainers. red blood cells were lysed using ack buffer. single cell suspensions were stained for live/dead using zombie red or zombie violet, and then with antibodies against surface antigens diluted in pbs + bsa . % for minutes at °c. cells were then washed, fixed with . % paraformaldehyde for minutes at room temperature, washed again and resuspended in pbs + bsa . %. samples were acquired on a bd lsrfortessa flow cytometer and data were analyzed using flowjo v. software (bd biosciences). countbright absolute counting beads (invitrogen #c ) were used to quantify absolute cell numbers. purified rna was analyzed for gene expression on a cfx real-time cycler (bio-rad) using a taqman rna-to-ct -step kit (thermo fisher scientific) or sybr green (bio-rad). probes specific for ifnl / , ifnb , il b, rsad , gapdh were purchased from thermo fisher scientific, and sybr-green primers for rsad , cxcl , gapdh were purchased from sigma. cytokine analyses were carried out using homogenized lung supernatants, and cell supernatants from stimulated flt l-dcs. ifnλ / elisa (r&d systems dy b) and mouse ifnβ, il β, il- , tnfα elisa (invitrogen) were performed according to manufacturer's instructions. bronchoalveolar lavages (bal) were obtained from five intensive care unit (icu)-hospitalized sars-cov- -positive patients. in parallel, five naso-oropharyngeal swabs were collected from both sars-cov- -positive and -negative subjects. among positive patients, two were hospitalized but without the need of icu support, whereas three out of five did not require any hospitalization. the negative swabs were obtained from subjects undergoing screening for suspected social contacts with covid- subjects. swabs were performed by using floqswabs® (copan) in utm® universal transport medium (copan). all samples were stored at - °c until processing. 
the study involving human participants was reviewed and approved by san raffaele hospital irb in the covid- biobanking project. the patients provided written informed consent. rna extraction was performed by using purelink™ rna (thermo fisher scientific) according to the manufacturer's instructions. in particular, µl of each bal and swab sample analyzed was lysed and homogenized in the presence of rnase inhibitors. ethanol was then added to the homogenized samples, which were further processed through a purelink™ micro kit column for rna binding. after washing, purified total rna was eluted in µl of rnase-free water. system (invitrogen™) protocol by using µl of rna extracted from each bal and swab sample. qrt-pcr analysis was then carried out to evaluate il , il b, ifnb , ifna , ifnl and ifnl expression. all transcripts were tested in triplicate for each sample by using specific primers. gapdh was also included. real-time analysis was then performed according to the manufacturer's instructions by using taqman® fast advanced master mix (applied biosystems™ by thermo fisher scientific). real-time pcr analysis was performed on abi by applied biosystems. statistical significance for experiments with more than two groups was tested with one-way anova, and dunnett's multiple-comparison tests were performed. two-way anova with tukey's multiple-comparison test was used to analyze kinetic experiments. two-way anova with sidak's multiple-comparison test was used to analyze experiments with grouped variables (i.e. treatment, genotype). statistical significance for survival curves was evaluated with the log-rank (mantel-cox) test and corrected for multiple comparisons with bonferroni's correction. to establish the appropriate test, normal distribution and variance similarity were assessed with the d'agostino-pearson omnibus normality test using prism (graphpad) software. 
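the bonferroni correction applied to the survival comparisons above multiplies each raw p-value by the number of comparisons and caps the result at 1. the python sketch below shows that arithmetic only; the function name and the example p-values are hypothetical, and the source performed its analyses in prism rather than in code like this.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values (p * m, capped at 1.0)."""
    m = len(p_values)  # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw log-rank p-values for three pairwise survival comparisons.
raw_p = [0.004, 0.03, 0.2]
adjusted_p = bonferroni(raw_p)
# A comparison is called significant only if its adjusted p-value
# is still below the chosen alpha (0.05 is assumed here).
significant = [p < 0.05 for p in adjusted_p]
```

equivalently, one can leave the p-values unchanged and compare each against alpha divided by the number of comparisons.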
when comparisons between only two groups were made, an unpaired two-tailed t-test was used to assess statistical significance. to determine the sample size, calculations were conducted in nquery advisor version . . primary outcomes for each proposed experiment were selected for the sample size calculation and sample sizes adequate to detect differences with an % power were selected. for animal experiments, four to ten mice per group were used, as indicated in the cells were gated on fsc and ssc to eliminate debris, on fsc-a -fsc-h to select single cells and cells negative for live/dead dye and lineage markers (cd , cd , ter ). epithelial cells were gated as cd -epcam + cd -. the epcamcells were sorted for immune cells as follows: amac were gated as cd + ly g -cd c hi siglec-f + , monocytes and monocyte-derived cells (mo) were gated as cd + ly g -siglec-f -ly c + , cdcs were gated as cd + ly g -siglec-f -ly c -cd c + mhc-ii hi . materials and methods mice c bl/ j jax ) mice were purchased from jackson labs. c bl/ il- r -/-(ifnlr -/-) mice were provided by bristol-myers squibb. mice were housed under specific pathogen-free conditions at boston children's hospital. staphylococcus aureus infections were conducted in the biosafety level- facility at boston children's hospital. 
all procedures were approved under the institutional animal care and use committee (iacuc) and conducted under the supervision of the department of animal cd (m / ), mhc-ii i-a/i-e (m / r (tlr-r ) and p-hprna (tlrlhprna) were purchased from invivogen. for in vivo administration of type iii ifn, we used polyethylene glycol-conjugated ifn-λ (peg-ifn-λ) (gift from bristol-myers squibb). diphtheria toxin (unnicked) from corynebacterium diphtheriae was purchased from cayman chemical. anti-ly g antibody, clone a (be - ) and rat igg a isotype control (be ) for in vivo administration was purchased from bioxcell. '-deoxy- -ethynyl uridine (edu) was purchased from carbosynth (ne ) epithelial cell proliferation') before being stained with antibodies against cell-surface antigens. intracellular staining of ki and p were carried out using foxp fix/perm buffer set (biolegend # ) following the manufacturer's instructions. epithelial cell proliferation proliferation of lung after permeabilization cells were washed and incubated with mm copper sulphate (millipore-sigma), mm sodium ascorbate (millipore-sigma) and μm sulfo-cyanine -azide (lumiprobe #a ) in tris buffered saline (tbs) mm, ph . , for min at room temperature. ion torrent for targeted transcriptome sequencing, ng of rna isolated from sorted cells was retro barcoded libraries were prepared using the ion ampliseq transcriptome mouse gene expression kit as per the manufacturer's protocol and sequenced using an ion s system genes were called expressed (n= , ) if they had average log expression of or greater in either wt or ifnlr -/-. differentially expressed genes (degs) between wt and ifnlr -/-were selected by thresholding on fold change (+/- . ) and p value ( . ). in heatmaps, degs were z-scaled and clustered (euclidean distance, ward linkage). 
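the deg selection and z-scaling steps described above can be sketched in python as follows. this is an illustration under stated assumptions: the cutoffs are passed as parameters because the values are not legible in the source, the gene records are invented, and whether the original thresholds were inclusive is not stated. the clustering step (euclidean distance, ward linkage) is not reproduced here; it would normally be delegated to a statistics library.

```python
from statistics import mean, stdev


def select_degs(genes, lfc_cutoff, p_cutoff):
    """Keep genes passing both the fold-change and the p-value threshold.

    Each gene is a dict with 'log2fc' (log2 fold change between genotypes)
    and 'p' (p-value). Both cutoffs are caller-supplied assumptions.
    """
    return [g for g in genes
            if abs(g["log2fc"]) >= lfc_cutoff and g["p"] < p_cutoff]


def z_scale(values):
    """Center one gene's expression values on 0 and scale to unit sample SD."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A gene with no variation carries no pattern for the heatmap.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical records: two wt and two knockout samples per gene.
genes = [
    {"name": "gene_a", "log2fc": -1.8, "p": 0.001, "expr": [8.0, 8.2, 6.1, 6.3]},
    {"name": "gene_b", "log2fc": 0.2, "p": 0.40, "expr": [5.0, 5.1, 5.2, 5.3]},
    {"name": "gene_c", "log2fc": 1.1, "p": 0.20, "expr": [3.0, 3.4, 4.2, 4.4]},
]
degs = select_degs(genes, lfc_cutoff=1.0, p_cutoff=0.05)
heatmap_rows = {g["name"]: z_scale(g["expr"]) for g in degs}
```

z-scaling each gene separately puts strongly and weakly expressed genes on the same color scale, so the heatmap shows the pattern across samples rather than absolute abundance.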
pathway analysis was performed with the r package hyper cell culture flt l-dcs were differentiated from bone marrow cells in iscove's modified dulbecco's media (imdm; thermo fisher scientific), supplemented with % b -flt l derived supernatant and % fetal bovine serum (fbs) for where indicated poly (i:c) stimulated cells were pre-treated with μg/ml chloroquine for minutes prior to stimulations. qrt-pcr and elisa rna was isolated from cell cultures using a genejet rna purification kit (thermo fisher scientific #k ) according to manufacturer's instructions. rna was extracted from excised lungs by homogenizing them in ml of tri reagent. rna was isolated from tri reagent samples using phenol-chloroform extraction or column-based extraction systems (direct-zol rna microprep and miniprep, zymo research #r and #r ) according to the manufacturer's protocol flow cytometric isolation of primary murine type ii alveolar epithelial cells for functional and molecular studies wt mice were treated daily with i.t. . mg/kg poly (i:c), . mg/kg r or saline for days and infected i.t. with . x cfu of s. aureus at day . (a) body temperature, (b) total protein in the bal and **p < . and ***p < . (one-way anova). each mouse represents one point a-h) wt mice were treated daily for , , or days with i.t. . mg/kg poly (i:c) or days of saline, and infected with i.t. . x cfu of s. aureus for h. total lung homogenates were analyzed by qpcr for bacterial burden was evaluated in total lung homogenate **p < . and ***p < . (one-way anova compared to saline treatment). each mouse represents one point wt and ifnlr -/-mice were treated daily with i.t. . mg/kg poly (i:c) for days and infected with i.t. . x cfu of s. aureus for h. (a) weight change, (b) total protein in the bal **p < . and ***p < . (two-way anova). 
each mouse represents one point lung resident dcs are the primary producers of ifn-β upon poly (i:c) treatment mg/kg poly (i:c) or saline for days were sorted for epithelial cells (ec), resident dc (resdc), monocyte-derived dc (modc), and alveolar macrophages (amac) and assessed for (a) il b and (b) ifnb relative mrna expressions. cd c-dtr mice depleted for cd c + cells in vivo by dtx injections were treated daily with i.t. . mg/kg poly (i:c) or saline for days. total lung lysates of the treated mice were analyzed for (c) ifnb relative mrna expression, and (d) ifn-β protein expression by elisa , **p < . and ***p < . (two-way anova) rig-i or tlr ligands. flt l-dcs from wt, ticam -/-or mavs -/-mice were treated with μg/ml poly (i:c), μg/ml transfected poly (i:c) or μg/ml r for h. ifnb (a), and il b (b) relative mrna expressions were evaluated by qpcr , **p < . and ***p < . (two-way anova) mean and sem of independent experiments is depicted flt l-dcs upregulate ifn-λ uniquely upon activation of tlr signaling and not in response to the rig-i specific ligand p-hprna. flt l-dcs from wt mavs -/-mice were treated with μg/ml poly (i:c), or μg/ml transfected p-hprna for h or h ifnb (b), and il b (c) relative mrna expressions were evaluated by qpcr after h and ifn-β (e) levels in the supernatants were evaluated by elisa after h the endosomal tlr inhibitor chloroquine inhibits poly (i:c) dependent ifn-λ expression in flt l-dcs. flt l-dcs from wt mice were treated with μg/ml poly (i:c), or μg/ml transfected poly (i:c) for h in the presence or absence of μg/ml chloroquine we thank dr. jc kagan for discussion, help and support. funding: iz is supported by nih grant r ai , r dk , and niaid-dait-nihai . 
ab is supported by ccfa key: cord- - w tdhdd authors: hao, siyuan; ning, kang; wang, xiaomei; wang, jianke; cheng, fang; ganaie, safder s.; tavis, john e.; qiu, jianming title: establishment of a replicon system for bourbon virus and identification of small molecules that efficiently inhibit virus replication date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w tdhdd bourbon virus (brbv) was first isolated from a patient hospitalized at the university of kansas hospital in . since then, several deaths have been reported to be caused by brbv infection in the midwest and southern united states. brbv is a tick-borne virus that is widely carried by lone star ticks. it belongs to genus thogotovirus of the orthomyxoviridae family. currently, there are no treatments or vaccines available for brbv or thogotovirus infection caused diseases. in this study, we reconstituted a replicon reporter system, composed of plasmids expressing the rna-dependent rna polymerase (rdrp) complex (pa, pb and pb ), nucleocapsid (np) protein, and a reporter gene flanked by the ’ and ’ utr of the envelope glycoprotein (gp) genome segment. by using the luciferase reporter, we screened a few small molecule compounds of anti-endonuclease that inhibited the nicking activity by parvovirus b (b v) ns , as well as fda-approved drugs targeting the rdrp of influenza virus. our results demonstrated that myricetin, and an anti-b v ns nicking inhibitor, efficiently inhibited the rdrp activity of brbv and virus replication. the ic and ec of myricetin are . μm and . μm, respectively, in cells. myricetin had minimal cytotoxicity in cells, and therefore the therapeutic index of the compound is high. in conclusion, the brbv replicon system is a useful tool to study viral rna replication and to develop antivirals, and myricetin may hold promise in treatment of brbv infected patients. bourbon virus (brbv) is a member in the genus thogotovirus of the orthomyxoviridae family. 
there are seven genera in the family of orthomyxoviridae, which are segmented negative-strand rna viruses (kawaoka & palese, ) , including four types of influenza viruses (influenza virus a, b, c, and d), thogotovirus, quaranjavirus, and isavirus. many viruses in the orthomyxoviridae family are important pathogens to humans or animals. the epidemic/pandemic influenza a viruses have caused millions of human deaths in the past century and are still circulating, posing a huge threat to human health and the economy. thogotoviruses mainly circulate in domestic animals, such as sheep, cattle, and camels, causing neural diseases and abortion. two thogotoviruses, thogoto virus and dhori virus, have been reported to infect humans and cause deaths (butenko, et al., ; kosoy, et al., ) . human antibodies against thogoto virus and dhori virus have been detected in europe, asia, and africa (hubalek & rudolf, ; filipe, et al., ) . it has been reported that thogoto virus can cause human infections in a vector-free manner, possibly by an aerosol route (butenko, et al., ) , highlighting the potential to infect humans in a large population. brbv was first isolated from a blood sample of a hospitalized male patient from bourbon county, kansas, usa, in the spring of at the university of kansas hospital . the patient died due to a complex syndrome of leukopenia, lymphopenia, thrombocytopenia, hyponatremia, and increased levels of aspartate aminotransferase and alanine aminotransferase. due to the high level of viremia and unique identification of the virus in the serum taken from the patient, brbv was believed to be the cause of the illness and the death of the patient. later, two additional cases of brbv infection have been reported. in , a patient from payne county, oklahoma, usa tested positive for neutralization antibodies to brbv before fully recovering (savage, et al., ) . 
in june , a -year-old woman from missouri died from brbv infection after she had been misdiagnosed for a significant period of time (bricker, et al., ). not surprisingly, a high seroprevalence of brbv-neutralizing antibodies in raccoons ( %) and white-tailed deer ( %) was detected in missouri, usa (jackson, et al., ). however, the centers for disease control and prevention (cdc), usa, currently does not know if the virus may be found in other areas of the united states, since the seroprevalence of brbv infection has not been evaluated outside of these epidemic regions. nevertheless, brbv is the first species of the genus thogotovirus of the orthomyxoviridae family to be identified as a human pathogen in the new world. brbv is widely carried by the lone star tick (amblyomma americanum), a species that is aggressive, feeds on humans, and is widely distributed across the east, southeast, and midwest states. brbv was found in pools of lone star ticks from retrospective tests in , ticks from northwestern missouri, km from bourbon county, kansas, usa (savage, et al., ). the human brbv and the strain isolated from tick pools share > . % sequence identity at the amino acid level and . % identity at the rna sequence level (savage, et al., ). brbv replicates in cell lines derived from the hard ticks amblyomma, hyalomma and rhipicephalus. together with the geographic location of the brbv infection and the geographic distribution of a number of amblyomma americanum ticks (savage, et al., ; savage, et al., ), these studies strongly suggest that the lone star tick is a vector of brbv transmission to humans. little is known about the biology of brbv. due to the high mortality of brbv infection, specific treatments with antiviral drugs are in demand to save lives. in this study, we established a replicon reporter of brbv. 
we then used the replicon reporter to examine anti-influenza drugs that target viral rna-dependent rna polymerase (rdrp) and endonuclease inhibitors of human parvovirus b (b v) for inhibition of brbv rdrp activity and brbv replication in hek t and vero cells. t (favipiravir) and vx- were purchased from adooq bioscience (irvine, ca). baloxavir (s- ; baloxavir acid) was purchased from medchemexpress (monmouth junction, nj). flavonoid compounds used in this study were commercially acquired as follows: # (idofine # ), # (sigma, # ), # (aldrichselect # ), # (sigma, # s ), # (enamine, #z ). they all had a purity of ≥ %. all compounds were dissolved in dmso (sigma, st. louis, mo) as stock solutions at mm, and stored at - °c. plaque assays: cells were seeded in six-well plates at a density of × cells and were confluent the following day. we used the cell growth media to serially dilute the virus stock at -fold. µl of the diluent were added to each well and incubated for h with gently rocking the plate every min. after removing the virus diluent, ml of overlay media, % methylcellulose (sigma, m ) in dmem with % fbs, were added to each well. the plates were incubated at °c under % co for days. after removing the methylcellulose overlays, cells were fixed using the % formaldehyde solution for at least min and stained with % crystal violet solution. pcdna -based viral cdna plasmids: brbv rna was isolated from a stock of brbv-ks passaged on vero cells. cdna of each viral rna genome fragment was synthesized using the reverse primer and amplified by pcr. the amplification program started with one cycle at °c for min and one cycle at °c for min. these cycles were followed by cycles at °c for sec, °c for sec, and °c for sec; the program ended with one cycle at °c for min. the pcr products were visualized by agarose gel electrophoresis, extracted, and cloned into pcdna vector (invitrogen), resulting in pcdna-(brbv) pa, pb , pb , m, np, and gp plasmids. 
the plasmid phw (hoffmann, et al., ) contains the human ribosome rna polymerase i (pol i) promoter and the murine terminator sequence separated by two bsmbi sites. the pol i promoter and terminator elements are flanked by a truncated immediate-early promoter of cytomegalovirus (cmv) and by the bovine growth hormone polyadenylation signal (bghpa). the pcdna -gp was used as a template to amplify the gp cdna with a primer set that has the bsmbi sites (table ) . after digestion of the pcr products with bsmbi, the fragments were cloned into the phw vector, which resulted in the phw -(brbv)gp plasmid. reporter plasmids: phw -Δcmv-gp was constructed by deletion of the cmv promoter in phw -gp. then, we constructed phw-Δcmv-gp-gfp and phw-Δcmv-gp-gluc plasmids inserting the gfp orf through the ndei and eag i sites of the gp orf and replacing the gp orf with the gaussia luciferase (gluc) orf, respectively, into the phw -Δcmv-gp plasmid (fig. a) . the sequences of all the primers are listed in table . all the constructed plasmids were sequenced to confirm the cloned cdna sequences at mclab (south san francisco, ca). dna sequence analyses were performed using snapgene . (snapgene, chicago, il). hek t cells were transfected using the peimax (cat# - , mw , , polyscience, inc.) at a ratio of : of dna: pei in µl of opti-mem (invitrogen) . the total amounts of plasmid dna were kept constant ( - μg per well of -well plate) in each group by supplementation with an empty vector. brbv gfp reporter assay: phw-Δcmv-gp-gfp was co-transfected with the four pcdna-pa, pb , pb , and np plasmids into hek t cells confluent in six-well plates. after days post-transfection, the gfp signal was observed under a fluorescent microscope (nikon ti-s). the transfection of phw-Δcmv-gp-gfp with three pcdna plasmids (pb , pb , np) was set as a negative control. brbv luciferase reporter assay: phw-Δcmv-gp-gluc and the four pcdna-based plasmids were co-transfected into hek t cells in -well plate. 
at days post-transfection, µl of the cell culture supernatants were taken and mixed with µl of working solution or luciferase activity, using the gaussia luciferase flash assay kit (pierce™, # ). luminescent signals were detected on a synergy h microplate reader (biotek, winooski, vt). three pcdna plasmids (pb , pb , and np) and phw-Δcmv-gp-gluc co-transfected cells were set up as a negative control. the cytotoxicity of compounds on hek t and vero cells was evaluated using the cytotox-glo™ cytotoxicity assay kit (promega, #g ), according to the manufacturer's instructions. cells were seeded at × cells per well in -well plates. after overnight incubation, different concentrations of the compounds diluted in culture media were added to each well. cells treated with dmso were used for the negative control. cell viability was measured after three days of incubation. confluent monolayers of vero and hek t cells in -well plates were inoculated with pfu of brbv. different concentrations of the compounds, diluted in medium, were added to the wells. each concentration of the compounds was tested in duplicate per experiment, and the experiment was repeated three times. at days post-infection, culture supernatants were collected and used to quantify the nuclease digestion-resistant vgc numbers using rt-qpcr. cells were collected by centrifugation and lysed using a ml tuberculin syringes. the cell lysates were loaded, along with . μl of a pre-stained protein ladder (#p ; goldbio, st. louis, mo), and separated on an sds-polyacrylamide gel. proteins were transferred onto a polyvinylidene difluoride membrane (#ipvh ; millipore, bedford, ma). the membrane was blocked and probed with primary and secondary antibodies sequentially. signals were visualized by enhanced chemiluminescence, and the pre-stained protein ladder was imaged under bright light simultaneously under a fujilas imaging system. 
an anti-brbv m protein polyclonal antibody was produced in rats by immunizing them with the brbv m protein, which was expressed and purified in bacteria, following a protocol previously described (sun, et al., ). anti-β-actin (#a ) antibody was purchased from sigma (st. louis, mo). calculations of ic , cc and ec and statistical analysis were done by using graphpad prism version . . error bars represent means and standard deviations (sd), and statistical significance (p value) was determined by using the student t test. we propagated brbv-ks, obtained from bei resources (www.beiresources.org/), in vero cells. at days post-infection, when most of the cells appeared cytopathic, the media were collected and centrifugated. the supernatant was collected, aliquoted, and stored at - °c. the virus was titrated on vero cells with a titer of . × plaque forming units (pfu)/ml (fig. a) . we next developed a rt-qpcr to quantify the viral genome copy (vgc) numbers in the brbv stock. we found a high correlation of the vgc numbers with the pfu titrated (ꝩ= . ) (fig. b) . in the following experiments, we used the rt-qpcr to quantify the virus. we tested human hek t cells for infection of brbv for comparison with infection in vero cells. virus growth was determined over the course of infection, and infected cells were collected at days post-infection for western blotting of matrix (m) protein expression. the results showed that brbv infected hek t cells displayed similar growth kinetics as those in vero cells ( fig. a) , which had a peak of virus production of approximately (~) vgc/ml in the media, as well as detection of the m protein at a size of ~ kda (fig. b) . to clone the cdnas of the genome segments of brbv-ks, we designed primers table ) . each viral gene was reversely transcribed using each forward primer and pcr amplified using respective paired primers ( table ) . the pcr cdna fragments were first cloned into pcdna and sequenced. 
sequences of gp, pa, and pb genes were confirmed as identical to those originally deposited in genbank, whereas np, pb and m genes have , and nucleotide variations, respectively (fig. a), which result in amino acid variations in np and pb (fig. b). we next used the set of phw primers (table ) that had the bsmbi sites to amplify the gp cdna and cloned it in a bsmbi-digested phw bidirectional cloning vector (hoffmann, et al., ). the phw -gp clone that carried the brbv cdna in the correct orientation was confirmed by sequencing; it contains the immediate early promoter of cmv (an rna polymerase ii (pol ii) promoter) and a pol i terminator in the ' untranslated region (utr), and a ribosomal rna pol i promoter and a bghpa in the ' utr. working on brbv requires a biosafety level (bsl ) facility; thus, a replicon system of brbv is important for assessing antivirals in a bsl setting and for high throughput screening of antivirals. based on the phw -gp plasmid, we inserted a gfp open reading frame (orf) or a secreted gaussia luciferase (gluc) orf between the ' and ' utr of the gp segment (fig. a). when phw -gfp was co-transfected with pcdna-pa, pb , pb , and np plasmids in hek t cells, at days post-transfection, green fluorescence was clearly observed in cells transfected with all four pcdna plasmids but not in cells transfected with all plasmids except pcdna-pa (fig. b). similarly, a high luciferase activity was detected in the cells co-transfected with phw -gluc and pcdna-pa, pb , pb , and np plasmids, but nearly background levels were detected in cells transfected with phw -gluc and pcdna-pb , pb , and np plasmids (fig. c). overall, there was a -fold increase in luciferase activity upon addition of the pcdna-pa plasmid, suggesting that the reporter signal depends on rdrp activity reconstituted by expression of pa, pb , pb , and np.
we next used the brbv replicon reporter system to test the antiviral activity of three inhibitors of influenza virus rdrp: favipiravir, also known as t- (a pan rdrp inhibitor) (furuta, et al., ; fuchs, et al., ), pimodivir (vx- ), an inhibitor of pb cap-snatching activity (byrn, et al., ; furuta, et al., ), and baloxavir, a pa endonuclease inhibitor (omoto, et al., ) (fig. a). at . , and µm, respectively, none inhibited luciferase (rdrp) activity by > % (fig. a). these drugs did not exhibit significant cytotoxicity at the tested concentrations in hek t cells (fig. c). since influenza virus pa executes endonuclease activity on both single-stranded (ss)rna and ssdna (dias, et al., ; klumpp, et al., ; doan, et al., ), we tested flavonoid compounds (# , # , # , and # ) (fig. b) that inhibit nicking/endonuclease activity of the large nonstructural protein (ns ) of b v on an ssdna template (xu, et al., ). surprisingly, at µm, flavonoids (# , # , and # ) inhibited brbv rdrp activity by > %. in contrast, flavonoids # and # did not inhibit rdrp (fig. b). additional evidence of specific inhibition of the brbv rdrp is provided by the failure of flavonoid # , which inhibited the endonuclease activity of b v ns (xu, et al., ), to inhibit the brbv rdrp. although compound # was cytotoxic at µm in hek t cells, compounds # , # and # were not (fig. d). the half maximal inhibitory concentration (ic ) of compound # in the reporter system was . µm (fig. e). we next assayed the inhibitory effect of compound # on virus replication in both hek t and vero cells. the half maximal effective concentration (ec ) of compound # against viral replication was . µm and . µm in hek t and vero cells, respectively (fig. a&b). notably, compound # had negligible cytotoxicity, with % cytotoxic concentrations (cc ) of µm in hek t cells and > mm in vero cells (fig. c&d).
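the ic , ec and cc values above were obtained with graphpad prism. as a rough illustration of what such a half-maximal concentration means, and of how a therapeutic index (cc /ec ) becomes a lower bound when no cytotoxicity is reached within the tested range, the sketch below uses simple log-linear interpolation between the two doses that bracket a 50% response. this is a simplified stand-in and not the regression model used in the study; the function names and all dose-response numbers are hypothetical.

```python
import math


def half_max_concentration(concs, responses, target=50.0):
    """Estimate the concentration giving `target` % response.

    Uses log-linear interpolation between the two adjacent doses whose
    responses straddle the target. `concs` must be ascending and > 0;
    `responses` are percentages of the untreated control.
    Returns None when the curve never crosses the target.
    """
    points = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        straddles = (r_lo - target) * (r_hi - target) <= 0
        if straddles and r_lo != r_hi:
            frac = (r_lo - target) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


def therapeutic_index(cc50, ec50, max_dose):
    """Return (TI, is_lower_bound).

    If no CC50 was reached (cc50 is None), the highest tested dose is used,
    so the returned value is only a lower bound on the true index.
    """
    if cc50 is None:
        return max_dose / ec50, True
    return cc50 / ec50, False


# hypothetical data: doses in µM, signals as % of the DMSO control
doses = [0.1, 1.0, 10.0, 100.0]
virus_signal = [95.0, 80.0, 20.0, 5.0]    # viral readout drops with dose
cell_viability = [100.0, 99.0, 97.0, 90.0]  # never falls to 50 %

ec50 = half_max_concentration(doses, virus_signal)
cc50 = half_max_concentration(doses, cell_viability)
ti, is_lower_bound = therapeutic_index(cc50, ec50, max_dose=doses[-1])
print(f"EC50 = {ec50:.2f} µM, TI {'>' if is_lower_bound else '='} {ti:.1f}")
```

with these made-up numbers the viability curve never reaches 50%, so the index can only be reported as "greater than" a value, which is the same form in which the index is reported for myricetin.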
collectively, our results demonstrated that a parvoviral endonuclease activity inhibitor, compound # (myricetin), inhibits brbv replication in cells through inhibition of the rdrp activity with a therapeutic index (ti=cc /ec ) of > . in this study, we established a gfp/luciferase expression-based replicon reporter system for brbv, and demonstrated it is suitable for screening of antivirals in a bsl setting. we identified myricetin (compound # ) as a potent anti-brbv drug. in contrast to influenza viruses, thogotoviruses are transmitted mainly through tick vectors and thus are also called "tick-borne viruses" (haig, et al., ; anderson & casals, ) . phylogenetically, brbv is most closely related to dhori virus and its subtype, batken virus, which have been known to occur in regions throughout africa, asia and europe (hubalek & rudolf, ; filipe, et al., ; moore, et al., ; lvov, et al., ) . dhori and batken viruses have been isolated from hyalomma ticks, and antibodies against dhori virus have been detected in camels, goats, horses, cattle, and humans (hubalek & rudolf, ; filipe, et al., ; moore, et al., ; lvov, et al., ) . interestingly, batken virus has also been isolated from several mosquito species (lvov, et al., ) . brbv replicates at a high level in a variety of invertebrate and vertebrate cell lines, including hek t cells, and produces relatively high titers derived from all mammalian cells that are compatible with the known susceptibility of the human host to brbv infection and diseases . the cell cultured brbv-ks has a series of variations in one α-helix of the np protein, compared to the originally reported sequences (kp ) by direct sequencing of the patient's samples (fig. b) . we modeled the structure of brbv np protein for the amino acids' electric charges and propensities to form an α-helix against the structure of influenza d virus (donchet, et al., ) using swiss-model (waterhouse, et al., ) (fig. ) . 
the modeling indicates that the variations at aa - did not change the charges of any amino acids and had minimal effects on the polarity of the variant sequences, and that the sequence was still able to form an α-helix. interestingly, the brbv-stl strain, which was isolated from a fatal brbv case in st. louis, missouri, usa in (bricker, et al., ), had the identical sequence at aa - in the np protein (mk ) as that of the cell cultured brbv-ks. currently, there is no treatment or vaccine for diseases caused by brbv infection. mice (cd- ) were susceptible to brbv infection as evidenced by seroconversion, but the infection did not cause disease symptoms or death of the animals. recently, mice lacking ifn-α/β receptor expression (ifnαar -/-) have been used as an animal model to evaluate antivirals against brbv (bricker, et al., ; fuchs, et al., ). favipiravir, a pan inhibitor of rdrp (furuta, et al., ), was shown to protect ifnαar -/- mice from lethal brbv infection. the ec of favipiravir against brbv infection of vero cells was µm (bricker, et al., ). in animals, administration of mg/kg of favipiravir twice daily at days post-infection protected all animals from death from brbv infection. however, it was speculated that favipiravir reached a concentration of . mm in serum, leading to the argument that it is not practical to use favipiravir to treat brbv-infected patients. notably, myricetin, which was identified in our study, has a -fold lower ec ( µm) compared with the µm of favipiravir in vero cells, suggesting that myricetin may be a more potent inhibitor of brbv infection than favipiravir. due to the lack of an animal bsl facility that can host brbv-infected ifnαar -/- mice, we were not able to examine myricetin in animals. myricetin, a common plant-derived flavonoid, exhibits a wide range of activities, including antioxidant, anticancer, antidiabetic and anti-inflammatory activities, as well as antimicrobial activities (semwal, et al., ).
myricetin was reported as a strong inhibitor of reverse transcriptase of retroviruses (ono, et al., ). it has also been shown to inhibit nsp , the helicase of the severe acute respiratory syndrome (sars) coronavirus, by targeting its atpase activity in vitro (ic = . µm) (yu, et al., ; ). we found that myricetin does not exert obvious cytotoxicity in vero and hek t cells, as has also been reported for normal breast epithelial cells (yu, et al., ; ). we previously found that myricetin inhibited over % of the endonuclease activity of b v ns n (the endonuclease domain of ns ) in vitro at µm (xu, et al., ). we speculate that myricetin may inhibit the endonuclease activity of brbv pa, which warrants further investigation.
the brbv np structure was modeled on the template structure of the influenza d np protein (donchet et al., ) (smtl id: n u. ) using swiss-model (waterhouse, et al., ). np forms a tetramer, as shown, which has four arms and a center to capture viral rna and pass it to the rdrp. the region of np carrying the α-helix that includes aa - (sivma) is enlarged from one of the monomers.
dhori virus, a new agent isolated from hyalomma dromedarii in india
therapeutic efficacy of favipiravir against bourbon virus in mice
preclinical activity of vx- , a first-in-class, orally bioavailable inhibitor of the influenza virus polymerase pb subunit
the cap-snatching endonuclease of influenza virus polymerase resides in the pa subunit
metal ion catalysis of rna cleavage by the influenza virus endonuclease
the structure of the nucleoprotein of influenza d shows that all orthomyxoviridae nucleoproteins have a similar npcore, with or without a nptail for nuclear transport
antibodies to congo-crimean haemorrhagic fever, dhori, thogoto and bhanja viruses in southern portugal
essential role of interferon response in containing human pathogenic bourbon virus
favipiravir (t- ), a broad spectrum inhibitor of viral rna polymerase
mechanism of action of t- against influenza virus
thogoto virus: a hitherto undescribed agent isolated from ticks in kenya
rescue of influenza b virus from eight plasmids
tick-borne viruses in europe
bourbon virus in wild and domestic animals
family orthomyxoviridae
development of chemical inhibitors of the sars coronavirus: viral helicase as a potential target
rna and dna hydrolysis are catalyzed by the influenza virus endonuclease
novel thogotovirus associated with febrile illness and death
molecular, serological and in vitro culture-based characterization of bourbon virus, a newly described human pathogen of the genus thogotovirus
batken virus, a new arbovirus isolated from ticks and mosquitoes in kirghiz s
arthropod-borne viral infections of man in nigeria
characterization of influenza virus variants induced by treatment with the endonuclease inhibitor baloxavir marboxil
differential inhibitory effects of various flavonoids on the activities of reverse transcriptase and cellular dna and rna polymerases
bourbon virus in field-collected ticks
surveillance for tick-borne viruses near the location of a fatal human case of bourbon virus (family orthomyxoviridae: genus thogotovirus) in eastern kansas
myricetin: a dietary molecule with diverse biological activities
molecular characterization of infectious clones of the minute virus of canines reveals unique features of bocaviruses
development of a novel recombinant adeno-associated virus production system using human bocavirus helper genes
swiss-model: homology modelling of protein structures and complexes
endonuclease activity inhibition of the ns protein of parvovirus b as a novel target for antiviral drug development
identification of myricetin and scutellarein as novel chemical inhibitors of the sars coronavirus helicase, nsp
we are grateful to dr. zekun wang and other members of the qiu laboratory for technical support and valuable discussions. we thank dr. wenjun ma at kansas state university for providing the phw plasmid. the following reagent was obtained through bei resources, niaid, nih: bourbon virus, original, nr- . this study was supported by phs grant ai from the national institute of allergy and infectious diseases. the funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
key: cord- -hynnba authors: wong, ten-tsao; liou, gunn-guang; kan, ming-chung title: a self-assembled protein nanoparticle serving as a one-shot vaccine carrier date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hynnba
in this paper, we explore the role of an amphipathic helical peptide in mediating the self-assembly of a fusion protein into a protein nanoparticle and the application of the nanoparticle as a one-shot vaccine carrier. out of several candidates, an amphipathic helical peptide derived from the m protein of type a influenza virus was found to confer high antigenicity when genetically fused to a fluorescent protein. this fusion protein was found to form a protein nanoparticle spontaneously when expressed, and the purified protein stimulates long-lasting antibody responses after a single immunization. through modeling of the peptide structure and nanoparticle assembly, we have improved the complex stability of this vaccine carrier. the revised vaccine carrier is able to stimulate a constant antibody titer to a heterologous antigen for at least six months after a single immunization. the immune response against a heterologous antigen can be boosted further by additional immunization in spite of high immune responses to the carrier protein. subunit vaccines are a safe alternative to traditional inactivated or attenuated vaccines, but their efficacy is often hindered by the low antigenicity of recombinant protein. different approaches are utilized to resolve this issue; among them, virus like particle (vlp) and self-assembled protein nanoparticle (sapn) are considered the best platforms for subunit vaccine development ( ).
vlp is assembled from a recombinant capsid protein alone without the genomic nucleic acid and bound nucleocapsid protein so it is non-infectious ( ) . the size of vlp is ranged between - nm that facilitates both draining efficiently to lymph node and also uptake by antigen presenting cells like dendritic cell and macrophage ( ) . the other benefit of vlp based vaccine is the induction of b cell receptor clustering when presenting repetitive antigen to b cell, a function that can activate antibody class-switch and somatic hypermutation in a t cell dependent mechanism ( ) . not only the virus like particle can be used for vaccine directly, heterologous antigen can be presented in the particle surface through genetic fusion ( ) . the universal flu vaccine candidate, ectopic m domain (m e), was genetically fused with hepatitis b core antigen(hbc) and assembled into a nanoparticle that provides full protection to homologous flu strain ( ) . but the application of hbc based vlp in human vaccine development is restricted by the pre-existing anti-hbc antibody present in the million chronic hbv carriers and the population exposed to hbv infection ( ) . an artificially designed sapn may avoid the effect of the existing antibody. a sapn assembled from protein constituted by two coiled-coil domains that form trimer and pentamer respectively can be assembled into nanoparticles with specific sizes ( , ) . this sapn stimulates strong immune responses to target protein fused to the terminal of constituting monomer after immunizations even without adjuvant, but this immunity waned gradually ( ) . the green fluorescent protein is a member of fluorescent protein family that are structurally conserved and emit fluorescent light from a chromophore when excited by photons of shorter wavelength ( ) . the shared features of fluorescent proteins including a sturdy barrel shaped structure constituted by β-sheets and an enclosed chromophore that emits fluorescent light when excited ( ) . 
the function of the barrel shell is to provide a well organized chemical environment to ensure the maturation of chromophore and protects it from hostile elements ( ) . so it is conceivable that the protein sequences among fluorescent protein family members in the barrel shell are highly variable and fluorescent proteins possess desirable biophysical properties can be selected using directed evolution ( , ( ) ( ) ( ) ( ) . the applications of fluorescent protein have been expanded into multiple areas beyond live imaging which includes serving as biological sensors ( , ) , or detectors for protein-protein interaction or protein folding ( , ) . amphipathic α-helical peptide (ahp) forms hydrophilic and hydrophobic faces when folded and is often identified in proteins related to phospholipid membrane interaction. the n-terminal amphipathic helical peptide is required for membrane anchorage of hepatitis c virus ns protein and the protease function of the ns /ns a complex ( , ) . several anti-microbial peptides also possess amphipathic properties and function by forming membrane pores or causing membrane disruption ( ) . the amphipathic α-helical of type a influenza virus m protein is required for m protein anchorage and induces membrane curvature required for virus budding ( , ) . life protection from diseases through vaccination is considered a holy grail only achievable by some live attenuated viral vaccines with multiple doses up to date ( ) . here in this report, we are describing the identification of a protein nanoparticle based on an amphipathic α-helical peptide (ahp) from m protein of type a influenza strain h n and a green fluorescent protein. we have predicted the protein nanoparticle structure according to transmissive electronic microscope images using protein modeling and generated ahp mutants that provide higher stability and antigenicity to the protein nanoparticle. 
this modified protein nanoparticle is able to stimulate a long-lasting, constant antibody response to an inserted heterologous antigen after a single immunization without adjuvant, and this antibody response can be further boosted by additional immunization. the identification and application of this nanoparticle as a vaccine carrier were explored in this study. identification of an ahp-gfp protein complex with high antigenicity and stability. as described in our patent application filed in , we have tested the immunogenicity of fusion proteins composed of an ahp and a gfp ( ). the results showed an increase of anti-gfp igg titer ranging between ~ log under a two-immunization regime (figure s ). one of the peptides, ah , which is derived from the m protein of type a influenza strain h n , gives extended stability to the gfp fusion protein when compared to another peptide, ah (figure s ), as well as to other peptides in our study (data not shown). since a stable protein is essential for a vaccine carrier and may benefit worldwide vaccination efforts, we were interested in the mechanism of ah -gfp stability and antigenicity. to study the potential mechanisms that contribute to the above mentioned properties of the ah -gfp fusion protein, we first checked the composition of the ah -gfp protein after expression and purification. one clue that led us to study the composition of the ah -gfp fusion protein was the difficulty encountered during protein purification. unlike other fusion proteins studied, both ah -gfp and ah -gfp fusion proteins are mostly expressed as insoluble inclusion bodies, and the remaining soluble protein did not bind to ni-nta resin under the normal condition of mm nacl. the ah -gfp and ah -gfp fusion proteins only started to bind to ni-nta resin after lowering the nacl concentration from mm to mm, an indication that hydrophobic interaction may induce a protein complex in which the n-terminal his tag is hindered from binding to the ni-nta ligand.
also, the resistance of the ah -gfp fusion protein to hydrolysis suggested that the linker between the ah peptide and gfp is kept in a water-tight complex. from these two clues, it was hypothesized that the ah -gfp fusion protein forms a stable protein complex through hydrophobic interaction mediated by the n-terminal ah peptide. to test the hypothesis that the ah -gfp or ah -gfp fusion protein forms a protein complex, we first used protein concentration tubes with different molecular weight cut offs (mwco) to determine the protein complex sizes. as shown in figure a, whereas gfp protein with a molecular weight of kda is able to pass freely through membranes with mwco of kda, kda and kda, ah -gfp fusion protein purified from bacterial lysate was prevented from passing through the membrane with an mwco of kda. with a molecular weight of kda, the purified ah -gfp protein needs to form a complex of more than monomers to be excluded from passing a membrane with a kda mwco. to explore further the geometric composition of the ah -gfp protein complex, we examined the fusion protein under a transmission electron microscope (tem). the tem results showed that the ah -gfp fusion protein forms a cylinder-like structure with a length up to ~ nm and a diameter around nm (fig. b). the difference in length suggests that the particle may be assembled along the long axis. when scanning along the long axis of the ah -gfp particle, there is a repetitive two-one-two-one pattern of white dots, with two less visible dots on each side of the single dot. the predicted structure according to the tem images is shown in fig. d. we also examined the geometric composition of ah -gfp under tem, but there was no clear evidence of a higher order protein complex, suggesting that the ah -gfp protein complex is not as stable as ah -gfp and does not withstand the conditions of negative staining.
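the size-exclusion argument above is a one-line calculation: a complex is retained by a membrane only if its total mass exceeds the cutoff, so the smallest retained complex has floor(mwco / monomer mass) + 1 subunits. the sketch below makes that arithmetic explicit. the monomer mass and cutoff values are hypothetical placeholders (the actual values are not recoverable from this text), and a real mwco is a nominal figure that also depends on molecular shape, so the result is only an order-of-magnitude guide.

```python
import math


def min_subunits_retained(monomer_kda, mwco_kda):
    """Smallest whole number of monomers whose combined mass exceeds the cutoff.

    Treats the MWCO as a sharp mass threshold, which is an idealisation.
    """
    if monomer_kda <= 0 or mwco_kda <= 0:
        raise ValueError("masses must be positive")
    return math.floor(mwco_kda / monomer_kda) + 1


# hypothetical values: a 30 kDa fusion protein and two membrane cutoffs
for cutoff in (100, 300):
    n = min_subunits_retained(30, cutoff)
    print(f"retained by a {cutoff} kDa membrane: at least {n} monomers")
```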
to find the correlation between protein complex formation and antigenicity, we immunized mice with purified ah -gfp fusion protein and the recombinant gfp protein that pass through the membrane freely. proteins were prepared from lps synthesis defective e. coli strain, clearcoli bl (de ), to avoid the interference of lps contamination, a known tlr ligand. the mice were immunized with purified proteins by single intramuscular injection and sera were collected at day , , and to evaluate anti-gfp igg titer by elisa. deoxycholate was added to test if deoxycholate in the concentration of . % affects ah -gfp antigenicity and related experiment was terminated at days post immunization when it showed no effect on antigenicity of either gfp or ah -gfp. these results suggest gfp alone is a poor antigen and only gained high antigenicity after fused with ah peptide (fig. c) . to understand the potential molecular mechanism leading to the assembly of ah -gfp nanoparticle, we tried to build a protein model based on three observations: first, the particle was assembled through hydrophobic interaction and second, the particle assembled along the long axis and the third: ah -gfp protein particle has a repetitive three-two pattern when observed under tem. we first assembled the two ah peptide as anti-parallel helices with hydrophobic sidechains of f , f , i , l and l mediating intermolecular contacts. the assembled ah dimer created a hydrophobic core between two helices and the helix dimer is surrounded by hydrophilic sidechains from multiple lysine and arginine except one exposed hydrophobic patch, as predicted using deepview and marked by white mesh (fig. a) . this hydrophobic patch can be seen only to cover one face of the dimer (fig. b ). when the water accessible surface of ah dimer was calculated, two cavities could be seen located within the hydrophobic patch that provides contact points for two arg sidechains extruding from the opposite face of ah dimer. 
a second ah dimer can make close contact with the first dimer after turning counter clockwise looking down the hydrophobic patch for o and forms a tetramer (fig. c) . the intermolecular energy between two dimers from this model was calculated to has a Δg of - kcal/mol (fig. c) . after adding gfp protein structures onto the ah tetramer model, the ah -gfp fusion protein tetramer will form a cross-shaped assembling unit and the stacking of every ah -gfp tetramer on top of another tetramer will extend the particle length by . nm and turning the cross by o . since the gfp protein barrel diameter is ranged between . ~ . nm, the out extending gfp from ah -gfp tetrameric cross can spatially fit with the model (fig. d ). under this model, the protein nanoparticle will be extended continuously with a hydrophobic patch presenting on one end of the assembled particle constitutively and serving as a point for polymerization. after proving that the ah -gfp protein complex possesses high antigenicity, we decided to explore the application of ah -gfp protein nanoparticle as a vaccine carrier. for the hepatitis b core antigen, the amino acid served as an insertion site for heterologous antigen fusion ( ) . gfp protein has a thermal stable structure that constituted by β-strands and α-helices and some of the loops between strands have been explored as insertion sites for heterologous protein for various purposes ( , , ) . among those candidates, loop linking strand and strand was chosen because it has a high capacity for foreign peptide insertion (fig a) ( ) . the original ah -gfp recombinant protein is constructed in pet a vector with ah coding region inserted c-terminal to his-tag and thrombin cleavage site followed immediately by gfp cloned from pegfp-c . this expression vector was low in soluble protein productivity and unable to express soluble recombinant protein when the peptide is inserted between d and g . 
to resolve the expression and folding issues, we designed a new expression vector. first, we cloned ah peptide into the very n-terminal following methionine in the pet vector, and then to its c-terminal we inserted a synthesized sfgfp gene ( ) with a created antigen insertion site following s of sfgfp. the antigen insertion site also included an xhis tag for recombinant protein purification. to verify vaccine carrier function, we inserted two copies of broad spectrum flu vaccine candidate, human m ectopic peptide (hm e) separated by a amino acid linker (fig. b) . the newly constructed vector was proven to be efficient for expressing soluble ah -sfgfp- hm e fusion protein as a protein complex (data not shown). the ah -sfgfp- xhm e protein complex under tem is not as stable as ah -gfp unless first cross-linking the protein preparation with a heterobifunctional protein crosslinker, sulfo-smcc (fig. c ). following the previous established ah -gfp protein model, we were seeking strategies to create a more stable ah -sfgfp protein complex. first, we found the mutation of isoleucine to leucine increase the intermolecular interaction(Δg) from - kcal/mol to - kcal/mol (fig. d) . second, we mutated lysine to glutamic acid and generated additional electrostatic interactions between side chains of glu with arg and arg (fig. e ). to verify whether the protein modeling results are correct, including the presence of hydrophobic patch on ah peptide complex and a higher stability of ah based protein complex with i l and/or k e mutations. we first generated mutations in ah peptide in the context of ah -sfgfp- xhm e construct (fig. a ). our hypothesis is that the hydrophobic patch of the ah -gfp complex will bind bacterial membrane and co-sediment with it during ultracentrifugation. 
the methodology was first verified by centrifuging bacterial lysate prepared from a clearcoli culture in a centrifuge tube preloaded with %(w/v), %(w/v) and %(w/v) sucrose solutions in volumes of ml, ml and ml, respectively (fig. b, left panel). the distribution of bacterial membrane was marked by a lysochromic dye, sudan iii. the control sample contained sudan iii with lysis buffer alone (fig. b lane ). after ultracentrifugation, the bacterial membrane sedimented to the junction of the %/ % sucrose solutions (fig. b lane ). using the same protocol, ah -gfp was found to co-sediment with the bacterial membrane (fig. s a), as was the ah -sfgfp-hm e fusion protein, but not free gfp protein (fig. s b). these results are consistent with our protein structure modeling, in which arg serves as a main contact point for dimer stacking and also mediates the electrostatic interaction with the mutated glu (fig. e). as shown in the sucrose step gradient result, the presence of the hydrophobic patch enables nonspecific interaction of the ah -gfp protein complex with phospholipid membrane, which may restrict the free movement of the protein nanoparticle and keep it from reaching the draining lymph node to stimulate immunity ( ). to compare the antigenicity of protein complexes derived from either ah -sfgfp- xhm e or lyrrle-sfgfp- xhm e, we immunized mice with a single injection of either recombinant protein. post immunization, sera were collected at day , , and to evaluate anti-hm e igg titer by elisa. the geometric mean titer (gmt) of anti-hm e igg reached its highest point for the ah -sfgfp- xhm e group and then declined afterward. for the lyrrle-sfgfp- xhm e group, however, the gmt reached its highest point at day and remained steady up to day (fig. s ). when the results for individual mouse sera are examined separately, only one out of mice from the ah -sfgfp- xhm e group has a higher anti-hm e igg titer at day than at day .
in contrast, out of mice from the lyrrle-sfgfp- xhm e group show a higher antibody titer at day compared to day (fig. a). these results suggest that the two point mutations of ah , i l and k e, enable the formation of a stable, highly antigenic protein complex that stimulates long lasting immune responses after a single immunization. vaccine carriers like hbc based virus like particles (vlp) often fail at boosting humoral immune responses after the prime dose in a multiple-dose protocol due to antigen competition. although gfp is a protein of low antigenicity, the fusion with ah strongly enhances its antigenicity, as shown in figure c. to test whether the sfgfp backbone competes with the inserted hm e peptide for the immune machinery, we immunized mice in a prime-boost protocol using the same protein preparations. the two consecutive injections were carried out days apart, and sera collected at day , and were subjected to elisa using either hm e peptide or sfgfp protein as the coating antigen. the result shows that the igg titer against hm e, as well as the anti-sfgfp igg titer, rose continuously after consecutive immunizations for both proteins. the result suggests that although the carrier protein ah -sfgfp also has high antigenicity, it did not interfere with the immune response against the heterologous peptide, hm e (fig. b, c). vaccination is the most cost-effective strategy for preventing infectious disease. attenuated viral vaccines in particular, such as vaccinia, mmr or oral polio vaccine, produce long lasting, even lifetime, protective immune responses, but these attenuated viral vaccines took decades to develop. this strategy therefore cannot deliver a vaccine in time to ward off an emerging global pandemic like covid- .
although new vaccine technologies like dna vaccines, mrna vaccines or adenovirus based vaccines can quickly yield a subunit vaccine once the genomic information of a pathogen becomes available, the immune responses generated often decline to base level within a year ( ) ( ) ( ) ( ). this short lived immune response may expose vaccinated people to the risk of antibody dependent enhancement (ade), which is known to be devastating and leads to vaccine failure ( , ). in this study, we have created a self-assembled protein nanoparticle composed of ah -fp and a more stable variant that stimulates long lasting antibody responses. the nature of these long lasting immune responses is not known, but they may be mediated by long lasting plasma cells generated during ah -fp immunization, the same mechanism that accounts for the lifelong protection of attenuated viral vaccines ( ). the fluorescent protein family is a group of proteins with a conserved barrel shaped structure and a chromophore buried inside. since the function of this beta strand constituted barrel is to provide a suitable environment for chromophore maturation, the primary sequence of the fluorescent protein barrel is amenable to mutagenesis through either direct selection ( ) or evolution ( ). also, fluorescent proteins of desired biophysical properties like thermal stability or folding efficiency can be obtained
key: cord- - owvqw authors: saunders, jaclyn k.; gaylord, david; held, noelle; symmonds, nick; dupont, chris; shepherd, adam; kinkade, danie; saito, mak a. title: metatryp v . : metaproteomic least common ancestor analysis for taxonomic inference using specialized sequence assemblies - standalone software and web servers for marine microorganisms and coronaviruses date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: owvqw we present metatryp version- software that identifies shared peptides across organisms within environmental metaproteomics studies to enable accurate taxonomic attribution of peptides during protein inference. improvements include: ingestion of complex sequence assembly data categories (metagenomic and metatranscriptomic assemblies, single cell amplified genomes, and metagenome assembled genomes), prediction of the least common ancestor (lca) for a peptide shared across multiple organisms, increased performance through updates to the backend architecture, and development of a web portal (https://metatryp.whoi.edu). major expansion of the marine database confirms low occurrence of shared tryptic peptides among disparate marine microorganisms, implying tractability for targeted metaproteomics. metatryp was designed for ocean metaproteomics and has been integrated into the ocean protein portal (https://oceanproteinportal.org); however, it can be readily applied to other domains. we describe the rapid deployment of a coronavirus-specific web portal (https://metatryp-coronavirus.whoi.edu/) to aid in use of proteomics on coronavirus research during the ongoing pandemic. a coronavirus-focused metatryp database identified potential sars-cov- peptide biomarkers and indicated very few shared tryptic peptides between sars-cov- and other disparate taxa, sharing . % peptides or less ( peptide) with the influenza a & b pan-proteomes, establishing that taxonomic specificity is achievable using tryptic peptide-based proteomic diagnostic approaches. statement of significance when assigning taxonomic attribution in bottom-up metaproteomics, the potential for shared tryptic peptides among organisms in mixed communities should be considered. 
the software program metatryp v and associated interactive web portals enable users to identify the frequency of shared tryptic peptides among taxonomic groups and evaluate the occurrence of specific tryptic peptides within complex communities. metatryp facilitates phyloproteomic studies of taxonomic groups and supports the identification and evaluation of potential metaproteomic biomarkers. 
in metaproteomics, the mixture of a large number of organisms within each sample collected from a natural environment creates challenges in the attribution of peptides to specific proteins. this is especially problematic in instances where exact tryptic peptide sequences are shared between two or more organisms. this potential for shared peptides across proteins can create uncertainty in protein inference and taxonomic attribution. in bottom-up proteomics, the primary method used in metaproteomics to date, whole proteins are typically digested into smaller peptides with the enzyme trypsin. since bottom-up metaproteomics directly measures these short tryptic peptides, as opposed to entire protein sequences, it is essential to understand the degree of shared peptides across proteins and taxonomic groups when assigning attributes of diverse environmental communities. previously, we described the development of the metatryp software which evaluates multiple organisms for shared peptides [ ] . metatryp takes the full predicted proteome of an organism based on its reference genome, performs an in silico tryptic digestion of the proteins, and then stores the tryptic peptides of that organism within a single sql database. multiple taxa proteomes are stored within the sql database. 
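the digest-and-store workflow described above can be illustrated with a minimal python sketch. this is not the metatryp code: the cleavage rule (cut after k or r unless the next residue is p, no missed cleavages), the length window `MIN_LEN`/`MAX_LEN`, the single-table layout and the function names are all assumptions chosen for illustration, and sqlite is used only because it ships with python.

```python
import re
import sqlite3

# Assumed peptide length window (hypothetical values, not taken from the paper).
MIN_LEN, MAX_LEN = 6, 22


def tryptic_digest(protein, min_len=MIN_LEN, max_len=MAX_LEN):
    """In silico digest: cleave after K or R unless followed by P.

    Assumes no missed cleavages. Requires Python 3.7+ because the
    pattern matches empty strings between residues.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if min_len <= len(p) <= max_len]


def build_db(proteomes):
    """Store the peptides of every organism in one in-memory SQL table.

    `proteomes` maps an organism name to a list of protein sequences.
    The table layout is a hypothetical simplification.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organism_peptide ("
        "organism TEXT, peptide TEXT, UNIQUE (organism, peptide))"
    )
    for organism, proteins in proteomes.items():
        for protein in proteins:
            conn.executemany(
                "INSERT OR IGNORE INTO organism_peptide VALUES (?, ?)",
                [(organism, pep) for pep in tryptic_digest(protein)],
            )
    conn.commit()
    return conn


def organisms_with(conn, peptide):
    """Return the sorted list of organisms whose digest contains `peptide`."""
    rows = conn.execute(
        "SELECT organism FROM organism_peptide WHERE peptide = ? "
        "ORDER BY organism",
        (peptide,),
    )
    return [row[0] for row in rows]
```

calling `organisms_with(conn, peptide)` on such a database answers the basic question of which stored taxa share a given tryptic peptide; a production schema would normalize peptides and organisms into separate tables.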
using metatryp tools, the database can be queried to identify how many taxa share a specific peptide (or list of peptides); it can also report the total number of tryptic peptides shared across multiple organisms and support other phyloproteomic analyses. the former application has aided in the development of targeted metaproteomic biomarkers for assessing environmental changes in space or time [ ] [ ] [ ] [ ] . a useful result of the latter application was the observation that the percentage of shared peptides between distinct marine microbial taxa was low, often in the single digit percentages, implying that the design of biomarker targets for species or even subspecies level analyses was tractable if sufficient care was taken. in this manuscript, we describe version of the metatryp software (https://github.com/whoigit/metatryp- . ). we have added features to improve its usability and performance. a major improvement was the addition of new data categories for different sequencing assembly methods, specifically those associated with assembled metagenomic and metatranscriptomic data as well as single cell amplified genomes (sags). metatryp v now supports three data categories: "genomes" for reference cultured isolates, "specialized assemblies" from sags and mags, and "meta-omic assemblies" from metagenomic and metatranscriptomic assemblies. this greatly expands the utility of metatryp, since cultured genomes are often unavailable for natural environmental populations because many organisms are difficult to culture with classical microbiological techniques [ ] or belong to only recently identified taxa. as a result, the availability of single cell genomes amplified and sequenced from the ocean environment (sags), metagenome assembled genomes (mags), and assembled metagenomic and metatranscriptomic data can contribute greatly to the identification and interpretation of metaproteomic data. 
yet because these metagenomic and metatranscriptomic resources have varying levels of completeness and confidence in their functional and taxonomic assignments, maintaining them as separate categories of tryptic peptides within the database structure is particularly useful. in addition, metatryp v now supports the calculation of a least common ancestor (lca) among shared tryptic peptides. for comparison, the unipept web portal [ ] has some similar functionality in identifying shared tryptic peptides and interpreting the least common ancestor; however, unipept relies on the uniprot database which does not incorporate the wealth of environmental meta-omic sequencing available. also, the unipept portal does not support local curated database construction where users can evaluate unpublished sequencing resources like those of newly sequenced organismal genomes or novel environmental sequencing not yet available in the uniprot curated database whereas metatryp can be installed locally for use with custom curated databases. the addition of these new sequence assembly data categories enables better prediction of shared peptides through enhanced representation of environmental sequence variability. the metatryp marine web portal currently contains a total of , , unique peptides from , , submitted protein sequences combined across all three data categories. multiple improvements to the metatryp software architecture and additional features were added to v . in order to improve performance speed and support these larger data categories (especially, metagenomic and metatranscriptomic assemblies), the metatryp sql backend was converted from sqlite in metatryp v to a postgresql backend in metatryp v . additional software and postgresql implementations support the lca analysis. metatryp v uses the same tryptic digest rules applied to metatryp v , following trypsin-based digestion rules for proteins with peptides - amino acids in length [ ] . 
here we describe the technical improvements within metatryp v , then demonstrate how metagenomics resources allow increased understanding of least common ancestor interpretations of metaproteomic results. additionally, we provide an overview of the metatryp web portal for marine microorganisms and the rapid deployment of a coronavirus-specific metatryp web portal, demonstrating the application of metatryp to various research fields. finally, a private api was added providing lca analysis functionality to the ocean protein portal [ ] , enabling metatryp to be inserted into other pipelines in the future. metatryp is built upon a relational database management system (rdbms). version was built using a sqlite database. while this database management system was sufficient for single reference genomes, it lacked the speed needed for expanded sequencing data categories. in order to increase the speed of database construction (ingestion of proteomes) and database searching, and to expand functionality, we have upgraded the database management system to a postgresql backend, which is an object-relational database. the postgres backend has provided improvements in speed, as well as enhanced flexibility in searching. the advent of environmental de novo sequencing and assembly has revealed an entire realm of microorganisms previously unknown to the scientific world, as many environmental microorganisms are not readily isolated using classical microbiological techniques [ ] . whereas metatryp v focused on the construction of a search database using reference organismal genomes, metatryp v is capable of handling newer types of sequencing and assemblies, thus opening the search space to a much greater range of organisms likely found within the environment of interest. 
within the field of metaproteomics, there has been great emphasis placed on the need for curated and appropriate search databases to be utilized for peptide-to-spectrum matching (psm) [ , ] ; this also holds true for the construction of a metatryp database for evaluation of shared tryptic peptides in an environmental sample. metatryp relies upon protein sequences predicted from genomic sequencing and does not currently take into account any post-translational modifications (ptms). in order to expand the environmentally-relevant search space, metatryp v now handles newer assemblies from sources like metagenome and metatranscriptome assemblies, metagenome-assembled genomes (mags), and single cell amplified genomes (sags). the incorporation of these newer data categories, in addition to the traditional single organism reference genome, greatly expands the environmental variability (and therefore, potential for shared tryptic peptides) within an environmental sample. in order to manage the larger sequencing databases generated by incorporation of this environmental data, and thus the greater burden of a larger search space, improvements were made to the back-end search database (see section "database back-end upgrades"). the database schema (figure ) for metatryp was expanded not only to include these different data categories, but also to identify them as separate search spaces, because the uncertainty of taxonomic identity varies among the sequencing categories and should be taken into consideration during interpretation. the new data categories roughly mirror the original reference genome schema. however, additional tables are required to properly map meta-omic data, as multiple taxa are contained within a single meta-omic assembly file. environmental sequencing data is also more likely to contain ambiguous bases in assemblies. 
these are positions where the correct amino acid is uncertain, sometimes as a result of low-quality base calling by the sequencing technology, or because the assembly supports multiple possibilities for the amino acid at a single location that cannot be resolved. during the ingestion phase, metatryp v will recognize ambiguous residues, specifically the symbol "x", which represents an unknown or ambiguous amino acid. as the specific amino acid represented by "x" is unknown, the exact tryptic peptide cannot be predicted. during ingestion, metatryp will identify proteins which contain an "x" and report the affected protein to stdout. if a tryptic peptide contains the "x", that peptide will not be included in the metatryp peptide table; however, all other tryptic peptides from that protein will be included in the peptide table. sequence homology is conserved among more closely related organisms. however, it is possible that tryptic peptides, especially shorter ones around amino acids, may occur by chance across multiple taxa without a direct shared ancestry. in order to identify whether shared tryptic peptides arise through shared evolutionary history or through stochastic variance in sequence, we have added least common ancestor (lca) analysis to metatryp v . the lca analysis incorporates the phylogenetic lineage of the sequences imported into the metatryp databases (figure ), then calculates the "least common ancestor" by finding the unifying phylogenetic point for all the organisms containing the tryptic peptide queried. in order for metatryp to identify the common point in the taxonomic lineage, it requires a consistent taxonomic lineage to be used across the database for each proteome submitted. 
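the idea of a "unifying phylogenetic point" can be sketched in a few lines of python. the sketch assumes that each organism's lineage is already available as a list ordered from root to leaf and that all lineages use the same ranks in the same order; the function name and the abbreviated lineages in the usage note are illustrative only and do not reproduce the metatryp implementation.

```python
def least_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage, or None.

    `lineages` is a list of lineages; each lineage is a list of taxon
    names ordered from root to leaf. The walk stops at the first rank
    where the lineages disagree, or where the shortest lineage ends.
    """
    if not lineages:
        return None
    lca = None
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            lca = taxa_at_rank[0]
        else:
            break
    return lca
```

for example, with the simplified lineages `["Bacteria", "Cyanobacteria", "Synechococcales", "Prochlorococcus"]` and `["Bacteria", "Cyanobacteria", "Synechococcales", "Synechococcus"]`, the function returns `"Synechococcales"`; adding a lineage that shares only the root, such as `["Bacteria", "Verrucomicrobia", "Verrucomicrobiales"]`, moves the result up to `"Bacteria"`. this is why a consistent lineage scheme across the whole database matters: rank-by-rank comparison is only meaningful when every lineage lists the same ranks.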
for metatryp the shared phylogeny of the taxonomic groups is identified by pulling the taxonomic lineages from the national center for biotechnology information (ncbi) taxonomy database [ ] . for the creation of a user-generated metatryp database capable of lca analysis, the user can submit the ncbi taxon id number (taxid) for the input sequence files, and metatryp will pull the taxonomic lineage information for each organism using biopython [ ] and pandas [ ] libraries in python . this lineage information is then used to calculate the lca among the organisms with shared peptides via the postgresql longest common ancestor function, which metatryp uses to return the lca for each sequencing data category. a primary goal of releasing the metatryp software originally was to enable other users to create and curate customized databases for searching tryptic peptides, specifically with a focus on marine microbial communities. metatryp v expands on this goal through the creation of a web server and api which can be queried easily by users, without the need to install and run the software locally. the metatryp v site can be found at https://metatryp.whoi.edu/. this web server takes as input a peptide sequence, multiple peptides, or a full protein sequence submitted into a text box by a user which is then in silico digested into tryptic peptides. metatryp then searches for the occurrence of these peptides across three different marine-specific data categories: an organismal reference genome data category ("genomes") same as metatryp v (si table ), and new data categories for "specialized assemblies" (si table ) which currently contains , archaeal and bacterial mags [ , ] assembled by binning metagenomic sequences [ ] [ ] [ ] from the tara oceans sequencing project [ ] , and a metagenomic & metatranscriptomic assembly data category ("meta-omic assembly") [ , , ] (si table ) which currently contains , , predicted proteins. 
ideally, additional mags and sags will be added to the metatryp web portal database in an effort to broaden taxonomic coverage in the marine environment. the addition of eukaryotic sags [ ] would significantly extend the diversity of the current database. results from a metatryp query are then returned to the user in an interactive drop-down genomes & microbiomes (img) genome ids are also shown, where available, with links out to img as the predicted proteomes for all those in the "genome" category were collected from in the current version of this database [ ] . metatryp can also compare entire organismal proteomes to identify the frequency of shared peptides across taxa within a given sequencing data category. those more familiar with nucleic acid sequencing often incorrectly imagine that because translation to amino acid space results in loss of variable dna codon information, peptides will lack the ability to taxonomically resolve species or subspecies. this feature previously existed in metatryp v for generating peptide redundancy tables within the "genome" sequencing data category and was used to show the relatively low occurrence of shared peptides across disparate taxa in the open ocean microbiome [ ] . this feature can now be implemented within metatryp v on all three major data categories: "genome", "meta-omic assemblies", and "specialized assemblies". within the web portal, there is now a visualization tool for creating ordered heatmaps of shared peptide frequencies among taxa for the genome and meta-omic data categories. this visualization page, "peptide redundancy heatmaps", was built in python . using the jupyter environment [ ] , pandas [ ] , and seaborn [ ] . 
users can select what taxa they wish to compare within a given data category, and a heatmap is generated and displayed on the page below in a a specified data category where the percentage is calculated as the number of shared peptides between taxon a and taxon b divided by the total number of peptides in taxon a. given the varying levels of genome completeness for a specific taxon in the "specialized assembly" data category, this percentage should be viewed with more caution. due to the aggregate nature of meta-omic assemblies, where many taxa of highly variable coverage depth are present within each dataset, this heatmap visualization feature is not currently supported in the web portal for this data category. the capabilities of the metatryp software make it applicable to scientific domains outside of respiratory syndrome-related (mers) coronavirus strains [ ] , and strains associated with the common cold [ ] like human coronavirus strains nl [ ] , hku [ ] and e [ ] for a total of coronavirus taxa in the database. it also contains the human proteome, the african green monkey (chlorocebus aethiops sabaeus) proteome as it is the taxonomic source of the vero cell line commonly used in virus replication studies and plaque assays [ ] , common oral bacteria [ ] , six lactobacillus strains associated with the human microbiome [ ] , the most common influenza strains (influenza a: h n & h n ; influenza b) as well as other influenza strains, and common proteomic contaminants in the crapome [ ] . all taxa included in the coronavirus database and their associated ncbi taxonomy ids (si table ) are listed on the databases page in the web portal (https://metatryp-coronavirus.whoi.edu/database). in order to capture the variability of sequences, we pulled all the proteins (aside from those in the crapome) for each taxon from the ncbi identical protein groups (ipg) database using taxon sequence identifiers (si table ). 
the ipg database enables collection of a single non-redundant entry for each protein translation found from several sources at ncbi, including annotated coding regions in genbank and refseq, as well as records from swissprot and pdb [ ] . one sequence for each identical protein group was collected for ncbi taxa with >= identical protein groups, collecting proteins from the specified taxon and all its children from the ncbi taxonomy database. however, for the taxon "severe acute respiratory syndrome-related coronavirus" (ncbi txid: ), only protein groups from that specific txid level were recruited (using the flag "txid [organism:noexp]" in the query). this taxon should only contain sars-cov- related proteins; however, it may contain some non-sars-cov- sequences due to inconsistent nomenclature during the emergence of this relatively new pathogen. proteins were collected using biopython [ ] and the ncbi entrez [ ] e-utilities api [ ] . by using the identical protein groups, the database captures sequence variability while reducing redundancy in database construction. this non-redundant collection of protein sequences per taxon is essentially the known pan-proteome for each taxon: the predicted proteome of a single organism's genome plus all the sequence variation captured from sampling a population of organisms within a taxonomic group. for example, the taxon homo sapiens pan-proteome from ipg has > , , proteins capturing sequence variability from the population of human sequences in the ncbi database, whereas an individual human genome contains < , protein coding genes [ ] . because sequencing effort varies among taxa, the size of the proteomes in this database may vary. 
for example, the taxon severe acute respiratory syndrome coronavirus (sars-cov- , -ncov, covid- virus; taxid ) has , unique identical protein groups whereas the taxon bat coronavirus isolate ratg (taxid ), sars-cov- 's closest sequenced animal virus precursor [ ] , has unique identical protein groups as of this writing (may , ) even though the genomes of these viruses are roughly the same length. since the population of sars-cov- has been sequenced more frequently, more sequence variability has been captured in the ncbi databases. due to the collection of the pan-proteomes for the coronavirus metatryp database, the peptide redundancy heatmaps need to be viewed with caution, as taxa which have received a higher degree of sampling effort will have more peptides associated with them. therefore, it is important to take into consideration both combinations of pairwise taxa comparisons for heatmap calculations (both sides of the heatmap separated by the diagonal). for example, one should evaluate the percentage of shared peptides between sars-cov- and ratg where the total number of tryptic peptides for sars-cov- is in the denominator and also where the total number of tryptic peptides in the denominator is for ratg . when calculating peptide redundancy between organisms, metatryp reports this calculation as "individual percent". the percentage of shared peptides across taxa can also be calculated by the number of peptides shared between taxa with the combined total number of peptides for both taxa in the denominator, metatryp reports this as "union percent". however, this also needs to be interpreted with care as when comparing a taxon that may have only been sequenced once (few total peptides) with a broadly sampled taxon (many peptides due to environmental variability), the signal of the rarely sampled taxon may be reduced due to the large n of the heavily sequenced organism. 
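the two redundancy measures can be compared side by side with a small python sketch. it treats each taxon as a set of unique tryptic peptides and assumes that the "combined total" in the union percent is the size of the set union of both taxa; that reading, like the function names, is an assumption made for illustration and is not taken from the metatryp source.

```python
def individual_percent(taxon_a, taxon_b):
    """Percent of taxon_a's peptides that also occur in taxon_b.

    Asymmetric: swapping the arguments changes the denominator.
    """
    if not taxon_a:
        return 0.0
    return 100.0 * len(taxon_a & taxon_b) / len(taxon_a)


def union_percent(taxon_a, taxon_b):
    """Shared peptides as a percent of all distinct peptides in both taxa.

    Assumes the denominator is the set union (a hypothetical reading
    of "combined total"). Symmetric in its arguments.
    """
    combined = taxon_a | taxon_b
    if not combined:
        return 0.0
    return 100.0 * len(taxon_a & taxon_b) / len(combined)
```

with a sparsely sampled taxon of 4 peptides and a broadly sampled taxon of 8 peptides that share 2 peptides, the individual percent is 50.0 from the small taxon's side and 25.0 from the large taxon's side, while the union percent is 20.0. this illustrates why both halves of the heatmap should be read together.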
in addition, viewing "union percents" when comparing an organism with a small proteome against an organism with a large proteome would also skew any signal of shared peptides; for example, sars-cov- only encodes proteins/genome [ ] , whereas the human genome encodes ~ , proteins. even though the ncbi ipg database is not used in the marine metatryp web portal, this same effect may be observed when comparing organisms with highly uneven proteome sizes (for instance, if marine phages are added to the database), or with taxa that have varying levels of genome sequencing completeness, such as specialized assemblies like mags and sags. backend upgrades for metatryp v provide improved performance: the switch from a sqlite database backend in metatryp v to a postgresql database backend has resulted in significant improvements in performance and functionality of metatryp. in particular, this transition has reduced computational times and facilitated the addition of lca analyses to the software package. to test clock times, a database was constructed based upon the reference genomes from marine microbial taxa (si table ) in both versions of metatryp. si table shows the benchmarks associated with the construction of these genome-only databases for comparison, with metatryp v taking only % of the time it took metatryp v to construct the same database. by converting to a postgresql backend, the computational time to query the example database with the example peptide "lshqaiaeaigstr" was reduced over , x, dropping from . seconds using metatryp v to . seconds with metatryp v . the cpu utilization for metatryp v was also lower than for v . setting up a postgresql server on a local machine requires administrative permissions and is more complex than sqlite, which is more user-friendly and requires fewer system dependencies. 
therefore, for users without local administrative permissions, metatryp v with its sqlite backend remains a good lighter weight option (albeit with slower performance and without the capacity to handle diverse data categories and lca analyses table ) and the base schema for all three data categories is included in the metatryp v code repository (https://github.com/whoigit/metatryp- . ). using the peptides set as the default example search in the metatryp web portal (lshqaiaeaigstr, vnsvidaiaeaak, vaaeavlsmtk), we see that the lcas for these three peptides vary in taxonomic level, ranging from species- to phylum-level specificity (figures & ) . for peptide vnsvidaiaeaak, the lca across all data categories is the genus prochlorococcus, indicating that this peptide appears unique to this genus in the marine microbial community and is therefore a potential biomarker for targeted metaproteomics that allows species level specificity. for peptide vaaeavlsmtk, the lca for all categories is the order synechococcales as this peptide is found within the genus prochlorococcus and its sister genus synechococcus. the certainty of taxonomic assignment of the original sequences of predicted proteins varies among the data categories, from reference genomes from cultured isolates being the most certain to metatranscriptomic & metagenomic assemblies providing the least certainty in taxonomic assignment of the source proteins. the "specialized assemblies" of sags and mags exist somewhere in between on this taxonomic assignment uncertainty spectrum. 
peptide lshqaiaeaigstr (figure ) demonstrates this level of uncertainty: within the "meta-omic assembly" category, a source protein for this peptide in the gos/omz (jcvi) metagenome cannot be identified below the phylum level of cyanobacteria (figure ), and in the "specialized assembly" category containing thousands of mags, this peptide is found in cyanobacterial mags and one verrucomicrobial mag (verrucomicrobiales_bacterium_strain_np ), resulting in an lca of "bacteria". notably, this peptide is from the global nitrogen transcriptional regulation protein (ntca), which is highly conserved across the cyanobacteria [ ] . while this peptide has been previously identified as a potential biomarker for cyanobacteria [ ] , the addition of the "specialized assembly" data category identified the possible presence of this peptide in another phylum, warranting further investigation. interestingly, upon further investigation, the protein in mag verrucomicrobiales_bacterium_strain_np , identified from metagenomes in the red sea [ ] , is > % identical to an ntca in the cyanobacterium genus synechococcus (ncbi accession wp_ . ). this verrucomicrobial ntca may occur within this genome as a result of horizontal gene transfer or as an artifact of the mag binning process. either way, it indicates that peptide lshqaiaeaigstr should be used with caution as a cyanobacterial biomarker in environments with abundant verrucomicrobia. however, this is an unlikely scenario as cyanobacteria tend to be far more abundant in marine environments than verrucomicrobia. query results from a full sequence ntca from prochlorococcus med (ncbi genbank accession cae . ) show that other peptides, such as "lvsflmvlcr", may be more appropriate if targeting prochlorococcus only (si figure ). 
by separating sequence types into different categories, metatryp allows the user to balance the varying levels in confidence of taxonomic attribution, where reference genomes from cultured isolates are the best in taxonomic quality but more incomplete in environmental coverage, and vice versa for environmental sequences. the addition of , mags to metatryp has provided further insight into the prevalence of shared peptides across taxonomic groups in marine microbial communities. an analysis with metatryp v using a database of single reference genomes from common pelagic marine microorganisms demonstrated very little overlap in shared peptides across different taxonomic groups [ ] . expanding this analysis to the , mags shows a similar pattern of a very low occurrence of shared tryptic peptides across disparate taxa. figure shows the individual percentages of shared peptides across a random selection of mags. in general, taxa share < % of tryptic peptides, with a few clusters of more closely related organisms sharing more tryptic peptides --for example, gammaproteobacterial and euryarchaeotal clusters highlighted with red outlines --where the taxa share between - % or - % of tryptic peptides in each cluster, respectively. a similar pattern is shown when the cross-wise comparison of taxa is expanded to mags (si figures & ) . these results demonstrate that there should be sufficient resolution to discern between taxa using tryptic peptides identified by metaproteomic analyses, especially when coupled to lca analysis tools like metatryp to confirm peptide taxonomic origin. analysis of the coronavirus-focused metatryp instance is similar to the observations of marinefocused metatryp where there is a rather low frequency of shared peptides across disparate taxa with a higher frequency of shared peptides across more closely related taxa ( figure ; si figures & ) . 
within the broader group of the taxa associated with severe acute respiratory syndrome (si table ; infections [ , ] , as a combination of the coronavirus-specific database with other environmental databases (like marine microbial metatryp) may provide insight into potential tryptic peptide biomarkers in sewage effluent. this manuscript announces the release of version of the metatryp software package for assessing shared tryptic peptides in complex communities. coronavirus has the most shared tryptic peptides with its closest bat precursor virus, has some shared peptides with sars-cov- , and is very different from the "common flu". metatryp is a flexible software package to assess taxonomic occurrence of shared peptides applicable to proteomics studies of complex systems valuable for the identification of biomarkers and phyloproteomic analysis of complex communities. figures: figure . core elements of the metatryp v database schema. an entity relationship diagram depicting the core tables for the three sequencing categories (orange tables: "genome", green tables: "specialized assembly", and blue tables: "meta-omic assembly" data categories). the purple tables are shared tables among all three data categories containing information about the tryptic digestion rules (protease and digest tables) as well as the amalgamation of all unique tryptic peptide sequences found across all three data categories (peptide). the lines connecting the tables represent links between the data tables. the three data categories are stacked where tables represent similar information for each category. the metagenome data category requires two additional data tables as there are multiple taxa stored within a single meta-omic assembly sequencing file which requires an additional metagenome_annotations table for parsing; the blue connecting lines represent meta-omic specific data linkages. 
the lca results for these three separate peptides indicate the varying degrees of taxonomic uniqueness among the peptides in the "genome" and "meta-omic assembly" data categories. in these two data categories, lshqaiaeaigstr is unique to the phylum cyanobacteria. vnsvidaiaeaak is unique to the genus prochlorococcus. vaaeavlsmtk is unique to the order synechococcales. all of these peptides show potential as biomarkers at these varying taxonomic levels according to evaluation by the "genome" and "meta-omic assembly" databases. in general, there is a very low occurrence of shared tryptic peptides (< %) across disparate taxa. higher frequencies of shared tryptic peptides are shown among more closely related taxonomic groups. severe acute respiratory syndrome-related coronaviruses form a cluster in the top left corner. mers-related coronavirus shares < % of tryptic peptides with all taxa depicted here. the "common cold" strains of coronavirus (hku , e, and nl ) form a separate cluster in the bottom right corner. these coronavirus clusters are distinct and very different from the influenza a cluster in the middles, as the severe acute respiratory syndrome-related viruses share < % of shared tryptic peptides with influenza a and b. homo sapiens and chlorcebus sabaeus form a distinct group, sharing more tryptic peptides with each other than with any other taxonomic groups. needles in the blue sea: subspecies specificity in targeted protein biomarker analyses within the vast oceanic microbial metaproteome methionine synthase interreplacement in diatom cultures and communities: implications for the persistence of b use by eukaryotic phytoplankton multiple nutrient stresses at intersecting pacific ocean biomes detected by protein biomarkers abundant nitrite-oxidizing metalloenzymes in the mesopelagic zone of the tropical pacific ocean metagenomics: application of genomics to uncultured microorganisms. microbiology and molecular biology reviews : mmbr unipept . 
: functional analysis of metaproteome data development of an ocean protein portal for interactive discovery and education critical decisions in metaproteomics: achieving high confidence protein annotations in a sea of unknowns progress and challenges in ocean metaproteomics and proposed best practices for data sharing the ncbi taxonomy database biopython: freely available python tools for computational molecular biology and bioinformatics proceedings of the th python in science conference nitrogen-fixing populations of planctomycetes and proteobacteria are abundant in surface ocean metagenomes the reconstruction of , draft metagenome-assembled genomes from the global oceans binning metagenomic contigs by coverage and composition anvi'o: an advanced analysis and visualization platform for 'omics data binsanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation ocean plankton. structure and function of the global ocean microbiome functional tradeoffs underpin salinity-driven divergence in microbial community composition single cell genomics yields a wide diversity of small planktonic protists across major ocean ecosystems img: the integrated microbial genomes database and comparative analysis system a new coronavirus associated with human respiratory disease in china mechanisms and enzymes involved in sars coronavirus genome expression euro surveillance : bulletin europeen sur les maladies transmissibles = european communicable disease bulletin human coronavirus circulation in the united states identification of a new human coronavirus characterization and complete genome sequence of a novel coronavirus, coronavirus hku , from patients with pneumonia infectious rna transcribed in vitro from a cdna copy of the human coronavirus genome cloned in vaccinia virus the genome landscape of the african green monkey kidney-derived vero cell line. 
dna research : an international journal for rapid publication of reports on genes and genomes deep metaproteomic analysis of human salivary supernatant anti-infective activities of lactobacillus strains in the human intestinal microbiota: from probiotics to gastrointestinal anti-infectious biotherapeutic agents the crapome: a contaminant repository for affinity purification-mass spectrometry data entrez gene: gene-centered information at ncbi. nucleic acids research the e-utilities in-depth: parameters, syntax and more. entrez programming utilities help multiple evidence strands suggest that there may be as few as , human protein-coding genes a pneumonia outbreak associated with a new coronavirus of probable bat origin nitrogen control in cyanobacteria cryo-em structure of the -ncov spike in the prefusion conformation how sewage could reveal true scale of coronavirus outbreak the potential of wastewater-based epidemiology as surveillance and early warning of infectious disease outbreaks. current opinion in environmental science & health metaproteomic analysis using the galaxy framework we would like to thank a. murat eren, tom delmont, ben tully, elaina graham, and john heidelberg for graciously providing mag sequences and additional taxonomic information facilitating incorporation into the metatryp database. this work was made possible by grants from the national science the authors declare no conflicts of interest in the publication of this manuscript. key: cord- -cdmoazrl authors: richardson, eve; galson, jacob d.; kellam, paul; kelly, dominic f.; smith, sarah e.; palser, anne; watson, simon; deane, charlotte m. title: a computational method for immune repertoire mining that identifies novel binders from different clonotypes, demonstrated by identifying anti-pertussis toxoid antibodies date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cdmoazrl due to their shared genetic history, antibodies from the same clonotype often bind to the same epitope. 
this knowledge is used in immune repertoire mining, where known binders are used to search bulk sequencing repertoires to identify new binders. however current computational methods cannot identify epitope convergence between antibodies from different clonotypes, limiting the sequence diversity of antigen-specific antibodies which can be identified. we describe how the antibody binding site, the paratope, can be used to cluster antibodies with common antigen reactivity from different clonotypes. our method, paratyping, uses the predicted paratope to identify these novel cross clonotype matches. we experimentally validated our predictions on a pertussis toxoid dataset. our results show that even the simplest abstraction of the antibody binding site, using only the length of the loops involved and predicted binding residues, is sufficient to group antigen-specific antibodies and provide additional information to conventional clonotype analysis. next-generation immune repertoire sequencing (ig-seq or rep-seq [ , ] ) is providing us with comprehensive information about adaptive immune repertoires across individuals [ ] and immune states [ ] . progress has been made in the task of interrogating the vast diversity of b-cell receptor (bcr) repertoires, primarily through the analysis of predicted clonal relationships inferred via clonotyping [ ] . ig-seq and associated clonal analysis are finding increasing importance in antibody discovery both as a method of identification of putative antigen-specific antibodies [ ] [ ] [ ] and more recently as a method of lead antibody optimization through repertoire mining [ ] . the identification of antibodies which are predicted to bind to the same site (epitope) is a key component of bcr repertoire analysis and antibody discovery. the starting point for most bcr repertoire analysis is the reduction of thousands or millions of bcrs into orders of magnitude fewer clonotypes [ ] . 
a clonotype is defined as a group of antibody sequences which derive from a common progenitor b cell [ ] . during b cell development, the variable (v), diversity (d) and joining (j) gene segments encoding the variable domain of the antibody heavy chain undergo recombination [ ] . a requirement for two sequences to be predicted to share the same clonotype is therefore common v- and j-germline gene assignment [ ] . the d gene is not usually included in standard clonotype definitions due to difficulty in its assignment [ ] . the variable domain of the antibody heavy chain consists of the framework regions and hypervariable complementarity-determining regions (cdrs). cdrhs and are encoded by the v gene while the region spanning the recombined v, d and j segments corresponds to the third and most diverse loop on the antibody heavy chain, the cdrh . the processes of junctional diversification (the insertion of palindromic and random nucleotides at the junction between the v, d and j genes) during recombination as well as somatic hypermutation [ ] during affinity maturation further contribute to the diversity of the cdrh . sequence identity in the cdrh is therefore included as a marker of shared origin in most clonotyping tools [ ] . the nucleotide or amino acid sequence identity across the cdrh required for two sequences to be considered in the same clonotype varies across studies: where clonotyping uses length-normalised amino acid sequence identity, thresholds vary between % and % [ ] . published clonotyping methods use only heavy chain information, which is sufficient to capture most clonal relationships [ ] . a number of publicly available, well-supported pipelines have made clonotype analysis standard practice [ , ] . this has permitted large advances in the practical utility of ig-seq data [ , ] .
clinically, it has found use in tracking minimal residual disease in blood cancers [ ] , monitoring vaccination responses [ ] [ ] [ ] and providing mechanistic insights into immune-mediated diseases [ , [ ] [ ] [ ] . clonotyping has also proven useful in antibody discovery as a means of selecting candidate sequences for expression as monoclonal antibodies [ ] [ ] [ ] and recently as a method of lead antibody optimisation via repertoire mining [ ] . antibodies within the same clonotype are likely to target a common epitope [ , ] . the majority of antibodies binding to the same epitope in antibody-antigen complex structures in the structural antibody database (sabdab, a database of experimentally-solved antibody and antibody-antigen complex structures) have highly similar cdrh s [ ] . however, it has also been observed that multiple clonotypes may converge on the same epitope. for example, scheid and colleagues identified clonotypes from distinct ighv families converging on the cd binding site in gp [ ] . separate clonotypes have also been observed to bind to overlapping epitopes on the hemagglutinin stem [ ] or globular head, and on multiple epitopes on the ebola virus glycoprotein [ ] . wong and colleagues identified pairs of antibodies with sub- % cdrh amino acid identity binding to the same epitope within sabdab [ , ] . this convergence between clonotypes offers the potential to improve our understanding of the functional landscape of bcr repertoires; large-scale functional convergence between lineages could, for example, explain the apparent scarcity of public clonotypes [ ] . this hypothesis is supported by evidence that while clonotypes are infrequently shared between individuals, the range of antibody structures that these clonotypes generate is more similar between individuals [ ] . 
in the context of antibody discovery, being able to identify binders to the same epitope from different clonotypes would aid in optimisation of developability or binding affinity, by allowing hopping between germline scaffolds. in antibody discovery, clonotyping is used to search for clonal relatives of lead antibodies in bulk ig-seq data sets in order to identify antibodies which target the same epitope but which have either an increased affinity or a superior developability profile. this process is referred to as "immune repertoire mining". hsiao and colleagues performed clonotyping on a set of bulk repertoires and used the resultant clonotypes for hit expansion against two targets [ ] . they achieved greater than an order of magnitude improvement in affinity for both targets and between % and % of tested heavy chain variants retained target-binding [ ] . this suggests that sampling within a clonotype can be highly effective as a means of repertoire mining. however, the method does not allow the identification of binders to the same epitope which derive from different clonotypes, which limits the sequence diversity of novel binders that can be recovered from immune repertoires. in this paper, we describe a new method to identify functional convergence of antibody sequences that is germline-independent and that considers only the binding site of the antibody sequences, the paratope. we call this approach "paratyping". we show how paratyping allows clustering of antigen-specific sequences from different clonotypes, and is a rapid, structurally intuitive way of clustering functionally-related antibodies. paratyping simplifies the complex phenomenon of antibody-antigen interaction into sets of shared residues. learning the complexities of antibody-antigen interactions as part of a predictive model of antigen binding has been achieved for one therapeutic monoclonal antibody [ ] .
however, such approaches rely on a large (on the order of ) library of experimentally validated binding and non-binding variants. paratyping removes the need for large amounts of training data and is generalisable across antigens. we first show the rationale for our method, paratyping, using the structures of a pair of antibodies from different clonotypes that bind to the same epitope. paratyping is then applied to a single-cell data set of sequences raised against pertussis toxoid (ptx) in a transgenic mouse platform, where it is as accurate as clonotyping but identifies different binders. we then perform a prospective experimental test of the method: predicted ptx-binding and non-binding antibodies are mined from a set of non-enriched bulk heavy chain sequencing repertoires, expressed as monoclonal antibodies and tested experimentally. our experimental test demonstrates that paratyping identifies ptx-binding antibodies from different clonotypes to our original hits. this expands the sequence space available through repertoire mining and permits favourable shifts in in-silico developability metrics. of particular advantage is paratyping's ability to predict common antigen reactivity of antibodies from different v and j gene backgrounds, which has implications for large-scale repertoire analysis. antibodies from different clonotypes have been observed to converge on the same binding site [ ] [ ] [ ] [ ] . we hypothesise that these functionally convergent antibodies may use the same paratope for interaction, and examined the structural antibody database (sabdab) [ ] for evidence of this. we define the epitope and paratope as those residues with any atom within . Å of any residue in the cognate antibody or antigen respectively.
: an example of two antibodies that bind to the same epitope but derive from different clonotypes.
the murine c (pdb id: do ) and d (pdb id: zpv) anti-mers-cov antibodies target the same residues on the receptor binding domain of the spike protein, despite being derived from different v and j genes and displaying cdrh amino acid identity ( . %) below the standard clonotyping definition ( % - %) [ ] (c). the antibodies use over % of the same paratope residues (b) to achieve this functional convergence. main of the mers-cov spike protein at the same epitope ( . % of epitope residues are shared across the pair of structures). the antibodies come from different clonotypes with both low cdrh identity ( . %) and different germline origins (v - - /j vs v - - /j ). however, c and d use largely the same paratope residues to achieve this epitope convergence, with . % of paratope residues conserved across the structures. another example of epitope convergence between antibodies with differing j genes and sub- % cdrh identity can be seen in the anti-lysozyme antibodies hyhel- and hyhel- (see supplementary figure ). in a standard implementation, antibodies with the same v and j genes, cdrh s of the same length, and above a threshold level of amino acid identity across the cdrh are considered to be in the same clonotype [ ] . in our novel method, paratyping, antibodies with the same length cdrs and above a threshold level of sequence identity across the predicted paratope residues are grouped into the same paratype. to test the ability of paratyping and clonotyping to group antibodies that target the same epitope we performed a test in a single-cell (paired vh/vl) data set of antibodies isolated from genetically engineered mice that have a full set of human immunoglobulin variable region genes [ ] immunised with pertussis toxoid (ptx). although we had pairing information within the single-cell data set, given that the majority of repertoire sequencing data to date is heavy-chain only [ ] , we demonstrate the method using only the heavy chain information. 
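the two grouping rules described above can be sketched as a pair of predicates. this is a minimal python illustration, not the authors' implementation: the `HeavyChain` record, its field names and the representation of the predicted paratope as a mapping from cdr position to residue are hypothetical, and normalising paratope identity by the smaller of the two paratopes is an assumption. thresholds are passed in by the caller because the values are not reproduced here, and treating them as inclusive (`>=`) is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class HeavyChain:
    v_gene: str
    j_gene: str
    cdrh3: str
    cdr_lengths: tuple  # lengths of the three heavy chain CDRs
    # Predicted paratope: {(cdr_name, position): residue} (hypothetical layout).
    paratope: dict = field(default_factory=dict)


def identity(a, b):
    """Fraction of identical residues between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def same_clonotype(p, q, threshold):
    """Same V and J genes, same CDRH3 length, CDRH3 identity >= threshold."""
    return (
        p.v_gene == q.v_gene
        and p.j_gene == q.j_gene
        and len(p.cdrh3) == len(q.cdrh3)
        and identity(p.cdrh3, q.cdrh3) >= threshold
    )


def paratope_identity(p, q):
    """Matching predicted paratope residues over the smaller paratope (assumed)."""
    smaller = min(len(p.paratope), len(q.paratope))
    if smaller == 0:
        return 0.0
    matches = sum(1 for pos, aa in p.paratope.items() if q.paratope.get(pos) == aa)
    return matches / smaller


def same_paratype(p, q, threshold):
    """Same CDR lengths and paratope identity >= threshold; germline is ignored."""
    return p.cdr_lengths == q.cdr_lengths and paratope_identity(p, q) >= threshold
```

because `same_paratype` never inspects `v_gene` or `j_gene`, two sequences from different germline backgrounds can fall into the same paratype while being excluded from the same clonotype.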
the sequences were annotated with a ptx-binding ( ) or non-binding label ( ) (as per the "single-cell data set" section of methods), and we used paratyping (our new method) and clonotyping (the conventional approach) to identify ptx-binding sequences. for each of the ptx-binders in turn, we mimicked a repertoire mining experiment by using paratyping or clonotyping to identify binders amongst the remaining sequences (one-vs-all cross-validation). each of the ptx-binders is referred to as a "probe" antibody; sequences that are within the same paratype or clonotype as the probe are predicted to bind ptx. the precision and recall of the two methods (calculated over the aggregate of predictions) for repertoire mining are comparable ( figure , table ). the precision and recall using clonotyping and paratyping with varying cdrh sequence identity or paratope sequence identity thresholds, respectively, are shown in figure a and b. the methods require different sequence identity thresholds for optimum performance but have similar precision-recall profiles. for clonotyping, the optimal heavy-chain only threshold is % cdrh amino acid identity. for paratyping, the optimum occurs at % paratope identity.
table : precision-recall values for prediction of ptx-binding according to paratyping and clonotyping at the optimal thresholds of % and % respectively. sequence identity is calculated across the predicted paratope for paratyping and across the cdrh for clonotyping. the methods behave comparably over the full precision-recall curve. these thresholds were then used for the prospective repertoire mining experiment.
at these optimal thresholds, clonotyping recovers binders with % precision and % recall (meaning that % of the binders in this data set are not related by clonotype to any other). paratyping recovers binders with a precision of % and a recall of % (meaning that % of binders in this data set have distinct paratopes).
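the one-vs-all evaluation described above can be sketched as follows. this python example is an interpretation rather than the authors' code: precision is computed over the aggregate of all (probe, prediction) pairs, and recall is taken as the fraction of binders recovered by at least one other probe, which is an assumption consistent with the reading that unrecovered binders are unrelated to any other binder. the `related` argument stands for either grouping rule (same clonotype or same paratype) and is supplied by the caller.

```python
def one_vs_all(items, is_binder, related):
    """Aggregate precision and recall using each binder in turn as the probe.

    items     -- list of sequences (any objects accepted by `related`)
    is_binder -- list of booleans, parallel to `items`
    related   -- function(probe, candidate) -> bool (hypothetical grouping rule)
    """
    binders = [i for i, flag in enumerate(is_binder) if flag]
    true_pos = false_pos = 0
    recovered = set()  # binders predicted by at least one other probe

    for probe in binders:
        for j, candidate in enumerate(items):
            if j == probe or not related(items[probe], candidate):
                continue
            if is_binder[j]:
                true_pos += 1
                recovered.add(j)
            else:
                false_pos += 1

    predicted = true_pos + false_pos
    precision = true_pos / predicted if predicted else 0.0
    recall = len(recovered) / len(binders) if binders else 0.0
    return precision, recall
```

sweeping the identity threshold inside `related` and recording the resulting pairs would trace out a precision-recall curve of the kind referred to in the text.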
we expect that this would be an overestimation of performance in the bulk data set due to the enrichment of ptxbinders created through antigen-specific sorting. for each probe, a prediction can be made by both paratyping and clonotyping (hereafter labelled a "both" prediction), paratyping alone (labelled "paratype-only") or clonotyping alone (labelled "clonotypeonly"). out of all of these predictions, . % are made by both methods with "paratype-only" predictions (precision: %) and "clonotype-only" predictions (precision: %). paratyping and clonotyping make a number of method-exclusive predictions, as shown in figure , where the probe antibodies with the largest number of "paratype-only" and "clonotype-only" predictions are shown in dendrograms with these predictions. figure v genes ( ), those with different j genes ( ) and those with cdrh identity below . % ( ) . for example, a pair of binders with just % h sequence identity but % predicted paratope identity were clustered together. paratype predictions can potentially be further validated by building homology models of the sequences [ ] . in this case, modelling of the sequences suggests that the pair of sequences would also be predicted to be highly structurally similar (see supplementary table in order to test how the methods would scale in a larger and non-enriched data set, we performed a prospective repertoire mining experiment in which the heavy chain sequences from ptx-binding antibodies were used to identify novel ptx-binding heavy chains from a set of bulk heavy chain repertoires. we first looked at how the methods scale in terms of number of predictions, and then experimentally validated a number of the predictions of both clonotyping and paratyping. other sequences which are in the same paratype or clonotype are predicted to also bind ptx. 
these predicted ptx binding sequences are coloured according to whether they are identified by both paratyping and clonotyping ("both"), paratyping but not clonotyping ("paratype-only") or clonotyping but not paratyping ("clonotype-only"). circular leaves represent true ptx binding antibodies (i.e. true positives) while triangular leaves represent sequences that do not bind ptx (false positives). dendrogram a shows the probe which had the most "clonotype-only" predictions, of which % are true positives; dendrogram b shows the probe with the most "paratype-only" predictions, of which % are true positives; dendrogram c is the probe antibody which is associated with the most false positives ( % true positives) while dendrogram d is the probe antibody associated with the most true positives ( true positives). performance is heterogeneous across probes, ranging between % and % precision; precision and recall values reported elsewhere in this manuscript consider the performance in aggregate of all probes. v and j genes are annotated where this changes within a dendrogram. dendrograms are constructed using the full vh sequence with the neighbour-joining algorithm of the r package ape [ ] , plotted using the r package ggtree [ ] . the units of the scale bar are amino acid substitutions per residue.
using the known ptx binders as probes, , heavy chain sequences from bulk sequencing repertoires were searched for paratype- or clonotype-related sequences. sequences were identified by both clonotyping and paratyping, by paratype-only and by clonotype-only. the paratype-only predictions can be categorised as those with a different v gene ( ), those with a different j gene ( ) and those with sub- % cdrh identity ( ).
for the experimental validation of predicted ptx binders and non-binders, we created a category of prediction more stringent than the "paratype-only" or "clonotype-only" categories to show the utility of paratyping in a real antibody discovery experiment context where multiple probe sequences are available. as shown in figure , a particular probe antibody may make predictions via clonotype or paratype alone. as reflected in the similarity in the precision-recall values calculated over the aggregate of probe antibodies in the single-cell one-versus-all cross-validation, these "method-unique" predictions become rarer when considering a larger number of probes -such predictions, which could not be found by another method even when using the full complement of known binders, are referred to as "paratype-unique" or "clonotype-unique" predictions. of the potential predicted ptx binders, were selected for expression (see methods) and ptx-binding assay. an additional antibodies predicted to not bind ptx (due to paratope identity or shared lineage with confirmed non-binders) were also tested. the predictions were split into thirds according to whether they were predicted by both methods (labelled "both"), or were unique across all binders for paratyping ("paratype-unique") or clonotyping ("clonotype-unique"). of the ( %) novel heavy chains predicted to bind ptx by both paratyping and clonotyping were true ptx binders. paratope identity between known and predicted binders ranged between % and %, with cdrh identity ranging between % and %. of the ( %) of the clonotypeunique predictions were true ptx binders, with minimally % predicted paratope identity to a known binder, % cdrh identity and % cdrh identity (amino acid identity calculated over the heavy chain cdrs). of the ( %) of the paratype-unique ptx binders bound ptx. the minimal cdrh identity of a ptx binder to any known binder was % with % paratope identity. 
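the distinction between per-probe "paratype-only" predictions and the stricter "paratype-unique" category can be illustrated with set operations. in this python sketch the input layout (a mapping from probe identifier to the set of sequence identifiers it predicts) and the function name are hypothetical, and treating a sequence as "both" whenever each method finds it from any probe is an assumption about how the categories combine.

```python
def method_unique(paratype_hits, clonotype_hits):
    """Split predictions into categories across the full set of probes.

    Each argument maps probe id -> set of predicted sequence ids.
    A prediction is method-unique only if no probe recovers it via
    the other method.
    """
    all_paratype = set().union(*paratype_hits.values())
    all_clonotype = set().union(*clonotype_hits.values())
    return {
        "both": all_paratype & all_clonotype,
        "paratype-unique": all_paratype - all_clonotype,
        "clonotype-unique": all_clonotype - all_paratype,
    }
```

a sequence found only by paratyping from one probe but by clonotyping from another is "paratype-only" for the first probe yet not "paratype-unique", which is why method-unique predictions become rarer as more probes are used.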
the distribution of cdrh , total cdr and total vh amino acid identity of novel ptx-binding heavy chains to known ptx-binding antibodies is shown in figure . none of the predicted ptx non-binders bound ptx. as in the single-cell data set, the success rate in the predictions which were made by both clonotyping and paratyping is higher than either method alone. the success rate of paratype-unique predictions is significantly lower than that of clonotype-unique predictions. however, it may not be appropriate to compare performance across different probe antibodies, some of which may be liable to activity cliffs (a concept from small molecule chemistry where a compound exhibits a large change in activity given only a small change in structure [ ] ). a direct comparison can be made where both clonotype-unique and paratype-unique predictions were made using the same known ptx binder. this occurred for ptx probes, and across these an average precision of % for paratype-unique and % for clonotype-unique was observed. paratyping identified ptx-binding antibodies that could not be found using clonotyping ("paratype-unique"), for example those using a different v gene to any of the known ptx binders. an example is shown in figure , where the original antibody used the inherently autoreactive v gene v - [ ] , which may be problematic in development. however, paratyping recovers seven ptx-reactive antibodies which use the v - gene segment instead. heavy chains predicted to bind ptx are coloured green, blue or purple depending on whether they are predicted to bind via both methods or were clonotype- or paratype-unique. asterisks indicate heavy chains selected for testing, all of which were validated as ptx binding. a shows sequences predicted to bind ptx that use a different v gene (v - ) than the known ptx binder used for prediction (v - ). sequences using a different j gene to the known ptx binder are shown in b.
figure : cdrh , total cdr and total vh amino acid identity of novel ptx-binding antibodies to the known ptx-binding antibody by which they were identified, according to method by which they were identified. paratyping enables the discovery of ptx-binding sequences with lower sequence identity across each of these regions with minimally % cdrh identity, % cdrh identity and % total vh identity. paratyping also recovered novel ptx-binding heavy chains which derive from different j genes and examples with cdrh identities well below most clonotyping thresholds (commonly % - %). the minimal cdrh identity of a validated ptx-binding antibody to any known binder was %, suggesting that paratyping can identify antibodies that bind to the same epitope that could not be found by any clonotyping method. one of the limitations of clonotyping as a method for immune repertoire mining is the relatively narrow sequence space it is capable of making predictions within, meaning that the discovered antibodies may have conserved developability problems. we have already seen one example where paratyping's ability to jump between germlines allows us to avoid an autoreactive v gene-derived antibody; other developability problems such as aggregation propensity may also be improved by using paratyping. of the original antibodies used for repertoire mining, antibodies were flagged by the therapeutic antibody profiler (tap) tool as having possible developability issues due to cdr length, high density of charge or hydrophobicity, or charge asymmetry between the heavy and light chains. as paratyping only groups antibodies with the same length cdrs, we consider only the latter four de- figure shows the improvement in patch surface hydrophobicity achieved by immune repertoire mining using cl- as a probe antibody. cl- had an amber flag for this metric. repertoire mining was used to identify a number of predicted ptx-binding antibodies. 
it can be seen that the more sequence-distinct paratype-only predictions are able to achieve greater changes in psh. these predictions were not assayed as they were within the clonotype of another known binder and therefore not "paratype-unique". characterising the functional relationship between sequence-distinct antibodies is an important step in our understanding of the adaptive immune landscape. mapping antigen preference to antibody repertoires will allow us to identify epitope convergence of antibodies at a large scale. in a test system of transgenic mice immunised with pertussis toxoid (ptx), we show for the first time that prediction and comparison of paratopes can be used to group antigen-specific antibodies in both an enriched, single-cell data set and non-enriched bulk heavy chain repertoires. we demonstrate the utility of the method in the context of an antibody discovery experiment alongside the conventional approach of clonotyping and discover new anti-ptx antibodies from different clonotypes to any known binders. we first developed the method in a single-cell data set, where paratyping and clonotyping were able to group ptx-specific antibody heavy chains with high precision ( % and % respectively). these results may not map to the bulk sequencing data set given that the single-cell sequences derived from plasma, memory and antigen-sorted cells (leading to ca. % of sequences being ptx-reactive). to validate the method in a less enriched data set, we performed a prospective experimental test of paratyping in a non-antigen sorted set of bulk heavy chain sequencing repertoires. in the prospective experimental test, paratyping allowed us to discover new ptx-binding antibodies that we could not have found using clonotyping. these include antibodies that use different germline genes as well as antibodies with lower cdrh identity than in common definitions ( - %).
in terms of antibody discovery, paratyping allows us to identify sequence-distinct antibodies that bind to the same epitope and which can differ significantly in developability or affinity. in terms of repertoire analysis, paratyping expands our ability to functionally group antibodies beyond clonotypes, and therefore allows us to detect specific cases of epitope convergence between lineages: we found epitope convergence between v - /j and v - /j lineages, v - /j and v - /j lineages, v - /j and v - /j lineages and v - /j and v - /j lineages in pertussis toxoid binders. we did not observe pairs of antibodies in the same paratype using different v gene families. however, paratope identity across the germline-encoded cdrhs and is equal to or in excess of % across members of v and v , and v and v (see supplementary figure ), so we predict that it should be possible to use paratyping to find binders from different v gene families, should a large enough sequencing data set be mined. the implications of this for immune repertoire clustering are as yet unexplored. if such convergence is widespread, it is possible that clonotyping overestimates functional diversity and this may account for low proportions of clonotypes shared across individuals [ ] . success rate in the prospective experimental validation was variable across the categories of prediction. we found that % of sequences predicted by both clonotyping and paratyping to be binders bound ptx. the success rate was considerably lower in clonotype-unique predictions ( %) and even lower in paratype-unique predictions ( %). it should be noted that these method-unique predictions form the minority of predictions for either method ( - % of predictions).
the lower success rate of paratyping versus clonotyping may be attributable to the particularly low cdrh identity of these predictions to known binders: paratyping does not give special weight to the cdrh in the paratope identity calculation, despite the particular role it plays in antigen complementarity. predictions with as little as % cdrh identity to a known binder were assayed, but no antibody with an h amino acid identity below % bound to pertussis toxoid, suggesting that paratyping could be further improved by the use of cdrh weighting. paratope prediction takes only around . seconds per sequence (of which . seconds corresponds to cdr extraction), as opposed to . seconds per sequence for vdj annotation using igblast [ ] , which means it is considerably more tractable for large datasets than homology modelling (on average seconds per sequence, times slower than germline gene annotation [ ] ). it has the advantage that it does not rely on the upkeep of consistent and complete germline databases (a leading cause of disagreement between germline annotation tools [ ] ), but rather on the distribution of a pretrained paratope prediction model [ ] . the paratope prediction step is purely sequence-based and structural modelling is not required, meaning that immune repertoires can be annotated without accurate germline alignment and without access to large compute power. paratyping was shown to identify functional relationships between ptx-binding antibodies that are not related by clonotype; we would expect this to generalise across protein antigens. as an example, we looked at cov-abdab [ ] , a database of antibodies and nanobodies known to bind to betacoron- and c , shown in figure , are another example. deciphering the functional landscape of immune repertoires will greatly improve our understanding of the adaptive immune system. improving our ability to group antibodies binding to the same epitope is a step towards this.
our results here show that the simple and computationally rapid abstraction of the antibody binding site used by paratyping is sufficient to group antigen-specific antibodies in a way that provides information additional to clonotyping. this additional information is particularly significant in the context of antibody discovery, where it allows us to recover different novel antigen-specific antibodies from immune repertoires. five genetically engineered mice that have a full set of human immunoglobulin variable region genes [ ] were immunised with pertussis toxoid (ptx). paired (v h /v l ) sequences were recovered from antigen-sorted, plasma and memory cells via a previously published method [ ] . antibody sequences with greater than % effect value relative to the positive control were labelled as binders. this resulted in ptx-binders and non-binders. heavy chains from sorted splenic b-cells from the same five mice as the single-cell data set were sequenced using standard protocols [ ] and processed using the presto/change-o pipeline [ , ] . this resulted in , heavy chain sequences. for quality control, only sequences with a read count (reads with a particular unique molecular identifier (umi)) of at least two or a consensus count (reads with different umis but the same nucleotide sequence) of at least ten were considered, reducing the size of the data set to , sequences. clonotypes were defined as groups of heavy chain sequences sharing the same v and j genes, with identical cdrh lengths and a cdrh amino acid sequence identity equal to or above a threshold. vj annotation was performed with igblast [ ] within change-o [ ] . cdrh s were extracted according to the north definition [ ] with imgt numbering [ ] performed with anarci [ ] . paratypes were defined as heavy chain sequences sharing the same cdr lengths and greater than a threshold sequence identity across the predicted paratope regions.
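the clonotype definition above can be sketched in python. this is a minimal illustration under stated assumptions, not the authors' pipeline: the `HeavyChain` record, the `sequence_identity` and `same_clonotype` helpers and the `identity_threshold` parameter are hypothetical names, and the threshold value is left to the caller because it is not stated at this point in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeavyChain:
    """Hypothetical record holding the annotations used for clonotyping."""
    v_gene: str
    j_gene: str
    cdrh: str  # amino acid sequence of the CDRH loop used for clonotyping


def sequence_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def same_clonotype(a: HeavyChain, b: HeavyChain, identity_threshold: float) -> bool:
    """True if both chains share V and J genes, have the same CDRH length,
    and reach the identity threshold (a fraction between 0 and 1)."""
    if a.v_gene != b.v_gene or a.j_gene != b.j_gene:
        return False
    if len(a.cdrh) != len(b.cdrh) or not a.cdrh:
        return False
    return sequence_identity(a.cdrh, b.cdrh) >= identity_threshold
```

in a mining setting, each probe heavy chain would be compared against every bulk sequence with `same_clonotype`, keeping the bulk sequences for which it returns true.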
cdrs were extracted according to north definitions [ ] with imgt numbering [ ] performed via anarci [ ] . parapred [ ] was used for paratope prediction using the model as distributed by e. liberis at https://github.com/eliberis/parapred. to convert the output of parapred, binding probabilities, into a binary label, we selected a threshold of . as deemed optimal by the authors of the original paper [ ] , i.e. residues with a predicted probability of being in the paratope of above . were annotated as paratope residues. paratope identity was defined as the number of identical paratope residues (residues which are predicted to be in the paratope in both cases) divided by the smallest number of paratope residues of either sequence being compared (figure ). figure : method of calculating predicted paratope identity. x indicates cdr residues not predicted to constitute the paratope. in the example shown, the paratope identity is . %. figure : graphical illustration of the process of repertoire mining via paratyping. a probe antibody is selected. paratope prediction is performed both on the probe antibody and the bulk data set. heavy chain predicted paratope identity is used to mine the bulk repertoire for new predicted binders. in the repertoire mining experiment, we selected a number of hit antibodies from the single-cell data set to use as probes. spr was carried out on the htrf-positive antibodies and the highest affinity representatives were selected from each clonotype containing only sequences labelled as binders equating to antibodies, in order to align with previous repertoire mining experiments [ ] . we also selected a number of non-ptx binding antibodies to be used as a negative control. the non-ptx binding antibodies were selected as representatives of clonotypes containing only non-ptx binding sequences ( antibodies). 
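the predicted paratope identity defined above (identical paratope residues divided by the smallest number of paratope residues of either sequence) can be sketched as follows. this is an illustrative sketch, not the authors' code: the function names are hypothetical, the inputs are assumed to be the concatenated cdr residues of two heavy chains with matching cdr lengths together with their per-residue binding probabilities, and the probability threshold is a caller-supplied parameter because its value is not recoverable from the text here.

```python
from typing import Sequence, Set


def paratope_positions(probabilities: Sequence[float], threshold: float) -> Set[int]:
    """Indices of residues whose predicted binding probability exceeds the threshold."""
    return {i for i, p in enumerate(probabilities) if p > threshold}


def paratope_identity(
    seq_a: str,
    probs_a: Sequence[float],
    seq_b: str,
    probs_b: Sequence[float],
    threshold: float,
) -> float:
    """Identical paratope residues divided by the smaller paratope size.

    A position counts as identical only if it is predicted to be in the
    paratope of both sequences and carries the same amino acid in both.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("paratypes are only compared across equal CDR lengths")
    if len(seq_a) != len(probs_a) or len(seq_b) != len(probs_b):
        raise ValueError("one probability is required per residue")

    para_a = paratope_positions(probs_a, threshold)
    para_b = paratope_positions(probs_b, threshold)
    smallest = min(len(para_a), len(para_b))
    if smallest == 0:
        return 0.0  # assumption: no predicted paratope means no identity

    identical = sum(1 for i in para_a & para_b if seq_a[i] == seq_b[i])
    return identical / smallest
```

dividing by the smaller paratope keeps the score at 1.0 when one predicted paratope is fully contained in the other, which is one reading of the definition given in the text.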
the heavy chains from these probe antibodies were used as probes to mine the bulk repertoires via both paratyping and clonotyping, using the optimal sequence identity thresholds from the single-cell data set ( % predicted paratope identity for paratyping and % cdrh sequence identity for clonotyping). the paratyping process is illustrated in figure . predictions were labelled as "paratype-only", "clonotype-only" or "both" as detailed in the results section. if an antibody is within the same paratype as a particular probe but not within its clonotype, it is a "paratype-only" prediction, and vice versa. a prediction is labelled as "both" if it is within both the paratype and clonotype of the same probe. in the "experimental validation of predicted binders" section of methods, the more stringent "paratype-unique" and "clonotype-unique" definitions were used. we selected heavy chain sequences from the bulk data set for expression. of the sequences were predicted to be ptx binders using the ptx-binding probe sequences. we created a new way of categorising predictions in order to fully test the method in the context of an antibody discovery experiment. we define "paratype-unique" predictions as paratype predictions that fall outside the clonotype not only of the probe in question (as for paratype-only predictions) but of every probe in the full complement, and similarly for "clonotype-unique" predictions. paratype-unique predictions are a more stringent subset of paratype-only predictions and present a greater challenge to the method (as such predictions tend to be corroborated by fewer probes). predicted ptx-binding heavy chain sequences were selected from each of the three categories of prediction ("paratype-unique", "clonotype-unique" and "both"). five of the "both" predictions were identical to their probes. the remaining sequences assayed were predicted non-ptx binding heavy chain sequences, with sequences selected from the "clonotype-unique", "paratype-unique" and "both" categories of prediction.
the heavy chain sequences selected for expression from the bulk repertoire were paired with the cognate light chain of the probe sequence by which they were identified. within-clonotype vh/vl pairing has been validated [ , ] as a method of reconstituting binding where the cognate light chain is not sequenced. pairings were only made where there was greater than % sequence identity across the residues considered to constitute the v h /v l interface, based on solvent accessibility [ ] , in order to maximise the probability of expression. these positions lie outside of the cdrs and therefore do not enforce any constraint on binding site sequence identity. the predicted binding and non-binding antibodies were expressed in hek cells. the assay is as described in the single-cell data set section of methods with the exception that the anti-ptx antibody b was used as positive control. antibody sequences with an f value exceeding were labelled as ptx-reactive antibodies. to evaluate the performance of either method in grouping ptx-binding sequences, we calculated precision and recall of the method in the single-cell data set according to the following definitions of true positives, false positives and false negatives. the task is to group antibodies that bind pertussis toxoid. however, the method is hypothesised to work by grouping by epitope. there will be multiple epitopes on pertussis toxoid so not all ptx-binding antibodies will be grouped into a single paratype or clonotype. as a result, we do not expect perfect recall. further, we do not classify antibodies as non-binders if they do not group with a particular binder -we only predict that they do not bind at the same epitope as the binder in question. as a result, we do not calculate a "true negative" rate.
• true positive (tp): a ptx-binding sequence that was identified by another ptx-binding sequence.
• false positive (fp): a non-ptx-binding sequence that was identified by a ptx-binding sequence.
• false negative (fn): a ptx-binding sequence that was not identified by any ptx-binding sequence. we calculated the in-silico developability metrics of total cdr length, patches of surface hydrophobicity (psh), patches of positive charge (ppc), patches of negative charge (pnc) and structural fv charge symmetry parameter (sfvcsp) using the therapeutic antibody profiler tool (tap) [ ] . the tool calculates these metrics from homology models built using abodybuilder [ ] and compares them to a database of clinical-stage therapeutics (csts) [ ] . according to the metric in question, values are flagged as "amber" (extreme) if they lie outside the upper or lower, or upper and lower, % of observed values among csts. an antibody is flagged as "red" if the value of a particular metric is outside of the observed range in csts. [ ] funding the code and data associated with this study are available at http://opig.stats.ox.ac.uk/resources. the promise and challenge of high-throughput sequencing of the antibody repertoire. en practical guidelines for b-cell receptor repertoire sequencing analysis commonality despite exceptional diversity in the baseline human antibody repertoire analysis of the b cell receptor repertoire in six immune-mediated diseases. en bioinformatic and statistical analysis of adaptive immune repertoires monoclonal antibodies isolated without screening by analyzing the variablegene repertoire of plasma cells. en mining the antibodyome for hiv- -neutralizing antibodies with next-generation sequencing and phylogenetic pairing of heavy/light chains. en neutralizing antibodies against west nile virus identified directly from human b cells by single-cell analysis and next generation sequencing. en immune repertoire mining for rapid affinity optimization of mouse monoclonal antibodies the analysis of clonal expansions in normal and autoimmune b cell repertoires somatic generation of antibody diversity. 
en anarci: antigen receptor numbering and receptor classification. en cutting edge: ig h chains are sufficient to determine most b cell clonal relationships. en immunosequencing: applications of immune repertoire deep sequencing. current opinion in immunology sequencing the functional antibody repertoire-diagnostic and therapeutic discovery measurement and clinical monitoring of human lymphocyte clonality by massively parallel v-d-j pyrosequencing. en lineage structure of the human antibody repertoire in response to influenza vaccination. en human responses to influenza vaccination show seroconversion signatures and convergent antibody rearrangements studying the antibody repertoire after vaccination: practical applications b cells populating the multiple sclerosis brain mature in the draining cervical lymph nodes. en antibody repertoire analysis in polygenic autoimmune diseases igh sequences in common variable immune deficiency reveal altered b cell development and selection ab-ligity: identifying sequence-dissimilar antibodies that bind to the same epitope. en. biorxiv. publisher: cold spring harbor laboratory section: new results sequence and structural convergence of broad and potent hiv antibodies that mimic cd binding. en vaccine-induced antibodies that neutralize group and group influenza a viruses therapeutic monoclonal antibodies for ebola virus infection derived from vaccinated humans. en sabdab: the structural antibody database. en massively scalable genetic analysis of antibody repertoires. en. biorxiv, evidence of antibody repertoire functional convergence through public baseline and shared response structures. en. biorxiv. publisher: cold spring harbor laboratory section: new results deep learning enables therapeutic antibody optimization in mammalian cells by deciphering high-dimensional protein sequence space. en. biorxiv, complete humanization of the mouse immunoglobulin loci enables efficient therapeutic antibody discovery. 
en observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. en ape . : an environment for modern phylogenetics and evolutionary analyses in r. en ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. en automated antibody structure prediction with data-driven accuracy estimation advancing the activity cliff concept regulation of inherently autoreactive vh - b cells in the maintenance of human b cell tolerance. en five computational developability guidelines for therapeutic antibody profiling igblast: an immunoglobulin variable domain sequence analysis tool. en benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences. en parapred: antibody paratope prediction using convolutional and recurrent neural networks. en cov-abdab: the coronavirus antibody database. en. biorxiv. publisher: cold spring harbor laboratory section: new results en. us b . library catalog: google patents presto: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. en. bioinformatics . publisher: oxford academic change-o: a toolkit for analyzing large-scale b cell immunoglobulin repertoire sequencing data. en a new clustering of antibody cdr loop conformations imgt unique numbering for immunoglobulin and t cell receptor variable domains and ig superfamily v-like domains de novo identification of vrc class hiv- -neutralizing antibodies by next-generation sequencing of b-cell transcripts. en thera-sabdab: the therapeutic structural antibody database. en. nucleic acids research . 
publisher: oxford academic, d -d key: cord- - ls dvzy authors: ganier, c; du-harpur, x; harun, n; wan, b; arthurs, c; luscombe, nm; watt, fm; lynch, md title: cd (bsg) but not ace expression is detectable in vascular endothelial cells within single cell rna sequencing datasets derived from multiple tissues in healthy individuals date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ls dvzy coronavirus disease (covid- ) is caused by severe acute respiratory syndrome coronavirus (sars-cov- ) and is associated with a wide range of systemic manifestations. several observations support a role for vascular endothelial dysfunction in the pathogenesis including an increased incidence of thrombotic events and coagulopathy and the presence of vascular risk factors as an independent predictor of poor prognosis. it has recently been reported that endothelitis is associated with viral inclusion bodies suggesting a direct role for sars-cov- in the pathogenesis. the ace receptor has been shown to mediate sars-cov- uptake and it has been proposed that cd (bsg) can function as an alternative cell surface receptor. to define the endothelial cell populations that are susceptible to infection with sars-cov- , we investigated the expression of ace as well as other genes implicated in the cellular entry of sars-cov- in the vascular endothelium through the analysis of single cell sequencing data derived from multiple human tissues (skin, liver, kidney, lung and intestine). we found that cd (bsg) but not ace is detectable in vascular endothelial cells within single cell sequencing datasets derived from multiple tissues in healthy individuals. this implies that either ace is not expressed in healthy tissue but is instead induced in response to sars-cov or that sars-cov enters endothelial cells via an alternative receptor such as cd . we read with interest the report of varga et al (varga et al. 
) showing endothelitis in association with viral inclusion bodies in endothelium from multiple organs of patients with coronavirus disease . this finding is of great relevance in view of multiple lines of evidence supporting a role for vascular endothelial dysfunction in the pathogenesis of covid- including the presence of cardiovascular risk factors as an independent predictor of severe disease , the high incidence of thrombotic complications (klok et al. ) and the presence of coagulopathy (tang et al. ). an understanding of the mechanism of endothelitis may shed light on the diverse systemic manifestations of covid- and suggest potential therapeutic approaches. the ace receptor has been shown to mediate uptake of the virus responsible for covid- , severe acute respiratory syndrome coronavirus (sars-cov- ), in human cells (hoffmann, kleine-weber, schroeder, et al. ) and previous reports have suggested expression of ace in vascular endothelial cells hamming et al. ) and heart (ferrario et al. ) . in order to define the endothelial cell populations that are susceptible to infection with sars-cov- , we investigated the expression of ace in the vascular endothelium through the analysis of single cell sequencing data derived from multiple human tissues (skin, liver, kidney, lung and intestine). human lung and liver data were derived from the human cell atlas project (regev et al. ) . other data was publicly available (liao et al. ; aizarani et al. ; travaglini et al. ; y. wang et al. ; tabib et al. ) . single cell sequencing data is shown as dot plots and umap plots -in the latter each dot represents an individual cell and cells with similar expression patterns cluster together. cell types were annotated according to markers as defined within the original publications. vascular endothelium is identified by established markers including pecam , flt , vwf, tie , kdr, cd and cldn (leeuwenberg et al. ; lee et al. ) . 
we could identify endothelial cells in the lung, liver and skin but were unable to unambiguously identify endothelial cells in the intestine and kidney datasets. with the exception of enterocytes in the colon, the level of ace expression was very low across all of these datasets ( figure a , c and table ) and ace was not detected at a significant level in endothelial cells ( figure a , b, c and table ). in order not to inadvertently exclude any ace -expressing cells, we show plots without the usual filtering employed for single cell sequencing data (supplementary methods). we also examined expression of tmprss , a serine protease that is required for the priming of the spike protein of sars-cov- (hoffmann, kleine-weber, schroeder, et al. ). this was not expressed at a high level in endothelial cells in any of the tissues that we analysed ( figure a , d and table ) . however, it was detected in epithelial cells within the lung, liver and colon (supplementary figure and table ). tmprss was also expressed in hepatocytes. ctsl and ctsb encode cathepsin l and b, respectively, and are alternative proteases which can mediate the priming of the spike protein ). these are expressed across a range of cell types including endothelial cells in these different tissue types (table ) . it has recently been proposed that cd (bsg), a cell surface receptor, can also facilitate entry of sars-cov- into cells (k. ) and therefore we examined expression of this transcript across the datasets. in contrast to ace , we observed that cd was widely expressed in these tissues including in endothelial cells in all of the datasets for which these could be identified, i.e. lung, liver, and skin ( figure a , b, c). in summary, we report that cd but not ace is detectable in vascular endothelial cells within single cell sequencing datasets derived from multiple tissues in healthy individuals.
ace expression has previously been reported by rt-pcr in the rat heart, however it was not demonstrated that this was localized to the vascular endothelium (ferrario et al. ) . ace expression in renal endothelium has previously been reported on the basis of immunohistochemistry ) and in the endothelium of small and large arteries and veins in all tissues studied . both of these studies used a polyclonal rabbit anti-ace antibody produced by the same manufacturer and it is possible that staining with this antibody is not specific. interestingly, a recent report has shown that sars-cov- can infect t-lymphocytes via an ace -independent mechanism and proposed that a novel receptor might mediate uptake into t cells (x. and it has been shown that the level of ace expression is very low in normal respiratory mucosa (aguiar et al. ) . in order to reconcile the absence of ace expression with the presence of viral inclusion bodies in endothelial cells of infected patients (varga et al. ) and the direct infection of engineered human blood vessel organoids by covid- (monteil et al. ) we must conclude that either ace is not expressed in healthy tissue but is instead induced in response to sars-cov , or that sars-cov enters endothelial cells via an alternative receptor such as cd . gene expression and in situ protein profiling of candidate sars-cov- receptors in human airway epithelial cells and lung tissue a human liver cell atlas reveals heterogeneity and epithelial progenitors effect of angiotensin-converting enzyme inhibition and angiotensin ii receptor blockers on cardiac angiotensin-converting enzyme tissue distribution of ace protein, the functional receptor for sars coronavirus. 
a first step in understanding sars pathogenesis the novel coronavirus ( -ncov) uses the sars-coronavirus receptor ace and the cellular protease tmprss for entry into target cells sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor confirmation of the high cumulative incidence of thrombotic complications in critically ill icu patients with covid- : an updated analysis generation of pure lymphatic endothelial cells from human pluripotent stem cells and their therapeutic effects on wound repair e-selectin and intercellular adhesion molecule- are released by activated human endothelial cells in vitro renal ace expression in human kidney disease single-cell rna sequencing of human kidney inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace the human cell atlas white paper sfrp /dpp and fmo /lsp define major fibroblast populations in human skin abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia a molecular cell atlas of the human lung from single cell rna sequencing endothelial cell infection and endotheliitis in covid- sars-cov- invades host cells via a novel route: cd -spike protein sars-cov- infects t lymphocytes through its spike protein-mediated membrane fusion single-cell transcriptome analysis reveals differential nutrient absorption functions in human intestine clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study key: cord- -z d lcar authors: wang, shiyu; wang, longlong; liu, ya title: cd + t cell subsets present stable relationships in their t cell receptor repertoires date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: z d lcar cd + t cells are key components of adaptive immunity. the cell differentiation equips cd + t cells with new functions. 
however, the effect of cell differentiation on the t cell receptor (tcr) repertoire has not been investigated. here, we examined the features of the tcr beta (tcrb) repertoire of the top clones within naïve, memory and regulatory t cell (treg) subsets: repertoire structure, gene usage, length distribution and sequence composition. first, we found that memory subsets and treg could be discriminated from naïve cells by the features of the tcrb repertoire. second, we found that the correlations between the features of memory subsets and naïve cells were positively related to the differentiation levels of the memory subsets. third, we found that public clones presented a reduced proportion and a skewed sequence composition in differentiated subsets. furthermore, we found that public clones led naïve cells to recognize a broader spectrum of antigens than other subsets. our findings suggest that the tcrb repertoire of cd + t cell subsets is skewed in a differentiation-dependent manner, that variations in public clones contribute to these changes, and that the reduction of public clones during differentiation trims the antigen specificity of cd + t cells. the study unveils the physiological effect of memory formation and facilitates the selection of a proper cd + subset for cellular therapy. (tcr), cd + t cells recognize the complex of epitopes and major histocompatibility complex ii and then induce the activation of other cells in infections , , cancer and autoimmune diseases. to acquire mature functions, cd + t cells undergo differentiation. nt is the prototype of the cd + t cell and has the greatest potential among cd + t subsets to differentiate into other subsets. nt cells usually remain quiescent and can renew themselves by proliferation. when nt encounters pathogens, it homes to lymphatic organs and receives help from dendritic cells to initiate polarization.
studies of the tcr repertoire suggest that nt has the greatest evenness of tcr repertoire among all cd + subsets , which indicates the greatest potential to recognize antigens. in a classical differentiation model , , naive (nt) sars-cov- . cross-reactivation from memory can provide rapid protection against a novel pathogen in some individuals, as in the case reports of covid- . the importance of the tcr repertoire for memory cell functions was found in tissues, where the differential composition of the tcr repertoire of cd + memory cells among tissues equipped them with distinct functions . the function of treg is also restricted by the tcr repertoire. optimal tcr diversity was essential for suppressive ability , and limitations on tcr diversity disturbed the self-tolerance of the immune system . although evidence shows that the features of the tcr repertoire are distinct among cd + t subsets, the effect of differentiation on the tcr repertoire of cd + t cells has not been investigated. to unveil the influence of differentiation on the tcr repertoire, we analyzed sequencing data of the tcr beta (tcrb) chain of nt, et, emt, cmt, tscm and treg. we examined the repertoire structure, germline gene usage, sequence composition and public clones of the tcrb repertoire of each subset. we found that nt, cmt, tscm and treg could be discriminated from each other by repertoire structure, gene usage and sequence composition, independently. the tcrb the tcrb repertoire structure of nt is similar to the tcrb repertoire structure of cmt and tscm frequent clones affect the immune repertoire structure . we thus performed the analyses on top clones within each subset. renyi entropy with alpha values from zero to twenty was used to evaluate the diversity. in dataset , the tcrb repertoires of nt and tscm present similar diversities at all alpha values, and are more diverse than the tcrb repertoires of cmt and treg ( figure a ).
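the renyi diversity profile mentioned above can be sketched in python. this is a minimal sketch under stated assumptions: clone counts are taken as the input, natural logarithms are used, and the function names are illustrative rather than taken from the study's code.

```python
import math
from typing import Iterable, List


def renyi_entropy(counts: Iterable[float], alpha: float) -> float:
    """Renyi entropy of order alpha for a vector of clone counts.

    alpha = 0 gives log(richness); alpha = 1 is taken as the Shannon limit.
    """
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total <= 0:
        raise ValueError("at least one clone with a positive count is required")
    freqs = [c / total for c in positive]

    if math.isclose(alpha, 1.0):
        return -sum(p * math.log(p) for p in freqs)
    return math.log(sum(p ** alpha for p in freqs)) / (1.0 - alpha)


def renyi_profile(counts: Iterable[float], alphas: Iterable[float] = range(0, 21)) -> List[float]:
    """Entropy at each alpha; the default covers alpha = 0..20 as in the text."""
    counts = list(counts)
    return [renyi_entropy(counts, a) for a in alphas]
```

low alpha values weight rare clones more heavily and high alpha values emphasise dominant clones, so comparing whole profiles between subsets is more informative than comparing a single index.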
in dataset , nt has the most diverse tcrb repertoire among all subsets whereas et has the least diverse. the tcrb repertoire of cmt is more diverse than that of emt and treg (supplementary figure a) . the similarity of tcrb repertoire structure between subsets was estimated by jensen-shannon distance (jsd). in dataset , the tcrb repertoire structure of nt is similar to that of the less-differentiated subsets (cmt and tscm), but the tcrb repertoire structures of tscm and cmt differ from each other; the tcrb repertoire of treg differs from those of nt and cmt, with high jsds. this indicates that the tcrb repertoire structure of treg resembles that of more-differentiated memory subsets ( figure b ). in dataset , nt and cmt have similar tcrb repertoire structures, and the tcrb repertoire structure of treg is similar to that of emt rather than that of cmt (supplemental figure b) . these findings fit the trend found in dataset . to consider the overlapping usage of cdr clones, we further evaluated the similarity of tcrb repertoires among subsets with the morisita-horn similarity index. in this analysis, the tcrb repertoire of nt remains similar to those of tscm and cmt, while it differs from those of emt and treg ( figure c ; supplemental figure c ). in conclusion, the tcrb repertoire structure of cd + t the entire repertoire of nt was reported to be longer than the repertoire of memory . however, we found that the top clones in nt were shorter than clones in other subsets in all datasets ( figure a ; supplemental figure ). by pearson correlation, the length distribution of nt differs from that of cmt and tscm, which suggests that the length distribution of the nt tcrb repertoire is highly skewed in these less-differentiated memory cells ( figure b ).
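the two similarity measures used above, jensen-shannon distance and the morisita-horn index, can be sketched as follows. this is an illustrative sketch, assuming each repertoire is summarised as a mapping from clone sequence to count; how the study itself summarised repertoire structure before computing the distance is not stated here, and the function names are hypothetical.

```python
import math
from typing import Dict


def _frequencies(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert positive clone counts to relative frequencies."""
    total = sum(v for v in counts.values() if v > 0)
    if total <= 0:
        raise ValueError("repertoire must contain at least one positive count")
    return {k: v / total for k, v in counts.items() if v > 0}


def jensen_shannon_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Square root of the Jensen-Shannon divergence (log base 2, range 0..1)."""
    p, q = _frequencies(a), _frequencies(b)
    divergence = 0.0
    for clone in set(p) | set(q):
        pi, qi = p.get(clone, 0.0), q.get(clone, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return math.sqrt(max(divergence, 0.0))


def morisita_horn(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Morisita-Horn similarity (0 = no shared clones, 1 = identical frequencies)."""
    p, q = _frequencies(a), _frequencies(b)
    shared = sum(p[c] * q[c] for c in set(p) & set(q))
    return 2.0 * shared / (sum(v * v for v in p.values()) + sum(v * v for v in q.values()))
```

the distance is low for similar repertoires, whereas the morisita-horn index is high for similar repertoires and is driven mainly by abundant shared clones.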
since the naïve cells are sorted without antibody against cd in dataset , the length distribution of nt can be affected by contamination from cell sorting. we examined the length distribution in dataset . to identify whether gene usage affects the cdr length distribution, we calculated the mean length of clones for each gene. the mean length differs among clones using different v- and j-genes ( figure c and d) ; however, clones using all of genes are shorter in nt than clones in cmt and tscm. therefore, for top clones, the clones of nt are shorter than the clones of other subsets, and gene usage contributes less to the distinct length distributions among subsets. public clones, which are shared between individuals, were shown to differ from private clones in sequence composition . our analyses showed that public clones were shorter than private ones within top clones (supplemental figure a ), which suggests that public clones may affect the features of top clones. we refer to clones found in at least two individuals as public clones. we found more public clones in nt than in other subsets: , in nt, in tscm, from t d ( figure a ; supplemental figure b ). by calculating abundance, we found that public clones accounted for ~ % of top in nt, and ~ % in other subsets. this suggests that the public clones in nt have a larger effect on the repertoire of top clones than the public clones in other subsets ( figure b ). most public clones in nt were rarely present in other subsets ( figure c ), and about % of public clones in each subset could be found in nt. this result indicates that public clones in nt are less maintained than the top clones in other subsets. to detect the differences between public clones and private clones within each subset, a support vector machine (svm) was used. to avoid the influence of differing sample sizes, we randomly down-sampled public clones and private clones for each subset, respectively. the prediction was repeated for times.
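the public-clone definition and the down-sampling step described above can be sketched in python. this is an illustration under stated assumptions: repertoires are given as a mapping from individual to clone sequences, both classes are down-sampled to the size of the smaller one (the text does not give the sampled sizes), and all names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_public_clones(repertoires: Dict[str, Iterable[str]], min_individuals: int = 2) -> Set[str]:
    """Clones observed in at least `min_individuals` different individuals."""
    seen_in = defaultdict(set)
    for individual, clones in repertoires.items():
        for clone in set(clones):  # count each clone once per individual
            seen_in[clone].add(individual)
    return {clone for clone, owners in seen_in.items() if len(owners) >= min_individuals}


def balanced_downsample(public: Set[str], private: Set[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly draw equally sized samples of public and private clones."""
    size = min(len(public), len(private))
    rng = random.Random(seed)
    # Sorting first makes the draw reproducible for a given seed.
    return rng.sample(sorted(public), size), rng.sample(sorted(private), size)
```

repeating `balanced_downsample` with different seeds yields the repeated, size-matched training sets that a classifier such as an svm could then be evaluated on.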
the prediction accuracy (bacc) in nt was lower than the bacc in other subsets (figure d ). this suggests that the differences between private and public clones are larger in differentiated subsets than in naïve cells. further analyses showed that the gene usage of public clones is similar to that of all top clones (supplemental figure c ), suggesting that gene usage is not skewed in public clones. to identify whether public or private clones account for the increased difference in memory and treg, we used svm to discriminate public clones, and separately private clones, between subsets . for public clones, the bacc ranged from % to %; nt could be discriminated from treg, cmt and tscm with ~ % bacc, whereas cmt could not be separated from tscm (~ % bacc) (figure a ). for private clones, the bacc ranged from % to %. we achieved a high prediction accuracy when discriminating private clones of nt from private clones of treg, but we failed to separate private clones of cmt from those of tscm (figure b ). when we increased the sample size of private clones from to for training the svm model, the differences in bacc between subsets persisted (supplemental figure a). these results suggest that the sequence compositions of public clones and private clones are both skewed in

the reduced number of public clones narrows the antigen spectrum recognized by

to unveil the functions of clones among subsets, we annotated clones using vdjdb . this database records clones of cd + t cells targeting eight epitopes in total. compared with cmt, tscm and treg, nt has more clones recognizing antigens (ha, h and np) from influenza, pp from cytomegalovirus (cmv), cfp from m. tuberculosis and gliadin from triticum aestivum (figure c; supplementary figure b ).
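the svm comparisons above rely on down-sampling the two classes to equal size and on a prediction accuracy abbreviated bacc. the sketch below assumes that bacc denotes balanced accuracy (the mean of sensitivity and specificity), which the text does not spell out; the function names, labels and toy predictions are hypothetical, and no classifier is trained here.

```python
import random


def downsample_to_equal(public, private, seed=0):
    """Randomly down-sample both classes to the size of the smaller one."""
    rng = random.Random(seed)
    n = min(len(public), len(private))
    return rng.sample(public, n), rng.sample(private, n)


def balanced_accuracy(y_true, y_pred, positive="public"):
    """Mean of sensitivity and specificity for a two-class prediction."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


# Toy example: 10 public and 4 private clones (made-up identifiers).
pub, priv = downsample_to_equal([f"pub{i}" for i in range(10)],
                                [f"priv{i}" for i in range(4)])

# Made-up true labels and classifier output for eight clones.
y_true = ["public"] * 4 + ["private"] * 4
y_pred = ["public", "public", "public", "private",
          "private", "private", "public", "public"]
bacc = balanced_accuracy(y_true, y_pred)
```

a balanced accuracy near 0.5 corresponds to a classifier that cannot separate the two classes, which is how the low values for cmt versus tscm would be read under this assumption.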
to estimate the spectrum of antigens targeted by the top clones, we used gliph to predict clusters recognizing diverse antigens for each subset. with a stringency filter (see methods), we found clusters in nt from hc and clusters in nt from t d, respectively, but fewer than clusters in tscm, cmt and treg (figure d ). when public clones were removed from top clones, only . % of clusters remained in nt of hc and . % remained in t d, whereas over % of clusters remained in other subsets (figure e ). this suggests that public clones enlarge the antigen spectrum recognized by top clones in nt. in conclusion, nt recognizes a broader antigen profile, to which public clones contribute.

recognition of antigens by the tcr, which is primarily achieved by the cdr region, is essential for cd + t cells. cd + t cells can acquire new functions via differentiation; however, it is unclear how differentiation affects their tcr repertoire. we examined the relationships among the tcrb repertoires of top clones of naïve, memory and treg subsets (including nt, et, tcm, tem, tscm et and treg) by estimating the repertoire structure, the germline gene usage, the sequence composition (k-mer) and public cdr clone usage. we infer that the trbv repertoire features of memory subsets are tightly regulated during differentiation. we observed that of genes increased or decreased in the order nt, tscm, cmt and emt, which indicates that a mechanism exists to regulate the variations across subsets. furthermore, since tscm is the least differentiated and emt the most differentiated of the three memory subsets , this mechanism appears to track the level of differentiation. cmt is formally considered the primary memory subset into which et preferentially differentiate, with part of cmt then differentiating into emt. in the past decade, tscm has been found to mix phenotypes of naïve and memory.
tscm is able to self-renew and replenish more differentiated subsets of memory t cells, and therefore acts as the key intermediary in the generation of memory , . taken together, the differentiation levels of memory subsets reflect their differentiation order. however, memory cells can be generated directly from naïve cells by asymmetric cell division , . this indicates that the differentiation order should not be the only factor skewing tcrb antigens. it implies that, for newborns, food, self-antigens and even cytokine-driven clones compose a large part of the tcrb repertoire of memory. since the highly frequent clones in nt are self-antigen related, the features of frequent clones in nt will be passed on to memory during this period . furthermore, as shown by graeme et al, half of memory is maintained by self-emt maintain the features of tcrb repertoire inherited from nt at the early lifetime. in conclusion, it is reasonable to infer that events in early life, genetic factors and differentiation order regulate the tcrb repertoire of cd + t subsets at different levels of differentiation.

public clones are key components that affect the features of the tcrb repertoire during differentiation. first, we found that public clone usage, rather than gene usage, shortens the length distribution of top clones within nt. second, the sequence composition of public clones, which is skewed in differentiated subsets, contributes to the variation of the tcrb repertoire during differentiation. third, a decrease in public clones reduces the antigen spectrum recognized by memory and treg subsets. these results suggest that skewed public clone usage strongly affects top clones in differentiated subsets. furthermore, we showed that the factors affecting the generation of public clones in memory and treg differ from those in nt. the generation of public clones was largely attributed to genetic factors and thymic positive selection in a previous study .
in our study, public clones from nt are less maintained in differentiated subsets, and svm analyses indicate that sequence composition in memory it suggests that the difference between public clones and private clones is enlarged in the differentiated subsets. when we performed svm on public clones and private clones among subsets separately, the sequence compositions of both public and private clones were skewed during differentiation. a small fraction of peripheral treg differentiates from conventional t cells. as shown by golding a. et al, the repertoires of foxp + and foxp - cells do not overlap . although peripheral tregs differentiate from conventional t cells and can introduce features of nt into treg, the tcr repertoire of effector and memory subsets is more similar to nt than to treg. this suggests that the influx from naïve cells makes up only a minor part of treg in blood and that, compared with treg, the features of naïve cells are better maintained in effector and memory subsets during differentiation. our study includes samples from three health states (healthy, ra and t d individuals), and therefore highlights that our findings are consistent across health conditions and datasets.
tnf-alpha/ifn-gamma profile of hbv-specific cd t cells is associated with liver damage and viral clearance in chronic hbv infection
the roles of resident, central and effector memory cd t-cells in protective immunity following infection or vaccination
cd (+) t cell help is required for the formation of a cytolytic
the naive t-cell receptor repertoire has an extremely broad distribution of clone sizes
memory t cell subsets, migration patterns, and tissue residence
effector and memory t-cell differentiation: implications for vaccine development
development and function of protective and pathologic
diversity and clonal selection in the human t-cell repertoire
regulatory t cells suppress effector t cell proliferation by limiting
t cell receptor beta-chains display abnormal shortening and repertoire sharing in type diabetes
comprehensive tcr repertoire analysis of cd (+) t-cell subsets in rheumatoid arthritis
imonitor: a robust pipeline for tcr and bcr repertoire analysis
a model-based approach to comparative analysis of the clone size distribution of the t cell receptor repertoire
philentropy: information theory and distance quantification with r
kebabs: an r package for kernel-based analysis of biological sequences
analyzing the mycobacterium tuberculosis immune response by t-cell receptor clustering with gliph and genome-wide antigen screening
large-scale network analysis reveals the sequence space architecture of antibody repertoires
learning the high-dimensional immunogenomic features that predict
t-cell receptor repertoires share a restricted set of public and abundant cdr sequences that are associated with self-related immunity
memory cd t cell subsets are kinetically heterogeneous and replenished from naive t cells at high levels
crossreactive public tcr sequences undergo positive selection in the human thymic repertoire
deep sequencing of the tcr-beta repertoire of human forkhead box protein (foxp )(+) and foxp (-) t cells suggests that they are completely distinct and non-overlapping
the mechanisms shaping the repertoire of cd (+) foxp (+)

key: cord- -xa zgkd authors: adachi, hiroaki; sakai, toshiyuki; kourelis, jiorgos; maqbool, abbas; kamoun, sophien title: jurassic nlr: conserved and dynamic evolutionary features of the atypically ancient immune receptor zar date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xa zgkd

nlr immune receptors form one of the most diverse protein families in flowering plants (angiosperms). nlrs have massively expanded through birth-and-death evolution and typically exhibit hallmarks of rapid evolution even at the intraspecific level. here, we reconstructed the evolutionary history of zar , an atypically conserved nlr that traces its origin to early angiosperm lineages ~ to million years ago (jurassic period). we used iterative sequence similarity searches coupled with phylogenetic analyses to determine the degree to which zar orthologs and paralogs occur in plants. we discovered zar orthologs in species, including the monocot colocasia esculenta, the magnoliid cinnamomum micranthum and the majority of eudicots, notably the early diverging eudicot species aquilegia coerulea. analyses of the ortholog sequences revealed highly conserved features of zar , including regions for pathogen effector recognition, intramolecular interactions and cell death activation. this also uncovered a new conserved surface on the underside of the activated zar resistosome wheel. throughout its evolution, zar also acquired novel features. nine zar orthologs from cassava and cotton carry an integrated thioredoxin-like domain at their c-termini. zar also duplicated into two paralog families zar -sub and zar -cin. zar -sub, which emerged in the eudicots, is a large class of sequence diverse zar paralogs that lack several of the conserved motifs of zar . a second family, zar -cin, comprises an expansion of paralogs unique to a ~ kb locus in the c. micranthum genome and located about mb from zar .
we conclude that zar stands out among angiosperm nlrs for having an ancient origin and having experienced relatively limited gene duplication and expansion throughout its deep evolutionary history. nonetheless, zar did also give rise to non-canonical nlr proteins with integrated domains and degenerated molecular features.

plant immune receptors, often encoded by disease resistance (r) genes, detect invading pathogens and activate innate immune responses that can limit infection (jones and dangl, ). a major class of immune receptors is formed by intracellular proteins of the nucleotide-binding leucine-rich repeat (nlr) family (dodds and rathjen, ; jones et al., ; kourelis and van der hoorn, ). nlrs detect host-translocated pathogen effectors either by directly binding them or indirectly via host proteins known as guardees or decoys. nlrs are arguably the most diverse protein family in flowering plants (angiosperms) with many species having large (> ) and diverse repertoires of nlrs in their genomes (shao et al., ; baggs et al., ). they typically exhibit hallmarks of rapid evolution even at the intraspecific level (van de weyer et al., ; lee and chae, ; prigozhin and krasileva, ). towards the end of the th century, michelmore and meyers ( ) proposed that nlrs evolve primarily through the birth-and-death process (nei and hughes, ). in this model, new nlrs emerge by recurrent cycles of gene duplication and loss: some genes are maintained in the genome and acquire new pathogen detection specificities, whereas others are deleted or become non-functional through the accumulation of deleterious mutations. such dynamic patterns of evolution enable the nlr immune system to keep up with fast-evolving effector repertoires of pathogenic microbes. however, as already noted over years ago by michelmore and meyers ( ), a subset of nlr proteins are slow evolving and have remained fairly conserved throughout evolutionary time (wu et al., ; stam et al., ).
these "high-fidelity" nlrs (per lee and chae, ) offer unique opportunities for comparative analyses, providing a molecular evolution framework to reconstruct key transitions and reveal functionally critical biochemical features (delaux et al., ). nonetheless, comprehensive evolutionary reconstructions of conserved nlr proteins remain limited despite the availability of a large number of plant genomes across the breadth of plant phylogeny. one of the reasons is that the great majority of nlrs lack clear-cut orthologs across divergent plant taxa. here, we address this gap in knowledge by investigating zar (hopz-activated resistance ), an atypically ancient nlr, and asking fundamental questions about the conservation and diversification of this immune receptor throughout its deep evolutionary history. nlrs occur across all kingdoms of life and generally function in non-self perception and innate immunity (uehling et al., ). in the broadest biochemical definition, plant nlrs share a multidomain architecture typically consisting of an nb-arc (nucleotide-binding domain shared with apaf- , various r-proteins and ced- ) followed by a leucine-rich repeat (lrr) domain. angiosperm nlrs form several major monophyletic groups with distinct n-terminal domain fusions (shao et al., ; kourelis and kamoun, ). these include the subclades tir-nlr with the toll/interleukin- receptor (tir) domain, cc-nlr with the rx-type coiled-coil (cc) domain, ccr-nlr with the rpw -type cc (ccr) domain (tamborski and krasileva, ) and the more recently defined ccg -nlr with a distinct type of cc (ccg ). up to % of nlrs carry unconventional "integrated" domains in addition to the canonical tripartite domain architecture. integrated domains are thought to generally function as decoys to bait pathogen effectors and enable pathogen detection (cesari et al., ; sarris et al., ; wu et al., ; kourelis and van der hoorn, ).
they include dozens of different modules, indicating that novel domain acquisitions have repeatedly taken place throughout the evolution of plant nlrs (sarris et al., ; kroj et al., ). to date, over nlrs from genera in orders of flowering plants have been experimentally validated as reported in the refplantnlr reference dataset (kourelis and kamoun, ). several of these nlrs are coded by r genes that function against economically important pathogens and contribute to sustainable agriculture (dangl et al., ). in recent years, the research community has gained a better understanding of the structure/function relationships of plant nlrs and the immune receptor circuitry they form (wu et al., ; adachi et al., a; burdett et al., ; jubic et al., ; bayless and nishimura, ; feehan et al., ; mermigka et al., ; wang and chai, ; xiong et al., ; zhou and zhang, ). some nlrs, such as zar , form a single functional unit that carries both pathogen sensing and immune signalling activities in a single protein (termed 'singleton nlr' per adachi et al., a). other nlrs function together in pairs or more complex networks, where connected nlrs have functionally specialized into sensor nlrs dedicated to pathogen detection or helper nlrs that are required for sensor nlrs to initiate immune signalling (feehan et al., ). paired and networked nlrs are thought to have evolved from multifunctional ancestral receptors through asymmetrical evolution (adachi et al., a, b). as a result of their direct coevolution with pathogens, nlr sensors tend to diversify faster than helpers and can be dramatically expanded in some plant taxa (wu et al., ; stam et al., ). for instance, sensor nlrs often exhibit non-canonical biochemical features, such as degenerated functional motifs and unconventional domain integrations (adachi et al., b; seong et al., ).
the elucidation of plant nlr structures by cryo-electron microscopy has significantly advanced our understanding of the biochemical events associated with the activation of these immune receptors (wang et al., a; martin et al., ) . both the cc-nlr zar and the tir-nlr roq oligomerize upon activation into a wheel-like multimeric complex known as the resistosome. in the case of zar , recognition of bacterial effectors occurs through its partner receptor-like cytoplasmic kinases (rlcks), which tend to vary depending on the pathogen effector and host plant (lewis et al., ; wang et al., ; seto et al., ; schultink et al. ; laflamme et al., ) . activation of zar induces conformational changes in the nucleotide binding domain resulting in adp release, datp/atp binding and pentamerization of the zar -rlck complex into the resistosome. the zar resistosome exposes a funnel-shaped structure formed by the n-terminal α helices, which translocates into the plasma membrane and is thought to perturb membrane integrity to trigger cell death response (wang et al., b) . the zar n-terminal α helix matches the mada consensus sequence motif that is functionally conserved in ~ % of cc-nlrs including nlrs from dicot and monocot plant species (adachi et al., b) . this suggests that the biochemical 'death switch' mechanism of the zar resistosome may apply to a significant fraction of cc-nlrs. interestingly, unlike singleton and helper cc-nlrs, sensor cc-nlrs often carry degenerated mada α helix motifs and/or n-terminal domain integrations, which would preclude their capacity to trigger cell death according to the zar model (adachi et al., b; seong et al., ) . comparative sequence analyses based on a robust evolutionary framework can yield insights into molecular mechanisms and help generate experimentally testable hypotheses. 
zar was previously reported to be conserved across multiple dicot plant species but whether it occurs in other angiosperms hasn't been systematically studied (baudin et al. ; schultink et al. ; harant et al. ). here, we used a phylogenomic approach to investigate the molecular evolution of zar across flowering plants (angiosperms). we discovered zar orthologs in species, including monocot, magnoliid and eudicot species indicating that zar is an atypically conserved nlr that traces its origin to early angiosperm lineages ~ to million years ago (jurassic period). we took advantage of this large collection of orthologs to identify highly conserved features of zar , revealing regions for effector recognition, intramolecular interactions and cell death activation, along with a new conserved surface on the underside of the activated zar resistosome wheel. throughout its evolution, zar also acquired novel features, including the c-terminal integration of a thioredoxin-like domain and duplication into two paralog families zar -sub and zar -cin. members of the zar -sub paralog family have highly diversified in eudicots and often lack conserved zar features. we conclude that zar has experienced relatively limited gene duplication and expansion throughout its deep evolutionary history, but still did give rise to non-canonical nlr proteins with integrated domains and degenerated molecular features. to determine the distribution of zar across plant species, we applied a computational pipeline based on iterated blast searches of plant genome and protein databases ( figure a ). these comprehensive searches were seeded with previously identified zar sequences from arabidopsis, n. benthamiana, tomato, sugar beet and cassava (baudin et al. ; schultink et al. ; harant et al. ) . 
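the iterated search strategy described above (seed sequences are used as queries, new hits become queries in the next round, and the search stops when nothing new is found) can be sketched generically as follows. this is a minimal python illustration under stated assumptions: the `toy_search` function is a hypothetical stand-in for a real blast call, and the sequence identifiers, sequences and identity threshold are invented for the example.

```python
def iterative_search(seeds, database, search, max_rounds=10):
    """Expand a seed set by repeated similarity searches until no new hits appear.

    `search(query_id, database)` must return a set of hit identifiers.
    """
    found = set(seeds)
    queries = set(seeds)
    for _ in range(max_rounds):
        new_hits = set()
        for query in queries:
            new_hits.update(search(query, database))
        new_hits -= found          # keep only sequences not seen before
        if not new_hits:
            break                  # converged: the last round added nothing
        found |= new_hits
        queries = new_hits         # only new sequences are used as queries next
    return found


def toy_search(query_id, database, min_identity=0.8):
    """Hypothetical stand-in for a BLAST search: ungapped identity on equal-length strings."""
    query = database[query_id]
    hits = set()
    for seq_id, seq in database.items():
        if len(seq) != len(query):
            continue
        identity = sum(a == b for a, b in zip(query, seq)) / len(query)
        if identity >= min_identity:
            hits.add(seq_id)
    return hits


# Made-up sequences: hit2 is too divergent from the seed but is reached via hit1.
database = {
    "seed": "AAAAAAAAAA",
    "hit1": "AAAAAAAACC",
    "hit2": "AAAAAACCCC",
    "other": "GGGGGGGGGG",
}
harvested = iterative_search({"seed"}, database, toy_search)
```

the example shows why iteration matters: a divergent homolog that a single search from the seed would miss can still be recovered through an intermediate hit.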
we also performed iterated phylogenetic analyses using the nb-arc domain of the harvested zar -like sequences, and obtained a well-supported clade that includes previously reported zar from the eudicots (arabidopsis, cassava, sugar beet, tomato and n. benthamiana) as well as new clade members from more distantly related plant species, notably colocasia esculenta (taro, alismatales), cinnamomum micranthum (syn. c. kanehirae, stout camphor, magnoliidae) and aquilegia coerulea (columbine, ranunculales) (supplementary table ). in total, we identified zar from angiosperm species that tightly clustered in the zar phylogenetic clade (figure b , supplementary table ). among the genes, code for canonical cc-nlr proteins with . to . % similarity to arabidopsis zar , whereas another carry the three major domains of cc-nlr proteins but have a c-terminal integrated domain (zar -id, see below). the remaining genes code for two truncated nlrs and a potentially mis-annotated coding sequence due to a gap in the genome sequence. in summary, we propose that the identified clade consists of zar orthologs from a diversity of angiosperm species. our analyses of zar -like sequences also revealed two well-supported sister clades of the zar ortholog clade (figure b ). we named these subclades zar -sub and zar -cin and we describe them in more detail below. we have recently proposed that zar is the most conserved cc-nlr between rosid and asterid plants (harant et al. ). to further evaluate zar conservation relative to other cc-nlrs across angiosperms, we used a phylogenetic tree of nlrs from the monocot taro, the magnoliid stout camphor and eudicot species (columbine, arabidopsis, cassava, sugar beet, tomato, n. benthamiana) to calculate the phylogenetic (patristic) distance between each of the arabidopsis cc-nlrs and their closest neighbor from each of the other plant species.
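the patristic distance used above is the sum of branch lengths along the path connecting two tips of a tree. a minimal python sketch of that calculation, and of picking the closest neighbor from another species, is given below. the tree encoding (a child-to-parent map with branch lengths), the function names, the tip labels and the branch lengths are all hypothetical and chosen only for illustration; they are not taken from the study.

```python
def path_to_root(tip, parent):
    """Map each ancestor of `tip` (and `tip` itself) to its distance from `tip`."""
    dist = {tip: 0.0}
    node, total = tip, 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        dist[node] = total
    return dist


def patristic_distance(a, b, parent):
    """Sum of branch lengths on the path between tips `a` and `b`."""
    da = path_to_root(a, parent)
    db = path_to_root(b, parent)
    # With non-negative branch lengths, the most recent common ancestor
    # gives the smallest summed distance among all shared ancestors.
    return min(da[n] + db[n] for n in da if n in db)


def closest_neighbours(ref_tip, tips_by_species, parent):
    """For each species, return (nearest tip, patristic distance) to `ref_tip`."""
    result = {}
    for species, tips in tips_by_species.items():
        candidates = [t for t in tips if t != ref_tip]
        if not candidates:
            continue
        best = min(candidates, key=lambda t: patristic_distance(ref_tip, t, parent))
        result[species] = (best, patristic_distance(ref_tip, best, parent))
    return result


# Toy tree: child -> (parent, branch length). All values are made up.
parent = {
    "n1": ("root", 0.1), "n2": ("root", 0.3),
    "at_zar": ("n1", 0.05), "sl_zar": ("n1", 0.07),
    "at_other": ("n2", 0.4), "sl_other": ("n2", 0.5),
}
tips_by_species = {"tomato": ["sl_zar", "sl_other"]}
nearest_zar = closest_neighbours("at_zar", tips_by_species, parent)
nearest_other = closest_neighbours("at_other", tips_by_species, parent)
```

in this toy tree the conserved gene has a short distance to its nearest tomato neighbor, whereas the second gene sits on long branches and has a much larger distance, which is the kind of contrast the analysis above is designed to reveal.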
we found that zar stands out for having the shortest phylogenetic distance to its orthologs relative to other cc-nlrs in this diverse angiosperm species set (figure -figure supplement ). a similar analysis, in which we plotted the phylogenetic distance between each of the n. benthamiana cc-nlrs and their closest neighbor from the other species, also revealed zar as displaying the shortest patristic distance across all examined species (figure -figure supplement ). these analyses revealed that zar is possibly the most widely conserved cc-nlr in flowering plants (angiosperms).

(a) workflow for computational analyses in searching zar orthologs. we performed tblastn/blastp searches and subsequent phylogenetic analyses to identify zar ortholog genes from angiosperm genome/proteome datasets. (b) zar forms a clade with two closely related sister subclades. the phylogenetic tree was generated in mega by the neighbour-joining method using nb-arc domain sequences of zar -like proteins identified from the prior blast searches and nlrs identified from representative plant species, taro, stout camphor, columbine, tomato, sugar beet and arabidopsis. each branch is marked with different colours based on the zar and the sister subclades. red arrow heads indicate bootstrap support > . and are shown for the relevant nodes. the scale bar indicates the evolutionary distance in amino acid substitution per site.

although zar is distributed across a wide range of angiosperms, we noted particular patterns in its phylogenetic distribution. supplementary table describes the gene identifiers and other features of zar orthologs sorted based on the phylogenetic clades reported by smith and brown ( ). of the plant species have a single copy of zar whereas species have two or more copies. zar is primarily a eudicot gene but we identified three zar orthologs outside the eudicots, two in the monocot taro and another one in the magnoliid stout camphor.
we failed to detect zar orthologs in species among the species we examined (supplementary table ) . except for taro, zar is missing in monocot species ( examined), including in the well-studied hordeum vulgare (barley), oryza sativa (rice), triticum aestivum (wheat) and zea mays (maize). zar is also missing in all examined species of the eudicot fabales, cucurbitales, apiales and asterales. however, we found a zar ortholog in the early diverging eudicot columbine and zar is widespread in other eudicots, including in rosid, caryophyllales and asterid species. the overall conservation of the zar orthologs enabled us to perform phylogenetic analyses using the full-length protein sequence and not just the nb-arc domain as generally done with nlrs ( zar phylogenetic tree with well-supported branches that generally mirrored established phylogenetic relationships between the examined plant species (smith and brown, ; chaw et al., ) . for example, the zar tree matched a previously published species tree of angiosperms based on single-copy core ortholog genes (chaw et al., ) . we conclude that the origin of the zar gene predates the split between monocots, magnoliids and eudicots and its evolution traced species divergence ever since. we postulate that zar probably emerged in the jurassic era ~ to million years ago (mya) based on the species divergence time estimate of chaw et al. ( ) and consistent with the latest fossil evidence for the emergence of flowering plants (fu et al., ) . nlr genes are often clustered in loci that are thought to accelerate sequence diversification and evolution (michelmore and meyers, ; lee and chae, ) . we examined the genetic context of zar genes using available genome assemblies of taro, stout camphor, columbine, arabidopsis, cassava, sugar beet, tomato and n. benthamiana. 
the zar locus is generally devoid of other nlr genes as the closest nlr is found in the arabidopsis genome kb away from zar (figure -figure supplement -supplementary table ). we conclude that zar has probably remained a genetic singleton nlr gene throughout its evolutionary history in angiosperms. next, we examined the zar locus for gene co-linearity across the examined species. we noted a limited degree of gene co-linearity between arabidopsis vs. cassava, cassava vs. tomato, and tomato vs. n. benthamiana (figure -figure supplement ) . flanking conserved genes include the atpase and protein kinase genes that are present at the zar locus in both rosid and asterid eudicots. in contrast, we didn't observe conserved gene blocks at the zar locus of taro, stout camphor and columbine, indicating that this locus is divergent in these species. overall, although limited, the observed gene co-linearity in eudicots is consistent with the conclusion that zar is a genetic singleton with an ancient origin. the overall sequence conservation and deep evolutionary origin of zar orthologs combined with the detailed knowledge of zar structure and function provide a unique opportunity to explore the evolutionary dynamics of this ancient immune receptor in a manner that cannot be applied to more rapidly evolving nlrs. we used meme (multiple em for motif elicitation) (bailey and elkan, ) to search for conserved sequence patterns among the zar orthologs (zar and zar -id) that encode full-length cc-nlr proteins. this analysis revealed several conserved sequence motifs that span across the zar orthologs (range of protein lengths: - amino acids) ( figure a , figure -supplementary table ). in figure a , we described the major five sequence motifs or interfaces known to be required for arabidopsis zar function that are conserved across zar orthologs. effector recognition by zar occurs indirectly via binding to rlcks through the lrr domain. 
key residues in the arabidopsis zar -rlck interfaces are highly conserved among zar orthologs and were identified by meme as conserved sequence patterns (figure a ). valine (v) , histidine (h) , tryptophan (w) and phenylalanine (f) in the arabidopsis zar lrr domain were validated by mutagenesis as important residues for rlck binding whereas isoleucine (i) was not essential (wang et al. a; hu et al. ). in the zar orthologs, v , h , w and f are conserved in - % of the proteins compared to only % for i . after effector recognition, arabidopsis zar undergoes conformational changes from inactive to active state. this is mediated by adp release from the nb-arc domain and subsequent atp binding, which triggers further structural remodelling of zar into the pentameric resistosome (wang et al. b ). nb-arc sequences that coordinate binding and hydrolysis of datp, namely p-loop and mhd motifs, are highly conserved across zar orthologs (figure a ). histidine (h) and lysine (k) , located in the adp/atp binding pocket (wang et al. a; wang et al. b), are invariant in all orthologs. in addition, three nb-arc residues, w , s and v , known to form the nbd-nbd oligomerisation interface for resistosome formation (wang et al. b; hu et al. ), are present in - % of the zar orthologs and were also part of a meme motif (figure a ).

(a) schematic representation of the arabidopsis zar protein highlighting the position of conserved sequence patterns across zar orthologs. consensus sequence patterns were identified by meme using zar ortholog sequences. raw meme motifs are listed in figure -supplementary table . red asterisks indicate residues functionally validated in arabidopsis zar for nbd-nbd and zar -rlck interfaces. (b) conservation and variation of each amino acid among zar orthologs across angiosperms. amino acid alignment of zar orthologs was used for conservation score calculation via the consurf server (https://consurf.tau.ac.il). the conservation scores are mapped onto each amino acid position in arabidopsis zar (np_ . ). (c, d) distribution of the consurf conservation score on the arabidopsis zar structure. the inactive zar monomer is illustrated in cartoon representation with different colours based on each canonical domain (c) and the conservation score (d). the five major variable surfaces (vs to vs ) on the inactive zar monomer structure are marked by grey dots in panel b and black boxes in panel d.

the n-terminal cc domain of arabidopsis zar mediates cell death signalling through the n-terminal α helix/mada motif, which becomes exposed in the activated zar resistosome to form a funnel-like structure that perturbs the plasma membrane (baudin et al., ; wang et al. b; adachi et al., b). we detected an n-terminal meme motif that matches the α helix/mada motif (figure a ). we also used the hmmer software (eddy, ) to query the zar orthologs with a previously reported mada motif-hidden markov model (hmm) (adachi et al., b). this hmmer search detected a mada-like sequence at the n-terminus of all zar orthologs (supplementary table ). taken together, based on the conserved motifs depicted in figure a , we propose that angiosperm zar orthologs share the main functional features of arabidopsis zar : effector recognition via rlck binding, remodelling of intramolecular interactions via the adp/atp switch, oligomerisation via the nbd-nbd interface, and α helix/mada motif-mediated activation of hypersensitive cell death.

to identify additional conserved and variable features in zar orthologs, we used consurf (ashkenazy et al., ) to calculate a conservation score for each amino acid and generate a diversity barcode for zar orthologs (figure b ). the overall pattern is that the zar orthologs are fairly conserved.
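the per-residue conservation percentages quoted above (the fraction of orthologs that carry the same residue as the reference protein at a given position) can be illustrated with a simple calculation on a multiple sequence alignment. the python sketch below is an assumption-laden simplification: it computes plain identity to a reference sequence, not the evolutionary-rate-based score that consurf reports, and the function name, sequence identifiers and toy alignment are hypothetical.

```python
def residue_conservation(alignment, reference_id):
    """Fraction of aligned sequences sharing the reference residue at each position.

    `alignment` maps sequence IDs to equal-length aligned strings ('-' = gap).
    Returns a list of (reference position, reference residue, fraction), with
    positions numbered along the ungapped reference sequence starting at 1.
    The reference itself is included in the fraction.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    reference = alignment[reference_id]
    sequences = list(alignment.values())
    result = []
    ref_pos = 0
    for col, ref_res in enumerate(reference):
        if ref_res == "-":
            continue  # column is a gap in the reference: no reference position
        ref_pos += 1
        same = sum(1 for seq in sequences if seq[col] == ref_res)
        result.append((ref_pos, ref_res, same / len(sequences)))
    return result


# Toy alignment with made-up sequences.
alignment = {
    "reference": "MK-VH",
    "ortholog_1": "MKAVH",
    "ortholog_2": "MR-VH",
    "ortholog_3": "MK-IH",
}
conservation = residue_conservation(alignment, "reference")
```

numbering positions along the ungapped reference keeps the output comparable with residue numbers quoted for the reference protein, which is the convention the text follows for arabidopsis zar .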
nonetheless, the cc domain (except for the n-terminal mada motif and a few conserved stretches), the junction between the nb-arc and lrr domains and the very c-terminus were distinctly more variable than the rest of the protein (figure b ). we also used the cryo-em structures of arabidopsis zar to determine how the consurf scores map onto the d structures (figure c , d and figure ). first, we found five major variable surfaces (vs to vs ) on the inactive zar monomer structure (figure c , d), as depicted in the zar diversity barcode (figure b ). vs comprises α /α helices and a loop between α and α helices of the cc domain. vs and vs correspond to α /α helices of nbd and a loop between α and α helices of hd , respectively. vs comprises a loop between whd and lrr and the first three helices of the lrr domain. vs is mainly derived from the last three helices of the lrr domain and the loops between these helices (figure b , d). we also noted significant sequence variation at the glutamate rings inside the arabidopsis zar resistosome (figure -figure supplement ). mutations of glutamic acid (e) and e impaired arabidopsis zar -mediated cell death without interfering with oligomerization and plasma membrane association (wang et al. b ). the e /e ring was previously discussed as potentially having ca + transporter activity because of structural similarity to rings in the structures of the mitochondrial calcium uniporter from caenorhabditis elegans and the calcium release-activated calcium channel orai from drosophila melanogaster (burdett et al., ). whereas e is conserved in % of zar orthologs, only - % retain e , e and e in the same positions as arabidopsis zar .

figure . (d) schematic representation of the conserved underside surface region among zar orthologs. the conserved underside regions are described with consensus sequence patterns identified by meme. red asterisks indicate residues exposed on resistosome surfaces. the raw meme motif is listed in figure -supplementary table .

next, we examined highly conserved surfaces on inactive and active zar structures (figure a , b). consistent with the meme analyses, we confirmed that highly conserved surfaces match the rlck binding interfaces (figure a , b). we also confirmed that the n-terminal α helix/mada motif is conserved on the resistosome surfaces, although the first four n-terminal amino acids are missing from the n terminus of the active zar cryo-em structures (figure b ). remarkably, these analyses revealed a highly conserved ring that is exposed on the underside surface of the zar resistosome opposite to the funnel-shaped structure (figure c ). this conserved underside surface is mainly formed by α helix-loop-ß sheet and α helix-loop-α helix regions in the nbd (figure -figure supplement and ). within the conserved patch, arginine (r) , k and k are positively charged residues that are exposed on the underside surface and are conserved in %, % and % of zar orthologs, respectively (figure -figure supplement ). the three residues form a positive electrostatic potential ring on the underside surface of the zar resistosomes (figure -figure supplement ). the conserved underside ring is composed of a amino acid motif as revealed by meme (figure d ). we propose that this underside sequence pattern has been maintained throughout the more than million years of zar evolution and is likely to be functionally important.

as noted earlier, zar orthologs carry an integrated domain (id) at their c-termini (supplementary table ). these zar -id include predicted proteins (xp_ . and xp_ . ) from manihot esculenta (cassava) and predicted proteins (kab . , ppd . , kab . , tyg . , tyi . , tyj . , kjb . ) from the cotton plant species gossypium barbadense, gossypium darwinii, gossypium mustelinum and gossypium raimondii (supplementary table ).
the integrations follow an otherwise intact lrr domain and vary in length from to amino acids ( figure a ). we confirmed that the zar -id gene models of cassava xp_ . and xp_ . are correct based on rna-seq exon coverage in the ncbi database (database id: loc ). however, cassava zar -id xp_ . and xp_ . are isoforms encoded by transcripts from a single locus on chromosome lg (refseq sequence nc_ . ) of the cassava refseq assembly (gcf_ . ) which also produces transcripts encoding isoforms lacking the c-terminal id (xp_ . , xp_ . , xp_ . , xp_ . and xp_ . ). thus, cassava zar -id are probably splicing variants from a unique cassava zar gene locus ( figure -figure supplement ) . to determine the phylogenetic relationship between zar -id and canonical zar , we mapped the domain architectures of zar orthologs on the phylogenetic tree shown in figure . we annotated all the c-terminal extensions as thioredoxin-like using interproscan (trx, ipr ; ipr ; cd ). the integrated trx domain sequences share sequence similarity to each other ( figure b ). they are also similar to arabidopsis at g (phosphoducin-like plp a; . - % similarity to integrated trx domains), which is located immediately downstream of zar in a tail-to-tail configuration in the arabidopsis genome ( . we also noted additional genetic linkage between zar and trx genes in other rosid species, namely field mustard, orange, cacao, grapevine and apple, and in the asterid species coffee (figure -figure supplement -supplementary table ). we conclude that zar is often genetically linked to a plp a-like trx domain gene and that the integrated domain in zar -id has probably originated from a genetically linked sequence. phylogenetic analyses revealed zar -sub as a sister clade of the zar ortholog clade ( figure b, figure ). the zar -sub clade comprises genes from a total of plant species (supplementary table ) . of the plant species carry a single copy of zar -sub whereas species have two or more copies.
figure . zar -sub emerged early in eudicots and diverged at the mada motif sequence. the phylogenetic tree was generated in mega by the neighbour-joining method using full length amino acid sequences of zar , zar -sub and zar -cin identified in figure . each branch is marked with different colours based on the plant taxonomy. red triangles indicate bootstrap support > . . the scale bar indicates the evolutionary distance in amino acid substitution per site. nlr domain architectures are illustrated outside of the leaf labels: mada is red, cc is pink, nb-arc is yellow, lrr is blue and other domain is orange. black asterisks on domain schemes describe truncated nlrs or potentially mis-annotated nlr.
of the genes, code for canonical cc-nlr proteins ( - amino acid length) with shared sequence similarities ranging from . to . % ( figure ). unlike zar , zar -sub nlrs are restricted to eudicots (figure -figure supplement , supplementary table ). three out of genes are from the early diverging eudicot clade ranunculales species, namely columbine, macleaya cordata (plume poppy) and papaver somniferum (opium poppy) (figure -figure supplement ) . the remaining zar -sub are spread across rosid and asterid species (figure -figure supplement ) . we found that species have zar -sub genes but lack a zar ortholog (supplementary table ) . these species include two of the early diverging eudicots plume poppy and opium poppy, and the brassicales carica papaya (papaya). interestingly, papaya is the only brassicales species carrying a zar -sub gene, whereas the other brassicales species have zar but lack zar -sub genes (figure -figure supplement , supplementary table ) . in total, we did not detect zar -sub genes in species that have zar orthologs, and these species include the monocot taro, the magnoliid stout camphor and eudicots, such as arabidopsis, sugar beet and n. benthamiana (supplementary table ) . 
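the presence/absence comparison described above (species that carry a zar -sub gene but no ortholog, and the reverse) can be illustrated with a small set-based tabulation. this is only a sketch: the clade labels "ortholog" and "sub", the function names and the toy records are assumptions made for illustration, and the records are not the contents of the supplementary table (only species named in the text are used, plus one explicitly hypothetical entry).

```python
from collections import defaultdict


def presence_by_species(records):
    """Map each species to the set of clades in which it has at least one gene."""
    presence = defaultdict(set)
    for species, clade in records:
        presence[species].add(clade)
    return dict(presence)


def species_with_only(presence, present, absent):
    """Species that have a gene in clade `present` but none in clade `absent`."""
    return sorted(
        species
        for species, clades in presence.items()
        if present in clades and absent not in clades
    )


# Toy (species, clade) records; illustrative only, not the paper's dataset.
records = [
    ("papaya", "sub"),
    ("opium poppy", "sub"),
    ("plume poppy", "sub"),
    ("arabidopsis", "ortholog"),
    ("sugar beet", "ortholog"),
    ("taro", "ortholog"),
    ("hypothetical species", "ortholog"),  # placeholder for a species with both
    ("hypothetical species", "sub"),
]

presence = presence_by_species(records)
sub_without_ortholog = species_with_only(presence, "sub", "ortholog")
ortholog_without_sub = species_with_only(presence, "ortholog", "sub")
print(sub_without_ortholog)
print(ortholog_without_sub)
```

keeping one record per gene rather than per species means that copy number per species can be obtained from the same input by counting records instead of collecting sets.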
in summary, given the taxonomic distribution of the zar -sub clade genes, we propose that zar -sub has emerged from a single duplication event of zar prior to the split between ranunculales and other eudicot lineages about ~ - mya based on the species divergence time estimate of chaw et al. ( ) . we investigated the sequence patterns of zar -sub proteins and compared them to the sequence features of canonical zar proteins that we identified earlier (figures , ) . meme analyses revealed several conserved sequence motifs (figure -supplementary table ) . especially, the meme motifs in the zar -sub nb-arc domain were similar to zar ortholog motifs (figure -supplementary table ). these include p-loop and mhd motifs, which are broadly conserved in nb-arc of % and % of the zar -sub nlrs, respectively ( figure a ). meme also revealed sequence motifs in the zar -sub lrr domain that partially overlaps in position with the conserved zar -rlck interfaces ( figure a remarkably, unlike zar orthologs, meme did not predict conserved sequence pattern from a region corresponding to the mada motif, indicating that these sequences have diverged across zar -sub proteins ( figure a ). we confirmed the low frequency of mada motifs in zar -sub proteins using hmmer searches with only ~ % ( out of ) of the tested proteins having a mada-like sequence (supplementary table , figure ). moreover, conserved sequence patterns were not predicted for the nbd-nbd interface and the conserved underside surface of the zar resistosome ( figure a, figure -figure supplement ). this indicates that the nb-arc domain of zar -sub proteins is highly diversified in contrast to the relatively conserved equivalent region of zar proteins. we generated a diversity barcode for zar -sub proteins using the consurf as we did earlier with zar orthologs ( figure b ). 
this revealed that there are several conserved sequence blocks in each of the cc, nb-arc and lrr domains, such as the regions corresponding to ploop, mhd motif and the equivalent of the zar -rlck interfaces. nonetheless, zar -sub proteins are overall more diverse than zar orthologs especially in the cc domain, including the n-terminal mada motif, and the nbd/hd regions of the nb-arc domain where the nbd-nbd interface is located. next, we mapped the consurf conservation scores onto a homology model of a representative zar -sub protein (xp_ . from tomato) built based on the arabidopsis zar cryo-em structures ( figure ). as highlighted in figure b and c, conserved residues, such as mhd motif region in the whd, are located inside of the monomer and resistosome structures. interestingly, although the prior meme prediction analyses revealed conserved motifs in positions matching the zar -rlck interfaces in the lrr domain, the zar -sub structure homology models displayed variable surfaces in this region ( figures a, a ). this indicates that the variable residues within these sequence motifs are predicted to be on the outer surfaces of the lrr domain and may reflect interaction with different ligands. taken together, these results suggest that unlike zar orthologs, the zar -sub paralogs have divergent molecular patterns for regions known to be involved in effector recognition, resistosome formation and activation of hypersensitive cell death. the zar -cin clade, identified by phylogenetic analyses as a sister clade to zar and zar -sub, consists of genes from the magnoliid species stout camphor ( figure b, figure , supplementary table ). of the zar -cin genes code for canonical cc-nlr proteins with . to . % sequence similarities to each other, whereas the remaining genes code for truncated nlr proteins. interestingly, all zar -cin genes occur in a ~ kb cluster on scaffold qpkb . of the stout camphor genome assembly (genbank assembly accession gca_ . 
) ( figure -figure supplement ). this scaffold also contains the stout camphor zar ortholog (cmzar , rwr ), which is located mb from the zar -cin cluster ( figure -figure supplement ) . based on the observed phylogeny and gene clustering, we suggest that the zar -cin cluster emerged from segmental duplication and expansion of the ancestral zar gene after stout camphor split from the other examined zar containing species. we performed meme and consurf analyses of the intact zar -cin proteins as described above for zar and zar -sub. the consurf barcode revealed that although zar -cin proteins are overall conserved, their whd region and lrr domain include some clearly variable blocks ( figure b ). meme analyses of zar -cin sequences revealed that like zar orthologs, the mada, p-loop and mhd motifs match highly conserved blocks of the zar -cin consurf barcode ( figure b , c, figure -supplementary tables and ) . consistently, . % ( out of ) of the zar -cin proteins were predicted to have a mada-type n-terminal sequence based on mada-hmm analyses (supplementary table , figure ). meme picked up additional sequence motifs in zar -cin proteins that overlap in position with the nbd-nbd and zar -rlck interfaces ( figure c, figure -figure supplement ) . however, the sequence consensus at the nbd-nbd and zar -rlck interfaces indicated these motifs are more variable among zar -cin proteins relative to zar orthologs, and the motif sequences were markedly different from the matching region in zar (figures a, c) . we also mapped the consurf conservation scores onto a homology model of a representative zar -cin protein (rwr . ) built based on the arabidopsis zar cryo-em structures (figure ). this model revealed several conserved surfaces, such as on the α helix in the cc domain, the whd of the nb-arc domain and underside surface of the resistosome (figure b, c, e). 
in contrast, the zar -cin structure homology models displayed highly varied surfaces especially in the lrr region matching the rlck binding interfaces of zar ( figure a ). this sequence diversification on the lrr surface suggests that the zar -cin paralogs may have different host partner proteins and/or effector recognition specificities compared to zar .
this study originated from phylogenomic analyses we initiated during the covid- lockdown of march . we performed iterated comparative sequence similarity searches of plant genomes using the cc-nlr immune receptor zar as a query, and subsequent phylogenetic evaluation of the recovered zar -like sequences. this revealed that zar is an ancient gene with orthologs recovered from species including monocot, magnoliid and eudicot plants. zar is an atypically conserved nlr in these species, with the gene phylogeny tracing the species phylogeny, consistent with the view that zar originated early in angiosperms during the jurassic geologic period ~ to mya (figure ). the ortholog series enabled us to determine that resistosome sequences known to be functionally important have remained highly conserved throughout the long evolutionary history of zar . this also revealed a new conserved sequence ring on the underside of the resistosome, which has remained constrained in zar ortholog proteins (figure ) . the only unexpected feature among zar orthologs is the acquisition of a c-terminal thioredoxin-like domain in cassava and cotton species (figures , ). our phylogenetic analyses also indicated that zar duplicated twice throughout its evolution (figure ). in the eudicots, zar spawned a large paralog family, zar -sub, which greatly diversified and often lost the typical sequence features of zar . a second paralog, zar -cin, is restricted to a tandemly repeated -gene cluster in stout camphor. 
overall, our findings map patterns of functional conservation, expansion and diversification onto the evolutionary history of zar and its paralogs ( figure ). phylogenomic analyses, such as this work, provide a unique evolutionary perspective on the function of a plant nlr immune receptor and generate experimentally testable hypotheses that can be challenged in the future. zar most likely emerged prior to the split between monocots, magnoliids and eudicots, which corresponds to ~ to mya based on the dating analyses of chaw et al. ( ) . the origin of the angiosperms remains hotly debated with uncertainties surrounding some of the fossil record coupled with molecular clock analyses that would benefit from additional genome sequences of undersampled taxa. however, recently fu et al. ( ) lent credence to an earlier emergence of angiosperms with the discovery of the fossil flower nanjinganthus dendrostyla, which places the emergence of flowering plants at the early jurassic. it is tempting to speculate that zar emerged among these early flowering plants during the period when dinosaurs dominated planet earth.
figure . evolution of zar and the paralogs in angiosperms. we propose that the ancestral zar gene emerged ~ to million years ago (mya) before monocot and eudicot lineages split. the zar gene is a widely conserved cc-nlr in angiosperms, but it is likely that zar has been lost in a monocot lineage, commelinales. a sister clade paralog zar -sub emerged early in the eudicot lineages and may have been lost in caryophyllales. another sister clade paralog zar -cin duplicated from the zar gene and expanded in the magnoliidae c. micranthum. trx domain integration to the c terminus of zar has independently occurred in a few rosid lineages.
nlrs are notorious for their rapid and dynamic evolutionary patterns. in contrast, zar is an atypically core nlr gene conserved in a wide range of angiosperm species (figures and ) . nevertheless, arabidopsis zar can recognize diverse bacterial pathogen effectors, including five different effector families distributed among nearly half of a collection of ~ pseudomonas syringae strains (laflamme et al., ) and an effector avrac from xanthomonas campestris (wang et al., ) . how did zar remain conserved throughout its evolutionary history while managing to detect a diversity of effectors? the answer to the riddle lies in the fact that zar effector recognition occurs via its partner rlcks. hopz-eti-deficient (zed ) and zed -related kinases (zrks) of the rlck xii- subfamily rest in complex with inactive zar proteins and bait effectors by binding them directly or by recruiting other effector-binding rlcks, such as the family vii pbs -like protein (pbl ) (lewis et al., ; wang et al., ) . these zar -associated rlcks are highly diversified in arabidopsis, with of the rlck xii- members occurring in the expanded zrk gene cluster (lewis et al., ) . in this zrk cluster, rks /zrk is required for recognition of x. campestris effector avrac (wang et al., ) and zrk and zrk /zed are required for recognition of p. syringae effectors hopf a and hopz a, respectively (lewis et al., ; seto et al., ) . therefore, as in the model discussed by schultink et al. ( ) , rlcks have evolved as pathogen 'sensors' whereas zar acts as a conserved signal executor to activate the immune response. future phylogenomic analyses of the rlck subfamilies coupled with functional analyses with zar across angiosperms will help test and sharpen this model. our meme and consurf analyses are consistent with the model of zar /rlck evolution described above. zar is not just exceptionally conserved across angiosperms but it has also preserved sequence patterns that are key to resistosome-mediated immunity (figures and ) . in particular, within the lrr domain, zar orthologs display highly conserved surfaces for rlck binding (figure ) . we conclude that zar has been guarding host kinases throughout its evolution ever since the jurassic period. 
these findings strikingly contrast with observations recently made by prigozhin and krasileva ( ) on highly variable arabidopsis nlrs (hvnlrs), which tend to have diverse lrr sequences. for instance, the cc-nlr rpp displays variable lrr surfaces across arabidopsis accessions, presumably because these regions are effector recognition interfaces that are caught in arms race coevolution with the oomycete pathogen hyaloperonospora arabidopsidis (prigozhin and krasileva, ) . the emerging view is that the mode of pathogen detection (direct vs indirect recognition) drives an nlr evolutionary trajectory by accelerating sequence diversification at the effector binding site or by maintaining the binding interface with the partner guardee/decoy proteins (prigozhin and krasileva, ) . zar orthologs display a patchy distribution across angiosperms (figure , supplementary table ). given the low number of non-eudicot species with zar it is challenging to develop a conclusive evolutionary model. nonetheless, the most parsimonious explanation is that zar was lost in the monocot commelinales lineage (figure , supplementary table ) . zar is also missing in some eudicot lineages, notably fabales, cucurbitales, apiales and asterales (supplementary table ) . cucurbitaceae (cucurbitales) species are known to have reduced repertoires of nlr genes possibly due to low levels of gene duplications and frequent deletions (lin et al., ) . zar may have been lost in this and other plant lineages as part of an overall shrinkage of their nlromes or as a consequence of selection against autoimmune phenotypes triggered by nlr mis-regulation (karasov et al., ; adachi et al., a) . in the future, it would be interesting to investigate the repertoires of rlck subfamilies vii and xii in species that lack zar orthologs. we unexpectedly discovered that some zar orthologs from cassava and cotton species carry a c-terminal thioredoxin-like domain (zar -id in figure ). 
what is the function of these integrated domains? the occurrence of unconventional domains in nlrs is relatively frequent and ranges from to % of all nlrs. in several cases, integrated domains have emerged from pathogen effector targets and became decoys that mediate detection of the effectors (kourelis and van der hoorn, ). whether or not the integrated trx domain of zar -id functions to bait effectors will need to be investigated. since zar -id proteins still carry intact kinase binding interfaces (supplementary table -source data ), they may have evolved dual or multiple recognition specificities via rlcks and the trx domain. in addition, all zar -id proteins have an intact n-terminal mada motif ( figure -figure supplement ) , suggesting that they probably can execute the hypersensitive cell death through their nterminal cc domains even though they carry a c-terminal domain extension (adachi et al., b) . however, we noted multiple splice variants of the zar -id gene of cassava, some of which lack the trx integration ( figure -figure supplement ) . it is possible that both zar and zar -id isoforms are produced, potentially functioning together as a pair of sensor and helper nlrs. our sequence analyses of zar -id indicate that the integrated trx domain originates from the plp phosphoducin gene, which is immediately downstream of zar in the arabidopsis genome and adjacent to zar in several other eudicot species ( figure -figure supplement ) . whether or not plp plays a role in zar function and the degree to which close genetic linkage facilitated domain fusion between these two genes are provocative questions for future studies. zar spawned two classes of paralogs through two independent duplication events. the zar -sub paralog clade emerged early in the eudicot lineage-most likely tens of millions of years after the emergence of zar -and has diversified into at least genes in species (figure ). 
zar -sub proteins are distinctly more diverse in sequence than zar orthologs and generally lack key sequence features of zar , like the mada motif and the nbd-nbd oligomerisation interface (figures , ) (adachi et al., b; wang et al. b; hu et al. ). this pattern is consistent with 'use-it-or-lose-it' evolutionary model, in which nlrs that specialize for pathogen detection lose some of the molecular features of their multifunctional ancestors (adachi et al., b) . therefore, we predict that many zar -sub proteins evolved into specialized sensor nlrs that require nlr helper mates for executing the hypersensitive response. it is possible that zar -sub helper mate is zar itself, and that these nlrs evolved into a phylogenetically linked network of sensors and helpers similar to the nrc network of asterid plants (wu et al., ) . however, species have a zar -sub gene but lack a canonical zar (supplementary table ) , indicating that these zar -sub nlrs may have evolved to depend on other classes of nlr helpers. how would zar -sub sense pathogens? given that the lrr domains of most zar -sub proteins markedly diverged from the rlck binding interfaces of zar , it is unlikely that zar -sub proteins bind rlcks in a zar -type manner (figure ). this leads us to draw the hypothesis that zar -sub proteins have diversified to recognize other ligands than rlcks. in the future, functional investigations of zar -sub proteins could provide insights into how multifunctional nlrs, such as zar , evolve into functionally specialized nlrs. the zar -cin clade consists of clustered paralogs that are unique to the magnoliid species stout camphor as revealed from the genome sequence of the taiwanese small-flowered camphor tree (also known as cinnamomum kanehirae, chinese name niu zhang 牛樟) (chaw et al., ) . this cluster probably expanded from zar , which is ~ mbp on the same genome sequence scaffold (figure -figure supplement ) . 
the relatively rapid expansion pattern of zar -cin into a tandemly duplicated gene cluster is more in line with the classical model of nlr evolution compared to zar maintenance as a genetic singleton over tens of millions of years (michelmore and meyers, ) . zar -cin proteins may have neofunctionalized after duplication, acquiring new recognition specificities as a consequence of coevolution with host partner proteins and/or pathogen effectors. consistent with this view, zar -cin proteins display distinct surfaces at the zar -rlck binding interfaces and may bind to other ligands than rlcks as we hypothesized above for zar -sub (figure ). zar -cin could be viewed as intraspecific highly variable nlrs (hvnlr) per the nomenclature of prigozhin and krasileva ( ) . unlike zar -sub, zar -cin have retained the n-terminal mada sequence (figures and ) . we propose that zar -cin are able to execute the hypersensitive cell death on their own similar to zar . however, zar -cin display divergent sequence patterns at nbd-nbd oligomerisation interfaces compared to zar ( figure c, figure -figure supplement ) . therefore, zar -cin may form resistosome-type complexes that are independent of zar . one intriguing hypothesis is that zar -cin may associate with each other to form heterocomplexes of varying complexity and functionality operating as an nlr receptor network. in any case, the clear-cut evolutionary trajectory from zar to the zar -cin paralog cluster provides a robust evolutionary framework to study functional transitions and diversifications in this cc-nlr lineage. in summary, our phylogenomics analyses raise a number of intriguing questions about zar evolution. the primary hypothesis we draw is that zar is an ancient cc-nlr that has been guarding rlcks ever since the jurassic period. 
throughout over million years, zar has maintained its molecular features for sensing pathogens and activating hypersensitive cell death, but it also has retained an intriguingly conserved nb-arc ring surface on the underside of the zar resistosome (figure ) . we propose that this underside surface may play an important function in the resistosome, similar to other highly conserved regions such as the α helix/mada motif, the nbd-nbd oligomerisation interface and the rlck binding interfaces. the equivalent region of the nb-arc underside ring is apparently not exposed onto the underside surface of the tir-nlr roq resistosome structure (martin et al., ) . therefore, the conserved underside surface of zar may be a specific feature of cc-nlr resistosomes. further comparative analyses, combining molecular evolution and structural biology, of plant resistosomes and between resistosomes and the apoptosomes and inflammasome of animal nlr systems (wang and chai, ) will yield novel experimentally testable hypotheses for nlr research.
we performed blast (altschul et al., ) using previously identified zar sequences as queries (baudin et al. ; schultink et al. ; harant et al. ) to search for zar -like sequences in the ncbi nr or nr/nt database (https://blast.ncbi.nlm.nih.gov/blast.cgi) and phytozome . (https://phytozome.jgi.doe.gov/pz/portal.html#!search?show=blast). in the blast search, we used the cut-offs percent identity ≥ % and query coverage ≥ %. the blast search was iterated by using the obtained sequences as new queries to search for zar -like genes across the angiosperm species. we also ran the same blast pipeline against a plant nlr dataset annotated by nlr-parser (steuernagel et al., ) from plant reference genome databases (supplementary table ) . for the phylogenetic analysis, we aligned nlr amino acid sequences using mafft v. (katoh and standley, ) and manually deleted the gaps in the alignments in mega (kumar et al., ) . 
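the similarity search described above, in which hits passing the identity and coverage cut-offs are reused as new queries, can be sketched as a simple fixed-point loop. this is a minimal illustration under stated assumptions, not the pipeline used in the study: `Hit`, `passes_cutoffs`, `iterative_search` and `toy_search` are hypothetical names, the search backend is a stand-in dictionary rather than a real blast call, and the threshold values in the usage lines are arbitrary placeholders because the actual cut-off values are not reproduced here.

```python
from typing import Callable, Iterable, NamedTuple, Set


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    percent_identity: float  # e.g. the "pident" column of a tabular report
    query_coverage: float    # e.g. the "qcovs" column of a tabular report


def passes_cutoffs(hit: Hit, min_identity: float, min_coverage: float) -> bool:
    """Keep a hit only if it meets both cut-offs."""
    return hit.percent_identity >= min_identity and hit.query_coverage >= min_coverage


def iterative_search(
    seeds: Iterable[str],
    search: Callable[[str], Iterable[Hit]],
    min_identity: float,
    min_coverage: float,
    max_rounds: int = 10,
) -> Set[str]:
    """Repeat the search with newly recovered sequences until nothing new is found."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        new_ids = set()
        for query in sorted(frontier):
            for hit in search(query):
                if hit.subject_id in found:
                    continue
                if passes_cutoffs(hit, min_identity, min_coverage):
                    new_ids.add(hit.subject_id)
        if not new_ids:
            break
        found |= new_ids
        frontier = new_ids
    return found


# Stand-in search backend with invented identifiers and scores.
toy_hits = {
    "seedA": [Hit("seedA", "seqB", 80.0, 90.0), Hit("seedA", "seqX", 20.0, 95.0)],
    "seqB": [Hit("seqB", "seqC", 75.0, 85.0), Hit("seqB", "seedA", 80.0, 90.0)],
    "seqC": [Hit("seqC", "seqY", 90.0, 10.0)],
}


def toy_search(query: str):
    return toy_hits.get(query, [])


# Placeholder thresholds, chosen only to exercise the code.
result = iterative_search(["seedA"], toy_search, min_identity=50.0, min_coverage=50.0)
print(sorted(result))
```

passing the search function in as an argument keeps the loop independent of how hits are produced, so the same logic applies whether hits come from a web search, a local database or a pre-filtered nlr dataset.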
full-length or nb-arc domain sequences of the aligned nlr datasets were used for generating phylogenetic trees. the neighbour-joining tree was made using mega with jtt model and bootstrap values based on iterations. all datasets used for phylogenetic analyses are in source data files. to calculate the phylogenetic (patristic) distance, we used python script based on dendropy (sukumaran and mark, ) . we calculated patristic distances from each cc-nlr to the other cc-nlrs on the phylogenetic tree (figure -source data ) and extracted the distance between cc-nlrs of arabidopsis or n. benthamiana to the closest nlr from the other plant species. the script used for the patristic distance calculation is available from github (https://github.com/slt / phylogenetic_distance_plot ). to investigate genetic co-linearity at zar loci, we extracted the genes upstream and downstream of zar using gff files derived from reference genome databases (supplementary table ). to identify conserved gene blocks, we used gene annotation from ncbi protein database and confirmed protein domain information based on interproscan (jones et al, ) . full-length nlr sequences of the each subfamily zar , zar -sub or zar -cin were subjected to motif searches using the meme (multiple em for motif elicitation) (bailey and elkan, ) with parameters 'zero or one occurrence per sequence, top twenty motifs', to detect consensus motifs conserved in ≥ % of the input sequences. the output data are summarized in figure to predict the mada motif from zar , zar -sub and zar -cin datasets, we used the mada-hmm previously developed (adachi et al., b) , with the hmmsearch program (hmmsearch -max -o ) implemented in hmmer v . . (eddy, ) . we termed sequences over the hmmer cut-off score of . as the mada motif and sequences having the score -to- . as the mada-like motif. to analyze sequence conservation and variation in zar , zar -sub and zar -cin proteins, aligned full-length nlr sequences in mafft v. 
were used for consurf (ashkenazy et al., ) . arabidopsis zar (np_ . ), a tomato zar -sub (xp_ . ) or a stout camphor zar -cin (rwr . ) was used as a query for each analysis of zar , zar -sub or zar -cin, respectively. the output datasets are in figure -source data , figure source data and figure -source data . we used the cryo-em structure of activated zar (wang et al., b) as a template to generate homology models of zar -sub and zar -cin. the amino acid sequences of a tomato zar -sub (xp_ . ) and a stout camphor zar -cin (rwr . ) were submitted to protein homology recognition engine v . (phyre ) for modelling (kelley et al., ) . the coordinates of the zar structure ( j t) were retrieved from the protein data bank and assigned as the modelling template by using phyre expert mode. the resulting models of zar -sub and zar -cin, and the zar structures ( j t), were illustrated with the consurf conservation scores in pymol. the phylogenetic tree shown in figure was used to describe nlr domain architectures. domain schemes are aligned to the right side of the leaf labels: mada is red, cc is pink, nb-arc is yellow, lrr is blue and other domain is orange. black asterisks on domain schemes describe truncated nlrs or potentially mis-annotated nlr. each branch is marked with different colours based on the plant taxonomy. red triangles indicate bootstrap support > . . the scale bar indicates the evolutionary distance in amino acid substitution per site. supplementary table -source data . amino acid sequences of zar in angiosperms. this file contains zar amino acid sequences identified from computational pipeline in figure a . supplementary table -source data . amino acid alignment file of zar in angiosperms. full-length amino acid sequences of zar orthologs were aligned by mafft version . supplementary table -source data . list of plant species carrying zar as single-copy gene. supplementary table -source data . list of plant species carrying or more zar genes. 
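the patristic distance step in the methods (the distance from each cc-nlr to the closest nlr of another species) can be illustrated without any phylogenetics library. the sketch below is not the dendropy-based script referred to in the methods; it is a dependency-free illustration in which the tree encoding (a child-to-parent dictionary with branch lengths), the function names and the toy tree are all assumptions, and branch lengths are assumed to be non-negative.

```python
def path_to_root(node, parent):
    """Return {ancestor: distance from `node`} for the node and all its ancestors."""
    distances = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        distances[node] = total
    return distances


def patristic_distance(a, b, parent):
    """Sum of branch lengths on the path between leaves `a` and `b`."""
    from_a = path_to_root(a, parent)
    from_b = path_to_root(b, parent)
    # With non-negative branch lengths, the minimum over shared ancestors
    # is reached at the lowest common ancestor.
    return min(from_a[n] + from_b[n] for n in from_a if n in from_b)


def closest_other_species(leaf, species_of, parent):
    """Return (distance, leaf) for the nearest leaf from a different species."""
    candidates = [
        (patristic_distance(leaf, other, parent), other)
        for other in species_of
        if species_of[other] != species_of[leaf]
    ]
    return min(candidates)


# Toy tree: child -> (parent, branch length). Names and lengths are invented.
parent = {
    "a": ("n1", 0.2),
    "b": ("n1", 0.3),
    "n1": ("root", 0.1),
    "c": ("root", 0.5),
}
species_of = {"a": "species 1", "b": "species 2", "c": "species 2"}

distance, nearest = closest_other_species("a", species_of, parent)
print(nearest, round(distance, 3))
```

for trees with many leaves, a library routine that computes all pairwise distances in one pass is preferable to this pairwise approach, which recomputes root paths for every comparison.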
nlr singletons, pairs, and networks: evolution, assembly, and regulation of the intracellular immunoreceptor circuitry of plants an n-terminal motif in nlr immune receptors is functionally conserved across distantly related plant species basic local alignment search tool consurf : an improved methodology to estimate and visualize evolutionary conservation in macromolecules nlr diversity, helpers and integrated domains: making sense of the nlr identity fitting a mixture model by expectation maximization to discover motifs in biopolymers analysis of the zar immune complex reveals determinants for immunity and molecular interactions structure-function analysis of zar immune receptor reveals key molecular interactions for activity enzymatic functions for toll/interleukin- receptor domain proteins in the plant immune system uncoiling cnls: structure/function approaches to understanding cc domain function in plant nlrs function, discovery, and exploitation of plant pattern recognition receptors for broad-spectrum disease resistance the plant "resistosome": structural insights into immune signaling a novel conserved mechanism for plant nlr protein pairs: the "integrated decoy stout camphor tree genome fills gaps in understanding of flowering plant genome evolution pivoting the plant immune system from dissection to deployment reconstructing trait evolution in plant evo-devo studies plant immunity: towards an integrated view of plant-pathogen interactions induced proximity of a tir signaling domain on a plant-mammalian nlr chimera activates defense in plants profile hidden markov models plant nlrs get by with a little help from their friends an unexpected noncarpellate epigynous flower from the jurassic of china phytozome: a comparative platform for green plant genomics a vector system for fast-forward in vivo studies of the zar resistosome in the model plant nicotiana benthamiana bacterial effectors induce oligomerization of immune receptor zar in vivo the plant immune 
system intracellular innate immune surveillance devices in plants and animals interproscan : genome-scale protein function classification help wanted: helper nlrs and plant immune responses mechanisms to mitigate the trade-off between growth and defense mafft multiple sequence alignment software version : improvements in performance and usability the phyre web portal for protein modeling, prediction and analysis defended to the nines: years of resistance gene cloning identifies nine mechanisms for r protein function refplantnlr: a comprehensive collection of experimentally validated plant nlrs integration of decoy domains derived from protein targets of pathogen effectors into plant immune receptors is widespread mega : molecular evolutionary genetics analysis version . for bigger datasets the pan-genome effector-triggered immunity landscape of a host-pathogen interaction genome-wide functional analysis of hot pepper immune receptors reveals an autonomous nlr cluster in seed plants variation patterns of nlr clusters in arabidopsis thaliana genomes the arabidopsis zed pseudokinase is required for zar -mediated immunity induced by the pseudomonas syringae type iii effector hopz a receptor-like cytoplasmic kinases: central players in plant receptor kinase-mediated signaling frequent loss of lineages and deficient duplications accounted for low copy number of disease resistance genes in cucurbitaceae structure of the activated roq resistosome directly recognizing the pathogen effector xopq plant and animal innate immunity complexes: fighting different enemies with similar weapons clusters of resistance genes in plants evolve by divergent selection and a birth-anddeath process balanced polymorphism and evolution by the birth-and-death process in the mhc loci intraspecies diversity reveals a subset of highly variable plant immune receptors and predicts their binding sites comparative analysis of plant immune receptor architectures uncovers host proteins likely targeted 
by pathogens evolution of nlr resistance genes with noncanonical n-terminal domains in wild tomato species expanded type iii effector recognition by the zar nlr protein using zed -related kinases using forward genetics in nicotiana benthamiana to uncover the immune signaling pathway mediating recognition of the xanthomonas perforans effector xopj constructing a broadly inclusive seed plant phylogeny large-scale analyses of angiosperm nucleotide-binding site-leucine-rich repeat genes reveal three anciently diverged classes with distinct evolutionary patterns subsets of nlr genes show differential signatures of adaptation during colonization of new habitats nlr-parser: rapid annotation of plant nlr complements dendropy: a python library for phylogenetic computing evolution of plant nlrs: from natural history to precise modifications do fungi have an innate immune response? an nlr-based comparison to plant and animal immune systems a species-wide inventory of nlr genes and alleles in arabidopsis thaliana the decoy substrate of a pathogen effector and a pseudokinase specify pathogeninduced modified-self recognition and immunity in plants ligand-triggered allosteric adp release primes a plant nlr complex reconstitution and structure of a plant nlr resistosome conferring immunity structural insights into the plant immune receptors prrs and nlrs the "sensor domains" of plant nlr proteins: more than decoys? nlr network mediates immunity to diverse plant pathogens receptor networks underpin plant immunity resistosome and inflammasome: platforms mediating innate immunity plant immunity: danger perception and signaling we are thankful to several colleagues for discussions and ideas. we thank sebastian schornack (sainsbury laboratory, university of cambridge, cambridge, uk) for valuable comments on this paper. 
this work was funded by the gatsby charitable foundation, biotechnology and biological sciences research council (bbsrc, uk), and european research council (erc blastoff projects). we thank the prime minister of the united kingdom for announcing a stay-at-home order on th march . h.a. and s.k. mainly wrote the paper; h.a., a.m. and s.k. designed the research and supervised the work; h.a., t.s., j.k. and a.m. performed research. s.k. receives funding from industry on nlr biology. supplementary table -source data . amino acid sequences of zar -sub. this file contains zar -sub amino acid sequences identified from computational pipeline in figure a . supplementary table -source data . amino acid alignment file of zar -sub. full-length amino acid sequences of zar -sub were aligned by mafft version . supplementary table -source data . amino acid sequences of zar -cin. this file contains zar -cin amino acid sequences identified from computational pipeline in figure a . supplementary table -source data . amino acid alignment file of zar -cin. full-length amino acid sequences of zar -cin were aligned by mafft version . key: cord- -nh i lu authors: vanachayangkul, pattaraporn; im-erbsin, rawiwan; tungtaeng, anchalee; kodchakorn, chanikarn; roth, alison; adams, john; chaisatit, chaiyaporn; saingam, piyaporn; sciotti, richard j.; reichard, gregory a.; nolan, christina k.; pybus, brandon s.; black, chad c.; lugo, luis a.; wegner, matthew d.; smith, philip l.; wojnarski, mariusz; vesely, brian a.; kobylinski, kevin c. title: safety, pharmacokinetics, and liver-stage plasmodium cynomolgi effect of high-dose ivermectin and chloroquine in rhesus macaques date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nh i lu previously, ivermectin ( – mg/kg) was shown to inhibit liver-stage development of plasmodium berghei in orally dosed mice. here, ivermectin showed inhibition of the in vitro development of plasmodium cynomolgi schizonts (ic = . μm) and hypnozoites (ic = . 
μm) in primary macaque hepatocytes when administered in high-dose prophylactically but not when administered in radical cure mode. the safety, pharmacokinetics, and efficacy of oral ivermectin ( . , . , and . mg/kg) with and without chloroquine ( mg/kg) administered for seven consecutive days was evaluated for prophylaxis or radical cure of plasmodium cynomolgi liver-stages in rhesus macaques. no inhibition or delay to blood-stage p. cynomolgi parasitemia was observed at any ivermectin dose ( . , . , and . mg/kg). ivermectin ( . and . mg/kg) and chloroquine ( mg/kg) in combination were well-tolerated with no adverse events and no significant pharmacokinetic drug-drug interactions observed. repeated daily ivermectin administration for seven days did not inhibit ivermectin bioavailability. it was recently demonstrated that both ivermectin and chloroquine inhibit replication of the novel severe acute respiratory syndrome coronavirus (sars-cov- ) in vitro. further ivermectin and chloroquine trials in humans are warranted to evaluate their role in plasmodium vivax control and as adjunctive therapies against covid- infections. novel chemoprophylactic therapeutics and vector control interventions could support and accelerate malaria elimination efforts. ivermectin mass drug administration (mda) has been proposed as a malaria control tool since it makes the blood of treated persons lethal to anopheles mosquitoes, the vectors of malaria ( - ), and repeated ivermectin mdas in burkina faso were able to reduce malaria transmission to humans ( ). ivermectin is a safe and well- tolerated endectocidal drug used widely in veterinary and human medicine to combat both internal and external parasites. ivermectin has been shown to inhibit liver-stage development of plasmodium berghei in both an in vitro huh human hepatoma cell line model ( ) and an in vivo c bl/ mouse model ( ). the in vitro half maximal inhibitory concentration (ic ) for ivermectin p. berghei inhibition, ic = . 
µg/ml ( . µm), was higher than blood levels that can be achieved in treated humans. however, mice that were orally dosed with ivermectin at - mg/kg at and hours before and hours after sporozoite challenge demonstrated liver-stage inhibition equal to primaquine ( mg/kg) under the same dosing schedule ( ). human equivalent dosing (hed) that was evaluated in mice would correlate to ivermectin doses in the range of . - . mg/kg ( ). thus, ivermectin is promising for human malaria chemoprophylaxis as ivermectin doses as high as mg/kg have been safely administered to humans ( ). if ivermectin can prevent plasmodium liver-stage infection, then ivermectin chemoprophylaxis could be considered in high risk groups such as forest-goers in the greater mekong subregion or naïve soldiers deployed to malaria endemic areas. furthermore, if ivermectin mda is deployed for community-wide malaria vector control, and ivermectin is chemoprophylactic, then there would be direct benefits to mda participants in preventing malaria infections. model can evaluate both the causal prophylaxis, (i.e. protection from developing liver schizonts), and the hypnozoiticidal (i.e. radical cure of liver hypnozoites) efficacy of compounds ( ) . ivermectin has been used in rhesus macaque colonies to treat mites ( ), lice ( ), and intestinal helminths, such as ascaris, trichuris, and strongyloides fulleborni ( ) ( ) ( ) . studies demonstrated that oral ivermectin was safe in macaques at doses up to . mg/kg for days and that macaques are an ideal animal model for ivermectin human treatment ( , ) . however, no study to date has evaluated the pharmacokinetics of repeated ivermectin treatment in rhesus macaques or in combination with chloroquine. here we evaluate the in vitro and in vivo liver-stage effect of ivermectin against p. cynomolgi in rhesus macaque liver hepatocytes and infected macaques. 
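as a hedged illustration of the human equivalent dosing (hed) conversion mentioned above, the sketch below applies the body-surface-area scaling described in the fda starting-dose guidance: the animal dose in mg/kg is multiplied by the ratio of the animal km factor to the human km factor. the function name and the example doses are hypothetical and are not the doses evaluated in the cited studies; the km values are the standard ones tabulated in that guidance.

```python
# Km factors (body weight divided by body surface area, kg/m^2) as tabulated
# in the FDA starting-dose guidance. Only a few species are listed here.
KM = {"mouse": 3.0, "rat": 6.0, "monkey": 12.0, "human": 37.0}


def human_equivalent_dose(animal_dose_mg_per_kg: float, species: str) -> float:
    """Convert an animal dose (mg/kg) to a human equivalent dose (mg/kg).

    Hypothetical helper for illustration: HED = dose * Km_animal / Km_human.
    """
    if species not in KM:
        raise ValueError(f"no Km factor for species: {species}")
    return animal_dose_mg_per_kg * KM[species] / KM["human"]


if __name__ == "__main__":
    # Illustrative doses only; they are NOT the doses used in this study.
    for species, dose in [("mouse", 10.0), ("monkey", 1.0)]:
        hed = human_equivalent_dose(dose, species)
        print(f"{species}: {dose} mg/kg -> HED {hed:.3f} mg/kg")
```

the same ratio explains why a given mg/kg dose in mice corresponds to a much lower mg/kg dose in humans, whereas the reduction for macaques is smaller.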
the safety and pharmacokinetics of repeated oral ivermectin dosing with and without chloroquine in macaques is also presented. ivermectin efficacy against liver-stage parasites was initially evaluated using an in vitro p. cynomolgi liver model which utilizes primary rhesus macaque hepatocytes in order to closely resemble the in vivo anti-relapse mode. the drugging regimen was defined by treatment mode, either prophylactic mode (i.e. drug administered with sporozoites and days thereafter) or radical cure mode (i.e. drug administered from days to post sporozoite infection) similar to previously described methods ( ) . in prophylactic mode, ivermectin showed marginal in vitro causal protection against the development of p. cynomolgi-infected rhesus macaque hepatocyte liver schizonts ic = . μg/ml ( . μm) and hypnozoites ic = . μg/ml ( . μm) ( figure ). however, in radical cure mode, ivermectin had no activity on developing p. cynomolgi liver schizonts or established hypnozoites, even when dosed at a high initial concentration of µg/ml ( . μm). inhibition of liver schizonts (ic = . μg/ml) and hypnozoites (ic = . μg/ml). ls = liver- stage. graph bars represent means with standard deviation of biological replicates (n = ) with experimental replicates (n = ). there was only one adverse event in a single macaque (r ) that vomited three hours after the first oral dose of ivermectin ( . mg/kg) when administered as monotherapy one day prior to p. cynomolgi sporozoite injection. no adverse events occurred when ivermectin ( . or . mg/kg) was co-administered with chloroquine. no abnormal hematology outcomes were observed for ivermectin alone or ivermectin plus chloroquine co-administration. primary blood-stage parasitemia greater than , /μl was detected ten days post inoculation for negative and positive control groups and for of macaques in both ivermectin high ( . mg/kg)-and low ( . 
mg/kg)-dose groups, with remaining macaques from each group reaching greater than , /µl eleven days post inoculation which was and days after the last ivermectin administration. primary infection blood-stage parasitemia was cleared from the negative control group with ten days of chloroquine ( mg/kg) and both blood-and liver-stage parasites from positive control group with seven days of chloroquine ( mg/kg) and primaquine ( . mg/kg). blood-stage parasitemia was cleared from the three macaques in the low-dose ivermectin group with seven days ivermectin ( . mg/kg) and ten days chloroquine ( mg/kg). two of three macaques were cleared of primary infection blood-stage parasitemia in the high- dose group with ivermectin ( . mg/kg) for seven days and chloroquine ( mg/kg) for ten days, while one macaque was cleared with ivermectin ( . mg/kg) and chloroquine ( mg/kg) for seven days. however, the first relapse occurred within weeks, at approximately the same time for negative control and both ivermectin groups with no significant differences for time to blood- stage parasitemia or treatment (log-rank (mantel cox) test p > . ). the first relapse infection blood-stage parasitemia was cleared from the negative control with chloroquine ( mg/kg) alone for seven days. first relapse infection blood-stage parasitemia was cleared from both high ( . mg/kg)-and low ( . mg/kg)-dose ivermectin groups when given in combination with chloroquine ( mg/kg) for seven days. approximately weeks later, a second relapse occurred in all negative control and ivermectin high-and low-dose treated macaques with no significant differences for time to blood-stage parasitemia or treatment (log-rank (mantel cox) test p > . ). at the point of second relapse, all ivermectin-group macaques were treated with primaquine ( . mg/kg) and chloroquine ( mg/kg) for seven days. the positive control group was treated with primaquine ( . 
mg/kg) and chloroquine ( mg/kg) for seven days at the point of primary infection and had no relapses for the remainder of the study (figure ). the negative control group was treated with primaquine ( . mg/kg) and chloroquine ( mg/kg) for seven days at the point of third relapse (data not shown). table illustrates the pharmacokinetic parameters of ivermectin when administered alone after the first and seventh (last) doses. auc %extrap is the percentage of area-under-the-curve infinity due to extrapolation from the last collection time point to infinity, auc hr is the exposure through hours, auc inf is the total exposure, cl/f is the apparent clearance, vz/f is the apparent volume of distribution, c max is the maximum concentration, c max /dose is the maximum concentration divided by the dose administered, t / is the elimination half-life, and t max is the time to reach the maximum concentration. table illustrates the pharmacokinetic parameters of ivermectin when administered with chloroquine ( mg/kg) after the first and seventh (last) doses. auc %extrap is the percentage of area-under-the-curve infinity due to extrapolation from the last collection time point to infinity, auc hr is the exposure through hours, auc inf is the total exposure, cl/f is the apparent clearance, vz/f is the apparent volume of distribution, c max is the maximum concentration, c max /dose is the maximum concentration divided by the dose administered, t / is the elimination half-life, and t max is the time to reach the maximum concentration. there was no delay to patency of first blood-stage p. cynomolgi infection in either the low- or high-dose ivermectin group (figure ). ivermectin displayed µm levels of liver schizont efficacy in vitro; however, the lack of delay to blood-stage patency suggests minimal impact of ivermectin on liver schizont development. admittedly, the injection of one million p. 
cynomolgi sporozoites into the macaque sets a very high bar for any drug as it only requires one sporozoite to develop into a liver schizont to continue the blood-stage malaria infection. this is in contrast to a single mosquito that is predicted to deliver < sporozoites during blood feeding ( ). to the best of our knowledge, this is the highest repeated-dose ivermectin pharmacokinetic investigation in any mammal species. there were no significant changes in the cl/f or t / . it should be noted that this study had a small sample size, only three macaques per ivermectin-treated group, and thus ivermectin autoinhibition warrants further evaluation in future trials. in humans, three repeated doses of ivermectin ( or mg) every third day did not inhibit c max when comparing the first and third dose, suggesting a lack of autoinhibition ( ). in fvb mice administered oral ivermectin ( . mg/kg) twice a week for five weeks, there was a . -fold reduction in hour post-dose plasma ivermectin concentrations and a . -fold increase in the major metabolite concentration ( ), suggesting induction of metabolism. in macaques, co-administration of ivermectin ( . or . mg/kg) and chloroquine ( mg/kg) for seven days was safe and well-tolerated. co-administration of chloroquine and ivermectin did not have an effect on the c max or auc of ivermectin or chloroquine (tables and ; figure ). the . and . mg/kg doses in macaques have approximate heds of . mg/kg (total . mg/kg) and . mg/kg (total . mg/kg), respectively. this suggests that repeated daily dosing of ivermectin at . or . mg/kg could be used in combination with chloroquine in humans. while billions of ivermectin and chloroquine treatments have been administered to humans, there is very limited safety evidence for their co-administration. only one study, on plasmodium vivax, has co-administered ivermectin ( . mg/kg single-dose) and chloroquine ( . mg/kg first day, . 
mg/kg second and third day), in ten persons with no adverse events passively reported ( ). ivermectin ( ), chloroquine ( ), and hydroxychloroquine ( , ) have been shown in vitro to inhibit replication of the novel severe acute respiratory syndrome coronavirus (sars-cov- ). all three drugs distribute into lung tissues at higher concentrations than plasma for chloroquine and hydroxychloroquine in rats ( ) anopheles dirus mosquitoes were used to produce p. cynomolgi (b strain) sporozoites, from a donor macaque infected with blood-stage p. cynomolgi parasites. for liver-stage challenge, each macaque was injected intravenously with x p. cynomolgi sporozoites in a ml inoculum of pbs and . % bovine serum albumin. usamd-afrims colony-born rhesus macaques of indian origin were used in this study. ten healthy macaques, five male and five female, - years old and ranging in weight from . - . kg were selected for this study. all macaques were negative for simian retroviruses and simian herpes b virus. two macaques served as negative controls and were treated initially with seven days of vehicle controls and treated with seven days of chloroquine ( mg/kg) when parasitemia reached > , parasites per µl at primary infection and first relapse, and with seven days chloroquine ( mg/kg) plus primaquine ( . mg/kg) at second relapse. two macaques served as positive causal prophylaxis controls and were treated initially with seven days of vehicle controls and treated with seven days of chloroquine ( mg/kg) plus primaquine ( . mg/kg) at point of primary infection when parasites reached > , parasites per µl. all study drugs were administered to restrained conscious macaques via nasogastric intubation at ml/kg body weight. sparmectin-e (sparhawk laboratories, inc., lenexa, ks, usa) is a water-soluble formulation of ivermectin developed for oral use in horses. ivermectin was diluted in sterile water and administered via nasogastric route. six macaques received ivermectin; three low- dose ( . 
mg/kg) and three high-dose ( . mg/kg) for seven consecutive days starting one day before sporozoite challenge. if a primary blood-stage infection occurred and blood-stage parasitemia reached > , parasites per µl, the macaques received seven days of chloroquine ( mg/kg) plus ivermectin ( . mg/kg) for the high-dose group, and seven days of chloroquine ( mg/kg) plus ivermectin ( . mg/kg) for the low-dose group. if a relapse occurred and blood-stage parasitemia reached > , parasites per µl, the macaques received seven days of chloroquine ( mg/kg) plus ivermectin ( . mg/kg) for both the low- and high-dose groups. if a second relapse occurred, the macaques were treated with seven days of chloroquine ( mg/kg) and primaquine ( . mg/kg), terminating the experiment. both the negative and positive control group macaques were treated with seven days of chloroquine ( mg/kg) and primaquine ( . mg/kg) at the third relapse and first infection, respectively. macaques were observed several times in the first few hours post dosing, and at least three times a day for the remainder of the study for any clinical signs of neurological (e.g. ataxia, lethargy, imbalance) or gastroenterological (e.g. diarrhea, vomiting, weight loss) complications. venous blood was collected at select time points and after macaques became blood smear positive to determine hematocrit and white and red blood cell counts. thick and thin blood smear samples were made and examined daily to quantify malaria parasitemia. samples were fixed in methanol and stained with giemsa stain. blood smears were examined for the presence or absence of blood-stage parasites under an oil-immersion objective. if no parasites were found in microscopic oil-immersion thick fields or approximately , white blood cells (wbcs), the smear was considered negative. the parasitemia level was reported as the number of parasites per µl or mm of whole blood. 
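as a rough illustration of how a smear count becomes a parasites-per-µl figure, the sketch below uses the usual microscopy arithmetic: the parasites counted against a known number of blood cells are scaled by the measured cell count per µl. this is assumed to be what the calculation by total blood cell count amounts to; the function name and the example numbers are hypothetical.

```python
def parasites_per_ul(parasites_counted: int, cells_counted: int,
                     cells_per_ul: float) -> float:
    """Scale a smear count to parasites per microlitre of whole blood.

    parasites_counted: parasites seen while counting `cells_counted` cells
    cells_counted:     WBCs (thick smear) or RBCs (thin smear) counted
    cells_per_ul:      measured WBC or RBC count per microlitre of blood
    """
    if cells_counted <= 0:
        raise ValueError("cells_counted must be positive")
    if parasites_counted < 0 or cells_per_ul < 0:
        raise ValueError("counts must be non-negative")
    return parasites_counted * cells_per_ul / cells_counted


if __name__ == "__main__":
    # Hypothetical example: 50 parasites per 200 WBCs at 8000 WBCs per microlitre.
    print(parasites_per_ul(50, 200, 8000.0))
```

the same function covers both smear types, since only the reference cell population changes.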
parasites were counted per number of wbcs or red blood cells (rbcs) (i.e., per , wbcs or , - , rbcs). parasitemia levels were calculated using the appropriate total blood cell count (white or red) per mm . blood samples ( . ml) were collected on days , , and after sporozoite injection. the same sampling schedule was used in control macaques, with the addition of sampling days , , and ( ml) to obtain infected blood for controls used for method development. blood was collected, stored in edta tubes, and kept frozen at - °c. parasite dna was extracted from µl of edta whole blood using the ez dna blood kit with the automated ez advanced xl purification system (qiagen, hilden, germany). real-time pcr for p. cynomolgi detection was performed using the rotor gene q plex hrm platform (qiagen, hilden, germany). primer and probe were designed to target p. cynomolgi small subunit rrna of blood-stage parasites (genbank accession number l . ). primer and probe sequences are as follows; p. cynomolgi fwd: blood sampling ( ml) for pharmacokinetic time points: just prior to first ivermectin dose, and after first dose , , , , hours, then each consecutive day just before dosing, then after the th dose at , , , , hours, and days , , , , . if a primary infection occurred, then the same blood sampling schedule was repeated, but no blood for pharmacokinetics was collected at first or second relapses. 
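the pharmacokinetic parameters named in the table legends (c max, t max, auc, t / , cl/f, vz/f) are the standard outputs of a non-compartmental analysis of a sampling schedule like the one above. the sketch below is a minimal version under stated assumptions: linear trapezoidal auc, and a terminal slope fitted by least squares to the log concentrations of the last few samples. it is not the software or the settings used by the authors; the function name, the choice of three terminal points, and the example profile are hypothetical.

```python
import math


def nca_parameters(times_h, conc, dose, n_terminal=3):
    """Non-compartmental PK parameters from one concentration-time profile.

    times_h: sampling times in hours, ascending; conc: plasma concentrations
    (all terminal values must be > 0); dose: administered dose.
    Units of CL/F and Vz/F follow from the units of the inputs.
    """
    if len(times_h) != len(conc) or n_terminal < 2 or len(times_h) < n_terminal:
        raise ValueError("need matching sequences with at least n_terminal points")
    cmax = max(conc)
    tmax = times_h[conc.index(cmax)]
    # Linear trapezoidal AUC from the first to the last sample.
    auc_last = sum(
        (t2 - t1) * (c1 + c2) / 2.0
        for t1, t2, c1, c2 in zip(times_h, times_h[1:], conc, conc[1:])
    )
    # Terminal rate constant: least-squares slope of ln(conc) versus time.
    ts = times_h[-n_terminal:]
    ys = [math.log(c) for c in conc[-n_terminal:]]
    t_bar = sum(ts) / len(ts)
    y_bar = sum(ys) / len(ys)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    lambda_z = -slope
    if lambda_z <= 0:
        raise ValueError("terminal phase is not declining")
    auc_inf = auc_last + conc[-1] / lambda_z  # extrapolate beyond last sample
    cl_f = dose / auc_inf
    return {
        "cmax": cmax,
        "tmax": tmax,
        "auc_last": auc_last,
        "auc_inf": auc_inf,
        "auc_pct_extrap": 100.0 * (auc_inf - auc_last) / auc_inf,
        "lambda_z": lambda_z,
        "t_half": math.log(2) / lambda_z,
        "cl_f": cl_f,
        "vz_f": cl_f / lambda_z,
    }


if __name__ == "__main__":
    # Hypothetical mono-exponential profile, not study data.
    t = [0.0, 1.0, 2.0, 4.0, 8.0, 24.0]
    c = [100.0 * math.exp(-0.1 * x) for x in t]
    for name, value in nca_parameters(t, c, dose=10.0).items():
        print(f"{name}: {value:.4g}")
```

comparing these outputs between the first and last dose is one way to look for the autoinhibition discussed earlier, since a change in cl/f or t / would show up directly.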
blood was collected in heparinized sodium vacutainer tubes and centrifuged at , rpm for min and then the supernatant (plasma) was transferred and kept effect of ivermectin on anopheles gambiae mosquitoes fed on humans: the potential of oral insecticides in malaria control evaluation of ivermectin mass drug administration for malaria transmission control across different west african environments safety and mosquitocidal efficacy of high-dose ivermectin when co-administered with dihydroartemisinin-piperaquine in kenyan adults with uncomplicated malaria (ivermal): a randomised, double-blind, placebo-controlled trial efficacy and safety of the mosquitocidal drug ivermectin to prevent malaria transmission after treatment: a double- blind, randomized, clinical trial safety, pharmacokinetics, and mosquito- lethal effects of ivermectin in combination with dihydroartemisinin-piperaquine and primaquine in healthy adult thai subjects dabire r. . efficacy and risk of harms of repeat ivermectin mass drug administrations for control of malaria (rimdamal): a cluster-randomised trial drug screen targeted at plasmodium liver stages identifies a potent multistage antimalarial drug office of new drugs in the center for drug evaluation and research (cder). . guidance for industry: estimating the maximum safe starting dose in initial clinical trials for therapeutics in adult healthy volunteers. 
health and human services food and drug administration safety, tolerability, and pharmacokinetics of escalating high doses of ivermectin in healthy adult subjects causal prophylactic efficacy of primaquine, tafenoquine, and atovaquone-proguanil against plasmodium cynomolgi in a rhesus monkey model treatment of pulmonary acariasis in rhesus macaques with ivermectin management of an infestation of sucking lice in a colony of rhesus macaques comparison of efficacy of selamectin, ivermectin and mebendazole for the control of gastrointestinal nematodes in rhesus macaques, china comparison of efficacy of moxidectin and ivermectin in the treatment of strongyloides fulleborni infection in rhesus macaques molecular confirmation and anthelmintic efficacy assessment against natural trichurid infections in zoo-housed non-human primates ivermectin and abamectin stromectol ® new drug application. fda center for drug evaluation and research a comprehensive model for assessment of liver stage therapies targeting plasmodium vivax and plasmodium falciparum promising approach to reducing malaria transmission by ivermectin: sporontocidal effect against plasmodium vivax in the south american vectors anopheles aquasalis and anopheles darlingi an estimation of the number of malaria sporozoites ejected by a feeding mosquito ivermectin for causal malaria prophylaxis: a randomised controlled human infection trial ivermectin exposure leads to up- regulation of detoxification genes in vitro and in vivo in mice . 
the fda-approved drug ivermectin inhibits the replication of sars-cov- in vitro antiviral research epub remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov- infection in vitro vitro antiviral activity and projection of dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus (sars-cov- ) tissue distribution of chloroquine, hydroxychloroquine, and desethylchloroquine in the rat simultaneous quantitation of hydroxychloroquine and its metabolites in mouse blood and tissues using lc-esi- an application for pharmacokinetic studies influence of the route of administration on efficacy and tissue distribution of ivermectin in goat comparative distribution of ivermectin and doramectin to parasite location tissues in cattle ivermectin in covid- related critical illness. ssrn epub screening for an ivermectin slow-release formulation suitable for malaria vector control oral, ultra-long-lasting drug delivery: application toward malaria elimination goals we thank the afrims department of veterinary medicine for conducting the macaque trial especially laksanee inamnuay, kesara chumpolkulwong, natthasorn komchareon, chardchai key: cord- - avjsqu authors: davies, jennifer l title: using transcranial magnetic stimulation to map the cortical representation of lower-limb muscles date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: avjsqu the aim of this study was to evaluate the extent to which transcranial magnetic stimulation (tms) can identify discrete cortical representation of lower-limb muscles in healthy individuals. data were obtained from young healthy adults ( women, four men; mean [sd] age . [ . ] years). 
motor evoked potentials were recorded from the resting vastus medialis, rectus femoris, vastus lateralis, medial and lateral hamstring, and medial and lateral gastrocnemius muscles on the right side of the body using bipolar surface electrodes. tms was delivered through a -mm double-cone coil at sites over the left hemisphere. location and size of the cortical representation and the number of discrete peaks were quantified for each muscle. within the quadriceps muscle group there was a main effect of muscle on anterior-posterior centre of gravity (p = . ), but the magnitude of the difference was very small. within the quadriceps there was a main effect of muscle on medial-lateral hotspot (p = . ) and map volume (p = . ), but no post-hoc tests were significant. the topography of each lower-limb muscle was complex, displaying multiple peaks that were present across the stimulation grid, and variable across individuals. the results of this study indicate that tms delivered with a -mm double-cone coil could not reliably identify discrete cortical representations of resting lower-limb muscles when responses were measured using bipolar surface electromyography. the characteristics of the cortical representation of lower-limb muscles reported here provide a basis against which to evaluate cortical reorganisation in clinical populations. transcranial magnetic stimulation (tms) can be used to study the representation of muscles within the primary motor cortex. although the extent of somatotopy within the primary motor cortex is debated (donoghue et al. ; schieber ; schellekens et al. ), tms has revealed alterations in the cortical representation of muscles in several clinical conditions (liepert et al. ; schwenkreis et al. , ; tsao et al. , b; schabrun et al. , ; te et al. ). this suggests that tms can identify clinically meaningful differences in cortical representation between groups of individuals. 
the majority of work on the representation of muscles within the primary motor cortex has been conducted for muscles of the hand and upper limb. for the lower limb, tms has been used to quantify the size and the amplitude-weighted centre (centre of gravity [cog]) of the cortical representation of the resting tibialis anterior muscle (liepert et al. ; lotze et al. ), the resting quadriceps femoris muscle (schwenkreis et al. ), the resting vastus lateralis muscle (al sawah et al. ), and the active rectus femoris (ward et al. b, a; te et al. ) and vasti (te et al. ) muscles. no studies have reported the representation of the hamstring or gastrocnemius muscles. although tms has revealed discrete cortical representation of the deep and superficial fascicles of the paraspinal muscles (tsao et al. a), the extent to which tms can be used to identify discrete cortical representation of lower-limb muscles is unclear. only one study has quantified the representation of multiple lower-limb muscles in the same individuals. in statistical analysis, there was no main effect of muscle, suggesting similar cortical representation of the active rectus femoris and vasti muscles (te et al. ). however, the separation between the representation of the three muscles was smaller in individuals with patellofemoral pain than in healthy controls, suggesting that separation between muscles might be a measure of interest (te et al. ). this supports findings in other chronic musculoskeletal pain conditions, where there was reduced distinction in the cortical representation of two back muscles in individuals with low-back pain (tsao et al. b), and of two forearm extensor muscles in individuals with elbow pain (schabrun et al. ). the extent to which tms can be used to identify discrete cortical representation of lower-limb muscles therefore warrants further investigation. 
in addition, recent studies have identified multiple discrete peaks in the cortical representation of a muscle (schabrun et al. ; te et al. ), but no normative data exist on this measure for the representation of lower-limb muscles. the aim of this study was to evaluate the extent to which tms can identify discrete cortical representation of lower-limb muscles in healthy individuals. cortical representation was mapped for seven resting lower-limb muscles involved in control of the knee joint (rectus femoris, vastus lateralis, vastus medialis, medial hamstring, lateral hamstring, medial gastrocnemius, and lateral gastrocnemius) and was quantified using size, cog, hotspot and number of discrete peaks. these measures were compared between muscles from the same group (quadriceps, hamstring, plantar flexor) to evaluate the extent to which tms can identify discrete cortical representation of lower-limb muscles. these data describe the characteristics of the cortical representation of lower-limb muscles in healthy individuals and provide a basis against which to evaluate reorganisation in clinical populations. this study was carried out at the cardiff university brain research imaging centre and was approved by the cardiff university school of psychology ethics committee. a convenience sample of young healthy adults ( women, five men; mean [sd] age . [ . ] years) was recruited from an existing participant database and advertisements placed around cardiff university. all participants were screened for contraindications to tms (including history of seizures, neurological injury or head injury) and to ensure that they met the following inclusion criteria: no recent, recurring or chronic pain in any part of the body, no history of surgery in the lower limbs, and not taking any psychiatric or neuroactive medications. the full screening questionnaire is available on the open science framework (https://osf.io/npvwu/). 
all participants reported that they were right-leg dominant, defined by the leg they would use to kick a ball. all participants attended a single testing session between january and march , and provided written informed consent prior to the start of the experiment. participants were instructed to have a good night's sleep the night prior to the experiment, not to consume recreational drugs or more than three units of alcohol on the day of or night prior to the experiment, and not to consume more than two caffeinated drinks in the two hours prior to the experiment. surface electrodes (kendall series; covidien, ma) were placed on the following muscles of the right leg according to the seniam project guidelines (http://seniam.org/): rectus femoris, vastus lateralis, vastus medialis, medial hamstring (semitendinosus), lateral hamstring (biceps femoris), medial gastrocnemius, and lateral gastrocnemius. prior to electrode placement the skin was prepared with exfoliant and alcohol swabs. data were passed through a humbug noise eliminator (digitimer, hertfordshire, uk) and a d / amplifier (digitimer) where they were amplified x and bandpass filtered ( - hz; (groppa et al. ) ) before being sampled at samples per second in signal software (version ; cambridge electronic designs, cambridge, uk). electromyography (emg) data were stored for offline analysis and viewed in real time using signal software. tms was delivered through a -mm double-cone coil (magstim, whitland, uk) using a single-pulse monophasic stimulator ( , magstim). throughout the experiment, a neuronavigation system (brainsight tms navigation, rogue resolutions, cardiff, uk) was used to track the position of the coil relative to the participant's head. the vertex was identified as the intersection of the interaural line and the line connecting the nasion and inion. participants were seated and chair height and arm rests were adjusted to optimise comfort. 
the coil was placed slightly lateral to the vertex, over the left hemisphere. stimuli were delivered and stimulus intensity was gradually increased until motor evoked potentials (meps) were observed in the emg data. participants were instructed to stay relaxed, and this was confirmed by visual inspection of the emg data in real time. for each participant, a stimulus intensity was selected that elicited consistent meps in all muscles, but that would be tolerable for the remainder of the experiment. in two participants (one woman, one man) it was not possible to elicit meps in resting muscles with a tolerable stimulus intensity, and these participants did not participate further. in the remaining participants, the mean (sd) selected stimulation intensity was ( . )% maximum stimulator output (range, - % maximum stimulator output). the neuronavigation software was used to project a x cm grid with -cm spacings over the left hemisphere of a representation of the skull that was visible to the experimenter on a monitor. the front-rightmost corner of the grid was positioned cm to the right and cm anterior to the vertex, and the back-leftmost corner was cm to the left and cm posterior to the vertex (figure ). this resulted in a total of grid sites. the target grid was centred slightly anterior to the vertex based on previous reports that the cog of lower-limb muscles is anterior to the vertex (schwenkreis et al. ; al sawah et al. ; ward et al. a, b; te et al. ). the target grid was designed to cover a large area to capture the boundaries of the cortical representations. the order of the targets was randomised and each target was stimulated once at the predetermined stimulus intensity. the inter-stimulus interval was at least s. a break of ~ min was then taken, before this was repeated. the purpose of this break was to avoid long blocks of stimuli during which the participant's attention or level of arousal might decline.
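the grid layout and randomised stimulation order described above can be sketched as follows. this is a minimal illustration in python, not the author's neuronavigation configuration: the function names, the sign convention (negative ml = left of the vertex, positive ap = anterior to the vertex) and the example dimensions are assumptions, because the grid dimensions are not legible in this text.

```python
import random

def make_grid(ml_min_cm, ml_max_cm, ap_min_cm, ap_max_cm, spacing_cm):
    """Return (ml, ap) target coordinates relative to the vertex.

    Assumed convention: negative ml = left, positive ap = anterior.
    """
    n_ml = int(round((ml_max_cm - ml_min_cm) / spacing_cm)) + 1
    n_ap = int(round((ap_max_cm - ap_min_cm) / spacing_cm)) + 1
    return [
        (ml_min_cm + i * spacing_cm, ap_min_cm + j * spacing_cm)
        for i in range(n_ml)
        for j in range(n_ap)
    ]

def randomised_sets(grid, n_sets, seed=None):
    """Return n_sets independently shuffled copies of the target list.

    Each target appears exactly once per set, as in the protocol.
    """
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        order = list(grid)
        rng.shuffle(order)
        sets.append(order)
    return sets

# Hypothetical dimensions, chosen only to make the example concrete.
grid = make_grid(-3, 1, -2, 2, 1)
sets = randomised_sets(grid, n_sets=5, seed=0)
```

each set visits every target once, so averaging across sets gives the same number of stimuli per site before any trials are excluded.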
five sets of stimuli were performed in total. at each target, the experimenter viewed real-time information on the position of the coil and the error from the target position and did not stimulate until there was < mm error in coil position. the position of the coil was recorded for each stimulation. the same experimenter (jd) performed tms for all participants.
all processing and analyses were performed using custom-written code in matlab (versions a and a, mathworks, natick, ma, usa). all code used to process and analyse the data is available on the open science framework (https://osf.io/qrsp /). background emg was quantified as mean absolute emg in the ms prior to stimulus onset. trials were automatically discarded if there was background muscle activity, defined as background emg greater than three median absolute deviations above the median background emg from all trials for that muscle. trials were also excluded if the stimulus was delivered > mm from the target. emg data from the remaining trials were visualised and trials were manually excluded if there were visible artefacts. emg traces for all muscles of all participants (https://osf.io/ k p/), and the raw data on which all processing and analyses were performed (https://osf.io/y uj/) are available on the open science framework (doi . /osf.io/e nmk). emg data from the remaining trials were averaged across all stimuli at each scalp site. the planned analysis quantified mep amplitude as the peak-to-peak amplitude of the emg signal between and ms after stimulus onset. in three participants, this window included the beginning of a second mep and was, a posteriori, shortened to finish at ms after stimulus onset. visual inspection of emg data confirmed that the windows captured the mep for all participants and muscles. within each participant and each muscle, mep amplitude was scaled to peak mep amplitude (tsao et al. a, b).
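the trial-rejection and amplitude steps above can be sketched as follows. this is a minimal python illustration, not the author's matlab code: the function names are hypothetical, and the unscaled median absolute deviation is an assumption (the text does not say whether a scaled version was used).

```python
from statistics import median

def mad(values):
    """Unscaled median absolute deviation (an assumption; see lead-in)."""
    m = median(values)
    return median(abs(v - m) for v in values)

def background_emg(pre_stimulus_samples):
    """Mean absolute EMG over the pre-stimulus window."""
    return sum(abs(s) for s in pre_stimulus_samples) / len(pre_stimulus_samples)

def keep_trials(backgrounds, n_mads=3.0):
    """Indices of trials whose background EMG is not more than
    n_mads MADs above the median across all trials for that muscle."""
    threshold = median(backgrounds) + n_mads * mad(backgrounds)
    return [i for i, b in enumerate(backgrounds) if b <= threshold]

def peak_to_peak(window_samples):
    """Peak-to-peak amplitude of the EMG inside the MEP window."""
    return max(window_samples) - min(window_samples)

def scale_to_peak(amplitudes):
    """Scale MEP amplitudes so the largest site equals 1.0."""
    peak = max(amplitudes)
    return [a / peak for a in amplitudes]
```

for example, with background values `[1.0, 1.1, 0.9, 1.0, 5.0]` only the last trial exceeds the threshold and is dropped.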
map volume was calculated as the sum of scaled mep amplitudes across all sites (te et al. ). discrete peaks were identified if the scaled mep amplitude at a grid site was greater than %, was at least % greater than the scaled mep amplitude at all but one of the surrounding grid sites, and was not adjacent to another peak (schabrun et al. ; te et al. ). the location of each grid site was expressed relative to the vertex. the scaled mep amplitude, and the medial-lateral (ml) and anterior-posterior (ap) coordinates of the grid site were spline interpolated in two dimensions to obtain a resolution of one millimetre for each axis (borghetti et al. ; ward et al. a, b). the interpolated data were used to create a topographical map and calculate cog. cog is the amplitude-weighted indication of map position, and was calculated using the following formulae:
$$\mathrm{cog}_{ml} = \frac{\sum_i z_i x_i}{\sum_i z_i}, \qquad \mathrm{cog}_{ap} = \frac{\sum_i z_i y_i}{\sum_i z_i}$$
where $x_i$ is the medial-lateral location of the grid site, $y_i$ is the anterior-posterior location of the grid site, and $z_i$ is the scaled amplitude of the mep at that grid site.
for each outcome and each muscle, the distribution of the data was evaluated using visual
the following analyses were conceived and performed after the data had been viewed. cog can be influenced by the presence of multiple discrete peaks in the topography. the stimulation location that elicited the largest mep (hotspot) was therefore quantified as an additional measure of the cortical representation. for each participant, the onset latency of the mep was determined for each stimulation site. latency was initially defined as the first point after stimulation at which the full-wave rectified emg signal was more than five median absolute deviations above the median full-wave rectified emg signal in the ms prior to stimulation and stayed above this threshold for at least ms. the full-wave rectified emg from each stimulation site was then visually inspected to ensure that latency was accurately identified.
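the map measures and the automatic onset-latency criterion described above can be sketched as follows. this is a minimal python illustration under stated assumptions, not the author's matlab code: the function names are hypothetical, the cog is the standard amplitude-weighted mean of site coordinates, the median absolute deviation is unscaled, and the baseline window and minimum duration are left as required arguments because their values are not legible in this text.

```python
from statistics import median

def map_volume(sites):
    """Sum of scaled MEP amplitudes; sites is a list of (ml, ap, z)."""
    return sum(z for _, _, z in sites)

def centre_of_gravity(sites):
    """Amplitude-weighted mean (ml, ap) position of the map."""
    total = map_volume(sites)
    cog_ml = sum(x * z for x, _, z in sites) / total
    cog_ap = sum(y * z for _, y, z in sites) / total
    return cog_ml, cog_ap

def hotspot(sites):
    """(ml, ap) of the site with the largest scaled MEP amplitude."""
    x, y, _ = max(sites, key=lambda s: s[2])
    return x, y

def onset_latency_ms(rectified, stim_index, fs_hz,
                     baseline_ms, min_duration_ms, n_mads=5.0):
    """First post-stimulus time (ms) at which the rectified EMG exceeds
    median + n_mads * MAD of the pre-stimulus baseline and stays above
    that threshold for at least min_duration_ms. Returns None if the
    criterion is never met."""
    n_base = int(round(baseline_ms * fs_hz / 1000))
    n_hold = max(1, int(round(min_duration_ms * fs_hz / 1000)))
    if n_base < 1 or stim_index < n_base:
        raise ValueError("baseline window does not fit before the stimulus")
    baseline = rectified[stim_index - n_base:stim_index]
    m = median(baseline)
    threshold = m + n_mads * median(abs(v - m) for v in baseline)
    run = 0  # consecutive supra-threshold samples
    for i in range(stim_index, len(rectified)):
        run = run + 1 if rectified[i] > threshold else 0
        if run == n_hold:
            onset = i - n_hold + 1
            return (onset - stim_index) * 1000.0 / fs_hz
    return None
```

because cog is a weighted mean, a map with two separated peaks can have its cog between them, where responses may be small; the hotspot does not have this property, which is the motivation given in the text for reporting both.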
latency was averaged across four stimulation sites (from midline to -cm lateral to midline) at the vertex (central), cm anterior to the vertex (anterior), and cm posterior to the vertex (posterior; see figure ). in some muscles of some participants, a second mep was present with a latency of ~ ms. for each participant and muscle, the presence or absence of a late mep was determined by visual inspection of the emg data. the onset latency of this late mep was determined manually for each muscle by clicking a cursor on a graph where the first deviation from ongoing emg was visible. for several muscles, hotspot ml and hotspot ap were different from the normal distribution. within each muscle group, hotspot ml and hotspot ap were compared across muscles using a friedman test (quadriceps) or a wilcoxon signed rank test (hamstrings, plantar flexors). for several muscles, mep latency at central, posterior, and/or anterior stimulation locations were different from the normal distribution. for each muscle, onset latency was compared across stimulation locations using a friedman test. effect sizes were calculated as described above. data were collected from participants ( women, four men; mean [sd] age . [ . ] years). all participants completed the testing session and did not report any adverse effects. in six participants, stimulation at the most anterior row of grid sites was uncomfortable due to large twitches of the facial muscles. in one of these participants and one additional participant, stimulation at the most lateral column of grid sites was also uncomfortable due to large twitches in hand muscles. these grid sites were not stimulated for these participants. in the remaining nine participants all grid sites were stimulated. for one participant (# ), emg data from the medial gastrocnemius were of poor quality. for another participant (# ), emg data from the medial gastrocnemius and the lateral hamstring were of poor quality. 
for a third participant (# ) emg data from the medial gastrocnemius and medial hamstring were of poor quality. these five muscles (from three participants) were excluded from further analysis. meps were present in all muscles from all participants. cog and hotspot are shown in figure . mep latency, cog, hotspot, map volume and the number of discrete peaks for each muscle are shown in table . within the quadriceps muscle group there was a significant main effect of muscle on cog ap (analysis of variance p = . ), but the effect size was very small (η = . ). post-hoc tests showed that cog ap was more negative (posterior) for vastus medialis than for vastus lateralis (bonferroni corrected p = . ) and rectus femoris (bonferroni corrected p = . ), but the magnitude of this difference was very small (table and figure a ). cog ap was similar for rectus femoris and vastus lateralis (bonferroni corrected p > ). within the quadriceps muscle group there was also a significant main effect of muscle on hotspot ml (friedman test p = . ; w = . ). no post-hoc tests were significant, and there was no clear trend in the data ( figure j ). within the quadriceps muscle group there was a significant main effect of muscle on map volume (analysis of variance p = . , η = . ). no post-hoc tests were significant, but there was a trend for greater map volume in rectus femoris than in vastus medialis (bonferroni corrected p = . ; table ). there was no significant effect of muscle for any other outcome measure in any other muscle group (table ) . topographical maps for all muscles from one participant are shown in figure . topographical maps for the rectus femoris muscle from several participants are shown in figure ). in all muscles, the tendency was for latency to be longer for stimuli delivered at the anterior location than for stimuli delivered at the central and posterior locations. the detailed results of post-hoc tests are provided in table . 
emg data from the medial and lateral hamstring muscles from one participant are shown in figure . in these muscles, a second mep was present after the primary mep. this late mep was present in the lateral hamstring of five participants, the medial hamstring of four participants, the vastus medialis and rectus femoris of two participants and the vastus lateralis of one participant. the late meps observed in the quadriceps muscles were all very small, whereas those observed in the hamstring muscles could be sizeable (see figure ). the mean (sd) estimated onset latency for the late mep was ( ) ms for the lateral hamstring (n = ), ( ) ms for the medial hamstring (n = ), ( ) ms for the vastus medialis (n = ), ( ) ms for the rectus femoris (n = ) and ms for the vastus lateralis (n = ).
in this study i used tms to map the cortical representation of seven resting lower-limb muscles in healthy individuals. the data indicate that the size, cog, hotspot and number of discrete peaks were largely similar across muscles within each group (quadriceps, hamstrings, plantar flexors). there was a statistically significant difference in cog ap and hotspot ml across the quadriceps muscles, but the effect size and the magnitude of the differences were very small. the magnitude of the difference means that it would not be practically possible to differentially target one of the three quadriceps muscles with navigated tms. these results indicate that there was considerable overlap in the cortical representations of the lower-limb muscles identified by tms and surface emg, and that tms delivered with a double-cone coil could not identify discrete cortical representations of lower-limb muscles when meps were measured with bipolar surface emg. the topography of the seven lower-limb muscles studied here was often complex, displaying multiple peaks that were present across the stimulation grid, and variable across individuals.
this may reflect a large and complex anatomical representation of these muscles within the cortex, with considerable inter-individual variability. however, the impact of the techniques used to quantify the cortical representation must also be considered, particularly the volume of cortical tissue excited by the tms and the potential for peripheral volume conduction (crosstalk) in the surface emg recordings. for all muscles, the average cog was located at, or slightly posterior to, the vertex. in addition, despite the large area covered by the target grid, large meps were often observed in response to stimuli delivered at the edge of the grid. this is in contrast to previous mapping studies of the quadriceps muscles, which have reported an anterior cog ( studied active muscles, in contrast to resting muscles studied here. the double-cone coil used in this experiment consists of two circular coils arranged in a figure-of-eight shape with a fixed angle of about degrees between the two wings. the double-cone coil can stimulate deeper regions of the brain than a circular or figure-of-eight coil, which is advantageous for activating corticospinal projections to the lower-limb muscles. it elicits meps at a lower stimulation intensity than either a circular coil or a figure-of-eight coil (dharmadasa et al. ). however, this improved stimulation depth is obtained at the expense of focality (lu and ueno ). the large area of cortical tissue activated by tms delivered through a double-cone coil means that even when the coil is not centred over the motor cortex, excited tissue will likely include the motor cortex. this could explain the apparent expansion of the cortical representation beyond the borders of the target grid, and the larger area of the cortical representation than previously reported (schwenkreis et al. ; al sawah et al. ; ward et al. a, b; te et al. ).
nonetheless, if current spread was the only relevant factor, the largest mep may still be expected when the stimulating coil was centred over the motor cortex. the discrete peaks in the topographical maps that occurred at or close to the boundaries of the target grid in some participants suggest a complexity that is difficult to describe solely by current spread. it is possible that the double-cone coil excited cortical tissue beyond the motor cortex, and this resulted in meps. the latency of meps observed in response to stimulation at the anterior of the target grid was often slightly longer than that of meps observed in response to stimulation at the centre or posterior of the target grid. this may suggest that a different pathway was involved in the generation of these meps. corticospinal neurones innervating the lower limb spinal motoneurones are present in the premotor cortex (he et al. ) and supplementary motor area, caudal cingulate motor area on the dorsal bank and the rostral cingulate motor area (he et al. ) , as well as the primary motor cortex. however, it is also possible that the longer latency at anterior stimulation sites is an artefact of a smaller mep size at the grid boundary, and this requires further investigation. the figure-of-eight coil provides a very focal stimulation in superficial cortical regions, but minimal stimulation at increasing depths (lu and ueno ) . if much of the cortical representation of lower-limb muscles is inaccessible to the figure-of-eight coil, the resulting topography will appear less complex than that obtained with a double-cone coil. although the double-cone coil will excite deeper cortical tissue, the larger current spread means that it also excites a large volume of cortical tissue at shallow depths. 
it is possible that the double-cone coil accessed corticospinal neurones that could not be accessed with the figure-of-eight coil, and thus the complexity of the cortical representation reflects the true anatomical complexity that cannot be uncovered with a figure-of-eight coil. high-density surface emg recordings suggest that meps recorded in forearm muscles using conventional surface emg may contain crosstalk from neighbouring muscles (van elswijk et al. ; gallina et al. ; neva et al. (liepert et al. ; lotze et al. ; schwenkreis et al. ; al sawah et al. ; ward et al. a; te et al. ) and clinical (ward et al. b; te et al. ) populations. the current results indicate bipolar surface emg used with tms delivered through a doublecone coil cannot reliably identify discrete cortical representation of lower-limb muscles in young, healthy individuals. if, as has been suggested, the cortical representation of muscles is functional, rather than anatomical (schieber ; ejaz et al. ; leo et al. ) , then requiring the participant to perform a motor task will engage the specific subset of cortical neurones relevant for that task. the cortical representation revealed by tms may then be biased towards the representation for that specific task, at the expense of the representations for other functions the topography incorporated multiple discrete peaks, is not clear. however, the available data suggest that the cortical representations uncovered were smaller and less complex than those revealed in the present study. this suggests that the difference between the current and previous studies cannot be explained by the state of the muscle, and is more likely a function of the stimulating coil. in some muscles in some participants, there was a second mep that occurred with a latency of - ms. this late mep has previously been reported in the resting hamstrings, quadriceps, tibialis anterior and triceps surae muscles, with a latency of ms, ms, ms and ms, respectively (dimitrijević et al. 
) , and in resting and active tibialis anterior and triceps surae muscles with a latency of ~ ms (holmgren et al. ). dimitrijevic et al. reported that this late mep was most prevalent in the hamstrings, where it was of higher amplitude than the primary mep. this is in line with the current findings, where the late mep was observed most frequently and with the largest amplitude in the hamstring muscles. the late mep is not exclusive to the lower limbs, and has been observed in resting and active forearm muscles (holmgren et al. ) and a resting, but not active, intrinsic hand muscle (wilson et al. ). the source of the late mep is not known, and could be central or peripheral. indirect corticospinal or cortico-brainstem-spinal pathways, which originate either from the targeted areas of the motor cortex or from wider cortical areas excited by the stimulation, could play a role. proprioceptive information arising from the primary mep could also play a role. based on the latency of responses from several lower-limb muscles, dimitrijevic et al. argued against a segmental or transcortical stretch reflex origin of the late mep, and against the involvement of gamma motor neurones (dimitrijević et al. ) . the difference in the latency of the late mep between upper-and lower-limb muscles is ~ ms greater than the difference in latency of the early, primary mep (holmgren et al. ), lending the possibility that a slow central pathway is involved. recent evidence indicates the primary motor cortex includes slow pyramidal tract neurones (innocenti et al. ) , which comprise the majority of the pyramidal tract but are not well studied (firmin et al. ; kraskov et al. ) . this may provide one such candidate pathway, but this remains to be studied. responses to tms were evaluated using bipolar surface emg, and the results may have been influenced by peripheral volume conduction (crosstalk) from other muscles. 
studies using high-density surface emg recordings, similar to those conducted in the forearm muscles (van elswijk et al. ; gallina et al. ; neva et al. ) , are required to elucidate the contribution of cross-talk to meps recorded from lower-limb muscles. however, the finding that tms could not identify discrete cortical representations of lower-limb muscles measured with bipolar surface emg is highly relevant, and all tms studies of the lower-limb to date have been performed using bipolar surface emg. tms was delivered through a double-cone coil. the ability of other coil designs, such as the figure-of-eight coil, to identify discrete cortical representations of lower-limb muscles remains to be determined. when using any coil, the volume of cortical tissue excited by the stimulus, and whether this is likely to encompass the full cortical representation of the target muscle, should be considered. this is particularly relevant for flat coils, where the depth of electric field penetration is lower than for the double-cone coil (lu and ueno ) . the current results are particularly relevant in light of a recent study recommending the standard use of the double-cone coil for lower-limb studies, in preference to a figure-of-eight or circular coil (dharmadasa et al. ) . the results of this study indicate that tms delivered with a -mm double-cone coil cannot reliably identify discrete cortical representations of resting lower-limb muscles when meps are measured using bipolar surface emg. the characteristics of the cortical representation of lower-limb muscles reported here provide a basis against which to evaluate cortical reorganisation in clinical populations. target stimulation sites. target stimulation sites were arranged in a x cm grid with -cm spacings. the front-rightmost corner of the grid was positioned cm to the right and cm anterior to the vertex, and the back-leftmost corner was cm to the left and cm posterior to the vertex. 
grey shading indicates the four stimulation sites (from midline to -cm lateral to midline) at the vertex, cm anterior to the vertex, and cm posterior to the vertex over which motor evoked potential latency was averaged for the exploratory analysis
median across all participants. asterisks indicate significant difference (p < . ).
late motor evoked potential. surface electromyography data from the medial (a) and lateral (b) hamstring of one participant. a late motor evoked potential is clearly evident with a latency of approximately ms.
effect size is kendall's coefficient of concordance. mep, motor evoked potential. *different from latency of mep evoked from stimulation at posterior stimulation sites.
transcranial magnetic stimulation mapping: a model based on spline interpolation
the effect of coil type and limb dominance in the assessment of lower-limb motor cortex excitability using tms
early and late lower limb motor evoked potentials elicited by transcranial magnetic motor cortex stimulation
organization of the forelimb area in squirrel monkey motor cortex: representation of digit, wrist, and elbow muscles
hand use predicts the structure of representations in sensorimotor cortex
muscle imaging: mapping responses to transcranial magnetic stimulation with high-density surface electromyography
axon diameters and conduction velocities in the macaque pyramidal tract
selectivity of conventional electrodes for recording motor evoked potentials: an investigation with high-density surface electromyography. muscle and nerve
a practical guide to diagnostic transcranial magnetic stimulation: report of an ifcn committee
topographic organization of corticospinal projections from the frontal lobe: motor areas on the medial surface of the hemisphere
topographic organization of corticospinal projections from the frontal lobe: motor areas on the lateral surface of the hemisphere
late muscular responses to transcranial cortical stimulation in man
of cortico-descending projections: histological and diffusion mri characterization in the monkey
the corticospinal discrepancy: where are all the slow pyramidal tract neurons?
a synergy-based hand control is encoded in human motor cortical areas
changes of cortical motor area size during immobilization
comparison of representational maps using functional magnetic resonance imaging and transcranial magnetic stimulation
comparison of the induced fields using different coil configurations during deep transcranial magnetic stimulation
differentiation of motor evoked potentials elicited from multiple forearm muscles: an investigation with high-density surface electromyography
symmetric corticospinal excitability and representation of vastus lateralis muscle in right-handed healthy subjects
novel adaptations in motor cortical maps: the relation to persistent elbow pain
normalizing motor cortex representations in focal hand dystonia
detailed somatotopy in primary motor and somatosensory cortex revealed by gaussian population receptive fields
constraints on somatotopic organization in the primary motor cortex
reorganization in the ipsilateral motor cortex of patients with lower limb amputation
assessment of reorganization in the sensorimotor cortex after upper limb amputation
the author declares that they have no conflict of interest.
key: cord- - a wdduq authors: nunez-bajo, estefania; kasimatis, michael; cotur, yasin; asfour, tarek; collins, alex; tanriverdi, ugur; grell, max; kaisti, matti; senesi, guglielmo; stevenson, karen; güder, firat title: ultra-low-cost integrated silicon-based transducer for on-site, genetic detection of pathogens date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: a wdduq rapid screening and low-cost diagnosis play a crucial role in choosing the correct course of intervention e.g., drug therapy, quarantine, no action etc. when dealing with highly infectious pathogens. this is especially important if the disease-causing agent has no effective treatment, such as the novel coronavirus sars-cov- (the pathogen causing covid- ), and shows no or similar symptoms to other common infections. we report a silicon-based integrated point-of-need (pon) transducer (trisilix) that can chemically-amplify and detect pathogen-specific sequences of nucleic acids (na) quantitatively in real-time. unlike other silicon-based technologies, trisilix can be produced at wafer-scale in a standard laboratory; we have developed a series of methodologies based on metal-assisted chemical (wet) etching, electroplating, thermal bonding and laser-cutting to enable a cleanroom-free low-cost fabrication that does not require processing in an advanced semiconductor foundry. trisilix is, therefore, resilient to disruptions in the global supply chain as the devices can be produced anywhere in the world. to create an ultra-low-cost device, the architecture proposed exploits the intrinsic properties of silicon and integrates three modes of operation in a single chip: i) electrical (joule) heater, ii) temperature sensor (i.e. thermistor) with a negative temperature coefficient that can provide the precise temperature of the sample solution during reaction and iii) electrochemical sensor for detecting target na. 
using trisilix, the sample solution can be maintained at a single, specific temperature (needed for isothermal amplification of na such as recombinase polymerase amplification (rpa) or cycled between different temperatures (with a precision of ± . °c) for polymerase chain reaction (pcr) while the exact concentration of amplicons is measured quantitatively and in real-time electrochemically. a single -inch si wafer yields trisilix chips of × × . mm in size and can be produced in hours, costing ~us $ . per device. the system is operated digitally, portable and low power – capable of running up to tests with a mah battery (a typical battery capacity of a modern smartphone). we were able to quantitatively detect a -bp fragment (insertion sequence is ) of the genomic dna of m. avium subsp. paratuberculosis (extracted from cultured field samples) through pcr in real-time with a limit-of-detection of fg, equivalent to a single bacterium, at the th cycle. using trisilix, we also detected the cdna from sars-cov- ( pg), through pcr, with high specificity against sars-cov ( ). despite the advancement of diagnostic technologies targeting nucleic acids (na), there are still no rapid, handheld, low-cost, easy-to-use and integrated solutions for the testing of infectious diseases at the point-of-need (pon). this is unfortunately the case for pathogens infecting humans, animals or even plants. this large gap in the diagnostic workflow hampers attempts to contain infectious pathogens from spreading by rapid and early detection. this technological gap has become, once again, evident with the spread of covid- , the diagnosis of which continues to depend heavily on centralized laboratories with specialized personnel and facilities, which in turn slows down testing and delays treatment. na make excellent targets for the direct detection of pathogens due to their high specificity. 
unlike other biomarkers, such as antibodies or non-genetic molecules originating from pathogens, na can be chemically amplified, enabling direct detection of low numbers of pathogens (down to single organisms). hence tests targeting na tend to be exceptionally sensitive. na-based reagents (e.g., primers) for na testing can also be produced synthetically, rapidly and on a large scale; dna-based molecules, in particular, are highly stable and therefore do not require a cold chain for storage, making them especially suitable for pon testing. the ability to produce na quickly also increases the speed of development and deployment of new test kits to health systems, providing much-needed diagnostic capabilities for the detection of novel pathogens (such as sars-cov- ). despite these massive advantages, on-site testing of na has been limited. there are several molecular methods to amplify na for detection with high specificity, e.g., polymerase chain reaction (pcr), loop-mediated isothermal amplification (lamp), strand-displacement amplification (sda) and recombinase-polymerase amplification (rpa). while pcr requires thermocycling, emerging isothermal methods such as lamp and rpa do not, removing the need for sophisticated instruments. on the other hand, pcr is a simple process, requiring few reagents. it is also the gold standard for na-based laboratory diagnostics with a solid support infrastructure. amplification strategies are traditionally combined with fluorescence-based optical methods of detection to produce a quantitative analytical signal. optical methods, even though sensitive, require expensive instruments. the fluorescent labels used in optical detection are also difficult to handle and susceptible to photobleaching. optical detection systems have been difficult to miniaturize and are largely limited to centralized laboratories; however, some early concepts have been proposed by various groups in the literature.
although low-cost on-site testing for infectious diseases is the holy grail of na diagnostics, there are still no inexpensive, handheld solutions on the market that can provide truly portable, rapid na detection. commercially available benchtop optical lab instruments such as genexpert (cepheid) have already made a substantial difference in the speed of na-based diagnosis of infectious diseases, but the high cost of the instrument and tests, the large size and the power consumption have prevented the adoption of these systems for use in the field. the ideal miniaturized, low-cost, portable detection system for na must be able to heat the sample from the patient to the desired temperature setpoint with high precision (using a heater and temperature sensor) and measure the results of the amplification reactions quantitatively, all in an integrated fashion. such systems have been reported using cleanroom-based semiconductor fabrication methods; however, these devices require advanced methods of microfabrication that can only be performed in a cleanroom, such as photolithography and vacuum etching/deposition. they also do not exploit the intrinsic properties of the semiconductor materials that are used as substrates (i.e., si itself can be used both as a temperature sensor and as an electrical heater). hence the devices reported are complex and expensive. because semiconductor foundries are mainly located in east asia, manufacturing and logistics are susceptible to disruptions in the supply chain due to unexpected events (e.g., the covid- pandemic). there is, therefore, still no viable low-cost, rapid, quantitative, integrated handheld na testing solution available for use in the field with the potential for market-wide availability.
in this work, we report an integrated silicon-based point-of-need tri-modal na transducer (trisilix) that can chemically-amplify and detect pathogen-specific sequences of na quantitatively in real-time (figure and supporting information video v ). unlike other silicon-based technologies, trisilix can be produced at wafer-scale in a standard laboratory and does not require fabrication in an advanced semiconductor foundry. hence it is resilient to disruptions in the global supply chain as the devices can be produced anywhere in the world. to achieve cleanroom-free, low-cost fabrication, we have developed a series of fabrication methodologies based on wet etching to form porous silicon, electroplating, thermal bonding and laser-cutting ( figure a ). to reduce costs and complexity, trisilix exploits the intrinsic properties of the semiconductor si which can be used as a resistive heating device and thermistor simultaneously. trisilix has three modes of operation: i) electrical (joule) heater, ii) thermistor with a negative temperature coefficient that can provide the precise temperature of the sample solution during reaction and iii) label-free electrochemical sensor for detecting target na with methylene blue as a redox-active reporter. trisilix works with both cyclic and isothermal methods of amplification. we demonstrate that trisilix can detect na from both bacteria (mycobacterium avium subspecies paratuberculosis) and viruses (sars-cov- ) with high specificity. the fabrication of multilayered, composite trisilix chips (dimension of each chip: × × . mm) starts with a si wafer. in this study we used lightly-doped p-type " si wafers, however, wafers with a larger diameter would also work. first, each wafer is plated electrolessly for s (in µm kaucl and . % hf) to form a thin layer of gold particulate film. 
the wafer is then placed inside an etching bath containing an aqueous solution of h o and hf with a ratio of : v/v ( % h o : % hf) to perform metal-assisted chemical etching (mace) of si for min. this process forms a nm thick nanoporous silicon (psi) layer on each side of the wafer (figure a). psi plays a critical role in the fabrication of trisilix. the porous surface allows electroplating of high-quality metal films on the surface of the si substrate by creating an interlocking, high-porosity surface that improves adhesion (figures b & c). without this step, the electroplated metal films do not adhere to the surface of the substrate. the psi layer also allows thermal bonding of sheets of polymer films in an ordinary heat press after patterning through laser-cutting (figure d). after the formation of psi surface, two layers once the basic device structure was created, the wafer was laser-diced into individual chips ( chips / " si wafer, figure b ). using the randles-sevcik equation with experimentally measured gradient values of . ± . , we determined that the au film has a x larger electroactive area than the geometrically defined area due to the high porosity of the au film, which is beneficial for high-performance electrochemical analysis (figure s ). a sample reservoir (polyethylene; diameter: mm; volume: µl, figure a ) with two holes was thermally bonded across the three-electrode electrochemical cell at ± °c. the sample reservoir is specifically designed to prevent the evaporation of the solvent during amplification of dna at elevated temperatures. the final trisilix chip has a thickness of µm (n= ) in the we region and µm in the re/ce region (n= ) without the sample reservoir. figure c na amplification reactions require maintaining the sample at a temperature setpoint with high precision. for pcr, the duration of each heating step must also be carefully controlled.
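the electroactive-area estimate mentioned above relies on the randles-sevcik relation, which for a reversible, diffusion-controlled redox couple at 25 °c is commonly written as i_p = 2.69e5 · n^(3/2) · a · d^(1/2) · c · v^(1/2) (a in cm², d in cm² s⁻¹, c in mol cm⁻³, v in v s⁻¹). the sketch below shows how a measured slope of peak current versus the square root of scan rate could be converted into an area and a roughness factor. the function names and all numerical inputs are hypothetical and are not taken from the paper.

```python
import math

RANDLES_SEVCIK_CONST = 2.69e5  # valid at 25 °C for a reversible couple


def electroactive_area_cm2(slope_a_per_sqrt_v_s, n_electrons,
                           diff_coeff_cm2_s, conc_mol_cm3):
    """Area from the slope of peak current (A) vs sqrt(scan rate) (sqrt(V/s))."""
    denominator = (RANDLES_SEVCIK_CONST * n_electrons ** 1.5
                   * math.sqrt(diff_coeff_cm2_s) * conc_mol_cm3)
    return slope_a_per_sqrt_v_s / denominator


def roughness_factor(electroactive_cm2, geometric_cm2):
    """Ratio of electroactive to geometrically defined electrode area."""
    return electroactive_cm2 / geometric_cm2


# Hypothetical example: a 1-electron probe, D = 7.6e-6 cm^2/s, 1 mM (1e-6 mol/cm^3).
example_slope = RANDLES_SEVCIK_CONST * 0.2 * math.sqrt(7.6e-6) * 1e-6
example_area = electroactive_area_cm2(example_slope, 1, 7.6e-6, 1e-6)
example_roughness = roughness_factor(example_area, 0.1)
```

a roughness factor above one, as reported for the porous au film, simply means that the film exposes more electrochemically accessible surface than its footprint suggests.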
because the trisilix uses si, the substrate itself can be used both as an electrical heater and temperature sensor (i.e. thermistor) without adding any additional components for temperature transduction. si, a semiconductor, heats up when an electrical current passes through it (known as joule heating). because si has a high thermal conductivity (~ w m - °k - at °k), the substrate can be heated uniformly regardless of the path the current flows. the electrical resistance of si also varies with temperature with a negative correlation (within our experimental range of temperatures from rt to °c); the electrical resistance of si drops with increasing temperature due to generation of mobile charge carriers allowing the use of si substrate itself as a sensitive sensor of temperature. [ ] [ ] [ ] we have applied electrical currents in the range - ma between two au electrodes deposited on the bottom of trisilix chip to heat up the device electrically (the experiments were performed at room temperature; ~ °c). during this experiment, we used a thermal camera (flir e ) to measure the temperature across the chip as a reference measurement. as illustrated in figure a , trisilix chip can be heated up to °c, electrically. the relationship between the current applied and substrate temperature was linear (with a positive slope) at steady-state with high repeatability (slope: . ± . °c a - ; r = . , n= ). at room temperature, the electrical resistance of a batch of trisilix chips was . ± . Ω (n= ) which varied linearly (with a negative slope of - . ± . × - °c - ; r = . , n= ) with the temperature measured within the range from room temperature to °c ( figure b ). 
by measuring the electrical resistance of the si substrate, using the two electrical contacts at the bottom of the chip, the temperature of the trisilix chip can be precisely identified after calibration against a reference measurement (see supporting information section s for more information on the calibration of the trisilix thermistor). temperature sensing is important for correct and fast execution of a na amplification program for at least two reasons: i) temperature can be maintained at a precise setpoint using a control algorithm such as proportional-integral-derivative control loop. ii) when cooling (passively) or heating (actively), the speed of reaching a temperature setpoint can be maximized as both heating and cooling depend on the outside temperature and device packaging (e.g., the construction of the holder or cartridge). by only relying on the power applied, temperature control cannot be achieved with high precision. although we expected formation of schottky barriers and non-linear electrical characteristics across the au-si interface, the i-v measurements ( figure c ) demonstrated that the junction exhibits relatively ohmic behavior with a linear i-v relationship (r = . ) between - v and + v. the si substrate plus the au contacts can, therefore, be modeled as a simple variable resistor, the magnitude of which changes with temperature. we have designed a custom electrical circuit ( figure d ) and matlab-based graphical user interface ( figure s ) to control the temperature of the trisilix chip at a precise setpoint through software control. the circuit ( figure d ) implements a voltage controlled constant current source which can be adjusted digitally using a low-cost microcontroller (arduino due). this circuit can also measure the voltage drop across the trisilix chip; when a constant current is applied, the resistance (hence the temperature) of the trisilix thermistor can be calculated using the voltage reading, and ohm's law. 
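the control scheme described above (constant-current drive, voltage readout, ohm's law, linear resistance-to-temperature calibration, and a proportional-integral-derivative loop) can be sketched as follows. this is a minimal illustration in python rather than the authors' matlab/arduino implementation: the class names, the calibration constants, the pid gains and the first-order thermal model used to exercise the loop are all assumptions chosen for demonstration.

```python
class SiliconThermistor:
    """Linear resistance-temperature model; calibration values are hypothetical."""

    def __init__(self, r_ref_ohm, t_ref_c, slope_ohm_per_c):
        self.r_ref_ohm = r_ref_ohm
        self.t_ref_c = t_ref_c
        self.slope_ohm_per_c = slope_ohm_per_c  # negative: resistance falls when heated

    def resistance(self, temp_c):
        return self.r_ref_ohm + self.slope_ohm_per_c * (temp_c - self.t_ref_c)

    def temperature_c(self, voltage_v, current_a):
        resistance = voltage_v / current_a  # Ohm's law at a known drive current
        return self.t_ref_c + (resistance - self.r_ref_ohm) / self.slope_ohm_per_c


class PID:
    """Textbook PID controller with output clamping."""

    def __init__(self, kp, ki, kd, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(self.out_max, max(self.out_min, output))


def simulate_hold(setpoint_c, duration_s=120.0, dt=0.05):
    """Hold a setpoint on an assumed first-order thermal model of the chip."""
    thermistor = SiliconThermistor(r_ref_ohm=100.0, t_ref_c=25.0, slope_ohm_per_c=-0.5)
    pid = PID(kp=0.01, ki=0.005, kd=0.0, out_min=0.01, out_max=1.0)
    ambient_c, gain_c_per_a, tau_s = 25.0, 200.0, 5.0  # assumed plant parameters
    temp_c, current_a = ambient_c, pid.out_min
    for _ in range(int(duration_s / dt)):
        voltage_v = current_a * thermistor.resistance(temp_c)  # simulated voltage reading
        measured_c = thermistor.temperature_c(voltage_v, current_a)
        current_a = pid.update(setpoint_c, measured_c, dt)
        # Steady-state temperature is taken to be linear in current, as reported.
        temp_c += dt * (ambient_c + gain_c_per_a * current_a - temp_c) / tau_s
    return temp_c
```

the lower output clamp keeps a small sensing current flowing at all times, so that the resistance can still be computed from the voltage reading while the chip cools passively.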
using the matlab program and the custom electrical circuit board, we were able to run a temperature program ( figure e ) and maintain the trisilix chip at a given temperature setpoint over time. this is crucial for both cyclic and isothermal na amplification reactions. we produced a thermally-stable silicone-based holder encased in a d printed polylactic acid (pla) housing (figures f and s ) to characterize the performance of electrochemical sensing using the trisilix chip. the silicone holder also contained five embedded gold-plated stainless-steel electrodes to make electrical contacts with the three electrodes positioned at the top for electrochemical na sensing and four at the bottom for heating and temperature sensing. first, we characterized the electrochemical redox processes for methylene blue (mb) using cyclic voltammetry (figure a) . we chose mb because the electrochemical approach we used for the detection of dna involves the use of mb [ ] [ ] [ ] as an intercalating redox reporter. during na amplification, mb is intercalated between guanine-cytosine base pairs of the double-stranded dna (ds-dna) which provides an electroanalytical signal correlated with the concentration of ds-dna in the sample. we prepared a µg ml - solution of mb in mm phosphate-buffered saline ph (pbs) and swept across a range of potentials between - to mv at a scan rate of mv s − to characterize the electrochemical processes involving mb and electrodes. we determined that the anodic peak current, originating from the oxidation of mb on the electrode surface, appears at − ± and cathodic peak (due to reduction) at − ± mv versus ag ( figure a ). the absence of peaks with nearly overlapping anodic and cathodic curves in buffer alone indicates that the electrodes are electrochemically and mechanically stable and high performance (e.g., low capacitive current). 
we have performed ( figure b ) square wave voltammetry (swv), a substantially more sensitive electroanalytical method, to measure the concentration of mb in pbs in a range between - µg ml - using a potential window from - mv to - mv versus ag (corresponding to the anodic process). the swv measurements using trisilix produces an electroanalytical signal (peak current intensity) that is linearly related to the concentration of mb in the range . µg ml − - µg ml − with an r = . ( figure c curve denoted by 'rt'). we have also characterized the analytical performance of trisilix for performing swv measurements when the chip was operated at elevated temperatures. to prevent evaporation of the solvent (i.e. water) from the sample solution, a small amount ( µl) of mineral oil was added to the reservoir. since both isothermal and cyclic na amplification reactions require heating; the effect of temperature on the electrochemical measurements is important. as shown in figure c , the peak current intensity (measured from the recorded swvs) increased two to four times in comparison to room temperature when operated at higher temperatures; this is due to enhanced transport of the analyte to the surface of the electrode. - with increasing temperature, however, the error bars also widen, indicating that the precision of the measurement decreases. we speculate that this could be related to thermal and electrical crosstalk between the heater and the electrochemical sensing structures which introduce additional noise to the electroanalytical measurements at higher temperatures. we have also studied the thermal stability of mb over time (figure d) . in this experiment, we added mb to the pcr and rpa mastermix solutions to simulate the conditions during isothermal and cyclic amplification reactions. as illustrated in figure d (and figure s ) , mb remained relatively stable both at °c over mins (i.e. 
rpa conditions) and cycles of pcr conditions ( sec for each step at °c, °c and °c; every th cycle, the temperature is reduced to °c for the measurement to increase precision as illustrated in figure s ), as indicated by the small spread of the measurements. the swv measurements at elevated temperatures produced similar results to the measurements taken at room temperature ( °c) in terms of repeatability, in agreement with the findings in the literature by other groups, although we did observe a slight decrease in stability for the pcr conditions (i.e. higher temperatures). the results of the thermal stability experiments indicate that mb is a sufficiently stable redox-active reporter for use in na amplification reactions at elevated temperatures. using the trisilix chip, we have performed real-time and quantitative rpa (isothermal, qrpa) and pcr (cyclic, qpcr) amplification of dna using its temperature transduction capabilities and figure b ). once again, we performed swv as the analytical method and measured anodic and cathodic peak current intensities originating from mb as the electroanalytical signal. without dna amplification, we were able to detect genomic dna of map k ( , kbp) down to an lod of . pg from cathodic peak current intensities in comparison to a lod of . pg provided by the anodic peak current intensities; the cathodic peak current intensity was, therefore, used as the electroanalytical signal in the following pcr experiments. we performed qpcr using the trisilix chip to amplify and quantify small amounts ( - fg) of dna of map k in real-time ( figure c ). the forward primer ( ´-gcc gcg ctg ctg gag ttg a- ´) and reverse primer ( '-cgc ggc acg gct ctt gtt - ´) were used to amplify a nucleotide segment of is ( - of genbank accession number ae . ; national center for biotechnology information, usa), which has repeats in the genome of map k .
using the qpcr approach shown in figure s , we were able to measure as low as fg of genomic dna of map k , which is equivalent to detection of a single bacterium in the sample. we also characterized the same samples using a commercial laboratory qpcr (idvet, uk) using a test manufactured by id gene tm as a gold standard ( figure s ). the results produced by trisilix were similar or better in comparison to those produced by the commercially available, sophisticated laboratory instrument; for fg of map k dna, the resulting ct (cycle threshold) value was using trisilix, cycles shorter compared to the value obtained using the commercial qpcr (ct= ). trisilix is a low-cost, high-precision, integrated nucleic acid amplification and detection technology that is ideally suited for point-of-need diagnosis of infectious pathogens (bacteria, viruses etc.) affecting humans, animals and plants. trisilix can perform both isothermal and cyclic nucleic acid amplification, integrates temperature control and dna detection on the same device, is label-free, requires minimal sample handling and allows operation by minimally trained personnel. although trisilix is a silicon-based technology, it does not need a cleanroom for fabrication; chips can be produced in a standard wet-lab with easily accessible laboratory equipment worldwide. because fabrication in an advanced silicon foundry is not needed, reliance on the global supply chains is substantially reduced; silicon foundries are located in only a few countries in the world. each trisilix chip costs ~ . usd (cost of materials is shown in table s ) when produced at wafer-scale with -inch wafers, but the cost could be reduced further by moving to -inch wafers, decreasing the size of each chip and optimizing the fabrication process (for example, the gold contacts at the bottom of the trisilix chip could be replaced by another appropriate metal to form ohmic contacts).
due to its small size, integrated form-factor, and low-power requirements, trisilix can be controlled with a portable, battery-operated handheld analyzer. using a li-ion battery with a mah capacity (a typical rating for batteries in modern smartphones), trisilix is estimated to perform at least tests using isothermal amplification reactions at °c lasting min or tests using cycles of pcr (these estimates assume that the electronics for the handheld analyzer also draw ma while operating). the trisilix technology has the following three disadvantages: (i) currently trisilix uses mb as the redox-active reporter to quantify the products of the na amplification reactions in real-time. mb is known to polymerize and its activity also slightly decreases over time at elevated temperatures. this issue, however, can be addressed by the use of redox-active metal-complexes that also intercalate with dna, several of which have already been reported in the literature for real-time na for most na amplification processes, however, °c is sufficiently high. in case higher temperatures are needed (for example for on-chip sample prep), the fabrication process would need to be modified and thermoplastics with a higher melting point could be used. in the future, we will design a handheld analyzer and include all electronics needed for the operation of trisilix in the same device to simplify and enable its use in the field (currently the potentiostat for electrochemical na sensing and driver electronics for heating + temperature sensing are separate). we will also add capabilities to communicate with smartphones to have access to the cloud so that the result of each diagnostic test can be passed to health agencies remotely. , to enable on-site testing, we will implement on-chip lysis capabilities (such as mechanical or heatinduced cell lysis ) to analyze the samples taken directly from the subject without further sample prep. 
we will also implement on-chip reverse transcription to enable analysis of rna. trisilix would be particularly useful in emergency situations, such as the covid- pandemic, where rapid, early, on-site detection of the virus would accelerate intervention and reduce the spread of the pathogen. since trisilix is a low-cost, high-performance na detection technology, it is expected to find applications beyond infectious diseases which may include genetic analysis of various plants, animals or detection of genetic diseases. trisilix is a versatile platform and could also be adapted for electrochemical detection of non-genetic targets (e.g., through the use of antibodies) with on-chip incubation capabilities to deliver accelerated results at the point-of-need. prior to the dna measurements by swv, the sample reservoir was cleaned with dna decontamination reagent (fisher scientific) followed by nuclease-free ultrapure water (fisher scientific).
key: cord- -vh ma k authors: smaldino, paul e.; jones, james holland title: coupled dynamics of behavior and disease contagion among antagonistic groups date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vh ma k disease transmission and behavior change are both fundamentally social phenomena. behavior change can have profound consequences for disease transmission, and epidemic conditions can favor the more rapid adoption of behavioral innovations. we analyze a simple model of coupled behavior-change and infection in a structured population characterized by homophily and outgroup aversion. outgroup aversion slows the rate of adoption and can lead to lower rates of adoption in the later-adopting group or even behavioral divergence between groups when outgroup aversion exceeds positive ingroup influence. when disease dynamics are coupled to the behavior-adoption model, a wide variety of outcomes are possible. homophily can either increase or decrease the final size of the epidemic depending on its relative strength in the two groups and on r for the infection.
for example, if the first group is homophilous and the second is not, the second group will have a larger epidemic. homophily and outgroup aversion can also produce dynamics suggestive of a “second wave” in the first group that follows the peak of the epidemic in the second group. our simple model reveals complex dynamics that are suggestive of the processes currently observed under pandemic conditions in culturally and/or politically polarized populations such as the united states. behavior can spread through communication and social learning like an infection through a community (bass, ; centola, ) . cavalli-sforza and feldman, who pioneered treating cultural transmission in an analogous manner to genetic transmission, noted that "another biological model may offer a more satisfactory interpretation of the diffusion of innovations. the model is that of an epidemic" (cavalli-sforza and feldman, , - ) . the biological success of homo sapiens has been attributed to its capacity for cumulative culture, and particularly to the rapid and flexible adaptability that arises from social learning (henrich, ) . adoption of adaptive behaviors during an epidemic of an infectious disease could be highly beneficial to both individuals and the population in which they are embedded (fenichel et al., ) . coupling models of behavioral adoption and the transmission of infectious disease, what we call coupled contagion models, may thus provide important insights for understanding dynamics and control of epidemics. while we might expect strong selection-both biological and cultural-for adaptive responses to epidemics, complications such as the potentially differing time scales of culture and disease transmission and the existence of social structures that shape adoption may complicate convergence to adaptive behavioral solutions. 
in this paper, we explore the joint role of homophily (the tendency to form ties with people similar to oneself) and outgroup aversion (the tendency to avoid behaviors preferentially associated with an outgroup). several previous studies have considered the coupled contagion of behavior and infection, usually focused on cases where the behavior is one that decreases the spread of the disease (such as social distancing) and sometimes using the assumption that increased disease prevalence promotes the spread of the behavior (tanaka et al., ; epstein to analyze. to help us make sense of the dynamics, we will first describe the dynamics of infection and behavior adoption in isolation, and then explore the full coupled model. we model infection in a population in which individuals can be in one of three states: susceptible, infected, and recovered. when susceptibles interact with infected individuals, they become infected with a rate equal to the effective transmissibility of the disease, τ. infected individuals recover with a constant probability ρ. this is the well-known sir model of epidemics (tolles and luong, ). the baseline model assumes random interactions governed by mass action, and the dynamics are described by well-known differential equations (see supplemental materials). this model yields the classic dynamics in which the susceptible and recovered populations appear as nearly-mirrored sigmoids, while the rate of infected individuals rises and falls ( figure s ). the threshold for the epidemic is given by the basic reproduction number, r , which is a measure of the expected number of secondary cases caused by a single, typical primary case at the outset of an epidemic and occurs when r > . for the basic sir model in a closed population, r = τ/ρ. our analysis will focus on scenarios where individuals assort based on identity. in this case, assume that individuals all belong to one of two identity groups, indicated with the subscript or .
let w i be the probability that interactions are with one's ingroup, i ∈ { , }. it is therefore a measure of homophily; populations are homophilous when w i > . . it is important to recognize that groups can differ in their homophily (morris, ) . for example, if groups differ in socioeconomic class and group tends to employ members of a group as service workers, homophily will be higher for group ; a member of group is more likely to encounter members of group than the reverse. we can update the equations governing infection dynamics for members of group , with analogous equations governing members of group . we assume the disease breaks out in one of the two groups, so the initial number of infected in group is small but nonzero, while the initial number of infected in group is exactly zero. without loss of generality, we have assumed that group is always infected first. when homophily is low, the model exhibits standard sir dynamics approximating a single unified population. when an infection breaks out in group , homophily can delay the outbreak of the epidemic in group . homophily for each group works somewhat synergistically, but the effect is dominated by w . this is because the infection spreads rapidly in a homophilous group , and if group is not homophilous, its members will rapidly become infected. however, if group is homophilous, its members can avoid the infection for longer, particularly when group is also homophilous. if only group is homophilous, the initial outbreak will be delayed, but the peak infection rate in group can actually be higher than in group , as the infection is driven by interactions with both populations (figure ). we also considered the case in which the transmissibility of the infection can be reduced to very near the recovery rate, so that r is very close to . in this case, homophily can protect groups where infection did not originally break out by keeping members relatively separated from the infection group ( figure s ). 
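the group-structured equations themselves are not reproduced in this text, so the following python sketch uses one plausible form that is consistent with the verbal description: each group's force of infection is τ times a homophily-weighted mix of ingroup and outgroup prevalence, with w_i the ingroup contact probability and all state variables expressed as within-group fractions. the function name, the euler integration scheme and every parameter value are assumptions for illustration, not the authors' specification.

```python
def simulate_two_group_sir(tau, rho, w1, w2, i1_0=0.001, dt=0.05, t_max=400.0):
    """Euler integration of a two-group SIR model with homophily (assumed form).

    For group g (other group h):
        dS_g/dt = -tau * S_g * (w_g * I_g + (1 - w_g) * I_h)
        dI_g/dt =  tau * S_g * (w_g * I_g + (1 - w_g) * I_h) - rho * I_g
        dR_g/dt =  rho * I_g
    """
    w = (w1, w2)
    s = [1.0 - i1_0, 1.0]   # infection starts in group 1 only
    i = [i1_0, 0.0]
    r = [0.0, 0.0]
    peak = list(i)
    peak_time = [0.0, 0.0]
    for step in range(1, int(t_max / dt) + 1):
        # Compute both forces of infection from the state before this step.
        force = [tau * (w[g] * i[g] + (1.0 - w[g]) * i[1 - g]) for g in (0, 1)]
        for g in (0, 1):
            new_infections = force[g] * s[g] * dt
            new_recoveries = rho * i[g] * dt
            s[g] -= new_infections
            i[g] += new_infections - new_recoveries
            r[g] += new_recoveries
            if i[g] > peak[g]:
                peak[g] = i[g]
                peak_time[g] = step * dt
    return {"s": s, "i": i, "r": r, "peak": peak, "peak_time": peak_time}


# Hypothetical comparison: weak versus strong homophily in both groups.
low_homophily = simulate_two_group_sir(tau=0.3, rho=0.1, w1=0.5, w2=0.5)
high_homophily = simulate_two_group_sir(tau=0.3, rho=0.1, w1=0.99, w2=0.99)
```

with w1 = w2 = 0.5 this form reduces to a single well-mixed sir population, and comparing the `peak_time` entries of the two runs gives a quick check of the delay in group 2 that the text attributes to homophily.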
we model behavior adoption as a susceptible-infectious-susceptible (sis) process, in which individuals can oscillate between adoption and non-adoption of the behavior indefinitely. we view this as more realistic than an sir process for preventative-but-transient behaviors like social distancing or wearing face masks. to avoid confusion with infection status, we denote individuals who adopted the preventative behavior as careful (c), and those who have not as uncareful (u). unlike a disease, which is reasonably modeled as equally transmissible between any susceptible-infected pairing, where behavior is concerned, susceptible individuals are more likely to adopt when interacting with ingroup adopters, but less likely to adopt when interacting with outgroup adopters. the behavioral dynamics for members of group are as follows, with analogous equations governing members of group : members of group i may spontaneously adopt the behavior independent of direct social influence, and do so at rate α i . this adoption may be due to individual assessment of the behavior's utility, to influences separate from peer mixing, such as from media sources, or to socioeconomic factors that make behavior adoption more or less easy for certain groups. for these reasons, we assume that groups can differ in their rates of spontaneous adoption. in reality, it is possible for groups to differ on all four model parameters, all of which can influence differences in adoption rates. for simplicity, we restrict our analysis to differences in spontaneous adoption. uncareful individuals are positively influenced to become careful by observing careful individuals of their own group, with strength β. however, this is countered by the force of outgroup aversion, γ, whereby individuals may cease being careful when they observe this behavior among members of the outgroup.
the behavior is eventually discarded at rate δ, representing financial and/or psychological costs of continuing to adopt preventive behaviors like social distancing. this model assumes no explicit homophily in terms of behavioral influence. on the one hand, it seems obvious that we observe and communicate with those in our own group more than other groups. on the other hand, opportunities for observing outgroup behaviors are abundant in a digitally-connected world, which alter the conditions for cultural evolution (acerbi, ). for simplicity, we do not add explicit homophily terms to this system. instead, we simply adjust the relative strengths of ingroup influence and outgroup aversion, β/γ. when this ratio is higher, it reflects stronger homophily for behavioral influence. numerical simulations that illustrate the influence of outgroup aversion are depicted in figure . in all cases, the behavior is first adopted by group . in the absence of outgroup aversion, both groups adopt the behavior at saturation levels, with group being slightly delayed. when outgroup aversion is added, the delay increases, but more importantly, overall adoption declines for both groups. this decline continues as long as the strength of outgroup aversion is less than the strength of positive ingroup influence. a phase transition occurs here ( figure c ,d). although group may initially adopt the behavior, adoption is subsequently suppressed, resulting in a polarizing behavior that is abundant in group but nearly absent in group . because all individuals have either adopted or not, u = − c , these coupled equations can be replaced by a single equation through substitutions. for intuitive reasons, we leave them as two coupled equations. we also consider the case in which one group has a higher intrinsic adoption rate, which could be driven by differences in personality types, norms, or media exposure between the two groups. 
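the adoption equations themselves are missing from this text. the sketch below assumes one form consistent with the verbal description: uncareful members of group i become careful at rate α i + β c i, and careful members revert at rate γ c j + δ, where c j is the careful fraction of the other group. the function name and all parameter values are hypothetical and chosen only to illustrate the three regimes described above (no aversion, aversion weaker than ingroup influence, aversion stronger than ingroup influence).

```python
def simulate_adoption(gamma, alpha=(0.001, 0.001), beta=0.3, delta=0.05,
                      c1_0=0.01, dt=0.1, t_end=2000.0):
    """Euler integration of a two-group SIS adoption model (assumed form).

    c1, c2 are the careful fractions; the uncareful fractions are 1 - c.
    The behavior starts in group 1 only. Returns the final (c1, c2).
    """
    c1, c2 = c1_0, 0.0
    for _ in range(int(t_end / dt)):
        # Gain: spontaneous adoption plus ingroup influence on the uncareful.
        # Loss: outgroup aversion plus discarding at rate delta.
        dc1 = (1.0 - c1) * (alpha[0] + beta * c1) - c1 * (gamma * c2 + delta)
        dc2 = (1.0 - c2) * (alpha[1] + beta * c2) - c2 * (gamma * c1 + delta)
        c1, c2 = c1 + dt * dc1, c2 + dt * dc2
    return c1, c2


if __name__ == "__main__":
    for gamma in (0.0, 0.2, 0.4):
        c1, c2 = simulate_adoption(gamma)
        print(f"gamma={gamma}: c1={c1:.3f}, c2={c2:.3f}")
```

under these assumed values, setting gamma above beta leaves the behavior common in the first-adopting group and rare in the other, which is the polarized outcome the text describes.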
when α > α , the equilibrium adoption rate for group could be considerably higher than for group , even when ingroup positive influence was greater than outgroup aversion (figure e, f). note that these differences arise entirely because of outgroup aversion. when γ = , both groups adopt at maximum levels. outgroup aversion has a strong influence on adoption dynamics. it can delay adoption, reduce equilibrium adoption rates, and even suppress adoption entirely in the later-adopting group. as we will see, when the behavior being adopted influences disease transmission, quite complex dynamics can emerge. before we explore the coupled dynamics of this system, we must add one more consideration to the model. we focus on the adoption of preventative behaviors that decrease the effective transmission rate of the infection, such as social distancing or wearing face masks. we model this by asserting that the transmission rate is τ c for careful individuals and τ u for uncareful individuals, such that τ u ≥ τ c . when considering the interaction between groups, we use the geometric mean, √(τ u τ c ). the model has six compartments, with two-letter abbreviations denoting the disease and behavioral state (figure s ). the coupled dynamics for members of group are as follows, with analogous equations governing members of group , such that the full system is defined by coupled differential equations. a list of all parameters is presented in table . behavioral adoption is independent of infection status in this model. this may not be a realistic assumption for some systems, such as ebola, where both the infection status of the adopter and the perceived incidence in the population are likely to influence behavior. the assumption is more realistic for infections like influenza and covid- , where infection status is not always transparent and decisions are likely to be made on the basis of more abstract socially-transmitted information.
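because the coupled equations are not reproduced in this text, the following is a hedged sketch of how the six-compartment system could be assembled from the pieces described above. it assumes that behavior transitions act identically on susceptible, infected, and recovered individuals, that mixed careful/uncareful pairings transmit at the geometric mean of τ u and τ c, and that contacts follow the homophily weights w i. the function name, the initial conditions, and all default parameter values are hypothetical.

```python
from math import sqrt


def simulate_coupled(w=(0.5, 0.5), gamma=0.0, tau_u=0.3, tau_c=0.05, rho=0.07,
                     alpha=(0.001, 0.001), beta=0.3, delta=0.05,
                     dt=0.05, t_end=600.0):
    """Euler sketch of a two-group, six-compartment coupled contagion model.

    Compartments are named by disease state (S, I, R) and behavior (U, C).
    Returns (peak infected fraction per group, final state per group).
    """
    # Transmission rate by (susceptible behavior, infectious behavior).
    mixed = sqrt(tau_u * tau_c)
    tau = {("U", "U"): tau_u, ("C", "C"): tau_c,
           ("U", "C"): mixed, ("C", "U"): mixed}
    # Assumed start: infection and behavior both seeded in group 1 only.
    groups = [
        {"SU": 0.98, "SC": 0.01, "IU": 0.01, "IC": 0.0, "RU": 0.0, "RC": 0.0},
        {"SU": 1.0, "SC": 0.0, "IU": 0.0, "IC": 0.0, "RU": 0.0, "RC": 0.0},
    ]
    peaks = [0.0, 0.0]
    for _ in range(int(t_end / dt)):
        careful = [g["SC"] + g["IC"] + g["RC"] for g in groups]
        updated = []
        for k, g in enumerate(groups):
            other = groups[1 - k]
            adopt = alpha[k] + beta * careful[k]     # U -> C rate
            drop = gamma * careful[1 - k] + delta    # C -> U rate
            # Infectious contacts by behavior, weighted by homophily.
            met = {b: w[k] * g["I" + b] + (1.0 - w[k]) * other["I" + b]
                   for b in "UC"}
            d = dict.fromkeys(g, 0.0)
            for b in "UC":
                force = sum(tau[(b, bi)] * met[bi] for bi in "UC")
                infect = force * g["S" + b]
                recover = rho * g["I" + b]
                d["S" + b] -= infect
                d["I" + b] += infect - recover
                d["R" + b] += recover
            for x in "SIR":
                flow = adopt * g[x + "U"] - drop * g[x + "C"]  # net U -> C
                d[x + "U"] -= flow
                d[x + "C"] += flow
            updated.append({c: g[c] + dt * d[c] for c in g})
        groups = updated
        for k, g in enumerate(groups):
            peaks[k] = max(peaks[k], g["IU"] + g["IC"])
    return peaks, groups
```

setting `tau_c` equal to `tau_u` removes any protective effect of the behavior, which gives a simple baseline against which the peaks from a protective behavior can be compared.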
to make the behavioral adoption most meaningful, we focus on the case where instantaneous and universal adoption of the careful behavior would decrease the disease transmissibility so that r < . that is, if everyone immediately adopted the behavior, the epidemic would fizzle out. however, behavior adoption does not typically work this way. we have already observed that under assumptions of between-group variation and outgroup aversion, a behavior is likely to be adopted neither instantaneously nor universally. the question we tackle now is how those socially-driven facets of behavioral adoption influence disease dynamics (see figures s , s ). in the absence of either homophily or outgroup aversion, our results mirror previous work on coupled contagion in which the adoption of inhibitory behaviors reduces peak infection rates, flattening the curve of infection. due to differences in spontaneous adoption rates, however, group may see a higher peak infection rate even when the infection breaks out in group , because the inhibitory behavior spreads more slowly in that group (figure a). homophilous interactions further lower infection rates. if group alone is homophilous, the infection rate declines in that group, while peak infections actually increase in group (figure c). this is because group adopts the careful behavior early, decreasing its transmission rate and simultaneously avoiding contact with the less careful members of group , who become infected through their frequent contact with group . if group alone is homophilous, on the other hand, the infection is staved off even more so than if both groups are homophilous (figure b, d). this is because members of group avoid contact with group until the careful behavior has been widely adopted, while members of group diffuse their interactions with some members of group , and these are less likely to lead to new infections. outgroup aversion considerably changes these dynamics.
first and foremost, outgroup aversion leads to less widespread adoption of careful behaviors, dramatically increasing the size of the epidemic. moreover, because under many circumstances there will be between-group differences in equilibrium behavior-adoption rates, this can lead to dramatic group differences in infection dynamics. in the absence of outgroup aversion, we saw that homophily in group could lead to an almost total suppression of the epidemic. not so with outgroup aversion, in which the peak infection rates increase relative to the low homophily case ( figure e , f). this occurs because homophily causes a delay in the infection onset in group . behavioral adoption slows the epidemic initially in both groups. however, when the infection finally reaches group , behavioral adoption has decreased past its maximum due to the outgroup aversion, causing peak infections in both groups to soar. the dynamics are particularly interesting for the case where group is homophilous. recall that this is the group in which the epidemic first breaks out. because of homophily and rapid behavior adoption, the epidemic is initially suppressed in this group. however, due to slower and incomplete behavior adoption, the infection spreads rapidly in group . as the infection peaks in group while group decreases its behavior adoption rate, we observe a delayed "second wave" of infection in group , well after the infection has peaked in group ( figure g ). this effect is exacerbated when both groups are homophilous, as the epidemic runs rampant in the less careful group ( figure h ). as shown in the supplementary material, the timing of the second wave is also delayed to a greater extent when the adopted behavior is more efficacious at reducing transmission ( figure s ). it is well known that disease transmission is influenced by behavior. what is often overlooked is how behavior itself changes within heterogeneous cultural populations. 
both population structure and social identity influence who interacts with whom, affecting disease transmission, and who learns from whom, affecting behavior change. we have highlighted two of these forces, homophily and outgroup aversion, and shown their dramatic influence on disease dynamics in a simple model. homophily is often treated as though it were a global propensity for assortment by type (e.g. centola, ). however, homophily is frequently observed to a greater or lesser degree across subgroups, a phenomenon known as differential homophily (morris, ). there are several different interpretations of homophily in these simple models. when the homophily of group is less than that of group , group can be interpreted as "frontline" workers, who are exposed to a broader cross-section of the population by nature of their work. outgroup avoidance of this group's adopted protective behavior can arise if there are status differentials across the groups. prestige bias is a mechanism that can drive differential uptake of novel behavior by different groups (boyd and richerson, ), for which there is quite broad support (jiménez and mesoudi, ). when both groups are highly homophilous and outgroup aversion is strong, the resulting dynamics suggest the case of negative partisanship (abramowitz and webster, ), in which differences in the relative size of the epidemic will be driven purely by differences in the adoption rates by the two groups, including those differences induced by outgroup aversion. homophily has three main effects in our coupled-contagion models. when homophily is strong, it can protect the uninfected segment of the population (i.e., group ) if the transmissibility of the infection is sufficiently low (figure s ) or if outgroup aversion is negligible (figure ).
however, when r is high enough and outgroup aversion induces group differences in behavior adoption, strong homophily among group can lead to larger, albeit delayed, epidemics in the initially-uninfected segment of the population. finally, when homophily is asymmetric and higher in group , it can substantially reduce the size of the epidemic in that group because the protective behavior spreads rapidly at the outset of the epidemic when there is the greatest potential to reduce the epidemic's toll. incorporating adaptive behavior into epidemic models has been shown to significantly alter dynamics (fenichel et al., ) . prevalence-elastic behavior (funk et al., ) is a behavior that increases with the growth of an epidemic. while it may be protective, it can also lead to cycling of incidence, which can prolong epidemics. similarly, the adoption of some putatively-protective behaviors that are actually ineffective can be driven by the existence of an epidemic when the cost of adoption is sufficiently low (tanaka et al., ). we have shown in this paper that group-identity processes can have large effects, leading groups that would otherwise respond adaptively to the threat of an epidemic to behave in ways that put them, and the broader populations in which they are embedded, at risk. the context of the ongoing covid- pandemic provides some interesting and timely perspective on the relationship between behavior, adaptive or otherwise, and transmis- infection rates. we expect that such a situation will not induce strong prevalence-elastic behavioral responses, and that the sorts of identity-based responses we describe here will dominate the behavioral effects on transmission. in terms of social interaction and adoption dynamics, group identity exerts its influence by way of homophily, a powerful social force. aral et al. ( ), for example, showed that homophily accounted for more than % of contagion in a natural exper- iment on behavioral adoption. 
the effect of homophily on diffusion dynamics can be variable. for example, homophily can slow down convergence toward best responses in strategic networks (golub and jackson, ). this can be critical when the time scales of learning and infection are different. homophily can also lower the threshold for desirability (or the selective advantage) required for adoption of a behavior. creanza and feldman ( ) showed that homophily and selection can have balancing effects: the selective advantage of a trait does not need to be as high to spread when it is transmitted assortatively by its bearers. in the case of our coupled-contagion model, strong homophily interferes with the adaptive adoption of protective behavior. centola ( ) showed that homophily can increase the rate of adoption of health behaviors, but his experimental population could assort only on positive cues, and had no ability to signal or perceive group identity. when homophily promotes negative partisanship (abramowitz and webster, ) with respect to the adoption of adaptive behavior, it can lead to quite complex outcomes, as we have outlined in this paper. how can we intervene to offset the pernicious effects of negative partisanship on the adoption of adaptive behavior? while it may seem obvious, strategies for spreading efficacious protective behaviors in a highly-structured population with strong outgroup aversion will require weakening the association between protective behaviors and particular subgroups of the population. given that we are writing this during a global pandemic in which perceptions and behaviors are highly polarized along partisan lines, attempts to mitigate partisanship in adaptive behavioral responses seem paramount. the models we have analyzed in this paper are broad simplifications of the coupled dynamics of behavior-change and infection. it would therefore be imprudent to use them to make specific predictions.
the goal of this approach is to develop strategic models in the sense of holling ( ) , sacrificing precision and some realism for general understanding of the potential interactions between social structure, outgroup aversion, and coupled contagion (levins, ; smaldino, ) . such models provide a scaffold for the development of richer theories concerning coupled disease and behavioral contagions. epstein, j. m., parker, j., cummings, d., and hammond, r. a. ( ) . coupled contagion dynamics of fear and disease: mathematical and computational explorations. line model assumes random interactions governed by mass action, and the dynamics are described by the following well-known differential equations describing the proportion of the population in these three compartments: this model yields the classic dynamics in which the susceptible and recovered populations appear as nearly-mirrored sigmoids, while the rate of infected individuals rises and falls ( figure s ). figure s . classic sir dynamics. here τ = . , ρ = . . the threshold for the epidemic is given by the basic reproduction number, r , which is a measure of the expected number of secondary cases caused by a single, typical at the outset of an epidemic and occurs when r > . r is essentially the ratio of the rate of additional infections to the rate of removal of infections through recovery and possibly death. for the classic sir model, the calculation is quite simple. we assume that s ≈ at the time of initial outbreak, and we are interested in the case where the rate of new infections exceeds the rate of recovery: appendix b. the sir model with homophily we extend the sir model to explore scenarios where individuals assort based on identity. in this case, assume that individuals all belong to one of two identity groups, indicated with the subscript or . let w i be the probability that interactions are with one's ingroup, i ∈ , . it is therefore a measure of homophily; populations are homophilous when w i > . . 
homophily can be asymmetric between groups, because members of one group may be more likely to have interactions with the outgroup than the other group. for example, low ses individuals, who often work service jobs, may be unable to avoid interactions with the outgroup. we can update the equations governing infection dynamics for members of group , with analogous equations governing members of group : as illustrated in figure s , when an infection breaks out in group , homophily can delay the outbreak of the epidemic in group . homophily for each group works somewhat synergistically, but the effect is dominated by w . this is because the infection spreads rapidly in a homophilous group , and if group is not homophilous its members will rapidly become infected. however, if group is homophilous, its members can avoid the infection for longer, particularly when group is also homophilous. we also explored a scenario where r for the basic model was very close to , indicating a small epidemic (we used r = . ; figure s ). when homophily was low (w = . ), the populations mixed a lot. the proportion of infected individuals in group briefly fell, as the majority of new infected individuals were in group . however, the groups quickly matched their pace and experienced the outbreak in tandem. when homophily was high (w = . ), not only did group experience a delayed outbreak, it also experienced a substantially lower peak infection rate, because the total number of infected individuals at the start of its outbreak was so much lower than that experienced by group . thus, homophily can serve not only to delay an epidemic, but also to reduce it in the cases of homophily and the sis behavioral adoption model with outgroup aversion. the adopted behavior decreases the effective transmission rate of the infection due to measures like social distancing. 
we model this by asserting that the transmission rate is τ c for careful individuals and τ u for uncareful individuals, such that τ u ≥ τ c . when considering the interaction between groups, we use the geometric mean, so the transmissibility between su and iu is √ τ u τ c . the model has six compartments, with two-letter abbreviations denoting the disease and behavioral state ( figure s ). the coupled dynamics for members of group are as follows, with analogous equations governing members of group , such that our system is defined by coupled differential equations: here we present an extended version of the full model analysis presented in the main text, that includes intermediate homophily of w i = . . analysis with no outgroup aversion is shown in figure s , and with outgroup aversion is shown in figure s . the figures illustrate how homophily and outgroup aversion can interact to produce unintuitive dynamics. when both forces are present, an infection that begins in group can peak earlier and stronger in group , followed by a smaller peak in the group where it began. in the main text analysis, we assumed that the adopted behavior reduced the transmission to below the threshold for r < . in other words, if everyone immediately and universally adopted the behavior at the start of the outbreak, it would not become an epidemic. although we view this as a reasonable assumption (that is, the efficacy of the behavior is reasonable, not the expectation that it will be either immediately or universally adopted), it is also worth examining what happens with the spread of behaviors that reduce transmission, but not below epidemic levels. figure s illustrates the model su sc figure s . illustration of the model dynamics. (a) transition probabilities between compartments for members of group . for simplicity these probabilities do not include the influence of homophily. (b) homophilous interactions. 
members of group i have physical contact with members of their own group with probability w i and members of the outgroup with probability − w i . dynamics for varying levels of behavior efficacy (τ c ) with and without outgroup aversion and for both weak and strong homophily. without outgroup aversion (γ = ), the effect is clear: the more efficacious the behavior, the smaller the epidemic. this occurs because the behavior spreads effectively. with outgroup aversion, two things happen. first, the more effectively the behavior reduces transmission (that is, the smaller τ c is), the smaller the overall epidemic, but with an effect that is much stronger in group . in group , the effect of increased behavior efficacy is relatively small, because adoption is reduced and delayed. second, the better the be- havior reduces transmission, the bigger the delay in when group experiences a "second wave." this illustrates how complex the dynamics of disease transmission can become when even simple assumptions about behavior and group structure are considered. more effective behavior adoption figure s . coupled dynamics of the full model for varying levels of behavior efficacy, τ c = { . , . , . }, where only the last case would provide r < if immediately and universally adopted at the start of the outbreak. we provide analyses with and without outgroup aversion and for both weak and strong homophily. darker lines are group , lighter lines are group . parameters used: τ u = . , ρ = . , α = . , α = . , β = . , δ = . the secret of our success: how culture is driving human evolution, domesticating our species, and making us smarter intergroup behavior and social identity. 
the sage handbook of social psychology: concise student edition
the strategy of building models of complex ecological systems
prestige-biased social learning: current evidence and outstanding questions
why we're polarized
the strategy of model building in population biology
substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov )
uncivil agreement: how politics became our identity
estimating the infection and case fatality ratio for coronavirus disease (covid- ) using age-adjusted data from the outbreak on the diamond princess cruise ship
the effect of opinion clustering on disease outbreaks
models are stupid, and we need more of them
social identity and cooperation in cultural evolution

key: cord- -x pz uz authors: yellman, christopher m. title: precise replacement of saccharomyces cerevisiae proteasome genes with human orthologs by an integrative targeting method date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: x pz uz
artificial induction of a chromosomal double-strand break in saccharomyces cerevisiae enhances the frequency of integration of homologous dna fragments into the broken region by up to several orders of magnitude. the process of homologous repair can be exploited to integrate, in principle, any foreign dna into a target site, provided the introduced dna is flanked at both the ’ and ’ ends by sequences homologous to the region surrounding the double-strand break. we have developed a tool set that requires a minimum of steps to induce double-strand breaks at chromosomal target sites with the meganuclease i-scei and select integration events at those sites. we demonstrate this method in two different applications: first, the introduction of site-specific single-nucleotide phosphorylation site mutations into the s. cerevisiae gene spo ; second, the precise chromosomal replacement of eleven s. cerevisiae proteasome genes with their human orthologs. placing the human genes under s.
cerevisiae transcriptional control allowed us to update our model of functional replacement. our experience suggests that using native promoters may be a useful general strategy for the coordinated expression of foreign genes in s. cerevisiae. we provide an integrative targeting toolset that will facilitate a variety of precision genome engineering applications. the integration of dna into saccharomyces cerevisiae chromosomes has become a foundational tool for the creation of inheritable modifications of many types, including gene-epitope fusions, mutations, and foreign gene insertions. dna transformed into s. cerevisiae can integrate stably into chromosomes by homologous recombination when it has sequence homology to the target site. linear double-stranded dna integrates more efficiently than circular dna, and can carry heterologous dna into the integration site as a consequence of recombination at the dna ends. the presence of a double-strand break (dsb) at the target site further increases the efficiency of dna integration by homology-directed repair (hdr). the experimental induction of dsbs to initiate recombination at specific sites was pioneered in saccharomyces cerevisiae using the ho meganuclease, followed soon after by the i-scei meganuclease. meganucleases have since been used in s. cerevisiae, other microbes and even metazoan species to enhance the efficiency of chromosomal modifications. in principle, a variety of meganucleases will work in yeast, but i-scei and i-crei have been the most frequently used. the "delitto perfetto" is a particularly elegant method that uses i-scei for dsb induction and scarless repair with templates as small as oligonucleotides. more recently the rna-guided endonuclease cas has become a widely-used tool for dsb induction in yeast and other organisms.
the meganucleases have large dna recognition sequences, usually - base pairs (bp) long, so they are unlikely to occur randomly in the relatively small genomes of yeast. the use of a meganuclease in genome engineering therefore requires that its recognition sequence be integrated at or near the target site to prepare it for dsb induction. in contrast, cas can be directed to a large variety of target sites using unique guide rnas (grnas). however, several considerations affect the utility of crispr-cas for editing yeast genomes, and suggest that meganucleases will continue to be useful. firstly, it is difficult to predict the efficiency of dsb induction by cas at specific grna sites. factors that inhibit the performance of individual grnas include the presence of nucleosomes at the target site and intrinsic sequence features of the rna . as a result, several grna candidates must often be compared experimentally to find one that performs with high efficiency . secondly, good grna targets, while numerous in s. cerevisiae , are not ubiquitous. consequently, the use of oligonucleotides, which are potentially very useful repair templates, is limited to chromosomal sites with an efficient grna target in the region spanned by the oligonucleotide. thirdly, a grna target that is not fully disabled by the dsb repair continues to be available for repeated cutting, potentially biasing the repair towards undesired events. fourthly, crispr-cas has well-documented off-target effects that continue to be actively investigated , , although they are of less concern in yeast than in organisms with larger genomes. finally, when using crispr-cas , a specific repair event can be selected from all possible events only if it confers a novel selectable phenotype. when the dsb is induced within an essential gene, the selection for repair to a viable state is strong , but changes made at nonessential loci lack that advantage. we have developed a simplified method for genome engineering s. 
cerevisiae using i-scei for dsb induction. while conserving the key features of "delitto perfetto", we have reduced the cassettes for dsb induction and +/-selection from ~ . kb to less than . kb, and provided a variety of separate plasmid-borne or integrated constructs for i-scei expression. our integrative targeting (it) cassettes carry only a single marker, k. lactis ura , and built-in i-scei recognition sites at one or both ends. we used the it method to introduce phosphorylation site mutations into the gene spo from oligonucleotide repair templates, and to precisely replace essential yeast proteasome genes with their human orthologs. placing human proteasome orthologs under s. cerevisiae transcriptional control allowed us to refine our understanding of cross-species complementation by human proteasome subunits in yeast. our methods outline a high-confidence work flow for genome engineering of s. cerevisiae, and we provide a variety of strains that are useful starting points for further applications. the it cassettes ( figure ) were synthesized by pcr using plasmid pom as the template for the kluyveromyces lactis ura gene, including bp of its native promoter and bp of its terminator. i-scei recognition sequences were incorporated, in various orientations, into the pcr primers used to amplify k. lactis ura . the pcr products were cloned by cold fusion (sbi) into a plasmid backbone derived from pgem- zf(+) to make the it plasmids (table ) . plasmids for i-scei expression pgal -i-scei expression modules were assembled in the yeast cen/ars plasmid backbones prs h, prs k and prs n by in vivo homologous recombination in s. cerevisiae. the backbones were linearized with the endonuclease ecorv, then co-transformed into yeast with three pcr products consisting of a bp gal promoter from pym-n , the i-scei open reading frame , and a bp s. cerevisiae native gal terminator. the pcr fragments had overlapping homology of ~ bp at each junction to drive their assembly. 
the assembled plasmids were recovered by preparation of yeast genomic dna and electroporation into e. coli, and sequenced across the assembled regions. plasmid pgal -scen was assembled in the prcvs n (cmy, unpublished) backbone with a bp s. cerevisiae native gal promoter and a bp gal terminator plasmids for complementation of s. cerevisiae gene deletions with s. kluyveri genes complementing plasmids carrying s. kluyveri orthologs of s. cerevisiae proteasome genes have been previously described . chromosomal integration of an it cassette requires its synthesis with pcr primers that have priming regions common to all of the cassettes and unique identity to the ' and ' regions flanking the desired integration site. integration of the cassettes into a chromosomal site is relatively efficient when the flanking target identity at each end is at least bp. the forward primer ( ' to ') requires ' target identity + cggacgtcacgacctgcg and the reverse primer ( ' to ') requires ' reverse complementary identity + ggctgtcaggcgtgcacg. recommended pcr amplification conditions are described in table s . yeast media and growth conditions were standard . i-scei expression was induced in yeast cells from the gal or gal promoters as follows: cells were grown overnight in yp/ % raffinose, inoculated at ~ x cells/ml into fresh yp/ % raffinose in the morning, and grown for - hours to ensure they were in logarithmic growth. at the zero time point of i-scei expression, galactose was added to the cycling cells to reach a final concentration of %. induction continued for different lengths of time depending on the experiment. dna transformations into yeast were performed using the peg/lithium acetate highefficiency method . the typical transformation targeted ~ x yeast cells. yeast strains were all of the by or w backgrounds. strain names and genotypes are listed in table . 
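the primer rule described above (target identity followed by a common priming region) can be sketched as a small helper. the two common priming sequences are copied from the text; the function names are hypothetical, and because the recommended length of flanking identity is not recoverable from this text, the helper simply uses whatever flank sequences the caller supplies. both flanks are assumed to be given as top-strand sequence, read 5' to 3'.

```python
# Common priming regions quoted in the text (shared by all IT cassettes).
COMMON_FWD = "cggacgtcacgacctgcg"
COMMON_REV = "ggctgtcaggcgtgcacg"

_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (a, c, g, t only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_it_primers(upstream_flank, downstream_flank):
    """Build forward and reverse primers for amplifying an IT cassette.

    upstream_flank:   top-strand sequence immediately 5' of the integration site.
    downstream_flank: top-strand sequence immediately 3' of the integration site.
    Both primers are returned 5' to 3'.
    """
    forward = upstream_flank + COMMON_FWD
    reverse = reverse_complement(downstream_flank) + COMMON_REV
    return forward, reverse
```

for example, `design_it_primers("aaaccc", "atgcgt")` uses toy six-base flanks purely for illustration; real flanks would be taken from the chromosomal sequence on either side of the intended integration site.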
all viable strains with chromosomal modifications were backcrossed to a congenic strain, and derivatives of either mating type are available upon request. by inspection of chromosomal sequences from the saccharomyces genome database (sgd), we identified a set of genomic targets in s. cerevisiae to use for the integration of i-scei expression constructs and as general sites for the integration of foreign dna. their chromosomal locations are summarized in table s . we did not work with all of the gt sites, but include their locations for potential use. double-stranded oligonucleotides were prepared by mixing single-stranded oligonucleotides together at a concentration of µm each in mm tris, ph . / mm nacl. the mixture was heated at o c in a heat block for minutes, and cooled to room temperature over a period of approximately one hour to promote annealing. the coding sequences for human open reading frames (orfs) were amplified by pcr from plasmids in the human orfeome collection (horfeome v . ), with the exception of psma ccds . , which was amplified from plasmid hscd (harvard institute for proteomics). all plasmids were confirmed by sanger sequencing of at least the relevant assembled construct. all chromosomally-integrated constructs, including it cassettes, i-scei expression modules, spo mutations and human orfs were sequenced after integration. the loci were amplified by pcr from outside the regions of yeast sequence identity used for homologous recombination, and sequenced across the entire construct. we constructed integrative targeting cassettes containing only the marker gene kluyveromyces lactis ura , which can be both positively and negatively selected, and recognition sequences for the homing endonuclease i-scei ( figure ). the set of cassettes includes versions that contain no i-scei site at all, a single site at either the ' or ' end of the cassette, or sites at both the ' and ' ends, in direct or inverse orientation to each other. 
the cassettes are maintained on high-copy e. coli plasmids ( table ) that serve as pcr templates. the cassettes, amplified by pcr with flanking target identity, can be integrated into a yeast chromosomal target locus by high-efficiency transformation and selected for by complementation of a ura mutation in the host strain. the eventual replacement of the cassette with a dna cargo is selected for using media containing -fluoro-orotic acid ( -foa), which is lethal to ura+ yeast that have not excised or mutated the k. lactis ura gene. for flexible control of dsb induction, we generated constructs that place i-scei expression under the control of the strongly repressible and inducible s. cerevisiae gal and gal promoters. yeast centromeric plasmids carrying i-scei expression constructs (table ) can be transformed into yeast and selected with a variety of dominant drug-resistance markers, then spontaneously lost during unselected growth. we also provide several yeast strains with pgal or pgal -driven i-scei expression from chromosomally-integrated constructs ( table ). expression of i-scei can therefore be controlled using a variety of methods appropriate to different applications. we wanted to measure the efficiency with which the five it cassettes, when integrated into a chromosomal site, would induce homologous recombination. to best estimate the frequency of chromosomal dsb formation at the different cassettes, we designed an assay in which the repair template was supplied on a homologous chromosome, and therefore available as efficiently as possible. in diploid cells, the it cassettes were integrated into one copy of chromosome iv at a neutral genomic locus we refer to as gt (table ). following dsb induction, repair of the break using the homologous chromosome eliminated the it cassette and the cells became ura- and -foa-resistant ( -foa r ).
with the homologous chromosome being present in every cell, the rate of recovery of foa r cells should quantitatively reflect dsb formation. we induced i-scei expression in diploid cells and counted -foa r colonies as a fraction of the total at , , and -hour intervals (figure ), expecting that the presence of two i-scei sites instead of one would increase dsb formation. to our surprise, cassettes it and it performed equally well, yielding -foa r in close to % of all cells after hours in galactose. in contrast, it , it and it induced -foa r poorly. because the dsb induction results were unexpected, we confirmed the identities of the it cassette strains by pcr length analysis of the ' and ' ends of the cassettes at all time points in the experiment (data not shown). the most parsimonious interpretation of the data is that the forward-oriented i-scei site at the ' end of it and it performs far better than any other break site. the forward-facing i-scei site at the beginning of it is a weak dsb site, while the inversely-oriented i-scei sites in it together are relatively inefficient. although we think it is unlikely, we cannot rule out the possibility that local features affected the dsb activity induced in the experiment, and therefore that the cassettes might perform differently in other chromosomal contexts. to explore the utility of the it cassettes for genome engineering, we performed two types of chromosomal modifications. the first was the introduction of phosphorylation site mutations into the non-essential spo gene using double-stranded oligonucleotides as the repair templates. the second set of modifications was the precise replacement of eleven yeast genes encoding subunits of the proteasome with the coding sequences of their human orthologs. spo protein is an activator of the early anaphase release of the phosphatase cdc , and serines and of spo are required for this release . 
we used the it method to make inhibitory and activating phosphomimetic mutations at these two amino acid residues. an s to a (alanine) mutation approximates a serine that cannot be phosphorylated, while mutations to d (aspartic acid) and e (glutamic acid) mimic phosphorylated serine . to introduce the mutations, the it , it , it and it cassettes were first integrated into the non-essential spo gene, replacing twenty-four base pairs (bp - ) ( figure ). we induced dsbs in the it cassettes and transformed the cells with double-stranded oligonucleotides to repair the breaks. the relative efficiencies of the four different spo ::it cassettes as targets for repair with an oligonucleotide encoding s a/s a were consistent with the results of our assay of dsb repair at gt ( table ). the spo ::it target formed -foa r colonies inefficiently and was a poor repair site. spo ::it yielded relatively few -foa r colonies, but most of them used the oligonucleotides for repair. the it and it versions formed -foa r colonies efficiently and consistently used the oligonucleotides as repair templates. we used spo ::it to introduce three pairs of mutations (s a/s a, s d/s d and s e/s e), each time recovering ~ colonies of which / were repaired as desired. in summary, repair of dsbs induced at it and it yielded many candidates and consistently used the desired oligonucleotide templates. dsbs induced at it were repaired using oligonucleotides that spanned ~ . kb from the break site, consistent with the previously reported oligonucleotide-templated repair of dsbs induced at one end of the ~ . kb core cassette . the eukaryotic proteasome is a highly conserved protease with approximately protein subunits, responsible for the degradation of ubiquitinated proteins , . 
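the serine substitutions described above (s to a, s to d and s to e at the same pair of residues) can be illustrated with a small python sketch. the function name `serine_variants`, the toy peptide and the positions used in the example are all hypothetical; the actual spo residue numbers are not legible in the text and are not assumed here.

```python
# Meaning of each replacement residue, as described in the text.
SUBSTITUTIONS = {
    "A": "cannot be phosphorylated",
    "D": "phosphomimetic",
    "E": "phosphomimetic",
}


def serine_variants(protein, positions):
    """Return {residue: mutant sequence} with every listed serine replaced.

    Positions are 1-based. Raises ValueError if a listed position is not
    a serine, which guards against off-by-one numbering mistakes.
    """
    variants = {}
    for new_residue in SUBSTITUTIONS:
        residues = list(protein)
        for pos in positions:
            if residues[pos - 1] != "S":
                raise ValueError(f"position {pos} is {residues[pos - 1]}, not S")
            residues[pos - 1] = new_residue
        variants[new_residue] = "".join(residues)
    return variants


# Toy peptide with serines at positions 3 and 6 (illustrative only).
for residue, seq in serine_variants("MKSPTSQ", [3, 6]).items():
    print(residue, seq, SUBSTITUTIONS[residue])
```

in practice the mutant protein sequence would still have to be back-translated into the double-stranded oligonucleotide used as the repair template, a step this sketch does not attempt.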
we have previously shown, in plasmid-based complementation tests, that many human genes encoding subunits of the proteasome can functionally replace their yeast orthologs under the control of a strong constitutive yeast promoter and terminator . however, such assays are affected by plasmid instability and copy number variation and the need to grow the cells in selective media. the ability of a heterologous gene to support viability is also subject to the activity level of the chosen promoter, a variable which is often not well understood.

a gene replacement strategy to protect genetic stability

the proteasome has direct roles in chromosome segregation and dna double-strand break repair , . therefore, we designed a workflow of several high-confidence steps that minimized the risk of genotoxic stress on the yeast cells due to partial or temporary loss of proteasome activity ( figure ). we first transformed diploid yeast to replace one copy of each gene with the it cassette. diploid yeast heterozygous for the gene::it deletions were then transformed with centromeric plasmids carrying the orthologous saccharomyces kluyveri gene, under the control of the s. kluyveri promoter and terminator, which we have previously shown are able to complement the s. cerevisiae gene deletions . the diploid cells were sporulated and tetrads dissected to recover haploid cells with gene::it deletions covered by the plasmid-borne s. kluyveri genes. the it cassettes were then replaced by inducing i-scei and transforming with pcr-amplified human orfs, flanked by homology to the promoter and terminator of the yeast gene. because standard -mer pcr primers were used, the regions of flanking homology were relatively short, ranging from - nt at the ' ends and - nt at the ' ends, with one exception that had slightly longer homology. we isolated and screened -foa r gene replacement candidates by yeast colony pcr and sequenced a variety of them.
in addition to recovering candidates with the desired repair to human orfs, we found two types of undesired repair products, namely mutations in k. lactis ura and deletions that reduced the it cassette to a single, unmarked i-scei site. analysis of candidates showed that ( %) had repaired to the human orf, ( %) had mutated ura and ( %) had reduced the cassette to an i-scei site. the i-scei site reductions must have occurred by microhomology-mediated end joining, a relatively high-fidelity form of repair in yeast , and we did not observe this type of repair product when replacing the spo targets with oligonucleotides. the high frequency of these reductions in the pcr product transformations underscores the importance of efficient delivery of the repair template. the pcr products we used to introduce human orfs were both larger and less abundant than the oligonucleotides used to introduce mutations into spo . therefore, the efficiency of transformation was probably much lower with the large dna fragments. cassette reduction will usually be an undesired outcome, but it can be avoided by using it , which carries only a single i-scei site. we confirmed the human orf integrations by sequencing the chromosomal loci from outside the regions of homology present in the pcr products and across each integrated human orf. as we had hoped, we found no cases in which the orthologous s. kluyveri genes were used as templates for repair of the induced dsbs, due to the sequence divergence of their promoters and terminators. to test the ability of human proteasome genes to function when controlled by the native yeast promoters and terminators, we precisely replaced eleven yeast proteasome genes, at their chromosomal loci, with the orthologous human protein coding sequences. the genes we replaced encoded the seven subunits of the core α ring and four of six subunits of the atpase ring, which is responsible for substrate translocation into the catalytic core .
the resulting yeast strains each contained a single human coding sequence. many human proteasome genes have splicing isoforms, so we compared all human cdna source clones to the consensus coding sequences (ccds) in the ncbi gene database, and identified the ccds most similar to the clones. we used pcr to modify several of the cdnas so that their sequences and lengths matched at least one ccds in the database, leaving only a few instances of silent nucleotide changes. table s summarizes the yeast:human orthology relationships of the proteasome core and atpase ring subunits we worked with, the length and sequence comparisons of our cdna clones to the database ccds, and the existence of splicing isoforms. among the human ccds used for gene replacements, there was at least one isoform of each core α-ring subunit and three out of four atpase subunits that minimally supported viability on rich medium (figure ). in a previous study, we reported that the human α (psma ) subunit was toxic when expressed from a strong constitutive yeast promoter . placing it under fully native yeast transcriptional control allowed it to support viability. we did not previously test complementation by the α human paralogs psma and psma , due to the lack of cdna clones, but we have now done so. psma is a widely-expressed isoform, while psma is testis-specific and has three isoforms of different lengths. both psma and the mid-length isoform of psma supported viability (table s ). we did not test the short and long isoforms of psma . not all of the human α-ring isoforms that we tested were able to complement yeast, however. of the two splicing isoforms of psma , the human α , only the longer one supported viability. the atpase ring consists of six subunits, all of which supported viability in our previous study . we replaced subunits rpt , rpt , rpt and rpt with their human orthologs psmc , psmc , psmc and psmc respectively.
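the comparison of cdna clones with ccds entries described above, including the identification of silent nucleotide changes, can be illustrated with a short python sketch. it assumes two in-frame coding sequences of equal length and uses the standard genetic code; the function name `codon_differences` and the toy sequences are hypothetical and do not correspond to any clone in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_differences(clone, ccds):
    """List codon-level differences between two in-frame coding sequences.

    Returns tuples of (1-based codon number, clone codon, ccds codon, kind),
    where kind is "silent" if both codons encode the same residue and
    "non-silent" otherwise.
    """
    clone, ccds = clone.upper(), ccds.upper()
    if len(clone) != len(ccds) or len(clone) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    differences = []
    for start in range(0, len(clone), 3):
        a, b = clone[start:start + 3], ccds[start:start + 3]
        if a != b:
            same_residue = CODON_TABLE[a] == CODON_TABLE[b]
            kind = "silent" if same_residue else "non-silent"
            differences.append((start // 3 + 1, a, b, kind))
    return differences


# Toy example: codon 2 differs silently (Ala), codon 3 changes Asp to Glu.
print(codon_differences("ATGGCTGATTAA", "ATGGCCGAATAA"))
```

clones whose length differs from the ccds, for example because of alternative splicing, would need an alignment step first, which is outside the scope of this sketch.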
psmc , psmc and psmc supported viability, but psmc did not ( figure and table s ). of the two splicing isoforms of psmc , only the longer one supported viability (table s ). we tested rpt ::psmc and rpt ::psmc activity by the ability of the strains to lose a complementing plasmid carrying s. kluyveri rpt or rpt ( figure c ), leaving the human orf as the only source of protein. neither psmc nor the short isoform of psmc conferred on cells the ability to lose the complementing plasmid. however, the long isoform of psmc allowed plasmid loss, indicating functional replacement. we previously reported that psmc could functionally replace rpt , but by this more rigorous assay, we found it did not. the lineages leading to s. cerevisiae and humans diverged approximately billion years ago , so it would be surprising if human proteasome subunits had fully normal activity in yeast. the proteasome is required for both routine mitotic division and degradation of misfolded proteins , , so we compared the growth of unmodified and humanized yeast under both permissive conditions and proteotoxic stress. under permissive growth conditions (rich medium, °c), only a few humanized strains grew slowly ( figure a ). human psma and psma significantly delayed growth, while psma caused a slight delay. the yeast α subunit pre is the only non-essential protein in the core α-ring. deletion of pre causes a slight growth delay , an effect which was very subtle in our assay. yeast with the human α subunit psma grew normally, but the Δ -psma n-terminal truncation allele caused a slight growth delay. finally, the viable human rpt atpase strains grew at strikingly normal rates. overall, humanized strains executed mitosis well under these permissive conditions. high temperature imposes proteotoxic stress on cells in the form of misfolded proteins and survival of this stress requires the ubiquitin-proteasome system [ ] [ ] [ ] [ ] [ ] .
as previously reported, pre Δ yeast were very sensitive to high temperature ( figure a). with a few exceptions, yeast with human proteasome subunits were also very sensitive to high temperature, growing poorly at °c and not at all at °c. the most striking exceptions were yeast with psma (α ), which was slightly slow at °c and psma (α ), which grew normally at °c and °c. the psma (α ) and psma (α ) strains grew fairly well at °c, and were only partially inhibited at °c. we conclude that most human substitutions compromise proteasome activity, but psma and psma provide almost normal activity at the elevated temperatures we tested. canavanine, a non-biogenic arginine analog, causes proteotoxic stress upon incorporation into new proteins, activating the yeast environmental stress response . we performed a canavanine-sensitivity test on synthetic medium at °c to avoid temperature-dependent effects. in the absence of canavanine, the Δ -psma (α ), psma (α ) and psma (α ) strains grew slowly, as they had on ypd ( figure b ). yeast lacking pre had been extremely sensitive to high temperature, but were only slightly sensitive to canavanine, suggesting the stress imposed by high temperature is more severe or general. the humanized α , α and α strains (psma , psma and psma ) were slightly sensitive to canavanine. the humanized rpt strains psmc , psmc and psmc were more strongly sensitive to canavanine, especially psmc and psmc . in summary, our phenotypic analysis shows that yeast proteasomes with single human subunits tend to be severely deficient in the response to high temperature and moderately sensitive to protein misfolding. transcriptional circuitry is often complex, delicately balanced, and incompletely characterized. there is currently abundant interest in transferring complete foreign protein complexes and enzymatic pathways into yeast, for both functional conservation studies and synthetic biology applications.
native gene regulation may be particularly useful when working with complexes such as the proteasome, ribosome and cct chaperonin, which have defined stoichiometries and are transcriptionally co-regulated [ ] [ ] [ ] [ ] . apart from these well-known examples, complexes formed during facultative responses, such as autophagy and dna repair, are also coregulated , , and the use of native promoters preserves those potentially important regulatory circuits. chromosomal integration of human proteasome genes under native transcriptional control allowed us to refine the complementation status of several proteasome subunits previously reported in a large-scale study . in exceptional cases, altered expression may be desirable, but as a general strategy, native expression appears to be the best way to minimize confounding effects. we benefitted from native gene regulation in one other way. from previous work, we knew that s. kluyveri proteasome genes, under native promoters and terminators, tend to complement s. cerevisiae gene deletions . having used these genes as part of our chromosomal integration strategy, we can now say, at least in a limited context, that the sequences of their promoters and terminators differ sufficiently from s. cerevisiae to make them unavailable for homologous repair. this trick won't always work, but when available, it is very useful. we expected that human proteasome subunits, having evolved separately from yeast for billion years , would have pleiotropic functional deficits compared to native yeast subunits. the human subunits may be inefficiently synthesized or folded, they may limit the assembly or stability of the mature proteasome, or they may have more specific functional defects. humanized yeast grew surprisingly well under permissive conditions, suggesting the proteasome was working well within its capacity, but phenotypic assays confirmed that the human genes did not provide fully normal activity. 
high temperature and protein folding stress exposed the incompleteness of the complementation. by exploring phenotypes ranging from minimal complementation to stress resistance, we were able to characterize the levels of functional conservation of human splicing isoforms and paralogs. in well-characterized cases of complementation, humanized yeast may be a useful platform to test the functional activity of human mutants, isoforms or splicing variants, or the efficacy of a drug treatment. high temperature has wide-ranging effects on cells, including changes in membrane composition , , and the induction of multiple transcriptional and metabolic pathways , . ubiquitin ligases and deubiquitinases of the ubiquitin-proteasome system are required for the degradation of proteins that misfold as a result of high temperature , . with the exception of the non-essential α subunit pre , the roles of the proteasome core subunits in surviving high temperature have not been deeply investigated, in part because they are essential. the human substitutions can be viewed as temperature-sensitive alleles of essential proteasome subunits, but as with any classic temperature-sensitive or pleiotropic allele, there are a variety of possible reasons for the loss of function. irrespective of these considerations, the failure to grow indicates that the core proteasome is required for survival at high temperature. the human substitutions may be useful reagents with which to further investigate the roles of the proteasome in high temperature growth and under proteotoxic stress. what are the pros and cons of the it method and crispr/cas in yeast genome engineering? aside from the differences in dsb induction, crispr/cas and it have similar requirements in the subsequent steps of chromosomal modification. a repair template must be available during or soon after the dsb is made, and the desired repair products must be identified from a variety of possible events.
therefore, the main differences are in the convenience and level of control of dsb induction. crispr/cas is a very flexible way to induce chromosomal dsbs; sgrna targets are abundant, and no preliminary chromosomal modification is necessary. this is particularly advantageous when working with essential genes. however, the performance of a specific crispr/cas sgrna is still unpredictable, and dsb induction near a specific locus typically requires comparing several sgrna candidates. therefore, crispr/cas is ideal for large, exploratory experiments in which the efficiency of dsb induction at any one sgrna target does not need to be optimal . some of the drawbacks of crispr/cas can be overcome by offering large dna repair templates, which span a chromosomal region containing multiple sgrna sites. the it cassettes, on the other hand, must be integrated into a target site, and if that locus is essential for viability, the function must be covered, at least temporarily, with a complementing construct. it cassettes can, in theory, be integrated into any chromosomal locus, and once integrated, dsb induction is, in our experience, independent of the locus. the freedom to choose a precise dsb induction site is potentially powerful, enabling intense mutagenesis of short, defined genomic regions using relatively cheap oligonucleotide repair templates. one potential application of the it method that we have only briefly explored is the integration of a cassette into the kanmx marker used in the yeast standard gene deletion collection , . the systematic integration of an it cassette into kanmx would convert the gene cmy - a spo -s a, s a w cmy - a spo -s d, s d w cmy - a spo -s e, s e w in diploid yeast, the it cassettes were chromosomally-integrated into one copy of chromosome iv at the site gt (table , strains cmy , , , , ). the homologous chromosome was unmodified. 
a single copy of pgal -i-scei chromosomally-integrated at gt was used to induce i-scei expression upon the addition of galactose. cells were sampled at the indicated time points by plating on ypd for single colonies, then replica-plating to sc -ura + -foa plates to count the fraction of -foa-resistant cells in the population. at least cells of each strain were analyzed at time zero, and the number of cells counted increased to more than of each strain by the -hour time point. error bars represent the standard error of the mean calculated from three separate platings of cells from the same culture. in a diploid cell, replace one copy of the native gene with an it cassette. transform with a plasmid carrying an orthologous complementing gene. figure . replacing essential s. cerevisiae genes and testing complementation. (a) an essential gene is targeted for replacement in a diploid yeast cell, and function is maintained throughout the procedure by a complementing plasmid-borne orthologous gene. (b) yeast (cmy , and ) with human replacements of rpt and rpt were complemented by a plasmid carrying s. kluyveri rpt (pjd ) or rpt (pjd ). cultures were grown in ypd liquid overnight to permit spontaneous plasmid loss, diluted and plated at low density to allow the formation of individual colonies. plasmid loss was assayed by replica-plating to ypd + clonnat. small fuchsia circles indicate colonies that formed on ypd but not on ypd + clonnat, indicating they had lost the complementing plasmid. selected subunits of the proteolytic core α-ring and the atpase ring of the yeast proteasome were replaced in their chromosomal loci with the orthologous human coding sequences. human subunits that supported viability are green, those that did not are red and untested subunits are gray.
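the quantification described in the dsb induction legend above (fraction of foa-resistant colonies per plating, with the standard error of the mean from three platings) can be sketched as follows. the colony counts in the example are invented for illustration and are not data from the study; the function names are hypothetical, and the use of the sample standard deviation is an assumption about how the standard error was computed.

```python
from math import sqrt
from statistics import mean, stdev


def resistant_fraction(resistant, total):
    """Fraction of colonies on the master plate that grew on the FOA replica."""
    if total <= 0:
        raise ValueError("total colony count must be positive")
    return resistant / total


def mean_and_sem(fractions):
    """Mean and standard error of the mean across replicate platings.

    SEM is the sample standard deviation divided by sqrt(n), so at
    least two platings are required.
    """
    n = len(fractions)
    return mean(fractions), stdev(fractions) / sqrt(n)


# Invented counts for three platings of one culture at one time point:
# (FOA-resistant colonies, total colonies on YPD).
platings = [(50, 100), (60, 100), (70, 100)]
fractions = [resistant_fraction(r, t) for r, t in platings]
avg, sem = mean_and_sem(fractions)
print(f"mean fraction = {avg:.3f}, sem = {sem:.3f}")
```

because the three platings come from the same culture, this error term reflects plating and counting variability rather than variation between independent cultures.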
canavanine μg/ml μg/ml μg/ml transformation of yeast replacement of chromosome segments with altered dna sequences constructed in vitro yeast transformation: a model system for the study of recombination chromosomal site-specific double-strand breaks are efficiently targeted for repair by oligonucleotides in yeast efficient repair of ho-induced chromosomal breaks in saccharomyces cerevisiae by recombination between flanking homologous sequences site-specific recombination determined by i-scei, a mitochondrial group i intron-encoded endonuclease expressed in the yeast nucleus i-sceimediated double-strand dna breaks stimulate efficient gene targeting in the industrial fungus trichoderma reesei use of the meganuclease i-scei of saccharomyces cerevisiae to select for gene deletions in actinomycetes introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease homing endonucleases from mobile group i introns: discovery to genome engineering purification and characterization of the in vitro activity of i-sce i, a novel and highly specific endonuclease encoded by a group i intron the delitto perfetto approach to in vivo site-directed mutagenesis and chromosome rearrangements with synthetic oligonucleotides in yeast a novel engineered meganuclease induces homologous recombination in yeast and mammalian cells gene knockouts, in vivo site-directed mutagenesis and other modifications using the delitto perfetto system in saccharomyces cerevisiae in vivo site-specific mutagenesis and gene collage using the delitto perfetto system in yeast saccharomyces cerevisiae genetic modification stimulated by the induction of a sitespecific break distant from the locus of correction in haploid and diploid yeast saccharomyces cerevisiae genome engineering in saccharomyces cerevisiae using crispr-cas systems efficient multiplexed integration of synergistic alleles and metabolic pathways in yeasts via crispr-cas seamless site-directed 
mutagenesis of the saccharomyces cerevisiae genome using crispr-cas a highly characterized yeast toolkit for modular, multipart assembly homology-integrated crispr-cas (hi-crispr) system for one-step multigene disruption in saccharomyces cerevisiae genome-scale engineering of saccharomyces cerevisiae with singlenucleotide precision development of crispr-cas systems for genome editing and beyond nucleosomes inhibit target cleavage by crispr-cas in vivo internal guide rna interactions interfere with cas -mediated cleavage genome-scale engineering of saccharomyces cerevisiae with singlenucleotide precision supplemental off-target effects in crispr/cas -mediated genome engineering deciphering off-target effects in crispr-cas through accelerated molecular dynamics single-step precision genome editing in yeast using crispr-cas systematic humanization of yeast genes reveals conserved functions and genetic modularity. science ( -. ) new modules for the repeated internal and n-terminal epitope tagging of genes in saccharomyces cerevisiae system of centromeric, episomal, and integrative vectors based on drug resistance markers for saccharomyces cerevisiae a versatile toolbox for pcr-based tagging of yeast genes: new fluorescent proteins, more markers and promoter substitution cassettes systematic humanization of yeast genes reveals conserved functions and genetic modularity. science ( -. 
) methods in yeast genetics : a cold spring harbor laboratory course manual yeast transformation by the liac/ss carrier dna/peg method -fluoroorotic acid as a selective agent in yeast molecular genetics regulation of spo phosphorylation and its essential role in the fear network synthetic approaches to protein phosphorylation the ubiquitin-proteasome system of saccharomyces cerevisiae structure and function of the s proteasome degradation of a cohesin subunit by the n-end rule pathway is essential for chromosome stability proteasome involvement in the repair of dna double-strand breaks proteasome nuclear activity affects chromosome stability by controlling the turnover of mms , a protein important for dna repair saccharomyces cerevisiae ku potentiates illegitimate dna double-strand break repair and serves as a barrier to error-prone dna repair pathways mediated end joining: a back-up survival mechanism or dedicated pathway? the timing of eukaryotic evolution: does a relaxed molecular clock reconcile proteins and fossils? 
the ubiquitin-proteasome system of saccharomyces cerevisiae degradation of misfolded protein in the cytoplasm is mediated by the ubiquitin ligase ubr cytoplasmic protein quality control degradation mediated by parallel actions of the e ubiquitin ligases ubr and san plasticity in eucaryotic s proteasome ring assembly revealed by a subunit deletion in yeast ubiquitin-conjugating enzymes ubc and ubc mediate selective degradation of short-lived and abnormal proteins protein misfolding and temperature up-shift cause g arrest via a common mechanism dependent on heat shock factor in saccharomyces cerevisiae misfolded proteins are competent to mediate a subset of the responses to heat shock in saccharomyces cerevisiae deubiquitinase activity is required for the proteasomal degradation of misfolded cytosolic proteins upon heat-stress rsp /nedd is the main ubiquitin ligase that targets cytosolic misfolded proteins following heat stress the rpd l hdac complex is essential for the heat stress response in yeast the yeast environmental stress response regulates mutagenesis induced by proteotoxic stress rpn p acts as a transcription factor by binding to pace, a nonamer box found upstream of s proteasomal and other genes in yeast rpn is a ligand, substrate, and transcriptional regulator of the s proteasome: a negative feedback circuit the transcriptional regulation of protein complexes; a cross-species perspective structures and co-regulated expression of the genes encoding mouse cytosolic chaperonin cct subunits thermal adaptation in yeast: growth temperatures, membrane lipid, and cytochrome composition of psychrophilic, mesophilic, and thermophilic yeasts flexibility of a eukaryotic lipidome -insights from yeast lipidomics dynamic transcriptional and metabolic responses in yeast adapting to temperature stress yeast metabolic and signaling genes are required for heat-shock survival and have little overlap with the heatinduced genes functional characterization of the s. 
cerevisiae genome by gene deletion and parallel analysis. science ( -. ) the yeast deletion collection: a decade of functional genomics matouschek for his support during the preparation of this manuscript and to joseph desautelle, who helped with plasmid construction. development of the it method was started while cmy was a post-doctoral fellow of the howard hughes medical institute. key: cord- -cl evbr authors: wang, yingxue; zhang, weijiao; jefferson, matthew; sharma, parul; bone, ben; kipar, anja; coombes, janine l.; pearson, timothy; man, angela; zhekova, alex; bao, yongping; tripp, ralph a; yamauchi, yohei; carding, simon r.; mayer, ulrike; powell, penny p.; stewart, james p.; wileman, thomas title: the wd and linker domains of atg l required for non-canonical autophagy limit lethal respiratory infection by influenza a virus at epithelial surfaces date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cl evbr respiratory viruses such as influenza a virus (iav) and sars-cov- (covid- ) cause pandemic infections where cytokine storm syndrome, lung inflammation and pneumonia lead to high mortality. given the high social and economic cost of these viruses, there is an urgent need for a comprehensive understanding of how the airways defend against virus infection. viruses entering cells by endocytosis are killed when delivered to lysosomes for degradation. lysosome delivery is facilitated by non-canonical autophagy pathways that conjugate lc to endo-lysosome compartments to enhance lysosome fusion. here we use mice lacking the wd and linker domains of atg l to demonstrate that non-canonical autophagy protects mice from lethal iav infection of the airways. 
mice with systemic loss of non-canonical autophagy are exquisitely sensitive to low-pathogenicity murine-adapted iav where extensive viral replication throughout the lungs, coupled with cytokine amplification mediated by plasmacytoid dendritic cells, leads to fulminant pneumonia, lung inflammation and high mortality. iav infection was controlled within epithelial barriers where non-canonical autophagy slowed fusion of iav with endosomes and reduced activation of interferon signalling. this was consistent with conditional mouse models and ex vivo analysis showing that protection against iav infection of lung was independent of phagocytes and other leukocytes. this establishes non-canonical autophagy pathways in airway epithelial cells as a novel innate defence mechanism that can restrict iav infection and lethal inflammation at respiratory surfaces.
introduction
influenza a virus (iav) is a respiratory pathogen of major global public health concern (yamayoshi & kawaoka, ). as with sars cov- , animal reservoirs of iav can contribute to zoonotic infection leading to pandemics with a high incidence of viral pneumonia, morbidity and mortality. iav infects airway and alveolar epithelium and damage results from a combination of the intrinsic pathogenicity of individual virus strains as well as the strength and timing of the host innate/inflammatory responses. optimal cytokine levels protect from iav replication and disease but excessive cytokine production and inflammation worsens the severity of lung injury (davidson et al., , herold et al., , iwasaki & pillai, , ramos & fernandez-sesma, , teijaro et al., ). even though infection of the lower respiratory tract can result in inflammation, flooding of alveolar spaces, acute respiratory distress syndrome and respiratory failure, the factors that control iav replication at epithelial surfaces and limit lethal lung inflammation remain largely unknown. the transport of viruses to lysosomes for degradation provides an important barrier against infection. transport to lysosomes can be enhanced by non-canonical autophagy pathways which conjugate autophagy marker protein lc to endo-lysosome compartments to increase lysosome fusion. in phagocytes lc -associated phagocytosis (lap) conjugates lc to phagosomes and enhances phagosome maturation (delgado et al., , fletcher et al., , lamprinaki et al., , martinez et al., , sanjuan et al., ).
in non-phagocytic cells, lc is conjugated to endo-lysosome compartments during the uptake of particulate material such as apoptotic cells and aggregated β-amyloid, and following membrane damage during pathogen entry or osmotic imbalance induced by lysosomotropic drugs (heckmann et al., , tan et al., , florey et al., , florey et al., , roberts et al., ). it is known from in vitro studies that lc can be recruited to endo-lysosome compartments during the uptake of pathogens, but the roles played by non-canonical autophagy during viral infection in vivo are largely unknown. a role for non-canonical autophagy in host defence has been implied from in vitro studies of lap in phagocytes infected with free-living microbes with a tropism for macrophages, such as bacteria (listeria monocytogenes (gluschko et al., ), legionella dumoffii (hubber et al., )), protozoa (leishmania major) and fungi (aspergillus fumigatus (akoumianaki et al., , kyrmizi et al., , matte et al., )). it is also known that iav induces non-canonical autophagy during infection of cells in culture (fletcher et al., ); however, the role played by non-canonical autophagy in controlling iav infection and lung inflammation in vivo is currently unknown. it is not known, for example, if non-canonical autophagy is important in the control of iav infection by epithelial cells at sites of infection, or if it plays a predominant role within phagocytes and antigen-presenting cells during development of an immune response. herein we use mice with specific loss of non-canonical autophagy to determine the role played by non-canonical autophagy in host defence against iav infection of the respiratory tract. the mice (δwd) lack the wd and linker domains of atg l that are required for conjugation of lc to endo-lysosome membranes (rai et al., ) but express the n-terminal atg - binding domain and the ccd of atg l that are required for wipi binding and autophagy (dooley et al., ).
importantly, the wd mice grow normally and maintain tissue homeostasis (rai et al., ) , and unlike autophagy-defective mice, the wd mice do not have pro-inflammatory phenotype. we show that loss of non-canonical autophagy from all tissues renders mice highly sensitive to low-pathogenicity murine-adapted iav (a/x- ) leading to extensive viral replication throughout the lungs, cytokine dysregulation and high mortality typically seen after infection with highly pathogenic iav. conditional mouse models and ex vivo analysis showed that protection against iav infection of lung was independent of phagocytes and other leukocytes, and that infection was controlled within epithelial barriers where noncanonical autophagy slowed fusion of iav with endosomes and reduced interferon signalling. this establishes non-canonical autophagy pathways in airway epithelial cells as a novel innate defence mechanism that restricts iav infection at respiratory surfaces. mice with systemic loss of the wd and linker domains of atg l are highly sensitive to iav infection the consequences of loss of the wd and linker domains of atg l on conventional autophagy and non-canonical autophagy were confirmed using cell lines taken from controls and wd mice. mouse embryo fibroblasts (mefs) from littermate control mice expressed full-length  and  forms of atg l at kda (fig. b) , and generated pe-conjugated lc ii during recruitment of lc to autophagosomes following starvation in hbss. the mefs also recruited lc to endo-lysosome compartments swollen by monensin and control bone marrow-derived macrophages (bmdm) activated lap to recruit lc to phagosomes containing zymosan (fig. b) . mefs from δwd mice expressed a truncated atg l at kda (fig. c) . cells from δwd mice generated lc ii and autophagosomes in response to starvation but failed to recruit lc to swollen endo-lysosome compartments or phagosomes containing zymosan. these data confirm defects in non-canonical autophagy and lap in the δwd mice. 
iav enters airway and lung epithelial cells by endocytosis, and in tissue culture iav induces non-canonical autophagy leading to atg l -wd domain-dependent conjugation of lc to the plasma membrane and peri-nuclear structures (fletcher et al., ) . to test whether non-canonical autophagy has a host defence function in vivo, δwd mice were infected with iav. we used a low-pathogenicity murine-adapted iav (a/x ) that does not normally lead to extensive viral replication throughout the lungs, or cause the cytokine storm syndrome and death typically seen after infection with highly pathogenic viral strains. the results (fig. ) showed that δwd mice became moribund and showed severe signs of clinical illness (rapid breathing, piloerection). they also displayed rapid weight loss compared to littermate controls ( fig innate protection against iav is provided by type (α, β) and iii (λ) interferon (ifn) with severe iav infection causing excessive airway inflammation and pulmonary pathology attributable in part to ifnαβ and tnf-α (davidson et al., , szretter et al., . measurement of cytokine expression at d.p.i showed that iav induced a transient increase in transcripts for interferon-stimulated genes (isgs), isg and ifit (iwasaki & pillai, ) and pro-inflammatory cytokines ) in the lungs of both control and δwd mice ( fig a) . this increase in cytokine expression was resolved by d.p.i. before a second wave of increased cytokine expression at d.p.i. this second wave of cytokine expression was resolved by d.p.i in control mice, but δwd mice showed sustained increases in isg , ifit , il- β, tnfα and ccl transcripts, co-incident with exacerbated weight loss. at d.p.i lungs of δwd mice showed increased expression of neutrophil chemotaxis factor cxcl mrna (fig. a), coincident with increased neutrophil infiltration of airways and parenchyma, and extensive neutrophil extracellular traps (nets) as a consequence of neutrophil degeneration as shown by ih ( fig. b and s ). 
increased neutrophil infiltration of airways in δwd mice at d.p.i. was confirmed and quantified using flow cytometric analysis of broncho-alveolar lavage (bal; fig. c). at - d.p.i., increased expression of ccl mrna in δwd mice was coincident with extensive macrophage/monocyte infiltration into lung parenchyma observed by ih (fig. b and s ), which was not seen in controls. this increased macrophage/monocyte infiltration in δwd mice was confirmed and quantified using flow cytometric analysis of single cell suspensions from lung tissue (fig. d). it is known that, in severe iav infection, a cytokine storm occurs that is amplified by plasmacytoid dendritic cells (pdcs) (davidson et al., ). pdcs detect virus-infected cells and produce large amounts of cytokines, in particular ifnαβ, that in severe infections can enhance disease. in these cases, depletion of pdcs can decrease morbidity (davidson et al., ). depletion of pdcs in iav-infected δwd mice using anti-pdca- led to markedly decreased weight loss compared with isotype control-treated mice, to a level similar to that seen in littermate controls (fig e). this indicates that excessive cytokine production amplified by pdcs is responsible for the increased morbidity seen in the δwd mice. thus, mice with systemic loss of non-canonical autophagy failed to control lung virus replication and inflammation, leading to increased cytokine production, morbidity and mortality.
changes in inflammatory threshold or immunological homeostasis
macrophages cultured from embryonic livers from mice with complete loss of atg l secrete high levels of il - (saitoh et al., ), and lysmcre-mediated deletion of genes essential for conventional autophagy (e.g. atg , atg , atg , atg l , fip ) in mice leads to raised pro-inflammatory cytokine expression in the lung. this has been reported to increase resistance to iav infection (lu et al., ), and this was also observed in mice used in our study (fig.
s ) where lysmcre-mediated loss of atg l prevented rapid weight loss and reduced virus titre. this led us to test the possibility that the δwd mutation to atg l could also increase il- β secretion and cause the increased inflammation observed during iav infection. this was tested by incubating bmdm with lps and the purine receptor agonist bzatp (fig s a), or by challenging mice with lps (fig s b). mice with a complete loss of atg l in myeloid cells (atg l fl/fl -lysmcre) showed three-fold increases in il- β both in serum and in secretion from bmdm in vitro. in contrast, il- β secretion in δwd mice did not differ significantly from littermate controls (fig s a&b). this was consistent with the lack of elevated cytokines in lungs prior to infection (see day in fig. a), and with our previous work showing that serum levels of il- β, il- p , il- , and tnf-α in δwd mice are the same as in littermate controls at - and - weeks (rai et al., ). the exaggerated inflammatory response to iav in δwd mice did not therefore result from a raised pro-inflammatory threshold or dysregulated il- β responses in the lung. also, the frequencies of t cells, b cells and macrophages in δwd mice were similar to those in littermate controls (fig. s ). these data suggest that the exaggerated responses of δwd mice to iav do not occur because the mice have a raised inflammatory threshold or abnormal immunological homeostasis. the link between non-canonical autophagy/lap, tlr signalling, nadph oxidase activation and ros production (delgado et al., , martinez et al., , sanjuan et al., ) provides phagocytes with a powerful mechanism to limit infections in vivo. to test whether wild-type bone marrow-derived cells could protect susceptible δwd mice from lethal iav infection, we generated radiation chimeras (fig. s ). when challenged with iav, δwd mice reconstituted with either wild-type or δwd bone marrow remained highly sensitive to iav (fig.
a & b) with body weight reduced by up to % and decreased survival by d.p.i. as seen for δwd mice, weight loss was associated with a -fold increase in lung viral titre (fig. c) , fulminant pneumonia and inflammatory infiltration into the lung (fig. d ). this increased susceptibility to iav was not observed for control mice reconstituted with wild-type marrow, showing that non-canonical autophagy pathways in phagocytes and other leukocytes from control mice were not able to protect δwd mice against lethal iav infection. in a reciprocal experiment ( delivery of viral ribonuclear proteins (rnps) into the cytoplasm (skehel & wiley, , wharton et al., . rnps are then imported into the nucleus for genome replication (boulo et al., ) . the effect of non-canonical autophagy on iav entry was tested using a fluorescence de-quenching assay where the envelope of purified iav was labelled with green (dioc ) and red (r ) lipophilic dyes. mefs incubated with iav for increasing times were analyzed by facs to detect the green fluorescence signal generated when the dyes are diluted following iav fusion with the endosome membrane ( fig. a -c). the percentage of cells emitting a green signal was greater in mefs from δwd mice ( % compared to % for controls at min p.i.) and increased with time ( % versus % for controls; fig. b ). likewise, the median fluorescence intensity was . - . -fold higher in mefs from δwd mice ( fig c) . this showed that non-canonical autophagy slowed fusion of iav with endosome membranes. the recognition of viral rna by interferon sensors following delivery of rnps into the cytoplasm was used as a second assay for iav entry. mefs from δwd mice showed between and -fold increases in expression of ifn responsive genes, isg and ifit ( fig. d & e), and this was also observed in the lung in vivo (fig. a) . 
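The two read-outs of the de-quenching assay described above, the percentage of cells emitting a green signal and the fold difference in median fluorescence intensity between genotypes, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' analysis pipeline: the function names, the gate threshold and the per-cell intensity values are all hypothetical.

```python
from statistics import median


def percent_positive(intensities, threshold):
    """Percentage of cells whose green signal lies above the gate threshold."""
    if not intensities:
        raise ValueError("at least one cell is required")
    positive = sum(1 for value in intensities if value > threshold)
    return 100.0 * positive / len(intensities)


def mfi_fold_change(test, control):
    """Ratio of median fluorescence intensity (test over control)."""
    return median(test) / median(control)


# Hypothetical per-cell green fluorescence values (arbitrary units).
control_cells = [10, 12, 15, 40, 11, 9, 60, 13]
dwd_cells = [14, 55, 70, 80, 12, 65, 90, 16]
gate = 30  # hypothetical threshold separating fused from non-fused signal

print(percent_positive(control_cells, gate))
print(percent_positive(dwd_cells, gate))
print(mfi_fold_change(dwd_cells, control_cells))
```

A higher percentage of positive cells and a fold change above one in the test population would correspond to faster fusion of the labelled envelope with endosome membranes.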
taken together the results demonstrate for the first time that the wd and linker domains of atg l allow non-canonical autophagy to provide a novel innate defence mechanism against lethal iav infection within the epithelial barrier in vivo. showed profound sensitivity to infection by a low-pathogenicity murine-adapted iav (a/x ) leading to extensive viral replication throughout the lungs, dysregulated cytokine production, fulminant pneumonia and lung inflammation leading to high mortality and death usually seen after infection with virulent strains (belser et al., ) . these signs mirror the cytokine storms and mortality seen in humans infected with highly pathogenic strains of iav such as the 'spanish' influenza (belser et al., ) . the observation that bone marrow transfers from wild-type mice were unable to protect δwd mice from iav suggested that protection against iav infection in vivo was independent of leukocytes and did not require non-canonical autophagy in leukocyte populations (e.g. macrophages, dendritic cells, neutrophils, granulocytes, lymphocytes). in a reciprocal experiment the linker and wd domains of atg l were deleted specifically from myeloid cells. these mice, which lack non-canonical autophagy in phagocytic cells ( δwd mice infected with iav appeared to be unable to resolve inflammatory responses resulting in sustained expression of pro-inflammatory cytokines, morbidity and a striking lung pathology characterized by profuse migration of neutrophils into the airway at day followed by macrophages on day . pdcs detect iav-infected cells and produce large amounts of cytokines, in particular ifnαβ, that in severe infections can enhance disease (davidson et al., ) . the fact that morbidity in δwd mice could be decreased by depleting pdcs indicates that excessive cytokine production, amplified by pdcs was a major factor. 
this is not due to a lack of non-canonical autophagy/lap in pdcs, as bone-marrow chimaeras of δwd mice with wild-type leukocytes have the same phenotype as δwd mice. iav is recognized by endosomal tlr in respiratory epithelial cells, and rig-i detects virus replicating in the cytosol, leading to activation of irf and nfkb with subsequent induction of interferon, isg and pro-inflammatory cytokine production (iwasaki & pillai, ). increased inflammation may result directly from increased virus in the lungs, but the increased fusion of iav envelope with endosomes in δwd mice may increase delivery of viral rna to the cytoplasm, resulting in the sustained pro-inflammatory cytokine signalling. a similar pro-inflammatory phenotype resulting from decreased trafficking of inflammatory cargoes is observed following disruption of non-canonical autophagy by lysmcre-mediated loss of rubicon from macrophages or microglia (heckmann et al., , martinez et al., ). we have dissected the roles played by conventional autophagy and non-canonical
several non-canonical pathways leading to recruitment of lc to endo-lysosomal compartments, rather than phagosomes, are beginning to emerge. non-canonical autophagy in microglia facilitates endocytosis of amyloid and tlr receptors to reduce β-amyloid deposition and inflammation in mouse models of alzheimer's disease (heckmann et al., ). this may involve an interaction between the wd domain and tmem , which is required for β-amyloid glycosylation (ullrich et al., ). lysosomotropic drugs, which stimulate direct recruitment of lc to endosomes, create the
δwd mefs do not recruit lc (green) to endo-lysosomes following incubation with monensin or to bone marrow-derived macrophage phagosomes containing zymosan. littermate control and δwd mice were challenged intranasally with iav strain x ( pfu). mann-whitney u test was used to determine significance. (d) precision-cut lung slices from control and δwd mice were infected with iav.
virus titres were determined at indicated time points. comparisons were made using two-way anova with bonferroni post-tests. influenza virus a/hkx (x , h n ) was propagated in the allantoic cavity of -day-old embryonated chicken eggs at °c for h. titres were determined by plaque assay using mdck cells with an avicel overlay. all experiments were performed in accordance with uk home office guidelines and under the uk animals (scientific procedures) act . the generation of δwd mice (atg l δwd/δwd ) has been described previously (rai et al., ). generation of δwd phag and atg l fl/fl -lysmcre mice is described in detail in fig. s and separate cohorts inoculated intra-nasally with pfu iav strain x in µl sterile pbs. mice were infected between and am. animals were sacrificed at variable timepoints after infection by cervical dislocation. tissues were removed immediately for downstream processing. sample sizes of n = were used, as determined using power calculations and previous experience of experimental infection with these viruses. for survival analysis, a humane endpoint was determined using a scoring matrix that included excessive (> %) weight loss. to specifically deplete plasmacytoid dendritic cells (pdcs), mice were treated with anti-pdca- (cambridge bioscience) or igg b isotype-matched control, using a dose of mg per ml via the i.p. route on day of infection with iav and every h thereafter (davidson et al., ). the general strategy is shown in fig s a. mice were subjected to whole body irradiation with gy in two doses h apart using a cs source in a rotating closed chamber. bone marrow was collected from male wild-type c bl/ -ly . (b .sjl-ptprc a pepc b /boycrl; atg l +/+ ) mice that are congenic for the cd . allele or from δwd mice (that are congenic for cd . ). the c bl/ cd . marrows were used to enable confirmation of chimaerism by facs analysis of bone-marrow-derived cells, as littermate control and δwd mice are cd . (fig s b).
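Because donor and recipient cells carry different congenic alleles of the same surface marker, chimaerism can be expressed as the percentage of analysed cells that stain for the donor allele. The sketch below illustrates that arithmetic only. The function names, the cell counts and the acceptance threshold are hypothetical, and the threshold in particular is an assumed placeholder rather than the cut-off applied in this study.

```python
def percent_chimaerism(donor_cells, host_cells):
    """Percentage of analysed cells that carry the donor congenic allele."""
    total = donor_cells + host_cells
    if total <= 0:
        raise ValueError("no cells were analysed")
    return 100.0 * donor_cells / total


def passes_threshold(donor_cells, host_cells, threshold_percent):
    """True when donor-derived cells exceed the chosen chimaerism threshold."""
    return percent_chimaerism(donor_cells, host_cells) > threshold_percent


# Hypothetical counts of donor-allele and host-allele positive cells per animal.
recipients = {"mouse_1": (9500, 500), "mouse_2": (7000, 3000)}
assumed_threshold = 90.0  # placeholder value, not taken from the source

for name, (donor, host) in recipients.items():
    print(name, percent_chimaerism(donor, host),
          passes_threshold(donor, host, assumed_threshold))
```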
the femur and tibia of the donor mouse were collected and sterilised for min in % ethanol. the ends of the bones were removed and pbs was used to flush out the bone marrow through a μm cell sieve. red blood cell lysis was performed using . % ammonium chloride, and the cells were washed twice in pbs and re-suspended at a concentration of cells/ml. t cell depletion was performed prior to transfusion using a commercial mouse hematopoietic progenitor cell isolation kit (easysep, stemcell™ technologies, # ). after depletion, donor bone marrow cells were injected into each irradiated mouse by tail vein injection h following irradiation. mice were then allowed to recover for weeks, with daily monitoring of weight and general condition for at least the first two weeks to check for severe radiation sickness or illness due to being immunocompromised. for chimaerism analysis, approximately spleen cells were analysed by flow cytometry using fluorochrome-conjugated monoclonal antibodies specific for cd . (clone a , ebioscience) and cd . (clone , ebioscience). as shown in fig. s b, in the groups where cd . marrow was transplanted all mice were > % chimaeric. broncho-alveolar lavage fluid (bal) was obtained by lavage of mice via the trachea using ml ice-cold rpmi containing % fcs. for lung tissue, single-cell suspensions were made from minced lung and subjected to collagenase and dnase i digestion, then treated with ack buffer to remove red blood cells. in both cases, approximately cells were
infection of ex vivo lung slices was used to examine the responses of lungs without any contribution from recruited leukocytes, which cannot be present ex vivo. mouse lungs were inflated with % low melting point agarose in hbss and then sliced into µm sections using a vibrating microtome. they were then cultured overnight in dmem/f medium (thermofisher ) prior to infection with iav.
iav endosome fusion assay.
the envelope of purified iav ( .
mg protein ml - ) was labelled using an ethanol solution containing µm , '-dioctadecyloxacarbocyanine (dioc ) and µm octadecyl rhodamine b (r s . increased neutrophilia and netosis in iav-infected mice deficient in noncanonical autophagy δwd and littermate control mice were infected i.n. with pfu iav x . lung tissues were harvested at d p.i. neutrophils and h (marker of netosis) were detected by ih using anti-ly g and anti-h , visualized with dab and counter-stained with hematoxylin. micrographs of representative areas from lungs of six mice are shown. scale bars represent µm (upper panels), µm (middle panels) or µm (lower panels). there are dramatically increased numbers of neutrophils in airways (bronchi and bronchioles) and lung parenchyma of δwd mice, accompanied by markedly-increased netosis, indicating significant neutrophil degeneration. δwd and littermate control mice were infected i.n. with pfu iav x . lung tissues were harvested at d p.i. macrophages were detected by ih using anti-iba- , visualized with dab and counter-stained with hematoxylin. micrographs of representative areas from lungs of six mice are shown. scale bars represent µm (upper panels) and µm (lower panels). lower panels are the same as in fig. b . upper panels show the lower magnification images of the lung to illustrate the general nature of the observations. there is clearly increased inflammation in δwd mice with higher numbers of macrophages in the lung parenchyma. δwd iba- (macrophages) control iba- (macrophages) atg l fl/fl -lysmcre mice and littermate controls; n = or per group) were infected i.n. with pfu iav x . panel a. mice were weighed daily and the weights presented as a percentage of the starting weight. panel b. lung tissues were taken at d.p.i. and virus titer determined by plaque assay. data represent the mean value ± sem. analysis using the mann-whitney u test showed a significant difference (* p < . ). 
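The legend above relies on two simple computations: daily weights expressed as a percentage of the starting weight, and a Mann-Whitney U comparison of lung virus titres between two groups. The sketch below shows both in plain Python. It is a minimal illustration with hypothetical function names and made-up numbers; it computes only the U statistic (ties counted as one half) and does not derive a p-value, for which a statistics package would normally be used.

```python
def percent_of_start(weights):
    """Express each daily weight as a percentage of the first (starting) weight."""
    if not weights or weights[0] <= 0:
        raise ValueError("a positive starting weight is required")
    start = weights[0]
    return [100.0 * w / start for w in weights]


def mann_whitney_u(x, y):
    """U statistic for sample x against sample y.

    Counts the pairs (a, b) with a from x and b from y where a > b,
    scoring tied pairs as 0.5. The U for y against x is len(x) * len(y) - U.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical daily weights (g) for one mouse and hypothetical titres per group.
print(percent_of_start([20.0, 19.0, 18.0]))
group_a_titres = [4.0, 5.0, 6.0]
group_b_titres = [1.0, 2.0, 3.0]
print(mann_whitney_u(group_a_titres, group_b_titres))
```

With complete separation of the two groups, one U statistic equals the product of the group sizes and the other is zero, which is the most extreme result the test can give for those sample sizes.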
thus, atg l fl/fl -lysmcre mice that are deficient in canonical autophagy in phagocytes lose weight at the same rate as littermate controls but are more resistant to virus replication as they have lower lung virus titres. lysmcre-mediated deletion of autophagy genes from mice leads to increased inflammatory threshold characterised by raised secretion of il- β from macrophages ( ), and in the lung this can increase resistance to iav infection ( ). the possibility that the δwd mutation could affect il- β secretion was tested by stimulating bmdm with bacterial lipopolysaccharide (lps) and bzatp (p x receptor agonist) or challenging mice with lps. panel a. bone marrow-derived macrophages (bmdm) from mice strains as indicated were incubated with ng/ml of lps for h and μm of bzatp for min. supernatants were assayed for il- β by elisa. (mock group: untreated, bzatp controls only received bzatp). representative data are shown as the means ± sd of readings from wells per group and were analyzed using one-way anova with tukey's post-hoc analysis (**** p< . ). approximately three-fold increases in il- β secretion were seen for bmdm from atg l fl/fl -lysmcre mice. however, il- β secretion from δwd bmdm did not differ significantly from littermate controls. panel b. mouse strains (as indicated) were injected with mg/kg of lps via the ip route. serum collected min post injection was assayed for il- β by elisa. in nontreated mice il- β was below the detection limit in all strains (not shown). data are shown as the means ± sd of duplicate assays from mice per group and were analyzed using one-way anova with tukey's post-hoc analysis (** p < . ). approximately threefold increases in il- β secretion were seen for atg l fl/fl -lysmcre, however il- β secretion for δwd mice did not differ significantly from littermate controls. 
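The cytokine measurements above are summarised as means ± sd and compared by one-way anova. A minimal sketch of those two steps is given below, using only the Python standard library. The function names and the readings are hypothetical, the code returns the F statistic only, and the tukey post-hoc comparison used in the source is not implemented here.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) for one group of readings."""
    return mean(values), stdev(values)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across two or more groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ELISA readings (arbitrary units) for three mouse strains.
control = [1.0, 2.0, 3.0]
strain_a = [2.0, 3.0, 4.0]
strain_b = [7.0, 8.0, 9.0]

print(summarize(control))
print(one_way_anova_f([control, strain_a, strain_b]))
```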
the possibility that the loss of non-canonical autophagy resulted in changes in leukocyte populations was tested by analysing dissociated spleens by facs using antibodies to tcell subsets (cd + , cd + and cd + , cd + ), b-cells (cd r/b ) and macrophages (cd b, f / ) upper panel shows representative facs profiles from n = mice. lower panel shows the percentage positive for each population. ). flow plot shows representative plot from one c bl/ wt (cd . ) bone-marrow → δwd (cd . ) recipient chimaera and one c bl/ wt (cd . ) bone-marrow → littermate control (cd . ) recipient chimaera. all animals were > % chimaeric. homozygous δwd mice carrying lysmcre were crossed with atg l fl/fl mice. % of progeny are atg l fl/δwd and carry lysmcre. cre recombinase expressed in myeloid cells of these mice inactivates atg l by removing exon from atg l (wd phag ). the myeloid cells only express δwd. cre recombinase is not expressed in non-myeloid tissues and atg l is preserved to power autophagy. % of progeny provide littermate controls because they lack lysmcre and preserve atg l in all tissues. δwd phag that are deficient in non-canonical autophagy in phagocytes and littermate control mice were infected i.n. with pfu iav x . lung tissues were harvested at d p.i. macrophages were detected by ih using anti-iba- , visualized with dab and counterstained with hematoxylin. scale bars represent µm (upper panels) and µm (lower panels). micrographs of representative areas from lungs of six mice are shown. both δwd phag and control mice show little inflammatory response in the parenchyma and mild, macrophage-rich peri-bronchiolar infiltration (black arrows). δwd phag iba- (macrophages) control iba- (macrophages) sub-confluent monolayers of mefs were incubated with dual-labelled (sp-dioc /r ) iav at c for min and warmed to °c for min. ph-dependent fusion was assessed by adding bafilomycin a (baf) to some wells. 
cells were harvested by trypsinisation, fixed in pfa and analysed by flow cytometry. cell wall melanin blocks lc -associated phagocytosis to promote pathogenicity an innate defense peptide bpifa /splunc restricts influenza a virus infection pathogenesis and transmission of triple-reassortant swine h n influenza viruses isolated before the h n pandemic foot-and-mouth disease virus induces autophagosomes during cell entry via a class iii phosphatidylinositol -kinase-independent pathway the t a crohn's disease risk polymorphism impairs function of the wd domain of atg l nuclear traffic of influenza virus proteins and ribonucleoprotein complexes pathogenic potential of interferon alphabeta in acute influenza infection toll-like receptors control autophagy wipi links lc conjugation with pi p, autophagosome formation, and pathogen clearance by recruiting atg - - l the wd domain of atg l is required for its non-canonical role in lipidation of lc at single membranes v-atpase and osmotic imbalances activate endolysosomal lc lipidation autophagy machinery mediates macroendocytic processing and entotic cell death by targeting single membranes the beta integrin mac- induces protective lc -associated phagocytosis of listeria monocytogenes associated phagocytosis and inflammation lc -associated endocytosis facilitates beta-amyloid clearance and mitigates neurodegeneration in murine alzheimer's disease influenza virus-induced lung injury: pathogenesis and implications for treatment activation of antibacterial autophagy by nadph oxidases bacterial secretion system skews the fate of legionella-containing vacuoles towards lc -associated phagocytosis innate immunity to influenza virus infection coxsackievirus infection induces autophagy-like vesicles and megaphagosomes in pancreatic acinar cells in vivo autophagy proteins promote repair of endosomal membranes damaged by the salmonella type three secretion system calcium sequestration by fungal melanin inhibits calciumcalmodulin 
signalling to prevent lc -associated phagocytosis lc -associated phagocytosis is required for dendritic cell inflammatory cytokine response to homeostatic control of innate lung inflammation by vici syndrome gene epg and additional autophagy genes promotes influenza pathogenesis noncanonical autophagy inhibits the autoinflammatory, lupus-like response to dying cells molecular characterization of lc -associated phagocytosis reveals distinct roles for rubicon, nox and autophagy proteins leishmania major promastigotes evade lc -associated phagocytosis through the action of gp the atg -binding and coiled coil domains of atg l maintain autophagy and tissue homeostasis in mice independently of the wd domain required for lc -associated phagocytosis modulating the innate immune response to influenza a virus: potential therapeutic use of anti-inflammatory drugs autophagy and formation of tubulovesicular autophagosomes provide a barrier against nonviral gene delivery loss of the autophagy protein atg l enhances endotoxin-induced il- beta production toll-like receptor signalling in macrophages links the autophagy pathway to phagocytosis receptor binding and membrane fusion in virus entry: the influenza hemagglutinin role of host cytokine responses in the pathogenesis of avian h n influenza viruses in mice an atg l -dependent pathway promotes plasma membrane repair and limits listeria monocytogenes cell-to-cell spread mapping the innate signaling cascade essential for cytokine storm during influenza virus infection the novel membrane protein tmem modulates complex glycosylation, cell surface expression, and secretion of the amyloid precursor protein the wd and linker domains of atg l required for non-canonical autophagy limit lethal influenza a virus infection at epithelial surfaces. figs. s to s unmodified atg l is identified using primers flanking exon ( , ) and exon ( and ). 
the δwd allele was generated by inserting a stop codon into exon , and this increases the size of the pcr product of exon from bp to bp. in atg l fl/fl , loxp sites flanking exon in atg l increase the pcr product of exon from bp to bp, while removal of exon by cre recombinase reduces the pcr product of exon from bp to bp. panel c. genotyping δwd phag mice. dna extracted from mouse tail tissue or bone marrow-derived macrophages (mΦ) was analysed by pcr. (i) samples from δwd phag mice (indicated by cre+) and littermate controls (cre-). the bp pcr product seen in macrophage dna of cre+ δwd phag strains indicates specific removal of exon from atg l in myeloid cells. (ii) pcr primers verify presence of cre recombinase (cre+). (iii) genotyping of wild type and δwd strains showing predicted changes in size of pcr product from exon . panel d. tissue-specific expression of atg l and δwd. skin fibroblasts and bone marrow-derived macrophages (bmdm) isolated from atg l δwd phag mice (δwd phag ) and littermate controls were analysed by western blot. skin fibroblasts and bmdm from control mice lacking lysmcre (atg l fl/fl ) express full length kda α and β isoforms of atg l and the truncated δwd at kda. δwd phag mice express lysmcre, indicated by the removal of full length atg l from bmdm but not skin fibroblasts. atg l δwd phag mice (δwd phag ) and littermate controls were incubated with monensin to induce lc associated endocytosis, fixed and immunostained for lc . fibroblasts from δwd phag mice are able to recruit lc (green) to swollen endo-lysosome compartments in a similar way to those from littermate control mice. δwd phag mice. bmdms isolated from atg l δwd phag mice (δwd phag ) and littermate controls were incubated with zymosan for min, fixed and immunostained for lc . bmdms from δwd phag mice are unable to recruit lc (green) to phagosomes containing zymosan (red).
key: cord- -xcsal vk authors: rafie, k.; lenman, a.; fuchs, j.; rajan, a.; arnberg, n.; carlson, l.-a.
title: the structure of enteric human adenovirus - a leading cause of diarrhea in children date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xcsal vk human adenovirus (hadv) types f and f are a prominent cause of diarrhea and diarrhea-associated mortality in young children worldwide. these enteric hadvs differ strikingly in tissue tropism and pathogenicity from respiratory and ocular adenoviruses, but the structural basis for this divergence has been unknown. here we present the first structure of an enteric hadv - hadv-f - determined by cryo-em to a resolution of . Å. the structure reveals extensive alterations to the virion exterior as compared to non-enteric hadvs, including a unique arrangement of capsid protein ix. the structure also provides new insights into conserved aspects of hadv architecture such as a proposed location of protein v, which links the viral dna to the capsid, and assembly-induced conformational changes in the penton base protein. our findings provide the structural basis for adaptation to a fundamentally different tissue tropism of enteric hadvs. the structure of hadv-f reveals a ph-resistant capsid with an altered surface charge distribution to elucidate the structural basis of enteric adenovirus infection, we determined the structure of hadv-f using cryo-em. in parallel, the genome of the purified virus (strain "tak") was sequenced, revealing one protein-coding mutation (val ala in protein viii) compared to the deposited sequence for the same strain (genbank: dq . ). further, its proteome was determined using the high-recovery filter-aided sample preparation (fasp) mass spectrometry ( ) , revealing a total of viral proteins present in the purified virus (supplementary table ). at an average resolution of . Å, the three-dimensional reconstruction of hadv-f had continuous electron density with well-defined secondary structure elements and side-chain density (supplementary figure , supplementary movie ) . 
local resolution estimates revealed a resolution of better than Å for large parts of the icosahedral capsid (supplementary figure ) . this allowed us to build and refine an atomic model of the asymmetric unit (asu), which describes the icosahedral part of the virus (fig. b, supplementary movie ) . the final asu model contained four hexon homotrimers, single chains of the penton base protein and piiia, ten chains of pvi, two chains of pviii, three chains of the triskelion protein ix, and five chains of unknown identity, revealing known and unknown protein-protein interaction surfaces (supplementary table ). electron density for the fibers was present at the interface with the icosahedral capsid but was of insufficient quality for extensive model building due to increasing flexibility in more distal parts. compared to the two other reported structures of human adenoviruses, hadv-c (pdb: b t ( ) ) and hadv-d (pdb: tx ( )), the sequence identity of the capsid proteins ranges from - % (supplementary table ). this generally correlated with the average structural similarity between the three structures, with more divergent proteins showing higher structural difference in terms of cα root mean square deviation (rmsd) (supplementary table ). in their adaptation to gastrointestinal tropism, a major obstacle for enteric adenoviruses was likely the passage through the low ph of the stomach to their intestinal site of infection. to investigate the adaptation to the hostile environment of the stomach, we solved the structure of hadv-f at ph= . , which resembles the diurnal average gastric ph of young children ( ) (fig. a) . at a resolution of . Å the overall structure of the capsid at ph= . was largely unchanged (overall cα rmsd . Å for proteins listed in fig. b ) and no drastic local movements were observed (fig. b) , showing the resistance of the enteric adenovirus capsid to the gastric ph. 
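The Cα RMSD values quoted above compare equivalent residues in two atomic models. A minimal sketch of that calculation follows, assuming the two models are already superposed in a common frame and that Cα coordinates have been extracted in matching residue order. The function names `ca_rmsd` and `per_residue_shift` are illustrative and are not taken from the authors' workflow.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Return the C-alpha RMSD between two pre-superposed models.

    Each argument is a sequence of (x, y, z) tuples, one per residue,
    listed in the same residue order. The result has the same unit as
    the input coordinates (typically angstroms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared displacements over all residues and all three axes.
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def per_residue_shift(coords_a, coords_b):
    """Return the displacement of each C-alpha atom between the two models."""
    return [math.dist(point_a, point_b) for point_a, point_b in zip(coords_a, coords_b)]
```

The per-residue displacements are the quantity one would map onto a model to highlight local movements, whereas the single RMSD value summarizes the whole chain. A real comparison would first need an optimal superposition step, which this sketch deliberately leaves out.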
we reasoned that the gastrointestinal adaptation might also have altered the distribution of acidic and basic residues exposed on the outer surface of the capsid. to investigate this, the surface charge distribution of hadv-f along with the other two existing human adenovirus structures (hadv-c and hadv-d ) were calculated for ph= . . a visual comparison of the charge distribution revealed significant differences between the three hadvs (fig. c ). the exposed surface of hadv-d is almost entirely covered by negative charge at ph= . , and hadv-c has mostly negatively charged surfaces on the top of the hexons (in a calculation underestimating the amount of negative charge due to exclusion of the flexible, and highly negatively charged, hypervariable region , which was not built in any of the hadv-c structures ( , , ( ) ( ) ( ) ( ) ). by comparison, the capsid of hadv-f is predominantly uncharged at ph= . (fig. c) . the surface charge distribution of hadv-f at ph= . revealed two distinct regions as still being relatively uncharged at this extreme ph: the n-terminal part of pix, largely occluded between hexons, and the solvent-exposed loops at the top of the hexons (fig. c ). whereas the overall structural fold of hexon chains is conserved among adenoviruses, they differ in the seven hypervariable regions (hvrs). comparing hadv-f (at ph= . ) to hadv-c and hadv-d , substantial differences were found in all seven hvrs (supplementary table ). in particular, hvr stands out in the comparison since it is much shorter in enteric hadvs (fig. d) . the absence of the highly negatively charged hvr loop in all hadv-c structures ( , ) indicates flexibility. on the other hand, the near-eliminated hvr in hadv-f is a short loop with a rigid conformation that allowed tracing the entire length of the polypeptide (fig. e ). 
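To illustrate why exposed acidic and basic residues make a surface pH-dependent, the sketch below estimates the net charge of a peptide with the Henderson-Hasselbalch relation. It is not the PDB2PQR/APBS calculation used in this work. The pKa values are approximate textbook numbers chosen for illustration, and structural context (burial, neighbouring charges) is ignored.

```python
# Approximate, illustrative pKa values; real values depend on local environment.
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.3


def net_charge(sequence, ph):
    """Estimate the net charge of a peptide (one-letter codes) at a given pH."""
    # Basic groups carry +1 when protonated: fraction = 1 / (1 + 10^(pH - pKa)).
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    # Acidic groups carry -1 when deprotonated: fraction = 1 / (1 + 10^(pKa - pH)).
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue in sequence.upper():
        if residue in PKA_BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            negative += 1.0 / (1.0 + 10 ** (PKA_ACIDIC[residue] - ph))
    return positive - negative


if __name__ == "__main__":
    # Hypothetical acid-rich loop, not a sequence from any of the structures.
    loop = "DEEDGEEDSD"
    for ph in (2.0, 4.0, 7.0):
        print(ph, round(net_charge(loop, ph), 2))
```

In this simplified picture an acid-rich loop is strongly negative near neutral pH and close to neutral at strongly acidic pH, while a loop with few titratable residues changes little across the same range.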
taken together, these findings show that the capsid of enteric hadv-f is structurally unperturbed by exposure to stomach-like ph and has evolved to expose fewer charged residues on its exterior as compared to non-enteric hadvs, most prominently exemplified by a near-complete deletion of the hypervariable region . among the so-called minor capsid proteins, pix forms the most extended and complex arrangements. in previously reported structures of hadv-c and hadv-d , this amounts to a tight mesh of ordered protein density that stretches through the canyons between hexons across the virion surface ( , , ) (fig. a) . in hadv-f , only the n-terminal residues - (henceforth 'pix-n') were sufficiently ordered to trace the protein chain, thus ruling out the same sort of ordered, virus-spanning pix cage seen in other hadvs (fig. a ). analogously to other hadvs, pix-n trimerizes to form a triskelion (fig. b ) located between three non-peripentonal hexon units (fig. b) . each facet of the virion harbors four copies of pix-n triskelion in two distinct structural surroundings: three copies at the local -fold (l ) symmetry axes of each asymmetric unit, and one at the icosahedral -fold (i ) symmetry axis at the center of the facet (supplementary figure ) . the conformations of pix-n in these two surroundings are virtually identical (fig. b ). the three pix chains come together at the center of this triskelion, where interactions between residues phe , phe and tyr from each chain form a hydrophobic core (fig. c ). this hydrophobic core is differently arranged compared to the non-enteric hadv-c and hadv-d (supplementary figure ) and has more large hydrophobic residues at its center. sequence comparisons of pix revealed that pix-n has a higher sequence homology to hadv-c and hadv-d than its c-terminal part (residues - , 'pix-c') (supplementary figure ) . 
however, the sequences of pix-c are near-identical in hadv-f and the related hadv-f , indicating conservation between enteric adenoviruses. mass spectrometry analysis detected the entire pix sequence in the purified hadv-f , confirming the presence of pix-c in the purified virions (supplementary table ). we reasoned that the substantial protein mass corresponding to pix-c, emanating in the constricted space between the hexons, should be visible in the electron density map at a lower threshold even if it is flexible. indeed, the interhexonal space above pix-n harbors electron density corresponding to a flexible protein at the local -fold axes (fig. d, supplementary figure ). this density appears to pass to the outside of the capsid between the hvr loops of the three surrounding hexons. in a localized asymmetric reconstruction ( ), the three hvr could be resolved in their entirety (supplementary figure ), showing that they are present in a single conformation and form a constriction of defined size (supplementary figure ) from which the c-termini of pix protrude. in contrast, there is clearly no electron density above pix-n at the icosahedral -fold position (fig. e), showing that the spatial organization of pix-c differs between these two positions (fig. f ). taken together, these data reveal a very different arrangement of pix in enteric adenoviruses as compared to respiratory (c ) and ocular (d ) hadvs. in hadv-f , the c-terminal half of pix is flexible and exposes its c-terminus to the capsid exterior at three of the four pix positions in each virion facet.
the hadv-f penton base undergoes assembly-induced conformational changes
located on the five-fold symmetry axes of adenovirus capsids, the penton base (pb) protein forms a homopentamer that contains integrin-binding motifs and serves as an assembly hub connecting the icosahedral capsid to the fibers (fig. a). 
in our reconstruction of the entire hadv-f , the electron density for the pb was less well resolved than other parts of the capsid. to improve the map of the pb, we performed a localized asymmetric reconstruction of the pb monomer (supplementary figure ) . the improved map allowed the building of an atomic model for the pb, which was placed into a composite atomic model of the entire asymmetric unit. overall, the hadv-f pb is very similar to the pb in hadv-c ( , , ) and hadv-d ( ), with a β-sheet rich fold that can roughly be divided into four domains: crown, head, body and tail, with the body and head as the main domains, separated by loop regions (fig. a ). during assembly of the virus particle, the pb homopentamer forms a plethora of interactions with peripentonal hexons, the fiber, and the minor capsid protein piiia (supplementary table ). to investigate conformational changes induced during the pb assembly process, we solved the structure of a recombinantly expressed hadv-f pb homopentamer in solution (free pb -fpb) by cryo-em. the map had an average resolution of . Å (supplementary figure ) , allowing for the placement of an atomic model (fig. b ). comparing the atomic models of the fpb and the virion-bound pb (vpb), the overall cα-shift (rmsd) was very small (~ . Å). however, color-coding vpb by its structural deviations from fpb revealed regions with higher rmsd, indicating localized assembly-induced conformational changes ( fig. c , supplementary movie ). moreover, four sequence segments that were built in the vpb model could not be built in the fpb model ( fig. a ), indicating that these regions are disordered in solution and only become stabilized in a defined conformation upon assembly into the virion. one such region is the tail, a residue (thr -gly ) random coil region (fig. a) , which is disordered in the fpb (fig. b ). it is stabilized through interactions with two loop regions from piiia (supplementary table ). 
the sequence of the tail domain is largely conserved between hadvs (supplementary figure ), suggesting a conserved role as an assembly motif. the second motif (tyr -leu ) becoming ordered upon assembly is an α-helix consisting of residues gln -thr , located close to the five-fold axis of the penton base (fig. d), and close to where the fiber binds. although the hadv-f capsid map shows only weak density for the proximal fiber in connection with the pb, the vpb structure did allow tracing of a fragment of the conserved fiber tail (supplementary figure , supplementary table ). thus, the folding of gln -thr may be dependent upon interactions within the capsid and/or binding of the fiber to the pb. additional assembly-dependent interactions take place in two loop regions located between residues val -asn in the body domain (fig. d ). these loops are disordered in the fpb but well resolved in the vpb, where their conformation is stabilized by interactions with the peripentonal hexon. the first loop (ser -ser ), located at the top of the body, is stabilized as an extended coil structure upon interaction with the hexon chain ser -tyr loop. the second region (thr -gln ), located at the bottom of the body, is stabilized as a short α-helix upon binding to an uncharged pocket formed by peripentonal hexon residues ala -ile . peculiarly for enteric hadvs, the otherwise conserved integrin-binding rgd motif has been replaced by igdd in hadv-f (rgad for hadv-f ). in the hadv-f structure the igdd-containing loop is the only surface-exposed part of the pb for which we find no continuous electron density (fig. e), despite being among the shortest such loops of all hadvs ( ) (supplementary figure ). this parallels the observed flexibility of the rgd motifs in the two previously reported structures of hadvs ( ) ( ) ( ), indicating that the function of the igdd sequence may depend on it being flexible until it interacts with a target molecule. 
in summary, a comparison of the pb pentamer in solution and in the virus capsid revealed several distinct motifs that become folded only upon assembly of the pb into the capsid, and further revealed that the non-canonical integrin-binding motif igdd is disordered also in the context of the assembled virus. the dna-binding protein v is located at a conserved position at the inner face of the capsid after initial model building of the virion at ph= . , the asymmetric unit contained five peptide chains that still had not been assigned an identity. four of the five chains were deemed too short to assign an identity to. three of those four chains interlace with different copies of pvi at positions where pvi interacts with pxiii or piiia at the inner face of the capsid (supplementary figure , supplementary movie ). the fifth unidentified peptide chain is markedly longer and is located at the inner face of the capsid, in a pocket formed by three non-peripentonal hexons ( fig. a -b, supplementary table ). its electron density is resolved well enough to identify large side-chain residues and we thus reasoned that a structural bioinformatics workflow might be devised to reveal its identity. with the initial constraint being only that residues number and in the identified mer peptide must have large side chains, we utilized a combination of the proteomics data, exclusion of proteins with known locations, real-space refinement scores and other considerations to elucidate its identity (supplementary methods , supplementary figure ). after sequential exclusion of candidates based on these criteria, a single most likely candidate remained, a sequence from the center of protein v (pv): gln -asp . pv has been reported to bind directly to dna and to pvi, thereby bridging the core and the surrounding capsid, but it has not been localized in any adenovirus structure. 
the built sequence of pv fits the density without any clashes or unlikely interactions with surrounding proteins, and furthermore shows a high degree of sequence conservation with pv in hadv-c and hadv-d (fig. c, d, supplementary movie ). in both hitherto published hadv structures, hadv-c ( ) and hadv-d ( ), there is similarly shaped electron density at the corresponding position of the virus capsid (fig. e ). in hadv-c no atomic model was built into it ( ). similarly, at the corresponding position in the hadv-d structure, two shorter peptide chains of unknown identity were placed ( ). the location of the mer chain of pv is such that both the n- and c-termini of pv, which are not resolved in our structure, may protrude towards the interior of the virion, in agreement with the proposed role of pv in linking the viral genome to the capsid. in summary, we used a systematic structural bioinformatics approach to propose a likely conserved position of pv in human adenoviruses.
here, we present the structure of a major cause of diarrhea and diarrhea-associated mortality in children: the enteric human adenovirus hadv-f . as the first structure of an adenovirus with pronounced gastrointestinal tropism, it reveals a capsid stable to stomach ph, with substantial changes to the virion surface as compared to respiratory and ocular hadvs. overall, hadv-f has fewer charged, i.e. ph-dependent, residues exposed on the surface of its capsid. this is especially prominent at the top of the hexons, where hvr is long and rich in negatively charged residues in hadv-c . this allows interaction of hadv-c with lactoferricin through a charge-dependent mechanism ( ), which contributes to an extended tissue tropism ( , ). evolution of hadv-f has resulted in a largely truncated and less charged hvr (fig. d), seemingly to adapt to the specific conditions in the gastrointestinal tract. 
another major change to the capsid exterior of hadv-f is the starkly different conformation of protein ix (pix). instead of forming a virus-covering, rigid mesh, the c-terminal half of pix (pix-c) is flexible and protrudes to the outside of the capsid, kept in place by hexon hvr -containing loops. strikingly, pix-c density is observed above all pix-n trimers except at the icosahedral -fold axis (fig. d-e, supplementary figure ). in principle, each pix-c strand could either emerge in cis (i.e. above the same pix-n trimer to which it belongs), or stretch across the virion surface to emerge above another pix-n trimer (in a trans arrangement). the length of pix-c in hadv-f is compatible with both of these arrangements. a cis arrangement of all strands would be reminiscent of pix in some non-human adenoviruses, in which the conformation of pix-c is also more defined ( , , ). whereas our data do not allow tracing of individual strands of pix-c, the lack of any pix-c density above the pix-n trimer at the icosahedral -fold rules out such a pure cis arrangement of pix-c. one possible, parsimonious interpretation would be that the central pix trimer adopts a trans arrangement, donating one pix-c strand to each of the three pix trimers at local -fold positions, which in turn have their pix-cs in cis (fig. f ). whether this model is correct or not, it is clear from our data that pix arranges in a unique manner in enteric adenoviruses compared to other hadvs studied to date. all these modifications to the virion surface of hadv-f are likely related to the different set of interactors, and the different ph range, that this virus encounters throughout the gastrointestinal tract. other gastrointestinal viruses interact with components such as bile (calicivirus) ( ) and lipopolysaccharides (poliovirus ( ), mouse mammary tumor virus ( )), interactions that are crucial for infection by these viruses. 
besides low-ph resistant interactions of hadv-f with gastrointestinal phospholipids ( ) , little is known about the hadv-f :gastrointestinal interactome. finding such interaction partners, e.g. of the disordered and exposed pix-c region, will yield further insights into the infection cycle and tropism of enteric adenoviruses. our study further unveiled how several motifs in the hadv-f pb are disordered in solution and only adopt a defined conformation upon assembly into the virus capsid, laying out another piece of the still largely unfinished puzzle of adenovirus assembly ( ) . the observation that the modified integrin-interacting motif of the hadv-f pb is still disordered in the assembled virus particle highlights the need for structural studies of the interactions with its proposed binding partner, laminin-binding integrins ( ) . biochemical data have defined protein v (pv) as a key protein linking the adenovirus genome to the capsid ( ) , but in spite of its conserved function in adenoviruses it had not been located in the adenovirus capsid. here we propose the point of interaction of pv to the interior of the capsid, and provide data suggesting that this position, at the junction between three non-peripentonal hexons, is conserved between hadvs. previous biochemical data have not suggested pv to interact with the hexons, but have instead suggested interactions between pv and the minor protein pvi ( , ) . these data are not mutually exclusive with our identification of the pv anchoring point to the hexons, since most of the pv sequence is still unaccounted for in the structure, and several copies of pvi are found in the vicinity of the anchoring point where they may form additional interactions. 
taken together, the structure of the enteric adenovirus hadv-f revealed key conserved aspects of adenovirus architecture as well as highly divergent features of enteric adenoviruses, thus laying the foundation for structure-based approaches to preventing this prominent cause of diarrhea-associated mortality in young children and for further development of these structurally divergent adv types as vaccine vehicles.
human a cells (kind gift from dr. alistair kidd) were maintained in dulbecco's modified eagle medium (dmem; sigma-aldrich) supplemented with % fetal bovine serum (fbs; hyclone, ge healthcare), mm hepes (sigma-aldrich) and u/ml penicillin + µg/ml streptomycin (gibco). for hadv-f (strain tak) propagation, bottles of a cells ( cm , at % confluency) were infected with hadv-f inoculation material (produced in a cells) in ml of growth media ( % fbs) for min on a rocking table at °c. thereafter, an additional ml of growth media ( % fbs) was added to each flask and the cells were further incubated at °c. infected cells were harvested after approximately one week, or when cells displayed clear signs of cytopathic effect. cells were collected by centrifugation, resuspended in dmem and disrupted to release virions by freeze thawing and by addition of an equal volume of vertrel xf (sigma-aldrich). after vigorous resuspension, the cell extract was centrifuged at rpm for min. the upper phase was transferred onto a discontinuous cscl gradient (densities: . g/ml, . g/ml, and . g/ml, in mm tris-hcl, ph= . ; sigma-aldrich) and centrifuged at rpm in a beckman sw rotor for . h at °c. the virion band was collected and desalted on a nap column (ge healthcare) into sterile pbs.
the samples were split for tryptic and chymotryptic digestion and processed using a modified protocol of filter-aided sample preparation (fasp) ( ). in brief, triethylammonium bicarbonate (teab) was added to a final concentration of mm teab prior to reduction using mm dithiothreitol at °c for min. 
the reduced samples were loaded onto kda mwco pall nanosep centrifugal filters (sigma-aldrich), washed with m urea, % sodium deoxycholate (sdc) and alkylated with mm methyl methane thiosulfonate. two-step digestion was performed on filters using trypsin and chymotrypsin as digestive enzymes in mm teab, . % sdc buffer. the first step was performed overnight and the second step, with an additional portion of proteases, for four hours the next day. tryptic digestion was performed at °c using pierce ms-grade trypsin protease (thermo fisher scientific). chymotryptic digestion was performed at room temperature using pierce ms-grade chymotrypsin protease (thermo fisher scientific). the peptides were collected by centrifugation and sdc was precipitated by acidifying the sample with tfa (final concentration %). the digested sample was desalted using pierce peptide desalting spin columns (thermo fisher scientific) according to the manufacturer's protocol. the digested and desalted samples were analysed using a qexactive hf mass spectrometer interfaced with an easy-nlc liquid chromatography system (both thermo fisher scientific). peptides were trapped on an acclaim pepmap c trap column ( μm x cm, particle size μm, thermo fisher scientific) and separated on an in-house packed analytical column ( μm x mm, particle size μm, reprosil-pur c , dr. maisch). the stepped gradient ran from % to % solvent b in min, followed by an increase to % in min and to % solvent b in min, at a flow rate of nl/min. solvent a was . % formic acid and solvent b was % acetonitrile in . % formic acid. the mass spectrometer was operated in data-dependent mode (dda) where the ms scans were acquired at a resolution of and a scan range from to m/z. the most intense ions with a charge state of to were isolated with an isolation window of . m/z and fragmented using normalized collision energy of . the ms scans were acquired at a resolution of and the dynamic exclusion time was set to s. 
data analysis was performed using proteome discoverer (version . , thermo fisher scientific). the data was searched against an in-house database containing the amino acid sequences of hadv-f . mascot (version . . , matrix science) was used as search engine with a precursor mass tolerance of ppm for ms and mmu for ms spectra. tryptic peptides were accepted with a maximum of one missed cleavage, chymotryptic peptides with maximum three missed cleavages. variable modification of methionine oxidation and fixed methylthio of cysteines were selected. the mascot significance threshold for peptides was set to . . purified hadv-f was used at . mg/ml (ph= . ) and . mg/ml (ph= . ). the recombinant hadv penton base (pb ) was purified as described before ( ) and used at mg/ml in pbs buffer, supplemented with % glycerol. a hadv sample at ph= . was prepared by adding µl of a . m citric acid / m na hpo ph= . solution to µl of hadvf- ph= . followed by incubating on ice for minutes. samples were vitrified on quantifoil cu r / (electron microscopy sciences, cat#: q cr ) and quantifoil cu r . / . (electron microscopy sciences, cat#: q cr . ) grids for the virus particles and the recombinant protein, respectively. prior to sample application the grids were glow discharged using a pelco easiglow device (ted pella inc.) at mamp for s. sample was applied by transferring µl sample onto the glow-discharged side of the grid, blotted and plunge frozen in liquid ethane, using a vitrobot plunge freezer (thermo fisher scientific), with the following settings: °c, % humidity, blotforce = - and a blotting time of s. for hadvf- ph= . sample was applied twice with a blotting step, using the same settings as above, between applications( ). all data were collected on an fei titan krios transmission electron microscope (thermo fisher scientific) operated at kev and equipped with a gatan bioquantum energy filter and a k direct electron detector. a condenser aperture of µm (hadv-f ph= . & . 
) and an objective aperture of µm were chosen for data collection. a c -aperture of µm was selected for the pb data collection. coma free alignment was performed with autoctf/sherpa. data were acquired in parallel illumination mode using epu (thermo fisher scientific) software at a nominal magnification of kx ( . Å pixel size). both datasets for the hadv-f structure at ph= . were collected in super-resolution mode. due to a preferred orientation of pb , a second data set was collected at a ° tilted stage. data collection parameters are listed in the supplementary table . data processing and structure determination hadv-f ph= . two datasets were collected on hadv-f ph= . and initially processed independently. data were initially processed using relion . ( ) and continued in relion . (beta) ( ) . beam-induced motion was corrected using relion's motioncor ( ) implementation, at which step the superresolution movies were binned once, and the per-micrograph ctf estimated using gctf ( ) for all data sets. particles were manually picked and subjected to reference-free d classification and well-resolved classes were combined and subjected to d classification, applying icosahedral symmetry (i according to crowther ( ) ) and a mask of the capsid structures. a low-pass filtered ( Å) volume of hadvc- (emd- ( ) ) was used as a reference volume. particles were classified into two classes, resulting in % of particles allocated to one well-resolved class which was used for downstream processing. d refinement was performed using the output of the d classification as a reference model, low-pass filtered to Å, with no additional fourier-padding. following refinement, data were post-processed, and the particles subjected to per-particle ctf refinement, bayesian polishing and another round of per-particle ctf refinement. 
the particles were subjected to an additional round of d refinement before combining both datasets and performing a final d refinement, with no additional fourier-padding. the resolution was calculated using the gold standard fsc (threshold . ) to . Å after postprocessing. finally, the data were corrected for the curvature of the ewald sphere using relion, which led to a local improvement of the electron density map and a new average resolution of . Å. local resolution estimates were calculated using resmap ( ). a homology model was generated using the swiss-model server ( ) from the hadv-f capsid protein sequences for which homologues have been structurally determined. the resulting homology model was based on the reported hadv-d structure (pdb: tx ( )). the model was manually docked into the hadv-f electron density in chimerax ( ) and the map corresponding to the asymmetric unit (asu) extracted. the asu map was locally sharpened using phenix's ( ) autosharpen tool. subsequently, the hadv-f homology model was docked and subjected to an initial round of real-space-refinement using phenix. the structure was fully refined using iterative cycles of phenix's real-space-refinement and model building in coot ( ). to improve the map quality surrounding the penton base monomer and the hvr -loop containing region, the map was improved using the localized asymmetric reconstruction workflow reported by ilca et al ( ) and implemented in scipion v . ( ). coordinates for the sub-particles were determined in chimerax and subsequently located by applying icosahedral symmetry and extracted in scipion v . . sub-particles were subsequently filtered to exclude particles not present within a [- °, °] range from the image plane. the resulting sub-particles were then subjected to an asymmetric d classification. to increase the probability of convergence during classification, changes in the origins and orientations were not allowed. 
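The resolution figures reported from the gold-standard FSC come from finding the spatial frequency at which the half-map correlation curve falls below a fixed threshold. The sketch below shows only that last step, with linear interpolation between shells, for a curve that has already been computed. The default threshold of 0.143 is the value conventionally used with gold-standard FSC and is an assumption here, since the number did not survive in this text. The function name is hypothetical and this is not RELION's implementation.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (1 / frequency) at the first FSC threshold crossing.

    spatial_freqs: strictly positive, increasing shell frequencies (1/angstrom).
    fsc_values:    FSC value for each shell.
    Returns None if the curve never drops from >= threshold to below it.
    """
    if len(spatial_freqs) != len(fsc_values):
        raise ValueError("frequency and FSC lists must have equal length")
    if any(f <= 0 for f in spatial_freqs):
        raise ValueError("spatial frequencies must be strictly positive")
    for i in range(1, len(fsc_values)):
        prev_fsc, cur_fsc = fsc_values[i - 1], fsc_values[i]
        if prev_fsc >= threshold > cur_fsc:
            # Interpolate linearly between the two shells bracketing the crossing.
            fraction = (prev_fsc - threshold) / (prev_fsc - cur_fsc)
            crossing = spatial_freqs[i - 1] + fraction * (
                spatial_freqs[i] - spatial_freqs[i - 1]
            )
            return 1.0 / crossing
    return None
```

For a curve sampled at 0.1, 0.2, 0.3 and 0.4 per angstrom that crosses the threshold midway between the last two shells, the function reports the reciprocal of 0.35 per angstrom. The same helper could be applied to each directional curve of a 3DFSC analysis to expose anisotropy from preferred orientation.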
a subsequent d refinement yielded a d reconstruction of the penton base monomer and the hvr -loop containing region to a resolution of . Å and . Å, respectively. average resolutions were calculated according to the gold standard fsc calculations (threshold . ). data processing statistics for the asymmetric localized reconstruction are given in supplementary table . dfsc curves were calculated using the remote dfsc processing server ( ).
image processing and model building for hadv-f at ph= .
data were processed as described for the hadvf- ph= . structure up until the first d refinement. the hadvf- volume at ph= . was low-pass filtered to Å and used as a reference. the resolution was estimated to . Å using the gold standard fsc (threshold . ) after postprocessing. local resolution estimates were calculated using resmap. the hadv-f ph= . model was fitted into the reconstructed hadv-f ph= . density using chimerax, an asu was extracted, and the resulting map was locally sharpened using phenix. the model was then further fitted and energy minimized using namdinator ( ). the hadv-f penton base (pb ) data (untilted and tilted at °) were processed using relion . (beta), with beam-induced motion correction and ctf estimation performed as for the hadv-f structure. particles were picked using the automated particle picker cryolo ( ) using the available phosaurus generalized model. reference-free d classification of particles was performed in relion and revealed a significant proportion of particles with the same view, suggesting a preferred orientation of the specimen. from the ° data, an initial model was generated in cryosparc ( ). well-resolved d classes were combined and subjected to d refinement using the same reference model as during d classification, low-pass filtered to Å. 
following refinement, data were post-processed, and the particles were subjected to per-particle ctf refinement, bayesian polishing and another round of per-particle ctf refinement before performing a final round of d refinement. inspection of the final volume revealed poor resolution along one of the axes (supplementary figure ). we therefore collected data on a tilted specimen stage. as data collection at a tilted stage leads to a defocus gradient along the image path, per-particle ctf refinement was performed after particle extraction and before reference-free d classification, using gctf. subsequent processing steps were performed as described for the data collected on an untilted specimen stage. the average resolutions were estimated at . Å (untilted) and . Å ( ° tilt), using the gold standard fsc (threshold . ) after postprocessing. local resolution estimates were calculated using resmap. dfsc curves were calculated using the remote dfsc processing server ( ). the pb volume generated from the data collected on the tilted stage was used for downstream model building and model refinement. the penton base monomer chain from the hadv-f ph= . model was initially fitted into the pb volume using namdinator ( ) and outlying residues were pruned in coot. subsequently, the model was fully built and refined using iterative cycles of real-space refinement in phenix and model building in coot. surface charges for hadv-c (pdb: cgv ( )), hadv-d (pdb: tx ( )) and hadv-f were calculated using the pdb pqr-apbs software package ( ) at ph= . and ph= . . for each direction of the unknown chain, a mer poly-alanine chain was manually placed into the respective density and initially real-space-refined using coot. a list of sequences was screened using a job pipeline that included a mutation step in coot and real-space refinement in phenix. custom bash scripts written for this purpose are available upon request. an extended description is given in supplementary methods .
figures of protein structures and electron densities were generated using chimerax. prospects of replication-deficient adenovirus based vaccine development against sars-cov- . vaccines (basel) systemic and mucosal immunity in mice elicited by a single immunization with human adenovirus type or vector-based vaccines carrying the spike protein of middle east respiratory syndrome coronavirus estimates of the global, regional, and national morbidity, mortality, and aetiologies of diarrhoea in countries: a systematic analysis for the global burden of disease study use of quantitative molecular diagnostic methods to identify causes of diarrhoea in children: a reanalysis of the gems case-control study quantifying risks and interventions that have affected the burden of diarrhoea among children younger than years: an analysis of the global burden of disease study human adenoviruses: from villains to vectors structure of human adenovirus latest insights on adenovirus structure and assembly. viruses atomic structure of human adenovirus by cryo-em reveals interactions among protein networks crystal structure of human adenovirus at . a resolution image reconstruction reveals the complex molecular organization of adenovirus the structure of the human adenovirus penton a triple beta-spiral in the adenovirus fibre shaft reveals a new structural motif for a fibrous protein crystal structure of the receptor-binding domain of adenovirus type fiber protein at . 
a resolution three-dimensional structure of the adenovirus major coat protein hexon adenovirus composition, proteolysis, and disassembly studied by indepth qualitative and quantitative proteomics revised crystal structure of human adenovirus reveals the limits on protein ix quasi-equivalence and on analyzing large macromolecular complexes atomic structures of minor proteins vi and vii in human adenovirus cryo-em structure of human adenovirus d reveals the conservation of structural organization among human adenoviruses crystal structure of enteric adenovirus serotype short fiber head adenovirus type virions contain two distinct fibers human adenovirus type contains two fibers integrins alpha v beta and alpha v beta promote adenovirus internalization but not virus attachment phylogenetic analysis and structural predictions of human adenovirus penton proteins as a basis for tissue-specific adenovirus vector design enteric species f human adenoviruses use laminin-binding integrins as co-receptors for infection of ht- cells adenovirus type lacks an rgd alpha(v)-integrin binding motif on the penton base and undergoes delayed uptake in a cells block in entry of enteric adenovirus type in hek cells diurnal variation in intragastric ph in children with and without peptic ulcers universal sample preparation method for proteome analysis structural and phylogenetic analysis of adenovirus hexons by use of high-resolution x-ray crystallographic, molecular modeling, and sequencebased methods a quasi-atomic model of human adenovirus type capsid adenoviral vector with shield and adapter increases tumor specificity and escapes liver and immune control the role of hexon protein as a molecular mold in patterning the protein ix organization in human adenoviruses localized reconstruction of subunits from electron cryomicroscopy images of macromolecular complexes lactoferrin-hexon interactions mediate car-independent adenovirus infection of human respiratory cells latent species c 
adenoviruses in human tonsil tissues adenoviruses use lactoferrin as a bridge for car-independent binding to and infection of epithelial cells cryo-em structures of two bovine adenovirus type intermediates. virology - three-dimensional structure of canine adenovirus serotype capsid structural basis for human norovirus capsid binding to bile acids intestinal microbiota promote enteric virus replication and systemic pathogenesis successful transmission of a retrovirus depends on the commensal microbiota unique physicochemical properties of human enteric ad responsible for its survival and replication in the gastrointestinal tract isolation and characterization of the dna and protein binding activities of adenovirus core protein v interactions among the three adenovirus core proteins vitrification after multiple rounds of sample application and blotting improves particle density on cryo-electron microscopy grids new tools for automated high-resolution cryo-em structure determination in relion- estimation of high-order aberrations and anisotropic magnification from cryo-em data sets in relion- . 
electron counting and beam-induced motion correction enable near-atomicresolution single-particle cryo-em gctf: real-time ctf determination and correction procedures for three-dimensional reconstruction of spherical viruses by fourier synthesis from electron micrographs quantifying the local resolution of cryo-em density maps swiss-model: homology modelling of protein structures and complexes meeting modern challenges in visualization and analysis macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix features and development of coot scipion: a software framework toward integration, reproducibility and validation in d electron microscopy addressing preferred specimen orientation in single-particle cryo-em through tilting namdinator -automatic molecular dynamics flexible fitting of structural models into cryo-em and crystallography experimental maps sphire-cryolo is a fast and accurate fully automated particle picker for cryo-em cryosparc: algorithms for rapid unsupervised cryo-em structure determination improvements to the apbs biomolecular solvation software suite inference of macromolecular assemblies from crystalline state the sequence manipulation suite: javascript programs for analyzing and formatting protein and dna sequences conformational change of the adenovirus dna-binding protein induced by soaking crystals with k uo f solutions competing interests: there are no competing interests. data and materials availability: the scripts used for the bioinformatics analysis are available upon request. coordinates reported in this study have been deposited with the protein data bank with accession codes xxxx (hadv-f asymmetric unit) and xxxx (hadvf- (free) penton base). electron microscopy maps and half-maps have been deposited in the electron microscopy data bank with the accession codes emd-yyyyy (hadv-f ph= . ), emd-yyyyy (hadv-f ph= . 
), emd-yyyyy (hadv-f (free) penton base), emd-yyyyy (localized asymmetric reconstruction of the hadv-f penton base), emd-yyyyy (localized asymmetric reconstruction of the hadv-f hvr -containing loop). ( )) and hadv-d (emd- ( )) electron densities located at the same position in their respective asymmetric units. key: cord- - itisiup authors: kwesi-maliepaard, eliza mari; aslam, muhammad assad; alemdehy, mir farshid; van den brand, teun; mclean, chelsea; vlaming, hanneke; van welsem, tibor; korthout, tessy; lancini, cesare; hendriks, sjoerd; ahrends, tomasz; van dinther, dieke; den haan, joke m.m.; borst, jannie; de wit, elzo; van leeuwen, fred; jacobs, heinz title: the histone methyltransferase dot l prevents antigen-independent differentiation and safeguards epigenetic identity of cd + t cells date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: itisiup cytotoxic t-cell differentiation is guided by epigenome adaptations but how epigenetic mechanisms control lymphocyte development has not been well defined. here we show that the histone methyltransferase dot l, which marks the nucleosome core on active genes, safeguards normal differentiation of cd + t cells. t-cell specific ablation of dot l resulted in loss of naïve cd + t cells and premature differentiation towards a memory-like state, independent of antigen exposure and in a cell-intrinsic manner. without dot l, the memory-like cd + cells fail to acquire full effector functions in vitro and in vivo. mechanistically, dot l controlled t-cell differentiation and function by ensuring normal t-cell receptor density and signaling, and by maintaining epigenetic identity, in part by indirectly supporting the repression of developmentally-regulated genes. through our study dot l is emerging as a central player in physiology of cd + t cells, acting as a barrier to prevent premature differentiation and supporting the licensing of the full effector potential of cytotoxic t cells. 
lymphocyte development and differentiation is tightly regulated and provides the basis for a functional adaptive immune system. development of mature t cells initiates in the thymus with progenitor t cells that have to pass two key checkpoints: t cell receptor (tcr) β selection and positive selection, both of which are controlled by intricate signaling pathways involving the pre-tcr/cd and αβtcr/cd complexes, respectively . upon positive selection, mature thymocytes are licensed to emigrate and populate peripheral lymphatic organs as naïve t cells (t n ). further differentiation of naïve t cells into effector or memory t cells normally depends on tcr-mediated antigen recognition and stimulation. however, it has become evident that a substantial fraction of mature cd + t cells acquires memory-like features independent of exposure to foreign antigen. the origin and functionality of these unconventional memory cells in mice and humans, also referred to as innate or virtual memory cells, are only just being uncovered [ ] [ ] [ ] . the dynamic changes during development and differentiation of cd + t cells are governed by transcriptional and epigenetic changes, including histone modifications that are controlled by chromatin modifiers. well-established histone marks are mono-and tri-methylation of histone h k at enhancers (h k me ) and promoters (h k me ), h k me at repressed promoters, and h k me / in heterochromatin [ ] [ ] [ ] [ ] [ ] [ ] . although epigenetic 'programming' is known to play a key role in t cell development and differentiation, the causal role of epigenetic modulators in t cell differentiation is still poorly understood, especially for chromatin modifiers associated with active chromatin . one of the histone modifications positively associated with gene activity is mono-, di-and trimethylation of histone h k mediated by dot l. 
this evolutionarily conserved histone methyltransferase methylates h k in transcribed promoter-proximal regions of active genes , . although the association with gene activity is strong, how h k methylation affects transcription is still unclear and repressive functions have also been proposed [ ] [ ] [ ] [ ] [ ] [ ] . dot l has been linked to several critical cellular functions, including embryonic development, dna damage response, and meiotic checkpoint control , and dot l has also been shown to function as a barrier for cellular reprogramming in generating induced pluripotent stem cells . dot l gained wide attention as a specific drug target in the treatment of mll-rearranged leukemia, where mll fusion proteins aberrantly recruit dot l to mll target genes leading to their enhanced expression [ ] [ ] [ ] . a similar dependency on dot l activity and sensitivity to dot l inhibitors was recently observed in thymic lymphoma . interestingly, inhibition of dot l activity in human t cells attenuates graft-versus-host disease in adoptive cell transfer models and it regulates cd + t cell differentiation . given the emerging role of dot l in epigenetic reprogramming and t-cell malignancies, we investigated the role of dot l in normal t cell physiology using a mouse model in which dot l was selectively deleted in the t cell lineage. our results suggest a model in which dot l plays a central role in cd + t cell differentiation, acting as a barrier to prevent premature antigen-independent differentiation, and promoting the licensing of the full effector potential by maintaining epigenetic integrity. 
given the essential role of dot l in embryonic development , we determined the role of dot l in t cell development and differentiation by employing a conditional knock-out mouse model in which dot l is deleted in the t-cell lineage by combining floxed dot l with a cre-recombinase under the control of the lck promoter, which leads to deletion of exon of dot l during early thymocyte development (sup fig. a) . the observed global loss of h k me in lck-cre +/-;dot l fl/fl mice, as confirmed by immunohistochemistry on fixed thymus tissue (sup fig. b) , agreed with the notion that dot l is the sole methyltransferase for h k , , , . to validate the efficacy of dot l deletion at the single-cell level, we developed an intracellular staining protocol for h k me . histone dilution by replication-dependent and -independent means has been suggested to be the main mechanisms of losing methylated h k , [ ] [ ] [ ] . flow-cytometric analyses of thymocyte subsets from lck-cre +/-;dot l fl/fl mice (hereafter, ko) revealed that double-negative (dn, cd -cd -) thymocytes started losing h k me . from the subsequent immature single positive state (isp) onward all the thymocytes had lost dot l mediated h k me (sup fig. c ). this confirmed that upon early deletion of dot l, successive rounds of replication in the thymus allowed for loss of methylated h k . no changes in h k methylation levels were found in t-lineage cells of lck-cre +/-;dot l wt/wt control mice (hereafter, wt). ablation of dot l did not significantly affect the overall thymic cellularity, however it led to a reduction in the number of mature single positive (sp) cd + and cd + thymocytes (both . -fold) while the isp and cd + cd + double positive (dp) subsets were not significantly affected, suggesting a role of dot l in controlling intrathymic t cell selection (sup fig. d-e) . in the spleen, overall cellularity was also not affected but within the t cell compartment, cd + t cells were drastically reduced ( . 
-fold), whereas cd + t cells were increased ( . -fold) in number (sup fig. e -f). however, while flow cytometry of h k me -stained splenic t cells confirmed the lack of dot l activity in cd + t cells and cd -cd l + cd + t cells, cd + cd l -cd + cells showed a partial loss of h k me and cd + cd + regulatory t cells (treg) remained h k me positive (sup fig. g ). since earlier in development, cd -expressing cells in the thymus were mostly h k me negative, this suggests that a strong selection occurred for the maintenance of dot l for the development of tregs in this mouse model. indeed, partial deletion of dot l in cd + cells was confirmed by pcr analysis (sup fig. h ). here, we focused our study on defining the role of dot l in the cytotoxic t cell compartment in which efficient deletion of dot l and loss of h k me was found in both the thymus and the periphery. cd + t cell differentiation was strongly affected by the absence of dot l. analysis of cd + t cell subsets in the spleen revealed that dot l-ko mice showed a severe loss of naïve (cd -cd l + ) cd + t (t n ) cells which was linked to a massive gain of the cd + cd l + phenotype, a feature of central memory t cells (t cm ; fig. a -b). lck-cre +/-;dot l fl/wt heterozygous knock-out mice (het) did not show any phenotypic differences of cd + t cells (fig. a-b) . the lack of haplo-insufficiency was further confirmed by principal component analysis of rna-seq data indicating that wt and het cd + t cells clustered together but were separated from the ko cd + t cells, excluding gene-dosage effects (sup fig. i ). therefore, we restricted our further studies to the comparison of the ko and wt mice. the strong shift towards a cd + memory-phenotype in dot l ko was unexpected because dot l-ko mice were housed under the same conditions as their wt controls and the mice had not been specifically immunologically challenged. 
to unravel the molecular identity of the cd + cd l + dot l-ko cells in more detail, we performed rna-seq analysis on sorted cd -cd l + (t n ) and cd + cd l + (t cm ) cd + t cell subsets from wt and ko mice. based on differential gene expression between t n and t cm cd + cells from wt mice, naïve and memory gene signatures were defined. interestingly, overlay of these signatures on wt cd -cd l + (t n ) and ko cd + cd l + (t cm ) cells showed that differential expression between t n and t cm cells was mostly conserved when dot l was ablated (fig. c), although there was misregulation of other genes as well (see below). these data suggest that in the absence of dot l, cd + t cells cannot retain the naïve identity but rather acquire, prematurely and in the absence of any overt immunological challenge, a transcriptome of memory-like cd + t cells. although wt and ko mice were exposed to the same environment, it cannot be excluded that ko mice responded differentially to antigens in the environment. if this were the case, one would expect skewing in the clonality of the tcrβ gene usage. to investigate this possibility, we examined the tcrβ repertoire. tcrb-sequencing revealed no difference in productive clonality scores between wt and ko cd + t cells (sup fig. a). also, cdr length as well as tcrb-v and tcrb-j gene usage were unaffected (sup fig. b-d). these data, together with the nearly complete loss of naïve cd + t cells, argued against any antigen-mediated bias in the selection for cd + t cells and indicated that cd + cd l + memory-like cd + t cells in ko mice were polyclonal and arose by antigen-independent differentiation of t n cells. antigen-independently differentiated memory-like cd + t cells have already been described in the literature and their origins and functions are the subject of ongoing studies [ ] [ ] [ ]. depending on their origin and cytokine dependency they are referred to as 'virtual' or 'innate' memory cells.
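to make the notion of a clonality score concrete, the sketch below uses one commonly used definition, namely one minus the normalized shannon entropy of the clonotype frequency distribution. whether this is the exact formula behind the productive clonality scores reported above is an assumption; the function name and the example counts are hypothetical.

```python
import math

def clonality(counts):
    """Clonality as 1 - normalized Shannon entropy of clonotype counts.

    Returns 0.0 for a perfectly even (polyclonal) repertoire and
    approaches 1.0 when a single clonotype dominates.
    """
    positive = [c for c in counts if c > 0]
    if not positive:
        raise ValueError("at least one clonotype with a positive count is required")
    if len(positive) == 1:
        return 1.0  # a single clonotype is fully clonal by convention
    total = float(sum(positive))
    entropy = -sum((c / total) * math.log(c / total) for c in positive)
    return 1.0 - entropy / math.log(len(positive))

# Hypothetical template counts per productive clonotype.
even_repertoire = [25, 25, 25, 25]
skewed_repertoire = [97, 1, 1, 1]
even_score = clonality(even_repertoire)
skewed_score = clonality(skewed_repertoire)
```

under this definition, an antigen-driven expansion of a few clones would raise the score, whereas similar scores in two genotypes are consistent with comparably diverse repertoires.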
virtual memory cd + t cells have been suggested to arise in the periphery from cells that are cd high , related to high tcr affinity, and require il- , . in contrast, innate memory cd + t cells develop in the thymus and their generation and survival are generally considered to be dependent on il- signaling . we here collectively refer to them as antigen-independent memory-like cd + t cells (t aim ). a common feature of these t aim cells is reduced expression of cd d , a marker that is normally upregulated after antigen exposure. in addition, they express high levels of t-bet and eomes, encoding two memory/effector transcription factors , . in both dot l-ko and wt, the majority of the cd + cd l + (t cm ) cells were cd d negative. as a control, cd + cd l - effector t cells (t eff ) from wt mice challenged with listeria monocytogenes were mostly cd d positive (fig. d). this further indicates that the generation of cd + cd l + memory-like t cells in ko mice is independent of antigen exposure. of note, the percentage of cd d-negative cd + cd l + cells that we observed in wt mice corresponds to the percentage of t aim cells reported in wt b mice . regarding the expression of t-bet and eomes, most of the ko cd + t cells co-expressed both factors. furthermore, eomes was expressed at a higher level in ko cd + t cells than in their wt counterparts (fig. e-f). together, these characteristics are all in agreement with antigen-independent differentiation of naïve cd + t cells in the absence of dot l. to determine whether peripheral t aim cells observed in the dot l-ko setting originate intrathymically, as reported previously for il- dependent innate memory t cells , we compared rna-seq data from sp cd + thymocytes from ko and wt mice. analyzing the relative distribution of memory and naïve signature genes revealed that memory genes were among the genes upregulated in ko sp cd + thymocytes (fig. a).
importantly, like in peripheral cd + t cells the expression of t-bet and eomes was highly upregulated in ko sp cd + thymocytes. this transcriptional upregulation was corroborated by flow cytometric analysis of protein expression. intracellular staining for the transcription factors showed that a small but substantial subset of the sp cd + ko thymocytes expressed t-bet and eomes at the protein level (fig. b-c) . together with the unperturbed tcrβ repertoire, this further supports the notion that differentiation of dot l-ko cd + t cells towards memory-like cells initiates intrathymically in an antigen-independent manner. innate memory cells have been suggested to arise in the thymus in response to an increase in il- producing plzf high invariant nkt (inkt) cells or γδ t cells . however, inkt cells were nearly absent in the thymus of dot l-ko mice ( fig. d-e) . furthermore, the number of γδ t cells did not differ significantly between wt and ko mice (fig. f) . previous studies on innate memory t cells from different mouse models demonstrated that introduction of a transgenic tcr, inhibiting the generation of inkt and γδ t cells, prevents the development of innate memory cells , . however, introduction of the transgenic ot-i tcr, a condition under which the number of inkt and γδ t cells is strongly reduced , , did not affect the memory-phenotype of dot l-ko cd + t cells (fig. g) . together, these findings indicate that the intrathymic differentiation of t aim cd + cells in the absence of dot l did not depend on an excess of il -producing cells in the thymic microenvironment as reported for innate memory t cells. rather the formation of t aim cells in dot l-ko mice likely relates to a cell-intrinsic mechanism. one of the cell-intrinsic mechanisms reported to be involved in the formation of t aim cells is aberrant tcr signaling , , , . furthermore, treatment of human t cells with dot l inhibitor impaired tcr sensitivity and attenuated low avidity t cell responses . 
this led us to investigate the expression of genes encoding tcr signaling components in the absence of dot l. rna-seq analyses confirmed that many tcr signaling genes were differentially expressed between wt and ko sp cd + thymocytes (fig. a). importantly, cd ζ (cd ), a critical rate-limiting factor in controlling the transport of fully assembled tcr/cd complexes to the cell surface, was downregulated in dot l-ko t cells , [ ] [ ] [ ] [ ] . in addition, other components of the tcr/cd complex like cd e and its associated co-receptor cd a/b were also downregulated in ko t cells. h k me chip-seq showed that these genes had h k me in wt mice and might therefore be directly regulated by dot l (sup fig. a-c). as a consequence, one expects tcr/cd and cd αβ to be reduced at the cell surface of t aim cells, which we confirmed by flow cytometry (fig. b). in addition to the cd /tcr complex, we observed downregulation of itk, a key tcr signaling molecule reported to be involved in innate memory cd + t cell formation . to exclude that the impaired tcr signaling in ko t cells could be compensated for by the selection of thymocytes expressing tcrs with altered affinity, we kept the tcr affinity identical by crossing the ot-i tcr transgene into our system. thymocytes expressing ot-i are positively selected in the presence of mhc class i (h- k b ), mainly generating cd + sp t cells expressing the exogenous ot-i tcr, with concomitant reduction of the cd + lineage . if dot l deficiency impairs tcr surface density and signaling, positive selection of conventional ot-i cd thymocytes is expected to be compromised. in dot l-ko mice expressing ot-i, the number of sp cd + thymocytes was decreased ( . -fold) compared to wt mice expressing ot-i (fig. c). this revealed that, as with the endogenous tcr, early intrathymic ablation of dot l in the t cell lineage prohibits positive selection of conventional ot-i cd + t cells, yet supports the generation and selection of t aim cells (fig. 
g ) of which the vast majority expressed both exogenous tcr chains (tcrαv and tcrβv ) (fig. d) . consistent with the low surface expression of tcr/cd and cd in the ko condition, the surface expression of ot-i tcr was also lower, as determined by siinfekl/h- k b tetramer staining (sup fig. d ). this was further validated by staining with antibodies for the transgenic tcr chains tcrαv and tcrβv which indicated two-fold reduction in ko ( fig. b and d) . the tcrαv element of the ot-i tcr was under the control of an exogenous promoter, suggesting that the reduced surface expression of ot-i tcr on ko t cells did not relate to transcriptional silencing of native tcr gene promoters. instead, the observed lower tcr surface levels in ko t cells likely relate to the reduced cd ζ expression . in conclusion, dot l regulates the levels of tcr complex and signaling molecules independent of the selected tcr. in the absence of these regulatory mechanisms, the identity of naïve cd + t cells cannot be maintained, contributing to premature t cell differentiation. a hallmark of memory t cells is that they proliferate faster and produce ifnγ rapidly upon stimulation compared to naïve t cells . likewise, innate and virtual memory t cells rapidly produce ifnγ upon tcr stimulation , . to determine the proliferative potential of dot l-ko t aim cells, we activated b-cell depleted, cfse-labeled splenocytes in vitro with anti-cd and anti-cd antibodies. proliferation was determined by cfse dilution. dot l-ko cd + t cells proliferated faster than wt cells (fig. a ). in addition, they rapidly became activated cd + cd l -(sup fig. a ). however, in contrast to wt, ko cd + t cells failed to produce ifnγ ( fig. b and sup fig. b) , and expressed lower levels of cd , a marker of activation (sup fig. c ). this indicates a functional impairment of dot l-ko t cells. to test their intrinsic competence to produce ifnγ, dot l-ko t cells were stimulated in a tcr-independent manner with pma and ionomycin. 
intracellular staining for ifnγ revealed that the percentage of stimulated cd + t cells producing ifnγ was . -fold higher in ko as compared to wt (sup fig. d ). this suggests that dot l-ko cd + t cells only partially respond to tcr stimulation, but that they have the intrinsic capacity to produce ifnγ. the responders in the wt population likely include the naturally existing virtual memory population ( fig. ). taken together, dot l ko t aim cells displayed some memory-associated features but simultaneously are functionally impaired. to determine the in vivo immune responsiveness of dot l-ko t cells, mice were challenged with a sublethal dose of listeria monocytogenes. the immune response was monitored in the blood at day - , and post injection (fig. c ). already at day most cd + t cells in the blood had differentiated towards an effector-phenotype (cd + cd l -) in ko mice (fig. d) , which is in line with the in vitro stimulation data. however, in accordance with the peak of a normal immune response initiated from naïve t cells , the peak of activated effectors in wt mice was reached only after seven days. at day and day post injection, mice were sacrificed and spleen and liver were used to determine clearance of the listeria by counting colony-forming units. despite the fact that in ko at day most t aim cells had differentiated into activated t cells, the listeria infection was not cleared. at day complete clearance was observed in the spleen from wt-mice, which is in accordance with literature ; however, the dot l ko mice failed to clear listeria (fig. e-f ). in order to investigate the reason for the compromised clearance in ko, despite the rapid t eff formation in the blood upon infection, we examined the splenic cd + t cells subsets in more detail. in wt mice the peak of activated (cd + cd l -) cd + t cells was at day , whereas in dot l ko mice this was at day ( fig. g -h and sup fig. e ). 
however, the absolute numbers of total cd + t cells in ko on day were not higher than in wt. furthermore, in wt the number of cd + t cells at day had increased . -fold, whereas in ko this was only . -fold, suggesting a lack of expansion despite the acquisition of an effector phenotype (fig. i) . indeed, staining for ki , a marker of proliferation showed that at day , the frequency of ki + cd + cd l -cd + t cells was not increased in ko cells compared to wt (fig. j) . besides the reduced proliferative capacity in vivo, even up to day , the ko cd + cd l -cd + t cells showed impaired induction of the effector hallmarks klrg , ifnγ and cd d ( fig. k -l, sup fig. b ). this demonstrates that in normal cd + t cells dot l-mediated h k methylation generally marks expressed genes, but only a subset of the methylated genes needs h k methylation for maintaining full expression levels. thus, in normal t cells dot l does not act as a transcriptional switch but rather seems to be required for transcription maintenance of a subset of methylated genes that are already on and it indirectly promotes repression of genes. amongst the differentially expressed genes marked with h k me , we searched for candidate direct target genes that could mediate the role of dot l in terminal t cell differentiation. we were especially interested in factors that could explain the prominent gene derepression in dot l-ko cells. to this end we selected genes that were significantly downregulated in ko and had harbored h k me at the ' end of the gene, both in t n and t cm . we further narrowed the list down to genes that were annotated as "negative regulator of transcription by rna polymerase ii", resulting in transcriptional regulators. among those, ezh emerged as a potentially relevant target of dot l that could explain part of the derepression of genes in dot l-ko cells (table ). 
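the candidate selection described above amounts to intersecting several gene lists: genes downregulated in ko, genes carrying the histone mark in both t n and t cm cells, and genes with the relevant gene ontology annotation. the sketch below illustrates this logic with plain python sets. the function name and all gene identifiers are hypothetical placeholders, and the statistical cut-offs that define each input list are assumed to have been applied upstream.

```python
def select_candidate_targets(downregulated_in_ko, marked_in_tn,
                             marked_in_tcm, negative_regulators):
    """Return genes that satisfy all four selection criteria.

    Each argument is an iterable of gene identifiers:
      downregulated_in_ko  - significantly downregulated in KO vs WT
      marked_in_tn         - carrying the histone mark in naive cells
      marked_in_tcm        - carrying the histone mark in memory cells
      negative_regulators  - annotated as negative transcriptional regulators
    """
    candidates = (set(downregulated_in_ko)
                  & set(marked_in_tn)
                  & set(marked_in_tcm)
                  & set(negative_regulators))
    return sorted(candidates)

# Hypothetical placeholder gene lists for illustration only.
example_candidates = select_candidate_targets(
    downregulated_in_ko=["gene_a", "gene_b", "gene_c", "gene_d"],
    marked_in_tn=["gene_a", "gene_b", "gene_d"],
    marked_in_tcm=["gene_a", "gene_c", "gene_d"],
    negative_regulators=["gene_d", "gene_a", "gene_e"],
)
```

because set intersection is order-independent, the filters can be applied in any sequence; sorting the result only makes the output reproducible.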
ezh is part of the polycomb-repressive complex (prc ), which deposits h k me - , a mark involved in repression of developmentally-regulated genes and in switching off naïve and memory genes during terminal differentiation of effector cd + t cells , . although the change in ezh mrna expression was modest between wt and ko (sup fig. c ), the ezh gene was h k me methylated (sup fig. d) , and dot l-ko mice share several t-cell phenotypes with ezh -ko mice , , although this is dependent on the mouse model used [ ] [ ] [ ] . to further investigate the idea that misregulation of prc targets could be one of the downstream consequences of loss of dot l in cd + t cells, we compared the gene expression changes in an ezh -ko model with those seen in dot l-ko cd + t cells. this revealed substantial overlap between the derepressed genes in the two models (fig. b) , suggesting a functional connection between two seemingly opposing epigenetic pathways. furthermore, we determined h k me scores based on previous chip-seq studies and compared them with the gene expression in wt and dot l-ko sp cd + thymocytes and peripheral cd + t cells. this analysis showed that the genes that were upregulated in dot l-ko and lack h k me in wt were strongly enriched for h k me in wt memory precursor cd + t cells (fig. c) . as a control, expression matched non-differentially expressed genes were not enriched for h k me (fig. c) . taken together, these findings suggest that one of the consequences of loss of dot l-mediated h k me is derepression of a subset of prc targets that are normally actively repressed. this, together with the other transcriptional changes likely contributes to the perturbation of the epigenetic identity of cd + t aim cells, thereby compromising their ability to terminally differentiate in vivo. the histone methyltransferase dot l has emerged as a druggable target in mll-rearranged leukemia and additional roles in cancer have been suggested [ ] [ ] [ ] [ ] [ ] [ ] [ ] . 
this in combination with the availability of highly-specific dot l inhibitors make dot l a potential target for cancer therapy. however, the role of in cd + t cells, loss of dot l resulted in a massive gain of cd + cd l + memory-like cells associated with a simultaneous loss of the naïve compartment. these cells start to acquire memory features intrathymically, express a diverse un-skewed tcrβ repertoire, and lack expression of cd d. taken together, this suggests that these memory-like cd + t cells arose independently of foreign antigens, leading us to designate these cells as antigen-independent memory-like t aim cd + cells. importantly, such unconventional memory-phenotype cells constitute a substantial ( - %) fraction of the peripheral cd + t cell compartment in wt mice and humans , , , , . the fraction of unconventional memory-phenotype t cells further increases with age , . the biological role of antigen-inexperienced memory cd + t cells is still not completely understood, but these cells have been observed to respond more rapidly to tcr activation than t n cells and they have been suggested to provide by-stander protection against infection in an antigen-independent way , . however, in older mice, virtual memory cells have been reported to lose their proliferative potential and acquire characteristics of senescence . the mechanism of unconventional memory-phenotype differentiation is also poorly understood . in some mouse models cd + t cells differentiate upon excess production of il- by inkt or γδ t cells , , [ ] [ ] [ ] [ ] . in contrast, other mouse models report that antigen inexperienced memory cd + t cell differentiation is cell intrinsic , . here we show that dot l plays an important role in t aim cd + cell differentiation by cell-intrinsic mechanisms. it has been established in several independent mouse models that the quality of tcr signaling closely relates to the formation of t aim cells , , . 
our results show that loss of dot l leads to reduced surface expression of the cd /tcr complex and co-receptors. this phenotype is likely related to the reduced expression of cd ζ (cd ), a target of dot l and a rate limiting molecule for assembly and transport of tcr/cd complexes to the cell surface , [ ] [ ] [ ] [ ] . the failure to upregulate the tcr/cd complex upon positive selection likely prohibits differentiation of conventional t n and supports formation of t aim cells. in addition, dot l ablation perturbed expression of tcr signaling genes, including itk (il -inducible t cell kinase; a member of the tec kinase family). disruption of itk signaling has also been reported to lead to antigen-independent t-cell differentiation . together, this suggests that one of the key functions of dot l is to ensure adequate tcr surface expression and signaling to maintain naivety and prevent t aim cell differentiation. the discovery of dot l as a key player in preventing premature antigen-independent differentiation towards memory-type cells warrants further investigation and will aid in further uncovering the origin and regulation of this emerging and intriguing subset of the immune system. functionally, dot l-ko cd + t aim cells partially displayed features of memory cells in vitro but in vivo showed an impaired immune response against listeria monocytogenes. the rapid differentiation towards effector cells suggests that dot l-ko t aim cells might execute a partial by-stander response, but dot l appears to be required to accomplish full effector function. interestingly, similar features have been observed for virtual memory cells . the reduced tcr density and affected tcr signaling network in dot l-ko t cells likely further contribute to the impaired immune response. how does dot l affect t cell differentiation at the chromatin level? 
inspection of the transcriptome and epigenome provided evidence that dot l methylates transcriptionally active genes in t cells and positively affects gene expression. however, only a subset of the targets required dot l for maintenance of normal expression levels, which is in agreement with previous observations , . why some genes depend on dot l/h k methylation and others not is not known yet although a recent study indicates that in mll-rearranged leukemic cell lines some genes harbor a ' enhancer located in the h k me / marked genic region, which can make them more sensitive to loss of dot l . one of the genes that was h k methylated and dependent on dot l in normal peripheral cd + t cells was ezh and ezh /prc targets were derepressed in the absence of dot l. importantly, analysis of data from kagoya et al. showed that ezh expression is also reduced in human t cells in which dot l was inactivated not by deletion but by treatment with a dot l inhibitor (sup fig. e ). this suggests that the epigenetic crosstalk that we uncovered in mice is evolutionarily conserved. interestingly, recruitment of dot l to nucleosomes by its interaction partner af has been shown to be negatively affected by methylation of h k . therefore, the crosstalk between dot l and prc might involve mutual interactions. restricting dot l to nucleosomes unmethylated at h k might be one of the mechanisms by which h k me and h k me are directed to non-overlapping sites, as we also observed in t cells (fig. c) . derepression of some of the targets of prc is just one of the consequences of loss of dot l. besides ezh , dot l affects the expression of other genes, including several additional candidate transcription regulators ( table ). in the future, it will be important to determine other mechanisms by which dot l affects the cd + t cell transcriptome to fully understand its central role in cd + t cell biology. 
finally, we cannot exclude that dot l has additional methylation targets besides h k that contribute to the role of dot l in safeguarding t cell differentiation and effector functions. although understanding the mechanisms in more detail will require further studies, the role of dot l in preventing premature differentiation and safeguarding the epigenetic identity is conserved in other lymphocyte subsets. in an accompanying study (aslam et al.), we observed in an independent mouse model that loss of dot l in b cells also led to premature differentiation, perturbed repression of prc targets, and a compromised humoral immune response. therefore, dot l is emerging as a central epigenetic regulator of lymphocyte differentiation and functionality. in conclusion, we identify h k methylation by dot l as an activating epigenetic mark critical for cd + t cell differentiation and maintenance of epigenetic identity. further investigation into the central role of the druggable epigenetic writer dot l in lymphocytes is likely to provide novel strategies for immune modulation and disease intervention. lck-cre;dot l fl/fl mice have been described previously (table ). of note, for ot-i tetramer stains the cd antibody clone - . was used. for intracellular staining, cells were fixed and permeabilized using the transcription factor buffer kit (becton dickinson). antibodies for intracellular staining were diluted : in perm/wash buffer (table ). for h k me staining, cells were first stained with surface markers and fixed and permeabilized as rna-seq reads were mapped to mm (ensembl grcm ) using tophat with the arguments `--prefiltermultihits -no-coverage-search -bowtie -library-type fr-firststrand` using a transcriptome index. counts per gene were obtained using htseq-count with the options `-m union -s no` and ensembl grcm . gene models.
analysis was restricted to genes that have at least counts in at least samples, and at least counts in samples in specific contrasts, to exclude very low abundance genes. differential expression analysis was performed on only the relevant samples using deseq with default arguments and the design set to either dot lko status or cell type. adaptive effect size shrinkage was performed with the ashr r package to control for the lower information content in low abundance transcripts. genes were considered to be differentially expressed when the p-value of the negative binomial wald test was below . after the benjamini-hochberg multiple testing correction. sets of differentially expressed genes in indicated conditions were called 'gene signatures'. an exception was made for the ezh rna-seq data from he et al., wherein the dispersion was estimated with a local fit, using the `estimateDispersions` function in deseq with the argument `fitType = "local"` and an adjusted p-value cutoff of . , to increase the number of detected differential genes. principal component analysis was performed using the `prcomp` function on data from all samples that had been variance-stabilizing transformed with the `vst` function from the deseq package using default arguments. for analyses where we performed expression matching, we chose, without replacement, genes with an absolute log fold change less than . and false discovery rate corrected p-values above . that were closest in mean expression to each of the genes being matched. sorted cells were centrifuged at rcf. the pellet was resuspended in imdm containing % fcs and formaldehyde (sigma) was added to a final concentration of %. after min incubation at rt, glycine (final concentration mm) was added and incubated for min. cells were washed twice with ice-cold pbs containing complete, edta-free, protease inhibitor cocktail (pic) (roche). cross-linked cell pellets were stored at - °c.
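the expression-matching step described in the rna-seq analysis above can be illustrated with a small sketch. this is not the authors' code (their analysis was done in r); it is a minimal python illustration that assumes a greedy nearest-neighbour choice in the order the target genes are given. the function names, the tuple layout `(mean_expression, log_fold_change, adjusted_p)` and the threshold parameters `max_abs_lfc` and `min_padj` are hypothetical, and the actual cutoffs are not recoverable from the text.

```python
def eligible_controls(results, max_abs_lfc, min_padj):
    """Keep genes that look non-differential.

    results: dict gene -> (mean_expression, log_fold_change, adjusted_p)
    max_abs_lfc, min_padj: hypothetical cutoffs (the source values are elided).
    Returns dict gene -> mean_expression.
    """
    return {
        gene: mean
        for gene, (mean, lfc, padj) in results.items()
        if abs(lfc) < max_abs_lfc and padj > min_padj
    }


def match_expression(targets, candidates):
    """Greedy matching on mean expression, without replacement.

    targets: dict gene -> mean expression of genes to be matched
    candidates: dict gene -> mean expression of eligible control genes
    Returns dict target gene -> matched control gene.
    """
    available = dict(candidates)
    matches = {}
    for gene, expr in targets.items():
        if not available:
            break  # fewer candidates than targets: remaining targets stay unmatched
        # closest candidate in mean expression; gene name breaks ties deterministically
        best = min(available, key=lambda g: (abs(available[g] - expr), g))
        matches[gene] = best
        del available[best]  # "without replacement"
    return matches


if __name__ == "__main__":
    results = {
        "c1": (10.4, 0.05, 0.9),
        "c2": (50.0, 0.01, 0.8),
        "c3": (10.9, 2.00, 0.001),  # differential, so excluded as a control
    }
    controls = eligible_controls(results, max_abs_lfc=0.1, min_padj=0.5)
    print(match_expression({"t1": 10.0, "t2": 11.0}, controls))
```

a greedy pass depends on the order of the target genes; an optimal assignment would need a different algorithm, which is a design choice the source does not specify.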
pellets were resuspended in cold nuclei lysis buffer ( mm tris-hcl ph . , mm edta ph . , % sds) + pic and incubated for at least min. cells were sonicated with pico to an average length of - bp using s on/ s off for min. after centrifugation at high speed, debris was removed and x volume of chip dilution buffer ( mm tris-hcl ph , . m nacl, . % triton x- , . % sodium deoxycholate) + pic and x volume of ripa- ( mm tris-hcl ph , . m nacl, mm edta ph , . % sds, % triton x- , . % sodium deoxycholate) + pic were added. shearing efficiency was confirmed by reverse crosslinking the chromatin and checking the size on agarose gel. chromatin was pre-cleared by adding protein g dynabeads (life technologies) and rotating for hour at °c. after the beads were removed, μl h k me , μl h k me (nl , merck millipore) and μl h k me (ab , abcam) were added and incubated overnight at °c. protein g dynabeads were added to the ip and incubated for hours at °c. beads with bound immune complexes were subsequently washed with ripa- , times ripa- ( mm tris-hcl ph , . m nacl, mm edta ph , . % sds, % triton x- , . % sodium deoxycholate), times ripa-licl ( mm tris-hcl ph , mm edta ph , % nonidet p- , . % sodium deoxycholate, . m licl) and te. beads were resuspended in μl direct elution buffer ( mm tris-hcl ph , . m nacl, mm edta ph , . % sds) and incubated overnight at °c; input samples were included. supernatant was transferred to a new tube, and μl rnase a (sigma) and μl proteinase k (sigma) were added per sample and incubated at °c for hour. dna was purified using qiagen purification columns. library preparation was done using the kapa ltp library preparation kit following the manufacturer's protocol with slight modifications. briefly, after end-repair and a-tailing, adaptors were ligated, followed by solid phase reversible immobilization (spri) clean-up. libraries were amplified by pcr and fragments between - bp were selected using ampure xp beads (beckman coulter).
the libraries were analyzed for size and quantity of dnas on a bioanalyzer using a high sensitivity dna kit (agilent), diluted and pooled in multiplex sequencing pools. the libraries were sequenced as base single reads on a hiseq (illumina). chip-seq samples were mapped to mm (ensembl grcm ) using bwa-mem with the option '-m' and filtered for reads with a mapping quality higher than . duplicate reads were removed using markduplicates from the picard toolset with `validation_stringency=lenient` and `remove_duplicates=false` as argument. low quality reads with any flag bit set to were removed. bigwig tracks were generated by using bamcoverage from deeptools using the following arguments: `-of bigwig -binsize -normalizeusing rpgc -ignorefornormalization chrm -effectivegenomesize `. for visualization of heatmaps and genomic tracks, bigwig files were loaded into r using the `import.bw()` function from the rtracklayer r package. tsss for heatmaps were taken from ensembl grcm . gene models by taking the first base pair of the ' utr of transcripts. when such annotation was missing, the most ' position of the first exon was taken. genes were selected that are down in ko with adjusted p-value < . , with h k me normalized reads > and the go annotation "negative regulator of transcription by rna polymerase ii". between , and , , sorted cells were used for tcrβ-seq. dna was extracted using qiagen dna isolation kit (qiagen) and collected in μl te buffer. the concentration was determined with nanodrop. samples were sequenced and processed by adaptive biotechnologies using immunoseq mm tcrb service (survey). statistical analyses were performed using prism (graphpad). data are presented as means ±sd unless otherwise indicated in the figure legends. the unpaired student's t-test with two-tailed distributions was used for statistical analyses. a p-value < . was considered statistically significant. * p < . , ** p < . , *** p < . . 
we thank the nki animal pathology facility for histology and immunohistochemistry, as well as advice, the nki genomics core facility for library preparations and sequencing, the nki flow cytometry facility for assistance, the nki animal laboratory intervention unit for performing immunization experiments, and the caretakers of the nki animal laboratory facility for assistance and excellent animal care. the following reagent(s) was/were obtained through the nih tetramer core facility: cd d-pbs and cd d- percentage of cd cd l subsets in cd + t cells from spleen at day and h) day . i) absolute number of cd + cd + t cells in spleen at day and day . j) intracellular ki positive cells in cd + splenocytes. k) percentage of klrg + in cd + t cells from the spleen. l) percentage of ifnγ-producing cells in cd + t cells from spleen after four hours stimulation with pma/ionomycin. g-l) data is represented as mean +sd.
key: cord- - gbk t y authors: fettrow, tyler; reimann, hendrik; grenet, david; crenshaw, jeremy; higginson, jill; jeka, john title: walking cadence affects the recruitment of the medial-lateral balance mechanisms date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: gbk t y we have previously identified three balance mechanisms that young healthy adults use to maintain balance while walking. the three mechanisms are: ) the lateral ankle mechanism, an active modulation of ankle inversion/eversion in stance; ) the foot placement mechanism, an active shift of the swing foot placement; and ) the push-off mechanism, an active modulation of the ankle plantarflexion angle during double stance. here we seek to determine whether there are changes in neural control of balance when walking at different cadences and speeds. twenty-one healthy young adults walked on a self-paced treadmill while immersed in a d virtual reality cave, and periodically received balance perturbations (bipolar galvanic vestibular stimulation) eliciting a perceived fall to the side. subjects were instructed to match two cadences specified by a metronome, bpm (high) and bpm (low), which led to faster and slower gait speeds, respectively. the results indicate that subjects altered the use of the balance mechanisms at different cadences. the lateral ankle mechanism was used more in the low condition, while the foot placement mechanism was used more in the high condition. there was no difference in the use of the push-off mechanism between cadence conditions. these results suggest that neural control of balance is altered when gait characteristics such as cadence change, suggesting a flexible balance response that is sensitive to the constraints of the gait cycle.
we speculate that the use of the balance mechanisms may be a factor resulting in well-known characteristics of gait in populations with compromised balance control, such as slower gait speed in older adults or higher cadence in people with parkinson’s disease. there is an ongoing debate about whether or not walking slower is more stable (bruijn et al., ). we know certain patient populations reduce their gait speed and increase their cadence (buckley et al., ; duan-porter et al., ; himann et al., ; lauretani et al., ), but is the motivation to improve stability? the key factors that lead to the decrease in gait speed in older adults remain unresolved, but we speculate that the control of balance plays a larger role than previously recognized. our focus here is to assess changes in the control of balance at different gait speeds, and more specifically at different cadences. there is support for slower walking being more stable. though a firm definition is not established, stability has been quantified with different methods. roos and dingwell ( ) found that neuromuscular noise is diminished when walking slower. less noise generally results in better estimates of a physical quantity, and thus improved control. lyapunov exponents, a common tool used for quantifying overall stability, have been shown to be reduced at lower walking speeds (dingwell and marin, ; england and granata, ). others have pointed to neurological deficiencies associated with slower walking, without referring to stability directly. anson et al. ( ) have shown that vestibular function loss causes people to walk with longer, slower steps, and menz et al. ( ) have shown that people with peripheral neuropathy reduce their walking speed and cadence. hsiao et al. ( ) suggested that a lateral weight shift mechanism is impaired in chronic stroke, leading to reduced gait speed. another body of literature suggests fast walking is more stable.
variability, which is often attributed to greater instability, lessens with increased velocity (donker and beek, ). studies have also used the mediolateral margin of stability to quantify overall stability, which increases with speed (hof et al., ; gates et al., ). a larger margin of stability is viewed as safer and more stable. recent modeling results indicate that slowed gait can be explained entirely by diminished muscle strength (song and geyer, ). however, fan et al. ( ) have shown that older adults with slower gait can walk faster if instructed, suggesting that locomotor capacity (i.e., force-generating capability) is not the reason for slowed gait. this suggests a complex relationship between stability and slowed gait, as muscle strength is considered necessary but its role in the control of balance is unknown, with other factors such as multisensory integration playing an important role (peterka, ; oie et al., ). regardless of the walking speed, a common theme to the process of keeping a body upright is the relationship between the center of mass (com) and the center of pressure (cop). the behavior of the com can be explained by the relationship between cop and com, as in equation (winter, ): $$\ddot{x}_{\mathrm{com}} = \omega^{2}\,(x_{\mathrm{com}} - x_{\mathrm{cop}})$$ that is, the com acceleration is proportional to the distance between the cop and com, where $x_{\mathrm{com}}$ and $x_{\mathrm{cop}}$ are the horizontal positions of the com and cop, $\omega = \sqrt{g/l}$, l is the height of the com above ground and g the acceleration from gravity. this relationship holds true for both the anterior-posterior direction and the medial-lateral direction. applying this equation, a threat to balance would be an undesired movement of the com relative to the cop. to correct this undesired movement, a motor action has to be taken to accelerate the com in the opposite direction or, rather, to decelerate the com, thus stopping the movement. to prevent a fall, the com should not be able to accelerate to the point of no return in any particular direction. so the cop must be moved to accelerate the com in the opposite direction of its current travel.
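the relationship between com acceleration and the cop-com distance can be made concrete with a short numerical sketch. it assumes the linearized inverted-pendulum form, com acceleration = (g / l) × (com position − cop position), with a fixed cop and simple euler integration; the function names, the step size and the example values are illustrative assumptions, not taken from the study.

```python
G = 9.81  # gravitational acceleration in m/s^2


def com_acceleration(com, cop, height, g=G):
    """Horizontal CoM acceleration of a linearized inverted pendulum.

    com, cop: horizontal positions (m); height: CoM height above ground (m).
    The acceleration points away from the CoP and grows with the distance.
    """
    return (g / height) * (com - cop)


def simulate(com0, vel0, cop, height, dt=0.001, duration=0.3):
    """Integrate the CoM state forward in time with the CoP held fixed."""
    com, vel = com0, vel0
    for _ in range(int(round(duration / dt))):
        acc = com_acceleration(com, cop, height)
        vel += acc * dt
        com += vel * dt
    return com, vel


if __name__ == "__main__":
    # CoM drifting to the right (positive direction) at 0.1 m/s.
    # Shifting the CoP to the right of the CoM decelerates the drift;
    # shifting it to the left speeds the drift up.
    print("cop shifted right:", simulate(0.0, 0.1, cop=0.05, height=1.0))
    print("cop shifted left: ", simulate(0.0, 0.1, cop=-0.05, height=1.0))
```

in this toy model, moving the cop in the direction of the fall is the only way to slow the com, which is the reasoning behind the balance mechanisms discussed next.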
in a previous study with young healthy adults, we identified three different mechanisms of balance control in response to perceived falls (reimann et al., b). the perceived fall theoretically induces a perceived shift of the center of mass (com), requiring a motor action that shifts the com in the opposite direction. the following are distinct balance mechanisms, observed in previous experiments, that can shift the cop and com in the medial-lateral direction. foot placement is the most commonly reported and quantified method of balance control (kuo, ; bauby and kuo, ; vlutters et al., ). in the foot placement mechanism, the swing foot moves in the direction of the perceived fall and, on heel strike, shifts the net cop in the direction of the perceived fall. in general, the foot placement mechanism refers to the modulation of step width in a particular direction (i.e., the direction of the perceived fall). wang and srinivasan ( ) show that % of the foot placement can be explained by the position and velocity of the com. theoretically, the sensory input we provide to subjects alters the perceived state of the com, thus there should be a difference between the foot placement predicted from the com behavior and the actual foot placement. we refer to this measure as the model-corrected foot placement. the lateral ankle mechanism is another mechanism that can shift the cop (hof et al., ). it refers to the generation of ankle inversion/eversion torque during the single stance phase. activating musculature to roll the ankle while the foot is on the ground shifts the cop under the foot in the direction of the perceived fall. the goal is to accelerate the com away from the direction of the perceived fall, and shifting the cop in the direction of the perceived fall achieves this. we refer to the third balance mechanism as the push-off mechanism.
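the model-corrected foot placement described above can be sketched as the residual of a linear model. the sketch assumes, as an illustration only, that foot placement is regressed on com position and com velocity using unperturbed control steps, and that the correction is the actual placement minus the model prediction. the function names, the ordinary-least-squares fit and the example numbers are hypothetical and are not the authors' implementation.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_foot_placement(com_pos, com_vel, foot):
    """Least-squares fit of foot = b0 + b1 * com_pos + b2 * com_vel.

    All inputs are equal-length lists taken from control (unperturbed) steps.
    Returns [b0, b1, b2], obtained from the normal equations.
    """
    rows = [[1.0, p, v] for p, v in zip(com_pos, com_vel)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, foot)) for i in range(3)]
    return solve_linear(xtx, xty)


def model_corrected(coeffs, com_pos, com_vel, foot):
    """Actual foot placement minus the placement predicted from the CoM state."""
    b0, b1, b2 = coeffs
    return foot - (b0 + b1 * com_pos + b2 * com_vel)


if __name__ == "__main__":
    # Synthetic control steps that follow an exact linear rule.
    pos = [0.00, 0.01, 0.02, 0.03, 0.01]
    vel = [0.10, 0.05, 0.12, 0.02, 0.20]
    foot = [0.1 + 1.5 * p + 0.3 * v for p, v in zip(pos, vel)]
    coeffs = fit_foot_placement(pos, vel, foot)
    # A perturbed step placed 1 cm further out than the CoM state predicts.
    print(model_corrected(coeffs, 0.02, 0.10, 0.17))
```

a non-zero residual on a perturbed step is then the part of the foot placement that the actual com state does not explain.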
the push-off mechanism refers to the modulation of ankle plantarflexion angle during double stance. in response to a visually perceived fall to the side, we observed a direction-dependent modulation of the stance leg ankle plantar/dorsiflexion angle. very few studies have recognized the push-off mechanism's role in control of balance in the medial-lateral direction. ankle plantarflexion torque has been shown to change as a result of bipolar, binaural galvanic vestibular stimulation (iles et al., ). only recently has the push-off been verified to have a functional role in balance control in the medial-lateral direction, as kim and collins ( ) show that modulation of ankle torque based on com behavior can reduce metabolic expenditure. furthermore, klemetti et al. ( ) have provided evidence that ankle plantarflexion torque induces trunk roll accelerations, providing further support for the push-off acting in the medial-lateral direction. the need for multiple mechanisms to promote balance derives from the gait cycle, which demands different mechanisms due to the changing configuration of the body (i.e., alternation of double stance and single stance). importantly, these mechanisms have a temporal order. the earliest response to a balance disturbance (i.e., at heel strike) is the lateral ankle mechanism, followed by a change in foot placement and push-off (reimann et al., b). given that temporal order, we were interested in determining if gait characteristics can change how these balance mechanisms are used. specifically, are there changes in the use of the balance mechanisms at different cadences? modeling results suggest that more frequent steps lead to more opportunities to correct undesired com movements with the foot placement mechanism (reimann et al., a).
we hypothesized that due to the longer single stance in slower walking, the lateral ankle mechanism would play a larger role in balance, and in faster walking, the foot placement mechanism would provide the majority of balance control. twenty-one healthy young subjects participated. subjects walked on a split-belt, instrumented treadmill within a virtual environment projected onto a curved screen surrounding the treadmill, as shown in figure . the subjects performed two-minute trials. a metronome generated by custom labview software played throughout the trials at a frequency of bpm (low) or bpm (high), which alternated between two-minute trials. subjects were asked to match their cadence to the metronome. after a two-minute trial there was a second washout period of no metronome, followed by a second adaptation period for the new metronome frequency before introducing the balance perturbations. breaks were offered after every two-minute trials. during the two-minute trials, a target step number was randomized between - . the custom labview software counted heel strikes until the step counter matched the target step number. the leg of the heel strike that occurs on the target step number is referred to as the trigger leg. the labview software randomized between two conditions on a trigger. we identified heel strikes on-line with a manually set vertical threshold of the heel position. the vertical threshold was set to the heel marker position during quiet standing. however, in the low condition people tended to "drag" their feet, which would cause steps to be counted both midway through swing and at actual heel strike. to avoid double counting of steps, the experimenter increased the vertical threshold until double counting of steps did not occur. we recorded full-body kinematics using the plug-in gait marker set (davis et al., ), with six additional markers on the anterior thigh, anterior tibia, and th metatarsal of each foot.
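the threshold-based heel strike detection described above, and the double counting caused by foot drag, can be illustrated with a minimal python sketch. it is not the authors' labview implementation: the function name, the downward-crossing rule, and the heel heights and thresholds below are hypothetical and chosen only to show why raising the threshold removes the extra mid-swing count, at the cost of an earlier trigger.

```python
def detect_heel_strikes(heel_height, threshold):
    """Return sample indices where the heel marker crosses the threshold downward.

    Hypothetical helper: a strike is counted whenever the previous sample is
    above the threshold and the current sample is at or below it.
    """
    strikes = []
    for i in range(1, len(heel_height)):
        if heel_height[i - 1] > threshold >= heel_height[i]:
            strikes.append(i)
    return strikes


# Hypothetical vertical heel positions (arbitrary units) for one step with
# foot drag: the heel dips to 1.8 mid-swing, rebounds, then lands at 1.5.
heel = [2.0, 5.0, 10.0, 12.0, 6.0, 1.8, 4.0, 3.0, 1.5, 2.0, 2.0]

quiet_standing_threshold = 2.0  # assumed heel height during quiet standing
raised_threshold = 4.5          # assumed value after manual adjustment

double_counted = detect_heel_strikes(heel, quiet_standing_threshold)  # [5, 8]
single_count = detect_heel_strikes(heel, raised_threshold)            # [5]
```

with the raised threshold the single crossing occurs at sample 5, before the actual landing at sample 8, so the trigger fires earlier relative to true heel strike.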
an additional six markers were placed on the medial femoral epicondyles, medial malleolus, and the tip of the first distal phalanx of the foot for the static calibration pose at the beginning of the collection. we recorded electromyography signals from the tibialis anterior, peroneus longus, medial gastrocnemius, rectus femoris, biceps femoris, gluteus medius, tensor fascia latae, and erector spinae, bilaterally, with cometa's electromyography picoemg sensors (bareggio, italy) with hydrogel mm x mm covidien kendall electrodes, using seniam guidelines for placement (hermens et al., ). marker positions were recorded at hz using a qualisys miqus motion capture system (gothenburg, sweden) with cameras. electromyography was recorded at hz. ground reaction forces and moments were collected at hz. we used custom matlab scripts for data management and organization. force plate data were low-pass filtered with a th order butterworth zero-phase filter at a cut-off frequency of hz. emg data were bandpass filtered with a th order butterworth filter with cut-off frequencies of hz and hz, then rectified, then low-pass filtered with a th order butterworth filter at a cut-off frequency of hz. for each subject and emg channel, we calculated the average activation across all control strides and used this value to normalize emg before averaging across subjects. from the marker data, we calculated joint angle data and center of mass (com) positions based on a geometric model with segments (pelvis, torso, head, thighs, lower legs, feet, upper arms, forearms, hands) and degrees of freedom (dof) in opensim (john et al., ; seth et al., ) using an existing base model (zajac et al., ). we interpolated the data between heel strikes to time points, representing percentage from heel strike to toe-off of the triggering leg. for the resulting interpolated trajectories for the perturbation steps, we subtracted the average of the control steps for the same stance foot.
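the time normalization and control subtraction described above can be sketched as follows. this is a minimal illustration rather than the authors' matlab code: the function names are hypothetical, linear interpolation is an assumption (the interpolation scheme is not stated), and the number of time points is left as a parameter because it is not recoverable from the text.

```python
def resample(signal, n_points):
    """Linearly interpolate a trajectory onto n_points evenly spaced samples."""
    if n_points < 2 or len(signal) < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(signal) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into the signal
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def mean_trajectory(trajectories):
    """Point-by-point average of equally long trajectories."""
    n = len(trajectories)
    return [sum(column) / n for column in zip(*trajectories)]


def delta_response(stimulus_step, control_steps, n_points):
    """Stimulus trajectory minus the average control trajectory.

    All steps are first resampled to the same normalized time base, and the
    control steps are assumed to share the stance foot of the stimulus step.
    """
    control_mean = mean_trajectory([resample(s, n_points) for s in control_steps])
    stimulus = resample(stimulus_step, n_points)
    return [s - c for s, c in zip(stimulus, control_mean)]


# Hypothetical steps of different durations (different sample counts).
control = [[0.0, 1.0, 2.0], [0.0, 2.0, 4.0, 6.0, 8.0]]
delta = delta_response([1.0, 2.0, 3.0], control, n_points=5)
```

resampling before averaging is what allows steps of different duration, and therefore different cadence conditions, to be compared point by point.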
deviations of the perturbation steps away from the average of the control steps were interpreted as the response to the perceived fall induced by the vestibular stimulus (∆). for the model-corrected foot placement, we fitted a linear regression model relating the foot placement changes for each subject to the changes of lateral position and velocity of the com at midstance using the control data (wang and srinivasan, ). then for each stimulus step, we used this model to estimate the expected foot placement change based on the com state, and subtracted this from the observed foot placement change, resulting in an estimate of the foot placement change due to the visual stimulus (reimann et al., ). we will refer to this model-based estimate as model-corrected foot placement change. we hypothesized that the lateral ankle mechanism would play a larger role in balance in the low condition, and that the foot placement mechanism would play a larger role in balance in the high condition. despite the expected difference in the use of the balance mechanisms, we also hypothesized that the overall shift of the com as a result of the response to the balance perturbation would not differ between conditions. to test our hypothesis about the foot placement mechanism, we analyzed the following five variables that are directly or indirectly related to the first post-stimulus swing leg heel strike: to test our hypothesis about the push-off mechanism, we analyzed the following two variables related to the stance leg ankle push-off: (x) the ∆ plantarflexion angle integrated over the second post-stimulus double stance phase. (xi) the ∆ medial gastrocnemius emg of the stance leg integrated over the first post-stimulus swing phase. to test our hypothesis about the overall balance response, we used (xii) the maximum shift of the com following the stimulus onset.
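the model-corrected foot placement change described above amounts to fitting a regression on control steps and subtracting its prediction from each stimulus step. the sketch below shows that logic under stated assumptions: the function names are hypothetical, an intercept term is included although the text does not say whether one was used, the fit is ordinary least squares solved through the normal equations, and the numbers are invented for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_foot_placement_model(com_pos, com_vel, foot_placement):
    """Least-squares fit: foot placement ~ b0 + b_pos * pos + b_vel * vel."""
    rows = [[1.0, p, v] for p, v in zip(com_pos, com_vel)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, foot_placement)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def model_corrected_change(coeffs, com_pos, com_vel, observed):
    """Observed foot placement change minus the change predicted from the CoM state."""
    b0, b_pos, b_vel = coeffs
    return observed - (b0 + b_pos * com_pos + b_vel * com_vel)


# Hypothetical control steps (CoM state at midstance and foot placement change).
control_pos = [0.0, 1.0, 0.0, 1.0, 2.0]
control_vel = [0.0, 0.0, 1.0, 1.0, -1.0]
control_fp = [0.0, 2.0, 0.5, 2.5, 3.5]

coeffs = fit_foot_placement_model(control_pos, control_vel, control_fp)
# One hypothetical stimulus step: whatever the CoM state does not explain
# is attributed to the sensory perturbation.
corrected = model_corrected_change(coeffs, com_pos=1.0, com_vel=1.0, observed=3.0)
```

fitting on control steps only keeps the perturbation response out of the model, so the residual on a stimulus step can be read as the part of the foot placement change not explained by normal com variability.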
we confirmed the assumptions of normality and homoscedasticity by visual inspection of the residual plots for the variables related to foot placement, lateral ankle, and push-off mechanisms. our primary analysis is a group analysis of the kinematic, kinetic, and electromyographical basis for the three balance mechanisms. to test our hypotheses about whether the relative influence of balance mechanisms changes at different cadences, we used r (r core team, ) and lme (bates et al., ) to perform a linear mixed effects analysis. for each outcome variable, we fitted a linear mixed model and performed an anova to analyze the symmetry of the balance response and interaction of the stimulus direction. we used satterthwaite's method (fai and cornelius, ) implemented in the r package lmertest (kuznetsova, ). as fixed effects, we used triggering foot (left/right) to test the symmetry of the response and cadence (high/low) to test the difference in response to the balance perturbation in the two conditions. as random effects, we used individual intercepts for subjects. to analyze whether the differences between stimulus and control steps represented by the outcome variables were statistically significant, we calculated the least squares means and estimated the % confidence intervals for the intercept of each outcome variable at each level of the significant factor, using a kenward-roger approximation (halekoh and højsgaard, ) implemented in the r package emmeans (lenth, ). we refrained from approximating p-values for the anova directly in the traditional format, which currently cannot be calculated reliably due to the lack of analytical results for linear mixed models (bates et al., ). results were judged statistically significant in two manners. first, the use of a variable associated with a balance mechanism in response to the perceived fall was judged statistically significant when the % confidence interval did not include zero (bolded in table ).
second, the difference in the use of the variables associated with a balance mechanism was judged statistically significant when the % confidence intervals for the low and high condition did not overlap (highlighted grey in table ). we limited the statistical tests to the concrete hypotheses involving the use of the balance mechanisms that we had prior to performing the experiment (brenner, ). subjects adjusted their stepping cadence in order to match the low or high metronome. the shifts of the cop and com are generated by distinct balance mechanisms at different points of the gait cycle. the first mechanism that is able to act is the lateral ankle mechanism. the ankle of the new stance leg inverts during the first post-stimulus step relative to the unperturbed pattern (figure a, table ). this ankle inversion change is larger in the low cadence (figure a, table ). activity of the peroneus longus, an ankle everter, decreases, beginning in the first double stance for both conditions. activity of the tibialis anterior, an ankle inverter, increases, beginning in the first double stance post-stimulus for both conditions. these changes are larger in the low condition for the tibialis anterior, but the confidence intervals overlap between the two cadence conditions for the peroneus longus (see table ). the timing and magnitude of these changes in kinematics and muscle activation align well with the initial cop-com displacement in the direction of the perceived fall during the first step for both cadence conditions, as shown in figure ( ). at the first post-stimulus heel strike, subjects shifted their foot placement in the direction of the perceived fall in the high condition (figure a), as expected (see table ). contrary to our expectation, subjects tended to shift their foot placement in the opposite direction in the low condition, i.e. away from the perceived fall, although this shift was not statistically significant.
surprisingly, this general pattern was similar for the model-corrected foot placement (figure b), where the shift away from the perceived fall in the low condition is statistically significant, though smaller (see table ). for the high condition, the swing leg hip is slightly adducted upon heel strike (figure b), though not statistically different from control steps (table ). a hip adduction modulation would contribute to the swing leg heel moving towards the trigger leg. the low condition produces an abduction of the swing leg hip, allowing for movement of the heel away from the trigger leg. the combination of stance leg knee rotation and swing leg hip rotation produces a shift of the swing leg heel in the medial-lateral direction. for example, in the high condition, a trigger leg knee internal rotation (figure c) and a swing leg hip external rotation (figure d), in combination, result in a placement of the heel towards the trigger leg. the low condition shows an internally rotated trigger leg knee, but also an internally rotated swing leg hip. the changes in leg joint angles support the observed shift in the swing leg heel position. the push-off mechanism was used for both cadence conditions (table ), where an increased trigger leg plantarflexion angle was observed in the second double stance following the triggering heel strike (figure a). however, there was no difference in the plantarflexion modulation between cadence conditions (see table ). similarly, there was no between-cadence difference in the trigger leg medial gastrocnemius emg (see table ). we studied the changes of the control of balance at different cadences by constraining subjects' stepping to either high or low cadence using a metronome and providing vestibular stimuli that induce the sensation of falling to the side.
subjects responded to the balance perturbations by accelerating their body away from the direction of the perceived fall regardless of the cadence, as observed in figure . we limited our analysis to the time interval from the heel strike triggering the stimulus to the end of the second double stance post-stimulus, which encompasses the three previously identified balance mechanisms of lateral ankle roll, foot placement shift and push-off modulation. we observed changes in kinematics, ground reaction forces, and muscle activation to determine whether the neural control of balance is altered at different cadences. we found distinct changes in the use of balance mechanisms while walking at lower or higher cadences. we found that the overall effect of the stimulus seems to be invariant of the cadence condition. neither the time nor the magnitude of the maximal com displacement in the direction of the perceived fall is significantly different between the two cadence conditions. however, the relative use of the mechanisms depends on the cadence. we found that the lateral ankle mechanism played a larger role in the low condition, supported by differences in the cop-com modulation, ankle inversion, and lateral ankle inverter (tibialis anterior) electromyography readings. the ankle inverted to a larger degree in low, indicating that the larger shift of the cop in the low condition is a result of the use of the lateral ankle mechanism. the increased inversion, along with the prolonged single stance period, allows the cop and com to separate for a longer period of time. the combination of decreased peroneus longus muscle activity and increased activity of the tibialis anterior muscle in the low condition compared to the high condition yields an increased inversion angle (figure c).
the tibialis anterior is not only an ankle dorsiflexor, but also an ankle inverter, so the combined electromyographical changes to the muscles responsible for ankle inversion suggest that the central nervous system (cns) is actively generating a larger shift of the cop under the stance foot in the low condition. the increased use of the lateral ankle mechanism during the low condition makes sense, given the increased duration of single stance in the low condition. we were surprised by the absence of a foot placement response to the perceived fall in the low condition. we had expected the foot placement shift to be smaller in the low condition, due to an increased and prolonged lateral ankle mechanism response, based on modeling results (reimann et al., ). the lateral ankle mechanism accelerates the com, trying to stop the perceived fall, and this com shift is sensed by the cns and invokes a balance response in the opposite direction, which is expected to cancel out the response to the stimulus to some degree. following wang and srinivasan ( ), we fitted a regression model to the relationship between the com at midstance and the foot placement shift in the unperturbed reference, controlling for the different cadence conditions (stimpson et al., ). we used this model to estimate the expected foot placement shift based on the normal variability of the com movement and calculated the model-corrected foot placement shift by subtracting the model prediction from the observed value. this model-corrected foot placement shift isolates the response to the sensory perturbation, which should be equal between the two cadence conditions. in previous experiments, this model-corrected foot placement shift showed a more consistent response to the sensory perturbation than the uncorrected value (reimann et al., b).
contrary to this expectation, the model-corrected foot placement change observed at low cadence was significantly lower than at high cadence, and even below zero on average, though this was not statistically significant (see table ). the results indicate that all three previously identified balance mechanisms are used to respond to the perceived fall, but the majority of the balance response is shifted to the lateral ankle mechanism when walking with a low cadence. in contrast, when walking at a higher cadence, the foot placement mechanism dominates the balance response. the fact that the push-off mechanism does not differ between conditions indicates that the push-off may not be a critical balance mechanism, but may provide a subtle balance-related adjustment at the end of the gait cycle. unexpectedly, we observed a difference in the dorsiflexion angle between cadence conditions early in the gait cycle, with an increased dorsiflexion in the low condition (see figure a). counterintuitively, the gastrocnemius emg is also increased in the low condition early in the gait cycle (see figure b), therefore the dorsiflexion response must be attributed to the increased tibialis anterior activity (see figure c). such an observation indicates a form of "stiffening" that we plan to systematically analyze in a future publication. these results support the idea of continuous monitoring of the balance response throughout the gait cycle, and that the balance mechanisms are in fact interdependent (fettrow et al., ). considering that a change in foot placement has dominated the walking literature as the primary balance response (bruijn and van dieën, ; yiou et al., ), our results argue against a single balance mechanism for balance control and support the idea of a coordinated response of multiple mechanisms to achieve flexible control of balance while walking.
as the vestibular perturbation elicits a perceived acceleration of the com in a particular direction, the cns must shift the com an equal amount in the opposite direction to counter the perceived shift. the adjustment can be made continuously throughout the gait cycle, and the total balance response can be viewed as the summation of the balance mechanisms. these results suggest that the lateral ankle and foot placement mechanisms are interdependent, while interdependence between the push-off mechanism and the other mechanisms is weak at best. these findings will also inform our models of how the balance control system adjusts to altering gait parameters. the foot placement mechanism is the most commonly modeled balance mechanism (townsend, ; wang and srinivasan, ), but to our knowledge no one has attempted to determine its use at varying cadences. furthermore, models of human locomotion have difficulty walking at slower speeds (song and geyer, ). the difficulty with gait speed adjustment may point to limitations in the mechanisms available in these models for the control of balance. the inability of the models to walk slowly could be a result of missing the degree of freedom and control strategy to implement the lateral ankle, foot placement, and push-off mechanisms in a coordinated fashion. to allow for a model to maintain walking at different gait speeds, multiple balance mechanisms may be required in order to reproduce as flexible a system as observed with human bipedal gait. finally, we speculate that these findings may shed light on the development of preferred walking speeds, particularly in populations that have difficulty with balance. older adults tend to decrease gait speed with age (himann et al., ; lauretani et al., ), which according to this data set would result in more use of the lateral ankle mechanism.
we also know that people with parkinson's disease take shorter, faster steps (knutsson, ), and although their overall gait speed is diminished, an increase in cadence may shift the majority of the balance response to the foot placement compared to age-matched controls. the reason why people with parkinson's disease would shift the balance response to foot placement may lie in evidence that they have reduced proprioception (hwang et al., ), possibly leading to an inability to sense the cop under the stance foot. this deficit would make the use of the lateral ankle mechanism unreliable. if a particular balance mechanism is unreliable or cannot be used, the cns must recruit the available balance mechanisms. in a hypothetical case that the lateral ankle mechanism activation is hindered, it seems logical to increase cadence to rely more on the foot placement mechanism. thus, we speculate that balance mechanisms play a role in preferred cadence, and possibly gait speed. future experiments will attempt to directly assess the role of cadence in balance and preferred gait speed in older adults and those with neurological conditions such as parkinson's disease. we limited our analysis to the time between triggering leg heel strike and the second double stance post-stimulus because all three balance mechanisms are used within this time frame, but also due to the difficulty of interpreting subsequent steps. the response to the perturbation may continue into the next gait cycle (balance perturbation continues for ms), but at some point the cns responds to the self-inflicted fall caused by the response to the perceived fall. it is difficult to uncover when precisely the cns determines that the initial response to the balance perturbation is erroneous, but we speculate that the total balance response to the perturbation is similar between cadence conditions due to the similar overall deviation of the com in figure .
our current results show that higher cadence results in faster gait speed, at least for healthy young adults walking on a self-paced treadmill. the relationship between gait speed and cadence is non-trivial, making it problematic to unpack the effects of cadence versus gait speed. the convergence to a faster gait speed when adopting a higher cadence may be related to metabolic efficiency, but this is an avenue far removed from the focus of the current work. to our knowledge this relationship has not been explored. our current experimental methods may lead to limitations in the presentation of the push-off mechanism. we refer to the change in ankle plantarflexion in response to the balance perturbation as push-off, because to date this is the best term to describe what we believe the change in ankle plantarflexion does. due to the experimental design, we do not have reliable ground reaction forces and moments. providing balance perturbations, in addition to instructions to "walk normally", leads to many situations where one foot is on two force plates, or during double stance, two feet are on one force plate. these situations make it difficult to determine the ground reaction forces and moments associated with the change in ankle plantarflexion angle. another technical consideration in the current methodology is the heel strike identification method. different walking speeds in the cadence conditions produced different vertical heel trajectory profiles, which required intervention by the experimenter (see methods). increasing the vertical threshold created a longer period of time from threshold crossing to actual heel strike for the low condition. therefore, the stimulus for the low condition was triggered on average ∼ ms prior to heel strike, and the stimulus for the high condition was triggered ∼ ms prior to heel strike, coincidentally corresponding to ∼ % of the time between heel strikes for each condition.
the earlier trigger in the low condition may partially explain the earlier response time in the cop-com (figure ) and corresponding emg for the lateral ankle mechanism (figure b, c), but not the amplitude. we investigated whether individuals altered the control of balance if they were asked to step at different frequencies while walking on a self-paced treadmill as they periodically received a vestibularly induced sensation of a fall to the side. the current findings support the idea that balance mechanisms are coordinated to produce an overall balance response. in the low condition, the lateral ankle mechanism plays the primary role in the overall balance response, while the foot placement mechanism is not observed. when cadence is decreased (low), single stance is longer, providing more opportunity to modulate the cop through ankle roll, and diminishing the need for a change in foot placement to maintain upright posture. in the high condition, with shorter single stance duration, the lateral ankle mechanism is less effective and the balance response shifts to reliance on the foot placement mechanism. these findings suggest lateral stability is not dependent on cadence or gait speed, but the method of obtaining stability is altered with cadence, providing evidence of a flexible neural control scheme that adapts to changing constraints while walking. moreover, such findings provide insight into adoption of preferred gait parameters, particularly in populations who have drastically altered gait speed or cadence such as older adults or people with parkinson's disease. step changes in response to the balance perturbation in ankle dorsiflexion (a), and the medial gastrocnemius emg (b). the baseline represents the average for the control steps. bold curves indicate the average and the light curves encasing the bold curves indicate the % confidence interval. curves start at the heel strike triggering the stimulus and show the subsequent two steps.
blue curves -high metronome, yellow curves -low metronome. ds -double stance, ss -single stance. table : results of the anova indicating difference between conditions and which factors have a significant effect on the magnitude of the motor response to the balance perturbation (see table for statistics on the existence of the mechanism). reduced vestibular function is associated with longer, slower steps in healthy adults during normal speed walking fitting linear mixed-effects models using lme active control of lateral balance in human walking why we need to do fewer statistical tests control of human gait stability through foot placement is slow walking more stable a systematic review of the gait characteristics associated with cerebellar ataxia a gait analysis data collection and reduction technique kinematic variability and local dynamic stability of upper body motions when walking at different speeds interlimb coordination in prosthetic walking: effects of asymmetry and walking velocity hospitalization-associated change in gait speed and risk of functional limitations for older adults the influence of gait speed on local dynamic stability of walking approximate f-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments the influence of gait speed on the stability of walking among the elderly interdependence of the balance mechanisms during walking frontal plane dynamic margins of stability in individuals with and without transtibial amputation walking on a loose rock surface a kenward-roger approximation and parametric bootstrap methods for tests in linear mixed models -the r package pbkrtest hermens development of recommendations for semg sensors and sensor placement procedures age-related changes in speed of walking control of lateral balance in walking. 
experimental findings in normal subjects and above-knee amputees balance responses to lateral perturbations in human treadmill walking control of lateral weight transfer is associated with walking speed in individuals post-stroke a central processing sensory deficit with parkinson's disease human standing and walking: comparison of the effects of stimulation of the vestibular system opensim: open-source software to create and analyze dynamic simulations of movement once-per-step control of ankle-foot prosthesis push-off work reduces effort associated with balance during walking effects of gait speed on stability of walking revealed by simulated response to tripping perturbation an analysis of parkinsonian gait stabilization of lateral motion in passive dynamic walking age-associated changes in skeletal muscles and their effect on mobility: an operational diagnosis of sarcopenia least-squares means: the r package lsmeans walking stability and sensorimotor function in older people with diabetic peripheral neuropathy multisensory function: simultaneous re-weighting of vision and touch for the control of human posture sensorimotor integration in human postural control strategies for the control of balance during locomotion complementary mechanisms for upright balance during walking neural control of balance during walking influence of neuromuscular noise and walking speed on fall risk and dynamic stability in a d dynamic walking model opensim: simulating musculoskeletal dynamics and neuromuscular control to study human and animal movement a neural circuitry that emphasizes spinal feedback generates diverse behaviours of human locomotion predictive neuromechanical simulations indicate why walking performance declines with ageing effects of walking speed on the step-by-step control of step width biped gait stabilization via foot placement center of mass velocity-based predictions in balance recovery following pelvis perturbations during human walking stepping in the 
direction of the fall: the next foot placement can be predicted from current upper body state in steady-state walking human balance and posture control during standing and walking balance control during gait initiation: state-of-the-art and research perspectives an interactive graphics-based model of the lower extremity to study orthopaedic surgical procedures key: cord- -mf oa authors: mazhari, ramin; brewster, jessica; fong, rich; bourke, caitlin; liu, zoe sj; takashima, eizo; tsuboi, takafumi; tham, wai-hong; harbers, matthias; chitnis, chetan; healer, julie; ome-kaius, maria; sattabongkot, jetsumon; kazura, james; robinson, leanne j.; king, christopher; mueller, ivo; longley, rhea j. title: a comparison of non-magnetic and magnetic beads for measuring igg antibodies against p. vivax antigens in a multiplexed bead-based assay using luminex® technology (bio-plex® or magpix®) date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mf oa multiplexed bead-based assays that use luminex xmap® technology have become popular for measuring antibodies against proteins of interest in many fields, including malaria and more recently sars-cov- /covid- . there are currently two formats that are widely used: non-magnetic beads or magnetic beads. data are lacking regarding the comparability of results obtained using these two types of beads, and for assays run on different instruments. whilst non-magnetic beads can only be run on flow-based instruments (such as the luminex® / ™ or bio-plex® ), magnetic beads can be run on both these and the newer magpix® instruments. in this study we utilized a panel of purified recombinant plasmodium vivax proteins and samples from malaria-endemic areas to measure p. vivax-specific igg responses using different combinations of beads and instruments.
we directly compared: i) non-magnetic versus magnetic beads run on a bio-plex® , ii) magnetic beads run on the bio-plex® versus magpix® and iii) non-magnetic beads run on a bio-plex® versus magnetic beads run on the magpix®. we also performed an external validation of our optimized assay. we observed that igg antibody responses, measured against our panel of p. vivax proteins, were strongly correlated in all three of our comparisons; however, higher amounts of protein were required for coupling to magnetic beads. our external validation indicated that results generated in different laboratories using the same coupled beads are also highly comparable, particularly if a reference standard curve is used. the carboxylated beads were sourced from bio-rad (bio-plex cooh beads, ml, . x beads/ml and bio-plex pro magnetic cooh beads, ml, . x beads/ml) and stored at - °c. optimisation of coupling procedures for non-magnetic and magnetic beads was done separately, because the larger size of the magnetic beads generally requires more protein (see results). to be able to measure all plasma samples at the same dilution, we optimized all protein concentrations by generating a log-linear standard curve with a positive control plasma pool from immune png donors (high responders to plasmodium antigens). coupling of p. vivax proteins to non-magnetic beads was performed as previously described [ ]. briefly, the optimised antigen concentration (tables and ) was coupled to . x pre-activated microspheres, in mm monobasic sodium phosphate buffer ph . , using mg/ml sulfo-nhs and mg/ml of edc to cross-link the proteins to the beads. the activated beads were washed and stored in pbs, . % bsa, . % tween- , . % na-azide, ph . at °c until use. for the coupling to magnetic beads, a magnet rack was used for pelleting the beads, instead of the centrifugation step for non-magnetic beads.
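a log-linear standard curve of the kind mentioned above can be used to express sample mfi relative to the plasma pool. the following python sketch is a simplified illustration and not the conversion used in the study: it assumes a straight-line relationship between log10(mfi) and log10(dilution) over the range of the standards, and the function names, dilutions and mfi values are all hypothetical.

```python
import math


def fit_log_linear_curve(dilutions, mfi_values):
    """Fit log10(MFI) = slope * log10(dilution) + intercept by least squares."""
    xs = [math.log10(d) for d in dilutions]
    ys = [math.log10(m) for m in mfi_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mfi_to_relative_units(mfi, slope, intercept):
    """Invert the fitted line: the pool dilution that would give this MFI."""
    return 10 ** ((math.log10(mfi) - intercept) / slope)


# Hypothetical two-fold dilution series of the positive control pool.
standard_dilutions = [1 / 50, 1 / 100, 1 / 200, 1 / 400]
standard_mfi = [8000.0, 4000.0, 2000.0, 1000.0]

slope, intercept = fit_log_linear_curve(standard_dilutions, standard_mfi)
relative_units = mfi_to_relative_units(3000.0, slope, intercept)
```

because each plate carries its own standard curve, expressing samples in these relative units is one way to reduce plate-to-plate variation; the conversion is only meaningful for mfi values inside the range spanned by the standards.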
we qualitatively assessed the stability of the coupled beads by visual comparison of the mfi of the standard curve over a nine-month period. table for a complete list of proteins and the optimised amount coupled to non-magnetic and magnetic beads. to enable these parametric correlations, data were log-transformed prior to the analysis to better fit the normal distribution. it was again observed that there was a strong correlation between results obtained using the non- during the same week assays were performed to measure total igg antibodies against these p. vivax (table , scatter plots in figure s ). the same correlation analysis was then performed on data converted in r using the standard curves (to account for any plate-plate variation). strong correlation coefficients were observed for all proteins, including pvx_ (r values > . , p< . ) ( table , scatter plots in figure s ). for the majority of proteins, the correlation was stronger after conversion (table ) . this is expected given the conversion, based on the standard curve generated with a plasma pool from immune png donors, is used to account for any plate-plate variation. these results indicate that data generated using this multiplexed assay are highly reproducible in a vivax. importantly, we also assessed the stability of the coupled beads by running the standard curve times over a period of months (intensely for months) ( figure s ). for most proteins the coupled beads were highly stable ( / tested over -months), with the mfi dropping for three proteins and increasing for two proteins. this is supported by previous research that has indicated the stability of protein-coupled beads [ ] , noting that the stability may vary by antigen [ ]. the aim of this study was to demonstrate that multiplexing assays performed using magnetic beads or non-magnetic beads are highly comparable, independent of the beads and platform used to analyze the assays. we compared here a total of p. 
vivax proteins that were coupled to both magnetic beads and non-magnetic beads. the protein concentration used for the couplings was individually determined by optimisation for each protein for the chosen bead type (table ) figure s : stability of protein-coupled magnetic beads over -months. the original coupled beads were tested at every week for months after coupling, then again at months post-coupling. the mfi of the standard curves are presented (s = / , then -fold serial dilution optimization of incubation conditions of plasmodium falciparum antibody multiplex assays to measure igg, igg - , igm and ige using standard and customized reference pools for sero-epidemiological and vaccine studies epub / / the relationship between anti-merozoite antibodies and incidence of plasmodium falciparum malaria: a systematic review and meta- analysis multiplexed microsphere suspension array-based methods and protocols analysis of factors affecting the variability of a quantitative suspension bead array assay measuring igg to multiple plasmodium antigens optimization of a magnetic bead-based assay (magpix((r))-luminex) for immune surveillance of exposure to malaria using multiple plasmodium antigens and sera from different endemic settings serological signatures of sars-cov- infection: implications for antibody-based diagnostics distinct systems serology features in children, elderly and covid patients highly sensitive and specific multiplex antibody assays to quantify immunoglobulins m, a and g against sars-cov- antigens asymptomatic plasmodium vivax infections induce robust igg responses to multiple blood-stage proteins in a low-transmission region of western thailand development and validation of serological markers for detecting recent plasmodium vivax infection highly heterogeneous residual malaria risk in western thailand molecular epidemiology of residual plasmodium vivax transmission in a paediatric cohort in solomon islands comparison of non- magnetic and magnetic 
beads multiplex assay for assessment of plasmodium falciparum antibodies the establishment of a who reference reagent for anti-malaria (plasmodium falciparum) human serum optimisation and standardisation of a multiplex immunoassay of diverse plasmodium falciparum antigens to assess changes in malaria transmission using sero-epidemiology comparison of non-magnetic and magnetic beads in bead-based assays development of a high-throughput bead based assay system to measure hiv- specific immune signatures in clinical samples the development of a multiplex serological assay for avian influenza based on luminex technology measurement of antibodies to pneumococcal, meningococcal and haemophilus polysaccharides, and tetanus and diphtheria toxoids using a -plexed assay key: cord- -g g ti authors: wang, hao-yuan; oltion, keely; al-khdhairawi, amjad ayad qatran; weber, jean-frédéric f.; taunton, jack title: total synthesis and biological characterization of sr-a , a ternatin-related eef a inhibitor with enhanced cellular residence time date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: g g ti ternatin and related cyclic peptides inhibit the elongation phase of protein synthesis by targeting the eukaryotic elongation factor- α (eef a), a potential therapeutic vulnerability in cancer and viral infections. the cyclic peptide natural product “a ” appears to be related to ternatin, but its complete structure is unknown and only of its stereocenters have been assigned. hence, a could be any one of possible stereoisomers. guided by the stereochemistry of ternatin and more potent structural variants, we synthesized two a epimers, “sr-a ” and “ss-a ”. we found that synthetic sr-a is indistinguishable from naturally derived a and potently inhibits cancer cell proliferation. relative to ss-a and previously characterized ternatin variants, sr-a exhibits a dramatically enhanced duration of action. 
this increase in cellular residence time is conferred, stereospecifically, by a single β-hydroxy group attached to n-methyl leucine. sr-a thus exemplifies a mechanism for enhancing the pharmacological potency of cyclic peptide natural products via side-chain hydroxylation. eukaryotic elongation factor- a (eef a) is an essential component of the translation machinery. during the elongation phase of protein synthesis, gtp-bound eef a delivers an aminoacyl-trna (aa-trna) to the ribosomal a site for selection. base pairing between the a-site mrna codon and aa-trna anticodon promotes gtp hydrolysis on eef a, releasing the aa-trna and facilitating peptide bond formation with the nascent peptidyl-trna in the p site. because tumor cell growth and viral replication require elevated protein synthesis rates, eef a inhibitors -all of which are macrocyclic natural products -have been evaluated as potential anticancer and antiviral drugs. didemnin b, - cytotrienin a, and nannocystin a are examples of structurally diverse macrocycles that bind eef a and inhibit translation elongation. in , dehydrodidemnin b (plitidepsin) was approved in australia for the treatment of relapsed/refractory multiple myeloma. plitidepsin is currently being tested in hospitalized covid- patients. the natural product "a " is a cyclic heptapeptide whose complete structure has not been reported. as described in a patent application, a was isolated from an aspergillus strain and was found to inhibit cancer cell proliferation at low nanomolar concentrations. although its amino acid sequence and n-methylation pattern were elucidated, only out of stereocenters could be assigned ( figure ). motivated by its potent antiproliferative activity and unknown mechanism of action, we sought to determine which of the possible stereoisomers (based on unassigned stereocenters) corresponds to a . 
based on our hypothesis that a is structurally related to the anti-adipogenic cyclic heptapeptide, ternatin, we previously designed and synthesized "ternatin- ", which incorporates the dehydromethyl leucine (dhml) and pipecolic acid residues found in a , yet lacks the b-hydroxy group attached to n-me-leu ( figure ). we discovered that ternatin and ternatin- inhibit cancer cell proliferation by targeting eef a, with ternatin- being up to -fold more potent than ternatin. recently, we found that ternatin- potently blocks replication of the novel coronavirus sars-cov , without obvious cytotoxic effects. figure . partially determined structure of the natural product a . based on previous studies of ternatin and ternatin- , we hypothesized that a corresponds to either one of two epimers, "sr-a " or "ss-a ". although the potent antiproliferative activity exhibited by ternatin- suggested a structural kinship with naturally derived a , this remained unproven. assuming our structural hypothesis is correct, a single stereocenter in a (n-me-b-oh-leu side chain) remained ambiguous. a more intriguing question concerns the role of n-me-b-oh-leu in the biological activity of a as compared to ternatin and ternatin- , both of which lack a b-hydroxy group at the equivalent position. non-proteinogenic b-hydroxy amino acids are frequently found in macrocyclic natural products, yet the stereospecific roles of this biosynthetic modification are unknown in most cases. here, we report the first total synthesis and biological characterization of two a epimers, sr-a and ss-a , in which (s,r)-and (s,s)-n-me-b-oh-leu replaces n-me-leu of ternatin- ( figure ). synthetic sr-a is spectroscopically and biologically indistinguishable from the natural product a , whereas ss-a has distinct properties. similar to ternatin- , sr-a potently inhibited cell proliferation and protein synthesis by targeting eef a. 
transient exposure of cells to sr-a , followed by washout, led to long-lasting inhibitory effects, whereas sustained inhibition was not observed with ss-a or ternatin- . our data thus reveal a striking and stereospecific increase in cellular residence time conferred by a single oxygen atom appended to a macrocyclic eef a inhibitor. we previously found that replacing (s)-leucine in ternatin with (s,r)-dehydromethyl leucine (hereafter "dhml") leads to increased potency, and we hypothesized that the stereochemistry of dhml in the natural product a is of the same configuration ( figure ). because our original -step synthesis of dhml methyl ester was low yielding and required a costly chiral auxiliary, we developed a more efficient, second-generation synthesis suitable for preparing gram quantities of fmoc-dhml. the copper(i)-promoted sn ' reaction between a serine-derived organozinc reagent and allylic electrophiles has been previously exploited to synthesize amino acids that contain a gstereogenic center. [ ] [ ] this method was appealing because it would provide dhml (as the boc methyl ester) in only two steps from the inexpensive chiral building block, boc-(s)-serine-ome. using previously reported conditions in which the organozinc reagent was generated in situ from boc-iodoalanine-ome , the sn ' reaction with crotyl chloride was slightly favored over the sn pathway, providing the desired boc-dhml-ome in % isolated yield ( figure , entry ). after extensive optimization, aimed at improving sn ' vs. sn selectivity and conversion, we a solid-phase route was previously employed to synthesize a linear heptapeptide precursor of ternatin, followed by solution-phase cyclization. however, this strategy involved macrocyclization between the secondary amine of n-me-ala and the carboxylic acid of leu ( figure a, site a) , which we found to be low-yielding in the context of peptides containing dhml at the carboxy terminus. 
thus, we sought to identify an alternative cyclization site using the ternatin-related cyclic peptide as a model system (figure a) . linear heptapeptide precursors were synthesized on the solid phase, deprotected and cleaved from the resin, and cyclized in solution (see supporting information for details). we failed to evaluate site b due to the poor resin loading of fmoc-b-oh-leu. gratifyingly, cyclization at site c provided in % overall yield (including the solid-phase linear heptapeptide synthesis), whereas cyclization at site a was less efficient ( % overall yield). by synthesizing the linear heptapeptide precursor on the solid phase and cyclizing in solution at site c, we were able to prepare ternatin- in days and % overall yield ( mg), a significant improvement over our previous route (figure b with synthetic sr-a and ss-a in hand (figure a) , we first compared their hplc elution profiles with an authentic sample of the natural product a . sr-a and naturally derived a had identical retention times, whereas ss-a eluted later in the gradient (figure b) . furthermore, the h and c nmr spectra of sr-a appeared identical to the corresponding spectra of natural a (figure c) . finally, sr-a and naturally derived a blocked proliferation of hct cancer cells with superimposable dose-response curves (figure d , ic ~ . nm), whereas ss-a was ~ -fold less potent (ic ~ . nm). together, these data are consistent with our stereochemical hypothesis and suggest that sr-a corresponds to the previously unknown structure of the natural product a . we previously demonstrated that the antiproliferative effects of ternatin- were abrogated in cells expressing a point mutant of eef a (a v). unsurprisingly, eef a-mutant cells were similarly resistant to sr-a (ic >> µm), providing strong genetic evidence that eef a is a physiologically relevant target (figure a ). 
consistent with this interpretation, treatment of cells with sr-a for h reduced the rate of protein synthesis with an ic of ~ nm (figure b) , as measured by a clickable puromycin incorporation assay (o-propargyl puromycin, opp). under these conditions - h of continuous treatment prior to a -h pulse with opp -ternatin- behaved identically to sr-a , whereas ss-a was slightly less potent. however, when the treatment time was shortened to min before pulse-labeling with opp for h (in the continuous presence of the cyclic peptides), the dose-response curves shifted significantly, such that ternatin- was ~ -fold more potent than sr-a and ss-a had intermediate potency drug-target residence time, which reflects not only the intrinsic biochemical off-rate, but also the rebinding rate and local target density in vivo, has emerged as a critical kinetic parameter in drug discovery. [ ] [ ] to test for potential differences in cellular residence time, we treated hct cells with nm sr-a , ss-a , or ternatin- for h, followed by washout into compound-free media. at various times post-washout, cells were pulse-labeled with opp for h. whereas protein synthesis rates partially recovered in cells treated with ternatin- or ss-a (~ % of dmso control levels, h post-washout), transient exposure of cells to sr-a resulted in sustained inhibition (figure a ). to confirm the extended duration of action observed with sr-a , we assessed cell proliferation during a -h washout period. strikingly, cell proliferation was nearly abolished after -h treatment with nm sr-a , followed by rigorous washout. by contrast, cell proliferation rates recovered to ~ % of dmso control levels after transient exposure to nm ternatin- or ss-a . these results demonstrate that the (r)-b-hydroxy group attached to n-me-leu endows sr-a with a kinetic advantage over ss-a and ternatin- , as reflected by washout resistance and increased cellular residence time. figure . 
n-me-b-oh-leu stereospecifically endows sr-a with increased cellular residence time. a) hct cells were treated with the indicated compounds ( nm) or dmso for h, followed by rigorous washout into compound-free media. at the indicated time points postwashout, cells were pulse-labeled with opp ( h), and opp incorporation was quantified. normalized data (% dmso control) are mean values ± sd (n = ). b) hct cells were treated with the indicated compounds ( nm) or dmso for h, followed by rigorous washout into compound-free media. at the indicated time points post-washout, cell proliferation was quantified using the celltiter-glo assay. normalized data (% dmso control at t = h postwashout) are mean values ± sd (n = ). ***, p < . ; ****, p < . . in this study, we developed an improved synthetic route to dhml-containing ternatin variants, culminating in the total synthesis of sr-a and ss-a . our work provides spectroscopic, chromatographic, and pharmacological evidence that synthetic sr-a (and not ss-a ) is identical to the natural product "a ". an unexpected finding from our experiments with cancer cell lines is that the b-hydroxy group in sr-a does not confer increased potency under continuous treatment conditions. rather, sr-a exhibits a dramatic increase in cellular residence time, as revealed by washout experiments followed by assessment of protein synthesis and cell proliferation rates. sr-a thus provides a compelling illustration of how a "ligand efficient" side-chain modification can be exploited to alter the pharmacological properties of a cyclic peptide natural product. 
roadblocks and resolutions in eukaryotic translation the eef a proteins: at the crossroads of oncogenesis, apoptosis, and viral infections gtp-dependent binding of the antiproliferative agent didemnin to elongation factor alpha decoding mammalian ribosome-mrna states by translational gtpase complexes inhibition of translation by cytotrienin a--a member of the ansamycin family nannocystin a: an elongation factor inhibitor from myxobacteria with differential anti-cancer properties randomized phase iii study (admyre) of plitidepsin in combination with dexamethasone vs. dexamethasone alone in patients with relapsed/refractory multiple myeloma ternatin, a highly nmethylated cyclic heptapeptide that inhibits fat accumulation: structure and synthesis ternatin and improved synthetic variants kill cancer cells by targeting the elongation factor- a ternary complex a sars-cov- protein interaction map reveals targets for drug repurposing a new route to hydrophobic amino acids using copper-promoted reactions of serine-derived organozinc reagents synthesis of enantiomerically pure unsaturated .alpha.-amino acids using serine-derived zinc/copper reagents imaging protein synthesis in cells and tissues with an alkyne analog of puromycin rebinding: or why drugs may act longer in vivo than expected from their in vitro target residence time the drug-target residence time model: a -year retrospective funding for this study was provided by the ucsf program for breakthrough biomedical key: cord- - tib o m authors: ahmed, asad; mam, bhavika; sowdhamini, ramanathan title: deelig: a deep learning-based approach to predict protein-ligand binding affinity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tib o m protein-ligand binding prediction has extensive biological significance. binding affinity helps in understanding the degree of protein-ligand interactions and has wide protein applications. 
protein-ligand docking using virtual screening and molecular dynamics simulations is required to predict the binding affinity of a ligand to its cognate receptor. such analyses require intense computational power, and it becomes impossible to cover the entire chemical space of small molecules. the field has been aided by a shift towards machine learning-based methodologies that predict binding using regression. recent developments in deep learning have enabled us to make sense of massive and complex datasets. the strength of the approach is the ability of the model to “learn” intrinsic patterns in a complex plane of data. here, we have incorporated convolutional neural networks, which find spatial relationships among data, to predict the binding affinity of proteins in whole superfamilies towards a diverse set of ligands. the models were trained and validated using a detailed methodology for feature extraction. we have also tested deelig on protein complexes relevant to the current public health scenario. our approach to network construction and training on a protein-ligand dataset prepared in-house has provided significantly better results than previously existing methods in the field. proteins are a diverse class of dynamic macromolecular structures in living organisms and are essential for the biochemistry and physiology of the organism. depending on their functional role(s), proteins may bind to other proteins, peptides, nucleic acids and non-peptide ligands with varying affinities. determining protein-ligand affinity helps in understanding the mechanism and kinetics of the reaction and has applications in drug development and pharmacology ( ) . protein-ligand interaction is measured in terms of binding affinity. the stronger the readout for binding affinity, the stronger the inferred interaction between protein and ligand. 
binding affinity is quantified in terms of inhibition constant (ki), dissociation constant (kd), changes in free energy measures (delta g, delta h) and half-maximal inhibitory concentration (ic ) ( ) . predicting binding affinity between a protein and ligand complements experimental approaches and is usually used as a starting point for the latter. prediction-based approaches are also useful in cases where experimental determination of binding affinity may not be feasible. classical methods for scoring the binding free energies of small ligands to biological macromolecules, such as mm/gbsa and mm/pbsa, typically rely on molecular dynamics simulations and aid in silico docking and virtual screening as well as experimental approaches. however, there is a trade-off between computational resources and accuracy ( ) . with the recent shift towards machine learning and deep learning-based methods in structural biology, making biologically significant predictions using regression and 'learning' intrinsic patterns in a complex plane of available data has led to resource-optimal predictions without compromising on accuracy. deep learning is known to learn representations and patterns in complex data. our aim was to apply deep learning to predict the binding affinity of protein-ligand interactions. convolutional neural networks (cnn) are deep neural networks that use an input layer, an output layer and one or more convolutional hidden layers. the first cnn was introduced by lecun in ( ) , and its connectivity pattern was inspired by the elegant experiments of hubel and wiesel on the mammalian visual cortex in the s ( ) . with growing technical advancements and massive amounts of data, cnns have become popular in biological fields in the recent decade, with various applications ( ) . 
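as a brief aside on the affinity measures listed earlier in this passage, the dissociation constant and the binding free energy are linked by the standard thermodynamic relation delta g = r t ln(kd / c0), with c0 the 1 m standard concentration. the following is a minimal sketch of that conversion, not part of the deelig pipeline; the function names, the default temperature of 298.15 k and the choice of kcal/mol units are assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K); unit choice is an assumption for this sketch.
R_KCAL_PER_MOL_K = 1.987204e-3


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses delta_G = R * T * ln(Kd / c0) with c0 = 1 M, so kd_molar must be
    expressed in mol/L. Tighter binders (smaller Kd) give more negative values.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def kd_to_pkd(kd_molar):
    """Negative base-10 logarithm of Kd, a common regression target."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)
```

a logarithmic scale such as pkd is often preferred as a regression target because raw affinities span many orders of magnitude; the source does not state which scale deelig regresses on.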
in our study, we have used cnns to provide a quantitative estimate of protein-ligand binding using various sets of features corresponding to protein and ligand respectively by finding spatial relationships amongst the data. our approach was validated using ligand-bound complexes from kinases superfamily in the pdb. kinases belong to a class of enzymes required for substrate dependent phosphorylation. they are represented across diverse cellular functions like signaling, differentiation, glycolysis ( ) . we have also tested our model on covid- main protease ( ) of the novel coronavirus strain complexed with various inhibitors of which binding affinities have not been predicted or experimentally determined so far. the raw data for our novel database was obtained from rcsb pdb ( ) database, where following were selected as the query parameters. these criteria resulted in a list of protein pdb ids, complexed ligand (s) and corresponding binding affinity values. the search results include the structures present in pdbdatabase, pdbbind ( , , ) , pdbmoad ( , ) and scpdb ( ) for its results. initial raw data database created contained protein structures in pdb format, protein sequences in fasta format, ligand in sdf format and binding affinity values of corresponding protein-ligand pairs for complexes. the pdb, fasta and sdf files filtered were further processed to refine our novel dataset, as shown in figure . protein-ligand complexes were , in number and corresponded to , complex unique chain-ligand pairs. binding affinity values were obtained from the rcsb database and protein chain-ligand pairs with corresponding binding affinity as were discarded to reduce statistical errors. this narrowed down the total complexes to , protein-ligand pairs. pocket information was extracted from the protein using ghecom ( ) and converted to mol format using chimera ( ) , which narrowed our results to pocket-ligand pairs. it narrowed down the size of the dataset to pocket-ligand pairs. 
we discarded other protein-ligand pairs with missing pssm profiles, secondary structure or dihedral angle information. this resulted in a total of pocket-ligand pairs, which corresponds to pocket-ligand pairs containing unique chains. training a deep learning network on raw information is known to result in longer convergence times and lower accuracy. we followed a conventional methodology for feature extraction and used the deep learning framework to learn the interaction between the protein pocket and ligand for affinity prediction. we used a comprehensive two-level feature extraction methodology, one level at the atomic scale and the other at the level of amino acids, utilizing structural information and protein sequence respectively. the atomic-level features were:
• bit one-hot (or all-null) encoding for atom types: b, c, n, o, p, s, se, halogen and metal.
• integer for hybridization
• integer representing the number of bonds with heavy atoms
• integer representing the number of bonds with hetero atoms
• bits ( if present) encoding properties defined with smarts patterns: hydrophobic, aromatic, acceptor, donor and ring
• float for partial charges
• integer to distinguish between ligand as - and protein as
we utilized the sequence information of the protein to obtain more features describing the protein pocket-ligand interaction.
• position-specific scoring matrix (pssm): the pssm is a matrix that represents the probability of mutation at each position of the sequence. it gives a bit-probability for each amino acid at each location. pssm profiles were obtained using psi-blast ( ) with swissprot as the subject database and an e-value threshold of . . chains with fewer than amino acids were removed from the input dataset.
• relative solvent accessibility (rsa): it is encoded by bit of information for each amino acid, indicating whether the residue is buried or exposed to the solvent. we set a threshold of % in rsa values. rsa was obtained using naccess ( ) . 
• secondary structure: it is encoded by bit of information describing the structure as coil, helix or sheet and was assigned using dssp ( , ).
• dihedral angles: they are encoded by bits of information holding the phi / psi angles of each amino acid and were obtained using dssp ( , ).
standard ligand features were calculated for ligands in our dataset using padel ( ) and d, d and chemical fingerprints, which include hybridisation, atom pair interactions and counts of various functional groups. we also used qikprop ( ) and canvas ( ) to derive admet (absorption, distribution, metabolism, excretion, and toxicity) properties, which include physical properties, solubility and partition coefficients. the exhaustive list of every property calculated is given in the appendix. this results in a d array of , dimensions containing the various properties of a given ligand. this array is used as the feature vector for the ligand, which is represented in mol format. the three-dimensional co-ordinates of atoms were converted into a d grid of resolution Å with Å spacing, centered on the centroid of the ligand. atoms outside each such grid were discarded. the atoms lying inside the grid were rounded to the nearest grid coordinate, and the features of atoms that fell on the same coordinate were summed. this projects the ligand-interacting residues into a three-dimensional cube with features representing the atomic as well as protein-based properties of each atom of the protein pocket. features were calculated at the atomic level (section . . ) for each atom of an amino acid and ligand. a -bit vector was calculated that uniquely identified each of the atoms in the d co-ordinates of a given protein-pocket and ligand complex. a d tensor of size m x m x m x , i.e. 
the coordinates (x, y, z) and the features, where m represents the number of atoms present in a complex, was constructed as the feature vector representing the given protein pocket-ligand pair. the d vector contains the protein-pocket features and was converted to a d grid using grid featurization (section . ). the d-featurized grid is essentially a d tensor, where the coordinates are approximated to the points on the grid. the dataset was converted to vectors and divided into training:validation:test sets in the ratio : : . convolutional neural networks ( ) have been used to capture spatial features in an image. we use cnns to capture the interaction between ligand and protein atoms in three-dimensional space. a network was constructed ( figure ) with d cnn layers of varying channel sizes [ , , ] and the non-linear activation relu after each layer. each d cnn layer had a filter of Å cube, which was used to perform the convolution operations. a maxpool ( ) layer, which acts in three dimensions to lower the dimensionality with a pool size of Å cube, and a batch normalization ( ) layer were added after each cnn layer; this in turn decreases the training time and helps in faster convergence. the cnn derives the relation among the d coordinates and their features, which would correspond well to the binding affinities of complexes. the latent features learnt by the last cnn layer were then flattened and passed through a fully connected neural network with [ , , ] neurons and relu as the non-linearity after each layer, in order to calculate the binding affinity of the protein pocket-ligand pair. dropout ( ) is added after each layer, with . as the dropout threshold, to prevent overfitting: randomly setting neurons to zero forces the network to learn alternative pathways. the dense network predicts a regression value of binding affinity, corresponding to a single output neuron. 
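the grid featurization described above (centre the box on the ligand centroid, discard atoms outside the box, snap each remaining atom to the nearest grid point, and sum the features of atoms that land on the same point) can be sketched as follows. this is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `featurize_grid` is hypothetical, the grid is stored sparsely as a dictionary instead of a dense tensor, and the default `half_width` and `spacing` values are placeholders because the actual values are not recoverable from the text.

```python
from collections import defaultdict


def featurize_grid(coords, features, centroid, half_width=10.0, spacing=1.0):
    """Voxelise atoms into a sparse grid {(i, j, k): summed feature vector}.

    coords:     list of (x, y, z) atom positions in angstroms
    features:   list of per-atom feature vectors, all of equal length
    centroid:   (x, y, z) of the ligand centroid, used as the box centre
    half_width: half the box edge in angstroms (placeholder value)
    spacing:    distance between grid points in angstroms (placeholder value)
    """
    n_feat = len(features[0]) if features else 0
    grid = defaultdict(lambda: [0.0] * n_feat)
    for xyz, feat in zip(coords, features):
        # Shift coordinates so that the ligand centroid is the box centre.
        rel = [p - c for p, c in zip(xyz, centroid)]
        # Discard atoms that fall outside the cubic box.
        if any(abs(r) > half_width for r in rel):
            continue
        # Snap to the nearest grid point, using non-negative integer indices.
        idx = tuple(int(round((r + half_width) / spacing)) for r in rel)
        # Atoms that share a grid point have their features added together.
        cell = grid[idx]
        for n, value in enumerate(feat):
            cell[n] += value
    return dict(grid)
```

a dense tensor for a 3d cnn can be filled from this dictionary by writing each feature vector at its (i, j, k) index; the sparse form is used here only to keep the sketch dependency-free.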
the training framework is shown in figure and a detailed layer network is shown in figure (a). the featurized protein-pocket grid was rotated to all possible combinations, such that the network is able to learn in an orientation-invariant form. the network was trained using the mean square error between the predicted and actual values as the loss function. the network was optimized using adam ( ) as the optimizer with a learning rate of e- and weight decay of . for epochs. the network was trained on an nvidia pascal gpu using pytorch ( ) as the framework. features were calculated at the amino acid level (section . . ) and were concatenated alongside the atomic level features (section . . ) for each atom of an amino acid. this results in a -bit vector uniquely identifying each of the atoms in the d co-ordinates of a given protein. a d tensor of size m x m x m x , i.e. the coordinates (x, y, z) and the features, where m represents the number of atoms present in a complex, is constructed as the feature vector of the protein pocket. the d vector contains the protein-pocket features and was converted to a d grid using grid featurization (section . ). the d featurized grid is essentially a d tensor, where the coordinates are approximated to the points on the grid. the ligands were separately featurized by calculating the ligand properties (section . ), which results in a d tensor. the dataset was converted to vectors and divided into training:validation:test sets in the ratio : : . a multi-input network was constructed ( ). the training framework is shown in figure and a detailed layer network is shown in figure (b). the featurized protein-pocket grid was rotated to all possible combinations, such that the network is able to learn in an orientation-invariant form. the featurized protein pocket-ligand pairs of the training set were passed through the corresponding network, which was trained using the mean square error between the predicted and actual values as the loss function. 
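the rotation augmentation mentioned above can be illustrated with a short sketch. the source only says that the grid was rotated to all possible combinations; the code below assumes this refers to the 24 axis-aligned rotations that map a cubic grid onto itself, which is an interpretation and not something the text confirms. the function names are hypothetical.

```python
from itertools import permutations, product


def determinant(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def cube_rotations():
    """Return the 24 proper rotations of a cube as 3x3 integer matrices."""
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            # Signed permutation matrix: row r holds signs[r] in column perm[r].
            m = [[signs[r] if c == perm[r] else 0 for c in range(3)]
                 for r in range(3)]
            # Keep proper rotations only; determinant -1 would be a reflection.
            if determinant(m) == 1:
                rotations.append(m)
    return rotations


def rotate_points(points, m):
    """Apply rotation matrix m to a list of (x, y, z) points about the origin."""
    return [
        tuple(sum(m[r][c] * p[c] for c in range(3)) for r in range(3))
        for p in points
    ]
```

applying each of these matrices to centroid-centred atom coordinates before grid featurization yields rotated copies of the same complex with an unchanged affinity label, which is one common way to encourage orientation invariance.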
the network was optimized using adam ( ) as the optimizer with a learning rate of e- and weight decay of . . the network was trained on an nvidia pascal gpu using pytorch ( ) as the framework. the performance of the models was quantified using mean absolute error (mae) and root mean square error (rmse). it was tested on validation and testing sets which were initially divided from our dataset as mentioned in the training section. lower error corresponds to better learning capacity of the model. standard deviation among the real and predicted values was also calculated. the mae, rmse and sd values are shown in table . for the purpose of training and testing models, one nvidia tesla p gpu cluster was used. computational time taken for featurization of dataset, training and testing were hours, hours and minutes respectively. two modules were trained. the first module was trained using a small set of features for protein and ligand, which were represented together in a d grid space. this approach has also been part of a previous study ( ) . however, the previous study uses a restricted ligand set that does not involve larger ligands. here we have used a diverse set of ligands as one of our inputs. with training of atomic model for epochs, mae score of . was achieved (table ) . we constructed another module that enabled us to improve on the ligand and protein based information. to this purpose, we used an increased feature vector size which amounted to bits in size for ligand and bits for each atom of protein. with training of composite model for only epochs, mae score of . was achieved ( table ) . the performance of our model was further evaluated using ligand-bound complexes from the kinase superfamily from pdb. the composite model outperformed the atomic model significantly and with lower standard deviation. in light of the ongoing coronavirus pandemic, we tested protein-ligand complexes from the coronavirus (cov) family. 
the covid- main protease is a key enzyme of the novel coronavirus strain implicated in the pandemic. a recent study tested the in vitro binding efficacy of the covid- virus main protease (mpro) with a potent reversible synthetic inhibitor, n ( ) . however, the highly potent inhibition by n made experimental determination of the binding affinity unachievable. using the structure of mpro at high resolution ( bqy: . angstrom), we have been able to predict the binding affinity of n to be . e+ nanomolar (table ) . this value agrees with the high affinity observed in the course of recent experiments ( ) . another study has deposited the complex of the covid- main protease with a broad-spectrum inhibitor x (n-( -tert-butylphenyl)-n-[ ( r)- -(cyclohexylamino)- -oxo- -(pyridin- -yl)ethyl]- h-imidazole- -carboxamide) ( ; unpublished). we used these complexes to predict their respective binding affinities, as these have not been made available. based on our model-based predictions, the broad-spectrum inhibitor x scores the highest affinity, followed by ligands z , n , z , z and z in order of decreasing binding affinity. we propose a deep-learning based approach to predict ligand (e.g., drug)-target binding affinity using only the structures of the target protein (pdb format) and the ligand (sdf format) as inputs. convolutional neural networks (cnn) were used to learn representations from the features extracted from these inputs and hidden layers in the affinity prediction task. we used two approaches to feature extraction, atomic level and composite level, and compared their performance using the same network. deep-learning based approaches have been implemented for the prediction of binding affinity. one study used atomic level features of the complex in a cnn based framework for binding affinity prediction ( ), while another used protein sequence level features in a cnn based framework for prediction ( ) .
another approach has been to use feature learning along with gradient boosting algorithms to predict binding affinity ( ) . here, we provide a composite model that incorporates tripartite structural, sequence and atomic level features with the atomic and other chemical features of the ligand to predict the binding affinity of a putative complex. we have trained two models to predict the binding affinity between protein and ligand in a given complex. this would help existing databases like rcsb pdb and pdbbind in filling missing binding affinity data for complexes. we have constructed a novel dataset that represents a diverse set of ligands and, using a novel deep learning based approach, have achieved significant improvement in the prediction of binding affinity of protein-ligand complexes. interestingly, our approach performed better without ligand coordinates as input. because of filtering and noise reduction, the dataset we constructed is smaller than pdbbind ( ), but we have overcome the constraints on ligand selection that were part of a previous study ( ) . although our dataset contains complexes compared to , complexes found in pdbbind, the ligands used in our training include unique ligands absent in pdbbind. this helps in achieving ligand diversity while training the cnn model. the similarity matrix constructed from the binary fingerprints of ligands used in the dataset supports our claim of improved ligand diversity (supplementary file s ). we have also eliminated the need to provide ligands in complex with the protein. thus a given protein pocket may be tested for the degree of binding of any given ligand. this can be extended to predicting potential binding partners for proteins in other superfamilies as well. it is also important to consider that docking scores and poses do not correlate reliably with mm/gbsa poses ( ).
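the ligand-diversity argument above relies on a pairwise similarity matrix over binary fingerprints. the text does not name the similarity measure, so the sketch below assumes the tanimoto coefficient, a common choice for binary fingerprints; the function names and the convention that two all-zero fingerprints have similarity 1 are likewise assumptions.

```python
from typing import List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits set in either fingerprint."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # assumption: two empty fingerprints match


def similarity_matrix(fingerprints: Sequence[Sequence[int]]) -> List[List[float]]:
    """Symmetric matrix of pairwise similarities; the diagonal is always 1."""
    return [[tanimoto(p, q) for q in fingerprints] for p in fingerprints]
```

a dataset with diverse ligands would show mostly low off-diagonal values in such a matrix.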
deelig can be used for a member of any protein superfamily and a non-peptide ligand, the docking pose of which may or may not be known. the code repository for the project (https://github.com/asadahmedtech/deelig) is publicly available. binding affinity predictions through deelig can be extended to protein-ligand complexes of protein superfamilies where the affinity is quantitatively unknown due to experimental limitations or where the potential for binding is yet to be explored in vitro. a webserver implementing deelig for easy online access would be useful for the general scientific community and is in the pipeline. a later version of deelig trained on a peptide ligand dataset will also be worked on. the following properties of the ligand were calculated using padel ( ): • ip (ev) (ionization potential) • ea (ev) (electron affinity) • #metab (likely metabolic reactions) • psa (van der waals sa of polar n and o atoms) • #nando, #ringatoms (number of atoms in rings) • #in (number of atoms in or membered rings) • #in (number of atoms in or membered rings) • #noncon (ring atoms that cannot form conjugated aromatic bonds) • #nonhatm (heavy atoms, i.e. non-hydrogen atoms) • ruleofthree • ruleoffive (lipinski violations) • qplogkhsa (binding to human serum albumin) • percenthuman-oralabsorption recent improvements to binding moad: a resource for protein ligand binding affinities and structures gapped blast and psi-blast: a new generation of protein database search programs analysis and comparison of d fingerprints: insights into database screening performance using eight fingerprint methods the mm/pbsa and mm/gbsa methods to estimate ligand-binding affinities expert opinion on drug discovery simboost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines receptive fields, binocular interaction and functional architecture in the cat's visual cortex batch normalization: accelerating deep network training by reducing
internal covariate shift structure of m pro from sars-cov- and discovery of its inhibitors a series of pdb related databases for everyday needs sc-pdb: a d-database of ligandable binding sites- years on dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features on the binding affinity of macromolecular interactions: daring to ask why proteins interact detection of multiscale pockets on protein surfaces using mathematical morphology adam: a method for stochastic optimization imagenet classification with deep convolutional neural networks binding moad (mother of all databases gradient-based learning applied to document recognition deepatom: a framework for protein-ligand binding affinity prediction deep learning in bioinformatics bindingmoad, a high-quality protein-ligand database deepdta: deep drug-target binding affinity prediction pytorch: tensors and dynamic neural networks in python with strong gpu acceleration. pytorch: tensors and dynamic neural networks in python with strong gpu acceleration ucsf chimera-a visualization system for exploratory research and analysis the rcsb protein data bank: redesigned website and web services very deep convolutional networks for largescale image recognition development and evaluation of a deep learning model for protein-ligand binding affinity prediction the pdbbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures the pdbbind database: methodologies and updates the pdbbind database: methodologies and updates padel-descriptor: an open source software to calculate molecular descriptors and fingerprints pdb-wide collection of binding data: current status of the pdbbind database binding of nicotinoids and the related compounds to the insect nicotinic acetyicholine receptor schrödinger release - : qikprop, schrödinger, llc supplementary files a. similarity matrix of ligands used in dataset (supplementary file s ) b. 
dataset details (supplementary file s ) c. dataset distribution aa acknowledges funding awarded by the indian academy of sciences, bangalore ( ).bm would like to acknowledge tata trusts-tdu fellowship for phd awarded to her from to . all authors acknowledge ncbs for infrastructural support. the authors declare no conflict of interest. key: cord- - rf xs d authors: arokiaraj, mark christopher; wilson, jarad title: a novel method of immunomodulation of endothelial cells using streptococcus pyogenes and its lysate date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rf xs d background coronary artery diseases and autoimmune disorders are common in clinical practice. in this study, a novel method of immune-modulation to modify the endothelial function was studied to modulate the features of the endothelial cells, and thereby to reduce coronary artery disease and other disorders modulated by endothelium. methods huvec cells were seeded in the cell culture, and streptococcus pyogenes were added to the cell culture, and the supernatant was studied for the secreted proteins. in the second phase, the bacterial lysate was synthesized, and the lysate was added to cell culture; and the proteins in the supernatant were studied at various time intervals. results when streptococcus pyogenes alone was added to culture, e cadherin, angiostatin, epcam and pdgf-ab were some of the biomarkers elevated significantly. hcc , igfbp and timp were some of the biomarkers which showed a reduction. when the lysate was added, the cell-culture was maintained for a longer time, and it showed the synthesis of immune regulatory cytokines. heatmap analysis showed a significant number of proteins/cytokines concerning the immune/pathways, and toll-like receptors superfamily were modified. blc, il , bmp , parc, contactin , il rb, nap (cxcl ), eotaxin were maximally increased. by principal component analysis, the results observed were significant. 
conclusion there is potential for a novel method of immunomodulation of the endothelial cells, which have pleiotropic functions, using streptococcus pyogenes and its lysates. streptococcus pyogenes and lysate on immune regulation background immune-related disorders are common in clinical practice. though the disease most commonly handled by the immune system is infection, most other disorders, such as autoimmune disorders, leukemias, and even non-communicable diseases, have an immune basis. a broad-spectrum treatment method to modulate immune disorders would be useful. rheumatic heart diseases are common in the general population in asian countries, with a prevalence of about . to / in the community based on echocardiography screening in schools in india. streptococcus pyogenes infections are associated with sore throats, and they are also associated with rheumatic heart diseases. rheumatic fever is associated with migratory joint pains and occasionally pancarditis. in patients with rheumatic heart diseases, the incidence of coronary artery diseases is low. [ ] [ ] [ ] [ ] this was observed in some studies in india and neighboring countries. hence, in this study streptococcus pyogenes infection was used to study the immune modulatory potential in endothelial cells. streptococcus pyogenes is known to produce a unique enzyme (ides) that cleaves immunoglobulin g in the blood. [ ] [ ] [ ] its therapeutic usefulness in cleaving igg antibodies has been tested. , streptococcus pyogenes secretes serum opacity factor, which increases the uptake of hdl and can thereby reduce atherosclerosis. , in this study, streptococcus pyogenes was used to infect the endothelial cells, and the endothelial response was evaluated. in the later part of the study, the streptococcus pyogenes lysate was applied to the endothelial cells, and the results were evaluated.
microorganisms, the diseases they cause, and susceptibility to infections can prevent autoimmune disorders, though this phenomenon is not well studied. autoimmune disorders are common in migrant populations, especially those of asian ethnicity. , this is commonly attributed to the hygiene hypothesis, but the exact mechanism is speculative and has not been decisively studied. in our center, out of consecutive valve replacement surgeries, six cases underwent concomitant coronary artery bypass surgery. a clear etiology of rheumatic heart disease was not established in any of these cases, and the primary etiologies were ischemic mr and degenerative sclerotic calcific aortic valve. the study was performed in search of novel applications of streptococcus pyogenes in regulating immune functions, and of its related cardiovascular and pleiotropic effects. streptococcus pyogenes were obtained from the laboratory (streptococcus pyogenes rosenbach -atcc, tm lancefield group a). the bacteria were cultured by seeding, and the colonies were inoculated with endothelial cells. serial observations of the endothelial cells were made at regular intervals by microscopy. the supernatant was collected, and inflammatory markers were studied at serial time intervals. in the second phase of the study, the streptococcus pyogenes lysate was prepared from the same bacteria. the lysate was added to the endothelial cells, and the endothelial cell response was studied at serial intervals - , , , , hours, alongside control samples. the secreted biomarkers were studied and the results recorded. biomarkers showing no variation across all the samples (zero variance) were excluded from the data profile analysis since they do not help to distinguish samples from each other.
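the zero-variance filter just described, together with the centering and scaling applied before clustering, can be sketched as follows. the analyses themselves were run in r; this python sketch is only illustrative. the function names and the example biomarker labels are hypothetical, and the use of the sample standard deviation (n - 1 denominator, as r's scale() does) is an assumption.

```python
from statistics import mean, stdev
from typing import Dict, List


def drop_zero_variance(data: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Remove biomarkers whose value is identical in every sample."""
    return {name: values for name, values in data.items() if len(set(values)) > 1}


def standardize(values: List[float]) -> List[float]:
    """Centre on the mean and scale by the sample standard deviation (assumed n - 1)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def preprocess(data: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Filter first, so that no biomarker with zero standard deviation is divided by."""
    return {name: standardize(values) for name, values in drop_zero_variance(data).items()}


# hypothetical readings for two biomarkers across three samples
example = preprocess({"marker_a": [5.0, 5.0, 5.0], "marker_b": [1.0, 2.0, 3.0]})
```

filtering before scaling matters in practice: a constant biomarker has zero standard deviation, so standardizing it would divide by zero.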
the biomarker values were standardized (centering and scaling) by subtracting the average and then dividing by the standard deviation. the standardized data were plotted in a heatmap with hierarchical clustering by euclidean distance. the various expression levels of multiple biomarkers may come from a common underlying factor/mechanism. principal component analysis (pca) decomposes the data set into different principal components (pcs) sorted by their contribution to the total variance in the dataset. these pcs are linear combinations of the standardized biomarker values. by observing where a sample lies on the plot of the first pcs, which explain the most variation, we can discern the pattern among samples. all the analyses were conducted in the r programming language v . . (r core team ). the incidence of rheumatic heart disease shows a decreasing trend, while the incidence of coronary artery disease has been rising recently. data from our center and from other centers show a low association of coronary artery disease with rheumatic valvular heart diseases, irrespective of age and metabolic characteristics. , the incidence of mixed procedures, i.e., combined valve and bypass surgeries, is < %, and the proportion of these combined procedures with a rheumatic valvular etiology would be lower still, even in high volume centers like tums. specific biomarkers like hcc , igfbp , pdgf-aa, and timp decreased in level compared to controls. hemofiltrate cc is a chemokine, which attracts and acts through ccr receptors. it is widely secreted by various tissues. insulin-like growth factor binding protein reduces the risk of diabetes. igfbp is implicated in the regulation of igf in most tissues. blocking of igf bp results in the reduction of tumor and metastasis. platelet-derived growth factor aa is involved in the migration of smooth muscle cells.
[ ] [ ] [ ] reduction in timp, a metallopeptidase inhibitor that also has anti-angiogenic activity, is associated with a reduction in adverse clinical events in acute kidney injuries. mif (macrophage migration inhibitory factor) and lif (leukemia inhibitory factor) levels were reduced. mif is a widely expressed pleiotropic cytokine, and it is involved in the stimulation of other inflammatory cytokines like tnf alpha, ifn gamma, il , il , cxcl etc. lif is involved in cell differentiation and maturation, and its stimulation leads to jak/stat and mapk cascades. increased activity was found in e cadherin, which is involved in cell-to-cell interactions and has tumor suppressor effects. angiostatin, dan, is an inhibitor of bmp and tnf. angiostatin is engaged in the reduction of angiogenesis. , there is a marked increase in epcam, whose complex proteins promote transcription factor-mediated pluripotency reprogramming, and in cfg riic and pdgf ab. platelet-derived growth factor has active angiogenic and mitogenic potential and acts on various tissues. angiogenin may maintain blood homeostasis, participates in anti-inflammatory activity, and has antibacterial and antiviral properties. when the cells were treated with streptococcus pyogenes lysate, the level of blc, the b lymphocyte chemoattractant protein (cxcl ), was increased. contactin is a neuronal membrane protein, and it acts as an active cell adhesion molecule. - il is an inflammatory protein, and it was induced after lysate treatment. il induces the production of gcsf and chemokines like cxcl and . il is strongly associated with the chronic inflammation seen in autoimmune disorders. parc (parkin like ubiquitin ligase) is a cytoplasmic anchor protein to p -associated protein complexes. [ ] [ ] [ ] cxcl is involved in neutrophil chemotaxis, adhesion to the endothelial cells, and transendothelial migration of the cells.
[ ] [ ] [ ] chemokine cxcl is engaged in neutrophil-platelet crosstalk, and it is also actively involved in the growth of renal cell carcinoma. il rb is the receptor for il , and it actively participates in inflammatory signaling. trappin is a serine protease inhibitor, and it has anti-inflammatory actions on the mucosal surfaces. it also has anti-retrovirus activities on the mucosal surfaces. sdf alpha and its chemokine receptor play a significant role in hematopoietic cell mobilization, cancer metastasis, and ischemic injury repair in myocardial infarction tissues. fgf (fibroblast growth factor) has an active role in tissue repair. ccl is a mucosa-associated epithelial chemokine; it is associated with the recruitment of cells and helps in t and b cell accumulation at mucosal surfaces. - bb (cd ) signalosome promotes t cell proliferation and survival and results in increased t cell effector functions. , vcam is an inflammatory protein involved in cell-to-cell adhesion, and it also effectively induces angiogenesis. , gitr (glucocorticoid-induced tnfr-related protein) is expressed by t cells and its ligands, and it boosts t cell activity. gitr agonistic stimulation is emerging as a promising therapeutic concept. siglec is a leucocyte receptor that recognizes sialic acid structures and helps in leucocyte recruitment. vegfr is a receptor for vascular endothelial growth factors c and d, and it is involved in lymphangiogenesis and to some extent in vegf a induced angiogenesis as well. il overcomes insulin resistance by promoting anti-inflammatory macrophage differentiation in adipose tissue. cd is expressed on the surfaces of the endothelial cells, though it is primarily expressed by lymphoid tissues. it is also expressed in non-lymphomatous tumours. cd signalling is involved in proliferation, differentiation and survival (anti-apoptosis). tgf b is expressed in the endothelium, and it plays an essential role in angiogenesis.
gp is a glycoprotein that participates in il mediated inflammation and vascular pathologies, and it also exerts negative feedback control. mif is a widely expressed pleiotropic cytokine, and it is involved in the stimulation of other inflammatory cytokines like tnf alpha, ifn gamma, il , il , cxcl etc. , mif is elevated in type and diabetes. fas activation is associated with autoimmune disorders, which can be modulated by its downregulation. timp are tissue inhibitors of metalloproteinases, which are involved in inflammation in cancer cells. igfbp are inhibitory to some tumours and stimulatory to others. il r is involved in mediating inflammation leading to myocarditis. hence a reduction in these receptors could reduce inflammatory changes. , dickkopf family proteins are active modulators of wnt pathways, and their effects are mostly inhibitory. tnf r has proinflammatory and some anti-inflammatory aspects as well. its stimulatory and inhibitory effects have attracted considerable interest in the treatment of autoimmune diseases and cancer. follistatin is actively involved in the activin a-follistatin regulation of cardiac inflammation and fibrosis. heparin-binding epidermal growth factor-like growth factor inhibits cytokine-induced nf-κb activation. the heat map analysis shows a significant change in many proteins, from which it is possible to infer that streptococcus has a role in immune regulation. the above observations indicate that the immune system undergoes various modifications upon direct challenge by streptococcus pyogenes. certain parameters are increased, and some are decreased. in the long term, immune memory and its regulation are complex and subject to many positive and negative feedback regulations. modulations are also seen when the lysate is added. hence, streptococcus pyogenes is associated with changes in the immune system, which can influence the regulation of immune homeostasis in individuals.
the negative impact of rheumatic heart diseases may be accompanied by a positive influence in modifying immune-related functions, possibly rendering significant protection. it is indeed difficult to predict the future immune response to any certain extent. however, it can be inferred that the endothelial response could be significant and probably chaotic, and that it determines transcription and gene regulation. chaos does not necessarily lead to dysfunction; at certain times and in certain situations, it could be a natural method of selection that strengthens immune functions. high-speed optical laser chaos signals, when studied, are found to synchronize and show many features. , decoding such chaotic signals in the immune system could lead to a better understanding of the immune system and change its treatment, both of which are significant challenges at present. in recent times the coronavirus (covid- ) pandemic has been rampant; the infection affects countries selectively, and mortality statistics vary between countries. these differences could be due to varied climatic conditions and to the immune response to the virus. the southeast asian countries and the indian subcontinent are relatively less affected so far, at the time of writing. streptococcus pyogenes, tropical bacterial infections and other viral infections are common in these areas, and they could provide cross-immunity to coronaviral infections. bacteria can synthesize restriction enzymes like nucleases and inhibit viruses. it has been shown that bacterial presence can reduce the intensity of viral infections. our study also reflects the immune regulation changes due to streptococcus pyogenes as well as its lysate. in the theory of natural selection, the environment, or nature in its various forms, could offer protection through selective mechanisms in various geographic locations. these need not be simple mechanisms; they could also be chaotic or cross-immunity mechanisms.
the commonly available streptococcus bacteria and the common viruses could be the mechanism of choice in the form of selection by unhygienic means. possibly these protective mechanisms are not well studied by the scientific and social communities, whose primary focus is often on tertiary care treatment modalities. the rarity of diabetes in rheumatic heart disease was observed by legendary physicians like joslin and steinchron in the early s and 's respectively. , our study also suggests that various metabolic modulators are stimulated and some inhibited. joseph barach made similar observations in that period and attributed them to changes in immune regulation. our observations, such as increased il and decreased fas and dickkopf proteins, also indicate possible metabolic protection. antibodies seen in rheumatic fever are also found in antiphospholipid antibody syndromes. however, autoimmune diseases are very rare in rheumatic heart disorders, or the two are possibly mutually exclusive through negative feedback mechanisms, at least in the clinical experience of the author. further studies need to be performed to observe the immune changes in animals after direct streptococcus challenge and after lysate administration. specifically, the immune changes regulating autoimmune disorders, cancer regulation, atherosclerotic processes, and host defense against viruses need to be studied in animal models. streptococcus pyogenes and its lysate have immunomodulatory actions when tested with endothelial cells, which have pleiotropic functions. further studies need to be performed to identify their potential benefits.
current status of rheumatic heart disease in india mortality prediction in indian cardiac surgery patients: validation of european system for cardiac operative risk evaluation ii does parsonnet scoring model predict mortality following adult cardiac surgery in india concomitant coronary artery bypass graft and aortic and mitral valve replacement for rheumatic heart disease: short-and mid-term outcomes prevalence of cad in rheumatic heart disease: is time to redefine the age for screening coronary angiography? iosr journal of dental and medical sciences ides, a highly specific immunoglobulin g (igg)-cleaving enzyme from streptococcus pyogenes, is inhibited by specific igg antibodies generated during infection ides, a novel streptococcal cysteine proteinase with unique specificity for immunoglobulin g streptococcal ides and its impact on immune response and inflammation ides: a bacterial proteolytic enzyme with therapeutic potential administration of immunoglobulin g-degrading enzyme of streptococcus pyogenes (ides) for persistent anti-adamts antibodies in patients with thrombotic thrombocytopenic purpura in clinical remission the structure and function of serum opacity factor: a unique streptococcal virulence determinant that targets high-density lipoproteins streptococcal serum opacity factor increases the rate of hepatocyte uptake of human plasma high-density lipoprotein cholesterol genetics of autoimmune diseases: insights from population genetics the incidence and prevalence of systemic lupus erythematosus in the uk the prevalence and incidence of systemic lupus erythematosus in ccl serves as a novel prognostic factor and tumor suppressor of hcc by modulating cell cycle and promoting apoptosis growth factor binding protein (igfbp- ) and the risk of developing type igfbp- -taking the lead in growth, metabolism and cancer regulatory effects of platelet-derived growth factor-aa homodimer on migration of vascular smooth muscle cells role of platelet-derived growth 
factors in physiology and medicine tissue inhibitor metalloproteinase- (timp- )⋅igf-binding protein- (igfbp ) levels are associated with adverse long-term outcomes in patients with aki macrophage migration inhibitory factor (mif): a key player in protozoan infections the regulation of leukemia inhibitory factor tumor suppressor gene e-cadherin and its role in normal and malignant cells the dan family: modulators of tgf-β signaling and beyond members of the dan family are bmp antagonists that form highly stable noncovalent dimers epithelial cell adhesion molecule (epcam) complex proteins promote transcription factor-mediated pluripotency reprogramming mechanism of platelet-derived growth factor (pdgf)aa, ab, and bb binding to alpha and beta pdgf receptor three decades of research on angiogenin: a review and perspective the membrane bound bacterial lipocalin blc is a functional dimer with binding preference for lysophospholipids bace activity regulates cell surface contactin- levels contactin- /tag- , active on the front line for three decades expression in the cardiac purkinje fiber network an overview of il- function and signaling calpain-mediated processing of p -associated parkin-like cytoplasmic protein (parc) affects chemosensitivity of human ovarian cancer cells by promoting p subcellular trafficking dimerization of cul and parc is not required for all cul functions and mouse development platelet-derived chemokine cxcl dimer preferentially exists in the glycosaminoglycan-bound form: implications for neutrophil-platelet crosstalk characterization of the orphan human chemokine receptor type receptor (cxcl ) on t lymphocyte cells the cxcl /cxcr / axis is a key driver in the growth of clear cell renal cell carcinoma platelet-derived chemokines cxc chemokine ligand (cxcl) , connective tissue-activating peptide iii, and cxcl differentially affect and cross-regulate neutrophil adhesion and transendothelial migration interleukin receptor signaling: master regulator of 
intestinal mucosal homeostasis in mice and humans association of il , il ra, and il rb polymorphisms with benign prostate hyperplasia in korean population il- regulates memory t cell development and the balance between th and follicular th cell responses during an acute viral infection eotaxin- generation is differentially regulated by lipopolysaccharide and il- in monocytes and macrophages exogenous bmp- facilitates the recovery of cardiac function after acute myocardial infarction through counteracting tgf-β signaling pathway trappin- /elafin: a novel innate anti-human immunodeficiency virus- molecule of the human female reproductive tract quantitation of cxcr expression in myocardial infarction using mtc-labeled sdf- fgf- expression enhances the performance of bioengineered skin identification of a novel chemokine (ccl ), which binds ccr (gpr ) cd ( - bb) signalosome: complexity is a matter of trafs - bb (cd ), an inducible costimulatory receptor, as a specific target for cancer therapy emerging roles of vascular cell adhesion molecule- (vcam- ) in immunological disorders and cancer ig-like domain of vcam- is a potential therapeutic target in tnfα-induced angiogenesis hera-gitrl activates t cells and promotes anti-tumor efficacy independent of fcγr-binding functionality modulation of immune tolerance via siglec-sialic acid interactions vegfr- in adult angiogenesis modulation of the il- /il- axis in obesity by il- rα tina marie albertson, and mckesson specialty health/us oncology research, cd expresssion in non lymphomatous malignancies tgf-beta receptor function in the endothelium roles of il- -gp signaling in vascular inflammation the role of mif in type and type diabetes mellitus dual role of fas/fasl-mediated signal in peripheral immune tolerance matrix metalloproteinases and tissue inhibitor of metalloproteinases are essential for the inflammatory response in cancer cells insulin-like growth factor binding protein- is a novel mediator of p inhibition of 
insulin-like growth factor signaling critical cytokine pathways to cardiac inflammation il- rα , a decoy receptor for il- acts as an inhibitor of il- -dependent signal transduction in glioblastoma cells function and biological roles of the dickkopf family of wnt modulators tumor necrosis factor receptor- (tnfr ): an overview of an emerging drug target the expression and role of activin a and follistatin in heart failure rats after myocardial infarction heparin-binding epidermal growth factor-like growth factor inhibits cytokine-induced nf-κb activation and nitric oxide production via activation of the phosphatidylinositol -kinase pathway on chaotic dynamics in transcription factors and the associated effects in differential gene regulation experimental demonstration of anticipating synchronisation in chaotic semiconductor lasers with optical feedback demonstration of optical synchronization of chaotic external-cavity laser diodes microbiota regulates immune defense against respiratory tract influenza a virus infection streptococcus pyogenes nuclease a (spna) mediated virulence does not exclusively depend on nuclease activity together forever: bacterial-viral interactions in infection and immunity journal of researches into the natural history and geology of the countries visited during the voyage of the h.m.s. 
beagle round the world, under the command of captain fitz the 'hygiene hypothesis' for autoimmune and allergic diseases: an update the blood sugar and cardiac involvement in rheumatic fever the incidence of rheumatic heart disease among diabetic patients overlapping humoral autoimmunity links rheumatic fever and the antiphospholipid syndrome key: cord- -n hzpbyf authors: wang, lina; chen, fengzhen; guo, xueqin; you, lijin; yang, xiaoxia; yang, fan; yang, tao; gao, fei; hua, cong; ding, yuantong; cai, jia; yang, linlin; huang, wei; xu, zhicheng; wan, bo; tong, jiawei; peng, chunhua; yang, yawen; zhang, lei; liu, ke; zhou, feiyu; zhang, minwen; tan, cong; zeng, wenjun; wang, bo; wei, xiaofeng title: virusdip: virus data integration platform date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n hzpbyf motivation the coronavirus disease (covid- ) pandemic poses a huge threat to human public health. viral sequence data plays an important role in the scientific prevention and control of epidemics. a comprehensive virus database will be vitally useful for virus data retrieval and deep analysis. to promote sharing of virus data, several virus databases and related analysis tools have been created. results to facilitate virus research and promote the global sharing of virus data, we present here virusdip, a one-stop service platform for archive, integration, access, and analysis of virus data. it accepts the submission of viral sequence data from all over the world and currently integrates data resources from the national genebank database (cngbdb), global initiative on sharing all influenza data (gisaid), and national center for biotechnology information (ncbi). moreover, based on the comprehensive data resources, the blast sequence alignment tool and multi-party secure computing tools are deployed for multi-sequence alignment, phylogenetic tree building and global trusted sharing. 
virusdip is gradually establishing cooperation with more databases, and paving the way for the analysis of virus origin and evolution. all public data in virusdip are freely available for all researchers worldwide. availability https://db.cngb.org/virus/ contact weixiaofeng@cngb.org the covid- pandemic caused by hcov- is a global health emergency and poses a huge threat to human public health (harapan et al., ; sohrabi et al., ; velavan et al., ). viral sequence data can provide an important scientific basis for the traceability and evolution of viruses (lu et al., ), the development of viral nucleic acid detection reagents, vaccines, and drugs (lundstrom, ; lundstrom, ; zhang et al., ), and play an important role in clinical assistant diagnosis and scientific prevention and control of epidemics. a one-stop comprehensive virus database will be vital for virus data retrieval and deep analysis. several organizations at home and abroad have constructed virus databases and developed related analysis tools, such as ncbi virus (hatcher et al., ), gisaid (elbe et al., ; shu et al., ). virusdip is established and maintained using the django web framework (https://www.djangoproject.com/), the python programming language and pycharm (https://www.jetbrains.com/pycharm/). centos (https://www.centos.org/) is chosen as the operating system of our server. in terms of system architecture, we use nginx (http://nginx.org/) to provide static resource access, uwsgi (https://uwsgi-docs.readthedocs.io/en/latest/) to deploy query and download services, and postgresql to store metadata. in terms of data security, the database is deployed in the cngb (wang et al., ), which has passed the three-level review of information security level protection and the protection capability review of trusted cloud service. moreover, all services have been deployed with high availability. the main functions of virusdip include archive, integration, access, and analysis of virus data ( fig. ). 
virusdip archives virus data submitted by cngbdb users and integrates these data with public data from different data sources to provide users with free data access. users can perform blast alignment based on the retrieved virus data, and can perform individual and multi-party genome analysis on the hcov- genome data and visually display its phylogeny. we have developed a convenient and rapid submission process for viral sequence data. the submission portal is at https://db.cngb.org/cnsa/init?type=virus. for data compatibility, the virus data standard integrates the virus and pathogen sample data standard of the international nucleotide sequence database collaboration (insdc) (karsch-mizrachi et al., ), the hcov- data standard of gisaid, and the sample data standard of the covid- genomics uk (cog-uk) consortium. all submitted data must be validated by machine and manual curation to normalize sequences and sample attributes, and to ensure data quality. virusdip currently integrates the virus data submitted by cngbdb users and the shared virus data of gisaid and ncbi. all shared data follow the original data agreements of their source databases. on mar , , cngb and gisaid reached a strategic cooperation agreement and plan to gradually establish a gisaid mirror database in virusdip. at present, the hcov- sequence metadata of gisaid can be synchronized to virusdip in real time. virusdip has deployed the blast sequence alignment tool and established thematic sequence alignment databases for hcov- and for comprehensive virus data, so that researchers can quickly perform sequence alignment on a specific virus or across various viruses and analyze the similarity of virus sequences. data security and privacy protection are new challenges in promoting the open sharing of biological big data. 
blockchain and secure multi-party computation technology have provided a new technical concept for biological big data governance and have been applied to medical and genomic data analysis (ahmad et al., ; bogdanov et al., ; vazirani et al., ; veeningen et al., ). based on nextstrain (hadfield et al., ) and visualization tools, virusdip has deployed a multi-party genome analysis platform for hcov- , which is, to our knowledge, the first hcov- analysis tool based on blockchain and secure multi-party computation in china. the tool can display the evolution of the existing hcov- sequences on the platform, including the phylogenetic tree and geographic distribution. users can use their private data and the public data of virusdip to perform combined computing to obtain the relative position of private virus sequences on the current public evolution tree, which protects data security while promoting the sharing and application of virus data. virusdip has built an efficient batch submission process for virus data and integrated comprehensive virus data from multiple data sources. all integrated virus data can be freely accessed in one stop. virusdip also provides users with a real-time, dynamic online joint evolution analysis tool for phylogenetic analysis of hcov- . virusdip is committed to building a comprehensive virus data platform for archive, integration, access, and analysis. in the future, we will continue to improve the data standards for archiving virus data. moreover, we will gradually develop or deploy new visualization and evolution analysis tools for virus data, such as literature tracking, variation analysis, browser and phylogeny, to further promote the global trusted sharing of virus data and data mining. in addition, we will continue to strengthen data sharing cooperation with gisaid, expand cooperation with other virus databases and support more projects to promote the establishment of a global cooperation network for life resources. 
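the secure multi-party computation idea mentioned above can be illustrated with a minimal sketch. this is not the protocol used by virusdip, which the text does not describe; it is a generic additive secret-sharing example in which several parties learn the sum of their private values (for instance, hypothetical per-laboratory counts of sequences carrying a given variant) without revealing any single value. the modulus `PRIME` and the function names `share` and `secure_sum` are assumptions made for illustration.

```python
import secrets

# hypothetical modulus; any prime larger than the largest possible sum works
PRIME = 2**61 - 1


def share(value, n_parties):
    """split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """simulate all parties in one process: each value is shared, each party
    adds the shares it received, and only these partial sums are combined."""
    n = len(private_values)
    # row i holds the shares produced by party i, one per receiving party
    all_shares = [share(v, n) for v in private_values]
    # party j only ever sees column j, which looks uniformly random
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    print(secure_sum([3, 5, 11]))
```

in a real deployment the shares would be exchanged over a network between independent parties; here they are simulated in one process only to show why the partial sums reveal the total but not the individual inputs.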
what is blockchain technology and its significance in the current healthcare system? a brief insight implementation and evaluation of an algorithm for cryptographically private principal component analysis on genomic data data, disease and diplomacy: gisaid's innovative contribution to global health nextstrain: real-time tracking of pathogen evolution coronavirus disease (covid- ): a literature review virus variation resource -improved response to emergent viral outbreaks the international nucleotide sequence database collaboration genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding rna-based drugs and vaccines latest development on rna-based drugs and vaccines gisaid: global initiative on sharing all influenza data -from vision to reality world health organization declares global emergency: a review of the novel coronavirus (covid- ) implementing blockchains for efficient health care: systematic review enabling analytics on sensitive medical data with secure multi-party computation the covid- epidemic the china national genebank horizontal line owned by all a genomic perspective on the origin and emergence of sars-cov- the novel coronavirus resource we gratefully acknowledge all collaborators and authors from originating and submitting laboratories of virus data integrated by virusdip. conflict of interest: none declared. key: cord- -zesg atp authors: iacullo, carly; diesburg, darcy a.; wessel, jan r. title: non-selective inhibition of the motor system following unexpected and expected infrequent events date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zesg atp motor inhibition is a key control mechanism that allows humans to rapidly adapt their actions in response to environmental events. 
one of the hallmark signatures of rapidly exerted, reactive motor inhibition is the non-selective suppression of cortico-spinal excitability (cse): unexpected sensory stimuli lead to a suppression of cse across the entire motor system, even in muscles that are inactive. theories suggest that this reflects a fast, automatic, and broad engagement of inhibitory control, which facilitates behavioral adaptations to unexpected changes in the sensory environment. however, it is an open question whether such non-selective cse suppression is truly due to the unexpected nature of the sensory event, or whether it is sufficient for an event to be merely infrequent (but not unexpected). here, we report data from two experiments in which human subjects experienced both unexpected and expected infrequent events during a simple reaction time task while cse was measured from a task-unrelated muscle. we found that expected infrequent events can indeed produce non-selective cse suppression – but only when they occur during movement initiation. in contrast, unexpected infrequent events produce non-selective cse suppression even in the absence of movement initiation. moreover, cse suppression due to unexpected events occurs at shorter latencies compared to expected infrequent events. these findings demonstrate that unexpectedness and stimulus infrequency have qualitatively different suppressive effects on the motor system. they also have key implications for studies that seek to disentangle neural and psychological processes related to motor inhibition and stimulus detection. motor inhibition is a core component of controlled and flexible human behavior. the rapid interruption of active motor representations allows humans to momentarily cancel ongoing movements and movement plans, which in turn allows them to reevaluate whether those movements are still appropriate when environmental circumstances suddenly change. 
in the laboratory, motor inhibition is usually assessed in tasks like the stop-signal task (logan, cowan, and davis, ), where it allows humans to rapidly stop actions even after their initiation. in such tasks, subjects are explicitly instructed to stop an action following a previously instructed infrequent signal, which follows the response prompt on a minority of trials (verbruggen et al., ). because subjects in tasks like the stop-signal task expect that these infrequent stop-signals will occur on a subset of trials, successful action-stopping in such tasks results from the implementation of both proactive and reactive inhibitory control mechanisms (aron, ; kenemans, ). proactive inhibition denotes the anticipatory implementation of control processes during the expectation of a stop-signal, while reactive inhibition denotes the cascade of processes that is triggered by the stop-signal itself (verbruggen et al., ; chikazoe et al., ; jaffard et al., ). within the stop-signal task, the signals that instruct participants to cancel an action are infrequent events. however, since stop-signals are explicitly part of the task instruction, their occurrence is also expected. notably, in recent years, work on tasks that involve unexpected sensory events (e.g., the novelty-oddball paradigm or the cross-modal oddball task; courchesne et al., ; parmentier et al., ) has shown that such events automatically induce motor inhibition, even when there is no instruction to ever stop an action. in other words, unexpected sensory events induce a reflexive engagement of motor inhibition, and they can do so even in the absence of proactive control (i.e., when the task does not involve an instruction to exert inhibitory control; wessel, ). this automatic recruitment of reactive motor inhibition after unexpected events is evident on many levels of observation, including behavior, brain activity, and physiological changes of the motor system (cf. wessel & aron, , for a review). 
in behavior, this engagement of motor inhibition is suggested by the fact that unexpected events presented during forced-choice reaction time tasks lead to a slowing of the prompted motor responses (dawson et al., ; ljungberg et al., ). concomitantly, in the brain, unexpected events activate some of the same cortical and subcortical circuitry that is involved in stopping actions in tasks like the stop-signal task (bockova et al., ; wessel et al., ; fife et al., ). however, the inhibitory effects that unexpected events exert on the motor system are perhaps most evident from physiological measurements of cortico-spinal excitability (cse). cse can be non-invasively probed using transcranial magnetic stimulation (tms) and electromyography (barker et al., ; rothwell et al., ; bestmann & krakauer, ). by applying single pulses of tms to the contralateral motor cortex representation of a specific muscle, a motor evoked potential is produced in the electromyogram of that muscle. the amplitude of this motor evoked potential provides a proxy for the net cse of the underlying corticomotor tract. in tasks like the stop-signal task, cse of the muscles involved in the action is suppressed when a stop-signal occurs (coxon et al., , ). in addition, several studies have shown that this suppression of the motor system extends even beyond the muscle group that is targeted for stopping (badry et al., ; cai, oldenkamp, and aron, ; majid et al., ; wessel et al., , ). subsequent studies have found that the proactive-reactive control balance is a key factor in determining this non-selective property of motor inhibition: the more proactive control is exerted, the more selectively it can be applied. in turn, the more stopping relies on reactive mechanisms, the greater the non-selective suppression of cse (greenhouse, oldenkamp, and aron, ; duque et al., ). 
in other words, non-selective cse suppression is a hallmark signature of the reactive implementation of motor inhibition. consequently, in line with the proposal that unexpected sensory events lead to an automatic recruitment of the brain's reactive inhibition circuitry even when stopping is not explicitly required (i.e., in the absence of proactive control), such events do indeed also produce non-selective suppression of cse (wessel & aron, ). in that particular study, subjects performed a verbal reaction time task, in which unexpected sounds were infrequently presented prior to the imperative stimulus. this led to cse suppression at a task-unrelated hand muscle, specifically at ms following sound onset. the same is true when a task is performed with the legs and cse is measured at the hand (dutra et al., ). such studies of unexpected sensory events (see also novembre et al., , ) have led us to propose that unexpected events automatically activate the same reactive inhibitory control systems that are recruited when actions have to be stopped actively in tasks like the stop-signal task. specifically, we propose that the purpose of this automatically engaged inhibitory control effort is to rapidly interrupt ongoing behavior, which purchases time for the cognitive system to resolve the surprise produced by the unexpected event. this additional processing time can be used to evaluate whether ongoing motor plans are still appropriate in light of the sudden unexpected change in environmental regularity (wessel & aron, ). however, there is a notable alternative to this surprise-inhibition theory. 
specifically, while the two types of psychological events that are known to result in non-selective cse suppression (stop-signals and unexpected events) differ in the degree to which they produce surprise (stop-signals are expected, unexpected events are not), they also have a notable commonality: they are both infrequent events within the context of their respective tasks. stop-signals typically occur in around - % of trials in the stop-signal task (verbruggen et al., ; though see dykstra et al., for an exception). similarly, in typical studies of unexpected events, only about - % of trials involve an unexpected event. therefore, it is possible that the infrequency of a stimulus alone can account for the presence of non-selective cse suppression after both stop-signals and unexpected events. if that is the case, surprise itself is not necessary to explain the presence of non-selective cse suppression, and may in fact not uniquely engage motor inhibition at all. indeed, while surprise and infrequency are often confounded, they are meaningfully different cognitive constructs. for example, infrequent events can be entirely expected (hearing a fire alarm during an announced drill), or entirely unexpected (hearing the same fire alarm on a regular day), with fundamentally different cognitive and behavioral implications. therefore, the goal of the current study was to investigate whether expected infrequent events can produce reactive motor inhibition, as indexed by non-selective cse suppression. notably, this question is not just relevant to test the proposed link between surprise and motor inhibition. indeed, if expected infrequent events can, by themselves, recruit reactive motor inhibition, this would be highly relevant for the study of motor inhibition in the stop-signal task. 
in fact, one of the most controversial questions in the recent stop-signal literature is which exact neural or psychological processes following stop-signals are related to the attentional detection of the infrequent stop-signal itself, and which are related to the actual implementation of motor inhibition (verbruggen et al., ; hampshire et al., ; matzke et al., ). to address this question, many studies have utilized control tasks whose stimulus layout matches the stop-signal task (i.e., a go-signal is followed by an infrequent second signal) but with an instruction that does not involve outright action stopping (e.g., to press a second button after the original go-response or to ignore the second signal entirely; hampshire et al., ; dodds et al., ; chatham et al., ; erika-florence et al., ; waller et al., ). if such expected infrequent stimuli presented outside of a stop-signal task produced the same type of reactive, non-selective motor inhibition that is found after unexpected infrequent stimuli, it would invalidate the assumption that a contrast between a stop-signal and an infrequent-signal control task would isolate the inhibitory process that is found in the stop-signal task. therefore, in sum, we here aimed to explicitly test whether expected infrequent events produce the same type of non-selective suppression of the motor system that is found after unexpected infrequent events. we tested this possibility using tasks that presented such infrequent events both before and during action initiation. in experiment , participants were twenty young, healthy adults ( female, mean age . , sd: . ). in experiment , participants were twenty-one young, healthy adults (all right-handed, female, mean age: . , sd: . ). 
all participants were recruited via a university of iowa research-dedicated email list or via the university of iowa department of psychological and brain sciences' online recruitment tool and compensated according to how they were recruited, either by an hourly rate of $ or by receiving course credit. the participants were all screened using a safety questionnaire (rossi et al., ) to ensure it was safe for them to undergo tms. experimental procedures were approved by the university of iowa institutional review board (# ). the stimuli for the behavioral paradigms for both experiment and experiment were presented using psychtoolbox (brainard et al., ) and matlab b (the mathworks, natick, ma) on a linux desktop computer running ubuntu. in experiment , participants responded to the stimuli on the screen using their feet by pushing kinesis savant elite foot pedals (left or right; see figure for visualization of task setup). at the beginning of every trial, a black fixation cross was displayed in the center of a gray screen background. after ms, a sound stimulus was played for ms, which belonged to one of the following conditions: standard (frequent), expected (infrequent), unexpected (infrequent). the standard and expected sounds were sine wave tones of either or hz frequency, counterbalanced across participants. the participant was introduced to the standard and expected sounds in a practice block prior to the recorded experiment. in the practice block, the expected sound occurred during % of trials, with the remainder being standard sounds. in the actual experiment, the expected sound occurred on % of trials. the unexpected sounds occurred on % of trials and were only introduced during the main experiment, without prior instruction. these novel sounds were bird song samples from european starlings (recorded by jordan a. comins), which were matched in amplitude envelope and duration to the sine wave tones. 
after the sound, on each trial, a single pulse of tms was delivered with a delay of , , or ms (i.e., centered around ms, which was the time point at which the cse suppression after unexpected sounds was observed in wessel & aron, ). subjects were instructed that the sound would cue them to the timing of the appearance of the imperative stimulus. the imperative stimulus was a black arrow pointing left or pointing right, and appeared ms after the onset of the sound. participants responded according to the direction of the arrow by pressing the left or right foot pedal (deadline: , ms). if no response was made in time, "too slow!" was displayed on screen in red. after an inter-trial interval of , , , , or ms (during which the fixation cross was displayed), the next trial began. the practice block lasted trials. during the main block, participants completed a total of trials ( standard, expected, unexpected), divided into blocks separated by self-timed breaks. the task in experiment was the same as in experiment , except for the order and timing of the sound relative to the imperative stimulus. in experiment , the sound played ms after the onset of the imperative stimulus. again, tms stimulation occurred , , or ms after the sound. all task code, analysis code, and data can be found on the open science framework (osf) at [link will be added at time of publication]. were observed consistently. resting motor threshold (rmt) was then defined as the minimum intensity required to induce meps of amplitudes exceeding . mv peak to peak in of consecutive probes (rossini et al., ). tms stimulation intensity was then adjusted to % of rmt (experiment : mean intensity: . % of maximum stimulator output; range: - %; experiment : mean intensity: . % of maximum stimulator output; range: - %) for stimulation during the experimental task. in both experiments, tms pulses occurred with a delay of , , or ms after sound onset (uniform distribution). 
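the trial structure described above (three sound conditions with fixed proportions, and a tms delay drawn uniformly from three values) can be sketched as follows. the original task was implemented in psychtoolbox and matlab; this python version is only an illustration. the function name `make_trials` and all numeric values in the usage example (trial count, proportions, delays) are placeholders, because the actual values are not legible in this text.

```python
import random


def make_trials(n_trials, p_expected, p_unexpected, tms_delays_ms, seed=0):
    """build a shuffled trial list with fixed condition counts.

    all arguments are hypothetical placeholders for the study's settings.
    """
    rng = random.Random(seed)
    n_expected = round(n_trials * p_expected)
    n_unexpected = round(n_trials * p_unexpected)
    n_standard = n_trials - n_expected - n_unexpected

    sounds = (["standard"] * n_standard
              + ["expected"] * n_expected
              + ["unexpected"] * n_unexpected)
    rng.shuffle(sounds)

    trials = []
    for sound in sounds:
        trials.append({
            "sound": sound,
            # uniform draw over the allowed pulse delays after sound onset
            "tms_delay_ms": rng.choice(tms_delays_ms),
            # direction of the imperative arrow
            "arrow": rng.choice(["left", "right"]),
        })
    return trials


if __name__ == "__main__":
    for trial in make_trials(10, 0.2, 0.2, [100, 150, 200])[:3]:
        print(trial)
```

fixing the condition counts before shuffling, rather than sampling each trial independently, guarantees that every participant receives exactly the intended number of infrequent trials.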
a passive baseline for mep normalization was collected by delivery of single tms pulses at the end of each experimental task block. one baseline pulse was delivered every seconds during baseline collection (the same length as a trial during the active task), and no visual stimuli were shown on screen during this time. an emg sweep was triggered ms before each tms pulse. emg was recorded using a bipolar belly-tendon montage over the fdi muscle of the right hand using adhesive electrodes (h sg, covidien ltd., dublin, ireland), with a ground electrode placed over the distal end of the ulna. electrodes were connected to a grass p amplifier (grass products, west warwick, ri; hz sampling rate, filters: hz high-pass, hz low-pass, hz notch). the amplified emg data were sampled via a ced micro - sampler (cambridge electronic design ltd., cambridge, uk) and recorded to disk using ced signal software (version ). meps were identified from the emg trace via in-house software developed in matlab (the mathworks, natick, ma). trials were excluded if the root mean square power of the emg trace ms before the tms pulse exceeded . mv or if the mep amplitude did not exceed . mv. mep amplitude was quantified with a peak-to-peak rationale, measuring the difference between maximum and minimum amplitude within a time period of - ms after the pulse. both automated artifact rejection and mep amplitude quantification were visually checked for accuracy on each individual trial for every data set by a rater who was blind to the specific trial type. we then calculated the mean mep amplitudes for each condition of interest (sound: standard, expected, unexpected; tms timing: ms, ms, ms), and normalized by dividing these amplitudes by the median baseline mep estimate. after artifact correction, mep amplitudes were tested for differences using a x anova with the factors sound and tms timing. 
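the mep processing steps described above (rejection based on pre-pulse emg activity, peak-to-peak amplitude in a post-pulse window, and normalization by the median baseline mep) can be sketched as follows. the authors used in-house matlab software that is not shown in this text, so this python version is only an approximation of the described logic. all function names are hypothetical, and every threshold and window is passed as a parameter because the actual values are not legible here.

```python
import math
import statistics


def ms_to_samples(ms, fs_hz):
    """convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs_hz / 1000.0))


def rms(samples):
    """root mean square of a list of emg samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mep_amplitude(trace_mv, pulse_idx, fs_hz, pre_ms, win_start_ms,
                  win_end_ms, max_pre_rms_mv, min_mep_mv):
    """return the peak-to-peak mep amplitude, or None if the trial is rejected."""
    pre = trace_mv[pulse_idx - ms_to_samples(pre_ms, fs_hz):pulse_idx]
    if rms(pre) > max_pre_rms_mv:
        return None  # muscle was not at rest before the pulse

    start = pulse_idx + ms_to_samples(win_start_ms, fs_hz)
    stop = pulse_idx + ms_to_samples(win_end_ms, fs_hz) + 1
    window = trace_mv[start:stop]
    amplitude = max(window) - min(window)
    if amplitude <= min_mep_mv:
        return None  # no reliable mep was evoked
    return amplitude


def normalized_mean(amplitudes, baseline_amplitudes):
    """mean of the retained trial amplitudes divided by the median baseline mep."""
    kept = [a for a in amplitudes if a is not None]
    return statistics.mean(kept) / statistics.median(baseline_amplitudes)
```

a normalized value below 1 then indicates that the mean mep in a condition was smaller than the passive baseline, and comparing normalized values across conditions indexes relative cse suppression.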
for experiment , the mean number of trials per condition were (standard, ms), (standard, ms), (standard, ms), (expected, ms), (expected, ms), (expected, ms), (unexpected, ms), (unexpected, ms), and (unexpected, ms), respectively. for experiment , the mean number of trials per condition were (standard, ms), (standard, ms), (standard, ms), (infrequent expected, ms), (infrequent expected, ms), (infrequent expected, ms), (unexpected novel, ms), (unexpected novel, ms) and (unexpected novel, ms), respectively. mean reaction times and accuracy by condition for both experiment and can be found in table . for experiment , we conducted an anova (repeated measures, -way/factor) on rt to assess the effects of sound type. an overall main effect of sound type was found (f( , ) = . , p < . , η = . ). pairwise t-tests were conducted to evaluate which sound (expected or unexpected) resulted in mean rt that differed significantly from rt during the standard trials. reaction time for unexpected tone trials was not significantly different from rt during standard trials (t( ) = . , p = . , d = . ) but rt for expected tone trials was significantly faster than rt during standard trials (t( ) = . , p = . , d = . ). in experiment , we presented the sound stimulus following the target arrow to assess the effects of infrequent stimuli on an already-initiated movement. for experiment , we conducted an anova (repeated measures, -way/factor) on rt to assess the effects of sound type. an overall main effect of sound type was found (f( , ) = . , p < . , η = . ). pairwise t-tests were conducted to evaluate which infrequent sound (expected or unexpected) resulted in mean rt that differed significantly from rt during the standard trials. reaction time for unexpected trials was significantly slower than rt during standard trials (t( ) = - . , p = . , d = . ), but rt for expected trials was not significantly different than rt during standard trials (t( ) = . , p = . , d = . ). 
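the one-way repeated-measures anova on rt reported above can be sketched with the standard partition of the sums of squares into condition, subject, and error terms. this is a generic textbook computation, not the authors' analysis code; the function name `rm_anova_oneway` is hypothetical, no sphericity correction is applied, and reporting partial eta squared as the effect size is an assumption, since the text does not say which variant of η was used.

```python
def rm_anova_oneway(data):
    """one-way repeated-measures anova.

    data: list of per-subject lists, one mean rt per condition.
    returns (f, df_condition, df_error, partial_eta_squared).
    """
    n = len(data)      # number of subjects
    k = len(data[0])   # number of conditions

    grand_mean = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    # between-subject variance is removed from the error term
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    partial_eta_sq = ss_cond / (ss_cond + ss_error)
    return f, df_cond, df_error, partial_eta_sq
```

with three sound conditions, the resulting degrees of freedom are 2 for the condition effect and 2 × (n − 1) for the error term, where n is the number of participants who contributed data.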
across two experiments, we investigated whether infrequent but expected events induce a non-selective suppression of the motor system, similarly to unexpected events and stop-signals. using single-pulse tms combined with emg of task-unrelated hand muscles while participants performed forced-choice reaction time tasks with their feet, we found that infrequent expected sounds are indeed followed by a non-selective suppression of task-unrelated motor effectors. however, we found that this is only the case when a movement is currently being initiated (i.e., when the infrequent event follows the imperative stimulus that instructs a movement). in contrast, unexpected infrequent events non-selectively suppress cse even in the absence of movement initiation (i.e., when presented before any imperative stimulus). the latter finding replicates our previous report of non-selective cse suppression in a verbal reaction time task when unexpected sounds were presented prior to the imperative stimulus (wessel & aron, ) . notably, the timing of the non-selective suppression of cse after unexpected infrequent events was in line with our prior work, in that it takes place at ms following the onset of the sound (wessel & aron, ; dutra et al., ) . in turn, it is notable that non-selective cse suppression after expected infrequent events did not occur until ms after the event. these findings have two primary implications, which we will now discuss in turn. first, the results suggest that there is a qualitative difference in the non-selective suppression of the motor system that takes place after infrequent events, depending on whether these events were unexpected or expected. specifically, unexpected events induce cse suppression even in the absence of motor initiation, which suggests a more drastic type of inhibitory control that is not evoked by expected infrequent events. 
moreover, the latency difference in cse suppression between unexpected and expected infrequent events suggests a more rapid engagement of inhibitory control when infrequent events are unexpected. in that respect, it is interesting to observe that the respective suppressive effects of expected and unexpected infrequent events do not seem to be additive. this is evident from the fact that while expected infrequent sounds produced cse suppression at ms following sound onset in experiment , no suppression was observed at that time point for unexpected sounds (which showed cse suppression at ms instead). if the effects of surprise and infrequency were additive, unexpected sounds should have produced suppression at both ms (due to the unexpectedness) and at ms (due to the infrequency). instead, infrequency and unexpectedness appear to independently engage the same inhibitory process, but with different intensity and latency. this supports the theory that surprise is accompanied by a unique pattern of automatically engaged inhibitory control (wessel & aron, ). beyond these implications for the processing of unexpected and infrequent events, the current findings also have very important implications for the study of motor inhibition in the context of the stop-signal task. as mentioned in the introduction, recent years have seen a controversial discussion regarding the exact psychological and neural mechanisms that contribute to performance in the stop-signal task. specifically, the notion that the ability to stop an action is not solely dependent on the efficacy of the inhibitory process itself, but also depends on the initial attentional detection of the (infrequent) stop-signal and the associated triggering of the inhibitory process has been particularly prominent (levy & wagner, ; verbruggen et al., ; erika-florence et al., ; matzke et al., , ). 
this notion has spurred a fundamental discussion about which parts of the neural cascade of activity following stop-signals reflect the attentional detection of an infrequent instructed signal to stop, and which reflect the motor inhibition process itself (aron et al., ; hampshire & sharp, ; swick & chatham, ). in many studies that address this question, an inferential contrast is used in which brain activity following stop-signals is compared to brain activity following perceptually identical, infrequent, expected events that do not convey a 'stopping' instruction (schmajuk et al., ; dimoska & johnstone, ; hampshire et al., ; boehler et al., ; tabu et al., ; dodds et al., ; chatham et al., ; erika-florence et al., ; bissett & logan, ; elchlepp et al., ; lawrence et al., ; verbruggen et al., ; waller, hazeltine, and wessel, ). in other words, those studies employ a purportedly 'non-inhibitory' control condition that resembles the design of our current experiment , where a go-signal is followed by an infrequent expected event. the current results clearly show that presenting such infrequent, expected events after go-signals leads to an automatic engagement of non-selective motor inhibition. this is in line with our other recent work, which has shown that expected infrequent events after a go-signal lead to an incidental slowing of reaction times and elicit neural activity from the same neural generator that is engaged by stop-signals. taken together, this suggests that the 'inhibition-free' control conditions that are used in studies to isolate attentional from inhibitory processes are not, in fact, free of inhibitory activity. hence, a subtractive contrast between stop-trials and such control conditions will likely cancel out (at least parts of) the inhibitory process, instead of isolating it.
therefore, these subtractive contrasts might operationalize other condition differences between stop-trials and control trials with infrequent signals (such as the fact that stop-trials do not include a motor response). the current study has two shortcomings, owing to methodological limitations. first, we did not find the behavioral effects of unexpected and expected infrequent events (reaction time slowing) that are usually found in studies that use similar experimental paradigms (e.g., dawson et al., ; parmentier et al., ). this is likely due to the presence of the tms pulses, which eliminate such behavioral effects for two reasons. for one, tms of motor cortex interferes with ongoing behavior itself by interrupting the underlying motor processes (hadipour-niktarash et al., ; cohen et al., ). in addition, tms pulses produce a stereotypic auditory and haptic sensation that occurs on every trial. prior research has shown that when infrequent or surprising sounds are followed by a stereotypic, non-surprising sound on every trial, the effect of infrequency or surprise on behavior is greatly reduced (parmentier, ; parmentier et al., ). this outcome is unavoidable in studies using tms to probe the effects of unexpected events on motor excitability. a second shortcoming of the current study is that we did not use an active baseline in the inter-trial interval during the task (unlike e.g., wessel & aron, ). the introduction of such baseline trials would have further elongated an already tedious and tiring task for the subjects, who had to respond to more than very simple stimuli for more than minutes to provide a sufficient number of trials in all three conditions. therefore, it is, strictly speaking, not possible to ascertain from the current data whether infrequent events lead to a suppression of the mep below a task baseline, or whether they merely suppress cse compared to frequent events.
however, since we know from a previous study (wessel & aron, ) that unexpected infrequent events indeed suppress cse below active baseline, one could extrapolate that the same would be true for the expected infrequent events in the current study (the cse suppression that occurred at ms after expected sounds was indistinguishable in amplitude from the cse suppression that occurred at ms following unexpected sounds). nevertheless, this hypothesis would necessitate independent validation. finally, the differential timing of cse suppression found for unexpected and expected infrequent events raises some interesting questions for future study. there are several potential explanations for this difference in timing. it is widely believed that non-selective cse suppression is due to the engagement of a specific fronto-basal ganglia inhibitory pathway (aron, ; jahanshahi et al., ; neubart et al., ; wiecki & frank ; wessel et al., ; wessel & aron, ; kelley et al., ; wessel et al., ). it is unclear whether the timing difference between unexpected and expected infrequent events found here is due to differences in subcortical processing in the basal ganglia, or to differences in the cortical processes that trigger the basal ganglia processes. one property of this proposed fronto-basal ganglia pathway underlying non-selective cse suppression is its ostensible hyper-direct, mono-synaptic connection from the cortical areas that trigger the inhibitory process into the basal ganglia structures that implement the actual inhibition (nambu et al., ; parent and hazrati, ; chen et al., ). if this circuit is indeed as hard-wired and low-level as believed, differences in cortical processing related to the triggering of the inhibitory process are perhaps more likely to account for the differences in timing of the cse suppression between expected and unexpected infrequent events.
indeed, classic eeg studies of such events do indicate that while unexpected infrequent events evoke a fronto-central p a waveform, expected infrequent events evoke a slower-latency, more posterior p b (courchesne et al., ; friedman et al., ; comerchero & polich, ), which would suggest differences in cortical processing. future studies could test whether these two potentials reflect the activity of different cortical pathways that detect unexpected and expected infrequent events, respectively, but that ultimately both produce inhibition mediated via the same basal ganglia circuit. in summary, we here find that infrequent events produce a non-selective suppression of the motor system, even when they are expected. however, this suppression of the motor system is qualitatively different from the suppression observed after unexpected events (which manifests with lower latency and is also observable in the absence of motor preparation). the presence of this frequency-related inhibitory effect poses an important challenge for studies of motor inhibition that seek to produce conditions that do not include inhibitory activity. finally, the current results show that surprise caused by unexpectedness has unique effects on the motor system that are not attributable to the relative frequency of an event alone.
from reactive to proactive and selective control: developing a richer model for stopping inappropriate responses inhibition and the right inferior frontal cortex: one decade on involvement of the subthalamic nucleus and globus pallidus internus in attention suppression of human cortico-motoneuronal excitability during the stop-signal task non-invasive magnetic stimulation of human motor cortex the uses and interpretations of the motor-evoked potential for understanding behaviour selective stopping?
maybe not pinning down response inhibition in the brain-conjunction analyses of the stopsignal task the psychophysics toolbox a proactive mechanism for selective suppression of response tendencies cognitive control reflects context monitoring, not motoric stopping, in response inhibition prefrontal-subthalamic hyperdirect pathway modulates movement inhibition in humans. neuron preparation to inhibit a response complements response inhibition during performance of a stop-signal task ventral and dorsal stream contributions to the online control of immediate and delayed grasping: a tms approach p a and p b from typical auditory and visual stimuli stimulus novelty, task relevance and the visual evoked potential in man intracortical inhibition during volitional inhibition of prepared action selective inhibition of movement allocation of cognitive processing capacity during human autonomic classical conditioning effects of varying stop-signal probability on erps in the stop-signal task: do they reflect variations in inhibitory processing or simply novelty effects dissociating inhibition, attention, and response control in the frontoparietal network using functional magnetic resonance imaging physiological markers of motor inhibition during human behavior perceptual surprise improves action stopping by nonselectively suppressing motor activity via a neural mechanism for motor inhibition leveling the field for a fairer race between going and stopping: neural evidence for the race model of motor inhibition from a new version of the stop signal task proactive inhibitory control: a general biasing account a functional network perspective on response inhibition and attentional control causal role for the subthalamic nucleus in interrupting behavior the novelty p : an event-related brain potential (erp) sign of the brain's evaluation of novelty stopping a response has global or nonglobal effects on the motor system depending on preparation impairment of retention but not 
acquisition of a visuomotor skill through time-dependent disruption of primary motor cortex the role of the right inferior frontal gyrus: inhibition and attentional control contrasting network and modular perspectives on inhibitory control proactive inhibitory control of movement assessed by event-related fmri a fronto-striatosubthalamic-pallidal network for goal-directed and habitual inhibition a human prefrontal-subthalamic circuit for cognitive control specific proactive and generic reactive inhibition stopping to food can reduce intake. effects of stimulus-specificity and individual differences in dietary restraint cognitive control and right ventrolateral prefrontal cortex: reflexive reorienting, motor inhibition, and action updating on the ability to inhibit simple and choice reaction time responses: a model and a method the informational constraints of behavioral distraction by unexpected sounds: the role of event information proactive selective response suppression is implemented via the basal ganglia release the beests: bayesian estimation of ex-gaussian stop-signal reaction time distributions a bayesian approach for estimating the probability of trigger failures in the stop-signal paradigm functional significance of the corticosubthalamo-pallidal 'hyperdirect'pathway cortical and subcortical interactions during action reprogramming and their related white matter pathways saliency detection as a reactive process: unexpected sensory events evoke corticomuscular coupling the effect of salient stimuli on neural oscillations, isometric force, and their coupling functional anatomy of the basal ganglia. i. 
the corticobasal ganglia-thalamo-cortical loop towards a cognitive model of distraction by auditory novelty: the role of involuntary attention capture and semantic processing the cognitive determinants of behavioral distraction by deviant auditory stimuli: a review non-invasive electrical and magnetic stimulation of the brain, spinal cord and roots: basic principles and procedures for routine clinical application. report of an ifcn committee magnetic stimulation: motor evoked potentials electrophysiological activity underlying inhibitory control processes in normal adults ten years of inhibition revisited functional relevance of pre-supplementary motor areas for the choice to stop during stop signal task proactive adjustments of response strategies in the stop-signal paradigm theta burst stimulation dissociates attention and action updating in human inferior frontal cortex proactive and reactive stopping when distracted: an attentional account a consensus guide to capturing the ability to inhibit actions and impulsive behaviors in the stop-signal task common neural processes during action-stopping and infrequent stimulus detection: the frontocentral p as an index of generic motor inhibition unexpected events induce motor slowing via a brain mechanism for action-stopping with global suppressive effects surprise disrupts cognition via a fronto-basal ganglia suppressive mechanism on the globality of motor suppression: unexpected events and their influence on behavior and cognition surprise: a more realistic framework for studying action stopping non-selective inhibition of inappropriate motor-tendencies during response-conflict by a fronto-subthalamic mechanism a computational model of inhibitory control in frontal cortex and basal ganglia key: cord- -lkngp u authors: bachmaier, kurt; stuart, andrew; hong, zhigang; tsukasaki, yoshikazu; singh, abhalaxmi; chakraborty, sreeparna; mukhopadhyay, amitabha; gao, xiaopei; maienschein-cline, mark; kanteti, prasad; rehman, 
jalees; malik, asrar b. title: selective nanotherapeutic targeting of the neutrophil subset mediating inflammatory injury date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lkngp u inflammatory tissue injury such as acute lung injury (ali) is a disorder that leads to respiratory failure, a major cause of morbidity and mortality worldwide. excessive neutrophil influx is a critical pathogenic factor in the development of ali. here, we identify the subset of neutrophils that is responsible for ali and lethality in polymicrobial sepsis. the pro-inflammatory neutrophil subpopulation was characterized by its unique ability to endocytose albumin nanoparticles (anp), upregulation of pro-inflammatory cytokines and chemokines as well as the excessive production of reactive oxygen species (ros) in models of endotoxemia and septicemia. anp delivery of the drug piceatannol, a spleen tyrosine kinase (syk) inhibitor, to the susceptible subset of neutrophils, prevented ali and mortality in mice subjected to polymicrobial infection. targeted inhibition of syk in anp-susceptible neutrophils had no detrimental effect on neutrophil-dependent host defense because the subset of anplow neutrophils effectively controlled polymicrobial infection. the results show that neutrophil heterogeneity can be leveraged therapeutically to prevent ali without compromising host defense. subsets of neutrophils differ markedly in their response to both homeostatic and inflammatory signals ( ) . neutrophil heterogeneity is apparent in the lungs of naïve mice, where a large proportion is marginated in the microvasculature, where they may function as immune sentinels, while other neutrophils circulate unimpeded ( ) . neutrophils are an essential component of the innate immune response to polymicrobial infection due to their ability to eliminate the infectious agents ( ) . 
however, neutrophils can also become pathogenic in diseases by promoting excessive inflammation such as in the case of acute lung injury (ali), a main cause of morbidity and mortality worldwide ( ) . excessive activation of neutrophils by bloodstream bacteria and their products, such as the bacterial endotoxin lipopolysaccharide (lps), results in tissue damage and organ dysfunction ( ) . therapeutic efforts to curb this excessive neutrophilic inflammation have been frustratingly ineffective ( ) . administration of nitric oxide, norepinephrine, low dose corticosteroids, prostaglandin e , or recombinant activated protein c, when critically evaluated, did not significantly improve patient mortality ( ) . moreover, neutralizing key inflammatory mediators such as the cytokines tnf-α and il- β, or reactive oxygen species (ros), has also failed ( ) . in experimental models, the elimination of neutrophils markedly decreases the severity of ali ( ) . on the other hand, global neutrophil impairment carries the risk of compromising host defense, and pulmonary function can deteriorate during recovery from neutropenia ( ) . targeting specific subsets of neutrophils could represent an optimal therapeutic approach if one could identify deleterious neutrophil subsets without compromising the subsets essential for host defense. the unique ability of neutrophils to rapidly change their phenotype and function according to changes in their microenvironment ( ) ( ) ( ) ( ) ( ) is a manifestation of neutrophil heterogeneity, but the distinct roles of neutrophil subsets in the setting of sepsis or endotoxemia are not well understood. we hypothesized that subsets of neutrophils are primarily responsible for the maladaptive hyperinflammatory response that causes ali, multiple organ failure, and death.
in the present study, we identified the subset of neutrophils that incorporated specially formulated albumin nanoparticles (anp) as the subset that could be therapeutically targeted without impairing the elimination of bacteria in experimental septicemia. heterogeneous response of neutrophils to endotoxin and septicemia. after i.v. injection of albumin nanoparticles (anp) to naive mice, we observed anp-uptake in liver and spleen, whereas lungs, heart and kidney remained mostly free of anp (supplemental figure ) . in response to i.p. challenge with the endotoxin of gram-negative bacteria, lipopolysaccharide (lps), uptake of i.v.injected anp in heart, kidney, liver and spleen did not increase compared to naïve mice. in lungs, however, anp uptake increased significantly after lps challenge (supplemental figure ). ly g + polymorphonuclear neutrophils (pmn) have the capacity to internalize anp ( ) . we next determined whether uptake of anp in the lung was restricted to ly g + pmn and whether there was heterogeneity in the endocytosis of anp among ly g + pmn. in response to i.p. lps, only cd + leukocytes endocytosed i.v. injected anp whereas parenchymal cells (cd neg ) did not ( figure a ). anp-endocytosis was restricted to ly g + pmn and largely absent in cd + monocytes/macrophages, nk . + nk cells, or lymphocytes (data not shown). pulmonary pmn endocytosed anp in a bimodal manner, with one subset showing highly efficient uptake (anp high ), and the other subset demonstrating minimal to no uptake (anp low ) ( figure a ). bacterial endotoxins amplify the neutrophil activation in septicemic mice, leading to increased pmn sequestration in lungs where pmn release pro-inflammatory mediators and further enhance the recruitment of immune cells ( ) . using cecal ligation and puncture (clp), a reproducible and clinically relevant mouse model of polymicrobial infection that causes ali, we found that in naïve control mice after sequential i.v. 
injections of anp only ~ % of lung pmn endocytosed anp as evidenced by anp-specific fluorescence ( figure b) . at h after a sham operation (laparotomy plus cecal ligation without puncture of the cecum) and sequential i.v. injections of anp, anp high pmn increased to only ~ % ( figure b) . induction of severe polymicrobial sepsis by clp, however, increased the frequency of anp high lung cells -fold over baseline conditions to ~ % ( figure b ). we consistently found that cd b expression levels on pmn in peripheral blood, lung, and liver were greater on anp high pmn than on anp low pmn ( figure e ), and lung anp high pmn showed the highest cd b expression ( figure e) , indicating a higher level of inflammatory activation of the anp high pmn subset. moreover, in septicemic mice, the percentage of anp high pmn was significantly greater than in sham controls in blood, lung, and liver ( figure f ), consistent with an increased pro-inflammatory state as well as increased adhesiveness of the anp high pmn subset. this heterogeneity in cd b activation on pmn suggested that susceptibility to anp endocytosis delineated distinct subsets of pulmonary pmn. to define the differences between the pmn subsets, we next analyzed whether anp high pmn had a transcriptomic profile different from anp low pmn. we performed an unbiased analysis of lung pmn transcriptomic profiles using rna-seq. we challenged mice with i.p. injections of lps or saline and administered anp i.v. h later. at h after anp injection, we euthanized mice, harvested pmn from single-cell suspensions of their lungs, and sorted the ly g + pmn by flow cytometry according to their anp uptake into anp low and anp high pmn. immediately after sorting, we prepared pmn mrna for rna-seq analysis. we generated a heat map and dendrogram ( figure a ) to represent the normalized pmn gene expression data.
we found that the biological replicates clustered into groups with distinct transcriptomic profiles; i.e., the mrna profiles distinguished pmn from lps-challenged mice from those of saline-injected mice, and were distinct in anp low and anp high pmn ( figure a ). using metacore pathway analysis to identify pathways that were different between anp high pmn and anp low pmn, we found that the pathways regulating immune response and immune cell migration were significantly overrepresented in the anp high pmn (supplemental table) . pathways containing chemokine receptors were significantly enriched in anp high pmn, consistent with the concept that pmn heterogeneity is a function of differential pmn trafficking into tissue, presumably facilitated by different chemoreceptor expression. to identify the chemokine receptors for each pmn subset, we generated separate heatmaps for chemokine receptors, plotting all genes with cpm > . ( reads at sequencing depth of m reads) regardless of differential expression levels. in naïve mice, anp high pmn showed relative over-expression of chemokine receptors cxcr , cxcr , and ccrl ( figure b ). in lps-challenged mice, anp high pmn showed relative over-expression of chemokine receptors ccr , ccr , ccr , ccr , ccrl , cxcr and cxcr ( figure c ). we next assessed the expression of chemokines in anp high pmn and anp low pmn. in saline-injected control mice, anp high pmn were significantly enriched for the expression of the chemokines ccl , ccl , and cxcl ( figure d ). in lps-challenged mice, anp high pmn demonstrated relative over-expression of the chemokines ccl , ccl , ccl , ccl , ccl , cxcl , cxcl , cxcl , cxcl ( figure e ).
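the cpm (counts per million) filter used above to select genes for the chemokine receptor heatmaps can be illustrated with a minimal sketch. this is not the authors' pipeline: the function names, the gene labels, and the threshold value are hypothetical, and the actual cutoff used in the study is not recoverable from the text.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million per library."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("library has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def filter_by_cpm(counts, min_cpm):
    """Keep genes whose CPM exceeds min_cpm (hypothetical threshold)."""
    cpm = counts_per_million(counts)
    return {gene: value for gene, value in cpm.items() if value > min_cpm}


# Toy library of exactly one million reads, so CPM equals the raw count.
toy_counts = {"gene_a": 10, "gene_b": 400, "gene_c": 999_590}
kept = filter_by_cpm(toy_counts, min_cpm=20)
```

because the filter depends only on library size and raw counts, it selects genes by expression level alone, independent of whether they differ between anp high and anp low pmn.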
of note, mice were only exposed to anp for the last hour of the h lps-challenge, and the differences in gene expression between anp high pmn and anp low pmn in lps-challenged mice were far greater than those in the naïve mice, indicating that the anp uptake itself likely did not affect the gene expression profiles. these rna-seq data demonstrated the existence of lung pmn subsets with a distinct response to the inflammatory stimulus lps. based on our rna-seq data and metacore pathway analysis, we selected a group of chemokine receptors and cytokines to determine the kinetics of their expression after lps-stimulation. we performed this independent validation by quantitative pcr (qpcr) and flow cytometry. we found that mrna expression of the chemokine receptor ccr was significantly greater in anp high pmn than in anp low pmn at h, h, and h after lps challenge ( figure a ). importantly, we found that ccr receptor cell surface expression, consistent with the mrna data, was significantly greater on lung anp high pmn than on anp low pmn before and h, h, and h after lps stimulation ( figure b ). cxcr expression was reduced at h after lps-stimulation compared to h stimulation; cxcr receptor cell surface expression increased at h after lps-stimulation, was greater in anp high than in anp low pmn, and decreased to the expression levels of unstimulated pmn thereafter ( figure b ). these data demonstrate that the observed mrna expression heterogeneity translated into cell surface protein expression heterogeneity. the mrna levels of the chemokines ccl , ccl , cxcl , cxcl ( figure c -f) were significantly greater in anp high pmn than in anp low pmn h, h, and h after in vivo lps challenge. ccl ( figure c ) and ccl ( figure d ) expression, in particular, was vastly greater in anp high than in anp low pmn, suggesting that anp high pmn are specialized cells of inflammation, which recruit and activate additional pmn.
in addition, heterodimers of ccl and ccl are known to attract monocytes/macrophages ( ) . the cytokine il- β is essential for antibacterial function and expression of il β was induced ~ -fold in anp low pmn h after lps challenge compared to anp low pmn from saline-injected control mice ( figure g ); in anp high pmn, il β was induced ~ -fold over anp high pmn from saline-injected control mice ( figure h ). expression of the pleiotropic cytokine il- was significantly greater in the anp high than in the anp low pmn in lungs h, h, and h after lps challenge ( figure h ). these data demonstrate that pulmonary anp high pmn can markedly amplify the inflammatory response. we next determined whether adoptively transferring anp high pmn from donors into syngeneic recipient mice would induce inflammation in these mice. donor balb/c mice were challenged with a lethal dose of lps [ mg/kg] and injected with two doses of anp labeled with the stable fluorochrome af at h and h after lps challenge ( figure a ). at h after lps challenge, donor mice were euthanized and lung single-cell suspensions were prepared for flow cytometric sorting into anp high and anp low neutrophils ( figure b ). syngeneic recipient mice were injected i.v. with x pulmonary anp high pmn or, as controls, with an equal number of pulmonary anp low pmn from the same donors (three donors per recipient mouse were required to achieve the necessary cell number). at h prior to transfer of donor cells, recipient mice were treated with a sublethal dose of lps [ mg/kg] to activate their endothelium, a prerequisite for initiating neutrophilic lung inflammation ( ) . at h after the adoptive transfer, we assessed lung inflammation in recipient mice ( figure a ). we found anp + ly g + pmn in lungs of recipient mice, confirming successful transfer of donor cells to recipient mice and their homing to the lung ( figure c ).
transfer of donor anp high pmn significantly increased lung inflammation in the recipient mice when compared to mice that received anp low cells ( figure c ). moreover, after transfer of anp high pmn, lung ly g + pmn produced more ros when compared to controls receiving anp low pmn ( figure d ). because ros induce tissue inflammation ( , ) , and anp high cells are carriers of large amounts of mrna for inflammatory cytokines and chemokines ( figure ), we measured the inflammatory mediators il- β and cxcl in lung tissue extracts. il- β, released during activation of nlrp inflammasome, mediates tissue injury, whereas the chemokine cxcl amplifies the inflammatory cycle by attracting additional pro-inflammatory neutrophils ( ) ( ) ( ) . mice receiving anp high pmn had significantly greater concentrations of il- β ( figure e ) and cxcl ( figure f ) in their lungs than recipients of anp low pmn. these data demonstrated the intrinsic ability of anp high pmn to promote lung inflammation. firm pmn adhesion on microvascular endothelium, induced by endotoxin or bacteremia, upregulates mac- (a heterodimer of cd b and β -integrin cd ), contributing to maximal activation of pmn ( ). syk activity is required for β -integrin-mediated neutrophil activation ( ) . given our data above, we reasoned that inhibiting integrin signaling specifically in the subset of anp high pmn would reduce lung inflammation in the polymicrobial sepsis model. we thus used the drug piceatannol, a syk inhibitor ( , ) that is readily incorporated into anp due to its poor water solubility ( , ) , to inhibit syk-mediated β -integrin-dependent neutrophil adhesion. we found that therapeutic administration of piceatannol-loaded anp protected cd mice from lethal polymicrobial sepsis ( figure a ). treatment with two i.v.
injections of piceatannol incorporated into anp (panp), h and h after clp, significantly reduced mortality of mice when compared to control groups treated with anp without any drug after challenge with clp ( figure a ). treatment using panp reduced clp lethality to the rate of sham-operated (laparotomy plus cecal ligation without puncture of the cecum) mice ( figure a ). injections of panp alone had no effect on the survival rate compared to saline injected controls ( figure a ). clp challenged mice treated with anp, without any drug, had the same mortality rate as saline-injected controls ( figure a ), demonstrating that targeting specifically the anp high subset of pmn is sufficient to prevent clp-induced mortality. similarly, in the absence of polymicrobial infection but after i.p. challenge with a lethal dose of the endotoxin lps (ld ), mice treated with sequential i.v. injections of panp h and h after lps challenge, showed significantly reduced mortality when compared to control mice injected with anp alone ( figure b ). reduced mortality after panp treatment was correlated with the presence of significantly fewer highly inflammatory cd b high cd high pmn ( figure c ), and reduced cd b expression on lung pmn when compared to anp-treated controls ( figure d ). furthermore, while cd b expression on pmn was reduced by panp treatment in the lungs, in peripheral blood pmn cd b surface-expression was higher when compared to pmn from anp-treated mice ( figure d ). to test whether augmented cd b expression on pmn in peripheral blood was a consequence of panp-induced pmn trafficking from the lung to the peripheral blood, we used two-photon microscopy to visualize pmn trafficking in the lung in vivo ( ) . we determined the number of ly g + pmn in microvasculature and velocity of their migration through the microvasculature in lungs (video, figure e ). in lps challenged cd mice, the number of pmn increased, as measured by pmn-specific fluorescence (video). 
treatment of pmn with panp, however, significantly increased the velocity of ly g + pmn in the lung microvasculature and reduced the number of ly g + pmn as compared to anp-treated controls (movie, figure e ,f). cell-targeted treatment of anp high pmn thus inhibited β -integrin signaling and accelerated the transit of pmn through lungs, which reduced the exposure time of lung tissue to noxious pmn-derived mediators. ros production is a potent mediator of tissue damage ( , ) . we found that anp high cells were characterized by high ros production ( figure d ). syk, whose enzymatic activity is the cellular target of piceatannol ( ), is required for integrin-mediated neutrophilic superoxide production ( ). we measured ros production by bone marrow ly g + pmn in vitro (supplemental figure ). bone marrow pmn responded to stimulation with the bacterial peptide fmlp with ros production (supplemental figure ) . pmn with higher uptake of panp showed greater reduction in ros production (supplemental figure ) . moreover, the delivery of piceatannol via panp increased drug efficacy by orders of magnitude when compared to free drug because of its incorporation in the toxic pmn subset (supplemental figure ) . we therefore examined whether panp treatment reduced superoxide production by lung pmn of endotoxemic mice. we challenged mice with a lethal dose of lps and analyzed the production of ros by the lung ly g + pmn ex vivo. we found that anp high pmn had significantly higher intracellular ros levels than anp low pmn ( figure a ). panp treatment, however, almost completely blocked ros production in these cells ( figure a ). these data demonstrated that anp high pmn are largely responsible for ros production by lung inflammatory cells in endotoxemia because targeted treatment via panp markedly reduced ros production. mice doubly deficient for nadph oxidase and inos (gp phox −/− nos −/− ) develop spontaneous infections ( ) .
ros production and complementary no production by pmn are essential to control host microbial diversity and microbial infection ( ) , and functional lung pmn are essential for the task of clearing bloodstream bacteria because resident macrophages in liver and spleen alone are insufficient ( , ) . we therefore determined the effects of panp treatment on the bacterial burden of cd mice in the clp model of polymicrobial infection. we found no exacerbation or amelioration of the bacterial burden as a result of panp treatment when compared to anp treatment ( figure b ), suggesting that selective inhibition of integrin signaling in the anp high pmn subset did not compromise bacterial elimination. two consecutive i.v. injections of panp, given h and h after clp, did not increase bacteremia when compared to anp-injected controls ( figure b ). the bacterial burden of lungs, livers, and spleens of bacteremic mice was similar between panp-treated mice and anp-treated controls ( figure b ). pmn-dependent antimicrobial function was fully preserved after panp treatment, suggesting that the antimicrobial functions, ingestion and elimination of bacteria, are mainly performed by anp low pmn (supplemental figure ). because panp treatment did not weaken anti-microbial resistance, we analyzed parameters of tissue inflammation and damage. we measured the crucial inflammatory mediators il- β and cxcl in lung tissue extracts of mice subjected to clp. we found a substantial reduction in the concentration of il- β and cxcl after panp treatment when compared to anp-treated controls ( figure c ). we next measured nitrotyrosine formation in lungs and livers of septicemic mice. in the experimental mice as well as in septicemic patients, activated lung myeloid cells, inflammatory or resident, and epithelial type ii cells release both no and superoxide, which react to form peroxynitrite, a potent oxidant causing tissue damage ( ) .
peroxynitrite (onoo − ), but not no or superoxide alone, nitrates tyrosine residues ( ). we observed that nitrotyrosine-specific staining in inflammatory and parenchymal cells was significantly reduced in lungs and livers of mice treated with panp when compared to anp-treated controls (figure d, e). these data suggest that antimicrobial function and tissue-damaging function are performed by distinct subsets of pmn. we next determined the effect of panp treatment on pulmonary edema, which is a characteristic feature of inflammatory lung injury. a marked increase in lung wet-to-dry weight ratio is indicative of breakdown of the alveolar capillary barriers, the hallmark of ali. pneumonia is the most common cause of ali in patients ( ) and also the most common cause of sepsis ( ). in the model of pneumonia induced by i.t. instillation of live p. aeruginosa bacteria, panp treatment significantly reduced pulmonary edema when compared to treatment with the control anp (figure f).
photomicrographs of lung or liver sections from septicemic mice treated with anp or panp showing nitrotyrosine formation. paraffin-embedded sections were stained with specific ab to nitrotyrosine (red staining), and with dapi to visualize nuclei (blue staining). polymicrobial sepsis was induced by clp; mice were treated with panp or anp h and h after challenge and were sacrificed for tissue processing and staining h after challenge. bar
furthermore, treatment with panp significantly reduced pulmonary edema in endotoxemic or septicemic mice when compared to lungs from anp-treated controls (figure f). a reduction of tissue damage, because of reduced lung inflammation, could be the proximate cause of reduced inflammatory lung injury after treatment.
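the wet-to-dry weight ratio used above as a readout of pulmonary edema is a simple quotient, and a minimal sketch may make the arithmetic concrete. the function names and the example weights below are hypothetical and are not taken from the study; the sketch only assumes that each lung is weighed once fresh and once after drying.

```python
def wet_to_dry_ratio(wet_weight_mg: float, dry_weight_mg: float) -> float:
    """Return lung wet weight divided by dry weight (dimensionless)."""
    if dry_weight_mg <= 0:
        raise ValueError("dry weight must be positive")
    if wet_weight_mg < dry_weight_mg:
        raise ValueError("wet weight cannot be smaller than dry weight")
    return wet_weight_mg / dry_weight_mg


def water_fraction(wet_weight_mg: float, dry_weight_mg: float) -> float:
    """Return the fraction of the wet weight that is water (0 to 1)."""
    ratio = wet_to_dry_ratio(wet_weight_mg, dry_weight_mg)
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Hypothetical weights for illustration only (not data from the paper).
    wet_mg, dry_mg = 500.0, 100.0
    print("wet-to-dry ratio:", wet_to_dry_ratio(wet_mg, dry_mg))
    print("water fraction:", water_fraction(wet_mg, dry_mg))
```

a higher ratio means more water per unit of dry tissue, which is why an increase is read as edema; the comparison between treatment groups itself would still require the statistical testing described by the authors.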
measuring a marker of overall cell damage, lactic dehydrogenase (ldh) ( ), revealed that the polymicrobial sepsis-induced increase in serum ldh activity was significantly reduced by panp treatment when compared to anp-treated controls (figure g). in addition, hepatocyte-specific sorbitol dehydrogenase (sdh) activity, a marker of hepatocyte damage ( ), was also significantly reduced by panp treatment of septicemic mice (figure g). endocytosis of anp delineates two distinct subsets of pmn. anp high pmn cause tissue damage whereas anp low pmn control microbial infection (figure h).
targeting neutrophilic inflammation that is pathogenic in ali remains an important unmet clinical need. our observation that neutrophils primed by bacterial infections or bacterially derived products exhibit heterogeneity in their capacity to endocytose anp led to the discovery of two distinct neutrophilic subsets, one that causes inflammatory injury and one that controls microbial infection. functional and phenotypic pmn heterogeneity ( ) ( ) ( ) ( ) ( ) led us to test the hypothesis that immunopathology and severe tissue inflammation of endotoxemia and septicemia can be treated by targeting a distinct neutrophilic subset. increased expression of chemokines and chemokine receptors in anp high pmn was consistent with their role in promoting tissue inflammation. several of the chemokines such as ccl and ccl or cxcl and cxcl are members of the macrophage inflammatory protein family and are typically thought to be released by macrophages to increase the influx of pro-inflammatory cells such as neutrophils ( ). our data suggest that a subset of neutrophils might be a substantial source of these inflammatory proteins in models of ali. all cells need to express genes that are required for their intrinsic functions, whereas production of secreted factors can be delegated to subsets ( ).
our results indicate that a specific subset of phenotypically and functionally distinct neutrophils is responsible for lethality in experimental polymicrobial sepsis. anp high pmn were characterized by higher expression of the chemokine receptors for the ligands released following lps activation, such as ccr and ccr (the receptors for ccl ), which could point to a possible positive feedback loop in which inflammatory anp high pmn attract additional inflammatory pmn and thus actively promote a vicious cycle of hyperinflammation and tissue injury ( ). given their phenotypic and functional profile, anp high pmn might play a pathogenic role in covid- , the disease caused by coronavirus sars-cov- ( ). the main cause of covid- mortality is acute respiratory failure ( ). in patients with severe covid- , activated neutrophils, recruited to the pulmonary microvasculature, produce histotoxic mediators including ros ( ). activation of neutrophils might contribute to the cytokine release syndrome ("cytokine storm") that characterizes severe covid- disease ( ). therapy targeting anp high pmn might prevent a patient's hyperinflammatory response to sars-cov- without weakening the antiviral response. it has been shown that the incorporation of denatured albumin beads depends on mac- expression ( ). anp incorporation, by contrast, is mac- -independent ( ), suggesting a distinct molecular mechanism of anp endocytosis. "aged neutrophils" were first described in vitro as functionally deficient ( ) and have subsequently been shown to promote disease in vivo in models of sickle-cell disease or endotoxin-induced septic shock ( ). whereas "aged neutrophils" home to the bone marrow under steady-state conditions, anp high pmn do not, and markers of aged peripheral blood neutrophils, cxcr , cxcl , cd l, and tlr , do not distinguish anp high from anp low pmn under steady-state or inflammatory conditions in the lung (figure ).
these findings thus show that the previously described subset of aged neutrophils is distinct from the anp high pmn subset we identified. administration of anp carrying piceatannol, a syk inhibitor, dramatically improved survival in polymicrobial sepsis but, critically, did not increase the host's bacterial burden. syk function is required for the essential antibacterial functions of neutrophils ( ). we found that anp low pmn are more efficient in ingestion and elimination of e. coli bacteria in vitro than anp high pmn, and that inhibition of syk in anp high pmn does not impair control of polymicrobial infection in vivo. by therapeutically leveraging the preferential anp uptake of the inflammatory pmn subset, we succeeded in limiting immunopathology caused by polymicrobial infection. pathogens adapt to host resistance mechanisms, but not host tolerance mechanisms ( ). it is conceivable that an anp-based approach of targeting a toxic subset of neutrophils, by improving host tolerance, might assist the host in combating antibiotic-resistant microbial infections ( ). anp could also be used to deliver compounds other than piceatannol or to specifically deliver sirnas or micrornas. furthermore, anp-based therapy might be useful in other forms of neutrophilic injury; e.g., ischemia-reperfusion injury of the myocardium or the kidney ( ). the ability to tolerate pathogens in experimental polymicrobial sepsis was greatly strengthened by targeting anp high pmn with a drug that accelerates the velocity of pulmonary pmn and abrogates their ros production. panp did not target β -integrins directly ( ) but mitigated downstream β -integrin signaling, and is thus more likely to be effective when β -integrins are already engaged; i.e., in the lung microvasculature of septic patients ( ). further potential improvement of anp-based therapies over the current standard treatments of septic patients ( ) lies in the precision targeting of the relevant pathogenic subset of neutrophils.
earlier efforts to neutralize reactive oxygen species, to use antibodies against key inflammatory cytokines such as tnf-α or il- β, or to inhibit the endotoxin lps ( , ) ( ) ( ) ( ) have failed to reduce mortality associated with ali/ards, possibly because they did not discriminate between pmn subsets and may have compromised both host defense (resistance) and tissue repair (tolerance) ( ). compared with the therapeutic administration of exogenous cells, generated from donors or from the patients themselves, for example, mesenchymal stromal/stem cells (mscs) ( ), anp-based therapy has the advantage of targeting the host's endogenous cells depending on their pathogenic activation. one limitation of our approach is that we cannot distinguish whether the specification into anp high pmn occurs in the bone marrow prior to egress of pmn into circulation or whether it occurs in the tissue itself. it is also possible that the anp high state, characterized by upregulation of chemokine receptors and chemokine ligands, is reversible and that pmn can transition between these states in response to environmental cues, even during the short life span of neutrophils ( ). the present data support the concept that the generation of heterogeneous pmn subpopulations evolved as a host defense mechanism to avoid an indiscriminate response by all pmn to septicemia ( ). we demonstrate here that neutrophil heterogeneity can be effectively leveraged for a nanoparticle-based therapy.
mice. we used outbred male cd mice, at a body weight ranging from g to g and, for adoptive transfer experiments, male inbred balb/c mice between and wk of age. mice were treated in accordance with the nih guide for the care and use of laboratory animals and the uic animal care committee's regulations. all procedures were approved by the uic iacuc.
synthesis of uniform-sized spheric albumin nanoparticles.
anp and panp, synthesized as described ( ), were of consistent hydrodynamic size ( nm ± nm diameter) and zeta potential (- ± . mv) distribution. we injected i.v. . mg/kg body weight of anp, or of anp loaded with . µm piceatannol (panp), h and h after challenge with lps or h and h after clp.
flow cytometry and cell sorting. single cell suspensions were prepared as described ( ).
tissue damage markers. the activities of ldh and sdh were determined using commercial kits according to the manufacturers' instructions. histopathology was evaluated in sections from paraffin-embedded or frozen tissues using specific antibodies to nitrotyrosine as described ( ).
quantification of hydrogen peroxide production. we measured hydrogen peroxide production using the amplex red hydrogen peroxide kit (invitrogen) following the manufacturer's instructions.
heterogeneity of neutrophils neutrophil kinetics in health and disease the role of neutrophils in immune dysfunction during severe inflammation neutrophils and acute lung injury sepsis: current dogma and new perspectives surviving sepsis campaign: international guidelines for management of severe sepsis and septic shock: drotrecogin alfa (activated) in adults with septic shock sepsis: pathophysiology and clinical management respiratory status deterioration during g-csf-induced neutropenia recovery phenotypic and functional change of cytokine-activated neutrophils: inflammatory neutrophils are heterogeneous and enhance adaptive immune responses getting to the site of inflammation: the leukocyte adhesion cascade updated neutrophil heterogeneity: implications for homeostasis and pathogenesis neutrophils in homeostasis prevention of vascular inflammation by nanoparticle targeting of adherent neutrophils identification of human macrophage inflammatory proteins α and β as a native secreted heterodimer endothelium-derived toll-like receptor- is the key molecule in lps-induced neutrophil sequestration into lungs reactive oxygen species in
inflammation and tissue injury critical role for cxcr and cxcr ligands during the pathogenesis of ventilator-induced lung injury critical role of endothelial cxcr in lps-induced neutrophil migration into the lung caspase- -mediated endothelial pyroptosis underlies endotoxemia-induced lung injury annual review of immunology syk is required for integrin signaling in neutrophils piceatannol, a syk-selective tyrosine kinase inhibitor, attenuated antigen challenge of guinea pig airways in vitro piceatannol ( , , ', '-tetrahydroxy-trans-stilbene) is a naturally-occurring protein-tyrosine kinase inhibitor stabilized imaging of immune surveillance in the mouse lung phenotype of mice and macrophages deficient in both phagocyte oxidase and inducible nitric oxide synthase the lung is a host defense niche for immediate neutrophil-mediated vascular protection outcomes of bacteremia in patients with cancer and neutropenia: observations from two decades of epidemiological and clinical trials the acute respiratory distress syndrome prognostic significance of elevated serum lactate dehydrogenase (ldh) in patients with severe sepsis pneumotoxicity and hepatotoxicity of styrene and styrene oxide the mercurial nature of neutrophils: still an enigma in ards? 
neutrophils: between host defence, immune modulation, and tissue injury social networking of human neutrophils within the immune system neutrophil heterogeneity in health and disease: a revitalized avenue in inflammation and immunity neutrophil recruitment and function in health and inflammation role of inflammatory mediators in the pathophysiology of acute respiratory distress syndrome desynchronization of the molecular clock contributes to the heterogeneity of the inflammatory response monocyte recruitment during infection and inflammation the sars-cov- outbreak: what we know clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china cytokine release syndrome in severe covid- the cytokine storm in covid- : an overview of the involvement of the chemokine/chemokine-receptor system heterotypic interactions enabled by polarized neutrophil microdomains mediate thromboinflammatory injury impairment of function in aging neutrophils is associated with apoptosis neutrophil ageing is regulated by the microbiome neutrophil-specific deletion of syk kinase results in reduced host defense to bacterial infection disease tolerance as a defense strategy two ways to survive infection: what resistance and tolerance can teach us about treating infectious diseases involvement of neutrophils in the pathogenesis of lethal myocardial reperfusion injury selected treatment strategies for septic shock based on proposed mechanisms of pathogenesis surviving sepsis campaign: international guidelines for management of sepsis and septic shock effect of eritoran, an antagonist of md -tlr , on mortality in patients with severe sepsis: the access randomized trial a randomized, double-blind, placebo-controlled trial of tak- for the treatment of severe sepsis anti-tumor necrosis factor therapy in sepsis: update on clinical trials and lessons learned mesenchymal stem (stromal) cells for treatment of ards: a phase clinical trial selective roles for 
toll-like receptor (tlr) and tlr in the regulation of neutrophil activation and life span neutrophil diversity in health and disease e ubiquitin ligase cblb regulates the acute inflammatory response underlying lung injury inos expression and nitrotyrosine formation in the myocardium in response to inflammation is controlled by the interferon regulatory transcription factor measurement of oxidative burst in neutrophils flow cytometric analysis of the granulocyte respiratory burst: a comparison study of fluorescent probes star: ultrafast universal rna-seq aligner featurecounts: an efficient general purpose program for assigning sequence reads to genomic features differential expression analysis of multifactor rna-seq experiments with respect to biological variation edger: a bioconductor package for differential expression analysis of digital gene expression data controlling the false discovery rate: a practical and powerful approach to multiple testing analyzing real-time pcr data by the comparative ct method
[table: gene lists (e.g., il- beta, mhc class ii, ccr , tgf-beta , pd-l , ifn-gamma) with associated p-values; row and column structure lost in extraction]
supplemental figure . anp biodistribution measured by ivis. in naïve mice, i.v.-injected anps were mostly found in liver and spleen. after i.p. administration of the endotoxin lps [ mg/kg], anp appeared prominently in lungs, and remained visible in liver and in spleen. albumin was labeled with vivotag (perkinelmer) and then formed into anps. organs were harvested h after µl tail vein injection of mg/ml vivotag-labeled anps. epi-fluorescence radiance scale corresponding to images. fluorescence was measured by a xenogen ivis spectrum (caliper life sciences) and images were processed with living image software (ver. . . ). an excitation filter of nm and emission filter of nm with sec exposure times were used for all experiments. representative data obtained from at least mice per treatment group.
flow cytometry histograms of bone marrow pmn stimulated with lps [ mg/ml] and incubated with panp, at the indicated doses, for min at °c, and then processed for flow cytometry. representative data obtained from at least mice per treatment group.
panp treatment reduces superoxide production by pmn in response to fmlp stimulation. panp reduced fmlp-stimulated ros production in a dose-dependent manner. bone marrow pmn were incubated with the indicated doses of panp and stimulated with fmlp for the indicated time. ros production was measured using an amplex red hydrogen peroxide/peroxidase assay kit. representative data obtained from at least mice per treatment group.
key: cord- - hn authors: shaw, dario r.; ali, muhammad; katuri, krishna p.; gralnick, jeffrey a.; reimann, joachim; mesman, rob; van niftrik, laura; jetten, mike s. m.; saikaly, pascal e. title: extracellular electron transfer-dependent anaerobic oxidation of ammonium by anammox bacteria date: - - journal: biorxiv doi: .
/ sha: doc_id: cord_uid: hn anaerobic ammonium oxidation (anammox) by anammox bacteria contributes significantly to the global nitrogen cycle, and plays a major role in sustainable wastewater treatment. anammox bacteria convert ammonium (nh +) to dinitrogen gas (n ) using nitrite (no −) or nitric oxide (no) as the electron acceptor. in the absence of no − or no, anammox bacteria can couple formate oxidation to the reduction of metal oxides such as fe(iii) or mn(iv). their genomes contain homologs of geobacter and shewanella cytochromes involved in extracellular electron transfer (eet). however, it is still unknown whether anammox bacteria have eet capability and can couple the oxidation of nh + with transfer of electrons to carbon-based insoluble extracellular electron acceptors. here we show using complementary approaches that in the absence of no −, freshwater and marine anammox bacteria couple the oxidation of nh + with transfer of electrons to carbon-based insoluble extracellular electron acceptors such as graphene oxide (go) or electrodes poised at a certain potential in microbial electrolysis cells (mecs). metagenomics, fluorescence in-situ hybridization and electrochemical analyses coupled with mec performance confirmed that anammox electrode biofilms were responsible for current generation through eet-dependent oxidation of nh +. n-labelling experiments revealed the molecular mechanism of the eet-dependent anammox process. nh + was oxidized to n via hydroxylamine (nh oh) as intermediate when electrode was the terminal electron acceptor. comparative transcriptomics analysis supported isotope labelling experiments and revealed an alternative pathway for nh + oxidation coupled to eet when electrode is used as electron acceptor compared to no −as electron acceptor. 
to our knowledge, our results provide the first experimental evidence that marine and freshwater anammox bacteria can couple nh + oxidation with eet, which is a significant finding, and challenges our perception of a key player of anaerobic oxidation of nh + in natural environments and engineered systems.
main text: anaerobic ammonium oxidation (anammox) by anammox bacteria contributes up to % of n emitted into earth's atmosphere from the oceans ( , ). also, anammox bacteria has been extensively investigated for energy-efficient removal of nh + from wastewater ( ). initially, anammox bacteria were assumed to be restricted to nh + as electron donor and no or no as electron acceptor ( , ). more than a decade ago, preliminary experiments showed that kuenenia stuttgartiensis and scalindua could couple the oxidation of formate to the reduction of insoluble extracellular electron acceptors such as fe(iii) or mn(iv) oxides ( , ). however, the mechanism of how anammox bacteria reduce insoluble extracellular electron acceptors has remained unexplored to date. also, growth or electrochemical activity was not quantified in these experiments. further, these experiments could not discriminate between fe(iii) oxide reduction for nutritional acquisition (i.e., via siderophores) versus respiration through extracellular electron transfer (eet) ( ). therefore, with these preliminary experiments it could not be determined if anammox bacteria have eet capability or not. although preliminary work showed that k. stuttgartiensis could not reduce mn(iv) or fe(iii) with nh + as electron donor ( ), the possibility of anammox bacteria to oxidize nh + coupled to eet to other insoluble extracellular electron acceptors cannot be ruled out.
in fact, eet (and the set of genes involved in eet) is not uniformly applicable to all insoluble extracellular electron acceptors; some electroactive bacteria are not able to transfer electrons to carbon-based insoluble extracellular electron acceptors such as electrodes in bioelectrochemical systems but can reduce metal oxides, and vice versa ( ). it has been known for more than two decades that carbon-based high-molecular-weight organic materials that are ubiquitous in terrestrial and aquatic environments and are not involved in microbial metabolism (i.e., humic substances) can be used as external electron acceptors for the anaerobic oxidation of compounds ( ). also, it has been reported that anaerobic nh + oxidation linked to microbial reduction of natural organic matter fuels nitrogen loss in marine sediments ( ). a literature survey of more than eet-capable species indicated that there are many ecological niches for microorganisms able to perform eet ( ). this resonates with a recent finding where listeria monocytogenes, a host-associated pathogen and fermentative gram-positive bacterium, was able to respire through a flavin-based eet process and behaved as an electrochemically active microorganism (i.e., able to transfer electrons from oxidized fuel (substrate) to a working electrode via an eet process) ( ). further, it was reported that anammox bacteria seem to have homologs of geobacter and shewanella multi-heme cytochromes that are responsible for eet ( ). these observations stimulated us to investigate whether anammox bacteria can couple nh + oxidation with eet to carbon-based insoluble extracellular electron acceptors and can behave as electrochemically active bacteria.
(fig. s b-g). also, a previously enriched k. stuttgartiensis (freshwater anammox species) culture was used ( ). the anammox cells were incubated anoxically for hours in the presence of nh + ( mm) and graphene oxide (go) as a proxy for an insoluble electron acceptor.
no no or no were added to the incubations. go particles are bigger than bacterial cells and cannot be internalized, and thus go can only be reduced by eet ( ). indeed, go was reduced by anammox bacteria as shown by the formation of suspended reduced go (rgo), which is black in color and insoluble (fig. a) ( ). in contrast, abiotic controls did not form insoluble black precipitates. reduction of go to rgo by anammox bacteria was further confirmed by raman spectroscopy, where the formation of the characteristic d and d+d¢ peaks of rgo ( ) were detected in the vials with anammox cells (fig. b) , whereas no peaks were detected in the abiotic control. further, isotope analysis of the produced n gas showed that anammox cells were capable of n formation (fig. c ). in contrast, n production was not significant in any of the tested anammox species or controls, suggesting that unlabeled no or no were not involved. the production of n indicated that the anammox cultures use a different mechanism for nh + oxidation in the presence of an insoluble extracellular electron acceptor (further explained below). gas production was not observed in the abiotic control (fig. c) . to determine if anammox bacteria are still dominant after incubation with go, we extracted and sequenced total dna from the brocadia and scalindua vials at the end of the experiment. differential coverage showed that the metagenomes were dominated by anammox bacteria (fig. s a removal were observed in any of the abiotic controls. subsequently, the ca. brocadia culture was inoculated into the mec and operated under optimal conditions for anammox (i.e., addition of nh + and no -). under this scenario, nh + and no were completely removed from the medium without any current generation ( fig. a) . stoichiometric ratios of consumed no − to consumed nh + (∆no − /∆nh + ) and produced no − to consumed nh + (∆no − /∆nh + ) were in the range of . - . and . - . 
, respectively, which are close to the theoretical ratios of the anammox reaction ( ). these ratios indicated that anammox bacteria were responsible for nh + removal in the mec. subsequently, no was gradually decreased to mm leaving the electrodes as the sole electron acceptor. when the exogenous electron acceptor (i.e., no -) was completely removed from the feed, anammox cells began to form a biofilm on the surface of the electrodes (fig. s i ) and current generation coupled to nh + oxidation was observed in the absence of no -( fig. a) . further, no and no − were below the detection limit at all time points when the working electrode was used as the sole electron acceptor. the magnitude of current generation was proportional to the nh + concentration ( fig. a) and maximum current density was observed at set potential of . v vs ag/agcl. there was no visible biofilm growth and current generation at the mole of electrons transferred to the electrode per mole of nh + oxidized to n (table s ) was stoichiometrically close to equation (eq. ). also, electron balance calculations showed that coulombic efficiency (ce) was > % for all nh + concentrations and anammox cultures tested in the experiments with electrodes as the sole electron acceptor (table s ). (table s ). in addition, nh + oxidation and current production were not affected by the addition of penicillin g (fig. s ) (table s ) . this observation agrees with the nh + removal and oxidation to n observed in the mecs and isotope labeling experiments ( fig. a, fig a) . the genes encoding for no and no reductases (nir genes) and their redox couples were significantly downregulated when electrode was used as the electron acceptor (table s ). this is expected as no was not added in the electrode-dependent anammox process. 
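the electron balance described above (moles of electrons transferred per mole of ammonium oxidized, and coulombic efficiency) can be sketched as follows. this is a minimal illustration and not the authors' calculation: it assumes the anodic half-reaction 2 NH4+ → N2 + 8 H+ + 6 e−, i.e., three electrons per ammonium oxidized to dinitrogen, which follows from the change in nitrogen oxidation state from −3 to 0. the function names and the example numbers are hypothetical.

```python
FARADAY_C_PER_MOL = 96485.33212  # charge carried by one mole of electrons

# Assumption: 2 NH4+ -> N2 + 8 H+ + 6 e-, so 3 electrons per NH4+ oxidized.
ELECTRONS_PER_NH4 = 3


def charge_from_current(times_s, currents_a):
    """Integrate current (A) over time (s) with the trapezoidal rule -> coulombs."""
    if len(times_s) != len(currents_a) or len(times_s) < 2:
        raise ValueError("need two or more matching time/current samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (currents_a[i] + currents_a[i - 1]) * dt
    return total


def electrons_per_nh4(charge_c, nh4_oxidized_mol):
    """Moles of electrons recovered at the electrode per mole of NH4+ oxidized."""
    if nh4_oxidized_mol <= 0:
        raise ValueError("oxidized NH4+ must be positive")
    return charge_c / (FARADAY_C_PER_MOL * nh4_oxidized_mol)


def coulombic_efficiency(charge_c, nh4_oxidized_mol,
                         electrons_per_mol=ELECTRONS_PER_NH4):
    """Percentage of the theoretically available charge recovered as current."""
    return 100.0 * electrons_per_nh4(charge_c, nh4_oxidized_mol) / electrons_per_mol


if __name__ == "__main__":
    # Hypothetical chronoamperometry trace: constant 1 mA for one hour.
    times = [0.0, 1800.0, 3600.0]
    currents = [1e-3, 1e-3, 1e-3]
    q = charge_from_current(times, currents)
    # Hypothetical amount of NH4+ oxidized over the same period (mol).
    nh4 = 1.3e-5
    print("charge (C):", q)
    print("e- per NH4+:", electrons_per_nh4(q, nh4))
    print("coulombic efficiency (%):", coulombic_efficiency(q, nh4))
```

in this framing, a measured electron-to-ammonium ratio close to three and a coulombic efficiency near 100% would both indicate that essentially all electrons released by ammonium oxidation were recovered at the electrode.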
also, this supports the hypothesis that no is not an intermediate of the electrode-dependent anammox process and that there was no effect of ptio when no was replaced by the electrode as electron acceptor ( suggest an alternative pathway for nh + oxidation coupled to eet when working electrode is used as electron acceptor compared to no as electron acceptor. in conclusion, our study provides the first experimental evidence that phylogenetically and microbial nitrogen cycling processes in oxygen minimum molecular mechanism of anaerobic ammonium oxidation nitric oxide-dependent anaerobic ammonium oxidation deciphering the evolution and metabolism of an anammox bacterium from a community genome enrichment and characterization of marine anammox bacteria associated with global nitrogen gas production our analysis showed upregulation under electrode-dependent anammox process of the genes in the wood-ljungdahl pathway for co fixation and acetyl-coa synthesis glts catalyzes the binding of the ammonium- nitrogen to -oxoglutarate with the oxidation of fdred ( ). the -oxoglutarate used for this reaction can be provided by the key enzyme of the rtca cycle -oxoglutarate:ferredoxin oxidoreductase (ogor) ( ) iron assimilation in anammox bacteria in eet-dependent anammox process surprisingly the proteins involved in iron transport and assimilation are still unknown. our analysis revealed that in the absence of soluble electron acceptors (i.e., no -, no -), ca. brocadia electricigens expressed two gene clusters encoding a siderophore-mediated iron uptake system (fig. s , table s and ). the expressed siderophore-mediated transport system fe(iii) uptake relies on beta-barrel tonb-dependent receptors in the outer membrane ( ) and an energy-transducing protein complex tonb-exbb-exbd that links the outer with the inner membrane and generate a proton motive force ( ). 
a periplasmic iron-binding protein and an atp-dependent abc fe(iii) reduction in the cytoplasm can be carried out by ferric-chelate reductases/rubredoxins, from which multiple genes were found to be expressed (fig. s , table s ) this finding is in agreement with a previous study using the eet-capable model bacteria geobacter sulfurreducens ( ), in which it was shown that the pathways required for eet and metal oxide reduction are distinct. : rieske/cytochrome b complex; bc- : rieske/cytochrome b complex; bc- : rieske/cytochrome b complex; cyt c ( heme): periplasmic mono-heme c-type cytochrome membrane-anchored tetraheme c-type cytochrome; cyt c ( hemes): outer membrane penta-heme c-type cytochrome cytochrome c redox partner of the etm; cyt nir: cytochrome c; etm: electron transfer module for hydrazine synthesis; exbb: biopolymer transport protein exbb/tolq; exbd: biopolymer transport protein exbd/tolr; fdh: membrane-bound formate dehydrogenase foca: formate/nitrite transporter; glts: glutamate synthase; hao brosi_a : hydroxylamine oxidoreductase; hao brosi_a : hydroxylamine oxidoreductase; hao ex _ : hydroxylamine oxidoreductase; hdh: hydrazine dehydrogenase; hzs: hydrazine synthase; membnxr: membrane-bound complex of the nxr gene cluster; mscl: large mechanosensitive channel; mscs: pore-forming small mechanosensitive channel; nadh-dh: nadh dehydrogenase; nir brosi_a : nitrite reductase; norvw: flavodoxin nitric oxide reductase; nrfa: ammonium-forming nitrite reductase heme): outer membrane lipoprotein mono-heme c-type cytochrome ompa: ompa-like outer membrane protein, porin; pfdo: pyruvate ferredoxin oxidoreductase; pfl: pyruvate formate lyase; pp binding protein: iron abc transporter periplasmic substrate-binding protein; rnf: rnfabcdge type electron transport complex rubredoxin/ferric-chelate reductase; s-layer: s-layer protein; tonb: energy transducer tonb acetoxime (c h no), and the molecular ion peaks were detected at mass to charge (m/z) = and for nh oh and 
nh oh, respectively. m of nh oh and nh oh were used as standards. to determine the source of the oxygen used in the electrode-dependent nh + oxidation to nh oh, mecs were incubated with nh cl ( mm, cambridge isotope laboratories) in presence of % d o for hours. stable isotopes of nh oh were determined by gc/ms analysis after derivatization using acetone as described above. activity and electron balance calculations activities of specific anammox ( n ) with nitrite as the preferred electron acceptor and electrode- dependent anammox ( n ) with working electrode ( . v vs ag/agcl) as sole electron acceptor were calculated based on the changes in gas concentrations in single-chamber mec batch incubations. the activity was normalized against protein content of the biofilm on the electrodes. protein content was measured as described below (see analytical methods section). the moles of electrons recovered as current per mole of nh + oxidized were calculated using: analytical methods all samples were filtered through a . µm pore-size syringe filters (pall corporation) prior to chemical analysis. nh + concentration was determined photometrically using the indophenol method ( ) (lower detection limit = μm). absorbance at a wavelength of nm was determined using multi-label plate readers (spectramax plus ; molecular devices, ca, usa). no concentration was determined by the naphthylethylenediamine method ( ) (lower detection limit = μm). samples were mixed with . mm naphthylethylenediamine solution, and the absorbance was measured at a wavelength of nm. no concentration was measured by hach kits (hach, co, usa; lower detection limit = . mg l - no --n). user's guide was followed for these kits and concentrations were measured by spectrophotometer (d , hach, co, usa). concentrations of nh oh and hydrazine (n h ) were determined colorimetrically as previously described ( ). for nh oh, liquid samples were mixed with -quinolinol solution ( . % (w/v) trichloroacetic acid, . 
% (w/v) -hydroxyquinoline and . m na co ) and heated at °c for min. after cooling down for min, absorbance was measured at nm ( ). n h was derivatized with % (w/v) p-dimethylaminobenzaldehyde and absorbance at nm was measured ( ). the concentration of biomass on the working electrodes was determined as protein concentration using the dc protein assay kit (bio-rad, tokyo, japan) according to the manufacturer's instructions. bovine serum albumin was used as the protein standard. putative eet-dependent anammox pathway we provided evidence that phylogenetically distant anammox bacteria can perform eet and are electrochemically active, and we elucidated the molecular mechanism of nh + oxidation, which by themselves are significant findings that change our perception of a key player in the global nitrogen (table s and table s ). this result is consistent with the nh + uptake, oxidation and final conversion to n observed in the mecs and isotope labeling experiments (fig. a, fig. a). the requirement of more moles of nh + when anammox growth is based on eet compared to no as electron acceptor (eq. ) increases the demand of nh + import into the cell, which can explain the upregulation of the ammonium transporters. in contrast, the genes encoding no and no reductases (nir genes) and their redox couples were significantly downregulated (fig. s , table s ). this agrees with the fact that no and no were below the detection limit in the mecs (fig. a, fig. s a and b) and there was no effect of ptio when no was replaced by the electrode as electron acceptor (fig. s ) (fig. ). the four low-potential electrons released from this reaction must be stored until they are transferred to a redox partner (fig. e, fig. s c and d) exhibited oxidation/reduction peaks, which suggests that additional cytochrome(s) that transfer electrons directly to the electrode via solvent exposed hemes may be involved.
also, no cytochromes for long-range electron transport were detected in the analysis (table s and ), suggesting that eet to electrodes by anammox bacteria relies on a direct eet mechanism. homology detection and structure prediction by hidden markov model comparison (hmm-hmm) of the highly upregulated penta-heme cytochrome ex _ (fig. s , table s ) gave high probability hits to proteins associated with the extracellular matrix and outer membrane iron respiratory proteins such as mtrf, omca and mtrc. also, it is worth mentioning that the gene cluster ex _ - was one of the most upregulated under set-potential conditions. therefore, future work should focus on determining the role of ex _ - in the eet-dependent anammox process. likewise, we also found the expression of outer membrane mono-heme c-type cytochromes (om cyt c ( heme)) (fig. s , table s ) homologs to g.
key: cord- -p ew w authors: ramana, chilakamarti v. title: regulation of early growth response- (egr- ) gene expression by stat -independent type i interferon signaling and respiratory viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: p ew w respiratory virus infection is one of the leading causes of death in the world. activation of the jak-stat pathway by interferon-alpha/beta (ifn-α/β) in lung epithelial cells is critical for innate immunity to respiratory viruses. genetic and biochemical studies have shown that transcriptional regulation by ifn-α/β required the formation of interferon-stimulated gene factor- (isgf- ) complex consisting of stat , stat , and irf transcription factors. furthermore, the ifn-α/β receptor activates multiple signal transduction pathways in parallel to the jak-stat pathway and induces several transcription factors at the mrna level, resulting in the secondary and tertiary rounds of transcription.
transcription factor profiling in the transcriptome and rna analysis revealed that early growth response- (egr- ) was rapidly induced by ifn-α/β and toll-like receptor (tlr) ligands in multiple cell types. studies in mutant cell lines lacking components of the isgf- complex revealed that ifn-β induction of egr- was independent of stat , stat , or irf . activation of the mek/erk- / pathway was implicated in the rapid induction of egr- by ifn-β in serum-starved mouse lung epithelial cells. interrogation of multiple microarray datasets revealed that respiratory viruses including coronaviruses regulated egr- expression in human lung cell lines. furthermore, egr- inducible genes including transcription factors, mediators of cell growth, and chemokines were differentially regulated in the human lung cell lines after coronavirus infection, and in the lung biopsies of covid- patients. rapid induction by interferons, tlr ligands, and respiratory viruses suggests a critical role for egr- in antiviral response and inflammation with potential implications for human health and disease. interferons (ifns) are pleiotropic cytokines that play a central role in innate and adaptive immunity ( , ). there are major types of interferons. type i interferons consist of ifn-alpha/beta (ifn-α/β), type ii interferon is represented by interferon-gamma (ifn-γ), and type iii interferons are represented by interferon lambda (ifn-λ). the biological effects of ifn are mediated mainly by rapid and dramatic changes in gene expression ( ). type i ifn signaling involves the binding of ifn-α/β to its receptor (ifnar), activation of the receptor-associated janus protein tyrosine kinases jak and tyk, and the phosphorylation of stat and stat to form a heterodimer.
this heterodimer associates with the interferon regulatory factor (irf ) to form the interferon-stimulated gene factor- (isgf- ) complex on interferon-stimulated response elements (isre) in the promoters of type i ifn responsive genes to regulate transcription ( , , ). a stat homodimer or a heterodimer of stat /stat can also bind to the gamma activated sequence (gas) in the promoter and regulate transcription of some type i ifn responsive genes ( , ). in addition to this classical canonical jak-stat pathway, non-canonical pathways involving stat , irf , and unphosphorylated isgf- components in the regulation of gene expression have been described ( ). furthermore, several signal transduction pathways are activated by ifn receptors in parallel to jak-stat, including extracellular signal-regulated kinases (erk- / ) and phosphoinositide '-kinase/akt pathways ( , ). the first wave of interferon signaling is followed by induction of transcription factors such as interferon regulatory factors (irfs) that sustain the secondary and tertiary transcriptional responses. tlr recognition of pathogens by immune cells results in the production of multiple cytokines such as ifn-α/β, tumor necrosis factor-α (tnf-α), and interleukin-β (il- β) in innate immunity ( ). activation of multiple signal transduction pathways and cross-talk between the pathways enables fine-tuning of gene expression in innate immunity ( ). cross-talk between tnf-α and ifn signaling pathways regulates inflammatory gene expression to influence the immune responses ( , ). the clinical significance of the balance between immune modulation and inflammation in signal transduction pathways has been demonstrated in a variety of autoimmune diseases ( , ). early growth response (human egr / mouse egr ; referred to in this manuscript as egr- ) belongs to a family of immediate-early response genes that contain a conserved zinc finger dna-binding domain and bind to a gc-rich sequence in the promoters of target genes ( ).
a variety of signals, including serum, growth factors, cytokines, and hormones, stimulate egr- expression ( , ). egr- has been shown to play an important role by regulating inflammatory gene expression in a variety of lung diseases and in mouse lung injury models including asthma, emphysema, airway inflammation, and pulmonary fibrosis ( ) ( ) ( ). ischemia-mediated activation of egr- triggers the expression of pivotal regulators of inflammation including chemokine, adhesion receptor, and pro-coagulant genes ( ). egr- stimulates chemokine production in interleukin- mediated airway inflammation and remodeling in the lung ( ). high levels of expression of egr- and egr- inducible genes were reported in atherosclerosis, an inflammatory disease ( ). egr- may have a potential role in liver injury and in acute pancreatitis ( ) ( ) ( ). lipopolysaccharide (lps) induction of egr- was mediated by the activation of the erk- / pathway and serum response elements ( ). in this study, transcription factor profiling in interferon-mediated gene expression data sets and rt-pcr revealed that egr- was rapidly induced by ifn-α/β and tlr ligands in multiple cell types. studies in mouse and human fibroblast mutant cell lines revealed that egr- induction by type i interferons was independent of transcription factors stat , stat or irf . furthermore, the regulation of egr- by ifn-β was mediated by the activation of the erk- / pathway in serum-starved mouse lung epithelial cells. respiratory pathogens including coronaviruses (sars-cov- and ) and influenza viruses regulated the expression of egr- in human lung cell lines and in lung biopsies and peripheral blood cells of covid- patients. these studies suggest that the regulation of egr- may play an important role in the antiviral response and inflammatory disease. gene expression in response to interferon, tnf-α,
and tlr agonist treatment in human peripheral blood mononuclear cells (pbmc), human hepatoma cells (huh- ), and mouse bone marrow-derived macrophages (bmdm) were reported previously ( ) ( ) ( ). supplementary data were downloaded from the journal publisher websites and geo datasets were analyzed with the geor r method (ncbi). cluster analysis was performed using gene expression software tools at www.heatmapper.ca. gene expression datasets representing human lung cell lines infected with respiratory viruses and from covid- patients were reported previously ( , ). gene expression resources from immgen rna seq skyline were used ( http://rstats.immgen.org/skyline_covid- /skyline.html ). outliers of expression were not included in the analysis. mouse lung epithelial (mle-kd) and macrophage (raw . ) cell lines were used ( , ). human fibrosarcoma cell line ( ftgh) and mutant cell lines lacking stat (u a), stat (u a), and irf (u a) were described previously ( ). wild-type and stat -knockout mouse embryo fibroblast (mef) cell lines were used ( ). cells were maintained in dmem supplemented with % fbs and % penicillin and streptomycin. cells were plated at - % density and maintained in full medium for a day. cells were incubated in serum-free medium for another hours. cells were treated with tnf-α ( ng/ml) or ifn-β were performed using ambion retroscript (austin, tx), according to the manufacturer's protocol. primer sequences for egr- , irf , and gapdh were obtained from the molecular reagents section of mouse genome informatics (mgi). human egr- and β-actin primer sequences were previously described ( ). pcr products were resolved on a % agarose gel containing ethidium bromide and visualized with u.v. light, and images were captured on a digital system. image files were processed with imagej (nih) software. specific gene expression was normalized to gapdh or β-actin and fold changes in the treated samples were calculated with respect to controls.
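the normalization described above (band intensity of the target gene divided by that of gapdh or β-actin, then expressed relative to untreated controls) can be sketched as follows. this is a minimal illustration rather than the author's imagej workflow; the function names and intensity values are hypothetical.

```python
def normalized_expression(target_intensity, reference_intensity):
    """Divide the target band intensity by the housekeeping band intensity."""
    if reference_intensity <= 0:
        raise ValueError("reference band intensity must be positive")
    return target_intensity / reference_intensity


def fold_change(treated_target, treated_reference,
                control_target, control_reference):
    """Fold change of a treated sample relative to its untreated control.

    Both samples are first normalized to their own housekeeping band,
    so differences in loading between lanes cancel out.
    """
    control = normalized_expression(control_target, control_reference)
    if control == 0:
        raise ValueError("control expression is zero; fold change undefined")
    treated = normalized_expression(treated_target, treated_reference)
    return treated / control


# Hypothetical densitometry values (arbitrary units).
example = fold_change(
    treated_target=1500.0, treated_reference=1000.0,
    control_target=500.0, control_reference=1000.0,
)
```

normalizing each lane to its own housekeeping band before taking the ratio is what makes the fold change insensitive to unequal loading.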
mle-kd cell extracts were prepared and proteins were separated by electrophoresis using %- % sds-page gels. proteins in the gel were electrophoretically transferred to polyvinylidene difluoride membranes (bio-rad, ca) and subjected to immunoblotting with antibodies against phosphorylated erk- / (thr /tyr ) or total erk- / from cell signaling technology (beverly, ma). blots were visualized with an enhanced chemiluminescence western detection system (pierce, il). (figure a and b). these results were consistent with previous studies in human fibroblasts ( ). tnf-α induction of egr- was much higher than that by ifn-α/β in ftgh cells (figure b). interestingly, tnf-α but not ifn-α induced egr- mrna in hela cells, suggesting differential pathway regulation ( ). furthermore, ifn-λ induction of egr- was higher than that with ifn-α/β in huh cells (figure c) irf (u a) demonstrated that ifn-β induction of egr- was independent of isgf- components (figure b). tnf-α and tlr ligands activate several mitogen-activated protein kinase (mapk) pathways such as extracellular signal-regulated kinase (erk), jun n-terminal kinase (jnk) and p ( ). activation of the erk- / pathway leading to the phosphorylation of transcription factors such as elk and srf was implicated in the rapid induction of egr- in response to multiple stimuli ( , ). ( ). detailed promoter analysis revealed that multiple distal and proximal cis-elements were involved in egr- induction ( ). the generation of an inflammatory response is a complex process involving multiple cytokines acting in parallel and in concert in innate and adaptive immunity ( , ). influenza virus-infected dendritic cells and macrophages, which reside in close proximity to lung epithelium, can produce significant amounts of tnf-α and type i ifn in response to virus infection ( ). ifn-α/β is involved in signaling cross-talk with tnf-α that enhances or dampens the severity of the inflammatory response ( , ).
interaction between interferon-α/β and tnf-α signaling was reported in autoimmune diseases ( , ). a select list of genes was induced by both tnf-α and interferon-α/β in pbmc and bmdm gene expression datasets (figure ). tnf-α induction of these genes was much higher than that by ifn-α/β in the mouse bmdm cells. interestingly, egr- was induced by both cytokines in pbmc and bmdm. these genes are involved in transcriptional regulation and in integrating signal transduction pathways that are likely to play an important role in the cytokine storm and the shift to hyperinflammatory gene expression in response to coronavirus infections ( , ). egr- is a zinc finger transcription factor that binds to a gc-rich sequence in the promoter regions of target genes. a shortlist of egr- regulated genes was generated from the trrust transcription factor database and published studies in different cell types and under different stimuli ( ) ( ) ( ). egr- regulates genes involved in cell growth (egr , atf , pdgfrb, cdkna , ccnd , tnfsf ) and chemokines involved in inflammation (cxcl , cxcl , ccl , ccl ). it is important to note that many of the egr- target genes are also known targets of nf-kb, ap , and stat in cytokine signaling ( ). furthermore, egr- interacts with several other transcription factors such as ets , ets , and elk that are common in cytokine responsive genes ( ). previous studies have shown that transcription factors of innate and adaptive immunity show functional connectivity involving shared interacting protein partners and target genes ( ). cluster analysis revealed that egr- induction was correlated with the expression of egr- target genes by ifn-α/β in pbmc and in bmdm cells (figure ). severe acute respiratory syndrome coronaviruses (sars-cov) and influenza viruses are major respiratory pathogens in humans with seasonal epidemics and potential pandemic threats ( , ).
gene expression profiling studies of human lung epithelial cell lines such as calu- , a , and nhbe in response to respiratory viruses were reported recently ( , ). interrogation of infection also induced egr- mrna in a cells, hours post-infection (figure c). these studies suggest that egr- expression is a common host response to many respiratory viruses. gene expression profiling studies of a limited number of human lung biopsies and pbmc of healthy controls and covid- patients were reported recently ( , ). interrogation of the gene expression data in lung biopsies revealed that egr- expression levels were significantly lower in covid- patients compared with healthy controls. expression of egr- target genes related to cell growth such as ccnd , egr , and pdgfrb was also down-regulated (figure and data not shown). in contrast, expression of egr- target genes involved in inflammatory responses such as cxcl , ccl , ccl , ccl , and tnfsf was dramatically enhanced in covid- patients compared with healthy controls (figure and data not shown). interestingly, stat expression levels were significantly increased in covid- patients compared with healthy controls (figure ). there are several limiting factors to interpreting the data. the lung is a complex tissue consisting of more than thirty cell types with differential contributions with respect to cell mass and gene expression. for example, type i and type ii cells in mouse lungs constitute the major and minor cell types and express distinct cell markers ( ). without information on the expression levels in distinct cell types, it is difficult to correlate egr- expression with target genes in the whole lung tissue. in support of this view, in a mouse model of cd + t cell-mediated lung injury, chemokine expression was dependent on egr- activation in alveolar type ii cells ( ). another intriguing possibility is that enhanced stat expression compensates for decreased egr- expression in some cell types in the lung.
it is important to consider that in addition to changes in mrna levels, the phosphorylation status of stat and egr- may play an important role in the transcriptional regulation of chemokine target genes. it is likely that differential expression in distinct cell types may account for the disparity of egr- has emerged as a key regulator of cell growth, reproduction, and response to tissue injury ( , ( ) ( ) ( ). egr- was rapidly induced by interferons and pro-inflammatory cytokines such as tnf-α and il- β ( , , ). recent studies have demonstrated dramatic changes in type i interferon, tnf-α, and il- β production by immune cells and cytokine-mediated lung inflammation in covid- patients ( , , , ). however, the role of egr- in innate immunity and in the antiviral response to respiratory viruses in general, and sars-cov- in particular, remains to be investigated. transcription factor profiling in the transcriptome revealed that egr- was induced by ifn-α/β, tlr ligands, and tnf-α in human and mouse cells. studies in mutant cell lines lacking stat , stat , and irf revealed that ifn-α/β induction of egr- was independent of isgf- components and mediated by the activation of the erk- / pathway. furthermore, respiratory viruses such as sars-cov- induced egr- and its target genes in several lung epithelial cell lines and in covid- patients. activation of egr- by ifn-α/β and tnf-α and cross-talk between the pathways modulate signal transduction and the inflammatory response in innate immunity.
how cells respond to interferons the jak-stat pathway at twenty induction of human mrnas by interferon complex roles of stat in regulating gene expression canonical and non-canonical aspects of jak-stat signaling: lessons from interferons for cytokine responses platanias l mechanisms of type-i-and type-ii-interferon-mediated signaling stat -dependent and -independent pathways in ifn gamma -dependent signaling toll-like receptors and innate immunity signaling crosstalk mechanisms that may fine-tune pathogen-responsive nfκb disturbance of tumor necrosis factor alpha-mediated beta interferon signaling in cervical carcinoma cells inflammatory impact of ifn-γ in cd + t cell-mediated lung injury is mediated by both stat -dependent and -independent pathways cross-regulation of tnf and ifn-α in autoimmune disease type i interferon in systemic lupus erythematosus and other autoimmune diseases early growth response protein (egr- ): prototype of a zinc-finger family of transcription factors regulation of the egr- gene by tumor necrosis factor and interferons in primary human fibroblasts regulation of life and death by the zinc finger transcription factor egr- the transcription factor early growth-response factor modulates tumor necrosis factor-alpha, immunoglobulin e, and airway responsiveness in mice expression of egr- in late stage emphysema early growth response gene -mediated apoptosis is essential for transforming growth factor beta -induced pulmonary fibrosis egr- , a master switch coordinating upregulation of divergent gene families underlying ischemic stress role of early growth response - (egr- ) in interleukin- -induced inflammation and remodeling high level expression of egr- and egr- inducible genes in mouse and human atherosclerosis ethanol-induced liver injury: potential roles for egr- pancreatic gene expression during the initiation of acute pancreatitis: identification of egr- as a key regulator early growth response factor- is critical for cholestatic liver 
injury lipopolysaccharide activation of the mek-erk / pathway in human monocytic cells mediates tissue factor and tumor necrosis factor a expression by inducing elk- phosphorylation and egr- expression dissecting interferon-induced transcriptional programs in human peripheral blood cells a unifying model for the selective regulation of inducible transcription by cpg islands and nucleosome remodeling dynamic expression profiling of type i and type iii interferon-stimulated hepatocytes reveals a stable hierarchy of gene expression drives development of covid- a single cell atlas of the peripheral immune response to severe covid regulation of gene expression in raw . macrophage cell line by interferon-gamma characterization of beta-r , a gene that is selectively induced by interferon beta (ifn-beta) compared with ifn-alpha stat -independent regulation of gene expression in response to ifn-gamma insights into the signal transduction pathways of mouse lung type ii cells revealed by transcription factor profiling in the transcriptome pathogen recognition by the innate immune system map kinase pathways serum response factor: discovery, biochemistry, biological roles and implications for tissue injury healing beta interferon and oncostatin m activate raf- and mitogen-activated protein kinase through a jak -dependent pathway type i interferon (ifn)-dependent activation of mnk and its role in the generation of growth inhibitory responses assessment of mtor-dependent translational regulation of interferon stimulated genes serum response factor indirectly regulates type i interferon-signaling in macrophages gnrh regulates early growth response protein transcription through multiple promoter elements control of adaptive immunity by the innate immune system molecular pathogenesis of influenza a virus infection and virus-induced regulation of gene expression trrust v : an expanded reference database of human and mouse transcriptional regulatory interactions insights into 
functional connectivity in mammalian signal transduction pathways by pairwise comparison of protein interaction partners of critical signaling hubs a pneumonia outbreak associated with a new coronavirus of probable bat origin influenza a induced cellular signal transduction pathways interference with intraepithelial tnf-α signaling inhibits cd (+) t-cell-mediated lung injury in influenza infection genomic analysis of increased host immune response and cell death responses induced by influenza virus pandemic influenza virus and streptococcus pneumoniae co-infection results in activation of coagulation and widespread pulmonary thrombosis in mice and humans influenza virus-induced ap- -dependent gene expression requires activation of the jnk signaling pathway nf-kappab-dependent induction of tumor necrosis factor-related apoptosis -inducing ligand(trail) and fas/fasl is crucial for efficient influenza propagation influenza virus propagation is impaired by inhibition of the lung-specific expression of active raf kinase results in increased mortality of influenza a virus-infected mice al longitudinal immunological analyses reveal inflammatory misfiring in severe covid patients identification and characterization of a new member of the tnf family that induces apoptosis multiple redundant effector mechanisms of cd + t cells protect against influenza infection angiotensin -converting enzyme is a functional receptor for the sars coronavirus early growth response- (egr- ) -a key player in myocardial cell injury protein kinase c beta/early growth response- pathway. 
a key player in ischemia, atherosclerosis, and restenosis role of alveolar epithelial early growth response (egr- ) in cd + t cell-mediated lung injury transcriptional regulation of egr- by the interleukin- -jnk-mkk -c-jun pathway impaired type i interferon activity and inflammatory responses in severe covid- patients immunophenotyping of covid- and influenza highlights the role of type i interferons in development of severe covid-
key: cord- - hhlma authors: miller, mariël; rogers, jeffery c.; badham, marissa a.; cadenas, lousili; brightwell, eian; adams, jacob; tyler, cole; sebahar, paul r.; haussener, travis j.; reddy, hariprassada k.; looper, ryan e.; williams, dustin l. title: examination of a first-in-class bis-dialkylnorspermidine-terphenyl antibiotic in topical formulation against mono and polymicrobial biofilms date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hhlma biofilm-impaired tissue is a significant factor in chronic wounds such as diabetic foot ulcers. most, if not all, antibiotics in clinical use have been optimized against planktonic phenotypes. in this study, an in vitro assessment was performed to determine the potential efficacy of a unique first-in-class series of antibiofilm antibiotics and compare outcomes to current clinical standards of care. the agent, cz- , was formulated into a hydrogel and tested against mature biofilms of a clinical isolate of methicillin-resistant staphylococcus aureus and pseudomonas aeruginosa atcc using two separate methods. in the first method, biofilms were grown on cellulose discs on an agar surface. topical agents were spread on gauze and placed over the biofilms for h. biofilms were quantified and imaged with confocal and scanning electron microscopy. in the second method, biofilms were grown on bioabsorbable collagen coupons in a modified cdc biofilm reactor. coupons were immersed in treatment for h. the first method was limited in its ability to assess efficacy.
efficacy profiles against biofilms grown on collagen were more definitive, with cz- gel eradicating well-established biofilms to a greater degree compared to clinical standards. in conclusion, cz- may be a promising topical agent that targets the biofilm phenotype. pre-clinical work is currently being performed to determine the translatable potential of cz- gel. the centers for disease control and prevention (cdc) label the rapid global growth of drug-resistant pathogens "one of our most serious health threats" [ ] . the world health organization (who) also warns that "without urgent, coordinated action by many stakeholders, local, high doses of antibiotics that can be applied regularly to sustain antimicrobial delivery. topical delivery also helps maintain a moist wound bed, which facilitates the prevention of tissue dehydration, accelerates angiogenesis, assists in the breakdown of necrotic tissue and/or fibrin, and provides for the transport of cytokines and growth factors [ , ] . ) development of novel antimicrobial agents that address the current global threat of antibiotic resistance. we tested the in vitro efficacy of a topical formulation, the active component of which is a compound synthesized as part of a unique first-in-class series of antibiofilm agents (referred to as cz compounds). more specifically, czs are designed and synthesized to specifically wise over min. the solution was stirred for h. nabh ( . g, . , equiv.) was added portion-wise over min and the reaction stirred for an additional h. the solvent was evaporated, and the crude solid partitioned between etoac ( ml) and % naoh ( ml). the naoh phase was washed with etoac ( ml), and the combined organics were dried over na so . column chromatography was performed using gradient conditions starting at ( approximately mg of topical agent were spread in a thin layer, i.e., "buttered" on sterile " x " cotton gauze. 
the "buttered" side of the gauze pad was placed in contact with the discs such that all n= discs were covered completely. three additional gauze pads were placed cellulose discs were sterilely removed and placed individually into ml of pbs. samples were vortexed for min, sonicated for min at khz and plated using a -fold dilution series to quantify the cfu/disc that remained after treatment. positive controls of growth (n= ) with no treatment were also quantified for comparison. the same growth protocol as outlined above was used to test the efficacy of topical products against polymicrobial biofilms. however, inocula concentrations were varied to grow mrsa and p. aeruginosa as polymicrobial biofilms. when the two isolates were inoculated at a : or even a : , ratio, p. aeruginosa overwhelmed the mrsa isolate. as such, a : , ratio was used; mrsa was inoculated at a concentration , times higher than p. aeruginosa. each isolate was suspended to a turbidity of % using a nephelometer (concentration equated to ~ x cfu/ml). mrsa was diluted : , (~ x cfu/ml) and p. aeruginosa was diluted : , , (~ x cfu/ml) using a -fold dilution series. twenty-five μl of each solution were pipetted onto cellulose discs for a total of μl per sample. polymicrobial biofilm growth was quantified as described above to obtain a baseline of cfu/disc. were drilled halfway through the rod (fig. c). heliplug™ collagen coupons were sterilely cut to size ( mm x mm), and pressed into each bored-out cavity of a rod (fig. c). assembly was performed in a biosafety cabinet to maintain sterility. five-hundred ml of brain heart infusion (bhi) broth were added to the reactor after it was assembled. the broth was aseptically inoculated with cfu/ml (adjusted from . mcfarland standard) of mrsa or p. aeruginosa for monomicrobial biofilm growth. the reactor was set on a hot plate at °c and a baffle rotation of rpm.
bacteria were grown in batch phase for h, after which a % solution of bhi was flowed through the reactor at a rate of . ml/min using a peristaltic pump (masterflex l/s microbore, cole palmer, vernon hills, il) for an additional h (fig a) . the inoculation protocol to grow polymicrobial biofilms on collagen was similar to cellulose; each isolate was suspended to a . mcfarland standard (~ x cfu/ml). mrsa was diluted : , (~ x cfu/ml) and p. aeruginosa was diluted : , , (~ x cfu/ml). however, we determined experimentally that polymicrobial biofilms grew more polymicrobial biofilms were otherwise grown as described for monomicrobial growth. the -h biofilm growth protocol for cellulose produced well-established, mature monomicrobial and polymicrobial biofilms of both mrsa and p. aeruginosa ( fig. a- c ). however, quantification outcomes following efficacy analyses were highly variable, in particular with the clinically-relevant products (data not shown as it was sporadic at best). we established a sub-hypothesis after observing the inconsistent outcomes: we hypothesized that topical treatments failed to reach the biofilms that formed on the underside of the cellulose disc (immediately adjacent to the surface of the agar), and the lack of exposure in that region led to highly variable quantification data. to test the sub-hypothesis, biofilms were grown on cellulose following the same growth protocol outlined above. sem and clsm imaging was performed to determine: ) if biofilms formed on the underside of the cellulose fiber network that was in apposition to the agar surface, and ) if those biofilms on the underside of cellulose discs were still viable following the topical product delivery protocol. sem imaging confirmed the presence of biofilms on, within, and between the interstices of the fibers on the underside of cellulose discs (figs. and ) . 
live/dead imaging also indicated that biofilms were viable on all surfaces of cellulose discs, but only surfaces in direct contact with topical agents showed cell death; bacteria on the underside and center of cellulose discs stained green (living), supporting our sub-hypothesis that bacteria on untreated surfaces were still viable and were not exposed to topical product treatments (figs. and ) [ ]. despite the limitation of this method, live/dead staining provided some useful information on topical efficacy. confocal imaging and staining indicated that cz- was highly effective against well-established biofilms that were exposed to the formulated gel, whereas clinical products had limited efficacy (fig. ). these outcomes provided the rationale for performing analysis on collagen coupons. biofilms on collagen grew to maturity (fig. ), and sem images indicated more robust biofilm formation compared to cellulose discs. quantification of positive controls supported this observation with ~ log more cfu/coupon compared to cellulose discs for both isolates. quantification data from efficacy testing against biofilms on collagen are reported in table . outcomes indicated that of the clinical standards of care, gentamicin was most effective against both monomicrobial and polymicrobial biofilms of mrsa and of p. aeruginosa (fig. and table ). gentamicin showed a log reduction of . cfu/collagen in monomicrobial biofilms of mrsa, and against polymicrobial biofilms it was effective against mrsa with a log reduction of . cfu/collagen. against both monomicrobial and polymicrobial biofilms, gentamicin showed complete eradication of p. aeruginosa, with no detectable growth (table ). at all three concentrations ( . %, %, %) cz- reduced all monomicrobial and polymicrobial biofilms to below detectable levels ( fig. and table (table ) . retapamulin-treated mrsa biofilms had a log reduction of . cfu/collagen (table ). respectively.
the same was observed for polymicrobial biofilms; silver sulfadiazine showed . log reductions and neosporin® showed no reduction (table and fig. ). these data are of topical products this led to variable outcomes, complicating data interpretation. we conclude that this method may not be ideal for assessing efficacy of topical products unless it is modified to control for the lack of exposure to biofilms that are adjacent to the agar surface. cz- gels (all three concentrations) had equal efficacy against p. aeruginosa biofilms as gentamicin in the collagen test (fig. ) . cz- gels were more efficacious at eradicating biofilms in all other cases when compared to the standard of care topicals in the collagen tests (fig. ) . gentamicin had the greatest log reduction against monomicrobial biofilms of p. aeruginosa and polymicrobial biofilms amongst the clinical standards of care. these data were promising, but broader-scale consideration is given in clinical context; antibiotic resistance threats in the united states global antimicrobial resistance surveillance system (glass) report the x ' initiative: pursuing a global commitment to develop new antibacterial drugs by bad bugs, no drugs: no eskape! an update from the infectious diseases society of america federal funding for the study of antimicrobial resistance in nosocomial pathogens: no eskape cdc ( ) a public health action plan to combat antimicrobial resistance antimicrobial resistance: a global view from the world healthcare-associated infections forum the research agenda of the national institute of allergy and infectious diseases for antimicrobial resistance dakin's solution: is there a place for it in the st century? 
how to manage pseudomonas aeruginosa infections rapid degradation and non-selectivity of dakin's solution prevents effectiveness in contaminated musculoskeletal wound models fda ( ) novel drug approvals for fda ( ) novel drug approvals for fda ( ) novel drug approvals for fda ( ) novel drug approvals for fda ( ) novel drug approvals for fda ( ) novel drug approvals for fda ( ) novel drug approvals for fda ( ) novel drug approvals for novel drug approvals for fda ( ) novel drug approvals for ecology of bacteria in chronic wounds outcomes of surgical treatment of diabetic foot osteomyelitis: a series of patients with histopathological confirmation of bone involvement diagnostic strategies in osteomyelitis culture of percutaneous bone biopsy specimens for diagnosis of diabetic foot osteomyelitis: concordance with ulcer swab cultures culture of per-wound bone specimens: a simplified approach for the medical management of diabetic foot osteomyelitis multiple bacterial species reside in chronic wounds: a longitudinal study biofilms in chronic wounds survey of bacterial diversity in chronic wounds using pyrosequencing, dgge, and full ribosome shotgun sequencing the penetration of antibiotics into aggregates of mucoid and non-mucoid pseudomonas aeruginosa self-generated diversity produces "insurance effects" in biofilm communities biofilm maturity studies indicate sharp debridement opens a time-dependent therapeutic window topical and systemic antibiotics in the management of periodontal diseases introduction to biofilm characterization of biologic properties of wound fluid collected during early stages of wound healing overview of wound healing in a moist environment in vitro testing of a first-in-class tri-alkylnorspermidine-biaryl antibiotic in an anti-biofilm silicone coating interaction between polyamines and bacterial outer membranes as investigated with ion-selective electrodes an in vitro biofilm model to examine the effect of antibiotic ointments on biofilms 
produced by burn wound bacterial isolates experimental model of biofilm implant-related osteomyelitis to test combination biomaterials using biofilms as initial inocula in vivo efficacy of a silicone -cationic steroid antimicrobial coating to prevent implant-related infection growth substrate may influence biofilm susceptibility to antibiotics developing selective media for quantification of multispecies biofilms following antibiotic treatment selection for staphylococcus aureus small-colony variants due to growth in the presence of pseudomonas aeruginosa microevolution of cytochrome bd oxidase in staphylococci and its implication in resistance to respiratory toxins released by pseudomonas small-colony variant selection as a survival strategy for staphylococcus aureus in the presence of pseudomonas aeruginosa advanced wound care therapies for nonhealing diabetic, venous, and arterial ulcers: a systematic review glaxosmithkline ( ) bactroban % cream or ointment product information staphylococcus aureus biofilms: properties, regulation, and roles in human disease world-wide antibiotic resistance in methicillin-resistant staphylococcus aureus the emergence of mupirocin resistance: a challenge to infection control and antibiotic prescribing practice mupirocin resistance the epidemiology of mupirocin resistance among methicillin-resistant staphylococcus aureus at a veterans' affairs hospital -heptyl- -hydroxyquinoline n- oxide, an antistaphylococcal agent produced by pseudomonas aeruginosa effects of staphylolytic enzymes from pseudomonas aeruginosa on the growth and ultrastructure of staphylococcus aureus synergistic interactions of pseudomonas aeruginosa and staphylococcus aureus in an in vitro wound model cytotoxicity is an important consideration, recognizing that infection is also toxic. nevertheless, in vivo data are currently being collected to determine if the observed cytotoxicity affects host tissue in an excision wound model. 
antimicrobial delivery by topical therapy provides the ability to achieve high university. the authors thank scott porter for his technical assistance. key: cord- -yjl uj h authors: vickaryous, nicola; jitlal, mark; jacobs, benjamin meir; middleton, rod; chandran, siddharthan; macdougall, niall john james; giovannoni, gavin; dobson, ruth title: remote testing of vitamin d levels across the uk ms population – a case control study date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yjl uj h objective the association between vitamin d deficiency and multiple sclerosis (ms) is well described. we set out to use remote sampling to ascertain vitamin d status and vitamin d supplementation in a cross-sectional study of people with ms across the uk. methods people with ms and matched controls were recruited from across the uk. people with ms enrolled in the study; remote sampling kits were distributed to a subgroup. dried blood spots (dbs) were used to assess serum (oh)d in people with ms and controls. results ms participants completed the questionnaire; ms participants and controls provided biological samples. serum (oh)d was higher in ms than controls (median nmol/l vs nmol/l). a higher proportion of ms participants than controls supplemented ( % vs %, p< . ); people with ms supplemented at higher vd doses than controls (median vs iu/day, p< . ). people with ms who did not supplement had lower serum (oh)d levels than non-supplementing controls (median nmol/l vs nmol/l). participants engaged well with remote sampling. conclusions the uk ms population have higher serum (oh)d than controls, mainly as a result of vitamin d supplementation. remote sampling is a feasible way of carrying out large studies. ms susceptibility is a complex trait influenced by genetic and environmental factors. established environmental risk factors include ebv seropositivity, smoking, and childhood obesity [ ] [ ] [ ] . 
low serum -hydroxyvitamin d ( (oh)d) levels in adulthood, or even soon after birth, are associated with greater risk of developing ms [ ] [ ] [ ] . vitamin d is primarily derived from the uv light-dependent conversion of dehydrocholesterol to cholecalciferol in skin. serum (oh)d is formed by the hepatic -hydroxylation of cholecalciferol, which is further hydroxylated in the kidney to generate the biologically active compound ( , hydroxyvitamin d). (oh)d is most commonly used as a measure of vitamin d status due to its long half-life, relative stability and direct biological relationship to , hydroxyvitamin d [ ] . vitamin d is an attractive target for potential intervention in ms as it represents an easily modifiable factor. however, data is conflicting regarding the role of vitamin d in driving inflammation and/or progression in people with established ms. clinical trials of vitamin d supplementation in ms have failed to provide robust evidence of benefit [ ] . several recent meta-analyses looking at clinical trials of vitamin d for the treatment of ms have demonstrated at best modest reductions in annualised relapse rates (arr) and/or brain lesion activity but no impact on disability [ ] [ ] [ ] . there are thought to be multiple factors influencing vitamin d status in ms populations [ ] . current population guidelines recommend an intake of at least iu/day vitamin d for all [ ] . there is a lack of consensus and evidence on whether people with ms should be advised to supplement with vitamin d over and above the advice given to the general population. single centre studies examining vitamin d supplementation behaviours are subject to bias due to practices of individual neurologists; collecting supplementing information without the wider lifestyle context or serum vitamin d levels significantly limits interpretation. 
remote sampling using dried blood spots provides a means of testing biomarkers across an entire population without the need for in-person visits, which is of rapidly increasing relevance in the current covid- pandemic. we set out to examine the feasibility of a large-scale research project performed entirely remotely, including remote sampling using dried blood spots. we used remotely deployed questionnaires backed up with biological sampling to examine the behaviours and lifestyle factors that influence vitamin d and assess their contribution to the serum vitamin d status across the uk ms population. the primary method of recruitment was via the uk ms register [ ] . , individuals with ms were invited to participate; people with ms provided informed consent and completed a baseline questionnaire over weeks using the online platform; an additional participants (postal participants) directly contacted the study site ( figure ). individuals were additionally recruited via regional ms networks. questionnaires with sampling kits were distributed to three ms clinics across the uk -edinburgh, lanarkshire and london. sampling kits were handed out to potential participants. each ms participant who was given a sampling pack was asked to recruit an unrelated friend as a matched control. they were asked to select someone of the same gender, within years of age and living within a -mile radius (but not in the same house) as themselves. the uk ms register has ethical approval via south west bristol rec ( /sw/ ). this study had additional ethical permissions via london stanmore rec ( /lo/ ). stratified random sampling was used to select uk ms register participants to receive kits. participants were grouped (stratified) based on geographical location ( km x km square), ms type (rrms, spms, ppms) and disability (low disability classified as edss < , high disability edss ≥ ). random sampling within groups was then performed. 
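the stratified selection described above can be sketched in code. this is a minimal illustration under stated assumptions, not the study's actual procedure: the field names (grid_square, ms_type, disability_band), the per-stratum quota and the fixed seed are all hypothetical, since the text does not report how many participants were drawn from each group.

```python
import random
from collections import defaultdict


def stratified_sample(participants, per_stratum, seed=0):
    """Draw up to `per_stratum` participants at random from each stratum.

    A stratum is the combination of geographic grid square, MS type and
    disability band. All field names and the quota are illustrative
    assumptions, not values taken from the study.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group participants by stratum key.
    strata = defaultdict(list)
    for person in participants:
        key = (
            person["grid_square"],
            person["ms_type"],
            person["disability_band"],
        )
        strata[key].append(person)

    # Sample without replacement inside each stratum.
    selected = []
    for key in sorted(strata):
        members = strata[key]
        k = min(per_stratum, len(members))  # small strata are taken whole
        selected.extend(rng.sample(members, k))
    return selected
```

sampling within each group, rather than from the pooled register, is what keeps less common combinations (for example, ppms with high disability in a sparsely populated square) represented among those sent kits.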
a host of demographic and ms-specific data were collected including geographical location, gender, age, bmi, smoking status, ms type, edss, msis and date of diagnosis. where available, expanded disability status scores (edss) derived from a web-based application were used as a proxy for disability levels [ ] , and estimates of disease physical and psychological impact via the multiple sclerosis impact scale (msis- ). data on vitamin d supplementation was collected including supplement use, frequency, and dose at the time of questionnaire completion. participants completing the online form were invited to upload an image of their supplement to validate supplement dose. information of diet type and consumption of oily fish, and assessment of time spent on outdoor activities and uv sun protection averaged over the past months was also collected. to ensure complete capture of sun protection factor containing products in addition to sunblock (moisturiser, foundation, mineral powder etc.), participants were asked about both 'cosmetic sunblock' and 'sunblock' usage. each sampling pack contained two sampling kits, one for the ms participant and one for their matched control. each sampling kit contained a fully equipped dried blood spot (dbs) sampling system to collect a blood sample for vitamin d analysis, and a buccal swab for genetic material. a questionnaire was included for controls, and for those ms participants where data was not entered via the online system. sampling packs were sent out february-july . samples were received back at the study site february-september . serum vitamin d concentrations were measured from dbs [ ] . upon receipt samples were stored at - °c and underwent analysis in four batches. liquid chromatography tandem mass spectrometry was used to determine total (oh)d [ , ] . 
two dbs were analysed per participant; results were excluded if duplicate analysis differed by ≥ %, if only one viable dbs was available, or if dbs were deemed to be of poor quality, i.e., spots too small, not fully soaked through or multiple overlapping spots. we then set out to verify our findings using an independent sample set derived from uk biobank (ukbb) [ ]. questionnaire and biomarker data from participants' baseline visit ( - ) were used. each individual with ms at the time of uk biobank registration (n= ) was randomly matched to four controls (n= ), stratified by age, gender, and ethnicity (white vs non-white). data including baseline serum (oh)d levels, vitamin d supplementation (yes/no; no dose information available), oily fish consumption, time spent outdoors and uv sun protection usage were analysed. statistical analyses for ms register data were performed using spss v and r (v. . . ). geographical mapping was performed using arcgis . . analysis of uk biobank data was carried out using r (version . . ). relationships between categorical variables were analysed using the chi-squared test of association; non-normally distributed continuous variables were analysed using the mann-whitney u test and the kruskal-wallis test was used to compare + groups. simple linear regression was used to examine the relationship between demographic, solar and lifestyle behaviour that may affect dose of vitamin d and serum (oh)d levels. individual-level data used in this study are available via the uk ms register by application from any suitably qualified investigator to the uk ms register steering committee. participants with ms provided questionnaire data. this group consisted of individuals recruited via the uk ms register, postal participants and participants from local ms clinics who returned packs. this group had a wide geographical distribution across the uk (supplementary figure a).
their demographics were consistent with that expected across an ms population; % female and predominantly relapsing remitting ms (rrms) ( table ) . sampling kits were posted out to participants. of kits sent to network sites, were distributed to potential participants. sampling packs were sent out to participants from across the united kingdom the demographics of the group from whom biological samples were obtained reflected stratified sampling across ms type and disability levels (table ) , with approximately % rrms, % spms, % ppms. edss scores were available for participants in this group; the median edss was . (iqr ) (table ) . controls appeared well-matched (table ) , with no significant difference in sex or age distribution. controls had a slightly higher bmi than participants with ms (median bmi in ms vs in controls; p= . ), and there was no difference in the proportion of current smokers in the two groups. % ( / ) of the ms participants from the biological sampling group reported taking vitamin d supplements compared to % ( / ) of controls (p< . ; table ). this did not appear to be restricted to the uk ms register population: % ( / ) ms participants recruited through clinics supplemented compared to just % ( / ) of their matched controls. there was no difference in reported rates of vitamin d supplementation across gender, ms type, disability level or score on msis (table ) . where dose data were available, ms participants (n= ) reported a higher median vitamin d supplement dose than controls (n= ) ( vs. iu/day; p< . ) (figure a) . table ). more ms participants identified as either vegetarian or vegan ( % vs % controls), p= . . there was no difference in oily fish consumption (supplementary table ). ms participants were more likely to report rarely spending time on outdoor activities ( % vs % controls), p< . (supplementary table ), which was strongly associated with disability levels. 
% ( / ) of participants with high edss ( ) rarely participated in outdoor activities compared to % ( / ) of participants with low edss (< ) (p< . ) (data not shown). ms participants were less likely than controls to wear sunblock ( vs % "never" wear sunblock, p< . ) (supplementary table ). females, both cases and controls, were more likely to wear cosmetic sunblock than males ( % females vs % males reported wearing it weekly or more, p< . ) (supplementary table ). there was no significant difference between ms and control females with respect to cosmetic sunblock usage, p= . (supplementary table ) . median serum (oh)d levels were higher in ms participants than controls: vs nmol/l, p< . ( figure b) . ms participants were more likely to have adequate serum levels (defined as > nmol/l) ( % ms vs % controls) ( table ). there were no differences in serum vitamin d levels by gender, ms table ). in the non-supplementing ms population increasing age had a negative association with serum (oh)d in a multivariable model. bmi was not associated with serum (oh)d levels. in the supplementing ms population there was a positive association between increasing vitamin d dose and serum (oh)d levels, but age, bmi or solar contributions were not associated with serum levels (supplementary table ). table ). a lower proportion of people with ms took vitamin d supplementation at ukbb enrolment than in our current study, however they were still more likely to do so than the matched controls ( % vs %, p< . ) (supplementary table ). in this case-control study we found a striking difference in vitamin d supplementation between people with ms and controls. % participants with ms report taking vitamin d supplements compared to just % of controls. not only were ms participants more likely to take vitamin d supplements, but they also took them at higher doses, such that people with ms in the uk now have overall higher serum (oh)d levels than controls. 
when stratified by supplementation habits we found that non-supplementing people with ms had lower levels of serum (oh)d. these findings carry implications for any future vitamin d supplementation trial -double blind, placebo-controlled supplementation trials need to take current behavioural patterns into account, and a "treat to target" trial utilising remote sampling is likely the most feasible study design for any large-scale study. this study is novel in its use of remote sampling technology. given the current covid- pandemic, the use of remote technologies to enable clinical trials to continue is highly relevant; we demonstrate that this is feasible in ms. the wide coverage we were able to achieve using remote sampling is particularly important when studying an environmentally sensitive endpoint such as serum (oh)d. the recruitment of a large pool of participants allowed us to stratify and select participants for biological sampling which represented all stages of ms with a range of disability. the use of straightforward sampling techniques carried out by participants at home allowed us to enrol all members of the ms community regardless of care centre, location or disability level. the relatively low rate of responses to the initial questionnaire likely reflects that this was the first uk ms register-hosted study where participants were asked to de-anonymise themselves for research purposes, and where biological sampling was required. the rate of return of usable sample packs ( %) is in keeping with other studies requiring sample return. this study is not without limitations. as recruitment primarily took place through a voluntary ms register, it could be argued that this high rate of supplementation resulted from a recruitment bias with an a priori interested population. people were aware from the information sheet that the purpose of the study was to establish vitamin d levels across the uk ms population. 
however, no overt reference was made to either an underlying hypothesis linking vitamin d deficiency to ms or recommended intakes. it could also be argued that the population taking part in the uk ms register represent a more engaged and educated group with respect to vitamin d supplementation and ms. furthermore, the recruitment of a subset of individuals directly from ms clinics across the uk enabled us to estimate bias related to method of recruitment. similarly high rates of supplementation were found in ms participants recruited via both means. participant recruitment of age and sex-matched controls may have induced bias related to overmatching, however the exclusion of household controls mitigates this to some degree. whilst similarities may remain around socioeconomic status and other lifestyle factors, we see that the impact of differential vitamin d status far outweighs this. whilst the ukbb population demonstrated a lower rate of vitamin d supplementation amongst people with ms compared to our current study, vitamin d supplementation was still significantly higher than in controls. the reason(s) for the discrepancy between vitamin d usage between the current study and the uk biobank population is unclear, but at least some of this difference may be attributed to the changes in vitamin d usage over the last years. ukbb baseline data was collected - years ago, and attitudes towards vitamin d supplementation in the uk have changed significantly over this time [ ] . due to the cross-sectional nature of this observational case-control study we are unable to make inferences with regard to vitamin d status and disease progression. however, the potential to re-recruit via the same online platform for follow-up remains. the use of self-reported behaviours is a further limitation, however, the dose-response to vitamin d supplementation and validation using photographs of supplements overcome this to some degree. 
the uk ms register population is predominantly white british [ ] and this study needs to be replicated in an ethnically diverse population. finally, whilst the return rate of biological samples was high for a survey-based study, it remains significantly lower than in direct sampling studies, and this must be considered in future remote sampling studies. epstein-barr virus antibodies and risk of multiple sclerosis: a prospective study bmi and low vitamin d are causal factors for multiple sclerosis: a mendelian randomization study smoking is a major preventable risk factor for multiple sclerosis serum -hydroxyvitamin d levels and risk of multiple sclerosis vitamin d status and the risk of multiple sclerosis: a systematic review and meta-analysis neonatal vitamin d status and risk of multiple sclerosis: a population-based case-control study vitamin d status: measurement, interpretation, and clinical application vitamin d for the management of multiple sclerosis vitamin d for the treatment of multiple sclerosis: a meta-analysis the effects of vitamin d supplementation on expanded disability status scale in people with multiple sclerosis: a critical, systematic review and metaanalysis of randomized controlled trials associations of serum (oh) vitamin d levels with clinical and radiological outcomes in multiple sclerosis, a systematic review and meta-analysis low vitamin d- (oh) level in indonesian multiple sclerosis and neuromyelitis optic patients vitamin d supplementation multiple sclerosis top | james lind alliance validating the portal population of the united kingdom multiple sclerosis register validating a novel web-based method to capture disease progression outcomes in multiple sclerosis a sensitive lc/ms/ms assay of oh vitamin d and oh vitamin d in dried blood spots measurements of -hydroxyvitamin d concentrations in archived dried blood spots are reliable and accurately reflect those in plasma trends in the incidence of testing for vitamin d deficiency in 
primary care in the uk: a retrospective analysis of the health improvement network (thin) anti-aging and sunscreens: paradigm shift in cosmetics sunscreen photoprotection and vitamin d status the effect of sunscreen on vitamin d: a review a systematic review of modifiable risk factors in the progression of multiple sclerosis vitamin d deficiency is associated with disability and disease progression in multiple sclerosis patients independently of oxidative and nitrosative stress association of vitamin d metabolite levels with relapse rate and disability in multiple sclerosis self-administration of vitamin d supplements in the general public may be associated with high -hydroxyvitamin d concentrations this study was funded by the uk ms society (grant ref ). this work was performed on the preventive neurology unit, which is funded by barts charity. the authors have no conflicts of interest directly relevant to this study to declare. author contributions: rd conceived the study with input from gg. nv and rd designed the study with input from mj, bj and rm. mj and bj designed and performed the statistical analysis. nv, sc, nm, rm played a role in acquiring and analysing data. nv drafted the manuscript with input from all coauthors. of the total n that provided supplementation data this n had a dose available; c of the total n that provided supplementation data this n had serum (oh)d levels available; d msis- scores were divided into quartiles and comparisons were made between lowest quartile (low impact) and highest quartile (high impact) key: cord- -n kpvsvg authors: nguyen, long t.; smith, brianna m.; jain, piyush k. title: enhancement of trans-cleavage activity of cas a with engineered crrna enables amplified nucleic acid detection date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: n kpvsvg the crispr/cas a rna-guided complexes have tremendous potential for nucleic acid detection due to their ability to indiscriminately cleave ssdna once bound to a target dna. however, the current crispr/cas a systems are limited to a picomolar detection limit without an amplification step. here, we developed a platform with engineered crrnas and optimized conditions that enabled us to detect dna, dna/rna heteroduplex and methylated dna with higher sensitivity, achieving a limit of detection in the femtomolar range without any target pre-amplification step. by extending the ’- or ’-ends of the crrna with different lengths of ssdna, ssrna, and phosphorothioate ssdna, we discovered a new self-catalytic behavior and an augmented rate of lbcas a-mediated collateral cleavage activity as high as . -fold compared to the wild-type crrna. we applied this sensitive system to detect as little as fm dsdna from the pca gene, an overexpressed biomarker in prostate cancer patients, in simulated urine over hours. the same platform was used to detect as little as ~ fm cdna from hiv, fm rna from hcv, and fm cdna from sars-cov- , all within minutes without a need for target amplification. with isothermal amplification of sars-cov- rna using rt-lamp, the modified crrnas were incorporated in a paper-based lateral flow assay that could detect the target with up to -fold higher sensitivity within - minutes. based on the crystal structure of lbcas a/crrna/dsdna (pdb id: xus) , we reasoned that crrna extensions can influence the trans-cleavage activity by either activating or inhibiting the catalytic efficiency of cas a, allowing us to better understand crrna design with tunable trans-cleavage activity. we speculate that chemical modifications of the crrna can potentially change its nature of binding and subsequently alter this collateral cleavage due to conformational changes of the cas a dynamic endonuclease domain. 
we placed ssdna, ssrna, and phosphorothioate ssdna extensions of various lengths ranging from to nucleotides on either the '-or '-ends of the crrna targeting gfp (green fluorescent protein), referred to here as crgfp (fig. b-h) . in order to measure the collateral or trans-cleavage activity of cas a, we employed a fret-based reporter used in detectr, composed of a fluorophore (fam) and a quencher ( iabkfq) connected by a -nucleotide sequence (ttatt), which displays increased fluorescence upon cleavage. consistent with the previous literature , respectively. the fold in fluorescence was normalized by taking the ratio of background-corrected fluorescence signals of sample with activator to the corresponding sample without activator. error bars represent ± sem, where n = replicates; two-way anova test two-way anova (n= , n= , ns p > . , *p < . , **p < . , ***p < . , ****p < . ). the experiments were repeated at least twice with n = per experiment. when using wild-type crrnas, we observed that the lbcas a exhibited higher trans-cleavage activity than the ascas a or the fncas a, and therefore, we designed various modified crrnas compatible with lbcas a. using the same reporters, we discovered that ssdna and ssrna extensions on the '-end of crgfp markedly enhanced the trans-cleavage ability of target-activated lbcas a. comparing the two types, the ssdna extensions demonstrated higher activity than the corresponding ssrna (figs. b-d,f and supplementary figs. [ ] [ ] [ ] [ ] [ ] . on the other hand, the phosphorothioate ssdna extensions at the '-or '-end displayed minimal or no activity, showing decreased fluorescence intensity as modification length increased (figs. e,h and supplementary figs. [ ] [ ] [ ] [ ] [ ] . this observation suggests that further extending the crrna with -mer phosphorothioate ssdna and beyond significantly inhibits lbcas a trans-cleavage activity. the finding corroborated b. 
li and colleagues' suggestion that phosphorothioate ssdna may prevent crrna-cas a-dna complex formation. notably, the '-dna with -mer extensions on the crgfp, referred to as crgfp+ 'dna , yielded the highest fluorescence signal compared to other modifications, measuring approximately . fold higher intensity than the wild-type crgfp (fig. c, supplementary figs. a, a) . by investigating the conformational changes from the crystal structure of the binary lbcas a:crrna complex, we observed that the '-end modifications on crrna are proximal to the ruvc region of the lbcas a. this supports our observation that the '-end extensions lead to higher trans-cleavage activity than the '-end extensions. we speculated that once an r-loop is formed between crrna and dsdna or ssdna activator, the lbcas a executes a partial trans-cleavage of the '-end of crrna, leaving an overhang. these remaining extensions may further expand the nuclease domain in the lbcas a, resulting in conformational changes and allowing more access for nonspecific ssdna cleavage. to test our hypothesis, we attached different fluorophores, or fluorophore-quencher pairs separated by dna linkers, to either the '-or '-end of the crgfp with -mer dna extensions and analyzed the products by denaturing gel electrophoresis. surprisingly, we discovered that the '-end of the crrna is processed by lbcas a only in the presence of an activator while the '-end is cleaved by lbcas a even in the absence of the activator (fig. a,b and supplementary figs. , ). by placing the fluorophore fam on the '-end and a -mer dna extension on the '-end of the crgfp, we learned that the first uracil on the '-end of the crgfp gets trimmed by lbcas a in the absence of an activator, which corroborated previous studies reported for fncas a . as a result, the '-end modifications are eliminated and converted back to the wild-type crrna before complexing with the activator. 
this finding reinforces our previous observation that the 'extended crrna has similar collateral cleavage activity as the wild-type crrna. fascinated by lbcas a pre-crrna processing as previously described and from our observations, we investigated how extensions of the mature crrna would influence the trans-cleavage activity compared to the corresponding extended pre-crrna. we discovered that the modified pre-crrna and modified mature crrna (tru-crrna) exhibited comparable trans-cleavage efficiency (fig. g) . furthermore, when a dsdna or an ssdna activator was present, the '-and '-end dna- to further understand the lbcas a enhanced enzymatic activity, we performed a michaelis-menten kinetic study on the wild-type crgfp and the crgfp+ 'dna and observed that the ratio kcat/km was . -fold higher for crgfp+ 'dna than the unmodified crgfp (figs. c,d) . the time-dependent gel electrophoresis analysis of nonspecific cleavage of ssdna m mp phage (~ kb) reconfirmed the fluorophore-quencher-based reporter assay results (fig. e) . we speculated that the reporter composition itself may affect the lbcas a collateral cleavage activity. therefore, we incorporated and tested various nucleotides (gc and ta-rich) and fluorophores (fam, hex, and cy ) within the reporter. consistent with our hypothesis, we observed that the lbcas a achieved maximal trans-cleavage activity with fam or hex and tarich reporter ( fig. f and supplementary figs. [ ] [ ] [ ] [ ] [ ] . furthermore, these results led us to question whether the trans-cleavage activity is dependent on the sequence of ssdna extensions on '-end of the crrna. to test this, we altered the nucleotide content of the extended regions of the crgfp. it turned out that the crgfp with ta-rich extensions carried out significantly more collateral cleavage than those with gc-rich regions ( fig. h and supplementary fig. ). 
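The Michaelis-Menten comparison described above (the ratio kcat/km for the extended versus the wild-type crrna) can be illustrated with a minimal sketch. It assumes initial rates have been measured at several reporter concentrations and uses a Lineweaver-Burk (double-reciprocal) linear fit, which is a swapped-in textbook method and not necessarily the fitting procedure used by the authors. All function names, parameter names, and units are hypothetical.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def lineweaver_burk_fit(substrate, rates):
    """Estimate (vmax, km) from a double-reciprocal linear fit.

    Uses 1/v = (km/vmax) * (1/[S]) + 1/vmax, fitted by ordinary least squares.
    This is an illustrative choice, not the authors' stated method.
    """
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals km / vmax
    intercept = mean_y - slope * mean_x  # equals 1 / vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


def catalytic_efficiency(vmax, km, enzyme_conc):
    """kcat/km, with kcat = vmax / [E] (enzyme_conc in the same units as vmax per time)."""
    kcat = vmax / enzyme_conc
    return kcat / km
```

Comparing `catalytic_efficiency` for two crrna designs measured at the same enzyme concentration gives the kind of fold difference reported in the text. A nonlinear fit would usually be preferred for noisy data, since the double-reciprocal transform amplifies error at low reporter concentrations.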
based on our findings that the trans-cleavage activity is drastically improved by -mer ssdna extensions to the '-end of crgfp, we questioned if the binding of crrna with lbcas a itself is influenced by such modifications. a biolayer interferometry binding kinetic assay revealed that the dissociation constant, kd, between the binary complex lbcas a:crrna and lbcas a:crrna+ 'dna are comparable within a low nm scale ( fig. i and supplementary fig. ). these binding results suggest that the 'dna modification on crrna does not affect the binary complex formation between the lbcas a and the crrna. while -mer ssdna extensions on the '-end of crrna increases trans-cleavage activity with lbcas a, we questioned if this is consistent across other orthologs of cas a. to investigate further, we carried out an in vitro cis-cleavage and trans-cleavage assay of ascas a and fncas a with an extended crgfp compared to a wild-type crgfp ( fig. j and supplementary fig. ). interestingly, the crgfp+ 'dna showed similar results with fncas a; however, it exhibited an opposite effect with ascas a. however, the cis-cleavage activity was found to be comparable between the crgfp and crgfp+ 'dna for all the orthologs tested. overall, lbcas a showed the highest fluorescence signal, which is consistent with previous studies. , through observation of the fluorophore-quencher-based reporter assay and time-dependent gel electrophoresis, we hypothesized that the various extensions of ssdna on the crrna induce conformational changes on lbcas a that result in enhanced endonuclease activity. structural analysis of lbcas a shows that it contains a single ruvc domain, which processes precursor crrna into mature crrna, cleaves target dsdna or ssdna (referred here as activators), and executes nonspecific cleavage afterwards. 
therefore, we were interested in understanding the effects of these modified crrnas on cis-cleavage compared to the wild-type crrna, as well as how cis-cleavage activity correlates with the trans-cleavage activity. towards this, we carried out an in vitro cis-cleavage assay for various '-end and '-end modifications. we noticed that the cis-cleavage activity was either similar or marginally improved with most '-end modifications while the '-end modifications showed either similar or slightly reduced activity. this phenomenon suggests that the trans-cleavage activity is commensurate with the cis-cleavage activity. the kd was determined by the biolayer interferometry binding kinetic assay with r > . . (j) trans-cleavage activity of different variants of cas a. the prefixes lb, as, and fn stand for lachnospiraceae bacterium, acidaminococcus, and francisella novicida, respectively. (k) single-point mutations (m -m ) on the target strand of a dsdna gfp activator. the heat map displays relative fluorescence intensity normalized to wild-type (wt) activator after hours. error bars represent ± sem, where n = replicates. the experiments were repeated at least twice with n = per experiment. next, we sought to characterize the specificity of these extended crrnas in discriminating point mutations across dsdna. by mutating a single nucleotide at each position across the target-binding region, we discovered that the crgfp+ 'dna tolerated mutations and produced a stronger fluorescence signal than the wild-type crgfp ( supplementary fig. ). however, the fluorescence intensity ratio of mutated to the non-mutated dsdna targets for the crgfp+ 'dna was quite comparable to the crgfp ( fig. k and supplementary fig. ). this observation suggests that the modified crrnas increased the sensitivity of trans-cleavage while the specificity remained unchanged. 
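The distinction drawn above between a stronger absolute signal and an unchanged mutated-to-wild-type ratio can be made concrete with a short sketch. It assumes background-corrected fluorescence readings are already available for each mutated activator and for the wild-type activator. The function names, the mutant labels, and the example readings are hypothetical and are not taken from the paper's data.

```python
def fold_change(signal_with_activator, signal_without_activator, background=0.0):
    """Ratio of background-corrected signal with activator to signal without it."""
    numerator = signal_with_activator - background
    denominator = signal_without_activator - background
    if denominator <= 0:
        raise ValueError("background-corrected reference signal must be positive")
    return numerator / denominator


def relative_to_wild_type(mutant_signals, wild_type_signal):
    """Normalize each mutant activator signal to the wild-type activator signal."""
    if wild_type_signal <= 0:
        raise ValueError("wild-type signal must be positive")
    return {name: value / wild_type_signal for name, value in mutant_signals.items()}


# Hypothetical readings: the extended crrna gives higher raw signals everywhere,
# yet the mutant/wild-type ratios (a proxy for specificity) are the same.
unmodified = relative_to_wild_type({"m1": 50.0, "m2": 100.0}, 200.0)
extended = relative_to_wild_type({"m1": 150.0, "m2": 300.0}, 600.0)
```

In this toy case both dictionaries hold identical ratios, which mirrors the reading that sensitivity rose while discrimination between mutated and non-mutated targets did not change.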
previous studies demonstrated that fncas a is a metal-dependent endonuclease, and magnesium ions are required for fncas a-mediated self-processing of precursor crrna. based on these findings, we hypothesized that different metal ions may significantly affect the trans-cleavage activity of lbcas a. this led us to test a range of divalent metal cations, and we discovered that most ions including ca + , co + , zn + , cu + , and mn + significantly inhibited the lbcas a activity ( supplementary fig. ) . by further investigating the zn + mediated inhibition of lbcas a, we found that the inhibition was dose-dependent ( supplementary fig. ). interestingly, ni + ions showed an unusual cis-cleavage activity, possibly due to their interactions with the his tags on lbcas a ( supplementary fig. ) . among the tested divalent metal ions, the mg + ions showed the highest in vitro cis-cleavage activity, which was consistent with the literature. therefore, we characterized the effect of mg + ions on trans-cleavage activity of lbcas a. with increasing concentrations of mg + ions, a significant increase in fluorescence signal was observed in an in vitro trans-cleavage assay. by varying the amount of mg + in the cas a reaction, we identified that the optimal concentration of mg + was around mm ( fig. a-b and supplementary figs. ). we optimized the previously developed crispr-based detection assays and combined them with our engineered crrna+ 'dna to create a crispr-enhance (enhanced analysis of nucleic acids with crrna extensions) technology, referred to here as enhance. to validate the enhance technology, we first selected a clinically relevant nucleic acid biomarker, prostate cancer antigen (pca /dd ), which is one of the most overexpressed genes in prostate cancer tissue and excreted in patients' urine. consequently, elevated pca levels during prostate cancer progression have become a widely targeted biomarker for detection. 
[ ] [ ] [ ] [ ] to determine the limit of detection of pca using our enhance technology, we spiked the pca cdna into synthetic urine and investigated how this clinically-relevant environment affects the activity of cas a. using enhance for detecting the pca cdna, the limit of detection was determined to be as low as fm in the urine at mm mg + concentration compared to ~ pm at mm mg + concentration after hours (figs. a-c and supplementary figs. - ). in contrast, the wild-type crrna also showed a similar fm limit of detection at mm mg + concentration while the limit of detection was ~ pm at mm mg + concentrations after hours. therefore, by combining the crrna modifications with increased mg + ion concentrations, we achieved approximately -fold increase in sensitivity, based on limit of detection calculations. nevertheless, this observation also suggests that our modified crrna+ 'dna significantly improves the limit of detection at low mg + but reaches a saturation point that is comparable with the wild-type crrna at high mg + concentration. to understand the importance of divalent ions in the cas a transcleavage reaction, we carried out a michaelis-menten kinetic study with various mg + concentrations ( supplementary fig. ). we observed that the initial reaction rate of cas a in the presence of high mg + concentrations increased tremendously compared to that in low mg + . ssdna. using modified crrna, the limit of detection of hcv target ssdna was found to be fm ( amoles) at min, without target amplification, mean ± se, two-way anova (n= , n= ). (i) fold change in trans-cleavage activity with lbcas a in presence of pm ( fmols) of target sars-cov- cdna (dsdna), mean ± se, two-way anova (n= , n= , ns p > . , *p < . , **p < . , ***p < . , ****p < . ). (j) lateral flow assay detecting nm ( fmols) of sars-cov- cdna using either crcov- or crcov- + 'dna- within minutes of incubation. (j) schematic diagram showing how a lateral flow assay works. 
briefly, the dipstick uses gold-labeled fitc-specific antibodies that binds to fitc-biotin reporter and travel through membrane. only cleaved reporter will reside at the positive line. (k) lateral flow assay detecting sars-cov- cdna using crcov- and crcov- + 'dna without a pre-amplification step, and (l) band-intensity analysis of (k) using imagej. (m) lateral flow assay detecting sars-cov- rna n gene using crcov- and crcov- + 'dna with rt-lamp, and (n) band-intensity analysis of (m) using imagej. however, the two reaction rates eventually reach a similar saturation point (supplementary figs. [ ] [ ] [ ] . this suggests that mg + is not only required for the cas a reaction, but also accelerates the enzyme's trans-cleavage activity. regardless, mg + plays an important role in lowering the limit of detection in synthetic urine containing pca . while as low as fm (equivalent to . attomoles) of pca cdna can be detected with enhance without any target amplification ( supplementary fig. ) , the clinical concentration of pca mrna in the urine can be lower and therefore may require target pre-amplification. , therefore, we incorporated and tested a recombinase polymerase amplification (rpa) step to isothermally amplify the pca cdna. by combining the rpa step as previously reported, , the concentration of pca cdna in the urine was detectable down to ~ am ( zmol) with . -fold signal to noise ratio (fig. d) . while crrna/lbcas a has been traditionally used to detect unmodified dna, the field is missing the knowledge on how the common epigenetic marker, dna methylation, affects its transcleavage activity. dna methylation is also one of the bacterial defense systems that fight against outside invaders. it would be fascinating to understand how lbcas a collateral cleavage is able to recognize methylated dna targets. 
this curiosity led us to discover that the wild-type crrna had significantly reduced activity in detecting methylated dna, containing -methyl cytosine, compared to the unmethylated dna. however, the enhance showed . -fold and . -fold higher trans-cleavage activity compared to the wild-type crrna for targeting the methylated dsdna and ssdna, respectively (fig. e, supplementary fig. a) . although there are no reports on rna-guided rna targeting by lbcas a, we envisioned that an rna can potentially be detected as a dna/rna heteroduplex. to test this hypothesis, we incorporated a reverse transcription step to convert rna into a cdna/rna heteroduplex before detecting the rna with a trans-cleavage assay. we discovered that the rna can only be detected if the target strand for crrna in the heteroduplex is dna rather than rna. notably, the efficiency of the trans-cleavage activity for the dna/rna heteroduplex was found to be significantly lower than for the corresponding ssdna or dsdna (fig. e, supplementary fig. b) . however, the dna/rna heteroduplex achieved an improved enzymatic collateral activity when using the crrna+ 'dna compared to the wild-type crrna. we applied the enhance to successfully detect low picomolar concentrations of hiv rna target encoding the tat gene with our dna/rna heteroduplex detection strategy (fig. f) . in parallel, ssdna and dsdna targets from hiv were also detected with much higher sensitivity compared to the wild-type crrna within to minutes (figs. f,g and supplementary figs. ) . we further applied the enhance for detecting hcv ssdna and hcv dsdna from the gene encoding a polyprotein precursor, both of which showed consistently higher collateral activity than the wild-type crrnas within minutes (figs. f, h , and supplementary fig. ). the limits of detection for hiv and hcv targets were calculated to be fm cdna and fm ssdna, respectively. 
in the wake of the recent covid- pandemic, there is an urgent need to rapidly detect the sars-cov- coronavirus (referred as cov- here for simplicity). we optimized the enhance to detect cov- dsdna by designing crrnas targeting nucleocapsid phosphoprotein encoding n gene (figs. f,i) . while no clinical samples were tested, the results indicated the 'dna -modified crrna consistently demonstrated higher sensitivity for detecting cov- dsdna within minutes as compared to the wild-type crcov- ( supplementary figs. - ). by incorporating a commercially available paper-based lateral flow assay with a fitc-ssdna-biotin reporter, [ ] [ ] [ ] we could visually detect nm of cov- cdna within minutes of incubation using both wildtype and modified crrnas without any target amplification (fig. j-l and supplementary fig. ). the enzyme trans-cleavage activity exhibited a consistent trend with the crrna+ 'dna among five different targets (fig. f) . when incorporating a reverse transcription step and a loop-mediated isothermal amplification (rt-lamp) strategy into the enhance, both the crcov- -wt and the crcov- + 'dna demonstrated a limit of detection down to a - copies of rna (fig. m) . however, in case of crcov- -wt, the partial cleavage of the reporter resulted in a darker control line on the paper strip. band-intensity analysis showed that the enhance exhibited an average of -fold higher ratio of positive to control line between nm and pm of target cov- rna, while the crcov- -wt indicated an average of only -fold ratio (figs. m,n and supplementary fig. ) . additionally, the time lapse pictures of lateral flow assays showed that the positive line for target-containing samples developed and became visible sooner (within seconds) for the crcov- + 'dna than the crcov- -wt ( supplementary fig. ). in summary, we extended the '-and '-end of the crrna and discovered an amplified transcleavage activity of lbcas a when the '-end is extended with dna or rna. 
we applied this modified crrna/lbcas a system with the optimal conditions to detect pca in simulated urine with high sensitivity. this enhance technology enabled us to detect the dna/rna heteroduplex and methylated dna with unprecedented sensitivity. we further employed this system to test a range of target nucleic acids, including ssdna, dsdna, and rna from hiv, hcv, and sars-cov- without the need for further optimization. these findings are a crucial step towards enhancing detection of nucleic acids and assisting in the diagnosis of various diseases. multiple dna activators were used in this study. the gfp fragment ( bp) was produced by amplifying the pegfp-c plasmid using polymerase chain reaction in the proflex pcr system (thermofisher scientific). the pcr product was purified using monarch® nucleic acid purification kit (new england biolabs inc.). additionally, the -nt ds-gfp and ds-pca activators were generated by annealing two singlestranded ts and nts fragments at a : ratio (integrated dna technologies inc.) in x hybridization buffer ( nm tris-cl, ph . , mm kcl, mm mgcl ). the annealing process was executed in the proflex pcr system at o c for minutes followed by gradual cooling to o c at a rate of . o c/s. the plasmid lbcpf - nls (addgene # , a gift from jennifer doudna lab) was transformed into nico (de ) competent cells (new england biolabs). colonies were picked and inoculated in terrific broth at o c until od = . . iptg was then added to the cultures, and they were grown at o c overnight. cell pellets were collected from the overnight cultures by centrifugation, resuspended in lysis buffer ( m nacl, mm tris-hcl, mm imidazole, . mm tcep, . mg/ml lysozyme, and mm pmsf, ph = ), and broken by sonication. the sonicated solution then underwent high speed centrifugation at , rcf for minutes. the collected supernatant was then run through a ni-nta hispur column (thermofisher) pre-equilibrated with wash buffer a ( m nacl, mm tris-hcl, mm imidazole, . 
mm tcep, ph = ). the column was then eluted with buffer b ( m nacl, mm tris-hcl, mm imidazole, . mm tcep, ph = ). the eluted fractions were then pooled together and underwent tev cleavage overnight at o c (tev protease was purified using the plasmid prk , # from addgene, a gift from david waugh lab). the resulting fraction was equilibrated with buffer c ( mm nacl, mm hepes, . mm tcep, ph = ) at a : ratio and run through hitrap heparin hp ml column (ge biosciences). the column was washed with buffer c and gradually eluted at a gradient rate with buffer d ( mm nacl, mm hepes, . mm tcep, ph = ). the eluted fraction was concentrated down to l and passed through the hiload superdex pg column (ge biosciences). the purified lbcas a was then buffer exchanged with storage buffer ( mm nacl, mm na co , . mm tcep, % glycerol, ph = ) and flash frozen at - o c until use. the bli ni-nta biosensors were purchased from fortebio to perform the binding kinetic study with polyhistidine-tagged lbcas a. in detail, the experiment was carried out in a -well plate and included five steps: baseline, loading, baseline , association, and dissociation. the biosensors were dipped into the baseline containing x kinetic buffer ( x pbs, . % bsa, and . % tween ). they were then transferred into each loading well containing g/ml of lbcas a. after processing through loading and baseline , the protein-tagged biosensor was next allowed to dip into the crrna sample wells at different dilution ( , , . , . , . , . , . , and g/ml) in the association step. the dissociation step occurred when the biosensors were transferred back to baseline at a shake speed of rpm. all the samples were read by the octet qke system (fortebio). kd was determined by software data analysis . (fortebio), and only kd with r > . were extracted for comparison between crrna wild type and modified crrnas. 
in-vitro digestion reactions were carried out with three different types of the cas a family (lbcas a, ascas a, and fncas a were purchased from new england biolabs inc., integrated dna technologies inc., and abm®, respectively) and a wide array of modified crrna's (purchased from dna technologies inc.). cas a and crrna were mixed with : ratio ( nm: nm) in x nebuffer . and pre-incubated at o c for minutes to promote the ribonucleoprotein complex formation. dna activator (gfp or pca fragments) (final concentration of nm) was then added to the mixture to produce a total volume of l and incubated at o c for minutes. the sample was then analyzed in either % agarose gel (for gfp fragments amplified from the pegfp-c plasmid) pre-stained with either syber gold (invitrogen), gelred (biotium inc.), or premade % novex™ tbe-urea gel (invitrogen) . nonspecific cleavage activity of cpf was activated by incubating cpf , crrna, and dna activator with a concentration of nm: nm: nm in x nebuffer . buffer at o c for minutes. m mp was then added to the l reaction mixture and incubated for an additional minutes. a fraction of the reaction was taken out every minutes, quenched in x purple gel loading dye (new england biolabs inc.), and subsequently analyzed in % agarose gel (fisher scientific) . the fluorophore-quencher reporter assay was carried out following a standard clinical detection protocol. the cas a-crrna ribonucleoprotein complex was assembled by mixing nm cas a and nm crrna in x nebuffer . in the proflex pcr system (thermofisher scientific) at o c for minutes (volume . µl). the activator ( nm final concentration), fq reporter ( nm final concentration), and ultrapure™ dnase/rnase-free distilled water (invitrogen) were pre-added to a -well plate (greiner bio-one) to a volume of . µl. the reaction was initiated by adding the cas a-crrna mixture to the -well plate preloaded with activator and fq reporter (integrated dna technologies inc). 
the plate was quickly transferred to a plate reader (clariostar or biotek), and fluorescence intensity was measured every minutes for or hours (detection limit assay) (fam fq: λex: / nm, λem: / nm; hex: λex: / nm, λem: / nm). after or hours (detection limit assay), the sample was scanned for images using the amersham typhoon (ge healthcare). for the michaelis-menten kinetic study, nm lbcas a: nm crrna: nm activator were mixed in nebuffer . and incubated at o c for minutes. the reaction mixture was then transferred to a -well plate (greiner bio-one) preloaded with different concentrations of fq reporter (hex or fam fq reporter: m, . m, . m, . m, . m, and m) and ultrapure™ dnase/rnase-free distilled water (invitrogen) . to find the limit of detection (lod), the fluorophore-quencher reporter assay was carried out with various concentrations of activator. the lod calculations were based on the following formula : lod = . × (std of rfu in the absence of activator) / (slope of rfu vs. activator concentration). the metal ions (mg + , zn + , mn + , cu + , co + , ca + ) were prepared by diluting chloride salts to different concentrations. for cis-cleavage, the cas a-crrna-metal ion duplex was mixed with nm: nm: varying nm ratio in x annealing buffer ( mm tris-hcl, ph . @ °c, mm nacl, mg/ml bsa) and pre-incubated at o c for minutes. dna activator (gfp or pca fragments) was then added to the mixture to a total volume of l and incubated at o c for minutes. to minimize the testing time, the following reagent was assembled in a one-pot reaction: for experiments involving an rt-lamp preamplification step of target rna, the mixture was prepared in the following order (except for the rna and primer mix samples (idt technologies), everything was purchased from new england biolabs): the rt-lamp reaction was incubated at o c for - minutes prior to lbcas a reaction. 
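The limit-of-detection formula given in the methods above (a multiplier times the standard deviation of the blank signal, divided by the calibration slope) can be sketched as follows. The numeric multiplier is not legible in the text, so the sketch takes it as a caller-supplied `factor` rather than assuming a value. The function names are hypothetical, the slope is obtained by ordinary least squares, and the sample standard deviation is used; both choices are assumptions.

```python
from statistics import stdev


def fit_slope(x, y):
    """Least-squares slope of y against x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def limit_of_detection(blank_rfu, concentrations, rfu, factor):
    """LOD = factor * SD(blank RFU) / slope(RFU vs. activator concentration).

    `factor` is the multiplier from the formula; its value is elided in the
    source text, so it must be supplied by the caller (hypothetical parameter).
    The result is in the same units as `concentrations`.
    """
    slope = fit_slope(concentrations, rfu)
    if slope <= 0:
        raise ValueError("calibration slope must be positive")
    return factor * stdev(blank_rfu) / slope
```

Here `blank_rfu` holds replicate readings without activator, and `concentrations` and `rfu` hold the calibration series. Restricting the calibration series to the linear part of the response keeps the slope meaningful.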
crispr-cas a target binding unleashes indiscriminate single-stranded dnase activity the crispr-cas a gene-editing system induces collateral cleavage of rna in glioma cells crispr-cas a has both cis-and trans-cleavage activities on singlestranded dna nucleic acid detection with crispr-cas a/c c multiplexed and portable nucleic acid detection platform with cas , cas a and csm programmed dna destruction by miniature crispr-cas enzymes sherlock: nucleic acid detection with crispr nucleases nucleic acid detection of plant genes using crispr-cas increasing the specificity of crispr systems with engineered rna secondary structures extension of the crrna enhances cpf gene editing in vitro and in vivo chemically modified cpf -crispr rnas mediate efficient genome editing in mammalian cells structural basis for the canonical and non-canonical pam recognition by crispr-cpf cas a trans-cleavage can be modulated in vitro and is active on ssdna, dsdna, and rna synthetic oligonucleotides inhibit crispr-cpf -mediated genome editing crystal structure of cpf in complex with guide rna and target dna the crystal structure of cpf in complex with crispr rna structural basis for guide rna processing and seed-dependent dna targeting by crispr-cas a exploring the trans-cleavage activity of crispr-cas a (cpf ) for the development of a cpf is a single rna-guided endonuclease of a class crispr-cas system the crisprassociated dna-cleaving enzyme cpf also processes precursor crispr rna prostate cancer specificity of pca gene testing: examples from clinical practice a first-generation multiplex biomarker analysis of urine for the early detection of prostate cancer pca and tmprss -erg gene fusions as diagnostic biomarkers for prostate cancer pca in the detection and management of early prostate cancer prostate health index (phi) and prostate cancer antigen (pca ) significantly improve prostate cancer detection at initial biopsy in a total psa range of - ng/ml pca urinary biomarker for prostate 
cancer multiplexed and portable nucleic acid detection platform with cas , cas a and csm a protocol for detection of covid- using crispr diagnostics v. . sherlock biosciences a protocol for rapid detection of the novel coronavirus sars-cov- using crispr diagnostics: sars-cov- detectr v crispr-cpf mediates efficient homology-directed repair and temperature-controlled genome editing cpf is a single rna-guided endonuclease of a class crispr-cas system detection of unamplified target genes via crispr-cas immobilized on a graphene field-effect transistor we are grateful to the members in the jain lab for their helpful discussions and the university of florida (uf) health cancer center for their support. we are particularly thankful to eric beck for editing the manuscript and ling jin, santosh rananaware, and marco downing for helping with the experiments and/or data analysis. we also thank the monoclonal antibody core facility staff, especially dr. angle sampson and shadi bootorabi, at the uf interdisciplinary center for biotechnology research (icbr) for coordinating the biolayer interferometry experiments. this research was supported by the internal funding from the uf and the uf herbert wertheim college of engineering. the authors declare no competing interests. pkj initiated the study; ltn and pkj designed research; ltn and bms performed research; lnt, bms, and pkj analyzed the data; ln and pkj wrote the manuscript that was edited and approved by all authors. key: cord- -mmkrwj t authors: snijder, eric j.; limpens, ronald w.a.l.; de wilde, adriaan h.; de jong, anja w. m.; zevenhoven-dobbe, jessika c.; maier, helena j.; faas, frank f.g.a.; koster, abraham j.; bárcena, montserrat title: a unifying structural and functional model of the coronavirus replication organelle: tracking down rna synthesis date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: mmkrwj t zoonotic coronavirus (cov) infections, like those responsible for the current sars-cov- epidemic, cause grave international public health concern. in infected cells, the cov rna-synthesizing machinery associates with modified endoplasmic reticulum membranes that are transformed into the viral replication organelle (ro). while double-membrane vesicles (dmvs) appear to be a pan-coronavirus ro element, studies to date describe an assortment of additional coronavirus-induced membrane structures. despite much speculation, it remains unclear which ro element(s) accommodate viral rna synthesis. here we provide detailed d and d analyses of cov ros and show that diverse covs essentially induce the same membrane modifications, including the small open double-membrane spherules (dmss) previously thought to be restricted to gamma- and delta-cov infections and proposed as sites of replication. metabolic labelling of newly-synthesized viral rna followed by quantitative em autoradiography revealed abundant viral rna synthesis associated with dmvs in cells infected with the beta-covs mers-cov and sars-cov, and the gamma-cov infectious bronchitis virus. rna synthesis could not be linked to dmss or any other cellular or virus-induced structure. our results provide a unifying model of the cov ro and clearly establish dmvs as the central hub for viral rna synthesis and a potential drug target in coronavirus infection. ro [ , , ] , was entirely possible and started to attract attention. 
notably, dmvs can also be formed in the absence of vrna synthesis by expression of key transmembrane nsps [ ] [ ] [ ] [ ] [ ] . moreover, several studies suggested a lack of direct correlation between the number of dmvs and the level of cov replication in the infected cell [ , ] . the interpretation of the cov ro structure and function was further complicated by the discovery of different ro double-membrane spherules (dmss) we first set out to analyse the ultrastructure of mers-cov-infected huh cells under sample preparation conditions favourable for autoradiography (see materials and methods) (fig , s video). strikingly, in addition to the dmvs and cm that are well-established hallmarks of beta-cov infections, the presence of small spherules, occasionally in large numbers, was readily apparent (fig a and b). these spherules were notably similar to the dmss previously described for the gamma-cov ibv [ ] . their remarkably regular size of ~ nm (average diameter . ± . nm, n= ), a delimiting double membrane and their electron-dense content made these spherules clearly distinct from other structures, including progeny virions, which had a comparable diameter (fig c and d). the double-membrane spherules (dmss) generated during ibv infection were previously described as invaginations of the zippered er that remain open to the cytosol [ ] . in mers-cov-infected cells, the dmss were connected to the cm from which they seemed to derive (fig e). clear openings to the cytosol could not be detected for the large majority (~ %, n= ) of the fully reconstructed dmss, which suggests that the original invagination may eventually transform into a sealed compartment. apparently closed dmss of this type were also present, though in a lower proportion (~ %, n= ), in ibv-infected cell samples processed in an identical manner (s the d architecture of mers-cov-induced ro aligned with previous observations for other cov [ , ] . 
no clear openings connecting the interior of the dmvs and the cytosol could be detected. all three types of mers-cov-induced membrane modifications appeared to be interconnected, either directly or indirectly through the er. while dmss were connected to cm, and cm to er, er membranes were often continuous with dmvs (fig f, arrowheads). therefore, like other covs, mers-cov infection appears to induce a network of largely interconnected modified er membranes that, as a whole, can be considered the cov ro. quantification of the autoradiography signal per subcellular structure (see also s table). labelling densities and relative labelling indexes (rli) in different subcellular regions of (c) vero e cells infected with sars-cov (moi ) or (d) huh cells infected with mers-cov (moi ). radioactively-labelled uridine was provided for the indicated periods of time immediately before fixation at hpi and hpi, respectively. control mock-infected cells are excluded from the rli plots, as rli comparisons between conditions require the same number of classes (subcellular regions) and these cells lack ros and virions. budding vesicles (fig a-c). the m and s proteins also localized to the golgi complex, aligning with previous observations for other covs [ , , , ] . the mers-cov n protein was found in regions with cm and dmss, though the distribution of signal was homogeneous and dmss were not particularly densely labelled (fig d). the presence of the assembly, or the s protein (fig e and f). previously, the cm induced by sars-cov and mhv were shown by iem to accumulate viral nsps, while dsrna signal was primarily found inside the dmvs [ , ] . similarly, nsp mapped to the cm induced in mers-cov infection, but also to the dmss to a comparable extent (fig g). our attempts to combine dsrna antibody labelling with thawed cryo-sections were unsuccessful, which made us resort to hpf-fs samples. 
in these, however, while dmvs were easily detected, the morphology of cm and dmss was less clearly defined. nevertheless, dsrna signal was clearly associated with dmvs, while the dark membranous regions between dmvs that we interpreted as cm and dms clusters appeared devoid of signal (fig h and i). in summary, for the antibodies tested (recognizing n, m, s, nsp , and dsrna), the labelling pattern in mers-cov-induced dmss closely resembled that of the cm, from which they seem to derive. the absence of labelling for key proteins in virus assembly, like the m and s proteins, strongly suggests that dmss do not represent (spurious) virus assembly events. the comprehensive analysis presented here demonstrates that viruses across different cov genera induce essentially the same type of membrane structures. after somewhat disparate observations [ , , , , , ] , the unifying model that emerges from our study is that our results add to studies that, in the last years and after much speculation, have started to provide experimental evidence that the dmvs induced by +rna viruses are active sites of vrna synthesis [ , [ ] [ ] [ ] . however, it is not clear that dmvs always play the primary role in virus replication that we demonstrate here for cov. for picornaviruses, for example, virus- were freeze-substituted in a leica afs system with . % (wt/vol) uranyl acetate as previously described [ ] , with the only modification that acetone was replaced by ethanol from the last washing step before lowicryl infiltration onwards. cell sections ( nm thick) were incubated with the primary mouse antibody, then with a bridging rabbit anti-mouse-igg antibody (dako cytomation), and finally with protein a coupled to -nm gold particles. after immunolabelling, samples were additionally stained with % uranyl acetate and reynolds' lead citrate. 
large mosaic em maps containing dozens of cell profiles were used for the quantitative analysis of the newly-synthesized rna autoradiography signal (see s table). for each cov, different conditions (infected and mock-infected cells, plus different labelling times) were compared using only samples developed after the same period of time. the analysis of the signal in different subcellular regions was carried out using home-built software. areas of µm were randomly selected from the mosaic em maps and the autoradiography grains present in those areas were manually assigned to the underlying cellular structures. the abundance of the different types of subcellular structures was estimated through virtual points in a × lattice superimposed on each selected area, which were also assigned to the different subcellular classes. at regular intervals during the process, the annotated data per condition were split into two random groups and the kendall and spearman coefficients, which measure the concordance between two data sets [ ], were calculated. new random regions were added until the average kendall and spearman coefficients resulting from random splits were higher than . and . , respectively (maximum value, ). labelling densities and relative labelling indexes (rli) were then calculated from the annotated points [ ] . for the analysis of the association of vrna synthesis with each of the different ro motifs, the specific dmvs, dmss and cm included in the analysis were carefully selected. only individual dmvs that were at least one micron away from any other virus-induced membrane modification were included in the analysis. for every grain present in an area of nm radius around each dmv, the distance to the dmv centre was measured. in the case of dmss, which were always part of clusters of virus-induced membrane structures, only dmss in the periphery of these clusters were selected. 
the quantified signal was limited to sub-areas devoid of other ro motifs, which were defined by circular arcs (typically o to o , radius nm) opposite to the ro clusters. cms are irregular structures that appear partially or totally surrounded by dmvs. only large cm (> . µm across) were selected in order to make more apparent (if present) any decay of the autoradiography signal as the distance to the surrounding dmvs increased. for each autoradiography grain, both the distance to the closest cm boundary (d ) and the distance to the opposite cm edge (d ) were measured. the relative distance to the cm edge was then calculated as d /(d +d ) and expressed as a percentage. all the measurements in different dmvs, dmss and cm were made using aperio imagescope software (leica) and pooled together into three single data sets. autoradiography is a classic technique that allows the em visualization of a radioactive marker, usually targeting a certain process, and thus reveals the subcellular localization of that process [ , ] . tritiated uridine, for example, can be used to locate active rna synthesis [ ] [ ] [ ] , as shown also in this study. a clear advantage over the use of alternatives for metabolic labelling of newly-synthesized rna (e.g. br-uridine, br-utp, -ethynyl uridine) is that the radioactive precursor is chemically identical to the natural substrate. after labelling, the samples are immediately fixed and processed for em. the location of the radioactive marker can then be made apparent by applying a highly-sensitive photographic emulsion (a nuclear emulsion) on top of the cell sections and exposing it for several weeks to months. the beta particles that are emitted as a result of tritium disintegrations generate electrons that get trapped in the silver halide emulsion and create a "latent image". when the emulsion is developed, these negative charges promote the reduction to metallic silver, generating electron-dense grains that are visible by em. 
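the quantification steps described in the methods above (labelling density, relative labelling index and relative distance to the cm edge) can be sketched roughly as follows. this is a minimal illustration and not the home-built software used in the study: the class and function names are hypothetical, labelling density is taken here as grains per lattice point, the rli is taken as observed grains divided by the grains expected if labelling were random (expected grains proportional to the lattice points on each class), and the assignment of the two measured distances to the numerator and denominator is an assumption, since the subscripts are not legible in the text.

```python
from dataclasses import dataclass


@dataclass
class CompartmentCounts:
    """Counts for one subcellular class (hypothetical container)."""
    name: str
    grains: int  # autoradiography grains assigned to this class
    points: int  # lattice points that fell on this class


def labelling_density(c: CompartmentCounts) -> float:
    """Grains per lattice point (points act as a proxy for area)."""
    return c.grains / c.points


def relative_labelling_index(compartments):
    """Observed grains divided by grains expected under random labelling.

    Expected grains for a class = total grains * (class points / total points).
    An RLI of 1 means labelling no different from random; above 1 means
    preferential labelling of that class.
    """
    total_grains = sum(c.grains for c in compartments)
    total_points = sum(c.points for c in compartments)
    rli = {}
    for c in compartments:
        expected = total_grains * c.points / total_points
        rli[c.name] = c.grains / expected
    return rli


def relative_distance_percent(d_closest: float, d_opposite: float) -> float:
    """Relative position of a grain between the closest and the opposite
    cm edge, in percent (assumed form: closest / (closest + opposite))."""
    return 100.0 * d_closest / (d_closest + d_opposite)


# Made-up counts, for illustration only.
example = [
    CompartmentCounts("dmv", grains=60, points=20),
    CompartmentCounts("cytosol", grains=40, points=80),
]
example_rli = relative_labelling_index(example)
```

classes with zero lattice points are not handled in this sketch and would need to be merged with a neighbouring class or excluded before the calculation.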
in principle, given enough time to accumulate enough radioactive disintegrations, even low levels of the radioactive marker could be detected. in practice, other factors (e.g. background radiation, emulsion aging) set some limits to autoradiography, which is nonetheless a very sensitive technique. the resolution of em autoradiography is limited by the fact that radioactive disintegrations generate beta particles that are emitted in random directions. importantly, the probability of giving rise to signal decreases with the distance from the radioactive source; however, some beta particles may travel up to a few hundred nanometers before striking the photographic emulsion [ ] . therefore, it is important to keep in mind that the silver grains may not directly overlay the structure containing the radioactive source. quantitative analyses of the signal that take this factor into account, like those presented in this study, become indispensable to maximize the information that autoradiography can provide. as for mers-cov-infected cells (fig ) . tomographic slices through two regions containing membranous replication factories induced by plus-strand rna viruses ultrastructure of the replication sites of positive-strand rna viruses building viral replication organelles close encounters of the membrane types interaction of the innate immune system with positive-strand rna virus replication organelles rna virus replication complex parallels form and function of retrovirus capsids. molecular cell three-dimensional analysis of a viral rna replication complex reveals a virus-induced mini-organelle. 
plos biology composition and three-dimensional architecture of the dengue virus replication and assembly sites template rna length determines the size of replication complex spherules for semliki forest virus three- dimensional imaging of the intracellular assembly of a functional viral rna replicase complex the transformation of enterovirus replication structures: a three-dimensional study of single-and double-membrane compartments complex dynamic development of poliovirus membranous replication complexes membrane alterations induced by nonstructural proteins of human norovirus three- dimensional architecture and biogenesis of membrane structures associated with hepatitis c virus replication rna replication of mouse hepatitis virus takes place at double-membrane vesicles sars-coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum ultrastructural characterization of arterivirus replication structures: reshaping the endoplasmic reticulum to accommodate viral rna synthesis extensive coronavirus-induced membrane rearrangements are not a determinant of pathogenicity an integrated analysis of membrane remodeling during porcine reproductive and respiratory syndrome virus replication and assembly double-stranded rna is produced by positive-strand rna viruses and dna viruses but not in detectable amounts by negative-strand rna viruses qualitative and quantitative ultrastructural analysis of the membrane rearrangements induced by coronavirus. 
cellular microbiology mers-coronavirus replication induces severe in vitro cytopathology and is strongly inhibited by cyclosporin a or interferon-alpha treatment severe acute respiratory syndrome coronavirus nonstructural proteins , , and induce double-membrane vesicles mutations across murine hepatitis virus nsp alter virus fitness and membrane modifications expression and cleavage of middle east respiratory syndrome coronavirus nsp - polyprotein induce the formation of double-membrane vesicles that mimic those associated with coronaviral rna replication bronchitis virus nonstructural protein alone induces membrane pairing competitive fitness in coronaviruses is not correlated with size or number of double-membrane vesicles under reduced-temperature growth conditions targeting membrane-bound viral rna synthesis reveals potent inhibition of diverse coronaviruses including the middle east respiratory syndrome virus infectious bronchitis virus generates spherules from zippered endoplasmic reticulum membranes replication organelle comprises double-membrane vesicles and zippered endoplasmic does form meet function in the coronavirus replicative organelle? trends in microbiology isolation of a novel coronavirus from a man with pneumonia in saudi arabia. 
the new england journal of medicine genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans the new england journal of medicine genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding techniques and applications of autoradiography in the light and electron microscope electron microscopy: principles and techniques for biologists virtual nanoscopy: generation of ultra-large high resolution electron microscopy maps relative labelling index: a novel stereological approach to test for non-random immunogold labelling of organelles and membranes on transmission electron microscopy thin sections replication of coronavirus mhv-a in sac-cells: determination of the first site of budding of progeny virions ultrastructural characterization of sars coronavirus the intracellular sites of early replication and budding of sars-coronavirus novel contrasting and labeling procedures for correlative microscopy of thawed cryosections coronavirus m proteins accumulate in the golgi complex beyond the site of virion budding proliferative growth of sars coronavirus in vero e cells morphogenesis of avian infectious bronchitis virus and a related human virus (strain e) ultrastructural characterization of membrane rearrangements induced by porcine epidemic diarrhea virus the e glycoprotein of an avian coronavirus is targeted to the cis golgi complex the putative helicase of the coronavirus mouse hepatitis virus is processed from the replicase gene polyprotein and localizes in complexes that are active in viral rna synthesis localization of mouse hepatitis virus nonstructural proteins and rna synthesis indicates a role for late endosomes in viral replication mouse hepatitis virus replicase proteins associate with two distinct populations of intracellular membranes determination of host proteins composing the microenvironment of coronavirus replicase complexes by 
proximity-labeling. elife the coronavirus nucleocapsid is a multifunctional protein visualizing coronavirus rna synthesis in time by using click chemistry do viruses subvert cholesterol homeostasis to induce host cubic membranes? trends in cell biology cubic membranes: a legend beyond the flatland* of cell membrane organization morphological and biochemical characterization of the membranous hepatitis c virus replication compartment epub / / escaping host factor pi kb inhibition: enterovirus genomic rna replication in the absence of replication organelles the origin, dynamic morphology, and pi p-independent formation of encephalomyocarditis virus replication organelles. mbio sars-coronavirus replication/transcription complexes are membrane-protected and need a host factor for activity in vitro early endonuclease-mediated evasion of rna sensing ensures efficient coronavirus replication mechanisms and enzymes involved in sars coronavirus genome expression a new virus isolated from the human respiratory tract reverse genetics system for the avian coronavirus infectious bronchitis virus the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex ultrastructure and origin of membrane vesicles associated with the severe acute respiratory syndrome coronavirus replication complex monoclonal antibodies to double-stranded rna as probes of rna structure in crude nucleic acid extracts towards a solution to mers: protective human monoclonal antibodies targeting different domains and functions of the mers-coronavirus spike glycoprotein resolution of a gold latensification-elon ascorbic acid developer for ilford l emulsion correlated fluorescence and d electron microscopy with high sensitivity and spatial precision a rapid method for assessing the distribution of gold labeling on thin sections dmss (white arrowheads) and zippered er (white arrows). 
most zippered er consists of long stretches of though branching zippered er, closer to the cm described for beta-cov, was also present (b) virus particles (black arrowheads) budding into the er membranes were often observed. scale bars analysis of previously described samples of cov-infected cells, prepared for em either by hpf (a) or cryo-plunging (b). a targeted search revealed the presence of dmss (white arrowheads) in close association with cm. in comparison with the chemically fixed samples used in this study, the superior ultrastructural preservation of cryo-fixation results in less distorted membranes, but also in a denser cytoplasm and darker cm that make dmss less apparent. (a) example from a mers-cov-infected huh cell sars-cov-infected cell ( hpi) metabolic labelling of newly-synthesized vrna in ibv-infected cells and analysis of the autoradiography signal. vero cells infected with ibv were pre-treated with actinomycin d for hour, then labelled for or min with tritiated uridine a) overview of an ibv-infected vero cell ( min labelling). the areas containing dmvs and zippered er are outlined in yellow and blue, respectively, and other subcellular structures annotated (n, nucleus; m, mitochondria; au, autophagosome; vcr, virion-containing regions). the autoradiography signal accumulates in areas of virus-induced membrane modifications that often only contain dmvs close-up of the area boxed in black in (a), which contains dmvs the contrast between the densely labelled dmvs and the zippered er and dmss largely lacking signal is apparent and suggests that the autoradiography grains sometimes present on the latter structures arose from radioactive disintegrations in the surrounding active dmvs. (c) in agreement with this possibility, most of the dmss ( %) were devoid of signal, and most of those that contained label were close to an active dmv (n dms = ). 
(d) furthermore, the distribution of autoradiography grains around dmss resembled that of a random distribution dmvs proved that these structures are associated with vrna synthesis, as the signal reaches maximum values in the proximity of the dmvs (n dmvs = ). ((c, d) see materials and methods for the selection criteria and details) electron tomography of the membrane structures induced in mers-cov infection animation illustrating the tomography reconstruction and model presented in fig b. the video first shows the tomographic slices ( . nm thick) through the reconstructed volume, and then surface-rendered models of the different structures segmented from the tomogram cm (blue) and dmvs (yellow and lilac, outer and inner membranes), er (green), and a vesicle (silver) containing virions (pink). the movie highlights the dms association with cm techniques and applications of autoradiography in the light and electron microscope autoradiography & radioautography. electron microscopy: principles and techniques for biologists association of polioviral proteins of the p genomic region with the viral replication complex and virus-induced membrane synthesis as visualized by electron microscopic immunocytochemistry and autoradiography escaping host factor pi kb inhibition: enterovirus genomic rna replication in the absence of replication organelles the origin, dynamic morphology, and pi p-independent formation of encephalomyocarditis virus replication organelles. 
mbio mers-coronavirus replication induces severe in vitro cytopathology and is strongly inhibited by cyclosporin a or interferon-alpha treatment sars-coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum table data sets and sampling for the quantifications of the autoradiography signal presented in fig c and key: cord- -pmbpx authors: kalaycı, salih; Özden, cihan title: the linkage among sea transportation, trade liberalization and industrial development in the context of carbon dioxide emissions: an empirical investigation from china date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pmbpx the major goal of this paper is to focus on the existing literature regarding the linkage between maritime transport, trade liberalization and industrial development in the context of co using an econometric model. in this context, it attempts to reveal the effects of the independent variables on co (the dependent variable) for china from to (annual data) by implementing phillips-perron (pp) and zivot-andrews unit root tests and the fmols, dols, ccr, ardl and gmm methods. according to the results of the fmols, dols and ccr models, there is a stable long-term relationship between sea transportation, trade liberalization, industrial development and carbon dioxide emissions, which is demonstrated empirically. similarly, the short-term ardl estimation results reveal that the main determinants of co in the short run are changes in industrial development and maritime transport at the % significance level. table summarizes the short-term ardl results and the findings regarding the error correction model. according to table , the error correction model works to achieve short-run adjustment. in the short term, approximately % of shocks in industrial development, maritime transport and trade liberalization are compensated within a period of time and the system is re-established in the long term. china produced half of the . 
million electric media used worldwide; the government directs its attention to the rehabilitation and reuse of all these lithium-ion batteries. large-scale production of biofuels may still be several years away. it might be very difficult to promote alternative fuels on a national scale unless crude oil prices surge so high as to become unaffordable. authorities underline that china will become the world's number one economy. renewable energy will therefore become more important, and the government should encourage its use in transportation so as to reduce co emissions. however, china may lead the world in excess oil use for transport if it seeks to dominate the global economy. one of the most frequently discussed global issues in recent years has been environmental destruction in the context of global warming and climate change. the main cause of global warming is the very rapid increase in the atmospheric concentration of gases that cause the greenhouse effect. the main gas causing the greenhouse effect is carbon dioxide (co ), emitted to the atmosphere through the use of fossil fuels such as gasoline, coal and natural gas. mass production and excessive consumption, which started with the industrial revolution, increased energy needs, and this requirement was met largely from fossil fuels. for many years, demand for fossil fuel energy has grown exponentially, polluting the environment. thus, the countries that focused on the economic growth target also contributed to co emissions and global warming. the link between environmental pollution and industrialization has begun to be discussed in the academic literature. one of the most important sources of this trend is the report "the limits to growth" prepared by the club of rome in . 
according to the report, "if the current growth of industrialization continues in terms of food production, consumption of natural resources and co emissions, economic growth will damage the environment in our world in the next years" [ ] . many studies in the academic literature have examined the dynamic relationships between transportation, co , industrial development and economic growth, taking the inverted u-shaped hypothesis as a basis. however, the linkage among sea transportation, trade liberalization and industrial development in the context of co introduces new variables. china has witnessed tremendous economic growth, as well as rapid development in both the financial and labor sectors, over the last years. nowadays, china is the second largest economy in the world. this economic growth has been accompanied by increased fuel energy consumption and thus co emissions. in this study, empirical results demonstrate the impact of sea transportation, trade liberalization and industrial development on carbon dioxide emissions for china from to . in order to examine these dynamic relationships, researchers have applied many kinds of econometric methods, such as multivariate regressions, the johansen co-integration test, the adf (augmented dickey-fuller) unit root test, the var (vector autoregressive) model, impulse response analysis, variance decomposition analysis, the granger causality test, and panel data analysis in the methodology sections of their articles. the researchers obtained different results for the validity of ekc (environmental kuznets curve) relationships based on different samples, methodologies, and periods. the major goal of this paper is to concentrate on the existing literature regarding the relationship between maritime transport, trade liberalization and industrial development in the context of co using an econometric model. in this context, it attempts to reveal the effects of the independent variables on co (the dependent variable) for china from to . 
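the inverted u-shaped (ekc) hypothesis mentioned above is commonly tested with a quadratic specification in which emissions depend on income and on income squared. the following is a minimal sketch of how the turning point would be read off such a regression. the quadratic form, the function name and the coefficient values are illustrative assumptions and are not taken from this study's models or results.

```python
import math


def ekc_turning_point(beta_income, beta_income_sq, log_income=True):
    """Turning point of co2 = a + b1 * y + b2 * y**2 (illustrative form).

    An inverted U requires b1 > 0 and b2 < 0; the curve then peaks at
    y* = -b1 / (2 * b2). If y is the natural log of income, the peak is
    converted back to income levels with exp(). Returns None when the
    coefficient signs do not describe an inverted U.
    """
    if beta_income <= 0 or beta_income_sq >= 0:
        return None
    peak = -beta_income / (2.0 * beta_income_sq)
    return math.exp(peak) if log_income else peak


# Made-up coefficients, for illustration only.
peak_in_levels = ekc_turning_point(4.0, -0.25, log_income=False)
```

under this specification, emissions rise with income below the turning point and fall above it, which is the pattern the ekc literature cited here looks for.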
the main purpose of this analysis is to address the problem from the perspective of environmental economics and to offer some suggestions. the study is structured as follows. following this introduction, section presents the theoretical background of relevant works and the theoretical framework of the article; section deals with the analysis and the results of the research; and section concludes the article by giving some recommendations to reduce co emissions. production is increasingly divided among nations in today's globalizing world. in this sense, local consumption in any country is increasingly served through the global supply chain. the transportation sector has developed through the industrialization process, and as a result carbon emissions have increased day by day, especially in certain countries. this, in turn, has led to some restrictions on trade and the supply chain. trade liberalization, transportation, industrial development and co are widely discussed in the academic literature with regard to the environmental consequences of trade. hassan and nosheen [ ] investigate the effects of trade openness, energy usage and economic growth on carbon dioxide leakage in terms of the ekc hypothesis for pakistan, using yearly data from to . to reach their results they implement econometric models including the adf unit root test, the johansen co-integration test (to reveal the long-term relationship between variables) and vector error correction. according to their findings, economic growth is positively connected with carbon dioxide leakage. population growth and trade liberalization have a considerably negative effect on co leakage in the short term, whereas in the long term the effect is the opposite. in addition, the granger causality test indicates that bi-directional causality runs between energy consumption and economic growth and co leakage. there is also one-way causality running from trade liberalization to co emissions. 
jebli and belloumi [ ] state that sea transportation, waste usage and combustible renewables significantly affect carbon dioxide emissions, while any rise in sea transportation reduces the consumption of combustible renewables and waste usage. besides, sea transportation is significantly correlated with carbon dioxide leakage, demonstrating that tunisian transportation is highly polluting due to excessive non-renewable energy consumption. consequently, sea transport is the main contributor to air pollution and leads to an increase in carbon dioxide leakage in tunisia. nakayama, zhu, hirokawa, irino and yoshikawa-inoue [ ] note that rishiri island is the northernmost site in japan where weather is measured, although it has not yet been incorporated by the wmo. past works demonstrate that continental sea transport, especially in - °e, - °n, causes co emissions. the findings of zhu, yuen, ge and li [ ] indicate that a sea transport emissions trading system can motivate the main actors to benefit from new technologies and to provide more carbon-efficient vessels in terms of green energy. besides, the efficacy of a sea transport emissions trading system is more obvious when bunker fuel prices are high. in this sense, the findings show that bunker fuel prices have a larger impact on co reduction than implementing a stricter co allowance allocation. their work involves the formulation of sea transport emissions trading system policies and provides suggestions for decreasing sea transport emissions so that the shipping industry can improve its environmental performance. katircioğlu [ ] demonstrates that energy usage has a positive effect, while incoming tourists and economic growth have a negative effect, on carbon dioxide leakage in the long run. 
it is further clarified that singapore has an inverted u-shaped environmental kuznets curve and that, regardless of the level of energy usage, carbon dioxide leakage follows a declining trend in singapore. besides, the relationship among environmental sustainability, energy efficiency and tourism industry development can be explained by the tourism-induced environmental kuznets curve hypothesis, according to which the tourism sector leads to environmental pollution in its early period until a threshold level of national income is reached. once this income level is reached, pollution is expected to follow a downward trend. katircioglu, katircioglu and kilinc [ ] state that industrial development, the aggregate number of households, and therefore urban areas generate additional energy demand, which causes an expansion in carbon dioxide emissions. they empirically confirmed the urbanization-induced environmental kuznets curve hypothesis and examined the long-run equilibrium relationship and causality among co emissions, urban development, energy usage and real income growth in the world. katircioglu, gokmenoglu and eren [ ] assert that the tourism industry relies mainly on infrastructure including highways, airports, harbours, hotels and holiday villages. it is therefore inevitable that the tourism industry and its components will considerably influence environmental quality. besides, exploiting natural resources is one of the operations required to attract more visitors and to ensure a competitive tourism industry. deforestation, exploitation of raw materials and overuse of natural water are some of the compromises made to develop the tourism industry by constructing hotels and other facilities. thus, exploitation of natural resources can cause undesirable environmental effects including excessive co leakage, air pollution and erosion.
given its potential negative effects, the connection between industrial development and environmental quality has been examined in the academic literature. in this sense, the findings of their work demonstrate that a u-shaped relationship is confirmed in the context of the environmental kuznets curve hypothesis for the main tourist countries. consequently, the main tourist countries manage tourism and urban development carefully in order to control environmental pollution. katircioglu [ ] indicates that the tourism industry raises co leakage in cyprus, an economy known for its very high demand for tourism activities. furthermore, urbanization and the total world population have been included in the research results for comparison purposes. koksal, işik and katircioğlu [ ] state that developed economies concentrate more on light manufacturing industries including apparel, leather, wood, metal products and agribusiness. depending on their phase of growth, the differences between these two types of economy lead to different financial structures, which creates a composition effect; the pollution haven hypothesis can explain this fact. in this sense, liberal trade theories hold that countries choose to manufacture the products in which they have a comparative advantage. mehrara and rezaei [ ] examine the linkage among economic growth, co emissions and trade openness through unit root tests, cointegration tests and a panel data analysis from to for brics countries. in this context, the stationarity of the series is tested with the adf and pp unit root tests, which show that the series are stationary at i( ). they reveal a cointegration relationship among co emissions, economic growth and trade openness using the kao panel cointegration test. the evidence demonstrates that in the long term trade openness has a considerable positive effect on co emissions.
kuik and gerlagh [ ] examine the impact of trade openness on co emissions using econometric models. they use quantitative estimates of co emissions, taking into account the kyoto protocol and free trade through the lowering of import tariffs agreed in the uruguay round of the multilateral trade agreement. lowering import tariffs raises co emissions, and the costs of abating the trade-induced co emissions are considered relative to the welfare gains of free trade. in this context, the analysis of trade-induced co emissions shows distinct differences between emissions caused by lowering import tariffs on energy products and those caused by lowering tariffs on non-energy products. it also shows distinct differences in leakage responses among developing countries. managi, hibiki and tsurumi [ ] examine the effect of trade liberalization on environmental pollution using the instrumental variables technique. they show that the effect is considerable in the long run, after the dynamic adjustment process, although it is small in the short run, and that trade benefits the environment in oecd economies. in non-oecd economies, however, it has a detrimental impact on co leakage and sulfur dioxide, although it does lower biochemical oxygen demand (bod) emissions in these countries. consequently, trade liberalization affects co leakage by way of the environmental regulation effect and the capital effect. baek and kim [ ] reveal that causality from trade volume to co leakage and energy holds for the developed economies: changes in the degree of trade liberalization lead to corresponding changes in the growth rates of emissions and energy usage. in contrast, causality from co leakage and energy usage to trade volume is found to hold for the developing economies: any shocks in emissions and energy consumption influence fluctuations in trade liberalization.
shahbaz, tiwari and nasir [ ] examine the impact of trade liberalization, economic growth and coal consumption on co leakage using time series analysis from to for south africa. the autoregressive distributed lag (ardl) bounds testing approach to cointegration is used to analyze the long-term linkage between the variables, while short-term dynamics are examined using an error correction model (ecm). their results verify the long-term linkage between the relevant variables. the findings indicate that an increase in gdp raises co leakage, while financial development decreases it. moreover, coal consumption contributes remarkably to environmental pollution in south africa. trade liberalization has a positive effect on environmental quality by decreasing the growth of energy pollutants. finally, their findings confirm the existence of the ekc. shen [ ] demonstrates that the factor endowment hypothesis is supported by several linkages between the relevant variables and finds evidence for the ekc hypothesis. the study confirms a causal linkage between the variables and proposes a "polluter pays" mechanism to maintain awareness of a clean environment. ali et al. [ ] show that there is a statistically insignificant correlation among co emissions, gdp and industrial development in pakistan. however, the result of ali et al. [ ] is not consistent with this paper. li et al. [ ] discuss in detail the large and widening gap in energy consumption, air quality index and co efficiency scores among chinese regions, with the lower co and air quality index efficiency scores mainly concentrated in the western cities. they recommend that china pay more attention to the differing economic levels, social development, industrial structures, energy consumption, and r&d in the western regions and apply systematic solutions based on domestic meteorological and climatic conditions and on economic and social development.
this manuscript examines the nexus between sea transportation, trade liberalization, industrial development and carbon dioxide emissions using the fmols, dols, ccr and ardl models. annual data are obtained from unctad [ ] and the world bank [ ] , [ ] . the model for co emissions in china is specified as follows: co is the logarithm of carbon dioxide emissions expressed in levels, α is a constant, du (λ) is a dummy variable that takes the value from the period in which the structural change is considered to occur and the value in previous years, t represents time, and co - is carbon dioxide emissions lagged one period. dt (λ) = t - tλ if t > tλ and otherwise. the next term is the sum of the lagged changes in the variable of interest for lags j up to k; these regressors are added to eliminate the possible dependence of the limiting distribution of the test statistics on temporal dependence in the disturbances. finally, ε is the error term. fmols, dols and ccr co-integration analyses, like traditional co-integration methods, depend on a stationarity condition for the series used. in addition, the derived coefficients can be interpreted, which shows how co emissions respond to the independent variables: sea transportation, trade liberalization and industrial development. the ardl model is expressed in econometric notation in equation ( ) below, where the determinants of long-term economic growth are investigated. the long-run relationship between carbon dioxide emissions, sea transportation, trade liberalization and industrial development is investigated through the f-bounds test, whose null hypothesis is that no long-run relationship exists.
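as a sketch only, the unit-root regression with a structural break described in the paragraph above can be written out as follows. the coefficient symbols θ, β, γ, ρ and c_j are assumptions introduced here for illustration, as are the values 1 and 0 for the dummy variable, which follow the usual convention; the text itself only names α, du (λ), dt (λ), t, the lagged co term, the sum of lagged differences up to k, and ε.

```latex
% Sketch of the break-augmented unit-root regression described in the text.
% Coefficient names (theta, beta, gamma, rho, c_j) are illustrative assumptions.
\[
co_t = \alpha + \theta\, DU_t(\lambda) + \beta\, t + \gamma\, DT_t(\lambda)
     + \rho\, co_{t-1} + \sum_{j=1}^{k} c_j\, \Delta co_{t-j} + \varepsilon_t
\]
\[
DU_t(\lambda) =
\begin{cases}
1, & t > T\lambda \\
0, & \text{otherwise}
\end{cases}
\qquad
DT_t(\lambda) =
\begin{cases}
t - T\lambda, & t > T\lambda \\
0, & \text{otherwise}
\end{cases}
\]
```

in this form the level dummy captures a shift in the intercept at the break date, while the trend dummy captures a change in slope after it.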
the null and alternative hypotheses of the bounds test are $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ and $H_1: \beta_1 \neq \beta_2 \neq \beta_3 \neq \beta_4 \neq 0$ ( ), where the $\beta_i$ denote the coefficients on the lagged level variables. findings from the fmols, dols and ccr models indicate that maritime transport, trade liberalization and industrial development are the determinants of long-term carbon emissions, just as in the results of the ardl model. it is also noteworthy that the findings obtained from the fmols, dols and ccr models, which are described as new co-integration techniques and allow the separation of short- and long-term relationships, are consistent with the long-term results obtained from the ardl model in table . in this context, policy-makers should increase trade relations by improving their maritime transport infrastructures and further accelerate industrial growth via research and development. note: * critical values for f-statistics are . for the lower limit at % significance level and . for the upper limit. in this case, there is a long-term cointegration relationship between the variables at the % significance level in the estimated model. following the ardl f-bounds test, which reveals long-term co-integration among the variables of the empirical model, long-term ardl estimates were made. long-term ardl estimation results are given in table . the long-term ardl estimation results reveal that the main determinants of co are changes in sea transportation, trade liberalization and industrial development. according to table , if sea transport increases by percent, co increases by . %. a percent increase in industrial development increases co by . %. unlike in the other econometric models (fmols, dols and ccr), trade liberalization has no statistically significant effect on co . the long-run ardl results on the relationships between co and the key economic determinants are similar to the short-term ardl test findings. short-term ardl estimation results reveal that the main determinants of co in the short run are changes in industrial development and maritime transport at a % significance level.
table summarizes the short-term ardl results and the findings regarding the error correction model. according to table , the error correction model works to achieve short-run adjustment. in the short term, approximately % of shocks in industrial development, maritime transport and trade liberalization are compensated within one period, and the system is re-established in the long term (table ). according to table , the lowest root mean square error is found for the gmm-tsls method. therefore, the gmm-tsls method is selected for the analysis. according to the gmm-tsls analysis (table ), there is no validation problem since the t-statistic value is more than . . ar ( ) is significant, and ar ( ) (table ) is close to . transforming the transport industry to run on renewable energy is vital for a more sustainable society, and many countries are concerned with their carbon footprint. the clean energy revolution is also sweeping through the chinese transport sector. in , china produced half of the . million electric vehicles in use worldwide; the government is directing its attention to the rehabilitation and reuse of all these lithium-ion batteries. large-scale production of biofuels may still be several years away. it might be very difficult to promote alternative fuels on a national scale unless crude oil prices surge so high as to become unaffordable. with developing battery science, electricity is obtained from the sun's rays through solar panels. this energy is also fed into ship systems to support auxiliary power or the ship's daily needs. certain industries are already using solar panels for production. given current attitudes toward solar panels and their initial setup cost, wider adoption in the maritime industry will have to wait for the future. therefore, as a first step, co emissions can be reduced by reusing waste energy. since the amount of fuel consumed by the ship decreases, the amount of co emissions decreases as well.
moreover, methods to reduce co emissions include reducing fuel costs and using alternative fuels with low or zero carbon content. carbon emissions will therefore be greatly reduced by burning low-emission fuel in diesel engines. power management can be achieved by ensuring that diesel machines on ships operate at optimum power. when machines are operated under continuous excessive overload, carbon dioxide emissions can also increase. the working principle of common rail (powerline) is that the minimum fuel required is sent to the cylinder at whatever load the machine is running. it is possible to achieve % energy efficiency with this system. it is used not only to reduce co emissions but also to reduce so emissions. four types of engines are used in the manufacturing industry in the world, and these are the equivalent engines ie , ie , ie and ie ; ie motors are the least energy consuming and most efficient motors in the world. in order to make the use of ie engines more common, importance should be given to efforts to reduce the cost of such machines. however, roller bearings are usually found in automated machines such as cnc, and their usage rate should be increased with r&d support from government. intelligent start-stop systems are widely used today. in essence, the software is stored in a battery-backed electronic circuit, so that parameters are preserved without the need for high energy. thus, the machine start-up setup time is eliminated, and the machine can be used for longer production. this system should be applied in all machines with plc panels, and even adding start-stop batteries together with the panel to low-tech machines will reduce co emissions and energy consumption and increase profitability. uninsulated and high-volume environments in enterprises are cooled by fans, resulting in high energy consumption and co emissions.
as an alternative, an environment cooling system that works by circulating cold water through the company is much cheaper and more environmentally friendly. findings from the fmols, dols and ccr models indicate that maritime transport, trade liberalization and industrial development are the determinants of long-term carbon emissions, just as in the results of the ardl model. it is also noteworthy that the findings obtained from the fmols, dols and ccr models, which are described as new co-integration techniques and allow the separation of short- and long-term relationships, are consistent with the long-term results obtained from the ardl model in table . according to the long-run ardl, if sea transport increases by percent, co increases by . %. a percent increase in industrial development increases co by . %. unlike in the other econometric models (fmols, dols and ccr), trade liberalization has no statistically significant effect on co . the long-run ardl results on the relationships between co and the key economic determinants are similar to the short-term ardl test findings. short-term ardl estimation results reveal that the main determinants of co in the short run are changes in industrial development and maritime transport at a % significance level. china has increased its sales prices for all medical products, and other products of chinese origin have followed these price increases, which will not drop again due to covid- even if china covers the loss. authorities underline that china will become the world's number one economy. from now on renewable energy will be more important, and governments should encourage its use in transportation (sea, railway, road) so as to reduce co emissions. however, china may become the leader in excess oil use for transport if it wants to dominate the world economy.
the impact of air transportation on carbon dioxide, methane, and nitrous oxide emissions in pakistan: evidence from ardl modelling approach
investigation of the causal relationships between combustible renewables and waste consumption and co emissions in the case of tunisian maritime and rail transport. renewable and sustainable energy reviews
ozone depletion in the interstitial air of the seasonal snowpack in northern japan
impact of maritime emissions trading system on fleet deployment and mitigation of co emission
testing the tourism-induced ekc hypothesis: the case of singapore
investigating the role of urban development in the conventional environmental kuznets curve: evidence from the globe
testing the role of tourism development in ecological footprint quality: evidence from top tourist destinations
estimating higher education induced energy consumption: the case of northern cyprus
the role of shadow economies in ecological footprint quality: empirical evidence from turkey
a panel estimation of the relationship between trade liberalization, economic growth and co emissions in brics countries
trade liberalization and carbon leakage
does trade liberalization reduce pollution emissions
trade liberalization, economic growth, energy consumption and the environment: time series evidence from g- economies
the effects of financial development, economic growth, coal consumption and trade openness on co emissions in south africa
trade liberalization and environmental degradation in china
trade openness and co emissions: evidence of bangladesh
the role of trade liberalization in carbon dioxide emission: evidence from heterogeneous panel estimations
testing the ekc hypothesis by considering trade openness, urbanization, and financial development: the case of turkey. environmental science and pollution research
analysis of the nexus of co emissions, economic growth, land under cereal crops and agriculture value-added in pakistan using an ardl approach
regional energy, co , and economic and air quality index performances in china: a meta-frontier approach
world merchant fleet / unctadstat (merchant fleet by flag of registration and by type of ship, annual table summary)
goods and services (bpm ): trade openness indicators, sum of imports and exports, goods and services for china retrieved from https://data.worldbank.org/ .
perron, p. the calculation of the limiting distribution of the least-squares estimator in a near-integrated model
further evidence on the great crash, the oil-price shock, and the unit-root hypothesis
key: cord- -bycskjtr authors: mönke, gregor; sorgenfrei, frieda a.; schmal, christoph; granada, adrián e. title: optimal time frequency analysis for biological data - pyboat date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bycskjtr methods for the quantification of rhythmic biological signals have been essential for the discovery of function and design of biological oscillators. advances in live measurements have allowed recordings of unprecedented resolution revealing a new world of complex heterogeneous oscillations with multiple noisy non-stationary features. however, our understanding of the underlying mechanisms regulating these oscillations has been lagging behind, partially due to the lack of simple tools to reliably quantify these complex non-stationary features. with this challenge in mind, we have developed pyboat, a python-based fully automatic stand-alone software that integrates multiple steps of non-stationary oscillatory time series analysis into an easy-to-use graphical user interface. pyboat implements continuous wavelet analysis which is specifically designed to reveal time-dependent features.
in this work we illustrate the advantages of our tool by analyzing complex non-stationary time-series profiles. our approach integrates data-visualization, optimized sinc-filter detrending, amplitude envelope removal and a subsequent continuous-wavelet based time-frequency analysis. finally, using analytical considerations and numerical simulations we discuss unexpected pitfalls in commonly used smoothing and detrending operations. oscillatory dynamics are ubiquitous in biological systems. from transcriptional to behavioral level these oscillations can range from milliseconds in case of neuronal firing patterns, up to years for the seasonal growth of trees or migration of birds (goldbeter et al. [ ] , gwinner [ ] , rohde and bhalerao [ ] ). to gain biological insight from these rhythms, it is often necessary to implement time-series analysis methods to detect and accurately measure key features of the oscillatory signal. computational methods that enable analysis of periods, amplitudes and phases of rhythmic time series data have been essential to unravel function and design principles of biological clocks (lauschke et al. [ ] , ono et al. [ ] , soroldoni et al. [ ] ). here we present pyboat, a framework and software package with a focus on usability and generality of such analysis. many time series analysis methods readily available for the practitioner rely on the assumption of stationary oscillatory features, i.e. that oscillation properties such as the period remain stable over time. a plethora of methods based on the assumption of stationarity have been proposed which can be divided into those working in the frequency domain such as fast fourier transforms (fft) or lomb-scargle periodograms (lomb [ ] , ruf [ ] ) and those working in the time domain such as autocorrelations (westermark et al. [ ] ), peak picking (abraham et al. [ ] ) or harmonic regressions (edwards et al. [ ] , halberg et al. [ ] , naitoh et al. [ ] , straume et al. [ ] ).
in low noise systems with robust and stable oscillations, these stationary methods suffice to reliably characterize oscillatory signals. recordings of biological oscillations frequently exhibit noisy and time-dependent features such as drifting period, fluctuating amplitude and trend. animal vocalization (fitch et al. [ ] ), temporal changes in the activatory pathways of somitogenesis (tsiairis and aulehla [ ] ) or reversible and irreversible labilities of properties in the circadian system due to aging or environmental factors (pittendrigh and daan [ ] , scheer et al. [ ] ) are typical examples where systematic, often non-linear changes in oscillation periods occur. in such cases, the assumption of stationarity is unclear and often not valid, hence the need for nonstationary-based methods that capture time-dependent oscillatory features. recently, the biological data analysis community has developed tools that implement powerful methods tailored to specific steps of time-series analysis such as rhythmicity detection (hughes et al. [ ] , thaben and westermark [ ] ), de-noising and detrending, and the characterization of nonstationary oscillatory components (leise [ ] , price et al. [ ] ). to extract time-dependent features of non-stationary oscillatory signals, methods can be broadly divided into those that rely on operations using a moving time-window (e.g. wavelet transform) and those that embed the whole time series into a phase space representation (e.g. hilbert transform). these two families are complementary, having application-specific advantages and disadvantages, and in many cases both are able to provide equivalent information about the signal (quiroga et al. [ ] ). due to the inherent robustness in handling noisy oscillatory data and its interpretability advantages, we implemented at the core of pyboat a continuous-wavelet-transform approach.
as a software package pyboat combines multiple steps in the analysis of oscillatory time series in an easy-to-use graphical user interface that requires no prior programming knowledge. with only two user-defined parameters, pyboat is able to proceed without further intervention with optimized detrending, amplitude envelope removal, spectral analysis, detection of main oscillatory components (ridge detection), oscillatory parameters readout and visualization plots (figure a) . pyboat is developed under an open-source license, is freely available for download and can be installed on multiple operating systems. in the first section of this work we lay out the mathematical foundations at the core of pyboat. in the subsequent section we describe artifacts generated by the widely used smoothing and detrending techniques and how they are resolved within pyboat. in section we describe the theory behind spectral readouts in the special case of complex amplitude envelopes. we finalize this manuscript with a short description of the user interface and software capabilities.
figure caption (fragment): shown together with a sweeping signal whose instantaneous period coincides with the morlet of scale s = exactly at τ . bottom panel: result of the convolution with the sliding morlet Ψ ,τ (t) along signal f (t). the power quickly decreases away from τ . the curve corresponds to one row in the wavelet power spectrum of panel d). c) synthetic signal with periods sweeping from t = s (f ≈ . hz) to t = s (f ≈ . hz). d) wavelet power spectrum shows time-resolved (instantaneous) periods.
in this section we aim to lay down the basic principles of wavelet analysis as employed in our signal analysis tool, while the more mathematical subtleties are moved to the appendix. the classic approach to frequency analysis of periodic signals is the well-known fourier analysis. its working principle is the decomposition of a signal f (t) into sines and cosines, known as basis functions.
these harmonic components have no localization in time but are sharply localized in frequency: each harmonic component carries exactly one frequency effective everywhere in time. thus, straightforward fourier analysis underperforms in cases of time-dependent oscillatory features, such as when the period of the oscillation changes in time ( figure c ). the goal behind wavelets is to reach an optimal compromise between time and frequency localization (gabor [ ] ). gabor introduced a gaussian modulated harmonic component, also known as the morlet wavelet. the basis harmonic functions for time-frequency analysis are then generated from this mother wavelet by scaling and translation. varying the time localization τ slides the wavelet left and right on the time axis. the scale s changes the center frequency of the morlet wavelet according to $\omega_{\mathrm{center}}(s) = \omega_0/s$ (see also appendix equation ( )). higher scales therefore generate wavelets with lower center frequency. the gaussian envelope suppresses the harmonic component with frequency $\omega_{\mathrm{center}}$ farther away from τ , thereby localizing the wavelet in time ( figure b top panel). this frequency $\omega_{\mathrm{center}}(s)$ is conventionally taken as the fourier equivalent (or pseudo-) frequency of a morlet wavelet with scale s. it is noteworthy that wavelets are in general not as sharply localized in frequency as their harmonic counterparts ( figure s ). this is a trade-off imposed by the uncertainty principle to gain localization in time (gröchenig [ ] ). the wavelet transform of a signal f (t) is given by an integral over the product of the signal and the complex conjugate of the scaled and translated wavelet. for a fixed scale, this expression has the form of a convolution as denoted by the ' * ' operator, where $\bar{\Psi}$ denotes the complex conjugate of Ψ. for an intuitive understanding it is helpful to consider the above expression as the cross-correlation between the signal and the wavelet of scale s (or center frequency $\omega_{\mathrm{center}}(s)$). the translation variable τ slides the wavelet along the signal.
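the construction of the wavelet family described above can be sketched in a few lines of python. this is a minimal illustration, not pyboat's own code: it assumes the common convention $\Psi(t) = \pi^{-1/4} e^{i\omega_0 t} e^{-t^2/2}$ for the mother wavelet, the scaling $\Psi_{s,\tau}(t) = s^{-1/2}\,\Psi((t-\tau)/s)$, and the choice $\omega_0 = 2\pi$, under which the scale equals the fourier period. the function names are hypothetical.

```python
import cmath
import math

OMEGA0 = 2 * math.pi  # assumed centre frequency of the mother wavelet


def morlet(t, omega0=OMEGA0):
    """Mother Morlet wavelet: a complex harmonic under a Gaussian envelope."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)


def scaled_morlet(t, scale, tau, omega0=OMEGA0):
    """Wavelet obtained from the mother wavelet by scaling (s) and translation (tau)."""
    return morlet((t - tau) / scale, omega0) / math.sqrt(scale)


def center_frequency(scale, omega0=OMEGA0):
    """Fourier-equivalent angular frequency, omega_center(s) = omega0 / s."""
    return omega0 / scale


def center_period(scale, omega0=OMEGA0):
    """Fourier-equivalent period of the wavelet with the given scale."""
    return 2 * math.pi / center_frequency(scale, omega0)
```

with these definitions, doubling the scale halves the center frequency, and the magnitude of `scaled_morlet` peaks at `t == tau` and decays with the gaussian envelope on either side.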
since the wavelet decays quickly away from τ , only the instantaneous correlation of the wavelet with center frequency $\omega_{\mathrm{center}}$ and the signal around τ significantly contributes to the integral ( figure b middle and lower panel). using an array of wavelets with different frequencies (or periods) makes it possible to scan for multiple frequencies in the signal in a time-resolved manner. the result of the transform: w : f (t) → f (t, ω) is a complex valued function of two variables, frequency ω and time localization τ . in the following, we implicitly convert scales to frequency via the corresponding central frequencies $\omega_{\mathrm{center}}(s)$ of the morlet wavelets. to obtain a physically meaningful quantity, one defines the wavelet power spectrum as the squared modulus of the transform, normalized by the variance σ² of the signal. we adopted this normalization from torrence and compo [ ] as it allows for a natural and statistical interpretation of the wavelet power. by stacking the transformations in a frequency order, one constructs a two-dimensional time-frequency representation of the signal, where the power itself is usually color coded ( figure d ); using a dense set of frequencies approximates the continuous wavelet transform. it is important to note that the time averaged wavelet power spectrum is an unbiased estimator for the true fourier power spectrum p f f of a signal f (percival [ ] ). this allows a direct comparison of fourier power spectra with the wavelet power. white noise is the simplest noise process which may serve as a null hypothesis. normalized by variance, white noise has a flat mean fourier power of one for all frequencies. hence, a variance normalized wavelet power of one also corresponds to the mean expected power for white noise: p f w n (ω) = 1 ( figure c ). this serves as a universal unit to compare different empirical power spectra. extending these arguments to random fluctuations of the fourier spectrum allows for the calculation of confidence intervals on wavelet spectra.
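the transform and the variance-normalized power spectrum described above can be sketched with a direct summation, using the standard library only. this is a hedged illustration, not pyboat's implementation: the mother wavelet convention, $\omega_0 = 2\pi$, the discrete prefactor $\sqrt{\Delta t / s}$ (chosen, following the convention of torrence and compo, so that white noise has a mean normalized power of about one) and the truncation of the gaussian envelope at five scales are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

OMEGA0 = 2 * math.pi  # assumed centre frequency of the mother wavelet


def morlet_conj(u, omega0=OMEGA0):
    """Complex conjugate of the mother Morlet wavelet at dimensionless time u."""
    return math.pi ** -0.25 * cmath.exp(-1j * omega0 * u) * math.exp(-0.5 * u * u)


def wavelet_power(signal, dt, periods, omega0=OMEGA0):
    """Variance-normalised wavelet power; one row per period, one column per sample."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    spectrum = []
    for period in periods:
        # invert omega_center(s) = omega0 / s to get the scale for this period
        scale = period * omega0 / (2 * math.pi)
        # beyond five scales the Gaussian envelope is negligible
        half_width = int(5 * scale / dt)
        row = []
        for k in range(n):  # the translation tau is sample k
            lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
            acc = sum(
                signal[j] * morlet_conj((j - k) * dt / scale, omega0)
                for j in range(lo, hi)
            )
            coefficient = math.sqrt(dt / scale) * acc
            row.append(abs(coefficient) ** 2 / variance)
        spectrum.append(row)
    return spectrum
```

for a pure sine with a period of ten samples, the row computed for a period of ten carries far more power in the middle of the signal than the rows for periods of five or twenty, which is the time-resolved frequency scan the text describes. a production implementation would use fft-based convolution instead of this quadratic-time loop.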
if a background spectrum p f (ω) is available, the confidence power levels can be easily calculated as a scaled version of this background spectrum. assuming normality for the distribution of the complex fourier components of the background spectrum, one can derive that the power itself is chi-square distributed (chatfield [ ] ). thus, picking a desired confidence (e.g. χ ( %) ≈ ) gives the scaling factor for the background spectrum. only wavelet powers greater than this confidence level are then considered to indicate oscillations with the chosen confidence. the interested reader may find more details in section of torrence and compo [ ] . a wavelet power of c = corresponds to the % confidence interval in case of white noise ( figure b ), which is frequency independent. for the practitioner, this should be considered the absolute minimum power required to report 'oscillations' in a signal. it should be noted that especially for biological time series, due to correlations present also in non-oscillatory signals, white noise is often a poor choice for a null model (see also supplementary information a. ). a possible solution is to estimate the background spectrum from the data itself; this, however, is beyond the scope of this work.
optimal filtering - do's and don'ts
a biological recording can be decomposed into components of interest and those elements which blur and challenge their analysis, most commonly noise and trends. various techniques for smoothing and detrending have been developed to deal with these issues. often overlooked is the fact that both smoothing and detrending operations can introduce spectral biases, i.e. attenuation and amplification of certain frequencies. in this section we lay out a mathematical framework to understand and compare the effects of these two operations, showing examples of the potential pitfalls and at the same time providing a practical guide to avoid these issues. finally, we discuss how pyboat minimizes most of these common artifacts.
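the scaling of a background spectrum to a confidence level, as described at the start of this passage, can be sketched as follows. this is an illustration under stated assumptions rather than the paper's exact formula: it takes the power to follow a chi-square distribution with two degrees of freedom (the usual choice for a complex wavelet such as the morlet, following torrence and compo), for which the quantile has the closed form $-2\ln(1-p)$, and it uses a confidence of 0.95 purely as an illustrative default. the function names are hypothetical.

```python
import math


def chi2_quantile_2dof(confidence):
    """Quantile of the chi-square distribution with two degrees of freedom.

    For two degrees of freedom the CDF is 1 - exp(-x / 2), so the quantile
    can be written in closed form.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return -2.0 * math.log(1.0 - confidence)


def confidence_power_level(background_power, confidence=0.95):
    """Scale a (variance-normalised) background power to a confidence level."""
    return 0.5 * chi2_quantile_2dof(confidence) * background_power


# white noise has a variance-normalised background power of one at all frequencies
white_noise_threshold = confidence_power_level(1.0)
```

under these assumptions the white noise threshold comes out close to three, and it is the same for every frequency because the white noise background is flat. for a frequency-dependent background, the same factor would be applied to each frequency separately.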
the operation which removes the fast, high-frequency (low period) components of a signal is colloquially called smoothing. this is most commonly done as a sliding time window operation (convolution). in general terms we can refer to a window function w(t) such that the smoothed signal is given as: $f_s(t) = (f * w)(t)$. by employing the convolution theorem, it turns out that the fourier transformation of the smoothed signal is simply given by the product of the individual fourier transforms. it follows that the fourier power spectrum of the smoothed signal reads as: $P^F_{f_s}(\omega) = \frac{\sigma_f^2}{\sigma_{f_s}^2} |\hat{w}(\omega)|^2 P^F_f(\omega)$. applying a few steps of fourier algebra shows that the original power spectrum $P^F_f$ gets modified by the low pass response of the window function $|\hat{w}|^2$ scaled by the ratio of variances $\sigma_f^2/\sigma_{f_s}^2$. even without resorting to mathematical formulas, smoothing and its effect on the time-frequency analysis can be easily grasped visually. a broad class of filtering methods falls into the category of convolutional filtering, meaning that some operation is applied to the data in a sliding window, e.g. moving average, loess or savitzky-golay filtering (savitzky and golay [ ]). the moving average filter is a widely used smoothing technique, defined simply by a box-shaped window that slides in the time domain. in figure we summarize the spurious effects that this filter can have on noisy biological signals. white noise, commonly used as a descriptor for fluctuations in biological systems, is a random signal with no dominant period. the lack of a dominant period can be seen from a raw white noise signal (figure a) and more clearly from the almost flat landscape of the power spectrum (figure b). applying to the raw white noise signal a moving average filter of size -times the signal's sampling interval (m = ∆t) leads to a smoothed noise signal (figure c) that now has multiple dominant periods, as seen by the emergence of high power islands in figure d.
comparing the original spectrum (figure b) with the spectrum of the smoothed white noise (figure d), it becomes evident that smoothing introduces a strong increase in wavelet power for longer periods. in other words, smoothing perturbs the original signal by creating multiple high-power islands at long periods, also referred to as spurious oscillations. to better capture the statistics behind these smoothing-induced spurious oscillations, it is best to look at the time averaged wavelet spectrum. figure e shows the mean expected power after smoothing white noise with a moving average filter. a zone of attenuated small periods becomes visible, the sloppy stopband for periods around ∆t. these are the fast, high-frequency components which get removed from the signal. however, for larger periods the wavelet power gets amplified up to -fold. it is this gain, given by $\sigma_f^2/\sigma_{f_s}^2$, which leads to spurious results in the analysis. as stated before, variance normalized white noise has a mean power of one for all frequencies or periods ($P^F_{WN}(\omega) = 1$). this makes it possible to use a straightforward numerical method to estimate a filter response $|\hat{w}(\omega)|^2$, i.e. applying the smoothing operation to simulated white noise and time averaging the wavelet spectra. this monte carlo approach works for every (also non-convolutional) smoothing method. results for the savitzky-golay filter applied to white noise signals can be found in supplementary figure s . convolutional filters will in general produce more gain and hence more spurious oscillations with increasing window sizes in the time domain. if smoothing, even with a rather small window (m = ∆t), already potentially introduces false positive oscillations, what does that mean for practical time-frequency analysis? for wavelet analysis the answer is plain and clear: smoothing is simply not needed at all.
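the gain described above, the ratio of the variance before smoothing to the variance after smoothing, can be estimated with a few lines of simulation. this is a minimal sketch under stated assumptions: the window length `m = 5`, the sample count and the seed are arbitrary illustrative values, and only the variance ratio is computed rather than a full time averaged wavelet spectrum. for white noise and a box window of m samples the ratio is expected to be close to m, since averaging m independent samples divides the variance by m.

```python
import random
import statistics


def moving_average(signal, m):
    """Box-window smoothing; returns only positions where the window fully overlaps."""
    window_sum = sum(signal[:m])
    out = [window_sum / m]
    for i in range(m, len(signal)):
        window_sum += signal[i] - signal[i - m]  # slide the window by one sample
        out.append(window_sum / m)
    return out


def variance_gain(m, n=50_000, seed=0):
    """Monte Carlo estimate of var(raw) / var(smoothed) for Gaussian white noise."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    smoothed = moving_average(noise, m)
    return statistics.pvariance(noise) / statistics.pvariance(smoothed)


gain = variance_gain(m=5)
print(round(gain, 2))  # expected to be close to the window length m
```

larger windows give proportionally larger gains, in line with the remark that spurious oscillations become more likely with increasing window size.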
a close inspection of the unaltered white noise wavelet spectrum shown in figure b reveals the same structures for higher periods as in the spectrum of the smoothed signal (figure d). the big difference is that, even though these random apparent oscillations get picked up by the wavelets, their low power directly indicates their low significance. as wavelet analysis (see previous section) is based on convolutions, it already has power preserving smoothing built in. as an illustration, we show in figure f a raw noisy signal with lengthening period (noisy chirp) and the corresponding power spectrum (figure f, lower panel). without any smoothing the main periodic signal can be clearly identified in the power spectrum. thus, wavelet analysis does not require smoothing for the detection of oscillations in very noisy signals. for all other spectrum analysis methods which rely on explicit smoothing, characteristics of the background noise and the signal to noise ratio are crucial to avoid detecting spurious oscillations. both quantities are usually not readily available a priori or in practice. complementary to smoothing, an operation which removes the slow, low frequency components of a signal is generally called detrending. strong trends can dominate a signal by effectively carrying most of the variance and power. there are at least two broad classes of detrending techniques: parametric fitting and convolution based. both aim to estimate the trend as a function over time to be subtracted from the original signal. a parametric fit is always the best choice if the deterministic processes leading to the trend are known and well understood. an example is the so-called photobleaching encountered in time-lapse fluorescent imaging experiments; here an exponential trend can often be well fitted to the data based on first principle deliberations (song et al. [ ]).
however, there are often other slow processes, like cell viability or cells drifting in and out of focus, which usually can't be readily described parametrically. for all these cases convolutional detrending with a window function w(t) is a good option and can be written as: $f_d(t) = f(t) - (f * w)(t)$. the trend here is nothing less than the smoothed original signal, i.e. $f(t) * w(t)$. here, however, the signal itself should fall into the stop-band of the low-pass filter, so that no signal components are captured and subtracted. using basic algebra in the frequency domain we obtain an expression relating the window w(t) to the power spectrum of the original signal f(t): $P^F_{f_d}(\omega) = \frac{\sigma_f^2}{\sigma_{f_d}^2} |1 - \hat{w}(\omega)|^2 P^F_f(\omega)$. as in the case of smoothing, the so-called high-pass response of the window function is given by $|1 - \hat{w}|^2$ and scaled with the ratio of variances $\sigma_f^2/\sigma_{f_d}^2$. in strong contrast to smoothing, there is no overall gain in power in the range of periods passing through the filter (called the passband). there is, however, in the case of moving average and other time-domain filters (see also figure s ), no simple passband region. instead, there are rippling artifacts in the frequency domain, meaning some periods get amplified by up to % and others attenuated by up to %. to showcase why this can be problematic, we constructed a synthetic chirp signal sweeping through a range of periods $T_1$ to $T_2$, this time modified by a linear and an oscillatory trend (figure a). the oscillatory component of the trend was chosen for clarity with a specific time scale given by its period $T_{trend}$, which is three times the longest period found in the chirp signal. strongly depending on the specific window size chosen for the moving average filter, there are various effects on both the time and frequency domain (shaded area in figure b), such as the introduction of amplitude envelopes and/or incomplete trend removal (figure c). a larger window size is better to reduce the effect of the ripples inside the passband.
however, the filter decay (roll-off) towards larger periods becomes very slow. that in turn means that trends cannot be fully eliminated. smaller window sizes perform better in detrending, but their passband can be dominated by ripples (see also supplementary figure s ). in practice, sticking to filters originally designed for the time domain without having oscillatory signals in mind can easily lead to biased results of a time-frequency analysis. however, given the moderate gains of the detrending filter response, there is a much smaller chance to mistakenly detect spurious oscillations compared to the case of smoothing. the sinc filter, also known as the optimal filter in the frequency domain, is a function with a constant value of one in the passband. in other words, frequencies which pass through are neither amplified nor attenuated. accordingly, this filter should also be constantly zero in the stop-band, the frequencies (or periods) which should be filtered out. this optimal low-pass response can be formulated in the frequency domain simply as: $\hat{w}(\omega) = 1$ for $|\omega| \le \omega_c$ and $\hat{w}(\omega) = 0$ otherwise. here $\omega_c$ is the cut-off frequency, an infinitely sharp transition band dividing the frequency range into pass- and stop-band. it is effectively a box in the frequency domain (dashed lines in figure d). note that the optimal high-pass or detrending response simply and exactly swaps the pass- and stop-band. in the time domain, via the inverse fourier transform, this can be written as: $w(t) = \frac{\sin(\omega_c t)}{\pi t}$. this function is known as the sinc function, hence the name sinc filter. an alternative name used in electrical engineering is brick-wall filter. in practice, this optimal filter has a nonzero roll-off, as shown for two different cut-off periods ($T_c = 2\pi/\omega_c$) in figure d. the sinc function mathematically requires the signal to be of infinite length. therefore, every practical implementation uses windowed sinc filters (smith et al. [ ]; see also supplementary information s on possible implementations).
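a windowed sinc detrending step can be sketched as follows. this is an illustration under stated assumptions and not pyboat's code: the blackman taper, the `half_width` parameter, the edge handling by renormalizing the kernel over the available samples, and all function names are choices made here for the example. the test signal (a period of 10 samples on top of a linear trend, cut-off period 40) is likewise hypothetical.

```python
import math


def windowed_sinc_kernel(cutoff_period, dt, half_width):
    """Low-pass kernel: truncated sinc tapered by a Blackman window.

    `cutoff_period` and `dt` share the same time unit. The kernel has
    2 * half_width + 1 taps and is normalized to unit sum.
    """
    fc = dt / cutoff_period  # cut-off frequency in cycles per sample
    n_taps = 2 * half_width + 1
    taps = []
    for i in range(n_taps):
        k = i - half_width
        x = 2.0 * fc * k
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        blackman = (0.42 - 0.5 * math.cos(2 * math.pi * i / (n_taps - 1))
                    + 0.08 * math.cos(4 * math.pi * i / (n_taps - 1)))
        taps.append(sinc * blackman)
    total = sum(taps)
    return [t / total for t in taps]


def sinc_trend(signal, kernel):
    """Low-pass filter the signal; near the edges only available samples are used."""
    half = len(kernel) // 2
    n = len(signal)
    trend = []
    for i in range(n):
        acc = 0.0
        weight = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * signal[idx]
                weight += w
        trend.append(acc / weight)
    return trend


# Hypothetical test signal: oscillation with period 10 on top of a linear trend.
n, dt = 400, 1.0
oscillation = [math.cos(2 * math.pi * i * dt / 10.0) for i in range(n)]
signal = [0.05 * i * dt + 3.0 + oscillation[i] for i in range(n)]
kernel = windowed_sinc_kernel(cutoff_period=40.0, dt=dt, half_width=60)
trend = sinc_trend(signal, kernel)
detrended = [s - t for s, t in zip(signal, trend)]
# Compare with the pure oscillation away from the edges.
max_error = max(abs(detrended[i] - oscillation[i]) for i in range(60, n - 60))
print(max_error)
```

a longer kernel sharpens the transition between pass- and stop-band at the cost of a wider region affected by the edges.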
strikingly, there are still no ripples or other artifacts in the frequency response of the windowed sinc filter, and hence the 'real world' version also allows for a bias free time-frequency analysis. as shown in figure e, the original signal can be exactly recovered via detrending. to showcase the performance of the sinc filter, we numerically compared it against two other common methods, the hodrick-prescott and moving average filter (figure f). the stop- and passband separation of the sinc filter is clearly the best, although the hodrick-prescott filter with a parameterization as given by ravn and uhlig [ ] also gives acceptable results (see also supplementary figure s ). the moving average is generally inadvisable, due to its amplification right at the start of the passband. in addition to its advantages in practical signal analysis, the sinc filter also makes it possible to analytically calculate the gains from filtering pure noise (see also supplementary information a. ). the gain, and therefore the probability to detect spurious oscillations, introduced by smoothing is typically much larger than that introduced by detrending. however, if a lot of the energy of the noise is concentrated in the slow, low frequency bands, detrending with small cut-off periods alone can also yield substantial gains (see figure s and figure s ). importantly, when using the sinc filter, the background spectrum of the noise will always be uniformly scaled by a constant factor in the pass-band. there is no mixing of attenuation and amplification as for time-domain filters like the moving average (figure b and c). if the spectrum of the noise can be estimated, or an empirical background spectrum is available, the theory presented in a. makes it possible to directly calculate the correct confidence intervals.

extraction of the instantaneous period, amplitude and phase of that component is of prime interest for the practitioner.
in this section we show how to obtain these important features using wavelet transforms as implemented in pyboat. from the perspective of wavelet power spectra, such main oscillatory components are characterized by concentrated and time-connected regions of high power. wavelet ridges are a means to trace these regions on the time-period plane. for the vast majority of practical applications a simple maximum ridge extraction is sufficient. this maximum ridge can be defined as: $r(t_k) = \operatorname{arg\,max}_{\omega} P^W_f(t_k, \omega)$ for every sample point $t_k$, with N being the number of sample points in the signal. thus, the ridge $r(t_k)$ maps every time point $t_k$ to a row of the power spectrum, and therewith to a specific instantaneous period $T_k$ (figure c and d). evaluating the power spectrum along a ridge gives a time series of powers: $P^W_f(t_k, r(t_k))$. setting a power threshold value is recommended to avoid evaluating the ridge in regions of the spectrum where the noise dominates (in figure c the threshold is set to ). as an alternative to simple maximum ridge detection, more elaborate strategies for ridge extraction have been proposed (carmona et al. [ ]). a problem often encountered when dealing with biological data is a general time-dependent amplitude envelope (figure a). under our wavelet approach, the power spectrum is normalized with the overall variance of the signal. consequently, regions with low signal amplitudes but robust oscillations are nevertheless represented with very low power, blurring them with the spectral floor (figure c). this leads to the impractical situation where even a noise free signal with an amplitude decay will show very low power at the end (figure e, f and s ), defeating its statistical purpose. a practical solution in this case is to estimate an amplitude envelope and subsequently normalize the signal with this envelope (figure a and b). we specifically show here non-periodic envelopes, estimated by a sliding window (see also methods).
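the maximum ridge with power thresholding described above can be sketched in a few lines. the layout of the power spectrum (one row per period, one column per time point), the function name and the use of `None` for time points below the threshold are assumptions made for this illustration.

```python
def maximum_ridge(power, periods, threshold=0.0):
    """Trace the maximum ridge of a wavelet power spectrum.

    power[i][k] is the wavelet power for periods[i] at time index k.
    Returns, for each time index, the (period, power) pair of the row with
    maximal power, or None where that maximum lies below the threshold.
    """
    n_times = len(power[0])
    ridge = []
    for k in range(n_times):
        column = [row[k] for row in power]
        i_max = max(range(len(column)), key=column.__getitem__)
        if column[i_max] >= threshold:
            ridge.append((periods[i_max], column[i_max]))
        else:
            ridge.append(None)
    return ridge


# Tiny hypothetical spectrum: three periods, three time points.
example_power = [
    [0.1, 5.0, 0.2],
    [4.0, 1.0, 0.3],
    [0.5, 0.5, 0.4],
]
example_ridge = maximum_ridge(example_power, periods=[10, 20, 30], threshold=1.0)
print(example_ridge)  # [(20, 4.0), (10, 5.0), None]
```

the last time point is dropped from the ridge because its maximal power stays below the chosen threshold.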
after normalization, lower amplitudes are no longer penalized and an effective power-thresholding of the ridge is possible (figure d and f). a limitation of convolutional methods, including wavelet-based approaches, is the presence of edge effects. at the edges of the signal, the wavelets only partially overlap with the signal, leading to a so-called cone of influence (coi) (figure c and d). even though the periods are still close to the actual values, phases and especially the power should not be trusted inside the coi (see discussion and supplementary figure s ). once the trace of consecutive wavelet power maxima has been determined and thresholded, evaluating the transform along it yields the instantaneous envelope amplitude, normalized amplitude and phases (see figure e, f and g).

applications

after introducing the different time series analysis steps using synthetic data for clarity, in this paragraph we discuss examples of pyboat applications to real data. to showcase the versatility of our approach, we chose datasets obtained from different scientific fields. in figure a we display covid- infections in italy as reported by the european centre for disease prevention and control (ecdc). a sinc-filter trend identification with a cut-off period of days reveals a steep increase in newly reported infections at the beginning of march and a steady decline after the beginning of april. subtracting this non-linear trend clearly exposes oscillations with a stable period of one week (figure b, and see supplementary figure s a for power spectrum analysis). similar findings were recently reported in ricon-becker et al. [ ]. the signals shown in figure c show cycles in hare-lynx population sizes as inferred from the yearly number of pelts trapped by the hudson bay company. data has been taken from odum and barrett [ ]. the corresponding power spectra are shown in the supplement (figure s b), and reveal a fairly stable year periodicity.
after extracting the instantaneous phases with pyboat, we calculated the time-dependent phase differences as shown in figure b. interestingly, the phase difference slowly varies between being almost perfectly out of phase and being in phase for a few years around . the next example signal shows a single-cell trajectory of a u os cell carrying a geminin-cfp fluorescent reporter (granada et al. [ ]). geminin is a cell cycle reporter, accumulating in the g phase and then steeply declining during mitosis. applying pyboat to these non-sinusoidal oscillations reveals the cell-cycle length over time (figure f), showing a slowing down of the cell cycle progression for this cell. ensemble dynamics for a control and a cisplatin treated population are shown in supplementary figure s c. the final example data set is taken from mönke et al. [ ]; here populations of mcf cells were treated with different dosages of the dna damaging agent ncs. this in turn elicits a dose-dependent and heterogeneous p response, tracked in the individual cells for each condition (figure g). pyboat also features readouts of the ensemble dynamics: figure h shows the time-dependent period distribution in each population, figure i the phase coherence over time. the latter is calculated as $R(t) = \frac{1}{N} \left| \sum_j e^{i \varphi_j(t)} \right|$, with N being the number of oscillators in the ensemble. it ranges from zero to one and is a classical measure of synchronicity in an ensemble of oscillators (kuramoto [ ]). the strongly stimulated cells ( ng ncs) show stable oscillations with a period of around min, and remain more phase coherent after an initial drop in synchronicity. the medium stimulated cells ( ng ncs) start to slow down on average already after the first pulse; both the spread of the period distribution and the low phase coherence indicate a much more heterogeneous response. two individual cells and their wavelet analysis are shown in supplementary figure s d.
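the phase coherence used as an ensemble readout above is straightforward to compute from the instantaneous phases at one time point. the sketch below is an illustration with a hypothetical function name; it takes the phases of all oscillators at a single time point and returns the magnitude of the mean phase vector.

```python
import cmath
import math


def phase_coherence(phases):
    """Magnitude of the mean unit phase vector of an ensemble, between zero and one."""
    n = len(phases)
    return abs(sum(cmath.exp(1j * phi) for phi in phases)) / n


in_sync = phase_coherence([0.3, 0.3, 0.3, 0.3])  # identical phases
anti_phase = phase_coherence([0.0, math.pi])  # two oscillators in anti-phase
spread = phase_coherence([0.0, 0.5 * math.pi, math.pi, 1.5 * math.pi])  # evenly spread
print(in_sync, anti_phase, spread)
```

applying the function to every time point of an ensemble yields the time-dependent coherence curve.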
graphical interface

the extraction of period, amplitude and phase is the final step of our proposed analysis workflow, which is outlined in the screen-captures of figure . the user interface is separated into several sections. first, the 'dataviewer' visualizes the individual raw signals, the trend determined by the sinc filter, the detrended time series and the amplitude envelope. once satisfactory parameters have been found, the actual wavelet transform together with the ridge are shown in the 'wavelet spectrum' window. after ridge extraction, it is possible to plot the instantaneous observables in a 'readout' window. each plot produced from the interface can be panned, zoomed in and saved separately if needed. once the fidelity of the analysis has been checked for individual signals, it is also possible to run the entire analysis as a 'batch process' for all signals imported. one aggregated result which we found quite useful is to determine the 'rhythmicity' of a population by creating a histogram of time-averaged powers of the individual ridges. a classification of signals into 'oscillatory' and 'non-oscillatory' based on this distribution, e.g. by using classical thresholds (otsu [ ]), is a potential application. examples of the provided ensemble readouts are shown in figure h and i and supplementary figure s c. finally, pyboat also features a synthetic signal generator, which makes it possible to quickly explore its capabilities without having a suitable dataset at hand. a synthetic signal can be composed of up to two different chirps, and ar -noise and an exponential envelope can be added to simulate possible challenges often present in real data (see also material and methods and supplementary figure s ). installation and user guidelines of pyboat can be found in the github repository.

gets detrended with a cut-off period of h, and an amplitude envelope is estimated via a window of size h. see labels and main text for further explanations.
the example trajectory displays a circadian rhythm of h and is taken from the data set published in abel et al. [ ].

recordings of biological oscillatory signals can be conceptualized as an aggregate of multiple components, those coming from the underlying system of interest and additional confounding factors such as noise, modulations and trends that can disguise the underlying oscillations. in cases of variable period with noisy amplitude modulation and non-stationary trends the detection and analysis of oscillatory processes is a non-trivial endeavour. here we introduced pyboat, a novel software package that uses a statistically rigorous method to handle non-stationary rhythmic data. pyboat integrates pre- and post-processing steps without making a priori assumptions about the sources of noise and periodicity of the underlying oscillations. we showed how the signal processing steps of smoothing, detrending, amplitude envelope removal, signal detection and spectral analysis can be resolved by our hands-off standalone software (figure and ). artifacts introduced by the time series analysis methods themselves are a common problem that inadvertently disturbs results of the time-frequency analysis of periodic components (wilden et al. [ ]). here we first analyzed the effects of data-smoothing on a rhythmic noisy signal and showed how common smoothing approaches disturb the original recordings by introducing non-linear attenuations and gains to the signal (figures , s and s ). these gains easily lead to spurious oscillations that were not present in the original raw data. for the commonly used moving-average smoothing method these artifacts have long been characterized and are known as the slutzky-yule effect (slutzky [ ]). using an analytical framework, we describe the smoothing process as a filter operation in the frequency domain. this allows us to quantify and directly compare the effects of diverse smoothing methods by means of response curves.
importantly, we show here how any choice of smoothing unavoidably transforms the original signal in a non-trivial manner. one potential reason for its prevalence is that practitioners often implement a smoothing algorithm without quantitatively comparing the spectral components before versus after smoothing. pyboat avoids this problem by implementing a wavelet-based approach that per se evades the need to smooth the signal. another source of artifacts are detrending operations. thus, we next studied the spectral effects that signal detrending has on rhythmic components. our analytical and numerical approaches allowed us to compare the spectral effects of different detrending methods in terms of their response curves (see figure ). our results show that detrending also introduces non-trivial boosts and attenuations to the oscillatory components of the signal, strongly depending on the background noise (figures s and s ). in general there is no universal approach, and optimally a detrending model is based on information about the sources generating the trend. in cases without prior information to formulate a parametric detrending in the time domain, we suggest that the safest method is the convolution based sinc filter, as it is an "ideal" (step-function) filter in the frequency domain (figures c and s ). furthermore, we compared the performance of the sinc filter with two other commonly applied methods to remove non-linear trends in data (figure f), i.e. the moving average (díez-noguera [ ]) and hodrick-prescott (myung et al. [ ], schmal et al. [ ], st. john and doyle [ ]) filter. in addition to smoothing and detrending, amplitude normalization by means of amplitude envelope removal is another commonly used data processing step that pyboat is able to perform. here we further show that for decaying signals amplitude normalization ensures that the main oscillatory component of interest can be properly identified in the power spectrum (figure a to d).
this main component is identified by a ridge-tracking approach that can then be used to extract instantaneous signal parameters such as amplitudes, power and phase (figure e to g). rhythmic time series can be categorized into those showing stationary oscillatory properties and the non-stationary ones where periods, amplitudes and phases change over time. many currently available tools for the analysis of biological rhythms rely on methods aimed at stationary oscillatory data, using either a standalone software environment such as brass (edwards et al. [ ], locke et al. [ ]), chronostar (klemz et al. [ ]) and circada (cenek et al. [ ]) or online interfaces such as biodare (zielinski et al. [ ]). continuous wavelet analysis makes it possible to reveal non-stationary period, amplitude and phase dynamics and to identify multiple frequency components across different scales within a single oscillatory signal (leise [ ], leise et al. [ ], rojas et al. [ ]) and is thus complementary to approaches that are designed to analyze stationary data. in contrast to the r-based waveclock package (price et al. [ ]), pyboat can be operated as a standalone software tool that requires no prior programming knowledge, as it can be fully operated using its graphical user interface (gui). an integrated batch processing option allows the analysis of large data sets within a few "clicks". for users interested in programming, pyboat can also be easily scripted without using the gui, making it simple to integrate it into individual analysis pipelines. pyboat also distinguishes itself from other wavelet-based packages (e.g. harang et al. [ ]) by adding a robust sinc filter-based detrending and a statistically rigorous framework, supporting the interpretation of results by statistical confidence considerations. pyboat is not specifically designed to analyze oscillations in high-throughput "omics" data. for this purpose, specialized algorithms such as arser (yang and su [ ]), jtk-cycle (hughes et al. [ ]), metacycle (wu et al. [ ]) or rain (thaben and westermark [ ]) are more appropriate. its analysis reveals basic oscillatory properties such as the time-dependent (instantaneous) rhythmicity, period, amplitude and phase but is not aimed at more specific statistical tests such as, e.g., tests for differential rhythmicity as implemented in dodr (thaben and westermark, ). the continuous wavelet analysis underlying pyboat requires equidistant time series sampling with no gaps. methods such as lomb-scargle periodograms or harmonic regressions are more robust or even specifically designed with respect to unevenly sampled data (lomb [ ], ruf [ ]). although beyond the scope of this manuscript, it will be interesting in future work to integrate the ability to analyze unevenly sampled data into the pyboat software, either by the imputation of missing values (e.g. by linear interpolation) or the usage of wavelet functions specifically designed for this purpose (thiebaut and roques [ ]). pyboat is a fast, easy to use and statistically robust analysis routine designed to complement existing methods and advance efficient time series analysis in biological rhythms research. in order to make it publicly available, pyboat is a free and open-source, multi-platform software based on the popular python (van rossum and drake [ ]) programming language. it can be downloaded using the following link: https://github.com/tensionhead/pyboat, and is available on the anaconda distribution (via the conda-forge channel).

software

pyboat is written in the python programming language (van rossum and drake [ ]). it makes extensive use of python's core scientific libraries numpy and scipy (virtanen et al. [ ]) for the numerics. additionally we use matplotlib (hunter [ ]) for visualization, and pandas (mckinney [ ]) for data management. pyboat is released under the open source gpl- . license, and its code is freely available from https://github.com/tensionhead/pyboat.
the readme on this repository contains further information and installation instructions. pyboat is also hosted on the popular anaconda distribution, as part of the conda-forge community https://conda-forge.org/. to estimate the amplitude envelope in the time domain, we employ a moving window of size l and determine the minimum and maximum of the signal inside the window for each time point t. the amplitude at that time point is then given by $a(t) = \frac{1}{2}(\max(t) - \min(t))$. this works very well for envelopes with no periodic components, like an exponential decay. however, this simple method is not suited for oscillatory amplitude modulations. it is also recommended to sinc-detrend the signal before estimating the amplitude envelope. note that l should always be larger than the maximal expected period in the signal, as otherwise the signal itself gets distorted. a noisy chirp signal can be written as: $f(t_i) = a \cos(\varphi(t_i)) + d\, x(t_i)$, where $\varphi(t_i)$ is the instantaneous phase and the $x(t_i)$ are samples from a stationary stochastic process (the background noise). the increments of the $t_i$ are the sampling interval: $t_{i+1} - t_i = \Delta t$, for all N samples. starting from a linear sweep through angular frequencies, $\omega(0) = \omega_1$ and $\omega(t_N) = \omega_2$, we have $\omega(t) = \frac{\omega_2 - \omega_1}{t_N} t + \omega_1$. the instantaneous phase is then given by $\varphi(t) = \int_0^t \omega(t')\,dt' = \frac{\omega_2 - \omega_1}{2 t_N} t^2 + \omega_1 t$. sampling N times from a gaussian distribution with standard deviation equal to one corresponds to gaussian white noise $\xi(t_i)$. with $x(t_i) = \xi(t_i)$ the signal to noise ratio (snr) then is $a^2/d^2$. a realization of an ar(1) process can be simulated by a simple generative procedure: the initial sample of x is drawn from the standard normal distribution. then the next sample is given by $x(t_i) = \alpha x(t_{i-1}) + \xi(t_i)$, with $\alpha < 1$. simulating pink noise is less straightforward, and we use the python package colorednoise from https://pypi.org/project/colorednoise for the simulations. its implementation is based on timmer and koenig [ ].
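the synthetic signal and the sliding-window envelope described in this section can be sketched together. this is not pyboat's generator: the function names, the parameterization by start and end period, the centred window, and the factor of one half in the envelope (chosen so that a pure cosine of amplitude a yields a) are assumptions of this illustration. setting `alpha=0` reduces the ar(1) noise to gaussian white noise.

```python
import math
import random


def ar1_noise(n, alpha, rng):
    """AR(1) realization: x[i] = alpha * x[i-1] + xi[i], with standard normal xi."""
    x = [rng.gauss(0.0, 1.0)]  # initial sample from the standard normal distribution
    for _ in range(n - 1):
        x.append(alpha * x[-1] + rng.gauss(0.0, 1.0))
    return x


def noisy_chirp(n, dt, period_start, period_end, amplitude, noise_strength,
                alpha=0.0, seed=0):
    """Chirp with linearly swept angular frequency plus scaled AR(1) noise."""
    rng = random.Random(seed)
    w1 = 2 * math.pi / period_start
    w2 = 2 * math.pi / period_end
    t_end = (n - 1) * dt
    noise = ar1_noise(n, alpha, rng)
    signal = []
    for i in range(n):
        t = i * dt
        phase = 0.5 * (w2 - w1) / t_end * t * t + w1 * t  # integral of omega(t)
        signal.append(amplitude * math.cos(phase) + noise_strength * noise[i])
    return signal


def sliding_envelope(signal, window):
    """Half the peak-to-peak range inside a centred window of `window` samples."""
    half = window // 2
    envelope = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        envelope.append(0.5 * (max(chunk) - min(chunk)))
    return envelope


# Noise-free chirp sweeping from period 20 to period 40 (in samples), amplitude 2.
clean = noisy_chirp(n=500, dt=1.0, period_start=20.0, period_end=40.0,
                    amplitude=2.0, noise_strength=0.0)
envelope = sliding_envelope(clean, window=81)  # window longer than the longest period
print(min(envelope[40:-40]), max(envelope[40:-40]))
```

with a window shorter than the longest period the estimate would dip below the true amplitude, which illustrates why l should exceed the maximal expected period.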
functional network inference of the suprachiasmatic nucleus
quantitative analysis of circadian single cell oscillations in response to temperature
identification of chirps with continuous wavelet transform
circada: shiny apps for exploration of experimental and synthetic circadian time series with an educational emphasis
problem solving: a statistician's guide
methods for serial analysis of long time series in the study of biological rhythms
quantitative analysis of regulatory flexibility under changing environmental conditions
calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production
theory of communication. part : the analysis of information
systems biology of cellular rhythms
the effects of proliferation status and cell cycle phase on the responses of single cells to chemotherapy
foundations of time-frequency analysis
circannual rhythms in birds
circadian system phase an aspect of temporal morphology; procedures and illustrative examples
wavos: a matlab toolkit for wavelet analysis and visualization of oscillatory systems
jtk_cycle: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets
matplotlib: a d graphics environment
hilbert transformer and time delay: statistical comparison in the presence of gaussian noise
reciprocal regulation of carbon monoxide metabolism and the circadian clock
chemical turbulence
scaling of embryonic patterning based on phase-gradient encoding
wavelet analysis of circadian and ultradian behavioral rhythms
persistent cell-autonomous circadian oscillations in fibroblasts revealed by six-week single-cell imaging of per ::luc bioluminescence
extension of a genetic network model by iterative experimentation and mathematical analysis
least-squares frequency analysis of unequally spaced data
mckinney, wes. "data structures for statistical computing in python"
excitability in the p network mediates robust signaling with tunable activation thresholds in single cells
online period estimation and determination of rhythmicity in circadian data, using the biodare data infrastructure
period coding of bmal oscillators in the suprachiasmatic nucleus
circadian rhythms determined by cosine curve fitting: analysis of continuous work and sleep-loss data
fundamentals of ecology
dissociation of per and bmal circadian rhythms in the suprachiasmatic nucleus in parallel with behavioral outputs
a threshold selection method from gray-level histograms
on estimation of the wavelet variance
circadian oscillations in rodents: a systematic increase of their frequency with age
waveclock: wavelet analysis of circadian oscillation
performance of different synchronization measures in real data: a case study on electroencephalographic signals
on adjusting the hodrick-prescott filter for the frequency of observations
a seven-day cycle in covid- infection and mortality rates: are inter-generational social interactions on the weekends killing susceptible people? medrxiv
plant dormancy in the perennial context
beyond spikes: multiscale computational analysis of in vivo long-term recordings in the cockroach circadian clock
the lomb-scargle periodogram in biological rhythm research: analysis of incomplete and unequally spaced time-series
smoothing and differentiation of data by simplified least squares procedures
plasticity of the intrinsic period of the human circadian timing system
measuring relative coupling strength in circadian systems
the summation of random causes as the source of cyclic processes
the scientist and engineer's guide to digital signal processing
photobleaching kinetics of fluorescein in quantitative fluorescence microscopy
a doppler effect in embryonic pattern formation
quantifying stochastic noise in cultured circadian reporter cells
least squares analysis of fluorescence data
detecting rhythms in time series with rain
differential rhythmicity: detecting altered rhythmicity in biological data
time-scale and time-frequency analyses of irregularly sampled astronomical time series
on generating power law noise
a practical guide to wavelet analysis
self-organization of embryonic genetic oscillators into spatiotemporal wave patterns
python reference manual. createspace
quantification of circadian rhythms in single cells
subharmonics, biphonation, and deterministic chaos in mammal vocalization
metacycle: an integrated r package to evaluate periodicity in large scale data
analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation
strengths and limitations of period estimation methods for circadian data

we gratefully thank bharath ananthasubramaniam, hanspeter herzel, pedro pablo rojas and shaon chakrabarti for fruitful discussions and comments on the manuscript. we further thank jelle scholtalbers and the gbcs unit at the embl in heidelberg for technical support. we thank members of the aulehla and leptin labs for comments, support and helpful advice.
key: cord- -blam f c authors: levade, inès; saber, morteza m.; midani, firas; chowdhury, fahima; khan, ashraful i.; begum, yasmin a.; ryan, edward t.; david, lawrence a.; calderwood, stephen b.; harris, jason b.; larocque, regina c.; qadri, firdausi; shapiro, b. jesse; weil, ana a. title: predicting vibrio cholerae infection and disease severity using metagenomics in a prospective cohort study date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: blam f c background susceptibility to vibrio cholerae infection is impacted by blood group, age, and pre-existing immunity, but these factors only partially explain who becomes infected. a recent study used s rrna amplicon sequencing to quantify the composition of the gut microbiome and identify predictive biomarkers of infection with limited taxonomic resolution. methods to achieve increased resolution of gut microbial factors associated with v. cholerae susceptibility and identify predictors of symptomatic disease, we applied deep shotgun metagenomic sequencing to a cohort of household contacts of patients with cholera. results using machine learning, we resolved species, strains, gene families, and cellular pathways in the microbiome at the time of exposure to v. cholerae to identify markers that predict infection and symptoms. use of metagenomic features improved the precision and accuracy of prediction relative to s sequencing. we also predicted disease severity, although with greater uncertainty than our infection prediction. species within the genera prevotella and bifidobacterium predicted protection from infection, and genes involved in iron metabolism also correlated with protection. conclusion our results highlight the power of metagenomics to predict disease outcomes and suggest specific species and genes for experimental testing to investigate mechanisms of microbiome-related protection from cholera. 
summary cholera infection and disease severity can be predicted using metagenomic sequencing of the gut microbiome pre-infection in a prospective cohort, and the results suggest potentially protective bacterial species and genes. cholera is an acute diarrheal disease caused by vibrio cholerae. it is a major public health threat worldwide that continues to cause major outbreaks, such as in yemen, where over . million cases have been reported since ( , ). transmission of v. cholerae between household members commonly occurs through shared sources of contaminated food or water or through fecal-oral spread ( , ). the clinical spectrum of disease ranges from asymptomatic infection to severe watery diarrhea that can lead to fatal dehydration ( ). host factors such as age, innate immune factors, blood group, or prior acquired immunity partially explain why some people are more susceptible to v. cholerae infection than others, but a substantial amount of the variation remains unexplained ( ). the gut bacterial community can protect against enteropathogenic infections ( ). here we used shotgun metagenomics to analyze an expanded prospective cohort of persons exposed to v. cholerae in bangladesh. our metagenomic analysis yielded improved outcome predictions compared to s rrna sequencing, and identified bacterial genes associated with remaining uninfected after exposure to v. cholerae. we were also able to predict disease severity among infected contacts, albeit with lower power and precision than susceptibility. finally, we highlight several microbiome-encoded metabolic functions associated with protection against cholera. sample collection, clinical outcomes and metagenomic sequencing. as described in ( ), household contacts were enrolled within hours of the presentation of an index cholera case at the icddr,b (international center for diarrheal disease research, bangladesh) dhaka hospital. index patients with severe acute diarrhea, a stool culture positive for v. 
cholerae, age between and years old, and no major comorbid conditions were recruited ( , ). a clinical assessment of symptoms in household contacts was conducted daily for the -day period after presentation of the index case, and repeated on day . we collected demographic information, rectal swabs, and blood samples for abo typing and vibriocidal antibody titers as described in the supplementary methods. during the observation period, contacts were determined to be infected if any rectal swab culture was positive for v. cholerae and/or if the contact developed diarrhea and a -fold increase in vibriocidal titer during the follow-up period ( , ). contacts with positive rectal swabs developing watery diarrhea were categorized as symptomatic and those without diarrhea were considered asymptomatic ( figure ). v. cholerae positive contacts (by culture or deep s amplicon sequencing ( )) at the time of enrollment were excluded, in addition to contacts who reported antibiotic use or diarrhea during the week prior to enrollment. dna extraction was performed for the selected samples and used for shotgun metagenomics sequencing. details on cohorts, sequencing methods and sample processing are described in supplementary methods. we used metaphlan (version . ) ( ) for taxonomic profiling and humann ( ) to profile cellular pathways (from metacyc) and gene families (identified using the pfam database). for identification of biomarkers of susceptibility and disease severity, we used metaml ( ) to apply a random forests (rf) classifier on species, pathways and gene-family relative abundances, as well as strain-specific markers presence/absence. models constructed using each of these features types were compared to a random dataset with shuffled labels, and to a model constructed with clinical/demographic data, using two-sample, two-sided t-tests over replicate cross-validation ( ). 
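the outcome definitions given in this section (infected versus uninfected, then symptomatic versus asymptomatic) can be written as a small decision rule. the following is a minimal sketch, not the authors' code: the function name, argument names and the titer threshold parameter are hypothetical, and the fold-increase cutoff is deliberately left as an argument because its value is not reproduced in this text.

```python
def classify_contact(culture_positive, had_diarrhea,
                     titer_fold_change, titer_fold_threshold):
    """Label one household contact as 'uninfected', 'asymptomatic' or 'symptomatic'.

    Hypothetical helper: a contact counts as infected if any rectal swab
    culture was positive, and/or if the contact developed diarrhea together
    with a rise in vibriocidal titer of at least `titer_fold_threshold`-fold.
    """
    # Serological route to infection requires diarrhea AND a titer rise.
    seroconverted_with_diarrhea = (
        had_diarrhea and titer_fold_change >= titer_fold_threshold
    )
    infected = culture_positive or seroconverted_with_diarrhea
    if not infected:
        return "uninfected"
    # Among infected contacts, diarrhea separates the two severity groups.
    return "symptomatic" if had_diarrhea else "asymptomatic"


if __name__ == "__main__":
    threshold = 2.0  # placeholder value for illustration only, not from the paper
    print(classify_contact(True, False, 1.0, threshold))
    print(classify_contact(False, True, 1.0, threshold))
```

exclusion criteria (positivity at enrollment, recent antibiotic use or diarrhea) are applied before this step and are not modeled in the sketch.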
we used a stratified -fold cross-validation approach, splitting our dataset into validation and training sets ( / and / of samples, respectively) with the same infected:uninfected ratio. we used an embedded feature selection strategy to identify the most metagenomic sequencing of the gut microbiome in household contacts exposed to v. cohort, upon which we base the majority of our analyses. we also performed exploratory analyses on the expanded cohort to determine the potential for predictive models to be generalized to larger samples. we used the shotgun metagenomic dna sequence reads from these samples to characterize four features of the microbiome: (i) relative abundances of microbial species, (ii) the presence/absence of sub-species-level strains, (iii) metabolic pathway relative abundances, and (iv) gene family relative abundances (table ) (table s ). however, such high auc values should be treated with caution because the models can be overfit when a supervised feature selection step is applied to the same data used to train the model ( ). because we did not have a fully independent validation cohort (e.g. from another continent) to test our model, we used the features selected from the midani training dataset to make predictions on the expanded cohort, and achieved aucs between . and . for prediction of infection using the four types of features (table s ). again, because the expanded cohort partly overlaps with the midani cohort, and includes some repeated samples from the same individuals over time, these results could also be prone to overfitting, but they demonstrate the potential for generalized predictions. finally, we repeated the rf analysis using all features in the expanded dataset, which increased predictive performance relative to the original midani cohort (figure s ). 
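the stratified cross-validation split described above, in which every validation fold keeps the same infected:uninfected ratio, can be illustrated with the standard library alone. this is a sketch under stated assumptions rather than the metaml implementation: the function names are hypothetical, and the number of folds is a parameter because its value is not reproduced in this text.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign sample indices to n_folds folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share of it.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_validation_splits(labels, n_folds, seed=0):
    """Yield (training_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        validation = sorted(folds[k])
        training = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield training, validation


if __name__ == "__main__":
    # Illustrative labels only; not data from the study.
    labels = ["infected"] * 6 + ["uninfected"] * 9
    for training, validation in train_validation_splits(labels, n_folds=3):
        print(len(training), len(validation))
```

feature selection would need to be repeated inside each training split to avoid the overfitting concern raised in the text; the sketch covers only the splitting step.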
once again, genes and pathways outperformed species and strains according to all metrics, with auc reaching ~ . using cellular pathways (table ). this improvement in the expanded cohort also highlights the importance of using larger, more balanced datasets as input to predictive models. to put the metagenomic predictions in context, we compared their predictive power and accuracy to clinical and demographic factors (table s a). three of these factors (age, baseline vibriocidal antibodies and blood group) are known to impact susceptibility to v. cholerae infection ( , ) and we used them to train rf models (table s ). as expected, contacts who became infected tended to be younger and have lower baseline antibody titers than those who remained uninfected (table s b), but these small differences were not sufficient to train a significantly predictive model. an rf model trained on the seven clinical and demographic factors did not perform better than a random model with shuffled labels (auc= . , p= . ). to predict symptomatic disease among infected individuals (figure ), we divided samples into uninfected, symptomatic and asymptomatic groups and again applied the rf approach. we used the f score as a performance metric since it is well suited for uneven class distributions in our uninfected/symptomatic/asymptomatic comparison. applied to the midani cohort, this model predicted outcomes significantly better than random (shuffled labels) using species, strains or pathway data, but not gene families (table ; see table s for p-values). however, the f scores for the symptomatic/asymptomatic predictions were systematically lower (mean scores ranging from . to . ) than for the infected/uninfected prediction (means ranging from . to . ). using the expanded cohort, the scores were improved only slightly (table ). these results suggest that disease severity is predictable in principle, but with greater uncertainty than the infection outcome. 
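to make the three-class performance metric concrete, the sketch below computes a per-class f score and its unweighted (macro) average. it assumes the f score in the text is the usual harmonic mean of precision and recall; whether the authors averaged across classes in this way is not stated, so the macro average and the function names are illustrative assumptions.

```python
def f1_per_class(y_true, y_pred):
    """Return {class: F1} where F1 = 2*TP / (2*TP + FP + FN) for each class."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that never appears in truth or prediction scores 0 by convention.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Illustrative labels only; not data from the study.
    truth = ["uninfected", "uninfected", "symptomatic", "asymptomatic"]
    predicted = ["uninfected", "symptomatic", "symptomatic", "asymptomatic"]
    print(f1_per_class(truth, predicted))
    print(macro_f1(truth, predicted))
```

because each class contributes equally to the macro average, a model that ignores a small class is penalized, which is why this family of metrics suits uneven class distributions.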
(figures a, s a and s a). (figure s ), but the overlap was poorer for the uninfected/symptomatic/asymptomatic prediction (figure s ). this is consistent with the difficulty of predicting disease severity. in general, the most important species were selected by the model because of differences in relative abundance at baseline among uninfected/symptomatic/asymptomatic outcomes (figure s , s ). in rare cases, species presence/absence information was predictive. for example, ruminococcus gnavus is absent (near or below the limit of detection) in most of the individuals who become infected, but present in many (but not all) of those who remain uninfected (figure s ). thus, there is no single, strong predictor of infection outcomes, but rather a probabilistic combination of many species, each of relatively modest predictive value. table s we also identified gene families in the gut microbiome of persons who remained uninfected during follow-up (figures s and s ), with some of the top gene families involved in dna repair, transmembrane transporter activity, iron metabolism (indicated with asterisks in figure ), and genes of unknown function (table s ). long-chain fatty acid biosynthesis pathways (e.g. cis-vaccenate, gondoate and stearate) were associated with individuals who remained uninfected, while amino acid biosynthesis and catabolic pathways were associated with individuals who developed infection (figures s and s , table s ). we identified three iron-related genes associated with remaining uninfected: ( ) the ferric uptake regulator fur, a major regulator of iron homeostasis, ( ) thioredoxin, a redox protein involved in adaptation to oxidative and iron-deficiency stress, and ( ) the tonb/exbd/tolqr system, a ferric chelate transporter ( - ). 
in individuals who became infected but remained asymptomatic, two genes involved in the conversion of riboflavin into catalytically active cofactors, the riboflavin kinase and the fad synthetase, were found as the first and the third most discriminant features (figure , table s ). we next asked which taxa in the microbiome likely encoded these genes. in some cases, specific taxonomic groups corresponded to discrete gene functions. for example, several iron metabolism-related gene families tend to be encoded by prevotella genomes (figure s ). in other cases, the major contributors to protective gene families were unclassified (figures and s ). [figure legend residue: see table s for pathways. the contributions of each genus to encoding these pathways are shown as stacked colors within each bar, linearly scaled within the total. see table s for the complete list of pathways.] the gut microbiome is a potentially modifiable host risk factor for cholera, and identification of specific genes and strains correlated with susceptibility is needed for experimental testing to understand the mechanisms of observed correlations. compared to a previous study using a single marker gene, shotgun metagenomics provides this degree of resolution, potentially to the species and strain level, and to the level of individual genes and cellular functions. we found that gene families in the gut microbiome at the time of exposure to v. cholerae were more predictive of susceptibility than taxonomic or clinical and demographic information. selecting a subset of the most informative features further improved predictions, but using these selected features may lead to overfitting. this suggests an upper limit to predictive power that requires validation in larger, independent cohorts. all three bifidobacterium species associated with contacts that developed infection were also associated with asymptomatic rather than symptomatic disease, and prior work on this genus supports several hypotheses for this relationship. 
first, bifidobacteria are known to produce the scfa acetate that can protect against enteric infection in mice ( , , ). scfas are also known to inhibit cholera toxin-related chloride secretion in the mouse gut, reducing water and sodium loss, and have been observed to increase cholera toxin-specific antibody responses ( - ). bifidobacteria are also major producers of lactate, a metabolite that has been shown to impair v. the authors declare that there are no conflicts of interest. updated global burden of cholera in endemic countries cholera epidemic in yemen, - : an analysis of surveillance data defining endemic cholera at three levels of spatiotemporal resolution within bangladesh clinical outcomes in household contacts of patients with cholera in bangladesh cholera transmission: the host, pathogen and bacteriophage dynamic susceptibility to vibrio cholerae infection in a cohort of household contacts of patients with cholera in bangladesh roles of the intestinal microbiota in pathogen protection members of the human gut microbiota involved in recovery from vibrio cholerae infection bile salts modulate the mucin-activated type vi secretion system of pandemic vibrio cholerae a single gene of a commensal microbe affects host susceptibility to enteric infection probiotic strains detect and suppress cholera in mice anti-biofilm properties of the fecal probiotic lactobacilli against vibrio spp commensal-derived metabolites govern vibrio cholerae pathogenesis in host intestine gut microbial succession follows acute secretory diarrhea in humans human gut microbiota predicts susceptibility to vibrio cholerae infection metaphlan for enhanced metagenomic taxonomic profiling species-level functional profiling of metagenomes and metatranscriptomes machine learning meta-analysis of large metagenomic datasets: tools and biological insights fillat mf. 
the fur (ferric uptake regulator) superfamily: diversity and versatility of key transcriptional regulators thioredoxin h (trxh) contributes to adversity adaptation and pathogenicity of edwardsiella piscicida tonb-dependent transporters: regulation, structure, and function the prevotella copri complex comprises four distinct clades underrepresented in westernized populations dietary fiber-induced improvement in glucose metabolism is associated with increased abundance of prevotella utilisation of mucin glycans by the human gut symbiont ruminococcus gnavus is strain-dependent mucin glycan foraging in the human gut microbiome regulation of bacterial pathogenesis by intestinal short-chain fatty acids from dietary fiber to host physiology: short-chain fatty acids as key bacterial metabolites formation of propionate and butyrate by the human colonic microbiota. environmental microbiology butyrate protects mice from clostridium difficile-induced colitis through an hif- -dependent mechanism bifidobacteria can protect from enteropathogenic infection through production of acetate potential beneficial effects of butyrate in intestinal and extraintestinal diseases facilitate mucosal adjuvant activity of cholera toxin through gpr . 
the journal of immunology short-chain fatty acids inhibit fluid and electrolyte loss induced by cholera toxin in proximal colon of rabbit in vivo overview on the bacterial iron- cholera toxin promotes pathogen acquisition of host- derived nutrients transcriptomics reveals a cross-modulatory effect between riboflavin and iron and outlines responses to riboflavin biosynthesis and uptake in vibrio cholerae the human gut microbiome: from association to modulation supplementary tables s -s are available at: https://figshare.com/articles/supplementary_tables_-_levade_et_al_ / key: cord- -t r ny authors: nguyen-tu, marie-sophie; martinez-sanchez, aida; leclerc, isabelle; rutter, guy a.; da silva xavier, gabriela title: reduced expression of tcf l in adipocyte impairs glucose tolerance associated with decreased insulin secretion, incretins levels and lipid metabolism dysregulation in male mice date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t r ny transcription factor -like (tcf l ) is a downstream effector of the wnt/beta-catenin signalling pathway and its expression is critical for adipocyte development. the precise role of tcf l in glucose and lipid metabolism in adult adipocytes remains to be defined. here, we aim to investigate how changes in tcf l expression in mature adipocytes affect glucose homeostasis. tcf l was selectively ablated from mature adipocytes in c bl/ j mice using an adiponectin promoter-driven cre recombinase to recombine alleles floxed at exon of the tcf l gene. mice lacking tcf l in mature adipocytes displayed normal body weight. male mice exhibited normal glucose homeostasis at eight weeks of age. male heterozygote knockout mice (atcf l het) exhibited impaired glucose tolerance (auc increased . ± . -fold, p= . ), as assessed by intraperitoneal glucose tolerance test, and changes in fat mass at weeks (increased by . ± . -fold, p= . ). 
homozygote knockout mice exhibited impaired oral glucose tolerance at weeks of age (auc increased . ± . -fold, p= . ). islets of langerhans exhibited impaired glucose-stimulated insulin secretion in vitro (decreased . ± . -fold atcf l ko vs control, p= . ), but no changes in in vivo glucose-stimulated insulin secretion. female mice in which one or two alleles of the tcf l gene was knocked out in adipocytes displayed no changes in glucose tolerance, insulin sensitivity or insulin secretion. plasma levels of glucagon-like peptide- and gastric inhibitory polypeptide were lowered in knockout mice (decreased . ± . -fold and . ± . -fold, p= . and p= . , respectively), whilst plasma free fatty acids and fatty acid binding protein circulating levels were increased by . ± . and . ± . -fold, respectively (p= . and p= . ). mice with biallelic tcf l deletion exposed to high fat diet for weeks exhibited impaired glucose tolerance (p= . at min after glucose injection) which was associated with reduced in vivo glucose-stimulated insulin secretion (decreased . ± . -fold, p= . ). thus, our data indicate that loss of tcf l gene expression in adipocytes leads to impairments on metabolic responses which are dependent on gender, age and nutritional status. our findings further illuminate the role of tcf l in the maintenance of glucose homeostasis. transcription factor -like (tcf l ) is a member of the high mobility group box family of transcription factors, a downstream effector of the wnt/beta-catenin signalling pathway, and a key regulator of development and cell growth [ ] . tcf l function is important for the proper function of tissues involved in the regulation of energy homeostasis. for example, tcf l is required for the maintenance of functional pancreatic beta cell mass and insulin release from the endocrine pancreas [ , ] . 
in the liver, ablation of tcf l expression from hepatocytes has variously been shown to lead to reduced hepatic glucose production and improved glucose homeostasis [ ] , or hyperglycemia [ ] . studies by macdougald and colleagues [ ] [ ] [ ] have demonstrated that tcf l is involved in the regulation of the expression of proadipogenic genes during adipocyte development. additionally, insulin/insulin receptor substrate- and insulin growth factor have been shown to cross-talk with the wnt signalling pathway to regulate insulin sensitivity in preadipocytes [ , ] . the presence of tcf l binding sites on the promoter of the insulin receptor gene also suggests a role of tcf l /wnt signalling pathway in regulating insulin action in adipocytes [ ] . genome-wide association studies (gwas) have identified single nucleotide polymorphisms (snps; rs and rs ) in the tcf l gene as being amongst the most strongly associated with an increased risk of type diabetes [ ] . humans carrying the t risk allele at rs have elevated proinsulin levels, lowered first-phase insulin secretion and impaired responses to the incretin hormone glucagon-like peptide (glp- ) [ ] [ ] [ ] [ ] [ ] . whilst tcf l variants appear chiefly to affect pancreatic beta cell function in man, the mechanisms driving impaired insulin secretion are still poorly defined [ , ] . although rs in tcf l is not associated with changes in overall tcf l transcription (i.e. of all isoforms), with conflicting data regarding the association of rs with specific tcf l variant expression in subcutaneous fat [ ] [ ] [ ] , tcf l expression has been shown to be reduced in adipose tissue from type diabetes subjects [ ] and in obese mice [ ] . additionally, surgery-induced weight loss has been shown to regulate alternative splicing of tcf l in adipose tissue [ ] . 
tcf l splice variant expression is regulated by plasma triglycerides and free fatty acids [ , ] , with evidence indicating that acute intake of fat leads to reduced expression of tcf l in human adipocytes [ ] . thus, overall these data suggest that changes in tcf l expression may be linked to adaptation to changes in fuel intake. efforts to understand the causal relationship between tcf l and diabetes led to studies on several metabolic tissues, with the data suggesting that the combined effects of loss of tcf l in multiple tissues may account for the diabetes phenotype [ ] . for example, previous reports have described the impact of loss of tcf l expression on adipocyte development and insulin sensitivity [ , ] . in the present study, we used a genetic model of ablation of tcf l gene expression in mature adipocytes to determine whether tcf l plays a role in adipocyte function, independent of its role in adipogenesis. we have focused on whether loss of tcf l expression in the adipocyte impacts the release of hormones involved in glucose homeostasis, notably insulin and incretins. in this way, we sought to explore the possibility that altered tcf l expression in the adipocyte may contribute to type diabetes risk through downstream effects on multiple effector organs [ , ] . to generate tissue-specific knockout of tcf l alleles, we crossed mice in which exon (encoding for the beta-catenin-binding domain) was flanked by loxp sites [ ] with mice expressing cre recombinase under the control of the adiponectin promoter [ ] (a kind gift from d. withers, imperial college london) to produce deletion of one tcf l allele (atcf l het) or two alleles (atcf l ko). littermates used as controls did not express cre recombinase but were homozygous or heterozygous for the floxed tcf l allele. 
adiponectin-cre [ ] or mice with the tcf l gene flanked by loxp sites (tcf l -floxed) [ ] did not display phenotypes that deviate from wild-type littermate control mice; consequently, we used tcf l -floxed mice as controls in our test cohorts. mice were born at the expected mendelian ratios with no apparent abnormalities. animals were housed - per individually ventilated cage in a pathogen-free facility with a : light:dark cycle and free access to standard mouse chow (rm- ; special diet services) and water. the high fat diet (hfd) cohort was placed on a high-sucrose, high-fat diet (d ; research diets) for weeks from weeks of age. for the chow diet cohort, metabolic exploration was performed on each animal within a -week window at two stages ( -week-old and -week-old). all in vivo procedures described were performed at the imperial college central biomedical service and approved by the uk home office animals scientific procedures act, (ho licence ppl / to gdsx). glucose tolerance tests were performed on h-fasted mice after an oral gavage of glucose (ogtt, g/kg of body weight) or intraperitoneal injection of glucose (ipgtt, g/kg body weight). ipgtt and ogtt were performed at two stages (at -week-old and at -week-old) for each individual mouse. insulin tolerance tests were performed after a h-fast with an intraperitoneal injection of insulin (ipitt, . u/kg in females, . u/kg in males under chow diet, . u/kg in males under hfd). in vivo glucose-stimulated insulin secretion was assessed after oral or intraperitoneal administration of glucose, and blood was collected at - and -minutes post-injection to assess plasma insulin levels using an ultrasensitive mouse insulin elisa kit (crystal chem, netherlands) or using a homogeneous time resolved fluorescence (htrf) insulin kit (cisbio, france) in a pherastar reader (bmg labtech, uk). 
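glucose tolerance in this study is summarized as an area under the curve (auc) and reported as a fold change relative to controls. the sketch below shows one common way to obtain such a value, assuming the trapezoidal rule over the sampled time points; the text does not state how the auc was computed, and the function names and example readings are hypothetical.

```python
def glucose_auc(times_min, glucose_mmol_l):
    """Trapezoidal area under a glucose curve (units: mmol/l x min)."""
    if len(times_min) != len(glucose_mmol_l) or len(times_min) < 2:
        raise ValueError("need at least two matched time/glucose points")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        # Each segment contributes its width times the mean of its two readings.
        area += dt * (glucose_mmol_l[i] + glucose_mmol_l[i - 1]) / 2.0
    return area


def fold_change(group_auc, control_auc):
    """Ratio of a group's AUC to the control AUC (1.0 means no difference)."""
    return group_auc / control_auc


if __name__ == "__main__":
    # Made-up readings for illustration; not measurements from the study.
    times = [0, 15, 30, 60]
    control = [5.0, 10.0, 10.0, 5.0]
    knockout = [5.0, 13.0, 12.0, 7.0]
    print(fold_change(glucose_auc(times, knockout), glucose_auc(times, control)))
```

some groups subtract the fasting baseline before integrating (incremental auc); that variant is not shown here.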
to assess insulin sensitivity, male mice were fasted for four hours and, following an intraperitoneal injection of insulin ( iu/kg body weight), adipose tissues were collected and frozen in liquid nitrogen. adipose tissue proteins were extracted in lysis buffer ( mmol/l nacl, mmol/l tris-hcl ph . , % np- ) supplemented with protease inhibitors (roche, germany) and phosphatase inhibitors (sigma-aldrich, uk) and analysed by western blotting using antibodies for tcf /tcf l (c h ) (# , : , cell signalling, neb, uk), phospho-akt (# , : , cell signalling, neb, uk), total-akt (# , : , cell signalling, neb, uk), gapdh (# , : , cell signalling, neb, uk), alpha-tubulin (t , : , sigma-aldrich, uk). fiji software was used for densitometry quantification. uncut versions of all western blot images are presented in supplemental figure s . islets were isolated by digestion with collagenase as described [ ] . in brief, pancreata were inflated with a solution of collagenase from clostridium histolyticum ( mg/ml; nordmark, germany) and placed in a water bath at °c for min. islets were washed and purified on a histopaque gradient (sigma-aldrich, uk). isolated islets were cultured for h in rpmi containing . mmol/l glucose, % foetal bovine serum and l-glutamine (sigma-aldrich, uk) and allowed to recover overnight. insulin secretion assays on isolated mouse islets were performed as previously described [ ] . measurement of intracellular calcium dynamics was performed as previously described [ ] . in brief, whole isolated islets were incubated with fura- am (invitrogen, uk) [ ] for min at °c in khb containing mmol/l glucose. fluorescence imaging was performed using a nipkow spinning disk head, allowing rapid scanning of islet areas for prolonged periods of time with minimal phototoxicity. volocity software (perkinelmer life sciences, uk) provided the user interface while islets were kept at °c and constantly perifused with khb containing mmol/l or mmol/l glucose or mmol/l kcl. 
epididymal adipose tissue was removed from euthanized mice, fixed overnight in % formalin and subsequently embedded in paraffin wax. adipose tissue slices ( μm) were stained with hematoxylin and eosin (sigma-aldrich, uk) for morphological analysis. blood in the fed state was obtained by tail bleeding, and circulating factor concentrations were measured using the following kits according to the respective manufacturer's protocols: bio-plex protein array system (biorad, uk) with one multiplex panel was used to measure total glucagon-like peptide- (glp- ), glucose-dependent insulinotropic polypeptide (gip), leptin, adiponectin and plasminogen activator inhibitor- (pai- ); one multiplex panel was used to measure fatty acid binding protein (fabp ) and resistin (r&d systems, uk); non-esterified fatty acid (nefa) serum levels were measured by colorimetric assay (randox, uk) and dipeptidyl peptidase (dpp ; r&d systems, uk) levels were measured by elisa. rna was isolated from epididymal and subcutaneous adipose tissue, liver and pancreatic islets with trizol following the manufacturer's instructions (invitrogen, uk). rna purity and concentration were measured by spectrophotometry (nanodrop, thermo scientific, uk). only rna with absorption ratios between . - . for / and / nm was used. rna integrity was checked on an agarose gel. rna was reverse transcribed using the high-capacity cdna reverse transcription kit (applied biosystems, uk). qpcr was performed with fast sybr green master mix (applied biosystems, uk). the comparative ct method (2^-ΔΔct) was used to calculate relative gene expression levels using gapdh, β-actin or ppia as an internal control. the primer sequences are listed in supplemental table s . data are shown as means ± sem. graphpad prism . was used for statistical analysis. 
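the comparative ct calculation used for the qpcr data can be sketched as follows. this is the textbook form of the method, which assumes roughly 100% amplification efficiency for both target and reference genes; the function and argument names are hypothetical and the ct values in the usage example are invented for illustration.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Relative expression by the comparative Ct method, 2 ** -(ddCt).

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    # One extra cycle to reach threshold corresponds to half the starting template.
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Invented Ct values: the target crosses threshold one cycle later in the
    # sample than in the control, relative to the same reference gene.
    print(relative_expression(26.0, 18.0, 25.0, 18.0))
```

a value below 1 indicates lower expression in the sample than in the control, and a value of 1 indicates no change.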
statistical significance was evaluated by the two-tailed unpaired student t-test and one- or two-way anova, with tukey or bonferroni multiple comparisons post-hoc test as indicated in the figure legends. p values of < . were considered statistically significant. to determine the role of tcf l in the mature adipocyte, we generated a mouse line in which tcf l is deleted specifically in these cells through the expression of cre recombinase under the control of the adiponectin promoter [ ] . cre recombinase mediated the excision of exon of tcf l , generating a single tcf l allele deletion (atcf l het) or a biallelic tcf l deletion (atcf l ko). in atcf l het mice, tcf l mrna levels were decreased in inguinal adipose tissue (iwat) by . ± . % (p= . ) compared to controls, while in epididymal adipose tissue (ewat) a reduction of . ± . % did not reach statistical significance (p= . ). in atcf l ko mice, expression was reduced by . ± . % and . ± . % in ewat and iwat, respectively, compared to controls. conversely, no changes in tcf l expression were apparent in liver and pancreatic islets from atcf l het and atcf l ko mice vs littermate controls (fig. a). correspondingly, the content of the two tcf l protein isoforms ( kda and kda) in ewat was significantly reduced by ± % and ± % respectively in atcf l het mice and reduced by ± % and ± % in atcf l ko mice (fig. b and c). body weight was unchanged in male ( likewise, we found no change in fat or lean mass in -week-old male or female mice, as assessed by echomri (fig. g, i, k and m). however, fat mass was significantly increased, and lean mass was decreased, in male atcf l het mice on normal chow diet (fig. h and j). female mice showed no change in body composition with age (fig. l and n). to explore the effects of adipocyte-specific tcf l ablation on whole body metabolism, we measured glucose tolerance, and found an age-dependent impairment in the response to glucose administration. 
glucose challenge was assessed in male and female mice at a young stage ( weeks of age) and at an older stage ( weeks of age). in younger mice, blood glucose levels after intraperitoneal injection of glucose were similar in atcf l het and atcf l ko compared to sex-matched littermate controls regardless of the gender ( fig. a and fig. a ). in older mice, glucose levels were impaired in male atcf l het mice rising to . ± . mmol/l at minutes after intraperitoneal injection of glucose compared to littermate controls rising to . ± . mmol/l (fig. b) . likewise, atcf l het and young atcf l ko mice had similar increases in glucose levels minutes after oral administration of glucose ( . ± . mmol/l and . ± . mmol/l respectively) compared to littermate controls ( . ± . mmol/l; fig. c) . surprisingly, glucose tolerance (as assessed by oral glucose tolerance test) was impaired in older atcf l ko mice ( weeks) compared to littermate controls (fig. d) . female mice exhibited no change in glucose tolerance regardless of age and genotype (fig. a, fig. b and fig. c ). whole body insulin sensitivity was unaffected in both genders across all genotypes ( fig. e and fig. d ). in order to assess whether ablation of tcf l in mature adipocyte affects organs involved in glycaemic control, beta cell secretory capacity was measured during glucose challenge. we sought first to examine whether impaired intraperitoneal and oral glucose challenge in older male atcf l het and atcf l ko mice was due to an impact on insulin secretion. fasting plasma insulin levels and in vivo glucose-stimulated insulin release were similar in atcf l het and atcf l ko mice vs littermate controls after intraperitoneal injection of glucose (fig. f) or after oral administration of glucose (fig. g) . glucose stimulated insulin secretion in isolated islets was decreased after high ( mm) glucose incubation ( . ± . % in atcf l ko vs . ± . 
% in controls) while responses to kcl ( mm) were not different in islets from atcf l ko compared to islets from littermate controls (fig. h). when we examined the expression of key genes associated with normal beta cell function after tcf l loss, we observed no significant changes in the expression of the insulin (ins , ins ) or glucagon (gcg) genes in islets from any genotype; however, a reduction in the expression of the glucose transporter (glut /slc a ) gene was observed in atcf l ko mice ( . ± . in atcf l ko vs . ± . in controls; fig. i ). to further evaluate the origins of the secretory defects observed in isolated islets (fig. h), we measured the changes in cytosolic calcium in response to incubation with varying concentrations of glucose ( mm, mm) in the presence or absence of kcl ( mm) on isolated islets (fig. j). islets from atcf l ko male mice showed a diminished response to high glucose incubation compared to atcf l het animals, while differences compared to control mice did not reach statistical significance. islets from atcf l het male mice showed an increased response to kcl compared to controls (fig. j). glucose-stimulated insulin secretion, both in vivo after intraperitoneal injection and in vitro, was unchanged in female atcf l het and atcf l ko mice (fig. e and f). therefore, alterations in pancreatic beta cell function observed ex vivo in the absence of tcf l in adipocytes have no impact on whole body glucose-stimulated insulin secretion. to investigate the causes of impaired oral glucose tolerance in male atcf l ko mice, we measured the circulating levels of other hormones regulating glucose metabolism. circulating glp- and gip levels in plasma were decreased in older male atcf l ko mice compared to age- and sex-matched littermate control mice (glp- : . ± . vs . ± . ng/ml; gip: . ± . vs . ± . ng/ml respectively; fig. a and fig. b ). we observed a significant decrease in gip levels ( . ± . 
vs . ± . ng/ml in controls), but no statistically significant difference in glp- levels in atcf l het mice (fig. a and b). plasma dpp levels in male atcf l ko mice were not different compared to controls (fig. c ). evidence suggests a role of wnt/tcf l signalling in the control of lipid metabolism [ , ]. we, therefore, sought to determine whether tcf l could play a role as a regulator of fatty acid release from mature adipocytes. plasma levels of circulating nefa and the lipid carrier fabp were both increased in atcf l ko mice compared to age- and sex-matched littermate controls (nefa: . ± . vs . ± . mmol/l, respectively; fabp : . ± . ng/ml vs . ± . ng/ml, respectively; fig. d and e). finally, we investigated endocrine adipocyte function by measuring adipokines which are usually found to be affected in insulin resistance and metabolic diseases. plasma levels of adiponectin, leptin, resistin and pai- were found to be unchanged in male mice of all genotypes (fig. f , g and supplemental fig.s a and b). we next explored whether loss of tcf l expression may affect insulin signalling in adipocytes, since previous reports [ , ] described hepatic insulin resistance in a mouse model of ablation of tcf l . pkb/akt ser phosphorylation was elevated at basal condition prior to insulin stimulation in older ( weeks) atcf l male mice (fig. h and i), but pkb/akt ser phosphorylation in response to insulin was similar in adipocytes from atcf l ko mice and control littermates. however, relative expression of phosphorylated akt after insulin stimulation compared to basal appeared decreased in atcf l ko mice compared to controls, but this difference did not reach statistical significance (p= . ; fig. j ). indicative of unaltered hepatic insulin sensitivity, liver levels of the gluconeogenic genes g pase and pepck (pck ) did not differ between control and atcf l ko mice (fig. j). 
therefore, our results suggest that tcf l could control lipolysis and lipid metabolism in adipocytes. in order to investigate whether nutritional status had an impact on glucose metabolism in the absence of tcf l expression in adipocytes, we maintained atcf l ko male mice on a high-fat ( %) diet for weeks. we focused on males as no changes were observed in female mice on chow diet. body weight trajectory showed a similar increase in atcf l ko compared to control mice (fig. a). glucose tolerance was altered after intraperitoneal injection of a high dose ( g/kg) of glucose, with the blood glucose peak delayed by minutes compared to controls ( . ± . vs . ± . mmol/l respectively; fig. b and c). no significant change in oral glucose tolerance (fig. d and e), or insulin sensitivity (fig. f), was observed between mice of all genotypes. insulin secretion was impaired during in vivo oral glucose challenge in atcf l ko compared to littermate controls (at minutes, . ± . vs . ± . ng/ml respectively; fig. h ). following intraperitoneal injection of glucose, no changes reached statistical significance (fig. g). ex vivo insulin release in response to high glucose ( mm), glp- ( nm) and kcl ( mm) did not differ between islets of langerhans isolated from atcf l ko mice and littermate control mice (fig. i). in the present study, we demonstrate that changes in the expression of tcf l in murine adult adipose tissue may lead to alterations not only in adipocyte function but also in the function of other tissues involved in the regulation of energy homeostasis in a gender-, age-, and nutritional status-dependent manner. thus, we provide evidence that deletion of tcf l in adipocytes leads to alterations in the function of adipocytes, pancreatic islet beta cells, and enteroendocrine cells, thereby highlighting a role for tcf l in systemic glucose homeostasis. 
wnt signalling and its effectors beta-catenin and tcf l are critical during adipogenesis [ , , ]. the presence of this signalling module during adulthood suggests that it may also be important for the function of adult adipocytes. in the present study, we found that young mice presented no alteration of glucose tolerance or body composition, regardless of gender, in the absence of tcf l in adipocytes, whilst defects appeared with age (fig. g and fig. a). recently, tian et al. revealed crosstalk between wnt signalling and female hormones through tcf l to regulate lipid metabolism [ ]. thus, gender differences observed when expressing a dominant-negative form of tcf l in hepatocytes would suggest that sex hormones regulate glucose homeostasis by repressing hepatic gluconeogenesis and also regulate lipid metabolism [ ]. in our study, we also found that female mice were largely unaffected by the loss of tcf l function in the adipocyte, suggesting a role for female hormones in maintaining glucose homeostasis. the key effector of the wnt signalling pathway is the association of beta-catenin and a member of the tcf family which may include tcf l , tcf , tcf l and lef- [ ]. the availability of free beta-catenin entering the nucleus to bind tcf l is crucial for activation of expression of downstream target genes. however, the regulation of tcf l expression is also essential. high-fat feeding modulates tcf l expression in pancreatic islets, in hepatocytes and in human adipocytes [ , , ]. furthermore, studies have suggested that variation in tcf l expression alters glucose metabolism and induces a type diabetes phenotype [ ]. we found that deletion of a single tcf l allele generates distinct features of obesity-induced glucose intolerance while biallelic tcf l deletion reveals a disruption of endocrine signalling molecules. therefore, alterations induced by deletion of tcf l in mature adipocytes could depend on age and on the dosage of tcf l expression. 
we found that chow-fed mice lacking tcf l selectively in adipocytes displayed an impaired response to oral glucose challenge (fig. d) but normal tolerance to intraperitoneal injection of the sugar (fig. b). this suggests an impaired incretin effect, defined as the postprandial insulin response provoked by incretin hormones such as glp- and gip. however, insulin release in response to an oral glucose challenge was maintained (fig. d), whilst glucose-stimulated insulin secretion ex vivo from isolated islets of langerhans was impaired (fig. e). our data therefore suggest that a mechanism exists to maintain insulin release in vivo after deletion of tcf l from adipocytes. however, when challenged with high fat feeding, impaired glucose tolerance was associated with impaired insulin secretion (fig. b and h). future studies will need to assess further the effects of high-fat diet feeding on beta cell function in a larger cohort, as these explorations were suspended during the covid- pandemic. a striking finding in the present study is that tcf l is required in adipose tissue for normal incretin production and insulin secretion: we reveal that decreased tcf l expression in mature adipocytes leads to lowered circulating levels of glp- and gip (fig. a and b). this suggests that a compensatory effect is unmasked in atcf l ko mice to stimulate insulin secretion in pancreatic beta cells when the incretin effect is compromised. one possible explanation to reconcile our findings on in vivo and ex vivo glucose-stimulated insulin secretion is that elevated fatty acid levels compensate in part for the lowered levels of circulating incretins, acting to amplify insulin release through the action of fatty acid receptors. nefa and specifically long-chain fatty acids potentiate glucose-stimulated insulin secretion [ , ]. 
moreover, fabp , a cytosolic lipid chaperone expressed and secreted by white and brown adipocytes, may exert a direct insulinotropic action on pancreatic beta cells, as demonstrated in previous studies showing that recombinant fabp administration enhanced glucose-stimulated insulin secretion in vitro and in vivo [ , ]. in a small type diabetes cohort, increased serum fabp was correlated with enhanced insulin release [ ]. taking these findings together, lowered circulating incretins may have provoked impaired glucose tolerance in atcf l ko mice with no apparent change in the capacity of beta cells to release insulin in vivo, the latter being compensated by the stimulation of glucose-stimulated insulin secretion by higher circulating fabp and nefa. nevertheless, there is a need to further investigate the mechanisms that may link impaired glucose tolerance with normal insulin secretion. these might include mechanisms such as reduced insulin-independent glucose disposal, or insulin resistance-induced altered glut trafficking in enterocytes [ ]. by what mechanisms might depletion of tcf l from adipocytes lead to a decrease in the circulating levels of glp- and gip? our data suggest that decreased incretin levels are unlikely to be due to an increase in the rate of degradation of these hormones in the bloodstream, as no change was found in circulating dpp in atcf l ko mice (fig. c). we observed elevated circulating plasma fatty acids and fabp levels (fig. d and e), reflecting altered adipocyte metabolism in the absence of tcf l . wnt and tcf l are regulators of lipid metabolism in hepatocytes [ ], consistent with elevated plasma triglycerides associated with tcf l risk variant rs [ ]. moreover, the genetic variant is localized near the regulatory region for acyl-coa synthetase long chain (acsl ) which activates fatty acids to generate long chain fatty acyl coa [ ]. 
consistent with our model, martchenko and colleagues [ ] have recently reported fatty acid-induced lowering of circadian release of glp- from l-cells as a result of decreased bmal expression. similarly, filipello et al. [ ] reported decreased insulin-dependent glp- secretion from l-cell-derived glutag cells, and increased glucagon release, upon fatty acid treatment. similar findings on glp- secretion were reported by others [ , ], whilst long chain saturated (palmitate) but not unsaturated (oleate) fatty acids lead to l-cell apoptosis [ ]. on the other hand, activation of free fatty acid receptors with ffar /gpr agonist tak- [ ] or with short-chain fatty acids (ffar /gpr ) [ ] acutely increased glp- secretion from l-cells, indicating that a balance between shorter-term and more chronic "lipotoxic" effects is likely to govern overall glp- production, with the latter predominating after tcf l deletion in adipocytes. to explore further the direct role of tcf l in lipid metabolism, future studies will be necessary to assess lipolysis and insulin action in adipocytes lacking tcf l . in this study, high fat diet feeding altered glucose tolerance and insulin secretion. some discrepancies emerge between our report and previous studies using the same genetic model. when examining the effects of conditional deletion of tcf l in mature adipocytes, chen et al. [ ] found impairments in glucose tolerance after intraperitoneal injection of glucose in -month-old males and females under standard diet associated with hepatic insulin resistance. geoghegan et al. [ ] also generated a conditional knockout of tcf l in the adipocyte, reporting that knockout animals maintained on regular chow displayed no change in intraperitoneal glucose tolerance, whilst exaggerated insulin resistance and impaired glucose tolerance were apparent after high fat feeding [ ]. 
the authors demonstrated a role for tcf l in regulating lipid metabolism, finding a reduction in lipid accumulation and lipolysis during high fat feeding. we note that slightly different genetic strategies were used to create the mouse models in each case. thus, both our own and the earlier studies used a similar loxp strategy with a cre recombinase under the control of the adiponectin promoter, but we have deleted exon , whereas chen et al. targeted exon and geoghegan et al. targeted exon [ , ]. the tcf l gene consists of exons [ ] and tissue-specific splicing variants could exert distinct tissue-dependent functions and impact differently on specific cell types such as the adipocyte or the beta cell [ ]. alternative splicing of tcf l potentially results in transcripts lacking exons and predicted to encode proteins lacking the β-catenin-binding domain [ ]. might changes in tcf l expression in adipose tissue contribute to the effects of type diabetes-associated variants? whilst inspection of data in the gtex database [ ] does not reveal any genotype-driven alteration in subcutaneous or breast adipose tissue tcf l expression for rs , we note that other intronic variants in this gene, such as rs , reveal significant expression quantitative trait loci for tcf l , and may conceivably influence incretin and insulin levels through the mechanisms identified here. in summary, tcf l could play a post-developmental role in metabolic tissues depending on its level of expression and the age of the mice. our findings provide an unexpected insight into the action of tcf l and reveal a novel mechanism through which adipocytes may impact insulin and incretin secretion. in demonstrating an action on multiple tissues, we also reveal a new level of complexity in the effects of gene action at the systems level on disease risk. 
such findings may highlight the importance of early interventions with incretins when tcf l expression is compromised and therefore the development of novel personalized therapies. m-sn-t co-designed the study, collected, analysed, interpreted the data and drafted the manuscript. gdsx conceived and co-designed the study, collected, interpreted the data and substantially critically revised the manuscript. gar conceived the study and co-wrote the manuscript. am-s contributed to the collection of data and critically revised the manuscript for important intellectual content. il contributed for resources. all authors gave final approval of the manuscript and gave consent to publication. gar is the guarantor of this work. awards, mrc programme grants (mr/r / , mr/j / , mr/l / ) an mrc experimental challenge grant (diva, mr/l x/ ), mrc (mr/n x/ ), diabetes uk (bda/ / this project has received funding from the european association for the study of diabetes, and university of birmingham to gdsx, european union's horizon research and innovation programme via the innovative medicines initiative joint undertaking under grant agreement no (rhapsody) to gar. this joint undertaking receives support from the european union's horizon research and innovation programme and efpia. 
duality of interest gar has received grant funding from the wnt signaling pathway effector tcf l and type diabetes mellitus abnormal glucose tolerance and insulin secretion in pancreas-specific tcf l -null mice selective disruption of tcf l in the pancreatic beta cell impairs secretory function and lowers beta cell mass diabetes risk gene and wnt effector tcf l /tcf controls hepatic response to perinatal and adult metabolic demand tcf l modulates glucose homeostasis by regulating creb-and foxo -dependent transcriptional pathway in the liver inhibition of adipogenesis by wnt signaling wnt b inhibits development of white and brown adipose tissues wnt signaling inhibits adipogenesis through beta-catenin-dependent and -independent mechanisms cross-talk between insulin and wnt signaling in preadipocytes: role of wnt co-receptor low density lipoprotein receptor-related protein- (lrp ) wnt signaling regulates mitochondrial physiology and insulin sensitivity lrp enhances glucose metabolism by promoting tcf l -dependent insulin receptor expression and igf receptor stabilization in humans variant of transcription factor -like (tcf l ) gene confers risk of type diabetes genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type diabetes tcf l polymorphisms modulate proinsulin levels and beta-cell function in a british europid population mechanisms by which common variants in the tcf l gene increase risk of type diabetes tcf l variant rs affects the risk of type diabetes by modulating incretin action the t allele of rs tcf l is associated with impaired insulinotropic action of incretin hormones, reduced h profiles of plasma insulin and glucagon, and increased hepatic glucose production in young healthy men tcf l rs impairs islet function and morphology in non-diabetic individuals human pancreatic islet three-dimensional chromatin architecture provides insights into the genetics of 
type diabetes transcription factor -like polymorphisms and type diabetes, glucose homeostasis traits and gene expression in us participants of european and african descent genotype and tissue-specific effects on alternative splicing of the transcription factor -like gene in humans tissue-specific alternative splicing of tcf l transcription factor tcf l genetic study in the french population: expression in human beta-cells and adipose tissue and strong association with type diabetes targeted deletion of tcf l in adipocytes promotes adipocyte hypertrophy and impaired glucose metabolism adipose tissue tcf l splicing is regulated by weight loss and associates with glucose and fatty acid metabolism tcf l is associated with high serum triacylglycerol and differentially expressed in adipose tissue in families with familial combined hyperlipidaemia tcf l expression is regulated by cell differentiation and overfeeding in human adipose tissue tcf l and glucose metabolism: time to look beyond the pancreas the diabetes gene and wnt pathway effector tcf l regulates adipocyte development and function tcf l and diabetes: a tale of two tissues, and of two species transcriptional control of adipose lipid handling by irf isolation and culture of mouse pancreatic islets for ex vivo imaging studies with trappable or recombinant fluorescent probes the transcription factor pax is required for pancreatic beta cell identity, glucose-regulated atp synthesis, and ca( +) dynamics in adult mice lipotoxicity disrupts incretinregulated human beta cell connectivity the developmental wnt signaling pathway effector beta-catenin/tcf mediates hepatic functions of the sex hormone estradiol in regulating lipid metabolism insulin treatment and high-fat diet feeding reduces the expression of three tcf genes in rodent pancreas the wnt signaling pathway effector tcf l is upregulated by insulin and represses hepatic gluconeogenesis alterations in tcf l expression define its role as a key regulator of 
glucose metabolism gpr is necessary but not sufficient for fatty acid stimulation of insulin secretion in vivo endogenous fatty acids are essential signaling factors of pancreatic beta-cells and insulin secretion identification of fatty acid binding protein as an adipokine that regulates insulin secretion during obesity circulating adipocyte fatty acid-binding protein induces insulin resistance in mice in vivo serum fatty acid-binding protein (fabp ) concentration is associated with insulin resistance in peripheral tissues, a clinical study insulin internalizes glut in the enterocytes of healthy but not insulin-resistant mice the type diabetes presumed causal variant within tcf l resides in an element that controls the expression of acsl suppression of circadian secretion of glucagon-like peptide- by the saturated fatty acid, palmitate chronic exposure to palmitate impairs insulin signaling in an intestinal l-cell line: a possible shift from glp- to glucagon production differential molecular and cellular responses of glp- secreting l-cells and pancreatic alpha cells to glucotoxicity and lipotoxicity glucagon-like peptide- production in the glutag cell line is impaired by free fatty acids via endoplasmic reticulum stress mechanistic insights into the detection of free fatty and bile acids by ileal glucagon-like peptide- secreting cells vascular, but not luminal, activation of ffar (gpr ) stimulates glp- secretion from isolated perfused rat small intestine short-chain fatty acids stimulate glucagon-like peptide- secretion via the g-protein-coupled receptor ffar the human t-cell factor- gene splicing isoforms, wnt signal pathway, and apoptosis in renal cell carcinoma tcf l splice variants have distinct effects on beta-cell turnover and function the genotype-tissue expression (gtex) project blood glucose levels during ogtt in male mice after weeks of hfd (n= control mice, n= atcf l ko mice). 
(f) blood glucose levels during ipitt in male mice after weeks of hfd (n= mice/genotype). (g) insulin plasma levels after intraperitoneal injection of glucose ( g/kg) in male mice following weeks of hfd (n= control mice, n= atcf l ko mice). (h) insulin plasma levels after oral administration of glucose ( g/kg) in male mice following weeks of hfd (n= control mice, n= atcf l ko mice). data were analysed by unpaired student's t-test, p*< . versus control. (i) insulin secretion on isolated islets from male mice after weeks of hfd during static incubation of mmol/l glucose ( g), mmol/l glucose ( g), a combination of mmol/l glucose and nmol/l glp- ( g+glp- ) and mol/l kcl, (n= mice/genotype). (j) proposed mechanism: decreased tcf l expression in adipocyte provoked increased circulating levels of lipids and subsequently, decreased incretins and glucose-stimulated insulin secretion (gsis) were observed in vivo the authors would like to thank dominic withers (mrc london institute of medical sciences (lms) at imperial college london) for the adipoq-cre mouse, lorraine lawrence (research histology facility at imperial college london) and stephen rothery (facility for imaging by light microscopy film at imperial college london) for technical assistance. key: cord- -nil vv h authors: alfano, niccolo; dayaram, anisha; axtner, jan; tsangaras, kyriakos; kampmann, marie-louise; mohamed, azlan; wong, seth t.; gilbert, m. thomas p.; wilting, andreas; greenwood, alex d. title: non-invasive surveys of mammalian viruses using environmental dna date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nil vv h environmental dna (edna) and its subdiscipline, invertebrate-derived dna (idna) have been used to survey biodiversity non-invasively [ , ]. water is ubiquitous in most ecosystems, and, among invertebrates, terrestrial haematophagous leeches are abundant and can be easily collected in many tropical rainforests [ , ]. 
such non-invasive nucleic acid sources can mitigate difficulties of obtaining wildlife samples, particularly in remote areas or for rare species. recently, edna/idna sources have been applied to monitoring specific wildlife pathogens [ , ]. however, previous studies have focused on known pathogens, whereas most wildlife pathogens are uncharacterized and unknown. non-invasive approaches to monitoring known and novel pathogens may be of particular benefit in ecosystems prone to viral emergence, many of which occur in areas where invasive sampling is challenging, such as tropical rainforests. here, we show that both edna from natural waterholes, and idna from terrestrial haematophagous leeches, can be used to detect unknown viruses circulating in mammalian hosts (figure ). using a curated set of rna oligonucleotides based on the virochip microarray assay [ ] as baits in a hybridization capture system, multiple mammalian rna and dna viruses were detected from both edna and idna samples. congruence was found between host dna assignment and viruses identified in leeches, and between animals observed visiting the waterholes and the viruses detected. our results demonstrate that edna/idna samples may represent an effective non-invasive resource for studying wildlife viral diversity. several of the detected viruses were novel, highlighting the potential of edna/idna for epidemiological analysis of emerging viruses prior to their emergence. highlights: environmental dna (water and blood-sucking leeches) provided a non-invasive method of screening wildlife for viruses; a comprehensive viral rna oligonucleotide bait set was developed to capture known and unknown mammalian virus diversity; leech blood meal host determination and viruses identified were congruent; viruses determined from water correlated with known and observed species visiting the water sources. in brief: alfano, dayaram, et al. 
demonstrate that environmental dna from southeast asian leech bloodmeals and waterholes from africa and mongolia can be used to detect viruses circulating in wildlife. these nucleic acid sources may represent an effective non-invasive resource for studying wildlife viral diversity and emerging viruses pre-emergence. in order to test the sensitivity of the viral capture in recovering vertebrate host viruses, the capture system was first applied to a positive control consisting of medical leeches fed with human blood spiked with two rna viruses and two dna viruses at different concentrations [ ]. all four viruses were detected, although enrichment efficiency (proportion of on-target viral reads) and target genome recovery varied among viruses (suppl. fig. ). no viral contigs were identified in the negative controls included to monitor laboratory contamination for either the leech or water experiments. tiger leeches (haemadipsa picta) and brown leeches (haemadipsa zeylanica) were collected in malaysian borneo and processed as pools (bulk samples) consisting of to individual leeches separated by leech species and sampling location. viruses were identified in of the leech pools analysed ( %) (fig. ; suppl. tab. ). in of these ( %), two to three viruses were identified. sequence data from six vertebrate-infecting viral families were detected, including the anelloviridae, circoviridae, coronaviridae, parvoviridae, retroviridae and rhabdoviridae. the most common viral group detected was rhabdoviridae which was found in % of samples ( samples out of ), followed by coronaviridae which was identified in % of samples ( samples). members of the anelloviridae were identified in % of samples ( samples), retroviridae in three samples ( %), and parvoviridae and circoviridae in two samples ( %) (fig. ; suppl. tab. ). rhabdoviridae contigs were genetically similar to three different viral genera (suppl. tab. ). 
five contigs were most similar ( - %) to the vesicular stomatitis indiana virus (vsiv) (genus vesiculovirus) as determined by blast searches. the limited similarity of the contigs to known rhabdoviruses suggests they may represent a new genus related to fish rhabdoviruses (perhabdovirus and sprivirus) or vesiculovirus (suppl. fig. ). the other contigs clustered phylogenetically, suggesting they represent two new species of a rhabdovirus related to lyssaviruses (suppl. fig. ). although in most cases one contig per sample was observed, in five samples (l , l , l , l , l ) two different viruses were found. several viral regions for rhabdoviridae were represented in the baits. however, most of the oligonucleotides were specific for the l gene which encodes the rna-dependent rna polymerase. all the recovered contigs mapped to the l gene (suppl. fig. a -c). the viral contig sequences were confirmed by pcr and sanger sequencing for l and l (suppl. fig. d ). all coronaviridae contigs matched a bat betacoronavirus as determined by blast searches with identities between - % (suppl. tab. ). the resulting sequence did not cluster in any of the four clades representing the known coronaviridae genera, suggesting it may represent a novel coronavirus genus (suppl. fig. ). each contig overlapped with the coronavirus rna-dependent rna polymerase gene (orf ab), the viral region mainly targeted by the rna oligonucleotide baits (suppl. anelloviridae contigs matched either porcine torque teno virus (pttv) ( - % identity), a giant panda anellovirus (gpav) ( - % identity) or a masked palm civet torque teno virus (pl-ttv) ( - % identity) (suppl. tab. ). the pttv contigs were found in two samples (l and l ), while the gpav and pl-ttv contigs were detected in six samples. gpav was the best match in four samples (l , l , l , l ) and pl-ttv in three (l , l , l ). in sample l both were identified. 
every anelloviridae contig mapped to the non-coding region of the respective reference genome since all anelloviridae baits targeted the same untranslated region (suppl. fig. a , c, e). the non-coding region sequenced is not phylogenetically informative and therefore, phylogenetic analysis could not be performed. viral contigs were confirmed by pcr and sanger sequencing for samples l , l , l and l (suppl. fig. b , d, f). three circoviridae contigs matching a porcine circovirus (pcv) ( % identity) were identified in l and l (fig. ; suppl. tab. ). two non-overlapping but adjacent contigs were retrieved from l . a single contig overlapping with one of the two contigs determined from l was recovered from l (suppl. fig. a ). the contigs mapped to the pcv replication protein (rep), targeted by the circoviridae baits (suppl. fig. a ). the two overlapping contigs of l and l were confirmed by pcr and sanger sequencing (suppl. fig. b ). since the identity of the contigs with known viral sequences in genbank was %, no phylogenetic analysis was performed. parvoviridae contigs with the highest similarity to porcine parvovirus (ppv) were found in l ( contig with % identity) and l ( contigs with - % identity) (suppl. tab. ). the contig of l clustered within the tetraparvovirus genus, close to ungulate parvoviruses (porcine, ovine and bovine pv), while the contigs of l clustered within the copiparvovirus genus, close to ppv (suppl. fig. ). two of the three contigs mapped to the replicase gene, while one from l mapped to an intergenic region (suppl. fig. c ). whereas the replicase region of ppv was covered by parvoviridae baits, the intergenic region was not (suppl. fig. c ). this portion of the virus may have been recovered by other non-parvoviridae baits targeting that region non-specifically. retroviridae contigs similar to the simian and feline foamy virus (spumaretrovirinae subfamily, - % identity) were detected in three samples (l , l , l ) (suppl. tab. ). 
phylogenetically the contigs clustered together as a sister group to the feline foamy viruses (felispumavirus genus), potentially being a new genus within the spumaretrovirinae (suppl. fig. ). the contigs mapped to the polymerase gene, which the exogenous retrovirus baits were designed to target (suppl. fig. d ). the mammalian hosts of the leeches were determined by metabarcoding [ ] . five waterholes from tanzania and six from mongolia were tested. from each waterhole, one water filtrate and one sediment sample were collected (except for one waterhole where only a sediment sample was collected), for a total of twenty-one samples. five samples (two water and three sediment samples) in total were positive for viral sequences ( . %). four viral families were identified including: retroviridae, herpesviridae, adenoviridae and papillomaviridae. in filtered water and sediment samples collected from the same waterhole, only one virus per sample was generally identified and in one location (wm and sm ) contigs from different viral families were isolated based on sample type. differences between sediment and water are not unexpected as the sediment likely represents a longer-term accumulation of biomaterial and the water represents more acute contamination at the surface and variable mixing throughout. of the water filtrate samples tested, two samples from mongolia (wm and wm ) ( . %) had viral contigs with % identity to the equid herpesvirus and (ehv- and ehv- ). the contig of wm mapped to the membrane glycoprotein b, whereas the two contigs of wm to the dna packaging protein and membrane glycoprotein g, all regions covered by the herpesviridae baits (suppl. fig. a -d). a nested panherpes pcr targeting the dna polymerase gene and the resulting sanger sequences further confirmed ehv presence (suppl. fig. e ). several equine species including domestic horses inhabit the gobi desert [ ] , which is consistent with the presence of these viruses. 
from the sediment samples tested, two from mongolia and one from tanzania yielded viral sequences ( %) representing three viral families: retroviridae, adenoviridae and papillomaviridae. mongolian sediment sample sm was positive for four contigs mapping to the protease (pro) gene of the jaagsiekte sheep retrovirus (jsrv) with % identity (suppl. fig. a ). jsrv from this sample was further confirmed by pcr (suppl. fig. a ). mongolian sediment sample sm was positive for equine adenovirus ( % identity) with a contig mapping to a region between the pvi and hexon capsid genes (suppl. fig. b ). given that multiple equine species are found in the gobi desert in mongolia, it is likely that the water sources sampled were frequented by these species [ ] . the sediment sample from tanzania st was positive for a zetapapillomavirus related to the equus caballus papillomavirus and equus asinus papillomavirus ( % identity; e -e genes) (suppl. fig. c ; suppl. fig. ), consistent with the detection of plains zebra (equus quagga) dna from this water source [ ] . given that both captive and wild zebras have been known to contract bovine papillomaviruses [ , ] , it is likely that they are susceptible to different equine papillomaviruses.
emerging infectious viruses increasingly threaten human, domestic animal and wildlife health [ ] . sixty percent of emerging infectious diseases in humans are of zoonotic origin [ ] . wildlife trade and consumption of bushmeat, especially in africa and asia, have increasingly played a role in pathogen spillovers into human populations [ , ] . wildlife markets have recently facilitated the spillover of sars-cov- to humans [ ] resulting in a pandemic [ ] . the - sars-cov outbreak [ ] , the ebola outbreak in west africa [ ] and the global emergence of hiv [ ] have all been linked to wildlife trade and bushmeat consumption. early detection of novel infectious agents in wildlife represents a key factor in preventing their emergence.
however, identification, surveillance and monitoring of emerging viruses using currently broadly applied approaches based on direct sampling of wildlife require enormous investment in sampling, particularly for viruses that have low prevalence [ ] . for example, , wild birds were sampled in germany to detect avian influenza prevalence below % [ ] . similarly, sampling of over , animals in poland was required to determine an . % prevalence of african swine fever virus (asf) [ ] . sampling under remote field conditions or in developing countries presents additional challenges. we provide evidence that environmental and invertebrate-derived dna samples including waterhole water, sediment and wild haematophagous terrestrial leeches can be used to survey known and unknown viruses. dna and rna viruses could be detected in % and . % of the idna (leech) and edna (waterhole) samples, respectively. the congruence of host dna assignment for leeches and viral families identified suggests that bloodmeals are a useful resource for determining viral diversity. similarly, the detection of primarily equine viruses from african and mongolian waterholes, where intense wild equid visitation rates were directly observed, suggests that edna viruses from this resource reflect host utilization of the water and do not derive from other environmental sources such as fomites distributed over long distances. pcr-based approaches, as used in earlier studies to detect pathogens from flies [ , ] or, under laboratory conditions, in medicinal leeches [ ] , require prior knowledge about the expected pathogens in the samples. the unknown viral diversity in the wild, and the potential degradation of viral nucleic acids in bloodmeals or in the environment, may affect detection by pcr, resulting in high false negative rates. rna oligonucleotide-based hybridization capture overcomes such limitations because the short baits can capture divergent and degraded dna.
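the sampling burden for low-prevalence viruses mentioned above can be made concrete with a standard detection calculation. the sketch below is illustrative only and is not taken from the study: it assumes independent samples, a perfectly sensitive test and a fixed true prevalence, under which the number of animals needed to find at least one positive with a chosen confidence is n ≥ ln(1 − confidence) / ln(1 − prevalence). the function name and example values are hypothetical.

```python
import math


def samples_needed(prevalence: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one positive among n samples) >= confidence.

    Assumptions (illustrative, not from the study): samples are independent,
    the test never misses a true positive, and prevalence is constant.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(no positives in n samples) = (1 - prevalence) ** n; solve for n.
    n = math.log(1.0 - confidence) / math.log(1.0 - prevalence)
    return math.ceil(n)


if __name__ == "__main__":
    # Required sample size grows quickly as prevalence falls.
    for p in (0.1, 0.01, 0.001):
        print(f"prevalence {p}: {samples_needed(p)} samples")
```

under these assumptions a tenfold drop in prevalence raises the required sample size roughly tenfold, which is the scaling behind the large direct-sampling efforts cited above.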
the comprehensive viral group representation in the rna bait set also allows for the determination of both viral presence and viral diversity with a relatively simple workflow. the ability of oligonucleotides with substantial divergence from the target sequence to capture more distantly related sequences is particularly useful in virology since most viruses are uncharacterized in wildlife and many evolve rapidly. using short rna baits to capture highly conserved sequences from every known vertebrate viral genome is a useful and relatively inexpensive approach for providing an initial viral identification. however, to fully characterize each virus, the rna oligonucleotide bait set would need to be customized to retrieve full length viral genomes. initial screening with full length genomes for all viruses is costly and may result in detection of host dna in cases of spurious homology between viruses and host dna sequence. several novel viruses were identified with our short rna bait approach, which is not unexpected as little is known about the virology of wildlife in southeast asia, where the leeches were collected. several viral contigs were phylogenetically distinct from known viruses and may represent new genera. for example, the novel coronavirus identified in leech bloodmeals did not cluster with any of the known coronaviridae clades. this finding highlights the ability of this method to detect unknown viruses. we could also associate the novel corona-and rhabdoviruses with mammal bloodmeals with limited evidence of a cervid association for both. cervids are regularly sold as bushmeat in wildlife markets [ ] and both recent coronavirus epidemics (sars-cov [ ] and sars-cov- [ ] ) spilled over from wildlife. this suggests that edna/idna-based pathogen surveillance approaches may complement efforts to proactively identify viruses that could potentially spillover to humans or livestock. 
the collection of wild haematophagous invertebrates such as leeches, or of water and sediments, has both advantages and disadvantages compared to invasively collected wildlife samples. large amounts of dna can be extracted from bloodmeals, in particular when leeches are processed in bulk. we pooled up to leeches and many of our leech bulk samples contained a diverse mix of mammalian dna. a disadvantage of leeches is that they cannot be found in all environments: for example, haematophagous terrestrial species are restricted to tropical rainforests of asia, madagascar and australia [ ] . in addition, leech feeding biases could influence diversity surveys [ , ] . however, this disadvantage could be overcome in the future by employing additional invertebrates such as mosquitoes [ ] or carrion flies [ ] . waterholes are commonly found in almost all environments. in environments with seasonal water shortages, dna from animals can become highly concentrated due to many animals utilizing rare water sources. the disadvantages are that the dilution factor of water, depending on water body size, can obscure rare dna sequences, and that mixed host species sequences are generally the rule rather than the exception. further experiments with field filtration and sample concentration, such as methods used for pathogen detection in waste water, may improve detection rates [ ] . environmental dna, and in particular its subdiscipline invertebrate-derived dna, combined with viral hybridization capture may be a useful and economical tool for identifying and characterizing major viral pathogens prior to viral emergence, particularly in difficult to access sampling environments. when sampling in environments where direct access to animals is difficult or highly restricted, edna and idna may be the only option to detect viral pathogens in the wild. the current study suggests this approach will be successful in either complementing or replacing invasive approaches.
leeches
two types of leeches, tiger leeches (haemadipsa picta) and brown leeches (haemadipsa zeylanica), were collected from february to may in the deramakot forest reserve in sabah, malaysian borneo as described in abrams et al. [ ] and axtner et al. [ ] . all leeches of the same type (tiger or brown) from the same site and occasion were pooled and processed as one sample. the number of leeches ranged from to per pool (median= ). samples were stored in rna fixating saturated ammonium sulfate solution and exported under the permit 'jkm/mbs. - / jld. ( )' issued by the sabah biodiversity council. a total of pools (l -l ) were selected for viral capture to maximize representation of host wildlife species identified from bloodmeals [ ] .
at each waterhole, ml of water was passed through a . µm sterivex filter unit (millipore) using a disposable -ml syringe to remove debris from water. in addition, g of the top - cm of sediment was collected at each waterhole. the samples were stored on ice packs during the respective field trip, and frozen at - °c. in total water filtrate and sediment samples were collected at waterholes, six respectively from mongolia and tanzania. for each sample, ml of water filtrate was ultra-centrifuged at , rpm for hours to pellet dna and viral particles. the supernatant was then removed, and the pellet was re-suspended in ml of cold phosphate-buffered saline (pbs) (ph . ) (sigma-aldrich) and left at °c overnight.
leeches were cut into small pieces with a new scalpel blade and lysed overnight (≥ hours) at °c in proteinase k and atl buffer at a ratio of : ; .
water and sediment
µl of the centrifuged filtrate was used to extract viral nucleic acids using the rtp ® dna/rna virus mini kit (stratec biomedical). the following modifications were made to the original protocol: µl of lysis buffer, µl of binding buffer and µl of proteinase k and carrier rna were used per sample. samples were eluted in µl.
the nucleospin soil kit (macherey-nagel) was used to extract dna/rna from sediment. mg of soil was extracted according to the manufacturer's protocol using an elution volume of µl. as a positive control, medicinal leeches (hirudo spp.) were fed human blood spiked with four viruses [ ] . two rna viruses, influenza a and measles morbillivirus, and two dna viruses, bovine herpesvirus and human adenovirus, were used (see kampmann et al. [ ] for details). the rna was reverse transcribed using superscript iii and iv (thermo fisher scientific) with random hexamers prior to second-strand synthesis with klenow fragment (new england biolabs). the resulting double-stranded cdna/dna mix was sheared to an average fragment size of bp using a m focused ultrasonicator (covaris). the sheared product was purified using the zr- dna clean & concentrator- kit (zymo). dual-indexed illumina sequencing libraries were constructed as described by meyer and kircher [ ] with the modifications described in alfano et al. [ ] . each library was amplified in three replicate reactions to minimize amplification bias in individual pcrs. the three replicate pcr products for each sample were pooled and purified using the minelute pcr purification kit (qiagen). negative control libraries were also prepared from different stages of the experimental process (extraction, reverse transcription, library preparation and index pcr) and indexed separately to monitor any contamination introduced during the experiment. amplified libraries were quantified using the tapestation (agilent technologies) on d screentapes. the targeted sequence capture panel was designed based on the oligonucleotide probes represented on the virochip microarray [ ] . the virochip is a pan-viral dna microarray comprising the most highly conserved mer sequences from every fully sequenced reference viral genome present in genbank, which was developed for the rapid identification and characterization of novel viruses and emerging infectious diseases.
we retrieved the viral oligonucleotides from the th generation virochip (viro ) [ ] , which are publicly available at ncbi's gene expression omnibus (geo) repository [ ] , accession number gpl https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gpl ( ). this platform includes ~ , oligonucleotides ( mer nucleotides) derived from ~ , viral species. we excluded sequences from bacteriophage, plant viruses, viral families infecting only invertebrates and endogenous retroviruses. we included viruses that could have both vertebrate and invertebrate hosts, such as vertebrate viruses with insect vectors. exogenous retroviruses were represented but murine leukemia viruses (mlvs) were removed. mlvs sequences may interfere with the capture of other viruses, since mlvs can cross enrich endogenous retroviruses which can represent large portions of several vertebrate genomes and mask rarer viral sequences. control oligonucleotides included in the virochip, such as those from human genes, yeast intergenic sequences, and human papilloma virus sequences present in hela cells were also removed. ninety-two -mer oligonucleotides covering (spaced end-to-end) the entire pol and gb genes of equine herpesvirus (ehv- ) were included as pcr screening of the water samples indicated they were positive for this virus (data not shown). the resulting , oligonucleotides were examined for repetitive elements, short repeats, and low complexity regions, which are problematic for probe design and capture, using repeatmasker. repetitive motifs were identified in oligonucleotides, which were removed. the final targeted sequence capture panel consisted of , unique sequences which were synthesized (as a panel of biotinylated rnas) at mycroarray (ann arbor, usa). 
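the panel design described above is essentially a sequence of filters: drop excluded probe categories, drop repetitive or low-complexity probes, and keep unique sequences. a minimal sketch of that logic follows. it is not the authors' pipeline: the category labels, the `Oligo` record and the thresholds are hypothetical, and the simple low-complexity heuristic is only a stand-in for repeatmasker, which the study actually used.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Categories removed from the panel, following the text above.
# The label strings are hypothetical; real labels depend on the probe metadata.
EXCLUDED_CATEGORIES = {
    "bacteriophage",
    "plant_virus",
    "invertebrate_only",
    "endogenous_retrovirus",
    "murine_leukemia_virus",
    "control",
}


@dataclass(frozen=True)
class Oligo:
    name: str
    category: str
    sequence: str


def is_low_complexity(seq: str, max_run: int = 8, max_base_fraction: float = 0.8) -> bool:
    """Crude stand-in for RepeatMasker (thresholds are arbitrary assumptions).

    Flags a sequence with a long homopolymer run or dominated by one base.
    """
    seq = seq.upper()
    if not seq:
        return True
    run = longest = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    top_fraction = max(seq.count(base) for base in "ACGT") / len(seq)
    return longest >= max_run or top_fraction >= max_base_fraction


def build_panel(oligos: Iterable[Oligo]) -> List[Oligo]:
    """Apply category, complexity and uniqueness filters in that order."""
    seen = set()
    panel = []
    for oligo in oligos:
        seq = oligo.sequence.upper()
        if oligo.category in EXCLUDED_CATEGORIES:
            continue
        if is_low_complexity(seq):
            continue
        if seq in seen:  # keep only the first copy of each sequence
            continue
        seen.add(seq)
        panel.append(oligo)
    return panel
```

keeping each filter as a separate, ordered step makes it easy to report how many probes each criterion removed, as the text does for the repetitive motifs.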
in-solution target enrichment via hybridization-based capture was performed according to the manufacturer's protocol (mybaits® custom targeted enrichment, mycroarray, ann arbor, usa), with the following modifications for likely partially degraded samples with an expected low target viral content: µl dynabeads® m- streptavidin beads (invitrogen) instead of µl dynabeads® myone™ streptavidin c (invitrogen); hybridization, bead-bait binding, and wash steps temperature set to °c; hours hybridization time; ng baits per reaction; µl indexed library inputs. for capture, libraries generated from pooled leeches consisting of more than individuals were captured individually, while libraries generated from pools of fewer individuals were combined to have a comparable number ( ) of leeches per capture. this was done in order to ensure that each individual leech represented in each library was allocated enough bait for capture. for capture, libraries generated from water and sediment samples were pooled in groups of two. sediment and water cdna and dna were pooled separately. per pooled sample, µl of baits were used to ensure enough bait for each sample. the enriched libraries were re-amplified using herculase ii fusion dna polymerase (agilent technologies) with p and p illumina library outer primers with the same cycling conditions described in alfano et al. . the re-amplified enriched libraries were purified using the minelute pcr purification kit (qiagen), quantified using the tapestation (agilent technologies) on d screentapes and finally pooled in equimolar amounts for single-read sequencing on two lanes of an illumina nextseq with the tg nextseq® / high output kit v ( cycles). a total of , , sequence reads bp long were generated (average: , , single reads per sample; standard deviation [sd]: , , ) (suppl. tab. ) and sorted by their dual index sequences. cutadapt v . and trimmomatic v . were used to remove adapter sequences and low-quality reads using a quality cutoff of and a minimal read length of nt. after trimming, % of the sequences were retained. three different approaches (a, b, c) were used to analyse the viral capture data:
a) leech reads were removed from the dataset by alignment to the helobdella robusta genome v . (assembly gca_ . ), which is the only complete genome of hirudinea available in genbank, and all leech sequences from genbank ( , sequences resulting from "hirudinea" search) using bowtie v . . . [ ] . this left % of the original reads (suppl. tab. ). then, the filtered reads were searched by blast against a database generated from the capture bait sequences. the reads which matched baits were then extracted and screened against the entire ncbi nucleotide database (nt) using blastn to find the best viral match. the filtered reads were mapped both to the corresponding bait sequence and the genome sequence of the best hit obtained by blast against the complete nt database, in order to generate a consensus sequence. this consensus sequence was again searched against the ncbi nt database using blastn to obtain a viral assignment.
b) leech reads were removed as in method a. in addition, rrna reads were removed using sortmerna [ ] , leaving % of the original reads (suppl. tab. ). the filtered reads were de novo assembled using both spades v . . [ ] and trinity v . . [ ] assemblers. the obtained contigs from spades and trinity were pooled and clustered to remove duplicated or highly similar sequences using usearch v . . [ ] with a % threshold identity value. the centroids were then subjected to sequential blast searches against the ncbi nucleotide database and ncbi refseq viral protein database using blastn and blastx, respectively.
c) the adapter and quality trimmed data were uploaded to genome detective [ ] , a web-based software that assembles viral genomes from ngs data.
the software first groups reads into different buckets based on protein similarity to different viral hits. genome detective then de novo assembles the reads of each bucket, creating a longer consensus sequence that is then searched against the ncbi refseq viral database using blastx and blastn algorithms. the results of the amino acid and nucleotide searches are combined and a viral hit is assigned based on the best combined score. bacteriophages, invertebrate viruses and retroviruses were excluded from subsequent steps, which only focused on eukaryotic, specifically vertebrate viruses. the results of the three methods were compared and the viral contigs obtained were manually inspected. if more than one method generated a contig with the same viral hit, the contigs from each method were compared. if they had the same sequence or were overlapping, the longest contig was selected. the filtered reads were mapped to the viral contigs to calculate the number of viral reads for each virus. finally, the viral contigs were mapped to the reference genome of the virus corresponding to the best blast hit using geneious v . . (biomatters, inc.) [ ] . the baits were mapped to the same references to determine the genomic positions targeted by our bait panel for each virus.
viral contigs were assigned to viral families according to the best blast results. comprehensive sets of representative sequences from these viral families were retrieved from genbank and aligned with the contigs using mafft v . [ ] . phylogenetic analysis was performed using the maximum-likelihood method based on the general time reversible substitution model with among-site rate heterogeneity modelled by the Γ distribution and estimation of proportion of invariable sites available in raxml v [ ] , including bootstrap replicates to determine node support. phylogenetic analyses were performed only on viral contigs i) showing divergence from known viruses, i.e. with both blast identity and coverage to the best reference below %, to place them into a phylogenetic context, and ii) mapping to phylogenetically relevant genomic regions. therefore, circoviridae and anelloviridae contigs were excluded, as were those identified from water.
host identification of leeches followed a recently published edna/idna workflow [ ] . in summary, leech samples were digested and short fragments of the mitochondrial markers s, s and cytochrome b were amplified in four pcr replicates each, resulting in pcr replicates per sample. we used a twin-tagging -step pcr protocol and pcr products were sequenced using an illumina miseq (for details please see axtner et al. [ ] ). after demultiplexing and read processing, each haplotype was taxonomically assigned to a curated reference database using protax [ ] . taxonomic assignments followed the criteria of axtner et al. .
the primers listed in suppl. tab. were designed to confirm by pcr the viral contig sequences generated by the three approaches (a, b, c see above) from the leech samples. for pcrs targeting rna viruses, µl of extract were digested with rdnase i (ambion) following the manufacturer's protocol. the dnase-digested extracts were then purified using the rneasy minelute cleanup kit (qiagen). rna was reverse transcribed into cdna using iscript™ reverse transcription supermix (bio rad). sediment and water samples that tested positive for ehv and jsrv were screened using a previously described pan-herpes pcr [ ] and a jsrv pcr [ ] , respectively. the resulting amplicons were sanger sequenced.
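the rule for reconciling contigs from the three analysis approaches, and the criteria for sending a contig to phylogenetic analysis, can be sketched as follows. this is a simplified illustration rather than the authors' code: the `Contig` record and function names are hypothetical, contigs sharing a sample and best viral hit are assumed to be identical or overlapping (the manual overlap check described above is not reproduced), and the divergence threshold is left as a required argument because its value is not given in the text.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Contig(NamedTuple):
    sample: str     # e.g. a leech pool or waterhole sample identifier
    method: str     # analysis approach: "a", "b" or "c"
    viral_hit: str  # best blast hit assigned to the contig
    sequence: str


def select_longest(contigs: Iterable[Contig]) -> List[Contig]:
    """Keep one contig per (sample, viral hit): the longest across methods.

    Simplifying assumption: contigs with the same sample and viral hit
    are identical or overlapping, so length alone decides.
    """
    best: Dict[Tuple[str, str], Contig] = {}
    for contig in contigs:
        key = (contig.sample, contig.viral_hit)
        if key not in best or len(contig.sequence) > len(best[key].sequence):
            best[key] = contig
    return list(best.values())


def eligible_for_phylogeny(
    identity: float,
    coverage: float,
    *,
    threshold: float,
    informative_region: bool,
) -> bool:
    """Both identity and coverage must fall below the threshold (criterion i),
    and the contig must map to a phylogenetically relevant region (criterion ii).
    The threshold value must be supplied by the caller."""
    return identity < threshold and coverage < threshold and informative_region
```

a contig from a non-informative region, such as the untranslated region targeted by the anelloviridae baits, would fail the second criterion regardless of its divergence.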
environmental dna for wildlife biology and biodiversity monitoring environmental dna-an emerging tool in conservation for monitoring past and present biodiversity terrestrial mammal surveillance using hybridization capture of environmental dna from african waterholes idna from terrestrial haematophagous leeches as a wildlife surveying and monitoring tool-prospects, pitfalls and avenues to be developed tropical rainforest flies carrying pathogens form stable associations with social nonhuman primates design-and model-based recommendations for detecting and quantifying an amphibian pathogen in environmental samples using a pan-viral microarray assay (virochip) to screen clinical samples for viral pathogens leeches as a source of mammalian viral dna and rna-a study in medicinal leeches equus hemionus. the iucn red list of threatened species sarcoids in captive zebras (equus burchellii): association with bovine papillomavirus type infection detection of bovine papillomavirus dna in sarcoid-affected and healthy free-roaming zebra (equus zebra) populations in south africa public preferences for one health approaches to emerging infectious diseases: a discrete choice experiment global trends in emerging infectious diseases toward a quantification of risks at the nexus of conservation and health: the case of bushmeat markets in lao pdr wildlife trade and the emergence of infectious diseases a pneumonia outbreak associated with a new coronavirus of probable bat origin the sars, mers and novel coronavirus (covid- ) epidemics, the newest and biggest global health threats: what lessons have we learned? identification of a novel coronavirus in patients with severe acute respiratory syndrome emergence of zaire ebola virus disease in guinea origins of hiv and the aids pandemic. 
cold spring harb surveillance of wild birds for avian influenza virus chances and limitations of wild bird monitoring for the avian influenza virus h n -detection of pathogens highly mobile in time and space african swine fever epidemic tracking zoonotic pathogens using blood-sucking flies as' flying syringes empty forests, empty stomachs? bushmeat and livelihoods in the congo and amazon basins debugging diversity-a pan-continental exploration of the potential of terrestrial blood-feeding leeches as a vertebrate monitoring tool shifting up a gear with idna: from mammal detection events to standardised surveys broad surveys of dna viral diversity obtained through viral metagenomics of mosquitoes assessing the feasibility of fly based surveillance of wildlife infectious diseases two-step concentration of complex water samples for the detection of viruses an efficient and robust laboratory workflow and tetrapod database for larger scale environmental dna studies illumina sequencing library preparation for highly multiplexed target capture and sequencing variation in koala microbiomes within and between individuals: effect of body region and captivity status microarray-based detection and genotyping of viral pathogens virus identification in unknown tropical febrile illness cases using deep sequencing gene expression omnibus: ncbi gene expression and hybridization array data repository fast gapped-read alignment with bowtie sortmerna: fast and accurate filtering of ribosomal rnas in metatranscriptomic data spades: a new genome assembly algorithm and its applications to single-cell sequencing trinity: reconstructing a full-length transcriptome without a genome from rna-seq data search and clustering orders of magnitude faster than blast genome detective: an automated system for virus identification from high-throughput sequencing data geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data mafft multiple 
sequence alignment software version : improvements in performance and usability raxml version : a tool for phylogenetic analysis and post-analysis of large phylogenies unbiased probabilistic taxonomic classification for dna barcoding long term stability and infectivity of herpesviruses in water the long terminal repeat of jaagsiekte sheep retrovirus is preferentially active in differentiated epithelial cells of the lungs we would like to thank joseph derisi for guidance in using his oligonucleotide data. this project received financial support from the german federal ministry of education and research (bmbf fkz: ln a) to na, ja, am and aw and was supported by funds from the leibniz gemeinschaft, saw- -izw- to ad and adg. we thank peter seeber and sanatana-erini soilemetzidou for collection of water and sediment samples in africa and mongolia respectively. we thank the sabah forestry department, especially johnny kissing, peter lagan, and datuk sam mannan, for supporting the fieldwork and the sabah biodiversity council for providing research, collection, and export permits (jkm/mbs. - / jld. ) for the leech work. adg and aw designed the study; am and stw collected the leeches in the field; ja performed the leech nucleic acids extractions and the pcrs on leeches; na and ad performed the capture experiments; na and kt performed the bioinformatics and phylogenetic analyses; na, ad, adg and aw wrote the manuscript; na, ad, ja, kt, mlk, tpg, aw and adg reviewed the manuscript; all authors read and approved the final manuscript. the authors declare no competing interests. in the upper panel the left photo shows a leech feeding on a frog in a rainforest of vietnam (courtesy andrew tilker; leibniz-izw) and the right photo shows an african waterhole in tanzania (courtesy peter seeber; leibniz-izw). the middle panel depicts the hybridization capture protocol. 
briefly, illumina libraries were produced from reverse transcribed rna and dna from leech bloodmeals or from waterhole surface water and sediments. biotinylated viral rna baits were hybridized to the libraries and non-target dna was washed away. the remaining dna was sequenced, reads assembled into contigs and mapped to reference viral genomes. these contigs were further analyzed to determine viral identity. viral identity was paired with host identity determined either by mammalian metabarcoding of the leech samples, or by observation of waterhole usage. relative abundance of viruses from each family, shown as the percentage of the total number of viral reads in each leech and waterhole sample. in the sample names, s stands for sediment, w for water, t for tanzania, m for mongolia and l for leeches. the leech host assignment for each leech sample is shown on the right (see suppl. tab. for further details). key: cord- -mt h y authors: muller, alana; sirianni, lindsey a.; addante, richard j. title: neurophysiological correlates of the dunning-kruger effect date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mt h y the dunning-kruger effect (dke) is a metacognitive phenomenon of illusory superiority in which individuals who perform poorly on a task believe they performed better than others, yet individuals who performed very well believe they under-performed compared to others. this phenomenon has yet to be directly explored in episodic memory, nor explored for reaction times or physiological correlates. we designed a novel method to elicit the dke via a test of item recognition while electroencephalography (eeg) was recorded. throughout the task, participants were asked to estimate the percentile in which they performed compared to others. results revealed participants in the bottom th percentile overestimated their percentile, while participants in the top th percentile underestimated their percentile, exhibiting the classic dke. 
reaction time measures revealed a condition x group interaction whereby over-estimators responded faster than under-estimators when estimating being in the top percentile and responded slower when estimating being in the bottom percentile. between-group eeg differences were evident between over-estimators and under-estimators during dunning-kruger responses, which revealed fn -like effects of familiarity supporting differences for over-estimators from - ms, whereas ‘old-new’ memory erp effects revealed a late parietal component (lpc) associated with recollection-based processing from - ms for under-estimators that was not evident for over-estimators. findings suggest over- and under-estimators use differing cognitive processes when assessing their performance, such that under-estimators rely on recollection during memory and over-estimators draw upon excess familiarity when over-estimating their performance. episodic memory thus appears to play a contributory role in metacognitive judgments of illusory superiority and inferiority.
graphical abstract: event-related potentials (erps) were recorded for the dunning-kruger effect as subjects made metacognitive judgments about performance on a memory task. over- and under-estimators exhibited a crossover interaction in response times when believing they did best and worst, respectively. a crossover pattern was also observed for erps: lpc signals of recollection were found for under-estimators, whereas familiarity-based fn effects were evident for over-estimators and correlated with estimates.
"the fool doth think he is wise, but the wise man knows himself to be a fool" (shakespeare, )
the dunning-kruger effect (dke) describes the phenomenon in which poor performers on a task tend to overestimate their performance while high performers on a task tend to underestimate their performance. overconfidence has been a topic of interest throughout recorded history, as early as the time of socrates, who was noted by plato for identifying "…it is likely that neither of us knows anything worthwhile, but he thinks he knows something when he does not, whereas when i do not know, neither do i think i know; so i am likely to be wiser than he to this small extent, that i do not think i know what i do not know", and later with charles darwin noting more simply that "ignorance more frequently begets confidence than does knowledge" (darwin, / ).
also implied by these timeless observations is that these metacognitive illusions are bi-directional, such that the more competent individuals who perform highest also tend to under-estimate their abilities (for review see zell et al., ). inaccurate confidence in one's abilities is a common phenomenon that can happen to anyone (including the authors and readers) and can lead to serious problems that are often preventable. for example, the overconfidence that the titanic was unsinkable led to the loss of over lives (bartlett, ; lord, ; lord, ), and in modern times, the covid- pandemic of has been noted for widespread over-estimation of abilities to manage the pandemic concurrent with early underestimation of its world-wide impact by many world health organizations, governments, and media alike (with a few exceptions). conversely, when the most competent people withhold contributions from teams or society because they think others are better suited, the effects can also lead to serious problems that amount to a substantial loss for society. for example, we could lose the most competent people for leadership positions, and instead embrace those who merely (and incorrectly) think they are the best. therefore, it is important to understand how and why these inaccurate judgments of one's abilities occur, so that they can be prevented. in , the study of over- and under-confidence was further characterized by david dunning and justin kruger, social psychologists who explored combining the two effects under one term. in their landmark paper, dunning and kruger conducted several studies showing that bottom performers on tests of humor judgments, logical reasoning, and grammar overestimated their performance percentile and that, conversely, top performers underestimated their performance percentile. the name "the dunning-kruger effect" became highly popularized throughout mainstream culture and society (herrera, ; marczyk, ).
generically, the dke describes a phenomenon in which self-estimates of performance on a task and the related percentile ranking among others also participating in the task do not match actual performance. the direction of this mis-match of self-perception extends in both directions (sieber, ) . empirically, this paradigm has been used successfully in many different tasks to elicit the dke on such tasks as microeconomics college exams (ryvkin, krajč, & ortmann, b) , knowledge about the university of chicago (burson et al., ) , logical reasoning (schlösser et al., ) , cognitive reflection (pennycook et al., ) , size judgments (sanchez, ) , finance (atir, rosenzweig, & dunning, ) , and computer programming (critcher & dunning, ). more broadly, the effect has been referred to in popular culture contexts of aviation, (woodbury, ) , driving (svenson, ) , and professors rating their own teaching skills (cross, ) . however, an account of the cognitive processes leading to these illusory experiences has yet to be fully explored. the dunning-kruger effect. the dke is a psychological phenomenon in which a mismatch in one's perceived ability and the reality of one's objective performance on a given task appears to be directionally moderated by the factor of ability. low performers (individuals who do not earn high scores on a test using an objective scale) tend to overestimate their performance percentile on a task while high performers (individuals who earn high scores measured on an objective scale) tend to underestimate their performance percentile on the same task. most paradigms used to research the dke follow a similar format: participants are given a task such as a series of logical reasoning problems or math problems, etc., and after they finish the task in its entirety, they are asked to estimate their overall percentile estimate and objective score on the task. 
that is, their metacognitive judgment is measured as a single data point assessed at the conclusion of the study and represents their aggregated assessment of performance across many trials. that approach has not, however, permitted the measurement of multiple instances of the cognitive phenomenon per person, and has precluded collecting simple statistical measures of central tendency for variables such as reaction times. accordingly, extant approaches have also precluded the collection of neuroscientific measures that rely upon having multiple measures per person (electroencephalography (eeg), fmri, etc.), but which may illuminate contributory cognitive processes. although the work has been relatively limited in scope, researchers have investigated illusory overconfidence for decades. one of the earliest studies of overconfidence was conducted by adams and adams ( ), who found that participants' confidence in their ability to recognize correctly spelled words was higher than their accuracy at the task. five years later, oskamp ( ) found that when clinical psychologists were asked to make a diagnosis for a case study, their confidence in their decision increased when they were given more information about the case although their accuracy did not increase. these instances showed that confidence and accuracy were not necessarily correlated, both in experimental studies and in more practical issues of clinical diagnoses. this finding has persisted in modern research on memory as well (hirst et al., ; kvavilashvili, et al., ) and will be discussed below. memory research intersects with the dke at the point of confidence in one's memories and the accuracy of those memories. a large collection of research is available that supports the finding that high confidence does not beget high accuracy (chua, et al, ; roediger & desoto, ; koriat et al., ; nelson & narens, ; wells, et al., ).
people can often exhibit illusory over-confidence in their memory judgments, and this has had wide-ranging impacts on society. for example, so-called 'flashbulb memories' are thought to be some of our most salient memories, yet are found to be no more accurate than other memories despite the participants' high confidence in their accuracy (neisser & harsch, ; brown & kulik, ). other experiments studying the 'false fame' phenomenon demonstrate that familiarity with names can lead to falsely recognizing them as famous later (dywan & jacoby, ; jacoby, woloshyn, & kelley, ): familiarity caused participants to falsely believe an ordinary name was famous because they could not recollect the context in which the name was presented. perhaps some of the most impactful research on memory confidence failing to correlate with accuracy pertains to the criminal justice system (heaton-armstrong, et al., ; loftus, ; loftus & zanni, ; nadel & sinnott-armstrong, ; pena, et al., ; schacter & loftus, ). in a study by pena et al. ( ), researchers asked participants to make judgments about their accuracy on a memory test for a mock crime observed earlier in the study. they found that participants who performed poorly on the memory test for details of the mock crime overestimated their memory accuracy, similar to what has been found for making prospective judgments of learning (jol) (irak, et al., ; müller et al., ; metcalfe & finn, ; koriat & ma'ayan, ). these results were consistent with the results of poor performers exhibiting the dke, demonstrating the link that may inherently exist between the two domains of memory and metacognition, yet which remains largely unexplored. throughout the years, overconfidence in general was studied in contexts such as social situations (dunning, et al., ; vallone, et al., ), tasks of differing degrees of difficulty (bradley, ; sen & boe, ), and ways to reduce overconfidence (arkes, christensen, lai, & blumer, ; zechmeister, rusch, & markell, ).
in this research, there was a common finding of people having overconfidence in wrong answers (fischhoff, slovic, & lichtenstein, ; harvey, ; howell, ; may, ) and a less common finding of under-confidence in correct answers for top performers (sieber, ), since the focus of the research at that time was not the high performers. the term "overconfidence effect" developed to describe this pattern of higher self-estimates of confidence than ability, which led to the study of a related social psychological phenomenon known as the 'better-than-average-effect' (btae), whereby people are found to rate themselves as better than average peers (for review see zell et al., ; alicke & govorun, ; chambers & windschitl, ; moore & healy, ; sedikides, gaertner, & cai, ; brown, ; brown, ; hartwig & dunlosky, ). the btae reflects a general self-evaluation bias (kwan et al., ; moore & healy, ). kruger and dunning ( ) postulated that the reason for low performers' incorrect estimation of objective performance scores is meta-ignorance, or two-fold ignorance (kruger & dunning, ). this means that poor performers are unaware that they are ignorant of the details needed to correctly complete the task and that double ignorance bolsters feelings of false superiority (schacter, ). more simply, poor performers do not have the knowledge to complete the task correctly and because they do not know their answers are incorrect, they believe they are performing well (i.e. 'ignorance is bliss') (schlösser et al., ). while this is a very useful behavioral description, it does little to advance an understanding of the cognitive processes involved in the pervasive illusion. dunning and kruger also used what they coined 'reach-around-knowledge' to explain low performers' high confidence in their abilities.
the term 'reach-around knowledge' refers to a person's unique knowledge gained from previously participating in a similar task to the task presented and generalizing their past experiences to the current experience (dunning, ) . kruger and dunning postulated that participants use 'reach-around-knowledge' to help achieve their estimation, though this doesn't necessarily require that it leads them to an accurate perception. according to this view, in order to give an overestimation, one must first have knowledge about the same or similar tasks but not have the knowledge about the details of the task to complete it correctly. having a larger store of reach-around knowledge should therefore increase the overestimation of poor performer's scores. on the contrary, having a smaller store of reach-around knowledge should decrease the overestimation of one's abilities resulting in a more accurate performance estimate. dunning and kruger's 'reach-around-knowledge' account, however, is only a theoretical concept that has not yet been operationally defined or objectively measured, and also lacks a substantive construct grounded in cognitive psychology. nevertheless, it provides a useful platform from which to expand upon in investigating this phenomenon within a theoretical construct. the 'reach-around-knowledge' account provided by dunning and kruger refers to changes in current behavior based upon prior experience, which is fundamentally a defining feature of memory (rudy, ) , and as such it recognizes a key role that memory processes may play in contributing to this metacognitive illusion. there happens to be a rich and robust empirical history of memory processes being both theoretically and operationally defined and studied. two of these cognitive processes of episodic memory that may contribute to the dke are familiarity and recollection (yonelinas, ; eichenbaum et al., ; yonelinas et al., ) . 
these processes align closely with the general concepts that dunning and kruger attributed to their 'reach-around-knowledge' account, and importantly, we can draw upon this platform of memory processes in approaching the dke in a systematic manner, which will be discussed in depth in the sections below. familiarity and recollection. the cognitive processes of familiarity and recollection have featured prominently in theoretical models of episodic memory for several decades (eichenbaum, yonelinas, & ranganath, ; khoe, kroll, yonelinas, dobbins, & knight, ; rugg & curran, ; squire, wixted, & clark, ; yonelinas, ). broadly, familiarity refers to having exposure to some material but not being able to recall the context in which it was presented, usually associated with mid-levels of confidence, whereas recollection refers instead to information that one can recall specific contextual details about and is typically associated only with high-confidence memories (addante, et al., a, b). in explicit memory, theoretical models of recognition are largely governed by the dual processes of familiarity and recollection (diana, yonelinas, & ranganath, ; eichenbaum et al., ; ranganath, ; yonelinas, ; yonelinas, aly, wang, & koen, ) (though see wixted, and wixted & mickes, for alternative views), and it is possible that understanding of familiarity and recollection processes in memory may help explain a proportion of variance in the dke. recollection is typically operationalized as the declarative retrieval of episodic information of both the item and context bound together into a cohesive retrieval of the episodic event (for review see diana et al., ), and in empirical studies is usually associated with the retrieval of contextual information about the item of the event, such as source memory (addante et al. a; for reviews see eichenbaum et al., ; yonelinas et al. ; ranganath ).
the item in the event, however, may be retrieved without recollection, via reliance upon familiarity, which is typically conceptualized as retrieval of an item from a prior episode but without the associated contextual information in which it occurred. familiarity occurs, for instance, when a person can remember that someone seems recognizable from the past but cannot retrieve specific information of who the person is or from where they know them. recollection, on the other hand, would be remembering precisely who someone is and how you know them from a prior episode of one's past experience. these two memory phenomena have been found to be dissociable cognitive processes (yonelinas, ), with dissociable neural substrates in the medial temporal lobes (ranganath et al., ), neuropsychologically dissociable among patient impairments (addante, ranganath, olichney, & yonelinas, ; düzel et al., ; mecklinger, von cramon, & matthes-von cramon, ; bowles et al., ), and with distinct patterns of electrophysiology at the scalp that are both spatially and temporally dissociable in event-related potentials (erps) (curran, ; friedman, ; gherman & philiastides, ; rugg et al., ; rugg & curran, ). physiologically, familiarity has been associated with erp differences between old and new memory trials during a negative-going peak at the mid-frontal scalp sites at approximately milliseconds to milliseconds post stimulus, called the mid-frontal old-new effect, or fn (for frontal-n effect). on the other hand, recollection has been associated with differences between memory conditions occurring at a peak in the erp at the parietal region of the scalp from approximately milliseconds to milliseconds called the late parietal component, or lpc (leynes et al., ; for reviews see rugg & curran, ; friedman, ). (adams & adams, ; ehrlinger & dunning, ; kruger & dunning, ; oskamp, ; pennycook et al., ; ryvkin et al., a; c. sanchez & dunning, ).
however, it is very likely that dkes could also be contributed to by memory experiences in one's past influencing the real-time processing of the current information-either via explicit or implicit means. based upon the converging literatures from memory and metacognition, a viable alternative theory we postulate to explain the dke is that illusory superiority may also be driven, at least in part, by increased familiarity from prior experience of one's past with the tested materials (e.g.: chua et al., ) . people may use a decision heuristic inducing a sense of performing well despite a lack of specific retrieval of the relevant details that would be involved in marking competency with the material. in this view, the experience of lacking a distinct recollection for but being generally familiar with material will lead people to assume that they are competent and successful at the task. this scenario would be associated with increased fn amplitudes in erps for inaccurate overestimators. in that case, it would be a relatively 'dangerous' combination to have insufficient recollection but excessive familiarity with a given topic, stimuli, or information (chua et al., ) because it could lead to inaccurate over-estimates of one's abilities and competencies. accordingly, under-estimators may be marked by having had sufficient recollection of the studied material (e.g. competency), such that these instances are associated with an lpc, while also leading people to recollect the extent of non-criterial information that their cognizance acknowledges could still be relatively wrong (parks & yonelinas, ) , hence lowering their estimated scores relative to other people. in this case, the excess of recollection signal would outweigh the noise of the familiarity signal. current study. 
the current paradigm has been designed to study the metacognitive decision-making process as it occurs in real-time during dke percentile estimates that are provided by participants throughout an item recognition memory test. based upon the memory framework for making metacognitive decisions noted above, broadly, we hypothesized that low performers who tend to overestimate their percentile ranking may do so because of familiarity for previous experiences in similar situations with markedly less reliance on recollection; and that high performers who tend to underestimate their percentile ranking may use more recollection to accurately outperform their peers. accordingly, we hypothesized that a larger fn would be evident in the group level erps for low performers compared to high performers and that there would be a larger lpc evident in the erps of high performers compared to low performers. bernardino and reported being free from neurological and memory problems. four participants' data were not used due to noncompliance issues (pressed only button throughout the task or ignored experimenter's instructions) and one participant did not have usable data due to an experimenter error that resulted in the loss of that data. two participants did not have usable eeg data due to excess motion artifacts/noise that resulted in a majority of trials being excluded from eeg, but were included in behavioral analyses. this yielded a behavioral data set of n = and an eeg data set of n = . . % of the participants were self-reported to be hispanic, . % caucasian, . % asian, and . % identified as more than one or another ethnicity. the average age was . years old (sd = . ). none of the participants reported any visual, medical, or physical issues that would interfere with the experiment. most participants spoke english as their first language (n = ) and those who indicated speaking a different language first had been speaking english for an average of . years (sd = . ).
participants arrived at the lab and completed informed consent and demographic information forms via voluntary self-report. the paradigm used was a modified item recognition confidence test, which built from similar paradigms successfully used in our lab's prior research (addante, watrous, yonelinas, ekstrom, & ranganath, ; addante, ; addante, ranganath, olichney, & yonelinas, ; roberts et al., ) and described in further detail below (fig. ). this paradigm consisted of an encoding phase containing four study sessions, in which participants studied words in each session, and a retrieval phase, containing six test sessions in which the participant's memory was tested for words in each session. they viewed a total of words, of which were presented in the encoding phase and of which were unstudied items (new items). during encoding, participants were given instructions to make a simple decision about the word presented on the screen. subjects were asked to judge either if the item was manmade or if the item was alive, and conditions were counterbalanced. the stimuli were presented on a black computer screen in white letters. to begin a trial, a screen with a small white cross at the center was presented for one of three randomly chosen inter-stimulus interval (isi) times: second, . seconds, or seconds. then, the stimulus word appeared in the middle of the screen with 'yes' presented to the bottom left of the word and 'no' presented to the bottom right of the word. the participants indicated their answer by pressing buttons corresponding to 'yes' and 'no' with their index and middle fingers, respectively, and this response was self-paced by the participant. after the participants responded, they viewed a blank black screen for a random duration of second, . seconds, or seconds. after the blank screen, the small white cross appeared at the center of the screen to begin the next trial. this cycle continued until all words in all four lists were presented.
between each list, participants were read the instructions for the next task to prevent carryover effects of the preceding encoding task, and to ensure they correctly switched between the animacy and the manmade decision task. after the encoding phase was complete, the eeg cap was sized and ocular electrodes were attached. eeg was recorded using the actichamp eeg recording system with a -channel electrode cap conforming to the standard international - system of electrode locations. each subject was tested individually inside a soundattenuating chamber. stimulus presentation and behavioral response monitoring were controlled using presentation software on a windows pc. eeg was acquired at a rate of hz. subjects were instructed to minimize jaw and muscle tension, eye movements, and blinking. eog was monitored in the horizontal and vertical directions, and this data was used to eliminate trials contaminated by blinks, eye-movements, or other related artifacts. five ocular electrodes were applied to the face to record electrooculograms (eog): two above and below the left eye in line with the pupil to record electrical activity from vertical eye movements, two on each temple to record electrical activity from horizontal eye movements, and one in the center of the forehead just above the eyebrows as a reference electrode. the eeg cap was placed on the participant's head and prepared for electrical recording. gel was applied to each cap site and impedances were lowered below kohms via gentle abrasion to allow the electrodes to obtain a clear electrical signal. after the eeg cap was in place, the participant began the retrieval phase. the participants were read instructions asking them to judge if the stimulus word presented was old (studied during the encoding phase) or new (not studied in the encoding phase) ( figure ). as in the encoding phase, all stimuli words were presented in white font on a black screen. 
at the beginning of each trial, participants were presented with a word in the middle of the screen, the numbers " ", " ", " ", " ", and " " evenly spaced beneath the word, the word "new" on the left by the number " ", and the word "old" on the right under the number " ". participants pressed any number between " " and " " to indicate if they confidently believed the word was old (" "), believe the word was old but was not confident (" "), did not know if the word was old or new (" "), believe the word was new but was not confident (" "), or confidently believed the word was new (" ") (addante et al., ; a; b) . this prompt was subject-paced. participants were told to choose the response that gave us the most accurate reflection of their memory, and to respond as quickly and accurately as possible. immediately after the item recognition judgment, participants were asked to answer a source memory confidence test indicating if the word came from the animacy decision task or the manmade decision task during encoding on a scale of to , which was also subject-paced (addante et al., ; a; b; roberts et al., ) . after responding, participants viewed a blank black screen at a random duration of second, . seconds, or seconds. participants were instructed to blink only during this blank screen and avoid blinking during the screens with a small cross or stimuli. after the source memory test for each th word presented during the memory test, the dunning-kruger estimate was presented. participants received instructions asking them to estimate the percentile in which they believed they were performing up to that point in the test compared to other students who would participate in the study (subjects were instructed to focus on generic memory performance and to use their item memory as the primary context). during the test phase, the word "percentile?" was presented as a prompt for their estimate with the numbers "< %,", " 's", " 's", " 's", and " %+" evenly spaced beneath it. 
the dunning-kruger estimate was subject-paced. at the conclusion of the memory retrieval test, participants answered four post-test questions. first, they were asked to "estimate your score on the whole test". participants were prompted to respond on a point scale with " " meaning below %, " " meaning between and %, " " meaning between and %, " " meaning between and %, and " " meaning above %. the second question they were asked was the following: "in what percentile did you perform on the whole test?". the participants were prompted to respond on a -point scale with " " meaning below the th percentile, " " meaning between the th and th percentile, " " meaning between the th and th percentile, " " meaning between the th and th percentile, and " " meaning in the th percentile or above. the first question measured perceived objective score on the entire memory test while the second question measured perceived relative score in relation to other students taking the memory test. these post-test prompts allowed us to test for the dke at a between-subjects level to be sure the effect can be elicited using an episodic memory task. during analyses, participants were grouped into quartiles based on their percentile score on the test, allowing us to average each group's responses and test them against the other groups' average responses to determine significant differences. they were also grouped by errors in percentile estimates: groups of over-estimators, correct-estimators, and under-estimators (also referred to as estimator groups later) were formed to investigate potential differences in cognitive strategies (see below). the two additional post-test questions were: ) "rate your memory in everyday life" and ) "how difficult was this entire test?". for the first of these questions, participants responded on a -point scale with " " meaning very poor, " " meaning poor, " " meaning moderate, " " meaning good, and " " meaning very good.
for the second prompt, participants responded on a -point scale with " " meaning very hard, " " meaning hard, " " meaning moderate, " " meaning easy, and " " meaning very easy. to maintain consistency towards replicating the original report by kruger and dunning ( ), subjects were grouped for analyses in the same fashion as the original paper by separating participants into four quartiles depending on their test accuracy and investigating group differences among those quartiles. group membership was based upon subjects' performance on the item recognition test. subjects' accuracy scores on the memory task (measured as the probability of a hit minus the probability of a false alarm, phit-pfa) were ranked from smallest to largest and split into quartiles of performance (<= %, > % to %, > % to %, and > %), and participants who fell into these quartiles comprised the lowest, second, third, and highest quartiles. participants were then regrouped into what we call 'estimator groups', or participants that over-estimated, correctly estimated, and under-estimated their percentile ranking. to make these estimator groups, first, the percentile rankings described above were given scores of through that directly corresponded to the scale used by subjects to estimate their performance percentile group both in- and post-test. for example, a participant who scored in the st percentile was assigned a value of while a participant who scored in the nd percentile was assigned a value of . this allowed the subtracting of each actual percentile score from the participant's estimated percentile score. we used the post-test relative dunning-kruger estimate, so as to be consistent with the original approach used in kruger & dunning ( ), although we also conducted a paired t-test between the average of the in-test dunning-kruger responses (m = . , sd = . ) for each person to the post-test relative dunning-kruger response (m = . , sd = . 
) and found that the two scores did not differ, t( ) = . , p = . . electrophysiological analyses. physiological measurements of brain activity were recorded using eeg equipment from brain vision, llc. all eeg data were processed using the erplab toolbox in matlab (delorme & makeig, ; lopez-calderon & luck, ). the eeg data were first re-referenced to the average of the mastoid electrodes, passed through a high-pass filter at . hertz (hz), and then down-sampled. in order to maintain sufficient signal-to-noise ratio (snr), all comparisons relied upon including only those subjects who met a criterion of having a minimum number of artifact-free erp trials per condition being contrasted (gruber & otten, ; kim et al., ; otten et al., ; c.f. luck, ). erps of individual subjects were combined to create a grand average, and mean amplitudes were extracted for statistical analyses. topographic maps of scalp activity were created to assess the spatial distribution of effects. for erp figures, a hz low pass filter was applied to erps so as to parallel the similar 'smoothing' function that ensues from taking the mean voltage between two latencies during standard statistical analyses (i.e. addante, ). erp results are reported for representative electrode sites but were also found to be reliable at surrounding -site clusters of electrodes unless otherwise noted. recognition memory response distributions for recognition of old and new items are displayed in table . item recognition accuracy was calculated as the proportion of hits (m = . , sd = . ) minus the proportion of false alarms (m = . , sd = . ) (i.e. phit-pfa) (addante et al., ; a, b). participants performed item recognition at relatively high levels (max = . , min = . , m = . , sd = . ), which was greater than chance probability, t( ) = . , p < . . in addition, participants' accuracy for high confidence item recognition trials (' 's') was significantly greater than for low confidence item recognition trials (' 's'), t( ) = . 
, p < . . source memory response distributions for recognition of old and new items are displayed in table . source memory accuracy values were collapsed to include high and low source confidence responses, which were then divided by the sum of items receiving a correct and incorrect source response to calculate the proportion (addante et al., a, b; roberts et al., ). mean accuracy for source memory was . (sd = . ) and was reliably greater than chance, t( ) = . , p < . . the results of item memory confidence and source memory confidence scores and erps replicated the previous findings of addante , as reported in further detail by muller ( ). dunning-kruger response categories for the post-test and in-test dunning-kruger responses are shown in table . when plotted against actual performance, subjects' reported performance estimates revealed that the canonical dke was evident, thereby replicating the dke and extending it to our novel episodic memory paradigm (figure ). to quantify and analyze this effect, the participants were first split into quartiles based on memory accuracy (the procedure for grouping subjects based upon estimated performance vs actual performance was described in detail earlier in the methods). the average memory test accuracy, organized by quartile and each quartile's respective average post-test dunning-kruger response, is listed in table . parameter estimates of decision processes during memory retrieval. another possible account of the dke results is that subjects may have differentially engaged with the task, or that results may reflect different decision-making strategies (we thank two anonymous reviewers for raising these possibilities). 
to assess this possibility, we conducted analyses to determine whether subjects were using any discernibly different decision processes reflecting differential engagement in the memory task, using the roc toolbox to calculate parameter estimates of their decision criterion (c), response bias (b), recollection (ro), and familiarity (f) process contributions to performing the memory task (yonelinas, ; ; park & yonelinas, ; yonelinas et al., ). a one-way anova was performed to identify potential differences among groups (under-estimators, correct-estimators, and over-estimators) on each of the parameters. there were no reliable differences observed among groups (figure ), corresponding to the underlying performance differences on the memory test between groups. thus, outside of their core performance on the task, it appears that subjects were meaningfully engaged in the task in similar ways unattributable to factors of strategies, task-engagement, or decision-making differences of the groups. response speed for dunning-kruger judgments. reaction times (rts) could not be measured in previous studies of the dke because extant studies included only a single measure of self-estimate of performance at the end of a task, which is not adequate for rt analyses. the current study, however, collected thirty dke judgements per subject (n = ), which permitted analysis of response times during these phenomena, in order to gain insight into how the different groups might have performed the task differently (table ). however, because our hypotheses specifically concerned how people made illusory metacognitive judgments of being either among the best or the worst performers, we also specifically analyzed the reaction times for when subjects reported performing either the best (' ') or the worst (' '), as a function of estimator group, to explore potential differences in cognitive strategies used to make these illusory self-estimates. 
for this analysis, the under-estimator group (n = ) is inherently defined as having a limited number of trials of responding that they believed they were the best percentile, and so this naturally reduced the sample of available subjects with sufficient trials for these sensitive behavioral analyses ( . this was an exploratory analysis using small samples that, like all scientific findings, will benefit from corroboration by independent laboratories. however, these results persisted despite the small sample sizes of the groups, and the patterns suggest that future studies using larger groups may find similar patterns. the pattern of responding revealed evidence that people who erred to overestimate their abilities were also responding faster when they believed they were doing the best and slower to say they were the worst, whereas more-accurate underestimators were slower to say they were the best and relatively quicker to say they were the worst. response times at encoding. one possibility to account for the dunning-kruger effect is that group-level differences could be due to how people encoded the information into memory (craik & lockhart, ; ; indeed, early accounts of the dunning-kruger effect have posited that results can be due to competency of subjects that can be corrected by improving information acquisition (kruger & dunning, ) . similarly, subjects' overconfidence at retrieval could have come from excess fluency at encoding providing feelings that the information was 'easily learned' (we thank an anonymous reviewer for making this suggestion), leading them to rely upon intuitive perceptions of fluency and feelings of familiarity at retrieval that they incorrectly misattributed as better performance ( rts of over-estimators and correct-estimators (t( ) = . , p = . ) (figure ). these findings appear to indicate that under-estimators may have performed better due to having spent (slightly) more time exposed to information tested later. 
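the grouping that underlies the behavioral comparisons above (accuracy as phit-pfa, a percentile split of the sample, and over-, correct-, and under-estimator groups) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the `n_bins` parameter, and the rule that the sign of (estimated bin minus actual bin) defines the estimator group are all hypothetical, and the example values are made up.

```python
def item_accuracy(n_hits, n_old, n_false_alarms, n_new):
    """Item recognition accuracy as pHit - pFA."""
    return n_hits / n_old - n_false_alarms / n_new


def percentile_bin(score, all_scores, n_bins):
    """Map a score to a 1..n_bins percentile bin within the group.

    Assumes `score` is one of `all_scores`. `n_bins` is a hypothetical
    parameter: 4 gives quartiles; the size of the self-estimate scale
    used in the study is not recoverable from the text.
    """
    n = len(all_scores)
    rank = sum(1 for s in all_scores if s <= score)  # participants at or below
    return (rank * n_bins + n - 1) // n  # integer ceiling of rank * n_bins / n


def estimator_group(estimated_bin, actual_bin):
    """Assumed rule: the sign of (estimate - actual) defines the group."""
    diff = estimated_bin - actual_bin
    if diff > 0:
        return "over-estimator"
    if diff < 0:
        return "under-estimator"
    return "correct-estimator"


if __name__ == "__main__":
    accuracies = [0.2, 0.4, 0.6, 0.8]  # made-up pHit - pFA values
    estimates = [3, 2, 3, 3]           # made-up self-estimated bins
    for acc, est in zip(accuracies, estimates):
        actual = percentile_bin(acc, accuracies, n_bins=4)
        print(acc, actual, estimator_group(est, actual))
```

placing the actual and estimated rankings on the same bin scale is what makes the subtraction meaningful; any tolerance band around a "correct" estimate would be a further design choice not specified here.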
the eeg data were analyzed in several systematic steps to probe possible differences between metacognitive judgements and cognitive strategies. as noted in the methods section, erp analyses included only subjects who maintained a minimum number of valid erp trials for both of the erp conditions being compared, which resulted in somewhat smaller sample sizes than the original n = , as noted in each reported result's degrees of freedom. first, we assessed the data for general erp differences that could be identified between the tasks of memory and metacognition judgments. to do this, we compared the erps for all of the dunning-kruger judgments collapsed together to the erps for all item memory judgements collapsed together, to form a standard baseline for comparison, since there had not been prior erp work done in this kind of metacognitive task (figure ). this revealed that erp activity for the metacognitive dke decisions was significantly greater than that for memory judgements, starting from approximately ms and continuing through ms at almost every electrode site. these effects were maximal at the central parietal site of pz through ms ( - ms: t( ) = . , p < . ; - ms: t( ) = . , p < . ; - ms: t( ) = . , p < . ) and similarly reliable at several surrounding sites, after which the effects became maximal at mid-frontal site fz from - ms (t( ) = . , p < . ) with similar effects at surrounding sites, consistent with prior erp findings of parietal and anterior p a/b effects for novelty processing and oddball paradigms (curran, ; woodruff et al., ; kishiyama et al., ; knight, ; knight & scabini, ). this basic finding established that erps during the metacognitive judgments of the dke were reliably distinct from memory-related activity, which we continued to investigate further. are there differences in how dke groups were making their memory judgments? 
we next investigated physiological differences in memory ('old-new' effects of hits - = . ) and adjacent frontal sites, such that erps for over-estimators were far more positive than those of the under-estimators (figure ). one suggestion from these results is that the frontal effect at - ms may be characteristic of the fn erp effect related to familiarity-based processing, in that over-estimators may be relying on the less-specific memory process of familiarity or intuitions of increased fluency to guide their metacognitive judgments, instead of relying upon the more distinct recollection-related processes (yonelinas et al., ) that appear to support the people who were under-estimating their performance relative to the group (figure ). overall, these findings converge to reveal that the larger lpc magnitudes were related to higher proportions of hits on the memory test and to faster response times for recollection-related items of high confidence hits with correct source memory (addante et al., ; addante et al., a; roberts et al., ). hence, the lpc was related to recollection, and the more recollection signal (lpc) a subject had, the more likely they were to under-estimate their memory performance via their average dk responses in the metacognitive judgments (figure ). recollection thus apparently led people to exhibit more humble metacognitive self-awareness. what is special about the under-estimators during their decisions to avoid the pitfalls of illusory superiority? one line of evidence we found was that in the under-estimator group, the magnitude of the erps for metacognitive judgments from - ms at mid-frontal site fz exhibited a significant positive correlation with the average in-test dunning-kruger response given by subjects (r = . , p = . , n = ), though this relationship was not evident for the over-estimators (r = . , p = . , n = ), who exhibited the larger fn -like effects (figure , figure ). 
this suggested that the relative lack of familiarity-based processing in the under-estimators appears to be governing them towards reporting a lesser estimation of their performance in the task. that is, without the ambiguity of a familiarity signal it may leave recollection better suited to support these discriminating decisions. overall, these findings indicate that we can reliably predict how people will perform in either accurately-or inaccurately-self-estimating their abilities by knowing the extent of their recollection-related brain activity occurring beforehand on a memory task (macleod & donaldson, ; addante et al., ). the current study assessed multiple measures of dunning-kruger estimates interspersed throughout an on-going episodic memory test while eeg was recorded. the results from behavioral measures first revealed that the memory paradigm was successful at eliciting the dke. participants were separated into performance quartiles and their actual percentile ranking in the group was plotted alongside their estimated percentile ranking (figure ). the lowest performing participants in the bottom quartile were found to have substantially overestimated how highly they ranked in their groups while the highest performing participants moderately underestimated their actual ranking. this basic finding was important to identify as a starting point in a novel paradigm for studying the dke in episodic memory, and its establishment permitted us to continue to explore the data in more specified ways for both behavioral and electrophysiological domains. behavioral findings. the current study's paradigm permitted meaningful collection of reaction times for multiple dunning-kruger judgments that could be analyzed at a group level, which prior studies of the dke have not been able to investigate due to their one-time measures of metacognitive performance estimates at the completion of a study. 
we found over-estimators were discernibly faster than under-estimators in judging themselves to be in the top percentile, but they were slower to judge themselves as being in the bottom percentile; accordingly, under-estimators were relatively slower to report being among the best performers and quicker to claim they were doing poorly. there are several theoretical accounts that could be used to view these results, including elements of cognition and the traditional dunning-kruger account of double ignorance ( ). the first account uses 'cognition for prototypes' to explain the reaction time patterns. kruger and dunning's ( ) original results suggest that over-estimators do not understand that they are performing poorly and so they believe they are performing well and placing well within their participant group. this could lead to them having a very positive perception about their ability to perform well on certain types of tasks. research on prototypes has shown that answers to questions that are very obviously true (closest to one's prototype) are answered faster (for example, the question, "is a robin a bird?" will elicit a faster "yes" response than the question "is an ostrich a bird?" even though both are true) (rosch, ; collins & quillian, ). therefore, if a person's perception of themselves (or prototype of themselves) includes that they perform well on tasks, they will be more likely to give a fast response when rating themselves well as opposed to rating themselves poorly. on the other hand, if they believe they are performing poorly, this perception would oppose the prototype that they have formed, causing them to react more slowly when rating their performance negatively. the same may be true if under-estimators have formed a perception about themselves that they are only average or even below average: it would then be logical that they would be slower to rate themselves as being best and quicker to believe they are performing poorly. 
future research extending this paradigm with inclusion of 'prototype' measures of individual subjects would be able to test this hypothesis. an alternative account of the reaction time results comes from using kruger and dunning's ( ) model of double ignorance of low performers (i.e., they do not know the answer, and they do not know they are ignorant of the answer) together with the inability of high performers to estimate their place among their peers due to not realizing the weaknesses of their peer group. by this account, over-estimators would be fast to report that they are doing well because they believe they are actually doing well, while they are slow to report that they are performing poorly because they do not believe they usually perform poorly or do not want to admit to themselves that they are performing poorly because that would be inconsistent with their metacognitive world-view. accordingly, the better-performing under-estimators would be competent enough to take pause in responding that they are doing well because they know the ways in which they might have failed (due to 'competence'), and likewise also would be guided by a humble competence (knowing also what they may not know as well as they could know) to be quicker in believing they were performing poorly. the correlation analysis of rts (figure ) suggests that these judgments may be based upon recollection processing evident from the electrophysiological effects. neurophysiological findings. we began exploring the neurophysiology of the dke by examining brain activity for general differences in processing among the memory and metacognition tasks. that is, we assessed the generic extent to which these two judgment types could be established as reflecting different kinds of processing. 
erps between memory trials and self-estimate trials were found to differ reliably beginning from approximately ms into the epoch and continuing throughout the epoch to ms at almost every electrode site, being maximal first at posterior parietal sites and then later at mid-frontal regions (figure ). this pattern of erps is consistent with established properties of p erp effects, or p a and p b effects, that are known to have the same distributions of topography and latency across early/late and posterior/anterior regions, respectively, and which have been well established as being associated with novelty processing or oddball tasks (dien, spencer, & donchin, ; otten & donchin, ; simons, et al., ). this is consistent with the current paradigm in that the dke judgments were uncommon trials that appeared among the common memory trials in the test, and would have been salient stimuli for eliciting an orienting effect of attention as a novelty item (kishiyama, yonelinas, & lazzara, ; knight, ; knight & donatella, ). we next explored whether the differential metacognitive judgments were associated with differential erp patterns. when brain activity of all dunning-kruger responses was investigated together, over-estimators (those who performed worst) were found to have a higher mean erp amplitude than under-estimators at frontal electrode sites during - ms (figure , ), consistent with the fn effect of familiarity-based processing (addante et al., a, b; for review see rugg & curran, ). these early erp effects suggest that the errors of illusory superiority may be caused by an over-reliance on a generic sense of familiarity, similar to what has been found in research on the false fame effect (jacoby et al., a, b), as opposed to the more specific recollecting of the clear details from their past encounters, which would instead provide the contextual cues to guide proper placement of one's perceptual judgments. 
under-estimators (those who performed best), on the other hand, exhibited a larger lpc than over-estimators did from - ms during memory judgments (figures , ), indicating that these under-estimators may be making the decisions by reliance upon the clearer details of recollected information, as opposed to the fuzzy sense of familiarity that can come with less accuracy. in the introduction we postulated that a memory-based framework could account for the illusory errors seen in the dke, whereby these errors (and successes) can be guided at least in part by differences in the cognitive processes of memory familiarity and recollection. in such a model, familiarity would be seen as providing the foundational cognitive processing associated with a heuristic used by people unsure about the details of their past performance on the task and thus guiding them to erroneously over-estimate how well they think they did (over-guessing based upon it feeling familiar but lacking details; whittlesea, ; whittlesea & williams, ; whittlesea, ; whittlesea, jacoby, & girard, ; whittlesea & leboe, ; voss & paller, ). on the other hand, recollection would be seen as the cognitive process supporting people's abilities to correctly retrieve their memories of the past experiences with richness, detail, and contextual information bound together with the item of the episode (diana et al., ; eichenbaum et al., ). thus, having the cognitive process of recollection available would guide people to make self-assessments of performance that are more conservatively constrained by the details of the facts of that prior performance, thereby avoiding the same risk of incorrectly assuming an over-performance based merely on it seeming familiar acontextually. taken together with the behavioral findings in rts, it appears that over-estimators were 'quick to brag', whereas the high-performers were slow to judge themselves as being best, and their caution was associated with better scores. 
moreover, erp data suggested that recollecting the past with clear context and details may be an important part of helping to keep us humble, whereas relying upon mere feelings of familiarity (without being sure) may lead us to over-estimate ourselves. what may instead delineate the matter is not confidence, per se (since patients can be highly confident of familiar responses that lack both recollection and accuracy; addante et al., a, b), but rather may be found in the construct of recollecting the fuller combination of items-bound-in-context (diana et al., ; eichenbaum et al., ; addante et al., a), including specific details and contraindicating information (i.e. 'non-criterial recollection', gallo, weiss, & schacter, ; parks, ; yonelinas & jacoby, ) from which to place one's judgment. research on familiarity has identified a contribution of guessing, or fluency, to familiarity judgments that is included in its decision heuristic (voss et al., ; whittlesea, ; whittlesea & williams, ; whittlesea, ; whittlesea, jacoby, & girard, ; whittlesea & leboe, ) and that may be leading people here to jump to the wrong conclusions about themselves relative to others, similar to what has been found in the false fame effect (jacoby et al., a, b). this concept of familiarity being involved in illusory superiority judgments builds from the 'reach-around-knowledge' account (dunning, ), and the data here provide a more concrete substantiation of this within the framework of cognitive processes of episodic memory. this interpretation is, to a certain extent, consistent with prior accounts of differences in people's task competency (kruger & dunning ; schlösser et al., ; adams & adams, ; ehrlinger & dunning, ; oskamp, ; pennycook et al., ; ryvkin et al., a; c. 
sanchez & dunning, ) if differences in illusory superiority judgments are understood as being due to differences in how people encoded the initial mnemonic information either successfully or unsuccessfully into memory. (note, though, that simply being 'sure' is not sufficient on its own, either; see introduction review of the anti-correlation of memory and confidence, since success is not guaranteed with mere high confidence (chua, hannula, & ranganath, ; roediger & desoto, ; koriat et al., ; nelson & narens, ; wells, et al., ).) that is, those that did not encode information well (i.e. poor attention, motivation, or distraction; craik, luo, & sakuta, ; craik, eftekhari, & binns, ; craik, naveh-benjamin, ishaik, & anderson, ; fernandes, moscovitch, ziegler, & grady, ; galli, gebert, & otten, ; middlebrooks, kerr, & castel, ; weeks & hasher, ) would not be likely to recollect that information later, nor then be able to accurately calibrate how well they are actually performing if having to guess with heuristics of general familiarity and fluency (whittlesea, ; whittlesea & williams, ; whittlesea, ; whittlesea, jacoby, & girard, ; whittlesea & leboe, ). accordingly, analysis of reaction times during encoding revealed that over-estimators responded faster than under-estimators while learning information, thereby having less time to encode and consolidate the items into memory (figure ), which is consistent with a large body of research on fluency in memory (leynes & zish, ; bader & mecklinger, ; bruett & leynes, ; cermak, et al., ; doss, bluestone, & gallo, ; kurilla & westerman, ; leynes & addante, ; li, et al., ; nie, et al., ; thapar & westerman, ; volz, schooler, & von cramon, ; westerman, ; whittlesea & leboe, ; alter & oppenheimer, ; castel, mccabe, & roediger, ; hertzog, et al., ; serra & dunlosky, ). thus, subjects could have responded more quickly to items at encoding by virtue of their seeming more fluent. 
while these findings appear to indicate that overconfidence may have stemmed from enhanced fluency at encoding leading one to believe the information was more easily learned, this account is challenged by the finding that over-estimators had the same response times as the correct-estimators did. alternatively, it appears that under-estimators may have performed better due to having spent (slightly) more time learning information better, again supporting their later competence. future work will benefit from empirical manipulations of fluency (e.g., leynes & zish, ; bader & mecklinger, ; bruett & leynes, ; leynes & addante, ) and physiological measures during encoding. alternative interpretations: studies of decision making have provided erp evidence that p effect timing and slope are each associated with evidence accumulation in decision making tasks (o'connell et al., ; twomey et al., ; boldt, et al., ). one possibility for the current results of group differences in erps during the performance estimates is that they may reflect differential decision making and evidence accrual among subjects (for a similar model, see urai & pfeffer, ) (we thank an anonymous reviewer for pointing this out). by this account, over-estimators may have relied upon insufficient evidence accrual to make their hasty inaccurate decisions (consistent with the features of a familiarity-based signal detection process; yonelinas et al., ), whereas the under-estimators may have been slower to believe they were doing best because of evidence accrual occurring more slowly for a slower-growing integration signal (summerfield & tickle, ; twomey et al., ) (consistent with a threshold model of recollection; yonelinas et al., ; parks & yonelinas, ; yonelinas & parks, ). 
the correlation results we found were consistent with this, in that larger p signal magnitudes for dunning-kruger decisions predicted higher performance estimates in the under-estimators, as they presumably had accrued more evidence to support those judgments (twomey et al., ; o'connell et al., ; boldt, ). however, there are also a few lines of evidence weighing against this, which suggest that the results may not reflect core differences among groups in decision-making processes, attention to the task, or use of different strategies during the task. first, there were no differences across groups in their dunning-kruger response distributions, nor in their overall reaction times to the dunning-kruger decision task, which would be predicted by such accounts. second, there were no differences across groups in quantification of their use of any decision-criterion shifts (c), nor sensitivity to response bias (b) (table ). hence, while it is always possible there could be some other decision-making factor or differences that are driving the observed effects, none of the four direct measures of such indications revealed any evidence for it. an extended possibility is that higher performing people might have gravitated to live in networks of other higher performing people that make them feel less outstanding (or at least, constrain their relative calibrations or criteria), and the lesser performing people might have socially gravitated to live in similar networks of lower performing people that permit them to presume having higher relative abilities. this possibility is speculative for now and lacks available data herein, and though intuitive it also runs counter to the current findings' caution against relying upon intuition for making estimates or conclusions. 
while the current experiment is not equipped to fully explore these factors nor the roles of decision-making processes further, these interpretations present fruitful paths for future research to take these next steps of exploration. implications. this current experiment provides several novel contributions to the understanding of the dke that also leave much room for future explorations to build upon. first, to our knowledge, this is the only dunning-kruger experiment in which self-estimates on a task relative to a peer group were recorded repeatedly throughout the task, and which included physiological measures. self-estimates in prior studies are usually only acquired once, at the end of the task (adams & adams, ; burson, larrick, & klayman, ; ehrlinger & dunning, ; kruger & dunning, ; oskamp, ; pennycook, et al., ; ryvkin, krajč, & ortmann, a; sanchez & dunning, ) (although see simons ( ) for an instance in which there was a variation of the task using repeated estimates before the task itself). this innovation to the classic dunning-kruger paradigm was critical to collecting both the reaction time measures and neurophysiological measures of erps during the metacognitive self-estimates, and offers future researchers a pathway forward to continue exploring this phenomenon. our finding that under-estimators had a larger lpc than over-estimators did gives some insight into the inaccurate estimates of illusory superiority that occur in over-estimators (relative underperformers). because the over-estimators had a smaller lpc (in fact, lacked any lpc evidence at all, figure , ), this suggests that they used less recollection during the episodic memory retrieval task. it is thereby reasonable to infer that their memories for the episodic performance were diminished accordingly, leading to more metacognitive inaccuracies when trying to recall episodic events related to their performance based purely upon the availability of familiarity. 
it is possible that this deficit may also be due to relatively more-impoverished encoding/learning in these subjects leading to relative differences in memory competency among the groups (dunning, ), which can be tested in future studies. we also found evidence of differences in brain activity between under-estimators and over-estimators when collapsing across all dunning-kruger metacognitive responses. over-estimators had a larger erp mean amplitude than under-estimators at mid-frontal electrode sites from - ms, which is the characteristic position and latency of the fn that has been synonymous with familiarity in many prior studies (curran, ; friedman, ; gherman & philiastides, ; rugg et al., ; rugg & curran, ). in the framework of a memory-related interpretation of these results, one could argue that because we found an fn in this condition, over-estimators (under-performers) may have used more familiarity than under-estimators did in making these metacognitive judgments, in lieu of the recollection that under-estimators (over-performers) were apparently relying upon instead. each group was apparently arriving at fundamentally different metacognitive conclusions because they were relying upon, or being influenced by, fundamentally different neurocognitive processes of memory. this was mirrored in the behavioral data of reaction times, which revealed a crossover interaction pattern of responding: the over-estimators were quick to say they were best but slower to say they were worst, whereas the more 'humble' under-estimators were slower to say they performed best but faster to say they were worst, again suggesting fundamentally different cognitive processing between groups. this matched erp findings that under-estimators relied upon recollection instead of familiarity, whereas those who assumed they were doing better than they were actually doing relied upon familiarity instead of recollection. limitations and considerations for future research. 
the current work maintains the inherent limitations of all initial explorations: findings remain to be assessed for generalizability, tested for their boundary conditions, and independently investigated for replicability across other sample sizes and experimental variables. some of our findings required relying upon relatively small sample sizes, and though the current paradigm has been previously found to be effective in several prior studies using even smaller sample sizes of neuropsychological patients (addante, ), these findings should nevertheless serve as preliminary discoveries to motivate future work exploring larger group sizes. it should be noted, though, that in attempting to characterize the effects of under-estimators, most people in the study exhibited overestimating errors, so the majority of our relatively large sample of n = were defined as not being in the under-estimator group that avoided these errors (indeed, this more pervasive illusory error was the prime focus of the current study and maintained a relatively large erp sample size). specified analyses looked at conditions that were rarely responded to, which further reduced certain sample sizes. nevertheless, in exploring these effects we maintained rigorous controls of inclusion criteria for trials to gain effective signal-to-noise ratio (see methods), as attested by the reliable effects observed in the current work, which small sample sizes inherently make more challenging to achieve. 
additionally, while the electrophysiological results are compelling in suggesting memory effects contributing to the dunning-kruger phenomenon, we should be cautious to avoid an over-reliance upon reverse inference (poldrack, ), since other cognitive processes also likely contribute to erp effects of memory, such as implicit fluency and conceptual priming (voss et al., ; voss & paller, leynes & zish; leynes & addante, ; though see comments in addante et al., a addante et al., , b mecklinger et al., ; bridger et al., ; bader et al., ). while future work would benefit from explorations in those directions, the current work is grounded in an extensive literature of established erp findings (rugg et al., ; addante et al., a; addante et al., b; for review see rugg & curran ) and observed systematic relationships among behavioral and physiological correlates of the cognitive processes (figure ) (stiers et al., ; macleod & donaldson, ). it also remains possible, though fully unexplored, that dke for both behavior and physiology could co-vary on variables of personality, so future work integrating combinations of electrophysiology, genetics, and personality inventories could prove to be fruitful. a final limitation to the current work is that the authors, too, may be inherently subject to the pervasiveness of the dke and be over-estimating its value, misinterpreting results, or unaware of counterfactual evidence. we hope that the current research can provide value by motivating future research investigating these findings in more depth, extending them, and testing them against competing hypotheses. in conclusion, the current study adds to the literature by a series of small steps: first, it represents the first physiological measures of the dke, as well as reaction time measures of the phenomenon. 
second, the study represents an integrative new paradigm that was developed to permit measuring multiple recurring trials of dunning-kruger metacognitive judgments, which others can now use to extend our understanding further. third, this paradigmatic innovation made it possible to capture the dke in a complex episodic memory task, which extends the body of work on the dke to episodic memory tasks of item and source memory confidence measures/paradigms. together, these innovations revealed convergent insight into why people differ in this phenomenon. people who made the illusory errors of overestimating their performance did not have recollection-related processing supporting those memories and instead relied upon relatively less-accurate familiarity-based heuristics of intuition and fluency and were quicker to jump to those inaccurate conclusions. on the other hand, the more recollection signal (lpc) a subject had, the more likely they were to under-estimate their memory performance (figure ). recollection thus appeared to lead people to a more cautious and constrained metacognitive comparison to others. (cross, ; huang, ) while extending to leaders occupying both the highest and lowest offices. the basic premise of the dke is thus seemingly a fundamental force that shapes our socio-psychological universe in ways similar to how gravity shapes the backdrop of our physical universe, persisting through time and affecting everyone at some level. it takes work with self-awareness to avoid the pitfalls of illusory superiority, and surely benefits from practice. we show here that one way to do that is to avoid relying on intuition, fluency, and familiarity to make quick judgments; instead, results encourage relying on recollection of details and slower responses to reduce errors of illusory superiority. 
more experimentation is needed, but the present work identifies some of the cognitive processes involved in the errors that can lead to the leadership and safety hazards of over-and under-confidence in one's abilities. we hope that this research can serve to inspire new explorations endeavoring to discover the neural correlates of our psychological processes, towards a better understanding of ourselves and the truth of human behavior. authors data accessibility statement: data is accessible upon request. am collected the data for a master's thesis, analyzed data, and co-wrote the manuscript las programmed the study, assisted with behavioral data analysis rja designed the study, supervised all parts, analyzed data, wrote the manuscript, handled the submission process, and revised the manuscript for resubmission. note. means and sd are in milliseconds. by their actual percentile ranking. the low group consists of those in the first quartile (less than or equal to %), the second group consists of those in the second quartile (> % and <= %), the third group consists of those in the third quartile (> % and <= %), and the high group consists of those in fourth quartile (> %). participants who performed in the first quartile showed the most overestimation while participants who performed in the fourth quartile showed underestimation of their actual percentile. percentile or below corresponds to response of ' ' on the task and performing in the th percentile or above corresponds to response of ' '. the reaction times are separated by overestimators and the combined group of correct-& under-estimators collapsed due to relatively small sample sizes individually for these response bins. mean reaction times are reported in milliseconds. each black dot represents the raw score of an individual subject for each respective condition. error bars represent standard error of the mean. right: reaction times for the memory encoding task per each group. 
black bar represents the mean for each group. memory judgments compared to all dunning-kruger judgments (dk judgments minus memory judgments). each topographic map is range normalized according to its maximum and minimum values per latency. warmer colors represent more positive-going voltage differences, with scales for each noted beneath each map. b) erps for memory and metacognition tasks at central parietal site pz; x-axis is time in milliseconds, y-axis is µv. c) erps for memory and metacognition tasks at mid-frontal site fz. topographic maps show erps of collapsed dunning-kruger responses (dunning-kruger judgments , , , , and combined) for over-estimators compared to under-estimators from - ms. topographic map is range normalized to maximum and minimum values, warmer colors represent more positive-going voltage differences. right: erps for dunning-kruger judgments of over-estimators and under-estimators at mid-frontal site of fz. x-axis represents the magnitude of the lpc effect for both under-estimator and over-estimator groups combined (lpc measured as erps for hits minus correct rejections at left parietal site p from - ms during item recognition memory test) (top left, top right, bottom left panels). bottom right panel x-axis represents the amplitude of mid-frontal erps for metacognitive judgments from - ms during the in-test dunning-kruger performance estimate task, separated by group. y-axis represents the proportion of successful item memory hits (judgments of or for 'old' status items during memory retrieval task) (top left); reaction times in milliseconds to recollection-related trials in which subjects got both a high item confidence hit and source memory judgment correct (top right); average in-test performance estimate given by subjects during the metacognitive dk judgments. 
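as an illustration of how a difference measure such as the lpc effect described in the caption above (erps for hits minus correct rejections, averaged over a latency window at one electrode) could be computed, the following is a minimal python sketch. the function names, the toy waveforms, and the 400 to 800 ms window are hypothetical placeholders; they are not taken from the study's analysis pipeline, which is not specified here.

```python
from statistics import mean


def mean_amplitude(erp_uv, times_ms, start_ms, end_ms):
    """Mean voltage (in microvolts) of one ERP waveform within [start_ms, end_ms]."""
    window = [v for v, t in zip(erp_uv, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the requested latency window")
    return mean(window)


def old_new_effect(hits_uv, correct_rejections_uv, times_ms, start_ms, end_ms):
    """Difference in mean amplitude: hits minus correct rejections."""
    return (mean_amplitude(hits_uv, times_ms, start_ms, end_ms)
            - mean_amplitude(correct_rejections_uv, times_ms, start_ms, end_ms))


if __name__ == "__main__":
    # Toy data only: one sample every 100 ms at a single electrode.
    times = [0, 100, 200, 300, 400, 500, 600, 700, 800]
    hits = [0.0, 0.0, 0.0, 0.0, 3.0, 4.0, 5.0, 4.0, 4.0]
    crs = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    # Placeholder window; substitute the latency range used in the analysis.
    print(old_new_effect(hits, crs, times, 400, 800))
```

a per-subject value computed this way is the kind of quantity that could then be plotted on the x-axis of a scatter plot against behavioral measures.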
confidence in the recognition and reproduction of words difficult to spell examining erp correlates of recognition memory: evidence of accurate source recognition without recollection prestimulus theta activity predicts correct source memory retrieval a critical role of the human hippocampus in an electrophysiological measure of implicit memory pre-stimulus neural activity predicts successful encoding of inter-item associations neurophysiological evidence for a recollection impairment in amnesia patients that leaves familiarity intact pre-stimulus neural activity predicts successful encoding of inter-item associations the effect of retrieval cues on post-retrieval monitoring in episodic memory: an electrophysiological study the self in social judgment the better-than-average effect uniting the tribes of fluency to form a metacognitive nation two methods of reducing overconfidence when knowledge knows no bounds: self-perceived expertise predicts claims of impossible knowledge separating event-related potential effects for conceptual fluency and episodic familiarity why the titanic sank the need to belong: desire for interpersonal attachments as a fundamental human motivation affect and accuracy in recall: studies of "flashbulb" memories confidence predictions affect performance confidence and neural preparation in perceptual decision making prefrontal transcranial direct current stimulation (tdcs) enhances behavioral and eeg markers of proactive control overconfidence in ignorant experts evaluations of self and others: self-enhancement biases in social judgments across the (not so) great divide: cultural similarities in selfevaluative processes understanding the better than average effect: motives (still) matter flashbulb memories event-related potentials indicate that fluency can be interpreted as familiarity skilled or unskilled, but still unaware of it: how perceptions of difficulty drive miscalibration in relative comparisons transcranial direct current 
stimulation modulates pattern separation the effect of instructor fluency on students' perceptions of instructors, confidence in learning, and actual learning illusions of competence and overestimation of associative memory for identical items: evidence from judgments of learning biases in social comparative judgments: the role of nonmotivated factors in above-average and comparative-optimism effects fluency versus conscious recollection in the word completion performance of amnesic patients an action-independent signature of perceptual choice in the human brain retrieval time from semantic memory the analects of confucius effects of aging and divided attention on memory for items and their contexts effects of divided attention at encoding and retrieval: further data divided attention during encoding and retrieval: differential control effects levels of processing: a framework for memory research how chronic self-views influence (and mislead) self-assessments of task performance: self-views shape bottom-up experiences with the task not can, but will college teaching be improved? new directions for higher education prefrontal cortex contributions to episodic retrieval monitoring and evaluation effects of attention and confidence on the hypothesized erp correlates of recollection and familiarity the descent of man, and selection in relation to sex. 
place of publication not identified mobilereference eeglab: an open source toolbox for analysis of single-trial eeg dynamics including independent component analysis the effects of unitization on familiarity-based source memory: testing a behavioral prediction derived from neuroimaging data localization of the event-related potential novelty response as defined by principal components analysis two mechanisms of constructive recollection: perceptual recombination and conceptual fluency the influence of directed attention at encoding on source memory retrieval in the young and old: an erp study the dunning-kruger effect: on being ignorant of one's own ignorance the overconfidence effect in social prediction task-related and item-related brain processes of memory retrieval effects of aging on source monitoring: differences in susceptibility to false fame how chronic self-views influence (and potentially mislead) estimates of performance the medial temporal lobe and recognition memory the anchoring-and-adjustment heuristic: why the adjustments are insufficient brain regions associated with successful and unsuccessful retrieval of verbal episodic memory as revealed by divided attention knowing with certainty: the appropriateness of extreme confidence metacognition and cognitive monitoring: a new area of cognitivedevelopmental inquiry the cognitive aging of episodic memory: a view based on the event-related brain potential available processing resources influence encoding-related brain activity before an event reducing false recognition with criterial recollection tests: distinctiveness heuristic versus criterion shifts flashbulb memories of the paris attacks neural representations of confidence emerge from the process of decision formation during perceptual choices the dunning-kruger effect is (mostly) a statistical artefact: valid approaches to testing the hypothesis with individual differences data using recently acquired knowledge to self-assess understanding in the 
classroom. scholarship of teaching and learning in psychology modulating human memory via entrainment of brain oscillations the contribution of judgment scale to the unskilled-and-unaware phenomenon: how evaluating others can exaggerate over-(and under-) confidence overconfidence in self-assessment of motor skill performance the relationship between the right frontal old/new erp effect and post-retrieval monitoring: specific or nonspecific? right dorsolateral prefrontal cortex is engaged during post-retrieval processing of both episodic and semantic information witness testimony: psychological, investigative and evidential perspectives erp evidence for the control of emotional memories during strategic retrieval encoding fluency is a cue used for judgments about learning a ten-year follow-up of a study of memory for the attack of uncertainty from internal and external sources: a clear case of overconfidence age differences in the neural correlates of the specificity of recollection: an event-related potential study when peers are not peers and don't know it: the dunning-kruger effect and self-fulfilling prophecy in peer-review comparing electrophysiological correlates of judgment of learning and feeling of knowing during face-name recognition strategic retrieval in a reality monitoring task nonanalytic cognition: memory, perception, and concept learning becoming famous overnight: limits on the ability to avoid unconscious influences of the past becoming famous without being recognized: unconscious influences of memory produced by dividing attention becoming famous without being recognized: unconscious influences of memory produced by dividing attention are the unskilled really that unaware? 
an alternative explanation the contribution of recollection and familiarity to yes-no and forced-choice recognition tests in healthy subjects and amnesics the von restorff effect in amnesia: the contribution of the hippocampal system to novelty-related memory enhancements contribution of human hippocampal region to novelty detection anatomic bases of event-related potentials and their relationship to novelty detection in humans visual short-term memory for high resolution associations is impaired in patients with medial temporal lobe damage the roc toolbox: a toolbox for analyzing receiver-operating characteristics derived from confidence ratings close but no cigar: spatial precision deficits following medial temporal lobe lesions provide novel insight into theoretical models of navigation and memory the effects of encoding fluency and retrieval fluency on judgments of learning are the unskilled really that unaware? an alternative explanation unskilled, unaware, or both? the better-than average heuristic and statistical regression predict errors in estimates of own performance unskilled and unaware of it: how difficulties in recognizing one's own incompetence lead to inflated self-assessments processing fluency affects subjective claims of recollection consistency of flashbulb memories of september over long delays: implications for consolidation and wrong time slice hypotheses reconceptualizing individual differences in self-enhancement bias: an interpersonal approach neurophysiological evidence that perceptions of fluency produce mere exposure effects event-related potential (erp) evidence for fluencybased recognition memory variations in retrieval monitoring during action memory judgments: evidence from event-related potentials (erps) processing fluency hinders subsequent recollection: an electrophysiological study do those who know more also know more about how much they know? 
organizational behavior & human performance leading questions and the eyewitness report semantic integration of verbal information into a visual memory the formation of false memories eyewitness testimony: the influence of the wording of a question erplab: an open-source toolbox for the analysis of event-related potentials investigating the functional utility of the left parietal erp old/new effect: brain activity predicts within but not between participant variance in episodic recollection do people overestimate their information literacy skills? a systematic review of empirical evidence on the dunning-kruger effect overconfidence as a result of incomplete and wrong knowledge recollection-based prospective metamemory judgments are more accurate than those based on confidence: judgments of remembering and knowing (jorks) impact of oscillatory tdcs targeting left prefrontal cortex on source memory retrieval event-related potential evidence for a specific recognition memory deficit in adult survivors of cerebral hypoxia evidence that judgments of learning are causally related to study choice selectively distracted: divided attention and memory for important information the trouble with overconfidence resisting false recognition: an erp study of lure discrimination neurological correlates of the dunning-kruger effect neural correlates of judgments of learning-an erp study on metacognition memory and law phantom flashbulbs: false recollections of hearing the news about challenger when people's judgments of learning (jols) are extremely accurate at predicting subsequent recall: the "delayed-jol effect sensitivity of reality monitoring to fluency: evidence from behavioral performance and event-related potential (erp) old/new effects a supramodal accumulation-tobound signal that determines perceptual decisions in humans overconfidence in case-study judgments relationship between p amplitude and subsequent recall for distinctive events: dependence on type of 
distinctiveness attribute assuming too much from 'familiar' brain potentials the role of noncriterial recollection in estimating recollection and familiarity evidence for a memory threshold in second-choice recognition memory responses the effects of exposure to differing amounts of misinformation and source credibility perception on source monitoring and memory accuracy dunning-kruger effects in reasoning: theoretical implications of the failure to recognize incompetence inferring mental states from neuroimaging data: from reverse inference to large-scale decoding a unified framework for the functional organization of the medial temporal lobes and the phenomenology of episodic memory dissociable correlates of recollection and familiarity within the medial temporal lobes structural bases of typicality effects entrainment enhances theta oscillations and improves episodic memory the neurobiology of learning and memory event-related potentials and recognition memory dissociation of the neural correlates of implicit and explicit memory are the unskilled doomed to remain unaware are the unskilled doomed to remain unaware differently confident: susceptibility to bias in perceptual judgments of size interacts with working memory capacity overconfidence among beginners: is a little learning a dangerous thing electrocortical signs of levels of processing: perceptual analysis and recognition memory adaptive constructive processes and the future of memory the cognitive neuroscience of constructive memory: remembering the past and imagining the future episodic future thinking and episodic counterfactual thinking: intersections between memory and decisions memory and law: what can cognitive neuroscience contribute? how unaware are the unskilled? 
empirical tests of the "signal extraction" counterexplanation for the dunning-kruger effect in self-evaluation of performance on the panculturality of self-enhancement and self-protection motivation: the case for the universality of self-esteem confidence and accuracy in judgements using computer displayed information does retrieval fluency contribute to the underconfidence-with-practice effect as you like it remembering september th: the role of retention interval and rehearsal on flashbulb and event memory confidence estimates on the correctness of constructed and multiple-choice responses unskilled and optimistic: overconfident predictions despite calibrated knowledge of relative skill on the relationship of p a and the novelty-p judgments of learning do not reduce to memory encoding operations: event-related potential evidence for distinct metacognitive processes immediate judgments of learning predict subsequent recollection: evidence from event-related potentials evidence for the differential impact of time and emotion on personal and event memories for metamemory, distinctiveness, and event-related potentials in recognition memory for faces recognition memory and the medial temporal lobe: a new perspective an investigation into the dunning-kruger effect in sport coaching reverse inference of memory retrieval processes underlying metacognitive monitoring of learning using multivariate pattern analysis the p as a build-to-threshold variable (commentary on twomey et al.) are we all less risky and more skillful than our fellow drivers? 
confidence, not consistency, characterizes flashbulb memories aging and fluency-based illusions in recognition memory judgment under uncertainty: heuristics and biases the classic p encodes a build-to-threshold decision variable overconfident prediction of future actions and outcomes by self and others it just felt right: the neural correlates of the fluency heuristic conceptual priming and familiarity: different expressions of memory during recognition testing with distinct neurophysiological correlates more than a feeling: pervasive influences of memory without awareness of retrieval neural correlates of conceptual implicit memory and their contamination of putative neural correlates of explicit memory neural substrates of remembering: event-related potential studies a night to remember divided attention reduces resistance to distraction at encoding but not retrieval relative fluency and illusions of recognition memory two routes to remembering (and another to remembering not) the heuristic basis of remembering and classification: fluency, generation, and resemblance the source of feelings of familiarity: the discrepancy-attribution hypothesis illusions of familiarity illusions of immediate memory: evidence of an attributional basis for feelings of familiarity and perceptual quality the heuristic basis of remembering and classification: fluency, generation, and resemblance two fluency heuristics (and how to tell them apart) dual-process theory and signal-detection theory of recognition memory a continuous dual-process model of remember/know judgments an electrophysiological investigation of the relationship between conceptual fluency and familiarity electrophysiological dissociation of the neural correlates of recollection and familiarity recognition memory rocs and the dual-process signaldetection model: comment on glanzer the nature of recollection and familiarity: a review of years of research recollection and familiarity: examining controversial assumptions 
and new directions noncriterial recollection: familiarity as automatic, irrelevant recollection receiver operating characteristics (rocs) in recognition memory: a review the hippocampus supports high-resolution binding in the service of perception, working memory and long-term memory dissociation of the electrophysiological correlates of familiarity strength and item repetition training college students to assess accurately what they know and don't know discrete fixed-resolution representations in visual working memory opposite effects of capacity load and resolution load on distractor processing key: cord- - b lfz authors: kurokawa, shunji; hashimoto, yoshihide; funamoto, seiichi; yamashita, akitatsu; yamazaki, kazuhiro; ikeda, tadashi; minatoya, kenji; kishida, akio; masumoto, hidetoshi title: in vivo recellularization of xenogeneic vascular grafts decellularized with high hydrostatic pressure method in a porcine carotid arterial interpose model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: b lfz autologous vascular grafts are widely used in revascularization surgeries for small caliber targets. however, the availability of autologous conduits might be limited due to prior surgeries or the quality of vessels. xenogeneic decellularized vascular grafts from animals potentially substitute for autologous vascular grafts. decellularization with high hydrostatic pressure (hhp) is reported to highly preserve extracellular matrix (ecm) which would be feasible for recellularization and vascular remodeling after implantation. in the present study, we conducted xenogeneic implantation of hhp-decellularized bovine vascular grafts from dorsalis pedis arteries to porcine carotid arteries then evaluated graft patency, ecm preservation and recellularization. surgical procedure not to damage luminal surface of the grafts from drying significantly increased the graft patency at weeks after implantation (p = . ). 
after the technical improvement, all grafts (n = ) were patent with mild stenosis due to intimal hyperplasia at weeks after implantation. neither aneurysmal change nor massive thrombosis was observed even without administration of anticoagulants or anti-platelet agents. elastica van gieson and sirius red staining revealed fair preservation of ecm proteins including elastin and collagen after implantation. the luminal surface of grafts was thoroughly covered with von willebrand factor-positive endothelium. scanning electron microscopy of the luminal surface of implanted grafts exhibited a cobblestone-like endothelial cell layer similar to native vascular endothelium. recellularization of tunica media with alpha-smooth muscle actin-positive smooth muscle cells was partly observed. thus, we confirmed that hhp-decellularized grafts are feasible for xenogeneic implantation accompanied by recellularization by recipient cells. cardiovascular disease is the leading cause of death worldwide. according to the report from the world health organization, . million people died from cardiovascular diseases, which accounts for approximately % of all deaths on [ ]. coronary artery disease (cad) and peripheral artery disease (pad) are major components of cardiovascular disease with increased mortality and morbidity so far [ ]. revascularization surgery including bypass grafting is an established treatment modality for cad and pad. considering the relatively small size of target arteries, synthetic vascular prostheses made from expanded polytetrafluoroethylene, dacron and others are not suitable for the revascularization of arteries with small caliber (< mm) because of their poor patency rate, despite their feasibility in large- (> mm) or medium- ( - mm) caliber arteries [ ] [ ] [ ]. autologous grafts such as internal mammary arteries, radial arteries or saphenous veins are commonly used for bypass grafting in small caliber arteries instead. 
however, the availability of autologous arterial conduits is rather limited in general. although autologous saphenous vein conduits possess higher availability and flexibility with regard to length and manipulation, it is reported that - % of patients the samples were freeze-dried and weighed (n = ). the mg of samples were minced and digested with μg/ml proteinase k in mm of tris-hcl, % sodium dodecyl sulfate, mm of nacl, and mm of ethylenediaminetetraacetic acid - na solution at °c overnight. dna was isolated with a phenol/chloroform/isoamyl alcohol ( : : ) extraction followed by ethanol precipitation. the amount of residual dna was quantified by picogreen assay. the tensile test was conducted as previously described [ ] [ ] [ ] using a universal testing machine (autograph ag-x, shimadzu) at a crosshead speed of . mm/min. the samples were cut into a dumbbell shape with a length of mm and a width of mm. the sample thickness was measured using a micrometer with μm accuracy before the tensile test. each specimen was preloaded to . n before loading. four specimens from each group were separately tested. the elastic modulus was estimated from the slope of a linear fit to the stress-strain curve. the tests were performed in the longitudinal direction of the vessels. nine to ten-month-old female clawn miniature swine ( - kg; n = ) were anesthetized with a ketamine ( mg/kg body weight) and xylazine ( . mg/kg body weight) cocktail by intramuscular administration. animals were intubated (shiley hi-lo oral tracheal tube . mm i.d. covidien japan, tokyo, japan) and maintained on - % isoflurane and l oxygen using closed-circuit inhalation. during surgery, anesthesia depth and hemodynamic state were monitored by invasive blood pressure and electrocardiogram. grafts decellularized with hhp were implanted into the right carotid artery. after a midline neck incision, the right common carotid artery was exposed for approximately cm from the carotid bifurcation. 
heparin sodium ( iu / kg) was administered before clamping of the artery. the right common carotid artery was interposed with an hhp-decellularized graft for approximately cm in end-to-end fashion using - polypropylene continuous suture. in phase group (n = ), anastomoses were accomplished under surgical conditions similar to those in human bypass grafting surgery (conventional condition). in phase group (n = ), anastomosis was carefully performed keeping the decellularized graft, especially its luminal surface, always wet with water (moist condition) and not touching its luminal surface. after anastomosis, blood flow was resumed. blood flow velocity (time averaged maximum flow velocity) was measured by color doppler ultrasound at the center of the implanted graft. the wound was closed layer by layer. all animals were administered prophylactic cefazolin sodium intravenously. after implantation, neither anticoagulants nor anti-platelet agents were administered. four weeks after transplantation, decellularized grafts were explanted under general anesthesia, and the animals were euthanized by intravenous bolus injection of potassium chloride ( - meq/kg). angiography of bilateral carotid arteries was performed by inserting a fr guiding catheter (introducer ii, terumo, tokyo, japan) at the femoral artery, with selective angiography using a fr straight catheter (glidecath, terumo) from the proximal end of the common carotid arteries. iopamidol (oyplomin® , fujipharma, toyama, japan; ml) was used as the contrast agent. angiograms were performed just after implantation surgery, weeks, and weeks after the surgery. ivus was performed at the same time points as the angiograms. visions pv . (philips japan, tokyo, japan) was used as the ivus transducer and volcano (philips japan) was used for data acquisition. ivus was performed for bilateral carotid arteries (contralateral side of implanted side for control). 
the data were recorded from the distal to the proximal anastomosis site of the implanted graft to evaluate intraluminal stenosis and morphology. the data were analyzed with imagej software (version . i, national institutes of health, bethesda, md) [ ]. explanted grafts were incised in the longitudinal direction and fixed in % paraformaldehyde for hours. after fixation, explanted grafts were divided into a proximal part and a distal part and subsequently embedded in paraffin. in each part, sections with μm thickness were prepared consecutively and subjected to hematoxylin and eosin staining, elastica van gieson staining, sirius red staining and von kossa staining, respectively. immunostaining of von willebrand factor for endothelium (anti-von willebrand factor antibody, abcam, cambridge, uk, : ) and α-smooth muscle actin for smooth muscle cells (anti-alpha smooth muscle actin antibody, abcam, cambridge, uk, : ) was performed. sirius red-stained sections were observed using a polarized light microscope (bx ; olympus, tokyo, japan). native bovine dorsalis pedis arteries, decellularized grafts before implantation, native porcine carotid arteries and explanted grafts were fixed in a . % glutaraldehyde- % paraformaldehyde- . m phosphate buffer solution (ph . ) for hours at °c. after fixation, samples were immersed in % osmium tetroxide for hours at °c, and were dehydrated by graded ethanol ( %, %, %, %, %, % and %) for min. subsequently, the samples were dried and coated with a thin layer of platinum palladium using an ion sputtering device (eiko corp., tokyo, japan). the samples were examined with a hitachi s- scanning electron microscope (hitachi, tokyo, japan). all data analyses were performed using jmp version . . (sas institute, cary, nc, usa). statistical analysis of the data was performed with unpaired t-tests or fisher's exact test for groups. p < . was considered significant. values are reported as means ± sd. 
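to make the patency comparison concrete, the following is a minimal python sketch of a two-sided fisher's exact test on a 2 x 2 table (patent versus occluded grafts in two groups). it is a standard-library reimplementation for illustration only, not the jmp procedure used for the analysis, and the counts in the usage lines are hypothetical placeholders rather than the study's data. the two-sided p value here sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are groups, columns are outcomes (e.g. patent, occluded).
    Returns the sum of hypergeometric probabilities of all tables with the
    same margins that are at most as probable as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * tolerance)


if __name__ == "__main__":
    # Hypothetical counts: group A has 0 patent / 4 occluded grafts,
    # group B has 5 patent / 0 occluded grafts.
    print(fisher_exact_two_sided(0, 4, 5, 0))
```

with very small groups such as these, an exact test is preferable to a chi-square approximation because expected cell counts fall well below the usual thresholds.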
a macroscopic view of a bovine dorsalis pedis artery decellularized with the hhp method is shown in figure a . the amount of residual dna was measured to quantify the efficiency of cell removal. the amount of residual dna in decellularized grafts was . ± . ng/mg, which was significantly lower than that in bovine arteries, . ± . ng/mg (p < . ) (fig. b) . a tensile test was performed to measure the physical strength of decellularized grafts. we started our experiments under surgical conditions equivalent to those of usual human revascularization surgeries with respect to the preparation and anastomosis of autologous vascular grafts such as saphenous vein grafts (phase ; n = ). we found that all implanted grafts were occluded at weeks after implantation ( / ; % patency in phase ). we considered that the major reason for the occlusion might be damage to the intima and exposure of the basement membrane caused by drying of the luminal side, which may affect antithrombogenicity and patency [ ] . we therefore modified the anastomosis conditions to a moist condition in which the graft lumen was kept from drying as much as possible (phase ; n = ) ( fig. a,b ; supplemental video ). in phase , we confirmed that all grafts were patent at weeks after implantation ( / ; % patency in phase ) (phase vs phase ; p = . ). there was no significant difference in time averaged maximum flow velocity just after anastomosis between phase and phase (fig. c,d) . in all cases of phase , moderate stenoses of grafts were observed in the proximity of both anastomosis sites by selective angiogram (fig. a) . intimal thickening of the corresponding regions was also confirmed by ivus (fig. b) . stenosis ratios of the proximal and distal anastomotic regions were . ± . % and . ± . %, respectively. we optimally evaluated all explanted grafts at weeks after implantation. the luminal side of all explanted grafts was macroscopically smooth, but some explanted grafts were accompanied by a small amount of thrombi (fig. c) . 
no aneurysmal change was observed in phase cases. hematoxylin and eosin staining revealed that the hhp-decellularized graft did not contain cell nuclei, indicating sufficient decellularization by hhp. on the other hand, recellularization in all layers of the grafts was confirmed at weeks after implantation (fig. a) . elastica van gieson staining and sirius red staining showed fair preservation of the elastin layer (internal elastic lamina) and of a tunica media consisting of collagen fibers and stratified elastin layers in hhp-decellularized grafts, as well as recellularization at weeks after implantation (fig. b) . polarized light microscopy of the striated elastin layer revealed that collagen i deposition was preserved among the elastin layers before implantation, and that newly produced collagen iii was deposited in the same region after implantation (supplemental fig. ) . immunostaining for von willebrand factor-positive endothelium revealed that the intima of implanted grafts was fairly covered by an endothelial cell layer throughout the graft (fig. c) . α-smooth muscle actin (αsma)-positive vascular smooth muscle cells were observed within the tunica media (fig. d) . these results indicate that the hhp-decellularized vascular grafts are recellularized by host-derived vascular cells in accordance with the anatomical allocations of native arteries. we evaluated the stenotic regions in the proximity of the proximal and distal anastomoses. hematoxylin and eosin staining showed that the hypertrophic regions were filled with proliferated cellular components. immunostaining for von willebrand factor revealed a thin endothelial cell layer covering the luminal surface. immunostaining for αsma showed that the stenotic region consisted mainly of proliferated smooth muscle cells located between the surface endothelial cell layer and the internal elastic lamina ( fig e) . 
von kossa staining showed small deposits of calcified nodules close to the suture line, which were not observed in other regions of the graft (supplemental fig. ) . immunostaining revealed that the implanted grafts were not infiltrated with inflammatory cells except in some regions close to the luminal surface (supplemental fig. ). the ultrastructure of the luminal surface of the vessels was evaluated by sem. the luminal surface of the native bovine artery (graft animal) was covered with endothelium exhibiting a cobblestone-like appearance (fig. a) . after decellularization by hhp processing, the luminal surface of decellularized grafts was smooth and acellular, without endothelium (fig. b) . the luminal surface of the decellularized bovine graft implanted in the porcine carotid artery for weeks showed endothelium with a cobblestone-like appearance, indicating fair recellularization similar to that seen in bovine and porcine (recipient animal) native arteries (fig. a , c, d). for patients requiring revascularization surgery, unavailability of autologous bypass grafts may lead to loss of an opportunity for appropriate therapy and a consequently poor prognosis. although cryopreserved arterial allografts can be a surrogate for autologous small-caliber grafts with acceptable long-term patency, their availability is rather limited because of the shortage of donors and insufficient operation of organ / tissue banks to preserve and provide allografts [ ] [ ] [ ] . the present study was designed to address this healthcare problem by validating the feasibility of xenogeneic implantation of hhp-decellularized vascular grafts using bovine arterial grafts and a porcine carotid arterial interposition model. numerous methods for decellularization of living tissues have been reported so far [ ] [ ] [ ] . 
in previous reports of the implantation of small-caliber vascular grafts decellularized by chemical or biological methods, insufficient outcomes such as graft occlusion, thrombus formation, and intimal proliferation throughout the graft were observed [ , ] . we previously reported that hhp decellularization, a novel physically based method, can almost completely wash out the cellular components of porcine aorta and radial arteries while preserving mechanical properties such as elastic modulus [ ] . in the present study, we decellularized bovine dorsalis pedis arteries by hhp and confirmed that the efficiency of cellular wash-out was > %, calculated from residual dna amounts compared to those of untreated bovine dorsalis pedis artery. residual dna amounts in the present study ( . ± . ng/mg) satisfied the recommended criterion for successful decellularization of < ng/mg [ ] . in mechanical tensile tests, there was no significant difference between hhp-decellularized and native arteries in the early and late elastic modulus, representing the elastin phase and the collagen phase, respectively [ ] . in the present study, implantation surgeries were performed in phases that differed in surgical conditions, especially in the moistness of the graft (phase ; conventional condition, phase ; no-touch handling of the luminal surface with the graft kept moist). flow patterns in phases immediately after implantation did not differ from each other, indicating that the quality of the anastomoses did not differ between the phases. however, the patency was significantly lower in phase compared to that in phase at weeks after implantation. in phase , all implanted grafts resulted in thromboembolism, whereas all grafts in phase were patent with a small amount of thrombi without any postoperative antiplatelet drugs or anticoagulants. 
these results suggest that the luminal surface of hhp-decellularized vascular grafts possesses fair antithrombogenicity when the intimal surface was not damaged by intraoperative grasping or drying. although a careful manipulation would be required in bypass grafting surgeries, hhp-decellularized vascular grafts might hold promise as a novel vascular graft without antiplatelet agents or anticoagulants in the future. xenogeneic hhp decellularized graft showed feasible capacity for recellularization and vascular remodeling without thrombogenicity. hhp decellularized vascular grafts may be utilized as new medical products for revascularization surgeries. world health organization. ncd mortality and morbidity comparison of global estimates of prevalence and risk factors for peripheral artery disease in and : a systematic review and analysis the tissue-engineered vascular the use of expanded polytetrafluoroethylene (ptfe) grafts for myocardial revascularization aorta-coronary bypass grafting with polytetrafluoroethylene conduits. 
early and late outcome in eight patients role of prosthetic conduits in coronary artery bypass grafting optimal conduit choice in the absence of single-segment great saphenous vein for below-knee popliteal bypass early systemic cellular immune response in children and young adults receiving decellularized fresh allografts for pulmonary valve replacement the use of high-hydrostatic pressure treatment to decellularize blood vessels decellularized porcine aortic intima-media as a potential cardiovascular biomaterial porcine radial artery decellularization by high hydrostatic pressure application of a vacuum pressure impregnation technique for rehydrating decellularized tissues nih image to imagej: years of image analysis redundant mechanism of platelet adhesion to laminin and collagen under flow: involvement of von willebrand factor and glycoprotein ib-ix-v banking of cryopreserved arterial allografts in europe: years of operation in the european homograft bank (ehb) in brussels midto long-term outcomes of cardiovascular tissue replacements utilizing homografts harvested and stored at japanese institutional tissue banks ten year experience of using cryopreserved arterial allografts for distal bypass in critical limb ischaemia decellularization combined with enzymatic removal of n-linked glycans and residual dna reduces inflammatory response and improves performance of porcine xenogeneic pulmonary heart valves in an ovine in vivo model in vivo performance of freeze-dried decellularized pulmonary heart valve allo-and xenografts orthotopically implanted into juvenile sheep decellularized scaffolds for tissue engineering: current status and future perspective multicenter evaluation of the bovine mesenteric vein bioprostheses for hemodialysis access in patients with an earlier failed prosthetic graft an overview of tissue and whole organ decellularization processes founder's award tissue heart valves: current challenges and future research perspectives the authors have 
declared that no competing interests exist. key: cord- - v qoyc authors: bauman, neda; ilić, andjelija; lijeskić, olivera; uzelac, aleksandra; klun, ivana; srbljanović, jelena; Ćirković, vladimir; bobić, branko; Štajner, tijana; djurković-djaković, olgica title: computational image analysis reveals the structural complexity of toxoplasma gondii tissue cysts date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v qoyc toxoplasma gondii is an obligate intracellular parasite infecting up to one third of the human population. the central event in the pathogenesis of toxoplasmosis is the conversion of tachyzoites into encysted bradyzoites. a novel approach to analyze the structure of in vivo-derived tissue cysts may be the increasingly used computational image analysis. the objective of this study was to quantify the geometrical complexity of t. gondii cysts by morphological, particle, and fractal analysis, as well as to determine if and how it is impacted by parasite strain, cyst age, and host factors. analyses were performed on images of t. gondii brain cysts of four type- strains (the reference me strain and three local isolates, named bgd , bgd , and bgd ) using imagej software package. the parameters of interest included diameter, circularity, relative particle count (rpc), fractal dimension (fd), lacunarity, and packing density (pd). although cyst diameter varied widely, its negative correlation with rpc was observed. circularity was remarkably close to , indicating that the shape of the brain cysts was a perfect circle. rpc, fd, and pd did not vary among cysts of different strains, age, and derived from mice of different genetic background. conversely, lacunarity, which is a measure of heterogeneity, was significantly lower for bgd strain vs. all other strains, and higher for me vs. bgd and bgd , but did not differ among me cysts of different age, and derived from genetically different mice. 
this study is the first application of fractal analysis in describing the structural complexity of t. gondii cysts. despite all the differences among the analyzed cysts, most parameters remained conserved. fractal analysis is a novel and widely accessible approach, which along with particle analysis may be applied to gain further insight into t. gondii cyst morphology. to determine if and how it is impacted by a number of parasite or host factors. additionally, morphological and particle analyses of t. gondii cysts were applied to gain further insight into the cyst shape uniformity, as well as to a possible correlation between the cyst size and the number of parasites. $\text{circularity} = 4\pi \cdot \text{area} / \text{perimeter}^2$. circular shapes have circularity values close to . , whereas elongated shapes have very low circularity (close to . ). cysts were transferred to black background images for subsequent particle analysis. for particle analysis, images were first enhanced using histogram equalization. during this relative particle count (rpc) was determined following the local bernsen auto-thresholding, watershed further segmented by a watershed algorithm for two reasons: a) it seems that, due to a low local border clearly separated particle areas. in that manner, all of the cyst cross-section images were subjected to an automated processing and the obtained particle numbers, $n_p$, were expected to be proportional to the fig c. the fd was obtained as a negative slope of the best-fit regression line: each fd calculation is accompanied by the correlation coefficient, r, describing the goodness of fit of the regression line, in a particular case. in the cumulative mass method, a probability of finding a certain pixel number within a given lacunarity of a fractal set represents the ratio of the second-order moment (variance) to the in addition to the mentioned parameters, we defined another parameter measuring the percentage of space covered by bradyzoites, and termed it 'packing density' (pd). 
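the fractal dimension described above, taken as the negative slope of a best-fit regression line together with its correlation coefficient r, can be sketched in python as follows. this is only an illustration under stated assumptions: it uses plain box counting on a binary image (the study used the imagej software package, whose exact settings are not reproduced here), foreground pixels are represented as truthy values, the image must contain at least one foreground pixel, and all function names are hypothetical. a small helper for the packing density, taken as the percentage of foreground pixels in the cross section, is included.

```python
import math

def box_count(image, size):
    """Count size-by-size boxes that contain at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(image[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count

def fractal_dimension(image, sizes):
    """Return (fd, r): fd is minus the slope of log(count) against log(size),
    and r is the Pearson correlation coefficient of that regression."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return -slope, r

def packing_density(image):
    """Percentage of pixels in the cross section that are foreground."""
    n_b = sum(1 for row in image for px in row if px)
    n_t = sum(len(row) for row in image)
    return 100.0 * n_b / n_t

# Toy check on synthetic shapes (not cyst images).
filled = [[1] * 16 for _ in range(16)]
line = [[1] * 16] + [[0] * 16 for _ in range(15)]
print(fractal_dimension(filled, [1, 2, 4, 8])[0])
print(fractal_dimension(line, [1, 2, 4, 8])[0])
```

on these synthetic shapes the estimate should come out near 2 for the filled square and near 1 for the straight line, which is a quick sanity check before applying the function to segmented images.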
the two-dimensional analysis was performed on the cyst cross-sectional areas captured on microscope. both particle analysis and fractal analysis provided the number of black pixels in a considered cross section, $n_b$. the total number of pixels in a cross section, $n_t$, was known in all cases, which allowed the packing density to be estimated as $\mathrm{pd} = (n_b / n_t) \times 100\%$ ( ). the a; fig a,b) . lacunarity, on the other hand, differed among the strains, in that it was significantly lower in the bgd strain in comparison to the three other ones (p≤ . ), as well as in bgd and bgd vs. me (tukey p= . and p= . , respectively), but not between bgd and bgd (p= . ) (fig b) . in an attempt to improve the understanding of the nature of the tissue cyst structural the analyzed cysts widely differed in diameter, with the cysts obtained from the experiment being smaller, albeit not significantly, compared to historical (archived) cysts. this may be due to a bias towards larger, more interesting cysts kept in a laboratory photo archive. advances in the life cycle of toxoplasma gondii genetic analysis of tachyzoite to bradyzoite differentiation mutants in toxoplasma gondii reveals a hierarchy of gene induction dynamics of toxoplasma gondii differentiation toxoplasma development -turn the switch on or off? 
novel approaches reveal that toxoplasma gondii bradyzoites within tissue cysts are dynamic and replicating entities in vivo fractal and multifractal analysis: a review the fractal geometry of life the goodness-of-fit of the fractal dimension as a sunflower oils quantifying biofilm structure using image analysis the fractal nature of escherichia coli biological flocs aspergillus fumigatus branching complexity in vitro: d images and dynamic modeling machine learning approach for automated screening of malaria parasite using light microscopic images efficacy of imagej in the assessment of apoptosis reliable enumeration of malaria parasites in thick blood films using digital image analysis prenatal and early postnatal diagnosis of congenital toxoplasmosis in a setting with no systematic screening in pregnancy shape descriptors of the "never resting" microglia in three different acute brain injury models in mice quantification and characterization of uvb-induced mitochondrial fragmentation in normal primary human keratinocytes characterization and measurement of random fractals fractal methods and results in cellular morphology -dimensions, lacunarity and multifractals unidentified toxoplasma-like tissue cysts in the brains of three cats. vet s data -fractal dimension and lacunarity of tissue cysts (n= ) s data -binary images of t. gondii cysts segmented by watershed algorithm (n= ) key: cord- -k z gb authors: gamboa arana, olga lucia; palmer, hannah; dannhauer, moritz; hile, connor; liu, sicong; hamdan, rena; brito, alexandra; cabeza, roberto; davis, simon w.; peterchev, angel v.; sommer, marc a.; appelbaum, lawrence g. title: dose-dependent enhancement of motion direction discrimination with transcranial magnetic stimulation of visual cortex date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: k z gb despite the widespread use of transcranial magnetic stimulation (tms) in research and clinical care, the underlying mechanisms-of-actions that mediate modulatory effects remain poorly understood. to fill this gap, we studied dose–response functions of tms for modulation of visual processing. our approach combined electroencephalography (eeg) with application of single pulse tms to visual cortex as participants performed a motion perception task. during participants’ first visit, motion coherence thresholds, -channel visual evoked potentials (veps), and tms resting motor thresholds (rmt) were measured. in second and third visits, single pulse tms was delivered ms before the onset of motion or at the onset latency of the n vep component derived from the first session. tms was delivered at %, %, %, or % of rmt over the site of n peak activity, or at % over vertex. behavioral results demonstrated a significant main effect of tms timing on accuracy, with better performance when tms was applied at n -onset timing versus pre-onset, as well as a significant interaction, indicating that % intensity produced higher accuracy than other conditions. tms effects on veps showed reduced amplitudes in the % pre-onset condition, an increase for the % n -onset condition, and monotonic amplitude scaling with stimulation intensity. the n component was not affected by tms. these findings reveal dose–response relationships between intensity and timing of tms on visual perception and electrophysiological brain activity, generally indicating greater facilitation at stimulation intensities below rmt. transcranial magnetic stimulation (tms) has become a valuable treatment option for a host of psychiatric and neurological disorders and a useful tool in the study of the psychophysiology of human cognition. 
the underlying mechanisms-of-action that lead to these effects, however, are relatively poorly understood, hence the strong emphasis on filling this gap through the ongoing national institutes of health brain initiative research priorities (nih, ) . while a rich literature now offers characterization of transient induced, and plastic long-term, effects of tms in the motor system (hallett et al., ; pascual-leone et al., ) , systematic dose-response characterization in the visual system is lacking. tms to the visual system can yield non-retinal perceptions, referred to as "phosphenes" for static flashes or "mophenes" if perceptual motion is induced (pascual-leone & walsh, ; schaeffner & welchman, ) , as well as modulation of perception from retinal input (de graaf, koivisto, jacobs, & sack, ) . notably, single-pulse tms (sptms) and paired-pulse tms (pptms) of the visual cortex have been widely reported to induce changes in the perception of visual motion (bosco, carrozzo, & lacquaniti, ; grasso et al., ; laycock, crewther, fitzgerald, & crewther, ; silvanto, lavie, & walsh, ; vetter, grosbras, & muckli, ) . across a number of studies, tms to motion sensitive cortex has been shown to influence speed (matthews, luber, qian, & lisanby, ) and direction sensitivity (campana, cowey, & walsh, ) , as well as perception of biological motion (mather, battaglini, & campana, ) . despite this accumulation of literature, the relationship between the reported physiological response and the degree of behavioral engagement across this literature is highly variable, and the dose-response relationships between tms and neurophysiology have yet to be established. studies of motion perception are particularly well-suited for exploring mechanisms of tms due to the superficial location of motion sensitive cortex and its well-characterized spatiotemporal progression of electroencephalographic (eeg) activation described by the p /n /p visual evoked potential (vep) complex. 
in this progression, the initial p is regarded as reflecting pattern-related activity of the parvo-cellular subsystem, while the subsequent n has been associated with motion perceptual sensitivity (bach & ullrich, ; kuba, kubová, kremláček, & langrová, ; martin, huxlin, & kavcic, ) and is localized to the direction-selective area v of the extra-striate visual cortex (pazo-Álvarez, amenedo, lorenzo-lópez, & cadaveira, ) . in lateralized visual attention tasks, this response has particularly been characterized as pre-attentive in the early < ms phase, and specific to spatial attention after about ms, as observed in the widely-reported n pc component (clark, appelbaum, van den berg, mitroff, & woldorff, ; luck & hillyard, ) . lastly, in tasks where response selection is made on motion stimuli, a central positive p component is frequently observed around ms (kuba, kremláček, & kubová, ; kubová, kremláček, szanyi, chlubnová, & kuba, ) that is thought to reflect attentional allocation to the stimuli (duncan-johnson & donchin, ) . while other thalamo-cortical pathways also contribute to visual motion perception, these veps offer specific testable neural markers that can be characterized to infer dose-response properties of tms. the present study builds on these two bodies of literature, tms modulation of motion perception and motion-induced veps, to characterize dose-response functions of sptms in the visual cortex. our approach was to measure concurrent tms-eeg during an individually calibrated, dot-motion, direction-discrimination task, previously developed by our group to test pptms effects (gamboa, ) . 
for this purpose, we used a three-visit study design consisting of an initial dose-finding session to derive individualized motion coherence thresholds and stimulation parameters based on the onset of the n vep component ("n -onset"), followed by two dose-testing sessions during which sptms was delivered according to the spatial, temporal, and intensity parameters derived from the first session. during the dose-testing sessions, sptms was delivered at one of two different timings, either ms before the onset of motion ("pre-onset", based on the observation that tms to v around this latency can disrupt motion perception (laycock et al., ; sack, kohler, linden, goebel, & muckli, ) ), or at the individualized n -onset latency from session . during each of these two sessions, participants performed eight blocks of trials with intermixed pulse intensities at %, %, %, and % of rmt delivered over the hotspot of the n -onset component, as well as two separate blocks of trials at % rmt at a vertex, control location. as such, this study tested the effects of sptms over intensities that have previously been reported to induce facilitatory and inhibitory perceptual effects (luber et al., ; silvanto, bona, & cattaneo, ), but at different timings relative to the motion onset and at different cortical targets. the overall goal was to map the behavioral and evoked eeg dose-response functions within the constructs of individualized spatiotemporal targeting with the expectation that sptms delivered at the onset of the n component would disrupt motion processes in the brain and lead to monotonic effects across different stimulation intensities. twenty-four healthy volunteers ( females, mean age = , sd = . ) enrolled in this -visit study. all participants were self-reported right-handed, had normal or corrected-tonormal vision, and were screened for contraindications to tms (rossi et al., ) . 
exclusion criteria included a history of neurological or psychiatric disease and/or use of psychoactive medication, a personal history of head trauma with loss of consciousness, or a family history of epilepsy or seizures. informed consent was obtained for each participant after explanation of study requirements under an experimental protocol approved by the duke university institutional review board (pro ). participants were compensated $ per hour for their time. this study consisted of three sessions, each lasting to hours, performed on separate visits within a three-week span. twenty-one participants completed the first dose-finding session, with and completers for the pre-onset and n -onset sessions, respectively. in total, of the participants completed all three sessions, while the remaining participants only completed one or two of the sessions due to scheduling conflicts, illness, or equipment difficulty. all stimuli were generated using matlab (mathworks, natick, ma, usa) and the psychtoolbox extension (brainard, ) and presented on a -inch monitor with hz refresh and × screen resolution. eeg was acquired on all three visits using a channel actichamp eeg system (brainproducts, munich, germany) and tms compatible acticap with slim active electrodes arranged according to an equidistant montage. the eeg signal was digitized at a sampling rate of khz and the online reference electrode was located at fcz. tms was delivered using a magpro r stimulator connected to a cool-b figure-of-eight coil (magventure, farum, denmark) . synchronization between the stimulation computer and tms machine was done through an arduino board that allowed a temporal precision between systems that was confirmed to be approximately ms. a brainsight (rogue research, canada) frameless stereotactic neuronavigation system was used to monitor tms coil position on the participant's head so that the stimulation location was kept as constant as possible. 
for this, coil trackers were attached to the coil and participant's head, which was registered to a standard head model using anatomical landmarks. a coil holder and a chin rest were also used to maintain stable coil and head positioning. in addition, a spacer was used to minimize contact with the electrodes and associated artifact (ruddy et al., ) , while a thin foam sheet (~ mm) was added to minimize bone conduction of the tms sound and scalp sensations caused by mechanical vibration of the coil. participants wore earplugs for hearing protection and mitigation of auditory activation by the tms pulse clicks. two experimenters were always present in the room to maintain coil positioning and supervise eeg data quality. during the first session, eligibility was assessed and consent was obtained. following this, the -channel eeg cap was fitted to the participant's head and, after applying gel, the resting motor threshold (rmt) was acquired, taking about minutes according to the procedures described below. this allowed stimulation intensity to be calculated relative to rmt with the same distance between the head and coil as used in the dose-testing sessions. participants then performed minutes of practice with the motion task ( figure a ) while eeg impedances were being adjusted and recordings were prepared. after obtaining clean eeg signals, with impedances below kΩ, the participant completed four, minute runs of the motion discrimination task. rmt was defined as the lowest intensity required to elicit a motor evoked potential (mep) of μv peak-to-peak when the muscle was at rest (conforto, z'graggen, kohl, rösler, & kaelin-lang, ) . a common dot motion discrimination task was employed across all sessions of this experiment. 
in the first 'dose-finding' session of the study, the coherence of dot motion varied randomly from trial to trial, allowing for the estimation of each individual's threshold coherence value, which was then fixed to this level for each participant in sessions and . in this task, each trial initiated with a white fixation cross appearing for ms, after which two fields of static white dots appear within . ° circular windows centered . ° to the left and right of fixation. these dots remained static for a variable duration between and ms, after which the dots moved briefly for ms at a speed of . °/s. the dots in left field moved incoherently in random directions while the dots in the right field contained some level of directional coherence, determined by the staircase procedures. on each trial a dot coherence was randomly selected to be , , , , , or %. following the motion, participants had seconds to indicate with a button press if the direction of motion was to the left or upwards. after each answer, feedback was provided with a green fixation cross for correct responses and red for incorrect responses that was presented for s. the viewing distance (eye-to-screen center) for each participant was kept constant at approximately cm and all the participants were instructed to perform the task as accurately as possible within the second allotted time. at the end of the session, a generalized linear model was applied to fit a sigmoid function to the assigned (+ /− due to dot motion direction) trial coherences and correct/incorrect responses to determine the % accuracy point on the coherence psychometric curve. if % accuracy could not be achieved, then % dot motion coherence was used. as described in greater detail in section . . , concurrent eeg was collected and analyzed for vep results. as described in section . , these analyses led to characterization of an n onset latency and location that was used for dosing in sessions and . 
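the threshold estimation described above, in which a sigmoid is fitted to trial coherences and responses and then read off at a target accuracy, can be sketched in python as a two-parameter logistic regression. this is a hedged illustration rather than the authors' matlab procedure: the function names are hypothetical, responses are grouped by coherence level, the fit uses newton's method on the binomial likelihood, and the target proportion passed to `level_at` is a placeholder because the accuracy criterion is not recoverable from the text.

```python
import math

def fit_logistic(x, k, n, iters=50, tol=1e-10):
    """Fit p = 1 / (1 + exp(-(b0 + b1 * x))) to grouped binomial data.

    x: stimulus levels (e.g. signed coherence), k: successes per level,
    n: trials per level. Returns (b0, b1) found by Newton's method.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(x, k, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)          # binomial variance weight
            g0 += ki - ni * p               # gradient w.r.t. b0
            g1 += (ki - ni * p) * xi        # gradient w.r.t. b1
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det    # Newton step: H^-1 * g
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def level_at(b0, b1, target_p):
    """Stimulus level at which the fitted curve reaches target_p."""
    return (math.log(target_p / (1.0 - target_p)) - b0) / b1

# Hypothetical grouped data: 20 trials at each of seven signed coherence levels.
levels = [-40, -20, -10, 0, 10, 20, 40]
successes = [1, 3, 6, 10, 15, 17, 19]
b0, b1 = fit_logistic(levels, successes, [20] * 7)
print(level_at(b0, b1, 0.75))  # 0.75 is a placeholder criterion
```

if the fitted curve never reaches the criterion within the tested range, `level_at` returns a value outside that range, which corresponds to the fallback case mentioned above in which a fixed coherence is used instead.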
during sessions and , participants performed the same motion task with dot stimuli presented only at the threshold coherence level determined from session . these sessions differed only in the latency at which tms was applied relative to the onset of the motion on each trial. as illustrated in figure b , biphasic sptms was applied either ms before the onset of motion (pre-onset) or at the onset latency of the n component (n -onset) derived from session . the v target was defined as the topographic location showing the most robust n response identified from the current source density maps (csd) corresponding to the n sink topography (see right column of table ) . consistent with what is known about motion sensitive cortex (silvanto et al., ) , this topographic localization pointed to electrodes over the left occipital cortex, as the regions displaying maximal activity (referred to here as v ). the center of the coil was placed tangentially to this stimulation site with the handle pointing towards the right hemisphere. on both of these sessions the procedures were identical and began with the participant practicing the motion task as the cap was placed, gel applied, and impedances checked and adjusted. after a clean eeg signal was achieved with impedances below kΩ, the participant began eight -minute blocks of trials each while receiving sptms. in six of the eight blocks, tms was delivered to the v target channel at intensities of %, %, % or % rmt, controlled remotely using a customized function from magic matlab toolbox (saatlou et al., ) . these intensities were consistently delivered for all trials within a block and changed according to a random sequence created for each block at the beginning of the session. the two remaining blocks were assigned to stimulation of the vertex during task performance at % of rmt ( trials). stimulation over vertex was performed as a control condition. vertex was defined as the scalp location corresponding to cz in the - system. 
the coil handle was parallel to the midline pointing backwards. two blocks of stimulation delivered over vertex were randomly distributed throughout the eight total blocks, as determined by an ordering calculation done at the start of each session. a one-way anova was performed to evaluate the effect of dot coherence in the initial dose-finding session. to evaluate experimental effects in the dose-testing sessions, changes in accuracy were examined by a repeated-measures anova (rmanova) with factors stimulation condition ( %, %, %, and % rmt at v and % rmt at vertex) and tms timing (pre-onset and n -onset). additionally, to provide a more sensitive assessment of the interaction between tms at different intensities and brain dynamics, accuracy data were also analyzed as the difference between each stimulation condition and the % rmt condition. this condition presents the same visual stimuli but no tms pulse or associated sounds and sensation, making it an effective baseline to normalize ongoing tonic activity. a x rmanova was performed over these differences. post hoc pairwise comparisons were tested using tukey's hsd method. a shapiro-wilk test was used to verify the assumption of normality while mauchly's test evaluated the assumption of sphericity. a greenhouse-geisser correction was adopted when the sphericity assumption was violated. statistical tests were performed using spss . (spss inc, chicago) and p values ≤ . were considered statistically significant. eeg data were preprocessed and analyzed offline using brain vision analyzer (brain products, inc.) and matlab (mathworks, natick, ma, usa), in a manner modeled after rogasch et al. ( ) . data were downsampled to hz, and tms pulses were removed via linear interpolation spanning from ms before to ms after the pulse.
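the pulse-removal step can be illustrated with a short sketch that replaces the samples around a pulse with a straight line between the flanking clean samples. the function name and the window lengths (given here in samples rather than milliseconds) are assumptions for illustration; the actual window used in the study is not restated here.

```python
def interpolate_pulse(signal, pulse_index, pre_samples, post_samples):
    """Replace samples around a TMS pulse with a line between the flanking samples."""
    # Clamp the window so that one clean sample remains on each side.
    start = max(pulse_index - pre_samples, 1)
    stop = min(pulse_index + post_samples, len(signal) - 2)
    cleaned = list(signal)
    if start > stop:
        return cleaned  # signal too short to interpolate
    left, right = signal[start - 1], signal[stop + 1]
    span = stop - start + 2  # distance between the two anchor samples
    for i in range(start, stop + 1):
        frac = (i - (start - 1)) / span
        cleaned[i] = left + frac * (right - left)
    return cleaned

# Hypothetical single-channel trace with an artifact at indices 3 and 4.
trace = [0.0, 1.0, 2.0, 100.0, -50.0, 5.0, 6.0, 7.0]
cleaned = interpolate_pulse(trace, pulse_index=3, pre_samples=0, post_samples=1)
```

in a multichannel recording the same operation would be applied to every channel and every pulse before filtering and ica.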
before applying independent component analysis (ica) to identify and discard artifactual components such as eye blinks and muscle activity, the eeg signal was bandpass filtered using a zero-phase shift , and , centered over midline parietal cortex, in alignment with previously reported visual motion p effects (kuba et al., ; kubová et al., ) . because of the biphasic morphology of the n component in this study (see figure b) figure a locations) for the closest eeg electrodes relative to the position of stimulation in session and . the coil was positioned approximately over the center of mass surrounding the electrodes, based on the n topography in visit , as guided by brainsight neuronavigation. as illustrated in figure a , participants performed the task at chance in trials that had no coherent movement ( %), with a monotonic increase in accuracy for higher coherence stimuli. veps time-locked to the onset of motion showed a waveform morphology (figure b and c) dominated by a bilateral negative posterior distribution peaking around ms, followed by a positive-going central-parietal deflection around ms, suggestive of the widely reported n and p erp components (bach & ullrich, ; kuba et al., ; martin et al., ) . the n component was identifiable in all individual participants and was used to derive the n -onset stimulation timing and location. veps were processed to calculate the latency at / max amplitude of the n component at its peak location. these locations were always at left occipital channels ( , , , and ), illustrated by the green labels in figure d . similar to previous studies using near-threshold motion onset (kuba et al., ; martin et al., ) , the vep here also did not contain a prominent p component (bach & ullrich, ; kuba et al., ; martin et al., ) .
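the onset-latency measure described above (the time at which a negative component reaches a fixed fraction of its peak amplitude) can be sketched as follows. the function name, the search window, the example waveform, and the fraction of 0.5 are assumptions for illustration only.

```python
def fractional_peak_latency(times_ms, amplitudes, window, fraction):
    """Latency on the leading flank where a negative component reaches fraction * peak."""
    idx = [i for i, t in enumerate(times_ms) if window[0] <= t <= window[1]]
    peak_i = min(idx, key=lambda i: amplitudes[i])  # most negative sample in window
    peak = amplitudes[peak_i]
    if peak >= 0:
        raise ValueError("no negative deflection in the search window")
    target = fraction * peak
    # Walk backwards from the peak while the earlier sample is still at or beyond target.
    i = peak_i
    while i > idx[0] and amplitudes[i - 1] <= target:
        i -= 1
    if i == idx[0]:
        return times_ms[i]  # criterion already met at the window start
    # Linear interpolation between the last sample above target and the first at/below it.
    t0, t1 = times_ms[i - 1], times_ms[i]
    a0, a1 = amplitudes[i - 1], amplitudes[i]
    return t0 + (target - a0) / (a1 - a0) * (t1 - t0)

# Hypothetical averaged waveform sampled every 10 ms.
times = [0, 10, 20, 30, 40, 50, 60]
vep = [0.0, 0.0, -1.0, -3.0, -4.0, -2.0, 0.0]
onset = fractional_peak_latency(times, vep, window=(0, 60), fraction=0.5)
```

interpolating between samples keeps the estimate from being quantized to the sampling interval, which matters when the latency is used to time stimulation.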
the p response has been previously described as a pattern-sensitive parvocellular-driven component, which may have been absent here due to the ms separation of the dot appearance and motion onset in this task (see supplement for more information). using performance accuracy data from the participants who completed both experimental sessions, the (tms timings) . ), with higher accuracy for n -onset, relative to pre-onset. the effect is illustrated in figure a . in addition, the tms performance for each intensity level was also adjusted relative to the % at v separately for timing condition (e.g., accuracy % -accuracy % ) as a means to control for ongoing tonic variability in behavior in the absence of sound or magnetic stimulation elicited by the pulse. rmanova results on the adjusted behavioral accuracy, shown in figure demonstrated non-significant main effects of tms one-way anova performed on the four normalized behavioral accuracy scores revealed significant differences between conditions (f[ , ] = . , p = . , η = . ), with higher accuracy for the % rmt condition relative to all the other conditions (ps one-way anova results on the late n amplitude in the v roi and the p amplitude in the cp roi revealed significant main effects of stimulus condition on the p amplitude (f[ , ] = . , p = . , η = . ), but not in the late n component (p = . ). pairwise comparisons indicated that the % at v condition resulted in higher mean amplitude than the % (p = . ) and % (p = . ) conditions, as illustrated in figure b . pearson correlations between adjusted vep amplitudes and adjusted response accuracy across individuals were explored similarly to those in the pre-onset timing. no significant correlation estimates were identified for either the late n or p components at the corrected alpha level.
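the baseline adjustment used for the behavioral scores above (each intensity minus the no-pulse condition, computed separately within each timing condition) amounts to a simple per-timing subtraction. the sketch below is illustrative only; the condition labels and accuracy values are hypothetical placeholders, not data from the study.

```python
def adjust_to_baseline(accuracy, baseline_label):
    """Subtract the baseline condition's accuracy from every other condition, per timing.

    accuracy: {timing: {condition: proportion_correct}}
    """
    adjusted = {}
    for timing, by_condition in accuracy.items():
        base = by_condition[baseline_label]
        adjusted[timing] = {
            cond: value - base
            for cond, value in by_condition.items()
            if cond != baseline_label
        }
    return adjusted

# Hypothetical proportions correct for one participant.
accuracy = {
    "pre_onset": {"no_pulse": 0.70, "low": 0.75, "high": 0.68},
    "n2_onset": {"no_pulse": 0.72, "low": 0.80, "high": 0.74},
}
adjusted = adjust_to_baseline(accuracy, baseline_label="no_pulse")
```

positive adjusted values indicate facilitation relative to the no-pulse baseline and negative values indicate disruption.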
the current study aimed to identify task-relevant dose-response functions of single pulse tms in order to examine the influence of stimulation timing and intensity on electrophysiological and behavioral responses. here, single pulse tms was applied at two latencies relative to the onset of psychometrically-calibrated, near-threshold, motion stimuli to assess the behavioral and electrophysiological changes due to tms. it was hypothesized that stimulation would lead to greater behavioral disruption in motion direction discrimination when stimulated at the onset of the n component, given the critical links between this component and motion perception, and the past reports of acute perceptual disruption due to online tms (beynel et al., ) . this was expected to interact with stimulus intensities such that higher intensities would induce greater behavioral effects. it was observed that stimulation applied at the n -onset led to behavioral facilitation in motion discrimination relative to pre-onset stimulation, however, this effect interacted with stimulation intensity revealing greater facilitation of motion discrimination accuracy at the lowest stimulation intensity of % rmt. furthermore, it was found that vep amplitudes significantly differed for the p component, with lower amplitudes in the % intensity condition for the pre-onset condition and higher amplitudes in the % intensity condition for the n -onset condition. these results suggest that timing and intensity interact with the profiles of perceptual and electrophysiological responses to near threshold motion stimuli with the additional indication that behavioral facilitation is being promoted by lower intensity stimulation. 
the early n component could not be analyzed because it overlapped with the tms pulses and was removed during data cleaning. past studies have reported a wide variety of behavioral effects that result from single or paired pulse tms over a range of factors, such as timing, intensity, and the ongoing activation state of the stimulated region (de graaf et al., ; romei, thut, & silvanto, ; sandrini, umiltà, & rusconi, ; wagner, valero-cabre, & pascual-leone, ) . although single-pulse tms has mainly been shown to be disruptive (abrahamyan, clifford, arabzadeh, & harris, ; amassian et al., ; beckers & zeki, ; desmond, chen, & shieh, ) , it has been reported that facilitation can be achieved with the right combination of stimulation timing, intensity and background brain state (abrahamyan, clifford, arabzadeh, & harris, ; abrahamyan et al., ; silvanto et al., ; silvanto, bona, marelli, & cattaneo, ) . for example, enhanced target detection has been found after stimulating the visual cortex with low intensity single pulse tms delivered and ms after stimulus onset (abrahamyan et al., , ). in these cases, it was proposed that the summation of ongoing neural responses and the excitation induced by the lower intensity tms pulse resulted in enhanced processing and behavioral facilitation. this explanation is similar to the stochastic resonance account (schwarzkopf, silvanto, & rees, ; stocks, ) which proposes that the strength of a stimulus can be improved by externally enhancing the ongoing neural activity. that is, a single pulse of tms, delivered at low intensity, will add low levels of noise to the neural system, boosting information exchange between the neurons in the stimulated cortex (silvanto et al., ) . while such a model may explain the observed effects, other factors may also contribute to facilitation or inhibition in such contexts (cattaneo & silvanto, ) .
for example, adaptation that may result from repeated presentations of a stimulus has been shown to influence tms effects with demonstrations that tms can reverse the influence of adaptation on target detection (silvanto, muggleton, cowey, & walsh, ) . similarly, tms applied during the delay between a visual priming stimulus and a target stimulus also leads to selective modulation of the target to reduce the priming effect, demonstrating that attributes encoded by less active neural populations are preferentially facilitated by tms (cattaneo & silvanto, ) . finally, as proposed in other studies demonstrating modulatory effects of tms to v in the n latency range (laycock et al., ; sack et al., ) , the present effects may reflect feedforward/feedback processing between the striate and extrastriate cortices. the current findings build on these theorized mechanisms to provide additional insight into state-dependent tms effects by describing systematic dose influences on behavior, and particularly the observed profile with greater enhancement effects at lower intensities. the combination of transcranial magnetic stimulation and electroencephalography (eeg) is a powerful tool for investigating cortical mechanisms and networks with fine temporal precision. using this technique, we expected to obtain electrophysiological evidence of the interplay between brain activity and tms conditions. our results showed that the mean activity of the p component in the cp roi produced significant differences between intensities in both timing conditions. in general, high intensity single pulse tms evoked higher mean activity values for the p component than pulses at low intensities. in the pre-onset % condition, values were significantly smaller than for the higher intensity tms conditions, while values were significantly higher for the % condition at n -onset when compared to the lower intensity conditions, but not to vertex- %.
this mean activity increase, however, did not correlate with the magnitude of behavioral effects across participants. while n amplitudes generally scaled monotonically with higher stimulation intensities, these did not differ significantly across conditions or correlate with behavioral effects. in interpreting these effects, it is possible that the eeg response obtained here may have been impacted by indirect brain stimulation. namely, despite the presence of both a no-stimulation control and a vertex control, the lack of a somatosensory-matched sham condition is a limitation of this study and the observed intensity effects may be attributed to (multi)sensory evoked activity and not necessarily the result of neural changes due to the induced magnetic field (casarotto et al., ; conde et al., ) . as discussed by others (siebner, conde, tomasevic, thielscher, & bergmann, ) , an important contributor to the tms-eeg response is the prominent auditory evoked potential associated with the clicking sound produced by tms. this auditory response is introduced by a combination of bone-conducted and air-conducted sound reaching the cochlea and activating central and parietotemporal regions bilaterally to produce a prominent n -p complex (lioumis, kičić, savolainen, mäkelä, & kähkönen, ). reducing sound levels and distancing the coil from the head have been shown to reduce the n -p complex amplitude (kähkönen, wilenius, komssi, & ilmoniemi, ; nikouline, ruohonen, & ilmoniemi, ; ter braack, de vos, & van putten, ) , and while earplugs and foam insulation were used to mitigate sound here, it is possible that these measures were not enough to suppress the auditory response. thus, future studies should strive to use a more sensitive somatosensory-matched sham control condition to better isolate tms effects.
an alternative explanation for the amplitude scaling of the p response, but not the n response, may pertain to attentional capture that is greater for higher intensity stimulation. as noted, the p response is strongly associated with attentional processes (kuba et al., ; kubová et al., ) and therefore, a plausible explanation of the effects here may stem from uncontrolled attentional allocation. an additional, related observation in this study was that during the pre-onset condition we found the presence of an early component at a latency roughly equivalent to the p . this response was not present in the % intensity condition, nor during the early stages of the eeg response in the n -onset condition, pointing to the possibility that this response may have been driven entirely by the tms pulse and not associated with the visual stimulus or the interaction between visual and magnetic stimulation. further control sessions, performed with a subset of seven participants who received tms without visual stimulation (see supplementary figure ), also produced a robust eeg response in the p range, further supporting the assumption that this response is driven by the tms pulse rather than interactions with evoked visual responses. as noted above, such findings need to be considered in the context of multisensory stimulation and associated artifacts in the tms response (bonato, miniussi, & rossini, ; kähkönen, komssi, wilenius, & ilmoniemi, ; lioumis et al., ) . while the present study offered a unique view of the dose-response relationships between tms intensity, behavior and electrophysiological brain responses, there are a number of limitations that should be improved upon in future studies. as noted in the section above, one such limitation is the possible presence of multisensory stimulation confounds that should be better controlled using somatosensory-matched electrical stimulation in future studies.
another important consideration is the mode used to define stimulation intensity, which here was delivered relative to resting motor threshold. while such intensity scaling is principled, objective, and widely used, this dosing is based upon motor system responsiveness and not visual system responsiveness. the current findings therefore need to be considered in light of previous research showing that meps can be elicited at intensities below rmt, and that in motor cortex there are distinct recruitment (i.e. input-output) curves for inhibitory processes, which typically have lower thresholds than excitatory processes (kallioniemi, säisänen, könönen, awiszus, & julkunen, ) . moreover, while there is also a literature showing that phosphene thresholds are generally higher than motor thresholds (antal, nitsche, kincses, lampe, & paulus, ; boroojerdi et al., ; deblieck, thompson, iacoboni, & wu, ) , there is also considerable heterogeneity in the ability to elicit phosphenes in individual subjects. in light of these challenges, future studies should build on the current dose-response functions to include intensity scaling based on visual responsiveness. lastly, it is worth noting that there is an important movement towards larger and better powered sample sizes in tms studies. while the current sample of individuals who completed all study activities is larger than the average sample size of , reported in a recent meta-analysis of online tms studies published last year (beynel et al., ) , future studies should continue to strive for even larger sample sizes. supplementary figure . in addition to the visual evoked potentials collected as part of main experiment, seven participants were also tested on a brief supplemental session in which % rmt stimulation was delivered over their v target as they fixated on a screen with visual motion stimulus.
here, trials of single pulse tms was delivered at random interv spaced apart between and seconds as they gazed at an unchanging fixation mark. illustrate the difference in evoked responses to tms delivered in the presence of visual mot (replicated from figure a improving visual sensitivity with subthreshold transcranial magnetic stimulation low intensity tms enhances perception of visual stimuli suppression of visual perception by magnetic coil stimulation of human occipital cortex no correlation between moving phosphene and motor thresholds: a transcranial magnetic stimulation study contrast dependency of motion-onset and pattern-reversal veps: interaction of stimulus type, recording site and response component the consequences of inactivating areas v and v on visual motion perception effects of online repetitive transcranial magnetic stimulation (rtms) on cognitive processing: a meta-analysis and recommendations for future studies transcranial magnetic stimulation and cortical evoked potentials: a tms/eeg co-registration study estimating resting motor thresholds in transcranial magnetic stimulation research and practice: a computer simulation evaluation of best methods visual and motor cortex excitability: a transcranial magnetic stimulation study contributions of the human temporoparietal junction and mt/v + to the timing of interception revealed by transcranial magnetic stimulation the psychophysics toolbox priming of motion direction and area v /mt: a test of perceptual memory eeg responses to tms are sensitive to changes in the perturbation parameters and repeatable over time investigating visual motion perception using the transcranial magnetic stimulation-adaptation paradigm improvement in visual search with practice: mapping learning-related changes in neurocognitive stages of processing the non-transcranial tms-evoked potential is an inherent source of ambiguity in tms-eeg studies impact of coil position and electrophysiological monitoring on 
determination of motor thresholds to transcranial magnetic stimulation the chronometry of visual perception: review of occipital tms masking studies correlation between motor and phosphene thresholds: a transcranial magnetic stimulation study cerebellar transcranial magnetic stimulation impairs verbal working memory the p component of the event-related brain potential as an index of information processing application of long-interval paired-pulse transcranial magnetic stimulation to motionsensitive visual cortex does not lead to changes in motion perception decoupling of early v motion processing from visual awareness: a matter of velocity as revealed by transcranial magnetic stimulation contribution of transcranial magnetic stimulation to assessment of brain connectivity and networks prefrontal transcranial magnetic stimulation produces intensity-dependent eeg responses in humans distinct differences in cortical reactivity of motor and prefrontal cortices to magnetic stimulation on the estimation of silent period thresholds in transcranial magnetic stimulation cognitive evoked potentials related to visual perception of motion in human subjects motion-onset veps: characteristics, methods, and diagnostic use visual event-related potentials to moving stimuli: normative data evidence for fast signals and later processing in human v /v and v /mt+: a tms study of motion perception reproducibility of tms-evoked eeg responses using transcranial magnetic stimulation to test a network model of perceptual decision making in the human brain electrophysiological evidence for parallel and serial processing during visual search motion-onset visual evoked potentials predict performance during a global direction discrimination task tms reveals flexible use of form and motion cues in biological motion perception transcranial magnetic stimulation differentially affects speed and direction judgments brain priority areas the role of the coil click in tms assessed with simultaneous eeg 
modulation of muscle responses evoked by transcranial magnetic stimulation during the acquisition of new fine motor skills fast backprojections from the motion to the primary visual area necessary for visual awareness effects of stimulus location on automatic detection of changes in motion direction in the human brain removing artefacts from tms-eeg recordings using independent component analysis: importance for assessing prefrontal and motor cortex network properties information-based approaches of noninvasive transcranial brain stimulation safety, ethical considerations, and application guidelines for the use of transcranial magnetic stimulation in clinical practice and research improving the quality of combined eeg-tms neural recordings: introducing the coil spacer magic: an open-source matlab toolbox for external control of transcranial magnetic stimulation devices the temporal characteristics of motion processing in hmt/v +: combining fmri and neuronavigated tms the use of transcranial magnetic stimulation in cognitive neuroscience: a new synthesis of methodological issues mapping the visual brain areas susceptible to phosphene induction through brain stimulation stochastic resonance effects reveal the neural mechanisms of transcranial magnetic stimulation distilling the essence of tms-evoked eeg potentials (teps): a call for securing mechanistic specificity and experimental rigor initial activation state, stimulation intensity and timing of stimulation interact in producing behavioral effects of tms on the mechanisms of transcranial magnetic stimulation (tms): how brain state and baseline performance level determine behavioral effects of tms double dissociation of v and v /mt activity in visual awareness neural adaptation reveals statedependent effects of transcranial magnetic stimulation suprathreshold stochastic resonance in multilevel threshold systems masking the auditory evoked potential in tms-eeg: a comparison of various methods tms over v disrupts 
motion prediction noninvasive human brain stimulation key: cord- -yzqic vt authors: liu, zhijin title: global view on virus infection in non-human primates and implication for public health and wildlife conservation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yzqic vt the pandemic outbreak and rapid worldwide spread of severe acute respiratory syndrome coronavirus (sars-cov- ) is not only a threat for humans, but potentially also for many animals. research has revealed that sars-cov- and other coronaviruses have been transmitted from animals to humans and vice versa, and across animal species, and hence, attracted public attention concerning host-virus interactions and transmission ways. non-human primates (nhps), as our evolutionary closest relatives, are susceptible to human viruses, and a number of pathogens are known to circulate between humans and nhps. here we generated global statistics of virus infection in nhps (vi-nhps). in total, nhp species from families have been reported to be infected by dna and rna viruses from virus families; . percent of viruses in nhps have also been found in humans, indicative of the high potential for cross species transmission of these viruses. the top ten nhp species with high centrality in the nhp-virus network are two apes (pan troglodytes, pongo pygmaeus), seven old world monkeys (macaca mulatta, m. fascicularis, papio cynocephalus, lophocebus albigena, chlorocebus aethiops, cercopithecus ascanius, c. nictitans) and a lemur (propithecus diadema). besides apes, there is a high risk of virus circulation between humans and old world monkeys, given the wide distribution of many old world monkey species and their frequent contact with humans. we suggest epidemiological investigations in nhps, specifically in old world monkeys with close contact to humans, and other effective measures to prevent this potential circular transmission. 
coronavirus disease (covid- ), caused by severe acute respiratory syndrome coronavirus (sars-cov- ) rapidly spread worldwide, and recent studies suggest that pets and other animals could also be infected by sars-cov- through natural contact [ , ] . captive rhesus macaques (m. mulatta), inoculated with sars-cov- in pathological studies, exhibited a moderate infection as observed in the majority of human cases [ , ] . besides captive animals and pets, wild animals are also susceptible to the infection of coronaviruses transmitted from humans. for instance, in , wild chimpanzees in côte d´ivoire were infected by the human coronavirus oc [ ] . the close evolutionary relationship between humans and nhps is thought to support pathogen transmission [ ] and many viruses have been described that circulate between humans and nhps. in captive and wild nhps, various viruses including coronaviruses, enteroviruses, enteric adenoviruses, rotaviruses, and picobirnaviruses have been detected, which are also found in humans [ ] [ ] [ ] . the most prominent cases of virus transmission from wild nhps to humans are simian foamy virus (sfv), yellow fever virus (yfv), zika virus (zikv), and human immunodeficiency virus (hiv) [ ] [ ] [ ] [ ] . conversely, viruses such as poliovirus and measles have been reported in nhps and likely derived from local human populations [ ] . to block the potential circular transmission route of different viruses between humans and nhps, precautions and regulations are needed. here we performed a survey on documented virus infections in nhps (vi-nhps) based on published data. first, we generated summary statistics of worldwide reported vi-nhps. we then identified and predicted nhp species with a high risk of virus transmission from humans and predicted geographic locations where disease outbreaks are likely to occur. global information of vi-nhps was extracted from the global mammal parasite database (gmpd, http://www.mammalparasites.org/).
we also used literature searches for publications describing vi-nhps, which were not included in gmpd. only the natural virus infections in captive and wild nhps have been recorded, while the virus inoculations for pathological studies are not included. we then built host-virus ecological networks in which nodes represent nhps that are linked through shared viruses. since centrality in primate-virus networks could assess the potential for the circulation of viruses among nhps and humans, we estimated the centrality using four metrics: strength degree centrality, eigenvector centrality, betweenness centrality, and closeness centrality implemented in the r package "igraph" and ucinet . table s ), indicating that they detected similar nhp species as most central. a single factor found in pca explained . % of the variance of the indices, which was used as the composite index to assess the centrality of each node ( cercopithecus nictitans (figure a and b) . after controlling for phylogeny, virus number in each nhp species and the number of viruses shared with humans in each nhp species were significantly and positively associated with the centrality of each nhp species (strength degree centrality, eigenvector centrality, betweenness centrality, closeness centrality, and the composite centrality; figure c and d, table s results showed the same trend as the analysis without controlling for sampling effort, and for the sake of brevity we provide results in the supplementary materials (table s -s and figure s ). in the future, more efforts ought to be made for the collection, documentation and analysis of vi-nhp, especially for nhp species with higher potential of virus transmission. since coronaviruses have been reported in macaques and other primates [ , ] , viral surveys should first target such species, not only to find known coronaviruses in such populations, but also to find new strains with high zoonotic potential.
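the host network described above, in which nhp species are nodes linked by shared viruses, can be illustrated with a small standard-library sketch that computes strength degree centrality (the sum of edge weights per node). this is a stand-in for illustration and not the igraph or ucinet workflow used in the study; the function names and the toy host-virus records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def host_network(records):
    """Build weighted host-host edges; weight = number of viruses two hosts share.

    records: iterable of (host, virus) pairs.
    """
    hosts_by_virus = defaultdict(set)
    for host, virus in records:
        hosts_by_virus[virus].add(host)
    weights = defaultdict(int)
    for hosts in hosts_by_virus.values():
        for a, b in combinations(sorted(hosts), 2):
            weights[(a, b)] += 1
    return dict(weights)

def strength_centrality(weights):
    """Sum of edge weights attached to each host."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)

# Hypothetical records for illustration only.
records = [
    ("macaca mulatta", "virus_a"),
    ("pan troglodytes", "virus_a"),
    ("macaca mulatta", "virus_b"),
    ("pan troglodytes", "virus_b"),
    ("papio cynocephalus", "virus_b"),
]
edges = host_network(records)
strength = strength_centrality(edges)
```

eigenvector, betweenness, and closeness centrality would be computed on the same weighted graph; those are better left to a dedicated graph library.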
experts in animal health and conservation are starting to urge for the protection of great apes during human covid- pandemics, since the transmission of the human virus to apes could result in severe outbreaks and local extinctions [ ] . we suggest to expand such efforts to various old world monkeys, as many of them, for instance, baboons or macaques, are widely distributed and often in close proximity to humans (figure a, b and c) . susceptibility of ferrets, cats, dogs, and different domestic animals to sars-coronavirus- sars-cov- neutralizing serum antibodies in cats: a serological investigation respiratory disease and virus shedding in rhesus macaques inoculated with sars-cov- ocular conjunctival inoculation of sars-cov- can cause mild covid- in rhesus macaques human coronavirus oc outbreak in wild chimpanzees, côte d´ivoire human culture and monkey behavior: assessing the contexts of potential pathogen transmission between macaques and humans coronavirus-like particles in nonhuman primate feces detection of viral agents in fecal specimens of monkeys with diarrhea leptospira spp., rotavirus, norovirus, and hepatitis e virus surveillance in a wild invasive golden-headed lion tamarin simian retroviral infections in human beings origins of major human infectious diseases infectious diseases in primates: behavior, ecology and evolution centrality in primate-parasite networks reveals the potential for the transmission of emerging infectious diseases to humans primate conservation: the prevention of disease transmission ucinet for windows: software for social network analysis the comparative approach in cross-species pathogen transmission and disease emergence in primates phylogenetic host specificity and understanding parasite sharing in primates the genetics of mexico recapitulates native american substructure and affects biomedical traits family cercopithecidae handbook of the mammals of the world population genomics of bronze age eurasia an integrated map of 
structural variation in human genomes systematic review of the rhesus macaque, macaca mulatta (zimmermann, ) identifying future zoonotic disease threats: where are the gaps in our understanding of primate infectious diseases? covid- : protect great apes during human pandemics comparative ace variation and primate covid- risk comparison of sars-cov- infections among species of non-human primates thoughts on convergence science of high-risk animals responsible for zoonotic epidemics key: cord- -ls zgipi authors: norris, rachael p.; terasaki, mark title: gap junction internalization and processing in vivo: a d immuno-electron microscopy study date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ls zgipi gap junctions have well-established roles in cell-cell communication by way of forming permeable intercellular channels. less is understood about their internalization, which forms double membrane vesicles containing cytosol and membranes from another cell, called connexosomes or annular gap junctions. here, we systematically studied the fate of connexosomes in intact ovarian follicles. high pressure frozen, serial sectioned tissue was immunogold labeled for connexin . within a volume of electron micrographs, every labeled structure was categorized and counted. surface area measurements indicate that large connexosomes undergo fission. subsequent modifications are separation of inner and outer membranes, loss of cx from the outer membrane, and outward budding of the modified membranes. we also documented several clear examples of organelle transfer from one cell to another by gap junction internalization. we discuss how connexosome formation and processing may be a novel means for gap junctions to mediate cell-cell communication. gap junctions are arrays of permeable channels between two cells that have well-established roles in intercellular signaling (nielsen et al., ) . the basic structural unit is the transmembrane protein connexin. 
six connexins assemble to form the connexon, which is a pore within the membrane. a connexon in one cell docks head-on with a connexon in a neighboring cell to form a channel between the two cells. large gap junctions may consist of hundreds or thousands of channels packed densely in a patch a few microns in diameter (larsen, ) . the current view is that connexons are added to the plasma membrane via small post-golgi vesicles, followed by docking between two cells, but how connexons are removed from the membrane and their subsequent fate are incompletely understood. to turn over the gap junction, cells could undock connexons and then endocytose them in small parcels. instead, connexons remain docked and the gap junction is taken up by one of the two cells (laird et al., ; falk et al., ) . this was first suggested by electron microscopists who interpreted circular gap junction profiles (first called annular gap junctions) as internalized gap junctions (espey and stutts, ; merk et al., ) . this interpretation was convincingly corroborated by live imaging of gfp-connexins, which showed formation of vesicles of comparable size (jordan et al., ; piehl et al., ) . the internalized gap junction structure is now often called a connexosome (laird, ) . gap junction internalization is therefore a type of endocytosis, in which the plasma membrane of the neighboring cell remains attached to the endocytosed plasma membrane (heck and devenport, ) . gap junction internalization can also be considered to be a form of trogocytosis (joly and hudriser, ) in which a portion of the plasma membrane and cytosol of the neighboring cell is transferred to the engulfing cell (see fig. a ). there is a possibility that further processing after the initial engulfment is involved in other modes of cell-cell communication (see discussion). internalized gap junctions could likewise play new roles in cell-cell communication. 
in a previous study (norris et al., ), we addressed several methodological issues that have limited electron microscopic studies of gap junctions in the past. mouse ovarian follicles were high pressure frozen in order to preserve structure better than chemical fixation, which involves diffusion of aldehydes through the cell membrane(s) and cross-linking of proteins, during which abnormal processes may occur (murk et al., a). the frozen tissue was freeze-substituted, embedded in lowicryl, sectioned, then immunolabeled with an antibody to cx . this allowed us to unambiguously identify gap junctions. serial sections were then imaged to obtain three-dimensional information; this distinguished between internalized connexosomes and gap junctions in the process of invagination. likewise, a round profile in a single section could be a vesicle or an invagination, or an apparently empty vesicle could contain intraluminal vesicles; serial sections distinguished these alternatives. here, we investigate the fate of internalized gap junctions by examining cx localization. as before, we high pressure froze ovarian follicles and immunolabeled serial sections. we found a surprisingly large number of cx labeled structures that we interpret as modifications of the connexosome.
mouse ovarian antral follicles were high pressure frozen minutes after exposure to luteinizing hormone (lh). this tissue was used because it has large numbers of cx gap junctions (okuma et al., ; norris et al., ; baena et al., ), which internalize in response to the hormone (larsen et al., ). as in our previous study, the follicles were embedded in lowicryl, serial sectioned, then labeled with primary antibody to cx and gold-labeled secondary antibody (norris et al., ). we used a different scanning electron microscope in order to obtain higher resolution images from the sections labeled in our previous study.
this provided clearer images of the classic double membrane with a gap structure (figure b-d). we will use the term connexosome to mean a double membrane vesicle completely detached from the plasma membrane in which both limiting membranes were in contact with each other and labeled with cx throughout the periphery of the vesicle. connexosomes were the most abundant cx labeled structure in the cytoplasm, but there were also many other membranous structures that contained cx (video ), which we interpreted to be modifications of the connexosome. we characterized these cx structures by classifying and counting them in a defined volume. by doing so, we gained information on the relative abundance of various forms, which could be useful in deducing dynamics. we analyzed two volumes of mural granulosa cells that were each μm x μm in the x/y plane (imaged at nm per pixel), in sections of nm thickness. thus each volume was , μm . in this volume, there were gap junctions. of these, were a planar patch. the others were present on highly infolded membranes, which we refer to as invaginating gap junctions. there were internalized structures. of the cx labeled structures, were connexosomes and appeared to be connexosomes that had undergone processing. direct imaging of living cultured cells has shown that connexosomes are formed by internalization of entire gap junctions, and that connexosomes can undergo fission (piehl et al., ; bell et al., ). we examined the high pressure frozen and serial sectioned ovarian follicles for connexosome formation. the areas of structures are measurable in d data sets, so we also measured the surface areas of all of the connexin containing membranes because this information could be relevant to connexosome formation (figure e). the average surface area of the "flat" gap junction plaques in the volume was . ± . μm (mean ± sd), with a range of . - . μm .
serial sections showed that most were disc shaped, corresponding to a disc of average diameter . μm. the invaginated gap junctions looked like they could form connexosomes. these were larger than the "flat" gap junctions, with an average surface area of . ± . μm with a range of . - . μm (n = ). connexosomes, on the other hand, had an average surface area of . ± . µm , and a range of . - . µm (n = ). the average connexosome area thus was % of the average gap junction area (figs. e, f). it seems likely that invaginated gap junctions were frozen in the process of internalization and are therefore the source of connexosomes. the surface areas of invaginated gap junctions are significantly larger than those of connexosomes. four of the invaginated gap junctions contained organelles or vesicles in addition to cytosol. one deeply invaginated gap junction contained a multivesicular endosome and a mitochondrion (fig. a, video ). if such structures were to form connexosomes, it would result in transfer of organelles from one cell to another. we indeed found a connexosome containing a mitochondrion and an apparent endosome (fig. b, video ), and another connexosome enclosing a tubular organelle, possibly er (fig. c, video ). because the entire periphery of the connexosome is gap junction, these organelles must have come from other, previously gap junction-coupled cells. in total, out of connexosomes contained vesicles or organelles in addition to cytosol. of the cx labeled internalized structures, appeared to be modified connexosomes. to describe the modifications, we will use the following terms to refer to the membranes and compartments of the unmodified connexosomes. there are an outer and an inner membrane, which are closely apposed because the connexons are docked. the small space between the two membranes originally was the extracellular space. the compartment within the inner membrane came from the cytoplasm of the neighboring cell.
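the conversion from a measured surface area to an equivalent disc diameter, used above for the flat gap junction plaques, can be sketched as follows. this is only a minimal illustration under stated assumptions: the function names are hypothetical, the input values are placeholders rather than measurements from this study, and the sphere conversion assumes an idealized spherical vesicle, which the text does not claim for connexosomes.

```python
import math


def disc_diameter_from_area(area_um2: float) -> float:
    """Diameter (um) of a flat circular disc with one-sided area `area_um2`.

    Uses A = pi * d**2 / 4, so d = 2 * sqrt(A / pi).
    """
    if area_um2 < 0:
        raise ValueError("area must be non-negative")
    return 2.0 * math.sqrt(area_um2 / math.pi)


def sphere_diameter_from_area(area_um2: float) -> float:
    """Diameter (um) of a sphere with surface area `area_um2`.

    Uses A = pi * d**2, so d = sqrt(A / pi). Treating a vesicle as a
    sphere is an assumption made here for illustration only.
    """
    if area_um2 < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(area_um2 / math.pi)


def area_ratio_percent(area_a_um2: float, area_b_um2: float) -> float:
    """Area A expressed as a percentage of area B."""
    if area_b_um2 <= 0:
        raise ValueError("reference area must be positive")
    return 100.0 * area_a_um2 / area_b_um2


if __name__ == "__main__":
    # Placeholder areas (um^2); NOT values reported in the study.
    flat_area = 1.0
    vesicle_area = 0.25
    print(disc_diameter_from_area(flat_area))
    print(sphere_diameter_from_area(vesicle_area))
    print(area_ratio_percent(vesicle_area, flat_area))
```

for the same membrane area, the equivalent sphere has half the diameter of the equivalent disc, which is worth keeping in mind when comparing the sizes of flat plaques and internalized vesicles.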
the initial modification appears to be fusion of a connexosome with another vesicle. in of the modified connexosomes, a patch of unlabeled outer membrane bulged outward from a labeled inner membrane ( suggesting that different types of vesicles were involved. cx labels the inner membrane in areas where the inner and outer membranes have separated. in these regions, the connexons must have become undocked. the inner and outer membranes have become more separated in of the modified connexosomes and small vesicles were present in this enlarged space (fig. c ). the small vesicles were generally not labeled with cx , and the diameters averaged nm. this corresponds closely to the reported diameters of intraluminal vesicles ( - nm) found in multivesicular endosomes (murk et al., b; hanson and cashikar, ; scott et al., ) . the inner membrane is identifiable and labeled while the outer membrane is much less labeled ( fig. b -d, video ). there was a striking example in which the inner and outer membranes were completely separated, and no cx label was present in the outer membrane (fig. d ). if the connexons had merely become undocked, the amount of label in the outer membrane should be comparable to the amount in the inner membrane. therefore, this is evidence for connexon degradation in the outer membrane. in the remaining of the modified connexosomes, there was an outer membrane similar in diameter to that of unmodified connexosomes that had little to no cx labeling. also, instead of a single inner membrane labeled with cx , there were various different sized vesicles labeled with cx ( fig. a and b, video ). the diameters of the cx labeled vesicles in modified connexosomes averaged nm, and ranged from nm to nm (fig. c ). based on the range of diameters and the presence of cx , these seem likely to have formed by fission or outward budding (akin to cytokinesis) of the inner connexosome membrane. 
for comparison, we also measured small vesicles within structures that lacked cx labeling (fig. d). unlabeled vesicles were smaller, averaging nm in diameter, with a range of to nm (fig. c). in addition to subdivision of the inner membrane, some of the modified connexosomes had outer membranes with short tubules extending into the cytosol (see arrows in fig. a and b), as is often seen in endosomes (klumperman and raposo, ).
luteinizing hormone stimulates the internalization of gap junctions in ovarian granulosa cells (larsen et al., ). we high pressure froze follicles minutes after stimulation and then labeled cx by immunogold staining in serial sections ( nm thick). every cx containing structure was categorized and counted in a defined volume of ~ , µm . these data allow us to make several novel observations and measurements on connexosome formation and modification. live cell imaging studies of several cultured mammalian cell lines expressing cx -gfp chimeras showed that either entire gap junctions are internalized, followed by fission (piehl et al., ; bell et al., ), or that a small portion of the gap junction center is internalized (falk et al., ). the d data from high pressure frozen tissue allowed us to look for internalization intermediates and also to make the first systematic measurements of gap junction and connexosome areas. most gap junctions were disk shaped on a flattened piece of plasma membrane (n = ). there were invaginated gap junctions, and their average areas were larger than those of the flat gap junctions. it seems likely that the largest gap junctions begin to invaginate and are the source of connexosomes in this tissue. the internalization of whole invaginated gap junctions should produce correspondingly large connexosomes, but connexosomes as a group are smaller than gap junctions. this is consistent with fission occurring soon after internalization.
the simplest connexosome modification was a partial separation of the inner and outer membrane while the rest of the gap junction is intact. the separated outer membrane bulges out, and lacks cx while the inner membrane retains it (figures b and c). it seems likely that a vesicle has fused with the connexosome, perhaps at a bare patch left over from the internalization process (falk et al., ). if this vesicle fusion leads to a lowering of ph, it could cause the connexons of the gap junction to undock (falk et al., ). subsequent modifications seem to involve two different processes. one is complete separation of the two membranes with loss of cx in the outer membrane but retention in the inner membrane. the other process is the appearance of numerous smaller compartments within the boundary of the former connexosome. small cx -free vesicles seem likely to derive from outward budding of the outer membrane; they resemble intraluminal vesicles of multivesicular endosomes. outward budding or fission of the inner membrane appears to produce cx containing compartments that are somewhat larger than the cx free vesicles. there is evidence from previous studies of other tissues for connexosome or connexin degradation by autophagy. autophagosomes engulf internalized gap junctions in the equine hoof wall (leach and oliphant, ), canine ventricular myocardium (hesketh et al., ), hela cells and mouse embryo fibroblasts (lichtenstein et al., ; fong et al., ). in mouse liver cells, connexins are degraded by autophagy (bejarano et al., ). however, we did not observe intermediates resembling a phagophore or an autophagosome in our images. our data come from minutes after application of luteinizing hormone, which might be too early for the final stages of cx degradation. another possibility is that in ovarian granulosa cells, connexosomes become something more related to multivesicular endosomes.
this conclusion was made by leithe et al. ( ) in cultured rat liver epithelial cells that were treated with phorbol ester. their evidence was based on immunolocalization of endosomal markers and cx is highly phosphorylated after lh treatment (norris et al., ). if we had not immunogold labeled sections with cx , a structure as seen in figs. a or b would likely be identified as a multivesicular endosome. this suggests that in other tissues, some apparent multivesicular endosomes could be modified connexosomes. we propose a sequence of connexosome processing events (figure ). the initial event is fusion with a vesicle (fig. , step ). the vesicle fusion adds unlabeled membrane to the outer membrane, and triggers the undocking of connexons, perhaps by lowering the ph (falk et al., ). the connexons of the outer membrane are degraded, perhaps because the cytoplasm of the host cell can recognize that they are undocked (fig. , step ). the uncoupled inner membrane undergoes either fission or outward budding to form various sizes of cx containing vesicles (fig. , step ). possible fates of the modified connexosomes are degradation by autophagosome formation or direct fusion with a lysosome (leithe et al., ; falk et al., ). however, an alternative possibility is the fusion of the inner vesicles with the modified outer membrane that lacks cx (fig. , step ). this would result in the mixing of membranes from two cells, and the release of cytoplasm from one cell into another. evidence for this happening in other cells is discussed in the next section. what vesicles initially fuse with the connexosomes? many appear to be clear vesicles, which are consistent with endosomes (murk et al., a), but a significant fraction of vesicles were dark, which is characteristic of lysosomes (klumperman and raposo, ). there is evidence that lysosomes are not always degradative and can instead function in secretion and signaling (settembre et al., ; perera and zoncu, ).
while there is solid evidence for clathrin involvement in connexosome formation (piehl et al., ) , an escrt driven process could also help explain some of our observations. escrt machinery for outward budding could become attached and activated on the donor side of the gap junction. there is evidence that escrt machinery can bind to ubiquitinated connexins (auth et al., ). if the machinery were transferred within the connexosome, it could generate single membrane vesicles by outward budding or fission of the internal membrane after separation of the inner and outer connexosome membranes. the channel properties of connexins are well established for the exchange of small molecules between cells (nielsen et al., ) . we discuss here how connexins, by way of forming connexosomes, may also facilitate cytoplasmic and membrane transfer between cells. first of all, our serial section images provide conclusive evidence for transfer of mitochondria and other organelles via connexosomes. a transferred mitochondrion could affect physiological processes of the host cell. in a mouse model for acute lung injury, cx -dependent mitochondrial transfer from bone marrow derived stroma cells rescued injured lung alveolar epithelial cells (islam et al., ) . the authors suggested that mitochondria were transferred in some way by microvesicles; our observations provide a clear cx -dependent mechanism for transfer. our observations demonstrate only the transfer of an organelle in an enclosed double membrane. if the double membranes were to fuse (or the inner membrane vesicles were to fuse with the outer membrane), this would result in mixing of cytoplasm (e.g. release of mitochondrion in the above example) and membranes ( figure , step ). it is established that the endosome membrane can fuse with a vesicle within it (bissig and gruenberg, ) . 
this occurs with enveloped viruses such as vesicular stomatitis virus (le blanc et al., ) or coronavirus (grove and marsh, ) , resulting in the release of nucleocapsids into the cytoplasm. there is now direct evidence that this also occurs with extracellular vesicles (joshi et al., ) , resulting in release of cargo to the cytoplasm. membrane mixing involving connexosomes might explain a previous finding in immune cells. macrophages transfer mhc ii bound antigens to dendritic cells in the gut to establish oral tolerance, and this depends on the presence of cx (mazzini et al., ) . without specifying the role of cx , the authors raise the possibility that transfer occurs via trogocytosis. in trogocytosis, like connexosome formation, a part of the plasma membrane and cytoplasm of one cell is internalized into another (joly and hudriser, ) ; a fusion between outer and inner compartments, followed by budding of the combined membranes would be required to achieve antigen transfer and presentation. we suggest that the mhc ii-antigen complex is transferred from the macrophage via connexosomes followed by fusion of inner and outer membranes to deliver it to the dendritic cell's endosomal system. cell-cell communication by way of connexosome formation and processing may be widespread and warrants further investigation. from this point, we followed the procedure of rubio and wenthold ( ) , with some modifications. samples were freeze-substituted with . % uranyl acetate (electron microscopy sciences) in dry methanol for hours at - ° c in an afs freeze substitution unit (leica biosystems). the temperature was then raised ° c per hour to - ° c. samples were then rinsed in methanol, and infiltrated with monostep uv light as the temperature in the afs was increased by ° per hour to ° c, then held at ° c for hours more. when samples were removed from the afs , they were pink in color and left to polymerize at room temperature until the pink hue was gone two days later. 
ultrathin sections ( nm) of lowicryl hm embedded follicles were cut on a uc- ultramicrotome (leica biosystems) with a diamond knife (diatome, hatfield, pa). the sections were picked up by an automated tape collector on glow-discharged kapton tape (terasaki et al., ; kasthuri et al., ; baena et al., ). for immunostaining, ribbons of follicle sections on kapton tape were cut to lengths of approximately three inches and attached to a sheet of parafilm with double-sided carbon tape (electron microscopy sciences # ). sections were rehydrated with x pbs (life technologies, grand island, ny) and blocked in % normal goat serum (invitrogen, frederick, md) in a solution of % bovine serum albumin in pbs. following an overnight incubation at °c in primary antibody, sections were rinsed three times for minutes each in pbs, then rinsed once in % bsa in pbs. next, secondary antibody diluted at : was applied to sections for one hour at room temperature. sections were then rinsed with x pbs followed by milli-q filtered water and dried overnight. sections were placed back in their original order and post-stained with % uranyl acetate in : methanol: water for minutes, then rinsed generously in water.
imaging serial sections of tissue with scanning em
immuno-labeled sections on tape were attached to a cm diameter silicon wafer (university wafer, south boston, ma) with double-sided carbon adhesive tape (electron microscopy sciences). wafers were carbon coated (denton, moorestown, nj) and first imaged on a sigma field emission scanning electron microscope (zeiss, thornwood, ny) using a backscatter detector as described in norris et al., . two volumes of mural granulosa cells were imaged at nm/pixel resolution, with a field of view of square micrometers.
original low-resolution images obtained on the zeiss sigma were aligned with the register virtual stack slices macro (fiji), then larger files were aligned and diced for convenient viewing with a custom program (piet, provided by duncan mak and jeff lichtman, harvard university). these files were used to track all cx -labeled internalized structures. when internalized structures looked complex, they were reimaged using a higher resolution electron microscope as described below. high-resolution images on fei verios l higher a. five serial sections through a modified connexosome containing several internal vesicles labeled with cx . b. five serial sections through a different modified connexosome with internal vesicles labeled with cx . this vesicle has a darker interior than the vesicle in a. in a. and b., arrows indicate short tubules extending from the outer membranes, as is often seen in endosomes. related video shows all sections through structures in a and b. c. diameters of cx -labeled internal vesicles are more variable in size and larger than unlabeled internal vesicles. internal vesicles labeled with cx were measured within nine modified connexosomes, and unlabeled internal vesicles were measured within nine structures that had no cx labeling. d. two sections through a vesicle with unlabeled intraluminal vesicles, and no cx labeling, as measured for panel c. scale bars in a,b, and d are nm. proposed events after internalization and fission. as in figure , gap junction proteins are represented by orange rectangles. ( ). fusion of a connexosome with a vesicle from the host / recipient cell leads to a local separation of the two membranes ( ). the formation of intraluminal vesicles (ilvs) from the outer membrane is also possible. a further modification is the outward budding of the inner membrane that still contains cx ( ). note that cx is decreased or absent in the outer connexosome membrane. 
while degradation is possible, another possible fate of the modified connexosomes is fusion of the inner vesicles with the limiting membrane of the processed connexosome ( ). this would result in release of contents from the other cell and mixing of the plasma membrane from the donor cell with the membrane of the processed connexosome. if some of the donor membrane budded away from the processed connexosome, it could get incorporated into the receiving cell plasma membrane.
video . multiple structures labeled with cx polyclonal antibody. serial sections through ovarian granulosa cells showing a variety of double membrane vesicles labeled with an antibody to cx and nm gold. other types of vesicles are also labeled with cx (yellow arrows). the scale bar is nm.
video . organelle transfer by gap junction internalization. serial sections through ovarian granulosa cells labeled with anti-cx and nm gold. corresponding to figure a, the cytoplasm of a cell protruding into another cell is shaded in blue. there is a multivesicular endosome and a mitochondrion protruding into the cell as well. scale bar is nm, as in figure a. corresponding to figure b, a connexosome contains a multivesicular endosome and a mitochondrion. corresponding to figure c, another connexosome contains smaller vesicles and other membranes. scale bars are nm, as in figures c and c.
video . connexosome modifications of the outer and inner membranes. serial sections through ovarian granulosa cells labeled with anti-cx and nm gold. corresponding to figures and , the full structures of the modified connexosomes in figure a-d and in figures a and b are shown. scale bars are nm.
the tsg protein binds to connexins and is involved in connexin degradation cellular heterogeneity of the luteinizing hormone receptor and its significance for cyclic gmp signaling in mouse preovulatory follicles.
endocrinology serial-section electron microscopy using automated tape-collecting ultramicrotome (atum) autophagy modulates dynamics of connexins at the plasma membrane in a ubiquitin-dependent manner visualization of annular gap junction vesicle processing: the interplay between annular gap junctions and mitochondria alix and the multivesicular endosome: alix in wonderland exchange of cytoplasm between cells of the membrane granulosa in rabbit ovarian follicles gap junction turnover is achieved by the internalization of small endocytic double-membrane vesicles degradation of endocytosed gap junctions by autophagosomal and endo/lysosomal pathways: a perspective degradation of connexins and gap junctions molecular mechanisms regulating formation, trafficking and processing of annular gap junctions internalized gap junctions are degraded by autophagy the cell biology of receptor-mediated virus entry multivesicular body morphogenesis trans-endocytosis of planar cell polarity complexes during cell division ultrastructure and regulation of lateralized connexin in the failing heart mitochondrial transfer from bone-marrow-derived stromal cells to pulmonary alveoli protects against acute lung injury what is trogocytosis and what is its purpose? the origin of annular junctions: a mechanism of gap junction internalization endocytosis of extracellular vesicles and release of their cargo from endosomes saturated reconstruction of a volume of neocortex the complex ultrastructure of the endolysosomal system life cycle of connexins in health and disease structural diversity of gap junctions.
a review differential modulation of rat follicle cell gap junction populations at ovulation degradation of annular gap junctions of the equine hoof wall endosome-to-cytosol transport of viral nucleocapsids endocytic processing of connexin gap junctions: a morphological study autophagy: a pathway that contributes to connexin degradation oral tolerance can be established via gap junction transfer of fed antigens from cxcr + macrophages to cd + dendritic cells the fine structure of granulosa cell nexuses in rat ovarian follicles influence of aldehyde fixation on the morphology of endosomes and lysosomes: quantitative analysis and electron tomography endosomal compartmentalization in three dimensions: implications for membrane fusion gap junctions luteinizing hormone causes map kinase-dependent phosphorylation and closure of connexin gap junctions in mouse ovarian follicles: one of two paths to meiotic resumption localization of phosphorylated connexin using serial section immunogold electron microscopy colocalization of connexin and connexin but absence of connexin in granulosa cell gap junctions of rat ovary the lysosome as a regulatory hub internalization of large double-membrane intercellular vesicles by a clathrin-dependent endocytic process glutamate receptors are selectively targeted to postsynaptic sites in neurons endosome maturation, transport and functions signals from the lysosome: a control centre for cellular clearance and energy metabolism stacked endoplasmic reticulum sheets are connected by helicoidal membrane motifs
we thank rindy jaffe and matthias falk for critical review of the manuscript and useful discussions. we thank art hand, maya yankova, and maria rubio for technical advice. we thank valentina baena, tracy uliasz, and other members of the jaffe and terasaki labs for technical assistance and help with collecting samples. this work was funded by the fund for science. the authors declare no conflicts of interest.
rachael norris contributed to conceptualization; data collection, curation and analysis; visualization, and writing and editing of the original draft. mark terasaki contributed to conceptualization, supervision of the work, funding acquisition, software and resources for data collection; and writing and editing of the original draft. key: cord- -fed fhu authors: mcpherson, malinda j.; mcdermott, josh h. title: time-dependent discrimination advantages for harmonic sounds suggest efficient coding for memory date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fed fhu perceptual systems have finite memory resources and must store incoming signals in compressed formats. to explore whether representations of a sound’s pitch might derive from this need for compression, we compared discrimination of harmonic and inharmonic sounds across delays. in contrast to inharmonic spectra, harmonic spectra can be summarized, and thus compressed, using their fundamental frequency (f ). participants heard two sounds and judged which was higher. despite being comparable for sounds presented back-to-back, discrimination was better for harmonic than inharmonic stimuli when sounds were separated in time, implicating memory representations unique to harmonic sounds. patterns of individual differences (correlations between thresholds in different conditions) indicated that listeners use different representations depending on the time delay between sounds, directly comparing the spectra of temporally adjacent sounds, but transitioning to comparing f s across delays. the need to store sound in memory appears to determine reliance on f -based pitch, and may explain its importance in music, in which listeners must extract relationships between notes separated in time. our sensory systems transduce information at high bandwidths, but have limited resources to hold this information in memory. 
in vision, short-term memory is believed to store schematic structure extracted from image intensities, e.g. object shape, or gist, that might be represented with fewer bits than the detailed patterns of intensity represented on the retina ( - ). for instance, at brief delays visual discrimination shows signs of being based on image intensities, believed to be represented in high-capacity but short-lasting sensory representations ( ) . by contrast, at longer delays more abstract ( , ) or categorical ( ) representations are implicated as the basis of short-term memory. in other sensory modalities the situation is less clear. audition, for instance, is argued to also make use of both a sensory trace and a short-term memory store ( , ) , but the representational characteristics of the memory store are not well characterized. there is evidence that memory for speech includes abstracted representations of phonetic features ( , ) or categorical representations of phonemes themselves ( ) ( ) ( ) . beyond speech, the differences between transient and persistent representations of sound remain unclear. this situation plausibly reflects a historical tendency within hearing research to favor simple stimuli, such as sinusoidal tones, for which there is not much to abstract or compress. such stimuli have been used to characterize the decay characteristics of auditory memory ( ) ( ) ( ) ( ) , its vulnerability to interference ( , ) , and the possibility of distinct memory resources for different sound attributes ( ) ( ) ( ) , but otherwise place few constraints on the underlying representations. here we explore whether auditory perceptual representations could be explained in part by memory limitations. one widely proposed auditory representation is that of pitch ( , ) . 
pitch is the perceptual property which enables sounds to be ordered from low to high ( ) , and is a salient characteristic of animal and human vocalizations, musical instrument notes, and some environmental sounds. such sounds often contain harmonics, whose frequencies are integer multiples of a single fundamental frequency (f ; figure a&b ). pitch is classically defined as the perceptual correlate of this f , which is thought to be estimated from the harmonics in a sound even when the frequency component at the f is physically absent (figure c ). despite the prevalence of this idea in textbook accounts of pitch, there is surprisingly little direct evidence that listeners utilize representations of a sound's f when making pitch comparisons. for instance, discrimination of two harmonic sounds is normally envisioned to involve a comparison of estimates of the sounds' f s ( , ) . but if the frequencies of the sounds are altered to make them inharmonic (lacking a single f ; figure d ), discrimination remains accurate ( ) ( ) ( ) ( ) , even though there are no f s to be compared. such sounds do not have a pitch in the classical sense -one would not be able to consistently sing them back or otherwise match their pitch, for instance ( ) -but listeners nonetheless hear a clear upward or downward change from one sound to the other, like that heard for harmonic sounds. this result is what would be expected if listeners were using the spectrum rather than the f (figure e ), e.g. by tracking frequency shifts between sounds ( ) . although harmonic advantages are evident in some other tasks plausibly related to pitch perception (such as recognizing familiar melodies, or detecting out-of-key notes ( ) ), the cause of this task dependence remains unclear. in this paper we consider whether these characteristics of pitch perception could be explained by memory constraints. 
one reason to estimate a sound's f0 might be that it provides an efficient summary of the spectrum: a harmonic sound contains many frequencies, but their values can all be predicted as integer multiples of the f0. this summary might not be needed if two sounds are presented back-to-back, as high-fidelity (but quickly-fading) sensory traces of the sounds could be compared. but it might become useful in situations where listeners are more dependent on a longer-lasting memory representation. we explored this issue by measuring effects of time delay on discrimination. our hypothesis was that time delays would cause discrimination to be based on short-term auditory memory representations ( , ) . we tested discrimination abilities with harmonic and inharmonic stimuli, varying the length of silent pauses between sounds being compared. we predicted that if listeners summarize harmonic sounds with a representation of their f0, then performance should be better for harmonic than for inharmonic stimuli. this prediction held across a variety of different types of sounds and task conditions, but only when sounds were separated in time. in addition, individual differences in performance across conditions indicate that listeners switch from representing the spectrum to representing the f0 depending on memory demands. reliance on f0-based pitch thus appears to be driven in part by the need to store sound in memory.

a. example spectrograms for natural harmonic sounds, including a spoken vowel, the call of a gibbon monkey, and a note played on an oboe. the components of such sounds have frequencies that are multiples of an f0, and as a result are regularly spaced across the spectrum. b. schematic spectrogram (left) of a harmonic tone with an f0 of hz along with its autocorrelation function (right). the autocorrelation has a value of at a time lag corresponding to the period of the tone (1/f0 = ms). c. schematic spectrogram (left) of a harmonic tone (f0 of hz) missing its fundamental frequency, along with its autocorrelation function (right). the autocorrelation still has a value of at a time lag of ms, because the tone has the period of hz, even though this frequency is not present in its spectrum. d. schematic spectrogram (left) of an inharmonic tone along with its autocorrelation function (right). the tone was generated by perturbing the frequencies of the harmonics of hz, such that the frequencies were not integer multiples of any single f0 in the range of audible pitch. accordingly, the autocorrelation does not exhibit any strong peak. e. schematic of trials in a discrimination task in which listeners must judge which of two tones is higher. for harmonic tones, listeners could compare f0 estimates for the two tones or follow the spectrum. the inharmonic tones cannot be summarized with f0s, but listeners could compare the spectra of the tones to determine which is higher.

we began by measuring discrimination with and without an intervening delay between notes ( figure a ), using recordings of real instruments that were resynthesized to be either harmonic or inharmonic (figure b ). we used real instrument sounds to maximize ecological relevance. here and in subsequent experiments, sounds were made inharmonic by adding a random frequency 'jitter' to each successive harmonic; each harmonic could be jittered in frequency by up to % of the original f0 of the tone (jitter values were selected from a uniform distribution subject to constraints on the minimum spacing between adjacent frequencies). making sounds inharmonic renders them inconsistent with any f0, such that the frequencies cannot be summarized by a single f0. whereas the autocorrelation function of a harmonic tone shows a strong peak at the period of the f0 (figure b-c) , that for an inharmonic tone does not ( figure d ). the same pattern of random 'jitter' was added to each of the two notes in a trial.
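The jitter procedure described above can be sketched in a few lines. This is a minimal illustration and not the authors' stimulus code: the function names, the rejection-sampling loop, and the values used for `max_jitter` and `min_spacing` (both expressed in units of f0) are assumptions, because the exact jitter limit and spacing constraint are not preserved in the text. The helper `autocorr_at_lag` relies on a standard result: for a sum of equal-amplitude sinusoids, the normalized long-term autocorrelation at a given lag is the mean of cos(2*pi*f*lag) over the components.

```python
import math
import random


def jitter_pattern(n_components, max_jitter, min_spacing, rng, max_tries=10000):
    """Draw one jitter offset per component, in units of f0.

    Offsets are uniform in [-max_jitter, max_jitter]. A draw is rejected if any
    two adjacent components would end up closer than min_spacing (also in units
    of f0). Rejection sampling is an assumed way to satisfy the constraint.
    """
    for _ in range(max_tries):
        offsets = [rng.uniform(-max_jitter, max_jitter) for _ in range(n_components)]
        positions = [k + 1 + off for k, off in enumerate(offsets)]
        gaps = [hi - lo for lo, hi in zip(positions, positions[1:])]
        if all(gap >= min_spacing for gap in gaps):
            return offsets
    raise RuntimeError("no jitter pattern met the spacing constraint")


def component_freqs(f0, offsets):
    """Component frequencies in Hz; all-zero offsets give a harmonic tone."""
    return [f0 * (k + 1 + off) for k, off in enumerate(offsets)]


def autocorr_at_lag(freqs, lag):
    """Normalized autocorrelation of equal-amplitude sinusoids at one lag (s)."""
    return sum(math.cos(2 * math.pi * f * lag) for f in freqs) / len(freqs)


rng = random.Random(0)
f0 = 200.0                                    # hypothetical f0 in Hz
harmonic = component_freqs(f0, [0.0] * 10)
offsets = jitter_pattern(10, max_jitter=0.5, min_spacing=0.15, rng=rng)
inharmonic = component_freqs(f0, offsets)

print(autocorr_at_lag(harmonic, 1 / f0))      # lag equal to the period of f0
print(autocorr_at_lag(inharmonic, 1 / f0))    # same lag, jittered components

# Reusing the same offsets for the second note scales every component by the
# same ratio, so the two notes correspond component by component.
second_note = component_freqs(f0 * 1.03, offsets)
```

Because the second note reuses `offsets`, every component moves by the same ratio between notes, which is the direct correspondence between the two notes that the trial design depends on.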
there was thus a direct correspondence between the frequencies of the first and second notes even though the inharmonic stimuli lacked an f (figure e ). participants heard two notes played by the same instrument (randomly selected on each trial from the set of cello, baritone saxophone, ukulele, pipe organ, and oboe, with the instruments each appearing an equal number of times within a given condition). the two notes were separated by , , or seconds of silence. notes always differed by either a quarter of a semitone (approximately . % difference between the note f s) or a half semitone (approximately % difference between the note f s). participants judged whether the second note was higher or lower than the first. we found modest decreases in performance for harmonic stimuli as the delay increased (significant main effect of delay for both . semitone (f( , )= . , p<. , hp =. , figure c ), and . semitone conditions (f( , )= . , p=. , hp =. )). this decrease in performance is consistent with previous studies examining memory for complex tones ( , ) . we observed a more pronounced decrease in performance for inharmonic stimuli, with worse performance than for harmonic stimuli at both delays ( seconds: t( )= . , p<. for . semitone trials, t( )= . , p=. for . semitone trials; seconds: t( )= . , p=. for . semitone trials, t( )= . , p=. for . semitone trials) despite indistinguishable performance for harmonic and inharmonic sounds without a delay (t( )= . , p=. for . semitone trials, t( )=- . , p=. for . semitone trials). these differences produced a significant interaction between the effect of delay and that of harmonicity (f( , )= . , p=. , hp =. for . semitone trials, f( , )= . , p=. , hp =. for . semitone trials). this effect was similar for musicians and nonmusicians (supplementary figure ) . averaging across the two difficulty conditions we found no main effect of musicianship (f( , )= . , p=. , hp =. 
), no interaction between musicianship, harmonicity and delay length (f( , )= . , p=. , hp =. ), and the interaction between delay and harmonicity was significant in non-musicians alone (f( , )= . , p=. , hp =. ). this result suggests that the harmonic advantage is not dependent on extensive musical training. overall, the results of experiment are consistent with the idea that a sound's spectrum can mediate discrimination over short time intervals (potentially via a sensory trace of the spectrum), but that memory over longer periods relies on a representation of f0.

a. schematic of trial structure for experiment . during each trial, participants heard two notes played by the same instrument and judged whether the second note was higher or lower than the first note. notes were separated by a delay of , , or seconds. b. power spectra of example harmonic and inharmonic (with frequencies jittered) notes from a cello (the fundamental frequency of the harmonic note is hz in this example). c. results of experiment plotted separately for the two difficulty levels that were used. error bars show standard error of the mean.

we replicated and extended the results of experiment using synthetic tones, the acoustic features of which can be more precisely manipulated. we generated complex tones that were either harmonic or inharmonic, applying fixed bandpass filters to all tones in order to minimize changes in the center of mass of the tones that could otherwise be used to perform the task ( figure a ). the first audible harmonic of these tones was generally the th (though it could be the rd or th , depending on the f0 or jitter pattern). to gauge the robustness of the effect across difficulty levels, we again used two f0 differences (. and . semitones). to further probe the effect of delay on representations of f0, we included a third condition ('interleaved harmonic') where each of the two tones on a trial contained half of the harmonic series ( figure a ).
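The idea of giving each tone half of the harmonic series can be illustrated with a short sketch. The particular split used here (alternating pairs of harmonic numbers) is an assumption chosen for illustration, and the function names are hypothetical. The sketch checks two properties the condition needs: the two tones share no harmonics, and each half is still consistent with the same f0. A plain odd/even split is included for contrast, since the even-numbered harmonics alone would imply an f0 one octave higher.

```python
from functools import reduce
from math import gcd


def interleaved_sets(n_harmonics):
    """Split harmonic numbers 1..n into two disjoint sets by alternating pairs."""
    set_a, set_b = [], []
    for h in range(1, n_harmonics + 1):
        # h % 4 in {0, 1} gives 1, 4, 5, 8, 9, ...; the rest gives 2, 3, 6, 7, ...
        (set_a if h % 4 in (0, 1) else set_b).append(h)
    return set_a, set_b


def implied_f0(harmonic_numbers, f0):
    """Largest f0 (Hz) of which every listed component is an integer multiple."""
    return f0 * reduce(gcd, harmonic_numbers)


set_a, set_b = interleaved_sets(12)
evens = [h for h in range(1, 13) if h % 2 == 0]

print(set_a, implied_f0(set_a, 200.0))
print(set_b, implied_f0(set_b, 200.0))
print(evens, implied_f0(evens, 200.0))
```

With no harmonic shared between the two tones, a listener cannot track the shift of an individual component and has to compare estimates of the f0.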
one tone always contained harmonics [ , , , , , etc.] , and the other always contained harmonics [ , , , , , etc.] ( ) . the order of the two sets of harmonics was randomized across trials. this selection of harmonics eliminates common harmonics between tones. while in the harmonic and inharmonic conditions listeners could use the correspondence of the individual harmonics to compare the tones, in the interleaved harmonic condition there is no direct correspondence between harmonics present in the first and second notes, such that the task can only be performed by estimating and comparing the f s of the tones. by applying the same bandpass filter to each interleaved harmonic tone we sought to minimize timbral differences that are known to impair f discrimination ( ) , though the timbre nonetheless changed somewhat from note-to-note, which one might expect would impair performance to some extent. masking noise was included in all conditions to prevent distortion products from being audible, which might otherwise be used to perform the task. the combination of the bandpass filter and the masking noise was also sufficient to prevent the frequency component at the f from being used to perform the task. performance without a delay was worse for interleaved-harmonic tones than for the regular harmonic and inharmonic tones, but unlike in those other two conditions, interleaved harmonic performance did not deteriorate significantly over time (figure b ). there was no main effect of delay for either . semitones (f( , )= . , p=. , hp =. ) or . semitones (f( , )= . , p=. , hp =. ), in contrast to the significant main effects of delay for harmonic and inharmonic conditions at both difficulty levels (p<. in all cases). 
this pattern of results is consistent with the idea that there are two representations that listeners could use for discrimination: a sensory trace of the spectrum which decays quickly over time, and a representation of the f0 which is better retained over a delay. the spectrum can be used in the harmonic and inharmonic conditions, but not in the interleaved harmonic condition. as in experiment , the effects were qualitatively similar for musicians and non-musicians (supplementary figure ) . although there was a significant main effect of musicianship (f( , )= . , p<. , hp =. ), the interaction between the effects of delay and harmonicity was significant in both musicians (f( , )= . , p<. , hp =. ) and non-musicians.

a. schematic of stimuli and trial structure in experiment . left: during each trial, participants heard two tones and judged whether the second was higher or lower than the first. tones were separated by a delay of , , or seconds. right: power spectra of hz tones for harmonic, inharmonic, and interleaved harmonic conditions. b. results of experiment , plotted separately for the two difficulty levels. error bars show standard error of the mean.

the purpose of experiment was to examine the effects of the specific inharmonicity manipulation used in experiments and . in experiments and , we used a different inharmonic jitter pattern for each trial. in principle, listeners might be able to learn a spectral pattern if it repeats across trials ( , ) , such that the harmonic advantage found in experiments and might not have been due to harmonicity per se, but rather to the fact that the harmonic pattern occurred repeatedly whereas the inharmonic pattern did not. in experiment we compared performance in a condition where the jitter pattern was held constant across trials ('inharmonic-fixed') vs. when it was altered every trial (though the same for the two notes of a trial), as in the previous experiments ('inharmonic').
unlike the first two experiments, we used adaptive procedures to measure discrimination thresholds. adaptive threshold measurements avoided difficulties associated with choosing stimulus differences appropriate for the anticipated range of performance across the conditions. participants completed -down- -up two-alternative-forced-choice ('is the second tone higher or lower than the first') adaptive 'runs', each of which produced a single threshold measurement, with approximately trials per run on average. the three stimulus conditions (harmonic, inharmonic, and inharmonic-fixed) were presented in separate blocks, and the order of these three blocks was randomized across participants. this design does not preclude the possibility that participants might learn a repeating inharmonic pattern given even more exposure to it, but it puts the inharmonic and harmonic tones on equal footing, testing whether the harmonic advantage might be due to relatively short-term learning of the consistent spectral pattern provided by harmonic tones.

as shown in figure c , thresholds were similar for all conditions with no delay (no significant difference between any of the conditions; z< . , p>. for all pairwise comparisons, wilcoxon signed-rank test, used because the distribution of thresholds was non-normal). these discrimination thresholds were comparable to previous measurements for stimuli with resolved harmonics ( , ) . however, thresholds were slightly elevated (worse) for both the inharmonic

these results indicate that the harmonic advantage cannot be explained by the consistency of the harmonic spectral pattern across trials, as making the inharmonic spectral pattern consistent did not reduce the effect. given this result, in subsequent experiments we opted to use inharmonic rather than inharmonic-fixed stimuli, to avoid the possibility that the results might otherwise be biased by the choice of a particular jitter pattern.
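The adaptive runs used to measure thresholds in this experiment follow an n-down-1-up rule, which can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' procedure: the value of `n_down`, the multiplicative step size, the stopping rule, and the use of the geometric mean of the last reversals as the threshold estimate are all hypothetical choices, since those details are not preserved in the text. In general an n-down-1-up track converges on the stimulus difference at which the probability p of a correct response satisfies p^n = 0.5 (about 79% correct for n = 3).

```python
import math


def run_staircase(respond, start_delta, n_down=3, step=2.0,
                  n_reversals=8, n_average=6, max_trials=1000):
    """Run one n-down-1-up track and return a threshold estimate.

    respond(delta) returns True if the simulated listener answers correctly.
    delta shrinks by `step` after n_down consecutive correct responses and
    grows by `step` after any error. The threshold is the geometric mean of
    the last n_average reversal points (an assumed, common convention).
    """
    delta = start_delta
    streak = 0
    last_direction = 0            # -1: last change made the task harder, +1: easier
    reversals = []
    for _ in range(max_trials):
        if respond(delta):
            streak += 1
            if streak < n_down:
                continue          # not enough correct responses yet; repeat delta
            streak = 0
            direction = -1
        else:
            streak = 0
            direction = +1
        if last_direction and direction != last_direction:
            reversals.append(delta)
            if len(reversals) == n_reversals:
                break
        last_direction = direction
        delta = delta / step if direction < 0 else delta * step
    if not reversals:
        raise ValueError("track ended without any reversals")
    tail = reversals[-n_average:]
    return math.exp(sum(math.log(d) for d in tail) / len(tail))


def semitones_to_ratio(semitones):
    """Frequency ratio for an interval in semitones (12 per octave)."""
    return 2.0 ** (semitones / 12.0)


# Idealized listener: always correct at or above 1 semitone, always wrong below.
threshold = run_staircase(lambda delta: delta >= 1.0, start_delta=4.0)
print(threshold, semitones_to_ratio(threshold))
```

The deterministic listener is only there to make the mechanics easy to follow; a real listener responds probabilistically, and the track then oscillates around the convergence point given above.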
to assess whether the inharmonic deficit could somehow reflect interference from successive trials ( ) rather than the decay of memory during the inter-stimulus delay, we replicated a subset of the conditions from experiment using a longer inter-trial interval. we included only the harmonic and inharmonic conditions, with and without a second delay between notes. for each condition participants completed four adaptive threshold measurements without any enforced delay between trials, and four adaptive thresholds where we imposed a second inter-trial interval (such that the delay between trials would always be at least second longer than the delay between notes of the same trial). this experiment was run online because the lab was temporarily closed due to the covid- virus. the interaction between within-trial delay ( vs. seconds) and stimulus type (harmonic vs. inharmonic) was present both with and without the longer inter-trial interval (with: f( , )= . , p=. , hp =. ; without: f( , )= . , p=. , hp =. ). in addition, we found no significant interaction between inter-trial interval, within-trial delay, and harmonic vs. inharmonic stimuli (f( , )= . , p=. , hp =. , supplementary figure ). this result suggests that the inharmonic deficit is due to difficulties retaining a representation of the tones during the delay period, rather than some sort of interference from preceding stimuli. a. schematic of trial structure for experiment . task was identical to that of experiments and , but with delay durations of , and seconds, and adaptive threshold measurements rather than method of constant stimuli. b. example block order for experiment , in which the adaptive runs for each condition were presented within a contiguous block. the beginning of an example run is shown schematically for each type of condition. stimulus conditions (harmonic, inharmonic, inharmonic-fixed) were blocked. 
note the difference between the inharmonic-fixed condition, in which the same jitter pattern was used across all trials within a block, and the inharmonic condition, in which the jitter pattern was different on every trial. delay conditions were intermixed within each block. c. results of experiment . error bars show within-subject standard error of the mean. experiments - leave open the possibility that listeners might use active rehearsal of the stimuli (singing to themselves, for instance) to perform the task over delays. although prior results suggest that active rehearsal does not obviously aid discrimination of tones over delays ( , ) , it could in principle explain the harmonic advantage on the assumption that it is more difficult to rehearse an inharmonic stimulus, and so it seemed important to address. to assess whether the harmonic advantage reflects rehearsal, we ran a 'one-shot' online experiment with much longer delay times, during which participants filled out a demographic survey. we assumed this unrelated task would prevent them from actively rehearsing the heard tone. this experiment was run online to recruit the large number of participants needed to obtain sufficient power. we have previously found that online participants can perform about as well as in-lab participants ( , ) provided basic steps are taken both to maximize the chances of reasonable sound presentation by testing for earphone/headphone use ( ) , and to ensure compliance with instructions, either by providing training or by removing poorly performing participants using hypothesis-neutral screening procedures. each participant completed only two trials in the main experiment. one trial had no delay between notes, as in the s delay conditions of the previous experiments. during the other trial participants heard one tone, then were redirected to a short demographic survey, and then heard the second tone ( figure a ). the order of the two trials was randomized across participants. 
for each participant, both trials contained the same type of tone, randomly assigned. the tones were either harmonic, inharmonic, or interleaved-harmonic (each identical to the tones used in experiment ). the two test tones always differed in f by a semitone. the discrimination task was described to participants at the start of the experiment, such that participants knew they should try to remember the first tone before the survey and that they would be asked to compare it to a second tone heard after the survey. to ensure task comprehension, participants completed practice trials with feedback (without a delay, with an f difference of a semitone). these practice trials were always with same types of tones the participant would hear during the main experiment (for instance, if participants heard inharmonic stimuli in the test trials, the practice trials also featured inharmonic stimuli). we ran a large number of participants to obtain sufficient power given the small number of trials per participant. participants completed the survey at their own pace. we measured the time interval between the onset of the first note before the survey and the onset of the second note after the survey. prior to analysis we removed participants who completed the survey in under seconds (proceeding through the survey so rapidly as to suggest that the participant did not read the questions; participant), or participants who took longer than minutes ( participants for all pairwise comparisons). as shown in figure b , experiment qualitatively replicated the results of experiments and even with the longer delay period and concurrent demographic survey. without a delay, there was no difference between performance with harmonic and inharmonic conditions (p=. , via bootstrap). with a delay, performance in both the harmonic and inharmonic conditions was worse than without a delay (p<. for both). 
however, this impairment was larger for the inharmonic condition; performance with inharmonic tones across a delay was significantly worse than that with harmonic tones (p<. ), producing an interaction between the type of tone and delay (p=. ). by contrast, performance on the interleaved harmonic condition did not deteriorate over the delay and in fact slightly improved (p=. ). this latter result could reflect the decay of the representation of the spectrum of the first tone across the delay, which in this condition might otherwise impair f discrimination by providing a competing cue (because the spectra of the two tones are different ( )). a. schematic of the two trial types in experiment . task was identical to that of experiment . however, for the 'delay' condition participants were redirected to a short demographic survey that they could complete at their own pace. b. results of experiment (which had the same stimulus conditions as experiment ). error bars show standard error of the mean, calculated via bootstrap. the similarity in performance between harmonic and inharmonic tones without a delay provides circumstantial evidence that listeners are computing changes in the same way for both types of stimuli, presumably using a representation of the spectrum in both cases. however, the results leave open the alternative possibility that listeners use a representation of the f for the harmonic tones despite having access to a representation of the spectrum (which they must use with the inharmonic tones), with the two strategies happening to support similar accuracy. to address these possibilities, and to explore whether listeners use different encoding strategies depending on memory constraints, we employed an individual differences approach ( , ) . the underlying logic is that performance on tasks that rely on the same perceptual representations should be correlated across participants. 
for example, if two discrimination tasks rely on similar representations, a participant with a low threshold on one task should tend to have a low threshold on the other task. in experiment , we estimated participants' discrimination thresholds either with or without a -second delay between stimuli, using a -down- -up adaptive procedure (figure a ). stimuli included harmonic, inharmonic, and interleaved-harmonic complex tones, generated as in experiments - . we used inharmonic stimuli for which a different pattern of jitter was chosen for each trial, because experiment showed similar results whether the inharmonic pattern was consistent or not, and because it seemed better to avoid a consistent jitter. specifically, it seemed possible that fixing the jitter pattern across trials for each participant might create artifactual individual differences given that some jitter patterns are by chance closer to harmonic than others. having the jitter pattern change from trial to trial produces a similar distribution of stimuli across participants and should average out the effects of idiosyncratic jitter patterns. we also included a condition with pure tones (sinusoidal tones, containing a single frequency) at the frequency of the fourth harmonic of the tones in the harmonic condition, and a fifth condition where each note contained two randomly chosen harmonics from each successive set of four ( to , to , etc.). by chance, some harmonics could be found in both notes. this condition (random harmonic) was intended as an internal replication of the anticipated results with the interleaved harmonic condition. the main hypothesis we sought to test was that listeners use one of two different representations depending on the duration of the inter-stimulus delay and the nature of the stimulus. 
specifically, without a delay, listeners use a detailed spectral representation for both harmonic and inharmonic sounds, relying on an f -based representation only when a detailed spectral pattern is not informative (as in the interleaved harmonic condition). in the presence of a delay, they switch to relying on an f -based representation for all harmonic sounds. the key predictions of this hypothesis in terms of correlations between thresholds are ) high correlations between harmonic and inharmonic discrimination without a delay, ) lower correlations between harmonic and inharmonic discrimination with a delay, ) low correlations between interleaved harmonic discrimination and both the harmonic and inharmonic conditions with no delay (because the former requires a representation of the f ), and ) higher correlations between interleaved harmonic and harmonic discrimination with a delay. we ran this study online to recruit sufficient numbers to measure the correlations of interest. this experiment was relatively arduous (it took approximately hours to complete), and based on pilot experiments we anticipated that many online participants would perform poorly relative to in-lab participants, perhaps because the chance of distraction occurring at some point over the hours is high. to obtain a criterion level of performance with which to determine inclusion for online participants, we ran a group of participants in the lab to establish acceptable performance levels. we calculated the overall mean threshold for all conditions without a delay (five conditions) for the best two-thirds of in-lab participants, and excluded online participants whose mean threshold on the first run of those same conditions (without a delay) was above the pattern of mean threshold measurements obtained online qualitatively replicated the results of experiments - ( figure b ). inharmonic thresholds were indistinguishable from harmonic thresholds without a delay (z= . , p=. 
), but were higher when there was a delay between sounds (z=- . , p<. ). this produced a significant interaction between effects of tone

figure c shows the correlations across participants between different pairs of thresholds. thresholds were correlated to some extent for all pairs of conditions, presumably reflecting general factors such as attention or motivation that produce variation in performance across participants. however, some correlations were higher than others. figure d&e plot the correlations (extracted from the matrix in figure c ) for which our hypothesis makes critical predictions, to facilitate their inspection. correlations here and elsewhere were corrected for the reliability of the underlying threshold measurements ( ) ; the correlation between two thresholds was divided by the square-root of the product of their reliabilities (cronbach's alpha calculated from spearman's correlations between pairs of the last runs of each condition). this denominator provides a ceiling for each correlation, as the correlation between two variables is limited by the accuracy with which each variable is measured. thus, correlations could in principle be as high as in the limit of large data, but because the threshold reliabilities were calculated from modest sample sizes, the corrected correlations could in practice slightly exceed . the results for pure tones were similar to those for the harmonic condition (figure e).

a. schematic of the five types of stimuli used in experiment . task and procedure (adaptive threshold measurements) were identical to those of experiment , but with no inter-trial delays. b. discrimination thresholds for all stimulus conditions, with and without a -second delay between tones. here and in panels d-e, error bars show standard error of the mean, calculated via bootstrap. c. matrix of the correlations between thresholds for all pairs of conditions.
correlations are spearman's rho, corrected for the reliability of the threshold measurements (i.e., corrected for attenuation). corrected correlations can slightly exceed given that they are corrected with imperfect estimates of the reliabilities. d. comparison between harmonic/inharmonic and harmonic/interleaved harmonic threshold correlations, with and without a delay. e. comparison between pure tone condition correlation with and without a delay. we examined the relationship between pitch perception and memory by measuring the discrimination of harmonic and inharmonic sounds with and without a time delay between stimuli. across several experiments, we found that discrimination across a delay was better for harmonic sounds than for inharmonic sounds, despite comparable accuracy without a delay. this effect was observed over delays of a few seconds and persisted over longer delays with an intervening distractor task. we also analyzed individual differences in discrimination thresholds across a large number of participants. harmonic and inharmonic discrimination thresholds were highly correlated without a delay between sounds, but were less correlated with a delay. by contrast, thresholds for harmonic tones and tones designed to isolate f -based discrimination (interleaved-harmonic tones) showed the opposite pattern, becoming more correlated with a delay between sounds. together, the results suggest that listeners use different representations depending on memory demands, comparing the spectra for sounds nearby in time, and the f for sounds separated in time. the results provide evidence for two distinct mechanisms for pitch discrimination, reveal the constraints that determine when they are used, and demonstrate a form of abstraction within the auditory system whereby the representations of memory differ in format from those used for rapid on-line judgments about sounds. 
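The reliability correction applied to the threshold correlations (dividing each correlation by the square root of the product of the two reliabilities) can be sketched as follows. This is a hedged illustration: the function names and the numerical values are hypothetical, and the standardized form of Cronbach's alpha, computed from the mean correlation among k runs, is an assumption about how the reliabilities could be obtained from pairwise run correlations. The example also shows how a corrected correlation can exceed 1 when the reliabilities are themselves imperfect estimates.

```python
import math


def standardized_alpha(mean_r, k):
    """Standardized Cronbach's alpha from the mean correlation among k runs."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Divide an observed correlation by the geometric mean of the reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: mean run-to-run correlations for two conditions,
# k runs per condition, and an observed between-condition correlation.
rel_a = standardized_alpha(mean_r=0.5, k=3)
rel_b = standardized_alpha(mean_r=0.6, k=3)
corrected = correct_for_attenuation(0.8, rel_a, rel_b)
print(rel_a, rel_b, corrected)
```

The denominator acts as the ceiling on the observable correlation, so the corrected value expresses each correlation as a fraction of the maximum the measurement noise would allow.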
in hearing research, the word 'pitch' has traditionally referred to the perceptual correlate of the f0 ( ) . in some circumstances listeners must base behavior on the absolute f0 of a sound of interest, as when singing back a heard musical note. however, much of the time the information that matters to us is conveyed by how the f0 changes over time, and our results indicate that listeners often extract this information using a representation of the spectrum rather than the f0. one consequence of this is that note-to-note changes can be completely unambiguous even for inharmonic sounds that lack an unambiguous f0. is this pitch perception? under typical listening conditions (where sounds are harmonic) the changes in the spectrum convey changes in f0, and thus enable judgments about how the f0 changes from note to note. consistent with this idea, listeners readily describe what they hear in the inharmonic conditions of our experiments as a pitch change, as though the consistent shift in the spectrum is interpreted as a change in the f0 even though neither note has a clear f0. we propose that these spectral judgments should be considered part of pitch perception, which we construe to be the set of computations that enable judgments about a sound's f0. the perception of inharmonic "pitch changes" might thus be considered an illusion, exploiting the spectral pitch mechanism in conditions in which it does not normally operate.

why don't listeners base pitch judgments of harmonic sounds on their f0s when sounds are back-to-back? one possibility is that representations of f0 are in some cases less accurate and produce poorer discrimination than those of the spectrum. the results with interleaved harmonics (stimuli designed to isolate f0-based pitch) are consistent with this idea, as discrimination without a delay was worse for interleaved harmonics than for either harmonic or inharmonic tones that had similar spectral composition across notes.
however, we note that this deficit could also reflect other stimulus differences, such as the potentially interfering effects of the changes in the spectrum from note to note ( ) . regardless of the root cause for the reliance on spectral representations, the fact that performance with interleaved harmonics was similar with and without modest delays suggests that representations of the f are initially available in parallel with representations of the spectrum, with task demands determining which is used in the service of behavior. as time passes the high-fidelity representation of the spectrum appears to degrade, and listeners switch to more exclusively using a representation of the f . the use of time delays to study memory for frequency and/or pitch has a long tradition ( , ) . our results here are broadly consistent with this previous work, but to our knowledge provide the first evidence for differences in how representations of the f and the spectrum are retained over time. a number of studies have examined memory for tones separated by various types of interfering stimuli, and collectively provide evidence that the f of a sound is retained in memory. for example, intervening sounds interfere most with discrimination if their f s are similar to the tones being discriminated, irrespective of whether the intervening sounds are speech or synthetic tones ( ) , and irrespective of the spectral content of the intervening notes or the two comparison tones ( ) . our results complement these findings by showing that memory for f has different characteristics from that for frequency content (spectra), by showing how these differences impact pitch perception, and by suggesting that memory for f should be viewed as a form of compression. we also show that introducing a delay between notes forces listeners to use the f rather than the spectrum, which may be useful in experimentally isolating f -based pitch in future studies. 
our results are consistent with the idea that memory capacity limitations for complex spectra in some cases limit judgments about sound. previous studies of memory for complex tones failed to find clear evidence for such capacity limitations, in that there was no interaction between the effects of inter-stimulus-interval and of the number of constituent frequencies on the accuracy of judgments of a remembered tone ( ) . however, there were many differences between these prior experiments and those described here that might explain the apparent discrepancy, including that the time intervals tested were short (at most seconds) compared to those in our experiments, and the participants highly practiced. it is possible that under such conditions listeners are less dependent on the memory representations that were apparently tapped in our experiments. the tasks used in those prior experiments were also different (involving judgments of a single frequency component within an inharmonic complex tone), as were the stimuli (frequencies equidistant on a logarithmic scale). quantitative models of memory representations and their use in behavioral tasks seem likely to be an important next step in evaluating whether the available results can be explained by a single type of memory store. our results could have interesting analogues in vision. visual short-term memory has been argued to store relatively abstract representations ( - , , ), and the grouping of features into object-like representations is believed to increase its effective capacity ( ) ( ) ( ) . our results raise the question of whether such benefits are specific to memory. it is plausible that for stimuli presented back-to-back, discrimination of visual element arrays would be similar irrespective of the element arrangement, with advantages for elements that are grouped into a coherent pattern only appearing when short-term memory is taxed. to our knowledge this has not been explicitly tested. 
one apparent difference between auditory and visual memory is that in some contexts visual memory for simple stimuli decays relatively slowly, with performance largely unimpaired for multisecond delays comparable to those used here ( ) . by contrast, auditory memory is more vulnerable, with performance decreases often evident over seconds even for pure tone discrimination (e.g. fig. b) . as a consequence, visual memory has often been studied via memory 'masking' effects in which stimuli presented in the delay period impair performance if they are sufficiently similar to the stimuli being remembered ( ) ( ) ( ) . such effects also occur for auditory memory ( ) ( ) ( ) , but the performance impairments that occur with a silent delay were sufficient in our case to illuminate the underlying representation. masking effects might nonetheless yield additional insights. f -based pitch seems to be particularly important in music perception, evident in prior results documenting the effect of inharmonicity on music-related tasks. melody recognition, 'sour' note detection, and pitch interval discrimination are all worse for inharmonic than for harmonic tones, in contrast to other tasks such as up/down discrimination, which can be performed equally well for the two types of tones (shown again in the experiments here) ( ). our results here provide a potential explanation for these effects. music often requires notes to be compared across delays or intervening sounds, as when assessing a note's relation to a tonal center ( ) , and the present results suggest that this should necessitate f -based pitch. musical pitch perception may have come to rely on representations of f as a result, such that even in musical tasks that do not involve storage across a delay, such as musical interval discrimination, listeners use f -based pitch rather than the spectrum ( ). 
it is possible that similar memory advantages occur for other patterns that occur frequently in music, such as common chords. given its evident role in music perception, it is natural to wonder whether f -based pitch is honed by musical training. western-trained musicians are known to have better pitch discrimination than western non-musicians ( ) ( ) ( ) . however, previous studies examining effects of musicianship on pitch discrimination used either pure tone or harmonic complex tone stimuli, and thus do not differentiate between representations of the f vs. the spectrum. we found consistent overall pitch discrimination advantages for musicians compared to non-musicians ( supplementary figures - ), but found no evidence that this benefit was specific to f representations: musicianship did not interact with the effects of inharmonicity or inter-stimulus delay. it is possible that more extreme variation in musical experience might show f -specific effects. for instance, indigenous cultures in the amazon appear to differ from westerners in basic aspects of pitch ( ) and harmony perception ( , ) , raising the possibility that they might also differ in the extent of reliance on f -based pitch. it is also possible that musicianship effects might be more evident if memory were additionally taxed with intervening distractor tones. perception is often posited to estimate the distal causes in the world that generated a stimulus ( ) . parameters that capture how a stimulus was generated are useful for behavior -as when one requires knowledge of an object's shape in order to grasp it -but can also provide compressed representations of a stimulus. indeed, efficient coding has been proposed as a way to estimate generative parameters of sensory signals ( ) . a sound's f is one such generative parameter, and our results suggest that its representation may be understood in terms of efficient coding. 
prior work has explained aspects of auditory representations ( , ) and discrimination ( ) as consequences of efficient coding, but has not explored links to memory. our results raise the possibility that efficient coding may be particularly evident in sensory memory representations. we provide an example of abstract and compressed auditory memory representations, and in doing so explain some otherwise puzzling results in pitch perception (chiefly, the fact that conventional pitch discrimination tasks are not impaired by inharmonicity). this efficient coding perspective suggests that harmonic sounds may be more easily remembered because they are prevalent in the environment, such that humans have acquired representational transforms to efficiently represent them (e.g. by projection onto harmonic templates) ( ) . this interpretation also raises the possibility that the effects described here might generalize to or interact with other sound properties. there are many other regularities of natural sounds that influence perceptual grouping ( , ) . each of these could in principle produce memory benefits when sounds must be stored across delays. in addition to regularities like harmonicity that are common to a wide range of natural sounds, humans also use learned 'schemas' for particular sources when segregating streams of sound ( , ) and these might also produce memory benefits. it is thus possible that recurring inharmonic spectral patterns, for instance in inharmonic musical instruments ( ), could confer a memory advantage to an individual with sufficient exposure to them, despite lacking the mathematical regularity of the harmonic series. memory could be particularly important in audition given that sound unfolds over time, with the structures that matter in speech, music, and other domains often extending over many seconds. 
other examples of auditory representations that discard details in favor of more abstract structures include the 'contour' of melodies, that listeners retain in some conditions in favor of the exact f intervals between notes ( ), or summary statistics of sound textures that average across temporal details ( , ) . these representations may reflect memory constraints involved in comparing two extended stimuli even without a pronounced inter-stimulus delay. our results leave open how the two representations implicated in pitch judgments are instantiated in the brain. pitch-related brain responses measured in humans have generally not distinguished representations of the f from that of the spectrum ( ) ( ) ( ) ( ) ( ) , in part because of the coarse nature of human neuroscience methods. moreover, we know little about how pitch representations are stored over time in order to mediate discrimination across a delay. in non-human primates there is evidence for representations of the spectrum of harmonic complex tones ( ) as well as of their f ( ) , though there is increasing evidence for heterogeneity in pitch across species ( ) ( ) ( ) ( ) ( ) . neurophysiological and behavioral experiments with delayed discrimination tasks in non-human animals could shed light on these issues. our results also indicate that we unconsciously switch between representations depending on the conditions in which we must make pitch judgments (i.e., whether there is a delay between sounds). one possibility is that sensory systems can assess the reliability of their representations and base decisions on the representation that is most reliable for a given context. evidence weighting according to reliability is a common strategy in perceptual decisions ( ) , and our results raise the possibility that such frameworks could be applied to understand memory-driven perceptual decisions. 
all experiments were approved by the committee on the use of humans as experimental subjects at the massachusetts institute of technology, and were conducted with the informed consent of the participants. in all experiments except for experiment (which contained only two trials per participant), we excluded poorly performing participants using hypothesis-neutral performance criteria. in experiments - , we selected exclusion criteria a priori based on our expectations of what would constitute good performance. for experiment , we conducted a pilot experiment in the lab, and selected online participants who performed comparably to good in-lab participants. musicianship: across all experiments, musicians were defined as individuals with five or more years of self-reported musical training and/or active practice/performance. non-musicians had four or fewer years of self-reported musical training and/or active practice/performance. experiment : participants were recruited for the online component of experiment . we sought to obtain mean performance levels comparable with those of compliant and attentive participants run in the lab. to this end, we excluded participants whose average performance across conditions fell below a cutoff. we ran participants in the lab to establish this cutoff. we used the average threshold from the best two-thirds of these in-lab participants ( in all experiments, a macmini computer running psychtoolbox for matlab ( ) was used to play sound waveforms. sounds were presented to participants at db spl over sennheiser hd headphones (circumaural) in a soundproof booth (industrial acoustics). sound levels were calibrated with headphones coupled to an artificial ear, with a microphone at the position of the eardrum. participants logged their responses via keyboard press. audio presentation: online. 
we used the crowdsourcing platform provided by amazon mechanical turk to run experiments that necessitated large numbers of participants (experiments and ), or when in-person data collection was not possible due to the covid- virus (experiment ). each participant in these studies used a calibration sound to set a comfortable level, and then had to pass a 'headphone check' experiment that helped ensure they were wearing headphones or earphones as instructed ( ) before they could complete the full experiment. the experimental stimuli were set to . db below the level of the calibration sound, to ensure that stimuli were never uncomfortably loud. participants logged their responses by clicking buttons on their computer monitors using their mouse. feedback (correct/incorrect) was given after each trial for all tasks except for the two test trials for about half of the participants of experiment (see below). procedure: participants heard two instrument notes per trial, separated by varying amounts of silence ( , , and seconds) and judged whether the second note was higher or lower than the first note. participants heard trials per condition, and all conditions were intermixed. the first stimulus for a trial began one second after the response was entered for the previous trial, such that there was at least a -second gap between successive trials. stimuli: instrument notes were derived from the rwc instrument database, which contains recordings of chromatic scales played on different instruments ( ) . we used recordings of baritone saxophone, cello, ukulele, pipe organ and oboe, chosen to cover a wide range of timbres. instrument tones were manipulated using the straight analysis and synthesis method ( ) ( ) ( ) . straight is normally used to decompose speech into excitation and vocal tract filtering, but can also decompose a recording of an instrument into an excitation signal and a spectrotemporal filter. 
if the voiced part of the excitation is modeled sinusoidally, one can alter the frequencies of individual harmonics, and then recombine them with the unaltered instrument body filtering to generate inharmonic notes. this manipulation ( ) leaves the spectral shape of the instrument largely intact. previous studies with speech suggest that the intelligibility of inharmonic speech is comparable to that of harmonic speech ( ) . the frequency jitters for inharmonic instruments were chosen in the same way as the jitters for the inharmonic synthetic tones used in experiments - (described below). the same pattern of jitter was used for both notes in a trial. straight was also used to frequency-shift the instrument notes to create pairs of notes that differed in f by a specific amount. audio was sampled at , hz. all notes were ms in duration and were windowed by ms half-hanning windows. each trial consisted of two notes. the second note differed from the first by . or . semitone. to generate individual trials, the f of the first note of each trial was randomly selected from a uniform distribution over the notes in a western classical chromatic scale between and hz (g to g ). a recording of this note, from an instrument selected from the set of that were used (baritone saxophone, cello, ukulele, pipe organ and oboe), was chosen as the source for the first note in the trial (instruments were counterbalanced across conditions). if the second note in the trial was higher, the note semitone above was used to generate the second note (the note semitone lower was used if the second note of the trial was lower). the two notes were analyzed and modified using the straight analysis and synthesis method ( ) ( ) ( ) ; the notes were f flattened to remove any vibrato, shifted to ensure that the f differences would be exactly the intended f difference apart, and resynthesized with harmonic or inharmonic excitation. some instruments, such as the ukulele, have slightly inharmonic spectra. 
these slight inharmonicities were removed for the harmonic conditions due to the resynthesis. experiments , , , . , and used the same types of tones. the stimuli for experiment . are described below. synthetic complex tones were generated with exponentially decaying temporal envelopes (decay constant of s - ) to which onset and offset ramps were applied ( ms half-hanning window). the sampling rate was , hz for experiment , and , hz for all others. prior to bandpass filtering, tones included all harmonics up to the nyquist limit, in sine phase, and were always ms in duration. in order to make notes inharmonic, the frequency of each harmonic, excluding the fundamental, was perturbed (jittered) by an amount chosen randomly from a uniform distribution, u(-. , . ). this jitter value was chosen to maximally perturb f (lesser jitter values did not fully remove peaks in the autocorrelation at the period of the original f ( )). jitter values were multiplied by the f of the tone, and added to the frequency of the respective harmonic. for example, if the f was hz and a jitter value of - . was selected for the second harmonic, its frequency would be set to hz. to minimize salient differences in beating, jitter values were constrained (via rejection sampling) such that adjacent harmonics were always separated by at least hz. the same jitter pattern was applied to every note of the stimulus for a given trial, such that the spectral pattern shifted coherently up or down, even in the absence of an f . except for the inharmonic-fixed condition of experiment , where one random jitter pattern was used for entire blocks of the experiment, a new jitter pattern was chosen for each trial. each complex tone was band-pass filtered in the frequency domain with a gaussian transfer function (in log frequency) centered at , hz with a standard deviation of half an octave. 
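The jitter procedure described above (uniform jitter per harmonic, scaled by the f0, with rejection sampling to keep components apart) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the jitter range `max_jitter`, and the minimum spacing `min_sep_hz` are hypothetical parameters, because the numeric values are not recoverable from the text here. The spacing check is applied for a single f0, which is also an assumption.

```python
import random


def inharmonic_frequencies(f0, jitters):
    """Harmonic k is placed at k*f0 + jitter_k*f0 (jitter is in units of f0)."""
    return [(k + j) * f0 for k, j in enumerate(jitters, start=1)]


def sample_jitter_pattern(n_harmonics, f0, max_jitter, min_sep_hz,
                          rng=None, max_tries=100_000):
    """Draw one jitter pattern by rejection sampling.

    The fundamental is left unjittered; every higher harmonic gets a jitter
    from U(-max_jitter, max_jitter). A candidate is rejected unless all
    neighbouring components (after sorting by frequency) are at least
    min_sep_hz apart.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        jitters = [0.0] + [rng.uniform(-max_jitter, max_jitter)
                           for _ in range(n_harmonics - 1)]
        freqs = sorted(inharmonic_frequencies(f0, jitters))
        if all(b - a >= min_sep_hz for a, b in zip(freqs, freqs[1:])):
            return jitters
    raise RuntimeError("no acceptable jitter pattern found")


# Illustrative values only (not taken from the paper): with an f0 of 200 Hz
# and a jitter of -0.25 on the second harmonic, that component moves from
# 400 Hz to (2 - 0.25) * 200 Hz.
example = inharmonic_frequencies(200.0, [0.0, -0.25])
```

Reusing the returned pattern for both notes of a trial, with only the f0 changed, shifts the whole spectral pattern coherently, as the text describes.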
this filter served to ensure that participants could not perform the tasks using changes in the spectral envelope, and also to minimize timbral differences between notes in the interleaved harmonic condition. the filter parameters were chosen to ensure that the f was attenuated (to eliminate variation in a spectral edge at the f ) while preserving audibility of resolved harmonics (harmonics below the th , approximately). the combination of the filter and the masking noise (described below) rendered the frequency component at the f inaudible. to ensure that differences in performance for harmonic and inharmonic conditions could not be mediated by distortion products, we added masking noise to these bandpass filtered notes. we low pass filtered pink noise using a sigmoidal (logistic) transfer function in the frequency domain. the sigmoid had an inflection point at the third harmonic of the highest of the two notes on a trial, and a maximum slope yielding db of gain or attenuation per octave. we scaled the noise so that the noise power in a gammatone filter (one erbn in bandwidth ( ) , implemented as in ( )) centered at the f was db lower than the mean power of the three harmonics of the highest note of the trial that were closest to the , hz peak (and thus had greatest magnitude) of the gaussian spectral envelope ( ) . this noise power is sufficient to mask distortion products at the f ( , ) . this filtered and scaled pink noise was added to each note, and did not continue through the silence in 'delay' conditions. noise has been reported to facilitate the perception of the f of a set of harmonics ( , ) in contexts where the harmonic frequencies are embedded in relatively high levels of noise. because the noise in our stimuli was focused at the f rather than the higher harmonics that composed our tones, it seems less likely to have produced such a benefit, but we never specifically manipulated it to assess its effect. 
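As a rough illustration of why a Gaussian band-pass filter defined in log frequency removes the component at the f0 while leaving mid-range harmonics nearly untouched, the gain can be computed per harmonic. The standard deviation of half an octave comes from the text; the center frequency and f0 used below are placeholder assumptions, as is the choice to treat the Gaussian as a linear amplitude gain. Function names are hypothetical.

```python
import math


def gaussian_log_gain(freq_hz, center_hz, sd_octaves=0.5):
    """Linear amplitude gain of a Gaussian defined over log2 frequency."""
    z = math.log2(freq_hz / center_hz) / sd_octaves
    return math.exp(-0.5 * z * z)


def harmonic_gains_db(f0, n_harmonics, center_hz, sd_octaves=0.5):
    """Gain in dB for harmonics 1..n_harmonics of f0.

    Computed analytically as 20*log10(exp(-z^2/2)) so that components far
    from the center do not underflow to log10(0).
    """
    db_per_neper = 20.0 * math.log10(math.e)
    gains = []
    for k in range(1, n_harmonics + 1):
        z = math.log2(k * f0 / center_hz) / sd_octaves
        gains.append(-0.5 * z * z * db_per_neper)
    return gains


# Placeholder values: a 200 Hz f0 and a 2500 Hz filter center (both assumed).
gains = harmonic_gains_db(200.0, 20, 2500.0)
```

With these placeholder numbers the fundamental sits several standard deviations below the filter center and is attenuated by far more than 100 dB, whereas harmonics near the center pass almost unchanged.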
procedure: the procedure was identical to that for experiment , except stimuli were synthetic tones. stimuli: each trial consisted of two notes, described above in stimuli for experiments - . the second tone differed from the first by . or . semitones. the first note of each trial was randomly selected from a uniform distribution on a logarithmic scale spanning to hz. tones were either harmonic, inharmonic, or interleaved harmonic. interleaved harmonic notes were synthesized by removing harmonics [ , , , , , etc.] in one note, and harmonics [ , , , , etc.] in the other note. they were otherwise identical to the harmonic tones (identical bandpass filter in the frequency domain, as well as noise to mask a distortion product at the fundamental). this manipulation was intended to isolate f -based pitch, as it removes the note-to-note spectral correspondence between harmonics. procedure: participants heard two notes per trial, separated by varying amounts of silence ( , and seconds) and were asked whether the second note was higher or lower than the first note. unlike experiments and , which used the method of constant stimuli, participants completed -up- -down adaptive threshold measurements for each condition. each run ended after reversals. for the first reversals, the f changed by a factor of , and for subsequent reversals by a factor of √ . each adaptive run was initialized at an f difference of semitone (approximately %), and the maximum f difference was limited to semitones. the adaptive procedure continued if participants reached this semitone limit; if they continued to get trials incorrect the f difference remained at the semitone limit, and if they got two in a row right, the f difference would decrease from semitones by a factor of or √ depending on how many reversals had already occurred. in practice participants who hit this limit repeatedly were removed before analysis due to our exclusion criteria. 
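The adaptive procedure above can be sketched as a simple staircase. This is an assumption-laden sketch rather than the authors' implementation: the rule "two correct in a row makes the task harder, one error makes it easier" is inferred from the description, the threshold is taken as the geometric mean of the final reversals (the estimator the paper reports for these runs), and every numeric setting (start level, cap, number of reversals, step factors) is a caller-supplied parameter because the values are elided in the text. `respond` is a hypothetical callback that returns True for a correct trial.

```python
from statistics import geometric_mean


def run_staircase(respond, start_diff, max_diff, n_reversals, n_coarse,
                  coarse_factor, fine_factor, n_final,
                  n_down=2, max_trials=10_000):
    """Run one adaptive track and return the threshold estimate.

    The f0 difference is divided by the step factor after n_down consecutive
    correct trials and multiplied by it (capped at max_diff) after any error.
    The coarse factor applies until n_coarse reversals have occurred.
    """
    diff, streak, last_dir, reversals = start_diff, 0, 0, []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        factor = coarse_factor if len(reversals) < n_coarse else fine_factor
        if respond(diff):
            streak += 1
            if streak < n_down:
                continue  # not enough consecutive correct trials yet
            streak, direction, new_diff = 0, -1, diff / factor
        else:
            streak, direction, new_diff = 0, +1, min(diff * factor, max_diff)
        if last_dir and direction != last_dir:
            reversals.append(diff)  # level at which the track changed direction
        last_dir, diff = direction, new_diff
    if len(reversals) < n_reversals:
        raise RuntimeError("staircase did not reach the requested reversals")
    return geometric_mean(reversals[-n_final:])
```

A simulated listener can be passed as `respond`, for example `lambda d: d >= 1.0` for an idealized observer whose true threshold is 1 semitone. A listener who is always wrong stays pinned at `max_diff` and never produces a reversal, which the sketch reports as an error rather than a threshold.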
thresholds were estimated by taking the geometric mean of the final reversals. the first stimulus for a trial began one second after the response was entered for the previous trial, such that there was at least a -second gap between successive trials. stimuli: each trial consisted of two notes, described above in stimuli for experiments - . the first note of each trial was randomly selected from a uniform distribution on a logarithmic scale spanning to hz. tones were either harmonic, inharmonic, or inharmonic-fixed, separated into three blocks, the order of which was counterbalanced across participants. participants completed adaptive thresholds within each block ( delay conditions x runs per condition). for the inharmonic-fixed block, a random jitter pattern was chosen at the beginning of the block and used for every trial within the entire block. procedure: the procedure for experiment was identical to that for experiment , except that the experiment was run on amazon mechanical turk, with different time intervals between trials. participants completed two sets of adaptive threshold measurements. in the first, trials were initiated by the participant, and could begin as soon as they entered the response for the previous trial and clicked a button to start the next trial. four adaptive threshold measurements per condition were taken in this way (harmonic and inharmonic stimuli both with and without a second delay). in the second set, a mandatory -second pause was inserted between each trial, which could be initiated by the participant once the pause had elapsed. four threshold measurements for the same conditions were taken in this way, and thresholds were estimated by taking the geometric mean of the final reversals. the two sets of measurements were randomly intermixed. to perform adaptive procedures online, stimuli were pre-generated (instead of being generated in real-time as was done for the in-lab studies). 
for each condition and possible f difference, the stimuli were drawn randomly from a set of pre-generated trials (varying in the f of the first note, and in the jitter pattern for the inharmonic trials). stimuli: stimuli were identical to those for the harmonic and inharmonic conditions in experiment . procedure. participants were recruited using the amazon mechanical turk crowdsourcing platform. in the main experiment, each participant completed two trials -one each of two trial types. in the first type of trial, they heard two consecutive notes and were asked whether the second note was higher or lower than the first. notes always differed by a semitone. for the second type of trial, participants heard the first note, then were directed to a demographic survey, then were presented with the second note, and then were asked to respond. before the presentation of the first note participants were told that they would be subsequently asked whether a second note heard after the survey was higher or lower in pitch than the note heard before the found that there was no significant difference between the two versions of the experiment in any of the conditions. we thus combined data across the two versions. before completing the two main experiment trials and the survey, participants completed practice trials without a delay, with the same type of tone that they would hear in the two main experiment trials (i.e. if a participant would hear inharmonic stimuli in the two experiment trials, their practice trials would contain inharmonic stimuli). participants received feedback on each of these practice trials. the stimulus difference in all trials was semitone. stimuli. stimuli for experiment were identical to those used in experiments - (harmonic, inharmonic and interleaved harmonic conditions). procedure: both in lab and on mechanical turk, participants completed -up- -down adaptive threshold measurements. 
the instructions were to judge whether the second note was higher or lower than the first note. the adaptive procedure was identical to that used in the first set of threshold measurements of experiment (each trial could be initiated by the participant as soon as they had entered their response for the previous trial). for the in-lab participants, we used the best three runs from each condition to set the inclusion criteria for online studies. for online participants, the first run of each condition was used to determine inclusion, and the final three runs of each condition were used for analysis. the order of the adaptive runs was randomized for each participant. there were runs for each of the conditions, for a total of adaptive runs, randomized in order for each participant. thresholds were estimated as the geometric mean of the final reversals. participants received feedback after each trial. stimuli: participants in lab and online were tested on different types of stimuli, presented either with no delay or a -second delay between tones ( conditions). the five types of stimuli were as follows: ( ) harmonic, ( ) inharmonic, and ( ) interleaved harmonic, all identical to the same conditions in experiment . ( ) pure tones: we used the th harmonic of the f (f * ) so that the stimuli would overlap in frequency with the complex tones used in other conditions, which were filtered so that first audible harmonic was generally the rd or th harmonic. low pass masking noise was omitted from the pure tone condition given that distortion products were not a concern. given the similarity in mean thresholds between the pure tone and harmonic conditions, and the high correlation between them across participants, the absence of noise in this condition does not in all conditions, the f of the initial tone for each trial was chosen randomly from a uniform distribution on a logarithmic scale spanning - hz. 
as in experiment , stimuli were pre-generated to enable online threshold measurements. for each condition and possible f difference, the stimuli were drawn randomly from a set of pre-generated trials (varying in the f of the first note, and in the jitter pattern for the inharmonic conditions). experiments and : a power analysis of pilot data for experiments and showed an effect size of d= . for the difference between harmonic and inharmonic conditions at seconds. we thus aimed to run at least musicians and non-musicians to be able to analyze the two groups separately and have an % chance of detecting the harmonic advantage at a p<. significance level (using paired t-tests). this number of participants also left us well-powered to observe an interaction between harmonicity and delay for both groups ( participants were needed to have an % chance of detecting an interaction with an effect size of that seen in pilot data, hp = . ). power analyses for experiments - and experiment (in lab baseline) used g*power ( ) . experiment was run in combination with other experiments (not described here) that were not as well powered and required more data, hence the additional participants. we performed power analyses for experiments and using a pilot experiment with participants where we measured thresholds either with or without a -second delay. the pilot experiment used the same method and analysis as experiments and , but without the -second-delay and fixed-jitter condition of experiment or the inter-trial delays of experiment . the effect size from this pilot experiment for the harmonic-inharmonic difference at the -second delay was d= . . based on the rough intuition that the effect of the inharmonic-fixed manipulation or the inter-trial delay might produce an effect approximately half this size, we sought to be % likely to detect an effect half as big as that observed in our pilot data, at a p<. significance (using a two-sided wilcoxon signed-rank test). 
this yielded a target sample size of participants. we did not plan to recruit equal numbers of musicians and non-musicians due to the similarity between groups in experiments - . yielding a target sample size of participants in each condition. we made no attempt to recruit equal numbers of musicians and non-musicians, as we did not plan to analyze those groups separately given the similar results across groups observed in the experiments we ran prior to this. experiment : for experiment , we performed a power analysis by bootstrapping pilot data (an earlier version of experiment with slightly different stimuli). for each of a set of sample sizes we computed bootstrap distributions of the interaction term (difference of differences between the conditions being compared, harmonic/inharmonic and interleaved harmonic), as well as null distributions obtained by permuting conditions across participants. we found that a sample size of yielded a % chance of seeing the interaction present in our pilot data at a p<. significance level. we ran more than this number of participants to allow performance-based exclusion. as with experiments - , we made no attempt to recruit equal numbers of musicians and non-musicians. for experiments , , and we calculated percent correct for each condition. for experiments and , data were evaluated for normality with lilliefors' composite goodness-of-fit test. data for experiment passed lilliefors' test, and so significance was evaluated using paired t-tests and repeated-measures anovas. we used mixed-model anovas to examine the effects of musicianship (to compare within-and between-group effects). data for experiment were nonnormal due to ceiling effects in some conditions, and so significance was evaluated with the same non-parametric tests used for the threshold experiments (described below). 
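The interaction term (a difference of differences) and its permutation null, as used in the power analysis above, can be sketched as follows. Several choices here are assumptions rather than the authors' stated method: the sign convention of the interaction, the reading of "permuting conditions across participants" as shuffling condition labels within each participant, the two-sided comparison, and the add-one p-value estimate. Function names and the data layout are hypothetical.

```python
import random
from statistics import mean


def interaction(rows):
    """Mean difference of differences across participants.

    rows: one tuple per participant, ordered as
    (harmonic_no_delay, harmonic_delay, inharmonic_no_delay, inharmonic_delay).
    """
    return mean((i_d - i_n) - (h_d - h_n) for h_n, h_d, i_n, i_d in rows)


def permutation_p_value(rows, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the interaction term."""
    rng = random.Random(seed)
    observed = interaction(rows)
    extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for row in rows:
            values = list(row)
            rng.shuffle(values)  # reassign condition labels within a participant
            shuffled.append(tuple(values))
        if abs(interaction(shuffled)) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

For a power analysis in the spirit of the one described, this test would be repeated on many bootstrap resamples of the pilot participants at each candidate sample size, and power estimated as the fraction of resamples that reach significance.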
for experiment , the significance of the differences between conditions and the significance of interactions were calculated via bootstrap ( , samples). to calculate the significance of the interaction between conditions, we first calculated the interaction (the difference of differences in means with and without a delay). for instance, for the harmonic and inharmonic conditions this term is the difference between the two conditions in the change in mean performance between trials with and without a delay. we then approximated a null distribution for this interaction, permuting conditions across participants and recalculating the difference of differences , times. to determine statistical significance we compared the actual value of the interaction to this null distribution.

data distributions were non-normal (skewed) for threshold experiments (experiments , and ) as well as for experiment , so non-parametric tests were used for all comparisons. wilcoxon signed-rank tests were used for pairwise comparisons between dependent samples (for example, two conditions for the same participant group). to compare performance across multiple conditions or across musicianship we used f statistics for repeated-measures anovas (for within-group effects) and mixed-model anovas (to compare within- and between-group effects). however, because data were non-normal, we evaluated the significance of the f statistic with approximate permutation tests, randomizing the assignment of the data points across the conditions being tested , times, and comparing the f statistic to this distribution.

we used spearman's rank correlations to examine individual differences in experiment . correlations were corrected for the reliability of the threshold measurements using the spearman correction for attenuation ( ). we used standardized cronbach's alpha as a measure of reliability ( , ).
this entailed calculating the spearman correlation between pairs of the analyzed adaptive track thresholds for each condition, averaging these three correlations, and applying the spearman-brown correction to estimate the reliability of the mean of the three adaptive threshold measurements. standard errors for correlations were estimated by bootstrapping the correlations , times. to calculate the significance of the interaction between conditions, we first calculated the interaction (the difference of differences). for instance, for the harmonic and inharmonic conditions, this term was the difference between the two conditions in the change between the values obtained with and without a delay. we then approximated a null distribution for this interaction, permuting conditions across participants and recalculating the difference of differences , times. to determine statistical significance we compared the actual value of the interaction to this null distribution.

supplementary figure . results from experiment , plotted separately for musicians and non-musicians. results are averaged across the two difficulty levels (. and . semitones) to maximize power. error bars show standard error of the mean. there was no interaction between musicianship, harmonicity and delay length (f( , )= . , p=. , hp =. ), and the interaction between delay and harmonicity was significant in non-musicians alone (f( , )= . , p=. , hp =. ).

results from experiment , measuring discrimination of synthetic tones with and without a delay between notes, and with and without a longer inter-trial interval. results from trials with (right) and without (left) an added second delay between trials are plotted separately. error bars show within-subject standard error of the mean. the interaction between within-trial delay ( vs. seconds) and stimulus type (harmonic vs. inharmonic) was present both with and without the longer inter-trial interval (with: f( , )= . , p=. , hp =. ; without: f( , )= . , p=. , hp =. ).
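the reliability correction and the permutation test for the interaction described in the statistics above can be sketched as follows. this is a minimal illustration, not the authors' analysis code: the function names, the default number of permutations, the sign convention of the difference of differences, and the choice to shuffle condition labels within each participant are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import mean


def spearman_brown(mean_r, k):
    """Reliability of the mean of k measurements, given the average
    pairwise correlation between them (Spearman-Brown formula)."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Spearman's correction: divide the observed correlation by the
    geometric mean of the two reliabilities."""
    return r_xy / sqrt(reliability_x * reliability_y)


def difference_of_differences(rows):
    """rows holds one (a_short, a_long, b_short, b_long) tuple per
    participant; the column order is a hypothetical layout."""
    a_short, a_long, b_short, b_long = (mean(col) for col in zip(*rows))
    return (b_long - b_short) - (a_long - a_short)


def permutation_p_value(rows, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for the interaction.

    Assumption: condition labels are shuffled independently within each
    participant; n_permutations is an arbitrary illustrative default.
    """
    rng = random.Random(seed)
    observed = difference_of_differences(rows)
    extreme = 0
    for _ in range(n_permutations):
        shuffled = []
        for row in rows:
            values = list(row)
            rng.shuffle(values)
            shuffled.append(tuple(values))
        if abs(difference_of_differences(shuffled)) >= abs(observed):
            extreme += 1
    # Add-one smoothing keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_permutations + 1)
```

with three adaptive tracks per condition, `spearman_brown(mean_r, 3)` gives the reliability of the mean threshold, and the two resulting reliabilities are passed to `correct_for_attenuation` together with the raw correlation between conditions.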
remembering: a study in experimental and social psychology
on the distinction between sensory storage and short-term visual memory
organization of visual short-term memory
compression in visual working memory: using statistical regularities to form more efficient memory representations
the information available in brief visual presentations
retention of abstract ideas
"abstraction and the process of recognition" in psychology of learning and motivation: advances in research and theory
error-correcting dynamics in visual working memory
an auditory analogue of the sperling partial report procedure: evidence for brief auditory storage
on short and long auditory stores
distinctive features and errors in short-term memory for english consonants
distinctive features and errors in short-term memory for english vowels
the role of auditory short-term memory in vowel perception
auditory and phonetic memory codes in the discrimination of consonants and vowels
categorical perception: issues, methods, findings. speech and language
associative strength theory of recognition memory for pitch
retroactive interference in short-term recognition memory for pitch
auditory change detection: simple sounds are not memorized better than complex sounds
memory for pitch versus memory for loudness
mapping of interactions in the pitch memory store
dissociation of pitch from timbre in auditory short-term memory
speech versus nonspeech in pitch memory
impaired short-term memory for pitch in congenital amusia
working memory in primate sensory systems
"the psychophysics of pitch" in pitch - neural coding and perception
"pitch perception" in the oxford handbook of auditory science: hearing
frequency discrimination of complex tones with overlapping and non-overlapping harmonics
pitch discrimination of harmonic complex signals: residue pitch or multiple component discriminations?
does fundamental-frequency discrimination measure virtual pitch discrimination?
diversity in pitch perception revealed by task dependence
the musical environment and auditory plasticity: hearing the pitch of percussion
on the binding of successive sounds: perceiving shifts in nonperceived pitches
decay of auditory memory in vowel discrimination
fundamental differences in change detection between vision and audition
perceptual grouping affects pitch judgments across time and frequency
rapid formation of auditory memories: insights from noise
schema learning for the cocktail party problem
the role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination
an autocorrelation model with place dependence to account for the effect of harmonic number on fundamental frequency discrimination
the role of absolute and relative amounts of time in forgetting within immediate memory: the case of tone-pitch comparisons
the decay of pitch memory during rehearsal
illusory sound texture reveals multi-second statistical completion in auditory scene analysis
headphone screening to facilitate web-based auditory experiments. attention, perception, and psychophysics
how to use individual differences to isolate functional organization, biology and utility of visual perception; with illustrative proposals for stereopsis
individual differences reveal the basis of consonance
the proof and measurement of association between two things
time factors in relative and absolute pitch discrimination
the decline of pitch discrimination with time
the capacity of visual working memory for features and conjunctions
understanding the object benefit in visual short-term memory: the roles of feature proximity and connectedness
a probabilistic model of visual working memory: incorporating higher order regularities into working memory capacity estimates
stimulus-specific mechanisms of visual short-term memory
the retention and disruption of color information in human short-term visual memory
speed selectivity in visual short term memory
cognitive foundations of musical pitch
pitch discrimination: are professional musicians better than non-musicians?
influence of musical and psychoacoustical training on pitch discrimination
pitch discrimination in musicians and nonmusicians: effects of harmonic resolvability and processing effort
universal and non-universal features of musical pitch perception revealed by singing
perceptual fusion of musical notes by native amazonians suggests universal representations of musical intervals
indifference to dissonance in native amazonians reveals cultural variation in music perception
object perception as bayesian inference
some informational aspects of visual perception
efficient coding of natural sounds
sparse codes for speech predict spectrotemporal receptive fields in the inferior colliculus
rapid efficient coding of correlated complex acoustic properties
learning mid-level auditory codes from natural sound statistics
auditory scene analysis: the perceptual organization of sound
ecological origins of perceptual grouping principles in the auditory system
contour, interval, and pitch recognition in memory for melodies
summary statistics in auditory perception
adaptive and selective time-averaging of auditory scenes
the processing of temporal pitch and melody information in auditory cortex
a neural representation of pitch salience in nonprimary human auditory cortex revealed with functional magnetic resonance imaging
cortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex
representations of pitch and timbre variation in human auditory cortex
intonational speech prosody encoding in the human auditory cortex
neural representation of harmonic complex tones in primary auditory cortex of the awake monkey
the neuronal representation of pitch in primate auditory cortex
neural ensemble codes for stimulus periodicity in auditory cortex
songbirds use spectral shape, not pitch, for sound pattern recognition
complex pitch perception mechanisms are shared by humans and a new world monkey
across-species differences in pitch perception are consistent with differences in cochlear filtering
divergence in the functional organization of human and macaque auditory cortex revealed by fmri responses to harmonic tones
humans integrate visual and haptic information in a statistically optimal fashion
what's new in psychtoolbox- ?
rwc music database: music genre database and musical instrument sound database
inharmonic speech: a tool for the study of speech perception and separation
tandem-straight: a temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f , and aperiodicity estimation
straight, exploitation of the other aspect of vocoder: perceptually isomorphic decomposition of speech sounds
inharmonic speech reveals the role of harmonicity in the cocktail party problem
derivation of auditory filter shapes from notched-noise data
auditory toolbox version
distortion products in auditory fmri research: measurements and solutions
"distortion products and the perceived pitch of harmonic complex tones" in physiological and psychophysical bases of auditory function
subharmonic pitches of a pure tone at low s/n ratio
pitch for nonsimultaneous successive harmonics in quiet and noise
power : a flexible statistical power analysis program for the social, behavioral, and biomedical sciences
coefficient alpha and the internal structure of tests
the relationship between unstandardized and standardized alpha, true reliability, and the underlying measurement model

and . semitones) to maximize power. error bars show standard error of the mean. as in experiment , the effects were qualitatively similar for musicians and non-musicians. although there was a significant main effect of musicianship (f( , )= . , p<. , hp =. ), the interaction between the effects of delay and harmonicity was significant in both musicians (f( , )= . , p<. , hp =. ) and non-musicians (f( , )= . , p<. , hp =. 
), and there was no interaction between musicianship, stimulus type (harmonic, inharmonic

error bars show within-subject standard error of the mean. we again observed significant interactions between the effects of delay and harmonicity in both musicians (f( , )= . , p=. , hp =. ) and non-musicians (f( , )= . , p=. , hp =. ), and no interaction between musicianship, stimulus type (harmonic, inharmonic

instead of including only those participants who performed as well as in-lab participants, we excluded participants if their average threshold across all conditions on the first run of the no-delay trials was greater than % (just under semitone). this excluded of participants

correlations are spearman's rho, corrected for the reliability of the threshold measurements (i.e., corrected for attenuation). (b) comparison between harmonic/inharmonic and harmonic/interleaved harmonic correlations, with and without a delay. the interaction between harmonic/inharmonic and harmonic/interleaved harmonic correlations remained significant even with this more lenient inclusion criterion (difference of differences between correlations with and without a delay = . , p=. ). error bars in b and c show standard error of the mean

interleaved harmonic/pure correlations, with and without a delay. the interaction between the inharmonic/pure and interleaved harmonic/pure correlations likewise remained significant with the less stringent inclusion criterion (difference of differences between correlations with and without a delay =

comparison between harmonic/pure, inharmonic/pure, and interleaved harmonic/pure correlations, with and without a delay. the interaction between the inharmonic/pure and interleaved harmonic/pure conditions was also significant (difference of differences between correlations with and without a delay = . , p=. ), again replicating the effect seen in experiment

the authors thank r. grace, s. dolan, and c. wang for assistance with data collection, t.
brady for helpful discussions, and l. demany

the authors declare no competing interest.

figure . replication of individual differences results with pilot experiment. results of experiment were replicated in two pilot experiments, the data from which are combined in this figure. both pilot experiments were run online using -down- -up adaptive procedures. participants discriminated tones identical to those used in experiment , except for the random harmonic conditions, which we thus omitted from this figure. instead of having a -second silent delay between test tones, both pilot experiments presented three intervening distractor notes between the test tones. in these conditions, participants heard the first test tone, a ms pause, three back-to-back ms notes, a ms pause, and then the second test tone (yielding a total delay between the two test tones of . seconds). for two of the four adaptive runs, intervening notes were harmonic, and for the other two runs they were inharmonic (intervening tones were generated in the same way as the main test tones). the runs for all stimulus conditions were randomly ordered throughout the experiment. participants completed the first pilot experiment, in which the intervening notes were chosen randomly from a -semitone distribution surrounding the first note (loosely modeled after the method used in semal & demany, ). participants completed the second pilot experiment, in which the intervening notes were chosen randomly from a uniform distribution spanning . hz- hz ( - hz +/- semitones). in the first pilot experiment, adaptive runs for tones without an inter-stimulus delay (and thus without intervening notes) were initialized at semitone pitch difference, and adaptive tracks for intervening note conditions were initialized at a semitone pitch difference. for the second pilot experiment, all adaptive tracks were initialized at a semitone pitch difference.
because results from the two pilots were similar, we combined the data, and then used the same filtering procedure used in experiment : participants who performed worse than . % across the first (of four) runs on conditions without intervening notes were removed from further analysis. this excluded of the total participants, leaving participants ( female, mean age= . years, s.d.= . years). of these participants reported greater than four years of musical training (mean= . , s.d.= . years). (a) matrix of the correlation between thresholds for all pairs of conditions. correlations are spearman's rho, corrected for the reliability of the threshold measurements (i.e., corrected for attenuation). (b) comparison between harmonic/inharmonic and harmonic/interleaved harmonic threshold correlations, with and without a delay. the interaction between harmonic/inharmonic and harmonic/interleaved harmonic correlations was significant in this pilot study (difference of differences between correlations with and without a delay = . , p=. ), replicating the effect from experiment . error bars show standard

key: cord- -pu v oy authors: pichon, fabien; busato, florence; jochems, simon; jacquelin, beatrice; le grand, roger; deleuze, jean-francois; müller-trutwin, michaela; tost, jörg title: analysis and annotation of genome-wide dna methylation patterns in two nonhuman primate species using the infinium human methylation k and epic beadchips date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pu v oy

the infinium human methylation and methylation epic beadchips are useful tools for studying the methylation state of hundreds of thousands of cpgs across the human genome at affordable cost. however, in a wide range of experimental settings, in particular for studies of infectious or brain-related diseases, human samples cannot be easily obtained.
hence, due to their close developmental, immunological and neurological proximity to humans, non-human primates are used in many fields of research on human diseases and for preclinical research. few studies have used dna methylation microarrays in simian models. microarrays designed for the analysis of dna methylation patterns in the human genome could be useful given the genomic proximity between human and nonhuman primates. however, information is currently lacking about the specificity and usability of each probe for many nonhuman primate species, including rhesus macaques (macaca mulatta), originating from asia, and african green monkeys (chlorocebus sabaeus), originating from west africa. rhesus macaques and african green monkeys are among the major nonhuman primate models utilized in biomedical research. here, we provide a precise evaluation and re-annotation of the probes of the two microarrays for the analysis of genome-wide dna methylation patterns in these two cercopithecidae species. we demonstrate that up to , of the k and , probes of the epic beadchip can be reliably used in macaca mulatta or chlorocebus sabaeus. the annotation files are provided in a format compatible with a variety of preprocessing, normalization and analytical pipelines designed for data analysis from k/epic arrays, facilitating high-throughput dna methylation analyses in macaca mulatta and chlorocebus sabaeus. they give the research community the opportunity to focus their analysis only on those probes identified as reliable. the described analytical workflow leaves the user free to balance coverage against specificity and can also be applied to other cercopithecidae species.

keywords: infinium, k, epic, microarray, dna methylation, macaca, chlorocebus, rhesus macaque, monkey, african green monkey, vervet, annotation

background

dna methylation is an epigenetic mark associated with gene regulation.
it impacts a number of key biological processes including genomic imprinting, x-chromosome inactivation, repression of transposable elements, aging, carcinogenesis and immunity against infectious diseases [ ] . dna methylation consists of the addition of a methyl group to a cytosine ( -methylcytosine), in most cases one immediately followed by a guanine. because of this methylation, cytosines are susceptible to deamination yielding thymines [ ] [ ] [ ] . due to their increased mutation rate, cpg dinucleotides have been depleted during evolution and are thus under-represented in the genome [ ] . however, a higher density of mostly unmethylated cpg dinucleotides is found in cpg islands, generally localized in the first exon and intron or in the promoter region of genes [ , ] . cpg methylation can be measured in humans using the infinium human methylation beadchip array (infinium k), which measures methylation levels at more than , cpgs across the human genome [ ] . this array has been replaced by the infinium human methylationepic beadchip array (infinium epic), which adds about , cpgs localized in enhancers [ ] . due to its accuracy and capacity to analyze large cohorts at an affordable cost, the infinium k microarray has been used in a wide range of epigenome-wide association studies in humans (see review in [ ] ). only a few genome-wide array tools are currently available for nonhuman primate models. of note, these species have a close phylogenetic proximity to humans and a high percentage of dna identity and gene homology, raising the possibility of using human microarrays in studies of non-human primate models. indeed, human gene expression microarrays have been used in various studies on monkeys, from hcv and siv infection [ ] [ ] [ ] to asthma [ ] or glaucoma [ ] . more recently, the human affymetrix hg-u plus . 
genechip has also been used for a gene expression study in the phylogenetically more distant lemur microcebus murinus, which has been shown to be an excellent model for alzheimer's disease ( [ ] ; pichon et al., unpublished data). in the latter study, human microarrays detected about % of lemur transcripts, which is expected given the divergence between the two species. similarly, the human infinium k was also used for dna methylation studies in great apes [ ] . seventy-three percent of the probes designed for the human genome mapped to the bonobo genome, %- % to chimpanzee, % to gorilla and % to orangutan genomes [ , ] . studies were also performed in cynomolgus macaques (macaca fascicularis) [ ] , rhesus macaques (macaca mulatta, mm) [ ] and baboons [ ] . for example, using different selection criteria for probes yielding reliable signals, % of human probes were mapped and annotated to the macaca fascicularis genome and subsequently used to study the impact of birth weight on gene methylation and expression in macaca fascicularis [ ] . microarrays initially designed for the interrogation of the human genome have thus been used to study gene expression or dna methylation in nonhuman primate samples. this can be done successfully provided that the cpg-targeting probes are thoroughly evaluated for reliability in the species of interest. several monkey species are widely used as models of complex human diseases, helping to unravel the mechanisms of human and animal diseases and to develop treatments for them. rhesus macaques are for instance frequently used for the development of vaccines against viral diseases, including sars-cov- , hiv, influenza and ebola virus [ ] [ ] [ ] . they are also widely used in various studies ranging from development and imprinting, to addiction and social cognition [ ] [ ] [ ] [ ] [ ] [ ] .
in parallel, the chlorocebus genus, in which african green monkeys (agm) figure prominently, has been a gold-standard model for studying and fighting several infectious diseases, such as yellow fever, trypanosoma and plague, in the past [ , ] . african green monkeys, in particular chlorocebus sabaeus, are included in studies of neurological disorders, in pharmacological trials [ ] and more recently also in the identification of mechanisms of protection against hiv/aids, mers-cov and sars-cov. macaca mulatta and chlorocebus have therefore, together with baboons and cynomolgus macaques, become reference animal models for preclinical research. nonetheless, no commercial off-the-shelf dna methylation microarrays are available for these species. in the present study, we evaluated the use of the infinium k and infinium epic beadchips for genome-wide dna methylation analyses in samples from chlorocebus sabaeus and macaca mulatta and conducted an in-depth analysis of the available probes for these two old world monkey genomes. results show that about one third of the infinium k or epic human-designed probes can be reliably used to study dna methylation in these cercopithecidae and that the majority map to gene features. we provide for each species and each microarray a list of annotated probes that can be used by the scientific community for genome-wide dna methylation studies in chlorocebus sabaeus or macaca mulatta. these detailed data on probe behavior were obtained with stringent criteria. they give the research community the flexibility to focus their analysis only on those probes identified as reliable.

animals were housed at the idmit center of the commissariat à l'energie atomique (cea,

blood was collected from agm (chlorocebus sabaeus) and rhesus macaques (macaca mulatta) by venipuncture into edta tubes. as multiple blood samples were available for several animals, in total chlorocebus sabaeus and macaca mulatta samples were included in the study.
cd + peripheral blood mononuclear cells (pbmc) were purified as described previously, using magnetic anti-cd beads (miltenyi) [ ] . cd + t cell purity after isolation was confirmed using flow cytometry (median %, iqr %- %). dna was extracted from cd + t cells using the dneasy blood and tissue kit (qiagen), according to the manufacturer's protocol. one µg of dna was bisulfite-treated using the epitect ® bisulfite kit (qiagen, hilden, germany) and analysed using the infinium human methylation k beadchips (illumina, san diego, ca) according to the manufacturer's protocol. we then mapped all the bp probe sequences targeting cpg positions in the human genome from the infinium k and epic arrays to the chlorocebus sabaeus (cs) and macaca mulatta (mm) genomes, consecutively, using bowtie [ ], allowing only a unique position on the respective genome and up to mismatches. sequences were thus classified as "perfect match", " mismatch", " mismatches", " mismatches" or "unmapped" depending on the number of mismatches attributed by bowtie. unmapped probes thus included both probes with more than mismatches and probes that could not be mapped to a unique location. because it is necessary to keep only probe sequences that can reliably hybridize to simian genomes and report on the methylation state of the cpg sites, we removed probes containing mismatches at the cpg site, which were annotated as "cs-non-targeting" or "mm-non-targeting". sequences with intact cpg sites were qualified as "cs-functional" or "mm-functional", for chlorocebus sabaeus and macaca mulatta respectively. among the cs-functional or mm-functional probes, we also determined the exact position of the closest mismatch, if any, relative to the cpg if the mismatch was localized from bp to bp away from the cpg to yield the final selection of valid probes.
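the probe classification described above can be sketched as a few small functions. this is a hypothetical illustration rather than the authors' pipeline: the function names and the position-based representation of mismatches are assumptions, the maximum of three mismatches is taken from the results text, and the minimum mismatch-to-cpg distance is left as a required parameter because its value is not recoverable here.

```python
def classify_mapping(n_mismatches, uniquely_mapped, max_mismatches=3):
    """Label a probe by the number of mismatches reported by the aligner."""
    if not uniquely_mapped or n_mismatches > max_mismatches:
        return "unmapped"
    if n_mismatches == 0:
        return "perfect match"
    return f"{n_mismatches} mismatch" + ("es" if n_mismatches > 1 else "")


def classify_targeting(mismatch_positions, cpg_positions):
    """A probe with a mismatch on either base of the CpG is non-targeting."""
    if set(mismatch_positions) & set(cpg_positions):
        return "non-targeting"
    return "functional"


def closest_mismatch_distance(mismatch_positions, cpg_positions):
    """Distance in bp from the CpG to the nearest mismatch, or None."""
    if not mismatch_positions:
        return None
    return min(abs(m - c) for m in mismatch_positions for c in cpg_positions)


def is_valid(mismatch_positions, cpg_positions, uniquely_mapped,
             min_distance, max_mismatches=3):
    """Keep mapped, functional probes whose mismatches are far enough
    from the CpG. min_distance is a caller-supplied cut-off."""
    label = classify_mapping(len(mismatch_positions), uniquely_mapped,
                             max_mismatches)
    if label == "unmapped":
        return False
    if classify_targeting(mismatch_positions, cpg_positions) == "non-targeting":
        return False
    distance = closest_mismatch_distance(mismatch_positions, cpg_positions)
    return distance is None or distance >= min_distance
```

for example, `is_valid([45], (48, 49), True, min_distance=3)` keeps a probe whose single mismatch lies 3 bp from the cpg; the cut-off of 3 here is an arbitrary value chosen only for illustration.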
to normalize and identify differentially methylated probes in the two monkey species, a refined version of the subset quantile normalization (sqn) pipeline [ ] , which performs the sqn at the level of each individual sample prior to a between-sample quantile normalization, was used to correct for the difference in the performance and dynamic range of infinium i and infinium ii probes. the original illumina annotation files used in the pipeline were replaced by the ones created for macaca mulatta and chlorocebus sabaeus. due to the stringent criteria in the probe selection process, no further filtering for e.g. non-specific probes was required. quantitative dna methylation analysis for validation was performed by pyrosequencing of bisulfite-treated dna [ ] . six regions of interest for validation were amplified using ng product were rendered single-stranded as previously described [ ] and pmol of the respective sequencing primer were used for analysis. quantitative dna methylation analysis was carried out on a psq md system with the pyrogold sqa reagent kit (qiagen) and results were analyzed using the pyromark cpg software (v. . . . , qiagen). throughout the manuscript, we use a nomenclature of "targeting" probes, referring to probes that map to and potentially target a cpg dinucleotide in the respective simian genomes while still allowing for mismatches at any place in the probe including the cpg site, "functional" probes, which target an intact cpg site in the simian genomes, and "valid" probes, which are the final selection of probes based on the selection criteria described below in the results section. first, we mapped the human probes to the simian genomes. all bp probe sequences designed to target a cpg in the human genome were extracted from illumina's manifest files for each microarray ( , sequences for the infinium k and , for the infinium epic). table ). a more careful study of the mismatch position showed that , ( .
%) probe sequences had at least one mismatch at the cpg site and were thus identified as cs-non-targeting probes. of note, these non-targeting probes presented a total of , mismatches of which . % were t→c or a→g, i.e. weak to strong (w→s), nucleotide substitutions (see supplementary table ). out of the , mapped probes, , ( . %) probe sequences were targeting a cpg dinucleotide and were thus identified as cs-functional (table ) . of these cs-functional probes, . % had no mismatch at all, . % had the mismatch at more than bp from the cpg and , ( %) presented at least one mismatch at bp or less from the cpg (supplementary table ). the latter were analyzed in more detail as described below. we detailed the distribution of all the substitutions across the entire set of mapped probes (supplementary table ). we observed that a nucleotide located in a cpg site was more likely to be substituted (around %) than a base located outside a cpg site (about %). moreover, about % of the substitutions in cpg sites were w→s substitutions, while w→s substitutions represented around % along the whole bp probe sequence.

table ) and were further investigated as described below. as for infinium k, we analyzed the distribution of all the substitutions across the whole set of mapped probes (supplementary tables ). from this distribution, it appeared that a base located in a cpg site had a higher chance of being substituted on the epic beadchip (around %) than on the infinium k (around %, hypergeometric test p< . ). in cpg sites, w→s nucleotide substitutions represented around %, while they represented around % along the entire bp probe sequences. altogether, for their use in dna methylation analysis using the two human beadchips, . (epic) and . % ( k) of the probes could be classified as functional for chlorocebus sabaeus.

(table ) . of these mm-functional probes, , presented at least one mismatch at bp or less from the cpg (supplementary table ).
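the weak-to-strong bookkeeping used in these substitution tallies can be illustrated with a short sketch. it assumes the usual convention that a and t are weak (w) bases and c and g are strong (s) bases; the function names and the (from, to) pair layout are hypothetical, and the direction of each substitution should follow whichever convention the supplementary tables use.

```python
WEAK = frozenset("AT")    # two hydrogen bonds
STRONG = frozenset("CG")  # three hydrogen bonds


def base_strength(base):
    """Return 'W' for A/T and 'S' for C/G."""
    base = base.upper()
    if base in WEAK:
        return "W"
    if base in STRONG:
        return "S"
    raise ValueError(f"unexpected base: {base!r}")


def substitution_class(from_base, to_base):
    """Classify a substitution as W>S, S>W, W>W or S>S."""
    if from_base.upper() == to_base.upper():
        raise ValueError("not a substitution: both bases are identical")
    return f"{base_strength(from_base)}>{base_strength(to_base)}"


def weak_to_strong_fraction(substitutions):
    """Fraction of (from_base, to_base) pairs that are W>S."""
    classes = [substitution_class(a, b) for a, b in substitutions]
    return classes.count("W>S") / len(classes)
```

note that this counts every w→s change, including a→c and t→g, whereas the text singles out t→c and a→g; restricting the tally to those two transitions would need an explicit check on the base pair.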
as above, we studied more carefully the distribution of all the substitutions across the whole set of mapped probes (supplementary tables and ) and found a very similar distribution in macaca mulatta as in chlorocebus sabaeus. altogether, for their use in dna methylation analysis using the two human beadchips, . (epic) and . % ( k) of the probes could be classified as functional for macaca mulatta. in summary, results were similar for the two species: allowing up to three mismatches in the base pair probes, about a third of the probes on the respective arrays potentially targeted cpg positions in the two simian genomes. there was a strong overlap between the two simian species with , (about %) and , (about %) mapped probes in common between the two species and , (about %) and , (about %) probes in common for the infinium k and infinium epic microarrays, respectively. this probe set nonetheless contained probes with mismatches (outside the cpg position), which could influence probe behavior, a point that has been neglected in most studies so far. therefore, final validation and refinement of the reliable probe sets required experimental analysis of samples from the respective species. to determine to what extent cpg probes designed for the human genome and mapping to simian genomes could efficiently be used to detect methylation in these two simian models, we analyzed cd + t cell samples from macaca mulatta and from chlorocebus sabaeus on the infinium k array (figure ). the density distribution of beta-values of the probes identified as perfectly matching was very similar to the bimodal beta-value density distribution commonly observed for human samples [ ] . the beta-value density distribution of probes containing a single mismatch remained close to that of perfectly matched probes, and the distribution was still similar for probes with two mismatches.
however, the beta-value density distribution of probes containing three mismatches started to deviate from the expected bimodal distribution, and the distribution for unmapped probes no longer followed the expected distribution, supporting our restriction to a maximum of three mismatches. these results were similar for both species. when analyzing the beta-value density distribution of probes as a function of the position of the mismatch, we observed that, in both species, the beta-value distribution was closely related to the mismatch position ( figure ) . thus, probes with a mismatch located at or bp from the cpg site presented an aberrant density distribution of beta-values among valid probes, whereas probes containing a mismatch at - bp or more away from the cpg were similar to probes without mismatches, independent of the number of mismatches present (one to three). we thus chose to remove probes presenting a mismatch at or bp from the cpg from the furthermore, using pyrosequencing, which as a sequencing-by-synthesis method is not dependent on human probes but uses species-specific amplification and sequencing primers, we validated dna methylation levels measured by the respective infinium probes at three cpg positions in each species, showing a high correlation between the two orthogonal technologies and validating our approach of selecting reliable probes (figure ). the selected probes had either or one mismatch (cg , cg , cg (n= ), cg , cg cg (n= )). there was no correlation between the presence and the position of the mismatch and the accuracy of the infinium data compared to the pyrosequencing data.
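The probe filtering rules described in this section (unique mapping, at most three mismatches, no mismatch close to the targeted cpg) can be sketched as below. This is an illustrative sketch, not the authors' code: the `ProbeAlignment` record, the function names and the probe identifiers are hypothetical, and the minimum distance from the cpg is left as a required parameter because the exact cut-off is not recoverable from the text.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class ProbeAlignment:
    """Hypothetical summary of one human probe aligned to a simian genome."""
    probe_id: str
    # Distance in bp of each mismatch from the targeted CpG (0 = inside the CpG).
    mismatch_distances: List[int] = field(default_factory=list)
    unique_hit: bool = True  # False if the probe maps to several loci


def is_reliable(probe: ProbeAlignment, *, min_distance_from_cpg: int,
                max_mismatches: int = 3) -> bool:
    """Apply the three filters: unique hit, mismatch count, mismatch position."""
    if not probe.unique_hit:
        return False
    if len(probe.mismatch_distances) > max_mismatches:
        return False
    return all(d >= min_distance_from_cpg for d in probe.mismatch_distances)


def split_probe_sets(probes: Iterable[ProbeAlignment], *,
                     min_distance_from_cpg: int) -> Tuple[List[str], List[str]]:
    """Return (valid probe ids, perfectly matched probe ids)."""
    probes = list(probes)
    valid = [p.probe_id for p in probes
             if is_reliable(p, min_distance_from_cpg=min_distance_from_cpg)]
    perfect = [p.probe_id for p in probes
               if p.unique_hit and not p.mismatch_distances]
    return valid, perfect
```

Returning both lists mirrors the two manifest variants the text describes for each array and species: the full valid set and the subset of perfectly matched probes.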
the annotation of the probes to the homo sapiens genome (grch ) as described in the manifest files for both microarrays showed that the additional content of the infinium epic beadchip compared to the infinium k array was mainly located in gene bodies, and in the open sea using the gene feature and cpg island feature annotation, respectively ( figure ).to provide the user with similar information as contained in the illumina manifest for the human genome, we annotated the location of the cpgs with matching probes according to the ensembl gene annotation (version ). among the , cs-valid probes on the infinium k beadchip, , were annotated for a gene feature (representing , genes) and , for a cpg island feature using the cs genome. , cs-valid probes were annotated for both a gene and an island feature. cs-valid probe sequences principally targeted cpgs located around transcription start sites and gene bodies (about % in total) and around % of the cs-valid probes target cpg in intergenic regions (versus % in human). according to ucsc islands prediction (see materials and methods section), more than % of human-designed cs-valid probes target cpg probes located in cpg islands, which is similar to the proportion for the human genome ( figure ). among the , cs-valid probes on the infinium epic beadchip, , were annotated for a gene feature (representing , genes) and , for a cpg island feature. , probes were annotated for both a gene and an island feature. the proportion of the respective gene feature and cpg islands categories followed the trend observed for the human genome with an increased proportion in intergenic regions and gene bodies on the epic arrays ( figure ). among the , mm-valid probes on the infinium k array, , were annotated to a gene feature (representing , genes) and , to a cpg island feature (figure ) . , mm-valid probes were annotated to both a gene and an island feature. 
as for cs-valid probes, but even more pronounced, mm-valid probe sequences principally targeted cpgs located in tss regions and gene bodies (about % in total, % in intergenic regions). the distributions among island features according to the ucsc cpg island prediction was more similar between species (figures , ) . compared to homo sapiens, reliable infinium k and epic probes followed the overall distribution of probes when using the cpg island feature annotation, while for the gene feature annotation the proportion of probes in intergenic regions was increased especially in chlorocebus sabaeus at the expense of mainly gene body probes ( figures , and ) . overall % of all simian genes were covered by at least one valid probe. we provide the scientific community with new manifest files for infinium k and epic beadchips, adapted for genome-wide dna methylation studies in chlorocebus sabaeus or macaca mulatta (supplementary material). we provide for each microarray and each species two files: one containing the whole set of cs-valid or mm-valid probes (filtered for probes with a mismatch at or bp from the cpg) and another file containing only perfectly matched probes. these annotation files retain the format of the original illumina manifest and can thus be used without further modifications in analysis pipelines for beadchips such as sqn [ ] or the widely used champ pipeline [ ] . all columns of the respective manifest epigenome-wide association studies have recently been performed on many phenotypes, traits and diseases including cancer, immune, neurodegenerative and infectious diseases, with now more than ewas published [ ] and many more large-scale studies are likely to be conducted in the near future, linking complex diseases and traits with changes in the epigenome. furthermore, dna methylation holds the promise to explain at least a part of the influences the environment has on a phenotype [ ] . 
in most cases, cell lines or blood cells do not appropriately recapitulate the phenotype of complex diseases, requiring the use of tissue or animal models to further our understanding of disease etiology and evaluate potential future treatments. cs and mm represent reference models in biomedical research and, given the recently realized importance of epigenetics for human disease, it would be valuable to include dna methylation analysis in comprehensive multi-level -omics analyses. however, few tools are currently available. in the presented work we determined which probes of the most widely used dna methylation arrays can be used to reliably analyze cpgs in the genomes of cs and mm. the valid probes were very similar between cs and mm, with over % of mapped and valid cpg probe sequences in common between the two species. this is in agreement with the phylogenetic proximity of these two cercopithecidae [ ] . however, the percentage of probes that can be reliably used differs from the one identified in a previous study conducted on m. fascicularis [ ] , a species closely related to m. mulatta, in which % of human probes mapped to the simian genome, in contrast to the . % identified here for m. mulatta. this difference can be explained by the more stringent parameters for mapping probe sequences in our study rather than by differences between the two simian genomes. for instance, we allowed only three mismatches instead of the four mismatches allowed in the study by ong et al. [ ] to align infinium k probe sequences on the simian genomes, as we observed that beta-value densities showed aberrant distributions for probes with more than three mismatches as well as for probes with fewer mismatches close to the targeted cpg. this feature has not been considered in previous studies, but clearly had a strong influence on the measured dna methylation levels ( figure ) .
furthermore, in contrast to ong et al. [ ] , we only kept sequences that matched uniquely on the simian genomes, because probes measuring dna methylation levels at cpgs in multiple genomic regions are difficult to interpret and thus not informative, as has been demonstrated in human studies of the arrays [ ] . furthermore, we have annotated . % of aligned probe sequences (versus % in ong et al). this can be explained by the fact that we only annotated probe sequences located in genes or at less than , bp from a gene, using an annotation format similar to the one provided by illumina for the human genome. a similar approach was also proposed for the analysis of dna methylation and, following oxidative bisulfite treatment, hydroxymethylation in mm, allowing up to four mismatches [ ] . again, the position of potential mismatches was not further evaluated. the same filtering approach as proposed by ong et al. [ ] was subsequently also used for a study on osteoarthritis using baboon samples, which concluded that a total of % of probes on the k beadchip could be reliably used [ ] . recently, human dna methylation capture panels were adapted to the analysis of dna methylation in non-human primates, notably agm. while this approach captures a larger proportion of regulatory elements in the simian genomes [ ] , it comes at a higher price and is thus less well suited for projects with larger numbers of samples. furthermore, in quantitative comparisons the infinium beadchips outperformed the sequencing-based approach in terms of quantitative precision, even when pooling the levels of closely neighboring cpgs in the sequencing approach [ ] . as these results were obtained in human samples, it can be expected that the quantitative ratios will be increasingly distorted in the presence of sequence mismatches between the capture probes and the simian genomes. sequence identity is about % [ ] . at random, we should therefore expect about .
% of cpg sites in common between macaca mulatta and humans. nevertheless, only . % of mapped probe sequences were identified as mm-targeting. analyzing the distribution of mismatches more precisely, we observed that c→t and g→a (strong-to-weak, s→w) nucleotide substitutions accounted for . % of mismatches, in line with the fact that cpg sites are privileged mutational hot spots during speciation, due to methylation of cytosines [ ] [ ] [ ] . of note, w→s nucleotide substitutions represented . % of mismatches and around % when looking at cpg sites, in agreement with the weak-to-strong bias observed during recent human evolution [ ] [ ] [ ] . the same results were obtained for chlorocebus sabaeus and similar results were found with the infinium epic. for the latter, we observed a slightly increased substitution rate at cpg sites concomitant with a slightly lower number of valid probes compared to infinium k for both species. this observation might be explained by the localization of the , additional probes in enhancers on the epic arrays, as regulatory regions have been suggested to be fast-evolving regions during human divergence from other primates [ , ] . as the overall percentage of reliable probes between the different generations of beadchips remains similar, it can be expected that future generations of human infinium dna methylation microarrays will also improve the coverage of the simian genomes. annotation of these valid probes for gene features showed a higher proportion of probes targeting cpgs located in intergenic regions compared to cpgs in promoter or gene regions. array. while previous studies identified lists of probes, they did not provide a species-specific annotation file. furthermore, we annotated probes according to illumina gene and cpg island features and built annotation files following the illumina manifest file format.
we kept the human grch annotation for comparison purposes, and our annotation files can directly be used with a number of processing tools. although the proportions of the different gene features are altered compared to the distribution for the human genome, with a reduction in gene bodies and around the tss, the current generation of methylation microarrays covers around % of all known genes in the two species, and these genes are covered by two to six probes, allowing multiple reliable measurements of dna methylation to increase confidence in the obtained data. this coverage is reinforced by the fact that the human array contains proportionally more probes in promoter regions (often associated with cpg islands), ensuring that promoter cpg regions remain sufficiently covered. while genes of special interest for a research group might be missing from these lists, the arrays currently provide the most comprehensive tool for dna methylation analysis at a reasonable price. the number of available reliable probes exceeds by several orders of magnitude the number of probes available on the first generation of the infinium dna methylation beadchips, the k array, which led to the discovery of dna methylation changes following environmental exposure such as tobacco smoking [ ] or disease-associated changes [ , ] , and whose success led to the development of the higher-density dna methylation arrays evaluated in this study. while monkey-specific microarrays would constitute ideal tools, commercial off-the-shelf dna methylation microarrays for model organisms have been awaited in vain. on the other hand, comprehensive whole genome bisulfite sequencing projects remain prohibitively expensive for most laboratories when performed at a coverage allowing reliable methylation calls.
in summary, our approach, which uses standard bioinformatic tools and quality criteria validated with an orthogonal analysis technology, notably concerning the importance of the number and position of any mismatches, should be applicable to other simian and perhaps even other mammalian species. our approach allows a rapid selection of reliable probes for any organism with an annotated genome sequence. of course, a more divergent genome sequence will lead to fewer probes that can be used with high confidence, and for some organisms this might eventually lead to an unfavorable cost / output balance, such as mice, where only - k probes for the k beadchip and k for the epic array were found reliable [ , , ] . in conclusion, the presented work investigated in depth the suitability of human dna infinium epic, respectively. note that one probe can be attributed to different transcripts and thus gene features, whereas one probe is attributed to only one island feature.

a summary of the biological processes, disease-associated changes, and clinical applications of dna methylation -methylcytosine in eukaryotic dna bayesian markov chain monte carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution cpg dinucleotides and the mutation rate of non-cpg dna the covariation between tpa deficiency, cpg deficiency, and g+c content of human isochores is due to a mathematical artifact large-scale structure of genomic methylation patterns a genome-wide analysis of cpg dinucleotides in the human genome distinguishes two distinct classes of promoters validation of a dna methylation microarray for , cpg sites in the human genome validation of a dna methylation microarray for , cpg sites of the human genome enriched in enhancer sequences epigenome-wide association studies (ewas): past, present, and future the chimpanzee model of hepatitis c virus infections systems biology of natural simian immunodeficiency virus infections nonpathogenic siv infection of african
green monkeys induces a strong but rapidly controlled type i ifn response microarray profile of differentially expressed genes in a monkey model of allergic asthma gene microarray analysis of experimental glaucomatous retina from cynomologous monkey distinct transcriptome expression of the temporal cortex of the primate microcebus murinus during brain aging versus alzheimer's disease-like pathology dynamics of dna methylation in recent human and great ape evolution array-based assay detects genome-wide -mc and -hmc in the brains of humans, non-human primates, and mice infinium monkeys: infinium k array for the cynomolgus macaque assessment of dna methylation patterns in the bone and cartilage of a nonhuman primate model of osteoarthritis low birth weight associates with hippocampal gene expression team vrcs ( ) safety, tolerability, pharmacokinetics, and immunogenicity of the therapeutic monoclonal antibody mab targeting ebola virus glycoprotein (vrc ): an open-label phase study ultrafast and memory-efficient alignment of short dna sequences to the human genome complete pipeline for infinium((r)) human methylation k beadchip data processing using subset quantile normalization for accurate dna methylation estimation dna methylation analysis by pyrosequencing champ: updated methylation analysis pipeline for illumina beadchips ewas data hub: a resource of dna methylation array data and metadata the promises and challenges of toxico-epigenomics: environmental chemicals and their impacts on the epigenome a molecular phylogeny of living primates discovery of cross-reactive probes and polymorphic cpgs in the illumina infinium humanmethylation microarray successful application of human-based methyl capture sequencing for methylome analysis in non-human primate models battle of epigenetic proportions: comparing illumina's epic methylation microarrays and truseq targeted bisulfite sequencing evolutionary and biomedical insights from the rhesus macaque genome an rna gene 
expressed during cortical development evolved rapidly in humans ongoing gc-biased evolution is widespread in the human genome and enriched near recombination hot spots ascertaining regions affected by gc-biased gene conversion through weak-to-strong mutational hotspots forces shaping the fastest evolving regions in the human genome extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection ancient hybridization and strong adaptation to viruses across african vervet monkey populations tobacco-smoking-related differential dna methylation: k discovery and replication differential dna methylation analysis of breast cancer reveals the impact of immune signaling in radiation therapy a functional methylome map of ulcerative colitis exploring the utility of human dna methylation arrays for profiling mouse genomic dna profiling dna methylation differences between inbred mouse strains on the illumina human infinium methylationepic microarray

we would like to acknowledge grant support from anrs and the institutional budget from the cnrgh. sp was the recipient of a phd fellowship from the university paris diderot, sorbonne paris cité.

key: cord- -mygo qx authors: li, yanpeng; gordon, emilia; idle, amanda; altan, eda; seguin, m. alexis; estrada, marko; deng, xutao; delwart, eric title: multiple known and a novel parvovirus associated with an outbreak of feline diarrhea and vomiting date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mygo qx an unexplained outbreak of feline diarrhea and vomiting, negative for common enteric viral and bacterial pathogens, was subjected to viral metagenomics and pcr. we characterized from fecal samples the genome of a novel chapparvovirus we named fechavirus that was shed by / affected cats and different feline bocaviruses shed by / cats.
also detected were nucleic acids from attenuated vaccine viruses, members of the normal feline virome, viruses found in only one or two cases, and viruses likely derived from ingested food products. epidemiological investigation of disease signs, time of onset, and transfers of affected cats between three facilities support a possible role for this new chapparvovirus in a highly contagious feline diarrhea and vomiting disease. cats have an estimated world-wide population of over half a billion. members shelter became involved in the outbreak when a cat (# ) transferred from shelter became sick, several days after arrival and before the outbreak had been identified. ultimately, a total cats were affected in shelter in november, cats were affected in shelter (november - january), and cats were affected in shelter (november-january) (figure and supplementary table ). nearly all transmission was indirect ( figure ) ; because of this, it was not possible to definitively determine which animals had been exposed, except in specific rooms where housing was communal or exposure was known to be widespread prior to introduction of control measures. attack rates for these rooms were . % (shelter ) and . % (shelter ) (table ) . overall, diarrhea and vomiting were observed in . % and . % of the cases. there were likely more cats vomiting because vomitus that could not be attributed to a particular cat was found multiple times in the final wave of illness. . % and . % of the affected cats also showed inappetence and lethargy, and . % required veterinary care ( table ). the minimum incubation period was hours, and the maximum was estimated at - days based on estimated exposure dates. vomiting tended to start - days before diarrhea, and last only a couple of days, but in some animals the diarrhea lasted up to a week (longer in a few animals). the mean duration of illness was . days and the median was . 
days (range to days) which has good efficacy against bacteria and viruses (including non-enveloped viruses) (omidbakhsh & sattar, ) . after control measures were initiated, the outbreak slowed and cases became sporadic, except for at shelter in january (due to communal housing). in order to determine the pathogens that caused the outbreak, all available fecal samples were analyzed using viral metagenomics. viral sequences assigned to five main viral families (anelloviridae, parvoviridae, papillomaviridae, polyomaviridae and caliciviridae) were detected in these fecal samples (table ) . (table ) . dependoparvovirus sequence reads were also found in two cats. a near full-length genome of bases could be assembled from cat # (also infected with fechavirus) whose phylogenetic analysis showed it to be related to a recently reported dependoparvovirus genome from bats (desmodus rotundus) sharing ns and vp protein with . % and . % identities (supplementary figure ) . the timeline of disease presentation and fecal shedding status of key animals in the transmission chain between the facilities was determined. there were no samples from cats in shelter available for testing, but cat # was fechavirus positive shortly after being housed in the same area as cat # and # , who were sick upon transfer from shelter to shelter . a second cat in shelter was also shedding fechavirus (cat # ). of the other affected cats in shelter a total of individual cats and a sample pool (mixed feces from cats) were using metagenomics, we found febov , , and and a novel chaphamaparvovirus we named fechavirus in a large fraction of fecal samples and fechavirus in all vomit samples from sick cats in a multi-facility outbreak. subsequent pcr testing confirmed the presence of either a bocavirus or fechavirus or both (n= cat # ) of these viruses in all but one of the seventeen sick cats tested (cat # ). 
the outbreak in shelter predominantly tested positive for febov ( / febov +ve and / fechavirus +ve) while the sick cats in shelter were mainly shedding fechavirus ( / fechavirus +ve, / febov +ve). another cat housed in shelter who developed vomiting after sick cats were transferred back from shelter was also shedding fechavirus as well as a novel dependovirus. the co-detection of these viral genomes indicates that fechavirus may provide the required help for replication-defective dependoviruses. fpv and fcv reads were determined to originate from recently inoculated attenuated vaccine strains, while other viruses were deemed to be asymptomatic infections, derived from chicken and pork viruses in consumed food, or detected only in sporadic cases. fcv was negative by pcr, which could be the result of a low viral load in those cats. the close genetic relationship of fechavirus to cachavirus found in both diarrheic and healthy dog feces (fahsbender,

table: daily pcr results per cat, with rows labeled fechavirus, febov, and idexx diarrhea (per-day +/- entries not recoverable from the extracted text). note: days (d -d ) reflect the day samples were collected. days of illness are shaded starting at first sample collection. "+" and "-" mean pcr positive or negative for the fechavirus. illness onset was , , , days before day for cats # , , and , respectively.

figure . daily epidemic cases of the cat diarrhea outbreak in three shelters from november to january.
genetic analysis of feline caliciviruses associated with a hemorrhagic-like disease infectious diseases of the dog and cat frequent cross-species transmission of parvoviruses among diverse carnivore hosts metagenomic study of the viruses of african straw-coloured fruit bats: detection of a chiropteran poxvirus and isolation of a novel adenovirus molecular analysis of carnivore protoparvovirus detected in white blood cells of naturally infected cats feline panleukopenia: a re-emergent disease genetic complexity and multiple infections with more parvovirus species in naturally infected cats faecal shedding of parvovirus deoxyribonucleic acid following modified live feline panleucopenia virus vaccination in healthy cats fecal viral diversity of captive and wild tasmanian devils characterized using virion- enriched metagenomics and metatranscriptomics mkpv (aka mucpv) and related chapparvoviruses are nephro-tropic and encode novel accessory proteins p and ns . biorxiv ictv virus taxonomy profile: parvoviridae parvoviruses: small does not mean simple feline virome-a review of novel enteric viruses detected in cats identification of a novel parvovirus in domestic cats identification of a novel ichthyic parvovirus in marine species in hainan island lyon-iarc polyomavirus dna in feces of diarrheic cats chapparvovirus dna found in % of dogs with diarrhea divergent gyroviruses in the feces of tunisian children presence of infectious agents and co-infections in diarrheic dogs determined with a real-time polymerase chain reaction-based panel phylogenetic analysis reveals the emergence, evolution and dispersal of carnivore parvoviruses european molecular epidemiology and strain diversity of feline calicivirus giardia infection in cats human bocavirus-the first years human bocaviruses are highly diverse, dispersed, recombination prone, and prevalent in enteric infections case-control comparison of enteric viromes in captive rhesus macaques with acute or idiopathic chronic 
diarrhea identification and characterization of bocaviruses in cats and dogs reveals a novel feline bocavirus and a novel genetic group of canine bocavirus canine bufavirus in faeces and plasma of dogs with diarrhoea comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent altered respiratory virome and serum cytokine profile associated with recurrent respiratory tract infections in children the intestinal virome of malabsorption syndrome-affected and unaffected broilers through shotgun metagenomics enteropathogenic bacteria in dogs and cats: diagnosis, epidemiology, treatment, and control novel parvovirus related to primate bufaviruses in dogs twenty-five years of structural parvovirology the blood dna virome in , humans feline fecal virome reveals novel and prevalent enteric viruses genomic characterization of diverse gyroviruses identified in the feces of domestic broad-spectrum microbicidal activity, toxicologic assessment, and materials compatibility of a new generation of accelerated hydrogen peroxide-based environmental surface disinfectant human bocavirus in stool: a true pathogen or an innocent bystander? 
detection of rotavirus species a, b and c in domestic mammalian animals with diarrhoea and genotyping of bovine species a rotavirus strains discovery of a novel parvovirinae virus, porcine parvovirus , by metagenomic sequencing of porcine rectal swabs an ancient lineage of highly divergent parvoviruses infects both vertebrate and invertebrate hosts common and emerging infectious diseases in the animal shelter feline bocavirus- associated with outbreaks of hemorrhagic enteritis in household cats: potential first evidence of a pathological role, viral tropism and natural genetic recombination feline calicivirus novel circular single-stranded dna virus from turkey faeces an atypical parvovirus drives chronic tubulointerstitial nephropathy and kidney fibrosis high diversity and novel enteric viruses in fecal viromes of healthy wild and captive thai cynomolgus macaques (macaca fascicularis). viruses chicken anemia virus chapparvoviruses occur in at least three vertebrate classes and have a broad biogeographic distribution feline parvovirus infection and associated diseases genetic characterization of feline bocavirus detected in cats in japan feline panleukopenia virus: its interesting evolution and current problems in immunoprophylaxis against a serious pathogen the fecal virome of red-crowned cranes viral diversity of house mice characterization of a novel porcine parvovirus tentatively designated ppv a novel rodent chapparvovirus in feces of wild rats detection and genetic characterization of feline bocavirus in northeast china cameroonian fruit bats harbor divergent viruses, including rotavirus h, bastroviruses, and picobirnaviruses using an alternative genetic code development and application of a multiplex pcr method for the simultaneous detection and differentiation of feline panleukopenia virus, feline bocavirus, and feline astrovirus faecal virome of cats in an animal shelter key: cord- - hv vh authors: zhang, dong yan; wang, jian; dokholyan, nikolay v. 
title: prefusion spike protein stabilization through computational mutagenesis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hv vh a novel severe acute respiratory syndrome (sars)-like coronavirus (sars-cov- ) has emerged as a human pathogen, causing global pandemic and resulting in over , deaths worldwide. the surface spike protein of sars-cov- mediates the process of coronavirus entry into human cells by binding angiotensin-converting enzyme (ace ). due to the critical role in viral-host interaction and the exposure of spike protein, it has been a focus of most vaccines’ developments. however, the structural and biochemical studies of the spike protein are challenging because it is thermodynamically metastable . here, we develop a new pipeline that automatically identifies mutants that thermodynamically stabilize the spike protein. our pipeline integrates bioinformatics analysis of conserved residues, motion dynamics from molecular dynamics simulations, and other structural analysis to identify residues that significantly contribute to the thermodynamic stability of the spike protein. we then utilize our previously developed protein design tool, eris, to predict thermodynamically stabilizing mutations in proteins. we validate the ability of our pipeline to identify protein stabilization mutants through known prefusion spike protein mutants. we finally utilize the pipeline to identify new prefusion spike protein stabilization mutants. the ongoing outbreak of the novel coronavirus [ ] [ ] [ ] , which causes fever, severe respiratory illness, and pneumonia, poses a major public health and governance challenges. the emerging pathogen has been characterized as a new member of the betacoronavirus genus (sars-cov- ) , , closely related to several bat coronaviruses and to severe acute respiratory syndrome coronavirus (sars-cov) , . 
compared with sars-cov, sars-cov- appears to be more readily transmitted from human to human, spreading to multiple continents and leading to the world health organization (who)'s declaration of a pandemic on march , . according to the who, as of june , , there had been > , , confirmed cases globally, leading to > , deaths. in the initial step of the infection, the coronavirus enters the host cell by binding to a cellular receptor and fusing the viral membrane with the target cell membrane , . for sars-cov- , this process is mediated by the spike glycoprotein, which is a homo-trimer . during fusion of the virus with the host cell, the spike protein undergoes structural rearrangement and transitions from a metastable prefusion conformational state to a highly stable post-fusion conformational state , . the spike protein comprises two functional units, the s and s subunits; upon fusion with the host cell, the two subunits are cleaved. the s subunit is responsible for binding to the angiotensin-converting enzyme (ace ) [ ] [ ] [ ] receptor on the host cell membrane and contains the n-terminal domain (ntd), the receptor-binding domain (rbd) and the c-terminal domain (ctd). the ntd in the s subunit assists in recognizing sugar receptors. the rbd in the s subunit is critical for the binding of the coronavirus to the ace- receptor [ ] [ ] [ ] [ ] . the ctd in the s subunit could recognize other receptors . the binding of the rbd to ace facilitates the cleavage of the spike protein and promotes the dissociation of the s subunit from the s subunit . s contains two heptad repeats (hr and hr ), a fusion peptide, and a protease cleavage site (s '). the dissociation of s induces s to undergo a dramatic structural change to fuse the host and viral membranes. thus, the spike protein serves as a target for the development of antibodies, entry inhibitors and vaccines . the coronavirus spike transitions from a metastable prefusion state to a highly stable post-fusion state as part of its role in membrane fusion.
the instability of the prefusion state presents a significant challenge for the production of protein antigens for antigenic presentation of the prefusion antibody epitopes that are most likely to lead to neutralizing responses. thus, since the prefusion spike protein exists in a thermodynamically metastable state , a stabilized mutant conformation is critical for the development of vaccines and drugs. computational mutagenesis is an effective approach to finding mutations that are able to stabilize proteins. we have previously developed a protein design platform, eris , , which utilizes a physical force field for modeling inter-atomic interactions, as well as fast side-chain packing and backbone relaxation algorithms to enable efficient and transferable protein molecular design. eris was originally validated on mutants from five proteins, corroborating the unbiased force field, side-chain packing and backbone relaxation algorithms. in many later studies, eris has been validated through prediction of thermodynamically stabilizing or destabilizing mutations [ ] [ ] [ ] [ ] [ ] [ ] , and direct protein design efforts [ ] [ ] [ ] [ ] . in this work, we propose a pipeline to automatically stabilize spike proteins through computational mutagenesis. within the pipeline, we first analyze the conservation score and solvent accessible surface area (sasa) of residues in the protein. we then perform discrete molecular dynamics (dmd) [ ] [ ] [ ] [ ] simulations to calculate the root mean square fluctuation (rmsf) of residues to analyze their flexibility. based on this information, we select appropriate residues ( < conservation score < ; sasa > . ; rmsf > . Å; see discussion) as mutation sites. we subject the selected residues to computational redesign using eris to find the stabilizing mutations by calculating the change in free energy $\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{wt}}$, where $\Delta G_{\mathrm{mut}}$ and $\Delta G_{\mathrm{wt}}$ are the free energies of the mutant and wild-type proteins, respectively.
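the selection rule and the free energy difference described above can be sketched as follows. this is a minimal illustration and not the pipeline's own code: the `Residue` record, the function names, and all numeric values are hypothetical, and the thresholds are left as caller-supplied parameters rather than fixed numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    name: str            # one-letter amino acid code (illustrative)
    position: int        # residue number (illustrative)
    conservation: float  # conservation score, e.g. as reported by ConSurf
    sasa: float          # solvent accessible surface area measure
    rmsf: float          # root mean square fluctuation, in angstroms


def select_mutation_sites(residues, cons_min, cons_max, sasa_min, rmsf_min):
    """Keep residues with moderate conservation, high SASA and high RMSF.

    All four thresholds are assumptions supplied by the caller.
    """
    return [
        r for r in residues
        if cons_min < r.conservation < cons_max
        and r.sasa > sasa_min
        and r.rmsf > rmsf_min
    ]


def ddg(dg_mutant, dg_wild_type):
    """Free energy change of a mutation; negative values mean stabilization."""
    return dg_mutant - dg_wild_type
```

a caller would pass the residue table produced by the structural analysis together with its chosen cutoffs, then evaluate `ddg` only for the residues that pass the filter.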
we utilize this pipeline to identify stabilization mutants of the spike protein. next, we describe our methods in detail and provide a list of stabilizing mutations for spike protein. we propose a pipeline to automatically find stabilized mutants of proteins ( figure ). the pipeline can be divided into two stages. in the first stage, users designate the protein of interest, and then the pipeline will analyze the d structure of the protein by using different metrics. in the second stage, users designate the mutation sites according to the analysis of the d structure of the protein, and finally the pipeline will determine the stabilizing mutations of the protein. users can either upload the d structure of the protein or input the pdb id of the protein to designate the protein of interest. in the first stage, the first step is to remodel the d structure of the protein of interest to complete the missing atoms and residues. we integrate modeller into our pipeline to remodel proteins. next, the pipeline utilizes consurf to calculate the conservation score of each residue in the protein of interest. the conservation score indicates the importance of the residue in maintaining protein structure and/or function. subsequently, the pipeline utilizes dmd to analyze the flexibility of each residue in the spike protein through rmsf. the technique has already been used to efficiently study the protein folding thermodynamics and protein oligomerization and allows for a good equilibration of the structures. then, the pipeline will calculate sasa of residues in the protein. in the second stage, users designate the mutation sites according to the conservation score, rmsf, and sasa. a high conservation score (≥ ) indicates the residue may play important roles in the function or the stability of the structure of the protein; residues of high rmsf (> . 
Å) are likely the culprit to undermine the stability of the structure of the protein, hence we select residues that have a low conservation score or high rmsf; residues with sasa < . are considered buried and residues with sasa ≥ . are considered exposed to solvent. after the designation of the mutation sites, the pipeline utilizes eris to determine the changes in free energies of the mutants. for each residue in the mutation sites, we utilize modeller to re-model the d structure of the spike protein by using a structure deposited to protein databank (pdb), pdbid: vsb , the cryo-em structure of the prefusion state of the spike protein, as the template structure to complete the missing atoms and residues ( figure a&b ). next, we use chiron all following computational mutagenesis study are performed using the structure with the rbds in down conformation. to validate the ability of the pipeline to identify stabilization mutants, we use the pipeline to calculate the free energy changes of several known prefusion spike protein stabilization mutants. the p mutation strategy (k p and v p) has been proved effective for the stabilization of spike protein of sars-cov- and other betacoronavirus , , . hsieh and coworkers the spike protein is mainly composed of s and s subunits ( figure s ). we select residues for mutation from the ntd and rbd domains in s , and we also select residues from the hr (heptad repeat ) and ch (central helix) domains in s . we don't select residues from hr domain in s because the structure of the hr domain has not been solved in the cryo-em structure vsb. at the outset, we calculate the conservation score ( figure a&d ) of all residues by using consurf. based on the conservation score, most residues in hr /ch are conservative, while residues in ntd and rbd are prone to mutation in evolution. next, we use pymol to calculate sasa of all residues in the spike protein ( figure b&e) . 
sasa indicates the level of residues exposed to the solvent in a protein and usually most of the functional residues are located on the protein structure's surface . all four domains have both low sasa residues and high sasa residues. then, we perform , , steps dmd simulation for the spike protein to calculate the rmsf of each residue ( figure c&f ). the residues in hr /ch have extremely low rmsf, while the residues in ntd and rbd domains have moderate to high rmsf. we select residues that have different conservation scores in these four domains for mutagenesis. in the ntd, we select residues (table s ) with the conservation score ranging from (highly variable) to (highly conservative). the sasa of these residues range from . (buried) to . (exposed). the rmsf range from . Å (frozen) to . Å (flexible). likewise, in the other three domains (table s - ), we select to residues, respectively. these residues also have diverse conservation scores, sasa, and rmsf. of note, to avoid affecting the function of the spike protein, these residues are all not chosen from the functional sites of the spike protein, such as the ace binding site in rbd. we utilize eris to calculate the free energy changes of mutants relative to the wild type ( figure a , figure s & , and table s - ). in the ntd, residues, e , k , i , r , and l are selected for mutagenesis. among them, the free energy changes of nearly all mutations of residues i , r , and l are positive, indicating that they are destabilizing the structure. in contrast, most mutations on residues e and k have negative free energy changes, suggesting that they are stabilizing the structure. the mutant that has the most negative free energy change is r c. however, cysteine is prone to forming disulfide bond with other cysteine, which may affect the correct folding of the protein structure, so we recommend k m to be a better choice as the stabilization mutant. 
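the reasoning used to pick a recommended mutant (take the most negative free energy change, but avoid substitutions to cysteine) can be written as a small helper. this is a sketch under stated assumptions: the label format (`"A10M"`, wild-type letter, position, target letter) and the example values are made up for illustration and do not come from the tables of this work.

```python
def best_stabilizing_mutation(ddg_by_mutant, excluded=("C",)):
    """Return (label, ddg) for the most stabilizing allowed mutation.

    ddg_by_mutant maps a mutation label such as "A10M" to its free energy
    change. Mutations whose target amino acid (last character of the label)
    is in `excluded` are skipped; cysteine is excluded by default because it
    can form unintended disulfide bonds. Returns None if nothing stabilizes.
    """
    candidates = {
        label: value
        for label, value in ddg_by_mutant.items()
        if value < 0 and label[-1] not in excluded
    }
    if not candidates:
        return None
    return min(candidates.items(), key=lambda item: item[1])


# Hypothetical values: the cysteine mutant scores best but is skipped.
example = {"A10C": -3.0, "A10M": -2.0, "A10W": 1.5}
print(best_stabilizing_mutation(example))
```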
in the rbd, residues (a , t , y , n , and d ) (table s ) are selected for mutagenesis. the free energy changes calculated by eris ( figure s , in the hr /ch domain, residues (l , i , a , y , s , t , and v ) are selected for mutagenesis as shown in table s . in stark contrast to ntd and rbd, most residues have extremely high free energy changes (> kcal/mol), suggesting that these residues are not very good choices for mutagenesis. the high free energy changes also implicate that they may play important roles in stabilizing the structure so that they are irreplaceable to some extent. this finding is also in concert with the high conservation scores of these residues. that said, we can still find stabilization mutants for these residues, such as t a and s m ( figure s ). compared to experimental mutagenesis, such as random mutagenesis , and site-directed mutagenesis , , computational mutagenesis is an efficient alternative that lays the foundation of large-scale mutation screening. however, performing computational mutation screening for all residues in the spike protein trimeric structure, which consists of residues in each monomer, is still timeconsuming and inefficient, so we seek to select critical residues to perform mutagenesis. in this work, to interrogate how to select residues as mutation sites, we select residues that have different conservation scores, sasa, and rmsf. we find that the probability of stabilizing mutations for specific residues is correlated with the conservation score, sasa, and rmsf of these residues ( figure b -d). we calculate the average and the minimum free energy change of all mutations of each residue. we find that the average free energy change of mutations of residues with either high (> ) or low (< ) conservation score are typically larger than . only mutations in residues that have moderate ( ~ ) conservation score have negative average free energy change, thus indicating a possibility to find stabilizing mutations. 
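the per-residue summary used here (average and minimum free energy change over all mutations of a residue) is simple to reproduce. the sketch below assumes a nested dictionary as input; the layout, the function names, and the residue labels are hypothetical.

```python
from statistics import mean


def summarize_by_residue(ddg_table):
    """Map each residue to (average ddG, minimum ddG) over its mutations.

    ddg_table has the form {residue_label: {target_amino_acid: ddg}}.
    """
    return {
        residue: (mean(values.values()), min(values.values()))
        for residue, values in ddg_table.items()
    }


def promising_residues(ddg_table):
    """Residues whose average ddG is negative, i.e. likely to stabilize."""
    summary = summarize_by_residue(ddg_table)
    return [residue for residue, (avg, _) in summary.items() if avg < 0]
```

grouping the resulting averages by conservation score, sasa, or rmsf then gives the kind of comparison discussed in the text.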
we posit that conservative residues are typically playing critical roles in structural stability or protein functioning , making them a likely target for finding stabilizing mutations. however, if the conservation score of the residue is too high, the residue may be irreplaceable and any mutation will destabilize the structure. on the other hand, residues with low conservation scores may be less critical to the structural stability, reducing the chances for finding stabilizing mutations at these positions. thus, we select residues that have moderate conservation score ( ~ ) for mutagenesis. similarly, we find that residues with large sasa or large rmsf have more stabilization mutants than residues with low sasa or low rmsf, respectively. overall, we select residues that have moderate conservation score ( ~ ), high sasa (> . ), and high rmsf (> . Å) for mutagenesis. in addition, although we only select residues in ntd, rbd, and hr /ch domains to perform mutagenesis, residues in other regions can also be used as mutation sites. for example, the known p mutation strategy (k p and v p) has been proved effective for the stabilization of spike protein of sars-cov- and other betacoronavirus , , . in this work, we propose a pipeline to automatically stabilize proteins through computational mutagenesis. we analyze the conservation score, rmsf, and sasa of residues in the spike protein through the pipeline. we propose criteria based on the conservation score, rmsf, and sasa to identify residues for mutation. finally, we utilize eris to calculate the free energy change and find stabilizing mutants. all source codes are deposited in: https://bitbucket.org/dokhlab/proteinstabilization. the d structure of the spike protein colored sasa. the red/blue colors indicate exposed/buried residues. (f) the d structure of the spike protein colored by rmsf. red means flexible and blue means frozen. 
characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov a novel coronavirus from patients with pneumonia in china features, evaluation and treatment coronavirus (covid- ) coronavirus (covid- ) outbreak: what the department of endoscopy should know the proximal origin of sars-cov- a novel coronavirus genome identified in a cluster of pneumonia cases-wuhan severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats severe acute respiratory syndrome coronavirus (sars-cov- ) and corona virus disease- (covid- ): the epidemic and the challenges world health organization declares global emergency: a review of the novel coronavirus (covid- ) fusion mechanism of -ncov and fusion inhibitors targeting hr domain in spike protein sars-cov- infects t lymphocytes through its spike protein-mediated membrane fusion cryo-em structure of the -ncov spike in the prefusion conformation. science ( -. ) structure, function, and evolution of coronavirus spike proteins the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex a novel angiotensin-converting enzyme-related carboxypeptidase (ace ) converts angiotensin i to angiotensin - sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structural basis for the recognition of sars-cov- by fulllength human ace . science ( -. 
) cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen unexpected receptor functional mimicry elucidates activation of coronavirus fusion structure of mouse coronavirus spike protein complexed with receptor reveals mechanism for viral entry tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion covid- , an emerging coronavirus infection: advances and prospects in designing and developing vaccines, immunotherapeutics, and therapeutics eris: an automated estimator of protein stability modeling backbone flexibility improves protein stability estimation emergence of protein fold families through rational design g protein mono-ubiquitination by the rsp ubiquitin ligase tyrosine phosphorylation switching of a g protein stabilization of μ-opioid receptor facilitates its cellular translocation and signaling large sod aggregates, unlike trimeric sod , do not impact cell viability in a model of amyotrophic lateral sclerosis a phosphomimetic mutation stabilizes sod and rescues cell viability in the context of an als-associated mutation nonnative sod trimer is toxic to motor neurons in a model of amyotrophic lateral sclerosis computational design of chemogenetic and optogenetic split proteins engineering extrinsic disorder to control protein activity in living cells rational design of a ligand-controlled protein conformational switch rationally designed carbohydrate-occluded epitopes elicit hiv- env-specific antibodies discrete molecular dynamics studies of the folding of a protein-like model discrete molecular dynamics applications of discrete molecular dynamics in biology and medicine ab initio folding of proteins with all-atom discrete molecular dynamics 
comparative protein modelling by satisfaction of spatial restraints statistical potential for assessment and prediction of protein structures computational modeling of small molecule ligand binding interactions and affinities solving protein structures using short-distance cross-linking constraints as a guide for discrete molecular dynamics simulations ab initio rna folding by discrete molecular dynamics: from structure prediction to folding mechanisms convergent solutions to binding at a protein-protein interface consurf: identification of functional regions in proteins by surface-mapping of phylogenetic information mdanalysis: a toolkit for the analysis of molecular dynamics simulations stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis structure-based design of prefusion-stabilized sars-cov- spikes improving prediction of secondary structure, local backbone angles and solvent accessible surface area of proteins by iterative deep learning sequence saturation mutagenesis (sesam): a novel method for directed evolution evolving strategies for enzyme engineering directed mutagenesis site-directed mutagenesis: effect of an extracistronic mutation on the in vitro propagation of bacteriophage qbeta rna understanding hierarchical protein evolution from first principles we acknowledge support from the national institutes for health r gm ,the huck institutes of the life sciences, and the passan foundation. the project described was also supported by the national center for advancing translational sciences, national institutes of health, through grant ul tr . the content is solely the responsibility of the authors and does not necessarily represent the official views of the nih. the author declares no potential conflict of interest. 
key: cord- -ii qi bp authors: yang, liu; he, wei; yun, yuehui; gao, yongxiang; zhu, zhongliang; teng, maikun; liang, zhi; niu, liwen title: defining a global map of functional group based d ligand-binding motifs date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ii qi bp uncovering conserved d protein-ligand binding patterns at the basis of functional groups (fgs) shared by a variety of small molecules can greatly expand our knowledge of protein-ligand interactions. despite that conserved binding patterns for a few commonly used fgs have been reported in the literature, large-scale identification and evaluation of fg-based d binding motifs are still lacking. here, we developed aftme, an alignment-free method for automatic mapping of d motifs to different fgs of a specific ligand through two-dimensional clustering. applying our method to nature-existing ligands, we defined fg-binding motifs that are highly conserved across different ligand-binding pockets. systematic analysis further reveals four main classes of binding motifs corresponding to distinct sets of fgs. combinations of fg-binding motifs facilitate proteins to bind a wide spectrum of ligands with various binding affinities. finally, we showed that these general binding patterns are also applicable to target-drug interactions, providing new insights into structure-based drug design. protein-ligand interactions play fundamental roles in many important cellular functions, including small molecule metabolism, enzymatic catalysis, signal transduction and regulation. comprehensive knowledge of protein-ligand interactions can not only provide important insights into biological functions of ligand-binding proteins but also greatly benefit drug discovery and development (loewenstein et al., ; paul et al., ) . 
many proteins that don't display overall sequence or structure similarities may share similar local d structures and can bind to the same or similar ligands; thus, identifying conserved binding patterns across different ligand-binding proteins at d level could facilitate a better understanding of protein-ligand recognition (abrusan and marsh, ; du et al., ; persson et al., ) . with the rapid accumulation of experimentally determined structures of protein-ligand complexes, it has become possible to identify conserved d binding motifs at large scale using computational approaches (kinjo and nakamura, ; ribeiro et al., ) . current methods based on structural comparison or alignment of protein pockets have identified many well-defined d motifs that are conserved across different protein pockets and widely used for protein function annotation, pocket classification and ligand-binding prediction (gao and skolnick, ; hoffmann et al., ; hwang et al., ; pires et al., ; pu et al., ; yeturu and chandra, ) . however, these ligand-based d binding patterns are not applicable to a large fraction of ligands, especially small-molecule drugs, due to the lack of reference d protein-ligand structures. despite the functional and structural diversity of different protein-binding ligands, many of them share the same or similar functional groups (fgs) that mediate the interactions with the target proteins. therefore, identification of conserved d binding motifs for fgs shared by different small molecules may extend our understanding of protein-ligand interactions to higher resolution and broader scope (guvench, ) . previous studies have shown that conserved d motifs do exist in proteins binding different ligands with the same fg. one example is the phosphate-binding loop (p-loop) motif (saraste et al., ; via et al., ) , in which the residues are highly conserved in terms of amino acid types as well as spatial positions among diverse phosphate-binding proteins.
conserved d motifs were also reported for other fgs such as adenine ring (denessiouk et al., ; narunsky et al., ; nebel et al., ) , heme group (ferousi et al., ; zubieta et al., ) and prosthetic groups (nebel, ) . however, these motifs were either uncovered through manual analysis of a small set of protein structures by an expert (e.g. a crystallographer) or through structural alignment of proteins that bind fgs with rigid structures, which are subjected to limited fg types and/or biased datasets of d structures. computational methods for automatic extraction of d binding motifs for a variety of fgs in large scale are still lacking. to systematically identify and evaluate d binding motifs at fg level, we developed aftme, an alignment-free method that automatically maps functional atoms to different fgs from a set of protein pockets binding the same ligand using two-dimensional clustering approach. we applied our method to natural ligands with abundant d protein-ligand structures and built an encyclopedia of binding motifs for different fgs, providing valuable resources for elucidating the mechanism of protein-ligand interactions as well as uncovering new rules for structure-based drug design. we have developed aftme, a computational method to dissect protein pockets binding a specific ligand into sectors that interact with different functional groups (fgs). the basic assumption of this method is simple: if conserved binding pattern for a specific fg exists, the pattern-forming atoms should be spatially proximal to the corresponding fg and frequently co-appear, thus can be detected through clustering analysis of functional atoms from diverse protein pockets binding the same ligand. fig. a outlines the major steps of the method. ( ) given a set of protein pockets binding the same ligand, aftme first parses all the functional atoms (fas) (he et al., ) that are considered to interact with the ligand atoms (las). 
( ) then a distance matrix is constructed that evaluates the spatial distances between fas and las. ( ) based on the distance matrix, a two-dimensional clustering algorithm is performed, through which las are clustered into different fgs at the first dimension and fas are clustered into corresponding fg-binding motifs. ( ) each identified binding motif can be represented as a vector according to its chemical composition, which facilitates further analysis. the detailed description of each step is presented in the methods and materials section. considering the abundance of studies on atp-binding proteins, we first applied aftme to a set of atp-binding proteins as a proof of concept. as shown in fig. b , our method identified three fa clusters or binding motifs corresponding to the triphosphate group, ribose and adenine, respectively. the triphosphate-binding motif (m ) mainly consists of hydrophilic atoms from polar amino acids like arg, lys, ser, etc., and the adenine-binding motif (m ) is enriched in atoms from hydrophobic and aromatic amino acids including leu, val, phe, etc., whereas the ribose-binding motif (m ) contains both hydrophobic and hydrophilic amino acids (fig. c ). further investigation of these binding motifs indicated that the fg-binding motifs identified by aftme are biologically meaningful units. for instance, among the hydrophilic residues (rendered in red in fig. d and fig. s ) interacting with the triphosphate group, lys and ser are both well-known conserved residues in the p-loop, a common motif for phosphate binding in atp- and gtp-binding proteins, which is typically composed of a glycine-rich sequence followed by a conserved lysine and a serine or threonine (saraste et al., ) . it was found that the hydrophobic and/or aromatic residues (rendered in green in fig. d and fig. s ) making up the adenine-binding motif interact with the adenine ring through c-h-π and/or π-π interactions. notably, moodie et al.
described the recognition of adenine by proteins in terms of a fuzzy recognition template based on a sandwich-like structure formed by hydrophobic residues (moodie et al., ) . denessiouk et al. also found that bulky hydrophobic residues can form a hydrophobic area by interacting with the adenine base (denessiouk and johnson, ) . the a-loop motif, which includes aromatic residues forming π-π interactions with adenine ring was also reported (ambudkar et al., ) . these findings are largely consistent with the aftme identified adenine-binding motif. although no well-defined motifs corresponding to the ribose-binding motif have been reported yet, it makes sense that hydrophobic/aromatic residues of the ribose-binding motif interact with five-carbon ring while hydrophilic residues interact with the extended hydroxyl groups through polar interactions. figure . workflow of aftme and its application to atp-binding proteins. (a) a schematic view of major steps of the aftme method. (b) two-dimensional clustering of the distance matrix for atp-binding pockets. the vertical and horizontal axes correspond to fas and las, respectively. the color encodes the distance between an fa and an la. three la clusters corresponding to the triphosphate group, ribose and adenine ring of atp were identified, respectively. three fa clusters or binding motifs (m , m and m ) corresponding to the above three fgs were also obtained simultaneously. (c) distribution of amino acids and atom properties for m (top), m (middle) and m (bottom). (d) an example (pdb: vjc) showing the spatial distribution of amino acids within each identified fg-binding motif. (e) different atp-binding proteins use different combinations of fg-binding motifs m , m and m . (f) atp-binding pockets with high affinity have significantly more fas than those with low affinity. (g) comparison of the fa numbers in fg-binding motifs of atp-binding pockets with high and low affinity. 
the center line, bounds of box and whiskers represent the median, interquartile range and . times interquartile range, respectively. the p-values were calculated using the mann-whitney test. we then set out to explore the different roles played by the three identified motifs in the atp-binding process. as we can see from the venn diagram in fig. e , among the atp-binding pockets in the dataset, a majority ( , . %) contain all three binding motifs. nevertheless, ( . %) pockets contain two of them, among which ( . %) carry the phosphate-binding (m ) and adenine-binding (m ) motifs, indicating that a combination of two motifs, especially m and m , is sufficient for atp binding. we also noticed some cases in which only one binding motif exists together with one or more metal ions (fig. s ) , indicating that metal ions may greatly affect the global binding profile. next, we asked how different fg-binding motifs contribute to the binding affinity. all the atp-binding proteins with experimental affinity data available were collected and sorted from high to low affinity (table s ). in general, protein pockets in the high-affinity (top / ) group contain more fas than those in the low-affinity (bottom / ) group (fig. f , p= . e- , mann-whitney test). interestingly, when looking into individual binding motifs, only the adenine-binding motif (m ) shows a significant increase in fa numbers in high-affinity pockets (fig. g , p= . e- , mann-whitney test), suggesting that an increase in hydrophobic interactions with the adenine ring makes the major contribution to higher atp binding affinity. taken together, the above results demonstrate the ability of aftme to decompose ligand-binding sites into biologically meaningful motifs, which are spatially well-defined to interact with different fgs and contribute unequally to protein-ligand binding affinity.
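the comparison of fa counts between high- and low-affinity pockets relies on the mann-whitney test. the sketch below computes only the u statistics with average ranks for ties, using the standard library; it is an illustration with made-up counts, and in practice the p-value would normally come from a statistics package such as `scipy.stats.mannwhitneyu`. the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    x, y = list(x), list(y)
    ranks = average_ranks(x + y)
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical FA counts for high- and low-affinity pockets.
high_affinity = [42, 51, 47, 60]
low_affinity = [30, 35, 41, 38]
print(mann_whitney_u(high_affinity, low_affinity))
```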
to see whether fg-based d binding motifs identified by our method are reused across different ligand-binding proteins, we applied aftme to a few ligands sharing the same fgs with atp including adp, amp, gtp and utp. fine-mapped binding motifs for adenine and ribose were obtained from adp-and amp-binding proteins, and that for triphosphate was from gtp-and utp-binding proteins ( fig. s a- d ). we found that the chemical compositions of motifs binding the same fg are highly consistent although they were extracted from proteins binding different ligands ( fig. a- c ). for adenine and ribose, the binding motifs are universal among atp-, adp-and amp-binding pockets and have very similar distribution of amino acid types and atom categories ( fig. a, b) . similarly, triphosphate-binding motifs extracted from gtp-and utp-binding proteins also show consistent makeup with atp-derived motifs (fig. c ). we then extended our evaluation to a wider range of ligands, among which are adenine-containing, are ribose-containing and are triphosphate-containing. we described each fg-binding motif using a -dimensional vector representing the proportion of types of amino acids and categories of atoms, respectively (see methods and materials). the vector representation of the fg-binding motifs enables correlation analysis of chemical composition for any pair of motifs. it was found that adenine-binding motif pairs showed significantly higher correlations than random fgbinding motif pairs (fig. d ). similar observations were obtained for ribose-binding ( adenine-and ribose-binding motifs are extracted from atp-, adp-and amp-binding proteins, and triphosphate-binding motifs are obtained from proteins binding atp, gtp and utp, respectively. (d-f) pairs of (d) adenine-, (e) ribose-and (f) triphosphate-binding motifs show significantly higher composition correlation than motif pairs binding random fgs. 
(g) a schematic view of large-scale identification of fg-based binding motifs using aftme; l, g and m represent the ligand, the fg and the motif, respectively. (h) correlation analysis of the identified fg-binding motifs indicates that motifs binding the same fg are highly consistent in their composition. the center line, bounds of box and whiskers represent the median, interquartile range and . times interquartile range, respectively. the p-values were calculated using a paired t-test. next, we performed a large-scale analysis of d fg-binding motifs for all the ligands with abundant d structures available. as shown in fig. g , we first derived all the d protein-ligand structures from the biolip database (yang et al., ) . redundant proteins binding the same ligand that show over % sequence similarity were eliminated using cd-hit (li and godzik, ) . ligands with more than structures available were kept for the following analysis. fg-binding motifs corresponding to unique fgs were identified using aftme (table s ) , among which fgs appeared in multiple (at least three) ligands. for each fg present in multiple ligands, a conservation score (cs) of the corresponding fg-binding motif was calculated as the average of pairwise pearson's correlation coefficients among all the identified motifs binding this specific fg, and a corresponding p value was calculated using a permutation test (see methods and materials). as shown in table , most fg-binding motifs corresponding to fgs that appear in multiple ligands are highly conserved across different ligand-binding proteins (cs > . , p < . ). overall, two motifs binding the same fg show significantly higher composition correlations compared with two randomly selected motifs (fig. h) , confirming the high conservation of fg-binding motifs. these lines of evidence show that aftme can be applied to detect binding motifs for a diversity of fgs.
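the conservation score defined above, the average pairwise pearson correlation among the composition vectors of motifs that bind the same fg, can be sketched as follows. this is an illustration only: the function names and the toy category list are hypothetical, the permutation test is not covered, and the real vectors combine amino acid types with atom categories as described in methods and materials.

```python
from itertools import combinations
from math import sqrt


def composition_vector(labels, categories):
    """Proportion of each category among the labels of one motif."""
    total = len(labels)
    return [labels.count(category) / total for category in categories]


def pearson(a, b):
    """Pearson correlation of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (spread_a * spread_b)


def conservation_score(motif_vectors):
    """Average Pearson correlation over all unordered pairs of motifs."""
    pairs = list(combinations(motif_vectors, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

a score near 1 means that motifs taken from different ligands have nearly the same composition profile, which is the sense in which the text calls a motif conserved.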
importantly, the identified binding motifs are highly conserved among different ligand-binding pockets, laying the foundation for extending the limited set of 3d motifs to a broader range of ligands that are not suitable for aftme analysis (e.g. due to a lack of structure data) but share the same or similar fgs with applicable ligands. given that binding motifs for different fgs have been identified using our method, we asked whether there are general interaction patterns between the identified motifs and the fgs they bind. we found that all the binding motifs could be clustered into classes based on their physicochemical properties using k-means (figure s , see methods and materials), which are well separated in the t-sne plot shown in fig. a. notably, the fg-binding motifs in different classes feature distinct physicochemical properties. the first class (red dots), denoted as the aromatic motif class, is enriched with atoms from aromatic amino acids like trp, tyr and phe. the second class (green dots), named the hydrophilic motif class, is mainly composed of hydrophilic, donor and acceptor atoms from polar amino acids such as arg, lys, asp and glu. the third class (blue dots), the mixed motif class, consists of both aromatic and hydrophilic atoms. the fourth class, named the hydrophobic motif class, is dominated by atoms from hydrophobic amino acids including leu, ile, val etc. (fig. a). next, we looked into the correspondence between different motif classes and the fgs they bind. as shown in figure b, most of the fgs are uniquely mapped to a single motif class, indicating that different classes of motifs have specific binding preferences for fgs. although a variety of fgs are involved, we found some dominating fgs in each motif class (table s ). among the fgs that interact with the aromatic motifs, two types are in the majority: one with an aromatic ring and the other with a non-aromatic ring. 
the former type, exemplified by the cytosine ring of cytidine- '-monophosphate (c p), interacts with the aromatic rings of phe and tyr through π-stacking (fig. c, left panel). the latter type is exemplified by the glucose ring in n-acetyl-d-glucosamine (nag), whose carbon atoms form hydrophobic interactions with the aromatic atoms of tyr/trp (fig. c, right panel). the 2d ligand-protein interactions were generated by ligplot (laskowski and swindells, ). in contrast, the hydrophilic motif class prefers to bind polar fgs through hydrogen bonds, among which carboxyl and phosphorus are the most prevalent ones. for instance, the carboxyl group in citric acid (cit) forms n-h⋯o and o-h⋯o hydrogen bonds with the n atom of the imidazole of a his and the o atom of the hydroxyl of a thr, respectively (fig. d, left panel). four n-h⋯o hydrogen bonds are formed between o atoms of the phosphate group in adenosine- '- '-diphosphate (a p) and n atoms of two basic amino acids (lys and arg) in the binding motif (fig. d, right panel). there are over fgs engaged in both the mixed and the aromatic motif classes, most of which are non-aromatic sugar rings. in addition to the hydrophobic interactions between the sugar ring and the aromatic ring that are frequently used in the aromatic motif class, the mixed motif class also contains hydrophilic amino acids that form hydrogen bonds with the outward-extending hydroxyl groups (fig. e, right panel). besides, the mixed motif class has a high propensity to recognize an amino acid, whose amide group interacts with the aromatic ring through amide-π stacking and whose carboxyl group interacts with hydrophilic amino acids via hydrogen bonds (fig. e, left panel). the hydrophobic motif class also shares a major type of fg, the aromatic hetero-ring, with the aromatic motif class. instead of π-π interactions, c-h-π interactions are the main driving force for hydrophobic-aromatic contacts. 
another two major types of fgs involved in the hydrophobic motif class are alkene and alkane chains. as shown by the two examples in figure f, the alkane (left) and the alkene (right) chains are well accommodated in protein pockets composed of hydrophobic residues. altogether, our systematic analysis suggested the existence of four classes of fg-binding motifs and their favored fgs. detailed investigation further revealed general interaction patterns between these functional motifs and the fgs they bind, thus building up a global map of 3d motif-fg interactions. having identified the correspondence between motif classes and fgs, we then asked how motifs are combined to facilitate the binding of ligands that consist of different fgs. we found that the above four fg-binding motif classes are almost evenly distributed in the protein pockets investigated in our analysis (fig. a), suggesting that all the identified motif classes are commonly used and important for protein-ligand recognition. after careful inspection of the identified binding motifs and their host ligand-binding pockets, we found three distinct combination modes for motif classes in protein pockets (fig. b). (i) the single-class mode, which applies to nearly a quarter of the investigated ligand-binding cases, combines only fg-binding motifs of the same class. (ii) the double-class mode, which accounts for more than % of the cases, integrates two different classes of fg-binding motifs. (iii) the triple-class mode, which recognizes a smaller fraction of ligands, assembles three different classes of fg-binding motifs. for the single-class mode, combinations of two mixed-class motifs are observed most often, followed by hydrophobic-, aromatic- and hydrophilic-class motif combinations (fig. c). 
for the double-class mode, there are possible class-class combinations, among which the hydrophobic-hydrophilic combination applies to the greatest number of ligands, indicating a commonly used protein-ligand binding pattern in which the hydrophobic fg of the ligand interacts with a hydrophobic motif while another polar fg is oriented to a hydrophilic motif (fig. d, fig. s ). the triple-class mode also includes four different class-class combinations that are almost equally represented among the ligands they bind (fig. e). to gain further insights, we investigated in greater detail the ligands involved in the different combination modes (table s ). notably, combinations of different classes of fg-binding motifs facilitate the binding of a vast diversity of ligands composed of fgs that map well to the corresponding fg-binding motif classes. among the ligands involved in the single-class mode, we outline three examples (fig. c, fig. s a). -acetamido- -deoxy-alpha-d-glucopyranose (ndg) is composed of two mixed-motif favored fgs, an acetamide group and a glucose ring, both of which are bound by fg-binding motifs of the mixed class. cellobiose (cbi) is a disaccharide consisting of two glucoses and binds to proteins with two aromatic motifs. phosphoglyceric acid ( pg) has two polar fgs, a phosphate group and a glyceric acid group, which are recognized by two hydrophilic motifs. a greater number and a higher variety of ligands were observed in the double-class mode (fig. d, fig. s b). for instance, geranyl diphosphate (gpp), which is complexed with proteins comprising a hydrophobic and a hydrophilic motif, contains a hydrophobic-preferred alkene and a hydrophilic-preferred diphosphate group. for fructose- phosphate (f p), proteins achieve ligand binding with an aromatic motif for the sugar ring and a hydrophilic motif for the phosphate group. 
other ligands such as tryptophan (trp, with an aromatic motif to indole and a mixed motif to alanine), b-octyl glucoside (bog, with a hydrophobic motif to octyl and a mixed motif to glucose) and pyridinium- -ylpropane- -sulfonate ( ps, with an aromatic motif to pyridinium and a hydrophilic motif to sulfonate) all follow the general fg-motif interaction patterns we identified. in the triple-class mode, the ligands have at least three fgs and are thus relatively large (fig. e, fig. s c). for example, to bind n-methyl- -hydroguanosine- 'diphosphate (m g), proteins adopt a binding pattern with an aromatic motif to dehydroalanine, a mixed motif to ribose and a hydrophilic motif to diphosphate. similarly, folic acid (fol) contains a pteridine ring, a benzoic group and a glutamic acid that are recognized by a hydrophobic, an aromatic and a mixed motif, respectively. in the example of atp, we already showed the unequal contribution of different fg-binding motifs to the binding affinity. here, we sought to explore how combinations of different classes of fg-binding motifs affect the ligand-binding affinity. experimental binding affinities for all the protein-ligand pairs in our analysis were retrieved from the pdbbind database (table s ) (liu et al., ). in the single-class mode, both the aromatic and hydrophobic combinations show significantly higher affinity than the mixed and hydrophilic combinations, suggesting that general hydrophobic interactions, including π-stacking, c-h-π interactions and interactions between two aliphatic carbons, contribute more to high binding affinity than polar interactions such as hydrogen bonds and salt bridges (fig. e). consistently, in the other two combination modes, the more hydrophobic interactions are involved, the higher the affinity the ligand binding achieves. for example, the combination of the hydrophobic and aromatic motifs is the most efficient binding pattern in the double-class mode. 
moreover, non-hydrophilic combinations achieve significantly higher affinity than combinations with hydrophilic motifs for both the double-class (fig. f, p= . e- , mann-whitney test) and triple-class modes (fig. g, p= . e- , mann-whitney test). these results further supplement our observation from the atp-binding motifs (fig. g) and confirm the findings of previous studies that hydrophobic interactions are a driving factor for increased ligand efficiency (ferreira de freitas and schapira, ; young et al., ). together, this evidence shows that fg-binding motifs are building blocks of ligand-binding sites, and that combinations of different classes of fg-binding motifs enable proteins to bind a wide spectrum of ligands with various binding affinities. we next asked whether the above motif-fg binding patterns derived from naturally occurring protein-ligand complexes can also be applied to design target-drug interactions. to this end, we investigated in detail the interactions between three well-defined drug targets and their small-molecule inhibitors (fig. s a). the first case is the kinase domain of braf, which is the target of vemurafenib, an fda-approved small-molecule drug for the treatment of patients with metastatic melanoma with braf v e mutation (karoulia et al., ). four fa clusters (motifs) were identified based on their spatial distances to las; they are well separated and mapped to different fgs of vemurafenib in the 3d structure (fig. a, b, fig. s b). the first motif mainly consists of hydrophobic atoms and interacts with the chlorophenyl group through hydrophobic interactions. the second motif contains atoms from two aromatic amino acids (trp and phe) and contacts the pyridinyl group through π-stacking. the third motif includes both hydrophobic and hydrophilic atoms engaged in hydrophobic and polar interactions with the difluorophenylsulfonamide group. 
the fourth motif is dominated by hydrophobic atoms that interact with the propane group. the second case is the methyltransferase domain of dot l, which is the target of epz- , a small-molecule drug in clinical trial for the treatment of adult acute leukemia (stein et al., ). using the same approach, we found a hydrophobic motif (m ), a mixed motif (m ) and a hydrophilic motif (m ) in the ligand-binding sites of dot l, which correspond to three different fgs of epz- , i.e. methyl-adenine, ribose and methionine, respectively (fig. c, d, fig. s c). lastly, we looked into the interactions between the main protease (mpro) of covid- and one of its potent inhibitors, b (dai et al., ). we observed three well-separated motifs in the ligand-binding sites of the target protein, which are located proximal to the pyrrolidine, fluorophenyl and indole-carboxamide groups (fig. e, fig. s d). notably, the interactions between the identified fg-binding motifs and the corresponding fgs also follow the general chemical rules: two hydrophobic motifs (m and m ) interact with two ring structures, and the mixed motif (m ) interacts with the carboxamide (fig. f). together, these examples indicate that the fg-based functional motifs also appear in different drug targets and interact with specific fgs following the general motif-fg binding patterns. thus, the global map of 3d fg-binding motifs could provide important insights and guidance for the rational design of small-molecule drugs. a classical assumption in structural biology is that the 3d structure of a protein determines its molecular function. however, many proteins that do not display overall sequence or structure similarity may share similar local 3d binding sites and can bind to the same or similar ligands (kahraman et al., ). thus, identifying conserved 3d patterns/motifs across different ligand-binding proteins serves as an efficient way to learn and predict protein-ligand interactions. 
computational methods that rely on multiple structure alignments or pairwise pocket comparisons have identified many conserved 3d binding patterns across different protein pockets binding the same or similar ligands (dukka, ). despite the validity and usefulness of these ligand-based binding patterns, they mainly apply to naturally occurring ligands with abundant protein-ligand 3d structures, which limits their application scope. here, we proposed aftme, an alignment-free method for automatic identification of 3d binding motifs on the basis of fgs shared by different small molecules, which permits studying protein-ligand interactions in a wider scope and at higher resolution. the application to atp showed the feasibility and validity of our method for detecting fg-based 3d binding motifs and confirmed the reusability of the motifs in different ligand-binding proteins. we further applied our method to natural ligands and obtained binding motifs for unique fgs, providing useful resources for deep exploration of protein-ligand recognition. systematic investigation of the fg-binding motifs identified by our method provides several important insights into protein-ligand interactions. first, the ligand-binding sites of a protein can be dissected into independent sectors corresponding to different fgs of the bound ligand. these fg-based binding motifs are highly conserved among different ligand-binding pockets at both the amino acid and atom levels. second, we found four classes of fg-binding motifs with distinct physicochemical properties and their own preferences for fg binding. moreover, the interactions between 3d motifs and fgs follow some general rules. for example, a hydrophobic motif is more likely to interact with a hydrophobic fg and a hydrophilic motif usually recognizes a polar fg. third, following the general motif-fg recognition map, protein pockets consisting of different fg-binding motifs can bind a wide spectrum of ligands through different motif combination modes. 
of note is that protein pockets with more hydrophobic motifs tend to gain higher binding affinity. although the fg-binding motifs are mainly derived from protein structures binding natural ligands, we showed that these motifs can also be applied to different drug-target interactions. therefore, we expected that the global map of motif-fg interactions, together with the current molecular docking (trott and olson, ) and/or molecular simulation (md) (souza et al., ) approaches, can greatly benefit structure-based rational drug design. rapid development of high-throughput screening using crispr/cas system has greatly accelerated the discovery of new cancer drug targets in recent years (fellmann et al., ; jost and weissman, ) . crispr screening with tiling-sgrna designs can further infer essential protein domains that are suitable for drug targeting (he et al., ; neggers et al., ; shi et al., ) . however, identifying effective small molecular drugs for a specific target through high-throughput experimental screens is still expensive and inefficient (macarron et al., ) . virtual screening using computational approaches has emerged as a starting point for identifying hit molecules for a given drug target (lavecchia and di giovanni, ) . d-based predictions of small molecules for a specific protein target with machine or deep learning approaches are of higher accuracy compared to sequence-based predictors. however, these ligand-based methods rely on multiple d structures binding the ligands to learn the features, thus are limited to a small fraction of ligands. our study showed that conserved d binding patterns can be obtained at fg level, which may expand the scope for ligand-binding prediction since many different ligands share same or similar fgs. we collected all the protein-ligand complexes from biolip database, a semi-manually curated database for biologically relevant ligand-protein interactions (yang et al., ) . proteins that only bind metal ions were excluded. 
for each ligand, we removed the redundant proteins with more than % sequence similarity using cd-hit (li and godzik, ). only ligands with at least protein structures were kept for further analysis, producing a dataset containing protein structures in complex with ligands. the pdb codes for all the protein structures, as well as the information on all ligands, are available at https://github.com/mdhewei/aftme/datasets. for binding affinity analysis, we retrieved experimentally determined binding affinities of protein-ligand complexes from the latest version of the pdbbind database (liu et al., ), resulting in binding affinity data for complexes covering out of ligands in the dataset. notably, the affinity values we used for each retrieved complex were $pK_d$, $pK_i$ and $pIC_{50}$. these values were obtained from the raw affinities $K_d$, $K_i$ and $IC_{50}$ by the transformation

$$ BA = -\log_{10} X, \quad X \in \{K_d, K_i, IC_{50}\} $$

a higher value of ba indicates a stronger binding affinity for the protein-ligand complex. drawing on biochemical knowledge, we manually defined the functional groups (fgs) of each ligand based on their shape, size, and physicochemical properties. given a ligand in the dataset, we first downloaded its 3d structure from the pdb database (berman et al., ), then scanned its structure and searched for fgs in the following order: (i) ring structures with consistent physicochemical properties, like adenine; (ii) ring structures together with other polar groups, like ribose; (iii) chain structures at the termini or in the middle of the ligand, like an alkane chain; (iv) well-defined polar groups such as phosphate, carboxyl, hydroxyl etc.; (v) other fragments that are close in size to already defined fgs. in general, we followed the basic principle that the fgs within a ligand should be considerably different in shape and physicochemical properties but close in size, thus ensuring their independence in interacting with the protein partner.
for all the defined fgs, we further classified them into major types referring to the classifications in a previous study (cai et al., ). aftme takes four major steps to extract fg-binding motifs: (i) extraction of ligand binding pockets; (ii) construction of the functional atom distance matrix; (iii) two-dimensional clustering based on the distance matrix; (iv) identification and characterization of fg-binding motifs. details of each step are described below. we defined the protein non-backbone heavy atoms within Å of the ligand as the functional atoms (fas) (hoffmann et al., ), and all fas interacting with the ligand were gathered together to form the ligand binding pocket. to describe the pocket by the set of fas that interact with the ligand, each fa of a pocket whose bound ligand has $L$ atoms was represented as an $L$-dimensional vector:

$$ f = (d_1, d_2, \dots, d_L) $$

where $d_i$ represents the distance from the fa to the i-th of the $L$ atoms of the ligand. the pocket $P$ consisting of $M$ fas could then be defined as follows:

$$ P = (f_1, f_2, \dots, f_M)^{T} $$

on completion of the pocket definition, we constructed the functional atom distance matrix by collecting the pockets binding the same ligand. given $K$ pockets that bind the same ligand of length $L$, we calculated the vector $f$ for each fa interacting with the ligand and, from these, the matrix $P$ for each pocket. for all $K$ pockets, we then obtained the matrix $D$ as follows:

$$ D = \begin{pmatrix} P^{1} \\ P^{2} \\ \vdots \\ P^{K} \end{pmatrix}, \qquad P^{k} = \begin{pmatrix} d^{k}_{1,1} & \cdots & d^{k}_{1,L} \\ \vdots & \ddots & \vdots \\ d^{k}_{M_k,1} & \cdots & d^{k}_{M_k,L} \end{pmatrix} $$

where $d^{k}_{m,l}$ denotes the distance between the m-th fa in the k-th pocket and the l-th atom of the bound ligand. note that $M_k$ here just represents the number of fas making up pocket k; it is not always equal for every pocket binding the same ligand. for example, consider two pockets binding the same ligand of length $L$, from which we extract x and y fas, respectively.
according to the definition of the atom vector $f$ and the pocket matrix $P$, we then calculated $D$ with x + y rows and $L$ columns as follows:

$$ D = \begin{pmatrix} d^{1}_{1,1} & \cdots & d^{1}_{1,L} \\ \vdots & & \vdots \\ d^{1}_{x,1} & \cdots & d^{1}_{x,L} \\ d^{2}_{1,1} & \cdots & d^{2}_{1,L} \\ \vdots & & \vdots \\ d^{2}_{y,1} & \cdots & d^{2}_{y,L} \end{pmatrix} $$

based on the functional atom distance matrix ($D$) calculated for a specific ligand, we performed a two-dimensional hierarchical clustering on both the rows and the columns of the distance matrix, which represent the fas and the ligand atoms (las), respectively. prior to this analysis, we standardized the data within the rows of the matrix, i.e., subtract the minimum and divide each by its maximum. initially, each fa/la is considered an individual cluster and the distances between different fas/las are calculated with the euclidean metric. then the two clusters with the closest distance are merged, and the linkages are created using the ward method to minimize the total within-cluster variance. the clustering was performed using the "AgglomerativeClustering" class from the scikit-learn package in python (abraham et al., ). we set the "n_clusters" parameter to the number of predefined fgs of the ligand. we used a heatmap with row-oriented and column-oriented dendrograms to visualize the hierarchical clustering results. based on the heatmap, we could find the correspondence between the different clusters of fas and las. specifically, each cluster of fas is expected to group with its proximal cluster of las, which builds up a map between the fa clusters and the la clusters. the clusters of las denote the different fgs within the ligand, and the clusters of fas represent the binding motifs for the matched fg. following this step, we obtained different binding motifs for the fgs of the ligand they interact with. 
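the matrix construction and two-way clustering described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the aftme implementation: the coordinates are toy data, the fas are assumed to have been selected already (no distance cutoff is applied), the row standardization is read as min-max rescaling, and the helper names (`fa_distance_matrix`, `rescale_rows`, `two_way_ward`, `match_motifs_to_fgs`) are hypothetical. the fa-to-la cluster matching by smallest mean distance is a simple stand-in for reading the correspondence off the heatmap.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def fa_distance_matrix(pockets):
    """Stack FA-to-ligand-atom distances of all pockets into a (sum M_k) x L matrix.

    `pockets` is a list of (fa_coords, ligand_coords) pairs with shapes (M_k, 3), (L, 3).
    """
    blocks = []
    for fa_coords, ligand_coords in pockets:
        diff = fa_coords[:, None, :] - ligand_coords[None, :, :]
        blocks.append(np.linalg.norm(diff, axis=2))  # shape (M_k, L)
    return np.vstack(blocks)


def rescale_rows(d):
    """Min-max rescale each row to [0, 1] (assumed reading of the standardization)."""
    lo = d.min(axis=1, keepdims=True)
    hi = d.max(axis=1, keepdims=True)
    return (d - lo) / (hi - lo)


def two_way_ward(d, n_fgs):
    """Ward clustering of rows (FAs) and of columns (LAs), n_clusters = number of FGs."""
    fa_labels = AgglomerativeClustering(n_clusters=n_fgs, linkage="ward").fit_predict(d)
    la_labels = AgglomerativeClustering(n_clusters=n_fgs, linkage="ward").fit_predict(d.T)
    return fa_labels, la_labels


def match_motifs_to_fgs(d, fa_labels, la_labels):
    """Map each FA cluster to the LA cluster with the smallest mean distance."""
    mapping = {}
    for fa_c in np.unique(fa_labels):
        rows = d[fa_labels == fa_c]
        means = {int(la_c): rows[:, la_labels == la_c].mean() for la_c in np.unique(la_labels)}
        mapping[int(fa_c)] = min(means, key=means.get)
    return mapping


# Toy ligand with two FGs: atoms 0-1 form one group, atoms 2-3 the other.
ligand = np.array([[0.0, 0, 0], [1.0, 0, 0], [10.0, 0, 0], [11.0, 0, 0]])
pocket_1 = np.array([[0.0, 3, 0], [1.0, 3, 0], [10.0, 3, 0]])      # x = 3 FAs
pocket_2 = np.array([[0.5, -3, 0], [10.0, -3, 0], [11.0, -3, 0]])  # y = 3 FAs

dist = fa_distance_matrix([(pocket_1, ligand), (pocket_2, ligand)])  # (x + y) x L
scaled = rescale_rows(dist)
fa_labels, la_labels = two_way_ward(scaled, n_fgs=2)
motif_to_fg = match_motifs_to_fgs(dist, fa_labels, la_labels)
print(dist.shape, fa_labels, la_labels, motif_to_fg)
```

in this toy case the fas sitting above the first two ligand atoms end up in one row cluster and those above the last two in the other, which mirrors how fa clusters are paired with la clusters in the text.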
to maintain the consistency between the la clusters and our predefined fgs and to reduce the noise in the identified binding motifs, we adopted a two-step filtering process: first, the la clusters with less than % atoms in any of the predefined fgs were discarded together with the corresponding binding motifs, and vice versa. second, protein pockets with less than atoms were filtered within a specific binding motif. to quantitatively describe the identified binding motif, we examined its composition at the protein level, the amino acid level and the atom level. for the protein and amino acid levels, we counted the number of binding pockets for all motifs interacting with specific fgs of the ligand, and the presence of each type of amino acid inside a binding motif. for the atom level, atoms were classified into categories according to their biochemical properties (he et al., ): (i) hydrophilic, (ii) acceptor, (iii) donor, (iv) hydrophobic, (v) aromatic, (vi) neutral. by calculating the frequency of occurrence of each atom category within the motif, the motif could be expressed at the atom level. with descriptions at three levels ranging from low to high resolution, we could gain a well-rounded and detailed understanding of the binding motif. to make a quantitative evaluation of the binding motif, we expressed it using a -dimensional vector

$$ V = (v_1, v_2, v_3, \dots, v_{n_{aa}}, v_{n_{aa}+1}, \dots, v_{n_{aa}+n_{cat}}) $$

where the former $n_{aa}$ dimensions hold the proportion of occurrence of each of the types of amino acids in a specific motif, and the latter $n_{cat}$ dimensions hold the proportion of occurrence of the atom properties of each of the categories defined above in the same motif. the value in each dimension is defined as follows:

$$ v_i = \begin{cases} \dfrac{n_{aa,i}}{N_{aa}}, & 1 \le i \le n_{aa} \\[2ex] \dfrac{n_{cat,\,i-n_{aa}}}{N_{cat}}, & n_{aa} < i \le n_{aa}+n_{cat} \end{cases} $$

where $n_{aa,i}$ and $N_{aa}$ are the number of residues of amino acid type i observed in the motif and the total number of residues in that motif, respectively. 
the types of amino acids are assigned a fixed order from to . similarly, $n_{cat,j}$ and $N_{cat}$ are the number of atom properties of category j and the total number of atom properties, and the index ranging from to denotes the atom properties corresponding to the biochemical categories above. to quantitatively assess the reusability of two fg-binding motifs, we calculated the pearson correlation coefficient (pcc) between the two -dimensional vectors $X$ and $Y$ representing the two binding motifs:

$$ PCC(X, Y) = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}} $$

a higher pcc indicates a stronger correlation between two binding motifs, which suggests high reproducibility. to systematically measure the conservation of motifs binding the same fg, we calculated the pairwise pcc among all $n$ motifs binding a specific fg, and evaluated the overall conservation score (cs) as the average of all the pairwise pcc values:

$$ CS = \frac{\sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} PCC(V_i, V_j)}{n(n-1)} $$

in addition, a permutation test was used to evaluate the statistical significance of the cs. specifically, for each fg-binding motif that appeared in multiple ligands, we randomly selected the same number of motifs from all the identified motifs times and calculated the corresponding cs value of each random set. the p-value was calculated as the fraction of the $N$ random sets whose cs exceeds the observed one:

$$ p = \frac{\sum_{i=1}^{N} I(CS_i > CS_0)}{N} $$

where $CS_i$ is the cs value of the i-th set of randomly selected motifs and $CS_0$ is the cs value of the motifs binding the same fg. fg-binding motifs were classified based on their physicochemical properties; specifically, we performed k-means clustering on the -dimensional vectors representing all the motifs. to determine the optimal number of clusters, we used the elbow method, which follows the basic idea of minimizing the total intra-cluster variation as much as possible. concretely, we first computed the k-means clustering on the data consisting of vectors for different numbers of clusters k, ranging from to . next, the total intra-cluster variation was calculated for each value of k as follows:

$$ W(k) = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \bar{x}_{C_i} \rVert^2 $$

where $k$ is the number of clusters, $C_i$ is the i-th cluster and $\bar{x}_{C_i}$ is its centroid. based on the variation computed for the different values of k, a curve of the variation against the number of clusters k could be plotted. finally, the location of a bend (k= ) in the plot was selected as the optimal number of clusters in our approach.

the source code of the aftme algorithm and the results of the large-scale fg-motif analysis are available at https://github.com/mdhewei/aftme. all other relevant data can be obtained from the authors upon request.

machine learning for neuroimaging with scikit-learn ligand binding site structure influences the evolution of protein complex function and topology the a-loop, a novel conserved aromatic acid subdomain upstream of the walker a motif in abc transporters, is critical for atp binding the protein data bank prediction of compounds' biological function (metabolic pathways) based on functional group composition structure-based design of antiviral drug candidates targeting the sars-cov- main protease when fold is not important: a common structural framework for adenine and amp binding in unrelated protein families adenine recognition: a motif present in atp-, coa-, nad-, nadp-, and fad-dependent proteins insights into protein-ligand interactions: mechanisms, models, and methods structure-based methods for computational protein functional site prediction cornerstones of crispr-cas in drug discovery and therapy discovery of a functional, contracted heme-binding motif within a multiheme cytochrome a systematic analysis of atomic protein-ligand interactions in the pdb apoc: large-scale identification of similar protein pockets computational functional group mapping for drug discovery mfasd: a structure-based algorithm for discriminating different types of metal-binding sites de novo identification of essential protein domains from crispr-cas tiling-sgrna knockout screens a new protein binding pocket similarity measure based on comparison of clouds 
of atoms in d: application to ligand prediction structure-based prediction of ligand-protein interactions on a genome-wide scale crispr approaches to small molecule target identification shape variation in protein binding pockets and their ligands an integrated model of raf inhibitor action predicts inhibitor activity against oncogenic braf signaling comprehensive structural classification of ligandbinding motifs in proteins ligplot+: multiple ligand-protein interaction diagrams for drug discovery virtual screening strategies in drug discovery: a critical review cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences pdbwide collection of binding data: current status of the pdbbind database protein function annotation by homology-based inference impact of highthroughput screening in biomedical research protein recognition of adenylate: an example of a fuzzy recognition template on the evolution of protein-adenine binding generation of d templates of active sites of proteins with rigid prosthetic groups target identification of small molecules using large-scale crispr-cas mutagenesis scanning of essential genes how to improve r&d productivity: the pharmaceutical industry's grand challenge extreme sequence divergence but conserved ligand-binding specificity in streptococcus pyogenes m protein deepdrug d: classification of ligand-binding pockets in proteins with a convolutional neural network visgremlin: graph mining-based detection and visualization of conserved motifs at d protein-ligand interface at the atomic level the p-loop--a common motif in atp-and gtp-binding proteins discovery of cancer drug targets by crispr-cas screening of protein domains protein-ligand binding with the coarse-grained martini model the dot l inhibitor pinometostat reduces h k methylation and has modest clinical activity in adult acute leukemia autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient 
optimization, and multithreading threedimensional view of the surface motif associated with the p-loop structure: cis and trans cases of convergent evolution machine learning and complex biological data biolip: a semi-manually curated database for biologically relevant ligand-protein interactions pocketmatch: a new algorithm to compare binding sites in protein structures motifs for molecular recognition exploiting hydrophobic enclosure in protein-ligand binding crystal structures of two novel dye-decolorizing peroxidases reveal a beta-barrel fold with a conserved hemebinding motif this work was supported by national natural science foundation of china ( to l.n., u to l.n., and to z.z.); ministry of science and technology of china ( yfa to l.n.); hefei national science center pilot project funds (in part). we thank all the lab members in niu lab for helpful discussion. w.h. and z.l conceptualized the study. y.l., w.h. and y.y. collected the data, y.l. and h.w. developed the method and performed the systematic analysis, y.g., z.z. and m.k. helped data interpretation. l.n. and z.l. supervised the project. y.l., w.h., z.l. and l.n. wrote the original manuscript. all authors read and approved the final manuscript. the authors of this manuscript declare that they have no competing interests. key: cord- -w sb h authors: schumacher, garrett j.; sawaya, sterling; nelson, demetrius; hansen, aaron j. title: genetic information insecurity as state of the art date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w sb h genetic information is being generated at an increasingly rapid pace, offering advances in science and medicine that are paralleled only by the threats and risk present within the responsible ecosystem. human genetic information is identifiable and contains sensitive information, but genetic data security is only recently gaining attention. 
genetic data is generated in an evolving and distributed cyber-physical ecosystem, with multiple systems that handle data and multiple partners that utilize the data. this paper defines security classifications of genetic information and discusses the threats, vulnerabilities, and risk found throughout the entire genetic information ecosystem. laboratory security was found to be especially challenging, primarily due to devices and protocols that were not designed with security in mind. likewise, other industry standards and best practices threaten the security of the ecosystem. a breach or exposure anywhere in the ecosystem can compromise sensitive information. extensive development will be required to realize the potential of this emerging field while protecting the bioeconomy and all of its stakeholders. genetic information contained in nucleic acids, such as deoxyribonucleic acid (dna), has become ubiquitous in society, enabled primarily by rapid biotechnological development and drastic decreases in dna sequencing and dna synthesis costs (berger and schneck, ; naveed et al., ). innovation in these industries has far outpaced regulatory capacity and remained somewhat isolated from the information security and privacy domains. a single human whole genome sequence can cost hundreds to thousands of dollars per sample, and when amassed, genetic information can be worth millions. this positions genetic information systems as likely targets for cyber and physical attacks. human genetic information is identifiable (lowrence and collins, ) and also contains sensitive health information; yet it is not always defined in these capacities by law. unlike most other forms of data, it is immutable, remaining with an individual for their entire life. sensitive human genetic data necessitates protection for the sake of individuals, their relatives, and ethnic groups; genetic information in general must be protected to prevent national and global threats (sawaya et al., ).
therefore, human genetic information is a uniquely confidential form of data that requires increased security controls and scrutiny. furthermore, non-human biological sources of genetic data are also sensitive. for example, microbial genetic data can be used to create designer microbes with crispr-cas and other synthetic biology techniques (werner, ), presenting global and national security concerns. several genomics stakeholders have reported security incidents according to news sources and breach notifications. the most common reasons were misconfigurations of cloud security settings and email phishing attacks, and one resulted from a stolen personal computer containing sensitive information. the national health service's genomics england database in the united kingdom has been targeted by malicious nation-state actors, and 23andme's chief security officer said their database of around ten million individuals is of extreme value and therefore "certainly is of interest to nation states". despite this recognition, proper measures to protect genetic information are often lacking under current best practices among relevant industries and stakeholders. multi-stakeholder involvement and improved understanding of the security risks to biotechnology are required in order to develop appropriate countermeasures (millett et al., ). towards these goals, this paper expands upon a microbiological genetic information system assessment (fayans et al., ) to include a broader range of genetic information, as well as novel concepts and additional threats to the ecosystem. confidentiality, integrity, and availability are the core principles governing the secure operation of a system (fayans et al., ; international organization for standardization, ). confidentiality is the principle of ensuring access to information is restricted based upon the information's sensitivity.
examples of confidentiality include encryption, access controls, and authorization. integrity is the concept of protecting information from unauthorized modification or deletion, while availability ensures information is accessible to authorized parties at all times. integrity examples include logging events, backups, minimizing material degradation, and authenticity verification. availability can be described as minimizing the chance of data or material destruction, as well as network, power, and other infrastructure outages. sensitive genetic information, which includes both biological material and digital genetic data, is the primary asset of concern, and associated assets, such as metadata, electronic health records and intellectual property, are also vulnerable within this ecosystem.

genetic information can be classified into two primary levels, sensitive and nonsensitive, based upon value, confidentiality requirements, criticality, and inherent risk. sensitive genetic information can be further categorized into restricted and private sublevels.

❖ restricted sensitive genetic information can be expected to cause significant risk to a nation, ethnic group, individual, or stakeholder if it is disclosed, modified, or destroyed without authorization. the highest level of security controls should be applied to restricted sensitive genetic information. examples of restricted sensitive information are material and data sourced from humans, resources humans rely upon, and organisms and microbes that could cause harm to humans or resources humans rely upon. due to its identifiability, human genetic information can be especially sensitive and thus requires special security considerations.
❖ private sensitive genetic information can be expected to cause a moderate level of risk to a nation, ethnic group, individual, or stakeholder if it is disclosed, modified, or destroyed without authorization. genetic information that is not explicitly classified as restricted sensitive or nonsensitive should be treated as private sensitive information. a reasonable level of security controls should be applied to private sensitive information. examples of private sensitive information are intellectual property from research, breeding, and agricultural programs.
❖ nonsensitive (or public) genetic information can be expected to cause little risk if it is disclosed, modified, or destroyed without authorization. while few controls may be required to protect the confidentiality of nonsensitive genetic information, controls should be in place to prevent unauthorized modification or destruction of nonsensitive information. examples of nonsensitive information are material and data sourced from non-human entities that are excluded from the sensitive level if the resulting data are to be made publicly available within reason.

the genetic information ecosystem can be compromised in numerous ways, including purposefully adversarial activities and human error. organizations take steps to monitor and prevent error, and molecular biologists are skilled in laboratory techniques; however, they commonly do not have the expertise and resources to securely configure and operate these environments, nor are they enabled to do so by vendor service contracts and documentation. basic security features and tools, such as antivirus software, can easily be subverted, and advanced protections are not commonly implemented. much genetic data is already publicly available via open and semi-open databases, and dissemination practices are not properly addressed by regulations. there are wide-ranging motives behind adversaries targeting non-public genetic information (fayans et al., ). numerous stakeholders, personnel, and insecure devices are relied upon from the path of sample collection to data dissemination. depending on the scale of an exploit, hundreds to millions of people could be compromised.
local attacks could lead to certain devices, stakeholders, and individuals being affected, while supply chain and remote attacks could lead to global-scale impact. widespread public dissemination and lack of inherent security controls equate to millions of individuals and their relatives having substantial risk imposed upon them. genetic data can be used to identify an individual (lin et al., ) and predict their physical characteristics (li et al., ; lippert et al., ), and capabilities for familial matching are increasing, with the ability to match individuals to distant relatives (edge et al., ; ney et al., ). identifiability of genetic information is a critical challenge leading to growing consumer privacy concerns (baig et al., ), and behavioral predictions from genetic information are gaining traction, with predictors growing stronger year over year (gard et al., ; johnson et al., ). furthermore, many diseases and negative health outcomes have genetic determinants, meaning that genetic data can reveal sensitive health information about individuals and families (sawaya et al., ). these issues pale in comparison to the weaponization of genetic information. genetics can inform both a doctor and an adversary in the same way, revealing weaknesses that can be used for treatment or exploited to cause disease (sawaya et al., ). the creation of bioweapons utilizes the same processes as designing vaccines and medicines to mitigate infectious diseases, namely access to an original infectious organism or microbe and its genetic information (berger and roderick, ). this alarming scenario was thought to be unlikely only six years ago as the necessary specialized skills and expertise were not widely distributed. since then, access to sensitive genetic data has increased, such as the genome sequences of the novel coronavirus (sars-cov-2) (sah et al., ), african swine fever (mazur-panasiuk et al., ), and the spanish influenza a (h1n1) (tumpey et al., ) viruses.
synthetic biology capabilities, skill sets, and resources have also proliferated (ney et al., ). sars-cov-2 viral clone creation from synthetic dna fragments was possible only weeks after the sequences became publicly available (thao et al., ). this same technology can be utilized to modify noninfectious microbes and microorganisms to create weaponizable infectious agents (berger and roderick, ; chosewood and wilson, ; salerno and koelm, ). covid-19 susceptibility, symptoms, and mortality all have genetic components (taylor et al., ; ellinghaus et al., ; nguyen et al., ), demonstrating how important it will be to safeguard genetic information in the future to avoid targeted biological weapons. additionally, microbiological data cannot be determined to have infectious origins until widespread infection occurs or until it is sequenced and deeply analyzed (chosewood and wilson, ; salerno and koelm, ); hence, data that is potentially sensitive also needs to be protected throughout the entire ecosystem.

the genetic information ecosystem is a distributed cyber-physical system containing numerous stakeholders (supplementary material, appendix ), personnel, and devices for computing and networking purposes. the ecosystem is divided into the pre-analytical, analytical, and post-analytical phases, which correspond to: (i) collection, storage, and distribution of biological samples, (ii) generation and processing of genetic data, and (iii) storage and sharing of genetic data (supplementary material, appendix ). this ecosystem introduces many pathways, or attack vectors, for malicious access to information and systems (figure ).

figure caption: the genetic information ecosystem and accompanying threat landscape. the genetic information ecosystem is divided into three phases: pre-analytical, analytical, and post-analytical. the analytical phase is further divided into wet laboratory preparation, dna sequencing, and bioinformatic pipeline subphases. in its simplest form, this system is a series of inputs and outputs that are either biological material, data, or logical queries on data. every input, output, device, process, and stakeholder is vulnerable to exploitation via the attack vectors denoted by red letters. color schema: purple, sample collection and processing; blue, wet laboratory preparation; green, genetic data generation and processing; yellow, data dissemination, storage, and application.

unauthorized physical access or insider threats could allow for theft of assets or the use of other attack vectors on any phase of the ecosystem (walsh and streilein, ). small independent laboratories often lack the resources to implement strong physical security. large institutions are often able to maintain strong physical security, but the relatively large number of individuals and devices that need to be secured can create a complex attack surface. ultimately, the strongest cybersecurity can be easily circumvented by weak physical security. insider threats are a problem for information security because personnel possess deeper knowledge of an organization and its systems. many countries rely on foreign nationals working in biotechnological fields who may be susceptible to foreign influence. citizens can also be susceptible to foreign influence. personnel could introduce many exploits on-site if coerced or threatened. even when not acting in a purposefully malicious manner, personnel can unintentionally compromise the integrity and availability of genetic information through error (us office of the inspector general, ). appropriate safeguards should be in place to ensure that privileged individuals are empowered to do their work correctly and efficiently, but all activities should be documented and monitored when working with sensitive genetic information.
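the recommendation that all activities be documented and monitored can be illustrated with a tamper-evident activity log. the following is a minimal sketch and not a method described in this paper: the `AuditLog` class, its field names, and the example actors and devices are hypothetical, and hash chaining is assumed here only as one common way to make after-the-fact edits detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def _digest(prev_hash: str, entry: dict) -> str:
    """Hash an entry together with the hash of the record before it."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each record commits to its predecessor."""
    records: list = field(default_factory=list)

    def append(self, actor: str, action: str, target: str) -> None:
        entry = {"actor": actor, "action": action, "target": target}
        prev_hash = self.records[-1]["hash"] if self.records else GENESIS
        self.records.append({"entry": entry, "hash": _digest(prev_hash, entry)})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev_hash = GENESIS
        for record in self.records:
            if record["hash"] != _digest(prev_hash, record["entry"]):
                return False
            prev_hash = record["hash"]
        return True


log = AuditLog()
log.append("technician_a", "login", "sequencer_01")
log.append("technician_a", "start_run", "sequencer_01")
intact = log.verify()

log.records[0]["entry"]["actor"] = "someone_else"  # simulate tampering
tampered_detected = not log.verify()
```

a real deployment would presumably also record timestamps and keep a copy of the log off the monitored device, since an insider who can rewrite the entire chain could otherwise recompute every hash.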
sample collection, storage, and distribution processes have received little recognition as legitimate points for the compromise of genetic information. biological samples as inputs into this ecosystem can be modified maliciously to contain encoded malware (ney et al., ) , or they could be degraded, modified, or destroyed to compromise the material's and resulting data's integrity and availability. sample repository and storage equipment are usually connected to a local network for monitoring purposes. a remote or local network attack could sabotage connected storage equipment, causing samples to degrade or be destroyed. biorepositories and the collection and distribution of samples could be targeted to steal numerous biological samples, such as in known genetic testing scams . targeted exfiltration of small numbers of samples may be difficult to detect. sensitive biological material should be safeguarded in storage and transit, and when not needed for long-term biobanking, it should be destroyed following successful analysis. other organizations that handle genetic material could be targeted for the theft of samples and processed dna libraries. the wet laboratory preparation and dna sequencing subphases last several weeks and produce unused waste and stored material. at the conclusion of sequencing runs, the consumables that contain dna molecules are not always considered sensitive. these items can be found unwittingly maintained in many sequencing laboratories. several cases have been documented of dna being recovered and successfully sequenced while aged for years at room temperature and in non-controlled environments (colette et al., ). dna sequencing systems and laboratories are multifaceted in their design and threat profile. dna sequencing instruments have varying scalability of throughput, cost, and unique considerations for secure operation (table ) . 
sequencing instruments have a built-in computer and commonly have connected computers and servers for data storage, networking, and analytics. these devices contain a number of different hardware components, firmware, an operating system, and other software. some contain insecure legacy versions of operating system distributions. sequencing systems usually have wireless or wired local network connections to the internet that are required for device monitoring, maintenance, data transmission, and analytics in most operations. wireless capabilities and bluetooth technology within laboratories present unnecessary threats to these systems, as any equipment connected to laboratory networks is a potential network entry point. device vendors obtain various internal hardware components from several sources and integrate them into laboratory devices that contain vendor-specific intellectual property and software. generic hardware components are often produced overseas, which is cost effective but leads to insecurities and a lack of hardening for specific end-use purposes. hardware vulnerabilities could be exploited on-site, or they can be implanted during manufacturing and supply-chain processes for widespread and unknown security issues (fayans et al., ; ender et al., ; shwartz et al., ; anderson and kuhn, ) . such hardware issues are unpatchable and will remain with devices forever until newer devices can be manufactured to replace older versions. unfortunately, adversaries can always shift their techniques to create novel vulnerabilities within new hardware in a continual vicious cycle. third-party manufacturers and device vendors implement firmware in these hardware components. embedded device firmware has been shown to be more susceptible to cyber-attacks than other forms of software (shwartz et al., ) . 
in-field upgrades are difficult to implement, and like hardware, firmware and operating systems of sequencing systems can be maliciously altered within the supply chain (fayans et al., ) . a firmware-level exploit would allow for the evasion of operating system, software, and application-level security features. firmware exploits can remain hidden for long periods, even after hardware replacements or wiping and restoring to default factory settings. furthermore, operating systems have specific disclosed common vulnerabilities and exposures (cves) that are curated by the mitre organization and backed by the us government . with ubiquitous implementation in devices across all phases of the ecosystem, these software issues are especially concerning but can be partially mitigated by frequent updates. however, operating systems and firmware are typically updated every six to twelve months by a field agent accessing a sequencing device on site. device operators are not allowed to modify the device in any way, yet they are responsible for some security aspects of this equipment. additionally, researchers have confirmed the possibility of index hopping, or index misassignment, by sequencing device software, resulting in customers receiving confidential data from other customers (ney et al., ) or downstream data processors inputting incorrect data into their analyses. dna sequencing infrastructure is proliferating. illumina, the largest vendor of dna sequencing instruments, accounted for % of the world's sequencing data in by their own account . in , illumina had , sequencers implemented globally capable of producing a total daily output of tb (erlich, ) , with many of these instruments housed outside of the us and europe. in , technology developed by beijing genomics institute has finally resulted in the $ human genome (drmanac, ) while us prices remain around $ , . 
overseas organizations can be third-party sequencing service providers for direct-to-consumer (dtc) companies and other stakeholders. shipping internationally for analysis is less expensive than local services (office of the us trade representative, ), indicating that genetic data could be aggregated globally by nation-states and other actors during the analysis phase.

footnotes:
https://cve.mitre.org/
https://www.cisa.gov/news/ / / /fbi-and-cisa-warn-against-chinese-targeting-covid- -research-organizations

raw signal sequencing data are stored on a sequencing system's local memory and are transmitted to one or more endpoints. transmitting data across a local network requires internal information technology (it) configurations. vendor documentation usually depends upon implementing a firewall to secure sequencing systems, but doing so correctly requires deep knowledge of secure networking and vigilance of network activity. documentation also commonly mentions disabling and enabling certain network protocols and ports and further measures that can be difficult for most small- to medium-sized organizations if they lack dedicated it support. laboratories and dna sequencing systems are connected to many third-party services, and laboratories have little control over the security posture of these connections. independent cloud platforms and dna sequencing vendors' cloud platforms are implemented for bioinformatic processing, data storage, and device monitoring and maintenance capabilities (table ). a thorough security assessment of cloud services remains unfulfilled in the genomics context. multifactor authentication, role- and task-based access, and many other security measures are not common in these platforms.
misconfigurations to cloud services and remote communications are a primary vulnerability to genetic information, demonstrated by prior breaches, remote desktop protocol issues affecting illumina devices , and a disclosed vulnerability in illumina's basespace application program interface . laboratory information management systems (lims) are also frequently implemented within laboratories and connected to sequencing systems and laboratory networks (roy et al., ) . dna sequencing vendors provide their own limss as part of their cloud offerings. even when lims and cloud platforms meet all regulatory requirements for data security and privacy, they are handling data that is not truly anonymized and therefore remains identifiable and sensitive. furthermore, specific cves have been disclosed for dnatools' dnalims product that were actively exploited by a foreign nation-state . phishing attacks are another major threat, as email services add to the attack surface in many ways. sequencing service providers often share links granting access to datasets via email. these email chains are a primary trail of transactions that could be exploited to exfiltrate data on clients, metadata of samples, or genetic data itself. some laboratories transmit raw data directly to an external hard drive per customer or regulatory requirements. reducing network activity in this way can greatly minimize the threat surface of sensitive genetic information. separating networks and devices from other networks, or air gapping, while using hard drives is possible, but even air-gapped systems have been shown to be vulnerable to compromise (guri, ; guri et al., ) . sequencing devices are still required to be connected to the internet for maintenance and are often connected between offline operations. hard drives can be physically secured and transported; however, these methods are time and resource intensive, and external drives could be compromised for the injection of modified software or malware. 
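the concern that data on an external drive could be altered in transit relates to the authenticity verification mentioned earlier as an integrity measure. the sketch below is an illustration under stated assumptions rather than a practice reported in this paper: the function names, the file names, and the demo key are hypothetical, and a keyed hash (hmac-sha256) is assumed so that someone without the key cannot simply regenerate a matching manifest after modifying a file.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_tag(path: Path, key: bytes) -> str:
    """Return an HMAC-SHA256 tag of a file, read in 1 MiB chunks."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def build_manifest(directory: Path, key: bytes) -> dict:
    """Map each file name in the directory to its keyed tag."""
    return {p.name: file_tag(p, key)
            for p in sorted(directory.iterdir()) if p.is_file()}


def verify_manifest(directory: Path, key: bytes, manifest: dict) -> list:
    """Return names of files that are missing, altered, or unexpected."""
    current = build_manifest(directory, key)
    names = set(manifest) | set(current)
    return sorted(
        name for name in names
        if not hmac.compare_digest(manifest.get(name, ""), current.get(name, ""))
    )


# hypothetical key for the demo; a real key must be stored away from the drive
key = b"demo-key-kept-off-the-drive"
with tempfile.TemporaryDirectory() as tmp:
    drive = Path(tmp)
    (drive / "run_001.fastq").write_text("@read1\nACGT\n+\nIIII\n")
    manifest = build_manifest(drive, key)      # computed before shipping
    clean = verify_manifest(drive, key, manifest)

    (drive / "run_001.fastq").write_text("@read1\nACGA\n+\nIIII\n")  # altered read
    (drive / "extra.bin").write_bytes(b"\x00")                       # injected file
    flagged = verify_manifest(drive, key, manifest)
```

the manifest would need to travel by a separate channel from the drive itself; the check detects modification but does nothing to protect confidentiality, which would require encrypting the drive.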
bioinformatic software has not been commonly scrutinized in security contexts or subjected to the same adversarial pressure as other more mature software. open-source software is widely used across genomics, acquired from several online code repositories, and heavily modified for individual purposes, but it is only secure when security researchers are incentivized to assess these products. in a specialized and niche industry like genomics and bioinformatics, this is typically not the case. bioinformatic programs have been found to be vulnerable due to poor coding practices, insecure function usage, and buffer overflows (ney et al., ). many researchers have shown that algorithms can be forced to misclassify by intentionally modifying data inputs, breaking the integrity of any resulting outputs (finlayson et al., ). nearly every imaginable algorithm, model type, and use case has been shown to be vulnerable to this kind of attack across many data types, especially those relevant to raw signal and sequencing data formats (biggio and roli, ). similar attacks could be carried out in the processing of raw signal data internal to a sequencing system or on downstream bioinformatic analyses accepting raw sequencing data or processed data as an input. alarming amounts of human and other sensitive genetic data are publicly available. several funding and publication agencies require public dissemination, so researchers commonly contribute to open and semi-open databases (shi and wu, ). healthcare providers either house their own internal databases or disseminate to third-party databases. their clinical data is protected like any other healthcare information as required by regulations; however, this data can be sold and aggregated by external entities. dtc companies keep their own internal databases closely guarded and can charge steep prices for third-party access. data sharing is prevalent when the price is right.
data originators often have access to their genetic data and test results for download in plaintext. these reports can then be uploaded to public databases, such as gedmatch and dna.land, for further analyses, including finding distant genetic relatives with a shared ancestor. a well-known use of such identification tactics was the infamous golden state killer case (edge and coop, ). data sharing is dependent upon the data controller's wants and needs, barring any legal or business requirements from other involved stakeholders. genetic database vulnerabilities have been well-studied and disclosed (edge and coop, ; ney et al., ; naveed et al., ; erlich and narayanan, ; gymrek et al., ). for example, the contents of the entire gedmatch database could be leaked by uploading artificial genomes (ney et al., ). such an attack would violate the confidentiality of more than a million users' and their relatives' genetic data because the information is not truly anonymized. even social media posts can be filtered for keywords indicative of participation in genetic research studies to identify research participants in public databases (liu et al., ). all told, tens of millions of research participants, consumers, and relatives are already at risk.

adversarial targeting of genetic information largely depends upon the sensitivity and quantity of the information and the efficiency with which attackers can compromise it, leading to varying likelihoods of a breach or exposure scenario. the impact of a compromise is determined by a range of factors, including the size of the population at risk, negative consequences to stakeholders, and capabilities and scale of adversarial activity. likelihood and impact both ultimately inform the level of risk facing stakeholders during ecosystem phases (figure ).

figure caption: risk to the genetic information ecosystem. quantity is not to scale but is denoted abstractly by width of the second column. likelihood is judged by the available threats and opportunities to adversaries and the efficiency of an attack. impact is expressed in terms of the number of people affected and the current and emerging consequences to stakeholders. likelihood and impact scores: low (+); moderate (+ +); high (+ + +); very high (+ + + +); extreme (+ + + + +). low to extreme risk is denoted by the hue of red, from light to dark.

security is a spectrum; stakeholders must do everything they can to pursue security as a best practice. securing genetic information is a major challenge in this rapidly evolving ecosystem. attention has primarily been placed on the post-analytical phase of the genetic information ecosystem for security and privacy, but adequate measures have yet to be universally adopted. the pre-analytical and analytical phases are also highly vulnerable points for data compromise that must be addressed. adequate national regulations are needed for security and privacy enforcement, incentivization, and liability, but legal protection is dictated by regulators' responses and timelines. however, data originators, controllers, and processors can take immediate action to protect their data. genetic information security is a shared responsibility between sequencing laboratories and device vendors, as well as all other involved stakeholders. to protect genetic information, laboratories, biorepositories, and other data processors need to create strong organizational policies and reinvest in their physical and cyber infrastructure. they also need to determine the sensitivity of their data and material and take necessary precautions to safeguard sensitive genetic information. data controllers, especially healthcare providers and dtc companies, should reevaluate their data sharing models and methods, with special consideration for the identifiability of genetic data. device vendors need to consider security when their products are being designed and manufactured.
many of these recommendations go against the current paradigms in genetics and related industries and will therefore take time, motivation, and incentivization before being actualized, with regulation being a critical factor. in order to secure genetic information and protect all stakeholders in the genetic information ecosystem, further in-depth assessments of this threat surface will be required, and novel security and privacy approaches will need to be developed. sequencing systems, bioinformatics software, and other biotechnological infrastructure need to be analyzed to fully understand their vulnerabilities. this will require collaborative engagement between stakeholders to implement improved security measures into genetic information systems (moritz et al., ; berger and schneck, ) . the development and implementation of genetic information security will foster a healthy and sustainable bioeconomy without damaging privacy or security. there can be security without privacy, but privacy requires security. these two can be at odds with one another in certain contexts. for example, personal security aligns with personal privacy, whereas public security can require encroachment on personal privacy. a similar story is unfolding within genetics. genetic data must be shared for public good, but this can jeopardize personal privacy. however, genetic data necessitates the strongest protections possible for public security and personal security. appropriate genetic information security will simultaneously protect everyone's safety, health, and privacy. the inspiration for this work occurred while performing several security assessments and penetration tests of dna sequencing laboratories and other stakeholders. initially, an analysis of available literature and technical documentation (n= ) was performed, followed by confidential semi-structured interviews (n= ) with key personnel from multiple relevant stakeholders. 
the study's population consisted of leaders and technicians from government agencies (n= ) and organizations in small, medium, and large enterprises (n= ) across the united states, including california, colorado, district of columbia, massachusetts, montana, and virginia. several stakeholders allowed access to their facilities for observing environments and further discussions. some stakeholders allowed in-depth assessments of equipment, networks, and services. gs, ss, and dn are founders and owners of geneinfosec inc. and are developing technology and services to protect genetic information. geneinfosec inc. has not received us federal research funding. ah declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. genetics stakeholders are categorized based upon their influence, contributions, and handling of biological samples and resulting genetic data (supplementary table ). asymmetries exist between stakeholders in these regards . data originators are humans that voluntarily or involuntarily are the source of biological samples or are investigators collecting samples from nonhuman specimens. examples of data originators include consumers, healthcare patients, military personnel, research subjects, migrants, criminals, and their relatives. data controllers are entities that are legally liable for and dictate the use of biological samples and resulting data. in humanderived contexts, data controllers are typically healthcare providers, researchers, law enforcement agencies, or dtc companies. data processors are entities that collect, store, generate, analyze, disseminate, and/or apply biological samples or genetic data. data processors may also be data originators and data controllers. 
examples include biorepositories, dna sequencing laboratories, researchers, cloud and other service providers, and supply chain entities responsible for devices, software, and materials. regulators oversee this ecosystem and the application and use of biotechnology, biological samples, genetic data, and market/industry trends at the transnational, national, local, and organizational levels. biological samples and metadata from the samples must first be collected once a data originator or controller decides to proceed with genetic testing. biological samples can be sourced from any biological entity relying on nucleic acids for reproduction, replication, and other processes, including non-living microbes (e.g., viruses, prions), microorganisms (e.g., bacteria, fungi), and organisms (e.g., plants, animals). samples are typically de-identified of metadata and given a numeric identifier, but this is largely determined by the interests of data controllers and the regulations that may pertain to various sample types. metadata includes demographic details, inclusion and exclusion criteria, pedigree structure, health conditions critical for secondary analysis, and other identifying information. it can also be in the form of quality metrics obtained during the analysis phase. samples are then stored in controlled environments at decreased temperature, moisture, light, and oxygen to avoid degradation. sample repositories can be internal or third-party infrastructure housing small to extremely large quantities of material for short- and long-term storage. following storage, samples are distributed to an internal or third-party laboratory for dna sequencing preparations. the wet laboratory preparation phase chemically prepares biological samples for sequencing with sequencing-platform-dependent methods. this phase can be performed manually with time- and labor-intensive methods, or it can be highly automated to reduce costs, run-time, and error.
common initial preparation steps involve removing contaminants and unwanted material from biological samples and extracting and purifying samples' nucleic acids. if rna is to be sequenced, it is usually converted into complementary dna. once dna has been isolated, a library for sequencing is created via size-selection, sequencing adapter ligation, and other chemical processes. adapters are synthetic dna molecules attached to dna fragments for sequencing and contain sample indexes, or identifiers. indexes allow sequencing runs to be multiplexed with many samples at once, which increases throughput, decreases costs, and allows each dna fragment to be traced back to its sample source. to begin sequencing, prepared libraries are loaded into a dna sequencing instrument with the required materials and reagents. laboratory personnel must log in to the instrument and any connected services, such as cloud services or information management systems, and configure a run to initiate sequencing. a single sequencing run can generate gigabytes to terabytes of raw sequencing data and last anywhere from a few hours to multiple days, so the devices are commonly left unmonitored during operation. raw data can be stored on the instrument's local memory and are transmitted to one or more of the following endpoints during or following a sequencing run: (i) local servers, computers, or other devices within the laboratory; (ii) cloud services of the vendor or other service providers; and (iii) external hard drives directly tethered to the sequencer. data paths largely depend on the sequencing platform, the laboratory's capabilities and infrastructure, and the sensitivity of data being processed. certain regulations require external hard drive use and offline data storage, analysis, and transmission. bioinformatic pipelines convert raw data through a series of software tools into usable forms.
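the role of sample indexes in multiplexed runs, described above, can be illustrated with a minimal sketch. it assumes, purely for illustration, that the index occupies the first few bases of each read; the index sequences, sample names, and the `demultiplex` function are hypothetical and do not correspond to any vendor's software.

```python
from collections import defaultdict

# Hypothetical index-to-sample table; real index sequences come from the library prep kit.
SAMPLE_INDEXES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, index_len=4):
    """Group reads by the sample index found at the start of each read.

    Assumption for illustration only: the index is the first `index_len`
    bases of the read. Reads with an unknown index go to 'undetermined'.
    """
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        sample = SAMPLE_INDEXES.get(index, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


reads = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCAAA", "NNNNGGGGGG"]
by_sample = demultiplex(reads)
```

here `by_sample` maps each sample name to the fragments that carried its index, which is the property that lets many samples share one run and still be traced back to their source.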
raw signal data include images, chemical signal, electrical current signal, and other forms of signal data dependent upon the sequencing platform. primary analyses convert raw signal data into sequence data with accompanying quality metrics through a process known as basecalling. many sequencing instruments can perform these functions. the length of each dna molecule sequenced is orders of magnitude smaller than genes or genomes of interest, so basecalled sequence data must then be aligned to determine each read's position within a genome or genomic region. this aligned sequence data is then compared to reference genomes sourced from databases through a procedure known as variant detection to determine differences between a sample's data and the accepted normal genomic sequence. only the unique genetic variants of a sample are retained in variant call format (vcf) files, a common final processed data form. vcf files are vastly smaller than the gigabytes to terabytes of raw data initially produced, making them an efficient format for long-term storage, dissemination, and analysis purposes. however, this file format poses a security risk for sensitive genetic data because these files are personally identifiable and contain sensitive health information. following data analyses, processed data are integrated with metadata and ultimately interpreted for the data controller's purpose. metadata and genetic data are often housed together, and exploiting this combined information could lead to numerous risks and threats to the data originators, their relatives, and the liable entities involved along the data path. secondary analyses can be performed on datasets by data controllers and third-party data processors to answer any number of relevant research questions, such as in diagnostics or ancestry analysis. genetic research is only powerful when large datasets are created containing numerous data points from thousands to millions of samples.
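the comparison of aligned sample data against a reference, with only the differences retained, can be sketched as follows. this is a toy model rather than a real variant caller: it assumes the sample sequence is already aligned to the reference with equal length and no insertions or deletions, and the function name `call_variants` and the contig name `chr_demo` are hypothetical.

```python
def call_variants(reference, sample, chrom="chr_demo"):
    """Return VCF-like (CHROM, POS, REF, ALT) records for single-base differences.

    Toy model: `sample` is assumed to be already aligned to `reference`,
    with equal length and no insertions or deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("toy caller expects equal-length aligned sequences")
    records = []
    for i, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        if ref_base != alt_base:
            # VCF positions are 1-based.
            records.append((chrom, i + 1, ref_base, alt_base))
    return records


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
variants = call_variants(reference, sample)
```

only two records survive from ten bases, which mirrors why processed variant files are so much smaller than raw data, and also why they remain identifying: what is kept is exactly what distinguishes the sample from the reference.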
therefore, genetic data is widely distributed and accessible via remote means across numerous databases and stakeholders. low cost attacks on tamper resistant devices i'm hoping they're an ethical company that won't do anything that i'll regret" users perceptions of at-home dna testing companies national and transnational security implications of big data in the life sciences national and transnational security implications of asymmetric access to and use of biological data wild patterns: ten years after the rise of adversarial machine learning biosafety in microbiological and biomedical laboratories. us department of health and human services adverse effect of air exposure on the stability of dna stored at room temperature first $ genome sequencing enabled by new extreme throughput dnbseq platform how lucky was the genetic investigation in the golden state killer case attacks on genetic privacy via uploads to genealogical databases linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets genomewide association study of severe covid- with respiratory failure the unpatchable silicon: a full break of the bitstream encryption of xilinx -series fpgas routes for breaching and protecting genetic privacy identity inference of genomic data using longrange familial searches cyber security threats in the microbial genomics era: implications for public health adversarial attacks on medical machine learning genetic influences on antisocial behavior: recent advances and future directions. current opinion in psychology power-supplay: leaking data from air-gapped systems by turning the power-supplies into speakers brightness: leaking sensitive data from air-gapped workstations via screen brightness identifying personal genomes by surname inference iso/iec : . information technology -security techniques -guidelines for cybersecurity behavioral genetic studies of personality: an introduction and review of the results of + years of research. 
the sage handbook of personality theory and assessment robust genome-wide ancestry inference for heterogeneous datasets and ancestry facial imaging based on the genomes project. biorxiv genomic research and human subject privacy identification of individuals by trait prediction using whole-genome sequencing data amia annual symposium proceedings identifiability in genomic research the first complete genomic sequences of african swine fever cyber-biosecurity risk perceptions in the biotech sector promoting biosecurity by professionalizing biosecurity privacy in the genomic era computer security risks of distant relative matching in consumer genetic databases computer security, privacy, and dna sequencing: compromising computers with synthesized dna, privacy leaks, and more genotype extraction and false relative attacks: security risks to third-party genetic genealogy services beyond identity inference human leukocyte antigen susceptibility map for sars-cov- next-generation sequencing informatics: challenges and strategies for implementation in a clinical environment complete genome sequence of a novel coronavirus (sars-cov- ) strain isolated in nepal biological laboratory and transportation security and the biological weapons convention. national nuclear security administration artificial intelligence and the weaponization of genetic data an overview of human genetic privacy opening pandora's box: effective techniques for reverse engineering iot devices analysis of genetic host response risk factors in severe covid- patients. medrxiv rapid reconstruction of sars-cov- using a synthetic genomics platform characterization of the reconstructed spanish influenza pandemic virus the fbi dna laboratory: a review of protocol and practice vulnerabilities findings of the investigation into china's acts, policies and practices related to technology transfer, intellectual property, and innovation under section of the trade act of . 
office of the united states trade representative, executive office of the president security measures for safeguarding the bioeconomy the coming crispr wars: or why genome editing can be more dangerous than nuclear weapons thermo fisher scientific, inc. applied biosystems / xl genetic analyzer user guide thermo fisher scientific, inc. applied biosystems / xl dna analyzers user guide applied biosystems seqstudio genetic analyzer specification sheet illumina document # v illumina proactive | data security sheet illumina document # v illumina document # v illumina document # v , material # illumina document # v , material # nextseq dx instrument site prep guide novaseq sequencing system site prep guide thermo fisher scientific publication #col thermo fisher scientific publication #col ion torrent genexus integrated sequencer performance summary sheet gridion mk site installation and device operation requirements, version oxford nanopore technologies. minion it requirements, version promethion p /p site installation and device operation requirements, version pacific biosciences of california, inc. operations guide -sequel system: the smrt sequencer pacific biosciences of california, inc. operations guide -sequel ii system: the smrt sequencer connect platform | iot connectivity the authors would like to acknowledge the confidential research participants and collaborators on this study for their time, resources, and interest in bettering genetic information security. thank you to cory cranford, arya thaker, ashish yadav, and dr. kevin gifford and dr. daniel massey of the department of computer science, formerly of the technology, cybersecurity and policy program, at the university of colorado boulder for their support of this work. v appendix . overview of the genetic information ecosystem processes (page ) key: cord- - svaq at authors: ogrodzinski, martin p.; lunt, sophia y. 
title: metabolomic profiling of mouse mammary tumor derived cell lines reveals targeted therapy options for cancer subtypes date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: svaq at breast cancer is a heterogeneous disease with several subtypes that currently do not have targeted therapy options. metabolomics has the potential to uncover novel targeted treatment strategies by identifying metabolic pathways required for cancer cells to survive and proliferate. here, we used tumor-derived cell lines derived from the mmtv-myc mouse model to investigate metabolic pathways that are differentially utilized between two subtypes of breast cancer. using mass spectrometry-based metabolomics techniques, we identified differences in glycolysis, the tricarboxylic acid cycle, glutathione metabolism, and nucleotide metabolism between subtypes. we further show the feasibility of targeting these pathways in a subtype-specific manner using metabolism-targeting compounds. breast cancer is a heterogeneous disease with subtypes that vary by morphology, in . % of human breast cancers and is more common in high grade tumors and the basal-like subtype. as a transcription factor, myc affects numerous biological processes including metabolism. , notably, myc expression regulates several genes in glucose, amino acid, and nucleotide metabolism. - therefore, investigating metabolism of the mmtv-myc model system may reveal metabolic features common to human cancer and could present new targeted therapeutic options. here, we present a study investigating the metabolic profiles of two histologically distinct to determine metabolic profiles of histologically distinct mouse mammary tumor subtypes, polar metabolites were extracted from tumor-derived cell lines and quantitated using lc-ms/ms. we found metabolites involved in several central carbon metabolic pathways to be differentially abundant between emt and papillary tumor-derived cell lines (figure ). 
in the emt subtype, both oxidized and reduced forms of glutathione, a key metabolite in redox homeostasis, are elevated (figure b). increased levels of both reduced and oxidized glutathione imply that the emt subtype has elevated glutathione biosynthetic activity. this could reflect a greater dependency on glutathione biosynthesis in the emt cells, and targeting glutathione biosynthesis would therefore be more effective against the emt subtype. metabolites increased in the papillary subtype include fructose bisphosphate (fbp; glycolysis); acetyl-coa indicating relative metabolite differences between emt and papillary tumor derived cell lines. metabolites are sorted by relationship to metabolic pathways. yellow and blue boxes indicate increased or decreased metabolite levels relative to the average of the papillary subtype, respectively. data for each sample is normalized to the average signal for all metabolites in the analysis. metabolites with statistically significant differences (p-value < . ) are bolded and marked with asterisks (*). (b) representative bar graphs of metabolites with statistically significant differences between emt and papillary subtypes. data are displayed as means ± s.d., n = . metabolic pool size measurements to reveal more complete metabolic profiles. we find the papillary subtype has proportionally lower abundance of and atp, as well as ribulose- -phosphate and ribose- -phosphate, two intermediates in the ppp production or decreased nucleotide consumption, we applied the same isotope labeling supplementary figure : c-isotope labeling from glutamine into the tca cycle is similar between subtypes.
cells were incubated in c-glutamine containing media for four hours and extracted for metabolites. grey boxes represent the unlabeled proportion for each metabolite at four hours. green boxes represent the sum of all potential isotopologues for each metabolite supplementary figure : c-isotope labeling from glucose into ribose -phosphate, serine, and glycine and from glutamine into aspartate is similar between subtypes. cells were grey boxes represent the unlabeled proportion for each metabolite at four hours. colored boxes represent isotopologues for each metabolite and are sorted based on carbon source key: cord- - kec re authors: miao, zhen; balzer, michael s.; ma, ziyuan; liu, hongbo; wu, junnan; shrestha, rojesh; aranyi, tamas; kwan, amy; kondo, ayano; pontoglio, marco; kim, junhyong; li, mingyao; kaestner, klaus h.; susztak, katalin title: single cell resolution regulatory landscape of the mouse kidney highlights cellular differentiation programs and renal disease targets date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kec re determining the epigenetic program that generates unique cell types in the kidney is critical for understanding cell-type heterogeneity during tissue homeostasis and injury response. here, we profiled open chromatin and gene expression in developing and adult mouse kidneys at single cell resolution. we show critical reliance of gene expression on distal regulatory elements (enhancers). we define key cell type-specific transcription factors and major gene-regulatory circuits for kidney cells. dynamic chromatin and expression changes during nephron progenitor differentiation demonstrated that podocyte commitment occurs early and is associated with sustained foxl expression. renal tubule cells followed a more complex differentiation, where hnf a was associated with proximal and tfap b with distal fate.
mapping single nucleotide variants associated with human kidney disease identified critical cell types, developmental stages, genes, and regulatory mechanisms. we provide a global single cell resolution view of chromatin accessibility of kidney development. the dataset is available via interactive public websites. the true cell type specificity of these enhancers is critically important. here, we reasoned that single cell accessible chromatin information could be extremely useful to identify the cell type-specific enhancer regions and thereby the target cell type for the gwas hits; however, such maps have not been generated for the human kidney. we combined three recent kidney disease gwas
discussion
in summary, here we present the first cellular resolution open chromatin map for the developing and adult mouse kidney. using this dataset, we identified key cell type-specific regulatory networks for kidney cells, defined the cellular differentiation trajectory, characterized regulatory dynamics and identified key driving tfs for nephron development, especially for the terminal differentiation of epithelial cells. furthermore, our results shed light on the cell types and target genes for genetic variants associated with kidney disease development. by performing massively parallel single cell profiling of chromatin state, we were able to define the key regulatory logic for each kidney cell type by investigating cis-regulatory elements and tf-target gene interaction. we found that most cell type-specific open chromatin regions are within distal regulatory elements and intronic regions. our studies identified a massive amount of highly dynamic co-regulated peaks indicating the important correlation between distal regulatory elements and gene expression. future studies will examine the relative contribution of promoters and enhancer openness in gene expression regulation.
however, these studies highlight that both chromatin opening and looping are critical for gene regulation. we also observed that the single cell open chromatin atlas was able to define more distinct cell types even in the developing kidney compared to scrna-seq analysis. given the continuous nature of rna expression, it has been exceedingly difficult to dissect specific cell types in the developing kidney. in addition, it has been difficult to resolve the cell type origin of lowly expressed transcripts in scrna-seq data. however, this is not the case for snatac-seq data, which were able to capture the chromatin state irrespective of gene expression magnitude. there were several examples where accessible peaks were identified in specific cell types even for lowly expressed genes such as shroom . we identified critical cell type-specific tfs by integrating multiple computational analyses. tf identification is challenging in scrna-seq data since the expression of several cell type-specific tfs is low and some of them do not show a high degree of cell type-specificity. by extracting motif information, snatac-seq data provides additional information for tf identification. together with regulon analysis, as implemented in scenic, we have identified several tfs as well as their target genes that are important for kidney development. leveraging this newly identified cell type-specific regulatory network will be essential for future studies of cellular reprogramming of precursors into specific kidney cell types and for better understanding homeostatic and maladaptive regeneration. our studies revealed dynamic chromatin accessibility that tracks with renal cell differentiation. these states may reveal mechanisms governing the establishment of cell fate during development, in particular those underlying the emergence of specific cell types.
we found a consistent and coherent pattern between gene expression and open chromatin information, where the nephron progenitors differentiated into two branches representing podocytes and tubule cells. we found that podocyte commitment occurred earlier, while tubule differentiation and segmentation appeared to be more complex. this podocyte specification correlated with sustained expression of foxc and foxl in podocytes. while foxc has been known to play a role in nephron progenitors and podocytes, this is the first description of foxl in kidney and podocyte development. our studies are consistent with recent observations from organoid models that recapitulated podocyte differentiation better than tubule cell differentiation. our study also sheds light on tubule differentiation and segmentation. we confirmed the key role of hnf a in proximal tubules. we have identified a large number of new transcriptional regulators such as tfap a that seem to be critical for the distal portion of the nephron. open exclusively in nephron progenitors, whereas chromatin becomes inaccessible as differentiation progresses during later stages, such as shroom and uncx. this is an interesting and important novel mechanism, indicating that the altered expression of this gene might play a role in the developmental rewiring of the kidney. this mechanism is similar to genes associated with autism that are known to be expressed in the fetal but not in the adult stages and highlights the critical role of understanding chromatin accessibility at multiple stages of differentiation. while we have generated a large amount of high-quality data, this information will need further experimental validation, which is beyond the scope of the current manuscript.
in addition, one needs to be aware of the limitations when interpreting different computational analyses; for example, motif enrichment analyses such as those implemented by homer, scenic, and chromvar are not able to distinguish between tfs with similar binding sites.
key: cord- - rg qtdq authors: watkins, laura c.; degrado, william f.; voth, gregory a. title: influenza a m inhibitor binding understood through mechanisms of excess proton stabilization and channel dynamics date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: rg qtdq prevalent resistance to inhibitors that target the influenza a m proton channel has necessitated a continued drug design effort, supported by a sustained study of the mechanism of channel function and inhibition. recent high-resolution x-ray crystal structures present the first opportunity to see how the adamantyl-amine class of inhibitors bind to m and disrupt and interact with the channel’s water network, providing insight into the critical properties that enable their effective inhibition in wildtype m . in this work, we test the hypothesis that these drugs act primarily as mechanism-based inhibitors by comparing hydrated excess proton stabilization during proton transport in m with the interactions revealed in the crystal structures, using the multiscale reactive molecular dynamics (ms-rmd) methodology. ms-rmd, unlike classical molecular dynamics, models the hydrated proton (hydronium-like cation) as a dynamic excess charge defect and allows bonds to break and form, capturing the intricate interactions between the hydrated excess proton, protein atoms, and water. through this, we show that the ammonium group of the inhibitors is effectively positioned to take advantage of the channel’s natural ability to stabilize an excess protonic charge and is thus acting as a hydronium-mimic. additionally, we show that the channel is especially stable in the drug binding region, highlighting the importance of this property for binding the adamantane group. finally, we characterize an additional hinge point near val , which dynamically responds to charge and inhibitor binding. altogether, this work further illuminates a dynamic understanding of the mechanism of drug inhibition in m , grounded in the fundamental properties that enable the channel to transport and stabilize excess protons, with critical implications for future drug design efforts. 
proton transport (pt) across cellular membranes is a critical component of many biomolecular systems, necessary, for example, to maintain ph gradients, to drive atp synthesis, and to facilitate the co- or anti-transport of other small molecules. because of their essential role in such systems, channels and transporters with pt functionality are often targets for drug design to inhibit or control pt: in the case of viruses and bacteria, to slow or prevent infection, though there are myriad other disease applications. drug design is notoriously challenging, as both thermodynamic and kinetic factors must be considered but are difficult to predict and control, and its success depends on high quality structures, an understanding of structural dynamics, and a knowledge of the protein's function and its mechanism. thus, beyond elucidating mechanisms of pt in order to understand how a specific channel or transporter works, studying the detailed interactions that facilitate pt can provide valuable insight to help guide drug design efforts. the influenza virus kills up to , people each year, and the impact of the recent global coronavirus pandemic emphasizes how critical it is to maintain our focus on understanding and treating viral infections. the influenza a virus matrix (m ) proton channel is a homo-tetrameric protein responsible for the acidification of the viral interior, a critical step in the influenza infection process. it is the target of two of the three currently available oral antivirals, amantadine and rimantadine. while these are effective at blocking pt in wildtype m , drug-resistant mutants have become the predominant strains, rendering these drugs ineffective and thus requiring a continued drug design effort informed by a deeper understanding of the pt and drug inhibition mechanisms. additionally, m is considered an archetype for the viroporin family, a class of viral channels considered ideal drug targets.
the sars-cov- virus responsible for the covid- pandemic contains two viroporins, protein e and . [ ] [ ] [ ] thus, viroporins are a critical class of proteins to study as potential therapeutic targets. m is located in the viral capsid and is acid-activated: as the ph of the endosome encapsulating the virus is lowered, the m channel becomes activated and facilitates unidirectional proton flow to the viral interior, allowing the virus to escape the endosome and infect the cell. the key residue that controls activation is his , - which can bind one additional proton and take on a + charge. one histidine from each helix forms the his tetrad, which can collectively hold a + to + excess charge, dependent on ph. the channel becomes activated and the c-terminal portion opens (adopting the inwardopen conformation) upon reaching the + state, and pt occurs as the channel cycles through a transporter-like mechanism. [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] amantadine and rimantadine belong to the adamantyl-amine class of inhibitors, binding in the upper-middle portion of the channel. these drugs were the predecessors of many related adamantane-based compounds featuring a relatively rigid, apolar group and an attached charged group. [ ] [ ] [ ] [ ] [ ] [ ] [ ] recently, thomaston et al. published several high-resolution x-ray crystal structures of m with amantadine, rimantadine, and a novel spiro-adamantyl amine bound. these structures provided the first opportunity to see the specific interactions that facilitate stable inhibitor binding and the disruption of the hydrogen-bonded water network otherwise present. 
along with an earlier qualitative md simulation study that guided the design of the spiro-adamantyl amine inhibitors, the crystallographic analysis provided potential insights into the mechanism of inhibition, suggesting that the backbone carbonyls of pore-lining residues act as physiochemical chameleons, able to engage in both hydrophobic and hydrophilic interactions, and that the drug is tilted off the channel's axis and interacts with waters in the ala layer. taken together, it is hypothesized that amantadine acts as a mechanism-based inhibitor, with the ammonium group functioning as a hydronium mimic. computational studies to date have primarily focused on the means of entry into the channel and location of binding, - but have not deduced specific interactions between the drug, channel, and channel water involved in binding as they relate specifically to similar interactions seen in the pt mechanism. proton transport is an inherently quantum mechanical process, as the hydrated proton structure (hydronium-like) exists in a complex hydrogen-bonded network that rearranges dynamically as bonds break and form according to the grotthuss shuttling mechanism. - thus, classical molecular dynamics (md) with fixed bonding topology cannot be used to study pt; moreover, ab initio methods are not efficient enough to reach the many nanosecond timescales necessary to obtain sufficient sampling in biomolecular systems that may have important degrees of freedom several orders of magnitude slower than proton shuttling. multiscale reactive molecular dynamics (ms-rmd) [ ] [ ] [ ] [ ] (and multistate empirical valence bond, ms-evb, before it) was developed to efficiently and accurately capture the solvation and delocalization of an excess proton in water, such that the quantum-chemical nature of the hydrated proton can be studied in the context of membrane proteins over the long timescales needed for accurate simulation of such systems. 
ms-rmd has been successfully applied in several protein systems to predict and explain mechanisms of pt. , , [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] in previous work, , quantum mechanics/molecular mechanics (qm/mm) and ms-rmd was used to calculate potentials of mean force (pmfs, i.e., free energy profiles) of pt through the m channel in the + - states, providing critical insight into the ph-dependent activation behavior and the role of the his tetrad in pt. most recently, we further analyzed the ms-rmd simulations to explore the detailed interactions between the hydrated excess proton and the channel and found that the proton dynamically, as a function of its position, alters several properties of the protein and pore waters, including the hydrogen-bonding network and the protein structure. this latter work illustrates how ms-rmd can be used successfully to investigate explicit, dynamic interactions between a hydrated proton and its immediate environment, as well as its indirect effects on other parts of the system. here, we employ a similar approach as in this previous work to focus specifically on properties related to drug binding and how the position of the bound drug relates to the overall pt mechanism. through this analysis, we examine the hypothesis that the adamantyl-amine drugs act as mechanism-based inhibitors. our results indicate that the ammonium group of amantadine and rimantadine are aptly positioned to take advantage of the channel's ability to stabilize an excess proton. additionally, by examining conformational fluctuations, we show that the drug binding pocket is an especially stable and symmetrical portion of the channel, conducive to binding a roughly spherical drug, and we reveal an additional minor hinge point towards the top of the channel which may be a relevant feature for future drug design efforts. simulations for calculating properties as the proton moves through the top of the channel were run as follows. 
starting structures were taken from previous simulations, which were initiated from a crystal structure of the transmembrane portion of the m channel (this construct is referred to as m tm) resolved at room temperature and high ph (pdb: qkl ) embedded in a -palmitoyl- -oleoyl-sn-glycero- phosphocholine (popc) bilayer solvated with water. m tm is the minimum construct necessary to retain proton conduction similar to full-length m , and it has been shown that the presence of amphipathic helices, included in the full-length m protein, does not significantly influence the pt mechanism. the collective variable (cv) defined for umbrella sampling (us) is the z-coordinate of the vector between the excess proton center of excess charge (cec, see below) and the center of mass of the four gly alpha carbons, as in our previous work, such that the cv has negative values at the top (n-terminal end) of the protein and progresses to positive values at the bottom (c-terminal end). the excess proton cec is defined as: $$\mathbf{r}_{\mathrm{cec}} = \sum_{i} c_i^{2}\,\mathbf{r}_i^{\mathrm{coc}}$$ where $\mathbf{r}_i^{\mathrm{coc}}$ is the center-of-charge of the $i$-th diabatic ms-rmd state, and $c_i$ is the amplitude of that state. the sum is over all states. ms-rmd simulations were run with the excess proton at every . Å along the cv coordinate between - . and . , generating windows. to ensure that the proton remained in the channel, a cylindrical restraint was added at Å with a force constant of kcal/mol·Å using the open-source, community-developed plumed library. the ms-evb version . parameters were used to describe the hydrated excess proton. after a ps ms-rmd equilibration, the replica exchange umbrella sampling technique was used to facilitate convergence. production simulations were run for ~ - ns with frames saved every ps. for calculating hydrogen bond residence times, longer independent trajectories were run with the cec restrained in different positions using us as described above. each trajectory was run for . - ps with frames saved every fs. simulation frames were binned by excess proton cec value for subsequent analyses, which were performed in python using the scipy, numpy, and pandas libraries. for hydrogen bond analysis, values were averaged over the four helices. hydrogen bonds were defined by the following criteria: the donor-acceptor distance must be less than . Å, and the donor-hydrogen-acceptor angle must be greater than °. several hydrogen bond definitions were tested and did not affect the conclusions (not shown). for calculating residence times, a hydrogen bond was considered in place as long as the particular water molecule remained the closest water to the protein atom and the hydrogen bond criteria were met. images of molecular structures were rendered in visual molecular dynamics (vmd), while other figures were generated using matplotlib.

figure . [...] hydrogens on the ammonium group of amantadine were added. right, a snapshot from an ms-rmd trajectory with the most hydronium-like water indicated in green. in both, two opposing chains of m are shown in silver. the ser , his sidechains and the gly , ala backbone carbonyls are shown. the z-coordinate for the system is included on the left, where z = Å is defined as the center-of-mass of the gly alpha-carbons.

if the adamantyl-amine drugs are acting as mechanism-based inhibitors as hypothesized, we would expect to see specific aspects of the pt mechanism taken advantage of or replicated by the drug upon drug binding. to test this, we performed ms-rmd simulations of m in the + his charge state with an explicit excess proton to evaluate the hydrogen-bonding networks, pore shape, and protein fluctuations throughout pt that relate to drug binding. by focusing on pt in the + state, we are studying the process of proton entry and diffusion to his in the first key step of channel activation, paralleling inhibitor entry into the channel. we additionally expect this to be a prevalent charge state in drug-bound structures due to the lowered his pkas. 
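the geometric hydrogen bond criteria stated in the methods (a donor-acceptor distance cutoff plus a donor-hydrogen-acceptor angle cutoff) can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, only the standard library is used, and the cutoffs are left as arguments because the values used in the study are not reproduced here.

```python
import math


def dha_angle_deg(donor, hydrogen, acceptor):
    """Donor-hydrogen-acceptor angle in degrees, measured at the hydrogen."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def is_hydrogen_bond(donor, hydrogen, acceptor, max_da_dist, min_dha_angle):
    """True if both geometric criteria are met.

    max_da_dist (angstrom) and min_dha_angle (degrees) are supplied by the
    caller; they are assumptions of this sketch, not values from the paper.
    """
    da_dist = math.dist(donor, acceptor)
    angle = dha_angle_deg(donor, hydrogen, acceptor)
    return da_dist < max_da_dist and angle > min_dha_angle


# Placeholder cutoffs, chosen only to exercise the function.
linear = is_hydrogen_bond((0, 0, 0), (1, 0, 0), (2.8, 0, 0), 3.5, 150.0)
bent = is_hydrogen_bond((0, 0, 0), (0, 1, 0), (2.8, 0, 0), 3.5, 150.0)
```

keeping the cutoffs as arguments also makes it easy to repeat an analysis with several hydrogen bond definitions, as the authors report doing.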
replica exchange umbrella sampling was used to obtain sufficient sampling of proton positions throughout the top portion of the channel, with windows from cecz = - . to . Å where the coordinate origin is defined as the center of mass of the gly alpha carbons. the channel is aligned along the z-axis for all subsequent analyses. to understand how properties of pt may provide insight into drug binding, we primarily examine variations dependent on proton position. we compared the values of each property when the proton is at the drugs' ammonium group positions versus other parts of the channel to determine if the drugs could be taking advantage of the channel's natural ability to stabilize a proton. this idea is highlighted in figure , which shows both the drug-bound crystal structure and a snapshot of a hydrated excess proton in the channel from our simulations. we refer to the drugs' ammonium nitrogen position along the z-axis in the crystal structure as ammnz. this value is - . and - . Å for the inward-closed amantadine and rimantadine bound structures, respectively (averaged over the two tetramers in each crystal structure). flexible hydrogen bonds stabilize the excess proton near ammnz. it has been shown in our previous work that hydrogen bonds within the channel, including those between water and protein atoms, help facilitate proton transport by altering their direction and frequency of interaction as the proton moves through the channel. here, we focus specifically on water interactions that may help account for excess charge stabilization near ammnz. in figure , we calculate the occupancy of three different hydrogen bonds between protein atoms and water as a function of the excess proton position in the channel. the ala hydrogen bond occupancy is consistent as the excess proton enters and moves through the top of the channel, but it decreases by ~ % as the proton approaches the ala carbonyls. 
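hydrogen bond occupancy as a function of excess proton position can be computed by binning frames on the cec z-coordinate and taking the fraction of frames in each bin where the bond is present. the sketch below assumes per-frame inputs (a cec z value and a true/false flag for one hydrogen bond) and hypothetical bin edges; it uses only the standard library, whereas the authors report working with numpy and pandas.

```python
def occupancy_by_cec(cec_z, hbond_present, bin_edges):
    """Fraction of frames with the hydrogen bond present, per CEC z bin.

    cec_z         : per-frame CEC z-coordinate
    hbond_present : per-frame boolean, True if the bond criteria are met
    bin_edges     : ascending edges; bin b covers [edges[b], edges[b+1])
    Returns one value per bin, or None for bins that contain no frames.
    """
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for z, present in zip(cec_z, hbond_present):
        for b in range(n_bins):
            if bin_edges[b] <= z < bin_edges[b + 1]:
                totals[b] += 1
                if present:
                    hits[b] += 1
                break  # each frame belongs to at most one bin
    return [h / t if t else None for h, t in zip(hits, totals)]


# Illustrative (made-up) frames: two bins, four frames.
example = occupancy_by_cec(
    [-9.5, -9.2, -7.5, -7.1], [True, True, True, False], [-10.0, -8.0, -6.0]
)
```

a dip in the returned values over a range of bins is the kind of signal discussed in the text for the ala carbonyl waters.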
the dip in ala occupancy indicates that the ala hydrogen-bonded waters can flexibly reduce their interaction with the protein as a result of an excess charge in their vicinity. additionally, this dip is centered at - . Å, near ammnz. at this point, the role of the waters near ala carbonyls in hydrating the proton is maximized. this supports the hypothesis that amantadine and rimantadine are mechanism-based inhibitors and take advantage of the channel's natural ability to stabilize a hydrated excess proton in order to stabilize the drug's ammonium group. the hydrogen bond occupancy of waters with the gly carbonyls increases once the excess proton passes through the val gate and remains fairly consistent across proton positions thereafter, exhibiting little dependence on the hydrated proton position once it is in the channel. the ser sidechain water occupancies, shown for comparison, do not show a noticeable trend based on proton position. thus, this change in interactions is not a universal effect throughout the channel, but the ala waters seem to be uniquely flexible in this manner. these differences are consistent with drug design studies: while compounds such as spiro-adamantyl amine have been able to displace the water in the ala layer, no designed inhibitors have displaced the water around gly . to further understand how the dynamics of the hydrogen-bond network may show how these drugs benefit from the channel's inherent excess-charge stabilization used in proton transport, we examined the average residence times of hydrogen bonds between water and several important protein atoms. to do this, independent trajectories were run for five different excess proton positions, including two trajectories with the proton completely outside the channel (cecz = - . , . Å) and three when the proton is near ammnz. these results are shown in figure . 
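the residence time definition given in the methods (a hydrogen bond persists while the same water molecule remains the closest water to the protein atom and the geometric criteria are met) maps onto a simple run-length calculation. the following is a sketch under assumptions: the per-frame inputs, the function name, and the frame spacing `dt` are hypothetical, and no correction is made for runs cut off at the start or end of a trajectory.

```python
def residence_times(closest_water_ids, criteria_met, dt):
    """Durations of uninterrupted hydrogen bonds to one protein atom.

    closest_water_ids : per-frame identifier of the closest water molecule
    criteria_met      : per-frame boolean, True if that water satisfies the
                        hydrogen bond criteria
    dt                : time between saved frames (any consistent unit)
    A run ends when the closest water changes or the criteria fail.
    """
    times = []
    run_len = 0
    current = None
    for water, ok in zip(closest_water_ids, criteria_met):
        if ok and run_len and water == current:
            run_len += 1  # same water, bond still in place
        else:
            if run_len:
                times.append(run_len * dt)  # close the previous run
            current = water if ok else None
            run_len = 1 if ok else 0
    if run_len:
        times.append(run_len * dt)  # close a run that reaches the last frame
    return times


# Illustrative (made-up) frames: water 7, a break, water 9, then water 7.
runs = residence_times([7, 7, 7, 9, 9, 7], [True, True, False, True, True, True], 2.0)
mean_residence = sum(runs) / len(runs)
```

averaging the returned durations, as in the last line, gives the kind of mean residence time compared across proton positions in the text.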
the ala water residence times slightly increase when the excess proton is near ammnz, the ser water residence times do not show any significant difference between the proton outside the channel and at ammnz, and those of gly waters decrease at ammnz. the waters hydrogen bonded to his imidazole nitrogens show the greatest change in residence times and are shown to highlight the ability of this method to describe such differences. we note that classical md does not wholly capture charge transfer in hydrogen bonds, resulting in an overall weaker interaction. thus, while these simulations provide valuable insight into these interactions and their trends, we expect that they are stronger in the real system and any differences will be more prominent. with the above results for ala , this may indicate that several waters remain tightly hydrogen bonded to the ala backbone carbonyls, while one or more are bonded less frequently. while the gly hydrogen bonds do not form less frequently (as indicated in figure ) with an excess charge in this region, they do exhibit greater dynamics and flexibility. this change indicates an increase in water dynamics when the excess proton is near, which could help stabilize and solvate the excess charge in the ala water layer. taken together, these results further support the hypothesis that amantadine and rimantadine act as mechanism-based inhibitors: the channel acts as a scaffold to facilitate pt by harboring flexible protein-water interactions that can adapt and respond to a positive excess charge, with specific ability to stabilize an excess proton near ammnz that the charged drugs can take advantage of. drug tilt positions ammonium group in highest cec density. one prominent characteristic of the amantadine and rimantadine bound structures is the drug's tilt within the pore. this tilted conformation is also seen in solid-state nmr studies. 
given the drug's three-fold symmetry in a four-fold symmetric channel, the ammonium group cannot form hydrogen bonds with all waters hydrogen-bonded to ala , leading in part to this tilt. based on our previous work examining the proton's path through the channel, we used a similar analysis to examine the density of cec positions when the excess proton is near the ammonium position in drug-bound structures. figure a shows that, in this region, the excess proton prefers to be near the edge of the pore, unlike the predominant preference for the center of the pore. figure b shows the radial density of the cec in this same region of the channel. possible positions of amantadine's ammonium group hydrogens were calculated based on the drug's position and tilt in the crystal structure, and their radii are included as dashed lines (these positions were also used to generate the image in figure ). these hydrogens can extend to a radius ~ . Å in this static crystal structure, which indicates that the slightly off-centered ammonium group directly positions - of its hydrogens in the region of the cec's highest density. the cec's propensity for the edge of the pore indicates that the drug's tilt in the channel may not only be a necessary component of its binding, but also a thermodynamic advantage. this tilt further allows the drug to act as a hydronium mimic, as the hydrogens of the ammonium group are in the favorable positions of the solvated excess proton. the analysis of hydrogen bonding changes and proton densities indicates how the ammonium group is a functional addition to the adamantane scaffold, as the charged group is positioned in a region where the channel is especially adept at stabilizing an excess charge. 
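a radial density of cec positions about the pore axis can be estimated by histogramming the in-plane distance from the axis and dividing each count by the area of its annulus, so that a uniform distribution gives a flat profile. the sketch below makes several assumptions that are not stated in the source: the pore axis passes through the origin of the xy-plane, the normalization is per unit area, and the function name and bin settings are hypothetical.

```python
import math


def radial_density(xy_positions, r_max, n_bins):
    """Area-normalized radial density of points about the z-axis.

    xy_positions : iterable of (x, y) CEC coordinates, pore axis at (0, 0)
    r_max        : largest radius considered
    n_bins       : number of equal-width radial bins
    Returns one density per bin; sum(density * annulus_area) equals 1 when
    at least one point lies inside r_max.
    """
    width = r_max / n_bins
    counts = [0] * n_bins
    for x, y in xy_positions:
        r = math.hypot(x, y)
        if r < r_max:
            counts[min(int(r / width), n_bins - 1)] += 1
    total = sum(counts)
    density = []
    for k, count in enumerate(counts):
        # Area of the annulus between radii k*width and (k+1)*width.
        area = math.pi * ((k + 1) ** 2 - k ** 2) * width ** 2
        density.append(count / (total * area) if total else 0.0)
    return density


# Illustrative (made-up) points: one near the axis, three near the edge.
profile = radial_density([(0.1, 0.0), (1.5, 0.0), (1.6, 0.0), (0.0, 1.7)], 2.0, 2)
```

without the area normalization, outer bins would appear more populated simply because they cover more area, which would exaggerate any preference for the pore edge.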
this stabilization of excess charge relies on flexible water structures and hydrogen-bond interactions that can undergo minor changes to accommodate the proton, suggesting that the adamantyl-amine inhibitors are acting as hydronium mimics: they take advantage of these inherent features to help solvate the charged ammonium group. the identification of other regions of the channel with increased ability to stabilize an excess charge, such as areas of increased proton density or significantly flexible water interactions, could help provide new targets for drugs to act as hydronium-mimics. pore shape and stability near ser are ideal for adamantane binding. another hypothesis about the adamantyl-amine class of inhibitors is that adamantane is effectively spherical and can freely rotate within the channel, but has no rotatable bonds, which minimizes the entropy lost upon binding. this rapid rotation can be seen on the nmr time scale and is consistent with the recent thomaston et al. crystallographic studies, in which the motion was indirectly inferred. nevertheless, its significance depends on the dynamic nature of the channel: if the protein exhibits great structural fluctuations in the region where the drug binds, then drug binding may induce changes that greatly decrease the entropy and this hypothesis would not fully explain the drugs' efficacy. to better understand how the channel's natural dynamics may lend themselves to favorable drug binding, we examined the pore shape throughout our trajectories. as an estimate of the asymmetry of the channel, we calculated the eccentricity, which is essentially a measure of how "circular" a given oval is. the eccentricity is defined as: $$e = \sqrt{1 - \frac{b^{2}}{a^{2}}}$$ where a and b are the semi-major and semi-minor axes, respectively, which we approximate by the distance between alpha-carbons on opposing helices. a schematic of this is shown in si figure . eccentricity can have values between 0 and 1, with 0 indicating a circle and 1 indicating a parabola. 
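as a worked example of the pore eccentricity, the sketch below uses the standard ellipse formula e = sqrt(1 - b^2/a^2) and takes the two opposing-helix alpha-carbon distances as the axis lengths. treating the longer distance as the major axis is an assumption of this sketch, and the function name and the distances in the usage lines are hypothetical.

```python
import math


def pore_eccentricity(dist_ac, dist_bd):
    """Eccentricity of an ellipse whose axes are the two opposing-helix
    alpha-carbon distances (helices a-c and b-d) at one residue.

    The longer distance is taken as the major axis. Only the ratio b/a
    enters the formula, so full distances and semi-axes give the same value.
    """
    a = max(dist_ac, dist_bd) / 2.0  # semi-major axis
    b = min(dist_ac, dist_bd) / 2.0  # semi-minor axis
    return math.sqrt(1.0 - (b / a) ** 2)


# Placeholder distances in angstrom, for illustration only.
round_pore = pore_eccentricity(10.0, 10.0)
oval_pore = pore_eccentricity(10.0, 6.0)
```

applying the function frame by frame and then binning by proton position would give eccentricity as a function of the cec coordinate, which is how the quantity is discussed in the text.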
the eccentricity results are shown in figure . eccentricity maximum, minimum, and rmsd values are reported in si table . the pore-lining residues in the bottom half of the channel, gly , his , and trp , all show a greater degree of asymmetry and a wider range of eccentricity values, dependent on the excess proton position, than the pore-lining residues in the top part of the channel. interestingly, proton entry at cecz = - Å has a pronounced effect on the channel near trp , greater than that when the proton nears the center of the channel. ser , however, has overall the smallest average eccentricity and the lowest minimum value of the pore-lining residues during pt in this portion of the channel. additionally, ser and val have smaller proton-position-dependent changes in eccentricity than the pore-lining residues in the bottom half of the channel. this result indicates that the ser region is the most symmetrical and stable in the channel. while analyzing the alpha-carbon distances and eccentricity, we also examined the correlation between these alpha-carbon distances on opposing helices, shown in figure , to gain further insight into protein motion and conformational fluctuations on the nanosecond timescale. the motions captured here are equilibrium fluctuations in the + , inward-closed state, not necessarily motions driving the transition between inward-open and inward-closed. the calculated correlations indicate that the channel's equilibrium structural fluctuations are dominated by alternating inward-outward motions of opposing helices. at each pore-lining residue, the distances are negatively correlated; that is, when helices a and c move farther apart, helices b and d move closer together, and vice-versa. 
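the correlation analysis of opposing-helix distances amounts to a pearson correlation matrix over per-frame distance series. the sketch below computes the coefficients with the standard library only; the series labels and values are made up, and the significance filter mentioned for the corresponding figure is not implemented here (in practice, `scipy.stats.pearsonr` returns both the coefficient and a p-value).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series.

    Raises ZeroDivisionError if either series has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Pairwise Pearson coefficients for a dict of label -> distance series."""
    labels = list(series)
    matrix = [[pearson_r(series[p], series[q]) for q in labels] for p in labels]
    return labels, matrix


# Illustrative (made-up) distances: a-c widens exactly as b-d narrows.
labels, matrix = correlation_matrix({
    "res a-c": [10.0, 11.0, 12.0, 11.0, 10.0],
    "res b-d": [12.0, 11.0, 10.0, 11.0, 12.0],
})
```

a strongly negative off-diagonal entry, as in this toy input, corresponds to the alternating inward-outward motion of opposing helix pairs described in the text.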
gly is known to be the hinge point whose kinking controls the large structural change between inward-open and inwardclosed conformations, which may falsely lead to the conclusion that the conformational fluctuations at equilibrium above and below gly are decorrelated, with a stable core centered at gly . interestingly, however, the motions at gly are strongly and similarly correlated with the motions at both ser and his . this correlation indicates that in this fixed charge state, the gly kink is relatively rigid. instead, there is a noticeable lack of correlation between val and the other porelining residues, suggesting that there is a secondary, minor "hinge" between val and ser that decorrelate the inwardoutward motions between the helices above and below this point. this natural hinge observed near val furthers our understanding of val acting as a secondary gate that opens to allow proton and water entry into the channel. , in our simulations, this valve can readily hydrate, particularly in the presence of a nearby excess proton. moreover, it is frequently closed, which may make passage of a hydrated sodium or chloride ion more difficult. this aspect of the val gate and its relevance for pt and proton selectivity is likely an important feature of the m channel and could be further explored in future work. given that the adamantane group of the drug is centered in the ser tetrad plane, we hypothesize that these facets of the channel's dynamics are critical for fully explaining the drugs' favorable binding. because of the more circular shape of the pore at the ser tetrad, the spherical adamantane group can fit snugly under the hydrophobic val cleft and block pt. additionally, the relative stability of the pore in the region of drug binding helps explain why drug binding is thermodynamically favorable. 
because the channel exhibits smaller structural fluctuations here than in other regions of the channel, the adamantane-based drugs are able to bind with minimal loss of entropy as the channel does not need to lose flexibility to create a stable drug-binding interface. this has important implications for designing drugs that use scaffolds different from the adamantane group, or that interact with drug-resistant mutants such as s n. while drug binding to more flexible regions of the channel is possible, only modest changes in potency are often observed despite large changes in the size of the drugs. this is likely because of the need to counter the greater loss of entropy resultant from structural changes and reduced fluctuations.

figure . pearson correlation coefficients of the distances between alpha-carbons on opposing helices, for all pore-lining residues, when excess proton cecz=- . Å. each row and column correspond to a specific residue and distance, as labeled, where 'a-c' is the distance between the alpha-carbons of helices a and c, and 'b-d' is that of helices b and d. only those values with a p-value < . are shown; any other values are set to . .

altogether, we have shown how the adamantyl-amine inhibitors of m are suited to exploit various inherent features of the m channel that naturally facilitate proton transport, further supporting the claim that they function as mechanism-based inhibitors. the ability of hydrogen bond interactions to flexibly respond to the hydrated excess proton cec, measured in both hydrogen bond occupancy and residence times, indicates how the channel is suited to stabilizing an excess charge near ammnz. thus, the ammonium group of these inhibitors can act as a hydronium-mimic by binding in this region. we also analyzed the pore shape throughout the channel by calculating the eccentricity of the pore based on alpha-carbon distances. 
the results from these calculations indicate that the drug binding pocket is an especially stable and symmetrical portion of the channel, conducive to binding a roughly spherical drug. finally, by examining the correlations between these distances, we found an additional minor hinge point towards the top of the channel which may be a relevant feature for future drug design efforts. understanding these features as they relate to drug binding gives further insight into the specific interactions that stabilize drug binding, and could help inform drug design efforts by highlighting other aspects of the m channel and proton transport mechanism that a drug could take advantage of. through this understanding, we hope future drug-design efforts can take advantage of this approach to methodically create new inhibitors for the more prevalent mutant strains. with the recent publication of high resolution influenza b m (bm ) structures, we hope similar studies to elucidate the detailed pt mechanism in bm will be conducted to guide drug design in this functionally similar protein. this work also shows how similar analyses to understand the details of explicit proton transport mechanisms (not those inferred by water structures alone) could be used in other systems, and extended to ion transporters such as the sars-cov- viroporins, to help inform mechanism-based inhibitor design. elucidating the inherent features of drug-targetable proton transporters, such as flexible water and hydrogen bonding interactions, preferred proton positions, dynamic pore shapes, and structural fluctuations, can help guide the design of drug scaffolds and added substituents. the ms-rmd simulation methodology utilized in this work has also made these studies possible both for m and other important drug targets. supporting information. a table of average, maximum, minimum, and rmsd for eccentricity is included in the si. this material is available free of charge via the internet at http://pubs.acs.org. 
corresponding author * gavoth@uchicago.edu the authors declare no competing financial interest. voltage-gated proton channels and other proton transfer pathways a protonmotive force drives atp synthesis in bacteria coupling of phosphorylation to electron and hydrogen transfer by a chemi-osmotic type of mechanism chapters and chance and design--proton transfer in water, channels and bioenergetic proteins importance of ph homeostasis in metabolic health and diseases: crucial role of membrane proton transport lysosomal storage disease upon disruption of the neuronal chloride transport protein clc- (seasonal) (accessed march , ). . world health organization the m proton channels of influenza a and b viruses selective proton permeability and ph regulation of the influenza virus m channel expressed in mouse erythroleukaemia cells ion channel activity of influenza a virus m protein: characterization of the amantadine block viroporins: structure, function and potential as antiviral targets sars coronavirus e protein forms cation-selective ion channels severe acute respiratory syndrome-associated coronavirus a protein forms an ion channel and modulates virus release activation of the m ion channel of influenza virus: a role for the transmembrane domain histidine residue histidines, heart of the hydrogen ion channel from influenza a virus: toward an understanding of conductance and proton selectivity structure and mechanism of the m proton channel of influenza a virus mechanisms of proton conduction and gating in influenza m proton channels from solid-state nmr effect of cytosolic ph on inward currents reveals structural characteristics of the proton transport cycle in the influenza a protein m in cell-free membrane patches of xenopus oocytes acid activation mechanism of the influenza a m proton channel put a cork in it: plugging the m viral ion channel to sink influenza in vitro pharmacokinetic optimizations of am -s n channel blockers led to the discovery of slow-binding 
inhibitors with potent antiviral activity against drug-resistant influenza a viruses exploring the requirements for the hydrophobic scaffold and polar amine in inhibitors of m from influenza a virus identification of hits as matrix- protein inhibitors through the focused screening of a small primary amine library interpreting thermodynamic profiles of aminoadamantane compounds inhibiting the m proton channel of influenza a by free energy calculations hydrogen-bonded water molecules in the m channel of the influenza a virus guide the binding preferences of ammonium-based inhibitors computationally efficient multiconfigurational reactive molecular dynamics computationally efficient multiscale reactive molecular dynamics to describe amino acid deprotonation in proteins understanding the essential proton-pumping kinetic gates and decoupling mutations in cytochrome c oxidase proton movement and coupling in the pot family of peptide transporters modulating the chemical transport properties of a transmembrane antiporter via alternative anion flux exchange pathways in clc-ec antiporter proton-induced conformational and hydration dynamics in the influenza a m channel high-resolution structures of the m channel from influenza a virus reveal dynamic pathways for proton stabilization and transduction a second generation multistate empirical valence bond model for proton transport in aqueous systems a guide to numpy {vmd} --{v}isual {m}olecular {d}ynamics matplotlib: a d graphics environment the chemical and dynamical influence of the anti-viral drug amantadine on the m proton channel transmembrane domain computational study of drug binding to the membrane-bound tetrameric m peptide bundle from influenza a virus exploring the size limit of templates for inhibitors of the m ion channel of influenza a virus undecane derivatives: from wild-type inhibitors of the m ion channel of influenza a virus to derivatives with potent activity against the v a mutant expeditious lead 
optimization of isoxazole-containing influenza a virus m -s n inhibitors using the suzuki-miyaura cross-coupling reaction key: cord- -mhgnqi authors: shin, donghyuk; bhattacharya, anshu; cheng, yi-lin; alonso, marta campos; mehdipour, ahmad reza; van der heden van noort, gerbrand j.; ovaa, huib; hummer, gerhard; dikic, ivan title: novel class of otu deubiquitinases regulate substrate ubiquitination upon legionella infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mhgnqi legionella pneumophila is a gram-negative pathogenic bacterium that causes legionaries’ disease. the legionella genome codes more than effector proteins able to modulate host-pathogen interactions during infection. among them are also enzymes altering the host-ubiquitination system including bacterial ligases and deubiquitinases. in this study, based on homology-detection screening on legionella effector proteins, we identified two legionella otu-like deubiquitinases (lot; lotb (lpg /ceg ) and lotc (lpg ), lota (lpg /lem ) is already known). a crystal structure of lotc catalytic core (lotc - ) was determined at . Å and compared with other otu deubiquitinases, including lotb. unlike the classical otu-family, the structures of legionella otu-family (lotb and lotc) shows an extended helical lobe between the cys-loop and the variable loop, which define a novel class of otu-deubiquitinase. despite structural differences in their helical lobes, both lotb and lotc interact with ubiquitin. lotb has an additional ubiquitin binding site (s ’) enabling specific cleavage of lys -linked poly-ubiquitin chains. by contrast, lotc only contains the s site and cleaves different species of ubiquitin chains. ms analysis of catalytically inactive lotb and lotc identified different categories of host-substrates for these two related dubs. 
together, our results provide new structural insights into bacterial otu deubiquitinases and indicate distinct roles of bacterial deubiquitinases in host-pathogen interactions.

ubiquitination, a well-studied post-translational modification system, regulates the fate of various substrates by tagging them with ubiquitin (yau and rape,

linkage specificity of the otu family relies on one of the following mechanisms: ) additional ubiquitin-binding domains, ) ubiquitinated sequences in the substrates or ) defined s ' or s ubiquitin-binding sites (mevissen et al., ). to define the minimal otu domain for biochemical and structural studies, we designed several constructs and tested their activity against the di-ub panel (fig. a, b). while lotc retained its activity with the predicted otu domain ( - ), lotb lost its activity after deletion of amino acids ( - ) located at the c-terminus beyond the predicted otu domain ( - ). based on the lotb structure (pdb: ks , (ma et al., )), we assumed that this extra helical region might be required for the additional ubiquitin-binding site (s ') to accept the distal ubiquitin moiety from k ub (fig. c). to understand the detailed mechanism of linkage specificity of lotb and lotc at the molecular level, we determined the crystal structure of the catalytic domain of lotc (lotc - ) at . Å (fig. d

features in the s ubiquitin-binding site (fig. c, d and table ). whereas the overall fold of the catalytic core of lotb and lotc resembles that of other otu deubiquitinases, both showed clear differences in the helical arm region, which has been shown to interact with ubiquitin and to serve as an s binding site (mevissen et al., ). the structure and sequence alignment with other otus clearly showed that both lotb and lotc contain a relatively long insertion between the cys-loop and the variable loop, compared to other otu members (fig. e).
the typical length of the helical lobe of the known otus ranges from to amino acids (except for the otubain family, which contains - amino acids), while lotb and lotc contain and amino acids, respectively. based on this observation, we wondered whether lota, another legionella otu deubiquitinase (kubori et al., ), also contains a longer insertion in the same region. based on the catalytic cysteine and histidine residues of the two otu domains of lota (hermanns and hofmann, ), we analyzed the sequence and found that both otu domains of lota also contain the longer insertion between the cys-loop and the variable loop ( and amino acids, respectively; fig. e). together, our results identify lot-dubs as a novel class of the otu family with longer insertions in the helical lobe region (supplementary fig. a).

novel structural fold of s -ubiquitin binding sites on legionella otus

both lotb and lotc have extended helices, specifically near the s ubiquitin-binding site, and we wondered how these regions interact with ubiquitin. to address this, we performed ubiquitin docking into both lotb and lotc, followed by molecular dynamics (md) simulations for ns ( fig. a-

to gain better insights into the physiological roles of lotb and lotc, we decided to identify their interacting proteins or substrates. first, to enrich for the interacting partners, catalytically inactive lotb or lotc was expressed in cells and immunoprecipitated from cell lysates. ubiquitin (uba ) was strongly enriched with both catalytically inactive lotb and lotc (fig. a, c). ms analysis revealed that lotb mainly interacts with membrane protein complexes (copb , atp b, atp h, cox a, sec b). we also found interactions with some er-resident proteins (calnexin (canx), ddost, stt a). by contrast, most of the enriched proteins from the inactive lotc pull-down were non-membrane-bound organelle- and ribosome-related proteins (rps , rplp , rps , rplp , rpl ) (fig. a, c).
to further understand this, we sought to determine the cellular localization of both dubs (fig. b, d). consistent with a recent publication, lotb specifically co-localized with the er marker protein calnexin, but not with other organelle markers (tomm and gm for mitochondria and golgi, respectively; fig. b), and the otu domain itself failed to localize to the er (supplementary fig. a). by contrast, we could not find a specific cellular localization of lotc (fig. d). next, to gain more insights into the functional roles of lotb and lotc, we decided to explore combinatorial ubiquitination events with other ubiquitin-related

lotc. together, these findings establish important guidance on how to screen for more dubs in other pathogenic bacteria or viruses, and how to characterize their physiological roles during infection. we also showed that the two legionella otus have different ubiquitin-binding modes that enable them to cleave specific ubiquitin chains. with ubiquitin activity-based probes (prg-, vme-probes), we showed that lotb contains an extra ubiquitin-binding site (s ') and is specific to k -linked ubiquitin chains, whereas lotc cleaves different types of ubiquitin chains. interestingly, we observed a modification of lotb with nedd -prg abp. further studies on neddylated proteins with lotb will give us more insights into the dual activity of lotb. in contrast, we could not detect a modification of lotc by nedd -abp, and we reasoned that the arg on ubiquitin, which is replaced by alanine in nedd , is important to locate the c-terminus of

of gst-tagged protein was incubated hour with glutathione-s-sepharose pre-equilibrated with washing buffer ( mm tris-hcl ph . , mm nacl, mm dtt) and non-specific proteins were cleared by washing. gst-proteins were eluted with elution buffer ( mm tris-hcl ph . , mm nacl, mm dtt, mm reduced glutathione) and buffer-exchanged to storage buffer ( mm tris-hcl ph . , mm nacl, mm dtt).
for his-tagged proteins, the supernatant was incubated with ni-nta pre-equilibrated with washing buffer ( mm tris-hcl, ph . , mm nacl, mm imidazole) for hours and eluted with elution buffer ( mm tris-hcl, ph . , mm nacl, mm imidazole), and the buffer was exchanged to storage buffer. for lotc - , instead of using the elution buffer, glutathione beads were incubated with sfgfp-tev protease

the program requires two protein structures as inputs, which were prepared by running the refinement protocol before the docking step. we performed the local docking approach and generated independent structures for each complex. the complexes generated in this way were subjected to local refinement to remove remaining small clashes. the complexes were then clustered based on the distance matrix of cα atoms between the ligase and ubiquitin using the k-means method. the representatives of the two major clusters in each case were selected based on the interface score (i_sc), which represents the energy of the interactions across the interface of the two proteins.

these tables: table .
top values are obtained from hhpred server (mpi bioinformatics toolkit)

[figure residue: activity-based probe schematics (prg-abp, vme-abp) with s ', s and s site labels; panels titled "lotc-host proteins interactome" and "lotb-host proteins interactome", with annotated categories "non-membrane-bounded organelle", "structural constituent of ribosome" and "mitochondrion"; individual protein labels not recoverable]

insights into catalysis and function of phosphoribosyl-linked serine ubiquitination
otulin antagonizes lubac signaling by specifically hydrolyzing met -linked
legionella translocates an e ubiquitin ligase that has multiple u-boxes with distinct functions
lota, a legionella deubiquitinase, has dual catalytic activity and contributes to intracellular growth
distinct deubiquitinase class important for genome stability
crystal structures of two bacterial hect-like e ligases in complex with a human e reveal atomic details of pathogen-host interactions
a compact viral processing proteinase/ubiquitin hydrolase from the otu family
deubiquitinase ceg regulates the association of lys- -linked polyubiquitin molecules on the legionella phagosome
molecular basis of lys -polyubiquitin specificity in the deubiquitinase cezanne
a native chemical ligation handle that enables the synthesis of advanced activity-based probes: diubiquitin as a case study
polymorphic transitions in single crystals: a new molecular dynamics method
cellular quality control by the ubiquitin-proteasome system and autophagy
the molecular basis for ubiquitin and ubiquitin-like specificities in bacterial effector
ubiquitination independent of e and e enzymes by bacterial effectors
molecular characterization of lubx: functional divergence of the u-box fold by
stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and lc/ms sample pretreatment in
the merops database of proteolytic enzymes, their substrates and inhibitors in and a comparison with peptidases in the panther database
interferon-inducible antiviral effectors
rhogdi using a family of "parallel" expression vectors
serine ubiquitination by deubiquitinases dupa and dupb
covalent inhibition of sumo and ubiquitin-specific cysteine proteases by an in situ thiol-alkyne addition
decision-making in structure solution using bayesian estimates of map quality: the phenix autosol wizard
the perseus computational platform for comprehensive analysis of (prote)omics data
deubiquitinase function of arterivirus papain-like protease suppresses the innate immune response in infected host cells
evidence for bidentate substrate binding as the basis for the k linkage specificity of otubain
insights into the ubiquitin transfer cascade catalyzed by the
charmm-gui membrane builder toward realistic biological membrane simulations
a novel method for high-level production of tev protease by superfolder gfp tag
the increasing complexity of the ubiquitin code
lupas an, alva v hhpred server at its core

key:
cord- -c i btp authors: li, maohua; zhao, rongqing; chen, jianxin; tian, wenzhi; xia, chenxi; liu, xudong; li, yingzi; yan, yuyuan; li, song; sun, hunter; shen, tong; ren, wenlin; sun, le title: next generation of anti-pd-l atezolizumab with better anti-tumor efficacy in vivo date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: c i btp some cancer patients treated with atezolizumab, a pd-l antibody drug launched by genentech, quickly developed anti-drug antibodies (ada), leading to loss of efficacy. this was likely due to heavy aggregation of atezolizumab, caused by the n a mutation introduced to remove the unwanted antibody-dependent cytotoxicity (adcc) of the igg antibody drug. here, we developed a new version of atezolizumab (maxatezo), which demonstrated better anti-tumor efficacy in vivo. in atezolizumab, we mutated a back to n to restore the glycosylation, and inserted a short sequence, gggs, between g and g in the hinge region of the igg heavy chain. our data show that insertion of gggs, without altering the anti-pd-l antibody affinity and inhibitory activity, completely abolished the adcc activity, just as in atezolizumab. moreover, the insertion of gggs, without altering the glycosylation profile of igg , increased the yield of anti-pd-l antibody considerably. additionally, glycosylation improved the stability and reduced the amount of aggregates in the antibody solutions. in turn, the level of ada in animals treated with maxatezo was % lower than in those treated with atezolizumab. most importantly, at the same mg/kg dose, the anti-tumor activity of maxatezo reached %, compared with % for atezolizumab.

in recent years, with the deepening of the research on the mechanism of tumor immune escape

melanoma failed in clinical phase iii), and the approved indications were far fewer than those of rival products in the same field. therefore, in terms of market occupancy and annual sales, it lags far behind keytruda and opdivo.
it is well known in antibody manufacturing that incomplete glycosylation leads to aggregation of antibodies, which in turn induces strong ada in treated patients. aglycosylation of the antibody makes matters even worse. atezolizumab's drug label includes warnings not to shake the vial and to discard the drug if the solution turns cloudy. in its pre-clinical and clinical studies, drug-treated monkeys developed ada at a % rate, while in cancer patients whose immune systems had been severely damaged by chemo- or radiotherapy, ada developed at a . % rate. the fast development of neutralizing ada forced the dosage of atezolizumab to be escalated to a record high of mg per injection, and it still failed to reach the end-points of several phase iii clinical trials. therapeutic antibodies have different mechanisms of action, such as ) neutralizing antibodies that block the target/pathogen's biological activities, ) clearing antibodies that act through antibody-dependent cell phagocytosis (adcp) to remove the target/pathogen from the body, or ) targeting antibodies that act through antibody-dependent cytotoxicity (adcc) to recruit natural killer (nk) cells and other effector t cells to kill the pathogens or tumor cells. based on the intended mechanism of action, researchers and drug developers may choose to utilize different isotypes. most antibody drugs use the igg , igg and igg isotypes, while igg is avoided due to its instability. the sequence homology between igg , igg and igg is more than %, with the primary differences resting within the hinge region and ch domain, which contain the binding sites for different fcγrs. fcγr engagement is essential for the fc functions of iggs. binding of the antibody to its antigen can change the conformation of the fc region to expose the binding sites for fcγrs, which in turn can activate adcc and/or adcp activity. the human fcγr family consists of the activating receptors fcγri, fcγriia, and fcγriiia, and the inhibitory receptor fcγriib.
fcγri and fcγriia are expressed by macrophages and are involved in adcp function, and fcγriiia, which is expressed on nk cells, is important for adcc function. for some antibodies targeting specific antigens on the surface of tumor cells, adcc-mediated effects allow nk cells to effectively kill tumor cells. for development of these types of therapeutic antibodies, igg isotypes with strong adcc functions are preferred, and some of the antibodies are even modified to further enhance their adcc effects. for blocking antibodies, such as antibodies targeting soluble cytokines (tnf alpha and il a) or some immune checkpoints (ctla- , pd- , pd-l ), the ability to bind to fcγr is not desirable and the cytotoxicity brought about by adcc/adcp should be prevented. the binding affinity of igg to fcγrs on effector cell surfaces is highly dependent on the n-linked glycan at asparagine (n ) in its ch domain, with a loss of binding to the fcγrs observed in n a point mutants, after enzymatic fc deglycosylation, on recombinant igg expression in the presence of the n-linked glycosylation inhibitor tunicamycin, or on expression in bacteria. in addition, the nature of the carbohydrate attached to n modulates the affinity of the fcγr interaction as well. aglycosylation of igg has also been used to completely remove the unwanted adcc/cdc. the fda-approved anti-pd-l antibody drug atezolizumab is a human igg rendered aglycosylated by the n a mutation. structures of the igg fc region show that the oligosaccharide attached to n is hidden in the cavity between the ch domains of the two heavy chains. when antibodies bind to the antigen, there are conformational changes in the fv regions, which result in a domino effect via the hinge region to the remaining fc region, leading to a cascade of conformational changes including the exposure of the oligosaccharides attached to n and the formation of the fcγr binding domain.
then, the antigen-bound antibody binds to the fcγr on the surfaces of effector cells, leading to adcc activation. in theory, if one could block the transmission of the conformational change signal from the fv region to the fc region, this should also prevent the formation of the fcγr binding site, and thus prevent adcc activation. in this paper, based on knowledge of an antibody's d conformational changes after antigen binding, we propose a novel way to design antibody drugs without adcc function by simply inserting a flexible amino acid sequence such as gggs into the hinge region of the antibody's fc region. our data show that such an insertion completely abolished the antibody's adcc. the effects on binding affinity (ec ), inhibitory activity (ic ), the glycosylation profile, expression level, stability, immunogenicity and anti-tumor activity were also examined. based on the structural information acquired from different antibodies in the protein data bank (e.g., pdb id: igt), we hypothesized that an insertion of a short but very flexible sequence in the hinge region or somewhere upstream of the glycosylation site of n may cut off the stress transmission signal between the fv and fc domains. in this study, the short sequence gggs was chosen and inserted between g and g of the human igg heavy chain. the gggs sequence has been used in many approved biological drugs, such as scfv and fc-fusion proteins, as a flexible linker without any known adverse effects or new immunogenicity in patients. for reverse engineering of atezolizumab, a back mutation of a n was also introduced into the original heavy chain to restore the glycosylation (fig. s ). to evaluate whether there would be any negative impact on antibody affinity, we compared their affinities for recombinant pd-l using an indirect elisa. as shown in fig. a, both atezolizumab and maxatezo bind to pd-l in a dose-dependent manner, with ec between . ~ . nm.
no negative impact of the insertion on antibody affinity was observed.

clearly, the insertion of gggs between g and g of the human igg heavy chain showed no significant negative impact on the antibody's affinity and inhibitory activity. next we studied whether the insertion indeed removed the adcc of human igg or not.

for therapeutic application, the recombinant antibody needs to be produced using a stable mammalian cell system at a reasonably high expression level. stable expression cho cell pools were established for both atezolizumab and maxatezo. two production runs for each antibody were carried out using l and/or l bioreactors with the same manufacturing parameters. on day , the culture supernatants were collected and the antibody concentrations in the culture supernatants were measured. as shown in table , the expression levels of maxatezo by stable cho cell pools were between . ~ . g/l, which met the industry expectation of g/l or higher. however, the expression levels of atezolizumab were approximately . g/l, much lower than that of maxatezo. possibly, aglycosylation of atezolizumab leads to incorrect folding of the antibody, which in turn decreases secretion. after cloning and selection, the expression level of maxatezo by the cho monoclonal cell line increased to . g/l. our data suggest that insertion of gggs does not have a negative impact on recombinant antibody expression, and that restoring the glycosylation actually helps to improve the production level.

high molecular weight (hmw) aggregates of antibody drugs are a major cause of anti-drug antibodies (ada). aglycosylation of igg could make the aggregation even worse. to examine the levels of aggregates in the final drugs, maxatezo and atezolizumab were separated by hplc-sec. the percentages of hmw in the final drugs were estimated to be less than . %, based on the peak areas of monomer and hmw, which appears to meet the quality requirement for antibody drugs (fig. , table ).
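as a rough illustration of the peak-area estimate described above, the following python sketch computes an hmw percentage from integrated sec peak areas. it assumes that only the monomer and hmw peaks contribute to the total, as the text states; the function name and the example areas are hypothetical and are not values from the study.

```python
def hmw_percent(monomer_area: float, hmw_area: float) -> float:
    """Return the share of total integrated peak area found in the HMW peak, in percent."""
    total = monomer_area + hmw_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * hmw_area / total


# Hypothetical peak areas in arbitrary absorbance units (not study data).
example = hmw_percent(monomer_area=9900.0, hmw_area=100.0)
print(f"HMW: {example:.2f}%")
```

a ratio of this kind only reflects material that actually reaches the detector, so aggregates retained before the column would not be counted in it.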
however, when we took a close look at the areas for the monomers, atezolizumab had an absorbance of , , units, % less protein than maxatezo's , , units, even though the loading amounts were the same. it is possible that atezolizumab contains a significant amount of large aggregates, which were removed by the pre-filter of the hplc column.

not only is glycosylation important for the correct conformation, secretion and stability of a protein, but its carbohydrate moiety also plays a major role in the protein's affinity for its receptor(s). therefore, it is important to confirm that the insertion does not result in significant changes to the glycosylation profile. both maxatezo and atezolizumab were subjected to digestion with pngase f to release the glycans from the antibodies. the enzymatic products were labeled with -aminobenzamide ( -ab), then separated by hydrophilic interaction chromatography (hilic) with a fluorescence detector. while no glycan was detected in the sample digested from atezolizumab, glycans of different sizes were observed in the sample prepared from maxatezo (fig. s ). the glycan profiles of different clones of maxatezo were further analyzed. as shown in table and figure , all clones exhibited similar glycan composition profiles and the distribution of the glycoforms appeared to be normal. our data indicate that the insertion of gggs does not change the glycan profile of human igg .

thermal stability is one of the key factors for antibody drug development and it is highly influenced by the status of glycosylation. to evaluate the stability of maxatezo and atezolizumab, both antibodies were heated at for minutes, then centrifuged at rpm for minutes, and filtered using . μm filters. samples were taken at each step and the protein concentrations were measured at od using a uv spectrophotometer. as shown in table , the concentration of maxatezo decreased by only % (from . mg/ml to . mg/ml) after the treatments, within the range of loss by filtration.
however, the concentration of aglycosylated atezolizumab decreased from . mg/ml to . mg/ml, which constitutes a loss of approximately %. clearly, maxatezo with the gggs insertion is much more stable than atezolizumab.

in vivo immunogenicity assessment was carried out to compare the anti-drug antibody (ada) titers of atezolizumab and maxatezo in mice. as shown in fig. , atezolizumab induced extremely high titers of ada in mice, in line with what was observed in healthy monkeys and human cancer patients. as glycosylation had been restored, the ada titers of maxatezo were % lower than those of the aglycosylated atezolizumab.

in this study, the in vivo therapeutic efficacy of the test compounds maxatezo and atezolizumab was evaluated in the mc mouse colorectal cancer model in c bl/ j mice. as shown in fig. , maxatezo ( mg/kg, q d x ), maxatezo ( . mg/kg, q d x ), maxatezo ( mg/kg, q d x ), atezolizumab ( . mg/kg, q d x ) and atezolizumab ( mg/kg, q d x ) treatments produced tgi values of %, %, %, % and %, respectively. tumors in the treatment groups were all significantly smaller than those in the human igg group (p< . ). high-dose maxatezo treatment at mg/kg resulted in significantly smaller tumors compared with maxatezo at mg/kg and atezolizumab at mg/kg (p< . ). complete tumor regression was observed in , , , , mice from the maxatezo groups ( mg/kg, . mg/kg, mg/kg) and atezolizumab groups ( . mg/kg, mg/kg), respectively. all mice in this study remained in good condition, with no obvious abnormality, dosing holiday or death during the treatment. in summary, maxatezo and atezolizumab produced significant antitumor activity in the mc mouse colorectal cancer model. maxatezo ( mg/kg) inhibited tumor growth significantly better than atezolizumab at the same dosing level and frequency of administration. tumor-bearing mice showed good tolerance to continuous administration of maxatezo and atezolizumab in this experiment.
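the tgi values reported above follow from the tumor volume, relative tumor volume (rtv) and t/c definitions given in the methods. the python sketch below walks through that arithmetic. because the numeric constants are not legible in this copy, it assumes the commonly used forms volume = 0.5 × long diameter × short diameter² and tgi% = (1 − t/c) × 100; these defaults, the function names and the example numbers are all hypothetical and should be checked against the original methods.

```python
from statistics import mean


def tumor_volume(long_d: float, short_d: float,
                 coeff: float = 0.5, short_exp: int = 2) -> float:
    """Tumor volume as coeff * long * short**short_exp (coeff and exponent are assumed)."""
    return coeff * long_d * short_d ** short_exp


def rtv(volume_after: float, volume_predose: float) -> float:
    """Relative tumor volume: volume after administration divided by pre-dose volume."""
    return volume_after / volume_predose


def tgi_percent(treated_rtvs, control_rtvs) -> float:
    """Tumor growth inhibition, assuming TGI% = (1 - T/C) * 100 with T and C as mean RTVs."""
    t_over_c = mean(treated_rtvs) / mean(control_rtvs)
    return (1.0 - t_over_c) * 100.0


# Hypothetical volumes in mm^3 (not study data).
treated = [rtv(200.0, 100.0), rtv(300.0, 100.0)]
control = [rtv(900.0, 100.0), rtv(1100.0, 100.0)]
print(f"TGI: {tgi_percent(treated, control):.1f}%")
```

keeping the coefficient and exponent as parameters makes the assumed volume convention explicit and easy to change if the study used a different one.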
our data demonstrated that the insertion of gggs in the hinge region of human igg could abolish its adcc activity completely, without any negative impact on antibody affinity, inhibitory activity, expression level, stability or immunogenicity. the efficacy of maxatezo in tumor inhibition is much better than that of atezolizumab.

immunotherapy using monoclonal antibodies against pd , pd-l or ctla has proven effective in treating various cancers. the therapeutic fields for antibody drugs have also expanded beyond autoimmune disease, cancers and infectious diseases into chronic conditions such as pain, neurodegeneration, diabetes, and osteoporosis. to reduce immune-related adverse effects (irae), switching igg to igg /igg or aglycosylation of igg has been widely used to remove the fc function(s) of antibody drugs. however, igg has an extra cysteine residue upstream of the hinge region. as such, it will form homo- or hetero-dimers via the inter-igg disulfide bond, which will affect its expression level and stability. igg has reduced adcc but retains adcp activity. moreover, it is thought that the adcp activity of pembrolizumab reduces its tumor-killing potency through phagocytosis of nk cells. a new fc function removal technology based on structural biology is reported in this paper. a flexible sequence, such as gggs, was inserted into the hinge region of an igg isotype antibody to interrupt the stress signal transfer between the fv and fc regions. for this reason, the fcγr binding domain will not be exposed when the antibody binds to the target, leading to the loss of adcc activity. the rational design of human igg antibodies without adcc or other fc functions described in this study is a very promising approach to meeting the need for therapeutic antibodies without unwanted fc functions.
for well-known antibody drugs such as genentech's aglycosylated anti-pd-l atezolizumab, we demonstrated that inserting gggs in the hinge region of the human igg fc could remove the adcc activity completely. since this approach does not alter the fv domain of the antibody, we did not observe a negative impact on either the affinity or the inhibitory activity. the study took a physics-based approach to a biological problem. inspired by previous studies of the structural change of the fc hinge region after antigen binding, we simply inserted a flexible linker to stop the stress transmission from the fv to the fc of the antibody, and to prevent the exposure of binding sites for various fcγrs. leaving the glycosylation intact and making no additional changes in the remaining parts of the fc region not only resulted in much higher expression levels than the aglycosylated atezolizumab, but also improved stability. currently, the re-engineered anti-pd-l (maxatezo) has just been successfully manufactured and will proceed into pre-clinical studies shortly. as of june , there are antibody drugs on the market and more than in clinical trials with purposely reduced adcc and/or adcp activities. in separate study with re-engineering anti-

furthermore, in patients with sars-cov infection, the anti-spike igg causes severe acute lung injury through the fcγrs, which could skew alveolar macrophages from wound-healing to proinflammatory. we expect that our technology can also help researchers develop covid- therapeutic antibodies with much less proinflammatory activity.

the antibodies were expressed either by transient transfection or by stable cell pool, and cho-k cells were used for both routes. all the antibodies were purified with protein a sepharose.
to establish the stable cell pool, after the transfection, cho host cells were subjected to selection pressure to obtain stable mini cell pools, and the antibody expression level by fed-batch culture in a l bioreactor was evaluated. to obtain a monoclonal stable cell line, sub-cloning by limiting dilution was carried out from the selected stable mini pools. the top - clones were selected for further development.

the binding affinity was determined by indirect elisa against the antigen. each well of the -well high-binding eia plates was coated with µg/ml of antigen, such as recombinant pd-l , at overnight in pbs. after two washes with pbs and blocking with % skim milk in pbs for hour at room temperature, wells were incubated with purified antibody in % skim milk-pbs for another hour at room temperature. after two washes with pbs, wells were then incubated with hrp-conjugated goat anti-human igg fc-specific secondary antibodies (jackson lab) in % skim milk-pbs for hour at room temperature. after five washes with pbs plus . % tween (pbst), hrp substrate , ´, , ´-tetramethylbenzidine (tmb) solution was added. the reaction was stopped with stop solution ( . m h so ) after minutes and absorbance was measured at nm with a microplate reader.

as reported previously, cho-pd-l -cd l cells were seeded at , cells per well in a -well plate and incubated at , with % co for - hours. , jurkat-pd- -nfat cells were added to each well in the presence or absence of the test antibody and incubated at with % co for another hours. cells were lysed and μl of luciferase substrate (promega bio-glo™ luciferase assay) was added to each well, and the plate was read using a spectramax m to calculate the relative luciferase units.

using cfse to dye raji cells overexpressing target protein (such as pd-l , pd- or ctla-

for hplc-sec analysis of antibodies, the sample was diluted to . mg/ml with the mobile phase ( mm nah po , mm nacl, ph . ), and μl was loaded onto the tsk gel g swxl ( μm, .
× mm) column, which had been equilibrated with the same mobile phase. the run was performed at a flow rate of . ml/min for min, and the absorbance at nm was monitored.

antibodies were first digested with pngase f (new england biolabs) according to the manufacturer's instructions, and the free glycan(s) were separated using the ludgerclean™ eb kit according to the manufacturer's instructions. the glycan samples were labeled with -ab ( -aminobenzamide) using the ludgertag™ -ab ( -aminobenzamide) glycan labeling kit and separated with ludgerclean™ s cartridges. the -ab labeled glycan samples were analyzed by hplc with fluorescence detection.

as reported previously, balb/c mice were used to evaluate the ada titers of different antibodies. female balb/c mice, ~ weeks old, were first immunized with antibodies in complete freund's adjuvant and boosted with antibodies in incomplete freund's adjuvant. two to four weeks after the first immunization, tail bleeds from the immunized mice were tested for titers by indirect enzyme-linked immunoassay (elisa) against antibody drugs in the presence of % human sera.

the mc cell line was purchased from biovector ntcc inc. the tumor cells were maintained in vitro in dmem medium supplemented with % heat-inactivated fetal calf serum, u/ml penicillin and μg/ml streptomycin at °c in an atmosphere of % co in air. cells growing in exponential phase were harvested and counted for tumor inoculation. tumor cells were suspended in pbs to x /ml after two washes and were subcutaneously inoculated into the right flank of c bl/ j mice at µl/mouse. when the average tumor volume reached about mm , the mice were randomized based on their tumor volumes and test articles were administered to the mice according to the predetermined regimen shown in table s . after inoculation, the animals were checked daily for morbidity and mortality.
at the time of routine monitoring, the animals were checked for any effects of tumor growth and treatments on normal behavior such as mobility, food and water consumption, body weight gain/loss, eye/hair matting and any other abnormal effect. animal death and observable clinical signs will be recorded. the tumor volume was calculated according to formula: tumor volume = . × long diameter × short diameter , and the tumors were measured twice a week. the t/c values were calculated from the tumor volume, where t is the average relative tumor volume (rtv) of each test subject treated group, and c is the average rtv of the control group. rtv is the ratio of tumor volume after administration to pre-dose. tumor growth inhibition (tgi%) was calculated as ( -t/c) × %. the treatments with tumor growth inhibition ≥ % and statistically significant difference in tumor volume are considered effective. tumors were collected, photographed, weighed at study termination and tumor weight inhibition (%) was calculated as ( -average tw of each test subject treated group/average tw of the control group) × %. the relative percent of different glycan was verified. the horizontal coordinate is glycan isotype and the vertical coordinate is relative percent. immunogenicity of antibodies was assessed using in vivo mouse model and the ada titers were measured using indirect elisa against antibody drugs. the line with squares represents the maxatezo, (anti-pd-l antibody with gggs insertion), the line with rhombus represents the atezolizumab (anti-pd-l antibody with n a). the horizontal coordinate is antibody dilution ratio and the vertical coordinate is the absorbance value at nm. the horizontal coordinate is the days after tumor inoculation and the vertical coordinate is the tumor volume (mm ). note: the stable cell pools expressing either atezolizumab or maxatezo were seeded in bioreactors and the culture supernatants were harvested on day . 
the antibodies in the culture supernatants were separated by analytic hplc and the antibody concentrations were determined based on the areas of the peaks at od .
the authors declare that they are affiliated with four commercial entities with stakes in the data. patent of de-adcc pending.
key: cord- -sxw l tt authors: talbot, steven r.; struve, birgitta; wassermann, laura; heider, miriam; weegh, nora; knape, tilo; hofmann, martine c. j.; von knethen, andreas; jirkof, paulin; keubler, lydia; bleich, andré; häger, christine title: one score to rule them all: severity assessment in laboratory mice date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sxw l tt animal welfare and the refinement of experimental procedures are fundamental aspects of biomedical research. they provide the basis for robust experimental designs and reproducibility of results. in many countries, the determination of welfare is a mandatory legal requirement and implies the assessment of the degree of severity that an animal experiences during an experiment. however, for an effective severity assessment, an objective and exact approach is needed. in light of these demands, we have developed the relative severity assessment (relsa) score. this comprehensive composite score was established on the basis of physiological and behavioral data from a surgical mouse study. body weight, the mouse grimace scale score, burrowing behavior, and the telemetry-derived parameters heart rate, heart rate variability, temperature, and general activity were used to investigate how well each variable indicates severity during postoperative recovery. the relsa scores not only revealed individual severity levels but also allowed a comparison of severity in distinct mouse models addressing colitis, sepsis, and restraint stress using a k-means clustering approach with the maximum achieved relsa scores. we discriminated and classified data from sepsis nonsurvivors into the highest relative severity level. data from mice after intraperitoneal transmitter implantation and sepsis survivors were located in the next lower cluster, while data from mice subjected to colitis and restraint stress were placed in the lowest severity cluster.
analysis of individual variables and their combinations revealed model- and time-dependent contributions to severity levels. in conclusion, we propose the relsa score as a validated tool for objective real-time applicability in severity assessment and as a first step towards a unified and accessible risk assessment tool in biomedical research. as an effective severity assessment system, it will fundamentally improve animal welfare, as well as data quality and reproducibility. good science and high quality data derived from animal experiments in basic and translational research requires good animal welfare. consequently, researchers are obligated to ensure the best possible welfare of research animals, in line with the refinement principle in the rs , . therefore, determination of the welfare of animals under scientific procedures is embedded in many international animal protection guidelines and acts, e.g., the guide for the care and use of assessment in a realistic scenario, a selection of the best performing variables for a given model is desirable. we can clearly show that some single variables or combinations outperform others in this study. however, this outperformance differs from day to day. each variable can assume a state of (not chosen) or (chosen). since there are eight variables, a total of = combinations were analyzed. therefore, we tested the possible variable combinations within the pooled tm data and calculated the relsa scores for each day. the relsa scores were summed in a by matrix during the iteration steps. finally, the individual sums were averaged using the total number of analyzed animals (n= ). the resulting relsa performance score for each variable combination across postoperative days is shown in fig. a . on the post-op day, the best performing relsa scores in the present data were bur h ( . ), bur h/act ( . ), bur h/hrv ( . ), bur h/hrv/act ( . ) and bur h/hr ( . ). the top worst performers on the post-op day were buron/temp/mgs ( . 
), bwc/temp/mgs ( . ), mgs ( . ), temp/mgs ( . ) and temp ( . ) (table ). using only bur h as the best performing variable on post-op day to display individual relsa scores revealed a similar grading of the tm vs sham groups regarding the maximum values (fig. b). this underscores the rationale for selecting the most informative parameters for a given model. however, this parameter alone does not reflect the welfare state over the whole time course of this study. the clustering of relsa max scores reveals objective severity levels. in addition to the data for building the relsa reference set from tm-implanted mice, we evaluated relsa performance as a tool for severity comparisons between models by including data from three additional animal studies (colitis, stress, sepsis). all included studies recorded data for the following five variables: heart rate, heart rate variability, temperature, activity, and body weight. each study was analyzed using the relsa methodology and was therefore referenced against the data set from the tm-implanted mice. this way, the overall context allowed the comparison of studies in terms of general model severity. for this, we used the individual relsa max values as previously described to assess the maximum achieved severity for each animal in these studies. with these data, we used a k-means cluster analysis to segment the ordered univariate relsa max outputs into distinct clusters. we heuristically estimated the number of clusters as k= using scree analysis (fig. a, b). the resulting borders of the clusters are shown as dashed lines in fig. c. we used data from mice suffering from colitis induced by dextran sulfate sodium (dss), colitis + stress where the animals received dss and were additionally subjected to immobilization stress on consecutive days for h per day, and corresponding colitis control animals treated only with water.
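the clustering step described above (k-means on ordered univariate relsa max values, with the number of clusters chosen from a scree plot) can be sketched as follows. this is a minimal pure-python illustration of lloyd's algorithm, not the authors' implementation; the function names, the quantile-based initialization, and the example relsa max values are assumptions made for illustration only.

```python
from statistics import mean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on univariate data.

    Returns (data, labels, centroids), with data sorted ascending.
    """
    data = sorted(values)
    n = len(data)
    # Deterministic start (assumption): spread initial centroids over the sorted data.
    centroids = [data[int((i + 0.5) * n / k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in data]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return data, labels, centroids


def wcss(data, labels, centroids):
    """Within-cluster sum of squares, the quantity plotted in a scree plot."""
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(data, labels))


def cluster_borders(centroids):
    """Midpoints between neighbouring centroids, usable as dashed border lines."""
    ordered = sorted(centroids)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


# Hypothetical RELSA max values, one per animal (not data from the study).
relsa_max = [0.10, 0.12, 0.15, 0.50, 0.55, 0.60, 1.40, 1.50]

# Scree analysis: inspect how the within-cluster variance drops as k grows.
scree = {k: wcss(*kmeans_1d(relsa_max, k)) for k in (1, 2, 3, 4)}

data, labels, centroids = kmeans_1d(relsa_max, 3)
borders = cluster_borders(centroids)
```

in a scree plot, the value of k is usually taken where the drop in `scree[k]` flattens out; the midpoints in `borders` then act as thresholds between adjacent severity levels.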
furthermore, we used data from mice submitted to cecal ligation puncture (clp) surgery for sepsis induction and the corresponding sham-operated animals (clp sham). here, the data were divided into clp survivors and nonsurvivors. the above-described cluster levels enabled a ranking of the respective animal models regarding the severity that was experienced. cluster analysis revealed the highest severity level for clp nonsurvivors, followed by a cluster of tm-implanted animals (which were the relsa reference set) and clp survivors. lower severity clusters were formed by data from animals suffering from colitis and stress, colitis alone and clp sham-operated animals. data from colitis control animals were allocated to the lowest relsa cluster (fig. c, d) . furthermore, we investigated how stable the relsa max distributions were in terms of their mean values and cluster positions. some studies or subgroups involved small sample sizes. therefore, we applied -fold bootstrapping to assess the % confidence intervals of the relsa max centroids. except for the colitis + stress study, the confidence intervals remained within their relative k-means cluster levels. the confidence interval for the colitis control group did not overlap with any other higher-level confidence interval. model-specific parameter contributions to the general relsa severity estimation. relsa curves from the individual animals over time displayed the generalized biological variation that occurs during severity monitoring. individual animals deviated from the group mean (fig. a, b) . this enabled individual severity monitoring. we used radar charts to quantify the contribution of single parameters to the relsa max scores (fig. c ). for data from the tm-implanted animals (fig. b) , it became obvious that immediately after surgery, all variables except for temperature contributed to the overall detected severity. 
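the bootstrap check of the relsa max centroids mentioned above can be illustrated with a percentile bootstrap of a group mean. this is a sketch under stated assumptions: the number of resamples, the confidence level, the seed, and the example values are placeholders chosen for illustration, and the percentile method itself is an assumption, since the text does not state which interval type was used.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    n_boot, level and seed are illustrative defaults, not values from the study.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of every resample.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lower = boot_means[int(alpha * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical RELSA max values for one small subgroup (not data from the study).
group = [0.42, 0.51, 0.47, 0.60, 0.55]
lower, upper = bootstrap_ci(group)
```

comparing `(lower, upper)` against the k-means cluster borders shows whether a small group stays inside its assigned severity level, and `intervals_overlap` gives a simple check for whether two groups can be separated.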
over time, some parameters returned to their baseline positions, but hrv and act remained contributors to an elevated relsa score. in the case of the clp model, the relsa score was dominated by the large differences in the temperature variable. however, the other parameters, except for body weight, also contributed to the overall relsa score (fig. s ). for the clp study, the time unit is hours rather than days (fig. s a, b). therefore, the body weight variable was not responsive enough to indicate rapid impairment of the animals in a manner similar to, e.g., the temperature. interestingly, in clp sham animals (fig. s c), activity was the strongest contributor, but temperature and heart rate also contributed to the relsa score. in animals suffering from colitis with (fig. s d) and without stress (fig. s e), activity was the dominating variable over the first days, but on day , body weight became more relevant. as expected, radar charts with data from colitis control mice showed no relevant changes within any of the observed variables (fig. s f). evidence-based severity assessment is increasingly becoming indispensable in animal research. from a researcher's point of view, it enables the best possible monitoring of the welfare state. from an ethical point of view, it is the prerequisite for a refinement of experimental procedures that minimizes the burden for animals and, at the same time, provides a basis for high-quality data. from a legal point of view, ensuring animal welfare and severity assessment is mandatory in many countries, e.g., in all eu member states. the large number and diversity of animal models and the lack of validated methods hinder clear definitions of severity categories. this has multiple consequences, ranging from legal uncertainties for scientists and authorities to a potential bias in rating the prospective severity of the animals in their studies. we have developed a tool that enables evidence-based severity assessment.
with the algorithm presented, an arbitrary number of outcome variables can be used to compute a composite score for welfare assessment and severity grading - . to our knowledge, this is the first attempt in preclinical science to combine phenotypical data using matrices of standardized differences to weigh variable contributions as a means for obtaining a measure for relative severity grades. this contrasts with current standards using human judgment to generate numerical scores for assessing welfare. using the approach presented, we have also shown that variables differ in performance and sensitivity and, therefore, strengthen the concept of a multimodal severity assessment. finally, the relsa algorithm enabled the quantitative comparison of distinct animal models with regard to severity levels, which leads to the speculation that it will do so in human patients as well. when developing relsa, we aimed at a quantitative grading of severity while methods at hand are characterized by qualitative scoring. the principle of composite scoring is based on systems utilized for clinical monitoring and risk assessment in human medicine. one example is the acute physiology and chronic health evaluation (apache ii) score, which was first reported in . the apache ii score comprises physiological and laboratory parameters with an additional weighting for age and preadmission health status to predict the risk of death , . in contrast, the sequential organ failure assessment (sofa) score, which was established in , consists of different scores assessing distinct organ dysfunction and failure , . the score describes the status of morbidity and critical illness but does not predict the outcome. currently, the sofa score is being used in the severity assessment of covid- patients to characterize mortality among intensive care unit (icu) patients . 
in veterinary medicine and laboratory animal science, there are various composite scores available, e.g., the clinical severity index for acute pancreatitis in canines, composite behavior scores for pain assessment in rodents, or composite measure schemes for rat epilepsy models. to create a more generalized severity assessment score, we used a system that can potentially combine any measurement or variable from the clinical and behavioral examination. this takes into account the multidimensional nature of severity, reflecting not only pain and distress but also affective emotional states. therefore, the chosen parameters for severity assessment should be multimodal. this concept is supported by growing evidence in the literature. in a study assessing severity during a chronic pancreatitis model, it was shown that the combination of multiple variables improved the sensitivity of read-out parameters. in the present study, we used a comprehensive panel of methods to monitor the welfare of animals after various experimental procedures, with tm implantation as a use case. to exclude selection bias, we calculated the models' severity level with a full set of available variables: body weight change, burrowing behavior, mgs score and telemetry-derived parameters including hr, hrv and temperature. these parameters were selected based on increasing evidence of their suitability in various model systems as well as several round-table discussions within our german research foundation (dfg)-funded research consortium, which focuses on severity assessment in animal-based research (www.severity-assessment.de). we observed that even though some variables showed high sensitivity towards the implantation procedure, they only showed strong changes over a short time frame. the most prominent example here is the bur h variable.
burrowing is a highly motivated behavior of mice and is known to be impaired under painful conditions or in mouse models of anxiety and schizophrenia , . in this study, burrowing was highly sensitive in detecting changes in welfare but only immediately after tm implantation. likewise, bwc sensitively indicated the impact of tm surgery but quickly recovered within to days after the operation. body weight is considered one of the most critical parameters in classic clinical scoring in rodents . however, monitoring body weight as a severity assessment parameter was shown to be model-specific and should be used in combination with other parameters . similarly, using mgs, only short-term effects were detected within minutes after implantation (not shown). on a daily scale, the mgs variable played no role in indicating severity. in contrast, the telemetry-derived parameters hr and hrv showed strong changes on the post-op day but also indicated a longer-lasting impact on animals, suggesting an extended recovery period (up to day ). telemetry is a frequently used method in biomedical research. it has been shown that hr and hrv are parameters indicating distress and pain , , and hr and body temp serve as critical parameters in sepsis studies . this leads to the assumption that the various parameters reflected different facets of severity (e.g., pain) better than others or that the animals did not experience the particular facets after a while. however, this question remains elusive, and the results of the present study underscore the need for a combination of parameters, including physiological parameters, to fully assess the severity situation. variance, the resulting relsa max values can be used, e.g., in animal model comparisons (figs. and ) . comparing the relsa max values revealed that tm implantation exhibited higher severity than sham operations. however, sham operation also shows some level of severity. 
the small peak in the relsa score on day after tm implantation (fig. a, b) demonstrates how sensitive the algorithm is towards value changes. here, the relsa score was not zero like the rest of the variables, but it was slightly elevated due to some minor variation in the hr variable (relsa hr, = . (sd . )). if there are changes in the measured values, the relsa score will adequately reflect this. an overall effect of the chosen analgesics was not observed, leaving the search for an ideal treatment for future studies. to validate the relsa algorithm, we used data from models with different forms and grades of impairments. an acute dss-colitis model, an acute dss colitis in combination with repeated restraint stress model and a clp sepsis model were assessed. fig. c shows that the relsa max scores remained within the moderate frame of the k-means cluster levels and did not exceed the relsa level of , with the exception of the clp nonsurvivors. the colitis relsa max values reliably clustered in level , indicating a lower severity for the dss-colitis model compared to the tm-implantation study. however, in the colitis study, animals had to be euthanized because the humane endpoint (max. of % weight loss) was reached. this had been set to ensure that animals experience a maximum of moderate severity levels according to the project authorization. although the relsa values indicated increased suffering, they also imply that the animals may have been euthanized too early, challenging the use of a % loss of body weight as an objective endpoint to ensure moderate severity levels. even though the humane endpoint for a single variable was reached, the remaining variables did not support a general increase in overall suffering in relation to the reference set. data from the clp study revealed very high relsa scores for the animals that did not survive the procedure (relsa max ≥ . ) and lower values for the surviving and sham animals (relsa max < ). 
the main factor responsible for the high scores was a large decrease in temperature, but hrv and act also indicated increases in severity. here, more than one variable is pointing towards increased suffering and therefore to an increased impairment in well-being. relsa enables scientists to quantify severity. in addition, it can be used to classify animals and models in qualitative frameworks, e.g., mild, moderate, and severe. for qualitative grading, data from a predefined reference set are needed, and subsequently, the severity context can be extrapolated. trivially, the extrema (min/max) of each variable serve as ranges for the given severity context. one caveat is that researchers must provide some sort of estimation about the quality of severity for the reference set, a step that involves human judgment. however, once defined, a new experiment can be used for severity quantification with regard to the reference set. this concept is new to the field and allows an evidence-based comparison of models within actual statutory provisions and guidelines. in addition to providing context, the reference set has another purpose: it regularizes the possible ranges of the input variables. this can prove essential, as variables behave differently when animals are negatively affected. for example, a loss of % in body weight is generally recognized as a threat to animal health . at the same time, burrowing behavior may drop to zero. in this case, a difference of % in one variable is equivalent to % in the other variable. for an optimal representation of this bias, we calculated individual relsa weights (r w ) as effect sizes for each variable and day, which were then used in the final score calculation. individual variables contribute to the final relsa score as relsa weights (r w ). these weights can be considered a special form of effect size that is somewhat related to glass' ∆ . 
for the r w values, however, the differences are that these values are not standardized to the standard deviation in the control group but rather to the difference of the respective variable to its maximum deviation in the reference set. this approach allows an estimation of within-animal effect sizes and measurements of a particular variable's importance. for the generalization of the weights in a final score, we concluded that variables with larger deviations should have more impact, while smaller deviations mostly represent noise and effects that are less prominent within a cohort. in statistics, this is followed by the root mean square (rms) concept, e.g., in error and regression analysis. in contrast to a pure sum score, the rms has the advantage that it directly translates to the scale of the individual weights and is considered to be more accurate in showing the best fit. another important issue is the sampling and measurement frequency. body weight is detected (e.g., once per day in the morning) and burrowing behavior after a certain time (e.g., after h or overnight). the sampling rates in these cases are a) not equal and b) not frequent enough to catch minute-by-minute changes. transient changes in some variables thus appear as "all-or-nothing" parameters. they change much faster than the sampling rates so that the exact development over time cannot be seen. although the sampling rate cannot be corrected with relsa, the skewness in distribution can be adjusted to a certain degree by including extreme values of a reference model with known severity into the calculation. to be comparable, the relsa algorithm requires the same reporting frame (e.g., day) in all input variables even though this can mean that the integration times are different (e.g., bur h). relsa was designed to assess the multidimensional severity an animal experiences under impaired welfare conditions using multivariate data. 
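the weighting and aggregation discussed above can be sketched in a few lines. this is an illustration under stated assumptions rather than the authors' code: inputs are taken to be already normalized so that baseline equals 100 (an assumed scale), only deviations below baseline are counted, missing variables are skipped, and all variable names and numbers are hypothetical.

```python
import math


def relsa_weights(sample, reference_extrema, baseline=100.0):
    """Weights r_w: deviation from baseline relative to the largest deviation
    observed for the same variable in the reference set.

    `sample` and `reference_extrema` map variable names to normalized values;
    `baseline` (assumed here to be 100) is the normalized starting value.
    """
    weights = {}
    for name, value in sample.items():
        if value is None:
            continue  # missing variables do not contribute
        max_deviation = abs(baseline - reference_extrema[name])
        if max_deviation == 0:
            raise ValueError(f"reference set shows no deviation for {name!r}")
        # Values at or above baseline contribute a weight of zero.
        deviation = max(0.0, baseline - value)
        weights[name] = deviation / max_deviation
    return weights


def relsa_score(weights):
    """Root mean square of the available weights (None if nothing was measured)."""
    if not weights:
        return None
    return math.sqrt(sum(w * w for w in weights.values()) / len(weights))


# Hypothetical reference extrema and one hypothetical animal on one day.
reference = {"bwc": 85.0, "bur": 0.0, "hr": 80.0}
animal = {"bwc": 92.5, "bur": 50.0, "hr": None}

weights = relsa_weights(animal, reference)
score = relsa_score(weights)
score_at_extrema = relsa_score(relsa_weights(reference, reference))
```

in this example a 7.5-point drop in body weight and a 50-point drop in burrowing both yield a weight of 0.5, which shows how scaling by the reference extremum puts variables with very different ranges on a common footing. because the root mean square stays on the scale of the individual weights, an animal that matches the reference extrema in every variable scores exactly 1.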
the combination of objective variables into a composite score has the advantage of unbiased severity assessment without the need for interpretation or analysis. we have shown that such a composite model can be built, tested and validated. in the future, a comparison of more animal models will lead to a severity map that can then be used to obtain a better understanding of the multivariate severity context. this will not only make severity assessment much clearer but also enable the ranking of animal models with regard to their impairment of welfare. finally, this may also reveal more generalized or specific variables for preoperative mg/kg carprofen s.c. and postoperative . mg/kg s.c. every h until day . the mice that underwent additional colitis or stress induction were treated using the metamizole analgesia regimen. in the clp study, mice aged to weeks were anesthetized via s.c. injection of mg/kg in ml/kg ketamine (ketaset ® , zoetis deutschland gmbh, berlin, germany) and mg/kg in ml/kg xylazine (rompun ® , bayer vital gmbh, leverkusen, germany). perioperative management was the same as described above. the blood pressure catheter was placed in the left carotid artery and positioned so that the gel-filled sensing region of the catheter was approximately mm in the aortic arch. the telemetry transmitter device body was placed along the lateral flank between the forelimb and hindlimb, close to the back midline. biopotential ecg leads were tunneled subcutaneously to achieve positioning analogous to lead ii in human ecg. burrowing behavior. one week before intraperitoneal transmitter implantation or the corresponding sham surgery, the mice were housed pairwise in type ii macrolon cages filled with aspen bedding material (asbewood gmbh, buxtehude, germany) and two compressed cotton nesting pads (asbewood gmbh, buxtehude, germany). on days five and four before surgery, the burrowing apparatus was provided to the animals to train burrowing behavior.
baseline measurements were taken on days two and one before surgery. a -ml plastic bottle with a length of cm, a diameter of . cm and a port diameter of cm was used as a burrowing apparatus. it was filled with g +/- . g of the standard diet pellets of the mice (altromin , lage, germany). for burrowing testing after surgeries ( st , nd , rd , th and th night after surgery), mice were single housed in a type-ii macrolon cage with autoclaved hardwood shavings. the burrowing bottles were placed in the left corner. in the right corner, half of the used nesting material from the home cage was provided as a shelter. the tests started three hours before the dark phase, and after two hours, the content of the burrowing bottles was weighed (bur h). the bottles containing the remaining pellets were placed back into the cages and weighed again the next morning (buron). astrazeneca gmbh, wedel, germany) was used. the depth of anesthesia was checked by means of the corneal and eyelid reflex. during the entire period of anesthesia, the mice were on a heating pad at . ± . °c. the abdominal cavity was aseptically opened via a midline laparotomy incision of approximately cm, and the cecum was exposed. subsequently, the cecum was / ligated (nylon monofilament suture / , fine science tools gmbh, heidelberg, germany) distal to the ileocecal valve, while care was taken that the intestinal continuity was maintained. the exposed cecum was punctured twice, "through-and-through", with a -gauge needle. next, sufficient pressure was applied to the cecum to extrude fecal material from each puncture site (~ mm). the cecum was returned to the abdominal cavity and placed in the upper central abdomen. following this procedure, the peritoneum was closed with three knot fissures with nonresorbable sterile suture material (nylon monofilament suture / , fine science tools gmbh, heidelberg, germany), and the upper skin layer was stapled with sterile clips (michel suture clips . x . 
mm, fine science tools gmbh, heidelberg, germany). for the mice undergoing a sham laparotomy, the same procedure was performed without clp. after fully recovering from the anesthesia, the mice were put back into their home cage, after which the continuous data acquisition of all physiological parameters began immediately. the mice received . mg/kg buprenorphine s.c. three hours after surgery and subsequently every h for the rest of the experiment. at the end of the experiments, mice were anaesthetized deeply with isoflurane and killed by cervical dislocation. . colitis induction and restraint stress. after intraperitoneal transmitter implantation and days of postoperative recovery, the female c bl /j mice were exposed to % (control; receiving water only) or % dss (colitis; mol wt - ; mp biomedicals, eschwege, germany) in drinking water for consecutive days to induce intestinal inflammation. the mice were weighed daily, and the telemetry-derived parameters hr, hrv, activity, and temperature were recorded. a third group of mice was subjected to restraint stress (colitis + stress) in addition to dss treatment. the mice were inserted into restraint tubes on consecutive days (d -d ) for minutes (from : to : am). the restraint tubes ( -mm internal diameter, -mm length) consisted of clear acrylic glass with ventilation holes ( mm diameter) and a whole length spanning -mm-wide opening along the upper side of the tube. the ends of the tube were sealed on one side by a piece of acrylic glass with a slot for the mouse tail and on the other end by a solid plastic ring that screwed into place. the mice were able to rotate around their axis but could not move horizontally. data characterization. before analysis, the data were brought into the tabular format required for relsa analysis (table s ) . 
eight variables were used in the calculations: body weight change (bwc), mouse grimace scale (mgs), h of burrowing started h before dark phase (bur h), burrowing overnight (buron), heart rate (hr), heart rate variability (hrv), body temperature (temp) and activity (act). for each variable, the data were pooled, and the effective ranges were determined (table ). furthermore, cohen's d was calculated for each variable and each day, i.e., post-op ( ), , and , to compare the resulting relsa scores with an independent measurement of effect. principal component analysis (pca). pca was conducted using the factoextra package in r. pca requires complete data, so the present data were limited to the following days: baseline, post-op and postoperative days to . for pca calculations, all variables were scaled and centered. the principal components of the first two dimensions for all respective days were plotted, as well as the factor loadings and variable contributions. relative severity assessment (relsa) score calculation. the principal methodology of the relsa calculation is depicted in fig. . quantitative input data were normalized to the range [ ; ]% with % as starting values (based on physiological or baseline conditions, e.g., on pre-op day (- ); table s ). the relsa methodology requires a reference set. if this set has a qualitative severity attribute, the calculated scores will be in reference to that category. according to annex xiii of the eu directive, surgical interventions under general anesthesia, such as the tm implantations or sham surgery, are categorized as "moderate" in terms of severity. thus, the relsa reference set quantitatively reflected this category. it uses the respective extrema of the monitored variables, thereby establishing the context within the referential severity category. for each time point (t), data differences from the normalized baseline for each contributing variable (i) were calculated.
to establish the severity context, the differences were divided by the normalized maximum-reached differences in the respective variables of the reference set to yield weights (r w , see formula ). for this measure, absolute differences were used. each r w is an expression of the similarity of an actual data point to the maximum-reached value observed in the reference set at any observed time point. this step also regularized differences in variable contributions at any given level of severity so that different scales do not skew the results. to give larger differences more weight, the final relsa score was calculated by the root mean square (rms) of the available r w divided by the number of variables (n) (see formula ). missing variables did not contribute to the relsa score, whereas values equal or above baseline level contributed with values of zero. furthermore, levels of severity in the reference data were calculated using a k-means algorithm . the number of clusters was determined heuristically with a scree plot (fig. a) . a relsa score of means that all contributing variables for a test animal reached the same values as the largest observed deviations in the reference set with the defined level of severity (here, "moderate").
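The weighting and root-mean-square steps described for the relsa score can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the published r package: the function names, the variable names in the example and all numbers are invented, the baseline is assumed to be normalized to 100%, and every variable is assumed to be oriented so that a lower normalized value means higher severity.

```python
import math
from typing import Dict, Mapping, Optional


def relsa_weights(
    norm_values: Mapping[str, Optional[float]],
    ref_max_diff: Mapping[str, float],
    baseline: float = 100.0,  # assumption: normalized starting value
) -> Dict[str, float]:
    """Return one weight per available variable for one animal and time point.

    norm_values: baseline-normalized values; None marks a missing variable.
    ref_max_diff: largest deviation from baseline reached by each variable
        in the reference set (same normalized units).
    """
    weights: Dict[str, float] = {}
    for name, value in norm_values.items():
        if value is None:
            continue  # missing variables do not contribute to the score
        # values equal to or above baseline contribute a weight of zero
        diff = max(baseline - value, 0.0)
        weights[name] = diff / ref_max_diff[name]
    return weights


def relsa_score(weights: Mapping[str, float]) -> float:
    """Root mean square of the available weights."""
    if not weights:
        raise ValueError("no variables available for this time point")
    return math.sqrt(sum(w * w for w in weights.values()) / len(weights))


# Invented example: each variable deviates exactly as far as the largest
# deviation in the (hypothetical) reference set, so every weight is 1.
reference_max = {"bwc": 20.0, "buron": 50.0, "act": 40.0}
animal = {"bwc": 80.0, "buron": 50.0, "act": 60.0}
print(relsa_score(relsa_weights(animal, reference_max)))  # 1.0
```

Because each weight is a ratio to the reference maximum, variables measured on different scales enter the score on equal footing, and squaring before averaging gives larger deviations more influence than a plain mean would.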
statistics. data distributions were tested against the hypothesis of normality using the shapiro-wilk test. in the case of failed rejections, nonparametric methods were used for group comparisons (kruskal-wallis test) and the mann-whitney u-test for pairwise tests. parametric analyses were performed using analysis of variance (anova) or single t-tests (with welch's correction in case of unequal variance). for multiple comparisons, the resulting p-values were adjusted using the bonferroni correction. the relsa max and cluster centroids were bootstrapped -fold to yield mean values as well as % bias-corrected and accelerated (bca) confidence intervals. with either method, the resulting p-values were considered to be significant at the following levels: . (*), . (**), . (***) and . (****). software and packages. the algorithm was developed in r (version . . ). in addition to relsa, the following packages were used for analysis: ggplot , factoextra, effsize, plyr, and boot. radar charts were realized using the fsmb package.
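The bonferroni correction named in the statistics paragraph is simple enough to show directly. The sketch below is illustrative only: the function name and the p-values are invented, and the original analysis was carried out in r rather than Python.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented p-values from three hypothetical pairwise comparisons.
adjusted = bonferroni([0.01, 0.02, 0.5])
print(adjusted)
```

The adjusted values are then compared against the chosen significance levels in place of the raw p-values.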
the relsa algorithm and the raw data are available as an r package with full documentation on github: https://github.com/mytalbot/relsa. test data require the same variables that are included in the reference set. both data sources are normalized to their baseline values, followed by the calculation of the individual relsa weights (r w ) as standardized effect sizes with regard to the maximal observed changes in the reference set. the final relsa score is calculated as a root mean square from the available r w . the relsa score can be calculated for single values and for multiple time points. key: cord- -ppgx mci authors: maughan, elizabeth f.; nigro, ersilia; pennycuick, adam; gowers, kate h.c.; denais, celine; gómez-lópez, sandra; lazarus, kyren a.; butler, colin r.; lee, dani do hyang; orr, jessica c.; teixeira, vitor h.; hartley, benjamin e.; hewitt, richard j.; yaghchi, chadwan al; sandhu, gurpreet s.; birchall, martin a.; o’callaghan, christopher; smith, claire m.; de coppi, paolo; hynds, robert e.; janes, sam m. title: cell-intrinsic differences between human airway epithelial cells from children and adults date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ppgx mci the airway epithelium is a key protective barrier whose integrity is preserved by the self-renewal and differentiation of basal progenitor cells. epithelial cells are central to the pathogenesis of multiple lung diseases. in chronic diseases, increasing age is a principal risk factor. in acute diseases, such as covid- , children suffer less severe symptoms than adults and have a lower rate of mortality. few studies have explored differences between airway epithelial cells in children and adults to explain this age-dependent variation in diseases.
here, we perform bulk rna sequencing studies in laser-capture microdissected whole epithelium, facs-sorted basal cells and cultured basal cells, as well as in vitro cell proliferation experiments, to address the intrinsic molecular differences between paediatric and adult airway basal cells. we find that, while the cellular composition of the paediatric and adult tracheobronchial epithelium is broadly similar, in cell culture, paediatric airway epithelial cells displayed higher colony forming ability, better in vitro growth and outcompeted adult cells in competitive proliferation assays. in rna sequencing experiments, we observed potentially important differences in airway epithelial gene expression between samples from children and adults. however, genes known to be associated with sars-cov- infection were not differentially expressed between children and adults. our results chart cell-intrinsic differences in transcriptional profile and regenerative capacity between proximal airway epithelial cells of children and adults. the human airways are lined by a pseudostratified epithelium from the trachea through most of the generations of conducting airway branching. functionally, the epithelium secretes a mucous layer and produces motile force to flow mucus proximally out of the lungs, providing protection against noxious particles and pathogens. these specialized functions are accomplished by luminal mucosecretory and ciliated epithelial cells, respectively. the differentiated cell types in this slow-turnover tissue are replenished by airway basal cells, which act as multipotent progenitors (rock et al., ; teixeira et al., ) . the cellular composition of the airway is being mapped in ever more detail through single cell rna sequencing approaches. 
a novel cftr-producing population, the ionocyte (montoro et al., ; plasschaert et al., ) , has been described, and we have new insight into cellular differentiation through the resolution of differentiation intermediates; for example, basal luminal progenitor cells (mori et al., ; rock et al., ; watson et al., ) , which are defined by an intermediate keratin profile (tp -/krt +/krt +/krt +/krt +) (braga et al., ; garcía et al., ) . efforts in mapping pathologic states have identified mucous ciliated cells (ciliated cells which co-express a number of genes typically associated with goblet cells and are more frequent in asthmatic patients (braga et al., ) ), as well as novel pathological epithelial cell subtypes in idiopathic pulmonary fibrosis (adams et al., ; habermann et al., ; reyfman et al., ) . in addition, region-specific differences exist in airway epithelial cell phenotype within the bronchial tree such that basal, ciliated and secretory populations in the nasal epithelium differ phenotypically (and maybe functionally) from their counterparts in the large or small airways (braga et al., ; deprez et al., ; kumar et al., ; travaglini et al., ) . however, despite well-characterized structural and functional consequences of ageing in the distal lung (navarro and driscoll, ; thurlbeck and angus, ; turner et al., ) , little is known about alterations in human airway epithelial cell composition or function between paediatric and adult tissue. the importance of age-related epithelial variation has been highlighted by the covid- pandemic, where functional differences in airway predisposition to viral infection and response have had a major clinical impact. 
single cell rna sequencing studies have shown that increased transcriptional noise and upregulation of a core group of age-associated molecular mechanisms, including protein processing- and inflammation-associated genes, are correlated with ageing across mouse cell and tissue types, but additional processes are unique to particular cell types within specific organs, including the lungs (angelidis et al., ; kimmel et al., ). however, to date such studies have not profiled in detail the trachea, which differs in composition and stem cell biology from the more distal airways (basil et al., ). during murine tracheal ageing, epithelial cell density is reduced and the proportion of basal cells within the epithelium is slightly decreased, but there is no obvious decline in their in vitro clonogenic potential or differentiation capacity (wansleeben et al., ). however, microarray gene expression analysis shows changes consistent with the development of low-grade chronic inflammation in older mice, together with an increased presence of activated adaptive immune cells (wansleeben et al., ). there are striking differences in airway structure and composition between rodents and humans, which suggests that the effects of ageing on airway epithelial regeneration might differ substantially between species. careful characterization of human paediatric and adult airway epithelial cell composition and function will inform both lung regenerative medicine efforts and the understanding of potential pathogenic mechanisms behind multiple chronic lung disease pathologies (kicic et al., ; prasse et al., ; staudt et al., ), as age is a risk factor for copd, pulmonary fibrosis, infection and lung cancer. in all such cases, basal stem cell dysfunction, perhaps accelerated by smoking, is likely to play a role in disease pathogenesis (meiners et al., ).
here, we compare paediatric and adult human tracheobronchial epithelia in terms of cellular composition, gene expression profiles and behaviour in primary cell culture. in light of the ongoing covid- pandemic, we compare expression of epithelial viral infection-associated genes, including those so far linked to sars-cov- infection, and make our data available as a resource for the community. human tracheobronchial epithelium has comparable cellular composition in children and adults. in mouse trachea, the proportion of the epithelium that expresses the basal cell-associated protein keratin (krt ) decreases with age (wansleeben et al., ), so we first compared the cellular composition of steady state normal human airway epithelium using haematoxylin and eosin (h&e) staining and immunohistochemistry for tp (basal cells), muc ac (mucosecretory cells) and foxj (ciliated cells) ( table s ). we found no significant differences in the proportion of cells in these three cellular compartments in paediatric and adult biopsies either by immunohistochemistry ( figure a / b), or by assessing basal, mucosecretory or ciliated cell-associated gene expression (table s ) in bulk rna sequencing in which we had laser-capture microdissected the whole epithelium ( figure c ; figure s ). analysing this laser-capture microdissected whole epithelium rna sequencing dataset using deseq (love et al., ) with a false discovery rate (fdr) of % and log fold change threshold of . , we identified genes with significant differential expression between paediatric and adult donors, of which were upregulated in adults and were expressed at higher levels in children ( figure a ; table s ). to determine alterations in biologically functional gene groups, we performed gene set enrichment analysis (gsea) using the hallmark gene sets from msigdb (liberzon et al., ; subramanian et al., ). in the paediatric airway epithelium, this demonstrated a higher expression of genes associated with interferon alpha and gamma responses.
in adults, there was a higher expression of genes associated with tp , tgfβ and wnt-β-catenin signalling, as well as processes such as epithelial-mesenchymal transition and glycolysis ( figure b ). although broad transcriptional similarities are expected within the same tissue between childhood and adulthood, these differences in gene expression may be functionally important. quantification of tp + basal cells, muc ac + mucosecretory cells and foxj + ciliated cells in paediatric and adult tracheobronchial epithelium. results are shown as a proportion of total cells within the epithelium ( , total cells for tp , , for muc ac and , for foxj ). no significant differences were seen in a two-sided wilcoxon rank sum test (n = donors/age group; basal cells, p = . ; ciliated cells, p = ; mucosecretory cells, p = . ). (c) expression of basal, ciliated and mucosecretory cell markers in rna sequencing data from paediatric and adult epithelium. for each sample, the geometric mean of a set of gene markers associated with a given cell type (table s ) is shown. no significant differences were seen in a two-sided wilcoxon rank sum test (n = donors/age group; basal cells p = . ; ciliated cells p = . , mucosecretory cells p = ). (a) cluster diagram showing the normalized expression of all genes differentially expressed with a false discovery rate (fdr) < . and a log fold change > . in six paediatric (months of age/sex; m, m, m, m, m, m) and six adult (years of age/sex; f, f, m, f, m, m) laser-capture microdissected tracheobronchial epithelial samples. values are scaled by row. gene order is based on hierarchical clustering based on the similarity in overall expression patterns. red represents relative expression higher than the median expression and blue represents lower expression. 
(b) pathway analysis was performed on the same paediatric and adult laser-capture microdissected tracheobronchial epithelial samples using gene set enrichment analysis (gsea) to interrogate hallmark pathways from msigdb. for pathways with fdr < . , normalized enrichment scores are shown. a negative score (blue) represents upregulation of the pathway in the paediatric samples; a positive score (red) represents upregulation in the adult samples. basal cells from children proliferate more readily in primary cell culture than those from adults. to investigate possible differences in regenerative potential between paediatric and adult proximal airway basal cells, we established primary cell cultures. first, we facs-sorted single basal cells, identified by dual epcam (epithelial) and podoplanin (pdpn; basal (miller et al., ; weeden et al., ) ) positivity, into individual wells of -well plates (range = to cells per donor) to compare the clonal potential of native paediatric and adult basal cells. after days of culture in epithelial cell culture medium containing y- , colony formation was significantly higher among basal cells derived from children than adults ( figure a ), consistent with our previous work (yoshida et al., ). at this timepoint, paediatric basal cells had often generated colonies that had become confluent to fill the well, whereas no adult colonies reached confluence ( figure b / c). when cells were isolated and cultured in epithelial cell culture medium without y- on t -j feeder layers, expansion of paediatric and adult cells proceeded similarly at early passages but a growth advantage was observed in paediatric donors after passages ( figure d ). in mtt assays, cultured paediatric cells showed greater proliferation than adult cells after and days ( figure e ). likewise, differences in edu uptake ( figure f ) and ki -positivity ( figure g ) were seen in passage cell cultures in the presence of feeder cells.
when cultured basal cells were assessed in colony formation assays, there was a trend towards paediatric cells forming more colonies than adult cells ( figure h ). to better understand progenitor capacity, we next developed a competitive proliferation assay ( figure s ), using lentiviral cell labelling with fluorescent constructs (eekels et al., ). after optimization in t cells to ensure that the two lentiviruses did not affect cell growth ( figure s ), we isolated and cultured patient basal cells in epithelial cell culture medium containing y- to facilitate lentiviral transduction (horani et al., ) and transduced these with either green fluorescent protein (gfp)- or mcherry-expressing lentiviral constructs ( figure a ). when gfp + and mcherry + cells from the same donor were combined in equal numbers, the ratio remained : over days ( figure s ). after transducing three paediatric and four adult basal cell cultures in this manner, we combined each paediatric donor with each adult donor in both possible colour combinations so that we could monitor the growth dynamics of the two populations separately using fluorescence ( figure b / c). there were no differences in lentiviral integration as determined by pcr targeting the puromycin resistance gene contained within both gfp and mcherry lentiviral vectors ( figure s e ). when cells were harvested at approximately % confluence, the growth differential between paediatric cells and adult cells was calculated for each paediatric/adult pair. in almost all donor pairs, the paediatric cells outgrew the adult cells ( figure d ). interestingly, the pairings that did not follow this pattern involved the youngest adult donor, who was years of age. given the different behaviour and phenotype of cultured basal cells from children and adults observed in vitro, we assessed whether these changes correlate with gene expression differences in native basal cells.
we used facs to isolate epcam + /pdpn + basal cells (miller et al., ; weeden et al., ) directly from tracheal biopsies and performed bulk rna sequencing. consistent with successful purification of basal cells, we saw enrichment for basal cell-associated gene expression in this dataset compared to the laser-capture microdissected whole epithelium ( figure s a ). using deseq with an fdr of % and log fold change threshold of . , we identified genes with significant differential expression between basal cells sorted from paediatric and adult donors, of which were upregulated in children and were more highly expressed in adults ( figure a ; table s ). ntrk has previously been associated with basal cell function as it was upregulated in polyp versus non-polyp basal cells in human nasal basal epithelial cells (ordovas-montanes et al., ) . here, it was upregulated in adult compared to paediatric basal cells, consistent with a possible negative influence on basal cell progenitor function. however, the majority of differentially expressed genes do not have previously described roles in airway basal cells. gsea suggested that pathways such as tnfα and mtorc signalling, as well as processes such as inflammation and apoptosis, were higher in paediatric basal cells although all pathways were of borderline statistical significance ( figure b ). next, we asked whether cell culture differentially affected the transcriptome of paediatric and adult tracheobronchial basal cells in a manner that might result in functional differences. we performed bulk rna sequencing on cultured basal cells that were isolated and expanded on t -j mouse embryonic feeder cells in epithelial cell culture medium containing y- . as expected, cultured cells were enriched for basal cell-associated gene expression compared to laser-capture microdissected whole epithelium ( figure s a ). 
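The differential expression calls reported above pair a false discovery rate cutoff with a log fold change threshold and then split the significant genes by direction. The Python sketch below illustrates that logic with a Benjamini-Hochberg adjustment, which to our knowledge is the default correction in the deseq family of tools. Everything specific in it is an assumption made for illustration: the gene names, p-values, cutoffs and the convention that a positive log fold change means higher expression in adults are invented, and the text does not state whether the fold change threshold was applied inside the statistical test or as a post hoc filter; the sketch uses a post hoc filter.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so monotonicity can be enforced
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_by_direction(
    genes: Sequence[Tuple[str, float, float]],  # (name, log2 fold change, p)
    fdr: float,
    min_abs_lfc: float,
) -> Tuple[List[str], List[str]]:
    """Return (higher in adults, higher in children) under the assumed sign."""
    padj = benjamini_hochberg([p for _, _, p in genes])
    up, down = [], []
    for (name, lfc, _), q in zip(genes, padj):
        if q < fdr and abs(lfc) > min_abs_lfc:
            (up if lfc > 0 else down).append(name)
    return up, down


# Invented genes and hypothetical cutoffs, for illustration only.
example = [
    ("gene_a", 2.0, 0.001),
    ("gene_b", -1.5, 0.002),
    ("gene_c", 0.2, 0.001),   # significant but below the fold change cutoff
    ("gene_d", 3.0, 0.9),     # large change but not significant
]
higher_in_adults, higher_in_children = split_by_direction(example, 0.05, 1.0)
print(higher_in_adults, higher_in_children)
```

Requiring both criteria keeps genes with tiny but statistically detectable shifts, and genes with large but noisy shifts, out of the reported lists.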
freshly sorted basal cells were more similar to laser-capture microdissected whole epithelium than cultured basal cells ( figure s b ), emphasising the significant impact of the artificial cell culture environment on the basal cell transcriptome. using deseq with an fdr of % and log fold change threshold of . , we identified genes with significant differential expression between basal cells sorted from paediatric and adult donors, of which were upregulated in children and were more highly expressed in adults ( figure c ; table s ). notably, the mucins muc , muc a, muc ac, muc b, and muc , as well as the secretory master regulator spdef, were all more highly expressed by adult cultured basal cells than paediatric basal cells. in adult basal cell cultures, we have previously observed upregulation of mucosecretory genes, such as scgb a , in cells co-cultured with mouse embryonic feeder cells in medium containing y- compared to those in a serum-free alternative, bronchial epithelial growth medium (butler et al., ). however, even if mucosecretory gene expression is favoured in these conditions, it is unclear why paediatric and adult epithelial cells differ in their response to culture. gsea suggested alterations in multiple pathways, some of which, including tnfα signalling and the inflammatory response, were now in the opposite direction to what was seen in basal cells in vivo ( figure d ). airway epithelial cell types differ in their gene expression between the nasal and airway epithelium, and indeed between the proximal and distal airways (travaglini et al., ), potentially contributing to the region-variable infection pattern of the novel coronavirus sars-cov- (sungnak et al., ; wölfel et al., ). however, this virus also produces markedly less severe symptoms in paediatric than adult covid- patients (su et al., ).
upregulated type i and ii interferon-associated gene expression in children ( figure b ) might be indicative of more effective anti-viral immunity whereas higher wnt pathway-associated gene expression in adults ( figure b ) might be relevant since wnt activation has been shown to increase rates of influenza replication (more et al., ) . to further assess epithelial-intrinsic factors that might explain the disparity in symptom severity, we compared the expression of a manually curated list of genes associated with viral infection in epithelial cells (table s ) in our laser-capture microdissected whole tracheobronchial epithelium dataset ( figure a ). while there was heterogeneity of expression in some of these genes between individuals, clustering was not seen by donor age ( figure a ). the entry of coronaviruses into host cells depends upon the binding of the viral spike (s) protein to a host receptor and s protein priming by host proteases. it has been reported that sars-cov- uses ace as a receptor and the tmprss protease for s priming, although cathepsin l activity may also have a role (hoffmann et al., ) . while ace and tmprss transcripts are found in adult nasal and airway goblet and ciliated cells (bertram et al., ; sungnak et al., ) , protein expression of ace in particular seems to be low overall in airway epithelium and restricted to rare cells (aguiar et al., ; hikmet et al., ) . moreover, it is unclear whether these findings extend to paediatric nasal or airway epithelia. to address this, we looked at the expression of these, and other candidate sars-cov- infection-associated genes (table s ), in laser-capture microdissected whole tracheobronchial epithelium. 
expression was equivalent between the paediatric and adult tissue, although adam , a gene that is involved in ectodomain shedding of both egfr ligands (vallath et al., ) and ace in the context of sars-cov (lambert et al., ), neared statistical significance with higher expression in adult epithelium ( figure b ). we also determined the relative expression of these genes compared to all other genes in each dataset, and found that, although ace is detected in each, it is typically in the midrange of transcripts detected ( figure c ). expression of all candidate sars-cov- infection-associated genes was also comparable between children and adults in our facs-sorted basal cell ( figure s ) and cultured basal cell datasets ( figure s ), although the lower n number in those experiments limits these analyses. we hope that these rna sequencing datasets (geo accession number gse ) will be a useful resource for investigations of covid- . table s ). in this study, we explored differences between human tracheobronchial basal cells in children and adults in three bulk transcriptomic experiments; we compared laser-capture microdissected whole epithelium, facs-sorted basal cells and cultured basal cells. at the level of the whole epithelium, there was broad conservation of airway epithelial transcriptional programmes but notable differences in the expression of genes associated with interferon responses and cell proliferation, which we speculate may lead to functional differences in cellular behaviour. in support of this, epithelial cell culture showed that basal cells from children have a greater colony forming capacity than those from adults. these data are consistent with studies in other epithelia, where ageing reduces the proportion of cells identified as stem cells using in vitro methodologies (barrandon and green, ).
such differences in clonal potential and proliferative capacity of basal cells between children and adults might be responsible for their differing repair responses following airway injury (smith et al., ), and are of relevance for lung regenerative medicine. for example, paediatric cells might be more amenable to engraftment following transplantation, as is the case in bone marrow transplantation, where donor age significantly affects outcomes (kollman et al., ). although future airway epithelial cell therapies are likely to be predominantly required by older people and be autologous in nature, recognising age-related differences in regenerative capacity might allow the development of approaches that improve the culture and transplantation of aged basal cells. further, corrective gene and cell therapies in the context of genetic diseases such as cystic fibrosis (vaidyanathan et al., ) might be more efficient if performed early in life, when cultured cells have greater progenitor potential. basal cell-specific rna sequencing revealed few differences in gene expression between paediatric and adult tracheobronchial basal cells in vivo that might explain the differences in their proliferative capacity once cultured, although the role of many differentially expressed genes has not been determined in respiratory epithelial cells. upon culture, new differences between paediatric and adult basal cells emerged, suggesting that cells from donors of different ages might respond differently to cell culture. this would be consistent with paediatric cultures being more proliferative and typically proliferating for longer time periods than their adult counterparts in population doubling assays. the mechanisms by which airway basal cells lose their in vitro proliferative capacity with age, and whether this reflects in vivo loss of progenitor capacity, are important areas for further study.
during the current covid- pandemic, it has been observed that children suffer less severe symptoms than adult patients. there has been speculation that this might be due to their lower expression levels of the viral entry receptor, ace . our results imply that there are no major differences in the epithelial expression of viral infection-associated genes between children and adults in the proximal airways, and suggest that expression of ace and other genes implicated in sars-cov- cell entry is also comparable. of course, there are multiple caveats: investigating nasal epithelium may reveal differences that are not present in the tracheobronchial epithelium, since both covid- symptomatic and asymptomatic patients show a greater viral yield in nasal samples than throat samples (zhou et al., ); our rna sequencing studies did not include donors over the age of who have a higher covid- morbidity and mortality risk; and future single cell rna sequencing of blood or nasal epithelial brush biopsies might reveal differences in immune cell phenotype or viral response that we could not detect here. thus far, human adult lung single cell rna sequencing has been used in attempts to correlate ace biology with clinical observations during the current outbreak of covid- (sungnak et al., ). however, the relative lack of symptoms in paediatric covid- patients justifies accelerated efforts to map human paediatric nasal and lung cells to facilitate comparisons with these adult samples. it is possible to culture large numbers of airway epithelial cells in these and other cell culture conditions (butler et al., ; peters-hall et al., ; zhang et al., ). airway basal cells might be candidates for in vitro replication of sars-cov- , which would minimize issues arising as a result of physiological differences with cell lines derived from other organs and/or species (poon et al., ).
although we detected transcripts for genes associated with viral uptake in d primary cultured human tracheobronchial basal cells, other studies have shown that airway basal cells in another medium composition do not express ace protein (aguiar et al., ), so this requires validation at the protein level. nevertheless, cell culture conditions alter basal cell gene expression more significantly than inter-patient variability (butler et al., ), so comparing ace protein expression across basal cell culture systems might be of interest. basal cells can also be differentiated towards mucosecretory and ciliated lineages in scalable air-liquid interface or organoid (sachs et al., ) cultures that contain the cellular lineages thought to be targeted by sars-cov- in patients, suggesting that these cultures could be used in viral infection studies (jonsdottir and dijkman, ; pizzorno et al., ). ethical approval to obtain patient tracheobronchial biopsies was granted by the national research ethics committee (rec references /lo/ and /q / ) and patients (or their parents) gave informed, written consent. luminal biopsies were obtained using cupped biopsy forceps from patients undergoing planned rigid laryngotracheobronchoscopy under general anaesthesia or flexible bronchoscopy under sedation. patient characteristics, procedure indication and precise site of biopsy are included in table s . samples for histology were fixed overnight in % paraformaldehyde (pfa) before being dehydrated through an ethanol gradient using a leica tp vacuum tissue processor. samples were embedded in paraffin and sectioned at µm thickness using a microtome. haematoxylin and eosin (h&e) staining was performed using an automated system (tissue-tek kit). stained slides were scanned using a nanozoomer whole slide imager (hamamatsu photonics) to create virtual slides using ndp.view software.
for cell type quantification, images of sections that contained areas of intact epithelium with at least cells were used (overall - slides were assessed per donor). images were reviewed in fiji software and positively stained cells counted using the cell count function. in total, , (tp ), , (muc ac) and , (foxj ) cells were assessed for expression of these proteins. bronchoscopic biopsies were frozen immediately in optimal cutting temperature compound (oct; in liquid hexane or on dry ice) and transported to a histopathology laboratory (great ormond street children's hospital, london, u.k.) within hours on dry ice. blocks and cut slides were stored at - °c prior to use. μm sections were mounted on membraneslide . pen (d) membrane covered slides (zeiss). one h&e slide was cut per block to aid navigation and identification of epithelium and basement membrane. slides were prepared with serial washes in methanol, rnase-free water, rnase inhibitor and ethanol to remove residual oct. laser capture microdissection was performed using a palm microbeam laser microdissection microscope at x and x magnification (figure s ) to extract the epithelial portion (or all cells above the basement membrane) from each biopsy into microadhesivecapped tubes (zeiss). samples were suspended in a : mix of arcturus picopure extraction buffer: rnalater (life technologies) and stored at - °c until use. for rna extraction, samples were thawed, disrupted by lysis (incubation at °c for mins followed by incubation at room temperature for mins), vortexed and filtered using rneasy minelute columns (qiagen). rna extraction was performed using the arcturus picopure rna isolation kit (life technologies ltd; kit ) as per the manufacturer's instructions. rna was quantified using the qubit rna hs assay kit (thermo fisher scientific).
libraries were created by ucl genomics core facility using the smarter stranded total rnaseq kit (clontech) and cleaned using jetseq (bioline), and quality control analysis of rna integrity was performed using high sensitivity rna screentape and the tapestation analysis software (agilent technologies). rna sequencing was performed using . x nextseq for cycles (illumina; pe, ~ m reads per sample). following sequencing, run data were demultiplexed and converted to fastq files using illumina's bcl fastq conversion software v . . quality control and adapter trimming were performed using fastp version . . with default settings. fastq files were then tagged with the umi read (umi-tools; smith et al., ) and aligned to the human genome ucsc hg using rna-star (dobin et al., ) version . . b. aligned reads were umi deduplicated using je-suite (girardot et al., ) version . . and count matrices were obtained using featurecounts. downstream analysis was performed using the r statistical environment version . . with bioconductor version . . (huber et al., ). counts were compared between paediatric and adult groups using deseq (love et al., ) with the default settings. only genes with non-zero counts in at least two samples were included in differential analysis. pathways were assessed using an implementation of gene set enrichment analysis (gsea) in the fgsea r package (korotkevich et al., ), using hallmark gene sets from msigdb (liberzon et al., ; subramanian et al., ) as input. heatmaps were plotted using the pheatmap (kolde, ) package, implementing a complete linkage clustering method. all other plots were created using ggplot . to avoid confounding by differences in sex distribution between the paediatric and adult groups, genes on the x and y chromosomes were removed prior to differential analysis. this study utilises several gene lists as included in table s : markers of basal, secretory and ciliated cells, viral response genes and covid- genes of interest.
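the two gene-level filters described above (keeping only genes with non-zero counts in at least two samples, and removing genes on the x and y chromosomes) can be illustrated with a minimal sketch. the analysis in this study was performed in r; the python below is only an illustration, and the function name, the dictionary layout and the toy gene names and counts are all assumptions rather than part of the published pipeline.

```python
from typing import Dict, List

# Chromosome labels treated as sex chromosomes (assumed naming convention).
SEX_CHROMOSOMES = {"X", "Y"}


def filter_genes(counts: Dict[str, List[int]],
                 chromosome: Dict[str, str],
                 min_samples: int = 2) -> Dict[str, List[int]]:
    """Keep genes with non-zero counts in at least `min_samples` samples
    and drop genes annotated to the X or Y chromosome."""
    kept = {}
    for gene, row in counts.items():
        n_nonzero = sum(1 for c in row if c > 0)
        if n_nonzero < min_samples:
            continue  # detected in too few samples
        if chromosome.get(gene) in SEX_CHROMOSOMES:
            continue  # removed to avoid confounding by sex distribution
        kept[gene] = row
    return kept


# Hypothetical toy data: three genes measured in four samples.
counts = {
    "GENE_A": [5, 0, 12, 3],
    "GENE_B": [0, 0, 7, 0],   # non-zero in only one sample
    "GENE_C": [4, 9, 1, 2],   # located on the X chromosome
}
chromosome = {"GENE_A": "1", "GENE_B": "2", "GENE_C": "X"}

print(sorted(filter_genes(counts, chromosome)))
```

in this toy input only the first gene survives both filters. the order in which the two filters are applied does not change the result.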
in each case, potentially relevant genes were identified following expert review of the literature. for cell marker genes, we further refined these lists following the method of danaher and colleagues (danaher et al., ), based on the assumption that genes consistently associated with a cell type should correlate with each other. using normal lung rna sequencing data from the genotype-tissue expression project (gtex consortium, ), we constructed a similarity matrix for each cell type following the danaher method and performed hierarchical clustering. in each case, a clear co-correlated cluster was observed (figure s ). genes within this cluster were taken forward for further analysis (figure s ; table s ). samples were transported to the laboratory in transport medium consisting of αmem containing penicillin/streptomycin, amphotericin b and gentamicin. cell suspensions were generated by sequential enzymatic digestion using dispase u/ml (corning) for minutes at room temperature followed by . % trypsin/edta (sigma) for minutes at °c (both in rpmi medium, gibco). each enzyme step was quenched with medium containing fbs, placed on ice and combined following the second digestion. biopsies were manually homogenized using sharp dissection between digest steps and by blunt homogenization through a μm cell strainer (miltenyi biotec). centrifugation steps were performed at x g for mins at for colony formation assays, basal cells were sorted into collagen i-coated -well plates containing t -j feeder cells at , cells per cm using a bd facsaria fusion facs sorter running bd facsdiva . software at the ucl cancer institute flow cytometry core facility. experiments lasted days and at termination, the number of wells which had become confluent was counted manually using a light microscope. brightfield images were taken using a zeiss axiovert a microscope.
for bulk rna sequencing, basal cells were sorted into epithelial cell culture medium containing y- for transport to the laboratory before being centrifuged at x g for mins at °c and resuspended in rna extraction buffer and processed as above for laser-captured samples. primary human airway epithelial cells were isolated and expanded on mitotically inactivated t -j feeder layers in two previously reported epithelial growth media, one containing y- (butler et al., ; liu et al., ) and one without (hynds et al., ; rheinwald and green, ) . feeder layers were prepared as previously described (hynds et al., ) . epithelial cell culture medium without y- consisted of dulbecco's modified eagle's medium (dmem)/f in a : ratio containing x penicillin-streptomycin, % fetal bovine serum, % adenine, hydrocortisone ( . µg/ml), egf ( ng/ml), insulin ( µg/ml), . nm cholera toxin, x - t and gentamicin ( mg/ml), as previously described (hynds et al., ) . epithelial cell culture medium containing y- consisted of dmem/f in a : ratio containing x penicillin-streptomycin (gibco), % fetal bovine serum (gibco) supplemented with μm y- (cambridge bioscience), hydrocortisone ( ng/ml; sigma-aldrich), epidermal growth factor ( . ng/ml; sino biological), insulin ( μg/ml; sigma-aldrich), . nm cholera toxin (sigma-aldrich), amphotericin b ( ng/ml; thermo fisher scientific) and gentamicin ( μg/ml; gibco), as previously described (butler et al., ) . where indicated, dishes were collagen i-coated by diluting rat tail collagen i (bd biosciences) to μg/ml in sterile . n acetic acid and applying at μg/cm for one hour at room temperature in a tissue culture hood. coated surfaces were washed once with sterile pbs before cell seeding. population doublings were calculated as previously described (butler et al., ) . rna sequencing of cultured basal cells was performed after either two or three passages in epithelial cell culture medium containing y- . 
rna was extracted using the rneasy mini kit (qiagen) following the manufacturer's instructions. rna was submitted for library preparation by the wellcome-mrc cambridge stem cell institute. initial qc was performed using qubit rna hs assay kit (thermo fisher scientific) and tapestation analysis software (agilent technologies). ng rna was used for library preparation using the nebnext ultra ii directional rna library prep kit (illumina) and the qiaseq fastselect rna removal kit (qiagen). libraries were subsequently measured using the qubit system and visualized on a tapestation d . sequencing was performed using the novaseq sp pe standard (~ m illumina reads per sample) at the cruk-ci genomics core. following quality control and adapter trimming with fastp (as described above), paired-end reads were aligned to the human genome ucsc hg using rna-star (dobin et al., ) version . . b. count matrices were obtained using featurecounts. downstream analysis was performed as described for epithelial and basal rnaseq datasets above. for mtt assays, primary human airway epithelial cells that had been isolated and expanded in epithelial cell culture medium without y- were seeded in -well plates at a density of , cells/well for hours without feeder cells. at the stated end-points, adherent cells were stained with mtt dye solution ( µl of : diluted mtt stock solution in culture medium) for hours at °c. after incubation, the medium was removed and µl dimethyl sulfoxide (dmso; sigma-aldrich) was added to dissolve the mtt crystals. the eluted specific stain was measured using a spectrophotometer ( nm). to analyse edu uptake, passage primary human airway epithelial cells that had been isolated and expanded in epithelial cell culture medium without y- were cultured until approximately % confluence. cells were washed with pbs and feeder cells were removed by differential trypsinization. 
after washing in dmem containing % fbs and then pbs again, the remaining epithelial cells were treated with μm edu (life technologies click-it edu alexa fluor ) for hour. cells were trypsinized to obtain single cell suspensions, stained according to the manufacturer's instructions and finally co-stained with dapi. cells were run on an lsrfortessa (bd biosciences) flow cytometer and data were analysed using flowjo . . (treestar). , cultured human airway epithelial cells were seeded per well of a collagen i-coated six-well plate containing inactivated t -j feeder cells. medium was carefully changed on day and day of culture before the experiment was terminated on day . colonies were fixed for minutes in % pfa, stained using crystal violet (sigma-aldrich) at room temperature for minutes and washed repeatedly in water. colonies of more than cells were counted manually using a light microscope. colony forming efficiency was calculated as: (number of colonies formed/number of seeded cells) * . cells were cultured in -well chamber slides (ibidi) and fixed in % pfa for minutes at room temperature. slides were washed and stored in pbs until staining. cells were permeabilized and blocked in pbs containing % fbs and . % triton x for hour at room temperature. primary anti-ki antibody (thermo fisher scientific, rm- ) was incubated overnight in block buffer without triton x at °c. after three -minute washes in pbs, anti-rabbit secondary antibodies (alexafluor dyes; molecular probes) were incubated at a : dilution in block buffer without triton x for hours at room temperature. cells were washed in pbs, counterstained using dapi ( μg/ml stock, : , in pbs) and washed twice more in pbs. images were acquired using a zeiss lsm confocal microscope. ki staining was performed by seeding passage cells on feeder cells for days. feeder cells were removed by differential trypsinization and basal cells were fixed in % pfa prior to staining as above.
ki positivity was assessed by manual counting of ki -stained nuclei as a proportion of all dapi-stained nuclei in five images per donor (mean = cells per donor, range - ). one shot stbl chemically competent e. coli bacteria (thermo fisher scientific) were transformed using third generation lentiviral plasmids. the gfp-containing and mcherry-containing plasmids were pcdh-ef -copgfp-t a-puro and pcdh-cmv-mcherry-t a-puro (gifts from kazuhiro oka; addgene plasmids # and # ). the packaging plasmids used were pmdlg/prre, prsv-rev and pmd .g (gifts from didier trono; addgene plasmids # , # and # (dull et al., )). bacteria were plated on ampicillin-containing agar plates overnight, and colonies expanded from this in lb broth containing ampicillin. plasmids population doubling times were calculated for each colour and moi, and these values were compared to untransduced cells. cells were then seeded as a : mix of gfp- and mcherry-expressing cells into quadruplicate wells of a -well plate ( , of each colour at either high or low expression). these cells were passaged twice per week. at each passage, the contents of each well were trypsinized and the percentage of gfp and mcherry-expressing cells was analysed by flow cytometry. % of each well's cell volume was immediately re-plated into fresh -well plates whilst % of the sample from each well was stained with a live/dead fixable stain, fixed with % pfa and analysed for gfp and mcherry expression by flow cytometry. reference wells containing , cells of a single colour were trypsinized at the same time points and counted manually to calculate a doubling time for each cell type. primary human tracheobronchial basal cells were isolated and expanded in epithelial cell culture medium containing y- for two passages before transduction with either gfp- or mcherry-containing viruses (moi = ). transduction was performed in culture medium plus μg/ml polybrene and medium was exchanged for fresh culture medium after hours.
purification of cultures to remove untransduced cells was performed by facs for gfp or mcherry positivity after - days of further expansion. since both the gfp and mcherry lentiviral plasmids carry a puromycin resistance cassette, pcr copy number analysis for the puromycin cassette was performed to quantify the number of lentiviral copies incorporated per donor cell (taqman custom copy number assay). genomic dna (gdna) was extracted from each transduced cell culture using a blood & cell culture dna mini kit (qiagen) as per the manufacturer's instructions. gdna was quantified using a nanodrop system and pcr performed using the taqman genotyping master mix and reference assay (life technologies). stable transduction was confirmed by flow cytometry prior to competitive proliferation assays. feeder cells were removed from cultures of gfp+ and mcherry+ basal cells by differential trypsinization and basal cells were suspended in facs buffer containing μm y- for facs. cells were seeded by facs as a : mix of gfp- and mcherry-expressing cells into triplicate wells of a -well plate containing t -j feeder cells ( or , of each colour). all paediatric cultures were crossed with all of the adult cultures of the opposite label. medium was refreshed after three and five days. at seven days or at - % confluence as ascertained by fluorescence microscopy, the entire contents of the well were trypsinized for flow cytometry (i.e. feeder cells were not removed). the resulting cell suspensions were digested enzymatically with . u/ml liberase (thermolysin medium; roche) in serum-free medium to form a single cell suspension, which was quenched with medium containing % fbs. zombie violet live/dead fixable stain was applied as per the manufacturer's instructions and cell suspensions were fixed with % pfa for mins at room temperature. gfp and mcherry expression was assessed by flow cytometry. feeder cells were seen as an unlabelled population.
untransduced paediatric cells cultured in parallel were used as non-labelled controls and single colour wells containing , cells from each individual donor were used as single colour controls. cell growth advantages were calculated using the growth calculation of eekels et al. for two cell populations, assuming exponential change in the ratio of the two populations over time (eekels et al., ) . to calculate the growth advantage of paediatric cells over adult cells, we used the formula: where ga is the calculated growth advantage, td (adult) immunofluorescence imaging showed that the fluorescence intensity was higher after transductions at the higher moi. cells were imaged hours after sorting. cells remained stably transduced even after transduction using the lower moi. scale bars = μm. (b) fluorescence imaging of mixed cultures of gfp+ and mcherry+ t hek cells after days. (a) immunofluorescence imaging days post-transduction of human basal cells with gfp (above) and mcherry (below) lentiviral constructs. (b) immunofluorescence imaging days after fluorescence-activated cell sorting to purify gfp+ (above) and mcherry+ (below) populations from untransduced cells. (c) immunofluorescence imaging of cultures days after mixing of primary human airway basal cells in a : ratio. (d) flow cytometry experiment in which gfp+ and mcherry+ cells from the same donor were mixed in equal ratio and analysed after days. (e) copy number determination for the puromycin resistance gene, which is present in both the gfp-and mcherry-containing lentiviruses. there were no differences in viral integration between either paediatric and adult cell cultures or between gfp and mcherry lentiviruses (p = . for gfp, p = . for mcherry, two-tailed unpaired t-test). table s ) across the three rna sequencing datasets (laser-capture microdissected whole epithelium, "lcm epithelium"; facs-sorted epcam + /pdpn + basal cells, "sorted"; cultured basal cells, "cultured"). 
(b) umap plot visualising laser-capture microdissected whole tracheobronchial epithelium ("lcm epithelium", squares), facs-sorted basal cell ("basal", circles) and cultured basal cell ("cultured", triangles) datasets. blue colour indicates paediatric samples and red colour indicates adult samples. gene order is determined by hierarchical clustering based on the similarity in overall expression patterns. red represents relative expression higher than the median expression and blue represents lower expression. genes that are not expressed in this dataset are excluded (see table s ). (b) plots comparing the expression (log normalized counts) of selected host genes associated with sars-cov- (ace , tmprss , ctsl), other coronavirus (adam , anpep, dpp ) and influenza (st gal , st gal , tmprss ) infection in the three paediatric and three adult cultured basal cell samples. all genes analysed were non-significant (fdr > . ) when comparing paediatric and adult samples using a wilcoxon test with correction for multiple testing by the benjamini-hochberg method. figure s : refining tracheobronchial cell type-specific gene lists by co-correlation. (a) a similarity matrix, based on the method of danaher et al. (danaher et al., ), showing co-correlation of the expression of a manually curated list of genes expected to be expressed by airway basal cells (see table s ) in the genotype-tissue expression project normal lung rna sequencing data (gtex consortium, ). hierarchical clustering revealed a sub-cluster of genes whose expression was correlated. these were taken forward for further analyses as cell type-specific gene signatures and applied to our proximal airway epithelial cell rna sequencing datasets. (b) as for (a) but using a manually curated list of genes expected to be expressed by airway mucosecretory cells.
(c) as for (a) and (b) but using a manually curated list of genes expected to be expressed by airway ciliated cells. single cell rna-seq reveals ectopic and aberrant lung resident cell populations in idiopathic pulmonary fibrosis gene expression and in situ protein profiling of candidate sars-cov- receptors in human airway epithelial cells and lung tissue. biorxiv an atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics three clonal types of keratinocyte with different capacities for multiplication the cellular and physiological basis for lung repair and regeneration: past, present, and future influenza and sars-coronavirus activating proteases tmprss and hat are expressed at multiple sites in human respiratory and gastrointestinal tracts a cellular census of human lungs identifies novel cell states in health and in asthma rapid expansion of human epithelial stem cells suitable for airway tissue engineering fastp: an ultra-fast all-in-one fastq preprocessor gene expression markers of tumor infiltrating leukocytes star: ultrafast universal rna-seq aligner a third-generation lentivirus vector with a conditional packaging system a competitive cell growth assay for the detection of subtle effects of gene transduction on cell proliferation novel dynamics of human mucociliary differentiation revealed by single-cell rna sequencing of nasal epithelial cultures human genomics. the genotype-tissue expression (gtex) pilot analysis: multitissue gene regulation in humans single-cell rna-sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis a comparativedescriptive analysis of clinical characteristics in -coronavirus-infected children and adults the protein expression profile of ace in human tissues. 
biorxiv sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor repair and regeneration of the respiratory system: complexity, plasticity, and mechanisms of lung stem cell function rho-associated protein kinase inhibition enhances airway epithelial basal-cell proliferation and lentivirus transduction orchestrating high-throughput genomic analysis with bioconductor expansion of human airway basal stem cells and their differentiation as d tracheospheres cross-talk between human airway epithelial cells and t -j feeder cells involves partial activation of human met by murine hgf coronaviruses and the human airway: a universal system for virus-host interaction studies intrinsic biochemical and functional differences in bronchial epithelial cells of children with asthma murine single-cell rna-seq reveals cell-identity-and tissue-specific trajectories of aging pheatmap: pretty heatmaps. r package version donor characteristics as risk factors in recipients after transplantation of bone marrow from unrelated donors: the effect of donor age fast gene set enrichment analysis. 
biorxiv distal airway stem cells yield alveoli in vitro and during lung regeneration following h n influenza infection tumor necrosis factor-alpha convertase (adam ) mediates regulated ectodomain shedding of the severe-acute respiratory syndrome-coronavirus (sars-cov) receptor high-content screening for rare respiratory diseases: readthrough therapy in primary ciliary dyskinesia the molecular signatures database (msigdb) hallmark gene set collection rock inhibitor and feeder cells induce the conditional reprogramming of epithelial cells moderated estimation of fold change and dispersion for rna-seq data with deseq hallmarks of the ageing lung basal stem cell fate specification is mediated by smad signaling in the developing human lung a revised airway epithelial hierarchy includes cftr-expressing ionocytes regulation of influenza virus replication by wnt/beta-catenin signaling notch -jagged signaling controls the pool of undifferentiated airway progenitors regeneration of the aging lung: a mini-review allergic inflammatory memory in human respiratory epithelial progenitor cells long-term culture and cloning of primary human bronchial basal cells that maintain multipotent differentiation capacity and cftr channel function characterization and treatment of sars-cov- in nasal and bronchial human airway epithelia. 
biorxiv a single-cell atlas of the airway epithelium reveals the cftr-rich pulmonary ionocyte recurrent mutations associated with isolation and passage of sars coronavirus in cells from non-human primates bal cell gene expression is indicative of outcome and airway basal cell involvement in ipf single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis serial cultivation of strains of human epidermal keratinocytes: the formation of keratinizing colonies from single cells notch-dependent differentiation of adult airway basal stem cells airway basal stem cells: a perspective on their roles in epithelial homeostasis and remodeling long-term expanding human airway organoids for disease modeling mechanisms of acute respiratory distress syndrome in children and adults: a review and suggestions for future research umi-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy airway basal stem/progenitor cells have diminished capacity to regenerate airway epithelium in chronic obstructive pulmonary disease the different clinical characteristics of corona virus disease cases between children and their families in china -the character of children with covid- gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles sars-cov- entry genes are most highly expressed in nasal goblet and ciliated cells within human airways stochastic homeostasis in human airway epithelium is achieved by neutral competition of basal cell progenitors growth and aging of the normal human lung a molecular cell atlas of the human lung from single cell rna sequencing elasticity of human lungs in relation to age high-efficiency, selection-free gene repair in airway stem cells from cystic fibrosis patients rescues cftr function in differentiated epithelia targeting egfr signalling in chronic lung disease: therapeutic challenges and opportunities age-related 
changes in the cellular composition and epithelial organization of the mouse trachea clonal dynamics reveal two distinct populations of basal cells in slow-turnover airway epithelium lung basal stem cells rapidly repair dna damage using the error-prone nonhomologous end-joining pathway virological assessment of hospitalized patients with covid- tobacco smoking and somatic mutations in human bronchial epithelium in vitro expansion of epithelial stem cells enabled by pharmacological inhibition of pak -rock-myosin ii and tgf-beta signaling a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord- -i ymrov authors: hill, chris h.; cook, georgia; napthine, sawsan; kibe, anuja; brown, katherine; caliskan, neva; firth, andrew e.; graham, stephen c.; brierley, ian title: structural and molecular basis for protein-stimulated ribosomal frameshifting in theiler’s murine encephalomyelitis virus date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: i ymrov the a protein of theiler’s murine encephalomyelitis virus (tmev) is required for stimulating programmed − ribosomal frameshifting (prf) during infection. however, the amino acid sequence of tmev a shares only % identity with the a orthologue from the related cardiovirus encephalomyocarditis virus (emcv) for which a structure has been recently determined. here we present the x-ray crystal structure of tmev a, revealing that, despite the low sequence identity, the overall beta-shell architecture is retained, and the location of previously described mutations on this structure suggests a common rna binding mode. we determine the minimal stimulatory element in the viral rna required for a binding and show that a binds to this element with : stoichiometry and nanomolar affinity. 
we also demonstrate a critical role for bases upstream of the originally predicted stem-loop, providing evidence that an alternative pseudoknot-like conformation recently demonstrated for emcv is a conserved feature of cardiovirus stimulatory elements. we go on to examine frameshifting in infected cells by ribosome profiling and metabolic labelling. we observe prf efficiencies of up to %, highlighting this as the most efficient example of − prf in any natural system thus far characterised. furthermore, we document a series of ribosomal pauses in and around the site of prf with potential implications for our understanding of translational control in cardioviruses. cardioviruses are a diverse group of picornaviruses that cause encephalitis, myocarditis and enteric disease in a variety of mammalian hosts including rodents, swine and humans . the cardiovirus b or theilovirus species comprises several isolates including sikhote-alin virus , rat theilovirus and theiler's murine encephalomyelitis virus (tmev), all of which are genetically distinct from the cardiovirus a species encephalomyocarditis virus (emcv) . within the cardiovirus b species, tmev has been the most extensively characterised and has been used as a mouse model for virus-induced demyelination and multiple sclerosis . like all picornaviruses, tmev replication is cytoplasmic and begins with the translation of its single-stranded ~ kb positive-sense rna genome. the resultant polyprotein (l- abcd- abc- abcd) is subsequently processed, mainly by the virally encoded c protease , . several "non-canonical" translation events occur during the production of the tmev polyprotein. first, initiation is directed by a type ii internal ribosome entry site (ires) in the ′ untranslated region (utr) , . secondly, a co-translational stopgo or ribosome "skipping" event occurs at the junction between the a and b gene products , . 
in this process, the peptidyl-transfer reaction fails between the glycine and proline in a conserved d(v/i)exnpg|p motif, releasing the upstream l- abcd- a product as the ribosome continues translating the downstream bc- abcd region. note that in tmev, however, the presence of a c protease cleavage site near the start of the b protein has been considered to make the stopgo reaction functionally redundant , . thirdly, programmed − ribosomal frameshifting (prf) diverts a proportion of ribosomes out of the polyprotein reading frame and into a short overlapping orf, termed b*, near the start of the b protein. in tmev this orf is only eight codons in length and the resulting transframe product b* has just amino acids (~ . kda), with no established functional role . thus it has been hypothesised that in tmev, the main function of prf may simply be to downregulate translation of the enzymatic proteins encoded downstream of the frameshift site, which are required in much smaller quantities in the late stages of infection . prf is common amongst rna viruses, where it is used as a translational control strategy to express gene products in optimal ratios for efficient virus replication. additionally, the utilisation of overlapping orfs permits more information to be encoded by a small genome. our mechanistic understanding of prf has been informed by the study of examples in hundreds of viruses (reviewed in [ ] [ ] [ ] ). generally, prf involves two elements within the viral messenger rna (mrna). typically, a heptameric shift site or "slippery sequence" of the form x_xxy_yyz (where xxx represents any three identical nucleotides, yyy represents aaa or uuu, and z is any nucleotide except g ) is located - nucleotides upstream of a structured "stimulatory element" (usually a stem-loop or pseudoknot) that impedes the progress of the elongating ribosome, such that the ribosome pauses with the shift site in the decoding centre [ ] [ ] [ ] . 
this can facilitate a change of reading-frame if the codon-anticodon interactions of the p-site and a-site trnas slip and recouple in the − frame (xxy → xxx and yyz → yyy, respectively) during resolution of the topologically unusual stimulatory element. for any given system, such prf generates a fixed ratio of products set by parameters that include the energetics of codon-anticodon pairing at the shift site, the conformational flexibility of the stimulatory element and the resistance of this element to unwinding by the ribosomal helicase , . in cardioviruses, however, prf exhibits some intriguing mechanistic exceptions. firstly, the conserved g_guu_uuu shift site is located - nucleotides upstream of the stimulatory stem-loop, seemingly too far away to position the shift site in the decoding centre during a pause. secondly, the viral a protein is required as a trans-activator of frameshifting in cells and in vitro , [ ] [ ] [ ] which permits temporally controlled, variable-efficiency frameshifting related to the amount of a protein in the cell during infection. to date, cardioviruses present one of only two known examples of protein-stimulated frameshifting, along with porcine reproductive and respiratory syndrome virus (prrsv) and other arteriviruses (family arteriviridae), where a complex of viral nsp β and host poly(c) binding protein stimulates prf [ ] [ ] [ ] . our previous investigations of protein-stimulated frameshifting in emcv and tmev have revealed that the viral a protein acts in complex with a stem-loop downstream of the slippery sequence, and that this interaction can be disabled by mutating either a conserved cytosine triplet in the loop or a pair of conserved arginine residues in the protein , , . more recently, we solved the structure of emcv a, revealing a novel protein fold that permits binding to both the stem-loop and to ribosomal rna with high affinity . 
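The canonical shift-site pattern described above (X_XXY_YYZ, where XXX is three identical nucleotides, YYY is AAA or UUU, and Z is any nucleotide except G) can be expressed as a short pattern search. The following is a minimal sketch for illustration only, not the authors' analysis code; the function names `find_slippery_sites` and `minus_one_codons` are hypothetical.

```python
import re

# Canonical -1 PRF shift site X_XXY_YYZ as defined in the text:
# XXX = three identical nucleotides, YYY = AAA or UUU, Z = any nucleotide except G.
# Wrapping the pattern in a lookahead lets overlapping candidate sites be reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[ACU]))")


def find_slippery_sites(rna):
    """Return (0-based start, heptamer) for every canonical shift site in rna."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]


def minus_one_codons(heptamer):
    """Codons paired by the P-site and A-site tRNAs before and after a -1 slip."""
    zero_frame = (heptamer[1:4], heptamer[4:7])  # XXY, YYZ
    minus_one = (heptamer[0:3], heptamer[3:6])   # XXX, YYY
    return zero_frame, minus_one
```

For a heptamer such as `UUUAAAC`, `minus_one_codons` returns the zero-frame pair (`UUA`, `AAC`) and the −1-frame pair (`UUU`, `AAA`), mirroring the XXY → XXX and YYZ → YYY re-pairing. Under this strict definition the cardiovirus G_GUU_UUU site is not returned, because its first three nucleotides are not identical.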
however, a protein sequences are highly divergent within cardioviruses, and the tmev protein shares only ~ % pairwise amino acid sequence identity with its emcv orthologue. additionally, the stem-loop that comprises the stimulatory element in tmev is significantly more compact than the equivalent structure in emcv. here we present the x-ray crystal structure of tmev a and investigate the interaction with its cognate rna using a variety of biochemical and biophysical techniques. we define a minimal tmev stimulatory element necessary for a binding and show that the protein forms a : rna-protein complex with nanomolar affinity. we provide evidence that the alternative pseudoknot-like conformation recently described for the emcv stimulatory element is also likely to be present in tmev and other cardioviruses. finally, we use metabolic labelling and ribosome profiling to study a-mediated frameshifting and translation of the tmev genome at sub-codon resolution in infected cells. together, this body of work provides the most comprehensive analysis to date of the molecular biology of theilovirus protein-stimulated frameshifting.

structure of tmev a

tmev a protein was purified following recombinant expression as a gst fusion in e. coli ( figure a ). after cleavage of the gst tag and removal of contaminating nucleic acids by heparin affinity chromatography, high-salt conditions were required to prevent aggregation and maintain solubility. under optimised conditions, size-exclusion chromatography coupled to multi-angle light scattering (sec-mals) revealed the protein to be predominantly monomeric ( figure b , peak ) with an observed mw of , da compared to a theoretical mass of , da, as calculated from the primary sequence. a small proportion of trimers were also present ( figure b , peak : observed mass of , da vs. theoretical mass of , da). 
we crystallised the protein, but were unable to solve the structure by molecular replacement using the emcv a crystal structure as a search model . instead, we obtained experimental phases via bromine derivatisation and determined the structure by single-wavelength anomalous dispersion analysis. the asymmetric unit of the cubic cell contained a single copy of a, which was refined to . Å resolution (table , figure c ). interpretable electron density was visible for a residues - , with a short g- p- l- g- s n-terminal extension that resulted from c protease cleavage of the gst tag. as these residues project away from the globular tmev domain and lack regular secondary structure, they are likely to be flexible in solution, and their ordered conformation in the structure likely arises from contacts with symmetry-related tmev molecules in the crystal. inspection of the crystalline lattice did not reveal the presence of a trimers and none of the observed crystallographic interfaces are predicted to mediate protein oligomerisation in solution. the structure of tmev a reveals a globular β αβ α fold, with a similar "beta-shell" architecture to emcv a (cα backbone rmsd of . Å over residues) . an extensive curved antiparallel six-stranded beta sheet packs against two alpha helices on the concave surface of the sheet, whilst the loops between adjacent beta strands project from the opposite convex face. the protein is highly basic (pi ~ . ) and most of the contributing lysine, arginine and histidine residues are solvent-exposed. at physiological ph the protein will thus have a positive electrostatic surface potential across the convex face of the beta sheet, surrounding loops and the n-terminal end of helix α , suggesting a putative rna-binding surface ( figure c ). this is supported by a previous biochemical analysis of tmev a function, in which we made several point mutations of conserved basic residues and assessed their ability to stimulate prf ( figure d ). 
the m mutant (r a / r a) was found to completely inhibit frameshifting, and the m mutant (k a / r a) was found to reduce it by approximately fourfold. both m and m are in surface loops on either side of this large, positively charged beta sheet, consistent with a role for this face of the protein in forming electrostatic interactions with the ribose phosphate backbone of the prf stimulatory element. in contrast, m (r a) had no effect; indeed, most of this residue is buried (only approximately . % of the residue's surface area being solvent-accessible) and it likely plays a structural role in stabilising packing of helix α against the underside of the central sheet. perhaps surprisingly, m (k a / k a) also had no effect, despite these residues being located in the same positively charged loop as the essential r and r . this demonstrates that precise spatial positioning of charge within this loop is key to the specificity of rna-binding. despite low sequence identity, the overall fold of tmev a is similar to emcv a ( figure e ). the most functionally important part of emcv a is the "arginine loop" between strands β and β , which is necessary for prf , ribosomal rna binding and nuclear localisation . the two arginines in this loop (r / r in emcv a; r / r in tmev a) are amongst the only surface-exposed residues that are completely conserved across both species of cardiovirus ( figure s a and b) . structurally, this loop adopts an almost identical conformation, consistent with mutagenesis data suggesting that they are functionally equivalent , . however, there are also some key differences between the two a orthologues ( figure e ). the loop region between the end of α and the start of β is much longer in emcv a. this loop, and the n-terminal end of α , are two of three regions that contribute to the rna-binding surface in emcv a . 
these two regions are not conserved in tmev a ( figure s c ): the backbone geometry is different, and, with the exception of k and r , there are no chemically equivalent side chains in the vicinity that could form similar interactions. however, the third region that forms the rna-binding surface is the highly conserved β -β arginine loop mentioned above. apart from this loop, the rest of the tmev a rna-binding surface may differ from that in emcv, possibly involving other residues on the surface of the beta sheet that are only conserved amongst theilovirus isolates (e.g. r , d , k , h ; figure s c ). this is supported by a reduction in prf seen with the m (k a) mutant . the conservation at the c-terminus of the protein ( figure s a and b) is concentrated in the d(v/i)exnpg motif required for stopgo peptide release between a and b gene products in other cardioviruses , . as expected, this motif is unstructured in both proteins, and if we consider only the ordered amino acids, we observe some of the largest structural differences ( figure e ). in emcv a, the shorter α helix leads to a more pronounced curvature of the central beta sheet, and the c-terminus forms a short β strand that packs against β . conversely, in tmev a, the α helix continues with a pronounced ° kink until the end of the protein. this has consequences for the structure of the putative yxxxxlΦ motif at the c-terminus that has previously been reported to sequester eif e in a manner analogous to e-bp, thereby disabling cap-dependent translation of host mrna . in tmev a, the first tyrosine residue of this motif (y ) is located on the buried side of the α helix (with only approximately . % of the residue's surface area being solvent accessible) and is therefore not available to interact with eif e. a second tyrosine (y ) is more exposed (approximately .
% of the residue's surface area being solvent accessible), but given the local secondary structure, it is unlikely that l and i would be able to interact with eif e in the same way as e-bp without a significant conformational change ( figure s d ). in emcv a, the putative yxxxxlΦ motif is not helical, instead forming a more extended backbone . superposition of this region reveals that this motif is not structurally conserved between a orthologues ( figure s e) . therefore, the interaction with eif e must either involve other parts of the protein or must be accompanied by significant structural rearrangement of the α helix of tmev.

binding of a to the prf stimulatory element in the tmev rna

the prf stimulatory element in tmev consists of a stem-loop (seven base-pair stem and nucleotide loop) located nucleotides downstream from the shift site ( figure a ). this is more compact than the equivalent element in emcv, which has a nt loop that may contain an additional stem , . however, in both viruses, three conserved cytosines in the loop are essential for a binding , . to determine the minimal rna element necessary for interaction with tmev a, we prepared a series of synthetic rna constructs ( figure b ) and assessed a binding by electrophoretic mobility shift assays (emsa; figure c ) and microscale thermophoresis (mst; figure d ). binding of a was generally high affinity, with dissociation constants in the sub-micromolar range ( figure d ). truncation of either the shift-site (tmev , kd = ± nm), the ′ extension (tmev , kd = ± nm) or both (tmev , kd = ± nm) had little effect on a binding. shortening the loop by three nucleotides (tmev , kd = ± nm) slightly reduced the affinity; however, removal of the ′ extension (tmev , tmev ) completely abolished a binding, even though the stem-loop in tmev is predicted to be intact, and the three essential cytosines are present in the loop. 
to validate that these small rnas are adopting conformations relevant to prf, we performed competition experiments. a dual-luciferase-based reporter mrna containing the tmev shift site and stimulatory element was prepared and designed such that frame and − frame products would be easily distinguishable by sds-page following in vitro translation in rabbit reticulocyte lysates . in the presence of a, prf occurred efficiently (~ %; figure e ), but inclusion of a molar excess of the small rna would be predicted to reduce prf efficiency if the competitor rna were to sequester a from the reporter mrna. in line with the emsa assays, tmev and tmev both efficiently competed with the reporter mrna, reducing prf to % and %, respectively. rnas lacking the ′ extension (tmev , tmev ) were unable to compete ( figure e) . we have recently shown that the ′ extension is also important for a binding in emcv, where it likely forms the second stem of an rna pseudoknot via interaction with the ccc loop motif . however, given that the equivalent loop sequence is nt shorter in tmev, it was unclear whether this alternate conformation would be topologically possible for the more compact rna element. nevertheless, an alignment of cardiovirus rna sequences that direct prf shows that, in addition to the cytosine triplet, there are several nucleotides in the ′ extension that are completely conserved in all isolates ( figure a ). to investigate this in more detail, we truncated the ′ extension one nucleotide at a time ( figure b ) and assessed a binding by emsa ( figure c ). removal of the first six nucleotides of the ′ extension (cagcca; tmev - ) had no dramatic effect on a binding; however, removal of nucleotides from the conserved caagg motif progressively reduced binding until it was undetectable (tmev ). to explore this further, we made point mutations and assessed their effects in a frameshift reporter assay ( figure d ). as expected, loop mutations c g or c g abolished prf. 
however, frameshifting in the c g background was restored by introducing a g c mutation in the ′ extension, demonstrating the necessity for a base pair between positions and . the importance of this alternate conformation for a recognition was verified by emsa analysis ( figure e ): in rna tmev , which comprises the minimal functional element as defined by the deletion analysis, individual mutation of either c or g completely abrogated a binding. an unrestrained rna folding simulation identified a pseudoknot-like conformation consistent with the biochemical data, including a base pair between g and c ( figure f ). this is consistent with our evidence for a pseudoknot-like conformation in the emcv sequence that is selectively recognised by a . whilst the double c g+g c mutant could rescue prf in our in vitro reporter assay, it did not restore a binding by emsa, either in the background of tmev ( figure e ) or the longer tmev rna (data not shown). this is consistent with the double mutant forming a topologically equivalent conformer that is either less stable and/or binds a with lower affinity. thus, whilst this rna is still able to stimulate prf, the association does not persist for the necessary timescales to permit observation in a dissociative technique such as emsa. to assess the binding affinity and stoichiometry in solution with unlabelled rna, we carried out isothermal titration calorimetry (itc) experiments with tmev and a protein. to prevent protein aggregation in the itc cell, it was necessary to perform the titration at mm nacl. increasing the salt concentration had only a slight effect on a binding as judged by emsa ( figure a ) and under these conditions we observed a kd of ± nm and a ~ : molar ratio of protein to rna ( figures b and c) . the large contribution of the Δh (− . ± . kcal/mol) term to the overall free energy of binding (− . 
kcal/mol) is consistent with an electrostatic interaction mechanism, and the slightly higher affinity observed using itc versus mst and emsa may arise from minor differences in buffer composition, or the absence of the fluorophore on the rna, which may slightly interfere with protein binding. to confirm the stoichiometry in solution, we performed a sec-mals analysis of the mixture retrieved from the itc cell ( figure d ). as expected, the two peaks correspond to an approximate : rna-protein complex (early; mw , da observed vs. , da expected) and excess rna (late; mw , da observed vs. , da expected). the slightly higher than expected mass of the early peak may be due to small amounts of a dimeric rna species that independently co-migrate with the complex. to measure a-stimulated prf in the context of virus infection, we carried out metabolic labelling experiments. in addition to wt virus, two mutant viruses were employed. virus ss has two synonymous mutations in the slippery sequence (g_guu_uuu to a_gug_uuu) that prevent frameshifting , . virus m has the wt slippery sequence, but contains the m mutations described above ( figure d ), in which the universally conserved a residues r and r are mutated to alanine, rendering it unable to bind to the prf-stimulatory rna stem-loop or to stimulate frameshifting . the small-plaque phenotype previously observed for these mutant viruses, and their emcv counterparts [ ] [ ] [ ] , was confirmed in bsr cells, a single cell clone of the mesocricetus auratus cell line bhk- ( figure s a ). western blotting of tmev-infected bsr cells harvested over a -hour timecourse demonstrated that a expression is detectable from eight hours post-infection (hpi) onwards, with levels increasing up to hpi, at which point cytopathic effects become fairly extensive ( figure s b ). cells were infected at a multiplicity of infection (moi) of three, or mock-infected, and cell lysates analysed at , , and hpi ( figure a , figure s c ). 
at hpi, reliable quantification of viral proteins above the background of host translation was not possible. however, at and hpi, a large proportion of ongoing translation is viral, likely due to virus-induced shutoff of host gene expression, and viral proteins were clearly visible and quantifiable. proteins downstream of the frameshift site are only translated by ribosomes which have not undergone frameshifting, as those which frameshift encounter a stop codon in the − frame eight codons downstream of the shift site. frameshift efficiency can be estimated from the ratio of downstream to upstream products, normalised by the frameshift-defective m mutant as a control (detailed in methods, and previously described in , ). the mean wt frameshift efficiency was found to increase from % at hpi to % at hpi ( figure b , blue bars) (two-tailed welch's t-test: t = − . , df = . , p = . ), reaching a percentage efficiency within the - % range calculated by metabolic labelling of infected bhk- cells , . the frameshift efficiency of the ss mutant virus is negligible at both timepoints ( % and % at and hpi respectively), as previously shown for both tmev and emcv . the wt frameshift efficiency measurements from the metabolic labelling experiments are considerably higher than those measured in the in vitro frameshift reporter assays ( figure e ), a discrepancy also observed in experiments with emcv . this could be due to a number of reasons, but is most likely due to differing activities of purified versus virally-produced a, or differences in translation conditions between the two systems, such as potential depletion of the supply of translation factors during infection. we extended our examination of prf in tmev-infected cells using ribosome profiling, a deep sequencing-based technique that gives a global snapshot of the positions of translating ribosomes at sub-codon resolution , . 
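The comparisons of frameshift efficiency above are reported as two-tailed Welch's t-tests with non-integer degrees of freedom. As an illustration of where those two quantities come from, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` is hypothetical, and no values from the study are reproduced here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample is a sequence of replicate measurements (at least two values),
    for example frameshift efficiencies from independent infections.
    """
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors; the two groups are not assumed to share a variance.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

Converting t and df into a two-tailed p-value requires the cumulative distribution of Student's t, which the standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test in one call if SciPy is available.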
bsr cells were infected with wt, ss, or m tmev, or mock-infected, and harvested without cycloheximide treatment at hpi by snap-freezing in liquid nitrogen, to preserve the position of ribosomes on viral and host mrnas. cells were lysed and the presence of a in infected samples was verified by western blot ( figure s a ). lysates were treated with rnase i and ribosome-protected mrna fragments (rpfs) harvested by pelleting the ribosomes through sucrose and subsequent phenol extraction ( figure c ). rpfs were ligated to adapters, cloned, and deep sequenced. reads were mapped to the host (m. auratus) and viral (based on nc_ ) genomes (table s ) to precisely determine the locations of translating ribosomes. quality control analysis, carried out as previously described , indicated that the datasets were of high quality. translating ribosomes are known to protect two distinct lengths of mrna from nuclease digestion: - nt, thought to originate from a ribosome with an occupied a site , , , and - nt, from either a translocating ribosome in the rotated state or a ribosome with an empty a site , . metaanalysis of reads mapping to host and viral coding sequences revealed a read length distribution with two distinct peaks at the expected lengths for these two populations of rpfs ( figure s b ). in mammalian cell lysates treated with sufficient nuclease that all unprotected regions of mrna are fully digested ("trimmed"), there is a distance of ~ nt between the ′ end of the rpf and the first nucleotide in the p site of the ribosome . this can be used to infer the frame of translation. in our libraries, the majority of rpf ′ ends map to the first nucleotide position of codons (herein termed phase ), as is evident for the host cds-mapping reads in this dataset ( figure s c ). the length distribution and phase composition of reads mapping to the viral genome closely matched those of host-mapping reads. 
a meta-analysis of the inferred p site positions of ribosomes relative to host mrna start and stop codons reveals that rpfs map to coding sequences with a triplet periodicity reflective of the length of a codon ( figure s d ). as expected, few rpfs map to the utrs, particularly the ′utr. we also observe typically heightened rpf peaks corresponding to the sites of translation initiation and termination , . to analyse translation of the viral rna, we looked at rpf distribution on the viral genome ( figure d -g, figure s ). there is a clear dominance of the phase throughout the genome ( figure e ), including within the + -frame l* orf which overlaps l and vp , indicating that the l* orf is not highly translated. ribosomes that undergo frameshifting at the b* shift site translate only eight codons in the − frame before encountering a termination codon. although the transframe region is too short to see the change in frame in the whole-genome plot, the resultant drop-off in ribosome density after the frameshift site in the wt virus is strikingly obvious ( figure d ). no such decrease in density is seen after the frameshift site on the ss virus genome, and there is actually a slight increase in read density in this region on the m genome. this could be due to differences between the two regions in mean translation rate or the extent of various biasing effects inherent in the ribosome profiling procedure, such as ligation, pcr or nuclease biases . in order to control for these effects and highlight translational features related to the presence of a functional a and/or shift site, the read densities on the wt and ss genomes were divided by those in the corresponding position on the m genome ( figure f , figure s b ). 
frameshift efficiency was calculated from the ratio of m -normalised rpf density in the regions downstream and upstream of the frameshift site, revealing a mean wt frameshift efficiency of %, which to our knowledge is the highest − prf efficiency thus far recorded in any natural system , , ( figure b , green bars). this is significantly greater than the % measured by metabolic labelling at the same timepoint (two-tailed welch's t-test: t = − . , df = . , p = . ). the profiling assay (with m normalisation) is expected to be substantially more accurate than the metabolic labelling approach, which suffers from lower sensitivity (densitometry of a few protein bands versus high throughput sequencing), interference from background and closely migrating protein bands, and possible differences in polyprotein processing and protein turnover between wt and mutant viruses. given the metabolic labelling data ( figure s c) , it is likely that an even higher frameshift efficiency might be detected if profiling was carried out at hpi. the occurrence of a highly efficient frameshift in wt tmev was verified by the observation of a marked shift in the dominant rpf phase from in the upstream region, to − /+ in the b* transframe region ( figure s a ). it should be noted that in frameshift efficiency calculations, neither the profiling nor the metabolic labelling experiments would reliably distinguish between ribosome drop-off due to termination at the b* stop codon post-frameshifting, and ribosome drop-off at the stopgo site located just five codons upstream. however, stopgo is generally very efficient in cultured cells, with little drop-off , , and this is evident here, as no obvious decrease in rpf density occurs after the stopgo motif in the m mutant ( figure d , figure s a) . surprisingly, the mean frameshift efficiency of the ss mutant was high, at %. 
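The profiling-based estimate described above divides read densities by those at the corresponding position on a frameshift-defective control genome, then compares the regions downstream and upstream of the shift site. The sketch below illustrates that calculation under simplifying assumptions: positions with no control reads are skipped, each window is summarised by a simple mean, and library-size scaling is ignored. The function names are hypothetical, and the authors' methods should be consulted for the exact procedure.

```python
def normalised_density(virus, control):
    """Per-position ratio of virus to control RPF counts (None where control is 0)."""
    return [v / c if c > 0 else None for v, c in zip(virus, control)]


def window_mean(values, start, end):
    """Mean of the defined values in the half-open window [start, end)."""
    kept = [x for x in values[start:end] if x is not None]
    return sum(kept) / len(kept)


def frameshift_efficiency(virus, control, upstream, downstream):
    """Percentage of ribosomes lost at the shift site, from normalised densities.

    virus, control: per-nucleotide RPF counts on the two genomes (same length).
    upstream, downstream: (start, end) windows on either side of the shift site.
    """
    norm = normalised_density(virus, control)
    ratio = window_mean(norm, *downstream) / window_mean(norm, *upstream)
    return 100.0 * (1.0 - ratio)
```

As the text notes, an estimate of this kind measures loss of ribosome density and cannot by itself distinguish termination after frameshifting from other sources of drop-off near the shift site.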
similar residual frameshift activity has been observed at other highly efficient shift sites with analogous mutations designed to knock out frameshifting , . this may be reflective of the very strong frameshift-stimulatory activity of the a-rna complex facilitating frameshifting even despite unfavourable codon-anticodon re-pairing in the − frame. however, prolonged pausing of ribosomes over the mutant slippery sequence (discussed below) could potentially influence mechanisms other than frameshifting that may contribute to this observed drop-off, such as stopgo, discussed above, or ribosome drop-off at the frameshift site itself . a small increase in the percentage of − /+ phase reads is seen in the transframe region of the ss mutant, but not in the m control ( figure s a ). however, phasing proportions taken over such a short region (nine codons) cannot be expected to reliably quantify relative translation levels in the two overlapping reading frames. ribosomes pausing over the slippery sequence has long been considered a mechanistically important feature of prf [ ] [ ] [ ] . however, while observed to a small extent on wt shift sites in vitro , , [ ] [ ] [ ] , it has been elusive in ribosome profiling data , . in contrast, if the slippery sequence is mutated to prevent frameshifting, a measurable pause is seen both in vitro , , and in profiling experiments , perhaps reflecting a reduced ability of non-frameshifting ribosomes to resolve the topological problem posed by the downstream frameshift-stimulatory element. ribosome profiling allows identification of ribosome pauses with single-nucleotide precision. it should be noted that at the level of single nucleotides ribosome profiles are strongly affected by nuclease, ligation and potentially other biases introduced during library preparation. 
however, by comparing the wt and ss genome ribosome profiles with the m genome ribosome profile, we may identify potential changes in dwell time on some codons that may be related to a binding. the region of greatest difference between wt and m is surprisingly not at the frameshift site, but in the middle of the a orf ( figure f , figure s b ). this pause likely corresponds to ribosomes translating the rna-binding arginine residues (r and r ), which are mutated to alanine in the m genome. the pause on the m -normalised plot likely reflects increased decoding time for the arginine-encoding cgc codons in the wt, which are relatively poorly adapted to the cellular trna pool compared to the alanine-encoding gcc codons in the m mutant , ( figure s b and c). this peak is present to a lesser extent on the ss genome, potentially due to the slightly slower replication kinetics of this mutant virus meaning there are fewer copies of the viral genome to deplete the cellular supply of the relevant aminoacylated trna. moving on to look specifically at the frameshift site, a single-nucleotide resolution plot of reads mapping to this region reveals a peak on the ss mutant genome corresponding to a ribosome paused with the guu codon of the slippery sequence in the p site ( figure g , figure s c ). this putative pause is present to a lesser extent on the wt genome, but not the m genome, indicating it is related to the presence of functional a. the uuu codon of the slippery sequence also appears to have a much larger phase peak in ss than m , however this may be enhanced by potential "run-on" effects in which a fraction of ribosome pauses may be able to resolve during cell harvesting and ribosomes translocate to the next codon . 
on both wt and ss genomes, there is a peak two nucleotides downstream of the main slippery sequence pause, corresponding to ribosomes which have frameshifted and then translocated one codon, and further peaks suggestive of − -frame translation throughout the b* orf. closer inspection of the whole-genome ( figure d , figure s a ) and m -normalised plots ( figure f , figure s b ) reveals that the large pause at the frameshift site actually extends a little further upstream, ending just upstream of the a- b junction formed by the stopgo motif. further investigation of this region ( figure s b ) reveals that this is due to a very prominent pause in the ss mutant, and to a lesser extent the wt, with ribosomal p sites corresponding to the glutamic acid and methionine residues of the d(v/i)exnpg|p stopgo motif (pause sites highlighted in bold, where x is methionine in tmev). this pause is, surprisingly, considerably larger than the pause over the slippery sequence itself. ribosomal pausing over stopgo motifs has been observed in vitro ; however, it occurs with the ribosomal p site corresponding to the conserved glycine residue directly before the 'cleavage' site , . noticing that the pause over the stopgo motif was nine codons upstream of the pause over the slippery sequence, we wondered whether this might be consistent with the transient formation of disomes in which the leading ribosome is paused over the slippery sequence. disomes are routinely excluded during preparation of ribosome profiling libraries by the inclusion of a size-selection step (in this study, - nt) which selects for monosome-protected fragments . for mock-, wt- and m -infected lysates from replicate we carried out two parallel size selection steps, in which the - nt "monosome" fraction and a - nt "broad spectrum" fraction were isolated from the same lysate. 
quality control analysis of the length distribution of reads in the broad-spectrum fraction demonstrated local peaks at read lengths of around , , and - nt, consistent with expected lengths of rna protected by disomes , , ( figure a) . quantification of the number of reads attributed to each phase for reads of each length revealed a bias towards phase , indicating a portion of genuine ribosome footprints among reads of lengths - , - , and - nt ( figure b ). these read lengths were selected for analysis as potential "disome-protected fragments", and their density plotted on the viral genome at the inferred p site position of the upstream, colliding ribosome ( figure c and d). a very prominent peak is visible on the wt genome over the stopgo motif ( figure c) , and closer inspection reveals that one of the highest peaks in this region is over the valine of the stopgo motif, ten codons upstream of the slippery sequence pause ( figure d ). this is the expected distance between the p sites of ribosomes involved in a disome , , and would be consistent with disome formation due to a ribosome translating the stopgo motif colliding with a ribosome paused over the slippery sequence. further, this approximate ten-codon periodicity in distances between peaks extends upstream, consistent with potential formation of ribosome queues up to six ribosomes long , ( figure d , grey oblongs). an additional peak is evident over the arginine just upstream of the stopgo release site, which would correspond to a disome in which the leading ribosome was paused over the conserved glycine of the stopgo motif ( figure d, brown oblongs) . this is evidently a feature of the stopgo site itself and unrelated to the binding of a downstream, as it occurs in the m dataset as well as the wt. 
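The queueing interpretation above rests on simple arithmetic: if the leading ribosome pauses with its P site at a known codon, each ribosome stacked behind it should sit roughly ten codons further upstream. The sketch below lists those expected positions and checks them against observed peak positions. The function names and the one-codon tolerance are assumptions made for illustration; the ten-codon spacing is the P-site-to-P-site distance quoted in the text.

```python
def queue_p_sites(lead_p_site_codon, n_ribosomes, spacing_codons=10):
    """Expected P-site codon indices for a queue behind a paused lead ribosome.

    The first entry is the lead ribosome itself; each later entry lies
    spacing_codons further upstream (a smaller codon index).
    """
    return [lead_p_site_codon - k * spacing_codons for k in range(n_ribosomes)]


def matches_queue(peak_codons, lead_p_site_codon, n_ribosomes, tolerance=1):
    """For each expected queue position, is an observed peak within tolerance codons?"""
    expected = queue_p_sites(lead_p_site_codon, n_ribosomes)
    return [any(abs(p - e) <= tolerance for p in peak_codons) for e in expected]
```

Because profiling averages over many genome copies, agreement with such a ladder of positions is consistent with queue formation but does not show that all positions are occupied on the same RNA at once.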
it should be noted that ribosome profiling generates an averaged result of ribosome positions over multiple copies of the viral genome, and formation of the two putative disomes proposed here could not occur simultaneously on a single rna. the potential formation of ribosome queues behind the frameshift/stopgo site is supported by the monosome data, in which rpf density gradually increases throughout a, reaching a maximum over the stopgo motif. this is particularly apparent in the data from the ss mutant ( figure d and f), consistent with the idea that greater pausing over the mutant slippery sequence may be increasing disome formation, pushing the lengths of protected fragments into the disome fraction, and reducing its visibility in the monosome dataset. this is unusual, as no prominent disome peaks were detected near the frameshift site by broad spectrum ribosome profiling of the coronavirus murine hepatitis virus . indeed, the potential formation of disomes at the frameshift site in tmev represents the first evidence that ribosomes can stack at a prf signal. our combination of structural, biophysical, biochemical and deep sequencing approaches affords a view of unparalleled detail into the molecular mechanisms underpinning theilovirus protein-stimulated frameshifting. despite highly divergent sequences within cardioviruses, the tmev a protein adopts the second known occurrence of the beta-shell fold. whilst the distribution of positively charged residues comprising the putative rna-binding surface is different from that found in emcv, it nevertheless recapitulates the same exquisite conformational selectivity for binding to its viral rna target. strikingly, we demonstrate that this is a pseudoknot that is likely to exist in equilibrium with the stem-loop previously suggested by structure probing experiments. 
despite being nt shorter, the stimulatory rna element is able to adopt a conformation topologically equivalent to that previously seen in emcv, involving interactions between the loop and the ′ extension. in infected cells, stabilisation of this conformation by high-affinity a binding represents the 'switch' that controls frameshifting and thereby reprogramming of viral gene expression. our ribosome profiling data reveals that, when invoked, this is up to % efficient, representing the highest known − prf efficiency to date. whilst frameshift-associated pausing is not normally detectable at wt shift sites in profiling data, we show that it is detectable here by analysing rpfs corresponding to disomes, not only at the prf site but also the adjacent stopgo motif. this is consistent with the relatively long pauses accompanying tmev frameshifting in vitro, and suggests that ribosome collisions are more common than previously thought during translation of the tmev genome. taken together, these results suggest that there is a fine balance between necessary ribosome pausing associated with recoding events, and the detrimental effect on viral fitness that would result from these pauses lasting long enough to trigger ribosome quality control pathways, and the degradation of the viral rna. in future, structural characterisation of the a-rna complex will yield further insights into the molecular mechanisms underlying the potency of this elongation blockade. protein expression and purification tmev a cdna was amplified by pcr from plasmid pgex p -based constructs (f ′ aattcatatgaatcccgcttctctctaccgc ′; r ′ aattggatccttattagcctgggttcatttctacatc ′) and cloned into popt g to introduce a c protease-cleavable n-terminal gst tag. recombinant protein was produced in e. coli bl (de ) plyss cells. cultures were grown in xty broth supplemented with μg/ml ampicillin ( °c, rpm). expression was induced at a of ~ with . 
mm isopropyl β-d- -thiogalactopyranoside (iptg) and continued overnight ( rpm, °c, h). bacteria were pelleted ( , × g, °c, min), washed in cold pbs and stored at - °c. cells from four litres of culture were thawed in ml lysis buffer ( mm tris (hcl) ph . , mm nacl, . mm mgcl , . mm dtt, . % w/v tween- supplemented with μg/ml dnase i and edta-free protease inhibitors) and lysed using a cell disruptor ( kpsi, °c). the insoluble fraction was pelleted by centrifugation ( , × g, min, °c) and discarded. supernatant was incubated ( h, °c) with . ml of glutathione sepharose b resin (ge healthcare) that had been pre-equilibrated in the same buffer. protein-bound resin was washed three times by centrifugation ( × g, min, °c) and re-suspension in ml wash buffer ( mm tris (hcl) ph . , mm nacl, . mm dtt). washed resin was transferred to a gravity column and protein was eluted in batch with ml wash buffer supplemented with mm reduced glutathione ( h, °c). gst-tagged c protease was added to the eluate ( μg/ml) and the mixture was dialysed ( k molecular weight cut-off (mwco), °c, h) against l wash buffer to remove the glutathione. dialysed proteins were then re-incubated with glutathione sepharose b resin (as above; h, °c) to remove the cleaved gst and gst- c protease. the flow-through was subjected to heparin-affinity chromatography to remove nucleic acids. samples were loaded on a ml hitrap heparin column (ge healthcare) at . ml/min, washed with two column volumes of buffer a ( mm tris (hcl) ph . , mm nacl, . mm dtt and eluted with a %  % gradient of buffer b ( mm tris (hcl) ph . , . m nacl, . mm dtt) over column volumes. after removal of nucleic acids, protein became aggregation-prone and precipitated at low temperatures, therefore all subsequent steps were performed at °c. fractions corresponding to the a peak were pooled and concentrated using an amicon® ultra centrifugal filter unit ( k mwco, , × g) prior to size exclusion chromatography (superdex / column; mm hepes ph . , . 
m nacl, . mm dtt). purity was assessed by - % gradient sds-page, and protein identity verified by mass spectrometry. purified protein was used immediately for crystallisation trials or was concentrated (~ . mg/ml, μm), snap-frozen in liquid nitrogen and stored at - °c. for studies of the tmev a protein in isolation, a superdex increase / gl column (ge healthcare) was equilibrated with mm tris (hcl) ph . , . m nacl ( . ml/min flow, °c). per experiment, μl of protein was injected at concentrations of . , . and . mg/ml (molar concentrations of , . and . μm, respectively). the static light scattering, differential refractive index, and the uv absorbance at nm were measured by dawn + (wyatt technology), optilab t-rex (wyatt technology), and agilent uv (agilent technologies) detectors. the corresponding molar mass from each elution peak was calculated in astra software (wyatt technology), using the differential refractive index and a dn/dc value of . to calculate the protein concentration. for studies of a-rna complexes, samples were recovered directly from the itc cell after confirmation of binding and concentrated to an a of ~ . prior to injection of μl onto a superdex increase / gl column pre-equilibrated in mm tris (hcl) ph . , mm nacl ( . ml/min flow, °c). data were recorded as above, and to estimate the relative contributions of protein and rna to molar mass, a protein conjugate analysis was performed within astra , using a protein dn/dc value of . and an rna (modifier) dn/dc value of . . prior to this analysis, extinction coefficients (at nm) were determined experimentally from protein-only and rna-only peaks using the astra "uv extinction from ri peak" function. purified tmev a was concentrated to . mg/ml in mm tris (hcl) ph . , . m nacl, . mm dtt. sitting-drop vapor diffusion experiments were set up in -well mrc plates with μl reservoir solution, nl protein and nl crystallization buffer. diffraction-quality crystals grew in . m kbr, . 
m potassium thiocyanate, . m sodium cacodylate ph . , % w/v poly-γ-glutamic acid - and % w/v peg-mme . crystals were harvested in nylon loops and cryo-protected by removal from the mother liquor through a . μl layer of crystallization buffer that had been supplemented with % v/v glycerol, prior to flash-cooling by plunging into liquid nitrogen. x-ray data collection, structure determination, refinement and analysis datasets of images ( table ) were recorded from two crystals at beamline i - , diamond light source on a pilatus m detector (dectris), using % transmission, an oscillation range of . ° and an exposure time of . s per image. data were collected at a wavelength of . Å and a temperature of k. reflections were indexed and integrated with dials (highest resolution crystal) or xds (structure determination crystal) and data were scaled and merged with aimless within the xia data reduction pipeline . resolution cut-off was decided by a cc / value ≥ . and an i/σ(i) ≥ . in the highest resolution shell . the structure was solved by single-wavelength anomalous dispersion (sad) analysis of the structure determination crystal, using anomalous signal from bromine present in the crystallisation buffer. sad phasing was performed using autosharp , implementing shelxd for substructure determination , sharp for density modification and arp/warp for automated model building. this placed residues out of ( %) in the single chain that comprised the asymmetric unit of the cubic p cell. this preliminary model was subsequently used as a molecular replacement search model to solve a higher resolution dataset (highest resolution crystal) using phaser . this model was then subjected to several rounds of manual adjustment using coot and refinement with phenix.refine . molprobity was used to assess model geometry including ramachandran outliers, bad rotamers and mainchain geometry deviations throughout the refinement process. 
for the electrostatic potential calculations, pdb pqr and propka were used to estimate protein pka values and assign partial charges. electrostatic surface potentials were calculated using apbs . relative solvent-accessible surface areas per residue were calculated using getarea and crystallographic interaction interfaces were assessed using pdbepisa . for cardiovirus sequence alignments, the match > align tool in ucsf chimera was first used to generate a seed alignment based on superposed structures of emcv and tmev a, prior to the subsequent alignment of other cardiovirus a sequences. structural figures depicting crystallographic data (cartoon, stick and surface representations) were rendered in pymol (schrödinger llc). the representation of surface conservation was generated using consurf . technologies) . measurements were performed using a monolith nt. pico instrument (nanotemper technologies) at an ambient temperature of °c. instrument parameters were adjusted to % led power and medium mst power. data from two independently pipetted measurements were analysed for fraction bound using initial fluorescence (mo.affinity analysis software, nanotemper technologies). the non-binding rnas (namely tmev and ) were normalized using graphpad prism. itc analyses were carried out at °c using a microcal peaq-itc (malvern panalytical). rnas and proteins were dialysed ( h, °c) into buffer ( mm tris (hcl) ph . , mm nacl) before performing experiments. final concentrations of protein and rna after dialysis were determined by spectrophotometry (a and a , respectively), using theoretical extinction coefficients based on the primary sequence of each component. rna ( μm) was titrated into protein ( μm) with × . μl injection followed by × . μl injections. control titrations of rna into buffer, buffer into protein and buffer into buffer were also performed. 
data were analyzed using the microcal peaq-itc analysis software (malvern panalytical) and binding constants determined by fitting a single-site binding model. for in vitro frameshifting assays, we cloned a nt sequence containing the gguuuuu shift site flanked by nt upstream and nt downstream into the dual luciferase plasmid pdluc at the xhoi/bglii sites . the sequence was inserted between the renilla and firefly luciferase genes so that firefly luciferase expression is dependent on − prf. wild-type or derivative frameshift reporter plasmids were linearized with fspi and capped run-off transcripts generated using t rna polymerase as described . messenger rnas were recovered by phenol/chloroform extraction ( : v/v), desalted by centrifugation through a nucaway spin column (ambion) and concentrated by ethanol precipitation. the mrna was resuspended in water, checked for integrity by agarose gel electrophoresis, and quantified by spectrophotometry. messenger rnas were translated in nuclease-treated rabbit reticulocyte lysate (rrl) (promega). typical reactions were composed of % (v/v) rrl, μm amino acids (lacking methionine) and . mbq [ s]-methionine and programmed with ∼ μg/ml template mrna. reactions were incubated for h at °c. samples were mixed with volumes of × laemmli's sample buffer, boiled for min and resolved by sds-page. dried gels were exposed to a storage phosphor screen (perkinelmer), the screen was then scanned in a typhoon fla using the phosphor autoradiography mode. bands were quantified using imagequant™tl software. the calculations of frameshifting efficiency (%fs) took into account the differential methionine content of the various products and %fs was calculated as % − fs = × (ifs/metfs) / [(is/mets) + (ifs/metfs)]. in the formula, the number of methionines in the stop, − are denoted by mets, metfs respectively; while the densitometry values for the same products are denoted by is and ifs respectively. 
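The methionine-corrected frameshifting efficiency described above can be written as a short function. This is a sketch under stated assumptions: the argument names are invented here, and the leading constant of the formula is not recoverable from the text, so a scale of 100 (to express the result as a percentage) is assumed and exposed as a parameter.

```python
def frameshift_efficiency(i_stop, i_fs, met_stop, met_fs, scale=100.0):
    """Methionine-corrected frameshifting efficiency from band densitometry.

    i_stop, i_fs: densitometry values of the stop and frameshift products.
    met_stop, met_fs: number of methionines in each product.
    scale: ASSUMED to be 100 so that the result reads as a percentage.
    """
    if met_stop <= 0 or met_fs <= 0:
        raise ValueError("methionine counts must be positive")
    stop_per_met = i_stop / met_stop  # signal per methionine, stop product
    fs_per_met = i_fs / met_fs        # signal per methionine, frameshift product
    denominator = stop_per_met + fs_per_met
    if denominator == 0:
        raise ValueError("both bands have zero intensity")
    return scale * fs_per_met / denominator


if __name__ == "__main__":
    # Hypothetical band intensities and methionine counts.
    print(frameshift_efficiency(i_stop=300.0, i_fs=100.0, met_stop=3, met_fs=1))
```

Dividing each band by its methionine count before taking the ratio stops a product with more labelled residues from appearing more abundant than it is.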
all frameshift assays were carried out a minimum of three times. bsr (single cell clone of bhk- cells, species mesocricetus auratus, provided by polly roy, lshtm, uk) were maintained in dulbecco's modified eagle's medium (dmem), high glucose, supplemented with l-glutamine ( mm), antibiotics, and fetal bovine serum (fbs) ( %), at °c and % co . cells were verified as mycoplasma-free by pcr (e-myco plus mycoplasma pcr detection kit, intron biotechnology). cells were seeded to achieve % confluence on the day they were infected with virus stocks listed in napthine et al , based on gdvii isolate nc_ with three nucleotide differences present in wt and mutant viruses (g a, a g and g a; nt coordinates with respect to nc_ ). all infections, except for plaque assays, were carried out at a moi of in serum-free media, or media only for the mock-infected samples. after incubation at °c for h, inoculum was replaced with serum-free media supplemented with fbs ( %), and infected cells were incubated at °c until harvesting or further processing. bsr cells in -well plates at % confluence were inoculated with serial dilutions of virus stocks for h then overlaid with ml dmem supplemented with l-glutamine ( mm), antibiotics, fbs ( %), and carboxymethyl cellulose ( . %). plates were incubated at °c for h then fixed with formal saline and stained with . % toluidine blue. bsr cells in -well plates were infected at a moi of in a volume of μl. after h, the inoculum was replaced with dmem containing % fbs ( ml) and cells were incubated at °c for , , or h delineating the , , or hpi -timepoints respectively. at the set timepoint post-infection, cells were incubated for h in methionine-and serum-free dmem, then radiolabelled for h with [ s]-methionine at μci ml − (∼ , ci mmol − ) in methioninefree medium. cells were harvested and washed twice by resuspension in ml of ice-cold phosphate-buffered saline (pbs) and pelleted at , g for min. 
cell pellets were lysed in μl × sds-page sample buffer and boiled for min before analysis by sds-page. dried gels were exposed to x-ray films or to phosphorimager storage screens. image analysis was carried out using imagequanttl . to quantify the radioactivity in virus-specific products. bands which were quantifiable for all three viruses (table s ) were carried forward for normalisation by methionine content and then by an average of the quantifiable proteins upstream of the frameshift site as a loading control. bands in the wt and ss lanes were normalised by the counterpart bands in the m lane to control for differences in protein turnover. frameshift efficiency (%) is given by the equation × [ -(downstream/upstream)], where downstream and upstream represent the average of the fully normalised intensity values for proteins downstream and upstream of the frameshift site, respectively. bsr cells in cm dishes were inoculated in triplicate for h at °c at a moi of , or media only for the mock-infected samples. after inoculation, cells were incubated in serum-free media supplemented with fbs ( %) at °c for h. cells were harvested and ribosomeprotected fragments (rpfs) purified and prepared for next-generation sequencing as described in chung et al. (based on , , ) , with the following modifications. the cycloheximide pre-treatment was omitted from the harvesting protocol and cells were instead washed with warm pbs before snap-freezing in liquid nitrogen. rnase i (ambion) was added to final concentration . u/µl (replicate ) or . u/µl (replicate and -higher concentration to ameliorate incomplete trimming noticed in replicate ), and digestion inhibited by adding superase-in rnase inhibitor (invitrogen) to final concentration . u/µl and . u/µl, respectively. the range of fragment sizes selected during polyacrylamide gel purification before ligation to adapters was increased to - nt, and size ranges for post-ligation gels adjusted accordingly. 
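Returning to the metabolic labelling quantification described earlier in this section, the normalisation chain and the frameshift efficiency calculation can be sketched as below. This is an illustration only: the product labels, function names and data layout are hypothetical, and the constants in the efficiency equation are not recoverable from the text, so the form scale × [1 − (downstream/upstream)] with a scale of 100 is an assumption.

```python
from statistics import mean


def normalise_lane(intensities, met_counts, upstream):
    """Normalise bands by methionine content, then by a loading control.

    intensities, met_counts: dicts keyed by product label.
    upstream: labels of products encoded upstream of the frameshift site,
    whose mean per-methionine signal serves as the loading control.
    """
    per_met = {p: intensities[p] / met_counts[p] for p in intensities}
    loading = mean(per_met[p] for p in upstream)
    return {p: value / loading for p, value in per_met.items()}


def labelling_frameshift_efficiency(lane, reference_lane, upstream,
                                    downstream, scale=100.0):
    """Efficiency as scale * (1 - downstream/upstream); the form is ASSUMED.

    lane: normalised bands for the virus of interest.
    reference_lane: normalised bands for the frameshift-defective virus,
    used to control for differences in protein turnover.
    """
    ratio = {p: lane[p] / reference_lane[p] for p in lane}
    down = mean(ratio[p] for p in downstream)
    up = mean(ratio[p] for p in upstream)
    return scale * (1.0 - down / up)


if __name__ == "__main__":
    # Hypothetical labels and values, for illustration only.
    mets = {"up1": 1, "up2": 2, "down1": 1, "down2": 2}
    virus = {"up1": 100.0, "up2": 200.0, "down1": 20.0, "down2": 40.0}
    defective = {"up1": 100.0, "up2": 200.0, "down1": 100.0, "down2": 200.0}
    up, down = ["up1", "up2"], ["down1", "down2"]
    lane = normalise_lane(virus, mets, up)
    ref = normalise_lane(defective, mets, up)
    print(labelling_frameshift_efficiency(lane, ref, up, down))
```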
for replicate samples, two gel slices were excised per library, to purify monosome-protected ( - nt) and broad spectrum ( - nt) fragments from the same lysate. depletion of ribosomal rna was carried out solely by use of the ribozero gold human/mouse/rat kit (illumina). adapter sequences were based on the truseq small rna sequences (illumina), with an additional seven random nucleotides at the ′-end of the ′adapter and the ′-end of the ′-adapter, to reduce the effects of ligation bias and enable identification of pcr duplicates. for the broad-spectrum libraries only the ′-adapter contained the random nucleotides. libraries were deep sequenced on the nextseq platform (illumina), and data made publicly available on arrayexpress under accession numbers e-mtab- and e-mtab- . adapter sequences were removed and reads resulting from rna fragments shorter than nt discarded using fastx_clipper (version . . , parameters: -q -l -c -n -v). sequences with no adapters and those which consisted only of adapters were discarded. pcr duplicates were removed using awk, and the seven random nucleotides originating from the adapters were trimmed from each end of the reads using seqtk trimfq (version . ). reads were aligned to reference databases using bowtie , allowing one mismatch and reporting only the best alignment (version . . , parameters: -v --best), in the following order: rrna, vrna, mrna, ncrna, mtdna, and gdna. quality control analysis indicated some contamination of replicate libraries with e. coli rna (bl ). these reads do not exhibit the expected features of genuine rpfs, however, indicating that the contamination occurred after lysates were harvested and thus they do not affect our conclusions. reads were re-mapped with the addition of a bl reference database (cp . ) before vrna, to remove these reads. the rrna database was made up of genbank accession numbers nr_ . , nr_ . , nr_ . and nr_ . . the mrna database was compiled from the available m. 
auratus genbank refseq mrnas on th nov , after removing transcripts with annotated changes in frame. the ncrna and gdna databases are from m. auratus ensembl release (genome assembly . ). viral genome sequences were verified by de novo assembly with trinity (version . . ), and reversion rates for mutated bases verified as below . %. for the broad-spectrum dataset, reads with no detected adapter were not discarded by fastx_clipper, pcr duplicates were not removed, and two mismatches were allowed during bowtie mapping, except to bl e. coli, for which one mismatch was allowed. to visualise rpf distribution on the viral genome, the number of reads with ′-ends at each position was counted and divided by the total number of positive-sense host mrna and vrna reads for that library, to normalise for library size and calculate reads per million mapped reads (rpm). for plots covering the entire genome, a sliding window running mean filter of nt was applied. to generate the plot of wt and ss data normalised by m data, utrs plus a small buffer ( nt at the ′-end of the cds and nt at the ′-end) were excluded and the result of the nt running mean at each position on the wt or ss genome was divided by the nt running mean at the corresponding position on the m genome. this avoided any instances of division by zero. only positive sense reads were used, and pairs for normalisation were allocated according to replicate number. for the broad-spectrum dataset, reads of lengths - , - , and - nt were selected for analysis as potential "disome-protected fragments", and the denominator for normalisation of disome-protected read densities to rpm was calculated using only reads of these lengths. for all plots of read distribution on the viral genome, a + nt offset was added to the ′-end coordinate of the read before plotting, to reflect the inferred position of the ribosomal p site. for disome-protected fragments, this represents the position of the p site of the colliding ribosome. 
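The read-density calculations described above (counting read 5′ ends, shifting to the inferred P site, scaling to reads per million, smoothing with a running mean, and dividing by a reference profile) can be sketched as follows. This is a minimal sketch, not the authors' code: the window size and P-site offset are left as parameters because their values are not recoverable from the text, and the handling of windows at the sequence edges is an assumption made here.

```python
def rpm_profile(five_prime_ends, genome_length, total_mapped, p_site_offset):
    """Per-position read density in reads per million mapped reads (RPM).

    five_prime_ends: 0-based 5'-end coordinates of positive-sense reads.
    total_mapped: library-size denominator (e.g. host mRNA plus vRNA reads).
    p_site_offset: nucleotides added to the 5' end to infer the P site.
    """
    counts = [0.0] * genome_length
    for position in five_prime_ends:
        p_site = position + p_site_offset
        if 0 <= p_site < genome_length:
            counts[p_site] += 1.0
    return [c * 1_000_000 / total_mapped for c in counts]


def running_mean(values, window):
    """Centred running mean; windows are truncated at the edges (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def ratio_to_reference(sample, reference):
    """Position-wise ratio; NaN where the reference density is zero."""
    return [s / r if r > 0 else float("nan") for s, r in zip(sample, reference)]


if __name__ == "__main__":
    # Hypothetical toy data: three reads on a five-nucleotide "genome".
    profile = rpm_profile([0, 0, 1], genome_length=5, total_mapped=3,
                          p_site_offset=2)
    print(running_mean(profile, window=3))
```

Smoothing both profiles before taking the ratio, as the text describes, makes a zero denominator unlikely; the NaN guard here is only a defensive addition.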
for plots in the main text showing data from only one replicate, replicate was used. relative adaptiveness values for sense codons to the cellular trna pool were downloaded from the species-specific trna adaptive index compendium , for cricetulus griseus as a proxy for m. auratus, on rd aug . for the phasing and length distribution quality control plots, only reads which map completely within the cds of host mrna were included. reported kd values using the same constructs. e) experiment showing the effects of titrating excess short rnas (tmev - ) as competitors into an in vitro frameshift reporter assay. the concentrations of the reporter mrna and a were kept constant in the rrl and short rnas were added in -and fold molar excess relative to the reporter mrna, as indicated. translation products were visualised by using autoradiography, and % frameshifting was calculated following densitometry and correction for the number of methionines present in frame and − frame products. and ribosome profiling (green bars). bars represent the mean of three replicates, with values for each replicate indicated by crosses. c) schematic of ribosome profiling methodology. cells are flash frozen before lysis. rnase i is added to digest regions of unprotected rna, then ribosomes and enclosed ribosome-protected fragments (rpfs) of rna are purified. rpfs are released and prepared for highthroughput sequencing to determine the positions of translating ribosomes at the time of cell lysis. d) rpf densities in reads per million mapped reads (rpm) on wt, ss and m viral genomes (top panel), after application of a nt running mean filter, from cells harvested at hpi. positive sense reads are plotted in green (above the horizontal axis), negative sense in red (below the horizontal axis). in all plots, rpf densities from replicate are shown, plotted at the inferred position of the ribosomal p site. 
e) rpf densities from d, coloured according to phase (purple, blue, and yellow represent rpfs whose ′ ends map to, respectively, the st, nd or rd nucleotides of polyprotein-frame codons, defined as phases , + /− , and + /− ), after application of a codon running mean filter. frames for each viral orf are designated with respect to the polyprotein (set to ) and indicated on the genome plot by colour and by an offset in the y plane in the corresponding direction. f) ratio of wt or ss density at each position on the genome relative to the rpf density on the m mutant genome. utrs were excluded, only positive sense reads were used, and a nt running mean filter was applied before the division. regions defined as "upstream" and "downstream" in the ribosome profiling frameshift efficiency calculations are annotated below, and average densities for these regions displayed to the left of the plots as grey lines. g) inferred positions of ribosomal p sites at the slippery sequence and in the b* region, coloured according to the phase of rpf ′ ends as in e. the genome sequence in this region is underlaid beneath the data in the top panel, and all libraries are set to the same scale on the y axis, with no running mean filter applied. supplementary . ribosome profiling of disome-protected fragments. a) length distribution of reads mapping within host cdss for broad spectrum libraries. b) total number of reads attributed to each phase, for reads of each specified length, in the broad-spectrum libraries. read lengths which were selected for inclusion in the "disome-protected fragment" plots (c and d) are indicated by square brackets, and correspond to peaks in a). c) density of disome-protected fragments on the wt and m viral genomes, derived from the same lysate as the monosome-protected fragment data plotted in figure d-g. reads of lengths - , - , and - nt were selected for inclusion, and read densities are plotted as rpm after application of a nt running mean filter. 
positive sense reads are plotted in green (above the horizontal axis), negative sense in red (below the horizontal axis). in all disome plots, reads are plotted at the inferred p site position of the colliding ribosome. d) density of disome-protected fragments, plotted at inferred p site positions of colliding ribosomes involved in disomes upstream of the b* frameshift site, coloured according to phase as in panel b. the amino acid sequence in this region is underlaid beneath the data in the top panel, and both libraries are set to the same scale on the y axis, with no running mean filter applied. codons on which leading ribosomes would be expected to pause due to stopgo and frameshifting (fs) are indicated in brown and grey respectively. positions of ribosomes potentially involved in queue formation behind these pause sites are indicated (fs: above genome map; stopgo: below genome map), with p site positions annotated as dashed lines, in the corresponding colours. cardioviruses are genetically diverse and cause common enteric infections in south asian children phylogenetic analysis of the species theilovirus: emerging murine and human pathogens facets of theiler's murine encephalomyelitis virus-induced diseases: an update a detailed kinetic analysis of the in vitro synthesis and processing of encephalomyocarditis virus products proteolytic processing of picornaviral polyprotein cell-specific proteins regulate viral rna translation and virus-induced disease a cell cycle-dependent protein serves as a template-specific translation initiation factor proteolysis at the a/ b junction in theiler's murine encephalomyelitis virus a case for "stopgo": reprogramming translation to augment codon meaning of ggn by promoting unconventional termination (stop) after addition of glycine and then allowing continued translation (go) characterization of ribosomal frameshifting in theiler's murine encephalomyelitis virus theiler's murine encephalomyelitis virus contrasts with 
encephalomyocarditis and foot-and-mouth disease viruses in its functional utilization of the stopgo non-standard translation mechanism ribosomal frameshifting and transcriptional slippage: from genetic steganography and cryptography to adventitious use non-canonical translation in rna viruses mechanisms and biomedical implications of - programmed ribosome frameshifting on viral and bacterial mrnas mutational analysis of the "slippery-sequence" component of a coronavirus ribosomal frameshifting signal torsional restraint: a new twist on frameshifting pseudoknots a mechanical explanation of rna pseudoknot function in programmed ribosomal frameshifting a. mrna pseudoknot structures can act as ribosomal roadblocks thermodynamic control of - programmed ribosomal frameshifting the energy landscape of - ribosomal frameshifting protein-directed ribosomal frameshifting temporally regulates gene expression characterization of the stimulators of protein-directed ribosomal frameshifting in theiler's murine encephalomyelitis virus ribosomal frameshifting into an overlapping gene in the b-encoding region of the cardiovirus genome programmed - /- ribosomal frameshifting in simarteriviruses: an evolutionarily conserved mechanism transactivation of programmed ribosomal frameshifting by a viral protein a novel role for poly(c) binding proteins in programmed ribosomal frameshifting structural studies of cardiovirus a protein reveal the molecular basis for rna recognition and translational control mutational analysis of the emcv a protein identifies a nuclear localization signal and an eif e binding site simrnaweb: a web server for rna d structure modeling with optional restraints an analysis by metabolic labelling of the encephalomyocarditis virus ribosomal frameshifting efficiency and stimulators genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of 
mammalian proteomes high-resolution analysis of coronavirus gene expression by rna sequencing and ribosome profiling polypeptide chain initiation: nucleotide sequences of the three ribosomal binding sites in bacteriophage r rna ribosome pausing and stacking during translation of a eukaryotic mrna high-resolution ribosome profiling defines discrete ribosome elongation states and translational regulation during cellular stress distinct stages of the translation elongation cycle revealed by sequencing ribosome-protected mrna fragments translation inhibitors cause abnormalities in ribosome profiling experiments accounting for biases in riboprofiling data indicates a major role for proline in stalling translation e unum pluribus: multiple proteins from a self-processing polyprotein site-specific release of nascent chains from ribosomes at a sense codon programmed - frameshifting by kinetic partitioning during impeded translocation sequence requirements for efficient translational frameshifting in the escherichia coli dnax gene and the role of an unstable interaction between trna(lys) and an aag lysine codon ribosome pausing, arrest and rescue in bacteria and eukaryotes ribosomal movement impeded at a pseudoknot required for frameshifting ribosomal pausing during translation of an rna pseudoknot kinetics of ribosomal pausing during programmed - translational frameshifting comparative analysis of gene expression in virulent and attenuated strains of infectious bronchitis virus at subcodon resolution stadium: species-specific trna adaptive index compendium understanding biases in ribosome profiling experiments reveals signatures of translation dynamics in yeast analysis of the aphthovirus a/ b polyprotein 'cleavage' mechanism indicates not a proteolytic reaction, but a novel translational effect: a putative ribosomal 'skip dom rescues ribosomes in ' untranslated regions genome-wide survey of ribosome collision the extent of ribosome queuing in budding yeast structural 
basis of vps a recruitment to the human hops complex by vps dials: implementation and evaluation of a new integration package how good are my data and what is the resolution? xia : an expert system for macromolecular crystallography data reduction linking crystallographic model and data quality automated structure solution with autosharp a short history of shelx arp/warp and molecular replacement phaser crystallographic software features and development of coot phenix: a comprehensive python-based system for macromolecular structure solution improved management of lysosomal glucosylceramide levels in a mouse model of type gaucher disease using enzyme and substrate reduction therapy pdb pqr: an automated pipeline for the setup of poisson-boltzmann electrostatics calculations electrostatics of nanosystems: application to microtubules and the ribosome exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules inference of macromolecular assemblies from crystalline state ucsf chimera--a visualization system for exploratory research and analysis consurf : an improved methodology to estimate and visualize evolutionary conservation in macromolecules processive selenocysteine incorporation during synthesis of eukaryotic selenoproteins translational termination-re-initiation in viral systems the use of duplex-specific nuclease in ribosome profiling and a user-friendly software package for ribo-seq data analysis the ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mrna fragments mammalian micrornas predominantly act to decrease target mrna levels we thank the staff of diamond light source beamline i - for assistance with crystal screening and data collection. we thank janet deane for assistance with sec-mals experiments. 
we thank tatyana
key: cord- -ggm vrz authors: nova, nicole; deyle, ethan r.; shocket, marta s.; macdonald, andrew j.; childs, marissa l.; rypdal, martin; sugihara, george; mordecai, erin a. title: susceptible host availability modulates climate effects on dengue dynamics date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ggm vrz
experiments and models suggest that climate affects mosquito-borne disease transmission. however, disease transmission involves complex nonlinear interactions between climate and population dynamics, which makes detecting climate drivers at the population level challenging. by analyzing incidence data, estimated susceptible population size, and climate data with methods based on nonlinear time series analysis (collectively referred to as empirical dynamic modeling), we identified drivers and their interactive effects on dengue dynamics in san juan, puerto rico. climatic forcing arose only when susceptible availability was high: temperature and rainfall had net positive and negative effects, respectively. by capturing mechanistic, nonlinear, and context-dependent effects of population susceptibility, temperature, and rainfall on dengue transmission empirically, our model improves forecast skill over recent, state-of-the-art models for dengue incidence. together, these results provide empirical evidence that the interdependence of host population susceptibility and climate drives dengue dynamics in a nonlinear and complex, yet predictable way.
empirical dynamic modeling (edm)
edm infers a system's mechanistic underpinnings and predicts its dynamics using time series data of one or more variables to construct an attractor in state space ( figure s ). 
this procedure is called univariate (using lagged versions of a single to test for nonlinear state dependence of a variable-the motivation behind edm- we used the s-map test for nonlinearity (sugihara ) (figures s b, c and s ; supporting information). we used an edm approach called convergent cross-mapping (ccm) (sugihara et al. ) to identify drivers of dengue incidence. if two variables are causally related, then a multivariate attractor-where each variable in the system represents a dimension that traces the dynamics of the system-can be reconstructed (up to a practical limit) using lagged versions of just one of the variables ( figure s ). based on takens' theorem, this univariate "shadow attractor" preserves the structural and dynamic properties of the original multivariate attractor (takens ; sugihara et al. ). the concept behind ccm is that if temperature causes dengue incidence, then information about past temperature will be embedded in the dynamics of dengue, such that the shadow attractor produced using only incidence data allows us to accurately reconstruct temperature in the past. however, the converse scenario would not be true: since dengue does not cause temperature, the shadow attractor constructed using temperature data should not contain information to accurately reconstruct past dengue incidence (supporting information). the critical criterion for estimating causal (directional) associations between two variables using ccm is checking that the cross-mapping skill (i.e., pearson's correlation coefficient, ρ, between predicted driver values using the univariate ssr of the response variable, and the observed driver values) monotonically increases anomalies (deyle et al. a). in the second, more conservative "ebisuzaki" null model, we conserved any periodicity (beyond seasonal) and randomized the phases of fourier-transformed time series (ebisuzaki ). 
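the phase-randomization idea behind the "ebisuzaki" null model can be illustrated with a short sketch. the code below is a minimal, generic implementation of fourier phase randomization in python with numpy, not the authors' code: the function name `phase_randomized_surrogate` and the toy weekly series are assumptions made purely for illustration. it keeps the amplitude of every fourier component (and therefore the power spectrum and any periodicity) while replacing the phases with uniform random values.

```python
import numpy as np


def phase_randomized_surrogate(x, rng):
    """Return a surrogate series with the same power spectrum as x
    but with randomized Fourier phases (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    # Draw one random phase per non-negative frequency.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    # The zero-frequency term (the mean) is left untouched.
    randomized[0] = spectrum[0]
    # For even-length series the Nyquist term must stay real.
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]
    return np.fft.irfft(randomized, n=n)


# Toy example: two "years" of weekly data with a seasonal cycle plus noise.
rng = np.random.default_rng(0)
weeks = np.arange(104)
x = np.sin(2.0 * np.pi * weeks / 52.0) + 0.3 * rng.standard_normal(weeks.size)
surrogate = phase_randomized_surrogate(x, rng)
```

repeating the cross-mapping analysis on many such surrogates gives a null distribution of cross-mapping skill against which the skill obtained from the observed series can be compared.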
we tested for statistically significant differences in cross-mapping skill between the model that used the data versus the null models by performing kolmogorov-smirnov (k-s) tests after convergence. we also repeated ccm in the nonsensical, reverse-causal direction (e.g., to test whether incidence drives climate) as a control for potential spurious relationships generated by non-causal covariation (e.g., due to seasonality). we examined the predictive power of the drivers on dengue incidence by assessing how well we can predict dengue dynamics using temperature, rainfall, susceptibles index, and their combined effects. we used a combination of univariate ssr (i.e., with incidence data) and multivariate ssr to build forecasting models and to determine the improvement of forecasting using simplex projection when including (supporting information). we built the ssr forecasting models/attractors using the we assessed model forecasting performance using leave-one-out cross-validation. next, we evaluated out-of-sample forecasting performance of these models using testing data from four additional seasons ( / - / we followed the procedure as directed in the challenge (supporting information). in nonlinear systems, drivers generally have an effect that is state-dependent: the strength and direction of the effect depends on the current state of the system. scenario exploration with multivariate edm allowed us to assess the effect of a small change in temperature or rainfall on dengue incidence, across different states of the system. the outcome of these small changes allowed us to deduce the relationship between each climate driver and dengue incidence and how they depend on the system state. for each time step t we used s-maps (sugihara ; deyle et al. a) to predict dengue incidence using a small increase (+Δx) and a small decrease (-Δx) of the observed value of driver ( ) (temperature or rainfall). 
for each putative climate driver, the difference in dengue predictions between these small changes is Δ = ( + ) j ( ) + for both temperature and rainfall to recover their approximate relationships with dengue incidence at different states of the system. scenario exploration analyses were repeated across several model parameterizations to address potential edm showed that temperature, rainfall, and the susceptibles index drive dengue incidence since the convergence criterion was met (kendall's > , p < . ) in all three cases (figure a we cannot rule out the possibility that the apparent forcing of temperature on dengue is due to a seasonal confounder. however, if no such confounder exists, then the seasonal trend in temperature, which accounts for most temperature variation in san juan, drives the seasonal trend observed in dengue incidence. compared to the other drivers, the converging cross-mapping skill of the temperature null models were relatively high (figures and s as expected, edm tests for putative causality in the nonsensical directions- incidence driving temperature or rainfall-were not significant (i.e., no convergence; figure s , black lines). this result further supports the finding that temperature and rainfall drive dengue incidence, because their causal relationships were not confounded by spurious bidirectionality. the null models for the nonsensical directions of causality ( figure s , grey lines) also displayed no convergence (completely flat), as expected (i.e., seasonality of dengue incidence does not drive seasonality of temperature or rainfall). however, seasonality (or any periodicity) of temperature, rainfall and susceptibles index drive dengue dynamics, shown by convergence of the seasonal and ebisuzaki null models (grey lines in the multivariate ssr model using only temperature and rainfall data did not predict predicting peak incidence, peak week, and seasonal incidence for all seasons on average (tables s -s , figures s -s ). 
this demonstrates the benefit of the edm approach for capturing the mechanistic, nonlinear, interdependent relationships among drivers over both equation-based mechanistic models and phenomenological models. as expected, we find state-dependent effects of temperature and rainfall with non- zero median effects. we found that temperature had a small positive median effect ( . cases/°c, wilcox p < . ) on dengue incidence (figure a) finding, since evidence of climate functional responses for disease dynamics is rare due to the difficulty of obtaining appropriately informative field data. it is possible that if we had temperature data ranging across a larger spectrum-possibly by assembling data across multiple climates-that the empirical functional response derived from edm would also look unimodal. further, when the susceptibles index was high, the slope of the relationship between rainfall and dengue incidence became more negative as rainfall increased, suggesting a concave-down effect of rainfall on incidence (figure g, h) we found that rainfall, susceptible availability, and plausibly temperature (via its seasonality) interact to drive dengue incidence. combined, these three drivers predicted dengue incidence with high accuracy (figure c even when accounting for susceptible availability, the effects of temperature and rainfall on dengue were strongly state-dependent (figure d, g) . this result is potentially due to nonlinear effects of each climate driver (figure e vector densities that potentiate dengue outbreaks in a brazilian city edm'). r package version . . 
environmental drivers of the spatiotemporal dynamics of respiratory syncytial virus in the united states temperature impacts on dengue emergence in the united states: investigating the role of seasonality and climate change dengue vector dynamics (aedes aegypti) influenced by climate and social factors in ecuador: implications for targeted control nonlinear forecasting for the classification of natural time series reply to baskerville and cobey: misconceptions about causation with synchrony and seasonal drivers detecting effect of temperature on the vector efficiency of aedes aegypti for dengue virus climate variation drives dengue dynamics modelling the effective reproduction number of vector-borne diseases: the yellow fever outbreak in luanda
key: cord- -ar nzmw authors: wayment-steele, hannah k.; kladwang, wipapat; das, rhiju title: rna secondary structure packages ranked and improved by high-throughput experiments date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ar nzmw
the computer-aided study and design of rna molecules is increasingly prevalent across a range of disciplines, yet little is known about the accuracy of commonly used structure prediction packages in real-world tasks. here, we evaluate the performance of current packages using eternabench, a dataset comprising in vitro structure mapping and riboswitch activity datasets involving , synthetic sequences from the crowdsourced rna design project eterna. we find that contrafold and rnasoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages like the viennarna software, which derive parameters primarily from thermodynamic experiments. motivated by these results, we develop a multitask-learning-based model, eternafold, which demonstrates improved performance that generalizes to diverse external datasets, including complete viral genomes probed in vivo and synthetic designs modeling mrna vaccines. 
rna molecules perform essential roles in cells, including regulating transcription, translation, and molecular interactions, and performing catalysis. synthetic rna molecules are gaining increasing interest and development for a variety of applications, including genome editing, biosensing, and vaccination. characterizing rna secondary structure, the collection of base pairs present in the molecule, is typically necessary for understanding the function of natural rna molecules and is of crucial importance for designing better synthetic molecules. some of the most widely-used packages use a physics-based approach that assigns thermodynamic values to a set of structural features (vienna rnafold, nupack, and rnastructure ), with parameters traditionally characterized via optical melting experiments and then generalized by expert intuition. however, a number of other approaches have also been developed that utilize statistical learning methods to derive parameters for structural features (rnasoft, contrafold, cyclefold, learntofold, mxfold , spot-rna ). secondary structure modeling packages are typically evaluated by comparing single predicted structures to secondary structures of natural rnas . while important, this practice has limitations for accurately assessing packages, including bias toward structures more abundant in the most well-studied rnas (trnas, ribosomal rna, etc.) and neglect of energetic effects from these natural rnas' tertiary contacts or binding partners. furthermore, scoring on single structures fails to assess the accuracy of ensemble-averaged rna structural observables, such as base-pairing probabilities, affinities for proteins, and ligand-dependent structural rearrangements, which are particularly relevant for the study and design of riboswitches , , ribozymes, pre-mrna transcripts, and therapeutics that occupy more than one structure as part of their functional cycles. 
existing packages are, in theory, capable of predicting ensemble properties through so-called partition function calculations, and, in practice, are used to guide rna ensemble-based design, despite not being validated for these applications. new sources of structural data now offer opportunities to evaluate secondary structure packages incisively and with less bias. in particular, experimental data accumulating in the rna design crowdsourcing platform eterna has potential to mitigate effects of bias incurred in natural rna datasets, as the probed constructs are designed by citizen scientists to fold to synthetic secondary structures of their choosing. in this work, we evaluate the performance of commonly-used packages capable of making thermodynamic predictions in two tasks that have been crowdsourced on eterna and are emerging as central to rna characterization and design: ) predicting chemical reactivity data through calculating probabilities that nucleotides are unpaired, and ) predicting relative stabilities of multiple structural states that underlie the functions of riboswitch molecules, a task that involves predicting affinities of both small molecules and proteins of interest. both of these evaluations are made possible through eternabench, a collection of high-throughput datasets acquired in eterna. we find striking, consistent differences in package performance across these quantitative tasks, with the packages contrafold and rnasoft performing better than packages that are in much wider use. furthermore, we develop a multitask-learning-based framework to train a thermodynamic model on these tasks concurrently with the task of single-structure prediction. the resulting multitask-trained model, called eternafold, demonstrates increased accuracy both on held-out data from eterna as well as completely independent datasets encompassing viral genomes and mrnas probed with distinct methods and under distinct solution and cellular conditions. 
we conclude with suggestions of how rna thermodynamic secondary structure models might continue to be improved and with discussion of design applications. we evaluated commonly used secondary structure modeling packages in their ability to make thermodynamic predictions on a compilation of large datasets of diverse synthetic molecules from eterna, which we termed eternabench ( figure a ). the packages vienna rnafold, nupack, rnastructure, rnasoft, and contrafold were analyzed across different package versions, parameter sets, and modelling options, where available (supplementary table ). packages discussed in the main text reflect their standard settings; the performance of other package options is included in supplementary information. we also evaluated packages trained more recently through a varied set of statistical or deep learning methods (learntofold, spot-rna, mxfold, and cyclefold), but these packages demonstrated poor performance ( figure s , s ) and have been omitted from further discussion. our first ensemble-based structure prediction task investigates the capability of these packages to predict chemical mapping reactivities. chemical mapping is a widely-used readout of rna secondary structure - and has served as a high-throughput structural readout for experiments performed in the eterna massive open online laboratory . a nucleotide's reactivity in a chemical mapping experiment depends on the availability of the nucleotide to be chemically modified, and hence provides an ensemble-averaged readout of the nucleotide's deprotection from base pairing or other binding partners. we wished to investigate whether current secondary structure packages differed in their ability to recapitulate information about the ensembles of misfolded states that are captured in chemical mapping experiments. 
we used the eterna "cloud labs" for this purpose: datasets of player-designed constructs, ranging from - nucleotides in length, all of which were chemically modified by selective '-hydroxyl acylation analyzed by primer extension (shape) and read out using the map-seq chemical mapping protocol. after filtering for sequences with > % redundancy, these datasets comprise , individual constructs ( figure a , "eternabench-cm"). figure a shows an example heatmap of shape data for eterna-player-designed synthetic rna molecules from the eterna cloud lab round , the round with the most constructs and a relatively high signal-to-noise ratio ( figure s ). figure b shows two of these player-designed constructs in the minimum free energy (mfe) structures predicted in the eterna gameplay with vienna rnafold . . ("vienna "), colored by experimental shape reactivity data. vienna predicted that both sequences would form the same structure, and low reactivity in stem regions in construct (i) indicates that the molecule predominantly folded to the target structure. however, high reactivity values in the stems of construct (ii) indicate that the molecule did not fold to the correct structure, and instead remained largely unfolded, in conflict with the predictions of vienna . figure c depicts package-predicted unpaired probabilities per nucleotide, punp, plotted in the same heatmap arrangement as the experimental data in figure a (see figure s for example heatmaps for all package options tested). in this subset of constructs, all packages are largely able to identify which regions should be completely paired (punp = , white) or unpaired (punp = , black), but some packages are able to predict punp values between and that more table , and pairwise significance tests are in figure s . we computed the correlation coefficients between chemical mapping data and punp figure s ). 
for all packages, correlation coefficients were weakly positively associated with number of constructs in the dataset and strongly negatively correlated with the experimental noise in the data ( figure s ), confirming the importance of having large, high-quality data sets in evaluating these secondary structure packages. we observed that contrafold and rnasoft generally predict that the constructs studied are more melted than the other packages predict at their default temperatures of °c, even though the actual chemical mapping experiments were carried out at lower temperature ( °c; see methods). motivated by this observation, we wished to ascertain if a simple change in temperature might account for differences in performance between packages. to address this, vienna , nupack, and rnastructure packages include parameters for both enthalpy and entropy, allowing for altering the temperature used in prediction. we found that increasing the temperature from the default value of ˚c used in these packages to - °c improved their correlation to experimental data, but the maximum correlation values did not surpass the correlation observed from contrafold or rnasoft with default settings ( figure s ). our second ensemble-based structure prediction task involved predicting the relative populations of states occupied by riboswitch molecules. riboswitches are rna molecules that alter their structure upon binding of an input ligand, which effects an output action such as regulating transcription, translation, splicing, or the binding of a reporter molecule. , , we compared these packages in their ability to predict the relative binding affinity of synthetic riboswitches to their output reporter, fluorescently-tagged ms viral coat protein. these riboswitches were designed to use a small molecule input (flavin mononucleotide (fmn), tryptophan, or theophylline) to regulate formation of the ms hairpin aptamer ( figure a , "eternabench-switch"). 
these riboswitches came from two sources: the first consisted of , riboswitches designed by citizen scientists on eterna, filtered from the original dataset to exclude sequences with over % similarity. the second consisted of , riboswitches designed fully computationally using the ribologic package, probed concomitantly with eterna riboswitches. three important metrics that secondary structure packages can predict for these datasets (depicted in figure a , overview of experimental measurements in figure s figure s , and correlations and pairwise significance tests for all package options are in figure s and supplementary table . when performance across independent experimental datasets was evaluated ( figure c , figure s ) and pairwise comparisons bootstrapped over all datasets, we obtain a ranking, from lowest to highest, of nupack, vienna rnafold, rnastructure, rnasoft blstar, and contrafold, identical to the ranking obtained from chemical mapping data ( figure c , d, pairwise significance tests per dataset in figure s ). another metric to evaluate predictive accuracy is the root mean-squared error (rmse) between predicted and experimental values; this metric revealed a similar ranking led by contrafold ( figure s , supplementary exhibited higher correlation to experimental values than vienna across the majority of data sets (> % bootstraps, figure d , figure s ). for log ar, contrafold predictions also exhibited higher correlation than vienna ( % of bootstraps; figure e , figure s ). overall, these comparisons consistently ranked contrafold as the most accurate package for modeling riboswitches; this ranking matches the entirely independent ranking based on chemical mapping measurements of distinct rna sequences described in the previous section. 
given that the two packages that performed best in both structure prediction tasks were developed with statistical methods, we hypothesized that performance in these two tasks might be improved by incorporating these tasks in the process of training a secondary structure package. multi-task learning, the process of learning tasks in parallel using a shared model, has proven useful in image classification and natural language processing. for learning to be transferred across tasks, the tasks must share significant commonalities and structure. however, multi-task learning might fail if modelling assumptions made for one task do not hold across other tasks, or if the model's representational capacity is not large enough to correctly model all the data types. we used the contrafold code as a framework to explore multi-task learning on rna structural data, since it has previously been extended to train on chemical mapping data to maximize the expected likelihood of chemical mapping data. we further extended the contrafold loss function to include a term to minimize the mean squared error of riboswitch affinities for ms protein (online methods). we trained models with a variety of combinations of data types to explore interactions in multitask training (table ) and evaluated performance on held-out test sets for single-structure prediction accuracy, chemical mapping prediction accuracy, and riboswitch fold change prediction. for single-structure data, we used the s-processed dataset used previously in training contrafold and rnasoft . comparing performance across models trained with different types of input data indicates some tradeoffs in performance. model "s", trained only on the s-processed structure dataset, exhibited the highest accuracy for a held-out single-structure prediction test set ( . ( )), outperforming models that included training on other data types. 
likewise, model "r", trained only on riboswitch )*# $%&' values, exhibited the highest performance in a held-out riboswitch )*# $%&' prediction test set. however, model "scrr", trained on four data types (structure, chemical mapping, riboswitch )*# $%&' and )*# (%&' ) exhibited the highest performance on riboswitch )*# (%&' and chemical mapping, and its performance was within error of models "s" and "r" on single structure prediction and )*# $%&' prediction test sets, respectively. we termed this scrr model "eternafold". we wished to test if eternafold's improvements in recovering eterna measurements generalized to improvement in predictions for datasets from other groups, experimental protocols, and rna molecules. we first tested the ability of eternafold to predict the thermodynamics of protein binding not included in its training data, analogous to the ms binding comparisons above. we made use of a large data set of precisely measured ddg (table , figure b ). most of these test molecules were much longer (thousands of nucleotides) than the -nucleotide rnas used as the primary training data for eternafold. nevertheless, compared to all other packages tested, eternafold exhibited the highest correlation in / datasets with p < . and an additional / datasets with p < . ( figure d , figure s , supplementary table , and demonstrated the highest correlation in a pairwise significance analysis bootstrapped over all datasets ( figure c ). we were curious as to whether the differences in packages arise from consistent accuracy differences across all regions of these rnas or from a net balance of increased and decreased accuracies at specific subregions of the rnas, which might reflect particular motifs that are handled better or worse by the different packages. figures in this work, we have established eternabench, benchmark datasets and analysis methods for evaluating package accuracy for two modeling tasks important in rna structural characterization and design. 
these include ) predicting unpaired probabilities, as measured through chemical mapping experiments, and ) predicting relative stabilities of different conformational states, as exhibited in riboswitch systems. we discovered that rnasoft and further advances in thermodynamic structure prediction might come from incorporating more data in a data-driven framework like eternafold as well as advances in modelling. to analyze how much accuracy increases from incorporating more data, we compared differences in accuracy in models trained on the holdout dataset, roughly % the size of the full training dataset ( figure s ). figure s ). further data-driven investigations will be necessary to improve performance and to identify aspects of the model that need to be expanded, which may include noncanonical pairs , more sophisticated treatment of junctions , next-nearest-neighbor effects , and chemically modified nucleotides , . this work demonstrates that rna secondary structure prediction can be methodically evaluated and improved by combining thermodynamic predictions with statistical learning on diverse, high-throughput experimental datasets. as more sources of structural data continue to be made available in higher quantity, resolution, and scope, the eternabench and eternafold frameworks will allow these datasets to be used in continually assessing and improving rna structure prediction. an important use case for eternafold will be computationally guided design of rna medicines, including structured mrnas for viral pandemics like covid- . eternafold already affords increases in predictive power for structure mapping data acquired for synthetic mrnas that model mrna vaccines as well as for alphaviruses, which form the basis of self-amplifying mrna vaccines , . these observations suggest immediate applications for the packages herein and potentially rapid further improvements in eternafold as further structural data are acquired on these new rna molecules. 
in vitro/ in vivo chemical mapping the algorithms evaluated in this work model secondary structure in the following manner. given a model , which comprises a set of structural features { }, the partition function of an rna sequence x is computed as where ( & ) is the free energy contribution of structural feature k, -is boltzmann's constant, and t is temperature. z represents a sum over the set of all possible structures {s}. from this expression, the probability of any particular structure s is defined as . ( ) structure prediction algorithms are able to estimate the ensemble-averaged probability that a nucleotide is paired or unpaired. let ( : | , ) be the probability of bases i and j being paired, given sequence x and model . this may be computed as the relationship between the probability of a nucleotide being unpaired and its experimentally measured reactivity has served as a locus for many efforts for improving structure prediction of rna constructs incorporating chemical mapping data from those constructs, and several functional forms have been used to describe the relationship between unpaired probability and chemical mapping reactivity [ ] [ ] [ ] . in this work, we use the correlation coefficient between unpaired probability and experimentally measured reactivity as a nonparametric measure of model quality. a thermodynamic framework discussed in greater detail in ref. allows us to relate the observed binding affinity of an output molecule to the relative populations of a riboswitch molecule in different states. in the absence of input ligand, we may relate the probability that a riboswitch adopts a structural feature that can bind its output, ( ), to an experimentally measured binding affinity, k . , via the relative ratios of both values to those of a reference state: we selected the ms hairpin aptamer as a reference state whose probability of forming, @ab ( ), can be estimated by the secondary structure algorithm. 
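the ensemble quantities defined above can be made concrete with a toy calculation. the sketch below is not how any of the evaluated packages work internally (they use dynamic programming over all possible structures); it simply enumerates a hand-picked set of dot-bracket structures with made-up free energies. the constant `KT` (thermal energy at 37 °c in kcal/mol), the function names, and the example ensemble are all assumptions for illustration. it shows how boltzmann weights give structure probabilities, and how summing those probabilities over structures gives a per-nucleotide unpaired probability.

```python
import math

# Assumed thermal energy: gas constant (kcal/mol/K) times 310.15 K (37 C).
KT = 0.0019872 * 310.15


def structure_probabilities(free_energies, kt=KT):
    """Boltzmann probability of each structure given its free energy (kcal/mol)."""
    weights = [math.exp(-g / kt) for g in free_energies]
    z = sum(weights)  # partition function over the enumerated structures
    return [w / z for w in weights]


def unpaired_probabilities(structures, free_energies, kt=KT):
    """Per-nucleotide probability of being unpaired ('.' in dot-bracket notation)."""
    probs = structure_probabilities(free_energies, kt)
    p_unp = [0.0] * len(structures[0])
    for structure, p in zip(structures, probs):
        for i, symbol in enumerate(structure):
            if symbol == ".":
                p_unp[i] += p
    return p_unp


# Hypothetical ensemble for a 9-nucleotide sequence; energies are illustrative only.
structures = ["(((...)))", ".((...)).", "........."]
free_energies = [-2.0, -1.0, 0.0]

probs = structure_probabilities(free_energies)
p_unp = unpaired_probabilities(structures, free_energies)
```

in this toy ensemble the loop nucleotides are unpaired in every structure, so their unpaired probability is one, while the stem nucleotides take intermediate values that depend on the relative free energies. intermediate values of this kind are what the correlation with chemical mapping reactivity is meant to probe.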
for each separate independent experimental dataset, ;c) @ab is estimated as the highest affinity measured ( figure s ). we refer to the estimated ratio figure a ). although there may be error introduced in which experimental point is selected to be ;c) @ab , relative error should be constant when comparing packages on the same dataset. to compare packages, we report the correlation between log( ;c) ± / ;c) @ab ) and log( >+? ± ), which excludes the effect of selection for k @ab . however, the root mean squared error (rmse) between predicted and experimental values is also of interest to be able to consider error in terms of energetic units (kcal/mol) and is reported in supplementary tables s , s , s , s , and plotted in figure s for the ribologic-fmn dataset. in general, the probability of an rna molecule forming any structure motif is computed as where e;= b denotes a structure containing that motif. computing this probability requires a dynamic programming routine that is able to constrain the sampled structure space to only structures containing that motif to estimate a so-called "constrained partition function". however, not all secondary structure algorithms have implemented constrained partition function estimation. because the ms aptamer is a hairpin, we can approximate its probability of forming as the probability of forming the final base pair of the ms hairpin aptamer (colored pink in figure a ), an experimental observable that can be estimated by all the packages tested here. thus, our prediction of interest is where i and j are the nucleotides forming the terminal base pair in the ms aptamer stem. the value @ab ( : ) is accordingly computed as the probability of closing the base pair in the reference sequence. we confirmed that calculations using eqn. and eqn. agree for vienna and contrafold packages. 
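the single-base-pair approximation described above lends itself to a small worked example. the sketch below assumes, as one reading of the thermodynamic framework, that only conformations displaying the aptamer hairpin bind the protein, so that the apparent dissociation constant scales inversely with the probability of forming the closing base pair. the function names, the thermal energy `KT` (37 °c, kcal/mol), and the probabilities 0.1 and 0.9 are hypothetical and are not taken from the datasets.

```python
import math

# Assumed thermal energy: gas constant (kcal/mol/K) times 310.15 K (37 C).
KT = 0.0019872 * 310.15


def relative_kd(p_pair, p_pair_ref):
    """Predicted dissociation constant of a construct relative to the reference
    hairpin, from the closing-base-pair probabilities of each (assumed relation)."""
    if not (0.0 < p_pair <= 1.0 and 0.0 < p_pair_ref <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return p_pair_ref / p_pair


def relative_free_energy(p_pair, p_pair_ref, kt=KT):
    """The same ratio expressed in energetic units (kcal/mol)."""
    return kt * math.log(relative_kd(p_pair, p_pair_ref))


# Hypothetical construct that forms the aptamer pair 10% of the time,
# compared with a reference hairpin that forms it 90% of the time.
ratio = relative_kd(0.1, 0.9)
ddg = relative_free_energy(0.1, 0.9)
```

taking the logarithm of this ratio gives a quantity that can be correlated against the logarithm of measured affinities, and expressing it in kcal/mol is what makes a root mean squared error in energetic units meaningful.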
rnastructure is capable of computing constrained partition functions, but the constrained partition function calculations did not match those from base pair probabilities ( figure s ). hence, rnastructure constrained partition function predictions were excluded from comparisons. the estimation of k g hi j follows similarly to above but must take into account increased thermodynamic weights for states that correctly display the aptamer of the input small molecule ligand. therefore, it cannot be estimated via the simplified single base pair calculation and must make use of constrained partition functions (eq. ). for all constructs as well as the reference ms hairpin construct, we performed >+? ± estimations including a flanking hairpin included in the illumina array experiments (described in ref. ). in example, the full reference ms hairpin construct, as well as the constraint used for estimating :@ak @ab with constrained-partition-function-based estimation, is in brief, the contrafold loss function optimizes the conditional log-likelihood of ground-truth structure ( ) given sequence ( ) over dataset : in contrafold-se , the authors include a term to also use chemical mapping data to optimize structure prediction by maximizing the likelihood of observing the included chemical the datasets used here for evaluation, as well as scripts and python notebooks for reproducing the filtered datasets and the chemical mapping and riboswitch fold change calculations described here, are available at www.software.eternagame.org in the package "eternabench". the code for training eternafold, as well as the training and test sets used, are available at www.software.eternagame.org as the package "eternafold". the eternafold code is derived from the contrafold-se codebase, which is derived from the contrafold codebase. a server to run eternafold is being made available for noncommercial use. 
all base-pairing probability calculations and constrained partition function calculations were performed using standardized system calls through python wrappers developed in arnie (www.github.com/daslab/arnie). example command-line calls for each package option evaluated are provided in supplementary table . datasets were handled with pandas (https://github.com/pandas-dev/pandas) and visualized with seaborn (https://seaborn.pydata.org/). the following significance test was performed to make pairwise package comparisons for each type of data. for each bootstrapping round, the datasets within the data type (for instance, the cloud lab chemical mapping rounds) were sampled with replacement, and within each of the sampled datasets, the datapoints were sampled with replacement. the correlations for both packages on the sampled datasets were calculated, and the number of datasets for which package a had a higher correlation than package b was recorded. if package a had a higher correlation in more than % of the datasets, this was counted as a "win" for package a. and processed with rdatkit (https://ribokit.github.io/rdatkit/). the rna was probed with the map-seq protocol with a co-loaded standard molecule (p -p - hp rna) to enable normalization, as described in ref. ; measurements were carried out at ambient temperatures ( ˚c) with mm mgcl , mm na-hepes, ph . . within each chemical mapping dataset, cd-hit-est was used to filter sequences with greater than % redundancy (excluding a shared ' primer binding site). from each sequence cluster identified, the sequence with the highest signal-to-noise ratio from chemical mapping experiments was selected as the representative sequence. nucleotides with reactivities less than zero or greater than the th percentile of the dataset were removed from analysis. cloud lab round was filtered to exclude certain experiments that had fmn present, pertaining to eterna cloud lab challenges to design riboswitches. 
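The two-level bootstrap used above for pairwise package comparisons can be sketched as follows. This is a minimal illustration rather than the code distributed with the paper. The data layout (a list of datasets, each a list of `(measured, prediction_a, prediction_b)` tuples), the function names, the number of rounds, and the default win threshold are all assumptions, since the threshold percentage is not legible in the text. Pearson correlation is likewise assumed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_win_fraction(datasets, n_rounds=1000, win_threshold=0.5, seed=0):
    """Fraction of bootstrap rounds counted as a win for package A over package B.

    win_threshold is a placeholder: the value used in the paper is not
    recoverable from the text.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_rounds):
        # Level 1: resample whole datasets with replacement.
        sampled = rng.choices(datasets, k=len(datasets))
        a_better = 0
        for points in sampled:
            # Level 2: resample datapoints within each sampled dataset.
            resampled = rng.choices(points, k=len(points))
            measured = [m for m, _, _ in resampled]
            pred_a = [a for _, a, _ in resampled]
            pred_b = [b for _, _, b in resampled]
            if pearson(measured, pred_a) > pearson(measured, pred_b):
                a_better += 1
        if a_better / len(sampled) > win_threshold:
            wins += 1
    return wins / n_rounds


# Made-up example: package A tracks the measurement, package B is anticorrelated.
toy = [[(float(i), float(i), float(-i)) for i in range(20)] for _ in range(3)]
print(bootstrap_win_fraction(toy, n_rounds=50))
```

Resampling at both levels lets the comparison reflect uncertainty from which datasets happened to be collected as well as from noise within each dataset.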
adenosine nucleotides preceded by or more as were also removed due to evidence of anomalous transcription effects in such stretches , though this removal was not shown to alter package correlations to data (data not included). external chemical mapping datasets were obtained from the supplementary information from the papers and processed similarly (outliers, nucleotides in poly-a stretches removed). for molecules longer than nucleotides, p(unp) predictions were performed using a sliding window size of , with overlapping regions of length . changing the window size was not shown to affect correlation values ( figure s ). the eukaryotic genome as an rna machine exploring the potential of genome editing crispr-cas technology rna-based fluorescent biosensors for detecting metabolites in vitro and in living cells introduction to rna vaccines optimal computer folding of large rna sequences using thermodynamics and auxiliary information viennarna package . nupack: analysis and design of nucleic acid systems rnastructure: software for rna secondary structure prediction and analysis thermodynamic parameters for an expanded nearest-neighbor model for formation of rna duplexes with watson-crick base pairs rna secondary structure prediction without physics-based models measurements were carried out at °c in mm tris-hcl, ph . , mm kcl, mm mgcl mg/ml bsa, mm dtt, . mg/ml yeast trna, . 
% tween- , and varying concentrations of small molecule ligand (fmn, tryptophan, theophylline) and ms coat protein datasets were filtered to only include constructs with sequences that included the canonical ms and small molecule aptamers, and filtered using cd-hit-est to remove sequence redundancy over % rna folding with soft constraints: reconciliation of probing data and thermodynamic secondary structure prediction data-directed rna secondary structure prediction using probabilistic modeling computational analysis of conserved rna secondary structure in transcriptomes and genomes evaluating riboswitch optimality crowdsourced rna design discovers diverse, reversible, efficient, self-contained molecular sensors. biorxiv automated design of diverse stand-alone riboswitches rna secondary structure prediction without physics-based models learning rna secondary structure (only) from structure probing data an rna mapping database for curating rna structure mapping experiments standardization of rna chemical mapping experiments cd-hit: accelerated for clustering the nextgeneration sequencing data anomalous reverse transcription through chemical modifications in polyadenosine stretches. biorxiv we thank members of the das and barna labs (stanford university), c. pop, and c.-s. foo for useful discussions. we thank i. jarmoskaite, v. v. topkar, r. wellington-oguri, and j. townley for helpful comments on the manuscript. calculations and model training were performed on the key: cord- -a cqw kg authors: shi, yuejun; shi, jiale; sun, limeng; tan, yubei; wang, gang; guo, fenglin; hu, guangli; fu, yanan; fu, zhen f.; xiao, shaobo; peng, guiqing title: insight into vaccine development for alpha-coronaviruses based on structural and immunological analyses of spike proteins date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: a cqw kg coronaviruses that infect humans belong to the alpha-coronavirus (including hcov- e) and beta-coronavirus (including sars-cov and sars-cov- ) genera. in particular, sars-cov- is currently a major threat to public health worldwide. however, no commercial vaccines against the coronaviruses that can infect humans are available. the spike (s) homotrimers bind to their receptors through the receptor-binding domain (rbd), which is believed to be a major target to block viral entry. in this study, we selected alpha-coronavirus (hcov- e) and beta-coronavirus (sars-cov and sars-cov- ) as models. their rbds were observed to adopt two different conformational states (lying or standing). then, structural and immunological analyses were used to explore differences in the immune response with rbds among these coronaviruses. our results showed that more rbd-specific antibodies were induced by the s trimer with the rbd in the “standing” state (sars-cov and sars-cov- ) than the s trimer with the rbd in the “lying” state (hcov- e), and the affinity between the rbd-specific antibodies and s trimer was also higher in the sars-cov and sars-cov- . in addition, we found that the ability of the hcov- e rbd to induce neutralizing antibodies was much lower and the intact and stable s subunit was essential for producing efficient neutralizing antibodies against hcov- e. importantly, our results reveal different vaccine strategies for coronaviruses, and s-trimer is better than rbd as a target for vaccine development in alpha-coronavirus. our findings will provide important implications for future development of coronavirus vaccines. importance outbreak of coronaviruses, especially sars-cov- , poses a serious threat to global public health. development of vaccines to prevent the coronaviruses that can infect humans has always been a top priority. coronavirus spike (s) protein is considered as a major target for vaccine development. 
currently, structural studies have shown that alpha-coronavirus (hcov- e) and beta-coronavirus (sars-cov and sars-cov- ) rbds are in lying and standing state, respectively. here, we tested the ability of s-trimer and rbd to induce neutralizing antibodies among these coronaviruses. our results showed that beta-covs rbds are in a standing state, and their s proteins can induce more neutralizing antibodies targeting rbd. however, hcov- e rbd is in a lying state, and its s protein induces a low level of neutralizing antibody targeting rbd. our results indicate that alpha-coronavirus is more conducive to escape host immune recognition, and also provide novel ideas for the development of vaccines targeting s protein. hcov-nl ) and beta-covs (hcov-oc and hcov-hku ) are well adapted to humans and widely circulate in the human population, with most infections causing mild disease in immunocompetent adults ( , , ). in addition, sars-cov, sars-cov- and mers-cov belong to beta-cov and are highly pathogenic ( - ). as the primary glycoprotein on the surface of the viral envelope, the spike (s) glycoprotein is the major target of neutralizing antibodies (nabs) elicited by natural infection and key antigens in experimental vaccine candidates. the s protein contains two subunits responsible for receptor binding (s subunit) and membrane fusion (s subunit) ( ). in particular, the s subunit of the prefusion s protein is structurally ( , ) . the s subunits of beta-and gamma-cov strains utilize the cross-subunit packing mode, reducing the conformational conflict of the rbd in a standing state ( , , , ). in contrast, alpha-and delta-cov strains both utilize an intrasubunit packing mode, and the s -ctd is limited by the conformational conflict with surrounding domains ( , , - , , ) . hence, the s -rbd in the s trimer was captured in two different states among different coronaviruses. 
in the beta-covs (sars-cov, sars-cov- and mers-cov), the s -rbd adopts a "standing" state, which is believed to be a prerequisite for receptor binding and rbm-specific antibody binding ( , , ) . nevertheless, the s -rbds of alpha-covs all adopt "lying" state, which is considered more conducive to evading antibody recognition ( , , , mers-cov. among them, the s protein or rbd was the major targets ( ) ( ) ( ) . compared with beta-covs, relatively few studies have investigated two alpha-hcovs: hcov- e and hcov-nl . however, their s subunit structure and receptor recognition pattern, especially the structure of the rbd and its state in the s trimer, differ substantially from those of beta-covs, suggesting different s protein immune responses between alpha-and beta-covs. importantly, considering the low homology between different coronavirus genera, related research on alpha-covs can not only help to elucidate the differences between s proteins that adopt different rbd states but can also facilitate the development of coronavirus vaccines. in this study, we selected sars-cov, sars-cov- , and hcov- e as models, which adopt the two rbd states, and evaluated and compared immune responses to the s trimers and rbds of these coronaviruses through immunological and bioinformatics approaches. we also investigated the mechanism through which the hcov- e s trimer produced effective nabs. finally, we provide possible vaccine strategies for alpha- to address this issue, we performed b-cell epitope predictions for the s trimers and rbds of alpha-cov (hcov- e) and beta-covs (sars-cov and sars-cov- ). the predicted positive residues (the corresponding spatial epitope and linear epitope) are displayed on the structural surface ( fig. a, c and e) , and the distribution of positive residues on the rbd is summarized in table . a total of and amino acid residues located on the rbd were predicted to be conformational epitopes for sars-cov and sars-cov- , respectively. 
of these, and residues were located in the sars-cov rbm subdomain and in the sars-cov- rbm subdomain, respectively. the linear b-cell epitope prediction results were similar in sars-cov and sars-cov- . however, in hcov- e, only residues located in the rbm subdomain were predicted to be conformational epitopes, and residues were predicted to be linear epitopes. the same results also appeared in the hcov- e s trimer: fewer positive residues were located in the rbd than in the sars-cov or sars-cov- rbm subdomain ( fig. a, c and sars-cov- -immunized mice had a good neutralizing ability ( fig. i and j) . for hcov- e, the s trimer serum had a comparable neutralizing ability to that of sars-cov or sars-cov- , but the rbd serum had no detectable neutralizing ability (fig. k) . our experimental results indicate that the lying state of the rbd in the hcov- e s-trimer induces the production of very few antibodies targeting the rbd, but the s-trimer still produces strong neutralizing antibody levels. in this study, we found that more rbd-specific antibodies were induced by the s trimer with the rbd in the standing state than the s trimer with the rbd in the lying state, and the affinity between rbd-specific antibodies and the s trimer was also higher in the standing state. however, we also found that fewer nabs were induced by the rbd of hcov- e than by the rbds of sars-cov or sars-cov- . in terms of hcov- e, the distribution of the potential residues in the rbm was lower than that of sars-cov or sars-cov- , which may have been caused by different rbm patterns and exposure degrees. when we compared the reported nab epitopes of sars-cov and alpha-cov tgev with our results ( ), they were basically consistent. therefore, we believe that this finding illustrates the inherent difference between the rbds of alpha-and beta-cov. 
the intact and stable s subunit of hcov- e is a prerequisite for the production of effective nabs our experimental results showed that hcov- e s-trimer can induce strong nab levels, while the rbd alone is less immunogenic. next, we will explore which functional domains of the s-trimer are involved in the generation of nabs. to clarify this issue, we immunized mice with the hcov- e s trimer ( µg), s ( µg), ntd ( µg), rbd ( µg) and ntd+rbd ( µg+ µg). meanwhile, to better confirm our results, the hcov- e strain vr was used for the neutralizing assay. the results indicated that the s trimer serum had the best neutralizing ability, followed by the s and ntd+rbd sera, while the ntd and rbd sera alone had no detectable neutralizing effects (fig. a) . the results indicate that the s region in the s-trimer should be the key region for nabs induction. to further verify the importance of the complete s structure in the s-trimer, we designed two s trimer mutants, namely, an ntd-deficient s trimer and an s c/t c s trimer, the s subunit integrity or stability of which was destroyed ( fig. c and f ). mutant proteins disrupt the conformational conflicts that limit rbd standing, significantly improving their ability to bind hapn ( fig. d and g) . however, an incomplete or unstable s conformation significantly reduces the level of nabs induced by the s-trimer (fig. e and k). taken together, these results showed that the intact and stable s subunit of hcov- e is a prerequisite for the production of effective nabs. furthermore, our experimental results show that rbd has a higher ability to bind to the receptor hapn (fig. b) , which indicates that the characteristics of rbd itself may lead to the generation of less neutralizing antibodies. furthermore, we screened monoclonal antibodies using s-trimer, and the results showed that few antibodies targeting s -rbd (fig. a) . 
to further determine the ability of the rbd itself to induce antibodies, we screened monoclonal antibodies targeting the s region and found that the proportion of antibodies targeting the rbd was approximately % (fig. b). since the s protein is expressed in a monomeric form, rbd is not restricted by

we compared the structures of s trimers and rbds among alpha-coronaviruses (figs. b and a). we also predicted the potential b-cell epitopes for their rbds (fig. a; table ). in alpha-cov, the s-trimer had a closed s subunit with three "lying" rbds (fig. b). moreover, the rbds consist of a standard β-sandwich fold core and three short discontinuous loops in the same spatial region ( , , , , , , ) (fig. a). meanwhile, we performed a structural conservation analysis, and the results showed that the rbd structures of hcov-nl , pedv, and fipv are most similar to that of hcov- e, with rmsd values of . , . , and . , respectively (fig. b). in addition, the distribution of potential b-cell epitopes in the rbds of alpha-covs was also similar to that of hcov- e (fig. a and c; table ). based on the above data, inherent differences exist in the rbds between alpha- and beta-covs (figs. and a). however, the alpha- and beta-covs show high similarity in their rbds and similar potential immune characteristics within their respective genera (figs. , , a and b). accordingly, in alpha-covs such as hcov- e, subunit vaccines should prioritize the s-trimer rather than the rbd. in beta-covs such as sars-cov and sars-cov- , the s trimer and rbd are both good candidates for subunit vaccines (fig. ). in summary, we systematically analyzed the conformational states and

igg ( : , diluted in pbst with % bsa (w/v), boster) was used for detection. signal reading was carried out in the same manner. hbs buffer was used as a mock

then the plates were reacted with the hybridoma culture supernatants at ℃ for h. hrp-conjugated goat anti-mouse igg ( : , diluted in pbst with % bsa (w/v), boster) was used for detection.
signal reading was carried out in the manner described above. hybridoma culturing medium was used as a mock control. ratification vote on taxonomic proposals to the international committee on taxonomy of viruses origin and evolution of pathogenic coronaviruses genetic recombination, and pathogenesis of coronaviruses clinical features of patients infected with novel coronavirus in wuhan genomic analysis of human coronaviruses oc (hcov-oc s) circulating in france from to reveals a high intra-specific diversity with new recombinant genotypes coronavirus as a possible cause of severe acute respiratory syndrome anonymous. . the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- structure, function, and evolution of coronavirus spike proteins cryo-em analysis of a feline coronavirus spike protein reveals a unique structure and camouflaging glycans cryo-em structure of the -ncov spike in the prefusion conformation the . -angstrom cryo-electron microscopy structure of the porcine epidemic diarrhea virus spike protein in the prefusion structural basis for human coronavirus attachment to sialic acid receptors the human coronavirus hcov- e s-protein glycan shield and fusion activation of a deltacoronavirus spike glycoprotein fine-tuned for enteric infections cryo-electron microscopy structure of porcine deltacoronavirus spike protein in the prefusion state cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a antigenic and immunogenic characterization of recombinant baculovirus-expressed severe acute respiratory syndrome coronavirus spike protein: implication for vaccine design recombinant receptor binding domain protein induces partial protective immunity in rhesus macaques against middle east respiratory syndrome coronavirus challenge immunogenicity and structures of a rationally designed 
prefusion mers-cov spike antigen structural bases of coronavirus attachment to host aminopeptidase n and its inhibition by neutralizing antibodies the x-ray crystal structure of human aminopeptidase n reveals a novel dimer and the basis for peptide processing comparison of coronaviruses a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- clustal w and clustal x version . s : receptor-binding subunit; s : membrane fusion subunit; ntd: n-terminal domain; rbd: receptor-binding domain (magenta). (b) overall structure comparison of coronavirus s trimers structure-based b-cell epitope predictions of beta-cov (sars-cov and sars-cov- ) and alpha-cov (hcov- e). (a, c and e) the predicted b cell epitopes of sars-cov, sars-cov- and hcov- e are shown. the linear (red cartoon) and conformational (yellow sphere) b cell epitopes were bepipred . or discotope . and labeled onto the corresponding structure by and f) the complex structures of the rbds of sars-cov sars-cov- and hcov- e with the receptors (hace and hapn) are shown. the interface area of each complex and the surface area of each rbd were calculated via the rbm region of the rbd and the receptors (hace and hapn) are shown in red and cyan immunological analysis of beta-cov (sars-cov and sars-cov- ) and hcov- e). (a and b) cross-reactivity of the sars-cov s trimer and mice sera of sars-cov s trimer (red) and sars-cov rbd (blue) were -fold serially diluted (starting with -fold dilution) and reacted with the s trimer (a) or rbd (b), respectively cross-reactivity of the sars-cov- s trimer and rbd-specific sera is determined by mice sera of sars-cov- s trimer (magenta) and sars-cov- rbd (slate) were -fold diluted and reacted with sars-cov- s trimer (c) and rbd (d) -reactivity of the hcov- e s trimer and rbd-specific sera is determined by elisa. 
mice sera of hcov- e s trimer (orange) and hcov- e rbd the antibody titers of sera from mice immunized with μg of the hcov- e rbd (brown) and μg of the hcov- e rbd (purple) all data above are presented as the dilution that remained positive. (i, j and k) the neutralization assay of mouse sera from the spike trimer and rbd against sars-cov, sars-cov- and hcov- e pseudoviruses is determined. the data are presented as the mean reciprocal ic titer. the limit of detection for the assay depends on the initial dilution and is represented by the intact and stable s subunit of hcov- e is a prerequisite for the production of effective nabs. (a) the neutralization abilities of mouse sera from the b) determination of the affinity of ntd and rbd with the receptor hapn. (c) structural model of hcov- e-s-△ntd. magenta: rbd; green: sd ; cyan: sd . (d) dose-dependent binding of hcov- e-s-△ntd and hapn. (e) the neutralization ability of mouse sera from hcov- e-s-△ntd was measured via pseudovirus neutralization assay magenta: rbd; blue: ntd; green: sd ; cyan: sd . (g) dose-dependent binding of h) the neutralization ability of mouse sera from hcov- e-s-s c/t c was measured via pseudovirus neutralization assay the limit of detection for the assay depends on the initial dilution and is represented by dotted lines,a reciprocal ic titer of was assigned. besides monoclonal antibody epitope mapping of the hcov- e spike protein monoclonal antibody (mab) epitope regions in the hcov- e spike protein (a) and s domain (b). supernatants of positive hybridomas were reacted with the data are presented as the od (bottom). mabs and their epitope regions are indicated below the schematic of the b cell epitope analysis of the rbd regions of alpha-coronavirus spike proteins. (a) structures of the rbds from alpha-covs (hcov- e the linear (red cartoon) and conformational (yellow sphere) b cell epitopes were predicted by bepipred . or discotope . and labeled onto the corresponding rbd structure by pymol. 
(b) structural comparison of the rbds from alpha-covs. (c) sequence alignment of the rbds from alpha-covs. the rbm or putative rbm region is shown in cyan foundation (program no. py ). the authors declare no competing interests. fig. potential vaccine strategies for alpha-and beta-covs. the model showed that the rbds of the alpha-cov s trimers are in a lying state. in this state, the s protein cannot bind to the receptor, but meanwhile, this state is also conducive to escaping the immune response target the rbd, and the rbds of the alpha-covs also induces fewer nabs; thus, their s-trimers can be an effective potential subunit vaccine. in key: cord- -luqvw y authors: levinson, julia; kohl, kid; baltag, valentina; ross, david title: investigating the effectiveness of school health services delivered by a health provider: a systematic review of systematic reviews date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: luqvw y schools are the only institution regularly reaching the majority of school-age children and adolescents across the globe. although at least countries have school health services, there is no rigorous, evidence-based guidance on which school health services are effective and should be implemented in schools. to investigate the effectiveness of school health services for improving the health of school-age children and adolescents, a systematic review of systematic reviews (overview) was conducted. five databases were searched through june . systematic reviews of intervention studies that evaluated school-based or school-linked health services delivered by a health provider were included. review quality was assessed using a modified ballard and montgomery four-item checklist. references were screened and systematic reviews containing primary studies were assessed narratively. interventions with evidence for effectiveness addressed autism, depression, anxiety, obesity, dental caries, visual acuity, asthma, and sleep. 
no review evaluated the effectiveness of a multi-component school health services intervention addressing multiple health areas. strongest evidence supports implementation of anxiety prevention programs, indicated asthma education, and vision screening with provision of free spectacles. additional systematic reviews are needed that analyze the effectiveness of comprehensive school health services, and specific services for under-researched health areas relevant for this population. the world health organization (who) launched the global school health initiative in with the goal to improve child, adolescent and community health through health promotion and programming in schools [ ] .
this initiative is dedicated to promoting development of school health programs and increasing the number of health-promoting schools, characterized by who as "a school constantly strengthening its capacity as a healthy setting for living, learning and working" [ ] . in , who, the united nations educational, scientific and cultural organization (unesco), the united nations children's fund (unicef) and the world bank developed a partnership for focusing resources on effective school health -a fresh start approach [ ] . the fresh framework promotes four pillars: health-related school policies, provision of safe water and sanitation, skills-based health education and school-based health and nutrition services [ ] . while various guidance documents have been published by united nations (un) organizations addressing a range of services from oral health to malaria [ ] [ ] [ ] [ ] [ ] , there is no internationally accepted guideline regarding school health services. this systematic review of systematic reviews, henceforth referred to as an overview, will inform the upcoming development of a who guideline that addresses one pillar of the fresh framework: school health services delivered by a health provider. schools offer a unique platform for health care delivery. in , the global means for the primary and secondary net school enrollment rates were % and %, respectively, thus the potential reach of school health services is wide [ ] . additionally, a recent review found that school-based or school-linked health services already exist in at least countries [ ] . the global accelerated action for the health of adolescents (aa-ha!) implementation guidance calls for the prioritization of school health programs as an important step towards universal health coverage and urges that "every school should be a health promoting school" [ ] . 
the primary objective of this overview was to explore the effectiveness of school-based or school-linked health services delivered by a health provider for improving the health of school-age children and adolescents. through a comprehensive literature search, the overview aimed to identify health areas and specific school health service interventions that have at least some evidence of effectiveness. it was also designed to suggest further research in areas where recent systematic reviews (srs) exist, but with insufficient evidence. finally, the overview aimed to identify the health areas and specific school health services interventions for which no srs were found, whether because the primary literature does not exist or because primary studies exist but no sr has been conducted. this overview was conducted using the preferred reporting items for systematic reviews and meta-analyses (prisma) [ ]. a protocol was developed a priori that outlined the overview objectives, aims, operational definitions, search strategy, inclusion/exclusion criteria, and quality appraisal methods. this document was followed throughout the review process and is available in s appendix.

enrolled in schools; (b) interventions were school-based or school-linked health services, involved a health provider (see definitions in s appendix), and were of any duration or length of follow-up; (c) intervention effectiveness was compared to either no intervention, an alternative intervention, the same intervention in a different setting (i.e. not in schools), an active control, or a waitlist control; (d) interventions aimed to improve some aspect of health; and (e) study designs were either randomized controlled trials (rcts), quasi-experimental studies (qes), or other non-randomized intervention studies. there were no date restrictions on publication of included srs.
in addition to these criteria for included studies, the srs themselves had to fulfill the following criteria: (a) included the words "systematic review" in the title or abstract; (b) outlined inclusion criteria within the methods section; (c) were published in peer-reviewed journals and indexed before june , ; and (d) were published in the english language. srs were also excluded if they had been superseded by a newer version.

study selection

citations identified from the systematic search were uploaded to covidence systematic review software [ ] and duplicates were automatically deleted. two reviewers (kk and jl) screened all titles and abstracts using the inclusion/exclusion criteria and excluded all articles that were definitely ineligible. articles that received conflicting votes (ineligible vs. potentially or probably eligible) were discussed until consensus was reached. the same two reviewers screened the full text of all the potentially or probably eligible articles using a ranked list of the inclusion criteria (s appendix). reasons for exclusion were selected from the ranked list. if consensus had not been possible during title/abstract or full-text screening, a third reviewer (dr), who had the casting vote, would have been asked to screen the article independently. this was never required, as consensus was always reached.

data collection

one reviewer (jl) extracted summary data from each selected article using a customized standard form, with independent data extraction performed for % of included srs by one of the other reviewers (dr or kk). there was % agreement between reviewers for all items within the standard form, with discrepancies only in level of detail. data items included the research design of the sr and primary studies, sample description and setting, intervention characteristics, outcomes, meta-analysis results, quality appraisal, and conclusions.
due to the heterogeneity of the srs included in this overview, it was not possible to perform a meta-analysis. outcome measures were collected from included studies. risk of bias within primary studies was recorded in s appendix. risk of bias across srs was determined using ballard and montgomery's four-item checklist for overviews of srs [ ] . these items include: ( ) overlap (see below), ( ) rating of confidence from the amstar checklist [ ] , ( ) date of publication, and ( ) match between the scope of the included srs and the overview itself.

an important consideration in overviews is the degree of overlap, or the use of the same primary study in multiple included srs. high overlap can contribute to biased results [ ] . this overview used the corrected covered area (cca), a comprehensive and validated measure, to determine overlap [ ] . the cca is calculated using three variables: the number of "index" publications (r), the number of total publications (n), in which a primary study is counted once for every sr that includes it, and the number of srs within the overview (c). an "index" publication is the first appearance of a primary study within an overview. the formula for the cca is:

$$\mathrm{cca} = \frac{n - r}{rc - r}$$

ccas can be interpreted as indicating slight, moderate, high or very high overlap with scores of - , - , - , or > , respectively [ ] .

the amstar checklist [ ] was used to appraise quality of included srs. one reviewer (jl) assessed all srs and a second (kk) duplicated appraisal of %, with % agreement and only minor disagreements that did not impact grades of confidence. following the recommendation of the amstar developers [ ] , individual ratings were not combined into an overall score. instead, the authors determined which of the items on the checklist were critical for this overview and which of the items were non-critical. building on a method suggested by shea and colleagues [ ] , grades of confidence in the results of each sr were generated based on critical flaws and non-critical weaknesses. the grading system is available in s appendix.
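the overlap calculation described above can be sketched in python. this is a minimal illustration rather than the authors' own calculation: it assumes the standard cca formula, cca = (n - r) / (rc - r), and the function names, the citation-matrix layout (one row per primary study, one column per sr), and the example data are all hypothetical. the percentage cut-offs in `interpret_cca` (0-5, 6-10, 11-15, >15) are the ones commonly quoted in the cca literature and are an assumption here, not values taken from this overview.

```python
from typing import Sequence


def corrected_covered_area(matrix: Sequence[Sequence[bool]]) -> float:
    """Return the CCA as a proportion (multiply by 100 for a percentage).

    matrix[i][j] is True when primary study i is included in SR j.
    Rows with no inclusion at all are ignored.
    """
    rows = [row for row in matrix if any(row)]
    if not rows:
        raise ValueError("the citation matrix contains no included studies")
    c = len(rows[0])  # number of SRs in the overview
    if any(len(row) != c for row in rows):
        raise ValueError("every row must have one entry per SR")
    if c < 2:
        raise ValueError("overlap is undefined for fewer than two SRs")
    r = len(rows)  # number of index publications (unique primary studies)
    n = sum(1 for row in rows for cell in row if cell)  # inclusions, with repeats
    return (n - r) / (r * c - r)


def interpret_cca(cca: float) -> str:
    """Map a CCA proportion to an overlap label (assumed cut-offs, in percent)."""
    percent = cca * 100
    if percent <= 5:
        return "slight"
    if percent <= 10:
        return "moderate"
    if percent <= 15:
        return "high"
    return "very high"


# Hypothetical example: four primary studies (rows) across three SRs (columns).
example = [
    [True, False, False],
    [True, True, False],
    [False, True, True],
    [True, True, True],
]
cca = corrected_covered_area(example)  # n = 8, r = 4, c = 3 -> (8 - 4) / (12 - 4)
print(cca, interpret_cca(cca))  # 0.5 very high
```

when every primary study appears in exactly one sr, n equals r and the cca is zero; when every sr includes every primary study, n equals rc and the cca is one.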
confidence in results ranged from high (three or fewer non-critical weaknesses) to critically low (more than three critical flaws with or without non-critical weaknesses) [ ] .

eleven srs used meta-analysis to combine results [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] , whereas the remaining nine srs narratively synthesized results [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . eleven srs included studies located in countries with high-income or upper-middle income economies only [ , , [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] , , ] . six srs included at least one study from countries with lower-middle income or lower income economies [ , , , , , ] . the final three srs either did not state the locations of included studies [ , ] or provided regions rather than specific country locations [ ] .

to be included in this overview, at least % of studies within each sr had to fulfill all inclusion criteria. in four srs, % of included studies fulfilled all inclusion criteria [ , , , ] , although brendel and colleagues [ ] included only one study in total. in another four srs, % to % of included studies fulfilled all the inclusion criteria [ , , , ] . in the remaining twelve srs, % to % of included studies fulfilled all the inclusion criteria [ , , [ ] [ ] [ ] [ ] , , , [ ] [ ] [ ] . all srs primarily examined studies on school-based, rather than school-linked, interventions.

the srs covered eight health areas: nine on mental health [ , , , , , [ ] [ ] [ ] ] , four on oral health [ , , , ] , two on asthma [ , ] , and one sr each on sleep [ ] ; obesity [ ] ; vision [ ] ; menstrual management [ ] ; and sexual and reproductive health (srh) [ ] . eleven srs included only cluster- and individually randomized controlled trials [ ] [ ] [ ] , , , , , , ] , seven srs included other types of controlled and uncontrolled experimental studies in addition to rcts [ , , , , , , ] , and two srs included only qes [ ] or controlled clinical trials [ ] .
the corrected covered area (cca) was found to be , indicating only slight overlap between the srs. calculations for the cca can be found in s appendix. table presents the remaining three of the four items of ballard and montgomery's checklist for overviews of reviews: ( ) levels of confidence in results for each included sr, ( ) publication year, and ( ) match in scope to the overview. a majority of the srs ( %) were given low or critically low levels of confidence. only three srs [ , , ] were scored as having moderate levels of confidence and just one [ ] was given a high level of confidence. the details of the quality appraisal of primary studies included in the srs are given in s appendix.

systematic review | health area | amstar rating of confidence (b) | studies fulfilling all inclusion criteria (c)
[ ] | oral health | moderate | %
bastounis [ ] | mental health | low | %
brendel [ ] | mental health | critically low | %
chung [ ] | sleep | low | %
cooper [ ] | oral health | high | %
evans [ ] | vision | moderate | %
geryk [ ] | asthma | critically low | %
gold [ ] | mental
[ ] | obesity | critically low | %
stein [ ] | oral health | low | %
sullivan [ ] | mental health | critically low | %
walter [ ] | asthma | low | %
werner-seidler [ ] | mental health | low | %
a the four items of the checklist include: overlap, amstar rating of confidence, up-to-date, and relevance of included studies
b see s appendix for amstar rating information
c i.e., percentage of studies within the systematic review that clearly fulfill all inclusion criteria
srh = sexual and reproductive health

none of the srs evaluated comprehensive, multi-component, or multi-health area school health services.

two srs found strong evidence for the potential effectiveness of educational interventions for children and adolescents with asthma diagnoses (table a) [ , ] . geryk and colleagues found that education on correct use of an inhaler improved inhaler technique, regardless of deliverer, method, or duration of the intervention [ ] . however, they did not assess risk of bias or appraise the quality of included studies [ ] .
walter and colleagues found that family asthma educational programs for children and their parents or caregivers improved quality of life for both caregivers and children, and decreased asthma exacerbations for children [ ] . while results from primary studies were statistically significant in both srs, heterogeneity of interventions precluded meta-analysis by walter and colleagues [ ] , and geryk and colleagues gave no reason for not performing a meta-analysis [ ] .

hennegan and montgomery assessed the effectiveness of "hardware" and "software" menstrual management interventions (table b) [ ] . hardware interventions included provision of sanitary products, and software interventions focused on menstrual management education. a meta-analysis of two studies on sanitary pad provision found a moderate but statistically non-significant effect on school attendance. however, it was unclear whether these studies involved a health provider [ ] . outcomes across studies differed, but the authors noted trends toward improvement in menstruation knowledge, management practices, psychosocial outcomes, and school attendance. hennegan and montgomery found a high level of heterogeneity and substantial risk of bias in the included studies overall, and were therefore unable to draw conclusions about the effectiveness of menstrual management interventions [ ] .

the effectiveness of school-based mental health services was assessed in nine srs (table c) [ , , , , , [ ] [ ] [ ] , ] . srs addressed various intervention types: universal interventions [ , , , , , ] ; targeted interventions for military-connected children [ ] , children and adolescents at risk for depression and/or anxiety [ , ] , refugee and war-traumatized youth [ ] , and children referred to therapy [ ] ; and indicated interventions for children and/or adolescents diagnosed with autism spectrum disorder [ ] , depression [ , , ] , or anxiety [ , , ] .
prevention and treatment of mood disorders was assessed in five srs, all of which targeted children and adolescents using rcts of established programs. higgins and o'sullivan assessed the friends for life program, a manual-based cognitive behavioral anxiety prevention program comprised of ten sessions with developmentally-tailored programs for different age groups [ ] . they found statistically significant improvements in self-reported measures of anxiety for participants who completed the program as compared to those in the control group [ ] . a sr and meta-analysis by bastounis and colleagues on the educational and preventative penn resiliency program (prp) and its derivatives found small, non-significant effect sizes for the prevalence of both depression and anxiety, favoring the intervention in the former and the control in the latter [ ] . the remaining three srs also assessed friends and prp along with additional often-overlapping programs. neil and christensen analyzed unique anxiety prevention and early intervention programs and found reductions in anxiety symptoms in % of the included studies [ ] . kavanagh and colleagues examined depression and anxiety group counseling programs based on cognitive behavioral therapy (cbt) and found statistically significant reductions of depressive symptoms at both four weeks and three months follow-up [ , ] . finally, meta-analysis by werner-seidler and colleagues of rcts on the effectiveness of depression and anxiety prevention and group therapy programs found small yet statistically significant effect sizes in favor of the intervention groups for both depression and anxiety as compared to control groups [ ] . although the overall degree of overlap between all srs within this overview was slight, the overlap between just these five srs targeting mood disorders was high (cca = ). assessments of music [ ] and art therapy [ ] in two srs reported weak evidence of effectiveness. 
gold and colleagues assessed daily music therapy as an intervention to improve verbal and gestural communicative skills and reduce behavioral problems in children diagnosed with autism spectrum disorder [ ] . meta-analysis found small but statistically significant effect sizes in favor of music therapy for gestural communication, verbal communication, and behavioral problems [ ] . mcdonald and drey reviewed art therapy for children with oppositional defiant disorder (odd), separation anxiety disorder (sad), moderate to severe behavior problems, or learning disorders [ ] . the authors found improvements in classroom behavior, symptoms of odd, and symptoms of sad [ ] . however, in the studies included in both gold and colleagues' and mcdonald and drey's srs, the numbers of participants per intervention group were very small: - and - per study, respectively, introducing the possibility of bias [ , ] .

mostly favorable evidence of effectiveness was found in a sr of social-emotional interventions for refugee and war-traumatized youth from countries [ ] . improvements in trauma-related symptoms and impairment were found through narrative assessment of creative expression interventions, cognitive behavioral interventions, and multifaceted interventions [ ] . in contrast with gold and colleagues [ ] , this sr by sullivan and simonson found negative effects from music therapy interventions [ ] . however, there was no risk of bias assessment in this sr and therefore the results must be interpreted cautiously.

the final sr on mental health services examined well-being interventions for children with a parent in the military [ ] . only one quasi-experimental study from the united states in was included in the sr. the study assessed a group counseling intervention and found no statistically significant effects on the prevalence of anxiety, self-esteem, internalizing behavior or externalizing behavior [ ] .
schroeder and colleagues reviewed the effectiveness of obesity treatment and prevention interventions that specifically involved a school nurse (table d ) [ ] . most interventions involved school-nurse-delivered nutrition counseling, nutrition and health education, and some parent involvement or physical activity. meta-analysis indicated small, yet statistically significant, reductions in body mass index (bmi), bmi z-score, and bmi percentile for both obesity treatment and prevention [ ] . srs on oral health interventions focused on prevention [ , ] , screening [ ] , and education [ , ] (table e ). strongest evidence in favor of oral health interventions emerged from a sr on universal topical application of fluoride gel for the prevention of dental caries [ ] . meta-analysis results indicated a statistically significant effect on the before-after change in caries prevalence [ ] . a universal educational intervention on oral hygiene and caries produced weaker evidence of effectiveness [ ] . small but statistically significant effect sizes were found in favor of the intervention for mean plaque levels and oral hygiene, but no statistical significance was found for change in gingivitis indices [ ] . two srs on dental health screening [ ] and behavioral interventions for caries prevention [ ] found limited evidence of effectiveness. arora and colleagues did not find any rcts that looked at the effectiveness of dental health screening versus no screening on improving oral health outcomes, but their search did locate six rcts from the united kingdom and india with dental care attendance as the outcome [ ] . the data was too heterogeneous to meta-analyze, and the authors determined that the certainty of the evidence of the benefit of dental screening in increasing dental attendance was very low [ ] . the other sr examined behavioral interventions in the form of education on tooth brushing and the use of fluoride toothpaste in brazil, italy, united kingdom, and iran [ ] . 
due to the diversity in outcome measures and intervention intensities, the authors felt unable to make any evidence-based recommendations [ ] . a sr on sexual health interventions for prevention of sexually transmitted infections (stis) and human immunodeficiency virus (hiv) in sub-saharan african countries found that educational interventions were successful in increasing knowledge and attitudes for participants (table f ) [ ] . however, the sr suggested that the studies were ineffective in changing self-reported risky behaviors, although follow-up was either immediate or short-term (less than months) [ ] . this sr did not discuss the quality or risk of bias of included studies [ ] . chung and colleagues systematically reviewed and meta-analyzed universal sleep education programs as compared to no additional sleep intervention from australia, new zealand, brazil, and hong kong (table g ) [ ] . five of the included studies examined the same weekly sleep education program from the australian centre for education in sleep. the sixth study assessed a -day program in brazil. meta-analysis of the six studies showed statistically significant short-term benefits for weekday sleep time, weekend sleep time, and mood [ ] . however, these results did not persist at follow-up [ ] . evans and colleagues reviewed seven rcts from china, india, and tanzania on vision screening for correctable visual acuity deficits at or before school entry (table h ) [ ] . through meta-analysis of two rcts, the authors found that school vision screening combined with provision of free spectacles resulted in a statistically significant % increase in the wearing of spectacles at - months follow-up as compared to vision screening combined with prescription for spectacles only [ ] . 
evans and colleagues found no statistically significant difference in the proportion of students wearing spectacles at - months follow-up between vision screening with provision of ready-made spectacles and vision screening with provision of custom-made spectacles in a meta-analysis of three rcts [ ] . education on the wearing of spectacles in addition to vision screening as compared to vision screening alone did not have a significant effect [ ] . no srs found eligible studies comparing vision screening with no vision screening. this overview found srs covering primary studies. the majority of srs assessed educational, counseling, or preventive interventions, most of which were special research interventions rather than routinely-delivered school health services. no sr examined comprehensive or multi-component school health services, despite the fact that comprehensive services may be more efficient, easier to implement, and more sustainable than single interventions [ ] . results from this overview suggest that certain interventions can be effective in improving child and adolescent health outcomes, and thus may be worthwhile for integration into school health programs. vision screening is one of the most common forms of school health services [ ] , although the majority of programs are concentrated in high-income countries (hic) [ ] . although prevalence of visual impairment varies widely by ethnic group and age [ ] , who estimates that at least million children below age are visually impaired [ ] . evans and colleagues found strong evidence from china, tanzania, and india that school vision screening for correctable visual acuity deficits increased wearing of spectacles when spectacles were provided at no cost [ ] . 
a recent guideline from the international agency for prevention of blindness (iapb) reiterates the importance of free spectacles and goes further to suggest that low-and middle-income countries (lmic) adopt comprehensive school eye health programs [ ] . vision screening linked with free provision of spectacles, as a component of a comprehensive school eye health program, is an example of a cost-effective form of school health services that may be implemented. five srs covered depression and/or anxiety prevention and early intervention programs, with the friends for life program (friends) and the penn resiliency program (prp) most common [ , , , , ] . given that friends has been endorsed by who [ ] and was found to be effective in decreasing anxiety symptoms in all four srs where it was mentioned in this overview [ , , , ] , policy makers and school health officials may consider incorporating this or similar programs into existing school health services. the four srs that included prp found mixed evidence [ ] or no evidence of effectiveness [ , , , ] , bringing the popularity of this intervention into question. finally, creative therapy interventions seem to be effective for indicated populations of school-age children, such as children with autism spectrum disorder [ , ] . however, this conclusion should be interpreted cautiously due to small effect sizes, small sample sizes, and conflicting evidence on the effectiveness of music therapy between sullivan and simonson [ ] and gold and colleagues [ ] . comprehensive school programs that promote healthy school environments, health and nutrition literacy, and physical activity are one of the six key areas for ending childhood obesity recommended by who [ ] . this overview found only one sr that assessed obesity treatment and prevention delivered by a health professional in schools, despite the fact that over million school-age children and adolescents were overweight or obese in [ ] . 
schroeder and colleagues found that school nurses are well positioned to deliver nutritional counseling, design and coordinate physical activity interventions, and educate parents, students, and staff on health, nutrition, and fitness [ ] . however, all included primary studies were conducted in hic [ ] .

schools are considered to be an ideal platform for oral health promotion through education, services, and the school environment [ ] . the most promising evidence from a sr was on topical application of fluoride gel for the prevention of dental caries [ ] . educational interventions had mixed effects. a sr that focused on behavioral education, such as demonstrating how to correctly brush teeth, found no evidence for reduction in caries [ ] , whereas a sr on oral hygiene and caries education found evidence for decreased plaque and improved hygiene [ ] . more research should be done to identify the content and methods of delivery that make some oral health education interventions more effective than others.

it is difficult to determine overall effectiveness of school health services from this overview because the included srs do not sufficiently cover the health areas most relevant for children and adolescents. in , the top five leading causes of death for - year olds were lower respiratory infections, diarrheal diseases, meningitis, drowning, and road injury [ ] . among - year olds, the leading causes of death were lower respiratory infections, drowning, road injury, diarrheal diseases, and meningitis [ ] . finally, for - year olds, the leading causes of death were road injury, self-harm, interpersonal violence, diarrheal diseases, and lower respiratory infections [ ] . leading causes of disability-adjusted life years (dalys) for - year olds were iron deficiency anemia, road injury, depressive disorders, lower respiratory infections, and diarrheal diseases [ ] .
this overview shows that the current sr literature does address mental health, specifically mood disorders. however, the causes of death and disability beyond self-harm and depressive disorders are currently not addressed. although mortality and morbidity statistics vary by region and country, it is clear that the health areas included in this overview reflect a small subset of the global burden of disease for children and adolescents. furthermore, this overview exposes a mismatch between the sr literature on effectiveness of school health services and the actual school health services that are most commonly delivered. vaccinations have been identified as the most common type of intervention in schools in at least countries or territories [ ] , and there is evidence of effectiveness from primary studies regarding feasibility of school-based vaccination programs [ , ] . yet no srs on vaccinations fulfilled the inclusion criteria for this overview, suggesting the need for these srs to be conducted. additionally, at least countries or territories include some form of school health services that are routinely delivered, as opposed to special research interventions [ ] . this overview primarily found evidence for special research interventions, suggesting a need for assessment of routinely-delivered school health services. one of the central questions of this overview was whether the school health services that are regularly delivered across the globe are evidence-based. the mismatch in the sr literature identified by this overview demonstrates that more research must be done before an answer to this question can be determined. another important gap that this overview reveals is a lack of research on interventions carried out in lmic and low-income countries (lic). only one of the srs included in this overview examined studies from a majority of lmic and lic [ , ] . 
this is problematic given that health disparities for children and adolescents are greater in lmic and lic than in higher income countries [ ] . additionally, resources differ by income level and therefore effective interventions in hic may need to be tailored or changed entirely in order to be feasible in lmic and lic. who reports densities of less than one physician per population in countries and less than three nurses or

although three srs mentioned the cost of specialized professionals delivering interventions versus teachers or a school nurse [ , , ] , cost, let alone cost-effectiveness, was not closely analyzed in any of the included srs. for useful recommendations to be made regarding school health services, cost-effectiveness must be more closely examined by primary studies and srs.

although overviews offer a comprehensive method for synthesizing evidence, they also come with important methodological limitations. first, an overview is unlikely to include the latest evidence if recent primary studies have not yet been included in srs. this lag may prevent an overview from truly reflecting current knowledge [ ] . while this overview found significant gaps in the evidence for certain health areas, this does not necessarily mean that relevant high quality primary trials have not been conducted. second, the ability of overviews to reach valid and accurate conclusions depends on the accuracy, rigor, and inclusiveness of the srs themselves. % (n= ) of srs included in this overview were given ratings of low or critically low confidence using the amstar checklist, although this is not unusual given the stringency of the checklist. nonetheless, it is interesting to note that all four of the srs given moderate or high levels of confidence were cochrane reviews.
the remaining cochrane review was given a critically low level of confidence, though this may be because it was published in and standards for both the methods and reporting of cochrane reviews have improved in recent years. third, the scopes of individual srs often differ from the scope of the overview, a problem that ballard and montgomery call a "scope mismatch" [ ] . in this overview, srs with at least % of included studies fulfilling all criteria were included after extensive discussion between the authors and experts in the field. this implies that a narrower range of srs would have been eligible if a stricter cut-off had been selected, and vice versa. it is important to take this into account when interpreting results. finally, overlap of primary trials between srs can bias results of an overview [ ] . there is no definitive guidance on how to correct for overlap, as both including or excluding overlapping srs presents potentially biased results [ , ] . this overview measured overlap using the corrected covered area (cca) [ ] and did not exclude overlapping studies. however, the degree of overlap across all srs was graded as being small (cca = ), while there was high overlap in srs on mood disorders (cca = ). cca values for all health areas and calculations are available in s appendix. a key limitation of this overview is that only publications self-titled "systematic reviews" were included. this decision was made because of the vast numbers of reviews available and the increased rigor associated with the term "systematic". a sensitivity analysis comparing "systematic*" with "systematic review" found that the number of search results increased almost three-fold, but did not reveal any new articles that would eventually have met the subsequent eligibility criteria. 
another limitation is that this overview only included randomized and non-randomized controlled trials, quasi-experimental studies and other controlled study designs where health professionals delivered the intervention. while this strengthens the rigor of included studies and improves decision-making ability, it also excludes potentially relevant literature. a strength of this overview is that it attempted to answer a question that has not yet been answered regarding the effectiveness of both comprehensive and specific school health services delivered by a health provider. while other pillars of health promoting schools have relevant guidance documents, guidance on school health services is limited and not explicitly evidence-based. given the wide reach of schools and the fact that school health services already exist in most countries, international guidelines are needed to clarify whether school health services can be effective, and if so, which interventions should and should not be included. this overview makes an important first step toward that guideline. this overview presents multiple effective interventions that may be offered as a part of school health services delivered by a health provider. however, it is difficult to formulate an overarching answer about the effectiveness of school health services for improving the health of school-age children and adolescents due to the heterogeneity of srs found and the evident gaps in the sr literature. more than half of included srs analyzed mental health and oral health interventions, and no srs were found that assessed other relevant health areas, such as vaccinations, communicable diseases, injuries, etc. further, no srs evaluated comprehensive or multi-component school health services. if school health services are to truly improve the health of children and adolescents, they must comprehensively address the most pressing problems of this population. 
in order for policy makers and leaders in school health to make evidence-based recommendations on which services should be available in schools, who should deliver them, and how they should be delivered, more srs must be done. these srs must assess routine, multi-component school health services and the characteristics that make them effective, with special attention to content, quality, intensity, method of delivery, and cost. the gaps in the sr literature identified by this overview will inform the commissioning of new srs by who to feed into evidence-based global recommendations.
we would like to thank tomas allen for his guidance and support in designing the search strategy. we are also grateful for advice from the who school health services guideline steering group and guideline development group members.
key: cord- -k jnv v authors: friedl, jana; knopp, michael r.; groh, carina; paz, eyal; gould, sven b.; boos, felix; herrmann, johannes m. title: more than just a ticket canceller: the mitochondrial processing peptidase matures complex precursor proteins at internal cleavage sites date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k jnv v most mitochondrial proteins are synthesized in the cytosol as precursors that carry n-terminal presequences. after import into mitochondria, these targeting signals are cleaved off by the mitochondrial processing peptidase mpp, giving rise to shorter mature proteins. using the mitochondrial tandem protein arg , as a model substrate, we demonstrate that mpp has an additional role in preprotein maturation, beyond the removal of presequences. arg , is synthesized as a polyprotein precursor that is imported into the mitochondrial matrix and subsequently separated into two distinct enzymes that function in arginine biogenesis. this internal processing is performed by mpp, which cleaves the arg , precursor both at its n-terminus and at an internal site between the arg and arg parts. the peculiar organization and biogenesis of arg , is conserved across fungi and might preserve the mode of co-translational subunit association of the arginine biosynthesis complex of the polycistronic arginine operon in prokaryotic mitochondrial ancestors. putative mpp cleavage sites are also present at the junctions in other mitochondrial fusion proteins from fungi, plants and animals. 
our data suggest that, in addition to its role as “ticket canceller” for the removal of presequences, mpp exhibits a second, widely conserved activity as internal processing peptidase for complex mitochondrial precursor proteins. all cellular processes are carried out by proteins, linear chains of amino acids that fold into three- dimensional structures. while the amino acid sequence of a protein is primarily determined by its dna sequence, many proteins are additionally modified by proteolytic cleavage after their synthesis. processing of polypeptides at their n-terminus is pervasive in both prokaryotic and eukaryotic proteomes. for instance, the amino-terminal methionine is removed from many polypeptides when (fig. b) . obviously, the precursor is rapidly and efficiently cleaved in vivo, which yields a c-terminal arg fragment. in order to analyze the mechanistic basis of this unusual biogenesis, we sought to reconstitute the biogenesis and processing of arg , in an in vitro system. to this end, we synthesized radiolabeled arg , precursor in reticulocyte lysate and incubated it with isolated yeast mitochondria. we observed that the precursor of around kda was efficiently processed to a slightly smaller polypeptides, but not the arg , precursor were protected from digestion with externally added proteinase k. this shows that these polypeptides were translocated across the mitochondrial outer membrane. both import and processing were dependent on the mitochondrial inner membrane . we conclude that arg , is imported into the mitochondrial matrix via the presequence pathway and cleaved into separate polypeptides inside mitochondria. how is the arg , precursor processed to give rise to the arg and arg enzymes? a number of proteases in the mitochondrial matrix are described (quiros et al., , veling et al., . 
however, most of them are either implicated in degradation and turnover of proteins (such as the lon protease pim ) or known to remove short peptides or single amino acids from the n-terminus of mitochondrial precursor proteins (such as oct or icp ), but not for internal cleavage of proteins and (fig. d). the latter one would fit the molecular masses of the arg and arg proteins observed in our in vitro system. to directly test whether mpp can cleave the arg , precursor, we purified mpp from e. coli expressing his-tagged mas and mas (the two subunits of mpp). incubation of radiolabeled arg , precursor protein with mpp resulted in the formation of smaller fragments whose sizes perfectly matched those that were generated after import into isolated mitochondria (fig. e). proper processing was blocked when edta, which inhibits the metalloprotease mpp by chelating divalent cations, was added to the reaction (suppl. fig. a) (luciano et al., ). we conclude that arg , is imported into the mitochondrial matrix and processed twice by mpp. the unusual biogenesis of arg , prompted us to ask whether arg and arg can also be imported separately. therefore, we created truncated versions of the arg , gene which contain only the n-terminal arg with its mts (arg - ) or only the c-terminal arg , starting at the first (arg - ) or the second imts-l (arg - ). for the latter two, we also generated variants which additionally carry the well-characterized presequence of atp synthase subunit from neurospora crassa (su -arg - and su -arg - ) (fig. a). radiolabeled proteins were synthesized in vitro and incubated with isolated mitochondria to test their import competence. as expected, arg - was efficiently imported and its mts was cleaved (fig. b). the shorter arg variant (arg - ) did not reach a protease-protected compartment and, thus, was not imported into mitochondria (fig. c). however, n-terminal fusion of the su presequence completely restored import of arg - (fig. d). 
hence, import of arg and arg into mitochondria is in principle possible also for separated polypeptides, at least in the in vitro assay used here. we next tested whether arg and arg can be imported separately in vivo and function in arginine biosynthesis. the deletion of the arg , gene renders yeast cells auxotrophic for arginine. we expressed either the full length arg , precursor or combinations of separate arg and arg variants in a Δarg , deletion mutant. if arg and arg make their way into mitochondria and acquire a functional conformation, arginine prototrophy should be restored. when we streaked out these cells on plates with minimal growth medium lacking arginine, we observed growth for the wildtype and the Δarg , mutant complemented with full length arg , , but not for Δarg , carrying only an empty plasmid, as expected (fig. e ). the mutant expressing both arg - and the shorter arg - variant was not able to grow without arginine (fig. e ). however, when the presequence of su was fused to the short arg - , cells regained arginine prototrophy (fig. f ). all strains grew on plates containing arginine, showing that the arg - protein has no toxic gain-of-function effect when residing in the cytosol (suppl. fig. b,c). growing the strains in liquid medium lacking arginine confirmed the results obtained on plates and additionally demonstrated that the growth rate of the strain expressing arg - and su - arg - is comparable to that of the wildtype (suppl. fig. d . table a) . , species were identified that encode exactly one copy each of argb and argc (suppl. table f ) that were not further investigated. we also predicted the intracellular localization of all proteins via targetp. strikingly, mitochondrial localization was assigned to almost all fusion proteins of fungi, whereas most separate proteins in algae were predicted to be imported into chloroplasts (fig. b) . 
in summary, in fungi the acetylglutamate kinase and acetylglutamyl-phosphate reductase are generally encoded as a fusion protein, which is imported into mitochondria and processed twice by mpp to remove its presequence and give rise to two functional enzymes. in contrast, algae encode two separate proteins which are individually imported into chloroplasts. gamma-proteobacteria express the genes from one polycistronic rna (fig. c). profiling analysis and indeed found prominent imts-ls at each of their junction sites (fig. b). the organization as a fusion protein is an elegant solution to confer mitochondrial targeting of two enzymes that reside in the same compartment and even act in subsequent steps of a biochemical pathway. it is still remarkable that this organization of arg , was retained during evolution even in distantly related organisms, indicating that there exists a strong constraint that maintained this organization for more than a billion years of evolution. to our knowledge, this is the only example of a fusion of two functionally related proteins whose organization is so widely conserved across eukaryote species. eukaryotic genomes typically strongly disfavor even "milder" variants of physical coupling of genes, such as an operon-like organization which is pervasively present in prokaryotes. besides its canonical role as "ticket canceller" that clips targeting signals, the mitochondrial processing peptidase mpp obviously also possesses a "tailor" activity for internal processing of several precursor proteins (fig. c). this property is conserved across the fungi, plant and potentially also animal kingdoms. internal mpp cleavage requires a proximal recognition motif which appears to be an imts-l, but remarkably also a strong n-terminal presequence. this suggests that in order to access internal cleavage sites, mpp has to be loaded onto a precursor as soon as it emerges from the tim channel. 
mpp might then "scan" the still unfolded polypeptide for cleavage sites. a protein with a strong presequence will efficiently recruit mpp, which enables subsequent internal cleavage at an imts-l before this is buried by protein folding. in analyses of the n-proteome from yeast, yeast strains and plasmids all yeast strains used in this study were based on the wt strain by (winston et al., ) . unless indicated differently, strains were grown on synthetic medium ( . % yeast nitrogen base and . % (nh ) so ) containing % glucose. the arg , -coding region or a fragment of it was amplified by pcr and cloned into pgem the homogenate was centrifuged for min at , x g at °c to separate cell debris and nuclei from organelles. the mitochondrial fraction was isolated by centrifugation of the supernatant from the previous step for min at , x g at °c. the crude mitochondrial pellet was gently resuspended in sh buffer ( . m sorbitol, mm hepes/koh ph . ), centrifuged for min at , x g at °c, recovered from the supernatant by centrifugation for min at , x g at °c and finally resuspended in sh buffer. the protein concentration of the mitochondrial suspension was determined by a bradford assay and mitochondria were diluted to a final concentration of mg/ml protein with ice-cold sh buffer, aliquoted, frozen in liquid nitrogen and stored at - °c. . table d ). subject organisms that exhibited hits (at least % local identity and a maximum e-value of e- ) were grouped by the number and pattern of the identified homologs, corresponding to either encoding arg , as one gene or as two genes (suppl . table e exhibiting exactly one homolog to each of the two query sequences (at least % local identity and a maximum e-value of e- ) were further investigated and the nucleotide distance between the subject genes was calculated (suppl . table c ). sequence pairs with a maximum distance of nucleotides were suspected to be encoded polycistronically. 
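The distance-based grouping of homolog pairs described above can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline used in the study: the `Gene` record, the function names and the `max_distance` cutoff are hypothetical, and the cutoff value applied by the authors is not reproduced here. The sketch assumes that both genes lie on the same replicon, that coordinates are 1-based and inclusive, and that distance is measured from the end of the upstream gene to the start of the downstream gene; overlapping genes are assigned a distance of one nucleotide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one gene taken from a genome feature table."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def intergenic_distance(a: Gene, b: Gene) -> int:
    """Nucleotide distance between two genes on the same replicon.

    Assumption: distance = start of the downstream gene minus end of the
    upstream gene. Overlapping genes are given a distance of one nucleotide.
    """
    first, second = sorted((a, b), key=lambda g: g.start)
    if second.start <= first.end:  # coordinates overlap
        return 1
    return second.start - first.end


def is_putatively_polycistronic(a: Gene, b: Gene, max_distance: int) -> bool:
    """Flag a gene pair as a candidate for polycistronic encoding.

    max_distance is an assumed, caller-supplied cutoff in nucleotides.
    """
    return intergenic_distance(a, b) <= max_distance
```

The cutoff is passed in by the caller rather than hard-coded, so the same functions can be reused with whatever threshold a given analysis adopts.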
all genomic distances in nucleotides were derived from refseq genome feature tables. sequence pairs with overlapping start and end position were given a distance of one nucleotide. the regulatory region of the divergent argecbh operon in escherichia coli k- an early mtupr: redistribution of the nuclear transcription factor rox to mitochondria protects against intramitochondrial proteotoxic aggregates the versatility of the mitochondrial presequence processing machinery: cleavage, quality control and turnover new roles for mitochondrial proteases in health, ageing and disease the nadh dehydrogenase nde executes cell death after integrating signals from metabolism and proteostasis on the mitochondrial surface role of the membrane potential in mitochondrial protein unfolding and import two distinct membrane potential-dependent steps drive mitochondrial matrix protein translocation the benefits of cotranslational assembly: a structural perspective cotranslational assembly of protein complexes in eukaryotes revealed by ribosome profiling the biosynthesis of insulin and a probable precursor of insulin by a human islet cell adenoma requirement for the yeast gene lon in intramitochondrial proteolysis and maintenance of respiration n-terminome analysis of the human mitochondrial proteome the n-end rule pathway and regulation by proteolysis multi- omic mitoprotease profiling defines a role for oct p in coenzyme q production mitochondrial protein turnover: role of the precursor intermediate peptidase oct in protein stabilization global analysis of the mitochondrial n-proteome identifies a processing peptidase critical for protein stability mitochondrial targeting sequences may form amphiphilic helices molecular chaperones cooperate with pim protease in the degradation of misfolded proteins in mitochondria construction of a set of convenient saccharomyces cerevisiae strains that are isogenic to s c proteomic profiling of the mitochondrial ribosome identifies atp as a composite 
mitochondrial precursor protein viral precursor polyproteins: keys of regulation from replication to maturation crystal structure of sars-cov- main protease provides a basis for design of improved alpha- ketoamide inhibitors Δarg , ) were transformed with plasmids for expression of the indicated arg , variants and streaked out on plates containing minimal growth medium without arginine figure . mpp requires a strong n-terminal mts for internal processing of precursor proteins non- imported material is digested with proteinase k (left half). % of the total lysate used per import lane is loaded for control. the membrane potential (Δψ) was depleted with vao. red arrowheads indicate processing sites. p, precursor, i, intermediate, m, mature. c, yeast cells expressing indicated variants of arg , , all carrying a c-terminal ha tag, were lysed and protein extracts were analyzed by sds-page and immunoblotting directed against the ha epitope or sod as a loading control competing interests the authors declare that they have no competing interests. key: cord- -v gkubd authors: mäkinen, janne j.; shin, yeonoh; vieras, eeva; virta, pasi; metsä-ketelä, mikko; murakami, katsuhiko s.; belogurov, georgiy a. title: the mechanism of the nucleo-sugar selection by multi-subunit rna polymerases date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v gkubd rna polymerases (rnaps) synthesize rna from ntps, whereas dna polymerases synthesize dna from ’dntps. dna polymerases select against ntps by using steric gates to exclude the ’ oh, but rnaps have to employ alternative selection strategies. in single-subunit rnaps, a conserved tyr residue discriminates against ’dntps, whereas selectivity mechanisms of multi-subunit rnaps remain hitherto unknown. here we show that a conserved arg residue uses a two-pronged strategy to select against ’dntps in multi-subunit rnaps. 
the conserved arg interacts with the ’oh group to promote ntp binding, but selectively inhibits incorporation of ’dntps by interacting with their ’oh group to favor the catalytically-inert ’-endo conformation of the deoxyribose moiety. this deformative action is an elegant example of an active selection against a substrate that is a substructure of the correct substrate. our findings provide important insights into the evolutionary origins of biopolymers and the design of selective inhibitors of viral rnaps. all cellular lifeforms use two types of nucleic acids, rna and dna to store, propagate and utilize their genetic information. rna polymerases (rnaps) synthesize rna from ribonucleoside triphosphates (ntps), whereas dna polymerases (dnaps) use '-deoxyribonucleoside triphosphates ( 'dntps) to synthesize dna. the rna building blocks precede the dna building blocks biosynthetically and possibly also evolutionarily , . messenger rna molecules function as information carriers in a single-stranded form, whereas ribosomal, transfer and regulatory rnas adopt complex three-dimensional structures composed of double-stranded segments. the double stranded rnas favor a-form geometry where the ribose moiety of each nucleotide adopts the '-endo conformation (fig. a) . in contrast, dna functions as a b-form double helix, where the deoxyribose of each nucleotide adopts the '-endo conformation (fig. a, b) . hybrid duplexes between the rna and dna transiently form during transcription and adopt an a-form geometry because 'oh groups in the rna clash with the phosphate linkages in the b-form configuration. the sugar moieties of ntps and 'dntps equilibrate freely between the ' and '-endo conformations in solution with the overall bias typically shifted towards the '-endo conformers . however, both ntps and 'dntps typically adopt the '-endo conformation in the active sites of the nucleic acid polymerases . 
rnaps and dnaps need to discriminate efficiently against the substrates with the non-cognate sugar. the intracellular levels of ntps are in the range of hundreds of micromoles to several millimoles per liter and exceed those of the corresponding 'dntps more than tenfold [ ] [ ] [ ] . when selecting the 'dntps, most dnaps use bulky side-chain residues in their active sites to exclude the 'oh of the ntps (reviewed in ref. ). these steric gate residues, typically gln/glu in a-family dnaps and tyr/phe in y-and b-family dnaps, create a stacking interaction with the deoxyribose moiety of an incoming 'dntp and form a hydrogen bond between the backbone amide group and the ′-oh group of the deoxyribose moiety (fig. c) . selection against the 'dntps by rnaps is a daunting challenge because 'dntps are substructures of the corresponding ntps. single-subunit rnaps (e.g., mitochondrial and bacteriophage t and n enzymes) are homologous and structurally similar to dnaps. however, single-subunit rnaps lack a steric gate and use a conserved tyr residue to discriminate against 'dntps , . tyr selectively facilitates the binding of ntps by forming a hydrogen bond with the 'oh group of the ntp ribose ( fig. c) , . intriguingly, the same tyr also inhibits the incorporation of 'dntps by an unknown mechanism , . noteworthy, a homologous tyr hydrogen bonds with the steric gate gln/glu residue in a-family dnaps ( fig. c ) , . the mechanism of discrimination against 'dntps by the multi-subunit rnaps (bacterial, archaeal and eukaryotic nuclear rnaps) is poorly understood. the combined structural evidence (reviewed in ref. ) suggests that the 'oh group can make polar contacts with three universally conserved amino acid side chains: β'arg , β'asn and β'gln (numbering of the escherichia coli rnap). 
the β'arg and the β'asn are contributed by the active site cavity and can interact with the 'oh of ntps in the open and closed active site (see below), whereas the β'gln is contributed by a mobile domain called the trigger loop (tl) and can only transiently interact with the '-and '-oh of ntps in the closed active site [ ] [ ] [ ] (fig. c) . closure of the active site by the tl is an essential step during nucleotide incorporation by the multi-subunit rnaps because the α-phosphate of the ntp is located . - Å away from the rna ' end in the open active site , . complete closure of the active site by the folding of two alpha-helical turns of the tl positions the triphosphate moiety of the substrate ntp inline for an attack by the 'oh group of the rna and accelerates catalysis ˜ fold , , , . in contrast, folding of one helical turn of the tl is insufficient to promote catalysis ( 'oh  αp distance . Å ) but likely significantly reduces the rate of ntp dissociation from the active site by establishing contacts between the β'gln and the ribose moiety and stacking of the β'met with the nucleobase (reviewed in ref. ). the relative contribution of the tl (β'gln and β'met ) and the active site cavity (β'arg and β'asn ) to the discrimination against 'dntps remains hitherto uncertain. the closure of the active site makes only a -to -fold contribution to an overall -to -fold selectivity against the ´dntp in rnaps from e. coli and saccharomyces cerevisiae . consistently, the open active site of the e. coli rnap retained a ~ -fold overall selectivity against 'dntps . however, the open active site of the thermus aquaticus rnap has been reported to be largely unselective , and individual substitutions of the β'asn with ser in e. coli and s. cerevisiae resulted in only a < -fold decrease in selectivity , . 
most importantly, although the universally conserved β'arg closely approaches the 'oh of the ntp in several x-ray crystal structures [ ] [ ] [ ] , (supplementary table ) and has been highlighted as the sole residue mediating the selectivity against 'dntp in a computational study by roßbach and ochsenfeld , the role of this residue has not been experimentally assessed. in this study, we systematically investigated the effects of individual substitutions of the active site residues on the discrimination against 'dntps in single nucleotide addition (sna) assays and during processive transcript elongation by the e. coli rnap. this analysis demonstrated that β'arg is the major determinant of the selectivity against 'dntps in multi-subunit rnaps. we further analyzed the binding of '-deoxy substrates by in silico docking and x-ray crystallography of thermus thermophilus rnap. our data suggest that the conserved arg actively selects against 'dntps by favoring their templated binding in the '-endo conformation that is poorly suitable for incorporation into rna. to investigate the mechanism of the discrimination against the '-deoxy substrates, we performed time-resolved studies of the single nucleotide incorporation by the wild-type (wt) and variant e. coli rnaps. among several single substitutions of the key residues that contact ntp ribose (fig. c), we selected four variant rnaps that retained at least half of the wild-type activity at saturating concentrations of ntps. this approach minimized the possibility that the amino acid substitutions induced global rearrangements of the active site, thereby complicating the interpretation of their effects on the sugar selectivity. transcript elongation complexes (tecs) were assembled on synthetic nucleic acid scaffolds and they contained the fully complementary transcription bubble flanked by -nucleotide dna duplexes upstream and downstream (supplementary fig. a). 
the annealing region of a nucleotide rna primer was initially nucleotides, permitting the tec extended by one nucleotide to adopt the post-and pre-translocated states, but disfavoring backtracking . the rna primer was ' labeled with the infrared fluorophore atto to monitor the rna extension by denaturing page. the template dna strand contained the fluorescent base analogue -methyl-isoxanthopterin ( -mi) eight nucleotides upstream from the rna ' end to monitor rnap translocation along the dna following nucleotide incorporation . we first measured gtp and 'dgtp concentration series of the wt and altered rnaps using a time-resolved fluorescence assay performed in a stopped-flow instrument (supplementary fig. b, c) . we used the translocation assay because it allowed rapid acquisition of concentration series, whereas measurements of concentration series by monitoring rna extension in the rapid chemical quench-flow setup would be considerably more laborious. the concentration series data allowed the estimation of kcat and the km (michaelis constant) for gtp and 'dgtp. we then supplemented the concentration series with timecourses of gmp and 'dgmp incorporation obtained using a rapid chemical quench-flow technique with edta as a quencher. edta inactivates the free gtp and 'dgtp by chelating mg + but allows a fraction of the bound substrate to complete incorporation into rna after the addition of edta , . as a result, the edta quench experiment is equivalent to a pulse-chase setup and provides information about the rate of substrate dissociation from the active site of rnap. a global analysis of the concentration series and edta quench experiments (i) allowed the estimation of the kd for gtp dissociation from the active site and (ii) suggested that the kd for the dissociation of the 'dgtp from the active site approximately equals the km for 'dgmp incorporation (see supplementary note). 
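As a hedged illustration of how such fitted parameters translate into selectivity values, the sketch below expresses discrimination at the binding step as a ratio of dissociation constants and discrimination at the incorporation step as a ratio of kcat values. Treating the overall selectivity as the product of the two (equivalently, the ratio of kcat/Kd) is an assumption of this sketch, not a formula stated in the text. The `Kinetics` class, the function names and the numbers in the demonstration are hypothetical placeholders, not values from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kinetics:
    """Hypothetical container for the fitted parameters of one substrate."""
    kcat: float  # incorporation rate at saturating substrate, s^-1
    kd: float    # dissociation constant; use the same unit for both substrates


def binding_selectivity(ntp: Kinetics, dntp: Kinetics) -> float:
    """Fold preference for the NTP at the binding step (ratio of Kd values)."""
    return dntp.kd / ntp.kd


def incorporation_selectivity(ntp: Kinetics, dntp: Kinetics) -> float:
    """Fold preference for the NTP at the incorporation step (ratio of kcat)."""
    return ntp.kcat / dntp.kcat


def overall_selectivity(ntp: Kinetics, dntp: Kinetics) -> float:
    """Assumed overall selectivity: product of the two step-wise factors."""
    return binding_selectivity(ntp, dntp) * incorporation_selectivity(ntp, dntp)


if __name__ == "__main__":
    # Arbitrary placeholder numbers, chosen only to exercise the functions.
    gtp = Kinetics(kcat=20.0, kd=10.0)
    dgtp = Kinetics(kcat=0.5, kd=400.0)
    print(binding_selectivity(gtp, dgtp))        # 40.0
    print(incorporation_selectivity(gtp, dgtp))  # 40.0
    print(overall_selectivity(gtp, dgtp))        # 1600.0
```

Keeping the two factors separate mirrors the way the effects of each substitution are reported, first for the binding step and then for the incorporation step.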
we further used inferred values of kcat and kd to compare the capabilities of the variant rnaps to discriminate against 'dgtp ( fig. ) . wt rnap displayed ~ -fold higher affinity for gtp than for 'dgtp ( fig. a, table ). the β'r k and β'q m substitutions decreased the selectivity at the binding step -and fold, respectively, largely by decreasing the affinity for gtp. in contrast, the β'n s decreased the selectivity only -fold, whereas the β'm a increased the selectivity . -fold. at saturating substrate concentrations, the wt rnap incorporated gmp ~ -fold faster than the 'dgmp ( fig. b, table ). the β'r k substitution decreased the selectivity -fold, primarily by accelerating the incorporation of the 'dgmp. in comparison, the effects of other substitutions on the selectivity against the 'dgtp at the incorporation step were relatively small (fig. b, supplementary tables , ) . the β'm a decreased the selectivity -fold, whereas the β'n s and β'q m increased the selectivity . -and -fold, respectively. noteworthy, the β'q m decreased the rate of 'dgmp incorporation -fold. overall, these experiments suggested that the β'arg plays a central role in the discrimination against '-deoxy substrates ( table ) : the β'arg selectively facilitated binding of gtp and selectively inhibited the incorporation of 'dgmp. in contrast, the role of the β'gln was complex: while the β'gln selectively facilitated the binding of gtp, it also selectively facilitated the incorporation of 'dgmp. the time-resolved sna assays described above are superior to any other currently available techniques for the quantitative assessment of the binding and incorporation of different substrates and the effects of active site residues therein. 
however, these assays have several limitations: the nucleotide incorporation was measured for static complexes stabilized in the post-translocated state by the artificially limited rna:dna complementarity, and the effects were assessed only at a single, easy-to-transcribe sequence position. to test whether the conclusions drawn from the sna assay remain valid during processive transcript elongation, we developed a semi-quantitative assay as follows. tecs were assembled on a nucleic acid scaffold with a bp-long downstream dna and chased with ntp mixtures containing µm atp, ctp, utp and gtp or 'dgtp for min at °c. transcription with the 'dgtp by the wt rnap resulted in characteristic pauses at each sequence position preceding the incorporation of the 'dgmp (fig. , pre-g sites). we used the amplitude of these accumulations as a semi-quantitative measure of the ability of the rnap to utilize 'dgtp. notably, the interpretation of the processive transcription by some variant rnaps was complicated by enhanced pausing after the incorporation of cytosine (fig. b, at-c sites) and 'dgmp (fig. b, at-g sites) in certain sequence contexts. however, these additional pauses were unrelated to the utilization of the 'dgtp as a substrate and could be disregarded when comparing pre-g pauses that occurred upstream of all at-c and at-g pauses. in contrast to the wt rnap, the β'r k did not pause prior to the incorporation of the 'dgmp (fig. ), consistent with the significantly higher 'dgmp incorporation rate observed in sna assays (fig. b). moreover, the β'r l rnap also did not accumulate at the pre-g sites despite being strongly defective during processive transcription (fig. a, supplementary fig. ). these data suggest that the loss of selectivity in the β'r k is attributable to the absence of the β'r rather than a gain-of-function effect of the lys residue at the corresponding position. 
the β'm a paused noticeably less whereas the β'q m paused noticeably more than the wt rnap at the pre-g sites (fig. a, supplementary fig. ), consistent with the -fold higher (β'm a) and -fold lower (β'q m) kcat for the 'dgmp incorporation in the sna experiments (fig. b, table ). in contrast, the β'n s was largely indistinguishable from the wt rnap in its ability to utilize the 'dgtp in the processive transcription assay (fig. , supplementary fig. ), presumably because this assay is not sensitive enough to resolve the ~ . -fold difference in kcat for the 'dgmp incorporation (fig. b, supplementary tables , ). overall, the analysis of the utilization of the 'dgtp during the processive transcription of diverse sequences by the wt and variant rnaps recapitulated the major effects observed in the sna experiments. next, we tested the effects of the β'r k, β'm a, β'q m and β'n s substitutions on utilization of 'datp, 'dctp and 'dutp during processive transcription (supplementary figs. - ). for each 'dntp, we custom-designed a template where the 'dntp is incorporated several times early in transcription, thereby allowing unambiguous interpretation of the accumulation of rnaps at sites preceding the 'dnmp incorporation. an analysis of the utilization of 'datp, 'dctp and 'dutp largely recapitulated the effects observed for 'dgtp, except that the β'n s was markedly inferior to the wt rnap in utilizing 'datp and 'dutp. overall, these data demonstrated that the enhanced or diminished capabilities of the variant rnaps to utilize 'dgtp in the sna assays reflected, in qualitative terms, their capabilities to utilize all four 'dntps. the role of the β'arg in selectively promoting the binding of ntps was easy to explain because the β'arg interacts with the 'oh of the ntp analogues in several rnap structures (supplementary table , fig. c, a, b). 
in contrast, the observation that the β'arg selectively inhibited the incorporation of 'dntps could not be readily explained: our results show that the β'arg substitutions promote the incorporation of the substrate that lacks the 'oh group with which the β'arg would interact. we hypothesized that, in the absence of the 'oh, the β'arg interacted with something else and that this interaction slowed down the incorporation of 'dnmps into the nascent rna. we further reasoned that the 'oh group of the 'dntp was the most likely interacting partner of the β'arg, an inference supported by md simulations of s. cerevisiae rnapii. however, the 'oh group is positioned too far from the β'arg when the sugar moiety is in the '-endo conformation (supplementary table ). we further hypothesized, and demonstrated by in silico docking experiments, that the 'oh could move to within hydrogen bond distance of the β'arg if the deoxyribose moiety adopted a '-endo conformation (supplementary fig. ). to test this hypothesis in crystallo, we solved the x-ray crystal structure of the initially transcribing complex containing t. thermophilus rnap, dna and a -nt rna primer with the incoming 'dctp bound at the active site at . Å resolution. the structure displayed a well-resolved electron density of the 'dctp and the β'arg closely approaching the deoxyribose moiety (fig. c, d, table , supplementary fig. a, supplementary data ). the 'dctp was observed in the pre-insertion conformation, which was unsuitable for catalysis because the α-phosphate was located . Å from the 'oh of the rna primer. the electron density was consistent with the interaction between the β'arg and the 'oh group of the deoxyribose in the '-endo conformation, in agreement with the results of in silico docking. interestingly, the density for the metal ion complexed by the β- and γ-phosphates of the 'dctp was weak and the coordination distances were longer than typically observed for mg2+ in the corresponding position.
we modeled this metal ion as na+ rather than mg2+, similar to what has been proposed for dna polymerase β. the tl was completely unfolded in the structure of the initially transcribing complex with the 'dctp, in contrast to the partially helical conformation typically observed in the structures of the rnap complexes with non-hydrolysable ntp analogues (supplementary table ). to test whether the unavailability of the 'oh group was indeed responsible for the destabilization of the tl folding, we solved the x-ray crystal structure of the initially transcribing complex of the t. thermophilus rnap with a 'dctp at . Å resolution. the structure displayed a well-resolved density of the 'dctp and the β'arg closely approaching the '-deoxyribose moiety (fig. e, f, supplementary fig. b, supplementary data ). the 'dctp was in the pre-insertion conformation, which was unsuitable for catalysis because the α-phosphate was located . Å away from the 'oh of the rna primer. the overall pose of the 'dctp was similar to that of cytidine- '-[(α,β)-methyleno]triphosphate (cmpcpp): the '-deoxyribose adopted a '-endo conformation and the 'oh group interacted with the β'arg. however, the tl was completely unfolded, supporting our hypothesis that the unavailability of the 'oh group was alone sufficient to significantly destabilize the folding of the first helical turn of the tl. overall, the comparative analysis of rnap structures with cmpcpp, 'dctp and 'dctp suggested that the β'arg inhibited the incorporation of 'dntps by interacting with their 'oh group and favoring the '-endo conformation of the deoxyribose moiety. at the same time, the structures did not provide a decisive answer as to why the '-endo conformations of 'dntps were less suitable for incorporation into rna than the '-endo conformations. the x-ray structures and in silico modeling experiments suggested that interactions between the 'oh of the deoxyribose moiety and the β'arg or the β'gln were mutually exclusive.
accordingly, the β'arg could inhibit the incorporation of the 'dnmp solely by slowing down the initial steps of the tl folding, by sequestering the 'oh group and preventing its interaction with the β'gln of the tl. to test this hypothesis, we determined the incorporation rate of 'dgmp by the wt rnap (supplementary fig. c) . we found that the kcat for 'dgmp incorporation was only -fold slower than the kcat for gmp incorporation and fold higher than the kcat for 'dgmp incorporation ( table ). these data demonstrated that the sequestration of the 'oh group accounted for no more than a -fold inhibition of the 'dgmp incorporation by the β'arg . the remaining -fold inhibition of the overall -fold inhibitory effect was contributed by some other features of the '-endo binding pose, as discussed below. in this study we performed a systematic analysis of the role of the amino acid residues in the active site of the multi-subunit rnap in selecting ntps over 'dntps. we identified a conserved arg residue, β'arg (e. coli rnap numbering) as the major determinant of the sugar selectivity. the β'arg favored binding of gtp over 'dgtp and selectively inhibited the incorporation of 'dnmps into rna (figs. , , table ). the enhancement of ntp binding by the β'arg is consistent with the observation that the β'arg is positioned to hydrogen bond with the 'oh of the ntp substrate analogues in several rnap structures (supplementary table ) and with md simulations of the s. cerevisiae rnapii . however, the existing data fail to explain the inhibition of the 'dntp incorporation by the β'arg . in search of an explanation, we performed in silico docking experiments and solved the x-ray crystal structures of the initially transcribing t. thermophilus rnap with the cognate 'dctp and 'dctp. these experiments revealed that the β'arg interacts with the 'oh group of the 'dntp substrate and favors the '-endo conformation of the deoxyribose ( fig. c, d, supplementary fig. e, f) . 
in contrast, the ribose of the cognate ntp substrate is stabilized in the '-endo conformation by multiple polar contacts and hydrogen bonds with the active site residues: β'arg , β'asn and β'gln (figs. c, a, b, supplementary fig. a , b, we next considered whether the deformation of the 'dntp substrate, repositioning of the β'arg or both were behind the slow incorporation of the 'dnmps by the wt rnap. a hybrid quantum and molecular mechanics (qm/mm) analysis of nucleotide incorporation by the s. cerevisiae rnapii suggested that repositioning of the β'arg by the 'dntp substrate may increase the activation energy barrier for the nucleotide addition reaction . however, a comparison of the rnap structures with bound cmpcpp, 'dctp and 'dctp revealed very small changes in the conformation of the β'arg (fig. ) . similarly, a survey of the published x-ray and cryoem structures revealed that the β'arg occupies approximately the same volume irrespective of the presence or absence of the active site ligands (supplementary table ). accordingly, we reasoned that the preferential selection of the catalytically-inert '-endo conformers of 'dntps and the deformation of the catalytically-labile '-endo conformers of 'dntps by β'arg were likely the major factors behind the slow incorporation of the 'dnmps. however, it remained unclear why the '-endo conformers of the substrates were less suitable for the incorporation than the '-endo conformers. we first explored the possibility that the sequestration of the 'oh group by the β'arg makes it unavailable for the interaction with the β'gln of the tl (fig. c, d, supplementary fig. e, f) , thereby destabilizing the tl-mediated closure of the active site. it is well established that the closure of the active site by two helical turns of the tl accelerates the catalysis of nucleotide incorporation by ~ -fold , , , . noteworthy, the tl is partially folded in most structures with ribonucleotide substrate analogues (fig. 
a, table ) - , , yet was completely unfolded in the structures we obtained with either a 'dctp ( fig. c, d) or a 'dctp ( fig. e, f) . given that the 'dctp was in the conventional '-endo conformation, the latter result suggested that the unavailability of the 'oh group was sufficient to significantly impair the folding of the tl and slow down the catalysis by the t. thermophilus rnap in crystallo. to quantitatively estimate the contribution of the 'oh interactions to the catalysis, we determined the rate of the 'dgmp incorporation by e. coli rnap. we found that the rate of the 'dgmp incorporation was -fold slower than the rate of the gmp incorporation, but -fold faster than that of the 'dgmp ( table , supplementary fig. c) . these results suggested that the sequestration of the 'oh group by the β'arg could account for no more than a -fold out of its -fold overall inhibitory effect. notably, the t. thermophilus rnap also incorporates 'dnmps faster than 'dnmps but discriminates against both types of substrates ~ -fold stronger than the e. coli rnap . similarly, the effects of the β'q m substitution were inconsistent with the idea that the 'oh capture by β'arg could alone account for the slow rate of the 'dnmp incorporation. if that were true, the β'q m variant should be relatively insensitive to the absence of the 'oh group. however, the opposite was true: β'q m was only twofold slower in incorporating gmp than the wt rnap, but tenfold slower in incorporating 'dgmp. we propose that the β'gln competes with the β'arg for the 'oh group of the 'dntp substrate: the β'arg favors the catalytically-inert '-endo conformer (fig. c, d, supplementary fig. e, f) , whereas the β'gln favors the catalytically-labile '-endo conformer (supplementary fig. c, d) . as a result, the β'gln is more important during the incorporation of 'dnmps than nmps. 
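The partitioning argument above treats fold-effects as multiplicative: the slowdown attributable to losing the hydroxyl that would otherwise be sequestered, times a remaining factor, gives the overall inhibition. The following is a minimal sketch of that arithmetic only. The rate constants are hypothetical placeholders, not the values from the tables, and the function name is illustrative.

```python
import math

def partition_fold_effect(k_ribo, k_intermediate, k_slowest):
    """Split an overall fold-inhibition into two multiplicative components.

    k_ribo         -- rate for the cognate ribonucleotide
    k_intermediate -- rate for the analogue used to probe the hydroxyl effect
    k_slowest      -- rate for the most strongly inhibited substrate
    """
    overall = k_ribo / k_slowest            # total fold-inhibition
    first = k_ribo / k_intermediate         # upper bound for the hydroxyl effect
    remainder = k_intermediate / k_slowest  # what the other features must explain
    return overall, first, remainder

# Placeholder rate constants in s^-1 (hypothetical, NOT taken from the paper).
overall, first, remainder = partition_fold_effect(50.0, 5.0, 0.05)

# Fold-effects multiply, so their logarithms add.
log_sum = math.log10(first) + math.log10(remainder)
```

Working on a logarithmic scale is convenient when comparing several variants, because each contribution then becomes an additive term.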
since the tl folding can account only for a fraction of the inhibitory effect, what other factors make the '-endo conformers of 'dntps catalytically inert? it is noteworthy that the sugars of the attacking and substrate nucleotides adopt the '-endo conformation in all rnaps and dnaps during the nucleotide incorporation. in other words, even the ' ends of dna primers adopt the '-endo conformation to catalyze the incorporation of the 'dnmps into the dna. apparently, the a-form geometry is much better suited for the catalysis of the nucleotide condensation than the b-form geometry. the better accessibility of the nucleophilic 'oh group of the attacking nucleotide is likely the primary reason. the substrate then adopts the '-endo conformation to match the overall geometry of the a-form duplex and to avoid clashes with the attacking nucleotide. in general terms, the inertness of the '-endo conformation of the 'dntps can be partially attributed to the differences in the conformations of the triphosphate moieties, which in turn originate from the differences in the bond angles at c' of the sugar between the '- and '-endo conformers (fig. b). we term this inhibitory component c' -geometry-dependent effects. however, in our view, it is impossible to further refine this hypothesis at present: (i) the resolutions of the structures are not very high (≥ . Å, supplementary table ) [...]. notably, the conserved arg is one of only five catalytic residues that are conserved in the superfamily of the so-called "two-β-barrel" rnaps, which includes the multi-subunit rnaps and the very distantly related cellular rna-dependent rnaps (rdrps) involved in rna interference (supplementary fig. ). accordingly, the common ancestor of the two-β-barrel rnaps could conceivably discriminate against 'dntps and therefore likely evolved in the presence of both ntps and 'dntps.
this inference lends credence to the hypothesis that proteins evolved in primordial lifeforms that already possessed both rna and dna. viral rdrps (members of the so-called "right-hand" superfamily of nucleic acid polymerases) are not homologous to multi-subunit rnaps but share some elements of their sugar selection strategies. it appears that the 'oh of the substrate ntp facilitates the active site closure in both classes of enzymes. in multi-subunit rnaps, the 'oh facilitates the tl folding via the interaction with β'gln, whereas in viral rdrps, the 'oh initiates the closure by sterically clashing with asp (poliovirus rdrp numbering). in both classes of enzymes, 'dntps adopt a '-endo pose wherein the 'oh is misplaced and cannot readily facilitate the closure of the active site, explaining the low reactivities of 'dntps. however, 'dntps are also better substrates than 'dntps for viral rdrps, suggesting that the low reactivity of the '-endo 'dntps additionally relies on c' -geometry-dependent effects (see above), which lead to a suboptimal conformation of the triphosphate moiety, a suboptimal geometry of the transition state, or both. multi-subunit rnaps and viral rdrps converged on using the '-endo binding pose to discriminate against 'dntps. in doing so, these enzymes accentuate the intrinsic preference of 'dntps to retain the inert '-endo conformation upon binding to the a-form template in the non-enzymatic system. in summary, our data show that a universally conserved arg residue plays a central role in selecting ntps over 'dntps by the multi-subunit rnaps. when an ntp binds in the rnap active site, its ribose adopts the '-endo conformation that positions the 'oh group to interact with the universally conserved gln residue of the tl domain and promotes the closure of the active site, whereas the triphosphate moiety can undergo rapid isomerization into the insertion conformation, leading to efficient catalysis.
the interaction of the conserved arg residue with the 'oh of the ntps selectively enhances their binding more than -fold and renders rnap saturated with ntps in the physiological concentration range. in contrast, the interaction of the conserved arg with the 'oh of the 'dntp substrates shapes their deoxyribose moiety into the catalytically inert '-endo conformation where the 'oh cannot promote closure of the active site and substrate incorporation is additionally inhibited by the unfavorable geometry of the triphosphate moiety. the deformative action of the conserved arg on the 'dntp substrates is an elegant example of active selection against a substrate that is a substructure of the correct substrate. dna and rna oligonucleotides were purchased from eurofins genomics gmbh (ebersberg, germany) and iba biotech (göttingen, germany). dna oligonucleotides and rna primers are listed in supplementary table our initial docking trials revealed that docking of nucleoside monophosphates produced the most robust and quantitatively interpretable results. thus, the docking algorithm failed to recover templated poses for nucleosides without phosphate groups. the docking algorithm also failed to position the triphosphate moiety to coordinate metal ion number two and instead attempted to maximize its contacts with the protein. as a result, the recovered conformations of the triphosphate moieties differed from those observed in crystal structures. considering the high impact of the triphosphate moiety on the ligand binding score and our assessment that the triphosphate moiety was docked incorrectly, we opted to limit the systematic investigation of the interaction between rnap and the sugar moieties of nucleosides to docking nucleoside monophosphates. we first docked '-endo cmp, '-endo 'dcmp and '-endo 'dcmp to the rnap fragment . the docking algorithm recovered high-scoring poses (- . ± . kcal/mol) for cmp in out of runs, lower-scoring poses (- . ± . 
kcal/mol) for '-endo 'dcmp in out of runs and '-endo 'dcmp in out of runs. the β'arg side chain was kept flexible in the latter case because our manual assessment suggested that a sub-angstrom repositioning of β'arg would be needed to accommodate the '-endo deoxyribose. we then fixed the β'arg [...] table ). these in silico experiments suggested that the semi-closed active site can bind the '-endo and '-endo 'dcmp with similar affinities. the 'oh of the '-endo 'dcmp was positioned to interact with β'gln and β'asn, whereas the 'oh of the '-endo 'dcmp was positioned to interact with β'arg and β'asn (supplementary fig. ). we further inferred that the open active site should have a preference for the '-endo 'dcmp because β'gln is not positioned to interact with the 'oh of the substrate in the open active site. in line with our prediction, the x-ray diffraction data for the crystals of the rnap- 'dctp complex were consistent with the '-endo conformation of the 'dctp bound in the open active site (fig. b, c, supplementary fig. a). we further verified the binding preferences of the open active site in silico by removing the 'dctp from the model and docking alternative conformers of the 'dcmp. the docking algorithm recovered higher-scoring poses (- . ± . kcal/mol) for the '-endo 'dcmp in out of runs and lower-scoring poses (- . ± . kcal/mol) for '-endo 'dcmp in only out of runs (supplementary table ).

the non-template dna strand ( '-tataatgggagctgtcacggatgcagg- ') was annealed to the template dna strand ( '-cctgcatccgtgagtgcagcca- ') in µl of mm tris-hcl (ph . ), mm nacl, and mm edta to a final concentration of mm. the solution was heated at °c for min and then gradually cooled to °c.
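As a quick sanity check on the annealing step, the extent of base pairing between the two strands can be estimated directly from the printed sequences. The sketch below is an illustration only. It assumes both sequences are written 5′ to 3′ (the orientation digits are missing from the extracted text), and the helper names are hypothetical.

```python
# Map each base to its Watson-Crick partner (lowercase, as printed in the text).
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def shared_prefix_length(a, b):
    """Count how many leading positions of a and b are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Sequences copied from the text; 5'->3' orientation is an assumption.
non_template = "tataatgggagctgtcacggatgcagg"
template = "cctgcatccgtgagtgcagcca"

# Length of the contiguous duplex that starts at the template's 5' end.
duplex_len = shared_prefix_length(reverse_complement(non_template), template)
```

Under these assumptions, the strands pair over a contiguous 13-nucleotide stretch before the first mismatch, which is what one would expect for a scaffold that contains a duplex followed by a pre-melted region.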
the crystals of the rnap and promoter dna complex were prepared as described previously [...]

the x-ray datasets were collected at the macromolecular diffraction at the cornell high energy synchrotron source (macchess) f beamline (cornell university, ithaca, ny) and structures were determined as previously described, using the following crystallographic software: [...]

the reaction products were modeled as sums of independent contributions by the fast and slow fractions of rnap using numerical integration capabilities of the kintek explorer software. contributions of each fraction were modeled as scheme . upper and lower bounds of the parameters were calculated at a % increase in chi . [...] table and supplementary tables - . error bars are ranges of duplicate measurements or sds of the best-fit parameters, whichever values were larger.

[...] a tecs were assembled using the scaffold shown above the gel panels and chased with µm atp, ctp, utp and gtp or 'dgtp for min at °c. the positions of gmps in the resolved stretches of the transcribed sequence are marked along the right edge of the gel panels. -bit grayscale scans were normalized using max pixel counts within each gel panel and pseudocolored using the rgb palette on the right. b lane profiles of transcription in all-ntps and 'dgtp chases by the wild-type and β'r k rnaps quantified from gels in (a). traces were manually aligned along the x-axis and scaled along the y-axis using several sequence positions as references.

[...] magenta numbers are interatomic distances in Å. panels (a) and (b) were prepared using pdb id [...]

[...] coli rnaps. a tecs were assembled using the scaffold shown above the gel panels and chased with µm atp, ctp, utp and gtp or 'dgtp for min at °c. the positions of gmps in the resolved stretches of the transcribed sequence are marked along the right edge of the gel panels. -bit grayscale scans were normalized using max pixel counts within each gel panel and pseudocolored using the rgb palette on the right. b lane profiles of transcription in all-ntps and 'dgtp chases by the wild-type and β'r k rnaps quantified from gels in (a). traces were manually aligned along the x-axis and scaled along the y-axis using several sequence positions as references.

supplementary fig. : utilization of 'dutp and 'datp during the processive transcript elongation by the wt and variant rnaps. a tecs were assembled using the scaffold shown above the gel panels and chased with µm ctp, gtp, utp, atp (all-ntps-chase), or ctp, gtp, atp, 'dutp ( 'dutp-chase), or ctp, gtp, utp, 'datp ( 'datp-chase) for min at °c. the positions of umps or amps in resolved stretches of the transcribed sequence are marked along the right edge of the gel panels. -bit grayscale scans were normalized using max pixel counts within each gel panel and pseudocolored using the rgb palette on the right. b lane profiles of transcription by the wt (cyan) and β'r k (magenta) rnaps quantified from gels in (a). traces were manually aligned along the x-axis and scaled along the y-axis using several sequence positions as references.

supplementary fig. : lane profiles of transcription in all-ntps and 'dgtp chases quantified from gels in main text figure .

fig. : lane profiles of transcription in all-ntps, 'dutp and 'datp chases quantified from gels shown in supplementary figure .

fig. . utilization of 'dctp during the processive transcript elongation by the wt and variant rnaps. a tecs were assembled using the scaffold shown above gel panels and chased with µm gtp, utp, atp and ctp (all-ntps chase) or 'dctp ( 'dctp chase) for min at °c. b lane profiles of transcription by the wt (cyan) and β'r k (magenta) rnaps quantified from gels [...]

supplementary table . dna oligonucleotides and rna primers used in this study.
we used time-resolved single nucleotide addition experiments to estimate the equilibrium constant for gtp, 'dgtp and 'dgtp binding and dissociation in the active site of rnap and to determine the first order rate constant (also known as the turnover number) for the incorporation of gmp, 'dgmp and 'dgmp into the nascent rna. the tecs were assembled on synthetic nucleic acid scaffolds and contained the fully complementary transcription bubble flanked by -nucleotide dna duplexes upstream and downstream (supplementary fig. a) . the annealing region of a -nucleotide rna primer was initially nucleotides, permitting the tec extended by one nucleotide to adopt the post-and pre-translocated states, but disfavoring backtracking. the rna primer was ' labeled with the infrared fluorophore atto to monitor the rna extension by denaturing page. to facilitate the rapid acquisition of kinetic data (see below), the template dna strand contained a fluorescent base analogue -methyl-isoxanthopterin ( -mi) eight nucleotides upstream from the rna ' end. -mi allowed the monitoring of rnap translocation along the dna following nucleotide incorporation (supplementary fig. a) . we first measured concentration series of gmp and 'dgmp incorporation by the wild-type and altered rnaps using a time-resolved fluorescence assay performed in a stopped flow instrument (supplementary figs. - ) . we used the translocation assay because it allowed the rapid acquisition of concentration series, whereas measurements of concentration series by monitoring rna extension in a rapid chemical quench-flow setup would be considerably more laborious. we then performed a preliminary data analysis by fitting each fluorescence timetrace to a single exponential function followed by fitting the resulting individual rates to a michaelis equation. the inferred kcat and km generally supported all major conclusions reported in this study. 
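The second stage of the preliminary analysis, fitting the per-concentration rates to a Michaelis hyperbola, can be sketched as follows. This is not the authors' fitting routine: it swaps in a Hanes-Woolf linearization (regressing [S]/v on [S]) so that the example needs only the standard library, and it uses noise-free synthetic rates generated from placeholder values of kcat and Km.

```python
def michaelis_rate(conc, kcat, km):
    """Hyperbolic dependence of the observed rate on substrate concentration."""
    return kcat * conc / (km + conc)

def fit_hanes_woolf(concs, rates):
    """Estimate (kcat, km) by least-squares regression of s/v on s.

    For v = kcat*s/(km + s), s/v = s/kcat + km/kcat, so
    slope = 1/kcat and intercept = km/kcat.
    """
    ys = [s / v for s, v in zip(concs, rates)]
    n = len(concs)
    mean_x = sum(concs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kcat = 1.0 / slope
    km = intercept * kcat
    return kcat, km

# Hypothetical concentration series (µM) and placeholder true parameters.
concs = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0, 500.0]
true_kcat, true_km = 40.0, 30.0
rates = [michaelis_rate(s, true_kcat, true_km) for s in concs]

fit_kcat, fit_km = fit_hanes_woolf(concs, rates)
```

With real, noisy data a direct nonlinear least-squares fit of the hyperbola is generally preferable, because linearizations distort the error structure.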
however, we proceeded to expand the datasets by including additional data and developed more elaborate analysis routines. the first reason to invoke a more elaborate analysis was the observation that most fluorescence time traces in our datasets fitted poorly to the single exponential function. in fact, the underlying physics of a single turnover enzymatic reaction suggests that individual timetraces in the concentration series should, in a general case, be poorly described by a single exponential function (see below). the second reason to invoke a more elaborate analysis was the concern that the michaelis constant is a lumped constant that contains a sum of the catalytic and substrate dissociation rates in the numerator and the substrate binding rate in the denominator, whereas the equilibrium binding constants are the ratios of the substrate dissociation and binding rates. accordingly, we were concerned that comparing the michaelis constants of reactions could potentially lead to erroneous conclusions in the cases where the km was markedly different from the kd. for the sake of understanding our analysis workflow, it is important to acknowledge that each reaction timetrace in the concentration series describes a single turnover process: we designed the transcribed sequence so that only a single gmp (or 'dgmp or 'dgmp) became incorporated upon the addition of gtp (or 'dgtp or 'dgtp). the ease of obtaining single turnover timetraces is a significant analytical advantage natively associated with templatedependent nucleic acid polymerases. it is often possible to infer more parameters from concentration series of single-turnover reactions than from concentration series of classic multiturnover enzymatic reactions. next, most timetraces in the concentration series are not expected to fit a single exponential function even in the case of the simple signal, a -nt extended nascent rna (rna in this study). 
the enzymatic reaction is minimally a two-step sequential reaction that consists of the [...] that was employed by prajapati et al. next, kinetic heterogeneity in the tec preparations introduced an additional level of complexity to the fitting of the data. we reported previously that the vast majority of tec preparations contain - % of a slow fraction that manifests itself as a slow phase in reaction timetraces of both the fluorescence signal (stopped-flow assay) and the extended rna (quench flow assay). in the case of fast reactions measured in this study (gtp, 'dgtp data), the rates of the fast and slow phases differed approximately tenfold and therefore the phases could be precisely resolved (see a dedicated section below). importantly, the fast phase of the reaction constituted - % of the signal amplitude (table , supplementary table ). accordingly, we considered the activity of the fast fraction as a representative measure of the rnap activity in each experiment and disregarded the minor slow fraction when comparing the wild-type and variant rnaps (fig. ). in the case of slow reactions ( 'dgtp data), the fast and the slow phases were not well separated ( -fold difference in rates, [...] when fitting data to equation , each timetrace was described by a stretched exponential function (an empirical function that is often used to describe heterogeneous systems). at the same time, the exponent followed a hyperbolic dependence on the 'dgtp concentration (supplementary figs. c, ). such fits described the data well and gave three parameters: a reaction rate constant (k), a stretching parameter (β) and the michaelis constant (km). when a stretched exponential function is applied to a process where the reactivity changes over time (or distance), the rate constant parameter (k) corresponds to the initial reaction rate constant.
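To make the three-parameter description concrete, here is a minimal sketch of a stretched exponential timetrace whose rate depends hyperbolically on substrate concentration. The exact form of the fitted equation is not present in the extracted text, so the form 1 - exp(-(k_obs t)^β) with k_obs = k [S] / (Km + [S]) is an assumption, and all numerical values are placeholders. The sketch also shows one property that follows directly from this form: the time at which half of the product has formed.

```python
import math

def observed_rate(conc, k, km):
    """Assumed hyperbolic dependence of the rate on substrate concentration."""
    return k * conc / (km + conc)

def stretched_exponential(t, k_obs, beta):
    """Fraction of product at time t for the assumed form 1 - exp(-(k_obs*t)**beta)."""
    return 1.0 - math.exp(-((k_obs * t) ** beta))

def median_time(k_obs, beta):
    """Time at which the assumed stretched exponential reaches one half.

    Solving exp(-(k_obs*t)**beta) = 1/2 gives t = ln(2)**(1/beta) / k_obs.
    """
    return math.log(2.0) ** (1.0 / beta) / k_obs

# Placeholder parameters (hypothetical, not fitted values from the study).
k_obs = observed_rate(conc=200.0, k=0.5, km=100.0)
beta = 0.7
t_half = median_time(k_obs, beta)
median_rate = math.log(2.0) / t_half  # rate of an ordinary exponential with the same half-time
```

For β = 1 the function reduces to an ordinary exponential and the median rate equals k_obs; for β < 1 the two differ, which is why a half-time based measure is easier to compare across datasets than k itself.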
in our case, the stretched exponential fit potentially absorbed both temporal and structural heterogeneity as well as the deviations from the single exponential behavior caused by the sequential nature of the enzymatic reaction (see above). for this reason, the rate parameter (k) did not have an easily interpretable meaning. to circumvent this problem we calculated the median reaction time as (median reaction time) = (ln( )^( /β)) / k; then calculated the median reaction rate assuming that (median reaction rate) = ln( ) / (median reaction time) and used the median reaction rate as a measure when comparing the wild-type and variant rnaps (fig. , supplementary table ) . next, fitting the data to equation gives the km rather than the kd. however, it is rather certain that koff >> kcat for all 'dgmp incorporation reactions ( table , also see scenario below). if so, km approximately equals kd for each 'dgmp incorporation reaction. accordingly, we used km in place of kd for 'dgmp addition reactions when comparing substrates and rnaps (fig. ) . finally, we emphasize that the 'dgmp incorporation data by the wild-type and the β'r k rnaps were fit to both scheme and equation leading to affinities for 'dgtp that were indistinguishable within the margin of the experimental uncertainty (compare 'dgtp data in table and supplementary table ). the catalytic activity of the wild-type rnap towards 'dgtp inferred by fitting the data to equation was, as expected, in-between the catalytic activities of the fast and slow fraction inferred by fitting the data to scheme . accordingly, we argue that the employment of different analysis routines for gtp and 'dgtp is of little concern for the main inferences drawn in this study. we have previously shown that the nucleotide addition and the subsequent translocation along the dna by the wild-type e. coli rnap occur with similar rates at saturating concentrations of cognate ntps . 
as a result, (i) the translocation timetraces are delayed by a few milliseconds relative to the nucleotide addition timecurves and (ii) the translocation timetraces at saturating concentrations of cognate ntp substrates are not well described by a single exponential function because both nucleotide addition and translocation are partially rate limiting. in this study, translocation rates were tangential to the main line of investigation, but they were necessary parameters during the global fitting of the fluorescence timetraces and gmp incorporation timecurves to scheme . at the same time, the translocation rates are much faster than the 'dgmp incorporation rates and could be completely disregarded during the analysis of the 'dgtp concentration series by fitting the data to scheme or equation . note that the translocation rates reported in supplementary table should not be equated with the forward translocation rates. we modeled translocation as an irreversible transition in scheme . as a result, the inferred translocation rates are the rates of the system approaching the translocation equilibrium after the nucleotide incorporation rather than the forward translocation rates. somewhat counterintuitively, but in accordance with the rules of formal kinetics, the inferred equilibration rate equals the sum of the forward and the backward translocation rates. it was possible to further split the equilibration rate into the forward and backward translocation rates by assessing the completeness of the translocation, as we did in our previous studies. however, we refrained from doing so in this study because the translocation process was tangential to the main line of the investigation. [...] fig. a). as a result, both tec and tec ntp are detected as tec in the edta quenched samples because nearly % of the tec ntp is converted into tec after the addition of edta, and practically no ntp dissociates back into the solution (kcat >> koff).
the above situation corresponds to 'dgmp addition by the wild-type and variant rnaps. fitting the 'dgtp concentration series to a semi-empirical equation allowed the estimation of kcat and km 'dgtp ≈ kd 'dgtp for the wild-type, β'r k, β'm a, β'q m and β'n s rnaps table ). for the β'r k and the wild-type rnap we additionally measured the edta quench curve, fitted the data globally to scheme and inferred the lower bounds of kon and koff. in addition to kd (table , supplementary figs. c, a) . . as always, kcat and km can be inferred from the ntp concentration series, but neither kcat/km ≈ kon (as is in scenario ) nor km ≈ kd (as is in scenario ). in contrast, the global fit of the ntp concentration series and the edta quench data has the best resolving power in scenario : kcat, kon, koff, and ktra (in some cases) can be inferred from the data though the precision of the individual estimates varies greatly. the above situation corresponds to the gmp addition by the wild-type and variant rnaps (supplementary table , supplementary figs. b, ) and the 'dgmp addition by the wildtype rnap (table , supplementary fig. c) . only the wild-type rnap data allowed for precise estimates of all parameters of scheme . in the case of the β'r k and β'q m for the comparison of the rnap's capabilities to bind and utilize various substrates (fig. ) . handling of the slow fraction during fitting to scheme . the timecourses of the nmp incorporation by the wild-type e. coli tec typically display a distinctive slow phase that represents - % of the overall signal amplitude and features the rate of . - s - . in contrast, the major, fast phase of the reaction is approximately tenfold faster at saturating [ntp] ( - s - for gtp). the slow phase possibly represents an inactive tec in equilibrium with the active tec, a fraction of the tec that slowly reacts with the ntp substrate or a combination of both. 
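A biphasic timecourse of the kind described here can be pictured as the sum of two exponential phases. The sketch below shows only the simplest reading, in which a fast and a slow fraction react independently; the rates and the slow fraction are hypothetical placeholders chosen to differ about tenfold, not measured values.

```python
import math

def biphasic_product(t, k_fast, k_slow, slow_fraction):
    """Fraction of extended RNA at time t for two independent fractions."""
    fast = (1.0 - slow_fraction) * (1.0 - math.exp(-k_fast * t))
    slow = slow_fraction * (1.0 - math.exp(-k_slow * t))
    return fast + slow

# Placeholder parameters (hypothetical): rates in s^-1, slow phase as a
# fraction of the total signal amplitude.
K_FAST, K_SLOW, SLOW_FRACTION = 30.0, 3.0, 0.15

# Sample the timecourse on a logarithmic time grid from 1 ms to 10 s.
times = [10 ** (-3 + 0.25 * i) for i in range(17)]
curve = [biphasic_product(t, K_FAST, K_SLOW, SLOW_FRACTION) for t in times]
```

Plotting `curve` against `times` on a logarithmic time axis makes the two phases easy to see: most of the amplitude rises on the fast timescale, and the remainder follows roughly a decade later.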
during the fitting of the data using the kintek explorer software, the slow phase can be modeled in two ways (supplementary note fig. b). the first option is to invoke a reversible equilibrium between the active and inactive tec and to introduce a virtual equilibration step prior to mixing of the tec with the ntps. we term this approach the reversible inactivation model. the second option is to explicitly model the tec preparation as two fractions that do not interconvert but incorporate nmp at different rates. the fractions of the slow and fast tec are then allowed to vary as parameters during the fit. we term this approach the non-equilibrium heterogeneity model. the two models are largely indistinguishable if measurements are carried out at a single [ntp], and both models require two parameters to describe the slow phase: inactivation and recovery rates in the first case, and the slow fraction and its reaction rate in the second case (supplementary note fig. b). however, the response of the slow phase to a decrease in [ntp] differs between the two models. the reversible inactivation model predicts that the rate of the slow phase is independent of [ntp] and that the slow phase is largely abolished as [ntp] decreases. in contrast, the non-equilibrium heterogeneity model predicts that the rate of the slow phase decreases in unison with the rate of the fast phase as [ntp] decreases (both follow a hyperbolic dependence on [ntp]). in this study we analyzed all gmp and 'dgmp incorporation datasets using the non-equilibrium heterogeneity approach to model the slow phase, because some datasets (e.g. β'q m, supplementary fig. ) could not be adequately fit by the previously employed reversible inactivation model. fig. : kinetic analyses of the data. a simulation and graphic interpretation of the edta and hcl quench curves at saturating substrate concentrations and different values of k .
b simulation of concentration series of a off biphasic reaction using the reversible inactivation (left) and non-equilibrium catalytic heterogeneity (right) models. origin of life: the rna world the antiquity of rna-based evolution activated ribonucleotides undergo a sugar pucker switch upon binding to a single-stranded rna template watching dna polymerase η make a phosphodiester bond physiological concentrations of purines and pyrimidines abundant ribonucleotide incorporation into dna by yeast replicative polymerases basic mechanisms of transcript elongation and its regulation unlocking the sugar "steric gate" of dna polymerases a mutant t rna polymerase as a dna polymerase mechanism of ribose '-group discrimination by an rna polymerase the structural mechanism of translocation and helicase activity in t rna polymerase x-ray crystal structures elucidate the nucleotidyl transfer reaction of transcript initiation using two nucleotides choosing the right sugar: how polymerases select a nucleotide substrate crystal structures of open and closed forms of binary and ternary complexes of the large fragment of thermus aquaticus dna polymerase i: structural basis for nucleotide incorporation klentaq polymerase replicates unnatural base pairs by inducing a watson-crick geometry interactive d versions of the structural figures (webgl in browser): supplementary data : interactive fig. a, b supplementary data : interactive fig. c, d supplementary data : interactive fig. e, f supplementary data : interactive supplementary fig. c, d supplementary data : interactive supplementary fig. e, f supplementary data : interactive supplementary fig we thank irina artsimovitch for critically reading the manuscript, the staff at the macchess for support of crystallographic data collection, anssi m. malinen for constructing plasmids, matti turtola for his contribution to the development of the edta quench method. 
the reaction products were modeled as sums of independent contributions by the fast and slow fractions of rnap using numerical integration capabilities of the kintek explorer software. contributions of each fraction were modeled as scheme . upper and lower bounds of the parameters were calculated at a % increase in chi . key: cord- -klvx g authors: tayfuroglu, omer; yildiz, muslum; pearson, lee-wright; kocak, abdulkadir title: an accurate free energy method for solvation of organic compounds and binding to proteins date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: klvx g here, we introduce a new strategy to estimate free energies using single end-state molecular dynamics simulation trajectories. the method is adopted from ani- ccx neural network potentials (machine learning) for the atomic simulation environment (ase) and predicts the single point energies at the accuracy of ccsd(t)/cbs level for the entire configurational space that is sampled by molecular dynamics (md) simulations. our preliminary results show that the method can be as accurate as the bennett acceptance ratio (bar) at much reduced computational cost. not only does it enable the calculation of solvation free energies of small organic compounds, but it can also predict absolute and relative binding free energies in ligand-protein complex systems. rapid calculation also enables the screening of small organic molecules from databases as potent inhibitors of any drug target. the life cycle of the virus, like that of many other viruses, has an essential proteolytic auto-processing step. - the cl-protease plays an important role in freeing individual functional proteins from the single-chain polyprotein that has been translated by the host cell translation machinery. interfering with this step has been shown to be capable of inhibiting virus replication. therefore, cl-protease is one of the valid and most attractive drug targets for covid- treatment.
developing a drug molecule from scratch to cure a disease is expensive and time consuming. repurposing an fda approved drug is a much faster and less expensive strategy, since time is a critical parameter in controlling such invasive viruses. even testing all fda approved drugs experimentally one by one may waste an intolerable amount of time. computational calculations can help to ease the problem. there have already been several computational studies aimed at predicting efficient fda approved drugs that can be repurposed for covid- . in searching for drug candidates as inhibitors, there are two core questions to be addressed: (a) what is the binding mode (i.e. where on the enzyme does the inhibitor bind)? (b) what is the binding affinity (i.e. the binding free energy)? inhibition of an enzyme can proceed through various mechanisms. for instance, an inhibitor can bind to the active site of the enzyme in the absence of substrate (competitive inhibition) forming an additional or it can bind to an allosteric site in the presence of the substrate (i.e. to the enzyme-substrate complex; uncompetitive inhibition). alternatively, it can reduce the activity of the enzyme by binding to an allosteric site of either the free enzyme or the enzyme-substrate complex (non-competitive inhibition). in all these mechanisms, the equilibrium of the reaction between the enzyme and substrate, characterized by the constant k s, is shifted towards the reactants by the inhibitor present in the medium. kinetic studies of the loss of enzyme activity caused by an inhibitor report values such as the inhibition equilibrium constant, k i, or the half maximal inhibitory concentration, ic . these two experimental parameters are proportional to each other (they are equal in special cases).
the thermodynamic equilibrium constant k i is related to the standard binding gibbs free energy of the system by $\Delta G^{\circ} = RT \ln K_i$. the Δg o can also be calculated from the thermodynamic potentials. thus, it is possible to compare the experimental Δg o determined from k i with the theoretical Δg o predicted from thermodynamic potentials using computational methods. however, the logarithmic relation between k i and Δg o undermines the reliability of the calculations. an error of - kcal is acceptable for most computations, but it corresponds to an inhibitor that is thousands of times less efficient. therefore, most computational methods fail to reproduce the experimental binding affinity trend (relative binding free energies) among the inhibitors. reproducing the absolute standard free energies of binding is even more difficult for computational methods. moreover, a trade-off between computational accuracy and speed must be made. however, almost all of the studies screen databases listing the fda approved drugs at the molecular docking level, which is mostly based on geometrical alignment of a ligand into a binding pocket, namely the active site of c-like protease ( cl pro ). despite their speed, docking approaches are very coarse and almost never predict the correct experimental binding affinity trend among the inhibitors. more sophisticated methods to calculate the binding free energy of an inhibitor candidate to the protein range from post-molecular-dynamics methods such as molecular mechanics poisson-boltzmann surface area (mmpbsa) to perturbation methods such as bennett acceptance ratio (bar), - the latter being much more accurate. [ ] [ ] [ ] are also commonly used and proven to be sufficiently accurate in estimating solvation free energies along with relative and absolute binding free energies. however, all these methods have a very high computational cost.
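to make the logarithmic relation between k i and Δg o discussed above concrete, the following minimal sketch converts an inhibition constant into a standard binding free energy and converts a free energy error into the corresponding fold change in k i. it assumes the standard relation Δg o = rt ln k i with a 1 m standard state and a temperature of 298.15 k; the function names and the example error values are hypothetical and are not taken from the study.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal mol^-1 K^-1


def binding_free_energy_kcal(ki_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from Ki in mol/L (1 M standard state)."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(ki_molar)


def ki_fold_change(free_energy_error_kcal, temperature_k=298.15):
    """Factor by which Ki changes for a given error in the binding free energy."""
    return math.exp(free_energy_error_kcal / (GAS_CONSTANT_KCAL * temperature_k))


# Illustrative error values (kcal/mol), chosen only to show the exponential growth.
for error in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"error of {error:.1f} kcal/mol -> Ki off by a factor of {ki_fold_change(error):.0f}")
```

each additional rt ln 10 (about 1.36 kcal/mol at room temperature) of error shifts the predicted k i by another order of magnitude, which is why ranking inhibitors correctly is demanding.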
screening all the drug candidates using these methods may not be very practical. for a typical free energy of binding, these methods require at least λ-states for the decoupling of l from the pl complex and as many again for decoupling l from water (i.e. the solvation free energy of the ligand), which increases the computational cost -fold. machine learning (ml) techniques have attracted great interest over the past decade owing to their successful application to scientific questions including, but not limited to, chemical reactions, potential energy surfaces, forces, atomization energies, and protein-ligand complex scoring. one of the most promising aspects of ml techniques is that the trained models can be applied to new systems (transferable). recently several models using active learning such as ani- , ani- x and ani- cxx here, we introduce a new strategy to estimate free energies of solvation of small organic compounds and binding to proteins in explicit solvent using single end-state md simulations. the method is adopted from ani- ccx neural network potentials (machine learning) for the the insertion of the ligand into an environment of solvent (solvation free energy) or receptor (binding free energy) can be defined by a coupling parameter, λ. at each λ-state, the potential energy of the system would be: where v is the interaction energy between the ligand and environment. $U_0$ is the reference potential energy, which includes all energy terms except for the ligand-environment interaction. the partition function is: where f is the free energy depending on the λ-state. re-organizing the above equation, the equation above is only valid when the ligand-environment interaction is proportional to the coupling term λ. this results in: in the limiting case, this equation yields: the equation implies that the free energy change is nearly half the ligand-environment interaction energy in the complex configurational ensemble.
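the argument above can be sketched with the standard thermodynamic-integration and linear-response steps. this is one derivation consistent with the definitions given in the text (coupling parameter λ, interaction energy v, reference energy u 0, free energy f) and with the stated result that the free energy change is about half the interaction energy; the symbols β, Γ and Z, and the exact form of each intermediate equation, are assumptions rather than the authors' own expressions.

```latex
% potential energy at coupling parameter \lambda
U(\lambda) = U_0 + \lambda V
% partition function and free energy, with \beta = 1/(k_B T) and \Gamma the configuration space
Z(\lambda) = \int e^{-\beta U(\lambda)} \, d\Gamma = e^{-\beta F(\lambda)}
% differentiating F(\lambda) = -\beta^{-1} \ln Z(\lambda) with respect to \lambda
\frac{\partial F}{\partial \lambda}
  = \frac{\int V \, e^{-\beta U(\lambda)} \, d\Gamma}{\int e^{-\beta U(\lambda)} \, d\Gamma}
  = \langle V \rangle_{\lambda}
% assumption: the average interaction grows linearly with \lambda
\langle V \rangle_{\lambda} \approx \lambda \, \langle V \rangle_{\lambda = 1}
% integrating from the decoupled (\lambda = 0) to the fully coupled (\lambda = 1) state
\Delta F = \int_0^1 \langle V \rangle_{\lambda} \, d\lambda
  \approx \langle V \rangle_{\lambda = 1} \int_0^1 \lambda \, d\lambda
  = \tfrac{1}{2} \langle V \rangle_{\lambda = 1}
```

under the linearity assumption only the fully coupled end state needs to be sampled, which is what allows a single end-state simulation to replace a series of λ-states.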
for the binding studies, the protein-ligand complex (~ atoms) was placed in the center of a dodecahedron box. for the solvation studies, the ligand was placed in the center of a cubic box. each system was solvated in the tip p model type water energy minimization was carried out to a maximum kj.mol - .nm - force using the verlet cutoff scheme. for both long range electrostatic and van der waals interactions, a cutoff length of Å was used with the particle mesh ewald method (pme) ( th order interpolation). the neighbor list update frequency was set to ps - . as in our earlier studies, stepwise energy minimization and equilibration schemes were used. each minimization step consisted of a cycle of steepest descent and a subsequent cycles of l-bfgs integrators. after minimization, each system was equilibrated within three steps using langevin in order to test the ml-qm method, we ran -ns md simulations for molecules whose solvation free energies have been extensively studied. we first built a system in which the ligand is placed in the center of the box and the box is solvated with just a single water molecule. the ligands studied are given in figure . we prepared other independent simulations in which we increased the number of water molecules (nh o) in the box. each time the average ml-qm energies were almost . this has been the case until nh o= . figure a shows the interaction energy between the ligand and water molecules calculated as the ml-qm energy difference $\Delta E_{int} = E_{complex} - E_{ligand} - E_{water}$, where $E_{complex}$ is the total energy of the system including ligand and water, and $E_{ligand}$ and $E_{water}$ are the energies of the free ligand and free water molecules, respectively, which have been calculated by extracting them from the trajectory of the complex structure. when we kept adding more water to the system and running further md simulations, we found that starting from nh o= , the system passes from the unbound state to the bound state with a drastic energy jump after ~ . ns.
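the interaction energy described above (total energy of the ligand-water system minus the energies of the free ligand and free water extracted from the same frames) can be averaged over a trajectory as in the minimal sketch below. the per-frame energies would come from the ml-qm potential; here they are plain lists of hypothetical numbers. the function names are assumptions, and the scaling factor of one half is an assumption taken from the statement that the free energy change is nearly half the interaction energy.

```python
from statistics import mean


def interaction_energies(e_complex, e_ligand, e_water):
    """Per-frame interaction energy: E_complex - E_ligand - E_water."""
    if not (len(e_complex) == len(e_ligand) == len(e_water)):
        raise ValueError("all energy series must cover the same frames")
    return [ec - el - ew for ec, el, ew in zip(e_complex, e_ligand, e_water)]


def free_energy_estimate(e_interaction, alpha=0.5):
    """Scale the trajectory-averaged interaction energy by alpha (assumed 1/2)."""
    return alpha * mean(e_interaction)


# Hypothetical per-frame energies in kcal/mol, for illustration only.
e_complex = [-110.0, -112.0]
e_ligand = [-50.0, -50.0]
e_water = [-50.0, -52.0]

e_int = interaction_energies(e_complex, e_ligand, e_water)
print("interaction energies:", e_int)
print("free energy estimate:", free_energy_estimate(e_int))
```

extracting the free ligand and free water from the frames of the complex means all three energies are evaluated on the same geometries, so only one trajectory is needed.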
the degree of freedom for the water molecules in the bound state must be it is apparent that the ml-qm interaction energy is constant for the simulations with more than sufficient waters exist. the empirically determined linearity constant is α = / , thus the free energy change is approximately half of the interaction energy between the ligand and environment. however, considering the experimental conditions are more or less similar, the energy profile (i.e. the trend among the inhibitors) should at least be reproduced. here, we see that ml-qm is the most accurate method to yield the correct trend among the compared methods ( figure ). a novel coronavirus from patients with pneumonia in china coronavirus main proteinase ( cl pro ) structure: basis for design of anti-sars drugs virus-encoded proteinases and proteolytic processing in the nidovirales conservation of substrate specificities among coronavirus main proteases crystal structure of sars-cov- main protease provides a basis for design of improved α -ketoamide inhibitors in silico study the inhibition of angiotensin converting enzyme receptor of covid- by ammoides verticillata components harvested from western algeria state-of-the-art tools to identify druggable protein ligand of sars-cov- molecular docking analysis of n-substituted oseltamivir derivatives with the sars-cov- main protease drug repurposing for coronavirus (covid- ): in silico screening of known drugs against coronavirus cl hydrolase and protease enzymes reverse vaccinology approach to design a novel multi-epitope vaccine candidate against covid- : an in silico study repurposing of known anti-virals as potential inhibitors for sars-cov- main protease using molecular docking analysis covid- spike-host cell receptor grp binding site prediction virtual screening and repurposing of fda approved drugs against covid- main protease molecular docking analysis of selected natural products from plants for inhibition of sars-cov- main protease molecular 
docking analysis of withaferin a from withania somnifera with the glucose regulated protein (grp ) in comparison with the covid- main protease ligand binding free energy and kinetics calculation in an intuitive look at the relationship of k-i and ic : a more general use for the dixon plot the mm/pbsa and mm/gbsa methods to estimate ligand-binding affinities a. g_mmpbsa-a gromacs tool for high-throughput mm-pbsa calculations a comparative linear interaction energy and mm/pbsa study on sirt -ligand binding free energy calculation revisiting free energy calculations: a theoretical connection to mm/pbsa and direct calculation of the association free energy efficient estimation of free energy differences from monte carlo data efficiency of alchemical free energy simulations i: practical comparison of the exponential formula, thermodynamic integration and bennett's acceptance ratio method comparison of thermodynamic integration and bennett's acceptance ratio for calculating relative protein-ligand binding free energies bayesian estimation of free energies from equilibrium simulations unorthodox uses of bennett's acceptance ratio method multiscale free energy simulations: an efficient method for connecting classical md simulations to qm or qm/mm free energies using non-boltzmann bennett reweighting schemes on relation between the free-energy perturbation and bennett's acceptance ratio methods: tracing the influence of the energy gap comparison of efficiency and bias of free energies computed by exponential averaging, the bennett acceptance ratio, and thermodynamic integration binding thermodynamic cycles: hysteresis, the locally weighted histogram analysis method, and the overlapping states matrix calculations of solvation free energy through energy reweighting from molecular mechanics to quantum mechanics convergence of single-step free energy perturbation absolute hydration free energies of blocked amino acids: implications for protein solvation and stability bruckner, 
s.; boresch, s. efficiency of alchemical free energy simulations ii: improvements for thermodynamic integration qm/mm free-energy perturbation compared to thermodynamic integration and umbrella sampling: application to an enzymatic reaction separation-shifted scaling, a new scaling method for lennard-jones interactions in thermodynamic integration efficient determination of protein-protein standard binding free energies from first principles free-energy cost for translocon-assisted insertion of membrane proteins scalable molecular dynamics with namd efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the predicting reaction performance in c-n cross-coupling using machine learning the tensormol- . model chemistry: a neural network augmented with long-range physics machine learning of accurate energy-conserving molecular force fields quantumchemical insights from deep tensor neural networks first principles neural network potentials for reactive simulations of large molecular and condensed systems molecular dynamics with on-the-fly machine learning of quantum-mechanical forces accurate interatomic force fields via machine learning with covariant kernels energy-free machine learning force field for aluminum fast and accurate modeling of molecular atomization energies with machine learning prediction errors of molecular machine learning models lower than hybrid dft error protein-ligand scoring with convolutional neural networks ani- , a data set of million calculated offequilibrium conformations for organic molecules less is more: sampling chemical space with active learning linear interaction energy (lie) models for ligand binding in implicit solvent: theory and application to the binding of nnrtis to hiv- reverse transcriptase gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers improved side-chain torsion potentials for the amber ff sb protein force field 
molecular-dynamics study of atomic motions in water computational insights into the protonation states of catalytic dyad in bace -acyl guanidine based inhibitor complex docking, molecular dynamics and free energy studies on aspartoacylase mutations involved in canavan disease electrostatics of nanosystems: application to microtubules and the ribosome key: cord- -ykyr gvl authors: ivanov, mark v.; bubis, julia a.; gorshkov, vladimir; abdrakhimov, daniil a.; kjeldsen, frank; gorshkov, mikhail v. title: boosting the ms -only proteomics with machine learning allows protein identifications in -minute proteome analysis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ykyr gvl proteome-wide analyses most often rely on tandem mass spectrometry imposing considerable instrumental time consumption that is one of the main obstacles in a broader acceptance of proteomics in biomedical and clinical research. recently, we presented a fast proteomic method termed directms based on ms -only mass spectra acquisition and data processing. the method allowed significant squeezing of the proteome-wide analysis to a few minute time frame at the depth of quantitative proteome coverage of proteins at % fdr. in this work, to further increase the capabilities of the directms method, we explored the opportunities presented by the recent progress in the machine learning area and applied the lightgbm tree-based learning algorithm into the scoring of peptide-feature matches when processing ms spectra. further, we integrated the peptide feature identification algorithm of directms with the recently introduced peptide retention time prediction utility, deeplc. additional approaches to improve performance of the directms method are discussed and demonstrated, such as faims coupled to the orbitrap mass analyzer. as a result of all improvements to directms , we succeeded in identifying more than proteins at % fdr from the hela cell line in a minute lc-ms analysis. 
high throughput and sensitive analytical approaches enabling proteome-wide measurements of large cohorts of samples will open the way for application of mass spectrometry-based proteomics in clinical trials, population proteomics, as well as in emerging areas of drug-to-proteome interactions and metaproteome characterization. moreover, the needed throughput and depth of proteome coverage in these studies have to be accompanied by protein quantitation that is consistent across the sample cohorts, which is necessary for these approaches to be useful in personalized medicine studies. a recent example using a large cohort of covid- patients clearly demonstrated the need for ultra-high-throughput proteomics to generate hypotheses about therapeutic targets and aid classification or diagnostic decision making in clinical environments. typical ms/ms-based proteomic methods consume a great deal of instrument time, which is partially overcome by expensive multiplexing using isotopic labeling strategies, such as tandem mass tag (tmt). nevertheless, a number of recent studies employing bottom-up proteomic approaches are increasingly focused on raising the throughput of proteome-wide analysis. these studies cover all steps of the proteome characterization workflow, such as increasing the speed of sample preparation prior to ms-based analysis, squeezing data acquisition time to a range of a few minutes, and increasing the throughput of data processing. one of the most time-consuming parts of the analysis is ms/ms data acquisition, which requires long hplc gradient times to leave enough room for sequential isolation of as many precursor ions as possible from each ms spectrum for subsequent fragmentation.
a number of methods and approaches to ms/ms-free proteome analysis have been proposed and widely explored, from the early accurate mass and time (amt) tag method to recent truly ms/ms-free realizations (i.e., without employing tandem mass spectrometry at any step in the workflow). these methods rely heavily on the accuracies of both peptide m/z measurements and retention time (rt) predictions. the latter is especially important for the ms/ms-free strategy as only retention times contain sequence-specific information in a (m/z, rt) space. with advances in the development of machine learning algorithms, a variety of highly accurate rt prediction models are becoming increasingly available. the latter work is particularly interesting as it shows that the performance of the deeplc model is comparable with that of other deep learning-based alternatives, yet it generalizes better between different chromatography setups. this feature is particularly useful for ms/ms-free proteomic approaches such as directms , when the number of peptides available for rt prediction model training is typically small. therefore, the use of a pretrained deeplc model for rapid adaptation to a dataset obtained under particular separation conditions becomes advantageous. previously, we described the directms method, in which proteolytic peptide mixture analysis is performed in an ms/ms-free mode of acquisition using high resolution mass spectrometry. because the method does not employ isolation and fragmentation steps, the time for peptide separation can be reduced significantly to a few minutes, and the number of ms spectra available for processing is limited only by the acquisition rate of the mass analyzer operating at high mass resolution. for the first time, directms demonstrated the capability to identify up to proteins from a hela cell line using only a -minute hplc gradient.
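the dependence of ms/ms-free identification on m/z accuracy and retention time prediction, noted above, can be illustrated by a match test in (m/z, rt) space. the sketch below is a generic illustration and not the directms scoring code: the function names, the tolerances and the example values are all hypothetical. the only formula used is the usual definition of mass error in parts per million.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def matches_feature(theoretical_mz, predicted_rt, feature_mz, feature_rt,
                    ppm_tolerance, rt_tolerance):
    """True if a detected feature agrees with a candidate peptide in m/z and RT."""
    mass_ok = abs(ppm_error(feature_mz, theoretical_mz)) <= ppm_tolerance
    rt_ok = abs(feature_rt - predicted_rt) <= rt_tolerance
    return mass_ok and rt_ok


# Hypothetical candidate peptide and detected feature (RT in minutes).
print(ppm_error(500.0005, 500.0))
print(matches_feature(500.0, 2.40, 500.0005, 2.55, ppm_tolerance=5.0, rt_tolerance=0.3))
print(matches_feature(500.0, 2.40, 500.0005, 3.90, ppm_tolerance=5.0, rt_tolerance=0.3))
```

many peptides share nearly the same m/z within a few ppm, so the rt window is what removes most of the false candidates; the tighter the rt prediction, the narrower that window can be.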
moreover, the average sequence coverage for each identified protein in this method exceeded that of a standard ms/ms-based approach (even when a long gradient is used) by almost an order of magnitude, thus significantly improving the quantitation. on a proteome-wide scale this kind of analysis efficiency was not considered feasible before because of the complexity of whole-proteome samples, a lack of sequence-specific information in the measured m/z values of peptide ions, and the low accuracy of existing phenomenological retention time prediction models. however, advances in high resolution mass spectrometry technology and machine learning- in this work, we integrated two novel machine learning algorithms into the data processing workflow of directms to further improve its efficiency. these algorithms include deeplc as a retention time prediction model used in the method's peptide identification and the gradient boosting machine, lightgbm, for scoring peptide-feature matches. also, the method was upgraded and tested for processing ms -only data obtained using high resolution mass spectrometry and high-field asymmetric waveform ion mobility spectrometry, faims, which is increasingly applied in proteomic research as it provides an additional separation dimension for peptides at the front end of a mass spectrometer. parameters for the search were as follows: minimum scans for detected peptide isotopic cluster; minimum one visible c isotope; charges from + to +, no missed cleavage sites, and ppm initial mass accuracy. all searches were performed against the swiss-prot human concatenated database containing protein sequences and their decoys unless otherwise stated. results were filtered to % protein level false discovery rate (fdr) using the target-decoy approach with its "picked" modification and "+ " correction. data availability.
the datasets generated and analyzed during the current study have been deposited to the proteomexchange consortium via the pride partner repository with the dataset identifier pxd . the workflow. figure shows details of directms method. the method starts with acquiring high resolution peptide ion mass spectra. a mass spectrometer operates in ms -only mode and simply collects spectra for eluting peptides at the speed determined by the agc and mass resolution settings. the total number of ms spectra acquired during -min lc gradient at the mass resolution of , at m/z and agc of * ranges from , to , depending on the mass spectrometer model. false discovery rate analysis at peptide level is shown in figure c . from which we found that . % of e. coli proteins at % fdr in the identification results as expected (e. coli and human databases contain shows that directms provides an accurate threshold for filtering protein identifications. a novel lc system embeds analytes in pre-formed gradients for rapid mass spectrometry for translational proteomics: progress and clinical implications mass spectrometry applied to bottom-up proteomics: entering the high-throughput era for hypothesis testing data, reagents, assays and merits of proteomics for sars-cov- research and testing ultra-high-throughput clinical proteomics reveals classifiers of covid- infection tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by ms/ms sample clean-up strategies for esi mass spectrometry applications in bottom-up proteomics: trends from modified filter-aided sample preparation (fasp) method increases peptide and protein identifications for shotgun proteomics an ultrafast sample-preparation approach for shotgun proteomics comparison of in-solution, fasp, and s-trap based digestion methods for bottom-up proteomic studies sample preparation by easy extraction and digestion (speed) -a universal, rapid, and detergent-free protocol for proteomics based 
on acid extraction online parallel accumulation-serial fragmentation (pasef) with a novel trapped ion mobility mass spectrometer evosep one enables robust deep proteome coverage using tandem mass tags while significantly reducing instrument time a compact quadrupole-orbitrap mass spectrometer with faims interface improves proteome coverage in short lc gradients faster sequest searching for peptide identification from tandem mass spectra a full open modification search method performing all-to-all spectra comparisons within minutes ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics fast open modification spectral library searching through approximate nearest neighbor indexing sw-tandem: a highly efficient tool for large-scale peptide identification with parallel spectrum dot product on sunway taihulight fast quantitative analysis of timstof pasef data with msfragger and ionquant proteomic analyses using an accurate mass and time tag strategy accurate mass measurements in proteomics influence of mass resolution on species matching in accurate mass and retention time (amt) tag proteomics experiments advances in proteomics data analysis and display using an accurate mass and time tag approach identification of phosphorylated human peptides by accurate mass measurement alone protein identification in complex mixtures using multiple enzymes with complementary specificity directms : ms/ms-free identification of proteins of cellular proteomes in minutes predicting peptide retention times for proteomics predictive chromatography of peptides and proteins as a complementary tool for training, selection, and robust calibration of retention time models for targeted proteomics improved peptide retention time prediction in liquid chromatography through deep learning prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning deeplc can predict retention times for peptides that carry as-yet unseen modifications lightgbm: a 
highly efficient gradient boosting decision tree a new method of separation of multi-atomic ions by mobility at atmospheric pressure using a high-frequency amplitude-asymmetric strong electric field high-field asymmetric waveform ion mobility spectrometry for mass spectrometry-based proteomics enhancement of mass spectrometry performance for proteomic analyses using high-field asymmetric waveform ion mobility spectrometry (faims) the pride database and related tools and resources in : improving support for quantification data open source software for rapid proteomics tools development dinosaur: a refined open-source peptide ms feature detector target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry a scalable approach for protein false discovery rate estimation in large proteomic data sets unbiased false discovery rate estimation for shotgun proteomics based on the target-decoy approach prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition lc-ms alignment in theory and practice: a comprehensive algorithmic review key: cord- -rzy mejb authors: duricki, denise a.; drndarski, svetlana; bernanos, michel; wood, tobias; bosch, karen; chen, qin; shine, h. david; simmons, camilla; williams, steven c.r.; mcmahon, stephen b.; begley, david j.; cash, diana; moon, lawrence d.f. title: corticospinal neuroplasticity and sensorimotor recovery in rats treated by infusion of neurotrophin- into disabled forelimb muscles started h after stroke date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: rzy mejb stroke often leads to arm disability and reduced responsiveness to stimuli on the other side of the body. neurotrophin- (nt ) is made by skeletal muscle during infancy but levels drop postnatally and into adulthood. it is essential for the survival and wiring-up of sensory afferents from muscle. 
we have previously shown that gene therapy delivery of human nt into the affected triceps brachii forelimb muscle improves sensorimotor recovery after ischemic stroke in adult and elderly rats. here, to move this therapy one step nearer to the clinic, we set out to test the hypothesis that intramuscular infusion of nt protein could improve sensorimotor recovery after ischemic cortical stroke in adult rats. to simulate a clinically-feasible time-to-treat, twenty-four hours later rats were randomized to receive nt or vehicle by infusion into triceps brachii for four weeks using implanted minipumps. nt increased the accuracy of forelimb placement during walking on a horizontal ladder and increased use of the affected arm for lateral support during rearing. nt also reversed sensory deficits on the affected forearm. there was no evidence of forepaw sensitivity to cold stimuli after stroke or nt treatment. mri confirmed that treatment did not induce neuroprotection. functional mri during low threshold electrical stimulation of the affected forearm showed an increase in peri-infarct bold signal with time in both stroke groups and indicated that neurotrophin- did not further increase peri-infarct bold signal. rather, nt induced spinal neuroplasticity including sprouting of the spared corticospinal and serotonergic pathways. neurophysiology showed that nt treatment increased functional connectivity between the corticospinal tracts and spinal circuits controlling muscles on the treated side. after intravenous injection, radiolabelled nt crossed from bloodstream into the brain and spinal cord in adult mice with or without strokes. our results show that delayed, peripheral infusion of neurotrophin- can improve sensorimotor function after ischemic stroke. phase i and ii clinical trials of nt (for constipation and neuropathy) have shown that peripheral, high doses are safe and well tolerated, which paves the way for nt as a therapy for stroke. 
ischemic stroke occurs in the brain when blood flow is restricted, causing brain cells to die rapidly. movements on the opposite side of the body are frequently affected . stroke victims also often exhibit lack of responsiveness to stimuli on their affected side. the w.h.o. estimates that, worldwide, there are million stroke survivors, with another million new strokes annually. the vast majority of stroke victims are not eligible for the few therapies that improve outcome because they arrive in hospital too late for reperfusion to be effective . treatment six hours or more after ischemic stroke is usually limited to rehabilitation: therapies that reverse sensory impairments and locomotor disability are urgently needed, and these must work when initiated many hours after stroke. neurotrophin- is a growth factor which plays a key role in the development, and function of locomotor circuits that express nt receptors, including descending serotonergic and corticospinal tract (cst) axons and afferents from muscle and skin that mediate proprioception and tactile sensation [ ] [ ] [ ] . however, peripheral levels of nt drop in the postnatal period . we and others had shown that delivery of nt into the cns promotes recovery in rodent models of spinal cord injury [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] but this involved invasive routes of delivery (e.g., intraspinal injection or intrathecal infusion) or gene therapy. we also recently showed that injection of an adeno-associated viral vector (aav) encoding full-length human nt (prepront , kda) into forelimb muscles hours after stroke in adult or elderly rats improved sensorimotor recovery . we had originally expected that aav would be trafficked from muscle to the spinal cord retrogradely in axons and that this would enhance secretion of nt by motor neurons, leading to sprouting of the spared cst , , and sensorimotor recovery. 
although nt protein was overexpressed in injected muscles, to our surprise we found little evidence for expression of the human nt transgene in the spinal cord or cervical drgs , using this dose and preparation of aav. this serendipitous result led us to reject our original assumption that sensorimotor recovery required expression of the human nt transgene in the spinal cord and to wonder whether peripheral infusion of nt protein would suffice. accordingly, here, we test the hypothesis that infusion of the mature form of the nt protein ( kda) into disabled forelimb muscles improves sensorimotor recovery. this is consistent with work by others including a study showing that a signal from muscle spindles can improve neuroplasticity of descending pathways and can enhance recovery after cns injury . notably, nt protein is synthesised by muscle spindles and can be transported from muscle to sensory ganglia and spinal motor neurons in nerves , , and from the bloodstream to the cns [ ] [ ] [ ] . this route of administration and time frame is clinically feasible so to take this potential therapy one step nearer the clinic, we next set out to determine whether intramuscular infusion of human nt protein (mature form, kda) would improve outcome after stroke (i.e., bypassing the use of gene therapy and spinal surgery). importantly, the mature form of the nt protein has excellent translational potential: phase i and ii clinical trials have shown that repeated, systemic, high doses of nt protein are well-tolerated, safe and effective in more than humans with sensory and motor neuropathy (charcot-marie-tooth type a) or constipation including in people with spinal cord injury [ ] [ ] [ ] [ ] [ ] . in contrast to other neurotrophins, nt does not cause any serious adverse effects such as pain probably because its principal high affinity receptor trkc is not expressed on adult nociceptors , . these studies pave the way for nt as a therapy for stroke in humans. 
we now show in a blinded, randomized preclinical trial that treatment of disabled upper arm muscles with human nt protein reverses sensory and motor disability in rats when treatment is initiated in a clinically-feasible timeframe ( hours after stroke).
rats received unilateral focal cortical stroke or underwent sham surgery (fig. a,b). twenty-four hours after stroke, rats were allocated to treatment using nt or vehicle, infused into affected triceps brachii muscles for one month via implanted catheters and subcutaneous osmotic minipumps. experiments were performed in accordance with guidelines from the stroke therapy academic industry roundtable (stair) and others and our findings were reported in accordance with the arrive (animals in research: reporting in vivo experiments) guidelines. all surgical procedures, behavioural testing and analysis were performed using a randomised block design and with investigators blinded to treatment groups. rats were randomised to surgery by drawing a rat identity number from an envelope and then a stroke/sham allocation from an envelope. allocation concealment was performed by having nt and vehicle stocks coded by an independent person prior to loading pumps. behavioural testing was conducted blind and codes were only broken after behavioural analyses were complete.
lister hooded (~ months; - g) outbred female rats (charles river, uk) and adult c bl/ mice ( - weeks) were used. all procedures were in accordance with the uk home office guidelines and animals (scientific procedures) act of . rats were maintained (specific pathogen free) in groups of to in plexiglas housing with tunnels and bedding on a : hour light/dark cycle with food and water ad libitum. focal ischemic stroke was induced in the hemisphere representing the dominant forelimb (supplementary fig. ), as determined by the cylinder behavioural test.
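The randomised block allocation described above can be illustrated with a short sketch. This is not the authors' procedure (they drew identities and allocations from envelopes); it only shows the general idea that each block contains every coded treatment once, in shuffled order, so that group sizes stay balanced. The function name, the animal identifiers and the stock codes "A" and "B" are hypothetical.

```python
import random


def randomised_block_allocation(animal_ids, coded_treatments, seed=None):
    """Allocate animals to coded treatments in consecutive blocks.

    Each block holds every treatment code exactly once, in shuffled order,
    which keeps group sizes balanced as animals enter the study.
    """
    rng = random.Random(seed)
    block_size = len(coded_treatments)
    allocation = {}
    for start in range(0, len(animal_ids), block_size):
        block = list(coded_treatments)
        rng.shuffle(block)
        # zip() stops early if the final block is incomplete
        for animal, code in zip(animal_ids[start:start + block_size], block):
            allocation[animal] = code
    return allocation


# Hypothetical example: eight rats, two stocks coded by an independent person
rats = [f"rat{i:02d}" for i in range(1, 9)]
alloc = randomised_block_allocation(rats, ["A", "B"], seed=1)
```

Because the codes stay opaque until the analysis is complete, the person running behavioural tests sees only "A" or "B", which is consistent with the blinding described in the text.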
stroke lesions (n = ) were performed as previously described . briefly, animals were transferred to a stereotaxic frame (david kopf instruments, usa) where a midline incision was made, the cortex was then exposed via craniotomy using the following co-ordinates [defined as anterioposterior (ap), mediolateral (ml)]: ap mm to − mm, ml mm to mm, relative to bregma. endothelin- (et- , pmol/µl in sterile saline; calbiochem) was applied using a glass micropipette attached to a hamilton syringe. µl of et- to was applied to the overlying dura to reduce bleeding, and immediately thereafter, the dura mater was incised and reflected. four μl volumes of et- were administered topically and four µl volumes were microinjected intracortically (at a depth of mm from the brain surface) at the following co-ordinates (from bregma and midline, respectively): ap + . mm, ml . mm ap + . mm, ml . mm ap + . mm, ml . mm ap - . mm, ml . mm temperature was maintained using a rectal probe connected to a homeothermic blanket (harvard apparatus, usa) placed under the animal which maintained rectal temperature at ± °c. prior to suturing, the animal was left undisturbed for minutes. modifying previous work , the skull fragment was then replaced and sealed using bone wax (covidien, uk). % ( / ) of rats survived this stroke surgery. sham-operated (n = ) rats received all procedures up to, but not including, craniotomy or endothelin- injection. animals were given buprenorphine ( . mg/kg, subcutaneously) for postoperative pain relief. 
our method of inducing stroke with et- is advantageous for evaluating regenerative stroke therapies for four reasons: ) our model produces ischemic lesions that model small focal human strokes rather than larger "malignant" strokes that tend to be fatal in humans ; ) our model targets sensorimotor cortex; ) our stroke model involves only low mortality rates and has reasonable reproducibility with a proven ability to detect therapies that induce neuroplasticity and functional recovery , ; ) our model causes sustained sensorimotor deficits (e.g., impaired use of limbs) which are common neurological symptoms of human stroke. structural images were obtained prior to stroke, hours after stroke and at one and eight weeks after stroke. mr imaging was conducted on a tesla (t) horizontal bore vmris scanner (varian, palo alto, ca, usa). animals were anaesthetised using . % isoflurane, in . l/min medical air and . l/min medical oxygen in an induction chamber. once anaesthetised they were secured in a stereotaxic head frame inside the quadrature birdcage mr coil ( mm internal diameter, id) and placed into the scanner. each animal's physiology was supervised throughout the procedure using a respiration monitor (biopac, usa) and a pulse oximetry sensor (nonin, usa) that interfaced with a pc running biopac. additionally, an mri compatible homeothermic blanket (harvard apparatus, usa) placed over the animal responded to any alterations in the body temperature identified by the rectal probe, and maintained temperature at ± °c. the t weighted mr images were acquired using a fast spin-echo sequence: effective echo time (te) ms, repetition time (tr) ms, field of view (fov) mm x mm and an acquisition matrix x , acquiring x mm thick slices in approximately minutes. at the end of the study (to avoid affecting blinding or randomization), lesion volumes at the hour time point were measured using a semi-automatic contour method in jim software under blinded conditions (xinapse systems ltd). 
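As an illustration of how a contour-based lesion volume is commonly derived, the sketch below sums the outlined lesion area on each slice and multiplies by the slice thickness. This is an assumption about the general approach, not a description of how the jim software computes volumes; the function name and the example values are hypothetical.

```python
def lesion_volume_mm3(slice_areas_mm2, slice_thickness_mm):
    """Estimate lesion volume from per-slice contour areas.

    Assumes contiguous slices of equal thickness with no inter-slice gap.
    """
    if slice_thickness_mm <= 0:
        raise ValueError("slice thickness must be positive")
    return sum(slice_areas_mm2) * slice_thickness_mm


# Hypothetical contour areas (mm^2) on four consecutive slices
example_volume = lesion_volume_mm3([0.0, 4.5, 6.0, 3.5], slice_thickness_mm=1.0)
```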
functional magnetic resonance imaging (fmri) was performed in a subset of rats that did not receive intracortical injections of bda tracer (n = /group). images were acquired prior to stroke and at one and eight weeks after stroke during non-noxious somatosensory stimulation of the affected or less-affected wrist. this involved delivery of small electrical currents to a wrist whilst the subjects were kept anaesthetised using medium dose alpha-chloralose suitable for recovery and longitudinal (repeated) imaging. alpha chloralose anesthesia was prepared by mixing equal amounts of borax decahydrate (sigma) and alpha chloralose-pestanal (analytical standard, < % beta isoform, , sigma, uk) in physiological saline each at a concentration of mg/ml in a glass beaker at °c prior to filtering using a . µm filter. rats were first anaesthetised using % isoflurane, . l/min medical air and . l/min oxygen. a tail cannulation was performed and the animal was transferred to the mri machine. a bolus of mg/kg alpha chloralose-pestanal was injected intravenously and then the isoflurane was switched off after minutes. an infusion line was then attached to the cannula for continuous delivery of alpha chloralose at mg/kg/h over the experimental time. medical air ( . l/min) and oxygen ( . l/min) were continuously delivered throughout the scanning period. mr images were obtained using a tesla scanner (varian, agilent). initially, a t -weighted structural scan was acquired, using a fast spin echo (fse) sequence with repetition time (tr) = ms, echo time (te) = ms and field of view (fov) of x mm, yielding slices with voxel size of . x . x mm in approximately min. fmri scans were acquired using a multi-gradient-multi-echo sequence (tr = ms, tes = , , ms, voxel size . mm x . mm x mm, resolution x x , scan time s).
volumes were acquired with a pseudo-random on-off stimulation of the forepaw at hz ( μs, ma pulse) using a platinum subdermal needle electrode (f-e , grass technologies, usa) and a tens (transcutaneous electrical nerve stimulation) pad. it has previously been shown that the use of a tens pad results in this intensity of stimulation being innocuous rather than noxious: whereas blood pressure is not altered at ma stimulation, it is increased with . ma stimulation (see references in ). the order of paw stimulation was also randomised. animals were closely monitored following the end of the scanning, given ml of saline (subcutaneous, room temperature) and were kept in a warmed incubator ( °c) until fully conscious: this took to hours due to the slow pharmacokinetics of alpha chloralose. two animals died following alpha chloralose anaesthesia due to breathing difficulties in recovery. scans with obvious imaging artefacts were discarded, leaving final group numbers of n= , , and n= , , at weeks , and for nt and vehicle treated groups respectively.
the resulting images were analyzed with spm- (statistical parametric mapping, fil, ucl). in order to make sure all lesions were on the same side of the brain, images with right-hand side strokes were rotated about the sagittal mid-plane, so that the lesioned hemisphere always appeared on the left. functional scans were initially realigned to the first image in the timeseries in order to correct for movements of the head. the first volume of the functional scan was then spatially registered to the structural image, which was, in turn, linearly warped to a template brain. linear warping was used in this step in order to avoid deforming the lesion region. warping parameters obtained during registration of structural image to template were applied to the realigned functional time-series, resulting in structural and functional images that are all in a standard space.
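The hemisphere-alignment step described above (placing every lesion on the left) amounts to mirroring the image across the sagittal mid-plane for animals with right-sided strokes. The sketch below is a minimal illustration on nested lists and is not the spm routine the authors used. It assumes, hypothetically, that a volume is indexed as volume[z][y][x] with x running along the left-right axis.

```python
def flip_left_right(volume):
    """Mirror a volume[z][y][x] across the sagittal mid-plane (reverse x)."""
    return [[row[::-1] for row in plane] for plane in volume]


def lesion_to_left(volume, lesion_side):
    """Return a volume in which the lesioned hemisphere is on the left."""
    if lesion_side not in ("left", "right"):
        raise ValueError("lesion_side must be 'left' or 'right'")
    return flip_left_right(volume) if lesion_side == "right" else volume


# Hypothetical one-slice volume with two rows of three voxels
toy_volume = [[[1, 2, 3], [4, 5, 6]]]
aligned = lesion_to_left(toy_volume, "right")
```

Applying the same flip to the structural and functional images of an animal keeps them consistent before registration to the template.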
finally, functional images were smoothed using a gaussian kernel with full width half maximum of . x . x mm (twice the voxel size). because of the relatively long effective tr of the functional images, a pet basic model (one-sample t-test) was used for first-level analyses with covariates consisting of the pseudo-random stimulation pattern (paradigm), and the estimated movement parameters of each individual rat. volume signal intensity was globally scaled and individual masks, generated from the fast spin echo (fse) structural scan for each rat brain at each time-point using a d pulse-coupled neural network (also registered to the template), were used as explicit masks for the first-level statistical analysis. contrast images from the first-level analysis were then carried forward to a second-level (random effects) group analysis. effects of group (i.e. nt or vehicle treated) and stimulated paw (i.e. affected or less-affected) were used to create statistical comparisons. a flexible factorial analysis was used to compare the difference between the nt and vehicle treated groups in the change from baseline to weeks . statistical parametric maps were generated using an uncorrected threshold of p < . ; images show group mean activations and t values are given.
hours after stroke (immediately after mri), rats were allocated to treatment using a randomized block design. allocation concealment was performed by having nt and vehicle stocks coded by a third party. rats were anaesthetised as above and a small incision was made between elbow and axilla, and a small subcutaneous space was formed to the lower back. the osmotic pump with the catheter attached was positioned in this subcutaneous space and then an ultrafine, flexible catheter was implanted approximately millimetres into the proximal end of the long head of the triceps brachii muscle on the disabled side and this was sutured in place (prolene / , ethicon, uk).
triceps brachii was selected as the site for infusion because this large muscle is involved in forelimb extension during walking and in postural support during rearing. note that in the triceps brachii, the end plates are located in the belly of the muscle. the catheter was made from stretchable and flexible silicone tubing (id = . inches, wall thickness = . inches, trelleberg, sf medical, uk, sfm - ) attached to the osmotic pump via larger glued-on tubing (id . mm, vwr international). a second section of this stiff tubing ( millimetres long) was inserted to guide the flexible catheter into the triceps; the guide was slid back after the silicone tube was implanted. catheters were connected to subcutaneous osmotic minipumps ( ml , alzet) containing either vehicle ( . % saline containing . % bovine serum albumin; sigma; a ; - - ) or vehicle containing recombinant human nt (kind gift of genentech inc., usa). pump flow moderators were mr-compatible (peek micro medical tubing, durect, # ). the original vials contained . mg/ml recombinant human ("rhu") nt in mm acetate, mm nacl, ph . . sds page and proteomic analysis indicate that this is the amino acid mature nt protein obtained after proteolytic cleavage of the amino acid proneurotrophin- (uniprot p ) (supplementary figures and ). the nt dose ( µg/ hours) was selected based on previous experiments. pumps were replaced after two weeks and removed after four weeks. skin was sutured and analgesic administered as above. all rats survived this surgery. sham rats did not undergo this surgery. pilot experiments showed that the pump flow rate ( µl/hour) was sufficient to deliver substances ( . % saline containing % fast green, sigma) to the entire volume of the triceps muscles.
rats (n = /group) were terminally anesthetised ( weeks following stroke, before pumps were removed) and triceps brachii and c spinal cord were rapidly dissected and snap frozen in liquid nitrogen prior to storage at - °c.
tissue was homogenised in ice cold lysis buffer containing mm nacl, mm tris-hcl (ph . ), % np , % glycerol, mm phenylmethanesulfonylfluoride, μg/ml aprotinin, μg/ml leupeptin and . mm sodium vanadate, using approximately times the volume of buffer to the wet weight of tissue ( µl/mg tissue). protein content was measured and nt elisa was carried out according to manufacturer's instructions (emax, promega). n.b., promega nt elisa kits are no longer available. however, it was recently discovered that when performing elisa using other nt kits (r&d using "reagent diluent" and abcam using diluent a), measurements of nt from skeletal muscle lysates do not provide reliable quantitative data. this is due to so-called "matrix effects" as shown by poor recovery of spiked-in nt (< % or > %) and non-linear relationship between concentration of input material and estimated nt concentration based on a dilution series of muscle homogenate. to overcome the effect of interfering substances, samples should be diluted and appropriate diluents to prepare standards and diluted samples must be used. the abcam kit used with diluent b (not a), however, provided reliable quantitative results (dr. aline barroso spejo, unpublished results). we assessed sensory and motor deficits after stroke using the cylinder test (to assess postural support by forelimbs during rearing), adhesive patch test (to assess responsiveness to tactile stimuli), horizontal ladder (to assess forelimb and hindlimb skilled locomotion), a grip strength test and a test used to monitor unusual responses to cold stimuli (cold allodynia) , . all behavioural testing was carried out by an experimenter blinded to surgery and treatment groups. rats were handled and trained for three weeks on the horizontal ladder before the study began. preoperative baseline scores for the horizontal ladder, the vertical cylinder and the grip strength test were collected one week before surgery. 
the "adhesive patch" test was used to measure ) the time taken to contact stimuli on the wrists, ) the time taken to remove stimuli from the wrists, and ) the magnitude of lack of responsiveness to stimuli on the affected wrist. for each trial, a round adhesive patch ( mm diameter, ryman) was applied to each wrist on the dorsal side and the animal was returned to its home cage. two times were recorded for both forepaws: ( ) contact and ( ) remove; where "contact" represents the time taken for the animal to contact an adhesive patch with its mouth, and "remove" represents the time taken for the animal to remove the first adhesive patch from its wrist. to determine whether the rats preferentially removed a sticker from their less-affected wrist before their more-affected wrist, the order and side of label removal were recorded. this was repeated four times per session until a > % preference had been found; if this was not the case a fifth trial was conducted. the magnitude of asymmetry was established using the seven levels of stimulus pairs on both wrists as previously described (figure a). from trial to trial, the size of the stimulus was progressively increased on the affected wrist and decreased on the less affected wrist by an equal amount ( . mm ), until the rat removed the stimulus on the affected wrist first (reversal of original bias). the higher the score, the greater the degree of somatosensory impairment.
walking: to assess impairments in forelimb and hindlimb function after stroke, rats were videotaped weekly, times per session, as they crossed a horizontal ladder ( m) with irregularly spaced rungs ( to cm; spacing changed weekly). any slight paw slips, deep paw slips and complete misses were scored as errors.
the mean number of errors per step was calculated for each limb for each week (foot faults are routinely normalized "per step" after stroke although analysis of foot fault data with or without normalization led to the same conclusions being drawn). the cylinder test was used to assess asymmetries in forelimb use for postural support during rearing within a transparent cm diameter and cm high cylinder . an angled mirror was placed behind the cylinder to allow movements to be recorded when the animal turned away from the camera. during exploration, rats rear against the vertical surface of the cylinder. the first forelimb to touch the wall was scored as an independent placement for that forelimb. subsequent placement of the other forelimb against the wall to maintain balance was scored as "both." if both forelimbs were simultaneously placed against the wall during rearing this was scored as "both." a lateral movement along the wall using both forelimbs alternately was also scored as "both." scores were obtained from a total number of full rears to control for differences in rearing between animals. once scores had been acquired, forelimb asymmetry was calculated using the formula: × (ipsilateral forelimb use + / bilateral forelimb use)/total forelimb use observations (hsu and jones ). figure a ; mjs technology, uk). both the affected and less affected forelimb strength were each measured (simultaneously) at baseline, week , week and week following stroke. a pair of force transducers were used in parallel to measure the peak force achieved by a rat's forelimbs as its bilateral grip was broken by the experimenter gently pulling the rat by the base of the tail horizontally away from the transducer. the average of three strength readings was noted down per session and an average taken for both arms. the difference in grip strength was taken by subtracting the affected forepaw grip strength from the less affected grip strength. 
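The cylinder asymmetry score and the grip strength difference described above are simple arithmetic, sketched below. The multiplier and the weighting applied to bilateral placements are not legible in the formula as printed, so the defaults here (a scale of 100 and a weight of one half for "both" placements) are assumptions based on how this score is commonly reported, not values confirmed by the text. Function names and example numbers are hypothetical.

```python
from statistics import mean


def forelimb_asymmetry(ipsilateral, contralateral, both,
                       scale=100.0, both_weight=0.5):
    """Cylinder test asymmetry score.

    scale and both_weight are assumed defaults; the source formula has
    the form scale * (ipsilateral + both_weight * both) / total.
    """
    total = ipsilateral + contralateral + both
    if total == 0:
        raise ValueError("no forelimb placements were observed")
    return scale * (ipsilateral + both_weight * both) / total


def grip_strength_difference(less_affected_readings, affected_readings):
    """Mean less-affected grip force minus mean affected grip force."""
    return mean(less_affected_readings) - mean(affected_readings)


# Hypothetical session: 10 ipsilateral, 4 contralateral, 6 bilateral placements
score = forelimb_asymmetry(ipsilateral=10, contralateral=4, both=6)

# Hypothetical grip readings (three per forelimb, arbitrary force units)
difference = grip_strength_difference([9.0, 10.0, 11.0], [6.0, 7.0, 8.0])
```

Under these assumed defaults a score of 50 indicates symmetrical use of the two forelimbs, and higher values indicate greater reliance on the ipsilateral (less-affected) limb.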
the presence or absence of cold allodynia was assessed using standard methods . rats were placed in a transparent cylinder ( cm diameter, cm height) atop a mesh wire floor. a drop ( µl) of acetone was placed against the centre of the forepaw. in the following s after acetone application the rat's response was monitored. if the rat did not withdraw, flick or lick its paw within this s period then no response was recorded for that trial ( , see below). however, if within this s period the animal responded to the cooling effect of the acetone, then the animal's response was assessed for an additional s, a total of s from initial application. the reasons for taking a longer period of time to assess the evoked behaviour were to measure only pain-related behaviour evoked by cooling and not startle responses that can occur following the initial application of acetone . moreover, the behaviour evoked by acetone is often an interrupted series of behaviours, thus it is important to give enough time to see all pain-related behaviours. responses to acetone were graded to the following -point scale: , no response; , quick withdrawal, flick or stamp of the paw; , prolonged withdrawal or repeated flicking of the paw; , repeated flicking of the paw with licking directed at the ventral side of the paw. acetone was applied alternately three times to each paw and the responses scored categorically. cumulative scores were then generated by adding the scores for each rat together, the minimum score being (no response to any of the six trials) and the maximum possible score being per forepaw. to visualize uninjured cst axons, six weeks after stroke, % biotinylated dextran amine (bda; , mw, invitrogen) in pbs (ph . ) was microinjected unilaterally into the uninjured sensorimotor cortex. animals were placed in a stereotaxic frame and six burr holes were made into the skull at the following coordinates (defined as anterioposterior (ap), mediolateral (ml): ) ap: + mm, ml: . mm; ) ap: + . 
mm, ml: . mm; ) ap: + . mm, ml: . mm; ) ap: + . mm, ml: . mm; ) ap: + . mm, ml: . mm; ) ap: - . mm, ml: . mm, relative to bregma. at each site, . μl injections of bda ( % in pbs) were delivered using a glass micropipette attached to a hamilton syringe inserted mm from the skull surface and delivered at a rate of . μl/min. animals were subsequently left for weeks before being perfused. tract tracing was not performed in rats that were to undergo functional mri or neurophysiology. as described below, we recorded from the ulnar nerve on the affected side and stimulated the ipsilateral median nerve or, in the pyramids, the corticospinal tract corresponding to the affected or less-affected hemisphere. at the end of the study, rats ( rats per group) were anaesthetised with an intraperitoneal injection of . g/kg urethane (sigma-aldrich). the rat was kept at °c with a homeothermic blanket system and rectal thermometer probe. tracheotomy was performed and a tracheal cannula inserted. the pyramids were then exposed ventrally by blunt dissection and removal of a small area of bone. the brachial plexus of the affected forelimb was exposed from a ventral approach by dissecting the pectoralis major. the ulnar and median nerves were dissected free from surrounding connective tissue and cut distally (to prevent twitches of target muscles). skin flaps from the incision formed a pool, which was filled with paraffin oil. the median and ulnar nerves supply flexor muscles in the forearm, wrist and hand. stimulation of afferents in the median nerve can generate responses in the ulnar nerve motor neurons. the proximal segment of each nerve was mounted on a pair of silver wire hook electrodes (with > cm separation). electrical stimuli of increasing amplitude from µa to µa, in μa steps, (single µs square wave pulse at . hz) were delivered from a constant current stimulator (nl a neurolog, digitimer) to the proximal segment of the cut median nerve. 
ulnar nerve responses to each stimulus were recorded from the pair of silver wire hook electrodes connected to a differential pre-amplifier and amplifier (digitimer) coupled via a powerlab (ad instruments) interface to a personal computer running labchart and scope software (ad instruments). an average of sweeps at µa was calculated online for each nerve and used to find the difference in amplitudes of monosynaptic reflexes evoked by median nerve stimulation. this was achieved using software to calculate the absolute integral of any response between . ms and ms, regardless of whether a response was observed qualitatively.
cst stimulation experiment with ulnar recordings: ulnar nerve recordings were obtained during stimulation of each pyramid in turn. the concentric bipolar stimulation electrode (fhc cbbpc ) was located mm lateral to the midline and gently lowered through the pyramid up to a maximum depth of . mm while stimulating at μa ( pulses, pulse width µs; frequency hz). at the electrode location providing maximal ulnar nerve response, stimuli of increasing intensity were applied in the range of µa to µa, in µa increments. five sweeps were captured at each stimulus intensity. the number of spikes % greater than the noise, and falling between and ms, was calculated for each sweep. the average number of spikes for sweeps at each amplitude was calculated, as was the difference in the number of spikes elicited by stimulation of the pyramid from the lesioned versus the contralesional hemisphere. furthermore, the signal was rectified and the area under the curve was measured between and ms for each sweep and averaged for the sweeps at each intensity. each parameter was analysed using two-way repeated measures anova. graphs show mean and standard error of the mean for the area under the curve for stimuli given at µa.
eight weeks after stroke surgery and two weeks after injection of bda, rats were terminally anesthetized with sodium pentobarbital ( mg/kg; i.p.)
and perfused transcardially with pbs for minutes, followed by ml of % paraformaldehyde in pbs for minutes. the brain, c -c spinal cord, c and c drgs and both arms were carefully dissected and stored in % paraformaldehyde in pbs for hours and then transferred to % sucrose in pbs and stored at - °c. spinal cord segments c and c were embedded in oct and μm transverse slices were cut using a freezing stage microtome (kryomat; leitz, germany). ten series of sections were collected and stored in tbs/ . % azide ( mm tris, mm nacl, . mm nan , ph . ) at °c. cst axons were counted where they crossed the midline, at two more lateral planes and at an oblique plane (figure a) at c and c . for each rat, we estimated the number of cst axons per cord segment by calculating the average number of cst axons per section and then multiplying by a scaling factor (number of sections cut per segment). the total length of serotonergic processes was measured using a standard method designed specifically to measure serotonergic sprouting after neurotrophin treatment (see refs in ) and which is well suited for quantification of dense terminal arbors (e.g., in the dorsal horn of the spinal cord). processes were identified using the "adjust threshold" function in imagej and fiber lengths were measured in three areas: the dorsal horn, intermediate grey and ventral horn (fig. a) in sections per rat. we calculated the ratio of the sides ipsilateral and contralateral to nt treatment for the three areas separately. immunofluorescence was visualized under a zeiss imager.z microscope or a confocal zeiss lsm laser scanning microscope. photographs were taken using the axio cam and axiovision le rel. . or the lsm image browser software for image analysis. nt protein was radiolabelled with µci ( .
mbq) n-succinimidyl [ , - h]propionate ( h-nsp) and separated from unbound h-nsp using an Äktaprime purification system using a modification of a previous method. the same treatment was repeated for mice h after stroke, with an incubation period of min (n = ). in this set of experiments, . mbq of c sucrose (vascular marker) was injected towards the end of the incubation and brain tissue samples were also taken for capillary depletion analysis to distinguish nt or albumin in vascular endothelial cells from that in brain parenchyma. in brief, brain tissue was homogenized in physiological buffer ( µl per mg of tissue) and % dextran ( µl per mg of tissue) as described previously. the homogenate was subjected to density gradient centrifugation ( , × g for min at °c) to give an endothelial cell-enriched pellet and a supernatant containing the brain parenchyma and interstitial fluid (isf). the homogenate, pellet and supernatant samples were solubilized and counted as described above. distribution volume, vd, was calculated for all samples, including the endothelial pellet and brain parenchyma (isf). the values were corrected for c sucrose. data were analysed for capillary fraction, parenchyma and whole brain using one-way anova and post hoc (bonferroni) t-tests. µg of protein was subjected to denaturing or non-denaturing sds page and visualised using colloidal coomassie brilliant blue staining. each band was excised separately, digested enzymatically (with trypsin) and subjected to lc/ms/ms analysis (dr. steve lynham, proteomics facility, kcl). in-gel reduction, alkylation and digestion with trypsin were performed prior to subsequent analysis by mass spectrometry. cysteine residues were reduced with dithiothreitol and derivatised by treatment with iodoacetamide to form stable carbamidomethyl derivatives. trypsin digestion was carried out overnight at room temperature after initial incubation at °c for hours. 
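The distribution-volume step of the capillary depletion analysis described earlier in this section can be illustrated with a minimal sketch. It assumes one common convention, which the text does not spell out: vd is the tissue radioactivity per gram divided by the plasma radioactivity per millilitre, and the sucrose correction is a simple subtraction of the vd of the vascular marker. The function names and all numerical values below are hypothetical and are not taken from the study.

```python
def distribution_volume(tissue_dpm, tissue_mass_g, plasma_dpm, plasma_volume_ml):
    """Return a distribution volume in ml/g.

    Assumed definition (not stated in the text):
    (dpm per g of tissue) / (dpm per ml of plasma).
    """
    if tissue_mass_g <= 0 or plasma_volume_ml <= 0 or plasma_dpm <= 0:
        raise ValueError("masses, volumes and plasma counts must be positive")
    tissue_conc = tissue_dpm / tissue_mass_g      # dpm per g
    plasma_conc = plasma_dpm / plasma_volume_ml   # dpm per ml
    return tissue_conc / plasma_conc


def sucrose_corrected_vd(vd_test, vd_sucrose):
    """Subtract the vascular-marker vd from the test-molecule vd (assumed correction)."""
    return vd_test - vd_sucrose


# Hypothetical counts for one supernatant (parenchyma) sample.
vd_protein = distribution_volume(1200.0, 0.1, 50000.0, 0.05)   # tritiated protein
vd_marker = distribution_volume(400.0, 0.1, 40000.0, 0.05)     # sucrose marker
vd_corrected = sucrose_corrected_vd(vd_protein, vd_marker)
print(round(vd_protein, 4), round(vd_marker, 4), round(vd_corrected, 4))
```

The same two functions would be applied separately to the homogenate, pellet and supernatant counts, so that each fraction gets its own corrected value.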
lc/ms/ms: peptides were extracted from the gel pieces by a series of acetonitrile and aqueous washes. the extract was pooled with the initial supernatant and lyophilised. each sample was then resuspended in l of mm ammonium bicarbonate and analysed by lc/ms/ms. chromatographic separations were performed using an ultimate lc system (dionex, uk). peptides were resolved by reversed phase chromatography on a m c pepmap column using a three step linear gradient of acetonitrile in . % formic acid. the gradient was delivered to elute the peptides at a flow rate of nl/min over min. the eluate was ionised by electrospray ionisation using a z-spray source fitted to a qtof-micro (waters corp.) operating under masslynx v . . the instrument was run in automated data-dependent switching mode, selecting precursor ions based on their intensity for sequencing by collision-induced fragmentation. the ms/ms analyses were conducted using collision energy profiles that were chosen based on the mass-to-charge ratio (m/z) and the charge state of the peptide. database searching: the mass spectral data was processed into peak lists using proteinlynx global server v . . with the following parameters: (ms survey -no background subtraction, sg smoothing iterations channels, peaks centroided (top %) no de-isotoping; ms/ms -no background subtraction, sg smoothing iterations channels, peak centroiding (top %) no de-isotoping). the peak list was searched against the uniprot database using mascot software v . using the following parameter specifications (precursor ion mass tolerance . da; fragment ion mass tolerance . da; tryptic digest with up to three missed cleavages; variable modifications: acetyl (protein n-term), carbamidomethylation (c), gln->pyro-glu (n-term q) and oxidation (m). lc/ms/ms analysis and interrogation of the data against the uniprot database identified nt from the excised and digested d gel bands. 
the results of the analysis and database searches are given in supplementary figure . database generated files were uploaded into scaffold (v . ) software (www.proteomesoftware.com) to create the .sfd file (pr lm d gel ). all samples were aligned in this software for easier interpretation and used to validate ms/ms based peptide assignments and protein identifications. peptide assignments were accepted if they contained at least two unique peptide assignments and were established at % identification probability by the protein prophet algorithm. the result table includes probability scores (mowse) for each peptide identified from the protein sequence. the threshold identity score corresponds to a % chance of incorrect assignment. peptides identified below these probabilities were accepted following manual inspection of the raw data to ensure that fragment ions correctly match the assigned sequence. the sequence coverage for each identified protein is represented in supplementary figure in yellow highlights. statistical analyses were conducted using spss (version . ). graphs show means ± sems (except where otherwise stated) and 'n' denotes number of rats. asterisks (*,**,***) indicate p≤ . , p≤ . and p≤ . , respectively. threshold for significance was . . histology and molecular biology data were assessed using kruskal-wallis and mann-whitney tests (due to small sample sizes). serotonergic fibre lengths were analysed by region using one-way anova and post hoc (bonferroni) t-tests. pkcγ data were analysed using kruskal-wallis and mann-whitney tests. behavioural and mri data were analysed using linear models and restricted maximum likelihood estimation to accommodate data from rats with occasional missing values. akaike's information criterion showed that the model with best fit for the horizontal ladder data had a compound symmetric covariance matrix, whereas for the sensory test and mri data an unstructured covariance matrix was used. 
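The covariance-structure selection described above can be sketched as follows. This is an illustration rather than the authors' SPSS procedure: it assumes that, for models fitted by restricted maximum likelihood with identical fixed effects, the criterion is computed as -2 × restricted log likelihood plus twice the number of covariance parameters, and that the structure with the lowest value is preferred. The log-likelihood values and the number of time points are hypothetical.

```python
def n_cov_params(structure, n_timepoints):
    """Number of covariance parameters for a repeated-measures structure."""
    if structure == "compound_symmetry":
        return 2  # one common variance and one common covariance
    if structure == "unstructured":
        return n_timepoints * (n_timepoints + 1) // 2  # every variance and covariance
    raise ValueError(f"unknown structure: {structure}")


def aic(restricted_log_lik, n_params):
    """Akaike's information criterion (smaller is better)."""
    return -2.0 * restricted_log_lik + 2.0 * n_params


def best_structure(log_liks, n_timepoints):
    """Return the structure with the lowest AIC and the full table of scores."""
    scores = {
        name: aic(ll, n_cov_params(name, n_timepoints))
        for name, ll in log_liks.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical restricted log likelihoods for one outcome measured at 8 time points.
candidate_fits = {"compound_symmetry": -210.0, "unstructured": -200.0}
chosen, table = best_structure(candidate_fits, n_timepoints=8)
print(chosen, table)
```

In this made-up case the unstructured matrix fits slightly better in raw likelihood, but its 36 parameters are penalised more heavily than the two of compound symmetry, so the simpler structure is chosen.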
the model with best fit for the vertical cylinder had a compound symmetric covariance matrix, according to the - restricted log likelihood information criterion. baseline scores were used as covariates. degrees of freedom are reported to nearest integer. normality was assessed using histograms. t-tests were two-tailed unless otherwise specified. sample size calculations were presented previously. magnetic resonance imaging (mri) confirmed that infarcts included the forelimb and hindlimb areas in sensorimotor cortex (fig. c). there was no difference in the mean infarct volume between stroke groups at h, one or eight weeks after stroke (fig. d). loss of cst axons was assessed at weeks in the upper cervical spinal cord using protein kinase c gamma (pkcγ) immunofluorescence (fig. e). stroke caused a % loss of cst axons in the dorsal columns relative to shams (fig. f) with no difference between vehicle and nt treated rats. together, the mri and pkcγ histology data indicate that there were no confounding pre-treatment differences in mean infarct volumes and that nt did not act as a neuroprotective agent, as expected, based on our previous results and given that treatment was initiated after the majority of cell death will have occurred. we used the "adhesive patch" test to assess forepaw somatosensory function. a sensory score was obtained by attaching pairs of adhesive patches to each rat's wrist on the dorsal side (fig. a): a high score (e.g. ) denotes that a rat preferentially removed the smaller stimulus from their less-affected wrist (i.e., did not first remove the larger stimulus on their affected wrist). the two stroke groups exhibited a similar lack of responsiveness to stimuli on their affected wrists after one week (fig. b). delayed treatment with nt caused recovery compared to vehicle, whereas vehicle-treated stroke rats showed a deficit relative to sham rats that persisted for eight weeks. 
importantly, there were no confounding differences in the time taken to contact or to remove a patch from either their less-affected or affected paw: after stroke, nt -treated rats and vehicle-treated rats took longer to contact an adhesive patch relative to shams, but there was no difference between nt and vehicle treated rats (supplementary fig. a). moreover, neither stroke nor nt treatment caused any deficit in the additional time taken after contact to remove the patch (supplementary figure b). thus, delayed treatment of disabled forelimb muscles with nt improved responsiveness to tactile stimuli after ischemic stroke. walking was assessed using a horizontal ladder with irregularly spaced rungs (fig. c). accurate paw placement during crossing requires proprioceptive feedback from muscle spindles. after one week, the two stroke groups made a similar number of errors with their affected forelimb when crossing a horizontal ladder (fig. d). delayed nt treatment caused a progressive recovery after stroke whereas vehicle-treated animals remained persistently impaired until the end of the study. this is consistent with previous work from our lab. stroke also caused a modest unilateral hindlimb impairment on the ladder; infusion of nt into the forelimb triceps brachii did not improve this (supplementary figure ). neurotrophin- also restored the use of the affected forelimb for lateral support while rats reared in a vertical cylinder (fig. e). after stroke and vehicle treatment, rats used their affected forelimb less often than shams. nt -treated rats showed more frequent use of the affected forelimb relative to vehicle-treated rats (fig. f). we used force transducers to measure grip strength of each forelimb (supplementary fig. ). stroke caused transient weakness in both groups but infusion of nt into triceps brachii did not modify grip strength. 
we also found no evidence for pain (cold allodynia) on the affected or treated forelimbs, assessed by application of ice-cold acetone to the centre of the forepaw. cold allodynia was induced neither by stroke nor nt treatment (supplementary fig. ). in summary, infusion of nt protein into the triceps brachii induced recovery on both sensory and motor tasks that require control of muscles by pathways including corticospinal pathways, serotonergic raphespinal pathways and proprioceptive circuits. accordingly, we hypothesised that nt would induce neuroplasticity in multiple pathways. we examined anatomical neuroplasticity in the c cervical spinal cord because we knew from experiments using adult and elderly rats that the less-affected corticospinal tract sprouts at this level (as well as other levels) after injection of aav-nt into muscles including triceps brachii . indeed, anterograde tracing from the less-affected hemisphere (fig. b) revealed that infusion of nt protein increased sprouting of the cst in the c spinal cord (fig. a,b) across the midline and into the affected side at two more lateral planes, and also from the ventral cst. we assessed neural output in the ulnar nerve on the affected side, whose motor neurons are also found in c (range: c to c ) that supply muscles in the forearm including the hand , . to do this, we recorded responses during electrical stimulation of either the spared less-affected corticospinal tract (fig. c) or the partially-ablated corticospinal tract (fig. f) in the medullary pyramids. nt treatment led to enhanced responses in the ulnar nerve during stimulation of the less-affected (fig. d, e) and more-affected (fig. g, h) pathways. this result is consistent with the sprouting of traced cst axons (fig. 
a,b) and indicates that cst axons from both the stroke hemisphere and the contralesional hemisphere formed new synapses and/or strengthened preexisting connections in the cord on the treated side, most likely on pre-motor interneurons that lie between cst axons and motoneurons , . however, we did not find any evidence that nt strengthened the short-latency reflex from afferents in the median nerve to motor neurons in the ulnar nerve (fig. i-l) . we also found that nt treatment caused serotonergic axons to sprout in the ventral c spinal cord (fig. a-d) . anatomical and functional plasticity of corticospinal and raphespinal pathways is consistent with their expression of receptors for nt , , - . we conclude that nt caused neuroplasticity in multiple descending locomotor pathways including the raphespinal and the spared corticospinal tracts. these data are consistent with previous findings from our lab , that peripherally-administered nt can, directly or indirectly, enhance supraspinal plasticity after stroke. accordingly, next, we assessed the biodistribution of nt after peripheral administration. we measured the amount of total (rat and human) nt in the triceps brachii and c spinal cord. elisa was performed using a subset of five rats per group withdrawn at random from the study at the four-week time point: this revealed an increase in total nt protein levels in the triceps brachii on the treated side (fig. a ) and, surprisingly, on the untreated side (perhaps due to nt in endothelial cells; see below). we were not able to detect any increase in total nt in the c spinal cords (supplementary fig. ). however, elisa cannot distinguish exogenous human nt from endogenous rat nt because the amino acid sequences for mature human and rat nt are identical , . 
because elisa did not allow us to detect any small increases in exogenous human nt against the background of endogenous rat nt in the cns, we next used a more sensitive method for measuring trafficking of nt across the blood-cns barrier. recombinant nt protein was radiolabelled and purified. [ h]nt was injected intravenously into adult mice. radiolabelled albumin was used as a control because it does not enter the cns efficiently from the bloodstream. after , , , , , or minutes, brain, spinal cord and serum were taken for scintillation counting. nt progressively accumulated in the intact brain (fig. b) and cervical spinal cord (fig. c) . in plasma, the half-life of nt was short (fig. d) . our data is consistent with that from others who have shown that radiolabelled nt rapidly crosses the barriers between the blood and an intact cns [ ] [ ] [ ] and that a small amount of intact nt accumulates in the brain and cervical spinal cord (although the majority of nt is cleared rapidly from the bloodstream) [ ] [ ] [ ] . for example, after injection of nt into the brachial vein (which provides drainage from the triceps brachii), nt accumulates in the cortex, striatum, brainstem, cerebellum, sciatic nerve (and other regions of the nervous system involved with locomotion) . ischemia. minutes later, tissues were taken for scintillation counting. in contrast to [ h]albumin, [ h]nt accumulated in the brain (fig e) . to confirm entry of [ h]nt into brain parenchyma beyond endothelial cells, capillaries were depleted by gradient centrifugation to yield a supernatant containing brain parenchyma and an endothelial cell-enriched pellet . [ h]nt entered parenchyma (depleted of endothelial cells) at a level above that seen for [ h]albumin (fig. f ). 
transport of nt into the cns is apparently a receptor-mediated process as shown by ) the expression of nt receptors in rodent and human cns capillaries , and ) the ability of non-radiolabelled nt to compete for uptake of radiolabelled nt into the cns , , . in addition, we and others have shown that nt enters the pns after peripheral administration: after intramuscular overexpression of aav encoding nt , nt levels are elevated in the blood stream and nt accumulates in the ipsilateral drg , . we also found some evidence that nt is retrogradely transported from muscle to ipsilateral motor neurons . this is consistent with data showing that ) the blood-nerve barrier in drgs is permeable to proteins like nt , ) that after intravenous injection, radiolabelled nt accumulates in the sciatic nerve and ) that nt is retrogradely transported from muscle to the spinal cord or drg in nerves , , , . we conclude that neuroplasticity occurred in multiple locomotor pathways because peripherally-administered nt bound to receptors in the pns and cns. to explore the mechanism whereby nt improved responsiveness to stimuli attached to the affected wrist (fig. b) , we performed functional brain imaging (bold-fmri) during low threshold (non-noxious) intensity electrical stimulation of the affected wrist ( supplementary fig. ). as expected, prior to stroke, stimulation of the wrist resulted in a higher probability of activation of the opposite somatosensory cortex (supplementary fig. a ). fmri performed one week after stroke confirmed that somatosensory cortex was not active when the affected paw was stimulated in either vehicle or nt treated rats (p> . , supplementary fig. b ). this supports our claim above that there were no early differences between groups that could be explained by neuroprotection. fmri performed eight weeks after stroke revealed a trend towards perilesional re-activation of somatosensory cortex in both vehicle and nt treated groups (p< . , supplementary fig. c ). 
this is in line with human brain imaging studies showing that spontaneous sensory recovery is increased after stroke when more-normal activity patterns are observed on the affected side of the brain. however, these probabilities of re-activation were not large enough to survive correction for testing of multiple voxels (p-values> . ) although clearly they are in a location that might mediate recovery of somatosensation. a longitudinal analysis showed that at weeks (relative to pre-stroke baseline), there was some evidence that rats treated with neurotrophin- showed increased probability of activation of perilesional cortex (supplementary fig. d, p< . ) and showed decreased probability of activation of somatosensory cortex on the less-affected hemisphere (p< . ) relative to vehicle-treated stroke rats. however, these apparent differences did not survive correction for testing of multiple voxels (p< . ). we conclude that both groups showed partial, spontaneous restoration of more-normal patterns of somatosensory cortex activation, but, conservatively, that nt did not further increase probability of activation of any supraspinal areas. these conclusions are consistent with previous fmri data from our laboratory; we propose that the additional recovery of somatosensory function after nt treatment (fig. b) is due to changes in the spinal cord rather than in supraspinal areas. the batch of neurotrophin- protein that we used was produced more than a decade ago by genentech. we sought to determine whether any degradation had occurred and to confirm its amino acid sequence so that identical preparations of neurotrophin- could be made for future experiments. supplementary figure depicts results from a non-denaturing gel showing a higher molecular weight band ( kda) and a lower molecular weight band ( kda) consistent, respectively, with dimeric mature nt and monomeric mature nt. there was no evidence of degradation or aggregation. 
each band was excised separately, digested enzymatically (with trypsin) and subjected to lc/ms/ms analysis. proteomic analysis was consistent with both bands being mature nt with no evidence of residual prepro sequences (supplementary figure ) . we conclude that the higher molecular weight band is not prepront (~ kda) but rather corresponds to dimeric mature nt . this facilitated our ongoing experiments to evaluate nt as a therapy for stroke because most commercial preparations of nt consist of mature nt rather than prepront . treatment of disabled arm muscles with nt protein, initiated hours after stroke, caused changes in multiple locomotor circuits, and promoted a progressive recovery of sensory and motor function in rats. the fact that nt can reverse disability when treatment is initiated hours after stroke is exciting because the vast majority of stroke victims are diagnosed within this time frame . in contrast, the gold-standard drug for ischemic stroke, tpa, needs to be given within a few hours and is only administered to a minority. thus, nt could potentially be used to treat an enormous number of victims. nt has good clinical potential. firstly, phase ii clinical trials show that doses up to µg/kg/day are well tolerated and safe in healthy humans and in humans with other conditions , - . we used a threefold lower dose ( µg/kg/day) in this study: in future experiments we will optimize the dose and duration of treatment because it is possible that a higher dose of nt would promote additional recovery after stroke. secondly, there is good conservation from rodents to primates including humans in the expression of receptors for nt in the locomotor system , , [ ] [ ] [ ] . 
thirdly, in none of our rodent experiments has nt treatment caused any detectable pain, spasticity or muscle weakness (in line with the human trials); rather, after bilateral corticospinal tract injury in rats, intramuscular delivery of aav-nt reduced spasticity, slightly improved grip strength and showed a trend towards reducing mechanical hyperalgesia . in this study and in a previous study we used functional mri combined with electrical stimulation of the wrist in an effort to discover what neuroplasticity underlies recovery of somatosensory responsiveness to adhesive patches attached to the wrist. we confirmed work by others that recovery correlated well with more-normal patterns of increased bold signal surrounding the infarct (potentially in spared somatosensory cortex) , , , but we did not find strong evidence that nt further increased peri-lesional (or other) activation (either in this study or in our previous study ). instead, we now propose that nt increased somatosensory recovery by inducing neuroplasticity in spinal circuits involving cutaneous afferents. this is plausible because cutaneous afferents which mediate tactile sensitivity express trkc receptors . moreover, others have shown that dl spinal interneurons can gate cutaneous transmission . we have previously shown that nt normalises post-activation depression of output from spinal circuits evoked by stimulation of low threshold afferents from the treated wrist (which might include cutaneous afferents as well as proprioceptive afferents) although in those experiments we measured motor output rather than sensory transmission. in the future one might examine whether nt modulates gating of somatosensory inputs from the wrist to spinal interneurons . however, in the present work, the deficits in somatosensory responses were modest and might be difficult to dissect. 
with regard to corticospinal neuroplasticity, we have shown twice previously (in adult and elderly rats) that the less-affected corticospinal tract sprouts across the cervical midline after injection of aav-nt into affected forelimb muscles . others have shown that intrathecal infusion of nt induces sprouting of the corticospinal tracts and that injection of vectors encoding nt into muscles or nerve can induce corticospinal tract sprouting. here, our anatomical tracing confirmed that the less-affected corticospinal tract sprouted after infusion of nt protein into triceps and in future we will trace both tracts. this is because, in the present study, neurophysiology revealed that both corticospinal tracts underwent plasticity after unilateral infusion of nt protein. we propose that spared cst axons sprouted after nt entered the cns from the systemic circulation. this is consistent with data from us and others showing that radiolabelled nt entered the brain and spinal cord after intravenous injection [ ] [ ] [ ] . moreover, it has been shown that endogenous muscle spindle-derived cues induce sprouting of descending pathways after spinal cord injury in adult mice ; given that muscle spindles make nt endogenously , it is plausible that infusion of supplementary nt to muscle might enhance corticospinal sprouting after stroke. it is also notable that infusion of nt into a proximal forelimb extensor improved the accuracy of use of the affected forelimb when walking on a horizontal ladder but did not improve the accuracy of use of the affected hindlimb; this implies that circulating nt is not sufficient to improve hindlimb movements. moreover, we did not find any evidence that nt strengthened the short-latency reflex between afferents in the median nerve and motor neurons in the ulnar nerve; this may be because we infused nt into the triceps brachii whose afferents do not run in the median or ulnar nerve. 
this is consistent with previous work of ours showing that a reflex may be strengthened when its afferent comes from a muscle expressing higher levels of nt but not when its afferent comes from a muscle lacking transgenic expression of nt . finally, infusion of nt protein into triceps brachii did not improve forelimb grip strength: however, the grip strength task probably depends more on strength in hand and digit flexor muscles (into which nt was not infused) than on triceps brachii (elbow extensors). indeed, in previous work, injection of aav-nt into proximal and distal flexor muscles did modestly improve grip strength . taken together, these results indicate that it may be important to target nt to multiple muscles. however, it is not straightforward to reconcile all our findings with a single mechanistic explanation. it is possible that, additionally, nt was trafficked from triceps brachii in axons to motor neurons and/or by drg neurons where it induced expression of a molecule that was secreted and induced cst sprouting (e.g., bdnf or igf , ). nt is certainly trafficked to ipsilateral motor neurons and drg after intramuscular delivery , , , , and in this study we also showed, unexpectedly, a small increase in contralateral triceps (perhaps from nt in endothelial cells). diffusion of nt within neuropil is inefficient but spinal motor neuron dendritic arbors can be very large; some even extend across the midline and these might provide a widespread source of cues for supraspinal axonal plasticity (e.g., across the midline). to seek drg-secreted factors, we have performed rnaseq of cervical drg after injection of aav-nt into forelimb flexors , . in the future we will also seek motor neuron-derived cues. finally, it is interesting that the recovery continues even after infusion of nt is discontinued at four weeks. this is encouraging, from a translational perspective. 
we propose that the four-week long nt treatment induces changes in target neurons that persist (e.g., due to sustained modifications in gene expression). indeed, longer treatment with nt induces different intracellular signalling events in sensory axons than does brief treatment, thereby enhancing terminal branching . in the future, we will seek factors that are persistently increased in target neurons after nt treatment is discontinued. additionally, it may be that nt induces sprouting of cst axons that (after cessation of treatment) is followed by selection of synapses (e.g., strengthening or pruning) by a mechanism that is independent of nt . for example, it is known that corticomotoneuronal axon synapses are pruned by repulsive plexina -sema d interactions . to begin to dissect the mechanisms whereby nt promotes neuroplasticity and recovery after peripheral delivery, we are setting up a mouse model of stroke. in summary, treatment of disabled arm muscles with nt (initiated in a clinically-feasible timeframe) induces multilevel spinal and supraspinal neuroplasticity, improves walking and reverses a tactile sensory impairment. and hours later infusion of nt or vehicle into the disabled triceps brachii was initiated for one month. six weeks after stroke, anterograde tracer was injected into the contralesional hemisphere (blue). rats underwent weeks of behavioural testing. structural mri was conducted on all rats at hours, week and weeks after stroke and fmri was conducted in a subset of rats at baseline, week and week . electrophysiology was performed in the subset of rats which did not receive bda tracing. all surgeries, treatments and behavioural testing were performed using a randomized block design and the study was completed blinded to treatment allocation. c) t mri scans hours after stroke, immediately prior to treatment, showing infarct in coronal sections rostral (mm) to bregma. 
d) there were no differences between stroke groups in mean lesion volume at hours (mann whitney p values = . ). e) photomicrographs showing loss of figure : delayed nt treatment improved responsiveness to somatosensory stimulation, improved walking and partially restored use of the affected forelimb for lateral support during rearing. a-c) somatosensory deficits were assessed using pairs of adhesive patches attached to the rat's wrists. b) treatment with nt caused improvement compared to vehicle (linear model; f , = . , p< . ; post hoc p= . ): whereas vehicle-treated stroke rats showed a deficit relative to sham rats which persisted for eight weeks (linear model; f , = . , corticospinal axons were anterogradely traced from the less-affected cortex and were counted at the midline (m), at two more lateral planes (d and d ) and crossing into grey matter from the ipsilateral, ventral tract (ipsi). b) nt treatment caused an increase in the number of axons crossing at the midline (f , = . , p= . ; post hoc p value= . ), at two lateral planes denoted as d (f , = . , p< . , post hoc p value< . ) and d (f , = . ,p< . , post hoc p value< . ) and from the ventral cst on the treated side (f , = . , p= . , post hoc p value= . ). although stroke by itself caused sprouting at the midline at c (planned comparison p= . ), nt did not promote additional sprouting at c . n= /group were used for tract tracing. c-e) the cst from the less-affected hemisphere or f-h) lesioned hemisphere was stimulated in the medullary pyramids (before the decussation) and the motor output was recorded from the ulnar nerve on the treated side. d, g) the majority of spikes were detected between ms and ms, latencies consistent with polysynaptic transmission, in both vehicle-treated rats (grey) and nt treated rats (blue) when the less-affected or affected cst was stimulated. 
e, h) stimulus intensity was increased incrementally from µa to µa and the area under the curve was measured (between and ms) after stimulation of the affected or less-affected hemisphere. nt treatment caused increased output in the ulnar nerve during stimulation of either the affected cst (two-way rm anova intensity*group interaction f , = . , p= . , n= vehicle, nt ) or less-affected cst (f , = . , p= . , n= vehicle, n= nt ). i, j) the heteronymous reflex from median afferents to ulnar motor neurons was recorded in the axilla. k) the monosynaptic component was measured. l) nt did not increase the strength of the monosynaptic component. each rat held on bilaterally to the pair of force transducers (top left) and was pulled away horizontally and perpendicularly (towards the right) until the bilateral grip was broken. the force transducers provide a measure of strength (grams) for each upper limb. an average of three trials was taken per rat per week. grip strength (grams) is presented as group means ± sems. b) grip strength of the affected limb was subtracted from the unaffected limb strength, as an internal control (e.g., to control for differences in motivation, etc.). c) grip strength for the affected limb. d) grip strength for the less-affected limb. stroke caused a weak trend towards a transient decrease in strength on the limb affected by stroke (relative to shams; time f , = . , p= . ; p-values p= . and . , respectively) but multiple pairwise comparisons did not show significant differences at any timepoint (all p> . ). there was no difference between nt and vehicle treated rats overall (group f , = . 
cst axons in the upper cervical dorsal columns weeks after stroke (right) relative to sham surgery (left), visualised using pkcγ immunofluorescence. f) stroke caused a significant loss of cst axons relative to shams in the dorsal columns there were no differences between nt and vehicle treated rats at one week (t-test p= . ). c) accuracy of paw placement by the affected forelimb during walking was assessed using a horizontal ladder with irregularly spaced runs. d) one week after stroke, nt and vehicle treated rats made a similar number of misplaced steps (t-test p= . ), expressed as a percentage of total steps. importantly, the nt group progressively recovered compared to the vehicle group (group f , = . , p< . ; post hoc p= . ) and differed from the vehicle group from weeks to (group x time f , = . , p< . ; post hoc p values< . ) and whereas the vehicle group remained impaired relative to shams from weeks to (p values< . ), from weeks to the nt group made no more errors than shams (post hoc p-values> . ). e) the vertical cylinder test assessed use of the affected forelimb for lateral support during rearing. f) stroke caused a reduction in the use of the affected forelimb during rearing in a vertical cylinder in both nt and vehicle treated rats relative to shams (group f , = . , p= . ; post hoc p values= . 
and . , respectively) with no differences between stroke groups at one week (p= . ). nt treatment caused a progressive recovery in the use of the affected forelimb grey circles) against time after iv injection (n= - mice/time) in adult mice. t-tests for nt vs albumin all significant for incubation times of , , , and (p values from < . to < . ***). the volume of distribution of [ h] nt or [ h] albumin in brain (vd =am/cp) is calculated as a ratio of counts per minute (cpm) in µg of brain and cpm in µl of serum for each time point and plotted against exposure time given by the term ∫ t cp(τ)dτ/cp. the rate of influx (ki) was calculated from the patlak plot of vd for  - μl/mg/s). c) [ h]nt entered the cervical spinal cord more abundantly than [ h]albumin. d) plasma half-life of nt for the normal adult mouse is ~ min (estimated from /normalised serum values). e) twenty-four hours after cortical ischemia, [ h]nt entered the brain more abundantly than [ h]albumin, measured minutes after iv injection. f) twenty-four hours after cortical ischemia there are no conflicts of interests of the authors. correspondence and requests for materials should be addressed to l.m (lawrence.moon@kcl.ac.uk) figure : focal cortical stroke caused impairment of the affected forelimb but modest or no impairment of the three other limbs. a) after stroke, nt treated rats recovered function of their affected forelimb on the ladder test relative to stroke vehicle controls and sham rats. nt treated rats recovered fully relative to shams (linear model and t-tests, p≤ . ). *** denotes group difference, p < . ; † denotes interaction of group with time, p< . . this subpanel is reproduced from figure to allow comparison with other subpanels). b) shows photograph of the horizontal ladder set up and insert shows a rat traversing the ladder. 
c) there was no difference in the number of foot faults made in any of the groups using the less affected supplementary figure : cold allodynia was caused neither by focal cortical stroke nor by treatment with neurotrophin- . the acetone test was used to see whether stroke and/or nt treatment caused any change in cold allodynia pain responses. the test involves applying a drop of acetone to the a) affected or b) less affected forelimb, and then allocating a score between and : higher numbers denote a heightened pain response. there is no evidence of painful behaviour based on this test in either forelimb. rm ancova with bonferroni post hoc tests. figure : elisa revealed that infusion of nt into triceps brachii did not cause detectable elevation of nt in homogenates of cervical spinal cord hemicords on the infused or non-infused side of the body (mann whitney p-values= . , . , respectively). figure : functional brain imaging during stimulation of the affected wrist revealed no enhanced probability of perilesional activation by neurotrophin- . the same rats were imaged prior to stroke and then one week and eight weeks after stroke and intramuscular treatment with either nt or vehicle. scans with obvious imaging artefacts were discarded, leaving final group numbers of n= , , and n= , , at weeks , and for nt and vehicle treated groups respectively. red voxels denote greater probability of activation during stimulation (versus stimulation off) whereas blue voxels denote lesser probability of activation during stimulation (versus stimulation off). a) prior to stroke, stimulation of the dominant paw led to a strong probability of activation in the opposite somatosensory cortex. b) one week after stroke, this activation was abolished by infarction. c) eight weeks after stroke, there was a slight trend towards a small perilesional area of reactivation in both groups. 
d) there was a slight trend towards greater perilesional reactivation in the nt group versus the vehicle group at weeks (relative to their baselines). however, all these heat maps of groups of rats show t-values obtained by statistical parametric map analysis without correction for multiple testing (p< . ) and there were no differences between the two groups for any voxels when the threshold for significance was corrected for multiple testing (p< . ; this data is not shown as the heat map was black). red voxels denote greater probability of activation during stimulation for the nt group than for the vehicle group whereas blue voxels denote lesser probability of activation for the nt group than for the vehicle group. when stimulating the less-affected wrist, there were no differences between the two groups for any voxels when the threshold for significance was corrected for multiple testing (p< . ; this data is not shown as the heat map was black). figure : different amounts of recombinant nt were run in four lanes of an sds page gel. two sizes of band of interest (lm _ and lm _ ) were detected following staining with colloidal coomassie brilliant blue. these protein bands were excised prior to separate enzymatic digestion and lc/ms/ms analysis. the apparent molecular weight of the upper band (~ kda) is consistent with either the pro-neurotrophin- precursor form or a dimer of the mature nt protein, whilst the lower band (~ kda) is consistent with the mature nt protein. sequencing revealed that both bands represent the mature nt protein. key: cord- -dt ers authors: zeng, xiangrui; fang, zhou; ma, tianzhou; lin, chien-wei; tseng, george c. title: comparative pathway integrator: a framework of meta-analytic integration of multiple transcriptomic studies for consensual and differential pathway analysis date: - - journal: biorxiv doi: . 
/ sha: doc_id: cord_uid: dt ers motivation pathway analysis provides a knowledge-driven approach to interpret differentially expressed genes associated with disease status. many tools have been developed to analyze a single study. when multiple studies of different conditions are jointly analyzed, novel integrative tools are needed. in addition, the pathway redundancy introduced by combining public pathway databases hinders knowledge discovery. methods and results we present a meta-analytic integration tool, comparative pathway integrator (cpi), to address these issues using the adaptively weighted fisher's method to discover consensual and differential enrichment patterns, consensus clustering to reduce pathway redundancy, and a novel text mining algorithm to assist interpretation of the pathway clusters. we applied cpi to jointly analyze six psychiatric disorder transcriptomic studies to demonstrate its effectiveness, and found functions confirmed by previous biological studies as well as novel enrichment patterns. availability cpi is accessible online: http://tsenglab.biostat.pitt.edu/software.htm. contact xiangruz@andrew.cmu.edu in a typical transcriptomic study, a set of candidate genes associated with diseases or other outcomes is first identified through differential expression analysis. then, to gain more insight into the underlying biological mechanism, pathway analysis (a.k.a. gene set analysis) is usually applied to pursue the functional annotation of the candidate biomarker list. the goal of pathway analysis is to determine whether the detected biomarkers are enriched in pre-defined biological functional domains. these functional domains might come from one of the publicly available databases such as gene ontology (go) [ ] and the kyoto encyclopedia of genes and genomes (kegg) [ ] . three main categories of pathway analysis methods have been developed in the past decade. 
the first category, "over-representation analysis", considers biomarkers at a certain differential expression (de) evidence cutoff and statistically evaluates the fraction of de genes in a particular pathway against that found among the background genes. without a hard threshold, the second category, "functional class scoring", takes the de evidence scores of all genes in a pathway into account and aggregates them into a single pathway-specific statistic. the third category, "pathway topology", further incorporates information on gene-gene interactions and their cellular location in addition to the pathway database [ ] . many transcriptomic datasets have been generated with the rapid advances of high-throughput genomic technologies in the past decade. meta-analysis, a set of statistical methods for combining multiple studies of a related hypothesis, has thus become popular. yet few methods have been developed for pathway meta-analysis so far. [ ] developed two approaches of meta-analysis for pathway enrichment by combining de evidence at the gene level (mape g) or at the pathway level (mape p). in real cases, when multiple datasets for a common hypothesis are available under different conditions (e.g., tissues or labs), it is also of interest to detect pathways enriched consistently under all conditions (consensual pathways) and pathways enriched solely in one condition but not in the others (differential pathways). one naïve way is to identify the enriched pathways in each study individually and manually check whether a certain pathway is enriched in one or multiple studies. to the best of our knowledge, there is currently no available statistical tool that can achieve this goal automatically and systematically. another issue that emerges with pathway analysis is pathway redundancy. because of this redundancy, the large number of individual pathways identified can hardly reveal the underlying biology directly. 
this kind of redundancy typically occurs in a regular pathway analysis since different pathways may include many overlapping genes. the toolkit david [ ] addressed this issue by clustering pathways based on a kappa statistic representing pathway similarity [ ] . but users still need to manually inspect every pathway in a cluster. furthermore, because pathway descriptions are long and vague, users can barely draw solid conclusions from the results. in light of this, we propose a framework of meta-analytic integration of multiple transcriptomic studies for consensual and differential pathway analysis, wrapped in a tool named comparative pathway integrator (cpi). our tool incorporates databases, including msigdb [ ] , go, kegg, or user-defined gene set lists, as the reference for pathway analysis. in order to identify both commonly and study-specifically enriched pathways, we apply the adaptively weighted fisher's method [ ] , which was originally developed to combine p-values from multiple genomic studies for detecting homogeneous and heterogeneous differentially expressed genes. clustering analysis based on the overlapping genes among enriched pathways is applied to reduce pathway redundancy. subsequently, we developed a text mining algorithm to automate the annotation of pathway clusters by extracting keywords from pathway descriptions, which also offers a more statistically valid summarization than leaving the user to explore each pathway in a cluster manually and heuristically. last, cpi visualizes the findings and provides users with both text and graphical outputs for an intuitive yet statistically solid presentation and easy interpretation. an r gui package cpi has been integrated into metaomics, an analysis pipeline and browser-based software suite for transcriptomic meta-analysis [ ] . figure : the work flow of cpi. cpi is a comprehensive tool incorporating several widely accepted mature methods as well as some novel algorithms/approaches. 
it is mainly composed of three steps (figure ). the first step is meta-analytic pathway analysis, which includes pathway enrichment analysis and meta-analysis. this step partially resembles the work of the r package metapath [ ] . while metapath focuses on detecting consensually enriched pathways, cpi detects both consensually and differentially enriched pathways, providing extra information on how the patterns of pathway enrichment differ across studies. the second step is pathway clustering. this step aims to reduce redundancy by condensing what are normally hundreds of enriched pathways into a few pathway clusters, so that the results are more succinct and interpretable. the third step is text mining based on the pathway names and descriptions to find keywords characterizing the overall information of each cluster. a permutation-based statistical test is performed to assess whether a specific biological noun phrase appears significantly more often than expected by chance. without this step, it would be difficult to identify the representative characteristics of each cluster, and users might still need to eyeball all the pathways in the results, since clustering the pathways does not actually reduce the total number of pathways. finally, cpi produces both graphical output and spreadsheet output containing p-value matrices of pathways and pathway clustering details, including gene composition and keywords. compared with differentially expressed gene analysis, pathway analysis gives more systematic and comprehensive biological insight. in cpi, we adopt over-representation analysis when users input a gene list and the corresponding p-values from each study. for users who prefer other types of pathway analysis, we also accept lists of significant pathways and the corresponding p-values as user input. given the pathway enrichment results, we perform the adaptively-weighted fisher's (aw fisher) method [ ] as meta-analysis, to identify pathways significant in one or more studies/conditions. 
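the first step can be illustrated with a minimal python sketch. it is not the cpi implementation: the function names are hypothetical, over-representation analysis is approximated by an upper-tail hypergeometric test, and the aw fisher step only performs the search over binary weights that minimises the p-value of the weighted fisher statistic. the minimised value is a test statistic that would still need calibration (for example by permutation) before being reported as a p-value, and input p-values are assumed to be strictly positive.

```python
from itertools import product
from math import comb, exp, factorial, log


def ora_pvalue(n_background, n_pathway, n_de, n_overlap):
    """Upper-tail hypergeometric p-value P(overlap >= n_overlap).

    n_background: genes in the background, n_pathway: genes in the pathway,
    n_de: differentially expressed genes, n_overlap: DE genes in the pathway.
    """
    total = comb(n_background, n_de)
    upper = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def chi2_sf_even_df(x, df):
    """Survival function of a chi-square variable with even df (closed form)."""
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def aw_fisher_weights(pvalues):
    """Search all non-empty binary weight vectors for one pathway.

    Returns (min_p, weights), where weights minimise the p-value of the
    weighted Fisher statistic -2 * sum(w_k * log(p_k)).
    """
    best = (2.0, None)
    for weights in product((0, 1), repeat=len(pvalues)):
        if not any(weights):
            continue  # at least one study must contribute
        stat = -2.0 * sum(w * log(p) for w, p in zip(weights, pvalues))
        p_w = chi2_sf_even_df(stat, 2 * sum(weights))
        if p_w < best[0]:
            best = (p_w, weights)
    return best
```

for example, `aw_fisher_weights([0.001, 0.002, 0.9])` selects the first two studies and leaves out the third, which is the kind of study-specific pattern the binary weights are meant to expose. the exhaustive search is exponential in the number of studies, which is acceptable for a handful of studies but is a design shortcut of this sketch.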
the aw fisher method not only increases statistical power, but also gives a binary weight for each pathway in each study/condition, indicating whether the pathway is significant in that study/condition. given a user-specified q-value cutoff, we obtain a list of significant pathways, some of them commonly significant across studies/conditions and some of them significant only in specific studies/conditions. because of the nature of pathways (e.g., their hierarchical structure), many genes are shared among different pathways. this redundancy often reduces the interpretability of the results of pathway enrichment analysis. in cpi, we perform pathway clustering to reduce the redundancy among pathways. the similarity between different pathways is calculated based on kappa statistics [ ] , which depend on how many genes are shared by and exclusive to those pathways. the kappa statistic thus measures the similarity between two pathways based on the genes composing each pathway. a distance matrix is then defined from the similarity matrix, containing the distance between each pair of pathways. consensus clustering [ ] is used for clustering. following the original consensus clustering method, an elbow plot and a consensus cdf plot are generated to help users decide the number of clusters (figure ). figure : plots used to assist the decision on the total cluster number. a) elbow plot. b) consensus cdf plot. in most clustering algorithms, including the consensus clustering we adopted, each pathway is forced into one group of pathways, regardless of whether it is scattered relative to the other pathways in the same cluster. we allow scattered pathways to form singletons when their gene composition differs from that of the representative pathway clusters, to avoid adding noise to the existing clusters. 
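the kappa-based similarity and the derived distance matrix can be sketched as follows. this is an illustration under stated assumptions: cohen's kappa is computed over a shared gene universe, the distance is taken as one minus kappa (the text does not state the exact transformation), and the function names are hypothetical.

```python
def kappa_similarity(genes_a, genes_b, universe):
    """Cohen's kappa between two gene sets over a common gene universe."""
    genes_a, genes_b = genes_a & universe, genes_b & universe
    n = len(universe)
    both = len(genes_a & genes_b)
    only_a = len(genes_a - genes_b)
    only_b = len(genes_b - genes_a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Agreement expected by chance from the marginal membership rates.
    expected = (
        (both + only_a) * (both + only_b)
        + (neither + only_b) * (neither + only_a)
    ) / n ** 2
    if expected == 1.0:
        return 1.0  # degenerate case: both sets are empty or both are the universe
    return (observed - expected) / (1.0 - expected)


def kappa_distance_matrix(pathways, universe):
    """Pairwise distances, assumed here to be 1 - kappa.

    pathways: dict mapping pathway name -> set of genes.
    Returns (names, matrix) with matrix[i][j] the distance between
    pathways names[i] and names[j].
    """
    names = sorted(pathways)
    matrix = [
        [1.0 - kappa_similarity(pathways[a], pathways[b], universe) for b in names]
        for a in names
    ]
    return names, matrix
```

a matrix of this form is what a clustering routine such as consensus clustering would take as input; pathways with identical gene sets have distance 0 and fully complementary ones have distance 2 under this convention.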
to improve the tightness of the clusters, we further calculated the silhouette width [ ] , a measure of how tightly each pathway is grouped in its cluster, and iteratively removed scattered pathways with low silhouette width until the silhouette width of every pathway was above a certain cutoff (we choose . in our experiments). the removal cutoff for silhouette width was estimated empirically from its distribution in our multi-disease dataset. we collected the identified singletons into a separate cluster instead of filtering them out, because those pathways might be interesting in terms of their unique gene composition. reducing the redundancy of pathways by clustering per se gives only a limited summary of the pathways. since the user usually needs to go through most of the pathways in a cluster to grasp its contents, the interpretation is not quantitative and relies largely on the biological knowledge of the user. therefore, we need a more rigorous and statistically meaningful summary of each cluster. this goal is expressed here as text mining for key noun phrases in each cluster: which noun phrases appear more frequently in a certain cluster than they usually would? we treat these noun phrases as the key entities for that cluster. an entity is counted by the number of pathways containing it, rather than by how often it appears across all pathway descriptions in a cluster. for instance, if in a certain cluster "t cell" occurs in three pathway descriptions (three times in pathway # , twice in pathway # and once in pathway # ), "t cell" is counted as three occurrences even though it appears six times in total. for each pathway description, we first extracted the unique noun phrases from it. this step was done using the noun chunk extraction function from the python package spacy [ ] . 
spacy is an industrial-strength text-mining package employing a large library database as well as machine learning algorithms to extract information from texts. english stop words such as "the", "a" and "that", which are very common and carry no important information, are removed from those noun phrases. this step was done using the english stop words database from the python package nltk (natural language toolkit) [ ] , the python package with the largest text mining database. after removing all stop words, the last word of each noun phrase, i.e., the central noun of the noun phrase, is lemmatized (converted from plural to singular form) if it is in plural form. this step was done using the lemmatizer function in nltk. the top common english words are filtered out from the resulting noun phrases of length one. the text mining process for an example sentence is shown in figure . in total, we provide pathway databases (go, kegg, biocarta, reactome, phenocarta, etc.) with pathways. the above formalizing and filtering steps were repeated times to generate standard-form noun phrases for all pathways. based on the results, we constructed a binary matrix in which each row is a noun phrase and each column is a pathway, with element x_ij = 1 indicating that pathway description j contains noun phrase i. once the matrix was constructed, the python package vocabulary was used to identify synonyms among the row names of the matrix (noun phrases). when a pair of synonyms is identified, the phrase with the lower occurrence across all pathways is merged into the phrase with the higher occurrence, and the row of the less frequent phrase is deleted. since a phrase needs to occur in at least two pathways to be considered in the later cluster text mining, all rows of phrases that occur only once across all pathways are deleted. as a result, a matrix of rows and columns was constructed. 
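the normalisation and matrix construction steps above can be sketched in plain python. this is a simplified illustration, not the cpi code: it assumes the noun phrases have already been extracted (spacy is not called), uses a tiny hard-coded stop-word set in place of the nltk list, and replaces the nltk lemmatizer with a crude plural-stripping rule. all names are hypothetical, and synonym merging is omitted.

```python
# Tiny stand-in for the nltk English stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "that", "this", "of"}


def singularize(word):
    """Crude plural-to-singular rule; a stand-in for a real lemmatizer."""
    if word.endswith(("ss", "is", "us")):
        return word
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def normalize_phrase(phrase):
    """Drop stop words and singularize the last word (the head noun)."""
    words = [w for w in phrase.lower().split() if w not in STOP_WORDS]
    if not words:
        return None
    words[-1] = singularize(words[-1])
    return " ".join(words)


def build_phrase_matrix(pathway_phrases, min_pathways=2):
    """Build the binary phrase-by-pathway matrix.

    pathway_phrases: dict mapping pathway id -> list of raw noun phrases.
    Returns (phrases, pathways, matrix) where matrix[i][j] == 1 if pathway j
    contains phrase i. Phrases found in fewer than min_pathways pathways
    are dropped.
    """
    pathways = sorted(pathway_phrases)
    normalized = {
        p: {n for n in map(normalize_phrase, pathway_phrases[p]) if n}
        for p in pathways
    }
    counts = {}
    for phrase_set in normalized.values():
        for phrase in phrase_set:  # each pathway counts a phrase at most once
            counts[phrase] = counts.get(phrase, 0) + 1
    phrases = sorted(ph for ph, c in counts.items() if c >= min_pathways)
    matrix = [[int(ph in normalized[p]) for p in pathways] for ph in phrases]
    return phrases, pathways, matrix
```

because each pathway contributes a set of phrases, a phrase repeated several times in one description is still counted once for that pathway, matching the pathway-level counting rule described earlier.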
for the later penalized permutation test, the above text mining matrix construction procedure was also applied to the pathway names of pathways. a similar matrix of rows and columns was constructed, with x_ij = 1 indicating that pathway term name j contains noun phrase i. a simple strategy to test for the significance of a phrase in a cluster is to count occurrences and conduct a fisher exact test. yet in real data analysis we found this method to be less powerful and less biologically justifiable, because the phrases in the term name or in a shorter description of a pathway are designed to be more representative than those in a full or longer description. we therefore decided to penalize phrases found in longer descriptions. we down-weighted the phrase count by assigning a score $s_{ij}$ between 0 and 1 to each pathway $j$ to indicate whether it contains phrase $i$:

$$ s_{ij} = \begin{cases} 1, & \text{phrase } i \text{ appears in the term name of pathway } j, \\ \exp(-\alpha |w_j|), & \text{phrase } i \text{ appears only in the description of pathway } j, \\ 0, & \text{otherwise,} \end{cases} $$

where $|w_j|$ is the number of unique noun phrases in the description of pathway $j$ and $\alpha$ is a parameter controlling the intensity of the penalty. the greater $\alpha$ is, the greater the penalty on longer descriptions. when $\alpha$ equals 0, there is no penalty and our test simplifies to be equivalent to the fisher exact test. we then define the cluster score $T_i(C)$ to be the sum of the scores of the pathways in the cluster, i.e., for phrase $i$ in a pathway cluster $C$, our test statistic is

$$ T_i(C) = \sum_{j \in C} s_{ij}. $$

to test the null hypothesis that a phrase is not enriched in a certain cluster, we adopt a permutation test. the basic argument for constructing the permutation distribution of the test statistic is that, under the null hypothesis, all phrases occur equally frequently across all clusters, including in the unpermuted data. so for each phrase $i$ in the $t$th permutation, pathways are randomly sampled to form a subset $S_t$ with the same size as cluster $C$. the test statistic $T_i(S_t)$ is recomputed at the end of each permutation. the operation is then repeated a large number of times (say, $T$ times). 
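the penalized score and permutation scheme described above can be sketched as follows. this is a minimal illustration rather than the cpi implementation: the function names, the fixed random seed and the default number of permutations are assumptions, and the p-value counts the observed statistic together with the permuted ones.

```python
import random
from math import exp


def phrase_score(in_name, in_description, n_description_phrases, alpha):
    """Score of one phrase for one pathway: 1 if the phrase is in the term
    name, exp(-alpha * |w_j|) if it is only in the description, else 0."""
    if in_name:
        return 1.0
    if in_description:
        return exp(-alpha * n_description_phrases)
    return 0.0


def cluster_statistic(scores, cluster):
    """Sum of the per-pathway scores of one phrase over a set of pathways."""
    return sum(scores[j] for j in cluster)


def permutation_pvalue(scores, cluster, n_perm=1000, seed=0):
    """Permutation p-value for enrichment of one phrase in one cluster.

    scores: dict mapping pathway id -> score of the phrase for that pathway.
    cluster: collection of pathway ids forming the cluster under test.
    """
    rng = random.Random(seed)
    pathways = sorted(scores)
    observed = cluster_statistic(scores, cluster)
    hits = 0
    for _ in range(n_perm):
        # Random pathway subset with the same size as the cluster.
        sample = rng.sample(pathways, len(cluster))
        if cluster_statistic(scores, sample) >= observed:
            hits += 1
    # The observed statistic is ranked together with the permuted ones.
    return (hits + 1) / (n_perm + 1)
```

with `alpha = 0` every pathway containing the phrase scores 1 regardless of where the phrase appears, so the statistic reduces to a plain count, in line with the remark that the test then behaves like a fisher exact test.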
finally, all $T_i(S_t)$ are ranked together with the statistic from the original data, $T_i(C)$. the p-value (later transformed to a q-value by the bh procedure) is calculated from this ranking, indicating how extreme the frequency of phrase $i$ in cluster $C$ is. suppose we already have the clustered pathways and the aw fisher p-value for each pathway. in this final step we aim to help users better understand the overall pattern of pathway significance through visualization, including heatmaps of the p-value of each pathway under each condition and heatmaps of the kappa statistics between pathways. we provide homo sapiens pathway database including pathway database from msigdb, database from connectivity map, the transcription factor target database jaspar, protein-protein interaction database and microrna databases as options for enrichment analysis. in addition, gene ontology and kegg databases for the organisms mus musculus and saccharomyces cerevisiae, and the jaspar database for mus musculus, are provided. setting the number of text mining permutations to took a reasonable time for the whole analysis procedure: minutes. our results also demonstrate the power of the penalized permutation test over fisher's exact test. since fisher's exact test treats words in descriptions (which can be up to words long) the same as words in pathway names (usually fewer than words), signals from common biological words in pathway names are masked by their more frequent occurrence in pathway descriptions, and some non-informative words in descriptions can be falsely detected as keywords. for example, in our text mining results from the two tests, atp in the mitochondrial atp activity cluster was ranked low by fisher's exact test (r= ) but prioritized by the permutation test (r= ). some meaningless words such as buolo and engelhardt in cluster were ranked high by fisher's exact test (r= ), but ranked low by the penalized permutation test (r= and r= ). 
to conclude, for detecting ordinary keywords, the penalized permutation test performs roughly the same as fisher's exact test. however, for detecting keywords that occur frequently in biological vocabulary and for filtering out uninformative jargon, the penalized permutation test is indeed more powerful than fisher's exact test. to demonstrate its utility, we applied cpi using the default databases (gene ontology, kegg, reactome and biocarta) to analyze transcriptome datasets of three psychiatric diseases in two prefrontal cortex layers. these datasets, provided by dr. david a. lewis' group, were used previously to compare gene expression levels in post-mortem dorsolateral prefrontal cortex (dlpfc) layer and layer pyramidal cells of bipolar disorder, major depressive disorder and schizophrenia patients with those of matched healthy controls [ ] . by inputting the top differentially expressed genes in each of the six datasets and default pathways containing to genes, top major pathways of the total pathways were identified and clustered into clusters, with pathways as singleton terms. of seven clusters, three important clusters with insightful biological meaning are discussed below. the elbow plot and consensus cdf plot (figure ) indicate that a cluster number of is reasonable, because when the number of clusters is greater than , the relative change in the area under the cdf curve in the elbow plot slows down (figure a) and the consensus cdf curve flattens out (figure b) . based on the heatmap output shown in figure and , we found that pathways in cluster are significantly altered in schizophrenia dlpfc layer . clusters , and show a similar pattern and are significantly enriched consensually in both schizophrenia dlpfc layer and . 
cluster is enriched in the dlpfc layer of two diseases: significantly in major depressive disorder and marginally in bipolar disorder. cluster is enriched solely in the layer of schizophrenia. cluster is enriched significantly in layer of schizophrenia, and slightly in the layer of schizophrenia and bipolar disorder. we also observed that the singletons at the bottom of the heatmap display neither a consensual nor a differential pattern, as they are scattered pathways. to further demonstrate the consensual and differential patterns in our results, we inspected the contents of each cluster with the help of text mining. the information provided by these pathways is diverse, even within a cluster, but it can be condensed statistically into keywords with our penalized text mining method. based on the pathway clustering and text mining results in the clustering summary csv file from step , we found that cluster of pathways, with keywords under fdr . (innate immune response, cytokine, mapk), is significantly altered in major depressive disorder dlpfc layer (group ). reduced mapk/erk signaling in the hippocampal area is associated with depressive behavior [ , , ] . it is unknown whether major depressive disorder is associated with altered mapk/erk signaling in the dorsolateral prefrontal cortex. our study demonstrated altered mapk/erk signaling in major depressive disorder dlpfc layer but not in layer or in the other two diseases. it remains an open question how this alteration relates to the symptoms of major depressive disorder. cluster of pathways, with keywords under fdr . (inner mitochondrial membrane, nadh, atp synthesis), is significantly altered in layer of schizophrenia dlpfc (group ) and to a lesser extent in layer of schizophrenia dlpfc (group ). this is consistent with a previous transcriptomic study of schizophrenia and schizoaffective disorder from dr. lewis' lab [ ] .
our results further indicate that the difference in the degree of mitochondrial dysfunction between dlpfc layer and layer is mainly due to different degrees of mitochondrial atp dysfunction rather than to other aspects of mitochondrial dysfunction. cluster of pathways, with keywords under fdr . (dna replication, mitosis, ubiquitin), is significantly altered in schizophrenia dlpfc layer (group ) and to a lesser extent in bipolar disorder dlpfc layer (group ) and schizophrenia dlpfc layer (group ). this validates another previous finding from dr. lewis' lab, which associated ubiquitin-proteasome related genes with schizophrenia layer [ ] . furthermore, consensual results across different diseases and layers are shown by the significant alteration of the ubiquitin-proteasome system and cell cycle not only in schizophrenia dlpfc layer , but also in schizophrenia and bipolar disorder dlpfc layer . a blood-based microarray investigation similarly provided preliminary evidence of ubiquitin-proteasome dysregulation in both schizophrenia and bipolar disorder [ ] . however, the de genes contributing to ubiquitin-proteasome dysregulation differ between bipolar disorder and schizophrenia, which may need further biological investigation and interpretation. in this article, we explored approaches for comparative meta-analytic pathway analysis and developed an integrative platform for this purpose called "cpi". cpi reduces pathway redundancy to condense the knowledge discovered from the results and also conducts text mining to provide statistically sound suggestions for interpreting the results. cpi has three major steps. in the first step, users may input a p-value matrix of either genes or pathways for each study. if p-values of genes are input, pathway analysis is applied first. the user may then select a q-value cutoff for significant pathways [ ] .
those pathways are then passed to the meta-analysis, where aw fisher is applied to discover consensually and differentially enriched pathways. we suggest . as the default cutoff, and users may also choose their own cutoff according to their budget for follow-up experiments and analysis. in the second step, the pathways are clustered using consensus clustering, with the consensus cdf plot and elbow plot assisting users in choosing the number of clusters [ ] . by default, a group of six clusters was assessed. silhouette information is used to achieve cluster tightness by moving scattered pathways to singletons. the cutoff of . was selected using the distribution of silhouette information [ ] in our multi-disease dataset and results in a reasonable number of singletons ( out of pathways). in the third step, text mining, users may choose the number of permutation iterations based on their computational capacity and data size. the computational complexities of the cpi r standalone and gui versions are similar. using one thread on an ix windows system, with an input of gene lists with a total of subjects and k genes and the default pathway reference database, and with the number of permutation test iterations set to , one run of cpi takes minutes. moreover, both versions can be executed in parallel to further speed up the analysis. cpi has three advantages compared with previous methods for pathway meta-analysis. first, cpi explores consensual and differential expression patterns simultaneously in integrated pathway analysis. second, cpi clusters pathways by their gene composition to reduce pathway redundancy. third, cpi uses a statistically valid text mining method to interpret pathway analysis results. in addition, the penalized text mining algorithm based on a permutation test has shown an advantage over a standard test such as fisher's exact test in real data analysis. we applied the tool to transcriptomic data from multiple psychiatric disorders.
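the silhouette-based step in the second stage, in which scattered pathways are moved to singletons, can be sketched as follows. this is an illustration under assumptions rather than the cpi code: it takes a precomputed pathway-by-pathway distance matrix (how cpi derives distances is not restated here), makes a single pass, and uses a hypothetical `cutoff` argument because the cutoff value is not legible in the text above. it also assumes at least two clusters.

```python
def silhouette_widths(dist, labels):
    """Silhouette width of every item, given a square distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Convention: an item alone in its cluster gets width 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def move_to_singletons(dist, labels, cutoff):
    """Relabel items whose silhouette width falls below the cutoff."""
    widths = silhouette_widths(dist, labels)
    return [lab if w >= cutoff else "singleton"
            for lab, w in zip(labels, widths)]


# Toy example: five items on a line; the last one sits between two clusters.
points = [0.0, 1.0, 10.0, 11.0, 5.5]
dist = [[abs(p - q) for q in points] for p in points]
labels = ["A", "A", "B", "B", "A"]
print(move_to_singletons(dist, labels, cutoff=0.1))
```

a higher cutoff yields tighter clusters at the cost of more singletons, which is the trade-off the text describes when choosing the cutoff from the silhouette distribution.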
the results identify multiple pathway enrichment patterns relevant to previously confirmed as well as novel biological functions, such as mitochondrial atp dysfunction in schizophrenia dlpfc layer , ubiquitin-proteasome system dysregulation in schizophrenia and bipolar disorder dlpfc layer and an altered mapk/erk signaling chain in major depressive disorder dlpfc layer [ ] . cpi has several limitations. first, for single-study pathway analysis, our tool only allows users to apply fisher's exact test and the ks test, but it can readily be extended to include other methods such as gsea [ ] . second, our text mining algorithm relies on the descriptions provided by pathway databases, so for pathway databases that do not provide descriptions, e.g. some pathways in kegg, the text mining algorithm loses its advantage. third, computation time is not negligible, especially in the text mining step. for the real data we analyzed, each permutation iterations add minutes of computation time. in summary, cpi is a meta-analytic tool for discovering commonly expressed and study-specific patterns in transcriptomic studies that also reduces pathway redundancy and conducts text mining to increase the interpretability of the results. cpi is implemented in r, as well as in r shiny, an r-based graphical user interface. the r shiny version is disseminated in metaomics and can be easily handled by users without programming knowledge [ ] . this work has been supported by the nih r ca and r lm .
distinctive transcriptome alterations of prefrontal pyramidal neurons in schizophrenia and schizoaffective disorder
gene ontology: tool for the unification of biology
controlling the false discovery rate: a practical and powerful approach to multiple testing
nltk: the natural language toolkit
preliminary evidence of ubiquitin proteasome system dysregulation in schizophrenia and bipolar disorder: convergent pathway analysis findings from two independent samples
weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit
a role for map kinase signaling in behavioral models of depression and antidepressant treatment
a negative regulator of map kinase causes depressive behavior
reduced activation and expression of erk / map kinase in the post-mortem brain of depressed suicide subjects
systematic and integrative analysis of large gene lists using david bioinformatics resources
kegg: kyoto encyclopedia of genes and genomes
ten years of pathway analysis: current approaches and outstanding challenges
an adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies
molecular signatures database (msigdb) .
metaomics: analysis pipeline and browser-based software suite for transcriptomic meta-analysis
consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data
evaluation of trkb and bdnf transcripts in prefrontal cortex, hippocampus, and striatum from subjects with schizophrenia, bipolar disorder, and major depressive disorder
silhouettes: a graphical aid to the interpretation and validation of cluster analysis
meta-analysis for pathway enrichment analysis when combining multiple genomic studies
gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
understanding interobserver agreement: the kappa statistic
figure : heatmap of logged p-value of pathways for all clusters using default databases.
key: cord- -eb t n authors: morales-nebreda, luisa; helmin, kathryn a.; markov, nikolay s.; piseaux, raul; acosta, manuel a. torres; abdala-valencia, hiam; politanska, yuliya; singer, benjamin d. title: aging imparts cell-autonomous dysfunction to regulatory t cells during recovery from influenza pneumonia date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: eb t n
regulatory t (treg) cells orchestrate resolution and repair of acute lung inflammation and injury following viral pneumonia. compared with younger patients, older individuals experience impaired recovery and worse clinical outcomes after severe viral infections, including influenza and the novel severe acute respiratory syndrome coronavirus (sars-cov- ). whether age is a key determinant of treg cell pro-repair function following lung injury remains unknown. here, we show that aging results in a cell-autonomous impairment of reparative treg cell function following experimental influenza pneumonia.
transcriptional and dna methylation profiling of sorted treg cells provide insight into the mechanisms underlying their age-related dysfunction, with treg cells from aged mice demonstrating both loss of reparative programs and gain of maladaptive programs. novel strategies that restore youthful treg cell functional programs could be leveraged as therapies to improve outcomes among older individuals with severe viral pneumonia. age is the most important risk factor determining mortality and disease severity in patients infected with influenza virus or the novel severe acute respiratory syndrome coronavirus (sars-cov- ) ( , ). global estimates of seasonal influenza-associated mortality range from , - , deaths per year, with the highest at-risk group comprised of individuals over age ( ) . in the united states, influenza-associated morbidity and mortality have steadily increased, an observation linked to an expansion of the aging population. pneumonia related to both severe influenza a virus and sars-cov- infection results in an initial acute exudative phase characterized by release of pro-inflammatory mediators that damage the alveolar epithelial and capillary barrier to cause refractory hypoxemia and the acute respiratory distress syndrome (ards) ( ) . if a patient survives this first stage, activation of resolution and repair programs during the ensuing recovery phase is crucial for restoration of lung architecture and function, which promotes liberation from mechanical ventilation, decreases intensive care unit length-of-stay and extends survival. foxp dampen inflammatory responses to both endogenous and exogenous antigens. aside from their role in maintaining immune homeostasis through their capacity to suppress over-exuberant immune system activation, treg cells reside in healthy tissues and accumulate in the lung in response to viral injury to promote tissue repair ( , ) . 
our group and others have shown that in murine models of lung injury, treg cells are master orchestrators of recovery ( ) ( ) ( ) ( ) . treg cells are capable of promoting tissue regeneration and repair, at least in part through release of reparative mediators such as the epidermal growth factor receptor ligand amphiregulin (areg), which induces cell proliferation and differentiation of the injured tissue ( ) . epigenetic phenomena, including dna methylation, modify the architecture of the genome to control gene expression and regulate cellular identity and function throughout the lifespan ( ) . aside from being one of the best predictive biomarkers of chronological aging and age-related disease onset, dna methylation regulates treg cell identity through tight epigenetic control of foxp and foxp -dependent programs ( ) . biological aging is associated with a progressive loss of molecular and cellular homeostatic mechanisms that maintain normal organ function, rendering individuals susceptible to disease ( , ) . because of their tissue-reparative functions, treg cells are important modulators of the immune response that promotes tissue regeneration following injury ( ) . whether age plays a key role in determining the pro-repair function of treg cells in the injured lung during recovery from viral pneumonia remains unknown. if aging indeed impacts treg cell-mediated recovery, is it a treg cell-autonomous phenomenon or is it because the aging lung microenvironment is resistant to treg cell-mediated repair? using heterochronic (age-mismatched) adoptive treg cell transfer experiments and molecular profiling in mice, we sought to determine whether the age-related impairment in repair following influenza-induced lung injury is intrinsic to treg cells. 
our data support a paradigm in which aged treg cells fail to upregulate youthful reparative programs, activate maladaptive responses and consequently exhibit a cell-autonomous impairment in pro-recovery function, which delays resolution from viral-induced lung injury in aged hosts.
to evaluate the age-related susceptibility to influenza-induced lung injury, we administered influenza a/wsn/ (h n ) virus via the intratracheal route to young ( months) and aged ( months) wild-type mice. aged mice exhibited > % mortality when compared with young animals (figure a), impaired recovery of total body weight following a similar nadir (figure b) and more severe lung injury by histopathology at a late recovery time point, day post-infection (figure c). at this same time point, aged mice also displayed an increase in the total number of cells per lung (figure d), which were mainly comprised of immune cells identified by the pan-hematopoietic marker cd (figure e), suggesting non-resolving tissue inflammation during recovery in older mice. we next wanted to determine whether the age-related susceptibility to influenza-induced lung injury was due to a differential inflammatory response during the initial acute injury phase. accordingly, we examined a different group of young and aged mice at a time point when viral clearance was complete ( ) and weight nadir was observed in both groups, days post-infection. aged mice demonstrated increased mortality when compared with young animals at this time point (supplemental figure a), but other markers of acute inflammation, including weight loss (supplemental figure b), total lung cells (supplemental figure c) and total lung cd + cells in surviving animals were not significantly different between groups (supplemental figure d). collectively, these results suggest that aging results in similar early injury but persistent lung inflammatory pathology during the recovery phase of influenza-induced lung injury.
having established that aging results in an increased susceptibility to persistent lung injury after influenza infection, we explored whether the impaired recovery in aged mice was linked to a persistent failure to repopulate the structural components of the alveolar-capillary barrier (i.e., failure to repair). flow cytometry analysis (supplemental in previous studies, investigators demonstrated that following influenza-induced lung injury, a population of cytokeratin + (krt + ) basal-like cells expand and migrate to the distal airspaces in an attempt to repair the injured epithelial barrier ( ) . these cells lack the capacity to transdifferentiate into functional at cells, resulting in a dysplastic response that contributes to a dysregulated and incomplete repair phenotype following injury ( ) . using a flow cytometry quantitative approach, we found that at days post-infection, aged mice showed a significant increase in krt + cells compared with young animals (figure g-h) . in summary, older mice failed to repair the injured lung during the recovery phase of influenza-induced lung injury. we and others have identified an essential role for regulatory t (treg) cells in orchestrating resolution and repair of acute lung injury ( ) ( ) ( ) ( ) ( ) . having established that aged mice fail to repair the injured lung, we next sought to determine whether this finding is due to age-related features altering the lung microenvironment or is driven by cell-autonomous, age-associated treg cell factors. thus, we performed heterochronic (age-mismatched) adoptive transfer of x splenic young or aged treg cells via retro-orbital injection into aged or young mice hours post-infection ( figure a) . 
notably, adoptive transfer of young treg cells into aged hosts resulted in improved survival when compared with aged mice that received pbs (control), while adoptive transfer of aged treg cells into young hosts worsened their survival when compared with their respective controls ( figure b ). we next turned to an inducible treg cell depletion system using foxp dtr mice in order to eliminate treg cells from recipients and specifically determine the age-related effect of donor treg cells on the susceptibility to influenza-induced lung injury ( figure c ). adoptive transfer of aged treg cells into treg cell-depleted foxp dtr mice days post-infection resulted in increased mortality when compared with adoptive transfer of young treg cells ( figure d ). combined, our findings demonstrate that the loss of the treg cell-associated pro-repair function in aged hosts is dominated by intrinsic, age-related changes in treg cells and not conferred extrinsically by the aging lung microenvironment. k-means clustering of these differentially expressed genes demonstrated that cluster ii was both the largest cluster and the one that defined the differential response to influenza infection between naïve and influenza-treated mice ( figure c) . notably, genes from this cluster were significantly upregulated among young treg cells when compared with aged treg cells following influenza infection ( figure d ). functional enrichment analysis revealed that this cluster was enriched for processes related to tissue and vasculature development and extracellular matrix formation figure a-b) . we found that during the treg cell response to influenza, there were , upregulated genes in young mice and only upregulated genes in aged mice when compared with their respective naïve state (fdr q-value < . ). gene set enrichment analysis revealed upregulation of prorepair hallmark processes in both young and aged treg cells during recovery from influenza infection (supplemental figure c-d) . 
we next compared the age-related transcriptional response to influenza infection and found shared genes between both age groups that were associated with pro-repair processes (supplemental figure e) in addition to representing one of the hallmarks of aging, epigenetic phenomena such as dna methylation regulate the development, differentiation and functional specialization of t cell lineages, including treg cells ( ) ( ) ( ) . therefore, we reasoned that age-related changes to the treg cell dna methylome could inform the divergent pro-repair transcriptional response seen between young and aged treg cells following influenza infection. we performed genome-wide ( '-cytosine-phosphate-guanine- ') cpg methylation profiling with modified reduced representation bisulfite sequencing (mrrbs) of sorted lung treg cells during the naïve state or recovery phase following influenza infection (day ) (figure a) . pca of ~ , differentially methylated cytosines (dmcs, fdr q-value < . ) revealed tight clustering according to group assignment with the main variance across the dataset (pc ) reflecting methylation changes due to age (figure b) , consistent with prior studies ( , ) . we next identified genes that were both differentially expressed and had differentially methylated cytosines within their gene promoters (anova, fdr q-value < . ), and found , genes meeting this parameter threshold ( figure c ). k-means clustering of gene expression levels revealed a substantial similarity to the deg heat map shown in figure c . gene set enrichment analysis of these genes demonstrated that this methylation-regulated gene expression program was associated with pro-recovery processes and was significantly skewed toward young treg cells ( figure c) . combined, these results show that age-related dna methylation regulates the pro-reparative transcriptional regulatory network during recovery from influenza-induced lung injury. 
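the integration step above, which selects genes that are both differentially expressed and carry at least one differentially methylated cytosine in their promoter, can be sketched as a simple intersection. this is a hedged illustration, not the authors' pipeline: the data structures, the promoter coordinates, the gene names and the `q_cutoff` value in the example are all hypothetical, and the actual threshold is not legible in the text.

```python
def genes_with_promoter_dmcs(deg_q, dmcs, promoters, q_cutoff):
    """Genes passing the expression cutoff with a significant promoter DMC.

    deg_q:     dict gene -> FDR q-value for differential expression
    dmcs:      list of (chromosome, position, q-value) for tested cytosines
    promoters: dict gene -> (chromosome, start, end), inclusive coordinates
    """
    hits = set()
    for gene, (chrom, start, end) in promoters.items():
        # Genes without an expression q-value are treated as not significant.
        if deg_q.get(gene, 1.0) >= q_cutoff:
            continue
        for dmc_chrom, pos, q in dmcs:
            if dmc_chrom == chrom and start <= pos <= end and q < q_cutoff:
                hits.add(gene)
                break
    return sorted(hits)


# Toy inputs (made up for illustration only).
deg_q = {"gene_a": 0.01, "gene_b": 0.50, "gene_c": 0.02}
promoters = {
    "gene_a": ("chr1", 100, 200),
    "gene_b": ("chr1", 300, 400),
    "gene_c": ("chr2", 100, 200),
}
dmcs = [("chr1", 150, 0.001), ("chr1", 350, 0.001), ("chr2", 500, 0.001)]
print(genes_with_promoter_dmcs(deg_q, dmcs, promoters, q_cutoff=0.05))
```

in the toy example only `gene_a` satisfies both criteria: `gene_b` has a promoter dmc but is not differentially expressed, and `gene_c` is differentially expressed but has no dmc inside its promoter. as the discussion notes, such an overlap is inferential and does not by itself establish that methylation causes the expression change.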
we sought to unambiguously address the paradigm of how aging affects treg cell function during recovery from influenza pneumonia. we used heterochronic (age-mismatched) treg cell adoptive transfer following influenza infection to establish that the age-related pro-repair function of these , , ) . here, we found that similar to human epidemiologic data and previous pre-clinical murine studies, aged mice exhibit increased susceptibility and impaired recovery following influenza infection. injury to alveolar epithelial type i, ii and endothelial cells disrupts the tight gas exchange barrier causing accumulation of fluid and pro-inflammatory mediators in the alveolar space, a hallmark of ards pathophysiology ( ) . notably, we found that during late recovery from influenza infection, aged hosts demonstrated a decreased number of alveolar epithelial type ii cells and endothelial cells when compared with young animals, suggesting that failure to repopulate the alveolar lining contributes to the observed age-related impairment in recovery. severe influenza infection leads to a robust expansion of krt + cells, which migrate distally to form cystic-like structures or pods intended to cover the damaged alveolar wall ( ) . these pods persist long after the initial infection, lack the capacity to generate a functional alveolar epithelium and therefore constitute an insufficient reparative response to injury ( ) . here, we showed that aged animals display an increased percentage of krt + cells during the recovery phase of influenza-induced lung injury, which reflects the dysregulated repair response in aged hosts. over the past decade, regulatory t cells have emerged as key mediators of wound healing and tissue regeneration ( , , ) . 
this group of specialized cells has been primarily known for their ability to suppress effector immune cell subsets leading to resolution of inflammation, but they are also capable of directly affecting tissue regeneration through production of pro-repair mediators such as amphiregulin and keratinocyte growth factor ( , , ) . investigators have demonstrated that aging can negatively impact the composition and function of the treg cell pool throughout the lifespan, rendering them inefficient as facilitators of tissue repair ( ) . this decline might occur through cell-autonomous mechanisms resulting in t cell maladaptive responses that lead to increased susceptibility to disease. for instance, loss of stemness accompanied by differentiation into pro-inflammatory th /th phenotypes, activation of dna damage responses and the senescence secretome are among some of the t cell maladaptations that result from the mounting challenges to which the t cell repertoire is exposed over a lifetime ( ) . these t cell maladaptive changes could also result from an age-related loss in stromal signals and circulating factors from the tissue microenvironment that either affect t cell function directly or render the microenvironment resistant to t cell responses. our heterochronic adoptive treg cell transfer experiments definitively address this paradigm, showing that the observed age-related treg cell dysfunction is due to cell-autonomous mechanisms and dominant over the aged pulmonary microenvironment. our data demonstrate that aging not only imparts a loss of pro-recovery treg cell function, but also a gain of some of these maladaptive features when compared with young hosts. what are the molecular mechanisms underpinning the age-associated treg cell gain or loss-of pro-reparative function in the lung following influenza infection? 
gene expression profiling of lung treg cells during the recovery phase of influenza infection showed that young treg cells significantly upregulated genes (when compared with aged treg cells) linked to biologic processes associated with a robust pro-repair signature, including extracellular matrix organization, alveologenesis and vasculogenesis. here, we demonstrate that the young treg cell pro-repair program is dominated by areg expression, accompanied by upregulation of il- and il- receptors and other genes related to the above-mentioned reparative processes. interestingly, we found no difference when comparing the suppressive phenotype of young versus aged treg cells, suggesting that following influenza-induced lung injury, the reparative program of treg cells is separable and distinct from their suppressive program. this is an important observation that informs the development of novel treg cell-based immunotherapies to specifically target molecular pathways regulating their reparative function. in regard to aged treg cells, we found that although capable of upregulating a pro-repair program following influenza infection, it is less robust when compared with the youthful reparative response. moreover, aged treg cells displayed increased expression of genes associated with an effector phenotype. accordingly, we found increased expression of th canonical markers, tbet and ifn-γ. whether this finding represents an age-related functional adaptability of treg cells following influenza infection or it is the result of treg cell lineage instability leading to effector differentiation remains unknown. establishment of a treg cell specific dna hypomethylation pattern at key genomic loci is necessary to maintain the lineage stability and immunosuppressive function of treg cells ( ) . 
epigenomic profiling has revealed that treg cell-specific alterations in methylation patterning modulate treg cell transcriptional programs and increase susceptibility to human autoimmune diseases ( ) . whether epigenetic phenomena have a similar regulatory role in modulating the treg cell reparative gene expression program remains unknown. here, we used an unsupervised bioinformatics analysis to uncover a treg cell-specific methylation-regulated transcriptional program enriched for reparative processes during recovery from influenza infection. our computational integrative approach provides inferential evidence that age-related dna methylation can modify the expression of genes linked to pro-repair processes in treg cells but does not prove causality and therefore represents a limitation of our study. future research could focus on leveraging epigenome editing technologies to establish the causality of age-related, treg cell-specific dna methylation patterns in controlling their regenerative function. in conclusion, our study establishes that aging imparts cell-autonomous dysfunction to the young ( - -month-old) and aged ( - - wild-type c bl/ mice were anesthetized with isoflurane and intubated using a -gauge angiocatheter cut to a length that placed the tip of the catheter above the carina. mice were instilled with a mouse-adapted influenza a virus (a/wsn/ [h n ]) ( pfu/mouse or pfu/mouse for foxp dtr mice, in µl of sterile pbs) as previously described ( ) . to prepare organ tissues for histopathology, the inferior vena cava was cut and the right ventricle was perfused in situ with ml of sterile pbs and then sutured a -gauge angiocatheter into the trachea via a tracheostomy. the lungs were removed en bloc and inflated to cm h o with % paraformaldehyde. -µm sections from paraffin-embedded lungs were stained with hematoxylin-eosin and examined using light microscopy with the high-throughput, automated, slide imaging system, tissuegnostics (tissuegnostics gmbh). 
single-cell suspensions from harvested mouse lungs were prepared and stained for flow cytometry analysis and fluorescence-activated cell sorting as previously described using the reagents shown in supplemental table ( , ) . the cd + t cell isolation kit, mouse (miltenyi) was used to enrich cd + t cells in single-cell suspensions prior to flow cytometry sorting. cell counts of single-cell suspensions were obtained using a cellometer with ao/pi staining (nexcelom bioscience) before preparation for flow cytometry. data acquisition for analysis was performed using a bd symphony a instrument with facsdiva software (bd). cell sorting was performed using the -way purity setting on bd facsaria sorp instruments with facsdiva software. analysis was performed with flowjo v . . software. lungs were harvested from young and aged mice and a single-cell suspension was obtained. red blood cells were removed with ack lysis buffer (thermo fisher) following the manufacturer's instructions. single-cell suspensions were plated on -well cell culture plates (thermo fisher) at a concentration of x cells/ml with rpmi plus µl/ml leukocyte activation cocktail with golgiplug (bd) and incubated for hours at °c. after incubation, cells were resuspended in pbs and stained with a viability dye and subsequently with fluorochrome-conjugated antibodies directed at ifn-γ (clone xmg . ), il- (clone tc - h ) and il- (clone b ). data acquisition and analysis were performed as described above. splenic cd + cd + treg cells were isolated from euthanized young ( - -month-old) and aged flow cytometry sorted lung treg cells were pelleted in rlt plus buffer with -mercaptoethanol and stored at - °c until rna extraction was performed. the qiagen allprep dna/rna micro kit was used for simultaneous isolation of rna and dna ( ) . rna quality was assessed with the tapestation system (agilent technologies). mrna was isolated from purified ng total rna using oligo-dt beads (new england biolabs, inc).
nebnext ultra™ rna kit was used for full-length cdna synthesis and library preparation. libraries were pooled, denatured and diluted, resulting in a . pm dna solution. phix control was spiked at %. libraries were sequenced on an illumina nextseq instrument (illumina inc) using nextseq high output reagent kit ( x cycles). for rna-seq analysis, fastq reads were demultiplexed with bcl fastq v . a ranked gene list from young and aged phenotypes was ordered by log (fold-change) in average expression, using , permutations and the hallmark gene set database ( ) . genomic dna was isolated from sorted lung treg cells using qiagen allprep dna/rna micro kit. endonuclease digestion, fragment size selection, bisulfite conversion and library preparation were performed as previously described ( , ( ) ( ) ( ) . sequencing was performed on nextseq instrument (illumina). dna methylation analysis and quantification were performed using trim the raw and processed next-generation sequencing data sets have been uploaded to the geo database (https://www.ncbi.nlm.nih.gov/geo/) under accession number gse , which will be made public upon peer-reviewed publication. 
we thank the northwestern university flow cytometry core facility supported by cancer center support grant (nci ca ). flow cytometry cell sorting was performed using bd facsaria sorp systems purchased through the support of nih s od - and s od - .
histology services were provided by the northwestern university mouse histology and

key: cord- - nfprn a authors: azmi, maryam a.; palmisano, nicholas j.; medwig-kinney, taylor n.; moore, frances e.; rahman, rumana; zhang, wan; adikes, rebecca c.; matus, david q. title: a laboratory module that explores rna interference and codon optimization through fluorescence microscopy using caenorhabditis elegans date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nfprn a

authentic research experiences are beneficial to students, allowing them to gain laboratory and problem-solving skills as well as foundational research skills in a team-based setting. we designed a laboratory module to provide an authentic research experience to stimulate curiosity, introduce students to experimental techniques, and promote higher-order thinking. in this laboratory module, students learn about rna interference (rnai) and codon optimization using the research organism caenorhabditis elegans (c. elegans). students are given the opportunity to perform a commonly used method of gene downregulation in c. elegans where they visualize gene depletion using fluorescence microscopy and quantify the efficacy of depletion using quantitative image analysis. the module presented here educates students on how to report their results and findings by generating publication quality figures and figure legends. the activities outlined exemplify ways by which students can acquire the critical thinking, data interpretation, and technical skills, which are beneficial for future laboratory classes, independent inquiry-based research projects and careers in the life sciences and beyond.

scientific teaching content

learning goals
gain experience working with c. elegans
understand the process of rna interference and the importance of codon optimization
learn basic microscopy techniques and image analysis
learn how to properly use the scientific method
enhance critical thinking skills

learning objectives
students will be able to:
lab and :
identify specific larval stages of c. elegans
synchronize c. elegans larvae using alkaline hypochlorite treatment
understand codon usage
formulate hypotheses and design a controlled experiment
lab and :
acquire images using an epifluorescence microscope
effectively communicate results and formulate conclusions from data
describe what rnai is and how it affects gene expression/activity
calculate mean fluorescent intensity from acquired fluorescence micrographs
perform statistical tests to determine the significance of results
generate publication quality figures and figure legends

inquiry-based learning is a form of active learning where students obtain the skills they need to problem solve and make unique discoveries about the natural world( - ). in contrast to teacher-centered instruction, where facts are disseminated to students, inquiry-based learning encourages students to foster their own independent learning with the assistance of the instructor( - ). in addition, inquiry-based learning puts emphasis on students developing science process skills, such as making observations, developing hypotheses, and formulating conclusions( - ). course-based undergraduate research experiences (cures) are a form of inquiry-based learning that provide students with a genuine research experience. students enrolled in cures develop or are given a research question with an unknown outcome, use the scientific method to address the question, collect and analyze data, and communicate their results ( ) ( ) ( ) .
students who participate in a cure learn the necessary skills and techniques they need to carry out the tasks required( - ), and at the same time gain confidence in their ability to engage in the scientific process( - ). assessment of student learning gains reveals that cures improve student ability to think critically, interpret data, communicate results, and collaborate as a team, compared to traditional lab courses ( ) ( ) ( ) ( ) ( ) ( ) . a critical aspect of independent research is obtaining the foundational skills and introductory training needed for understanding a specific system and/or research topic of interest. here we describe a simple laboratory module employed in the first half of our upper division undergraduate cure on developmental genetics, which is used to prepare students for independent inquiry-based group research projects that occur in the second half of the course. in this module, students are introduced to the research organism, caenorhabditis elegans (c. elegans), to explore the concepts of rna interference (rnai) and codon optimization. c. elegans offers many advantages that make it an ideal research organism, such as a fast life cycle, large brood sizes, and easy access to genetic manipulation by forward and/or reverse genetic approaches ( ) ( ) ( ) . additionally, the animals are transparent, which allows for visualization of all tissue types, and the real-time visualization of fluorescent-tagged reporter proteins expressed in various tissues of interest ( , ) . using the protocols outlined in this paper, students will cultivate and utilize c. elegans to conduct an rnai experiment where they will visualize first-hand how rnai depletes a gene of interest and how codon optimization significantly impacts gene expression. the importance of this module is that it not only teaches concepts in molecular genetics and introduces students to a model system, but also promotes higher-order thinking.
with the instruction provided here, students will gain experience and expertise in working with c. elegans, basic microscopy, data analysis, and science communication. students will develop an intuition for probing a research question, developing hypotheses, planning experiments, thinking about possible results, and formulating conclusions. therefore, modules like the one presented here have a positive impact on student development and at the same time provide the prerequisites needed for success in stem-related fields and beyond. overall, this module can be used at various educational levels to enhance students' interest in science and provide the groundwork needed for independent scientific research. the module incorporates active learning styles of instruction that significantly enhance student performance and encourage engagement. finally, this module can also be used as a "stepping-stone" or "bootcamp" exercise to provide students with a set of skills and tools for the inquiry-based module of a cure using c. elegans as a model organism.

intended audience

this laboratory module was employed in the first half of upper-level undergraduate developmental genetics at stony brook university. most students enrolled in the course were juniors or seniors; however, the module can be implemented at various educational levels (i.e. for freshmen or sophomores, or for first-year graduate students as a "bootcamp" exercise). it can also be adapted for use in a high school setting where students can be introduced to a model system commonly studied in the life sciences, learn concepts that pertain to gene expression and regulation, and reinforce their prior knowledge of the scientific method.

the module requires four lab sessions of approximately hours each. we found this was ample time for students to become accustomed to working with c. elegans and proficient in the skills needed to complete the module.
however, the module timing can be adjusted as needed to any desired length of time.

and animal development. students are strongly encouraged to be familiar with basic laboratory procedures, such as micropipetting and sterile techniques. prior to the module, all necessary materials and information needed to complete the assignments will be provided, and students will receive an introduction to c. elegans, rnai, and basic microscopy.

activities, polling questions, and independent-learning activities. to complete the life history stages assignment and gfp rnai experiment discussed below, students were arranged into groups, which fostered peer-to-peer communication and teamwork and promoted student engagement. moreover, prior to the gfp rnai experiment, students were assigned a gfp rnai worksheet (see below) which prompted independent thinking about the experiment. by completing the gfp rnai worksheet, students had an opportunity to complete a task on their own without supervision and troubleshoot through any obstacles they may have encountered.

student assessments were conducted at multiple levels throughout the module. during the short introductory lectures, students were asked a series of polling questions incorporated into the lecture (supplement ) and were assessed based on their answers. students were evaluated on their ability to perform lab tasks and follow directions given by teaching assistants and instructors. additionally, students were graded based on the quality of their lab assignment, which included data analysis, figure generation and figure legend writing (supplement ).

inclusive teaching

we have designed this module to be all-inclusive by differentiating content and lesson material to reach various types of learners. the hands-on activities of this module capture the attention and engagement of kinesthetic and tactile learners.
our short lectures that contain images, provide written instruction, and facilitate discussion amongst the class are accommodating to both visual and auditory learners. in addition, we encourage splitting the class into groups to include both men and women, as well as students of different ethnic backgrounds to foster an inclusive instructional environment. thus, the module ensures equity by reaching all types of learners and encourages diversity through the formation of disparate groups. overview of the module in the module discussed in detail below, various concepts that are central to the understanding of gene expression and gene regulation will be explored. using c. elegans as a model organism, students will understand how codon optimization of a nucleotide sequence significantly impacts the effectiveness of rnai-mediated gene depletion. specifically, students will work with two gfp-expressing c. elegans strains, where one strain expresses a non-codon optimized (nco) gfp tag (gfpnco), while the other strain expresses a codon optimized (co) gfp tag (gfpco). the gfpnco and gfpco tags are each fused to the histone protein, his- (h b), that is driven by a ubiquitous promoter, eft- , which promotes expression in all cells. students will treat each strain with an empty vector (control) rnai bacterial clone or an rnai bacterial clone that produces dsrna specific to only the non-codon optimized gfp variant (gfpnco) (review timmons and fire, for a detailed description on how rnai works in c. elegans). through fluorescence microscopy, students will visually see differences in gfp expression in each strain due to codon optimization, and they will observe that significant depletion occurs only in the strain expressing eft- >h b::gfpnco. from their understanding of rnai and codon optimization, we anticipate that students will be able to accurately predict these results and explain why these outcomes occur. 
prior to the module, we present students with a series of lectures that include an introduction to c. elegans and discussion about gene regulation (supplement ) (review corsi et al., for a comprehensive overview of c. elegans). we discuss the topic of rnai, a biological process in which exogenous double-stranded rna (dsrna) triggers post-transcriptional gene silencing ( , ( ) ( ) ( ) . one method of delivering dsrna to c. elegans is to feed the animals e. coli carrying a vector that produces dsrna complementary to a target gene of interest( , ). c. elegans has a systemic rnai response, meaning that dsrna spreads throughout all tissues, with the exception of most neurons( , ). thus, loss-of-function phenotypes for genes of interest can be assessed in almost any tissue of interest using rnai. we also engage our students in a discussion on codon optimization, the modification of a dna sequence so that it takes into account how frequently a particular organism uses each codon for a specific amino acid( - ). codon optimization significantly enhances the expression level of a particular protein due to the correlation between codon usage and trna abundance, and mrna stability( - ). thus, the expression levels of codon optimized genes will be more robust than those of non-codon optimized genes. in all, we anticipate this module will fulfill several goals, which include student proficiency in using the scientific method and development of critical thinking skills. after completing this module, students will be able to conduct controlled experiments using a model organism. in addition, they will be able to explain what rnai is and how it can be used to assess loss-of-function phenotypes for any gene of interest. lastly, students will be able to state the importance of codon optimization as it pertains to gene expression.
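the idea that codon optimization changes the nucleotide sequence without changing the encoded protein can be illustrated with a minimal python sketch. the two coding sequences below are made-up toy examples, not the gfpco or gfpnco sequences used in the module, and the codon table is a hypothetical subset that covers only the codons appearing in them.

```python
# toy illustration (assumed sequences, not the module's gfp tags):
# two synonymous coding sequences encode the same peptide but differ
# at the nucleotide level, which is what matters for dsrna targeting.

# partial codon table covering only the codons used in this example
CODON_TABLE = {
    "ATG": "M", "AGT": "S", "TCC": "S", "AAA": "K", "AAG": "K",
    "GGA": "G", "GGT": "G", "GAA": "E", "GAG": "E", "TAA": "*", "TGA": "*",
}


def translate(cds):
    """translate a coding sequence codon by codon using CODON_TABLE."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def percent_identity(seq_a, seq_b):
    """ungapped, position-by-position identity of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


variant_a = "ATGAGTAAAGGAGAATAA"  # hypothetical codon choice 1
variant_b = "ATGTCCAAGGGTGAGTGA"  # hypothetical codon choice 2, same peptide

protein_a = translate(variant_a)
protein_b = translate(variant_b)
nucleotide_identity = percent_identity(variant_a, variant_b)

print(protein_a, protein_b, round(nucleotide_identity, 1))
```

in this toy case the two peptides are identical while roughly two fifths of the nucleotide positions differ, which is the kind of divergence that can let a dsrna designed against one variant miss the other.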
an additional lab session was devoted to an introduction to compound light microscopy and a tutorial on microscopy. after the conclusion of these introductory sessions, students demonstrate their ability to work with c. elegans and operate a compound light microscope available in class, by imaging the life history stages (different larval stages) of the worms. additionally, students compile a figure containing images of the life stages along with a descriptive figure legend (supplement and ). we find this exercise extremely valuable for students to master important worm husbandry techniques (i.e. worm picking), identify larval stages, and become familiar with microscopy techniques, all of which will be necessary to successfully perform the gfp rnai experiment. to prepare the students for the experiment, we presented a short lecture on rnai and codon bias (supplement ) and devised a "gfp rnai worksheet" (supplement ). the goal of this worksheet is to drive students to formulate hypotheses as to whether the gfpnco rnai clone will efficiently knock down gfp intensity levels in the strain expressing h b::gfpco or h b::gfpnco. in this worksheet, the students are provided with the nucleotide and amino acid sequences for the codon and non-codon optimized h b::gfp tags, as well as the short-interfering nucleotide sequence from the gfpnco rnai clone (supplement ). using the sequences provided, students will make a pairwise sequence alignment, using emboss needle (https://www.ebi.ac.uk/tools/psa/emboss_needle/). they will then compare the percent similarities between the different sequences and determine whether the short-interfering nucleotide sequence for gfpnco rnai is most similar to h b::gfpco or h b::gfpnco. through this process, students will see that the sirna sequence (in dna form) encoded by the gfpnco rnai clone is % identical to the gfpnco sequence and not the gfpco sequence. 
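the worksheet comparison can also be sketched in code. the following is a simplified global (needleman-wunsch) alignment with a percent identity calculation, assuming a plain match/mismatch score and a linear gap penalty. emboss needle itself uses a nucleotide scoring matrix and affine gap penalties, so its numbers will differ; the sequences and the function names here are hypothetical and are meant only to show the logic of comparing one short sequence against two candidate targets.

```python
# simplified needleman-wunsch sketch (assumed scoring, toy sequences);
# not a reimplementation of emboss needle.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # trace back from the bottom-right corner to recover the alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def aligned_percent_identity(a, b):
    """identical aligned positions divided by the alignment length, in percent."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# hypothetical short sequences standing in for the worksheet sequences
rnai_fragment = "ATGAGTAAAGGA"
target_nco = "ATGAGTAAAGGAGAA"
target_co = "ATGTCCAAGGGTGAG"

identity_nco = aligned_percent_identity(rnai_fragment, target_nco)
identity_co = aligned_percent_identity(rnai_fragment, target_co)
print(identity_nco, identity_co)
```

with these toy inputs the fragment aligns much better to the non-codon optimized target than to the codon optimized one, mirroring the conclusion students are expected to reach from the worksheet.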
students will also appreciate that the control rnai clone is called "empty vector" because it does not produce a dsrna product.

to conduct the rnai experiment, the students should grow up both the eft- >h b::gfpco

when ngm plates are full of gravid adults (> adults on each plate or food source is near depletion), students should treat each strain with alkaline hypochlorite solution (figure step ) (supplement and supplement section v) to create synchronized l larvae. approximately - l animals should be pipetted onto control and gfpnco-specific rnai plates (figure step ). individual rnai plates should have no more than ~ - worms to prevent overcrowding and depletion of the e. coli food source (figure step ). the l larvae are then cultured on the rnai plates at the desired temperature until the l or l stage is reached (figure step ). once the desired stage is reached, students mount the animals on % agarose slides containing a droplet of m buffer and sodium azide to anesthetize the animals (figure step ). we recommend that students pick ~ animals for imaging at a time. we also encourage instructors to ensure students handle sodium azide with care as it is toxic( ) (supplement and supplement section vi).

students quantified h b::gfp fluorescence depletion using two wide-field epifluorescence microscopes, the accu-scope or leica dmlb fluorescence microscopes (figure step

imaged for each rnai treatment (control and gfpnco). from the data acquired by the students, several qualitative observations were made (figure a and b). first, the overall fluorescence intensity of the gfpco strain was visually much brighter than the gfpnco strain. second, treating the gfpnco strain with gfpnco rnai strongly reduced the fluorescence intensity of gfp, whereas treating the gfpco strain with gfpnco rnai did not (figure a and b, eft- >h b::gfp column).
third, in the gfpnco strain treated with gfpnco rnai, although the fluorescence intensity of gfp was strongly reduced, some nuclei still showed high levels of gfp; these correspond to cells that are insensitive to rnai, most notably neurons (figure a , eft- >h b::gfpnco; gfpnco rnai).

to analyze the data quantitatively, we instructed students to quantify whole-body gfp fluorescence intensity for animals from each strain grown on control and gfpnco rnai, using fiji/imagej ( ). briefly, the entire body of each worm was outlined and the mean fluorescence intensity (mfi) was then measured for both gfp and an area of background. the background mfi measurement was then subtracted from the gfp mfi measurement to reduce background noise and obtain a mean gray value (mgv). mean gray values were normalized by dividing the background-subtracted mfi in rnai-treated animals by the average background-subtracted mfi in control-treated animals (supplement , supplement section vii, tutorial videos - , supplement ). the mean gray values obtained from each imaging system (microscope) are plotted next to their respective micrographs (figure a' and b'). by plotting the normalized mgv, students were able to clearly see that treating the gfpnco strain with gfpnco rnai significantly reduced the expression of gfp compared to control-treated animals (figure a, a' , and b, b', nco strain; control rnai vs. gfpnco rnai). moreover, the students noted that treatment with gfpnco rnai had no effect on gfp expression levels in the codon-optimized strain (figure a, a',

creating a publication quality figure similar to that described for the life history stages, which also included a table of their raw data and description of their results (supplement and ). from these results, and the results obtained from the gfp rnai worksheet, it becomes evident that rnai specificity is largely dependent on the sequence homology/similarity between the target gene sequence and the sequence of the sirna produced by the rnai clone itself.
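the quantification described above (background subtraction followed by normalization to the control average) can be sketched in a few lines of python. the intensity values below are invented for illustration and are not student data, and the function names are hypothetical; in the module itself the measurements come from fiji/imagej.

```python
# minimal sketch of the normalization arithmetic, using made-up readings.
from statistics import mean


def background_subtract(worm_mfi, background_mfi):
    """mean gray value: whole-worm mfi minus the background mfi."""
    return worm_mfi - background_mfi


def normalize_to_control(values, control_values):
    """divide each value by the average of the control-treated values."""
    control_mean = mean(control_values)
    return [v / control_mean for v in values]


# hypothetical (whole-worm mfi, background mfi) pairs in arbitrary units
control_raw = [(1200.0, 200.0), (1100.0, 200.0), (1300.0, 200.0)]
rnai_raw = [(450.0, 200.0), (400.0, 200.0), (500.0, 200.0)]

control_mgv = [background_subtract(w, b) for w, b in control_raw]
rnai_mgv = [background_subtract(w, b) for w, b in rnai_raw]

normalized_control = normalize_to_control(control_mgv, control_mgv)
normalized_rnai = normalize_to_control(rnai_mgv, control_mgv)

# percent depletion relative to the control average
percent_depletion = 100.0 * (1.0 - mean(normalized_rnai))
print(normalized_rnai, round(percent_depletion, 1))
```

normalizing to the control average puts the control group at a mean of 1, so the normalized rnai values can be read directly as the fraction of signal remaining.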
extended results (optional) compared to wide-field epifluorescence microscopy, high resolution fluorescence microscopy greatly improves the detail and resolving power of fluorescence micrographs, such that unwanted out-of-focus light is significantly reduced, and the detail of cellular objects is greatly enhanced( ). thus, to show students high quality images of nuclear dna labeled with h b::gfp, we acquired spinning-disk confocal images for both the eft- >h b::gfpco and eft- >h b::gfpnco strains ( figure c, c', , and ) . importantly, these spinning-disk confocal images served to better illustrate some of the key concepts discussed in the lab module, such as codon optimization and lineage specific differences in rnai susceptibility. from the confocal fluorescence micrographs, it becomes more apparent that treatment with gfpnco rnai significantly reduces gfp fluorescence intensity in the gfpnco strain, but not in the gfpco strain ( figure c , c'; co strain vs. nco strain; gfpnco rnai vs. control rnai). to highlight the differences in expression levels between codon optimized and non-codon optimized h b::gfp fusion proteins, we took spinning disk confocal images of the c. elegans germline. in general, codon optimized transgenes are more robustly expressed in the germline than non-codon optimized transgenes ( , ) . in line with this, h b::gfp fluorescence expression was more robust in germ cells when gfp is codon-optimized as opposed to when it is non-codon optimized (figure ; co strain vs. nco strain). in c. elegans, certain cell lineages show different sensitivities to exogenous dsrna. for example, neurons are less sensitive to rnai compared to other somatic tissues ( , ) . this is because neurons lack the dsrna-gated channel, sid- , which promotes the uptake of dsrna into cells that express it( ). to emphasize this concept to students, we acquired spinning-disk confocal images of nuclei from various cell lineages commonly studied in c. 
elegans (figure a). treatment with gfpnco rnai significantly reduced gfp fluorescence intensity levels in the gfpnco strain, but not in the gfpco strain (figure b-e). however, with respect to the gfpnco strain treated with gfpnco rnai, the percent decrease in gfp intensity levels in the pharyngeal neurons was much less than the decrease found in the other cell types examined (figure b compared to figures c-e). thus, these observations can be used in the classroom to clearly illustrate to students that certain cell types are more or less sensitive to exogenous dsrna, and this sensitivity is in part dictated by the surface proteins these cells express ( ).

the laboratory module presented here teaches a variety of common techniques employed by c. elegans researchers and exposes students to various concepts in molecular genetics and microscopy. during this module, students will become proficient at working with a widely used research organism, be able to conduct controlled experiments, analyze data, produce publication quality images, and have a basic understanding of microscopy. in addition, students will have a solid foundation as to how rnai works, how it can be used to study gene function, and the importance of codon optimization for proper gene expression. this module clearly illustrates that certain cell types are less or more prone to the effects of dsrna treatment, and that codon optimization results in improved gene expression in tissues (i.e. the germline). the advantage of using a strain that drives ubiquitous expression of h b::gfp is that it is extremely bright and nuclear localized, and therefore easily visible on widefield epifluorescence microscopes, which are commonly available in most laboratory classrooms.
for classrooms that have access to a high resolution microscope, such as a spinning-disk confocal, laser-scanning confocal microscope, or structured illumination microscopes, this module can be easily adapted for use on those types of microscopes as shown in figures c, , and . the additional benefit of the strains used in this module is that students can immediately see differences in depletion between h b::gfpco and h b::gfpnco upon gfpnco rnai treatment. such is a bonus when attempting to keep students intrigued and engaged. upon completing this module, students will acquire the basic foundational skills needed for independent inquiry-based research projects involving c. elegans. some examples of inquiry-based research projects that can follow this module include a reverse genetics screen to identify genes important for specific processes of interest, such as longevity. in this example, with the assistance of their instructor, students can design a simple research question, such as "do fat metabolism genes play a role in regulating lifespan?". students can search the literature for fat metabolism genes of interest, use either the ahringer or vidal rnai libraries (source bioscience) to isolate clones specific for those genes, and determine if their depletion reduces or enhances longevity. results can then be documented, written up in research paper format, and presented to the class. the independent inquiry-based research projects that follow this module are limitless and can focus on key areas of research implicated in a wide range of cellular processes, such as cell cycle regulation, cellular invasion, stress-resistance pathways, vesicle trafficking, and much more. whereas most lecture and laboratory-based classrooms use expository styles of instruction, classrooms that utilize active learning styles of instruction, such as inquiry-based learning strategies, significantly enhance student performance and learning outcomes ( , , ) . 
several active learning strategies that can be implemented throughout this module include variations of the jigsaw technique( ) and think-pair-share( ). with respect to the jigsaw technique, students can be divided into several teams, with each team focusing on identifying a specific life history stage as described above. when those teams complete the assignment, the class is divided once again into new groups. each new group consists of one member from the original teams, with each member being responsible for teaching the group how to identify their originally assigned life history stage. we performed a variation of the jigsaw technique and found the students to be engaged and excited with the task provided. to further promote student thinking and awareness, an instructor can also take advantage of think-pair-share. here, an instructor can pose a question, allow the students to think independently about the question, then form pairs or groups to discuss their ideas collectively, and share with the class. this strategy can be easily administered for the gfp rnai worksheet. here students can be given the worksheet as a homework or in-class assignment to think independently about which strain the gfpnco rnai clone will be more efficient against. later in the lab session or during the next lab session, the students can form pairs or groups, discuss their opinions with one another, and then present them to the class. in all, there are various active learning strategies that can be implemented in this module, which fosters peer-to-peer communication, promotes student engagement, and stimulates higher-order thinking. this module can be further adapted for remote teaching and online learning. instructors can teach image analysis alongside with their lectures online and provide students with previously acquired raw data sets from epifluorescence and/or confocal microscopes. 
lessons can be held synchronously by utilizing the share screen option in video conferencing apps, such as zoom or google meet, or asynchronously by recording lectures and lessons ahead of time, and uploading them onto blackboard, microsoft cloud, or google drive along with our image analysis video tutorials (supplemental tutorial videos - ). based on the knowledge gained from the lectures, compiled raw data, and the gfp rnai worksheet, students will be able to formulate their hypothesis and test it by analyzing the supplied data. we adapted this distance learning technique for the second half of our course during the sars-cov pandemic in the spring of and received positive feedback from our students about the adaptability of the course. an additional advantage of this module is that it not only applies to the undergraduate setting but can also be adapted to a variety of educational levels, such as the high school or graduate level. at the high school level, this module can enhance critical thinking, promote independence at an early stage of a student's career, and instill awareness by introducing the field of genetics and organismal biology. moreover, it inherently promotes student engagement by allowing students to work with an organism that most are unaware exists, work in groups to share ideas, and visualize cellular processes live. additionally, we provide simplified protocols and instructions to make it easy for instructors who have little experience with c. elegans to facilitate this module in their classroom. at the graduate level, this module can be particularly useful for graduate student rotations and can serve as an introductory "boot camp" or "stepping-stone" to introduce c. elegans and the experimental techniques used in c. elegans research. here, entry-level graduate students who have not previously worked with c.
elegans will have the opportunity to do so, and can immediately start acquiring data by conducting a reverse genetics screen devised by the principal investigator and/or themselves. over time, these students can become confident enough to develop and plan their own projects. in summary, this module is an excellent resource for instructors interested in conveying a real-life science experience to their students and serves as an excellent opportunity for students to gain the hands-on experience they need in order to pursue a career in biology.
rnai efficiency and to avoid overcrowding/starvation, ~ worms per plate will suffice. (step ) l larvae are grown until the l /l larval stage and then mounted on % agarose pad slides (containing sodium azide (anesthetic) and a drop of m buffer) for image acquisition. *growth times will vary based on temperature (see text for more details). (step ) images are acquired and then analyzed using fiji/imagej to determine the mean fluorescence intensity. results are briefly explained in the lab report and submitted along with a publication quality figure with figure legend.
key: cord- - clslvqb authors: wang, xiaoqi; yang, yaning; liao, xiangke; li, lenli; li, fei; peng, shaoliang title: selfrl: two-level self-supervised transformer representation learning for link prediction of heterogeneous biomedical networks date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: clslvqb predicting potential links in heterogeneous biomedical networks (hbns) can greatly benefit various important biomedical problem.
however, self-supervised representation learning for link prediction in hbns has been only slightly explored in previous research. therefore, this study proposes a two-level self-supervised representation learning method, namely selfrl, for link prediction in heterogeneous biomedical networks. the meta path detection-based self-supervised learning task is proposed to learn representation vectors that capture the global-level structure and semantic features of hbns. the vertex entity mask-based self-supervised learning mechanism is designed to enhance the local association of vertices. finally, the representations from the two tasks are concatenated to generate high-quality representation vectors. the results of link prediction on six datasets show that selfrl outperforms state-of-the-art methods. in particular, selfrl reveals great performance with results close to in terms of auc and aupr on the neodti-net dataset. in addition, pubmed publications indicate that nine out of ten drugs screened by selfrl can inhibit the cytokine storm in covid- patients. in summary, selfrl provides a general framework that develops self-supervised learning tasks with unlabeled data to obtain promising representations for improving link prediction.
in recent decades, networks have been widely used to represent biomedical entities (as nodes) and their relations (as edges). predicting potential links in heterogeneous biomedical networks (hbns) can be beneficial to various significant problems in biology and medicine, such as target identification, drug repositioning, and adverse drug reaction prediction. for example, network-based drug repositioning methods have already offered promising insights to boost the effective treatment of covid- disease (zeng et al. ; xiaoqi et al. ) since its outbreak in december of . many network-based learning approaches have been developed to facilitate link prediction in hbns.
in particular, network representation learning methods, which aim at converting high-dimensional networks into a low-dimensional space while maximally preserving structural properties (cui et al. ), have provided effective and promising paradigms for link prediction (li et al. ).
* to whom correspondence should be addressed: fei li (pitta-cus@gmail.com) , and shaoliang peng (slpeng@hnu.edu.cn).
nevertheless, most of the network representation learning-based link prediction approaches depend heavily on a large amount of labeled data. this requirement may not be met in many real link prediction tasks for biomedical networks (su et al. ). to address this issue, many studies have focused on developing unsupervised representation learning algorithms that use the network structure and vertex attributes to learn low-dimensional vectors of nodes in networks (yuxiao et al. ), such as grarep (cao, lu, and xu ), tadw (cheng et al. ), line (tang et al. ), and struc vec (ribeiro, saverese, and figueiredo ). however, these network representation learning approaches are aimed at homogeneous networks, and cannot be applied directly to hbns. therefore, a growing number of studies have integrated meta paths, which are able to capture topological structure features and relevant semantics, to develop representation learning approaches for heterogeneous information networks. dong et al. used meta path-based random walks and then leveraged a skip-gram model to learn node representations (dong, chawla, and swami ). shi et al. proposed a fusion approach to integrate different representations based on different meta paths into a single representation (shi et al. ). ji et al. developed an attention-based meta path fusion for heterogeneous information network embedding (ji, shi, and wang ). wang et al. proposed a meta path-driven deep representation learning approach for a heterogeneous drug network (xiaoqi et al. ).
unfortunately, most of the meta path-based network representation approaches focus on preserving vertex-level information by formalizing meta paths and then leveraging a word embedding model to learn node representations. therefore, the global-level structure and semantic information among vertices in heterogeneous networks is hard to model fully. in addition, these representation approaches are not specifically designed for link prediction, and thus learn representations that are not explicitly suited to link prediction. on the other hand, self-supervised learning, which is a form of unsupervised learning, has been receiving more and more attention. self-supervised representation learning formulates pretext tasks using only unlabeled data to learn representation vectors without any manual annotations (xiao et al. ). self-supervised representation learning technologies have been widely used in various domains, such as natural language processing, computer vision, and image processing. however, very few approaches have been generalized to hbns, because the structure and semantic information of heterogeneous networks differs significantly between domains, and a model trained on a pretext task may be unsuitable for link prediction tasks. based on the above analysis, there are two main problems in link prediction based on network representation learning. the first is how to design a self-supervised representation learning approach based on a large amount of unlabeled data to learn low-dimensional vectors that integrate the different-view structure and semantic information of hbns. the second is how to ensure that the pretext tasks in self-supervised representation learning are beneficial for link prediction in hbns. in order to overcome these issues, this study proposes a two-level self-supervised representation learning method (selfrl) for link prediction in heterogeneous biomedical networks.
first, a meta path detection self-supervised learning mechanism is developed to train a deep transformer encoder for learning low-dimensional representations that capture the path-level information in hbns. meanwhile, selfrl integrates the vertex entity mask task to learn the local association of vertices in hbns. finally, the representations from the entity mask and meta path detection tasks are concatenated to generate the embedding vectors of nodes in hbns. the results of link prediction on six datasets show that the proposed selfrl is superior to state-of-the-art methods. in summary, the contributions of the paper are listed below:
• we propose a two-level self-supervised representation learning method for hbns, which integrates the meta path detection and vertex entity mask self-supervised learning tasks based on a large amount of unlabeled data to learn high-quality representation vectors of vertices.
• the meta path detection self-supervised learning task is developed to capture the global-level structure and semantic features of hbns. meanwhile, the vertex entity mask model is designed to learn the local association of nodes. therefore, the representation vectors of selfrl integrate two-level structure and semantic features of hbns.
• the meta path detection task is specifically designed for link prediction. the experimental results indicate that selfrl outperforms state-of-the-art methods on six datasets. in particular, selfrl reveals great performance with results close to in terms of auc and aupr on the neodti-net dataset.
heterogeneous biomedical network
a heterogeneous biomedical network is defined as g = (v, e), where v denotes a biomedical entity set and e represents a biomedical link set. in a heterogeneous biomedical network, a vertex type mapping function φ(v) : v → a and a relation type mapping function ψ(e) : e → r associate each vertex v and each edge e with its type, respectively.
a and r denote the sets of entity and relation types, where |a| + |r| > . for a given heterogeneous network g = (v, e), the network schema t_g can be defined as a directed graph over object types a and link types r, that is, t_g = (a, r). the schema of a heterogeneous biomedical network expresses all allowable relation types between different types of vertices, as shown in figure .
figure : schema of the heterogeneous biomedical network that includes four types of vertices (i.e., drug, protein, disease, and side-effect).
network representation learning plays a significant role in various network analysis tasks, such as community detection, link prediction, and node classification. therefore, network representation learning has been receiving more and more attention during recent decades. network representation learning aims at learning low-dimensional representations of network vertices, such that proximities between them in the original space are preserved (cui et al. ). the network representation learning approaches can be roughly categorized into three groups: matrix factorization-based approaches, random walk-based approaches, and neural network-based approaches (yue et al. ). the matrix factorization-based network representation learning methods extract an adjacency matrix and factorize it to obtain the representation vectors of vertices, such as laplacian eigenmaps (belkin and niyogi ) and the locally linear embedding methods (roweis and saul ). the traditional matrix factorization has many variants that often focus on factorizing a high-order data matrix, such as grarep (cao, lu, and xu ) and hope (ou et al. ). inspired by the word vec (mikolov et al. ) ... ), node vec (grover and leskovec ), and metapath vec/metapath vec++ (dong, chawla, and swami ), in which a network is transformed into node sequences.
these models were later extended by struc vec (ribeiro, saverese, and figueiredo ) for the purpose of better modeling structural identity. over the past years, neural network models have been widely used in various domains, and they have also been applied to network representation learning. in neural network-based network representation learning, different methods adopt different learning architectures and different network information as input. for example, line (tang et al. ) aims at learning embeddings that preserve both local and global network structure properties. sdne (wang, cui, and zhu ) and dngr (cao ) were developed using a deep autoencoder architecture. graphgan (wang et al. ) adopts generative adversarial networks to model the connectivity of nodes.
predicting potential links in hbns can greatly benefit various important biomedical problems. this study proposes selfrl, a two-level self-supervised representation learning algorithm, to improve the quality of link prediction. the flowchart of the proposed selfrl is shown in figure . considering that meta paths reflect heterogeneous characteristics and rich semantics, selfrl first uses a random walk strategy guided by meta paths to generate node sequences that are treated as the true paths of hbns. meanwhile, an equal number of false paths is produced by randomly replacing some of the nodes in each true path. then, based on the true paths, this work proposes the vertex entity mask as a self-supervised learning task to train a deep transformer encoder for learning entity-level representations. in addition, a meta path detection-based self-supervised learning task based on all true and false paths is designed to train a deep transformer encoder for learning path-level representation vectors. finally, the representations obtained from the two-level self-supervised learning tasks are concatenated to generate the embedding vectors of vertices in hbns, which are then used for link prediction.
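as a rough illustration of the meta path-guided random walk mentioned above, the following python sketch walks a toy heterogeneous network so that consecutive vertex types follow a given meta path. the toy graph, the type labels, and the function names `build_adjacency` and `metapath_walk` are assumptions made for this example; they are not taken from the selfrl implementation.

```python
import random

# hypothetical toy network: vertex -> type, plus an undirected edge list
VERTEX_TYPE = {"d1": "drug", "d2": "drug", "p1": "protein", "p2": "protein", "e1": "disease"}
EDGES = [("d1", "p1"), ("d2", "p1"), ("d2", "p2"), ("p1", "e1"), ("p2", "e1")]

def build_adjacency(edges):
    """Undirected adjacency lists, so a meta path can be walked in both directions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def metapath_walk(start, metapath, adj, vertex_type, rng):
    """Return one node sequence whose vertex types follow `metapath`.

    Returns None if the start vertex has the wrong type or the walk gets stuck.
    """
    if vertex_type[start] != metapath[0]:
        return None
    path = [start]
    for wanted in metapath[1:]:
        candidates = [n for n in adj.get(path[-1], []) if vertex_type[n] == wanted]
        if not candidates:
            return None
        path.append(rng.choice(candidates))
    return path

adj = build_adjacency(EDGES)
rng = random.Random(0)
walk = metapath_walk("d1", ["drug", "protein", "disease"], adj, VERTEX_TYPE, rng)
print(walk)
```

in this sketch each walk follows one meta path exactly once; a real pipeline would presumably start many walks from every vertex and collect them per meta path type.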
true path generation
a meta path is a composite relation denoting a sequence of adjacent links between nodes $a_1$ and $a_i$ in a heterogeneous network, and can be expressed in the form $a_1 \xrightarrow{r_1} a_2 \xrightarrow{r_2} \cdots \xrightarrow{r_{i-1}} a_i$, where each $r_j$ represents a schema between two objects. different adjacent links indicate distinct semantics. in this study, all the meta paths are reversible and no longer than four nodes. this is based on the results of previous studies showing that meta paths longer than four nodes may be too long to contribute informative features (fu et al. ). in addition, sun et al. have suggested that short meta paths are good enough, and that long meta paths may even reduce the quality of semantic meanings (sun et al. ). in this work, each network vertex and each meta path are regarded as a word and a sentence, respectively. indeed, a large percentage of meta paths are biased toward highly visible objects. therefore, three key steps are defined to keep a balance between different semantic types of meta paths:
(1) generate all sequences according to meta paths whose forward and reverse directional sampling probabilities are the same and equal to . .
(2) count the total number of meta paths of each type, and calculate their median value (n ).
(3) randomly select n paths if the total number of meta paths of a type is larger than n ; otherwise, select all sequences.
the selected paths are able to reflect topological structures and interaction mechanisms between vertices in hbns, and will be used to design self-supervised learning tasks to learn low-dimensional representations of network vertices.
false path generation
the paths selected using the above procedure are treated as the true paths in hbns. an equal number of false paths is produced by randomly replacing some nodes in each of the true paths. in other words, each true path corresponds to a false path.
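the counting and down-sampling steps used to balance the true paths can be sketched as follows. this is a minimal illustration under stated assumptions: the function name `balance_paths`, the toy path groups, and the choice to round a fractional median down are all hypothetical, since the text does not specify them.

```python
import random
from statistics import median

def balance_paths(paths_by_type, rng):
    """Keep at most n paths per meta path type, where n is the median group size."""
    # assumption: a fractional median (even number of types) is rounded down
    n = int(median(len(paths) for paths in paths_by_type.values()))
    balanced = {}
    for mp_type, paths in paths_by_type.items():
        if len(paths) > n:
            balanced[mp_type] = rng.sample(paths, n)  # random subset of size n
        else:
            balanced[mp_type] = list(paths)           # small groups are kept whole
    return balanced

# hypothetical path groups with 2, 4, and 10 sequences
toy_paths = {
    "drug-protein-drug": [("d%d" % i, "p1", "d9") for i in range(2)],
    "drug-protein-disease": [("d%d" % i, "p1", "e1") for i in range(4)],
    "drug-side-effect": [("d%d" % i, "s1") for i in range(10)],
}
balanced = balance_paths(toy_paths, random.Random(0))
sizes = {mp_type: len(paths) for mp_type, paths in balanced.items()}
print(sizes)
```

capping every type at the median keeps frequent meta path types from dominating the training corpus while leaving rare types untouched.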
there is no relation between the replacement nodes and their context in false paths, and the number of replaced nodes is less than the length of the current path. for instance, a true path (i.e., d p d e ) is shown in figure (b). during the generation of false paths, the 1st and 3rd tokens, which correspond to d and d , respectively, are randomly chosen, and two nodes from the hbn, which correspond to d and d , respectively, are also randomly chosen. if there is a relationship between d and p , node d is replaced with p . if there is a relationship between d and p , another node from the network is chosen until the mentioned conditions are satisfied. similarly, node d is replaced with d , because there are no relations between d and e (or p ). finally, the path (i.e., d p d e ) is treated as a false path.
meta path detection
in the general language understanding evaluation, the corpus of linguistic acceptability (cola) is a binary classification task, where the goal is to predict whether a sentence is linguistically acceptable or not. in addition, perozzi et al. have suggested that paths generated by short random walks can be regarded as short sentences (perozzi, alrfou, and skiena ). inspired by their work, this study assumes that true paths can be treated as linguistically acceptable sentences, while false paths can be regarded as linguistically unacceptable sentences. based on this hypothesis, we propose the meta path detection task, where the goal is to predict whether a path is acceptable or not. in the proposed selfrl, a set of true and false paths is fed into the deep transformer encoder for learning path-level representation vectors. selfrl maps a path of symbol representations to an output vector of continuous representations, which is fed into the softmax function to predict whether the path is true or false. evidently, the only distinction between true and false paths is whether there is an association between the nodes of the path sequence.
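the false path construction described above (replace fewer nodes than the path length, and only with nodes unrelated to their path neighbours) might look like the sketch below. the function name `make_false_path`, the retry limit, and the toy network are assumptions for illustration; in particular, the sketch does not force a replacement node to keep the vertex type of the node it replaces, which the text leaves open.

```python
import random

def make_false_path(true_path, all_nodes, edges, rng, max_tries=1000):
    """Corrupt a true path by replacing some nodes with nodes unrelated to their neighbours."""
    related = set(edges) | {(v, u) for u, v in edges}
    # replace at least one node and strictly fewer nodes than the path length
    k = rng.randint(1, len(true_path) - 1)
    positions = rng.sample(range(len(true_path)), k)
    false_path = list(true_path)
    for pos in positions:
        neighbours = [false_path[j] for j in (pos - 1, pos + 1) if 0 <= j < len(false_path)]
        for _ in range(max_tries):
            cand = rng.choice(all_nodes)
            unrelated = all((cand, n) not in related for n in neighbours)
            if cand != true_path[pos] and cand not in neighbours and unrelated:
                false_path[pos] = cand
                break
        # if no candidate is found within max_tries, the original node is kept
    return false_path

# hypothetical toy network
NODES = ["d1", "d2", "d3", "p1", "p2", "e1", "e2"]
EDGES = [("d1", "p1"), ("p1", "e1"), ("d2", "p2"), ("p2", "e2"), ("d3", "p2")]
TRUE_PATH = ["d1", "p1", "e1"]

false_path = make_false_path(TRUE_PATH, NODES, EDGES, random.Random(0))
print(TRUE_PATH, "->", false_path)
```

each true path yields exactly one false path here, which matches the one-to-one pairing of true and false paths stated in the text.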
therefore, the meta path detection task is, to a certain extent, an extension of link prediction. in particular, when a path includes only two nodes, meta path detection is equivalent to link prediction. for instance, judging whether a path is true or false, e.g., d s in figure b, is the same as predicting whether there is a relation between d and s . however, the meta path detection task is generally more difficult than link prediction, because it requires an understanding of long-range composite relationships between vertices of hbns. therefore, the meta path detection-based self-supervised learning task encourages the model to capture high-level structure and semantic information in hbns, thus improving the performance of link prediction.
in order to capture the local information in hbns, this study develops the vertex entity mask-based self-supervised learning task, in which nodes in true paths are randomly masked and the masked nodes are then predicted. this kind of mask task has been widely applied in natural language processing. however, using the vertex entity mask task to drive a heterogeneous biomedical network representation model has been little explored. in this work, the vertex entity mask task follows the implementation described in bert, and the implementation is almost identical to the original (devlin et al. ). in brief, % of the vertex entities in true paths are randomly chosen for prediction. for each selected vertex entity, there are three different operations for improving the generalization performance of the model. the selected vertex entity is replaced with the <mask> token % of the time, and is replaced with a random node % of the time. furthermore, it has a % chance of keeping the original vertex.
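the masking procedure can be sketched as below. note that the percentages are missing from the extracted text; the defaults used here (15% of positions selected, of which 80% are masked, 10% replaced with a random node, and 10% kept) are the values of the original bert recipe and are an assumption about selfrl. the function name `mask_path` and the toy vocabulary are likewise hypothetical.

```python
import random

MASK = "<mask>"

def mask_path(path, vocabulary, rng, select_prob=0.15, mask_prob=0.8, random_prob=0.1):
    """Return (corrupted_path, labels).

    labels[i] holds the original vertex at positions selected for prediction
    and None elsewhere. Default probabilities are BERT's, assumed for selfrl.
    """
    corrupted, labels = [], []
    for vertex in path:
        if rng.random() >= select_prob:
            corrupted.append(vertex)   # position not selected: left untouched
            labels.append(None)
            continue
        labels.append(vertex)          # the model must recover this vertex
        r = rng.random()
        if r < mask_prob:
            corrupted.append(MASK)                     # replace with the mask token
        elif r < mask_prob + random_prob:
            corrupted.append(rng.choice(vocabulary))   # replace with a random node
        else:
            corrupted.append(vertex)                   # keep the original vertex
    return corrupted, labels

vocab = ["d1", "d2", "p1", "p2", "e1"]
path = ["d1", "p1", "d2", "e1"]
corrupted, labels = mask_path(path, vocab, random.Random(1))
print(corrupted, labels)
```

only the positions with a non-None label would contribute to the cross entropy loss during training.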
finally, the masked path is used for training a deep transformer encoder model according to the vertex entity mask task, where the last hidden vectors corresponding to the masked vertex entities are fed into the softmax function to predict their original vertices with a cross entropy loss. the vertex entity mask task can keep a local contextual representation of every vertex. the vertex entity mask-based self-supervised learning task captures the local association of the vertices in hbns. the meta path detection-based self-supervised learning task enhances the global-level structure and semantic features of the hbns. therefore, the two-level representations are concatenated as the final embedding vectors, which integrate structure and semantic information of hbns from different levels, as shown in figure (f).
the model of selfrl is a deep transformer encoder, and the implementation is almost identical to the original (vaswani et al. ). selfrl follows the overall architecture that includes stacked self-attention and point-wise, fully connected layers, and a softmax function, as shown in figure .
table : the node and edge statistics of the datasets. here, ddi, dti, dsa, dda, pda, and ppi represent the drug-drug interaction, drug-target interaction, drug-side-effect association, drug-disease association, protein-disease association, and protein-protein interaction, respectively.
multi-head attention
an attention function can be described as mapping a query vector and a set of key-value pairs to an output vector. the multi-head attention is given as follows:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(h_1, \ldots, h_n)\, W^{O}$$
where $W^{O}$ is a parameter matrix, and $h_i$ is the attention function of the i-th subspace, given as follows:
$$h_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{Q_i K_i^{T}}{\sqrt{d_{k_{h_i}}}}\right) V_i, \qquad Q_i = Q W_i^{Q},\; K_i = K W_i^{K},\; V_i = V W_i^{V}$$
where $Q_i$, $K_i$, and $V_i$ respectively denote the query, key, and value representations of the i-th subspace, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are parameter matrices that transform $Q$, $K$, and $V$ into the $h_i$ subspaces, and $d$ and $d_{k_{h_i}}$ represent the dimensionality of the model and of the $h_i$ submodel.
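for readers who prefer code to notation, here is a minimal pure-python sketch of scaled dot-product multi-head self-attention as defined in the original transformer (vaswani et al. ), which selfrl is said to follow. the helper names (`matmul`, `softmax`, `attention`, `multi_head`) and the tiny identity weight matrices are assumptions for illustration; a practical implementation would use a tensor library.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """softmax(q k^T / sqrt(d_k)) v for a single head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head(x, heads, w_o):
    """heads is a list of (w_q, w_k, w_v); head outputs are concatenated, then projected by w_o."""
    outputs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv)) for wq, wk, wv in heads]
    concat = [sum((h[i] for h in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)

# hypothetical input: two tokens with model dimension 2, and two heads with identity projections
x = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
heads = [(identity, identity, identity), (identity, identity, identity)]
w_o = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # keeps only the first head
out = multi_head(x, heads, w_o)
print(out)
```

with identity projections and a `w_o` that selects the first head, the output equals single-head self-attention on `x`, which makes the sketch easy to check by hand.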
position-wise feed-forward network
in addition to multi-head attention layers, the proposed selfrl model includes a fully connected feed-forward network, which consists of two linear transformations with a relu activation function in between and is given as follows:
$$\mathrm{FFN}(x) = \max(0,\, x W_1 + b_1)\, W_2 + b_2$$
the linear transformations are the same across positions, while they use different parameters from layer to layer.
residual connection
for each sub-layer, a residual connection and a normalization mechanism are employed. that is, the output of each sub-layer is given as follows:
$$\mathrm{LayerNorm}(x + f(x))$$
where x and f (x) stand for the input and the transformation function of each sub-layer, respectively.
in this work, the performance of selfrl is evaluated comprehensively by link prediction on six datasets. the results of selfrl are also compared with the results of baseline methods. for the neodti-net dataset, the performance of selfrl is compared with those of seven state-of-the-art methods, including mscmf ( . the details on how to set the hyperparameters in the above baseline approaches can be found in neodti (wan et al. ). for the deepdr-net dataset, the link prediction results generated by selfrl are compared with those of seven baseline algorithms, including deepdr (zeng et al. ), dtinet (luo et al. ), kernelized bayesian matrix factorization (kbmf) (gonen and kaski ), support vector machine (svm) (cortes and vapnik ), random forest (rf) (l ), random walk with restart (rwr) (cao et al. ), and katz (singhblom et al. ). the details of the baseline approaches and hyperparameter selection can be found in deepdr (zeng et al. ). for the single network datasets, selfrl is compared with network representation methods, namely laplacian (belkin and niyogi ), singular value decomposition (svd), graph factorization (gf) (ahmed et al. ), hope (ou et al. ), grarep (cao, lu, and xu ), deepwalk (perozzi, alrfou, and skiena ), node vec (grover and leskovec ), struc vec (ribeiro, saverese, and figueiredo ), line (tang et al.
), sdne (wang, cui, and zhu ), and gae (kipf and welling ). more implementation details can be found in bionev (yue et al. ). the hyperparameters of the baseline methods were set to their default values, and the original data of neodti (wan et al. ), deepdr (zeng et al. ), and bionev (yue et al. ) were used in the experiments. the parameters of the proposed selfrl follow those of bert (devlin et al. ), in which the number of transformer blocks (l), the number of self-attention heads (a), and the hidden size (h) are set to , , and , respectively. for the neodti-net dataset, the embedding vectors are fed into the inductive matrix completion (imc) model (jain and dhillon ) to predict dti. the number of negative samples, which are randomly chosen from negative pairs, is ten times that of positive samples, according to the guidelines in neodti (wan et al. ). then, to reduce data bias, ten-fold cross-validation is repeated ten times, and the average value is calculated. for the deepdr-net dataset, a collective variational autoencoder (cvae) is used to predict dda. all positive samples and the same number of negative samples, randomly selected from unknown pairs, are used to train and test the model according to the guidelines in deepdr (zeng et al. ). then, five-fold cross-validation is repeated times. for the neodti-net and deepdr-net datasets, the area under the precision-recall curve (aupr) and the area under the receiver operating characteristic curve (auc) are adopted to evaluate the link prediction performance of all approaches. for the other datasets, the representation vectors are fed into a logistic regression binary classifier for link prediction; the training set ( %) and the testing set ( %) consist of equal numbers of positive samples and negative samples, the latter randomly selected from all the unknown interactions according to the guidelines in bionev.
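the negative sampling and train/test split described above can be sketched for a single (non-bipartite) network as follows. the function names `sample_negatives` and `split_samples` are hypothetical, and the `test_fraction` value is an assumption because the split percentages are missing from the extracted text. a `ratio` of 1 corresponds to the balanced setting, while a `ratio` of 10 corresponds to the ten-to-one setting stated for the neodti-net dataset.

```python
import random
from itertools import combinations

def sample_negatives(nodes, positive_edges, ratio, rng):
    """Draw `ratio` times as many unknown node pairs as there are positive edges."""
    known = {frozenset(edge) for edge in positive_edges}
    unknown = [pair for pair in combinations(sorted(nodes), 2) if frozenset(pair) not in known]
    n = min(len(unknown), int(ratio * len(positive_edges)))  # cannot exceed the unknown pairs
    return rng.sample(unknown, n)

def split_samples(samples, test_fraction, rng):
    """Shuffle and split labelled samples into (train, test)."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

# hypothetical toy network
nodes = ["a", "b", "c", "d", "e"]
positives = [("a", "b"), ("b", "c"), ("c", "d")]
rng = random.Random(0)

negatives = sample_negatives(nodes, positives, ratio=1, rng=rng)
labelled = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
train, test = split_samples(labelled, test_fraction=0.2, rng=rng)  # 0.2 is an assumed value
print(len(train), len(test))
```

for a bipartite task such as drug-target prediction, the candidate negatives would be restricted to drug-protein pairs rather than all node pairs.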
the performance of the different methods is evaluated by accuracy (acc), auc, and f score. the overall performances of all methods for dti prediction on the neodti-net dataset are presented in figure . selfrl shows great results, with auc and aupr values close to , and significantly outperforms the baseline methods. in particular, neodti and dtinet were specially developed for the neodti-net dataset. however, selfrl is still superior to both neodti and dtinet, improving the aupr by approximately % and %, respectively. the results of dda prediction of selfrl and the baseline methods are presented in figure . these experimental results demonstrate that selfrl generates better dda prediction results on the deepdr-net dataset than the baseline methods. however, the improvements of selfrl in terms of auc and aupr are less than %. a major reason for this modest advantage of selfrl over the other methods is that selfrl considers only four types of objects and edges, whereas deepdr included types of vertices and types of edges of drug-related data. in addition, deepdr specially integrated a multi-modal deep autoencoder (mda) and a cvae model to improve dda prediction on the deepdr-net dataset. unfortunately, the selfrl+cvae combination may disturb the original balance between the mda and the cvae. the above results and analysis indicate that the proposed selfrl is a powerful network representation approach for complex heterogeneous networks, and that it can achieve very promising results in link prediction. the good performance of the proposed selfrl is due to the following facts: (1) selfrl designs a two-level self-supervised learning task to integrate the local association of a node and the global-level information of hbns. (2) the meta path detection self-supervised learning task, which is an extension of link prediction, is specially designed for link prediction. in particular, path detection on two nodes is equivalent to link prediction.
therefore, the representation generated by meta path detection is able to improve link prediction performance. (3) selfrl uses meta paths to integrate the structural and semantic features of hbns.
in this section, the link prediction results on four single network datasets are presented to further verify the representation performance of selfrl. the results are listed in table , and the best results are marked in boldface. selfrl shows higher accuracy in link prediction on the four single networks than the other baseline approaches. in particular, the proposed selfrl achieves an approximately % improvement in terms of auc and acc over the second best method on the string-ppi dataset. the auc value of link prediction on the ndfrt-dda dataset is improved from . to . when selfrl is compared with grarep. however, grarep only achieves an enhancement of . compared to line, which is the third best method on the string-ppi dataset. therefore, the improvement of selfrl is significant in comparison to the enhancement of grarep over line. meanwhile, we also notice that selfrl has only a small advantage over the second best method on the ctd-dda and drugbank-ddi datasets. one possible reason for this result is that the structure and semantics of the ctd-dda and drugbank-ddi datasets are simple and monotonous, so most of the network representation approaches are able to achieve good performance on them. consequently, the proposed selfrl is a promising representation method for single network datasets, and can contribute to link prediction by introducing a two-level self-supervised learning task.
in neodti and deepdr, low-dimensional representations of nodes in hbns are first learned by network representation approaches, and are then fed into classifier models for predicting potential links among vertices. to further examine the contribution of the network representation approaches, the low-dimensional representation vectors are fed into an svm, which is a traditional and popular classifier, for link prediction.
the experimental results of these combinations are shown in table . selfrl achieves the best performance in link prediction for complex heterogeneous networks, providing an improvement of over % in auc and aupr compared with neodti and deepdr. with the change of classifier, the result of selfrl in link prediction drops from . to . on the neodti-net dataset, while the auc value of neodti drops by approximately %. interestingly, the results on the deepdr-net dataset are similar. the experimental results therefore indicate that the network representation performance of selfrl is more robust and better than that of the other embedding approaches. this is mainly because selfrl integrates a two-level self-supervised learning model to fuse the rich structural and semantic information from different views. meanwhile, path detection is an extension of link prediction, yielding better representations for link prediction. the emergence and rapid expansion of covid- have posed a global health threat. recent studies have demonstrated that the cytokine storm, namely the excessive inflammatory response, is a key factor leading to death in patients with covid- . it is therefore urgent and important to discover potential drugs that prevent the cytokine storm in covid- patients. meanwhile, it has been shown that interleukin(il)- is a potential target for the anti-inflammatory response, and drugs targeting il- are promising agents for blocking the cytokine storm in severe covid- patients (mehta et al. ). in the experiments, selfrl is used for drug repositioning for covid- , which aims to discover agents that bind to il- to block the cytokine storm in patients. the low-dimensional representation vectors generated by selfrl are fed into the imc algorithm to predict the confidence scores between il- and each drug in the neodti-net dataset. then, the top- agents with the highest confidence scores are selected as potential therapeutic agents for covid- patients. 
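the selection step described above (keeping the agents with the highest confidence scores) can be sketched as follows. this is only an illustration of the ranking step, assuming the scores are already available as a mapping from drug to score; the function name `top_k_candidates`, the drug names, and the score values are hypothetical placeholders and are not results from the study.

```python
import heapq


def top_k_candidates(scores, k):
    """Return the k (drug, score) pairs with the highest confidence scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical confidence scores between the target and each drug.
confidence = {"drug_a": 0.91, "drug_b": 0.35, "drug_c": 0.78, "drug_d": 0.12}
ranked = top_k_candidates(confidence, k=2)
```

`heapq.nlargest` avoids sorting the full drug list, which matters little here but keeps the sketch usable for larger candidate sets.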
the candidate drugs and their anti-inflammatory mechanisms of action in silico are shown in table . the knowledge from pubmed publications indicates that nine out of ten drugs are able to reduce the release and expression of il- to exert anti-inflammatory effects in silico. three drugs (i.e., dasatinib, carvedilol, and indomethacin) inhibit the release of il- by reducing the mrna levels of il- , whereas imatinib inhibits the function of human monocytes to prevent the expression of il- . in addition, although the anti-inflammatory mechanisms of action of five agents (i.e., arsenic trioxide, irbesartan, amiloride, propranolol, sorafenib) are uncertain, these agents can still reduce the release or expression of il- to exert anti-inflammatory effects. therefore, the top ten agents predicted by selfrl-based drug repositioning can be used for inhibiting the cytokine storm in patients with covid- , and should be taken into consideration in clinical studies on covid- . these results further indicate that the proposed selfrl is a powerful network representation learning approach and can facilitate link prediction in hbns. in this study, selfrl uses transformer encoders to learn representation vectors through the proposed vertex entity mask and meta path detection tasks. meanwhile, the entity-and path- table : the dti and dda prediction results of selfrl and baseline methods on the neodti-net and deepdr-net datasets. mlth and clth stand for the mean and concatenation values of the representations from the last two hidden layers, respectively. atlre denotes the mean value of the two-level representation from the last hidden layer. table . selfrl achieves the best performance. meanwhile, the results show that the two-level representations are superior to the single-level representation. 
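the two pooling variants named in the table caption, the mean (mlth) and the concatenation (clth) of the representations from the last two hidden layers, can be sketched as below. this is a minimal illustration on plain python lists, assuming each hidden layer yields one vector per node; the function names and the toy vectors are assumptions, not the selfrl implementation.

```python
def mean_pool(layer_a, layer_b):
    """MLTH-style pooling: element-wise mean of two hidden-layer vectors."""
    if len(layer_a) != len(layer_b):
        raise ValueError("hidden-layer vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(layer_a, layer_b)]


def concat_pool(layer_a, layer_b):
    """CLTH-style pooling: concatenation, which doubles the dimensionality."""
    return list(layer_a) + list(layer_b)


# Hypothetical outputs of the last two hidden layers for one node.
second_last = [1.0, 2.0]
last = [3.0, 6.0]
mlth = mean_pool(second_last, last)
clth = concat_pool(second_last, last)
```

the mean keeps the original dimensionality, while the concatenation preserves both vectors in full at the cost of a wider input to the downstream classifier.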
interestingly, for each level representation model, the concatenation of vectors from the lth layers improves link prediction performance more than the mean value of the vectors from the lth layers. this is intuitive, since the two-level representation can fuse the structural and semantic information from different views in hbns. meanwhile, a larger number of dimensions can provide more and richer information. this study proposes a two-level self-supervised representation learning approach, termed selfrl, for link prediction in heterogeneous biomedical networks. the proposed selfrl designs a meta path detection-based self-supervised learning task and integrates vertex entity-level mask tasks to capture the rich structure and semantics from two-level views of hbns. the results of link prediction indicate that selfrl is superior to state-of-the-art approaches on six datasets. in the future, we will design more self-supervised learning tasks with unlabeled data to improve the representation performance of the model. in addition, we will also develop an effective multi-task learning framework in the proposed model. 
distributed large-scale natural graph factorization drug-target interaction prediction through domain-tuned network-based inference laplacian eigenmaps and spectral techniques for embedding and clustering laplacian eigenmaps for dimensionality reduction and data representation new directions for diffusion-based network prediction of protein function: incorporating pathways with confidence deep neural network for learning graph representations grarep: learning graph representations with global structural information network representation learning with rich text information support-vector networks a survey on network embedding bert: pre-training of deep bidirectional transformers for language understanding predicting drug target interactions using meta-pathbased semantic network analysis kernelized bayesian matrix factorization node vec: scalable feature learning for networks provable inductive matrix completion attention based meta path fusion forheterogeneous information network embedding variational graph auto-encoders. arxiv:machine learning random forests deepcas: an end-to-end predictor of information cascades predicting drug-target interaction using a novel graph neural network with d structure-embedded graph representation a network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information covid- : consider cytokine storm syndromes and immunosuppression. the lancet drug?target interaction prediction by learning from local information and neighbors distributed representations of words and phrases and their compositionality asymmetric transitivity preserving graph embedding deepwalk: online learning of social representations struc vec: learning node representations from structural identity. 
in knowledge discovery and data mining nonlinear dimensionality reduction by locally linear embedding heterogeneous information network embedding for recommendation prediction and validation of gene-disease associations using methods inspired by social network analyses network embedding in biomedical data science pathsim: meta path-based top-k similarity search in heterogeneous information networks line: large-scale information network embedding neodti: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions glue: a multi-task benchmark and analysis platform for natural language understanding structural deep network embedding graphgan: graph representation learning with generative adversarial nets shine:signed heterogeneous information network embedding for sentiment link prediction semisupervised drug-protein interaction prediction from heterogeneous biological spaces self-supervised learning: generative or contrastive. arxiv doi network representation learning-based drug mechanism discovery and anti-inflammatory response against a novel approach for drug response prediction in cancer cell lines via network representation learning graph embedding on biomedical networks: methods, applications and evaluations heterogeneous network representation learning using deep learning deepdr: a network-based deep learning approach to in silico drug repositioning key: cord- - ren ie authors: lutkenhoff, evan s.; nigri, anna; sebastiano, davide rossi; sattin, davide; visani, elisa; rosazza, cristina; d’incerti, ludovico; bruzzone, maria grazia; franceschetti, silvana; leonardi, matilde; ferraro, stefania; monti, martin m. title: eeg power spectra and subcortical pathology in chronic disorders of consciousness date: - - journal: biorxiv doi: . 
/ sha: doc_id: cord_uid: ren ie objective to determine (i) the association between long-term impairment of consciousness after severe brain injury, spontaneous brain oscillations, and underlying subcortical damage, and (ii) whether such data can be used to aid patient diagnosis, a process known to be susceptible to high error rates. methods cross-sectional observational sample of patients with a disorder of consciousness secondary to brain injury, collected prospectively at a tertiary center between and . multimodal analyses relating clinical measures of impairment, electroencephalographic measures of spontaneous brain activity, and magnetic resonance imaging data of subcortical atrophy were conducted in . results in the final analyzed sample of patients, systematic associations were found between electroencephalographic power spectra and subcortical damage. specifically, the ratio of beta-to-delta relative power was negatively associated with greater atrophy in regions of the bilateral thalamus and globus pallidus (both left > right) previously shown to be preferentially atrophied in chronic disorders of consciousness. power spectrum total density was also negatively associated with widespread atrophy in regions of the left globus pallidus, right caudate, and in brainstem. furthermore, we showed that the combination of demographics, encephalographic, and imaging data in an analytic framework can be employed to aid behavioral diagnosis. conclusions these results ground, for the first time, electroencephalographic presentation detected with routine clinical techniques in the underlying brain pathology of disorders of consciousness and demonstrate how multimodal combination of clinical, electroencephalographic, and imaging data can be employed in potentially mitigating the high rates of misdiagnosis typical of this patient cohort. search terms disorders of consciousness, subcortical pathology, eeg, mri. 
lutkenhoff, owen, & monti, ) , there are no data directly connecting the patterns of eeg power spectra and subcortical damage in long-term doc patients, a gap which is not only problematic for the clinician's interpretation of the observed eeg data, but also hampers our ability to monitor, through an inexpensive, bedside, repeatable technique interventions and their effects. in what follows, we address, in a large cohort of patients with chronic doc, the heretofore untested relationship between observed electrocortical rhythms, patterns of subcortical brain atrophy (including thalamus, brainstem, and basal ganglia), and clinical measures of awareness and arousal. (giacino, kalmar, & whyte, )), (ii) neurophysiological evaluation, including resting eeg, and (iii) neuroradiological assessment, including mri (see table ). insert table about here the acquisition of both resting eeg and structural mri datasets constituted the inclusion criteria for the present study. experienced raters independently assessed each patient times with the italian version of the crs-r (sacco et al., ) . the best- recorded performance was used to classify patients as vs or mcs. as described below, patients were discarded due to the low quality of the mri data, following previously established procedures (lutkenhoff et al., ) . specifically, datasets were excluded because of excessive movement artifacts, datasets were excluded due to software failure in segmenting subcortical structures, datasets were excluded due to poor quality in the estimation of normalized brain tissue volume, and subjects were excluded due to large regions of signal dropout artifacts (e.g., implants) preventing tissue segmentation (see figure s ). importantly, as discussed below, exclusions do not bias the analyzed sample as compared to the full cohort (see also of these, % (n = ) had a traumatic brain injury (tbi) and % (n = ) had non- traumatic etiology (non-tbi). 
in particular, among non-tbi patients, % (n = ) suffered from anoxic brain injury and % (n = ) from haemorrhagic and/or ischemic brain injury. the median disease duration at the time of the study was months (range = - months). (see tables and s ). the local ethics committee approved all aspects of this research and written informed consent was obtained from the legally authorized representative of the patients prior to their inclusion in the study. data acquisition and analysis eeg data acquisition and processing patients underwent polygraphic recordings between pm and am on the following order to prevent signal degradation due to the prolonged recording and to maintain, as much as possible, the same reproducible characteristics between recordings in different patients. the selected epochs were filtered ( - hz, db/octave), followed by a hz notch filter to suppress the noise of the electrical power line, reformatted against the linked ear-lobe reference. in order to remove blink-artifacts, we applied an ica- artifact rejection algorithm. then, the selected eeg activity was divided into non- overlapping s segments and analyzed using the fast fourier transform. absolute total power and relative power were evaluated in the delta ( - hz), theta ( - hz), alpha ( ) ( ) ( ) ( ) ( ) ( ) , beta ( - hz) and gamma (> hz) bands, and averaged within each eeg channel. mri data acquisition and processing neuroimaging data were obtained with a t mri machine (achieva, philips healthcare bv, best, nl; -channel coil reconstructed into -dimensional vertex meshes, as depicted in figure . in addition, the normalized brain volume (nbv), a measure of global brain atrophy including white and gray matter volume, was calculated for each patient using fsl siena months post-injury as fixed variables and subjects as the random variable. 
as described below, the significant interaction between eeg features and diagnosis was followed up with one mixed-model analysis per each eeg feature (using the same model, albeit without the eeg feature as repeated variable). individual mixed-models were followed up with pairwise post-hoc comparisons between diagnostic groups (i.e., vs, mcs-, mcs+), with Šidák correction for multiplicity. eeg -mri analysis in the second analysis, we related eeg spectral features to subcortical shape measures. positively on the total power for each electrode (henceforth, total power component). finally, the last three components appeared to capture diffuse statistical covariance between electrodes, although with a preference for loading, respectively, positively on the delta band and negatively on the alpha band in right hemispheric electrodes (henceforth, variables, in a general linear model predicting local shape patterns (e.g., atrophy). sex, age, time-post-injury, etiology (i.e., tbi vs non-tbi), were included as covariates, along with nbv (to ensure that observed tissue displacement reflect local subcortical shape changes independent of overall brain atrophy ( in this analysis, we related the patients' behavioral presentation, as captured by the crs-r subscales, with subcortical atrophy. because of significant correlations between the subscales of the crs-r (i.e., the desired independent variables), behavioral data were entered into a pca performed analogously to the one described above. the analysis returned components with an eigenvalue greater than , cumulatively explaining . % of the variance. the three components loaded on, respectively, the auditory, visual, and arousal subscales (henceforth, audio-visual- arousal component), the motor subscale (henceforth, motor component), and the oromotor and communication subscales (henceforth, oromotor-communication component). 
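the Šidák correction used for the pairwise post-hoc comparisons between diagnostic groups can be sketched as follows. three groups (vs, mcs-, mcs+) give three pairwise comparisons; the family-wise alpha of 0.05 and the example p-values are assumptions made for illustration and are not values reported in the study.

```python
def sidak_alpha(alpha, m):
    """Per-comparison significance threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p_values):
    """Sidak-adjusted p-values for a family of len(p_values) comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Three pairwise comparisons: vs vs. mcs-, vs vs. mcs+, mcs- vs. mcs+.
per_test_alpha = sidak_alpha(0.05, 3)
# Hypothetical raw p-values for the three comparisons.
adjusted = sidak_adjust([0.01, 0.2, 0.5])
```

comparing a raw p-value with `per_test_alpha` is equivalent to comparing the adjusted p-value with the family-wise alpha.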
as in the previous analysis, the three components were entered as independent variables in a general linear model predicting subcortical shape, along with the same covariates described above. group-level significance was assessed identically to the previous analysis. predicting doc level from eeg spectral features finally, we employed a binary logistic regression to evaluate the degree to which eeg results the mixed-model analysis revealed a significant interaction (f ( , . ) = . , p < . ) between diagnostic group (i.e., vs, mcs-, mcs+) and eeg features (i.e., total power, delta, theta, alpha, beta, gamma frequency bands; see figure and table ), along with a significant main effect of diagnosis (f ( , . ) = . , p = predicting doc level from eeg spectral features as shown in figure a and table , diagnosis (i.e., vs vs. mcs) was predicted significantly better by the model including eeg components, overall brain atrophy (including both white matter and gray matter), and demographic components (i.e., age, and specificity (sens/spec; . , . , respectively) than both other models (auc = . , sens/spec . / . and auc = . , sens/spec . / . for the demographics and atrophy and the demographic only models, respectively). notably, the contribution of increasingly complex models (i.e., adding brain atrophy and eeg components) is to increase the model's sensitivity to mcs (at the cost of a loss of specificity). insert table about here finally, in terms of individual variables, as shown in figure b and could correctly classify patients across the conscious/unconscious line (as behaviorally defined) with ∼ % success, leveraging on demographic information (i.e., sex, age), overall brain atrophy, and eeg features (i.e., total power and θ /δ components). 
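sensitivity and specificity figures such as those reported for the diagnostic models can be derived from predicted and behaviorally assigned labels as sketched below. treating mcs as the positive class follows the text's description of sensitivity to mcs; the function name and the toy labels are illustrative assumptions, not patient data.

```python
def sensitivity_specificity(y_true, y_pred, positive="MCS"):
    """Return (sensitivity, specificity) with `positive` as the target class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical behavioral diagnoses and model predictions.
y_true = ["MCS", "MCS", "MCS", "VS", "VS", "VS", "VS", "MCS"]
y_pred = ["MCS", "MCS", "VS", "VS", "VS", "MCS", "VS", "MCS"]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

a model that labels more patients as mcs raises sensitivity and tends to lower specificity, which is the trade-off noted for the more complex models.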
it is also noteworthy that in our data combination of eeg and mri data is additive in terms of enhancing discriminability across groups, strengthening the idea that multimodal approaches are a desirable way of assaying different -and complementary -aspects of finally, in evaluating the above results, the reader should be mindful of some limitations. first, our results are skewed by survivor bias effects; we might thus be representing a spectrum of impairment which, while severe, excludes the even greater damage present in patients who do not survive later than a year post injury. the same is true with respect to the fact that datasets had to be excluded from the collected sample. while it would have been ideal to be able to retain more of the sample, conventional quality control limited our ability to analyze the full dataset. nonetheless, the final analyzed sample is similar to that of other mr-based work in the field (e.g., ( such inferences from a small subset of our data, it is conceivable that mcs-patients could have a more "rigid" rhythmic eeg less susceptible to variations as compared to mcs+ patients. third, due to significant correlations across channels within and across power bands, in order to perform the regression analyses presented above, we had to first reduce the independent variables by means of a pca. while conventional, it does affect the interpretation of our results in as much as we cannot directly assess whether the effects we report in mixed component (e.g., the β /δ component) are principally due to either frequency or to their combination. nonetheless, each of the three ratio components correlates strongly with the "raw" ratio of each pair of frequencies (calculated by taking the ratio of the average relative power across all channels; specifically, r= . , r= . , and r= . 
for β /δ, α /δ, and θ /δ, respectively) as well as with each numerator and denominator variable (albeit numerically more with the numerator for β /δ and α /δ components; see table s ). this suggests that our results can be reasonably interpreted as reflecting actual ratios in relative power. fourth, gamma frequencies are known to often contain residual muscle artifacts. we decided to keep them in the analysis mainly because, even if they do contain artifacts, including this component still contributes to explaining variance in the signal. had we not included it, any variance across patients due to motion would have de facto been subsumed by the unexplained variance term. in this sense, our approach is analogous to the conventional inclusion of motion parameters in functional mri analysis. fifth, while we report associations between brain damage in subcortical regions and eeg spectral features, this does not necessarily imply that the pinpointed areas are themselves the generators of specific oscillatory rhythms at rest. finally, it should also be pointed out that the present work did not include an additional independent sample to confirm the classification results, thus inviting caution in the extrapolation of the results towards new cohorts. as multi-site efforts increase, larger cohorts of high-quality data will permit full application of such approaches. in conclusion, this work bridges different levels of analysis of patients surviving severe brain injury, uniting doc brain pathology ( potential conflicts of interest the authors declare no conflicts of interest. ethical standards the authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the helsinki declaration of , as revised in . funding sources this work was supported by the italian ministry of health (research grant rf- - 
dual function of thalamic low-vigilance state oscillations: rhythm- regulation and plasticity intrinsic functional connectivity differentiates minimally conscious from unresponsive patients human consciousness is supported by dynamic complex patterns of brain signal coordination early detection of consciousness in patients with acute severe traumatic brain thalamic and extrathalamic mechanisms of consciousness after severe brain injury thalamic atrophy in antero-medial and dorsal nuclei correlates with six-month outcome after severe brain injury optimized brain extraction for pathological brains (optibet) the subcortical basis of outcome and cognitive impairment in tbi: a longitudinal cohort study thalamo-frontal connectivity mediates top-down cognitive functions in disorders of consciousness willful modulation of brain activity in disorders of consciousness functional connectivity of eeg is subject-specific, associated with phenotype, and different from fmri the neural correlates of lexical processing in disorders of and appearance for subcortical brain segmentation basal ganglia control of sleep-wake behavior and cortical activation multimodal study of default-mode network integrity in disorders of consciousness significance of multiple neurophysiological measures in patients with chronic disorders of consciousness sleep patterns associated with the severity of impairment in a large cohort of patients with chronic disorders of consciousness accurate, robust, and automated longitudinal and cross-sectional brain change analysis diagnostic precision of pet imaging and functional mri in disorders of consciousness: a clinical validation study default network connectivity reflects the level of consciousness in non-communicative brain-damaged patients disentangling disorders of consciousness: insights from diffusion tensor imaging and machine learning key: cord- -iwpiti h authors: harrison, angela r.; moseley, gregory w. 
title: the ebola virus interferon antagonist vp undergoes active nucleocytoplasmic trafficking date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: iwpiti h viral interferon (ifn) antagonist proteins mediate evasion of ifn-mediated innate immunity and are often multifunctional, having distinct roles in viral replication processes. functions of the ebola virus (ebov) ifn antagonist vp include nucleocapsid assembly during cytoplasmic replication and inhibition of ifn-activated signalling by stat . for the latter, vp prevents stat nuclear import via competitive binding to nuclear import receptors (karyopherins). many viral proteins, including proteins from viruses with cytoplasmic replication cycles, interact with the trafficking machinery to undergo nucleocytoplasmic transport, with key roles in pathogenesis. despite established karyopherin interaction, the nuclear trafficking profile of vp has not been investigated. we find that vp becomes strongly nuclear following overexpression of karyopherin or inhibition of nuclear export pathways. molecular mapping indicates that cytoplasmic localisation of vp depends on a crm -dependent nuclear export sequence at the vp c-terminus. nuclear export is not required for stat antagonism, consistent with competitive karyopherin binding being the principal antagonistic mechanism while export mediates return of nuclear vp to the cytoplasm for replication functions. thus, nuclear export of vp might provide novel targets for antiviral approaches. importance ebola virus (ebov) is the causative agent of ongoing outbreaks of severe haemorrhagic fever with case-fatality rates between and %. proteins of many viruses with cytoplasmic replication cycles similar to ebov interact with the nuclear trafficking machinery, resulting in active nucleocytoplasmic shuttling important to immune evasion and other intranuclear functions. 
however, exploitation of host trafficking machinery for nucleocytoplasmic transport by ebov has not been directly examined. we find that the ebov protein vp is actively trafficked between the nucleus and cytoplasm, and identify the specific pathways and sequences involved. the data indicate that nucleocytoplasmic trafficking is important for the multifunctional nature of vp , which has critical roles in immune evasion and viral replication, identifying a new mechanism in infection by this highly lethal pathogen, and potential target for antivirals. hek t cells ( figure s ). thus, gfp-vp can localise into the nucleus in complexes with k , indicating that cytoplasmic localisation, which is required for roles in nucleocapsid assembly/condensation ( , , ), derives from active nuclear export. can diffuse through the npc and lacks nlss or ness, was diffusely localised between the dependent ness ( figure a , figure s a ). importantly, the fn/c for gfp-vp - in lmb- treated cells was higher than that for gfp alone ( figure c ), indicative of accumulation. thus, vp localisation appears to be dynamic, involving nuclear entry and rapid nuclear export via crm interaction. 
to determine which of the predicted ness is/are responsible for nuclear export, we generated constructs to express truncated vp proteins comprising n-terminal (vp - ), central (vp - ) and c-terminal (vp - ) portions fused to gfp; each of these contained one or more of the potential ness ( figure a ). the truncated proteins were designed to be of similar length and to avoid disruption of key structural elements (e.g. alpha helices and beta sheets), based on the vp crystal structure ( ). all proteins were predominantly cytoplasmic at steady state ( figure b ). localisation of the n-terminal fragment was largely unaffected by lmb treatment, and lmb produced only a small (≤ . fold) increase for the fn/c of the central fragment ( figure b ,c). in contrast, a consistent and substantial increase (> fold) in the fn/c for the c-terminal fragment was observed following lmb treatment. vp - also displayed a consistently reduced fn/c at steady state compared with the other truncated proteins. thus, it appeared that prominent discrete crm -dependent nes activity is located in the c-terminal region of vp . notably, only full-length vp displayed accumulation into the nucleus following lmb treatment, with all truncated proteins remaining significantly less nuclear than gfp alone. this suggests that the full protein sequence is required for efficient nuclear accumulation, such that truncations remove key sequences or otherwise impact conformation to affect important interactions. the crystal structure of vp bound to k indicates that three regions contact the k (cl and cl / , separated by - residues, figure a induced only a small increase in fn/c for gfp-ul nls-vp - ( figure a ,b), suggestive of cytoplasmic retention or nuclear export mediated largely via an alternative mechanism to crm -dependent export. however, lmb induced substantial nuclear localisation of gfp- ul nls-vp - (> . 
fold increase in fn/c; figure b ) that clearly exceeded nuclear localisation of gfp-vp - ( figure c ), consistent with a classical crm -dependent nes counteracting the activity of the heterologous ul nls. the fn/c for gfp-vp - was also markedly increased by lmb treatment but did not attain an fn/c similar to that of full- required for efficient nuclear localisation. nevertheless, these data clearly indicate that vp contains classical crm -dependent nes activity and that the principal nes is within vp - . the c-terminal crm -dependent nes is the principal sequence mediating nuclear export of vp to confirm that the c-terminal nes is the major sequence driving crm -dependent export of vp , we used site-directed mutagenesis to disable the nes motif. analysis of the vp c- terminal region identified residues - (comprising the c-terminal residues) as containing a sequence strongly conforming to a nes ( to further examine effects of altered vp nuclear trafficking on stat responses, we assessed nuclear import of stat using clsm analysis of cos cells expressing gfp-vp and immunostained for stat following treatment without or with ifn- and/or lmb. in agreement with results of the luciferase reporter assays, we observed that despite substantial re-localisation of gfp-vp to the nucleus in lmb-treated cells, ifn--dependent stat nuclear localisation remained clearly inhibited ( figure s ). together, these data indicate that nuclear export of vp is not required for inhibition of stat responses, consistent with k binding representing the major antagonistic mechanism. thus, it appears that active in this study we have shown that ebov vp undergoes active trafficking between the nucleus and cytoplasm involving crm -dependent nuclear export via a nes at the vp c-terminus. 
the acquisition of active nuclear trafficking sequences is consistent with a requirement for highly regulated/dynamic localisation; furthermore, since vp is reported to oligomerise (potentially as tetramers) ( ), it is likely that active nuclear trafficking is required for transport of vp multimers. the identified nes was not resolved in vp crystal structures ( , , ) but localisation at the c-terminal end would be consistent with exposure and accessibility to crm ( ), and the predominantly cytoplasmic localisation of gfp-vp in resting cells suggests that the nes is the dominant trafficking signal at steady state. intriguingly, previous studies indicated that a mutated vp protein defective for k-binding was more cytoplasmic than wt protein ( ). this would be consistent with karyopherin binding mediating import; one might thus speculate that vp would require export mechanisms to enable cytoplasmic localisation/functions. our findings are the first to confirm this is the case. notably, the ebov matrix protein vp has also been reported to localise to the nucleus in infected and transfected cells ( , ); however, a direct role for active trafficking pathways to regulate localisation, distinct from mechanisms such as diffusion or interaction with other host factors, has not been defined. thus, our data provides, to our knowledge, the first direct demonstration of a filovirus protein exploiting specific host trafficking machinery for nucleocytoplasmic transport, identifying a new mechanism in infection by these highly lethal pathogens. although the nucleus is not directly involved in the replication processes of most rna viruses, proteins of a number of these viruses are reported to encode nuclear trafficking sequences, indicative of a requirement for dynamic regulation or specific accumulation in particular compartments. 
for example, the rabv ifn antagonist p protein encodes several nlss and ness ( , , - ), with regulatory mechanisms including co-localisation or overlap of the sequences, enabling co-regulation by mechanisms including phosphorylation ( - ). although our data identify the c-terminal nes as a principal determinant of nucleocytoplasmic localisation of full-length vp , the differential localisation and lmb sensitivity of vp - and vp - , and the finding that vp - does not recapitulate nuclear accumulation of full-length vp , suggest the presence of alternative regulatory sequences/mechanisms, potentially exposed by truncation. for example, vp is reported to associate with membranes ( ), which might result in tethering within the cytoplasm under certain conditions. interestingly, a recent study reported that sumoylation of residue k of vp enhances k binding and ifn antagonistic function ( ). in contrast, ubiquitination, including at residue k within cl (figure a ), appears to negatively regulate ifn antagonist activity ( ). intriguingly, k is distal to cl - but is within a predicted nes motif (figure a) . whether mechanisms will be of interest in defining the processes controlling immune evasion and replication by ebov. while some viral ifn antagonists use ness to facilitate immune evasion, including through mislocalisation of associated stats ( , ), vp uses a mechanism of competitive binding to ks. our finding that vp nuclear export is not required for stat antagonism is consistent with this, and indicates that export relates to cytoplasmic roles including in nucleocapsid assembly and transport ( - ). the requirement for efficient translocation out of the nucleus is consistent with interaction of vp with k (see above), that underpins distinct functions in immune evasion. 
this is further supported by our finding that the c-terminal nes motif is conserved among vp of several filovirus species that have been shown to bind to ks or have conserved cl sequences ( , , ), but not in mabv vp ( figure e , figure s c thus, targeting vp regulatory mechanisms, including its nuclear export, may provide novel targets for anti-ebov drug design. the construct to express the minimal nls from human cytomegalovirus ul protein (residues - ) fused to gfp was generated by subcloning from pepi-gfp-ul - ( , ) into the pegfp-c vector c-terminal to gfp (clontech). constructs to express full-length or truncated ebov-vp protein fused to gfp or gfp-ul nls were generated by pcr to express flag-tagged k was a kind gift from c. basler (georgia state university). other constructs have been described elsewhere ( ). discovery of an ebolavirus-like filovirus in europe filoviruses: ecology, molecular biology, and evolution descriptive analysis of ebola virus proteins infection of naive target cells with virus-like particles: implications for the function of ebola virus vp knockdown of ebola virus vp impairs viral nucleocapsid assembly and prevents virus replication the assembly of ebola virus nucleocapsid requires virion-associated proteins and and posttranslational modification of contribution of ebola virus glycoprotein, nucleoprotein, and vp to budding of vp virus-like particles both matrix proteins of ebola virus contribute to the regulation of viral genome replication and transcription ebola virus (ebov) vp inhibits transcription and replication of the ebov genome vp is a molecular determinant of ebola virus virulence in guinea pigs a single phosphorodiamidate morpholino oligomer targeting vp protects rhesus monkeys against lethal ebola virus infection fn/c) for gfp (mean ± sem, n  cells for each condition; results are from a single assay representative of three independent assays). 
statistical analysis (student's t-test) was performed using graphpad prism software a) schematic of full- length vp and truncated vp proteins generated. location of potential ness are shown in yellow. location of clusters (cl - ) of residues that interact with ks in the vp :kα complex crystal structure ( ) are shown in red. numbering indicates residue positions in full- length vp ; sequences of potential ness are shown above. (b) cos cells representative images are shown. (c) images such as those shown in b were analysed to calculate the fn/c for gfp (c; mean ± sem; n  cells for each condition statistical analysis used student's t-test. **** from a single assay representative of two independent assays). statistical analysis used student's t-test the authors declare that they have no conflicts of interest with the contents of this article. key: cord- -vodwptdo authors: altshuler, anna; amitai-lange, aya; tarazi, noam; dey, sunanda; strinkovsky, lior; bhattacharya, swarnabh; hadad-porat, shira; nasser, waseem; imeri, jusuf; ben-david, gil; tiosano, beatrice; berkowitz, eran; karin, nathan; savir, yonatan; shalom-feuerstein, ruby title: capturing limbal epithelial stem cell population dynamics, signature, and their niche date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vodwptdo stem cells (scs) are traditionally viewed as rare, slow-cycling cells that follow deterministic rules dictating their self-renewal or differentiation. it was several decades ago, when limbal epithelial scs (lscs) that regenerate the corneal epithelium were one of the first sporadic, quiescent scs ever discovered. however, lsc dynamics, heterogeneity and genetic signature are largely unknown. moreover, recent accumulating evidence strongly suggested that epithelial scs are actually abundant, frequently dividing cells that display stochastic behavior. 
in this work, we performed an in-depth analysis of the murine limbal epithelium by single-cell rna sequencing and quantitative lineage tracing. the generated data provided an atlas of cell states of the corneal epithelial lineage, and particularly, revealed the co-existence of two novel lsc populations that reside in separate and well-defined sub-compartments. in the “outer” limbus, we identified a primitive widespread population of quiescent lscs (qlscs) that uniformly express krt /gpha /ifitm /cd proteins, while the “inner” limbus hosts prevalent active lscs (alscs) co-expressing krt -gfp/atf /mt - /socs . analysis of lsc population dynamics suggests that while qlscs and alscs possess different proliferation rates, they both follow similar stochastic rules that dictate their self-renewal and differentiation. finally, t cells were distributed in close proximity to qlscs. indeed, their absence or inhibition resulted in the loss of quiescence and delayed wound healing. taken together, we propose that divergent regenerative strategies are tailored to properly support tissue-specific physiological constraints. the present study suggests that in the case of the cornea, quiescent epithelial scs are abundant and follow stochastic rules and neutral drift dynamics. stem cells (scs) differ from any other cells by their substantial ability to remain in an undifferentiated state, self-duplicate or enter a differentiation program, depending on tissue demand . scs have been successfully applied in clinical trials to reconstitute the bone marrow, treat skin burns, and reverse corneal blindness . however, various fundamental features of scs are still under debate or remain unknown. for example, sc prevalence in the niche, sc heterogeneity and self-renewal mechanisms are not well understood, and consequently, scs remain enigmatic cells . these key challenges hamper the progression towards advanced application of scs for regenerative medicine.
seminal studies supported a deterministic and hierarchical model in which scs were viewed as slow-cycling cells that are located in a specialized microenvironment known as the niche . such quiescent scs (qscs) are commonly believed to be rare cells that are surrounded by their progeny, which are abundant, fast-cycling but short-lived progenitor cells . key evidence for the quiescence and scarcity of scs came from indirect observations in vivo and in vitro. sporadic slow-cycling cells were identified as nucleotide-label retaining cells in pulse-chase experiments performed in various tissues including the cornea , bone marrow , brain , skin , gut and skeletal muscle . analyses of clonal survival/growth dynamics in vivo (i.e. by genetic lineage tracing of single cells at the sc compartment) or ex vivo (i.e. by colony formation assay) typically detect a low number of clones that survive for long periods of time. this evidence shaped the traditional deterministic model that considered qscs as very potent cells that are also very scarce. the benefits of quiescence may be paramount and include reduced biochemical damage associated with the production of toxic agents or biosynthesis of macromolecules, and minimized accumulation of disease-causing mutations . despite much effort in the field, markers that reliably identify genuine and scarce qscs were usually not found. proposed markers typically labeled a contiguous widespread cell population in the sc compartment . nevertheless, the co-existence of qscs and actively dividing scs (ascs) in specific tissues including the bone marrow, brain, hair follicle and muscle has been shown . these tissues differ from traditional epithelial sc models by their significantly slower turnover (or resting phase) . in recent years, however, frequently dividing scs were identified in epithelial tissues including the gut , epidermis and esophagus .
these studies proposed an alternative, virtually opposing, stochastic model that attributes to scs entirely contrary properties including short cell cycle, abundance in the niche and unpredictable survival rate . one of the common models that is used to capture this phenomenon is that of asymmetric self-renewal with neutral drift dynamics . these findings challenged the conventional paradigm that considered quiescence as an integral feature of scs. it is likely that the appropriate regenerative strategy is tailored for each tissue depending on the physiological necessities and challenges (e.g. irradiation, toxicity, cellular turnover or lineage diversity). in this sense, the corneal epithelium is an interesting case study, in which scs and their progeny must support tissue transparency and protection from sun irradiation, toxicants, physical injury and invading microorganisms . indeed, most ocular surface epithelial cancers are restricted to the corneal sc compartment of the limbus. however, the rareness of limbal tumors suggests the existence of specialized mechanisms that may provide protection against neoplastic cell transformation. the cornea is an excellent model for sc research hallmarked by segregation of scs, short-lived progenitors and differentiated cells into distinct anatomical compartments. the clarity and high accessibility of the cornea allow multi-color ("confetti") fluorescent lineage tracing , assessing different sc hypotheses , as well as sc or progenitor cell depletion using vital microscopy . corneal epithelial scs reside in the limbus, a ring-shaped region at the corneal-conjunctival boundary. the limbal microenvironment (niche) markedly differs from that of the cornea. it contains blood and lymph vasculature whereas the cornea is avascular, it is hallmarked by a unique extracellular matrix with soft biochemical properties, and it hosts specialized niche cells . rare label retaining cells were confined to the limbus .
in line, colony formation tests revealed that only a low percentage of limbal cells could form large clones ("holoclones") in vitro, a feature that is commonly attributed to bona fide limbal scs (lscs) . based on these observations, it is widely believed that true lscs are scarce limbal epithelial cells kept most of the time in quiescence state and are surrounded by their early progeny which are abundant, fast dividing but short-lived progenitor cells. tremendous efforts have been made by many research groups to discover markers for the identification of lscs. keratin (krt ) , c/ebpdelta and bmi , abcb , abcg and p represent a partial list of genes proposed to identify lscs. recently, we have shown that the krt -gfp transgene (green fluorescent protein coding gene under the promoter of krt ) labeled a discrete population of murine lscs. krt -gfp + basal limbal epithelial cells were located at close proximity to the site of corneal regeneration origin, as evident by lineage tracing of krt + cells . however, neither krt -gfp, nor any of the proposed markers could faithfully mark a hypothetical rare, slow-cycling sc population in the limbus, suggesting that the current model may be inaccurate. consequently, the prevalence of lscs, their genetic signature, and regulation by niche cells remain poorly defined. here we combined single-cell transcriptomics and quantitative lineage tracing to capture lsc populations and their signature. this work provides a useful atlas of the entire murine corneal epithelial lineage, including the signature of a slow cycling quiescent lsc (qlsc) state, and finally, unravels an apparently tight interaction and regulation of lscs by t cells. careful inspection of the limbus of krt -gfp transgene bearing mice showed that krt -gfp + cells are confined to an "inner" limbal zone, suggesting that the "outer" limbal region (the space between krt -gfp + cells and the krt + conjunctiva) contains a different lsc population (fig. a, s a) . 
to clarify the molecular nature and heterogeneity of lsc populations, we performed a single-cell transcriptomic analysis. in order to facilitate the detection of rare cell populations, reduce variability and gain robust statistical data, we preferentially isolated epithelial cells from the limbus of individual adult mice ( . months old, n= eyes). the limbus together with marginal conjunctiva and corneal periphery were carefully dissected (fig. a) , a protocol for isolating mainly epithelial cells was applied (see methods), and suspended cells were subjected to x chromium single-cell rna sequencing (scrna-seq). initial raw data analysis and quality controls were performed with cell ranger software and a large number of , high quality cells passed rigorous quality tests exhibiting significant mean reads ( , per cell) as well as a substantial median number of , detected genes per cell. in silico analysis revealed discrete cell states in the corneal epithelial lineage cell filtering and unbiased clustering were performed with r package seurat revealing cell populations displaying a markedly distinct signature of gene expression (fig. b, s b) . analysis of lineagespecific markers suggested that > % of the cells are ocular epithelial cells (see methods). next, we performed in silico analysis of putative markers for each epithelial tissue (i.e. conjunctiva, limbus, and cornea) and cell layer (i.e. basal or supra basal layer) ( fig. c -j, s c-d). conjunctival cells: clusters - were identified as conjunctival cells based on the positive expression of putative conjunctival cytokeratins (krt , krt , krt (fig. c) and krt a, krt , krt (fig. s c) ), and the lack or comparatively low expression of corneal markers krt and slurp and of the protein phosphatase (ppp r c) (fig. d ). 
further analysis of putative markers of basal (itgb , itgb , ccnd (cyclin d ) ) and supra basal (cld , dsg a, cdkn a (p ) ) cells indicated that cluster represents a population of basal conjunctival cells while cluster denotes supra-basal conjunctival cells (fig. g-h) . krt was highly expressed by conjunctival basal cells (cluster ) and even higher in conjunctival suprabasal cells (cluster ), whereas krt displayed basal cell enrichment in conjunctival cells (fig. i) . limbal cells: clusters - were identified as limbal cells based on the lack or lower expression of conjunctival (fig. c , s c) and corneal (fig. d ) markers. while cluster displayed a supra-basal cell phenotype, clusters - were identified as limbal basal cells (fig. g -h) expressing a set of new limbal specific markers (fig. e-f ). as compared to cluster , cluster was hallmarked by much higher levels of krt and krt (fig. i ) and lower krt (fig. d) , suggesting that it represents the most primitive undifferentiated limbal cell population. corneal cells: although a minimal basal expression level of corneal specific krt mrna was found in all groups, clusters - expressed much higher levels of krt (krt hi ) (fig. d) . clusters - expressed basal corneal epithelial cell markers although cluster appeared to include cells that seemingly initiated differentiation, as evident by attenuation of basal epithelial cell markers at the expense of supra-basal cell markers ( fig. g-h) . likewise, analysis of basal/supra-basal cell markers suggested that cluster represents partially differentiated cells, most likely corneal wing cells, whereas cluster represents terminally differentiated superficial cells (fig. g-h) . in line with previous reports , while krt and krt -gfp labeled basal conjunctival/limbal epithelial cells, they also marked supra-basal cells of the conjunctiva, limbus and marginal cornea periphery (fig. i) . interestingly, the signal of endogenous krt did not overlap with that of krt -gfp (fig.
l ), in line with previous reports on epidermis . this suggests that the genomic insertion site of k -gfp transgene, the fractional krt promoter cloned, lack of krt coding sequence or presence of gfp sequence, may have deviated the transcriptional regulation of the krt -gfp transgene away from that of endogenous krt . cells in mitosis: interestingly, clusters - were found to represent cells from all tissues (conjunctiva, limbus, cornea (figs. c-f)) that were hallmarked by various cell cycle genes including mki , top a, ccna (cyclin a ) (fig. j) . the grouping of this cell population of mixed lineages into discrete clusters indicated that the changes in expression of genes related to mitosis were profound and dominated the differential expression of tissue-specific genes. further analysis of cell cycle genes, suggested that cluster represents cells that were captured at early stage of mitosis (late g -s) while cells of cluster were netted at relatively more advanced stages (s, g and m) of the cell cycle (fig. s d) . two novel limbal sub-compartments: to certify the cluster identification in vivo, we performed in situ hybridization (ish) on tissue sections to assess the anatomical localization of selected clusters. eyes of - months old c bl/ mice were enucleated and tissue sections were stained as detailed in methods. the conjunctiva and cornea were labeled using ish probes for keratin (krt ) and krt , respectively. in order to identify the location of the limbal clusters - , we focused on gpha and atf that displayed relatively high mrna copies. interestingly, this analysis revealed novel well-demarcated sub-compartments in the limbus. the "outer limbus" was occupied by thin epithelium comprised of small basal epithelial cells with flattened morphology that expressed gpha mrna. the basal layer cells of the "inner limbus" expressed atf mrna and displayed flattened cell morphology that gradually became cuboidal towards the corneal periphery (fig. k) . 
in agreement, immunostaining showed that the outer limbal basal epithelial cells were krt + /ifitm + /cd + whereas inner limbal epithelial cells were krt -gfp + /atf + /mt - + (fig. l) . it was noticeable that the two limbal sub-compartments are thin. moreover, the outer limbus tissue was occasionally absent or truncated in tissue sections. therefore, to keep the tissue intact and facilitate efficient analysis, we established wholemount immunostaining for antibodies that were found suitable. figure m -o, shows that antibodies against gpha , ifitm and cd specifically labeled the outer limbus while krt -gfp and mt labeled the inner limbus by wholemount immunostaining. for consistency and further analyses, we defined conventions for the identification of the outer and inner limbus. the outer limbus markers (gpha , ifitm or cd ) displayed uniform and sharp signal and were well-correlated with slow proliferation and clonal growth dynamics (figs - ) . hence, these markers hereafter were chosen to define the outer limbus, as illustrated in fig. o . next, the inner limbus was defined as the limbal zone that was negative to outer limbal markers, and its boundary with the peripheral cornea was demarcated by inner limbus markers (e.g. krt -gfp in fig. o ). accordingly, quantitative analysis showed that the average width of the outer limbus was ~ m while the inner limbus was much wider (~ m) (fig. p) . interestingly, the pattern of krt -gfp signal was often continuous, or nearly continuous with outer limbus markers like gpha , but never overlapped with the labeling of the outer limbus ( fig. n-o) . interestingly, by microscopic analysis of wholemount tissues, it was noticed that flattened basal cells found in both sub-compartments were hallmarked by a unique pattern of nuclei staining that differed from that of the cuboidal basal cells that were detected in the marginal inner limbus and corneal periphery ( fig. s e-f ). 
finally, analysis of selected keratin coding genes confirmed that basal outer limbal epithelial cells were krt hi /krt hi /krt med /krt neg , basal inner limbal epithelial cells were krt neg /krt med /krt neg /krt low , while conjunctival cells were krt hi /krt hi /krt hi /krt med /krt neg (fig. s c, g-h) . in silico analysis of clusters: next, pathway enrichment analysis using webgestalt was performed with the ora method using kegg as the functional database in order to identify potential pathways that were enriched in clusters or in comparison to all other clusters. interestingly, sc and cancer pathways including pi k and p were enriched in cluster while cluster was enriched with tnf pathways and multiple annotations for cancer (fig. s ) . additional analysis revealed a list of differentially expressed genes expressed by clusters or ( fig. s a-b) . expectedly, genes related to hemi-desmosomes increased in clusters of basal cells (clusters , - , - ), and desmosome-related and gap junction associated genes increased in clustered supra-basal cells (clusters , , - ) (fig. s c) . interestingly, high levels of the extracellular matrix (ecm) components lamb and crim (cysteine-rich motor neuron ), that regulates bmp and inhibits cell proliferation , were found in the outer limbus cluster , whereas col a was higher in the inner limbus cluster . taken together, this data illuminates the genetic signature of diverse cell states across the corneal epithelial lineage, and in particular, reveals the co-existence of two discrete limbal sub-compartments consisting of two distinct limbal epithelial cell populations. to further uncover the nature of outer and inner limbal epithelial cells, functional enrichment analysis (webgestalt with ora and kegg) shed light on differences between clusters and . interestingly, among the top functional categories, identifiers involved cell proliferation while involved cell movement (fig. 
s a) , suggesting that a central difference between the limbal clusters and lies in their cell cycle and motility characteristics. we thus aimed to gain insights regarding proliferation, clonal dynamics, centripetal movement, and discover the potential hierarchy between all clusters and within each limbal zone. for that purpose, we designed a quantitative genetic lineage tracing experiment for tracking all cell populations including potentially rare lsc populations, in a clonal and inducible manner. we established double transgenic ubc-cre ert ; brainbow . mice that facilitated the tamoxifen inducible activation of cre recombinase in all cell types in an efficient and unbiased manner under the control of the ubiquitin c (ubc) promoter (fig. s b ). in the absence of tamoxifen, cre is expressed at the cytoplasm of all cell types of this mouse, thus unable to access the dna. however, upon transient exposure to tamoxifen, cre recombinase translocates into the nucleus where it can randomly induce dna excision and/or inversion of the loxp sites in the brainbow . cassette, thus, allowing stochastic and irreversible expression of one out of four "confetti" fluorescent proteins (namely, nuclear green (gfp), cytoplasmic red (rfp), cytoplasmic yellow (yfp) or membrane cyan (cfp) fluorescent protein) (fig. s c) . two-month-old mice were treated with tamoxifen for - consecutive days to induce efficient "confetti" labeling in all cell populations. for quantitative analysis of the limbal sub-compartments, wholemount staining of gpha was performed to identify the outer limbus ( nm to avoid overlap with confetti colors and dapi) while the contiguous m gpha -negative limbus was considered as inner limbus for calculations ( fig. s d) . interestingly, at all time-points, the distribution of clones size (i.e., the number of basal cells in a clone) in the outer limbal epithelial zone was significantly lower compared to those found in the inner limbus and the peripheral cornea. 
moreover, the average clonal size in the inner limbus was smaller than the size of clones in the corneal periphery ( fig. a-d, e) . two months post induction, a pattern of radial stripes emerged already in clones that held a "foot" in the inner limbus (fig. b) . four months post induction, when the pattern of fully developed radial stripes appeared, most stripes extended from the inner limbus and only a few confetti stripes were sourced from the outer limbus (fig. d, j) . the detection of stripes starting from both outer and inner limbal zones after very long-term tracing suggests that both limbal zones contain lscs. the uniform expression pattern of markers and clonal growth dynamics implied that the outer limbal epithelial cells might follow stochastic rules and neutral drift . a main prediction of this approach is that the size of the clone would grow linearly with time . in this case, the slope of this linear trend is proportional to the product of doubling rate and the probability of symmetric division in the epithelial plane (methods). our results suggest that clone size dynamics in all regimes can be captured by a linear growth and that the slopes are significantly different and ordered centripetally: each regime has a slope which is about twice as high as that of its neighboring zone ( fig. f-g) . another prediction of the neutral drift model is that the clonal size distributions scale with their averages. that is, the distributions of clone sizes normalized by their means should be similar. fig. h shows the probability of having a clone that is larger than some value (one minus the cumulative probability distribution) for the normalized size distributions. all regimes show clear scaling of their clone size for all time points. furthermore, the neutral drift model predicts that the number of clones should decrease inversely with the clone size (fig. i ). in our case, there is a clear difference between the three different zones tested.
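the two neutral drift predictions described above (linear growth of mean clone size, and collapse of the size distributions once each is normalized by its mean) can be illustrated with a minimal sketch. this is not the authors' analysis code; the function names and the example clone sizes are hypothetical and chosen only for illustration.

```python
def linear_slope(times, mean_sizes):
    """Least-squares slope of mean clone size versus time (hypothetical helper)."""
    n = len(times)
    t_mean = sum(times) / n
    s_mean = sum(mean_sizes) / n
    num = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, mean_sizes))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def normalized_survival(clone_sizes, thresholds):
    """Fraction of clones whose size, divided by the mean size, exceeds each threshold.

    This is one minus the cumulative distribution of the normalized clone size.
    """
    mean = sum(clone_sizes) / len(clone_sizes)
    normalized = [s / mean for s in clone_sizes]
    return [sum(1 for x in normalized if x > t) / len(normalized) for t in thresholds]


# Hypothetical clone sizes (basal cells per clone) at two time points.
early = [1, 1, 2, 4]
late = [2, 2, 4, 8]
thresholds = [0.75, 1.5]

# Under scaling, the normalized survival curves at both time points coincide.
print(normalized_survival(early, thresholds))
print(normalized_survival(late, thresholds))
```

in this toy input the later clones are exactly twice as large as the earlier ones, so the normalized curves are identical; with real data the curves would only be expected to overlap within sampling noise.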
while in the periphery the expected inverse relationship between size and number is similar to that of neutral drift, in the outer and inner regimes after four months the reduction in the number of clones is smaller than expected by that model. taken together, this data suggests that the outer limbus and inner limbus contain a population of scs that have similar stochastic dynamics but different doubling times. if the outer lscs are the most primitive undifferentiated cells positioned at the tip of the cellular hierarchy, the inner limbus may host slightly committed lscs. to further assess the hierarchical relations between clusters in the corneal epithelial lineage, we plotted the clusters using linear dimensional reduction using the seurat algorithm for limbal/corneal epithelial clusters (clusters [ ] [ ] [ ] [ ] [ ] [ ] [ ] . this algorithm infers the sequential changes of gene expression between clusters and thereby provides insights for the dynamics of biological processes, in this case cell differentiation. in line with lineage tracing data, this analysis suggested that cluster (outer limbal basal cells) gives rise to cluster (inner limbal basal cells) that derive clusters - (corneal basal cells) and clusters - (corneal wing and superficial cells), while cluster cells (limbal superficial) that are indeed very similar to corneal superficial cells were positioned last (fig. k) . taken together, the combined quantitative lineage tracing and in silico analysis illuminates the dynamics and hierarchy between corneal epithelial cell populations. to rigorously test lsc growth dynamics by an alternative and direct approach, we next examined the cycling properties of outer and inner basal limbal epithelial cells by staining of ki which labels cells that are at different stages of cell division (late g , s, g , m). in agreement, only . ± . % of outer limbal basal cells were ki + , while . ± . % of inner limbal basal cells and . ± . 
% of corneal peripheral basal cells were ki + ( fig. a-b ). to corroborate this data, we performed a -ethynyl- ´-deoxyuridine (edu) nucleotide analogue incorporation assay to identify cells undergoing dna replication (s-phase). as illustrated in fig. c , -month-old mice were injected intraperitoneally with edu, six hours later mice were sacrificed and cells that were in the s-phase of cell division during the interval were identified as edu + cells by whole mount staining. expectedly, edu incorporation by outer limbal cells was much less frequent, as compared to that of basal epithelial cells at the inner limbus or peripheral cornea (fig. d) . to examine the ability of outer limbal cells to enter active cell cycle in response to corneal injury, we performed large ( . mm) corneal epithelial debridement following edu injection. evidently, -hours post injury, numerous outer (and inner) limbal cells entered cell cycle and were edu positive. taken together, this set of experiments suggests that the outer lscs are slow-cycling cells, but they can quickly enter cell cycle and participate in corneal healing. to further examine the presence of slow cycling cells in the outer limbus, we performed an edu pulse-chase experiment. as described in fig. e , mice were injected with edu twice a day for days ("pulse") followed by a "chase" period of -days in the absence of edu. during the chase period the label declines by % with every cell division and the label becomes undetectable after a few successive division rounds (~ ), depending on assay type and sensitivity . as expected, edu + label retaining cells were predominantly found in the outer limbus, whereas rare cells were occasionally found in the inner limbus (fig. f) , suggesting that during the -day chase period most quiescent outer lscs underwent sufficient division rounds to lose labeling and only a few, which underwent fewer replications, could be detected.
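the label-dilution reasoning behind the pulse-chase experiment can be sketched as follows. the sketch assumes that the label is halved at every division (the standard expectation for a nucleotide analogue under semi-conservative replication; the exact percentage is not legible in this text) and uses a hypothetical detection threshold, chase length and cycle times. it is an illustration of the logic only, not the authors' quantification.

```python
def label_fraction(divisions):
    """Fraction of the initial label left after a number of divisions,
    assuming the label is halved at each division (assumption)."""
    return 0.5 ** divisions


def divisions_until_undetectable(detection_fraction):
    """Smallest number of divisions after which the label falls below the
    detection threshold (threshold is a hypothetical assay parameter)."""
    n = 0
    while label_fraction(n) >= detection_fraction:
        n += 1
    return n


def retains_label(chase_days, cycle_days, detection_fraction):
    """True if a cell dividing once every `cycle_days` is still detectable
    after a chase of `chase_days`."""
    divisions = chase_days / cycle_days
    return label_fraction(divisions) >= detection_fraction


# Hypothetical numbers: a slow-cycling cell keeps its label, a fast one does not.
print(retains_label(chase_days=30, cycle_days=15, detection_fraction=0.05))
print(retains_label(chase_days=30, cycle_days=3, detection_fraction=0.05))
```

the same logic explains why only the slowest-dividing cells remain detectable at the end of a long chase: the number of divisions a cell can undergo before losing its label is small, so detectability depends mainly on the ratio of chase length to cycle time.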
to validate that label retaining cells were epithelial cells, we performed co-staining of edu + label retaining cells with krt antibody (fig. s h) . the relative homogeneity in the expression of gpha /ifitm /cd by the outer lscs, together with the absence of ki associated with broad slow growth of clones in the outer limbus, implied that the vast majority, perhaps the entire, basal cell population in the outer limbal zone divides infrequently. to gain direct evidence on cell cycle of each population and evaluate the rate of cell division in each zone we performed a double nucleotide-analogue injection. as depicted in fig. g, . hours following injection of iododeoxyuridine (idu, red), edu (green) was injected, and minutes later tissues were harvested, stained and analyzed (fig. h-i) . as detailed in methods, s-phase length was first calculated as the ratio between the number of cells that were idu + /edu + (cells that were engaged with, but have not completed, dna replication) and the number of cells that were idu + /edu - (cells that were in s-phase after idu injection but have completed dna replication already before edu injection), multiplied by the interval length ( . hours). the estimated division frequency was calculated as the ratio between the total number of cells and the number of cells in s-phase, multiplied by the s-phase length . this analysis suggests that outer limbal basal cells divide on average every ~ days, inner limbal basal cells divide every ~ days while corneal peripheral basal cells divide every ~ . days (fig. i) . taken together, this data strongly suggests that the outer limbus contains a predominant equipotent krt + /gpha + /ifitm + /cd + cell population of qlscs while the inner limbus is occupied by a widespread equipotent population of alscs. cellular quiescence is linked with reduced biochemical activity including transcriptional activity . indeed, the cells in cluster (qlscs) displayed the lowest value of rna reads per cell (fig. j) .
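the double-label calculation described above can be written out as a small worked example. the two formulas follow the verbal description in the text; the cell counts, the injection interval and the choice of which cells count as "in s-phase" are hypothetical placeholders, since the actual values are not legible here.

```python
def s_phase_length(n_idu_edu_pos, n_idu_only, interval_hours):
    """S-phase length = (IdU+/EdU+ cells) / (IdU+/EdU- cells) * interval.

    IdU+/EdU+ cells were still replicating at the second injection;
    IdU+/EdU- cells had left S-phase during the interval.
    """
    if n_idu_only <= 0:
        raise ValueError("need at least one IdU+/EdU- cell to estimate S-phase length")
    return n_idu_edu_pos / n_idu_only * interval_hours


def division_interval(n_total_cells, n_s_phase_cells, s_phase_hours):
    """Average time between divisions = (total cells / cells in S-phase) * S-phase length."""
    if n_s_phase_cells <= 0:
        raise ValueError("need at least one S-phase cell")
    return n_total_cells / n_s_phase_cells * s_phase_hours


# Hypothetical counts for one zone (not data from the study).
ts = s_phase_length(n_idu_edu_pos=60, n_idu_only=15, interval_hours=1.5)
tc = division_interval(n_total_cells=1000, n_s_phase_cells=50, s_phase_hours=ts)
print(ts, tc / 24)  # S-phase length in hours, division interval in days
```

a zone in which few cells are in s-phase at any moment therefore yields a long estimated interval between divisions, which is the pattern reported for the outer limbus.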
previously, bmp was linked with epithelial sc quiescence while wnt repressed bmp pathway and induced sc activation . in agreement, cells of cluster (qlscs) preferentially expressed the bmp-regulated transcription factors id /id and the wnt inhibitor, sfrp , while wnt a was higher in cluster of alscs (fig. k) . interestingly, cluster expressed cyclin d and the tumor suppressor trp , both of which were linked with growth arrest induced by various stimuli. the relevance of qlsc to cell therapy is significant, therefore, we next explored the usefulness of the newly identified markers of murine qlscs to human. immunofluorescent staining revealed that krt , ifitm and gpha are expressed by basal limbal epithelial cells (fig. a) . similar to staining of murine epithelium ( fig. - ) , the labeling pattern of qlsc markers was not sporadic but rather wide, marking a large population of basal limbal epithelial cells. interestingly, however, gpha expression was drastically down regulated to hardly detectable levels upon cultivation of lscs. repression of gpha (esigpha ) resulted in total inhibition of gpha expression (fig b) . this implies that gpha expression depends on niche specific signal that was absent in vitro, and that gpha is tightly associated with a slow cycling feature that is not maintained in vitro. nevertheless, krt and ifitm were highly expressed by undifferentiated limbal epithelial cells in vivo and in vitro (fig. c ). in addition, calcium induced differentiation enhanced krt and reduced krt , ifitm and p (fig. c) . in a previous report, ifitm was linked with localization to endo-membranes where it protected embryonic scs from viral infection . in line, ifitm signal was confined to cellular vesicle structures in the cytoplasm of cultivated limbal cells (fig. d) . to test whether ifitm influences stemness and differentiation, we performed a knock down experiment. 
to achieve efficient and specific knockdown, we used endoribonuclease-prepared silencing rna (esirna), in which enzymatic digestion of long double-stranded rnas produces a mixture of multiple short silencing fragments, thereby enhancing specificity and efficiency. efficient knockdown of ifitm was evident at the rna and protein levels ( fig. e-f) . moreover, repression of ifitm resulted in a significant reduction in krt , and an increase in the expression of the differentiation-associated corneal keratin krt (fig. e-f ). taken together, this set of experiments suggests that ifitm positively controls the undifferentiated state of human qlscs, and that the newly identified qlsc markers (ifitm , gpha ) can be used to identify human qlscs.
t cells serve as niche cells for qlsc regulating cell proliferation and wound closure
the striking segregation of qlscs to the outer limbus suggests that this cell state must be regulated by the defined local microenvironment (niche). interestingly, the outer limbus was clearly demarcated by well-structured blood/lymph vasculature (fig a) . thin, well-organized blood vessels typically penetrated the marginal inner limbus but never reached the corneal periphery. lymph vessels were characteristically present in the outer but not the inner limbus. additionally, the limbus was highly populated with cd + immune cells, among them many cd + cells that displayed dendritic cell morphology, both in the stroma and in the epithelium (fig. s a) . interestingly, we identified a population of t cells in the outer limbus, many of which expressed the regulatory t cell markers cd and foxp (fig. b, s a) while others expressed the cytotoxic t cell marker cd (fig. s b) , suggesting that qlscs may be regulated by t cells.
next, we explored the limbus of two laboratory mouse strains that fail to produce mature t and b lymphocytes, namely, severe combined immunodeficiency (scid) and non-obese diabetic scid (nod/scid), with balb/c mice serving as a control group. similar to the c bl/ genetic background (fig - ) , the outer limbus of balb/c mice was hallmarked by gpha + /cd + /ifitm + cells that infrequently expressed ki (fig. b-d, s a -c, s e). curiously, an extensive reduction of gpha and cd proteins to barely detectable levels was found in scid mice (fig. b-c) as well as in nod/scid mice (fig. s c) , while other qlsc markers (e.g. ifitm ) were unaffected (fig. s b) . moreover, a higher index of ki labeling was found in the outer limbus (as well as the inner limbus) of scid mice (fig. c-d) , which also displayed mild epithelial thickening (fig. s d) . in agreement, similar defects were recapitulated in nod/scid mice (fig. s c) . since no b cells (cd + ) were detected in the limbus (fig. s b) , we assumed that this phenotype is caused by the absence of t cells in scid mice. to further address the involvement of t cells in qlsc regulation, we explored athymic nude-foxn nu mice that lack t cell development. here too, the qlsc markers gpha and cd were dramatically attenuated while ifitm was maintained, strengthening the case for the involvement of t cells in qlsc regulation (fig. s e) . finally, we aimed to substantiate the t cell-qlsc crosstalk in adulthood, and to exclude the possibility of developmental failure. to this end, we repressed the ocular immune system by topically applying the corticosteroid dexamethasone (fig. s f) , or alternatively and more specifically, inhibited regulatory t cells by sub-conjunctival injection of anti-cd antibody (pc . ). intriguingly, -days post antibody injection, cd and gpha drastically decreased whereas cell proliferation increased (fig. e-f ), strongly suggesting that t cells directly regulate quiescence in the outer limbus.
similar results were obtained following dexamethasone treatment (fig. s f) . finally, to link t cell regulation with qlsc functionality, we performed corneal epithelial debridement and followed epithelial closure by fluorescein dye penetration. as shown in fig. g -h, immunodeficient mice displayed delayed wound closure. taken together, this set of experiments suggests that t cells, most likely cd +/foxp + regulatory t cells, serve as a niche for qlscs and play a critical role in quiescence maintenance, control of epithelial thickness and wound healing. in , cotsarelis and lavker reported the identification of rare slow-cycling cells in the limbus . since then, over the last -decades, much attention has been given to identifying such rare cells; however, no reliable marker has yet been found. here we captured the signature of primitive qlscs and propose that these cells are more prevalent than previously estimated. we showed evidence that qlscs, which populate the basal layer of the outer limbus, quite uniformly express krt + /gpha + /ifitm + /cd + . the estimated cell cycle length of ~ days for qlscs fits well with quantitative lineage tracing, double nucleotide incorporation experiments, and nucleotide pulse-chase label retention assays. indeed, the limbal/corneal epithelium becomes uniformly labeled following -weeks of nucleotide pulse , suggesting that the cell cycle length of both limbal and corneal epithelial cells is shorter than days. additionally, assuming ~ days for qlscs, the -day chase interval in the absence of edu (fig. e-f ) should theoretically allow ~ division rounds for outer qlscs, and it is therefore not surprising that most outer limbal basal cells lost labeling while only a few retained a low edu signal. in other words, these data fit well with a model of abundant qlscs. likewise, this model is supported by the quantitative analysis of ki staining and edu incorporation after a short pulse (fig.
a-d) as in both cases, the outer limbus showed widespread negativity for labeling. our working model, which posits an abundance of equipotent qlscs, fits well with a stochastic sc model that was proposed to describe sc dynamics in the epidermis, esophagus and gut epithelium . in these tissues, abundant equipotent scs that were shown to be extremely fast cycling are at the tip of the cellular hierarchy. epidermal and esophageal scs divide every - days while gut epithelial scs divide once a day, much faster than qlscs, which divide every days. this suggests that under distinct physiological conditions, tissues display tailored regenerative strategies that reflect a cost-benefit tradeoff. the two new lsc populations were segregated into well-defined limbal sub-compartments. this conclusion is supported by the observations that the outer and inner limbal cells differentially express specific markers and engage distinct pathways (fig. , s - ) , and display distinct clonal growth and proliferation dynamics (fig. - ) and niche components (fig ) . the lineage tracing experiments imply that alscs play a key role in cornea replenishment under homeostasis, as most stripes emerged from this region. by contrast, a key feature of qlscs is their ability to rapidly exit the dormant state and enter the cell cycle in response to injury (fig. ) . the cell cycle length of qlscs is only approximately twice as long as that of alscs, which were estimated to divide every ~ days. however, this frequency of cell division is higher than some estimates for other tissue-specific slow-cycling scs . the reason for such a drastic difference in sc cycling frequencies, and its importance, are poorly understood. in this sense, one cannot exclude the possibility that an even slower cell population co-exists in the outer limbus; however, if it does, these cells must be very rare.
moreover, sub-fractionation of cluster of qlscs does not provide statistically relevant subclusters, again, suggesting the cells in the cluster were relatively highly homogenous. the analysis of the clone size distributions showed that they scale with the average number of clones in all the regions. this suggests that the underlying stochastic dynamics of all the regions are similar. the clone size dynamics are consistent with the neutral drift dynamics model and suggest that the doubling time in the inner limbus is twice the doubling time in the outer limbal regime, which is consistent with the cell cycle length measurements. in all regimes, the number of clones is going down with time, as expected. however, analyzing the quantitative relation between clone size and clone number reveals a striking difference between the outer and inner limbal regimes and the periphery. while all zones exhibit clone size distributions that are consistent with the neutral drift model, the decrease in the inner and outer regimes is slower than expected by this model. this saturation in clone number decrease rate might indicate that the clonal neutral competition is limited in these areas. this could be due to the tight spatial boundary conditions of the limbal regimes. our working model (fig. ) is in agreement with the hematopoietic, hair follicle , neural and muscle lineages, all of which incorporate qscs that can produce ascs . what are the benefits and costs of having two distinct sc states? what is the mechanism that controls the transition between sc states? is this model universal or otherwise only applies for particular tissues? which physiological constraints influence sc division rates? interestingly, epithelial scs of the gut proliferate once a day , while epidermal and esophageal scs divide every ~ days. 
in this context, it is tempting to hypothesize that the transparency of the cornea and its location at the front of the eye make it an extremely hazardous environment for scs, which must be well guarded, especially because they must constantly proliferate to replenish the corneal center with new cells. this must entail fundamental considerations for sc well-being, such as localizing the scs at the most well-protected zone (outer limbus) and endowing scs with infrequent division. the quiescent state of scs is believed to be accompanied by a reduction in dna replication, metabolism, gene transcription and protein translation . notably, these processes were associated with molecular damage due to the production of toxic agents or errors in the biosynthesis of macromolecules . in line with this, attenuation of these activities by sc dormancy was proposed to delay sc aging . moreover, the reduced cumulative number of cell replications in quiescent scs may significantly attenuate the accumulation of mutations. surprisingly, however, qscs employ an error-prone mechanism, non-homologous end joining, to repair their dna . by contrast, frequently dividing scs engage a non-error-prone mechanism, homologous recombination, for repairing dna breaks , and a frequent division rate was shown to be required for the activation of repair pathways . in line with this, frequently dividing scs were identified in many tissues including the gut , epidermis , and esophagus . these cumulative findings revised the old dogma that considered quiescence an integral feature of scs. they also suggested that epithelial scs are not quiescent and that the high turnover of such tissues is associated with a single pool of active scs. the present study refines this conclusion and highlights the possibility that qscs play an important role in the corneal epithelial lineage and potentially in other epithelial tissues that may benefit from a qsc reservoir.
the regulation of quiescence is only partly understood, however, it clearly involves interactions with niche extracellular matrix proteins, secreted factors, and niche resident cells. at the molecular level, cell cycle inhibitors (e.g. p , p and p ), tumor suppressor genes (e.g. retinoblastoma and p ) and soluble factors (e.g. bmp , dkk- , sfrp, wif and shh) were found to positively regulate quiescence, while cytokines and wnt activators (e.g. noggin) were associated with the exit from quiescence state and the transition into active sc state . the bioinformatic analysis of limbal clusters suggests that qlsc engage pi k pathway, the bmp-regulated transcription factors id / , the tumor suppressor gene trp , cyclin d and the wnt inhibitor srf , all of which were linked with growth arrest. of interest, alscs were enriched for tnf, wnt and ap- pathways (fig. s - ) that were linked with proliferation or sc activation. interestingly, atf which is a structural protein of ap- complex was shown to repress id and enhance epithelial cell proliferation . future studies will address the mechanism by which these genes are regulated by outer/inner limbus specific ecm factors (fig. s ) , metabolites or vasculature related factors, or influenced by niche cells (e.g. t cells). likewise, future experiments will be needed to illuminate the mechanism that controls the quiescence and the transition into active state. the maintenance of lsc specific states, and the transition between them is most likely driven by niche cells (e.g. t cells, keratocytes, vasculature), ecm and/or biomechanical forces. it is therefore not surprising that quiescence (as well as gpha expression) is lost ex vivo, whereas culture conditions are sub-optimal and lacking essential components of the niche. 
the co-localization of t cells with qlscs strongly implies that these cells may secrete cytokines to control lsc quiescence, in line with previous studies where immune cells were identified as niche cells for epithelial scs . in the clinic, corticosteroids such as dexamethasone are widely used to prevent uncontrolled corneal inflammation and neo-vascularisation, as these agents are considered highly effective drugs for nonspecific suppression of inflammation. however, it has been shown that anti-inflammatory steroids must be used only for short periods, as they may have adverse effects including delayed corneal wound healing . the awareness and recognition of such adverse effects is of particular importance given the observed interference with t cell-qlsc crosstalk. future studies will be needed to illuminate these interactions and translate this knowledge into optimized treatments. moreover, this knowledge may facilitate optimal qlsc cultivation for curing blindness in patients who suffer from "lsc deficiency" . we identified a set of markers for detecting qlscs that may allow identifying and perhaps purifying qlscs. what is the functional role of these proteins? do they control quiescence and/or maintain stemness? what controls their specific expression by qlscs? gpha (glycoprotein hormone alpha- ) was found here to be uniquely expressed by qlscs but its role is still unclear. transgenic overexpression of gpha in mice showed no gross phenotype . interestingly, we report a drastic reduction of gpha levels in cases where the niche was disturbed, for example, when lscs were disconnected from the niche and grown in vitro, or following immune cell repression or absence in vivo. in both cases, quiescence was lost, suggesting that gpha may be regulated by t cells and may control the quiescent cell state.
by contrast, the interferon-induced transmembrane protein (ifitm ) was still expressed in the absence of immune cells in vivo and in vitro, suggesting that it is regulated by other niche means, not via a t cell-induced pathway, and that it does not control cell proliferation. interestingly, both ifitm and cd reside in cellular endo membranes . moreover, ifitm is known for its antiviral specific functions , providing protection against entry and replication of several types of viruses including the covid- virus . ifitm was shown to be highly expressed in embryonic, neural and mesenchymal scs, suggesting that it may confer anti-viral activity to qlscs. interestingly, it was shown that ifitm was involved in b cell related leukemia and control embryonic germ cells . such evidence hints that ifitm may provide protection and control important functions of scs. no major developmental disorder was reported regarding ifitm knockout mice, although the cornea or lscs have not been investigated in that study . emb (embigin), an adhesion molecule that marked qlscs (fig. s a) , was shown to regulate hematopoietic sc quiescence and localization . furthermore, emb was linked with quiescence of hair follicle scs while gpha and ifitm were shown to be expressed in epidermal scs in the g state . future studies will be needed to illuminate the role of these genes in sc homeostasis and pathology, and to test whether their manipulation can contribute to advancing lsc therapy. in summary, this report provides a useful atlas that uncovers the main corneal epithelial cell populations, capturing the signature and the niche of quiescent and activated lsc states. these data open new research avenues for studying the mechanisms of cell proliferation and differentiation as well as the applications of lscs in regenerative medicine. ten eyes of . months old krt -gfp mice were enucleated, the limbus (with marginal conjunctiva and peripheral cornea) was dissected (~ . 
mm), tissues were pooled and incubated in trypsin (x biological industries) for min °c μl. supernatant was collected into ml (rpmi (biological industries) % chelated fetal calf serum). trypsinization was repeated for cycles adding fresh trypsin in each cycle. cell suspension was centrifuged ( min at g), re-suspended and filtered using cell strainer (danyel biotech) to achieve cells/μl. an rna library was produced according to the x genomics protocol (chromium single cell ' library & gel bead kit v , pn- ) using input cells. single cell separation was performed using the chromium single cell a chip kit (pn- ) . the rnaseq data was generated on illumina nextseq , bp paired-end reads, high-output mode (illumina, fc- - ) according to x recommendations read - bp and read - bp. cell ranger (version . . ) was used for primary analysis. transcripts were mapped to the mm reference genome with the addition of the egfp gene sequence. downstream analyses including clustering, classifying single cells and differential expression analyses were performed using r package -seurat (version . . ) . low-quality cells (> % mitochondrial umi counts, or < or > expressed genes ) and genes (detected in < cells) were excluded, and eventually , genes across , cells were analyzed. in silico analysis of non-epithelial cell markers confirmed the absence (> . %) of corneal endothelial cells (clrn , pvrl , gpc , slc a , slc a , grip , hrt d, irx ) , corneal stromal keratocytes (vim, thy , s a ), goblet cells (muc , muc ac, muc b) , melanocytes (dct, mitf, ptgs, tyrp , melana, tyr), t cells (cd d, cd e, cd , g mb), b cells (cd , cd a, cd b). the only non-epithelial cells identified in our dataset were six cells that expressed mhc ii (h -ab + ), of them were cd c + /cd + or cd c + /cd + dendriticlike cells, others were cd + /adgre + or cd + /cd + macrophage-like. 
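the quality-control step described above (excluding cells by mitochondrial fraction and by number of expressed genes, and excluding rarely detected genes) can be illustrated in plain python. this is a toy sketch, not the seurat code used in the study: the thresholds are left as parameters because the actual values are missing from the text, the function names are hypothetical, the "mt-" prefix for mouse mitochondrial genes is the usual naming convention, and counting gene detection across all input cells before cell filtering is an assumption.

```python
def passes_cell_qc(counts, max_mito_fraction, min_genes, max_genes):
    """Check one cell, given as a dict mapping gene name -> UMI count."""
    total = sum(counts.values())
    if total == 0:
        return False
    mito = sum(c for gene, c in counts.items() if gene.lower().startswith("mt-"))
    n_genes = sum(1 for c in counts.values() if c > 0)
    return mito / total <= max_mito_fraction and min_genes <= n_genes <= max_genes


def filter_cells_and_genes(cells, max_mito_fraction, min_genes, max_genes, min_cells_per_gene):
    """Return (kept_cells, kept_genes) after cell-level and gene-level filtering.

    ASSUMPTION: gene detection is counted across all input cells, before
    low-quality cells are removed.
    """
    detected_in = {}
    for counts in cells:
        for gene, c in counts.items():
            if c > 0:
                detected_in[gene] = detected_in.get(gene, 0) + 1
    kept_genes = {g for g, n in detected_in.items() if n >= min_cells_per_gene}

    kept_cells = []
    for counts in cells:
        if passes_cell_qc(counts, max_mito_fraction, min_genes, max_genes):
            kept_cells.append({g: c for g, c in counts.items() if g in kept_genes})
    return kept_cells, kept_genes


# toy data with invented counts, for illustration only
toy_cells = [
    {"Krt14": 5, "Krt12": 3, "mt-Co1": 1},   # passes
    {"Krt14": 1, "mt-Co1": 9},               # fails: 90% mitochondrial counts
    {"Krt14": 2, "Gpha2": 4},                # passes
]
kept_cells, kept_genes = filter_cells_and_genes(
    toy_cells, max_mito_fraction=0.2, min_genes=2, max_genes=10, min_cells_per_gene=2
)
```

in this toy example two of the three cells survive, and only the genes detected in at least two input cells are retained.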
analysis of pan epithelial genes (expression of krt , krt , cdh , epcam and cldn ) confirmed that most (> %) cells were ocular epithelial cells. for pathway analysis, a list of differentially expressed genes between chosen clusters was analyzed using webgestalt . over-representation enrichment analysis (ora) was used for pathway analysis with kegg functional database. animals, tissue preparation and staining: animal care was according to the arvo statement for the use of animals in ophthalmic and vision research. r r-brainbow . (# ), k -cre ert (# ), krt -gfp (# ) and ubc-creert (# ) were from jackson laboratories (bar harbor, me). to induce cre recombinase activity, mg/day tamoxifen (t , sigma, st. louis, mo) dissolved in corn oil was intraperitonealy injected ( μl) for - consecutive days, as previously reported . nod/scid, scid, nude, and balb/c were from envigo rms ltd (israel). dexamethasone (sigma) was dissolved in pbs ( . %) and administered as drops ( µl) on ocular surface of anesthetized ( % isoflurane) mice, for minutes every hours for consecutive days. for wounding, mice were anesthetized ( % isoflurane) and injected intramuscular with analgesics buprenorphine ( . mg/ml, µl). central corneal wounding ( mm diameter) was performed using an ophthalmic rotating burr (algerbrush) under fluorescent binocular. the wounded corneas were stained ( % fluorescein), and the wounded area was imaged and quantified (nis-elements analysis d). for cd + foxp + regulatory t cell depletion, subconjunctival injection of µl ( µg) anti-cd antibody (clone:pc , anticd il- rα) or rat igg (control) (biocell invivomab) was performed. single intraperitoneal injection of μl ( . mg/ml) edu (sigma) was performed and hours later tissues were processed. for pulse-chase, edu was injected every hours for consecutive days and tissues processed after additional -days. 
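the over-representation enrichment analysis (ora) mentioned above for pathway analysis is commonly computed as a one-sided hypergeometric test. the sketch below shows that test using only the standard library; it is an illustration of the general technique and assumes, rather than confirms, that webgestalt's ora corresponds to it. the function name and the numbers in the usage note are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def ora_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_background: genes in the reference set
    n_pathway:    reference genes annotated to the pathway
    n_selected:   differentially expressed genes submitted
    n_overlap:    submitted genes that belong to the pathway

    Returns P(X >= n_overlap) when n_selected genes are drawn at random
    without replacement from the background.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

for instance, with a background of 10 genes of which 3 belong to a pathway, finding both of 2 submitted genes in that pathway has probability 3/45 under random sampling. real analyses use backgrounds of thousands of genes and adjust the resulting p-values across all pathways tested.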
isolated corneas were fixed ( % pfa, hour) and stained (click-it, invitrogen) according to the manufacturer's instructions followed by whole mount staining protocol (described above) for other markers staining. for the calculation of length cycle (tc) . mg/ml idu (abcam) was injected and . hours later, . mg/ml edu was injected and tissues harvested minutes later. the calculation of length cycle (tc) was calculated according to formula described by staining, imaging and quantifications: for paraffin sections, eyes were fixed (overnight, % formaldehyde), washed briefly with phosphate buffered saline (pbs), incubated in increasing ethanol, xylene and paraffin ( °c) and then embedded in paraffin blocks, as previously described . sections ( μm) were prepared, antigen retrival performed with unmasking solution (vector laboratories, h ), blocked ( . % tween and . % gelatin, hours), incubated with primary antibody (overnight, °c), secondary antibodies (invitrogen, : , hours) followed by ′, -diamidino- -phenylindole (dapi) staining, and mounting (thermo scientific). for h&e (sigma) staining, sections were rehydrated, incubated with hematoxylin ( minutes), washed (tap water, minutes), stained with eosin ( minutes), dehydrated, incubated with xylene ( minutes) and mounted (thermo scientific). for frozen sections, eyes were fixed ( hours, % formaldehyde), washed briefly (pbs), incubated in % sucrose (overnight) and embedded in optimal cutting temperature compound (oct). sections ( μm) were fixed ( % paraformaldehyde), permeabilized ( . %, tritonx- , minutes), blocked ( % bsa, % gelatin, . % normal goat serum, . % normal donkey serum, . % tritonx- , hour), incubated with primary antibody (overnight, °c), secondary antibodies ( : , hour), dapi and mounting (thermo scientific). for whole mount, corneas were isolated, fixed ( % formaldehyde, hours, room temperature), permeabilized ( . % triton, hours), blocked ( . % tritonx- , % normal donkey serum, . 
% bsa, hour), primary antibody (overnight, °c on a shaker), secondary antibodies ( : , hour), followed by dapi, tissue flattening under a dissecting binocular and mounting (thermo scientific), as previously described . images were acquired using a x/ . m plan-apochromat objective on an lsm airyscan laser scanning confocal system (zeiss, oberkochen, germany) or by a nikon eclipse ni-e upright microscope using a plan apo λ x objective. for computerized quantification analyses we used nis-elements analysis d software. the outer limbus was marked by gpha or cd staining, the adjacent µm (or the entire inner limbus if k -gfp was used) of the inner limbus was calculated, while µm of peripheral cornea was quantified, and different fields were calculated from each cornea (n= - ). in vitro experiments: human limbal rings from cadaveric corneas were obtained under the approval of the local ethical committee and in accordance with the declaration of helsinki. cell cultivation, differentiation, transfection, western blot and real time polymerase chain reaction (qpcr) analyses were performed as previously described . we used esirna against egfp (ehuegfp, sigma), ifitm (ehu , sigma) or gpha (ehu , sigma) and cells were collected - days after transfection. primers for qpcr were: gapdh (gccaaggtcatccatgacaac, ctccaccaccctgttgctgta), ifitm (ctgggcttcatagcattcgcct, agatgttcaggcacttggcggt), krt (gacggagatcacagacctgag, ctccagccgtgtctttatgtc), krt (tgaatggtgaggtggtctca, tttcagaagggcaaaaagga). clone size and number analysis: as described previously , one of the main predictions of the neutral drift dynamics model is that the average number of cells in a given clone as a function of time is given by, where ρ is the fraction of basal cells that divide, r is the probability of symmetric division and λ is the division rate. the number of clones is expected to decrease with time as, . thus, the relation between the clone size and clone number is given by . a linear fit of the clone size was performed using the matlab linear regression model.
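because the expressions for clone size and clone number did not survive in the text above, the following sketch does not reproduce the authors' formulas. it shows the textbook behaviour of a critical birth-death process, which is the usual mathematical core of neutral drift models: if dividing cells are gained and lost through symmetric divisions at the same rate rλ, the fraction of surviving clones falls as 1/(1 + rλt) while the mean number of dividing cells per surviving clone grows as 1 + rλt. the function names, the parameter values in the usage note, and the long-time approximation rλt/ρ for total basal clone size are assumptions for illustration.

```python
def surviving_clone_fraction(t, r, lam):
    """Fraction of clones that still contain a dividing cell at time t.

    Critical birth-death process: gain and loss of dividing cells both
    occur at rate r * lam, starting from a single labelled dividing cell.
    """
    return 1.0 / (1.0 + r * lam * t)


def mean_dividing_cells_per_surviving_clone(t, r, lam):
    """Mean number of dividing cells in clones that have not gone extinct."""
    return 1.0 + r * lam * t


def approx_basal_clone_size(t, r, lam, rho):
    """Long-time approximation of total basal cells per surviving clone.

    ASSUMPTION: dividing cells make up a fraction rho of the basal layer, so
    total basal size is roughly the dividing-cell count divided by rho once
    r * lam * t is much larger than 1.
    """
    return r * lam * t / rho


def labelled_dividing_cells(n0_clones, t, r, lam):
    """Total labelled dividing cells: surviving clone number times mean size."""
    return (n0_clones
            * surviving_clone_fraction(t, r, lam)
            * mean_dividing_cells_per_surviving_clone(t, r, lam))
```

the product of surviving clone number and mean clone size stays constant in this model, so fewer clones must be accompanied by proportionally larger ones. a slower-than-expected decline in clone number, as noted for the limbal zones, is a departure from this relation.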
estimating the clone size cumulative distribution was done using empirical cdf estimation using matlab. statistical analysis: data are presented as means +/-sd. t-test, anova (analysis of variance) followed by bonferroni test, or kolmogorov-smirnov test were performed using graphpad prism software, as indicated in legends, to calculate p-values. differences were considered to be statistically significant from a p-value below . . statistical significance was calculated using t-test (*, p-value< . ). abbreviations: cj., conjunctiva; peri., periphery. figure : working hypothesis. schematic illustration of our working model, depicting the anatomy of the limbal sub-compartments and lsc populations. we propose that the outer limbus is occupied by widespread qlscs that play an important role as a sc reservoir for wound repair, barrier maintenance and corneal replenishment. the inner limbus contains abundant alscs that are actively engaged in corneal replenishment. the maintenance and differentiation of lsc populations is regulated by niche components including blood and lymph vasculature, stromal cells, and particularly t cells that control qlsc proliferation. (h) edu pulse/chase experiment described in fig. e-f that was performed to detect slow cycling cells. the square zone on the right shows enlarged view of the dashed region on the left. arrow shows edu + /krt + limbal epithelial cells at the outer limbus. scale bars are μm. abbreviations: conj., conjunctiva; peri., periphery. figure s : prediction of pathways enriched in clusters or . in silico analysis performed to predict pathways that were enriched in cluster or (in comparison to all other clusters) using the webgestalt algorithm. figure s : differential expression of selected families of genes. heatmap presentation shows the average expression of the indicated genes across clusters. genes enriched in cluster or cluster are shown in a and b, respectively, and selected family-related genes are shown in (c). 
monoclonal antibody to the interferoninducible protein leu- triggers aggregation and inhibits proliferation of leukemic b cells the generation of spermatogonial stem cells and spermatogonia in mammals normal germ line establishment in mice carrying a deletion of the ifitm/fragilis gene family cluster embigin regulates hspc homing and quiescence and acts as a cell surface marker for a niche factor-enriched subset of osteolineage cells defining the design principles of skin epidermis postnatal growth classification of low quality cells from single-cell rna-seq data in vivo depletion of cd +foxp + treg cells by the pc anti-cd monoclonal antibody is mediated by fcgammariii+ phagocytes sox regulates p and stem/progenitor cell state in the corneal epithelium outer innerperi. key: cord- -qagyaegp authors: magee, michelle; lewis, courtney; noffs, gustavo; reece, hannah; chan, jess c. s.; zaga, charissa j.; paynter, camille; birchall, olga; azocar, sandra rojas; ediriweera, angela; caverlé, marja w.; schultz, benjamin g.; vogel, adam p. title: effects of face masks on acoustic analysis and speech perception: implications for peri-pandemic protocols date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qagyaegp wearing face masks (alongside physical distancing) provides some protection against infection from covid- . face masks can also change how we communicate and subsequently affect speech signal quality. here we investigated how three face mask types (n , surgical and cloth) affect acoustic analysis of speech and perceived intelligibility in healthy subjects. we compared speech produced with and without the different masks on acoustic measures of timing, frequency, perturbation and power spectral density. speech clarity was also examined using a standardized intelligibility tool by blinded raters. mask type impacted the power distribution in frequencies above khz for both the n and surgical masks. 
measures of timing and spectral tilt also differed across mask conditions. cepstral and harmonics-to-noise ratios remained stable across mask types. no differences were observed across conditions for word or sentence intelligibility measures. our data show that face masks change the speech signal, but some specific acoustic features remain largely unaffected (e.g., measures of voice quality) irrespective of mask type. outcomes have bearing on how future speech studies are run when personal protective equipment is worn. face masks (alongside physical distancing) provide some protection against infection from coronavirus disease (chu et al., ). their use in public spaces and healthcare settings is either recommended or mandatory in many jurisdictions internationally. in the united states, the centers for disease control and prevention (cdc, ) recommends mask use to minimize droplet dispersion and aerosolization of the virus (bahl et al., ). clinical trials and healthcare settings continue to assess speech production, which generates respiratory droplets, and unrestricted exposure to these droplets increases the likelihood of disease contraction (stadnytskyi et al., ). the risk of transmission increases with behaviors common in many speech assessment tasks, including continuous and loud speech (asadi et al., ). at the same time, acknowledgement of the necessity of personal protective equipment to minimize virus transmission has increased internationally (asadi et al., ; stadnytskyi et al., ; zaga et al., ). masks, however, alter the speech signal, with downstream effects on the intelligibility of a speaker. the use of personal protective equipment poses some unique challenges for speech assessment. we evaluated the impact that wearing a mask has on acoustic output and speech perception. we examined how different face mask types (surgical, cloth and n ), in combination with microphone location variations (headset vs. tabletop), affect speech recordings and intelligibility. four subjects, aged . ± .
years, range - ; males: females, were included in the study. all speakers were english speaking with no dysphonia or cognitive or neurological impairments. one male and one female had english as their second language. the speech battery was elicited by trained staff and consisted of sustaining an open vowel /aː/ for approximately six seconds, repeated ten times, and reading a phonetically balanced text, the grandfather passage (van riper, ), repeated five times. the speech battery was repeated under four conditions in a randomized order: ) no mask; ) standard surgical mask (regulated under cfr . ); ) cloth mask ( -layered cotton); and ) n mask (disposable mask made from electrostatic non-woven polypropylene fiber containing a filtration layer). subjects were instructed to speak in a natural manner at a comfortable pitch and pace. speech samples were recorded using two standardized methods: ) a head-mounted cardioid condenser microphone (akg , harman international, united states) positioned inches from the corner of the subject's mouth (minimum sensitivity of - db, near flat frequency response) and coupled with a quad-capture usb . audio interface (roland corporation, shizuoka, japan) connected to a laptop computer; and ) a blue yeti (blue microphones, united states) tabletop microphone (sensitivity . mv/pa) connected to a laptop computer. the tabletop microphone was positioned ft. from the subject to simulate physical distancing measures. standardization of the recording environment was achieved by recording in the absence of traffic, electrical, appliance, or other background noise. all recordings were sampled at . khz with -bit quantization. speech intelligibility was evaluated using the assessment of intelligibility of dysarthric speech (assids) (yorkston and beukelman, ). for each condition, subjects read aloud a randomized list of single words (one and two syllables in length) and sentences ( to syllables in length).
two blinded raters transcribed assids words and sentences, with the percentage of correct items calculated for each condition. audio files were screened for deviations and synchronized between microphones to ensure uniformity of length. acoustic analysis of the sustained vowel and reading tasks was performed using praat software (boersma, ). two groups of speech features were analyzed, one to describe responsiveness to speech and silence, and another to determine agreement between measurements taken under the different microphone conditions. the speech spectrum was used to describe the impact of mask type on the complex voice waveform. the interaction between intensity and frequency was characterized using the power spectral density (psd, db/khz relative x - pa) in the long-term average spectrum on the reading task. psd provides information on how each frequency contributes to the total sound power. frequency bands were fixed at khz. psd was averaged across subjects for each mask condition and compared between masks, not between subjects. center-of-gravity (cog, in hz) was calculated from the power spectrum to inform frequency responsiveness of the conditions. cog is the power-weighted mean frequency of the spectrum, i.e. the frequency around which the spectral power is balanced. the intensity of background noise (floor) was taken as the average intensity during the quietest three seconds of each file (i.e., in the absence of vocalization). floor intensity was subtracted from the average intensity (during vocalization) for each task (vowel and reading) to determine the speech intensity prominence per mask condition. features of interest included smoothed cepstral peak prominence (cpps), harmonic-to-noise ratio (hnr), local jitter and shimmer for the sustained vowel, and average and standard deviation of pause length for the reading task.
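the two spectral and intensity summaries described above can be illustrated with a minimal python sketch. it is not the authors' praat script: the function names, the frame-based representation of intensity, and the choice to average decibel values directly are all assumptions made for illustration.

```python
def center_of_gravity(freqs_hz, power):
    """Power-weighted mean frequency (spectral centroid) in Hz."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no power")
    return sum(f * p for f, p in zip(freqs_hz, power)) / total


def noise_floor_db(frame_db, frame_s, quiet_s=3.0):
    """Mean level of the quietest contiguous stretch of about quiet_s seconds.

    frame_db holds one intensity value (dB) per analysis frame and frame_s is
    the frame duration in seconds (both hypothetical inputs).
    """
    n = max(1, round(quiet_s / frame_s))
    if len(frame_db) < n:
        raise ValueError("recording shorter than the quiet window")
    window_sum = sum(frame_db[:n])
    best = window_sum
    # Slide the window one frame at a time and keep the smallest sum.
    for i in range(n, len(frame_db)):
        window_sum += frame_db[i] - frame_db[i - n]
        best = min(best, window_sum)
    return best / n


def intensity_prominence_db(frame_db, voiced, frame_s, quiet_s=3.0):
    """Average level during vocalization minus the noise floor.

    voiced is a list of booleans marking frames that contain vocalization.
    Averaging dB values directly is a simplification (assumption).
    """
    speech = [level for level, is_voiced in zip(frame_db, voiced) if is_voiced]
    if not speech:
        raise ValueError("no voiced frames")
    return sum(speech) / len(speech) - noise_floor_db(frame_db, frame_s, quiet_s)
```

for a toy spectrum with power 1, 2 and 1 at 100, 200 and 300 hz, `center_of_gravity` returns 200.0, because the power is symmetric around the middle frequency.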
fundamental frequency was calculated through autocorrelation within a restricted range ( hz - hz for males, hz - hz for females) (vogel et al., ). the analysis window was ms and ms respectively, and the window shift was fixed at ms. the maximum number of formants was set at with a maximum of hz for formant detection. all other parameters were maintained at default software settings. silence-speech and speech-silence transitions were detected using an energy threshold in the time domain (rosen et al., ; vogel et al., ). the threshold was set to % of the th percentile, with minimum silence length set to ms and minimum speech length to ms. to examine differences in each acoustic parameter under each mask condition (no mask, surgical, n , and cloth), a linear mixed-effects model analysis using restricted maximum likelihood estimation was applied. mask type was modeled as a fixed factor, and subject and order of mask as random factors. bonferroni-corrected post hoc pairwise comparisons were conducted to determine differences between each mask type (surgical, n , and cloth) and no mask. to investigate power spectral density, the interaction effect between mask and frequency band was examined. where the interaction was significant, planned comparisons were made for each khz frequency band to determine differences between mask types compared to no mask. spss was used for all statistical analyses (ibm spss version . ). intelligibility varied between the speakers and across mask conditions. on average, intelligibility remained above % for all mask conditions, irrespective of single words (figure a frequency bands were collapsed into khz slices to explore differences in psd between mask types. there was a mask × khz frequency band interaction effect (f , = . , p= . ).
post hoc comparisons showed that power (db/hz) was significantly lower between - khz for the n mask and between - khz for the surgical and cloth masks when compared to no mask on recordings made using the head-mounted microphone (figure a). no significant differences were observed between mask conditions on recordings made using the tabletop microphone (f , = . , p= . ; figure b). for recordings made with the head-mounted microphone, post hoc comparisons showed that the n mask increased the percentage of pauses (p= . ) (table ). spectral tilt was lower in recordings produced with the surgical (p= . ) and n masks (p= . ). for recordings produced with the tabletop microphone, there was a significant effect of mask type for percentage of pauses (f , . = . , p= . ) and spectral tilt (f , . = . , p= . ) (table ). post hoc comparisons revealed that the n and cloth masks yielded a higher percentage of pauses (n p= . ; cloth p= . ) than no mask. as with the head-mounted microphone, recordings produced with the tabletop microphone yielded lower spectral tilt values with both the surgical (p= . ) and n masks (p= . ). no significant differences were observed in acoustic parameters extracted from the sustained vowel recorded using either the head-mounted or tabletop microphone. the type of mask affected the speech signal. we observed significant differences in acoustic power distribution across relevant frequency bands for speech in all three mask conditions compared to no mask. the differences were not observed in frequencies below khz. differences in signal for higher frequencies led to altered acoustic outcomes including spectral tilt. the masks, however, did not significantly influence listener-perceived intelligibility or acoustic measures of perturbation (e.g., hnr, cpps). measures of speech rate were lower for n and surgical masks, possibly because speakers compensate when wearing masks to improve intelligibility.
it is also possible that speech timing differences were related to how speech boundaries are identified in the analysis scripts (i.e., our timing analysis relied on identification of phoneme/word boundaries via intensity thresholds). intelligibility scores varied between raters and between mask conditions. intelligibility remained above % for words and sentences. anecdotally, it can be difficult to understand people when they wear a mask (goldin et al., ). our small dataset suggests mask type does not systematically impact intelligibility in controlled environments. our recordings were made with high-quality microphones in quiet environments. raters listened to samples in ideal listening conditions away from distractions and background noise but without visual aid (lip and jaw movement) for all mask conditions. in loud environments, communication can be challenging because of multiple distractors, background noise, and lower signal-to-noise ratios (snr). noise in ecological situations may further decrease speech intelligibility, particularly when the complementary visual cues that play a role in communication are blocked by face masks. it is clear that face masks change the acoustic speech signal, but some specific acoustic features remain largely unaffected (e.g., measures of voice quality) irrespective of mask type. these results have implications for clinical assessments and speech research where ppe is required. it is easy to assume that subjects in a speech study will simply remove ppe during assessments; however, subjects and researchers may be reluctant to do so if it leads to potential exposure to airborne viruses. in longitudinal studies with data collection before, during, and after pandemics requiring ppe, researchers should consider how to mitigate changes to protocols that affect speech (see figure ) (redenlab, ). mean power spectral density displayed between - khz based on mask type. shaded areas represent the standard error of the mean. *p≤ .
no mask vs mask type at each frequency bin. red stars denote significant differences between no mask and n , blue stars denote significant differences between no mask and surgical masks, while orange stars denote significant differences between no mask and cloth masks. *disclaimer: please be advised that nothing completely eliminates bacteria or viruses and the guidelines contained in this document are measures attempting to limit the spread of a virus. further, these guidelines do not supersede medical practitioner recommendations or the covid- safety policies implemented by your business or institution. it is your responsibility to follow the recommendations and safety policies applicable to your business or institution. to reduce risk, it is recommended that assessors wear masks throughout assessments, that the microphone's metal surfaces are sanitized between subjects, and that all windscreens are washed at the end of each use. aerosol emission and superemission during human speech increase with voice loudness face coverings and mask to minimise droplet dispersion and aerosolisation: a video case study use of masks to help slow the spread of covid- physical distancing, face masks, and eye protection to prevent person-to-person transmission of sars-cov- and covid- : a systematic review and meta-analysis how do medical masks degrade speech reception?
guidance on minimizing risk to patients and staff during speech recordings automatic method of pause measurement for normal and dysarthric speech the airborne lifetime of small speech droplets and their potential importance in sars-cov- transmission standardization of pitch-range settings in voice acoustic analysis motor speech signature of behavioral variant frontotemporal dementia: refining the phenotype assessment of intelligibility of dysarthric speech speech-language pathology guidance for tracheostomy during the covid- pandemic: an international multidisciplinary perspective key: cord- -k jya b authors: das jana, indrani; kumbhakar, partha; banerjee, saptarshi; gowda, chinmayee chowde; kedia, nandita; kuila, saikat kumar; banerjee, sushanta; das, narayan chandra; das, amit kumar; manna, indranil; tiwary, chandra sekhar; mondal, arindam title: development of a copper-graphene nanocomposite based transparent coating with antiviral activity against influenza virus date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k jya b respiratory infections by rna viruses are among the major burdens on global health and the economy. viruses like influenza or coronaviruses can be transmitted through respiratory droplets or contaminated surfaces. an effective antiviral coating can significantly decrease the viability of virus particles in the outside environment, hence reducing their transmission rate. in this work, we have screened a series of nanoparticles and their composites for antiviral activity using a highly sensitive nano-luciferase-based influenza a reporter virus. using this screening system, we have identified a copper-graphene (cu-gr) nanocomposite that shows strong antiviral activity.
extensive material and biological characterization of the nanocomposite suggested a unique metal oxide embedded graphene sheet architecture that can inactivate the virion particles only within minutes of pre-incubation and subsequently interferes with the entry of these virion particles into the host cell. this ultimately results in reduced viral gene expression, replication and production of progeny virus particles, slowing down the overall pace of progression of infection. using pva as a capping agent, we have been able to generate a cu-gr nanocomposite based highly transparent coating that retains its original antiviral activity in the solid form. the emergence of novel virus strains and the associated outbreaks are becoming a significant threat to mankind (koven ) . the currently ongoing pandemic, caused by the severe acute respiratory syndrome-coronavirus (sars-cov- ), has brought the majority of the world to a grinding halt, severely impacting health & economy across the nations ( fomites. thus, the development of low cost and easily scalable antiviral coating materials, which could be widely applied to various surfaces in order to inactivate the virus particles in the environment, may serve as an effective way to reduce the chance of infection and hence to lower the overall speed of transmission. different metal oxides, including cu and ag have been explored for their biocidal activity in soluble as well as in insoluble forms (minoshima et al. ) . copper and silver nanoparticles have remarkable sequence from porcine teschovirus. the nano-luc-influenza a reporter virus, as a part of its gene expression, synthesizes the pa- a-nluc polypeptide, which gets self-cleaved to produce nano-luciferase. subsequently, the luciferase activity could be measured as a quantitative estimate of viral gene expression and hence progression of virus replication cycle inside the cells. 
to test whether the nano-luciferase activity could actually serve as a proxy to virus replication, we have infected mdck cells with different amounts of input virus and viral replication/gene expression was monitored using nano-glo assay (promega). as shown in figure a , there is a linear relationship between multiplicity of infection ( . - . ) and luciferase light unit measurements (r = . ), where an increase in one log in the input virus amount leads to about % increase in the luciferase activity or vice versa, measured at hours of post-infection. this data suggests that the nano-luciferase influenza a reporter virus could serve as an excellent tool to study the antiviral activity of various nanoparticles or their nanocomposites used in this study. in order to test the antiviral activity, we have standardized a "nano-luc reporter assay" described in figure b. briefly, nano-luciferase influenza a reporter viruses were pre-incubated with the um colloidal suspensions of each of the nanoparticles/ composites or with the vehicle control for minutes at room temperature and subsequently used to infect mdck cells at an moi of . . luciferase activity was measured at hours of post infection and plotted as a relative percentage of the vehicle control set ( figure c). prior treatment of the virus stock solution with cu-gr composite showed % reduction in viral gene expression, while prior treatment with ag-gr resulted in % reduction. treatment with other materials shows no significant decrease in luciferase activity. from the correlation of input virus units and the corresponding luciferase activity, as shown in figure a , it can be inferred that prior treatment with cu-gr solution resulted in more than -fold reductions in the infectious virus population that has been used to infect the mdck cells. in this context, it should be noted that none of the materials showed substantial cytotoxicity upon madin-darby canine kidney cells (mdck) within the concentration range of . 
um - . um as evaluated using mtt assay ( figure ) . hence, the reduction in nano-luciferase activity as a result of prior exposure to cu-gr should be attributed exclusively to the reduction of the infectivity of the nano-luciferase reporter virus. henceforth, we focused upon the extensive characterization of the cu-gr nanocomposite. we have extensively characterized the structural parameters of the synthesized cu-gr nanocomposites by optical measurements. figure a depicts raman spectra of synthesized cu-gr nanocomposite samples at excitation of nm in the range of cm - to cm - . with raman spectroscopy, we are able to cm - ) confirms the existence of graphene in the samples synthesized. generally, d peak originates from defects in the hexagonal sp carbon system while the g peak arises due to the stretching vibration of sp carbon pairs in both rings and chains (ferrari et al. ). except, d and g peak, the d peak arises at ~ cm - . the d peak originates due to transverse optical (to) phonons around the k point and is reporter assay in order to identify the optimal time and concentration required for its antiviral activity. the nano-luc reporter assay was performed where the influenza a reporter virus was pretreated with the colloidal form of the cu-gr nanocomposite for various time periods before using them for infecting mdck cells. as evidenced from figure a , a sharp decrease (> %) in the reporter activity was observed as a result of minutes of preincubation with cu-gr composite, while longer times of preincubation showed only minor additional reduction. this data suggested that minutes of preincubation with cu-gr composite can lead to more than tenfold reduction in input virus titer that ultimately results in ~ % decrease in reporter activity. subsequently, we tried to identify the optimal concentration of the cu-gr composite required for its antiviral activity. 
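the reasoning used above (a calibration line relating reporter signal to the log of input virus, then reading a drop in signal back as a fold reduction in infectious input) can be sketched in python. this is only an illustration under stated assumptions: the function names are hypothetical, the calibration is assumed to be linear in log10 of the multiplicity of infection as the text describes, and no values from the study are reproduced.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_percent(treated, vehicle):
    """Reporter signal as a percentage of the vehicle control."""
    if vehicle <= 0:
        raise ValueError("vehicle control signal must be positive")
    return 100.0 * treated / vehicle


def fold_reduction_in_input(lum_vehicle, lum_treated, slope):
    """Fold drop in infectious input implied by a drop in reporter signal.

    slope is the change in signal per log10 unit of input virus, taken from
    a calibration fitted with fit_line(log10(moi), signal).
    """
    if slope <= 0:
        raise ValueError("calibration slope must be positive")
    return 10 ** ((lum_vehicle - lum_treated) / slope)
```

in use, the calibration would be fitted once from the dilution series (x = log10 of input virus, y = luciferase signal), and the resulting slope passed to `fold_reduction_in_input` together with the vehicle and treated readings.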
different concentrations of the cu-gr composite ( nm, nm, nm, µm, µm and µm, respectively) were used to treat the nano-luc influenza a reporter virus for minutes, followed by the nano-luc reporter assay. µm cu-gr nanocomposite shows a non-significant decrease in viral titer. these data further substantiate that treatment with µm cu-gr significantly reduces viral infectivity, which results in a decrease in viral gene expression, replication and subsequent production of progeny virus (figure c). the plaque assay titer data are tabulated in figure d. each data set is representative of at least three independent experiments, and each experiment was performed in triplicate. graphs were prepared in microsoft excel and show mean ± standard deviation (n= ). results were compared using a two-tailed student's t test. significance is defined as p< . and is indicated with asterisks: *p < . , **p < . and ***p < . .
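the statistical summary described above (mean ± standard deviation of replicates, compared with a two-tailed student's t test) can be sketched with the python standard library. this is an illustration, not the authors' excel workflow: the function names are hypothetical, the p-value is obtained by numerically integrating the t density, and the asterisk thresholds (0.05, 0.01, 0.001) are conventional values assumed here because the thresholds are not legible in the text.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for replicate measurements."""
    return statistics.mean(values), statistics.stdev(values)


def student_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the t density (steps is even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # probability mass between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


def stars(p, thresholds=(0.05, 0.01, 0.001)):
    """Asterisk label for a p-value; thresholds are assumed, not from the text."""
    return "*" * sum(p < cutoff for cutoff in thresholds)
```

a typical call would compare the triplicate readings of a treated sample with those of the vehicle control: `t, df = student_t(treated, vehicle)` followed by `stars(two_tailed_p(t, df))`.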
thermal properties of the binary-filler hybrid composites with graphene and copper nanoparticles plasmonic properties of copper nanoparticles fabricated by nanosphere lithography copper nanoparticle/polymer composites with antifungal and bacteriostatic properties cu-ag bimetallic nanoparticles on reduced graphene oxide nanosheets as peroxidase mimic for glucose and ascorbic acid detection copper nanoparticles: aqueous phase synthesis and conductive films fabrication at low sintering temperature situ raman spectroscopy of copper and copper oxide surfaces during electrochemical oxygen evolution reaction: identification of cuiii oxides as catalytically active species graphene synthesis: relationship to applications chemistry with graphene and graphene oxide - challenges for synthetic chemists coronaviruses: an overview of their replication methods in molecular biology raman spectrum of graphene and graphene layers cu and cu-based nanoparticles: synthesis and applications in functionalization of graphene: covalent and non-covalent antiviral activity of cuprous oxide nanoparticles against hepatitis c virus in vitro application principles of virology harmonic optical responses of single ag nanoparticles with morphology updating the accounts: global mortality of the - 'spanish' influenza pandemic graphene materials in antimicrobial nanomedicine: current status and future perspectives anomalous k-point phonons in noble metal/graphene heterostructure activated by localized surface plasmon resonance engla, journal - -new engla nd journal the lancet commission on pollution and health receptor usage for sars-cov- and other lineage b betacoronaviruses viromimetic sting agonist-loaded hollow polymeric nanoparticles for safe and effective vaccination against middle east respiratory syndrome coronavirus preparation, characterization and antibacterial properties of silver-modified low-viscosity overlay medium for viral plaque assays comparison of the antiviral effect of solid-state 
copper and silver antimicrobial mechanisms and effectiveness of graphene and graphene- with sars-cov- a roadmap for graphene can graphene take part in the fight against covid- ? discovery of direct-acting antiviral agents with a graphene-based fluorescent nanosensor the geography and mortality of the ion-based metal/graphene antibacterial agents comprising mono-ionic and bi-ionic silver and copper species one-step synthesis of pt-decorated graphene-carbon nanotubes for the electrochemical sensing of dopamine, uric acid and ascorbic acid strain specificity in antimicrobial activity of silver and copper nanoparticles the generation of recombinant influenza a viruses expressing a pb fusion protein requires the conservation of a packaging signal overlapping the coding and noncoding regions at the ′ end of the pb segment reviewing the history of pandemic influenza: understanding patterns of emergence and transmission facile synthesis and application of ag-chemically converted graphene antibacterial activities of solid-state cuprous compounds influenza: the mother of all pandemics antiviral activity of graphene oxide−silver nanocomposites_ .pdf key: cord- -fe v pt authors: zhang, chiyu; forsdyke, donald r. title: potential achilles heels of sars-cov- displayed by the base order-dependent component of rna folding energy date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fe v pt base order, not composition, best reflects local evolutionary pressure for folding of single-stranded nucleic acids. the base order-dependent component of folding energy has revealed a highly conserved region in hiv- genomes that associates with rna structure. this corresponds to a packaging signal that is recognized by the nucleocapsid domain of the gag polyprotein. long viewed as a potential hiv- “achilles heel,” the signal can be targeted by a recently described antiviral compound (nsc ) or by synthetic oligonucleotides. 
thus, a conserved base-order-rich region of hiv- may facilitate therapeutic attack. although sars-cov- differs in many respects from hiv- , the same technology displays regions with a high base order-dependent folding energy component, which are also highly conserved. this indicates structural invariance (si) sustained by natural selection. while the regions are often also protein-encoding (e.g. nsp , orf a), we suggest that their nucleic acid level functions – such as the ribosomal frameshifting element (fse) that facilitates differential expression of a and ab polyproteins – can be considered potential “achilles heels” for sars-cov- , perhaps susceptible to therapies like those envisaged for aids. the region of the fse scored well, but higher si scores were obtained in other regions, including those encoding nsp and the nucleocapsid (n) protein. composition tends to reflect genome-wide evolutionary pressures. just as a local arrangement of words conveys specific meaning to a text, so base order better reflects local evolutionary pressures. base order is most likely to be conserved when encoding a function critical for survival. assays of the base order-dependent component of the folding energy have shown that a highly conserved region, in otherwise rapidly mutating hiv- genomes, associates with an rna structure corresponding, not to a protein-encoding function, but to an rna packaging signal. the latter is specifically recognized by the nucleocapsid domain of the gag polyprotein [ ] and is now seen as a potential "achilles heel" of hiv- that can be targeted by a recently described antiviral compound (nsc ) or by synthetic oligonucleotides [ ] . we here report similar highly conserved structural regions of the sars-cov- genome, one or more of which should be susceptible to targeting [ , ] . 
we identify certain open reading frames (orfs) that, because of their conservation, have so far attracted therapeutic interest mainly related to their functions at the protein level [ , ] , rather than at the level of the corresponding, yet highly structured, regions of the genome. the ribosomal frameshift element (fse) that is among our results, is attracting attention [ , , ] . yet our analysis, consistent with recent reports [ , ] , suggests there may be more suitable targets in other regions. these were obtained from the ncbi (bethesda) and gisaid epicov (munich) databases. the wuhan-hu sequence (genbank nc_ . ), deemed taxonomically prototypic [ , ] , was compared (regarding base substitutions and folding potential) with chinese isolates, italian isolates, and isolates from new york, usa. our "window" starting point was base of the , base prototype sequence. we refer to windows by their centers. the center of the first base window would be . in previous hiv- studies the base differences between just two individual sequences sufficed for the tabulation of a statistically significant set of base substitution frequencies [ ] . the lower mutation rates of sars-cov- strains [ ] the energetics of the folding of a single-stranded nucleic acid into a stem-loop structure depend on both the composition and order of its bases. base composition is a distinctive characteristic of a genome or large genome sector. a localized sequence (e.g. a base window), which is rich in the strongly-pairing bases g and c, will tend to have a stable structure simply by virtue of its base composition, rather than of its unique base order. this high gc% value can obscure the contribution of the base order-dependent component of the folding energy, which provides a sensitive indicator of local intraspecies pressures for the conservation of function within a population (i.e. a mutated organism is eliminated by natural selection so no longer can be assayed for function in the population). 
in contrast, interspecies mutations tend to influence the genome-wide oligonucleotide (k-mer) pressure, of which base composition (gc%) is an indicator. this pressure can act to generate and/or sustain members of emerging species by preventing recombination with parental forms [ , , [ ] [ ] [ ] . elimination of this base composition-dependent component facilitates focus on local folding. early studies of rna virus structure by le and maizel [ ] were primarily concerned with the statistical significance of rna folding, rather than with distinguishing the relative contributions of base composition and base order. however, with a pipeline between the various programs that were offered by the wisconsin genetics computer group, the base composition and base order-dependent components were separated and individually assessed ("folding of randomized sequence difference" analysis; fors-d analysis). departing from le and maizel, fors-d values (see below) were not divided to yield z-scores, but were simply plotted with statistical error bars [ , , , , ] . the limits of the latter were generally close to the corresponding fors-d values and, for clarity, are omitted here. a window of bases is moved along a natural sequence in or base steps. a folding program [ ] is applied to the sequence in each window to obtain "folding of natural sequence" (fons) values for each window, to which both base composition and base order will have contributed. the four bases in each sequence window are then shuffled to destroy their order while retaining base composition, and folding energy is again determined. this shuffle-and-fold "monte carlo" procedure is repeated ten times and the average (mean) folding value is taken as the "folding of randomized sequence mean" (fors-m) value for that window, which reflects base composition alone. subtracting the fors-m value from the fons value gives the fors-d value, the base order-dependent component. the approach was employed by others who, rather than shuffling the four bases, favored retaining some base order information. accordingly, they shuffled groups of bases (e.g. the sixteen dinucleotides).
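the window-by-window shuffle-and-fold procedure can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' software: `toy_fold_energy` is a hypothetical stand-in for a real folding program (it only scores the longest simple hairpin stem), the names `fors_d_scan`, `fons`, `fors_m` and `fors_d` are chosen here for illustration, and the window and step sizes are left as arguments because their values are not recoverable from the text. only the ten shuffles per window are taken from the description above.

```python
import random
from statistics import mean

# Watson-Crick pairs plus the G-U wobble pair (RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_fold_energy(seq):
    """Hypothetical stand-in for a folding program (NOT a real energy model).

    Finds the longest run of stacked complementary pairs around a loop of
    3 to 5 bases and returns minus that length (more negative = more stable).
    """
    best = 0
    n = len(seq)
    for i in range(n):                      # i = last base of the 5' arm
        for loop in (3, 4, 5):
            left, right, stem = i, i + loop + 1, 0
            while left >= 0 and right < n and (seq[left], seq[right]) in PAIRS:
                stem += 1
                left -= 1
                right += 1
            best = max(best, stem)
    return -float(best)


def fors_d_scan(seq, window, step, n_shuffles=10, fold=toy_fold_energy, seed=0):
    """Slide a window along seq and split folding energy into two components.

    fons   : energy of the natural window (composition + order)
    fors_m : mean energy of shuffled windows (composition only)
    fors_d : fons - fors_m (base order-dependent component)
    """
    seq = seq.upper().replace("T", "U")     # accept DNA-style input
    rng = random.Random(seed)               # seeded so the scan is repeatable
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        natural = seq[start:start + window]
        fons = fold(natural)
        shuffled = []
        for _ in range(n_shuffles):
            bases = list(natural)
            rng.shuffle(bases)              # destroys order, keeps composition
            shuffled.append(fold("".join(bases)))
        fors_m = mean(shuffled)
        rows.append({"start": start, "fons": fons,
                     "fors_m": fors_m, "fors_d": fons - fors_m})
    return rows
```

a real analysis would pass an actual energy-minimizing folding routine through the `fold` argument; the toy function is there only so that the sketch runs without external dependencies.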
following disparagement of the conceptual basis of four base shuffling, which was duly clarified [ ] , the validity of single base level shuffling is now generally accepted and is being applied routinely to various viral genomes [ , , ] . the monte carlo procedure can also be simplified to decrease fors-m computational time [ ] using support vector machine-based technology [ ] . our original software ("bodslp"), written by professor jian-sheng wu [ ] , retains the monte carlo approach and was further developed by professor shungao xu as "random fold-scan" for windows-based systems [ ] . in addition to assisting the study of infectious viruses and protozoa [ ] , fors-d analysis proved fruitful when applied to topics such as speciation [ , , ] , the origin of introns [ , , ] , relating structure to recombination breakpoints and deletions [ , ] , and relying on a single sequence (rather than alignments) for the determination of positive darwinian selection [ ] . however, for a given sequence window, output can follow only from the base order in that window. lost are higher order structures that might occur naturally through long-range interactions [ ] . furthermore, if the artificial demarcation of a window happens to cut between the two limbs of a natural stem-loop structure, then lost are what might have been contributed to the folding energetics had a larger window, or a different section point, been chosen. variations in step size will generate differences in windows and hence variations in results. such variations might be less when window margins correspond to natural section points, such as those demarcating an rna transcript. there is also a kinetic aspect, particularly apparent with transcripts, due to the probability that the pattern of early ' folding will influence later folding. high negative si values indicate structural conservation among a set of genomes. the validity of this approach is supported by prior work with hiv- . fig. 
shows application of the index to previously reported data on hiv- [ ] . here a high negative si index value corresponds to regions recognized as likely to offer achilles heel-like vulnerability [ ] . the region around the window centered on nucleotide bases is the focus of recent work [ ] [ ] [ ] . subtype (hxb ) [ , ] . the region recently recognized as a potential achilles heel (bases - ) [ ] [ ] [ ] , has a high negative fors-d value (black triangles) similar to those of the rre (rev response element) and the ' untranslated region (utr). the si index (continuous red line) indicates highest sequence conservation in the regions of the rna packaging signal and the 'utr. . . v a r i ation d u e t o b the degree of conservation in sequential windows of members of a set of sequences from china was evaluated as the number of substitutable base positions (polymorphism), relative to the prototype sequence (fig. ) . this was compared with corresponding fors-d values, the profile of which differed a little from that of fig. due to different step values (see materials and methods). as with some previous studies [ ] , there tended to be a reciprocal relationship between the for a more focused view of the reciprocal separation of high negative folding values and corresponding numbers of substitutions (ranging from zero to high positive values) the two were added to provide the "structural invariance" (si) index (fig. ) . here, despite having some substitutions, orfs nsp and nsp were preeminent (scoring - a region with the highest number of substitutions (see fig. ) centered on base (scoring + . si units). (for plots of analogous hiv- data see fig. ). windows and in the fse region scored - . and - . units, respectively. however, there were many more windows with higher negative scores. the si profile for china (fig. ) was, in broad outline, confirmed with corresponding data later downloaded from italy and new york, usa (fig. ) . 
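the si index described above, the sum of a window's fors-d value and its number of substitutable positions, can be illustrated with a short python sketch. the assumptions are mine: the isolates are already aligned to the prototype with equal lengths and no gaps, a position counts as substitutable if any isolate differs from the prototype there, and the two terms are added without scaling (the text does not say whether either term was rescaled). the function names are hypothetical.

```python
def substitutions_per_window(prototype, isolates, window, step):
    """Count, per window, the positions where any isolate differs from the prototype."""
    for iso in isolates:
        if len(iso) != len(prototype):
            raise ValueError("isolates must be aligned to the prototype length")
    # True where at least one isolate carries a different base.
    variable = [any(iso[i] != prototype[i] for iso in isolates)
                for i in range(len(prototype))]
    return [sum(variable[s:s + window])
            for s in range(0, len(prototype) - window + 1, step)]


def structural_invariance(fors_d_values, substitution_counts):
    """SI index per window: FORS-D value plus substitution count (unscaled sum)."""
    if len(fors_d_values) != len(substitution_counts):
        raise ValueError("one FORS-D value and one count are needed per window")
    return [d + s for d, s in zip(fors_d_values, substitution_counts)]
```

under this definition a window with strongly negative fors-d and no substitutions gets a high negative si value, while a poorly structured, highly polymorphic window is pushed towards positive values.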
the high negative si indices found in the nsp , nsp , orf a and n regions, were evident with sequences for all three locations. other regions, notably the s region, were also corroborated. the high positive si values, indicating regions likely to have poorly conserved structures, are also corroborated at some locations. a high intraspecies mutation rate for the n protein orf (fig. ) is also seen when interspecies comparisons are made with other coronavirus species, with implications for early speciation mechanisms [ ] . fig. (purple) , with si indices for italy ( isolates; green) and new york, usa ( isolates; red). mutation rates of microbial pathogens are generally higher than those of their hosts. while a microbe spreading from host-to-host can "anticipate" that it will face a succession of broadly similar challenges, in the short-term those hosts cannot likewise "anticipate" that new microbial invaders will remain as they were in previous hosts. thus, host immune defenses may be overwhelmed. (in the long-term there is a different scenario related to innate immunity; see below). therapeutic challenges are, first, to locate a conserved, less-variable, part of a pathogen's genome that it will have inherited sequentially from a multiplicity of past generations, and so is likely to carry through to a multiplicity of future generations. second, is to identify the corresponding primary function, be it at the genome, rna transcript, or protein level. third, from this knowledge (that may be incomplete; i.e. function not fully clarified), devise effective pathogen inhibition without imposing deleterious side-effects on the host. viral vulnerability is often assumed to associate with protein-level functions [ , ] . however, studies of the aids virus have identified genome structure itself as both functional and conserved, so signifying vulnerability [ , ] (see fig. ). 
a genomic packaging signal for hiv- , which is specifically recognized by the nucleocapsid domain of its gag polyprotein, has long been recognized as a potential "achilles heel" [ , , ], so inviting therapeutic exploration [ ] [ ] [ ] . through targeting of specific rna conformations, gag not only influences the assembly of hiv- genomic rna into virus particles, but also regulates hiv- mrna translation [ ] . permit easier switching between regulatory options. indeed, small changes in target rna structure can impede this [ ] . thus, unlike most other regions of the hiv- genome, mutations here would likely lead to negative darwinian selection at an early stage, hence the high conservation. application of the same bioinformatic technology to the sars-cov- virus genome has now revealed similar "achilles heels." lacking the chronicity of hiv- infection, the genome of sars-cov- should have been shaped less by adaptations to counter long-term host immune defenses. it cannot hide within its host genome in latent dna form. yet, the larger sars-cov- genome contains many more genes than hiv, which require differential expression according to the stage of infection. even more complex regulatory controls can be envisaged, likely requiring conserved genome conformations at appropriate locations. be they synonymous or non-synonymous, mutations in these structured regions could result in negative selection of the viruses in which they occurred, hence high conservation. the ribosomal fse located close to base (figs. ) would seem to exemplify this [ ] , and a potential targeting agent is now available [ ]. however, we have here identified other structurally important regions with more base order-dependence and higher degrees of conservation (figs - ) that might, either singly or collectively, be better candidates for targeting.
when determining folding energy, our approach depends on eliminating contributions of base composition which, as noted, plays an unusual role in the case of the fse. more usually, base composition is a distinctive characteristic of entire genomes or large genome sectors that reflects their underlying oligomer ("k-mer") content. the slow genome-wide accumulation of mutations in oligomer composition (easiest documented as changes in base composition; gc%) can serve to initiate divergence into new species. by preventing that accumulation, potentially diverging organisms can stay within the confines of their species [ ] [ ] [ ] . the presence or absence of synonymous mutations [ , ] , which affect structure rather than amino acid composition, can have an important role in this process. the primary role of constancy in the base composition-related character is to prevent recombination with allied species (interspecies recombination) while facilitating the intraspecies recombination that can correct mutations, so retaining species individuality [ , ] . such recombination is initiated by "kissing" interactions between complementary unpaired bases at the tips of stem-loop structures [ ] . thus, we are here concerned with localized intraspecies mutations that affect fitness, so making members of a species carrying those mutations liable to natural selection. the mutations facilitate within-species evolution rather than divergence into new species. and when that evolution has run its course, some of the polymorphic bases will have become less mutable, so will be deemed "conserved." indeed, mutations of orf nsp are high when the sequences of different coronavirus species are compared [ ] , yet when, from intraspecies comparisons, mutations (in the form of base substitutions) are scored, they are very low (figs. , ) . 
our technology (see materials and methods) removes the base composition-dependent component of mutational changes (that relates more to interspecies evolution) and focuses on the base order-dependent component (that relates more to intraspecies evolution). it best reflects localized functions, be they encoding protein or determining the potentiality for folding into higher order structure, of linear, single-stranded, nucleic acid sequences. is conservation necessarily a good indicator of likely therapeutic success? we sought regions that were both high in stem-loop potential and bereft of mutations, following the premise that conserved functions would be best targeted therapeutically, assuming the availability of pathogen-specific therapeutic agents that would not cross-react with hosts. interference with structural nucleic acid level functions would seem less likely to produce unforeseen host side-effects than interference with protein-level functions. yet, that possibility remains. indeed, a conserved function in a pathogen could owe that conservation to the pathogen strategy of, whenever possible, mutating to resemble its host. this would make it less vulnerable to host immune defenses. to prevent autoimmunity, the generation of immune cell repertoires involves the negative selection of self-reacting cells, so creating "holes" in the repertoire that pathogens can exploit by progressive mutation towards host-self, testing mutational effectiveness a step at a time. this would make it advantageous for the host, during the process of repertoire generation, not only to negatively select immune cells of specificity towards "self," but also to positively select immune cells of specificity towards "near-self." a high level of anti-near-self immune cell clones would constitute a barrier limiting the extent of pathogen mutation towards self.
the existence of such positive selection is now generally accepted, with the implication that some pathogen functions might have approached so close to host-self that targeting them therapeutically would result in cross-reactivity [ ] . this unlikely caveat aside, we deem conservation a reliable indicator that a certain pathogen function is likely to be a suitable target for therapy. attacking a short segment of a pathogen nucleic acid sequence is unlikely to ensnare a similar segment of its host's nucleic acid. in any case, to guard against this, the pathogen specificity of a potential therapeutic agent can be screened against the prototypic human genomic sequence (assuming it is unlikely that patient genomes will significantly depart from this). while prospects for the development of prophylactic vaccines against infection with sars-cov- are promising, methods to boost post-infection host immune defenses and to directly target sars-cov- are urgently needed. these require a better understanding both of viral interactions with host innate and acquired immune systems [ ] , and of viral vulnerabilities. the latter enquiry proceeds in three steps: (1) find specific "achilles heels"; (2) design therapies to exploit them; (3) prove their clinical effectiveness. we have here been concerned with the first step. although the bioinformatic technology related to this has long been available [ ] , its claim to reveal viral "achilles heels," as promulgated in successive textbook editions [ ] , has only recently gained support [ ] [ ] [ ] . this may be because the importance of removing redundant information, and analyzing solely the contribution of base order to folding energy, was not fully appreciated. thus, we have here repeated and expanded on past clarifications [ , , ] of the conceptual basis of a technology that has contributed to the understanding of many biological problems other than viral infections [ ] .
meanwhile, it is pleasing to note that, with minimal evidence on targeting, progress is being made with the second step [ , ] . it is hoped that by targeting one or more of the conserved regions in the sars-cov- genome here identified, rapid cures will be achieved. authors declare no conflict of interest. low genetic diversity may be an achilles heel of sars-cov- reciprocal relationship between stem-loop potential and substitution density in retroviral quasispecies under positive darwinian selection implications of hiv rna structure for recombination, speciation, and the neutralism-selectionism controversy an rna-binding compound that stabilizes the hiv- grna packaging signal structure and specifically blocks hiv- rna encapsidation the heart of the hiv rna packaging signal? identification of the initial nucleocapsid recognition element in the hiv- rna packaging signal evolutionary bioinformatics hiv- gag protein with or without p specifically dimerizes on the viral rna packaging signal a small interfering rna (sirna) database for sars-cov- targeting the sars-cov rna genome with small molecule binders and ribonuclease targeting chimera (ribotac) degraders. 
acs cent sci example study of a highly conserved sequence motif in nsp of sars-cov- as a therapeutic target sars-cov- and orf a: nonsynonymous mutations, functional domains, and viral pathogenesis structural and functional conservation of the programmed − ribosomal frameshift signal of sars coronavirus (sars-cov- ) the short-and long-range rna-rna interactome of sars-cov- an in silico map of the sars-cov- rna structurome pervasive rna secondary structure in the genomes of sars-cov- and other coronaviruses an evolutionary portrait of the progenitor sars cov and its dominant offshoots in covid pandemic assessing uncertainty in the rooting of the sars-cov- phylogeny different biological species "broadcast" their dnas at different (g+c)% "wavelengths success of alignment-free oligonucleotide (k-mer) analysis confirms relative importance of genomes not genes in speciation hybrid sterility can only be primary when acting as a reproductive barrier for sympatric speciation a method for assessing the statistical significance of rna folding conservation of stem-loop potential in introns of snake venom phospholipase a genes: an application of fors-d analysis computer prediction of rna secondary structure calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues scanfold: an approach for genome-wide discovery of local rna structural elements -applications to zika virus and hiv a computational procedure for assessing the significance of rna secondary structure fast and reliable prediction of noncoding rnas evaluation of fors-d analysis: a comparison with the statistically significant stem-loop potential a fors-d analysis software "random_fold_scan" and the influence of different shuffle approaches on fors-d analysis low complexity segments in plasmodium falciparum proteins are primarily nucleic acid level adaptations microsatellites that violate chargaff's second parity rule have base order-dependent asymmetries in the folding energies of 
complementary dna strands and may not drive speciation a stem-loop "kissing" model for the initiation of recombination and the origin of introns local base order influences the origin of ccr deletions mediated by dna slip replication the key role for local base order in the generation of multiple forms of china hiv- b'/c intersubtype recombinants positive darwinian selection. does the comparative method rule? codon usage and phenotypic divergences of sars-cov- genes human immunodeficiency virus type gag polyprotein modulates its own translation synonymous mutations and the molecular evolution of sars-cov- origins a putative role of de-mono-adp-ribosylation of stat by the sars-cov- nsp protein in the cytokine storm syndrome of covid- two signal half century: from negative selection of self-reactivity to positive selection of near-self reactivity we thank prof. shungao xu at jiangsu university for software, and ms. le cao and ms.yingying ma at shanghai public health clinical center, fudan university, for their technical support. queen's university hosts forsdyke's webpages. the biorxiv server hosts preprints. key: cord- -rivaoo authors: noreika, valdas; kamke, marc r.; canales-johnson, andrés; chennu, srivas; bekinschtein, tristan a.; mattingley, jason b. title: alertness fluctuations during task performance modulate cortical evoked responses to transcranial magnetic stimulation date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: rivaoo transcranial magnetic stimulation (tms) has been widely used in human cognitive neuroscience to examine the causal role of distinct cortical areas in perceptual, cognitive and motor functions. however, it is widely acknowledged that the effects of focal cortical stimulation on behaviour can vary substantially between participants and even from trial to trial within individuals. here we asked whether spontaneous fluctuations in alertness can account for the variability in behavioural and neurophysiological responses to tms. 
we combined single-pulse tms with neural recording via electroencephalography (eeg) to quantify changes in motor and cortical reactivity with fluctuating levels of alertness defined objectively on the basis of ongoing brain activity. we observed rapid, non-linear changes in tms-evoked neural responses – specifically, motor evoked potentials and tms-evoked cortical potentials – as eeg activity indicated decreasing levels of alertness, even while participants remained awake and responsive in the behavioural task. impact statement a substantial proportion of inter-trial variability in neurophysiological responses to tms is due to spontaneous fluctuations in alertness, which should be controlled for during experimental and clinical applications of tms. transcranial magnetic stimulation (tms) is a widely used tool for probing human brain function, with applications ranging from the characterization of inter-hemispheric motor cortex interactions and establishment of causal links between neural oscillations and attention, to the clinical treatment of depression and other diseases (dugué & vanrullen, ; valero-cabré et al., ; ziemann, ) . a number of neurophysiological indices of cortical tms perturbation have been used to contrast experimental conditions of interest, including motor evoked potentials (meps) recorded from peripheral muscles (barker et al., ; bestmann & krakauer, ) , and tms-evoked potentials (teps) which are thought to reflect the reactivity of underlying cortical circuits (chung et al., ; . these and other outcome measures show varying sensitivity to different experimental manipulations, as well as confounding factors. perhaps the largest within-participant variation in motor and cortical responses to tms are observed when contrasting wakefulness and sleep. 
as healthy adult participants fall into slow wave sleep, mep amplitude diminishes (bergmann et al., ; hess et al., ; grosse et al., ), whereas tep amplitude increases in association with a breakdown of effective connectivity, reflecting a state shift from a global to a more local or stereotypical mode of processing (massimini et al., ). likewise, sleep pressure has been shown to modulate tms responses during normal waking in daytime hours. for example, mep-derived motor thresholds are elevated following sleep deprivation (de gennaro et al., ), whereas tep amplitude increases throughout the day as a function of a natural build-up of sleep pressure (huber et al., ). to date, however, it is not known whether the effects of tms on neural activity are influenced by fluctuations in the level of alertness that occur during wakefulness. here we combined single-pulse tms with concurrent eeg recording and a simple behavioural task to quantify changes in motor and cortical reactivity with fluctuating levels of alertness defined objectively on the basis of ongoing brain activity. in most studies that have employed tms to perturb or modify brain activity, behavioural or physiological data are typically collected over experimental sessions that last up to an hour or more (e.g. darmani et al., ; herring et al., ; salminen-vaparanta et al., ). the implicit assumption is that participants' levels of alertness remain relatively consistent for the duration of the testing session. this assumption has recently been challenged by findings from brain imaging studies. for instance, tagliazucchi and laufs ( ) found that % of research participants drifted into a drowsy state (n sleep) during resting-state functional magnetic resonance imaging (fmri) protocols after only three minutes.
these periods of early n sleep during passive resting-state scans were accompanied by increased signal variance in sensory and motor cortices, increased temporal-temporal and temporal-frontal connectivity, and decreased thalamic-frontal connectivity patterns (tagliazucchi & laufs, ) , highlighting a pervasive source of variance in neuroimaging data due to drowsiness. likewise, using an active decision-making task, de gee et al. ( ) demonstrated that brainstem-controlled inter-trial fluctuations in phasic arousal are accompanied by modulation in the involvement of prefrontal and parietal cortices in choice encoding, again implying that fluctuating levels of alertness are potentially a significant source of variability in neural activity. further evidence for the contribution of fluctuating levels of alertness to variability in neural activity has come from studies of mep amplitudes, which tend to be highly variable from trial to trial even when the intensity of the tms pulses is held constant (ellaway et al., ; maeda et al., ) . several studies have shown that a significant portion of this variance is related to eeg oscillatory activity in a pre-tms time window (mäki & ilmoniemi, ; sauseng et al., ; zarkowski et al., ) . in particular, trials with higher pre-stimulation alpha power tend to be associated with lower mep amplitude (sauseng et al., ; zarkowski et al., ) . to the extent that increases in alpha power are associated with decreased levels of alertness in participants who are awake and have their eyes open (gharagozlou et al., ; kaida et al., ) , spontaneous fluctuations in alertness could be a significant source of inter-trial mep variability. unfortunately, previous tms investigations have not measured or controlled for changes in alertness in their participants, and so it remains unknown whether fluctuations in alertness are systematically associated with changes in tms-evoked neural activity. 
in the present study, we characterize effects of spontaneous fluctuations in alertness on neurophysiological indices of tms perturbation over the primary motor cortex during a single daytime session. we had four goals: (1) to estimate the latency and stability of fluctuations in alertness over the course of an active, single-pulse tms session; (2) to test whether fluctuations in alertness modulate the occurrence and amplitude of meps; (3) to determine whether the amplitude of tep responses within the first ms after a tms pulse changes across different levels of alertness; and (4) to assess whether inter-trial variance of mep and tep amplitudes is altered with decreases in alertness. participants (n= ) relaxed with eyes closed in a reclining chair, and received single pulses of tms over the right motor cortex (targeting the first dorsal interosseous (fdi) muscle of their left hand) at different intensities centred on their individual motor threshold (see materials and methods for a detailed outline of the experimental protocol). participants were asked to press a mouse button with their right hand (which was not targeted) after each tms pulse if they felt a tactile sensation or twitch in their targeted left hand (fig. a-b). at the commencement of the study, participants were informed that they were free to fall asleep but that they should otherwise continue performing the task. they were gently awakened if they failed to respond after successive tms pulses. to assess the instantaneous level of alertness, a two-fold eeg analysis was applied over the time window immediately preceding each tms pulse: (1) a binary definition of awake and drowsy states following eeg spectral power signatures (θ/α) averaged across all eeg electrodes (bareham et al., ), and (2) a dynamical definition of alertness levels following a detailed sub-staging system for scoring the transition to n sleep (hori et al., ) (fig. c). both measures were highly correlated (see fig f).
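the θ/α measure of alertness can be illustrated with a small python sketch. it is a rough sketch under several assumptions that are not taken from the text: the band edges (4 to 7 hz for theta, 8 to 12 hz for alpha) are conventional placeholder values, power is summed within each band and averaged across electrodes before the ratio is formed, and trials are labelled by a median split on the ratio. the function names are hypothetical, and a plain discrete fourier transform is used only to keep the example dependency-free.

```python
import math
from statistics import mean, median


def band_power(signal, fs, lo, hi):
    """Summed spectral power of one channel between lo and hi Hz (plain DFT)."""
    n = len(signal)
    centre = sum(signal) / n
    x = [v - centre for v in signal]        # remove the DC offset
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n                   # frequency of DFT bin k
        if lo <= freq <= hi:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            total += (re * re + im * im) / (n * n)
    return total


def theta_alpha_ratio(channels, fs, theta=(4.0, 7.0), alpha=(8.0, 12.0)):
    """Theta/alpha power ratio for one pre-stimulus window.

    channels: list of equal-length sample lists, one per electrode.
    Band edges are assumed defaults, not values from the study.
    """
    theta_power = mean(band_power(ch, fs, *theta) for ch in channels)
    alpha_power = mean(band_power(ch, fs, *alpha) for ch in channels)
    return theta_power / alpha_power


def split_awake_drowsy(ratios):
    """Label trials by a median split (an assumed rule): higher ratio = drowsy."""
    cutoff = median(ratios)
    return ["drowsy" if r > cutoff else "awake" for r in ratios]
```

for example, a window containing a 6 hz component with twice the amplitude of a 10 hz component gives a ratio of about 4, since power scales with the square of amplitude.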
temporal structure of an individual trial. two eeg windows preceding single-pulse tms were used to assess alertness: a s window was used for manual scoring of hori stages ( alertness levels), and a s window was used for automatic calculation of the theta ( - hz) to alpha ( - hz) spectral power ratio. following each tms pulse delivered over the right motor cortex, motor evoked potentials (meps) were recorded from the first dorsal interosseous (fdi) muscle of the left hand, and tms-evoked potentials (teps) were recorded using high density eeg to characterize cortical reactivity. (b) schematic of experimental set-up, showing emg, eeg, tms and response mouse in situ. (c) brief definitions and eeg examples of hori stages of sleep onset, progressing from relaxed wakefulness (hori stage ) to nrem stage sleep (hori stage ) (modified with permission from ogilvie ( ) ). in the current study, alertness levels - (marked in green) correspond to hori stages - . (d) percentage of trials obtained within each alertness level, shown separately for the participants. datasets are sorted from the most alert participants (lower rows) to the drowsiest participants (upper rows). there were very few epochs of alertness level or above. (e) representative dataset for one participant, showing good agreement between the two eeg measures of alertness across the whole testing session. the upper subplot indicates θ/α ratio; the lower subplot shows fluctuations of alertness levels on the hori scale. (f) cross-validation of eeg measures of alertness: intra-individual correlations between θ/α power and alertness levels across single session trials. bars represent intra-individual spearman's rank order correlation coefficients for the participants, sorted from the most to the least positive coefficients. on average, tms sessions lasted for . min (sd= , range= . - . min) including time spent switching tms coils and allowing breaks for participants. 
during this period, all participants reached alertness level or higher, reflecting deep drowsiness with a dominance of eeg theta ripples (see fig c) . notably, it took only . min on average for participants to reach alertness level (sd= . , range= . - . min), indicating a rapid decrease of alertness despite the fact that participants were receiving tms pulses and generating task-specific motor responses. all participants ceased responding at some point during the testing session, after which they either woke up spontaneously due to tms, or they were awoken by an experimenter after consecutive unresponsive trials. on average, . % of trials were categorised as "unresponsive" (sd= . , range= . - . ), suggesting a notable impact of drowsiness on task performance. alertness levels and unresponsive trials tended to be spread across the testing session, i.e., participants tended to "oscillate" between awake and drowsy states (see fig e and s ). consequently, there was no systematic increase or decrease in alertness level within a session at the group level. four participants showed a significant but weak positive correlation between alertness level and trial number (mean rho= . ), participants showed a significant negative association (mean rho=- . ), while the remaining participants showed no significant correlation between alertness level and trial number (mean rho=- . ) (see fig s ) . these results suggest that a given participant's level of alertness cannot be assumed to decrease continuously over a testing session. only concurrent eeg measures of alertness can definitively determine a participant's moment-to-moment level of alertness. visualisation of the alertness level (vertical axis) shown for the entire tms testing session (horizontal axis). each subplot represents a different participant (indicated by id number). red vertical lines depict the first trial within a session scored as alertness level . blue dots indicate unresponsive trials. 
results of spearman rank order correlation tests between alertness level and trial number are presented in the top right corner of each subplot (ns=not significant). we first assessed corticospinal excitability as a function of alertness and tms intensity. to this end we calculated the proportion of trials with mep peak-to-peak amplitude above μv for each of the tms intensities, separately for the θ/α-defined awake and drowsy trials. a sigmoid function was then fitted across alertness conditions for each participant. the slope of the mep sigmoid was slightly but significantly shallower in drowsy compared with awake trials (wilcoxon signed-rank test: z-score= . , p= . , r= . ), suggesting mildly increased noise and instability in corticospinal processing (see fig. a ; individual participant results are shown in fig. s ). by contrast, the mep sigmoid threshold did not differ between awake and drowsy trials (t( )= . , p= . , d= . , bayes factor in favour of the null= . ). we considered whether the observed difference in mep slopes was specifically related to alertness, as the amplitude of pre-stimulus alpha oscillations has also been implicated in fluctuations in attention and sensory gating. in these cases, eeg alpha effects are typically evident only within a relatively short pre-stimulus time period of a few hundred milliseconds (romei et al., ) , and are restricted to sensory or fronto-parietal regions (capotosto et al., ; van dijk et al., ) . contrary to this, the difference observed here in mep sigmoid slope as a function of eeg θ/α power was temporally and spatially widespread (fig. s ). (a) group-averaged frequency of trials with meps above a threshold value of μv across tms intensities, centred on individual motor thresholds ( %). sigmoidal functions are fitted separately to the θ/α-defined awake (red) and drowsy (blue) conditions (error bars represent one standard error of mean, sem). 
insets on the right depict each participant's sigmoid threshold and slope difference (awake minus drowsy), with horizontal bars sorted in ascending order. only responsive trials are included in the analysis shown in this and other subplots. alertness states are distinguished here using the eeg θ/α measure taken from a ms window immediately prior to the tms pulse. (b, upper panel) group-level dynamics of meps (averaged across three tms intensities centered on motor threshold) across alertness levels - . horizontal green dashed lines delineate peaks at and ms post-tms ( ms) for alertness level . (b, lower panel) change in mep peak-to-peak amplitude across alertness levels - . circles represent individual participants. for each alertness level, the red line depicts the group-level mean of peak-to-peak amplitude. the pink shaded region represents standard deviation (sd), and the blue shaded region represents the % confidence interval of the mean. states. percentage of trials with meps above threshold value of μv, calculated separately for the awake and drowsy trials across tms conditions centred on individual motor threshold ( %). sigmoidal functions are fitted to the awake (red) and drowsy (blue) conditions separately for each individual (n= ). slopes (lower panel) between θ/α-defined awake and drowsy conditions. alertness states were measured and contrasted separately for each of the time bins in steps of ms across a to ms pre-tms time window. electroencephalography (eeg) spectral power was averaged over all electrodes. (b) difference in mep sigmoid thresholds (upper row) and slopes (lower row) between θ/α-defined awake and drowsy conditions. alertness states were measured and contrasted separately for each of the eeg electrodes. eeg spectral power is averaged over a - to ms pre-stimulation time window. 
we next compared mep peak-to-peak amplitudes between alertness level , reflecting relaxed wakefulness, and alertness levels - , reflecting increasing levels of drowsiness. as shown in figure b , there was a significant increase in mep amplitude between alertness levels and (t( )= . , p= . , d= . ), as well as an intermediate stepped increase across alertness levels (d= . ) and (d= . ), and a small decrease at alertness level (d= . ). a linear trend of increasing mep amplitude was observed across alertness levels - (f( , )= . , p= . , partial η = . ), but this was no longer significant when level was also included (f( , )= . , p= . , partial η = . ). these findings indicate a reliably non-linear reorganization of corticospinal excitability at a time when drowsy participants are still conscious and responsive. the most noticeable change in dynamics occurred with the disappearance of alpha waves, at a point where there was eeg flattening and the first occurrence of eeg theta-range ripples, i.e., alertness levels and despite the fact that participants were still responding behaviourally in the task. these observations suggest a much earlier modulation of corticospinal excitability in the initial moments of drowsiness than has been reported previously in studies of mep changes with sleep deprivation or during nrem sleep (de gennaro et al., ; grosse et al., ; manganotti et al., ) . we next assessed post-tms cortical reactivity measured as tms-evoked potentials (teps) within the first ms after each pulse. early tep amplitude is known to increase in response to homeostatic sleep pressure (huber et al., ) and during nrem sleep (massimini et al., ) , likely reflecting a combination of synaptic strengthening, changes in neuromodulation, and impaired inhibition (huber et al., ) . we hypothesized that, as with mep amplitude, teps should be affected by the level of alertness in drowsy but responsive participants. 
comparing tep amplitudes between θ/α-defined awake and drowsy trials revealed a reliable increase in cortical reactivity in drowsy trials in a time window from - ms after the tms pulse (t( )= . , p= . , d= . ) (see fig a) . this pattern was evident in / participants (see fig. s ). while displaying a wide fronto-central spread, the peak difference between awake and drowsy states occurred over the right motor region, directly beneath the tms coil (see fig b-c) . additional control analyses confirmed that the observed tep increase for drowsy trials was not due to tms-evoked scalp muscle artefacts (see fig s ) . we further compared tep amplitudes at the site of stimulation across alertness levels - . as hypothesized, tep amplitude increased as participants became more drowsy (mann-kendall trend test: z= . , p= . , tau= . ) (fig d-e) . planned comparisons revealed a significant increase in tep amplitude between alertness levels and level (t( )= . , p= . , d= . ), and (t( )= . , p= . , d= . ), and and (t( )= . , p= . , d= . ). these findings provide the first direct evidence for an inverse association between cortical reactivity and alertness, suggesting that sleep-related changes in neural activity may intrude early in the transition between wakefulness and sleep, while participants are still able to respond in an ongoing behavioural task. strikingly, the tep effects emerged at a relatively early alertness level, before the appearance of drowsiness ripples or early slow waves (hori et al., ) . shading depicts standard error of the mean (sem). (b) topographical distribution of the early tep mean peak at - ms post-tms pulse in the θ/α-defined awake (upper left) and drowsy (upper right) states. black dots indicate locations of three eeg electrodes with the maximal amplitude in the map. non-parametric z map (below) reveals region reliably different between awake and drowsy states. 
(c) - ms data-driven spatiotemporal clustering of eeg potentials post-tms pulse between θ/α-defined awake (red) and drowsy (blue) states. tep amplitude was significantly higher in drowsy trials in a - ms time window (cluster peak: ms, t = . , p = . ). the green horizontal line depicts the time window of significant difference. the electrode with the largest difference between awake and drowsy states is marked as a green dot in the topographic voltage map, and its waveforms are plotted below. the black contours within the map show the electrodes with statistically reliable differences (cluster). the topographic voltage map is at the peak difference between awake and drowsy states. (d) individual-level dynamics of tep cortical reactivity peak amplitude across alertness levels - (tep amplitude averaged over - ms across electrodes beneath the tms coil). normalized amplitude is shown relative to alertness level (green dashed line). black lines represent participants with higher tep amplitude at alertness level relative to alertness level (n= ); grey lines represent participants with lower tep amplitude at alertness level relative to alertness level (n= ). (e) group-level dynamics of tep waveforms across alertness levels - (teps averaged over electrodes beneath the tms coil). horizontal green dashed line delineates tep cortical reactivity peak at ms post-tms at alertness level . teps averaged across eeg electrodes within a region-of-interest (roi) beneath the tms coil. awake (red) and drowsy (blue) tep waveforms are depicted separately for each individual (n= ). electromyography (emg) waveforms depicting tms-evoked scalp muscular contraction artefact identified during independent component analysis (ica). after identification of this component, trials were split between awake (red) and drowsy (blue) conditions. data shown here are averaged across participants in whom the artefact could be identified. error shading depicts standard error of mean (sem). 
there was no significant difference between the two conditions. having examined participant-level changes in mep and tep amplitudes with spontaneous fluctuations in alertness, we next carried out single-trial analyses of tms-evoked response variability by pooling data across participants, separately for each alertness level. response variability was quantified as the absolute difference between the tms-evoked response amplitude in each trial and the mean amplitude in a given alertness condition. the variability of single-trial mep amplitude changed significantly between alertness levels figure b , there was a consistent increase in tep amplitude variability from alertness level to level (standardized jonckheere-terpstra trend test statistic= . , p= . ), suggesting increases in cortical reactivity with decreases in alertness. mep peak-to-peak amplitude. (b) tep peak amplitude. jittered dots represent individual trials across participants (n= per condition). for each alertness level, the red line depicts mean amplitude. pink shading represents standard deviation (sd), which was very small and is thus difficult to discern in the figure. blue shading represents the % confidence intervals for the mean. insets on the right indicate locations of hand and scalp electrodes and time windows used to detect peak amplitude values. in a final step, we asked whether mep and tep amplitudes were correlated at a single trial level, and whether any such association was modulated with changes in alertness. there was no significant correlation between mep and tep amplitudes at alertness level (spearman's rho= . , p= . ; not significant after correction) or level (spearman's rho= . , p= . ; not significant after correction). as shown in figure s , however, there was a significant positive correlation between mep and tep amplitudes at the deeper levels of drowsiness, including alertness level (spearman's rho= . , p= . e- ), alertness level (spearman's rho= . , p= . 
e- ), and alertness level (spearman's rho= . , p= . e- ) (see fig. s ). taken together, these findings indicate that while mep and tep responses are independent during wakefulness, they become positively associated with increases in drowsiness, suggesting a gradual shift in state toward a non-differentiated mode of neural processing. individual dots in the scatter plots depict single trial amplitudes of mep and tep responses. z-scored amplitude values are shown, and least-squares lines are plotted to visualize associations that were assessed using a spearman rank order correlation test. most studies that use tms to investigate perceptual, cognitive and motor function in human participants do not consider the possibility that fluctuating levels of alertness across a typical testing session might lead to measurable changes in both behavioural and associated patterns of brain activity. here we used single-pulse tms delivered over the right motor cortex while simultaneously measuring meps and teps across different, objectively defined levels of alertness while participants engaged in a simple tactile perception task. participants exhibited fluctuating levels of alertness across the testing session, as indexed by continuous eeg recordings, but continued to respond behaviourally even in relatively deep states of drowsiness. strikingly, both motor evoked responses and tms-evoked cortical reactivity were altered across different levels of alertness. specifically, we found that mep amplitudes peaked during eeg flattening (alertness level ), whereas tep cortical reactivity increased earlier and remained stable across alertness levels and . these findings highlight that a substantial proportion of inter-trial variability in neurophysiological responses to tms can potentially be attributed to spontaneous fluctuation in alertness. 
inter-trial and inter-subject variability in mep amplitude is a well-known source of data variance in tms experiments (kiers et al., ; ellaway et al., ; rösler et al., ; schutter et al., ; sommer et al., ) , and it has been suggested that or more trials are required to provide a reliable estimate of mep amplitude (goldsworthy et al., ) . the non-stationarity of mep amplitudes has been attributed to a number of factors, including prestimulus voluntary muscle contraction (kiers et al., ) , variation in the number of recruited alpha-motor neurons (rösler et al., ) , variation in the synchronization of motor neuron discharges (rösler et al., ) and functional hemispheric asymmetries (schutter et al., ) . furthermore, a recent series of studies found that the amplitude or phase of pre-tms eeg oscillations can predict mep amplitude, including alpha (iscan et al., ; sauseng et al., ; schulz et al., ; zarkowski et al., ) , beta (mäki & ilmoniemi, ; keil et al., ; schulz et al., ) and gamma (zarkowski et al., ) frequency bands. extending these studies, our findings suggest that changing levels of alertness could be a key factor in brain state-modulation of corticospinal excitability. for instance, zarkowski et al. ( ) demonstrated that mep amplitude is negatively correlated with pre-tms alpha power ( - hz) and positively correlated with pre-tms gamma power ( - hz), with an alpha/gamma ratio being the strongest predictor of mep amplitude. while a theta/alpha ratio can index the level of alertness in eyes-closed experiments, such as in the present study, the alertness-indexing frequencies are shifted upward in eyes-open paradigms (eoh et al., ; kaida et al., ; zhao et al., ) . it is therefore likely that the results of zarkowski et al. ( ) and other studies were influenced by changing levels of alertness in the participant sample. 
our finding of non-linear changes in mep amplitude with decreasing levels of alertness might explain previous contradictory findings regarding sleep deprivation effects on mep amplitude. while several studies have reported an increase in corticospinal motor threshold following sleep deprivation (manganotti et al., ; de gennaro et al., ) , other studies have failed to find any such effect (civardi et al., ; manganotti et al., ) . arguably, due to individual differences in instantaneous drowsiness levels and potentially different time of day and duration of testing, the dominant level of alertness varied between these studies, confounding their comparison. for instance, datasets with a relatively high proportion of trials obtained during alertness level would likely indicate higher mep amplitude compared with other datasets. unfortunately, a fine-grained measurement of alertness is seldom undertaken in mep studies, even when eeg is recorded, e.g., "sleepiness" or nrem stage sleep are usually treated as a uniform state (manganotti et al., ) , even though a more detailed analysis can reveal at least micro-states within n sleep (hori et al., ; see fig c) . regarding the modulation of teps with sleepiness, huber et al. ( ) observed an increase of tep amplitude as a function of prolonged wakefulness as well as following sleep deprivation. contrary to our results, however, they found no association between tep amplitude and short-lasting episodes of drowsiness. this discrepancy is most likely attributable to the fact that huber et al. ( ) relied solely on a behavioural definition of drowsiness (performance in a visuomotor tracking task), whereas we used fine-grained eeg measures of alertness that could be quantified independently of fluctuations in behaviour. in another recent study, tep amplitude was found to depend on the interaction between sleep pressure and a phase of circadian cycle rather than sleep homeostasis alone (ly et al., ) . 
furthermore, the same study found that an increase in tep amplitude was associated with an increase in eeg theta power across h of sustained wakefulness. unfortunately, the authors did not report whether such an association held over a shorter period of time (e.g., - mins), as would be the case in typical tms studies. sleep-related increases in tep amplitudes likely reflect facilitation of a stereotypical, local mode of processing, which has been linked to the loss of consciousness (massimini et al., ; tononi & massimini, ). similarly, an increase in tep amplitude is observed during dreamless xenon anaesthesia, whereas dreamful ketamine anaesthesia shows no changes in tep amplitude compared with wakefulness, suggesting a specific link between tep amplitude and (un)consciousness (sarasso et al., ). contrary to these suggestions, however, here we found that the perturbational tep marker of (un)consciousness arose at an early stage of drowsiness, when participants were still able to respond behaviourally and well before they fell asleep. thus, we would suggest that changes in local cortical processing as indexed by teps primarily reflect different levels of alertness rather than the presence or absence of sensory awareness. an increase in stereotypical local processing with increasing drowsiness was also observed in the association between teps and meps. fecchio et al. ( ) showed that trials with a high mep amplitude were also associated with a relatively high tep amplitude compared with trials with a low mep amplitude. in the present study, we found a gradually increasing inter-trial variability of tep amplitude across decreasing levels of alertness.
by contrast, variability in mep amplitude showed a more complex u-shaped pattern characterised by an initial decrease during early stages of drowsiness and a subsequent increase with higher levels of drowsiness, suggesting potentially independent neural pathways in the transition between wakefulness and sleep. nevertheless, while tep and mep amplitudes showed no correlation at alertness levels and , these two neural indices became positively correlated with increasing drowsiness, i.e., at alertness levels - . these findings suggest that within-trial associations between corticospinal excitability (mep) and cortical reactivity (tep) depend on the level of alertness, which should be controlled for when experiments involve hundreds of trials delivered over a prolonged testing session (fecchio et al., ; helfrich et al., ). several strategies exist for dealing with changing levels of alertness during behavioural testing. if alertness decreases throughout a session, an additional factor of trial or block number could be added to a statistical model as an alertness regressor or covariate. however, we found that an initial decrease in alertness did not persist throughout the testing session, and participants tended to "oscillate" between awake and drowsy periods (see fig s ), which precludes any straightforward inference of decreasing alertness over the course of a single testing session. alternatively, reaction times (rts) could be used as a behavioural index of alertness in active tms experiments, as rts typically lengthen with decreases in vigilance (schmidt et al., ), increases in drowsiness (ogilvie & wilkinson, ), and following sleep deprivation (ratcliff & van dongen, ). however, such a strategy would not be possible in passive paradigms in which participants are not required to respond (gordon et al., ; massimini et al., ).
furthermore, as both trial counts and reaction times are relative measures, participants who maintain high alertness throughout a session could be falsely labelled as drowsy in a proportion of trials. arguably, concurrent eeg recording should be the gold standard for assessment of single-trial alertness, as it provides quantifiable and reliable signatures of instantaneous brain states, including alpha- and theta-derived hori stages of sleep onset (hori et al., ). as an alternative to the tedious manual scoring of hori stages (alertness levels), an automated eeg method based on wakefulness and sleep grapho-elements is available for the detection of drowsiness from eeg data (jagannathan et al., ). while methods that weight the dominance of eeg theta and alpha oscillations are suitable for eyes-closed paradigms (hori et al., ; jagannathan et al., ), such as resting state or phosphene studies (bonnard et al., ; de graaf et al., ), the power of higher eeg frequencies should be considered when assessing alertness during active eyes-open experiments (eoh et al., ; kaida et al., ; zhao et al., ). finally, when eeg measurements are not available or feasible, tms experiments could be carried out in short blocks of just a few minutes each (e.g., - min) and inter-block intervals could be used to assess instantaneous subjective sleepiness, for example by asking participants to complete the -graded karolinska sleepiness scale (Åkerstedt & gillberg, ; kaida et al., ). paradigms. it will also be important to investigate whether neurophysiological effects of decreasing alertness are limited to individuals who are likely to fall asleep in a situation of prolonged inactivity, such as our participants. if this turned out to be the case, it might be advisable to selectively recruit individuals who are very unlikely to fall asleep during daytime hours to tms experiments that require a high and stable level of alertness.
to conclude, our findings challenge the widely held assumption that the cortex is maintained in a more or less "steady state" when participants undertake experimental investigations of perceptual, cognitive or motor function. our findings demonstrate that spontaneously occurring fluctuations in alertness differentially modulate cortical reactivity over relatively short durations, even when participants are tested during day time hours when they would normally be awake and performing typical activities of daily living. our study highlights the importance of controlling for spontaneous fluctuations in alertness at a single trial level in non-invasive brain stimulation studies. twenty participants ( male; mean age . years: age range - years) took part in the study. all participants were screened for contraindications to tms (rossi et al., ) , which included having no history of hearing impairment or injury, and no neurological or psychiatric disorders. all participants were right handed, as assessed using the edinburgh handedness scale (oldfield, ) . the mean handedness index was . (sd= . ; range . to ). potential participants were also screened with the epworth sleepiness scale (ess) (johns, ) . the mean ess score was . (sd= . ), which indicates that most of the participants had a slight to moderate chance of dozing off in a situation of prolonged inactivity. university of queensland (uq), and the study was carried out in accordance with the declaration of helsinki. all participants gave informed, signed consent. participants were recruited through an electronic volunteer database managed by uq's school of psychology. they received $ for taking part in the study. there were no adverse reactions to tms. surface emg was recorded from the first dorsal interosseous (fdi) of the left and right hands using disposable mm ag-agcl electrodes (kendall h sg by covidien; ma, usa). 
the electrodes were placed in a belly-tendon montage with the reference over the proximal phalanx of the index finger and a common reference on the right elbow. raw emg signals were amplified (× ) and filtered ( - hz; hz notch filter) using a digitimer neurolog system (digitimer; hertfordshire, uk). the data were digitised at hz using a power and signal (v ) software (cambridge electronic design; cambridge, uk) and stored for offline analysis on a pc. throughout the experiment emg activity was monitored on-line using a digital oscilloscope with a high gain. participants were prompted to relax if any unwanted muscle activity was observed. tms was applied to the right primary motor cortex using a magstim stimulator and a mm figure-of-eight coil (# - ; the magstim company; carmarthenshire, uk). the site for stimulation was the point on the scalp over the motor cortex that elicited the largest and most consistent amplitude meps from the left fdi. this stimulation 'hotspot' was found by placing the tms coil tangentially on the scalp with the handle pointing posteriorly and laterally at ~ ° to the sagittal plane, and stimulating at an intensity that was assumed to be slightly suprathreshold for most individuals. once the hotspot had been identified it was marked using an infrared neuro-navigation system (visor by ant neuro; enschede, the netherlands). a small piece of foam (~ ) mm thick was then placed under the centre of the tms coil so that it was not in physical contact with any eeg electrodes. the hotspot was remarked and the location and orientation of the tms coil were maintained throughout the testing session with the aid of the neuro-navigation system. accuracy of coil position and handle orientation were kept within mm and degrees, respectively, but were typically within mm and degrees. 
resting motor threshold was determined using the relative frequency method with a criterion of ≥ µv (peak-to-peak) mep amplitude in at least five out of ten consecutive trials (rossini et al., ; samii et al., ). a two-down, one-up staircase was used, starting at a suprathreshold intensity. mean motor threshold for the group was . % (range - %) of maximal tms output intensity. continuous eeg data were acquired using a channel brainamp mr plus amplifier, tms braincap and brain vision recorder (v ) software (brain products; gilching, germany). a high chloride abrasive electrolyte gel was used (abralyt hicl by easycap; herrsching, germany) and electrode placement corresponded with the international - system. data were sampled at khz with a bandpass filter of dc- hz and resolution of . mv (± . mv). recordings were referenced online to the left mastoid, and electrode impedance was typically kept below kΩ. participants were seated in a comfortable reclining chair that included head and leg support (see figure b). after placing the emg and eeg electrodes, participants had their eyes blindfolded and the lights in the lab were dimmed. they were instructed to relax for a few minutes while estimation of individual resting motor threshold was performed. participants' hands were comfortably supported with pillows. after threshold estimation the combined tms-eeg experiment was carried out, and participants were reminded to stay relaxed and keep their eyes closed. they were also instructed to pay attention covertly to their left hand and to respond by clicking one of the two keys on a mouse held in their right hand if they felt a tactile sensation, such as a twitch or a touch, in their left hand at the time of each tms pulse (see figure a). participants were explicitly instructed that they were permitted to fall asleep should they wish to.
if no responses were registered after - consecutive trials (i.e., failure to press a mouse button within seconds of a tms pulse), participants were gently awakened verbally and reminded to continue the task. during stimulation, nine tms intensities centred on the individual resting motor threshold were used (- %, - %, - %, - %, %, + %, + %, + %, + %). given that tms stimulator output intensity is measured in whole numbers from to , the calculated percentage from threshold intensity was rounded. this yielded slightly different sized steps from - % to + % for some individuals, and this was taken into consideration when fitting sigmoidal functions at the single participant level. for each individual, trials of single pulse tms were delivered, with an average inter-pulse interval of . sec and a uniformly distributed random jitter of ± ms. thus, the inter-pulse interval lasted between . - . s. we incorporated a relatively long inter-pulse interval to facilitate the natural development of drowsiness, and to allow sufficient time for a return of tonic emg activity to its baseline level. as our aim was to obtain the maximum number of evoked responses (meps and teps) around the tms threshold intensity, the following number of trials was delivered at each tms intensity: trials ( . % of a total) at each of the - %, - %, - %, + %, + % and + % intensities; trials ( . % of a total) at each of the - % and - % intensities; trials ( . % of a total) at % (i.e., at the individual resting motor threshold). trial order was randomized throughout the experiment. tms pulses were delivered in blocks of trials. one experimenter held the tms coil and the other monitored ongoing eeg; these individuals switched their places after each block. an extended rest was provided after blocks to allow participants a break from the task, to change the heated tms coil, and to reduce the impedance of any eeg electrodes if required. data collection lasted approximately minutes. 
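the stimulation schedule described above (intensities expressed relative to resting motor threshold, rounded to whole stimulator units, with a uniformly jittered inter-pulse interval) can be sketched as follows. this is a minimal illustration, not the authors' acquisition code: the offsets, the threshold value, the mean interval and the jitter used in the example are hypothetical placeholders, because the actual values are not recoverable from the text above.

```python
import random

def stimulator_intensities(rmt, offsets_pct):
    """Convert offsets (in percent of resting motor threshold) into
    whole-number stimulator outputs. Python's round() rounds exact
    halves to the nearest even integer."""
    return [round(rmt * (1 + off / 100.0)) for off in offsets_pct]

def jittered_intervals(n_trials, mean_s, jitter_s, rng):
    """Draw inter-pulse intervals uniformly from mean_s +/- jitter_s."""
    return [rng.uniform(mean_s - jitter_s, mean_s + jitter_s)
            for _ in range(n_trials)]

# Hypothetical values, for illustration only (not taken from the study).
offsets = [-20, -15, -10, -5, 0, 5, 10, 15, 20]   # percent of threshold
rmt = 47                                          # percent of maximal output
levels = stimulator_intensities(rmt, offsets)

rng = random.Random(0)                            # seeded for repeatability
intervals = jittered_intervals(10, mean_s=8.0, jitter_s=1.0, rng=rng)
```

with these placeholder values the rounded levels are not equally spaced, which is why the rounded intensity actually delivered, rather than the nominal percentage, should serve as the x value when a sigmoid is fitted for a single participant.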
in an effort to reduce the potential impact of any circadian fluctuation in cortical excitability (sale et al., ), all testing sessions commenced at . pm (a time at which participants were more likely to feel drowsy after having had their lunch). peak-to-peak amplitudes of meps evoked by tms pulses delivered over the right motor cortex were calculated for each trial within a - ms time window using signal (v ) software (cambridge electronic design; cambridge, uk). trials containing phasic muscle activity in the left fdi channel within ms prior to a tms pulse being delivered were discarded from the analyses. we characterized modulations of meps as a function of alertness levels by fitting a sigmoid function to the proportion of trials that evoked meps (constrained from to on the y axis) across the tms intensities (- %, - %, - %, - %, %, + %, + %, + %, + %). we then compared threshold and slope measures in awake and drowsy trials separately for each participant. a μv cut-off threshold in peak-to-peak amplitude (rossini et al., ; samii et al., ) was used to define the presence of an mep. a sigmoid function was fitted to each individual participant's data: where f is the mep ratio, x is the tms intensity, µ is the threshold value (the tms intensity at the inflection point), and s is inversely proportional to the slope at the threshold. the actual slope of the fitted sigmoid was calculated by fitting a straight line between a point . above the inflection point and a point . below it (see figure s ). to assess dynamics of mep peak-to-peak amplitude across alertness levels - , responsive trials with mep amplitude at least twice as high as the peak-to-peak distance in the - - ms baseline window were averaged separately for each alertness level and each participant.
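the sigmoid fit and the straight-line slope estimate described above can be sketched as follows. the exact functional form is not preserved in the text, so this example assumes a standard logistic function, which matches the stated roles of µ (inflection point) and s (inversely proportional to the slope at threshold); a cumulative gaussian would match them equally well. the grid-search fit, the step `delta` and all example numbers are hypothetical stand-ins rather than the authors' procedure.

```python
import math

def sigmoid(x, mu, s):
    """Assumed logistic form: proportion of trials with an MEP at intensity x."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def fit_sigmoid(intensities, proportions, mu_grid, s_grid):
    """Brute-force least-squares fit over candidate (mu, s) pairs."""
    best = None
    for mu in mu_grid:
        for s in s_grid:
            sse = sum((sigmoid(x, mu, s) - p) ** 2
                      for x, p in zip(intensities, proportions))
            if best is None or sse < best[0]:
                best = (sse, mu, s)
    return best[1], best[2]

def local_slope(mu, s, delta):
    """Slope of the straight line joining the curve at mu - delta and mu + delta
    (delta is taken along the intensity axis; this is an assumption)."""
    rise = sigmoid(mu + delta, mu, s) - sigmoid(mu - delta, mu, s)
    return rise / (2.0 * delta)

# Hypothetical data: intensities in percent relative to motor threshold.
intensities = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
proportions = [sigmoid(x, 1.0, 4.0) for x in intensities]   # noise-free example

mu_grid = [0.5 * k for k in range(-20, 21)]   # candidate thresholds
s_grid = [0.5 * k for k in range(1, 41)]      # candidate slope parameters
mu_hat, s_hat = fit_sigmoid(intensities, proportions, mu_grid, s_grid)
slope_hat = local_slope(mu_hat, s_hat, delta=2.0)
```

under this assumed form the tangent slope at the inflection point equals 1/(4s), so a larger s means a shallower curve; the chord-based estimate is always slightly below that value.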
in an effort to control for mep variance as a function of tms intensity, only three tms intensities (- %, %, + %) around each individual's motor threshold were included in the analysis of mep changes across alertness levels. eeg pre-processing and analysis: pre-tms spectral power eeg data pre-processing was carried out using eeglab toolbox for matlab (delorme & makeig, ) , with two separate pre-processing pipelines developed for the analysis of eeg activity before and after tms pulses. to calculate eeg spectral power before tms, the recordings were downsampled to hz, and then epoched in - ms to - ms time segments preceding each tms pulse. the noisiest epochs were manually deleted, and the most deviant eeg channels were detected with the 'spectopo' function before running the independent component analysis (ica) for further removal of artefacts such as eye blinks and saccades, heartbeats, and muscle noise. ica was carried out on relatively clean channels only, whereas the noisy channels were recalculated by spherical spline interpolation of surrounding channels after deleting ica components with artefacts. data were again manually inspected and several remaining noisy epochs were deleted. on average, trials ( . %) were discarded per single participant during emg and eeg pre-processing, leaving on average trials per participant (sd= ; range= - trials) for the subsequent analyses. the spectral power of eeg oscillations over the s time interval immediately preceding each tms pulse was computed using a hilbert transform, set from . hz to . hz in steps of hz. given that estimation of spectral power of slow oscillations can be difficult close to the edges of eeg segments (cohen, ) , and we were particularly interested in the spectral power just before each tms pulse, a dummy copy of each eeg epoch was created by flipping the left and the right sides of each pre-tms epoch along the time axis. 
the resulting "mirror image" data were then concatenated with the original pre-tms data; that is, the time axis of the obtained . s eeg epochs extended from - ms to - ms (original) and then back from - ms to - ms (mirror). in this manner, an abrupt discontinuity was avoided in the time window just before the tms pulse, thus enabling a more stable estimate of spectral power. after hilbert transformation, the "mirror" part of the eeg epoch was deleted, retaining the original pre-tms window from - ms to - ms. to reduce data size, eeg recordings were down-sampled to hz before running the hilbert transform. two complementary eeg measures were used to assess participants' level of alertness before each tms pulse: ( ) the hori scoring system of sleep onset eeg (hori et al., ), and ( ) a ratio between eeg spectral power of pre-tms theta and alpha oscillations, which we refer to here as the 'θ/α' measure of alertness (bareham et al., ; noreika et al., ). the hori system relies on visual scoring of s segments of continuous eeg data (hori et al., ). it consists of stages reflecting a gradual progression from wakefulness to sleep, from hori stage which refers to alpha-dominated relaxed wakefulness, to hori stage which is defined by the occurrence of complete spindles coinciding with classic stage nrem sleep (see fig c). the hori system has been used to map dynamic wake-sleep changes in erps (nittono et al., ), eeg spectral power (tanaka et al., ), reaction times, and the rate of subjective reports of being asleep (hori et al., ). in the present study, hori stages were visually assessed by an experienced sleep researcher (vn) who was blind to participants' responsiveness and the tms intensity on any trial. for scoring purposes, only eeg channels of the standard - system were used (fp , fp , f , f , fz, f , f , c , cz, c , t , t , p , p , pz, p , p , o , o ), and eeg recordings were low pass filtered ( hz).
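the mirror-padding step described above can be illustrated with a short sketch. the hilbert transform itself is not implemented here: a simple moving average stands in for any transform whose output is unreliable near the edges of a segment. the function names and the toy epoch are hypothetical.

```python
def mirror_pad(epoch):
    """Append a time-reversed copy so the end of the epoch is no longer an edge."""
    return list(epoch) + list(reversed(epoch))


def transform_with_mirror(epoch, transform):
    """Apply `transform` to the padded epoch, then keep only the original part."""
    padded = mirror_pad(epoch)
    out = transform(padded)
    return out[:len(epoch)]


def moving_average(x, half_width=1):
    """Stand-in for an edge-sensitive transform (truncated window at the edges)."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - half_width)
        hi = min(len(x), i + half_width + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


epoch = [1.0, 2.0, 3.0, 4.0]          # last sample is closest to the tms pulse
plain = moving_average(epoch)          # last estimate uses a truncated window
mirrored = transform_with_mirror(epoch, moving_average)
print(plain[-1], mirrored[-1])
```

in the padded version the final sample of the original epoch sits in the middle of the segment, so its estimate is computed from a full window rather than a truncated one.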
previous research has found that participants are typically unresponsive in hori stages and above (ogilvie, ) , so our analysis was restricted to hori stages - , which we refer to here as alertness levels - . hori stages to are marked by decreasing activity in the alpha range, and hori stages to are characterized by an increase in activity in the theta range (hori et al., ) . thus, progression of drowsiness can be quantified by a ratio of the spectral power of the alpha and theta eeg frequency bands. specifically, here drowsiness was quantified as a period of time with an increased θ/α ratio of spectral power (bareham et al., ) . to apply this measure, theta ( . - . hz) and alpha ( . - . hz) power was first averaged in time from - ms to - ms, and the θ/α ratio was then calculated for each trial and electrode. next, the θ/α ratio was averaged across all electrodes, resulting in a single "alertness" value per trial. finally, trials were split into the most strongly "awake" ( %) and most strongly "drowsy" ( %) trials, excluding the % of trials that were intermediate between the two extremes. importantly, in addition to spontaneous fluctuations in alertness, the spectral power of eeg pre-stimulus oscillations can reflect attentional sampling and/or sensory gating (capotosto et al., ; romei et al., ; van dijk et al., ) . we expected that an alertness-related effect would be spatially and temporally widespread and consistent, so we repeated the mep analysis by splitting the data between awake and drowsy trials separately for each eeg electrode, and in equally sized pre-tms time bins of ms duration, from - ms to ms relative to the tms pulse. all participants completed the experimental task and reached the expected alertness level or higher, marked by the occurrence and dominance of theta waves. at a group level, a comparable proportion of awake and drowsy trials were obtained as per the criteria defined above (alertness levels - : m= . %, sd= . ; alertness levels - : m= . 
%, sd= . ) (see fig d) . thus, even though the hori system provides absolute electrophysiological signatures of the depth of drowsiness, the θ/α ratio was used to identify equal proportions of awake and drowsy trials within each participant. given that the θ/α measure is relative, there was a risk of mislabelling trials for some participants, as it would make a split between "awake" and "drowsy" trials even if all of them happened to be alertness level . thus, to verify the use of θ/α data splits, we compared these two measures at an individual level and at the level of the group as a whole. first, we carried out correlation analyses between the two measures of alertness within each participant. second, we compared correlation coefficients against zero to assess the consistency of association between the hori and the θ/α measures. at an individual level, hori and θ/α scores were positively and significantly correlated for all participants (individual rho ranged from . to . ). group analysis confirmed a very strong association between these two electrophysiological measures of alertness (one sample t test: t( )= . , p< . ), confirming that the θ/α ratio was well suited to assessing the level of alertness in the sample here (see fig f) . analysis of eeg reactivity to tms pulses in the first ms time window requires a perfect alignment of tms markers with the onset of the actual tms pulses. given that there was some delay and jittering between a tms marker and the pulse itself (m= . ms, sd= . ms), eeg markers indicating tms intensity were automatically adjusted to the time point of the actual tms pulse. for this, raw eeg data were segmented ± ms around each tms marker, and global field power (gfp) was calculated as a standard deviation of voltage across all electrodes, resulting in a single time waveform for each tms marker. each obtained waveform was baseline corrected to the - ms to - ms time window, and each time sample was transformed to its absolute value. 
the remaining time window of - ms to + ms was scanned for the first time point at which the gfp value exceeded the maximal baseline gfp value by a factor of five, which indicated the onset of a tms artefact. the tms marker was then reallocated to this point in the continuous eeg recording. the eeg data were processed following an ica-based approach of tms-eeg artefact cleaning (rogasch et al., ). first, eeg data were segmented from - ms to + ms around the onset of tms artefact. next, the segments were baseline corrected to the mean of the - ms to - ms time window. a line was then fitted to the data from - ms to ms, thus deleting the initial tms-eeg artefact, and the epochs were down-sampled to hz. the most deviating eeg channels were then detected with the 'spectopo' function and the first round of independent component analysis (ica) was performed without using noisy channels. after deleting a very distinctive early high amplitude component that reflected tms-evoked contraction of scalp muscles, eeg data were filtered ( - hz) and epoched from - ms to + ms around the onset of the tms marker. once again, any deviating eeg channels were identified and a second round of ica was carried out without using noisy channels. independent components reflecting tms-eeg decay artefact, eye movements, auditory evoked potentials, hz line noise, and other sources of noise, were deleted, after which bad channels were recalculated using spherical spline interpolation. the eeg segments were again baseline corrected (- ms to - ms), and manually inspected to delete epochs that still contained a residual tms artefact. to account for within-trial variance, raw voltages of each individual trial were transformed to z-scores using the mean and standard deviation of the baseline period (- to - ms). trials were then split into different levels of alertness.
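the marker-realignment procedure (global field power, baseline correction, rectification, and a search for the first sample exceeding five times the maximal baseline value) can be sketched as follows. this is a minimal illustration on toy data; the function names, the number of baseline samples and the example waveform are assumptions.

```python
import statistics


def gfp(samples_by_channel):
    """Global field power: standard deviation across electrodes at each time sample."""
    n_times = len(samples_by_channel[0])
    return [statistics.pstdev([ch[t] for ch in samples_by_channel])
            for t in range(n_times)]


def find_artifact_onset(gfp_wave, n_baseline, factor=5.0):
    """Index of the first post-baseline sample whose rectified, baseline-corrected
    value exceeds `factor` times the maximal baseline value, or None if absent."""
    base_mean = sum(gfp_wave[:n_baseline]) / n_baseline
    rectified = [abs(v - base_mean) for v in gfp_wave]
    limit = factor * max(rectified[:n_baseline])
    for i in range(n_baseline, len(rectified)):
        if rectified[i] > limit:
            return i
    return None


# Toy waveform: four baseline samples, then a large deflection at index 6.
wave = [1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 9.0, 4.0]
onset = find_artifact_onset(wave, n_baseline=4)
print(onset)
```

the returned index is the sample to which the marker would be moved; in a real recording the segment length and baseline window would follow the values given in the text.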
to assess changes in eeg reactivity to tms perturbation as a function of alertness, the four electrodes immediately beneath the tms coil were chosen to contribute their voltage values to a region of interest (roi), and these were then averaged within each participant. the group-level waveform was then plotted, revealing an early tep peak at ms post-tms pulse. the data were then split between alertness levels and the mean amplitude (± ms) around the peak ( - ms) was calculated for each participant and each level of alertness. erp dynamics were additionally studied using data-driven spatiotemporal clustering analyses similar to what we have described previously (chennu et al., ) . awake and drowsy trials were compared in the time windows of interest ( - ms) by averaging single-subject data and running group level clustering. using modified functions of fieldtrip toolbox (maris and oostenveld, ; oostenveld et al., ) , we compared corresponding spatiotemporal points in individual awake and drowsy trials with an independent samples t-test. although this step was parametric, fieldtrip uses a nonparametric clustering method (bullmore et al., ) to address the multiple comparisons problem. t values of adjacent spatiotemporal points whose p values were less than . were clustered together by summating their t values, and the largest such cluster was retained. a minimum of two neighbouring electrodes had to pass this threshold to form a cluster, with the neighbourhood defined as other electrodes within a cm radius. this whole procedure, that is, calculation of t values at each spatiotemporal point followed by clustering of adjacent t values, was repeated times, with recombination and randomized resampling before each repetition. this monte carlo method generated a nonparametric estimate of the p value representing the statistical significance of the originally identified cluster. 
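the logic of the cluster-based monte carlo test can be sketched in a reduced form. this is not the fieldtrip implementation: it clusters over time only (no electrode neighbourhoods), thresholds on a critical t value rather than on p values, and uses a placeholder number of permutations. all names and the simulated data in the usage note are hypothetical.

```python
import random
import statistics


def t_independent(a, b):
    """Student's t for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def max_cluster_mass(tvals, t_crit):
    """Largest |sum of t| over runs of adjacent same-sign supra-threshold points."""
    best = 0.0
    run = 0.0
    for t in tvals:
        if abs(t) > t_crit and (run == 0.0 or (t > 0) == (run > 0)):
            run += t              # extend the current cluster
        elif abs(t) > t_crit:
            run = t               # sign changed: start a new cluster
        else:
            run = 0.0             # below threshold: cluster ends
        best = max(best, abs(run))
    return best


def cluster_permutation_p(group_a, group_b, t_crit, n_perm=500, seed=0):
    """group_a / group_b: lists of trials, each trial a list of values over time."""
    n_times = len(group_a[0])

    def stat(a, b):
        tvals = [t_independent([tr[i] for tr in a], [tr[i] for tr in b])
                 for i in range(n_times)]
        return max_cluster_mass(tvals, t_crit)

    observed = stat(group_a, group_b)
    rng = random.Random(seed)
    pooled = group_a + group_b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)       # random reassignment of condition labels
        if stat(pooled[:len(group_a)], pooled[len(group_a):]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

calling `cluster_permutation_p(awake_trials, drowsy_trials, t_crit=2.0)` on two lists of equally long trials returns the observed cluster mass and its monte carlo p value. because only the largest cluster enters the null distribution, the test controls for multiple comparisons across time points.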
the cluster-level t value was calculated as the sum of the individual t values at the points within the cluster. we considered the possibility that a hypothetical alertness-modulation of the contraction of scalp muscles following tms may have contributed to the alertness-modulation of tep amplitude. to address this hypothesis, we compared the amplitude of the ica component of the scalp muscle, which was removed during the first stage of ica cleaning (rogasch et al., ), between θ/α-defined awake and drowsy states. no amplitude difference was observed between awake and drowsy trials (see fig s ), ruling out the possibility that tep cortical reactivity changes between awake and drowsy states were due to a change in the intensity of scalp muscle contraction. paired samples t tests were used to compare behavioural and neural summary measures between θ/α-defined awake and drowsy states. pooled variance was used to calculate cohen's d, with . indicating a small effect size, . a medium effect size, and . a large effect size (cohen, ). for a similar comparison of summary measures across alertness levels - , a one way repeated measures anova was carried out with linear as well as non-linear contrasts. huynh-feldt correction was used when mauchly's test indicated violation of the assumption of sphericity. partial η² was calculated as an effect size in anova tests, with . indicating a small effect size, . a medium effect size, and . a large effect size (cohen, ). shapiro-wilk's test was used to assess normality of the distribution before running parametric tests. square-root or log transforms were used to normalize skewed data. when transformations failed, non-parametric statistical tests were used, such as wilcoxon's signed-ranks test instead of a paired samples t test, and the mann-kendall trend test instead of a one-way repeated measures anova for linear contrasts across alertness levels - .
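cohen's d with a pooled variance, as mentioned above, can be computed as in this minimal sketch. the formula is the standard pooled-variance definition; the function name and the example values are hypothetical, and whether the study applied exactly this variant to its paired data is an assumption.

```python
import statistics


def cohens_d_pooled(x, y):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / pooled_var ** 0.5


# Hypothetical summary measures for two states.
awake = [2.0, 4.0, 6.0]
drowsy = [1.0, 3.0, 5.0]
d = cohens_d_pooled(awake, drowsy)
print(d)
```

with equal group sizes, as in a paired design, the pooled variance reduces to the average of the two sample variances.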
a jonckheere-terpstra trend test was used to assess a linear change of single trial response variance across alertness levels - , followed up by planned contrasts using a mann-whitney test. for any significant main effects, bonferroni-holm multiplicity correction (holm, ) of p values was carried out to account for multiple follow-up comparisons between baseline alertness level and the other four levels. non-parametric spearman's rank order correlation test was used to assess for an association between single-trial mep and tep responses, with the bonferroni-holm correction of p values (holm, ) . aiming to avoid sample size effects when comparing pooled single-trial data across different levels of alertness, the trial number was matched by randomly selecting trials for each alertness level, which was the minimum number observed in one of the conditions. statistical analyses were carried out using matlab and ibm spss (v ) software packages. subjective and objective sleepiness in the active individual losing the left side of the world: rightward shift in human spatial attention with sleep onset non-invasive magnetic stimulation of human motor cortex eeg-guided transcranial magnetic stimulation reveals rapid shifts in motor cortical excitability during the human sleep slow oscillation the uses and interpretations of the motor-evoked potential for understanding behaviour resting state brain dynamics and its transients: a combined tms-eeg study global, voxel, and cluster tests, by theory and permutation, for a difference between two groups of structural mr images of the brain frontoparietal cortex controls spatial attention through modulation of anticipatory alpha rhythms expectation and attention in hierarchical auditory prediction measuring brain stimulation induced changes in cortical properties using tms-eeg cortical excitability and sleep deprivation: a transcranial magnetic stimulation study statistical power analysis for the behavioral sciences analyzing neural 
time series data: theory and practice effects of antiepileptic drugs on cortical excitability in humans: a tms-emg and tms-eeg study dynamic modulation of decision biases by brainstem arousal systems. elife neurophysiological correlates of sleepiness: a combined tms and eeg study seeing in the dark: phosphene thresholds with eyes open versus closed in the absence of visual inputs eeglab: an open source toolbox for analysis of singletrial eeg dynamics including independent component analysis transcranial magnetic stimulation reveals intrinsic perceptual and attentional rhythms variability in the amplitude of skeletal muscle responses to magnetic stimulation of the motor cortex in man electroencephalographic study of drowsiness in simulated driving with sleep deprivation the spectral features of eeg responses to transcranial magnetic stimulation of the primary motor cortex depend on the amplitude of the motor evoked potentials detecting driver mental fatigue based on eeg alpha power changes during simulated driving minimum number of trials required for within-and between-session reliability of tms measures of corticospinal excitability modulation of cortical responses by transcranial direct current stimulation of dorsolateral prefrontal cortex: a resting-state eeg and tms-eeg study corticospinal excitability in human sleep as assessed by transcranial magnetic stimulation monitoring cortical excitability during repetitive transcranial magnetic stimulation in children with adhd: a single-blind, sham-controlled tms-eeg study attention modulates tmslocked alpha oscillations in the visual cortex excitability of the human motor cortex is enhanced during rem sleep a simple sequentially rejective multiple test procedure topographical eeg changes and the hypnagogic experience human cortical excitability increases with time awake abnormal cortical motor excitability in dystonia neuronal responses to magnetic stimulation reveal cortical reactivity and connectivity prestimulus 
alpha oscillations and inter-subject variability of motor evoked potentials in singleand paired-pulse tms paradigms tracking wakefulness as it fades: micro-measures of alertness a new method for measuring daytime sleepiness: the epworth sleepiness scale validation of the karolinska sleepiness scale against performance and eeg variables cortical brain states and corticospinal synchronization influence tms-evoked motor potentials variability of motor potentials evoked by transcranial magnetic stimulation circadian regulation of human cortical excitability neuronal responses to magnetic stimulation reveal cortical reactivity and connectivity inter-and intra-individual variability of paired-pulse curves with transcranial magnetic stimulation (tms) eeg oscillations and magnetically evoked motor potentials reflect motor system excitability in overlapping neuronal populations decrease in motor cortical excitability in human subjects after sleep deprivation changes of motor cortical excitability in human subjects from wakefulness to early stages of sleep: a combined transcranial magnetic stimulation and electroencephalographic study effects of sleep deprivation on cortical excitability in patients affected by juvenile myoclonic epilepsy: a combined transcranial magnetic stimulation and eeg study nonparametric statistical testing of eeg-and meg-data breakdown of cortical effective connectivity during sleep cortical mechanisms of loss of consciousness: insight from tms/eeg studies gradual changes of mismatch negativity during the sleep onset period wakefulness state modulates conscious access: suppression of auditory detection in the transition to sleep the process of falling asleep the detection of sleep onset: behavioral and physiological convergence the assessment and analysis of handedness: the edinburgh inventory fieldtrip: open source software for advanced analysis of meg, eeg, and invasive electrophysiological data diffusion model for one-choice reaction-time tasks 
and the cognitive effects of sleep deprivation removing artefacts from tms-eeg recordings using independent component analysis: importance for assessing prefrontal and motor cortex network properties spontaneous fluctuations in posterior α-band eeg activity reflect variability in excitability of human visual areas trial-to-trial size variability of motorevoked potentials. a study using the triple stimulation technique safety, ethical considerations, and application guidelines for the use of transcranial magnetic stimulation in clinical practice and research noninvasive electrical and magnetic stimulation of the brain, spinal cord and roots: basic principles and procedures for routine clinical application factors influencing the magnitude and reproducibility of corticomotor excitability changes induced by paired associative stimulation subjective characteristics of tms-induced phosphenes originating in human v and v characterization of postexercise facilitation and depression of motor evoked potentials to transcranial magnetic stimulation consciousness and complexity during unresponsiveness induced by propofol, xenon, and ketamine spontaneous locally restricted eeg alpha activity determines cortical excitability in the motor cortex drivers' misjudgement of vigilance state during prolonged monotonous daytime driving now i am ready-now i am not: the influence of pre-tms oscillations and corticomuscular coherence on motorevoked potentials corticospinal state variability and hemispheric asymmetries in motivational tendencies intra-and interindividual variability of motor responses to repetitive transcranial magnetic stimulation decoding wakefulness levels from typical fmri restingstate data reveals reliable drifts between wakefulness and sleep topographical characteristics and principal component structure of the hypnagogic eeg why does consciousness fade in early sleep? 
transcranial magnetic stimulation in basic and clinical neuroscience: a comprehensive review of fundamental principles and novel insights prestimulus oscillatory activity in the alpha band predicts visual discrimination ability eeg and the variance of motor evoked potential amplitude thirty years of transcranial magnetic stimulation: where do we stand? electroencephalogram and electrocardiograph assessment of mental fatigue in a driving simulator the study was supported by the wellcome trust (wt ma to tab), the eu key: cord- -o kvd b authors: mcpherson, malinda j.; grace, river c.; mcdermott, josh h. title: harmonicity aids hearing in noise date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: o kvd b hearing in noise is a core problem in audition, and a challenge for hearing-impaired listeners, yet the underlying mechanisms are poorly understood. we explored whether harmonic frequency relations, a signature property of many communication sounds, aid hearing in noise. we measured detection thresholds in noise for tones and speech synthesized to have harmonic or inharmonic spectra. harmonic signals were consistently easier to detect than otherwise identical inharmonic signals. harmonicity also improved discrimination of sounds in noise. in contrast to other documented effects of harmonicity, harmonic detection advantages were comparable in musicians and non-musicians. the results show that harmonicity is critical for hearing in noise, demonstrating a previously unappreciated aspect of auditory scene analysis. the consistency of the effect across synthetic and natural stimuli, as well as across musical expertise, suggests its importance in everyday hearing. noise is an unavoidable part of our auditory experience. we must pick out sounds of interest amid background noise on a daily basis -a speaker in a restaurant, a bird song in a windy forest, or a siren on a city street. 
noise distorts the peripheral representation of sounds, but humans with normal hearing are relatively robust to its presence (sarampalis et al., ). however, hearing in noise becomes more difficult with age (ruggles et al., ; tremblay et al., ) and for those with even moderate hearing loss (bacon et al., ; oxenham, ; plack et al., ; rossi-katz & arehart, ; smoorenburg, ; tremblay et al., ). consequently, understanding the basis of hearing in noise, and its malfunction in hearing impairment, has become a major focus of auditory research (kell & mcdermott, ; khalighinejad et al., ; mesgarani et al., ; moore et al., ; rabinowitz et al., ; town et al., ). hearing in noise can be viewed as a particular case of auditory scene analysis, the problem listeners solve when segregating individual sources from the mixture of sounds entering the ears (bregman, ; carlyon, ; darwin, ; mcdermott, ). in general, segregating sources from a mixture is possible only because of the regularities in natural sounds (młynarski & mcdermott, ). most research on the signal properties that help listeners segregate sounds has focused on situations where people discern concurrent sources of the same type, for example, multiple speakers (the classic 'cocktail party problem' (assman & summerfield, ; culling & summerfield, a; de cheveigne, kawahara, et al., ; de cheveigne et al., ; de cheveigne, mcadams, et al., )), or multiple concurrent tones (as in music (rasch, )). concurrent onsets or offsets (darwin, ; darwin & ciocca, ), co-location in space (cusack et al., ; freyman et al., ; hawley et al., ; ihlefeld & shinn-cunningham, ), and frequency proximity (chalikia & bregman, ; darwin & hukin, ; młynarski & mcdermott, ) can all help to group sound elements and segregate them from other similar sounds in the background. harmonicity, the property of frequencies that are multiples of a common 'fundamental', or f (fig. ), likewise aids auditory grouping.
for example, harmonic structure can help a listener select a single talker from a mixture of talkers (darwin et al., ; josupeit & hohmann, ; josupeit et al., ; popham et al., ; woods & mcdermott, ) . and when one harmonic in a complex tone or speech utterance is mistuned so that it is no longer an integer multiple of the fundamental, it can be heard as a separate sound (hartmann et al., ; moore et al., ; popham et al., ; roberts & brunstrom, ) . less is known about the factors and mechanisms that enable hearing in noise (operationally defined for the purposes of this paper as a background sound that does not contain audibly discrete frequency components, for example, white or pink gaussian noise, and some sound textures). previous research on hearing in noise has mainly focused on features of noise, such as stationarity, that aid its suppression (kell & mcdermott, ; khalighinejad et al., ; mesgarani et al., ; moore et al., ; rabinowitz et al., ) or separation (mcwalter & mcdermott, ; mcwalter & mcdermott, ) from signals such as speech. here we instead study the aspects of a signal that enable it to be heard more readily in noise. harmonicity is one sound property that differentiates communication signals such as speech and music from noise. although harmonicity is known to aid the segregation of multiple harmonically structured sounds, its role in hearing in noise is unclear. to explore whether harmonic frequency relations aid hearing of sounds in noise, we compared detection and discrimination of harmonic and inharmonic tones and spoken vowels embedded in noise. inharmonic sounds were generated by jittering frequency components so that they were not integer multiples of the fundamental frequency (mcpherson & mcdermott, ; roberts & holmes, ). the first question was whether harmonicity would make sounds easier to detect in noise across a range of sounds and tasks. 
the one related prior study we know of found that 'chords' composed of three harmonically related pure tones were somewhat easier to detect in noise than nonharmonically related tones, but did not pursue the basis of this effect (hafter & saberi, ). the second question was whether harmonicity would make sounds easier to discriminate in noise. to address this question, we measured pitch discrimination thresholds with harmonic and inharmonic stimuli at a range of signal-to-noise ratios (snrs), asking whether harmonicity would aid discrimination in noise at supra-threshold snrs. pitch discrimination thresholds are known to be comparable for harmonic and inharmonic tones without noise, suggesting that listeners track the spectra of the tones in order to make up/down discrimination judgments (faulkner, ; mcpherson & mcdermott, ; micheyl et al., ; moore & glasberg, ). but in noisy conditions it could be difficult to accurately encode the entire spectrum, making it advantageous to rely on harmonic structure for discrimination. previous studies have found that it is easier to hear the f of harmonic sounds when there is background noise (hall & peters, ; houtgast, ), but it was unclear whether such effects would translate to improved discrimination of harmonic tones, compared to inharmonic tones, in noise. a third question was whether any benefit of harmonicity would be influenced by musical experience. musical training has been proposed as beneficial for hearing speech in noise (clayton et al., ; parbery-clark et al., ; swaminathan et al., ), but evidence for such musicianship advantages has been inconsistent (boebinger et al., ; madsen et al., ). it seemed plausible that musicianship effects might relate to harmonicity. harmonic structure is critical to music: most musical instruments have harmonic frequency spectra, and the frequency ratios between common intervals in standard western scales (and other scales around the world) are shared with the harmonic series.
musical training has been associated with the enhancement of perceptual judgments related to harmonicity, with lower pitch discrimination thresholds (kishon-rabin et al., ; mcdermott, keebler, et al., ; micheyl et al., ; spiegel & watson, ) , and larger preferences for harmonic over inharmonic sounds in musicians (dellacherie et al., ; mcdermott, lehr, et al., ; weiss et al., ) , or in individuals with lifelong exposure to western music . thus, musical training might enhance sensitivity to harmonic structure, which, if relevant to hearing in noise, could support the use of musical training in clinical contexts. we found that harmonic sounds were consistently easier to detect in noise than inharmonic sounds. this result held for speech as well as synthetic tones. discrimination thresholds were also better for harmonic than inharmonic tones when presented in noise. this observed difference in discrimination thresholds between harmonic and inharmonic tones was greater than what could be explained simply by the detectability advantage for harmonic tones. additionally, across both detection and discrimination tasks, we observed harmonic advantages for non-musicians on par with those for musicians. the results suggest a grouping effect in which harmonicity aids detection and discrimination of signals in noise. figure . harmonicity a. spectrograms of example natural harmonic sounds: a spoken vowel, a cow mooing, a note played on a trumpet, and a phone vibrating. the frequency components of such sounds are multiples of a fundamental frequency, and are thus regularly spaced across the spectrum. b. schematic spectrogram of a harmonic tone with an f of hz. the purpose of experiment was to examine the effect of harmonicity on the detection of sounds in noise. in each trial, participants heard two noise bursts (fig. a) . a complex tone (fig. b. ) or a pure tone was embedded in one of the noise bursts, and participants were asked to choose which noise burst contained the tone. 
the complex tones could be harmonic or inharmonic, with constituent frequencies added in sine or random phase. participants completed four adaptive measurements of the detection threshold for tones in each condition. participants. participants completed experiment on an online data collection platform, amazon mechanical turk. of these participants were removed from analysis because their average threshold across conditions (using the first adaptive run of each condition) was over a standard deviation worse than the group mean across all conditions. this exclusion criterion is neutral with respect to the hypotheses being tested, and also independent of the data we analyzed (only the subsequent runs were included for analysis in the remaining participants, to avoid double-dipping). therefore, our exclusion procedure allowed unbiased threshold estimates from those final three runs. in previous studies we have found that online results replicate in-lab results when such steps are taken to exclude the worst-performing participants (woods & mcdermott, ). of the remaining participants ( female, mean age = . years, s.d.= . years), had four or more years of musical training, with an average of . years, s.d.= . years. in this and other experiments, we determined sample sizes a priori based on pilot studies, and using g*power (faul et al., ). we ran a pilot study that was similar to experiment . the only difference was that the frequencies of each inharmonic note were jittered independently on each trial (in contrast to experiment , where each inharmonic tone for a participant was made inharmonic in the same way across the entire study, as described below). we ran this pilot study in participants, and observed a strong main effect of harmonicity (partial η² =. for an anova comparing harmonic vs. inharmonic conditions). we chose our sample size to be able to detect a potential musicianship effect that might be substantially weaker than the main effect of harmonicity.
we thus sought to be well-powered to detect an interaction between musicianship and harmonicity / the size of the main effect of harmonicity at a significance level of p<. , % of the time. this yielded a target sample size of participants ( musicians and nonmusicians). in practice we ran participants in batches, and then excluded them based on whether they passed the headphone check and our performance criteria, so the final sample was somewhat larger than this target. procedure. all experiments were conducted online on amazon's mechanical turk platform. in-person data collection was not possible due to the covid- virus. prior to starting the experiment, potential participants were instructed to use headphones, and then used a calibration sound ( . seconds of threshold equalizing noise (moore et al., )) to set their volume to a comfortable level. the experimental stimuli were normalized to db below the level of the calibration sound to ensure that they were never uncomfortably loud. participants were then screened with a brief experiment designed to help ensure they were wearing earphones or headphones, as instructed (woods et al., ). if they passed this screening (across all experiments in this paper, approximately % of participants did so, consistent with woods et al., ), participants proceeded to the main experiment. for all experiments in the paper, participants received feedback after each trial, and to incentivize good performance, they received a compensation bonus proportional to the number of correct trials. we used adaptive procedures to measure detection thresholds. participants completed down- -up two-alternative-forced-choice ('does the first or second noise burst contain a tone?') adaptive threshold measurements. adaptive tracks were stopped after reversals. the signal-to-noise ratio (snr) was changed by db for the first two reversals, db for the subsequent two reversals, and . db for the final six reversals.
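the adaptive track described above can be sketched as follows. this is a minimal illustration rather than the code used in the study: the function name `run_staircase`, the `respond` callback, and every numeric value passed to it are hypothetical placeholders, because the number of consecutive correct responses required and the step sizes in db are not recoverable from the text. the only structure taken from the description is that step sizes change after the first two and the next two reversals, and that the threshold is the mean snr over the final six reversals.

```python
def run_staircase(respond, start_snr, n_down, step_schedule, n_final=6, snr_cap=None):
    """Run an n-down-1-up adaptive track and return (threshold, reversal_snrs).

    respond(snr) -> bool  : True when the listener picks the correct interval.
    n_down                : consecutive correct trials needed to lower the SNR
                            (hypothetical parameter; value not given in the text).
    step_schedule         : list of (n_reversals, step_db) pairs applied in order,
                            e.g. two reversals, then two, then six.
    snr_cap               : optional upper bound on the SNR.
    """
    # One step size per reversal, so the track ends when the list is used up.
    steps = [step for count, step in step_schedule for _ in range(count)]
    snr = start_snr
    streak = 0            # consecutive correct responses at the current SNR
    last_direction = 0    # -1 = SNR was lowered, +1 = SNR was raised
    reversals = []

    while True:
        if respond(snr):
            streak += 1
            if streak < n_down:
                continue  # repeat this SNR until n_down correct in a row
            direction = -1
        else:
            direction = +1
        streak = 0

        # A reversal is a change in the direction of the track.
        if last_direction and direction != last_direction:
            reversals.append(snr)
            if len(reversals) == len(steps):
                break
        last_direction = direction

        snr += direction * steps[len(reversals)]
        if snr_cap is not None:
            snr = min(snr, snr_cap)

    final = reversals[-n_final:]
    return sum(final) / len(final), reversals
```

with a deterministic simulated listener who is correct whenever the snr is at or above some value, the track oscillates around that value and the estimate settles just below it, which is the expected behavior for a rule that requires several correct responses before making the task harder.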
the threshold estimate from a track was the average of the snrs at the final six reversals. participants completed four adaptive threshold measurements for each condition. complex tone conditions (random vs. sine phase tones, and harmonic vs. inharmonic tones) were randomly intermixed, and the four runs of the pure tone condition were grouped together, run either before or after all of the complex tone adaptive runs, chosen equiprobably for each participant. stimuli. trials consisted of two noise bursts, one of which contained a tone. first, two ms samples of noise were generated, and one of these noise samples was randomly chosen to contain the tone. the tone was scaled to have the appropriate power relative to that noise sample; both intervals were then normalized to the same rms value. tones were ms in duration; the noise began and ended ms before and after the tone (fig. a). the tones started ms after the noise to avoid an 'overshoot' effect, whereby tones are harder to detect when they start near the onset of noise (zwicker, ). the two noise bursts were separated by ms of silence. the noise used in this and all other experiments was threshold equalizing (te) noise (moore et al., ). pilot experiments with both white and pink noise suggested that the harmonic detection advantage is present regardless of the noise spectrum. in experiment , noise was low pass filtered with a th order butterworth filter to make it more pleasant for participants. the cutoff frequency was hz, chosen to be well above the highest possible harmonic in the complex tones. noise in all experiments was windowed in time with ms half-hanning windows. complex tones contained ten equal-amplitude harmonics. depending on the condition, harmonics were added in sine phase or random phase (fig. c). f s of the tones (both complex and pure; pure tones were just the f frequency component of the harmonic tones) were randomly selected to be between - hz (log uniform distribution).
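a rough sketch of the tone synthesis steps just described (log-uniform f0 selection, ten equal-amplitude harmonics in sine or random phase, and scaling the tone relative to a noise sample) is given below. it is illustrative only: the function names are hypothetical, the f0 range, duration, and sampling rate are left as arguments because their values are not recoverable from the text, the half-hanning onset and offset ramps are omitted, and the snr here is defined over total tone power, whereas the study reports snr per component.

```python
import math
import random


def log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def complex_tone(f0, duration_s, sample_rate, n_harmonics=10,
                 random_phase=False, rng=None):
    """Sum n_harmonics equal-amplitude sinusoids at integer multiples of f0.

    Sine phase starts every component at zero phase; random phase draws each
    starting phase uniformly from [0, 2*pi).
    """
    rng = rng or random.Random()
    phases = [rng.uniform(0.0, 2.0 * math.pi) if random_phase else 0.0
              for _ in range(n_harmonics)]
    n_samples = int(round(duration_s * sample_rate))
    return [
        sum(math.sin(2.0 * math.pi * k * f0 * n / sample_rate + phases[k - 1])
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def scale_to_snr(tone, noise, snr_db):
    """Rescale the tone so that 10*log10(P_tone / P_noise) equals snr_db.

    Assumption: SNR over total tone power, not per component.
    """
    gain = math.sqrt(mean_power(noise) / mean_power(tone) * 10.0 ** (snr_db / 10.0))
    return [gain * x for x in tone]
```

the scaled tone would then be added to the selected noise sample, after which both intervals are normalized to a common rms value as described above.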
tones were windowed with ms half-hanning windows, and were ms in duration. tones and noise were sampled at . khz. to make tones inharmonic, the frequency of each frequency component (other than the f component) was 'jittered' by up to % of the f value. jittering was accomplished by sampling a jitter value from the distribution u(- . , . ), multiplying by the f , then adding the resulting value to the frequency of the respective harmonic. jitter values were selected via rejection sampling, successively moving up the harmonic series rejecting or accepting sampled jitter values, to ensure that adjacent harmonics were always separated by at least hz (to avoid salient beating). jitter values varied across participants (described below), but for a given participant were fixed across the experiment (i.e., each inharmonic tone heard by a given participant had the same jitter pattern). for technical reasons all stimuli for online experiments were generated ahead of time. stimuli were pre-generated for every possible difficulty level (snr) within the adaptive procedure. the snr was capped at + db snr per component. if participants in the experiment reached this cap the stimuli remained at this snr until participants got three trials in a row correct. in practice, participants who performed poorly enough to reach this cap were removed post hoc by our filtering procedure. adaptive tracks were initialized at - db snr per component. for each trial within an adaptive track, one of the stimuli for the current difficulty level within the adaptive track was selected at random. in order to vary the jitters across participants, we generated independent sets of possible stimuli, each with a different set of randomly selected jitter values for the inharmonic trials. each participant only heard trials from one of these sets (i.e., all the inharmonic stimuli they heard were 'jittered' in the same way throughout the experiment). 
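the rejection sampling procedure for jitter values can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' implementation: the function name and arguments are hypothetical, and the maximum jitter (as a fraction of f0) and the minimum spacing in hz are passed in as parameters because their values are not recoverable from the text.

```python
import random


def sample_jitter_pattern(f0, n_harmonics, max_jitter, min_spacing_hz,
                          rng=None, max_tries=10000):
    """Sample one jitter value per harmonic by rejection sampling.

    Each jitter is drawn from U(-max_jitter, max_jitter), multiplied by f0, and
    added to the nominal harmonic frequency k * f0. Moving up the harmonic
    series, a draw is rejected and resampled if the jittered component would
    fall within min_spacing_hz of the component below it. The f0 component
    itself is never jittered.

    Returns (jitters, frequencies), both of length n_harmonics.
    """
    rng = rng or random.Random()
    jitters = [0.0]
    freqs = [float(f0)]
    for k in range(2, n_harmonics + 1):
        for _ in range(max_tries):
            jitter = rng.uniform(-max_jitter, max_jitter)
            freq = k * f0 + jitter * f0
            if freq - freqs[-1] >= min_spacing_hz:
                jitters.append(jitter)
                freqs.append(freq)
                break
        else:
            raise RuntimeError("no acceptable jitter found for harmonic %d" % k)
    return jitters, freqs
```

because the jitters are stored as fractions of f0, the same vector can be reused for every tone heard by a participant, which matches the requirement that the jitter pattern be fixed within a participant while f0 varies from trial to trial.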
as some randomly selected jitter patterns can by chance be close to harmonic, we randomly generated , possible jitter patterns, then selected the patterns that minimized peaks in the autocorrelation function. the resulting jitters were evaluated by eye to ensure that they were distinct. statistical analysis. thresholds were calculated by averaging the snr values of the final six reversals of the adaptive track. data distributions were non-normal (skewed), so nonparametric tests were used for all comparisons. wilcoxon signed-rank tests were used for pairwise comparisons between dependent samples. to compare performance across multiple conditions we used repeated-measures anovas. however, because data were non-normal, we evaluated the significance of the f statistic with approximate permutation tests, randomizing the assignment of the data points across the conditions being tested , times, and comparing the f statistic to this distribution. to evaluate potential differences between musicians and nonmusicians we compared the difference between harmonic and inharmonic detection thresholds. the distribution of these differences was normal (evaluated using the lilliefors test), thus we used independent sample t tests to compare groups, and used bayesian statistics to estimate the probability of null results.
figure . a. trial structure for experiment . during each trial, participants heard two noise bursts, one of which contained a complex tone (left) or pure tone (right), and were asked to decide whether the tone was in the first or second noise burst. b. schematic of harmonic and inharmonic spectra; complex tones were harmonic or inharmonic (with frequencies jittered). c. example waveforms of harmonic tones added in random phase (top) and sine phase (bottom). the waveform is 'peakier' when the harmonics are added in sine phase. d. results of experiment , shown as box-and-whisker-plots, with black lines for individual participant results. the central mark in the box plots the median, and the bottom and top edges of the box indicate the th and th percentiles, respectively. whiskers extend to the most extreme data points not considered outliers. asterisks denote significance, wilcoxon signed-rank test: ***=p< . . e. harmonic detection advantage (harmonic threshold - inharmonic threshold, averaged across phase conditions) for musicians and non-musicians. for e, error bars denote standard error of the mean.
as shown in fig. d, detection in noise was better for the complex tone conditions than the pure tone conditions (z= . , p<. , pure tone vs. mean performance for inharmonic conditions) as expected from signal detection theory given the ten-fold increase in harmonics in the complex tones compared to the pure tones (florentine et al., ). however, detection thresholds were substantially better for harmonic than inharmonic complex tones even though they each had frequency components (significant differences in both sine and random phase conditions, wilcoxon signed-rank test: sine phase: z= . , p<. ; random phase: z= . , p<. ). we observed a . db snr advantage for inharmonic tones compared to pure tones, and an additional . db snr advantage for harmonic tones over inharmonic tones (averaged across phase conditions). these differences are large enough to have some real-life significance. for instance, if a harmonic tone could be just detected meters away from its source in free field conditions, an otherwise identical inharmonic tone would only be audible . meters away from the source (using the inverse square law; for comparison, a pure tone at the same level as one of the frequency components from the complex tone would be audible . meters away). a priori it seemed plausible that a detection advantage for harmonic tones could be explained by the regular amplitude modulation of harmonic sounds.
however, performance was similar for the sine and random phase conditions (the latter of which produces substantially less modulation, fig. b ). we observed no significant differences between phase conditions or interaction with harmonicity (no significant main effect of phase, f( , )= . , p=. , hp = . , and no interaction between harmonicity and phase, f( , )= . , p=. , hp = . ). this result indicates that the observed harmonic advantage derives from spectral rather than temporal properties of harmonic sounds. the results are also unlikely to be explained by distortion products. although harmonic tones would be expected to produce stronger distortion products than inharmonic tones, these should be undetectable for tones that include all the lower harmonics (as were used here and in most other experiments) (norman-haignere & mcdermott, ; pressnitzer & patterson, ). we were curious whether musicians and non-musicians would have comparable detection advantages for harmonic over inharmonic stimuli. averaging across phase conditions, we compared the harmonic detection advantage for both groups ([inharmonic thresholds - harmonic thresholds]). we observed no significant differences between groups (fig. e, and supplementary fig a, musician mean advantage = . db, s.d.= . , non-musician mean advantage = . db, s.d.= . , t( )= . , p=. ). the bayes factor (using a cauchy distribution prior, centered at zero with a scale of . ) was . against a null hypothesis, providing moderate support for the null hypothesis (jasp, version . . , ). in experiment , we investigated whether the observed harmonic advantage would facilitate other types of judgments about sounds. specifically, it seemed plausible that being better able to separate tones from noise might help listeners discriminate successive tones at low snrs. using an adaptive procedure, we measured traditional up-down discrimination thresholds for harmonic, inharmonic, and pure tone conditions (fig. a) at a range of snrs.
participants. participants were recruited online for experiment . we excluded participants who performed worse than . % across all conditions (averaged across both runs of the experiment). this cutoff was based on a pilot study run in the lab -it was the average performance across all conditions for non-musician participants. we used this cutoff to obtain mean performance levels on par with those of compliant and attentive participants run in the lab. participants were excluded from analysis by this criterion. this resulted in participants, female, mean age = . years, s.d.= . years. participants had four or more years of musical training, with an average of . years, s.d. = . . we chose our sample size using the same pilot data used to determine the exclusion criteria. the pilot experiment, run in participants, differed from the current study in a few respects. in addition to being run in the lab, the pilot experiment did not include a pure tone condition, the snr values were shifted half a semitone higher, and in inharmonic conditions, a different jitter pattern was used for each trial (rather than the same jitter pattern being used across trials). we performed anovas testing for effects of harmonicity and musicianship; the pilot data showed fairly large main effects of both harmonicity (hp =. ) and musicianship (hp =. ), suggesting that both these analyses would be well-powered with modest sample sizes. to ensure the reliability of planned analyses examining the inflection points and slopes of sigmoid functions fitted to the discrimination curves, we also estimated the sample size needed to obtain reliable mean thresholds. we extrapolated from our pilot data (via bootstrap) that an n of at least would be necessary to have a split-half reliability of the mean measured threshold in each condition (assessed between the first and second adaptive runs of the experiment) greater than r=. . 
this sample size was also sufficient for the anova analyses (for example, to see an effect of musicianship / the size of that observed in our pilot study % of the time at a p<. significance level, one would need a sample size of ). we thus aimed to recruit at least participants. procedure. in experiment we measured classic up-down "pitch" discrimination. as in experiment , on each trial participants heard two noise bursts. however, in this experiment, a tone was presented in each of the two noise bursts, and participants judged whether the second tone was higher or lower than the first tone. the difference in the f s used to generate the tones was initialized at semitone, and was changed by a factor of through the first four reversals, and then by √ through the final six reversals. we tested pitch discrimination at snrs for pure tones, snrs for inharmonic tones, and snrs for harmonic tones. this choice was motivated by pilot data showing that at the lowest snr tested for harmonic tones, inharmonic tones were undetectable. the same logic applied to the lowest two snrs and pure tones. because we expected that the lowest snr conditions tested in each condition would make discrimination very difficult (if not impossible), we capped the f difference at semitones. if participants completed three trials in a row incorrectly at this f difference, the adaptive track was ended early. for these trials, the threshold was conservatively recorded as semitones for analysis ( . %). participants performed adaptive runs per condition. stimuli: the stimuli for experiment were identical to the random phase complex tones used in experiment , except that each of the two noise bursts contained a tone. the initial f for each trial was randomly selected between and hz (log uniform distribution). the same vector of jitter values was applied to each of the two notes (mcpherson & mcdermott, ) used in a trial. 
as in experiment , we generated sets of stimuli, each with a different jitter pattern, selected from , randomly generated jitter patterns as those with the smallest autocorrelation peaks. each participant was randomly assigned one of these sets of stimuli, and only heard one inharmonic 'jitter' pattern throughout the experiment. statistical analysis. thresholds were estimated by taking the geometric mean of the f differences from the final six reversals of the adaptive track. as in experiment , data distributions were non-normal (skewed), so wilcoxon signed-rank tests were used for pairwise comparisons. to compare performance across multiple conditions or across musicianship we used repeated-measures anovas (for within group effects) and mixed-model anovas (to compare within and between group effects). we evaluated the significance of the f statistic with approximate permutation tests, randomizing the assignment of the data points across the conditions being tested , times, and comparing the f statistic to this null distribution. we completed a secondary analysis to compare the results for the three stimulus conditions (harmonic, inharmonic, and pure tone) after accounting for differences in detectability between conditions. we replotted the pitch discrimination curves in terms of the snr relative to the detection thresholds measured in experiment (- . db snr, - . db snr, - . db snr, for harmonic, inharmonic and pure tone conditions, respectively, fig. b, inset). we then bootstrapped over participants to evaluate the statistical significance of the residual differences between conditions. for each bootstrap sample we fit a sigmoid function to the results curve for each condition and compiled a distribution of the slopes and midpoints of each of the curves. in order to obtain reasonable curve fits, it was necessary to pad the data on either end of the snr range with dummy values: with - . 
on the low end (the highest possible threshold that could be measured in the experiment, as if we had added one additional, lower snr), and with zeros at the high end. we compared the bootstrap distributions of the slopes and midpoints of the three conditions to determine the significance of differences between conditions. replicating prior results (mcpherson & mcdermott, ), discrimination thresholds in quiet were similar for harmonic and inharmonic tones (around . % in both cases; rightmost conditions of fig. b), with indistinguishable thresholds (-inf db; z= . , p=. ). however, at lower snrs inharmonic discrimination thresholds were substantially higher than harmonic thresholds (significant differences at all snrs between - . and - db; z> . , p<. in all cases), yielding a main effect of harmonicity (between harmonic and inharmonic tones (excluding - db snr, when only harmonic thresholds were measured), f( , )= . , p<. , hp = . ). when we accounted for differences in the detection thresholds for the three tone types (as measured in experiment ), we found that the inflection points of the sigmoid functions remained significantly different for harmonic and inharmonic conditions (p=. ). the inflection point for the pure tone condition was not significantly different from either the harmonic (p=. ) or inharmonic conditions (p=. ), and the slopes of the three conditions were not significantly different. the difference between harmonic and inharmonic discrimination after accounting for the detectability of the tones suggests that harmonic discrimination is better than what would be expected based on detectability, or conversely, that inharmonic discrimination is worse than what would be expected based on detectability. even at snrs where people can detect inharmonic tones fairly reliably, harmonicity aids discrimination in noise, perhaps because representations of the f can be used for discrimination.
consistent with many previous studies (kishon-rabin et al., ; mcdermott, keebler, et al., ; micheyl et al., ; spiegel & watson, ) , pitch discrimination was overall better in musicians than non-musicians ( supplementary fig. b) . these differences were significant by a sign test (mean thresholds were higher in non-musicians for of conditions, p<. ), but were modest, and did not reach significance in an anova (excluding the - and - . db snr conditions, for which we didn't measure pure tone thresholds, f( , )= . , p=. , hp = . ). we also did not observe a significant interaction between musicianship and harmonicity (only examining the harmonic and inharmonic conditions, - . db snr and greater, f( , )= . , p=. , hp =. ). it is possible that if we increased the level of musical training used to categorize participants as musicians, we would have seen a more robust difference in discrimination ability between groups. however, our results show that the harmonic advantage for pitch discrimination is observed in both musicians and non-musicians (significant main effect in each group separately, musicians: f( , )= . , p<. , hp =. , non-musicians f( , )= . , p<. , hp =. ), which suggests that this advantage reflects a domain-general auditory scene analysis ability. figure : harmonic advantage for discriminating tones in noise a. schematic of the trial structure for experiment . during each trial, participants heard two noise bursts, each of which contained a complex tone (both tones were either harmonic or inharmonic), and were asked to decide whether the second tone was higher or lower than the first tone. b. results from experiment . error bars denote standard error of the mean. for conditions where we were unable to measure thresholds from all participants, the number of participants with measurable thresholds is indicated next to the data point. exact threshold values are provided for thresholds under %. asterisks denote significance, wilcoxon signed-rank test: ***=p< . 
, **=p< . . inset: pitch discrimination thresholds adjusted based on the detection thresholds measured in experiment . the x axis plots snr relative to detection threshold for the three different tone types. in experiment , we tested whether the harmonic detection advantage would be present for natural sounds by measuring detection thresholds for spoken syllables embedded in noise (fig. a). participants: participants completed experiment online. were removed because their average performance across the first run of all conditions was over a standard deviation away from the group mean across the first run. as in experiment , only the subsequent runs were used for analysis. participants were included in the final analysis ( female, mean age= . years, s.d.= . years). participants had four or more years of musical training, with an average of . years, s.d.= . years. the effect size of harmonicity measured in a pilot version of experiment was moderate (dz = . ), plausibly because we used natural stimuli, which are more variable than the synthetic tones used in other experiments (in which we observed larger effect sizes). the pilot experiment (run in participants) was identical to experiment except that each inharmonic trial contained harmonics that were jittered independently from the other trials. we aimed to run at least participants to be % sure of seeing an effect this size with a . significance threshold. we did not attempt to recruit equal numbers of musicians and non-musicians given the lack of an effect of musicianship in experiment . procedure. we measured detection thresholds for single spoken vowels embedded in noise, resynthesized to be inharmonic or harmonic. participants were asked whether the first or second noise burst contained a word, rather than a tone. the adaptive staircase procedure was the same as that used in experiment . stimuli: speech was resynthesized using the straight analysis and synthesis method (ellis et al., ; kawahara & morise, ).
straight decomposes a recording of speech into voiced and unvoiced vocal excitation and vocal tract filtering. if the voiced excitation is modelled sinusoidally, one can alter the frequencies of individual harmonics and then recombine them with the unaltered unvoiced excitation and vocal tract filtering to generate inharmonic speech. this manipulation leaves the spectral envelope of the speech largely intact, and intelligibility of inharmonic speech in quiet is comparable to that of harmonic speech (popham et al., ) . the frequency jitters for inharmonic speech were chosen in the same way as those for inharmonic complex tones. speech and noise were sampled at khz. code implementing the harmonic/inharmonic resynthesis is available on the senior author's lab web page. we used the vowels /i/, /u/, /a/ and /ɔ/, from the hillenbrand vowel set (hillenbrand et al., ) (h-v-d syllables) . these four vowels bound the english vowel space. as in experiment participants heard syllables embedded in te-noise, but noise bursts were ms in duration and vowels were truncated to be ms in duration. there were ms of noise before the onset of the syllable and ms of noise after the syllable ended. noise was not low-pass filtered; noise went up to the nyquist limit. stimuli were pre-generated, and trials were generated in advance for each possible difficulty level. the adaptive procedure was initialized at an snr of db snr and capped at db snr. the same pattern of jitter was used throughout the entire speech syllable, and as in previous studies, different sets of stimuli were generated, each with a distinct jitter pattern for inharmonic stimuli. participants were randomly assigned to one of the stimuli sets. statistical analysis. as in experiment , thresholds were calculated by averaging the snr values of the final six reversals of the adaptive track. data distributions were non-normal (skewed), so a wilcoxon signed-rank test was used to compare the harmonic and inharmonic conditions. 
as shown in fig. b , harmonic vowels were easier to detect in noise than inharmonic vowels (wilcoxon signed-rank test: z= . , p<. ). this result demonstrates that the effect observed with complex tones generalizes to real-world sounds such as speech. perhaps unsurprisingly, the harmonic advantage was more variable with these natural stimuli (the standard deviation of the difference between harmonic and inharmonic thresholds was . db snr in experiment , but . db snr in experiment ). this variability may reflect the additional cues available for detection in some stimulus exemplars but not others, including concurrent modulation across frequency components (culling & summerfield, b; mcadams, ) , and onsets and offsets of consonants (darwin, ) . the persistence of the harmonic advantage despite these factors suggests that the advantage for detecting harmonic sounds could aid in real-world listening. figure : harmonic advantage for detecting speech in noise a. schematic of the trial structure for experiment . during each trial, online participants heard two noise bursts, one of which contained a spoken syllable, and were asked to decide whether the first or second noise burst contained speech. speech was resynthesized to be either harmonic or inharmonic. b. results of experiment . results are shown as box-and-whisker-plots, with black lines plotting individual participant results. the central mark in the box plots the median, and the bottom and top edges of the box indicate the th and th percentiles, respectively. whiskers extend to the most extreme data points not considered outliers. asterisks denote significance, wilcoxon signed-rank test: ***=p< . . due to the increase in cochlear filtering bandwidth with frequency, only harmonics below about the th are believed to be individually discernible by the auditory system, and these harmonics dominate the perception of pitch (shackleton & carlyon, ) . 
to determine whether the harmonic detection advantage observed in experiment was driven by low-numbered harmonics that are individually "resolved" by the cochlea, we ran a second experiment with the same task as experiment , but with tones filtered to only contain harmonics - ("unresolved" harmonics, fig. a ). tones were again presented in either sine phase or random phase. participants. participants were recruited online for experiment . participants performed over a standard deviation worse than the group mean on the first adaptive run and were excluded from analysis. as in previous studies, only the subsequent runs were analyzed. participants were included in the final analysis, female, with a mean age of . years, s.d.= . years. participants had four or more years of musical training, with an average of . years, s.d.= . years. we used the pilot data from experiment to determine sample size. based on prior work we hypothesized that the effect of harmonicity might be reduced with unresolved harmonics (hartmann et al., ; moore et al., ) . we aimed to be able to detect an effect half the size of the main effect of harmonicity seen with resolved harmonics in our pilot data (hp =. ). this yielded a target sample size of participants (to have a % chance of seeing the hypothesized effect with a . significance threshold). procedure. the instructions and adaptive procedure were identical to those used in experiment . stimuli. tones contained harmonics to at full amplitude, with a trapezoid-shaped filter applied in the frequency domain in order to reduce the sharp spectral edge that might otherwise be used to perform the task. on the lower edge of the tone, the th harmonic was attenuated to be db below the th harmonic, and the th harmonic to be db below. on the upper edge of the tone, the same pattern of attenuation was applied in reverse between the st and rd harmonics. all other harmonics were removed. 
additionally, the cutoff frequency for the noise (te-noise) was increased to , hz (rather than , hz used in experiments , , and ). noise was filtered with a th order butterworth filter. other aspects of the stimuli (duration of tones, timing of tones in noise, etc.) were matched to parameters used in experiment . statistical analysis. statistical analysis was identical to that used in experiment . as shown in fig. b, there was no difference in detectability between harmonic and inharmonic stimuli (f( , )= . , p=. , hp =. ). there was also no main effect of phase (f( , )= . , p= . , hp =. ). these results suggest that the harmonic detection advantage is specific to resolved harmonics, placing constraints on the mechanism underlying the harmonic advantage observed in our other experiments. the bayes factor (using a cauchy distribution prior, centered at zero with a scale of . ) was . against a null hypothesis, providing mild support for the null hypothesis (jasp, version . . , ). figure : harmonic detection advantage is specific to resolved harmonics a. schematic of the trial structure for experiment . during each trial, participants heard two noise bursts, one of which contained a complex tone with unresolved harmonics, and were asked to decide whether the first or second noise burst contained the tone. b. results from experiment , shown as box-and-whisker-plots, with black lines plotting individual participant results. the central mark in the box plots the median, and the bottom and top edges of the box indicate the th and th percentiles, respectively. whiskers extend to the most extreme data points not considered outliers. one potential explanation for the observed harmonic detection advantage is that people have internal templates for harmonic spectra, potentially developed over a lifetime of exposure to harmonic sounds to help efficiently encode natural sounds. these templates could potentially help listeners know what to listen for in a detection task.
experiment tested this idea by assessing whether the harmonic advantage persists even when listeners are cued beforehand to the target tone. participants heard two stimulus intervals, each containing a "cue" tone followed by a noise burst. one of the noise bursts contained an additional occurrence of the cue tone (fig. a) , and participants were asked whether the first or second noise burst contained the cued tone. participants. participants completed experiment online. participants were removed because their average performance across the first run of both conditions was over a standard deviation lower than the group mean. as in previous experiments, only the subsequent runs were used for analysis. participants were included in the final analysis, female, mean age= . , s.d.= . years. participants had four or more years of musical training, with an average of . years, s.d.= . years. we used data from a pilot experiment to determine sample size. the pilot experiment differed from the current experiment in ways: it was run in the lab, and each inharmonic note contained harmonics that were jittered independently from the other trials. the pilot experiment was run in participants. since experiment only had two conditions, we intended to use a single wilcoxon signed-rank test to assess the difference between harmonic and inharmonic conditions. the effect size for this comparison in the pilot experiment was dz = . . this yielded a target sample size of at least participants to have a % chance of seeing an effect of harmonicity of that size with a . significance threshold using a wilcoxon signed-rank test. procedure. the instructions and adaptive procedure were identical to those used in experiment . stimuli. participants heard a tone before each of the two noise bursts. this "cue" tone was identical to the tone embedded in one of the noise bursts (that participants had to detect). 
each trial had the following structure: a ms tone, followed by ms of silence, the first ms noise burst, ms of silence, a ms tone, ms of silence, and finally, the second ms noise burst. the target tone was present in either the first or the second noise burst, starting ms into the noise burst and lasting for ms. only tones with harmonics added in random phase were used. in all other respects, stimuli in experiment were identical to those of experiment . statistical analysis. thresholds were calculated by averaging the snr values of the final six reversals of the adaptive track. a wilcoxon signed-rank test was used to compare the harmonic and inharmonic conditions. as shown in fig. b , the harmonic advantage persisted with the cue (z= . , p=. ). even when participants knew exactly what to listen for in the noise, there was still an added benefit when detecting harmonic tones. wilcoxon rank-sum tests showed that detection thresholds for harmonic tones in experiment (without a cue) were indistinguishable from those with a cue tone (z= . , p=. ). there was also no significant difference for inharmonic tones with and without a cue (z= . , p= . ). this result suggests that the observed detection advantage for harmonic over inharmonic tones does not simply reflect biases due to familiarity. figure : harmonic advantage persists when listeners know what to listen for a. schematic of the trial structure for experiment . during each trial, participants heard two noise bursts, both of which were preceded by a 'cue' tone, and one of which contained that same 'cue' tone. participants were asked to decide whether the first or second noise burst contained the cued tone. b. results from experiment , shown as box-and-whisker-plots, with black lines plotting individual participant results. the central mark in the box plots the median, and the bottom and top edges of the box indicate the th and th percentiles, respectively.
whiskers extend to the most extreme data points not considered outliers. asterisks denote significance, wilcoxon signed-rank test: **=p< . . we found that both synthetic and natural harmonic sounds were consistently easier to detect and discriminate in noise than otherwise identical inharmonic sounds. this harmonic advantage was present in both musicians and non-musicians, and generalized to natural sounds (specifically, speech, as seen in experiment ). most acoustic communication signals (e.g. speech, and many musical sounds) are harmonic. the benefit we observed suggests that such harmonic sounds are audible at greater distances than otherwise identical inharmonic sounds. for example, if a harmonic sound is just detectable meters from its source in a scene with spatially uniform background noise, our results indicate that the listener would have to move . meters closer to the sound source to hear an otherwise identical inharmonic sound. given the ubiquity of both background noise and harmonic communication sounds in daily life, humans may have evolved or learned mechanisms to help detect harmonic sounds in noisy backgrounds. these results represent a previously undocumented aspect of auditory scene analysis. the results of experiment suggest that the harmonic advantage cannot be explained by amplitude fluctuations in the sound wave (because results were similar for sine and random phase conditions). moreover, our results suggest that the harmonic advantage is absent for tones containing only unresolved harmonics, in which amplitude fluctuations should be maximally salient. the harmonic detection advantage persisted even when people knew exactly what to listen for (experiment ), suggesting that it is perceptual in origin, and reflects a basic ability to separate harmonic sounds from noise. 
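The link between a detection-threshold advantage and audible distance, as in the example above, can be made concrete with a short sketch. It first computes a threshold as the mean SNR of the final six reversals of an adaptive track (as described in the statistical analysis), then converts a threshold difference in dB into a distance, assuming free-field spreading (level falls by 20·log10 of the distance ratio) and spatially uniform noise. The spreading model, the function names, and all numbers below are assumptions for illustration; none of the study's actual values are reproduced.

```python
def threshold_from_reversals(reversal_snrs_db, n_final=6):
    """Mean SNR (dB) over the final n_final reversals of an adaptive track."""
    if len(reversal_snrs_db) < n_final:
        raise ValueError("track has fewer reversals than n_final")
    final = reversal_snrs_db[-n_final:]
    return sum(final) / len(final)


def equivalent_distance(distance_m, advantage_db):
    """Distance at which a sound needing advantage_db more SNR is just detectable.

    Assumes free-field spreading: signal level changes by
    -20*log10(d/d0) dB, while the noise level is the same everywhere.
    """
    return distance_m * 10 ** (-advantage_db / 20)


if __name__ == "__main__":
    # Hypothetical reversal SNRs (dB) for one listener in each condition.
    harmonic_track = [-12, -18, -15, -20, -17, -21, -18, -21]
    inharmonic_track = [-10, -16, -13, -18, -15, -19, -16, -19]

    harmonic_thr = threshold_from_reversals(harmonic_track)
    inharmonic_thr = threshold_from_reversals(inharmonic_track)
    advantage = inharmonic_thr - harmonic_thr  # positive if harmonic is easier

    d_harmonic = 10.0  # hypothetical: harmonic sound just detectable at 10 m
    d_inharmonic = equivalent_distance(d_harmonic, advantage)
    print(f"advantage: {advantage:.2f} dB")
    print(f"listener must move {d_harmonic - d_inharmonic:.2f} m closer")
```

Under these assumptions a 6 dB advantage corresponds to roughly halving the distance; reverberant or non-uniform noise fields would change the mapping.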
the observation that harmonicity enhances pitch discrimination in noise (experiment ) suggests that harmonicity aids the ability to extract information from sounds in noise, in addition to improving detectability. harmonic and inharmonic discrimination at high snrs is similar, likely driven by the ability to track frequency shifts between notes (demany & ramos, ; faulkner, ; mcpherson & mcdermott, ; micheyl et al., ; moore & glasberg, ) . but when presented in background noise, harmonic structure may help listeners hear out individual harmonics, or alternatively, enable listeners to 'fill in' missing harmonics that they would otherwise be unable to hear (mcdermott & oxenham, ) . when notes are inharmonic, if a listener fails to detect even one of the frequency components from the complex, it could make the correspondence between the components of the first note and the second note (and thus the direction of the pitch change) ambiguous, impairing performance. could the harmonic discrimination advantage reflect f -based pitch? previous studies suggest that listeners are more inclined to base judgments on the f , rather than spectral features, when tones are embedded in noise (hall & peters, ; houtgast, ) . thus, in principle, the detection advantage we report could be driven by a pitch signal from harmonically related frequencies. some evidence for this possibility comes from the analysis of experiment that adjusted discrimination thresholds using detection thresholds. the harmonicity advantage persisted even after adjusting for the detection advantage, suggesting that there is some additional factor (potentially f -based pitch) that aids discrimination when sounds are harmonic. this result is consistent with the idea that the grouping of harmonics and the estimation of their f may rely on partially distinct mechanisms (brunstrom & roberts, ; moore, ; roberts & brunstrom, ) . 
prior experiments comparing sound segregation in musicians and non-musicians have reached divergent conclusions about the benefits of musical training (madsen et al., ) . studies that show a musician benefit have often involved multiple harmonic sounds, or a single harmonic mistuned from a complex tone (coffey et al., ; zendel & alain, ) . in contrast, we found musical training to have little effect on the harmonic advantage for detecting sounds in noise. harmonic detection advantages were comparable in musicians and non-musicians, as was the effect of noise on discrimination. western musical training often involves hearing out one harmonic sound among others (picking out a melody from its harmony, for example), but does not typically require hearing in stationary background noise of the sort used here. it is thus possible that hearing a harmonic sound in noise requires distinct strategies compared to hearing out one harmonic sound among other similar sounds, and that these two abilities could be differentially affected by musicianship (oxenham et al., ) . musicianship advantages might thus be expected in conditions where there are multiple harmonic sounds, but not in conditions like those tested here, where a single tone is embedded in noise. our findings suggest that harmonicity is critical for detecting and discriminating sounds in noisy auditory scenes. in contrast to traditional experiments measuring speech intelligibility in noise (duquesnoy, ; festen & plomp, ) , we demonstrate a hearing-in-noise effect with relatively simple stimuli. our effects might plausibly be evident in non-human animal models of hearing (feng & wang, ) , and could be used to further explore and understand cross-species similarities and differences in the representations of harmonic sounds (kalluri et al., ; norman-haignere et al., ; shofner & chaney, ; walker et al., ) . 
one promising future direction may be to use the tasks developed here to search for neural signatures of harmonicity-based sound segregation in noise. it could also be informative to measure the harmonic detection advantage in individuals with listening disorders (boets et al., ; cameron et al., ; dole et al., ; lagace et al., ; ziegler et al., ), as its presence or absence might help pin down the origins of commonly observed hearing-in-noise deficits (dole et al., ; lagace et al., ; ziegler et al., ). supplementary figure : effects of musicianship on detection and discrimination in noise a. results of experiment , separated for musicians and non-musicians. the results are averaged across sine and random phase conditions, and shown as box-and-whisker-plots, with black lines plotting individual participant results. the central mark in the box plots the median, and the bottom and top edges of the box indicate the th and th percentiles, respectively. whiskers extend to the most extreme data points not considered outliers. ***=p< . . b. results from experiment , separated for musicians and non-musicians. 
modeling the perception of concurrent vowels: vowels with different fundamental frequencies the effects of hearing loss and noise masking on the masking release for speech in temporally complex backgrounds musicians and non-musicians are equally adept at perceiving masked speech speech perception in preschoolers at family risk for dyslexia: relations with low-level auditory processing and phonological ability auditory scene analysis: the perceptual organization of sound separate mechanisms govern the selection of spectral components for perceptual fusion and for the computation of global pitch the listening in spatialized noise test: an auditory processing disorder study how the brain separates sounds the perceptual segregation of simultaneous vowels with harmonic, shifted, or random components executive function, visual attention and the cocktail party problem in musicians and nonmusicians speech-in-noise perception in musicians: a review perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay the role of frequency modulation in the perceptual segregation of concurrent vowels effects of location, frequency region, and time course of selective attention on auditory scene analysis perceptual grouping of speech components differing in fundamental frequency and onset-time auditory grouping effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers grouping in pitch perception: effects of onset asynchrony and ear of presentation of a mistuned component perceptual segregation of a harmonic from a vowel by interaural time difference and frequency proximity concurrent vowel identification. i. effects of relative amplitude and f difference identification of concurrent harmonic and inharmonic vowels: a test of the theory of harmonic cancellation and enhancement concurrent vowel identification. ii. 
effects of phase, harmonicity, and task the effect of musical experience on emotional self-reports and psychophysiological responses to dissonance on the binding of successive sounds: perceiving shifts in nonperceived pitches speech-in-noise perception deficit in adults with dyslexia: effects of background type and listening configuration effect of a single interfering noise or speech source on the binaural sentence intelligibility of aged persons inharmonic speech: a tool for the study of speech perception and separation a flexible statistical power analysis program for the social, behavioral, and biomedical sciences pitch discrimination of harmonic complex signals: residue pitch or multiple component discriminations? harmonic template neurons in primate auditory cortex underlying complex sound processing effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing loudness of complex sounds as a function of the standard stimulus and the number of components spatial release from informational masking in speech recognition a level of stimulus representation model for auditory detection and attention pitch for nonsimultaneous successive harmonics in quiet and noise hearing a mistuned harmonic in an otherwise periodic complex tone the benefit of binaural hearing in a cocktail party: effect of location and type of interferer acoustic characteristics of american english vowels subharmonic pitches of a pure tone at low s/n ratio spatial release from energetic and informational masking in a divided speech identification task modeling speech localization, talker identification, and word recognition in a multi-talker setting sparse periodicity-based auditory features explain human performance in a spatial multitalker auditory scene analysis task perception and cortical neural coding of harmonic fusion in ferrets tandem-straight: a temporally stable power spectral representation for periodic signals and applications 
to interference-free spectrum, f , and aperiodicity estimation invariance to background noise as a signature of nonprimary auditory cortex adaptation of the human auditory cortex to changing background noise pitch discrimination: are professional musicians better than non-musicians? auditory processing disorder and speech perception problems in noise: finding the underlying origin speech perception is similar for musicians and non-musicians across a wide range of conditions segregation of concurrent sounds. i.:effects of frequency modulation coherence the cocktail party problem musical intervals and relative pitch: frequency resolution, not interval resolution, is special individual differences reveal the basis of consonance spectral completion of partially masked sounds indifference to dissonance in native amazonians reveals cultural variation in music perception perceptual fusion of musical notes by native amazonians suggests universal representations of musical intervals diversity in pitch perception revealed by task dependence efficient codes for memory determine pitch representations illusory sound texture reveals multi-second statistical completion in auditory scene analysis adaptive and selective time-averaging of auditory scenes mechanisms of noise robust representation of speech in primary auditory cortex influence of musical and psychoacoustical training on pitch discrimination pitch, harmonicity and concurrent sound segregation: psychoacoustical and neurophysiological findings further evidence that fundamentalfrequency difference limens measure pitch discrimination ecological origins of perceptual grouping principles in the auditory system thresholds for the detection of inharmonicity in complex tones the perception of inharmonic complex tones frequency discrimination of complex tones with overlapping and non-overlapping harmonics thresholds for hearing mistuned partials as separate tones in harmonic complexes a test for the diagnosis of dead regions in 
the cochlea noise-invariant neurons in the avian auditory cortex: hearing the song in noise distortion products in auditory fmri research: measurements and solutions divergence in the functional organization of human and macaque auditory cortex revealed by fmri responses to harmonic tones pitch perception and auditory stream segregation: implications for hearing loss and cochlear implants informational masking and musical training musical experience and the aging auditory system: implications for cognitive abilities and hearing speech in noise perceptual consequences of "hidden" hearing loss inharmonic speech reveals the role of harmonicity in the cocktail party problem distortion products and the perceived pitch of harmonic complex tones constructing noiseinvariant representations of sound in the auditory pathway perceptual segregation and pitch shifts of mistuned components in harmonic complexes and in regular inharmonic complexes perceptual fusion and fragmentation of complex tones made inharmonic by applying different degrees of frequency shift and spectral stretch grouping and the pitch of a mistuned fundamental component: effects of applying simultaneous multiple mistunings to the other harmonics effects of cochlear hearing loss on perceptual grouping cues in competing-vowel perception why middle-aged listeners have trouble hearing in everyday settings objective measures of listening effort: effects of background noise and noise reduction the role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination processing pitch in a nonhuman mammal (chinchilla laniger) speech reception in quiet and in noisy conditions by individuals with noise-induced hearing loss in relation to their tone audiogram performance on frequency-discrimination tasks by musicians and non-musicians musical training, individual differences and the cocktail party problem signal processing in auditory cortex underlies degraded speech sound discrimination 
in noise effects of age and age-related hearing loss on the neural representation of speech cues acrossspecies differences in pitch perception are consistent with differences development of consonance preferences in western listeners schema learning for the cocktail party problem attentive tracking of sound sources headphone screening to facilitate web-based auditory experiments. attention, perception, and psychophysics concurrent sound segregation is enhanced in musicians speech-perception-in-noise deficits in dyslexia temporal effects in simultaneous masking and loudness key: cord- -yau r c authors: tamming, renee j.; dumeaux, vanessa; langlois, luana; ellegood, jacob; qiu, lily r.; jiang, yan; lerch, jason p.; bérubé, nathalie g. title: atrx deletion in neurons leads to sexually-dimorphic dysregulation of mir- and spatial learning and memory deficits date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: yau r c mutations in the atrx chromatin remodeler are associated with syndromic and non-syndromic intellectual disability. emerging evidence points to key roles for atrx in preserving neuroprogenitor cell genomic stability, whereas atrx function in differentiated neurons and memory processes are still unresolved. here, we show that atrx deletion in mouse forebrain glutamatergic neurons causes distinct hippocampal structural defects identified by magnetic resonance imaging. ultrastructural analysis revealed fewer presynaptic vesicles and an enlarged postsynaptic area at ca apical dendrite-axon junctions. these synaptic defects are associated with impaired long-term contextual memory in male, but not female mice. mechanistically, we identify atrx-dependent and sex-specific alterations in synaptic gene expression linked to mir levels, a known regulator of presynaptic processes and spatial memory. 
we conclude that ablation of atrx in excitatory forebrain neurons leads to sexually dimorphic outcomes on mir- and on spatial memory, identifying a promising therapeutic target for neurological disorders caused by atrx dysfunction. summary statement ablation of the atrx chromatin remodeler specifically in forebrain excitatory neurons of mice causes male-specific deficits in long-term spatial memory associated with mir- overexpression, transcriptional changes and structural alterations corresponding to pre- and post-synaptic abnormalities. alpha-thalassemia x-linked intellectual disability syndrome, or atr-x syndrome, is a rare congenital x-linked disorder resulting in moderate to severe intellectual disability (id), developmental delay, microcephaly, hypomyelination, and a mild form of alpha-thalassemia [omim: ] . in a recent study of approximately individuals with id, atrx mutations were identified as one of the most frequent causes of non-syndromic id , emphasizing a key requirement for this gene in cognitive processes. atrx-related id arises from hypomorphic mutations in the atrx gene, most commonly in the highly conserved atrx/dnmt /dnmt l (add) and switch/sucrose non-fermenting (swi/snf) domains , . the former targets atrx to chromatin by means of a histone reader domain that recognizes specific histone tail modifications , and the latter confers atpase activity and is critical for its chromatin remodeling activity , . atrx, in a complex with the histone chaperone daxx, promotes the deposition of the histone variant h . at heterochromatic domains including telomeres and pericentromeres , . however, atrx is also required for h . deposition within the gene body of a subset of g-rich genes, presumably to reduce g-quadruplex formation and promote transcriptional elongation . atrx is also required for the postnatal suppression of a network of imprinted genes in the neonatal brain by promoting long range chromatin interactions via ctcf and cohesin .
in mice, germline deletion of atrx results in embryonic lethality, while conditional deletion of atrx in neuroprogenitors leads to excessive dna damage caused by dna replication stress and subsequent tp -dependent apoptosis , . mice with a deletion of exon of atrx (atrx Δe ), which results in a global reduction of atrx expression, have also been generated. these mice are viable and exhibit impaired novel object recognition memory, spatial memory in the barnes maze, and contextual fear memory . some of the molecular defects identified in these mice included decreased activation of camkii and the ampa receptor in the hippocampus as well as decreased spine density in the medial prefrontal cortex, and altered dna methylation and increased expression of xlr b in neurons . our group also reported similar behavioural impairments in female mice exhibiting mosaic expression of atrx in the central nervous system . however, the contributions of different cell types and of sex differences to behavioural abnormalities have not yet been resolved. to start addressing these questions, we deleted atrx specifically in glutamatergic forebrain neurons in male and female mice. this approach bypasses deleterious effects of atrx loss of function that we previously observed during brain development caused by replication stress in proliferating neuroprogenitors , . a comprehensive analysis of these mice reveals that atrx promotes long-term spatial learning and memory associated with morphological and synaptic ultrastructural changes in the hippocampus. we show that female mice lacking atrx in neurons are protected from spatial learning and memory defects and identify sex-specific effects of atrx loss on the expression of synaptic genes and mir- . overall, we identify a novel sex-specific function for atrx in neurons in the regulation of long-term spatial memory associated with abnormal synapse ultrastructure. animal care and husbandry.
mice were kept on a -hour-light/ -hour-dark cycle with water and chow available ad libitum. the atrx loxp mice have been described previously . atrx loxp mice were mated with c bl/ mice expressing cre recombinase under the control of the αcamkii gene promoter . the progeny includes hemizygous male mice that produce no atrx protein in forebrain excitatory neurons (atrx-cko). the atrx-cko males were mated to atrx loxp females to yield homozygous deletion of atrx in female mice (atrx-cko fem ). male and female littermate floxed mice lacking the cre allele were used as controls (ctrl; ctrl fem ). genotyping of tail biopsies for the presence of the floxed and cre alleles was performed as described previously . arrive guidelines were followed: mouse groups were randomized, experimenters were blind to the genotypes, and software-based analysis was used to score mouse performance in all the tasks. all behavioural experiments were performed between : am and : pm. immunofluorescence staining. mice were perfused with ml phosphate buffered saline (pbs) followed by ml % paraformaldehyde (pfa) in pbs, and the brain was fixed for hours in % pfa in pbs and cryopreserved in % sucrose/pbs. brains were flash frozen in cryomatrix (thermo scientific) and sectioned at µm thickness as described previously . for immunostaining, antigen retrieval was performed by incubating slides in mm sodium citrate at °c for min. cooled sections were washed and blocked with % normal goat serum (sigma). the slides were incubated overnight in primary antibody . sections were washed again three times for min, counterstained with dapi and mounted with slowfade gold (invitrogen). all images were captured using an inverted microscope (leica dmi b) with a digital camera (hamamatsu orca-er). openlab image software was used for manual image capture, and images were processed using the volocity software (demo version . . ; perkinelmer) and adobe photoshop cs (version . ).
cell counts of dapi, gfap, and iba were performed in adobe photoshop. dapi was counted per mm and gfap and iba were counted as percentages of dapi + cells. one section from five pairs of ctrl/atrx-cko was counted. reverse transcriptase real-time pcr (qrt-pcr). total rna was isolated from control and atrx-cko frontal cortex and hippocampus using the mirvana total rna isolation kit (thermofisher) and reverse transcribed to cdna using μg rna and superscript ii reverse transcriptase (invitrogen). real-time pcr was performed in duplicate using gene-specific primers under the following conditions: °c for s, °c for s, °c for s for cycles. all data were normalized against β-actin expression levels. primers used were as follows: β-actin (forward ctgtcgagtcgcgtccaccc, reverse acatgccggagccgttgtcg); atrx (forward agaaattgaggatgcttcacc, reverse tgaacctggggacttctttg). total rna was also used for reverse transcription of mirna using the taqman advanced microrna reverse transcription kit (thermofisher). qrt-pcr was performed using the short-term memory, mice were exposed to the original object (a) and a novel object (b; a blue plastic pyramid attached on top of a green prism base) . hours after training. to test long-term memory, mice were exposed to (a) and (b) hours after training. novel object recognition was expressed as the percentage of time spent with the novel object as a fraction of the total time spent interacting. interaction was defined as sniffing or touching the object, but not leaning against or climbing on the object. morris water maze. the morris water maze was conducted as described previously touchscreen assays. the paired associate learning (dpal) and visual paired discrimination (vpd) and reversal tasks were performed as previously described - . animals were food-restricted to % starting body weight. animals were separated into two counter-balanced subgroups to control for time of day of testing, and equipment variation. 
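The novel object recognition score defined above (time with the novel object as a percentage of total interaction time) can be written as a one-line calculation. The sketch below is a minimal illustration; the function name and the example durations are hypothetical, and interaction times are assumed to have already been scored according to the sniffing/touching criterion.

```python
def novel_object_preference(time_novel_s, time_familiar_s):
    """Percentage of total interaction time spent with the novel object."""
    if time_novel_s < 0 or time_familiar_s < 0:
        raise ValueError("interaction times must be non-negative")
    total = time_novel_s + time_familiar_s
    if total == 0:
        raise ValueError("mouse did not interact with either object")
    return 100.0 * time_novel_s / total


if __name__ == "__main__":
    # Hypothetical mouse: 30 s with the novel object, 10 s with the familiar one.
    print(novel_object_preference(30.0, 10.0))
```

A value near 50 indicates no preference between the two objects, while values above 50 indicate a preference for the novel object.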
mice were tested in bussey-saksida mouse touch screen chambers (lafayette neuroscience) with strawberry milkshake given as a reward. for the dpal acquisition phase, animals were tested for their ability to associate objects with locations. mice were presented with two images in two of three windows; one image was in its correct location (s+) and one was in one of its two incorrect locations (s-). the third window was blank. a correct response triggered reward presentation and start of an inter-trial period. the pre-training was repeated until mice reached criterion (completion of trials within minutes). the dpal evaluation phase was performed for sessions over weeks. a correct response triggered reward presentation, whereas an incorrect response caused a s time out and the house lights to turn on. an incorrect response also resulted in a correction trial, where the same s+/s-images were presented in the same two locations until the mouse responded correctly. the mouse was given trials over minutes per day. percent correct, number of correction trials, latency to a correct or incorrect response, and latency to retrieve reward were recorded for each week. vpd acquisition required the animal to touch the same image (s+) no matter which window it appeared in. the other screen had an incorrect image (s-). a correct response triggered reward presentation, whereas an incorrect response triggered the house lights to turn on, a time out of s, and a correction trial to begin (previous trial repeated until a correct choice is made). this was repeated until mice reached criterion of / trials correct within minutes over consecutive days, after which baseline measurements were done for two sessions. parameters for baseline were identical to the acquisition steps. 
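The per-session touchscreen measures described above (percent correct and number of correction trials) can be tallied from a trial log along the following lines. This is a sketch under stated assumptions: the log format (a list of `(is_correction_trial, is_correct)` pairs) and the function name are hypothetical, and the choice to compute percent correct over first-presentation trials only, excluding correction trials, is an assumption rather than something stated in the text.

```python
def summarize_session(trials):
    """Summarize one touchscreen session.

    trials: list of (is_correction_trial, is_correct) tuples in
    presentation order. Returns (percent_correct, n_correction_trials),
    where percent correct is computed over non-correction trials only
    (an assumed convention).
    """
    first_presentations = [correct for is_corr, correct in trials if not is_corr]
    n_correction = sum(1 for is_corr, _ in trials if is_corr)
    if not first_presentations:
        raise ValueError("session contains no first-presentation trials")
    percent_correct = 100.0 * sum(first_presentations) / len(first_presentations)
    return percent_correct, n_correction


if __name__ == "__main__":
    # Hypothetical session: the second trial is wrong and is followed by
    # two correction trials before the mouse responds correctly.
    session = [
        (False, True),
        (False, False),
        (True, False),
        (True, True),
        (False, True),
        (False, True),
    ]
    print(summarize_session(session))
```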
immediately following baseline measurements, the vpd task reversal began, where most parameters were the same as the acquisition, but the correct image associated with the reward was s-, and the incorrect response that triggers house lights was s+. this also determined total number of vesicles per synapse. vesicle cluster size was measured to calculate vesicle density. the area of the post-synaptic density was also quantified. statistics were calculated by two-way repeated measures anova with sidak's multiple comparison test or unpaired student's t-tests where applicable. statistical analyses. all data were analyzed using graphpad prism software with student's t test (unpaired, two-tailed) or one or two-way repeated measures anova with sidak's post-test where applicable. all results are depicted as mean +/-sem unless indicated otherwise. p values of less than . were considered to indicate significance. we generated mice lacking atrx in postnatal forebrain excitatory neurons by cre/loxp mediated recombination of the mouse atrx gene with the camkii-cre driver line of mice . to confirm loss of atrx, we performed immunofluorescence staining of control and conditional knockout (atrx-cko) brain cryosections obtained from -month-old mice (figure a,b) . atrx is highly expressed in excitatory neurons of the hippocampus of control mice, including the cortex and hippocampal ca , ca / , and dentate gyrus neurons, but is absent in these cells in the atrx-cko mice. additional validation of atrx inactivation in atrx-cko mice was achieved by qrt-pcr (figure c) , showing that atrx expression is decreased by % (+/- . %) and % (+/- . %) in the cortex and the hippocampus, respectively, which is expected from a neuron-specific deletion. the brain sub-region specificity of atrx loss was demonstrated by western blot analysis, showing reduced protein levels in the rostral and caudal cortices and hippocampus, but not in the cerebellum (figure d ). 
the mice survived to adulthood and had normal general appearance and behaviour. however, body weight measurements revealed a small but significant reduction in atrx-cko compared to control mice (figure e ). these findings demonstrate that we achieved specific deletion of atrx in excitatory neurons and while the mice were slightly smaller, they survived to adulthood, allowing further analyses in the adult brain. we first examined control and atrx-cko mouse brains for neuroanatomical anomalies by magnetic resonance imaging (mri). using a t -weighted mri sequence, we were able to analyze and compare the entire brain as well as independent brain regions from control and atrx-cko male animals. the data obtained showed that the overall volume of the atrx-cko brain is significantly smaller compared to controls ( . % of control volume, p< . ), as indicated by whole volume in mm and cumulative serial slices of control and atrx-cko brains (figure a,b) , which correlates with the smaller body size of the mice. due to the reduction in body size and absolute total brain and hippocampal volumes of the atrx-cko mice, we next examined hippocampal neuroanatomy relative to total brain volume ( we postulated that the increase in relative volume of the ca sr/slm may be due to increased length or branching of ca apical dendrites. to investigate this possibility, golgi staining was used to sporadically label neurons (figure a ) and sholl analysis was performed on confocal microscopy images to evaluate apical dendrite branching of ca hippocampal neurons. however, no significant difference in dendritic branching or length was observed between control and atrx-cko mice, whether analyzed separately for apical or basal dendrites (figure b-g) . 
increased relative volume might also be caused by an increased number of cells, but immunofluorescence staining and quantification of astrocytes (gfap+) and microglia (iba +) and total number of cells (inferred from dapi+ staining) revealed no differences in atrx-cko hippocampi (figure h-j) . overall, the increased relative volume of the ca sr/slm cannot be explained by increased length or complexity of dendritic trees or by an increased number of glial cells. based on the hippocampal structural alterations we detected by mri, we looked more closely at potential ultrastructural changes in the ca sr/slm area using transmission electron microscopy (tem) (figure a) . the presynaptic boutons were divided in nm bins from the active zone, and the number of vesicles in each bin was counted. the spatial distribution of vesicles in relation to the cleft was unchanged between the atrx-cko mice and controls (figure b) . however, we found that the total number of vesicles, the density of the vesicles, and the number of docked vesicles were significantly decreased at atrx-cko compared to control synapses (figure c-e) . we also analysed other structural aspects of synapses and found that the size of the postsynaptic density and the width of the synaptic cleft were both increased in atrx-cko compared to controls (figure f,g) . the length of the active zone, cluster size, and diameter of the vesicles did not vary significantly between control and atrx-cko samples (figure h-j). these results suggest that atrx is required for structural integrity of the pre- and post-synapse, including maintenance of the synaptic vesicle pool at pre-synaptic termini and potential defects in postsynaptic protein clustering. to investigate the effects of neuronal-specific atrx ablation on spatial learning and memory, we tested the mice in the morris water maze task.
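The vesicle quantification described above, in which presynaptic boutons are divided into fixed-width bins measured from the active zone and vesicles are counted per bin, amounts to a simple histogram. The sketch below illustrates this; the bin width used in the study is not recoverable from the text, so `bin_width_nm=50.0` is a placeholder, and the function name and example distances are hypothetical.

```python
from collections import Counter


def bin_vesicles(distances_nm, bin_width_nm=50.0):
    """Count vesicles in consecutive distance bins from the active zone.

    distances_nm: distance of each vesicle from the active zone (nm).
    Returns a list whose i-th entry is the number of vesicles with
    i*bin_width_nm <= distance < (i+1)*bin_width_nm.
    The default bin width is a placeholder, not the study's value.
    """
    if bin_width_nm <= 0:
        raise ValueError("bin width must be positive")
    if any(d < 0 for d in distances_nm):
        raise ValueError("distances must be non-negative")
    counts = Counter(int(d // bin_width_nm) for d in distances_nm)
    if not counts:
        return []
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


if __name__ == "__main__":
    # Hypothetical vesicle distances (nm) for a single synapse.
    distances = [5, 20, 60, 140, 149]
    per_bin = bin_vesicles(distances)
    print(per_bin)       # vesicles per bin, nearest bin first
    print(sum(per_bin))  # total vesicles at this synapse
```

Comparing the per-bin counts between genotypes addresses the spatial distribution relative to the cleft, while the sum over bins gives the total vesicle number per synapse.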
the atrx-cko mice showed a significant delay in latency to find the platform on day of the learning portion of the task; however, by day they were able to find the platform as quickly as the control mice (f= . , p= . ; figure a ). this finding was reflected in the distance traveled p= . ; figure f , g). these behavioural analyses suggest that atrx is required in excitatory neurons for long-term hippocampal-dependent spatial learning and memory. to determine whether loss of atrx in female mice would cause behavioural defects similar to those seen in male mice, we generated atrx-cko female mice (atrx- given the observed male-specific defects in spatial memory, we performed additional translational cognitive tasks on the atrx-cko male mice. the dpal touchscreen task in mice is analogous to cognitive testing done in humans by the cambridge neuropsychological test automated battery (cantab), and normal performance in this task is thought to partly depend on the hippocampus , . control and atrx-cko mice were trained to identify the position of three images as depicted in figure a , undergoing trials per day for weeks. the results demonstrate that the atrx-cko mice exhibit a profound deficit in this task, indicated by both the percent correct (f= . , p= . ; figure b ) and the number of correction trials required (f= . , p< . ; figure c ) (supplementary video , ). these defects were not due to an inability to perform within the chamber or to attentional deficits, as the latencies to a correct answer (f= . , p= . ), to an incorrect answer (f= . , p= . ), and to retrieve the reward (f= . , p= . ) were not significantly different between control and atrx-cko mice (figure d-f). to determine whether the impairment in the dpal task is caused by a vision problem rather than a learning defect, the mice were also tested in the visual paired discrimination (vpd) touchscreen task, which requires the mice to discriminate between two images regardless of their position on the screen.
while the atrx-cko mice took significantly longer to reach the criterion pre-testing (t= . , p= . ; to identify the molecular mechanism(s) leading to spatial memory impairment, we performed rna-sequencing in both male and female hippocampi obtained from three pairs of littermate-matched ctrl/atrx-cko and ctrl fem /atrx-cko fem mice. there were transcripts differentially expressed in the atrx-cko males compared to control counterparts and transcripts in atrx-cko fem compared to the female controls (fdr < . ). to isolate transcripts that were likely to be causative to the impaired learning and memory phenotype in the male mice which was not found in the female mice, we focused on transcripts whose changes in expression with the atrx-cko were differential between male and female mice (n = transcripts, interaction term fdr < . ). the expression heat map of these transcripts illustrates that their expression levels are similar in control males and females but are differentially expressed when atrx is lost depending on sex (figure a) . we then utilized panther , a tool for gene enrichment analysis based on functional annotations to examine gene ontology biological processes for which our list of transcripts was enriched (figure b) . the top five pathways included neurotransmitter receptor transport to postsynaptic membrane, protein localization to postsynaptic membrane, non-motile cilium assembly, and vesicle-mediated transport to the membrane. therefore, the rna sequencing revealed many transcripts related to synapses, supporting the tem data. certain mirna are enriched within presynaptic terminals and have been figure a ) as well as an enrichment for targets of mir- (supp. figure b) . we compared the list of genes downregulated in the atrx-cko male hippocampi to those predicted to be regulated by mir- through mirna.org (supp. table ) . we found shank , cadps , glrb, and sgip expression to be inversely . 
this data provides additional evidence that loss of atrx in the cortex and hippocampus of male mice leads to increased mir- expression and consequent downregulation of its target genes, starting at early stages of forebrain development. this study presents evidence that atrx is required in a sex-specific manner in excitatory forebrain neurons for normal spatial learning and memory (figure ) [ ] [ ] [ ] . in humans, neurological disorders such as autismspectrum disorders tend to preferentially affect males rather than females, possibly due to combinatorial contributions of hormonal and genetic factors in a phenomenon known as the female protective effect [ ] [ ] [ ] , and this is regularly supported with mouse models [ ] [ ] [ ] . the presence of estrogen and estrogen receptor in the female brain has been shown to be neuroprotective and leads to enhanced schaffer-collateral ltp . in addition, certain x-linked genes involved in chromatin regulation (e.g. utx, a histone demethylase) are able to escape x-inactivation and so are expressed two-fold in females compared to males in conclusion, our study presents strong evidence that atrx is required in forebrain excitatory neurons for spatial learning and long-term memory and regulation of genes required for efficient synaptic transmission. vpd touchscreen assays were performed at the robarts research institute neurobehavioural core facility, tem imaging at the biotron at western university and the rna sequencing at the london regional genomics centre. 
mutations in a putative global transcriptional regulator cause x-linked mental retardation with alpha-thalassemia (atr-x syndrome) targeted next-generation sequencing analysis of , individuals with intellectual disability mutations in transcriptional regulator atrx establish the functional significance of a phd-like domain microrna- a regulates neurite outgrowth, spinal morphology, and function mir- b shapes the presynaptic transcriptome and influences neurotransmission by silencing the polycomb group protein bmi the schizophrenia risk gene product mir- alters presynaptic plasticity the swi/snf protein atrx co-regulates pseudoautosomal genes that have translocated to autosomes in the mouse genome toppgene suite for gene list enrichment analysis and candidate gene prioritization the shank family of scaffold proteins the glycine receptor cloning and characterization of human cadps and cadps , new members of the ca +-dependent activator for secretion protein family intersectin forms complexes with sgip and reps in clathrin-coated pits disruption of the direct perforant path input to the ca subregion of the dorsal hippocampus interferes with spatial working memory and novelty detection a macromolecular synthesis-dependent late phase of long-term potentiation requiring camp in the medial perforant pathway of rat hippocampal slices abnormal microglia and enhanced inflammation-related gene transcription in mice with conditional deletion of ctcf in camk a-cre-expressing neurons lipopolysaccharide-induced microglial activation induces learning and memory deficits without neuronal cell death in rats learning, memory, and glial cell changes following recovery from chronic unpredictable stress disruption of the perineuronal net in the hippocampus or medial prefrontal cortex impairs fear conditioning modification of extracellular matrix by enzymatic removal of chondroitin sulfate and by lack of tenascin-r differentially affects several forms of synaptic plasticity in the 
hippocampus new translational assays for preclinical modelling of cognition in schizophrenia: the touchscreen testing method for mice and rats synaptic scaffold evolution generated components of vertebrate cognitive complexity x-linked alphathalassemia/mental retardation (atr-x) syndrome: localization to xq -q . by x inactivation and linkage analysis sexually dimorphic behavior, neuronal activity, and gene expression in chd -mutant mice mecp organizes juvenile social behavior in a sex-specific manner mecp modulates sex differences in the postsynaptic development of the valproate animal model of autism epidemiology of pervasive developmental disorders a higher mutational burden in females supports a "female protective model" in neurodevelopmental disorders transcriptomic analysis of autistic brain reveals convergent molecular pathology human crmp mutation and disrupted crmp expression in mice are associated with asd characteristics and sexual dimorphism shank deletions in males with investigation of sex differences in the expression of rora and its transcriptional targets in the brain as a potential contributor to the sex bias in autism memory-related synaptic plasticity is sexually dimorphic in rodent hippocampus sex-specific differences in expression of histone demethylases utx and uty in mouse brain and neurons mirna expression profile after status epilepticus and hippocampal neuroprotection by targeting mir- microarray based analysis of microrna expression in rat cerebral cortex after traumatic brain injury the brain-specific microrna mir- b regulates the formation of fear-extinction memory supplementary figure : transcriptional profiling reveals dysregulation of presynaptic genes in atrx-foxg mice. (a) differentially expressed genes between control and atrx-foxg p . forebrain were used for gene ontology analysis and top biological processes were listed by p-value. 
(b) top mirna predicted to regulate differentially expressed genes from control and atrx-foxg mice we are grateful to doug higgs and richard gibbons for the atrx floxed mice, vania prado and marco prado for the camkii-cre mice, michael miller for advice on statistical analyses, and tim bussey for discussions on the touchscreen assays. the dpal and the authors declare no competing or financial interests. key: cord- - qvnj c authors: li, xin; jin, xiufeng; chen, shunmei; wang, liangge; yau, tung on; yang, jianyi; hong, zhangyong; ruan, jishou; duan, guangyou; gao, shan title: the discovery of a recombinant sars -like cov strain provides insights into sars and covid- pandemics date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qvnj c in december , the world awoke to a new zoonotic strain of coronavirus named severe acute respiratory syndrome coronavirus- (sars-cov- ). in the present study, we identified key recombination regions and mutation sites cross the sars-cov- , sars-cov and sars-like cov clusters of betacoronavirus subgroup b. based on the analysis of these recombination events, we proposed that the spike protein of sars-cov- may have more than one specific receptor for its function. in addition, we reported—for the first time—a recombination event of orf at the whole-gene level in a bat and ultimately determined that orf enhances the viral replication. in conjunction with our previous discoveries, we found that receptor binding abilities, junction furin cleavage sites (fcss), strong first ribosome binding sites (rbss) and enhanced orf s are main factors contributing to transmission, virulence and host adaptability of covs. junction fcss and enhanced orf s increase the efficiencies in viral entry into cells and replication, respectively while strong first rbss enhance the translational initiation. 
the strong recombination ability of covs integrated these factors to generate multiple recombinant strains, two of which evolved into sars-cov and sars-cov- by natural selection, resulting in the sars and covid- pandemics.

a new zoonotic strain of coronavirus named severe acute respiratory syndrome coronavirus- (sars-cov- ) emerged in december . since sars-cov- is highly similar to sars-cov, many studies focused on the investigation of the receptor binding domain (rbd) of the spike protein and its receptor ace using the same strategies and methods as in sars-cov [ ]. in contrast to these studies, we previously reported several other findings on sars-cov- for the first time, including the following in particular: ( ) the alternative translation of nankai coding sequence (cds) that characterizes the rapid mutation rate of betacoronavirus at the nucleotide level [ ]; ( ) a furin cleavage site (fcs) "rrar" in the junction region between s and s subunits (junction fcs) of sars-cov- that may increase the efficiency of viral entry into cells [ ]; and ( ) the use of ' untranslated-region (utr) barcoding for the detection, identification, classification and phylogenetic analysis of covs and other viruses [ ]. by data mining betacoronaviruses from public databases, we found that more than nucleotides (nts) at the ' ends of the ' utrs in betacoronavirus genomes are highly conserved, with very few single nucleotide polymorphisms (snps) within each subgroup of betacoronaviruses. we defined ~ -bp sequences of ' utrs including the start codons (atgs) of the first open reading frames (orfs) as barcodes to represent betacoronaviruses. using ' utr barcodes, betacoronaviruses were clustered into four classes, matching the c, b, a and d subgroups of betacoronavirus [ ], respectively. the ' end of each ' utr includes the first ribosome binding site (rbs) of betacoronavirus, which regulates the translational initiation of downstream genes orf a and b [ ].
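the barcode definition above (a fixed-length stretch of the utr that ends with the start codon of the first orf) can be illustrated with a minimal python sketch. this is not the authors' pipeline: the function names, the 0-based `first_orf_start` coordinate and the `barcode_len` parameter are hypothetical, the barcode length must be supplied by the user, and grouping by identical barcode is a simplification of the clustering described in the text.

```python
def utr_barcode(genome: str, first_orf_start: int, barcode_len: int) -> str:
    """Return the barcode that ends at (and includes) the first ORF's start codon.

    genome          -- full genome sequence, 5'->3', DNA alphabet
    first_orf_start -- 0-based index of the 'A' of the first ORF's ATG (assumed known)
    barcode_len     -- total barcode length in nt (hypothetical, user-chosen)
    """
    genome = genome.upper()
    codon_end = first_orf_start + 3
    if genome[first_orf_start:codon_end] != "ATG":
        raise ValueError("first_orf_start does not point at an ATG start codon")
    if barcode_len < 3 or barcode_len > codon_end:
        raise ValueError("barcode_len must be between 3 and the codon end position")
    return genome[codon_end - barcode_len:codon_end]


def group_by_barcode(barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Group genome names that share an identical barcode (a simplification)."""
    groups: dict[str, list[str]] = {}
    for name, code in barcodes.items():
        groups.setdefault(code, []).append(name)
    return groups
```

in practice the start coordinate would come from a genome annotation, and near-identical barcodes would be compared by alignment rather than by exact string equality.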
in particular, sars-cov- and sars-cov have strong first rbss that enhance translational initiation [ ]. these previous studies indicated that receptor binding abilities, junction fcss and strong first rbss are main factors contributing to the transmission, virulence and host adaptability of covs. the present study started with the identification of key recombination regions and mutation sites across three clusters of betacoronavirus subgroup b. using the insertions and deletions (indels) at six sites, we identified two recently detected betacoronavirus strains rmyn and rmyn from a bat [ ] and discovered that rmyn was a recombinant sars -like cov strain. this led us to report, for the first time, a recombination event in open reading frame (orf ) at the whole-gene level in a bat, which had been co-infected by two betacoronavirus strains. orf (figure ), existing only in betacoronavirus subgroup b, was considered to have played a significant role in adaptation to human hosts following interspecies transmission [ ] via the modification of viral replication [ ]. we then determined that orf enhances viral replication, which is another main factor contributing to the transmission, virulence and host adaptability of covs. in the present study, we analyzed these main factors in the context of betacoronavirus evolution (conjoint analysis of phylogeny and molecular function [ ]) to explain the sars and covid- pandemics. based on analysis of betacoronavirus subgroup b (materials and methods), key insertions and deletions (indels) were identified at six sites (named m to m ) in genes orf a, membrane (m), orf a, b, and nucleocapsid (n), respectively (table ). using the indels at six sites, betacoronavirus subgroup b was classified into two classes: ( ) the first class includes sars-cov- (from humans) and sars -like cov (from animals), and ( ) the second class includes sars-cov (from humans) and sars-like cov (from animals).
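the two-class assignment from indel patterns can be sketched as a comparison of observed site lengths against per-class signatures. the sketch below is an illustration only: the function name, the site labels and any lengths passed to it are hypothetical placeholders, not values from table , and the "fewest mismatches" rule is an assumption about how such a classification could be automated.

```python
def classify_by_indels(
    observed: dict[str, int],
    signatures: dict[str, dict[str, int]],
) -> tuple[str, list[str]]:
    """Assign a genome to the class whose indel signature it matches at most sites.

    observed   -- site name -> observed length (nt) of that site in the genome
    signatures -- class name -> {site name -> expected length (nt)}
    Returns the best-matching class and the sites that disagree with it.
    Ties are resolved in favour of the class listed first.
    """
    best_class = None
    best_mismatches: list[str] = []
    for cls, expected in signatures.items():
        mismatches = [site for site, n in expected.items() if observed.get(site) != n]
        if best_class is None or len(mismatches) < len(best_mismatches):
            best_class, best_mismatches = cls, mismatches
    if best_class is None:
        raise ValueError("no class signatures were given")
    return best_class, best_mismatches
```

a genome that matches one class at every site but one is returned together with the disagreeing site, which is the kind of pattern that would prompt a closer look for recombination.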
this classification result is reliable, as all recombination events and mutations between the two classes are unlikely to undergo reversible changes together. as a mutation site, m has a length of nt in the second class and nt in the first class. m , m , m and m in the first class have -nt deletions that are complete codons, whereas m in the first class has a -nt deletion that is not a complete codon. almost all the identified recombination events occurred in genes orf a, s and orf (table ). the recombination regions rc to rc and rc to rc are located in orf a and the s region of the s gene, respectively, while the recombination events in orf are complex (see below). to initiate cov infection, the s protein encoded by the s gene needs to be cleaved into the s and s subunits for receptor binding and membrane fusion. by analysis of all recombination events in all betacoronaviruses, we obtained the following results: ( ) there are a few genotypes of each recombination region (rc to rc ); ( ) rc to rc have more diversity than rc to rc in the genotypes; ( ) betacoronaviruses within the sars-cov- and sars-cov clusters (see below) had the same genotypes of each recombination region; and ( ) there were a few non-synonymous substitutions between different sequences of each genotype. these results suggested that recombination, rather than accumulated mutation, directly triggered cross-species transmission and outbreaks of sars-cov and sars-cov- , while mutation could change potential recombination sites, affecting recombination. further analysis showed that two recombination regions (rc and rc ) are localized in the receptor binding domain (rbd) of s (figure ), while three other recombination regions (rc , rc and rc ) are localized in the n-terminal domain (ntd) of s . almost all the secondary structures of the five protein segments encoded by rc to rc are disordered, and such disordered segments are responsible for protein-protein interaction (ppi).
this suggested that the recombination of rc to rc improve the adaptability of betacoronaviruses in new hosts (host range expansion [ ]) by enhancing interaction of rbd and ntd with their receptors, exhibiting that the positive selection of s was particularly strong [ ] . since both rbd and ntd had similar recombination events in their ppi regions, we proposed that ntd has a specific receptor as rbd has angiotensin-converting enzyme (ace ). thus, the s subunit of sars-cov- may have more than one specific receptor (figure ) like gp of hiv has the cluster of differentiation receptor (cd ) and the c-c chemokine receptor (ccr ). comprehensive analysis and reuse of data from different sources are necessary to determine the other receptor/s of sars-cov- . a previous study identified two genetic susceptibility loci (rs at locus p . and with rs at locus q . ) in covid- patients with respiratory failure using genome-wide association analysis [ ] . the locus p . was associated to six genes slc a , lztfl , ccr , fyco , cxcr and xcr . however, the previous study only focused on the further analysis of the locus q . and confirmed a potential involvement of the abo blood-group system. the researchers did not notice that three chemokine receptors ccr , cxcr and xcr merit further investigation as candidates of sars-cov- receptors. the analysis of bulk rna-seq data showed high expression of ccr and xcr in thymus and cxcr in t cells, compared to other tissues and cell types [ ] . in particular, the thymic cells were consistently negative for ace and many covs can infect thymus [ ] . by investigating interaction of three protein segments encoded by rc to rc in ntd (table ) with ccr , cxcr and xcr , we found that ccr is the most possible candidate among three chemokine receptors. however, the final determination of the other receptor/s of sars-cov- need more calculation and experiments on candidates at the whole-genome level. 
our study did not rule out the possibilities of non-receptor proteins binding to ntd. the s protein is cleaved into two subunit s and s (in red color) for receptor binding and membrane fusion. s has two domains, rbd (in green color) and ntd (in blue color). it is well accepted that s binds to its specific receptor angiotensin-converting enzyme (ace ) by the interaction between rbd and ace (in purple color). in the present study, we propose that the spike protein of sars-cov- may have more than one specific receptor for its function as gp of hiv has cd and ccr . the structure of s was predicted using trrosetta [ ] . recently, two betacoronavirus strains rmyn and rmyn (gisaid: epi_isl_ and epi_isl_ ) were detected from a bat (rhinolophus malayanus) [ ] . since betacoronaviruses from subgroup b share many highly similar regions in their genome sequences, it is very difficult to assemble them correctly using high-throughput sequencing (hts) data from one sample. therefore, epi_isl_ was only assembled into a partial sequence in a previous study [ ] . however, the exact identification of viruses requires the complete genomes or even the full-length genomes. using paired-end sequencing data, we reassembled these two virus genomes and obtained two full-length sequences to update epi_isl_ and epi_isl_ (supplementary ). using ' utr barcodes (introduction), the betacoronaviruses rmyn and rmyn were identified as belonging to subgroup b. using the indels at m to m , rmyn was further identified as belonging to the second class, respectively. using the indels at m to m and m , rmyn was further identified as belonging to the first class but a recombinant sars -like cov strain. rmyn was supposed to have a -nt deletion at the m site; however, it did not ( table ) . this led us to report-for the first time-a recombination event in orf at the whole-gene level in a bat, which had been co-infected by two betacoronavirus strains. 
orf (figure ), existing only in betacoronavirus subgroup b, was considered to have played a significant role in adaptation to human hosts following interspecies transmission [ ] via the modification of viral replication [ ] . a -nt deletion in sars-cov (genbank: ay ) was reported and considered to be associated with attenuation during the early stage of human-to-human transmission [ ] . rna-seq data from a bat was aligned to two genomes of rmyn and rmyn (gisaid: epi_isl_ and epi_isl_ ). rna abundance is represented by read counts (y-axis). the rna abundance of rmyn is times that of rmyn . a. rmyn was identified as belonging to the sars-like cov cluster and has a type orf . b. rmyn was identified as belonging to the sars-cov- cluster but has an type orf (enhanced orf ). a -nt deletion during the early evolution of sars-cov- (gisaid: epi_isl_ ) was also reported [ ] , but associated with attenuation without changes in its replication [ ] . although many recombination events in orf of betacoronaviruses have been reported in sequence analysis results, it is difficult to determine whether they were recombination events or small-size mutation (indel & snp) accumulation as most of them only occurred over very small genomic regions, excepting a few events (e.g., the -nt deletion in orf [ ] ). in the present study, the discovery of a recombination event in orf at the whole-gene level led to the determination of three types (see below) of orf genes in the betacoronavirus subgroup b, providing clues to understand the functions of orf . based on conjoint analysis of phylogeny and molecular function that was proposed in our previous study [ ] , genes (i.e. orf a, s and orf ) containing the recombination regions under high selection pressure must be removed in phylogenetic analysis. 
using large segments (supplementary ) spanning s , orf a, b, envelope (e), m, orf , a, b, n( a) and orf (table ) , phylogenetic tree ( figure a) showed that of betacoronaviruses from subgroup b (materials and methods) were classified into two major clades, corresponding to the first and second class classified using the indels at six sites, respectively: ( ) the first major clade, named the sars-cov- cluster, includes sars-cov- and all sars -like covs (from bats and pangolins); and ( ) using only orf (supplementary ), phylogenetic tree ( figure b ) also showed that the betacoronaviruses were classified into the sars-cov- , sars-cov and sars-like cov clusters. however, this tree did not reflect the evolutionary relationship of the three clusters due to the recombination events of orf . using cdss of nsp (rna-dependent rna polymerase, rdrp), the rooted phylogenetic tree ( figure c ) was construct to confirm the evolutionary relationship of the three clusters in tree (figure a) .in phylogenetic tree , the sars-cov- , sars-cov and sars-like cov clusters have types , and orf genes, respectively. type orf genes possess low nucleotide identities (below %) to type orf genes, while type orf genes are so highly divergent from types and orf genes, they cannot be well aligned to calculate nucleotide identities between type orf genes and types or orf genes. as rmyn belongs to the cluster (figure a) but has a type rather than type orf (figure b ), rmyn is a recombinant sars -like cov strain. this discovery indicated that recombination occurred across the sars-cov- and sars-cov clusters, which has potential to generate a new strain more dangerous than sars-cov- and sars-cov. recombination regions (rc - ) and mutation sites (m - ) were annotated in the viral genomes of sars-cov- (genbank: mn ) by column - and rmyn (gisaid: epi_isl_ ) by column - . # these amino acid sequences are encoded by rc - from sars-cov- . all the insertions and deletions refer to sars-cov (genbank: ay ). 
* since rmyn has a recombinant orf , it has the same allele at the m site as sars-cov. comparing phylogenetic tree (figure a ) using large segments with ( figure b ) using only orf genes, all betacoronaviruses were consistently classified into the same clusters in both trees, except rmyn and the sars-like cov strain wiv (genbank: kf ). the accession numbers of the genbank or gisaid databases were used to represent the viral genomes: wiv was classified into cluster , but has a type rather than type orf , wiv is a recombinant sars-like cov strain. wiv , isolated from chinese horseshoe bats (rhinolophus sinicus), was considered the most closely related to sars-cov but not its immediate ancestor [ ] . a previous study predicted the immediate ancestor of sars-cov based on the following hypothesis: the ancestor of sars-like covs from civets was a recombinant virus with orf originating from greater horseshoe bats (rhinolophus ferrumequinum) and other genomic regions originating from different horseshoe bats [ ] . however, whether these recombination events occurred in bats or civets remains unclear [ ] . both phylogenetic tree ( figure a ) and ( figure b ) consistently revealed that sars-cov- is most closely related to the well-known strain ratg (genbank: mn ) isolated from intermediate horseshoe bats (rhinolophus affinis) . however, ratg is unlikely to be the immediate ancestor of sars-cov- due to lack of the junction fcs next, we conducted further research on the biological functions of orf . rmyn and rmyn were simultaneously detected in a bat, providing a special opportunity to compare their copy numbers. as rmyn and rmyn have type and type orf genes, respectively, the difference between the copy numbers of rmyn and that of rmyn can be estimated by their relative rna abundances to test a previous hypothesis that type orf genes increase replication efficiency of viruses. 
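the comparison of relative rna abundances between two co-detected genomes can be sketched from per-position read depths. this is a minimal sketch under stated assumptions: the depth lists are assumed to come from an alignment step performed elsewhere, the function names are hypothetical, and using the ratio of mean depths as a proxy for relative abundance is an assumption, not necessarily the authors' exact calculation.

```python
def coverage_stats(depths: list[int]) -> tuple[float, float]:
    """Return (breadth_percent, mean_depth) for per-position read depths.

    breadth_percent is the percentage of positions with depth >= 1;
    mean_depth is averaged over all positions, covered or not.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d > 0)
    return 100.0 * covered / len(depths), sum(depths) / len(depths)


def abundance_ratio(depths_a: list[int], depths_b: list[int]) -> float:
    """Ratio of mean depths (a over b), a rough proxy for relative RNA abundance."""
    _, mean_a = coverage_stats(depths_a)
    _, mean_b = coverage_stats(depths_b)
    if mean_b == 0:
        raise ValueError("second genome has zero mean depth")
    return mean_a / mean_b
```

because both genomes are measured in the same sample and library, no normalization for sequencing depth is applied here; comparing across samples would require one.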
after aligning rna-seq data to the genomes of rmyn and rmyn , our calculation showed that the rmyn genome was covered over . % of its length with an average depth of . (figure a), while the rmyn genome was covered over . % of its length with an average depth of . (figure b). the rna abundance of rmyn is times that of rmyn . based on the "leader-to-body fusion" model explaining the replication and transcription of covs [ ], the difference in rna abundance of the orf a and orf b genes (figure ab) resulted from cov replication, rather than transcription. therefore, this result suggests that type orf (named enhanced orf ) genes increase the replication efficiency of rmyn , ruling out the possibility that transcription contributes to the difference in rna abundance of the two virus strains. we therefore conclude that orf enhances viral replication. receptor binding abilities, junction fcss, strong first rbss and enhanced orf s (see above) are main factors contributing to the transmission, virulence and host adaptability of covs. by analysis of these main factors in betacoronavirus genomes (materials and methods), we concluded: ( ) rapid recombination of viral genomes gives covs a strong capacity for cross-species transmission and outbreak; ( ) the immediate ancestor of betacoronavirus was most likely to have two junction fcss and a strong first rbs, and it was transmitted across species during its outbreak; ( ) after a period of adaptation in new hosts, betacoronavirus was attenuated to spread widely and persist in the host population by loss of abilities attributed to one or more factors (e.g. junction fcss); and ( ) the strong recombination ability of covs integrated these factors to generate multiple recombinant strains, very few of which evolved into super virus strains (e.g. sars-cov and sars-cov- ) causing pandemics by natural selection. in the betacoronavirus subgroup c (figure a), middle east respiratory syndrome coronavirus (mers-cov) has two junction fcss.
the first one "rstr", located at position in the s protein (noted as r ), is nonfunctional, as a result of attenuation, because there is a disulfide bond across the junction fcs r . however, the second junction fcs "rsvr" (r ) is still functional. originating from the same ancestor as mers-cov, mers-like covs (e.g. hedgehog cov) were further attenuated by loss of two junction fcss. in the betacoronavirus subgroup b (figure ab), sars-cov- (genbank: mn ) has the junction fcs "rrar" (r ) and lost a junction fcs by substituting "kntq" (r ) for "rntr", as a result of attenuation. all sars- like covs (from bats or pangolins [ ]) in the sars-cov- cluster were further attenuated by loss of "rrar" and by substituting "kntq" for "rntr". the immediate ancestor of sars-cov was an attenuated variant of sars-cov- by loss of "rrar" (e.g. r ) and with an inaccessible "rntr" (r ), which has a helix rather than a coil secondary structure. all sars-like covs in the sars-like cov cluster were further attenuated by loss of "rrar" and by substituting "kntq" for "rntr". in the betacoronavirus subgroup d, hku -cov was attenuated by loss of two junction fcss but still has a strong first rbs. in the betacoronavirus subgroup a (figure a), although almost all strains (e.g. hcov-oc and hcov-hku ) still have one junction fcs, they do not have strong first rbss and enhanced orf s. these strains were heavily attenuated from the immediate ancestor of betacoronavirus for complex reasons. firstly, the average arginine (r) percentage of s proteins from betacoronaviruses of the subgroup a except mouse hepatitis virus (mhv) ( . %) is significantly lower than those from mhv and betacoronaviruses of the subgroups b, c and d ( . %, . %, . % and . %). this indicated that accumulated mutations caused attenuation by loss of arginine residues, since arginine residues are indispensable for the protease cleavage sites.
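the two sequence-level quantities used above, the arginine percentage of an s protein and the presence of four-residue motifs such as "rrar", "rstr", "rsvr" and "rntr", can be computed with a short python sketch. the function names are hypothetical, and the pattern r-x-x-r is used here as the minimal furin-like motif that all four quoted sites share; a match is only a candidate, since accessibility and structural context (as discussed above for "rstr" and "rntr") decide whether a site is functional.

```python
import re

# Minimal furin-like motif R-X-X-R; the lookahead reports overlapping hits too.
FURIN_LIKE = re.compile(r"(?=(R[A-Z]{2}R))")


def arginine_percentage(protein: str) -> float:
    """Percentage of residues in the protein sequence that are arginine (R)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("protein sequence must not be empty")
    return 100.0 * protein.count("R") / len(protein)


def find_rxxr(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for every R-X-X-R occurrence."""
    return [(m.start() + 1, m.group(1)) for m in FURIN_LIKE.finditer(protein.upper())]
```

averaging `arginine_percentage` over the s proteins of each subgroup would reproduce the kind of comparison made in the text, given the protein sequences as input.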
other reasons may include the loss of strong first rbss and genetic events in the transcription regulatory sequences [ ], an important factor that is not further investigated in the present study, but merits further investigation in the future. and betacoronaviruses from pangolins) in the sars-cov- cluster are attenuated variants of sars-cov- . as a recombinant betacoronavirus, the immediate ancestor of sars-cov- is characterized by the junction fcs "rrar", while the immediate ancestor of sars-cov is characterized by the enhanced orf . therefore, wiv without the enhanced orf and ratg without the junction fcs "rrar" may have contributed to, but are not, the immediate ancestors of sars-cov and sars-cov- , respectively. the outbreaks of mers-cov, sars-cov and sars-cov- were triggered by recombination events, not accumulated mutations, so it is not suitable to estimate their divergence times using current theories in evolutionary biology. the origins of the junction fcs "rrar" and the enhanced orf are still unknown. therefore, the recombinant strains (e.g. wiv and rmyn ) are identified based on the reference strains, which were usually reported before the recombinant strains. this reflects the phylogenetic relationship between them, not the actual recombination events that generated the recombinant strains. future investigations need to be conducted to search for the betacoronavirus strains that provided the junction fcs "rrar" and the enhanced orf to sars-cov- and sars-cov, respectively. our studies also suggest that the nucleotide sequences of the junction fcs "rrar" and orf may originate from non-viral species. and ( ) sars -like cov with at least one junction fcs will be eventually detected in bats. the software virusdetect [ ] was used to detect viruses in rna-seq data from [ ]. the software fastq_clean [ ] was used for rna-seq data cleaning and quality control.
the genomes of rmyn and rmyn (gisaid: epi_isl_ and epi_isl_ ) were reassembled by aligning rna-seq data on two closest reference genomes jx and mn . svdetect v . b and svfilter [ ] were used to remove abnormally aligned reads. several haploid contigs (supplementary ) highly similar to the complete rmyn genome were also assembled. this suggested that there exists more than one betacoronavirus strain belonging to the sars-like cov cluster in the same sample, from which rmyn and rmyn were detected. , genome sequences of betacoronaviruses (in group a, b, c and d) were downloaded from the ncbi virus database (https://www.ncbi.nlm.nih.gov/labs/virus) in our previous study [ ] . epi_isl_ and epi_isl_ ), totally complete genomes were used for the phylogenetic analysis applying the neighbour joining (nj) method. sequence alignment was performed using the bowtie v . . software with paired-end alignment allowing mismatches; mutation detection and other data processing were carried out using perl scripts; the phylogenetic analysis was performed using mega v . . ; statistics and plotting were conducted using the software r v . . and the bioconductor packages [ ] . the structure of s (supplementary ) was predicted using trrosetta [ ] .
recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission bioinformatics analysis of the novel coronavirus genome a furin cleavage site was discovered in the s protein of the novel coronavirus novel coronavirus leads to insights into its virulence the architecture of sars-cov- transcriptome a novel bat coronavirus closely related to sars-cov- contains natural insertions at the s /s cleavage site of the spike protein sars coronavirus orf protein is acquired from sars-related coronavirus from greater horseshoe bats through recombination attenuation of replication by a nucleotide deletion in sarscoronavirus acquired during the early stages of human-to-human transmission genomewide association study of severe covid- with respiratory failure potential impact of sars-cov- infection on the thymus discovery of a -nt deletion during the early evolution of sars-cov- . biorxiv effects of a major deletion in the sars-cov- genome on the severity of infection and the inflammatory response: an observational cohort study. the lancet isolation and characterization of a bat sars-like coronavirus that uses the ace receptor identifying sars-cov- -related coronaviruses in malayan pangolins virusdetect: an automated pipeline for efficient virus discovery using deep sequencing of small rnas fastq_clean: an optimized pipeline to clean the illumina sequencing data with quality control genomewide analysis of dongxiang wild rice (oryza rufipogon griff.) to investigate lost/acquired genes during rice domestication analysis of multimerization of the sars coronavirus nucleocapsid protein. 
biochemical and biophysical research communications the e protein is a multifunctional membrane protein of sars-cov isolation and characterization of viruses related to the sars coronavirus from animals in southern china isolation and characterization of a bat sars-like coronavirus that uses the ace receptor r language and bioconductor in bioinformatics applications(chinese edition) author contributions statements sg conceived the project. sg and gd supervised this study. xj and sc conducted programming. xl, lw, and ty downloaded, managed and processed the data non-financial competing interests key: cord- - xueqdri authors: leary, shay; gaudieri, silvana; chopra, abha; pakala, suman; alves, eric; john, mina; das, suman; mallal, simon; phillips, elizabeth title: three adjacent nucleotide changes spanning two residues in sars-cov- nucleoprotein: possible homologous recombination from the transcription-regulating sequence date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xueqdri the covid- pandemic is caused by the single-stranded rna virus severe acute respiratory syndrome coronavirus (sars-cov- ), a virus of zoonotic origin that was first detected in wuhan, china in december . there is evidence that homologous recombination contributed to this cross-species transmission. since that time the virus has demonstrated a high propensity for human-to-human transmission. here we report two newly identified adjacent amino acid polymorphisms in the nucleocapsid at positions and (r k/g r) due to three adjacent nucleotide changes across the two codons (i.e. agg gga to aaa cga). this new strain within the lgg clade may have arisen by a form of homologous recombination from the core sequence (cs-b) of the transcription-regulating sequences of sars-cov- itself and has rapidly increased to approximately one third of reported sequences from europe during the month of march .
we note that these polymorphisms are predicted to reduce the binding of an overlying putative hla-c* -restricted epitope and that hla-c* is prevalent in caucasians being carried by > % of the population. the findings suggest that homologous recombination may have occurred since its introduction into humans and be a mechanism for increased viral fitness and adaptation of sars-cov- to human populations. evidence of viral adaptation to selective pressures as it spreads among diverse human populations has implications for the ongoing potential for changes in viral fitness over time, which in turn may impact transmissibility, disease pathogenesis and immunogenicity. geographic differences in viral sequence diversity and epidemiological profiles of disease are likely to reflect the spread of founder viruses, which first entered different sars-cov- naïve populations. however, the extent to which selection pressures operating within those populations also impact sars-cov- diversity is currently not known. functional effects of new genetic changes need to be considered in ongoing public health measures to contain infection around the world and in the development of universal vaccines and antiviral therapy. here we describe a new emerging strain of sars-cov- within the lgg clade that appears to be the result of a homologous recombination event that introduced three adjacent nucleotide changes spanning two residues of the nucleocapsid protein. that strain expanded rapidly in europe in march . this protein forms an integral part of the virus lifecycle and is known to be highly immunogenic. we utilized publicly available sars-cov- sequences from the gisaid database table ) . of these polymorphisms, three were the polymorphisms l s in orf , d g in surface glycoprotein (s) and g v in ns (orf a) that mark the major worldwide clades s, g and v, respectively. two newly identified adjacent polymorphisms (r k and g r) in the nucleocapsid protein occur in approximately . 
% of deposited strains and form one of the main strains emerging from europe ( figure a ). other common polymorphisms include q h in ns , t i in nsp , l f in nsp , p l in the rna-dependent rna polymerase, t m in the membrane glycoprotein and p l and y c in the helicase. current low frequency polymorphisms at < % of deposited sars-cov- sequences include s i in the nucleocapsid, h y in ns , and the following polymorphisms v i, g d, i v, p s, a t, f y, g s and k r in orf ab (supplementary table ). the polymorphisms are present in strains sequenced using different next generation sequencing (ngs) platforms (e.g. nanopore, illumina) and the sanger-based sequencing method making it unlikely that the new changes are sequence or alignment errors. in addition, different laboratories around the world have deposited sequences with these polymorphisms in the database and examination of individual sequences in the region does not find obvious insertions/deletions likely representing alignment issues or homopolymer slippage. for the two newly identified adjacent polymorphisms in the nucleocapsid at positions and , there were no strains in the database that had only one of the two changes. the sars-cov- sequences deposited into the gisaid database are consensus strains predominantly generated from ngs platforms that can typically identify low frequency variants. we did not have access to the original sequence files from the contributing laboratories in order to assess if there was evidence of strains that harbored only one of the polymorphisms at lower frequencies. however, no circulating strain has so far been captured that contains only one of the two nucleocapsid polymorphisms as the consensus sequence. 
the rapid emergence of these closely linked polymorphisms in viruses may reflect strong selection pressure on this region of the genome in which the original mutation incurred a replicative capacity, or other fitness cost, which could be restored by a linked compensatory mutation. such adaptations with closely linked compensatory mutations are known to occur under host immune pressure, as is well established for other adaptable rna viruses such as hiv , and hepatitis c virus (hcv) . these viruses have such a high rate of viral replication and error-prone reverse transcriptase that a massive swarm of viral variants with ongoing recombination between residues is generated continuously. as a result, selection pressures exerted by immune responses or other factors effectively operate on each separate residue independently . in contrast, coronaviruses encode proofreading machinery and have a propensity to adapt by homologous recombination between viruses rather than classic step-wise individual mutations driven by selective pressures operating on single viral residues. this, together with the routine nature of their cross-species transmission , led graham and baric to presciently warn in that it was a matter of when, rather than if, a pathogenic coronavirus pandemic would occur in humans. also of note, the phenomenon of compensatory fixation has been described in the area of hiv antiviral resistance in which the linked mutations cannot revert to wild type when the selective pressure is removed as the virus cannot negotiate the fitness valley to return to its previous optimal state . we therefore predict that the k /r (aaa cga) change is likely to remain fixed and intermediates to the wild type are unlikely to be found.
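the codon change reported here (agg gga to aaa cga) can be checked with a short sketch that translates both codon pairs and lists the nucleotide positions that differ. the codon table below is deliberately restricted to the four codons named in the text, taken from the standard genetic code; the helper names are hypothetical, and this is an illustration rather than the pipeline used for the gisaid analysis.

```python
# Subset of the standard genetic code covering only the codons in the text.
CODON_TABLE = {"AGG": "R", "GGA": "G", "AAA": "K", "CGA": "R"}


def normalise(seq: str) -> str:
    """Upper-case a nucleotide string and drop spaces between codons."""
    return seq.upper().replace(" ", "")


def translate(seq: str) -> str:
    """Translate an in-frame nucleotide string using CODON_TABLE."""
    nt = normalise(seq)
    if len(nt) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))


def changed_positions(ref: str, alt: str) -> list:
    """Return 1-based positions at which two equal-length sequences differ."""
    a, b = normalise(ref), normalise(alt)
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(translate("agg gga"), "->", translate("aaa cga"))
    print(changed_positions("agg gga", "aaa cga"))
```

the three differing positions are contiguous and straddle the codon boundary, which is why a single event altering a short stretch is a more economical explanation than three independent substitutions.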
it will be critical to determine if the introduction of the aaacga motif results in a replicative or other fitness cost to the virus, creates an alternative subgenomic mrna transcript or rna secondary structure or increases nucleocapsid activity as this could indicate that there may be viral attenuation as passage occurs globally through populations of diverse immunogenetic background. as further evidence of the likelihood of a homologous recombination event, the r k polymorphism involves a two-step process from agg to aaa. however, strikingly, the position shows no evidence to date of alternative codon usage, all viral strains that contain an r at this position have the agg codon, and similarly those as of march , there appears to be only a small proportion of strains with kr- lgg in the us, likely reflecting that deposited sequences have been mainly from the west coast of the us that experienced initial importation of asian strains of sars-cov- . it will be of great interest to see sequences from the east coast of the us given the early importation of sars-cov- from northern europe as well as asia and the widespread community transmission that has followed ( figure a) . interestingly, the m polymorphism in the membrane glycoprotein appears to only be present on the kr-lgg combination (of the sequences with this polymorphism are from europe and from north america) (supplementary table ). when the other common polymorphisms (> %) observed in the nsp , nsp , rna-dependent rna polymerase (rdrp), membrane glycoprotein and helicase are taken into account, there are at present eight main circulating strains at > % frequency in the database all within one to three amino acid polymorphism networks (supplementary table ). of note, our current knowledge of the global circulating strains is dependent on the ability of laboratories in different countries to deposit full genome length sars-cov- sequences and may be subject to ascertainment bias. 
as such, the frequencies of specific strains shown in figure may not reflect the size of the outbreak. however, the data do provide the opportunity to predict the presence of specific strains in areas given the known epidemiology within different countries and regions. currently the possible functional effect(s) of the introduction of the aaacga motif into the nucleocapsid are not known. the nucleocapsid protein is a key structural protein critical to viral transcription and assembly, suggesting that changes in this protein could either increase or decrease replicative fitness. it is also possible that these changes could be functionally compensated by a linked polymorphism in the virus and/or counterbalanced by some other host fitness benefit. however, we have not found any other polymorphism linked to the k /r change to date. selection of viral adaptations to polymorphic host responses mediated by t cells, nk cells, antibodies and antiviral drugs is well described for other rna viruses such as hiv and hcv , . hiv- adaptations to human leucocyte antigen (hla)-restricted t-cell responses have also been shown to be transmitted and accumulate over time , . as previously shown for sars-cov, t-cell responses against sars-cov- are likely to target the nucleocapsid . notably, sars-cov- r k/g r polymorphisms modify the predicted binding of the hla-c* allele to a putative t-cell epitope containing these residues. escape from hla-c-restricted t-cell responses may conceivably confer a fitness advantage for sars-cov- , particularly in european populations where hla-c* is prevalent and carried by > % of the population (www.allelefrequencies.net). the replication characteristics and plasticity of small, highly mutable viruses such as hiv and hcv are distinct from sars-cov- , which is significantly less variable.
transmission and accumulation of ctl escape variants drive negative associations between hiv polymorphisms and hla hiv evolution: ctl escape mutation and reversion after transmission molecular footprints reveal the impact of the protective hla-a* allele in hepatitis c virus infection evidence of hiv- adaptation to hla-restricted immune responses at a population level cross-species transmission of the newly identified coronavirus -ncov recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission hiv protease resistance and viral fitness the minimum amount of homology required for homologous recombination in mammalian cells evidence of viral adaptation to hla class i-restricted immune pressure in chronic hepatitis c virus infection extensive host immune adaptation in a concentrated north american hiv epidemic rapid hiv- disease progression in individuals infected with a virus adapted to its host population long-lived memory t lymphocyte responses against sars coronavirus nucleocapsid protein in sars-recovered patients cytotoxic t-cell immunity to influenza key: cord- - h lp authors: tada, takuya; fan, chen; kaur, ramanjit; stapleford, kenneth a.; gristick, harry; nimigean, crina; landau, nathaniel r. title: a soluble ace microbody protein fused to a single immunoglobulin fc domain is a potent inhibitor of sars-cov- infection in cell culture date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: h lp soluble forms of ace have recently been shown to inhibit sars-cov- infection. we report on an improved soluble ace , termed a “microbody” in which the ace ectodomain is fused to fc domain of the immunoglobulin heavy chain. the protein is smaller than previously described ace -ig fc fusion proteins and contains an h a mutation in the ace catalytic active site that inactivates the enzyme without reducing its affinity for the sars-cov- spike. 
the disulfide-bonded ace microbody protein inhibited entry of lentiviral sars-cov- spike protein pseudotyped virus and live sars-cov- with a potency -fold higher than unmodified soluble ace and was active after initial virus binding to the cell. the ace microbody inhibited the entry of ace -specific β coronaviruses and viruses with the high infectivity variant d g spike. the ace microbody may be a valuable therapeutic for covid- that is active against sars-cov- variants and future coronaviruses that may arise. as the severe acute respiratory syndrome coronavirus (sars-cov- ) continues to spread worldwide, there is an urgent need for preventative vaccine and improved therapeutics for treatment of covid- . the development of therapeutic agents that block specific steps of the coronavirus replication cycle will be highly valuable both for treatment and prophylaxis. coronavirus replication consists of attachment, uncoating, replication, translation, assembly and release, all of which are potential drug targets. virus entry is particularly advantageous because as the first step in virus replication, it spares target cells from becoming infected and because drugs that block entry do not need to be cell permeable as the targets are externally exposed. in sars-cov- entry, the virus attaches to the target cell through the interaction of the spike glycoprotein (s) with its receptor, the angiotensin-converting enzyme (ace ) (li, ; li et al., ; li et al., ) , a plasma membrane protein carboxypeptidase that degrades angiotensin ii to angiotensin-( - ) [ang-( - )] a vasodilator that promotes sodium transport in the regulation of cardiac function and blood pressure (kuba et al., ; riordan, ; tikellis and thomas, ) . ace binding triggers s protein-mediated fusion of the viral envelope with the cell plasma membrane or intracellular endosomal membranes. 
the s protein is synthesized as a single polypeptide that is cleaved by the cellular protease furin into s and s subunits in the endoplasmic reticulum and then further processed by tmprss on target cells (glowacka et al., ; hoffmann et al., ; matsuyama et al., ; shulla et al., ) . the s subunit contains the receptor binding domain (rbd) which binds to ace while s mediates virus-cell fusion (belouzard et al., ; fehr and perlman, ; heald-sargent and gallagher, ; li et al., ; shang et al., ) . cells that express ace are potential targets of the virus. these include cells in the lungs, arteries, heart, kidney, and intestines (harmer et al., ; ksiazek et al., ; leung et al., ) . the use of soluble receptors to prevent virus entry by competitively binding to viral envelope glycoproteins was first explored for hiv- with soluble cd . in early studies, a soluble form of cd deleted for the transmembrane and cytoplasmic domains was found to block virus entry in vitro (daar et al., ; haim et al., ; orloff et al., ; schenten et al., ; sullivan et al., ) . fusion of the protein to an immunoglobulin fc region, termed an "immunoadhesin", increased the avidity for gp by dimerizing the protein and served to increase the half-life of the protein in vivo. an enhanced soluble cd -ig containing a peptide derived from the hiv- coreceptor ccr was found to potently block infection and to protect rhesus macaques from infection (chiang et al., ) . the soluble receptor approach to blocking virus entry has been recently applied to sars-cov- through the use of recombinant human soluble ace protein (hrsace ) (kuba et al., ; monteil et al., ; wysocki et al., ) or hrsace -igg, which encodes soluble ace and the fc region of the human immunoglobulin g (igg) (case et al., ; lei et al., ; qian and hu, ) , which were shown to inhibit sars-cov and sars-cov- entry in a mouse model.
in phase and phase clinical trials (haschke et al., ; khan et al., ) , the protein showed partial antiviral activity but a short half-life. addition of the fc region increased the half-life of the protein in vivo. a potential concern with the addition of the ig fc region is the possibility of enhancement, similar to what occurs with antibody-dependent enhancement in which anti-spike protein antibody attaches to fc receptors on immune cells, facilitating infection rather than preventing it (eroshenko et al., ) . we report here on a soluble human ace "microbody" in which the ace ectodomain is fused to domain of the immunoglobulin g heavy chain fc region (igg-ch ) (maute et al., ) . the igg-ch fc domain served to dimerize the protein, increasing its affinity for the sars-cov- s and decreasing the molecular mass of the protein. the ace microbody did not bind to cell surface fc receptor, reducing any possibility of infection enhancement. mutation of the active site h to alanine in the ace microbody protein, a mutation that has been shown to inactivate ace catalytic activity (guy et al., ) , did not decrease its affinity for the s protein. the dimeric ace microbody had about -fold higher antiviral activity than soluble ace , which was also a dimer, and a higher affinity for virion binding. the ace microbody blocked virus entry into ace . t cells that over-expressed ace as well as all of the cell-lines tested and was fully active against the d g variant spike protein and a panel of b coronavirus spike proteins. as a means to study sars-cov- entry, we developed an assay based on sars-cov- s protein pseudotyped lentiviral reporter viruses. the viruses package a lentiviral vector genome that encodes nanoluciferase and gfp separated by a p a self-processing peptide, providing a convenient means to titer the virus, and the ability to use two different assays to measure infection.
to pseudotype the virions, we constructed expression vectors for the full-length sars-cov- s and for a Δ variant deleted for the carboxyterminal amino acids that removes a reported endoplasmic reticulum retention sequence that blocks transit of the s protein to the cell surface (giroglou et al., ) ( figure a) . the vectors were constructed with or without a carboxy-terminal hemagglutinin (ha) epitope tag. pseudotyped viruses were produced in t cells cotransfected with the dual nanoluciferase/gfp reporter lentiviral vector plenti.gfp.nluc, gag/pol expression vector pmdl and full-length s protein, the Δ s protein, vesicular stomatitis virus g protein (vsv-g) expression vector or without an envelope glycoprotein expression vector. immunoblot analysis showed that full-length and Δ s proteins were expressed and processed into the cleaved s protein (s is not visible as it lacks an epitope tag). analysis of the virions showed that the Δ s protein was packaged into virions at > -fold higher levels than the full-length protein ( figure b ). this difference was not the result of differences in virion production as similar amounts of virion p were present in the cell supernatant. analysis of the transfected t cells by flow cytometry showed a minor increase in the amount of cell surface Δ s protein as compared to full-length ( figure c ) suggesting that deletion of the endoplasmic reticulum retention signal was not the primary cause of the increased virion packaging of the Δ s protein and may result from an inhibitory effect of the s protein cytoplasmic tail on virion incorporation. as a suitable target cell-line, we established a clonal, stably transfected t cell-line that expressed high levels of ace (figures d and s ). a comparison of the infectivity of the viruses on ace . t cells showed that the Δ s protein pseudotype was about . -fold more infectious than the full-length s protein pseudotype ( figure e ). 
the ha-tag had no effect on infectivity and the nevirapine control demonstrated that the luciferase activity was the result of bona fide infection and not carried-over luciferase in the virus-containing supernatant. to determine the cell-type tropism of the pseudotyped virus, we tested several standard laboratory cell-lines for susceptibility to infection by the Δ s protein pseudotyped virus. the vsv-g pseudotype, which has very high infectivity on most cell-types, was tested for comparison and virus lacking a glycoprotein was included to control for potential receptor-independent virus uptake. the results showed high infectivity of the Δ s protein pseudotyped virus on ace . t cells, intermediate infectivity on t, vero, vero e , a , ace .a , caco and huh and low infectivity on a , chme , bhk and u ( figure f) . analysis by flow cytometry of cell surface ace levels showed high level expression on ace . t, intermediate level expression on ace .a and low to undetectable levels on a , caco and huh ( figure s ). the low level of ace expression on cells such as vero and caco suggests that virus can use very small amounts of the receptor for entry. moreover, the pseudotyped virus is a highly sensitive means with which to detect virus entry. soluble ace and ace -fc fusions have been shown to inhibit sars-cov- infection (case et al., ; kuba et al., ; lei et al., ; monteil et al., ; qian and hu, ; wysocki et al., ) . to increase the effectiveness of soluble ace and improve therapeutic potential, we generated an ace -"microbody" in which the ace ectodomain was fused to a single igg ch domain of the igg fc region (figure a ). this domain contains the disulfide bonding cysteine residues of the igg fc that are required to dimerize the protein, which would serve to increase the ace microbody avidity for the s protein.
to prevent potential unwanted effects of the protein on blood pressure due to the catalytic activity of ace , we mutated h , one of the key active site histidine residues of ace , to alanine, a mutation that has been shown to block catalytic activity (guy et al., ) . h lies underneath the s protein interaction site so was not predicted to interfere with s protein binding ( figure b ). for comparison, we constructed a vector encoding soluble ace without the igg ch . the proteins were purified to homogeneity from transfected t cells by ni-nta agarose affinity chromatography followed by size exclusion chromatography ( figure s ). the oligomerization state of the proteins was analyzed by sds-page under nonreducing and reducing conditions. under reducing conditions, the ace and ace .h a microbody proteins and soluble ace ran at kda and kda, consistent with their calculated molecular mass (figures c and s ). under nonreducing conditions, the ace microbody and ace .h a microbody proteins ran at kda, consistent with dimers while the soluble ace ran as a monomer with a mass of kda ( figure c) . analysis of the proteins by size-exclusion chromatography coupled with multi-angle light scattering (sec-mals) under nondenaturing conditions showed all three proteins to have a molecular mass consistent with dimers ( figure d ). the mass of the ace and ace .h a microbody proteins was kda and kda, respectively, while soluble ace was kda. taken together, the results suggest that the ace microbody proteins are disulfide-bonded dimers while soluble ace is a nondisulfide-bonded dimer. to compare the relative ability of the soluble ace proteins to bind virions that display the s protein, we established a virion pull-down assay. ni-nta beads were incubated with a serial dilution of the carboxy-terminal his-tagged soluble ace proteins. free spike protein was removed and the beads were then incubated with a fixed amount of lentiviral pseudotyped virions.
free virions were removed and the bound virions were quantified by immunoblot analysis for virion p capsid protein. to confirm that virus binding to the beads was specific for the bead-bound ace , control virions lacking the spike were tested. the results showed that s protein pseudotyped virions bound to the beads while virions that lacked the s protein failed to bind, confirming that the binding was specific ( figure a) . in addition, a high titer human serum from a recovered individual blocked binding of the virions to the bead-bound ace microbody ( figure s ). analysis of the soluble ace and ace microbody proteins that had bound to the beads showed that similar amounts of each protein had bound ( figure b ). immunoblot analysis of virion binding to the bead-bound soluble ace proteins showed that the wild-type and h a microbody proteins both bound to virions more efficiently than soluble ace ( figure c) and that the ace .h a microbody bound more virions than the wild-type microbody protein. this was unexpected as h does not lie in the interaction surface with the s protein. to determine the relative antiviral activity of soluble ace and the ace microbody proteins, we tested their ability to block infection by sars-cov- Δ s protein pseudotyped gfp/luciferase reporter virus. a fixed amount of pseudotyped reporter virus was incubated with the ace proteins and then used to infect ace . t cells. after days, luciferase activity and the number of gfp+ cells in the infected cultures were analyzed. for comparison, a high titer recovered patient serum with a neutralizing titer of : (figure s ) was also tested. the results showed that soluble ace had moderate inhibitory activity with an ec of . µg/ml. the ace microbody was significantly more potent, with an ec of . µg/ml and the ace .h a microbody protein was somewhat more potent than the wild-type ace microbody with an ec of . µg/ml ( figure a ).
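the text does not state how the half-maximal effective concentrations were derived from the luciferase readout. as a minimal sketch of one common approach, the concentration giving 50% of the untreated signal can be estimated by linear interpolation on a log10 concentration axis between the two dilutions that bracket 50%. the function name and the dilution series below are hypothetical, the numbers are made up for illustration, and a four-parameter logistic fit would normally be preferred for reporting.

```python
import math


def ec50_by_interpolation(concs, pct_infection):
    """Estimate the concentration giving 50% infection.

    concs: inhibitor concentrations (> 0), e.g. in µg/ml.
    pct_infection: reporter signal as a percentage of the untreated control.
    Returns None if the curve never crosses 50% from above.
    """
    points = sorted(zip(concs, pct_infection))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo >= 50.0 >= y_hi and y_lo != y_hi:
            # Fraction of the way from the lower to the higher concentration,
            # measured on the log10 concentration axis.
            frac = (y_lo - 50.0) / (y_lo - y_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


if __name__ == "__main__":
    # Made-up ten-fold dilution series, not data from the study.
    concs = [0.01, 0.1, 1.0, 10.0]
    infection = [95.0, 75.0, 25.0, 5.0]
    print(ec50_by_interpolation(concs, infection))
```

running the same function on each inhibitor's curve gives values that can be compared as fold differences in potency, provided all curves come from the same virus input and readout.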
inhibition of infection by the soluble ace proteins was comparable to recovered patient serum although it is not possible to directly compare the two inhibitors as the mass amount of anti-s protein antibody in the serum is not known. to confirm the results, we analyzed the infected cells by flow cytometry to determine the number of gfp+ cells. the inhibition curves were similar to the luciferase curves, confirming that the ace proteins had decreased the number of cells infected and did not simply reduce expression of the reporter protein ( figure b, top) . representative images of the gfp+ cells provide visual confirmation of the results ( figure b, below) . the inhibitory activity of the soluble ace proteins was specific for the sars-cov- s protein as they did not inhibit vsv-g pseudotyped virus ( figure c ). the ace microbody was somewhat more active when tested on untransfected t that express low levels of ace ( figure d ). to determine the ability of the ace microbody proteins to block the replication of live sars-cov- , we used the replication-competent sars-cov- , icsars-cov- mng that encodes an mneongreen reporter gene in orf (xie et al., ) . serially diluted ace microbody proteins were incubated with the virus and the mixture was then used to infect ace . t cells. the results showed that - . µg of ace microbody protein blocked live virus replication ( figure e ). soluble ace was less active; µg of the protein had a % antiviral effect and the activity was lost with . µg. the antiviral activity of ace proteins against live virus was similar to pseudotyped virus, except that in the live virus assay, the wild-type and h a microbodies were of similar potency. in the experiments described above, the proteins were incubated with virus prior to infection. 
to determine whether they would be active when added at later time points, the ace proteins were tested in an "escape from inhibition" assay in which the soluble ace and ace microbody proteins were added to cells at the same time as virus or up to hours post-infection. the results showed that addition of the microbody together with the virus (t ) blocked the infection by %. addition of the microbody minutes postinfection maintained most of the antiviral effect, and even hours post-infection the inhibitor blocked % of the infection. at hours post-infection, the ace microbody retained its blocking activity at µg/ml but was less active with decreasing amounts of inhibitor ( figure a) . these results suggest that the ace microbody is highly efficient at neutralizing the virus when present before the virus has had a chance to bind to the cell and that it maintains its ability to block infection when added together with the cells and even hours after the virus has been exposed to cells, a time at which most of the virus has not yet bound to the cell. to determine whether the ace microbody could prevent virus entry once the virus bound to the cell, the virus was prebound by incubating it with cells for hour at °c, the unbound virus was removed and the ace microbody was added at increasing time points. the results showed that removal of the unbound virus after hour incubation resulted in less infection as compared to when the virus was incubated with the cells for hours, indicating that only a fraction of the virus had bound to cells. however, virus that was bound could be blocked by the ace microbody for another minutes post-binding ( figure b) . the ability to block entry of the cell-bound virus suggests that virus binding results from a small number of spike molecules binding to ace . over the next - minutes, additional spike:ace interactions form, escaping the ability of the ace microbody to block virus entry.
the results demonstrate that the ace microbody is a highly potent inhibitor of free virus and maintains its antiviral activity against virus newly bound to the cell. sars-cov- containing a d g point mutation in the s protein has been found to be circulating in the human population with increasing prevalence (daniloski et al., ; eaaswarkhanth et al., ; korber et al., ; zhang et al., ) . the d g mutation was found to decrease shedding of the spike protein from the virus and to cause the protein to assume a fusion-ready conformation, resulting in increased infectivity and most likely contributing to its increasing prevalence. to determine the ability of the soluble ace proteins to block entry of virus with the d g s protein, we introduced the mutation into the Δ s protein expression vector and generated pseudotyped reporter viruses ( figure a ). analysis of the infectivity of the d g and wild-type pseudotyped viruses on the panel of cell-lines showed that the mutation increased the infectivity of virus - fold on t, ace . t, vero and veroe cells, consistent with previous reports (daniloski et al., ; yurkovetskiy et al., ; zhang et al., ) . infectivity of the mutated virus was also increased in a , ace .a caco although the overall infectivity of these cells was low ( figure b). to determine the ability of the soluble ace proteins to neutralize the virus with the variant s protein, serial dilutions of the soluble ace proteins were tested for their ability to block wild-type and d g s pseudotyped virus. the results showed that soluble ace had moderate antiviral activity against wild-type virus, while the wild-type and h a microbody proteins were more potent ( figure c ). the ace .h a microbody was somewhat more active at low concentrations than the wild-type protein. to test the relative binding affinity of the soluble ace proteins for wild-type and d g mutated spike, we tested the pseudotyped virions in the ace -virus binding assay ( figure d) . 
the results showed that virus with the d g s bound efficiently to soluble ace . the results demonstrate the broad activity of the ace microbody. we report the development of a soluble form of ace in which the ectodomain of ace is fused to a single domain of the igg heavy chain fc. the domain renders the protein smaller than those fused to the full-length fc yet retains the cysteine residues required for dimerization and the ability to increase the in vivo half-life (maute et al., ) . the microbody protein was shown to be a disulfide-bonded dimer, in contrast to soluble ace lacking the fc domain, which was dimeric but not disulfide-bonded. although both proteins are dimeric, the ace microbody had about -fold more antiviral activity than soluble ace and bound to virions with a > -fold increased affinity. while high affinity anti-spike rbd monoclonal antibodies that potently inhibit sars-cov- infection will be of great value in the treatment of covid- , the soluble receptor proteins have advantageous features. the ace microbody is of fully human origin so should be relatively non-immunogenic. in addition, it is expected to be broadly active against mutated variant spike proteins that may arise in the human population. the microbody was fully active against virus with the d g s protein, a variant of increasing prevalence with increased infectivity in vitro ( figure ) and was highly active against ace -specific s proteins from other b coronaviruses. it has been previously shown that a recombinant ace -fc fusion had a major effect on blood pressure in a mouse model (liu et al., ) . it was therefore important to inactivate ace carboxypeptidase activity in the microbody to decrease unwanted effects on blood pressure associated with its therapeutic use. the h a mutation alters one of the histidines essential for ace catalytic activity yet did not impair antiviral activity against sars cov- or other b coronavirus spike proteins. 
in some of our analyses, the ace .h a microbody appeared to be more active than the wild-type protein although the significance of this difference was unclear as the two proteins had similar activity in the live virus replication assay. escape from inhibition studies provided insight into the kinetics of virus infection and the mechanism of inhibition by the soluble receptors. pretreatment of virus with the ace microbody potently neutralized the virus, as did simultaneous addition of virus and microbody to cells. furthermore, the protein retained its ability to prevent infection even when added to the culture at times after addition of virus, blocking infection by about % when added hour after virus addition. the ace microbody was partially active even on virus that had already attached to the cell. when virus was pre-bound for hours, a time at which about % of the infectious virus had bound the cell, the ace microbody retained the ability to prevent infection of about % of the bound virus ( figure a ). taken together, the experiments suggest a series of events in which the virus binds to cells over a period of about hours. during this time, the ace microbody is highly efficient, neutralizing nearly all of the free virus. once the virus binds to the cell, the ace microbody retains its ability to block infection for about min, suggesting that binding is initially mediated by a small number of s proteins and that over hours, additional s proteins are recruited to interact with target cell ace , a period during which the ace microbody remains able to block the viral fusion reaction. once a sufficient number of s protein:ace interactions have formed, the virus escapes neutralization. it was surprising that the ace microbody had more antiviral activity than soluble ace as both proteins are dimeric. in addition, the ace microbody protein showed somewhat better binding to virions than soluble ace . the reasons for these differences are not clear. 
it is possible that the disulfide bonds of the ace microbody stabilize the dimer or that they position the individual monomers in a more favorable conformation to bind to the individual subunits of the s protein trimer. it is worth noting that in most of the experiments, we used ace . t cells that overexpress ace compared to the other cell-lines tested. on untransfected t cells that express barely detectable levels of ace , the antiviral activity of the microbody protein was increased, suggesting that the antiviral activity of the ace microbody may be under-estimated by the use of ace overexpressing cells. recent reports have described similar soluble ace proteins. soluble ace -related inhibitors including rhace were shown to partially block infection (case et al., ; lei et al., ; monteil et al., ) although the proteins had a short half-life (wysocki et al., ) (< hours in mice), limiting their clinical usefulness. in contrast, a dimeric rhesus ace -fc fusion protein had a half-life in mouse plasma greater than week (liu et al., ) . the half-life of the ace microbody in vivo has not yet been tested, but the protein retained antiviral activity for several days in tissue culture, significantly longer than soluble ace ( figure s ). the phenomenon of antibody-dependent enhancement is caused by the interaction of the fc domain of non-neutralizing antibody with the fc receptor on cells, which then serves to promote rather than inhibit virus infection. a similar phenomenon is possible with receptor-fc fusion proteins by interaction with fc receptor on cells. because the ace microbody contained only a single fc domain, it was not expected to interact with fc receptor. to test whether this was the case, we tested the ace microbody in an enhancement assay using u cells which express fc receptors. 
the ace microbody protein did not detectably bind to cells that express the fcγ receptor and the cells did not become infected, suggesting that this mechanism is not likely to play a role in vivo ( figure s and not shown). pseudotyped viruses have been highly useful for studies of sars-cov- entry. vectors for producing sars-cov- lentiviral pseudotypes have been developed by several laboratories (crawford et al., ; nie et al., ; ou et al., ; schmidt et al., ; shang et al., ; xia et al., ) . the vectors we report here produce pseudotyped lentiviral viruses with very high infectivity. the high infectivity of the pseudotypes produced is due in part to efficient expression of a codon-optimized Δ s protein and the efficient virion incorporation that results from the cytoplasmic tail truncation. the Δ s protein was present at only slightly higher levels on the cell surface than the full-length protein, suggesting that this small increase does not fully account for the large increase in virion incorporation. a possible explanation is that the full-length cytoplasmic tail sterically hinders virion incorporation by conflicting with the underlying viral matrix protein and that the deletion removes the conflict. also contributing to high viral titers is the use of a separate gag/pol packaging vector and lentiviral transfer vector as opposed to a lentiviral proviral dna encoding gag/pol and the reporter gene, a strategy that resulted in higher reporter gene expression as shown in a direct comparison (not shown). moreover, the dual luciferase/gfp reporter allows for measurement of infectious virus titer by flow cytometry and the high sensitivity of nanoluciferase read-out. the lentiviral pseudotypes are highly useful for rapidly titering neutralizing antibody in patient serum. in a study of over sera from recovering patients, we found the pseudotype assay results to be highly correlated with those of a live virus neutralization assay (submitted). 
a feature of soluble receptors is that because the virus spike protein needs to conserve receptor binding affinity to maintain transmissibility, they should maintain their ability to neutralize s protein variants. sars-cov- s variants have been found to be circulating in the human population and it is likely that others are yet to emerge, some of which may be less sensitive to neutralization by the therapeutic monoclonal antibodies currently under development. the recently identified sars-cov- variant encoding the d g s protein has been found to be spreading with increased frequency in the human population (daniloski et al., ; eaaswarkhanth et al., ; korber et al., ; zhang et al., ) . the d g s protein was found to be more resistant to shedding from the virion and to adopt a conformation that favors ace -binding and is in a more fusion-competent state (yurkovetskiy et al., ; zhang et al., ) . we confirmed the increased infectivity of virions and found that the d g s protein has a higher affinity for ace as measured in a virion binding assay. nevertheless, the ace microbody maintained its ability to neutralize d g s protein pseudotyped virus. the ability of the ace microbody to neutralize diverse b coronaviruses suggests that it may also be able to neutralize novel ace -using coronaviruses that may be transferred to the human population in the future. the microbody protein could serve as an off-the-shelf reagent that could be rapidly deployed. the dual gfp/nanoluciferase lentiviral vector plenti.gfp.nluc was generated by overlap extension pcr. a dna fragment encoding gfp was amplified with a forward primer containing a bamh-i site and a reverse primer encoding the p a sequence. the nanoluciferase gene (nluc) was amplified with a forward primer encoding the p a motif and a reverse primer containing a '-sal-i site. the amplicons were mixed and amplified with the external primers. 
the fused amplicon was cleaved with bamh-i and sal-i and cloned into plenti.cmv.gfp.puro (addgene plasmid # , provided by eric campeau and paul kaufman) (campeau et al., ). the sars-cov- s expression vector pccov .s was chemically synthesized as dna fragments a and b encoding codon-optimized ' and ' halves, respectively, of the s gene of wuhan-hu- / sars-cov- isolate (table s and the amplicon was then cloned into the kpn-i and xho-i sites of pcdna . the ace microbody expression vector pcace -microbody was generated by overlap extension pcr that fused the extracellular domain of ace with human immunoglobulin g heavy chain fc domain using a forward primer containing a kpn-i site and reverse primer containing an xhis-tag and xho- site. the amplicon was cloned into the kpn-i and xho-i sites of pcdna . expression vector pcace .h a-microbody that expressed the ace .h a microbody was generated by overlap extension pcr using primers that overlapped the mutation. full-length cdna sequence, primer sequences and amino acid sequences are shown in tables s - . control and recovered patient sera were collected from patients through the nyu vaccine center with written consent under i.r.b. approval (irb - and irb - ) and were deidentified. vero e , caco , a , ace a , bhk, huh t, vero and chme cells were sars-cov- s protein pseudotyped lentiviral stocks were produced by cotransfecting virus stocks were titered on t by flow cytometry and for luciferase activity. the p concentration was measured and the virus was used at a concentration of . µg/ml. to test the inhibitory activity of soluble receptors and convalescent sera, µl serially diluted inhibitor or convalescent patient serum was incubated for min at room temperature with µl pseudotyped reporter virus (approximately x cps luciferase activity/µl) at a moi of . in a volume of µl. the mixture was added to ace . t cells in a well tissue culture dish containing x cells/well. 
after days, the culture medium was removed and µls nano-glo luciferase substrate (promega) and µls medium was added to each well. the supernatant ( µls) was transferred to a microtiter plate and the luminescence was read in an envision microplate luminometer (perkinelmer). alternatively, the gfp+ cells were quantified by flow cytometry with pacific blue viability dye to exclude dead cells (biolegend). f cells (thermo fisher) at a density of . x cells/ml were transfected with microbody expression vector plasmid dna using polyethyleneimine (polysciences, inc) at a : plasmid:pei ratio. the cells were then cultured at °c and at hours posttransfection mm sodium butyrate was added. after days, the supernatant culture medium was collected, filtered and adjusted ph to . . the medium was passed over a the absolute molecular masses of the purified protein complexes were determined by sec/mals. the proteins were injected onto a superdex / gl gel-filtration chromatography column equilibrated in sample buffer that was connected to a dawn heleos ii -angle light-scattering detector (wyatt technology), a dynamic lightscattering detector (dynapro nanostar; wyatt technology) and an optilab t-rex refractive index detector (wyatt technology). the data were collected at °c at a flow rate of . ml/minute every second. the molecular mass of each protein was determined by analysis with astra software. t cells were transfected by lipofection with µg pcace -microbody. at hours posttransduction, . ml of culture supernatant was incubated with nickel-nitrilotriacetic acidagarose beads (qiagen). the beads were washed, and bound protein was eluted with laemmle loading buffer. the proteins were analyzed on an immunoblot probed with mouse anti- xhis antibody (invitrogen) and horseradish peroxidase (hrp)-conjugated goat anti-mouse igg secondary antibody (sigma-aldrich). 
the proteins were visualized using luminescent substrate and scanned on a li-cor biosciences fc imaging system (li-cor biotechnology). ratios were calculated as the his (spike) signal intensity divided by the p signal intensity for an identical exposure of the blot. transfected cells were lysed in buffer containing mm hepes, mm kcl, mm edta, . % np- , and protease inhibitor cocktail. protein concentration in the lysates was measured by bicinchoninic protein assay and the lysates ( µg) were separated by sds-page. the proteins were transferred to polyvinylidene difluoride membranes and probed with anti-ha mab (covance), mouse anti-his mab (invitrogen) and anti-gapdh mab (life technologies) followed by goat anti-mouse hrp-conjugated second antibody (sigma). the blots were visualized using luminescent substrate (millipore) on a li-cor bio-sciences fc imaging system. soluble ace proteins ( µg) were mixed with µl nickel beads for hour at °c. unbound protein was removed by washing the beads with pbs. the beads were resuspended in pbs and mixed with µl pseudotyped lentiviral virions after h incubation at °c, the beads were washed with pbs and resuspended in reducing laemmli loading buffer and heated to °c. the eluted proteins were separated by sds-page and analyzed on an immunoblot probed with anti-p antibody (ag . ) followed by goat anti-mouse hrp-conjugated second antibody. mneongreen sars-cov- (xie et al., cell host and microbe ) was obtained from the world reference center for emerging viruses and arboviruses at the university of texas medical branch. the virus was passaged once on vero e cells (atcc crl- ), clarified by low-speed centrifugation, aliquoted, and stored at - °c. the infectious virus titer was determined by plaque assay on vero e cells after staining with crystal violet. virus neutralization was determined as previously described (xie et al, biorxiv ) . ace . t cells were seeded in a -well plate ( x /well). the next day, mneongreen sars-cov- (moi = . 
) was mixed : with serially -fold diluted soluble ace protein in dmem/ % fbs and incubated for hour at °c. the virus:protein mixture was then added to the ace cells and incubated for hours. at °c in % co . the cells were fixed with % paraformaldehyde, stained with dapi and the mneongreen+ cells were counted on a cellinsight cx platform high content microscope (thermo fisher). all experiments were performed in technical duplicates or triplicates and data were analyzed using graphpad prism (version . e). statistical significance was determined by the two-tailed, unpaired t test. significance was based on two-sided testing and attributed to p< . . confidence intervals are shown as the mean ± sd or sem. (*p≤ . , **p≤ . , ***p≤ . , ****p≤ . ). mechanisms of coronavirus cell entry mediated by the viral spike protein a versatile viral system for expression and depletion of proteins in mammalian cells neutralizing antibody and soluble ace inhibition of a replication-competent vsv-sars-cov- and a clinical isolate of sars enhanced recognition and neutralization of hiv- by antibody-derived ccr -mimetic peptide variants protocol and reagents for pseudotyping lentiviral particles with sars-cov- spike protein for neutralization assays the d g mutation in sars-cov spike increases transduction of multiple human cell types. biorxiv could the d g substitution in the sars-cov- spike (s) protein be associated with higher covid- mortality? 
implications of antibody-dependent enhancement of infection for sars-cov- countermeasures coronaviruses: an overview of their replication and pathogenesis retroviral vectors pseudotyped with severe acute respiratory syndrome coronavirus s protein evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response identification of critical active-site residues in angiotensin-converting enzyme- (ace ) by site-directed mutagenesis soluble cd and cd -mimetic compounds inhibit hiv- infection by induction of a short-lived activated state quantitative mrna expression profiling of ace , a novel homologue of angiotensin converting enzyme pharmacokinetics and pharmacodynamics of recombinant human angiotensin-converting enzyme in healthy human subjects ready, set, fuse! the coronavirus spike protein and acquisition of fusion competence sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor a pilot clinical trial of recombinant human angiotensin-converting enzyme in acute respiratory distress syndrome tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus a novel coronavirus associated with severe acute respiratory syndrome trilogy of ace : a peptidase in the renin-angiotensin system, a sars receptor, and a partner for amino acid transporters a crucial role of angiotensin converting enzyme (ace ) in sars coronavirus-induced lung injury structure of the sars-cov- spike receptor-binding domain bound to the ace receptor neutralization of sars-cov- spike pseudotyped virus by recombinant ace -ig functional assessment of cell entry and receptor usage for lineage b beta-coronaviruses enteric involvement of severe acute respiratory syndrome-associated coronavirus infection receptor recognition mechanisms of coronaviruses: a decade of structural studies structure of sars coronavirus spike 
receptor-binding domain complexed with receptor insights from the association of sars-cov sprotein with its receptor, ace angiotensin-converting enzyme is a functional receptor for the sars coronavirus novel ace -fc chimeric fusion provides long-lasting hypertension control and organ protection in mouse models of systemic renin angiotensin system activation efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss engineering high-affinity pd- variants for optimized immunotherapy and immuno-pet imaging infections in engineered human tissues using clinical-grade soluble human ace establishment and validation of a pseudovirus neutralization assay for sars-cov- two mechanisms of soluble cd (scd )-mediated inhibition of human immunodeficiency virus type (hiv- ) infectivity and their relation to primary hiv- isolates with reduced sensitivity to scd characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov ucsf chimera--a visualization system for exploratory research and analysis ig-like ace protein therapeutics: a revival in development during the covid- pandemic angiotensin-i-converting enzyme and its relatives effects of soluble cd on simian immunodeficiency virus infection of cd -positive and cd -negative cells measuring sars-cov- neutralizing antibody activity using pseudotyped and chimeric viruses cell entry mechanisms of sars-cov- a transmembrane serine protease is linked to the severe acute respiratory syndrome coronavirus receptor and activates virus entry determinants of human immunodeficiency virus type envelope glycoprotein activation by soluble cd and monoclonal antibodies angiotensin-converting enzyme (ace ) is a key modulator of the renin angiotensin system in health and disease targeting the degradation of angiotensin ii with recombinant angiotensin-converting enzyme : prevention of angiotensin ii-dependent hypertension inhibition of 
sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion an infectious cdna clone of sars-cov- sars-cov- spike protein variant d g increases infectivity and retains sensitivity to antibodies that target the receptor binding domain the d g mutation in the sars-cov- spike protein reduces we thank lin xinhua and hanna nazeeh (nyulh) and benjamin tenoever for cell-lines ace -mb h a-mb sace a. no serum no serum **** **** **** **** **** tada et al. atgtcaagctcttcctggctccttctcagccttgttgctgtaactgctgctcagtccaccattgaggaacaggccaagacatttt tggacaagtttaaccacgaagccgaagacctgttctatcaaagttcacttgcttcttggaattataacaccaatattactgaaga gaatgtccaaaacatgaataatgctggggacaaatggtctgcctttttaaaggaacagtccacacttgcccaaatgtatccacta caagaaattcagaatctcacagtcaagcttcagctgcaggctcttcagcaaaatgggtcttcagtgctctcagaagacaagagc aaacggttgaacacaattctaaatacaatgagcaccatctacagtactggaaaagtttgtaacccagataatccacaagaatgct tattacttgaaccaggtttgaatgaaataatggcaaacagtttagactacaatgagaggctctgggcttgggaaagctggagat ctgaggtcggcaagcagctgaggccattatatgaagagtatgtggtcttgaaaaatgagatggcaagagcaaatcattatgagg actatggggattattggagaggagactatgaagtaaatggggtagatggctatgactacagccgcggccagttgattgaagat gtggaacatacctttgaagagattaaaccattatatgaacatcttcatgcctatgtgagggcaaagttgatgaatgcctatcctt cctatatcagtccaattggatgcctccctgctcatttgcttggtgatatgtggggtagattttggacaaatctgtactctttgac agttccctttggacagaaaccaaacatagatgttactgatgcaatggtggaccaggcctgggatgcacagagaatattcaagga ggccgagaagttctttgtatctgttggtcttcctaatatgactcaaggattctgggaaaattccatgctaacggacccaggaaat gttcagaaagcagtctgccatcccacagcttgggacctggggaagggcgacttcaggatccttatgtgcacaaaggtgacaat ggacgacttcctgacagctcatcatgagatggggcatatccagtatgatatggcatatgctgcacaaccttttctgctaagaaa tggagctaatgaaggattccatgaagctgttggggaaatcatgtcactttctgcagccacacctaagcatttaaaatccattggt cttctgtcacccgattttcaagaagacaatgaaacagaaataaacttcctgctcaaacaagcactcacgattgttgggactctgc 
catttacttacatgttagagaagtggaggtggatggtctttaaaggggaaattcccaaagaccagtggatgaaaaagtggtggg agatgaagcgagagatagttggggtggtggaacctgtgccccatgatgaaacatactgtgaccccgcatctctgttccatgttt ctaatgattactcattcattcgatattacacaaggaccctttaccaattccagtttcaagaagcactttgtcaagcagctaaacat gaaggccctctgcacaaatgtgacatctcaaactctacagaagctggacagaaactgttcaatatgctgaggcttggaaaatca gaaccctggaccctagcattggaaaatgttgtaggagcaaagaacatgaatgtaaggccactgctcaactactttgagccctta tttacctggctgaaagaccagaacaagaattcttttgtgggatggagtaccgactggagtccatatgcagaccaaagcatcaaa gtgaggataagcctaaaatcagctcttggagataaagcatatgaatggaacgacaatgaaatgtacctgttccgatcatctgttg catatgctatgaggcagtactttttaaaagtaaaaaatcagatgattctttttggggaggaggatgtgcgagtggctaatttgaa accaagaatctcctttaatttctttgtcactgcacctaaaaatgtgtctgatatcattcctagaactgaagttgaaaaggccatca ggatgtcccggagccgtatcaatgatgctttccgtctgaatgacaacagcctagagtttctggggatacagccaacacttggac ctcctaaccagccccctgtttccatatggctgattgtttttggagttgtgatgggagtgatagtggttggcattgtcatcctgat cttcactgggatcagagatcggaagaagaaaaataaagcaagaagtggagaaaatccttatgcctccatcgatattagcaaagg agaaaataatccaggattccaaaacactgatgatgttcagacctccttttag key: cord- -pabh ajb authors: de lussanet de la sablonière, marc h. e. title: robust, general purpose, digital power line hum filter which is free of deformations and which can be applied to large transients date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: pabh ajb power line interference (“hum noise”) is a common source of noise in recorded biological data. it has a highly constant frequency with harmonics and little variation in amplitude and wave shape. in contrast to stochastic noise it is phasic, i.e., the phase relation remains constant across long intervals. in digital recordings, the measurement frequency is typically very close to a multiple of the hum frequency ( or hz). common filters introduce various kinds of distortions and errors. the here proposed subtraction method makes use of the specific properties of power line hum. 
for this, it computes a moving estimate of the hum noise by taking the periodic median of the high-pass filtered signal. the resulting periodic median subtraction (pms) filter reliably removes hum of any harmonic composition, even if the ground frequency is lacking. the filter is completely free of border artifacts. it does not introduce distortions, even around sharp transients such as in force plate recordings of jumping. the filter is validated on recorded and artificial data. the errors are quantified. the results also show that the errors of hum filters generally increase with the high-frequency content of the data. thus, the removal of hum from a force plate recording generally gives better results than from an emg recording. compared to other filters, the errors of the pms filter are generally lower than those of the best hum filters currently known. digital data recordings, such as eeg, emg, ekg, kinetics (force plates) etc., are all prone to power line noise. power line noise is a common and old problem and various filters have been designed for such noise. even though power line noise is mostly highly regular in frequency, amplitude and wave form, it has proven notoriously difficult to design a filter that is free of artifacts and deformations and that leaves the original signal intact. moreover, many filters focus on a single narrow frequency band, but power line noise typically has strong harmonic frequencies as well. hum problems can be solved by avoiding them. if they cannot be avoided, analog or digital filters can be applied. thanks to modern computational power and digital storage capacity, analog filters are not common anymore. there are two dominant approaches to design computational filters, i.e., spectral filters in the frequency domain and subtraction filters in the time domain (for reviews see baratta et al., ; daniel and neagu, ; thalkar and upasani, ; blri, ) . 
frequency-based filters aim to eliminate the main frequency (and often also the harmonics) from the noisy signal. a serious problem is that such filters introduce distortion artifacts: they not only remove the hum noise, but also change the shape of the signal. these artifacts are known as gibbs rippling and as phase distortions. gibbs rippling is especially problematic in the vicinity of transients in the signal, which are especially prominent in force plate measurements (kinetics). many filters perform badly at the beginning and end of the signal (edge effects), and are thus not well suited for relatively short recordings. frequency-based methods include moving-average techniques and iir notch filters (kaur aneja and singh, ) , spectrum interpolation (leske and dalal, ), multi-taper decomposition (mitra and bokil, , for multichannel techniques such as eeg) and fourier decomposition (singh et al., ) . subtraction-type filters aim to fit the shape and amplitude of the hum noise directly from the data, sometimes from multiple channels (e.g., in eeg signals) or from a reference channel (e.g. wan et al., ; lin et al., ) . a major advantage of time-domain filters over frequency-based ones is that they can exploit the highly regular nature of hum noise. typically, the recording frequency is a multiple of the power line frequency, so that each period of samples of hum noise is an almost exact repeat of the previous and following periods. the periodic shape of the hum noise can be estimated using wavelet-independent component analysis (akwei-sekyere, ; kaushal et al., ; oliveira et al., ) , eigenvalue decomposition (sharma and pachori, ) , robust active noise control (lin et al., ) , least mean squares (wan et al., ), or recurrent neural networks (qiu et al., ) . a very simple but remarkably powerful method is provided by computing the mean period of the hum noise (levkov et al., ) . 
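The periodic-mean idea mentioned above can be illustrated with a minimal sketch. It is not the published implementation of any cited method: the function names `periodic_mean_template` and `subtract_template`, the 1000 Hz sampling rate, the 50 Hz hum and the test signal are all assumptions chosen for illustration. Because the sampling rate is an exact multiple of the hum frequency, every 20th sample has the same hum phase, so averaging samples of equal phase recovers one period of the hum.

```python
import math
from statistics import fmean


def periodic_mean_template(signal, period):
    """Estimate one hum period as the mean over all samples of equal phase."""
    template = [fmean(signal[phase::period]) for phase in range(period)]
    offset = fmean(template)  # keep only the periodic part, not the baseline
    return [t - offset for t in template]


def subtract_template(signal, template):
    """Subtract the repeated one-period template from the signal."""
    period = len(template)
    return [x - template[i % period] for i, x in enumerate(signal)]


# Assumed example values: 1000 Hz sampling, 50 Hz hum with a third harmonic.
fs, f_hum, n = 1000, 50, 2000
period = fs // f_hum  # 20 samples per hum period
slow = [math.sin(2 * math.pi * 1.0 * i / fs) for i in range(n)]
hum = [0.5 * math.sin(2 * math.pi * f_hum * i / fs)
       + 0.2 * math.sin(2 * math.pi * 3 * f_hum * i / fs) for i in range(n)]
noisy = [s + h for s, h in zip(slow, hum)]

cleaned = subtract_template(noisy, periodic_mean_template(noisy, period))
residual = max(abs(c - s) for c, s in zip(cleaned, slow))
```

In this idealised case the slow component spans whole cycles and averages out, so `residual` is at the level of floating-point error. With transients or trends in the data the mean becomes biased, which is the motivation for the median described next.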
the goal of the present work is to find a distortion-free hum-filter which is robust to transients in the data and which can handle even short time series. with respect to levkov et al. ( ) , the method is substantially improved by using the median rather than the mean values for creating the filter, and by introducing a moving window. the periodic median is calculated from the high-pass-filtered data and is thus robust to transients. this new periodic median subtraction (pms) filter is tested on artificial and experimental data. as a reference, the current best performing frequency-based hum filter is used (singh et al., ) . figure illustrates the creation of the filter schematically. the data are high-pass filtered using an th-order zero-phase digital butterworth filter (matlab functions butter and filtfilt). the cut-off frequency optimally is somewhat below half the power line frequency (i.e., hz cut-off for hz power line frequency), to ensure that the high-pass filter does not deform the noise components of the data. then the periodicity is established as the ratio of the measurement frequency and the power line frequency. if this is not a whole number, the smallest multiple of the period that constitutes a whole number is to be taken. the shape of the power line noise for each sample is then estimated as the median of the samples that lie a whole multiple of the noise period away from the current sample. recorded signals usually show fluctuations in amplitude (and/or frequency, see footnote ) and spectral shape. performance is much improved when the filter is designed as a moving window. the thus retrieved time series (i.e., the filter) is subtracted from the raw data. we implemented two cases, which are available as a matlab function (de lussanet, ). in the first case, the window is of the length of the data (or longer), and the array of data is reshaped to a matrix of size n periods × n humperiod . 
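The whole-record case described above can be sketched as follows. This is a hedged illustration, not the author's MATLAB function: the names `hum_period_samples`, `detrend`, `periodic_template` and `pms_full` are hypothetical, the 1000 Hz / 50 Hz values and the step height are assumed, and a centred moving average of one hum period stands in for the zero-phase Butterworth high-pass so that the example needs only the standard library.

```python
import math
from fractions import Fraction
from statistics import fmean, median


def hum_period_samples(fs, f_hum):
    """Smallest whole number of samples spanning a whole number of hum cycles."""
    return (Fraction(fs) / Fraction(f_hum)).numerator


def detrend(signal, width):
    """Stand-in high-pass (assumption): subtract a centred moving average.

    A window of exactly one hum period averages the hum to zero, so the hum
    passes unchanged while offsets and slow trends are removed.
    """
    n, half = len(signal), width // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i - half + width)
        out.append(signal[i] - fmean(signal[lo:hi]))
    return out


def periodic_template(hp, period, reducer):
    """One hum period: reduce (median or mean) over all samples of equal phase."""
    template = [reducer(hp[phase::period]) for phase in range(period)]
    offset = fmean(template)  # correct small offsets by subtracting the mean
    return [t - offset for t in template]


def pms_full(signal, period, reducer=median):
    """Estimate the hum from the detrended data and subtract it from the raw data."""
    template = periodic_template(detrend(signal, period), period, reducer)
    return [x - template[i % period] for i, x in enumerate(signal)]


# Assumed test case: a step transient plus harmonic-rich hum.
fs, f_hum, n = 1000, 50, 2000
period = hum_period_samples(fs, f_hum)
step = [0.0 if i < 1000 else 10.0 for i in range(n)]
hum = [0.5 * math.sin(2 * math.pi * f_hum * i / fs)
       + 0.2 * math.sin(2 * math.pi * 3 * f_hum * i / fs) for i in range(n)]
noisy = [s + h for s, h in zip(step, hum)]

err_median = max(abs(c - s) for c, s in zip(pms_full(noisy, period), step))
err_mean = max(abs(c - s) for c, s in zip(pms_full(noisy, period, fmean), step))
```

The step disturbs the detrended signal in only about one period around the transient. Those few outliers per phase shift the periodic mean but leave the periodic median essentially untouched, so `err_median` stays at rounding level while `err_mean` does not. For a sampling rate that is not a whole multiple of the hum frequency, `hum_period_samples(1000, 60)` gives 50 samples, which span three hum cycles.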
for the case of a moving window, an array is created that repeats the data n times, with n the width of the window, expressed in periods of the hum. this array is then reshaped into a matrix, which is one period longer than the data. by this reshaping, each consecutive row of data is shifted by one period with respect to the previous one.

fig. schema of the periodic median subtraction (pms) filter. the raw data contain a pattern of additive hum noise of n periods (top row). for construction of the filter, the offset and slower trends are removed by high-pass filtering. the thus filtered signal is repeated into a matrix, of which each row is shifted by one hum period. the number of rows is the width of the filter window, expressed in hum periods. for each column the median is calculated. the first and last periods are repeated at the beginning and end of the filter respectively. the thus obtained filter is subtracted from the data.

the filter is calculated as the median across the columns. the resulting array of the length of the hum period is repeated to be at least as long as the data. small offsets are corrected for by subtracting the mean. in the case of a moving window, the beginning and end of the filter matrix overlap the beginning and end of the data series. therefore these parts of the filter are replaced by repeats of the first and last valid period of the filter respectively (fig. ). the method was validated using simulated and real data. the simulated data was a sequence of white noise, s, hz and an amplitude of . in some simulations the simulated data were low-pass filtered at hz (bidirectional st order butterworth). the simulated additive noise was a wave-shaped pattern with rich harmonics. it had a period of samples ( hz) and an amplitude of . in some simulations, other amplitudes (signal-to-noise-ratio, snr) and other periodic patterns were tested (sinusoidal, periodic random sequence, and periodic pulse).
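the moving-window construction described earlier in this section can be sketched as follows. this is a hedged python illustration, not the published matlab function: the name moving_periodic_median is hypothetical, the window is assumed to be centred on the current period and odd in width, the input is assumed to be already high-pass filtered, and the offset correction by mean subtraction is omitted. instead of building the repeat-and-reshape matrix, the sketch indexes the same samples directly.

```python
from statistics import median


def moving_periodic_median(highpassed, period, window):
    """Hum estimate from a per-phase median over `window` neighbouring periods."""
    if window % 2 == 0:
        raise ValueError("this sketch assumes an odd window width")
    n_periods = len(highpassed) // period
    if n_periods < window:
        raise ValueError("data must span at least `window` hum periods")
    half = window // 2
    first, last = half, n_periods - half - 1  # periods with a full window

    def one_period(p):
        # For each phase k, take the median over periods p-half .. p+half.
        return [
            median(highpassed[(p + j) * period + k] for j in range(-half, half + 1))
            for k in range(period)
        ]

    valid = {p: one_period(p) for p in range(first, last + 1)}
    hum = []
    for p in range(n_periods):
        # Near the edges the window would run past the data, so the first
        # or last valid period of the filter is repeated instead.
        hum.extend(valid[min(max(p, first), last)])
    # Samples after the last complete period reuse the last valid period.
    tail = len(highpassed) - len(hum)
    hum.extend(valid[last][:tail])
    return hum
```

subtracting the returned list from the raw data, sample by sample, gives the filtered signal. a wider window averages over more periods and is therefore less sensitive to high-frequency content, at the cost of following changes in the hum more slowly.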
for testing purposes, a step of was added to the data at s. during the interval - s, the noise amplitude was reduced from to . the experimental data involved jumping on a force plate and emg recordings using two different systems. the jump was recorded using kistler force plates (type ca), connected via a kistler daq (type a ) to the recording computer. the data were sampled at a rate of hz. the daq had an external power adapter, whereas the computer had an internal power adapter, so the housing was connected to the earthing of the v ac plug. as the housings of the computer and the daq were not electrically connected, a considerable hz hum with some harmonics was present in the recorded data (amplitude ∼ n). the recorded drop jump consisted of two loaded phases and two non-loaded phases, with sudden, transient transitions between these phases. one bipolar emg recording was made using a wired system (tom erfassung, gjb datentechnik gmbh, germany) with local amplifiers close to the surface electrodes (disposable ag/ag-cl electrodes type h sg, covidien, neustadt, germany). the electrodes were applied to the prepared skin over the right trapezius neck muscle and the reference electrode was placed over the c vertebra. the recording was made in a free-driving vehicle, with the power taken from the on-board battery. the battery's dc current was transformed to v hz ac using a standard dc-ac power transformer. the transformer caused a strong hum in the emg signal, which was mainly composed of the odd harmonics. the other emg recording was made of the right extensor digitorum longus muscle, using a wireless bipolar recording (noraxon telemyo dts) and disposable surface electrodes. the recording was made at hz with the standard online hz low-pass filter applied. the signal was completely hum-free. an artificial -hz harmonic-rich hum was added to the recorded data, but various other kinds of hum spectra were also tested.
the pms subtraction filter was applied to the raw data with hum in all cases. the standard window of hum periods ( . s) was applied. as a reference, we also applied the fdm filter (the fourier decomposition method). the emg data were then rectified, and smoothed using a low-pass nd-order butterworth filter (the order was effectively doubled by applying the filter bidirectionally to prevent phase shifts). the results of the simulations are summarized in figure . the original data consisted of white noise with amplitude ; the filter was applied to these data with added hum with an amplitude of times the signal. the absolute error in terms of the amplitude of the white noise "signal" was . / . / . / . (mean/median/max/st.dev.) for the new pms filter. as a comparison, the "gold standard", the fdm filter (singh et al., ), gave . / . / . / . , which is worse on each of these measures. panel c shows how well the filtered signal (red curve) represents the original data (green). panel a of figure shows the smoothed absolute error for ten simulations in both filters. it illustrates how the pms filter is free of onset deformations and has a very low error for the stationary period. the enlargement in panel b shows the influence of the step in the data and the sudden change in noise amplitude. note that the change of noise amplitude (from to ) was quite large considering that the signal had an amplitude of .

(singh et al., ) and the new pms hum-filter for simulated data ( hz white noise of amplitude , with hum of amplitude ). at s, there was a step in the data; at - s the noise amplitude was reduced to . a. absolute, smoothed error of the filtered signal with respect to the hum-free data, in simulations. b. selection of panel a. c. data without hum and two filtered versions around the step in the data at s of one exemplary simulation and a selection just before the step. d.
absolute (non-smoothed) error at onset, at the step in the data and at onset of reduced noise, for one exemplary simulation. e. box plots of the absolute error for four time windows ( simulations). the stable period was taken from - s, the other windows lasted . s each (left: the new pms filter, right: the reference fdm filter). f. median absolute error as a function of the amplitude of high frequencies in the data.

in the onset and offset period the performance of the pms filter was completely stable, as shown by panels d and e (fig. ). the large, transient step in the data also did not affect the filter results (panels c-e). only the sudden, large change in hum amplitude caused a considerable error shortly before and after the onset and offset. the quality of the hum-filtering depended strongly on the amplitude of high frequencies (hf) in the data (fig. f). applying the pms hum filter to white noise data (hf amplitude of : panels a-e) led to considerable errors ( . of the signal amplitude), whereas the errors were small ( . ) for hf amplitudes of . and below. the same trend was found for the fdm filter (fig. f). one way to reduce the pms filter errors is to increase the window. increasing the filter window of the pms filter from to periods ( s) reduced the median absolute error in white noise data to . , which is equivalent to the error in a signal with an hf amplitude of . (cf. fig. f). changes of amplitude and shape of the hum-noise gave highly comparable results. in the example of figure a-f, a power inverter was the source of the hum-noise in an emg recording. consequently, the hum consisted almost entirely of harmonics (panel c). the pms filter completely eliminated the lower harmonics and most of the harmonics above hz (panel c). panel b shows a short period without muscle activity. panel d shows somewhat more than six periods of the filter. it can be seen how the filter changes slightly and gradually.
panels e and f show the cleaned, rectified emg before and after low-pass filtering, showing a clean signal with a good resting amplitude, despite considerable hum-noise in the original recording. in the jumping example (fig. ), the amplitude of the hum-noise was as much as n, because the ad converter (kistler daq) was not grounded. in the presented drop jump, the subject landed at first mainly on plate and landed after the jump mainly on plate . however, in each landing, part of one foot contacted the other plate, so both plates registered each landing. in the presence of the hum noise, it would not be possible to determine the exact times of contact and release, but after subtraction of the hum noise, the transients are so sharp that the precise time sample can be determined in each case (panels b and f). the shape of the frequency spectrum of the filtered signal is only changed at the harmonics (panels a, c). at hz (c) and hz (d) it can be seen that the frequency is not merely depressed or interpolated. rather, the pattern of the flanking spectrum is restored in a realistic pattern. this is especially clear in panel d, where the filtered spectrum regularly continues the oscillating pattern in the hz range. interestingly, the frequency power at hz is even enhanced by the filter.

this study presents and validates a new filter for power line hum noise. power line hum still presents a serious problem in many kinds of measurements, ranging from electrophysiological to force plate recordings. the new method introduces periodic median subtraction (pms) and presents a number of improvements with respect to existing subtraction methods. the validation showed that the filter also performs better than a frequency spectrum-based filter, i.e., the current "gold standard", the fdm filter (singh et al., ). an advantage of the pms filter is that it is completely independent of the harmonic structure of the hum noise.
moreover, by introducing a filter window it is highly tolerant to temporal variations of the shape and amplitude of the hum. it is not common to analyse the true errors that are caused by filtering the data. instead, studies tend to plot data before and after filtering in separate this was so for the simulated data as well as for the measured data. thus, for emg (and likewise for other electrophysiological data) power line hum filtering causes relatively large distortions to the data, whereas the same filter applied to force plate measurements is virtually distortion-free. the new pms filter performed better than the reference filter on all measures. theoretically, the performance can be improved even further by increasing the filter window. on the simulated data, the error made to a signal consisting of white noise (cf. fig. f) could be compensated by a ten-fold increase of the window (i.e., from to seconds). in practice, however, the hum is usually not so stable that such a long window of s is practicable. in the various data to which the pms filter has been applied so far, a window of periods ( s) usually was the optimal compromise (for the force plate data, longer windows seem to be even better, but the difference was so small that it was irrelevant for the usual applications). with the sudden, large change in noise amplitude, both filters showed considerable distortions. in the onset and offset as well as in the transient condition the new pms filter had no increase of the error at all. this is quite unusual. even though the fdm filter that was used as a reference is robust to small offsets and baseline oscillations, it did show considerable distortion of the signal at onset and offset as well as during the transient. the distortion-free behaviour of the pms reveals its extraordinary robustness. the pms filter proposed here provides a simple but very robust method. the filter is free of phase distortions and has excellent handling of transients in the data.
it automatically handles any harmonic composition of the hum noise. it performs especially well with high amplitudes of hum noise and copes well with variability of phase and amplitude of the noise. compared to existing subtraction filters, the pms filter excels in robustness, by using a window, by using the median period for extracting the current hum period, and by being based on the high-passed data. compared to the fourier-based filter that was used as a reference (singh et al., ), the pms has considerably better properties in all mentioned aspects (onset/offset, transients, noise amplitude variability, low snr). the pms also performed better than the fdm on the analysed force and emg recordings. concluding, the pms is probably the best hum filter. a matlab routine with example data is provided. the author declares that he has no conflict of interest. the data and materials for all simulations are available at

powerline noise elimination in biomedical signals via blind source separation and wavelet analysis
observed brain dynamics
a wavelet-based method for power-line interference removal in ecg signals
elimination of power line interference from ecg signals using recurrent neural networks
baseline wander and power line interference removal from ecg signals using eigenvalue decomposition
breaking the limits: redefining the instantaneous frequency
baseline wander and power-line interference removal from ecg signals using fourier decomposition method
various techniques for removal of power line interference from ecg signal
the elimination of hz power line interference from ecg using a variable step size lms adaptive filtering algorithm

key: cord- -a fx tyd authors: tang, tiffany; jaimes, javier a.; bidon, miya k.; straus, marco r.; daniel, susan; whittaker, gary r. title: proteolytic activation of the sars-cov- spike s /s site: a re-evaluation of furin cleavage date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: a fx tyd

the severe acute respiratory syndrome coronavirus (sars-cov- ) uses its spike (s) protein to mediate viral entry into host cells. cleavage of the s protein at the s /s and/or s ’ site is known to activate the s protein for viral entry, which can occur at either the cell plasma membrane or the endosomal membrane. previous studies show that sars-cov- has a unique insert at the s /s site that can be cleaved by furin, which expands viral tropism to lung cells. here, we analyze the presence of a furin s /s site in related covs and offer thoughts on the implications of sars-cov- ’s unique insert on its origin. we also utilized viral pseudoparticles to study the impact of the s /s cleavage on infectivity. our results demonstrate that s /s pre-cleavage is essential for plasma membrane entry into calu- cells, a model lung epithelial cell line, but not for endosomal entry into vero e cells, a model cell culture line, and that other proteases in addition to furin are responsible for processing sars-cov- s /s .

the st century has seen the rise of pathogenic strains of human coronaviruses (covs) causing major public health concerns, first with the severe acute respiratory syndrome coronavirus (sars-cov) in , then with the middle east respiratory syndrome coronavirus (mers-cov), which first emerged in , and now with the severe acute respiratory syndrome coronavirus (sars-cov- ). sars-cov- causes the disease syndrome known as covid- , now classified as a pandemic with global reach and devastation. cov host cell entry is mediated by its spike (s) glycoprotein, a large transmembrane protein that decorates the virus particle. the s protein is demarcated into two domains, the s domain, which is the receptor binding domain, and the s domain, which contains the membrane fusion machinery. there are two cleavage events associated with s-mediated membrane fusion.
the first is a priming cleavage that occurs at the interface of the s /s region (s /s ) for some coronaviruses, and the second is the obligatory triggering cleavage that occurs within the s region (s '). the priming cleavage generally converts the s protein into a fusion-competent form, by enabling the s protein to better bind receptors or expose hidden cleavage sites. the triggering cleavage initiates a series of conformational changes that enable the s protein to harpoon into the host membrane for membrane fusion. there are a variety of proteases capable of priming and triggering cov s proteins. depending on which proteases are available, cov can fuse with either the plasma membrane or the endosomal membrane. sars-cov was found to utilize the transmembrane protease tmprss to fuse at the plasma membrane surface. however, tmprss expression is limited to respiratory cell lines, and in tmprss -negative cell lines, sars-cov utilized endosomal cathepsin l to fuse to the endosomal membrane. mers-cov was also found to utilize tmprss and cathepsin l, with one major difference. the mers-cov s /s boundary contains an rsvr insert that can be recognized by furin or related proprotein convertases (pc), proteases commonly found in the secretory pathway of most cell lines. during the s maturation process, the s protein can be cleaved by furin/pcs. thus, mers-cov particles harbor cleaved s protein and it was observed that the s /s pre-cleavage was crucial for mers-cov, but not sars-cov, to infect via the plasma membrane route. however, the s /s cleavage was not a requirement for mers-cov endosomal pathway infection. for sars-cov- , early studies showed that the s /s junction contained an insert with two additional basic residues, p-r-r-a (r-arginine, a-alanine), that was not present in sars-cov or its closest bat ancestor viruses.
this insert forms a p-r-r-a-r sequence and while it does contain the minimum furin recognition motif, r-x-x-r, it is unusual and diverges from the preferred and canonical r-x-k/r-r motif. compared to the canonical motif, the residues at the p and p location for sars-cov- s /s are reversed, with r at the p instead of the p location, and an a at the p instead of the p location (figure ). intriguingly, the only other known example of this insert on furindb, a database of furin substrates, is found in proaerolysin, a bacterial toxin, and was determined to be activated by furin. indeed, early studies suggest that sars-cov- is also processed by furin since sars-cov- harbors a cleaved s protein, likely due to furin processing at the s /s site. similar to what was observed for mers-cov, this s /s cleavage was determined to be a prerequisite for tmprss activation at the s ' for sars-cov- infection in respiratory cell lines, such as calu- . furthermore, like mers-cov, sars-cov- can also utilize cathepsin l in the endosomal pathway in tmprss -negative cell lines, such as vero e . in this study, we sought to better characterize the s /s cleavage site of sars-cov- s and raise some intriguing possibilities for how the site might have emerged. using pseudoparticles, we investigated the impact of this s /s cleavage for successful infection of cells via the plasma membrane and endosomal route. our data suggest that the s /s cleavage is essential for tmprss -mediated plasma membrane entry, but not for cathepsin l-mediated endosomal entry, and that furin may not be the only protease responsible for the s /s cleavage event.

furin-cleavage predictions of the sars-cov- s /s site. we used the pitou and prop cleavage prediction tools to analyze the likelihood that the s /s site is processed by furin, with positive scores (pitou) and . (prop) indicating furin cleavage.
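the two sequence motifs named above, the minimal r-x-x-r and the preferred r-x-k/r-r, can be checked mechanically. the sketch below is only a pattern match in python and is not the pitou or prop algorithm; the function name furin_motifs is hypothetical, and the input is assumed to be the one-letter residues immediately upstream of the scissile bond, with the last letter at the position directly before the cleavage site.

```python
import re

# Both patterns are anchored so that the final residue is the one
# directly before the cleavage site.
MINIMAL = re.compile(r"R..R$")       # R-X-X-R
CANONICAL = re.compile(r"R.[KR]R$")  # R-X-K/R-R


def furin_motifs(upstream):
    """Report which furin recognition motifs the upstream residues match."""
    seq = upstream.upper()
    return {
        "minimal": bool(MINIMAL.search(seq)),
        "canonical": bool(CANONICAL.search(seq)),
    }
```

for the p-r-r-a-r sequence this returns a match for the minimal motif but not for the canonical one, in line with the description of the site as unusual. a motif match says nothing about binding strength or accessibility, which is what the scoring tools estimate.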
prop predicts furin cleavage sites based on networks derived from experimental data, whereas pitou uses a combination of a hidden markov model and biological knowledge-based cumulative probability score functions to characterize a amino acid motif from p to p ' that reflects the binding strength and accessibility of the motif in the furin binding pocket. thus, the pitou algorithm has been reported to be more sensitive and specific than prop for predicting furin cleavage. both algorithms agree with each other in predicting furin cleavage, strengthening the predictions (figure ), but since the pitou algorithm is more sensitive and specific, we decided to focus on the pitou values for further analysis. the pitou algorithm predicts that the sars-cov- s /s site (score: . ) can be cleaved by furin, whereas other lineage b covs that have been proposed as sars-cov- precursors, such as sars-cov, ratg , zc , zxc and rmyn , cannot be cleaved by furin. these predictions have been confirmed experimentally, as recombinant furin was shown to process the purified sars-cov- s protein, generating appropriate cleavage products, whereas no cleavage products were detected when processing the sars-cov s. notably, the pitou scores of traditionally accepted furin cleavage sites, such as those found in influenza h n (score: . ) and hcov-hku (score: . ), are much higher than for sars-cov- , suggesting that while the sars-cov- s /s site has increased furin recognition when compared to other lineage b-betacovs, the site is not optimal for furin cleavage. the modest sars-cov- pitou score raises questions about the origin and evolution of the virus, possibly suggesting that it does not represent an "insertion" into the viral genome as is typically assumed. in this scenario, it is worth noting that in betacov lineage c, mers-cov (score: . ) is barely above threshold. in fact, within the betacov lineage c, there is a range of pitou scores for the closest known bat virus ancestors to mers-cov; i.e.
batcov-hku (score: < ) to batcov-hku (score: . ). with this in mind, the mers-cov s /s site can be seen as either a gain-of-furin-cleavage mutation from batcov-hku , or equally a loss-of-furin-cleavage from batcov-hku . thus, for sars-cov- , a simple gain-of-furin-cleavage may not necessarily be the cause of increased disease, leaving open the potential for a yet-to-be discovered ancestor virus in bats with a robust furin cleavage site (i.e. score approx. - ), but one that may not have gained the capacity to bind a human receptor and for which a robust furin cleavage site may have been down-regulated or expanded to other proteases with less specific recognition motifs. these suggested parallels are informed by betacov lineage c where mers-cov (score: . ) and batcov-hku bind hdpp , but with batcov-hku (score: . ) unable to bind this receptor. overall, while it is generally believed that the sars-cov- s /s site is a gained furin insert, we consider it important to weigh alternative explanations, especially as the search continues to determine the origin of sars-

mlv pseudoparticles as a system to study sars-cov- entry. to assess the functional importance of the s /s site for sars-cov- entry, we utilized viral pseudoparticles. these particles consist of a murine leukemia virus (mlv) core and are decorated with the viral envelope protein to accurately recapitulate the entry steps of their native counterpart. these particles also contain a luciferase reporter that integrates into the host cell genome upon successful infection and drives the cells to produce luciferase, which is quantifiable. we and others have used mlv-based pseudoparticles widely to study cov entry pathways. mlv pseudoparticles exhibiting the sars-cov- s protein (sars-cov- pp), or the sars-cov s protein (sars-covpp) were generated alongside positive control particles containing the vesicular stomatitis virus g protein (vsvpp) or negative control particles (Δenvpp) lacking envelope proteins.
since coronaviruses can enter via the plasma membrane or via endosomes, we chose to infect cell lines representative of each pathway, as the entry mechanism can be highly cell-type dependent. we utilized the calu- and the vero e cell lines for these studies, as they are commonly used cell lines for studying sars-cov and sars-cov- plasma membrane and endosomal entry, respectively. as expected, vsvpp (positive control) infected vero e and calu- cells with several orders of magnitude higher luciferase units than the values reported with Δenvpp infection (negative control) (figure a, b). this confirms that the envelope protein is driving infection, and not the particle itself. in the case of sars-covpp and sars-cov- pp, both particles are infectious as they drive luciferase production several orders of magnitude higher than Δenvpp (figure a, b). it is noted that sars-covpp are more infectious than sars-cov- pp. to determine the cause of this difference, we analyzed the particles via western blot to visualize s content. for the sars-cov- pp, we detected a band at kda and for sars-covpp, a strong band at kda (figure c). the kda band corresponds to the s segment of the s protein following s /s cleavage. the kda band for the sars-covpp corresponds to uncleaved s protein. we observe that the sars-cov s band is more intense than the sars-cov- s band, despite both particles showing similar intensities for the mlvp loading control band. as a result, we infer that the sars-covpp incorporated more s protein than sars-cov- pp, resulting in the higher infectivity observed for sars-covpp transduction.

use of the dec-rvkr-cmk protease inhibitor to produce sars- pp with uncleaved s. to examine the functional role of s priming by furin/pc proteases, we needed to produce sars-cov- pp expressing uncleaved s protein. this can be accomplished by adding an appropriate protease inhibitor to producer cells to prevent s /s cleavage during biogenesis.
we chose dec-rvkr-cmk because it has been shown to inhibit furin/pcs, preventing s /s cleavage. indeed, we previously showed that addition of dec-rvkr-cmk to producer cells generates mlv pseudotyped particles that harbor full-length mers-cov s protein (mers-covpp) and we observe a similar trend with sars-cov- pp generated from cells treated with dec-rvkr-cmk (figure c, lanes and ). the dec-rvkr-cmk-treated (uncleaved) particles were used to transduce either the vero e or calu- cells to observe the impact of the s /s cleavage on infection. in vero e cells, the uncleaved sars-cov- pp were -fold more infectious than their cleaved counterparts, suggesting that leaving s uncleaved yields more infectious particles, and that the s /s pre-cleavage is not required for, or even hinders, infection via the endosomal pathway (figure a). however, in calu- cells, the opposite trend was observed; the uncleaved particles were significantly less infectious than their cleaved counterparts (figure b), suggesting that s /s pre-cleavage is essential for plasma membrane entry and that increased uncleaved s incorporation cannot compensate. thus, we were curious if we could restore the infection of the uncleaved particles by treating them with exogenous furin, which has been previously shown to process purified sars-cov- s protein. in calu- , we observed a -fold increase in infection when uncleaved particles were treated with exogenous furin, but with cleaved particles, we did not observe any statistically significant increase with furin treatment (figure b). in vero e cells, for both types of particles, we did not observe a statistically significant increase in infection with furin treatment (figure a). since furin treatment only modestly increased infectivity in all cases observed, we wanted to visualize how efficiently furin was processing the sars-cov- s. for cleaved particles, exogenous furin was able to fully reduce the faint full-length s band into s (figure c, lanes and ).
for uncleaved particles, furin treatment had no observable impact, as the intensity of the full-length s bands is the same and there are no additional bands between furin-treated and non-treated particles (figure c, lanes and ). additional bands at > kda likely correspond to dimeric and trimeric s. in addition, the intensity of the uncleaved sars-cov- pp s bands is greater than the intensity of the cleaved sars-cov- pp s bands, and this trend was also observed with mers-covpp s. this likely suggests that the increased infectivity we observed with the uncleaved sars-cov- pp is due to greater s incorporation, though the s is uncleaved. due to the limited ability of exogenous furin to rescue and cleave sars-cov- s, we re-evaluated the role of furin in processing the s /s site. since dec-rvkr-cmk can also inhibit a number of furin-related pcs, the effects that have been observed with dec-rvkr-cmk particles could have resulted from inhibition of other proteases and not furin. therefore, we produced particles in the presence of alpha -pdx, a potent and more selective furin inhibitor than dec-rvkr-cmk. infectivity results of these particles in vero e cells show that a highly furin-specific inhibitor failed to recapitulate the high level of enhancement provided by dec-rvkr-cmk at all tested inhibitor concentrations (figure ). this indicates that furin itself may not in fact be the only active protease processing the s /s site, as is generally assumed. other pcs within the secretory pathway may also assist with the s /s cleavage and have been observed to cleave sars-cov- s /s peptide. lastly, we wanted to investigate if dec-rvkr-cmk inhibition affects viral proteins that do not feature a furin s /s site, such as sars-cov.
dec-rvkr-cmk treatment had no significant impact on sars-cov s-mediated infection of vero e and calu- cells (figure a and b), suggesting that the impact of dec-rvkr-cmk on sars-cov- s is due to inhibition of the s /s pre-cleavage and not due to some general effect on protein expression. treatment of sars-covpp with exogenous furin also yielded no difference in protein conformation (figure c). overall, the results support observations that the role of the sars-cov- s /s site is to expand viral tropism to lung cells. a cleaved s /s site is crucial for sars-cov- to be subsequently cleaved at the s ' location by tmprss for immediate plasma membrane entry in respiratory cells. without the s /s pre-cleavage, sars-cov- would be endocytosed, and due to low cathepsin l expression in respiratory endosomes, coupled with expression of antiviral restriction factors in endosomes, sars-cov- would not effectively infect via respiratory endosomes. if sars-cov- is infecting tmprss -negative cells, it can utilize endosomal cathepsin l, a ubiquitous protease generally found throughout mammalian cells, to activate the s protein, though at undetermined sites. however, for cathepsin l activation, s /s pre-cleavage is not required, and our results indicate that preventing this cleavage increases infectivity of sars-cov- . this may be connected to recent work showing that s /s pre-cleavage reduces s thermal stability. interestingly, the s /s site activation appears to also have a role in the immune response against sars-cov- , as it has been shown that sars-cov- s with a deleted s /s loop can provide a better protective immune response than s with the s /s loop. the role of furin in activating the s /s site was also investigated. as protease inhibitors commonly employed in cell entry studies, such as dec-rvkr-cmk, can also inhibit other pcs in addition to furin, it is difficult to ascertain the impact of individual proteases.
while furin likely can process the sars-cov- site, our data suggest that other pcs are also involved, since western blots show poor processing of uncleaved s upon addition of purified furin and a highly selective furin inhibitor has only a modest impact on infectivity. as we consider protease-based inhibitors in treating covid- , especially those targeting furin, it is important to thoroughly evaluate the role of these proteases to be certain that the relevant proteases are targeted.

predicted structural modeling. s protein models were built using ucsf chimera (v. . , university of california) through the modeller homology tool of the modeller extension (v. . , university of california) as described. sars-cov and sars-cov- s models were built based on the sars-cov s structure (pdb no.

packaging construct, the ptg-luc luciferase reporter, pcaggs/vsv-g, pcaggs, and pcdna/sars-s plasmids are as previously described. the pcdna/sars -s was a generous gift from the veesler lab. protease inhibitor decanoyl-rvkr-cmk (dec-rvkr-cmk) was purchased from tocris and resuspended in sterile water to mm. recombinant furin was purchased from new england biolabs. human alpha- pdx (alpha -pdx) recombinant protein was purchased from thermofisher scientific.

pseudoparticle production. pseudotyped viruses were produced following published protocols with minor modifications. briefly, hek t cells were seeded to % confluency. cells were transfected with ptgluc ( ng), pcmv-mlv-gagpol ( ng), and the respective viral envelope protein ( ng) using polyethylenimine for hours. supernatants were then harvested, centrifuged, clarified, and stored in aliquots at - °c. for + dec-rvkr-cmk particles, . µl of dec-rvkr-cmk was added to cells immediately after transfection and boosted with an additional . µl hours later (final concentration: µm). for alpha -pdx particles, indicated concentrations were added to cells immediately after transfection.

pseudoparticle assays.
infection assays were as previously described with minor modifications (inset) magnification of s /s site with conserved r and s residues (red ribbon) and the unique four amino acid insertion p-r-r-a for sars-cov- (blue ribbon) are shown. the p's denote the position of that amino acid from the s /s cleavage site, with p -p referring to amino acids before the cleavage site and p ' referring to amino acids after the cleavage site. particles were used to infect vero and calu- cells and infectivity is normalized to the -dec condition. error bars represent the standard error measurements of three biological replicates (n= ). statistical analysis was performed using an unpaired student's t test. ns, non-significant, p > . . (c) western blot analysis of incorporated s in sars-covpp. epidemiology and cause of severe acute respiratory syndrome (sars) in people's republic of china isolation of a novel coronavirus from a man with pneumonia in saudi arabia a novel coronavirus from patients with pneumonia in china mechanisms of coronavirus cell entry mediated by the viral spike protein fusion of enveloped viruses in endosomes set, fuse! 
the coronavirus spike protein and acquisition of fusion competence efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry host cell entry of middle east respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein proteolytic processing of middle east respiratory syndrome coronavirus spikes expands virus tropism different residues in the sars-cov spike protein determine cleavage and activation by the host cell protease tmprss cryo-em structure of the -ncov spike in the prefusion conformation phylogenetic analysis and structural modeling of sars-cov- spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop proprotein convertase models based on the crystal structures of furin and kexin: explanation of their specificity the crystal structure of the proprotein processing proteinase furin explains its stringent specificity a database of -residue furin cleavage site motifs, substrates and their associated drugs the pore-forming toxin proaerolysin is activated by furin sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor article sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov- spike glycoprotein characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov cleavage site in the spike protein of sars-cov- is essential for infection of human lung cells tmprss and furin are both essential for proteolytic activation of sars-cov- in human airway cells computational prediction of furin cleavage sites by a hybrid method and understanding mechanism underlying diseases prediction of proprotein convertase cleavage sites sars-cov- and bat ratg spike 
glycoprotein structures inform on virus evolution and furin-cleavage effects the proximal origin of sars-cov- receptor usage and cell entry of bat coronavirus hku provide insight into bat-tohuman transmission of mers coronavirus production of pseudotyped particles to study highly pathogenic coronaviruses in a biosafety level setting ca + ions promote fusion of middle east respiratory syndrome coronavirus with host cells and increase infectivity coronavirus membrane fusion mechanism offers a potential target for antiviral development middle east respiratory syndrome coronavirus spike protein is not activated directly by cellular furin during viral entry into target cells α -antitrypsin portland, a bioengineered serpin highly selective for furin: application as an antipathogenic agent proteolytic cleavage of the sars-cov- spike protein and the role of the novel s /s site the furin cleavage site of sars-cov- spike protein is a key determinant for transmission due to enhanced replication in airway cells a single immunization with nucleoside-modified mrna vaccines elicits strong cellular and humoral immune responses against sars-cov- in mice ll a single immunization with nucleoside-modified mrna vaccines elicits strong cellular and humoral immune responses against sars-cov- in mice furin inhibitors block sars-cov- spike protein cleavage to suppress virus production and cytopathic effects key: cord- -d joq authors: arthur, ronan f.; jones, james h.; bonds, matthew h.; ram, yoav; feldman, marcus w. title: adaptive social contact rates induce complex dynamics during epidemics date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d joq the covid- pandemic has posed a significant dilemma for governments across the globe. the public health consequences of inaction are catastrophic; but the economic consequences of drastic action are likewise catastrophic. governments must therefore strike a balance in the face of these trade-offs. 
but with critical uncertainty about how to find such a balance, they are forced to experiment with their interventions and await the results of their experimentation. models have proved inaccurate because behavioral response patterns are either not factored in or are hard to predict. one crucial behavioral response in a pandemic is adaptive social contact: potentially infectious contact between people is deliberately reduced either individually or by fiat; and this must be balanced against the economic cost of having fewer people in contact and therefore active in the labor force. we develop a model for adaptive optimal control of the effective social contact rate within a susceptible-infectious-susceptible (sis) epidemic model using a dynamic utility function with delayed information. this utility function trades off the population-wide contact rate with the expected cost and risk of increasing infections. our analytical and computational analysis of this simple discrete-time deterministic model reveals the existence of a non-zero equilibrium, oscillatory dynamics around this equilibrium under some parametric conditions, and complex dynamic regimes that shift under small parameter perturbations. these results support the supposition that infectious disease dynamics under adaptive behavior-change may have an indifference point, may produce oscillatory dynamics without other forcing, and constitute complex adaptive systems with associated dynamics. implications for covid- include an expectation of fluctuations, for a considerable time, around a quasi-equilibrium that balances public health and economic priorities, that shows multiple peaks and surges in some scenarios, and that implies a high degree of uncertainty in mathematical projections. 
author summary epidemic response in the form of social contact reduction, such as has been utilized during the ongoing covid- pandemic, presents inherent tradeoffs between the economic costs of reducing social contacts and the public health costs of neglecting to do so. such tradeoffs introduce an interactive, iterative mechanism which adds complexity to an infectious disease system. consequently, infectious disease modeling typically has not included dynamic behavior change that must address such a tradeoff. here, we develop a theoretical model that introduces lost or gained economic and public health utility through the adjustment of social contact rates with delayed information. we find this model produces an equilibrium, a point of indifference where the tradeoff is neutral, and at which a disease will be endemic for a long period of time. under small perturbations, this model exhibits complex dynamic regimes, including oscillatory behavior, runaway exponential growth, and eradication. these dynamics suggest that for epidemic response that relies on social contact reduction, secondary waves and surges with accompanied business re-closures and shutdowns may be expected, and that accurate projection under such circumstances is unlikely. the covid- pandemic had infected almost million people and caused over , deaths worldwide as of june , [ ] . in the absence of effective therapies and vaccines [ ] , many governments responded with lock-down policies and social distancing laws to reduce the rate of social contacts and curb transmission of the virus. prevalence of covid- in the wake of these policies in the united states indicates they may have been successful at decreasing the reproduction number (r t ) of the epidemic [ ] . however, they have also led to economic recession with an unemployment rate at an -year peak, the stock market in decline, and the federal government forced to borrow heavily to financially support businesses and households. 
solutions to these economic crises may conflict with public health recommendations. thus, governments worldwide must decide how to balance the economic and public health consequences of their epidemic response interventions. behavior-change in response to an epidemic, whether autonomously adopted by individuals or externally directed by governments, affects the dynamics of infectious diseases [ , ] . prominent examples of behavior-change in response to infectious disease prevalence include measles-mumps-rubella (mmr) vaccination choices [ ] , social distancing in influenza outbreaks [ ] , condom purchases in hiv-affected communities [ ], and social distancing during the ongoing covid- pandemic [ ] . behavior is endogenous to an infectious disease system because it is, in part, a consequence of the prevalence of the disease, which in turn responds to changes in behavior [ , ] . individuals and governments have greater incentive to change behavior as prevalence increases; conversely they have reduced incentive as prevalence decreases [ , ] . endogenous behavioral response may then theoretically produce a non-zero endemic equilibrium of infection. this happens because, at low levels of prevalence, the cost of avoidance of a disease may be higher than the private benefit to the individual, even though the collective, public benefit in the long-term may be greater. however, in epidemic response we typically think of behavior-change as an exogenously-induced intervention without considering associated costs. while guiding positive change is an important intervention, neglecting to recognize the endogeneity of behavior can lead to a misunderstanding of incentives and a resurgence of the epidemic when behavior change is reversed prematurely. although there is growing interest in the role of adaptive human behavior in infectious disease dynamics, there is still a lack of general understanding of the most important properties of such systems [ , , ] . 
behavior is difficult to measure, quantify, or predict [ ] , in part, due to the complexity and diversity of human beings who make simply allowed the transmission parameter (β) to be a negative function of the number infected, effectively introducing an intrinsic negative feedback to the infected class that regulated the disease [ ] . modelers have used a variety of tools, including agent-based modeling [ ] , network structures for the replacement of central nodes when sick [ ] or for behavior-change as a social contagion process [ ] , game theoretic descriptions of rational choice under changing incentives as with vaccination [ , , ] , and a branching process for heterogeneous agents and the effect of behavior during the west africa ebola epidemic in [ ] . a common approach to incorporating behavior into epidemic models is to track co-evolving dynamics of behavior and infection [ , , ] , where behavior represents an i-state of the model [ ] . in a compartmental model, this could mean separate compartments (and transitions therefrom) for susceptible individuals in a state of fear and those not in a state of fear [ ] . periodicity (i.e. multi-peak dynamics) has long been documented empirically in epidemiology [ , ] . periodicity can be driven by seasonal contact rate changes (e.g. when children are in school) [ ] , seasonality in the climate or ecology [ ] , sexual behavior change [ ] , and host immunity cycling through new births of susceptibles or a decay of immunity over time. some papers in nonlinear dynamics have studied delay differential equations in the context of epidemic dynamics and found periodic solutions [ ] . although it is atypical to include delay in modeling, delay is an important feature of epidemics. for example, if behavior responds to mortality rates, there will inevitably be a lag with an average duration of the incubation period plus the life expectancy upon becoming infected. 
in a tightly interdependent system, reacting to outdated information can result in an irrational response and periodic cycling. the original epidemic model of kermack and mckendrick [ ] was first expressed in discrete time. then by allowing "the subdivisions of time to increase in number so that each interval becomes very small" the famous differential equations of the sir epidemic model were derived. here we begin with a discrete-time susceptible-infected-susceptible model that is adjusted on the principle of endogenous behavior-change through an adaptive social-contact rate that can be thought of as either individually motivated or institutionally imposed. we introduce a dynamic utility function that motivates the population's effective contact rate at a particular time period. this utility function is based on information about the epidemic size that may not be current. this leads to a time delay in the contact function that increases the complexity of the population dynamics of the infection. results from the discrete-time model show that the system approaches an equilibrium in many cases, although small parameter perturbations can lead the dynamics to enter qualitatively distinct regimes. the analogous continuous-time model retains periodicities for some sets of parameters, but numerical investigation shows that the continuous time version is much better behaved than the discrete-time model. this dynamical behavior is similar to models of ecological population dynamics, and a useful mathematical parallel is drawn between these systems. 
to represent endogenous behavior-change, we start with the classical discrete-time susceptible-infected-susceptible (sis) model [ ] , which, when incidence is relatively small compared to the total population [ , ] , can be written in terms of the recursions where at time t, s t represents the number of susceptible individuals, i t the infected individuals, and n t the number of individuals that make up the population, which is assumed fixed in a closed population. we can therefore write n for the constant population size. here γ, with < γ < , is the rate of removal from i to s due to recovery. this model in its simplest form assumes random mixing, where the parameter b represents a composite of the average contact rate and the disease-specific transmissibility given a contact event. in order to introduce human behavior, we substitute for b a time-dependent b t , which is a function of both b , the probability that disease transmission takes place on contact, and a dynamic social rate of contact c t whose optimal value, c * t , is determined at each time t as in economic epidemiological models [ ] , namely where c * t represents the optimal contact rate, defined as the number of contacts per unit time that maximize utility for the individual. here, c * t is a function of the number of this utility function is assumed to take the form here u represents utility for an individual at time t given a particular number of contacts per unit time c, α is a constant that represents maximum potential utility achieved at a target contact rateĉ. the second term, α (c −ĉ) , is a concave function that represents the penalty for deviating fromĉ. the third term, the delay in information acquisition and the speed of response to that information. we note that ( − i n b ) c can be approximated by when i n b is small and c i n b << . we thus assume i n (b ) is small, and approximate u (c) in eq. using eq. . eq. 
assumes a strictly negative relationship between number of infecteds and contact. we assume an individual or government will balance the cost of infection, the probability of infection, and the cost of deviating from the target contact rate ĉ to select an optimal contact rate c * t , namely the number of contacts, which takes into account the risk of infection and the penalty for deviating from the target contact rate. this captures the idea that individuals trade off how many people they want to interact with versus their risk of getting sick, or that authorities want to reopen the economy during a pandemic and have to trade off morbidity and mortality from increasing infections with the need to allow additional social contacts to help the economy restart. this optimal contact rate can be calculated by finding the maximum of u with respect to c from eq. with substitution from eq. , namely differentiating, we have which vanishes at the optimal contact rate, c * , which we write as c * t to show its dependence on time. then which we assume to be positive. therefore, total utility will decrease as i t increases and c * t also decreases. utility is maximized at each time step, rather than over the course of lifetime expectations. in addition, eq. assumes a strictly negative relationship between number of infecteds at time t − ∆ and c * . while behavior at high degrees of prevalence has been shown to be non-linear and fatalistic [ , ] , in this model, prevalence (i.e., b it n ) is assumed to be small, consistent with eq. . we introduce the new parameter α = α we can now rewrite the recursion from eq. , using eq. and replacing c t with c * t as defined by eq. , as when ∆ = and there is no time delay, f (·) is a cubic polynomial, given by for the susceptible-infected-removed (sir) version of the model, we include the removed category and write the (discrete-time) recursion system as the baseline contact rate and c * t specified by eq. .
with b t = b, say, and not changing over time, eqs. - form the discrete-time version of the classical kermack-mckendrick sir model [ ] . the inclusion of the removed category entails thatĨ = is the only equilibrium of the system eqs. - ; unlike the sis model, there is no equilibrium with infecteds present. in general, since c * t includes the delay ∆, the dynamic approach toĨ = is expected to be quite complex. intuitively, since the infecteds are ultimately removed, we do expect that from any initial frequency i of infecteds all n individuals will eventually be in the r category. numerical analysis of this sir model shows strong similarity between the sis and sir models for several hundred time steps before the sir model converges toĨ = with r = n . in the section "numerical iteration and continuous-time analog" we compare the numerical iteration of the sis (eq. ) and sir (eqs. [ ] [ ] [ ] and integration of the continuous-time (differential equation) versions of the sis and sir models. to determine the dynamic trajectories of ( ) without time delay, we first solve for the fixed point(s) of the recursion ( ) (i.e., value or values of i such that from eq. , it is clear that i = is an equilibrium as no new infections can occur in the next time-step if none exist in the current one. this is the disease-free equilibrium denoted byĨ. other equilibria are the solutions of we label the solution with the + sign i * and the one with the − signÎ. it is important to note that under these conditionsÎ is an equilibrium of the for this to hold whenÎ is legitimate is if inequalities ( ) and nĉb > γ hold, thenÎ is locally stable. however, even if both of these inequalities hold, the number of infecteds may not converge toÎ. it is well known that iterations of discrete-time recursive relations, of which ( ) is an example (i.e., with ∆ = ), may produce cycles or chaos depending on the parameters and the starting frequency i of infecteds. 
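the sis recursion with an adaptive contact rate described above can be sketched in a few lines of python. this is a minimal illustration, not the authors' code: the displayed equations did not survive extraction, so the sketch assumes a utility of the form u(c) = a0 − a1 (c − ĉ)² − a2 c b0 i/n (a quadratic penalty for deviating from ĉ plus the linearized infection cost), which gives c* = ĉ − a2 b0 i/(2 a1 n). the names a1, a2, b0, the function names, and all parameter values are hypothetical and chosen only so that the trajectory is well behaved.

```python
import math


def optimal_contact(i_delayed, n, c_hat, b0, a1, a2):
    """Contact rate maximizing the ASSUMED linearized utility
    U(c) = a0 - a1*(c - c_hat)**2 - a2*c*b0*i_delayed/n."""
    return c_hat - a2 * b0 * i_delayed / (2.0 * a1 * n)


def sis_step(i_now, i_delayed, n, gamma, c_hat, b0, a1, a2):
    """One step of the discrete-time SIS recursion with adaptive contacts."""
    c_star = optimal_contact(i_delayed, n, c_hat, b0, a1, a2)
    new_infections = b0 * c_star * (n - i_now) * i_now / n
    return i_now + new_infections - gamma * i_now


def simulate(i0, steps, delay, **params):
    """Iterate the recursion; the history before t = 0 is held at i0."""
    history = [float(i0)]
    for t in range(steps):
        i_delayed = history[max(t - delay, 0)]
        history.append(sis_step(history[t], i_delayed, **params))
    return history


def lower_equilibrium(n, gamma, c_hat, b0, a1, a2):
    """Smaller ('minus sign') root of b0*(c_hat - k*I)*(n - I)/n = gamma,
    with k = a2*b0/(2*a1*n); requires a2 > 0."""
    k = a2 * b0 / (2.0 * a1 * n)
    a, b, c = k, -(c_hat + k * n), n * (c_hat - gamma / b0)
    return (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


# Hypothetical parameter values, chosen only for illustration.
params = dict(n=10_000, gamma=0.1, c_hat=4.0, b0=0.05, a1=1.0, a2=100.0)
trajectory = simulate(i0=1, steps=1000, delay=0, **params)
i_hat = lower_equilibrium(**params)
```

with these illustrative values and no delay, the trajectory started from a single infected settles at the non-zero equilibrium; increasing `delay` or changing `c_hat` is a simple way to explore the other regimes discussed in the text.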
table shows an array of possible asymptotic dynamics with ∆ = found by numerical iteration of ( ) for a specific set of parameters and an initial frequency table are examples for which, beginning with a single infected, the number of infecteds explodes, becoming unbounded; of course, this is an illegitimate trajectory since i t cannot exceed n . however, in the case marked * , Î is locally stable and with a large enough initial number of infecteds, there is damped oscillatory convergence to Î. in the case marked * * , with i = the number of infecteds becomes unbounded, but in this case, Î is locally unstable, and starting with i close to i a stable two-point cycle is approached; in this case df (i)/di| i=Î < − . table . stability analysis of the sis model is more complicated when ∆ = , and in the appendix we outline the procedure for local analysis of the recursion ( ) near Î. local stability is sensitive to the delay time ∆ as can be seen from the numerical iteration of ( ) for the specific set of parameters shown in table . some analytical details related to table are in the appendix. table reports an array of dynamic trajectories for some choices of parameters and, in two cases, an initial number of infecteds other than i = . the first three rows show three sets of parameters for which the equilibrium values of Î are very similar but the trajectories of i t are different: a two-point cycle, a four-point cycle, and apparently chaotic cycling above and below Î. in all of these cases, df (i)/di| i=Î < − . clearly the dynamics are sensitive to the target contact rate ĉ in these cases. the fourth and eighth rows show that i t becomes unbounded (tends to +∞) from i = , but a two-point cycle is approached if i is close enough to Î: df (i)/di| i=Î < − in this case. for the parameters in the ninth row, if i is close enough to Î there is damped oscillation into Î: here − < df (i)/di| i=Î < .
the fifth and sixth rows of table exemplify another interesting dynamic starting from i = . i t becomes larger than Î (overshoots) and then converges monotonically down to Î; in each case < df (i)/di| i=Î < . for the parameters in the seventh row, there is oscillatory convergence to Î from i = (− < df (i)/di| i=Î < ), while in the last row there is straightforward monotone convergence to Î. a continuous-time analog of the discrete-time recursion ( ), in the form of a differential equation, substitutes di/dt for i t+ − i t in ( ). we then solve the resulting delay differential equation numerically using the vode differential equation integrator in scipy [ , ] (source code available at https://github.com/yoavram/sanjose). using the parameters in table figure with i = . in figure , with no delay (∆ = ) and a one-unit delay (∆ = ), the discrete and continuous dynamics are very similar, both converging to Î. however, with ∆ = the differential equation oscillates into Î while the discrete-time recursion enters a regime of inexact cycling around Î, which appears to be a state of chaos. for ∆ = and ∆ = , the discrete recursion "collapses". in other words, i t becomes negative and appears to go off to −∞; in figure , this is cut off at i = . the continuous version, however, in these cases enters a stable cycle around Î. it is important to note that in figure for respectively. in fig. s s there appears to be convergence to Î, but in fig. s l after about time units, in both discrete- and continuous-time sir versions, the number of infecteds begins to decline towards zero. it is worth noting that if the total population size of n decreases over time, for example, if we take n (t) = n exp(−zt), with z = b ĉ γ, then the short-term dynamics of the sis model in ( ) begin to closely resemble the sir version. this is illustrated in supplementary fig. s n , where b , ĉ, γ are, as in figs. s s and s l, the same as in fig. , panel (a) .
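the continuous-time analog described above (di/dt in place of the one-step difference, with the contact rate reacting to i(t − ∆)) can be explored with a very simple fixed-step scheme. the sketch below is an assumption-laden illustration and not the authors' implementation: it uses forward euler with a stored history rather than the vode integrator, it assumes a contact rate of the form c* = ĉ − k i(t − ∆) with a hypothetical lumped parameter k, and the parameter values are invented for the example.

```python
def integrate_delay_sis(i0, t_end, delay, dt, n, gamma, c_hat, b0, k):
    """Forward-Euler integration of the ASSUMED delay differential equation
        dI/dt = b0*(c_hat - k*I(t - delay))*(n - I)*I/n - gamma*I
    with constant history I(t) = i0 for t <= 0."""
    lag_steps = int(round(delay / dt))      # delay expressed in grid steps
    history = [float(i0)]
    for step in range(int(round(t_end / dt))):
        i_now = history[-1]
        i_lag = history[max(step - lag_steps, 0)]
        c_star = c_hat - k * i_lag          # contact rate from delayed information
        didt = b0 * c_star * (n - i_now) * i_now / n - gamma * i_now
        history.append(i_now + dt * didt)
    return history


# Hypothetical parameter values, chosen only for illustration.
p = dict(n=10_000, gamma=0.1, c_hat=4.0, b0=0.05, k=2.5e-4)
no_delay = integrate_delay_sis(1, 500, 0.0, 0.1, **p)
unit_delay = integrate_delay_sis(1, 500, 1.0, 0.1, **p)
```

a fixed-step scheme keeps the delayed value on the integration grid, which is convenient for a sketch; an adaptive integrator needs interpolation of the stored history instead.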
with n decreasing to zero, both s and i will approach zero in the our model makes a number of simplifying assumptions. we assume, for example, that all individuals in the population will respond in the same fashion to government policy. we assume that governments choose a uniform contact rate according to an optimized utility function, which is homogeneous across all individuals in the population. finally, we assume that the utility function is symmetric around the optimal number of contacts so that increasing or decreasing contacts above or below the target contact rate, respectively, yield the same reduction in utility. these assumptions allowed us to create the simplest possible model that includes adaptive behavior and time delay. in holling's heuristic distinction in ecology between tactical models, models built to be parameterized and predictive, and strategic models, which aim to be as simple as possible to highlight phenomenological generalities, this is a strategic model [ ] . we note that the five distinct kinds of dynamical trajectories seen in these computational experiments come from a purely deterministic recursion. this means that oscillations and even erratic, near-chaotic dynamics and collapse in an epidemic may not necessarily be due to seasonality, complex agent-based interactions, changing or stochastic parameter values, demographic change, host immunity, or socio-cultural idiosyncracies. this dynamical behavior in number of infecteds can result from mathematical properties of a simple deterministic system with homogeneous endogenous behavior-change, similar to complex population dynamics of biological organisms [ ] . 
the mathematical consistency with population dynamics suggests a parallel in ecology, that the indifference point for human behavior functions in a similar way to a carrying capacity in ecology, below which a population will tend to grow and above which a individuals are incentivized to change their behavior to protect themselves, they will, and they will cease to do this when they are not [ ] . further, our results show certain parameter sets can lead to limit-cycle dynamics, consistent with other negative feedback mechanisms with time delays [ , ] . this is because the system is reacting to conditions that were true in the past, but not necessarily true in the present. in our discrete-time model, there is the added complexity that the non-zero equilibrium may be locally stable but not attained from a wide range of initial conditions, including the most natural one, namely a single infected individual. observed epidemic curves of many transient disease outbreaks typically inflect and go extinct, as opposed to this model that may oscillate perpetually or converge [ ] , and surges in fluctuations in covid- cases globally [ ] . there may be many causes for such double-peaked outbreaks, one of which may be a lapse in behavior-change after the epidemic begins to die down due to decreasing incentives [ ] , as represented in our simple theoretical model. this is consistent with findings that voluntary vaccination programs suffer from decreasing incentives to participate as prevalence decreases [ , ] . it should be noted that the continuous-time version of our model can support a stable cyclic epidemic whose interpretation in empirical terms will depend on the time scale, and hence on the meaning of the delay, ∆. one of the responsibilities of infectious disease modelers (e.g. covid- modelers) is to predict and project forward what epidemics will do in the future in order to better assist in the proper and strategic allocation of preventative resources. 
covid- models have often proved wrong by orders of magnitude because they lack the means to account for adaptive response. an insight from this model, however, is that prediction becomes very difficult, perhaps impossible, if we allow for adaptive behavior-change because the system is qualitatively sensitive to small differences in values of key parameters. these parameters are very hard to measure precisely; they change depending on the disease system and context and their inference is generally subject to large errors. further, we don't know how policy-makers weight the economic trade-offs against the public health priorities (i.e., the ratio between α and α in our model) to arrive at new policy recommendations. to maximize the ability to predict and minimize loss of life or morbidity, outbreak response should not only seek to minimize the reproduction number, but also the length of time taken to gather and distribute information. another approach would be to use a predetermined strategy for the contact rate, as opposed to a contact rate that depends on the number of infecteds. in our model, complex dynamic regimes occur more often when there is a time delay. if behavior-change arises from fear and fear is triggered by high local mortality and high local prevalence, such delays seem plausible since death and incubation periods are lagging epidemiological indicators. lags mean that people can respond sluggishly to an unfolding epidemic crisis, but they also mean that people can abandon protective behaviors prematurely. developing approaches to incentivize protective behavior throughout the duration of any lag introduced by the natural history of the infection (or otherwise) should be a priority in applied research. 
this paper represents a first step in understanding endogenous behavior-change and time-lagged protective behavior, and we anticipate further developments along these lines that could incorporate long incubation periods and/or recognition of asymptomatic transmission. in the neighborhood of the equilibriumÎ, write i t =Î + ε t and i t−∆ =Î + ε t−∆ , where ε t and ε t−∆ are small enough that quadratic terms in them can be neglected in the expression for i t+ =Î + ε t+ . the linear approximation to (a ) is then and in the case ∆ = , this reduces to we focus first on ∆ = and write (a ) as ε t+ = ε t l(Î). recall thatÎ satisfies eq. ( ) , and substituting γ from ( ) now we turn to the general case ∆ = and eq. (a ), which we write as where a and b are the corresponding terms on the right side of (a constants with respect to time. local stability ofÎ is then determined by the properties of recursion (a ), whose solution first involves solving its characteristic equation in principle there are ∆ + real or complex roots of (a ), which we represent as λ , λ , . . . , λ ∆+ , and the solution of (a ) can be written as where c i are found from the initial conditions. convergence to, and hence local stability ofÎ, is determined by the magnitude of the absolute value (if real) or modulus (if complex) of the roots λ , λ , . . . , λ ∆+ :Î is locally stable if the largest among the ∆ + of these is less than unity. in table , results of numerically iterating the complete recursion ( ) are listed for the delay ∆ varying from ∆ = to ∆ = , all starting from i = , with n = , and the stated parameters. figure illustrates the discrete-and continuous-time dynamics summarized in table with complex roots . ± . i whose modulus is . , which is less than . the complexity implies cyclic behavior, and since the modulus is less than one, we see locally damped oscillatory convergence toÎ. for ∆ = , the characteristic equation is the cubic which has one real root . and complex roots . ± . i. 
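the root-based stability test outlined in this appendix can be checked numerically. the sketch below assumes the linearized recursion has the form ε(t+1) = a ε(t) + b ε(t−∆), so that its characteristic equation is λ^(∆+1) − a λ^∆ − b = 0; this form is inferred from the description of the constants a and b, since the displayed equations are missing, and the function names are hypothetical. it relies on numpy's `roots` routine and reports stability when the largest root modulus is below one.

```python
import numpy as np


def characteristic_roots(a, b, delay):
    """Roots of lambda**(delay + 1) - a*lambda**delay - b = 0 (assumed form)."""
    coeffs = np.zeros(delay + 2)
    coeffs[0] = 1.0      # lambda**(delay + 1)
    coeffs[1] -= a       # -a * lambda**delay
    coeffs[-1] -= b      # -b (same slot as the previous term when delay == 0)
    return np.roots(coeffs)


def spectral_radius(a, b, delay):
    """Largest absolute value (or modulus) among the roots."""
    return float(np.max(np.abs(characteristic_roots(a, b, delay))))


def is_locally_stable(a, b, delay):
    """Equilibrium is locally stable when every root lies inside the unit circle."""
    return spectral_radius(a, b, delay) < 1.0
```

for ∆ = 0 the single root is a + b, which recovers the one-dimensional condition on the derivative of f at the equilibrium; complex roots with modulus below one correspond to damped oscillatory convergence.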
here the modulus of the complex roots is . , which is greater than unity so that Î is not locally stable. in this case the dynamics depend on the initial value i . if i < , i_t oscillates but not in a stable cycle. if i > , the oscillation becomes unbounded.

key: cord- -ysxc rpl authors: cheek, martin; onana, jean michel title: deinbollia onanae (sapindaceae), a new, endangered, montane tree species from the cameroon
highlands date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ysxc rpl deinbollia onanae (sapindaceae-litchi clade) is here formally named and characterised as a new species to science, previously known as deinbollia sp. . cameroon has the highest species-diversity and species endemism known in this african-western indian ocean genus of species. deinbollia onanae is an infrequent tree species known from five locations in surviving islands of montane (or upper submontane) forest along the line of the cameroon highlands. it is here assessed as endangered according to the iucn standard, threatened mainly by clearance of forest for agriculture. the majority of tree species characteristic of montane forest (above m alt.) in the cameroon highlands are also widespread in east african mountains (i.e. are afromontane). deinbollia onanae is one of only a very small number of species that are endemic to (globally restricted to) the mountain range. it is postulated that this new species is in a sister relationship with deinbollia oreophila, which is a frequent species of a lower (submontane) altitudinal band of the same range. it is further postulated that seed dispersal is or was by frugivorous birds, potentially turacos, or alternatively by primates such as preuss's monkey. as part of the project to designate important plant areas (ipas) in cameroon (also known as tropical important plant areas or tipas), we are striving to name, assess the conservation status of, and include in ipas (darbyshire et al., ) rare and threatened plant species in the surviving, threatened natural habitat of the cross-sanaga interval (cheek et al., ). several of these species were previously designated as new to science but not formally published in a series of checklists (see below) ranging over much of the cross-sanaga interval.
the cross-sanaga has the highest vascular plant species diversity per degree square in tropical africa (barthlott et al., ) but natural habitat is steadily being cleared, predominantly for agriculture. in this paper we formally describe and name as deinbollia onanae cheek a high-altitude tree species formerly designated as "deinbollia sp. " (harvey et al., ). in the st century only two new species to science have been published in the genus, deinbollia mezilii d.w. thomas & d.j. harris (thomas & harris, ) and d. oreophila cheek (cheek & etuge, ), both from cameroon. but specimens often remain unidentified in herbaria. for example, specimens unidentified to species are listed in the gabon checklist (sosef et al., ). the genus has no major uses but the fruits of several species are reported as being edible by humans, and the seeds are probably primate-dispersed or dispersed by large frugivorous birds, and the flowers probably bee-pollinated (cheek & etuge, ), while the bark of d. grandifolia hook.f. is used medicinally and the wood for planks (burkill, ). fieldwork in cameroon resulting in the specimens cited in this paper was conducted under the terms of the series of memoranda of collaboration between irad (institute for agronomic research and development)-national herbarium of cameroon and royal botanic gardens, kew beginning in , the most recent of which is valid until th sept. . the most recent research permit issued for fieldwork under these agreements was /minresi/b /c /c /c (issued nov ), and the export permit number was /irad/dg/crra-nk/ssrb/ / (issued dec ). at the royal botanic gardens, fieldwork was approved by the institutional review board of kew entitled the overseas fieldwork committee (ofc) for which the most recent registration number was ofc - ( ). the most complete set of duplicates for all specimens made was deposited at ya, the remainder exported to k for identification and distribution following standard practice.
fieldwork methodology followed cheek & cable ( ). herbarium citations follow index herbariorum (thiers et al., ). specimens indicated "!" were seen by one or more of the authors, and were studied at k, p, wag, and ya. the national herbarium of cameroon, ya, was also searched for additional material of the new taxon as was tropicos (http://legacy.tropicos.org/specimensearch.aspx). during the time that this paper was researched in - , it was not possible to obtain physical access to material at wag (due to the transfer of wag to naturalis, leiden, subsequent construction work, and covid- travel and access restrictions). however, images for wag specimens were studied at https://bioportal.naturalis.nl/?language=en and those from p at https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form?lang=en_us. we also searched jstor global plants ( ) for additional type material of the genus not already represented at k. binomial authorities follow the international plant names index (ipni, ). the conservation assessment was made using the categories and criteria of iucn ( ). geocat was used to calculate red list metrics (bachman et al., ). herbarium material was examined with a leica wild m dissecting binocular microscope fitted with an eyepiece graticule measuring in units of . mm at maximum magnification. the drawing was made with the same equipment using a leica camera lucida attachment. flowers from herbarium specimens of the new species described below were soaked in warm water to rehydrate them, allowing dissection, characterisation and measurement. the terms and format of the description follow the conventions of cheek & etuge ( ). deinbollia sp. , because it has leaves less than m long, only sparsely hairy on the lower surface, leaflets more than cm long and sepals adaxially glabrous, flower buds very sparsely hairy and less than mm diam.
borne on a branched inflorescence - cm long, keys out in the flore du cameroun treatment of deinbollia (fouilloy & hallé, ) to a couplet leading to d. grandifolia hook.f. and d. maxima gilg. however, it differs from these two species in having ( -) - -jugate (not - -jugate) leaves, and in other characters shown in table . deinbollia onanae fouilloy & hallé ( ) . however, the affinities of deinbollia sp. may instead be with the recently described d. oreophila, since this species also occurs at altitude in the cameroon highlands and both species share numerous raised lenticels and also leaflets with high length:breadth ratios and with high numbers of secondary nerves. in fact, at two locations, mt kupe and bali ngemba, the two species are sympatric and their altitudinal ranges can overlap slightly (harvey et al., ). as the only two species of the genus to grow at altitude in the cameroon highlands, there is a possibility that they might be confused with each other. the two species can be separated using cheek et al., ( ) . deinbollia sp. sensu cheek in harvey et al., ( : ); cheek & etuge in cheek et al., ( : ); cheek in cheek et al., ( : , fig ). monoecious tree or treelet ( -) - (- ) m tall, when in flower, lacking exudate or scent when wounded, sparingly branched, nearly glabrous, apart from the inflorescence. stems of flowering branches terete - . cm diameter, solid (not hollow), second internode below apical inflorescence - . cm long, outer epidermis pale grey-brown, contrasting with the darker brown bases of the adjoining petiolar pulvini, lenticels dense, raised, elliptic, . - . mm long, concolorous, inconspicuous, glabrescent, hairs sparse to dense, dark brown, cylindric . - . mm long. leaves alternate, pinnately compound, ( -) - cm long; leaflets ( -) - per leaf on flowering stems, leaflets - per leaf on leaves of juvenile trees. petiole ( -) . - . cm long, terete, c. mm diameter at midpoint, drying pale yellow; basal pulvini dark brown; rhachis ( . 
-) - cm long, ( -) - -jugate on flowering stems, - -jugate on non-flowering stems of juvenile trees, the upper surface of the distal half strongly convex with two lateral wings, glabrescent with sparse inconspicuous hairs (de wilde ), or with dense dark brown appressed hairs (cable ) . leaflets mostly oblong ( . -) - . x ( . -) . - cm, (but leaflets of sterile branches to . cm wide), acumen c. cm long, base broadly acute, slightly asymmetric, (basalmost leaflets lanceolate and about half the length of the other leaflets) lateral nerves and midrib yellow, raised above and below, convex, ( -) - on each side of the midrib, nearly brochidodromous, the lateral nerve apices forming a weak irregular submarginal nerve, stronger branches uniting with the secondary nerve above, intersecondary nerves strong, parallel to the secondaries, tertiary and quaternary nerves reticulate raised yellow and conspicuous, on both surfaces, contrasting with the pale grey-green areolae (except in cable (k) where they are concolorous and so inconspicuous above, possibly an artefact of poor drying); upper surface glabrous, lower surface with inconspicuous, minute, cylindrical, glossy dark-brown hairs c. . mm long, distributed very sparsely along the midrib and secondary nerves, absent from mature leaves of non-flowering specimens (e.g. cheek ) but then the same hair type present on axillary buds and young leaves; petiolules yellow, - mm long. 
inflorescence a - -flowered, loose, terminal panicle x cm; auxiliary inflorescences sometimes present in the axils of the distal - leaves (cheek ) ; peduncle of terminal inflorescences - cm long; rhachis internodes ( -) - cm long, shortest in the distal portion; first order bracts caducous; indumentum brown hairy; primary branches - per inflorescence, - cm long, each bearing ( -) - partial-inflorescences; partial-peduncles - mm long, apex with a cluster of - bracteoles; bracteoles subulate to narrowly lanceolate, - mm long, apex narrowly acute, partial-inflorescences ( -) -flowered in glomerules, pedicels erect, terete, - x . mm (female), - x mm (male), sparsely puberulent, hairs . - . mm long. flowers white, scent not recorded, flower buds c. mm diam., open flowers c. x mm. calyx with sepals (- ), orbicular to broadly ovate, concave, green colour, - x . - . mm apex obtuse. corolla apex slightly exserted from calyx, petals rhombic or spatulate. male flowers (fig. c) . petals (- ), white, rhombic c. x mm, apex obtuse-acute, base cuneate, margins densely ciliate, hairs . mm long, outer surface glabrous, inner surface glabrous in distal half, proximal half compressed funneliform with ventral appendage adnate at margins, retuse (notched) for . mm at midline, adaxial surface moderately densely hairy, hairs c. . mm long. extra-staminal disc toruslike, glabrous, irregular, outer wall convex, lacking constrictions or teeth with c. poorly defined lobes, . - mm wide, c. . mm high. stamens c. , erect, slightly exserted by - mm at anthesis, c. - . mm long; filament - mm long, straight, densely puberulent the entire length (fig. d) ; anthers yellow, ovate-ellipsoid, - . mm long. ovary (vestigial, fig. e ) bilobed, c. x . mm densely appressed hairy, hairs c. . mm; style . mm long, glabrous. female flowers (fig. g) , with sepals and petals as the male flowers, but petals c. x . - . mm, usually detaching with a stamen attached, probably due to interlocking hairs (see fig. 
j) , proximal two-thirds claw-like, c. . mm wide, margin sparsely and irregularly ciliate; ventral appendage with apex deeply bilobed, lobes c. mm x mm; disc as in male flower. stamens c. (see fig. i ), included at anthesis, filament c. . mm long, proximal half to quarter glabrous, distal part densely hairy; anther as male flowers but indehiscent; ovary bilobed (see fig. h) , . x mm, indumentum as male flower, style c. mm long, apical mm, curved, surface papillate-minutely puberulent, apex subcapitate. infructescence, mature fruit and seed unknown. phenology: flowering in november-december; fruiting probably in august-september. local name and uses: none are known. "onana's deinbollia" is suggested as a common name. (onana, ), co-chair of the iucn central african red list authority for plants, former head of the national herbarium of cameroon ( - ), co-author of the red data book of the plants of cameroon and the taxonomic checklist of the vascular plants of cameroon (onana, ). he led field teams of ya staff working with those of k that resulted in the collection of several of the specimens of this species and personally collected this species in the field (onana , k, ya). two specimens of deinbollia matched no other and were named deinbollia cf. pinnata . in subsequent surveys this taxon was more explicitly referred to as a new species: deinbollia sp. (harvey et al., ; cheek et al., ). conservation: deinbollia onanae is rare at each of its five known locations so far as is known. despite many thousands of herbarium specimens being collected at kilum-ijim, at mt kupe and the bakossi mts, and at bali ngemba (cheek et al., ; harvey et al., ), only two specimens of this species, at two sites, were made at each of the first two locations and only one at the third location.
surveys at other sites in the cameroon highlands and elsewhere, e.g. at mt cameroon and at the lebialem highlands, failed to find this species (cheek et al., ; cable & cheek ; harvey et al., ; cheek et al., ). however, at dom, where a targeted search for this species was made by the first author, three specimens were made, each representing single, isolated trees . no more individuals than these were found. at adamaoua it has only been collected once, and only a single tree was then noted (w.j.j.o. & j.j.f.e. de wilde, b.e.e. de wilde-duyfjes (k)). none of these locations is formally protected for nature conservation. tree cutting for timber and habitat clearance for agriculture have long been known to be threats at all but the last of these locations (references cited above). we assess the area of occupancy of deinbollia onanae as km² using the iucn preferred km² cell size. therefore, we assess this species as endangered, en b ab(iii) using the iucn ( ) standard. we suggest that this species be included in forest restoration plantings within its natural range to partly reverse its move to extinction. however, the likely large (c. cm diam.), thin-walled seeds are probably recalcitrant, so not suitable for conventional seed-banking, and should not be allowed to dry before sowing. the discovery of a threatened, new species to science from surviving natural habitat in the cameroon highlands is not unusual. at most of the five locations from which we here describe deinbollia onanae, additional new or resurrected species to science, all threatened with extinction, have been documented in recent years. at mt kupe for example, coffea montekupensis stoffelen (stoffelen et al., ) and more recently the new species and genus to science kupeantha kupensis cheek & sonké (cheek et al., a). at bali ngemba, leptonychia kamerunensis engler & k.
krause (cheek et al., ), psychotria babatwoensis cheek (cheek et al., ) and allophylus ujori cheek (cheek & etuge, b), at mt oku and the ijim ridge kniphofia reflexa marais (maisels et al., ), scleria cheekii bauters (bauters et al., ), while at dom, the endemic epiphytic sedge coleochloa domensis muasya & d.a. simpson (muasya et al., ). no additional such species are known from the adamaoua location, probably because it is less completely sampled than the preceding four. however, deinbollia onanae is exceptional among these aforementioned species in that it is a new species of tree of predominantly montane forest. the many other plant species of the cameroon highlands that are newly discovered for science, resurrected or rediscovered have either been herbs or shrubs or are derived from submontane habitats ( - m altitude). the division between montane and submontane forest is well-marked in cameroon (cheek et al., ; cheek et al., ), although some species of tree, like deinbollia onanae, can occur on either side of the m contour. the tree species diversity of the montane forest is low compared with adjoining submontane forest and, in contrast, includes very few cameroon highland endemic tree species. almost all montane tree species of the cameroon highlands are widespread in montane forest in africa (afromontane), occurring also east of the congo basin in the rift mountains of east africa. apart from deinbollia onanae, the only other montane tree species endemic to the cameroon highland chain are the nearly extinct ternstroemia cameroonensis cheek (cheek et al., ) and the more common and widespread schefflera mannii (hook.f.) harms (keay, ). the high altitudinal range of deinbollia onanae is unrivalled west of the congo basin by any other species of the genus. elsewhere in africa it is matched only by deinbollia kilimandscharica taub., of mountains from ethiopia to malawi, reported to achieve m elevation in tanzania (davies & verdcourt, ).
most species of the genus in tropical africa are lowland forest shrubs; in the cameroon highlands only deinbollia oreophila also occurs regularly at altitude, and is largely confined to the submontane forest band, being recorded from ( -) - (- ) m altitude where it is often relatively frequent (cheek & etuge, ). we postulate, based on their shared morphological characters, that these two may be sister species that have segregated between two adjacent altitudinally based vegetation types in a similar way to certain clades of bird species in the cameroon highlands such as the turaco (njabo & sorensen, ). this hypothesis needs testing. it would most readily be done by a comprehensive species-level molecular phylogenomic study of deinbollia as has been achieved in several other genera, such as nepenthes l.f. (murphy et al., ). the fruits of deinbollia onanae are expected to be similar to those of other species of the genus, i.e., fleshy, indehiscent and large-seeded, suggesting that the now intermittent distribution of this species, along a line c. km along peaks of the cameroon highland line, is due to dispersal by animals. dispersal is, or was, possibly by a fruit-eating bird such as bannerman's turaco (tauraco bannermani), a species restricted to forest above m altitude (njabo & sorensen ). alternatively, dispersal might be by a primate species such as preuss's monkey (allochrocebus preussi), which lives in high altitude forest up to m altitude. both species are threatened with extinction and have been assessed as endangered under the iucn standard. formerly the range of deinbollia onanae may have been more continuous along the mountain range than today, but it was likely greatly reduced when forest was cleared for agriculture. large sections of the range, such as the bamenda highlands, are now so denuded of the original forest that they are today referred to as "the grasslands". it has been estimated in such areas that over .
% of the original forest has been lost. such discoveries as deinbollia onanae underline the urgency for publishing further discoveries while it is still possible, since threats to such species newly discovered for science are clear and current, putting these species at high risk of extinction. about new species of vascular plant have been discovered each year for the last decade or more. until species are known to science, they cannot be assessed for their conservation status and the possibility of protecting them is reduced. documented extinctions of plant species are increasing, e.g. oxygyne triandra schltr. of cameroon is now known to be globally extinct (cheek et al., b). in some cases species appear to be extinct even before they are known to science, such as vepris bali cheek, once sympatric with deinbollia onanae at bali ngemba, and elsewhere, nepenthes maximoides cheek (king & cheek, ). most of the > cameroonian species in the red data book for the plants of cameroon are threatened with extinction due to habitat clearance or degradation, especially of forest for small-holder and plantation agriculture, e.g. oil palm, following logging. efforts are now being made to delimit the highest priority areas in cameroon for plant conservation as tropical important plant areas (tipas) using the revised ipa criteria set out in darbyshire et al., ( ). this is intended to help avoid the global extinction of additional endemic species such as the endangered deinbollia onanae, which will be included in the proposed ipas of mt kupe, bali ngemba, kilum-ijim and dom.
supporting red list threat assessments with geocat: geospatial conservation assessment tool
global distribution of species diversity in vascular plants: towards a world map of phytodiversity
scleria cheekii, a new species of scleria subgenus hypoporum (cyperaceae, cyperoideae, sclerieae) from cameroon
plastid and nuclear dna markers reveal intricate relationships at subfamilial and tribal levels in the soapberry family (sapindaceae)
phylogeny and circumscription of sapindaceae revisited: molecular sequence data, morphology and biogeography support recognition of a new family
the plants of mt cameroon, a conservation checklist
plant inventory for conservation management: the kew-earthwatch programme in western cameroon
a new submontane species of deinbollia (sapindaceae) from western cameroon and adjoining nigeria
allophylus conraui (sapindaceae) reassessed and allophylus ujori described from western cameroon
rubiaceae), a new genus from cameroon and equatorial guinea
three new or resurrected species of leptonychia (sterculiaceae-byttneriaceae-malvaceae) from west-central africa
mapping plant biodiversity on mt the biodiversity of african plants (proceedings xiv aetfat congress)
four new submontane species of psychotria (rubiaceae) with bacterial nodules from western cameroon
vepris bali (rutaceae), a new critically endangered (possibly extinct) cloud forest tree species from bali ngemba, cameroon
the plants of dom
the plants of mefou proposed national park
the phytogeography and flora of western cameroon and the cross river-sanaga river interval
new scientific discoveries: plants and fungi
the plants of mount oku and the ijim ridge, cameroon, a conservation checklist
the plants of kupe, mwanenguba and the bakossi mountains
ternstroemia cameroonensis (ternstroemiaceae), a new medicinally important species of montane tree, nearly extinct in the highlands of cameroon
taxonomic monograph of oxygyne (thismiaceae), rare achlorophyllous mycoheterotrophs with strongly disjunct distribution
important plant areas: revised selection criteria for a global approach to plant conservation
the plants of bali ngemba forest reserve, cameroon. a conservation checklist
the plants of the lebialem highlands, a conservation checklist
international plant names index. the royal botanic gardens, kew, harvard university herbaria & libraries and australian national botanic gardens
iucn red list categories and criteria: version . . second edition
jstor global plants. . continuously updated) available at
sapindaceae
nepenthes maximoides (nepenthaceae) a new, critically endangered (possibly extinct) species in sect. alatae from luzon, philippines showing striking pitcher convergence with n. maxima
rare plants on mt oku summit, cameroon
a new species of epiphytic coleochloa (cyperaceae) from cameroon
a phylogenomic analysis of nepenthes (nepenthaceae)
origin of bannerman's turaco tauraco bannermani in relation to historical climate change and the distribution of west african montane forests
the vascular plants of cameroon. a taxonomic checklist with iucn assessments. flore du cameroun . irad-national herbarium of cameroon
red data book of the flowering plants of cameroon, iucn global assessments
sapindaceae in engler, a. das pflanzenreich iv. heft c
checklist of gabonese vascular plants
a new species of coffea (rubiaceae) and notes on mt kupe (cameroon)
index herbariorum: a global directory of public herbaria and associated staff
notes on deinbollia species from cameroon
new sapindaceae from cameroon and nigeria

this paper was completed as part of the cameroon tropical important plant areas project, supported by the players of people's postcode lottery. the second author's contribution to this paper was made possible by visits from cameroon to rbg, kew, u.k. sponsored by the bentham-moxon trust of rbg, kew.
most of the specimens cited in this paper were collected with the support of volunteers of earthwatch europe, oxford and by our colleagues kenneth tah, olivier sene, victor nana, verina ingram, david okebiro, assefa, b. gapta, h. ndue, m. kissimou, rene nfon, stuart cable, ben pollard and the late martin etuge for assistance in the field. drs florence ngo ngwe, eric nana, jean betti lagarde, the current and former directors, of irad-national herbarium of cameroon, yaoundé, and their staff are thanked for expediting the collaboration between our two institutes. janis shillito typed the manuscript. two anonymous reviewers are thanked for reviewing an earlier version of this paper. key: cord- -qvuv g authors: amster, guy; murphy, david a.; milligan, william m.; sella, guy title: changes in life history and population size can explain relative neutral diversity levels on x and autosomes in extant human populations date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: qvuv g in human populations, relative levels of neutral polymorphism on the x and autosomes differ markedly from each other and from the naive theoretical expectation of ¾. these differences have attracted considerable attention, with studies highlighting several potential causes, including male biased mutation and reproductive variance, historical changes in population size, and selection at linked loci. we revisit this question in light of our new theory about the effects of life history and given pedigree-based estimates of the dependence of human mutation rates on sex and age. we demonstrate that life history effects, particularly higher generation times in males than females, likely had multiple effects on human x-to-autosomes (x:a) polymorphism ratios, through the extent of male mutation bias, the equilibrium x:a ratios of effective population sizes, and differential responses to changes in population size. 
we also show that the standard approach of using divergence between species to correct for the male bias in mutation results in biased estimates of x:a effective population size ratios. we obtain alternative estimates using pedigree-based estimates of the male mutation bias, which reveal x:a ratios of effective population sizes to be considerably greater than previously appreciated. we then show that the joint effects of historical changes in life history and population size can explain x:a polymorphism ratios in extant human populations. our results suggest that ancestral human populations were highly polygynous; that non-african populations experienced a substantial reduction in polygyny and/or increase in male-biased generation times around the out of africa bottleneck; and that extant diversity levels were affected by fairly recent changes in sex-specific life history. significance statement all else being equal, the ratio of diversity levels on x and autosomes at selectively neutral sites should mirror the ratio of their numbers in the population and thus equal ¾. in reality, the ratios observed across human populations differ markedly from ¾ and from each other. because from a population perspective, autosomes spend an equal number of generations in both sexes while the x spends twice as many generations in females, these departures from the naïve expectations likely reflect differences between male and female life histories and their effects on mutation processes. indeed, we show that the ratios observed across human populations can be explained by demographic history, assuming plausible, sex-specific mutation rates, generation times and reproductive variances. neutral polymorphism patterns on the x and autosomes reflect a combination of evolutionary forces. everything else being equal, the x to autosome (x:a) polymorphism ratio should be ¾, because the number of x-chromosomes in a population is ¾ that of autosomes. 
a complication, however, is that autosomes spend an equal number of generations in diploid form in both sexes, whereas the x spends twice as many generations in diploid form in females as in haploid form in males. as a result, the x:a polymorphism ratio can also be shaped by differences in male and female life history and mutation processes, as well as by differences in the effects of demographic history and selection at linked sites on the x and autosomes. the effects of these factors have been studied theoretically ( ) and in relation to observations in many species ( ) ( ) ( ) ( ) ( ) ( ). notably, their effects on polymorphism ratios in human populations have garnered considerable interest over the past decade ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ). the impact of selection at linked sites on neutral diversity levels could differ for x and autosomes because of differences in recombination rates, in the density of selected regions, and in the efficacy and modes of selection. notably, the hemizygosity of the x in males leads to a more rapid fixation of recessive or partially recessive beneficial alleles and to a more rapid purging of recessive deleterious ones ( , ). accounting for these effects and for recombination rates suggests that in humans, and in mammals more generally, the effects of selection at linked sites should be stronger on the x ( ( ) , but see ( ) ). to evaluate these effects empirically, several studies have examined how polymorphism levels on the x and autosomes vary with genetic distance from putatively selected regions, e.g., from coding and conserved non-coding regions ( , , , ( ) ( ) ( ) . in most hominids, including humans, such comparisons confirm the theoretical expectation that selection at linked loci reduces x:a ratios ( , , ).
they further suggest that the effects are minimal sufficiently far from genes ( , ) , thereby providing an opportunity to examine the effects of other factors shaping x:a ratios in isolation, by considering regions that are minimally affected. even far from genes, however, the x:a ratios in humans and other hominids differ markedly from the naive expectation ( , , ) . polymorphism levels on the x and autosomes are typically divided by divergence from an outgroup (e.g., divergence to orangutan or rhesus macaque is used to normalize polymorphism levels in humans) in order to control for the effects of higher mutation rates in males and variation in mutation rates along the genome ( ) . the normalized estimates of x:a ratios in regions far from genes range between ¾ and among human populations, generally decreasing with the distance from africa ( , , , ) . ratios exceeding ¾ have also been observed in most other hominids ( ( ), but see ( )). these departures from ¾ and differences among populations and species have been attributed in part to the effects of demographic history, in particular to historical changes in population size. if we assume that the effective population size on the x is generally smaller than on autosomes, then changes in population size will have a different impact on polymorphism levels on x and autosomes ( , ( ) ( ) ( ) . notably, population bottlenecks that occurred sufficiently recently, such as the out of africa (ooa) bottleneck in human evolution, will have decreased the x:a ratio, because a greater proportion of x-linked lineages will have coalesced during the bottleneck ( ) . indeed, simulation studies suggested that historical changes in population size have contributed substantially to the x:a ratios decrease with the distance from africa ( ) . historical differences between males and females may have also played a role. 
for example, keinan and colleagues speculated that male-biased migration or longer male generation times during the out-of-africa bottleneck contributed to the lower x:a ratios in non-africans ( ) . sex differences in life history traits are also likely to have had substantial effects on x:a ratios. the most straightforward of these effects arises from higher reproductive variances in males than in females (e.g., due to sexual selection ( )), which cause higher coalescence rates on autosomes and thus increased x:a ratios ( , ) . this increase is theoretically bound by a multiplicative factor of # ( ), but is probably much smaller in reality. nonetheless, greater male reproductive variances in extant hunter-gatherers and hominid species ( ) suggest that male-biased variance plausibly contributed to observed differences in x:a ratios, as well as to their departure from # . in addition, higher generation times in males may have also had substantial yet underappreciated effects ( ) . higher generation times in males decrease coalescence rates on autosomes compared to the x and thus the x:a ratio of effective population sizes ( ) . in addition, mutation rates in humans, and likely in mammals more generally, increase more rapidly with paternal than with maternal age ( ) ( ) ( ) ( ) ( ) ( ) . longer generation times in males therefore decrease mutation rates on the x relative to autosomes. normalizing polymorphism estimates by divergence to an outgroup may not fully account for this mutational effect if, as is likely, male mutation bias evolves over phylogenetic time scales ( ) ( ) ( ) ; moreover, normalized ratios also reflect a non-mutational effect of generation times on x:a divergence ratios, as longer generation times in males imply fewer generations on autosomes relative to the x since the species split ( , ) . thus, male and female generation times can in principle affect x:a ratios in multiple ways, which should be considered jointly. 
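the effect of male-biased reproductive variance on the x:a ratio of effective population sizes can be illustrated with the classical breeding-sex-ratio formulas. this is a simplified stand-in, not the variance-based model used in this paper: the sketch assumes $N_e^A = 4 N_m N_f / (N_m + N_f)$ and $N_e^X = 9 N_m N_f / (4 N_m + 2 N_f)$, where $N_m$ and $N_f$ are the numbers of breeding males and females. the function names and example values are hypothetical.

```python
def autosomal_ne(n_males, n_females):
    """Classical effective size for autosomes with unequal breeding sex ratio."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def x_linked_ne(n_males, n_females):
    """Classical effective size for the X with unequal breeding sex ratio."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_a_ratio(n_males, n_females):
    """X:A ratio of effective population sizes under the two formulas above."""
    return x_linked_ne(n_males, n_females) / autosomal_ne(n_males, n_females)


if __name__ == "__main__":
    # Hypothetical numbers: fewer breeding males mimics higher male
    # reproductive variance (more polygyny).
    for n_males in (1000, 500, 100, 10):
        print(n_males, round(x_to_a_ratio(n_males, 1000), 4))
```

under these formulas the ratio equals ¾ when the numbers of breeding males and females are equal, and it rises toward 9/8 as breeding males become rare relative to females.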
here we examine these effects, and those of life history more generally, on polymorphism ratios in humans. we begin with general considerations: about the effects in populations of constant size, about the effects in response to changes in population size, and about biases introduced by normalizing polymorphism ratios by divergence to an outgroup. we then estimate x:a polymorphism ratios in six human populations in which historical changes in population size were inferred previously, and show that considering these effects jointly can explain the observed ratios.

life history effects in populations of constant size. in a parallel paper ( ) , we derive expressions for neutral x:a polymorphism ratios in a panmictic population of constant size, under a model that captures quite general life history effects. the model assumes that the population is divided into sex-specific age classes, with female and male proportions $p_F$ and $p_M$ at birth, respectively ($p_F + p_M = 1$), and that the sizes of subsequent age classes of each sex decline with age, reflecting sex- and age-specific mortality. fecundity also depends on sex and age and incorporates sex-specific reproductive variances and correlations in the numbers of offspring at different ages. generation times in females, $G_F$, and in males, $G_M$, are defined as the expectations of maternal and paternal ages. mutation rates can vary with sex and age, with their per generation rates in females, $\mu_F$, and in males, $\mu_M$, defined as expectations over parental ages. the expected numbers of offspring of each sex necessarily equals , but female and male reproductive variances, $V_F$ and $V_M$ respectively, may differ due to sex- and age-dependent mortality and fecundity. we show that the x:a ratio of effective population sizes is then: where $f(x) = \frac{2x+4}{3x+3}$. 
here we define the effective population sizes such that they equal the number of individuals under the standard wright-fisher model, but we note that they are sometimes defined as the inverse of coalescence rates (e.g., in the statement that all else being equal, the x:a ratio of $N_e$ is ¾). we refer to the x:a ratio of inverse coalescence rates as the genealogical ratio, which in this case is simply ¾ times the ratio of $N_e$ values. we also show that the x:a ratio of expected heterozygosities is females and males at birth in humans are nearly equal, we henceforth assume that they are, and thus that the latter ratio equals $(2 + V_M)/(2 + V_F)$; for brevity, we refer to it as the ratio of reproductive variances. all three ratios in eq. are male-biased in many taxa ( ) ( ) ( ) . we examine their effects on x:a polymorphism ratios in humans, given available estimates. sex-specific reproductive variances were measured in five extant hunter-gatherer groups, albeit using small sample sizes, and found to be . - . fold higher in males ( ) , with reproductive variance ratios corresponding to a - % increase in x:a polymorphism ratios. sex-specific generation times were measured in seven hunter-gatherer groups, with mean generation times found to vary between and years and generation times ratios between . and . ( , ) , corresponding to a . %- . % decrease in x:a polymorphism ratios. male mutation bias, $\alpha$, was estimated in pedigree studies ( ) , and found to increase approximately linearly on the sex-

normalizing ratios of polymorphism using divergence. most studies estimate x:a genealogical ratios by dividing polymorphism levels by estimates of the number of substitutions since the split from an outgroup (e.g., ( , , , ) ). this "normalization" is meant to control for differences in mutation rates on the x and autosomes, due to male mutation bias and differences in base composition. 
we now ask whether this practice is valid, and in particular whether it controls for male-biased mutation when we account for life history effects on polymorphism and substitution rates. to this end, we rely on eq. for the polymorphism ratio (ignoring changes in population size) and on a parallel expression for the substitution ratio where by '*' we denote parameters averaged over the lineage on which substitutions are measured (the specific form of averaging is detailed in ( )). the second term (on the right-hand side) is the x:a genealogical ratio. the first term (in brackets) includes the mutational effect on the polymorphism ratio and the terms introduced by the normalization. for the normalization to fulfill its purpose of canceling out the effect of male mutation bias, this term should equal 1. previous work suggests that male mutation bias ($\alpha = \mu_M/\mu_F$) evolves substantially over phylogenetic time scales ( , ) , and therefore the mutational term is unlikely to cancel out. however, even if it did, the dependence of the substitution ratio on the generation times ratio introduces an additional term, ($G_M^*/G_F^*$). both terms are likely to lead to bias in estimates of the genealogical x:a ratio in a given population. furthermore, as the degree of male mutation bias probably varies among populations (e.g., due to variation in generation times ( ) ), relative estimates of genealogical ratios in different populations will likely be biased as well. we can assess the severity of these biases for estimates of human x:a genealogical ratios by comparing divergence- and pedigree-based estimates of the male mutation bias, $\alpha$ (fig. ). if differences in the mutation rate between the x and autosomes arise predominantly from the male mutation bias (see discussion), then we would expect estimates of the mutational effects on x:a polymorphism ratios based on contemporary pedigree studies to be more reliable. 
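to make the mutational part of this argument concrete, the sketch below computes the x:a ratio of mutation rates implied by a given male mutation bias, assuming only that the x spends two thirds of its time in females and one third in males while autosomes spend half their time in each sex. it then compares the ratio implied by a pedigree-based bias with the one implied by a divergence-based bias. this covers the mutational component alone and ignores the non-mutational generation-time term; the function names and the example values of the bias are hypothetical, not estimates from this study.

```python
def mutation_rate_ratio(alpha):
    """X:A ratio of per-generation mutation rates for male bias alpha = mu_M / mu_F."""
    if alpha < 0:
        raise ValueError("male mutation bias must be non-negative")
    mu_f = 1.0            # female rate, arbitrary units
    mu_m = alpha * mu_f   # male rate
    mu_x = (2.0 * mu_f + mu_m) / 3.0   # X: 2/3 of generations in females
    mu_a = (mu_f + mu_m) / 2.0         # autosomes: half in each sex
    return mu_x / mu_a


def mutational_mismatch(alpha_polymorphism, alpha_divergence):
    """Ratio of mutational effects; equals 1 only if the two biases agree."""
    return mutation_rate_ratio(alpha_polymorphism) / mutation_rate_ratio(alpha_divergence)


if __name__ == "__main__":
    # Hypothetical values: a larger bias on the polymorphism time scale
    # than on the divergence time scale.
    print(mutation_rate_ratio(1.0))        # no male bias
    print(mutation_rate_ratio(4.0))
    print(mutational_mismatch(4.0, 2.0))
```

when the bias relevant to polymorphism differs from the one averaged over the lineage to the outgroup, the mismatch departs from 1, which is the sense in which dividing by divergence fails to cancel the mutational effect.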
indeed, while mutation rates may have evolved over the period in which neutral diversity in extant human populations arose (e.g., on the order of ~ . my ( ); see discussion), such changes were likely smaller than the changes over phylogenetic time scales (e.g., on the order of ~ and ~ my for divergence from orangutans and rhesus macaques, respectively ( )). second, while both divergence- and pedigree-based estimates of $\alpha$ depend on the sex ratio of generation times, in pedigree studies this dependence reflects only the effect of generation times on mutation rates (as opposed to the non-mutational term ($G_M^*/G_F^*$) for divergence) and is explicit. pedigree-based estimates of $\alpha$, again assuming a linear dependence on $G_M/G_F$ ( ) and estimates of $G_M/G_F$ in extant hunter-gatherers ( , ) , range from . to . , and estimates in most societies point to the higher end of this range. these estimates are approximately twofold greater than those based on human-orangutan or human-macaque divergence (fig. ) , commonly used to normalize human x:a ratios ( , , , ) . this strongly suggests that current estimates of human x:a genealogical ratios are substantially biased.

table s ) and on contemporary pedigree studies (si section ; ( )). pedigree-based estimates strongly depend on, and are therefore shown as a function of, the generation times ratio, $G_M/G_F$. they depend only weakly on the average generation time, as shown by the (cyan) range corresponding to average generation times between - years.

revised estimates of human x:a genealogical ratios. we therefore revisit the estimation of genealogical x:a ratios in human populations. we first estimate x:a polymorphism ratios normalized by divergence, in order to correct for local variation in mutation rates on x and autosomes ( ) , and then we rely on pedigree studies to correct for the bias in divergence-based estimates of $\alpha$. we estimate normalized, neutral polymorphism ratios in the absence of selection at linked sites in two ways (si section . ). 
first, we apply the standard method ( ), based on measuring polymorphism and divergence at putatively neutral sites far from exons. second, rather than imposing a threshold distance from exons, we use sites throughout the genome and rely on the mcvicker et al. b-maps to correct for the effects of selection at linked sites ( ) ; this approach allows us to use more data. our estimates based on the two approaches are consistent (fig. s ) and we henceforth rely on estimates using the second approach (fig. ) , which are more precise. for unclear reasons, they do not agree with the estimates of arbiza et al., which rely on similar data but are slightly higher in yri and much higher in other populations ( ) . to examine how much male mutation bias affects estimates of x:a genealogical ratios, we assume $\alpha$ = . , corresponding to the pedigree-based estimate with the average $G_M/G_F$ measured in extant hunter-gatherers (fig. ) . we then obtain the corrected estimates by multiplying our divergence-normalized estimates by a correction factor that replaces the divergence-based estimate of male mutation bias with the pedigree-based value. dividing by the divergence-based term also removes the potential effect of differential levels of ancestral polymorphism on x and autosomes. the resulting estimates are ~ % greater than those based on divergence alone, suggesting that the genealogical x:a ratios in humans are considerably greater than previously appreciated (fig. ) . as we already noted, pedigree-based estimates of $\alpha$ strongly depend on $G_M/G_F$. since this ratio likely varies over time and among populations, we cannot estimate genealogical x:a ratios reliably. this limitation is not specific to our analysis, but instead highlights the difficulty of teasing apart the mutational and genealogical effects on x:a polymorphism ratios without making explicit or implicit assumptions about male mutation bias and its evolution.

explaining genealogical ratios in human populations. 
instead, we turn the question on its head and ask whether the effects of sex ratios of generation times and reproductive variances as well as historical changes in population size could explain the estimates of x:a polymorphism ratios that are normalized by divergence. to this end, we rely on pairwise msmc-based estimates of historical, autosomal effective population sizes for the six g populations in which these were inferred ( fig. a; ( ) ). in all cases considered, we assume that = , to match the assumptions of these previous demographic inferences ( ) . we first consider the ratio in yri, then the reduction in ratio in ceu relative to yri, and lastly, the ratios in all six populations jointly. polymorphism ratio in yri. the x:a polymorphism ratio in yri is remarkably high (fig. ) , which is indicative of substantial polygyny (i.e., that a minority of males sired offspring with multiple females). accounting for historical changes in population size and assuming, for example, the average generation trade-off in which assuming a higher generation times ratio implies more extreme male-biased reproductive variances (fig. ) . given that the generation times ratio was likely greater than , our findings suggest substantial polygyny (where a minority of males sire offspring with multiple females) in the ancestors of yri, and thus of other human populations ( ) . ) are constrained to be between and . ; these ranges are somewhat arbitrary, yet they are clearly possible, given estimates in extant hunter-gatherers ( ) . requiring the ratios to have been the same in both populations and constant over time (model ii in fig. ), we find that the maximal reduction in the x:a polymorphism ratio in ceu relative to yri is . % (see si section for details on the maximization). allowing the ratios to have different values before and after the split between yri and ceu but requiring them to be the same in both populations (model iii in fig. 
), results in only a slightly greater maximal reduction of . % in the x:a polymorphism ratio. further allowing for population specific parameter values after the split (model iv in fig. ), we find that the maximal reduction in the ratio in ceu relative to yri rises to %, which is greater than the reduction observed. these results illustrate, quite surprisingly, that fairly recent changes to life history traits (relative to the average age of neutral polymorphism in either population) can dramatically affect x:a polymorphism ratios. in particular, they show that the reduction in polymorphism ratio in ceu relative to yri can be explained by assuming that life history parameters varied within plausible ranges over time and among populations. ⁄ are allowed to vary within the ranges detailed in the text, and are chosen to maximize the extant reduction in polymorphism ratios in ceu relative to yri under the following constraints: (ii) constant ratios over time and populations; (iii) ratios can differ before and after the populations split but are the same in both populations; (iv) ratios are the same before the populations split but different after. the estimated reduction in polymorphism ratio in ceu relative to yri is shown for comparison. polymorphism ratios in six populations. next, we examine whether variation in life history can explain the polymorphism ratios observed in all six populations jointly. for comparison, we first consider the model without sex-specific life history, which expectedly yields a poor fit (fig. b ). next, we allow for sex-specific life history parameters (within the ranges detailed above) and let them vary among the intervals defined by the approximate split times among populations (fig. a) . in particular, we seek the parameter values that minimize a weighted squared distance between predicted and estimated polymorphism ratios (see si section ). 
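the fitting step described above can be sketched as a search over a grid of candidate parameter values for the combination that minimizes a weighted squared distance between predicted and estimated ratios. the sketch assumes inverse-variance weights (each deviation divided by the standard error of the estimate) and an exhaustive grid search; the actual weighting and optimization are described in the si and may differ. the `predict` function, parameter grids and numbers are hypothetical placeholders.

```python
from itertools import product


def weighted_squared_distance(predicted, estimated, sems):
    """Sum of squared deviations, each scaled by the standard error of the estimate."""
    return sum(((p - e) / s) ** 2 for p, e, s in zip(predicted, estimated, sems))


def grid_search(predict, grids, estimated, sems):
    """Return the parameter combination on the grid with the smallest distance."""
    best_params, best_distance = None, float("inf")
    for params in product(*grids):
        distance = weighted_squared_distance(predict(*params), estimated, sems)
        if distance < best_distance:
            best_params, best_distance = params, distance
    return best_params, best_distance


if __name__ == "__main__":
    # Toy model (hypothetical): two populations whose predicted ratios
    # depend on two life history parameters a and b.
    def toy_predict(a, b):
        return [a, a * b]

    estimated = [0.80, 0.64]   # made-up "observed" ratios
    sems = [0.01, 0.01]        # made-up standard errors
    grids = [[0.7, 0.8, 0.9], [0.7, 0.8, 0.9]]
    print(grid_search(toy_predict, grids, estimated, sems))
```

an exhaustive grid is only practical for a handful of parameters; with many interval- and population-specific parameters a numerical optimizer would be the more natural choice.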
allowing sex-specific life history parameters to vary over time but not among populations substantially improves the fit, but fails to account for some features, e.g., the ratio in yri (fig. b) . further allowing sex-specific life history parameters to differ after populations split from one another, we are able to closely match the point estimates for all six populations (with mean distance < . sem averaged over the observed estimates; fig. b ).

comparison of estimated normalized x:a polymorphism ratios with those predicted under the historical changes in population size shown in a, and assuming: i) no life history effects (black); ii) sex-specific life history parameters that vary among the demarcated intervals and were chosen to best fit the estimates (red); iii) the best fit further allowing sex-specific life history parameters to vary among populations after they split (blue). see text and si section for details.

life history traits during human evolution. our results illustrate that historical changes in sex-specific life history traits and in population size can explain the x:a polymorphism ratios in extant human populations. our analysis relied on somewhat arbitrary decisions to fit few extant polymorphism ratios using many 'historical' life history parameters, i.e., about possible parameter ranges, the time intervals in which they could vary, and the distance between predictions and estimates that was minimized. alternative decisions would doubtless result in other sets of parameters that match the estimates of polymorphism ratios to a similar degree (accounting for uncertainty). the specific set of values we found (fig. s ) should therefore be treated as one of many possibilities; narrowing these sets down will require bringing to bear richer summaries of the data (see discussion). nonetheless, our results suggest a few conclusions. 
the first is that ancestral human populations were highly polygynous, as explaining the polymorphism ratios in yri would be difficult otherwise. second, they indicate that non-african populations likely experienced a substantial reduction in polygyny and/or increase in male-biased generation times around the ooa bottleneck, helping to explain the large reduction in polymorphism ratios in non-african populations. third, we find that, quite surprisingly, fairly recent changes in sex-specific life history have had a substantial impact on extant diversity levels, and in particular can account for the reduction in ratios between european and asian populations.

life history traits, and generation times in particular, affect x:a polymorphism ratios in multiple ways, and these effects can be surprisingly strong. in particular, we have shown that in humans, higher generation times in males than in females substantially decrease the mutation rate and increase coalescence rates on the x relative to autosomes. they also substantially enhance the reduction in the x:a ratio due to bottlenecks (or alternatively increase the ratio due to population growth), both by accelerating the response time in generations and by increasing the number of generations per unit time on the x relative to autosomes. these generation time effects compound those of higher reproductive variance in males (i.e., polygyny) that were explored by previous studies ( , , , , , ) . higher male variances decrease coalescence rates on x relative to autosomes and dampen the effects of changes in population sizes on x:a diversity ratios. as we show, considered jointly, these effects can explain observed x:a polymorphism ratios across human populations. while our results have clear implications about the values of life history traits in recent human evolution, our ability to draw quantitative conclusions is limited by remaining gaps in our knowledge about demographic history. 
current demographic inferences assume that the autosomal generation time and mutation rate were constant, whereas both have doubtless changed over time. ignoring such changes introduces errors in estimates of effective population sizes and in their assignment to past dates (i.e., in years). accounting for these errors is unlikely to change our qualitative conclusions, but would likely affect the life history parameter estimates. the same is true of demographic complications that we did not consider, including historical migration/admixture among populations ( ) and ancient introgression ( , ) , and more speculative sex biases in these processes ( , , ) . our analysis further relies on pedigree-based estimates of mutation rates in contemporary humans in order to model mutational effects on x:a polymorphism ratios. we used these estimates to infer x:a genealogical ratios and to relate models of historical life history trait values with extant x:a polymorphism ratios. in so doing, we assumed that mutation rates on the x are well approximated by the averages over rates in males and females, i.e., that $\mu_X = \tfrac{2}{3}\mu_F + \tfrac{1}{3}\mu_M$ (because pedigree-based estimates rely on autosomal mutations). although this assumption is inexact, given the evidence for x-specific modifiers of mutation rates, the observed effects are fairly subtle ( ) . in the future, larger pedigree studies in humans, with a sufficient number of mutations on the x, should allow direct estimation of mutation rates on the x. our approach also assumed that male mutation bias and its dependence on generation times observed today hold for the entire period over which extant neutral diversity in humans arose, e.g., over the past ~ . my ( ) . the evidence regarding evolutionary change in male mutation bias is contradictory. lineage-specific, divergence-based estimates of $\alpha$ in great apes are extremely variable, with estimates of . for humans, . for chimpanzees, . for gorillas and . 
for orangutans ( ) (some but probably not most of this variation could be due to changing sex ratios of generation times ( ) ). in contrast, pedigree-based estimates of $\alpha$ in extant species spanning a much greater phylogenetic range, i.e., mammals, appear to be stable (albeit with large confidence intervals), and are in fact consistent with the estimates in humans (( - ); felix wu and molly przeworski, personal communication). larger pedigree-based studies in other catarrhine species will likely resolve this apparent conflict and inform the plausibility of our assumption. more generally, we note that pedigree-based estimates of the autosomal mutation rate have triggered a wholesale revision of the chronology of human evolution obtained from genetic data ( ) . similarly, our results call for a revision of human x:a polymorphism ratios in light of pedigree-based estimates of the male mutation bias. novel insights about mutation may also facilitate direct inferences about historical changes in life history traits. such inferences could rely on the fact that different kinds of mutations have distinct dependencies on male and female generation times ( ) but share the same genealogies. it may therefore be possible to infer male and female generation times from the ratios of different kinds of mutations of the same age on x- and autosome-linked genealogies. it might also be possible to extend methods like msmc to utilize data about different kinds of mutations on the x and autosomes jointly, in order to infer historical changes in both generation times and effective population sizes, and possibly even sex-dependent migration between populations. while we focused our analysis on humans, for which there are more data, there is every reason to think that life history substantially affected x:a polymorphism ratios in other species as well. 
notably, sex differences in life history traits and changes in population size, as well as extensive variation in these factors among populations and closely related species, are pervasive in many taxa (e.g., among vertebrates ( )). it almost necessarily follows that the life history and particularly the generation time effects that we describe would have affected their x:a polymorphism ratios.

the effect of life-history and mode of inheritance on neutral genetic variability
selection, recombination, and dna polymorphism in drosophila. non neutral-evolution: theories and molecular data
contrasting patterns of x-linked and autosomal nucleotide variation in drosophila melanogaster and drosophila simulans
testing models of selection and demography in drosophila simulans
the different levels of genetic diversity in sex chromosomes and autosomes
revisiting an old riddle: what determines genetic diversity levels within species?
genomic signatures of sex-biased demography: progress and prospects
sex-biased evolutionary forces shape genomic patterns of human diversity
accelerated genetic drift on chromosome x during the human dispersal out of africa
the ratio of human x chromosome to autosome diversity is positively correlated with genetic distance from genes
can a sex-biased human demography account for the reduced effective population size of chromosome x in non-africans?
analyses of x-linked and autosomal genetic variation in population-scale whole genome sequencing
evidence for increased levels of positive and negative selection on the x chromosome versus autosomes in humans
great ape genetic diversity and population history
extreme selective sweeps independently targeted the x chromosomes of the great apes
a mathematical theory of natural and artificial selection
the relative rates of evolution of sex chromosomes and autosomes
the effects of deleterious mutations on evolution at linked sites
widespread genomic signatures of natural selection in hominid evolution
classic selective sweeps were rare in recent human evolution
contrasting x-linked and autosomal diversity across human populations
search for gravitational-wave bursts from soft gamma repeaters
a human population bottleneck can account for the discordance between patterns of mitochondrial versus nuclear dna variation
population bottlenecks and patterns of human polymorphism
population size changes reshape genomic patterns of diversity
means, variances, and ranges in reproductive success: comparative evidence
life history effects on neutral diversity levels of autosomes and sex chromosomes
the origins, patterns and implications of human spontaneous mutation
rate of de novo mutations and the importance of father's age to disease risk
strong male bias drives germline mutation in chimpanzees
new observations on maternal age effect on germline de novo mutations
parental influence on human germline de novo mutations in , trios from iceland
overlooked roles of dna damage and maternal age in generating human germline mutations
doubts about complex speciation between humans and chimpanzees
do variations in substitution rates and male mutation bias correlate with life-history traits?
a study of mammalian genomes
life history effects on the molecular clock of autosomes and sex chromosomes
evolution in age-structured populations
cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies
sexual selection and the origins of human mating systems
genome analyses substantiate male mutation bias in many species
reduced representation genome sequencing suggests low diversity on the sex chromosomes of tonkean macaque monkeys
do variations in substitution rates and male mutation bias correlate with life-history traits? a study of mammalian genomes
revising the human mutation rate: implications for understanding human evolution
inferring human population size and separation history from multiple genome sequences
a genetic atlas of human admixture history
a draft sequence of the neandertal genome
a high-coverage genome sequence from an archaic denisovan individual
the strength of selection against neanderthal introgression
signatures of replication timing, recombination, and sex in the spectrum of rare variants on the human x chromosome and autosomes
paternal age in rhesus macaques is positively associated with germline mutation accumulation but not with measures of offspring sociability
reproductive longevity predicts mutation rates in primates
direct estimation of mutations in great apes reconciles phylogenetic dating
direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing
striking differences in patterns of germline mutation between mice and humans
frequency of mosaicism points towards mutation-prone early cleavage cell divisions. biorxiv
an analysis of the relationship between metabolism, developmental schedules, and longevity using phylogenetic independent contrasts. the journals of gerontology. series a, biological sciences and medical sciences

acknowledgements. we thank i. agarwal, p. moorjani and m. 
przeworski for many helpful discussions and comments on the manuscript. we also thank the editor and three anonymous reviewers for many helpful comments on an earlier version of this manuscript. key: cord- -jvbgouv authors: pfrieger, frank w. title: insight into the workforce advancing fields of science and technology date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jvbgouv advances in biomedicine and other fields of science and technology depend on research teams and their peer-reviewed publications. the scientific literature represents an invaluable socioeconomic resource guiding future research. typically, this growing body of information is explored by queries in bibliographic databases concerning topics of interest and by subsequent scrutiny of matching publications. this approach informs readily about content, but leaves the workforce driving the field largely unexplored. the hurdle can be overcome by a transparent team-centered analysis that visualizes the teams working in a field of interest and that delineates their genealogic and collaborative relations. context-specific, but citation-independent metrics gauge team impact and reveal key contributors valuing publication output, mentorship and collaboration. the new insight into the structure, dynamics and performance of the workforce driving research in distinct disciplines complements ongoing efforts to mine the scientific literature, foster collaboration, evaluate research and guide future policies and investments. progress in biomedicine and other fields of science and technology (s&t) depends on research teams working in specific fields and on the publication of their results by peer-reviewed articles. the rapidly growing body of scientific information reflects past and current states of the art and represents an invaluable socio-economic resource guiding future research activity, policies and pfrieger, workforce analysis investments (mukherjee et al., ; fischhoff and scheufele, ) . 
moreover, scientific publications are explored by the "science of science" aiming to understand the inner workings of science from global points of view (clauset et al., ; zeng et al., ; fortunato et al., ; fischhoff and scheufele, ; hardwicke et al., ) . the utility of this information relies on bibliographic databases, on refined methods to search and analyse content (lu, ; mclevey and mcilroy-young, ) and on efficient science communication (fischhoff and scheufele, ) . for global analyses complex algorithms process large data sets (muller et al., ; zeng et al., ; miotto et al., ; kastrin and hristovski, ) , whereas a typical user queries a bibliographic database on a specific topic using relevant keywords. from the resulting list of publications, scientific content can be extracted, but the workforce driving the field, its size, dynamics and key contributors remain largely inaccessible. this hurdle can be overcome by a team-centered approach, named teamtree analysis (tta). based on scientific articles related to a specific topic, this approach reveals instantly the teams working in a research field, visualizes workforce growth, delineates family and collaborative connections and gauges team performance in a citation-independent manner. to explore tta, the workforce of an exemplary field in biomedical science was analyzed. a pubmed query using the term "circadian clock" (clock) yielded a list of articles published between and , from which tta identified principal investigators (pis)/teams working in the field based on last author names (table ; supplementary data ). plotting publication years of each pi/team against a chronologic team index with alternating sign (ti) creates a tree-like visual revealing each team's entry into the field and its publication count (pc) per year (fig. a) . 
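one way to read the team index is sketched below: teams are identified by last author name, ordered by the year of their first last-author article, and given a signed rank whose sign alternates so that successive teams fall on opposite sides of the time axis. this is an assumption about how such an index could be built, not the published tta implementation; the function name, the alphabetical tie-breaking and the example records are hypothetical.

```python
def team_indices(articles):
    """Map each last author to a chronological index with alternating sign.

    `articles` is an iterable of (year, last_author) pairs.
    """
    first_year = {}
    for year, last_author in articles:
        if last_author not in first_year or year < first_year[last_author]:
            first_year[last_author] = year
    # Order teams by entry year; ties are broken alphabetically (an assumption).
    ordered = sorted(first_year, key=lambda name: (first_year[name], name))
    # Odd ranks get a positive index, even ranks a negative one.
    return {
        name: (rank if rank % 2 else -rank)
        for rank, name in enumerate(ordered, start=1)
    }


if __name__ == "__main__":
    records = [(2001, "b"), (1999, "a"), (2001, "c"), (2000, "a")]
    print(team_indices(records))
```

plotting each article's year against its team's index then spreads early entrants near the axis and later entrants progressively further out on both sides, which gives the tree-like shape.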
the clock field expanded steadily in terms of workforce and of publication output as indicated by annual counts of newly entering teams and of published articles, respectively (fig. a). individual pis/teams published up to articles (pc, publication count) as last authors (last) with a maximum output of nearly papers per year. the majority of teams ( %) contributed single articles (fig. b). ranking teams by numbers of publications revealed the top ten players in the clock field with respect to publication record (fig. c). tta exposes ancestor-offspring relations based on the last and first authors of articles, respectively. a quarter of pis/teams with last author articles had previously published as first authors (pc first), thus qualifying as offspring in this field (fig. b). about % of the teams qualified as ancestors that generated up to offspring teams (oc, offspring count) and published up to articles with their offspring (pcoff). offspring teams and their articles represented a relatively constant fraction of the annual workforce and of the publication output (fig. d). overall, the clock field comprised families with up to members (fs, family size) spanning generations (tg, team generation; fig. e). the genealogic analysis indicated the most prolific players in the field and their family relations (fig. d, f). collaborations among teams in the clock field were delineated from co-authorship of pis/teams on pubmed articles. figure a shows connections between teams (left; last authors) and their collaborators (right; coauthors) and the numbers of collaborators per team (cc, collaboration count). the analysis separates out- and in-degree connections, where teams are listed as last authors or co-authors, respectively. collaborating teams represented a substantial fraction of the workforce and contributed more than half of all publications (fig. a).
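the ancestor-offspring and collaboration relations described above can be sketched in python as follows. this is an illustrative reading of the text, not the author's implementation: the function names, the `(year, authors)` input layout, the requirement that the first-author article precede the offspring's first last-author article by at least one calendar year, and the treatment of every non-last byline position as a co-author are assumptions. the sketch also does not separate offspring articles from collaborative ones.

```python
from collections import defaultdict


def find_offspring(articles):
    """Map each offspring team to its ancestor team.

    A name qualifies as offspring if it first appears as first author
    on an article of another team (the ancestor, i.e. the last author)
    and later publishes as last author itself.
    """
    first_as_last = {}
    for year, authors in articles:
        if authors:
            team = authors[-1]
            first_as_last[team] = min(year, first_as_last.get(team, year))

    ancestor = {}
    # Walk through articles chronologically so that the earliest common
    # publication determines the ancestor.
    for year, authors in sorted(articles, key=lambda a: a[0]):
        if len(authors) < 2:
            continue
        first, last = authors[0], authors[-1]
        if (first in first_as_last and first != last
                and first not in ancestor
                and year < first_as_last[first]):  # assumed strict order
            ancestor[first] = last
    return ancestor


def collaboration_partners(articles):
    """Return (out_partners, in_partners) as dicts of sets.

    out_partners[t]: teams that t listed as co-authors on its articles.
    in_partners[t]: teams that listed t as co-author on their articles.
    """
    teams = {authors[-1] for _, authors in articles if authors}
    out_partners = defaultdict(set)
    in_partners = defaultdict(set)
    for _, authors in articles:
        if not authors:
            continue
        last = authors[-1]
        for name in authors[:-1]:
            if name in teams and name != last:
                out_partners[last].add(name)
                in_partners[name].add(last)
    return dict(out_partners), dict(in_partners)
```

under these assumptions, the offspring count of a team is the number of keys in the returned mapping that point to it, and the collaboration count can be taken as the number of distinct partners in either direction.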
the increasing importance of collaborations was indicated by the steadily increasing mean number of authors per article published annually (fig. b). about % of the pis/teams working in the clock field established up to and out- and in-degree collaborations, respectively. collaborative articles represented % of the total publication output with individual pis/teams publishing up to and collaborative papers as last and coauthor, respectively (fig. c). ranking teams based on cc values revealed the most strongly connected players in the field. they form a large network with a diameter of , a mean distance of . and an edge density of . (fig. d). a frequent goal in s&t is to identify key contributors to a field, who can serve as referees, experts, collaborators or awardees. tta delivered three parameters (pc, oc and cc) that allow team performance to be estimated (figs. , ). intersection of the top teams for each parameter revealed some overlap between pc-, oc- and cc-based rankings and a core of teams that figured among the top in all three categories (fig. e). plotting individual teams in the three-dimensional parameter space showed that the top teams occupied distinct volumes (fig. f). this suggested that the product of the three parameters, further referred to as poc, enables differentiated team ranking that values scientific production, offspring generation and cooperativity. s&t comprise a wide range of disciplines, raising the question whether the field-specific workforce can be evaluated in disciplines other than biomedicine. to address this, exemplary fields in geoscience, computer science, chemistry, astronomy and physics were analyzed (table ; fig. a; supplementary data ). synoptic graphs summarized the workforce expansion, genealogic relations and collaborative connections of each field and revealed field-specific differences. notably, the teams with top ten poc values comprised winners of field-specific scientific awards (table ; fig.
a ) corroborating that this parameter can identify high impact teams in different scientific disciplines. frequently, specific teams are of interest, for example those with exceptionally high publication output. tta makes it possible to trace their publication activity over time while separating articles originating from offspring and collaborators. to illustrate this point, pubmed articles authored by selected pis working in distinct areas of biomedicine were analyzed (table ; fig. b-d). this revealed their distinct publication histories and the contributions from offspring and collaborators. for example, two pis/teams, lip and raoult, showed strong increases of annual publication rates and years ago, respectively. breakdown of contributions revealed that these changes were driven by increased numbers of in-degree collaborations, where the pis are listed as co-authors (fig. b, c). an important question is how research fields develop over time. the dynamics can determine priorities for public funding, private investments and workforce allocation. to explore whether and how the workforce impacts field development, exemplary fields in biomedicine showing distinct dynamics were analyzed (table ; fig. a-c). separation of "newcomers" entering a field per year from "established" teams already working in the field per year revealed their respective impact on the field's development (fig. a). many newcomers published only one article (sats, single article teams; fig. a), which excludes a sustained contribution to the workforce. their annual fraction was consistently lower in expanding compared to non-expanding fields (fig. b). on the other hand, teams with collaborative and family connections showed consistently longer publication periods, indicating that they influence the development of each field (fig. c). the visuals and quantitative measures introduced here provide a comprehensive view on the workforce behind a field of research.
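the separation of newcomers, established teams and single article teams discussed in this section can be made concrete with a short python sketch. the function name `workforce_dynamics`, the `(year, authors)` input layout and the exact definitions used here (a team counts as a newcomer in the year of its first last-author article and as established in any later year in which it publishes) are assumptions for illustration; the source does not spell out these details.

```python
from collections import Counter


def workforce_dynamics(articles):
    """Count newcomers, established teams and single article teams per year.

    articles: iterable of (year, authors); the last author defines the team.
    Returns three dicts keyed by year:
      newcomers   - teams publishing their first article in that year
      established - teams publishing in that year that entered earlier
      single      - newcomers of that year with only one article overall
    """
    years_by_team = {}
    for year, authors in articles:
        if authors:
            years_by_team.setdefault(authors[-1], []).append(year)

    newcomers, established, single = Counter(), Counter(), Counter()
    for years in years_by_team.values():
        entry = min(years)
        newcomers[entry] += 1
        if len(years) == 1:
            single[entry] += 1  # single article team (SAT)
        for year in set(years):
            if year > entry:
                established[year] += 1
    return dict(newcomers), dict(established), dict(single)
```

dividing `single[year]` by `newcomers[year]` would give an annual fraction of single article teams among newcomers, which is one possible reading of the fraction compared between expanding and non-expanding fields.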
the choice of field subjected to tta can be user- table ) or it may be driven by public interest. an example for the latter is the covid- pandemic, where tta reveals the major contributors, their connections and the remarkable dynamics of the field during coronavirus-induced disease outbreaks (table , fig. ). teams focus on specific subtopics within a field, they publish in different journals and they are affiliated with distinct organizations. how do these factors impact research within a field? to address these points, tta was applied to teams a) focusing on distinct subtopics related to table ; fig. g -i). the graphics reveal the factor-dependent development of each field, notably the strong expansion of the abeta-related workforce and publication output, the strong increase of climate-related teams publishing in recently founded journals and the successive and rather parallel growth of the ai/machine learning field at academic and private institutions (fig. a, d, g) . most teams worked on single topics (fig. b) , published in single journals (fig. e ) and worked at single institutions (fig. h) . increasing the numbers of these factors potentiated poc values (fig. b, e, h) and thus probably the team impact. teams with top ten poc values formed large networks connecting topics (fig. c ) and affiliations (fig. i ). co-publishing and collaborating teams in the climate field established selective connections between journals and affiliations, respectively. these connections differed in strength and depended in part on the total workforce and on the publication period (fig. f, i) . the approach introduced here provides new insight into the workforce that advances a field of interest in s&t based on peer-reviewed articles. the field-and team-related visuals and measures can be scrutinized ad-hoc following a database query. thereby, users gain an accessible and transparent tool to mine the ever-growing scientific literature. 
learning about the workforce of an important, but unfamiliar field facilitates the identification of experts and the establishment of collaborations crossing disciplinary boundaries (trujillo and long, ). the field-specific approach complements global analyses focusing on team impact (sekara et al., ; ahmadpoor and jones, ), evolution (milojevic, ) and affiliation (jones et al., ; way et al., ). similar to other approaches, tta faces the name ambiguity challenge, which requires more refined methods to resolve (zeng et al., ). the measures introduced here can help to evaluate context-specific team performance and to reveal the impact of the workforce on the development of a research area. notably, the poc value takes into account three key activities in research, namely scientific production, mentorship and cooperativity, and reliably identifies key contributors. thereby, this metric complements measures such as citations (hirsch, ; ioannidis et al., ; zeng et al., ) https://www.genealogy.math.ndsu.nodak.edu/). however, this approach may underestimate offspring counts in the case of first or last co-authorship, when first authors change the field of interest and in the case of alphabetical author lists or of field-specific author ranking (waltman, ). genealogic and collaborative connections seem to enhance the impact of teams and to prolong their life-span within a field. this underlines the relevance of training and mentorship ensuring the continuity of research in s&t (sauermann and haeussler, ) and supports the importance of cooperation regardless of the field (lu et al., ; mukherjee et al., ; wuchty et al., ; stallings et al., ; coccia and wang, ; parish et al., ). considering these connections and the resulting publications within a research area adds an important component to evaluate team performance, inform about underlying networks and forecast the dynamics of the field.
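as a small worked illustration of the poc value discussed above (the product of publication count, offspring count and collaboration count), the following python sketch ranks teams by this product. the function name `poc_ranking`, the input layout and the alphabetical tie-break are hypothetical. note one consequence of a plain product: a team with no offspring or no collaborators receives a poc of zero; how the original analysis treats such teams is not stated in the text, so zeros are simply left as they are here.

```python
def poc_ranking(metrics, top_n=10):
    """Rank teams by POC = PC * OC * CC.

    metrics: dict mapping team -> (pc, oc, cc), i.e. publication count,
    offspring count and collaboration count.
    Returns (poc, top): poc maps team -> product, top lists the top_n
    team names in descending order of POC (ties broken alphabetically).
    """
    poc = {team: pc * oc * cc for team, (pc, oc, cc) in metrics.items()}
    ranked = sorted(poc, key=lambda team: (-poc[team], team))
    return poc, ranked[:top_n]


# Made-up numbers, for illustration only.
example = {"team a": (10, 2, 3), "team b": (20, 1, 1), "team c": (5, 0, 9)}
values, top_two = poc_ranking(example, top_n=2)
# values == {"team a": 60, "team b": 20, "team c": 0}
# top_two == ["team a", "team b"]
```

in this toy example the team with the highest publication count alone (team b) ranks below a team that combines output with offspring and collaborations, which is the kind of differentiation the product is meant to provide.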
wickham, ; ggrepel: slowikowski et al., https://cran.r-project.org/package=ggrepel; igraph: csardi and nepusz ( ) ; plot d: soetaert, https://cran.r-project.org/package=plot d]. code and additional data are available upon written request to the author. tta was applied to lists of publications resulting from queries of bibliographic databases concerning topics of interest. pis listed as last authors defined research teams. last author names included initials to reduce author ambiguity (milojevic, ). articles with an arbitrary limit of authors were omitted from the analysis. tta assigned a chronologic index (ti) according to the year of the first publication with alternating sign enabling a tree-like display of the workforce. further, tta attributed a color to each team and calculated parameters summarizing publication record, genealogy and collaborations. genealogic relations were based on offspring-ancestor pairs, where offspring was defined as pis that appeared initially as first authors on a publication of an ancestor team and that published subsequently as last (team) authors. each offspring (first author; generation i+ ) was assigned to the ancestor (last author; generation i) with the earliest common publication. families were defined as progeny of first ancestors (i = ) encompassing all subsequent generations. tta derived collaborations based on co-authorship (newman, ). out- and in-degree connections specified how often a team listed other teams as co-authors and how often the same team was listed as co-author, respectively. the research fields selected for analyses are summarized in table . statistical analyses were performed using the indicated tests. decoding team and individual impact in science and invention data-driven predictions in the science of science evolution and convergence of the patterns of international scientific collaboration the igraph software package for complex network research.
interjournal complex systems neurotree: a collaborative, graphical database of the academic genealogy of neuroscience the science of science communication iii science of science calibrating the scientific ecosystem through meta-research an index to quantify an individual's scientific research output impact of medical academic genealogy on publication patterns: an analysis of the literature for surgical resection in brain tumor patients thousands of scientists publish a paper every five days multiple citation indicators and their composite across scientific disciplines multi-university research teams: shifting impact, geography, and stratification in science disentangling the evolution of medline bibliographic database: a complex network perspective intellectual synthesis in mentorship determines success in academic careers pubmed and beyond: a survey of web tools for searching biomedical literature introducing metaknowledge: software for computational research in information science, network analysis, and science of science accuracy of simple, initials-based methods for author name disambiguation principles of scientific research team formation and evolution deep learning for healthcare: review, opportunities and challenges the nearly universal link between the age of past knowledge and tomorrow's breakthroughs in science and technology: the hotspot textpresso: an ontology-based information retrieval and extraction system for biological literature coauthorship networks and patterns of scientific collaboration dynamics of co-authorship and productivity across different fields of scientific research r core team.
(r foundation for statistical computing authorship and contribution disclosures the chaperone effect in scientific publishing determining scientific impact using a collaboration index document co-citation analysis to enhance transdisciplinary research an empirical analysis of the use of alphabetical authorship in scientific publishing productivity, prominence, and the effects of academic environment elegant graphics for data analysis the increasing dominance of teams in production of knowledge the science of science: from the perspective of complex systems nobel prize chemistry figure . teamtree analysis of pubmed articles related to "coronavirus" and associated diseases number of teams entering the field per year (orange) and of articles (black) published per year (right). names indicate teams with top ten total pc. b, publication periods (top, left) of individual teams and their pc values as last author symbol size, mean annual pc (pc annu). c, ancestor-to-offspring connections of teams with top ten offspring counts (oc) (left) and fraction of offspring teams (orange) and of offspring publications (black) compared to total counts per year (right). d, left, from top to bottom, counts of offspring (oc), of offspring publications (pc off, positive) per team, generation of each team (tp, middle) and family size (fs) of first-generation teams. symbol size, mean annual counts of offspring publications e, family trees and names of teams with top ten oc values. f, collaborative connections symbol size, out (negative) and in (positive) degree values. annual fractions of collaborating teams (orange) and their publications (black) compared to total numbers (right). inset, mean number of authors (ac) per article published per year.
g, counts of out-degree (positive, last author) and of in-degree collaborations (negative, co-author; top) and of resulting articles (bottom) per pi/team (left) and corresponding relative frequency distributions (right) all others, grey). j, log (poc) values per team (left) and their relative frequency distribution annual publication (p, bars) and team (t, lines) counts distinguishing newcomers (left) and established teams (right) from the indicated categories. white line, count of single article teams. inset, publication periods of teams from indicated categories and fields. groups showed statistically significant differences except for the indicated pair (p < . , kruskal-wallis tests. asterisks, p < . , post-hoc dunn test, benjamini-hochberg adjusted workforce analysis d-f) and affiliations (a : harvard; a : stanford; a : ibm; a : mit; a : microsoft: a : google) (g-i) in indicated fields. factor-specific workforce development (a, d, g), fractions of teams focusing on one or more subtopics (b, left: colors, number of topics) networks of teams with top poc values from each category (pi names). (f and i left) strength (thickness, sis ls normalized to maximal numbers) of team-and collaboration-based connections between journals (f) and affiliations (i, left), respectively. symbol sizes, number of teams per journal or affiliation (f, i left) and cc values normalized to maximum per affiliation (i, right supplementary information the author thanks drs. v. demais, m. muzet, v. pallottini, h. runz, and m. slezak for helpful comments on previous versions of the manuscript. key: cord- - kk fb authors: ma, jiahao; su, danmei; huang, xueqin; liang, ying; ma, yan; liang, peng; zheng, sanduo title: cryo-em structure of s-trimer, a subunit vaccine candidate for covid- date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: kk fb less than a year after its emergence, the severe acute respiratory syndrome coronavirus (sars-cov- ) has infected over million people worldwide with a death toll approaching million. vaccination remains the best hope to ultimately put this pandemic to an end. here, using trimer-tag technology, we produced both wild-type (wt) and furin site mutant (mt) s-trimers for covid- vaccine studies. cryo-em structures of the wt and mt s-trimers, determined at . Å and . Å respectively, revealed that both antigens adopt a tightly closed conformation and their structures are essentially identical to that of the previously solved full-length wt s protein in detergent. these results validate trimer-tag as a platform technology in production of metastable wt s-trimer as a candidate for covid- subunit vaccine. the emergence of sars-cov- in late has led to a global pandemic and has disrupted lives and global economies on a scale unseen in recent human history. this is not the first time that a new coronavirus has posed a major threat to public health; both sars-cov and middle east respiratory syndrome (mers-cov) caused human infections within the past years ( ). the fact that no licensed vaccines have ever been approved for these highly similar viruses is a reminder of the great challenges we face when hundreds of companies and institutions worldwide rush to develop covid- vaccines with multiple strategies ( ). a successful vaccine that could truly impact the course of this ongoing covid- pandemic has to have four key characteristics: safety, efficacy, scalability (to billions of doses to meet global demand), and speed. although protein subunit vaccines have excellent track records for the first three requirements, exemplified by the highly successful vaccine gardasil used to prevent hpv infections ( ) and shingrix vaccine for containing herpes zoster virus infections ( ), subunit vaccine development can take years to decades to complete.
many of the difficulties reside in the manufacturing processes that have to ensure a fully native-like antigen structure is retained, starting from subunit vaccine designs. similar to other enveloped rna viruses such as hiv, rsv and influenza, coronaviruses including sars-cov- also use a ubiquitous trimeric viral surface antigen (spike protein) to gain entry into host of oleic acid seen in our structure, linoleic acid as well as ps were also observed in the recently published structure of a full-length mutant s protein ( q- p-fl) produced from insect cells ( ) . ps is buried deeply in the hydrophobic pocket residues with a few hydrophilic residues including n , n , r and h making hydrogen bonds with the hydroxyl group of the ps (fig. b) . notably, ps engages hydrophobic interactions with f and m which are invisible in the s- p structure (fig. c) . since no small molecule was shown to bind to the s- p structure, ps likely stabilizes the disordered loops of the ntd domain, making them more ordered (fig. b and c ). the oleic acid located in the hydrophobic pocket of the rbd domain engaged a salt bridge interaction with r at the adjacent protomer through its carboxylic acid group, bringing the rbd domain in close proximity and resulting in the tightly closed conformation (fig. d) . recent studies have shown that low ph can stabilize the s-trimer ( ). indeed, negative staining em analysis of wt s-trimer at ph . revealed more homogenous trimer than that at physiological ph (see accompanying paper). in light of this finding, we were able to determine the cryo-em structure of the wt s-trimer at . Å resolution at ph . ( fig. s and table s ). the structure of the wt s-trimer resembled that of the mt form with a root mean square deviation of . Å over cα atoms (fig. a) . oleic acid was well resolved in the wt structure but the density for the ps was weak, likely due to the low resolution or low occupancy. 
as a result, the ntd domain of the wt s-trimer was less well resolved than that of mt (fig. a) . it has been shown that a ph-dependent switch domain (residue - , ph switch ) undergoes dramatic conformational change at different ph values ( ). however, this region was nearly identical between our two structures (fig. b ). instead, a fragment (residue - ) we named ph switch at the ctd region of the s domain before the furin cleavage site displays significant structural arrangement. to this structural arrangement. at physiological ph, r forms salt bridge interactions with d and d (fig. d) . at lower ph, the protonation of d and d weakens these interactions. as a result, r flips to the other side and makes hydrophobic interactions with w and l through its aliphatic chain, leading to the ordered helix-turn-helix motif (fig. e) . the newly-formed structural motif makes direct contact with the previously identified ph switch of the adjacent protomer, accounting for the enhanced stability of the wt s-trimer at lower ph (fig. a) . the structural arrangement of ph switch in different ph was also observed in previous studies ( ), further supporting the conformational change between the mt and wt s-trimer structures was due to the different ph but not to the mutation in the furin site. in contrast to the structural differences described above for s- p protein, both of our wt and mt s-trimer were nearly identical to the recently published structures of full- length wild-type s ( ) and q- p-fl ( ) purified in detergent from hek and sf insect cell membranes, respectively. when revisiting the electron density map for full- length wild-type s protein (emdb: ), we spotted unassigned density at the become predominant over the ancestral form worldwide and has been shown to interaction with k at the conformational switch region (fig. c ). from the tightly to the loosely closed state, the conformational switch undergoes a large conformational arrangement and becomes disordered (fig. 
d ). k flips to the opposite side and interacts with d and d of the ctd , causing the s to move downwards relative to the s (fig. a). finally, the ctd domain further moves downwards and causes the rbd to adopt an open conformation for receptor binding (movie s ). like the previously reported structure of full-length wt s protein purified in detergent micelles, it is unclear whether the furin cleavage site in our resolved wt s-trimer structure is cleaved. moreover, we could not exclude the possibility that other conformational states exist in the wt sample that were not captured in our cryo-em study since partial cleavage of the furin site may lead to some s dissociation from s-trimer. nevertheless, we are certain that the highly purified wt s-trimer predominantly adopts a pre-fusion state, unlike the full-length wild-type spike protein which forms both pre- and post-fusion states in the presence of detergent ( ). we thank xiaodong wang for his coordination and input in this study. we thank maofu liao and andrew c. kruse for critical reading of the manuscript. we thank hongwei wang for providing graphene oxide coated grids. we also thank staff at shuimu biosciences for their assistance with cryo-em data collection. all em data were collected at shuimu biosciences. cell culture medium was clarified by depth filtration (millipore) to remove cells and debris. s-trimers were purified to homogeneity by consecutive chromatographic steps including protein a affinity column using mabselect prisma (ge healthcare) which was preloaded with endo -fc at ( mg/ml) to capture s-trimer, based on the high affinity binding between endo and trimer-tag ( ). after washing off any unbound contaminating proteins, s-trimers were purified to near homogeneity in a single step using . m nacl in phosphate buffered saline (pbs). for s mt and sars-cov s-trimer, the proteins were dialyzed against pbs plus . % polysorbate before analysis. after one hour of low ph (ph .
) viral inactivation (vi) step using acetic acid, the ph was adjusted to neutral range, and wt s-trimer was further purified on a capto qxp resin (ge biosciences) in a flow-through mode to remove any host cell dna and residual host cell proteins (hcp). a final preventative viral removal (vr) step was performed using a nano-filtration cartridge (asahikasei) before final buffer exchange to pbs plus . % polysorbate by uf/df (millipore). ace -fc expression vector was generated by subcloning a gene-synthesized cdna template (genscript) encoding soluble human ace (amino acid residues - , accession number: nm_ . ) into hind iii and bgl ii sites of pgh-hfc expression vector (genhunter, nashville, tn) to allow in-frame fusion to human igg fc. the expression vector was then stably transfected into gh-cho (dhfr -/-) cell line and high expression clones were selected and adapted to sfm- -cho (hyclone) serum free medium and ace -fc was produced in a l bioreactor essentially as described for s-trimer above. ace -fc was purified to homogeneity from the conditioned medium using poros xq column (thermo fisher) following manufacturer's instructions. the avidity of the different s-trimers binding to the sars-cov- receptor ace was assessed by bio-layer interferometry measurements on fortebio octet qke (pall, new york). ace -fc ( µg/ml) was immobilized on protein a (proa) biosensors (pall). real-time receptor binding curves were obtained by applying the sensor in two-fold serial dilutions of s-trimer from . - µg/ml in pbs. kinetic parameters (k on and k off ) and affinities (k d ) were analyzed using octet software, version . . dissociation constants (k d ) were determined using steady state analysis, assuming a : binding model for a s-trimer to ace -fc. negative staining was performed as previously described ( ). in brief, μl of purified s-trimer at a concentration of about . mg/ml was deposited on a glow-discharged carbon-coated copper grid for s before being blotted with filter paper.
grids were quickly washed with two drops of water and one drop of % (w/v) uranium acetate. grids were kept in contact with the last drop of % (w/v) uranium acetate for s, and blotted with filter paper. data collection was performed on a tecnai t electron microscope operated at kev equipped with a fei ceta k detector. images were collected at a magnification of , x and a defocus of . μm. purified mt s-trimer protein diluted to . and . mg/ml in pbs buffer was applied to glow-discharged gold holey carbon . / . -mesh grids with and without graphene oxide, respectively. grids were blotted for - seconds at a blotting force of and plunge-frozen in liquid ethane using a mark iv vitrobot (thermo fisher scientific). the chamber was maintained at ºc and % humidity during freezing. . mg/ml of wt s-trimer sample in low ph buffer ( mm sodium citrate ph . and mm nacl) was deposited on glow-discharged gold holey carbon . / . -mesh grids with graphene oxide. grids were blotted and vitrified using the same conditions. all movies were collected using a titan krios microscope (thermo fisher scientific) equipped with a bioquantum gif/k direct electron detector (gatan). the detector was operated in superresolution mode. a complete description of cryo-em data collection parameters is provided in table s . for mt s-trimer protein, motion correction for cryo-em images and contrast transfer function (ctf) estimation were performed using motioncorr ( ) and ctffind ( ) respectively. , particles were automatically picked from images collected on grid without graphene oxide (go) using laplacian-of-gaussian in relion . . ( ), and , , particles were automatically picked from images collected on grid with go. extracted particles from the two datasets were downsized by two-fold and subjected to d classification separately, resulting in , and , good particles.
particles from go grids were recentered using scripts written by kai zhang (http://www.mrclmb.cam.ac.uk/kzhang/useful_tools/scripts/) before d classification. good particles from both datasets were combined and subjected to d classification using the initial model generated from the model of s protein with c symmetry (pdb id: vxx). two major classes accounting for . % and . % particles show clear and complete structural features. these particles were auto-refined followed by local d classification with c symmetry to generate two classes with similar structure. the better class was subjected to d refinement with c symmetry and post-processed using a mask on the entire molecule to yield a . Å map. ctf refinement was performed to further increase the resolution to . Å. for image processing of wt s-trimer, the mt s-trimer map was low-pass-filtered to Å resolution and used as the d reference template for auto-picking. , particles were picked from images collected on go-coated grid and used for two rounds of d classification. , particles were selected from good d classes and used for d classification with c symmetry. one class accounting for . % showing a well-defined structure was refined with c symmetry and post-processed using a mask on the entire molecule to give a map at . Å resolution. ctf refinement followed by another round of d refinement improved the resolution to . Å. reported resolutions were calculated based on the gold-standard fourier shell correlation (fsc) at the . criterion. the soluble ectodomain structure s- p (pdb: vxx) was used as a template for model building. the missing region in the s- p structure can be de novo modeled in the mt s-trimer map, owing to its high resolution. the model was manually built in coot ( ) and real space refinement was performed in phenix ( ). the relative quantitation for oleic acid and linoleic acid was performed by a thermo vanquish uhplc coupled to a thermo q exactive hf-x hybrid quadrupole-orbitrap mass spectrometer.
the chromatographic separation was performed using a waters csh c column ( . x mm, . μm) and maintained at °c. the separation was performed using isocratic flow of a solvent composed of % acetonitrile, % water, and mm ammonium acetate. the flow rate was set at . ml/min for min. full-scan-ddms mass spectra were acquired in the range of - m/z with the following esi source settings: spray voltage . kv, aux gas heater temperature °c, capillary temperature °c, sheath gas flow rate unit, aux gas flow gas unit in the negative mode. polysorbate (e), the rbd with oleic acid (f), the conformational switch (g) and the s region (h). (i) fsc curves of model-to-map comparison between the mt s-trimer and the s- p structure. (a) structural overlay of mt vxx) with the s domain aligned. s- p and mt are colored in grey and blue respectively. the same atoms from both structures are shown as sphere. the red arrow indicates moving direction. (b) top view of the aligned mt and s- p trimer structure. the rbd domains of the s- p structure rotate counterclockwise and move toward three improvement of pharmacokinetic profile of trail via trimer-tag enhances its antitumor activity in vivo endo binds to the c-terminal region of type i collagen structural basis for kctd-mediated rapid desensitization of gabab signalling motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy ctffind : fast and accurate defocus estimation from electron micrographs new tools for automated high-resolution cryo-em structure determination in relion- coot: model-building tools for molecular graphics phenix: a comprehensive python-based system for macromolecular structure solution key: cord- - gyk cwx authors: morand, serge; walther, bruno a. title: the accelerated infectious disease risk in the anthropocene: more outbreaks and wider global spread date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: gyk cwx the greatly accelerated economic growth during the anthropocene has resulted in astonishing improvements in many aspects of human well-being, but has also caused the acceleration of risks, such as the interlinked biodiversity and climate crisis. here, we report on another risk: the accelerated infectious disease risk associated with the number and geographic spread of human infectious disease outbreaks. using the most complete, reliable, and up-to-date database on human infectious disease outbreaks (gideon), we show that the number of disease outbreaks, the number of diseases involved in these outbreaks, and the number of countries affected have increased during the entire anthropocene. furthermore, the spatial distribution of these outbreaks is becoming more globalized in the sense that the overall modularity of the disease networks across the globe has decreased, meaning disease outbreaks have become increasingly pandemic in their nature. this decrease in modularity is correlated with the increase in air traffic. we finally show that those countries and regions which are most central within these disease networks tend to be countries with higher gdps. therefore, one cost of increased global mobility and greater economic growth is the increased risk of disease outbreaks and their faster and wider spread. we briefly discuss three different scenarios which decision-makers might follow in light of our results. the anthropocene has also been nicknamed the 'great acceleration' because various socioeconomic and earth-system related indicators experienced a continuous and often exponential growth after the second world war (steffen et al., a; mcneill and engelke, ; steffen et al., ) . while this relentless growth of the human enterprise improved human well-being around the world in many aspects (waage et al., ; barrett, ; permanyer and scholl, ; zheng and qian, ) , negative impacts have likewise increased (ipcc, ; ipbes, ) . 
in tandem, warnings about a climate emergency (lenton et al., ; ripple et al., ) , a sixth mass extinction (pereira et al., ; barnosky et al., ; ceballos et al., ) , increasing ocean acidification and dead zones (hoegh-guldberg et al., ; diaz and rosenberg, ; branch et al., ) , and even widespread ecosystem collapse (jackson, ; barnosky et al., ; unep, ) and transgression of safe planetary boundaries (steffen et al., b) have grown increasingly urgent. to avoid or at least dampen the anticipated or already realized changes, scientists and many others have called out for radical changes to how human economies operate and relate to human societies and their environments (unmüßig, ; simms, ; walther, ; ripple et al., ) . we here want to draw attention to another important and noteworthy feature of the anthropocene which greatly affects public health, human well-being, and economic performance. these findings are especially pertinent as the world reels from the health, social and economic impact of the current sars-cov- pandemic (el zowalaty and järhult, ; ghebreyesus and swaminathan, ; lorusso et al., ) . the increasing connectivity of human populations due to international trade and travel (guimerà et al., ; colizza et al., ; brockmann and helbing, ; gabrielli et al., ) , the rapid growth of the transport of wild and domesticated animals worldwide (rosen and smith, ; schneider, ; rohr et al., ; levitt, ) , and other factors such as the increasing encroachment of human populations on hitherto isolated wild animal populations through loss and fragmentation of wild habitats (patz et al., ; despommier et al., ; pongsiri et al., ; myers et al., ) have led to a great acceleration of infectious disease risks, e.g., the increase in emerging infectious diseases and drug-resistant microbes since (jones et al., ) and the increase in the number of disease outbreaks since (smith et al., ) . 
to expand the previous analysis (smith et al., ) to the beginning of the anthropocene, we investigated whether the number of disease outbreaks has increased since the second world war. in addition, we examined whether the global pattern of infectious disease outbreaks changed, possibly due to the increasing connectivity of human populations. in other words, have the disease outbreaks become more globalized in the sense that these outbreaks are increasingly shared by countries worldwide? to investigate these questions, we used the most complete, reliable, and up-to-date global dataset (gideon informatics, ) which had already been used in the previous analysis (smith et al., ). this dataset can be used to enumerate the recorded annual number of disease outbreaks. to investigate the changing global patterns of disease outbreaks, we used this dataset to calculate two measures which have been recently introduced into ecological and parasitological studies. these two measures, namely modularity and centrality, quantify the connectivity of bipartite networks. modularity is defined as the extent to which nodes (specifically, sites and species for presence-absence matrices) in a compartment are more likely to be connected to each other than to other nodes of the network (thébault, ). the calculation of a modularity measure is useful for global phenomena because it allows the overall level of compartmentalization (or fragmentation) into compartments (or clusters, modules, subgroups, or subsets) of an entire dataset to be quantified. high modularity in a global network means that subgroups of countries and disease outbreaks interact more strongly among themselves (that is, within a compartment) than with the other subgroups (that is, among compartments) (bordes et al., ). centrality is defined as the degree of the connectedness of a node (e.g., a keystone species in ecological studies; jordán, ; gonzález et al., ). 
in the context of our study, centrality is the degree of the connectedness of a country and those countries connected to it. we estimated the countries which are the potential centres of disease outbreaks by investigating the eigenvector centrality of a given country in a network of countries which share disease outbreaks among each other. eigenvector centrality is a generalization of degree centrality, which is the number of connections a country has to other countries in terms of sharing disease outbreaks. eigenvector centrality considers a country to be highly central if the countries connected to it through shared outbreaks are themselves connected to many other well-connected countries (bonacich and lloyd, ; wells et al., ). modularity and centrality analyses have been used to investigate various ecological, parasitological and epidemiological questions (e.g., tylianakis et al., ; jordán, ; gonzález et al., ; anderson and sukhdeo, ; bascompte and jordano, ; poisot et al., ; bordes et al., ; genrich et al., ). using a widely used world dataset on infectious disease outbreaks, we here present results which demonstrate that the accelerated number of disease outbreaks and their increased global spread are two further threatening aspects of the accelerated infectious disease risk associated with the globalization process which characterizes the anthropocene. to collate the total number of infectious disease outbreaks over the years - , we extracted the relevant data from the medical database called gideon (gideon informatics, ) which contains information on the presence and occurrence of epidemics of human infectious diseases in each country as well as the number of surveys conducted in each country. the gideon data are curated as records of confirmed outbreaks, are continually updated using various sources such as who and promed, and are accessible via subscription. 
this dataset is generally considered to be the most complete, reliable, and up-to-date in the world and has been regularly used in previous macro-scale studies of infectious disease, epidemics, and pathogen diversity (e.g., smith et al., ; dunn et al., ; morand et al., ; morand et al., ; poisot et al., ; smith et al., ; morand, ; morand and walther, ). each row in the gideon dataset specifies the disease 'species', the year and the country of the outbreak. the 'annual total outbreak number' is simply the annual total number of outbreaks regardless of the disease designation and including all countries. the 'annual total disease number' uses the same data for each year as the 'annual total outbreak number' but then counts only the different infectious diseases which had at least one outbreak in that respective year. the 'annual total country number' uses the same data for each year as the 'annual total outbreak number' but then counts only the different countries which had at least one outbreak in that respective year. our entire - dataset contains outbreaks of human infectious diseases in nations. in our case, we built yearly bipartite networks of presence-absence matrices which link countries with all the recorded epidemic outbreaks. we then transformed these bipartite networks, where separate nodes from countries were connected with nodes of epidemic outbreaks, into unipartite networks using the tnet package (opsahl, ). the calculation of modularity of bipartite networks of shared epidemic outbreaks among countries allowed us to identify modules of countries that share common epidemic outbreaks in each respective year (see introduction) (blondel et al., ; bordes et al., ). we calculated our modularity measure of the unipartite network for each year using igraph (lehoucq et al., ). 
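the counting and network steps described above can be illustrated with a minimal python sketch. it is not the authors' pipeline, which used the r packages tnet and igraph. the records, country labels, function names, and the co-occurrence weighting of the projection are all hypothetical choices made for illustration. the modularity function only scores a partition that is supplied by hand (it does not search for modules), and the centrality function is a plain power iteration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records (disease, year, country); these are NOT GIDEON data.
RECORDS = [
    ("cholera", 2000, "A"), ("cholera", 2000, "B"),
    ("measles", 2000, "A"), ("measles", 2000, "B"),
    ("dengue", 2000, "C"), ("dengue", 2000, "D"),
    ("plague", 2000, "C"), ("plague", 2000, "D"),
    ("influenza", 2000, "B"), ("influenza", 2000, "C"),
    ("cholera", 2001, "A"),
]


def annual_totals(records, year):
    """Annual total outbreak, disease, and country numbers for one year."""
    rows = [r for r in records if r[1] == year]
    return {
        "outbreaks": len(rows),
        "diseases": len({disease for disease, _, _ in rows}),
        "countries": len({country for _, _, country in rows}),
    }


def project(records, year):
    """Project the country-disease bipartite network onto countries.

    Assumed weighting: the number of diseases with an outbreak in both countries.
    """
    by_disease = defaultdict(set)
    for disease, y, country in records:
        if y == year:
            by_disease[disease].add(country)
    weights = defaultdict(int)
    for countries in by_disease.values():
        for a, b in combinations(sorted(countries), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, membership):
    """Newman modularity Q of a given partition of a weighted undirected network."""
    m = sum(weights.values())
    inside = defaultdict(float)    # edge weight falling inside each module
    strength = defaultdict(float)  # summed node strength of each module
    for (a, b), w in weights.items():
        strength[membership[a]] += w
        strength[membership[b]] += w
        if membership[a] == membership[b]:
            inside[membership[a]] += w
    return sum(inside[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


def eigenvector_centrality(weights, iterations=200):
    """Eigenvector centrality by power iteration, scaled so the maximum is 1."""
    nodes = sorted({n for edge in weights for n in edge})
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Iterating with (A + I) rather than A keeps the same leading eigenvector
        # but avoids the oscillation that plain iteration shows on bipartite graphs.
        new = dict(score)
        for (a, b), w in weights.items():
            new[a] += w * score[b]
            new[b] += w * score[a]
        top = max(new.values())
        score = {n: v / top for n, v in new.items()}
    return score


if __name__ == "__main__":
    shared = project(RECORDS, 2000)
    print(annual_totals(RECORDS, 2000))
    print(shared)
    print(round(modularity(shared, {"A": 0, "B": 0, "C": 1, "D": 1}), 3))
    print(eigenvector_centrality(shared))
```

in this toy network, splitting the countries into the modules {a, b} and {c, d} scores higher than placing all four countries in one module, and the two countries that bridge the modules (b and c) receive the highest centrality.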
high modularity calculated in this context means that an epidemic remains relatively constrained within a few countries while low modularity means that an epidemic has spread across relatively more countries. the calculation of the eigenvector centrality of the unipartite network of each respective year allowed us to determine the number of connections a country has to other countries in terms of epidemic outbreak sharing. eigenvector centrality is a measure of the degree of the connectedness of a country and those countries connected to it. high centrality calculated in this context means that a country is connected to many countries which are also well-connected (bonacich and lloyd, ; wells et al., ). a visual examination of the time trend of the annual modularity measure suggested a discontinuous trend over time. to detect such a discontinuity (or breakpoint) in the trend over time, we used the r package segmented (muggeo, ; muggeo, ). this package allows the identification of one or more discontinuities using the bootstrap method; in other words, the decomposition of a relationship into one or more piece-wise linear relationships and the identification of breakpoint(s). the standard errors of the breakpoint estimates were computed with the procedure of clegg et al. ( ) which is implemented in the r package segmented. finally, we used a smooth regression to visualize the patterns of changes over time (harrell, ) (e.g., for air transport, modularity, etc.). increase in global measures of mobility and gdp. the increase in measures of mobility during the anthropocene has been staggering, with growth percentages of up to % (table ). many more impressive local and regional examples exist, see, e.g., (wilson, ; wilson, ). for further analyses below, we only used the world bank data on flight passengers and air freight; therefore, we detail them here. since the s, the total global number of flight passengers ( increase in the number of outbreaks. 
since the s, the annual total outbreak number (fig. d), the annual total disease number (fig. e), and the annual total country number (fig. f) have increased exponentially. however, for the annual total disease number, this increase has slowed since the s. while there is some annual up-and-down variation, the overall trends are well demonstrated by the smooth line regressions. furthermore, around , the trends for the three variables began a decrease or stagnation in the actual data, but not in the smooth line regressions. we also plotted all the annual centrality values for each nation within six regions. despite a large amount of variation over the investigated time period, the three north american countries had the highest mean of centrality values, followed by the three pacific countries and then the european, south american, asian and african countries (fig. ). these differences in centrality between the six regions are statistically significant (kruskal-wallis χ = . , df = , p = . e- ). our results further support the hypothesis that the anthropocene is associated with a great acceleration of infectious disease risks. first, we showed that the number of disease outbreaks, the number of diseases involved in these outbreaks, and the number of countries affected have increased during the entire anthropocene (thus expanding on the results of previous studies which were more limited in time or space, e.g., morand et al., ; morand et al., ; smith et al., ; morand, ). furthermore, these increases have mostly been exponential, although with some recent slowdowns (see discussion below). second, we demonstrated that the spatial distribution of these outbreaks has become more globalized in the sense that the overall modularity of the disease networks across the globe has decreased since around . 
in other words, clusters of disease outbreaks began to increasingly become connected with other clusters so that the fragmented nature of outbreak clusters diminished over time. before , a disease outbreak usually remained confined to one or a few closely connected countries; thereafter, disease outbreaks have become increasingly pandemic in their nature. we thus revealed a long-term, worldwide change in the biogeographic structure of human infectious diseases associated with outbreaks. we further found that this decrease in modularity is correlated with the increase in air traffic. the increase in global mobility and especially in air traffic (table ) allows an outbreak to rapidly spread across several national and continental borders within a short period of time (see also results from modelling and real-world data below). third, we demonstrated which countries and regions are most central within these disease networks. countries which are more centrally located within these disease networks tend to be also the more developed and emerging countries with significantly higher gdps. therefore, one cost of increased global mobility (which is currently tightly linked to economic growth and globalization, see discussion below) is the increased risk of disease outbreaks and their faster and wider spread (although we note that the risk per capita may be decreasing, smith et al., ) . before we discuss the implications of our results, we address possible limitations and biases. while gideon is generally acknowledged to be the most complete, reliable, and up-to-date global dataset on infectious disease outbreaks, we nevertheless should consider that ( ) there may have been some underreporting in the early part of the anthropocene, and ( ) recent outbreaks may not have been entered into gideon yet. ( ) there may have been some underreporting of infectious disease outbreaks during the early parts of the anthropocene in developing countries. 
however, the imposition in the s of the so- while we cannot exclude the possibility of some underreporting of disease outbreaks during the early part of the anthropocene, it is rather unlikely that the large increases of several hundreds of percent which we documented in figures d-f are entirely due to underreporting. since the overall trends are so consistent and so large over a relatively long period of time, we argue that these trends are real even if the actual numbers may be off by a few percentage points. ( ) the recent slowdowns shown in figures d-f could be real or due to the most recent outbreaks not having been entered into gideon yet. if they are real, they are not really influencing the overall decade-long trends documented here. however, if they are due to underreporting, then the documented trends would be even stronger. although it is generally acknowledged that correlation does not prove causation, the correlation between air travel and modularity specifically (fig. ) , and the relationship between increased mobility and the faster and wider spread of disease outbreaks (table , figures and ) in general make sense given theoretical models and real-world evidence (see discussion below). however, we acknowledge that other factors may be responsible, especially variables which may covary with mobility measures. further causal analyses are therefore required, but these are beyond the scope of this study. the starting point of a disease outbreak is due to a variety of local conditions or factors (morand and lajaunie, ; morand and figuié, ) . however, after emergence, the local, regional, or global spread of a disease is of course dependent on many other factors of which host mobility is usually one of the most important ones. 
this is of course especially true for directly transmitted human pathogens (walther and ewald, , and studies cited below), although mobility of humans as well as vectors are also important for the global spread of vector-borne diseases (tatem et al., ; brown et al., ; eritja et al., ; golnar et al., ; oliveira et al., ) . theoretical models predict that increased mobility leads to a faster and more wide-ranging spread of a disease outbreak, and, vice versa, decreased mobility slows and contains the spread of an outbreak. modelling the spread of the sars-cov- pandemic, hufnagel et al. ( ) demonstrated that two control strategies, shutting down airport connections and isolating cities, reduced the spread of the virus. drastic travel limitations also delayed a pandemic by a few weeks in a model of the global spread of influenza (colizza et al., ) . similarly, increasing levels of ( ) isolation of infectious hosts, household quarantine and related behavioral changes which reduce transmission rates and ( ) air traffic reduction increasingly slowed the global spread of influenza, although the latter control strategy required the almost complete halt of global air traffic (cooper et al., ; ferguson et al., ; flahault et al., ; hollingsworth et al., ; epstein et al., ; bajardi et al., ) . epstein et al. ( ) emphasized that a combination of both control strategies would be even more effective, a result mirrored by mao ( ) for a model at the city scale. crucially, hollingsworth et al. ( ) also showed a significant decrease in the number of countries affected if travel reductions are combined with other control strategies to reduce transmission rates. this result was confirmed by cooper et al. ( ) and flahault et al. ( ) who found that fewer cities (distributed around the world) were affected by major outbreaks if sufficiently early and significant travel and transmission reductions were implemented. 
in a simulated smallpox attack, even gradual and mild behavioral changes had a dramatic impact in slowing the epidemic (del valle et al., ) . another model suggested that public health policies that encourage self-quarantine by infected people can lower disease prevalence (chen et al., ) . real-world examples also demonstrate the link between increased mobility and faster disease outbreaks. real data on influenza in the usa showed that a reduction in air travel resulted in a delayed and prolonged influenza season (brownstein et al., a; brownstein et al., b) . similarly, the presence of airports and railway stations significantly advanced the arrival of influenza during the pandemic in china (cai et al., ) . global connectivity due to air traffic allows an outbreak to rapidly spread across several national and continental borders within a short period of time. for example, the ebola virus outbreak was brought into the continental usa onboard a commercial flight from liberia because the host was asymptomatic during the flight (cdc, ) . sevilla ( ) reviewed and modelled how air travel can aid the global spread of ebola, h n influenza, sars-cov- , and pneumonic plague. a systematic review of the effectiveness of travel reductions concluded that internal travel restrictions as well as international border restrictions both delay the spread of influenza epidemics (mateus et al., ) . therefore, given the staggering global increase in global mobility during the anthropocene documented in table , the increase of the number of disease outbreaks, of diseases involved in these outbreaks, and of countries affected, as well as the decrease of modularity of these disease outbreaks make sense because the mobility of humans, other living beings, and goods (which can act as carriers or vectors) all facilitate the spread of disease and species (e.g., smith and guégan, ; findlater and bogoch, ; sardain et al., ) . 
given the lack of any antiviral or vaccine treatment, the current sars-cov- pandemic has forced governments to drastically curtail people's mobility and introduce continent-wide social distancing and lockdowns in order to at least slow its spread by lowering transmission rates. thus, the governments' responses to this acute global health emergency actually mirror many of the recommendations which were given by the various modelling studies cited above. it should also be noted that the restriction of people's mobility (and its most extreme form, quarantine) and social distancing were already used before the advent of the germ theory of disease and are even used to some extent by other species (hart, ; tognotti, ; bashford, ). given the link between mobility and disease outbreaks documented by our study, the key question which decision-makers and society at large should ask is which of the following three scenarios (which we outline in very broad terms only) we should aim for in the coming decades. ( ) once the current sars-cov- pandemic is over, we continue on our path of ever increasing mobility without regard to the costs in terms of the accelerated infectious disease risk. ( ) we attempt to slow down or even reverse mobility rates of infected hosts and vectors. ( ) we attempt to slow down or even reverse mobility rates of humans and other carriers and vectors (in other words, decrease many or all of the mobility measures in table ). we briefly discuss the implications of each scenario. however, this discussion is by no means exhaustive, but meant to stimulate further discussion and study which are urgently needed to come to terms with the accelerated infectious disease risk of the anthropocene. ( ) most likely, at least in the short-term, economic and political decision-makers will return to 'business-as-usual' which means increasing mobility rates even further. 
after all, various projections predict more immense increases of mobility within the next few decades. for example, international tourist arrivals worldwide are expected to increase by . % a year between and to reach . billion by (unwto, ) from the . billion recorded in (table ). in addition to the environmental and social costs and risks of this scenario (e.g., increasing land-use change, greenhouse gas emissions, resource use and waste production, etc.), including the risk of widespread ecosystem collapse (see introduction), we can now add the cost of an increasing infectious disease risk. while our study only focused on human infectious diseases, this cost related to increased mobility is also increasing for animal and plant disease outbreaks as well as alien species introductions (e.g., anderson et al., ; fisher et al., ; bélanger and pilling, ; sardain et al., ). while outbreaks of animal and plant diseases may be amenable to a cost-benefit analysis (tildesley et al., ), the current sars-cov- pandemic has clearly shown that simple cost-benefit analyses cannot be applied when the lives of millions of people are at stake (note that another emerging infectious disease, the global hiv pandemic, has claimed million lives so far). given that another pandemic becomes more likely with increasing rates of emergence and increasing global mobility, a 'business-as-usual' scenario is automatically associated with further epidemics and pandemics, possibly killing further millions of humans and devastating local and regional economies or even the global economy. if a 'business-as-usual' scenario is followed with regard to global mobility, countries and the world community should at least invest in better detection and surveillance methods to catch and contain the next pandemic as early as possible, and in better preparedness of public health facilities in case the next pandemic nevertheless gets out of hand (jain et al., ; bedford et al., ; di marco et al., ). 
however, this scenario nevertheless will likely be associated with an increased number of epidemic outbreaks (some of which may become devastating pandemics), given the results of our study. ( ) consequently, the most realistic and agreeable scenario may be to slow down or even reverse the mobility rates of infected hosts and vectors. again, the current sars-cov- pandemic has demonstrated that identifying infected hosts and reducing secondary infection rates caused by these infected hosts appear to be the most successful strategies to achieve elimination of the outbreak (e.g., ). the required measures, such as mass home quarantine, restrictions on travel, expanded testing and contact tracing, and additional surveillance measures, are thus mostly focused on ( ) identifying and isolating infected hosts (which means to drastically restrict their mobility) and ( ) drastic restrictions of mobility for uninfected hosts. while the latter is possible in crisis situations, it cannot be a long-term solution to the quandary of the increased infectious disease risk of the anthropocene unless we want to decrease our total global mobility (see scenario below). therefore, much improved identification and isolation of infected hosts may be the way forward. already, such measures have been adopted during the current sars-cov- pandemic, e.g., body temperature checks for every air travel passenger even though they appear to be ineffective (cohen and bonifield, ). however, if efficient and reliable health checks which can identify various diseases and which can be administered relatively time- and cost-efficiently to large numbers of passengers could be implemented, we may be able to significantly restrict the mobility of infected hosts. 
while such a proposal may sound like "pie in the sky" at the moment, rapid advances in diagnostic techniques, such as translational proteomics, may soon allow us to identify infected hosts using simple breathalyzer, saliva, or urine tests (athlin et al., ; nakhleh et al., ; tao et al., ; zainabadi et al., ). furthermore, hygienic measures, such as the enforcement of handwashing and the wearing of facemasks in public transport hubs, complete and regular disinfection of important traffic hubs and vehicles (including the air and all surfaces), and much better vector control should become mandatory global standards of public health (e.g., grout and speakman, ; nicolaides et al., ; reviewed in huizer et al., ), especially in the most central of traffic hubs, such as the world's most connected airports (guimerà et al., ; bajardi et al., ). such measures would certainly help to decrease the mobility of infected hosts and thus the transmission and global spread of diseases. ( ) the most sustainable scenario is, however, to decrease or even reverse global mobility rates of humans and other carriers and vectors, especially if it is part and parcel of a much larger movement towards global sustainability by reducing humanity's environmental footprint and replacing unsustainable economic growth with sustainable economic degrowth (schneider et al., ; daly and farley, ; alexander, ; czech, ; galaz, ; cosme et al., ; weiss and cattaneo, ; chiengkul, ; sandberg et al., ; schmid, ). such a general, comprehensive and global slowdown of mobility of both uninfected and infected people and vectors would be opposed for many reasons and by many interest groups, mainly on economic grounds centred on the need for continuous economic growth which has so far almost always been positively linked with increased mobility (e.g., arvin et al., ; hakim and merkert, ; unwto, ; saidi et al., ; nasreen et al., ). 
it is to some extent possible to decouple mobility from economic growth (loo and banister, ; lane, ) , but even if such a decoupling was achieved, it would not sufficiently reduce mobility to significantly decrease infectious disease risks. as the modelling results cited above and the experience with the current sars-cov- pandemic clearly show, only a huge reduction in mobility and contact rates is sufficient to achieve a slowdown or halt of a highly contagious disease outbreak which is already under way. yet, a decrease in global levels of mobility should also decrease the overall number of disease outbreaks, according to our results (which is different to just slowing and containing the spread of an outbreak, see discussion above). therefore, the many environmental benefits of economic degrowth and deglobalization would be augmented by a global health benefit, the almost certain decrease of infectious disease outbreaks. since economic degrowth and deglobalization have been advocated by many sustainability experts to deal with the currently converging environmental crises (climate change, ocean acidification, biodiversity, etc., see references above), our results further strengthen the argument for such a 'not-business-as-usual' scenario. moreover, the economic degrowth scenario would also ameliorate many of the local conditions or factors associated with the emergence of outbreaks (such as increased livestock production and contact rates with wildlife, climate change, loss and fragmentation of natural habitats caused by urbanization and agricultural intensification, etc.) thus further decreasing the likelihood of disease outbreaks. 
in addition, we have a growing understanding that the presence of abundant biodiversity and healthy ecosystems has an overall positive effect on human well-being and health (chivian and bernstein, ; wood et al., ; sandifer et al., ; walther et al., ; morand and lajaunie, ; mcmahon et al., ) which should count as an additional health benefit of the economic degrowth scenario. naturally, decreasing mobility is a moral and political choice which can be informed by science, but not answered by science. however, given all the current negative impacts of high mobility (greenhouse gas emissions, land-use and land-cover change and the resulting habitat loss and fragmentation due to transportation infrastructure and energy production, transport of alien species, etc.), maybe it is time to ask whether it is morally justified, for example, to move the equivalent of all the inhabitants of a small town across a continent so that a football team can be supported by its fans during an away game? is it necessary to fly halfway around the world for a weekend shopping trip? is it really most cost-efficient for supply chains to cover the entire globe when all the externalities are included? is long-term sustainability achievable with ever higher rates of mobility? the demand for ever-increasing mobility is putting many stresses on the earth system and therefore also on many aspects of human well-being and health. in this study, we documented another one: the public health risks of an increasing number of disease outbreaks and their increasingly global spread. even without the devastating current impacts of the sars-cov- pandemic, the additional disease outbreak burden associated with our highly mobile and migratory human societies is a definite cost which must be considered in its moral and ethical implications as we consider the future trajectory of the anthropocene (ehrlich and ehrlich, ; steffen et al., ; schill et al., ) . 
the medicine that might kill the patient: structural adjustment and its impacts on health care in bangladesh planned economic contraction: the emerging case for degrowth emerging infectious diseases of plants: pathogen pollution, climate change and agrotechnology drivers host centrality in food web networks determines parasite diversity transportation intensity, urbanization, economic growth, and co emissions in the g- countries comparison of the immuview and the binaxnow antigen tests in detection of streptococcus pneumoniae and legionella pneumophila in urine human mobility networks, travel restrictions, and the global spread of h n pandemic elimination: what new zealand's coronavirus response can teach the world. the guardian new zealand's elimination strategy for the covid- pandemic and what is required to make it work approaching a state shift in earth's biosphere food security and sociopolitical stability mutualistic networks quarantine: local and global histories a new twenty-first century science for effective epidemic response the state of the world's biodiversity for food and agriculture. fao commission on genetic resources for food and agriculture assessments fast unfolding of community hierarchies in large network eigenvector-like measures of centrality for asymmetric relations forecasting potential emergence of zoonotic diseases in south-east asia: network analysis identifies key rodent hosts habitat fragmentation alters the properties of a host-parasite network: rodents and their helminths in south-east asia impacts of ocean acidification on marine seafood the hidden geometry of complex, network-driven contagion phenomena assessing the risks of west nile virus-infected mosquitoes from transatlantic aircraft: implications for disease emergence in the united kingdom. 
vector-borne zoonotic dis air travel and the spread of influenza: authors' reply empirical evidence for the effect of airline travel on inter-regional influenza spread in the united states roles of different transport modes in the spatial spread of the influenza a(h n ) pandemic in mainland china cdc and texas health department confirm first ebola case diagnosed in the u accelerated modern human-induced species losses: entering the sixth mass extinction public avoidance and epidemics: insights from an economic model the degrowth movement: alternative economic practices and relevance to developing countries sustaining life: how human health depends on biodiversity estimating average annual per cent change in trend analysis no us coronavirus cases were caught by airport temperature checks. here's what has worked modeling the worldwide spread of pandemic influenza: baseline case and containment intervention the role of the airline transportation network in the prediction and predictability of global epidemics delaying the international spread of pandemic influenza assessing the degrowth discourse: a review and analysis of academic degrowth policy proposals supply shock: economic growth at the crossroads and the steady state solution ecological economics: principles and applications effects of behavioral changes in a smallpox attack model the role of ecotones in emerging infectious diseases sustainable development must account for pandemic risk spreading dead zones and consequences for marine ecosystems global drivers of human pathogen richness and prevalence can a collapse of global civilization be avoided? from sars to covid- : a previously unknown sars-cov- virus of pandemic potential infecting humans-call for a one health approach. 
one health controlling pandemic flu: the value of international air travel restrictions direct evidence of adult aedes albopictus dispersal by car strategies for mitigating an influenza pandemic human mobility and the global spread of infectious diseases: a focus on air travel emerging fungal threats to animal, plant and ecosystem health strategies for containing a global influenza pandemic dissecting global air traffic data to discern different types and trends of transnational human mobility global environmental governance, technology and politics: the anthropocene gap duality of interaction outcomes in a plant-frugivore multilayer network scientists are sprinting to outpace the novel coronavirus global infectious disease and epidemiology online network quantifying the potential pathways and locations of rift valley fever virus entry into the united states centrality measures and the importance of generalist species in pollination networks are we there yet? in-flight food safety and cabin crew hygiene practices the worldwide air transportation network: anomalous centrality, community structure, and cities' global roles the causal relationship between air transport and economic growth: empirical evidence from south asia regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis behavioural defences in animals against pathogens and parasites: parallels with the pillars of medicine in humans urbanization and disease emergence: dynamics at the wildlife-livestock-human interface coral reefs under rapid climate change and ocean acidification will travel restrictions control the international spread of pandemic influenza? forecast and control of epidemics in a globalized world usefulness and applicability of infectious disease control measures in air travel: a review iata annual review fifth assessment report (ar ). 
intergovernmental panel on climate change shipping statistics and market review itf transport outlook ecological extinction and evolution in the brave new ocean planning for large epidemics and pandemics: challenges from a policy perspective spillover and pandemic properties of zoonotic viruses with high host plasticity global trends in emerging infectious diseases keystone species and food webs antibiotic and pesticide susceptibility and the anthropocene operating space stalls in africa's fertility decline partly result from disruptions in female education impacts of biodiversity on the emergence and transmission of infectious diseases the outcomes of old myths and the implications of new technologies for the sustainability of transport climate tipping points -too risky to bet against two billion and rising: the global trade in live animals in eight charts. the guardian decoupling transport from economic growth: extending the debate to include environmental and social externalities novel coronavirus (sars-cov- ) epidemic: a veterinary perspective evaluating the combined effectiveness of influenza control strategies and human preventive behavior effectiveness of travel restrictions in the rapid containment of human influenza: a systematic review ecosystem change and zoonoses in the anthropocene the great acceleration: an environmental history of the anthropocene since diversity and origins of human infectious diseases emergence of infectious diseases: risks and issues for societies biodiversity and health: linking life, ecosystems and societies confronting emerging zoonoses: the one health paradigm climate variability and outbreaks of infectious diseases in individualistic values are related to an increase in the outbreaks of infectious diseases and zoonotic diseases estimating regression models with unknown break-points segmented: an r package to fit regression models with broken-line relationships human health impacts of ecosystem alteration diagnosis and 
classification of diseases from subjects via pattern analysis of exhaled molecules long-run causal relationship between economic growth hand-hygiene mitigation strategies against global disease spreading through the air transportation network a quantitative risk assessment (qra) of the risk of introduction of the japanese encephalitis virus (jev) in the united states via infected mosquitoes transported in aircraft and cargo ships in the wake of structural adjustment programs -exploring the relationship between domestic policies and health outcomes in argentina and uruguay structure and evolution of weighted networks unhealthy landscapes: policy recommendations on land use change and infectious disease emergence global trends in lifespan inequality causal inference in disease ecology: investigating ecological drivers of disease emergence ongoing worldwide homogenization of human pathogens biodiversity loss affects global disease ecology mobility, vehicle fleet, energy use and emissions forecast tool (moveet) the r project for statistical computing. r foundation for statistical computing world scientists' warning of a climate emergency emerging human infectious diseases and the links to global food production summarizing the evidence on the international trade in illegal wildlife the long-run relationships between transport energy consumption, transport infrastructure, and economic growth in mena countries green growth or degrowth? assessing the normative justifications for environmental sustainability and economic growth through critical social theory exploring connections among nature, biodiversity, ecosystem services, and human health and well-being: opportunities to enhance health and biodiversity conservation global forecasts of shipping traffic and biological invasions to a more dynamic understanding of human behaviour for the anthropocene degrowth and postcapitalism: transformative geographies beyond accumulation and growth crisis or opportunity? 
economic degrowth for social equity and ecological sustainability. introduction to this special issue sold into extinction: the global trade in endangered species germs on a plane: the transmission and risks of airplane-borne diseases conventional thinking will not solve the climate crisis. the guardian global rise in human infectious disease outbreaks changing geographic distributions of human pathogens globalization of human infectious disease the trajectory of the anthropocene: the great acceleration planetary boundaries: guiding human development on a changing planet trajectories of the earth system in the anthropocene maritime economics a saliva-based rapid test to quantify the infectious subclinical malaria parasite reservoir global traffic and disease vector dispersal identifying compartments in presence-absence matrices and bipartite networks: insights into modularity measures the role of movement restrictions in limiting the economic impact of livestock infections lessons from the history of quarantine, from plague to influenza a anthropogenic pressure on the open ocean: the growth of ship traffic revealed by altimeter data analysis habitat modification alters the structure of tropical host-parasitoid food webs united nations, department of economic and social affairs, population division unctad, . unctad handbook of statistics . united nations conference on trade and development review of maritime transport unctad, b. unctad handbook of statistics . united nations conference on trade and development global environmental outlook geo- . united nations environment programme radical goals for sustainable development unwto tourism highlights. edition. world tourism organization (unwto) unwto, . unwto tourism highlights. edition. world tourism organization (unwto) unwto, . tourism towards . global overview. 
world tourism organization (unwto) the millennium development goals: a cross-sectoral analysis and principles for goal setting after the radical rebuilding of societies biodiversity and health: lessons and recommendations from an interdisciplinary conference to advise southeast asian research, society and policy pathogen survival in the external environment and the evolution of virulence annual world airport traffic forecasts - degrowth -taking stock and reviewing an emerging academic paradigm distinct spread of dna and rna viruses among mammals amid prominent role of domestic species disease ecology and the global emergence of zoonotic pathogens travel and the emergence of infectious diseases population movements and emerging diseases does biodiversity protect humans against infectious disease? air transport, freight (million ton-km) air transport maritime highways of global trade maritime logistics: a complete guide to effective shipping and port management an efficient and costeffective method for purification of small sized dnas and rnas from human urine development and poverty reduction: a global comparative perspective a pneumonia outbreak associated with a new coronavirus of probable bat origin this work was part of thefuturehealthsea project funded by the french anr (anr- -ce - - ). s.m. is supported by the thailand international cooperation agency (tica) "animal innovative health". we sincerely thank dieter stockmann from the institute of shipping economics and logistics (isl) and claire thackeray from container trades statistics (cts) for providing data about container movements, jean tournadre for providing data about ship numbers, and ting-wu chuang for providing references. 
traffic vehicles for transportation < to > million motor vehicles (> %) (steffen et al., a) - international tourism < to > million arrivals (> %) (steffen et al., a) table are approximations whenever numbers are given with a ~, < or > sign because these numbers were taken from graphs in steffen et al. ( a) and isl ( ). abbreviations: dwt = deadweight tonnage is a measure of how much weight a ship can carry; mt = metric ton; teu = twenty-foot equivalent unit, a standard size for . m long containers. key: cord- -gmy zyad authors: he, sijia; waheed, abdul a.; hetrick, brian; dabbagh, deemah; akhrymuk, ivan v.; kehn-hall, kylene; freed, eric o.; wu, yuntao title: psgl- inhibits the virion incorporation of sars-cov and sars-cov- spike glycoproteins and impairs virus attachment and infectivity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gmy zyad p-selectin glycoprotein ligand- (psgl- ) is a cell surface glycoprotein that binds to p-, e-, and l-selectins to mediate the tethering and rolling of immune cells on the surface of the endothelium for cell migration into inflamed tissues. psgl- has been identified as an interferon-γ (ifn-γ)-regulated factor that restricts hiv- infectivity, and has recently been found to possess broad-spectrum antiviral activities. here we report that the expression of psgl- in virus-producing cells impairs the incorporation of sars-cov and sars-cov- spike (s) glycoproteins into pseudovirions and blocks virus attachment and infection of target cells. these findings suggest that psgl- may potentially inhibit coronavirus replication in psgl- + cells. the ongoing coronavirus disease (covid- ) is a global pandemic afflicting more than million people in over countries and territories, resulting in more than , deaths as of june th, . currently, there are no effective treatments or vaccines. understanding virus-host interactions is critical for developing novel therapeutics and vaccines. 
p-selectin glycoprotein ligand- (psgl- , also known as selplg or cd ) is a human protein recently identified to possess broad-spectrum antiviral activity ( ) . psgl- binds to the selectin family of proteins, p-, e-, and l-selectin ( ) , and mediates immune cell tethering and rolling on the surface of endothelium to promote cell migration into inflamed tissues ( ) . in the context of viral infection, psgl- has been identified as an ifn-γ-regulated inhibitory factor involved in blocking hiv- infectivity ( ), and was recently found to possess broad-spectrum antiviral activity ( ) , blocking viral infections through steric hindrance of particle attachment to target cells ( , ) . the coronavirus spike (s) glycoproteins play an essential role in viral entry by binding the cell-surface receptor on target cells and mediating the fusion between viral and cellular membranes during virus entry ( ) . the s protein is also the target of neutralizing antibodies generated by the infected host. because of its central role in virus infection and adaptive immunity, the s protein is a prime target for the development of antiviral therapeutics and vaccines. in addition to the adaptive arm of the host immune response, viral infections trigger an innate immune response, largely induced by ifn, that sets up an antiviral state. hundreds of ifn-stimulated genes (isgs) are induced by viral infection ( ) . while the role of some isgs in blocking the replication of particular viruses has been well established, the vast majority of isgs have not been characterized. because of the significance of host innate immunity in viral transmission and replication within and between hosts, there is an unmet need to understand these antiviral inhibitory factors in detail. previous studies have demonstrated that psgl- can be incorporated into hiv- virions, and its virion incorporation subsequently blocks viral infectivity ( , ) . 
to investigate the ability of psgl- to restrict coronavirus infection, we first established a lentiviral vector-based coronavirus pseudovirus infection system ( ) , in which the s proteins from either sars-cov or sars-cov- were used to pseudotype lentiviral particles (fig. a ). using this system, we assembled particles in the presence or absence of psgl- ( ), and then used the particles to infect target vero and calu- cells, which endogenously express the primary sars-cov and cov- receptor, angiotensin converting enzyme (ace ) ( , ). the expression of psgl- in viral producer cells had a minor (∼ two-fold) effect on the release of sars-cov and -cov- pseudovirions ( fig. b and c) , consistent with the previous finding that psgl- expression has minimal effects on viral release ( ). however, the infectivity of psgl- -imprinted sars-cov particles was completely abrogated in vero cells (fig. d) , demonstrating the ability of psgl- to block the infectivity of sars-cov s-bearing virions. we further tested the effect of psgl- on the infectivity of lentiviral particles pseudotyped with the sars-cov- s protein. we found that particles pseudotyped with sars-cov- s protein had much lower infectivity than those pseudotyped with sars-cov s protein. to resolve this technical issue, we developed a more sensitive reporter system in which a luciferase reporter (luc) gene was expressed from the hiv- ltr in the presence of co-expressed hiv- tat protein ( ) (fig. a) . a major advantage of this system is that high-level luc expression can be achieved upon transactivation by co-expressed tat protein following viral entry, which minimizes non-specific luc background from non-productive viral entry ( ) . using this system, we found that the infectivity of the sars-cov- pseudovirus is also potently inhibited by the expression of psgl- in the virus-producer cells ( fig. e and f) . 
together, these results demonstrate that psgl- expression in the virus-producer cells severely diminishes the infectivity of virions bearing sars coronavirus s proteins. to investigate possible mechanisms, we analyzed the virion incorporation of sars-cov s proteins in the presence of psgl- . our previous study showed that psgl- can inhibit the incorporation of the hiv envelope glycoprotein ( ). as shown in fig. b , the expression of psgl- in the virus-producer cell also decreased the amount of both sars-cov and sars-cov- s proteins on virions. we and others previously reported that psgl- -mediated inhibition of virion infectivity occurs through steric hindrance of particle attachment to target cells, which does not depend on the presence of viral envelope glycoproteins ( , ) . we performed a virion attachment assay and observed that the lentiviral particles pseudotyped with sars-cov or sars-cov- s protein produced from psgl- -expressing cells were impaired in their ability to attach to target cells (fig. d) . these results demonstrate that the presence of psgl- on virus particles can structurally hinder virion interaction with the target cells even in the presence of remaining s proteins, consistent with previous studies of psgl- and hiv- infection ( , ) . in this report, we demonstrate that the expression of psgl- in virus-producer cells impairs the infectivity of virions bearing the s protein of either sars-cov or sars-cov- , a phenotype shared among several other viruses (e.g., hiv- , murine leukemia virus, and influenza virus) found to be sensitive to psgl- restriction ( ). psgl- has been suggested to be expressed in certain lung cancer cells ( ) , and in lung phagocytes that control the severity of pneumococcal dissemination from the lung to the bloodstream ( , ) . nevertheless, it remains to be determined whether psgl- is expressed in sars coronavirus target cells in the lungs, and, if so, whether its expression can impair viral infection. 
in hiv- infection, the viral accessory proteins vpu and nef have been shown to antagonize psgl- on cd t cells through surface downregulation and intracellular degradation ( , ) . it also remains unknown whether coronaviruses possess a mechanism for antagonizing psgl- . virus assembly. the sars-cov s and sars-cov- s protein expression vectors were kindly provided by gary whittaker and nevan krogan, respectively. a sars-cov- s protein expression vector was also purchased from sinobiological. for the production of gfp reporter lentiviral particles pseudotyped with sars-cov-s, the sars-cov-s expression vector ( . µg), pcmvΔr . ( . µg), and plko. -puro-turbogfp ( µg) were cotransfected with either pcmv -psgl- ( µg), or pcmv -empty vector ( µg) as previously described ( ) . for the production of luciferase reporter lentiviral particles pseudotyped with sars-cov- -s, the sars-cov- -s expression vector ( . µg), pcmvΔr . ( . µg), and pltr-tat-ires-luc ( µg) were cotransfected with either pcmv -psgl- ( µg), or pcmv -empty vector ( µg). both sars-cov-s and sars-cov- -s pseudotyped viral particles were produced in hek t cells. virus supernatants were collected at - hours post transfection, concentrated by ultracentrifugation, and stored at - °c. hiv- p elisa. sars-cov-s and sars-cov- -s pseudotyped lentiviral particles were quantified using an in-house p elisa kit as previously described ( ) . hiv- env-defective pnl - /kfs ( µg ) and vectors expressing the s protein of either sars- data availability all data generated or analyzed during this study are included in this article. reagents are available from y. w. upon request. 
psgl- restricts hiv- infectivity by blocking virus particle attachment to target cells expression cloning of a functional glycoprotein ligand for pselectin insights into the molecular basis of leukocyte tethering and rolling revealed by structures of p-and e-selectin bound to sle(x) and psgl- proteomic profiling of hiv- infection of human cd (+) t cells identifies psgl- as an hiv restriction factor virion-incorporated psgl- and cd inhibit both cell-free infection and transinfection of hiv- by preventing virus-cell binding coronavirus membrane fusion mechanism offers a potential target for antiviral development interferons: reprogramming the metabolic network against viral infection activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites angiotensin-converting enzyme is a functional receptor for the sars coronavirus sars-cov- cell entry depends on ace and tmprss is blocked by a clinically proven protease inhibitor tat controls transcriptional persistence of unintegrated hiv genome in primary human macrophages development of a nonintegrating rev-dependent lentiviral vector carrying diphtheria toxin a chain and human traf to target hiv reservoirs activated platelets interact with lung cancer cells through p-selectin glycoprotein ligand- peritoneal macrophages express both pselectin and psgl- psgl- on leukocytes is a critical component of the host immune response against invasive pneumococcal disease hiv envelope-cxcr signaling activates cofilin to overcome cortical actin restriction in resting cd t cells left panel; in the absence of psgl- in the virus-producer cell, virions bearing s protein bind the ace receptor and infect the target cell. 
right panel; expression of psgl- in the virus-producer cell results in diminished s protein incorporation, and psgl- incorporation into virions sterically blocks virus binding to target cells. competing interests: a provisional patent application pertaining to the results presented in this paper has been filed by george mason university. key: cord- - kslllaq authors: kaur, s.; gomez-blanco, j.; khalifa, a.; adinarayanan, s.; sanchez-garcia, r.; wrapp, d.; mclellan, j. s.; bui, k. h.; vargas, j. title: local computational methods to improve the interpretability and analysis of cryo-em maps date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kslllaq cryo-electron microscopy (cryo-em) maps usually show heterogeneous distributions of b-factors and electron density occupancies and are typically b-factor sharpened to improve their contrast and interpretability at high-resolutions. however, ‘over-sharpening’ due to the application of a single global b-factor can distort processed maps causing connected densities to appear broken and disconnected. this issue limits the interpretability of cryo-em maps, i.e. ab initio modelling. in this work, we propose (i) approaches to enhance high-resolution features of cryo-em maps, while preventing map distortions and (ii) methods to obtain local b-factors and electron density occupancy maps. these algorithms have as common link the use of the spiral phase transformation and are called locspiral, locbsharpen, locbfactor and lococcupancy. our results, which include improved maps of recent sars-cov- structures, show that our methods can improve the interpretability and analysis of obtained reconstructions. 
cryo-electron microscopy (cryo-em) has become a mainstream technique for structure determination of macromolecular complexes at close-to-atomic resolution and ultimately for building an atomic model (ge, scholl et al. , wandzik, kouba et al. ). with its unique ability to reconstruct multiple conformations and compositions of the macromolecular complexes, cryo-em allows the understanding of the structural and assembly dynamics of macromolecular complexes in their native conditions (davis, tan et al. , plaschka, lin et al. , razi, davis et al. ). however, the presence of heterogeneity in cryo-em maps leads to a high variability in resolution within different regions of the same map. this leads to challenges and errors in the process of building an atomic model from a cryo-em reconstruction. additionally, current reconstructions from cryo-em do not provide essential information to build accurate ab initio atomic models, such as atomic debye-waller factors (b-factors) or atomic occupancies, while their counterparts from x-ray crystallography do by analyzing the attenuation of scattered intensity represented at bragg peaks. cryo-em structures exhibit loss of contrast at high-resolution coming from many different sources, including molecular motions, heterogeneity and/or signal damping by the transfer function of the electron microscope (ctf). interpretation of high-resolution features in cryo-em maps is essential to understanding the biological functions of macromolecules. thus, approaches to compensate for this contrast loss and improve map visibility at high-resolution are crucial. this process is usually referred to as "sharpening" and is typically performed by imposing a uniform b-factor to the cryo-em map that boosts the map signal amplitudes within a defined resolution range. 
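as a rough illustration of uniform b-factor sharpening, the following python sketch fits a line to a guinier-style plot (logarithm of the shell amplitude against 1/d²) and uses the slope to rescale the amplitudes. it is a minimal sketch under stated assumptions, not the procedure of any particular package: the function names, the synthetic shell values and the sign convention (amplitudes decaying as exp(-b/(4d²)), so that the sharpening b-factor is four times the fitted slope and therefore negative) are all choices made here for illustration.

```python
import math


def fit_guinier(resolutions, amplitudes):
    """Least-squares line through ln(amplitude) versus 1/d^2.

    Returns (slope, intercept). Hypothetical helper for illustration.
    """
    xs = [1.0 / (d * d) for d in resolutions]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sharpening_b_factor(slope):
    # Assumed convention: F(d) = F0 * exp(-B_decay / (4 d^2)), so the
    # Guinier slope is -B_decay / 4 and the B-factor that undoes the
    # decay is 4 * slope (a negative number).
    return 4.0 * slope


def sharpen(amplitude, d, b_sharpen):
    # A negative b_sharpen boosts amplitudes at small d (high resolution).
    return amplitude * math.exp(-b_sharpen / (4.0 * d * d))


# Synthetic resolution shells (in angstroms) with a known decay.
true_decay = 120.0
shells = [10.0, 8.0, 6.0, 5.0, 4.0, 3.5]
amps = [50.0 * math.exp(-true_decay / (4.0 * d * d)) for d in shells]

slope, intercept = fit_guinier(shells, amps)
b_sharp = sharpening_b_factor(slope)
restored = [sharpen(a, d, b_sharp) for a, d in zip(amps, shells)]
print(round(b_sharp, 3), [round(r, 3) for r in restored])
```

in practice the fit would be restricted to a chosen resolution range, and noise weighting of the amplitudes, which this sketch omits, matters at high resolution.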
when the map is sharpened with increasing positive b-factors the clarity and map details initially improve, but eventually the map becomes worse as the connectivity is lost, and the map densities appear broken and noisy. in the global sharpening approach (rosenthal and henderson , fernandez, luque et al. , scheres ), the b-factor is automatically computed by determining the line that best fits the decay of the spherically averaged noise-weighted amplitude structure factors, within a resolution range given by [ , rmax], with rmax the maximum resolution in the map given by the fourier shell correlation (fsc). more recently, the autosharpen method within phenix calculates a single b-factor that maximizes both map connectivity and details of the resulting sharpened map. autosharpen automatically chooses the b-factor that leads to the highest level of detail in the map, while maintaining connectivity. this combination is optimized by maximizing the surface area of the contours in the sharpened map. the approaches presented above are global, so the same signal amplitude scaling is applied to map regions that may exhibit very different signal to noise ratios (snrs) at medium/high resolutions. thus, cryo-em maps showing inhomogeneous snrs (and resolutions) can result in sharpened maps that show both over-sharpened and under-sharpened regions. the former may be strongly affected by noise and broken densities, while the latter may present reduced contrast at high-resolutions. both cases make it difficult or even impossible to interpret the biological relevance of these regions or even the whole map (murshudov ) . thus, local sharpening methods have been proposed to overcome these limitations (erney ramirez-aportela , jakobi, wilmanns et al. ) . the locscale approach (jakobi, wilmanns et al. ) compares radial averages of structure factor amplitudes inside moving windows between the experimental and the atomic density maps. 
afterwards, the method locally modifies the map amplitudes of the experimental map in fourier space to rescale them according to those of the atomic map. this approach requires as input a complete atomic model (without major gaps) fitted to the cryo-em map to be sharpened, which is not always available. in addition, the size of the moving window should be provided and depending on the quality of the map to be sharpened, this process may lead to overfitting. more recently, the localdeblur method (erney ramirez-aportela ) proposed an approach for map local sharpening using as input an estimation of the local resolution. the method assumes that the map local density values have been obtained by the convolution between a local isotropic low-pass filter and the actual map. this local low-pass filter is assumed gaussian shaped so that the frequency cutoff is given by the local resolution estimation. in x-ray crystallography, the b-factor (also called temperature value or debye-waller factor) describes the degree to which the electron density is spread out, indicating the true static or dynamic mobility of an atom and/or the positions where errors may exist in the model building. the b-factor is given by $B_i = 8\pi^2 \langle u_i^2 \rangle$, where $\langle u_i^2 \rangle$ is the mean square displacement for atom i. these atomic b-factors can be experimentally measured in x-ray crystallography, introduced as an amendment factor of the structure factor calculations since the scattering effect of x-ray is reduced on the oscillating atoms compared to the atoms at rest (sherwood, cooper et al. ). b-factors can be further refined by model building packages, e.g. phenix (liebschner, afonine et al. ) or refmac (winn, murshudov et al. ) to improve the quality and accuracy of atomic models. although b-factors are essential to 'sharpen' cryo-em maps at high-resolution, they also provide key information to analyze cryo-em reconstructions. 
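the standard relation between the b-factor and the mean square displacement, b = 8π²⟨u²⟩, can be made concrete with a short worked example. the sketch below converts an assumed root-mean-square displacement into a b-factor and evaluates the usual debye-waller attenuation exp(-b/(4d²)) at a given resolution d. the helper names and the 0.5 Å displacement are hypothetical values chosen only for illustration.

```python
import math


def b_factor_from_msd(mean_square_displacement):
    """B = 8 * pi^2 * <u^2>, with <u^2> in square angstroms."""
    return 8.0 * math.pi ** 2 * mean_square_displacement


def debye_waller_attenuation(b_factor, d):
    """Amplitude attenuation exp(-B / (4 d^2)) at resolution d (angstroms)."""
    return math.exp(-b_factor / (4.0 * d * d))


# Assumed example: an atom with a 0.5 angstrom r.m.s. displacement.
rms_displacement = 0.5
b_example = b_factor_from_msd(rms_displacement ** 2)

for d in (4.0, 3.0, 2.0):
    # The attenuation becomes stronger as d decreases (higher resolution).
    print(d, round(debye_waller_attenuation(b_example, d), 4))
```

the loop illustrates why a larger b-factor suppresses high-resolution detail more strongly than low-resolution features, which is the effect that sharpening tries to compensate.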
effective b-factors are used to model the combined effects of issues such as molecular drifting due to charging effects, macromolecular flexibility or possible errors in the reconstruction workflow that lead to a signal falloff (rosenthal and henderson , liao and frank , penczek ). however, cryo-em maps are usually analyzed with a single b-factor, even though maps may largely differ in different regions. thus, methods to determine local b-factors are much needed to accurately analyze cryo-em maps and improve the quality of fitted atomic models. another local parameter usually provided by x-ray crystallography in contrast with cryo-em are atomic occupancies (or q-values). the occupancy estimates the presence of an atom at its mean position and it ranges between . to . . note that these parameters can be also refined by model building packages if the electron density map is of sufficient resolution. to our knowledge, there is currently no available method to estimate local occupancies from cryo-em maps, even though this information (in addition to local b-factors) is essential to building accurate atomic models. for example, in (afonine, klaholz et al. ) the authors found that % of all models examined in this analysis possess unrealistic occupancies and/or b-factor values, such as all being set to zero or other unlikely values. they also reported that % of models analyzed show cross-correlations between cryo-em maps and respective models below . , and they indicated as a possible hypothesis an incomplete optimization of the model parameters (coordinates, occupancies and b-factors). in this work, we propose semi-automated methods to enhance high-resolution map features to improve their visibility and interpretability. more importantly, these approaches do not require input parameters such as fitted atomic models or local resolution maps, which reduces the possibility of overfitting. 
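the proposed methods share the spiral phase transform as a common building block, which is used to estimate a local amplitude (modulation) map. the sketch below shows the idea in two dimensions for a small zero-mean test image: the fourier transform is multiplied by the spiral phase factor (u + iv)/|(u, v)|, and the local amplitude is taken as sqrt(f² + |spt(f)|²). this is an illustrative sketch under stated assumptions, not the authors' implementation, which operates on three-dimensional maps at different resolutions; all function names are hypothetical, and a naive discrete fourier transform is used only to keep the example dependency-free.

```python
import cmath
import math


def dft1(seq, sign):
    # Naive 1D discrete Fourier transform (sign=-1 forward, +1 inverse, unscaled).
    n = len(seq)
    return [sum(seq[k] * cmath.exp(sign * 2j * math.pi * j * k / n)
                for k in range(n)) for j in range(n)]


def dft2(img, inverse=False):
    # Separable 2D transform: rows first, then columns.
    sign = 1 if inverse else -1
    rows = [dft1(list(r), sign) for r in img]
    cols = [dft1(list(c), sign) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]
    if inverse:
        scale = len(img) * len(img[0])
        out = [[v / scale for v in r] for r in out]
    return out


def signed_frequency(index, n):
    # Map a DFT index to a signed frequency in cycles per pixel.
    return (index if index < n / 2 else index - n) / n


def spiral_phase_transform(img):
    ny, nx = len(img), len(img[0])
    spec = dft2(img)
    for v in range(ny):
        for u in range(nx):
            fu, fv = signed_frequency(u, nx), signed_frequency(v, ny)
            radius = math.hypot(fu, fv)
            # Spiral phase factor exp(i*phi); the zero frequency is set to 0.
            spec[v][u] *= complex(fu, fv) / radius if radius > 0 else 0.0
    return dft2(spec, inverse=True)


def local_amplitude(img):
    # Assumes img is zero-mean and locally resembles a cosine fringe.
    spt = spiral_phase_transform(img)
    return [[math.sqrt(f * f + abs(q) ** 2) for f, q in zip(f_row, q_row)]
            for f_row, q_row in zip(img, spt)]


# Test image: a cosine fringe of amplitude 2 with integer frequencies.
n = 16
image = [[2.0 * math.cos(2 * math.pi * (3 * x + 2 * y) / n) for x in range(n)]
         for y in range(n)]
amplitude = local_amplitude(image)
print(round(min(map(min, amplitude)), 6), round(max(map(max, amplitude)), 6))
```

for a single cosine fringe with a whole number of periods in the image, the recovered amplitude equals the fringe amplitude at every pixel; a real map would first need band-pass filtering so that each band behaves locally like such a fringe.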
in particular, our proposed local map enhancement approach (locspiral) is robust to maps affected by inhomogeneous local resolutions/snrs, and thus strongly improves the interpretability of these maps. secondly, we also propose approaches to determine local b-factors and density occupancy maps to improve the analysis of cryo-em reconstructions. the link between the different proposed approaches is the use of the spiral phase transform to estimate a modulation or amplitude map of the cryo-em reconstruction at different resolutions. we tested our proposed methods with four different samples ranging from near-atomic single-particle reconstructions (∼ . Å) to maps with more modest resolutions (∼ . Å). in all cases, we compared our results with the ones provided by the relion postprocessing approach (fernandez, luque et al. , kimanius, forsberg et al. ). first, we analysed a single-particle reconstruction of the polycystin- (pc ) trp channel (emdatabank: emd- ) (wang, corey et al. ). in this case, we focussed on showing the capacity of our locspiral approach, though, for the sake of consistency, we also show our b-factor and occupancy maps. the original publication reports a resolution of . Å with a final b-factor to be used for sharpening of - . Å (slope of guinier plot fitting equal to - . Å ). in figure we also compared the performance of locspiral with other methods, including localdeblur, our proposed local b-factor correction method (locbsharpen) and the global b-factor correction approach as implemented in relion. to compare the different results, we used metrics proposed in (afonine, klaholz et al. ). the results are shown in figure s . in this case, we used a relatively high threshold value to visually compare the different maps. from figure s , we can see that the map obtained by our proposed method shows good connectivity and is less affected by broken or missing densities. emringer (barad, echols et al. ) and cross-correlation scores (obtained using pdb t n as reference) are approximately similar in all cases, though the highest scores are provided by the locspiral and locbsharpen approaches. for the sake of comparison, we also provide fsc curves calculated by comparing the different maps with the reference atomic model (pdb t n). in this case, the best results at high resolutions are provided by localdeblur and by our proposed locspiral approach. in addition, we provide the results of our local b-factor and occupancy map calculations. in figure b (upper map), we show the obtained local b-factor map to be used for sharpening (slope of the local guinier plot multiplied by a factor ) and the a map (local values of the logarithm of the structure factor amplitudes at Å). the resolution range used to estimate these maps was between Å and the fsc resolution ( . Å). the average value of the local b-factor map is - . Å , which is in close agreement with the value provided by relion (- . Å ). the a map provides the fitted local amplitudes at Å, showing the local "amount" of signal at this resolution. as expected, figure b shows that the inner parts of the protein have lower b-factors than the outer regions. in figure c , we show the obtained local occupancy map. interestingly, both the occupancy and a maps show low values in the regions occupied by detergent, lipid and cholesterol densities (please see figure in (wang, corey et al. )), indicating the presence of compositional variability in these regions and low signal at Å. next we processed immature ribosomal maps of the bacterial large subunit (davis, tan et al. ). these maps were obtained after depletion of the bl ribosomal protein and are publicly available from the emdb (emd- , emd- , emd- , emd- , emd- ) (lawson, baker et al. ).
in this case, we focussed on showing the capacity of our proposed local occupancy maps to interpret and analyse reconstructions showing a high degree of compositional heterogeneity. figure shows the obtained results. the first row shows the different maps to be processed as deposited in the emdb. next, we show the obtained occupancy maps, where the mature s ribosome (emdb- ) is coloured according to the corresponding occupancy maps. these figures clearly show regions that are lacking in the different immature maps with respect to the mature map. thus, occupancy maps were used to create binary masks to segment the mature s ribosome map and then extract the densities that are missing in the respective immature maps. these densities are shown in the third column of figure with different colours (yellow, red, indigo and green). the obtained occupancy maps also allow us to define a "maturity level" index. this index is calculated by comparing the number of voxels activated in the solvent mask of the mature s reconstruction with the ones in the occupancy masks (see methods section for a more detailed description). as can be seen from figure , the larger the unfolded regions in the immature maps are, the smaller the maturity level is. this maturity level index allows us to quantitatively sort the different immature maps in a spectrum according to their maturity. we further show the advantages of our locspiral and locbfactor approaches in these highly heterogeneous datasets. in the second row of figure , we show maps with improved contrast at high resolution obtained after processing emd- by the locspiral method and by relion (fernandez, luque et al. , kimanius, forsberg et al. ). the same soft mask was applied to both maps. in the figure, we show the maps at low and high threshold values. when a low threshold value is used, it is not possible to see details in the relion map, while at high threshold values many regions of this map are not visible.
conversely, our locspiral approach shows high-resolution features at both high and low thresholds without losing appreciable map densities. finally, we also show the obtained local b-factor map (b map) and the local values of the logarithm of the structure factor amplitudes at Å (a map in the figure). the average value of the local b-factor map to be used for sharpening is - . Å (slope of the local guinier plot multiplied by ). we obtained the b-factor estimates within a resolution range between Å and the fsc resolution, given by . Å. interestingly, the b-factor map shows lower b-factors in the outer part of the macromolecule, corresponding to regions that are partially folded and show compositional and conformational heterogeneity. as we will see in more detail in the next section, these modestly sloped guinier plots come from low amplitudes (close to or below the noise amplitude level) at resolutions of Å or higher. this result can be directly observed in the obtained a map in the figure, which shows low values in potentially unfolded regions. therefore, in order to accurately analyze local b-factor maps, it is necessary to interpret both b and a maps. next we processed the saccharomyces cerevisiae pre-catalytic b complex spliceosomal single particles deposited in empiar (empiar ) (iudin, korir et al. , plaschka, lin et al. ) with relion (fernandez, luque et al. , kimanius, forsberg et al. ). then, the unfiltered map provided by relion autorefine was used to test our proposed methods. in figure a , we show a central slice along the z axis of this map, and several points are marked with coloured squares. these points show parts of the map that correspond to clear spliceosome densities (green and red), flexible and low-resolution spliceosomal regions (yellow and blue) and background (magenta). figure b shows the corresponding guinier plots of these points.
solid lines represent measured values of the logarithm of snr-weighted structure factor amplitudes, while dashed lines show fitted curves. this figure also provides the obtained b-factors for the different curves. the guinier plots and b-factors are determined within a resolution range of Å to the fsc resolution, given by . Å. as can be seen from figure b , the red and green curves, which correspond to clear spliceosomal densities, present high amplitude values at Å, while the yellow, blue and magenta curves show low amplitudes at Å and a flat profile within the resolution range. in figure b we also show, as a black curve, the guinier plot of the noise/background amplitudes obtained from the % quantile of the empirical noise/background distribution, for reference. the dashed black line indicates the linear fit of this noise guinier plot. comparing the yellow, blue and magenta curves with it, it is clear that these plots are below our noise level and that their shape is similar to that of the noise curve. figure d shows the spliceosome map coloured with the obtained b-factor map to be used for sharpening (slope of the local guinier plot multiplied by ). the average of these b-factors for sharpening is - . Å , while the value reported by relion postprocess is - . Å . figure e shows a central slice of the obtained b-factor map along the z axis. as can be seen from figure e, that can be seen in figure e . thus, these regions show very low signal-to-noise ratios and are just noise within this resolution range. therefore, we have recalculated our b-factors using a new resolution range of [ , ] Å. the results are shown in figure s . as can be seen from figure s , the flexible parts now show low b-factors and amplitudes at Å. in this case, the average value of the local b-factor map to be used for sharpening is - . Å (slope of the local guinier plot multiplied by ).
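the local guinier fits discussed above can be illustrated with a minimal sketch. it assumes that a stack of positive local amplitude maps, one per resolution shell, is already available, and that the logarithm of the amplitude is fitted linearly against 1/d^2 at every voxel. the factor 4 between slope and b-factor follows the standard relation ln F = ln F0 - B / (4 d^2) and is an assumption here, since the factor is not legible in the text. the function name `local_guinier_fit` is hypothetical.

```python
import numpy as np

def local_guinier_fit(amplitudes, resolutions):
    """Voxel-wise linear fit of log-amplitude versus 1/d^2.

    amplitudes  : array of shape (K, ...) with positive local amplitudes,
                  one map per resolution shell
    resolutions : K resolutions d_k in Angstrom
    Returns (b_sharpen, a_map): 4 * slope (negative when signal decays) and
    the fitted log-amplitude at the lowest frequency in the range.
    """
    d = np.asarray(resolutions, dtype=float)
    x = 1.0 / d**2
    x = x - x.min()  # shift so the intercept sits at the lowest frequency used
    y = np.log(amplitudes).reshape(len(d), -1)  # one column per voxel
    design = np.stack([x, np.ones_like(x)], axis=1)  # columns: slope, intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    shape = amplitudes.shape[1:]
    return 4.0 * coef[0].reshape(shape), coef[1].reshape(shape)
```

voxels whose fitted log-amplitude is close to the noise level give flat, unreliable slopes, which is why the b and a maps need to be read together.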
we also show results obtained by the lococcupancy and locspiral methods for this highly heterogeneous case. figure c shows the spliceosome map coloured according to the occupancy map. from figure c , we see that the flexible and moving parts of the spliceosome show low occupancies. finally, in figure g , we show maps at different orientations and similar threshold values obtained by our locspiral approach and by the postprocessing method of relion (fernandez, luque et al. , kimanius, forsberg et al. ). the map obtained by our proposed method is marked with an asterisk. as before, the map obtained by our proposed approach shows fewer fragmented and broken densities, especially in the flexible part of the spliceosome reconstruction, and enhanced details in the central core portion. we have processed recent cryo-em maps of the cov spike ( the sars-cov- spike ectodomain structure (emd- ) with a single rbd up. the reported global resolution of these maps is . Å, . Å, . Å and . Å, respectively. interestingly, the deposited atomic models (pdb vsb, pdb vxx and pdb vyb) incompletely cover the reconstructed cryo-em maps, showing the existence of disordered or over-sharpened regions after b-factor correction that could not be modelled. figure s displays the corresponding maps and fitted atomic models, showing the large amount of protein that is not currently modelled. in figure a , we show the deposited emd maps and the maps obtained by our proposed locspiral approach. in this figure, we use a relatively low threshold to visualize the outer parts of the protein. this figure shows that our reconstructions present fewer fragmented and broken densities and better map connectivity than the deposited emd maps, suggesting that our approach improves the analysis and visualization of the outer regions and potentially aids in the modelling of additional map motifs.
interestingly, the obtained emd- map shows some fragmented density at the top of the spike; however, we believe that this additional density is an artifact that results from artificially imposing c symmetry on particles that are asymmetric. then, we used the reconstruction obtained from locspiral of emd- and to improve the deposited atomic model (pdb vsb). as a result, we could model additional loops and motifs: k .c-f .c; e .c-s .c; nag .c; p .c-k .c, and some additional amino acids, which are now visible in the improved map: p .c-g .c; s .c-v .c; a .b-a .b. we were also able to visualize density corresponding to numerous additional n-linked glycans that could not be resolved in the original reconstruction. examples of some regions that could be further modelled are shown in figures c and d . in figure c , we show on the left the obtained locspiral map with the improved atomic model in green, and on the right the deposited emd map with the pdb vsb in magenta. figure d shows in white the pdb vsb with the traced parts of the glycan proteins marked with purple spheres and in red the additional parts traced using our improved locspiral map. in addition, in this figure we provide zoomed views of two glycan proteins that could be further modelled with our improved map, which is also shown in the image. finally, in figure b , we show the obtained local b-factor maps using a similar colormap, together with the estimated global resolutions. these figures show that emd- and emd- have lower b-factors than emd- and emd- , and thus better localizability of secondary structure and residues. we have also applied these techniques to recently reported high-resolution cryo-em reconstructions of mouse apoferritin (emd- and emd- ). the reported global resolutions of these reconstructions are . and . Å for emd- and emd- , respectively.
in figure , we show our b-factor map to be used for sharpening (slope of the local guinier plot multiplied by ), local amplitudes at Å and local occupancy maps. the resolution range used to estimate the b and a maps was between Å and the reported global resolution in both cases. the occupancy maps were calculated for these high-resolution maps between to the global resolution. as can be seen from figure
in this paper, we have introduced methods to improve the analysis and interpretability of cryo-em maps. these methods include map enhancement approaches (locspiral and local b-factor sharpening), and approaches to calculate local b-factors and density occupancy maps. we have shown in our experiments that our locspiral approach improves map connectivity, showing fewer fragmented and broken densities and better coverage of the atomic model. in fact, our locspiral approach has been applied in several publications (ichikawa, khalifa et al. , yang, chen et al. , gutmann, schafer et al. , jahagirdar, jha et al. , khalifa, ichikawa et al. ), enabling molecular modelling on maps with flexibility and light anisotropic resolution. we envision that our proposed methods to estimate local b-factors and occupancy maps could be used to improve de novo model building. first, these maps can be employed to guide manual tracing, as they can help estimate the range of structures that could be compatible with the given electron microscopy density. second, for very high resolution cryo-em maps, these values can be used as an approximation of the atomic b-factors and occupancies, to be further refined as part of the automatic model refinement process by automatic model building packages such as phenix (liebschner, afonine et al. ) or refmac (winn, murshudov et al. ). b-factor maps provide complementary information to local resolution maps, though usually these results are correlated.
the latter usually determine the resolution at a given point by comparing the map to noise or background amplitudes (vilas, gomez-blanco et al. ), while the former determine the rate of signal amplitude falloff within a resolution range. we can find map regions with similar local resolution (map amplitude similar to the noise/background amplitude at this resolution and these coordinates) but different b-factors, as the signal damping could differ within the resolution range used (steeply or gently sloped). we have seen that we have to be careful when processing maps affected by high flexibility and heterogeneity, as the obtained b-factors could be underestimated if the selected resolution range is above the local resolution in these regions. however, these problematic cases can be easily detected, as the a map values at these locations are close to or below the noise amplitude. thus, these regions can be automatically disabled and not taken into consideration. the methods proposed here are semi-automated and essentially only require the map to enhance or analyse, a binary solvent mask and a resolution range as inputs. they do not require additional information such as atomic models or local resolution maps. the common link between all these approaches is the use of the spiral phase transform, which is used to factorize cryo-em maps into amplitude and phase terms in real space for different resolutions. the spiral phase transform has been extensively used in optics for phase extraction in interferometry (larkin, bone et al. , antonio quiroga and servin , vargas, restrepo et al. , vargas, quiroga et al. ) or by shack-hartmann sensors (vargas, gonzález-fernandez et al. , vargas, restrepo et al. ). this transformation is not new in cryo-em, as it has been proposed previously to facilitate particle screening (vargas, abrishami et al. ), ctf estimation (vargas, oton et al. ) and local and directional resolution determination (vilas, gomez-blanco et al. , vilas, tagare et al. ).
in (vilas, gomez-blanco et al. , vilas, tagare et al. ) the authors used the riesz transform to obtain amplitude maps, which is similar to the spiral phase transform. cryo-em reconstructions of different types of macromolecules have been used to test the performance of these algorithms. specifically, we have used a membrane protein (trp channel), immature ribosomes affected by high compositional heterogeneity, the spliceosome, which shows high conformational heterogeneity, recent sars-cov- reconstructions exhibiting dynamic regions and high-resolution apoferritin reconstructions. in all cases, our proposed approaches show excellent results, improving the analysis and the interpretability of the processed maps. the proposed methods are also highly efficient. for example, the processing of emd- (map size px ) using our local enhancement approach took only min on a standard laptop using cores.
this work was supported by an nserc discovery grant (rgpin- - ). j.v. acknowledges financial support from the ramón y cajal program (ryc - -i). we thank jose jesus fernandez for helpful discussions.
the proposed methods are based on a d generalization of the d spiral phase transform. in the following, we present the d spiral phase transform and its application to map enhancement, local b-factor determination, and estimation of local map occupancies. the spiral phase transform is a fourier operator that can factorize a d map into its amplitude and phase terms in real space at different resolutions. we assume without loss of generality that a given d map can be modelled as a d phase modulated signal given by where is the cryo-em map, is a band-passed map filtered at frequency , the d background or dc term, the d amplitude map, the d modulating phase and = ( , , ).
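the equations of the spiral phase transform are not legible in this text, so the following is only an illustrative sketch of the closely related riesz-transform approach mentioned above, not the authors' exact operator. it assumes a zero-mean band-passed map of the form m(r) cos(phi(r)) with a slowly varying amplitude m, and recovers m as the magnitude of the map together with its riesz quadrature components. the name `riesz_amplitude` is hypothetical.

```python
import numpy as np

def riesz_amplitude(band_map):
    """Local amplitude of a zero-mean band-passed n-D map via the Riesz transform."""
    f_hat = np.fft.fftn(band_map)
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in band_map.shape], indexing="ij")
    norm = np.sqrt(sum(q**2 for q in freqs))
    norm[norm == 0] = 1.0  # avoid 0/0 at the origin; the multiplier is 0 there anyway
    quad_sq = np.zeros(band_map.shape)
    for q in freqs:
        # k-th Riesz component: Fourier multiplier -i * q_k / |q|
        r_k = np.fft.ifftn(-1j * q / norm * f_hat).real
        quad_sq += r_k**2
    # Amplitude = magnitude of (signal, quadrature components)
    return np.sqrt(band_map**2 + quad_sq)
```

for a pure plane wave of amplitude m the result is m at every voxel, which is the behaviour expected from an amplitude/phase factorization.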
assuming that we are interested in spatial frequencies higher than / - / /Å and that the background is usually a low frequency signal, we can approximate a high-pass filtered map for resolutions higher than - Å as
the quadrature transformation of eq. ( ) is given by
assuming that is a slowly varying map compared to , the gradient of is approximated by
however, we can use eq. ( ) to obtain the modulation and cosine terms in eq. ( ) separately without sign ambiguity as
using these expressions, we can obtain for each frequency the terms cos( ( )) and ( ). we propose here a robust local map enhancement method that only requires a binary mask of the macromolecule as input. the approach works for both high and moderate resolution maps. in the following, we provide details of the proposed method. as explained before, each band-pass filtered map can be factorized into an amplitude and a phase term by the spiral phase transform. then, given a user-defined solvent mask, the method obtains the empirical noise amplitude probability distribution ( ) at frequency ω from the voxels not included in the solvent mask. from this distribution, the approach determines the noise amplitude value corresponding to the % quantile, given by (q= %). this value is used to locally normalize map amplitudes in real space at the different frequencies and to remove local signals that are below this amplitude threshold, as they are likely noise at this frequency and position. after this nonlinear amplitude transformation, the map is given by
then, eq. ( ) is rewritten as
the method optionally allows the use of an snr weighting parameter to weight the contribution of the different amplitudes in the final map. in this case, eq. ( ) is rewritten as
with , ( ) the snr weighting parameter given by
the factorization of a d map into its amplitude and phase terms in real space for different resolutions allows the efficient determination of local b-factor maps.
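the noise-quantile step of the enhancement method described above can be sketched as follows for a single frequency band. because the corresponding equations and the quantile value are not legible here, two choices in this sketch are assumptions: the default quantile `q=0.95`, and the form of the normalization, in which retained voxels keep only their cosine term (amplitude set to one) and voxels below the noise level are set to zero. the name `threshold_band` is hypothetical.

```python
import numpy as np

def threshold_band(amplitude, cos_phase, solvent_mask, q=0.95):
    """Noise-quantile thresholding of one band-passed component.

    amplitude    : local amplitude map of the band
    cos_phase    : cosine of the local phase of the band
    solvent_mask : boolean array, True inside the macromolecule
    q            : quantile of the noise amplitude distribution (assumed value)
    """
    # Empirical noise level from voxels outside the macromolecule mask.
    noise_level = np.quantile(amplitude[~solvent_mask], q)
    keep = amplitude >= noise_level
    # Retained voxels contribute a unit-amplitude cosine; the rest are removed.
    return np.where(keep, cos_phase, 0.0), noise_level
```

summing the thresholded bands over the resolution range would then give the enhanced map; an snr-dependent weight per band could be applied before the sum.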
for resolutions between - Å and the estimated global map resolution, the method first obtains the local amplitude maps ( ). these amplitude maps are then used to obtain snr-weighted log-amplitudes of the structure factors locally as log( ( )) = log( , ( ) ( )) with , ( ) an snr weighting parameter defined in ( ). these expressions can be used to fit log( ( )) versus within a resolution range between - Å and the estimated global map resolution. thus, finally we have with ( ) the local b-factor map and ( ) the log-amplitude map intensities at , which corresponds to the lowest frequency within the frequency range used. the spiral phase transform can be used to obtain local b-factor sharpened maps. note that expression ( ) can be modified for frequencies higher than as ̆( ) = ∑̆, ( ) = { ∑ ( , ( ) ( ) ( ( ))) , < ∑ ( , ( ) ( ) ( ( ))) , ≥ ( ) low occupancy map regions correspond to parts of the macromolecule where the map amplitudes of the reconstruction are significantly smaller than in other regions of the macromolecule. keeping this in mind, we define the occupancy map as where (q= %) is obtained from the empirical macromolecule amplitude probability distribution ( ) at frequency ω. this amplitude probability distribution is calculated from voxels that are included in the solvent mask. from this distribution, the approach determines the macromolecule amplitude values corresponding to the % and % quantiles, given by (q= %) and (q= %), which are used as thresholds. to calculate local occupancy maps, a typical resolution range between and - Å is used to obtain density occupancies of complete secondary structure motifs, while ranges between to - . Å are used for high-resolution cryo-em maps to obtain occupancies of residues. in the analysis of the immature s ribosomes, we have proposed a maturity level index. this index can be extended to the analysis of any maturing macromolecule and is useful to place immature macromolecules into a maturation timeline.
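the maturity level index introduced here can be illustrated with a minimal sketch. the text states only that voxel counts in the mature solvent mask and in the occupancy masks are compared, so expressing the index as a ratio is an assumption, as is the default occupancy threshold of 0.5 (the value is not legible in the text). the function name `maturity_level` is hypothetical.

```python
import numpy as np

def maturity_level(occupancy_map, mature_mask, threshold=0.5):
    """Fraction of the mature solvent mask that is highly occupied.

    occupancy_map : occupancy map of an immature reconstruction
    mature_mask   : boolean solvent mask from the mature reconstruction
    threshold     : occupancy above which a voxel counts as occupied (assumed)
    """
    n_mature = mature_mask.sum()
    if n_mature == 0:
        raise ValueError("mature_mask contains no voxels")
    # Occupancy mask restricted to the footprint of the mature macromolecule.
    occupied = (occupancy_map > threshold) & mature_mask
    return occupied.sum() / n_mature
```

with this definition a fully mature map scores close to 1, and the score decreases as larger parts of the mature footprint are missing.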
the calculation of this index requires reconstructions of immature and mature macromolecules. the mature reconstruction is used to obtain a binary solvent mask, while the immature reconstructions are used to calculate occupancy maps. these occupancy maps allow us to determine highly occupied regions (occupancy > . ) and calculate occupancy masks. then, the index is obtained by comparing the number of voxels activated in the solvent mask of the mature reconstruction with the ones in the occupancy masks. as can be seen from figure , the larger the unfolded regions in the immature maps, the smaller the maturity level.
the pdb vsb with traced parts of the glycan proteins marked with purple spheres. in red, additional parts that could be traced using our improved locspiral map. inside the black squares, zoomed views of two glycan proteins that could be further modelled.
new tools for the analysis and validation of cryo-em maps and atomic models
isotropic n-dimensional fringe pattern normalization
emringer: side chain-directed model and map validation for d cryo-electron microscopy
automatic local resolution-based sharpening of cryo-em maps
sharpening high resolution information in single particle electron cryomicroscopy
action of a minimal contractile bactericidal nanomachine
a robust approach to ab initio cryoelectron microscopy initial volume determination
cryo-em structure of the complete and ligand-saturated insulin receptor ectodomain
tubulin lattice in cilia is in a stressed form regulated by microtubule inner proteins
empiar: a public archive for raw electron microscopy image data
alternative conformations and motions adopted by s ribosomal subunits visualized by cryo-electron microscopy
model-based local density sharpening of cryo-em maps
the inner junction complex of the cilia is an interaction hub that involves tubulin post-translational modifications
accelerated cryo-em structure determination with parallelisation using gpus in relion-
natural demodulation of two-dimensional fringe patterns. i. general background of the spiral phase quadrature transform
emdatabank.org: unified data resource for cryoem
definition and estimation of resolution in single-particle reconstructions
macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix
refinement of atomic structures against cryo-em maps
image restoration in cryo-electron microscopy
structure of a pre-catalytic spliceosome
role of era in assembly and homeostasis of the ribosomal small subunit
optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy
semi-automated selection of cryo-em particles in relion- .
general n-dimensional quadrature transform and its application to interferogram demodulation
crystals, x-rays, and proteins: comprehensive protein crystallography
automated map sharpening by maximization of detail and connectivity
particle quality assessment and sorting for automatic and semiautomatic particle-picking techniques
shack-hartmann centroid detection method based on high dynamic range imaging and normalization techniques
fastdef: fast defocus and astigmatism estimation for high-throughput transmission electron microscopy
two-step interferometry by a regularized optical flow algorithm
multiplicative phase-shifting interferometry using optical flow
shack-hartmann centroid detection using the spiral phase transform
high dynamic range imaging method for interferometry
monores: automatic and accurate estimation of local resolution for electron microscopy maps
measuring local-directional resolution and local anisotropy in cryo-em maps
a structure-based model for the complete transcription cycle of influenza polymerase
lipid interactions of a ciliary membrane trp channel: simulation and structural studies of polycystin-
macromolecular tls refinement in refmac at moderate resolutions
cryo-em structure of the -ncov spike in the prefusion conformation
cryo-electron microscopy structures of arna, a key enzyme for polymyxin resistance, revealed unexpected oligomerizations and domain movements
rearranging terms, we obtain
eq. ( ) shows that the quadrature term is composed of two terms. the first is an orientation map and the second corresponds to a non-linear operator that can be interpreted as a d generalization of the d hilbert transform, which can be efficiently calculated using the fourier transform. as shown in , the operator , ( ) | ( )| ⁄ corresponds to the d hilbert transform for our band-passed maps , ( ) then
thus, eq. ( ) can be rewritten as
note that is a unit vector pointing in the same direction as , ( ) (remember that is a slowly varying map compared to ), but possibly with opposite orientation because of a possible change of sign introduced by the cosine term in eq. ( ). we can rewrite eq. ( ) as
where ( ) is a function with range + or - considering that ( ) and , can be parallel or antiparallel. from eq. ( ), we can obtain an estimation of ( ) affected by an indetermination in its sign.
key: cord- -vsa y ip authors: warner, emily f.; bohálová, natália; brázda, václav; waller, zoë a. e.; bidula, stefan title: cross kingdom analysis of putative quadruplex-forming sequences in fungal genomes: novel antifungal targets to ameliorate fungal pathogenicity? date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vsa y ip fungi contribute to upwards of . million human deaths annually, are involved in the spoilage of up to a third of food crops, and have a devastating effect on plant and animal biodiversity. moreover, this already significant issue is exacerbated by a rise in antifungal resistance and a critical requirement for novel drug targets. quadruplexes are four-stranded secondary structures in nucleic acids which can regulate processes such as transcription, translation, replication, and recombination.
they are also found in genes linked to virulence in microbes, and quadruplex-binding ligands have been demonstrated to eliminate drug resistant pathogens. using a computational approach, we identified putative quadruplex-forming sequences (pqs) in genomes across the fungal kingdom and explored their potential involvement in virulence, drug resistance, and pathogenicity. here we present the largest analysis of pqs in fungi and identified significant heterogeneity of these sequences throughout phyla, genera, and species. moreover, pqs were genetically conserved. notably, loss of pqs in cryptococci and aspergilli was associated with pathogenicity. pqs in the clinically important pathogens aspergillus fumigatus, cryptococcus neoformans, and candida albicans were located within genes (particularly coding regions), mrna, repeat regions, mobile elements, trna, ncrna, rrna, and the centromere. genes containing pqs in these organisms were found to be primarily associated with metabolism, nucleic acid binding, transporter activity, and protein modification. finally, pqs were found in over genes associated with virulence, drug resistance, or key biological processes in these pathogenic fungi and were found in genes which were highly upregulated during germination, hypoxia, oxidative stress, iron limitation, and in biofilms. taken together, quadruplexes in fungi could present interesting novel targets to ameliorate fungal virulence and overcome drug resistance.
introduction sequence nucleotides and can form intramolecular or intermolecular associations [ , ] . this structure is further stabilised by the presence of monovalent cations, especially potassium [ ] . moreover, the ʹ-to ʹ-directionality of the strands, glycosidic bonding in the g-tetrads, the cation present, and number of stacked g-tetrads contribute to the wide variation of observed g structures and topologies [ ] . conversely, ims form within cytosine-rich regions there has been increased interest in the therapeutic potential of targeting quadruplexes following the implication of these secondary structures in disease, especially cancer, due to their prevalence in oncogene promoters [ ] . however, there is also now a growing number of pathogens in which g s respectively; figure a and e). the basidiomycota and zoopagomycota had high pqs frequencies relative to genome size ( . and . pqs/kbp, respectively; figure b and f). the mucoromycota and basidiomycota displayed high pqs frequencies relative to gc content ( and pqs/gc%, respectively; figure c and g). fungi within the basidiomycota had the highest average gc content ( .
%; figure d and h). the microsporidia and cryptomycota scored lowest for total number of pqs ( and , respectively), pqs/kbp ( . and . , respectively) and pqs/gc% ( and , respectively; figure ). moreover, they also had low gc content ( . % and . %, respectively). considering g s and ims form in guanine or cytosine rich regions, respectively, one would expect fungi with a higher genome gc content to have a higher pqs frequency by chance. to investigate this further, the frequency of pqs/kbp relative to the gc content in all fungi and their divisions were plotted. as expected, there was a positive correlation between gc content and pqs frequency amongst all the fungal species analysed (r= . ; p< . ; figure a ). and mucoromycota (r= . , r= . , and r= . and r= . , respectively; all p< . ; figure b -d). however, there was not a significant correlation observed within the kickxellomycotina (n= species) were . , . , and . , respectively ( figure a ). the average pqs/kbp for each subphylum was . , . , and . , respectively ( figure b). the average pqs/gc% for each subphylum was . , . , and . , respectively ( figure c ). finally, the average gc% for each subphylum was . %, . %, and . %, respectively ( figure d) . figure c ). finally, the average gc% were . % and . %, respectively ( figure d ). finally, we also highlighted the frequency of pqs in fungal genera which contained important human and plant pathogens. we found that there was also large heterogeneity in the frequency of pqs between species within genera containing human pathogens (e.g. aspergillus spp., candida spp., cryptococcus spp., blastomyces spp.) and plant pathogens (e.g. verticillium spp., and fusarium spp.; figure ). this variation was particularly wide within aspergillus spp., and cryptococcus spp. evolutionary conservation of genetic motifs within the genome are a hallmark of their fundamental importance to how that organism functions. 
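The normalisation and correlation analysis described above can be sketched in a few lines. This is a minimal illustration only, assuming a plain Pearson correlation between genome GC content and PQS/kbp; the function names and the example genomes are hypothetical and are not values from this study.

```python
import math


def pqs_per_kbp(pqs_count, genome_length_bp):
    """PQS frequency normalised to genome size (PQS per 1,000 bp)."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return pqs_count / (genome_length_bp / 1000.0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical genomes: (GC content in %, PQS count, genome length in bp).
# These numbers are made up for illustration and are NOT data from the paper.
example_genomes = [
    (35.0, 1_200, 12_000_000),
    (48.5, 9_000, 30_000_000),
    (55.0, 21_000, 35_000_000),
]

gc_values = [gc for gc, _, _ in example_genomes]
frequencies = [pqs_per_kbp(count, length) for _, count, length in example_genomes]
r = pearson_r(gc_values, frequencies)
```

A significance test on r (the p-value reported alongside it) would normally come from a statistics package; it is left out here to keep the sketch dependency-free.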
therefore, we endeavoured to explore whether there was evolutionary conservation of pqs within fungal genomes. we chose to explore this relationship in aspergillus spp., due to the robustness and accuracy of the phylogenetic tree available [ ] . notably, we found that the frequency of pqs/kbp appeared to be intrinsically linked to how closely related species were, with species within the same section displaying similar pqs frequencies ( figure ). aspergilli in this tree were divided into sections (range of pqs/kbp the ascomycota and basidiomycota contain many of the most prevalent fungal pathogens of both plants and humans, including the genera aspergillus spp., candida spp., and cryptococcus spp., which contain fungal species that account for most fungal-related deaths in humans. although, not all species within these genera are potential pathogens and we found high variation in their pqs frequency. therefore, we compared the pqs frequency between pathogenic and non-pathogenic species to explore whether there was a link with pathogenicity. similarly, comparing species of cryptococcus ( pathogenic, non-pathogenic) we also found that pathogenic species had a significantly lower frequency of pqs/kbp ( . vs. when only total pqs were considered, the largest number of pqs in all three fungal species could be found within the coding regions (cds), genes, and mrna, with few pqs found in other genomic features ( figure a , b, and c). however, this was not the same when considering the frequency of pqs/kbp of the genomic features. in a. fumigatus, the greatest frequency of pqs could be found in the repeat regions ( figure d ). the lowest frequency could be found within the trna. in c. neoformans, the highest pqs frequencies were still in the cds, genes, and mrna, with a very low frequency found within the trna ( figure e ). in c. albicans, the highest frequency of pqs could be found in the rrna, followed by repeat regions and ncrna ( figure f ). 
there were no pqs found in the trna and low frequencies were again found in the mobile elements. the total number and frequency of pqs bp before and after the annotated genomic features appeared to be evenly distributed (figure ) . pqs are found in genes encoding proteins involved in metabolism, nucleic acid binding, cell transport, and protein modification as we knew the genomic location of the pqs, we could then identify the number and identity of the genes which contained these sequences. this further enabled us to identify the classes of proteins associated with pqs-containing genes. in a. fumigatus, . % of genes contained at least one pqs. in c. neoformans, this number was almost double, with . % of genes containing pqs. conversely, pqs were only found in . % of genes in c. albicans. despite the differences between the organisms in the number of genes in which pqs can be found, in all cases, pqs were primarily located in genes which encoded proteins involved in metabolism, nucleic acid binding, cell transport, and protein modification (figure ). they were least likely to be found in genes encoding calcium-binding proteins, extracellular matrix proteins, cell adhesion molecules, and defense/immunity proteins (figure ) . in all organisms, pqs could be found at the highest frequency in genes associated with metabolite interconversion enzymes. in a. fumigatus, the number of genes associated with metabolite interconversion enzymes was . -fold higher than the next most represented protein class ( genes vs. genes for nucleic acid binding proteins and transporters; figure a) . in c. neoformans the number of genes associated with these enzymes was . -fold higher compared to nucleic acid binding proteins ( genes vs. genes, respectively; figure b) . in c. albicans, the difference in the number of pqs-containing genes associated with metabolite interconversion enzymes and nucleic acid binding proteins was much lower ( genes vs. genes, respectively; . -fold; figure c ). 
surprisingly, when categorising genes based on gene ontology terms, there was an almost identical distribution of genes involved in im in the promoter of the hiv- pro-viral genome has also recently been described [ ] . thus, whether pqs could be found in genes associated with virulence/drug resistance in a. fumigatus, c. neoformans, and c. albicans was explored. although the list is not exhaustive (there are many proteins still yet to be characterised), there were many interesting candidates that arose from the analysis. in total, pqs were found in over genes associated with the virulence, drug resistance, or key biological processes of a. fumigatus ( genes), c. neoformans ( genes), and c. albicans ( ; tables - ) . in a. fumigatus, pqs could be found in notable genes involved in drug resistance, including the -α sterol demethylases (cyp a and cyp b), the , -β-glucan synthase catalytic subunit fks , and the abc drug exporter atrf. pqs were also found in genes involved in virulence, including the transcription factors stua, hapx, and pacc, genes involved in pigment biosynthesis (pksp, arp , abr , abr , and ayg ), the master regulator of secondary metabolism laea, and glin and glip, which are involved in the synthesis of gliotoxin (table ) . as pqs could be found in almost two-thirds of c. neoformans genes, it was not surprising that pqs could be found in those associated with virulence. these included the abc transporter afr (which is associated with fluconazole resistance), the protein kinases fsk and hog , the calcineurin-associated genes crz and cna , pacc/rim like in a. fumigatus, and numerous capsule-associated genes (the capsule being the main virulence factor of cryptococcus) including cap , cap , cap , cap , cap , cap , cas , cas , and cxt (table ) . there were very few genes in c. albicans that contained sequences likely to form quadruplexes, and thus, quadruplexes might be less important in this organism. 
notable genes included the iron permeases ftr and ftr , and a gene associated with flucytosine resistance (rrp ; table ) . the highest scoring potential quadruplex-forming sequences for each of these genes were then re-analysed in an alternative pqs predictive algorithm called qgrs mapper. in this instance, the scores of known quadruplex-forming sequences were compared to scores of the pqs in fungi. this was conducted to provide further insight into whether these sequences were likely to form quadruplex structures. figure b ). in all cases, the average pqs frequencies in the upregulated genes were higher than the average pqs observed throughout the entire genome ( figure b ). the average pqs frequencies in upregulated pqs-containing genes were . pqs/kbp (germinating conidia), . pqs/kbp (oxidative stress), and . pqs/kbp (biofilms; figure b ). however, there was a range of pqs frequencies observed between the genes, from . to . pqs/kbp. the genes containing the highest pqs frequencies for each condition were afua_ g in germinating conidia and hyphae ( . pqs/kbp), afua_ g in hypoxic fungi ( . pqs/kbp), afua_ g during iron limitation ( . pqs/kbp), afua_ g during oxidative stress ( . pqs/kbp), and afua_ g in biofilms ( . pqs/kbp). interestingly, each of these genes was upregulated in at least out of the conditions investigated. in this study, the number of potential quadruplex-forming sequences within the genomes of fungi was computationally predicted and their potential involvement in pathogenicity was discussed. several important observations were made. this was the first study to identify the heterogeneity of pqs amongst genetically distinct fungal species. moreover, we highlighted that pathogenic aspergillus and cryptococcus species contained fewer pqs compared to their non-pathogenic counterparts and these could be found throughout known genomic features, including genes, mrna, repeat regions, trna, ncrna, and rrna. 
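To make the idea of scoring a putative quadruplex-forming sequence concrete, the sketch below follows the commonly described G4Hunter-style scheme: each G in a run of n guanines scores min(n, 4), each C in a run of cytosines scores the negative equivalent, and all other bases score 0; the score of a sequence or window is the mean of its per-base scores. This is an illustrative reimplementation under those assumptions, not the code of the G4Hunter web application or of QGRS Mapper, and the window size and threshold shown are assumed defaults rather than the settings used in this study.

```python
def base_scores(seq):
    """Per-base scores: G runs score +min(run length, 4), C runs score
    -min(run length, 4), every other base scores 0."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:
            j += 1  # extend to the end of the current homopolymer run
        if base in "GC":
            value = min(j - i, 4)
            if base == "C":
                value = -value
            for k in range(i, j):
                scores[k] = value
        i = j
    return scores


def mean_score(seq):
    """Mean per-base score of a whole sequence (0.0 for an empty sequence)."""
    scores = base_scores(seq)
    return sum(scores) / len(scores) if scores else 0.0


def windows_above(seq, window=25, threshold=1.5):
    """Return (start, score) for every sliding window whose absolute mean
    score reaches the threshold. Window and threshold are assumed values."""
    scores = base_scores(seq)
    hits = []
    if window <= 0 or len(scores) < window:
        return hits
    total = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # slide the window by one base: add the new base, drop the old one
            total += scores[start + window - 1] - scores[start - 1]
        score = total / window
        if abs(score) >= threshold:
            hits.append((start, score))
    return hits
```

For the short motif ggaggaggagg mentioned in the text, `mean_score("GGAGGAGGAGG")` gives 16/11, since the eight guanines each sit in a run of two and the three adenines score zero. Negative window scores indicate C-rich stretches, that is, a G-rich motif on the complementary strand.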
genes containing pqs were associated with metabolism, nucleic acid-binding proteins, protein modifying enzymes, and transporters. notably, pqs likely to form quadruplexes were identified in genes linked with fungal virulence or drug resistance, such as cyp a, and could be found in genes upregulated during fungal growth and in response to stress. the frequency of pqs throughout genomes is highly variable; for example, human genomes were shown to contain around . pqs/kbp, whereas the genomes of escherichia coli contain around . pqs/kbp [ ] . in this study we also found significant differences in the interestingly, loss of pqs has recently also been observed in pathogenic coronaviridae [ ] . it has also been reported that host nucleolin (an rna-binding protein) can bind and stabilise quadruplexes in the ltr promoter of hiv- , which can silence viral transcription [ ] . therefore, in this situation, loss of quadruplexes would be beneficial for immune evasion. (pacc/rim ). the most notable virulence factor of c. neoformans is its polysaccharide capsule and pqs could be found in numerous capsule-associated genes (cas , cap , cap , cap , cap , cap , cas , cap , and cxt ) [ ]. in c. albicans pqs could be found in genes such as the iron permeases ftr and ftr [ ] . notably, many of these genes contained pqs which have previously been shown to be capable of forming bona fide quadruplexes, such as the sequence ggaggaggagg [ ] . it is also interesting to highlight that these organisms contained many more g + l - compared to g + l - pqs sequences, which is a characteristic shared with s. cerevisiae [ ] . there are now an ever-increasing number of g s identified within genes linked to microbial pathogenicity. g -forming motifs located in the hsds, recd, and pmra genes of s. the pearson correlation coefficient was used to determine the association between pqs and gc content. p< . was considered statistically significant. stop neglecting fungi. 
nat microbiol strategies for engineering natural product biosynthesis in fungi the regulation and functions of dna and rna g-quadruplexes i-motif dna: structural features and significance to cell biology whole genome experimental maps of dna g-quadruplexes in multiple species quadruplex dna: sequence, topology and structure an intramolecular g-quadruplex structure with mixed parallel/antiparallel mycocosm portal: gearing up for fungal genomes g hunter web application: a web server for g-quadruplex prediction panther version : more genomes, a new panther go-slim and improvements in enrichment analysis tools applications for protein sequence-function evolution data: mrna/protein expression analysis and coding snp scoring tools comparative transcriptome analysis revealing dormant conidia and germination associated genes in aspergillus species: an essential role for atfa in conidial dormancy additional oxidative stress reroutes the global response of aspergillus fumigatus to iron depletion global transcriptome changes underlying colony growth in the opportunistic human pathogen aspergillus fumigatus a robust phylogenomic time tree for biotechnologically and medically important fungi in the genera aspergillus and penicillium. 
mbio g-quadruplex-induced instability during leading-strand replication rna g-quadruplexes are globally unfolded in eukaryotic cells and depleted in bacteria nucleolin stabilizes g-quadruplex structures folded by the ltr promoter and silences hiv- viral transcription aspergillus fumigatus conidia survive and germinate in acidic organelles of a epithelial cells genomic distribution and functional analyses of potential g- quadruplex-forming sequences in saccharomyces cerevisiae divergent distributions of inverted repeats and g-quadruplex forming sequences in saccharomyces cerevisiae genome-wide prediction of g dna as regulatory motifs: role in metabolism impacts upon candida immunogenicity and pathogenicity at multiple levels metabolism in fungal pathogenesis. cold spring harb perspect med antifungal resistance, metabolic routes as drug targets, and new antifungal agents: an overview about endemic dimorphic fungi secondary metabolite arsenal of an opportunistic pathogenic fungus candidalysin is a fungal peptide toxin critical for mucosal infection the fungal cyp s: their functions, structures identification of aspergillus fumigatus multidrug transporter genes and their potential involvement in antifungal resistance laea, a regulator of morphogenetic fungal virulence factors recognition of dhn-melanin by a c-type lectin receptor is required for immunity to aspergillus aspergillus fumigatus virulence through the lens of transcription factors role of afr , an abc transporter-encoding gene, in the in vivo response to fluconazole and virulence of cryptococcus neoformans distinct stress responses of two functional laccases in cryptococcus neoformans are revealed in the absence of the thiol-specific antioxidant the capsule of the fungal pathogen cryptococcus neoformans functional characterization of the ferroxidase, permease high-affinity iron transport complex from candida albicans characterization of highly conserved g-quadruplex motifs as potential drug targets in 
streptococcus pneumoniae g-quadruplex dna motifs in the malaria parasite plasmodium falciparum and their potential as novel antimalarial drug targets characterization of g-quadruplex motifs in espb genes of mycobacterium tuberculosis as potential drug targets berberine antifungal activity in fluconazole-resistant pathogenic yeasts: action mechanism evaluated by flow cytometry and biofilm growth inhibition in candida spp key: cord- - w caxwu authors: zeng, xin; li, lingfang; lin, jing; li, xinlei; liu, bin; kong, yang; zeng, shunze; du, jianhua; xiao, huahong; zhang, tao; zhang, shelin; liu, jianghai title: blocking antibodies against sars-cov- rbd isolated from a phage display antibody library using a competitive biopanning strategy date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w caxwu infection with the novel coronavirus sars-cov- has caused more than , deaths, but no vaccine or specific therapeutic antibody is currently available. sars-cov- relies on its spike protein, in particular the receptor binding domain (rbd), to bind the human cell receptor angiotensin-converting enzyme (ace ) for viral entry, and thus targeting rbd holds promise for preventing sars-cov- infection. in this work, a competitive biopanning strategy of a phage display antibody library was applied to screen blocking antibodies against rbd. high-affinity antibodies were enriched after the first round using a standard panning process in which rbd-his recombinant protein was immobilized as a bait. in the next two rounds, immobilized ace -fc and free rbd-his proteins were mixed with the enriched phage antibodies. antibodies binding to rbd at epitopes different from the ace -binding site were captured by the immobilized ace -fc, forming a “sandwich” complex. only antibodies that compete with ace for recognizing rbd at the same or similar epitopes can bind to the free rbd-his in the supernatant and be subsequently separated by the ni-nta magnetic beads. 
the top lead from the competitive biopanning of a synthetic antibody library, lib ab , was produced in full-length igg format. it competitively blocked the binding of rbd to ace protein and potently inhibited sars-cov- pseudovirus infection of ace -overexpressing hela cells, with ic values of nm. by contrast, the top lead from the standard biopanning of lib ab could only bind to rbd in vitro and had no blocking or neutralization activity. our strategy can efficiently isolate blocking antibodies against rbd, and it would speed up the discovery of neutralizing antibodies against sars-cov- . the recent outbreak of a novel coronavirus disease (covid- ) has escalated from a public health emergency of international concern to a global pandemic. its pathogen, sars-cov- , is a newly identified β-coronavirus. coronaviruses take their family name from the spike (s) protein on the viral particle. the highly glycosylated s protein stays compact in a trimeric state, recognizes its receptor on the host cell membrane, and then undergoes a series of conformational changes, proteolysis events and membrane fusion to complete viral entry. for vaccines, clinical diagnosis, early prevention and medication, the s protein is the most significant target. the primary sequences of the s proteins of severe acute respiratory syndrome coronavirus (sars-cov) and sars-cov- share about % identity and % similarity, which indicates a high possibility of structural homology and the same infection pathway. sars-cov and sars-cov- recognize the same host cell receptor, ace , for mediating viral entry into host cells. it was reported that the sars-cov s protein trimer bound to ace at a : ratio [ , ] . before infection, the rbd of each sars-cov s monomer was partially buried in the inactive "down" conformation and not able to bind ace due to steric clash. once infection started, one monomer turned "up" its rbd to expose enough space to ace , inducing further conformational opening and loosening for proteolysis [ , ] . 
atomic-level structural analysis suggested that the spatial interaction and interface between sars-cov- rbd and ace was mostly in accordance with the sars-cov case [ ] . besides, a cryo-em structure of sars-cov- s protein trimer published recently showed that one of the three rbds was in "up" conformation and naturally exposed the whole interaction interface [ ] , while the classic closed symmetric trimer still existed [ ] . that might explain why sars-cov- is much more contagious than sars-cov and causing tricky problems worldwide. no effective cure or vaccine is currently available for covid- . based on structure information above, blocking sars-cov- rbd is a rational therapeutic approach. here we developed a competitive biopanning strategy to efficiently isolate blocking antibodies from phage display antibody libraries. several high-affinity antibodies targeting sars-cov- rbd and blocking its binding to ace were isolated, and the top lead exhibited a neutralization activity of sars-cov- pseudotyped vsv infection. recombinant proteins ace -his was purchased from novoprotein (shanghai, china). ace -hfc and sars-cov- rbd-his were purchased from sino biological (beijing, china). sars-cov- rbd-mfc was expressed using ablink biotech's hek f expression system. a synthetic human fab antibody library ab (libab ) was constructed according to a procedure previously described [ ] . human germline immunoglobulin variable segments vh - and vl - were employed as templates, the complementarity-determining regions l (cdr-l ) and h (cdr-h ) was diversified by the designed mutagenic oligonucleotides. the oligonucleotides were synthesized using the trimer phosphoramidites mix z (glen research) containing codons for amino acids in the following molar ratios: % each y, s &g, % each t & a, and % each p, h, r, f, w, v & l. the number of positions denoted by z in cdr-l (qq (z)n plt) and -h (ar (z) n (a/g/d/y) fdy) was varied from to and to , respectively. 
the library size is estimated to be × . antibodies against rbd were screened at the first round using a standard biopanning protocol [ ] . briefly, rbd-his was coated on -well maxisorp plates at °c overnight. after the coating buffer was decanted, the plate was blocked with % polyvinyl alcohol (pva) at room temperature for hour. μl of phage libraries ( pfu/ml) was added per well for -hour binding. after washing eight times with pt buffer ( . % tween- in pbs), bound phages were eluted with mm hcl ( μl per well), followed by -min incubation. the eluent was transferred into a . ml microfuge tube and neutralized with m tris-hcl (ph . ). half the neutralized phage solution was mixed with ml of actively growing e. coli neb -alpha f' (od = . ) in ×yt media containing μg/ml tetracycline and incubated at °c for hour. pfu of m k helper phages were added next and incubated for another hour. the infected bacteria were amplified in ml ×yt medium containing μg/ml carbenicillin and μg/ml kanamycin, shaking at rpm and growing overnight at °c. the next day, phages were harvested in precipitant with peg/nacl solution and resuspended in pbs buffer for the following rounds of panning. after the first round of the standard biopanning, a competitive biopanning protocol that included steps of competitive binding, magnetic separation, elution and amplification ( fig. ) , was applied to isolate the epitope-specific antibodies. briefly, μl of ace -hfc protein ( μg/ml) was coated on the -well maxisorp plates. the wells were washed and blocked with % pva, and then the mixture of antibody library ( pfu per well) and free rbd-his protein ( ng per well) was added. after a -hour competitive binding, the supernatant was transferred into a . ml microfuge tube containing the pre-washed ni-nta magnetic beads (genscript) and incubated on a shaker at room temperature for hour. beads were collected using the magnetic separation rack and washed by the pt buffer for times. 
bound phages were eluted with mm hcl ( μl per tube) after -min incubation. beads were collected using the magnetic separation rack, and the supernatant was transferred into a tube for neutralization. half the neutralized phage solution was mixed with ml of actively growing neb alpha f' cells and amplified as in the standard biopanning protocol. μl of the bacterial culture before infection with helper phages was taken, diluted, and grown on lb plates containing μg/ml carbenicillin at °c overnight. single clones were picked the next day for the phage elisa assay. fig. schematic presentation of a competitive biopanning strategy. a specific binder of the target protein was added during the binding step for the selection of blocking antibodies. in this work, the immobilized ace -hfc captured rbd-his and the antibodies binding rbd at different epitopes, forming a complex like a "sandwich". however, when an antibody recognized the same or similar epitopes within rbd as ace did, it could block the rbd-ace interaction. such antibodies would bind to the free rbd-his in the supernatant and be subsequently separated by the ni-nta magnetic beads. single clones were inoculated into μl ×yt medium containing μg/ml carbenicillin, μg/ml kanamycin and pfu/ml helper phages in -deep-well plates and incubated overnight at °c and rpm. the plates were centrifuged at , rpm and the supernatant was used for phage elisa. the -well maxisorp plates were coated overnight at °c with rbd-mfc ( μg/ml, μl per well). after blocking with % pva, plates were incubated with μl bacterial supernatant containing phages for hours at room temperature. after six washes with pt, bound phages were detected using an hrp-conjugated anti-m antibody (sino biological) and tetramethyl benzidine (tmb) as substrate. absorbance at nm was measured. vh and vl of the positive phage were subcloned respectively into the pfusess-chig-hg and pfusess-clig-hk (invitrogen). 
antibodies were transiently expressed in freestyle™ hek -f cells (life technologies) using fectin transfection reagent according to manufacturer's instructions. after transfection, cells were grown in the serum-free medium for an additional days. the supernatant was collected and purified on a mabselect protein a column (ge healthcare). eluted igg was dialyzed against pbs and stored at - °c. recombinant human ace -his ( μg/ml, μl per well) was coated on -well maxisorp plates, followed by a pre-incubated mixture of the anti-rbd antibody titrated into a constant amount of rbd-mfc ( µg/ml). rbd binding to ace was detected using hrp conjugated anti-mouse fc antibody. the neutralization effects of antibodies on sars-cov- pseudovirus were performed by the genscript inc. (nanjing, china) under a research service contract. briefly, , of the human ace -overexpressing hela monoclonal cells were seeded into each well of a -well plate. sars-cov- pseudovirus and antibodies were incubated at ambient temperature for hour. the mixture was transferred into wells and incubated with cells at °c, % co for hours. the culture medium was freshly replaced, and cells were incubated for another hours. the culture medium was removed, and cells were rinsed with pbs. µl lysis buffer was added and further incubated at ambient temperature for minutes. µl supernatant was transferred to a sterile un-clear -well plate with the bio-glo luciferase substrate added, and the luminescence signal was measured with envision. the dose response curves were plotted with the relative luminescence unit against the antibody concentration. the assay results were processed by microsoft office excel and graphpad prism . high-affinity antibodies were identified by the phage elisa after rounds of the competitive biopanning, clones were randomly selected. their properties of binding to rbd were measured using phage elisa. 
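The dose-response analysis described in the methods above (luminescence plotted against antibody concentration) is usually summarised by a four-parameter logistic curve and a half-maximal concentration. The sketch below shows one way this could be done, assuming a standard four-parameter logistic form and a simple log-linear interpolation for the half-maximal point; it is not the GraphPad Prism procedure used by the authors, and the function names and example values are hypothetical.

```python
import math


def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic curve: the response falls from `top` towards
    `bottom` as the concentration rises, passing the midpoint at `ic50`."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def interpolate_ic50(concs, responses):
    """Estimate the concentration giving the half-maximal response by linear
    interpolation on log10(concentration).

    `concs` must be positive and sorted in ascending order. The half-maximal
    level is taken as the midpoint of the observed minimum and maximum
    responses, which is only a rough stand-in for a fitted curve.
    """
    if len(concs) != len(responses) or len(concs) < 2:
        raise ValueError("need at least two matched concentration/response points")
    half = (max(responses) + min(responses)) / 2.0
    pairs = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pairs, pairs[1:]):
        crosses = (r_lo - half) * (r_hi - half) <= 0
        if crosses and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("responses never cross the half-maximal level")


# Hypothetical titration: concentrations in nM, responses simulated from a
# curve with a known midpoint of 4 nM. These are NOT measurements from the paper.
example_concs = [0.01, 0.1, 1.0, 4.0, 16.0, 100.0, 1000.0]
example_responses = [four_pl(c, 0.0, 100.0, 4.0, 1.0) for c in example_concs]
estimated_ic50 = interpolate_ic50(example_concs, example_responses)
```

Interpolation keeps the sketch free of external dependencies; a nonlinear least-squares fit of `four_pl` to the data would be the more usual choice when a curve-fitting library is available.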
positive binding was defined as an od reading two or more times higher than the negative control (pva alone). clones showed positive signals (fig. ) . after the dna sequencing, these clones were summarized into groups of unique antibodies. identification of positive clones to immobilized antigen in competitive manner. taking od readings as the measurement, data within each group fluctuated within %. the highest-ranking one was named rrbd- in this work. rrbd- , the top lead with the highest od reading isolated from the competitive biopanning, and rrbd- , the top lead isolated from the standard biopanning at round , were expressed as full-length igg antibodies using the f expression system. their binding and blocking abilities against rbd were compared. both rrbd- and rrbd- had high affinities for rbd, with ec values of . nm and . nm, respectively. only rrbd- blocked the binding of rbd to ace , with an ic of . nm, while rrbd- did not. as a positive control, the recombinant ace -hfc ( µg/ml) totally inhibited the infection of ace -overexpressing hela monoclonal cells with sars-cov- pseudovirus. the antibody rrbd- showed a significant neutralization activity against the sars-cov- pseudovirus, with ic values of . nm. however, the antibody rrbd- had no neutralization effect on the pseudovirus, and there were no significant differences between the highest concentration antibody group and the blank group without antibody addition. two sars-cov- rbd-specific antibodies selected by different strategies showed different neutralization activities. the antibody rrbd- , which competed with ace , could neutralize sars-cov- pseudovirus, but rrbd- could not. the rbd-ace interaction initiates viral infection by both sars-cov and sars-cov- . their rbds share high sequence identity ( %) and structural homology, so the well-established sars-cov antibodies were initially assumed to be shortcut therapeutic candidates for sars-cov- . however, the real scenario is much more problematic. 
several independent peer-reviewed studies as well as preprints have shown that all structurally known sars-cov specific antibodies, including s , r, m and f g , have no cross-reactivity with sars-cov- [ , , ] . these antibodies all compete with ace to bind sars-cov rbd, but their epitopes only have limited overlaps with the full ace -rbd interface, which could explain the lack of cross-reactivity. cr is a special case with % conserved key residues in the epitope between sars-cov- and sars-cov. its cross-reactivity was remarkable, but the loss of one n-glycan site results in ~ magnitude reduction of binding affinity to sars-cov- rbd [ ] . likewise, in humans, rbd-specific monoclonal antibodies derived from individuals who recovered from covid- showed no cross-reactivity with either sars-cov or mers-cov [ ] . in general, structural and functional analysis suggests that targeting sars-cov- rbd could be a direct and promising therapeutic strategy, while focusing on previous sars-cov antibodies is neither ideal nor efficient. no sars-cov- rbd-specific monoclonal antibody has been reported from human antibody libraries (up to april th , ). in the meantime, sars-cov- is spreading unexpectedly fast around the world, and a recent study revised its basic reproductive number (r ) from . to . [ ] . a rapid and effective method of obtaining sars-cov- neutralizing antibodies is urgently needed. naïve antibody libraries derived from natural immune systems have their capacity limits, while synthetic libraries with higher diversity offer more opportunities to isolate binders, especially for novel infectious antigens. compared to a naïve antibody library of ~ diversity, a synthetic library with additional artificial randomization on cdrs can reach diversity as high as ~ . once the recombinant rbd and ace proteins were ready, it took weeks to isolate, produce and verify the antibodies in this study. 
using the standard biopanning method, we enriched rbd-specific phages from our synthetic lib ab but not from our naïve antibody libraries (data not shown). unfortunately, the top lead rrbd- from the standard biopanning of lib ab could not block the rbd-ace interaction (fig. ) , although it did bind to rbd with an ec of . nm (fig. ) . the clinical potential and applications of an antibody often depend on the epitopes it binds on the target protein. a high-affinity antibody against the target protein can be screened from a phage display antibody library using the standard biopanning process, but its binding epitopes must then be identified through extra steps, such as epitope mapping and competitive elisa. we therefore developed a new competitive biopanning strategy to efficiently isolate epitope-specific antibodies from libraries. as expected, the top lead rrbd- bound rbd in competition with ace both in solution and in the pseudovirus assay, and its binding affinity was high (~ nm, varying with the measurement method). in conclusion, our strategic discovery of human monoclonal antibodies against sars-cov- rbd may fill gaps in antibody-related pharmaceutical development and shed light on new treatments needed to address global health concerns. 
cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace unexpected receptor functional mimicry elucidates activation of coronavirus fusion cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding structural and functional basis of sars-cov- entry by using human ace cryo-em structure of the sars-cov- spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein a single-framework synthetic antibody library containing a combination of canonical and variable complementarity-determining regions identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries potent human neutralizing antibodies elicited by sars-cov- infection high contagiousness and rapid spread of severe acute respiratory syndrome coronavirus we thank chengdu zicheng yibo biotechnology co., ltd for providing the laboratory consumables and bovine serum. this work was supported by sichuan science and technology program ( rz ), the program of sars-cov- protection (cyhx , kezhi people's air-defense equipment co., ltd) and the program of sars-cov- antibody discovery (jl c- , ablink biotech co., ltd). key: cord- - vk cf authors: liekkinen, juho; de santos moreno, berta; paananen, riku o.; vattulainen, ilpo; monticelli, luca; de la serna, jorge bernardino; javanainen, matti title: understanding the functional properties of lipid heterogeneity in pulmonary surfactant monolayers at the atomistic level date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vk cf pulmonary surfactant is a complex mixture of lipids and proteins lining the interior of the alveoli, and constitutes the first barrier to both oxygen and pathogens as they progress toward blood circulation. 
despite decades of study, the behavior of the pulmonary surfactant is poorly understood on the molecular scale, which hinders the development of effective surfactant replacement therapies, useful in the treatment of several lung-related diseases. in this work, we combined all-atom molecular dynamics simulations, langmuir trough measurements, and afm imaging to study synthetic four-component lipid monolayers designed to model protein-free pulmonary surfactant. we characterized the structural and dynamic properties of the monolayers with a special focus on lateral heterogeneity. remarkably, simulations reproduce almost quantitatively the experimental data on pressure–area isotherms and the presence of lateral heterogeneities highlighted by afm. quite surprisingly, the pressure–area isotherms do not show a plateau region, despite the presence of liquid-condensed nanometer–sized domains at surface pressures larger than mn/m. in the simulations, the domains were small and transient, but they did not coalesce to yield a separate phase. the liquid–condensed domains were only slightly enriched in dppc and cholesterol, and their chemical composition remained very similar to the overall composition of the monolayer membrane. instead, they differed from liquid-expanded regions in terms of membrane thickness (in agreement with afm data), diffusion rates, acyl chain packing, and orientation. we hypothesize that such lateral heterogeneities are crucial for lung surfactant function, as they allow both efficient packing, to achieve low surface tension, and sufficient fluidity, critical for rapid adsorption to the air–liquid interface during the breathing cycle. the integrity of the alveolar gas-blood barrier is crucial for effective gas exchange and health, filtering of undesirable components, and response to inhaled hazard. at the same time, it develops tolerance mechanisms to attenuate immunopathology. 
the alveoli are continuously exposed to inhaled micro- and nanosized pathogens, which are normally rapidly eliminated with the help of the immune system. immune responses in the alveoli must be tightly regulated to prevent excessive inflammation and tissue damage. inappropriate or excessive immune responses cause the development of systemic airway inflammation, as in the acute respiratory distress syndrome (ards). ards is the major cause of respiratory failure affecting millions of people annually, and it is also a main cause of death in many viral infections, such as those caused by severe acute respiratory syndrome coronavirus (sars-cov- ) and by the current sars-cov- , which causes the coronavirus disease (covid- ). the alveolar epithelium is coated by pulmonary surfactant, a delicate membranous proteolipid film that maintains normal lung function. pulmonary surfactant forms a monolayer, which lines the alveolar epithelium and is synthesised and secreted by epithelial alveolar type cells. pulmonary surfactant is the most permeable interface of the human body exposed to the environment, presenting the first respiratory barrier against inhaled foreign matter and microorganisms. it is a lipoprotein complex comprising approximately weight-% lipid, of which phosphatidylcholine (pc) is the principal component. importantly, the pulmonary surfactant is exceptionally rich in dipalmitoylphosphatidylcholine (dppc), which has a high main transition temperature (t m ). other major lipid components include pcs with an unsaturated chain, phosphatidylglycerol (pg), and cholesterol. as to protein content, the pulmonary surfactant contains four specific surfactant proteins (sp-a, sp-b, sp-c, and sp-d) of which sp-a and sp-d are innate immune defence proteins, whereas sp-b and sp-c together with the phospholipids are crucial in sustaining the very low surface tension needed to avoid alveolar collapse, oedema, and lack of oxygenation. 
basic cellular and molecular biology research has suggested that early pulmonary surfactant dysfunction contributes to the high morbidity of coronavirus infections. the ability of pulmonary surfactant to reduce surface tension can be compromised by a decreased concentration of surfactant phospholipids and proteins, altered phospholipid composition, proteolysis and/or protein inhibition, as well as oxidative inactivation of lipids and proteins. variations of these dysfunctional mechanisms have been reported in child and adult patients with ards. for instance, pulmonary surfactant metabolism studies in adult ards patients showed altered surfactant lipid composition: dppc content was decreased, whereas the fractions of the surface tension-inactive unsaturated species were increased. moreover, the total amounts of both pc and pg were decreased. therefore, the possibility that early surfactant replacement therapy could be beneficial in preventing progression of disease severity is encouraging, given the established safety profile of the pulmonary surfactant. even though it is mainly indicated for prematurely born babies, surfactant replacement therapy has also been used in adult ards studies. this method, where exogenous surfactant is supplied into the lungs, is currently being tested in clinical trials to treat covid- patients who require ventilator support. accordingly, for the development of efficient surfactant replacements, we require a better understanding of the roles of the pulmonary surfactant components in lung mechanics. pulmonary surfactant forms a network of complex biological self-assembling morphologies lining the alveoli. the distinctive structure formed by the pulmonary surfactant is a monolayer at the gas-alveolar epithelium liquid interface. 
in addition to lateral packing in the monolayer at the liquid-air interface, a fraction of the pulmonary surfactant is also likely folded from the interface into lipid bilayers or multilayers in the aqueous subphase, acting as lipid reservoirs. this monolayer is repeatedly compressed and expanded during breathing cycles, a process that can only be withstood by a material with very peculiar viscoelastic properties. these reservoirs have also been suggested to participate in oxygen transfer from the inhaled air to circulation. it has been shown that the pulmonary surfactant monolayers and membranes exhibit phase behavior that is believed to play roles in lung mechanics: tight packing of lipids with saturated lipid chains promotes the ability of the surfactant to lower surface tension, whereas lipids with unsaturated chains increase pulmonary surfactant fluidity and thus allow for its rapid adsorption to the air-water interface. moreover, the phase behavior is also considered to regulate the functions of the surfactant proteins. however, the biophysical implications of the phase behavior are poorly understood, and a molecular-level view of the surfactant organization would be extremely helpful in understanding the roles of different lipids and surfactant proteins in lung functionality enabled by pulmonary surfactant. when compressed at temperatures below the t m of the corresponding membrane, single-component monolayers transition from a two-dimensional gas phase into a loosely packed and dynamic liquid-expanded (l e ) phase. further compression takes the monolayer to a brittle liquid-condensed (l c ) phase, which eventually collapses, expelling matter into the aqueous subphase. the l e -l c transition takes place through the formation of a coexistence phase characterized by a plateau in a surface pressure-area isotherm, which is manifested as observable small domains. 
[ ] [ ] [ ] the behavior of native pulmonary surfactant monolayers is quite different from such model systems. still, fluorescence and brewster angle microscopy reported visible domains in monolayers formed from lipid fractions of the pulmonary surfactant. , these domains were suggested to be highly enriched in dppc. domains were also detected in vesicles created from the pulmonary surfactant, and also in the presence of the surfactant proteins. , however, the surface pressure-area isotherms measured for pulmonary surfactant lipid fractions were not found to contain a plateau of any sort. thus, the monolayer did not entirely transition to the l c phase at any pressure, indicating that no l e -l c coexistence was present either. presumably, the observed domains did not present an equilibrium phase. indeed, the composition of the pulmonary surfactant seems to be adjusted so that the mixture is barely fluid at body temperature, , which can lead to distinct behavior of its lipid components. by doping the pulmonary surfactant lipid fraction with additional dppc, the t m can be gradually increased, and a coexistence plateau becomes eventually visible. these observations suggest that the behavior of the pulmonary surfactant may be characterized by transient heterogeneity arising from critical fluctuations, where the behaviour of each lipid type is determined by its t m value. these research topics are difficult to study at the molecular level due to the limited resolution of available experimental techniques. molecular simulations are often helpful in resolving such aspects down to the atomistic level. however, previous simulation studies on multi-component monolayers are relatively scarce. 
while previous simulations have generated a lot of insight into the behavior of surfactant monolayers, they have been hampered by an insufficiently accurate description of the physical behavior of the lipids at the interface. this is largely due to water models that do not describe phenomena at water-monolayer-air interfacial regions well enough, and to an incorrect description of the driving forces behind the formation of heterogeneous membranes. to overcome these limitations and provide a detailed and accurate picture of the molecular-level organization of surfactant monolayers, we used extensive and state-of-the-art all-atom molecular dynamics (md) simulations of model systems that were validated with experimental langmuir trough data. the composition of our model monolayers was chosen to match the composition of the protein-free pulmonary surfactant. these model systems were simulated at various compression states and temperatures, matching the conditions studied in experiments. by using our recently developed protocol for performing simulations of lipid monolayers at the air-water interface, we were able to reach quantitative agreement with experimental surface pressure-area isotherms. the simulations predicted the presence of lateral heterogeneity characterized by domain formation, which we confirmed by atomic force microscopy (afm) imaging. upon compression, we observed the appearance of transient l c -like domains. upon further compression these domains were found to aggregate to form large ordered regions. thanks to the atomistic detail available in the simulations, we were able to draw conclusions as to the physical and chemical properties of the nanoscale domains. 
interestingly, we observed that the domains are not substantially enriched by any lipid type, indicating that the peculiar viscoelastic properties of the pulmonary surfactant arise from the collective behavior of the mixture rather than from the features of its independent lipid components. our findings help understand lung mechanics and thus guide the development of strategies to tackle lung conditions, while also paving the methodological way for future studies of the pulmonary surfactant. until recently, the poor description of surface tension by the commonly used water models , has prevented quantitative simulation studies of lipid monolayers at the air-water interface. however, very recently we demonstrated that a combination of the four-point opc water model and the charmm lipid model reproduces experimental surface pressurearea isotherms in single-component -palmitoyl- -oleoylphosphatidylcholine (popc) and dppc monolayers, and the agreement is largely quantitative. in this study, we extend this approach to quaternary lipid monolayers whose composition was chosen to match the composition of the protein-free pulmonary surfactant as accurately as possible. in practice, our systems contained mol% dppc, mol% popc, mol% -palmitoyl- oleoylphosphatidylglycerol (popg), and mol% cholesterol. the systems were set up as follows. two monolayers, separated by a water slab and each containing lipids were first set up at an area per lipid (apl) equal to Å . next, these monolayers were either expanded or compressed during a ns simulation to an average apl value of or Å , respectively, using the movingrestraint and cell keywords in the plumed . package. structures at a total of apl values were extracted and used as initial structures for the production monolayer simulations in the nvt ensemble for µs each. 
all simulations were performed at both and k, and additional repeats as well as larger simulations of up to lipids were performed to provide further validation for the results. the charmm model for phospholipids and cholesterol was used together with the four-point opc water model. the simulations were performed with the version . .x of the gromacs simulation package, and the recommended simulation parameters for the charmm force field were used to reproduce realistic monolayer behavior. details on the simulation setups and on the simulation methodology are provided in the supporting information (si).
surface pressure-area isotherms
monolayer behavior was characterized and compared to experiments using surface pressure-area isotherms. the monolayer surface pressure $\Pi$ at an apl of $A$ was calculated as $\Pi(A) = \gamma_0 - \gamma(A)$, where $\gamma$ and $\gamma_0$ are the surface tensions of the monolayer-covered and plain air-water interfaces, respectively. here, the surface tensions are obtained from the pressure components along the monolayer plane, $P_\mathrm{L} = (P_{xx} + P_{yy})/2$, and normal to it, $P_\mathrm{N}$, as $\gamma = L_z (P_\mathrm{N} - P_\mathrm{L})/2$, where $L_z$ is the simulation box size normal to the monolayer plane, and the factor of two in the denominator accounts for the two water-air interfaces in the simulation box. the values were obtained with the gmx energy tool provided with gromacs.
domains packed like in the l c phase were detected by clustering lipid chains and cholesterols based on their packing in the monolayer plane. the th carbons in the lipid chains and the c atom in the cholesterol ring were included in the clustering that used the dbscan algorithm. the chosen atoms in the lipid chains capture well the hexagonal packing in ordered structures, and the c carbon of cholesterol resides at the same depth. this carbon is part of both the five-member (d) and six-member (c) rings. for dbscan, we used a cut-off of . nm and a minimum neighbour count of . 
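as a rough illustration of the clustering step, the sketch below runs a plain dbscan on in-plane (x, y) positions with periodic boundaries and then extracts the clustered fraction, the number of clusters, and the largest cluster size. it is not the authors' analysis code: the function names, the rectangular-box minimum-image treatment, and the convention that the neighbour count excludes the point itself are assumptions, and the cut-off and neighbour count are left as user-supplied parameters.

```python
import math
from collections import deque


def periodic_distance(a, b, box):
    """Minimum-image distance between 2D points a and b in a rectangular periodic box."""
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    dx -= box[0] * round(dx / box[0])
    dy -= box[1] * round(dy / box[1])
    return math.hypot(dx, dy)


def dbscan_2d(points, box, cutoff, min_neighbours):
    """Return one label per point: a cluster index (0, 1, ...) or -1 if unclustered.

    Assumption: a point is a core point when at least `min_neighbours`
    OTHER points lie within `cutoff` of it.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n)
         if j != i and periodic_distance(points[i], points[j], box) <= cutoff]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_neighbours for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for seed in range(n):
        if not is_core[seed] or labels[seed] != -1:
            continue
        # Grow a new cluster outwards from this unvisited core point.
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in neighbours[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if is_core[j]:  # only core points keep expanding the cluster
                        queue.append(j)
        cluster += 1
    return labels


def cluster_summary(labels):
    """Clustered fraction, number of clusters, and size of the largest cluster."""
    sizes = {}
    for lab in labels:
        if lab >= 0:
            sizes[lab] = sizes.get(lab, 0) + 1
    clustered = sum(sizes.values())
    return clustered / len(labels), len(sizes), max(sizes.values(), default=0)
```

applied frame by frame to the selected chain carbons and cholesterol atoms, `cluster_summary` would give per-frame values that can then be averaged over a trajectory. the pairwise neighbour search is quadratic in the number of points, which is adequate for a sketch but not for large systems.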
the cut-off was set to the distance at which the first minimum appears in the radial distribution function of the clustered atoms. the clusters were considered to be part of the l c -like domain. this clustering was performed on conformations separated by ns. the l c -like fraction, number of individual clusters, and the largest cluster size were extracted for each conformation. all these quantities were then averaged over the trajectory, and their standard deviations were used as error estimates. pattern-matching was used to find all residence events in the l c clusters. these times were histogrammed and fitted with a power law with an exponent of b. lateral demixing of lipids with unsaturated and saturated chains was quantified by the con- cholesterols were considered to be part of a cholesterol cluster if they were in contact with at least one other cholesterol molecule. we used a cut-off of . nm based on the first minimum in the cholesterol-cholesterol radial distribution function. all residence events were used in the calculation of the probability distribution of cholesterol cluster sizes. the distances were measured from the centers of mass of the cholesterol molecules. diffusion coefficients were calculated to characterize monolayer dynamics and to detect changes in monolayer packing. the diffusion coefficients were extracted from center-of-mass (com) trajectories. the motion of lipids with respect to the movement of the monolayer as a whole was analyzed to eliminate possible artifacts due to monolayer drift. the diffusion coefficients were extracted from linear fits to time-and ensemble-averaged mean-squared displacement at lag times between and ns. two values were extracted -one from each monolayer -and averaged, and their difference served as an error estimate. the gromacs tool gmx msd was used, and the com trajectories were generated using gmx traj. 
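the diffusion analysis can be sketched in a few lines. for lateral (two-dimensional) motion the mean-squared displacement grows as msd(t) ≈ 4 D t in the linear regime, so D follows from the slope of a straight-line fit over a chosen lag-time window. the function names and the explicit `t_min`/`t_max` arguments below are illustrative assumptions, not the interface of gmx msd, and the msd is assumed to have been computed already from drift-corrected centre-of-mass trajectories.

```python
def fit_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def lateral_diffusion_coefficient(lag_times, msd, t_min, t_max):
    """D from a linear fit of the 2D MSD between t_min and t_max (MSD = 4 D t + const).

    Units follow the input: MSD in nm^2 and time in ns give D in nm^2/ns.
    """
    window = [(t, m) for t, m in zip(lag_times, msd) if t_min <= t <= t_max]
    if len(window) < 2:
        raise ValueError("need at least two MSD points inside the fit window")
    times, values = zip(*window)
    return fit_slope(times, values) / 4.0


def combine_monolayers(d_first, d_second):
    """Average of the two monolayer values and their difference as an error estimate."""
    return 0.5 * (d_first + d_second), abs(d_first - d_second)
```

a typical call would fit each monolayer separately and then pass the two coefficients to `combine_monolayers`, mirroring the averaging and error estimate described in the text.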
monolayer thickness was used to couple afm height profiles of heterogeneous membranes to the lateral packing within the domains. this thickness was estimated from density profiles calculated for each lipid type along the monolayer normal (z) using the gromacs tool gmx density. these profiles were aligned at the phosphorus peak of dppc, and the lipid was set to begin and end at z values where its density crossed % of its maximum value. lipid chain tilt was used to characterize persistent l c -like packing in the monolayers. the tilt angle of lipid chains was calculated as the angle between the z axis and the vector joining the st and th carbons in the fatty acid chains of phospholipids using the gromacs tool gmx gangle. the angle distributions were averaged over both chains and both monolayers, and the distributions were fitted with a gaussian. the location of its maximum and the variance were used as the mean tilt angle and its error estimate, respectively. compression isotherm assays were performed with a specially designed ribbon langmuir-wilhelmy trough (nima technology, uk), and surface pressures as a function of molecular area were obtained at constant temperature, as described previously. the lipid mixture was identical to that employed in the simulations (see above). the employed langmuir-wilhelmy trough has a maximum area of cm and a minimum of cm . instead of the canonical rigid barriers, it uses a continuous teflon-coated ribbon that moves symmetrically to reduce the area available at the aqueous surface and thereby compresses the confined sample. the pressure was recorded with an electronic pressure sensor and a piece of cellulose using the wilhelmy plate technique, with an estimated error of ± mn/m among different isotherms, which were measured at least in triplicate at k. the compression isotherm displayed in fig we again used a lipid composition identical to the one in simulations (see above). 
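the chain tilt analysis described earlier in this section for the simulations can be illustrated with a minimal sketch. it computes the angle between a chain vector and the monolayer normal and then summarises a set of angles by their mean and variance. the function names are hypothetical, the sample moments stand in for the gaussian fit to the angle histogram mentioned in the text, and flipping the normal for the second monolayer is an assumption about how the two leaflets are combined.

```python
import math


def chain_tilt_deg(first_carbon, last_carbon, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between the chain vector (first -> last carbon) and `normal`."""
    vec = [b - a for a, b in zip(first_carbon, last_carbon)]
    len_vec = math.sqrt(sum(c * c for c in vec))
    len_normal = math.sqrt(sum(c * c for c in normal))
    cos_theta = sum(v * n for v, n in zip(vec, normal)) / (len_vec * len_normal)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


def gaussian_summary(angles):
    """Mean and variance of the angles (moment estimates of a Gaussian's parameters)."""
    mean = sum(angles) / len(angles)
    variance = sum((a - mean) ** 2 for a in angles) / len(angles)
    return mean, variance
```

for the monolayer whose chains point towards negative z, passing `normal=(0.0, 0.0, -1.0)` keeps the angles of both monolayers on the same scale before they are pooled.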
all these lipids were obtained from avanti polar lipids (alabaster, al). lipids were spread in mm chloroform solution on the surface of a deionized water subphase (milli-q, millipore, bedford, ma) in a langmuir trough (ksv minitrough, espoo, finland) with a platinum wilhelmy plate. the dilatational rheology of the monolayer was studied using the oscillating barrier method. the film was first compressed to the desired surface pressure, after which sinusoidal area compressions with an amplitude of % were performed at a frequency of , , , or mhz and the changes in surface pressure in response to the oscillations were recorded to determine the surface dilatational modulus. each measurement was repeated two times, and an average dilatational modulus was calculated over frequency, since the dilatational modulus was found to be constant with respect to frequency. all measurements were conducted at room temperature ( k). lipid monolayers were transferred onto mica as described in ref. . for the monolayer preparation, the sample was spread onto the air-water interface until the minimum surface pressure of ∼ - mn/m was observed. after min of monolayer equilibration, the film was compressed until the desired surface pressure was reached (from to mn/m, in mn/m steps), at a compression speed of cm /min. before the transfer was started, the film was again equilibrated for min at constant pressure. the monolayers were finally deposited in a freshly cleaved muscovite mica substrate (plano gmbh, wetzlar, germany) that had been previously submerged. the lifting device-to which the mica substrate was fixed-was raised in the vertical plane out of the buffered aqueous subphase at a speed of mm/min at constant pressure. in all experiment modalities three independent experiments were carried out, as a minimum, and up to images were taken and analysed. 
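for the oscillating-barrier measurement described in this section, the magnitude of the surface dilatational modulus can be estimated from the textbook relation |E| = ΔΠ / (ΔA / A0), where ΔΠ and ΔA are the oscillation amplitudes of surface pressure and area and A0 is the mean area. the sketch below extracts both amplitudes by projecting the signals onto sine and cosine at the driving frequency. it is an illustration only, not the instrument software's procedure; the function names are hypothetical, and the samples are assumed to be evenly spaced and to span a whole number of oscillation periods.

```python
import math


def harmonic_amplitude(times, values, frequency_hz):
    """Amplitude of the component of `values` that oscillates at `frequency_hz`.

    Uses a discrete Fourier projection; accurate when the samples are evenly
    spaced and cover an integer number of periods.
    """
    n = len(values)
    mean = sum(values) / n
    omega = 2.0 * math.pi * frequency_hz
    cos_part = sum((v - mean) * math.cos(omega * t) for t, v in zip(times, values))
    sin_part = sum((v - mean) * math.sin(omega * t) for t, v in zip(times, values))
    return 2.0 * math.hypot(cos_part, sin_part) / n


def dilatational_modulus(times, areas, pressures, frequency_hz):
    """|E| = pressure amplitude divided by relative area amplitude (units of pressure)."""
    mean_area = sum(areas) / len(areas)
    relative_area_amplitude = harmonic_amplitude(times, areas, frequency_hz) / mean_area
    return harmonic_amplitude(times, pressures, frequency_hz) / relative_area_amplitude
```

this returns only the magnitude of the modulus. separating elastic and viscous contributions would additionally require the phase shift between the area and pressure signals, which is not computed here.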
atomic force microscopy langmuir-blodgett supported monolayers' topographical images were taken using an atomic force microscope (jpk nanowizard, jpk instruments, berlin, germany), employing in both cases silicon-spm cantilevers (nanosensors, nanoworld ag, neuchatel, switzerland). the ac mode in air was selected for monolayers. the scan rate was ∼ hz for all afm images. at least different supported mono-and bilayer systems were assessed, and each sample was imaged on a minimum of three different positions. image processing of afm data was done using the spip software package as in ref. (image metrology, hørsholm, denmark). plateau surface pressure-area isotherm is a key quantity representing monolayer behavior at the water-air interface, and it is readily extracted from both langmuir trough measurements and computer simulations. these isotherms for the quaternary pulmonary surfactant lipid monolayers at and k are shown in figure . where domains were readily seen at surface pressures above mn/m, or at apl below Å at k. no l c /l e coexistence plateau was observed in the simulated isotherms either ( fig. and fig. s ), which was further verified by running independent replica simulations at selected apls. furthermore, by running independent repetitions at selected areas and larger monolayer systems, we also verified that the initial monolayer configuration or the finite-size effects did not affect the results (fig. s ) . the movies at doi: . /m .figshare. as well as the additional snapshots of the simulated monolayers at k and k given in figs. s and s , respectively, clearly demonstrate heterogeneity in lateral organization. based on our simulations at k (fig. c) , the monolayers at apls equal to (corresponding to the very packed state with major regions in an l c -like arrangement) and Å (corresponding to the l e phase) extend . and . nm towards the air phase, respectively, calculated from the plane defined by the dppc phosphorus atoms (see methods). 
this difference of . nm is exactly the difference observed in the height profile study in the afm experiments ( fig. a and b), clearly indicating that the measured thicker and thinner regions in the monolayers correspond to l c - and l e -like regions, respectively. the extent of the l c -like region grows as the surface pressure of the monolayer is increased, as would also be expected in a proper phase coexistence region, although here the surface pressure does not remain constant, as the isotherms in fig. show. in the afm images at mn/m (fig. , image ) the l c domains seem to fuse together to form more irregular shapes, but the overall heterogeneity is clearly visible. here, the coalescence of the domains was likely also limited by the slow diffusion within the compressed monolayers. indeed, our simulations predict that compression from a state with apl equal to Å to the one with apl equal to Å slows lipid diffusion by - orders of magnitude (fig. a). the nanostructures within the domains correspond to the l c -like ones measured at mn/m (see si). at mn/m (fig. , image below apl Å ) and mn/m (fig. s in the si) we have already reached the monolayer equilibrium collapse plateau seen in the experimental surface pressure-area isotherms. it should be noted that the monolayers in the alveoli are expected to reach surface tension values close to mn/m (corresponding to surface pressure of ∼ mn/m). however, in experiments this is only achieved by very rapid compression, or by the use of bubble surfactometer methods. in the collapse region, the domains are completely merged with a continuous network of small holes, forming a sponge-like collapsed structure at the boundary of the air-liquid interface. furthermore, the protruding regions seen in the afm images at surface pressures above the collapse plateau are collapsed material, excluded from the interfacial film into the water phase. this is verified by the height profiles ( fig. 
s in the si), which show height differences of up to nm, which is clearly above the difference observed in the simulations between monolayers at different apls (fig. c ). increasing the temperature to k in the simulations renders the monolayer more fluid and the heterogeneity is visible at surface pressures above mn/m, or apl below Å ( fig. s in the si). concluding, both afm measurements and computer simulations revealed lateral heterogeneity in a large range of apl values, despite the lack of a plateau in the surface pressure-area isotherm. together, simulations and afm imaging suggest that the structure of these domains resembles the l c phase. as no coexistence plateau is observed in the isotherms obtained from experiments and simulations, it is worth asking whether the heterogeneity observed by afm and simulations manifests itself in some other monolayer properties. to this end, we measured the dilatational moduli of the monolayer at various surface pressures using the oscillating barrier approach. as shown in fig. c , we observe little change in this modulus up to a surface pressure of mn/m. this signals that the monolayer area can be changed relatively easily, possibly due to a rapid reorganization of lipids in the membrane plane. however, after mn/m -corresponding roughly to apl equal to Å based on the isotherms in fig. large apl values, and this structural feature is coupled to a diffusion mode that differs from that observed at larger apl values. it is worth noting that the long-range orientational order of the tilted l c -like domains is also clearly evident in fig. . where rapid changes in the apl-dependence of many quantities were observed in the last section. curiously, temperature has again very little effect on this demixing. 
finally, it is worth pointing out that we also studied the time evolution of the contact fractions of the systems at different apl values but found no systematic tendency for any of the monolayers to demix during the simulation time. we also studied whether cholesterol has a tendency to cluster in the monolayers. the probability of a cholesterol to be in a cluster of a given size is shown in the left panel in stable phase separation has also been reported, employing fluorescence microscopy and afm, in pulmonary surfactant extract bilayer and monolayer membranes, whose behavior was similar when the monolayer was compressed to a pressure of ∼ mn/m. the monolayer phase behavior is naturally more complex of the two, as it is strongly affected by lateral pressure, as highlighted by afm experiments. importantly, the afm images measured for monolayers from lung surfactant extracts (compare fig. merge. this is further evidenced by our analyses on domain compositions and lifetimes, which suggests that lipids in monolayers have little tendency to undergo phase separation based on the saturation level of their acyl chains (fig. ) . in terms of fluidity, popg has been suggested to be a key player, in addition to its role in protein-lipid interactions. , we observed that bulk l e regions are indeed enriched in popg. still, the fluidity is not due to popg alone, since all the lipid components display fairly similar diffusion coefficients across apl values (fig. ) . additionally, we observed that cholesterol has a slight tendency to induce ordered clusters at large apls, whereas the cholesterols themselves show little tendency to cluster together. 
the fact that the spatial heterogeneity was dynamic and dictated by different packing ("physical separation") instead of the nature of the lipid species ("chemical separation") suggests that the pulmonary surfactant has specific and collective viscoelastic properties that cannot be derived from the behavior of its components in a straightforward manner. these dynamic properties will be further clarified in our future work. our simulations also indicate that the behavior of monolayers is in general very similar at room and body temperature. this is not very surprising, as both temperatures fall below the t m of dppc. in addition to these well-balanced viscoelastic properties of the pulmonary surfactant, the observed heterogeneity likely plays a role in regulating the function of surfactant proteins. computer simulations have become an indispensable tool in biological soft matter research due to their ability to probe small time and length scales. traditionally, reproducing surface pressure-area isotherms in classical molecular dynamics simulations has been challenging and, unlike here, the isotherms in earlier studies were often shifted artificially to match the experimental ones. this signals that in previous simulation studies of pulmonary surfactant layers the description of the physics at the interface has been inadequate due to inadequate water models. the simulation approach used in the present work largely overcomes these issues through a combination of recently developed simulation models and simulation parameters. the non-circular domain shapes observed by afm indicate a fairly small line tension, which is also supported by the fact that small and independent domains do not coalesce, as would eventually happen in the case of phase separation. this is in line with the short lifetimes of domains observed in our simulations (fig. c ). 
based on our simulations, we hypothesize that the l c domains are highly dynamic in monolayers at the air-water interface, but much less dynamic when the samples are transferred onto a mica substrate for afm measurements. nevertheless, the heterogeneity observed by afm is not an artefact of the immobilization, as heterogeneity was also detected in the dilatational modulus of monolayers measured in the langmuir trough (fig. c) . finally, it is worth noting that the domains visible in our simulations are limited by the simulation box size, despite representing the state-of-the-art in this regard. our control simulations with more lipids (see methods) result in similar behavior, thus suggesting that the observed heterogeneity does not arise due to finite-size effects. the surface pressures were also unchanged by the change in simulation box size (fig. s ). the domain lifetimes extracted from simulations are likely dependent on the domain sizes, and thus also system sizes, since escaping a core of a larger domain takes more time. unfortunately, extrapolating these lifetimes to microscopic scales is not straightforward. our study highlights that, despite the absence of a coexistence plateau in the surface pressure-area isotherm, heterogeneity at the nanometer scale -undetectable by fluorescence or brewster angle microscopies -should be detected by experimental approaches other than afm. the surface dilatational rheology measurements using an oscillating barrier readily observed a major change in the dilatational modulus at an apl where the heterogeneities begin to appear (fig. c) . lipid diffusion analyzed from the simulations also revealed a change in the trend at this crossover apl. as monolayer phase transitions are readily detected in experiments that probe lipid diffusion, such experiments could also detect smaller scale heterogeneity. finally, lipid tilt measured from our simulations (fig. 
b) revealed a persistent tilt angle of • up to the crossover apl, and this behavior should be captured by either x-ray diffraction or vibrational sum frequency generation spectroscopy methods. using a combination of langmuir trough experiments, afm imaging, and atomistic molecular dynamics simulations, we demonstrated that a synthetic quaternary lipid mixture is able to qualitatively reproduce the key features of the phase behavior of the native pulmonary surfactant extracts. under a large range of compression levels, thicker l c -like domains appear in the otherwise thinner l e -phase monolayer. these domains are dynamic and only slightly enriched in dppc with two saturated chains. we demonstrated that despite there not being a visible phase transition to the l c phase, some monolayer properties change significantly at well-defined values of area per lipid, and this crossover value is consistent between numerous quantities. moreover, since these properties should be readily measurable using experimental methods, our study also guides experimental work on detecting heterogeneities in biofilms. synthetic pulmonary surfactants mimicking the properties of the full functional surfactant are continuously investigated. a critical aspect is to find a proper lipid composition mixture. our results highlight how already a few key lipid components of the pulmonary surfactant display small domains, resembling the behavior of surfactant extracts. the lipid mixture is able to pack in a dynamic manner, thus enabling efficient surface tension reduction while maintaining sufficient fluidity. this behavior might also be crucial for the function of surfactant proteins, as has been investigated in other molecular simulations with multilipid components, which we will focus on in our future work. 
our approach, combining the charmm force field with the four-point opc water model, enables atomistic studies of lipid structures at the air-water interfaces in the complex pulmonary surfactant, allowing for studies of the physiologically important processes in the lung at a detail difficult to achieve experimentally. by integrating experimental data with molecular simulations, we provide, for the first time, a quantitatively accurate, unprecedented picture of the structural and dynamic properties of a realistic model of lung surfactant, under physiologically relevant conditions.
we thank csc -it center for science (espoo, finland) for computing resources.
key: cord- -ea sjcs authors: ramazzotti, daniele; angaroni, fabrizio; maspero, davide; gambacorti-passerini, carlo; antoniotti, marco; graudenzi, alex; piazza, rocco title: verso: a comprehensive framework for the inference of robust phylogenies and the quantification of intra-host genomic diversity of viral samples date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ea sjcs a global cross-discipline effort is ongoing to characterize the evolution of sars-cov- virus and generate reliable epidemiological models of its diffusion. to this end, phylogenomic approaches leverage accumulating genomic mutations to track the evolutionary history of the virus and benefit from the surge of sequences deposited in public databases. yet, such methods typically rely on consensus sequences representing the dominant virus lineage, whereas a complex intra-host genomic composition is often observed within single hosts. furthermore, most approaches might produce inaccurate results with noisy data and sampling limitations, as witnessed in most countries affected by the epidemics. we introduce verso (viral evolution reconstruction), a new comprehensive framework for the characterization of viral evolution and transmission from sequencing data of viral genomes. our probabilistic approach first delivers robust phylogenetic models from clonal variant profiles and then exploits variant frequency patterns to characterize and visualize the intra-host genomic diversity of samples, which may reveal uncovered infection events.
we prove via extensive simulations that verso outperforms the state-of-the-art tools for phylogenetic inference, also in condition of noisy observations and sampling limitations. the application of our approach to sars-cov- samples from amplicon sequencing and to samples from rna-sequencing unravels robust phylogenomic models, improving the current knowledge on sars-cov- evolution and spread. importantly, by exploiting co-occurrence patterns of minor variants, verso allows us to reveal uncovered infection paths, which are validated with contact tracing data. moreover, the in-depth analysis of the mutational landscape of sars-cov- confirms a statistically significant increase of genomic diversity in time and allows us to identify a number of variants that are transiting from minor to clonal state in the population, as well as several homoplasies, some of which might indicate ongoing positive selection processes. overall, the results show that the joint application of our framework and data-driven epidemiological models might improve currently available strategies for pathogen surveillance and analysis. verso is released as an open source tool at https://github.com/bimib-disco/verso. the outbreak of coronavirus disease (covid- ) , which started in late in wuhan (china) [ , ] and was declared pandemic by the world health organization, is fueling the publication of an increasing number of studies aimed at exploiting the information provided by the viral genome of sars-cov- virus to identify its proximal origin, characterize the mode and timing of its evolution, as well as to define descriptive and predictive models of geographical spread and evaluate the related clinical impact [ , , ] . as a matter of fact, the mutations that rapidly accumulate in the viral genome [ ] can be used to track the evolution of the virus and, accordingly, unravel the viral infection network [ , ] . 
at the time of this writing, numerous independent laboratories around the world are isolating and sequencing sars-cov- samples and depositing them on public databases, e.g., gisaid [ ] , whose data are accessible via the nextstrain portal [ ] . such data can be employed to estimate models from genomic epidemiology and may serve, for instance, to estimate the proportion of undetected infected people by uncovering cryptic transmissions, as well as to predict likely trends in the number of infected, hospitalized, dead and recovered people [ , , ] . more in detail, most studies employ phylogenomic approaches that process consensus sequences, which represent the dominant virus lineage within each infected host. a growing plethora of methods for phylogenomic reconstruction is available to this end, which rely on different algorithmic frameworks, including distance-matrix, maximum parsimony, maximum likelihood or bayesian inference, with various substitution models and distinct evolutionary assumptions (see, e.g., [ , , , , , , , , , ] ). however, while such methods have repeatedly proven effective in unraveling the main patterns of evolution of viral genomes with respect to many different diseases, including sars-cov- [ , , , ] , at least two issues can be raised. first, most phylogenomics methods might produce unreliable results when dealing with noisy observation, for instance due to sequencing issues, or collected with significant sampling limitations [ , , ] , as witnessed for most countries during the epidemics [ , ] . second, most methods do not consider the key information on intra-host minor variants (also referred to as minority variants or isnvs), which can be retrieved from whole-genome deep sequencing raw data and might be essential to improve the characterization of the infection dynamics and to pinpoint positively selected variants [ , , ] . 
due to the high replication, mutation and recombination rates of rna viruses, subpopulations of mutant viruses, also known as viral quasispecies [ ] , typically emerge and coexist within single hosts, and are thought to underlie most of the adaptive potential to the immune system response and to anti-viral therapies [ , , ] . in this regard, many recent studies highlighted the noteworthy amount of intra-host genomic diversity in sars-cov- samples [ , , , , , , , , ] , similarly to what has already been observed in many distinct infectious diseases [ , , , , , , ] . here, we introduce verso (viral evolution reconstruction), a new comprehensive framework for the inference of high-resolution models of viral evolution from raw sequencing data of viral genomes (see fig. ). verso includes two consecutive algorithmic steps. step # : robust phylogenomic inference from clonal variant profiles. verso first employs a probabilistic noise-tolerant framework to process binarized clonal variant profiles (or, alternatively, consensus sequences), to return a robust phylogenetic model even under sampling limitations and sequencing issues. by adapting algorithmic strategies widely employed in cancer evolution analysis [ , , , ] , verso is able to correct false positive and false negative variants, can manage missing observations due to low coverage, and is designed to group samples with identical (corrected) clonal genotype in polytomies, avoiding ungrounded arbitrary orderings. as a result, the accurate and robust phylogenomic models produced by verso may be used to improve the parameter estimation of epidemiological models, which typically rely on limited and inhomogeneous data [ , ] . notice that this step can be executed independently from step # , for instance in case raw sequencing data are not available. homoplasy detection (clonal variants).
the first step of verso allows us to identify clonal mutations that violate the accumulation hypothesis and might be involved in homoplasies, possibly due to positive selection (in a scenario of convergent/parallel evolution [ ] ), founder effects [ ] or mutational hotspots [ ] . such information might be useful to drive the design of opportune treatments and vaccines, for instance by blacklisting positively selected genomic regions. step # : characterization of intra-host genomic diversity. in the second step, verso exploits the information on variant frequency (vf) profiles obtained from raw-sequencing data (if available), to characterize and visualize the intra-host genomic similarity of hosts with identical (corrected) clonal genotype. in fact, even though the extent and modes of transmission of quasispecies from a host to another during infections are still elusive [ , ] , patterns of co-occurrence of minor variants detected in hosts with identical clonal genotype may provide an indication on the presence of uncovered infection paths [ , ] . for this reason, the second step of verso is designed to characterize and visualize the genomic similarity of samples by exploiting dimensionality reduction and clustering strategies typically employed in single-cell analyses [ ] . alternative approaches for the analysis of quasispecies, yet with different goals and algorithmic assumptions have been proposed, for instance in [ , , , ] and recently reviewed in [ ] . 
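both steps described above operate on groups of hosts that share an identical (corrected) clonal genotype. the following is a minimal sketch of that grouping, not the verso implementation: the input format (a dictionary from sample identifier to a binary clonal variant profile) and the function name `group_by_clonal_genotype` are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_clonal_genotype(profiles):
    """Group sample ids by their binary clonal genotype.

    profiles: dict mapping a sample id to a sequence of 0/1 values,
    one per clonal variant (1 = variant present). Hypothetical format.
    Returns a dict mapping each distinct genotype (as a tuple) to the
    sorted list of samples carrying it, i.e. one group per polytomy.
    """
    groups = defaultdict(list)
    for sample_id, genotype in profiles.items():
        groups[tuple(genotype)].append(sample_id)
    return {genotype: sorted(ids) for genotype, ids in groups.items()}


# Toy example with three hosts and three clonal variants (made-up data).
profiles = {
    "host_1": [1, 0, 0],
    "host_2": [1, 1, 0],
    "host_3": [1, 0, 0],
}
polytomies = group_by_clonal_genotype(profiles)
```

in this toy input, `host_1` and `host_3` end up in the same group, which is the set of samples on which a vf-based similarity analysis would then be run.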
as specified above, verso step # is executed on groups of samples with identical clonal genotype: the rationale is that the transmission of minor variants implies the concurrent transfer of clonal variants, excluding the rare cases in which the vf of a clonal variant significantly decreases in a given host, for instance due to mutation losses (e.g., via recombination-associated deletions or via multiple mutations hitting an already mutated genome location [ ] ) or to complex horizontal evolution phenomena (e.g., super-infections [ , ] ). conversely, the transmission of clonal variants does not necessarily imply the transfer of all minor variants, which are affected by complex recombination and transmission effects, such as bottlenecks [ , ] . as a final result, verso allows one to visualize the genomic similarity of samples on a low-dimensional space (e.g., umap [ ] or tsne [ ] ) representing the intra-host genomic diversity, and to characterize high-resolution infection chains, thus overcoming the limitations of methods relying on consensus sequences. homoplasy detection (minor variants). importantly, minor variants observed in hosts with distinct clonal genotypes (identified via verso step # ) may indicate homoplasies, due to mutational hotspots, phantom mutations or to positive selection [ ] . verso pinpoints such variants for further investigation and allows one to exclude them from the computation of the vf-based genomic similarity prior to verso step # , to reduce possible confounding effects. to assess the accuracy and robustness of the results produced by verso, we performed an extensive array of simulations and compared it with two state-of-the-art methods for phylogenetic reconstruction, i.e., iq-tree [ ] and beast [ ] . as a major result, verso outperforms competing methods in all settings, including under high noise and sampling limitations.
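the minor-variant homoplasy criterion stated above (a minor variant observed in hosts with distinct clonal genotypes) can be illustrated with a short sketch. this is an assumption-laden illustration rather than the tool's code: the data structures, the variant labels and the function name `flag_candidate_homoplasies` are hypothetical.

```python
from collections import defaultdict


def flag_candidate_homoplasies(minor_variants, clonal_genotype):
    """Return the minor variants seen in more than one clonal genotype.

    minor_variants: dict mapping sample id -> set of minor variant labels.
    clonal_genotype: dict mapping sample id -> hashable genotype label.
    Both formats are hypothetical and chosen only for illustration.
    """
    genotypes_per_variant = defaultdict(set)
    for sample_id, variants in minor_variants.items():
        for variant in variants:
            genotypes_per_variant[variant].add(clonal_genotype[sample_id])
    return {v for v, gs in genotypes_per_variant.items() if len(gs) > 1}


# Made-up example: "var_B" appears in hosts of two different genotypes.
minor_variants = {
    "host_1": {"var_A", "var_B"},
    "host_2": {"var_A"},
    "host_3": {"var_B", "var_C"},
}
clonal_genotype = {"host_1": "G1", "host_2": "G1", "host_3": "G2"}
flagged = flag_candidate_homoplasies(minor_variants, clonal_genotype)
```

variants returned by such a filter would be the ones set aside before computing the vf-based similarity, while variants confined to a single clonal genotype (here `var_A` and `var_C`) are retained.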
furthermore, we applied verso to two large-scale datasets, generated via amplicon and rna-seq illumina sequencing protocols, including and samples, respectively. the robust phylogenomic models delivered via verso step # allow us to refine the current knowledge on sars-cov- evolution and spread. besides, thanks to the in-depth analysis of the mutational landscape of both clonal and minor variants, we could identify a number of variants undergoing transition to clonality, as well as several homoplasies, including variants likely undergoing positive selection processes. remarkably, the infection chains identified via verso step # , by assessing the intra-host genomic similarity of samples with the same clonal genotype, were validated by employing contact tracing data from [ ] . this important result, which could not be achieved by analyzing consensus sequences, proves the effectiveness of employing raw sequencing data to improve the characterization of the transmission dynamics, in particular during the early phase of the outbreak, in which a relatively low diversity of sars-cov- has been observed at the consensus level. verso is released as free open source tool at this link: https://github.com/bimib-disco/verso. in order to assess the performance of verso and compare it with competing approaches, we executed extensive tests on simulated datasets, generated with the coalescent model simulator msprime [ ] . simulations allow one to compute a number of metrics with respect to the ground-truth, which in this case is the phylogeny of samples resulting from a backwards-in-time coalescent simulation [ ] . accordingly, this allows one to evaluate the accuracy and robustness of the results produced by competing methods in a variety of in-silico scenarios. in detail, we selected simulation scenarios with n = samples in which a number of clonal variants (with distinguishable profiles) between and was observed. 
we then inflated the datasets with false positives with rate α and false negatives with rate β, in order to mimic sequencing and coverage issues. moreover, additional datasets were generated via random subsampling of the original datasets, to model possible sampling limitations and sampling biases. as a result, we investigated the following simulation settings: (a) low noise, no subsampling, (b) high noise, no subsampling, (c) low noise, subsampling, and (d) high noise, subsampling (see methods and the supplementary material for further details; the complete parameter settings of the simulations are provided in table of the supplementary material). verso step # was compared with two state-of-the-art methods for phylogenetic inference from consensus sequences: iq-tree [ ] , the algorithmic strategy included in the nextstrain-augur pipeline [ ] , and beast [ ] . consensus sequences to be provided as input to such methods were generated from simulation data by employing the reference genome sars-cov- -anc (see below).

(a) in this example, three hosts infected by the same viral lineage are sequenced. all hosts share the same clonal mutation (t>c, green), but two of them (# and # ) are characterized by a distinct minor mutation (a>t, red), which randomly emerged in host # and was transferred to host # during the infection. standard sequencing experiments return an identical consensus sequence for all samples, by employing a threshold on variant frequency (vf) and by selecting mutations characterizing the dominant lineage. (b) verso takes as input the variant frequency profiles of samples, generated from raw sequencing data. in step # , verso processes the binarized profiles of clonal variants and solves a boolean matrix factorization problem by maximizing a likelihood function via monte-carlo markov chain, in order to correct false positives/negatives and missing data.
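as a rough illustration of the error model referred to above, the sketch below perturbs a binary clonal variant matrix with false positives (rate α) and false negatives (rate β), and scores an observed matrix against a candidate genotype matrix. the per-entry likelihood used here (P(observed 1 | true 0) = α, P(observed 0 | true 1) = β, missing entries skipped) is a standard assumption for this kind of noise model and is not taken from the verso source; function names and parameter values are hypothetical.

```python
import math
import random


def add_noise(genotypes, alpha, beta, seed=0):
    """Flip 0 -> 1 with probability alpha and 1 -> 0 with probability beta."""
    rng = random.Random(seed)
    noisy = []
    for row in genotypes:
        new_row = []
        for g in row:
            if g == 0:
                new_row.append(1 if rng.random() < alpha else 0)
            else:
                new_row.append(0 if rng.random() < beta else 1)
        noisy.append(new_row)
    return noisy


def log_likelihood(observed, genotypes, alpha, beta):
    """Log-likelihood of observed 0/1/None entries given candidate genotypes.

    None marks a missing observation and contributes nothing.
    Requires 0 < alpha < 1 and 0 < beta < 1.
    """
    total = 0.0
    for obs_row, gen_row in zip(observed, genotypes):
        for d, g in zip(obs_row, gen_row):
            if d is None:
                continue
            if g == 0:
                p = alpha if d == 1 else 1.0 - alpha
            else:
                p = beta if d == 0 else 1.0 - beta
            total += math.log(p)
    return total


# Made-up example: two samples, two clonal variants.
truth = [[1, 0], [0, 1]]
noisy = add_noise(truth, alpha=0.05, beta=0.10, seed=42)
score = log_likelihood(noisy, truth, alpha=0.05, beta=0.10)
```

a search procedure such as mcmc would propose alternative genotype matrices and keep those with a higher score; only the scoring function and the noise injection are sketched here.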
as output, it returns both the corrected mutational profiles of samples and the phylogenetic tree, in which samples with identical corrected genotypes are grouped in polytomies. corrected genotypes are then employed to identify homoplasies of minor variants, which are further investigated to pinpoint positively selected mutations. the variant frequency profile of minor variants (excluding homoplasies) is processed by step # of verso, which computes a refined genomic distance among hosts (via bray-curtis dissimilarity on the knn graph, after pca) and performs clustering and dimensionality reduction, in order to project and visualize samples on a d space, representing the intra-host genomic diversity and the distance among hosts. this allows one to identify uncovered transmission paths among samples with identical clonal genotype.

the performance of methods was assessed by comparing the reconstructed phylogeny with the simulated ground-truth, in terms of: (i) absolute error evolutionary distance, (ii) branch score difference [ ] and (iii) quadratic path difference [ ] (please refer to the supplementary material for a detailed description of all metrics). in figure one can find the performance distribution of all methods with respect to all simulation settings. notably, verso step # outperforms competing methods in all scenarios (mann-whitney u test p < . in all cases), with noteworthy percentage improvements, also in conditions of high noise and sampling limitations. this important result shows that the probabilistic framework that underlies verso step # can produce more robust and reliable results when processing noisy data, as typically observed in real-world scenarios. different reference genomes have been employed in the analysis of sars-cov- origin and evolution. two genome sequences from human samples, in particular, were used in early phylogenomic studies, namely sequence epi_isl_ (ref. # in the following), used, e.g., in [ ] and sequence epi_isl_ (ref.
# ) used, e.g., in [ ] . excluding the polya tails, the two sequences are identical for on genome positions ( in order to define a likely common ancestor for both sequences, we analyzed the bat-cov-ratg genome (sequence epi_isl_ ) [ ] and the pangolin-cov genome (sequence epi_isl_ ) [ , ] , which were identified as closely related genomes to sars-cov- [ ] . in particular, it was hypothesized that sars-cov- might be a recombinant of an ancestor of pangolin-cov and bat-cov-ratg [ , ] , whereas more recent findings would suggest that sars-cov- lineage is the consequence of a direct or indirect zoonotic jump from bats [ ] . whatever the case, both bat-cov-ratg and pangolin-cov display haplotype tctct at locations , , , and and, therefore, one can hypothesize that such haplotype was present in the unknown common ancestor of ref. # and # . for this reason, we generated an artificial reference genome, named sars-cov- -anc, which is identical to both ref. notice that verso pipeline is flexible and can employ any reference genome. we retrieved raw illumina amplicon sequencing data of sars-cov- samples of dataset # and applied verso to the mutational profiles of samples selected after quality check (mutational profiles were generated by executing variant calling via standard practices; see methods for further details). notice that the analysis of this dataset was performed independently from that of dataset # in order to exclude possible sequencing-related artifacts or idiosyncrasies. we first applied verso step # to the mutational profile of the variants detected as clonal (vf > %) in at least % of the samples, in order to reconstruct a robust phylogenomic tree. the verso phylogenetic model is displayed in fig. a and highlights the presence of clonal genotypes, obtained by removing noise from data, and which define polytomies including different numbers of samples (see methods for further details). more in detail, variant g. 
t>c (n, synonymous) is the earliest evolutionary event from reference genome sars-cov- -anc and is detected in samples of the dataset. the related clonal genotype g , which is characterized by no further mutations, identifies a polytomy including australian, chinese, american and south-african samples. three clades originate from g : a first clade includes clonal genotypes g ( samples) and g ( ), while a second clade includes clonal genotype g ( ). clonal genotypes g -g are characterized by the absence of snvs g. t>c (orf ab, synonymous) and g. c>t (orf , p. s>l) and correspond to previously identified type a [ ] (also type s [ ] ), which was hypothesized to be an early sars-cov- type. the third clade originating from clonal genotype g includes all remaining clonal genotypes (g -g ) and is characterized by the presence of both snvs g. t>c and g. c>t. this specific haplotype corresponds to type b [ ] (also type l [ ] ) and an increase of its prevalence has been progressively recorded in the population, as one can see in fig. , as opposed to type a (s), which was rarely observed in late samples. in this regard, we note that there are currently insufficient elements to support any epidemiological claim on virulence and pathogenicity of such sars-cov- types, even if recent evidence would suggest the existence of a low correlation [ ] .

[figure caption fragment: ... on (i) absolute error evolutionary distance, (ii) branch score difference [ ] and (iii) quadratic path difference [ ] with respect to the ground-truth sample phylogeny provided by msprime (see the supplementary ...); axis label: week]

homoplasy detection (clonal variants). clonal variants included in our model show apparent violations of the accumulation hypothesis, namely: g. g>t (orf ab, p. l>f), g. c>t (orf ab, synonymous), g. c>t (orf ab, synonymous), g. c>t (orf , p. s>l), and g. c>t (n, p. p>l), suggesting that they might be involved in homoplasies. some of such variants have been exhaustively studied (e.g., g.
g>t in [ ] ), specifically to verify possible scenarios of convergent evolution, which may unveil the fingerprint of adaptation of sars-cov- to human hosts. to this end, particular attention should be devoted to the three nonsynonymous substitutions, i.e., g. g>t (present in samples, ≈ % of the dataset), g. c>t ( samples, ≈ %) and g. c>t ( samples, ≈ %). as a first result, we note that the prevalence dynamics of the haplotypes defined by such variants does not show any apparent growth trend in the population (see supplementary figure ). to further investigate whether such variants fall in a mutation-prone region of the sars-cov- genome, we evaluated the mutational density employing a sliding window approach similarly to [ ] (see supplementary material for additional details). as shown in supplementary figure , the mutational density, computed by considering synonymous minor variants, exhibits a median value of = . [syn. mutations][nucleotides]^-1. interestingly, the three nonsynonymous snvs (g. g>t, g. c>t and g. c>t) are located within windows with a higher mutational density than the median value: . , . and . [syn. mutations][nucleotides]^-1, respectively (see table of the supplementary material), and this would suggest that they might have originally emerged due to the presence of natural mutational hotspots or phantom mutations. however, this analysis is not conclusive and further investigations are needed to characterize the functional effect of such mutations and their possible impact on the evolution and diffusion of sars-cov- . stability analysis. the choice of an appropriate vf threshold to identify clonal variants and, accordingly, to generate consensus sequences from raw sequencing data might affect the stability of the results of any downstream phylogenomic analysis. on the one hand, loose thresholds might increase the risk of including non-clonal variants in consensus sequences.
on the other hand, too strict thresholds might increase the rate of false negatives, especially with noisy sequencing data. for this reason, we assessed the robustness of the results produced by verso step # on dataset # by comparing the results obtained when different thresholds in the set δ ∈ { . , . , . , . } are employed to identify clonal variants with those obtained with the default threshold (δ = . ), in terms of tree accuracy (see the supplementary material for further details). as one can see in supplementary figure , the tree accuracy varies between . and . in all settings, showing that the results produced by verso step # are robust with regard to the choice of the vf threshold for clonal variant identification. we then applied verso step # to the complete vf profiles of the samples with the same clonal genotype and projected their intra-host genomic diversity on the umap low-dimensional space. this was done excluding: (i) the clonal variants employed in the phylogenetic inference via verso step # , (ii) all minor variants (vf ≤ %) observed in more than one clonal genotype (i.e., homoplasies), which likely emerged independently within the hosts, due to mutational hotspots, phantom mutations or positive selection (see methods and the next subsections). even though, as expected, the vf profiles of minor variants are noisy, a complex intra-host genomic architecture is observed in several individuals. moreover, patterns of co-occurrence of minor variants across samples support the hypothesis of transmission from a host to another. in fig. as a first result, all samples belonging to a specific contact group are characterized by the same clonal genotype, determined via verso step # , a result that confirms recent findings [ , ] . more importantly, the analysis of the intra-host genomic diversity via verso step # allows one to substantially refine this analysis. in fig. one can find the umap plot of clonal genotypes g , g and g , which include (on ), (on ) and (on ) samples with contact information.
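the intra-host genomic distance used in this step is based on the bray-curtis dissimilarity and a k-nearest neighbour graph, as named in the pipeline overview. the sketch below is a simplified illustration under stated assumptions: it works directly on raw vf profiles (the pca step of the pipeline is omitted), the value of k and all function names are hypothetical, and the toy vf values are made up.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative profiles."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator > 0 else 0.0


def knn_graph(profiles, k):
    """Undirected kNN graph: link each sample to its k closest samples."""
    ids = sorted(profiles)
    edges = set()
    for i in ids:
        ranked = sorted(
            (bray_curtis(profiles[i], profiles[j]), j) for j in ids if j != i
        )
        for _, j in ranked[:k]:
            edges.add(frozenset((i, j)))
    return edges


def connected_components(ids, edges):
    """Connected components of the graph, each as a sorted list of ids."""
    adjacency = {i: set() for i in ids}
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(ids):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Made-up minor-variant VF profiles for four hosts with the same clonal genotype.
vf = {
    "a": [0.2, 0.0, 0.1],
    "b": [0.2, 0.0, 0.0],
    "c": [0.0, 0.3, 0.0],
    "d": [0.0, 0.25, 0.05],
}
edges = knn_graph(vf, k=1)
components = connected_components(vf.keys(), edges)
```

in this toy input, hosts sharing minor variants end up linked in the graph, while hosts with disjoint minor-variant profiles fall in separate components, which is the kind of structure that is compared against contact information in the text.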
strikingly, the distribution of the pairwise intra-host genomic distance among samples from the same institution/household (computed on the k-nearest neighbour graph via bray-curtis dissimilarity, after pca; see methods) is significantly lower with respect to the distance of all samples with the same clonal genotype (p-value of the mann-whitney u test < . in all cases). furthermore, all samples belonging to the same contact group are connected in the knn graph, while a noteworthy proportion of samples without contact information in genotypes g and g are placed in disconnected graphs ( . % and . %, respectively). this major result suggests that patterns of co-occurrence of minor variants can indeed provide useful indication on contact tracing dynamics, which would be masked when employing consensus sequencing data. accordingly, the algorithmic strategy employed by verso step # and, especially, the identification of the knn graph on intra-host genomic similarity, provides an effective tool to dissect the complexity of viral evolution and transmission, which might in turn improve the reliability of currently available contact tracing tools. homoplasy detection (minor variants). several minor variants are found in samples with distinct clonal genotypes and might indicate the presence of homoplasies. in this respect, the heatmap in fig. f returns the distribution of minor snvs with respect to: (i) the number of distinct clonal genotypes in which they are detected and (ii) the mutational density of the region in which they are located (see the supplementary material for details on the mutational density analysis). the intuition is that variants detected in single clonal genotypes (left region of the heatmap) are likely spontaneously emerged private mutations or the result of infection events between hosts with same clonal genotype (see above). 
conversely, snvs found in multiple clonal genotypes (right region of the heatmap) may have emerged due to positive selection in a parallel/convergent evolution scenario, or to mutational hotspots or phantom mutations. to this end, the mutational density analysis provides useful information to pinpoint mutation-prone regions of the genome. interestingly, a significant number of minor variants are observed in multiple clonal genotypes and fall in scarcely mutated regions of the genome (see supplementary figure ). this would suggest that some of these variants might have been positively selected, due to some possible functional advantage or to transmission-related founder effects. in this respect, we further focused our investigation on a list of candidate minor variants, which: (i) are detected in more than one clonal genotype, (ii) are present in at least samples, (iii) are nonsynonymous and (iv) fall in a region of the genome with mutational density lower than the median value (see table of the supplementary material for details on such variants). in the following, we focus on a subset of such variants falling on the spike gene of the sars-cov- genome. considerations on homoplasies falling on the spike gene. the spike protein of sars-cov- plays a critical role in the recognition of the ace receptor and in the ensuing cell membrane fusion process [ ] . we prioritized candidate homoplastic minor variants occurring on the sars-cov- spike gene (s) (see table of the supplementary material). interestingly, out of , namely: g. t>c (p. i>t) and g. g>t (p. g>c), detected in samples in total ( and samples, respectively), clustered in the so-called connector region (cr), bridging between the two heptad repeat regions (hr and hr ) of the s subunit of the spike protein. when the receptor binding domain (rbd) binds to ace receptor on the target cell, it causes a conformational change responsible for the insertion of the fusion peptide (fp) into the target cell membrane. 
this, in turn, triggers further conformational changes eventually promoting a direct interaction between hr trimer and hr , which occurs upon bending of the flexible cr, in order to form a six-helical hr -hr complex known as the fusion core region (fcr) in close proximity to the target cell plasma membrane, ultimately leading to viral fusion and cell entry [ ] . peptides derived from the hr heptad region of enveloped viruses and able to efficiently bind to the viral hr region inhibit the formation of the fcr and completely suppress viral infection [ ] . therefore, the formation of the fcr is considered to be vital to mediate virus entry in the target cells, promoting viral infectivity. of note, the cr is highly conserved across the gammacoronavirus genus, supporting the notion that this region may play a very important but as yet unclear functional role (fig g) . although structural and in vitro models will be required in order to extensively characterize the functional effect of these variants, the evidence that two out of three minor variants detected in the spike protein fall in a small domain comprising less than % of the entire spike protein length is intriguing, as it suggests a potential functional role for these mutations. it will be important to track the prevalence of these mutations, as well as of all other candidate convergent variants falling on different regions of the sars-cov- genome, to highlight possible transitions to clonality (see below).

we analyzed in depth the mutational landscape of the samples of dataset # . first, the comparison of the number of clonal (vf > %) and minor variants detected in each host (fig. a ) reveals a bimodal distribution of clonal variants (with first mode at and second mode at ), whereas minor variants display a more dispersed long-tailed distribution with median equal to and average ≈ .
from the plot, it is also clear that individuals characterized by the same clonal genotype may display a significantly different number of minor variants, with distinct distributions observed across clonal genotypes. importantly, the comparison of the distribution of the number of variants obtained by grouping the samples with respect to collection week (fig. b-c) allows us to highlight a highly statistically significant increasing trend for clonal variants (mann-kendall trend test on median number of clonal variants p < . ). this result would strongly support both the hypothesis of accumulation of clonal variants in the population and that of a concurrent increase of overall genomic diversity of sars-cov- [ , ] , whereas the relevance of this phenomenon on minor variants is unclear. we then focused on the properties of the snvs detected in the population. surprisingly, the median vf of each detected variant (fig. d ) follows a bimodal distribution, with the large majority of variants showing either a very low or a very high vf, and only a small proportion of variants showing a median vf within the range − %. this behavior is typical of systems where the prevalence of some subpopulations is driven by positive darwinian selection while others are purified [ ] . in order to analyze the two components of this distribution, we further categorized the variants as always clonal (i.e., snvs detected with vf > % in all samples), always minor (i.e., snvs detected with vf > % and ≤ % in all samples) and mixed (i.e., snvs detected as clonal in at least one sample and as minor in at least another sample). as one can see in fig. e, . %, . % and % of all snvs are respectively detected as always clonal, always minor and mixed in our dataset. moreover, %, . % and . % of always clonal, always minor and mixed variants, respectively, are nonsynonymous, whereas the large majority of remaining variants are synonymous.
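the three categories defined above can be illustrated with a minimal python sketch. it is not the code used in the study: the function name and the clonal_thr parameter are hypothetical, the numerical threshold is left to the caller, and the input is assumed to contain only the vf values of samples in which the variant passed the detection filters.

```python
from typing import Sequence


def categorize_variant(vfs: Sequence[float], clonal_thr: float) -> str:
    """Classify one variant from its VF values in the samples where it is detected.

    clonal_thr is a hypothetical parameter standing in for the clonal VF
    threshold; detected values at or below it are treated as minor.
    """
    if not vfs:
        raise ValueError("the variant must be detected in at least one sample")
    is_clonal = [vf > clonal_thr for vf in vfs]
    if all(is_clonal):
        return "always clonal"   # clonal in every sample
    if not any(is_clonal):
        return "always minor"    # minor in every sample
    return "mixed"               # clonal in some samples, minor in others
```

applying the function to every detected variant and counting the labels would give proportions of the kind reported for the three categories.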
these results would suggest that, in most cases, randomly emerging sars-cov- minor variants tend to remain at a low frequency in the population, whereas, in some circumstances, certain variants can undergo frequency increases and even become clonal, due to uncovered mixed transmission events or to selection shifts, as it was observed in [ ] for the cases of h n and h n / influenza. interestingly, variants identified as possibly convergent (see above) fall in this category and deserve further investigation (see table of the supplementary material for additional details) .

transmission bottleneck analysis. the estimation of transmission bottlenecks might be of specific interest during the current pandemic. although most available methods require data collected on donor-host couples (see, e.g., [ , ] ), here we employed a strategy akin to [ , ] that is roughly based on the analysis of the variation of the vf variance of a number of candidate neutral mutations. the intuition is that variance shrinking indicates significant transmission bottlenecks which, accordingly, would result in lower viral diversity transferred from one host to another and, possibly, in purification of certain variants in the population. as the analysis ideally requires the comparison of groups in which infection events have occurred, here we considered groups of samples with distinct clonal genotypes, separately. we then selected a number of variants as neutral markers. the rationale is that transmission phenomena such as bottlenecks are expected to significantly affect the vf variance of neutral markers (please see supplementary material for further details). in more detail, we first split the samples of each clonal genotype, for which a collection date is available, in nonoverlapping groups corresponding to two subsequent time windows, i.e., before and after the th week, . accordingly, snvs were selected as candidate neutral or quasi-neutral markers, namely variants g. t>c, g. a>g and g.
g>a. in supplementary figure , one can find the distribution of the variant frequency of the selected markers with respect to the time windows, which highlights moderate variations of the variance for all markers (see also table of the supplementary material). all in all, this result would suggest the presence of mild bottleneck effects, consistently with recent studies involving donor-host data [ ] . we retrieved the raw illumina rna-sequencing data of samples included in dataset # and applied verso to the mutational profiles of samples selected after quality check. clonal variants were employed in the analysis, according to the filters described in the methods section. remarkably, the output phylogenetic model is consistent with the one obtained for dataset # , despite minor differences (supplementary figure a) . specifically, distinct clonal genotypes are identified by verso step # , of which are identical to those found in the analysis of dataset # (in such cases the same genotype label was maintained). further clonal genotypes are evolutionary consistent and represent independent branches detected due to the nonoverlapping composition of the dataset, and are labeled with progressive letters from the closest genotype (i.e., g b, g b, g b, g c, g b), while the samples of genotype g b* might be safely assigned to genotype g b, since the absence of mutation g. c>t is likely due to low coverage. by excluding the remaining clonal genotype gh, which presents inconsistencies due to the presence of the candidate homoplastic variant g. g>t (orf ab, p. l>f, see above), all clonal genotypes display the same ordering in both datasets. this proves the robustness of the results delivered by verso step # even when dealing with data generated from distinct sequencing platforms. 
by looking at the geo-temporal localization of samples obtained via microreact [ ] (supplementary figure b) , one can see that dataset # includes samples with a significantly different geographical distribution with respect to dataset # . this dataset contains samples from countries, with the large majority collected in the usa ( . %). in more detail, the samples from this country are mostly characterized by clonal genotype g . we further notice that, also for dataset # , mutation g. a>g (s,p. d>g) becomes prevalent in the population at late collection dates. moreover, only samples belonging to previously defined type b are detected in this dataset. the analysis of the intra-host genomic diversity was also performed for dataset # via verso step # , which would suggest the existence of uncovered infection events and of several infection clusters with distinct properties, even though no contact tracing data are available in this case. overall, this supports the general applicability of the verso framework, which can produce meaningful results when applied to data produced with different sequencing platforms. however, in order to minimize the possible impact of data- and platform-specific biases, we suggest performing the verso analysis on datasets generated from different protocols separately.

we finally assessed the computational time required by verso in a variety of simulated scenarios. the results are shown in the supplementary material (supplementary figure ) and demonstrate the scalability of verso also when processing large-scale datasets.

we introduced verso, a comprehensive framework for the high-resolution characterization of viral evolution from sequencing data, which improves over currently available methods for the analysis of consensus sequences. verso exploits the distinct properties of clonal and minor variants to dissect the complex interplay of genomic evolution within hosts and transmission among hosts.
on the one hand, the probabilistic framework underlying verso step # delivers highly accurate and robust phylogenetic models from clonal variants, even in conditions of noisy observations and sampling limitations, as shown by extensive simulations and by the application to two large-scale sars-cov- datasets generated from distinct sequencing platforms. on the other hand, the characterization of intra-host genomic diversity provided by verso step # allows one to identify uncovered infection paths, which were in our case validated with contact tracing data, as well as to intercept variants involved in homoplasies. this may represent a major advancement in the analysis of viral evolution and spread and should be quickly implemented in combination with data-driven epidemiological models, to deliver a high-precision platform for pathogen detection and surveillance [ , ] . this might be particularly relevant for countries which suffered outbreaks of exceptional proportions and for which the limitations and inhomogeneity of diagnostic tests have proved insufficient to define reliable descriptive/predictive models of disease diffusion. for instance, it was hypothesized that the rapid diffusion of covid- might be due to the extremely high number of untested asymptomatic hosts [ ] . more accurate and robust phylogenetic models may improve the assessment of molecular clocks and, accordingly, the estimation of the parameters of epidemiological models such as sir and sis [ , ] , as well as help unravel cryptic transmission paths [ , , , ] . furthermore, the finer grain of the analysis on intra-host genomic similarity from sequencing data might be employed to enhance active surveillance, for instance by facilitating the identification of infection clusters and super-spreaders [ ] . finally, the characterization of variants possibly involved in positive selection processes might be used to drive the experimental research on treatments and vaccines.
verso is a novel framework for the reconstruction of viral evolution models from raw sequencing data of viral genomes. it includes a two-step procedure, which we describe in the following. the first step of verso employs a probabilistic maximum-likelihood framework for the reconstruction of robust phylogenetic trees from binarized mutational profiles of clonal variants (or, alternatively, from consensus sequences). this step relies on an evolved version of the algorithmic framework introduced in [ ] for the inference of cancer evolution models from single-cell sequencing data, and can be executed independently from step # , in case raw sequencing data are not available. in detail, the method takes as input: a n (samples) × m (variants) binary mutational profile matrix, as defined on clonal snvs only. in this case, an entry in a given sample is equal to (present) if the vf is larger than a certain threshold (in our analyses, equal to %), it is equal to if lower than a distinct threshold (in our analyses, equal to %), and is considered as missing (na) in the other cases, thus modeling possible uncertainty in sequencing data or low coverage. notice that consensus sequences can be processed by verso step # by generating a consistent binarized mutational profile matrix. here, we recall that the variant accumulation hypothesis holds only when considering clonal mutations, which are most likely transmitted from a host to another during the infection, whereas this might not be the case with variants with lower frequency, due to the high recombination rates, as well as to bottlenecks, founder effects and stochasticity (see below). we also note that given the intrinsic challenges associated with a reliable identification of low vf indels, the analysis focuses only on single nucleotide variants. further details on the variant calling pipeline employed in this study are provided in the next subsections. 
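the binarization rule described above can be sketched in a few lines of python. this is only an illustration under stated assumptions, not the verso implementation: the function name is hypothetical, present_thr and absent_thr stand in for the two thresholds mentioned in the text (whose values are not assumed here), and missing entries (na) are represented by None.

```python
from typing import List, Optional, Sequence


def binarize_vf(
    vf_rows: Sequence[Sequence[Optional[float]]],
    present_thr: float,
    absent_thr: float,
) -> List[List[Optional[int]]]:
    """Turn a samples x variants VF matrix into entries in {1, 0, None}.

    present_thr and absent_thr are hypothetical parameters: an entry is 1 if
    its VF is above present_thr, 0 if below absent_thr, and None (NA) otherwise.
    """
    if absent_thr > present_thr:
        raise ValueError("absent_thr must not exceed present_thr")

    def call(vf: Optional[float]) -> Optional[int]:
        if vf is None:            # no usable observation, e.g. low coverage
            return None
        if vf > present_thr:      # variant confidently present (clonal)
            return 1
        if vf < absent_thr:       # variant confidently absent
            return 0
        return None               # uncertain zone between the two thresholds

    return [[call(vf) for vf in row] for row in vf_rows]
```

keeping an explicit na state, rather than forcing every entry to 0 or 1, is what lets the downstream likelihood treat uncertain calls as missing data instead of as errors.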
the algorithmic framework

verso step # is a probabilistic framework which solves a boolean matrix factorization problem with perfect phylogeny constraints, i.e., under the infinite sites assumption, which subsumes a consistent process of accumulation of clonal variants in the population and does not allow for losses of mutations or convergent variants (i.e., mutations observed in distinct clades). further details on verso assumptions are provided in the supplementary material and in [ , ] . our approach accounts for uncertainty in the data by employing a maximum likelihood approach (via mcmc search) that allows for the presence of false positives, false negatives and missing data points. as shown in [ ] in a different experimental context, our algorithmic framework ensures robustness and scalability also in case of high rates of errors and missing data, due for instance to sampling limitations, and is robust to mild violations of the infinite sites assumption, e.g., to convergent variants or mutation losses (see the supplementary material for further details on the algorithmic framework, including the probabilistic graphical model depicted in supplementary figure and the summary of notation in table of the supplementary material). the inference returns a set of maximum likelihood variants trees (minimum ) as sampled during the mcmc search, representing the ordering of accumulation of clonal variants, and a set of maximum likelihood attachments of samples to variants. given the variants tree and the maximum likelihood attachments of samples to variants, verso outputs: (i) a phylogenetic model where each leaf corresponds to a sample, whereas internal nodes correspond to accumulating clonal variants, (ii) the corrected clonal genotype of each sample, i.e., the binary mutational profile on clonal variants obtained after removing false positives, false negatives and missing data.
the model naturally includes polytomies, which group samples with the same corrected clonal genotype. the length of the branches in the model represents the number of clonal substitutions (which can be normalized with respect to genome length), as in standard phylogenomic models. the verso phylogenetic model is provided as output in newick file format and can be processed and visualized in standard tools for phylogenetic analysis, such as figtree [ ] or dendroscope [ ] . furthermore, verso allows one to visualize the geo-temporal localization of clonal genotypes via microreact [ ] . violations of the perfect phylogeny constraints (i.e., of the consistent accumulation of clonal variants in the population) [ ] are possible and can be due to homoplasies, i.e., identical variants detected in samples belonging to different clades, or to rare occurrences involving mutation losses (e.g., due to recombination-related deletions or to multiple mutations hitting an already mutated genome location [ ] ), as well as to infrequent transmission phenomena, such as super-infections [ , ] . in this regard, verso allows one to identify mutations likely involved in homoplasies in a similar fashion to the plethora of works on mitochondrial evolution (see, for example [ , , ] ). in detail, given the maximum likelihood phylogenetic tree, verso can estimate the variants that are theoretically expected in each sample. by comparing the theoretical observations with the input data, verso can estimate the rate of false positives (i.e., the variants that are observed in the data but are not predicted by verso), and false negatives (i.e., variants that are not observed, but predicted). variants that show a particularly high level of estimated error rates represent candidate homoplasies and are flagged.
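the comparison between observed and expected profiles can be sketched as follows. this is a simplified illustration, not the procedure implemented in verso: the function names, the choice of normalizing each rate by the number of non-missing observations of a variant, and the cutoff parameter are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def variant_error_rates(
    observed: Sequence[Sequence[Optional[int]]],
    predicted: Sequence[Sequence[int]],
) -> List[Tuple[float, float]]:
    """Per-variant (false positive rate, false negative rate).

    observed: samples x variants matrix with entries 1, 0 or None (missing).
    predicted: same shape, the profiles expected from the inferred tree.
    Rates are normalized by the non-missing observations of each variant
    (an assumption of this sketch).
    """
    n_variants = len(observed[0]) if observed else 0
    rates = []
    for j in range(n_variants):
        fp = fn = n = 0
        for obs_row, pred_row in zip(observed, predicted):
            obs, pred = obs_row[j], pred_row[j]
            if obs is None:
                continue          # missing entries carry no evidence
            n += 1
            if obs == 1 and pred == 0:
                fp += 1           # observed but not predicted
            elif obs == 0 and pred == 1:
                fn += 1           # predicted but not observed
        rates.append((fp / n, fn / n) if n else (0.0, 0.0))
    return rates


def flag_candidates(rates: Sequence[Tuple[float, float]], cutoff: float) -> List[int]:
    """Indices of variants whose combined error rate exceeds a hypothetical cutoff."""
    return [j for j, (fp, fn) in enumerate(rates) if fp + fn > cutoff]
```

in this toy setting, a variant that keeps appearing in samples where the tree does not place it accumulates false positives and ends up in the flagged list.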
once this procedure has been completed, the list of flagged variants can include: (i) mutations falling in highly-mutated regions due to mutational hotspots, (ii) phantom mutations, i.e., systematic artifacts generated during sequencing processes [ ] , or (iii) mutations that have been positively selected in the population, e.g., due to a particular functional advantage. since one might be interested in identifying positively selected mutations, verso allows one to perform a subsequent analysis, which aims at highlighting the mutation-prone regions of the genome, which might be due to mutational hotspots or phantom mutations (see the supplementary material for further details). we finally note that the detection of homoplasies for minor variants requires a different algorithmic procedure, which is detailed in the following.

in the second step, verso takes into account the variant frequency profiles of groups of samples with the same clonal genotype (identified via verso step # ), in order to characterize their intra-host genomic diversity and visualize it on a low-dimensional space. this allows one to highlight patterns of co-occurrence of minor variants, possibly underlying uncovered infection events, as well as homoplasies involving, e.g., positively selected variants. notice that this step requires raw sequencing data and the prior execution of step # . verso step # takes as input an n (samples) × m (variants) variant frequency (vf) profile matrix, in which each entry includes the vf ∈ ( , ) of a given mutation in a certain sample, after filtering out: (i) the clonal variants employed in step # and (ii) the minor variants possibly involved in homoplasies (see below). the variant calling pipeline employed in this work is detailed in the next subsections. while it is sound to binarize clonal variant profiles to reconstruct a phylogenetic tree, it is appropriate to consider the variant frequency profiles when analyzing intra-host variants, for several reasons.
first, variant frequency profiles describe the intra-host genomic diversity of any given host, and this information would be lost during binarization. second, minor variant profiles might be noisy, due to the relatively low abundance and to the technical limitations of sequencing experiments. accordingly, such data may possibly include artifacts, which can be partially mitigated during the quality-check phase and by including in the analysis only highly-confident variants. however, binarization with arbitrary thresholds might increase the false positive rate, compromising the accuracy of any downstream analysis. third, as specified above, the extent of transmission of minor variants among individuals is still partially obscure. the vf of minor variants is, in fact, highly affected by recombination processes, as well as by complex transmission phenomena, involving stochastic fluctuations, bottlenecks and founder effects, which may lead certain variants to change their vf, not be transmitted or even become clonal in the infected host [ ] . the latter issue also suggests that the hypothesis of accumulation of minor variants during infections may not hold and should be relaxed.

for these reasons, verso step # defines a pairwise genomic distance, computed on the variant frequency profiles, to be used in downstream analyses. the intuition is that samples displaying similar patterns of co-occurrence of minor variants might have a similar quasispecies architecture, thus being at a small evolutionary distance. accordingly, this might indicate a direct or indirect infection event. in particular, in this work we employed the bray-curtis dissimilarity, which is defined as follows: given the ordered vf vectors of two samples, i.e., $v_i = \{vf_i^1, \ldots, vf_i^r\}$ and $v_j = \{vf_j^1, \ldots, vf_j^r\}$, the pairwise bray-curtis dissimilarity d(i, j) is given by:

$$d(i, j) = \frac{\sum_{k=1}^{r} \left| vf_i^k - vf_j^k \right|}{\sum_{k=1}^{r} \left( vf_i^k + vf_j^k \right)}$$

since this measure weights the pairwise vf dissimilarity on each variant with respect to the sum of the vf of all variants detected in both samples, it can be effectively used to compare the intra-host genomic diversity of samples, as proposed for instance in [ ] . however, verso allows one to employ different distance metrics on vf profiles, such as correlation or euclidean distance. as a design choice, in verso the genomic distance is computed among all samples associated with any given clonal genotype, as inferred in step # . the rationale is that, in a statistical inference framework modeling a complex interplay involving heterogeneous dynamical processes, it is crucial to stratify samples into homogeneous groups, to reduce the impact of possible confounding effects [ ] . furthermore, as specified above, due to the distinct properties of clonal and minor variants during transmission, it is reasonable to assume that the event in which certain minor variants and no clonal variants are transmitted from one host to another during the infection is extremely unlikely. accordingly, the clonal variants employed for the reconstruction of the phylogenetic tree in step # are excluded from the computation of the intra-host distance among samples.

in order to produce useful knowledge from the genomic distance discussed above and since, in real-world scenarios, this is a typically complex high-dimensional problem, it is sound to employ state-of-the-art strategies for dimensionality reduction and (sample) clustering, as typically done in single-cell analyses [ ] . in this regard, the workflow employed in verso ensures high scalability with large datasets, while also taking advantage of effective analysis and visualization features.
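as a concrete illustration of the bray-curtis dissimilarity on vf profiles introduced above, the following minimal python sketch computes it for a pair of samples and for all pairs in a group. it uses the standard definition of the measure (sum of absolute differences over sum of values); the function names are hypothetical, missing values are assumed to have been imputed beforehand, and returning 0 for two all-zero profiles is a convention chosen for this example.

```python
from typing import List, Sequence


def bray_curtis(v_i: Sequence[float], v_j: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two equally ordered VF vectors."""
    if len(v_i) != len(v_j):
        raise ValueError("VF vectors must cover the same ordered variants")
    numerator = sum(abs(a - b) for a, b in zip(v_i, v_j))
    denominator = sum(a + b for a, b in zip(v_i, v_j))
    # Two profiles with no minor variants at all are treated as identical
    # (a convention of this sketch).
    return numerator / denominator if denominator > 0 else 0.0


def pairwise_bray_curtis(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric distance matrix among the samples of one clonal genotype."""
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = bray_curtis(profiles[i], profiles[j])
    return dist
```

for example, two samples sharing no minor variant are at distance 1, while two samples with identical vf profiles are at distance 0.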
in detail, the workflow includes three steps: (i) the computation of the k-nearest neighbour graph (k-nng), which can be executed on the original variant frequency matrix, or after applying principal component analysis (pca), to possibly reduce the effect of noisy observations (when the number of samples and variants is sufficiently high); (ii) the clustering of samples via either louvain or leiden algorithms for community detection [ ] ; (iii) the projection of samples on a low-dimensional space via standard tsne [ ] or umap [ ] plots. as output, verso step # delivers both the partitioning of samples in homogeneous clusters and the visualization in a low-dimensional space, also allowing samples to be labeled according to other covariates, such as collection date or geographical location. in the map in fig. , for instance, the intra-host genomic diversity of each sample and the genomic distance among samples are projected on the first two umap components, whereas samples that are connected by k-nng edges display similar patterns of co-occurrence of variants. accordingly, the map shows clusters of samples likely affected by infection events, in which (a fraction of) quasispecies might have been transmitted from one host to another. this represents a major novelty introduced by verso and also allows one to effectively visualize the space of variant frequency profiles. to facilitate usage, verso step # is provided as a python script which employs the scanpy suite of tools [ ] , which is typically used in single-cell analyses and includes a number of highly-effective analysis and visualization features.

additional feature: homoplasy detection on minor variants

also in the case of minor variants, it is important to pinpoint possible homoplasies, which might be due to mutational hotspots, phantom mutations and convergent variants.
given the phylogenetic model retrieved via step # , verso allows one to flag the variants that are detected in a number of clonal genotypes exceeding a user-defined threshold. in our case, the threshold is equal to , meaning that all minor variants found in more than one clonal genotype are flagged. such variants are then excluded from the computation of the intra-host genomic distance, prior to the execution of step # . furthermore, the list of flagged variants can be investigated as proposed for step # (see above), in order to possibly identify mutations involved in positive selection scenarios.

dataset # (illumina amplicon sequencing)

we analyzed samples from distinct individuals obtained from ncbi bioprojects, which, at the time of writing, are all the publicly available datasets including raw illumina amplicon sequencing data. in detail, we selected the following projects:

contact tracing data were obtained from the study presented in [ ] . in detail, for samples included in dataset # (ncbi bioproject prjna ), information on households, work institutions and epidemiological linkages is provided. thus, it is possible to identify different contact groups based on institutions regularly frequented by patients and household couples. contact information was employed to assess the relation between the intra-host genomic similarity and the contact dynamics. the results are provided in the main text. [ ] .

we remark that one should be extremely careful when considering low-frequency variants, which might possibly result from sequencing artifacts, even in case of high-coverage experiments. in this regard, we note that many approaches can be employed to reduce false variants.
for instance, the broad institute recently updated an effective variant calling pipeline for viral genome data [ ] , while new methods for error correction of viral sequencing have been proposed at this widely used website: https://virological.org, which also includes a number of useful up-to-date guidelines and best practices for viral evolution analyses. in our case, we here employed the following significance filters on variants. in particular, we kept only the mutations: ( ) showing a varscan significance p-value < . (fisher's exact test on the read counts supporting reference and variant alleles) and more than reads of support in at least % of the samples, ( ) displaying a variant frequency vf > %. as a result, we selected a list of (on overall snvs) highly-confident snvs for dataset # and (on ) for dataset # . high-quality variants were then mapped on sars-cov- coding sequences (cdss) via a custom r script, also by highlighting synonymous/nonsynonymous states and amino acid substitutions for the related open reading frame (orf) product. in particular, we translated reference and mutated cdss with the seqinr r package to obtain the relative amino acid sequences, which we compared to assess the effect of each nucleotide variation in terms of amino acid substitution. we finally note that availability of the ct values generated by q-pcr and the related quantification of the amount of viral transcripts would be very useful to characterize samples with high viral load, yet this information is not available for the considered datasets. in order to select high-quality samples, we selected only those exhibiting high coverage and in particular those with at least reads in more than % of the sars-cov- -anc genome. in addition, we filtered out all samples exhibiting more than minor variants (vf ≤ %). we finally excluded samples srr and srr from dataset # , as the first sample displays zero snvs and the second one reports an unfeasible collection date (i.e. th jan. ). 
after the quality-check filters, samples of dataset # are left for downstream analyses, in which distinct high-quality single-nucleotide variants are observed, and samples are left for dataset # , with high-quality snvs. the phylogenomic analysis via verso step # was performed on datasets # and # by considering only clonal variants (vf > %) detected in at least % of the samples. a grid search comprising different error rates was employed (see table of the supplementary material). samples with the same corrected clonal genotype were grouped in polytomies in the final phylogenetic models. the analysis of the intra-host genomic diversity via verso step # was performed by considering the vf profiles of all samples, excluding: (i) the clonal variants employed in the phylogenomic reconstruction via verso step # , and (ii) the minor variants involved in homoplasies, i.e., observed in more than one clonal genotype returned by verso step # . missing values (na) were imputed to for downstream analysis. a number of pcs equal to was employed in the pca step, prior to the computation of the k-nearest neighbour graph (k = ) on the bray-curtis dissimilarity of vf profiles. the leiden algorithm was applied with resolution = (see table of the supplementary material for the parameter settings of verso employed in the case studies). in order to compare the performance of verso step # with competing phylogenomic tools, i.e., iq-tree [ ] and beast [ ] , we performed extensive simulations via msprime [ ] , which simulates a backwards-in-time coalescent model. in particular, we simulated distinct evolutionary processes, with the following parameters: n = total samples, effective population size n e = . (i.e., haploid population), mutational rate m = × − mutations per site per generation and a genome of length l = bases. 
such parameters were chosen to roughly approximate the mutational rate currently estimated for sars-cov- (i.e., m ≈ − mutations per site per year and ≈ − generation year [ ] ) and to obtain a number of clonal mutations (in the range − ) that is comparable to the one observed in the real-world scenarios (see the case studies). as output, msprime returns a phylogenetic tree representing the genealogy between the samples, the genotype of all samples (i.e., the leaves of the tree) and the location of all mutations. the genotypes of the samples were then perturbed with different levels of noise, with false positive rate α and false negative rate β (see the parameter settings in table of the supplementary material), in order to assess the performance of the methods in conditions of noisy observations and possible sequencing issues. finally, we subsampled all datasets to obtain two distinct sample sizes ( and samples), in order to test the robustness of the methods under sampling limitations. the parameters of the phylogenetic methods employed in the comparative assessment are reported in the supplementary material (table of the supplementary material). verso is freely available at this link: https://github.com/bimib-disco/verso. verso step # is provided as an open source standalone r tool, whereas step # is provided as a python script. the source code to replicate all the analyses presented in the manuscript, both on simulated and real-world datasets, is available at this link: https://github.com/bimib-disco/verso-utilities. scanpy [ ] is available at this link: https://scanpy.readthedocs.io/en/stable/. the web-based tool for the geo-temporal visualization of samples, microreact [ ] , is available at this link: https://microreact.org/showcase. the tool employed to plot the phylogenomic model returned by verso step # (in newick file format) is figtree [ ] and is available at this link: http://tree.bio.ed.ac.uk/software/figtree/. 
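The noise step applied to the simulated genotypes, with false positive rate α and false negative rate β, can be illustrated with a short sketch. It assumes binary genotypes (0 = mutation absent, 1 = mutation present) and independent errors per entry; the function name `add_noise` is hypothetical and the code is not taken from the verso-utilities repository.

```python
import random
from typing import List, Optional


def add_noise(
    genotypes: List[List[int]],
    alpha: float,
    beta: float,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Return a noisy copy of a binary genotype matrix (samples x mutations).

    Each 0 becomes 1 with probability alpha (false positive) and each 1
    becomes 0 with probability beta (false negative), independently.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    noisy = []
    for row in genotypes:
        new_row = []
        for g in row:
            if g == 0:
                new_row.append(1 if rng.random() < alpha else 0)
            else:
                new_row.append(0 if rng.random() < beta else 1)
        noisy.append(new_row)
    return noisy
```

With `alpha = beta = 0` the matrix is returned unchanged, which gives a noise-free baseline against which the perturbed datasets can be compared.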
supervised the computational analysis. a.g. and r.p. drafted the manuscript, which all authors discussed, reviewed and approved. a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china the proximal origin of sars-cov- isolation of sars-cov- -related coronavirus from malayan pangolins genomic surveillance reveals multiple introductions of sars-cov- into northern california we shouldn't worry when a virus mutates during disease outbreaks unifying the epidemiological and evolutionary dynamics of pathogens quantifying influenza virus diversity and transmission in humans global initiative on sharing all influenza data-from vision to reality iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies viral phylodynamics establishment and cryptic transmission of zika virus in brazil and the americas the emergence of sars-cov- in europe and north america median-joining networks for inferring intraspecific phylogenies evolutionary inferences from phylogenies: a review of methods mrbayes . : efficient bayesian phylogenetic inference and model choice across a large model space raxml version : a tool for phylogenetic analysis and post-analysis of large phylogenies genomic infectious disease epidemiology in partially sampled and ongoing outbreaks quentin: reconstruction of disease transmissions from viral quasispecies genomic data bayesian reconstruction of transmission within outbreaks using genomic variants hiv-trace (transmission cluster engine): a tool for large scale molecular epidemiology of hiv- and other rapidly evolving pathogens beast . 
: an advanced software platform for bayesian evolutionary analysis early phylogenetic estimate of the effective reproduction number of sars-cov- phylogenetic network analysis of sars-cov- genomes analysis of the hosts and transmission paths of sars-cov- in the covid- outbreak a metric on the space of reduced phylogenetic networks bitphylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies phylogenetic interpretation during outbreaks requires caution regaining perspective on sars-cov- molecular tracing and its implications the quasispecies (extremely heterogeneous) nature of viral rna genome populations: biological relevance-a review mutational and fitness landscapes of an rna virus revealed through population sequencing rapid viral quasispecies evolution: implications for vaccine and drug strategies why do rna viruses recombine? mutational signatures and heterogeneous host response revealed via large-scale characterization of sars-cov- genomic diversity genomic diversity of sars-cov- in coronavirus disease patients virological assessment of hospitalized patients with covid- molecular characterization of sars-cov- from the first case of covid- in italy intra-host site-specific polymorphisms of sars-cov- is consistent across multiple samples and methodologies. 
medrxiv genomic epidemiology of sars-cov- in guangdong province, china shared sars-cov- diversity suggests localised transmission of minority variants tracking the covid- pandemic in australia using genomics mutational dynamics and transmission properties of sars-cov- superspreading events in austria clonal interference and the evolution of rna viruses sars-associated coronavirus quasispecies in individual patients beyond the consensus: dissecting within-host viral population diversity of foot-and-mouth disease virus by using next-generation genome sequencing analysis of intrapatient heterogeneity uncovers the microevolution of middle east respiratory syndrome coronavirus intra-host dynamics of ebola virus during capri: efficient inference of cancer progression models from cross-sectional data cancer evolution: mathematical models and computational inference algorithmic methods to infer the evolutionary trajectories in cancer progression the evolution of tumour phylogenetics: principles and practice exceptional convergent evolution in a virus the fingerprint of phantom mutations in mitochondrial dna data circulating virus load determines the size of bottlenecks in viral populations progressing within a host reconstructing foot-and-mouth disease outbreaks: a methods comparison of transmission network models scanpy: large-scale single-cell gene expression data analysis qure: software for viral quasispecies reconstruction from next-generation sequencing data full-length haplotype reconstruction to infer the structure of heterogeneous virus populations viral quasispecies assembly via maximal clique enumeration qsdpr: viral quasispecies reconstruction via correlation clustering epidemiological data analysis of viral quasispecies in the next-generation sequencing era co-infection and super-infection models in evolutionary epidemiology incidence of co-infections and superinfections in hospitalized patients with covid- : a retrospective cohort study uniform manifold 
approximation and projection for dimension reduction visualizing data using t-sne revealing covid- transmission in australia by sars-cov- genome sequencing and agent-based modeling efficient coalescent simulation and genealogical analysis for large sample sizes coalescent theory: an introduction nextstrain: real-time tracking of pathogen evolution a simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates distributions of tree comparison metrics-some new results the first novel coronavirus case in nepal evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic emergence of sars-cov- through recombination and strong purifying selection microreact: visualizing and sharing data for genomic epidemiology and phylogeography on the origin and continuing evolution of sars-cov- viral and host factors related to the clinical outcome of covid- evaluating the effects of sars-cov- spike mutation d g on transmissibility and pathogenicity tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus structural and functional analysis of the d g sars-cov- spike protein variant making sense of mutation: what d g means for the covid- pandemic remains unclear emergence of genomic diversity and recurrent mutations in sars-cov- . 
infection correcting for purifying selection: an improved human mitochondrial molecular clock functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses peptide-based membrane fusion inhibitors targeting hcov- e spike protein hr and hr domains a pan-coronavirus fusion inhibitor targeting the hr domain of human coronavirus spike transmission dynamics and evolutionary history of -ncov positive and negative selection on the human genome transmission bottleneck size estimation from pathogen deep-sequencing data, with an application to human influenza a virus inferring transmission bottleneck size from viral sequence data using a novel haplotype reconstruction method large bottleneck size in cauliflower mosaic virus populations during host plant colonization genetic drift, purifying selection and vector genotype shape dengue virus intra-host genetic diversity in mosquitoes towards a genomics-informed, real-time, global pathogen surveillance system substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov ) phylodynamics of infectious disease epidemics cryptic transmission of sars-cov- in washington state mapping genome variation of sars-cov- worldwide highlights the impact of covid- super-spreaders longitudinal cancer evolution from single cells the number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations efficient algorithms for inferring evolutionary trees dendroscope : an interactive tool for rooted phylogenetic trees and networks phantom mutation hotspots in human mitochondrial dna haplogrep : mitochondrial haplogroup classification in the era of high-throughput sequencing genome-wide mapping of gene-microbiota interactions in susceptibility to autoimmune skin blistering current best practices in single-cell rna-seq analysis: a tutorial from louvain to leiden: guaranteeing well-connected communities varscan : somatic mutation 
and copy number alteration discovery in cancer by exome sequencing broadinstitute/viral-ngs science forum: sars-cov- (covid- ) by the numbers this work was partially supported by the elixir italian chapter and the sysbionet project, a ministero dell'istruzione, dell'università e della ricerca initiative for the italian roadmap of european strategy forum on research infrastructures and by the airc-ig grant . partial support was also provided by the cruk/airc accelerator award # , "single-cell cancer evolution in the clinic". we thank giulio caravagna and chiara damiani for helpful discussions. we also thank david posada for interesting suggestions on the preliminary version of the manuscript. key: cord- -mvz l yj authors: liu, tiantian; chen, zhong; chen, wanqiu; chen, xin; hosseini, maryam; yang, zhaowei; li, jing; ho, diana; turay, david; gheorghe, ciprian; jones, wendell; wang, charles title: a benchmarking study of sars-cov- whole-genome sequencing protocols using covid- patient samples date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mvz l yj the covid- pandemic is a once-in-a-lifetime event, exceeding mortality rates of the flu pandemics from the ’s and ’s. whole-genome sequencing (wgs) of sars-cov- plays a critical role in understanding the disease. performance variation exists across sars-cov- viral wgs technologies, but there is currently no benchmarking study comparing different wgs sequencing protocols. we compared seven different sars-cov- wgs library protocols using rna from patient nasopharyngeal swab samples under two storage conditions. we constructed multiple wgs libraries encompassing three different viral inputs: , , , , and , copies. libraries were sequenced using two distinct platforms with varying sequencing depths and read lengths. we found large differences in mappability and genome coverage, and variations in sensitivity, reproducibility and precision of single-nucleotide variant calling across different protocols. 
we ranked the performance of protocols based on six different metrics. our results indicated that the most appropriate protocol depended on viral input amount and sequencing depth. our findings offer guidance in choosing appropriate wgs protocols to characterize sars-cov- and its evolution. the severe acute respiratory syndrome coronavirus- (sars-cov- ), a novel coronavirus that initially emerged in december in wuhan, china , , causing the coronavirus disease of (covid- ) , has led to a pandemic with > million confirmed cases and more than , , deaths worldwide as of november , (world health organization website https://www.who.int/emergencies/diseases/novel-coronavirus- ). sars-cov- can be rapidly transmitted from person to person, even during the asymptomatic stage , which challenges healthcare systems and the public health response. whole-genome sequencing (wgs) of sars-cov- has been used as a powerful tool to study covid- since the first sequence was released on jan , . analysis of the sars-cov- genome allows for understanding the clinical outcome , developing diagnostics and vaccines for covid- , and tracking the evolution and spread of the virus by phylogenetic analysis , which can reveal the dynamics of subtype evolution. to uncover the complete or near-complete sequence of sars-cov- , leading laboratories have used several sequencing protocols, including shotgun metagenomic approaches , , target-capture sequencing using twist custom target enrichment , and target whole-genome amplification sequencing with a multiplex artic primer set , . however, large variations in performance, e.g., genome coverage and single-nucleotide variant (snv) detection, occur across different protocols. there is no benchmark study that has compared different protocols using the same patient samples, particularly evaluating factors such as variation in viral input, sequencing platform and depth, sample quality and storage condition. 
notably, sars-cov- wgs requires viral rna isolation from the clinical samples for sequencing library construction, and there can be orders-of-magnitude differences in viral load across different subjects. a large proportion of clinical samples contain extremely low viral copy number, which may impact the quality of wgs and the confidence in snv or indel calls. we report a benchmarking study on sars-cov- wgs using clinical nasopharyngeal swab samples. we compared seven different library construction protocols and specifically evaluated the cross-protocol performance in sequencing read mappability, viral genome coverage percentage and uniformity, effect of sequence depth, snv calling concordance (reproducibility), precision (positive predictive value), and sensitivity (proportion of consensus variants identified at different sequencing depths and viral copy number inputs). we constructed libraries using rna samples isolated from either freshly prepared nasopharyngeal swabs in avl buffer or frozen nasopharyngeal swabs preserved in avl buffer to evaluate the impact of storage conditions on the protocol performance. our findings offer guidance in choosing the most suitable sars-cov- wgs protocols to better associate sars-cov- variation with its epidemiological and clinical characteristics. mapping rates to the viral genome between fresh and frozen samples (p< . ; frozen samples generally had better viral mapping) after accounting for protocol, viral load, and input amount. however, we also observed that the lower mapping rate could be compensated by deeper sequencing (i.e., ~ x deeper for fresh samples vs. frozen) for certain protocols (fig. a). the significant differences in viral mapping were more easily seen in p , p , p , and p , and generally corresponded to higher off-target bacterial sequence observed in fresh versus frozen samples (fig. a, c). 
one exception was the p for which fresh samples also resulted in higher (off-target) human mapping rates (fig. b) . overall, the above results suggested that frozen samples performed better than or equivalent to fresh samples in their mappability and on-target percentage to the sars-cov- genome for all protocols with p having very high mappability with either sample preservation storage methods (fig. a-c) . read mappability to the sars-cov- viral genome versus human and bacterial genomes were clearly different across protocols regardless of rna prepared from fresh or frozen samples. the artic amplicon-based target genome amplification technology p had the highest read mapping percentage ( . % ± %) to the sars-cov- viral genome with the lowest reads mapping percentage to human genome ( . % ± . %) among the seven protocols (fig. a) . however, p , an earlier version of the artic amplicon-based protocol, had the second highest read mapping percentage ( % ± . %) to the sars-cov- viral genome with a substantially higher mapping rate ( . ± . %) to the human genome as compared to p (fig. a) . all the other metagenomic approach-based protocols had starkly fewer reads mapped to the reference viral genome (< . %). for the qiaseq fx single-cell rna-seq library kit incorporated with human ribosomal rna depletion (p ), the viral mapping rates were . % ± . %, and the mapping proportions to human genome were . % ± . %. when the same library kit was incorporated with both human and bacterial ribosomal rna depletions (p ), the viral mapping rates dropped slightly to . % ± . %, and the mapping rates on the human genome also dropped to . % ± . %. both p and p had the lowest percentage of reads mapped to the sars-cov- viral genome ( . % ± . %), and the highest mapping rates to human genome ( . % ± % and . % ± . %, respectively) (fig. a) . we also noticed that p had the highest rate of reads mapped to bacterial genome overall ( . % ± . %) among the seven protocols (fig. c) . 
to evaluate the effect of viral load variability on sars-cov- wgs performance, we compared the sars-cov- , human, and bacterial reference mapping rates between low ( k copies) and high viral inputs ( k and m copies) across seven protocols (fig. d-f). in our study design, different levels of sars-cov- viral inputs were generated using either original undiluted patient np sample-derived rna or a dilution from the identical higher viral load samples across all protocols (suppl. table ). we found that high viral inputs ( k and m copies) had a higher mapping rate to the sars-cov- genome compared to low viral inputs across all protocols as expected, although the difference was smallest for p , p and p [high vs. low, diff= . , p = . , generalized linear mixed model (glmm, statistical testing table not shown), fig. d]. p and p performed well, with p having better mapping rates than p at either input level (high viral input vs. low viral input mapping rates: p , . % ± . % vs. . ± . %) and significantly fewer sequencing reads mapped to either human or bacterial genomes (fig. d-f), while p and p performed very poorly for both low and high viral inputs, with extremely low sars-cov- viral mapping rates compared to other protocols (fig. d). for p , low viral input resulted in orders of magnitude lower sars-cov- viral mapping rates compared to high viral inputs, e.g., . % ± . % vs. . ± . %, suggesting p would require a higher viral input to obtain adequate sars-cov- viral genome coverage when sequencing depth is limited (fig. d). in general, the mapping rates to sars-cov- differed significantly by protocol (p< . e- , glmm), and by sample storage condition (p= . e- , glmm). according to the differences in their least square means, p had a statistically significantly higher sars-cov- mapping rate compared to the other six protocols (p vs. p , diff = . , p = . e- , glmm) followed by p (p vs. p , diff = . , p < . e- , glmm, statistical testing table not shown). 
the advantage in mapping rate of p and p is directly attributable to their viral amplicon-based design. protocols p -p were less easily differentiated by mapping rate (fig. d ). to determine the impact of read depth to the sars-cov- genome coverage, we down-sampled all of the library-run datasets to , ( k), , ( k), , ( k), , , ( m), , , ( m), , , ( m), , , ( m), and , , ( m) paired-end (pe) reads, and evaluated the sars-cov- viral genome coverage at different sequence depths across seven protocols (fig. ) . we noticed one sample (np ) performed extremely differently compared to other samples across protocols (green line in suppl. fig. ). therefore, we excluded this sample from the analysis for fig. . both amplicon-based protocols p and p achieved significantly higher coverage for the sars-cov- at a threshold of > x reads at each base (termed min x), even with comparatively lower overall read depths compared to other protocols. particularly, p consistently had higher percentages of sars-cov- genome coverage (min x) regardless of viral input amount ( fig. a-b) . at low viral input but with only m sequencing reads, p had % ± . % of the sars-cov- viral genome coverage (min x). by comparison, p , the other amplicon-based target genome amplification protocol, could only achieve % ± . % of the viral genome coverage (min x) at a sequencing depth even with m reads (fig. a) . interestingly, at low viral input, p and p achieved higher percentages of genome coverage (min x) than p at m reads (fig. a) . we found that p and p achieved almost complete coverage of sars-cov- genome (min x, . ± . % and . ± . %, respectively) with m reads at low viral input (fig. a) . p -p performed poorly with low viral input even at m read depth. 
however, the viral genome coverage (min x) at the low input was not always consistent with what was observed at the high input for each protocol, except for p , which achieved nearly % genome coverage (min x) at the low input with m reads and reached nearly % genome coverage (min x) at high input with m reads (fig. a-b). furthermore, when the samples contained high viral input, both p and p could achieve high viral genome coverage (min x) at lower read depths as compared to all other protocols (fig. b). for example, when the sequencing depth was at only k pe, p had . % ± . % of genome coverage whereas p reached . % ± . % of the sars-cov- genome coverage (min x, fig. b). in contrast, at high viral input, three rna-seq based metagenomics protocols p , p and p had similar genome coverage levels across different sequencing depths (e.g., min x at m read depth: p , . ± . %; p , . ± . %; and p , . ± . %, respectively, fig. b). nevertheless, for p and p , the observed genome coverage (whether > x or > x) was consistently low even at > m reads, regardless of high or low viral input (fig. a-b), suggesting that p and p would not be suitable in practice to achieve the coverage necessary for snv detection. notably, when all the samples (including np ) were included for genome coverage analysis, we observed a similar relative pattern of performance by protocol between viral input levels as demonstrated in fig. but with greater variation (suppl. fig. a-b). to evaluate the coverage quality and regional bias relative to the viral reference genome between protocols, we computed the average coverage by genomic position across all samples across all protocols at a depth of m total reads (fig. a-b). as expected, due to their amplicon-based nature, p and p had the highest average coverage compared to all the other protocols (fig. a-b). 
a great majority of the regions across the sars-cov- genome had ~ , x coverage while some regions were near or exceeded , x coverage for p and p at high viral input (suppl. fig. ). the variations or "spikes" of the coverage in many regions across the viral genome were much more pronounced at the low viral input as compared to the high viral input derived from the same samples (suppl. fig. ). for p and p , the average coverage across the entire viral genome usually ranged from - x (fig. a), whereas for p and p , only certain regions were sequenced, and many regions showed no coverage in the samples with either low or high viral input (fig. a-b). furthermore, we observed that protocols p and p had much higher coverage at the ' end of the viral genome, which was likely introduced by oligo(dt) primers during cdna synthesis. other than the ' end of the viral genome, p and p had comparably even coverage across the viral genome regardless of viral input (fig. a-b). however, the viral genome coverage was significantly different between the low and high viral input in p , which had little or no coverage across the whole genome at low viral input (fig. a-b). at high viral input, p had excellent coverage uniformity across the whole genome (fig. b). finally, for completeness, in addition to greatly reduced viral genome coverage, p and p also lacked coverage uniformity regardless of viral input (fig. a-b). we further compared the coverage uniformity across seven protocols using a quantitative metric, the coefficient of variation (cv), computed from the average coverage depth at each reference genome position (fig. c-d). amplicon-based p had the best uniformity of genome coverage at both low and high viral inputs and was the only protocol with a coverage cv at or less than (i.e., %, fig. c-d), whereas protocols p -p had ~ - times higher cv compared to p at either viral input (fig. c-d). 
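As a reading aid for the uniformity metric, the coefficient of variation of a coverage track is the standard deviation of the per-position depth divided by its mean, so lower values mean more even coverage. The sketch below assumes the population standard deviation; whether the study used the population or the sample form is not stated, and the function name `coverage_cv` is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def coverage_cv(depths: Sequence[float]) -> float:
    """Coefficient of variation (std / mean) of per-position coverage depth.

    Assumption: the population standard deviation is used. A genome with
    no coverage at all has an undefined CV, reported here as infinity.
    """
    avg = mean(depths)
    if avg == 0:
        return float("inf")
    return pstdev(depths) / avg
```

A perfectly flat track such as `[10, 10, 10]` gives a CV of 0, while a track that alternates between no coverage and twice the mean gives a CV of 1 (i.e., 100%).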
however, for p at high viral input, consistent with what was demonstrated in the coverage tracks ( fig. a-b) , the cv was very small (~ , most similar to the p , fig. d) , indicating excellent uniformity in genome coverage. conversely, at low viral input, p had much larger cv values (range of [ , ] , fig. c ). we also examined the relative impact of viral input (high vs. low, pair-wise, derived from the same clinical sample) on coverage uniformity using our cv metric (suppl. fig. ). we found that the uniformity and overall coverage of the sars-cov- genome as evaluated by cv improved remarkably for each of six protocols (p was not evaluated due to distinct clinical samples used for low vs. high input), when higher viral inputs (e.g., k or m vs. k copies) were used ( fig. a-b) . particularly, the coverage uniformity for p was the least affected by viral input changes. coverage uniformity of p and p were also less impacted by low versus high viral input levels than other protocols (suppl. fig. b ). to gain a deeper understanding on the variations in read depth in certain regions and the differences of these variations between the two related amplicon-based protocols (i.e., p and p ), we compared the genome coverage profiles of three samples with high viral input ( m copies) for which the wgs libraries were constructed using p and sequenced at m read depth (suppl. fig. ) . we noticed that all three samples shared similar coverage patterns across the whole sars-cov- genome, suggesting that those local high spikes were primerset dependent. we examined the primer sets corresponding to those highly variable regions, and found that four mostly over-represented spiking regions in the three samples using p were associated with about primer sets whose amplified genome regions were covered by multiple amplicons (suppl. fig. ) . 
although the artic v primer set was designed to cover each genome position with two amplicons (with the exception of the regions covered by amplicons and ), high coverage regions were associated with three or more amplicons. consistent with this fact, the coverages for those regions were roughly equal to the sum of the coverages from three to four individual amplicons, supposing each amplicon had a similar amplification efficiency. furthermore, at least one extra alternative primer pair was linked to each high coverage region, hinting at redundancy among some artic v primers. in contrast, our study showed that p had a more uniform coverage across all regions compared to p ( fig. a- fig. ). finally, it was unsurprising to observe low coverage at both the ' and ' regions for p and p since only one primer pair covered each of these regions. genome variants in clinical samples were called from bam files after removing duplicates and primer sequences from amplicon reads. in order to evaluate the accuracy of variant calling, we called variants from clinical samples prepared from p , p , p , p , and p using varscan (v . . ) and ivar (v . . ) against the reference sars-cov- genome nc_ . . the putative snvs were defined as variants with x minimum coverage and > % allele frequency, the default setting for varscan. our wgs data showed that five out of the eight clinical samples were able to produce snv calls, with the other three samples failing due to insufficient coverage. we analyzed the results from the independent preparation and library methods and found that certain sars-cov- viral snvs recurred across protocols and across patients. in particular, we identified ten snvs that were found by at least three distinct protocols in at least one patient, all at high allele frequency. seven of these ten were identified in more than one patient. 
we termed these ten snvs as the consensus snvs for this study, and we established that a consensus snv for a patient was a consensus snv from the overall study that was identified by at least two distinct protocols. these ten consensus snvs, for which our measurements of sensitivity were derived, are provided in suppl. table . notably, all consensus snvs in patient samples had high (> %) average allele frequency, which was to be expected for a haploid-type genome. also, of importance, all snvs in the study that had observed allele frequency above % but were not identified as consensus snvs were: a) never replicated across protocols for the same sample; b) were never identified in more than one sample; and c) were all observed only when using low viral copy input (data not shown). therefore, higher viral copy input was strongly associated with better cross-protocol snv reproducibility as discussed in more detail in the next section. sensitivity results at high viral input for consensus snvs are shown for three representative clinical samples (np , np , np ) in fig. a , and b. all putative snv data from high viral input at different read depths are provided in at high ( m) viral copy input level, m x paired-end reads were sufficient to detect nearly all consensus snvs in the three representative clinical samples using any of p -p except p for which only m reads were needed (fig. b) . although all five protocols were able to detect the majority of consensus snvs from as low as . m reads, p and p exhibited better performance than p , p , and p at low read depth due to a high percentage of on-target reads. we found that only . m reads were sufficient for p to properly detect roughly % of the consensus snvs (fig. b) . surprisingly, p required m reads to detect more than % of the consensus snvs, which was more than expected and contrasted greatly with p ( fig. b) . 
at m reads, p identified all consensus snvs in all samples while p and p had sensitivity levels that exceeded p (fig. a-b). however, at lower viral copy input, we observed that several consensus snvs were not detected, i.e., they were either observed at low allele frequency or completely undetected across protocols, which indicated a reduced sensitivity in snv detection with low viral input (suppl. fig. & ). particularly, at low viral copy input, we noticed that p did not make any snv call at all due to its inadequate genome coverage. however, p achieved excellent sensitivity (almost %) at m pe reads while p was able to achieve ~ % sensitivity on average at . m pe reads even with low viral copy input (suppl. fig. & ). phylogenetic network analysis of complete sars-cov- genomes has been conducted to track the transmission of covid- . based on the snvs identified in our tested samples, we performed phylogenetic analyses to explore the relatedness of the genotypes in our samples with the viral strains spread in the country and the world. we identified four viral genome subtypes containing six to nine snvs within each genome subtype (suppl. fig. ). all four viral subtypes appeared to share six of the ten consensus snvs, indicating that these genotypes were phylogenetically closely related (suppl. figs. a-b & ). using a global phylogeny generated via nextstrain (suppl. fig. ), the cases from loma linda (ca, usa) were most closely related to the cohort from mi (usa), with all five virus samples belonging to clade c (suppl. fig. ). to investigate the influence of viral input, sequencing read depth, and protocol on key quality parameters such as reproducibility and precision, we examined protocols p , p , p , p , and p using samples np , np , and np , which had wgs libraries constructed with both k and m viral inputs.
we defined the reproducibility relative to an allele frequency threshold: a variant was reproducible between protocols a and b (or between input amounts for the same protocol) if the variant from protocol a's library had an allele frequency equal to or greater than the threshold and the variant was also identified by protocol b's library at any allele frequency. we used the jaccard index for reproducibility scoring. we observed that the protocol, read depth, viral input amount, and the sample itself impacted the reproducibility between protocols on called snvs (fig. a). the sample itself and the viral input amount had the biggest impact on reproducibility, suggesting that there were characteristics of the sample apart from viral input level that influenced snv detection. the impact of the allele frequency threshold was muted once the threshold exceeded %. the average reproducibility across protocols using the jaccard index was almost % when using m viral copies for input with reasonable allele frequency thresholds and when sequencing the sample to m pe reads (fig. b). lowering either read depth to m or . m pe reads or reducing the viral input to k copies noticeably reduced reproducibility by at least % and sometimes by % or more (fig. b). regarding the precision of snv calling, we noticed that at k viral copy input a large number of low allele frequency snvs (i.e., - %, false) were putatively identified in each sample by p , p , p , and p (p not evaluated at low viral copy input due to generally poor genome sequence coverage results), especially in sample np (suppl. fig. ). an increase in read depth did not improve the precision of snv calling, nor did it reduce the number of low allele frequency snvs (suppl. fig. ). as sars-cov- is a haploid virus, we assumed that the vast majority of these low frequency putative variants were false snvs.
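as an illustration of the reproducibility score described above, the following python sketch computes a jaccard index between the snv calls of two libraries at a chosen allele frequency threshold. it is a minimal sketch under stated assumptions, not the authors' pipeline: the function name, the symmetric treatment of the two libraries, and the toy variant keys and frequencies are all hypothetical.

```python
def reproducibility(calls_a, calls_b, af_threshold):
    """Jaccard-style reproducibility between two libraries (illustrative sketch).

    calls_a, calls_b: dict mapping a variant key, e.g. (position, ref, alt),
    to its observed allele frequency (0.0 to 1.0).
    Assumption: a variant counts as shared when one library reports it at or
    above the threshold and the other library reports it at any frequency.
    """
    a_high = {v for v, af in calls_a.items() if af >= af_threshold}
    b_high = {v for v, af in calls_b.items() if af >= af_threshold}
    a_any, b_any = set(calls_a), set(calls_b)

    # Variants passing the threshold in one library and seen at all in the other.
    shared = (a_high & b_any) | (b_high & a_any)
    # All variants passing the threshold in at least one library.
    union = a_high | b_high
    if not union:
        return None  # undefined when neither library passes the threshold
    return len(shared) / len(union)


# Toy, invented calls for two libraries of the same sample.
library_a = {(100, "C", "T"): 0.95, (200, "G", "A"): 0.30, (300, "A", "G"): 0.08}
library_b = {(100, "C", "T"): 0.90, (200, "G", "A"): 0.05}

for threshold in (0.05, 0.25):
    print(threshold, reproducibility(library_a, library_b, threshold))
```

raising the threshold in this toy example drops the low frequency call that only one library reports, which is the kind of effect the threshold comparison in the text examines.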
consistent with this assumption, none of these putative lower allele frequency snvs was reproduced across at least three protocols (suppl. fig. ). on the other hand, in certain clinical samples (np and np ), some putative false snvs with high allele frequency (> %) were called primarily by amplicon-based protocols (p or p ) at low viral copy input (suppl. fig. a-c, snvs without an asterisk); nevertheless, these snvs were not reproduced even in the same samples with high viral copy input (fig. a&c). in summary, we observed that nonreproducible candidate snvs tended to have one or more of the following characteristics: ) they tended to have allele frequency below % ( % of observed nonreproducible snvs); ) they tended to occur when using low viral input ( % of observed nonreproducible snvs); and ) they were observed at local pile-up depths of less than or equal to bases ( % of observed nonreproducible snvs). our benchmarking study suggested that low viral copy input severely affected reproducibility of snv calls as well as sensitivity and precision of consensus snvs. the covid- pandemic is causing a global health crisis. by november , over . million deaths were attributable to covid- , and the number is continuously growing (world health organization website https://www.who.int/emergencies/diseases/novel-coronavirus- ). there is an urgent need to better understand and track sars-cov- to improve viral detection, tracing of viral transmission, and the development of effective therapeutic approaches. generating full-length sars-cov- sequences through next-generation sequencing (ngs) will allow better understanding of its evolution and enhance the treatment strategies for covid- , - .
here, we compared seven wgs protocols for sars-cov- using clinical samples from infected patients, benchmarking the performances of these protocols in several aspects including the sequencing read mappability, genome coverage (percentage and uniformity, minimum sequences required); sample storage condition; effects of viral input, sequencing depth, length and platform; sensitivity, reproducibility and precision of snv calling and related assay factors (e.g., amount of viral input, sequencing depth and bioinformatics pipeline). the sars-cov- is a positive-sense single-stranded rna virus, which has low stability once rna enzymes are released after cellular destruction. the quality of virus rna is critical for the detection and the overall genome sequencing. it has been reported that only - % of the positive cases are identified by rt-pcr, possibly due to loss or degradation of virus rna during the sampling process , . starting the rna isolation immediately following np swab sample collection may be ideal to minimize rna degradation; however, immediate isolation is often impractical especially when involving large cohorts of sampling at different time points. therefore, we compared the samples isolated from two storage conditions, i.e., rna isolated either immediately from the freshly prepared np swabs or from the np swabs in avl buffer that were frozen at - c for - days. we found that although there were differences in the genome mappability between fresh and frozen samples across the protocols where the frozen samples performed slightly better than or equivalent to fresh samples in their on-target percentage to the sars-cov- genome (fig. a-c) , there was no practical difference in the on-target sequence mappability for p . furthermore, for other well-performing protocols such as p , p , and p , one could overcome differences in mappability by deeper sequencing (e.g., > x deeper, fig. a, b) . 
thus, for the wgs of sars-cov- involving large numbers of samples, we believe that using the rna isolated from frozen samples (- °c) can be a practical and better choice. the artic amplicon-based target whole-genome amplification of sars-cov- is considered a highly sensitive and low-cost method which could provide high coverage for the viral genome with much less sequencing needed . several studies have used the artic target whole-genome amplification technique for sequencing sars-cov- , , . the qiaseq sars-cov- primer panel protocols p and p were based on the artic v primer set, but with a replacement of the _right primer by a substitute primer (i.e., ′-tctctgccaaattgttggaaaggca- ′) . consistent with the previous reports , our study showed that p and p preferentially amplified the sars-cov- genome up to -fold over human or bacterial genomes in human samples (fig. d-f). compared to the rna-seq metagenomics-based technologies (i.e., p , p and p ), p and p achieved more than -fold higher coverage for the sars-cov- genome depending on viral load and sequencing depth (fig. d-f). at high viral input, as few as k reads were sufficient for p and p to achieve > % viral genome coverage (min x) (fig. b). we found that p worked better than p for the samples with low viral copy number or even for samples with partial rna degradation (fig. a and suppl. figs. a & ). furthermore, we also noticed that p had noticeably more bias and large variations (spikes) in genomic coverage of several regions which were associated with the primer sets - , - , - , and - , respectively. however, these variations were significantly decreased with p , which showed a much more uniform genome coverage at both low and high viral inputs (fig. c, suppl. fig. & ).
although the primer-panel based target amplicon sequencing has been shown as a cost-effective approach for sequencing the clinical covid- samples to discover the individual genetic diversity , we found there were some limitations for the artic v amplicon-based target whole-amplification protocols. first, by design, the current artic v amplicons only covered genome regions from positions to , which would make it impossible for the artic v amplicon-based protocols to detect a snv outside of the pcr amplified regions. this scenario actually occurred in our benchmarking study and we found that a consensus variant, g. g>a in sample np , was consistently detected by protocols p , p , and p , but was missed by p and p (fig. a, c&d) . second, a single-base mismatch between the primer and template may produce a pcr error such as chimeric pcr amplification , which might lead to a false snv call. for example, we found that p , at low viral input, called a unique "false" snv (g. g>t) with almost % allele frequency and > , x coverage (suppl. fig. ). however, this putative snv was not detected in the same clinical sample prepared using either p at high viral input ( m) or p , p , p and p (fig. c, suppl. fig. ) at any input. third, pcr amplified primer-originated "contaminated" sequences associated with the qiagen protocols p and p may lead to an error in snv calling. coincidently, we had a consensus snv (g. c>t) which was within the overlapping binding site to the right adjacent primers and alt. interestingly, this snv was consistently called by p , p , p , and p at the defined threshold (> % frequency), but in p had a significantly lower variant allele frequency ( fig. a&c ) and was not called. because the pcr primers could mask a snv that was located in the primer-binding regions, proper primer trimming would be critical for accurately detecting snvs within the primer binding regions. 
to understand how this inconsistency occurred, we analyzed the sequencing reads derived from the amplicons and (after adapter trimming) before and after primer trimming on the sequencing data generated from both p and p . we found that, for p , after adapter trimming, only . % of the reads contained g. c>t snv; but after primer trimming using either ivar or clc (qiagen, https://www.qiagenbioinformatics.com/products/clcgenomics-workbench), the frequency of g. c>t became . % or . %, respectively, whereas many reads containing the primer sequences (g. c) still remained and were not trimmed properly (extended data fig. a ). for p , after adapter trimming, . % of the reads contained the consensus g. c>t snv; after primer trimming using either ivar or clc, the frequency of g. c>t became . % or . %, respectively (extended data fig. b) . per qiagen protocols, during the library constructions for p and p , the pcr amplified products were subject to an enzymatic random fragmentation which could generate primer-originated "contaminated" sequences, i.e., the reads containing partial primer sequences or reverse complementary complete/partial primer sequences that could not be removed by ivar or clc (suppl. fig. ) . however, when applying cutadapt , a trimming algorithm that removed all the partial or complete primer sequences by trimming only the end of the reads, i.e., "end-primer sequence trimming", the frequency of the snv calling for g. c>t became . % (p ) or . % (p ), respectively (extended data fig. ) . in contrast to the artic v amplicon-based target genome amplification, the rna-seq based metagenomics sequencing protocols such as p , p and p used an unbiased approach to cover the whole-genome. the metagenomics approach has been used for sequencing sars-cov- in several recent studies , , , . obviously, a unique advantage of the metagenomic approach is its whole-genome coverage including all bases for the sars-cov- genome given an adequate sequencing depth. 
we found that when the samples contained a higher viral load (e.g., ~ k or m copies), p , p , p achieved almost complete sars-cov- genome coverage (min x) with only ~ m reads per sample (fig. b) . when the samples np , np , and np contained a lower viral input (< k copies), we observed only p achieved sufficient whole-genome coverage (min x, fig. a) leading to % sensitivity for detecting the consensus snvs at a depth of ~ m reads (suppl. fig. ). the protocol p is based on single primer isothermal amplification technology (spia, nugen) coupled with the high-throughput sequencing, which can also generate a full-length sars-cov- genome. the spia has been shown to generate the full-length genomes for hiv, west nile virus, and bovine coronavirus etc , . however, for samples with low viral input (< k copies), we observed that only . % of the spia reads could map to the viral genome, suggesting that p might not be ideal if a low copy number of sars-cov- within sample is expected. detecting individual sars-cov- genome variation is critical in tracking the viral spread, evolution, as well as for understanding the potential drug resistance. thus, we benchmarked and ranked the sensitivity, reproducibility, and precision of the snv calling of sars-co-v- across protocols (extended data fig. and suppl. fig. ). we found that the metagenomics protocol p was ranked consistently best in the sensitivity of snv detection, followed by p , p and p at either low or high viral input (extended data fig. a and suppl. fig. a) . the rankings for reproducibility of snv calling were very similar to rankings for sensitivity of snv calling, although differences between top protocols were smaller and p at high viral input moved up in rank. (extended data fig. b and suppl. fig. b ). 
in contrast, there was a striking difference in the ranking order between the low viral input and high viral input regarding precision, i.e., p was ranked best followed by p and p at low viral input, whereas at high viral input, p and p performed the best, followed by p , p , and p (extended data fig. c and suppl. fig. c). however, at low viral input, all protocols including p performed poorly for precision, so the ranking may be somewhat random. at high input, the order is probably more meaningful. therefore, we should not be surprised if the precision ranking for some protocols changes dramatically between low and high inputs. overall, we observed that the viral input was a key factor impacting the snv calling sensitivity, reproducibility, and precision for sars-cov- , e.g., a low viral input adversely affected the snv detection (figs. a-b, a-b, extended data fig. a-c, suppl. figs. , & ). as expected, a limited copy number of the viral rna requires extra pcr amplification, which inevitably introduces more noise and bias as well as potential errors . other studies have reported similar effects of viral input copy number on the sequencing and mutation detection quality, in line with our observations. we also ranked the protocol performance based on mappability, minimal genome coverage and uniformity of genome coverage of the sars-cov- genome (extended data fig. d-f and suppl. fig. d-f). obviously, p and p performed much better than metagenomics protocols p , p , and p in mappability at both low and high viral input (extended data fig. d and suppl. fig. d). in addition, p was ranked best for uniformity of genome coverage at both low and high viral input, whereas p , p and p also performed generally well (extended data fig. f and suppl. fig. f).
however, for minimal genome coverage (% of genome with min x), metagenomics protocols p , p and p consistently outperformed the primer-panel based protocols p and p at both low and high viral inputs (extended data fig. e and suppl. fig. e). in conclusion, our study shows that metagenomic approaches are more sensitive, reproducible and accurate at moderate to higher read depth (e.g., m reads) for sars-cov- snv calling. although amplicon approaches produce high coverage at lower read depths, they may yield less accurate detection (more false positives and false negatives), leading to reduced sensitivity compared to other methods for the reasons stated previously. therefore, for protocols p -p , we recommend at least m viral copies for input and m pe reads so that reasonable levels of sensitivity, reproducibility, and precision are achieved. if lower read depths are preferred, then with p , one can achieve satisfactorily high levels of sensitivity, reproducibility, and precision in snv calling with fewer reads ( . m- m pe reads), especially with a reasonable threshold for allele frequency and if the variants are within the amplicon design. in summary, we benchmarked sars-cov- whole-genome sequencing using seven ngs protocols and evaluated the differences in mappability, viral genome coverage, and variations in snv calling sensitivity, reproducibility and concordance across input amounts and between protocols. the results of our study provide a thorough reference and resource on selecting appropriate whole-genome sequencing technologies for clinical sars-cov- samples, providing knowledge to mitigate the impact of covid- on our society. figure illustrates our overall study design. briefly, eight covid- positive nasopharyngeal swab rna samples, either freshly isolated or isolated from frozen specimens, were used to generate sars-cov- wgs libraries using seven protocols (fig. , suppl. table ). two different sars-cov- inputs, low ( copies) vs.
high ( , or million copies) were used. each pair-wise low vs. high input was obtained from the same clinical sample, except for p , for which the low viral input wgs libraries were obtained from different samples due to the limitation of minimal total rna amount required (suppl. table ). for fresh samples, three different viral inputs, i.e., vs. either , or million sars-cov- viral copies, were used from the same sample, whereas for frozen samples, two different viral inputs, i.e., vs. million sars-cov- viral copies, each from the same sample, were used. the seven protocols included: the qiaseq sars-cov- primer panel v artic v primer set based target genome amplification protocol (p ); the qiaseq fx single-cell rna-seq library kit coupled with qiaseq fastselect -rrna hmr kit protocol (p ); the qiaseq fx single-cell rna-seq library kit coupled with qiaseq fastselect -rrna hmr kit and qiaseq fastselect - s/ s/ s kit protocol (p ); the tecan trio rna-seq kit coupled with human rrna depletion protocol (p ); protocols and used an in-house cdna synthesis recipe with a mix of random primers, oligo(dt), and four pairs of sars-cov- specific primers, coupled with either the illumina dna library preparation kit-dna nano (p ) or the nextera xt kit (p ); the qiaseq sars-cov- primer panel v artic v primer set based target genome amplification protocol with proprietary buffer chemistry modification (p ). the sars-cov- wgs libraries were sequenced, paired-end, x or x bp, on two different illumina platforms (miseqdx vs. nextseq , suppl. table ). we benchmarked the performances of protocols on mappability, viral genome coverage (%) and uniformity, and sensitivity, reproducibility, and precision of snv calling. the study was approved by the institutional review board (irb number ) and the institutional biosafety committee (ibc) of loma linda university (llu). all the clinical specimens were collected at llu medical center.
the nasopharyngeal (np) specimens were collected from sars-cov- positive individuals. a total of eight nasopharyngeal swab specimens were included in the evaluation. after collection, the np swab was immediately immersed in µl diluted avl buffer ( µl pbs + µl avl buffer) in an eppendorf tube and incubated at room temperature for minutes. for fresh samples, the rna was extracted within to hours of sample collection. for frozen samples, the tubes were placed at - °c for - days before rna isolation. rna was isolated from np swabs with the qiaamp viral rna mini kit (qiagen, germany) according to the manufacturer's instructions, and . µg carrier rna in ave buffer and µl absolute ethanol were added to each sample and mixed well. the entire volume of lysate was passed through a qiaamp mini column and centrifuged at , g for min, allowing rna to bind to the column. following the wash with aw and aw buffer, the rna was eluted in µl ave buffer and stored at - °c for downstream procedures. µl purified rna was mixed with . µl of nuclease-free h o, µl of qiagen buffer rdd, and . µl of rnase-free dnase i stock solution (qiagen). the sample was incubated for minutes at room temperature. then, the rna was purified using the rneasy minelute cleanup kit (qiagen), following the manufacturer's protocol. to confirm the presence of sars-cov- rna and the viral copy number from clinical specimens, real-time rt-pcr was performed using the sybr green qrt-pcr method (suppl. table ). the sars-cov- genome copy number in total rna was quantitated by comparing the average coronavirus ct across orf ab, s, and n to that of the positive control. the sars-cov- viral target genome amplicon libraries were constructed using the qiaseq sars-cov- primer panel v (qiagen, germany) coupled with the qiaseq fx dna library kit (qiagen) following the manufacturer's protocols. briefly, µl of total rna of different viral input ( million, , or , viral copies, respectively) was reverse transcribed to synthesize cdna using random hexamers.
µl of cdna was evenly split into two pcr pools ( . µl for each pool) and amplified into bp amplicons using two sets of primers which cover % of the entire sars-cov- genome. the qiagen primer panel was designed based on artic v primers, with the exception that the right primer for amplicon (ncov- _ _right(-)) was replaced with a modified sequence, ′-tctctgccaaattgttggaaaggca- ′ (itokawa group). the pcr was performed per the manufacturer's instructions with -cycle amplification for million and k viral copy samples, and -cycle amplification for , viral copy samples. after amplification, the contents of the pcr pools were combined into one single tube for each sample, followed by an ampure bead clean-up per the manufacturer's instructions. the purified amplicons were quantified using qubit . (life technologies) and normalized for dna library construction. two inputs ( ng and . ng) of purified amplicons were used for dna library construction. enzymatic fragmentation and end-repair were performed to generate bp dna fragments with an adenine on the ' end. briefly, to deplete human ribosomal rna, µl of diluted ( . x) fastselect rrna hmr (qiagen, germany) was added into µl covid- specimen rna along with µl na denaturation buffer, followed by heating at °c for min and then stepwise cooling to °c for min. afterwards, reverse transcription was performed using both random primer and oligo dt primer, and the remaining library preparation steps were performed following the protocol of the qiaseq fx single cell rna library kit (qiagen, germany). to deplete both human and bacterial ribosomal rna, µl of diluted ( . x) fastselect rrna hmr (qiagen, germany) and µl of diluted ( . x) qiaseq fastselect s/ s/ s (qiagen) were added into µl covid- specimen rna along with µl na denaturation buffer, followed by heating at °c for min and then stepwise cooling to °c for min. qiaseq bead cleanup was carried out per the manufacturer's instructions.
afterwards, reverse transcription was conducted using both random primer and oligo dt primer, and the remaining library preparation steps were performed by following the protocol of the qiaseq fx single cell rna library kit (qiagen). after repli-g amplification, - ng of input cdnas were used for enzymatic fragmentation by incubating at °c for min, followed by adaptor ligation and ampure xp bead cleanup. final libraries were eluted from the beads without amplification. all the libraries were quantified with qubit . (life technologies) and quality analyzed on a tapestation (agilent). eight rna samples isolated from fresh and frozen specimens were used for tecan trio rna-seq library construction (nugen/tecan), following the nugen protocol with integrated dnase treatment. for fresh samples, total rna amounts containing million or , sars-cov- viral copies for np and np , and sars-cov- viral copies for np and np were used as input. for frozen samples, total rna amounts containing million or , sars-cov- viral copies for np and np , and sars-cov- viral copies for np and np were used as initial input, respectively. all procedures were carried out using conditions specified in the trio rna-seq protocol. µl of total rna was treated with dnase, followed by cdna synthesis using random hexamers. after purification by ampure xp beads (beckman coulter), cdnas were amplified on beads by single primer isothermal amplification (spia). next, enzymatic fragmentation and end repair were performed on the cdnas to generate blunt ends. the illumina adaptors were ligated to cdna fragments, followed by a first round of library amplification. anydeplete probe mix was used to deplete the human ribosomal transcripts. the remaining dna libraries were amplified a second time for cycles. additional cycles of amplification were carried out for the libraries with a yield lower than ng.
after the second round of library amplification, double size selection by ampure beads was performed to obtain library molecules with sizes ranging between bp and bp. then, . µl of ampure beads was added to the µl library products. after incubation and magnetic separation, supernatant was collected and another . µl of ampure beads was added. following magnetic separation, the supernatant was removed and the beads were washed with % alcohol. the final libraries were eluted in water. the libraries were quantified by qubit . (life technologies) and quality analyzed on a tapestation (agilent). the amount of rna input was normalized based on the viral load determined by the sybr green qrt-pcr method. specifically, total rna amounts containing million, , or sars-cov- viral copies were used to start the initial cdna synthesis by superscript iii reverse transcriptase (invitrogen), using a mix of random primers, oligo(dt) , and four pairs of sars-cov- specific primers that cover the ' and ' ends. the sequences of the specific primers were: f , '-attaaaggtttataccttccc- '; r , '-ttttttttttttgtcattctcc- '; f , '-ttcttatttcacagagca- '; r , '-aacataaccatccactgaatatg- '; f , '-aaatggggtaaggctagac- '; r , '-agtctacttgaccatcaac- '; f , '-agcacactttcctcgtgaagg- '; r , '-cttgaacttcctcttgtctg- '. reverse transcription (rt) primer annealing was conducted at °c for minutes, followed by incubation on ice for minute. rt was carried out at °c for min, then °c for min in µl volume. all rt products were used for cdna amplification using qiaseq multiple displacement amplification (mda) technology. after amplification, the cdna was purified using an equal volume of agencourt ampure xp beads (beckman coulter). ng or ng purified cdnas were used for library construction using either the truseq dna nano library preparation kit (illumina), i.e., p , or the nextera xt dna library preparation kit (illumina), i.e., p , respectively. all procedures were carried out following the protocols recommended by the manufacturers.
the libraries were purified with agencourt ampure beads, quantitated by qubit dsdna hs assay (life technologies), and the quality was analyzed on a tapestation with d screen tape (agilent). the sars-cov- viral target genome amplicon libraries were constructed using the qiaseq sars-cov- primer panel v (qiagen, germany) coupled with the qiaseq fx dna library kit (qiagen) following the manufacturer's protocols. all the procedures, including primers, were identical to those described for p , except for some proprietary modifications to the buffers (qiagen, personal communications). the libraries were multiplexed with different barcodes and pooled at nm in equimolar amounts. sample qc metrics were reported by fastqc , qualimap , and multiqc . the raw reads were trimmed with cutadapt (v . . ). the trimmed reads were aligned to the wuhan-hu- reference using bwa mem (v . . ) with default settings. for the sequencing data generated using the artic v based primer-panel protocols (p and p ), an extra primer-trimming step was performed using ivar (v . . ). the aligned reads were further de-duplicated by samtools rmdup (v . ) to get the bam files. a kraken database was built based on the complete genomes in the ncbi refseq database for archaea, bacteria, protozoa, fungi, human and viruses (sars-cov- genome included). to summarize the read mapping percentages to multiple taxa, the trimmed reads were classified into human, sars-cov- , bacterial, and remaining reads (e.g., unclassified, archaeal, viral, fungal, protozoan) by using the kraken database. the sequencing read mappability (mapping percentage) to the sars-cov- genome was computed for each of the protocols at different sequencing depths. a generalized linear mixed model (glmm) was created using the lmer function in r to determine the significant factors explaining variations in mapping rates.
as mapping rates tended to be at extremes for different protocols (i.e., mapping rates were usually near or %), we first transformed the mapping rate using the probit transform. zero values of the mapping rate were replaced with . e- for the probit transform. we then created a glmm using the transformed mapping rate as the dependent variable and employed fixed effects of protocol, viral copy input amount (low or high), and sample storage condition (fresh vs. frozen) and a random effect of the viral rna input concentration. no interaction terms were significant and results from the simplest model with these factors were reported. p values were generated using the satterthwaite approximation for degrees of freedom carried out by the lmertest package in r. to identify which groups were statistically different from one another, pairwise comparisons were carried out for all the fixed effects using the difflsmeans function of the lmertest package in r. the degrees of freedom were adjusted by the satterthwaite method. to evaluate sequence mapping to sars-cov- viral, human, and bacterial genomes, data were presented as the mean ± one standard deviation. to evaluate sequence read depth on minimal viral genome coverage, data were presented as the mean ± one standard error. the genome coverage was defined as the breadth of coverage, which was measured as the percentage of the sars-cov- reference genome for which the genomic positions (bases) were sequenced with minimal x coverage. coverage uniformity on the sars-cov- genome was examined by comparing a quantitative metric, i.e., coefficient of variation (cv), across seven protocols. cv was computed using the standard deviation and mean of the coverage at each reference genome position. variants were called on the bam files by varscan (v . . ) and bcftools (v . ). to accurately identify snvs, we used samtools mpileup (parameters: -a -d -q ) and varscan (v . . ) (parameters: --p-value . --variants).
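the two coverage metrics defined in this section, breadth of coverage and the coefficient of variation (cv), can be sketched in a few lines of python. this is an illustrative sketch only: the function names are hypothetical, the per-position depths are invented, the minimum depth is left as a parameter because the value is not recoverable from the text here, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def breadth_of_coverage(depths, min_depth):
    """Percentage of reference positions sequenced at depth >= min_depth."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


def coefficient_of_variation(depths):
    """Standard deviation of per-position depth divided by the mean depth.

    Assumption: population standard deviation; a sample standard deviation
    would differ only slightly for a genome-length vector.
    """
    average = mean(depths)
    if average == 0:
        raise ValueError("cv is undefined when mean depth is zero")
    return pstdev(depths) / average


# Invented per-position depths for a tiny stand-in "genome" of eight bases.
toy_depths = [0, 5, 10, 20, 40, 40, 10, 5]
print(breadth_of_coverage(toy_depths, min_depth=10))
print(coefficient_of_variation(toy_depths))
```

a lower cv indicates more uniform coverage, so perfectly even depth gives a cv of zero regardless of how deep the sequencing is.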
then, we filtered the low-confidence snvs with snippy vcf_filter (parameters: --minqual -mincov --minfrac . ). samtools (v . ) mpileup and bcftools (v . ) were used to generate the genome variants fastq file from the bam file, and the fastq file was converted into a genome fasta file using the linux cat command. the variant fasta files generated from the same clinical sample but different protocols were aligned using the jalview (v . . . ) alignment tool, from which one consensus fasta file was compiled for each clinical sample. protocols were ranked by average sensitivity using m reads for all protocols except p , where m reads were used (p was not sequenced at m reads). precision: variant calls from ivar were filtered for variant allele frequency (putative variants with vaf > % were considered as called variants, a lower threshold than that used for sensitivity, to better measure potential false calls). precision used the consensus snvs (discussed earlier) as the set of positives. calls of variants with vaf > % that were not a consensus snv were termed false positives (fp). precision, also known as positive predictive value (ppv), was defined as tp/(tp + fp). protocols were ranked by average precision using m reads for all protocols except p , where m reads were used (p was not sequenced at m reads). phylogenetic analysis was performed using the nextstrain pipeline (https://github.com/nextstrain/ncov) (fig. and fig. ). in amplicon sequencing, a potential snv position can lie both within an amplicon primer itself (masked snv allele) and within the amplified viral sequence (potential snv) of an adjacent second amplicon. we had a snv, i.e., g. c>t (in np ), which was covered by both n _ _r and n _ _r_alt primers. the full primer sequence could be removed from the end of a read by standard trimming, or by ivar if it was located anywhere in a read. in both scenarios, the masked snv allele reads would be removed and the potential snv could be detected. 
however, p and p employed an enzymatic fragmentation step, which resulted in partial primer sequences at the end of some reads that could not be removed by either ivar or the clc bio package. in addition, ivar may also remove primer sequences located within the middle of a read, i.e., a "true" snv derived from the reads amplified by a second adjacent amplicon. under both circumstances, the potential snv calling would be compromised. to reveal the masked snv calling, we compared the g. allele frequencies using different primer trimming methods on the fastq file of np at m viral input with m read depth. briefly, the same fastq file was trimmed by quality trimming only, clc bio (default settings), ivar (default settings), and cutadapt (ends trimming only), respectively. after trimming, the occurrences of full and partial sequences for primers n _ _r and n _ _r_alt, as well as their reverse complementary sequences that covered g. , were counted, and the frequency of allele t was used to compare the efficiencies of the trimming methods (extended data fig. ). the ranking performances of snv detection and viral genome mapping across seven protocols were evaluated individually for each of six categories or metrics, either using a z-score statistic based on harmonic mean (extended data fig. ) or displayed by individual sample/data point values (suppl. fig. ), both derived from snv calling and viral genome mapping data at low and high viral input. for the z-score based rankings, as each protocol contained multiple data points linked to different samples and read depths, a mean was taken as the initial ranking value for the given protocol and was used for z-score transformation. the z-score was calculated based on the average of all sample/data points per metric per protocol for either low or high input. to reduce the variation associated with viral input, only data points generated from k and m viral inputs were used for ranking. 
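the precision definition and the z-score ranking described above can be illustrated with a short python sketch. it is not the authors' code: the function names, the representation of a variant as a hashable tuple, the use of the population standard deviation, and the default arithmetic mean are assumptions (the text mentions both "a mean" and a harmonic mean, so the averaging function is left as a parameter).

```python
from statistics import mean, pstdev


def precision(called, consensus):
    """tp / (tp + fp), with the consensus snvs as the set of positives."""
    called, consensus = set(called), set(consensus)
    if not called:
        raise ValueError("no variants were called; precision is undefined")
    tp = len(called & consensus)   # called variants that are consensus snvs
    fp = len(called - consensus)   # called variants that are not
    return tp / (tp + fp)


def zscore_ranking(values_by_protocol, average=mean):
    """Rank protocols by the z-score of their per-protocol average.

    Returns (protocol, z-score) pairs sorted from best to worst, assuming
    that larger metric values mean better performance.
    """
    summary = {p: average(v) for p, v in values_by_protocol.items()}
    scores = list(summary.values())
    mu, sd = mean(scores), pstdev(scores)
    z = {p: 0.0 if sd == 0 else (s - mu) / sd for p, s in summary.items()}
    return sorted(z.items(), key=lambda item: item[1], reverse=True)
```

passing `average=statistics.harmonic_mean` switches the per-protocol summary to a harmonic mean. for a metric where smaller is better, such as cv, the values would need to be inverted before ranking so that larger still means better.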
samples np , np , and np were used for snv calling ranking evaluations on sensitivity, reproducibility, and precision. other samples were also included in the mapping-based ranking evaluations on mappability, genome coverage, and uniformity of coverage. snv detection sensitivity was measured by the percentage of consensus fig. ) . for the rankings displayed by individual sample or data point for each of six metrics, all sample/data point values were the same as those used for the z-score based rankings, but no mean was calculated, which allowed the distribution of all sample and data point values to be displayed for each of the six metrics at low and high viral input (see figure legends for detail in suppl. fig. ). the sequencing data have been uploaded to the ncbi sra (sequence read archive) under the bioproject accession # prjna . the data will be available to the public when the paper is published. for reviewers, a token for accessing the data can be obtained from the editor of the journal. reviewers' link: https://dataview.ncbi.nlm.nih.gov/object/prjna ?reviewer=r nj tijbk p rnsddc n we used many previously published algorithms and code sets for the sars-cov- genome mapping, genome coverage and snv calling. all of our code is provided on github at the following link: https://github.com/oxwang/covid _ms all the authors declare no conflicts of interest. any mention of commercial products is for clarification and not intended as an endorsement. i.e., ( k, low) vs. either , or million ( k or m, high) sars-cov- viral copies, were used from the same sample, whereas for frozen samples, two different viral inputs, i.e., ( k, low) vs. million ( m, high) sars-cov- viral copies from the same sample, were used. p used different samples at low input vs. high input due to the minimal total rna amount required. 
the performances of protocols were benchmarked based on viral input, sequencing platform and depth, mappability, viral genome coverage and coverage uniformity, and sensitivity, reproducibility, as well as precision across seven protocols. only at m viral input were the data available for m pe reads and showed (bottom panel, right), no data available at m reads at low input. x-axis shows the samples (np , np , np ), y-axis shows the snv calling reproducibility. (b) jaccard score showing the average reproducibility between protocols across all combinations of protocols when varying read depth ( . m, m, and m), viral input amount ( k, m), and allele frequency calling threshold; x-axis shows allele frequency (af) threshold, y-axis shows the jaccard score-average reproducibility (%); . m/ k: . m reads/ viral copies, m/ k: m reads/ viral copies; . m/ m: . m reads/ million viral copies; m/ m: m reads/ million viral copies; m/ m: m reads/ million viral copies. extended data figure . artic v primer sequences masked a snv call of g. c>t. a snv in np was found in the genome location at g. , which was within the primer n _ _r and n _ _r_alt sequences, and was also in the middle of amplicon . after sequencing, reads with g. could be covered by n _ _r and n _ r_alt sequences under scenarios (right panel). scenario : full primer sequences of n _ _r and n _ r_alt; scenario : partial primer sequences of n _ _r and n _ r_alt; scenario : full reverse-complementary sequences of n _ _r and n _ r_alt; scenario : partial reverse-complementary sequences of n _ _r and n _ r_alt. fastq files were first trimmed to remove adapters using cutadapt with default settings, then subject to second round of trimming to remove sars-cov- artic v amplicon primers, using ivar, qiagen clc package, and customized cutadapt trimming. after trimming, the occurrence of primer sequences listed in each of the above scenarios were counted, in searching for evidence that g. 
c>t snv call was compromised by primer-derived sequence "contamination". data were normalized to the number of counts per million reads. (a) library prepared with p from np at m viral input. (b) library prepared with p from np at m viral input. y-axis shows the cpm reads for p (a) or p (b); x-axis shows reads containing g. c allele in orange (false negative) and reads containing g. c>t allele in green (consensus true snv) in four different scenarios (illustrated in the right panel) using four different trimming methods (adapter trimming, ivar primer trimming, clc primer trimming and non-internal primer trimming). note that after "non-internal primer trimming", i.e., applying cutadapt to remove all partial or complete primer sequences by trimming only the ends of the reads ("end-primer sequence trimming"), the frequency of the snv calling for g. c>t became . % (p ) or . % (p ), respectively. extended data figure . z-score rankings of sars-cov- whole-genome sequencing protocols. protocols were ranked individually using the z-score for each metric. (a) ranking of sensitivity of snv detection (low and high inputs); sensitivity evaluates the ability of a protocol to detect potential snvs based on the consensus snvs defined (see results and methods); (b) ranking of reproducibility of snv detection (low and high input); the reproducibility metric measures the likelihood that snvs detected by a given protocol would also be detected by another protocol in an independent experiment. reproducibility and its calculation are defined in the methods. 
(c) ranking of precision of snv detection (low and high inputs); precision metric measures the accuracy of consensus snv detected by a protocol from all potential snvs (frequency > %); (d) ranking of sars-cov- genome mappability which measures the mapping efficiency of sequencing data to the viral genome; (e) ranking of sars-cov- genome coverage which measures the proportion of viral genome that can be covered at specific read depth; (f) ranking of uniformity of genome coverage which evaluates the evenness of coverage across viral genome. the reciprocal value of coefficient of variation (cv) was used for z-score calculation in order to keep same ranking directionality (large value for better performance) as in other categories. z-scores are plotted as circles with their size and color shade scaled to the z-score value from large to small, and dark blue to light blue. note that larger z-score values imply better performance, clinical characteristics of coronavirus disease in china clinical features of patients infected with novel coronavirus in wuhan transmission of -ncov infection from an asymptomatic contact in germany a new coronavirus associated with human respiratory disease in china viral and host factors related to the clinical outcome of covid- first ngs-based covid- diagnostic the covid- vaccine development landscape the proximal origin of sars-cov- phylogenetic network analysis of sars-cov- genomes host, viral, and environmental transcriptome profiles of the severe acute respiratory syndrome coronavirus (sars-cov- ) cryptic transmission of sars-cov- in washington state. medrxiv sequencing identifies multiple, early introductions of sars-cov to new york city region. 
medrxiv genomic epidemiology of sars-cov- in guangdong province molecular architecture of early dissemination and evolution of the sars-cov- virus in metropolitan houston varscan : somatic mutation and copy number alteration discovery in cancer by exome sequencing an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar nextstrain: real-time tracking of pathogen evolution genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a genomic perspective on the origin and emergence of sars-cov- a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster full-genome evolutionary analysis of the novel corona virus ( -ncov) rejects the hypothesis of emergence as a result of a recent recombination event comparison of different samples for novel coronavirus detection by nucleic acid amplification tests correlation of chest ct and rt-pcr testing in coronavirus disease (covid- ) in china: a report of cases a rapid, low cost, and highly sensitive sars-cov- diagnostic based on whole genome sequencing. biorxiv introductions and early spread of sars-cov- in the new york city area coast-to-coast spread of sars-cov- in the united states revealed by genomic epidemiology a proposal of alternative primers for the artic network's multiplex pcr to improve coverage of sars-cov- genome sequencing. biorxiv highly sensitive and full-genome interrogation of sars-cov- using multiplexed pcr enrichment followed by next-generation sequencing. 
biorxiv multiple approaches for massively parallel sequencing of sars-cov- genomes directly from clinical samples examining sources of error in pcr by single-molecule sequencing cutadapt removes adapter sequences from high-throughput sequencing reads virus isolation from the first patient with sars-cov- in korea linear mrna amplification from as little as ng total rna for global gene expression analysis single primer isothermal amplification (spia) combined with next generation sequencing provides complete bovine coronavirus genome coverage and higher sequence depth compared to sequence-independent single primer amplification (sispa) the impact of amplification on differential expression analyses by rna-seq functional dna quantification guides accurate next-generation sequencing mutation detection in formalin-fixed, paraffin-embedded tumor biopsies comparison of somatic mutation calling methods in amplicon and whole exome sequence data the impact of dna input amount and dna source on the performance of whole-exome sequencing in cancer epidemiology fastqc: a quality control tool for high throughput sequence data qualimap: evaluating next-generation sequencing alignment data multiqc: summarize analysis results for multiple tools and samples in a single report fast and accurate short read alignment with burrows-wheeler transform the sequence alignment/map format and samtools improved metagenomic analysis with kraken walker s fitting linear mixed-effects models using lme christensen rhb lmertest package: tests in linear mixed effects models jalview version -a multiple sequence alignment editor and analysis workbench key: cord- -ls l mlg authors: tindle, courtney; fuller, mackenzie; fonseca, ayden; taheri, sahar; ibeawuchi, stella-rita; beutler, nathan; claire, amanraj; castillo, vanessa; hernandez, moises; russo, hana; duran, jason; crotty alexander, laura e.; tipps, ann; lin, grace; thistlethwaite, patricia a.; chattopadhyay, ranajoy; rogers, thomas f.; sahoo, 
debashis; ghosh, pradipta; das, soumita title: adult stem cell-derived complete lung organoid models emulate lung disease in covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ls l mlg sars-cov- , the virus responsible for covid- , causes widespread damage in the lungs in the setting of an overzealous immune response whose origin remains unclear. we present a scalable, propagable, personalized, cost-effective adult stem cell-derived human lung organoid model that is complete with both proximal and distal airway epithelia. monolayers derived from adult lung organoids (alos), primary airway cells, or hipsc-derived alveolar type-ii (at ) pneumocytes were infected with sars-cov- to create in vitro lung models of covid- . infected alo-monolayers best recapitulated the transcriptomic signatures in diverse cohorts of covid- patient-derived respiratory samples. the airway (proximal) cells were critical for sustained viral infection, whereas distal alveolar differentiation (at →at ) was critical for mounting the overzealous host immune response in fatal disease; alo monolayers with well-mixed proximodistal airway components recapitulated both. findings validate a human lung model of covid- , which can be immediately utilized to investigate covid- pathogenesis and vet new therapies and vaccines. highlights: human lung organoids with mixed proximodistal epithelia are created; proximal airway cells are critical for viral infectivity; distal alveolar cells are important for emulating host response; both are required for the overzealous response in severe covid- . in brief: an integrated stem cell-based disease modeling and computational approach demonstrates that proximal airway epithelium is critical for sars-cov- infectivity, whereas distal differentiation of alveolar pneumocytes is critical for simulating the overzealous host response in fatal covid- . 
sars-cov- , the virus responsible for covid- , causes widespread inflammation and injury in the lungs, giving rise to diffuse alveolar damage (dad) (arrossi and farver, ; borczuk et al., ; damiani et al., ; li et al., ; roden et al., ) , featuring marked infection and viral burden leading to apoptosis of alveolar pneumocytes (hussman, ) , along with pulmonary edema (bratic and larsson, ; carsana et al., ) . dad leads to poor gas exchange and, ultimately, respiratory failure; the latter appears to be the final common mechanism of death in most patients with severe covid- infection. how the virus causes so much damage remains unclear. a particular challenge is to understand the out-of-control immune reaction to the sars-cov- infection known as a cytokine storm, which has been implicated in many of the deaths from covid- . although rapidly developed pre-clinical animal models have recapitulated some of the pathognomonic aspects of infection, e.g., induction of disease, and transmission, and even viral shedding in the upper and lower respiratory tract, many failed to develop severe clinical symptoms (lakdawala and menachery, ) . thus, the need for pre-clinical models remains both urgent and unmet. to address this need, several groups have attempted to develop human pre-clinical covid- lung models, all within the last few months (duan et al., ; mulay et al., ; salahudeen et al., ) . while a head-to-head comparison of the key characteristics of each model can be found in table , what is particularly noteworthy is that none recapitulate the heterogeneous epithelial cellularity of both proximal and distal airways, i.e., airway epithelia, basal cells, secretory club cells and alveolar pneumocytes. also noteworthy is that models derived from ipscs lack propagability and/or cannot be reproducibly generated for biobanking; nor can they be scaled up in cost-effective ways for use in drug screens. 
besides the approaches described so far, there are a few more approaches used for modeling covid- -(i) d organoids from bronchospheres and tracheospheres have been established before (hild and jaffe, ; rock et al., ; tadokoro et al., ) and are now used in apical-out cultures for infection with sars-cov- (suzuki et al., ); (ii) the most common model used for drug screening is the air-liquid interphase (ali model) in which pseudo-stratified primary bronchial or small airway epithelial cells are used to recreate the multilayered mucociliary epithelium (mou et al., ; randell et al., ) ; (iii) several groups have also generated d airway models from ipscs or tissue-resident stem cells (dye et al., ; ghaedi et al., ; konishi et al., ; mccauley et al., ; miller et al., ; wong et al., ) ; (iv) others have generated at cells from ipscs using closely overlapping protocols of sequential differentiation starting with definitive endoderm, anterior foregut endoderm, and distal alveolar expression (chen et al., ; gotoh et al., ; huang et al., ; jacob et al., ; jacob et al., ; yamamoto et al., ) . (v) finally, long term in vitro culture conditions for pseudo-stratified airway epithelium organoids, derived from healthy and diseased adult humans suitable to assess virus infectivity (sachs et al., ; van der vaart and clevers, ; zhou et al., ) have been pioneered; unfortunately, these airway organoids expressed virtually no lung mesenchyme or alveolar signature. what remains unclear is if any of these models accurately recapitulate the immunopathologic phenotype that is seen in the lungs in covid- . we present a rigorous transdisciplinary approach that systematically assesses an adult lung organoid model that is propagable, personalized and complete with both proximal airway and distal alveolar cell types against existing models that are incomplete, and we cross-validate them all against covid- patient-derived respiratory samples. 
findings surprisingly show that cellular crosstalk between the proximal and distal components is necessary to emulate how sars-cov- causes diffuse alveolar pneumocyte damage; the proximal airway mounts a sustained viral infection, but it is the distal alveolar pneumocytes that mount the overzealous host response that has been implicated in fatal disease. to determine which cell types in the lungs might be most readily infected, we began by analyzing a human lung single-cell sequencing dataset (gse ) for the levels of expression of angiotensin-converting enzyme-ii (ace ) and transmembrane serine protease (tmprss ), the two receptors that have been shown to be the primary sites of entry for sars-cov- (hoffmann et al., ) . the dataset was queried with widely accepted markers of all the major cell types (see table ). alveolar epithelial type (at ), ciliated and club cells emerged as the cells with the highest expression of both receptors (fig a; fig s a) . these observations are consistent with published studies demonstrating that ace expression is indeed highest in at and ciliated cells (jia et al., ; mulay et al., ; zhao et al., ) . in a cohort of deceased covid- patients, we observed by h&e (fig s b) that gas-exchanging flattened at pneumocytes were virtually replaced by cuboidal cells that were subsequently confirmed to be at -like cells via immunofluorescent staining with the at -specific marker, surfactant protein-c (sftpc; fig b upper panel; fig s c; top). we also confirmed that club cells express ace (fig s c; bottom) , underscoring the importance of preserving these cells in any ideal lung model of covid- . when we analyzed the lungs of deceased covid- patients, the presence of sars-cov- in alveolar pneumocytes was also confirmed, as determined by the colocalization of viral nucleocapsid protein with sftpc (fig b; lower panel; fig s d) . 
immunohistochemistry studies further showed the presence of sars-cov- virus in alveolar pneumocytes and in alveolar immune cells (fig s e) . these findings are consistent with the gathering consensus that alveolar pneumocytes support the interaction between the epithelial cells and inflammatory cells recruited to the lung; via mechanisms that remain unclear, they are generally believed to contribute to the development of acute lung injury and acute respiratory distress syndrome (ards), the severe hypoxemic respiratory failure during covid- (hou et al., ; spagnolo et al., ) . because prior work has demonstrated that sars-cov- infectivity in patient-derived airway cells is highest in the proximal airway epithelium compared to the distal alveolar pneumocytes (at and at ) (hou et al., ) , and yet, it is the at pneumocytes that harbor the virus, and the at pneumocytes that are ultimately destroyed during diffuse alveolar damage, we hypothesized that both proximal airway and distal (alveolar pneumocyte) components might play distinct roles in the respiratory system to mount the so-called viral infectivity and host immune response phases of the clinical symptoms observed in covid- (chen and li, ) . because no existing lung model provides such proximodistal cellular representation (table ) , and hence, may not recapitulate with accuracy the clinical phases of covid- , we first sought to develop a lung model that is complete with both proximal and distal airway epithelia using adult stem cells that were isolated from deep lung biopsies. lung organoids were generated using the protocol outlined in fig c and methods. organoids grown in d cultures were subsequently dissociated into single cells to create dmonolayers (either maintained submerged in media or used in ali model) for sars-cov- infection, followed by rna seq analysis. primary airway epithelial cells and hipsc-derived alveolar type-ii (at ) pneumocytes were used as additional models (fig d; left panel) . 
each of these transcriptomic datasets was subsequently used to cross-validate our ex-vivo lung models of sars-cov- infection with the human covid- autopsy lung specimens (fig d; right panel) to objectively vet each model for their ability to accurately recapitulate the gene expression signatures in the patient-derived lungs. three lung organoid lines were developed from deep lung biopsies obtained from the normal regions of lung lobes surgically resected for lung cancer; both genders, smokers and non-smokers were represented (fig s a; table ). three different types of media were compared (fig s b) ; the composition of these media was inspired either by their ability to support adult-stem cell-derived mixed epithelial cellularity in other organs (like the gastrointestinal tract (miyoshi and stappenbeck, ; sato et al., ; sayed et al., ) , or rationalized based on published growth conditions for proximal and distal airway components (gotoh et al., ; sachs et al., ; van der vaart and clevers, ) . a growth condition that included conditioned media from l-wrn cells which express wnt , r-spondin and noggin, supplemented with recombinant growth factors, which we named as 'lung organoid expansion media' emerged as superior compared to alveolosphere media-i and ii (jacob et al., ; yamamoto et al., ) (details in the methods), based on its ability to consistently and reproducibly support the best morphology and growth characteristics across multiple attempts to isolate organoids from lung tissue samples. three adult lung organoid lines (alo - ) were developed using the expansion media, monitored for their growth characteristics by brightfield microscopy and cultured with similar phenotypes until p and beyond (fig s c-d) . the d morphology of the lung organoid was also assessed by h&e staining of slices cut from formalin-fixed paraffin-embedded (ffpe) cell blocks of histogel-embedded alo - (fig s e) . 
to determine if all the major lung epithelial cells (illustrated in fig a) are present in the organoids, we analyzed various cell-type markers by qrt-pcr (fig b-h) . all three alo lines had a comparable level of at cell surfactant markers (compared against hipsc-derived at cells as positive control) and a significant amount of at , as determined using the marker aqp . alos also contained basal cells (as determined by the marker itga ), ciliated cells (as determined by the marker foxj ), club cells (as determined by the marker scgb a ) and stem cells (as determined by marker p -ngfr). as expected, the primary human bronchial epithelial cells (nhbe) had significantly higher expression of basal cell markers than the alo lines (hence, served as a positive control), but they lacked stemness and club cells (hence, served as a negative control). the presence of all cell types was also confirmed by assessing protein expression of various cell types within organoids grown in d cultures. two different approaches were used-(i) slices cut from ffpe cell blocks of histogel-embedded alo lines (fig i-j) or (ii) alo lines grown in -well chamber slides were fixed in matrigel (fig k) , stained, and assessed by confocal microscopy. such staining not only confirmed the presence of all cell types in each alo line but also demonstrated the presence of more than one cell type (i.e., mixed cellularity) of proximal (basal-krt ) and distal (at /at markers) within the same organoid structure. for example, at and basal cells, marked by sftpb and krt , respectively, were found in the same d-structure (fig j, interrupted curved line). similarly, ciliated cells and goblet cells stained by ac-tub and muc , respectively, were found to coexist within the same structure (fig j, interrupted box; fig k, arrow) . the presence of heterogeneous cellularity was documented in all three alo lines (see multiple additional examples in fig s ) . 
to model respiratory infections such as covid- , it is necessary for pathogens to be able to access the apical surface. many groups have modeled such access by dissociating d organoids into single cells and plating them as d-monolayers (duan et al., ; han et al., ; huang et al., ; mulay et al., ; sachs et al., ; zhou et al., ) . because the loss of dimensionality can have a major impact on cellular proportions and impact disease-modeling in unpredictable ways, we assessed the impact of the d-to- d conversion on cellularity by rna seq analyses. two commonly encountered methods of growth in d-monolayers were tested: (i) monolayers polarized on trans-well inserts but submerged in growth media, and ii) monolayers were grown at the air-liquid interface (popularly known as the 'ali model '(dvorak et al., ; prytherch et al., ) ) for -days to differentiate into the mucociliary epithelium (see fig a; fig s a-e) . the submerged d-monolayers had several regions of organized vacuolated-appearing spots (fig s a, d; arrow), presumably due to morphogenesis and cellular organization even in d. the ali-monolayers appeared to be progressively hazier with time after air-lift, likely due to the accumulation of secreted mucin (fig s e) , and formed an intact epithelial barrier, as determined by trans-epithelial electrical resistance (teer) (fig s c) . rna seq datasets were analyzed using the same set of cell markers, as we used in fig a ( table ). cell-type deconvolution of our dataset using cibersortx (https://cibersortx.stanford.edu/runcibersortx.php) showed that cellular proportions in the human lung tissues were also relatively well-preserved in organoids grown in d over several passages (fig b; left) ; both showed a mixed population of simulated alveolar, basal, club, ciliated and goblet cells. 
when d organoids were dissociated and plated as d monolayers on transwells, the at signatures were virtually abolished with a concomitant and prominent emergence of at signatures, suggesting that growth in d-monolayers favors differentiation of at cells into at cells (shami and evans, ) (fig b; middle) . a compensatory reduction in proportion was also observed for the club, goblet and ciliated cells. the same organoids, when grown in long-term d culture conditions in the ali model, showed a strikingly opposite pattern; alveolar signatures were almost entirely lost, with a concomitant increase in ciliated and goblet cell signatures (fig b; right). these findings are consistent with the well-established notion that ali conditions favor growth as pseudo-stratified mucociliary epithelium (dvorak et al., ; prytherch et al., ) . as an alternative model for use as monolayers for viral infection, we developed hipsc-derived at cells and alveolospheres (fig c) , using established protocols (huang et al., ) . because they were grown in the presence of chir (an aminopyrimidine derivative that is a selective and potent wnt agonist) (abdelwahab et al., ; jacob et al., ; yamamoto et al., ) , which probably inhibits the at →at differentiation, these monolayers were enriched for at and devoid of at cells (fig d) . the multicellularity of lung organoid monolayers was also confirmed by immunofluorescence staining and confocal microscopy of the submerged and ali monolayers, followed by the visualization of cell markers in either max-projected z-stacks (fig e; left) or orthogonal views of the same (fig e; right) . as expected, markers for the same cell type (i.e., sftpb and sftpc, both at markers) colocalize, but markers for different cell types do not. submerged monolayers showed the prominent presence of both at (aqp -positive) and at cells. 
compared to the submerged monolayers, the ali model showed a significant increase in the ciliated epithelium (as determined by ac tub; compare ac tub stained panels in fig e with f) . this increase was associated with a concomitant decrease in both krt -stained basal and sftpc-positive at cells (fig f) . taken together, the immunofluorescence images are in agreement with the rna seq dataset; both demonstrate that the short-term submerged monolayer favors distal differentiation (at →at ), whereas the day ali model favors proximal mucociliary differentiation. it is noteworthy that these distinct differentiation phenotypes originated from the same d-organoids despite the seeding of cells in the same basic media composition (i.e., pneumacult tm ) prior to switching over to an ali-maintenance medium for the prolonged growth at the air-liquid interface; the latter is a well-described methodology that promotes differentiation into ciliated and goblet cells (rayner et al., ) . because the lung organoids with complete proximodistal cellularity could be differentiated into either distal-predominant monolayers in submerged short-term cultures or proximal-predominant monolayers in long-term ali cultures, this provided us with an opportunity to model the respiratory tract and assess the impact of the virus along the entire proximal-to-distal gradient. we first asked how viral infectivity varies in the two models. because multiple groups have shown the importance of the ciliated airway cells for infectivity (i.e., viral entry, replication and apical release (hou et al., ; hui et al., ; milewska et al., ; zhu et al., ) ), we infected monolayers of human airway epithelia as positive controls (see fig s f-i) . at cells, which express high levels of viral entry receptors ace and tmprss (fig a, fig s a) , have been shown to be proficient in viral entry, but are least amenable to sustained viral release and infectivity (hou et al., ; hui et al., ) . 
to this end, we infected monolayers of hipsc-derived homogeneous cultures of at cells as secondary controls (see fig s j- l). infection was carried out using the washington strain of sars-cov- , usa-wa / [bei resources nr- (rogers et al., )]. as expected, the d-lung monolayers we generated, both the submerged and the ali models, were readily infected with sars-cov- (fig s a), as determined by the presence of the viral envelope gene (e-gene; fig g); however, the kinetics of viral amplification differed. when expressed as levels of e gene normalized to the peak values in each model (fig g), the kinetics of the ali-monolayer model mirrored that of the primary airway epithelial monolayers; both showed a slow start ( - hpi) followed by an exponential increase in e gene levels from to hpi. the submerged monolayer model showed sustained viral infection during the - hpi window (fig s a; left). the - hpi window was notably missing in monolayers of hipsc-derived at cells (fig g; fig s a; right). when we specifically analyzed the kinetics of viral e gene expression during the late phase ( - hpi window), we found that proximal airway models [human bronchial airway epi (hbepc)] were more permissive to viral replication than distal models [human small airway epi (hsaepc) and at ] (fig s b); the alo monolayers showed intermediate sustained infectivity (albeit with variability). all models showed extensive cell death and detachment by hrs and, hence, were not analyzed. confocal imaging of infected alo monolayers with anti-sars-cov- nucleocapsid protein antibody showed that submerged alo monolayers did indeed show progressive changes during the to h window after infection. taken together, these findings show that sustained viral infectivity is best simulated in monolayers that resemble the proximal mucociliary epithelium, i.e., d-monolayers of lung organoids grown as ali models and the primary airway epithelia.
because prior studies conducted in patient-derived airway cells (hou et al., ) mirror what we see in our monolayers, we conclude that proximal airway cells within our mixed-cellular model appear to be sufficient to model viral infectivity in covid- . next, we asked if the newly generated lung models accurately recapitulate the host immune response in covid- . to this end, we analyzed the infected alo monolayers (both the submerged and ali variants) as well as the airway epithelial (hsaepc) and at monolayers by rna seq and compared them all against the transcriptome profile of lungs from deceased covid- patients. a publicly available dataset (gse ) (nienhold et al., ) , comprised of lung transcriptomes from victims deceased either due to non-infectious causes (controls) or due to covid- , was first analyzed for differentially expressed genes (fig a-b) . this cohort was chosen as a test cohort over others because it was the largest one available at the time of this study with appropriate postmortem control samples. differentially expressed genes showed an immunophenotype that was consistent with what is expected in viral infections (fig c; table ; fig s ) , and showed overrepresentation of pathways such as interferon, immune, and cytokine signaling (fig d; table ; fig s ) . differentially expressed gene signatures and reactome pathways that were enriched in the test cohort were fairly representative of the host immune response observed in patient-derived respiratory samples in multiple other validation cohorts; the signature derived from the test cohort could consistently classify control (normal) samples from covid- samples (roc auc . to . across the board; fig e) . the most notable finding is that the patient-derived signature was able to perfectly classify the epcam-sorted epithelial fractions from the bronchoalveolar lavage fluids of infected and healthy subjects (roc auc . 
; gse -epithelium (liao et al., )), suggesting that the respiratory epithelium is a major site where the host immune response is detected in covid- . when compared to existing organoid models of covid- , we found that the patient-derived covid- -lung signature was able to perfectly classify infected vs. uninfected late passages (> ) of hipsc-derived at / monolayers (gse ) (han et al., ) and infected vs. uninfected liver and pancreatic organoids (fig f). the covid- -lung signature failed to classify commonly used respiratory models, e.g., a cells and bronchial organoids, as well as intestinal organoids (fig f). a similar analysis on our own lung models revealed that the covid- lung signature was induced in submerged monolayers with distal-predominant at →at differentiation, but not in the proximal-predominant ali model (roc auc . and . , respectively; fig g). the ali model and the small airway epithelia, both models that mimic the airway epithelia (and lack alveolar pneumocytes; see fig b), failed to mount the patient-derived immune signatures (fig h; left). these findings suggested that the presence of alveolar pneumocytes is critical for emulating the host response. to our surprise, induction of the covid- -lung signature also failed in hipsc-derived at monolayers (fig h; right), indicating that at cells are unlikely to be the source of such a host response. these findings indicate that neither proximal airway cells nor at cells, when alone, are sufficient to induce the host immune response that is encountered in the lungs of covid- patients. next, we analyzed the datasets from our alo monolayers for differentially expressed genes when challenged with sars-cov- (fig a-b).
genes and pathways upregulated in the infected lung organoid-derived monolayer models overlapped significantly with those that were upregulated in the covid- lung [...] missing components, the model-derived deg signature was sufficient to consistently and accurately classify diverse cohorts of patient-derived respiratory samples (roc auc ranging from . to . ; fig f); the model-derived deg signature was significantly induced in covid- samples compared to normal controls (fig g-h). most importantly, the model-derived deg signature was significantly induced in the epithelial cells recovered from bronchoalveolar lavage (fig i). taken together, these cross-validation studies from disease to model (fig ) and vice versa (fig ) provide an objective assessment of the match between the host response in covid- lungs and our submerged alo monolayers. such a match was not seen in the case of the other models, e.g., the proximal airway-mimic ali model, hsaepc monolayer, or hipsc-derived at models. because the submerged alo monolayers contained proximal airway epithelia (basal cells) and also promoted at →at differentiation, these findings demonstrate that mixed cellular monolayers can mimic the host response in covid- . a subtractive analysis revealed that the cell type that is shared between models which showed induction of host response signatures (i.e., alo submerged monolayers and gse (han et al., ); fig f) but is absent in models that do not show such a response (hu bronchial organoids, small airway epi, ali-model of alo) is at . we conclude that distal differentiation from at →at , a complex process that is composed of distinct intermediates (choi et al., ), is essential for modeling the host immune response in covid- .

both proximal and distal airway epithelia are required to mount the overzealous host response in covid-

we next asked which model best simulated the overzealous host immune response that has been widely implicated in fatal covid- .
to this end, we relied upon a recently described artificial intelligence (ai)-guided definition of the nature of the overzealous response in fatal covid- (sahoo et al., ). using ace as a seed gene, a -gene signature was identified and validated as an invariant immune response that was shared among all respiratory viral pandemics, including covid- (fig a). a subset of genes within the -gene signature was subsequently identified as a determinant of disease severity/fatality; these genes represented translational arrest, senescence, and apoptosis. these two signatures, referred to as the vip ( -gene) and severe vip ( -gene) signatures, were used as a computational framework to first vet existing sars-cov- infection models that have been commonly used for therapeutic screens (fig b-d). surprisingly, we found that each model fell short in one way or another. for example, vero e , which is a commonly used cultured cell model, showed a completely opposite response; instead of being induced, both the -gene and -gene vip signatures were suppressed in infected vero e monolayers (fig b). similarly, neither vip signature was induced in the case of sars-cov- challenged human bronchial organoids (suzuki et al., ) (fig c). finally, in the case of the hipsc-derived at / organoids, which recapitulated the covid- -lung derived immune signatures (in fig f), the -gene vip signature was induced significantly (fig d; top), but the -gene severity signature was not (fig d; bottom). these findings show that none of the existing models captures the overzealous host immune response that has been implicated in fatal disease. our lung models showed that both the -and -gene vip signatures were induced significantly in the submerged alo-derived monolayers that had distal differentiation (fig e; left), but not in the proximal-mimic ali model (fig e; right). neither signature was induced in monolayers of small airway epithelial cells (fig f) or hipsc-derived at cells (fig g).
taken together with our infectivity analyses, these findings demonstrate that although the proximal airway epithelia and at cells may be infected, and as described by others (dye et al., ; hou et al., ) , may be vital for mounting a viral response and for disease transmission, these cells alone cannot mount the overzealous host immune response that is associated with the fatal disease. similarly, even though the alveolar pneumocytes, at and at cells, are sufficient to mount the host immune response, in the absence of proximal airway components, they too are insufficient to recapitulate the severe vip signature that is characterized by cellular senescence and apoptosis. however, when both proximal and distal components are present, i.e., basal, ciliated and at cells, the model mimicked the overzealous host immune response in covid- (fig h) . the most important discovery we report here is the creation of adult lung organoids that are complete with both proximal airway and distal alveolar epithelia; these organoids can not only be stably propagated and expanded in d cultures but also used as monolayers of mixed cellularity for modeling viral and host immune responses during respiratory viral pandemics. furthermore, an objective analysis of this model and other existing sars-cov- -infected lung models against patient-lung derived transcriptomes showed that the model which most closely emulates the elements of viral infectivity, lung injury, and inflammation in covid- is one that contained both proximal and distal alveolar signatures (fig h) , whereas, the presence of just one or the other fell short. there are three important impacts of this work. first, successful modeling of the human lung organoids that are complete with both proximal and distal signatures has not been accomplished before. 
the multicellularity of the lung has been a daunting challenge that many experts have tried to recreate in vitro; in fact, the demand for perfecting such a model has always remained high, not just in the current context of the covid- pandemic but also with the potential of future pandemics. we have provided the evidence that the organoids that were created using our methodology retain proximal and distal cellularity throughout multiple passages and even within the same organoid. although a systematic design of experiment (doe) approach (bukys et al., ) was not involved in getting to this desirable goal, a rationalized approach was taken. for example, a wnt/rspondin/noggin-containing conditioned media was used as a source of the so-called 'niche factors' for any organoid growth (sato and clevers, ) . this was supplemented with recombinant fgf / ; fgf is known to help cell proliferation and differentiation and is required for normal branching morphogenesis (padela et al., ) , whereas fgf helps in cell maturation (rabata et al., ) and in alveolar regeneration upon injury (yuan et al., ) . together, they are likely to have directed the differentiation toward distal lung lineages (hence, the preservation of alveolar signatures). the presence of both distal alveolar and proximal ciliated cells was critical: proximal cells were required to recreate sustained viral infectivity, and the distal alveolar pneumocytes, in particular, the ability of at cells to differentiate into at pneumocytes was essential to recreate the host response. it is possible that the response is mediated by a distinct at -lineage population, i.e., damageassociated transient progenitors (datps), which arise as intermediates during at →at differentiation upon injury-induced alveolar regeneration (choi et al., ) . 
although somewhat unexpected, the role of at pneumocytes in mounting innate immune responses has been documented before in the context of bacterial pneumonia (wong and johnson, ; yamamoto et al., ) . in work (huang et al., ) that was published during the preparation of this manuscript, authors used long-term ali models of hipsc-derived at monolayers (in growth conditions that inhibit at →at differentiation, as we did here for our at model) and showed that sars-cov- induces iat -intrinsic cytotoxicity and inflammatory response, but failed to induce type interferon pathways (ifn α/β). it is possible that prolonged culture of iat pneumocytes gives rise to some datps but cannot robustly do so in the presence of inhibitors of at differentiation. this (spatially segregated viral and host immune response) is a common theme across many lung infections (including bacterial pneumonia and other viral pandemics (chan et al., ; hou et al., ; taubenberger and morens, ; weinheimer et al., ) and hence, this mixed cellularity model is appropriate for use in modeling diverse lung infections and respiratory pandemics to come. second, among all the established lung models so far, ours features key properties that are desirable whenever disease models are being considered for their use in htp modes for rapid screening of candidate therapeutics and vaccines -(i) reproducibility, propagability and scalability, (ii) cost-effectiveness, (iii) personalization, and (iv) modularity, with the potential to be used in multi-dimensional complex lung models with other cell types if/when needed. we showed that the protocol we have optimized supports isolation, expansion and propagability at least up to - passages (at the time of submission of this work), with documented retention of proximal and distal airway components up to p (by rna seq). feasibility has also been established for scaling up for use in -well htp assays. 
we also showed that the protocols for generating lung organoids could be reproduced in both genders and regardless of the donor's smoking status; consistency in outcome and growth characteristics was observed across all isolation attempts. the alos are also cost-effective; the need for exclusive reliance on recombinant growth factors was replaced at least in part with conditioned media from a commonly used cell line (l-wrn/ atcc® crl- cells). such media contain a cocktail of wnt, r-spondin, and noggin that is stable from batch to batch and have been shown to improve reproducibility in the context of gi organoids in independent laboratories (vandussen et al., ). the use of conditioned media may not only have made our model more cost-effective than others but also likely improved the rigor and reproducibility in lung modeling and research. because the model is propagable, repeated ipsc-reprogramming (another expensive step) is also eliminated, further cutting costs compared to many other models. as for personalization, our model is derived from adult lung stem cells from deep lung biopsies; each organoid line was established from one patient. by avoiding ipscs or epscs, this model not only captures genetics but also retains organ-specific epigenetic programming in the lung, and hence, potentially additional programming that may occur in disease (such as in the setting of chronic infection, injury, inflammation, somatic mutations, etc.). the ability to replicate donor phenotype and genotype in vitro allows for potential use as pre-clinical human models for phase ' ' clinical trials.
as for modularity, by showing that the d lung organoids could be used as polarized monolayers on transwells to allow infectious agents to access the apical surface (in this case, sars-cov- ), we demonstrate that the organoids have the potential to be reverse-engineered with additional components in a physiologically relevant spatially segregated manner: for example, immune and stromal cells can be placed in the lower chamber to model complex lung diseases that are yet to be modeled and have no cure (e.g., idiopathic pulmonary fibrosis, etc). third, the value of the alo models is further enhanced due to the availability of companion readouts/ biomarkers (e.g., vip signatures in the case of respiratory viral pandemics, or monitoring the e gene, or viral shedding, etc.) that can rapidly and objectively vet treatment efficacy based on set therapeutic goals. of these readouts, the host response, as assessed by vip signatures, is a key vantage point because an overzealous host response is what is known to cause fatality. recently, a systematic review of the existing pre-clinical animal models revealed that most of the animal models of covid- recapitulated mild patterns of human covid- ; no severe illness associated with mortality was observed, suggesting a wide gap between covid- in humans (spagnolo et al., ) and animal models (ehaideb et al., ) . the model revealed here, in conjunction with the vip signatures described earlier (sahoo et al., ) , could serve as a pre-clinical model with companion diagnostics to identify drugs that target both the viral and host response in pandemics. milewska, a., kula-pacurar, a., wadas, j., suder, a., szczepanski, a., dabrowska, a., owczarek, k., marcello, a., ochman, m., stacel, t., et al. ( ) . replication of severe acute respiratory syndrome coronavirus in human respiratory epithelium. j virol . sayed, i.m., suarez, k., lim, e., singh, s., pereira, m., ibeawuchi, s.r., katkar, g., dunkel, y., mittal, y., chattopadhyay, r., et al. 
( ) . host engulfment pathway controls inflammation in inflammatory bowel disease. zhu, n., wang, w., liu, z., liang, c., wang, w., ye, f., huang, b., zhao, l., wang, h., zhou, w., et al. ( ) . morphogenesis and cytopathic effect of sars-cov- infection in human airway epithelial cells. nat commun , . whisker plots display relative levels of ace expression in various cell types in the normal human lung. the cell types were annotated within a publicly available single-cell sequencing dataset (gse ) using genes listed in table s . p values were analyzed by one-way anova and tukey post hoc test. b. ffpe sections of the human lung from normal and deceased covid- patients were stained for sftpc, alone or in combination with to generate adult healthy lung organoids, fresh biopsy bites were prospectively collected after surgical resection of the lung by the cardiothoracic surgeon. before collection of the lung specimens, each tissue was sent to a gross anatomy room where a pathologist cataloged the area of focus, and the extra specimens were routed to the research lab in human transport media (htm, advanced dmem/f- , mm hepes, x glutamax, x penicillin-streptomycin, m y- ) for cell isolation. deidentified lung tissues obtained during surgical resection, that were deemed excess by clinical pathologists, were collected using an approved human research protocol (irb# ; pi: thistlethwaite). isolation and biobanking of organoids from these lung tissues were carried out using an approved human research protocol (irb# : pi ghosh and das) that covers human subject research at the uc san diego humanoid center of research excellence (core). for all the deidentified human subjects, information including age, gender, and previous history of the disease, was collected from the chart following the rules of hipaa and described in the table. 
a portion of the same lung tissue specimen was fixed in % zinc-formalin for at least hrs followed by submersion in % etoh until embedding in ffpe blocks. the lung specimens from covid- positive human subjects were collected through autopsy (the study was irb exempt). all donations to this trial were obtained after telephone consent followed by written email confirmation by the next of kin/power of attorney per california state law (no in-person visitation could be allowed into our covid- icu during the pandemic). the team member followed the cdc guidelines for covid and the autopsy procedures ((cap), ; (cdc), )). lung specimens were collected in % zinc-formalin and stored for hrs before processing for histology. patient characteristics are listed in the table. autopsy # was a standard autopsy performed by anatomical pathology in the bsl autopsy suite. the patient expired and his family consented for autopsy. after hrs, the lungs were removed and immersion fixed whole in % formalin for hrs and then processed further. lungs were only partially fixed at this time (about % fixed in thicker segments) and were sectioned into small - cm chunks and immersed in % formalin for further investigation. autopsy # and # were collected from rapid post-mortem lung biopsies. the procedure was performed in the jacobs medical center icu (all of the icu rooms have a pressure-negative environment, with air exhausted through hepa filters [biosafety level (bsl )] for isolation of sars-cov- virus). biopsies were performed - hrs after patient expiration. the ventilator was shut off to reduce the aerosolization of viral particles at least hr after the loss of pulse and before sample collection. every team member had personal protective equipment in accordance with the university policies for procedures on patients with covid- (n mask + surgical mask, hairnet, full face shield, surgical gowns, double surgical gloves, booties). 
lung biopsies were obtained after l-thoracotomy in the th intercostal space by the cardiothoracic surgery team. samples were taken from the left upper lobe (lul) and left lower lobe (lll) and then sectioned further. a previously published protocol was modified to isolate lung organoids from human subjects (sachs et al., ; zhou et al., ) . briefly, normal human lung specimens were washed with pbs/ x penicillinstreptomycin and minced with surgical scissors. tissue fragments were resuspended in ml of wash buffer (advanced dmem/f- , mm hepes, x glutamax, x penicillin-streptomycin) containing mg/ml collagenase type i (thermo fisher, usa) and incubated at °c for approximately hr. during incubation, tissue pieces were sheared every min with a ml serological pipette and examined under a light microscope to monitor the progress of digestion. when - % of single cells were released from connective tissue, the digestion buffer was neutralized with ml wash buffer with added % fetal bovine serum; the suspension was passed through a -µm cell strainer and centrifuged at rcf. remaining erythrocytes were lysed in ml red blood cell lysis buffer (invitrogen) at room temperature for min, followed by the addition of ml of wash buffer and centrifugation at rcf. cell pellets were resuspended in cold matrigel (corning, usa) and seeded in µl droplets on a well tissue culture plate. the plate was inverted and incubated at °c for min to allow complete polymerization of the matrigel before the addition of ml lung expansion medium per well. lung expansion media was prepared by modifying the gi-organoid media ( % conditioned media, prepared from l-wrn cells with wnt a, r-spondin, and noggin, atcc-crl- tm ) (ghosh et al., ; sayed et al., a; sayed et al., b; sayed et al., c) with a proprietary cocktail from the humanoid core containing b , tgf-β receptor inhibitor, antioxidants, p mapk inhibitor, fgf , fgf and rock inhibitor. 
the lung expansion media was compared to alveolosphere media i (imdm and f as the basal medium with b , low concentration of kgf, monothioglycerol, gsk inhibitor, ascorbic acid, dexamethasone, ibmx, camp and rock inhibitor) and ii (f as the basal medium with added cacl , b , low concentration of kgf, gsk inhibitor, tgf-β receptor inhibitor, dexamethasone, ibmx, camp and rock inhibitor) modified from previously published literature (jacob et al., ; yamamoto et al., ). neither alveolosphere medium contains any added wnt a, r-spondin, or noggin. the composition of these media was either derived from the fundamentals of adult stem cell-derived mixed epithelial cellularity in other organs (like the gastrointestinal tract) (miyoshi and stappenbeck, ; sato et al., ; sayed et al., c), or rationalized based on published growth conditions for proximal and distal airway components (gotoh et al., ; sachs et al., ; van der vaart and clevers, ). organoids were maintained in a humidified incubator at °c/ % co , with a complete media change performed every days. after the organoids reached confluency between - days, organoids were collected in pbs/ . mm edta and centrifuged at rcf for min. organoids were dissociated in ml tryple select (gibco, usa) per well at °c for - min and mechanically sheared. wash buffer was added at a : , tryple to wash buffer ratio. the cell suspension was subsequently centrifuged, resuspended in matrigel, and seeded at a : ratio. lung organoids were biobanked and passage - cells were used for experiments. subculture was performed every - days. lung-organoid-derived monolayers were prepared using a modified protocol of gi-organoid-derived monolayers (ghosh et al., ; sayed et al., a; sayed et al., b; sayed et al., c). briefly, transwell inserts ( . mm diameter, . µm pore size, corning) were coated in matrigel diluted in cold pbs at a : ratio and incubated for hr at room temperature.
confluent organoids were collected in pbs/edta on day and dissociated into single cells in tryple for - min at °c. following enzymatic digestion, the cell suspension was mechanically sheared through vigorous pipetting with a µl pipette and neutralized with wash buffer. the suspension was human ipsc-derived alveolar epithelial type cells (ihaepc ) were obtained from cell applications inc. and cultured in growth media (i k- , cell applications inc.) according to the manufacturer's instructions. all the primary cells were used within early passages ( - ) to avoid any gradual disintegration of the airway epithelium with columnar epithelial structure and epithelial integrity. lung organoid-derived monolayers or primary airway epithelial cells either in well plates or in transwells were washed twice with antibiotic-free lung wash media. e pfu of sars-cov- strain usa-wa / (bei resources nr- ) in complete dmem was added to the apical side of the transwell and allowed to incubate for , , and hrs at °c and % co . after incubation, the media was removed from the basal side of the transwell. the apical side of the transwells was then washed twice with (antibiotic-free lung wash media) and then twice with pbs. trizol™ reagent (thermo fisher ) was added to the well and incubated at ˚c and % co for min. the trizol™ reagent was removed and stored at - ˚c for rna analysis. organoids and monolayers used for lung cell type studies were lysed using rna lysis buffer followed by rna cov studies were lysed in tri-reagent and rna was extracted using zymo research direct-zol rna miniprep. organoid and monolayer cell-type gene expression was measured by qrt-pcr using x sybr green qpcr master mix. cdna was amplified with gene-specific primer/probe set for the lung cell type markers and qscript cdna supermix ( x). qrt-pcr was performed with the applied biosystems quantstudio real-time pcr system. cycling parameters were as follows: °c for s, followed by cycles of s at °c and s at °c. 
all samples were assayed in triplicate and eukaryotic s ribosomal rna was used as a reference.

primers for lung cell type markers (cell type; marker; primer; sequence):
-; itga ; itga f; cgaaaccaaggttctgagccc
-; itga ; itga r; cttggatctccactgaggcagt
goblet; muc ac; muc ac f; ggaactgtggggacagctctt
goblet; muc ac; muc ac r; gtcacattcctcagcgaggtc
cilia; foxj ; foxj f; actcgtatgccacgctcatctg
cilia; foxj ; foxj r; gagacaggttgtggcggattga
club cell; scgb a ; scgb a f; caaaagcccagagaaagcatc
club cell; scgb a ; scgb a r; cagttggggatcttcagcttc
alveolar type ; aqp ; aqp f; tacggtgtggcaccgctcaatg
alveolar type ; aqp ; aqp r; agtcagtggaggcgaagatgca
alveolar type ; pdpn; pdpn f; gtgccgaagatgatgtggtgac
alveolar type ; pdpn; pdpn r; ggactgtgctttctgaagttggc
alveolar type ; p rx ; p rx f; gtggcggattatgtgataccagc
alveolar type ; p rx ; p rx r; cacacagtggtcgcatctggaa
alveolar type ; abca ; abca f; cttgacagtcgcagagcacctt
alveolar type ; abca ; abca r; ctccgtgagttccacttgtcct
alveolar type ; sftpa ; sftpa f; cacctggagaaatgccatgtcc
alveolar type ; sftpa ; sftpa r; aagtcgtggagtgtggcttgga
alveolar type ; sftpc; sftpc f; gtcctcatcgtcgtggtgattg
alveolar type ; sftpc; sftpc r; agaaggtggcagtggtaaccag
alveolar type ; lamp ; lamp f; tgggagcctatttgaccgtctc
alveolar type ; lamp ; lamp r; gctgacaactggaggctctgtt

sars-cov- infectivity was assessed by qpcr using taqman assays and taqman universal pcr master mix as done before (corman et al., ; lamers et al., ). cdna was amplified with gene-specific primer/probe set for the e gene and qpcr was performed with the applied biosystems rrna. cycling parameters were as follows: °c for s, followed by cycles of s at °c and s at °c. all samples were assayed in triplicate and the eukaryotic s ribosomal rna gene was used as a reference.

organoids and lung organoid-derived monolayers were fixed by either: ( ) % pfa at room temperature for min and quenched with mm glycine for min, ( ) ice-cold % methanol at - °c for min, ( ) ice-cold % methanol on ice for min. subsequently, samples were permeabilized and blocked for hrs using an in-house blocking buffer ( mg/ml bsa and . % triton x- in pbs), as described previously (lopez-sanchez et al., ).
primary antibodies were diluted in blocking buffer and allowed to incubate overnight at °c; secondary antibodies were diluted in blocking buffer and allowed to incubate for hrs in the dark. antibody dilutions are listed in the supplementary key resource table. prolong glass was used as a mounting medium. # thick coverslips were applied to slides and sealed. samples were stored at °c until imaged. ffpe embedded organoid and lung tissue sections underwent antigen retrieval as previously described in methods for immunohistochemistry staining. after antigen retrieval and cooling in di water; samples were permeabilized and blocked in blocking buffer and treated as mentioned above for immunofluorescence. images were acquired at room temperature with leica tcs spe confocal and with dmi b microscope using the leica las-af software. images were taken with a × oil-immersion objective using -, -, -nm laser lines for excitation. z-stack images were acquired by successive z-slices of µm in the desired confocal channels. fields of view that were representative and/or of interest were determined by randomly imaging different fields. zslices of a z-stack were overlaid to create maximum intensity projection images; all images were processed using fiji (image j) software. organoids were seeded on a layer of matrigel in -well plates and grown for - days. once mature, organoids were fixed in % pfa at room temperature for min and quenched with mm glycine for min. organoids were gently washed with pbs and harvested using a cell scraper. organoids were resuspended in pbs using wide-bore µl pipette tips. organoids were stained using gill's hematoxylin for min for easier ffpe embedding and sectioning visualization. hematoxylin stained organoids were gently washed in pbs and centrifuged and excess hematoxylin was aspirated. organoids were resuspended in °c histogel and centrifuged at °c for min. 
histogel embedded organoid pellets were allowed to cool to room temperature and stored in % ethanol at °c until ready for ffpe embedding by lji histology core. successive ffpe embedded organoid sections were cut at a µm thickness and fixed on to microscope slides. for sars cov-nucleoprotein (np) antigen retrieval, slides were immersed in tris-edta buffer (ph . ) and boiled for min at °c inside a pressure cooker. endogenous peroxidase activity was blocked by incubation with % h o for minutes. to block non-specific protein binding . % goat serum was added. tissues were then incubated with a rabbit sars cov-np antibody (sino biological, see supplementary key resource table) for . hrs at room temperature in a humidified chamber and then rinsed with tbs or pbs times, for min each. sections were incubated with horse anti-rabbit igg secondary antibodies for min at room temperature and then washed with tbs or pbs times for min each. sections were incubated with dab and counterstained with hematoxylin for sec. annotations were computed using tximport and biomart r package. a custom python script was used to organize the data and log reduced using log (tpm+ ). the raw data and processed data are deposited in gene expression omnibus under accession no gse , and gse . publicly available covid- gene expression databases were downloaded from the national center for biotechnology information (ncbi) gene expression omnibus website (geo) (barrett et al., ; barrett et al., ; edgar et al., ) . if the dataset is not normalized, rma (robust multichip average) (irizarry et al., a; irizarry et al., b) is used for microarrays and tpm (transcripts per millions) (li and dewey, ; pachter, ) is used for rnaseq data for normalization. we used log (tpm+ ) to compute the final log-reduced expression values for rnaseq data. accession numbers for these crowdsourced datasets are provided in the figures and manuscript. 
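the tpm normalization and log reduction described above can be sketched as follows. this is a minimal illustration, not the authors' custom script: the function names are hypothetical, and the log base (2) and pseudocount (1) are assumptions, because the digits in "log (tpm+ )" are missing from the extracted text.

```python
import math


def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- read counts per transcript
    lengths_kb -- transcript lengths in kilobases (same order as counts)
    """
    # Length-normalized rate per transcript.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Scale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


def log_reduce(values, base=2.0, pseudocount=1.0):
    """Log-transform expression values.

    base and pseudocount are ASSUMED defaults (log2(x + 1)); the source
    text does not preserve the actual numbers.
    """
    return [math.log(v + pseudocount, base) for v in values]


# Toy sample with three transcripts of equal length (made-up numbers).
toy_tpm = counts_to_tpm([10, 20, 70], [1.0, 1.0, 1.0])
toy_log = log_reduce(toy_tpm)
```

the pseudocount keeps transcripts with zero expression finite after the log transform, which is the usual reason for a log(x + c) form.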
all of the above datasets were processed using the hegemon data analysis framework (dalerba et al., ; dalerba et al., ; volkmer et al., ). deseq (love et al., ) was applied to uninfected and infected samples to identify up- and down-regulated genes. a gene signature score was computed from both the up- and down-regulated genes and used to order the samples. to compute the gene signature score, the genes in this list were first normalized according to a modified z-score approach centered around the stepminer threshold (formula = (expr - sthr)/ *stddev). pathway analysis used reactome (fabregat et al., ). reactome identifies signaling and metabolic molecules and organizes their relations into biological pathways and processes. violin, swarm and bubble plots were created using the python seaborn package version . . .

single cell rnaseq data from gse was downloaded from geo in the hdf feature barcode matrix format. the filtered barcode data matrix was processed using the seurat v r package (stuart et al., ) and the scanpy python package (wolf et al., ). pseudo bulk analysis of gse data was performed by adding counts from the different cell subtypes and normalizing using log (cpm+ ). epithelial cells were identified using the sftpa , sftpb, ager, aqp , sftpc, scgb a , krt , cyp f , ccdc , and tppp genes with the scina algorithm (zhang et al., ). pseudo bulk datasets were prepared by adding counts from the selected cells and normalized using log (cpm+ ).

cibersortx (https://cibersortx.stanford.edu/runcibersortx.php) was used for cell-type deconvolution of our dataset (which was normalized by cpm). as reference samples, we first used the single cell rnaseq dataset (gse ) from gene expression omnibus (geo). next, we analyzed the bulk rna seq datasets for the identification of cell types of interest using relevant gene markers (see table (ace ,tmprss ) were identified using scina algorithm. then, normalized pseudo counts were obtained with the cpm normalization method.
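the gene signature score described above can be illustrated with a small sketch. several parts are assumptions rather than the authors' implementation: `step_threshold` is a simplified two-level step fit standing in for the stepminer threshold, the scaling factor `k` is a placeholder because the number in the formula is missing from the extracted text, the formula is read as division by k times the standard deviation, and down-regulated genes are assumed to enter the score with a negative sign.

```python
from statistics import pstdev


def step_threshold(values):
    """Simplified stand-in for a StepMiner-style threshold (assumption).

    Fits the best two-level step to the sorted values (minimum squared
    error) and returns the midpoint between the two level means.
    Requires at least two values.
    """
    xs = sorted(values)
    best_sse, best_thr = float("inf"), xs[0]
    for i in range(1, len(xs)):
        low, high = xs[:i], xs[i:]
        m_low = sum(low) / len(low)
        m_high = sum(high) / len(high)
        sse = sum((x - m_low) ** 2 for x in low) + sum((x - m_high) ** 2 for x in high)
        if sse < best_sse:
            best_sse, best_thr = sse, (m_low + m_high) / 2
    return best_thr


def normalize_gene(expr, k=3.0):
    """Modified z-score: (expr - threshold) / (k * stddev).

    k is a HYPOTHETICAL scaling factor; the source formula lost its number.
    """
    thr = step_threshold(expr)
    sd = pstdev(expr)
    if sd == 0:
        return [0.0] * len(expr)
    return [(x - thr) / (k * sd) for x in expr]


def signature_score(expr_by_gene, up_genes, down_genes, k=3.0):
    """Per-sample score: sum of normalized up genes minus down genes (assumed sign)."""
    n_samples = len(next(iter(expr_by_gene.values())))
    score = [0.0] * n_samples
    for gene in up_genes:
        for i, z in enumerate(normalize_gene(expr_by_gene[gene], k)):
            score[i] += z
    for gene in down_genes:
        for i, z in enumerate(normalize_gene(expr_by_gene[gene], k)):
            score[i] -= z
    return score


# Toy data: four samples, one up-regulated and one down-regulated gene.
toy_expr = {"gene_up": [1.0, 1.0, 5.0, 5.0], "gene_down": [5.0, 5.0, 1.0, 1.0]}
toy_score = signature_score(toy_expr, ["gene_up"], ["gene_down"])
```

sorting the samples by `toy_score` places the two samples with high `gene_up` and low `gene_down` last, which is the ordering behaviour the text describes.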
the cell-type signature matrix was derived from the single cell rnaseq dataset, cell-types, and gene markers of interest. it was constructed by averaging the expression level of each marker within each cell type.

all experiments were repeated at least three times, and results were presented either as one representative experiment or as average ± s.e.m. statistical significance between datasets with three or more experimental groups was determined using one-way anova including a tukey's test for multiple comparisons. for all tests, a p-value of . was used as the cutoff to determine significance (*p < . , **p < . , ***p < . , and ****p < . ). p-values are indicated in each figure. statistical analyses were performed using graphpad prism . . some of the statistical tests were performed using r version . . ( - - ). standard t-tests were performed using the python scipy.stats.ttest_ind function (version . . ).

marker used in this work for immunofluorescence (if); marker used in this work for qpcr; obscure markers (not a lot of research relative to lung)
brca cybb krt c qb fcgr a il il cd cd xage b ccr ccr alox b cmklr mx tnfrsf ccr cxcr cdk gbp hla-g ido isg lag mad l cxcl mki snai ifitm gzmb cd cd bst bub ccl ccnb cxcl ifi ifi tdo gzma oas pou af cxcl gnly dmbt ddx tnfaip lamp kiaa melk slamf il foxm ifih ifi pdcd lg ifit ifit cxcl irf psmb ccl tnfsf isg cdkn c qa oas oas ifit top a lilrb herc tnfsf b ifi l stat

collection and submission of post-mortem specimens from deceased persons with known or suspected covid-
ncbi geo: mining millions of expression profiles--database and tools
ncbi geo: archive for functional genomics data sets--update
detection of novel coronavirus ( -ncov) by real-time rt-pcr
single-cell dissection of transcriptional heterogeneity in human colon tumors
cdx as a prognostic biomarker in stage ii and stage iii colon cancer
gene expression omnibus: ncbi gene expression and hybridization array data repository
the reactome pathway knowledgebase
the stress polarity signaling (sps) pathway serves as a marker and a target in the leaky gut barrier: implications in aging and cancer
generation of alveolar epithelial spheroids via isolated progenitor cells from human pluripotent stem cells
summaries of affymetrix genechip probe level data
exploration, normalization, and summaries of high density oligonucleotide array probe level data
derivation of self-renewing lung alveolar epithelial type ii cells from human pluripotent stem cells
sars-cov- productively infects human gut enterocytes
rsem: accurate transcript quantification from rna-seq data with or without a reference genome
giv/girdin is a central hub for profibrogenic signalling networks during liver fibrosis
moderated estimation of fold change and dispersion for rna-seq data with deseq
in vitro expansion and genetic modification of gastrointestinal stem cells in spheroid culture
models for transcript quantification from rna-seq
long-term expanding human airway organoids for disease modeling
single lgr stem cells build crypt-villus structures in vitro without a mesenchymal niche
the dna glycosylase neil suppresses fusobacterium-infection-induced inflammation and dna damage in colonic epithelial cells
helicobacter pylori infection downregulates the dna glycosylase neil , resulting in increased genome damage and inflammation in gastric epithelial cells
host engulfment pathway controls inflammation in inflammatory bowel disease
comprehensive integration of single-cell data
three differentiation states risk-stratify bladder cancer into distinct subtypes
scanpy: large-scale single-cell gene expression data analysis
long-term expansion of alveolar stem cells derived from human ips cells in organoids
scina: a semi-supervised subtyping algorithm of single cells and bulk samples
differentiated human airway organoids to assess infectivity of emerging influenza virus

name pvalue fdr interferon
signaling . e- . e- interferon alpha/beta signaling . e- . e- cytokine signaling in immune system . e- . e- immune system . e- . e- interleukin- signaling . e- . e- interferon gamma signaling . e- . e- chemokine receptors bind chemokines . e- . e- signaling by interleukins . e- . e- insulin-like growth factor- mrna binding proteins (igf bps/imps/vickzs) bind rna ifi epsti amigo ifitm slc a cmpk wars faap apol oasl ifi isg oas ifi l cd slc f ifit ifi samd l ifit parp srp p a. whisker plots display relative levels of ace expression in various cell types in the normal human lung. the cell types were annotated within a publicly available single-cell sequencing dataset (gse ) using genes listed in table s . a. schematic lists the various markers used here for qpcr and immunofluorescence to confirm the presence of all cell types in the d lung organoids here and in d monolayers later (in fig ) . b-h. bar graphs display the relative abundance of various cell type markers (normalized to s) in adult lung organoids (alo), compared to the airway (nhbe) and/or alveolar (at ) control cells, as appropriate. p-values were analyzed by one-way anova. error bars denote s.e.m; n = - datasets. i. d organoids grown in -well chamber slides were fixed, immunostained and visualized by confocal microscopy, as in fig k. scale bar = µm. c. monolayers of alo - were challenged with sars-cov- for indicated time points prior to fixation and staining for krt (red) and viral nucleocapsid protein (green) and dapi (blue; nuclei) and visualized by confocal microscopy. representative images are shown, displaying various cytopathic effects. scale bar = µm. publicly available rna seq datasets (gse ) of lung autopsies from patients who were deceased due to covid- or noninfectious causes (normal lung control) were analyzed for differential expression of genes and displayed as a heatmap. figure . reactome pathway analysis of differentially expressed genes in lung autopsies (normal vs. covid- ). 
reactome-pathway analysis of the differentially expressed genes shows the major pathways upregulated in covid- -affected lungs. top: visualization as flattened (left) and hierarchical (right, insets) reactome. bottom: visualization of the same data as tables with statistical analysis indicative of the degree of pathway enrichment. figure . differential expression analysis of rna seq datasets from lung organoid monolayers, infected or not, with sars-cov- . adult lung organoid monolayers infected or not with sars-cov- were analyzed by rna seq and differential expression analysis. differentially expressed genes are displayed as a heatmap. figure . reactome pathway analysis of differentially expressed genes in lung organoid monolayers infected with sars-cov- . reactome-pathway analysis of the differentially expressed genes shows the major pathways upregulated in sars-cov- -infected lung organoid monolayers. top: visualization as flattened (left) and hierarchical (right, insets) reactfoam. bottom: visualization of the same data as tables with statistical analysis indicative of the degree of pathway enrichment. key: cord- -g pv zz authors: proud, pamela c.; tsitoura, daphne; watson, robert j.; chua, brendon y; aram, marilyn j.; bewley, kevin r.; cavell, breeze e.; cobb, rebecca; dowall, stuart; fotheringham, susan a.; ho, catherine m. k.; lucas, vanessa; ngabo, didier; rayner, emma; ryan, kathryn a.; slack, gillian s.; thomas, stephen; wand, nadina i.; yeates, paul; demaison, christophe; jackson, david c.; bartlett, nathan w.; mercuri, francesca; carroll, miles w. title: prophylactic intranasal administration of a tlr agonist reduces upper respiratory tract viral shedding in a sars-cov- challenge ferret model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: g pv zz respiratory viruses such as coronaviruses represent major ongoing global threats, causing epidemics and pandemics with huge economic burden. 
rapid spread of virus through populations poses an enormous challenge for outbreak control. like all respiratory viruses, the most recent novel human coronavirus sars-cov- initiates infection in the upper respiratory tract (urt). infected individuals are often asymptomatic, yet highly infectious, and readily transmit virus. a therapy that restricts initial replication in the urt has the potential to prevent progression to severe lower respiratory tract disease and to limit person-to-person transmission. we show that prophylactic intra-nasal administration of the tlr / agonist inna- in a sars-cov- ferret infection model effectively reduces levels of viral rna in the nose and throat. the results of our study support clinical development of a therapy based on prophylactic tlr / innate immune activation in the urt to reduce sars-cov- transmission and provide protection against covid- .

as with other respiratory covs, sars-cov- primarily spreads via the airborne route, with respiratory droplets expelled by infected individuals. virus can be transmitted from symptomatic as well as pre- or asymptomatic individuals, and asymptomatic individuals can shed virus, and are therefore capable of transmitting the disease, for longer than those with symptoms. as with other respiratory viruses such as influenza, recent evidence suggests that the epithelium of the upper respiratory tract (urt) is the initial site of sars-cov- infection. this is consistent with the abundant nasal epithelial cell expression of the sars-cov- receptor, angiotensin-converting enzyme (ace ), and its decreasing expression throughout the lower respiratory tract. a topical treatment of the urt that boosts anti-viral immunity and restricts viral replication is a promising method to promote viral clearance and reduce viral shedding and transmission.
the tlrs are key microbe-recognition receptors with a crucial role in activation of host defence and protection from infections and therefore attractive drug targets against infectious diseases [ ] [ ] [ ] to determine whether tlr / agonists are also active against sars-cov- , we used in life samples were taken at days , , , , and , with scheduled culls at days (n= ) and end of study days - (n= ) (fig a) . comparison test, significant (> fold) reduction in nasal viral rna was observed at dpc (p= . ) and highly significant (p< . ), greater than -fold reduction in throat viral rna was apparent from to dpc following inna- i.n. treatment ( figure s ). group ( ug/ml) appears to be the most optimal dosing in this study to assess sars-cov- detected beyond the urt, lung tissue samples were collected, on scheduled cull day ( / animals) and days - ( / animals) dpc and analysed for viral rna levels. on day dpc, two culled ferrets from the control vehicle group had detectable viral rna levels ( . x and . x copies/ml) (fig c ). there was one ferret in group showing detectable, but below the quantifiable access to food and water was ad libitum and environmental enrichment was provided. ferrets were anaesthetised with ketamine/xylazine ( . mg/kg and . mg/kg bodyweight) and exsanguination was effected via cardiac puncture, followed by injection of an anaesthetic overdose (sodium pentabarbitone dolelethal, vetquinol uk ltd, mg/kg). a necropsy was performed immediately after confirmation of death. the left lung was dissected and used for subsequent virology procedures. x pfu/ml sars-cov- . nasal wash and throat swabs were collected at days , , , , & post challenge (p.c.) for all treatment groups and control group. scheduled culls were performed for / ferrets on day p.c. 
and / ferrets on origin and evolution of pathogenic coronaviruses evolutionary history, potential intermediate animal host, and cross- species analyses of sars-cov- epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china isolation of a novel coronavirus from a man with pneumonia in saudi arabia genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding the trinity of covid- : immunity, inflammation and intervention clinical characteristics of asymptomatic infections with covid- screened among close contacts in nanjing, china. science china. life sciences sars-cov- viral load in upper respiratory specimens of infected patients clinical and immunological assessment of asymptomatic sars-cov- infections inactivated influenza vaccine that provides rapid immune-system-mediated protection and subsequent long-term adaptive generation of adaptive immune responses following influenza virus challenge is not compromised by pre-treatment with the tlr- agonist pam cys dose-dependent response to infection with sars-cov- in the ferret model: evidence of protection to re-challenge. 
biorxiv ferret models of viral pathogenesis mapping influenza transmission in the ferret model to transmission in humans receptor recognition by the novel coronavirus from wuhan: an analysis based on decade structural studies of sars coronavirus pathology of experimental sars coronavirus infection in cats and ferrets susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus are most highly expressed in nasal goblet and ciliated cells within human airways the role of tlr in tlr-dependent human mucosal epithelial cell responses to microbial pathogens the current state of h n vaccines and the use of the ferret model for influenza therapeutic and prophylactic development antiviral efficacies of fda-approved drugs against sars- sars-cov- vaccines: should we focus on mucosal immunity? expert opinion on biological therapy chadox ncov- vaccination prevents sars-cov- pneumonia in rhesus macaques. biorxiv tlr -and tlr -independent recognition of bacterial lipopeptides transloading of tumor antigen-derived peptides into antigen- presenting cells the synthetic bacterial lipopeptide pam csk modulates respiratory syncytial virus infection independent of tlr activation isolation and rapid sharing of the novel coronavirus cov- ) from the first patient diagnosed with covid- in australia. the medical journal of australia metagenomic nanopore sequencing of influenza virus direct from clinical respiratory samples viral rna was quantified by rt-qpcr. (a) nasal wash (b) throat swab (c) lung tissue. geometric mean +/-standard deviation are displayed on the graphs. 
dashed horizontal lines denote the lower limit of quantification (lloq) and lower limit of detection (llod) dunnett's multiple comparisons test are displayed above the error bars (*) key: cord- -wpywh c authors: itokawa, kentaro; sekizuka, tsuyoshi; hashino, masanori; tanaka, rina; kuroda, makoto title: disentangling primer interactions improves sars-cov- genome sequencing by the artic network’s multiplex pcr date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wpywh c since december , the coronavirus disease (covid- ) caused by a novel coronavirus sars-cov- has rapidly spread to almost every nation in the world. soon after the pandemic was recognized by epidemiologists, a group of biologists comprising the artic network, has devised a multiplexed polymerase chain reaction (pcr) protocol and primer set for targeted whole-genome amplification of sars-cov- . the artic primer set amplifies amplicons, which are separated only in two pcrs, across a nearly entire viral genome. the original primer set and protocol showed a fairly small amplification bias when clinical samples with relatively high viral loads were used. however, when sample’s viral load was low, several amplicons, especially amplicons and , exhibited low coverage or complete dropout. we have determined that these dropouts were due to a dimer formation between the forward primer for amplicon , _left, and the reverse primer for amplicon , _right. replacement of _right with an alternatively designed primer was sufficient to produce a drastic improvement in coverage of both amplicons. based on this result, we replaced primers in total in the artic primer set that were predicted to be involved in primer interactions. the resulting primer set, version n (niid- ), exhibits improved overall coverage compared to the artic network’s original (v ) and modified (v ) primer set. 
in the original artic primer set v , pcr amplicons and were amplified by the primer pairs _left & _right and _left & _right, respectively. those primers were included in the same multiplexed reaction, "pool ." we noticed that two of those primers, _left and _right, were perfectly complementary to one another by -nt at their ′ ends (fig ). indeed, we observed ngs reads derived from the predicted dimer in raw fastq data. from this observation, we reasoned that the acute dropouts of those amplicons were due to an interaction between _left and _right, which could compete with amplification of the designated targets. next, we replaced one of the two interacting primers, _right, in the pool reaction with a newly designed primer _rightv ( ′- tctctgccaaattgttggaaaggca- ′), which is located -nt downstream from _right. figures a and b show the coverage obtained with the v set and the v set with _right replaced with _rightv for cdna isolated from a clinical sample obtained during the covid- cruise ship outbreak, which was previously analyzed (epi_isl_ ) (sekizuka et al. ). the replacement of the primer drastically improved the read depth in the regions covered by amplicons and without any notable adverse effects. the replacement of the primer _right improved coverage not only for amplicon , but also for amplicon , supporting the hypothesis that the single primer interaction caused dropout of both amplicons. given this observation, we identified additional primer interactions using in silico analysis (fig a and b). those primer interactions predicted by the primerroc algorithm (johnston et al. ), which gave the highest score for the interaction between _left and _right among all , possible interactions, were likely involved in producing the low coverage frequently seen in our routine experiments. next, we designed additional alternative primers, which resulted in a new primer set (artic primer set ver.
niid- (n ) including primer replacements from the original v primer set (table s ). the n primer set eliminated all interactions shown in fig a, and was expected to improve amplification of up to amplicons ( , , , , , , , , , , , , , , , , , , , , , and ). alongside this modification, the artic network itself released another modified version of the primer set known as v on th march (loman and quick ) after replacement for the same clinical sample (previously deposited to gisaid with id epi_isl_ , ct= . , / input per reaction). regions covered by amplicons with modified primer ( _right) and the interacting primer ( _left) are highlighted by green and orange colors, respectively. for all data, reads were downsampled to normalize average coverage to x. horizontal dotted line indicates depth = . these two experiments were conducted with the same pcr master mix (except primers) and in the same pcr run in the same thermal cycler. a b we reported our result on the replacement of primer _right in a preprint (itokawa et al. a). the v primer set included spike-in primers, which were directly added into the v primer set to aid amplification of amplicons ( , , , , , , , , , , and ). the n primer set resulted in improved robustness of coverage over a broader range for relevant amplicons. the improvement, however, made potentially weak amplicons and more apparent (fig ). the abundance of amplicons gradually decreased with decreasing ; in contrast, the abundance of amplicon decreased with increasing . these amplicons seemed equally weak in all three primer sets rather than being specific to the n primer set. so far, we have not identified interactions involving the primers for those amplicons. the gradient experiment also revealed a relatively narrow range of optimal temperature for the v and v primer sets, around °c, which was broadened for the n primer set.
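the kind of 3′-end complementarity that underlies the primer dimer described above can be checked with a short sketch. this is not the primerroc algorithm and not the authors' pipeline; it is a hypothetical helper that only reports the longest perfect antiparallel match between the 3′ ends of two primers, and the sequences used are made-up toy strings rather than artic primers.

```python
# Lowercase DNA complement table.
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (a, c, g, t only)."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


def three_prime_overlap(primer_a, primer_b, min_len=1):
    """Longest n such that the last n bases of both primers can base-pair.

    Both primers are written 5'->3'. Their 3' ends anneal to each other
    when the last n bases of primer_a equal the reverse complement of the
    last n bases of primer_b. Returns 0 if no such n >= min_len exists.
    """
    a, b = primer_a.lower(), primer_b.lower()
    best = 0
    for n in range(min_len, min(len(a), len(b)) + 1):
        if a[-n:] == reverse_complement(b[-n:]):
            best = n
    return best


# Toy primers (hypothetical): the last 5 bases are mutually complementary.
toy_left = "ggggacgtt"
toy_right = "ttttaacgt"
toy_overlap = three_prime_overlap(toy_left, toy_right)
```

running such a check over every primer pair within a pool would flag candidate dimers for closer inspection; a scoring method such as the one cited in the text additionally accounts for imperfect matches and thermodynamics, which this sketch ignores.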
nevertheless, while ta = °c is a good starting point, fine-tuning this value may help improve sequencing quality, since even slight differences between thermal cyclers, such as systematic and/or well-to-well accuracy differences and under- or overshooting, may affect the results of multiplex pcr (ho kim et al. ). finally, we further compared the v , v and n primer sets for three other clinical samples using a standard temperature program ( = °c). in all three clinical samples (fig and s ), the n primer set showed the most even coverage
fig abundance of amplicons at different annealing/extension temperatures with the three different primer sets on the same clinical sample (previously deposited to gisaid with id epi_isl_ , ct= , / input per reaction). for all data, reads were downsampled to normalize average coverage to x before analysis. the green lines and points indicate the abundances of amplicons whose primers in the v primer set were subjected to modification in the n primer set. the orange lines and points indicate the abundances of amplicons whose primers were not modified but for which the adverse primer interactions were predicted to be eliminated in the n primer set. other amplicons which were not subjected to the modification are indicated by black lines and points. the plots in the left column show results of all amplicons, while only amplicons targeted by modification are shown in the plots in the right column. horizontal dotted line indicates fragment abundance = . red vertical lines indicate normal annealing/extension temperature, °c. all of these experiments were conducted with the same pcr master mix (except primers) and in the same pcr run in the same thermal cycler. regions covered by amplicons with modified primers and with unmodified but interacting primers are highlighted by green and orange colors, respectively. for all data, the reads were downsampled to normalize average coverage to x. horizontal dotted line indicates depth = .
these two experiments were conducted with the same pcr master mix (except primers) and in the same pcr run in the same thermal cycler. position several nucleotides toward the ′ ends, but extension or trimming on either end were applied when the medium dissociation temperature ( m) predicted by the neb website tool (https://tmcalculator.neb.com/) were considered too low or high. see details of modifications on primers indicated in table s table_s .pdf differences between the n primer set and the original v primer set. fig_s .pdf depth plots of the original (v ) and two modified artic primer sets (v and n ) for two clinical samples (newly deposited to gisaid with id epi_isl_ , ct = . for a and previously towards a genomics-informed, real-time, global pathogen surveillance system key: cord- -k retwa authors: gassen, nils c.; papies, jan; bajaj, thomas; dethloff, frederik; emanuel, jackson; weckmann, katja; heinz, daniel e.; heinemann, nicolas; lennarz, martina; richter, anja; niemeyer, daniela; corman, victor m.; giavalisco, patrick; drosten, christian; müller, marcel a. title: analysis of sars-cov- -controlled autophagy reveals spermidine, mk- , and niclosamide as putative antiviral therapeutics date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k retwa severe acute respiratory syndrome coronavirus (sars-cov- ) poses an acute threat to public health and the world economy, especially because no approved specific drugs or vaccines are available. pharmacological modulation of metabolism-dependent cellular pathways such as autophagy reduced propagation of highly pathogenic middle east respiratory syndrome (mers)-cov. here we show that sars-cov- infection limits autophagy by interfering with multiple metabolic pathways and that compound-driven interventions aimed at autophagy induction reduce sars-cov- propagation in vitro. 
in-depth analyses of autophagy signaling and metabolomics indicate that sars-cov- reduces glycolysis and protein translation by limiting activation of amp-protein activated kinase (ampk) and mammalian target of rapamycin complex (mtorc ). infection also downregulates autophagy-inducing spermidine, and facilitates akt /skp -dependent degradation of autophagy-initiating beclin- (becn ). targeting of these pathways by exogenous administration of spermidine, akt inhibitor mk- , and the beclin- stabilizing, antihelminthic drug niclosamide inhibited sars-cov- propagation by , , and > %, respectively. in sum, sars-cov- infection causally diminishes autophagy. a clinically approved and well-tolerated autophagy-inducing compound shows potential for evaluation as a treatment against sars-cov- . severe acute respiratory syndrome coronavirus (sars-cov- ) poses an acute threat to public health and the world economy, especially because no approved specific drugs or vaccines are available. compound-based targeting of cellular proteins that are essential for the virus life cycle has led to the discovery of broadly reactive drugs against a range of covs ( - ). as virus propagation strongly depends on energy and catabolic substrates of host cells, drug target identification should consider the metabolism of infected cells ( ). autophagy, a highly conserved cytosolic degradation process of long-lived proteins, lipids, and organelles in eukaryotic cells, is tightly controlled by metabolism ( , ). during autophagy, intracellular macromolecules are recycled by incorporation into lc b-lipidated autophagosomes (ap) and degradation into their monomers, such as fatty and amino acids, after fusion with low ph lysosomes ( ). 
in the case of highly pathogenic middle east respiratory syndrome (mers)-cov, we recently showed that autophagy is limited by a virus-induced akt -dependent activation of the e -ligase s-phase kinase-associated protein (skp ), which targets the key autophagy initiating protein beclin- (becn ) for proteasomal degradation ( ). congruently, inhibition of skp propagation up to % (figure a, lower right, figure s d,e) . akt blocks mtorc inhibitor tsc ( ) and further supports the suggestion that up-regulation of mtorc components has antiviral effects. as akt inhibition results in becn up-regulation and autophagy induction ( , ), sars-cov- growth inhibition was expected. direct blocking of the negative becn regulator spk by previously described inhibitors smip , smip - , valinomycin, and niclosamide ( ) showed sars-cov- growth inhibition from (smip , smip - ) to over % in case of valinomycin and niclosamide (figure a, lower panel, figure s d,e) . we further confirmed that the dominant intervention of niclosamide during sars-cov- infection acts on autophagy induction, as adding bafa after niclosamide treatment showed an enhancing effect on the lipidation of lc b as reflected by comparable lc b-ii/i ratios between mock-and sars-cov- -infected cells (figure b) . however, we cannot exclude that the activity of niclosamide as a hydrogen ionophore has additional inhibitory functions, e.g. by blocking endosomal acidification ( ), which is important for sars-cov- entry ( ). twenty-four hours later, cells were fixed and analyzed by fluorescence microscopy. vesicles with both green and red fluorescence (aps) and with red fluorescence only (autolysosomes, al) were counted. in all panels error bars denote standard error of mean derived from n = biologically independent experiments. tp < . , *p < . , ***p < . (two-way anova in a,b, one-way anova in c,d, t-test in e. 
abbreviations: lc b, microtubule-associated protein a/ b light chain b; mrfp, monomeric red fluorescent protein; egfp, enhanced green fluorescent protein. clinical characteristics of coronavirus disease in china. n engl antiviral potential of erk/mapk and pi k/akt/mtor signaling modulation for middle east respiratory syndrome coronavirus infection as identified by temporal kinome analysis the sars-coronavirus-host interactome: identification of cyclophilins as target for pan-coronavirus inhibitors targeting membrane-bound viral rna synthesis reveals potent inhibition of diverse coronaviruses including the middle east respiratory syndrome virus sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor historical landmarks of autophagy research metabolomics profiling reveals differential adaptation of major energy metabolism pathways associated with autophagy upon oxygen and glucose reduction the reversible modification regulates the membrane-binding state of apg /aut essential for autophagy and the cytoplasm to vacuole targeting pathway skp attenuates autophagy through beclin -ubiquitination and its inhibition reduces mers-coronavirus infection guidelines for the use and interpretation of assays for monitoring autophagy dissection of the autophagosome maturation process by a novel reporter protein, tandem fluorescent-tagged lc ampk: regulation of metabolic dynamics in the context of autophagy ampk and mtor in cellular energy homeostasis and drug targets ampk--sensing energy while talking to other signaling pathways sars-coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum autophagy during viral infection -a double-edged sword how mitochondria produce reactive oxygen species spatially distinct pools of torc balance protein homeostasis induction of autophagy by spermidine promotes longevity polyamines and eif a hypusination modulate mitochondrial respiration and macrophage 
activation polyamines control eif a hypusination, tfeb translation, and autophagy to reverse b cell senescence inhibitors of polyamine metabolism: review article inhibition of polyamine biosynthesis is a broad-spectrum strategy against rna viruses inhibition of lipolysis and lipogenesis in isolated rat adipocytes with aicar, a cell-permeable activator of amp-activated protein kinase aicar inhibits nfkappab dna binding independently of ampk to attenuate lps-triggered inflammatory responses in human macrophages network-based drug repurposing for novel coronavirus - ncov/sars-cov- phase ii trial of akt inhibitor mk- in patients with advanced breast cancer who have tumors with pik ca or akt mutations, and/or pten loss/pten mutation tsc is phosphorylated and inhibited by akt and suppresses mtor signalling akt-mediated regulation of autophagy and tumorigenesis through beclin phosphorylation niclosamide is a proton carrier and targets acidic endosomes with broad antiviral effects the effect of spermidine on memory performance in older adults at risk for dementia: a randomized controlled trial safety and tolerability of spermidine supplementation in mice and older adults with subjective cognitive decline results of an abbreviated phase-ii study with the akt inhibitor mk- in patients with niclosamide as a treatment for hymenolepis diminuta and dipylidium caninum infection in man key: cord- - jccb nh authors: saha, sovan; chatterjee, piyali; basu, subhadip; nasipuri, mita title: detection of spreader nodes and ranking of interacting edges in human-sars-cov protein interaction network date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jccb nh the entire world has recently witnessed the commencement of coronavirus disease (covid- ) pandemic. it is caused by a novel coronavirus (n-cov) generally distinguished as severe acute respiratory syndrome coronavirus (sars-cov- ). it has exploited human vulnerabilities to coronavirus outbreak. 
sars-cov- promotes fatal chronic respiratory disease followed by multiple organ failure which ultimately puts an end to human life. no proven vaccine for n-cov is available till date in spite of significant research efforts worldwide. international committee on taxonomy of viruses (ictv) has reached to a consensus that the virus sars-cov- is highly genetically similar to severe acute respiratory syndrome coronavirus (sars-cov) outbreak of . it has been reported that sars-cov has ∼ % genetic similarities with n-cov. with this hypothesis, the current work focuses on the identification of spreader nodes in sars-cov protein interaction network. various network characteristics like edge ratio, neighborhood density and node weight have been explored for defining a new feature spreadability index by virtue of which spreader nodes and edges are identified. the selected top spreader nodes having high spreadability index have been also validated by susceptible-infected-susceptible (sis) disease model. initially, the proposed method is applied on a synthetic protein interaction network followed by sars-cov-human protein interaction network. hence, key spreader nodes and edges (ranked edges) are unmasked in sars-cov proteins and its connected level and level human proteins. the new network attribute spreadability index along with generated sis values of selected top spreader nodes when compared with the other network centrality based methodologies like degree centrality (dc), closeness centrality (cc), local average centrality (lac) and betweeness centrality (bc) is found to perform relatively better than the existing-state-of-art. the pandemic covid- registered its first case on december [ ] . it laid its foundation in the chinese city of wuhan (hubei province) [ ] . 
soon, community spread made victims of several countries all over the world [ ] , which ultimately compelled the world health organization (who) to declare a global health emergency on january [ ] in response to the massive outbreak of covid- . owing to its expected fatality rate, which is about %, as projected by who [ ] , researchers from nations all over the world have joined hands to understand the spreading mechanisms of sars-cov- [ ] - [ ] and to find all possible ways to save human lives from covid- . coronaviruses belong to the family coronaviridae. these single-stranded rna viruses affect not only humans but also other mammals and birds. in humans, coronavirus infection causes common fever/flu symptoms followed by acute respiratory infections. sars-cov- belongs to the same betacoronavirus genus as the mers and sars coronaviruses [ ] . it comprises several structural and non-structural proteins. the structural proteins include the envelope (e) protein, membrane (m) protein, nucleocapsid (n) protein and the spike (s) protein. because sars-cov- has been identified only recently, there is an acute scarcity of the data and information needed to gain immunity against it. several genomic analyses have shown that sars-cov- is genetically highly similar to sars-cov [ ] , [ ]- [ ] . this is also the reason behind the naming of sars-cov- by the international committee on taxonomy of viruses (ictv) [ ] . due to this genetic similarity, immunological study of sars-cov may support the development of potential drugs against sars-cov- . in the proposed methodology, the protein-protein interaction network (ppin) is used as the central component in the identification of spreader nodes in sars-cov.
ppin has been found to be a very effective tool for protein function determination [ ] - [ ] as well as for the identification of central/essential or key spreader nodes in the network [ ] - [ ] . the compactness of the network and its transmission capability are estimated using centrality analysis. anthonisse et al. [ ] proposed a centrality measure named betweenness centrality (bc). bc measures the impact of a particular node on the transmission between every pair of nodes, under the assumption that this transmission always follows the shortest path between them. it is defined as: $$BC(u) = \sum_{s \neq u \neq t} \frac{\rho(s, u, t)}{\rho(s, t)}$$ where $\rho(s, t)$ is the total number of shortest paths from node $s$ to node $t$ and $\rho(s, u, t)$ is the number of those paths that pass through $u$. another centrality measure, called closeness centrality (cc), was defined by sabidussi et al. [ ] . it detects nodes having efficient transmission capability within a network. nodes with a high closeness centrality value are considered to have the shortest distance to all available nodes in the network. it can be mathematically expressed as: $$CC(u) = \frac{|N_u| - 1}{\sum_{v} dist(u, v)}$$ where $|N_u|$ denotes the number of neighbors of node $u$ and $dist(u, v)$ is the distance of the shortest path from node $u$ to node $v$. two other important centrality measures, degree centrality (dc) [ ] and local average centrality (lac) [ ] , are also found to be very effective in this area of research. dc is the simplest of the available centrality measures; it only counts the number of neighbors of a node. nodes having a high degree are said to be highly connected in the network. it is defined as: $$DC(u) = |N_u|$$ where $|N_u|$ has the same meaning as stated above.
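as a minimal sketch of the degree, closeness and betweenness measures described above, the following python code computes them on an unweighted, undirected graph stored as a dictionary of neighbor sets. the function names and the graph representation are illustrative assumptions, not the authors' implementation. betweenness is accumulated with brandes' algorithm and counts each unordered pair of end nodes once; closeness uses the common (n - 1) / sum-of-distances normalisation over the nodes reachable from u, which may differ from the exact numerator used in the text.

```python
from collections import deque


def bfs_shortest_paths(adj, source):
    """Breadth-first search from `source` on an unweighted graph.

    Returns distances, shortest-path counts (sigma), the visiting order,
    and the shortest-path predecessors of every reached node.
    """
    dist = {source: 0}
    sigma = {v: 0 for v in adj}
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, order, preds


def degree_centrality(adj):
    """DC(u): number of neighbors of u."""
    return {v: len(adj[v]) for v in adj}


def closeness_centrality(adj):
    """CC(u) with the (n - 1) / sum-of-distances form (an assumption)."""
    cc = {}
    for v in adj:
        dist, _, _, _ = bfs_shortest_paths(adj, v)
        total = sum(dist.values())
        cc[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return cc


def betweenness_centrality(adj):
    """BC(u): sum over pairs (s, t) of the fraction of shortest paths via u."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, order, preds = bfs_shortest_paths(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was visited from both ends in an undirected graph
    return {v: b / 2.0 for v, b in bc.items()}
```

on a star-shaped toy network, for example, the hub receives the highest value under all three measures, which is the behaviour the centrality-based baselines rely on.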
lac is a local metric that computes the essentiality of a node with respect to transmission ability by considering its modular nature. it is defined as: $$LAC(u) = \frac{\sum_{w \in N_u} deg^{C_u}(w)}{|N_u|}$$ where $C_u$ is the subgraph induced by $N_u$ and $deg^{C_u}(w)$ is the total number of nodes which are directly connected to $w$ in $C_u$. due to the high morbidity and mortality of sars-cov , there is a pressing need to properly understand how the disease is transmitted from the sars-cov- ppin to the human ppin [ ] - [ ] . but since the sars-cov- ppin is still not available, the sars-cov ppin is considered for this research study due to its high genetic similarity with sars-cov- . in the proposed methodology, the sars-cov-human ppin (up to level ) is first formed from the collected datasets [ ] - [ ] . once it is formed, spreader nodes are identified in each of the sars-cov proteins and in its level and level human networks by applying a new network attribute, the spreadability index, which is a combination of three measures: ) edge ratio [ ] , ) neighborhood density [ ] and ) node weight [ ] . the detected spreader nodes are also validated by the existing sis epidemic disease model [ ] . the edges connecting two spreader nodes are then ranked by the average spreadability index of the two spreader nodes in order to assess the spreading ability of the corresponding edge. the ranked edges thus highlight the path of disease propagation from sars-cov to human level proteins and then from human level to human level proteins. spreader nodes and edges both play a crucial role in the transmission of infection from one part of the network to another. generally, in a disease-specific ppin model, at least two entities are involved: one is the pathogen/bait and the other is the host/prey [ ] . in this research work, sars-cov takes the role of the former and human that of the latter. sars-cov first transmits infection to its directly interacting human proteins, which in turn affect their next level of proteins.
so, the transmission occurs through connected nodes and edges. not all the nodes or proteins in the ppin transmit infection, so proper identification of the nodes that do (spreader nodes) is required. at the same time, transmission is not possible without the edges connecting two spreader nodes; these connecting edges are called spreader edges. the proposed methodology involves a study and assessment of various established ppin features followed by the identification of spreader nodes, which is also verified by the sis model. the ego network $G_v$ of node $v$ [ ] is defined as the grouping of node $v$ itself along with its corresponding level neighbors and interconnections. $N(G_v)$ [ ] consists of the set of nodes which belong to the ego network, i.e. $\{v\} \cup \Gamma(v)$. the edge ratio of node $v$ [ ] is defined from two quantities: $E^{out}_v$, the total number of outgoing edges from the ego network $G_v$, and $E^{in}_v$, the total number of interactions among node $v$'s neighbors. $\Gamma(v)$ denotes the level neighbors of node $v$ and $G_v$ is its ego network. $\Gamma_{G_v}(j)$ denotes those neighbors of node $j$ which belong to $G_v$. in the edge ratio, $E^{out}_v$ is positively related to the non-peripheral location of node $v$: a large number of interactions leaving the ego network indicates that the node has a high level of interconnectivity between its neighbors. on the other hand, $E^{in}_v$ is negatively related to the inter-module location of node $v$. the interconnectivity between neighbors is usually related to the number of structural holes available around the node: when the neighbors' interconnectivity is low, the root or central node gains more control over the flow of transmission among the neighbors. the jaccard dissimilarity [ ] of nodes $i$ and $j$, $D_{jac}(i, j)$, is defined as: $$D_{jac}(i, j) = 1 - \frac{|\Gamma(i) \cap \Gamma(j)|}{|\Gamma(i) \cup \Gamma(j)|}$$ where $|\Gamma(i) \cap \Gamma(j)|$ is the number of common neighbors of $i$ and $j$, and $|\Gamma(i) \cup \Gamma(j)|$ is the total number of neighbors of $i$ and $j$.
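the two ingredients described above, the edge counts of an ego network and the jaccard dissimilarity of two nodes, can be sketched in python as follows. the graph is assumed to be undirected and stored as a dictionary of neighbor sets; the function names are hypothetical. the sketch only counts the outgoing and internal edges and does not combine them into an edge ratio, since the exact functional form is not recoverable from the text.

```python
def jaccard_dissimilarity(adj, i, j):
    """1 - |N(i) & N(j)| / |N(i) | N(j)| for the neighbor sets of i and j."""
    union = adj[i] | adj[j]
    if not union:
        # assumption: two isolated nodes are treated as not dissimilar
        return 0.0
    return 1.0 - len(adj[i] & adj[j]) / len(union)


def ego_edge_counts(adj, v):
    """Return (outgoing, among) edge counts for the ego network of v.

    outgoing: edges from a neighbor of v to a node outside the ego network.
    among:    edges between two neighbors of v.
    """
    ego = {v} | adj[v]
    outgoing = sum(1 for u in adj[v] for w in adj[u] if w not in ego)
    # each neighbor-neighbor edge is seen from both of its endpoints
    among = sum(1 for u in adj[v] for w in adj[u] if w in adj[v]) // 2
    return outgoing, among
```

a node whose neighbors share few common neighbors (high dissimilarity) and have few edges among themselves sits at a structural hole, which is the situation the text associates with strong control over transmission.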
similarity between two nodes is determined by jaccard similarity based on their common neighbors: the more common neighbors $i$ and $j$ have, the more similar they are considered to be. conversely, when the dissimilarity between the neighbors of a node is high, it guarantees that the only common node among the neighbors is the central node, which is termed a structural hole situation [ ] . the neighborhood diversity [ ] of a node is defined over the jaccard dissimilarities of its neighbors. when a node's neighborhood density reaches its greatest value, the neighbors have no other closer path, so they must transmit or communicate through this node. the node weight $w_v$ of node $v \in V$ in the ppin [ ] is interpreted as the average degree of all nodes in $G'_v$, a sub-graph of a graph $G_v$. it is considered another measure of the strength of connectivity of a node in a network. mathematically, it is represented by $$w_v = \frac{\sum_{u \in V''} deg(u)}{|V''|}$$ where $V''$ is the set of nodes in $G'_v$, $|V''|$ is the number of nodes in $G'_v$, and $deg(u)$ is the degree of a node $u \in V''$. the spreadability index of node $v$ is defined as the ability of node $v$ to transmit an infection from one node to another. mathematically, it combines the edge ratio, neighborhood density and node weight of the node. nodes having a high spreadability index are termed spreader nodes, i.e. these nodes can transmit the infection to the maximum number of interconnected neighbors in a much shorter time than the other nodes in the ppin. figure synthetic ppin consisting of nodes and edges where each node represents a protein and its interaction is represented by edges. figure represents a sample ppin where each protein is denoted as a node and each interaction as an edge. in this network, the spreadability index is computed using the basic network characteristics stated earlier, and is compared to dc, bc, cc and lac in table to table . in figure , it can be observed that nodes , are clearly the most essential spreaders.
node connects the four densely connected modules of the network which turns this node to stand in the first position having highest spreadability index by our proposed methodology. this node has been correctly ranked by all the methods except lac and dc. node though has moderate edge ratio and node weight, but is one of the most densely connected modules itself in spite of getting isolated from the main network module of node . moreover, node has the highest neighborhood density. it establishes the fact that the only path of transmission for nodes , , , , , , , and is node . thus if node gets affected, then all the connected nodes with it will be immediately affected due to the lack of the connectivity of the neighbors with other central nodes. so, node holds the second position with respect to spreadability index in our proposed methodology. node is not properly identified as the second most influential spreader nodes by the other methods. further assessment of the remaining nodes highlights the fact that the performance of the new attribute spreadability index in our proposed methodology is relatively better in comparison to the others. the sis epidemic model [ ] is table to table that the proposed methodology has the highest sis infection rate of . (see table ) in comparison to others for their corresponding top spreader nodes in the synthetic network as shown in figure . to show the ranking of interacting spreader edges, another synthetic ppin has been considered as network while the previous synthetic ppin in section . ( figure ) has been represented as network in figure . node d, e and f are the selected top spreader nodes in network by spreadability index in the same way as that of synthetic ppin in figure while to avoid the complexity in the diagram, top nodes in network (see table ) are selected as spreader nodes in network . 
red colored edges are the interconnections between the nodes of network while black colored edges show the interconnections between the nodes of network . green colored spreader edges (i.e. edges connected with spreader nodes) show the interconnections between network and . the rank of a spreader edge measures the effectiveness of its transmission ability, i.e. how many interacting nodes get infected through that edge. thus all the spreader edges are ranked based on the average of the spreadability index of their connected spreader nodes. the ranked spreader edges in figure are highlighted in table . figure ranking of spreader edges between two interconnected ppin based on spreadability index. the thickness of the edges varies with the order of ranking. table ranking of spreader edges for network and network in figure . the proposed methodology leads to the identification of spreader nodes and edges through a network characteristic called the spreadability index, which has also been checked and validated by the sis model. initially the whole working module is implemented on synthetic networks as shown in figure and figure figure . in figure a , the sars-cov ppin is displayed first, with each protein marked in red. thereafter spreader nodes in the sars-cov ppin are identified by the spreadability index and denoted as blue nodes among the red. once the spreader nodes are active (in figure b) , they transmit the infection to their direct partners, i.e. human level proteins (marked in green). in figure c , spreader nodes are identified in the sars-cov level human proteins (marked in blue) and the disease is transmitted further. the same continues till the sars-cov level human proteins, in which green nodes are the spreaders, and thus the infection penetrates further into the human ppin, resulting in a significant fall in human immunity level followed by severe acute respiratory syndrome. in figure , the ppin of sars-cov network has been highlighted.
there are mainly proteins which include e, m, orf a, orf a, s, n, orf a, orf ab, and orf b. the computed spreadability index of each of this protein and its corresponding validation by sis model is highlighted in table . it is also compared with other central/ influential spreader node detection methodologies like dc, cc, lac and bc which are shown in table , table , table and table respectively. similarly spreader nodes are also identified in sars-cov's level neighbors and level neighbors (see figure and figure ). spreadability index definitely plays an important role in this proposed methodology. in fact, spreader nodes are successfully identified by the virtue of this scoring technique which covers all the aspects of transmitting infection from one node to another in a ppin. it should be mentioned here that while identifying spreader nodes in sars-cov level human proteins, it has been noted that the number of nodes are getting increased largely with the increment of successive levels. so, high, medium and low threshold [ ] have been applied and the entire disease transmission through spreadability index is computationally assessed at each threshold. the network statistics of spreader nodes at each level of threshold is shown in table . it can be observed that threshold application is only implemented at sars-cov level human proteins not on others. this is because of the availability of very less number of nodes and edges. only nodes and edges having extremely low spreadability index have been discarded at the first level. beside identification of spreader nodes, spreader edges are also identified. the ranked edges between sars-cov spreaders and its level human spreaders is highlighted in table while the ranked edges between sars-cov s level and level human spreaders at high, medium and low threshold are highlighted in a- table , a- table and a- table in the appendix respectively. 
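the edge ranking described above (each spreader edge scored by the average spreadability index of its two spreader end nodes) can be sketched in python as follows. the function name, the input layout and any numeric values passed to it are hypothetical; the spreadability values are assumed to have been computed beforehand.

```python
def rank_spreader_edges(edges, spreadability, spreaders):
    """Rank edges whose two end nodes are both spreader nodes.

    edges:         iterable of (u, v) pairs
    spreadability: dict mapping node -> spreadability index
    spreaders:     set of nodes selected as spreader nodes
    Returns a list of ((u, v), score) sorted from highest to lowest score,
    where score is the mean spreadability index of u and v.
    """
    scored = []
    for u, v in edges:
        if u in spreaders and v in spreaders:
            score = (spreadability[u] + spreadability[v]) / 2.0
            scored.append(((u, v), score))
    # stable sort: edges with equal scores keep their input order
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

edges with a non-spreader end node are dropped before ranking, which mirrors the statement that only edges connecting two spreader nodes are treated as spreader edges.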
the entire network view of sars-cov and human ppin has been generated online under three circumstances: ) all the nodes and edges are considered as spreader nodes and edges respectively and are ranked accordingly. in the above generated network views, the blue, yellow and green color represent sars-cov spreaders, its level human spreaders and its level human spreaders respectively. the remaining nodes are in indigo. spreadability index is thus proved to be effective in the detection of spreader nodes and edges in sars-cov-human ppin along with the cross validation by sis model. spreader nodes are the most critical nodes in the network which actually transmit the infection to its successors. simultaneously it is also true if the spreader nodes are not connected with spreader edges infection transmission would not have been possible. in a nutshell, it can be said that the proposed work exploits the possibility of understanding the entire disease propagation/transmission from sars-cov network to human network. it should be borne in mind that sars-cov is ~ % genetically similar to its predecessor sars-cov [ ] , [ ] . it strongly reveals the fact that the human proteins chosen as spreaders of sars-cov might be the potential targets of sars-cov also. if our future research work reveals this assumption, then it will definitely explore a new direction in the identification of essential drugs/vaccine for sars-cov . [ ] "who | update -sars case fatality ratio, incubation period." https://www.who.int/csr/sars/archive/ _ _ a/en/ (accessed apr. , ). [ ] "who | middle east respiratory syndrome coronavirus (mers-cov)." https://www.who.int/emergencies/mers-cov/en/ (accessed apr. , ). [ ] "naming the coronavirus disease (covid- ) and the virus that causes it." https://www.who.int/emergencies/diseases/novel-coronavirus- /technical-guidance/naming-thecoronavirus-disease-(covid- )-and-the-virus-that-causes-it (accessed apr. , ). 
table ranked spreader edges between sars-cov s level and level human proteins at medium threshold world-health-organization coronavirus disease (covid- ) outbreak a novel coronavirus outbreak of global health concern world map | cdc statement on the second meeting of the international health regulations ( ) emergency committee regarding the outbreak of novel coronavirus ( -ncov) statement-on-the-meeting-of-the-international-health-regulations-( )-emergency-committee-regarding-the-outbreak clinical features of patients infected with novel coronavirus in wuhan, china data sharing and outbreaks: best practice exemplified potential inhibitors for -ncov coronavirus m protease from clinically approved medicines a pneumonia outbreak associated with a new coronavirus of probable bat origin protein function prediction from dynamic protein interaction network using gene expression data protein function prediction from proteinprotein interaction network using gene ontology based neighborhood analysis and physico-chemical features funpred . 
: improved protein function prediction using protein interaction network lethality and centrality in protein networks centers of complex networks a local average connectivity-based method for identifying essential proteins from the network level high-betweenness proteins in the yeast protein interaction network the rush in a directed graph the centrality index of a graph the sars-coronavirus-host interactome: identification of cyclophilins as target for pan-coronavirus inhibitors large-scale analysis of disease pathways in the human interactome identifying influential spreaders based on edge ratio and neighborhood diversity measures in complex networks detecting overlapping protein complexes in ppi networks based on robustness the mathematical theory of infectious diseases and its applications analysis of protein targets in pathogen-host interaction in infectious diseases: a case study on plasmodium falciparum and homo sapiens interaction network the distribution of the flora in the alpine zone a method for predicting protein complex in dynamic ppi networks genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan authors are thankful to the cmater research laboratory of the computer science department, jadavpur university, india, for providing infrastructure facilities during progress of the work. this project is partially supported by the cmater research laboratory of the computer science and engineering department, jadavpur university, india, and dbt project (no.bt/pr /bid/ / / ), ministry of science and technology, government of india. key: cord- - sbc guz authors: satish, swarup; yao, zonghai; drozdov, andrew; veytsman, boris title: the impact of preprint servers in the formation of novel ideas date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sbc guz we study whether novel ideas in biomedical literature appear first in preprints or traditional journals. 
we develop a bayesian method to estimate the time of appearance for a phrase in the literature, and apply it to a number of phrases, both automatically extracted and suggested by experts. we see that presently most phrases appear first in the traditional journals, but there is a number of phrases with the first appearance on preprint servers. a comparison of the general composition of texts from biorxiv and traditional journals shows a growing trend of biorxiv being predictive of traditional journals. we discuss the application of the method for related problems. a paper submitted to a journal goes through several stages: peer review, editorial work, copyediting, publication. this leads to a long waiting time between submission and the final publication (powell, ) . the situation is especially bad in the life sciences, where the waiting time between submission and publication approaches the duration of a traditional phd study, creating serious difficulties for young scientists (vale, ) . while this is frustrating for scientists whose recognition and promotion often depend on the publication record, it is also bad for science itself, significantly slowing down its progress (qunaj et al., ) . preprint servers were offered as a means of accelerating science (berg et al., ; desjardins-proulx et al., ; sarabipour et al., ; schloss, ; lauer et al., ; peiperl, ) , especially in the wake of covid- epidemics (krumholz et al., ) . the discussion about the benefits (and dangers) of preprint is no longer confined to the scientific literature, coming to the pages of popular newspapers (eisen and tibshirani, ). * equal contribution. the benefits of preprints for accelerating science are often raised in the discussions between regulatory agencies, funders, scientists and publishers. thus a method to objectively assess them is important. 
one way to do this assessment is to look at a new important idea and to measure whether it first appears in a traditional journal or on a preprint server. an implementation of this approach requires one to define what is an important idea, and how to find the time of appearance for it. this is the goal of our work. the definition of novelty in science and the methods to determine and predict novelty have a long history discussed in the next section. in this work we use a very simple approach (garfield, ; latour and woolgar, ) : new ideas correspond to new terms. thus if we find new words and phrases in scientific papers, we can surmise the appearance of new ideas. the definition of the time of appearance for a new idea is not trivial. it is not enough to register the first mention of a term. first, some hits might be erroneous, and give us false positives. on the other hand, we might miss some mentions of a term due to the incompleteness of the corpus. therefore a more subtle method to determine the time of appearance is needed. in this work we offer a bayesian approach to this problem. based on our definitions of novelty and the time of appearance for novel ideas we compare the time of appearance for several novel ideas in the papers published in biorxiv/medrxiv https://www. biorxiv.org/ and pubmed central full text collection https://www.ncbi.nlm.nih.gov/pmc/. novel ideas and breakthroughs are among the central concepts for the science of science. a number of studies propose different ideas to quantify originality in science and technology (cozzens et al., ; alexander et al., ; rzhetsky et al., ; rotolo et al., ; wang and chai, ; shibayama and wang, ) or their impact on the other works (shi et al., ; shahaf et al., ; sinatra et al., ; hutchins et al., ; wesley-smith et al., ; herrmannova et al., b,a; zhao et al., ; bornmann et al., ; small et al., ) . 
the prediction of breakthroughs, scientific impact and citation counts is a well developed area (schubert and schubert, ; garfield et al., ; dietz et al., ; lokker et al., ; shi et al., ; uzzi et al., ; alexander, ; klimek et al., ; tahamtan et al., ; mckeown et al., ; clauset et al., ; peoples et al., ; salatino et al., ; dong et al., ; iacopini et al., ; feldman et al., ; van den besselaar and sandström, ; klavans et al., ) . however, the question asked in these works is different from the one we ask. most of the researchers tried to determine what makes a work original or impactful, and how to predict originality or impact. our question is the following: suppose we know a certain idea is novel (or impactful). can we pinpoint a moment in time when this idea appeared, and where did it appear? sentence level novelty detection was a topic of novelty tracks of text retrieval conferences (trec) from to (soboroff and harman, ; harman, ; clarke et al., ; soboroff and harman, ) . the goal of these tracks was to highlight the relevant sentences that contain novel information, given a topic and an ordered list of relevant documents. at the document level, karkali et al. ( ) computed novelty score based on the inverse document frequency scoring function. another work by verheij et al. ( ) presents a comparison study of different novelty detection methods evaluated on news articles where language model based methods perform better than the cosine similarity based ones. dasgupta and dey ( ) conducted experiments with information entropy measure to calculate novelty of a document. again, the work that we present here significantly differs from the existing novelty detection methods since we use the novelty detection as a starting point rather than a goal. we first get candidate phrases from the documents with the most appropriate key phrases extraction method, and then determine the appearance timing of these phrases. 
another field that is relevant to our research is the detection of change points in a stream of events. change point detection (or cpd) detects abrupt shifts in time series trends that can be easily identified via the human eye, but are harder to pinpoint using traditional statistical approaches. the research in cpd is applicable across an array of industries, including finance, manufacturing quality control, energy, medical diagnostics, and human activity analysis. there are many representative methods of cpd. binary segmentation (bai, ) is a sequential approach: first, one change point is detected in the complete input signal, then series is split around this change point, then the operation is repeated on the two resulting sub-signals. as opposite to binary segmentation, which is a greedy procedure, bottom-up segmentation (fryzlewicz, ) is generous: it starts with many change points and successively deletes the less significant ones. first, the signal is divided in many sub-signals along a regular grid. then contiguous segments are successively merged according to a measure of how similar they are. because the enumeration of all possible partitions is impossible, pelt (killick et al., ) relies on a pruning rule. many indexes are discarded, greatly reducing the computational cost while retaining the ability to find the optimal segmentation. window-based change point detection (aminikhanghahi and cook, ) uses two windows which slide along the data stream. dynamic programming was also used for this task (truong et al., ) . in this work we propose a simple bayesian approach to the detection of change points, which seems to give intuitively reasonable results for our purpose. our proposed bayesian approach to novelty detection (band) finds the time interval τ that maximizes the observed series of publication frequency for a phrase (figure ). 
paper publications are events, so it is reasonable to assume that the number of publications $n_t$ in a unit interval at time $t$ containing the given phrase $z$ follows a poisson distribution. the joint density function for publication frequency is given by the equation $$p(\{n_t\} \mid \alpha, \beta, \tau) = \prod_t \frac{\mu_t^{n_t} e^{-\mu_t}}{n_t!}$$ where $\mu_t$ is modeled by a piecewise linear function of $t$: $$\mu_t = \begin{cases} \alpha, & t < \tau \\ \alpha + \beta (t - \tau), & t \geq \tau \end{cases}$$ prior to $\tau$, we expect the number of publications containing the given phrase to be small, ideally zero. the parameter α > controls for the noise (misattributed papers, improper or ambiguous usage of the phrase, etc.). after the moment $\tau$, we expect steady growth of the phrase's popularity with rate $\beta$. in other words, $\tau$ is the point in time when the phrase begins to be adopted. we consider each phrase independently and use bayesian modeling to find the most probable parameters $\alpha$, $\beta$, and $\tau$ given the observed data, using the standard bayesian approach $$p(\alpha, \beta, \tau \mid \{n_t\}) \propto p(\{n_t\} \mid \alpha, \beta, \tau) \, p(\alpha, \beta, \tau)$$ we use a flat uninformative prior with $p(\alpha, \beta, \tau) = const$. ( ) our motivation for finding $\tau$ using band is to compare the impact of different publication venues (i.e. preprint servers and peer-reviewed journals) that cover overlapping research topics. for a single phrase $z$, we use the observed data from two sources and estimate the posterior probability $p(\alpha, \beta, \tau)$ for each source separately. then we run a simulation to find the % confidence interval of $\tau$ using the following procedure. first, for a single data source, compute $p(\alpha, \beta, \tau)$ for all possible configurations on a grid. second, sample a large number of triplets $(\alpha, \beta, \tau)$ from the posterior computed in the first step. third, remove . % of the triplets with the highest value of $\tau$, and . % of the triplets with the lowest value of $\tau$. the probability for $\tau$ to lie in the remaining interval can be estimated as %. in order to compare two sources, we create two sets of tuples, one for each source. then we randomly draw a tuple from the first set and a tuple from the second one.
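the grid-and-sample procedure above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' code: the rate is taken to be alpha before tau and alpha + beta * (t - tau) afterwards, time is indexed by the position in the count list, the prior is flat over the supplied grid, and the function names, grid values and the default tail fraction are hypothetical.

```python
import math
import random


def poisson_log_likelihood(counts, alpha, beta, tau):
    """Log-likelihood of independent Poisson counts with a piecewise-linear rate.

    Assumes alpha > 0 so that the rate is strictly positive everywhere.
    """
    ll = 0.0
    for t, n in enumerate(counts):
        mu = alpha + beta * max(0, t - tau)
        ll += n * math.log(mu) - mu - math.lgamma(n + 1)
    return ll


def grid_posterior(counts, alphas, betas, taus):
    """Posterior over all (alpha, beta, tau) grid points under a flat prior."""
    triplets = [(a, b, tau) for a in alphas for b in betas for tau in taus]
    log_post = [poisson_log_likelihood(counts, a, b, tau) for a, b, tau in triplets]
    peak = max(log_post)                       # subtract the peak for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return triplets, [w / total for w in weights]


def tau_interval(triplets, probs, n_samples=10000, tail=0.025, seed=0):
    """Sample triplets from the posterior and trim both tails of tau."""
    rng = random.Random(seed)
    draws = rng.choices(triplets, weights=probs, k=n_samples)
    taus = sorted(tau for _, _, tau in draws)
    cut = int(tail * n_samples)                # number of draws dropped per tail
    return taus[cut], taus[n_samples - cut - 1]
```

running `grid_posterior` once per source and keeping the sampled triplets gives the two sets of tuples that the comparison between sources draws from.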
for each pair i we compute δ_i, the difference between the onset times of the two sources, where τ_i^{(s)} is the i-th sample in source s. the conclusion about priority is based on the distribution of δ around zero. if δ > 0 for the majority of the pairs, then the phrase gains traction first on the first source; otherwise the second source wins.

in this section we describe our data collection procedure and give background on methods of text processing and assessing data quality. our code for running experiments is publicly available. we consider two types of publication venues: peer-reviewed journals and preprint servers. there are now many preprint servers for various areas of science, including arxiv, biorxiv, medrxiv, psyarxiv, socarxiv, chemrxiv, agrirxiv, and others. to compare a preprint server to a traditional venue one needs a large open access collection of traditionally published papers. in the biomedical sciences there is the large pubmed central open access dataset, described below, which drove our choice of biorxiv/medrxiv as a comparison venue. another reason for this choice of data sources is that one of our organizations, the chan zuckerberg initiative, has a special interest in biomedical sciences in general and in biorxiv and medrxiv in particular.

pubmed is a central repository for biomedical papers published in peer-reviewed journals. it contains over million journal publications. abstracts are publicly available for all papers, and for a subset (the pubmed central open access dataset with over . million papers) full texts are available. in contrast to pubmed, biorxiv is a preprint server for the biological sciences. papers published there are not required to pass a strict and lengthy review process. biorxiv hosts over , full text articles, each open to the public. recently medical papers were separated into a dedicated server, medrxiv.
since the search engine provided by biorxiv can search both servers, below we use the term "biorxiv" in place of the longer, but more correct, term "the union of biorxiv and medrxiv papers". pubmed and biorxiv cover the two representative categories of publication venue. we also include data sources using information from the covid- open research dataset challenge .

in our experiments and analysis we leverage a large collection of phrases collected in an unsupervised way using textrank (mihalcea and tarau, ; nathan, ). the procedure to extract the phrases is the following:
1. first we extract all candidate phrases from biorxiv abstracts using textrank. this results in , , phrases.
2. we reduce the phrases from step 1 to , phrases by eliminating phrases that were detected by textrank only once.
3. for the phrases from step 2 we generate monthly time series data for both pmc and biorxiv using full text.
we use this data collection procedure and recommendations from czi biomedical curators to create three groups of phrases:
(a) common phrases: banal phrases manually selected that are also extracted by textrank (includes 'medical history', 'heart disease', 'x-ray', etc.).
(b) novel phrases: phrases selected by experts (includes 'mass cytometry', 'gene editing', 'fluorescence activated cell sorting', etc.).
(c) top extracted phrases: the top phrases extracted by textrank, determined by average importance score.
all common and novel phrases, and a subset of the top ranking phrases, are shown in appendices a. , a. . , and a. . .

to assess the effectiveness of band for finding τ we include a strong baseline in our experiments. in our analysis (section . ), we compare this baseline to band not only for finding the first clear inflection point, but also for how relevant τ is to the end goal of novelty detection and comparing research impact. the baseline we include is window-based change point detection (see section ).
it works by maximizing the discrepancy function $d(x_{a..\tau}, x_{\tau..b}) = c(x_{a..b}) - c(x_{a..\tau}) - c(x_{\tau..b})$, where c is a cost function; the discrepancy is large when the left segment $x_{a..\tau}$ is dissimilar from the right segment $x_{\tau..b}$. window-based change point detection (wcpd) is flexible and has been successful when applied to many tasks. however, for a number of novel phrases (see the first two examples on figure ) it gives intuitively unsatisfactory results for the detection of the idea onset. the reason is that wcpd tries to find the point where the publication frequency changes significantly, which often corresponds to the moment the idea is widely adopted. our task, on the other hand, is to find the point when the idea appears, which is a different problem. therefore one may expect band to work better for novelty detection, because its model of growth starting from zero might be more suitable for describing the publication frequency than the generic model of wcpd. we found it necessary to run wcpd on the finite difference approximation of the publication frequency gradient to obtain reasonable predictions. we use the implementation of wcpd provided in ruptures (truong et al., ) with an l -cost function (c), window size of , penalty of , and no specification on how many change points to return. in our figures we display all change points from wcpd to illustrate that these change points do not solely identify the point before rapid growth in idea adoption.

in this section we discuss the following hypotheses and research questions:
• can we find the point τ in time when a phrase is determined novel using our bayesian approach to novelty detection (band)?
• is the value of τ effective for comparing the impact of two publication venues?
• do our findings verify that preprint servers are having a positive impact on the development of novel ideas?
• how effective is publication frequency for distinguishing between novel and banal phrases?
below we answer each of these questions.
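Before turning to the results, the window-based baseline described above can be made concrete with a small sketch. It is written from scratch for illustration and is not the ruptures implementation used in the experiments: the cost is assumed to be the sum of squared deviations from the segment mean, only the single highest-scoring index is reported (no penalty or peak search), and the function names and toy signal are hypothetical.

```python
def l2_cost(segment):
    """Sum of squared deviations from the segment mean (zero for an empty segment)."""
    if not segment:
        return 0.0
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def window_discrepancy(signal, width):
    """Score every index where a full window fits: cost(whole) - cost(left) - cost(right)."""
    half = width // 2
    scores = {}
    for t in range(half, len(signal) - half + 1):
        left = signal[t - half:t]
        right = signal[t:t + half]
        scores[t] = l2_cost(left + right) - l2_cost(left) - l2_cost(right)
    return scores


# Toy signal with one abrupt shift at index 6.
signal = [0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5]
scores = window_discrepancy(signal, width=6)
best = max(scores, key=scores.get)  # index with the largest discrepancy
```

On a slowly growing publication curve the score peaks where the two half-windows differ most, which tends to be the period of fastest growth rather than the onset; this is the behaviour the text attributes to wcpd.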
the common wisdom is that preprint servers have a positive impact on research by making research available openly and quickly. we attempt to verify that ideas develop faster on biorxiv than on pubmed. our results indicate that this might be true in some, but not all, cases. for some phrases δ leans positive (see figure , top), indicating that the relevant phrase, and presumably the novel idea, appeared first on biorxiv. however, the opposite is true more often than not (see figure , bottom). why is pubmed frequently the first place that novel ideas appear? one reason might be that biorxiv is relatively new, has not yet gained enough traction, and its benefits are not widely appreciated. one can even say it is surprising and encouraging that some novel ideas appear first on biorxiv despite it being a relative newcomer. thus one interpretation of our finding is that while preprint servers are already having a positive impact on research, there is still potential for growth. in this case we expect that in the future novel ideas will appear first on biorxiv at a higher rate. we further check this assumption in section . .

we assume that curves of publication frequency of novel ideas follow a particular shape: they are relatively flat, followed by a growth period. this assumption is built into the design of band. to verify band's effectiveness, we use a set of novel phrases provided by the team of biomedical curators at chan zuckerberg initiative (czi), extract their publication frequency data from pubmed, and calculate τ using band. qualitatively, we see in about the novelty onset. as discussed in section . , wcpd is another technique for finding points of inflection. wcpd is a useful method because it finds inflection points without requiring any prior model of the data. as expected, if the evolution of the data follows our model of growth, band gives a better estimate of the novelty onset (figure , left and center).
on the other hand, if the data do not follow the band model, band is not expected to work well, and assumption-free models like wcpd might work better. an example is shown in figure (right), where a period of growth is followed by a plateau rather than the continued growth assumed by the band model.

a potentially confounding variable in our experimental setup is the relative recency of the establishment of biorxiv ( ) compared to pubmed ( ). this motivates us to measure the correlation between the publication frequencies of these two data sources. we proceed by looking at three groups of phrases: (a) common phrases found in medical terminology, (b) novel phrases provided by experts, and (c) highest scoring phrases according to textrank. first, we aggregated the last years of data. we found that, for common phrases and phrases extracted by textrank, the similarity between pubmed and biorxiv is highest when comparing the early snapshot of biorxiv with the later snapshot of pubmed (figure ). thus for these phrases biorxiv is predictive of pubmed. on the other hand, phrases selected by experts (section . ) tend to appear on pubmed first (figure , center). taking into account that biorxiv is still evolving, we performed a more fine-grained analysis where results are aggregated across only years instead of . this also allowed us to study how the content alignment between pubmed and biorxiv has changed over time. the result is shown in figure . according to this figure, as biorxiv matures, it becomes a better leading indicator of pubmed. perhaps more critically, it also shows that the content alignment is indeed changing. the analysis we provide in this work is at best a glimpse into the relationship between pre-print servers and peer-reviewed journals. it will likely change as biorxiv continues to mature.

figure : fine-grained correlation analysis. we perform the same analysis as in figure , except with a smaller window (two years) and shifting both biorxiv (x-axis) and the starting month (y-axis). if x = and y = , then the start month is april for pubmed and march for biorxiv. the data are shown on a grid with darker red indicating high correlation and dark blue indicating low or inverse correlation. in general, we see higher correlation between biorxiv and pubmed as biorxiv matures.

figure : outbreaks that occurred before biorxiv became widespread (top row, a), with publishing activity plateaued in recent years, versus outbreaks occurring after biorxiv was founded (bottom row, b), with recent growth in research activity. (b) virus outbreaks that took place after the start of biorxiv. the outbreak and the year it began are shown above each plot.

during natural disasters such as virus outbreaks, scientific progress towards understanding diseases and their cures is critical, warranting fast dissemination of ideas and research results. pre-print servers are particularly well suited to this end. thus we compare biorxiv to pubmed for five recent virus outbreaks, some of which took place prior to the founding of biorxiv. each virus outbreak was analyzed using a composite of the publication frequencies of multiple related phrases. for example, values for 'sars-cov ' and 'covid- ' are aggregated in the covid- plot. the five relevant outbreaks are listed below:
• prior to biorxiv establishment (before november , figure , top): mers and sars.
• after biorxiv establishment (figure , bottom): zika, ebola, and covid- .
the first group of outbreaks exhibits the expected behavior: biorxiv activity is fairly minimal, given that the growth of research on those topics had begun to saturate by the time biorxiv was formed. but the second group exhibits a different behavior. for the ebola outbreak, which took place in , the activity in biorxiv is fairly low compared to pmc. for zika ( ) and covid- ( ) biorxiv has an early activity spike. through this analysis we can see that biorxiv has become increasingly important in emergency situations during recent years.
our work serves as a first attempt at discovering the research impact of publication venues using full text analysis. one notable assumption we make is that all phrase mentions are treated equally. in the future, we may want to distinguish how a phrase is being used. for example, if the phrase of interest is z, then a research paper may write about works extending z, but alternatively it may simply discuss techniques similar to z. similar analysis has proved useful for citations (jurgens et al., ). furthermore, our approach leverages textrank to find many relevant phrases, but we do not cluster phrases, so two or more phrases with similar meaning will be treated separately (e.g., facs and fluorescence activated cell sorting). a simple alias table or string similarity extension (tam et al., ) would be a clear improvement. leveraging high precision concept extraction systems (king et al., ) might improve clustering even more. another approach would be to use a better proxy for ideas than textual phrases, such as concepts or topics extracted from the corpus.

we introduce a bayesian model for novelty detection (band), and use this model to investigate how quickly new ideas form on pre-print servers compared to peer-reviewed journals. our findings indicate that novel phrases, which we use as a proxy for new ideas, in most cases appear both on pre-print servers and in peer-reviewed journals. in some cases, novel phrases appear on pre-print servers first. in many cases the content of preprints is a predictor of the content of peer-reviewed journals. as the preprint servers mature, this feature becomes more prominent. when a fast review time is in high demand (such as during epidemic outbreaks), pre-print servers have high utility, and the related novel phrases appear on pre-print servers first.

in our experiments and analysis we primarily consider two data sources: biorxiv (representative of pre-print servers) and pubmed (representative of peer-reviewed journals).
we extract phrases using textrank as described in section . . in this section of the appendix, we summarize the phrases considered in the experiments. textrank extracted , , phrases from biorxiv abstracts. we further filter this list to , by only including phrases that were detected by textrank more than once. this does not necessarily mean that the filtered-out phrases occur in the biorxiv abstracts only a single time, but that textrank found them to be key phrases only once. the distribution of phrase lengths is shown in figure . a sample of phrases is listed in table . to validate the effectiveness of band, we use a small set of curated phrases provided by experts at czi. those results are discussed in detail in section . . the complete list of phrases is shown in table . in conducting the correlation study, explained in section . , we used a set of phrases that can be considered commonly used in biomedical literature. the complete list can be found in table . pre-print servers are especially useful during crises such as pandemics. we chose recent virus outbreaks to study the response of the publication venues, which is discussed in section . : sars, mers, ebola, zika, and covid- . we made one compound time series curve for each virus based on the frequency of occurrence of its phrases in pubmed and biorxiv. table shows two lists. the phrases are obtained using all ordered combinations of entries from column and column . for example, we can combine "sars-cov" and "epidemic" to form "sars-cov epidemic". each epidemic is associated with a set of phrases to form one compound set of phrases.
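The construction of compound phrases from the two lists can be illustrated with a short sketch. It assumes that "ordered combination" means an entry from the first list followed by an entry from the second; "sars-cov" and "epidemic" come from the example above, while "sars", "outbreak", and the function name are hypothetical additions for illustration.

```python
from itertools import product


def compound_phrases(first_column, second_column):
    """Join every entry of the first list with every entry of the second, in that order."""
    return [f"{a} {b}" for a, b in product(first_column, second_column)]


phrases = compound_phrases(["sars-cov", "sars"], ["epidemic", "outbreak"])
```

With two entries in each list this yields four phrases, starting with "sars-cov epidemic".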
an exploratory study of interdisciplinarity and breakthrough ideas a reasoning-based framework for the computation of technical emergence a survey of methods for time series change point detection estimating multiple breaks one at a time preprints for the life sciences measuring researcher independence using bibliometric data: a proposal for a new performance indicator citation concept analysis (cca): a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by exemplary case studies including classic books by thomas s. kuhn and karl r. popper overview of the trec terabyte track data-driven predictions in the science of science emerging technologies: quantitative identification and measurement automatic scoring for innovativeness of textual ideas karthik ram, timothée poisot, and dominique gravel. . the case for open preprints in biology unsupervised prediction of citation influences collaboration diversity and scientific impact how to identify flawed research before it becomes dangerous citation count analysis for papers with preprints unbalanced haar technique for nonparametric function estimation primordial concepts, citation indexing, and historio-bibliography algorithmic citation-linked historiography--mapping the literature of science overview of the trec novelty track analyzing citation-distance networks for evaluating publication impact text and graph based approach for analyzing patterns of research collaboration: an analysis of the trueim-pactdataset relative citation ratio (rcr): a new metric that uses citation rates to measure influence at the article level network dynamics of innovation processes measuring the evolution of a scientific field through citation frames efficient online novelty detection in news streams optimal detection of changepoints with a linear computational cost high-precision extraction of emerging concepts from sientific literature . 
a novel approach to predicting exceptional growth in research. arxiv e-prints successful fish go with the flow: citation impact prediction based on centrality measures for term-document networks preprints can fill a void in times of rapidly changing science laboratory life: the construction of scientific facts time for a prepublication culture in clinical research? prediction of citation counts for clinical articles at two years using data available within three weeks of publication: retrospective cohort study predicting the impact of scientific concepts using full-text features textrank: bringing order into text pytextrank, a python implementation of textrank for phrase extraction and summarization of text documents preprints in medical research: progress and principles twitter predicts citation rates of ecological research does it take too long to publish research? delays in the publication of important clinical trial findings in oncology what is an emerging technology? choosing experiments to accelerate collective discovery proceedings of the th acm/ieee on joint conference on digital libraries on the value of preprints: an early career researcher perspective inorganica chimica acta: its publications, references and citations. 
an update for - metro maps of science citing for high impact measuring originality in science quantifying the evolution of individual scientific impact citations and certainty: a new interpretation of citation counts overview of the trec novelty track novelty detection: the trec experience factors affecting number of citations: a comprehensive review of the literature optimal transport-based alignment of learned character representations for string similarity selective review of offline change point detection methods atypical combinations and scientific impact accelerating scientific publication in biology a comparison study for novelty control mechanisms applied to web news stories bias against novelty in science: a cautionary tale for users of bibliometric indicators cord- : the covid- open research dataset three new bibliometric indicators/approaches derived from keyword analysis static ranking of scholarly papers using article-level eigenfactor (alef) measuring academic influence using heterogeneous author-citation networks the authors are grateful to bill burkholder (chan zuckerberg biohub) who suggested this research, and to barbara vidal & michaela torkar (chan zuckerberg initiative) for their expert help in selecting the phrases for comparison.this research was initiated at the university of massachusetts amherst industry mentorship program. key: cord- -tl fm yu authors: goletic, teufik; konjhodzic, rijad; fejzic, nihad; goletic, sejla; eterovic, toni; softic, adis; kustura, aida; salihefendic, lana; ostojic, maja; travar, maja; mrdjen, visnja; tihic, nijaz; jazic, sead; musa, sanjin; marjanovic, damir; hukic, mirsada title: phylogenetic pattern of sars-cov- from covid- patients from bosnia and herzegovina: lessons learned to optimize future molecular and epidemiological approaches date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: tl fm yu whole genome sequencing of four samples from covid- outbreaks was done in two laboratories in bosnia and herzegovina (veterinary faculty sarajevo and alea genetic center). all four bih sequences cluster mainly with european ones (italy, austria, france, sweden, cyprus, england). the constructed phylogenetic tree indicates probable multiple independent introduction events. the success of future containment measures concerning new introductions will be highly challenging for the country due to the significant proportion of the bh population living abroad. the first case of a new viral respiratory disease (covid- ) caused by a newly discovered virus (sars-cov- ) was confirmed in bosnia and herzegovina (bh) on march , . in the period until th of june, a total of cases were laboratory confirmed, of which with fatal outcome [ ] . on th of march bh declared a national emergency, followed by the introduction of very restrictive measures on international movement, social distancing, mandatory use of personal protective equipment and lockdown for the population under and above years. despite having different health authorities and no state ministry of health due to its complex governmental structure, it is generally agreed that bh authorities successfully implemented case detection and tracing practices, keeping the epidemic from overwhelming health care capacities. the lifting of restrictive measures started in mid-may and, regardless of the increase in new case numbers in the first half of june, authorities are keeping to the plan to return to pre-epidemic life conditions. the bh scientific community has significant experience in, and previously established capacity for, a wide spectrum of both human and animal virologic genetic analysis [ , , , ] . as a result, several scientific teams were prepared to be involved in a joint mission of genome sequencing of sars-cov- in bh.
objectives of this research were: to share obtained sequences of the complete genome of sars-cov- strains from clinical samples of bih patients diagnosed with covid- , and to contribute to the understanding of the interaction of molecular and classical epidemiology findings of covid in bh and the whole region and give recommendations for the improvement of prevention and future measures. the samples used in this study were nasopharyngeal swabs in viral transport medium from four patients, originating from tuzla, sarajevo, livno and banja luka, and rt-pcr covid- confirmed on , , , and of april , respectively ( samples from livno and banja luka were processed at the laboratory of veterinary faculty sarajevo (vfs) and samples from tuzla and sarajevo at the alea genetic center, sarajevo (agc). a total of μl of each sample was used for rna extraction by qiaamp viral rna mini kit (qiagen, germany), and the product of extraction was later used for further analysis, while the rest of the samples were used for virus isolation on various cell lines at the vfs (data not shown here). the presence of sars-cov- in the samples was confirmed by real-time rt-pcr using hku protocol [ ] , and the cycle threshold (ct) values were , ; , ; , and , for livno, banja luka, tuzla and sarajevo samples, respectively. livno and banja luka samples wgs was performed according to the artic amplicon sequencing protocol for minion for ncov- , which uses two primer pools to generate the sequence, as described elsewhere [ ] . the sequencing was performed on a minion sequencer, using an r . . flow cell on which the samples, as well as a negative control, were pooled. minknow software and the minit device were used for high accuracy realtime base-calling during the run, which lasted hours. only the base-called fastq files with the q score ≥ were used for further analysis. the bioinformatic analysis was performed according to the ncov- novel coronavirus bioinformatics protocol [ ] . 
the consensus sequence was mapped, for correction purposes, to the wuhan reference genome (mn ) using minimap , and polished in racon. tuzla and sarajevo samples wgs was performed according to ion ampliseq™ sars-cov- research panel instructions for use on an ion genestudio™ s series system, as described previously [ ] , with certain modifications. sequencing was performed on ion genestudio™ s instrument using chip. raw data was analyzed using torrent suite software . . where the sequences were aligned to the ion ampliseq sars-cov- reference genome. fasta format of the obtained sequences was generated by the iterative refinement meta-assembler (irma) plugin. sequences were additionally reviewed using bioedit software. all sequences were deposited in the global initiative on sharing all influenza data (gisaid) https://www.gisaid.org/epiflu-applications/next-hcov- -app/). to perform a phylogenetic analysis (pa) of sequences from livno (gisaid accession id: epi_isl_ ), banja luka (gisaid accession id: epi_isl_ ), tuzla (gisaid accession id: epi_isl_ ) and sarajevo (gisaid accession id: epi_isl_ ), a dataset of whole genome sequences was obtained from gisaid (supplementary material). these sequences were chosen based on their similarity with four bh sequences mentioned above. namely, a set of sequences from other countries, displaying the highest sequence identity with each bh sequence, were chosen. sequence alignment and the construction of the phylogenetic tree were performed in mega x [ ] . two main gisaid lineages [ ] were highlighted with differently coloured brackets and named accordingly: b. . . (gr) (yellow bracket) and b. (g) (blue bracket). four bh sequences were marked with a different marker (red square, circles and a triangle), and bootstrap values were shown at the level of nodes. livno sequence (gisaid accession id: epi_isl_ ) is marked differently from other three bh sequences because its gisaid subclade (b. . . 
(o)) is different than the subclade it has been sorted in this phylogenetic tree (b. . . (gr) ). position in spike protein, used to characterize the g clade [ ] , was shown to be g in bh isolates, assigning all four bih isolates to g clade together with european sequences (italy, austria, france, sweden, cyprus, england). the constructed phylogenetic tree in figure indicates probable multiple independent introduction events as reflected by clustering of each single bih sequence in a separate cluster, highlighted with red (livno, epi_isl_ ), green (banja luka, epi_isl_ ), blue (sarajevo, epi_isl_ ) and purple (tuzla, epi_isl_ ). we sequenced the whole genome out of four samples from four different bh regions, originated from tuzla, livno, sarajevo and banjaluka municipalities. even though all four bih sequences cluster mainly with european ones, the constructed phylogenetic tree indicates multiple independent introductions of covid- in bih. those findings correspond with other covid- wgs studies [ , ] and confirm the importance of travel-associated disease introduction events for bih as well. the success of future mitigation measures related to this type of disease introduction will be highly challenging for bih due to a significant proportion of bih population living abroad. wgs is a valuable approach for further investigation of the level of association of various modes of international movement (touristic, economic, religious, family visits etc) with transmission chains and associated risks. recently, significant increase in the number of new cases (more than in just one week and both international and community transmitted) emphasized a need to establish a more efficient scientific and expert system for further work on the genetic analysis of the sars cov- virus, and to continue investigation of the whole genetic covid- pattern as well. 
integration of phylogenetic (molecular) and epidemiological approaches in the assessment of human, animal and environmental data will help with the identification of risk factors for disease spreading and optimize efficient and rational use of preventive measures. the importance of results gained from such approaches and studies will help in communication and scientific justification of social distancing and movement restriction perception in public. all generated viral genome assemblies have been submitted to gsiad. all submitters of data may be contacted directly via www.gisaid.org ministry of civil affairs of bosnia and herzegovina dna identification of skeletal remains from the second world war mass graves uncovered in slovenia highly effective dna extraction method for nuclear short tandem repeat testing of skeletal remains from mass graves identification of human remains from the second world war mass graves uncovered in bosnia and herzegovina molecular determinants of pathogenicity and host specificity of highly pathogenic h n bih isolates school of public health, the university of hong kong ncov- sequencing protocol v . protocols.io [internet ncov- novel coronavirus bioinformatics protocol. artic network ion ampliseq™ sars-cov- research panel instructions for use on an ion genestudio™ s series system quick reference (pub. no. 
man ) molecular evolutionary genetics analysis across computing platforms a dynamic nomenclature proposal for sars-cov- to assist genomic epidemiology genomics of indian sars-cov- : implications in genetic diversity, possible origin and spread of virus genetic structure of sars-cov- reflects clonal superspreading and multiple independent introduction events genomic epidemiology of sars-cov- spread in scotland highlights the role of european travel in covid- emergence part of this work was conducted in molecular-diagnostic and forensic laboratory of veterinary faculty university of sarajevo that is supported with equipment and training of staff by international atomic energy agency (iaea) under project "strengthening state infrastructure for food and animal food control and protecting animal health" (boh ). the authors wish to thank dr joshua quick for his kind contribution of artic primer set for whole genome sequencing, which greatly assisted in their research. the authors wish to thank dr goran cerkez from federal ministry of health for his continuing support of this research and dr mirza ponjavic from university of tuzla for technical support. we gratefully acknowledge the authors, the originating and submitting laboratories for their sample and metadata shared. tg, rk, dm and nf conceptualised and designed the study. tg, rk, sg, te, ak, as, and ls designed and implemented lab protocols. tg, rk and sg developed and implemented sequencing data analysis approaches. mh, mo, mt, vm, nt, sj and sm provided clinical samples and epidemiological data. tg, rk, sg, dm, sm and nf made an analysis of data and drafted discussion and conclusions of the study. all authors have commented on the draft and approved the final version. key: cord- - lr th authors: shishir, tushar ahmed; naser, iftekhar bin; faruque, shah m. title: in silico comparative genomics of sars-cov- to determine the source and diversity of the pathogen in bangladesh date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: lr th the covid pandemic caused by sars-cov- virus has severely affected most countries of the world including bangladesh. we conducted comparative analysis of publicly available whole-genome sequences of sars-cov- isolates in bangladesh and isolates from other countries to predict possible transmission routes of covid to bangladesh and genomic variations among the viruses. phylogenetic analysis indicated that the pathogen was imported into bangladesh from multiple countries. the viruses found in the southern district of chattogram were closely related to strains from saudi arabia whereas those in dhaka were similar to those of the united kingdom and france. the sars-cov- sequences from bangladesh belonged to three clusters. compared to the ancestral sars-cov- sequence reported from china, the isolates in bangladesh had a total of mutations in the coding region of the genome, and of these were missense. among these, missense mutations ( %) were predicted to destabilize protein structures. remarkably, a mutation that leads to an i f change in the nsp protein and a mutation leading to a d g change in the spike protein were prevalent in sars-cov- genomic sequences, and might have influenced the epidemiological properties of the virus in bangladesh. the pandemic of coronavirus disease, referred to as the covid- pandemic, which originated in wuhan, china in december , is ongoing and has now spread to countries and territories. as of july , the pandemic has caused about million cases and over half a million deaths. this novel virus of the coronaviridae family and betacoronavirus genus ( , ), designated as severe acute respiratory syndrome coronavirus (sars-cov- ), is the causative agent of covid- . previously two other coronaviruses, namely sars-cov and mers-cov, have demonstrated high pathogenicity and caused epidemics with mortality rates of ~ % and ~ % respectively, affecting more than countries each time ( ) ( ) ( ) .
however, sars-cov- has proven to be highly infectious and caused a pandemic that has spread to over countries and territories. besides its devastating impact in north america and europe, the disease is now rapidly spreading in south america including brazil, and in south asian countries, particularly india, pakistan and bangladesh ( ). the virus was first detected in bangladesh in march ( ). although infections remained low until the end of march, they began to rise steeply in april. by the end of june, new cases in bangladesh grew to nearly , and the rate of detection of cases compared to the total number of samples tested increased to about %, which was the highest in asia ( ). sars-cov- is a positive-sense single-stranded rna virus with a genome size of nearly kb. the ' end of the genome codes for a polyprotein which is further cleaved to viral nonstructural proteins whereas the ' end encodes the structural proteins of the virus including the surface glycoprotein spike (s), membrane protein (m), envelope protein (e) and nucleocapsid protein (n) ( ). like other rna viruses, sars-cov- is also inherently prone to mutations due to high recombination frequency, resulting in genomic diversity ( ) ( ) ( ). due to the rapid evolution of the virus, development of vaccines and therapies may be challenging. to monitor the emergence of diversity, it is important to conduct comparative genomics of viruses isolated over time and in various geographical locations. comparative analysis of genome sequences of various isolates of sars-cov- would allow the identification and characterization of the variable and conserved regions of the genome, and this knowledge may be useful for developing effective vaccines, as well as in molecular epidemiological tracking. thousands of sars-cov- virus genomes have been sequenced and submitted to public databases for further study.
these include sars-cov- genomic sequences submitted from bangladesh to the global initiative on sharing all influenza data (gisaid) database, up to th june ( ). we conducted a comparative analysis of publicly available genome sequences of sars-cov- from countries to predict the origin of viruses in bangladesh by studying a time-resolved phylogenetic relationship. later, we analyzed the variants present in different isolates of bangladesh to understand the pattern of mutations in relation to the ancestral wuhan strain, and to find unique mutations, the possible effect of these mutations on the stability of encoded proteins, and the selection pressure on genes. a total of whole genome sequences of sars-cov- including sequences isolated in bangladesh (detailed information in s table), and that of isolates of each month between january and may, isolated in different countries with a high frequency of infection, were included in this analysis. the source and number of sequences are presented in table (detailed information is provided in s table). however, since only number of sequences were reported from different african countries, we included all sequences from the african countries and categorized them collectively as african sequences ( ). the reference sequence included in the various analyses was the sequence of the ancestral strain from wuhan, china (nc_ . ) ( ). selected sequences were annotated with the viral annotation pipeline and identification (vapid) tool ( ), and multiple sequence alignment was carried out using the mafft algorithm ( ). a maximum likelihood phylogenetic tree was constructed with iq-tree ( ), the generated tree was reconstructed based on time-calibration by treetime ( ), and visualized on the itol server ( ). for analysis of mutations, sequences were mapped with minimap ( ), and variants were detected using samtools ( ) and snp-sites ( ). a haplotype network was generated based on mutations in the genome using popart ( ).
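the variant-detection step described above can be illustrated with a minimal sketch. it assumes two sequences that have already been aligned to equal length (for example by a multiple sequence aligner) and reports substitutions in the ref>alt notation used in the results. the function name and the toy sequences are hypothetical, and this is not the mapping and variant-calling pipeline the authors ran.

```python
def point_mutations(ref, query):
    """List substitutions between two pre-aligned sequences.

    Returns (position, "ref>alt") tuples, where position is the 1-based
    coordinate in the ungapped reference. Hypothetical helper for illustration.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    pos = 0  # running coordinate in the ungapped reference
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-":
            continue  # insertion relative to the reference: no reference coordinate
        pos += 1
        if q in ("-", "N") or r == "N":
            continue  # deletions and ambiguous bases are not point mutations
        if r != q:
            calls.append((pos, f"{r}>{q}"))
    return calls


if __name__ == "__main__":
    # toy alignment, not real viral sequence
    print(point_mutations("ACGTCCA", "ACTTC-G"))
```

deletions are skipped here on purpose, since the text reports them separately from point mutations.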
sequences were then put into different clades based on specific mutations proposed in gisaid ( ) and further classified as d g type ( , ). subsequently, another phylogenetic tree and haplotype network containing only sars-cov- sequences from bangladesh were constructed and categorized using the same tools, and additionally clustered one step further with treecluster ( ). the direction of selection in sequences from bangladesh was calculated by the slac algorithm ( ) in the datamonkey server ( ). finally, the effects of the mutations on protein stability were predicted using deepddg ( ). a total of genomic sequences of sars-cov- reported from various countries (table ), which included sequences from bangladesh and the sequence of the ancestral sars-cov- isolated in wuhan, china, were analyzed in the time-resolved phylogenetic tree. sequences from bangladesh belonged to three different clusters, of which one cluster carried of the sequences and shared the same node with sequence from germany, while they had a common ancestry with isolates from the united kingdom (fig ). the remaining two clusters of sars-cov- sequences contained and sequences respectively from bangladesh, and they shared the same node with sequence of sars-cov- reported from india, and also shared a common ancestry with isolates from saudi arabia. besides, lone sequences that did not belong to any of these clusters were found to be similar to sequences from europe including the united kingdom, germany, france, italy, and russia. one of these sequences was closely related to sequence reported from the usa. subsequently, all sars-cov- sequences from representative countries were clustered, based on specific mutations sustained, into different clades as defined by gisaid. in this analysis, the sequences from bangladesh were found to be distributed in all clades except v (figs and ).
in order to understand the evolutionary relationship and possible transmission dynamics of sars-cov- in bangladesh at a higher resolution, another time-resolved phylogenetic tree carrying only sequences of the pathogen isolated in various regions of bangladesh was generated using the sequence of the first sars-cov- reported from wuhan, china as a reference. of the three clusters produced in this analysis, cluster- included mostly isolates from chattogram and one isolate from dhaka, cluster- included isolates from dhaka, narayanganj and chattogram districts, whereas cluster- included isolates from chattogram only. as mentioned above, the isolates from bangladesh were found to be distributed in all gisaid clades based on specific mutations, except in clade v (fig ). most isolates of dhaka and narayanganj ( of ) belonged to the gr clade, whereas those of chattogram belonged to five different gisaid clades (g, gh, gr, o, and s). the major international airport in bangladesh is situated in the capital city dhaka, whereas the major seaport is located in chattogram. based on the phylogenetic analysis, all isolates of dhaka were descendants of sars-cov- found in european countries, more specifically france and the united kingdom. on the other hand, most isolates of chattogram were found to be closely related to saudi arabian isolates. moreover, considering the gisaid clades, the s clade was absent among dhaka isolates whereas most isolates of chattogram were found to belong to the s clade. clearly, these two genomic variants of sars-cov- were initially imported by travelers from different countries, and the two variants initially spread in the two areas. the close relationship between the isolates of narayanganj and two isolates of dhaka indicates that the sars-cov- strain imported initially through international travelers to dhaka later spread to narayanganj, which is a densely populated city with river ports and large business centres.
the sars-cov- sequences were also categorized according to the d g type mutation (fig ). this particular subtype with a non-silent (aspartate to glycine) mutation at th position of the spike protein is presumed to have rapidly outcompeted other preexisting subtypes, including the ancestral strain. the d g mutation generates an additional serine protease (elastase) cleavage site near the s -s junction of the spike protein ( ). all but one sequence from dhaka and narayanganj were found to be of the g type, which carries glycine at position , whereas sequences of chattogram were of both types (fig ). in addition, the first sequence from bangladesh carried the g type of surface glycoprotein, which indicates that this dominant variant was present since the first isolation of sars-cov- in bangladesh; the mutant virus might have been imported to the country from europe, and the presence of the mutation might have facilitated viral transmission. relationships among dna sequences within a population are often studied by constructing and visualizing a haplotype network. we constructed a haplotype network by the median joining algorithm and found that of sars-cov- sequences from representative countries were alike, and therefore formed a large haplogroup (fig a). however, there were also a significant number of unique lineages consisting of single or multiple sars-cov- sequences (fig a). this network demonstrated the closeness of the sequences and their pattern of mutation beyond geographical boundaries. several sars-cov- isolates appeared to have sustained certain common mutations along with certain unique mutations. although a large proportion of sequences from bangladesh belonged to the common cluster (fig a), there was a significant number of unique nodes as well, due to mutations over time subsequent to being carried into bangladesh (fig a). therefore, analysis of the sequences from bangladesh provided further insight into their mutation patterns.
the haplotype network revealed that viruses isolated in bangladesh had certain unique mutations in them, and as a result they belonged to different haplogroups and no significant cig group (fig. a). most of the isolates sustained a significant number of mutations compared with each other. in addition, it further confirms that most isolates from chattogram (fig b) were not directly related to those isolated in dhaka or narayanganj. we detected the presence of point mutations in sars-cov- isolates from bangladesh when compared to the reference sequence from wuhan, china. in addition, isolates were found to have lost significant portions of their genome, and as a result lost sequences for some non-structural proteins such as orf and orf , while other deletions were upstream or downstream gene variants (s table). among the point mutations, mutations were in the non-coding region of the genome and were in coding regions. ten of the non-coding mutations were in the upstream non-coding region and the rest were in the downstream non-coding region of the genome. seventy mutations in the coding region were synonymous and mutations were predicted to substitute amino acids. among the twelve predicted orfs, orf ab, which comprises approximately % of the genome and encodes nonstructural proteins, had more than percent of the total mutations, while gene e encoding the envelope protein and orf b were conserved and did not carry any mutation. though orf harbored the highest number of mutations, mutation density was highest in orf considering orf lengths. details and distribution of the mutations are presented in table and the full analysis report is provided in s table. in sequences from bangladesh, c>t and c>t changes were the two most abundant mutations, found in out of isolates, and often found simultaneously (table ). position is located in the non-coding region whereas the mutation in position was synonymous.
on the other hand, sequences were found to harbor c>t and a>g mutations, which altered the amino acids pro>leu and asp>gly respectively, and these two mutations were found to be present simultaneously as well. in addition, other co- fig ) . orf was predicted to have a dn/ds value of . due to the presence of a higher number of missense than synonymous mutations. this finding indicates that orf is rapidly evolving and is highly divergent. the orf protein is an accessory protein whose function is yet to be fully elucidated ( ). orf a and the nucleocapsid phosphoprotein had dn/ds values of . and . respectively, which suggests that they are evolving strongly under positive selection pressure. orf is predicted to be conserved with dn/ds value , while the envelope protein and orf b did not harbor any mutation and were conserved. on the other hand, orf a and the surface glycoprotein might be approaching positive selection pressure and evolving, but the rest of the proteins were under negative selection pressure. table ) . none of the mutations in structural proteins were predicted to increase stability. all mutations were synonymous and found this gene conserved. in summary, mutation analysis revealed point mutations as well as deletion of base pairs. deletions of the base pairs were associated with missing non-structural proteins and predictably affected certain viral properties, since the orf a protein is the growth factor of the coronavirus family, induces apoptosis, and promotes viral encapsulation ( ) ( ) ( ), while orf is associated with viral adaptation by playing a role in host-virus interaction ( , ). furthermore, we have found that some genes are under positive selection pressure, indicating that the virus is fast-evolving, presumably to evade the host cell's innate immunity; this should be taken into special consideration prior to vaccine development or other treatment strategies.
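the distinction between synonymous and missense mutations, and the reading of dn/ds values discussed above, can be sketched as follows. this is a minimal illustration using the standard genetic code; the function names are hypothetical, the thresholds follow the usual convention (dn/ds above one read as positive selection, below one as negative selection), and it does not reproduce the slac analysis the authors performed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change (hypothetical helper, DNA alphabet)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # amino acid replaced by a stop codon
    if ref_aa == "*":
        return "stop-lost"  # stop codon replaced by an amino acid
    return "missense"


def selection_direction(dn_ds):
    """Conventional reading of a dn/ds ratio."""
    if dn_ds > 1:
        return "positive"
    if dn_ds < 1:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # illustrative codons only: an asp>gly and a pro>leu change
    print(classify_substitution("GAT", "GGT"))
    print(classify_substitution("CCT", "CTT"))
    print(selection_direction(0.4))
```

the codons in the usage lines are chosen only to reproduce the amino acid changes named in the text; the actual codons in the viral genome are not given in the source.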
finally, a missense mutation at a>t changing the amino acid isoleucine to phenylalanine in nsp protein was found uniquely among isolates in bangladesh. nsp is a methyltransferase like domain that interacts with phb and phb , and modulates the host cell survival strategy by affecting cellular differentiation, mitochondrial biogenesis, and cell cycle progression to escape from innate immunity ( , ) . this unique and high-frequency mutation might be a further interest of study, considering death rate against the infection rate in bangladesh. virus taxonomy: the database of the international committee on taxonomy of viruses (ictv) epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china sars and mers: recent insights into emerging coronaviruses sars and other coronaviruses as causes of pneumonia webmeter. coronavirus age, sex, demographics (covid- ) -worldometer. . available from: www.worldometers.info covid- outbreak situations in bangladesh: an empirical analysis origin and evolution of pathogenic coronaviruses coronaviruses: an rna proofreading machine regulates replication fidelity and diversity two mutations were critical for bat-to-human transmission of middle east respiratory syndrome coronavirus the proximal origin of sars-cov- global initiative on sharing all influenza data -from vision to reality a new coronavirus associated with human respiratory disease in china vapid: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to ncbi genbank mafft multiple sequence alignment software version : improvements in performance and usability iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies treetime: maximum-likelihood phylodynamic analysis interactive tree of life (itol) v : recent updates and new developments minimap : pairwise alignment for nucleotide sequences the sequence alignment/map format and 
samtools snp-sites: rapid efficient extraction of snps from multi-fasta alignments. microb genomics popart: full-feature software for haplotype network construction clade and lineage nomenclature aids in genomic epidemiology studies of active hcov- viruses tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity clustering biological sequences using phylogenetic trees not so different after all: a comparison of methods for detecting amino acid sites under selection datamonkey: rapid detection of selective pressure on individual sites of codon alignments deepddg: predicting the stability change of protein point mutations using neural networks a putative diacidic motif in the sars-cov orf protein influences its subcellular localization and suppression of expression of co-transfected expression constructs sars-cov accessory protein a directly interacts with human lfa- severe acute respiratory syndrome coronavirus gene products contribute to virus-induced apoptosis severe acute respiratory syndrome coronavirus orf a inhibits bone marrow stromal antigen virion tethering through a novel mechanism of glycosylation interference the orf protein of sars-cov- mediates immune evasion through potently downregulating mhc-i.
biorxiv the proteins of severe acute respiratory syndrome coronavirus- (sars cov- or n-cov ), the cause of covid- covid- : the role of the nsp and nsp in its pathogenesis key: cord- - xzkfl g authors: pandolfi, laura; bozzini, sara; frangipane, vanessa; percivalle, elena; de luigi, ada; violatto, martina bruna; lopez, gianluca; gabanti, elisa; carsana, luca; d’amato, maura; morosini, monica; de amici, mara; nebuloni, manuela; fossali, tommaso; colombo, riccardo; saracino, laura; codullo, veronica; gnecchi, massimiliano; bigini, paolo; baldanti, fausto; lilleri, daniele; meloni, federica title: neutrophil extracellular traps induce the epithelial-mesenchymal transition: implications in post-covid- fibrosis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xzkfl g the release of neutrophil extracellular traps (nets), a process termed netosis, avoids pathogen spread but may cause tissue injury. nets have been found in severe covid- patients, but their role in disease development is still unknown. the aim of this study is to assess the capacity of nets to drive epithelial-mesenchymal transition (emt) of lung epithelial cells and to analyze the involvement of nets in covid- . neutrophils activated with pma (pma-neu), a stimulus known to induce nets formation, induce both emt and cell death in the lung epithelial cell line, a . notably, nets isolated from pma-neu induce emt without cell damage. bronchoalveolar lavage fluid of severe covid- patients showed high concentration of nets. thus, we tested in an in vitro alveolar model the hypothesis that virus-induced net may drive emt. co-culturing a at air-liquid interface with alveolar macrophages, neutrophils and sars-cov , we demonstrated a significant induction of the emt in a together with high concentration of nets, il and il β, best-known inducers of netosis. lung tissues of covid- deceased patients showed that epithelial cells are characterized by increased mesenchymal markers. 
these results show for the first time that netosis plays a major role in triggering lung fibrosis in covid- patients. neutrophils represent the most abundant type of white blood cells. neutrophils protect against foreign pathogens and are considered an essential component of the innate immune system. after activation, neutrophils can react against pathogens with three major different mechanisms: i) phagocytosis; ii) release of granules (that contain proteases and ros); iii) netosis. this last mechanism is considered a type of programmed neutrophil cell death, which consequently releases nets ( ). nets are web-like structures composed of chromatin decorated with proteases, such as human neutrophil elastase (hne) and myeloperoxidase (mpo), whose primary role is limiting the spread of pathogens in tissues ( ). several signals can stimulate netosis: microorganisms; pro-inflammatory cytokines, such as il and il β; and chemicals, such as pma ( ). in spite of their protective role in reducing pathogen diffusion in the parenchyma, several studies revealed that a persistence of nets can amplify the primary injury, therefore inducing further clinical complications ( , , ). in the lung, the persistence of activated neutrophils that release nets has been linked to several diseases, such as cystic fibrosis and acute respiratory distress syndrome (ards). notably, covid- patients who developed ards show increased nets in the serum and, most importantly, net release significantly correlates with the severity of the lung pathology ( ). moreover, a recent paper demonstrated that nets are detectable in the tracheal aspirate of patients with covid- and that they are involved in the prothrombotic clinical manifestations of covid- ( ). the direct consequence of net persistence in the lung is the damage of epithelial and endothelial cells, driven predominantly by histones ( ).
additionally, nets have been found to drive the emt in the context of the breast cancer thanks to their capacity to upregulate transcription factors involved in the emt (zeb and snai ) ( ) . given these premises, the aim of this study is to assess whether netosis may drive the emt also in lung epithelial cells. this would represent a new crucial molecular mechanism underlying the development of inflammatory-induced lung fibrosis, thus it would also be of great importance for several pulmonary diseases, such as autoimmune microbial insults and transplant rejection. also, given the high concentration of nets in covid- patients that develop ards, this study is aimed at understanding if nets could be implicated in the induction of emt in sars-cov pneumonia by promoting a fibrosis. since netosis is toxic in the lung ( , ) and it induces the emt in breast cancer cells ( ) , we investigated if nets can induce the emt in the alveolar epithelial cell line a . we used pma as a primary chemical effector of netosis. confocal microscopy confirmed that pma leads to a very efficient release of nets by human neutrophils (fig. l -t) compared to non-activated cells (fig. a -i). nets are visible as web-like structures and, as previously reported ( , ) , they are positive for the histone h ( fig. l-n) , hne ( fig. o -q) and mpo (fig. r-t) . moreover, dapi labeling ( fig. l , o, r) further confirms that these proteins are associated to free-dna (fig. n , q and t). to evaluate the possibility that nets may induce the emt in lung epithelial cells, we cultivated a cells with pma-neu for and h. by microscopy we observed that a cells lose the epithelial morphology upon incubation with pma-neu ( fig. a) and gain a fibroblastic phenotype (fig. c) . these morphological changes are visible also after tgf-β treatment, the best-known effector of the emt (fig. b ). 
to confirm that these morphological changes are related to emt induction, we evaluated the expression of two major proteins involved in the emt: α-sma (a mesenchymal marker) and e-cadherin (an epithelial marker). we performed western blot analysis on a cells exposed to pma-neu or tgf-β, used as a positive control of emt induction. pma-neu induced a significant up-regulation of α-sma ( . ± . ) after h compared to control cells. notably, microscopy of a cells treated with pma-neu (fig. c) suggests that a portion of cells undergo cell death. the amount of cell death was quantified by propidium iodide (pi) staining h after treatment with pma-neu. flow cytometry analysis confirmed that pma-neu induced . ± . % of cell death compared to control cells (fig. s ). since pma-neu were able to induce both the emt and cell death of a cells, we asked if these effects were directly related to net release. nets were isolated from either . x cells (nets showing that the major cellular components are neutrophils, macrophages, epithelial cells and cellular/nuclear debris. in agreement with recent papers ( , , ), measuring nets in the bal of mild (imw) and severe (icu) patients, we found that icu patients have a significantly higher amount of nets compared to imw patients (fig. a). interestingly, when patients were divided into survivors and non-survivors, we found that the non-survivor group presented more nets compared to patients who survived (figure b). finally, we assessed the neutrophil count (fig. c) and the levels of il (fig. d) in the same bal samples as previously reported ( ), and found a significant direct correlation between nets and neutrophil counts, as well as the levels of il . we next investigated whether, in the context of covid- , nets could play a significant role in inducing the emt. to do so, we set up an alveolar in vitro model by culturing a cells at the air-liquid interface (ali).
alveolar macrophages (am) were seeded at the apical side of a cells, while neu were added to the basolateral chamber of the transwell ( , ) . our in vitro alveolar model was, to clarify if nets are produced by neu and involved in this in vitro experimental setting, we measured hne-dna complexes in culture media after h of incubation, but also il and il β, two most known effectors of netosis. in agreement with the role played by nets in the induction of the emt, we found a significant higher induction of nets in neu+am+sars-cov compared to neu+sars-cov and, as expected, sars-cov alone (fig. c) . similarly, the quantification of il (fig. d ) and il β (fig. e ) showed that these two cytokines are only produced in neu+am+sars-cov cultures, in agreement with the well-known capacity of am to secrete these cytokines ( ) . our study demonstrated for the first time that nets are sufficient to induce emt in the a alveolar epithelial cell line. moreover, we confirmed a significant correlation between net induction in the lung and the severity of covid- . we also demonstrated in an in vitro model that nets, possibly sustained by the secretion of pro-inflammatory cytokines from ams, are involved in the inducement of emt. notably, lung tissues of covid- deceased patients showed that epithelial cells are characterized by increased mesenchymal markers. we found that neutrophils efficiently induce the emt via netosis, an observation that, at the best of our knowledge, has never been demonstrated before in the context of lung fibrosis and, more importantly, in severe covid- . net-related injuries are usually studied in the context of the endothelium ( , ) . in particular, the prothrombotic role ( ) of nets and their ability to drive endothelial-mesenchymal transition ( ) has been thoroughly analyzed. 
however, recent reports suggested that the infiltration of neutrophils in the alveolar space interferes with cell-cell adhesion of lung epithelial cells through hne ( ), and that neutrophils induce the emt by releasing tgf-β, neutrophil gelatinase-associated lipocalin (ngal) ( ) or proteinase-activated receptor (par ) ( ). herein, we demonstrate that nets can also directly activate the emt program in a cells, a commonly used model of type ii pneumocytes. we showed that the addition of purified nets to a cells induced a significant overexpression of the mesenchymal marker α-sma after h of treatment, together with a decrease of e-cadherin expression (fig. ). comparing the effects of pma-neu and nets, we found that although both treatments are able to trigger the emt (fig. and ), the presence of neutrophils is uniquely able to induce the death of a cells (fig. s ). these results not only confirm data already present in the literature, but also shed new light on the mechanism that leads to the emt in the lung: exaggerated netosis not only induces tissue damage ( , ) but also triggers the emt in lung epithelial cells. another important finding of our study is the presence of nets in the bal of covid- patients. given the high percentage of nets in the peripheral blood and the exaggerated infiltration of neutrophils in alveolar spaces of sars-cov- -infected patients ( , ), the role of these phagocytes in the pathogenesis of covid- was clear since the very beginning. net levels in different samples of sars-cov -infected patients significantly correlated with the disease severity, suggesting that netosis may play a relevant role in covid- ( , , ). here, we confirmed and expanded previous observations by quantifying nets in the alveolar micro-environment.
we took into consideration two groups of patients, those with mild disease (patients who were admitted to the hospital with signs and symptoms of bilateral interstitial pneumonia but did not require intubation) and those with severe disease (who needed icu support). our data demonstrated higher levels of nets in severe patients compared to imw (fig. a). in addition, net levels correlated with the percentage of neutrophils in the bal (fig. c), as well as with the levels of il (fig. d) and the disease severity (fig. b). because of the high levels of nets in the bal of covid- patients and their potential role in inducing the emt, we next focused on the correlation between nets and the emt by setting up an in vitro alveolar model that we infected with sars-cov- . to mimic the alveolar microenvironment, a were cultivated at the ali to provide contact with air, as under physiologic conditions. next, we added am onto the apical side of a and neu in the basolateral chamber of the transwell. sars-cov was inoculated on the apical side and after h only the neu+am+sars-cov culture significantly induced the emt in a , in contrast to the neu+sars-cov and sars-cov alone conditions (fig. a and b). another recent study suggested that sars-cov infection may trigger emt-like molecular changes in a , as determined by the upregulation of the zeb gene together with the downregulation of epcam ( ). however, in our experimental model, we demonstrated that the sole presence of sars-cov is not sufficient to induce an efficient emt, as measured by α-sma/e-cadherin regulation (fig. a). notably, also in the case of neu+sars-cov we did not see a strong emt, as measured by α-sma/e-cadherin protein modulation in a (fig. a), although neu were stimulated to secrete nets by sars-cov (fig. c), as previously described by veras et al ( ). thus, we hypothesize that the amount of nets released by neu in neu+sars-cov is not sufficient to elicit the emt in a cells.
in fact, when we added am, we observed an enhancement of net release, probably due to the capacity of these cells to secrete il and il β (fig. d and e), two proteins known to efficiently drive netosis ( ). we are aware of the low endogenous expression by a of ace , the specific receptor for sars-cov cell entry ( , ). however, our results are even more interesting because they emphasize the critical role of innate immunity in sars-cov infection despite the internalization of virus by cells. in particular, we would like to propose a model in which macrophages, by releasing il and il β, two cytokines significantly increased in the plasma ( , ) and bal-fluid of severe covid- patients ( ), potentiate the capacity of nets released by neu to amplify the noxious activity on lung cells, favoring the emt. our data are in agreement with previous studies that showed the existence of a feedback loop between neutrophils/nets and release of il ( ) and il β ( ). the immunohistochemistry analysis of lung tissue obtained post-mortem from deceased covid- patients supported our in vitro findings, showing a subset of pneumocytes expressing mesenchymal markers (fig. ), thus confirming that epithelial cells acquired a mesenchymal phenotype. future studies will be focused on in conclusion, this study highlights the contribution of neutrophil activation and net generation in the induction of the emt. several inflammatory disorders besides ards, such as autoimmune diseases and chronic lung graft rejection, have been associated to various degrees with alveolar and small airway influx of neu and with their activation ( , ). we thus hypothesize that netosis-induced emt should be considered as an important pathogenic step of lung fibrosis consequent to neutrophilic inflammation. our findings also suggest that future therapeutic interventions could target this mechanism. of the irccs policlinico san matteo foundation. bals were collected as previously described ( ).
after centrifugation, supernatants were analyzed to quantify il release; cell pellets were fixed, and neutrophils were counted after staining with papanicolaou. additional bal samples were fixed in % buffered formalin and cell blocks were prepared; three-m paraffin sections were stained by h&e for cytological examination. lung autopsy tissues from patients who died of sars-cov- pneumonia were fixed in % buffered formalin for h. three-m paraffin sections were stained by h&e. immunohistochemistry reactions were performed on the most representative area by using anti-cytokeratin (clone sp , ventana) and anti-α-sma (clone a , ventana). neutrophils were isolated from the peripheral blood of healthy donors. blood was stratified on lympholyte® cell separation media (euroclone, milan, italy) and, after centrifugation at g for min, the mononuclear cell phase was eliminated to allow neutrophil isolation. a solution of % dextran ( kda) in . % nacl was added to the remaining phases of neutrophils and erythrocytes, followed by a min incubation at room temperature, allowing the precipitation of erythrocytes. the supernatant was collected and washed with pbs (euroclone). after centrifugation at g for min, the pellet was resuspended with versalyse lysing solution (beckman coulter s.r.l., milano) for min at room temperature in the dark. after the addition of pbs, the sample was centrifuged again at g for min. this step was repeated until the pellet was found to be free of erythrocytes. alveolar macrophages (am) were isolated from the bal of four patients without lung infection and with a cytologic count showing am > % and lymphocytes < %. bal were filtered with a sterile gauze and centrifuged at g for min, and then the cell pellet was cultured in suspension for h in dmem containing % fcs, p/s and l-glutamine. to induce the release of nets and to isolate them we followed a published protocol with some modifications ( ).
x isolated neutrophils were cultivated in cm plate with the addition of nm pma (sigma-aldrich s.r.l.,milan, italy) for h at °c. after that, supernatant was discarded, and the layer of nets present onto the bottom of the plate was harvested using pbs and collected into falcon. after centrifugation at g for min, cell-free nets-rich supernatant was divided in . ml eppendorf and centrifuged at , g for min at °c. a wild strain of sars-cov- was isolated in vero e cell line from a nasal swab of a covid- infected patient. virus was propagated and titrated to prepare a stock to be stored at - °c to be used for all the experiments at a concentration of tcid . a cell line (purchased from atcc ® ) was cultivated with dmem high glucose supplemented with % of fbs, % of penicillin-streptomycin solution and % of l-glutamine (all purchased by euroclone). cell were cultivated at °c with % of co and harvested when they reached the % of confluence. to study the induction of emt on a by pma-neu/nets, . x a were cultured on well plate for h. . x and x pma-neu /nets were added in each well and after and h supernatants were discarded and a were lysed to extract all proteins to perform western blot analysis. as positive control we treated a with g ml - of tgf-β. cell death was evaluated labeling a with pi after h of incubation with x pma-neu /nets and analyzed with flow cytometry. to mimic the alveolar space in vitro, . x a cells were cultured onto transwell inserts ( . (mab r, chemicon). after wash, the membranes were incubated with the appropriate horseradish-peroxidase conjugated secondary ab ( : in tbst + % bsa; h at room temperature; anti-mouse a and anti-rabbit a , sigma). the immunoreactivity was detected by ecl reagents (amersham), acquired with the chemidoc imaging system (image lab, bio-rad). to quantify cytokines released in the alveolar in vitro model, elisa assays were performed. 
we quantified il with simplestep elisa® kit (abcam) and il β with a commercial enzyme-linked immunosorbent assay kit (human il- β/il- f immunoassay, r&d systems), following the manufacturer's instructions; the results were expressed as pg ml - . all determinations were measured in the same session. for il quantification in bal of covid- patients we referred to already published quantifications ( ) . to quantify nets we measured hne-dna complexes using an elisa specific for hne and the anti-dna ab of the cell death detection elisa plus kit (roche). briefly, µl of sample (diluted : ) was added to the anti-hne elisa kit (thermo fisher scientific) for h at room temperature. after three washes, anti-dna ab was added to each well, followed by an incubation of h at room temperature. after [...] in vitro results presented in this study were analyzed by one-way anova followed by dunnett test. results were obtained from three/four independent experimental replicates and are represented as mean ± sd. bal analyses were done by mann-whitney test, while correlation analyses were done by calculating the spearman coefficient. data are represented as median (interquartile range -iqr). all analyses were carried out with the graphpad prism . statistical program (graphpad software, san diego, ca, usa). a value p < . was considered statistically significant. research and data collection protocols were approved by the institutional review boards (comitato [...] images of merged channels. scale bar = µm.
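the rank-based statistics named above (mann-whitney test for the bal comparisons, spearman coefficient for the correlations) can be illustrated with a minimal pure-python sketch. this is not the graphpad prism procedure used in the study; the function names and the toy values are hypothetical, and only the test statistics are computed, not the p values.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic (the smaller of U1 and U2)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)


# toy values only (hypothetical, not data from the study)
nets_signal = [0.2, 0.5, 0.9, 1.4]
neutrophil_pct = [5.0, 12.0, 30.0, 61.0]
rho = spearman_rho(nets_signal, neutrophil_pct)
```

a statistics package would normally be used to obtain p values and to handle small-sample corrections; the sketch only shows what the two statistics measure.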
neutrophil extracellular traps directly induce epithelial and endothelial cell death: a predominant role of histones netosis, complement, and coagulation: a triangular relationship neutrophil extracellular traps: double-edged swords of innate immunity neutrophilia and netopathy as key pathologic drivers of progressive lung impairment in patients with covid- targeting potential drivers of covid- : neutrophil extracellular traps maladaptive role of neutrophil extracellular traps in pathogen-induced lung injury neutrophil extracellular traps (nets) as markers of disease severity in covid- . medrxiv neutrophil extracellular traps (nets) contribute to immunothrombosis in covid- acute respiratory distress syndrome. blood neutrophil extracellular traps (nets) promote pro-metastatic phenotype in human breast cancer cells through epithelial-mesenchymal transition. cancers (basel) computational detection and quantification of human and mouse neutrophil extracellular traps in flow cytometry and confocal microscopy. sci rep integrin-dependent cell adhesion to neutrophil extracellular traps through engagement of fibronectin in neutrophil-like cells neutrophil extracellular traps infiltrate the lung airway, interstitial, and vascular compartments in severe covid- sars-cov- -triggered neutrophil extracellular traps mediate covid- pathology broncho-alveolar inflammation in covid- patients: a correlation with clinical outcome. 
medrxiv the crux of positive controls -proinflammatory responses in lung cell models air-liquid interface: relevant in vitro models for investigating air pollutant-induced pulmonary toxicity phenotypic, functional, and plasticity features of classical and alternatively activated human macrophages pulmonary post-mortem findings in a series of covid- cases from northern italy: a two-centre descriptive study neutrophil extracellular traps induce endothelial cell activation and tissue factor production through interleukin- α and cathepsin g nets activate pulmonary arterial endothelial cells the pathway of neutrophil extracellular traps towards atherosclerosis and thrombosis neutrophil extracellular traps drive endothelial-to-mesenchymal transition neutrophil elastase cleaves epithelial cadherin in acutely injured lung epithelium increased neutrophil gelatinase-associated lipocalin (ngal) promotes airway remodelling in chronic obstructive pulmonary disease proteinase-activated receptor stimulation-induced epithelialmesenchymal transition in alveolar epithelial cells drive necroinflammation in covid- hematological findings and complications of covid- sars-cov- infection induces emt-like molecular changes, including zeb -mediated repression of the viral receptor ace , in lung cancer models neutrophil extracellular traps stimulate proinflammatory responses in human airway epithelial cells structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structural basis of receptor recognition by sars-cov- characterization of the inflammatory response to severe covid- illness clinical and pathological investigation of patients with severe covid- airway neutrophilia in stable and bronchiolitis obliterans syndrome patients following lung transplantation the role for neutrophil extracellular traps in cystic fibrosis autoimmunity simplified human neutrophil extracellular traps (nets) isolation and handling a-c) representative optical images of (a) untreated a 
, (b) a incubated with tgf-β, or (c) with pma-neu after h of treatment. scale bar = µm. (d,e) quantification of immunoblots of a using anti-e-cadherin or anti-α-sma incubated with tgf-β or pma-neu after (d) h, or (e) h. data are represented as mean ± sd of three independent replicates. (f) representative immunoblot of a line and = tgf-β; line and = pma-neu. statistical tests: one-way anova followed by dunnet test a) quantification of immunoblots using anti-e-cadherin or antiα-sma of a incubated with nets . (derived from . x pma-neu) or nets (derived from x pma-neu) after h of treatment. data are represented as mean ± sd of three independent replicates. (b) representative immunoblot of a treated with nets . or nets after h a) quantification of nets in bal of mild (imw) and severe (icu) patients. nets were measured by hne-dna complexes quantification subtracting absorbance read at nm to that read at nm. (b) quantified nets were compared dividing samples in survivors and non-survivors. statistical tests: shapiro-wilk test followed by mann-whitney test. (c) correlation analysis between nets and percentage of neutrophils counted in the respective bal sample. (d) correlation analysis between nets and il quantified in the respective bal sample. correlation analyses were done by spearman correlation . r = spearman coefficient western blot of emt nets and nets/cytokines analysis in alveolar in vitro model semiquantitative analysis of four independent immunoblots and (b) representative immunoblot using anti-e-cadherin or anti-α-sma or β-actin of a co-cultured with sars-cov , neu+sars-cov , or neu+am+sars-cov after h of treatment statistical tests: one-way anova followed by dunnet test. (c) nets quantification by dna-hne complexes measurement statistical tests: shapiro-wilk test followed by one-way anova and dunnet test. 
data are represented as mean ± sd lung tissue of covid- deceased patient with dad in proliferative phase vascular congestion, oedema, hyperplastic pneumocytes, a few granulocytes, fibrin deposition and focal hyaline membranes in alveolar spaces a) x and (b) x. (c-d) anti-cytokeratin immunohistochemistry staining residual epithelial cells; immunohistiochemistry, om (c) x and (d) x. (e-f) anti-α-sma immunohistochemistry staining of mesenchymal stromal and vascular cells om (e) x and (f) x. (f) the same epithelial cells of (d) express α-sma indicating the gain of a mesenchymal phenotype key: cord- -azt sr w authors: peng, qi; peng, ruchao; yuan, bin; wang, min; zhao, jingru; fu, lifeng; qi, jianxun; shi, yi title: structural basis of sars-cov- polymerase inhibition by favipiravir date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: azt sr w the outbreak of severe acute respiratory syndrome coronavirus (sars-cov- ) has developed into an unprecedented global pandemic. nucleoside analogues, such as remdesivir and favipiravir, can serve as the first-line broad-spectrum antiviral drugs against the newly emerging viral diseases. recent clinical trials of these two drugs for sars-cov- treatment revealed antiviral efficacies as well as side effects with different extents – . as a pyrazine derivative, favipiravir could be incorporated into the viral rna products by mimicking both adenine and guanine nucleotides, which may further lead to mutations in progeny rna copies due to the non-conserved base-pairing capacity . here, we determined the cryo-em structure of favipiravir bound to the replicating polymerase complex of sars-cov- in the pre-catalytic state. this structure provides a missing snapshot for visualizing the catalysis dynamics of coronavirus polymerase, and reveals an unexpected base-pairing pattern between favipiravir and pyrimidine residues which may explain its capacity for mimicking both adenine and guanine nucleotides. 
these findings shed lights on the mechanism of coronavirus polymerase catalysis and provide a rational basis for developing antiviral drugs to combat the sars-cov- pandemic. proteins . a total of structural proteins and at least non-structural proteins (nsps) as well as accessory proteins are encoded by the coronavirus genome. among them, nsp is the core catalytic subunit of viral rna-dependent rna polymerase (rdrp) complex, which executes transcription and replication of the viral genomic rna . two cofactor subunits, nsp and nsp , associate with nsp to constitute an obligatory core polymerase complex to confer processivity for rna elongation . to achieve complete transcription and replication, a panel of other nsps are involved to accomplish other enzymatic functions, including the nsp -nsp exonuclease for proofreading [ ] [ ] [ ] , the nsp helicase for rna unwinding [ ] [ ] [ ] [ ] , the nsp n methyltransferase and the nsp -nsp '-o-methyltransferase for capping [ ] [ ] [ ] [ ] [ ] . due to the key roles of polymerase complex for viral replication, it has long been thought of as a promising antiviral drug target . recently, we and other groups have determined the structures of sars-cov- core polymerase complex in both apo and rna-bound states - , providing important information for structure-based antiviral drug design. in the efforts to combat sars-cov- pandemic, remdesivir was initially expected as a highly competent drug candidate for disease treatment. however, the recently disclosed outcome of clinical trials revealed controversial efficacies as well as some cases of side effects , . the clinical improvement rate of patients receiving remdesivir within days was only % and ~ % subjects developed adverse effects , . in contrast, another broad-spectrum antiviral favipiravir showed good promise that . % patients were clinically recovered at day of treatment and . % subjects were reported with side effect manifestations , . 
both of them are administered as pro-drugs and should be processed into triphosphate forms as nucleotide mimicries to interfere with rna synthesis. the active form of remdesivir has a similar base moiety to adenine nucleotide which can be incorporated into the growing strand of rna product using an uracil nucleotide as the template . besides, the cyanogroup in the ribose ring may cause steric clash with polymerase residues during elongation, resulting in aberrant termination of rna synthesis , . it is yet unclear how favipiravir could be recognized by the polymerase and disturb the faithful process of rna production. to investigate the mechanism for the antiviral efficacy of favipiravir, we performed in vitro primer-extension assays using a template derived from the '-untranslated region ( '-utr) of the authentic viral genome. the + catalytic position was inserted with different template residues accordingly to allow only one nucleotide to be incorporated in the presence of each individual nucleotide triphosphate (ntp) substrate (fig. ) . even though the product strand was supposed to grow by only one nucleotide, some larger rna products with two or three nucleotides extension were generated for each specific ntp substrate ( fig. c and d) . this phenomenon suggests the sars-cov- polymerase is prone to mis-incorporate unwanted nucleotides into the product rna, resulting in low fidelity for transferring the genomic information during transcription and replication. similar observations were also reported recently that sars-cov- polymerase is more tolerant for mismatches between template and product residues than other viral rdrps , which further highlights the requirement for the proofreading nuclease nsp to maintain the integrity of viral genome. interestingly, favipiravir could be incorporated into the rna product with similar efficiencies to those of atp or gtp substrates guided by u or c template residues, respectively (fig. c) . 
in contrast, remdesivir could only be incorporated with a u in the template strand (fig. d) . these evidences demonstrate that favipiravir is a universal mimicry for purine nucleotides, different from other specific nucleotide analogues, such as remdesivir , and sofosbuvir . even though favipiravir could be efficiently incorporated into rna products to impair the fidelity of rna synthesis, it did not significantly inhibit rna production in vitro (extended data fig. ) . the presence of favipiravir in rna product did not immediately terminate the extension of growing strand except when repeat pyrimidine residues were encountered in the template. the rna synthesis seemed prone to stall if multiple consecutive incorporations of favipiravir were supposed to take place (extended data fig. b) , consistent with the observations reported by other groups recently . this evidence suggests that the presence of repetitive favipiravir residues might distort the configuration of template-product rna duplex to prevent its further extension. however, the consecutive incorporation of favipiravir might take place with extremely low probabilities in the cellular environment due to the competition of atp/gtp substrates. thus, favipiravir is more likely to be discretely incorporated during virus replication and induce mutations in progeny rna copies. this hypothesis was supported by a recently reported virus-based inhibition assay which revealed that favipiravir was able to escape the proofreading mechanism of sars-cov- replication complex and led to mutations in progeny viral genome . to uncover the structural basis of favipiravir recognition by the polymerase, we determined the structure of sars-cov- nsp -nsp -nsp core polymerase complex in the presence of a template-product partial duplex rna and favipiravir at . fig. ). in the structure, the nsp subunit (nsp . 
) in the nsp -nsp heterodimer is mostly unresolved, similar to the recently reported structure of remdesivir-bound complex in the pre-translocation conformation (fig. a) . besides, the n-terminal long helices of nsp subunits could not be visualized as well, which are supposed to form a sliding platform for template-product duplex elongation , . this might result from the insufficient length of the rna duplex to establish extensive interactions with the helix track. this structure resolved nucleotide residues in the template strand and residues in the primer strand. importantly, favipiravir was captured in the triphosphate form (ftp) before catalysis which bound at the + position and paired with a c template residue ( fig. b and c) . the α-phosphate of ftp is located in vicinity of the '-hydroxyl group of a - residue with a distance of ~ . Å and no density for a phosphate-di-ester bond could be observed (fig. c) . moreover, the density of ftp could only be visualized at lower contour levels than the other residues, suggesting the lower occupancy of favipiravir in the complex, which might be related to the pre-catalytic state without stable covalent interactions with the product rna. thus, this structure provides a missing snapshot of the pre-catalytic conformation of sars-cov- polymerase, an earlier stage before the previously reported post-catalysis/pre-translocation and post-translocation conformations , , . as a mimicry of nucleotide substrate, ftp is recognized as an incoming ntp by conserved catalytic residues (fig. a) . residues d (motif a) and n (motif b) stabilize the ribose ring of ftp, and residue s (motif b) may potentially interact with both the ribose and pyrazine branch moieties. the a - residue of product rna potentially further contact with the α-phosphate and ribose of ftp + to drag the two residues in close proximity for catalysis. 
the amide group of favipiravir pairs with the template c residue and potentially forms three hydrogen bonds for base-pairing (fig. a) . when paired with a u template residue, there might be two hydrogen bonds in between (extended data fig. ) . in order to capture the pre-catalytic conformation, we did not include catalytic metals in the buffers for protein purification and assembled the complex by incubation on ice to prevent catalysis (extended data fig. c ). yet, we observed the density for a single metal ion anchored by d (motif c) of nsp which is offset the catalytic position and may arise from the cellular environment during expression (fig. a) . a similar conformation was also observed in the structure of enterovirus (ev ) polymerase (pdb: f i) . to enable catalysis, two metal ions are required to neutralize the negative charges of phosphate groups in ntp substrate and stabilize the transition intermediate , . based on the previous structure of post-catalytic conformation in the presence of remdesivir , we modeled two magnesium ions into our structure to analyze the potential catalytic state of this polymerase. the two metal atoms are coordinated by d (motif a) and d (motif c) and further bridge the three phosphate groups of ftp substrate. the terminal γ-phosphate is further sequestered by a salt bridge contributed by k (fig. a) . besides, residues r and r from motif f may also interact with the βand γ-phosphate groups. these interactions together stabilize the incoming nucleotide and '-terminus of product rna in close vicinity to facilitate the nucleophilic attack to α-phosphate by the '-hydroxyl oxygen of - nucleotide. since favipiravir could mimic both adenine and guanine nucleotides for rna synthesis, we modeled the gtp into the structure to compare the potential different recognition patterns for different ntp substrates (fig. ) . 
gtp shows an interaction network with the polymerase residues that is highly similar to that of ftp, except that an additional residue k from motif f may also be involved in interactions with the carbonyl group of the guanine base ( fig. b) . the atp substrate could also be accommodated similarly, with the amino group of the adenine base interacting with k . for pyrimidine nucleotides, however, the base moiety is too far away from k to form such interactions (fig. d ). this residue is highly conserved in all viral rdrps and may serve as a signature residue for discriminating purine and pyrimidine substrates (extended data fig. ). comparing the rdrps from different viruses, including positive-/negative-sense and segmented/non-segmented rna viruses, we found that the residues for stabilizing the ribose, base and catalytic metals are highly conserved across all viral families, whereas the residues for accommodating the phosphate groups show obvious diversity and may also involve residues outside the seven canonical catalytic motifs of rdrp, e.g. residue r in hepatitis c virus (hcv) polymerase (fig. c ). with the available structures of sars-cov- polymerase before and after catalysis , , , we were able to assemble a complete scenario of the catalytic cycle to analyze the potential conformational changes of the polymerase during catalysis (fig. ) . it has been established in flavivirus and enterovirus rdrps that the binding of incoming ntp substrates induces active site closure and relocation of metal ions to facilitate catalysis - . a similar scenario may also exist for coronavirus polymerase, and potentially other viral rdrps. the motif a loop in apo polymerase adopts an open conformation with the active site fully accessible for incoming ntp substrates ( fig. a and b) .
one of the catalytic metal ions (metal a) is anchored by d offset from the catalytic site, whereas the other one (metal b) might be less stable and frequently exchange between the polymerase and the solvent. therefore, the metal a atom could be visualized in several viral polymerases without the template rna and ntp substrate in the active site, including bunyavirus (pdb: z g) , and arenavirus (pdb: klc and kld) polymerases. with the ntp substrate coming in, the motif a loop flips inward to close the active site and the side chain of d (motif c) rotates to relocate metal a into the catalytic site. besides, residue k (motif d) would also be re-oriented to stabilize the γ-phosphate of ntp ( fig. c and d) . after catalysis, the motif a loop and residue k would move back to the open conformation, allowing the pyrophosphoric acid fragment to be released. this process is accompanied by the translocation of the template-product rna duplex, which resets the polymerase for the next round of catalysis ( fig. e and f) . in summary, we present a missing structural snapshot of coronavirus polymerase replication in the pre-catalytic conformation, facilitating the extrapolation of the dynamic catalytic cycle for rna nucleotide polymerization. importantly, we reveal the structural basis of favipiravir incorporation by sars-cov- , which suggests the feasibility of developing other nucleotide-mimicking antiviral drugs by utilizing non-base derived molecular entities. given the better record of favipiravir than remdesivir in side effect manifestations , , it may represent a better option for clinical treatment of sars-cov- infections, as well as a panel of other human-infecting rna viruses . in addition, these findings provide an important basis for developing better broad-spectrum inhibitors with higher potency to combat the infection of various emerging rna viruses. the sars-cov- nsp , nsp and nsp l fusion proteins were expressed in e.
coli, and the nsp polymerase subunit was expressed with the bac-to-bac system (invitrogen) as previously described . all these proteins were purified by tandem affinity chromatography and size-exclusion chromatography (sec). to constitute the core polymerase complex, the purified nsp , nsp and nsp l proteins were incubated on ice overnight with a molar ratio of nsp :nsp :nsp l = : : . the complex was then purified by sec using a superdex increase column (ge healthcare) equilibrated with a buffer consisting of mm hepes-naoh (ph . ), mm nacl and mm tris ( -carboxyethyl) phosphine (tcep). the fractions containing the nsp -nsp -nsp l complex were pooled and concentrated to mg/ml for subsequent experiments. the activity of sars-cov- polymerase complex was tested as previously described with slight modifications. briefly, -nt template rna strands (sequences adapted for each specific ntp substrate as shown in fig. ) were annealed to a complementary -nt primer containing a '-fluorescein label ( 'fam-gucauucuccuaagaagcua- ', takara). to perform the primer extension assay, μm nsp , μm nsp and μm nsp were incubated for min at °c with μm annealed rna and . mm individual ntp or drug in a reaction buffer containing mm tris-hcl (ph . ), mm kcl, mm tcep and mm mgcl (freshly added prior to use). the products were denatured by heating to °c for min in the presence of formamide and resolved by % page containing m urea, which was run with . ×tbe buffer. images were recorded with a vilber fusion imaging system. the rna products were quantified by integrating the intensity of each band using imagej software. the significance of differences was estimated with a two-tailed student's t-test, which was used to calculate the p value for each experimental group.
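the band quantification and the student's t-test described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' imagej workflow: the function names are hypothetical, the intensities are placeholder numbers, and the sketch stops at the t statistic and degrees of freedom (the two-tailed p value would then be read from the t distribution).

```python
from statistics import mean, variance


def fraction_extended(primer_band, product_bands):
    """Fraction of total lane signal found in extended-product bands.

    primer_band: integrated intensity of the unextended primer band.
    product_bands: integrated intensities of all extended-product bands.
    """
    total = primer_band + sum(product_bands)
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return sum(product_bands) / total


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / std_err, df


# placeholder intensities (hypothetical, not measurements from the study)
lane = fraction_extended(60.0, [30.0, 10.0])
t_value, dof = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

normalizing each lane to its own total signal makes the comparison between substrates less sensitive to loading differences, which is one reasonable way to prepare the replicate values that enter the t-test.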
to prepare the favipiravir bound polymerase complex, the purified nsp -nsp -nsp l [...] each exposure was performed with a dose rate of e -/pixel/s (approximately e -/Å /s) and lasted for s, resulting in a cumulative dose of ~ e -/Å which was fractionated into movie-frames. the final defocus range of the dataset was approximately - . to - . μm. the movie frames were aligned using motioncor to correct beam-induced motion and anisotropic magnification . initial contrast transfer function (ctf) values were estimated with ctffind . at the micrograph level. particles were automatically picked with relion- . following the standard protocol. in total, approximately , , particles were picked from ~ , micrographs. after four rounds of d classification, ~ , , particles were selected for d classification with the density map of sars-cov- polymerase replicating complex (emdb- ) low-pass filtered to Å resolution as the reference. after two rounds of d classification, three distinct d classes were identified with clear features of secondary structural elements. these classes showed different extents of flexibility at the distal end of the rna duplex, but the main body of the polymerase complex was highly similar. therefore, these classes were combined (including ~ , particles) and subjected to d refinement supplemented with per-particle ctf refinement and dose-weighting, which led to a reconstruction of . Å resolution estimated by the gold-standard fourier shell correlation (fsc) . cut-off value. in the density map, the long helices within the n-terminus of nsp subunits were not observed. to better resolve this region, a local mask for the n-terminal region of nsp subunits and the rna duplex was applied to perform d classification without particle alignment. however, no definable class with a long helix track could be identified, suggesting the flexibility of this region in the structure.
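the relation between the per-pixel dose rate, the dose rate per unit area, the cumulative dose and the dose per movie frame quoted at the start of this paragraph can be written out as a short sketch. the numerical values did not survive in the text above, so the numbers below are placeholders chosen only to make the arithmetic easy to follow, and the function names are hypothetical.

```python
def dose_rate_per_area(rate_per_pixel_s, pixel_size_angstrom):
    """Convert a dose rate in e-/pixel/s to e-/A^2/s.

    One pixel covers pixel_size^2 square angstroms at the specimen level.
    """
    return rate_per_pixel_s / pixel_size_angstrom ** 2


def cumulative_dose(rate_per_pixel_s, pixel_size_angstrom, exposure_s):
    """Total dose in e-/A^2 accumulated over one exposure."""
    return dose_rate_per_area(rate_per_pixel_s, pixel_size_angstrom) * exposure_s


def dose_per_frame(total_dose, n_frames):
    """Dose in e-/A^2 carried by each movie frame, assuming equal frames."""
    return total_dose / n_frames


# placeholder settings (assumptions, not the values used in the study)
total = cumulative_dose(rate_per_pixel_s=8.0, pixel_size_angstrom=2.0, exposure_s=5.0)
per_frame = dose_per_frame(total, n_frames=20)
```

the per-frame dose is the quantity that dose-weighting schemes need, since later frames carry more radiation damage than earlier ones.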
this phenomenon might result from the rna duplex here being shorter than the duplexes in other structures , whose greater length would help to stabilize the long helices. in addition, this structure was affected by some preferred orientation of the particles, which limited the attainable resolution in certain views of the final reconstruction. overall, the density map was sufficient to support faithful atomic modelling in most regions. the local resolution distribution of the final density map was calculated with resmap . the structure of remdesivir bound sars-cov- polymerase complex (pdb id: bv ) was rigidly docked into the density map using chimera . the model was manually corrected for local fit in coot . the structure of ftp was built using the "ligand builder" plug-in in coot and manually fitted into the density. the initial model was refined in real space using phenix with secondary structure restraints and ramachandran restraints applied. the model was further adjusted and refined iteratively for several rounds, aided by the stereochemical quality assessment using molprobity . the representative density and atomic models are shown in extended data fig. . the statistics for image processing and model refinement are summarized in the extended data table . structural figures were prepared by either chimera or pymol (https://pymol.org/). the aligned sequences were then used for conservation analysis via the consurf server. the figure was generated with pymol. the cryo-em density map and atomic coordinates have been deposited to the electron microscopy data bank (emdb) and the protein data bank (pdb) with the accession codes emd- and ctt, respectively. all other data are available from the authors on reasonable request. remdesivir for the treatment of covid- -preliminary report we thank all staff members in the center of biological imaging (cbi), institute of biophysics (ibp), chinese academy of sciences (cas), for assistance with data collection. this study was
this study was key: cord- -z exbyu authors: wang, hongru; pipes, lenore; nielsen, rasmus title: synonymous mutations and the molecular evolution of sars-cov- origins date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: z exbyu human severe acute respiratory syndrome coronavirus (sars-cov- ) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, ratg and rmyn . however, there is a segment of high amino acid similarity between human sars-cov- and a pangolin isolated strain, gd , in the receptor binding domain (rbd) of the spike protein, a pattern that can be caused by either recombination or by convergent amino acid evolution driven by natural selection. we perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human sars-cov- and related strains. we show that the synonymous divergence between the bat derived viruses and sars-cov- is larger than between gd and sars-cov- in the rbd, providing strong additional support for the recombination hypothesis. however, the synonymous divergence between pangolin strain and sars-cov- is also relatively high, which is not consistent with a recent recombination between them, instead it suggests a recombination into ratg . we also find a -fold increase in the dn/ds ratio from the lineage leading to sars-cov- to the strains of the current pandemic, suggesting that the vast majority of non-synonymous mutations currently segregating within the human strains have a negative impact on viral fitness. finally, we estimate that the time to the most recent common ancestor of sars-cov- and ratg or rmyn based on synonymous divergence, is . years ( % c.i., . - . ) and . years ( % c.i., . - . ), respectively. the covid pandemic is perhaps the biggest public health and economic threat that the world has faced for decades (li, guan, et al. ; ; zhou, yang, et al. ) . it is caused by a coronavirus (lu, et al. 
; zhang and holmes ), severe acute respiratory the genome of human coronavirus can effectively recombine with other viruses to form a chimeric new strain when they co-infect the same host (forni, et al. ; boni, et al. ). complicated recombination histories have been observed in the receptor binding motif region of the spike protein xiao, et al. ; zhang, et al. ) and several other regions (boni, et al. ) of the sars-cov- , it is thus important to exhaustively search along the viral genome for other regions potentially of recombination origin and identify possible donors associated with them. to identify possible viral strains that may have contributed, by recombination, to the formation of human sars-cov- , we searched ncbi and embl virus entries along with gisaid epiflu and epicov databases for similar sequences using blast in bp windows stepping every bp (fig. b) . the majority of the genome ( . %, / of the windows) has one unique best hit, likely reflecting the high genetic diversity of the coronavirus. . % of the genomic regions has multiple best hits, which suggests that these regions might be more conserved. among the windows with unique best hits, . % ( / ) of them were the ratg or rmyn bat strains and . % of them, including the ace contact residues region of the s protein, were the pangolin sars-cov- virus. these observations are consistent with previous results that ratg and rmyn are the most closely related viral strains, while the region containing the ace contact residues is more closely related to the pangolin virus strain table ). the mosaic pattern that different regions of the genome show highest identity to different virus strains is likely to have been caused by the rich recombination history of the sars-cov- lineage (boni, et al. ; li, giorgi, et al. 
; in some genomic regions may suggest recombination between the ancestral lineage of sars-cov- and distantly related virus lineages, although more formal analyses are needed to determine the recombination history (see also boni, et al. for further discussion). searching databases with blast using the most closely related viral strains, ratg and rmyn , we observe a pattern very similar to that observed for sars-cov- in terms of top hits across the genome (fig. b), suggesting that these possible recombination events with distantly related lineages are not unique to the sars-cov- lineage, but happened on the ancestral lineage of sars-cov- , ratg , and rmyn . a notable exception is a large region around the s gene, where rmyn shows little similarity to both sars-cov- and ratg . sequence similarity and recombination we focus further on studying the synonymous evolution of sars-cov- , and analyzing wuhan-hu- as the human ncov reference strain we performed recombination analyses across the five viral genomes based on the consensus of the seven recombination-detection methods implemented in rdp (see methods). we identified nine recombination regions affecting at least one of the sequences incongruence when compared with genome-wide trees (fig. and supplementary figure - ). in particular, a recombination signal is found in a region encompassing the rbd of the s protein, suggesting that the human sars-cov- (wuhan-hu- ) sequence is a recombinant with the pangolin-cov (gd ) as the donor (supplementary table ). phylogenetic analyses also support that wuhan-hu- and gd form a clade relative to ratg (supplementary figure c, d). phylogenetic analyses (fig. ) in genomic regions with all recombination tracts (supplementary table ) masked using maximum-likelihood (fig. a) and neighbor-joining based on synonymous (fig. b) or non-synonymous (fig.
c ) mutation distance metrics, consistently support rmyn as the nearest outgroup to human sars-cov- , in contrast to previous analyses before the discovery of rmyn , which instead found ratg to be the nearest outgroup ). this observation is also consistent with the genome-wide phylogeny constructed in previous study (zhou, chen, et al. ). we plot the overall sequence similarity (% nucleotides identical) between sars-cov- and the four other strains analyzed in windows of bp (fig. ) . notice that the divergences between human sars-cov- and the bat viral sequences, ratg and rmyn , in most regions of the genome, are quite low compared to the other comparisons. a notable exception is the suspected recombination region in rmyn that has an unusual high level of divergence recombination event with more distantly related viral strains (fig. e) . the other four sequences are all highly, and approximately equally, divergent from rmyn in this large region (fig. e) , suggesting that the rmyn strain obtained a divergent haplotype from the recombination event. when blast searching using -bp windows along the rmyn genome, we find no single viral genome as the top hit, instead the top hits are found sporadically in different viral strains of the sars-cov lineage (fig. f) , suggesting that the sequence of the most proximal donor is not represented in the database. while the overall divergence in the s gene encoding the spike protein could suggest the presence of recombination in the region, previous study ) reported that the tree based on synonymous substitutions supported ratg as the sister taxon to the human sars- cov- also in this region. that would suggest the similarity between gd and human sars-cov- might be a consequence of convergent evolution, possibly because both strains adapted to the use of the same receptor. an objective of the current study is to examine if there are more narrow regions of the spike protein that might show evidence of recombination. 
we investigate this issue using estimates of synonymous divergence per synonymous site (ds) in sliding windows of bp. however, estimation of d s is complicated by the high levels of divergence and extremely skewed nucleotide content in the rd position of the sequences (table ), which will cause a high degree of homoplasy. we, therefore, consider methods for estimation that explicitly account for unequal nucleotide content and multiple hits at the same site, such as maximum likelihood methods and the yn method (yang and nielsen ). it has been shown that for short sequences, some counting methods, such as the yn method, can perform better in terms of mean squared error (mse) for estimating d n and d s (yang and nielsen ). however, it is unclear in the current case how best to estimate d s . for this reason, we performed a small simulation study (see methods) to evaluate the performance of the maximum likelihood (ml) estimator of d n and d s (as implemented in codeml (yang )) under the f x model and the yn method implemented in paml. in general, we find that estimates under the yn method are more biased, with slightly higher mse, than the ml estimate for values in the most relevant regime of d s < . (fig. ). however, we also notice that both estimators are biased under these conditions. for this reason, we perform a bias correction calibrated using simulations specific to the nucleotide frequencies and d n /d s ratio observed for sars-cov- (see methods). the bias corrections we obtain are ^* = ^ + . ^ - . ^ + . ^ , for the ml estimator and ^* = ^+ . ^ - . ^ + . ^ for yn . notice that there is a trade-off between mean and variance (fig. ), so that the mse becomes very large, particularly for the yn method, after bias correction. for d s > the estimates are generally not reliable; however, we note that for d s < . the bias-corrected ml estimator tends overall to have slightly lower mse, and we, therefore, use this estimator for analyses of bp regions.
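the comparison of estimators above rests on a few summary quantities computed over simulation replicates: the mean estimate, its bias, and the mean squared error. a minimal python sketch of that bookkeeping follows. the function name `summarize_estimator` and the example numbers are hypothetical and are not taken from the study, which used codeml and the yn method in paml to produce the estimates themselves.

```python
import math


def summarize_estimator(estimates, true_value):
    """Summarize simulated estimates of a quantity whose true value is known.

    Returns the mean estimate, the bias (mean minus truth), the mean squared
    error, and the root-MSE expressed as a fraction of the true value.
    """
    if not estimates:
        raise ValueError("need at least one simulated estimate")
    if true_value <= 0:
        raise ValueError("true_value must be positive to form the ratio")
    n = len(estimates)
    mean = sum(estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {
        "mean": mean,
        "bias": mean - true_value,
        "mse": mse,
        "rmse_ratio": math.sqrt(mse) / true_value,
    }


# Hypothetical replicate estimates from a simulation with true value 1.0.
summary = summarize_estimator([0.9, 1.1, 1.0, 1.2], true_value=1.0)
print(summary)
```

because mse equals variance plus squared bias, a correction that removes bias can still raise the mse if it inflates the variance, which is the trade-off noted above.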
by recombination with more divergent strains. we, therefore, also estimate d n and d s for the regions with inferred recombination tracts (supplementary table ) removed from all sequences (table ) respectively. this confirms that rmyn is the virus most closely related to sars-cov- . the relatively high synonymous divergence also shows that the apparently high nucleotide similarity between sars-cov- and the bat strains ( . % (zhou, yang, et al. ) and . % (zhou, chen, et al. )) is caused by conservation at the amino acid level (d n / d s = . and . ) exacerbated by a high degree of synonymous homoplasy facilitated by a highly skewed nucleotide composition at the third position of codons (with an at content > %, table ). the synonymous divergence to the pangolin sequences gd and gx_p e in genomic regions with inferred recombination tracts removed is . ( % c.i., . - . ) and . ( % c.i., . - . ), respectively. values for other comparisons are shown in tables and . in comparisons between sars-cov- and more distantly related strains, d s will be larger than , and with this level of saturation, estimation of divergence is associated with high variance and may be highly dependent on the accuracy of the model assumptions. this makes phylogenetic analyses based on synonymous mutations unreliable when applied to these more divergent sequences. nonetheless, the synonymous divergence levels seem differences between estimates larger than . should not be interpreted strongly, as these estimates have high variance and likely will be quite sensitive to the specifics of the model assumptions. we find that d s (sars-cov- , gd ) approximately equals d s (gd , ratg ) and is larger than d s (sars-cov- , ratg ) in almost the entire genome, showing that in these parts of the genome gd is a proper outgroup to (sars-cov- , ratg ) assuming a constant molecular clock. one noticeable exception to this is the rbd region of the s gene.
in this region the divergence between sars-cov- and gd is substantially lower than between gd and ratg (fig. a, c) . the same region also has much smaller divergence between sars-cov- and gd than between sars-cov- and ratg (fig. a, c) . the pattern is quite different than that observed in the rest of the genome, most easily seen by considering the ratio of d s (sars-cov- , gd ) to d s (sars-cov- , ratg ) (fig. b, d) . in fact, the estimates of d s (sars-cov- , ratg ) are saturated in this region, even though they are substantially lower than in the rest of the genome. this strongly suggests a recombination event in the region and provides independent evidence of that previously reported based on amino acid divergence (e.g.,(zhang, et al. )). the combined evidences from synonymous divergence and the topological recombination inference, provide strong support for the recombination hypothesis. however, these analyses alone do not distinguish between recombination into ratg from an unknown source as previously hypothesized (boni, et al. ) and recombination between sars-cov- and gd as proposed as one possible explanation by lam et al. . to distinguish between these hypotheses we searched for sequences that might be more closely related, in the rbd region, to ratg than sars-cov- and we plotted sliding window similarities across the genome for ratg (fig. c) . we observe relatively low sequence identity between ratg and all three other strains in the ace contact residue region of the spike protein, which is more consistent with the hypothesis of recombination into ratg , as proposed in (boni, et al. ). moreover, our blast search analyses of ratg in this region show highest local sequence similarity with gx pangolin virus strains which is the genome-wide with the hypothesis of recombination from a virus related to gx pangolin strains, than with recombination between sars-cov- and gd . 
unfortunately, because of the high level of synonymous divergence to the nearest outgroup, tree estimation in small windows is extremely labile in this region. in fact, synonymous divergence appears fully saturated in the comparison with gx_p e, eliminating the possibility to infer meaningful trees based on synonymous divergence. however, we can use the overall maximum likelihood tree using both synonymous and nonsynonymous mutations (fig. d ). the ml tree using sequence from the ace contact residue region supports the clustering of sars- cov- and gd , but with unusual long external branches for all strains except sars- cov- , possibly reflecting smaller recombination regions within the ace contact residue region. the use of synonymous mutations provides an opportunity to calibrate the molecular clock without relying on amino acid changing mutations that are more likely to be affected by selection. the rate of substitution of weakly and slightly deleterious mutations is highly dependent on ecological factors and the effective population size. weakly deleterious mutations are more likely to be observed over small time scales than over long time scales, as they are unlikely to persist in the population for a long time and go to fixation. this will lead to a decreasing dn/ds ratio for longer evolutionary lineages. furthermore, changes in effective population size will translate into changes in the rate of substitution of slightly deleterious mutations. finally, changes in ecology (such as host shifts, host immune changes, changes in cell surface receptor, etc.) can lead to changes in the rate of amino acid substitution. for all of these reasons, the use of synonymous mutations, which are less likely to be the subject of selection than nonsynonymous mutations, are preferred in molecular clock calculations. 
for many viruses, the use of synonymous mutations to calibrate divergence times is not possible, as synonymous sites are fully saturated even at short divergence times. however, for the comparisons between sars-cov- and ratg , and sars-cov- and rmyn , synonymous sites are not saturated and can be used for calibration. we find an estimate of ω = . between sars-cov- and ratg , excluding just the small rbd region showing a recombination signal in sars-cov- (supplementary table ), providing an approximate % confidence interval of ( . , . ). also, using human strains of sars-cov- from genbank and national microbiology data center (see methods) we obtain an estimate of ω = . using the f x model in codeml. notice that there is a - fold difference in d n /d s ratio between these estimates. assuming very little of this difference is caused by positive selection, this suggests that the vast majority of mutations currently segregating in the sars-cov- are slightly or weakly deleterious for the virus. to calibrate the clock we use the estimate provided by (http://virological.org/t/phylodynamic- analysis-of-sars-cov- -update- - - / ) of $\mu$ = . × - substitutions/site/year ( % ci: . x - , . x - ). the synonymous specific mutation rate can be found from this as $d_S/\text{year} = \mu_S = \mu/(p_S + \omega p_N)$, where $\omega$ is the d n /d s ratio, and $p_N$ and $p_S$ are the proportions of nonsynonymous and synonymous sites, respectively. the estimate of the total divergence on the two lineages is then $\hat{t} = \hat{d}_S\,(p_S + \omega p_N)/\mu$. inserting the numbers from table for the divergence between sars-cov- and ratg and rmyn , respectively, we find a total divergence of . years and . years, respectively. taking into account that ratg was isolated july , we find an estimated tmrca between that strain and sars-cov- of times. the estimate for sars-cov- and ratg is compatible with the values obtained using different methods for dating (boni, et al. ).
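the clock calibration above reduces to a short calculation. the python sketch below assumes the relations given in the text, namely that the synonymous rate is the genome-wide rate divided by (ps + ω pn) and that the total divergence time on the two lineages is d s divided by that synonymous rate. the function names and every input value are hypothetical placeholders, not the estimates reported in the study.

```python
def synonymous_rate(mu, omega, p_syn, p_nonsyn):
    """Synonymous substitutions per synonymous site per year.

    Implements mu / (p_S + omega * p_N), where mu is the genome-wide rate,
    omega is the dN/dS ratio, and p_S, p_N are the proportions of synonymous
    and nonsynonymous sites.
    """
    denominator = p_syn + omega * p_nonsyn
    if denominator <= 0:
        raise ValueError("p_syn + omega * p_nonsyn must be positive")
    return mu / denominator


def total_divergence_years(d_s, mu, omega, p_syn, p_nonsyn):
    """Total time separating two sequences, summed over both lineages."""
    return d_s / synonymous_rate(mu, omega, p_syn, p_nonsyn)


# Placeholder inputs chosen only to make the arithmetic easy to follow.
rate = synonymous_rate(mu=1e-3, omega=0.5, p_syn=0.25, p_nonsyn=0.75)
years = total_divergence_years(d_s=0.16, mu=1e-3, omega=0.5,
                               p_syn=0.25, p_nonsyn=0.75)
print(rate, years)
```

the result is the divergence summed over both lineages; turning it into a tmrca additionally requires accounting for the sampling dates of the two sequences, which this sketch does not attempt.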
the variance in the estimate in d s is small and the uncertainty is mostly dominated by the uncertainty in the estimate of the mutation rate. we estimate the s.d. in ^ using parametric simulations, using the ml estimates of all parameters, for both ratg vs. sars-cov- and for rmyn vs. sars-cov- , and for each simulated data also simulating values of and from normal distributions with mean . × - and s.d. . × - , and mean . and s.d. . , respectively. we subject each simulated data set to the same inference procedure as done on the real data. our estimate of the s.d. in the estimate is . for ratg vs. sars-cov- and . for rmyn vs. sars- cov- , providing an approximate % confidence interval of ( . , . ) and ( . , . ), respectively. for ratg , if including all sites, except the -bp in the rbd of the s gene (supplementary table ), the estimate is . years with an approx. % c.i. of ( . , . ). as more sars-cov- sequences are being obtained, providing more precise estimates of the mutation rate, this confidence interval will become narrower. however, we warn that the estimate is based on a molecular clock assumption and that violations of this assumption eventually will become a more likely source of error than the statistical uncertainty quantified in the calculation of the confidence intervals. we also note that, so far, we have assumed no variation in the mutation rate among synonymous sites. however, just from the analysis of the bp windows, it is clear that is not true. the variance in the estimate of ds among bp windows from the ratg -sars-cov- comparison is approximately . . in contrast, in the simulated data assuming constant mutation rate, the variance is approximately . , suggesting substantial variation in the synonymous mutation rate along the length of the genome. alternatively, this might be explained by undetected recombination in the evolutionary history since the divergence of the strains. 
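a simplified version of the uncertainty calculation described above can be sketched in python. the sketch only propagates uncertainty in the mutation rate and in ω by drawing both from normal distributions. it does not simulate sequence data or re-run the inference on each replicate, so it is a cruder approximation than the parametric simulations used in the study. the function name, the seed, the 95% level behind the factor 1.96, and every numeric input are assumptions made for illustration.

```python
import random
import statistics


def simulate_divergence_times(d_s, mu_mean, mu_sd, omega_mean, omega_sd,
                              p_syn, p_nonsyn, n_draws=10000, seed=1):
    """Draw mu and omega from normal distributions and return divergence times.

    Each time is d_s * (p_syn + omega * p_nonsyn) / mu. Draws with a
    non-positive rate or a negative omega are discarded and redrawn.
    """
    rng = random.Random(seed)
    times = []
    while len(times) < n_draws:
        mu = rng.gauss(mu_mean, mu_sd)
        omega = rng.gauss(omega_mean, omega_sd)
        if mu <= 0 or omega < 0:
            continue  # outside the meaningful parameter range
        times.append(d_s * (p_syn + omega * p_nonsyn) / mu)
    return times


# Placeholder inputs; none of these are values from the study.
times = simulate_divergence_times(d_s=0.16, mu_mean=1e-3, mu_sd=1e-4,
                                  omega_mean=0.5, omega_sd=0.05,
                                  p_syn=0.25, p_nonsyn=0.75)
mean_t = statistics.mean(times)
sd_t = statistics.stdev(times)
# Normal-approximation interval, assuming a 95% level.
interval = (mean_t - 1.96 * sd_t, mean_t + 1.96 * sd_t)
print(mean_t, sd_t, interval)
```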
the highly skewed distribution of nucleotide frequencies in synonymous sites in sars-cov- (kandeel, et al. ), along with high divergence, complicates the estimation of synonymous divergence in sars-cov- and related viruses. in particular, in the third codon position the nucleotide frequency of t is . % while it is just . % for c. this resulting codon usage is not optimized for mammalian cells (e.g, (chamary, et al. ) ). a possible explanation is a strong mutational bias caused by apolipoprotein b mrna-editing enzymes (apobecs) which can cause cytosine-to-uracil changes (giorgio, et al. ). a consequence of the skewed nucleotide frequencies is a high degree of homoplasy in synonymous sites that challenges estimates of d s . we here evaluated estimators of d s in bp sliding windows and found that a bias-corrected version of the maximum likelihood estimator tended to perform best for values of d s < . we used this estimator to investigate the relationship between sars-cov- and related viruses in sliding windows. we show that synonymous mutations show shorter divergence to pangolin viruses, than the otherwise most closely related bat virus, ratg , in part of the receptor-binding domain of the spike protein. this strongly suggests that the previously reported amino acid similarity between pangolin viruses and sars-cov- is not due to convergent evolution, but more likely is due to recombination. in the recombination analysis, we identified recombination from pangolin strains into sars-cov- , which provides further support for the recombination hypothesis. however, we also find that the synonymous divergence between sars-cov- and pangolin viruses in this region is relatively high, which is not consistent with a recent recombination between the two. it instead suggests that the recombination was into ratg from an unknown strain, rather than between pangolin viruses and sars-cov- , as proposed in (boni, et al. ). 
the alternative explanation of recombination into sars-cov- from the pangolin virus, would require the additional assumption of a mutational hotspot to account for the high level of divergence in the region between sars-cov- and the donor pangolin viral genome. to fully distinguish between these hypotheses, additional strains would have to be discovered that either are candidates for introgression into ratg or can break up the lineage in the phylogenetic tree between pangolin viruses and ratg . the fact that synonymous divergence to the outgroups, ratg and rmyn , is not fully saturated, provides an opportunity for a number of different analyses. first, we can date the time of the divergence between the bat viruses and sars-cov- using synonymous mutations alone. in doing so, we find estimates of . years ( % c.i., . - . ) and . years ( % c.i., . - . ), respectively. most of the uncertainty in these estimates comes from uncertainty in the estimate of the mutation rate reported for sars-cov- . as more data is being produced for sars-cov- , the estimate should become more precise and the confidence interval significantly narrowed. we note that the mutation rate we use here are estimated based on the entire genome, which may differ from that in non-recombination regions. to address this problem, we downloaded all the sars-cov- sequences that are available until - - from gisaid, and obtained an estimate of : . for the ratio of mutation rates in the recombination and non-recombination regions, using the "gtrgamma" model implemented in the raxml (stamatakis ). given the length ratio between the two partitions is : , the difference between the partitions will cause a slight overestimate of the mutation rate by ~ %, which is relatively small compared to the confidence intervals and the potential for other unknown sources of uncertainty. however, we warn that a residual cause of unmodeled statistical uncertainty is deviations from the molecular clock. 
variation in the molecular clock could be modeled statistically (see e.g., (drummond, et al. ) and (lartillot, et al. )), but the fact that synonymous mutations are mostly saturated for the more divergent viruses that would be needed to train such models is a challenge to such efforts. on the positive side, we note that the estimates of d s given in table very different approach based on including more divergent sequences and applying a relaxed molecular clock. we see the two approaches as being complementary. in the traditional relaxed molecular clock approach more divergent sequences are needed that may introduce more uncertainty due to various idiosyncrasies such as alignment errors. furthermore, the relaxed molecular clock uses both synonymous and non-synonymous mutations and is, therefore, more susceptible to the effects of selection. our approach allows us to focus on just the relevant in-group species and to use only synonymous mutations. the disadvantage is that we cannot accommodate a relaxed molecular clock. however, the fact that both approaches provide similar estimates is reassuring and suggests that neither idiosyncrasies of divergent sequences, nor natural selection, nor deviations from a molecular clock have led to grossly misleading conclusions. another advantage of estimation of synonymous and nonsynonymous rates in the outgroup lineage is that it can provide estimates of the mutational load of the current pandemic. the d n /d s ratio is almost times larger in the circulating sars-cov- strains than in the outgroup lineage. while some of this difference could possibly be explained by positive selection acting at a higher rate after zoonotic transfer, it is perhaps more likely that a substantial proportion of segregating nonsynonymous mutations are deleterious, suggesting a very high and increasing mutation load in circulating sars-cov- strains. we are grateful to dr. e.c holmes for providing the genome sequence of rmyn . we thank dr. we also thank .
replicates. the associated distance matrix for (b) and (c) can be found in table . basic local alignment search tool evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- an exact nonparametric method for inferring mosaic structure in sequence triplets hearing silence: non-neutral evolution at synonymous sites in mammals relaxed phylogenetics and dating with confidence phylip (phylogeny inference package) version . a. distributed by the author mafft multiple sequence alignment software version : improvements in performance and usability identifying sars-cov- related coronaviruses in malayan pangolins a mixed relaxed clock model emergence of sars-cov- through recombination and strong purifying selection phylogeny-aware alignment with prank genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding rdp: detection of recombination amongst aligned sequences evaluation of methods for detecting recombination from dna sequences: computer simulations identification of breakpoints in intergenotypic recombinants of hiv type by bootscanning analyzing the mosaic structure of genes detection of novel coronaviruses in bats in myanmar coronavirus from wuhan: an analysis based on decade-long structural studies of sars a new coronavirus associated with human respiratory disease in china isolation of sars-cov- -related coronavirus from malayan pangolins paml : phylogenetic analysis by maximum likelihood the mean of d s estimates using different methods; ml.corr and yn .corr are the bias corrected versions of the ml and yn methods, respectively. (b) errors in d s estimates as measured using the ratio of square root of mean squared error (mse) to true d s . all the estimates are based on , simulations. 
ml: maximum-likelihood estimates using the f x model in codeml; ml.corr, maximumlikelihood estimates with bias correction; yn , count-based estimates in (yang and nielsen ); yn .corr, yn estimates with bias correction key: cord- -znfgh m authors: fisher, dale; reilly, alan; zheng, adrian kang eng; cook, alex r; anderson, danielle e. title: seeding of outbreaks of covid- by contaminated fresh and frozen food date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: znfgh m an explanation is required for the re-emergence of covid- outbreaks in regions with apparent local eradication. recent outbreaks have emerged in vietnam, new zealand and parts of china where there had been no cases for some months. importation of contaminated food and food packaging is a feasible source for such outbreaks and a source of clusters within existing outbreaks. such events can be prevented if the risk is better appreciated. it is possible in these regions that eradication was never truly achieved and that there had been ongoing unidentified transmission. alternatively, occult transmission chains can be seeded through travelers but this mechanism would require a false negative swab in an asymptomatic individual or failure to ensure quarantine. for occult transmission to lead to recrudescence of an outbreak months since the last detected case would require an ascertainment rate that for most settings is implausibly small, as demonstrated through simulations (supplementary material). even for a reproduction number of , a transmission chain that avoided stochastic extinction, and only one in twenty infections being detected, the chance of reaching generations without detection is at most one in a thousand. another possibility to explain reemergence is transport of contaminated products such as foodstuffs. a well recognised feature of the covid- pandemic is the number of clusters within meat and seafood processing facilities. 
in the uk, outbreaks of covid- caused disruption in a poultry processing plant and in an establishment producing ready meals for supermarkets ( ). fish processing in tuna canneries in portugal and ghana was suspended after workers in both countries tested positive for covid- ( , ). abattoirs in australia have closed following large clusters amongst their workers ( , ). in germany, more than , workers tested positive for covid- at one of the largest slaughterhouses in gütersloh, culminating in a lockdown of two districts and over , people ( ). the beijing xinfadi wholesale food market event has led to concerns that imported contaminated food could seed new clusters. sars-cov- was detected on workers and environmental samples, including a cutting board used to slice imported salmon. a swabbing campaign directed at millions of nearby residents as well as all workers at the market and the broader beijing food chain revealed covid- cases ( ). chinese authorities acted quickly to suspend import of salmon from europe ( , ), and later imports from food premises where outbreaks of covid- have occurred among workers, affecting the us, germany and brazil among others ( ). in july, china also suspended imports of shrimp from three ecuadorean processing plants after detecting sars-cov- on shipments ( ). the feasibility of this "non-traditional" transmission mechanism is currently debated. at - °c no viable sars-cov- was found after hours on copper surfaces, hours on cardboard and after days on stainless steel and plastic surfaces ( ). we have assessed the survival of sars-cov- on refrigerated and frozen meat and salmon over weeks to gauge the potential for outbreaks being seeded by imported contaminated food. vero-e cells (atcc# crl- ) were maintained in dulbecco's modified eagle medium (dmem) supplemented with % fetal bovine serum (fbs) and % penicillin/streptomycin. sars-cov- , isolate betacov/singapore/ / (accession id epi_isl_ ), was used as virus inoculum.
all work with sars-cov- was performed under bsl containment at the duke-nus medical school absl laboratory. individual pieces of salmon, chicken and pork sourced from supermarkets in singapore were sliced into mm cubes and µl of x tcid /ml sars-cov- was added to each cube. the samples were stored at different temperatures ( ˚c, - ˚c and - ˚c) and harvested at specified time points ( , , , , -and -days post-inoculation). following incubation, µl of the virus inoculum was transferred to a new tube and frozen at - °c until titration. three replicates were performed for each condition. the cell-free virus titre (tcid /ml) for each sample were determined by limited dilution. the limit of detection (lod) was x tcid /ml. graphpad prism (version . ) was used to perform one-way anova, followed by dunnett's multiple comparison test. the titre of sars-cov- remained constant at °c, - °c and - °c for the duration of the experiment (fig ) . infectivity was maintained for weeks in both the refrigerated ( °c) and frozen (- °c and - °c) samples. no significant difference was observed between sars-cov- recovered after incubation with or without the presence of food. the who advises that it is very unlikely that people can contract covid- from food or food packaging ( ). while it can be confidently argued that transmission via contaminated food is not a major infection route, the potential for movement of contaminated items to a region with no covid- and initiate an outbreak is an important hypothesis. it is necessary to understand the risk of an item becoming contaminated and remaining so at the time of export, and of the virus surviving the transport and storage conditions. the clusters of infection of covid- among workers in slaughterhouses and meat processing facilities in many countries can be attributed to factors that promote transmission of virus directly between workers, such as crowding, poor ventilation, and shouting in close proximity due to high ambient noise levels. 
workers may go to work when infected, they may live in crowded housing, and travel on crowded transport. environmental contamination at the work site is likely to be prolonged due to low temperatures, metal surfaces and lack of uv light ( , ) . with a significant burden of virus present in infected workers and the environment then contamination of meat with sars-cov- is possible during butchering and processing. the killing lines in abattoirs generally run at ambient temperature but the process later moves into a controlled environmental temperature of not greater than °c for the breakdown of carcasses and meat is maintained at - °c as legislated by food regulations. the processing of meat and poultry is generally carried out manually in crowded conditions. salmon processing is, in contrast, highly automated with filleting and cutting performed by machines, and minimal handling by workers. however, where such processing is carried out manually in crowded conditions the risk of contamination increases. our laboratory work has shown that sars-cov- can survive the time and temperatures associated with transportation and storage conditions associated with international food trade. when adding sars-cov- to chicken, salmon and pork pieces there was no decline in infectious virus after days at °c (standard refrigeration) and - °c (standard freezing). contamination of food is possible, and virus survival during transport and storage is likely. food transportation and storage occurs in a controlled setting akin to a laboratory. temperature and relative humidity is consistent and maintained and adverse conditions such as drying out is not permitted for the integrity of the food. in quantifying the viral titre we can reasonably assess a rate of decline in infectivity, which did not occur in any of the conditions we assessed. we believe it is possible that contaminated imported food can transfer virus to workers as well as the environment. 
an infected food handler has the potential to become an index case of a new outbreak. the international food market is massive and even a very unlikely event could be expected to occur from time to time. efforts to avert the risk of covid- outbreaks seeded by contaminated food must begin at the source, that is, food processing premises. these efforts include frequent hand washing and cleaning of food contact surfaces, materials and utensils. fitness-to-work protocols should be in place and unwell staff should be excluded. furthermore, the conditions under which workers in our food chain operate must be reviewed to ensure that our food is safe. financial support needs to be given to unwell workers to ensure no disincentives to self-isolation or presenting for a test. ppe usage needs to be overseen and social distancing in and out of the workplace needs to be supported. in receiving markets at the other end of the supply chain, food cannot be decontaminated; however, added precautions to ensure good hand hygiene and regular cleaning of surfaces and utensils are important. consumers should wash their hands after touching uncooked products and ensure that food is well cooked. our findings, coupled with the reports from china of sars-cov- being detected on imported frozen chicken and frozen shrimp packaging material, should alert food safety competent authorities and the food industry to a "new normal" environment where this virus is posing a non-traditional food safety risk. aerosol and surface stability of sars-cov- as compared with sars-cov- update: covid- among workers in meat and poultry processing facilities -united states figure : quantification of infectious virus over days. viral titres were determined by limiting dilution. titres are expressed as mean ± sd log tcid /ml. sars-cov- was stored alone or in the presence of fish, chicken or pork and tested under refrigeration ( °c) and frozen (- °c and - °c).
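the occult-transmission argument made earlier relies on a branching process simulation, whose description in the supplementary text follows. a minimal python sketch of such a model is given here. it assumes poisson-distributed offspring, independent ascertainment of every case after the seed, and a cap on chain size. the function names, default values, and the choice to count capped chains as still ongoing are assumptions for illustration and are not the authors' code or parameter values.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's method (suitable for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def undetected_generations(rng, r, ascertainment, max_generations, max_cases):
    """Simulate one chain and return how many generations it ran undetected.

    The chain starts from a single unidentified case in generation 0. It stops
    when it goes extinct, when any new case is ascertained, or when the
    cumulative number of cases reaches max_cases.
    """
    current, total = 1, 1
    for generation in range(1, max_generations + 1):
        offspring = sum(poisson_draw(rng, r) for _ in range(current))
        if offspring == 0:
            return generation - 1  # chain went extinct
        if any(rng.random() < ascertainment for _ in range(offspring)):
            return generation - 1  # first ascertained case ends occult spread
        total += offspring
        if total >= max_cases:
            return max_generations  # assumption: capped chains count as ongoing
        current = offspring
    return max_generations


def prob_occult(generation, r, ascertainment, max_generations=20,
                max_cases=10000, n_draws=2000, seed=1):
    """Monte Carlo probability that a chain is alive and undetected at a generation."""
    rng = random.Random(seed)
    hits = sum(
        undetected_generations(rng, r, ascertainment,
                               max_generations, max_cases) >= generation
        for _ in range(n_draws)
    )
    return hits / n_draws


# Hypothetical inputs: reproduction number 1.5, 5% of cases ascertained.
print(prob_occult(generation=5, r=1.5, ascertainment=0.05,
                  max_generations=10, max_cases=1000))
```

raising the ascertainment probability or lengthening the chain drives the estimate toward zero, which is the qualitative pattern the supplementary figure reports.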
to assess the feasibility of long chains of occult infection leading to clusters some time after a community was thought to have been clear of infection, we built a branching process simulation model. each case is assumed to generate a poisson-distributed number of secondary cases with mean . we seed the chain with a single unidentified case in generation , the untraced and unobserved daughter infection of a case. the number of infections in generation of the chain is denoted , which has distribution ( ). each case after the initial infection that seeds the chain is ascertained with probability . the chain is simulated forward for generations or until cases or until the first case is ascertained. we then calculate the number of generations passed until the chain is uncovered or becomes extinct, and the probability there is on-going occult transmission in each generation. the probability the chain of infection is detected and the expected number of infections are derived through monte carlo simulation with draws. figure s : probability of ongoing undetected transmission for the first generations for four values of the ascertainment rate ( = %, %, % and %) and reproduction number ( = , . , , . ). probabilities < / are set to / for graphical purposes. it is very unlikely that ongoing occult transmission would be maintained for more than ten generations except for low reproduction numbers and very low ascertainment rates.
key: cord- -fmuozy w authors: bickler, stephen w.; cauvi, david m.; fisch, kathleen m.; prieto, james m.; gaidry, alicia d.; thangarajah, hariharan; lazar, david; ignacio, romeo; gerstmann, dale r.; ryan, allen f.; bickler, philip e.; de maio, antonio title: age is associated with increased expression of pattern recognition receptor genes and ace , the receptor for sars-cov- : implications for the epidemiology of covid- disease date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: fmuozy w older aged adults and those with pre-existing conditions are at highest risk for severe covid- associated outcomes. using a large dataset of genome-wide rna-seq profiles derived from human dermal fibroblasts (gse ) we investigated whether age affects the expression of pattern recognition receptor (prr) genes and ace , the receptor for sars-cov- . older age was associated with increased expression of prr genes, ace and four genes that encode proteins that have been shown to interact with sars-cov- proteins. assessment of prr expression might provide a strategy for stratifying the risk of severe covid- disease at both the individual and population levels. most people infected with sars-cov- will have mild to moderate cold and flu-like symptoms, or even be asymptomatic ( ) . older aged adults, and those with underlying conditions such as diabetes mellitus, chronic lung disease and cardiovascular disease are at highest risk for severe covid- associated outcomes ( ) . the highest case fatality rates are in the years and older age group ( . %), with the lowest in the - years age group ( . %) ( ) . the reasons for these markedly different outcomes at the extremes of age and for the occasional death that occurs in apparently healthy younger patients remain poorly understood. pattern recognition receptors (prrs) play crucial roles in the innate immune response by recognizing pathogen-associated molecular patterns (pamps) and molecules derived from damaged cells, referred to as damage-associated molecular patterns (damps) ( ) ( ) ( ) . prrs are coupled to intracellular signaling cascades that control transcription of a wide spectrum of inflammatory genes. humans have several distinct classes of prrs, including toll-like receptors (tlrs), nod-like receptors (nlrs), rig-like receptors (rlrs), c-type lectin receptors (clrs) and intracellular dna sensors. 
prrs play a critical role in the inflammatory response induced by viruses and are important determinants of outcome ( ) ( ) ( ) . the oldest (≥ years) and the youngest (≤ years) age groups (see "methods" section). after filtering out genes with low expression (cpm > . in at least two samples), a total of genes were differentially expressed in the oldest relative to the youngest age group (fig. a table ). we next focused on whether the expression of individual prr genes changes with age. between the oldest (≥ years) and the youngest (≤ years) age groups we found three differentially expressed prr genes (tlr , tlr , and ifih ) that had a log fc > . ( fig. c and d, additional file : suppl tables b and a-c) . plots of the other tlr gene counts are provided in additional file : suppl figure . the expression of two prr genes was negatively correlated with age: nucleotide-binding oligomerization domain-containing protein (nod ) (log fc = - . ; adj. p value = . ; pearson r - . , adj. p value . ) and cyclic gmp-amp synthase (cgas) (log fc = - . , adj. p value . e- ; pearson r - . , adj. p value . e- ). both genes encode proteins that activate the immune response to viruses ( , ) . to explore our findings further, we performed a differential gene expression analysis on the dermal fibroblast cell lines that had high (> th percentile) and low (< th percentile) expression of tlr (additional file : suppl table ). curiously, enrichment analysis of the differentially expressed genes showed cell cycle (kegg: hsa ) to be the canonical pathway with the greatest enrichment (fdr . e- ), similar to the enrichment of the differentially expressed genes between oldest and youngest groups ( fig. b and c , additional file : suppl table ). tlr is known to act via the adaptor molecule trif to regulate the expression of type i interferons. tlr activation of trif can also induce the cell cycle, an effect which is antagonized by type i interferons ( ) . 
our finding of both high levels of tlr and elevated cell cycle could thus imply changes in the expression of type i interferons. we then examined whether the expression of ace , the receptor for sars-cov- , changes with age. ace expression was detected in of the cell lines ( . %) and showed a marked increase in the + age group (fig. b right) . ace expression was correlated with the expression of of the prr genes ( fig. a and additional file: suppl table a-c). of note, ace was expressed at much lower levels than tlr , with variable expression in the year and over age group. whether the latter reflects the biological state of the individuals who donated the skin samples or is a consequence of ex vivo culture will require further study. we also asked whether the differentially expressed genes between the oldest and youngest age groups encode proteins that interact with sars-cov- (see "methods" section). our analysis revealed eleven differentially expressed genes between the oldest and youngest age groups that encode proteins known to interact with sars-cov- (fig. d) . four of these genes (adam , fbln , fam a , clip ) have increased expression in the older compared to the younger age groups. interestingly, the sars-cov- proteins to which they bind relate to lipid modifications and vesicle trafficking. host interactions of orf (endoplasmic reticulum quality control), m (er structural morphology proteins), and nsp (golgins) may facilitate the dramatic reconfiguration of er/golgi trafficking during coronavirus infection ( ) . whether age-related increases in the expression of host proteins that bind sars-cov- proteins predispose to covid- disease or change its clinical course deserves further study. the covid- (coronavirus disease- ) pandemic is presenting unprecedented challenges to health care systems and governments worldwide. as of june , there have been , , confirmed cases worldwide, resulting in , deaths ( ) . 
covid- disease is caused by the novel severe acute respiratory syndrome coronavirus (sars-cov- ). in this study, we used rna-seq data from a large collection of dermal fibroblasts to demonstrate that prr genes and ace vary with age. further, we show that aging is associated with increased expression of several genes that encode proteins known to bind to sars-cov- . whether these gene expression differences contribute to the epidemiology of sars-cov- infection will require further study. nevertheless, overexpression of prr genes, tlr in particular, is an intriguing mechanism to explain the relationship between age and sars-cov- infection, and potentially the tlr-mediated cytokine storm that characterizes the morbidity and mortality in covid- disease. tlr has been previously suggested to have a role in the damaging responses that occur during viral infections, acting via both pamps and damps. our study does have some limitations. foremost, health information was not available for the individuals donating skin samples to the dermal fibroblast collection. although the skin samples are reported to be from "apparently healthy individuals", we believe it is unlikely that individuals in the oldest age group were completely free of chronic diseases. another limitation was that minority groups are inadequately represented in the collection. the dermal fibroblast collection includes samples from one american indian (< %), one hispanic (< %), two asians ( . %), and nine blacks ( . %), far too few to draw any meaningful conclusions on the ethnic groups that have been the hardest hit by the covid- pandemic. finally, as the scientific community ramps up research in response to the covid- pandemic, the dermal fibroblast model could prove useful for investigating sars-cov- biology. fibroblasts have been previously used to investigate host antiviral defenses during coronavirus infection ( ) . 
the potential strength of the dermal fibroblast model is that skin samples can be easily obtained from donors of different ages, sexes, and ethnicities, and those with varying comorbidities such as high blood pressure and diabetes; and from smokers and non-smokers. such a model would also have an advantage over transfection models as these cells would not only have increased expression of ace and tlr , but also have an aged transcriptome which could be important for the infectivity and outcome of the sars-cov- infection. the critical role prrs play in mediating host-pathogen interactions, and their increased expression in some comorbidities associated with poor covid- outcomes, make them an attractive target for developing tools to predict risk for and outcomes of sars-cov- infection at both the individual and population levels. using a large dataset of genome-wide rna-seq profiles derived from human dermal fibroblasts we show that expression of prr genes and ace , the receptor for sars-cov- , varies with age. advanced age was also associated with increased expression of several genes that encode proteins which interact with sars-cov- . given that prrs function as a critical interface between the host and invading pathogens, further research is needed to better understand how changes in prr expression affect the susceptibility to and outcome of sars-cov- infection. our analysis was done using rna-seq data (gse ) from the national center for biotechnology information (ncbi, bethesda, md, usa). normalized tmm gene counts per million for the individual dermal fibroblast cell lines were downloaded from the geo rna-seq experiments interactive navigator (grein) ( , ) . limma-voom ( , ) was used to identify differentially expressed genes between the oldest (≥ years, n= ) and youngest (≤ years, n= ) age groups. differentially expressed genes were defined as those with an adjusted p value < . after multiple testing correction and an absolute log fold change > . . 
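The selection rule just described (adjusted p value below a cutoff and absolute log fold change above a cutoff) can be sketched in a few lines. This is only an illustration of the final filtering step, not limma-voom itself, which is an R package that also fits the linear models. The sketch assumes a Benjamini-Hochberg adjustment, which the text does not name, and the function names, the `alpha` and `min_abs_log_fc` parameters and the example values are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    Assumption: the text only says "adjusted p value after multiple
    testing correction"; BH is used here as a common choice.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(genes, log_fc, pvalues, alpha, min_abs_log_fc):
    """Keep genes passing both the adjusted-p and fold-change cutoffs.

    alpha and min_abs_log_fc are hypothetical parameters; the actual
    thresholds are not recoverable from the text.
    """
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, fc, adj_p in zip(genes, log_fc, adjusted)
        if adj_p < alpha and abs(fc) > min_abs_log_fc
    ]


if __name__ == "__main__":
    # Made-up gene names and values, for illustration only.
    genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
    log_fc = [2.0, -0.2, -1.5, 3.0]
    pvals = [0.01, 0.04, 0.03, 0.5]
    print(select_de_genes(genes, log_fc, pvals, alpha=0.06, min_abs_log_fc=1.0))
```

Both cutoffs are applied together, so a gene with a large fold change but a weak adjusted p value (or the reverse) is not reported.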
enrichment analysis of the differentially expressed genes was performed with toppgene ( ). pairwise pearson correlation coefficients were calculated between the normalized gene counts of the prr genes, ace and age, over all samples using graphpad prism version . . protein-protein interactions linking differentially expressed genes and sars-cov- proteins were identified by overlaying differentially expressed genes in the oldest and youngest age groups on to the sars-cov- human protein-protein interaction map reported by gordon, et al ( ) . network visualization was performed using cytoscape ( ) the ndex v . . ( ) . presymptomatic sars-cov- infections and transmission in a skilled nursing facility preliminary estimates of the prevalence of selected underlying health conditions among patients with coronavirus disease -united states estimates of the severity of coronavirus disease : a model-based analysis innate immune pattern recognition: a cell biological perspective damp-sensing receptors in sterile inflammation and inflammatory diseases pathogen recognition and toll-like receptor targeted therapeutics in innate immune cells toll-like receptor in acute viral infection: too much of a good thing implications of toll-like receptors in ebola infection host genetic determinants of hepatitis b virus infection epigenetic regulation of skin cells in natural aging and premature aging diseases. 
cells gene expression changes with age in skin, adipose tissue, blood and brain aging and chronic sun exposure cause distinct epigenetic changes in human skin longitudinal epigenetic and gene expression profiles analyzed by three-component analysis reveal downregulation of genes involved in protein translation in human aging predicting age from the transcriptome of human dermal fibroblasts cgas and sting: at the intersection of dna and rna virussensing networks beyond the inflammasome: regulatory nod-like receptor modulation of the host immune response following virus exposure cell proliferation and survival induced by toll-like receptors is antagonized by type i ifns a sars-cov- protein interaction map reveals targets for drug repurposing an interactive web-based dashboard to track covid- in real time old" protein with a new story: coronavirus endoribonuclease is important for evading host antiviral defenses grein: an interactive web platform for re-analyzing geo rna-seq data limma powers differential expression analyses for rna-sequencing and microarray studies voom: precision weights unlock linear model analysis tools for rna-seq read counts toppgene suite for gene list enrichment analysis and candidate gene prioritization cytoscape: a software environment for integrated models of biomolecular interaction networks ndex: a community resource for sharing and publishing of biological networks the authors declare that they have no competing interests. key: cord- -o oxchq authors: nguyen, thanh thi; pathirana, pubudu n.; nguyen, thin; nguyen, henry; bhatti, asim; nguyen, dinh c.; nguyen, dung tien; nguyen, ngoc duy; creighton, douglas; abdelrazek, mohamed title: genomic mutations and changes in protein secondary structure and solvent accessibility of sars-cov- (covid- virus) date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: o oxchq severe acute respiratory syndrome coronavirus (sars-cov- ) is a highly pathogenic virus that has caused the global covid- pandemic. tracing the evolution and transmission of the virus is crucial to respond to and control the pandemic through appropriate intervention strategies. this paper reports and analyses genomic mutations in the coding regions of sars-cov- and their probable protein secondary structure and solvent accessibility changes, which are predicted using deep learning models. prediction results suggest that mutation d g in the virus spike protein, which has attracted much attention from researchers, is unlikely to make changes in protein secondary structure and relative solvent accessibility. based on , viral genome sequences, we create a spreadsheet dataset of point mutations that can facilitate the investigation of sars-cov- from many perspectives, especially in tracing the evolution and worldwide spread of the virus. our analysis results also show that coding genes e, m, orf , orf a, orf b and orf are the most stable, potentially suitable to be targeted for vaccine and drug development. biological investigations of the novel coronavirus sars-cov- are important to understand the virus and help to propose appropriate responses to the pandemic. scientists have been able to obtain genomic sequences of sars-cov- and have started analysis of these data. the reference genome of sars-cov- deposited to the national center for biotechnology information (ncbi) genbank sequence database (isolate wuhan-hu- , accession number nc_ ) shows that sars-cov- is an rna virus having a length of , nucleotides. comparative genomic analysis results obtained in [ ] [ ] [ ] suggest that the covid- virus may have originated in bats. other studies show that pangolins may have served as the hosts for the virus [ ; ] . andersen et al. 
[ ] furthermore believe that sars-cov- is not a purposefully manipulated virus or a laboratory construct but has a natural origin. a study in [ ] using machine learning unsupervised clustering methods corroborates previous findings that sars-cov- belongs to the sarbecovirus subgenus of the betacoronavirus genus within the coronaviridae family [ ; ] . the whole genome analysis results also indicate that bats are more likely the reservoir hosts for the virus than pangolins. another study in [ ] demonstrates that sars-cov- may have resulted from a recombination of a pangolin coronavirus and a bat coronavirus, and pangolins may have acted as an intermediate host for the virus. since the first cases were detected, the covid- virus has spread to almost every country in the world and has been linked to the deaths of more than , people among over million confirmed cases [ ] . tracing the evolution and spread of the virus is important for developing vaccines and drugs as well as proposing appropriate intervention strategies. monitoring and analysing the viral genome mutations can be helpful for this task. due to a strong immunologic pressure in humans, the virus may have mutated over time to circumvent responses of the human immune system. this leads to the creation of virus variants with possibly different virulence, infectivity, and transmissibility [ ] . this paper reports all point mutations occurring so far in sars-cov- and presents exemplified implications obtained from the analysis of these mutation pattern data. four types of mutations, which include synonymous, nonsynonymous, insertion and deletion, are detected. we use , sars-cov- genome sequences collected in countries and deposited to the ncbi genbank so far and create a spreadsheet dataset of all mutations that occurred across different genes. 
eleven protein coding genes of sars-cov- have been identified, namely orf ab, spike (s), orf a, envelope (e), membrane (m), orf , orf a, orf b, orf , nucleocapsid (n) and orf . the order of these genes and their corresponding length are illustrated in fig. . the genes s, e, m, and n produce structural proteins that play important roles in the virus functions. for example, the receptor-binding domain (rbd) region of the s protein can bind to a receptor of a host cell, e.g. the human and bat angiotensin-converting enzyme (ace ) receptor, enabling the entrance of the virus into the cell [ ] . predictions of protein structures may help understand the virus's functions and thus contribute to developing vaccines and therapeutics against the virus. in this paper, to evaluate the possible impacts of genomic mutations on the virus functions, we propose the use of the sspro/accpro methods to predict protein secondary structure and relative solvent accessibility [ ] . these predictors were built using deep learning one-dimensional bidirectional recurrent neural networks incorporated in the scratch- d software suite (version . , ) [ ] . by comparing the prediction results obtained on the reference genome and mutated genomes, we are able to assess whether the detected mutations have the potential to change the protein structure and solvent accessibility, and thus lead to possible changes of the virus characteristics. because of the functional importance of structural proteins, we only report the prediction results of these proteins in this study. the next section reviews related works in the literature. we then present materials and methods for sars-cov- mutation detection, and protein secondary structure and solvent accessibility prediction. next we summarize statistics of sars-cov- mutations so far and implications of these mutations. details of mutations in nonstructural orf genes and structural s, e, m and n genes are presented after that. 
a full sars-cov- mutation spreadsheet report is provided in the supplemental information. since the first genomes were collected in december , there have been many findings on the mutations of sars-cov- . for example, phan [ ] analysed genomes of sars-cov- downloaded from the global initiative on sharing all influenza data (gisaid) database (https://www.gisaid.org/) and found mutations over the entire viral genome sequences. among them, there are three mutations occurring in the rbd region of the spike surface glycoprotein s, including n d, d y and v f, with the numbers showing amino acid (aa) positions in the protein. that study also reveals three deletions in the genomes of sars-cov- obtained from japan, usa and australia. two of these deletion mutations are in the orf ab polyprotein and one deletion occurs in the ' end of the genome. likewise, a study in [ ] shows that the sars-cov- genomes may have undergone recurrent, independent mutations at sites, of which % are of the nonsynonymous type. tang et al. [ ] investigated genomes of covid- patients and discovered mutations in sites of these genomes. the study also shows that the spike gene s consistently has larger ds values (synonymous substitutions per synonymous site) than other genes. in addition, two major lineages of the virus, denoted as l and s, have been specified based on two tightly linked snps. the l lineage is found to be more prevalent than the s lineage among the examined sequences. korber et al. [ ] tracked the mutations of spike protein s of sars-cov- because it plays an important role in mediating infection of targeted cells and is the focus of vaccine and antibody therapy development efforts [ ] . they detected mutations in the spike protein that are growing in frequency, especially the mutation d g that rapidly becomes the dominant form when spread to a new geographical region. 
likewise, hashimi [ ] analysed the mutation frequency in the spike protein s of sars-cov- genomes downloaded from the gisaid and genbank databases. the study found mutations occurring in the s protein sequences obtained from multiple countries. it suggests that the virus is spreading in two forms, the d form (residue d at position in the s protein) takes . % while the g form takes . % proportion of the examined isolates. koyama et al. [ ] on the other hand found several variants of sars-cov- that may cause drifts and escape from immune recognition by using the prediction results of b-cell and t-cell epitopes in [ ] . typically, the mutation d g occurring in the spike protein is found prevalent in the european population. this mutation may have caused antigenic drift, resulting in vaccine mismatches that lead to a high mortality rate of this population. a recent situation report [ ] by nextstrain [ ] on genomic epidemiology of novel coronavirus using , publicly shared covid- genomes shows that sars-cov- on average accumulates changes at a rate of substitutions per year. this is approximately equivalent to mutation per , bases in a year. this evolutionary rate of sars-cov- is typical for a coronavirus, and it is smaller than that of influenza (average mutations per , bases per year) and hiv (average mutations per , bases per year). shen et al. [ ] conducted metatranscriptome sequencing for bronchoalveolar lavage fluid samples obtained from patients with covid- and found no evidence for the transmission of intrahost variants as well as a high evolution rate of the virus with the number of intrahost variants ranged from to around a median number of . pachetti et al. [ ] examined genomic sequences of covid- patients from the gisaid database and discovered novel recurrent mutations at nucleotide locations , , , , , , and . mutations at locations , , , and are mostly found in europe while those at locations , and occur in sequences obtained from patients in north america. 
likewise, a study in [ ] on sars-cov- complete genome sequences discovered mutations. among them, the mutations at position c t in the orf ab gene, t c in the orf gene and c t in the n gene are common. we use , sequence records downloaded from the ncbi genbank database on - - . the latest collection date for the samples from which the sequences were derived was on - - . the data, which were collected in countries, include both nucleotide sequences and protein translations of coding genes. a proportion of the , records have sequences of only a few proteins, i.e. these records do not annotate all proteins (orf ab, orf a, orf , orf a, orf b, orf , orf , s, e, m and n). the number of available sequences is thus different from one protein to another (see column "avai num" in table ). genome sequences that do not specify a country and aa sequences that contain the letter "x" representing an unknown aa are excluded from our calculations. we use the genome obtained from the isolate wuhan-hu- , accession number nc_ as the reference genome. for the mutation detection purpose, we apply a dynamic programming algorithm to protein aa sequences to get global pairwise alignments between a reference sequence and a query sequence. specifically, we use the python bio.pairwise .align.globalms function (https://biopython.org/docs/dev/api/bio.pairwise .html) where a match is given points, a mismatch is deducted . points, points are deducted when opening a gap, and point is deducted when extending it. gaps are then inserted into nucleotide sequences corresponding to the resulting protein sequence alignments. using the resulting pairwise alignments, we are able to compare query sequences and the reference sequences at each position and identify locations of insertion, deletion, synonymous and nonsynonymous mutations. 
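Once a global alignment is available, reading mutations off it is a position-by-position comparison. The sketch below assumes the two aligned protein strings have already been produced, for example by Biopython's `Bio.pairwise2.align.globalms`, and only illustrates the comparison step. The function name `call_mutations` and the label format are hypothetical. Numbering an insertion by the last reference residue before it is an assumption. Synonymous changes are not covered, because detecting them needs the aligned codons in addition to the amino acids.

```python
def call_mutations(ref_aln, qry_aln):
    """List amino-acid differences between two aligned protein strings.

    Both strings must have the same length and use '-' for gaps.
    Labels follow a <reference aa><reference position><query aa>
    pattern, with '-' standing in for the absent residue.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_aa, qry_aa in zip(ref_aln, qry_aln):
        if ref_aa != "-":
            ref_pos += 1
        if ref_aa == qry_aa:
            continue
        if ref_aa == "-":
            # Residue present only in the query, placed after ref_pos.
            mutations.append(("insertion", f"-{ref_pos}{qry_aa}"))
        elif qry_aa == "-":
            mutations.append(("deletion", f"{ref_aa}{ref_pos}-"))
        else:
            mutations.append(("nonsynonymous", f"{ref_aa}{ref_pos}{qry_aa}"))
    return mutations


if __name__ == "__main__":
    # Toy sequences, not taken from any real isolate.
    print(call_mutations("MKT-LV", "MRTALV"))
    print(call_mutations("MKTLV", "MK-LV"))
```

Keeping a separate reference counter means the reported positions stay in reference coordinates even when the query carries insertions.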
virus protein structure plays a key role in its functions and a change in the shape of the structure may affect its functions, virulence, infectivity and transmissibility, possibly resulting in non-functional proteins. protein secondary structure is defined by hydrogen bonding patterns, which make an intermediate form before the protein folds into a three-dimensional shape composing its tertiary structure. eight types of protein secondary structure defined by the dictionary of protein secondary structure (dssp) include helix (g), α helix (h), π helix (i), hydrogen bonded turn (t), extended strand in parallel and/or anti-parallel β-sheet conformation (e), residue in isolated β-bridge (b), bend (s) and coil (c). the dssp tool assigns every residue to one of the eight possible states. in a reduced form, these conformational states can be reduced to states: h = {h, g, i}, e = {e, b} and c = {s, t, c} [ ] . the protein secondary structure represents interactions between neighboring or near-by aas as its functional three-dimensional shape is created through the polypeptide folding. we thus determine a change in protein secondary structure if any change happens in the structures of the mutated aa and its neighboring aas compared to those of the reference sequence. in detail, we consider aas ahead and aas behind the mutated aa. the same approach is applied when considering a change of the protein relative solvent accessibility. solvent-exposed area represents the area of a biomolecule on a surface that is accessible to a solvent. accordingly, a residue is considered as exposed if at least % of that residue is exposed, denoted as the "e" state. otherwise, the residue is determined as buried, i.e. the "b" state. 
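The reduction from eight states to three and the neighbourhood comparison described above can be sketched as follows. This is a minimal illustration that operates on predicted state strings and does not call the sspro/accpro predictors. The function names are hypothetical, and `flank` (the number of neighbouring residues on each side) and `threshold` (the exposure cutoff) are left as required parameters because their values are not recoverable from the text.

```python
# Mapping from the text: h = {h, g, i}, e = {e, b}, c = {s, t, c}.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "S": "C", "T": "C", "C": "C",
}


def reduce_states(ss8):
    """Collapse an 8-state secondary structure string to 3 states."""
    return "".join(DSSP_TO_3[state] for state in ss8.upper())


def window_changed(ref_pred, mut_pred, pos, flank):
    """True if predictions differ at a 1-based position or its neighbours.

    flank is a hypothetical parameter: how many residues ahead of and
    behind the mutated residue are compared.
    """
    if len(ref_pred) != len(mut_pred):
        raise ValueError("prediction strings must have equal length")
    start = max(0, pos - 1 - flank)
    end = min(len(ref_pred), pos + flank)
    return ref_pred[start:end] != mut_pred[start:end]


def exposure_state(relative_accessibility, threshold):
    """Return 'e' (exposed) or 'b' (buried) for one residue.

    threshold is a hypothetical cutoff on relative accessibility,
    expressed as a fraction between 0 and 1.
    """
    return "e" if relative_accessibility >= threshold else "b"


if __name__ == "__main__":
    print(reduce_states("GHITEBSC"))
    print(window_changed("CCCHHHCCC", "CCCHHECCC", pos=5, flank=1))
```

The same `window_changed` check works for accessibility strings of "e" and "b" states, which matches the statement that the same approach is applied to solvent accessibility.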
there have been various protein secondary structure prediction programs in the literature and many of those were developed based on artificial intelligence models using protein aa sequences such as jpred [ ] , spider [ ] , porter [ ] , raptorx [ ] , psspred [ ] , yasspp [ ] and sspro [ ] . in this paper, we use the protein secondary structure and relative solvent accessibility prediction methods sspro/accpro [ ] within the scratch- d software suite (release . , ) [ ] . these predictors were built using the bidirectional recursive neural networks and a combination of the sequence similarity and sequence-based structural similarity to sequences in the protein data bank [ ] . prediction results of -class structure (sspro predictor) and %-threshold relative solvent accessibility (accpro predictor) are used for statistics on protein secondary structure and accessibility changes. we however also report in the spreadsheet supplemental information prediction results of -class structure (sspro predictor) and relative solvent accessibility on thresholds, ranging from % to % with a % step (the accpro predictor within the scratch- d software). table summarizes statistics of sars-cov- mutations so far. "aa length" indicates the length of the protein aa sequence derived from the sars-cov- reference genome. "avai num" denotes the number of records among , ncbi genbank records that have the complete sequence of the corresponding protein. "no mu" refers to the number of sequences that do not have any mutations compared to the reference sequence. "delete" means the number of deletion mutations occurring in the aa sequences of the protein. this number may be larger than the number of sequences having deletion mutations because an aa sequence may have more than one deletion. likewise, "insert", "nonsyn" and "syn" show the number of insertion, nonsynonymous and synonymous mutations occurring in the protein aa sequences. 
"nonsyn/syn" demonstrates a ratio between the number of nonsynonymous mutations versus the number of synonymous mutations. "struct change" means the number of nonsynonymous mutations that have protein secondary structure change potential based on the sspro predictor of the scratch- d software. similarly, "acc change" refers to the number of nonsynonymous mutations that have potential to change the protein relative solvent accessibility based on the accpro predictor of the scratch- d software. insertion and deletion mutations alter protein secondary structure and solvent accessibility by default so that they are not included in the structure and solvent accessibility change statistics. table shows that the orf a and orf proteins have the number of nonsynonymous mutations significantly larger than that of the synonymous mutations. in contrast, this ratio in proteins e, m, orf b and orf are very small (less than ). these proteins could be targeted for vaccine and drug development as they have less variations than other proteins. these findings are supported by results presented in figs (fig. ) , entire regions before and after the spike at position are almost unchanged. fig. presents variations of multiple proteins. in addition to proteins e, m, orf b and orf , we find that proteins orf and orf a are also relatively stable without a large number of variations at any particular locations. protein n has , nonsynonymous mutations but , of them are likely to make changes in protein secondary structure, making a ratio of . %. this is considerably larger than those of protein s ( . %), protein m ( . %) and protein e ( . %). the number of solvent accessibility changes of protein s is larger than its structure changes: vs . this however is opposite in other structural proteins: e ( vs ), m ( vs ) and n ( vs , ). the orf ab polyprotein has , aas. 
among , records deposited to the ncbi genbank database, only , genomes have the complete coding sequence (cds) of protein orf ab, with , unique aa sequences. this is quite a large number compared to other proteins but understandable because orf ab is the longest protein of sars-cov- and thus has a large number table . table ). the genbank accession numbers are presented on the left while isolate names and collected dates are on the right. the numbers on top show the positions of aas in the protein and isolates are ordered by collected dates. the first isolate having these deletions is usa-ca / (record mt in second row), collected on - - in usa: ca. this is also the isolate having the largest number of deletions: five sequentially at g -, h -, v -, m -, v -and three at k -, s -, f -. the other patients followed were possibly infected by this first case but more data such as travel history are needed to confirm this hypothesis. ( ) germany ( ) taiwan ( ) the orf a protein has aas with its complete cds appearing in , isolates ( unique aa sequences). among these, , sequences have no mutation or only synonymous mutations, and , sequences have insertion, deletion or nonsynonymous mutations. table . notably, the mutation q h occurs in , sequences collected in many countries. this is an emerging and active mutation, which requires further investigation as the latest case of this mutation was on - - , same as the latest collection date of the entire downloaded dataset. the mutation g v occurring in sequences is also a prevalent mutation in the orf a protein. the orf protein has aas, appearing in , isolates with unique aa sequences. among these, , sequences have no mutation or only synonymous mutations and sequences have insertion, deletion or nonsynonymous mutations. two insertion mutations occur in record mt at positions - r and - t (end of the sequence). 
nine continual deletions occur similarly in sequences: mt (collected in hong kong on - - from an adult male patient [ ] ) and mt (usa: virginia in - ). these deletions are f -, k -, v -, s -, i -, w -, n -, l -and d -. alignment of these sequences with the reference genome is displayed in fig. . the isolate mt thus may have transmitted the virus to mt but this implication needs to be corroborated by patients' travel history. there are distinct nonsynonymous mutations and those occurring in or more sequences are presented in table . the orf a protein has aas in length, found in , isolates with unique aa sequences. among these, , sequences have no mutations or only synonymous mutations, while the rest orf a. there are deletion mutations occurring in records: mt (collected in usa: massachusetts on - - ) and mt (usa on - - ). the mt sequence has deletion at position l -while the mt sequence has sequential deletions f -, a -, f -, a -, c -, p -, d -, g -, v -, k -, h -, v -, y -and q -. alignment of these sequences with that of the reference genome is shown in fig. . there are distinct nonsynonymous mutations with those occurring in or more sequences are reported in table . the orf b protein has aas with its complete cds appearing in , isolates, forming a set of unique aa sequences. there are , sequences having no mutations or only synonymous mutations and sequences having nonsynonymous mutations. no insertion or deletion mutations are found in gene orf b. this along with a small number of nonsynonymous mutations indicate that orf b is a stable gene. distinct nonsynonymous mutations ( of them) include f l, f y, f l, s l, l f, t i, c f, c s, h y and a t. summary of nonsynonymous mutations in gene orf b occurring in or more sequences is shown in table . the orf protein has aas in length, appearing in , isolates with only unique aa sequences. among them, , sequences have no mutation or only synonymous mutations and the rest sequences have nonsynonymous mutations. 
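the deletion and insertion labels used above (a reference residue and position followed by `-` for a deletion, or `-` followed by the inserted residue for an insertion) can be derived by walking a pairwise protein alignment column by column. the sketch below is a hedged illustration under stated assumptions: the input is an already-computed alignment with `-` as the gap character, an insertion is labelled with the last reference position before it (the source does not state which convention it uses), and `call_aa_mutations` is a hypothetical helper name.

```python
def call_aa_mutations(ref_aln, query_aln):
    """List amino acid changes from a pairwise alignment.

    Both strings must have the same aligned length; '-' marks a gap.
    Substitutions look like 'K2R', deletions like 'F3-', and
    insertions like '-5T' (inserted after reference position 5).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_aa, query_aa in zip(ref_aln, query_aln):
        if ref_aa != "-":
            ref_pos += 1
        if ref_aa == query_aa:
            continue
        if ref_aa == "-":
            mutations.append(f"-{ref_pos}{query_aa}")   # insertion
        elif query_aa == "-":
            mutations.append(f"{ref_aa}{ref_pos}-")     # deletion
        else:
            mutations.append(f"{ref_aa}{ref_pos}{query_aa}")  # substitution
    return mutations


# Toy alignment: one substitution, one deletion, one trailing insertion.
print(call_aa_mutations("MKFLV-", "MR-LVT"))  # ['K2R', 'F3-', '-5T']
```

runs of consecutive deletion labels, like the sequential deletions reported for individual isolates, then appear as adjacent entries with increasing positions.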
no insertion or deletion mutations are found in gene orf . similar to orf b, this is a stable gene. there are distinct nonsynonymous mutations, including i l, a v, s f, r l, r c, a v, d y and v i. those occurring in sequences or more are presented in table . the virus transmission may have happened between these two isolates but this needs further investigation. alignment of these sequences is shown in fig. . the number of nonsynonymous mutations in gene s is , , with distinct mutations. mutations that occur in or more cases are reported in table . the number of synonymous mutations is , making the ratio of nonsynonymous to synonymous mutations . . among the nonsynonymous mutations, mutation d g is extremely common as it occurs in , sequences, mostly collected in usa ( ), india ( ) and australia ( ). the first collection date of the d g mutation cannot be identified precisely because some sequences deposited to the ncbi genbank did not record the full date. the current data show that one of the following sequences, all of which have the d g mutation, was collected first: mt in usa in , or mt , mt , mt and mt all in germany: bavaria in - , or mt in thailand on - - . it is however important to note that the first patient having the d g mutation and his/her location may never be known because the genome of that patient might not have been sequenced and reported. the information reported here can therefore only support further investigation. our statistics show that among , sequences of the s protein, , sequences have the mutation d g, accounting for . %. this share has increased considerably compared to . % in the previous analysis in [ ] on a dataset downloaded on - - . on the other hand, there are a t mutations that all occur in thailand. the first case of this mutation was collected on - - and its latest case was on - - . this may indicate that the first case transmitted the virus to the other cases having the same mutation a t in thailand.
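the difficulty with partially recorded collection dates can be made concrete by treating each date as an interval: a year-only record could have been collected on any day of that year, a year-month record on any day of that month. the sketch below is an illustration under stated assumptions rather than the method used in the paper: dates are assumed to be strings in `YYYY`, `YYYY-MM` or `YYYY-MM-DD` form, and the accession labels and dates are invented placeholders.

```python
import calendar
from datetime import date


def date_bounds(text):
    """Return (earliest, latest) possible dates for a partial date string."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    year, month, day = parts
    exact = date(year, month, day)
    return exact, exact


def first_collected_candidates(records):
    """Records that cannot be ruled out as the earliest collection.

    A record stays a candidate if its earliest possible date is not
    later than the smallest 'latest possible' date in the dataset.
    """
    bounds = {acc: date_bounds(text) for acc, text in records.items()}
    cutoff = min(latest for _, latest in bounds.values())
    return sorted(acc for acc, (earliest, _) in bounds.items()
                  if earliest <= cutoff)


# Hypothetical records: year only, year-month, and two full dates.
records = {
    "rec_A": "2020",
    "rec_B": "2020-01",
    "rec_C": "2020-01-24",
    "rec_D": "2020-03-02",
}
print(first_collected_candidates(records))  # ['rec_A', 'rec_B', 'rec_C']
```

with these placeholder inputs, three records remain possible "first" cases, which mirrors why several sequences with different date precision have to be listed as alternatives.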
alternatively, mutations h y ( cases), v a ( cases), e d ( cases), p l ( cases) and s f ( cases) all occur only in usa or mutation l v ( cases) occurs only in hong kong (refer to the attached spreadsheet). the "latest date" in tables - may be used to infer which mutations are inactive or still active. for example, in gene s (table ) , the latest date of d g was on - - (same as the latest collection date of the entire dataset) that indicates that this mutation is still active. the latest date of p l was on - - , indicating that this mutation may no longer occur. this kind of information may be useful for further research on vaccine and drug development as ongoing changes of the viral proteins need to be focused and addressed. we identify the rbd region within the residue range arg -phe of protein s based on a study in [ ] . in the rbd region only, the number of nonsynonymous mutations is and that of synonymous is , making a ratio of . . this is much smaller than the ratio of . for the entire gene s, suggesting that the rbd region may have been optimized for binding to a receptor of a host cell. this is complemented by fig. showing all deletion mutations in gene s being outside the rbd region. note that the difference of these ratios is partly due to the large number of d g mutations ( , ) , which is outside the rbd region. table summarizes nonsynonymous mutations in the rbd region occurring in or more sequences. notable mutation in this region is v a occurring in isolates all collected in usa. the first and latest collected dates of these isolates were respectively - - and - - , suggesting that the first isolate may have spread to others having the same mutation v a. likewise, the mutation g s occurs in isolates all collected in usa: wa from - - to - - . alternatively, the mutation y f occurs in sequences all in netherlands but the first collected date was on - - and the latest collected date was on - - . 
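the comparison between the nonsynonymous-to-synonymous ratio of the rbd region and that of the whole gene, and the remark that one very frequent mutation outside the rbd inflates the whole-gene ratio, can be sketched as a filter on residue position. this is a toy illustration only: the positions, counts and region boundaries below are invented, and `nonsyn_syn_ratio` is a hypothetical helper, not part of the authors' code.

```python
def nonsyn_syn_ratio(mutations, start=None, end=None, exclude=()):
    """Ratio of nonsynonymous to synonymous mutation counts.

    mutations: iterable of (position, kind, count) with kind in
    {'nonsyn', 'syn'}; start/end restrict the residue range
    (inclusive); exclude lists positions to leave out.
    """
    nonsyn = syn = 0
    for position, kind, count in mutations:
        if start is not None and position < start:
            continue
        if end is not None and position > end:
            continue
        if position in exclude:
            continue
        if kind == "nonsyn":
            nonsyn += count
        elif kind == "syn":
            syn += count
        else:
            raise ValueError(f"unknown mutation kind: {kind}")
    if syn == 0:
        raise ValueError("no synonymous mutations in the selected range")
    return nonsyn / syn


# Invented counts: position 120 stands in for one dominant mutation
# that lies outside the region of interest (residues 40-70 here).
toy_mutations = [
    (10, "nonsyn", 2),
    (50, "syn", 4),
    (60, "nonsyn", 1),
    (120, "nonsyn", 40),
    (130, "syn", 6),
]
print(nonsyn_syn_ratio(toy_mutations))                   # whole gene: 4.3
print(nonsyn_syn_ratio(toy_mutations, 40, 70))           # region only: 0.25
print(nonsyn_syn_ratio(toy_mutations, exclude={120}))    # without the dominant site: 0.3
```

reporting the whole-gene ratio with and without the dominant site, as in the last line, separates the effect of a single widespread mutation from a genuine difference between regions.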
these dates are very close together, suggesting that all the reported y f cases may have been infected by another case whose genome had not been sequenced and reported to the ncbi genbank. it is important to note that all the transmission implications need further investigation with more data from other aspects such as travel history and physical contacts. for the entire protein s, nonsynonymous mutations ( unique) have both structure and solvent accessibility change potentials. those occurring in or more sequences are reported in table . mutation h y occurs in cases and mutation p l occurs in cases, all of which were collected in usa. the most common mutation d g does not have the potential to change either protein secondary structure or relative solvent accessibility. the envelope protein e has aas, found in , genbank records with unique aa sequences. among them, , sequences have no mutation or only synonymous mutations while sequences have nonsynonymous mutations. gene e is thus relatively stable and could be targeted for vaccine and drug development. this is supported by the fact that no insertion or deletion mutations are found within gene e. there are distinct nonsynonymous mutations in gene e and those occurring in or more sequences are presented in table . five distinct nonsynonymous mutations in gene e have protein structure change potential: s c, s f, p l, d y and l f. in addition, distinct mutations have the potential to change relative solvent accessibility: l h, l r, d y and l f. therefore, d y and l f are two mutations in gene e that have the potential to change both protein structure and solvent accessibility. table . gene s -nonsynonymous mutations that have both structure and solvent accessibility change potentials occurring in or more sequences. the "query structure" (and "query accessibility") shows the unique structure (and accessibility) changes based on prediction results.
structure letter in parentheses is the predicted structure of the residue at the corresponding mutation position. five letters before and after parentheses are structures of neighbouring residues. likewise, letter "b" or "e" in parentheses shows the accessibility status of the residue at the mutation position. ccccc(c)cbeee ccccc(c)ceeee bebbb(b)bbbbb beebb(b)bbbbb d g eecct(t)ccctc ccccc(c)cceec cccct(c)cceec bbbeb(e)bbbeb bbbeb(b)bbbbb s f ecctt(c)cctcc ecttc(e)eeecc bbebe(b)bbebb bbbbb(b)bbbbb w l tccct(c)cccse cccee(e)eeese ebbbe(b)bbbeb bbbbb(b)bbbeb g d ctccc(c)seeee eeccc(c)seeee bebbb(b)ebbbb bbbbb(b) the m protein has aas and its complete cds appears in , genbank records, with unique aa sequences. there are , sequences having no mutation or only synonymous mutations while other sequences have nonsynonymous mutations. no insertion or deletion mutations are found in gene m. the number of distinct nonsynonymous mutations in gene m is , with those occurring in or more sequences shown in table . among these, mutations are likely to make changes in protein secondary structure: c f, a s, a v, v f, n b, r l, v i, d n, d y and s i. alternatively, mutations have the solvent accessibility change potential: n b, p l, p s, h y, d n and t i. n b and d n are thus two mutations having potential to change both protein structure and solvent accessibility in gene m. the n protein has aas and its complete cds appears in , isolates, with unique aa sequences. among them, , sequences have no mutation or only synonymous mutations while the rest sequences have deletions or nonsynonymous mutations. there are no insertion in gene n. the sequence in mt (collected in usa: ny on - - ) has three sequential deletions at q -, t -and v -while the sequence in mt (usa: ny on - - ) has six sequential deletions at t -, e -, p -, k -, k -and d -. two other sequences mt and mt (both collected in turkey on - - ) have three sequential deletions at r -, n -and s -. 
there are , nonsynonymous mutations with distinct ones and those occurring in or more sequences are presented in table . notable mutations are r k occurring in sequences and g r occurring in sequences. there are mutations in this protein having the potential to change both protein structure and solvent accessibility, including g v, d y, g w, r c, r l, r c, a s, p h, t i, t i, a s, d e, d h, d y and d y. analysing the virus genome sequences and their proteins is crucial for understanding the virus and proposing appropriate approaches to respond to and control the pandemic. this paper has reported all point mutations of sars-cov- since the virus's first genomes were obtained in december . a sars-cov- mutation database is built using a large number of genome sequences ( , ) obtained across countries. this database can enable scientists to monitor the evolution and spread of the virus, although conclusions drawn from these data need to be corroborated with patients' clinical data and travel history. we also predict the secondary structure and relative solvent accessibility of the virus proteins to evaluate whether the detected mutations have the potential to change the virus characteristics. these protein secondary structure and solvent accessibility change potentials are predictions from deep learning recurrent neural networks and need to be experimentally verified. they nevertheless provide important insights about the virus and prompt further experimental biochemistry and molecular biology research into the genomic regions of these mutations. among , d g mutations, our prediction results show that none is likely to change the protein secondary structure or relative solvent accessibility. in addition, we have shown regions of the sars-cov- genomes that have small variations, such as those coding for proteins e, m, orf , orf a, orf b and orf . these regions could be targeted for vaccine and drug development.
a new coronavirus associated with human respiratory disease in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a pneumonia outbreak associated with a new coronavirus of probable bat origin identifying sars-cov- related coronaviruses in malayan pangolins probable pangolin origin of sars-cov- associated with the covid- outbreak the proximal origin of sars-cov- origin of novel coronavirus (covid- ): a computational biology study using artificial intelligence. biorxiv a novel coronavirus from patients with pneumonia in china the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- isolation of sars-cov- -related coronavirus from malayan pangolins who coronavirus disease (covid- ) dashboard characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine sspro/accpro : almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity scratch: a protein structure and structural feature prediction server genetic diversity and evolution of sars-cov- . infection emergence of genomic diversity and recurrent mutations in sars-cov- . infection on the origin and continuing evolution of sars-cov- spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- .
biorxiv a short review on antibody therapy for covid- emergence of mutations and possible antigenic drift in the surface glycoprotein of sars-cov- (covid- ) emergence of drift variants that may affect covid- vaccine development and antibody treatment a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- genomic analysis of covid- nextstrain: real-time tracking of pathogen evolution genomic diversity of sars-cov- in coronavirus disease patients emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant genomic characterization of a novel sars-cov- multi-output interval type- fuzzy logic system for protein secondary structure prediction jpred : a protein secondary structure prediction server spider : a package to predict secondary structure, accessible surface area, and main-chain torsional angles by deep neural networks porter : fast, state-of-the-art ab initio prediction of protein secondary structure in and classes. biorxiv raptorx: exploiting structure information for protein alignment by statistical inference a comparative assessment and analysis of representative sequence alignment methods for protein structure prediction yasspp: better kernels and coding schemes lead to improvements in protein secondary structure prediction the protein data bank structure of the sars-cov- spike receptor-binding domain bound to the ace receptor a rare deletion in sars-cov- orf dramatically alters the predicted three-dimensional structure of the resultant protein key: cord- -qigfstxt authors: yang, chen; zhang, yu; chen, hong; chen, yuchen; yang, dong; shen, ziwei; wang, xiaomu; liu, xinran; xiong, mingrui; huang, kun title: kidney injury molecule- is a potential receptor for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qigfstxt covid- patients present high incidence of kidney abnormalities, which are associated with poor prognosis and high mortality. 
identification of sars-cov- in the kidneys of covid- patients suggests renal tropism and direct infection. presently, it is generally recognized that sars-cov- initiates invasion through binding of the receptor-binding domain (rbd) of the spike protein to the host cell-membrane receptor ace ; however, whether there are additional targets of sars-cov- in the kidney remains unclear. kidney injury molecule- (kim ) is a transmembrane protein that is drastically up-regulated after renal injury. here, binding between sars-cov -rbd and the extracellular ig v domain of kim was identified by molecular simulations and co-immunoprecipitation, and was comparable in affinity to that of ace to sars-cov- . moreover, kim facilitated cell entry of sars-cov -rbd, which was potently blocked by a rationally designed kim -derived polypeptide. together, the findings suggest kim may mediate and exacerbate sars-cov- infection in a ‘vicious cycle’, and kim could be further explored as a therapeutic target. the world health organization has declared coronavirus disease (covid- ) a pandemic. severe acute respiratory syndrome coronavirus (sars-cov- ), the pathogen of covid- , is a single-stranded rna virus belonging to the betacoronavirus genus, which also includes severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov). sars-cov- , sars-cov and mers-cov mainly target the respiratory system and primarily manifest as respiratory illness. notably, reports of renal involvement among infected patients, as well as identification of viral infection in the kidney, suggest that these coronaviruses may directly invade the kidney. kidney impairment in hospitalized covid- patients is common, and is associated with severe inflammation, poor clinical progress and high in-hospital mortality. a study from wuhan showed that ( .
%) of covid- patients had renal complications including proteinuria, hematuria or acute kidney injury (aki), and patients with renal impairment had much higher overall mortality ( . %) than those without ( . %). meanwhile, in a cohort of patients from new york, . % of hospitalized patients developed aki, and % of those with aki died during the observation time. consistently, we recently reported that covid- patients with chronic kidney diseases (ckd) had a higher risk of poor prognosis and in-hospital death. a recent study suggested renal tropism of sars-cov- , which was detected in the kidneys of % of covid- patients with aki, whereas the viral rna was found in only % of patients without aki. among the multiple organ manifestations in covid- patients, the kidney is, apart from the lung, highly vulnerable to the virus, and renal dysfunctions are closely associated with high mortality, although the underlying molecular mechanisms remain unclear. like the coronaviruses sars-cov and mers-cov, sars-cov- contains four key proteins, namely the spike (s), envelope (e), membrane (m) and nucleocapsid (n) proteins. sars-cov- invasion is initiated by binding to cellular membrane receptors via the viral spike protein (figure a). presently, angiotensin-converting enzyme (ace ), also the target for sars-cov, is the only confirmed receptor for sars-cov- . responsible for receptor recognition, the sars-cov- spike protein (sars-cov- -sp) consists of subunits s and s ; the receptor-binding domain (rbd) in s binds ace to initiate the fusion of s with the cell membrane and subsequent cell entry (figure a). renal expression of ace has been identified, suggesting its potential to mediate sars-cov- kidney infection. cell entry of a virus may involve multiple transmembrane receptors, but whether there are any additional receptors in the kidney remains unclear.
kidney injury molecule- (kim , also known as tim , havcr , or cd ) is primarily expressed in the kidney and drastically up-regulated in the injured kidney proximal tubule in aki or ckd, and plays crucial roles in inflammatory infiltration and immune response. here, we hypothesize that kim may be a binding target of sars-cov- that mediates its kidney invasion. in this study, we investigate the binding of sars-cov -rbd to kim by molecular simulations and co-immunoprecipitation (co-ip) assays; the binding potentials of kim to sars-cov and mers-cov were also evaluated. we demonstrate kim -assisted cell entry and cytotoxicity of sars-cov -rbd in human kidney cells, and explore the blockade effects of two kim -derived antagonist peptides against sars-cov- -rbd invasion. recombinant sars-cov- -rbd (t ) was obtained from genscript (nanjing, china). antagonist peptide (scslftcqngiv) and (scslftcqngggwf) were chemically synthesized by genscript. anti-mouse-igg (h&l) antibody (p/n - - ) and anti-rabbit-igg (h&l) antibody (p/n - - ) were obtained from rockland (philadelphia, pa). igg with surebeads tm protein g magnetic beads ( - -ap, proteintech, wuhan, china) were used. to obtain comprehensive transcriptome and protein profiles of kim and ace in human tissues, we collected and analyzed the transcriptome data and immunohistochemistry-based protein profiles from the human protein atlas (hpa, https://www.proteinatlas.org), which shows the expression and localization of human proteins across tissues and organs, based on deep sequencing of rna (rna-seq) from normal tissue and immunohistochemistry on tissue microarrays containing tissue types. hpa rna-seq tissue data for each protein-coding gene were recorded as mean protein-coding transcripts per million (ptpm), corresponding to mean values of samples from each tissue. histology-based protein expression levels were classified manually into levels (not detected, low, medium and high).
top tissular transcriptional levels and histology-based protein expression levels of kim and ace were listed, respectively. and the overlapped expression profile of kim and ace were summarized. were used to seek potential binding models. the best-scored protein complexes were selected to conduct following molecular dynamics simulations, which were conducted by the desmond server (http://www.deshawresearch.com/) and analyzed by pymol . and maestro . . . nsec dynamics simulations diagram was applied to study the dynamic parameters of the protein complexes. molecular mechanics generalized born surface area (mm-gbsa) binding free energy was calculated by hawkdock . root mean square deviation (rmsd) was utilized to estimate the average change in displacement of a selection of atoms for a particular frame as described . root mean square fluctuation (rmsf) was conducted to study the displacement changes in the protein chain . i/r injury was performed on c bl/ mice as we previously described . for cisplatininduced aki, mg/kg bodyweight cisplatin was injected intraperitoneally into - week-old male mice, and mice were sacrificed days later. blood and kidney samples were collected for further analysis. total rna was isolated from kidneys by rna iso plus (takara biotech., dalian, china) and reverse transcribed into cdna using the m-mlv first-strand synthesis system (invitrogen, grand island, ne). the abundance of specific gene transcripts was assessed by qpcr. primers used in the study are provided (supplementary table s ). mammalian expression plasmids for human kim , kim ig v (kim - aa), kim Δig v (truncated kim without residues - aa), ace , sars-cov- -rbd (sars-cov- spike protein - aa) and sars-cov- -sp were constructed. pcr amplification products of the corresponding cdna fragments were cloned into a prk promoter-based vector containing either ha or flag tag ( figure a ). sars-cov- related plasmids were kindly gifts by dr. ph wang of shandong university. 
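the two trajectory measures named above can be written down in a few lines. the following is a minimal sketch, assuming that every frame has already been superimposed on the reference structure (no fitting is done here) and that coordinates are plain (x, y, z) tuples; it is not the desmond implementation, and the function names and coordinates are illustrative. rmsd gives one value per frame (average displacement of the selected atoms relative to the reference), while rmsf gives one value per atom (fluctuation about that atom's mean position over all frames).

```python
import math


def rmsd(frame, reference):
    """Root mean square deviation between a frame and a reference.

    Both arguments are equal-length lists of (x, y, z) tuples for the
    same atoms, assumed to be pre-aligned.
    """
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))


def rmsf(trajectory):
    """Per-atom root mean square fluctuation over a list of frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Mean position of atom i across all frames.
        mean_pos = [
            sum(frame[i][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        squared = sum(
            (frame[i][k] - mean_pos[k]) ** 2
            for frame in trajectory
            for k in range(3)
        )
        fluctuations.append(math.sqrt(squared / n_frames))
    return fluctuations


# Two-atom toy system: the second atom moves 2 units along y.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(moved, reference))        # sqrt(2), about 1.414
print(rmsf([reference, moved]))      # [0.0, 1.0]
```

a flat rmsd curve over the simulation suggests a stable complex, and low rmsf at interface residues suggests that the contacts are maintained.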
human kidney tubular cell line hk- (obtained from china center for type culture collection, wuhan, china) was cultured in dmem/f media (hyclone, logan, ut) containing . mm glucose and % fetal bovine serum. to evaluate the impact of cov- -rbd plasmids for hours, then collected for further detection. transfected hk- /hek t cells were lysed in ml pre-lysis buffer. for immunoprecipitation, cell lysate was immunoprecipitated with the indicated antibody or the respective igg with surebeads tm protein g magnetic beads overnight at °c. after washing with pre-lysis buffer containing mm nacl, the beads were boiled in loading buffer and subjected to immunoblotting. crispr-cas based protocols for genome engineering were used as we previously described. kim guide rna target sequences are provided (table s ). fluorescein isothiocyanate (fitc) labelling was performed as we previously described. briefly, sars-cov- -rbd was co-incubated with fitc (molar ratio : ) overnight, then mm nh cl was added to stop the reaction and quench the unreacted fitc. the solution was dialyzed twice and lyophilized for further use. transfected hek t cells ( × ) or hk- cells ( × ) were incubated with free fitc or fitc-sars-cov- -rbd for h. for peptide-based internalization assays, antagonist peptide or ( μm) was co-added with sars-cov- -rbd ( μg/ml). after fixing with % (w/v) formaldehyde, the cell membranes were stained with alexa fluor labelled phalloidin ( μg/ml) and the nuclei were stained with dapi ( μg/ml), then imaged with a leica tcs sp confocal microscope. cells were plated at - cells per well. at % confluence, cells incubated with sars-cov- -rbd ( μg/ml) were treated with or without antagonist peptide / ( μm). after that, μl mtt ( mg/ml) was added to each well for hours, the medium was removed and dmso was added. absorbance measured at nm was normalized to the respective control group. data were expressed as mean ± sd. significant differences were assessed by two-tailed student's t-test.
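the viability read-out described above (absorbance normalized to the control group, mean ± sd, student's t-test) can be sketched with the python standard library. this is an illustration under stated assumptions, not the authors' excel or graphpad workflow: the absorbance values are invented, equal variances are assumed (the classical pooled-variance form of the test), and only the t statistic and degrees of freedom are computed, since the two-sided p value requires the t distribution (available, for example, through `scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, stdev, variance


def normalize_to_control(values, control_values):
    """Express each absorbance as a fraction of the control group mean."""
    control_mean = mean(control_values)
    return [v / control_mean for v in values]


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(
        pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Invented absorbance readings for a control and a treated group.
control_abs = [1.0, 1.2, 0.8]
treated_abs = [0.5, 0.6, 0.4]

control = normalize_to_control(control_abs, control_abs)
treated = normalize_to_control(treated_abs, control_abs)

print(f"control: {mean(control):.2f} ± {stdev(control):.2f}")
print(f"treated: {mean(treated):.2f} ± {stdev(treated):.2f}")
t_stat, dof = student_t(control, treated)
print(f"t = {t_stat:.3f}, df = {dof}")
```

normalizing both groups by the same control mean leaves the t statistic unchanged, so the test can be run on either the raw or the normalized values.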
-sided p value less than . was considered statistically significant. analyses were performed with excel and graphpad prism . . to elucidate kim and ace enrichment in tissues, the transcriptome and histology- the overall sequence similarity between sars-cov-rbd and sars-cov- -rbd is . %, and of the ace -contacting residues are conserved in both rbd ( figure b (table s and s ), which is lower than that of sars-cov- -rbd and ace (- . kcal/mol), but comparable to that of sars-cov-rbd and ace (- . kcal/mol) . given ace is also recognized as a key receptor for sars-cov-rbd , our results thus indicate strong interaction between sars-cov- -rbd and kim (table and table s ). notably, the different binding regions on the surface of sars-cov- -rbd to kim and to ace implicate that kim and ace may synergistically mediate sars-cov- invasion ( figure b ). cov- spike protein ( figure s and table s ), which may affect the interaction between sars-cov- and its receptors. covid- cases carrying v f mutation in sars-cov- , which contacting kim , have been reported (http://giorgilab.dyndns.org/coronapp/, figure s b -c). mm-gbsa analysis suggest that v f mutation lead to mildly enhanced binding free energy (- . kcal/mol) to kim compared with that of sars-cov- -rbd (- . kcal/mol) (table s ). consistent with our findings, latest clinical reports combined with mutational scanning suggest that v f may lead to enhanced viral infectivity of sars-cov- , . microarray data show increased kim expression in sars patients-derived peripheral blood mononuclear cells comparing to healthy controls ( figure s a ) (gse ) . (table s ). together, our data suggest that compared with sars-cov-rbd and mers-cov-rbd, sars-cov- -rbd showed the highest binding affinity to kim ; moreover, sars-cov-rbd and sars-cov- -rbd share the same binding pocket on the kim ig v domain ( figure c ). 
to confirm the binding between sars-cov- -rbd and kim , endogenous and exogenous co-ip assays were performed ( figure a ) figure c ). since kim ig v is crucial in mediating the binding of virus and cell membrane receptor , full length kim and ig v were respectively co-transfected with sars-cov- -rbd into a stable kim knockout hk- cell line ( figure d and figure s a -b); binding of sars-cov- -rbd with full length kim and ig v were observed ( figure d ). deletion of ig v domain abolished the binding between kim and sars-cov- -rbd in hek t cells ( figure e ). these findings suggest ig v domain mediates interaction between kim and sars-cov- . we next investigated the role kim plays in mediating cell entry of sars-cov- -rbd. fluoresceine isothiocyanate (fitc) was utilized to label sars-cov- -rbd, and the cell entry of sars-cov- -rbd in hk- cells was observed ( figure a ). knockout of kim attenuated the invasion and reduced the cytotoxicity induced by sars-cov- -rbd ( figure a and figure s c ). in kim knockout hk- cells, restoring full length kim and overexpressing ig v both rescued cell entry of sars-cov- -rbd ( figure a ). consistently, overexpression of kim in hek t leads to enhanced invasion of sars-cov- -rbd ( figure b ). to competitively bind with sars-cov- -rbd and inhibit kim -associated invasion, we rationally designed two antagonist peptides based on sars-cov- contacting motifs in kim (motif : leu , phe , gln ; motif : trp , phe ; fig a) . peptide mimics motif , while peptide covers both motifs, with triple glycine used as a flexible linker ( figure a ). the mm-gbsa binding free energy which indicates potential binding between peptides and sars-cov- -rbd was provided in table s . both peptides showed no distinct cytotoxicity, and peptide reduced sars-cov- -rbd induced cell death ( figure s d ) and decreased cell entry of sars-cov- -rbd in hk- cells (figure b ), suggesting the protective effects of this kim -derived polypeptide. 
to fight the covid- pandemic, a deep understanding of how sars-cov- invades human cells is warranted. studies have indicated direct infection of sars-cov- in the kidney in addition to the lung; however, ace remains the only confirmed receptor that may mediate this invasion. furthermore, the renal tropism of sars-cov- and the associated kidney injury seem difficult to explain given the relatively decreased level of ace upon viral invasion. here, we report that kim , a drastically up-regulated biomarker for kidney injury (figure c), mediates sars-cov- kidney invasion as a receptor. we also found that sars-cov- -rbd binds kim ig v with a higher affinity than sars-cov-rbd and mers-cov-rbd do, which probably underlies the higher contagiousness of sars-cov- . notably, our results suggest that sars-cov- -rbd binds kim and ace via two distinct pockets, implying that kim and ace may synergistically mediate the invasion of sars-cov- in kidney cells, which may explain the strong renal tropism as well as the high incidence of acute kidney injury in covid- patients. mutations in the sars-cov- spike protein may affect the interaction between sars-cov- and its receptors, resulting in different viral infectivity and antigenicity. among the sites of sars-cov- that are involved in attachment to kim , the v f mutation was recently reported to increase sensitivity to neutralizing antibodies and to show a stability-enhancing effect. our findings suggest that the v f mutation enhances binding affinity towards kim (table s ), yet the functional consequences remain unknown; further investigations of how kim interacts with different sars-cov- variants could be of value in developing related therapies. as a viral envelope ps binding receptor, kim has been shown to mediate viral invasions that cause hepatitis a and encephalitis through its ps binding site wfnd ( - ).
although our results have demonstrated the binding of kim with the rbd of multiple coronaviruses, its role as a ps receptor in mediating sars-cov- binding is also worth further investigation. ace is the most well-studied receptor for sars-cov- so far, yet it is not an ideal therapeutic target for covid- since it is widely expressed in multiple organs and plays crucial roles in regulating blood pressure and preventing heart/kidney injury. in contrast, kim has a stronger association with kidney function and is highly expressed only after renal injury, which makes it a more specific and possibly safer therapeutic target for covid- patients with kidney diseases. together, our study identified kim , a kidney injury marker, as a potential receptor for sars-cov- . to explain the high incidence of aki in covid- patients, we propose a model of a 'vicious cycle' co-mediated by kim and ace in the kidneys of covid- patients. in this model, the higher physiological level and binding affinity of ace make it the primary target for the initial sars-cov- invasion, which lacks kidney specificity. next, the induced kidney injury and the resulting drastically up-regulated kim rapidly promote a kim -and-ace -co-mediated secondary viral infection, which is more kidney-specific and consequently exacerbates kidney damage in a vicious cycle (figure c). therefore, blocking kim may partly inhibit the cell entry of sars-cov- and attenuate the kidney injury caused by viral invasion; however, further studies are necessary to fully validate this model. moreover, drugs targeting the binding pocket of kim may be explored, and bi-specific antibodies/peptides that dual-target kim and ace may also be developed, considering that both can be simultaneously targeted by sars-cov- . given that the expression profile of kim overlaps with that of ace , notably in multiple target organs of sars-cov- (figure s ), studies on additional organs/tissues affected by coronaviruses may also be worth future exploration.
top-ranked residues involved in the binding of sars-cov- -rbd and kim ig v are listed. 
this work was supported by the natural science foundation of china ( and ). chen yang yu zhang, hong chen and dong yang performed the simulation studies. the authors declare no conflict of interests. 
key: cord- - se authors: graudenzi, alex; maspero, davide; angaroni, fabrizio; piazza, rocco; ramazzotti, daniele title: mutational signatures and heterogeneous host response revealed via large-scale characterization of sars-cov- genomic diversity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: se to dissect the mechanisms underlying the observed inflation of variants in sars-cov- genome, we present the largest up-to-date analysis of intra-host genomic diversity, which reveals that the majority of samples present a complex sublineage architecture, due to the interplay between host-related mutational processes and transmission dynamics. strikingly, the deconvolution of the entire set of intra-host variants reveals the existence of mutually exclusive viral mutational signatures, which prove that distinct hosts differently respond to sars-cov- infections. in particular, two signatures are likely ruled by apobec and reactive oxygen species (ros), which induce hypermutation in a significant number of samples, and appear to be affected by severe purifying selection. conversely, several mutations linked to low-rate mutational processes appear to transit to clonality in the population, eventually leading to the definition of new viral genotypes and to an increase of overall genomic diversity. finally, we demonstrate that a high number of variants are observed in samples associated to independent lineages, likely due to signature-related mutational hotspots or to positive selection. the covid- pandemic has currently affected countries worldwide with , , people being infected, while the number of casualties has reached the impressive number of , [ ] (update june ). the origin and the main features of sars-cov- evolution have been investigated [ , , , , ] , also due to the impressive amount of consensus viral sequences included in public databases, such as gisaid [ ] . 
however, only a few currently available datasets include raw sequencing data, which are necessary to quantify intra-host genomic variability. due to the combination of high error and replication rates of viral polymerase, subpopulations of viruses with distinct genotypes, also known as viral quasispecies [ ] , usually coexist within single hosts. such heterogeneous mixtures are supposed to underlie the adaptive potential of rna viruses to internal and external selection phenomena, which are related, e.g., to the interaction with the host's immune system or to the response to antiviral agents. for instance, it was hypothesized that intra-host heterogeneity may be correlated with prognosis and clinical outcome [ , ] . furthermore, even if the modes of transmission of intra-host variants in the population are still elusive, one may hypothesize that, in certain circumstances, infections allow such variants to spread, sometimes inducing significant changes in their frequency [ ] . in particular, several studies on sars-cov- support the presence of intra-host genomic diversity in clinical samples and primary isolates [ , , , , , , , ] , whereas similar results were obtained on sars-cov [ ] , mers [ ] , ebola [ ] and h n influenza [ ] . we here present the largest up-to-date study on intra-host genomic diversity of sars-cov- , based on high-quality samples for which raw sequencing data are available. our analysis shows that ≈ % of the sars-cov- genome has already mutated in at least one sample, with most mutations (≈ %) being detected at a low frequency in a very limited number of hypermutated samples (≈ % of the cohort displays more than minor variants). the large majority of samples shows highly heterogeneous sublineage architectures, in which patterns of co-occurence of minor variants seem to suggest the existence of uncovered infection paths and of homoplasies, either due to functional convergent evolution or to mutational hotspots. 
importantly, several variants are observed as clonal in certain samples and at a relatively low frequency in others, indicating that transition to clonality might be due not only to functional selection shifts, but also to complex transmission dynamics involving stochastic and founder effects. strikingly, our analysis allowed us to identify three non-overlapping mutational signatures, i.e., specific distributions of nucleotide substitutions, which are observed in distinct clusters of samples in a mutually exclusive fashion, suggesting the presence of host-related mutational processes. one might hypothesize that such different processes are related to the interaction of the virus with the host's immune system and/or with antiviral therapies, and that characterizing them might pave the way for a better understanding of the molecular mechanisms underlying different clinical outcomes. in particular, the first signature is characterized by c>t:g>a substitution and is likely related to apobec activity, the second signature is characterized by g>t:c>a substitution and might be associated with ros-related processes, while the third signature is mostly associated with a>g:t>c substitution, which is usually imputed to adar activity. the first two signatures, in particular, appear to cause hypermutation of the samples, up to extremely high values (up to variants detected in a single host), and the dn/ds analysis suggests that they are both affected by purifying selection in the population. we finally provide a high-resolution model of viral evolution via verso [ ] , which allows us to identify a robust phylogenomic lineage tree, as well as to quantify the intra- and inter-host genomic diversity even among samples harboring the same viral lineages. interestingly, clonal variants are detected as convergent in two distinct clades, and it is reasonable to hypothesize that they are related to transmission-related founder effects, following random mutation acquisition. 
importantly, a significantly high number of minor variants is observed in independent lineages and is unlikely related to infection events, suggesting that in such cases, the same mutation can emerge in independent samples due to the presence of mutational hotspots, especially in hypermutated samples affected by mutational signatures. mutational landscape of sars-cov- from variant frequency profiles of samples. we performed variant calling from amplicon raw sequencing data of samples from the ncbi bioproject prjna and by aligning sequences to reference genome sars-cov- -anc, which is the likely ancestral sars-cov- genome [ ] . we first filtered-out samples showing more than % of the genomic positions covered by less than reads. we also verified the absence of any possible bias in the detection of minor variants due to sequencing artifacts. as one can see in suppl. fig. , no correlation between the number of snvs and both total coverage and the mean sequencing depth is observed (r = . in both cases), which proves the good quality of the calls. as a result, we extensively analyzed the mutational profiles of high-quality samples; distinct single-nucleotide variants (snvs, identified by genome location and nucleotide substitution) were detected in the cohort, for a total of , mutations (see methods for further details; the variant frequency profiles of all samples are included in suppl. table ; see fig. i for a graphical representation of an example dataset). in particular, in our analysis we consider any snv detected in any given sample as clonal, if its variant frequency (vf) is > % and as minor if its variant frequency is > % and ≤ %. the distribution of the number of minor and clonal variants observed in each sample (fig. a ) unveils a bimodal distribution of clonal variants ( st mode = , nd mode = , median = , max = ) likely due to the partitioning of samples in distinct phylogenomic clades (see below). 
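the clonal/minor labelling rule stated above can be sketched in a few lines of python. this is only an illustration under stated assumptions: the numeric thresholds are not recoverable from the text, so `minor_min` and `clonal_min` below are hypothetical placeholder values, and `classify_vf` is a name introduced here, not part of the authors' pipeline.

```python
def classify_vf(vf, minor_min=0.05, clonal_min=0.90):
    """Label one SNV call in one sample from its variant frequency (VF).

    minor_min and clonal_min are placeholder thresholds (assumptions):
    VF > clonal_min              -> "clonal"
    minor_min < VF <= clonal_min -> "minor"
    otherwise                    -> None (call discarded)
    """
    if vf > clonal_min:
        return "clonal"
    if vf > minor_min:
        return "minor"
    return None


# Hypothetical variant frequencies for three SNVs in a single sample.
calls = {"snv_a": 0.97, "snv_b": 0.12, "snv_c": 0.01}
labels = {snv: classify_vf(vf) for snv, vf in calls.items()}
```

note that a vf exactly equal to the upper threshold is labelled minor, matching the "> ... and ≤ ..." wording of the definition.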
minor variants are detected in ≈ % of the samples and show a long-tail distribution (median = , max = ). samples (≈ % of the cohort) display a significantly large number of minor variants (> ) and hint at the existence of hypermutation processes, which is confirmed in successive analyses (see below). accordingly, we label such samples as hypermutated. interestingly, we observe a statistically significant increase of genomic diversity on clonal variants with respect to collection week (mann-kendall test for trend on median number of clonal variants p = . , fig. b) , due to the accumulation of clonal variants in the population, and which confirms recent findings [ , , ] , whereas, as expected, this phenomenon is less evident for minor variants (fig. c ). this aspect is further investigated in the following and hints at the interplay involving the evolutionary dynamics within hosts and the transmission among hosts, which differently affects clonal and minor variants [ ] . evidence of transition to clonality. we further categorize each detected snv as: (i) always clonal, if clonal in all samples in which it is detected, (ii) always minor, if minor in all samples in which it is detected, (iii) mixed, if observed as clonal in at least one sample and as minor in at least another sample. single-nucleotide variants were detected on distinct genome sites ( . % of sars-cov- genome), of which sites ( . % of the genome) display multiple nucleotide substitutions (see fig. d ). this suggests that the proportion of mutated genomic sites might be considerably higher in the overall population, especially if considering minor variants. remarkably, . % of all snvs ( ) are privately detected in hypermutated samples, which represent the ≈ % of the cohort (fig. e) . . %, . % and . % snvs are detected as always clonal, mixed and always minor, respectively, in non hyper-mutated samples, whereas hyper-mutated samples are characterized by a large majority of always minor variants ( . 
%) ( fig. f ; notice that, in this categorization, snvs detected in non-hypermutated samples can be present also in hypermutated samples, yet not privately). in most cases, most variants are non-synonymous (see fig. g ). the analysis of the vf distribution ( fig. h ) unveils an impressive scarcity of variants showing vf in the middle-range, i.e., between % and %, for all categories. this phenomenon is likely due to transmission bottlenecks, which tend to purify low-frequency variants in the population. nonetheless, both mixed and always minor variants display broad vf spectra, an aspect that is particularly relevant for the former category, for both hypermutated and non-hypermutated samples. in this respect, . % of all mixed variants ( on ) never display a vf ≤ %: one may hypothesize that such variants are indeed transiting to clonality in the population, because either positively selected, as a result of the strong immunologic pressure within human hosts [ ] , or because affected by transmission phenomena involving founder effects and stochastic fluctuations [ , ] . the former hypothesis is supported by the ratio of non-synonymous and synonymous substitutions = . ( / , variants are non-coding), which is slightly superior to the theoretical ns/s ratio of the sars-cov- -anc reference genome, which is = . (resulting ratio = . ), suggesting a mild tendency toward positive selection for transiting variants. conversely, one might hypothesize that most remaining mixed variants may result from random mutations hitting positions of snvs that are already present as clonal in the population; this is especially relevant for hypermutated samples, which show an extremely high number of low-frequency variants. furthermore, the distribution of snvs with respect to each region of the genome in fig. a -b demonstrate that mutations are uniformly distributed across the genome, with noteworthy predominance of minor snvs in hyper-mutated samples (see also suppl. fig ) . 
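the three-way categorization of snvs introduced in this subsection (always clonal, always minor, mixed) amounts to collecting, for each snv, the set of labels it receives across samples. a minimal sketch follows; the input layout (tuples of sample, snv identifier and per-sample label) and the function name `categorize_snvs` are assumptions made for illustration.

```python
from collections import defaultdict


def categorize_snvs(calls):
    """Assign each SNV to 'always clonal', 'always minor' or 'mixed'.

    calls: iterable of (sample_id, snv_id, status) tuples, where status
    is "clonal" or "minor" for that SNV in that sample (assumed layout).
    """
    statuses = defaultdict(set)
    for _sample, snv, status in calls:
        statuses[snv].add(status)

    categories = {}
    for snv, seen in statuses.items():
        if seen == {"clonal"}:
            categories[snv] = "always clonal"
        elif seen == {"minor"}:
            categories[snv] = "always minor"
        else:
            # Clonal in at least one sample and minor in at least another.
            categories[snv] = "mixed"
    return categories


# Hypothetical calls over three samples.
example = [
    ("s1", "g.100C>T", "clonal"),
    ("s2", "g.100C>T", "clonal"),
    ("s1", "g.200G>T", "minor"),
    ("s2", "g.300A>G", "clonal"),
    ("s3", "g.300A>G", "minor"),
]
result = categorize_snvs(example)
```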
overall, this analysis provides the first large-scale quantification of transition to clonality in sars-cov- and might serve to intercept variants possibly involved in functional modifications, bottlenecks or founder effects. de novo decomposition of sars-cov- mutational signatures. in order to investigate the existence of mutational processes related to the interaction between the host and the sars-cov- virus, we analyzed the distribution of nucleotide substitutions for all snvs detected in the cohort. in fig. a one can see the proportion of snvs for each of the nucleotide substitution types (e.g., number of c>t's) over the total number of nucleotides present in the reference genome for each substitution type (e.g., number of c's); snvs are then grouped in those detected in non hyper-mutated samples or privately detected in hyper-mutated samples. certain substitutions present a significantly higher normalized abundance, confirming recent findings on distinct cohorts [ , ] . in particular, c>t substitutions are observed in . % and . % of all c nucleotides in sars-cov- genome, in either non-hypermutated or (privately) in hypermutated samples, respectively ( . % by considering all snvs from all samples); g>t's in . % and . % of all g's, t>c's in . % and . % of all t's, and a>g's in . % and . % of all a's. although, traditionally, a -substitution pattern has been used in order to report mutations occurring in single-stranded genomes, we reasoned that, owing to the intrinsically double-stranded nature of the viral life-cycle (i.e., a mutation occurring on a plus strand can be transferred on the minus strand by rdrp and vice versa) it is sound to consider a total of substitution classes (obtained by merging equivalent substitutions in complementary strands) to investigate the possible presence of viral mutational signatures [ ] . 
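the merging of equivalent substitutions on complementary strands can be made concrete with a short sketch. with four bases there are twelve ordered substitutions, which collapse into six strand-agnostic classes. the helper names (`substitution_class`, `normalized_abundance`) and the toy inputs are assumptions for illustration only.

```python
from collections import Counter
from itertools import permutations

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the strand-agnostic class of a substitution, e.g. 'C>T:G>A'."""
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return ":".join(sorted((forward, reverse)))


def normalized_abundance(snvs, genome):
    """Fraction of reference bases of each type hit by each substitution.

    snvs: iterable of (ref, alt) pairs, one per distinct SNV.
    genome: reference sequence as a string over A, C, G, T.
    """
    base_counts = Counter(genome)
    sub_counts = Counter(snvs)
    return {f"{r}>{a}": n / base_counts[r] for (r, a), n in sub_counts.items()}


# All twelve ordered substitutions collapse into six classes.
ALL_CLASSES = {substitution_class(r, a) for r, a in permutations("ACGT", 2)}

# Toy reference and toy SNV list (hypothetical data).
abundance = normalized_abundance([("C", "T"), ("C", "T"), ("G", "T")], "CCCCGGAT")
```

here a g>a call and a c>t call fall in the same class, which is what allows signatures to be compared regardless of the strand on which the mutation originally occurred.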
clonal variants were not considered in the analysis, to focus on snvs likely related to host-specific mutational processes and by excluding variants presumably transmitted during infection events. in fig. b we return the (average) proportion of substitution classes obtained by grouping samples with respect to the number of minor snvs. one can notice that samples with less than minor variants display a relatively even proportion of the substitution classes c>t:g>a, c>a:g>t and a>g:t>c, plus some residuals from other classes. remarkably, samples with more than and less than minor variants show a clear prevalence of c>a:g>t class, whereas highly hypermutated samples (> minor snvs) are predominantly characterized by c>t:g>a class. this result strongly supports the existence of focal processes causing mutations at different rates in distinct samples, and which we here investigate via mutational signature analysis (see methods). in detail, to identify the mutational processes responsible for the sars-cov- variants with a statistically grounded approach, we applied a non-negative matrix factorization (nmf) [ ] and standard metrics to determine the optimal rank (see methods). in particular, we analyzed the mutational profiles of samples with at least minor variants (on total samples), to ensure a sufficient sampling of the distributions. strikingly, distinct and non-overlapping mutational signatures are found and explain . % of the variance in the data ( fig. a and suppl. characterization of mutational signatures of sars-cov- . signature s# is related to c>t:g>a substitution, which was often associated to apobec (apolipoprotein b mrna editing catalytic polypeptide-like), i.e., a cytidine deaminases involved in the inhibition of several viruses and retrotransposons [ ] . 
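to illustrate the kind of decomposition referred to above, the following sketch factorizes a small samples-by-substitution-class count matrix with plain multiplicative-update nmf. it is a generic textbook variant written with numpy, not the authors' implementation (rank selection and the exact solver are described in their methods), and the toy matrix is invented for the example.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (samples x classes) as W @ H.

    W (samples x rank) holds per-sample exposures and H (rank x classes)
    holds the signatures. Uses multiplicative updates for the Frobenius
    objective; eps avoids division by zero.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_classes = V.shape
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_classes)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy counts of minor SNVs per substitution class (hypothetical data):
# two samples dominated by the first class, two by the second.
V = np.array(
    [
        [30, 2, 1, 0, 1, 0],
        [28, 3, 0, 1, 0, 1],
        [2, 25, 1, 0, 0, 1],
        [1, 27, 0, 1, 1, 0],
    ],
    dtype=float,
)
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

in this toy case each row of `H` concentrates on one substitution class, and the rows of `W` indicate which samples are exposed to which signature; a clustering step such as k-means would then operate on `W`.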
a surge of apobec-related mutations was observed in other coronaviruses shortly after spillover [ ] and it was recently hypothesized that apobec-like editing processes might have a role in the response of the host to sars-cov- [ ] . as specified above, a mutational process occurring on single-stranded rna with a given pattern, e.g., c>t, could occur as a c>t mutation on the plus reference strand, but could similarly occur on the minus strand, again as a c>t substitution. however, c>t events originally occurring in the minus strand would be recorded as g>a owing to the mapping of the mutational event as a reverse-complement on the plus reference genome. starting from these considerations and hypothesizing that the c>t:g>a substitution is mediated by apobec, which operates on single-stranded rna and is similarly active on both strands, the analysis of the c>t / g>a ratio (or, more generally, of a plus/minus substitution ratio) should provide an estimate of the molar ratio between the two viral strands inside the infected cells. in our case, by looking at the relative proportion of substitutions one can notice that there is a large disproportion between c>t and g>a, with a ≈ -fold substitution ratio in favor of the former ( . %/ . %). this result allows us to hypothesize that plus and minus viral strands of the sars-cov- genome are present in infected cells with a molar ratio of approximately : in favor of the plus strand, and it is consistent with the expected activity of apobec on single-stranded rna. further experimental analyses will be required to confirm this hypothesis. the second signature s# is predominantly characterized by substitution c>a:g>t, whose origin is however still obscure. to gain insight into the mechanisms responsible for its onset, we analyzed the c>a and g>t substitution frequencies in this case as well, which revealed a strong disproportion in favor of the latter ( . % vs . %). 
strikingly, the g>t to c>a ratio is : , virtually identical to the c>t to g>a ratio shown for the single-stranded apobec process, therefore suggesting that: (i) the c>t:g>a mutational process is able to operate on single-stranded rna, (ii) the g>t substitution is the active mutational process, (iii) the ratio value of : represents a constant, likely reflecting the molar ratio between the two viral strands. in this respect, one might hypothesize a role for reactive oxygen species (ros) as mutagenic agent underlying this signature, as observed for instance in clonal cancer evolution [ ] . ros are extremely reactive species formed by the partial reduction of oxygen. a large number of ros-mediated dna modifications have already been identified; in particular however, guanine is extremely vulnerable to ros, because of its low redox potential [ ] . ros activity on guanine causes its oxidation to , -dihydro- -oxo- -deoxyguanine (oxoguanine). notably: (i) oxoguanine can pair with adenine, ultimately causing g>t transversions, and (ii) ros are able to operate on single-stranded rna, therefore their mutational process closely resembles the c>a:g>t pattern we see in signature s# . thus, it is sound to hypothesize that the c>a:g>t substitution is generated by ros, whose production is triggered upon infection, in line with several reports indicating that a strong ros burst is often triggered during the early phases of several viral infections [ , ] . finally, signature s# is primarily characterized by a>g:t>c substitution, which is typically imputed to the adar deaminase mutational process [ ] . adar targets adenosine nucleotides, causing deamination of the adenine to inosine, which is structurally similar to guanine, ultimately leading to an a>g substitution. unlike apobec, adar targets double-stranded rna, hence it is active only on plus/minus rna dimers. 
in line with this mechanism and in sharp contrast with apobec, a>g's and the equivalent t>c's show a similar prevalence ( . % and . %, respectively; ratio ≈ . ), therefore supporting the notion that the a>g:t>c mutational process is exquisitely selective for double-stranded rna, where it can similarly target adenines present on both strands. identification of signature-based clusters. we then clustered the samples with at least mutations (out of ), by applying k-means on the low-rank latent nmf matrix and employing standard heuristics to determine the optimal number of clusters (see methods). as a result, signature-based clusters (sc# , sc# and sc# ) are retrieved, including , and samples respectively (see fig. b ). remarkably, clusters sc# and sc# are characterized by distinctive signatures, s# (dominated by substitution c>t:g>a) and s# (c>a:g>t), respectively, whereas cluster sc# is characterized by a mixture of all three signatures. the silhouette width coefficients indicate that clusters are robust and well-separated ( . , . , . for the three clusters, respectively; average silhouette = . ) and, in particular, the samples of clusters sc# and sc# display non-overlapping categorical vf distributions (see fig. d ; pearson correlation coefficient between the exposure matrix = − . ). this result points to the existence of two mutually exclusive host-related mutational processes involving different groups of samples. we here recall that samples with a number of minor variants between and ( . % of the cohort) cannot be reliably associated with signature-based clusters, due to the low number of snvs. for this reason, such samples were considered separately in the analysis and are labeled as cluster sc# from now on (fig. c) . 
importantly, by computing the categorical vf distribution of all minor snvs with respect to all trinucleotide contexts (i.e., by considering flanking bases), one can notice that clusters sc# and sc# display profiles that closely resemble that of the theoretical substitution distribution of the reference genome, thus suggesting that, in such cases, the host-related mutational processes are likely independent of flanking bases. conversely, sc# displays a distribution of c>a:g>t substitutions with prevalent peaks in certain contexts and, especially, in the gct>gat context. we finally note that, due to the possible transmission of minor variants among hosts during infections (see above), signature-based clusters might include both samples with host-related mutational processes and samples with minor variants inherited from infecting hosts. table ). in particular, clusters sc# and sc# are characterized by a significantly high number of minor variants ( st quartile = and = minor variants , median = and = , rd quartile = and = , max = and = , for sc# and sc# , respectively). accordingly, both clusters include a large majority of hypermutated samples (with > minor variants), on and on for sc# and sc# , respectively. this result supports the existence of highly active mutational processes and is consistent with the hypothesis of apobec- and ros-related mechanisms for clusters sc# and sc# , respectively. conversely, cluster sc# displays a much lower number of minor variants and most samples are non-hypermutated ( st quartile = minor variants, median = , rd quartile = . , max = ; hypermutated samples on ). this finding hints at the existence of mild spontaneous mutational processes affecting this signature-based cluster. in this regard, the accumulation of a large number of mutations is well known to be detrimental to the overall fitness of the virus. 
for this reason, one might hypothesize that in presence of hypermutation, e.g., when the apobec or ros responses are triggered, the ability of the virus to proceed through its vital cycle, therefore accumulating further mutations, is greatly impaired or completely abolished. as opposite, in absence of major mutational processes, the virus could undergo a slower accumulation of variants, a process that is likely generated by a mixture of different spontaneous processes and which might be consistent with the mutational profile observed in cluster sc# . to further investigate this issue, we analyzed the distribution of always clonal variants with regard to the whole cohort (see methods). as one can see in suppl. figure , the categorical vf distribution on all substitution classes closely resembles that of cluster sc# on minor variants (pearson correlation coefficient = . ). the fact that the cluster sc# is highly correlated with the clonal variants profile and the relative scarcity of c>t:g>a and c>a:g>t substitutions in the clonal population would support the hypothesis of purifying selection against (hyper)mutational processes characterizing signatures s# and s# . this aspect is additionally discussed in the dn/ds analysis presented in the next subsection. the vf distribution for all snvs highlights additional differences among signature-based clusters. in particular, cluster sc# displays higher abundance of low-frequency variants (in the range − %), with respect to the other clusters (fig. c) . moreover, the categorical distributions of substitutions with respect to sars-cov- open reading frames (orfs) seem to be significantly different among signature-based clusters and are consistent with the composition of the corresponding mutational signatures (fig. d ). in fact, while in most cases variants appear to be randomly distributed across orfs, one can notice that signature-cluster sc# displays a certain bias toward orf n, i.e., the sars-cov- nucleocapsid protein. 
overall, these results reinforce the hypothesis of distinct mutational processes active in different patients. once clinical data become available in combination with sequencing data, it will be possible to assess the correlation with clinical outcomes. evidence of purifying selection against signature-related (hyper)mutational processes. to investigate the evolutionary dynamics of sars-cov- , we first analyzed the cumulative distribution function (cdf) of the vf of all minor variants detected in the cohort, grouped by signature-based cluster (we did not include cluster sc# in the analysis, due to the low number of minor variants). in this case too, we excluded clonal mutations from the analysis, as they are mostly related to transmission and accumulation evolutionary events [ ] . remarkably, the cdf of cluster sc# is significantly different from that of the other clusters and presents a steeper slope, suggesting that the mutational processes related to signature s# (i.e., apobec) produce a considerably larger number of low-frequency variants, likely due to a higher mutational rate, and in spite of possible selection and transmission phenomena (fig. e ). to further investigate this issue, we performed a corrected version of the dn/ds ratio analysis, i.e., obtained by normalizing the s/ns rate with respect to the theoretical distribution of substitutions detected in each cluster, as suggested in a different context in [ ] . in fig. f , one can notice that the dn/ds ratio trend is significantly different for the three signature-based clusters. in particular, sc# shows a ratio that is generally below , suggesting the existence of significant purifying selection, a result that is consistent with our previous conclusions. the dn/ds ratio is slightly higher for sc# , yet mostly lower than , hinting at the presence of milder purifying selection processes. 
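the normalization idea behind the corrected ratio can be sketched as follows: the observed non-synonymous to synonymous ratio is divided by the ratio expected from the theoretical substitution distribution, so that values below 1 point towards purifying selection, values near 1 towards neutrality and values above 1 towards positive selection. the function name and all counts below are hypothetical, and the authors' exact correction is the one described in their methods.

```python
def normalized_ns_s_ratio(n_nonsyn, n_syn, expected_nonsyn, expected_syn):
    """Observed NS/S ratio divided by the theoretically expected NS/S ratio.

    n_nonsyn, n_syn: observed counts of non-synonymous / synonymous variants.
    expected_nonsyn, expected_syn: counts (or rates) expected under the
    cluster's substitution distribution with no selection (assumed inputs).
    """
    observed = n_nonsyn / n_syn
    expected = expected_nonsyn / expected_syn
    return observed / expected


# Hypothetical cluster: 30 non-synonymous and 20 synonymous minor variants,
# against a theoretical expectation of 3 non-synonymous per synonymous change.
ratio = normalized_ns_s_ratio(30, 20, 3, 1)
```

with these invented numbers the normalized ratio is 0.5, i.e. fewer non-synonymous variants than expected by chance, which is the pattern interpreted in the text as purifying selection.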
finally, as expected, signature-based cluster sc# displays a dn/ds ratio close to on the whole genome, suggesting that the mutational processes underlying this cluster might be spontaneous and neutral in the population, as also suggested by the similarity with the clonal variant substitution profile shown above. high-resolution model of sars-cov- evolution via verso reveals convergent variants and allows quantification of minor variant transmission. we employed verso, a framework introduced in [ ] for the reconstruction of high-resolution models of viral evolution from raw sequencing data of viral samples. in particular, we first applied verso to the binarized vf profiles of clonal variants (vf > %) detected in at least % of the cohort, in order to obtain a phylogenomic lineage tree, which describes the existence of different viral genotypes (or lineages) and the ancestral relations among them (fig. a) . interestingly, snv g. t>c (mapped on orf n, synonymous) appears to be the earliest evolutionary event from reference genome sars-cov- -anc [ ] . two major clades characterize the model and correspond to known sars-cov- types [ , ] , as determined by presence/absence of mutations g. t>c (orf ab, synonymous) and g. c>t (orf ab, p. s>l) . furthermore, the model unveils the presence of homoplasies, as clonal variants are found in separate clades in a convergent evolution scenario, namely g. c>t (orf ab, synonymous) and g. c>t (n, p. p>l) (fig. b) . one might hypothesize that such snvs have spontaneously emerged in unrelated samples and were selected either due to some functional advantage or, more likely, due to the combination of founder and stochastic effects involved in variant transmission during infections, which might lead certain minor snvs to transit to clonality in the population (see above). the latter hypothesis is further supported by the fact that g. 
c>t is a silent mutation, and points at the complex interplay between evolution within hosts and transmission among hosts. furthermore, we note that variant g. a>g (s, p. d>g), whose correlation with clinical outcome was recently hypothesized [ , , ] , is found in viral lineages, which include samples of the cohort. a second step of verso allows to process the complete vf profiles of the samples in each lineage, to project the sublineage composition and the genomic distance among samples on a low-dimensional space (see methods). such maps return the genomic similarity among hosts harboring the same viral lineage (i.e., same clonal mutation), and which would not be identifiable by considering consensus sequences. in fig. c one can see that many distinct intra-lineage clusters are found, which identify samples with similar vf profiles on all variants. also, a complex inter-cluster network is revealed in each case, which connect clusters via samples presenting a relatively small evolutionary distance. this genomic distance measures the impact of both transmission processes of minor variants during infections and of shared mutational hotspots hitting independent samples; in particular, the latter occurrence is more likely when considering samples affected by hypermutation processes, as observed for signature clusters sc# and sc# . we finally quantified the number of homoplasies involving minor variants (i.e., identical minor snvs observed in samples of independent lineages). as extensively discussed in [ ] , while all the clonal variants of a host are most likely transmitted during an infection, the extent of transmission of minor variants is still baffling and is highly influenced by bottlenecks, founder effects and stochasticity [ , ] . multiple simultaneous infections of the same host from individuals harboring distinct viral lineages, also named superinfections, might in principle affect variant clonality, yet their occurrence is extremely rare [ ] . 
for such reasons, homoplasies involving minor variants are most likely due to: (i) positive selection of the variants due to some functional advantage, in a scenario of parallel/convergent evolution, (ii) mutational hotspots, i.e., snvs falling in mutation-prone sites or regions of the viral genome, (iii) complex transmission dynamics involving founder effects and stochasticity, which may allow certain minor variants to transit to clonality, eventually leading to a lineage transmutation (see above). we also note that complex combinations of such phenomena are possible, as suggested by the complex sublineage architecture often observed in individual samples. in our case, we observe that . % of minor variants are private to single samples, . % are found in multiple samples of the same lineage and . % are detected in samples belonging to distinct lineages (fig. d) . important conclusions can be drawn from these results. on the one hand, the quantity of minor variants most likely transmitted across hosts appears to be relatively limited, as indicated by the ratio of mutations shared in different hosts harboring the same viral lineage. on the other hand, the number of minor variants observed in multiple independent lineages is surprisingly high; this is especially true for clusters related to (hyper)mutational processes (i.e., signature-based clusters sc# and sc# ), which can display an extremely high number of minor variants shared across a large number of lineages and samples (up to and lineages, and and distinct samples, respectively; fig. e ). this result hints at the abundance of mutational hotspots inducing the occurrence of identical minor variants in likely unrelated samples, as well as the possible presence of positively selected variants. overall, these results suggest that the complex sublineage architecture of samples can be exploited to refine standard (phylo)genomic analyses and provide a quantitative measure of intra- and inter-host genomic diversity.
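the three categories of minor variants discussed above (private to one sample, shared within one lineage, shared across lineages) can be illustrated with a minimal sketch. the function name, the input layout and the toy sample, variant and lineage labels are all hypothetical and are not taken from the authors' code; the sketch only shows the bookkeeping, assuming that each sample has already been assigned to a single viral lineage.

```python
from collections import defaultdict

def classify_minor_variants(observations, lineage_of):
    """Label each minor variant by how widely it is shared.

    observations: iterable of (sample_id, variant_id) pairs (hypothetical layout).
    lineage_of:   dict mapping sample_id -> lineage label.
    """
    samples_by_variant = defaultdict(set)
    for sample, variant in observations:
        samples_by_variant[variant].add(sample)

    classes = {}
    for variant, samples in samples_by_variant.items():
        lineages = {lineage_of[s] for s in samples}
        if len(samples) == 1:
            classes[variant] = "private"        # seen in a single sample
        elif len(lineages) == 1:
            classes[variant] = "same_lineage"   # shared within one lineage
        else:
            classes[variant] = "cross_lineage"  # candidate homoplasy
    return classes

# toy data (illustrative only)
lineage_of = {"s1": "L1", "s2": "L1", "s3": "L2"}
observations = [("s1", "v1"), ("s1", "v2"), ("s2", "v2"), ("s1", "v3"), ("s3", "v3")]
classes = classify_minor_variants(observations, lineage_of)
```

only the "cross_lineage" group corresponds to the homoplasies involving minor variants; the percentages reported in the text would follow from counting the labels in each group.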
we employed two independent datasets comprising and samples, respectively (ncbi bioprojects: prjna , amplicon, usa; prjna , amplicon, australia), to validate the presence of the mutational signatures described above. in detail, we performed signature assignment with respect to the discovered signatures on and high-quality samples showing ≥ minor variants. three signature-based clusters are found for both datasets and explain more than % of the variance. such clusters are related to combinations of signatures consistently with the analysis presented in the text and display similar distributions of minor snvs (see suppl. figs. − ). standard (phylo)genomic analyses of viral consensus sequences might miss useful information to investigate the elusive mechanisms of viral evolution within hosts and of transmission among hosts. in this respect, raw sequencing data of viral samples can be effectively employed to deliver a high-resolution picture of intra-host heterogeneity, which might underlie different clinical outcomes and affect the efficacy of anti-viral therapies. this aspect is vital especially during the critical phases of an outbreak, as experimental hypotheses are urgently needed to deliver effective prognostic, diagnostic and therapeutic strategies for infected patients. we here presented the largest up-to-date quantitative analysis of intra-host genomic diversity of sars-cov- , which revealed that the large majority of samples present a complex sublineage architecture, likely due to the interplay between host-related mutational processes and transmission dynamics. in particular, we here showed the existence of mutually exclusive viral mutational signatures, i.e., nucleotide substitution patterns, which indicate that different hosts respond to sars-cov- infections in different ways, in many cases by undergoing hypermutational processes driven either by apobec or ros-related activity.
as a first consequence, ≈ % of the sars-cov- viral genome is already mutated in at least one sample and, in most cases, such mutations are found at a low frequency in hypermutated individuals. the dn/ds analysis shows that such numerous low-frequency variants tend to be purified in the population whereas, conversely, a significant number of variants linked to spontaneous low-rate mutational processes appear to consolidate. in particular, due to the still obscure combination of founder effects and selection phenomena, certain variants appear to transit to clonality in the population, eventually leading to the definition of new viral genotypes. once they become clonal, mutations tend to accumulate in the population, as indicated by a statistically significant increase of genomic diversity, and might be used to reconstruct robust models of viral evolution via verso [ ] . finally, the analysis of homoplasies, i.e., (low-frequency) variants shared across distinct viral lineages and unlikely due to infection events, demonstrates that a high number of mutations can independently emerge in multiple samples, due to mutational hotspots often related to signatures or, possibly, to positive (functional) selection. dataset. we analyzed a cohort comprising samples from ncbi bioproject with accession number prjna . for all samples, amplicon sequencing high-coverage raw data are provided; all patients were located in australia. within this cohort, we considered for our analyses high-quality samples having coverage > in more than % of the virus genome. we considered two additional datasets for validation, ncbi bioprojects with accession numbers prjna (united states, amplicon samples) and prjna (australia, amplicon samples). we applied the same quality filters to these datasets (coverage > in > % of virus genome), to obtain two validation sets of and samples, respectively. snvs calling. we downloaded sra files and converted them to fastq files using sra toolkit.
following [ ] , we used trimmomatic (version . ) to remove positions at low quality from the rna sequences, using the following settings: leading: trailing: slidingwindow: : minlen: . we used bwa mem (version . . ) to map reads to the reference genome (sars-cov- -anc [ ] ). we then generated sorted bam files from bwa mem results with samtools (version . ) and removed duplicates with picard (version . . ). variant calling was performed generating mpileup files with samtools and then using varscan (min-var-freq parameter set to . ) [ ] . signatures analysis. the analysis was performed on minor variants (vf > % and ≤ %), in order to ensure that the considered variants are not due to transmission, but have likely emerged in the host. in this way, we could associate with each discovered signature a mechanism causing variants in the viral genome related to the specific host. signatures decomposition was formulated as a non-negative matrix factorization (nmf) problem [ ] . given n samples, r possible substitution classes (e.g., c>t:g>a) and s signatures, we can define the following objects:
• the input data matrix d, an $n \times r$ matrix, where every element $d_{i,j}$ represents the cumulative vf of the snvs with substitution class j in the i-th sample. note that $d_{i,j} \in \mathbb{R}^{+}$;
• the low-rank latent nmf matrix a, an $n \times s$ matrix, where every element $a_{i,j}$ represents the linear combination coefficient of signature j in sample i (also called the exposure of the i-th sample to signature j [ ] ). note that $a_{i,j} \in \mathbb{R}^{+}$;
• the signature (or basis) matrix b, an $s \times r$ matrix, where every row is a categorical distribution over all substitution classes in each signature. for this matrix we assume that every row must sum to 1, hence $b_{i,j} \in [0, 1]$ and $\sum_{j=1}^{r} b_{i,j} = 1$.
in particular, we here considered substitution classes (r = ) by merging equivalent substitution types, namely g>t:c>a, g>c:c>g, g>a:c>t, a>t:t>a, a>g:t>c and a>c:t>g.
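the objects d, a and b defined above can be illustrated with a minimal sketch of the factorization d ≈ a · b. this is not the authors' implementation: it uses plain multiplicative updates on a single run, whereas the text describes multiple runs, non-negative least squares and a consensus solution. the function names, the toy signatures and the toy exposures are assumptions made for illustration only.

```python
import random

def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def nmf(d, s, iterations=500, seed=0, eps=1e-12):
    """Factor d (n x r) into a (n x s) and b (s x r) with non-negative entries.

    Multiplicative updates are used here as a simple stand-in solver;
    rows of b are rescaled to sum to 1, with the scale moved into a.
    """
    rng = random.Random(seed)
    n, r = len(d), len(d[0])
    a = [[rng.random() + 0.1 for _ in range(s)] for _ in range(n)]
    b = [[rng.random() + 0.1 for _ in range(r)] for _ in range(s)]
    for _ in range(iterations):
        at = transpose(a)
        num, den = matmul(at, d), matmul(matmul(at, a), b)
        b = [[b[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(s)]
        bt = transpose(b)
        num, den = matmul(d, bt), matmul(a, matmul(b, bt))
        a = [[a[i][j] * num[i][j] / (den[i][j] + eps) for j in range(s)] for i in range(n)]
    scale = [sum(row) for row in b]
    b = [[v / (scale[i] + eps) for v in row] for i, row in enumerate(b)]
    a = [[a[i][j] * scale[j] for j in range(s)] for i in range(n)]
    return a, b

def frobenius_error(d, a, b):
    ab = matmul(a, b)
    return sum((d[i][j] - ab[i][j]) ** 2
               for i in range(len(d)) for j in range(len(d[0]))) ** 0.5

# toy data: 4 samples, 6 substitution classes, 2 made-up signatures
true_b = [[0.0, 0.0, 0.9, 0.0, 0.1, 0.0],
          [0.6, 0.1, 0.0, 0.1, 0.1, 0.1]]
true_a = [[2.0, 0.0], [1.5, 0.5], [0.0, 3.0], [0.2, 1.0]]
d = matmul(true_a, true_b)
a, b = nmf(d, s=2)
error = frobenius_error(d, a, b)
```

after rescaling, each row of b can be read as a categorical distribution over substitution classes and each row of a as the exposure of one sample to the signatures.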
problem (signature decomposition): given the data matrix d, we aim at finding the nmf latent matrix a and the signature matrix b, such that $\lVert d - a \cdot b \rVert$ is minimized. to solve the stated problem, we here performed a total of independent nmf runs with standard update [ ] , for solutions at ranks varying from to , where initial solutions were randomly initialized; for each run a total of iterations were performed, in which signatures and their assignments to samples were iteratively estimated by non-negative least squares [ ] . the final solution was constructed as the consensus of the runs [ ] . we then employed multiple state-of-the-art approaches to assess the optimal rank (optimal number of signatures s) for the nmf decomposition. we first assessed the stability of nmf results over the runs, with the idea that stable solutions are preferable to unstable ones; to this end, we computed the cophenetic correlation coefficient [ ] and the dispersion coefficient [ ] , which both showed a sharp drop at rank equal to , hinting at signatures as a stable solution (see suppl. fig. a-b) . furthermore, we also evaluated the goodness of fit of nmf solutions at different ranks, with rank equal to being able to explain . % of variance in the data (see suppl. fig. a-c) and showing an average cross-validation error of . [ ] ; finally, we report as suppl. fig. d the average pearson correlation between observations and predictions by nmf with rank equal to , showing a plateau with correlation equal to . [ ] . all of this supports as the optimal rank for our decomposition problem and the presence of distinct mutational signatures in our data. identification of signature-based clusters. in order to identify clusters of samples possibly affected in different proportions by the discovered mutational signatures, we considered the low-rank latent nmf matrix a defined above.
specifically, we first normalized a such that each row of the matrix sums to and then computed the euclidean distance between each pair of samples. we next performed principal component analysis (pca) on the distance matrix to estimate the optimal number of clusters present in our data. in detail, the analysis of the eigenvalues of the distance matrix shows that components explain > % of the variance, followed by a plateau. accordingly, we performed k-means clustering with k = on the first principal components of the distance matrix to discover the signature-based clusters. dn/ds analysis. in order to quantify the selection pressure in coding regions of sars-cov- , we employed dn/ds analysis, which assesses and compares non-synonymous to synonymous substitution rates. in its standard version, this analysis assumes uniform nucleotide substitution probabilities across the genome; however, this hypothesis might not hold if different mutational processes are active with biases over a subset of substitutions (e.g., non-uniform distribution might be observed across signature-based clusters). if this bias is not taken into account, it may lead to erroneous estimation of the dn/ds ratio [ ] . for this reason, since we discovered the existence of different host-related mutational processes (i.e., the mutational signatures) that are strongly biased toward specific substitutions, we developed a corrected dn/ds ratio analysis, as proposed in a different context in [ ] .
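the clustering of samples by signature exposure described earlier in this section (row normalization of a, pairwise euclidean distances, k-means) can be sketched as follows. this is a simplified illustration, not the authors' pipeline: the pca step is omitted and k-means is run directly on the rows of the distance matrix, with a deterministic farthest-point initialization chosen for reproducibility. the toy exposure matrix and the value of k are assumptions.

```python
import math

def normalize_rows(a):
    """Rescale each row so that its entries sum to 1 (rows of zeros are kept)."""
    out = []
    for row in a:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out

def euclidean(x, y):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def distance_matrix(rows):
    return [[euclidean(x, y) for y in rows] for x in rows]

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# toy exposure matrix: 6 samples x 3 signatures (illustrative values)
exposures = [[9, 1, 0], [8, 2, 0], [1, 9, 0], [0, 10, 0], [1, 1, 8], [0, 2, 8]]
dist = distance_matrix(normalize_rows(exposures))
labels = kmeans(dist, k=3)
```

in the toy example, samples dominated by the same signature end up in the same cluster; in a fuller version, the rows of the distance matrix would first be projected on their leading principal components, as the text describes.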
specifically, given the i-th sliding window of the coding region comprising l bases and considering the f-th signature-based cluster, the corrected dn/ds ratio (for signature-cluster f and sliding window i) is given by:
$$\left(\frac{dN}{dS}\right)_{i,f} = \frac{N^{i,f}_{obs} \,/\, S^{i,f}_{obs}}{\sum_{k=1}^{l} \sum_{c} p_{f,c}\, n_{k,c} \,\Big/\, \sum_{k=1}^{l} \sum_{c} p_{f,c}\, s_{k,c}}$$
where $N^{i,f}_{obs}$ and $S^{i,f}_{obs}$ are the numbers of non-synonymous and synonymous substitutions detected in the window i in at least one sample of signature-based cluster f, $p_{f,c}$ is the probability of substitution class c in signature-based cluster f (computed with respect to the categorical normalized cumulative vf distribution of all substitution classes in that cluster), and $n_{k,c}$ ($s_{k,c}$) is equal to 1 if class c identifies an admissible non-synonymous (synonymous) substitution in the k-th position of the window i, and 0 otherwise. verso high-resolution model of viral evolution. we employed the verso framework introduced in [ ] to reconstruct a high-resolution model of viral evolution. the framework processes variant frequency profiles generated from raw sequencing data and includes two subsequent algorithmic steps. in the first step, verso takes as input the binarized mutational profiles of clonal mutations. in our analysis, we considered only clonal variants (vf > %) detected in at least % of the samples of the cohort (m = clonal variants on n = samples). verso employs a probabilistic approach to model the accumulation of clonal variants and the presence of noise and uncertainty in the data, as proposed in the context of cancer evolution in [ ] . as output, a phylogenomic lineage tree is returned, in which each node corresponds to a viral genotype (or lineage), edges are parental relations, characterized by one or more accumulating variants, and any viral lineage is associated with a cluster of samples. variants showing particularly high levels of estimated error rates with respect to the theoretical model represent possible violations of perfect phylogenetic constraints, e.g., due to convergent evolution.
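the corrected dn/ds ratio defined in the methods can be illustrated with a minimal sketch, assuming it takes the form of the observed non-synonymous to synonymous ratio divided by the ratio expected under the substitution-class probabilities of the cluster. that form is inferred from the definitions of the observed counts, of p f,c and of the admissibility indicators; the function name, the argument layout and the toy window are hypothetical.

```python
def corrected_dnds(n_obs, s_obs, p, n_admissible, s_admissible):
    """Corrected dN/dS for one sliding window and one signature-based cluster.

    n_obs, s_obs:  observed non-synonymous / synonymous substitution counts.
    p:             dict substitution class -> probability in the cluster.
    n_admissible:  one dict per window position, class -> 1 if the class gives
                   an admissible non-synonymous substitution there, else 0.
    s_admissible:  same layout, for synonymous substitutions.
    Returns None when the ratio is undefined (a zero denominator).
    """
    expected_n = sum(p[c] * flags.get(c, 0) for flags in n_admissible for c in p)
    expected_s = sum(p[c] * flags.get(c, 0) for flags in s_admissible for c in p)
    if s_obs == 0 or expected_n == 0 or expected_s == 0:
        return None
    return (n_obs / s_obs) / (expected_n / expected_s)

# toy window of two positions and two substitution classes (illustrative)
p = {"c>t": 0.5, "a>g": 0.5}
n_admissible = [{"c>t": 1, "a>g": 1}, {"c>t": 1, "a>g": 0}]
s_admissible = [{"c>t": 0, "a>g": 0}, {"c>t": 0, "a>g": 1}]
ratio = corrected_dnds(3, 2, p, n_admissible, s_admissible)
```

in the toy window the expected non-synonymous to synonymous ratio is 3, so an observed ratio of 1.5 yields a corrected value of 0.5, which would be read as purifying selection.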
in the analysis presented in the text, a grid search comprising different error rates was employed. hierarchical clustering was then performed on the similarity matrix returned by verso to associate samples with viral lineages [ ] . in the second step, verso takes into account the vf profiles of all variants detected in the cohort and defines a pairwise genomic distance among samples, based on bray-curtis dissimilarity [ ] . samples showing similar patterns of co-occurrence of variants might have a similar sublineage architecture, therefore being at a small evolutionary distance and pinpointing possible uncovered infection events or the presence of shared mutational hotspots. in this work the genomic distance is computed among all samples associated with any given viral lineage, as inferred in the first step, to reduce the impact of confounding effects [ ] . the genomic distance is then used in a workflow for dimensionality reduction and clustering via scanpy [ ] including: (i) k-nearest neighbour graph (k-nng) computation, executed after applying principal component analysis (pca); (ii) clustering of samples via the leiden algorithm for community detection [ ] ; (iii) projection of samples on the umap low-dimensional space [ ] . verso finally returns both the partitioning of samples in clusters and the visualization in a low-dimensional space. the source code used to replicate all the analyses is available at this link: https://github.com/bimib-disco/sars-cov- -ihmv.
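the pairwise genomic distance used in the second step of verso can be illustrated with a minimal sketch of bray-curtis dissimilarity computed within each lineage. this is not the verso code: the function names, the dictionary layout and the toy vf profiles are assumptions, and the downstream scanpy steps (k-nng, leiden, umap) are not reproduced.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative VF profiles."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0

def within_lineage_distances(profiles, lineage_of):
    """Pairwise distances restricted to samples sharing the same lineage.

    profiles:   dict sample_id -> list of variant frequencies (same variant order).
    lineage_of: dict sample_id -> lineage label.
    """
    distances = {}
    for s1, s2 in combinations(sorted(profiles), 2):
        if lineage_of[s1] == lineage_of[s2]:
            distances[(s1, s2)] = bray_curtis(profiles[s1], profiles[s2])
    return distances

# toy vf profiles over three variants (illustrative values)
profiles = {"s1": [0.5, 0.0, 0.0], "s2": [0.5, 0.0, 0.0], "s3": [0.0, 0.5, 0.0]}
lineage_of = {"s1": "L1", "s2": "L1", "s3": "L1"}
distances = within_lineage_distances(profiles, lineage_of)
```

identical profiles are at distance 0 and profiles with no shared variants are at distance 1, so small values flag samples with a similar sublineage composition.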
coronavirus disease (covid- ): situation report a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china the proximal origin of sars-cov- isolation of sars-cov- -related coronavirus from malayan pangolins genomic surveillance reveals multiple introductions of sars-cov- into northern california global initiative on sharing all influenza data-from vision to reality the quasispecies (extremely heterogeneous) nature of viral rna genome populations: biological relevance-a review rapid viral quasispecies evolution: implications for vaccine and drug strategies shared sars-cov- diversity suggests localised transmission of minority variants quantification of intra-host genomic diversity of sars-cov- allows a high-resolution characterization of viral evolution and reveals functionally convergent variants genomic diversity of sars-cov- in coronavirus disease patients virological assessment of hospitalized patients with covid- molecular characterization of sars-cov- from the first case of covid- in italy intra-host site-specific polymorphisms of sars-cov- is consistent across multiple samples and methodologies. medrxiv genomic epidemiology of sars-cov- in guangdong province, china. 
medrxiv tracking the covid- pandemic in australia using genomics sars-associated coronavirus quasispecies in individual patients analysis of intrapatient heterogeneity uncovers the microevolution of middle east respiratory syndrome coronavirus intra-host dynamics of ebola virus during quantifying influenza virus diversity and transmission in humans transmission dynamics and evolutionary history of -ncov topology of viral evolution viral escape mechanisms-escapology taught by viruses circulating virus load determines the size of bottlenecks in viral populations progressing within a host rampant c-> u hypermutation in the genomes of sars-cov- and other coronaviruses-causes and consequences for their short and long evolutionary trajectories evidence for host-dependent rna editing in the transcriptome of sars-cov- signatures of mutational processes in human cancer metagenes and molecular pattern discovery using matrix factorization apobec a cytidine deaminase induces rna editing in monocytes and macrophages cytosine deamination and selection of cpg suppressed clones are the two major independent biological forces that shape codon usage bias in coronaviruses the repertoire of mutational signatures in human cancer base-excision repair of oxidative dna damage reactive oxygen and nitrogen species during viral infections rna viruses: ros-mediated cell death functions and regulation of rna editing by adar deaminases. annual review of biochemistry mutational signatures are critical for proper estimation of purifying selection pressures in cancer somatic mutation data when using the dn/ds metric phylogenetic network analysis of sars-cov- genomes on the origin and continuing evolution of sars-cov- exploring the genomic and proteomic variations of sars-cov- spike glycoprotein: a computational biology approach. 
infection the d g mutation in sars-cov- spike increases transduction of multiple human cell types tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus genome-wide mapping of gene-microbiota interactions in susceptibility to autoimmune skin blistering varscan : somatic mutation and copy number alteration discovery in cancer by exome sequencing nonnegativity constraints in numerical analysis sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis de novo mutational signature discovery in tumor genomes using sparsesignatures longitudinal cancer evolution from single cells scanpy: large-scale single-cell gene expression data analysis from louvain to leiden: guaranteeing well-connected communities umap: uniform manifold approximation and projection
this work was partially supported by the elixir italian chapter and the sysbionet project, a ministero dell'istruzione, dell'università e della ricerca initiative for the italian roadmap of european strategy forum on research infrastructures and by the airc-ig grant . we thank marco antoniotti, giulio caravagna and chiara damiani for helpful discussions.
a.g., d.m., f.a., r.p. and d.r. designed and developed the study. a.g., d.m., f.a. and d.r. defined, implemented and executed the computational analyses. a.g., d.m., f.a., r.p. and d.r. analyzed the data and interpreted the results. a.g., d.r. and r.p. supervised the study. all authors wrote the manuscript, discussed the results, and commented on the manuscript.
the authors declare that they have no competing interests.
key: cord- -u d vvmq authors: st-germain, jonathan r.; astori, audrey; samavarchi-tehrani, payman; abdouni, hala; macwan, vinitha; kim, dae-kyum; knapp, jennifer j.; roth, frederick p.; gingras, anne-claude; raught, brian title: a sars-cov- bioid-based virus-host membrane protein interactome and virus peptide compendium: new proteomics resources for covid- research date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: u d vvmq key steps of viral replication take place at host cell membranes, but the detection of membrane-associated protein-protein interactions using standard affinity-based approaches (e.g. immunoprecipitation coupled with mass spectrometry, ip-ms) is challenging. to learn more about sars-cov- - host protein interactions that take place at membranes, we utilized a complementary technique, proximity-dependent biotin labeling (bioid). this approach uncovered a virus-host topology network comprising proximity interactions amongst host proteins, highlighting extensive virus protein crosstalk with: (i) host protein folding and modification machinery; (ii) membrane-bound vesicles and organelles, and; (iii) lipid trafficking pathways and er-organelle membrane contact sites. the design and implementation of sensitive mass spectrometric approaches for the analysis of complex biological samples is also important for both clinical and basic research proteomics focused on the study of covid- . to this end, we conducted a mass spectrometry-based characterization of the sars-cov- virion and infected cell lysates, identifying unique high-confidence virus tryptic peptides derived from different virus proteins, to create a high quality resource for use in targeted proteomics approaches. together, these datasets comprise a valuable resource for ms-based sars-cov- research, and identify novel virus-host protein interactions that could be targeted in covid- therapeutics. the sars-cov- ~ kb positive strand rna genome (genbank mn . 
, ) contains two large open reading frames (orf a and orf ab) encoding polyproteins that are cleaved by viral proteases into ~ non-structural proteins (nsps). smaller ' orfs encode the primary structural proteins of the virus, spike (s), nucleocapsid (n), membrane (m) and envelope (e), along with nine additional polypeptides of poorly understood function (fig a) . to enable proteomics-based approaches for the analysis of complex biological samples, we analyzed both sars-cov- -infected cell lysates and mature virions, generating a high confidence virus peptide spectrum compendium. this dataset can be used e.g. for the selection of virus peptides for use in targeted proteomics approaches (e.g. the identification of viral peptides in human clinical samples), or for the generation of peptide spectral libraries for increased sensitivity of detection. two sars-cov- -host protein-protein interaction (ppi) mapping efforts have utilized immunoprecipitation coupled with mass spectrometry (ip-ms) of epitope-tagged viral proteins to identify > putative virus-host ppis in hek t and a cells. while extremely powerful for identifying stable, soluble protein complexes, ip-ms approaches are not optimal for the capture of weak or transient ppis, or for the detection of ppis that take place at poorly soluble intracellular locations such as membranes, where key steps in viral replication occur. to better understand how host cell functions are hijacked and subverted by sars-cov- proteins, we used proximity-dependent biotinylation (bioid ) as a complementary approach to map virus-host protein proximity interactions in live human cells. this dataset provides a valuable resource for better understanding sars-cov- pathogenesis, and identifies numerous previously undescribed virus-host ppis that could represent attractive targets for therapeutic intervention. a sars-cov- peptide compendium. 
to identify high quality tryptic virus peptides for use in targeted proteomics analyses, data-dependent acquisition (dda) mass spectrometry was conducted on the mature sars-cov- virion. the toronto sb virus strain was cultured in veroe cells (moi . ), and culture media was collected hrs post-infection. virus was concentrated by centrifugation, inactivated by detergent and subjected to tryptic proteolysis. the resulting viral tryptic peptides were identified using nanoflow liquid chromatography -tandem mass spectrometry (lc-ms/ms; fig a, together, these data confirm and expand upon previous proteomic analyses of sars-cov- virions, infected cells , - and patient samples [ ] [ ] [ ] , and provide a library of high quality virus peptide spectra covering virus proteins that can be used for the creation of peptide spectral libraries and targeted proteomics approaches. a sars-cov- -host protein proximity interactome. based on standard transcript mapping algorithms and conservation with orfs in other coronaviruses , , we created a sars-cov- open reading frame (orf) vector set (fig a) . nine sars-cov- proteins are predicted to have one or more transmembrane domains (s, e, m, nsp , nsp , nsp , orf a, orf a and orf b). to better characterize sars-cov- -host membrane-associated ppis, these virus orfs (along with the remaining poorly understood open reading frames orf b, orf , orf and orf b) were fused in-frame with an n-terminal bira* (r g) coding sequence, and the resulting fusion proteins individually expressed in hek flp-in t-rex cells. using these cells, a virus-host ppi landscape was characterized using bioid (as in ; supp table ). applying a bayesian false discovery rate of ≤ %, high confidence proximity interactions were identified with unique human proteins (all raw data available at massive.ucsd.edu, accession #msv ). 
prey polypeptides were detected as high confidence interactors for a single sars-cov- bait protein in this analysis, underscoring the high degree of specificity in this virus-host proximity interaction map. bait-bait correlation analysis (fig b) , based on similarity between interactomes (jaccard index analysis conducted in prohits-viz ) revealed high levels of correspondence between the s (spike), e, m, nsp , nsp , orf a, orf b, and orf bait proteins. the nsp , nsp , orf a, orf b, orf , and orf b interactomes shared a lower degree of similarity with the other bait proteins in this set. bait proteins with one or more predicted transmembrane domains thus largely clustered together, with the exception of orf a (which clusters outside the main group of putative membrane baits, even though it is predicted to possess three transmembrane helices), and orf (which clusters with the putative membrane baits, but has no predicted transmembrane domain itself). a self-organized force-directed bait-prey topology map was next generated, in which map location is determined by the number and abundance (i.e. total peptide counts) of host cell interactors (fig c) . this approach similarly clustered all of the baits with one or more predicted transmembrane helices, along with orf , in a dense "core" region of the map, indicating that these bait proteins share a large proportion of common interactors. nsp and orf a occupy regions at the edge of this dense region of the map, indicating a lower number of shared interactors with the other membrane proteins. nsp , orf b and orf b occupy peripheral regions of the map, indicating that they share far fewer interactors with the rest of the baits analyzed here. interestingly, orf occupies a region of the map near orf a (these two baits were also clustered near each other in the bait-bait analysis, above). 
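the bait-bait comparison based on the jaccard index, mentioned above, can be illustrated with a minimal sketch. the analysis in the text was run in prohits-viz; the function below only shows the underlying set calculation, and the bait names and prey identifiers are placeholders rather than real interactome data.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two prey sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def bait_bait_similarity(interactomes):
    """Jaccard index for every pair of baits.

    interactomes: dict bait name -> iterable of high-confidence prey identifiers.
    """
    return {
        (x, y): jaccard(interactomes[x], interactomes[y])
        for x, y in combinations(sorted(interactomes), 2)
    }

# placeholder interactomes (hypothetical bait and prey labels)
interactomes = {
    "bait_a": {"p1", "p2", "p3", "p4"},
    "bait_b": {"p3", "p4", "p5", "p6"},
    "bait_c": {"p7"},
}
similarity = bait_bait_similarity(interactomes)
```

baits that share many high-confidence preys score close to 1 and would cluster together, while baits with disjoint interactomes score 0.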
consistent with this location, more than half of the orf a interactome ( of proteins) is also present in the larger orf interactome ( proteins). this overlapping group of interactors is enriched in plasma membrane (pm) and er proteins. based on this observation, it will be interesting to explore similarities in orf a and orf function. as a whole, the virus-host interactome is significantly enriched in proteins associated with the endoplasmic reticulum (er)/nuclear, golgi and plasma membranes, and er-golgi trafficking vesicles ( even amongst those viral proteins that appear to localize exclusively to the er-golgi-pm endomembrane membrane system, specificity in virus-host interactomes was observed, likely reflecting preferences for interactions with different subsets of membrane proteins and/or localization to unique membrane lipid nanodomains. for example, both the orf a and orf b interactomes are enriched in pm solute channels, but orf a appears to interact uniquely with the anion exchanger slc a , the taurine transporter slc a , and the glycine transporter slc a , while orf b interacts specifically with the amino acid transporter slc a , the sulfate transporter slc a , and the divalent metal transporter slc a . autophagy is an important part of the innate immune response, effecting the elimination of intracellular pathogens such as viruses (virophagy), and delivering them to the lysosome, which processes pathogen components for antigen presentation , . many viruses have thus evolved strategies to inhibit the host autophagic machinery. notably, however, (+)rna viruses appear to be dependent on autophagic function for efficient replication, and hijack components of the autophagic machinery for use in membrane re-organization and the creation of ros . consistent with these observations, our data highlight multiple interactions amongst sars-cov- proteins and the er-phagy receptors fam b, tex and sec c. 
a number of virus protein interactions were also detected with components of the ufmylation system (ddrgk , cdk rap , ufl and ufsp ), which was recently shown to play a key role in er-phagy , highlighting interesting links between specific autophagy pathways and sars-cov- . interactome is significantly enriched in mitochondrial proteins (supp table , supp table ). amongst the high confidence orf b interactors is the mitochondrial antiviral signaling protein (mavs), which acts as a hub for cell-based innate immune signaling. the cellular pattern recognition receptors (prrs) detect pathogen-associated molecular patterns (pamps, e.g. viral rnas). pamp-bound prrs interact with mavs, which activates the nf-kb and type i interferon signaling pathways . many different viruses block the host antiviral response by interfering with mavs signaling. this may be accomplished by e.g. direct mavs cleavage by viral proteases (a strategy used by hav, hcv and coxsackievirus b ), or via s proteasomemediated degradation, a strategy used by sars-cov orf b , which recruits the hect e ligase itch/aip to effect mavs ubiquitylation. we did not detect any ubiquitin e ligases in the orf b bioid interactome, but consistent with a recent report indicating that sars-cov- orf b binds directly to tomm to block mavs-mediated ifn signaling, we detected tomm as a major component of the orf b interactome. the sars-cov- orf interactome is uniquely enriched in nuclear pore complex (npc) components (supp table , supp table ). sars-cov orf was shown to inhibit npcmediated transport by tethering the importin proteins kpna and kpnb /tnpo to er/golgi membranes , which effectively blocks the import of immune signaling proteins such as stat into the nucleus. sars-cov- orf shares % identity at the amino acid level with its sars-cov counterpart, and similarly displays potent immune repressor function . 
it will be interesting to determine if the sars-cov orf interaction with npc components leads to a similar disruption of immune signaling and nuclear transport. orf b interacts specifically with lamtor and lamtor , components of the ragulator complex, which is localized to the lysosomal membrane and regulates the mechanistic target of rapamycin complex (mtorc ). mtor signaling is inactivated by amino acid starvation and other types of stress, to inhibit cap-dependent translation and upregulate autophagy . many viruses have thus evolved mechanisms to maintain mtorc activity during infection . recent work has shown that the lamtor / proteins may also play important roles in xenophagy . based on these observations, sars-cov- orf b could play an important role in regulating mtorc activity and/or in the disruption of antiviral immune function. in addition to er/golgi proteins, sars-cov- nsp interacts with the cytoplasmic rna binding proteins fxr and fxr . the fxr proteins were identified as host cell components of (+)rna equine encephalitis virus (eev) rna replication complexes (rc) [ ] [ ] [ ] . fxrs are recruited to the viral rc by eev nsp proteins, and the fxrs are required for rc assembly. it will be interesting to determine whether the fxrs play similar roles in sars-cov- rna replication. we and others have previously reported that orthogonal ppi discovery approaches such as proximity-dependent biotinylation (bioid) can provide information that is highly complementary to ip-ms datasets , [ ] [ ] [ ] [ ] . to this end, we applied bioid to identify proximity partners for sars-cov- proteins in the proteomics "workhorse" cell system. this mapping project significantly expands upon the sars-cov- virus-host interactome, providing a rich resource that can be mined by the scientific community for better understanding sars-cov- pathobiology, and identifying virus-host membrane protein interactions that could be targeted in covid- therapeutics.
the design and implementation of sensitive mass spectrometric approaches for the analysis of complex biological samples will be important for clinical and basic research proteomics focused on covid- . to this end, we also undertook an analysis of sars-cov- virions and infected vero cell lysates using data-dependent acquisition tandem mass spectrometry, and identified unique tryptic peptides, assigned to different virus proteins. this work provides a significantly expanded sars-cov- tryptic peptide compendium for use in targeted proteomics approaches such as parallel or selected reaction monitoring (prm/srm), or for use in spectral library building. bioid, mass spectrometry and data analysis were conducted exactly as in supp table . virus peptide identification dataset, complete raw data. data for viral preparation and infected cells are presented in individual tabs. table genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan a new coronavirus associated with human respiratory disease in china multi-level proteomics reveals host-perturbation strategies of a promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells sequence, infectivity, and replication kinetics of severe acute respiratory syndrome coronavirus .
emerg infect dis data, reagents, assays and merits of proteomics for sars-cov- research and testing shortlisting sars-cov- peptides for targeted studies from experimental data-dependent acquisition tandem mass spectrometry data shotgun proteomics analysis of sars-cov- -infected cells and how it can optimize whole viral particle antigen production for vaccines a genome-wide er-phagy screen highlights key roles of mitochondrial metabolism and er-resident ufmylation proteomics of sars-cov- -infected host cells reveals therapy targets mass spectrometric identification of sars-cov- proteins from gargle solution samples of covid- patients mass-spectrometric detection of sars-cov- virus in scrapings of the epithelium of the nasopharynx of infected patients via nucleocapsid n protein proteotyping sars-cov- virus from nasopharyngeal swabs: a proof-of-concept focused on a min mass spectrometry window global interactomics uncovers extensive organellar targeting by zika virus prohits-viz: a suite of web tools for visualizing interaction proteomics data an intramembrane chaperone complex facilitates membrane protein biogenesis here, there, and everywhere: the importance of er membrane contact sites +)rna viruses rewire cellular pathways to build replication organelles rhinovirus uses a phosphatidylinositol -phosphate/cholesterol countercurrent for the formation of replication compartments at the er-golgi interface fat(al) attraction: picornaviruses usurp lipid transfer at membrane contact sites to create replication organelles host lipids in positive-strand rna virus genome replication the oxysterol-binding protein cycle: burning off pi( )p to transport cholesterol hepatitis c virus replication depends on endosomal cholesterol homeostasis digesting the crisis: autophagy and coronaviruses autophagy enhances the presentation of endogenous viral antigens on mhc class i molecules during hsv- infection manipulation of autophagy by (+) rna viruses regulation of mavs expression and 
signaling function in the antiviral innate immune response sars-coronavirus open reading frame- b suppresses innate immunity by targeting mitochondria and the mavs/traf /traf signalosome sars-cov- orf b suppresses type i interferon responses by targeting tom viral subversion of nucleocytoplasmic trafficking sars-cov- nsp , nsp , nsp and orf function as potent interferon antagonists nutrient regulation of mtorc at a glance adapting the stress response: viral subversion of the mtor signaling pathway lamtor /lamtor complex is required for tax bp -mediated xenophagy concise review: fragile x proteins in stem cell maintenance and differentiation mutations in hypervariable domain of venezuelan equine encephalitis virus nsp protein differentially affect viral replication hypervariable domain of eastern equine encephalitis virus nsp redundantly utilizes multiple cellular proteins for replication complex assembly new world and old world alphaviruses have evolved to exploit different components of stress granules, fxr and g bp proteins, for assembly of viral replication complexes bioid-based identification of skp cullin f-box (scf)beta-trcp / e ligase substrates getting to know the neighborhood: using proximity-dependent biotinylation to characterize protein complexes and map organelles parallel exploration of interaction space by bioid and affinity purification coupled to mass spectrometry proximity biotinylation and affinity purification are complementary approaches for the interactome mapping of chromatin-associated protein complexes saint: probabilistic scoring of affinity purification-mass spectrometry data key: cord- -aeyf yu authors: joshi, bhrugesh; bakarola, vishvajit; shah, parth; krishnamurthy, ramar title: deepmine - natural language processing based automatic literature mining and research summarization for early-stage comprehension in pandemic situations specifically for covid- date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: aeyf yu the recent pandemic caused by the novel coronavirus (ncov- ) from wuhan, china has created a large-scale general health emergency. this demands novel research on vaccines to fight the pandemic, repurposing of existing drugs, phylogenetic analysis to identify the origin of the virus and determine its similarity to other known viruses, etc. the preliminary task for the research community is to analyze the wide variety of existing related research articles, which is very time-consuming in situations where each minute counts for saving hundreds of human lives. fully manual processing further lowers the efficiency of mining this information. we have developed a completely automatic literature mining system that delivers efficient and fast mining from existing biomedical literature databases. with the help of modern-day deep learning algorithms, our system also summarizes important research articles, enabling easy and fast comprehension of critical research. the system currently scans nearly , , , english words from , research articles in no more than . seconds with multiple search keywords. this article presents the importance of literature mining, especially in pandemic situations, along with the implementation and online deployment of the system. the recent pandemic of coronavirus disease [ ] is caused by a novel human coronavirus, initially referred to as the wuhan coronavirus (cov), which is currently designated severe acute respiratory syndrome (sars)-cov- as per the latest international committee on taxonomy of viruses (ictv) classification [ ] and has been suggested to have a possible zoonotic origin [ ] . the outbreak of pneumonia started in wuhan, china, and the first case was found on december . by the end of march , it had infected more than , individuals in nearly countries and caused more than , deaths worldwide [ ] .
sars-cov- belongs to the family coronaviridae. it is enveloped and has a single-stranded, positive-sense rna genome ranging from to kb in length [ ] . these viruses can be classified into four genera: alpha, beta, delta, and gamma, of which alpha and beta coronaviruses (covs) are known to infect humans [ ] . they circulate among humans, other mammals, and birds, and can cause respiratory, hepatic, enteric, and neurologic diseases [ ] . although the majority of human coronavirus infections are mild, the epidemics of two betacoronaviruses (betacov), namely severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov), have caused more than ten thousand cumulative cases in the past years, with mortality rates of about percent for sars-cov and percent for mers-cov [ ] [ ] . the situation is worsening, turning from an epidemic into a pandemic, with the number of confirmed cases and related deaths increasing with each passing day. the scientific community around the world has joined hands to fight this deadly disease in all possible ways, including developing vaccines, repurposing existing drugs, and designing diagnostic kits to detect the virus or disease. other work includes sequencing the virus genome to identify the origin of the virus and its possible mode of transmission, allocating resources for patients, and applying statistical models to predict how fast the disease can spread. to begin any new research, it is essential to know what information is already available. because similar viruses are already known, it becomes crucial to collect and organize this information from the ocean of research articles.
the average reading speed of a human is roughly wpm (words per minute) [ ] , which means that it takes a substantial amount of time to comprehend and find useful information in the thousands of research articles published. to help the research community screen thousands of published articles and comprehend them in a short amount of time, we have designed an ai-nlp based system that can mine the relevant articles and give a short summary of each, enabling fast and efficient comprehension for researchers. because experts work in different domains, the relevant information they search for lies across varied domains. data mining is becoming a fundamental component of the modern world, with a variety of applications. it supports a quick and efficient decision-making process that enhances the accuracy of an outcome. we already have a huge amount of data or information in the form of research articles published in the last decades or more. with a literature mining system, meaningful research articles can be extracted from this huge set and a brief technical summary can be produced for the articles a user is interested in. in computer science, text summarization is the process of shortening large text document(s) in order to generate a short and meaningful piece of text. the objective is to create fluent natural language text that keeps the major insights and technical content of the source data. automatic text summarization is a common problem in the field of natural language processing and machine learning. the task was first carried out in the form of automatic generation of literature abstracts in [ ] . over the past half century, the problem of text summarization has been addressed from a variety of perspectives. the task of summarization is primarily divided into two major categories: extractive summarization and abstractive summarization [ ] .
as the name suggests, the extractive method involves pulling key phrases from the source document in order to generate the target summary. abstractive summarization works much as humans do [ ] . it uses an end-to-end deep learning technique called sequence-to-sequence learning to learn the associations between words. the primary objective of our system is to deliver quick and efficient search over a huge amount of available literature. by entering the keyword coronavirus, we get a number of research articles that contain the keyword in the title. it is still time-consuming to go through the abstract of each article of interest. hence we found that search alone does not serve the ultimate purpose of delivering important research articles to a user so that the researcher can speed up research in epidemic situations. to overcome this limitation, we developed a research text summarizer that can generate a technical summary by scanning all the research articles derived from user-entered keyword(s). in the demanding situation of covid- , we applied literature mining with user-entered keyword(s) and automatic generation of a brief summary of the research articles that the user searches for. the ultimate objective of our system, deepmine, is to provide quick and efficient access to openly available research articles. as a starting point, we have used the covid- open research dataset (cord- ) [ ] . the dataset was published by the allen institute for ai on th march . the detailed category-wise description of the cord- dataset is given in table- . deepmine primarily performs two major functions: mining articles from available open data sources using user-entered keywords, and generating a brief technical summary in natural language for a quick review of the articles the user is interested in.
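the extractive approach described above can be illustrated with a minimal sketch. this is not the deep learning summarizer used by deepmine, whose implementation is not given in the text; it is a simple word-frequency sentence scorer, and the function names, the stopword list and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are",
             "for", "on", "with", "that", "this", "it", "was"}


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text: str) -> List[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average document-level frequency of the sentence's content words.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


# Hypothetical sample text, not taken from the dataset.
sample = ("Coronaviruses are enveloped RNA viruses. "
          "The weather was pleasant. "
          "Coronaviruses infect humans and other mammals.")
summary = extractive_summary(sample, max_sentences=2)
print(summary)
```

sentences are scored by how often their content words recur in the whole document, so the off-topic sentence is the one dropped. an abstractive summarizer would instead generate new sentences with a trained sequence-to-sequence model.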
the entire system has been developed in the python programming language with the support of various available scientific and natural language processing libraries. as represented in figure- , a researcher can submit the keyword(s) he or she is interested in, and the mine article process of the system returns article titles and available links by scanning more than , , , words from , research articles openly provided by cord- . a user can separately request a summary of a research article through the article summarization process. the system uses deep natural language processing-based text summarization to generate a detailed technical summary, given the research article as input. the deepmine system uses natural language processing-based text mining over the available data sources. once a user enters keyword(s) related to the articles he or she is interested in, the system returns all the articles having the entered keyword(s) in the article title. to return more precise results, we currently focus on the titles of the available research articles. the system returns the research articles found with the keyword(s) in the title, along with the total count, and the user can then view the abstract and summary of each research article. the deepmine service is openly available at the url https://deepmine.in/ . as discussed under section - , the cord- dataset contains nearly , , , words in titles, abstracts and article bodies. table- below gives the category-wise title word count of the dataset. mining through deepmine with the keyword "coronavirus", we found a number of research articles, as shown in figure- . it is of utmost significance for a researcher to get a quick idea of the research from the abstract of a research article. the deepmine system derives the abstract of every mined article from the dataset for a quick review. figure- below shows the abstract of one of the research articles mined using the keyword "coronavirus".
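the title-based mining step described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the deepmine implementation: the record fields (`title`, `url`), the function name `mine_titles`, the sample records, and the choice to require all keywords (rather than any one of them) are hypothetical, since the text does not specify how multiple keywords are combined.

```python
from typing import Dict, List


def mine_titles(articles: List[Dict[str, str]], keywords: List[str]) -> List[Dict[str, str]]:
    """Return the articles whose title contains every keyword, ignoring case."""
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    if not wanted:
        return []
    hits = []
    for article in articles:
        title = article.get("title", "").lower()
        # Assumption: multiple keywords are combined with AND.
        if all(k in title for k in wanted):
            hits.append(article)
    return hits


# Hypothetical records; real dataset metadata carries many more fields.
articles = [
    {"title": "Coronavirus RNA replication in host cells", "url": "https://example.org/a"},
    {"title": "Influenza vaccine trial", "url": "https://example.org/b"},
    {"title": "Structure of a coronavirus spike protein", "url": "https://example.org/c"},
]

single = mine_titles(articles, ["coronavirus"])
multi = mine_titles(articles, ["coronavirus", "rna"])
print(len(single), "title matches for one keyword;", len(multi), "for two keywords")
```

restricting the match to titles keeps the scan cheap and the results precise, at the cost of missing articles that mention a keyword only in the abstract or body.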
from the homepage of the system, shown in figure- , we mined research articles using multiple keywords, i.e. "coronavirus" and "rna". this mining returns total results. country-wise statistics of the research articles are shown in figure- , and the abstract of one of the research articles is shown in figure- . our primary contribution is the design and development of a system that can automatically mine the literature, given keyword(s) as input. the system is able to generate a brief summary of the research article that the user is interested in. our ultimate objective is to provide a quick and easy literature mining service so that researchers working to fight the current covid- pandemic can get early-stage comprehension. currently, we use the covid- open research dataset (cord- ) , made publicly available by the allen institute for ai on th march . our system, deepmine, mines , research articles by keyword, scanning nearly , , , english words available in the literature dataset in no more than . seconds. from experiments performed on the live deepmine system, we observe that the system is helpful for the primary mining of the huge amount of literature available in various open datasets. however, in many of the research articles of the dataset the affiliation and/or location has not been provided. we are in the process of cleaning and improving the dataset with automatic and manual processes. currently, we are working on the collection of various research articles openly available from reputed publishers. afterward, the system will be upgraded with a wide range of research literature, especially from the biomedical field. work is also under way to improve the process of text summarization with the latest deep learning techniques, to provide more accurate and human-like summarization of research articles.
the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- a novel coronavirus from patients with pneumonia in china prevention: novel coronavirus ( -ncov), wuhan, china epidemiology, genetic recombination, and pathogenesis of coronaviruses host factors in coronavirus replication, in roles of host gene and non-coding rna expression in virus infection coronavirus pathogenesis, in advances in virus research summary of probable sars cases with onset of illness from middle east respiratory syndrome coronavirus (mers-cov standardized assessment of reading performance: the new international reading speed texts irest the automatic creation of literature abstracts a review on neural network based abstractive text summarization models text summarization with tensorflow covid- open research dataset key: cord- -i e fge authors: huang, kuan-ying a.; tan, tiong kit; chen, ting-hua; huang, chung-guei; harvey, ruth; hussain, saira; chen, cheng-pin; harding, adam; gilbert-jaramillo, javier; liu, xu; knight, michael; schimanski, lisa; shih, shin-ru; lin, yi-chun; cheng, chien-yu; cheng, shu-hsing; huang, yhu-chering; lin, tzou-yien; jan, jia-tsrong; ma, che; james, william; daniels, rodney s.; mccauley, john w.; rijal, pramila; townsend, alain r. title: breadth and function of antibody response to acute sars-cov- infection in humans date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: i e fge serological and plasmablast responses and plasmablast-derived igg monoclonal antibodies (mabs) have been analysed in three covid- patients with different clinical severities. potent humoral responses were detected within weeks of onset of illness in all patients and the serological titre was elicited soon after or concomitantly with peripheral plasmablast response. an average of . % and . % of plasmablast-derived mabs were reactive with virus spike glycoprotein or nucleocapsid, respectively. 
a subset of anti-spike ( of ) and over half of anti-nucleocapsid ( of ) antibodies cross-reacted with other betacoronaviruses tested and harboured extensive somatic mutations, indicative of an expansion of memory b cells upon sars-cov- infection. fourteen of anti-spike mabs, including five anti-rbd, three anti-non-rbd s and six anti-s , neutralised wild-type sars-cov- in independent assays. anti-rbd mabs were further grouped into four cross-inhibiting clusters, of which six antibodies from three separate clusters blocked the binding of rbd to ace and five were neutralising. all ace -blocking anti-rbd antibodies were isolated from two patients with prolonged fever, which is compatible with substantial ace -blocking response in their sera. at last, the identification of non-competing pairs of neutralising antibodies would offer potential templates for the development of prophylactic and therapeutic agents against sars-cov- . in late , a novel coronavirus emerged and was identified as the cause of a cluster of respiratory infection cases in wuhan, china. it spread quickly around the world. in march of a pandemic was declared by the world health organization, the virus was formally named as severe acute respiratory syndrome coronavirus (sars- cov- ) and the resulting disease was named covid- . as of october , there (table ) , suggesting the presence of conserved epitopes on the spike glycoproteins of betacoronaviruses. each of anti-spike glycoprotein mabs was encoded by a unique set of heavy chain vdj and light chain vj rearrangements in the variable domain (supplemental table ). oc virus and three of these also cross-reacted on mers (table ). all five cross- reactive anti-s antibodies had high rates of somatic mutation ( ± ), indicating a memory phenotype, and three of the five were neutralising to a moderate level (half maximal effective concentration, ec , - . nm, table ). the cdr length varied among anti-spike glycoprotein antibodies (supplemental table ). 
no significant differences were found between anti-s and anti-s or anti-rbd subsets. among anti-s mabs, a significantly longer heavy chain cdr length was found in the cross-reactive group compared to the specific group (cross-reactive ± versus specific ± , p= . , two-tailed mann-whitney test; figure c) , indicating that a long cdr may play a role in antigen binding, which is also found in several broadly reactive human mabs against human immunodeficiency virus and influenza virus ( , ). the binding activities of anti-rbd mabs were further characterised in detail. using mdck-siat cells transduced to express the rbd and flow cytometry, binding activities of the anti-rbd mabs were shown to vary with % binding concentration from . to . µg/ml (supplemental figure ). the mabs with strong anti-rbd binding have a relatively long heavy chain cdr length ( % binding concentration < . µg/ml versus > . µg/ml, p= . , two-tailed mann-whitney test; supplemental figure the anti-spike glycoprotein mabs were systematically examined by plaque reduction neutralisation (prnt) assay for neutralisation of wild type sars-cov- virus (see methods; summarised in table ). a total of neutralising antibodies distributed between different regions of the spike glycoprotein were identified: of to rbd, of to s (non-rbd), of to s . the ec concentrations, as a measure of potency, ranged from . to ~ nm ( ng/ml -~ µg/ml). (see methods): inhibition of virus replication was measured by quantitative pcr in the supernatant bathing the infected cells. these results corroborated that anti-rbd fd a, anti-rbd fi a, anti-rbd fd d, anti-rbd ey a and anti-s ew c, as crude culture supernatants, reduced the virus signal from ~ -to ~ , -fold (supplemental figure ) .
potent neutralising antibodies to the rbd of sars-cov- spike glycoprotein were identified and we thus analyse the blockade of the ace -rbd interaction by anti- rbd antibodies in two assays ( figure , table the structure of vhh -fc bound to rbd is known ( ) and its footprint on the rbd does not overlap that of ace , so inhibition is thought to occur by steric hindrance. in the second assay, we employed mdck-siat cells overexpressing full-length human ace as a transmembrane protein. unlabelled antibodies or ace -fc were mixed in excess with biotinylated rbd, and binding of rbd was detected with streptavidin-hrp in elisa (figure b ). the results of this assay mostly mirrored those of the first assay and confirmed that in this orientation anti-rbd neutralising antibodies fd a and fd d competed in excess with soluble rbd for binding to ace ( figure b ). in addition, anti-rbd neutralising antibody ey a competed with rbd for ace binding. the binding pattern of ey a is analogous to a previously described antibody cr (table ) ( ). these two antibodies are known to bind to the same region of rbd away from the ace binding site, but they influence the binding kinetics of rbd to ace , presumably through steric effects ( ). the ten anti-rbd mabs were then divided into cross-inhibiting groups as described for human mabs to ebola ( ) by assessing competition of unlabelled antibodies at -fold (or greater) excess over a biotin labelled target antibody by elisa. included as controls were the vhh -fc ( ) and h -h -fc ( ) the ten antibodies formed four cross-inhibiting clusters (table ) , represented by antibodies ey a (cluster , which included cr ), fi a (cluster , which included h -h ), fd a (cluster , which included s ) and fj b (cluster ). the strongest inhibitors of ace -fc binding were in clusters and (tables and ) . neutralising antibodies were detected in clusters , and , with the strongest antibodies fi a and fd a being in clusters and (tables and ) . 
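the grouping of antibodies into cross-inhibiting clusters described above can be sketched in code. this is an illustration only: the text does not state the rule used to assign clusters, so the sketch assumes that two antibodies share a cluster when either one inhibits binding of the other above a cut-off, and that clusters are the connected groups that result. the antibody names, inhibition fractions and the 0.5 threshold are all hypothetical and are not data from the study.

```python
from typing import Dict, List, Set, Tuple


def competition_clusters(
    inhibition: Dict[Tuple[str, str], float], threshold: float = 0.5
) -> List[Set[str]]:
    """Group antibodies linked by inhibition >= threshold in either direction.

    Keys are (unlabelled competitor, labelled target) pairs; values are the
    fraction of target binding that the competitor blocks.
    """
    names = sorted({name for pair in inhibition for name in pair})
    neighbours: Dict[str, Set[str]] = {name: set() for name in names}
    for (competitor, target), fraction in inhibition.items():
        if competitor != target and fraction >= threshold:
            neighbours[competitor].add(target)
            neighbours[target].add(competitor)

    clusters: List[Set[str]] = []
    seen: Set[str] = set()
    for start in names:
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from `start`.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        clusters.append(members)
    return clusters


# Hypothetical competition results (fractions of binding blocked).
inhibition = {
    ("mAb-A", "mAb-B"): 0.9,
    ("mAb-B", "mAb-A"): 0.8,
    ("mAb-A", "mAb-C"): 0.1,
    ("mAb-C", "mAb-D"): 0.7,
    ("mAb-E", "mAb-A"): 0.05,
}
clusters = competition_clusters(inhibition)
print(clusters)
```

antibodies in different clusters do not block one another in this model, which is the property sought when looking for non-competing pairs.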
table ) and did not cross-react strongly with other betacoronaviruses (table ) . fd a exhibits the most potent neutralising activity in the prnt assay and also completely inhibits sars-cov- -induced cytopathic effect (see methods) at . nm. thirteen mabs were defined that bound the s region and three, close to germline in sequence, were neutralising. fj c showed strong neutralisation (ec . nm), whilst fd e (ec nm) and fd e (ec nm) were moderately neutralising (table ) the mabs were evolved from clonal groups defined by their heavy chain vdj and light chain vj rearrangements (supplemental table the presence of pre-existing immune memory to betacoronavirus that cross-react with sars-cov- is supported by the accumulation of somatic mutations in the genes encoding cross-reactive antibodies isolated from covid- patients (figures c and d, supplemental tables and ). this situation is reminiscent of re-exposure to immunogenic epitopes shared by closely related viruses leading to induction of broadly cross-reactive antibodies in patients infected with influenza, dengue or zika viruses ( - ). the mabs that bound to the spike glycoprotein were systematically tested for neutralisation (summarised in table ). results established that neutralising epitopes were present on the rbd, s -ntd, s -non ntd/rbd, and s regions of the spike cd neg cd pos cd neg cd hi cd hi igg pos plasmablasts were gated and isolated in chamber as single cells as previously described ( ) . sorted single cells were used to produce human igg mabs as previously described ( ) confluent monolayers of vero e cells in -well plates were incubated with ~ plaque forming units (pfu) of sars cov- (hcov- /england/ / , epi_isl_ ) and antibodies in a -fold dilution series (triplicates) for hours at room temperature. inoculum was then removed, and cells were overlaid with plaque assay overlay. cells were incubated at °c, % co for hours prior to fixation with % paraformaldehyde at °c for minutes. 
fixed cells were then permeabilised with . % triton-x- and stained with a horseradish peroxidase conjugated-antibody against virus protein for hour at room temperature. tmb substrate was then added to visualise virus plaques as described previously for influenza virus ( ). convalescent serum from covid- patients was used as a control. in brief, this rapid, high-throughput assay determines the concentration of antibody that produces a % reduction in infectious focus-forming units of authentic sars- eagle's medium containing % fbs), two-fold serially diluted mabs in vgm starting at µg/ml were added to each duplicated well. the plates were immediately transferred to a bsl- laboratory and tcid sars-cov- (hcov- /taiwan/ / , epi_isl_ ) in vgm was added. the plates were further incubated at °c with % co for three days and the cytopathic morphology of the cells was recorded using an imagexpress nano automated cellular imaging system. competitive binding assays were performed as described previously ( ) two assays were used to determine the blocking of binding of ace to rbd by mabs. rbd was anchored on the plate in the first assay whereas ace was anchored for the second assay. the second ace blocking assay was performed as described previously ( , ) . b non-ntd s pos . . . . . . . ew b b non-ntd s -ve . . . . . . -ve fd d b ntd pos . . . . . . -ve fd c b non-ntd s pos . . . . . . -ve fd d b non-ntd s -ve . . . . . . -ve fd b b non-ntd s -ve . . . . . . -ve fd c b ntd pos . . . . . . -ve fg c a non-ntd s pos . . . . . . -ve fn c c non-ntd s -ve . . . . . . -ve fd e b non-ntd s pos . . . . . . -ve ew b b non-ntd s -ve . . . . . . 
-ve deployment of convalescent plasma for the prevention and treatment of covid- effect of convalescent plasma therapy on time to clinical improvement in patients with severe and life-threatening use of convalescent plasma therapy in sars patients in hong kong structure, function, and antigenicity of the sars-cov- spike breadth of concomitant immune responses prior to patient recovery: a case report of non-severe covid- neutralizing antibodies in patients with severe acute respiratory syndrome-associated coronavirus infection antibody responses to sars-cov- in patients with covid- serology characteristics of sars-cov- infection since exposure and post symptom onset cross-neutralization of influenza a viruses mediated by a single antibody loop structural insights on the role of antibodies in hiv- vaccine and therapy human monoclonal antibody combination against sars . huo, j. et al. neutralization of sars-cov- by destruction of the prefusion spike a highly conserved cryptic epitope in the receptor binding domains of neutralizing nanobodies bind sars-cov- spike rbd and block interaction with ace structural basis for the neutralization of sars-cov- by an antibody from a convalescent patient rugged nanoscaffold to enhance plug-and-display vaccination structural basis for potent neutralization of betacoronaviruses by single-domain camelid antibodies therapeutic monoclonal antibodies for ebola virus infection derived from vaccinated humans cross-neutralization of sars-cov- by a human monoclonal sars cov antibody protective humoral immunity in sars-cov- infected pediatric patients antibody responses to sars-cov- in patients of novel coronavirus disease serologic cross-reactivity of sars-cov- with endemic and seasonal an outbreak of human coronavirus oc infection and serological cross-reactivity with sars coronavirus recovery in tracheal organ cultures of novel viruses from patients with respiratory disease epidemiology of seasonal coronaviruses: establishing the context 
for the emergence of coronavirus disease human coronavirus oc associated with fatal development of a nucleocapsid-based human coronavirus immunoassay and estimates of individuals exposed to coronavirus in a u.s. metropolitan population the dominance of human coronavirus oc and nl infections in infants the human immune response to dengue virus is dominated by highly cross-reactive antibodies endowed with neutralizing and enhancing activity zika virus activates de novo and cross-reactive memory b cell responses in dengue-experienced donors broadly cross-reactive antibodies dominate the human b cell response against pandemic h n influenza virus infection receptor-binding domain of severe convergent antibody responses to sars-cov- in convalescent individuals potent neutralizing antibodies against sars-cov- identified by single-cell sequencing of convalescent patients' b cells potent neutralizing antibodies against multiple epitopes on sars-cov- spike a noncompeting pair of human neutralizing antibodies block covid- virus binding to its receptor ace potent neutralizing antibodies from covid- patients define multiple targets of vulnerability studies in humanized mice and convalescent humans yield a sars cov- antibody cocktail isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model human neutralizing antibodies elicited by sars-cov- infection a human monoclonal antibody blocking sars-cov- infection human monoclonal antibodies block the binding of sars-cov- spike protein to angiotensin converting enzyme receptor receptor-binding domain as a target for developing sars vaccines the sars-cov- receptor-binding domain elicits a potent neutralizing response without antibody-dependent enhancement a vaccine targeting the rbd of the s protein of sars-cov- induces protective immunity structural basis of receptor recognition by sars-cov- structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structural and 
functional basis of sars-cov- entry by using sars-cov- neutralizing antibody structures inform therapeutic strategies complete mapping of mutations to the sars-cov- spike receptor-binding domain that escape antibody recognition antibody cocktail to sars-cov- spike protein prevents rapid mutational escape seen with individual antibodies a neutralizing human antibody binds to the n-terminal domain of the structure-function analysis of neutralizing antibodies to h n influenza from naturally infected humans optimisation of a micro-neutralisation assay and its application in antigenic characterisation of influenza viruses. influenza other respir viruses isolation and rapid sharing of the novel coronavirus from the first patient diagnosed with covid- in australia the data are presented as specificity, number of antibodies, and the percentage of total antibodies isolated from each patient. (b) the binding activity of anti-sars-cov- mabs with spike glycoprotein, rbd and the s subunit in elisa. anti-influenza h mab bs- a and anti-sars rbd cr were included as controls. each experiment was repeated twice. the od values are presented as mean ± standard error of the mean. panels (c) and (d) show numbers of variable domain mutations in mab genes and variation antibodies that strongly cross-react with at least one betacoronavirus (sars or mers or oc ) were defined as cross-reactive mabs. cdr length and mutation numbers are presented as mean ± standard error of the mean (anti-s , specific reactive, n= ; anti-n, specific, n= versus cross-reactive, n= ). the two-tailed test was performed to compare the mutations between two groups d, =day ; ns, non-significant hinge and fc region of human igg and ace -fc were included as controls the rbd was colored in green. the epitopes recognized by ey a, cr and vhh (cluster mab) ( , , ) were colored in magenta. the epitopes recognized by ace and h -h (cluster mab) ( ) were overlapping and colored in blue and light blue. 
the epitopes recognized by s convalescent sera were analysed in the ace -blocking (ace anchored) assay anti-rbd antibody fd a and anti- influenza h antibody bs a were included as controls. data are presented as mean ± standard error of the mean ace -blocking activity of anti-rbd antibody compared to ace -fc (see methods): +, partial; ++ abbreviations: ifa, immunofluorescence; rbd, receptor-binding domain; prnt, plaque reduction neutralisation assay key: cord- -iihgc nr authors: cavallo, luigi; oliva, romina title: d y and other mutations in the fusion core of the sars-cov- spike protein heptad repeat undermine the post-fusion assembly date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: iihgc nr the iconic “red crown” of the severe acute respiratory syndrome coronavirus (sars-cov- ) is made of its spike (s) glycoprotein. the s protein is the trojan horse of coronaviruses, mediating their entry into the host cells. while sars-cov- was becoming a global threat, scientists have been accumulating data on the virus at an impressive pace, both in terms of genomic sequences and of three-dimensional structures. on april st, the gisaid resource had collected , sars-cov- genomic sequences. we extracted from them all the complete s protein sequences and identified point mutations thereof. six mutations were located on a -residue segment ( - ) in the “fusion core” of the heptad repeat (hr ). our modeling in the pre- and post-fusion s protein conformations revealed, for three of them, the loss of interactions stabilizing the post-fusion assembly. on may th, the sars-cov- genomic sequences in gisaid were , . an analysis of the occurrences of the hr mutations in this updated dataset revealed a significant increase for the s i and s f mutations and a dramatic increase for the d y mutation, which was particularly widespread in sweden and wales/england. 
we note that this is also the mutation causing the loss of a strong inter-monomer interaction, the d -r salt bridge, thus clearly weakening the post-fusion assembly.

coronavirus disease is caused by the severe acute respiratory syndrome coronavirus (sars-cov- ). sars-cov- is a novel virus belonging to the β genus coronaviruses, which also includes two highly pathogenic human viruses identified in the last two decades, the severe acute respiratory syndrome coronavirus (sars-cov) and the middle east respiratory syndrome coronavirus (mers-cov) ( ) ( ) ( ) . coronaviruses are named after the protruding spike (s) glycoproteins on their envelope, giving a crown (corona in latin) shape to the virions ( ) . of the four structural proteins of coronaviruses, s, envelope (e), membrane (m), and nucleocapsid (n), the s protein is the one playing a key role in mediating the viral entry into the host cells ( ) ( ) ( ) , making it one of the main targets for the development of therapeutic drugs and vaccines ( ) ( ) ( ) ( ) ( ) ( ) ( ) . composed of two functional subunits, s and s , it first binds to a host receptor through the receptor-binding domain (rbd) in the s subunit and then fuses the viral and host membranes through the s subunit ( , ) . in the pre-fusion conformation, the sars-cov- s protein forms homotrimers protruding from the viral surface, where its rbd binds to the angiotensin-converting enzyme (ace ) receptor on the host cell surface ( ) (like the sars-cov homolog ( ) , and unlike mers-cov s, which recognizes a different receptor, the dipeptidyl peptidase ( ) ). receptor binding and proteolytic processing by cellular proteases then cause s to dissociate and s to undergo large-scale conformational changes towards a stable structure, bringing viral and cellular membranes into close proximity for fusion and infection ( , , ) .
while the outbreak of covid- was rapidly spreading all over the world, affecting millions of people and becoming a global threat, laboratories worldwide promptly started to sequence a large number of sars-cov- genomes. all the available genomic data are accessible through the global initiative on sharing all influenza data (gisaid) website, an invaluable open access resource ( , ) . simultaneously, crucial structural knowledge has been gained on sars-cov- , especially regarding the s protein. d structures are now available from the protein data bank (pdb) ( ) for the sars-cov- s protein in the pre-fusion conformation, also bound to the ace receptor ( ) ( ) ( ) ( ) ( ) ( ) ( ) , and for the core of its s subunit in the post-fusion conformation ( ) . on april st , months after the first sequencing ( ), , genomic sequences of sars-cov- were available from gisaid. therefore, we considered the time ripe for an assessment of the mutational spectrum of the sars-cov- spike protein. to this aim, we extracted all the complete s protein sequences from the gisaid st april dataset and identified all the mutations occurring in at least identical sequences (see table s ). from this analysis, a -amino acid segment in the fusion core of the heptad repeat (hr ) emerged as a hotspot for mutations. while the mutations we identified corresponded to a mutation every positions along the protein sequence, as many as amino acids were found to be mutated in the above -amino acid segment: s , d , l , s , s and s . after the proteolytic processing, in the post-fusion conformation, the s protein hr and hr motifs interact with each other to form a six-helix bundle ( -hb), which promotes initiation of fusion between the viral and cellular membranes. the hr "fusion core" is named for the many interactions it forms with hr in the post-fusion conformation, which give it a key role in the virus infectivity ( ) .
based on the structural location of the above highly concentrated mutations and on their non-conservative nature, we considered them of particular interest and decided to further investigate their structural basis, both in the pre- and post-fusion conformation, as well as their sequencing dates and geographical distribution. as we show in the following, as many as three of them are responsible for the loss of inter-monomer h-bonds in the post-fusion conformation, while one of them, s p, would introduce unexpected structural strain in the pre-fusion conformation. a search in the gisaid resource updated to may th showed a significant increase in occurrences especially for one mutant, d y, unreported to date, which has become a common variant in some european countries, especially sweden. it is also the mutant having the most significant structural role, causing the loss of an inter-monomer salt bridge in the post-fusion assembly.

we downloaded the , genomic sequences available from gisaid on april st . from these sequences, we extracted the nucleotide sequences of the spike protein and translated them to protein sequences with in-house scripts. nucleotide sequences featuring an internal stop codon, having at least one undefined ("n") nucleotide or resulting in spike proteins of length different from , amino acids were discarded. sequences annotated as pangolin, bat or canine were also discarded. the remaining , protein sequences were further analysed. first, we clustered them in sets of identical sequences with cd-hit ( ), obtaining clusters of at least sequences and unique sequences. as a reference system for further analyses, we used the first dated (on december th ) genomic sequence in gisaid, isolated and sequenced in wuhan (hubei, china) ( ) . then, upon alignment to the reference sequence, we identified point mutations in all the sets of at least two sequences.
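the filtering and mutation-calling steps described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' in-house scripts: the function names are hypothetical, exact-identity grouping with a counter stands in for cd-hit, sequences are assumed to be already aligned to the reference without gaps, and any codon containing a non-acgt character is discarded (a stricter rule than the "n" filter stated in the text). the expected protein length is passed in as a parameter.

```python
from collections import Counter

# Standard genetic code in the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_spike(nt_seq, expected_len):
    """Translate a spike coding sequence; return None if it must be discarded."""
    nt = nt_seq.upper()
    if "N" in nt or len(nt) % 3 != 0:
        return None  # undefined nucleotide or incomplete codon
    protein = "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3))
    if protein.endswith("*"):
        protein = protein[:-1]  # drop the terminal stop codon
    if "*" in protein or "X" in protein or len(protein) != expected_len:
        return None  # internal stop, ambiguous codon, or wrong length
    return protein


def point_mutations(reference, protein):
    """List substitutions such as 'D3Y' between two equal-length sequences."""
    return [f"{r}{pos}{p}"
            for pos, (r, p) in enumerate(zip(reference, protein), start=1)
            if r != p]


def recurrent_mutations(nt_seqs, reference, expected_len, min_count=2):
    """Count mutations found in sets of at least min_count identical proteins."""
    proteins = (translate_spike(s, expected_len) for s in nt_seqs)
    clusters = Counter(p for p in proteins if p is not None)
    counts = Counter()
    for protein, size in clusters.items():
        if size >= min_count:
            for mut in point_mutations(reference, protein):
                counts[mut] += size
    return dict(counts)
```

with a toy four-residue reference, two identical copies of a sequence carrying one substitution are reported, while singletons and sequences with an "n" or an internal stop codon are dropped.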
we again downloaded the , genomic sequences available from gisaid on may th (gisaid_hcov- _ _ _ _ ) and followed the above pipeline to extract , complete -residue long s protein sequences. we then recorded the presence and frequency in them of any mutation occurring in the fusion core of the hr (residues - ) with in-house scripts.

d models of the mutants were built using the mutate_model module of the modeller v program ( ) . this is an automated method for modelling point mutations in protein structures, which includes an optimisation procedure of the mutated residue in its environment, beginning with a conjugate gradients minimisation, continuing with molecular dynamics with simulated annealing and finishing again with conjugate gradients. the force field used is charmm- ; for details see reference ( ) . models for mutants in the pre-fusion conformation were built starting from the em structure of the pre-fusion trimeric conformation (pdb id: vsb, resolution . Å, ( )). models for mutants in the post-fusion conformation were built starting from the x-ray structure of the s subunit fusion core (pdb id: lxt, resolution . Å, ( )). molecular models were analysed and visually inspected with pymol ( ) . the cocomaps web server ( ) was used to analyse the inter-chain contacts and h-bonds as well as the solvent accessibility of the residues.

we downloaded all the sars-cov- genomic sequences from the gisaid resource on april st , extracted from them , complete s protein sequences and identified all the point mutations occurring in at least two identical sequences (see methods). the mutations we identified, occurring at different positions spread all over the protein sequence, are reported in table s , with the relative number of occurrences. while the mutations we identified were spaced on average positions along the protein sequence, a segment of amino acids harboured mutations, at positions , , , , and , marking it as a mutational hotspot.
this sequence segment is part of the "fusion core" of the hr , in the protein s subunit. the hr of coronavirus s proteins undergoes one of the most notable rearrangements within the protein between the pre- and post-fusion conformations. in the post-fusion conformation, in fact, the pre-fusion multiple helices and intervening regions refold into a single continuous helix ( figure ). as already mentioned, three of these long helices then form a hb with three hr helical motifs ( , , ) . the hr and its "fusion core" in particular thus play a crucial role in the virus infectivity. the following mutations were identified in the fusion core of the hr on april st : s i, d y, l f, s f, s f, s p. two of these mutations, d y and s p, were among the most frequent in the ensemble of mutations we identified. besides the widespread d g, now dominant over the original d variant ( , ) , only other mutations (two of them being very peripheral, l f and p l) indeed recurred in ≥ sequences (see table s ). s p was also reported in ( ) , where it was hypothesized to be spreading via recombination. the l f mutation was a particularly late one; it was found in sequences, associated with the d g mutation, both from england and dated march th . the s i mutation was found in sequences from usa (washington), dated march th and th , associated with the d g mutation. finally, the s f mutation had a unique geographical distribution, as it was found in sequences from australia (new south wales) dated february th and march th , not associated with the d g mutation. in addition, it was found in a single sequence from france, dated march th , where it was associated with the d g mutation. in conclusion, with the exception of s f, which was found in australia, all the mutations in the hr fusion core were found on two continents, europe and/or north america. furthermore, most of them originated from the d g variant.
this agrees with their appearing to be rather late mutations, first sequenced between the end of february and march , i.e. over two months after the first wuhan variant dated december th ( ) ( table ).

the ≈ -fold increment in frequency of the s i and s f mutants was in line with the increment of the sequences in the dataset. the three additional occurrences of the s i mutation were from usa (washington), wales and england. a novel s t mutation was also reported twice from scotland. additional occurrences of the s f mutation were instead from usa, , england, , kazakhstan, , and uae, . as for the l f and s f mutants, their increment was significantly lower than the increment of the sequences in the dataset. positive selection has thus not emerged to date for these mutations. the only additional occurrence of l f was from denmark, while the additional occurrences of s f were from france and usa (washington). the s p mutation represented a special case. most of the sequences harbouring this mutation were indeed modified between the april st and the may th datasets, so that they no longer feature the mutation to proline. however, novel occurrences of the same mutation, s p, emerged from china (beijing). in addition, sequences from scotland presented the novel s i mutation. as for the remaining positions of the hr fusion core, to may th , either they were fully conserved (s , k , a , a , l , g , k and q ), or they hosted one single occurrence of mutation (to valine for a , to aspartate for i and g , to histidine for q and d , to alanine for t and to arginine for l ). because of the rarity of such mutations, we will not discuss them here. however, we will continue to monitor them over time.
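the comparison made above, between the increase in occurrences of a mutation and the increase in the size of the dataset, can be expressed as a ratio of frequencies. the sketch below is only an illustration of that arithmetic: the function names are hypothetical and the counts in the usage note are invented placeholders, not values from gisaid.

```python
def fold_increase(old_count, new_count):
    """Fold change of a count between two dataset snapshots."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return new_count / old_count


def relative_enrichment(mut_old, mut_new, n_old, n_new):
    """Frequency of a mutation in the new snapshot over its old frequency.

    A value near 1 means the mutation grew in line with the dataset,
    above 1 that it grew faster, below 1 that it grew more slowly.
    """
    return fold_increase(mut_old, mut_new) / fold_increase(n_old, n_new)
```

for example, a mutation going from 10 to 40 occurrences while the dataset doubles from 1000 to 2000 sequences gives `relative_enrichment(10, 40, 1000, 2000)` close to 2, i.e. its frequency doubled. a ratio alone does not establish selection, since sampling differs between countries and dates.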
all the amino acids undergoing mutations in the sars-cov- s protein are conserved in the bat coronavirus ratg s protein (sharing an overall sequence identity of % with sars-cov- s protein), while as many as five of them are mutated in the sars-cov- s protein (overall % sequence identical to the sars-cov- homolog) (see figure ). four of these substitutions are conservative (aspartate to glutamate, serine to threonine); the exception is s , which is a lysine in sars-cov. it has been proposed that such mutations in the sars-cov- hr may be associated with enhanced interactions with the hr , further stabilizing the -hb structure and maybe leading to increased infectivity of the virus ( ) . it is noteworthy that none of the point mutations we identified restored the corresponding sars-cov amino acid. in the pre-fusion conformation, all the mutated positions, except s , are located on the second of four non-coaxial helical segments composing the hr (figure ). four of them, s , d , s and s , are exposed to the solvent (table ) , and can be modelled as larger (mostly aromatic) residues without causing any structural strain (see figure ). these mutations are not expected to cause relevant changes in the pre-fusion structure, although they could have a destabilizing effect as a consequence of placing large aromatic residues in direct contact with the solvent instead of smaller apolar (leucine), polar (serine in cases) or even charged (aspartate) residues. in addition, the side-chain of s forms an h-bond with the main-chain of d , residues upstream. the loss of this h-bond in the s f mutant also points to a slight destabilization of the pre-fusion conformation. as for l , it is buried in the pre-fusion conformation, pointing towards a three-stranded anti-parallel β-sheet made of the s -p segment from the s subunit and of the y -p and g -a segments from the s subunit, without directly contacting it (distances above Å).
it can also be modelled as a large phenylalanine without causing steric strain. upon mutation, it seems to optimize the hydrophobic interactions with the neighboring residues, especially i and a . finally, s is located on a turn immediately downstream of the helical segment hosting the above five mutations, between the second and third helical segments. the wild-type residue s features φ and ψ dihedral angles of . ° and . °, respectively, which fall in an unfavourable region for prolines. in the s p model we generated, the p φ and ψ dihedrals assume the values of . ° and . °, placing the residue in an outlier region ( ). the favoured φ angle for prolines is indeed restricted to the value of - ± °, ( ) characteristic of α-helices. a proline at such a position would therefore introduce an anomaly in the pre-fusion conformation, while strongly promoting the transition to the post-fusion single continuous helical conformation. it is also worth noting that this would be the only mutation, among those we have identified so far, introducing a proline residue in the sars-cov- s protein (table s ). in light of the analysis of the gisaid may th update, we also modelled the s i mutation. since isoleucine is compatible with the s dihedral values, this mutation does not result in any structural strain. when looking at the post-fusion conformation of the sars-cov- spike protein s subunit, these mutations appear more revealing. three of the wild-type residues, s , d and s , are indeed engaged in side-chain to side-chain h-bonds with the hr segment of an adjacent monomer. in particular, s , d and s (hr on chain a) are h-bonded to s , r and e , respectively (hr on chain c, figure ). these are all strong h-bonds, especially the one between s and e , involving a negatively charged residue, and the one between d and r , being actually a salt bridge (estimated to contribute an additional - kcal/mol to the free energy of protein stability as compared to a neutral h-bond ( )).
all three of these h-bonds are lost upon mutation, which points to a weakening of the post-fusion assembly. of the remaining three mutations, s f is completely exposed to the solvent and therefore, as in the pre-fusion conformation, expected to act unfavourably on the protein solvation energy. on the contrary, in the case of l f and s f, which are substantially buried within the structure, mutation to a large aromatic phenylalanine seems even to optimize the network of hydrophobic interactions; in the case of f , with the aliphatic parts of the side-chains of e and r on an adjacent monomer, and, in the case of f , with v and a on the same monomer and with other f residues on both the adjacent monomers. when comparing the effect of the mutations on the pre- and post-fusion structures, it emerges that the s i, d y and s i mutations strongly destabilize the post-fusion conformation, while having a marginal impact on the stability of the pre-fusion one. on the contrary, s f seems to favour the post-fusion conformation over the pre-fusion one. as for s f and s f, they seem to have a comparable effect on both the conformations, slightly stabilizing and destabilizing, respectively. finally, the s p mutation would strongly destabilize both the pre- and post-fusion conformations.

based on a thorough analysis of the s protein sequences that we extracted from the genomic sequences of sars-cov- reported in gisaid on april st , we identified the fusion core of the hr as a mutational hotspot. the d y and s p mutations were the most numerous, being among the most frequently occurring mutations overall at the time. other, less frequent, mutations were s f and then s i, l f and s f. overall, such mutations appeared to be late ones, emerging from the end of february or even mid march , and were mainly localized in europe and usa.
based on their frequency, on their location in a protein region playing a key role in the post-fusion conformation and also on the non-conservative nature of the mutations themselves, we decided to further investigate the structural basis of such mutations, finding that they all can play a role in tuning the stability of the pre- and/or post-fusion s protein conformation. other potentially interesting mutations are s i and s f, whose number of occurrences underwent a ≈ / -fold increase. on the other hand, the increment in the occurrence of l f and s f was marginal, placing less emphasis on such mutations, which will nonetheless be useful to continue monitoring. finally, the s p mutation, although still reported in a few cases, underwent a dramatic reduction of occurrences, due to modification of the original sequences where it was first reported. at the same time, a s i mutation emerged, which will also be worth monitoring. we recall here that a proline at position would cause a significant destabilization of the s protein pre-fusion conformation. it is also worth noting that the mutations significantly increasing their frequency over time, d y and s i, were also those that, together with s p/i, caused the loss of an inter-monomer h-bond in the post-fusion conformation of the protein. interestingly, the now emerging s i mutation has the same effect without destabilizing the pre-fusion conformation. the most frequently occurring mutation in the hr "fusion core", common in sweden and uk on may th , is also the one causing the loss of a strong inter-monomer salt bridge. our structural analyses provide a rationale for such mutations, pointing to a weakening of the post-fusion assembly. however, only experiments on cellular systems will clarify whether this may be a virus strategy for reducing its membrane fusion capacity, thus lowering its virulence.
we gratefully acknowledge all the authors from the originating laboratories responsible for obtaining the specimens and the submitting laboratories where genetic sequence data were generated and shared via the gisaid initiative, on which this research is based. table . solvent accessibility of mutated residues in the pre-and post-fusion conformations. amino acid pre-fusion post-fusion i exposed partly buried ( . %) a y exposed partly buried ( . %) f buried ( . %) buried ( . %) f exposed exposed f exposed buried ( . %) table s . list of mutations identified in gisaid in at least identical sequences on april st , in sequential order. a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china clinical features of patients infected with novel coronavirus in wuhan receptor recognition mechanisms of coronaviruses: a decade of structural studies activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites host cell entry of middle east respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein cell entry mechanisms of sars-cov- sars-cov- spike protein: an optimal immunological target for vaccines the spike protein of sars-cov--a target for vaccine and therapeutic development key residues of the receptor binding motif in the spike protein of sars-cov- that interact with ace and neutralizing antibodies potent binding of novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody ) a human monoclonal antibody blocking sars-cov- infection candidate drugs against sars-cov- and covid- sars-cov- vaccines: status report. immunity ready, set, fuse! 
the coronavirus spike protein and acquisition of fusion competence structure of sars coronavirus spike receptor-binding domain complexed with receptor structure of mers-cov spike receptor-binding domain complexed with human receptor dpp tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion ) data, disease and diplomacy: gisaid's innovative contribution to global health gisaid: global initiative on sharing all influenza data -from vision to reality the protein data bank cryo-em structure of the -ncov spike in the prefusion conformation ) structure, function, and antigenicity of the sars-cov- spike glycoprotein structure of the sars-cov- spike receptorbinding domain bound to the ace receptor structural basis for the recognition of sars-cov- by full-length human ace structural and functional basis of sars-cov- entry by using human ace ) structural basis of receptor recognition by sars-cov- a highly conserved cryptic epitope in the receptor binding domains of sars-cov- and sars-cov inhibition of sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion identification of a novel coronavirus causing severe pneumonia in human: a descriptive study fusion mechanism of -ncov and fusion inhibitors targeting hr domain in spike protein cd-hit: accelerated for clustering the next-generation sequencing data comparative protein modelling by satisfaction of spatial restraints modeling mutations in protein structures cocomaps: a web application to analyze and visualize contacts at the interface of biomolecular complexes could the d g substitution in the sars-cov- spike (s) protein be associated with higher covid- mortality? spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- . 
biorxiv structure validation by calpha geometry: phi,psi and cbeta deviation influence of proline residues on protein conformation ph-induced denaturation of proteins: a single salt bridge contributes - kcal/mol to the free energy of folding of t lysozyme authors declare no competing interests. key: cord- - at gelt authors: han, namshik; hwang, woochang; tzelepis, konstantinos; schmerer, patrick; yankova, eliza; macmahon, méabh; lei, winnie; katritsis, nicholas m; liu, anika; schuldt, alison; harris, rebecca; chapman, kathryn; mccaughan, frank; weber, friedemann; kouzarides, tony title: identification of sars-cov- induced pathways reveal drug repurposing strategies date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: at gelt the global outbreak of sars-cov- necessitates the rapid development of new therapies against covid- infection. here, we present the identification of approved drugs, appropriate for repurposing against covid- . we constructed a sars-cov- -induced protein (sip) network, based on disease signatures defined by covid- multi-omic datasets(bojkova et al., ; gordon et al., ), and cross-examined these pathways against approved drugs. this analysis identified drugs predicted to target sars-cov- -induced pathways, of which are already in covid- clinical trials(clinicaltrials.gov, ) testifying to the validity of the approach. using artificial neural network analysis we classified these drugs into distinct pathways, within two overarching mechanisms of action (moas): viral replication ( ) and immune response ( ). a subset of drugs implicated in viral replication were tested in cellular assays and two (proguanil and sulfasalazine) were shown to inhibit replication. this unbiased and validated analysis opens new avenues for the rapid repurposing of approved drugs into clinical trials. 
to date, the majority of small molecule and antibody approaches for treating sars-cov- related pathology are rightly rooted in repurposing and are focused on several key virus or host targets, or on pathways as points for therapeutic intervention and treatment. this has been underpinned by the unprecedented pace of scientific research to uncover the molecular bases of virus structure, and the mechanisms by which it gains access to cells before replication and release of new virus particles. the emergence of global proteomics datasets is now propelling our understanding of these mechanisms through which the virus interacts with host cell proteins, determining the directly interacting proteins (dip) (gordon et al., ) and differentially expressed proteins (dep) . such interactome outputs, to understand the disease mechanism of covid- , we investigated which biological processes sip sub-networks are implicated in, for sars-cov- proteins ( structural proteins, non-structural proteins and accessory factors of the virus genome). we analysed several parameters: ( ) the subcellular localization of the proteins; ( ) the differences between the h and h timepoints (figures b and c) ; and ( ) the biological processes that the proteins act in. we found significantly stronger relevance of rna metabolism at h. we observe that the viral proteins n (nucleocapsid), nsp (non-structural protein ), orf (open reading frame ) and orf of sars-cov- interact with ribosomal proteins in the hidden layer of our sip network, indicating that they may have a possible influence on rna metabolism. the n and nsp proteins are known to drive viral replication (gordon et al., ) . more interestingly, orf and are the only two proteins of sars-cov- that are distinct from other coronaviruses (tang et al., ) . 
we observed that orf was enriched in the endoplasmic reticulum (er) ( figure c ), which may be significant as the er is the intracellular niche for viral replication and assembly (romero-brey and bartenschlager, ) . there were key proteins in the hidden layer that did not have strong enrichment in known biological pathways ('other') and that actively interacted with 'virus replication' proteins at h. further study on the unknown proteins found individual links to rna binding (atp a , mrto and nhp l ), host-virus interaction (ace , cxcr , derl , gnb l , hspd , kdr, krt , sirt and tmprss ), histones (h afz, hist h ps and wdtc ), viral mrna translation (mrps ) and er-associated responses (atf , cftr, derl and ins). we next confirmed statistically that virus-related pathways are enriched in the top enriched go-terms as well as rna-and er-related processes ( figure d ). the differences between two time points were also confirmed. in summary, our pattern analysis in the sip sub-networks revealed biological pathway changes during the course of infection, with prominent increases in proteins involved in virus replication by h. an in silico drug simulation on the key pathway of sip network identifies drug candidates to identify drugs targeting the key pathways, we conducted a network-based in-silico drug efficacy simulation (guney et al., ) on the key pathways of sip network at h and h after infection. we collected , approved drugs from publicly available databases (chembl (mendez et al., ) and drugbank (wishart et al., ) ). this virtual screening identified drugs (table s ) that are predicted to target the key pathways of sip network, of which ( . %) were specific to the h timepoint, ( %) were specific to the h timepoint and ( . %) were common to both timepoints. we then checked the anatomical therapeutic chemical (atc) code (available for drugs only) to determine the therapeutic areas for which specific drugs have been developed. 
the top clinical areas against which these approved drugs are used for were cancer, sex hormone signalling, diabetes, immune system, bacterial disease and inflammatory/rheumatic disease ( figure s ). interestingly, % of the drugs have been tested in phase or clinical trials for infectious diseases, and half of these were hiv trials; further drugs have been tested in trials for inflammatory ( %) and respiratory ( %) disease. among the identified drugs, ( %) are now in covid- clinical trials (clinicaltrials.gov, ) (table s ) . to determine the significance of this finding, we asked what the likelihood would be of this number of drugs being identified as hits by chance and found that, by comparison, only of , drugs were in the covid- clinical trials (clinicaltrials.gov, ) . a hypergeometric test returned a p-value of . e- , demonstrating the reliability and validity of our computational approaches. further drugs that we identified have also been reported as being potential candidates against covid- (courtney j. mycroft-west, dunhao su , yong li , scott e. guimond, timothy r. rudd and elli, gavin miller, quentin m. nunes, patricia procter, antonella bisio, nicholas r. forsyth, jeremy e. turnbull, marco guerrini, david g. fernig, edwin a. yates, ; criado et al., ; kuleshov et al., ; kumar, arun; c.s, sharanya; j, abhithaj; miribrahim sajid, javeria tariq, sheharbano awais, zehranaseem, samirashabbir balouch, ; shin et al., ) . thus, drug efficacy simulation has revealed drugs in total that are either in covid- clinical trials or being considered as potential drug candidates in pre-clinical studies, supporting the strength of our approach. in addition, we have discovered additional drugs that could provide novel opportunities for repurposing as covid- therapeutics. the full list of approved drugs along with their safety profile and moa are shown in table s . 
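The hypergeometric test mentioned above asks how likely it is to see at least the observed overlap between the predicted drugs and the drugs in clinical trials if the predicted drugs had been drawn at random from all approved drugs. A minimal sketch follows, assuming a one-sided upper-tail test; the function name `hypergeom_sf` and all counts are hypothetical placeholders and are not the study's figures.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Upper-tail probability P(X >= k) for sampling without replacement.

    population: all approved drugs screened
    successes:  approved drugs that are in clinical trials
    draws:      drugs flagged by the simulation
    k:          flagged drugs that are also in clinical trials
    """
    lower = max(k, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lower, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not the study's numbers).
n_approved = 1000       # approved drugs screened
n_in_trials = 50        # of those, drugs already in trials
n_hits = 100            # drugs flagged by the simulation
n_hits_in_trials = 15   # flagged drugs that are also in trials

p_value = hypergeom_sf(n_hits_in_trials, n_approved, n_in_trials, n_hits)
```

With these placeholder counts the expected overlap by chance is 5 drugs, so an observed overlap of 15 gives a small p-value; the same calculation applies to any set of counts.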
to investigate the mechanism of action (moa) for the drugs in the context of covid- , we used self-organizing map (som), a type of artificial neural network, to analyse the relationship between the drugs and the key pathways (termed drug-pathway association). after the unsupervised training of som, the distance between the adjacent neurons (pathways) was calculated and presented in different coloured hexagons, which illustrates the probability density distribution of data vectors (drug-pathway association score) (vesanto and alhoniemi, ) ( figures a and s ). based on the distance, we applied the davies-bouldin (db) index to separate the key pathways into clusters ( figure b ). these clusters of pathways and drugs identified two moa categories: ( ) virus replication (vr) and ( ) immune response (ir) ( figure c ). the som also mapped drugs into each neuron (the number of drugs per neuron is shown in figure d and drug names are shown in figure e ). notably, out of the drugs that are in covid- clinical trials (clinicaltrials.gov, ) were in the vr moa category while only drugs were in the ir ( figure d ). finally, we identified mechanistic roles and connections for the drugs and their target proteins, and mapping the drugs into pathway clusters ( figure e ). a more extensive analysis of information about each drug is given in table s . we next sought to identify the precise proteins, within the sip network, targeted by each of the drugs. we found that of the , proteins targeted by the drugs, most ( %) are targeted by a single drug ( figure s a ). however, there are proteins ( . %) that are targeted by or more drugs ( figure s a ). to establish whether there is a pathway relationship between these proteins, we interrogated their molecular function. 
figure s b shows that the most enriched categories of function for these proteins were heme, microsome, oxidoreductase and monooxygenase, all of which are related to nicotinamide adenine dinucleotide phosphate (nadp) and nitric oxide (no) synthesis. as no is important for viral synthesis (and because nadp affects no production), this could provide a potential mechanism by which these drugs might alter viral infection (kwiecien et al., ; lind et al., ; wang et al., ) . based on these findings we decided to validate in cellular assays, five drugs (ademetionine, alogliptin, flucytosine, proguanil and sulfasalazine) with good safety profiles which are functioning within this pathway. to assess whether these five drugs are able to reduce sars-cov- infection, we performed an initial screening using the vero e cell line, where we observed that of the drugs, proguanil and sulfasalazine, showed significant antiviral effects without any noticeable cellular toxicity at the indicated doses ( figures a and s a ). we then focused on these two drugs, expanding our validation using different cellular models (vero e and calu- ). treatment of vero e and calu- cells with proguanil and sulfasalazine illustrated strong anti-sars-cov- effects (represented by reductions of the envelope and nucleocapsid gene rnas) in a dose dependent manner, mirroring the results of the initial screen (figures b-e, s b-e). importantly, no significant effect on cellular viability was observed at any tested dose ( figures s f-h) . the effective concentration of sulfasalazine is comparable to maximal plasma concentrations achieved routinely in patients with rheumatoid arthritis or inflammatory bowel disease(iarc working group on the evaluation of carcinogenic risk to humans, ). to further investigate the anti-sars-cov- impact of these two drugs, we examined the status of recently discovered intracellular pathways directly associated with sars-cov- infection and cytokine production . 
indeed, treatment with either proguanil or sulfasalazine significantly reduced the phosphorylation of mapkapk (p-mk , t ) ( figure f ), an important component of the p /mitogen-activated protein kinase (mapk) signalling pathway, which has been shown to be activated via sars-cov- infection and stimulate cytokine response . importantly, treatment of calu- and vero e cell lines with proguanil and sulfasalazine led to a significant downregulation of the mrna of key cytokines (figures g-j and s ), which are dictated by the p /mapk signalling pathway and shown to become elevated during sars-cov- infection and replication (cxcl , ifnb and tnf-a). hence, the above results solidify the promising anti-sars-cov- effects of the two drugs, both at the viral as well as the molecular level. to understand why sulfasalazine and proguanil are effective against sars-cov- infection, but others functioning in the same pathway were not ( figure a ), we looked more closely at the targets of each drug. figure shows that sars-cov- orf binds to gammaglutamyl hydrolase (ggh) and regulates the synthesis of no, which is necessary for viral synthesis. an additional auxiliary pathway, mediating the synthesis of nadp, can also affect no production, although indirectly. sulfasalazine and proguanil impinge on both of these pathways: sulfasalazine targets the nfkb inhibitors nfkbia and ikbkb as well as cyp enzymes, whereas proguanil targets dhfr and cyp enzymes plus interacting partners. in this way these two drugs might more effectively target no production and thus disrupt viral replication. by contrast, the three drugs that were not effective against sars-cov- infection (flucytosine, alogliptin and ademetionine) only affect one of the two pathways. this analysis thereby highlights the possibility that targeting no production through multiple pathways may be the reason for the efficacy of sulfasalazine and proguanil in reducing viral replication. 
here we have used a series of computational approaches, including bespoke methods for data integration, network analysis, computer simulation and machine learning, to identify novel sars-cov- induced pathways that could be targeted therapeutically by repurposing existing and approved drugs (figure s ). although network analysis is increasingly being used for analysis of genetic datasets to uncover disease signatures (barabási et al., ), a few key aspects of our approach were essential in uncovering these new targets, including agnostic construction of the sip network and application of novel algorithms (previously used in other industries including social media). in addition, the use of artificial neural networks to understand systematically the mechanism of action for the drugs was vital to this investigation. our analysis identifies approved drugs, along with their moa, that may be effective against covid- (table s ). we are confident that these drugs have potential for repurposing for covid- , since out of the drugs have already entered clinical trials, testifying to the predictive power of our approach. an important part of our analysis is the use of already approved drugs. this allows for the rapid advancement of the most promising of the drugs which are not already in clinical trials. we identify two drugs, sulfasalazine and proguanil, that can reduce sars-cov- viral replication in cellular assays, raising the exciting possibility of their potential use in prophylaxis or treatment against covid- . both of these drugs function through the no pathway and have the potential to target more than one pathway necessary for no production. safety is a particularly important consideration, since such drugs will be prescribed to covid- -positive individuals who may have a broad range of underlying medical conditions and may not be hospitalised at the time of taking the drug.
sulfasalazine and proguanil have the potential to be used prophylactically or therapeutically. both are well-established and well-tolerated drugs (nakato et al., ; nikfar et al., ). sulfasalazine is already in use as an anti-inflammatory drug against autoimmune disorders. given that this drug has anti-viral activity (figure ), sulfasalazine may act as both an anti-viral and an anti-inflammatory if used against covid- . proguanil is used against malaria in combination with atovaquone. it has an excellent safety profile and is well tolerated when used as a prophylactic and in treatment. a complementary study to ours, which uses large-scale compound screening in cultured cells, has recently uncovered molecules which have a partial effect on viral infectivity, of which show a dose-dependent reduction of viral replication (riva et al., ). this list of drugs barely overlaps with ours, with only two of our approved drugs being present in it. neither sulfasalazine nor proguanil is among them. the main reason for this apparent disparity is that only % of the compounds tested by riva et al. are approved, whereas % of our drugs are approved. this highlights the major difference between the two studies: our in-silico studies identify potential anti-viral drugs that are already approved and therefore at an advanced stage of repurposing, whereas riva et al. have identified compounds validated in african green monkey cells, most of which are either in pre-clinical or phase - clinical trials. our study has shed unanticipated new light on covid- disease mechanisms and has generated promising drug repurposing opportunities for prophylaxis and treatment. our data-driven unsupervised approach and biological validation have uncovered approved drugs not currently in clinical trials, which can be investigated immediately for repurposing, and two drugs that show promise as anti-viral drugs.
we expect this resource of potential drugs will facilitate and accelerate the development of therapeutics against covid- . (a) the schematic depicts our strategy of constructing a sars-cov- -induced protein (sip) hidden network through data integration and network construction of directly-interacting proteins (dips) and differentially expressed proteins (deps), followed by identification of drugs that target key pathways in this network. (b) the sars-cov- orf sub-network shows the extent of the hidden layer that is revealed through the network analysis. (c) percentage of shortest paths between the dip and dep that are via - proteins at h versus h. computer programming scripts that were used in this study are available from https://github.com/wchwang/covid . high-confidence sars-cov- -human ppis (gordon et al., ) were used as dip. lc-ms/ms data at hours, hours after sars-cov- infection and no infection as a control were analysed to identify differentially expressed proteins (dep) (|log fc| > . , fdr-bh padj-value < . ). the sip network was constructed from all shortest paths between dip and dep in a human protein-protein interaction network from the string database (szklarczyk et al., ). only interactions with a confidence score greater than medium ( . ) were used. all shortest paths between dip and dep were found using the python package networkx (hagberg et al., ). networks were visualized using gephi . . (bastian et al., ) (figure s ). multiple network centrality algorithms were deployed to identify key proteins in sip networks. eigenvector centrality was used to identify the most influential proteins in the network. degree centrality was used to identify the hub proteins in the network. betweenness centrality was used to identify the bottleneck proteins in the network. random walk with restart was used to identify proteins which are influenced by sars-cov- . the algorithms were implemented in the python package networkx (hagberg et al., ).
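The network construction described above (collect every shortest path between a DIP and a DEP, keep the union of the nodes on those paths, then score nodes by centrality) can be sketched as follows. The study used networkx; this illustration uses a plain breadth-first search to stay dependency-free, and the toy graph, node names and helper names (`all_shortest_paths`, `sip_subnetwork`, `degree_centrality`) are assumptions made for the example only.

```python
from collections import deque


def all_shortest_paths(adj, source, target):
    """Return every shortest path between two nodes of an unweighted graph."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parents[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                parents[nbr].append(node)  # another equally short route
    if target not in dist:
        return []
    # Walk back from the target along the recorded parents.
    paths, stack = [], [[target]]
    while stack:
        path = stack.pop()
        if path[-1] == source:
            paths.append(path[::-1])
        else:
            stack.extend(path + [p] for p in parents[path[-1]])
    return paths


def sip_subnetwork(adj, dips, deps):
    """Induced subgraph on all nodes lying on a DIP-DEP shortest path."""
    nodes = set()
    for dip in dips:
        for dep in deps:
            for path in all_shortest_paths(adj, dip, dep):
                nodes.update(path)
    return {n: adj[n] & nodes for n in nodes}


def degree_centrality(sub):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(sub)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0)
            for node, nbrs in sub.items()}


# Toy undirected PPI graph (hypothetical node names).
ppi = {
    "DIP1": {"H1", "H2"},
    "H1": {"DIP1", "DEP1"},
    "H2": {"DIP1", "DEP1"},
    "DEP1": {"H1", "H2", "X"},
    "X": {"DEP1"},
}
sip = sip_subnetwork(ppi, {"DIP1"}, {"DEP1"})
hidden_layer = set(sip) - {"DIP1"} - {"DEP1"}   # proteins revealed by the paths
centrality = degree_centrality(sip)
```

Node `X` is dropped because it lies on no DIP-DEP shortest path, while `H1` and `H2` form the hidden layer. Eigenvector centrality, betweenness centrality and random walk with restart would be computed on the same subnetwork.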
permutation tests were performed , times to identify significant proteins for each of the network centrality algorithms. for each permutation test, a random network that has the same degree distribution as the sip network was generated. if a protein had a permutation p-value of less than . for each of the network centrality algorithms, we considered it a key protein. key proteins of the sip network were tested for enrichment of jensen disease (pletscher-frankild et al., ) and gene ontology (go biological process) terms. enrichment analyses were performed using enrichr (kuleshov et al., ). key networks were built using interactions between the key proteins of the sip network at hours and hours after infection. when visualising the key networks, subcellular localization of key proteins and enriched pathways of hidden layer proteins were added (figures b and c). subcellular localization information for key proteins was obtained from the compartments database (binder et al., ). among the available datasets in the compartments database, 'knowledge channel' data with a confidence score of greater than four were used. to identify enriched functions of the hidden layer proteins, the hidden layer proteins were tested for enrichment of reactome pathway terms. most hidden layer proteins belonged to the pathways "metabolism of rna", "cell cycle" and "immune system", so we retained only these pathways for key network visualisation. the key network visualization was carried out using circos (krzywinski et al., ). approved drugs were collected from chembl (mendez et al., ) and drugbank (wishart et al., ). drug-target interaction information was collected from drugbank (v . ) (wishart et al., ), stitch (confidence score > . ) (szklarczyk et al., ) and cheng et al. (cheng et al., ). network-based in-silico drug efficacy simulation was conducted for key proteins from the sip network at -hour and -hour.
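The key-protein rule described above (a protein must pass the permutation threshold for every centrality measure) can be sketched as below. The sketch assumes the null scores for a protein have already been computed on degree-preserving random networks, which is not shown; the add-one correction, the threshold `alpha` and all numbers are illustrative assumptions rather than details taken from the study.

```python
def permutation_p_value(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    The +1 in numerator and denominator keeps the estimate above zero
    for a finite number of permutations (an assumption of this sketch).
    """
    n_extreme = sum(1 for score in null_scores if score >= observed)
    return (n_extreme + 1) / (len(null_scores) + 1)


def is_key_protein(p_values, alpha):
    """A protein is 'key' only if it is significant for every centrality."""
    return all(p < alpha for p in p_values.values())


# Hypothetical scores for one protein: observed value and a short null sample.
observed = {"eigenvector": 0.9, "degree": 0.8, "betweenness": 0.7}
null = {
    "eigenvector": [0.1, 0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.1],
    "degree": [0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.2, 0.3],
    "betweenness": [0.1, 0.1, 0.2, 0.3, 0.2, 0.1, 0.4, 0.2, 0.1],
}
p_values = {name: permutation_p_value(observed[name], null[name])
            for name in observed}
key = is_key_protein(p_values, alpha=0.2)   # alpha is a placeholder
```

In practice the null sample would contain one score per permutation, so the smallest attainable p-value shrinks as the number of permutations grows.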
given K, the set of key proteins from sip networks, and T, the set of targets of an approved drug, the network proximity (equation ( )) of K to T was computed for each approved drug, where d(k, t) is the shortest path length between nodes k ∈ K and t ∈ T in the human ppi network (cheng et al., ). to assess the significance of the distance d(K, T) between the key proteins of the sip network and a drug, the distance was converted to a z-score based on permutation tests, $z = (d(K, T) - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the distances obtained from the permutation tests. the permutation tests were repeated times, each time with two randomly selected gene lists with similar degree distributions to those of K and T. the corresponding p-value was calculated based on the permutation test results. drug to sars-cov- associations with a z-score of less than − were considered significantly proximal (cheng et al., ). biological pathways were collated for the drugs we identified by in-silico drug efficacy simulation. to do this, proteins targeted by the drugs in the sip network were tested for enrichment of reactome pathway terms using g:profiler (raudvere et al., ). since the reactome pathway hierarchy contains main overarching parent pathways and more specific child pathways nested within these, in cases where child pathways were among the enriched pathways, the parent pathway term was removed from the enriched pathways list. finally, reactome pathways for drugs were identified. based on these identifications, a matrix containing drugs and reactome pathways was generated for drug-pathway association. this matrix was constructed using the f1 score (f1 = 2 × (precision × recall)/(precision + recall)) from the pathway enrichment analysis. a self-organizing map (som) (kohonen, ) was used to analyse the moa of the drugs. the data used in training was the f1 score matrix for drug-pathway associations ( pathways by drugs). after the som training, the davies-bouldin index (dbi) (davies and bouldin, ) was calculated based on the u-matrix to determine the best patterning among partitions (figure a).
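The proximity and z-score steps described above can be sketched as follows. The proximity equation itself is not reproduced in this text, so the sketch assumes one widely used "closest distance" form from the network-medicine literature: the distance to the nearest member of the other set, averaged over both sets. That choice, the helper names and the toy graph are assumptions. Drawing the degree-matched random gene lists is not shown; the null distances are supplied directly.

```python
from collections import deque
from statistics import mean, pstdev


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_proximity(adj, key_proteins, targets):
    """Assumed closest-distance proximity between two node sets."""
    inf = float("inf")
    dist = {n: bfs_distances(adj, n) for n in set(key_proteins) | set(targets)}
    k_part = sum(min(dist[k].get(t, inf) for t in targets) for k in key_proteins)
    t_part = sum(min(dist[t].get(k, inf) for k in key_proteins) for t in targets)
    return (k_part + t_part) / (len(key_proteins) + len(targets))


def z_score(observed, null_distances):
    """Standardise the observed distance against the permutation distances."""
    return (observed - mean(null_distances)) / pstdev(null_distances)


# Toy path graph A - B - C - D - E (hypothetical node names).
graph = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
d_obs = closest_proximity(graph, ["A", "B"], ["B", "D"])
z = z_score(1.0, [2.0, 3.0, 4.0])   # placeholder null distances
```

A strongly negative z-score means the drug targets sit closer to the key proteins than degree-matched random sets would, which is the criterion the text applies when calling a drug significantly proximal.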
the k-means algorithm was then used to find the pathway clusters (figure b). the som component maps of pathways were analysed based on the clustering result and mapped into two moa categories based on the biological functions (figure s ). the som model also labelled each neuron with the drugs (figures d and e). the som toolbox package (vatanen et al., ) for matlab was used for this analysis. the frequency of drug-protein targeting was counted. permutation tests were then performed times to identify the significance threshold for the frequency of drug-protein targeting (figure s a). for each permutation test, the drugs among all the drugs which we used for the in silico drug efficacy simulation were randomly selected. then, the number of drugs targeting the same protein was calculated for all of the randomly selected drugs. the proteins targeted more frequently in the sip network than in the randomised network were then tested for enrichment of uniprot keywords (figure s b). infection experiments were performed under biosafety level conditions. sars-cov- (strain münchen- . / / ) isolate was propagated in vero e cells in dmem supplemented with % fbs. for infection experiments in vero e and calu- cells, sars-cov- (strain münchen- . / / ) was used at moi= . pfu/cell for hours. all work involving live sars-cov- was performed in the bsl- facility of the institute for virology, university of giessen (germany), and was approved according to the german act of genetic engineering by the local authority. vero e and calu- cells were seeded using x cells in -well plates. the following day cells were treated for hour prior to infection with the indicated doses of ademetionine ( μm, selleckchem), alogliptin ( μm, selleckchem), flucytosine ( μm, selleckchem), proguanil ( nm- μm, selleckchem), sulfasalazine ( nm- μm, selleckchem), ifn-a ( u/ml), dmso (sigma) or mock and infected with sars-cov- at moi of . in serum-free dmem at °c for hours before rna or protein lysis.
infection experiments were performed under biosafety level conditions. quantitative rt-pcr analysis rna was isolated using the rneasy mini (qiagen). sars-cov- replication (e-gene and ngene rna) and gene expression of the cytokines cxcl , ifnb and tnf-a was quantified by rt-qpcr. for cdna synthesis, rna was reverse-transcribed with the superscript vilo cdna synthesis kit (invitrogen, - ) . the levels of specific rnas were measured using the abi real-time pcr machine and the powerup™ sybr™ green master mix (applied biosystems, ) according to the manufacturer's instructions. Δct values were determined relative to the gapdh and ΔΔct values were normalized to infected dmso treated samples. error bars indicate the standard deviation of the mean from three independent biological replicates. all primer sequences are listed in table below. cytotoxicity was performed in vero e and calu- cells using neutral red (abcam, ab ) and mtt assay (roche) respectively, according to the manufacturer's instructions. cytotoxicity was performed in vero e and calu- cells with the indicated compound dilutions and concurrent with viral replication assays. all assays were performed in biologically independent triplicates. western blot analysis x vero e cells either mock-infected or infected and treated with dmso or proguanil ( μm) or sulfasalazine ( μm) for hours, were resuspended and lysed in whole cell xsds sample buffer ( xsds sample buffer: mm tris-hcl, ph = . , . % glycerol, . % sds, . mm bromophenol blue), supplemented with ml -mercaptoethanol, protease inhibitors (sigma), and phosphatase inhibitors (sigma) and boiled for min at °c. - μg of protein was separated on sds-page gels, and blotted onto polyvinylidene difluoride membranes (millipore). western blot experiments were performed using the following antibodies: gapdh (abcam, ab ), phospho-mapkapk- (thr , cell signalling, ), goat anti-rabbit (abcam, ab ) and anti-mouse-hrp (cell signalling, s). 
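The ΔCt and ΔΔCt normalisation described in the RT-qPCR paragraph above can be illustrated with a short worked calculation. The conversion of ΔΔCt to relative expression as 2^(−ΔΔCt) is the common convention and assumes roughly 100% amplification efficiency; the text does not state that this conversion was used, and the Ct values below are invented for the example.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the gene of interest relative to the reference gene (GAPDH here)."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Expression relative to the control sample via 2^(-ddCt).

    The control sample is the infected, DMSO-treated condition in the text.
    """
    dd_ct = delta_ct(ct_target, ct_ref) - delta_ct(ct_target_ctrl, ct_ref_ctrl)
    return 2.0 ** (-dd_ct)


# Invented Ct values: viral E-gene RNA in a drug-treated well versus
# the infected DMSO control, both normalised to GAPDH.
treated = relative_expression(ct_target=26.0, ct_ref=18.0,
                              ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
control = relative_expression(24.0, 18.0, 24.0, 18.0)
```

Here the treated well crosses threshold two cycles later than the control after normalisation, which corresponds to a quarter of the control RNA level; the control compared with itself gives 1.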
statistical analyses performed were specified in figure legends. differences were considered significant for p-values < . . network medicine: a network-based approach to human disease gephi: an open source software for exploring and manipulating networks compartments: unification and visualization of protein subcellular localization evidence proteomics of sars-cov- -infected host cells reveals therapy targets the global phosphorylation landscape of sars-cov- infection network-based prediction of drug combinations anti-inflammatory mechanism of galangin in lipopolysaccharide-stimulated microglia: critical role of ppar-γ signaling pathway sars-cov- clinical trials nadph-generating dehydrogenases: their role in the mechanism of protection against nitro-oxidative stress induced by adverse environmental conditions. front glycosaminoglycans induce conformational change in the sars-. biorxiv lessons from dermatology about inflammatory responses in covid- a cluster separation measure a sars-cov- protein interaction map reveals targets for drug repurposing clinical characteristics of coronavirus disease in china network-based in silico drug efficacy screening extrapulmonary manifestations of covid- exploring network structure, dynamics, and function using networkx hostile takeovers: viral appropriation of the nf-κb pathway some drugs and herbal products the self-organizing map circos: an information aesthetic for comparative genomics the covid- gene and drug set library. ssrn electron enrichr: a comprehensive gene set enrichment analysis web server update drug repurposing to identify therapeutics against covid with sars-cov- spike glycoprotein and main protease as targets: an in silico study lipid peroxidation, reactive oxygen species and antioxidative factors in the pathogenesis of gastric mucosal lesions and mechanism of protection against oxidative stress -induced gastric injury inducible nitric oxide synthase: good or bad? 
chembl: towards direct deposition of bioassay data sars-cov- : cytokine storm and therapy a systematic review and meta-analysis of the effectiveness and safety of atovaquone -proguanil (malarone) for chemoprophylaxis against malaria a meta-analysis of the efficacy of sulfasalazine in comparison with -aminosalicylates in the induction of improvement and maintenance of remission in patients with ulcerative colitis diseases: text mining and data integration of disease-gene associations g:profiler: a web server for functional enrichment analysis and conversions of gene lists ( update) discovery of sars-cov- antiviral drugs through large-scale compound repurposing endoplasmic reticulum: the favorite intracellular niche for viral replication and assembly papain-like protease regulates sars-cov- viral spread and innate immunity stitch : augmenting proteinchemical interaction networks with tissue and affinity data string v : protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets on the origin and continuing evolution of sars-cov- selforganization and missing values in som and gtm clustering of the self−organizing map nitric oxide regulates endocytosis by s-nitrosylation of dynamin nitric oxide and redox mechanisms in the immune response drugbank . : a major update to the drugbank database sulfasalazine or proguanil-treated vero e cells at indicated concentrations for three hours prior to infection with sars-cov- for hours. (g-j) rt-qpcr analysis of the indicated mrnas from calu- cells pre-treated with proguanil or sulfasalazine at indicated concentrations for three hours prior to infection with sars-cov- for hours. 
statistical test: student's t test key: cord- -on w x authors: muruato, antonio e.; fontes-garfias, camila r.; ren, ping; garcia-blanco, mariano a.; menachery, vineet d.; xie, xuping; shi, pei-yong title: a high-throughput neutralizing antibody assay for covid- diagnosis and vaccine evaluation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: on w x virus neutralization remains the gold standard for determining antibody efficacy. therefore, a high-throughput assay to measure sars-cov- neutralizing antibodies is urgently needed for covid- serodiagnosis, convalescent plasma therapy, and vaccine development. here we report on a fluorescence-based sars-cov- neutralization assay that detects sars-cov- neutralizing antibodies in covid- patient specimens and yields comparable results to plaque reduction neutralizing assay, the gold standard of serological testing. our approach offers a rapid platform that can be scaled to screen people for antibody protection from covid- , a key parameter necessary to safely reopen local communities. infection. an ideal serological assay should measure neutralizing antibody levels, which should predict protection from reinfection. conventionally, neutralizing antibodies are measured by plaque reduction neutralization test (prnt). although prnt and elisa results generally corelate with each other, the lack of complete fidelity of elisa continues to make prnt the gold-standard for determining immune protection , . however, due to its low throughput, prnt is not practical for large scale serodiagnosis and vaccine evaluation. this is a major gap for covid- surveillance and vaccine development. to address the above gap, we developed a fluorescence-based assay that rapidly and using a high-content imaging reader (fig. a) . forty covid- serum specimens from rt-pcr- confirmed patients and ten non-covid- serum samples (archived before covid- emergence) were analyzed using the reporter virus. 
after reporter viral infection, the cells turned serum dilution (fig. c), which allowed for determination of the dilution fold that neutralized % of fluorescent cells (nt ). the reporter assay rapidly diagnosed fifty specimens in less than h: all forty covid- sera (specimens - ) showed positive nt of to , and all ten non-covid- sera (specimens - ) showed negative nt of < (fig. d). to validate the reporter virus neutralization results, we performed the conventional prnt on the same set of patient specimens. in agreement with the reporter virus results, the forty positive sera showed prnt of to , and the ten negative sera exhibited prnt of < (fig. d). a strong correlation was observed between the reporter virus and prnt results, with a correlation efficiency r of . (fig. e). the results demonstrate that when diagnosing patient specimens, the reporter virus assay delivers neutralization results comparable to the prnt assay, the gold standard of serological testing. next, we evaluated the specificity of the reporter neutralization assay using potentially cross-reactive sera and interfering substances (table ). two groups of specimens were tested for sars-cov- so that the assay could be performed at a bsl facility. nevertheless, the mneongreen reporter assay offers a rapid, high-throughput platform to test covid- patient sera not previously available. because neutralizing titer is a key parameter to predict immunity, the reporter human sera and interfering substances. all human serum specimens were obtained at the university of texas medical branch (utmb). all specimens were de-identified from patient information. a total of forty de-identified convalescent sera from covid- patients (confirmed with viral rt-pcr positive) were tested in this study. ten non-covid- sera, collected before covid- emergence , , were also tested in the reporter virus and prnt assays.
for testing cross reactivity, a total of de-identified specimens from patients with antigens or antibodies against different viruses, bacteria, and parasites were tested in the mneongreen sars-cov- neutralization assay (table ) . for testing interfering substances, nineteen de-identified serum specimens with albumin, elevated bilirubin, cholesterol, rheumatoid factor, and autoimmune nuclear antibodies were tested in the reporter neutralization assay. all human sera were heat-inactivated at °c for min before testing. severe acute respiratory syndrome coronavirus -specific (p/s; gibco) were seeded in each well of black µclear flat-bottom -well plate (greiner bio- one™). the cells were incubated overnight at °c with % co . on the following day, each serum was -fold serially diluted in % fbs and u/ml p/s dmem, and incubated with balanced salt solution; gibco) were added to each well to stain cell nucleus. the plate was sealed with breath-easy sealing membrane (diversified biotech), incubated at °c for min, and quantified for mneongreen fluorescence on cytation tm (biotek). the raw images ( × montage) were acquired using × objective, processed, and stitched using the default setting. utmb has filed a patent on the reverse genetic system and reporter sars-cov- . key: cord- - ir s authors: das, rohit pritam; jagadeb, manaswini; rath, surya narayan title: identification of peptide candidate against covid- through reverse vaccinology: an immunoinformatics approach date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ir s novel corona virus disease (covid- ) is emerging as a pandemic situation and declared as a global health emergency by who. due to lack of specific medicine and vaccine, viral infection has gained a frightening rate and created a devastating state across the globe. here the authors have attempted to design epitope based potential peptide as a vaccine candidate using immunoinformatics approach. 
according to evidence from the literature, the sars-cov- spike protein is a key protein for initiating viral infection within a host cell and is thus used here as a reasonable vaccine target. we have predicted a -mer peptide as representative of both b-cell and t-cell epitopic regions, with suitable properties such as being antigenic and non-allergenic. in support of this, strong molecular interaction of the predicted peptide was also observed with mhc molecules and toll like receptors. the present study may be helpful as a step forward in the development of vaccine candidates against covid- . the covid- outbreak was caused by the emergence of novel severe acute respiratory syndrome corona virus (sars-cov- ) [ ] . according to who, the novel corona virus has affected more than . million people with a fatality of , , across the globe as of may , . respiratory droplets, direct contact and fecal-to-oral transmission are conventional routes of transmission for sars-cov- [ ] . the symptoms of sars-cov- infection include fever, dry cough, shortness of breath, runny nose and sore throat [ , ] . the rate of transmission and death is gaining severity due to the lack of a specific drug or vaccine against it. sars-cov- is a positive-sense single-stranded rna virus and its genome is around . kb long with twelve putative open reading frames (orfs) that encode different structural and non-structural proteins. the first sars-cov- (wuhan-hu- ) genome was successfully sequenced and submitted to genbank on january , (accession no. mn . ) [ ] . one-third of the genome codes for the structural proteins of sars-cov- , namely spike (s), envelope (e), membrane (m), and nucleocapsid (n), which are potential antigens for neutralizing antibody preparation and may be prospective therapeutics [ ] . after entering the host body, the virion attaches to the host cell membrane and the viral spike protein s interacts with a functional host cell receptor known as angiotensin-converting enzyme (ace ). 
thereafter, spike protein s mediates the fusion of the virion and cellular membranes by acting as a class i viral fusion protein [ ] . during this phase, the protein attains at least three conformational states: pre-fusion native state, pre-hairpin intermediate state, and post-fusion hairpin state [ ] . because the sars-cov- s glycoprotein is surface-attached and has the potential to initiate infection, it could be a promising vaccine target. in this connection, epitope-based peptide design has notable advantages over conventional vaccine development. peptide-based vaccines are popular because they are specific, generate long-lasting immunity, avoid undesirable immune responses, and are reasonably cheap [ ] . in addition, epitope-based vaccine design has been aided by robust computational techniques [ ] . therefore, the authors have focused on the discovery of epitopes from the sars-cov- s glycoprotein. the t-cell epitopes are typically peptide fragments, whereas the b-cell epitopes can be proteins, lipids, nucleic acids or carbohydrates [ , , ] . based on the literature, the peptide is considered sufficient for activation of the appropriate cellular and humoral responses as it is a fragment of the antigenic protein [ , ] . here we have identified peptides as vaccine candidates because peptide vaccines are comparatively easy to produce, chemically stable, and lack infectious potential [ ] . the present study may shed light on vaccine development against covid- . the protein sequence of the s glycoprotein was retrieved from the uniprot (id: p dtc ) database [ ] . the gene name is "s", which belongs to severe acute respiratory syndrome corona virus ( -ncov) (sars-cov- ). the b-cell epitopic regions present in sars-cov- s protein were identified using bcepred prediction server (https://webs.iiitd.edu.in/cgibin/bcepred/) [ ] . it helped to predict linear epitopes from the s protein sequence using physico-chemical properties. 
mhc binding prediction includes the prediction of binding sites for both cd + and cd +. the iedb analysis resource (http://tools.iedb.org/main/) predicts specific t-cell epitopes to bind with mhc class i molecules along with ic (half maximal inhibitory concentration) values. similarly, it employs different methods to predict mhc class ii epitopes, including a consensus approach which combines nn-align, smm-align and combinatorial library methods [ ] . the crystal structure of hla-b* : (pdb id: lnr) presenting mhc class i molecule in complex form with the peptide (rpqvplrpmty) was retrieved from pdb database [ ] . similarly, as representative of mhc-ii, the crystal structure of hla-dr (pdb id: t x) in complex form with a synthetic peptide (aaysdqatplllspr) was retrieved from pdb. pepfold server [ , ] was used to build the tertiary structural model of predicted peptides. molecular docking was performed between the predicted peptides and mhc representative structures using patchdock web server [ , , ] . determination of antigenic and allergenic properties are two important factors in peptide-based vaccine design. the antigenicity of predicted peptides was calculated using vaxijen tool [ ] with the cut off value . . allertop v. . [ ] and allergenfp v. . tool [ ] were used to predict the allergenic property of predicted peptides. in order to find the molecular properties of predicted peptides, innovagen's peptide calculator was used. it makes calculations and estimations on physicochemical properties like peptide molecular weight, peptide extinction coefficient, peptide net charge at neutral ph and peptide iso-electric point. crystal structures of human toll-like receptors such as tlr (pdb id: nig) and tlr (pdb id: g a) were extracted from pdb and subjected to structural preparation. docking of both tlr and tlr structures with the predicted peptides was performed using patchdock web server [ , , ] . 
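one of the physicochemical properties listed above, peptide molecular weight, can be estimated directly from the sequence. the following is a minimal sketch of that calculation and not a reproduction of innovagen's peptide calculator: it assumes an unmodified linear peptide with free termini, uses standard average residue masses, and the names `AVERAGE_RESIDUE_MASS` and `peptide_molecular_weight` are hypothetical.

```python
# Average masses (in daltons) of amino acid residues, i.e. the amino acid
# minus one water molecule lost on peptide bond formation.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def peptide_molecular_weight(sequence):
    """Return the average molecular weight of a linear, unmodified peptide."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    unknown = sorted(set(seq) - set(AVERAGE_RESIDUE_MASS))
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(unknown))
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


# The mhc class i reference peptide mentioned in the text, as an example input.
print(round(peptide_molecular_weight("rpqvplrpmty"), 2))
```

net charge and iso-electric point require side-chain pka values, which differ between tools, so they are left out of this sketch.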
the complete sequence of sars-cov- s protein (uniprot id: p dtc ) is , amino acids in length. average antigenic propensity was calculated as . from antigenic determinants within its primary sequence. the antigenic plot (figure ) of propensity against amino acid residue position strongly supported the antigenicity of the s protein. prediction of b-cell epitopes is a crucial step in epitope-based vaccine design [ ] . the potential role of toll-like receptor mediated host-parasite interaction is well established. particularly, tlr and tlr are well known mediums for interaction between filarial parasites and host innate immune system [ , , ] . therefore, in this study we have studied the interaction between predicted peptides (pep- and pep- ) and toll-like receptors (tlr and tlr ). the results suggested that pep- interacted more strongly with tlr than with tlr (table , figure ). further, the strongest binding affinity (ace score) was observed for the tlr -pep docked complex (table , figure ). moreover, pep- was found to bind more significantly to both tlr and tlr (table , figure ). this study is focused on the prediction of effective epitopes from the spike protein of sars-cov- . among all possible epitopes, pep- and pep- were identified as the more effective candidates. again, pep- was observed to be more suitable than pep- as a possible vaccine candidate. further, in vitro and in vivo validation is required to confirm the prediction. overall, this study should be informative for new vaccine development for prevention of widespread covid- . 
middle east respiratory syndrome coronavirus: another zoonotic betacoronavirus causing sars-like disease clinical characteristics of coronavirus disease in china a new coronavirus associated with human respiratory disease in china american journal of respiratory and critical care medicine novel coronavirus structure, mechanism of action, antiviral drug promises and rule out against its treatment mechanisms of coronavirus cell entry mediated by the viral spike protein structure, function, and antigenicity of the sars-cov- spike glycoprotein peptide-based synthetic vaccines recent advances with liposomes as pharmaceutical carriers. nature reviews drug discovery vaccine and antibody-directed t cell tumour immunotherapy bba)-reviews on cancer structural and immunogenicity analysis of chimeric b-cell epitope constructs derived from the gp and gp subunits of the envelope glycoproteins of htlv- . the journal of peptide research identification of t-and b-cell epitopes in synthetic peptides derived from a streptococcus mutans protein and characterization of their antigenicity and immunogenicity. archives of oral biology design and development of synthetic peptide vaccines: past, present and future. expert review of vaccines prediction of b-cell epitopes using evolutionary information and propensity scales t-cell epitope vaccine design by immunoinformatics. open biology uniprot: a hub for protein information bcepred: prediction of continuous b-cell epitopes in antigenic sequences using physico-chemical properties. ininternational conference on artificial immune systems epitopeviewer: a java application for the visualization and analysis of immune epitopes in the immune epitope database and analysis resource (iedb). immunome research protein data bank (pdb): database of three-dimensional structural information of biological macromolecules pep-fold: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides. 
nucleic acids research pep-fold: an online resource for de novo peptide structure prediction protein-protein docking: methods and tools efficient unbound docking of rigid molecules patchdock and symmdock: servers for rigid and symmetric docking vaxijen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. bmc bioinformatics allertop v. -a server for in silico prediction of allergens allergenfp: allergenicity prediction by descriptor fingerprints recent advances in b-cell epitope prediction methods. immunome research an introduction to b-cell epitope mapping and in silico epitope prediction we are thankful to dr. pawan kumar agrawal, vice chancellor, odisha university of agriculture and technology, bhubaneswar for his moral support and valuable suggestion. the authors declare no competing interest. key: cord- -ghrlj b authors: pruijssers, andrea j.; george, amelia s.; schäfer, alexandra; leist, sarah r.; gralinksi, lisa e.; dinnon, kenneth h.; yount, boyd l.; agostini, maria l.; stevens, laura j.; chappell, james d.; lu, xiaotao; hughes, tia m.; gully, kendra; martinez, david r.; brown, ariane j.; graham, rachel l.; perry, jason k.; du pont, venice; pitts, jared; ma, bin; babusis, darius; murakami, eisuke; feng, joy y.; bilello, john p.; porter, danielle p.; cihlar, tomas; baric, ralph s.; denison, mark r.; sheahan, timothy p. title: remdesivir potently inhibits sars-cov- in human lung cells and chimeric sars-cov expressing the sars-cov- rna polymerase in mice date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ghrlj b severe acute respiratory syndrome coronavirus (sars-cov- ) emerged in as the causative agent of the novel pandemic viral disease covid- . with no approved therapies, this pandemic illustrates the urgent need for safe, broad-spectrum antiviral countermeasures against sars-cov- and future emerging covs. 
we report that remdesivir (rdv), a monophosphoramidate prodrug of an adenosine analog, potently inhibits sars-cov- replication in human lung cells and primary human airway epithelial cultures (ec = . μm). weaker activity was observed in vero e cells (ec = . μm) due to their low capacity to metabolize rdv. to rapidly evaluate in vivo efficacy, we engineered a chimeric sars-cov encoding the viral target of rdv, the rna-dependent rna polymerase, of sars-cov- . in mice infected with chimeric virus, therapeutic rdv administration diminished lung viral load and improved pulmonary function as compared to vehicle treated animals. these data provide evidence that rdv is potently active against sars-cov- in vitro and in vivo, supporting its further clinical testing for treatment of covid- . while four endemic human cov (hcov-oc , - e, -nl , and -hku ) typically cause mild respiratory diseases with common cold-like symptoms, sars-cov- , mers-cov, and sars-cov- cause severe respiratory disease with respective mortality rates of % (chan-yeung and xu, ) , % (arabi et al., ) , and an estimated % (chen, ) . the development of effective broad-spectrum antivirals has been hampered by viral diversity, the capacity of covs to adaptively overcome negative selective pressures, and the ability to actively counteract drugs through the action of a proofreading exoribonuclease. we previously reported that remdesivir (rdv), a monophosphoramidate prodrug of the c-adenosine analog gs- , potently inhibits replication of a broad spectrum of pre-pandemic bat covs and human epidemic covs in primary human lung cell cultures (agostini et calu human lung cells with sub-micromolar ec and in primary human airway epithelial cultures (haes) with nanomolar ec . notably, we have detected comparably lower potency of rdv in established human and monkey cell lines due to their lower metabolic capacity to activate the compound. 
mice infected with chimeric sars-cov- encoding the sars-cov- rdrp and treated therapeutically with rdv show decreased viral loads in the lungs and increased pulmonary function. these data emphasize the potential of rdv as a promising countermeasure against the ongoing covid- pandemic. replication for each cell type, and infectious viral titer and viral genome copy number in the supernatant were quantified by plaque assay and rt-qpcr, respectively. rdv and gs- potently inhibited sars-cov- replication in a dose-dependent manner in both cell types ( fig. ; table ). in calu cells, both compounds displayed dose-dependent inhibition of viral replication as determined by plaque assay ( fig. b) and rt-qpcr (fig. c) . rdv inhibited sars-cov- with an ec = . μm and ec = . μm. the parent compound gs- was less potent: ec = . μm, ec = . μm ( fig. d ; table ). ec values determined by quantification of viral genome copies were roughly two-fold higher than those obtained by quantification of infectious virus ( fig. e; table ). both compounds also displayed dose-dependent inhibition of viral replication in vero e cells as determined by infectious viral titer and genome copy number (fig f) . rdv inhibited sars-cov- with ec = . μm and ec = . μm, while gs- was more potent (ec = . μm, ec = . μm) ( fig. g and physiology of the human conducting airway (sims et al., ) . therefore, we evaluated antiviral activity of rdv in this biologically relevant model. in rdv treated hae, we observed a dose-dependent reduction in infectious virus production, with > -fold inhibition at the highest tested concentration (fig. a) . importantly, rdv demonstrate potent antiviral activity with ec values of . and . µm in two independent experiments (fig. b) . we previously reported that rdv is not cytotoxic at doses at or below µm in this culture system, supporting the conclusion that the observed antiviral effect was virus-specific (sheahan et al., ) . 
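the ec values reported above come from dose-response curves, which this paper's methods describe as fitted by four-parameter non-linear regression in graphpad prism. the sketch below only illustrates the shape of such a model and how an effective concentration can be read from already fitted parameters. it does not perform the fit, the parameterization is an assumption that may differ from the one used by the authors, and all names and numbers are hypothetical.

```python
def four_param_logistic(conc, bottom, top, ec50, hill):
    """Response (e.g. % of vehicle-control replication) at a drug concentration.

    bottom, top : lower and upper plateaus of the curve
    ec50        : concentration giving a response halfway between the plateaus
    hill        : slope factor (steepness of the curve)
    """
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)


def concentration_for_response(response, bottom, top, ec50, hill):
    """Invert the curve: concentration at which `response` is reached."""
    if not min(bottom, top) < response < max(bottom, top):
        raise ValueError("response must lie strictly between the plateaus")
    ratio = (top - bottom) / (response - bottom) - 1.0
    return ec50 * ratio ** (1.0 / hill)


# Hypothetical fitted parameters, for illustration only.
params = dict(bottom=0.0, top=100.0, ec50=0.5, hill=1.0)
# Concentration at which replication falls to 10% of control (90% inhibition).
print(concentration_for_response(10.0, **params))
```

with a hill slope of 1 the concentration giving 90% inhibition is nine times the half-maximal concentration, which is why the two values can differ substantially for the same compound.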
together, these data demonstrate that rdv is potently antiviral against sars-cov- in primary human lung cultures with a selectivity index of > . (fig. s ) and the efficiency of each step might differ between cell types. therefore, to reconcile the differences in antiviral activity of rdv and gs- observed in our and other studies, we compared intracellular rdv-tp concentrations in vero e , calu b , and haes following incubation with the two compounds. rdv-tp levels per million cells produced after -to -hour treatment with rdv were substantially higher in primary hae cultures than either calu b or vero e . (fig ; table ; tables s , s ). given the primary nature of hae cultures, we used cells from two independent donors with similar demographic profiles. rdv-tp was efficiently formed in both donor cultures following incubation with rdv with a difference of < -fold from each other. the lowest levels of rdv-tp were observed following rdv treatment of vero e cells and were approximately -and -fold lower than those observed in calu b and hae cultures, respectively. the levels of gs- as well as the intermediate mono-and di-phosphorylated metabolites (rdv-mp and rdv-dp) were readily detected in calu b cultures following treatment with rdv, but were below the limit of quantification in vero e cells at all time points tested (table s ). in addition, incubation of vero e cells with gs- yielded -fold higher rdv-tp levels compared to incubation with rdv corresponding to higher antiviral potency of gs- relative to rdv, which is not observed with either calu or hae cultures. (table s , s ). in conclusion, the rdv-tp levels in the different cell types directly correlated with the antiviral potencies of rdv and gs- against sars- cov- with the hae cultures producing substantially higher levels of rdv-tp that translated into markedly more potent antiviral activity of rdv (table ) . 
importantly, the metabolism of rdv in vero e cells appeared altered and was less efficient particularly in comparison with the hae cultures, indicating that vero e cells might not be an adequate cell type to characterize the antiviral activity of rdv and potentially also other nucleotide prodrug-based antivirals. to determine whether rdv exerts antiviral effect on sars-cov- in vivo, we constructed a chimeric mouse-adapted sars-cov- variant encoding the target of rdv antiviral activity, the rdrp, of sars-cov- (sars /sars -rdrp) (fig. a) . although other chimeric replicase orf recombinant covs have shown to be viable (stobart et al., ) , this is the first demonstration that the rdrp from a related but different cov can support efficient replication of another. after recovery and sequence-confirmation ( fig. b ) of recombinant chimeric viruses with and without nanoluciferase reporter, we compared sars-cov- and sars /sars -rdrp replication and sensitivity to rdv in huh cells. replication of both viruses was inhibited similarly in a dose-dependent manner by rdv (sars-cov- mean ec = . µm; sars /sars -rdrp mean ec = . µm) ( fig. c and d) . we then sought to determine the therapeutic efficacy of rdv against the sars /sars -rdrp in mouse models employed for previous studies of rdv (sheahan et al., ) . mice produce a serum esterase absent in humans, carboxyl esterase c (ces c), that dramatically reduces half-life of rdv. thus, to mirror pharmacokinetics observed in humans, mouse studies with rdv must be performed in transgenic c bl/ ces c -/mice (sheahan et al., ) . we infected female c bl/ ces c -/mice with pfu sars /sars -rdrp and initiated subcutaneous treatment with mg/kg rdv bid at one day post-infection (dpi). this regimen was continued until study termination. while weight loss did not differ between vehicle-and rdv-treated animals (fig. e) , lung hemorrhage at five dpi was significantly reduced with rdv treatment (fig. f ). 
to gain insight into physiologic metrics of disease severity, we measured pulmonary function daily by whole body plethysmography (wbp). the wbp metric, penh, is a surrogate marker of pulmonary obstruction (menachery et al., a) . therapeutic rdv significantly ameliorated loss of pulmonary function observed in the vehicle-treated group (fig. g) . importantly, rdv treatment dramatically reduced lung viral load (fig. h) . taken together, these data demonstrate that therapeutically administered rdv can reduce virus replication and versus sars-cov- isolates used in the previously mentioned studies assessing rdv potency did not reveal consensus changes in nsp sequence, suggesting that any isolate-specific variation in rdv sensitivity is not likely due to differences in the rdv-tp interaction with the rdrp. therefore, the differences in ec may be partially explained by intrinsic differences of sars-cov- virus isolates, quantification methods, and assay conditions such as incubation period and virus input. we thank dr. natalie thornburg at the centers for disease control and prevention in atlanta, usa for providing the stock of sars-cov- used in this study. finally, we thank vumc and unc environmental health and safety personnel for ensuring that our work is performed safely and securely. we also thank facilities management personnel for their tireless commitment to excellent facility performance and our grant management teams for their administrative support of our research operations. the authors affiliated with gilead sciences, inc. are employees of the company and own company stock. the other authors have no conflict of interest to report. transwell-col ( mm diameter) supports (corning). human airway epithelium cultures (hae) were generated by provision of an air-liquid interface for to weeks to form well-differentiated, polarized cultures that resembled in vivo pseudostratified mucociliary epithelium (fulcher et al., ) . 
clinical specimens of sars-cov- from a case-patient who acquired covid- during travel to china and diagnosed in washington state, usa upon return were collected as described (holshue et al., ) . virus isolation from this patient's specimens was performed as described in (harcourt et al.) . the sequence is available through genbank (accession number mn ). a passage stock of the sars- cov- seattle isolate was obtained from the cdc and passaged twice in vero e cells to generate high-titer passage stock for experiments described in this manuscript. cov- n gene positive control plasmid (idt, cat# ) served as template to pcr-amplify a bp product using forward ( '-taatacgactcactatagggatgtctgataatggacccca) and reverse ( '-ttaggcctgagttgagtcag) primers that appended a t rna polymerase promoter to the ' end of the complete n orf. pcr product was column purified (promega) for subsequent in vitro transcription of n rna using mmessage mmachine t transcription kit (invitrogen) according to manufacturer's protocol. n rna was purified using rneasy mini kit (qiagen) according to manufacturer's protocol, and copy number was calculated using scienceprimer.com copy number calculator. in vitro metabolism of rdv and gs- . calu b or vero e cells were seeded in a -well plate at . x or . x cells/well, respectively. twenty-four hours later, cell culture media was replaced with media containing μm rdv (gs- ) or gs- and incubated at ˚c. differentiated hae cultures from two healthy donors (mattek corporation; ashland, ma) were maintained with media replacement every other day for week. the hae donors were -and -year-old females of the same race. at the time of treatment, media was replaced on the basal side of the transwell hae culture, while the apical surface media was replaced with µl media containing μm rdv. at , and h post drug addition to all cultures, cells were washed times with ice-cold tris-buffered saline, scraped into . ml ice-cold % methanol and stored at - °c. 
extracts were centrifuged at , x g for minutes and supernatants were transferred to clean tubes for evaporation in a mivac duo concentrator (genevac). dried samples were reconstituted in mobile phase a containing mm ammonium formate (ph ) with mm dimethylhexylamine (dmh) in water for analysis by lc-ms/ms, using a multi-stage linear gradient from % to % acetonitrile in mobile phase a at a flow rate of μl/min. analytes were separated using a x mm, . μm luna c ( ) hst column (phenomenex) connected to an lc- adxr (shimadzu) ternary pump system and hts pal autosampler (leap technologies). detection was performed on a qtrap + (ab sciex) mass spectrometer operating in positive ion and multiple reaction monitoring modes. analytes were quantified using a -point standard curve ranging in concentration from . to pmol prepared in extracts from untreated cells. for normalization by cell number, multiple untreated calu or vero e culture wells were counted at each timepoint. hae cells were counted at the -h timepoint and the counts for other timepoints were determined by normalization to endogenous atp levels for accuracy. formulations for in vivo studies. rdv was solubilized at . mg/ml in vehicle containing % sulfobutylether-β-cyclodextrin sodium salt in water (with hcl/naoh) at ph . . in vivo efficacy studies. all animal experiments were performed in accordance with the university of north carolina at chapel hill institutional animal care and use committee policies and guidelines. to achieve a pharmacokinetic profile similar to that observed in humans, we performed therapeutic efficacy studies in ces c -/mice (stock , the jackson laboratory), which lack a serum esterase not present in humans that dramatically reduces rdv half-life (sheahan et al., ) . week-old female ces c -/- mice were anaesthetized with a mixture of ketamine/xylazine and intranasally infected with pfu sars /sars -rdrp in µl. 
one dpi, vehicle (n = ) and rdv (n = ) dosing was initiated ( mg/kg subcutaneously) and continued every h until the end of the study at five dpi. to monitor morbidity, mice were weighed daily. pulmonary function testing was performed daily by whole body plethysmography (wbp) (data sciences international) (sheahan et al., ) . at five dpi, animals were sacrificed by isoflurane overdose, lungs were scored for lung hemorrhage, and the inferior right lobe was frozen at − °c for viral titration via plaque assay on vero e cells. lung hemorrhage is a gross pathological phenotype readily observed by the naked eye and driven by the degree of virus replication, where lung coloration changes from pink to dark red (sheahan et al., (sheahan et al., , a . for the plaque assay, x vero e cells/well were seeded in -well plates. the following day, medium was removed, and monolayers were adsorbed at ˚c for one h with serial dilutions of sample ranging from - to - . cells were overlayed with x dmem, % fetal clone serum, × antibiotic-antimycotic, . % agarose. viral plaques were enumerated three days later. mathematical and statistical analyses. the ec value was defined in graphpad prism as the concentration at which there was a % decrease in viral replication relative to vehicle alone ( % inhibition). curves were fitted based on four parameter non-linear regression analysis. all statistical tests were executed using graphpad prism . coronavirus susceptibility to the antiviral remdesivir (gs- ) is mediated by the viral polymerase and the proofreading exoribonuclease hydroxycytidine inhibits a proofreading-intact coronavirus with a high genetic barrier to resistance viral replication. 
structural basis for rna replication by the hepatitis c virus polymerase middle east respiratory syndrome sars-cov- and sars-cov differ in their cell tropism and drug sensitivity profiles broad spectrum antiviral remdesivir inhibits human endemic and zoonotic deltacoronaviruses with a highly divergent rna dependent rna polymerase sars: epidemiology. respirol. carlton vic suppl pathogenicity and transmissibility of -ncov-a quick overview and comparison with other emerging viruses synthesis and antiviral activity of a series of ′-substituted -aza- , -dideazaadenosine c- nucleosides remdesivir, lopinavir, emetine, and homoharringtonine inhibit sars-cov- replication in vitro data, disease and diplomacy: gisaid's innovative contribution to global health is the expression of deoxynucleoside kinases and '-nucleotidases in animal tissues related to the biological effects of nucleoside analogs? well-differentiated human airway epithelial cell cultures structure of the rna-dependent rna polymerase from covid- virus remdesivir is a direct-acting antiviral that inhibits rna-dependent rna polymerase from severe acute respiratory syndrome coronavirus with high potency the antiviral compound remdesivir potently inhibits rna-dependent rna polymerase from middle east respiratory syndrome coronavirus compassionate use of remdesivir for patients with severe covid- early release -severe acute respiratory syndrome coronavirus from patient with novel coronavirus disease first case of novel coronavirus in the united states viral loads in clinical specimens and sars manifestations identification of antiviral drug candidates against sars-cov- from fda-approved drugs structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors the role of transporters in the toxicity of nucleoside and nucleotide analogs gs- and its parent nucleoside analog inhibit filo the protide prodrug technology: from the concept to the clinic new metrics for evaluating viral respiratory pathogenesis a 
sars-like cluster of circulating bat coronaviruses shows potential for human emergence sars-like wiv -cov poised for human emergence lusakibanza cell-line dependent antiviral activity of sofosbuvir against zika virus viral load of sars-cov- in clinical samples severe acute respiratory syndrome lianhuaqingwen exerts anti-viral and anti-inflammatory activity against novel coronavirus (sars-cov- ) pharmacologic treatments for coronavirus disease (covid- ): a review reverse genetics with a full-length infectious cdna of the middle east respiratory syndrome coronavirus broad-spectrum antiviral gs- inhibits both epidemic and zoonotic coronaviruses comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against mers-cov an orally bioavailable broad-spectrum antiviral inhibits sars-cov- in human airway epithelial cell cultures and multiple coronaviruses in mice gisaid: global initiative on sharing all influenza data -from vision to reality severe acute respiratory syndrome coronavirus infection of human ciliated airway epithelia: role of ciliated cells in viral spread in the conducting airways of the lungs chimeric exchange of coronavirus nsp proteases ( clpro) identifies common and divergent regulatory determinants of protease activity receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro therapeutic efficacy of the small molecule gs- against ebola virus in rhesus monkeys clinical benefit of remdesivir in rhesus macaques prophylactic and therapeutic remdesivir (gs- ) treatment in the rhesus macaque model of mers-cov infection sars and mers: recent insights into emerging coronaviruses dynamic innate immune responses of human bronchial epithelial cells to severe acute respiratory syndrome-associated coronavirus infection reverse genetics with a 
full-length infectious cdna of severe acute respiratory syndrome coronavirus clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord- -rhd qqyn authors: bhattacharyya, sumit; kotlo, kumar; tobacman, joanne k. title: increased expression of chondroitin sulfotransferases following angii may contribute to pathophysiology underlying covid- respiratory failure: impact may be exacerbated by decline in arylsulfatase b activity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rhd qqyn the spike protein of sars-cov- binds to respiratory epithelium through the ace receptor, an endogenous receptor for angiotensin ii (angii). the mechanisms by which this viral infection leads to hypoxia and respiratory failure have not yet been elucidated. interactions between the sulfated glycosaminoglycans heparin and heparan sulfate and the sars-cov- spike glycoprotein have been identified as participating in viral adherence and infectivity. in this brief report, we present data indicating that stimulation of vascular smooth muscle cells by angii leads to increased expression of two chondroitin sulfotransferases (chst and chst ), which are required for the synthesis of the sulfated glycosaminoglycans chondroitin -sulfate (c s) and chondroitin , -disulfate (cse). we suggest that increased expression of these chondroitin sulfotransferases and the ensuing production of chondroitin sulfates may contribute to viral adherence to bronchioalveolar cells and to the progression of respiratory disease in covid- . the enzyme arylsulfatase b (arsb; n-acetylgalactosamine- -sulfatase), which removes -sulfate groups from the non-reducing end of chondroitin -sulfate residues, is required for degradation of c s and cse. in hypoxic conditions or following treatment with chloroquine, arsb activity is reduced. 
decline in arsb can contribute to ongoing accumulation and airway obstruction by c s and cse. decline in arsb leads to increased expression of interleukin(il)- in human bronchial epithelial cells, and il- is associated with cytokine storm in covid- . these findings indicate how chondroitin sulfates, chondroitin sulfotransferases, and chondroitin sulfatases may participate in the progression of hypoxic respiratory insufficiency in covid- disease and suggest new therapeutic targets. our studies of chondroitin sulfates, chondroitin sulfatases, and chondroitin sulfotransferases demonstrate several findings that are pertinent to mechanisms of how sars-cov- infection might lead to respiratory insufficiency through interactions with sulfated gags [ , , [ ] [ ] [ ] , ] . in this brief report, data are presented showing increased expression of the chondroitin sulfotransferases chst and chst in human vascular smooth muscle cells following stimulation by exogenous angiotensin (ang) ii. chst (n-acetylgalactosamine - arsb activation requires oxygen for post-translational modification and activation, and arsb activity is lower in hypoxic conditions [ , ] ; b) decline in arsb is associated with increased il- expression in human bronchial epithelial cells and in patients with cystic fibrosis and asthma [ ] ; c) decline in arsb replicates effects of hypoxia and generation of sulfate by arsb may be important in mitochondrial metabolism [ , ] ; d) accumulation of sulfated glycosaminoglycans when arsb is reduced can contribute to inflammation and pulmonary pathophysiology, as in cystic fibrosis [ ] ; e) in patients with moderate copd, refractoriness to oxygen therapy was associated with gene mutations regulating arsb expression [ ] ; and f) changes in chondroitin -sulfation affect binding of the critical molecules galectin- and shp (ptpn ; protein tyrosine phosphatase non-receptor type ) with impact on transcriptional events and vital cell signaling [ , ] . 
the lysosomal enzyme galactose -sulfate sulfatase (galns) removes the -sulfate group at the non-reducing end of cse, followed by removal of the -sulfate group at the non-reducing end by arsb [ ] . deficiency of lysosomal acidification, as by treatment with chloroquine or hydroxychloroquine [ ], can inhibit activity of these chondroitin sulfatases and contribute to the accumulation of c s and cse [ ] . we hypothesize that hydroxychloroquine treatment and hypoxia reduce arsb, and that decline in arsb contributes to cytokine storm and respiratory distress in covid- . decline in arsb activity exacerbates the impact of enhanced production of chst and chst due to ace -mediated increases in chst and chst . with these background considerations, we report increased expression of chondroitin sulfotransferases by the angii-activated pathway, hypothesize how these effects might contribute to the manifestations of covid- infection, and suggest new approaches to reduce disease morbidity and mortality. human aortic smooth muscle cells (lifeline technology, oceanside, ca, usa) were maintained in dmem supplemented with % fetal bovine serum at °c in a humidified atmosphere with % co . cells were passaged twice, then grown in multiwell culture plates to % confluence, and the medium was changed to dmem without serum for h to maintain quiescence. cells were then treated with angiotensin ii (angii; circulating leukocytes were isolated from whole blood samples obtained from patients aged - , followed at rush university medical center (rumc) for cystic fibrosis, asthma, or other conditions under a protocol approved by the rumc and university of illinois at chicago (uic) institutional review boards [ ] . venous blood was collected in citrated tubes and processed within h of collection. subjects had no acute illness at the time of blood collection. blood samples were identified with a study number, and a registry linked patient clinical data with the study identifier. 
polymorphonuclear (pmn) and mononuclear (mc; monocytes and lymphocytes) white blood cells were collected separately from the whole blood samples by the polymorphprep tm kit (axis-shield, oslo, norway) [ ] . cells were stored at - °c until further experiments. arsb activity was determined in the leukocyte samples by a fluorometric assay, following a standard protocol [ ] . the protocol required ml of cell homogenate, ml of assay buffer ( . m na-acetate buffer with mmol/l barium sulfate: . mol/l na acetate, ph . ), and ml of substrate [ mm -methylumbelliferyl sulfate (mus)] in assay buffer. materials were combined in microplate wells, and the microplate was incubated for min at °c. the reaction was stopped by adding ml of stop buffer (glycine-carbonate buffer, ph . ), and fluorescence was detected at nm (excitation) and nm (emission). arsb activity was expressed as nmol/mg protein/h, and was derived from a standard curve prepared using known quantities of -methylumbelliferyl at ph . . measurement of interleukin- by elisa interleukin (il)- in the patient plasma samples was measured by the quantikine elisa kit for human il- (r&d systems, minneapolis, mn), as previously described [ ] . il- in the samples was captured into the wells of a microtiter plate pre-coated with specific anti-il- monoclonal antibody, and the immobilized il- was detected by a biotinylated second antibody and streptavidin-horseradish peroxidase (hrp)-conjugate with the chromogenic substrate hydrogen peroxide/ tetramethylbenzidine (tmb). color intensity was read at nm with a reference filter of nm in an elisa plate reader (fluostar, bmg labtech, inc., cary, nc). il- concentrations were extrapolated from a standard curve plotted using known concentrations of il- , expressed as picograms per milliliter plasma (pg/ml). 
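the standard-curve step used for both the arsb and il- readouts can be illustrated with a minimal sketch. it assumes a simple linear relation between signal and concentration over the range of the standards, which is an assumption made here for illustration and not necessarily the curve model the authors used (elisa kits often recommend a four-parameter logistic fit). the function names and all numbers are hypothetical.

```python
def fit_line(concentrations, signals):
    """Ordinary least-squares line: signal = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the fitted line to read a sample concentration off the curve."""
    return (signal - intercept) / slope


# Hypothetical standards (pg/ml) and their measured signals (arbitrary units).
standards_pg_ml = [0.0, 10.0, 20.0, 40.0]
standards_signal = [0.05, 0.25, 0.45, 0.85]

slope, intercept = fit_line(standards_pg_ml, standards_signal)
sample_pg_ml = concentration_from_signal(0.65, slope, intercept)
print(round(sample_pg_ml, 2))
```

readings that fall outside the range of the standards should be treated with caution, since the sketch extrapolates the fitted line without any check.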
qrt-pcr was performed using established techniques and primers identified using relative expression was calculated by standard methods comparing treated and control cells using gapdh as housekeeping gene. measurement of total sulfated glycosaminoglycans (gags) was performed as previously described [ ] . briefly, the blyscan tm assay kit (biocolor ltd, newtownabbey, n. ireland) was used for detection of the sulfated gag, based on the reaction of , -dimethylmethylene blue with the sulfated oligosaccharides in the gag chains. sulfotransferase activity was determined using the universal sulfotransferase activity kit angii exposure leads to marked increase in expression of chst and chst treatment of human aortic smooth muscle cells with angiotensin (ang) ii led to marked increases in expression of chst and chst . chst increased to . times the baseline level (n= ) (p< . ), and chst increased to . times the baseline level (n= ) (p< . ) (fig. a,b) . co-treatment with candesartan, an angiotensin(at)- receptor blocker, largely, but incompletely inhibited the angii-induced increases in expression of chst and chst (fig. c, d) . consistent with the chst increases, the total sulfated glycosaminoglycans (gags) were increased, and the increase was largely inhibited by arb treatment (p< . ) ( fig. a, b ). the measured sulfotransferase activity was also increased (p< . ) and incompletely inhibited by treatment with candesartan (fig. c, d ). following treatment with angii, expression of several proteoglycans with chondroitinsulfate attachments increased. these include increases from control values for versican, syndecan- , biglycan, and perlecan (p< . ) (fig. a) . no increases were shown for chondroitin sulfate proteoglycan (cspg) , tyrosylprotein sulfotransferase (tpst) , or arylsulfatase b (arsb). treatment with candesartan largely inhibited the increases in expression of these proteoglycans (fig. b ). 
in plasma from patients with cystic fibrosis or asthma, interleukin- values were significantly greater in the plasma from the patients with cystic fibrosis and in most of the patients with asthma, in contrast to the controls (p< . ) (fig. a) [ ] . also, neutrophil arylsulfatase b activity in cf was less compared to levels in neutrophils from healthy normal controls (p< . ) (fig. b) [ ] . an inverse relationship between arsb and il- in the subjects is apparent (r = - . ) (fig. c) . previously, we reported an increase in il- mrna expression in human bronchial epithelial cells (bec) following arsb silencing, and il- was increased in the spent media of the cultured bec following arsb silencing [ ] . an overall representation of the proposed interactions between spike protein receptor binding with the ace receptor indicates increased expression of chst and chst (fig. ) . increases in these sulfotransferases leads to increased c s and cse production, which accumulate further when arsb activity is reduced. decline in arsb leads to increased expression of il- , thereby contributing to cytokine storm and exacerbation of effects of infection with sars-cov- . the enzyme n-acetylgalactosamine- -sulfatase (galactose -sulfatesulfatase; galns) is required to remove the -sulfate from cse, prior to removal of the sulfate by arsb. if galns is reduced, cse is anticipated to accumulate. in the human aortic smooth muscle cells, exposure to angiotensin ii markedly increased the mrna expression of chst and chst . in contrast, there was no increased expression of tpst , a tyrosylprotein sulfotransferase. the angiotensin-receptor blocker (arb) candesartan incompletely inhibited the increase in expression. these findings suggest that stimulation of the ace receptor by sars-cov- spike protein may also stimulate expression of these chondroitin sulfotransferases and increase abundance of the chondroitin sulfates chondroitin -sulfate and chondroitin , -disulfate. 
accumulation of chondroitin sulfates in the lung is anticipated to impair airflow and oxygenation, leading to secondary decline in arsb activity due to the requirement for oxygen for post-translational modification and activation of arsb [ ] . decline in arsb was shown to increase il- expression in bronchial epithelial cells, and il- was increased in patients with cystic fibrosis and asthma [ ] . decline in arsb has also been associated with refractoriness to oxygen therapy in patients with moderate copd [ ] . these findings indicate that increased expression of chondroitin sulfotransferases, increased abundance of chondroitin sulfates, and decline in the chondroitin sulfatase activity of arsb can contribute to the pathophysiology of covid- , with effects on cytokine storm and respiratory failure. in the human aortic smooth muscle cells, inhibition of the angii receptor by the angiotensin receptor blocker (arb) candesartan largely blocked the increases in chst and chst expression, total sulfated glycosaminoglycans, and sulfotransferase activity. the impact of arb or angiotensin converting enzyme inhibitor (acei) therapy on the expression or activity of ace in human lung cells has not been clarified, and the potential impact of arb or acei on the severity of covid- is under intensive investigation [ ] [ ] [ ] . decline in arsb was associated with increased plasma il- in patients with cystic fibrosis, and with increased il- mrna expression and increased secretion into the media of cultured human bronchial epithelial cells [ ] . since il- contributes to cytokine storm in covid- , this finding is particularly relevant to the mechanism whereby arsb decline can contribute to and accelerate covid- disease. also, genomic data indicate that mutations affecting arsb gene expression are associated with unresponsiveness to oxygen therapy in patients with moderate chronic obstructive lung disease [ ] , consistent with impaired response to oxygen therapy in covid- . 
these findings suggest that treatment with chloroquine/hydroxychloroquine might impact on clinical response to sars-cov- infection, due to inhibition of arsb and the associated increases in chondroitin -sulfation. the manifestations of lower arsb may include: impact on viral binding to respiratory tract cells in which chondroitin -sulfation is increased; impaired responsiveness to oxygen treatment; and enhanced il- production contributing to cytokine storm. recombinant human (rh) arsb is used for replacement in mucopolysaccharidosis vi (mps vi), the inherited genetic deficiency of arsb [ ] . as we await a vaccine and/or effective antiviral treatment of covid- , rharsb may be a useful, new approach to refractory hypoxia in these patients. we are mindful of the shortcomings of the data presented and hope that other investigators will further assess how chondroitin sulfates, chondroitin sulfotransferases and chondroitin sulfatases contribute to covid- pathophysiology. these investigations may lead to more effective treatment and to better understanding of how to prevent severe disease. a. plasma interleukin (il)- levels were markedly increased in cystic fibrosis (n= ) compared to the normal controls (n= ) ( . ± . pg/ml vs. . ± . pg/ml; p< . ). levels in patients with asthma (n= ) ( . ± . pg/ml) were also significantly increased. b. arsb activity in the circulating neutrophils was markedly reduced in the cystic fibrosis patients, compared to the normal controls ( . ± . vs. . ± . nmol/mg protein/h) and the patients with asthma ( . ± . nmol/mg protein/h). there is an inverse relationship between plasma il- levels and neutrophil arsb activity (r = - . ). increases were shown in versican, syndecan- , biglycan, and perlecan, but not in cspg or in tpst (not shown). b. treatment with candesartan reduced, but did not eliminate, increases in the proteoglycans. 
the overall pathway indicates that transcriptional events arise following the stimulation of the ace receptor, leading to increased expression of chst and chst . the increased sulfotransferase expression, attributable either to angii or to spike protein receptor binding, leads to increased production of c s and cse. accumulation of these sulfated glycosaminoglycans causes decline in oxygenation, leading to decline in arsb activity, and the further accumulation of c s and cse and expression of il- , and increasing respiratory insufficiency and cytokine storm. figure titles: increased expression of chst and chst following angii exposure; sulfotransferase activity and total sulfated glycosaminoglycan. inhibition of sars pseudovirus cell entry by lactoferrin binding to heparan sulfate proteoglycans murine coronavirus with an extended host range uses heparan sulfate as an entry receptor glycosaminoglycan binding motif at s /s proteolytic cleavage site on spike glycoprotein may facilitate novel coronavirus (sars-cov- ) host cell entry structural basis for human coronavirus attachment to sialic acid receptors human coronaviruses oc and hku bind to -o-acetylated sialic acids via a conserved receptor-binding site in spike protein domain a glycosaminoglycans induce conformational change in the sars-cov- spike s receptor binding domain structure, function, and evolution of coronavirus spike proteins comparison of sars-cov- spike protein binding to human, pet, farm animals, and putative intermediate hosts ace and ace receptors the proximal origin of sars-cov- assignment of coronavirus spike protein site-specific glycosylation using glycresoft. 
biorxiv increased chst follows decline in arylsulfatase b (arsb) and disinhibition of noncanonical wnt signaling: potential impact on epithelial and mesenchymal identity hypoxia reduces arylsulfatase b activity and silencing arylsulfatase b replicates and mediates the effects of hypoxia probing the oxygen-binding site of the human formylglycine generating enzyme using halide ions reduced arylsulfatase b activity in leukocytes from cystic fibrosis patients restriction of aerobic metabolism by acquired or innate arylsulfatase b deficiency: a new approach to the warburg effect cellbound il- increases in bronchial epithelial cells after arylsulfatase b silencing due to sequestration with chondroitin- -sulfate genomics and response to long-term oxygen therapy in chronic obstructive pulmonary disease reninangiotensin-aldosterone system inhibitors in patients with covid- angiotensin-converting enzyme inhibitors and angiotensin receptor blockers in patients with coronavirus disease : friend or foe? ace (angiotensin-converting enzyme ), covid- , and ace inhibitor and ang ii (angiotensin ii) receptor blocker use during the pandemic: the pediatric perspective enzyme replacement therapy in mucopolysaccharidosis vi (maroteaux-lamy syndrome) the authors acknowledge the contributions of robert danziger, md, mba, to vascular studies and hypertension mechanisms. 
key: cord- -wcn bjnw authors: ren, xianwen; wen, wen; fan, xiaoying; hou, wenhong; su, bin; cai, pengfei; li, jiesheng; liu, yang; tang, fei; zhang, fan; yang, yu; he, jiangping; ma, wenji; he, jingjing; wang, pingping; cao, qiqi; chen, fangjin; chen, yuqing; cheng, xuelian; deng, guohong; deng, xilong; ding, wenyu; feng, yingmei; gan, rui; guo, chuang; guo, weiqiang; he, shuai; jiang, chen; liang, juanran; li, yi-min; lin, jun; ling, yun; liu, haofei; liu, jianwei; liu, nianping; liu, yang; luo, meng; ma, qiang; song, qibing; sun, wujianan; wang, gaoxiang; wang, feng; wang, ying; wen, xiaofeng; wu, qian; xu, gang; xie, xiaowei; xiong, xinxin; xing, xudong; xu, hao; yin, chonghai; yu, dongdong; yu, kezhuo; yuan, jin; zhang, biao; zhang, tong; zhao, jincun; zhao, peidong; zhou, jianfeng; zhou, wei; zhong, sujuan; zhong, xiaosong; zhang, shuye; zhu, lin; zhu, ping; zou, bin; zou, jiahua; zuo, zengtao; bai, fan; huang, xi; bian, xiuwu; zhou, penghui; jiang, qinghua; huang, zhiwei; bei, jin-xin; wei, lai; liu, xindong; cheng, tao; li, xiangpan; zhao, pingsen; wang, fu-sheng; wang, hongyang; su, bing; zhang, zheng; qu, kun; wang, xiaoqun; chen, jiekai; jin, ronghua; zhang, zemin title: large-scale single-cell analysis reveals critical immune characteristics of covid- patients date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wcn bjnw dysfunctional immune response in the covid- patients is a recurrent theme impacting symptoms and mortality, yet the detailed understanding of pertinent immune cells is not complete. we applied single-cell rna sequencing to samples from covid- patients and controls to create a comprehensive immune landscape. lymphopenia and active t and b cell responses were found to coexist and associated with age, sex and their interactions with covid- . diverse epithelial and immune cell types were observed to be virus-positive and showed dramatic transcriptomic changes. 
elevation of anxa and s a in virus-positive squamous epithelial cells may enable the initiation of neutrophil and macrophage responses via the anxa -fpr and s a / -tlr axes. systemic upregulation of s a /a , mainly by megakaryocytes and monocytes in the peripheral blood, may contribute to the cytokine storms frequently observed in severe patients. our data provide a rich resource for understanding the pathogenesis and designing effective therapeutic strategies for covid- . highlights: large-scale scrna-seq analysis depicts the immune landscape of covid- ; lymphopenia and active t and b cell responses coexist and are shaped by age and sex; sars-cov- infects diverse epithelial and immune cells, inducing distinct responses; cytokine storms with systemic s a /a are associated with covid- severity. the coronavirus disease (covid- ) is an ongoing pandemic infectious disease, caused by the severe acute respiratory syndrome coronavirus (sars-cov- ). to date, it has caused around million infections and close to million deaths according to world health organization statistics as of september , , with a fatality rate as high as ~ % in specific regions. although many covid- patients are asymptomatic or experience mild or moderate symptoms, some patients progress to severe conditions and even death. it is thus of paramount importance to understand the disease mechanisms and the underlying factors associated with vulnerabilities, which are critical for controlling the pandemic and alleviating the global crisis. it is also critical to systematically investigate differences between clinical presentations (mild/moderate and severe), or between treatment outcomes (disease progression and convalescence) of patients, as they can provide important guidance for the development of effective therapeutics and vaccines. 
here we applied scrna-seq to a large cohort with individuals, including hospitalized covid- patients with moderate or severe disease, and patients in the convalescent stage, as well as healthy controls. with high-quality transcriptomics data of ~ . million single cells, we reveal that sars-cov- could infect a wider range of cell types than previous understanding, and induce distinct phenotypic changes in those infected cells. such heterogeneity of sars-cov- infection has important immunological implications as such cells exhibit distinct interaction potentials with innate and adaptive immune cells. we also observed critical changes in the peripheral blood discriminating mild/moderate from severe covid- patients in the disease progression or convalescence stages, and found their association with patient sex and age. further, our large cohort analysis provides a unique opportunity to reveal the characteristics of cytokine storms in patients, and to further illustrate the cell subpopulations that might contribute to the inflammatory responses and the hyper-inflammatory genomic signatures under sars-cov- infection. our findings may have important implications to the research, treatment, control and prevention of covid- . integrated analysis of the covid- scrna-seq data to systematically characterize the immune properties at single-cell resolution in the covid- patients, we formed a single cell consortium for covid- in china (sc ), which consisted of researchers from research institutes or hospitals from different regions of china. members of sc contributed covid- related scrna-seq data, mostly still unpublished, for a total of individuals, including patients with mild/moderate symptoms, hospitalized patients with severe symptoms, and recovered convalescent persons, as well as healthy controls ( figure a and table s ). 
while most previous studies did not discriminate whether convalescent individuals recovered from mild/moderate or severe symptoms, we divided the convalescent group into two subgroups, recovered from mild/moderate symptoms and recovered from severe symptoms, to investigate the effects of disease severity on the immune status of recovered individuals. this cohort covered a wide age range (from to years old), with the mild/moderate and severe groups having significant age differences (figure s a we applied a common set of stringent quality control criteria to ensure that the selected data were from single and live cells and that their transcriptomic phenotypes were comprehensively characterized. a total of , , high-quality single cells were ultimately obtained, with an average of , unique molecular identifiers (umis), representing , genes ( figures s d and s e ). with the large-scale of data, we obtained cell clusters, covering diverse epithelial cells in the respiratory system, megakaryocytes, mast cells, myeloid cells, and nk/t/b cells ( figure b ). such an information-rich resource (available at http://covid .cancer-pku.cn/ for quick browsing) enabled accurate annotation and analysis of these cell clusters at different resolutions ( figure c , figure s f-j and table s ), which allow the elucidation of potential molecular and cellular mechanisms underlying the pathogenesis of sars-cov- infection and differences of human immune responses for patients with distinct symptoms. notable differences could be observed in the immune compositions of healthy controls and covid- patients with mild/moderate or severe symptoms ( figure d ) or between the disease progression stages and convalescence ( figure e ) based on the t-distributed stochastic neighbor embedding (t-sne) projection. the tissue preference of each cluster was illustrated based on the ratio of observed to randomly expected cell numbers (ro/e, figure f ), partially reflecting the validity of cell clustering. 
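the ro/e tissue-preference measure described above can be sketched as follows. this is a minimal illustration that assumes the "randomly expected" cell numbers are those of a chi-square independence table (row total times column total divided by the grand total); the source does not spell out the formula, and the cluster names and counts below are hypothetical.

```python
def ro_e(counts):
    """Ratio of observed to expected cell numbers per (cluster, tissue).

    counts maps cluster -> {tissue: observed cell number}.
    Expected numbers assume cluster and tissue are independent.
    Assumes every cluster and every tissue has at least one cell,
    otherwise the expected value is zero and the ratio is undefined.
    """
    tissues = sorted({t for row in counts.values() for t in row})
    row_total = {c: sum(row.get(t, 0) for t in tissues)
                 for c, row in counts.items()}
    col_total = {t: sum(row.get(t, 0) for row in counts.values())
                 for t in tissues}
    grand_total = sum(row_total.values())
    return {
        c: {
            t: counts[c].get(t, 0) / (row_total[c] * col_total[t] / grand_total)
            for t in tissues
        }
        for c in counts
    }


# Hypothetical counts: a cluster enriched in balf and one enriched in pbmcs.
observed = {
    "plasma_b": {"balf": 30, "pbmc": 10},
    "naive_t": {"balf": 20, "pbmc": 140},
}
ratios = ro_e(observed)
print(ratios["plasma_b"]["balf"])
```

a ratio above 1 indicates that a cluster is found in a tissue more often than expected by chance, and a ratio below 1 indicates depletion.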
notably, various clusters of proliferating cd + and cd + t, and plasma b cells were more enriched in balf than pbmcs, indicating activated adaptive immune responses in the lung ( figure f ). we first analyzed the compositional changes of the broad categories of immune cells for pbmcs in different covid- patient groups. notably, the percentages of megakaryocytes and monocytes in pbmcs were elevated, particularly in severe covid- patients during the disease progression stage (figure a) the increased plasma b cells in peripheral blood appeared to be derived from active proliferation of plasmablasts and transitions from memory b cells based on the paired bcr sequencing analyses. both the extent of bcr clonal expansion and the diversity of the total bcr repertoire of these cells were significantly increased in severe covid- patients ( figure c ). plasmablast cells (b_c _mki ), characterized by high expression of mki and thus indicating a proliferative state, were elevated in the peripheral blood of severe covid- patients ( figure s a ) and shared the most clonotypes with plasma cells (figure d). the memory b cell cluster expressing high levels of cd , cd , aim , grip , and coch (b_c -cd -aim ) was the second major source of plasma b cells in the peripheral blood, which shared a large proportion of clonotypes with plasma cells and plasmablasts ( figure d ). distinct from plasma cells and plasmablasts which were mainly composed of igas and iggs, b_c -cd -aim had a higher proportion of igms ( figure e), indicating a precursor state. we applied analysis of variance (anova) to dissect the associations of compositional changes of plasma b cells with disease severity, stage (progression or convalescence), age, sex, or the interactions of these factors. we found that plasma b cells in blood were specifically associated with the disease severity of covid- , and then disease stage, but had no associations with age or sex observed ( figure f ) (takahashi et al., ). 
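to make the anova step concrete, here is a deliberately simplified sketch: a one-way anova f statistic comparing the percentage of a cell type across patient groups. the analysis described above is multi-factor (severity, stage, age, sex, and their interactions), so this one-factor version is only an illustration of the underlying variance decomposition, not the authors' model. the group values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical plasma b cell percentages for control, mild/moderate, and severe groups.
control = [1.0, 2.0, 3.0]
mild_moderate = [2.0, 3.0, 4.0]
severe = [5.0, 6.0, 7.0]

f_stat = one_way_anova_f([control, mild_moderate, severe])
print(round(f_stat, 2))
```

a p-value would be obtained by comparing the statistic with an f distribution on the corresponding degrees of freedom, which is omitted here to keep the sketch free of external dependencies.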
in fact, for the mild/moderate disease, convalescent patients harbored higher levels of plasma b cells than those in the disease progression stage. by contrast, the plasma b cell levels in convalescent patients who recovered from severe disease were significantly lower than those in the disease progression stage ( figure b ). interestingly, the precursors of plasma b cells, i.e., cells of b_c -cd -aim , appeared to be associated with sex differences ( figure g ). in females, the percentage of b_c -cd -aim cells was significantly higher than that of males ( figure g ). almost all b cell clusters were associated with disease stages, implying the importance of humoral immune response changes between disease progression and convalescence ( figure s b and table s ). in summary, plasma b cells appeared to be significantly elevated in the peripheral blood regarding either the composition, proliferation, or developmental transition from memory b cells, and were more associated with disease severity. while their precursor cells were also elevated, they were more prone to be influenced by sex differences, providing a plausible explanation for the epidemiological observations on sex differences of covid- . taken together with the observation that plasma b cells were more enriched in balf ( figure f ), these observations may suggest that humoral immune responses were actively initiated to combat sars-cov- infection and contributed to disease severity. two proliferative cd + t cell clusters were also identified, with t_cd _c -mki - ccl low characterized by high expression of sell and low ccl and t_cd _c -mki - ccl high characterized by low sell and high ccl . the counts of t_cd _c -mki - ccl high in pbmcs did not show significant differences among different covid- patients. by contrast, the t_cd _c -mki -ccl low counts were elevated in covid- patients, particularly in severe patients during the disease progression stage ( figure c ). 
similar to plasma b cells, the diversity and clonality of this cluster were both increased in severe patients with disease progression ( figure d ), indicating an expanded tcr repertoire and developmental transitions from other clusters. unlike plasma b cells whose source cluster b_c -cd -aim was increased in peripheral blood ( figure s c ), the major source cluster of proliferative cd + t cells t_c _cd −anxa was decreased in covid- patients, particularly in severe patients during the disease progression stage (figures e and f). this may partially explain the dichotomous and incomplete adaptive immunity previously observed in covid- patients (catanzaro et al., ). anova analyses revealed that different from t_cd _c -mki -ccl low ( figure g ), the percentage of t_cd _c −anxa was associated with disease severity, progression/convalescence, and sex ( figure h ). in particular, female patients generally had higher levels of t_cd _c −anxa than males ( figure h) while the decrease of γ δ t cells, mait cells, and effector memory t cells abovementioned were primarily associated with disease severity, the decreases of naive and central memory t cells were associated with the age but not sex difference of patients ( figure s d and s e and table s ). such clusters included the naive cd + cluster t_cd _c -lef , the cd + central memory cluster t_cd _c -gpr , the naive cd + cluster t_cd _c -lef , and two cd + cd + clusters t_cd _c -nr a and t_cd _c -fos. cohort also enabled us to dissect the impact of age and sex on the immune responses of covid- patients. we found that, rather than associated with t cell proliferation, age and sex are more likely associated with the abundance of naive/central memory t cells and the precursor cells of proliferative t cells, respectively, highlighting the complexity of human t cell responses to sars-cov- infection. 
our scrna-seq data also coupled with tcr/bcr repertoire sequencing and thus provided a rich resource to investigate the tcr/bcr usage of covid- patients, which is instructive for the development of anti-sars-cov- therapeutics and vaccines. we first examined whether identical tcrs or bcrs could be identified across covid- patients. we found that only a few tcrs or bcrs were shared between two patients, and no identical tcrs or bcrs were shared beyond three patients. the diversity of tcr or bcr repertoires of various t and b clusters might also be influenced by age, sex, covid- severity, and disease stages. while age was mainly associated with the abundance of only naive and central memory t cells in pbmc, anova analysis revealed that age might influence the decrease of tcr diversity in a wider range of t cells, including naive, central memory, and diverse effector memory t cells ( figure s i -j). by contrast, sex differences were mainly associated with the bcr diversity of naive and memory b cells (b_c -tcl a, b_c -ms a -cd , and b_c -sox -tnfrsf b) and the tcr diversity of a subset of effector memory cd + t cells (t_cd _c -gzmk-fos high ) ( figure s i -k). after correcting the effects of age and sex, the decrease of diversity in mait cells, naive b and cd + and cd + t cells, effector memory cd + t cells (t_cd _c - gzmk and t_cd _c -cotl ), and a few cd + cd + t cell clusters (t_cd _c - itga , t_cd _c -anxa ) remained independently associated with covid- severity ( figure s i -k), highlighting the importance of these cells in covid- . importantly, the tcr diversity of one proliferative cd + t cell cluster, i.e., t_cd _c -mki -fos, was associated with the triad interaction by disease severity, age, and sex ( figure s j ), indicating the impacts of age and sex on disease severity. 
similarly, the clonal expansion of a central memory cd + t cell cluster highly expressing aqp (t_cd _c -aqp ) was also associated with the triad interaction by disease severity, age, and sex ( sars-cov- detected in multiple epithelial and immune cell types with interferon response phenotypes the enrichment of plasma b and proliferative t cells in balf and the elevation of these cells in pbmcs of covid- patients highlighted the roles of these cells in combating sars-cov- infection. to explore potential interactions between these cells and sars- cov- infected cells, we examined the characteristics of cell types that harbored sars- cov- sequences in our dataset. we examined their expression levels in these cells ( figure c ) (netea et al., ). we found that at least a subset of those epithelial cells expressed ace and tmprss , consistent with the notion that sars-cov- employs ace and tmprss to invade these cells. interestingly, those immune cells, which did not express ace or tmprss , harbored even more viral rna sequences than the epithelial cells ( figure d ). the high viral load reassured that the detection of sars-cov- rnas in these immune cells was unlikely caused by experimental contamination. consistently, an independent scrna-seq study of covid- patients also identified sars-cov- rnas in neutrophils and macrophages from the respiratory samples of covid- patients (bost et al., ). since interferon-stimulated genes (isgs) are typically activated in virus-infected cells (schoggins and rice, ), we next examined the expression of isgs in these cells (figure e and clca , and sult b ( figure a ). these genes were enriched in pathways such as "response to virus", "response to type i interferon" and "response to hypoxia", consistent with viral infection and the subsequent respiratory distress, reflecting the host immune response via type i interferons ( figure b ). 
by contrast, the numbers of genes with significant changes after sars-cov- infection for ciliated and secretory epithelial cells were much smaller than squamous cells, and few genes showed consistent changes in all the three epithelial cell types ( figure c ). we cells with no viral detection ( figure s c ). such changes were consistent across covid- patients ( figure f ). comparison across ciliated, secretory, and squamous epithelial cells infected by sars-cov- also highlighted the dispersing tendency of ciliated cells and the interacting potentials among squamous cells themselves (figures g and h ). such interaction distinctions not only existed among epithelial cells, but also impacted their interactions with immune cells. consistent with the dispersing nature of ciliated cells in the outer compartment of the pseudo-space, no significant interactions were observed between virus+ ciliated cells and immune cells. by contrast, virus+ secretory epithelial cells showed significant interactions with neutrophils and macrophages in mild/moderate covid- patients via the scgb a -marco axis ( figures s d and s e ), but such interactions were subdued in severe covid- patients due to the down-regulation of marco in neutrophils and macrophages ( figure s f ). in severe patients, virus+ squamous cells showed significant interactions with neutrophils and macrophages via the anxa -fpr and s a /a -tlr axes ( figure i ). neutrophils and macrophages exhibiting high interacting potentials with virus+ squamous epithelial cells were also prone to be sars-cov- infected ( figure j ). as anxa -fpr and s a /a immune response. it is noteworthy that plasma b cells in balf also tended to be sars- cov- -positive and displayed close interactions with virus+ neutrophils and squamous epithelial cells via the s a /a -tlr axes ( figure l ). 
we then investigated the cell types expressing anxa , fpr , s a , s a , and tlr in both balf and pbmc across covid- patients to evaluate the possible inflammatory cascade mediated by these lr pairs. it was evident that anxa was highly expressed in a wide range of immune cells except b cells and naive t cells ( figures s a and s b ) and its receptor fpr was highly expressed in neutrophils, macrophages, and monocytes ( figures s a and s b) . interestingly, for most immune cell clusters in balf, the expression levels of anxa and fpr were down-regulated in severe covid- patients compared with those of mild/moderate covid- patients ( figure s a ). but in pbmcs, except for mait cells (t_cd _c -slc a ) and γ δ t cells (t_gdt_c -trdv ), anxa and fpr were significantly up-regulated in many cell types in severe covid- patients compared with those of mild/moderate covid- patients ( figure s b ). s a and s a were highly expressed in neutrophils, macrophages, and monocytes in covid- patients with mild/moderate symptoms and had no expression in t, b, nk, or dendritic cells ( figures s h and s b ). however, for severe covid- patients in the disease progression stage, s a and s a were significantly up-regulated in almost all cell clusters for both balf and pbmcs ( figures s h and s b ). in particular, t, b, nk, and dendritic cells had no or minimal levels of s a and s a expression in mild/moderate covid- patients ( figures s h and s b ). by contrast, in severe covid- patients, the levels of s a and s a were significantly up-regulated in t, b, nk, and dendritic cells ( figures s h and s b) , indicating a systemic inflammatory response. tlr did not exhibit significant differences in pbmcs between severe and mild/moderate covid- patients but was significantly down-regulated in certain balf monocyte and macrophage subsets ( figure s b ). 
in summary, our data indicated that sars-cov- infection in different types of epithelial cells might trigger different transcriptomic changes and thus could modulate their interactions with themselves and with immune cells. in particular, squamous epithelial cells could up-regulate anxa and s a /a after sars-cov- infection, enhancing their interactions with neutrophils and macrophages via the axes of anxa -fpr and s a /a -tlr . the systemic up-regulation of anxa , fpr , and s a /a in immune cells from peripheral blood may indicate, at least partially, the molecular mechanism of aberrant inflammation in severe covid- patients. this hypothesis is supported by a preliminary finding that small molecules targeting s a /a could inhibit compromised adaptive immune response in severe patients. furthermore, the megakaryocytes in pbmcs, followed by monocytes, exhibited higher interaction potentials with epithelial and immune cells in balf than adaptive immune cells ( figure s i ), suggesting the critical roles of these cells in the pathogenesis of covid- . megakaryocytes and monocyte subsets as critical peripheral sources of cytokine storms with our large scale scrna-seq dataset, we next sought to investigate whether any crucial cell subtypes in peripheral blood contribute to the bulk of inflammatory cytokine production. we first defined a cytokine score and inflammatory score for each cell based on the expressions of the collected cytokine genes and reported inflammatory response genes (liberzon et al., ) (table s ) , respectively, and used these two scores as indicators to evaluate the levels of inflammatory cytokine storm for each cell. we found apparent elevated expression of cytokine and inflammatory genes in patients, especially at the severe progression stage ( figures a and s a t_cd _c -slc a ) and one subtype of megakaryocytes was detected with significantly higher cytokine and inflammatory scores ( figure. s b, table s , p < . 
), indicating that these cells might be major sources of the inflammatory storm. interestingly, megakaryocytes, which have not been reported in the inflammatory response in covid- patients, may affect the functions of platelets at the disease stage, consistent with a previous study (manne et al., ). each of the hyper-inflammatory subtypes highly expressed several cytokine genes that are known to be involved in the inflammatory storm, such as ccl , il b, cxcl , ccl , ccl , il , ltb and tgfb , but with different patterns (figure b ), suggesting divergent genomic signatures of these cells. we then investigated the proportion of each of the cell subtypes in every patient and found that these hyper-inflammatory cell subtypes were in general slightly more frequent in patients at the severe stage (figure s c ). when we clustered these cell subtypes with each individual patient based on the proportions of the hyper-inflammatory cell subtypes in pbmcs, we found distinct enrichment of these cell subtypes in different groups of patients (figure c ). mono_c -cd -ccl , known to be associated with tocilizumab-responding cytokine storm (guo et al., a), was highly enriched in a subpopulation of severe onset patients likely to be accompanied by inflammatory storm (figures c and d ). the proportion of the mono_c -cd -ccl subtype was also correlated with the age of the corresponding patients (figure e). the hyper-inflammatory megakaryocytes were enriched in another group of severe onset patients, who could also be under excessive inflammatory response (figures c and d). by contrast, the mono_c -cd -hla-dpb and mono_c -cd -vcan subtypes were widely distributed in every disease stage, and the hyper-inflammatory t cells, such as the t_cd _c -gzmk-fos high subtype, showed decreased proportions in patients at the severe onset stage (figures c, d and s b ), although both monocyte subtypes exhibited increased proportions in older convalescent patients (figure e ). 
taken next, we investigated the inflammatory signatures for each hyper-inflammatory cell subtype and found unique pro-inflammatory cytokine gene expressions in each cell subtype ( figure f), suggesting diverse mechanisms by which these cell subtypes may contribute to the cytokine storm. the hyper-inflammatory mono_c -cd -ccl and megakaryocytes largely expressed more cytokines, suggesting central roles of the two cell types in driving the inflammatory storm. specifically, mono_c -cd -ccl highly expressed cxcl , tnf, il rn, il b, and ccl , which we also detected with significantly higher levels in serum from patients at the severe stage, especially those critically ill patients ( figures f and s d ). although the inflammatory megakaryocytes highly expressed the cell type identity marker genes such as ppbp (zhang et al., ), the expression level of these genes was significantly decreased in patients compared to healthy controls, indicating a loss of function of these cells after inflammatory activation ( figures f and g) . notably, the t_cd _c -tnf subtype specifically and highly expressed ifng, a pro-inflammatory cytokine highly enriched in patients at the severe onset stage also confirmed by serum cytokine detection ( figures f, g and s d) . moreover, pro-inflammatory cytokines cxcl and ifng showed significant age-dependent expressions in patients with disease progression, while no significance was observed in healthy controls ( figure h ). ppbp showed no correlation with the age in either patients or healthy controls, suggesting that the loss of function of megakaryocytes might not be age-dependent ( figure h ). to assess the dynamic changes of cytokines in covid- patients with different periods, we compared them with healthy controls for these seven hyper-inflammatory subtypes, and found that ifng, il , ccl , tnf, cxcl , cxcl , il rn, etc, were highly expressed in cells of severe patients with disease progression ( figure s e ). 
we also observed eight cell subtypes with significantly higher cytokine scores even though their inflammatory scores showed no difference to other cell clusters ( figure s b , table s , p < . ). these cell subtypes exhibited uniform and relatively low expressions of cytokine genes such as igf , txlna, scyl , ccl and il ( figure f ), likely not involved in the cytokine storm. no significant differences were observed at the serum level for these cytokines between the different groups of patients ( figure s f ). these genes specific for hyper-inflammatory cells may serve as signatures for the inflammatory storm and be helpful in deepening the understanding of covid- pathogenesis. figure a ). similar to our analysis on pbmcs, we identified five hyper-inflammatory cell subtypes, including macro_c -ccl l , the three subtypes of monocytes and the neutrophils ( figure b ), suggesting that these cell subtypes might be the major sources driving inflammatory storm in the lung tissue. neither cd + nor cd + t cells were detected with an elevated inflammatory score or the cytokine score in balf samples, which was different from those in pbmcs. each hyper-inflammatory subtype highly expressed specific cytokines; for example, macro_c -ccl l specifically expressed ccl infection rates and disease severity of covid- patients. our data, covering a wide age range and a sex-balanced covid- cohort, proved to be powerful at dissecting the associations of age and sex in the immune responses to sars-cov- infection. our data revealed an apparent involvement of age and sex in the diverse human immune responses via multiple mechanisms, at least partially reflected at the immune cell sub-cluster level. in general, plasma b and proliferative t cells were associated with disease severity, while compositional differences of the precursor cells of these adaptive immune cell types were more prone to be influenced by sex and age seemed to impact more on naive and central memory cells. 
of note, age and sex also seemed to impact the diversity of tcr/bcr repertoires for a wide range of t and b cells, which may have clinical implications. the single-cell resolution of our data also enabled us to examine the potential in vivo host cells of sars-cov- and the transcriptomic changes caused by sars-cov- infection. we observed the presence of sars-cov- rnas in multiple epithelial cell types in the human respiratory tract, including ciliated, secretory, and squamous cells. although prominent type i interferon responses could be identified in these cells, distinct transcriptomic changes appeared to be caused by sars-cov- infection. such distinctions were exhibited not only in the correlations of interferon responses and viral load, but also in the genes of specific immune relevance, including those encoding lr interactions which are pivotal to cell-cell communications. of hundreds of immune-relevant lr pairs, anxa -fpr and s a /a -tlr seemed to be critical in mediating the interactions of virus+ squamous epithelial cells and neutrophils and macrophages. although s a /a were not expressed in lymphoid cells in mild/moderate covid- patients, they were highly expressed in the t, b, and nk cells of severe patients, likely contributing to the aberrant inflammation of these patients. coincidentally, small molecule inhibitors of s a /a could reduce the aberrant inflammation and sars-cov- replication in mice (guo et al., b), supporting our findings. both s a / and fpr should be evaluated further as targets for modulating the immune responses to sars-cov- . in addition to epithelial cells, rnas of sars-cov- were also identified in various immune cell types, including neutrophils, macrophages, plasma b cells, t and nk cells, often at even higher levels than those in epithelial cells. the viral infection status of these cells could also be supported by the prominent interferon responses in these cells. 
it is still not clear how such immune cells would acquire viral sequences in the absence of either ace or tmprss , but it is evident that the pattern of sars-cov- infection is more complicated than initially understood. such complexity needs to be thoroughly addressed before this dreadful infectious disease can be effectively controlled. the rich information of our data also allowed us to dissect the cellular origins of potential cytokine storms. we found that megakaryocytes and a few monocyte subsets might be key sources of a diverse set of cytokines highly elevated in covid- patients with severe disease progression. we suspect that in severe patients, infected epithelial cells would secrete cytokines such as il rn into the periphery, and monocytes expressing il r could be stimulated and in turn produce multiple proinflammatory cytokines such as cxcl , il , il b, and tnf (figure e ). through il r , these hyperactive monocytes could also interact with dysfunctional megakaryocytes producing tgfb , tnfsf , pf and fth . meanwhile, the t cells in the blood are depleted (lymphopenia), while the residual ones are hyperactive in secreting many inflammatory cytokines such as ifng and tnfsf . such proinflammatory cytokines secreted by the cells in the blood could also infiltrate into the lung tissue, thus activating the tissue resident monocytes, macrophages and neutrophils for further cytokine production. we acknowledge that this is only one of many possible scenarios where an inflammatory storm could form, although our data revealed key actors in the final cytokine screenplay. in conclusion, we generated a large scrna-seq dataset including ~ violin plots of selected marker genes (rows) for major cell subpopulations (columns) ordered by cell lineage relationships. 
nk, natural killer cells; mono, monocyte; macro e) t-sne representations of integrated single-cell transcriptomes of , , cells coloured by disease symptoms (d) and disease progression stages (e) see also figure s dynamic changes of b cell composition across disease conditions (a) differences in immune cell composition across disease conditions for pbmc. conditions are shown in different colors. each boxplot represents one cell cluster. all differences with adjusted p-value < . are indicated; two-sided unpaired wilcoxon was used for analysis. (b) changes of xbp + plasma cells proportion across disease conditions. composition of xbp + plasma cell bcrh cgene differences of xbp + plasma cells clonal expansion and bcr diversity across disease conditions. bcr clonal expansion level is calculated by startrac-expa. shannon's entropy reveals the diversity of bcr repertoire. all differences with p-value < . are indicated; two-sided unpaired wilcoxon was used for analysis clonotypes of clones contain xbp + plasma cells (right); only shows clones with more than cells. (e) composition of b_c -cd -aim memory cells bcrh cgene. (f) anova of xbp + plasma cells proportion. (g) anova of b_c -cd -aim memory cells proportion (left) and differences of b_c - cd -aim memory cells proportion between male and female (right) see also figure s differences in t cell composition across disease conditions (a) changes of proliferating cd and cd t cells across disease conditions for pbmc all differences with adjusted p-value < . are indicated; two-sided unpaired wilcoxon was used for analysis differences of three cd proliferating t cell sub clusters proportion across disease conditions. all differences with p-value < . are indicated; two-sided unpaired wilcoxon was used for analysis differences of two cd proliferating t cell sub clusters proportion across disease conditions. all differences with p-value < . 
are indicated; two-sided unpaired wilcoxon was used for analysis differences of t_cd _c -mki -ccl low proliferating cells clonal expansion and tcr diversity across disease conditions. tcr clonal expansion level is calculated by startrac-expa. shannon's entropy reveals the diversity of bcr repertoire. all differences with p-value < . are indicated; two-sided unpaired wilcoxon was used for analysis transition between t_cd _c -mki -ccl low proliferating cells and other cd cell sub clusters (left) and clonotypes of clones contain t_cd _c -mki -ccl low proliferating cells (right) all differences with p-value < . are indicated; two-sided unpaired wilcoxon was used for analysis. (g) anova of t_cd _c -mki -ccl low proliferating cells proportion. (h) anova of t_c _cd -anxa t cell proportion (left) and differences of anxa t cell proportion between male and female (right). two-sided unpaired wilcoxon test see also figure s figure landscape of cell types detected sars-cov- sequences and their antiviral response uniform manifold approximation and projection (umap) of all cells with sars-cov- genome umi > after quality control containing cells in total characteristic markers we chose to identify each cell type. the purple box indicates immune cell types (top), and the red one indicates epithelial cell types (bottom) umap showing expression level of known sars-cov- infected receptor ace (left) and tmprss (right) umap showing the viral load of each cell. the darker colors in the bar indicate a higher viral load in cells umap showing the activation of interferon-stimulated genes (isgs) in cells with viral detection violin plots showing differential expression of isgs between cells with viral detection (virus+) and cells without (virus-) in pbmc-derived neutrophils (left panel), balf-derived neutrophils (middle panel) and squamous cells (right panel). 
the y axis represents the expression level of each gene scatter plots showing the correlation between viral load and isgs in neutrophil (left panel) and squamous cells (right panel). the line in scatter plots represent the result of linear regression. each point in the graph represents one single cell the x axis shows virus load in each cell while the y axis represents the expression level of one of the isg genes. correlation coefficient (r) and probability (p) are acquired using pearson see also figures s and tables s boxplot showing the pseudo space distance within squamous cells. each dot represents an individual patient. two-sided paired wilcoxon test violin plot showing the pseudo space distance within each type of epithelial cells in one example. two-sided unpaired wilcoxon test boxplot showing the median of pseudo space distance within each type of epithelial cells of all the patients with balf data. each dot represents an individual patient. two- sided unpaired wilcoxon test boxplot showing the normalized connection between squamous cells and virus-detected plasma b cells of all the patients with balf data. each dot represents an individual patient. two-sided unpaired wilcoxon test pie chart showing the ligand-receptor contribution proportion between virus+ squamous and macro_c -vcan in one example. ligand-receptor pairs with contribution less than . were merged as 'other lrs boxplot showing the normalized connection between squamous cells and virus- macrophage (left), virus+ macrophage (middle) and virus+ neutrophils (left) mono_c -cd -ccl and megakaryocytes in peripheral blood appear as dominant source for inflammatory cytokine storm a) t-sne plots of pbmc cells colored by major cell types (top left panel), inflammatory cell types (top right panel), cytokine score (middle panel) and inflammatory score violin plots of selected cytokine genes for seven hyper-inflammatory cell subtypes. 
(c) heatmap of an unsupervised clustering of cell proportion of seven hyper-inflammatory cell subtypes in all samples analyzed. (d) box plots of the cell proportion of mono_c -cd -ccl , mega and t_cd _c -gzmk- fos high clusters from healthy controls (n= ), moderate convalescent (n= ), moderate onset (n= ), severe convalescent (n= ) and severe onset (n= ) patients ordinary least squares model of age to cell proportion of mono_c -cd -ccl convalescent (n= ) and onset (n= ) patients. p value was assessed with f- statistic for ordinary least squares model. (f) heatmap of cytokines genes' expression among seven hyper-inflammatory cell subtypes. seven hyper-inflammatory cell subtypes are colored in red and others are colored in grey. (g) box plots of the cytokines t_cd _c -tnf clusters from healthy controls (n= ), moderate convalescent (n= ), moderate onset (n= ), severe convalescent (n= ) and severe onset (n= ) patients. two-sided wilcoxon rank-sum test ordinary least squares model of age to cytokines' expression of mono_c -cd -ccl mega and t_cd _c -tnf clusters from healthy controls (n= ), convalescent (n= + ) and onset (n= + ) patients. p value was assessed with f-statistic for ordinary least squares model the box represents the second, third quartiles and median, whiskers each extend . times the interquartile range; dots represent outliers mono_c , mono_c , mono_c , t_cd _c , t_cd _c , t_cd _c and mega correspond to mono_c -cd -ccl , mono_c -cd -hla-dpb t_cd _c -slc a , t_cd _c -tnf and mega, respectively. in panel (f), t_cd _c , t_cd _c , t_cd _c , t_cd _c t_cd _c -cotl , t_cd _c -znf t_cd _c -il rb and nk_c -fcgr a, respectively. dc, dendritic cells. mega, megakaryocytes. mono, monocytes figure . the interactions of hyper-inflammatory cell subtypes in lung and peripheral blood a) t-sne plots of balf cells colored by major cell types (top panel), cytokine score (middle panel) and inflammatory score significance was evaluated with wilcoxon rank-sum test. **** p < . . 
(c) heatmap of an unsupervised clustering of cytokine genes' expression among five hyper-inflammatory cell subtypes the outer ring displays color coded cell types and the inner ring represents the involved ligand-receptor interacting pairs. the line width and arrow width are proportional to the log fold change between severe onset and moderate onset patient groups in ligand and receptor, respectively. colors and types of lines are used to indicate different types of interactions as shown in the legend. the bar plot at bottom indicates the interaction score for each interaction summary illustration depicting the potential cytokine/receptor interactions of hyper- inflammatory cell subtypes involved in the cytokine storm epi, epithelial cells. macro, macrophage cells. mono, monocytes. neu, neutrophils basic information of the dataset quality and cell subsets in major cell lineages, related to figure (a) sorted age span of donors color-coded by disease symptoms. (b) distribution of sex in donors with different disease symptoms. chi-square test. (c-e) distribution of unique molecular identifier (umi) count per cell (c), gene count per cell (d), and percentage of mitochondrial transcripts per cell (e) detected for cells pleural effusion/sputum. (f-j) violin plots of selected marker genes (rows) for cell subsets (columns) within each cell lineage, including b/plasma b cell clusters (f), myeloid cell clusters (g), nk cell clusters (h), epithelial cell clusters (i) and t cell clusters (j). (a) d pseudo space calculated by csomap, showing the location of ciliated cells violin plot showing the distance calculated from space shown in (a) within each ciliated cell group. two-sided unpaired wilcoxon test violin plot showing the distance within each squamous cell group. two-sided unpaired wilcoxon test bar plot showing the mean of normalized connections of the interaction between virus+ secretory and macro_c -c qc in patients categorized by two states. error bar, s.e.m. 
(e) pie chart showing the ligand-receptor contribution proportion between virus+ secretory and macro_c -vcan in one example dotplot showing the mean expression level of marco in balf samples. pct, percentage of expressed cells boxplot of normalized connection between major cell types and ciliated (top), secretory (middle) and squamous (bottom) cells with viral detection. kruskal-wallis rank sum test. (h) dot plots showing the expression of s a (left) and s a (right) in pbmc samples. each dot is colored by the means of the expression and sized by the scaled means (z scores) boxplot of normalized connection between pbmc-derived cell types and balf. each dot represents a sample. kruskal-wallis rank sum test. (a) dot plots showing the expression of anxa (top), fpr (middle) and tlr (bottom) in pbmc samples dot plots showing the expression of anxa (first panel), fpr (second panel) s a (third panel), s a (fourth panel) and tlr (bottom panel) in balf samples. each dot is colored by the means of the expression and sized by the scaled means identification of hyper-inflammatory subtypes associated with cytokine storm in pbmcs (a) t-sne plots of pbmc cells colored by cytokine score (top panel) and inflammatory score severe onset (color, n= ) and average of all samples (n= ) (top panel); the inflammatory score (middle panel) and cytokine score (bottom panel) of subtypes from healthy controls (n= ), moderate convalescent (n= ), moderate onset (n= ), severe convalescent (n= ) and severe onset (n= ) patients. significance was evaluated with mann-whitney rank test for each subtype versus all the other subtypes. **** p < . . 
(c) barplots of subtypes' (seven hyper-inflammatory cell types, eight cytokine cell types and others) frequencies across each individual samples from healthy controls (n= ) bar graphs showing cytokine concentration at the serum levels of ccl , ifng, il rn and tnf from healthy controls (n= ), convalescent (n= ), non-severe (n= ), severe (n= ), death case (n= ) patients the triangle represents severe onset versus healthy controls. circle stands for moderate onset versus healthy controls. the square stands for convalescent versus healthy controls. all rings in the plot from the inside to the outside represent the range of p value bar graphs showing cytokine concentration at the serum levels of ccl and il from healthy controls (n= ), convalescent (n= ), non-severe (n= ), severe (n= ), death case (n= ) patients. kruskal-wallis h-test between non-severe, severe and death case. in panel (d) and panel (f), all points are shown and bars represent mean with the % confidence intervals. dc, dendritic cells intercellular interaction alterations among cell types between severe and moderate onset sample groups circos plot showing the prioritized interactions mediated by ligand-receptor pairs between inflammation-related cell subtypes for each tissue, namely the outer ring displays color coded cell types and the inner ring represents the involved ligand-receptor interacting pairs. the line width and arrow width are proportional to the log fold change between severe onset and moderate onset patient groups in ligand and receptor, respectively. colors and types of lines are used to indicate different types of interactions as shown in the legend. the barplot at bottom indicates the interaction score for each ligand-receptor interaction epi, epithelial cells. macro, macrophage cells. mono, monocytes. neu, neutrophils. mega, megakaryocytes yan, x., li, f., wang, x., yan, j., zhu, f., tang, s., deng, y., wang, h., chen, r., yu, z., et al. ( ) . 
neutrophil to lymphocyte ratio as prognostic and predictive factor in patients with coronavirus disease : a retrospective cross-sectional study. j med virol. yang, y., shen, c., li, j., yuan, j., wei, j., huang, f., wang, f., li, g., li, y., xing, l., et al. ( ). plasma ip- and mcp- levels are highly associated with disease severity and predict the progression of covid- . j allergy clin immunol , - e . yu, g., wang, l.g., han, y., and he, q.y. ( ) years old) and the elderly ( + years old). interactions between variables were regarded as significantly associated with cell type proportions when fdr < . . to investigate the impact of virus infection on epithelial cells, we identified differentially expressed genes by performing two-sided unpaired wilcoxon tests on all the expressed genes (expressed in at least % of cells in either group of cells). p values were adjusted using the benjamini-hochberg procedure. the top highly expressed genes of each group are shown in the volcano plots. based on these genes, enriched go terms were then acquired for each group of cells using the r package clusterprofiler (yu et al., ). to illustrate the cell-cell interaction potential of cells with viral detection, we first created a set of datasets by joining balf samples with the virus+ dataset separately. then, we used csomap (ren et al., ) to construct a d pseudo space and calculate the significant interactions for each dataset. to investigate the interaction potentials of the cell types, we used two indices: distance within cell type and normalized connection. distance within each cell type is calculated based on the aforementioned d coordinates. the shorter the distance, the closer the cells are located in the d space, which indicates that they are more likely to interact with each other. to further investigate the interaction between different cell types, we made use of the csomap output connection matrix. 
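as an illustration of the "distance within cell type" index, the following is a minimal sketch and not the authors' csomap-based implementation. it assumes that each cell already has three pseudo-space coordinates and that the pairwise euclidean distances of a cell type are summarized by their median; the summary statistic, the function names and the toy coordinates are all assumptions introduced here.

```python
import math
from itertools import combinations
from statistics import median


def within_type_distances(coords):
    """Return all pairwise Euclidean distances among the given 3D points."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def median_within_type_distance(cells):
    """Map each cell type to the median pairwise distance of its cells.

    `cells` is a list of (cell_type, (x, y, z)) tuples. Cell types with
    fewer than two cells are skipped because no pair exists.
    The median is an assumed summary; the source does not name one.
    """
    by_type = {}
    for cell_type, xyz in cells:
        by_type.setdefault(cell_type, []).append(xyz)
    return {
        cell_type: median(within_type_distances(points))
        for cell_type, points in by_type.items()
        if len(points) >= 2
    }


# Toy coordinates (hypothetical, for illustration only).
toy_cells = [
    ("squamous", (0.0, 0.0, 0.0)),
    ("squamous", (0.0, 0.0, 1.0)),
    ("squamous", (0.0, 1.0, 0.0)),
    ("ciliated", (0.0, 0.0, 0.0)),
    ("ciliated", (3.0, 4.0, 0.0)),
]
summary = median_within_type_distance(toy_cells)
```

in this toy input the squamous cells sit closer together than the ciliated cells, so under the interpretation given in the text they would be considered more likely to interact with each other.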
for a cluster pair, normalized connection was calculated by dividing its corresponding connection value by the product of their respective cell numbers. normalized connections were then multiplied by , . meanwhile, to highlight the key ligand-receptor pairs functioning in the interaction, we also examined the contribution values output by csomap. in addition, normalized connections were also calculated on another set of cohorts where we combined the virus+ dataset with samples with paired pbmc and balf tissues, in order to investigate the interaction potential between cells from the two tissues, pbmc and balf. inflammatory and cytokine score related subtypes analysis. briefly, we first filtered out samples with fewer than cells. for pbmc, only subtypes with more than cells were included in the subsequent analysis. for balf data analysis, we removed major cell types with fewer than cells. to define the inflammatory and cytokine scores, we downloaded a gene set termed 'hallmark_inflammatory_response' from msigdb (pmid: ) and collected cytokine genes based on these references (see table s ). cytokine and inflammatory scores were evaluated with the addmodulescore function built into seurat (pmid: ). to select the most promising hyper-inflammatory cell types, we performed a one-tailed mann-whitney rank test for each subtype's score versus all the other subtypes' scores. seven subtypes (mono_c -cd -ccl , mono_c -cd -hla-dpb , mono_c -cd -vcan, t_cd _c -gzmk-fos high , t_cd _c -tnf, t_cd _c -slc a and mega) in pbmc were defined as hyper-inflammatory cell types because they showed statistically significant elevation (p < . ) in both cytokine and inflammatory scores. 
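the normalized connection defined earlier in this section (connection value divided by the product of the two clusters' cell numbers, then multiplied by a constant) can be sketched as follows. this is a minimal illustration rather than the authors' code: the dictionary layout, the function name and the toy numbers are assumptions, and because the multiplication constant is not recoverable from the text it is left as a required `scale` argument.

```python
def normalized_connection(connection, n_cells, scale):
    """Normalize raw cluster-to-cluster connection values.

    connection: dict mapping (cluster_a, cluster_b) -> raw connection value
    n_cells:    dict mapping cluster name -> number of cells in that cluster
    scale:      multiplication constant (value not specified here; assumed
                to be supplied by the caller)
    """
    normalized = {}
    for (a, b), value in connection.items():
        denominator = n_cells[a] * n_cells[b]
        if denominator == 0:
            # No cells in one of the clusters: the ratio is undefined.
            continue
        normalized[(a, b)] = scale * value / denominator
    return normalized


# Toy values (hypothetical, for illustration only).
raw = {("squamous", "neutrophil"): 30.0}
counts = {"squamous": 10, "neutrophil": 20}
result = normalized_connection(raw, counts, scale=100.0)
```

dividing by the product of cell numbers removes the trivial effect that larger clusters accumulate more connections, so pairs of clusters of different sizes become comparable.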
in addition, we defined subtypes (t_cd _c -il rb, t_cd _c -gnly, nk_c -fcgr a, t_cd _c -znf , t_cd _c -cotl , t_cd _c -tyrobp, t_cd _c -gzmk and t_gdt_c to identify and visualize the possible cell-cell interactions in terms of cytokine storm between the highly inflammation-correlated cell types evaluated by the inflammation score within each tissue and the crosstalk between lung and circulating blood, we employed an r package italk introduced by wang et al. (wang et al., , biorxiv, https://www.biorxiv.org/content/ . / v ). cytokine/chemokine category (n = ) in the ligand-receptor database was selectively used for our purpose. wilcoxon rank sum test was used to identify the differentially expressed genes (degs) between severe onset and moderate onset patient groups for each cell type. degs were then matched and paired against the ligand-receptor database to construct a putative cell-cell communication network. an interaction score defined as the product of the log fold change of ligand and receptor was used to rank these interactions. in addition, the expression level of both ligand and receptor were also considered. we defined severe gained interaction if a ligand gene was upregulated in severe onset group and its paired gene upregulated or remains no change. we defined severe lost interaction if a ligand(receptor) gene was downregulated in severe onset group regardless of the expression level of its paired gene. cytokine analysis of serum by using multiplex bead-based immunoassay human cytokines in the serum were measured by bio-plex pro tm human cytokine screening plex bio-plextm system (# , bio-rad, us) and bio-plextm system (bio-rad). 
bio-plex protm assays are essentially immunoassays formatted on magnetic beads and are built upon three core elements of xmap technology: fluorescently dyed microspheres (also called beads), a dedicated flow cytometer with two lasers to measure the different molecules bound to the surface of the beads, and a high-speed digital signal processor that efficiently manages the fluorescence data. sample preparation whole blood from covid- patients and healthy controls was drawn into collection tubes containing anticoagulant. the tubes were centrifuged at , x g for min at °c and the serum was transferred to a clean polypropylene tube, followed by another centrifugation at , x g for min at °c to completely remove platelets and precipitates. the raw sequencing and processed gene expression data in this paper have been deposited into gsa (genome sequence archive in big data center, beijing institute of genomics, chinese academy of sciences) and the ncbi geo database, respectively. visualization of this dataset can be found at http://covid .cancer-pku.cn. key: cord- -fqrn lr authors: lee, jimmy; hughes, tom; lee, mei-ho; field, hume; rovie-ryan, jeffrine japning; sitam, frankie thomas; sipangkui, symphorosa; nathan, senthilvel k.s.s.; ramirez, diana; kumar, subbiah vijay; lasimbang, helen; epstein, jonathan h.; daszak, peter title: no evidence of coronaviruses or other potentially zoonotic viruses in sunda pangolins (manis javanica) entering the wildlife trade via malaysia date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fqrn lr the legal and illegal trade in wildlife for food, medicine and other products is a globally significant threat to biodiversity that is also responsible for the emergence of pathogens that threaten human and livestock health and our global economy. trade in wildlife likely played a role in the origin of covid- , and viruses closely related to sars-cov- have been identified in bats and pangolins, both traded widely. 
to investigate the possible role of pangolins as a source of potential zoonoses, we collected throat and rectal swabs from sunda pangolins (manis javanica) confiscated in peninsular malaysia and sabah between august and march . total nucleic acid was extracted for viral molecular screening using conventional pcr protocols used to routinely identify known and novel viruses in extensive prior sampling (> , mammals). no sample yielded a positive pcr result for any of the targeted viral families: coronaviridae, filoviridae, flaviviridae, orthomyxoviridae and paramyxoviridae. in light of recent reports of coronaviruses including a sars-cov- related virus in sunda pangolins in china, the lack of any coronavirus detection in our ‘upstream’ market chain samples suggests that these detections in ‘downstream’ animals more plausibly reflect exposure to infected humans, wildlife or other animals within the wildlife trade network. while confirmatory serologic studies are needed, it is likely that sunda pangolins are incidental hosts of coronaviruses. our findings further support the importance of ending the trade in wildlife globally. the legal and illegal trade in wildlife for consumption as food, medicine and other products is a globally significant threat to conservation (smith et al., ; nayar, ; rosen and smith, ). it also drives the emergence of pathogens that threaten human and domestic animal health, and national and global economies (lee and mckibbin, ; smith et al., ; smith et al., ). this includes the severe acute respiratory syndrome (sars) outbreak caused by sars coronavirus (sars-cov), which originated in the large wet markets of guangdong province, china (ksiazek et al., ), and the current covid- outbreak caused by sars-cov- , first discovered in people associated with a wet market in wuhan (zhu et al., ). 
both viruses likely originated in bats, with sars-cov infecting civets and other small mammals in the markets which may have acted as intermediate or amplifying hosts (guan et al., ; wang and eaton, ). the finding of furin cleavage insertions in the spike (s) protein sequence in the sars-cov- genome has led some to suggest that intermediate hosts may have been involved in the emergence of covid- (andersen et al., ); however, no intermediate hosts have so far been conclusively identified. recently, four different groups have identified coronaviruses in imported sunda or malayan pangolins (manis javanica) seized in raids on wildlife traders in china (liu et al., ; lam et al., ; xiao et al., ; zhang et al., ). the genomes of these are closely related to sars-cov- , particularly in some genes, including the s-gene responsible for binding to host cells, although some bat-covs have higher overall sequence identity to sars-cov- (latinne et al., ). authors of these papers propose that further sampling of pangolins might help elucidate the potential role of pangolins in the evolution of sarsr-covs, the emergence of covid- , and the risk of future zoonotic viral emergence (lam et al., ; xiao et al., ). over a ten-year period, as part of the usaid predict project (predict consortium, ; predict consortium, ), we collected biological samples from confiscated and rescued sunda pangolins in their country-of-origin: peninsular malaysia and the malaysian state of sabah on the island of borneo. the aims of this study were to identify the phylogeographic origins of confiscated pangolins and any potentially zoonotic pathogens associated with them (karesh ). here we report on the results from pathogen surveillance and discovery screening of these pangolin samples. 
sunda pangolins (manis javanica) were either confiscated from smugglers or rescued from the wild between august and march , and were in the possession of the department of wildlife and national parks peninsular malaysia, or sabah wildlife department at the time of sampling. most confiscations occurred near national borders or ports. the animals were reported to be destined for other southeast asian countries en route to china, and were usually found in sacks or crates in temporary holding facilities, or in vehicles. the wild-rescued sunda pangolins were all surrendered by members of the public who found them in their native habitats. all pangolins were alive during the sampling process. based on their weakened condition, the animals had been in captivity for varying lengths of time, usually many weeks or months, although there was no way of confirming this. the sampling protocol was approved by uc davis institutional animal care and use committee (protocol number: ). each pangolin was assigned a unique identification code, and gps coordinates of the confiscation or rescue locations, biometric measurements and physical health check information were recorded. swab samples were collected from the throat and rectum using a sterile non-absorbent mini-tip polyester swab (puritan, guilford, usa) placed in a cryotube containing μl of trizol reagent (invitrogen, carlsbad, usa). all samples were stored immediately in a liquid nitrogen dewar mve doble (chart biomed, ball ground, usa) at the sampling site and transferred to a - °c freezer for long term storage. total nucleic acid was extracted for viral molecular screening using the nuclisens easymag or minimag system according to the manufacturer's protocol with validated modifications (biomérieux, marcy l'etoile, france). 
complementary dna (cdna) of each sample was generated with random hexamers using the superscript iii first-strand synthesis system for reverse transcription pcr, according to the manufacturer's protocol (invitrogen, carlsbad, usa). the cdna was used in conventional pcr protocols screening five viral families: coronaviridae, filoviridae, flaviviridae, orthomyxoviridae and paramyxoviridae (table ). the pcrs were conducted in a veriti or simpliamp thermal cycler (applied biosystems, foster city, usa). reactions were carried out in a final volume of µl using µl of the cdna product as template and either the fast cycling pcr kit or hotstartaq plus master mix (qiagen, hilden, germany), with a final concentration of . µm for each primer, following the manufacturer's protocols. predict universal controls and (anthony et al., ), and specific controls for filovirus (one health institute laboratory, university of california, davis) and influenza liang pcr (liang et al., unpublished) were used. peninsular malaysia and sabah samples were screened on separate occasions at two different certified bsl biocontainment level laboratories using standardised methods. pcr products were loaded and run on % agarose gels at - v for - minutes with . x tris-acetate-edta buffer (vivantis technologies sdn. bhd., subang jaya, malaysia). the gels were viewed on a transilluminator and expected size bands were excised and stored in separate microcentrifuge tubes, and the corresponding post pcr mixes were used as a template for contamination control pcrs to check for contamination from the universal positive controls. the contamination control pcr products were run under the same gel electrophoresis conditions; those without the expected size bands showed that there was no contamination from the controls. 
products from the initial pcr of these samples were then purified using the ultrafree-da centrifugal filter units (millipore, cork, ireland); the purified products were cloned using the dual-colour selection strataclone pcr cloning kit according to the manufacturer's protocol (stratagene, la jolla, usa). up to eight colonies containing the pcr product were selected and inoculated on luria bertani agar slants, individually. grown colonies were sent to a commercial company for direct colony sequencing. a total of sunda pangolins were screened: in peninsular malaysia (confiscated n= ; wild-rescued n= ) (tables a and b), and in sabah state (confiscated n= ; wild-rescued n= ) (tables c and d). no sample yielded a positive pcr result for any member of the targeted virus families, either in peninsular malaysia ( % ci . - . ) or in sabah ( % ci . - . ). all positive controls were successfully amplified, confirming that the pcrs were performing properly. our negative findings across five viral families associated with emerging and re-emerging zoonotic diseases in recent decades contrast with reports of the detection of parainfluenza virus (wang et al., ), coronaviruses and sendai virus (liu et al., ; zhang et al., ), and sarsr-covs (lam et al., ; xiao et al., ) in sunda pangolins. our sample size is substantial, particularly given the rarity of these animals in malaysia: the international union for conservation of nature (iucn) lists the sunda pangolin (manis javanica) as 'critically endangered', as a result of poaching, smuggling and habitat loss (iucn, ). our previous studies of bat coronaviruses revealed - % pcr prevalence (yang et al., ; anthony et al., ; hu et al., ; latinne et al., ), suggesting that even at the upper limit of the % confidence interval, our negative findings in pangolins are inconsistent with endemic coronavirus infection at a population level. serologic studies are needed to support this contention. 
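The argument above rests on the upper confidence limit for prevalence when zero of n animals test positive. A minimal sketch of one standard way to compute such a limit is given below, using the exact (Clopper-Pearson) binomial bound, which for zero positives reduces to 1 - (alpha/2)^(1/n). Whether the authors used this particular interval is an assumption, and the sample sizes and the reference prevalence in the example are hypothetical because the paper's figures are not recoverable from the text.

```python
def zero_positive_upper_bound(n, confidence=0.95):
    """Exact two-sided (Clopper-Pearson) upper confidence limit for a
    proportion when 0 positives are observed among n independent samples.

    With zero positives the lower limit is 0 and the upper limit solves
    (1 - p)^n = alpha / 2, giving p = 1 - (alpha / 2)^(1 / n).
    """
    if n <= 0:
        raise ValueError("n must be a positive sample count")
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


def inconsistent_with_prevalence(n, reference_prevalence, confidence=0.95):
    """True if the upper limit for 0/n lies below a reference prevalence."""
    return zero_positive_upper_bound(n, confidence) < reference_prevalence


# Hypothetical sample size and reference prevalence, for illustration only.
upper = zero_positive_upper_bound(100)
below_reference = inconsistent_with_prevalence(100, reference_prevalence=0.05)
```

The bound shrinks as n grows, which is why a large all-negative sample can argue against endemic infection even though it can never prove absence.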
while our sampling was necessarily opportunistic (given the conservation status and the cryptic nature of the species) and sampling intensity varied, our negative findings over ten years and at multiple locations support the veracity of the findings. the most parsimonious explanation for the contrast between our findings and the discovery of sarsr-covs in sunda pangolins by (liu et al., ; liu et al., ; lam et al., ) is the nature of the sampled population: our samples were drawn from an 'upstream' cohort of animals yet to enter or just entering the illegal trade network, whereas all others were drawn from 'downstream' cohorts confiscated at their destination in china. during the wildlife trade transits, which often include movement through other southeast asian countries, animals are often housed together in groups from disparate geographic regions, and often with other species, giving opportunity for viral transmission among and within species. the housing of some of the animals in rehabilitation centers in china would also allow for exposure to coronaviruses from other groups or species. in natural wildlife reservoir hosts, sarsr-covs appear to cause few if any clinical signs, and this is supported by the limited laboratory infections so far carried out (watanabe et al., ). the reports of clinical illness and pathology associated with coronavirus infection in pangolins (liu et al., ; xiao et al., ) are unlikely in a reservoir host. we therefore conclude that the detections of sars-cov- related viruses in pangolins are more plausibly a result of their exposure to infected people, wildlife or other animals after they entered the trade network. thus, the likelihood is that sunda pangolins are incidental rather than reservoir hosts of coronaviruses as claimed by zhang et al., ( ). 
our microsatellite dna fragment analysis (manuscript in preparation) suggests that confiscated pangolins from peninsular malaysia and sabah were taken from malaysia, brunei or indonesia; however, further analysis of pangolins from the neighbouring countries is required to confirm the results. they were confiscated at holding facilities, ports or borders prior to shipment, and had not yet been exposed to multiple potential sources of infection, unlike the confiscated animals in china reported by xiao et al., ( ) and lam et al., ( ). an array of pathogens and infections have been observed in wet markets, in wildlife (dong et al., ; cantlay et al., ), in humans (xu et al., ) and in domestic animals (karesh et al., ). in comparison to wildlife screened from the wild (poon et al., ) and from farms (tu et al., ; kan et al., ), wildlife in markets have a much higher chance of exposure to pathogens and disease spillover. these findings highlight the importance of carefully and systemically ending the trade in wildlife and improving biosecurity to avoid having wet markets where wild animals are mixing with farmed animals and humans. our findings suggest that pangolins that have not entered the illegal wildlife trade pose no threat to human health. while the detection of sars-cov- like viruses in some trade-rescued pangolins suggests a parallel with traded civets (paguma larvata) in the emergence of sars-cov (guan et al., ), any role as an intermediate host in the transmission of sars-cov- from a putative natural bat host to humans is yet to be established. serological studies in pre-trade pangolins will shed further light on any role of pangolins as hosts of sars cov -related viruses. all pangolin species face known and significant threats to their survival in nature and require active conservation efforts to ensure their enduring existence for future generations. 
the proximal origin of sars-cov-
emergence of fatal avian influenza in new england harbor seals
laboratory protocols for predict surveillance
global patterns in coronavirus diversity
a review of zoonotic infection risks associated with the wild meat trade in malaysia
detection of a novel and highly divergent coronavirus from asian leopard cats and chinese ferret badgers in southern china
isolation and characterization of viruses related to the sars coronavirus from animals in southern china
discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus
captive breeding of pangolins: current status, problems and future prospects
the iucn red list of threatened species
molecular evolution analysis and geographic investigation of severe acute respiratory syndrome coronavirus-like virus in palm civets at an animal market and farms
wildlife trade and global disease emergence
predict: surveillance and prediction for emerging pathogens of wildlife
a novel coronavirus associated with severe acute respiratory syndrome
identifying sars-cov- related coronaviruses in malayan pangolins
origin and cross-species transmission of bat coronaviruses in china
estimating the global economic costs of sars. institute of medicine (us) forum on microbial threats. learning from sars. preparing for the next outbreak. workshop summary
viral metagenomic revealed sendai virus and coronavirus infection of malayan pangolins (manis javanica)
are pangolins the intermediate host of the novel coronavirus ( -ncov)
a real-time rt-pcr method for the universal detection and identification of flaviviruses
wildlife trade threatens southeast asia's rare species
identification of a novel coronavirus in bats
predict & predict surveillance
identification of a severe acute respiratory syndrome coronavirus-like virus in a leaf-nosed bat in nigeria
summarizing the evidence on the international trade in illegal wildlife
evidence for the role of infectious disease in species extinction and endangerment
drowning in unidentified fishes: scope, implications, and regulation of live fish import. conservation letters
reducing the risks of the wildlife trade
sensitive and broadly reactive reverse transcription-pcr assays to detect novel paramyxoviruses
bats, civets and the emergence of sars. wildlife and emerging zoonotic diseases: the biology
complete genome sequence of parainfluenza virus (piv ) from a sunda pangolin (manis javanica) in china
bat coronaviruses and experimental infection of bats, the philippines
isolation and characterization of -ncov-like coronavirus from malayan pangolins
an epidemiologic investigation on infection with severe acute respiratory syndrome coronavirus in wild animal traders in guangzhou. zhonghua yu fang yi xue za zhi
isolation and characterization of a novel bat coronavirus closely related to the direct progenitor of severe acute respiratory syndrome coronavirus
rapid molecular strategy for filovirus detection and characterization
probable pangolin origin of sars-cov- associated with the covid- outbreak
a novel coronavirus from patients with pneumonia in china
c) and ( d) details of sunda pangolins confiscated from smugglers and rescued from wild.
(level of detail for the location of confiscations and rescue events reported was determined by the local wildlife departments the event, (*) indicates the state where the confiscation or rescue occurred in peninsular malaysia, (**) indicates the district where the confiscation or rescue occurred in peninsular malaysia or sabah state laboratory and the laboratory team (fernandes opook, emilly sion) for sample processing at sabah wildlife department's wildlife health, genetic and forensic laboratory and program assistant velsri sharminie for generating the maps. we dedicate this paper to dr. diana ramirez, who sadly passed away on / / , in honour of her vital contribution to this work and wildlife conservation in sabah. this study was made possible in part by the generous support of the american people through the united states agency for international development (usaid) emerging pandemic threats predict project (cooperative agreement numbers aid-oaa-a- - and ghn-aoo- - - ), and the usaid infectious disease emergence and economics of altered landscapes (ideeal) project (cooperative agreement number aid- -a- - ). the contents are the responsibility of the authors and do not necessarily reflect the views of usaid or the united states government. key: cord- -sqz yc b authors: huo, jiandong; zhao, yuguang; ren, jingshan; zhou, daming; duyvesteyn, helen me; ginn, helen m; carrique, loic; malinauskas, tomas; ruza, reinis r; shah, pranav nm; tan, tiong kit; rijal, pramila; coombes, naomi; bewley, kevin; radecke, julika; paterson, neil g; supasa, piyasa; mongkolsapaya, juthathip; screaton, gavin r; carroll, miles; townsend, alain; fry, elizabeth e; owens, raymond j; stuart, david i title: neutralization of sars-cov- by destruction of the prefusion spike date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sqz yc b there are as yet no licenced therapeutics for the covid- pandemic. 
the causal coronavirus (sars-cov- ) binds host cells via a trimeric spike whose receptor binding domain (rbd) recognizes angiotensin-converting enzyme (ace ), initiating conformational changes that drive membrane fusion. we find that monoclonal antibody cr binds the rbd tightly, neutralising sars-cov- and report the crystal structure at . Å of the fab/rbd complex. some crystals are suitable for screening for entry-blocking inhibitors. the highly conserved, structure-stabilising, cr epitope is inaccessible in the prefusion spike, suggesting that cr binding would facilitate conversion to the fusion-incompetent post-fusion state. cryo-em analysis confirms that incubation of spike with cr fab leads to destruction of the prefusion trimer. presentation of this cryptic epitope in an rbd-based vaccine might advantageously focus immune responses. binders at this epitope may be useful therapeutically, possibly in synergy with an antibody blocking receptor attachment. highlights cr neutralises sars-cov- neutralisation is by destroying the prefusion spike conformation this antibody may have therapeutic potential alone or with one blocking receptor attachment incursion of animal (usually bat)-derived coronaviruses into the human population has caused several outbreaks of severe disease, starting with severe acute respiratory syndrome (sars) in (menachery et al., ) . in late a highly infectious illness, with cold-like symptoms progressing to pneumonia and acute respiratory failure, resulting in an estimated % overall death rate (baud et al., ) , with higher mortality among the elderly and immunocompromised populations, was identified and confirmed as a pandemic by the who on th march . the etiological agent is a novel coronavirus (sars-cov- ) belonging to lineage b betacoronavirus and sharing % sequence identity with bat coronaviruses (lu et al., a) . the heavily glycosylated trimeric surface spike protein mediates viral entry into the host cell. 
it is a large type i transmembrane glycoprotein (the ectodomain alone comprises over residues) (wrapp et al., ). it is made as a single polypeptide and then cleaved by host proteases to yield an n-terminal s region and the c-terminal s region. spike exists initially in a pre-fusion state where the domains of s cloak the upper portion of the spike with the relatively small (~ kda) s rbd nestled at the tip. the rbd is predominantly in a 'down' state where the receptor binding site is inaccessible; however, it appears that it stochastically flips up with a hinge-like motion, transiently presenting the ace receptor binding site (roy, ; song et al., ; walls et al., ; wrapp et al., ). ace acts as a functional receptor for both sars-cov and sars-cov- , binding to the latter with a to -fold higher affinity (k d of ~ nm), possibly contributing to its ease of transmission (song et al., ; wrapp et al., ). there is % sequence identity between the rbds of sars-cov and sars-cov- ( figure s ). when ace locks on it holds the rbd 'up', destabilising the s cloak and possibly favouring conversion to a postfusion form where the s subunit, through massive conformational changes, propels its fusion domain upwards to engage with the host membrane, casting off s in the process (song et al., ; wrapp et al., ). structural studies of the rbd in complex with ace (lan et al., ; wang et al., b; yan et al., ) show that it is recognized by the extracellular peptidase domain (pd) of ace through mainly polar interactions. the s protein is an attractive candidate for both vaccine development and immunotherapy. potent nanomolar affinity neutralising human monoclonal antibodies against the sars-cov rbd have been identified that attach at the ace receptor binding site (including m , cr and r (ter meulen et al., ; sui et al., ; zhu et al., ) ). 
for example r binds with nanomolar affinity, prevents binding to ace and the formation of syncytia in vitro, and inhibits viral replication in vivo (sui et al., ) . however, despite the two viruses sharing the same ace receptor these ace blocking antibodies do not bind sars-cov- rbd (wrapp et al., ) . in contrast cr , a sars-cov-specific monoclonal selected from a single chain fv phage display library constructed from lymphocytes of a convalescent sars patient and reconstructed into igg format (ter meulen et al., ) , has been reported to cross-react strongly, binding to the rbd of sars-cov- with a k d of . nm (tian et al., ) , whilst not competing with the binding of ace (ter meulen et al., ) . furthermore, although sars-cov escape mutations could be readily generated for ace blocking cr , no escape mutations could be generated for cr , preventing mapping of its epitope (ter meulen et al., ) . furthermore a natural mutation of sars-cov- has now been detected at residue (y n) (gisaid (shu and mccauley, ) : accession id: epi_isl_ wienecke-baldacchino et al., ), which forms part of the ace binding epitope. finally, cr and cr act synergistically to neutralise sars-cov with extreme potency (ter meulen et al., ) . whilst this work was being prepared for publication a paper reporting that cr does not neutralise sars-cov- and describing the structure of the complex with the rbd at . Å resolution was published (yuan et al., ) . here we extend the structure analysis to significantly higher resolution and, using a different neutralisation assay, show that cr does neutralise sars-cov- , but via a mechanism that would not be detected by the method of yuan et al (yuan et al., ) . we use cryo-em analysis of the interaction of cr with the full spike ectodomain to confirm this mechanism. taken together these observations suggest that the cr epitope should be a major target for therapeutic antibodies. 
to understand how cr works we first investigated the interaction of cr fab with isolated recombinant sars-cov- rbd, both alone and in the presence of ace . surface plasmon resonance (spr) measurements (methods and figure s ) confirmed that cr binding to rbd is strong (although weaker than the binding reported to sars-cov (ter meulen et al., ) ), with a slight variation according to whether cr or rbd is used as the analyte (k d = nm and nm respectively, derived from the kinetic data in table s ). an independent measure using bio-layer interferometry (bli) with rbd as analyte gave a k d of nm (methods and figure s ). these values are quite similar to those reported by tian et al. (tian et al., ) ( . nm), whereas weaker binding (k d ~ nm) was reported recently by yuan et al. (yuan et al., ) . using spr to perform a competition assay revealed that the binding of ace to the rbd is perturbed by the presence of cr ( figure s ). the presence of ace slows the binding of cr to rbd and accelerates the dissociation. similarly, the release of ace from rbd is accelerated by the presence of cr . these observations are suggestive of an allosteric effect between ace and cr . a plaque reduction neutralisation test using sars-cov- virus and cr showed an nd of : for a starting concentration of mg/ml (calculated according to grist (grist, ) ), superior to that of mers convalescent serum (nd of : ) used as a nibsc international standard positive control (see methods and table s ). this corresponds to % neutralisation at ~ nm (~ . ug/ml). this is similar to the neutralising concentration ( % neutralisation at ug/ml) reported by ter meulen et al. (ter meulen et al., ) for sars-cov, however, as discussed below, it is in apparent disagreement with the result reported recently by yuan et al. (yuan et al., ) . we determined the crystal structure of the sars-cov- rbd-cr fab complex (see methods and table s ) to investigate the relationship between the binding epitopes of ace and cr . 
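Two conversions are used implicitly in the binding and neutralisation results above: an equilibrium dissociation constant derived from kinetic rate constants (K_D = k_off / k_on), and a mass concentration restated as a molar one. A minimal sketch of both follows. The rate constants and the 150 kDa molecular weight (a typical value for an intact IgG) are illustrative assumptions and are not taken from the paper's tables.

```python
def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D in mol/L.

    k_on  : association rate constant, in 1/(M*s)
    k_off : dissociation rate constant, in 1/s
    """
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def ug_per_ml_to_nM(conc_ug_per_ml, mw_kda):
    """Convert a mass concentration (ug/ml) to a molar one (nM).

    1 ug/ml equals 1e-3 g/L; dividing by the molecular weight in g/mol
    (mw_kda * 1000) gives mol/L, and multiplying by 1e9 gives nM,
    so overall nM = ug/ml * 1000 / kDa.
    """
    if mw_kda <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ug_per_ml * 1000.0 / mw_kda


# Hypothetical kinetic constants, for illustration only.
kd_molar = dissociation_constant(k_on=1.0e5, k_off=1.0e-3)
kd_nM = kd_molar * 1e9

# Assumed molecular weight of ~150 kDa for an intact IgG.
igg_nM = ug_per_ml_to_nM(1.5, mw_kda=150.0)
```

Deriving K_D from kinetics rather than from an equilibrium fit is one reason values can shift with which partner is immobilised and which is the analyte, as noted in the text.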
crystals grew rapidly and consistently. two crystal forms grew in the same drop. the solvent content of the crystal form solved first was unusually high (ca %) with the ace binding site exposed to large continuous solvent channels within the crystal lattice ( figure s ). these crystals therefore offer a promising vehicle for crystallographic screening to identify potential therapeutics that could act to block virus attachment. the current analysis of this crystal form is at . Å resolution and so, to avoid overfitting, refinement used a novel real-space refinement algorithm to optimise the phases (vagabond, hmg unpublished, see methods). this, together with the favourable observation to parameter ratio resulting from the exceptionally high solvent content, meant that the map was of very high quality, allowing reliable structural interpretation ( figure s , methods). full interpretation of the detailed interactions between cr and the rbd was enabled by the second crystal form which diffracted to high resolution, . Å, and the structure of which was refined to give an r-work/r-free of . / . and good stereochemistry (methods, table s , figure s ). the high-resolution structure is shown in figure a . there are two complexes in the crystal asymmetric unit with residues - in one rbd, - and - in the other rbd well defined, whilst residues - of the cr heavy chains are disordered. the rbd has a very similar structure to that seen in the complex of sars-cov- rbd with ace , rmsd for ca atoms of . Å (pdb, m j (lan et al., ) ), and an rmsd of . Å compared to the sars cov rbd (pdb, ajf (li et al., ) ). only minor conformational changes are introduced by binding to cr , at residues - . the rbd was deglycosylated (methods) to leave a single saccharide unit at each of the n-linked glycosylation sites clearly seen at n and n ( figure s ). cr attaches to the rbd surface orthogonal to the ace receptor binding site. 
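The structural comparisons above are reported as root-mean-square deviations (rmsd) over matched Cα atoms. A minimal sketch of that calculation is shown below. It assumes the two coordinate sets are already superposed and listed in the same residue order; the superposition step itself, and whichever program the authors used for it, is not covered here. The coordinates in the example are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) coordinates, in the same units as the input (e.g. angstroms).

    Assumes atoms are paired by index and the structures are pre-superposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates: every atom in the second set is shifted by 1 unit in z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(set_a, set_b)
```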
there is no overlap between the epitopes and indeed both the fab and ace ectodomain can bind without clashing ( figure d ) (tian et al., ) . such independence of the ace binding site has been reported recently for another sars-cov- neutralising antibody, d . the fab complex interface buries Å of surface area ( and Å by the heavy and light chains respectively, figure a and figure s ), somewhat more than the rbd-ace interface which covers Å (pdb m j (lan et al., ) ). typical of a fab complex, the interaction is mediated by the antibody cdr loops, which fit well into the rather sculpted surface of the rbd (figure b , c). the heavy chain cdr , and make contacts to residues from α , β and α (residues - ), while two of the light chain cdrs ( and ) interact mainly with residues from the β -α loop, α ( - ) and the α -β loop ( - ) (figures , s , s ). a total of residues from the heavy chain and from the light chain cement the interaction with residues from the rbd. for the heavy chain these potentially form h-bonds and salt bridges, the latter from d and e (cdr ) to k of the rbd. whilst the light chain interface comprises h-bonds and a single salt bridge between e (cdr ) and k of the rbd. the binding is consolidated by a number of hydrophobic interactions ( figure s b ). of the residues involved in the interaction are conserved between sars-cov and sars-cov- ( figure b and figure s ). the cr epitope is much more conserved than that of the receptor blocking anti-sars-cov antibody r for which only of the interacting residues are conserved (hwang et al., ) , in-line with the lack of cross reactivity observed for the latter. the reason for the conservation of the cr epitope becomes clear in the context of the complete pre-fusion s structure (pdb ids: vsb (wrapp et al., ) , vxx, vyb (walls et al., ) ) where the epitope is inaccessible ( figure ). 
when the rbd is in the 'down' configuration the cr epitope is packed tightly against another rbd of the trimer and the n-terminal domain (ntd) of the neighbouring protomer. in the structure of the pre-fusion form of trimeric spike the majority of rbds are 'down', although presumably stochastically one may be 'up' (walls et al., ; wrapp et al., ) . the structure of a sars-cov complex with ace ectodomain shows that this 'up' configuration is competent to bind receptor, and that there are a family of 'up' orientations with significantly different hinge angles (song et al., ) . however, the cr epitope remains largely inaccessible even in the 'up' configuration. modelling the rotation of the rbd required to enable fab interaction in the context of the spike trimer, showed a rotation corresponding to a > ° further declination from the central vertical axis was required, beyond that observed previously (walls et al., ; wrapp et al., ) (figure i ), although this might be partly mitigated by more complex movements of the rbd and if more than one rbd is in the 'up' configuration this requirement would be relaxed somewhat. since locking the up state by receptor blocking antibodies is thought to destabilise the pre-fusion state (walls et al., ) binding of cr presumably introduces further destabilisation, leading to a premature conversion to the post-fusion state, inactivating the virus. cr and ace blocking antibodies can bind independently but both induce an 'up' conformation, presumably explaining the observed synergy between binding at the two sites (ter meulen et al., ) . to test if cr binding destabilises the prefusion state of spike, the ectodomain construct described previously (wrapp et al., ) was used to produce glycosylated protein in hek cells (methods). cryo-em screening showed that the protein was in the trimeric prefusion conformation. spike was then mixed with an excess of cr fab and incubated at room temperature, with aliquots being taken at minutes and hours. 
aliquots were immediately applied to cryo-em grids and frozen (methods). for the minutes incubation, collection of a substantial amount of data allowed unbiased particle picking and d classification which revealed two major structural classes with a similar number in each, (i) the prefusion conformation, and (ii) a radically different conformation (methods , table s and figure s ). detailed analysis of the prefusion conformation led to a structure at a nominal resolution of . Å (fsc = . ), based on a broad distribution of orientations, that revealed the same predominant rbd pattern (one 'up' and two 'down') previously seen (wrapp et al., ) with no evidence of cr binding (figure a , figure s ). analysis of the other major particle class revealed strong preferential orientation of the particles on the grid ( figure s a ). despite this, a reconstruction with a nominal resolution of . Å within the plane of the grid, and perhaps Å resolution in the perpendicular direction ( figure s b ), could be produced which allowed the unambiguous fitting of the cr -rbd complex (figure b ). note that in addition there is less well defined density attached to the rbd, in a suitable position to correspond to the spike n-terminal domain (wrapp et al., ) . these structures are no longer trimeric, rather two complexes associate to form an approximately symmetric dimer (however, application of this symmetry in the reconstruction process did not improve the resolution). the interactions responsible for dimerisation involve the ace binding site on the rbd and the elbow of the fab; however, the interaction does not occur in our low-resolution crystal form and is therefore probably extremely weak and not biologically significant. since conversion to the post-fusion conformation leads to dissociation of s (which includes the n-terminal domain and rbd) these results confirm that cr destabilises the prefusion spike conformation. 
further evidence of this is provided by analysis of data collected after h incubation. by this point there were no intact trimers remaining and a heterogeneous range of oligomeric assemblies had appeared, which we were not able to interpret in detail but which are consistent with the lateral assembly of fab/rbd complexes ( figure s ). note that the relatively slow kinetics will not be representative of events in vivo, where the conversion might be accelerated by the elevated temperature and the absence of the mutations which were added to this construct to stabilise the prefusion state (kirchdoerfer et al., ; pallesen et al., ; wrapp et al., ) . until now the only documented mechanism of neutralisation of coronaviruses has been through blocking receptor attachment. in the case of sars-cov this is achieved by presentation of the rbd of the spike in an 'up' conformation. although not yet confirmed for sars-cov- it is very likely that a similar mechanism can apply. here we define a second class of neutralisers, that bind a highly conserved epitope ( figure s ) and can therefore act against both sars-cov and sars-cov- (cr was first identified as a neutralising antibody against sars-cov (ter meulen et al., ) ). we find that binding of cr to the isolated rbd is tight (~ nm) and the crystal structure of the complex reveals the atomic detail of the interaction. despite the spatial separation of the cr and ace epitopes we find an allosteric effect between the two binding events. the role of the cr epitope in stabilising the prefusion spike trimer explains why it has, to date, proved impossible to generate mutations that escape binding of the antibody (ter meulen et al., ) . whilst in our assay cr neutralises sars-cov- , a recent paper (yuan et al., ) reported an alternative assay that did not detect neutralisation. 
the difference is likely due to their removal of the antibody/virus mix after adsorption to the indicator cells, before incubating to allow cytopathic effect (cpe) to develop. this would be in-line with the distinction previously seen between neutralisation tests for influenza virus by antibodies which bind the stem of hemagglutinin and therefore do not block receptor binding (thomson et al., ) . these antibodies did not appear to be neutralising when tested with the standard who neutralisation assay, in which a similar protocol is used to that adopted by yuan et al, in which the inoculum of virus/antibody is washed out before development of cpe. neutralisation was observed, however, when the antibodies were left in the assay during incubation to produce cpe. by analogy we would expect antibodies to the rbd that block attachment to ace to behave in a similar way to antibodies against the globular head of ha, whilst antibodies such as cr , that neutralise by an alternative mechanism to blocking receptor attachment, may need to be present throughout the incubation period with the indicator cells to reveal neutralisation. this agrees with our observation that, in the absence of ace , the cr fab destroys the prefusion-stabilised trimer (t / ~ h at room temperature as measured by cryo-em). with monoclonal antibodies now recognised as potential antivirals (lu et al., b; salazar et al., ) our results suggest that cr may be of immediate utility, since the mechanism of neutralisation will be unusually resistant to virus escape. in contrast antibodies which compete with ace (whose epitope on sars-cov- is reported to have already shown mutation at residue (gisaid: accession id: epi_isl_ wienecke-baldacchino et al., (shu and mccauley, ) ), are likely to be susceptible to escape. furthermore, with knowledge of the detailed structure of the epitope presented here a higher affinity version of cr might be engineered. 
alternatively, since the same mechanism of neutralisation is likely to be used by other antibodies, a more potent monoclonal antibody targeting the same epitope might be found (for instance by screening for competition with cr ). additionally, since this epitope is sterically and functionally independent of the well-established receptor-blocking neutralising antibody epitope there is considerable scope for therapeutic synergy between antibodies targeting the two epitopes (indeed this type of

to further validate the spr results the k d of fab cr for rbd was also measured by bio-layer interferometry. kinetic assays were performed on an octet red e (fortebio) at ℃ with a shake speed of rpm. fab cr was immobilized onto amine reactive nd generation (ar g) biosensors (fortebio) and serially diluted rbd ( , , , and nm) was used as analyte. pbs (ph . ) was used as the assay buffer. recorded data were analysed using the data analysis software ht v . (fortebio), with a global : fitting model.

neutralising virus titres were measured in serum samples that had been heat-inactivated at °c for minutes. sars-cov- (strain victoria/ / at cell passage (caly et al., ) ) was diluted to a concentration of . e+ pfu/ml ( pfu/ µl) and mixed : in % fcs/mem containing mm hepes buffer with doubling serum dilutions from : to : in a -well v-bottomed plate. the plate was incubated at °c in a humidified box for hour to allow the antibody in the serum samples to neutralise the virus. cr (ph . ) at a starting concentration of mg/ml was diluted in . the dilutions were then made -fold up to . the neutralised virus was transferred into the wells of a twice dpbs-washed plaque assay -well plate that had been seeded with vero/hslam the previous day at . e+ cells per well in % fcs/mem. neutralised virus was allowed to adsorb at °c for a further hour, and overlaid with plaque assay overlay media ( x mem/ . % cmc/ % fcs final). 
after days incubation at °c in a humidified box, the plates were fixed, stained and plaques counted. dilutions and controls were performed in duplicate. median neutralising titres (nd ) were determined using the spearman-karber formula (kärber, ) relative to virus only control wells.

purified and deglycosylated rbd and cr fab were concentrated to . mg/ml and mg/ml respectively, and then mixed in an approximate molar ratio of : . crystallization screen experiments were carried out using the nanolitre sitting-drop vapour diffusion method in -well plates as previously described (walter et al., (walter et al., , transmission). data were indexed, integrated and scaled with the automated data processing program xia -dials (winter, ; winter et al., ) . the data set of ° was collected from a single frozen crystal to . Å resolution with -fold redundancy. the crystal belongs to space group p with unit cell dimensions a = b = . Å and c = . Å. the structure was determined by molecular replacement with phaser (mccoy et al., ) using search models of human germline antibody fabs - /o (pdb id, kmt (teplyakov et al., ) ) heavy chain and ighv - /igk - (pdb id, i d (teplyakov et al., ) ) light chain, and rbd of sars-cov- rbd/ace complex (pdb id, m j (lan et al., ) ). there is one rbd/cr complex in the crystal asymmetric unit, resulting in a crystal solvent content of ~ %. during optimization of the crystallization conditions, a second crystal form was found to grow in the same condition with similar morphology. a data set of ° rotation with data extending to . Å was collected on beamline i of diamond from one of these crystals (exposure time . s per . ° frame, beam size × μ m and % beam transmission). the crystal also belongs to space group p but with significantly different unit cell dimensions (a = b = . Å and c = . Å). there were two rbd/cr complexes in the asymmetric unit and a solvent content of ~ %. 
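the spearman-karber estimate of the median neutralising titre named in the neutralisation assay above can be sketched as follows. this is a minimal illustration, not the authors' analysis script: it assumes equally spaced log dilutions (as with doubling dilutions), a neutralised fraction that falls from 1 to 0 across the series, and plaque counts expressed relative to the virus-only control. the function names and all numbers in the example are hypothetical.

```python
import math


def fraction_neutralised(plaque_counts, virus_only_count):
    """Fraction of plaques removed at each dilution, relative to the
    virus-only control, clipped to the range [0, 1]."""
    return [min(1.0, max(0.0, 1.0 - count / virus_only_count))
            for count in plaque_counts]


def spearman_karber_nd50(reciprocal_dilutions, fractions):
    """Spearman-Karber median endpoint for an equally spaced log-dilution series.

    reciprocal_dilutions: e.g. [20, 40, 80, ...], lowest dilution first.
    fractions: neutralised fraction at each dilution, in the same order.
    Returns the reciprocal dilution at which half the virus is neutralised.
    """
    if len(reciprocal_dilutions) != len(fractions) or len(fractions) < 2:
        raise ValueError("need matching lists with at least two dilutions")
    logs = [math.log10(d) for d in reciprocal_dilutions]
    step = logs[1] - logs[0]
    if step <= 0 or any(abs((b - a) - step) > 1e-9 for a, b in zip(logs, logs[1:])):
        raise ValueError("dilutions must increase by a constant factor")
    # Equal-spacing form: log endpoint = x_first - d/2 + d * sum(p_i)
    log_nd50 = logs[0] - step / 2.0 + step * sum(fractions)
    return 10 ** log_nd50


# Hypothetical duplicate-averaged plaque counts for a doubling dilution series.
dilutions = [20, 40, 80, 160, 320]
plaques = [0, 0, 15, 30, 30]
control = 30  # plaques in virus-only wells (illustrative value)
nd50 = spearman_karber_nd50(dilutions, fraction_neutralised(plaques, control))
print(f"median neutralising titre ~ 1:{nd50:.0f}")
```

with these illustrative counts, half of the plaques remain at the third dilution, so the estimate falls at that dilution. the equal-spacing form is used because the assay describes doubling dilutions; unevenly spaced series would need the general trapezoidal form instead.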
the initial structure was determined using the lower resolution data from the first crystal form. data were excluded at a resolution below Å as these fell under the beamstop shadow. one cycle of refmac (murshudov et al., ) was used to refine atomic coordinates after manual correction in coot (emsley and cowtan, ) figure s ). the final refined structure had an r work of . (r free , . ) for all data to . Å resolution. this structure was later used to determine the structure of the second crystal form, which has been refined with phenix (liebschner et al., ) to r work = . and r free = . for all data to . Å resolution. this refined model revealed the presence of one extra residue at each heavy chain n-terminus and extra residues at the n-terminus of one rbd from the signal peptide. there is well ordered density for a single glycan at each of the glycosylation sites at n and n in one rbd, and only one at n in the second rbd. data collection and structure refinement statistics are given in table s . structural comparisons used shp (stuart et al., ) , residues forming the rbd/fab interface were identified with pisa (krissinel and henrick, ) , figures were prepared with pymol (the pymol molecular graphics system, version . r pre, schrödinger, llc). purified spike protein was buffer exchanged into mm tris ph . , mm nacl, . % nan buffer using a desalting column (zeba, thermo fisher μ m and at a nominal magnification of x , , corresponding to a calibrated pixel size of . Å/pixel, see table s . cryo-em data processing for both the minute and h incubation datasets, motion correction and alignment of x binned super-resolution movies was performed using relion . . ctf-estimation with gctf (v . ) (zhang, ) and non-template-driven particle picking was then performed within cryosparc v . . -live followed by multiple rounds of d classification (punjani et al., ) . for the minutes dataset. 
d class averages for structure-a and structure-b were then used separately for template-driven classification before further rounds of d and d classification with c symmetry. both structures were then sharpened in cryosparc. data processing and refinement statistics are given in table s . an initial model for the spike (structure-a) was generated using pdb id, vyb (walls et al., ) and rigid body fitted into the final map using coot (emsley and cowtan, ) . the model was further refined in real space with phenix (liebschner et al., ) which resulted in a correlation coefficient of . . two copies of rbd-cr were fitted into structure-b in the same manner. because of the strongly anisotropic resolution the overall correlation coefficient vs the model was lower ( . ). for the h incubation dataset, particles were extracted with a larger box size ( pixels as compared to pixels), and, following multiple rounds of d classification, d class averages from 'blob-picked' particles showing signs of complete 'flower-like' structures were selected for ab initio reconstruction. for the h data no detailed fitting was attempted.

t/e (red, negative; blue, positive).

real estimates of mortality following covid- infection
isolation and rapid sharing of the novel coronavirus (sars-cov- ) from the first patient diagnosed with covid- in australia
coot: model-building tools for molecular graphics
diagnostic methods in clinical virology. 
structural basis of neutralization by a human anti-severe acute respiratory syndrome spike protein antibody, r
beitrag zur kollektiven behandlung pharmakologischer reihenversuche
stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis
inference of macromolecular assemblies from crystalline state
structure of the sars-cov- spike receptor-binding domain bound to the ace
structural biology: structure of sars coronavirus spike receptor-binding domain complexed with receptor
macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix
genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding
development of therapeutic antibodies for the treatment of diseases
phaser crystallographic software
a sars-like cluster of circulating bat coronaviruses shows potential for human emergence
human monoclonal antibody combination against sars coronavirus: synergy and coverage of escape mutants
refmac for the refinement of macromolecular crystal structures
a pipeline for the production of antibody fragments for structural studies using transient expression in hek t cells
the production of glycoproteins by transient expression in mammalian cells
hek cells: an alternative to e. coli for the production of secreted and intracellular mammalian proteins
immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen
immunopathogenesis of coronavirus infections: implications for sars
cryosparc: algorithms for rapid unsupervised cryo-em structure determination
dynamical asymmetry exposes -ncov prefusion spike
antibody therapies for the prevention and treatment of viral infections
gisaid: global initiative on sharing all influenza data - from vision to reality
cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace
crystal structure of cat muscle pyruvate kinase at a resolution of . Å
potent neutralization of severe acute respiratory syndrome (sars) coronavirus by a human mab to s protein that blocks receptor association
antibody modeling assessment ii
structural diversity in a human antibody germline library
pandemic h n influenza infection and vaccination in humans induces cross-protective antibodies that
potent binding of novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody
immunization with sars coronavirus vaccines leads to pulmonary immunopathology on challenge with the sars virus
unexpected receptor functional mimicry elucidates activation of coronavirus fusion
function, and antigenicity of the sars-cov- spike glycoprotein
a procedure for setting up high-throughput nanolitre crystallization experiments. i. protocol design and validation
a procedure for setting up high-throughput nanolitre crystallization experiments. 
crystallization workflow for initial screening, automated storage, imaging and optimization
molecular mechanism for antibody-dependent enhancement of coronavirus entry
a human monoclonal antibody blocking sars-cov- infection
structural and functional basis of sars-cov- entry by using human ace
xia : an expert system for macromolecular crystallography data reduction
dials: implementation and evaluation of a new integration package
cryo-em structure of the -ncov spike in the prefusion conformation
structural basis for the recognition of the sars-cov- by full-length human ace
a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov
gctf: real-time ctf determination and correction
potent cross-reactive neutralization of sars coronavirus isolates by human monoclonal antibodies

rbds in the down conformation (generated by superposing our rbd structure on the prefusion trimer of ref (wrapp et al., ) ). the viral membrane would be at the bottom of the picture. all of s and s are shown in yellow, apart from the rbd, which is shown in grey, with the cr epitope coloured green. a, a cut-away of the trimer showing, in red, the dipeptide (residues - ) which has been mutated to pp to confer stability on the pre-fusion state. note the proximity to the cr epitope. c, showing a top view of the molecule (also used for panels d-f). one of the rbds has been drawn in light grey in the down configuration and hinged up in dark grey, using the motion about the hinge axis observed for several coronavirus spikes, but extending the motion sufficiently to allow cr to bind. the pp motif is shown in red and the glycosylated residue n in magenta. panels d-f show the trimer viewed from above: d, all rbds down; e, one rbd up; f, one rbd rotated (as in c) to allow access to cr . panels g-i are equivalent structures to d-f, but are viewed from the side. in e, bound ace is shown, and in f, cr . 
key: cord- -zqoggwnk authors: pietschmann, jan; vöpel, nadja; spiegel, holger; krause, hans-joachim; schröper, florian title: brief communication: magnetic immuno-detection of sars-cov- specific antibodies date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zqoggwnk sars-cov- causes ongoing infections worldwide, and identifying people with immunity is becoming increasingly important. available point-of-care diagnostic systems as lateral flow assays have high potential for fast and easy on-site antibody testing but are lacking specificity, sensitivity or possibility for quantitative measurements. here, a new point-of-care approach for sars-cov- specific antibody detection in human serum based on magnetic immuno-detection is described and compared to standard elisa. for magnetic immuno-detection, immunofiltration columns were coated with a sars-cov- spike protein peptide. sars-cov- peptide reactive antibodies, spiked at different concentrations into pbs and human serum, were rinsed through immunofiltration columns. specific antibodies were retained within the ifc and labelled with an isotype specific biotinylated antibody. streptavidin-functionalized magnetic nanoparticles were applied to label the secondary antibodies. enriched magnetic nanoparticles were then detected by means of frequency magnetic mixing detection technology, using a portable magnetic read-out device. measuring signals corresponded to the amount of sars-cov- specific antibodies in the sample. our preliminary magnetic immuno-detection setup resulted in a higher sensitivity and broader detection range and was four times faster than elisa. further optimizations could reduce assay times to that of a typical lateral flow assay, enabling a fast and easy approach, well suited for point-of-care measurements without expensive lab equipment. the new severe acute respiratory syndrome coronavirus- (sars-cov- ), is causing ongoing worldwide infections, leading to an unprecedented pandemic. 
according to the world health organization (who), it is estimated that up to % of people with coronavirus disease (covid- ) are not aware that they are/were infected due to no or very mild symptoms [ ] . symptoms can range from those comparable to a common cold (cough, rhinitis or fever) up to harsh symptoms,

at a later point, serological assays would also be required to prove and monitor the effectiveness of vaccination and the longevity of the obtained immunity. fast, cheap and easily applicable on-site testing solutions will thus become increasingly important, but currently only a few rapid test systems are available. lateral-flow-detection (lfd) approaches are easy to handle and results are obtained after - min. however, they are not quantitative, and their reliability, specificity and sensitivity are much worse than those of lab-based assay formats based on enzyme-linked immunosorbent assay (elisa). in particular, specificity is a major challenge for currently available serological antibody tests. this depends to a large extent on the antigen used in the test assays. enveloped positive-stranded rna sars-cov- coronaviruses consist of five structural proteins, the spike glycoprotein (s), envelope protein (e), membrane protein (m), the nucleocapsid protein (n) and a hemagglutinin esterase (he). the s-protein, a complex folded glycoprotein comprising two regions, s and s , exhibits the highest immunogenicity, has the most important role in host interaction, especially cell entry, and is also the main target for neutralizing antibodies [ ] . the proteins m, e and he are only weakly immunogenic and less suitable as targets for antibody diagnosis. although the n protein is immunodominant, it is not suitable for the specific analysis of the immune response against sars-cov- viruses due to its high cross-reactivity with antibodies targeting related coronavirus strains [ , ] . 
the company euroimmun ag, lübeck, germany offers two elisa kits using a genetically modified n-protein variant, which enables more specific detection of antibodies as early as ten days after infection. however, for highly specific detection of immune response against sars-cov- , typically the s subunit of s-protein should be used. currently, only a few vendors offer these specific elisa formats using the s subunit of s-protein for specifically detecting sars-cov- antibodies [ , ] . nevertheless, specific, sensitive and quantitative rapid tests applicable for a decentralized point-of-care (poc) analysis are currently not available. magnetic immuno-detection (mind) could be a powerful tool for poc assay performance. mind

as reference method to our poc mind approach, a typical laboratory-based elisa was performed (fig ) . after coating of elisa microtiter plate with sars-cov- spike protein peptide and blocking with bsa, sars-cov- spike protein peptide specific antibody was diluted in the range from . ng·ml - to ng·ml - in pbs-buffer or serum and applied into wells. after addition of biotinylated gar and subsequent labelling with streptavidin-ap, the elisa plate was read out at nm and obtained measuring values were used to generate calibration curves for sars-cov- specific antibody concentrations in pbs (fig , black curve) and in human serum samples (fig , red curve). blank values determined in pbs and serum were . au ± . au and

same calibration measurements employing dilutions of sars-cov- specific antibody were done with our poc mind-based setup (fig and ) . comparable to laboratory-based elisa, the same dilutions of sars-cov- spike protein peptide specific antibody in pbs-buffer (fig , black

in this proof-of-concept experiment, a commercially available sars-cov- spike protein peptide with corresponding antibody was used. if using this peptide for testing of patient samples,

funding: the author received no specific funding for this work. 
acknowledgments: the authors would like to thank max schubert for his helpful advice and support in discussions. competing interests: the authors declare no competing interests.

; a narrative review
clinical characteristics of , covid- patients: a meta-analysis
sars-cov- and covid- in older adults: what we may expect regarding pathogenesis, immune responses, and outcomes
structure, function, and antigenicity of the sars-cov- spike glycoprotein
evaluation of serologic and antigenic relationships between middle eastern respiratory syndrome coronavirus and other coronaviruses to develop vaccine platforms for the rapid response to emerging coronaviruses
reactivity between common human coronaviruses and sars-cov- using coronavirus antigen microarray
antibody responses to sars-cov- in patients of novel coronavirus disease
comparison of four new commercial serologic assays for determination of sars-cov- igg
magnetic particle detection by frequency mixing for immunoassay applications
immunofiltration columns for frequency mixing-based multiplex magnetic immunodetection. sensors (basel)
sensitive and rapid detection of cholera toxin subunit b using magnetic frequency mixing detection
crp determination based on a novel magnetic biosensor
francisella tularensis detection using magnetic labels and a magnetic biosensor based on frequency mixing
magnetic biosensor for the detection of yersinia pestis
simple and portable magnetic immunoassay for rapid detection and sensitive quantification of plant viruses
sensitive aflatoxin b detection using nanoparticle-based competitive magnetic immunodetection. 
toxins (basel) antibody and cytokine serum levels in patients subjected to anti-rabies prophylaxis with serum-vaccination multiplex detection of different magnetic beads using frequency scanning in magnetic frequency mixing technique key: cord- -bb h w authors: brann, david h.; tsukahara, tatsuya; weinreb, caleb; lipovsek, marcela; van den berge, koen; gong, boying; chance, rebecca; macaulay, iain c.; chou, hsin-jung; fletcher, russell; das, diya; street, kelly; de bezieux, hector roux; choi, yoon-gi; risso, davide; dudoit, sandrine; purdom, elizabeth; mill, jonathan s.; hachem, ralph abi; matsunami, hiroaki; logan, darren w.; goldstein, bradley j.; grubb, matthew s.; ngai, john; datta, sandeep robert title: non-neuronal expression of sars-cov- entry genes in the olfactory system suggests mechanisms underlying covid- -associated anosmia date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bb h w altered olfactory function is a common symptom of covid- , but its etiology is unknown. a key question is whether sars-cov- (cov- ) – the causal agent in covid- – affects olfaction directly by infecting olfactory sensory neurons or their targets in the olfactory bulb, or indirectly, through perturbation of supporting cells. here we identify cell types in the olfactory epithelium and olfactory bulb that express sars-cov- cell entry molecules. bulk sequencing revealed that mouse, non-human primate and human olfactory mucosa expresses two key genes involved in cov- entry, ace and tmprss . however, single cell sequencing and immunostaining demonstrated ace expression in support cells, stem cells, and perivascular cells; in contrast, neurons in both the olfactory epithelium and bulb did not express ace message or protein. these findings suggest that cov- infection of non-neuronal cell types leads to anosmia and related disturbances in odor perception in covid- patients. 
sars-cov- (cov- ) is a pandemic coronavirus that causes the covid- syndrome, which can include upper respiratory infection (uri) symptoms, severe respiratory distress, acute cardiac injury and death ( - ). cov- is closely related to other beta-coronaviruses, including the causal agents in pandemic sars and mers (sars-cov and mers-cov, respectively) and endemic viruses typically associated with mild uri syndromes (hcov-oc and hcov- e) ( ) ( ) ( ) . clinical reports suggest that infection with cov- is associated with high rates of disturbances in smell and taste perception, including anosmia ( ) ( ) ( ) ( ) ( ) . while many viruses (including coronaviruses) induce transient changes in odor perception due to inflammatory responses, in at least some cases covid-related anosmia has been reported to occur in the absence of significant nasal inflammation or coryzal symptoms ( , ( ) ( ) ( ) . this observation suggests that cov- might directly target odor processing mechanisms, although the specific means through which cov- alters odor perception remains unknown. cov- , like sars-cov, infects cells through interactions between its spike (s) protein and the ace protein on target cells. this interaction requires cleavage of the s protein, likely by the cell surface protease tmprss , although other proteases (such as cathepsin b and l, ctsb/ctsl) may also be involved ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . other coronaviruses use different cell surface receptors and proteases to facilitate cellular entry, including dpp , furin and hspa for mers-cov, anpep for hcov- e, tmprss d for sars-cov (in addition to ace and tmprss ), and st gal and st gal for hcov-oc and hcov-hku ( , ( ) ( ) ( ) . we hypothesized that identifying the specific olfactory cell types susceptible to direct cov- infection (due to e.g., ace and tmprss expression) would provide insight into possible mechanisms through which covid- causes altered smell perception. 
the nasal epithelium is divided into a respiratory epithelium (re) and olfactory epithelium (oe), whose functions and cell types differ. the nasal re is continuous with the epithelium that lines much of the respiratory tract and is thought to humidify air as it enters the nose; main cell types include basal cells, ciliated cells, secretory cells (including goblet cells), and brush/microvillar cells ( , ) (figure ). the oe, in contrast, is responsible for odor detection, as it houses mature olfactory sensory neurons (osns) that interact with odors via receptors localized on their dendritic cilia. osns are supported by sustentacular cells, which act to structurally support sensory neurons, phagocytose and/or detoxify potentially damaging agents, and maintain local salt and water balance ( ) ( ) ( ) ; microvillar cells and mucus-secreting bowman's gland cells also play important roles in maintaining oe homeostasis and function ( , ) (figure ). in addition, the oe contains globose basal cells (gbcs), which are primarily responsible for regenerating osns during normal epithelial turnover, and horizontal basal cells (hbcs), which act as reserve stem cells activated upon tissue damage ( ) ( ) ( ) . osns elaborate axons that puncture the cribriform plate at the base of the skull and terminate in the olfactory bulb, whose local circuits process olfactory information before sending it to higher brain centers (figure ). it has recently been demonstrated through single cell rna sequencing analysis (referred to herein as scseq) that cells from the human upper airway, including nasal re goblet, basal and ciliated cells, express high levels of ace and tmprss , suggesting that these re cell types may serve as a viral reservoir during cov- infection ( ) . however, analyzed samples in that dataset did not include any osns or sustentacular cells, indicating that tissue sampling in these experiments did not include the oe ( , ) . 
here we query both new and previously published bulk rna-seq and scseq datasets from the olfactory system for expression of ace , tmprss and other genes implicated in coronavirus entry. we find that non-neuronal cells in the oe and olfactory bulb, including support, stem and perivascular cells, express cov- entry-associated transcripts and their associated proteins, suggesting that infection of these non-neuronal cell types contributes to anosmia in covid- patients.

schematic of a sagittal view of the human nasal cavity, in which respiratory and olfactory epithelium are colored (left). for each type of epithelium, a schematic of the anatomy and known major cell types is shown (right). in the olfactory bulb in the brain (tan) the axons from olfactory sensory neurons coalesce into glomeruli, and mitral/tufted cells innervate these glomeruli and send olfactory projections to downstream olfactory areas. glomeruli are also innervated by juxtaglomerular cells, a subset of which are dopaminergic.

to determine whether genes relevant to cov- entry are expressed in osns or other cell types in the human oe, we queried previously published bulk rna-seq data derived from the whole olfactory mucosa (wom) of macaque, marmoset and human ( ) , and found expression of almost all cov-entry-related genes in all wom samples ( figure s a ). to identify the specific cell types in human oe that express ace , we quantified gene expression in scseq data derived from four human nasal biopsy samples recently reported by durante et al ( ) . neither ace nor tmprss was detected in mature osns, whereas these genes were detected in both sustentacular cells and hbcs (figures a-e) . in contrast, genes relevant to cell entry of other covs were expressed in osns, as well as in other oe cell types.
we confirmed the expression of ace proteins via immunostaining of human olfactory epithelium biopsy tissue, which revealed expression in sustentacular and basal cells, and an absence of ace protein in osns ( figures f and s e ). together, these results demonstrate that sustentacular and olfactory stem cells, but not mature osns, are potentially direct targets of cov- in the human oe. given that the nasopharynx is a major site of infection for cov- ( ), we compared the frequency of ace and tmprss expression among the cell types in the human re and oe ( ) . sustentacular cells exhibited the highest frequency of ace expression in the oe ( . % of cells) although this frequency was slightly lower than that observed in respiratory ciliated and secretory cells ( . % and . %, respectively). while all hbc subtypes expressed ace , the frequency of expression of ace was lower in olfactory hbcs ( . % of cells) compared to respiratory hbcs ( . % of cells) ( figure b ). in addition, all other re cell subtypes showed higher frequencies of ace and tmprss expression than was apparent in oe cells. these results demonstrate the presence of key cov- entry-related genes in specific cell types in the oe, but at lower levels of expression than in re isolated from the nasal mucosa. we wondered whether these lower levels of expression might nonetheless be sufficient for infection of cov- . it was recently reported that nasal re has higher expression of cov- entry genes than re of the trachea or lungs ( ) , and we therefore asked where the oe fell within this previously established spectrum of expression. to address this question, we developed a two step alignment procedure in which we first sought to identify cell types that were common across the oe and re, and then leveraged gene expression patterns in these common cell types to normalize gene expression levels across all cell types in the oe and re ( figure s ). 
this approach revealed correspondences between goblet cells in the re and bowman's gland cells in the oe ( % mapping probability, see methods), and between pulmonary ionocytes in the re and a subset of microvillar cells in the oe ( % mapping probability, see methods); after alignment, human oe sustentacular cells were found to express ace and tmprss at levels similar to those observed in the remainder of the non-nasal respiratory tract ( figure g) ( ) . these results are consistent with the possibility that specific cell types in the human olfactory epithelium express ace at a level that is permissive for direct infection.

( ) . each dot represents an individual cell, colored by cell type (hbc = horizontal basal cell, osn = olfactory sensory neuron, sus = sustentacular cell, mv = microvillar cell, resp. = respiratory, oec = olfactory ensheathing cell, smc = smooth muscle cell). (b) percent of cells expressing ace and tmprss . ace was not detected in any osns, but was observed in sus cells and hbcs, among other olfactory and respiratory epithelial cell types. olfactory and respiratory cell types are shown separately. ace and tmprss were also co-expressed above chance levels (odds ratio . , p-value . e- , fisher's exact test). (c) umap representations of detected immature (gng ) and mature (gng ) osns. neither ace nor tmprss is detected in either population of osns. the color represents the normalized expression level for each gene (number of umis for a given gene divided by the total number of umis for each cell). (d) umap representations of all cells, depicting the normalized expression of cov- related genes ace and tmprss , as well as several cell type markers. ace and tmprss are expressed in respiratory and olfactory cell types, but not in osns. ace and tmprss are detected in hbc (krt ) and sustentacular (cyp a ) cells, as well as other respiratory epithelial cell types, including respiratory ciliated (foxj ) cells.
(e) various cov-related genes, including ace and tmprss , are expressed in respiratory and olfactory cell types, but not in osns. gene expression for ace and tmprss as well as marker genes for olfactory and respiratory epithelial cell types are shown normalized by their maximum expression across cell types. mhv, mouse hepatitis virus. (f) ace immunostaining of human olfactory mucosal biopsy samples. ace protein (green) is detected in sustentacular cells and krt -positive basal cells (red; white arrowhead). nuclei were stained with dapi (blue). bar = µm. the ace and krt channels from the box on the left are shown individually on the right. (g) gene expression across cell types and tissues in durante figure s ). the tissues correspond to progressive positions along the airway from nasal to distal lung. ace expression in olfactory hbc and sustentacular cells is comparable to that observed in other cell types in the respiratory tract.

to further explore the distribution of cov- cell entry genes in the olfactory system we turned to the mouse, which enables interrogative experiments not possible in humans. to evaluate whether the expression patterns observed in the mouse correspond to those observed in the human oe, we examined published datasets in which rna-seq was independently performed on mouse wom and on purified populations of mature osns ( ) ( ) ( ) . the cov- receptor ace and the protease tmprss were expressed in wom, as were the cathepsins ctsb and ctsl (figures a and s a) ( ) . however, expression of these genes (with the exception of ctsb) was much lower and ace expression was nearly absent in purified osn samples (figures a and s a, see legend for counts). genes used for cell entry by other covs (except st gal ) were also expressed in wom, and de-enriched in purified osns. the de-enrichment of ace and tmprss in osns relative to wom was also observed in two other mouse rna-seq datasets ( , ) ( figure s b ).
these data demonstrate that, as in humans, ace and other cov- entry-related genes are expressed in the mouse olfactory epithelium. the presence of ace and tmprss transcripts in mouse wom and their (near total) absence in purified osns suggest that the molecular components that enable cov- entry into cells are expressed in non-neuronal cell types in the mouse nasal epithelium. to identify the specific cell types that express ace and tmprss , we performed scseq (via drop-seq, see methods) on mouse wom ( figure b ). these results were consistent with observations made in the human epithelium: ace and tmprss were expressed in a fraction of sustentacular and bowman's gland cells, and a very small fraction of stem cells, but not in osns (zero of , identified mature osns, figures c and s c-d) . of note, only dorsally-located sustentacular cells, which express the markers sult c and acsm , were positive for ace ( figures d and s d-e) . indeed, reanalysis of the ace + subset of human sustentacular cells revealed that all positive cells expressed genetic markers associated with the dorsal epithelium ( figure s d ). an independent mouse scseq data set (obtained using the x chromium platform, see methods) revealed that olfactory sensory neurons did not express ace ( of mature osns were positive for ace ), while expression was observed in a fraction of bowman's gland cells and hbcs ( figure s , see methods). expression in sustentacular cells was not observed in this dataset, which included relatively few dorsal sustentacular cells (a possible consequence of the specific cell isolation procedure associated with the x platform, which distinguishes it from drop-seq; compare figures s c and d) . staining of the mouse wom with anti-ace antibodies confirmed that ace protein is expressed in sustentacular cells and is specifically localized to the sustentacular cell microvilli ( figure e -k). 
ace + sustentacular cells were identified exclusively within the dorsal subregion of the oe; critically, within that region many (and possibly all) sustentacular cells expressed ace ( figure f -g). this observation is consistent with the possibility that ace protein can be broadly expressed in cell populations that exhibit sparse expression when characterized by scseq. staining was also observed in bowman's gland cells but not in osns ( figure h -j). taken together, these data demonstrate that ace is expressed by sustentacular cells that specifically reside in the dorsal epithelium in both mouse and human. considered positive if any transcripts (umis) were expressed for a given gene. sustentacular cells (sus) from dorsal and ventral zones are quantified separately. ace is detected in dorsal sustentacular, bowman's gland, hbcs, as well as respiratory cell types. (d) umap representation of sustentacular cells, with expression of cov- related genes ace and tmprss , as well as marker genes for sus (both pan-sus marker cbr and dorsal specific marker sult c ) indicated. each point represents an individual sustentacular cell, and the color represents the normalized expression level for each gene (number of umis for a given gene divided by the total number of umis for each cell; in this plot ace expression is binarized for visualization purposes). ace -positive sustentacular cells are found within the dorsal sult c positive subset. umap plots for other cell types are shown in figure s . (e) ace immunostaining of mouse main olfactory epithelium. as shown in this epithelial hemisection, ace protein is detected in the dorsal zone and respiratory epithelium. note that the punctate ace staining beneath the epithelial layer is likely associated with vasculature (see figure f ). bar = µm. arrowheads depict the edges of ace expression, corresponding to the presumptive dorsal zone (confirmed in g). 
viral injury can lead to broad changes in oe physiology that are accompanied by recruitment of stem cell populations tasked with regenerating the epithelium ( , ) . to characterize the distribution of ace expression under similar circumstances, we injured the oe by treating mice with methimazole (which specifically ablates osns), and then employed a previously established lineage tracing protocol to perform scseq on hbcs and their descendants during subsequent regeneration (see methods) ( ) . this analysis revealed that after injury ace and tmprss are expressed in subsets of sustentacular cells and hbcs, as well as in the activated hbcs that serve to regenerate the epithelium (figures a-c and s ; note that activated hbcs express ace at higher levels than resting hbcs). analysis of the ace + sustentacular cell population revealed expression of dorsal epithelial markers ( figure d ). to validate these results, we re-analyzed a similar lineage tracing dataset in which identified hbcs and their progeny were subject to smart-seq -based deep sequencing, which is more sensitive than scseq ( ) . in this dataset, ace was detected in more than . % of gbcs, nearly % of activated hbcs and nearly % of sustentacular cells but was not detected in osns ( figures s b) . furthermore, larger percentages of hbcs, gbcs and sustentacular cells expressed tmprss . immunostaining with anti-ace antibodies confirmed that ace protein was present in activated stem cells under these regeneration conditions ( figure e ). these results demonstrate that activated stem cells recruited during injury express ace , and do so at higher levels than those in resting stem cells. 
given the potential for the re and oe in the nasal cavity to be directly infected with cov- , we assessed the expression of ace and other cov entry genes in the mouse olfactory bulb (ob), which is directly connected to osns via cranial nerve i (cn i); in principle, alterations in bulb function could cause anosmia independently of functional changes in the oe. to do so, we performed scseq (using drop-seq, see methods) on the mouse ob, and merged these data with a previously published ob scseq analysis, yielding a dataset with nearly , single cells (see methods) ( ) . this analysis revealed that ace expression was absent from ob neurons and instead was observed only in vascular cells, predominantly in pericytes, which are involved in blood pressure regulation, maintenance of the blood-brain barrier, and inflammatory responses (figures a-d and s - ) ( ) . although other potential cov proteases were expressed in the ob, tmprss was not expressed. we also performed smart-seq -based deep sequencing of single ob dopaminergic juxtaglomerular neurons, a population of local interneurons in the ob glomerular layer that (like tufted cells) can receive direct monosynaptic input from nose osns (figures e and s , see methods); these experiments confirmed the absence of ace and tmprss expression in this cell type. immunostaining in the ob revealed that blood vessels expressed high levels of ace protein, particularly in pericytes; consistent with the scseq results, staining was not observed in any neuronal cell type ( figure f ). these observations may also hold true for other brain regions, as re-analysis of deeply sequenced scseq datasets from different regions of the nervous system demonstrated that ace and tmprss expression is absent from neurons, consistent with prior immunostaining results ( figure s ) ( ) . 
given the extensive similarities detailed above in expression patterns for ace and tmprss in the mouse and human, these findings (performed in mouse) suggest that ob neurons are likely not a primary site of infection, but that vascular pericytes may be sensitive to cov- infection in the ob. here we show that subsets of oe sustentacular cells, hbcs, and bowman's gland cells in both mouse and human samples express the cov- receptor ace and the spike protein protease tmprss . human oe sustentacular cells express these genes at levels comparable to those observed in lung cells. in contrast, we failed to detect ace expression in mature osns at either the transcript or protein levels. these observations suggest that cov- does not directly enter osns, but instead may target oe support and stem cells. similarly, neurons in the ob do not express ace , whereas vascular pericytes do. thus primary infection of non-neuronal cell types -rather than sensory or bulb neurons -may be responsible for anosmia and related disturbances in odor perception in covid- patients. the identification of non-neuronal cell types in the oe and bulb susceptible to cov- infection suggests four possible, non-mutually-exclusive mechanisms for the acute loss of smell reported in covid- patients. first, local infection of support and vascular cells in the nose and bulb could cause significant inflammatory responses whose downstream effects could block effective odor conduction, or alter the function of osns or bulb neurons ( ) ( ). second, damage to support cells (which are responsible for local water and ion balance) could indirectly influence signaling from osns to the brain ( ) . third, damage to sustentacular cells and bowman's gland cells in mouse models can lead to diffuse architectural damage to the entire oe, which in turn could abrogate smell perception ( ) . finally, vascular damage could lead to hypoperfusion and inflammation leading to changes in ob function. 
immunostaining in the mouse suggests that ace protein is (nearly) ubiquitously expressed in sustentacular cells in the dorsal oe, despite sparse detection of ace transcripts using scseq. similarly, nearly all vascular cells positive for a pericyte marker also expressed ace protein, although only a fraction of ob pericytes were positive for ace message when assessed using scseq. although ace transcripts were more rarely detected than protein, there was a clear concordance at the cell type level: expression of ace mrna in a particular cell type accurately predicted the presence of ace protein, while ace transcript-negative cell types (including osns) did not express ace protein. if humans also exhibit a similar relationship between mrna and protein (a reasonable possibility given the precise match in olfactory cell types that express cov- cell entry genes between the two species), then ace protein is likely to be broadly expressed in human dorsal sustentacular cells. thus, there may be many sustentacular cells available for cov- infection in the human epithelium (which in turn could recruit a diffuse inflammatory process). that said, it remains possible that damage to the oe could be caused by more limited cell infection. for example, infection of subsets of sustentacular cells by the sdav coronavirus in rats ultimately leads to disruption of the global architecture of the oe, suggesting that focal coronavirus infection may be sufficient to cause diffuse epithelial damage ( ) . the natural history of cov -induced anosmia is only now being defined; while recovery of smell has been reported, it remains unclear whether in a subset of patients smell disturbances will be long-lasting or permanent ( ) ( ) ( ) ( ) ( ) . we observe that activated hbcs, which are recruited after injury, express ace at higher levels than those apparent in resting stem cells.
while on its own it is likely that infection of stem cells would not cause acute smell deficits, in the context of infection the dual challenge of loss of sustentacular cells, together with the inability to effectively renew the oe over time, could result in persistent anosmia. many viruses, including coronaviruses, have been shown to propagate from the nasal epithelium past the cribriform plate to infect the ob; this form of central infection has been suggested to mediate olfactory deficits, even in the absence of lasting oe damage ( , ( ) ( ) ( ) ( ) ( ) . the rodent coronavirus mhv passes from the nose to the bulb, even though rodent osns do not express ceacam , the main mhv receptor ( , ) ( figures s c, s e, s a) , suggesting that covs in the nasal mucosa can reach the brain through mechanisms independent of axonal transport by sensory nerves; interestingly, ob dopaminergic juxtaglomerular cells express ceacam ( figure e ), which likely supports the ability of mhv to target the bulb and change odor perception. one speculative possibility is that local seeding of the oe with cov- -infected cells can result in osn-independent transfer of virions from the nose to the bulb, perhaps via the vascular supply shared between the ob and the osn axons that comprise cn i. although cn i was not directly queried in our datasets, it is reasonable to infer that vascular pericytes in cn i also express ace , which suggests a possible route of entry for cov- from the nose into the brain. given the absence of ace in ob neurons, we speculate that any central olfactory dysfunction in covid- is the secondary consequence of pericyte-mediated vascular inflammation ( ) . we note several caveats that temper our conclusions. although current data suggest that ace is the most likely receptor for cov- in vivo, it is possible (although it has not yet been demonstrated) that other molecules such as bsg may enable cov- entry independently of ace ( figures e, s c , s e, s a) ( , ) . 
in addition, it has recently been reported that low level expression of ace can support cov- cell entry ( ); it is possible, therefore, that ace expression beneath the level of detection in our assays may yet enable cov- infection of apparently ace negative cell types. we also propose that damage to the olfactory system is either due to primary infection or secondary inflammation; it is possible (although has not yet been demonstrated) that cells infected with cov- can form syncytia with cells that do not express ace . such a mechanism could damage neurons adjacent to infected cells. any reasonable pathophysiological mechanism for covid- -associated anosmia must account for the high penetrance of smell disorders relative to endemic viruses, the apparent suddenness of smell loss (which can precede the development of other symptoms), and the transient nature of dysfunction in many patients ( ) ( ) ( ) ( ) ( ) ( , ( ) ( ) ( ) ; definitive identification of the disease mechanisms underlying covid- -mediated anosmia will require additional research. nonetheless, our identification of cells in the oe and ob expressing molecules known to be involved in cov- entry illuminates a path forward for future studies. human scseq data from durante et al. ( ) was downloaded from the geo at accession gse . x genomics mtx files were filtered to remove any cells with fewer than total counts. additional preprocessing was performed as described above, including total counts normalization and filtering for highly variable genes using the spring gene filtering function "filter_genes" with parameters ( , , ) . the resulting data were visualized in spring and partitioned using louvain clustering on the spring k-nearest-neighbor graph. four clusters were removed for quality control, including two with low total counts (likely background) and two with high mitochondrial counts (likely stressed or dying cells). putative doublets were also identified using scrublet and removed ( % of cells). 
the remaining cells were projected to dimensions using pca. pca-batch-correction was performed using patient as a reference, as previously described ( ) . the filtered data were then re-partitioned using louvain clustering on the spring graph and each cluster was annotated using known marker genes, as described in ( ) . for example, immature and mature osns were identified via their expression of gng and gng , respectively. hbcs were identified via the expression of krt and tp and olfactory hbcs were distinguished from respiratory hbcs via the expression of cxcl and meg . identification of sus cells (cyp a , cyp j ), bowman's gland (sox , gpx ), and mv ionocytes-like cells (ascl , cftr, foxi ) was also performed using known marker genes. for visualization, the top principal components were reduced to two dimensions using umap with parameters (n_neighbors= , min_dist= . ). the filtered human scseq dataset contained cells. each of the samples contained cells from both the olfactory and respiratory epithelium, although the frequency of osns and respiratory cells varied across patients, as previously described ( ) . cells expressed ace and cells expressed tmprss . of the identified osns, including both immature and mature cells, none of the cells express ace and only ( . %) expressed tmprss . in contrast, ace was reliably detected in at least % and tmprss was expressed in close to % of multiple respiratory epithelial subtypes. the expression of both known cell type markers and known cov-related genes was also examined across respiratory and olfactory epithelial cell types. for these gene sets, the mean expression in each cell type was calculated and normalized by the maximum across cell types. data from deprez et al. ( ) were downloaded from the human cell atlas website (https://www.genomique.eu/cellbrowser/hca/; "single-cell atlas of the airway epithelium (grch human genome)"). 
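the two per-cell-type summaries described above (the percent of cells in which any umi is detected for a gene, and the mean normalized expression scaled by its maximum across cell types) can be sketched as follows. this is a minimal illustration with numpy, not the authors' code; the function name `summarize_by_cell_type` and the toy inputs are hypothetical.

```python
import numpy as np

def summarize_by_cell_type(counts, cell_types):
    """Summarize a (cells x genes) UMI count matrix by cell type.

    Hypothetical helper, not taken from the original analysis.
    Returns (types, pct_positive, scaled_mean), where rows follow `types`.
    """
    counts = np.asarray(counts, dtype=float)
    cell_types = np.asarray(cell_types)

    # Normalized expression: UMIs for a gene divided by total UMIs of the cell.
    totals = counts.sum(axis=1, keepdims=True)
    norm = counts / np.maximum(totals, 1.0)  # cells with zero UMIs stay at zero

    types = sorted(set(cell_types.tolist()))
    pct_rows, mean_rows = [], []
    for t in types:
        mask = cell_types == t
        # A cell counts as positive if at least one UMI was detected.
        pct_rows.append(100.0 * (counts[mask] > 0).mean(axis=0))
        mean_rows.append(norm[mask].mean(axis=0))

    pct_positive = np.vstack(pct_rows)
    mean_expr = np.vstack(mean_rows)

    # Scale each gene by its maximum mean expression across cell types.
    gene_max = mean_expr.max(axis=0)
    scaled_mean = mean_expr / np.where(gene_max > 0, gene_max, 1.0)
    return types, pct_positive, scaled_mean
```

calling the function on a count matrix together with one cell-type label per cell returns one row per cell type, so a gene that is never detected in a given cell type yields 0 percent positive cells and a scaled mean of 0 for that type.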
a subset of these data was combined with a subset of the durante data for mapping between cell types. for the deprez data, the subset consisted of samples from the nasal re that belonged to a cell type with > cells, including basal, cycling basal, suprabasal, secretory, mucous multiciliated cells, multiciliated, sms goblet and ionocyte. we observed two distinct subpopulations of basal cells, with one of the two populations distinguished by expression of cxcl . the cells in this population were manually identified using spring and defined for downstream analysis as a separate cell type annotation called "basal (cxcl +)". for the durante data, the subset consisted of cells from cell types that had some putative similarity to cells in the deprez dataset, including olfactory hbc, cycling respiratory hbc, respiratory hbc, early respiratory secretory cells, respiratory secretory cells, sustentacular cells, bowman's gland, and olfactory microvillar cells. to establish a cell type mapping: ) durante ( ) and deprez ( ) data were combined and gene expression values were linearly scaled so that all cells across datasets had the same total counts. pca was then performed using highly variable genes (n= genes) and pca-batch-correction ( ) with the durante data as a reference set. ) the table of votes t was z-scored against a null distribution, generated by repeating the procedure above times with shuffled cell type labels. the resulting z-scores were similar between the two possible mapping directions (durante -> deprez vs. deprez -> durante; r= . pearson correlation of mapping z-scores). the mapping z-scores were also highly robust upon varying the number of votes cast per cell (r> . correlation of mapping z-scores upon changing the vote numbers to or as opposed to ). only cell-type correspondences with a high z-score in both mapping directions (z-score > ) were used for downstream analysis.
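the vote-based mapping and its z-scoring against shuffled labels can be sketched as below. the text above does not fully specify how votes are cast, so this sketch assumes that each cell in one dataset votes for its nearest cells in the other dataset by euclidean distance in the shared pca space; that assumption, the function names `vote_table` and `mapping_zscores`, and the default values of `n_votes` and `n_shuffles` are all hypothetical rather than taken from the original analysis.

```python
import numpy as np

def vote_table(src_pcs, src_labels, dst_pcs, dst_labels, src_types, dst_types, n_votes):
    """Count votes from each source cell type to each destination cell type.

    Assumption: every source cell votes for its n_votes nearest destination
    cells (Euclidean distance in PCA space).
    """
    src_idx = {t: k for k, t in enumerate(src_types)}
    dst_idx = {t: k for k, t in enumerate(dst_types)}
    # Pairwise squared distances between source and destination cells.
    d2 = ((src_pcs[:, None, :] - dst_pcs[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_votes]
    table = np.zeros((len(src_types), len(dst_types)))
    for i, neighbors in enumerate(nearest):
        for j in neighbors:
            table[src_idx[str(src_labels[i])], dst_idx[str(dst_labels[j])]] += 1
    return table

def mapping_zscores(src_pcs, src_labels, dst_pcs, dst_labels,
                    n_votes=5, n_shuffles=200, seed=0):
    """Z-score the observed vote table against a shuffled-label null."""
    src_pcs = np.asarray(src_pcs, dtype=float)
    dst_pcs = np.asarray(dst_pcs, dtype=float)
    src_labels = np.asarray(src_labels)
    dst_labels = np.asarray(dst_labels)
    src_types = sorted(set(src_labels.tolist()))
    dst_types = sorted(set(dst_labels.tolist()))

    observed = vote_table(src_pcs, src_labels, dst_pcs, dst_labels,
                          src_types, dst_types, n_votes)

    # Null distribution: repeat the voting with shuffled cell type labels.
    rng = np.random.default_rng(seed)
    null = np.stack([
        vote_table(src_pcs, rng.permutation(src_labels),
                   dst_pcs, rng.permutation(dst_labels),
                   src_types, dst_types, n_votes)
        for _ in range(n_shuffles)
    ])
    mu = null.mean(axis=0)
    sd = null.std(axis=0)
    z = (observed - mu) / np.where(sd > 0, sd, 1.0)
    return z, observed
```

running the function once in each direction (swapping the source and destination arguments) gives the two z-score tables whose agreement is used above to accept a correspondence.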
to establish a common scale of gene expression between datasets, we restricted the analysis to cell type correspondences that were both supported by bioinformatic mapping and shared a nominal cell type designation based on marker genes. these included: basal/suprabasal cells = "respiratory hbcs" from durante et al., and "basal" and "suprabasal" cells from deprez et al.

we next sought a transformation of the durante data so that it would agree with the deprez data within the corresponding cell types identified above. to account for differing normalization strategies applied to each dataset prior to download (log normalization and rescaling with cell-specific factors for deprez et al. but not for durante et al.), we used an ansatz for the transformation in which the pseudocount p is a global latent parameter and the rescaling factors are fit to each gene separately; the transformation t is applied to the gene expression value for cell i and gene j in the durante data. the parameter p was fit by maximizing the correlation of average gene expression across all genes between each of the cell type correspondences listed above. the rescaling factors were then fitted separately for each gene by taking the quotient of average gene expression between the deprez data and the log-transformed durante data, again across the cell type correspondences above.

normalized gene expression tables were obtained from previously published datasets ( , ( ) ( ) ( ) . for the mouse data sets, the means of the replicates from wom or osn were used to calculate log fold changes. for the mouse data from saraiva et al. and the primate data sets ( , ) , the normalized counts of the genes of interest from individual replicates were plotted. below is a table with detailed sample information.

sample information for the bulk rna-seq data analyzed in this study

a new dataset of whole olfactory mucosa scseq was generated from adult male mice ( - weeks-old).
all mouse husbandry and experiments were performed following institutional and federal guidelines and approved by harvard medical school's institutional animal care and use committee (iacuc). briefly, dissected main olfactory epithelium were cleaned up in µl of ebss (worthington) and epithelium tissues were isolated in µl of papain ( u/ml in ebss) and µl of dnase i ( u/ml). tissue pieces were transferred to a ml round-bottom tube (bd) and . ml of papain and µl of dnase i were added. after - . hour incubation with rocking at °c, the suspension was triturated with a ml pipette times and passed through µm cell strainer (bd) and strainer was washed with ml of dmem + % fbs (invitrogen). the cell suspension was centrifuged at g for min. cells were resuspended with ml of dmem + % fbs and centrifuged at g for min. cells were suspended with pbs + . % bsa and concentration was measured by hemocytometer. drop-seq experiments were performed as previously described ( ) . microfluidics devices were obtained from flowjem and barcode beads were obtained from chemgenes. of min drop-seq runs were collected in total, which were obtained from mice. replicates of drop-seq samples were sequenced across runs on an illumina nextseq platform. paired end reads from the fastq files were trimmed, aligned, and tagged via the drop-seq tools (v . ) pipeline, using star (v . . a) with genomic indices from ensembl release . the digital gene expression matrix was generated for , cells for _ , , cells for , _ , _ds , _ds , _ds , , cells for _ds , and , cells for . processing of the wom drop-seq samples was performed in seurat (v . . ). cells with less than umis or more than , umis, or higher than % mitochondrial genes were removed. potential doublets were removed using scrublet. cells were initially preprocessed using the seurat pipeline. variable genes "findvariablegenes" (y.cutoff = . 
) were scaled (regressing out effects due to numi, the percent of mitochondrial genes, and replicate ids) and the data were clustered using pcs with the louvain algorithm (resolution= . ). in a fraction of sustentacular cells, we observed co-expression of markers for sustentacular cells and other cell types (e.g. osns). re-clustering of sustentacular cells alone separated out these presumed doublets from the rest of the sustentacular cells, and the presumed doublets were removed for the analyses described below. the filtered cells from the preprocessing steps were reanalyzed in python using scanpy and spring. in brief, the raw gene counts in each cell were total counts normalized and variable genes were identified using the spring gene filtering function "filter_genes" with parameters ( , , ); mitochondrial and olfactory receptor genes were excluded from the variable gene lists. the resulting variable genes were z-scored and the dimensionality of the data was reduced to via principal component analysis. the k-nearest neighbor graph (n_neighbors= ) of these pcs was clustered using the leiden algorithm (resolution= . ) and was reduced to two dimensions for visualization via the umap method (min_dist= . ). clusters were manually annotated on the basis of known marker genes and those sharing markers (e.g. olfactory sensory neurons) were merged. the mouse wom drop-seq dataset contained cells that passed the above filtering. each of the clusters identified contained cells from all replicates in roughly equal proportions. of the mature osns and the immature osns, none of the cells expressed ace . in contrast, in the olfactory epithelial cells, ace expression was observed in the bowman's gland, olfactory hbcs, and dorsal sustentacular cells.
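the core numerical steps of the reanalysis described above (total counts normalization, z-scoring of genes, principal component analysis and construction of a k-nearest-neighbor graph) can be sketched with numpy alone. this is an illustration under stated assumptions, not the scanpy/spring pipeline itself: variable gene selection, leiden clustering and umap are omitted, the function name `normalize_and_embed` is hypothetical, and the default values of `n_pcs` and `n_neighbors` are placeholders because the values used in the study are not given in the text above.

```python
import numpy as np

def normalize_and_embed(counts, n_pcs=2, n_neighbors=3):
    """Total-count normalize, z-score, run PCA and build a kNN index.

    Hypothetical helper; assumes every cell has at least one UMI.
    """
    counts = np.asarray(counts, dtype=float)

    # Total counts normalization: rescale each cell to the mean library size.
    totals = counts.sum(axis=1, keepdims=True)
    norm = counts / totals * totals.mean()

    # Z-score each gene across cells (genes with zero variance are left at 0).
    mu = norm.mean(axis=0)
    sd = norm.std(axis=0)
    z = (norm - mu) / np.where(sd > 0, sd, 1.0)

    # PCA via singular value decomposition of the centered matrix.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]

    # k-nearest neighbors in PCA space, excluding each cell itself.
    d2 = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    knn = np.argsort(d2, axis=1)[:, :n_neighbors]
    return norm, z, pcs, knn
```

the returned neighbor indices are the input that a graph clustering method such as leiden would take in the full pipeline; the dense distance matrix used here is only practical for small examples.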
mice were sacrificed with a lethal dose of xylazine and nasal epithelium with attached olfactory bulbs were dissected and fixed in % paraformaldehyde (electron microscope sciences, ) in phosphate-buffered saline (pbs) for overnight at °c or for hours at room temperature. tissues were washed in pbs for times ( min each) and incubated in . m edta in pbs overnight at °c. the following day, tissues were rinsed by pbs and incubated in % sucrose in pbs for at least min, transferred to tissue freezing medium (vwr, - ) for at least min and frozen on crushed dry ice and stored at - °c until sectioning. tissue sections ( µm thick for the olfactory bulb and µm thick for nasal epithelium) were collected on superfrost plus glass slides (vwr, ) and stored at - °c until immunostaining. for methimazole treated samples, adult c bl/ j mice ( - weeks old, jax stock no. ) were given intraperitoneal injections with methimazole (sigma m ) at µg/g body weight and sacrificed at , , and -hour timepoints. sections were permeabilized with . % triton x- in pbs for min then rinsed times in pbs. sections were then incubated for - min in blocking solution that consisted of pbs containing % bovine serum albumin (jackson immunoresearch, - - ) and % donkey serum (jackson immunoresearch, - - ) at room temperature, followed by overnight incubation at °c with primary antibodies diluted in the same blocking solution. primary antibodies used are as follows. after secondary antibody incubation, sections were washed twice for - min in pbs, incubated with nm dapi in pbs for min and then rinsed with pbs. slides were mounted with glass coverslips using vectashield mounting medium (vector laboratories, h- ) or prolong diamond antifade mountant (invitrogen, p ). for co-staining of ace and nqo , slides were first stained with ace primary antibody and donkey anti-goat igg alexa secondary. 
after washes of secondary antibody, tissues were incubated with unconjugated donkey anti-goat igg fab fragments (jackson immunoresearch, - - ) at µg/ml diluted in blocking solution for hour at room temperature. tissues were washed twice with pbs, once in blocking solution, and incubated in blocking solution for - min at room temperature, followed by a second round of staining with the nqo primary antibody and donkey anti-goat igg alexa secondary antibody. confocal images were acquired using a leica spe microscope (harvard medical school neurobiology imaging facility) with nm, nm, nm, and nm laser lines. multi-slice z-stack images were acquired, and their maximal intensity projections are shown. for figure e , tiled images were acquired and stitched by the leica las x software. images were processed using fiji imagej software ( ) , and noisy images were median-smoothed using the remove outliers function built into fiji. sult c rna was detected by fluorescent rnascope assay (advanced cell diagnostics, kit ) using probe -c , following the manufacturer's protocol (rnascope fluorescent multiplex kit user manual, -um date ) for paraformaldehyde-fixed tissue. prior to initiating the hybridization protocol, the tissue was pre-treated with two successive incubations (first min, then min long) in rnascope protease iii (advanced cell diagnostics, ) at °c, then washed in distilled water. at the end of protocol, the tissue was washed in pbs and subjected to the -day immunostaining protocol described above. human olfactory mucosa biopsies were obtained via irb-approved protocol at duke university school of medicine, from nasal septum or superior turbinate during endoscopic sinus surgery. tissue was fixed with % paraformaldehyde and cryosectioned at µm and sections were processed for immunostaining, as previously described ( ) . 
sections from a female nasal septum biopsy were stained for ace ( figure f ) using the same goat anti-ace (thermo fisher, pa - , : ) and the protocol described above for mouse tissue. the human sections were co-stained with rabbit antikeratin (abcam, ab ; ab_ , : ) and were detected with alexafluor donkey anti-goat (jackson immunoresearch, - - ) and alexafluor donkey anti-rabbit (jackson immunoresearch, - - ) secondary antibodies ( : ). as further validation of ace expression and to confirm the lack of ace expression in human olfactory sensory neurons ( figure s e ), sections were stained with a rabbit anti-ace (abcam, ab ; rrid:ab_ , used at : ) antibody immunogenized against human ace and a mouse tuj antibody against neuronspecific tubulin (biolegend, ; rrid:ab_ ). anti-ace was raised against a c-terminal synthetic peptide for human ace and was validated by the manufacturer to not cross-react with ace for immunohistochemical labeling of ace in fruit bat nasal tissue as well as in human lower airway. recombinant human ace abolished labeling with this antibody in a previous study in human tissue, further demonstrating its specificity ( ) . the tuj antibody was validated, as previously described ( ) . biotinylated secondary antibodies (vector labs), avidin-biotinylated horseradish peroxidase kit (vector) followed by fluorescein tyramide signal amplification (perkin elmer) were applied per manufacturer's instructions. for dual staining, tuj was visualized using alexafluor goat anti-mouse (jackson immunoresearch, - - ; rrid: ab_ ). human sections were counterstained with ', -diamidino- -phenylindole (dapi) and coverslips were mounted using prolong gold (invitrogen) for imaging, using a leica dmi microscope system. images were processed using fiji imagej software (nih). scale bars were applied directly from the leica acquisition software metadata in imagej tools. unsharp mask was applied in imagej, and brightness/contrast was adjusted globally. 
month-old and month-old wild type c bl/ j mice were obtained from the national institute on aging aged rodent colony and used for the wom experiments; each experimental condition consisted of one male and one female mouse to aid doublet detection. mice containing the transgenic krt -creer(t ) driver ( ) and rosa -yfp reporter allele ( ) were used for the hbc lineage tracing dataset. all mice were assumed to be of normal immune status. animals were maintained and treated according to federal guidelines under iacuc oversight at the university of california, berkeley. the olfactory epithelium was surgically removed, and the dorsal, sensory portion was dissected and dissociated, as previously described ( ) . for wom experiments, dissociated cells were subjected to fluorescence-activated cell sorting (facs) using propidium iodide to identify and select against dead or dying cells; , cells/sample were collected in % fbs. for the hbc lineage tracing experiments krt -creer; rosa yfp/yfp mice were injected once with tamoxifen ( . mg tamoxifen/g body weight) at p - days of age and sacrificed at hours, hours, hours, days and days post-injury, as previously described ( , ) . for each experimental time point, yfp+ cells were isolated by facs based on yfp expression and negative for propidium iodide, a vital dye. cells isolated by facs were subjected to single-cell rna-seq. three replicates (defined here as a facs collection run) per age were analyzed for the wom experiment; at least two biological replicates were collected for each experimental condition for the hbc lineage tracing experiment. single cell cdna libraries from the isolated cells were prepared using the chromium single cell ' system according to the manufacturer's instructions. 
the wom preparation employed v chemistry with the following modification: the cell suspension was directly added to the reverse transcription master mix, along with the appropriate volume of water to achieve the approximate cell capture target. the hbc lineage tracing experiments were performed using v chemistry. the . % weight/volume bsa washing step was omitted to minimize cell loss. completed libraries were sequenced on illumina hiseq to produce paired-end nt reads. sequence data were processed with the x genomics cell ranger pipeline ( . . for v chemistry), resulting in the initial starting number before filtering of , wom cells and , hbc lineage traced cells. the scone r/bioconductor package ( ) was used to filter out lowly-expressed genes (fewer than umi's in fewer than cells) and low-quality libraries (using the metric_sample_filter function with arguments hard_nreads = , zcut = ). cells with co-expression of male (ddx y, eif s y, kdm d, and uty) and female marker genes (xist) were removed as potential doublets from the wom dataset. for both datasets, doublet cell detection was performed per sample using doubletfinder ( ) and scrublet ( ) . genes with at least umis in at least cells were used for downstream clustering and cell type identification. for the hbc lineage tracing dataset, the bioconductor package scone was used to pick the top normalization ("none,fq,ruv_k= ,no_bio,batch"), corresponding to full quantile normalization, batch correction and removing one factor of unwanted variation using ruv ( ) . a range of cluster labels were created by clustering using the partitioning around medoids (pam) algorithm and hierarchical clustering in the clusterexperiment bioconductor package ( ) , with parameters k s= ( , , , , , ) and alpha=(na, . , . , . ). clusters that did not show differential expression were merged (using the function mergeclusters with arguments mergemethod = 'adjp', cutoff = . , and demethod = 'limma' for the lineagetraced dataset). 
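The sex-marker doublet filter described above (removing cells that co-express male and female marker genes) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the dictionary-based count layout, and the toy counts are assumptions, and only two of the marker genes named in the text (Uty and Xist) are used.

```python
def flag_sex_marker_doublets(counts, male_markers, female_markers):
    """Return ids of cells with at least one UMI for both marker sets.

    counts: dict mapping cell id -> dict of gene name -> UMI count
    (a hypothetical layout chosen for this sketch).
    """
    flagged = []
    for cell_id, genes in counts.items():
        has_male = any(genes.get(g, 0) > 0 for g in male_markers)
        has_female = any(genes.get(g, 0) > 0 for g in female_markers)
        # A single cell from one animal should not express both sets,
        # so co-expression is treated as evidence of a doublet.
        if has_male and has_female:
            flagged.append(cell_id)
    return flagged


# Toy data with made-up UMI counts
toy_counts = {
    "cell_a": {"Uty": 2, "Xist": 0},
    "cell_b": {"Uty": 0, "Xist": 5},
    "cell_c": {"Uty": 1, "Xist": 3},
}
doublets = flag_sex_marker_doublets(toy_counts, {"Uty"}, {"Xist"})
```

This check only catches doublets formed from one male and one female cell, which is presumably why it was combined with doubletfinder and scrublet.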
initial clustering identified one macrophage (msr +) cluster consisting of cells; upon its removal and restarting from the normalization step a subsequent set of clusters was obtained. these clusters were used to filter out cells for which no stable clustering could be found (i.e., 'unassigned' cells), and four clusters respectively consisting of , and and cells. doublets were identified using doubletfinder and putative doublets were removed. inspection of the data in a three-dimensional umap embedding identified two groups of cells whose experimentally sampled timepoint did not match their position along the hbc differentiation trajectory, and these additional cells were also removed from subsequent analyses. analysis of wom scseq data was performed in python using the open-source scanpy software starting from the raw umi count matrix of the cells passing the initial filtering and qc criteria described above. umis were total-count normalized and scaled by , (tpt, tag per ten-thousands) and then log-normalized. for each gene, the residuals from linear regression models using the total number of umis per cell as predictors were then scaled via z-scoring. pca was then performed on a set of highly variable genes (excluding olfactory receptor (or) genes) calculated using the "highly_variable_genes" function with parameters: min_mean= . , max_mean= , min_disp= . . a batch corrected neighborhood graph was constructed by the "bbknn" function with pcs with the parameters: local_connectivity= . , and the data were embedded in two dimensions using the umap function with default parameters (min_dist = . ). cells were clustered using the neighborhood graph via the leiden algorithm (resolution = . ). identified clusters were manually merged and annotated based on known marker gene expression. the filtered hbc lineage dataset containing cells was analyzed in python and processed for visualization using pipelines in spring and scanpy ( , ) .
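The normalization described above for the wom data (total-count normalization to tags per ten thousand, log-normalization, then per-gene z-scoring) can be sketched in plain Python. This is an illustration under stated assumptions, not the scanpy implementation: the function names are hypothetical, log(1 + x) is assumed for the log step because the text says only "log-normalized", and the regression on total UMIs per cell is omitted.

```python
import math


def normalize_tpt_log(cell_counts, scale=10_000):
    """Scale one cell's UMI counts to tags per ten thousand, then log1p.

    cell_counts: dict of gene name -> raw UMI count for a single cell.
    """
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {
        gene: math.log1p(count / total * scale)
        for gene, count in cell_counts.items()
    }


def zscore(values):
    """Z-score one gene's values across cells (population std. dev.)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


# Toy cell with made-up counts: gene "a" holds 25% of the UMIs
normalized = normalize_tpt_log({"a": 1, "b": 3})
scaled = zscore([1.0, 2.0, 3.0])
```

Dividing by each cell's total before scaling makes cells with different sequencing depths comparable; the z-scoring step then puts genes on a common scale before pca.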
in brief, total counts were normalized to the median total counts for each cell and highly variable genes were selected using the spring gene filtering function ("filter_genes") using parameters ( , , ) . the dimensionality of the data was reduced to using principal components analysis (pca) and visualized in two-dimensions using the umap method with parameters (n_neighbors= , min_dist= . ). clustering was performed using the leiden algorithm (resolution= . ) and clusters were merged manually using known marker genes. expression of candidate cov- -related genes was defined if at least one transcript (umi) was detected in that cell, and the percent of cells expressing candidate genes was calculated for each cell type. in the wom dataset ace was only detected in out of , mature osns ( . %), and in the hbc lineage dataset, ace was not detected in any osns. furthermore, ace was not detected in immature sensory neurons (gbcs, inps, or iosns) in either dataset. single-cell rna-seq data from hbc-derived cells from fletcher et al. and gadye et al ( , ) , labeled via krt -creer driver mice, were downloaded from geo at accession gse using the file "gse _oehbcdiff_cufflinks_eset_counts_table.txt.gz". processing was performed as described above, including total counts normalization and filtering for highly variable genes using the spring gene filtering function "filter_genes" with parameters ( , , ) . the resulting data were visualized in spring and a subset of cells were removed for quality control, including a cluster of cells with low total counts and another with predominantly reads from ercc spike-in controls. putative doublets were also identified using scrublet and removed ( % of cells) ( ) . the resulting data were visualized in spring and partitioned using louvain clustering on the spring k-nearest-neighbor graph using the top principal components. cell type annotation was performed manually using the same set of markers genes listed above. 
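The expression criterion stated above (a cell counts as expressing a gene if at least one umi is detected, and the percent of expressing cells is computed per cell type) can be written as a short sketch. The function name, the input layout, the placeholder gene name "GeneX", and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def percent_expressing(cells, gene):
    """Percent of cells per cell type with at least one UMI for `gene`.

    cells: iterable of (cell_type, {gene name: UMI count}) pairs
    (a hypothetical layout chosen for this sketch).
    """
    n_total = defaultdict(int)
    n_positive = defaultdict(int)
    for cell_type, umis in cells:
        n_total[cell_type] += 1
        if umis.get(gene, 0) >= 1:
            n_positive[cell_type] += 1
    return {ct: 100.0 * n_positive[ct] / n_total[ct] for ct in n_total}


# Toy data with made-up counts for a placeholder gene
toy_cells = [
    ("osn", {"GeneX": 0}),
    ("osn", {"GeneX": 0}),
    ("osn", {"GeneX": 1}),
    ("osn", {}),
    ("sus", {"GeneX": 4}),
    ("sus", {"GeneX": 2}),
]
pct = percent_expressing(toy_cells, "GeneX")
```

Because the threshold is a single umi, this measure is sensitive to sequencing depth and dropout, so percentages are best compared between cell types within one dataset.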
three clusters were removed for quality control, including one with low total counts and one with predominantly reads from ercc spike-in controls (likely background), and one with high mitochondrial counts (likely stressed cells). for visualization and clustering, the remaining cells were projected to dimensions using pca and visualized with umap with parameters (n_neighbors= , min_dist= . , alpha= . , maxiter= ). clustering was performed using the leiden algorithm (resolution= . ) and cell types were manually annotated using known marker genes. the filtered dataset of mouse hbc-derived cells contained cells. the percent of cells expressing each marker gene was calculated as described above. of the osns identified, none of them expressed ace , and only out of inps and iosns expressed ace . in contrast, ace and tmprss were both detected in hbcs and sus cells. single-cell rnaseq data from whole mouse olfactory bulb ( ) were downloaded from mousebrain.org/loomfiles_level_l .html in loom format (l olfactory.loom) and converted to a seurat object. samples were obtained from juvenile mice (age postnatal day [ ] [ ] [ ] [ ] . this dataset comprises cells passing cell quality filters, excluding cells identified as potential doublets. a new dataset of whole olfactory bulb scseq was generated from adult male mice ( - weeks-old). all mouse husbandry and experiments were performed following institutional and federal guidelines and approved by harvard medical school's institutional animal care and use committee (iacuc). briefly, dissected olfactory bulbs (including the accessory olfactory bulb and fractions of the anterior olfactory nucleus) were dissociated in µl of dissociation media (dm: hbss containing mm hepes, mm mgcl , mm d-glucose) with u/ml papain and u/ml dnase i (worthington). minced tissue pieces were transferred to a ml round-bottom tube (bd). dm was added to a final volume of . ml and the tissue was mechanically triturated times with a p pipette tip.
after -hour incubation with rocking at °c, the suspension was triturated with a ml pipette times and . ml was passed through µm cell strainer (bd). the suspension was then mechanically triturated with a p pipette tip times and µl were filtered on the same strainer. the cell suspension was further triturated with a p pipette tip times and filtered. ml of quench buffer ( ml of dm, . ml of protease inhibitor prepared by resuspending vial of protease inhibitor with ml of dm, and u of dnase i) was added to the suspension and centrifuged at g for min. cells were resuspended with ml of quench buffer and overlaid gently on top of ml of protease inhibitor, then spun down at g for min. the pellet was resuspended using dm supplemented with . % bsa and spun down at g for min. cells were suspended in µl of dm with . % bsa. drop-seq experiments were performed as previously described ( ) . microfluidics devices were obtained from flowjem and barcode beads were obtained from chemgenes. two min drop-seq runs were collected from a single dissociation preparation obtained from mice. two such dissociations were performed, giving total replicates. replicates of drop-seq samples were pooled and sequenced across runs on an illumina nextseq platform. paired end reads from the fastq files were trimmed, aligned, and tagged via the drop-seq tools ( - . ) pipeline, using star ( . . a) with genomic indices from ensembl release . the digital gene expression matrix was generated for , cells per replicate. cells with low numbers of genes ( ), low numbers of umis ( ) or high numbers of umis (> ) were removed ( % of cells). potential doublets were identified via scrublet and removed ( . % of cells). overall, this new dataset comprised cells. raw umi counts from juvenile and adult whole olfactory bulb samples were integrated in seurat ( ) . integrating the datasets ensured that clusters with rare cell types could be identified and that corresponding cell types could be accurately matched. 
as described below (see figure s ), although some cell types were observed with different frequencies, the integration procedure yielded stable clusters with cells from both datasets. briefly, raw counts were log-normalised separately and the most variable genes identified by variance stabilizing transformation for each dataset. the variable genes present in both datasets and the first principal components (pcs) were used as features for identifying the integration anchors. the integrated expression matrix was scaled and dimensionality reduced using pca. based on their percentage of explained variance, the first pcs were chosen for umap visualisation and clustering. graph-based clustering was performed using the louvain algorithm following the standard seurat workflow. cluster stability was analysed with clustree on a range of resolution values ( . to . ), with . yielding the most stable set of clusters ( ) . overall, clusters were identified, the smallest of which contained only cells with gene expression patterns consistent with blood cells, which were excluded from further visualisation plots. clustering the two datasets separately yielded similar results. moreover, the distribution of cells from each dataset across clusters was homogeneous ( figure s ) and the clusters corresponded to previous cell class and subtype annotations ( ) . as previously reported, a small cluster of excitatory neurons (cluster ) contained neurons from the anterior olfactory nucleus. umap visualisations of expression level for cell class and cell type markers, and for genes coding for coronavirus entry proteins, depict log-normalized umi counts. the heatmap in figure b shows the mean expression level for each cell class, normalised to the maximum mean value. the percentage of cells per cell class expressing ace was defined as the percentage of cells with at least one umi. in cells from both datasets, ace was enriched in pericytes but was not detected in neurons.
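The heatmap scaling described above (mean expression per cell class, normalised to the maximum mean value) amounts to the following small computation. The function name, input layout, and toy values are assumptions for illustration; the text does not say whether the maximum is taken per gene or across the whole heatmap, and this sketch handles one gene at a time.

```python
def scaled_class_means(expr_by_class):
    """Mean expression per cell class, divided by the largest class mean.

    expr_by_class: dict of cell class -> list of per-cell expression
    values for one gene (a hypothetical layout for this sketch).
    """
    means = {
        cls: sum(values) / len(values)
        for cls, values in expr_by_class.items()
        if values  # skip classes with no cells
    }
    top = max(means.values(), default=0.0)
    if top == 0:
        return {cls: 0.0 for cls in means}
    return {cls: mean / top for cls, mean in means.items()}


# Toy log-normalized values (made up) for a single gene
toy_expr = {
    "pericytes": [2.0, 4.0],
    "neurons": [0.0, 0.0],
    "astrocytes": [1.0, 2.0],
}
heatmap_row = scaled_class_means(toy_expr)
```

Scaling each row to its own maximum puts every gene on a 0 to 1 range, which makes the class with the highest expression easy to read off but hides absolute differences between genes.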
acute olfactory bulb µm slices were obtained from dat-cre/flox-tdtomato (b .sjl-slc a tm . (cre) bkmn/j , jax stock / b .cg-gt(rosa) sor tm (cag-tdtomato)hze , jax stock ) p mice as previously described ( ) . as part of a wider study, at p these mice had undergone brief h unilateral naris occlusion via a plastic plug insert (n = mice) or were subjected to a sham control manipulation (n = mice); all observed effects here were independent of these treatment groups. single cell suspensions were generated using the neural tissue dissociation kit -postnatal neurons (miltenyi biotec. cat no. - - ), following manufacturer's instructions for manual dissociation, using fired-polished pasteur pipettes of progressively smaller diameter. after enzymatic and mechanical dissociations, cells were filtered through a µm cell strainer, centrifuged for minutes at ° c, resuspended in µl of acsf (in mm: nacl, . kcl, . nah po , hepes, glucose, mgcl , cacl ) with channel blockers ( . µm ttx, µm cnqx, µm d-apv) and kept on ice to minimise excitotoxicity and cell death. for manual sorting of fluorescently labelled dopaminergic neurons we adapted a previously described protocol ( ) . µl of single cell suspension was dispersed on . mm petri dishes (with a sylgard-covered base) containing ml of acsf + channel blockers. dishes were left undisturbed for minutes to allow the cells to sink and settle. throughout, dishes were kept on a metal plate on top of ice. tdtomato-positive cells were identified by their red fluorescence under a stereoscope. using a pulled glass capillary pipette attached to a mouthpiece, individual cells were aspirated and transferred to a clean, empty dish containing ml acsf + channel blockers. the same cell was then transferred to a third clean plate, changing pipettes for every plate change. finally, each individual cell was transferred to a . ml pcr tube containing µl of lysis buffer (rlt plus -qiagen). 
the tube was immediately placed on a metal plate sitting on top of dry ice for flash-freezing. collected cells were stored at - c until further processing. positive (more than cells) and negative (sample collection procedure without picking a cell) controls were collected for each sorting session. in total, we collected samples from . hours elapsed between mouse sacrifice and collection of the last cell in any session. samples were processed using a modified version of the smart-seq protocol ( ) . briefly, µl of a : , , dilution of ercc spike-ins (invitrogen. cat. no. ) was added to each sample and mrna was captured using modified oligo-dt biotinylated beads (dynabeads, invitrogen). pcr amplification was performed for cycles. amplified cdna was cleaned with a . : ratio of ampure-xp beads (beckman coulter). cdnas were quantified on qubit using hs dna reagents (invitrogen) and selected samples were run on a bioanalyzer hs dna chip (agilent) to evaluate size distribution. for generating the sequencing libraries, individual cdna samples were normalised to . ng/µl and µl was used for one-quarter standard-sized nextera xt (illumina) tagmentation reactions, with amplification cycles. sample indexing was performed using index sets a and d (illumina). at this point, individual samples were pooled according to their index set. pooled libraries were cleaned using a . : ratio of ampure beads and quantified on qubit using hs dna reagents and with the kapa library quantification kits for illumina (roche). samples were sequenced on two separate rapid-runs on hiseq (illumina), generating bp paired-end reads. an additional samples were sequenced on miseq (illumina). paired-end read fastq files were demultiplexed, quality controlled using fastqc (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimmed using trim galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/).
reads were pseudoaligned and quantified using kallisto ( ) against a reference transcriptome from ensembl release (gencode release m grcm .p ) with sequences corresponding to the ercc spike-ins and the cre recombinase and tdt genes added to the index. transcripts were collapsed into genes using the sumacrossfeatures function in scater. cell level quality control and cell filtering was performed in scater ( ) . cells with < genes, < , reads, > % reads mapping to ercc spike-ins, > % reads mapping to mitochondrial genes or low library complexity were discarded ( % samples). the population of olfactory bulb cells labelled in dat-tdtomato mice is known to include a minor non-dopaminergic calretinin-positive subgroup ( ) , so calretininexpressing cells were excluded from all analyses. the sctransform function in seurat was used to remove technical batch effects. an analysis of single-cell gene expression data from studies was performed to investigate the expression of genes coding for coronavirus entry proteins in neurons from a range of brain regions and sensory systems. processed gene expression data tables were obtained from scseq studies that evaluated gene expression in retina (gse ) ( ) inner ear sensory epithelium (gse ) ( , ) and spiral ganglion (gse ) ( ) , ventral midbrain (gse ) ( ) , hippocampus (gse ) ( ), cortex (gse ) ( ), hypothalamus (gse ) ( ), visceral motor neurons (gse ) ( ) , dorsal root ganglia (gse ) ( ) and spinal cord dorsal horn (gse ) ( ) . smart-seq sequencing data from vsx -gfp positive cells was used from the retina dataset. a subset of the expression matrix that corresponds to day (i.e. control, undisturbed neurons) was used from the layer vi somatosensory cortex dataset. a subset of the data containing neurons from untreated (control) mice was used from the hypothalamic neuron dataset. from the ventral midbrain dopaminergic neuron dataset, a subset comprising dat-cre/tdtomato positive neurons from p mice was used. 
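The cell-level filter described above for the manually sorted cells (minimum genes, minimum reads, maximum fraction of reads mapping to ercc spike-ins, maximum fraction mapping to mitochondrial genes) can be sketched as a simple predicate. This is not the scater implementation: the class, function, and all threshold values below are hypothetical placeholders, since the actual cut-offs are not recoverable from the text, and the library-complexity criterion is omitted.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    cell_id: str
    n_genes: int      # number of detected genes
    n_reads: int      # total mapped reads
    ercc_reads: int   # reads mapping to ERCC spike-ins
    mito_reads: int   # reads mapping to mitochondrial genes


def passes_qc(cell, min_genes, min_reads, max_ercc_frac, max_mito_frac):
    """Return True if the cell clears all four thresholds."""
    if cell.n_reads == 0:
        return False
    if cell.n_genes < min_genes or cell.n_reads < min_reads:
        return False
    ercc_frac = cell.ercc_reads / cell.n_reads
    mito_frac = cell.mito_reads / cell.n_reads
    return ercc_frac <= max_ercc_frac and mito_frac <= max_mito_frac


# Toy cells and placeholder thresholds (not the values used in the study)
toy_cells = [
    CellQC("good", 2000, 100_000, 1_000, 2_000),
    CellQC("low_genes", 100, 100_000, 1_000, 2_000),
    CellQC("high_mito", 2000, 100_000, 1_000, 50_000),
]
kept = [
    c.cell_id
    for c in toy_cells
    if passes_qc(c, min_genes=500, min_reads=10_000,
                 max_ercc_frac=0.2, max_mito_frac=0.1)
]
```

A high spike-in fraction suggests that little endogenous mrna was captured, and a high mitochondrial fraction is commonly read as a sign of a stressed or damaged cell.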
a subset comprising type i neurons from wild type mice was used from the spiral ganglion dataset. the "unclassified" neurons were excluded from the visceral motor neuron dataset. a subset containing neurons that were collected at room temperature was used from the dorsal root ganglia dataset. expression data from dorsal horn neurons obtained from c /bl wild type mice, vgat-cre-tdtomato and vglut -egfp mouse lines was used from the spinal cord dataset. inspection of all datasets for batch effects was performed using the scater package (version . . ) ( ) . publicly available raw count expression matrices were used for the retina, hippocampus, hypothalamus, midbrain, visceral motor neurons and spinal cord datasets, whereas the normalized expression data was used from the inner ear hair cell datasets. for datasets containing raw counts, normalization was performed for each dataset separately by computing pool-based size factors that are subsequently deconvolved to obtain cell-based size factors using the scran package (version . . ) ( ). violin plots were generated in scater. ( ) . normalized counts for each gene in the whole olfactory mucosa (wom) and olfactory sensory neurons (osns) are shown. each circle represents a biological replicate and each color indicates the category of the gene shown on the right (cov- and other covs: genes involved in the entry of these viruses, other categories: marker genes for specific cell types such fig. a for three bulk rna-sequencing datasets. mhv, mouse hepatitis virus. left plot is same as fig. a except for the addition of ceacam . (c) gene expression for cov-related genes including ace and tmprss as well as marker genes for olfactory and re subtypes are shown normalized by their maximum expression across cell types. ace and tmprss are expressed in wom respiratory and non-neuronal olfactory cell types, but not in osns. 
(d) umap representations of gene expression in the wom dataset for cov- related genes ace and tmprss , as well as marker genes for each cell type. each point represents an individual cell, and the color represents the normalized expression level for each gene (number of umis for a given gene divided by the total number of umis for each cell). (e) fluorescent in situ hybridization of an identified dorsal sustentacular cell marker, sult c (in yellow), combined with immunostaining for the known dorsal osn marker nqo (white). note that sult c rna fills the apical cytoplasm; given that sustentacular cells are ubiquitous in the epithelium, this is apparent as broad antisense signal for sult c in a pattern that is characteristic of the apical anatomy of sustentacular cells. sult c rna is detected in sustentacular cells in the nqo -positive dorsal olfactory epithelium. nuclei were stained with dapi (blue). bar = µm.
[figure panel labels: olfactory bulb cell clusters (granule cells, immature neurons, calretinin neurons, astrocytes, olfactory ensheathing cells, interneurons, microglia, oligodendrocytes, dopaminergic neurons, mitral/tufted cells -aon, vascular, oligo precursor cells, pericytes, external tufted cells, mitral/tufted cells, perivascular macrophages, vip neurons, vascular leptomeningeal cells, intermediate progenitor cells); neuronal datasets: hypothalamus (romanov), retina (shekhar), inner ear spiral ganglion (shrestha), dorsal root ganglia (usoskin)]
clinical characteristics of coronavirus disease in china the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention a pneumonia outbreak associated with a new coronavirus of probable bat origin differences and similarities between
we thank members of the datta lab, james schwob, bernardo sabatini, andreas schaefer, kevin franks, michael greenberg and vanessa ruta for helpful comments on the manuscript. we thank james lipscombe and andres crespo for technical support. data and materials availability: reanalyzed datasets are obtained from the urls listed in supplementary materials. all data is currently being deposited and will be made publicly accessible from the ncbi geo at accession gse . olfactory ensheathing cells (oec) and respiratory cells. (e) gene expression for cov-related genes including ace and tmprss as well as marker genes for olfactory and re subtypes are shown normalized by their maximum expression across cell types. ace and tmprss are expressed in wom respiratory and olfactory cell types, but not in osns. (f) cov- related genes ace and tmprss , as well as marker genes for cell types in fig. c ., in umap representation of wom dataset with normalized expression. gfap-positive oecs (olfactory ensheathing cells) and muc b-positive secretory cells are indicated by asterisks. key: cord- -njlxrxv authors: yang, ziwei; zhang, xiaolin; wang, fan; wang, peihui; kuang, ersheng; li, xiaojuan title: suppression of mda -mediated antiviral immune responses by nsp of sars-cov- date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: njlxrxv melanoma differentiation-associated gene- (mda ) acts as a cytoplasmic rna sensor to detect viral dsrna and mediates type i interferon (ifn) signaling and antiviral innate immune responses to infection by rna viruses. upon recognition of viral dsrna, mda is activated with k -linked polyubiquitination and then triggers the recruitment of mavs and activation of tbk and ikk, subsequently leading to irf and nf-κb phosphorylation. great numbers of symptomatic and severe infections of sars-cov- are spreading worldwide, and the poor efficacy of treatment with type i interferon and antiviral agents indicates that sars-cov- escapes from antiviral immune responses via an unknown mechanism. here, we report that sars-cov- nonstructural protein (nsp ) acts as an innate immune suppressor and inhibits type i ifn signaling to promote infection of rna viruses. it downregulates the expression of type i ifns, ifn-stimulated genes and proinflammatory cytokines by binding to mda and impairing its k -linked polyubiquitination. our findings reveal that nsp mediates innate immune evasion during sars-cov- infection and may serve as a potential target for future therapeutics for sars-cov- infectious diseases. importance the large-scale spread of covid- is causing mass casualties worldwide, and the failure of antiviral immune treatment suggests immune evasion. it has been reported that several nonstructural proteins of severe coronaviruses suppress antiviral immune responses; however, the immune suppression mechanism of sars-cov- remains unknown. here, we revealed that nsp protein of sars-cov- directly blocks the activation of the cytosolic viral dsrna sensor mda and significantly downregulates antiviral immune responses. our study contributes to our understanding of the direct immune evasion mechanism of sars-cov- by showing that nsp suppresses the most upstream sensor of innate immune responses involved in the recognition of viral dsrna. 
severe acute respiratory syndrome coronavirus (sars-cov- ) is an emerging severe coronavirus that is currently causing a global outbreak of coronavirus disease . it has infected more than twenty million patients and caused more than two-thirds of a million deaths. the numbers of patients and deaths are still rapidly increasing; however, no effective therapy, vaccine or cure is available. knowledge of sars-cov- their induction upon stimulation, and nsp once again had suppressive effects on the expression of these cytokines (fig. c, red vs. blue columns). taken together, these results suggest that nsp suppresses mavs-dependent innate immune responses, probably by acting on either mavs or upstream rna sensors.

since viral rna of coronaviruses contains a methylated '-cap and a '-polya tail similar to those of cellular mrna, we assumed that nsp may preferentially regulate mda -mediated responses rather than rig-i-mediated responses, which depend on recognition of '-ppprna. hence, we performed coimmunoprecipitation (co-ip) analysis and found that nsp interacts with mda (fig. a). furthermore, we mapped the binding domain and determined that the card domain of mda is responsible for the nsp interaction (fig. b). consistent with this interaction, mda -mediated isre-luc activity was inhibited by nsp in a dose-dependent manner (fig. c). in addition, confocal microscopy demonstrated that nsp tightly colocalized with mda inside cells (fig. d). these results suggest that nsp interacts with mda and directly suppresses mda -mediated immune responses.

to further understand the molecular mechanism by which nsp interacts and interferes with mda , we predicted the mda card, nsp and k -ub tertiary structures with swiss-model, and the predicted structures were then submitted to the zdock server for docking simulation. predicted docking models were processed in pymol for visualization. surprisingly, we found that nsp possesses a long α-helix (supplementary fig. a), which is tightly packed in the ravines formed by the two α-helices of mda cards. the random coil and a short α-helix in the n terminus of nsp occupy the space that interacts with k -ub (fig. e and supplementary fig. b). further calculation of vacuum electrostatics for this binding model demonstrated that the contact area in the chain of mda cards is positively charged, while the corresponding area in the chain of nsp is negatively charged (fig. f and supplementary fig. c), implying a likely interaction between these two structures. we further searched for polar contacts in the interface of the binding model with pymol and found that paired residues anchor each other; one is located in the n-terminal coil of nsp while the others are in the long α-helix (fig. g). thus, computer-based molecular structural prediction and modeling imply that nsp interacts with the mda card domain, probably through ionic interactions and dipolar surfaces between the nsp and mda card binding pockets.

next, we sought to determine how nsp inhibits mda activation. it is well documented that upon virus infection, the mda card domain undergoes k -linked polyubiquitination and recruits mavs to form a signalosome . the structural prediction of the nsp -mda card interaction showed that nsp may interrupt this process, since it interacts with mda at its card domain and shields the binding space for k -ubiquitin linkage (fig. e, supplementary fig. , and pdb file). the polyubiquitination of mda was therefore analyzed in the presence or absence of nsp expression. the mda -expressing plasmid was cotransfected into hek t cells with a wt-, k -, or k -linked ubiquitin-expressing plasmid, and an in vivo ubiquitination assay showed that mda wt- and k -linked polyubiquitination were strongly inhibited (fig. a,c), while k -linked polyubiquitination was barely affected (fig. b).
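as an illustration of the interface analysis described above, the following minimal sketch lists candidate polar contacts between two chains using a distance cutoff only. it is not the authors' pymol procedure: the function name `polar_contacts`, the 3.5 Å cutoff, and all residue labels and coordinates are hypothetical placeholders, and real polar-contact searches usually also apply angle criteria.

```python
import math

# Elements treated as potential hydrogen-bond donors/acceptors (simplification).
POLAR_ELEMENTS = {"N", "O"}


def polar_contacts(chain_a, chain_b, cutoff=3.5):
    """Return (atom_a, atom_b, distance) for N/O atom pairs within `cutoff` angstroms.

    Each chain is a list of (label, element, (x, y, z)) tuples.
    Only distance is checked; no angle or protonation criteria are applied.
    """
    contacts = []
    for label_a, elem_a, xyz_a in chain_a:
        if elem_a not in POLAR_ELEMENTS:
            continue
        for label_b, elem_b, xyz_b in chain_b:
            if elem_b not in POLAR_ELEMENTS:
                continue
            d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                contacts.append((label_a, label_b, round(d, 2)))
    return contacts


# Hypothetical atoms, NOT taken from the docking model in the text.
ligand_chain = [
    ("ASP10:OD1", "O", (0.0, 0.0, 0.0)),
    ("LEU12:CD1", "C", (1.0, 0.0, 0.0)),   # non-polar, ignored
    ("GLU20:OE2", "O", (10.0, 0.0, 0.0)),
]
receptor_chain = [
    ("LYS5:NZ", "N", (2.8, 0.0, 0.0)),
    ("ARG9:NH1", "N", (10.0, 3.0, 0.0)),
    ("VAL3:CG1", "C", (0.5, 0.0, 0.0)),    # non-polar, ignored
]

contacts = polar_contacts(ligand_chain, receptor_chain)
for atom_a, atom_b, dist in contacts:
    print(f"{atom_a} -- {atom_b}: {dist} A")
```

in practice the atom lists would be read from the docked model file rather than typed by hand, and oppositely charged side chains found this way would be candidates for the ionic interactions proposed above.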
thus, these results reveal that nsp interferes with the such as il- , il- a, il- f, and il c, was also downregulated. furthermore, a decreasing transcription tendency was also observed for the inflammatory receptors il- ri, il- rii, il- rα, and il rii; nk cell- associated activation receptors, such as nkp , nkp , and nkg b; and the trans-acting t-cell-specific transcription factor gata (fig. a) , indicating that the activation of t cells and nk cells was attenuated by nsp through the suppression of these key factors. although these decreased cytokines and receptors may not be directly activated by irf or nfκb, they could be regulated by downstream cytokines or other factors derived from these two pathways. in contrast, the cytokine il- and ifn-gamma suppression gene foxp was significantly increased with nsp overexpression. to further confirm that the downregulation of these immune and inflammatory cytokines and genes is mediated by nsp under physiological conditions, a cells were transfected with an nsp -expressing plasmid or empty vector and then stimulated with poly(i:c) mimicking viral rna for the indicated times. the inhibition of the expression of key cytokines and related genes was verified, and nsp negatively regulated the expression of these immune and inflammatory genes (fig. b) . collectively, these results suggest that nsp could strongly impair the expression of genes involved in antiviral immune and inflammatory responses. in mavs ko hek t cells that also express nsp successfully restored the nsp inhibitory activity. consequently, a series of antiviral immune and inflammatory cytokines and related genes were further strongly downregulated by nsp expression. herein, we speculated that nsp may act on rig-i or mda , two upstream viral rna sensors. our results showed that nsp directly interacted with mda on its card domains, and mda -mediated type i ifn signaling activities were strongly inhibited at the same time. 
polyubiquitination of mda is crucial for its antiviral responses: k -linked polyubiquitination mediates mda proteasomal degradation , and k -linked polyubiquitination mediates mda -induced type i ifn expression . we speculated that nsp may inhibit type i ifn signaling by altering the polyubiquitination of mda . to test this hypothesis, we determined the status of wt-, k - and k -linked polyubiquitination of mda in the absence or presence of nsp and confirmed that wt- and k -linked polyubiquitination were impaired by nsp , while k -linked polyubiquitination was barely changed. we therefore conclude that under certain circumstances, nsp jeopardizes antiviral responses by impairing mda k -linked polyubiquitination. based on our existing experimental data, we propose a simple working model to illustrate how nsp negatively regulates innate immune responses by inhibiting mda k -linked polyubiquitination (fig. c).

to the attenuation of rig-i-mediated type i ifn antiviral responses . in addition, the ns protein of zika virus interacts with the scaffold proteins - - ϵ and η separately through its - - binding motif, hence blocking the translocation of rig-i and mda from the cytosol to mitochondria, impairing signalosome formation with mavs, and antagonizing innate immunity . our studies revealed that nsp of sars-cov- acts as a binding partner of mda to shield its k -linked polyubiquitination and then impairs the formation or activation of the mda

in summary, our study provides insights into the potential mechanisms by which sars-cov- nsp inhibits type i ifn signaling and antiviral responses. we provide compelling evidence that nsp plays a critical negative role in mda -mediated antiviral responses and demonstrate specific orchestration of the viral dsrna-triggered signalosome and signal cascade by nsp .
importantly, considering that mda plays a key pathological role in antiviral immunity towards severe coronaviruses, antagonists of nsp could serve as a promising therapeutic target for covid- therapies. immunoprecipitation and immunoblot analysis for immunoprecipitation, whole cell extracts were prepared after transfection or stimulation with appropriate ligands, followed by incubation for h at °c with anti-gfp agarose beads (alpalife). beads were washed times with low-salt lysis buffer, and immunoprecipitants were eluted with x sds loading buffer and then resolved by sds-page. proteins were transferred to pvdf membranes (millipore) and further incubated with the appropriate primary and secondary antibodies. the images were visualized using odyssey sa (li-cor). computer-based prediction and structural modeling nsp .pdb, mda -cards.pdb and k -ub.pdb were generated in swiss-model . mda -cards.pdb was input into zdock-server as a receptor, while nsp or k -ub was input as a ligand for docking computation. mda -cards with nsp .pdb and mda -cards with k -ub.pdb were the best fit prediction models chosen from the results. all the pdb files were processed and visualized with pymol (schrödinger). author contributions x.l. and e.k. initiated the concept. z.y., x.l. and e.k. designed the experiments and analyzed the data. p.w. provided the reagents. z.y., x.z., f.w. performed the experiments. z.y. and e.k. wrote the paper. we thank all the members of our laboratory for their critical assistance and helpful discussions. this work is supported by grants from the natural science foundation of china ( and ) to e.k. and the natural science foundation of china ( ) to x.li. the authors declare no competing financial interest. twenty-four hours post transfection, cells were treated with poly(i:c) ( μg/ml) for the indicated time points, and then total rna was extracted and subjected to rt-pcr analysis for tnf-α, ifn-β, ifit , ifit , il- and ccl expression. the data are shown as the mean values ± sd (n = ). *, p < . ; **, p < . ; ***, p < . ; ****, p < . ; by sidak's multiple comparisons test. were collected, and total rna was extracted and subjected to rt-pcr analysis for the expression of the indicated genes. "▼" indicates genes that were significantly changed in two independent nsp -expressing samples. b. a cells were transfected with an empty vector or flag-nsp .
twenty-four hours post transfection, cells were treated with poly(i:c) ( μg/ml) for the indicated time points, and then total rna was extracted and subjected to rt-pcr analysis of the expression of selected genes. the data are shown as the mean values ± sd (n = ). *, p < . ; **, p < . ; ***, p < . ; ****, p < the sequences of primer pairs used in the rt-pcr array (fig. a) key: cord- -tmp yxlv authors: pinto, dora; park, young-jun; beltramello, martina; walls, alexandra c.; tortorici, m. alejandra; bianchi, siro; jaconi, stefano; culap, katja; zatta, fabrizia; de marco, anna; peter, alessia; guarino, barbara; spreafico, roberto; cameroni, elisabetta; case, james brett; chen, rita e.; havenar-daughton, colin; snell, gyorgy; telenti, amalio; virgin, herbert w.; lanzavecchia, antonio; diamond, michael s.; fink, katja; veesler, david; corti, davide title: structural and functional analysis of a potent sarbecovirus neutralizing antibody date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tmp yxlv sars-cov- is a newly emerged coronavirus responsible for the current covid- pandemic that has resulted in more than one million infections and , deaths , . vaccine and therapeutic discovery efforts are paramount to curb the pandemic spread of this zoonotic virus. the sars-cov- spike (s) glycoprotein promotes entry into host cells and is the main target of neutralizing antibodies. here we describe multiple monoclonal antibodies targeting sars-cov- s identified from memory b cells of a sars survivor infected in . one antibody, named s , potently neutralizes sars-cov- and sars-cov pseudoviruses as well as authentic sars-cov- by engaging the s receptor-binding domain. using cryo-electron microscopy and binding assays, we show that s recognizes a glycan-containing epitope that is conserved within the sarbecovirus subgenus, without competing with receptor attachment. 
antibody cocktails including s along with other antibodies identified here further enhanced sars-cov- neutralization and may limit the emergence of neutralization-escape mutants. these results pave the way for using s and s -containing antibody cocktails for prophylaxis in individuals at high risk of exposure or as a post-exposure therapy to limit or treat severe disease. coronavirus entry into host cells is mediated by the transmembrane spike (s) glycoprotein that forms homotrimers protruding from the viral surface . the s glycoprotein comprises two functional subunits: s (divided into a, b, c and d domains) that is responsible for binding to host cell receptors and s that promotes fusion of the viral and cellular membranes , .
both sars-cov- and sars-cov belong to the sarbecovirus subgenus and their s glycoproteins share % amino acid sequence identity . sars-cov- s is closely related to the bat sars-related cov (sarsr-cov) ratg , with which it shares . % amino acid sequence identity . we and others recently demonstrated that human angiotensin converting enzyme (hace ) is a functional receptor for sars-cov- , as is the case for sars-cov , - . the ranging between . and , ng/ml, and . and ng/ml, respectively (fig. a-b).

mabs were further evaluated for binding to the sars-cov- and sars-cov s b domains as well as to the prefusion-stabilized oc s , mers-cov s , , sars-cov s and sars-cov- s ectodomain trimers. none of the mabs studied bound to prefusion oc s or mers-cov s ectodomain trimers, indicating a lack of cross-reactivity outside the sarbecovirus subgenus (extended data fig. ). mabs s , s , s and s recognized the sars-cov- and sars-cov rbds. in particular, s bound with nanomolar affinity to both s b domains, as determined by biolayer interferometry (fig. c-d, extended data fig. ). unexpectedly, s and s stained cells expressing sars-cov- s at higher levels than those expressing sars-cov s, yet they did not interact with sars-cov- or sars-cov s ectodomain trimers and rbd constructs by elisa. these results suggest that they may recognize post-fusion sars-cov- s, which was recently proposed to be abundant on the surface of authentic sars-cov- viruses (fig. a-b and extended data fig. ).

the s -fold molecular axis. cdrh and cdrl sandwich the sars-cov- s glycan at position n through contacts with the core fucose moiety (in agreement with the detection of sars-cov- n core-fucosylated peptides by mass spectrometry ) and, to a lesser extent, with the core n-acetyl-glucosamine (fig. d). these latter interactions bury an average surface of ~ Å and stabilize the n oligosaccharide, which is resolved to a much larger extent than in the apo sars-cov- s structures , .
the structural data explain the s cross-reactivity between sars-cov- and sars-cov as out of residues of the epitope are strictly conserved (fig. f and extended data fig. a

to further investigate the mechanism of s -mediated neutralization, we compared side-by-side transduction of sars-cov- -mlv in the presence of either s fab or s igg. both experiments yielded comparable ic values ( . and . nm, respectively), indicating similar potencies for igg and fab (fig. d). however, the s igg reached % neutralization, whereas the s fab plateaued at ~ % neutralization (fig. d). this result indicates that one or more igg-specific bivalent mechanisms, such as s trimer cross-linking, steric hindrance or aggregation of virions , may contribute to the ability to fully neutralize pseudovirions.

to gain more insight into the epitopes recognized by our panel of mabs, we used structural information, escape mutant analysis , , , and biolayer interferometry-based epitope binning to map the antigenic sites present on the sars-cov and sars-cov- s b domains (fig. a and extended data fig. ). this analysis identified at least four antigenic sites within the s b domain of sars-cov targeted by our panel of mabs. the receptor-binding motif, which is targeted by s , s and s , is termed site i. sites ii and iii are defined by s and s , respectively, and the two sites were bridged by mab s . site iv is defined by the s , s , and s mabs. given the lower number of mabs cross-reacting with sars-cov- , we were able to identify only site iv, targeted by s and s , and site ii-iii, targeted by s and s (fig. b).

movie frame alignment, estimation of the microscope contrast-transfer function parameters, particle picking and extraction were carried out using warp . particle images were extracted with a box size of binned to yielding a pixel size of . Å. for each data set two rounds of reference-free d classification were performed using cryosparc to select well-defined particle images.
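as a small aside on the binning step mentioned above: when particle images are down-sampled by fourier cropping, the effective pixel size scales with the ratio of the original to the binned box size, and the best attainable resolution (the nyquist limit) is twice that pixel size. the sketch below works through this arithmetic. the function names and all numbers are hypothetical and are not the values used in this study.

```python
def binned_pixel_size(raw_pixel_size, box_size, binned_box_size):
    """Effective pixel size (angstrom/pixel) after down-sampling a particle box.

    raw_pixel_size   -- pixel size of the unbinned images, in angstroms
    box_size         -- unbinned box edge length, in pixels
    binned_box_size  -- box edge length after binning, in pixels
    """
    if binned_box_size <= 0 or box_size < binned_box_size:
        raise ValueError("binned box must be positive and no larger than the original box")
    return raw_pixel_size * box_size / binned_box_size


def nyquist_limit(pixel_size):
    """Finest resolution representable at a given pixel size (2 x pixel size)."""
    return 2.0 * pixel_size


# Hypothetical example: 0.5 A/pixel data, 800-pixel box binned to 400 pixels.
pixel = binned_pixel_size(raw_pixel_size=0.5, box_size=800, binned_box_size=400)
print(f"binned pixel size: {pixel} A/pixel")
print(f"nyquist limit:     {nyquist_limit(pixel)} A")
```

binning in this way speeds up the early classification rounds at the cost of capping the resolution, which is one reason later refinement steps are often run on less heavily binned or unbinned particles.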
subsequently, two rounds of d classification with iterations each (angular sampling . ° for iterations and . ° with local search for iterations), using our previously reported closed sars-cov- s structure as initial model, were carried out using relion without imposing symmetry to separate distinct sars-cov- s conformations. d refinements were carried out using non-uniform refinement along with per-particle defocus refinement in cryosparc . particle images were subjected to bayesian polishing before performing another round of non-uniform refinement in cryosparc , followed by per-particle defocus refinement and again non-uniform refinement.

supplemented with % glycerol. the dataset was collected at als beamline . . and processed to . Å resolution in space group p using mosflm and aimless . the structure of fab s was solved by molecular replacement using phaser and homology models as search models. the coordinates were improved and completed using coot and refined with refmac . crystallographic data collection and refinement statistics are shown in table .

s mab. a-b, ribbon diagrams of s and ace bound to sars-cov- s b . this composite model was generated using the sars-cov- s/s cryoem structure reported here and a crystal structure of sars-cov- s bound to ace . c, competition of s or s mabs with ace to bind to sars-cov s b (left panel) and sars-cov- s b (right panel). ace was immobilized at the surface of biosensors before incubation with s b domain alone or s b precomplexed with mabs. the vertical dashed line indicates the start of the association of mab-complexed or free s b to solid-phase ace . d, neutralization of sars-cov-mlv by s igg or s fab, plotted in nm (mean ± sd; one of two experiments is shown). e, mab-mediated adcc using primary nk effector cells and sars-cov- s-expressing expicho as target cells.
bar graph shows the average area under the curve (auc) for the responses of - donors genotyped for their fcgriiia (mean±sd, from two independent experiments). f, activation of high affinity (v ) or low affinity (f ) fcgriiia was measured using jurkat reporter cells and sars-cov- sexpressing expicho as target cells (one experiment, one or two measurements per mab). g, mab-mediated adcp using cell trace violet-labelled pbmcs as phagocytic cells and pkf labelled sars-cov- s-expressing expicho as target cells. bar graph shows the average area under the curve (auc) for the responses of four donor (mean±sd, from two independent experiments). h, activation of fcgriia measured using jurkat reporter cells and sars-cov- s-expressing expicho as target cells (one experiment, one or two measurements per mab). fig. ) . b, competition of mab pairs for binding to the sars-cov- s b domain . c-d, neutralization of sars-cov- -mlv by s combined with an equimolar amount of s or s mabs. for mab cocktails the concentration on the x axis is that of the individual mabs. table : characteristics of the antibodies described in this study. vh and vl % identity refers to v gene identity compared to germline (imgt). isolation and characterization of a bat sars-like coronavirus that uses the ace receptor site-specific analysis of the sars-cov- glycan shield. 
key: cord- -lfuxhlki authors: diallo, aïssatou bailo; gay, laetitia; coiffard, benjamin; leone, marc; mezouar, soraya; mege, jean-louis title: daytime variation in sars-cov- infection and cytokine production date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lfuxhlki s. ray and a. reddy recently anticipated the implication of circadian rhythm in severe acute respiratory syndrome coronavirus (sars-cov- ), which is the causative agent of the coronavirus disease (covid- ). in addition to its key role in the regulation of biological functions, the circadian rhythm has been suggested as a regulator of viral infections. specifically, the time of day of infection was found critical for illness progression, as has been reported for influenza, respiratory syncytial and parainfluenza type viruses. we analyzed circadian rhythm implication in sars-cov- virus infection of isolated human monocytes, key actor cells in covid- disease, from healthy subjects. the circadian gene expression of bmal and clock genes was investigated with q-rtpcr. monocytes were infected with sars-cov- virus strain and viral infection was investigated by one-step qrt-pcr and immunofluorescence. interleukin (il)- , il- β and il- levels were also measured in supernatants of infected monocytes.
using cosinor analysis, we showed that bmal and clock transcripts exhibited a circadian rhythm in monocytes, with an acrophase and a bathyphase at zeitgeber time (zt) and zt . after forty-eight hours, the amount of sars-cov- virus was increased in monocytes infected at zt compared to zt . the high virus amount at zt was associated with a significantly increased release of il- , il- β and il- compared to zt . our results suggest that the time of day of sars-cov- infection affects viral infection and the host immune response. they support consideration of circadian rhythm in sars-cov- disease progression and we propose circadian rhythm as a novel target for managing viral progression. importance the involvement of circadian rhythm (cr) in the pathogenesis of severe acute respiratory syndrome coronavirus (sars-cov- ) has been recently anticipated. the time of day of infection is critical for illness progression as reported for influenza, respiratory syncytial and parainfluenza type viruses. in this study, we asked whether sars-cov- infection and cytokine production by human monocytes, innate immune cells affected by covid- , were regulated by cr. our results suggest that the time of day of sars-cov- infection affects viral infection and the host immune response. they support consideration of circadian rhythm in sars-cov- disease progression and we propose circadian rhythm as a novel target for managing viral progression. the involvement of circadian rhythm (cr) in the pathogenesis of severe acute respiratory syndrome coronavirus (sars-cov- ) has been recently anticipated ( , ). the cr regulates physiological processes in living organisms with a period of hours ( ). rhythmicity depends on central and peripheral oscillators whose activity relies on two main feedback loops managed by a clock gene cascade under the regulation of the main clock gene bmal ( ). host susceptibility to microorganisms is likely under the control of biological clocks ( ).
the time of day of infection is critical for illness progression as reported for influenza, respiratory syncytial and parainfluenza type viruses ( - ). we previously reported that cr is a key actor at the interface between infection susceptibility, clinical presentation and prognosis of infection ( , ). some evidence makes it possible to anticipate a role of cr in sars-cov- infection. the absence of bmal has an impact on intracellular replication of coronaviruses, especially on vesicular trafficking, the endoplasmic reticulum and protein biosynthesis ( ). knock-out of bmal markedly decreases the replication of several viruses such as dengue or zika ( ). finally, among the recently published key proteins involved in sars-cov- interaction with the host ( ), % have been identified as being associated with the circadian pathway ( ). clearly, evidence for an involvement of cr in sars-cov- infection of human cells is lacking. in this study, we asked whether sars-cov- infection and cytokine production by human monocytes, innate immune cells affected by covid- , were regulated by cr. we first asked whether the infection of monocytes, innate immune cells affected by covid- , obeys circadian oscillations. every hours for hours, total rna was extracted and the expression of bmal and clock genes was investigated in unstimulated monocytes as previously described ( ). expression of the investigated genes exhibited cr in monocytes with an acrophase (peak of the rhythm) and a bathyphase (trough of the rhythm) at zeitgeber (german name for synchronizer) time (zt) and zt (fig. a ). these two time points represent the beginning of the active and the resting periods in humans ( ). to assess the involvement of cr in infection of monocytes with sars-cov- , we incubated monocytes with sars-cov- during the bathyphase (zt ) and acrophase (zt ) for hours.
then, viral rna was extracted to evaluate the amount of covid- virus, and the phagocytosis index was calculated at zt and zt using immunofluorescence. the amount of sars-cov- virus was higher in monocytes cultured at zt than at zt . covid- disease is characterized by a runaway immune system leading to a cytokine storm consisting of high circulating levels of cytokines including il- , il- β and il- ( ). we asked whether the interaction of sars-cov- with monocytes affected cytokine production at two points of the cr. the amounts of il- , il- β and il- were significantly increased at zt (fig. d ), when the amount of infection is highest. hence, the interaction of sars-cov- with monocytes resulted in a distinct cytokine pattern according to the time of day. we demonstrate here that the time of day of sars-cov- infection consistently determines viral infection/replication and the host immune response. it is likely that sars-cov- exploits the clock pathway for its own gain. our findings support consideration of cr in sars-cov- disease progression and suggest that cr represents a novel target for managing viral progression. this study also highlights the importance of the time of treatment administration to covid- patients, since cr has been found to regulate the pharmacokinetics of several drugs ( ). several treatments are proposed to prevent the occurrence of severe forms of covid- . they include passive immunization, cytokines, anti-cytokine antibodies or corticoids ( ). all these candidates affect the immune response known to oscillate during the day and their administration according to cr of sars-cov- . finally, the well-documented cr disturbance in intensive care units ( ) with % fetal bovine serum (fbs), as previously described ( ).
human monocytes were isolated from peripheral blood mononuclear cells from healthy donors (convention n° , etablissement français du sang, marseille, france) following cd selection using macs magnetic beads (miltenyi biotec, bergisch, germany) as previously described ( ). coronaviruses: an overview of their replication and coronaviruses the circadian clock components bmal and rev- erbα regulate flavivirus replication zt key: cord- -twk kvd authors: täufer, matthias title: rapid, large-scale, and effective detection of covid- via non-adaptive testing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: twk kvd pooling of samples can increase lab capacity when using polymerase chain reaction (pcr) to detect diseases such as covid- . however, pool testing is typically performed via an adaptive testing strategy which requires a feedback loop in the lab and at least two pcr runs to confirm positive results. this can cost precious time. we discuss a non-adaptive testing method where each sample is distributed in a prescribed manner over several pools, and which yields reliable results after one round of testing. more precisely, assuming knowledge about the overall incidence rate, we calculate explicit error bounds on the number of false positives which scale favourably with pool size and sample multiplicity. this allows for hugely streamlined pcr testing and cuts in detection times for a large-scale testing scenario. a viable consequence of this method could be real-time screening of entire communities, frontline healthcare workers and international flight passengers, for example, using the pcr machines currently in operation. one key to containing and mitigating the covid- pandemic is suggested to be rapid testing on a massive scale [hzw + , sby] . it would be beneficial to develop the ability to routinely, and in particular rapidly, test groups such as frontline healthcare workers, police officers, and international travellers. 
testing for sars-cov- is currently performed via the polymerase chain reaction (pcr) on nasopharyngeal swabs [tty + ] . typically, the population size significantly exceeds the capacity for testing, with the number of available pcr machines and reagents being an important bottleneck in this process. there are two basic approaches to pcr testing in populations: individual tests, where every single sample is examined, and pooled tests, where larger sets of samples are mixed and tested en bloc. pooled testing was pioneered by dorfman in [dor ] in the context of blood tests and led to a host of research activity, both on the lab side and on the theoretical side [ajs , dh , dh ] . if the disease is rare in the population, pooled testing may be advisable. in this case it can assist in optimizing precious testing capacity since most individual results would be negative. pooling relies on the fact that the pcr is reasonably reliable under the combination of samples: the preprint [yast + ] suggests that a detection of sars-cov- in pools of size and possibly is feasible. while a classic pooling strategy has the advantage that fewer pcr tests are required overall, there are disadvantages in terms of lab organisation and, more crucially, time: pooling only indicates whether a pool contains at least one infected individual. if samples are tested in pools of size n and the incidence ρ is small (more precisely, if ρ · n is small) then a number of samples will be in pools that are tested positive and hence undergo a second round of testing. in other words, pooled testing with individual verification of positive pools is an adaptive testing strategy, the lab organisation for which is a labour-, management-, and resource-intensive process. it has several drawbacks, since it requires keeping multiple lab samples and re-running the time-intensive pcr process. the lab feedback loop makes the entire workflow more susceptible to delays (see figure ).
this may result in delays in individual results -a particular problem when the objective is to rapidly identify infected individuals, who may infect others while waiting for the test outcome. furthermore, since the number of samples undergoing a second round of testing is an unknown quantity, some reserve capacity is required to prevent further delays. this makes it more challenging for the lab to operate near its maximal capacity. in the theoretical research on testing strategies the distinction is made between adaptive testing, for example when all samples in a positive pool undergo a second round of testing, and non-adaptive strategies, where all tests can be run simultaneously [dh ] . testing every sample individually can be considered as a trivial non-adaptive strategy, but there exist non-adaptive strategies which combine the benefit of pooling with the advantages of non-adaptive testing. in this note, we propose a non-adaptive pooling strategy for rapid and largescale screening for sars-cov- or other scenarios where detection time is critical. this allows for significant streamlining of the testing process and reductions in detection time. firstly because only one round of pcr is required, and secondly because it eliminates actions in the lab workflow that require input from results determined in the lab, i.e. the testing infrastructure can be organized completely linearly, cf. figure for an illustration. the strategy will systematically overestimate the number of positives, but we can provide error bounds on the number of false positives which scale favourably with large numbers and will be small in realistic scenarios. our testing strategy is as follows: every individual's sample is broken up into k samples and distributed over k different pools of size n such that no two individuals share more than one pool. 
an individual is considered as tested positive if all the pools into which its sample has been placed are tested positive or, equivalently in our case, an item is considered as tested negative if it appears in at least one negative pool. this decoding algorithm is also known as comp (combinatorial orthogonal matching pursuit), an algorithm easily implementable in practice with low run-time and storage [jas ] . let us make our definition more formal: definition (multipools). let a population (x_1, . . . , x_N) of size N, a pool size n, and a multiplicity k be given, and assume that N k is a multiple of n. we call a collection of subsets/pools of {x_1, . . . , x_N} an (N, n, k)-multipool, or briefly multipool, if all of the following three conditions hold: (m1) every pool consists of exactly n elements. (m2) every sample x_i is contained in exactly k pools. (m3) for any two different samples x_i, x_j there exists at most one pool which contains both x_i and x_j. in the context of non-adaptive testing, designs as in definition 1 are called (k − 1)-disjunct matrices and it is known that such matrices correctly identify up to k infected samples [maz ] . however, we will be interested in scenarios where the number of infected samples can exceed the multiplicity k. if N = n^2 and k = 2, the construction of an (N, n, 2)-multipool is quite straightforward, see figure : arrange the N samples in a rectangular grid and then pool along every row and column, cf. [ssw + , fflh, zdf + ]. however, as we shall see below, k = 2 is in many realistic scenarios insufficient for the desired precision. some recent contributions [fflh, mnb + ] propose to arrange samples in a (3 or higher dimensional) hypercube and to pool along all hyperplanes. this makes every individual sample appear in three or more pools, but it is not a multipool in the sense of definition 1 above, since in dimension three and higher, any two hyperplanes will intersect in more than one point, in violation of property (m3).
this creates unnecessary correlations between different pools and impairs performance. if k = , systems as in definition are also called steiner triples and have been recently used in non-adaptive group testing for sars-cov- [gar + ]. a flexible way to construct multipools of various multiplicities k is given by the shifted transversal design [tm , egn + ] which we explain in section . we always assume that the incidence ρ of the disease is small compared to the inverse pool size /n. this is a reasonable requirement, also in classical pooling strategies (a ρn portion of samples will have to undergo second testing, thus a large ρn would attenuate the benefit of pooling). assuming perfect performance of the pcr, also under pooling (see section on how to deal with uncertainty here), multipooling will identify all infected individuals, since all their pools will be positive. however, a sample might falsely be declared positive if all pools in which it is contained happen to contain an infected sample. the expected portion of false positives in a multipool strategy is here, the third identity crucially uses the property (m ) which guarantees independence between the poolmates in the different pools of a sample. by bayes' rule, the probability to actually be negative when tested positive by the multipool (i.e. the portion of subjects falsely declared positive among all subjects declared positive) is let us calculate for which k the probability of a positive test result being a false positive does not exceed fp > : ( this provides a lower bound on the necessary multiplicity k in terms of the sample size n, the knowledge on the incidence ρ, and the acceptable portion fp of false positive results among all positives. assuming fp < and ρ ≤ (which are both reasonable assumptions, recall that ρn is small), the lower bound in ( . ) is monotone increasing in ρ. hence, if the exact incidence is unknown but we have an upper bound on it, we can work with the largest/worst case ρ. 
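the error analysis above can be sketched numerically in python. the sketch assumes that the expected portion of false positives has the form (1 − ρ)(1 − (1 − ρ)^(n−1))^k, which is consistent with the approximations 1 − (1 − ρ)^(n−1) ≈ nρ used in the surrounding text, and that infected samples are always flagged (perfect pcr). the function names and the parameter name eps_fp are hypothetical and chosen for illustration only.

```python
def false_positive_share(rho: float, n: int, k: int) -> float:
    """Share of flagged samples that are actually negative (assumed formula)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    # Probability that a given pool contains at least one infected poolmate
    # among the other n - 1 samples.
    q = 1.0 - (1.0 - rho) ** (n - 1)
    # Probability that a sample is negative and all k of its pools are positive;
    # multiplying the k pool probabilities relies on property (M3).
    negative_and_flagged = (1.0 - rho) * q ** k
    # Bayes' rule: every infected sample (probability rho) is flagged.
    return negative_and_flagged / (rho + negative_and_flagged)


def minimal_multiplicity(rho: float, n: int, eps_fp: float) -> int:
    """Smallest k for which the false positive share does not exceed eps_fp."""
    if not 0.0 < eps_fp < 1.0:
        raise ValueError("eps_fp must lie strictly between 0 and 1")
    k = 1
    while false_positive_share(rho, n, k) > eps_fp:
        k += 1
    return k


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the text.
    rho, n, eps_fp = 0.01, 31, 0.05
    k = minimal_multiplicity(rho, n, eps_fp)
    print(k, false_positive_share(rho, n, k))
```

a brute-force search over k is used here instead of a closed-form bound, so the sketch does not depend on the exact form of the inequality for k.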
let us summarize these findings in the following theorem . let the incidence be at most ρ ≤ , and let < fp < . if then in any multipooling strategy with pool size n and multiplicity k, the probability of a positive test being a false positive does not exceed fp . the number of tests required in a multipool strategy is N k/n, an improvement compared to individual testing by a factor n/k. a key observation is that the lower bound on k in inequality ( . ) scales favourably with large multiplicities n. indeed, recall that in an adaptive pooling strategy one wants on the one hand large pool sizes n, but on the other hand nρ should be small. it is therefore reasonable to have n proportional to the inverse of ρ, i.e. nρ ≈ c. using that 1 − ρ ≈ 1 and 1 − (1 − ρ)^(n−1) ≈ (n − 1)ρ ≈ nρ, the lower bound in ( . ) behaves approximately as that is k grows only logarithmically with the pool size n. an analogous analysis shows that k also grows logarithmically with the inverse of fp when the error probability fp is sent to zero. we compare the theoretical values found in the lead-up to theorem to simulated values in figure . the question for which combinations (N, n, k) a multipool exists seems to be in general a non-trivial combinatorial problem. we focus here on the case when N = n^2 and on constructions based on the shifted transversal design [tm ] . it is useful to imagine all N samples arranged in an n × n-square and label samples by their x and y-coordinate, i.e. denoting the sample at position (i, j) ∈ n by x_ij, where we define the sample in the lower left (south-west) corner to be x . for multiplicity k = 2, an (N, n, k)-multipool can be constructed by pooling along rows and columns, as in figure . unfortunately, for reasonable parameter choices, a multiplicity of k = 2 turns out to lead to large false positive rates: for instance, arranging N = samples from a population with incidence ρ = .
in a rectangular grid and pooling along all rows and columns (in our notation this is an ( , , )-multipool), identity ( . ) will imply that on average . % of positive results will actually be false positives. to improve on this and pass to multiplicity k = 3, one can sample along diagonals, where the diagonals are continued periodically, see figure . this works for any pool size n ≥ and leads to theorem . let N = n^2 and n ≥ . then there exists an (N, n, 3)-multipool, obtained by sampling along rows, columns, and all periodically continued south-west-to-north-east diagonals. in the situation of N = and n = , this allows for the construction of a ( , , )-multipool in which, by ( . ), the probability of a positive result being erroneous is reduced to . %. in such a scenario, one would test individuals with tests, a compression by a factor . . a higher compression rate would require larger pool sizes n. since the lower bound ( . ) on k in theorem is monotone in n, this will in turn also require higher multiplicities k in order to achieve comparable false positive error probabilities. to pass to k = 4, one might now be tempted to pool along the other (north-west-to-south-east) diagonals, but this is not going to yield a multipool in general, see for instance figure where, in the case n = , two diagonals intersect in more than one point, in violation of property (m3) in definition 1. this is due to the fact that n = has non-trivial divisors, i.e. it is not a prime number. south-west-to-north-east diagonals are of the form where (mod n) means that we use arithmetic modulo n, that is, as soon as we exceed n − 1, we start counting from 0 again. these diagonals are lines of slope +1 and −1, respectively, and the difference of these slopes is 2, which divides . since intersections of two such lines are given by solutions to the equation has a unique solution j if and only if the greatest common divisor of m and n is 1. since this must hold for all m ∈ {1, . . .
, n − 1}, n must be a prime number. in this case, the integers modulo n form an algebraic structure called a field, in which every non-zero element has a well-defined multiplicative inverse. for prime n, the unique solution of ( . ) is therefore given by j = m^(−1) l, where m^(−1) denotes the multiplicative inverse of m in arithmetic modulo n. this suggests using a prime pool size n and sampling along lines of different slopes, that is, using pools of the form p(l, m) := {x_(j, l+jm (mod n)) : j = 0, . . . , n − 1}, l, m ∈ {0, . . . , n − 1}. we can add one more type of pool by sampling along all vertical lines (their slope can be considered as "infinity"), which we denote by p(l, ∞) := {x_(l, j) : j = 0, . . . , n − 1}, l ∈ {0, . . . , n − 1}. such ensembles of pools are sketched in figure for the case n = . this construction is also referred to as the shifted transversal design in [tm ] . we summarise our findings in the following theorem . let n be a prime number and let N = n^2. then, there exists an (N, n, k)-multipool for k = n + 1, and consequently also for every smaller k. this multipool is given by pooling along all sloped lines, that is: figure contains an illustration of elements of such a multipool in the case n = with multiplicity k = . theorem allows for multiplicities up to k = n + 1, but in practice, one will want to work with much lower multiplicities k since a high multiplicity would require many tests and defeat the purpose of pooling. from a practical perspective it seems reasonable to generate large pools by a sequence of unions of two equally diluted pools. this leads to pool sizes which are a power of 2, certainly not a prime number (except for 2 itself). one approach to accommodate this would be population sizes N = n^2 where n is a prime just below a power of 2, e.g. n = , which is just below or n = which is just below . then pools of size n can be mixed by adding a small number of negative dummy samples and proceeding as if n was a power of 2.
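the construction along sloped lines can be sketched in python as follows. this is a minimal illustration, assuming samples are labelled by coordinates (i, j) with i, j ∈ {0, . . . , n − 1}; the function names are hypothetical, and the choice to use the slopes 0, 1, . . . , k − 1 first and the vertical lines last is arbitrary, since any k of the n + 1 families of parallel lines yield a multipool.

```python
from itertools import combinations


def shifted_transversal_pools(n: int, k: int) -> list:
    """Pools of an (n*n, n, k)-multipool for prime n and 1 <= k <= n + 1."""
    if n < 2 or any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        raise ValueError("n must be a prime number")
    if not 1 <= k <= n + 1:
        raise ValueError("k must satisfy 1 <= k <= n + 1")
    pools = []
    # Families of parallel lines with slopes m = 0, 1, ..., min(k, n) - 1:
    # the pool P(l, m) contains the samples (j, l + j*m mod n).
    for m in range(min(k, n)):
        for l in range(n):
            pools.append(frozenset((j, (l + j * m) % n) for j in range(n)))
    # The (n + 1)-th family: vertical lines P(l, infinity).
    if k == n + 1:
        for l in range(n):
            pools.append(frozenset((l, j) for j in range(n)))
    return pools


def is_multipool(pools: list, n: int, k: int) -> bool:
    """Check the three multipool conditions for N = n*n samples on a grid."""
    samples = [(i, j) for i in range(n) for j in range(n)]
    m1 = all(len(p) == n for p in pools)
    m2 = all(sum(s in p for p in pools) == k for s in samples)
    # Two samples share at most one pool exactly when any two distinct
    # pools intersect in at most one sample.
    m3 = all(len(p & q) <= 1 for p, q in combinations(pools, 2))
    return m1 and m2 and m3


if __name__ == "__main__":
    pools = shifted_transversal_pools(5, 3)
    print(len(pools), is_multipool(pools, 5, 3))
```

the number of pools returned is n k, which equals N k/n for N = n^2, the number of tests required by the strategy.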
let us sketch some concrete examples where the pool size is a prime number and where the multipooling strategy might be useful: let the population size be N = = . this could for instance be the number of employees in a company or of passengers departing from an international airport within a certain time window. let the incidence rate ρ be no more than . % and let us work with a pool size n = . since n is prime, theorem allows the construction of ( , , k)-multipools for any k ≤ and theorem allows us to bound the probability of a positive test being erroneous for different multiplicities k as in table . accepting for instance a false positive probability of % requires N/n = pcr tests, . % of what would be required in individual testing. let us emphasize again here that this means that % among the results flagged as positive will be false positives, not % of the overall test results. table : probability of a positive result being a false positive and the compression k/n compared to individual testing for pool size n = , incidence ρ ≤ . and different multiplicities k. the multipool method scales well with larger numbers. let the population size be N = = and the pool size n = , which is of the order of pools being used for the pcr today [yast + ]. let furthermore the incidence rate be no larger than . %, a realistic upper bound for the prevalence of sars-cov- in many countries [fns ] . since n = is prime, theorem allows the construction of ( , , k)-multipools for any k ≤ and the error bounds in theorem lead to table . if we choose k = and accept fp = . % as the probability for positive results being false positives, we need N/n = tests in order to quickly and efficiently test individuals, that is . % of what would be needed with individual testing. table (columns: k, fp, k/n): probability of a positive result being a false positive and the compression k/n compared to individual testing for pool size n = , incidence ρ ≤ . and different multiplicities k.
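the decoding rule used throughout, in which a sample is cleared as soon as one of its pools tests negative, can be sketched in python as follows. this is an illustration under the assumption of an error-free pcr; the function name comp_decode and the toy 3 × 3 row-and-column design are hypothetical and are not taken from the examples above.

```python
def comp_decode(pools, infected):
    """Samples flagged positive by COMP, assuming an error-free PCR."""
    # A pool tests positive exactly when it contains an infected sample.
    pool_positive = [any(s in infected for s in pool) for pool in pools]
    all_samples = set().union(*pools)
    # Every sample appearing in at least one negative pool is cleared.
    cleared = set()
    for pool, positive in zip(pools, pool_positive):
        if not positive:
            cleared |= set(pool)
    return all_samples - cleared


if __name__ == "__main__":
    n = 3
    # Toy design with multiplicity 2: one pool per row and one per column.
    rows = [frozenset((x, y) for x in range(n)) for y in range(n)]
    columns = [frozenset((x, y) for y in range(n)) for x in range(n)]
    pools = rows + columns

    # One infected sample is identified exactly.
    print(sorted(comp_decode(pools, {(0, 0)})))
    # Two infected samples on different rows and columns also flag the two
    # other corners of the rectangle they span, which are false positives.
    print(sorted(comp_decode(pools, {(0, 0), (1, 1)})))
```

the second call illustrates why a multiplicity of two can overestimate the number of positives: every infected sample is still flagged, but healthy samples whose pools all happen to contain an infected poolmate are flagged as well.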
the non-adaptive multi-pooling strategy provides a streamlined and efficient organisation of the testing process and cuts in detection time. this significant benefit comes with potential reductions in accuracy compared with adaptive testing, but this false positive rate can be tightly controlled and tailored to suit the circumstances. the false positive probability fp deemed an acceptable cost for the increased testing efficiencies may depend on, for example, the infection characteristics, the government policy and resource levels. a small modification of our strategy might furthermore allow for an improvement of the false negative rate, even compared to usual adaptive pool testing strategies: even though commonly used, pooling samples can potentially dilute samples close to the identification threshold of the pcr and increase the probability of false negatives. the recent preprint [yast + ] estimates a false negative rate of % when detecting sars-cov- in pools of size . one can reduce this type of false negative in our strategy by declaring all samples which are in at least k − positive pools as tested positive. this strategy is known as the "noisy comp" (ncomp) decoding algorithm [ccjs , cjsa ] where an item is declared infected if more than a certain portion of its pools test positive. this will on the one hand lower the probability of false negatives, but more importantly it will only mildly affect the false positive rate. this could be seen by adding a next-order term in the error analysis performed leading up to theorem . for a sound analysis, knowledge on the false positive rate gained through experiments would be required, but the general message that the necessary multiplicity k will grow slowly with large n and small fp remains. let us finally note that the basic idea is close to compressed sensing and sparse recovery [ct , fr ] .
while in our situation the output space consists of { , }-vectors, which make the mathematics we use rather elementary, there also seem to be applications of the pcr where quantitative measurements are taken and where compressive sensing techniques might be applied. a very recent approach in this direction is tapestry pooling [grk + , gar + ] which takes quantitative data from pcr measurements and uses methods from compressed sensing to decode. in the scenario of testing n = samples in pools of size n = discussed in section , this approach suggests reasonable results at multiplicity k = , a higher compression rate than in our approach. however we emphasise that the (experimental) error analysis performed in the context of tapestry pooling focuses on fixed numbers of infected samples and is therefore in a slightly different spirit than our approach which is based on the prevalence of the disease in the population. group testing: an information theory perspective. foundations and trends® in communications and information theory non-adaptive probabilistic group testing with noisy measurements: near-optimal bounds with efficient algorithms non-adaptive group testing: explicit bounds and novel algorithms near-optimal signal recovery from random projections: universal encoding strategies? 
combinatorial group testing and its applications pooling designs and nonadaptive group testing the detection of defective members of large populations biological screens from linear codes: theory and tools purim: a rapid method with reduced cost for massive detection of covid- coronavirus (covid- ) infection survey pilot: england a mathematical introduction to compressive sensing a compressed sensing approach to group-testing for covid- detection tapestry: a single-round smart pooling technique for covid- testing performance of group testing algorithms with near-constant tests per item on almost disjunct matrices for group testing a strategy for finding people infected with sars-cov- : optimizing pooled testing at low prevalence eliminating covid- : a community-based analysis a two-dimensional pooling approach towards efficient detection of parasitoid and pathogen dna at low infestation rates supplementary material for rapid, large-scale, and effective detection of covid- via non-adaptive testing a new pooling strategy for high-throughput screening: the shifted transversal design consistent detection of novel coronavirus in saliva evaluation of covid- rt-qpcr test in multi-sample pools a two-dimensional pooling strategy for rare variant detection on next-generation sequencing platforms the author thanks christoph schumacher for numerous helpful discussions and comments. comments by emma lawrance, albrecht seelmann and sasha sodin are also gratefully acknowledged. key: cord- -uau jj authors: ramirez, santseharay; fernandez-antunez, carlota; pham, long v.; ryberg, line a.; feng, shan; pedersen, martin s.; mikkelsen, lotte s.; belouzard, sandrine; dubuisson, jean; gottwein, judith m.; fahnøe, ulrik; bukh, jens title: efficient culture of sars-cov- in human hepatoma cells enhances viability of the virus in human lung cancer cell lines permitting the screening of antiviral compounds date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: uau jj efforts to mitigate covid- include screening of existing antiviral molecules that could be re-purposed to treat sars-cov- infections. although sars-cov- propagates efficiently in african green monkey kidney (vero) cells, antivirals such as nucleos(t)ide analogs (nucs) often exhibit decreased activity in these cells due to inefficient metabolization. limited sars-cov- replication and propagation occurs in human cells, which are the most relevant testing platforms. by performing serial passages of a sars-cov- isolate in the human hepatoma cell line clone huh . , we selected viral populations with improved viability in human cells. culture adaptation led to the emergence of a significant number of high frequency changes (> % of the viral population) in the region coding for the spike glycoprotein, including a deletion of nine amino acids in the n-terminal domain and amino acid changes (e d, p r, and q h). we demonstrated that the huh . -adapted virus exhibited a > -log increase in infectivity titers (tcid ) in huh . cells, with titers of ~ log tcid /ml, and > -log increase in the human lung cancer cell line calu- , with titers of ~ log tcid /ml. culture adaptation in huh . cells further permitted efficient infection of the otherwise sars-cov- refractory human lung cancer cell line a , with titers of ~ log tcid /ml. the enhanced ability of the virus to replicate and propagate in human cells permitted screening of a panel of nine nucs, including broad-spectrum compounds. remdesivir, eidd- and to a limited extent galidesivir showed antiviral effect across these human cell lines, whereas sofosbuvir, uprifosbuvir, valopicitabine, mericitabine, ribavirin, and favipiravir had no apparent activity. 
importance the cell culture adapted variant of the sars-cov- virus obtained in the present study, showed significantly enhanced replication and propagation in various human cell lines, including lung derived cells otherwise refractory for infection with the original virus. this sars-cov- variant will be a valuable tool permitting investigations across human cell types, and studies of identified mutations could contribute to our understanding of viral pathogenesis. in particular, the adapted virus can be a good model for investigations of viral entry and cell tropism for sars-cov- , in which the spike glycoprotein plays a central role. further, as shown here with the use of remdesivir and eidd- , two nucs with significant inhibitory effect against sars-cov- , large differences in the antiviral activity are observed depending on the cell line. thus, it is essential to select the most relevant target cells for pre-clinical screenings of antiviral compounds, facilitated by using a virus with broader tropism. the severe acute respiratory syndrome coronavirus (sars-cov- ), responsible for covid- , in order to generate a large volume of supernatant that could be used for characterization of the huh . adapted virus (this virus will be referred to as p huh . virus). we performed a comparative titration in various cells of the p veroe and the p huh . viruses ( figure b) and found that the infectivity titers in huh . cells after culture adaptation had increased by more than logs (mean of . and . log tcid /ml, respectively). the huh . adapted virus also exhibited significantly increased titers in vero e cells (mean of . log tcid /ml for the p huh . virus versus . log tcid /ml for the p veroe virus). interestingly, the original p veroe virus was less viable in the huh parental cell line than in the huh . clone, however, adaptation to the huh . clone also led to a significant increase in infectivity titers in huh cells ( . and . log tcid /ml for the p veroe and p huh . 
viruses, respectively). visual observations of p huh . virus infected cultures under the light microscope indicated an increase in cpe. to better quantify this, we performed viral cytopathic effect assays (cpe assays, figure c), in which we detected an evident increase in cpe titers (log cpe /ml) in all cells. cpe significantly increased from . to . log cpe /ml in vero e cells. in huh . cells, the p veroe virus was not cytopathic (we obtained a value just above the assay threshold in one of the independent experiments), whereas the p huh . virus led to high titers of . log cpe /ml. for the huh parental cells the p veroe virus was also non-cytopathic, but the adapted p huh . virus yielded . log cpe /ml. in addition to the increase in infectivity titers observed after infection with the adapted p huh . virus in huh . cells, we also noticed an evident increase in the intensity of the antigen staining (α-spike protein antibody), and in the number of infected cells at non-cytopathic virus dilutions of the p huh . virus compared to the p veroe virus (figure d). this suggests that the p huh . virus might both replicate and propagate at higher levels in huh . cells, as also indicated by the significant increase in refractory to infection , . to our knowledge, the ability of the calu- cell line to support sars-cov- replication and propagation has not been previously reported. the calu- and the a cell lines are widely available and well-characterized standards among the human lung carcinoma/alveolar cell lines used in cancer research , . further, the a cell line is also a model for the study of respiratory viruses, such as respiratory syncytial virus and influenza , . compared to the p veroe virus, the p huh . virus exhibited a significant increase in the ability to infect calu- cells, with a > -log increase in infectivity titers. for the original p veroe virus, observed titers in calu- were . log tcid /ml, which increased to . log tcid /ml for the p huh . virus (figure a). 
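because infectivity titers are reported on a log scale, the difference between two log titers corresponds to a fold change on the linear scale; for example, a 2-log increase is a 100-fold increase. the short python sketch below illustrates this conversion. the function names and the example values are hypothetical and are not data from the study.

```python
def log10_increase(log_titer_before: float, log_titer_after: float) -> float:
    """Difference between two titers already expressed in log10 units per ml."""
    return log_titer_after - log_titer_before


def fold_change(log_increase: float) -> float:
    """Convert a log10 difference into a linear fold change."""
    return 10.0 ** log_increase


# Hypothetical example values (not taken from the study).
increase = log10_increase(3.0, 5.0)
print(increase, fold_change(increase))
```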
surprisingly, the p huh . virus was able to efficiently infect a cells, with titers of . the adapted p huh . virus permitted drug testing in huh . , calu- and a cells. we focused our screen on nucs previously shown to have antiviral effect against hcv, including sofosbuvir, a pangenotypic hcv drug used in the clinic , but we also included broader-spectrum molecules such as galidesivir, favipiravir and ribavirin. among the nucs tested only remdesivir and eidd- displayed significant effect across the human cells (table ) , as previously described . remdesivir was most active in huh . cells, with ~ -fold lower ec values when compared to calu- and a cells (figure a ). despite being less active in lung carcinoma than in hepatoma cells, remdesivir was still more active in lung carcinoma calu- and a than in vero e cells (about -fold lower ec compared to the p veroe virus). the opposite was observed for eidd- (figure b) , which was more active in a and calu- cells ( -and -fold more active with ec of . µm and . µm, respectively) than in huh . cells ( . µm). finally, galidesivir exhibited limited activity (figure c ) with relatively high ec (> µm); the best inhibitory effect was observed in a cells (ec of µm). other nucs, including sofosbuvir, had no apparent activity (ec > µm) in these human cell lines with our experimental conditions (table ) . in this study, we performed isolation of sars-cov- (isolate dk-ahh ) in vero e cells and in the type- interferon response, one of the features that has been correlated to increased permissiveness to hcv replication . interestingly, huh cells are highly susceptible and permissive to cov- is more sensitive to type- interferons than sars-cov . therefore decreased type- interferon responses might be a key feature for enhancing the cell culture viability of sars-cov- , correlating with the high permissiveness of the vero cell lines, which lack genes encoding type- interferons . 
led to the acquisition of a mutation in the s-protein creating a novel furin site downstream of the s /s site that was implicated in the entry and syncytium formation in vero cells. on the other hand, deletions in the s -s furin cleavage site have been found during culture adaptation in vero e cells , . finally, q h was present at low frequency in p but increased significantly in the last passages, consistent with the maximum increase in viral infectivity. residue q is located within the s subunit, which undergoes conformational rearrangements from prefusion to postfusion states. specifically, this position is located within the heptad repeat (hr ), which is part of the fusion-active core structure. we demonstrated that the activity of remdesivir in huh . cells was very similar between the p veroe and the adapted p huh . virus, despite the latter exhibiting multiple changes in the genome. since these changes were concentrated in the s-protein and not in the nsp protein, which is the main target of nucs, this virus represents an excellent tool to study this drug class in human cells. however, the mutations present in the adapted virus could potentially interfere with entry processes and therefore this experimental system might not be an optimal tool for the screening of entry/fusion inhibitors. it is important to continue evaluating antiviral strategies against sars-cov- , that could also be
table . anti-sars-cov- activity of a panel of nucleos(t)ide analogs in different human cells. for each compound, the antiviral activity in huh . , calu- or a cells is indicated by ec values (µm). these values were inferred from concentration-response curves as shown in figure . all compounds were tested at non-cytotoxic concentrations as described in materials and methods. for remdesivir the maximum concentration tested was µm. for eidd- the maximum concentration tested was µm. 
"> " indicates that the maximum concentration tested was µm and that no viral inhibition tending towards or reaching % was observed at this concentration. the maximum concentration tested for galidesivir was µm (since clear inhibitory effects were observed at µm and µm was non-cytotoxic). genomic characterisation and epidemiology of novel coronavirus: infectivity of the covid- virus comparative tropism, replication kinetics, and cell damage profiling of sars- cov- and sars-cov with implications for clinical manifestations, transmissibility, and laboratory studies of covid- : an observational study effect of inducible fhit and p expression in the calu- lung cancer cell regulation of the interferon system: evidence that vero cells have a genetic defect in interferon production deducing the n-and o-glycosylation profile of the spike protein of novel coronavirus sars-cov- mutations strengthened sars-cov- infectivity sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor cleavage site in the spike protein of sars-cov- is essential for infection of human lung cells activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites entry from the cell surface of severe acute respiratory syndrome coronavirus with cleaved s protein as revealed by pseudotype virus bearing cleaved s protein expression of sars-cov- receptor ace and tmprss in human primary conjunctival and pterygium cell lines and in mouse cornea activates respiratory viruses and is expressed in viral target cells transmembrane serine protease tmprss activates hepatitis c virus infection advantages of the parent nucleoside gs- over remdesivir for covid- treatment highly efficient full-length hepatitis c virus genotype (strain tn) infectious culture system cell culture system permits identification of escape variants with resistance to sofosbuvir highly efficient infectious cell culture of three hepatitis c virus genotype b 
strains and sensitivity to lead protease cell culture studies of the efficacy and barrier to resistance of sofosbuvir- velpatasvir and glecaprevir-pibrentasvir against hepatitis c virus genotypes a, b, and c detection of novel coronavirus ( -ncov) by real-time rt-pcr full-length open reading frame amplification of hepatitis c virus microrna- antagonism against hepatitis c virus genotypes - and reduced efficacy by host rna insertion or mutations in the hcv ' utr evolutionary pathways to persistence of highly fit and resistant hepatitis c virus protease inhibitor escape variants cutadapt removes adapter sequences from high-throughput sequencing reads novel infectious cdna clones of hepatitis c virus genotype s ) and a (strain ed ): genetic analyses and in vivo pathogenesis studies differential efficacy of protease inhibitors against hcv genotypes a, a, a, and a ns / a protease recombinant viruses muscle: multiple sequence alignment with high accuracy and high throughput cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding potential to target new outbreaks we thank bjarne Ørskov lindhardt (hvidovre hospital) and carsten geisler (university of copenhagen) for their support of these studies, and anna-louise sørensen, susanne ruszczycka and louise barny christensen, hvidovre hospital, for technical support. key: cord- -oer lxxr authors: onodi, fanny; bonnet-madin, lucie; karpf, léa; meertens, laurent; poirot, justine; legoff, jérome; delaugerre, constance; amara, ali; soumelis, vassili title: sars-cov- induces activation and diversification of human plasmacytoid pre-dendritic cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: oer lxxr several studies have analyzed antiviral immune pathways in severe covid- patients. however, the initial steps of antiviral immunity are not known. 
here, we have studied the interaction of isolated primary sars-cov- viral strains with human plasmacytoid pre-dendritic cells (pdc), a key player in antiviral immunity. we show that pdc are not permissive to sars-cov- infection. however, they efficiently diversified into activated p -, p -, and p -pdc effector subsets in response to viral stimulation. they expressed checkpoint molecules at levels similar to influenza virus-induced activation. they rapidly produced high levels of interferon-α, interferon-λ , il- , ip- , and il- . importantly, all major aspects of sars-cov- -induced pdc activation were inhibited by hydroxychloroquine, including p - and p -pdc differentiation, the expression of maturation markers, and the production of interferon-α and inflammatory cytokines. our results indicate that pdc may represent a major player in the first line of defense against sars-cov- infection, and call for caution in the use of hydroxychloroquine in the early treatment of the disease.
severe acute respiratory syndrome-coronavirus- (sars-cov- ) is the third zoonotic coronavirus to have emerged in the last two decades. sars-cov- is the causative agent of coronavirus disease that appeared in late in wuhan, hubei province in china (nandakumar, ; sheahan and frieman, ). sars-cov- rapidly became pandemic; infections have now been detected in countries and territories, and the virus is responsible for approximately million confirmed cases and , deaths as of of june (who weekly update). sars-cov- infection may lead to a diversity of clinical presentations, ranging from asymptomatic or mild "flu-like" syndrome, to severe and life-threatening acute respiratory failure. disease aggravation usually occurs after to days following initial symptoms (tang et al., ). 
at this late stage, three main factors were shown to contribute to the progression and severity of the infection (tang et al., ): ) viral persistence was evidenced in the lung and systemic circulation, although it is not constant (tang et al., ), ) an excess production of pro-inflammatory cytokines, such as il- β and il- (tay et al., ; arnaldez et al., ), ) a defect in type i interferon (ifn) production, especially in critically ill patients (tay et al., ; acharya et al., ). although these abnormalities were confirmed in several studies, their origin and underlying mechanisms remain mostly unknown. in particular, it is not known whether an imbalance between inflammatory cytokines and type i ifn occurs early in the disease, at the stage of the primary infection, and whether the virus itself may be responsible. to fill this gap in knowledge, it is essential to investigate the early innate immune response to sars-cov- . among the immune cells that are involved in innate anti-viral immunity, plasmacytoid pre-dendritic cells (pdc) play a particularly important role as the major source of type i ifn in response to viral infection (liu, ). pdc can sense a large array of viruses including the coronaviruses murine hepatitis virus (mhv) and the middle east respiratory syndrome coronavirus (mers) (scheuplein et al., ; cervantes-barragan et al., ), and respond by producing innate cytokines, including all forms of type i ifns (α and β), type iii ifn, and inflammatory cytokines, in particular tnf-α and il- (liu, ; yin et al., ; gilliet et al., ). however, different viruses may induce different cytokine patterns (thomas et al., ), possibly creating an imbalance between ifn and inflammatory cytokine responses. additionally, some viruses were shown to subvert pdc functions through different mechanisms not necessarily related to productive infection. 
this is the case for hiv, which may induce pdc apoptosis in vitro (meyers et al., ) and pdc depletion in vivo (soumelis et al., ; meera et al., ) . human hepatitis c virus can inhibit ifn-α production by pdc through the glycoprotein e binding to bdca- (florentin et al., ) . human papillomavirus induces very low ifn response in pdc (bontkes et al., ) , which may be due to impaired tlr- and - signaling (hirsch et al., ) . whether sars-cov- induces efficient pdc activation, or may interfere with various biological pathways in pdc is currently unknown. in this study, we have systematically addressed the interactions between clinical sars-cov- isolates and primary human pdc in order to reproduce the early stages of the infection. we showed that pdc are resistant to productive infection with sars-cov- strains but still mount substantial ifn responses upon viral challenge. interestingly, pdc responded to sars-cov- by a complete activation program, including diversification into effector subsets, production of type i and type iii ifn, as well as inflammatory cytokines. we also showed that hydroxychloroquine, an antimalarial drug proposed for treatment of covid- patients (das et al., ; mahévas et al., ) , inhibits sars-cov- -induced pdc activation and ifn production in a dose-dependent manner. our results establish pdc as a potential key player in innate immunity to sars-cov- , and raise caution regarding pharmacological manipulation that could inhibit pdc effector functions. in order to efficiently recapitulate sars-cov- -pdc interactions, we used two strains of sars-cov- primary isolates. their viral genome sequences were nearly identical with , % identity. sequence comparison with reference strain wuhan-hu- (ncbi accession number nc_ . ) showed that both strains contain a subset of mutations (c t; c t; a g and g t), characteristic of the gh clade based on gisaid nomenclature. 
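the sequence comparison described above can be illustrated with a small python sketch that computes percent identity between two pre-aligned sequences and lists substitutions in the reference-base/position/new-base notation used for the mutations above. the toy sequences, the function names, and the convention of skipping gapped columns are assumptions made for illustration only; the alignment and clade-assignment procedure actually used in the study is not described here.

```python
from typing import List


def percent_identity(ref: str, query: str) -> float:
    """Percent of aligned, ungapped columns in which both sequences agree."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" or q == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if r == q:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def substitutions(ref: str, query: str) -> List[str]:
    """Substitutions written as <ref base><1-based reference position><query base>."""
    found = []
    position = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r != "-":
            position += 1  # positions are counted on the reference only
        if r != "-" and q != "-" and r != q:
            found.append(f"{r}{position}{q}")
    return found


# Toy sequences (hypothetical, not viral data).
reference = "ACGTACGTAC"
isolate = "ACGTTCGTAC"
print(percent_identity(reference, isolate), substitutions(reference, isolate))
```

counting positions on the reference sequence keeps the reported coordinates stable when the query contains insertions, which is why gaps in the reference do not advance the position counter.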
human primary pdc were purified from healthy donor peripheral blood mononuclear cells (pbmc) by cell sorting. first, we asked whether sars-cov- was able to induce pdc activation, and diversification into ifn-producing and/or t cell-stimulating effectors, as we previously described for influenza virus a (flu) (alculumbre et al., ). after hours of culture, sars-cov- -activated pdc efficiently diversified into p (pd-l + cd -), p (pd-l + cd + ), and p (pd-l + cd + ) pdc subsets, similar to flu stimulation (fig a). p -, p -, and p -pdc were all significantly induced by sars-cov- and flu, as compared to medium control (fig b). in parallel, we observed a sharp decrease in non-activated p -pdc (pd-l -cd -) (fig a and b). sars-cov- -induced pdc activation was comparable with magnetically- versus facs-sorted pdc (fig s a and s b), confirming that both methods are suitable for subsequent experiments. all main findings were confirmed in at least three independent experiments using facs-sorted pdc, with a protocol that excluded as-dc, a rare dendritic cell (dc) subset that shares some markers and functional features with pdc (villani et al., ), based on cd , cd and axl expression (fig s a). pdc activation and diversification were observed with two independent primary sars-cov- strains (fig c), which both induced similar proportions of p -p subsets. pdc diversification was also observed upon co-culture of pdc with sars-cov- -infected vero e cells, with an efficiency similar to that of free sars-cov- (fig s c). sars-cov- improved pdc viability as compared to medium condition (fig d), which is compatible with subsequent effector functions. next, we asked whether sars-cov- -induced pdc activation was dependent on productive infection. we first checked whether pdc express at their cell surface ace , the major sars-cov- entry receptor (hoffmann et al., , ). 
no significant expression was detected using an anti-ace -specific antibody, as compared to a low and high expression on vero e and t-ace cell lines, respectively (fig e). the ability of pdc to support sars-cov- replication was then assessed. human pdc were challenged with sars-cov- strain _ at an moi of , and cultured for h, h or h. our results showed that pdc were refractory to sars-cov- infection, as evaluated by quantifying ) the intracellular production of the nucleoprotein antigen (n) (fig f), or the accumulation of viral rna in sars-cov- -infected cells (fig s d), and ) the release of infectious progeny virus in the supernatants of infected cells using plaque assays (fig g). as a positive control, the permissive vero e cells produced high levels of the n antigen, increasing viral rna over time (fig s d), and high viral titers following sars-cov- incubation (fig g). similar results were obtained with pdc isolated from three independent donors (fig s e). overall, these results show that pdc are resistant to sars-cov- infection, and are efficiently activated by the virus independently of ace expression. their viability was not affected by sars-cov- challenge. activating immune checkpoints play a key role in t cell stimulation, and serve as surrogate markers of dc differentiation (guermonprez et al., ). we first assessed the dose-dependent effect of sars-cov- on cd expression and subset diversification. cd was induced in a dose-dependent manner by sars-cov- at moi . to (fig a). this was accompanied by an increase in the p -pdc subset, and a slight decrease in p -pdc (fig b). a detailed phenotypic analysis was subsequently performed on pdc after and hours of culture with sars-cov- (fig c and fig s a). diversification was observed at both time points, with a slight increase in p -pdc at hours (fig s a). p - and p -pdc significantly upregulated cd , cd , ccr , and ox l, as compared to non-activated p -pdc, in both sars-cov- and flu conditions (fig c). 
pd-l , and cd l, an adhesion molecule that promotes lymph node homing, were both higher on p - and p -pdc (fig c). expression of checkpoint molecules persisted at h, especially the higher cd and cd expression on p -pdc (fig s b). a key and defining function of pdc is their ability to produce large amounts of type i ifn (gilliet et al., ). we measured the production of several cytokines at the protein level after hours of culture. both sars-cov- and flu induced high levels of ifn-α and ifn-λ , both being critical anti-viral effector cytokines (fig a). ifn-α levels following sars-cov- activation reached up to ng/ml, indicating a very efficient activation. the chemokine ip- was also significantly induced (fig a), possibly due to an autocrine ifn loop (blackwell and krieg, ). inflammatory cytokines il- and il- were comparably induced by sars-cov- and flu (fig a). however, tnf-α was only marginally induced by sars-cov- as compared to flu activation (fig a). cytokine production was maintained after hours of viral activation (fig b). secreted protein levels were similar to h levels for most cytokines. interestingly, ifn-α levels rose -fold between h and h for one donor (fig a and b), indicating the possibility of increased production. such a strong ifn producer may represent a potential virus controller. because the oropharyngeal mucosa is an entry site for sars-cov- , we aimed to validate our results using pdc purified from tonsils. sars-cov- induced a marked diversification of tonsillar pdc into all three activated subsets (fig s c). tonsillar pdc efficiently produced ifn and inflammatory cytokines in response to sars-cov- (fig s d). overall, our results establish sars-cov- as a very efficient inducer of type i and type iii ifn responses. inflammatory cytokines were induced at levels similar to those seen with flu activation, without any significant imbalance that would be suggestive of an excessive inflammatory response. 
given that sars-cov- did not infect pdc, and did not interfere with pdc activation, we asked whether pharmacological agents could modulate pdc diversification and cytokine production. hydroxychloroquine (hcq) is known to inhibit endosomal acidification, which may diminish pdc activation (kuznik et al., , ; sacre et al., ). additionally, it is being tested in several clinical studies as a potential treatment for covid- (das et al., ; mahévas et al., ). hence, we addressed its role in sars-cov- -induced pdc activation. following hours of culture, we found that hcq inhibited pdc diversification in response to sars-cov- , which is similar to the decrease observed with flu, used as a positive control (fig a). in particular, p - and p -pdc differentiation was almost completely inhibited (fig b). inhibition of sars-cov- -induced pdc diversification by hcq was dose-dependent (fig s a). the significant decrease in p -pdc was paralleled by a decrease in cd , cd , and ccr expression (fig c and d). ox -ligand expression was not significantly affected by hcq (fig c and d). however, hcq inhibited the appearance of an ox -ligand high pdc population (fig s b and s c), which may impact subsequent t cell activation. last, we assessed the effect of hcq on innate pdc functions. we measured cytokine production after hours of sars-cov- -induced pdc activation in the presence or absence of hcq. we found that ifn-α and ifn-λ levels were decreased by hcq (fig e). this was also the case for il- and il- , with a much smaller impact on ip- (fig e). together, these results show that hcq inhibits sars-cov- -induced pdc diversification and cytokine production.
type i ifns are critical cytokines that control viral replication. several chronic viral infections were associated with poor type i ifn responses (lee et al., ; snell et al., ; marsili et al., ; dolganiuc et al., ). 
in covid- patients, decreased serum levels of type i ifn were associated with severity in late-stage infection, and increased viral load (tay et al., ; acharya et al., ). this raised the question as to whether sars-cov- was intrinsically capable of inducing a robust ifn response, or on the contrary could interfere with ifn production and other antiviral immune pathways. in this study, we have used primary sars-cov- isolates and human primary pdc in order to increase the relevance to a naturally occurring infection. our results demonstrate that sars-cov- is a strong ifn inducer that efficiently stimulates primary pdc. viral sensing was independent of the expression of the ace entry receptor and of the ability of the virus to replicate in pdc. however, the precise molecular mechanisms involved remain to be investigated. both type i and type iii ifns were induced at high levels upon sars-cov- stimulation. this strongly suggests that the defects observed in critically ill covid- patients are acquired during disease evolution through secondary events, not necessarily associated with a direct effect of the virus. possible mechanisms could be related to inflammatory cytokines, such as tnf, and the endogenous glucocorticoid response, both being able to promote pdc apoptosis (abe and thomson, ). however, additional mechanisms may be involved and need to be explored in the context of severe covid- . an excessive production of inflammatory cytokines, such as il- β, il- and tnf, was associated with covid- severity (tay et al., ; arnaldez et al., ; vabret et al., ). their cellular source and the underlying mechanisms are currently unknown. our results indicate that sars-cov- -induced pdc activation promotes a balanced production of innate cytokines, including large amounts of type i and type iii ifns, without any significant excess in inflammatory cytokines. this suggests that pdc activation may not be a causal factor of covid- aggravation. 
in support of this, recent studies indicated that endothelial cells may be a target of sars-cov- infection, and could be at the origin of the systemic and multi-organ production of inflammatory cytokines (pons et al., ). bronchial epithelial cells could also be involved in the production of high levels of il- , which were not detected in the serum and in pbmc transcriptomic studies in severe covid- (wilk et al., ). this supports the view of pdc as being protective through an early and efficient production of antiviral cytokines, with later defects due to currently unknown mechanisms, associated with late-stage aggravation. in contrast, nonprofessional innate immune cells such as endothelial cells and bronchial epithelial cells may be involved in the secondary worsening of covid- through the excessive and uncontrolled production of inflammatory cytokines. several therapeutic approaches have been explored and are currently being tested in clinical trials in covid- patients (vabret et al., ; tay et al., ). these include antiviral agents (yang et al., ), immune-modulatory molecules, such as glucocorticoids (fernández cruz et al., ), and anti-inflammatory molecules, such as hcq (touret and de lamballerie, ; lecuit, ). this latter drug was additionally shown in in vitro studies to interfere with sars-cov- replication (wang et al., ). continued efforts in mapping and dissecting immune effector pathways to sars-cov- will be of major importance in order to design efficient treatment strategies adapted to each patient and stage of the infection.
buffy coats from healthy human donors were obtained from etablissement français du sang, paris, saint-antoine crozatier blood bank. peripheral blood mononuclear cells (pbmcs) were isolated through ficoll density gradient centrifugation (ficoll-paque; ge healthcare). 
pdc were isolated through a first step of pdc magnetic sorting (human plasmacytoid dc enrichment kit; stemcell), and subsequent flow cytometric sorting on the basis of live, lineage - (cd , cd , cd , cd , cd and cd ), cd c -cd + , and cd -cd cells to a % purity. alternatively, owing to logistical constraints, frozen pbmcs from etablissement français du sang, paris, saint louis blood bank, were thawed and placed at °c for h for cell recovery. sars-cov- viruses were isolated from nasopharyngeal swab specimens collected at service de virologie (hospital saint louis, paris). samples were centrifuged at , x g for min, then filtered using a . μm filter, and diluted : with dmem- % (dmem supplemented with % fbs, % penicillin-streptomycin, % glutamax and mm hepes). vero e cells were seeded in -well cell culture plates ( , cells/well), incubated at °c with μl of inoculum, and observed daily for cytopathogenic effects (cpe) by light microscopy. substantial cpe were seen at - hours post inoculation. culture supernatants were then collected, clarified by centrifugation, filtered using a . μm filter and kept at - °c. we confirmed sars-cov- replication by rt-qpcr, and whole viral genome sequences were obtained by next generation sequencing using illumina miseq sequencers. strain sequences have been deposited in the global initiative on sharing all influenza data (gisaid) database with accession ids epi_isl_ ( _ ) and epi_isl_ ( _ ). all viruses belong to the gh clade. sars-cov- strains were further propagated on vero e in dmem- % (dmem supplemented with % fbs, % penicillin-streptomycin, % glutamax and mm hepes). viruses were passaged three times before being used for experiments. for the last passage, viruses were purified through a % sucrose cushion by ultracentrifugation at , x g for hours at °c. pellets were resuspended in hne x ph . (hepes mm, nacl mm, edta . 
mm), aliquoted and stored at - viruses titer was ascertained by plaque assays in vero e cells and expressed as pfu per ml. cells were incubated for hour with -fold dilution of viral stocks. the inoculum was then replaced with avicel . % mixed at equal volume with dmem supplemented with % fbs, % glutamax, mm mgcl , . % of nahco , and incubated days at °c before plaque counting. vero cells were plated ( , cell per well) in p -well plates hours before being incubated with sars-cov- diluted in dmem- %. freshly purified pdc were seeded in p -well plates ( , cells per well) and incubated with sars-cov- diluted in pdcs culture medium (rpmi medium with glutamax, % of fbs, % of mem neaa, % of sodium pyruvate, and % of penicillin/streptomycin). at , and hour post-inoculation, vero cells were trypsinized and transferred to p -well plates. to quantify infectious viral particle released in the supernatants of infected cells, pdc and vero cells were inoculated with sars-cov- as described above and incubated at °c for -hour. at indicated time points, supernatants were collected and kept at - °c. virus titer were then determined by plaque assay on vero e cells as described above. vero e and pdc were inoculated with sars-cov- as described above. at the indicated time points, cells were washed thrice with pbs. vero e were further incubated with trypsin . % for min at °c to remove cells surface bound particles. total rna was extracted using the rneasy plus mini kit (qiagen) according to manufacturer's instruction. cdnas were generated from ng total rna by using the maxima first strand synthesis kit following manufacturer's instruction (thermo fisher scientific). amplification products were incubated with unit of rnase h for min at °c, followed by min at °c for enzyme inactivation, and diluted fold in dnase/rnase free water. 
real time quantitative pcr was performed using a to sort pdc, cells were stained with zombie violet or buv fixable viability dye pdc cytokine production of ifn-α , il- , il- , ip- and tnf-α, was measured in culture supernatants using bd cytometric bead array (cba), according to the manufacturer's protocol, with a pg/ml detection limit. acquisitions were performed on a lsr fortessa (bd biosciences), and cytokine concentrations were determined using fcap array software (bd biosciences). the concentration of secreted ifn-λ was measured by enzyme-linked immunosorbent assay (elisa) (r&d systems, duoset dy ), according to the manufacturer's instructions. the manufacturer reported no cross-reactivity nor interference with ifn-α, ifn-β a, il- rβ, ifn-λ and λ , and il- rα. the optical density value (od) of the supernatants was defined as its absolute od value, minus the od absorbance from blank wells. the detection limit was pg/ml and all samples were run in duplicates. statistical analyses were performed with one-way anova, kruskall wallis's test with dunn's multiple comparison post-test or mann whitney's test, in prism (graphpad software). ali the authors declare no competing interests. 
ali amara's lab received fundings from the french government's investissement dexamethasone preferentially suppresses plasmacytoid dendritic cell differentiation and enhances their apoptotic death dysregulation of type i interferon responses in covid- diversification of human plasmacytoid predendritic cells in response to a single stimulus the society for immunotherapy of cancer perspective on regulation of interleukin- signaling in covid- -related systemic inflammatory response cpg-a-induced monocyte ifn-gammainducible protein- production is regulated by plasmacytoid dendritic cellderived ifn-α plasmacytoid dendritic cells are present in cervical carcinoma and become activated by human papillomavirus type virus-like particles control of coronavirus infection through plasmacytoid dendritic-cell-derived type i interferon an updated systematic review of the therapeutic role of hydroxychloroquine in coronavirus disease- (covid- ) hepatitis c virus (hcv) core protein-induced, monocytemediated mechanisms of reduced ifn-α and plasmacytoid dendritic cell loss in chronic hcv infection avendaño-solá, and puerta de hierro covid- study group. . impact of glucocorticoid treatment in sars-cov- infection mortality: a retrospective controlled cohort study hcv glycoprotein e is a novel bdca- ligand and acts as an inhibitor of ifn production by plasmacytoid dendritic cells plasmacytoid dendritic cells: sensing nucleic acids in viral infection and autoimmune diseases antigen presentation and t cell stimulation by dendritic cells impaired toll-like receptor and signaling: from chronic viral infections to cancer sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor mechanism of endosomal tlr inhibition by antimalarial drugs and imidazoquinolines chloroquine and covid- , where do we stand? 
negative regulation of type i ifn expression by oasl permits chronic viral infection and cd + tcell exhaustion ipc: professional type interferon-producing cells and plasmacytoid dendritic cell precursors clinical efficacy of hydroxychloroquine in patients with covid- pneumonia who require oxygen: observational comparative study using routine care data hiv- , interferon and the interferon regulatory factor system: an interplay between induction, antiviral responses and viral evasion irreversible loss of pdcs by apoptosis during early hiv infection may be a critical determinant of immune dysfunction impact of hiv on cell survival and antiviral activity of plasmacytoid dendritic cells covid- : emergence, spread, possible treatments, and global burden. frontiers in public health the vascular endothelium: the cornerstone of organ dysfunction in severe sars-cov- infection hydroxychloroquine is associated with impaired interferon-α and tumor necrosis factor-α production by plasmacytoid dendritic cells in systemic lupus erythematosus high secretion of interferons by human plasmacytoid dendritic cells upon recognition of middle east respiratory syndrome coronavirus the continued epidemic threat of sars-cov- and implications for the future of global public health type i interferon in chronic virus infection and cancer depletion of circulating natural type interferonproducing cells in hiv-infected aids patients the hallmarks of covid- disease the trinity of covid- : immunity, inflammation and intervention differential responses of plasmacytoid dendritic cells to influenza virus and distinct viral pathogens of chloroquine and covid- immunology of covid- : current state of the science single-cell rna-seq reveals new types of human blood dendritic cells, monocytes, and progenitors remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro . 
a single-cell atlas of the peripheral immune response in patients with severe covid- repurposing old drugs as antiviral agents for coronaviruses type iii ifns are produced by and stimulate human plasmacytoid dendritic cells key: cord- -d tg i authors: andres, cristina; garcia-cehic, damir; gregori, josep; piñana, maria; rodriguez-frias, francisco; guerrero-murillo, mercedes; esperalba, juliana; rando, ariadna; goterris, lidia; codina, maria gema; quer, susanna; martín, maria carmen; campins, magda; ferrer, ricard; almirante, benito; esteban, juan ignacio; pumarola, tomás; antón, andrés; quer, josep title: naturally occurring sars-cov- gene deletions close to the spike s /s cleavage site in the viral quasispecies of covid patients date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d tg i the sars-cov- spike (s) protein, the viral mediator for binding and entry into the host cell, has sparked great interest as a target for vaccine development and treatments with neutralizing antibodies. initial data suggest that the virus has low mutation rates, but its large genome could facilitate recombination, insertions, and deletions, as has been described in other coronaviruses. here, we deep-sequenced the complete sars-cov- s gene from patients ( with mild and with severe covid- ), and found that the virus accumulates deletions upstream and very close to the s /s cleavage site, generating a frameshift with appearance of a stop codon. these deletions were found in a small percentage of the viral quasispecies ( . %) in samples from all the mild and only half the severe covid- patients. our results suggest that the virus may generate free s protein released to the circulation. we propose that natural selection has favored a “don’t burn down the house” strategy, in which free s protein may compete with viral particles for the ace receptor, thus reducing the severity of the infection and tissue damage without losing transmission capability. 
rna viruses replicate using their own rna-dependent rna polymerase (rdrp), which lacks proofreading mechanisms and is prone to mutate at high rates ( - - - substitutions /nucleotide/replication cycle), lending the virus a quasispecies structure( , ). previous studies with severe acute respiratory syndrome coronavirus (sars-cov) and mouse hepatitis viruses have reported moderate mutation rates of . x - and . x - subs/site/cycle respectively, below the expected range for rna viruses ( ) . this is consistent with a role for non-structural protein (nsp) in rna proofreading or repair functions because of its '- ' exonuclease (exon) activity. nonetheless, the large size of the cov rna genome increases the probability that deletions will be generated and recombination events will take place, which could facilitate adaptation to new host environments, as occurs with jumping between species , . one naturally occurring deletion on nucleotides in the open reading frame (orf) of sars-cov after human-to-human transmission was found to be associated with attenuation of replication ( ) . the low mutation rate, high human-to-human transmissibility (r = . )( ), and absence of human pre-existing immunity against sars-cov- could explain its rapid spread through the human population, with very high sequence identity ( . %) between isolates recovered all over the world (sequence published in the repository sequence data banks, gisaid and genbank). the high pathogenicity of the virus, the severity of covid- and the lack of an effective antiviral treatment or vaccine has pushed the scientific community worldwide to develop, in record time, a solution for this pandemic ( ) . among the sars-cov- structural proteins, including spike (s), envelope (e), and membrane (m) constituting the viral coat, and the nucleocapsid (n) protein that packages the viral genome, the s glycoprotein is the most promising as a therapeutic and vaccine target. 
the s protein is encoded by the s gene, and following trimerization, it composes the spikes of the characteristic viral particle crown (corona). the s protein is essential for sars-cov- to infect a host cell ( ) , by recognizing and binding to the human cell receptor, angiotensin-converting enzyme (ace ) ( ) , and possibly (with lower affinity) to other receptors, such as cd l (l-sign), also used by sars-cov ( ) and dipeptidyl peptidase (dpp ), used by mers ( ) . the s gene has , nucleotides with , amino acids (genbank reference sequence mn . ). it has five essential domains: the receptor-binding domain (rbd), o-linked glycan residues flanking a polybasic s /s cleavage site, fusion peptide (fp), heptad repeats hr and hr , and a transmembrane domain (tm). the s rbd includes amino acid positions that show high affinity for the human ace receptor, which is widely distributed, but mainly present in alveolar type (at ) cells of the lungs ( ) . once the virus is attached to the host cell receptor cleavage occurs between subunits s and s , and subunit s drives the viral and cellular membranes to fuse ( ) . thus, s recognizes and binds to the human cell receptor ace , whereas s directly facilitates entry into the host cell. both functions are crucial for infection, and therein lies the interest of s as a target for the development of vaccines and antiviral agents. because of the importance of the s protein, we carried out a deep-sequencing study of the s gene in upper respiratory tract samples from patients with mild or severe sars-cov- disease. of particular note, hot-spot deletion sites were found in minority mutants located upstream and very close to the s /s and s ' cleavage sites, suggesting that these genomes code for a truncated s protein. the variants were significantly more prevalent in patients with mild than those with severe disease. 
thus, their effect on the protein could constitute a favorable regulatory mechanism emerging in the viral quasispecies to modulate the pathological effect of the infection. discussion is provided on the implications this observation may have in the biology of sars-cov- . upper respiratory tract specimens (naso/oropharyngeal swabs or nasopharyngeal aspirates) from individuals consulting in the emergency room were collected for sars-cov- testing in the department of microbiology at hospital universitari vall d'hebron (huvh), barcelona (spain). samples from patients with no previous comorbidities other than covid- were included in the study. as defined by cdc criteria (https://www.cdc.gov/coronavirus/ ncov/hcp/clinical-guidance-management-patients.html), patients had a mild clinical presentation of covid- (absence of viral pneumonia and hypoxia, no hospitalization requirement, able to manage their illness at home), whereas patients had severe disease (icu requirement for supportive management of complications of severe covid- , eg, pneumonia, hypoxemic respiratory failure, sepsis, cardiomyopathy and arrhythmia, acute kidney injury, and other complications). all patients, both those with mild and severe disease, had a favorable outcome with resolution of the infection. the diagnosis of covid- was performed by two tests, an in-house pcr assay using the primer/probe set from the cdc -ncov real-time rt-pcr diagnostic panel (qiagen, germany) and a commercial real-time rt-pcr assay, the allplex -ncov assay (seegene, korea). the eighteen respiratory specimens were inactivated by mixing µl of sample with µl of avl buffer (qiagen, hilden, germany). extraction of nucleic acids was then performed using the qiamp viral rna mini kit (qiagen, hilden, germany) following the manufacturers' instructions but without the rna carrier, obtaining a final elution of µl. the complete s gene was amplified using a double pcr. 
the first rt-pcr step consisted in amplifying large fragments, base pairs (bp) and bp in length, respectively. the ' end of primer and ' end of primer were designed to be outside the s region to ensure that we were amplifying sars-cov- genomic rna, and not subgenomic rna. (table s ). the superscript iii one-step rt-pcr system with platinum taq hifi dna polymerase (invitrogen; carlsbad, ca, usa) was used for the rt-pcr. reverse transcription was done at ºc for min, followed by a retrotranscriptase inactivation step at ºc for min. next, cycles of pcr amplification were performed as follows: denaturation at ºc for sec, annealing at ºc for sec, and elongation at ºc for min. after the last cycle, amplification ended with a final elongation step at ºc for min. the second round of amplification (nested) was done using overlapping internal primer pairs to amplify fragments bp to bp in length. the faststart high-fidelity pcr system dntpack (sigma, st. louis, mo, ca) was used for this purpose, as follows: activation at ºc for min, followed by cycles with denaturation at ºc for sec, annealing at ºc for sec, and elongation at ºc for sec, ending with a single elongation step at ºc for min. pcr products were purified using the qiaquick gel extraction kit (qiagen, hilden, germany) with qg buffer, following the manufacturers' instructions, and eluted dna was quantified by fluorometry using the qubit dsdna br assay kit (thermofisher, ma, usa). for each patient, pcr products were normalized to . ng/µl, pooled in a single tube, and purified using kapa pure beads (kapabiosystems, roche, pleasanton, ca, usa) to ensure that no short dna fragments were present in the library. library preparation was done using the kapa hyper the sequence analysis aimed to obtain high-quality haplotypes fully covering the amplicons. the pipeline comprises the following steps: . 
amplicons were reconstructed from the corresponding r and r paired ends using flash ( ) and setting a minimum of overlapping bases and a maximum of % mismatches. low-quality reads that did not meet these requirements were discarded. next, all reads with more than % of bases below a phred score of q were filtered out. the reads were demultiplexed by matching primers, allowing a maximum of three mismatches, and the primers were trimmed at both read ends. identical reads were collapsed to haplotypes with the corresponding frequencies as read counts. a fasta file was generated with each pool/primer/strand combination. the reverse haplotypes were reverse complemented. haplotypes were aligned by multiple sequence comparison by log-expectation (muscle) ( ), then separated into strands, and haplotypes common to both strands at abundances ≥ . % were identified. low-abundance haplotypes (< . %) and those unique to one strand were discarded. the haplotypes common to both strands, with frequencies not below . %, are called consensus haplotypes and are the basis of subsequent computations. the amino acid alignments were computed as follows: gaps were removed and haplotypes were translated to amino acids; the translated stops were identified, and haplotypes were trimmed after the stop; the resulting amino acid haplotypes were realigned with muscle (embl-ebi https://www.ebi.ac.uk/tools/msa/muscle/). all computations were made in the r language and platform ( ), using in-house scripts built on biostrings ( ) and ape ( ). percentage of the master sequence, gap incidence per patient and per amplicon, and premature stop codons are available as supplementary tables s -s . deletions did not accumulate randomly along the s gene but were found at specific regions (figure , figures s -s ). deletions coded as delta (Δ -Δ ) ranged from to nucleotides lost ( table ).
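the haplotype-calling part of the sequence analysis pipeline (collapsing identical reads, reverse complementing the reverse strand, and keeping only haplotypes seen on both strands above an abundance cutoff) can be illustrated with a minimal python sketch. this is not the authors' in-house r code: the function names and the `min_freq` cutoff are hypothetical, the abundance threshold used in the study is not reproduced here, and the muscle alignment step is omitted, so reads are assumed to be already trimmed to the same amplicon coordinates.

```python
from collections import Counter

# Complement lookup; "N" and "-" are passed through unchanged.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "-": "-"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def collapse(reads):
    """Collapse identical reads into haplotypes with their read counts."""
    return Counter(read.upper() for read in reads)


def consensus_haplotypes(forward_reads, reverse_reads, min_freq):
    """Keep haplotypes present on both strands at frequency >= min_freq.

    min_freq is a hypothetical cutoff expressed as a fraction (0.01 = 1%).
    Returns a dict mapping haplotype -> summed read count over both strands.
    """
    fw = collapse(forward_reads)
    # Reverse-strand reads are reverse complemented so that both strands
    # are compared in the same orientation.
    rv = collapse(reverse_complement(read) for read in reverse_reads)
    fw_total = sum(fw.values())
    rv_total = sum(rv.values())

    kept = {}
    # Haplotypes unique to one strand are dropped by the intersection.
    for hap in fw.keys() & rv.keys():
        if fw[hap] / fw_total >= min_freq and rv[hap] / rv_total >= min_freq:
            kept[hap] = fw[hap] + rv[hap]
    return kept
```

requiring support on both strands is a common way to reduce strand-specific sequencing and pcr artefacts, at the cost of discarding genuine variants that happen to be covered on one strand only.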
in some cases, the sequence recovered the correct reading frame, in others, the frameshift caused the appearance of a premature stop codon very close to the deletion site, whereas in still others, a new amino acid segment appeared. deletions were found in all amplicons, but they were mainly observed at frequencies < % (table ). most deletions in amplicons n , n , n , n , n , n , and n , were found in only or patients, whereas deletions in amplicons n , n n and n , ranging from to nucleotides, were observed in to patients. a deletion of nucleotides in amplicon n (nt - ), generating a stop codon, was present at a frequency of . % of the quasispecies in samples from patients p and p . the largest deletion, involving nucleotides (nt - ) and found in n of patient p , resulted in a loss of amino acids, but the reading frame recovered. a striking result was the accumulation of deletions ("hot-spot") in amplicon n , between nucleotides - (aa y -n ) in / ( %) patients, which included % of the patients with mild disease (p -p , p , p and p ), and only half of those with severe disease (p , p , p and p ). in this particular hot-spot, deletions Δ to Δ were produced ( figure , table ). among the severe patients, p , p , and p had no deletions in the n amplicon, and p showed a deletion outside this hot-spot location (table s ). viral variants carrying these deletions were significantly more frequent in mild than severe covid- patients (fisher test: odds-ratio: % confidence interval . - . ; p= . ) (table s ) . among the total of , Δ deletions in amplicon n , a premature stop codon appeared immediately after the deletion site in five cases ( . %) and the reading frame recovered after losing , , or amino acids in six cases. however, a frameshift that changed the reading frame and caused the appearance of a premature stop codon several amino acids later was generated in most of the deletions / ( . 
%), and in consequence the s /s cleavage site and the polybasic domain (prrar/s) disappeared. in of the ( . %) n deletions, a tata box-like motif (nt , - , ) was lost. in this particular region, the deletion was characterized by a similar ' cutting edge (table s ). an interesting result at the amino acid level was that (table s ). in patients, a second deletion hot-spot was found, deleting a number of nucleotides (from to ) between positions and (aa f- f), coinciding with the secondary s cleavage site (s '). the hot spot was located between nt ( k) and nt ( i), just after the exact s ' cleavage site (kpskr/sfi) ( table ) . here, we describe the naturally occurring deletions in the sars-cov- s gene in a set of patients with mild or severe covid . the deletions mainly clustered in two hot-spot regions, one (Δ , affecting aa -aa ) located upstream but very close to the s /s cleavage site (aa / ) and the second (Δ affecting aa - ) situated just upstream of the secondary cleavage site s ' (aa / ). these two deletions were found in most of the patient samples studied, and notably, the Δ deletion was present in % of patients with mild infection and in half of those with severe disease, that is, in three fourths of the studied patients (table ). this finding suggests that the deletions are not sporadic events, even though they were seen in a relatively small percentage of the viral quasispecies ( . % for Δ ; and . % for Δ ). the mutants could be interpreted as a strategy that natural selection has favored during the sars-cov- infectious life cycle to facilitate extensive spread of the infection, as is discussed below. this study involved deep-sequencing of the complete sars-cov- spike gene using overlapping amplicons in laboratory-confirmed sars-cov- samples from patients. in studies with other sars-cov viruses, several subgenomic rnas were reported to be generated during the cell cycle ( , ) .
to exclusively study the genomic viral rna of sars-cov- , rt-pcr was performed using two large pcr products in which the ' end of primer pair and the ' end of primer pair were designed to be outside the spike region ( ' end in orf ab and ' end in orf a) (table s , figure s ). taking into consideration that cov have '- ' exon activity (nsp protein), consistent with a proofreading mechanism to correct mutations during replication, the deep-sequencing analysis accepted mutants present at a low frequency of ≥ . %. because of the possibility of pcr artefacts, deep-sequencing point mutations, or deletion of single nucleotides generated mainly at homopolymeric sites, we did not include single deletions unless they were found in different patients and in overlapping amplicons at higher frequencies (> %). no insertions were found. entry of the viral genome into the cell depends on recognition and binding of the surface subunit s to the ace human receptor ( ) , whereas the s subunit is responsible for fixing the s protein to the viral membrane surface. after binding to the ace cell receptor, the s protein is primed by the serine-protease, tmprss , which leads to s protein cleavage at s /s and s ' ( ) . after cleavage, s remains attached to ace , while subunit s anchors the viral and cellular membranes, inducing fusion and viral entry. the Δ deletion ( table ) mainly causes a frameshift that generates an in-frame stop codon. the presence of this new stop codon would result in translation of a truncated s, which would consist of an almost complete s subunit, and total absence of the s subunit responsible for anchoring s to the lipid membrane of the viral particle. the absence of the s anchor peptide suggests that s could be produced as a "free" protein (free s ).
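how a deletion whose length is not a multiple of three shifts the reading frame and can introduce a premature stop codon can be illustrated with a short python sketch. it uses the standard genetic code and a toy coding sequence, not the s gene; the function names, the toy sequence, and the returned fields are hypothetical and chosen only for illustration.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(cds):
    """Translate from position 0 until the first stop codon.

    Returns (protein, stop_found). Unknown codons are translated as "X";
    a trailing incomplete codon is ignored.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False


def deletion_effect(cds, start, length):
    """Describe the effect of deleting `length` nucleotides at 0-based `start`."""
    wild_type, _ = translate(cds)
    mutant, stop_found = translate(cds[:start] + cds[start + length:])
    return {
        # The frame is kept only when a whole number of codons is removed.
        "frame_kept": length % 3 == 0,
        "protein": mutant,
        # Premature: the mutant stops earlier than an in-frame loss
        # of length // 3 residues would explain.
        "premature_stop": stop_found and len(mutant) < len(wild_type) - length // 3,
    }


toy_cds = "ATGGCATTGACCGGTAAATAA"      # toy sequence, translates to MALTGK
print(deletion_effect(toy_cds, 3, 1))  # 1-nt deletion: frameshift
print(deletion_effect(toy_cds, 3, 3))  # 3-nt deletion: frame preserved
```

in the toy example the single-nucleotide deletion produces a stop codon two residues into the protein, whereas the three-nucleotide deletion removes one residue and keeps the original stop, mirroring the two kinds of outcome described for the s gene deletions.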
as s is located on the exposed outside of sars-cov- in the crown structures, it could have hydrophilic domains and be a soluble peptide with potential for release outside the infected cell, in the lower respiratory tract and even to plasma. (figure ). these free soluble proteins, which are not a part of the viral cycle or components of the viral particles have also been observed in other viral infections. for example, a huge amount of "empty" subviral genomic particles, consisting of viral envelope proteins (hbsag), are often found in plasma of patients with hepatitis b virus (hbv) infection. these empty particles are produced and secreted during hbv infection, and have an immunomodulatory role ( ) . in addition, soluble hbv e antigen (hbeag), which is not a component of the viral particles and shares immunoactive epitopes with the hbv core antigen (hbcag viral capsid component), is detected during hbv infection and has an immunomodulator role ( ) . human respiratory syncytial virus (hrsv) is another respiratory virus with the ability to produce pre-anchored proteins. the attachment protein (g) of hrsv is an anchored protein whose main function is viral attachment to the host's cell membrane through a still unknown receptor ( ) . as in many other viruses, this protein has several functions, and in this case, because of the existence of a second start codon, a soluble form of g protein lacking the anchor is produced, and this is shed to the extracellular medium ( ) in abundant quantities by infected cells. the function of soluble, free g is to inhibit toll-like receptors, thereby modulating the host's immune response. free g also binds to the host's neutralizing antibodies, which are mainly directed to this protein. in this way, neutralization of circulating virions is reduced, favoring viral infection ( ) . the free s binding subunit of sars-cov- without its membrane anchor s could have similar functions (figure ). 
one putative action of secreted free s protein might be to attach to the human ace cell receptor, thereby competing with complete viral particles to re-infect or newly infect respiratory tract cells, resulting in less severe disease. this could be interpreted as an effect of natural selection to attenuate the infection and facilitate its persistence with minimal damage, increasing human-to-human transmission in the community. this strategy, which we have dubbed "don't burn down the house", is supported by the finding that the minor variants carrying these deletions were statistically more frequent in patients with mild than severe covid . this self-modulating viral strategy has also been seen in hepatitis delta virus (hdv) infection, where one viral antigen (short hdv antigen shdag) enhances hdv replication, while a second antigen (large hdv antigen lhdag), produced after stop codon editing (tag to tgg) by cellular adenosine deaminase, acts as a negative regulator of replication ( ) . the fact that the truncated s protein was present in only a low percentage of the entire viral quasispecies suggests that natural selection may have favored an equilibrium in which a limited number of deleted virions are generated to balance virus production with infection of new cells during disease progression. a likely reason for maintaining a minority population of genomes with deletions able to produce free s protein would be to infect a host while causing minimal damage, which would greatly facilitate transmission of the virus within the population. it would be of interest to determine whether the percentage of these viral mutants changes during disease progression and whether there are differences between the quasispecies isolated from upper and lower respiratory tract specimens. as a consequence of the frameshift, a new peptide motif, rlrlillgghvv*, appeared in several sequences with a deletion that started at different nucleotide positions.
additional work is needed to determine whether acquisition of this peptide motif has biological consequences. two other putative consequences of the s mutants might be that free s protein could bind with s-specific antibodies, acting as a decoy and weakening the immune response, or to circulating ace , released from the cell membrane to plasma ( , ) , with cardiovascular effects. however, as the deletions were mainly found in patients with mild disease and considering the zoonotic origin of the virus (animal immune and cardiac systems differ from human ones) and the short time that the virus has been evolving in the human population, we believe that the most likely reason for maintaining a minor population of mutant genomes able to produce free s protein would be to cause an infection with limited damage in the host, thus facilitating transmission and persistence of the virus in the population. the observation of mutation hot spots in the s gene opens the door to further work on a number of potentially related aspects. in a recent study ( ) , lau et al reported the results of plaque purification of vero-e cultured sars-cov- genomes obtained from nasopharyngeal aspirate of a covid- patient. the authors found deletions of to nucleotides at the s /s junction. in a further experiment, infection of hamsters with virus containing these variants led to attenuated viral disease. these findings strongly support our hypothesis that deletions close to the s /s cleavage site may be a phenomenon favoured by natural selection to enhance spread of the sars-cov- . the authors failed to detect these deletions in this and other clinical specimens, but this may be attributable to the relatively low sequencing throughput of the sanger technique used. in the cell culture experiments, the lack of genomes with deletions that generate a premature stop codon in the s gene can be easily explained, as the truncated s gene would produce non-infectious particles. 
to conclude, in-depth sequencing of the sars-cov- s gene in patients with covid- enabled identification of a naturally occurring deletion very close to the s /s cleavage site. our results indicate that the mutant s would have a large impact on the s protein, and suggest that the virus could produce free s , which may have implications regarding the candidacy of s protein as a target for vaccination and antiviral treatment strategies. the deletions were significantly more prevalent in patients with mild than in those with severe disease, supporting the notion that they are a strategy of natural selection to decrease the injury caused after onset of the infection. in this "don't burn down the house" strategy, the ability of the virus to bind with ace receptor and spread to others would be unchanged; thus its propensity for transmission would be enhanced by a mildly affected host. to prove this hypothesis, it is essential to further investigate whether the truncated s protein (free s ) is present in respiratory tract specimens and in plasma. bar plots for the patients and by amplicons are provided in supplementary material (figures s to s for nucleotides and s to s for amino acids). the transcription step, two scenarios are depicted: to the left, the viral particle resulting from normal s protein, and to the right the viral particle resulting from truncated s. in normal conditions, once the nucleoprotein is freed into the cytoplasm ss+rna is translated into the nonstructural proteins required for transcription. ss+rna is transcribed into ss-rna and later into genomic ss+rna which is encapsidated (left side of the figure). once the complete viral particle has been formed, it is secreted from the cell by exocytosis. the right side of the figure depicts the situation when a deletion occurs in the s gene during transcription of the complete genome and before subgenomic mrnas are generated to produce the structural proteins. 
translation of a deleted subgenomic spike mrna would lead to a truncated s protein composed of the s domain without s , which could be shed outside the cell as free s . the box depicts possible destinations of free s , which could bind to ) the ace cell receptor, ) s -specific neutralizing antibodies, or ) free ace receptor. the red triangle indicates the deletion in genomic rna. abbreviations: ace , angiotensin converting enzyme ; mrna, messenger rna; nab; neutralizing antibodies; pp a, polyprotein a; rdrp, rna-dependent rna polymerase; s, spike; s , subunit s at the n-terminal domain of the s protein, which includes receptor binding domain (rbd); s , subunit s located at the c-terminal domain of s protein, which includes fusion peptide (fp), heptad repeat (hr) domain and , and the transmembrane domain (tm); ss, single stranded; ss+rna, single-stranded positive sense rna; tmprs , human serine protease tmprss replication error, quasispecies populations and extreme evolution rates of rna viruses basic concepts in rna virus evolution infidelity of sars-cov nsp -exonuclease mutant virus replication is revealed by complete genome sequencing attenuation of replication by a nucleotide deletion in sars-coronavirus acquired during the early stages of human-to-human transmission covid- -navigating the uncharted a review of sars-cov- and the ongoing clinical trials mechanisms of coronavirus cell entry mediated by the viral spike protein sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor cd l (l-sign) is a receptor for severe acute respiratory syndrome coronavirus sars and mers: recent insights into emerging coronaviruses the proximal origin of sars-cov- ready, set, fuse! 
the coronavirus spike protein and acquisition of fusion competence flash: fast length adjustment of short reads to improve genome assemblies muscle: a multiple sequence alignment method with reduced time and space complexity r: a language and environment for statistical computing biostrings:string objects representing biological sequences, and matching algorithms ape: analyses of phylogenetics and evolution in r language from sars to mers, thrusting coronaviruses into the spotlight hepatitis b virus: the challenge of an ancient virus with multiple faces and a remarkable replication strategy immunomodulatory function of hbeag related to short-sighted evolution, transmissibility, and clinical manifestation of hepatitis b virus respiratory syncytial virus--a comprehensive review further characterization of the soluble form of the g glycoprotein of respiratory syncytial virus the cysteine-rich region of respiratory syncytial virus attachment protein inhibits innate immunity elicited by the virus and endotoxin hepatitis delta virus the protective arm of the renin angiotensin system (ras) functional aspects and therapeutic implications the relationship of covid- severity with cardiovascular disease and its traditional risk factors: a systematic review and meta-analysis attenuated sars-cov- variants with deletions at the s /s junction inhibition of sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion this work was partially supported by the direcció general de recerca i innovació en institute (vhir), european development regional fund (erdf) "a way to achieve europe", by we declare that no public or private company has had any role in the study design, data collection, experimental work, data analysis, decision to publish, or preparation of the manuscript. roche diagnostics s.l. 
provided support in the form of a salary for one of the authors (josep gregori), but the company did not have any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. no other competing interests to declare. thus, our adherence to nature policies on sharing data and materials is not altered. the work has been approved by vall d'hebron university hospital ethical committee reference number pr(ag) - . key: cord- -lwh rww authors: anderson, cole; castillo, fritz; koenig, michael; managbanag, jim title: pooling nasopharyngeal swab specimens to increase testing capacity for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lwh rww the recent emergence of sars-cov- has led to a global pandemic of unprecedented proportions. current diagnosis of covid- relies on the detection of sars-cov- rna by rt-pcr in upper and lower respiratory specimens. while sensitive and specific, these rt-pcr assays require considerable supplies and reagents, which are often limited during global pandemics and surge testing. here, we show that a nasopharyngeal swab pooling strategy can detect a single positive sample in pools of up to samples without sacrificing rt-pcr sensitivity and specificity. we also report that this pooling strategy can be applied to rapid, moderate complexity assays, such as the biofire covid- test. implementing a pooling strategy can significantly increase laboratory testing capacity while simultaneously reducing turnaround times for rapid identification and isolation of positive covid- cases in high-risk populations. patients with viral pneumonia . pneumonia associated with sars-cov- was later designated as coronavirus disease (covid- ) by the world health organization in february .
it was determined that after a zoonotic transmission event in wuhan city , widespread person-to-person transmission quickly occurred that led to the infection and death of over , and , people in china, respectively. to date, according to the who, there have been , , reported cases of covid- , including , deaths worldwide . since the initial outbreak in china, covid- has been declared a global pandemic affecting at least other countries, territories or areas. to monitor and diagnose covid- , the us food and drug administration (fda) approved an emergency use authorization (eua) for the cdc -ncov real-time rt-pcr diagnostic panel on february , . this protocol allows for the rapid detection of sars-cov- rna from clinical specimens such as nasopharyngeal and oropharyngeal swabs, sputum, bronchoalveolar lavage, and tracheal aspirates. as evidenced by the ongoing sars-cov- pandemic, increased demand for testing can overwhelm diagnostic laboratories and lead to drastic shortages in supplies and reagents. a strategy to overcome high testing demand is to pool specimens before rna extraction, test pools, and then retest individual specimens from positive pools. similar strategies have been shown to increase testing capacity for the detection of common infectious diseases such as influenza, hiv, hepatitis, and chlamydia trachomatis - . in this study, we examined the feasibility of pooling nasopharyngeal swab specimens submitted for covid- testing using the cdc -ncov rt-pcr diagnostic panel without compromising clinical sensitivity. our data show that pooling respiratory samples during times of increased volume and low disease prevalence can save time and reagents without significant modifications to laboratory infrastructure or workflow. this study was determined to meet the exempt criteria listed in cfr . (d) by the landstuhl regional medical center exempt determination official.
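as a rough illustration of why the pool-then-retest strategy described above saves reagents at low prevalence, the following python sketch computes the expected number of tests per specimen under an idealized two-stage scheme. it assumes independent specimens, a perfectly sensitive and specific assay, and no dilution effect; the function name, the prevalence values and the pool sizes are hypothetical and are not taken from the study.

```python
def expected_tests_per_specimen(prevalence: float, pool_size: int) -> float:
    """Expected tests per specimen for idealized two-stage pooling.

    Stage 1 tests each pool once (1 / pool_size tests per specimen).
    Stage 2 retests every specimen individually, but only in positive pools.
    Assumes independent specimens and a perfect assay (an idealization).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        # A "pool" of one is just individual testing.
        return 1.0
    # Probability that a pool contains at least one positive specimen.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + p_pool_positive


if __name__ == "__main__":
    # Hypothetical prevalence values and pool sizes, for illustration only.
    for prevalence in (0.01, 0.05, 0.20):
        for pool_size in (5, 10):
            tests = expected_tests_per_specimen(prevalence, pool_size)
            print(f"prevalence={prevalence:.2f} pool_size={pool_size} "
                  f"tests_per_specimen={tests:.3f}")
```

values below 1 mean that pooling needs fewer tests than individual testing. as prevalence rises, more pools must be retested and the advantage shrinks, which is in line with the emphasis on low disease prevalence in the text.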
during an outbreak cluster of sars-cov- in stuttgart, germany, nasopharyngeal (np) swabs were collected and placed into . ml of normal saline. specimens were submitted to the virology laboratory at landstuhl regional medical center for routine sars-cov- testing using the cdc -ncov rt-pcr assay. post clinical testing, specimens were de-identified and randomly assigned into pools of to create distinct pools (the th pool contained specimens diluted in . ml of transport media). pools were created by combining ul of each specimen to create . ml pools. viral transport media was added to each pool at a : ratio for nucleic acid extraction performed on the roche magna pure platform using the magna pure total na isolation kit (roche). elution volume was set to ul to concentrate viral rna. each round of extraction contained a human specimen control to monitor for pcr inhibition and specimen quality. table ) . the mean c t value and standard deviation for n and n of the pools were . ( . ) and . ( . ), respectively. similarly, the mean c t values of individual positive specimens were . ( . ) and . ( . ) for n and n , respectively. despite dilution, there was no significant difference in mean c t value between the pooled and individually tested specimens (figure ) . to determine if a pooling approach is feasible with rapid, moderate complexity tests, we tested (table ) . a new coronavirus associated with human respiratory disease in china clinical features of patients infected with novel coronavirus in who director-general's remarks at the media briefing on -ncov on february a pneumonia outbreak associated with a new coronavirus of probable bat origin coronavirus disease (covid- ) situation report- . 
available at update: fda issues first emergency use authorization for point of care diagnostic | fda pooling nasopharyngeal/throat swab specimens to increase testing capacity for influenza viruses by pcr pooling of sera for human immunodeficiency virus (hiv) testing: an economical method for use in developing countries pooling of clinical specimens prior to testing for chlamydia trachomatis by pcr is accurate and cost saving high throughput screening of million serologically negative blood donors for hepatitis b virus, hepatitis c virus and human immunodeficiency virus type- by nucleic acid amplification testing with specific and sensitive multiplex reagent in utility of pooled urine specimens for detection of chlamydia trachomatis and neisseria gonorrhoeae in men attending public sexually transmitted infection clinics in mumbai, india, by pcr evaluation of covid- rt-qpcr test in multi-sample pools pooling of samples for testing for sars-cov- in asymptomatic people increasing testing throughput and case detection with a pooled-sample bayesian approach in the context of covid- biofire covid- test emergency use authorization key: cord- -wdgbno p authors: begum, feroza; mukherjee, debica; thagriki, dluya; das, sandeepan; tripathi, prem prakash; banerjee, arup kumar; ray, upasana title: analyses of spike protein from first deposited sequences of sars-cov from west bengal, india date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wdgbno p india has recently started sequencing sars-cov genome from clinical isolates. currently only few sequences are available from three states in india. kerala was the first state to deposit complete sequence from two isolates followed by one from gujarat. on april , , the first five sequences from the state of west bengal (eastern india) were deposited on ‘a global initiative on sharing avian flu data’ (gisaid) platform. 
in this paper we have analysed the spike protein sequences from all these five isolates and also compared for their similarities or differences with other sequences reported in india and with isolates of wuhan origin. we report one unique mutation at position and the other at in the s domain of spike protein of the isolates from west bengal only and one mutation downstream of the receptor binding domain at position in s domain which was common with the sequence from gujarat (a state of western part of india). mutation in the s domain showed changes in the secondary structure of the spike protein at region of mutation. we also studied molecular dynamics using normal mode analyses and found that this mutation decreases the flexibility of s domain. since both s and s are important in receptor binding followed by entry in the host cells, such mutations may define the affinity or avidity of receptor binding. sars-cov (a member of coronaviruses) outbreak occurred in wuhan, china in the year and it became a pandemic recently that has affected countries worldwide. to design antiviral therapeutics/ vaccines it is important to understand the genetic sequence, structure and function of the viral proteins. when a virus tries to adapt to a new environment, in a new host, in a new geographical location and a new population, it would make changes in its genetic make up which in turn would bring in slight modifications in the viral proteins. such variations would help the virus to utilize the host's machinery the best in favour of the virus survival and propagation. since host's immune system eventually learns to identify a pathogen that had infected and starts producing protective antibodies, a virus often changes its structural proteins such that it can still infect the host cells escaping the host's immune system. coronaviruses have been long known to undergo rapid mutations in its rna genome. 
such mutations get reflected in changes in the amino acid sequences of its structural and the nonstructural proteins. spike protein is one of the structural proteins of sars-cov that forms a homotrimer on the surface of the viral lipid envelope. this trimer is made up of monomers consisting of s and s subunits. s subunit helps in attachment to the host cell receptor while s subunit helps in fusion with the host cell and entry. thus, spike protein has been an area of interest for designing vaccine and antiviral candidates against sars-cov . since the spike protein tends to mutate, it is important to obtain a broad mutation profile of this protein from extensive genome sequencing from different geographical locations of the world. targeting conserved areas of the spike protein, i.e. those that do not undergo mutation, would be the key to designing an effective broad-spectrum antiviral or vaccine. we downloaded the five new sars-cov sequences from west bengal (epi_isl_ ; epi_isl_ ; epi_isl_ ; epi_isl_ ; epi_isl_ ) from the gisaid database and the spike protein sequences corresponding to kerala isolates [ ] and gujarat isolate [ ] from the ncbi virus database. for the west bengal isolates, the nucleotide sequences corresponding to the spike protein were selected and then translated with the 'expasy' translate tool to obtain the protein sequences. all the spike protein sequences were aligned using the multiple sequence alignment platform clustal omega. the alignment file was viewed using mview and differences in the sequence or the amino acid changes were recorded. cfssp (chou and fasman secondary structure prediction) server was used to predict secondary structures of sars-cov spike protein. to study the effect of mutation on the conformation, stability and flexibility of the spike protein, its structure was downloaded from rcsb pdb. we used the available sars-cov- spike ectodomain structure (open state) (pdb id: vyb).
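the translate-then-compare step described above can be sketched in a few lines of python. this is a minimal illustration, not the authors' workflow: it uses the standard genetic code instead of the expasy tool, it assumes the two protein sequences are already aligned and of equal length instead of calling clustal omega, and the toy sequences and function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def list_substitutions(reference: str, variant: str) -> list:
    """List substitutions such as 'D2G' between two pre-aligned proteins.

    Assumes equal-length, already aligned sequences; gap columns ('-') are skipped.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for position, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1):
        if ref_aa != var_aa and "-" not in (ref_aa, var_aa):
            changes.append(f"{ref_aa}{position}{var_aa}")
    return changes


if __name__ == "__main__":
    # Toy sequences, not real spike gene fragments.
    reference_protein = translate("ATGGATGGTTAA")
    variant_protein = translate("ATGGGTGTTTAA")
    print(reference_protein, variant_protein)
    print(list_substitutions(reference_protein, variant_protein))
```

for real spike sequences the positions reported this way refer to alignment columns, so they match protein numbering only when the reference sequence contains no gaps.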
vyb structure was uploaded on dynamut software (university of melbourne, australia) [ ] and the change in vibrational entropy, the atomic fluctuations and the deformation energies due to mutation were determined. atomic fluctuation and deformation energy calculations were performed over the first ten non-trivial modes of the molecule. the first set of five sequencing data from clinical isolates of sars-cov from the state of west bengal, india was submitted on . . by national institute of biomedical genomics (nibmg) in collaboration with icmr-national institute of cholera and enteric diseases (icmr-niced). the sequences were submitted on gisaid database. we downloaded all the sequences from west bengal (figure ) and performed a nucleotide translation to obtain respective spike protein sequences. all these spike protein sequences were first aligned in clustal omega to check for similarities or differences. we found that all the isolates from west bengal were identical (data not shown as they were identical). so, we used one of these sequences as representative of sars-cov spike from west bengal for our further analyses. since we currently have sars-cov sequences from only three states in india, i.e. kerala, gujarat and now west bengal, we compared all the sequences to detect possible changes ( figure ) . we considered the original wuhan sequence as the wild type for comparison. based on these criteria we found four different amino acid positions that were mutated in these isolates overall. we had recently published the details with respect to the spike protein mutations in kerala and gujarat isolates. here, we report that among west bengal isolates there were three mutations in the spike protein. one of these mutations was d g in the s domain. this lies near the receptor binding domain at a downstream position. the other mutation was g v in the s domain.
while d g was also found in the isolate from gujarat but not in kerala isolates, the g v mutation was exclusively found in two of the isolates of west bengal. none of the isolates from other parts of india had this mutation. another mutation that appeared in one of the west bengal isolates was t i. we characterized mutation g v. both glycine (g) and valine (v) are non-polar amino acids with aliphatic r groups. glycine has only a hydrogen atom as its side chain whereas valine is bulkier due to its isopropyl side chain. a change from glycine to valine can thus potentially disrupt the local folding of the protein. for example, it was shown that a g to v change in a p-glycoprotein changed its drug specificities [ ] . secondary structure prediction showed changes in and around the site of mutation ( figure ). in the mutant spike there was a loss of turn structure from position and addition of four helices at positions , , and . this change in secondary structure might lead to change in function of s . s helps in the fusion process of the spike protein and thus mutation in s may have altered receptor-spike interactions and thus infectivity. to determine whether changes in secondary structure are also reflected in the dynamics of the protein in its tertiary structure, we performed normal mode analyses and studied protein stability and flexibility. change in vibrational entropy energy (ΔΔsvib encom) between the wild type wuhan isolate and the west bengal isolate was - . kcal.mol - .k - (figure ). the ΔΔg was . kcal/mol and the ΔΔg encom was . kcal/mol. all these suggested a stabilizing mutation in this type of spike. the interatomic interactions have been shown in figure . analyses of atomic fluctuations and deformation energies showed visible changes ( figure ). atomic fluctuations measure absolute atomic motion, whereas deformation energies measure the flexibility of a protein.
figure shows the visual representations of the atomic fluctuation and deformation energies where positions that could be visibly detected to be different have been marked. this is the first report of mutations of such types in the isolates of the state of west bengal and further sequencing followed by sequence analyses would help expanding the knowledge about variations of spike protein in human sars-cov. these variations might lead to virus diversification and eventual emergence of variants/ antibody escape mutants/strains/ serotypes. also, mutations might help the virus to expand its tissue tropism and adjust with the host environment better. therefore, elaborate studies on sequence variations should be done which would in turn help in better therapeutic targeting. a new coronavirus associated with human respiratory disease in china a virus that has gone viral: amino acid mutation in s protein of indian isolate of coronavirus covid- might impact receptor binding and thus infectivity novel mutations in the s domain of covid spike protein of isolate from gujarat origin dynamut: predicting the impact of mutations on protein conformation, flexibility and stability an altered pattern of cross-resistance in multidrug-resistant human cells results from spontaneous mutations in the mdr (p-glycoprotein) gene we thank prof. saumitra das, nibmg, kalyani, west bengal and icmr-niced for depositing the sequences in public database and making it available open access. we thank csir, acsir and north bengal medical college and hospital for other necessary support and input. authors declare no conflict of interests. key: cord- -q quyijg authors: lim, su bin; dawson, valina l.; dawson, ted m.; kang, sung-ung title: ace -expressing endothelial cells in aging mouse brain date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q quyijg angiotensin-converting enzyme (ace ) is a key receptor mediating the entry of sars-cov- into the host cell. 
through a systematic analysis of publicly available mouse brain sc/snrna-seq data, we found that ace is specifically expressed in small sub-populations of endothelial cells and mural cells, namely pericytes and vascular smooth muscle cells. further, functional changes in viral mrna transcription and replication, and impaired blood-brain barrier regulation were most prominently implicated in the aged, ace -expressing endothelial cells, when compared to the young adult mouse brains. concordant ec transcriptomic changes were further found in normal aged human brains. overall, this work reveals an outline of ace distribution in the mouse brain and identifies putative brain host cells that may underlie the selective susceptibility of the aging brain to viral infection. in addition to the well-known respiratory symptoms, covid- patients suffer from a loss of smell and taste, headache, impaired consciousness, and nerve pain [ ] , raising the possibility of virus infiltration in the nervous system, including the brain. despite cases emerging of covid- patients with neurologic manifestations, potential neurotropic mechanisms underlying sars-cov- mediated entry into the cells of the brain are largely unexplored. as evidenced by transgenic mouse models [ , ] , the evolutionarily-related coronaviruses, such as sars-cov and mers-cov, can invade the brain by replicating and spreading through the nasal cavity, and possibly olfactory bulbs located in close proximity to the frontal lobes of the brain [ ] . once inside the brain, viruses can harm the brain directly and indirectly by infecting the cells and myelin sheaths, and by activating microglia, which may in turn consume healthy neurons to induce neuroinflammation and neurodegeneration [ ] .
many of the observed neurological symptoms may in part be explained by a primary vasculopathy and hypercoagulability [ ] , as endothelial dysfunction and the resulting clotting are increasingly being observed in patients with severe covid- infection [ ] [ ] [ ] . consistent with these findings, the first pathologic evidence of direct viral infection of the ec and lymphocytic endotheliitis has been found in multiple organs, including lung, heart, kidney and liver, in a series of covid- patients [ ] . in contrast, recent clinical findings, including an mri study [ ] and immunohistochemistry and rt-qpcr analyses [ , ] , did not observe any signs of encephalitis from postmortem brain examination of covid- patients. similarly, postmortem analysis of sars-cov- -exposed mice transgenically expressing ace via mouse ace promoter failed to detect the virus in the brain [ ] . in light of such controversy regarding neuropathological features, a more comprehensive assessment of the distribution of ace in a cell type-specific manner is required to identify putative brain host cells. here, we analyzed publicly available and spatially rich brain rna-seq datasets to assess ace distribution in mouse brains at the single-cell and single-nuclei level. we found that ace was consistently expressed in small subpopulations of endothelial cells (ecs) and mural cells in all the analyzed datasets, in which the impaired blood-brain barrier was further implicated in the aged brains. these findings altogether may hold potential to initiate new avenues of research on specific cell types (ec and vascular mural cells) that remain poorly understood particularly in relation to aging and viral infection in the brain. a total of adult mouse brain datasets deposited in the single cell portal (scp) were analyzed in this study ( table ; see methods). datasets lacking vascular cell types were excluded from the data collection and subsequent analyses.
the analyzed datasets were derived from diverse single-cell (scrna-seq) and single-nuclei (snrna-seq) sequencing technologies, including x genomics, smart-seq, drop-seq, and snucdrop-seq. the number of cells included in the final scp-generated tsne/umap plots also varied from , (smallest; t ) to , (largest; t ). despite the varying library preparation and sequencing technologies, which arguably recover population heterogeneity to different extents, the analysis of the retrieved scrna-seq datasets ( fig. a-e) consistently showed increased levels of ace in a small subpopulation of vascular cells, namely endothelial cells (ec), pericytes (pc), and vascular smooth muscle cells (vsmc), across different brain regions, including auditory cortex (t ), anterior lateral motor cortex (t ), primary visual cortex (t ), hypothalamus (t ), and whole brain (t ). further, an scrna-seq dataset specifically derived from brain vasculature in young adult and aged mice (t ) confirms the elevated ace expression in subsets of the three identified cell types, which constitute . % of the cell populations ( fig. c) . similarly, ace mrnas were enriched in subpopulations of ecs and mural cells of all analyzed snrna-seq datasets ( fig. a-e) derived from cortex (t , t , t ), midbrain (t ), including substantia nigra pars reticulata (snr), substantia nigra pars compacta (snc), and ventral tegmental area (vta), and cerebellum (t ). we next asked if these identified vascular cell sub-populations expressing ace would be affected by aging and whether they have unique transcriptional changes that are functionally important. of the major cell types of different lineages (oligodendrocyte, astrocyte, and neuronal lineages, ependymal cells, vasculature cells, and immune cells), ec, pc, and vsmc cell types contribute to a majority of ace -expressing cells in the whole mouse brain (t ) (fig. fig. .
impaired bbb implicated in ace -expressing ecs of the aged brain. to assess our findings in a human context, we next asked if the identified transcriptomic changes in the aged ec gene signatures would further be conserved and detected in normal human aged brains using bulk rna-seq data derived from the genotype-tissue expression (gtex) project database ( table ) [ ] . expression levels (in tpm) of human orthologs of the aged ec degs of ec degs were different in the old group from that of the young population, indicating concordant ec transcriptomic changes in normal aged human brains (fig. a-b) . this conclusion was further supported by gene set enrichment analysis (gsea) results, displaying a significant enrichment of the aged (n = ) and young (n = ) mouse ec degs in the old (n = ) and young (n = ) human samples, respectively (fig. c) . in conclusion, we have identified specific ec signatures that are functionally important and related to the aging and viral infection in the brain. while our study provides a foundation for a more refined level of analysis of ec and vascular pc, a cell type that remains poorly understood despite its key roles in immune response and microvascular stability [ ] , our analyses are limited only to the normal aging mouse and human brains, lacking the context of covid- neuropathology. a number of recent sc/snrna-seq studies identified ace mrna in the olfactory neuroepithelium [ , ] , although there are no sc/snrna-seq data derived from postmortem brains of covid- patients to date. the distribution of ace and other genes mediating sars-cov- entry into the cells of the brain thus remains to be investigated across different regions and cell types. using scrna-seq data derived from normal human brain tissues, muus, c et al. [ ] have identified ace + tmprss + oligodendrocytes, while chen et al. 
[ ] have found subsets of both neuronal (excitatory and inhibitory neurons) and non-neuronal cells (mainly astrocytes and oligodendrocytes) expressing ace . these studies, however, have only used a limited number of datasets, which may in part explain the inconsistent results of the identified cell types. despite studies that failed to identify direct signs of sars-cov- infection in the brains of covid- patients [ , ] , other lines of evidence support the neurotropism of the virus, including experimental platforms leveraging human induced pluripotent stem cell (ipsc)-derived dopaminergic neurons [ ] and an organotypic brain model [ ] . at this point, more data and systematic molecular evidence will be needed to assess the neuroinvasive potential of sars-cov- and its potential impact on neuroinflammation and neurodegenerative diseases. of all the independent studies retrieved with the term "ace " from the single cell portal (https://singlecell.broadinstitute.org/single_cell), sc/snrna-seq datasets were derived from ( ) adult (young or old) mouse brains, and had ( ) author-defined cell type annotations including endothelial cells (ec, pc, or vsmc). a total of , cells were analyzed in this study, based on the organ (i.e., brain) and species of origin (i.e., mouse), and disease status (i.e., normal). -d tsne/umap plots (colored by cell type) and box plots for cell type-specific ace expression presented in this study were generated by the single cell portal. t-sne visualization (colored by age) and cell type-specific de gene lists of the aging mouse brains (t ) were obtained from the advanced interactive data viewer (http://shiny.baderlab.org/agingmousebrain/agingmousebrain_scv/). genes with |logger (gene expression ratio)|> . and fdr< . were defined as degs, and were analyzed for functional pathway enrichment using geneanalytics (https://geneanalytics.genecards.org/).
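the deg definition above combines an effect-size cutoff with an fdr cutoff. the python sketch below shows one way such a filter could look, using a plain benjamini-hochberg adjustment. it is an illustration under stated assumptions and not the authors' code: the cutoff values are elided in the text, so they are left as explicit parameters, and the gene names, log ratios and p-values in the example are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def select_degs(genes, log_ratios, pvalues, ratio_cutoff, fdr_cutoff):
    """Keep genes with |log ratio| > ratio_cutoff and adjusted p < fdr_cutoff.

    Both cutoffs are caller-supplied; the source text does not preserve them.
    """
    fdr = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, ratio, q in zip(genes, log_ratios, fdr)
        if abs(ratio) > ratio_cutoff and q < fdr_cutoff
    ]


if __name__ == "__main__":
    # Hypothetical genes and statistics, for illustration only.
    genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
    log_ratios = [2.0, 0.1, -1.5, 3.0]
    pvalues = [0.01, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(pvalues))
    print(select_degs(genes, log_ratios, pvalues, ratio_cutoff=1.0, fdr_cutoff=0.1))
```

taking the absolute value of the log ratio keeps both up- and down-regulated genes, which matches the two-sided form of the stated criterion.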
de-identified processed human brain bulk rna-seq data and annotation files for sample attributes and subject phenotypes were obtained from the genotype-tissue expression (gtex) portal (https://www.gtexportal.org/home/datasets). data from amygdala (n = ), anterior cingulate cortex (n = ), caudate (basal ganglia) (n = ), cerebellar hemisphere (n = ), cerebellum (n = ), hippocampus (n = ), frontal cortex (ba ) (n = ), cortex (n = ), hypothalamus (n = ), nucleus accumbens (basal ganglia) (n = ), putamen (basal ganglia) (n = ), substantia nigra (n = ) were analyzed in this study. a total of , samples were divided into two age groups: young (< ) and old (≥ years old). expression levels (in tpm) were compared between the two age groups for all genes (n = , ) by t-test with multiple testing correction. the performance of bonferroni correction and false discovery rate (fdr)-benjamini-hochberg (bh) procedure was assessed using the rstudio (version . . ) base function p.adjust(). the r qvalue package [ ] from bioconductor was used to assess the performance of q-value approach. the r biomart package [ ] from bioconductor was used to convert mouse ec degs to human gene symbols (hgnc) using getlds() function. gene set enrichment analysis (gsea v . . ; https://www.gsea-msigdb.org/gsea/index.jsp) [ ] was used to assess ec degs (ec up-and down-regulated genes) in gtex human brain bulk rna-seq samples. following parameters were set to run enrichment tests: ( ) number of permutations = , , ( ) collapse/remap to gene symbols = no_collapse, and ( ) permutation type = gene_set. neurologic manifestations of hospitalized patients with coronavirus disease severe acute respiratory syndrome coronavirus infection causes neuronal death in the absence of encephalitis in mice transgenic for human ace middle east respiratory syndrome coronavirus causes multiple organ damage and lethal disease in mice transgenic for human dipeptidyl peptidase . 
the journal of infectious diseases the neuroinvasive potential of sars-cov may play a role in the respiratory failure of covid- patients highly pathogenic h n influenza virus can enter the central nervous system and induce neuroinflammation and neurodegeneration covid- and the chemical senses: supporting players take center stage incidence of thrombotic complications in critically ill icu patients with covid- clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study coagulopathy and antiphospholipid antibodies in patients with covid- endothelial cell infection and endotheliitis in covid- early postmortem brain mri findings in covid- non-survivors postmortem examination of patients with covid- neuropathological features of covid- the pathogenicity of sars-cov- in hace transgenic mice brain microvascular pericytes in health and disease coordinating center -analysis working, g.; statistical methods groups-analysis working, g.; enhancing, g.g.; fund, n.i.h.c.; nih/nci; nih/nhgri; nih/nimh; nih/nida, et al. 
genetic effects on gene expression across human tissues blood-brain barrier pericytes as a target for hiv- infection non-neuronal expression of sars-cov- entry genes in the olfactory system suggests mechanisms underlying covid- -associated anosmia sars-cov- receptor and entry genes are expressed by sustentacular cells in the human olfactory neuroepithelium integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell typespecific expression of mediators of sars-cov- viral entry and highlights inflammatory programs in putative target cells the spatial and cell-type distribution of sars-cov- receptor ace in human and mouse brain a human pluripotent stem cell-based platform to study sars-cov- tropism and model virus infection in human cells and organoids infectability of human brainsphere neurons suggests neurotropism of sars-cov- qvalue: q-value estimation for false discovery rate control. r package version . mapping identifiers for the integration of genomic datasets with the r/bioconductor package biomart gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles the authors declare no competing interest. key: cord- -w twrjzr authors: yin, rui; luo, zihan; kwoh, chee keong title: alignment-free machine learning approaches for the lethality prediction of potential novel human-adapted coronavirus using genomic nucleotide date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w twrjzr a newly emerging novel coronavirus appeared and rapidly spread worldwide and world health organization declared a pandemic on march , . the roles and characteristics of coronavirus have captured much attention due to its power of causing a wide variety of infectious diseases, from mild to severe on humans. the detection of the lethality of human coronavirus is key to estimate the viral toxicity and provide perspective for treatment. 
we developed alignment-free machine learning approaches for an ultra-fast and highly accurate prediction of the lethality of potential human-adapted coronavirus using genomic nucleotide. we performed extensive experiments through six different feature transformation and machine learning algorithms in combination with digital signal processing to infer the lethality of possible future novel coronaviruses using previous existing strains. the results tested on sars-cov, mers-cov and sars-cov- datasets show an average . % prediction accuracy. we also provide preliminary analysis validating the effectiveness of our models through other human coronaviruses. our study achieves high levels of prediction performance based on raw rna sequences alone without genome annotations and specialized biological knowledge. the results demonstrate that, for any novel human coronavirus strains, this alignment-free machine learning-based approach can offer a reliable real-time estimation for its viral lethality. have caused high morbidity and mortality, and unfortunately, the fast and untraceable virus mutations take the lives of people before the immune system can produce the inhibitory antibody [ ] . currently, no miracle drug or vaccines are available to treat or prevent the humans infected by coronaviruses [ ] [ ] . therefore, there is a desperate need for developing approaches to detect the lethality of coronaviruses not only for covid- but also the potential new variants and species. this would facilitate the diagnosis of coronavirus clinical severity and provide decision-making support. the detection of viral lethality has already been explored in influenza viruses [ ] . through a meta-analysis of predicting the virulence and antigenicity of influenza viruses, we can infer the lethality of the virus timely to improve the current influenza surveillance system [ ] . 
given the risk posed by newly emerging coronavirus strains, considerable attention has been devoted to investigating their lethality and clinical severity. typically, epidemiological models are built to estimate the lethality and the extent of undetected infections associated with new coronaviruses. bastolla suggested an orthogonal approach based on a minimum number of parameters, robustly fitted from the cumulative data easily accessible for all countries in the johns hopkins university database, to extrapolate the death rate [ ] . bello-chavolla et al. proposed a clinical score to evaluate the risk of complications and lethality attributable to covid- with respect to the effect of obesity and diabetes in mexico [ ] . the results provided a tool for quickly identifying susceptible patients in a first-contact scenario. wang et al. leveraged patient data in real time and devised a patient-information-based algorithm to estimate and predict the death rate caused by covid- in the near future [ ] . aiewsakun et al. performed a genome-wide association study on the genomes of covid- to identify genetic variations that might be associated with covid- severity [ ] . moreover, jiang et al. established an artificial intelligence framework for data-driven prediction of coronavirus clinical severity [ ] . the development of computational and physics-based approaches has reduced the experimental workload by utilizing epidemiological and biological data to construct the model. however, direct evaluation of potential novel coronavirus strains for their lethality is crucial when clinicians are forced to make difficult decisions without specific past experience to guide clinical acumen. inferring the lethality of a novel coronavirus is possible by identifying patterns in a large number of coronavirus sequences. 
in this paper, we propose alignment-free machine learning-based approaches to infer the lethality of potential novel human-adapted coronavirus using genomic sequences. the main contribution is that we formulate the problem of estimating the lethality of human-adapted coronavirus through machine learning approaches. by leveraging appropriate feature transformations, we can encode genomic nucleotides into numbers, which allows us to cast the problem as a prediction task. the experimental results suggest our models deliver accurate prediction of lethality without prior biological knowledge. we also performed phylogenetic analysis validating the effectiveness of our models through other human coronaviruses. problem formulation the pandemic of novel coronavirus covid- has caused thousands of fatalities, posing a tremendous threat to public health worldwide. society is deeply concerned about its spread and evolution, and about the emergence of any potential new variants that would increase the lethality. typically, lethality refers to the capability of causing death. it is usually estimated as the cumulative number of deaths divided by the total number of confirmed cases. among all the human-adapted coronaviruses, mers-cov caused the highest fatality rate of % [ ] , followed by sars-cov with a . % fatality rate [ ] . in comparison, covid- shows a lower mortality rate of . % [ ] . the lethality rate of covid- is likely to decrease with better treatment and precautions. in this paper, we mainly focus on these three types of human-adapted coronavirus and define the degree of viral lethality in terms of historical fatality rates. as a result, mers-cov strains are of high lethality, while sars-cov and covid- strains are of middle and low lethality, respectively. data collection and preprocessing genomic nucleotide sequences of three different coronaviruses with the human host are downloaded from national center for biotechnology information on april , [ ] . 
duplicate sequences and incomplete genomes with a length smaller than are removed from the collection to address possible issues arising from sequence length bias. some laboratory sars-cov strains cultivated in vero cell cultures are included to enrich the training samples. finally, we end up with , , samples for mers-cov, sars-cov and sars-cov- . in addition, we also collect the genomic data of four other human coronaviruses with , , and strains for hcov-hku , hcov-nl , hcov- e and hcov-oc , respectively. apart from the four symbolic bases (a, c, t, g) of each strain, we have degenerate base symbols that are an iupac representation [ ] for a position on genomic sequences, which could [ ] , mapping biological sequences into real-value vector space that the information or pattern characteristic of the sequence is kept in order. this is important as the existing machine learning approaches can only deal with vectors but not sequence samples. several methods have been proposed that convert genomic sequences into numerical vectors, e.g., a fixed mapping between nucleotides and real numbers without biological significance [ ] , mappings based on physicochemical properties [ ] , deduction from doublets or codons [ ] , and chaos game representation [ ] . to accommodate comprehensive analysis and comparison, we adopt different types of numerical representations for biological rna sequences. randhawa et al. [ ] showed that "real", "just-a" and "purine/pyrimidine (pp)" numerical representation yield better the real number representation is a fixed transformation technique in which the four bases take the values: adenine (a) = - . , thymine (t) = . , cytosine (c) = . , and guanine (g) = - . [ ] . it is efficient in finding a complementary strand of dna/rna sequence and preserves the complementary property. the "just-a" method maps the four bases into a binary representation in which the presence of adenine is labeled , while others are [ ] . 
pp representation is a dna-walk model of nucleotide sequences in which a step is taken upwards if the nucleotide is a pyrimidine with t/c = , or downward if it is a purine with a/g = - [ ] . eiip describes the distribution of the energy of free electrons along nucleotide sequences; a single eiip indicator sequence is formed by replacing each nucleotide with its value, where a= . , c= . , g= . , and t= . [ ] . the sequence-to-signal mapping for nearest-neighbor based doublet representation is illustrated in [ ] , where the last position is followed by the first in the sequence. lastly, cgr is a method proposed by jeffrey [ ] that has been successfully used for a visual representation of genome sequence patterns and taxonomic nucleotide of the sequence is plotted halfway between the center of the square and the vertex representing this nucleotide. the next base is mapped into the image at the coordinate halfway between the previous point and the vertex corresponding to that base. the mathematical formulation of the successive points that calculates the coordinates in the cgr of the sequences is described below: $X_i = \frac{1}{2}(X_{i-1} + C_{ix})$ and $Y_i = \frac{1}{2}(Y_{i-1} + C_{iy})$, where $(X_i, Y_i)$ is the point plotted for position $i$ and $C_{ix}$ and $C_{iy}$ denote the x and y coordinates of the vertices matching the nucleotide at position $i$ of the sequence, respectively. model construction machine learning has been utilized in many aspects of viral genomic analysis, e.g., antigenicity prediction of viruses [ ] , genome classification of novel pathogens [ ] , reassortment detection [ ] , receptor binding analysis [ ] and vaccine recommendation [ ] . with increasingly available genomic sequences, it will play more critical roles in helping biologists to analyze large, complex biological data for prediction and discovery. in this work, we provide a comprehensive analysis of the implemented in comparison with the predictive performance of machine learning models. 
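the cgr construction described above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' implementation: the assignment of nucleotides to the corners of the unit square (`VERTICES`), the function names `cgr_points` and `grid_counts`, and the choice to skip symbols other than a, c, g, t are all hypothetical. `grid_counts` bins the points into a square grid whose resolution `size` is left as a free parameter.

```python
# assumed corner assignment on the unit square; the text does not specify it
VERTICES = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the list of CGR coordinates for a nucleotide sequence.

    Each point lies halfway between the previous point and the vertex of
    the current nucleotide; the walk starts at the centre of the square.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in VERTICES:  # skip degenerate symbols (assumption)
            continue
        vx, vy = VERTICES[base]
        x, y = 0.5 * (x + vx), 0.5 * (y + vy)
        points.append((x, y))
    return points


def grid_counts(points, size):
    """Count how many CGR points fall in each cell of a size x size grid."""
    counts = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int(x * size), size - 1)  # clamp points lying on the edge
        row = min(int(y * size), size - 1)
        counts[row][col] += 1
    return counts


if __name__ == "__main__":
    pts = cgr_points("acgtn")
    print(pts)
    print(grid_counts(pts, 2))
```

flattening the grid returned by `grid_counts` row by row gives a fixed-length vector per sequence, which is the form a classifier expects regardless of genome length.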
traditional machine learning models consist of logistic regression (lr), random forest (rf), k-nearest neighbor (knn) and neural network (nn) [ ] , while three variants of convolutional neural network (cnn) and two types of recurrent neural network (rnn) are leveraged as deep learning models. the cnn models contain alexnet [ ] , vgg [ ] and resnet [ ] . following the choice of five one-dimensional numerical representations for viral sequences, digital signal processing is introduced through discrete fourier transform (dft) techniques. we assume that the number of input sequences is $n$ and that all the sequences have the same length $l$. for $1 \le i \le n$ and $0 \le k \le l-1$, the corresponding discrete numerical representation is formulated as $N_i = (f(S_i(0)), f(S_i(1)), \ldots, f(S_i(l-1)))$, where $f(S_i(k))$ denotes the numerical value after mapping by function $f(\cdot)$ at position $k$ of nucleotide sequence $S_i$. the signal $N_i$ computed after dft is represented as vector $F_i$. the formulation of $F_i$ is presented below: $F_i(k) = \sum_{j=0}^{l-1} f(S_i(j))\, e^{-2\pi \mathrm{i} jk / l}$, where $\mathrm{i}$ in the exponent is the imaginary unit. we define the magnitude vector that corresponds to the signal $N_i$ as $M_i$, with $M_i(k) = |F_i(k)|$. typically, the length of the numerical digital signal $N_i$ is equal to that of the magnitude spectrum $M_i$, which originates from the length of the genomic sequence. however, the input genome sequences are of different lengths, thus they need to be length-normalized after dft. median length-normalization is leveraged for the input digital signals using zero padding. we employ anti-symmetric padding that begins from the last position if the input sequences are shorter than the median length, these short signals are extended to the median length with zero-padding, while the longer sequences are truncated after the median length. as for the two-dimensional numerical representation, i.e., cgr, a point that corresponds to a sequence of length $l$ will be contained within a square with a side of length $2^{-l}$. we assume a square cgr image is generated with a size of k × k matrix, where k is the parameter that determines the size of the image. 
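the one-dimensional pipeline described above (numerical mapping, median length-normalization, dft magnitude spectrum) can be sketched as follows. this is an illustrative sketch only: the mapping values in `PP_MAP` are assumed for the example, all function names are hypothetical, and the signals are normalized before the transform, which is one possible reading of the description. a direct dft from the standard library is used for clarity; an fft routine would be preferable for full-length genomes.

```python
import cmath
from statistics import median

# illustrative purine/pyrimidine-style mapping; these values are assumed here
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def to_signal(sequence, mapping=PP_MAP):
    """Map a nucleotide sequence to numbers, skipping unmapped symbols."""
    return [mapping[b] for b in sequence.upper() if b in mapping]


def normalize_length(signal, target):
    """Zero-pad short signals and truncate long ones to `target` samples."""
    return signal[:target] + [0.0] * max(0, target - len(signal))


def magnitude_spectrum(signal):
    """Return |F(k)| for k = 0..l-1 using a direct O(l^2) DFT."""
    l = len(signal)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * j * k / l)
                for j, v in enumerate(signal)))
        for k in range(l)
    ]


def spectra(sequences):
    """Normalize every signal to the median length, then take magnitudes."""
    signals = [to_signal(s) for s in sequences]
    target = int(median(len(s) for s in signals))
    return [magnitude_spectrum(normalize_length(s, target)) for s in signals]


if __name__ == "__main__":
    for spectrum in spectra(["acgt", "ac", "acgtac"]):
        print([round(m, 3) for m in spectrum])
```

after normalization every sequence yields a magnitude vector of the same length, so the vectors can be stacked into a feature matrix for the classifiers listed above.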
the frequency of occurrence of any oligomer in a sequence can be obtained by partitioning the cgr space into small squares. therefore, the number of cgr points in each unit square of the k × k grid gives the number of occurrences of the corresponding k-mer in the sequence. by counting the frequency of cgr points, it is possible to calculate oligonucleotide frequencies at various grid resolutions. we define the element a j as the number of points that are located in the corresponding sub-square j, where ≤ j ≤ k . each sequence will be mapped into a k × k dimensional vector space based on cgr. implementation and evaluation we implement all the models with scikit-learn [ ] and pytorch [ ] . we utilize the learning models. for deep learning-based models, we apply stochastic gradient descent with a mini-batch size of for optimization. the drop-out (rate = . ) strategy is carried out with a . learning rate and all the models are fit for training epochs. the predictive performance is evaluated by accuracy, precision, sensitivity, and f score of all models in the prediction tasks of coronavirus lethality. [ ] . the evidence shows that cytosine discrimination and deamination against cpg dinucleotides are the driving force that has shaped the coronaviruses over evolutionary time [ ] . it is indicated that the atypical nucleotide bias could reflect distinct biological functions that are the direct cause of the characteristic codon usage in these viruses [ ] . therefore, the analysis of nucleotide and codon usage in coronaviruses can not only reveal clues about potential viral evolution but also improve the understanding of viral regulation and promote vaccine design. we test the ability of our models to identify the lethality of other different human guangzhou, china, but few death cases are reported [ ] . hcov- e is a close relative of hcov-nl and leads to similar symptoms [ ] . 
figure displays the cgr plots of different sequences of human coronavirus at the value of for k-mer frequency. the cgr plots visually indicate that the genomic signature of the sars-cov- isolate wuhan-hu- (fig. c) is closer to the genomic signature of the sars-cov coronavirus isolate canada (fig. a) , followed by the strain of mers-cov betacoronavirus england isolate (fig. b) . moreover, the other four human ace receptor has been identified as the potential receptor for covid- and serves as a potential target for treatment [ ] [ ] . nevertheless, with the circulation of bat-related coronavirus and geographic coverage, it is critical to monitor the evolution of coronavirus. currently, seven known types of coronavirus can infect humans. novel strains of these coronaviruses can likely arise and attack human again through reassortment and mutation when two different or more strains co-infect the same host. preparation is necessary to prevent potential epidemics and pandemics caused by a novel coronavirus. as a result, our work paves the basis for surveillance by inferring the lethality of any potential human coronaviruses that may emerge in the future. this study is subject to a variety of limitations. the definition of classifying the degree of coronavirus lethality is mainly based on the mortality rate. we assume that the higher the mortality, the more lethal for the virus, and thus make three categories of the lethality level for all viruses with a different threshold. however, our estimation for these values lies within the range of fatality rate from the literature, which we do not have sufficient data to parameterize the case-structured model, especially for viruses with few samples. we also do not build a benchmark for the death caused directly by human coronaviruses, as the criteria from institutions and countries could be different. 
besides, the limited number of data points for human coronaviruses tempers the significance of the high predictive accuracy, as most machine learning algorithms have a strong ability to discover inherent patterns in training samples, particularly in small datasets. like typical machine learning approaches, our models cannot provide a direct and accessible explanation of why a certain coronavirus strain is more lethal to humans. rule-based methods or clinical studies might provide a better rationale for such results. conclusion we provide a comprehensive analysis through alignment-free machine learning-based methods for the prediction of the lethality of potential human-adapted coronavirus. the results show that, on average, cgr, eiip, and just-a representations perform better than the others. interestingly, traditional machine learning methods show clear advantages over deep learning models in both computational efficiency and performance on this task. validation on other types of human coronavirus in combination with phylogenetic analysis further supports our predictive results. we hope this work will facilitate covid- research for biologists and clinicians on the front line. 
coronavirus infections and immune responses a case for the ancient origin of coronaviruses evolutionary insights into the ecology of coronaviruses origin and evolution of pathogenic coronaviruses coronavirus diversity, phylogeny and interspecies jumping history and recent advances in coronavirus discovery hosts and sources of endemic human coronaviruses global outbreak of severe acute respiratory syndrome (sars) outbreak of middle east respiratory syndrome coronavirus in saudi arabia: a retrospective study genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding the challenge of emerging and re-emerging infectious diseases a comparative analysis of factors influencing two outbreaks of middle eastern respiratory syndrome (mers) in saudi arabia and south korea an interactive web-based dashboard to track covid- in real time. the lancet infectious diseases covid- : what is next for public health? the lancet tempel: time-series mutation prediction of influenza a viruses via attention-based recurrent neural networks escaping pandoras boxanother novel coronavirus the sars, mers and novel coronavirus (covid- ) epidemics, the newest and biggest global health threats: what lessons have we learned? development and clinical application of a rapid igm-igg combined antibody test for sars-cov- infection diagnosis comparative genetic analysis of the novel coronavirus ( -ncov/sars-cov- ) receptor ace in different populations meta-analysis on the lethality of influenza a viruses using machine learning approaches predicting antigenic variants of h n influenza virus based on epidemics and pandemics using a stacking model how lethal is the novel coronavirus, and how many undetected cases there are? the importance of being tested. 
medrxiv predicting mortality due to sars-cov- : a mechanistic score relating obesity and diabetes to covid- outcomes in mexico real-time estimation and prediction of mortality caused by covid- with patient information based algorithm suradej hongeng, and arunee thitithanyanont. sars-cov- genetic variations associated with covid- severity. medrxiv towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity summary of probable sars cases with onset of illness from database resources of the national center for biotechnology information nomenclature for incompletely specified bases in nucleic acid sequences: recommendations computational identification of physicochemical signatures for host tropism of influenza a virus protein-protein interaction site prediction through combining local and global features with deep neural networks numerical representation of dna sequences identification of pathogenic viruses using genomic cepstral coefficients with radial basis function neural network genomic signal processing methods for computation of alignment-free distances from dna sequences analysis of genomic sequences by chaos game representation ml-dsp: machine learning with digital signal processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels autoregressive modeling and feature analysis of dna sequences evolution of long-range fractal correlations and /f noise in dna base sequences visualization and analysis of dna sequences using dna walks a coding measure scheme employing electron-ion interaction pseudopotential (eiip) chaos game representation of gene structure additive methods for genomic signatures machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: covid- case study hopper: an adaptive model for probability estimation of influenza reassortment through host prediction computational analysis of the receptor binding specificity of novel 
influenza a/h n viruses time series computational prediction of vaccines for influenza a h n with recurrent neural networks machine learning: an algorithmic perspective imagenet classification with deep convolutional neural networks very deep convolutional networks for large-scale image recognition deep residual learning for image recognition scikit-learn: machine learning in python. the automatic differentiation in pytorch mutational patterns correlate with genome organization in sars and other coronaviruses genome structure and transcriptional regulation of human coronavirus nl coronavirus genomics and bioinformatics analysis. viruses on the biased nucleotide composition of the human coronavirus rna genome an open-source k-mer based machine learning tool for fast and accurate subtyping of hiv- genomes projecting the transmission dynamics of sars-cov- through the postpandemic period epidemiology, genetic recombination, and pathogenesis of coronaviruses understanding human coronavirus hcov-nl . 
the open virology journal epidemiology and clinical characteristics of human coronaviruses oc , e, nl , and hku : a study of hospitalized children with acute respiratory tract infection in guangzhou, china human coronavirus nl and e seroconversion in children host and infectivity prediction of wuhan novel coronavirus using deep learning algorithm functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses bats are natural reservoirs of sars-like coronaviruses isolation and characterization of viruses related to the sars coronavirus from animals in southern china middle east respiratory syndrome coronavirus infection in dromedary camels in saudi arabia beware of asymptomatic transmission: study on -ncov prevention and control measures based on extended seir model preliminary estimation of the basic reproduction number of novel coronavirus ( -ncov) in china, from to : a data-driven analysis in the early phase of the outbreak developing covid- vaccines at pandemic speed single-cell rna expression profiling of ace , the putative receptor of wuhan -ncov key: cord- -sx q ol authors: davda, jayeshkumar narsibhai; frank, keith; prakash, sivakumar; purohit, gunjan; vijayashankar, devi prasad; vedagiri, dhiviya; tallapaka, karthik bharadwaj; harshan, krishnan harinivas; siva, archana bharadwaj; mishra, rakesh kumar; dhawan, jyotsna; siddiqi, imran title: an inexpensive rt-pcr endpoint diagnostic assay for sars-cov- using nested pcr: direct assessment of detection efficiency of rt-qpcr tests and suitability for surveillance date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sx q ol with a view to extending testing capabilities for the ongoing sars-cov- pandemic we have developed a test that lowers cost and does not require real time quantitative reverse transcription polymerase chain reaction (rt-qpcr). 
we developed a reverse transcription nested pcr endpoint assay (rt-npcr) and showed that rt-npcr has comparable performance to the standard rt-qpcr test. in the course of comparing the results of both tests, we found that the standard rt-qpcr test can have low detection efficiency (less than %) in a real testing scenario which may be only partly explained by low viral representation in many samples. this finding points to the importance of directly monitoring detection efficiency in test environments. we also suggest measures that would improve detection efficiency. the continuing covid- pandemic has created an urgent need for increased diagnostic tests worldwide. the requirement of tests has exceeded the normal testing capacities available in public and private hospitals and clinical research laboratories and also strained financial resources. the most widely deployed type of test for identifying individuals infected with sars-cov- , based on recommendations of the world health organization (who) and national health centres such as the us centers for disease control and prevention (cdc) detects the presence of viral rna. the method employs real time quantitative reverse transcription polymerase chain reaction (rt-qpcr) of rna extracted from nasopharyngeal (np) swab samples, to measure amplification of a short segment of a viral gene in the course of a pcr reaction following reverse transcription of viral rna. performing the rt-qpcr test requires a real time thermal cycler which is an expensive instrument. most major research laboratories are equipped with only a limited number and smaller laboratories may have none which has placed constraints on the number of tests as well as places for conducting them. further, the need for fluorescent oligonucleotide probes adds to the cost of the tests. 
a number of new diagnostic testing methods aimed at reducing the dependence on expensive equipment and kits that are in short supply have been proposed and are under development for deployment (broughton et al., ; rauch et al., ; yan et al., ) . however, given the enormous and widespread current need for diagnostic testing in diverse environments it is equally important to increase the utilization of existing research capabilities that have potential but are currently not being used for testing, through the use of tests that employ more widely available equipment and reagents using simple established methods. in order to extend the scope of diagnostic testing for sars-cov- we explored a reverse transcription nested pcr (rt-npcr) approach that does not depend on rt-qpcr but uses standard rt-pcr as part of an endpoint assay. we developed and tested a rt-npcr protocol comprising a multiplex primary rt-pcr for amplification of four sars-cov- amplicons and a control human rpp amplicon followed by a secondary nested pcr for individual amplicons and visualization by agarose gel electrophoresis. we also examined the use of rt-npcr in pooled testing and in direct amplification without rna isolation. rna isolated from np swab samples that had been previously tested using one of two rt-qpcr tests was examined using rt-npcr and the results compared. we found that taking both standard rt-qpcr tests together, the rt-npcr test was able to correctly identify % of samples detected as positive by rt-qpcr and also detected % samples as positive among samples that were negative by the standard rt-qpcr test (likely false negatives). based on the experimentally measured false negative rate by rt-npcr tests from this study we estimated that as many as % of positive samples may escape detection in single pass testing by rt-qpcr in an actual testing scenario. 
we designed sets of nested oligonucleotide primers for specifically amplifying different regions of sars-cov- and not sars-cov based on available sequence information (genbank id nc_ . for sars-cov and mt . for sars-cov- ). these primers were tested for amplification by a primary pcr, followed by a second round of nested pcr. the starting template was cdna prepared from a pool of rna isolated from two np swab samples that had previously been identified as positive using a rt-qpcr diagnostic kit. from a set of candidate amplicons we selected amplicons that gave visible amplification in the primary pcr and strong bands in the secondary pcr reaction as visualized by agarose gel electrophoresis. the primer sequences for these amplicons were checked against sars-cov- sequences from indian isolates. except for a single mismatch with two sequences in the middle for two primers, all primers were % identical to all sequences. these amplicons comprised portions of orf ab, m, and n genes ( figure a ; table ). the amplified bands were excised from the agarose gel, the dna was extracted, and identity of each band was confirmed by sequencing. similarly a set of nested primers was developed for the human rpp gene as a control. to test efficiency of amplification we used a dilution series of amplicons as templates in primary and secondary pcr. - molecules of dna in dilutions of isolated viral amplicons could be detected by nested pcr ( figure b ). to detect the presence of sars-cov- in rna isolated from np swabs we performed a multiplex one-step rt-pcr on rna from positive and negative samples using pooled primers for the four viral amplicons together with human rpp control. the product of the primary one-step rt-pcr was used as template in separate nested secondary pcr reactions for each of the amplicons followed by detection using agarose gel electrophoresis. 
using the nested rt-pcr assay we were able to detect amplification of the four viral amplicons in rna from positive samples, and no amplification was detected in negative samples ( figure ). pooled testing has been used for sars-cov- detection (yelin et al., ) . in order to assess the suitability of the rt-npcr test for analysis of pooled samples, we performed the test on three sets of pooled samples comprising rna isolated from different dilutions of a positive sample in a pool of ten negative samples. for a sample having a cycle threshold (ct) value of . for the e gene, the rt-npcr test gave robust detection in rna from the undiluted sample (positive for out of viral amplicons) and less robust detection at a dilution of : (positive for out of viral amplicons; figure ). for two samples having ct values of . and . , the rt-npcr test successfully detected their presence at a dilution of : . hence the rt-npcr test was able to detect the presence of all three samples in a pool of : . direct testing of samples by rt-qpcr without rna isolation has been described in recent reports (bruce et al., ; merindol et al., ; smyrlaki et al., ) . we performed direct testing of heat-inactivated positive samples by the rt-npcr test. using µl of swab sample directly, which corresponds to / of the amount used in testing of isolated rna, the rt-npcr test detected a positive sample having an e gene ct value of . but not a sample having a higher ct of . (figure ). using a higher volume of swab sample gave weaker amplification, suggesting the presence of an inhibitor in the viral transport medium (vtm). we observed improved amplification when polyvinylpyrrolidone was included in the pcr reaction. passage of the sample through a sephadex-g spin column also relieved the inhibition and gave improved amplification. overall, the rt-npcr test is capable of detecting positives directly in swab samples but at a reduced sensitivity compared to isolated rna. 
in order to compare the performance of rt-npcr with that of standard rt-qpcr, we tested rna samples that had tested positive by rt-qpcr. samples that were positive for at least two amplicons out of four by rt-npcr were called positive, samples that were positive for one amplicon were considered ambiguous and not included in the comparison, and samples that were negative for all four amplicons were called negative. we compared rt-npcr test results to sets of positive and negative rna samples that had been tested by either of two rt-qpcr table ). the niv set covered samples and had a prevalence rate (% positives) of %. the detection efficiency for this set was calculated to be . . the labgun set comprised samples and had a prevalence rate of . %. the detection efficiency for the second set was estimated to be . . therefore, our estimate suggests that a high proportion of positive samples (up to % or more) can be missed by the standard rt-qpcr test in a real testing scenario. the rt-npcr test described above does not require a real time thermal cycler and can be performed in a laboratory that has basic molecular biology equipment, a thermal cycler, and a bsl room with a class ii laminar flow hood. it uses well-established methodologies and reagents that are widely available and can serve as a basis for broadening the scope of testing for sars-cov- . the performance of the rt-npcr test was comparable to rt-qpcr, and the cost of consumables for the test based on list prices is around us $ per test, about half of which is the cost of rna isolation. the test uses two rounds of pcr amplification, which does increase the potential for contamination. however, we found that by following a set of practices described in a detailed protocol (supplementary file ), contamination could be avoided. 
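the amplicon-based calling rule used in the comparison above (positive for at least two of four amplicons, ambiguous for exactly one, negative for none) can be written down as a short python sketch. the function names `call_sample` and `summarize` and the input layout are hypothetical and chosen only for illustration; the sketch tallies calls against the earlier rt-qpcr result and leaves out ambiguous samples, as in the text, but it does not reproduce the detection efficiency calculation itself.

```python
def call_sample(amplicon_results):
    """Classify one sample from per-amplicon results (True = band detected)."""
    positives = sum(bool(r) for r in amplicon_results)
    if positives >= 2:
        return "positive"
    if positives == 1:
        return "ambiguous"
    return "negative"


def summarize(samples):
    """Tally rt-npcr calls against the earlier rt-qpcr result.

    `samples` is an iterable of (rtqpcr_positive, amplicon_results) pairs.
    Ambiguous rt-npcr calls are excluded from the comparison.
    """
    counts = {
        "rtqpcr_positive": {"positive": 0, "negative": 0},
        "rtqpcr_negative": {"positive": 0, "negative": 0},
    }
    for rtqpcr_positive, amplicons in samples:
        call = call_sample(amplicons)
        if call == "ambiguous":
            continue
        key = "rtqpcr_positive" if rtqpcr_positive else "rtqpcr_negative"
        counts[key][call] += 1
    return counts


if __name__ == "__main__":
    # made-up example data, for illustration only
    demo = [
        (True, [True, True, True, True]),
        (False, [True, True, False, False]),
        (False, [False, False, False, False]),
        (True, [True, False, False, False]),
    ]
    print(summarize(demo))
```

samples counted under `rtqpcr_negative` with an rt-npcr call of `positive` are the candidates for rt-qpcr false negatives discussed in the text.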
in the course of comparing the rt-npcr and standard rt-qpcr tests we estimated from the rate of experimentally determined false negatives, that a high proportion (about %) of positive samples were being missed by the rt-qpcr test applied in a single pass testing protocol. this is a high escape rate which poses a concern in individual diagnosis and merits monitoring and greater comparison of results across testing scenarios. we also suggest that detection can be improved by repeat testing of isolated rna and by increasing the number of amplicons tested. the existence of false negatives in the rt-qpcr test has been inferred in a number of studies that compared clinical test and symptomatic data with rt-qpcr testing information (kucirka et al., ; li et al., ; xiao et al., ) . several studies have compared the performance of different rt-qpcr kits on rna from clinical samples primarily to assess the performance of the kits, and have found agreement as well as differences that also point to the existence of false negative test results (hogan et al., ; pujadas et al., ; van kasteren et al., ; xiong et al., ) . however, direct experimental analysis of rt-qpcr negative rna samples at testing centres using different but related rt-pcr based tests as a way of estimating detection efficiency has received limited attention likely because of the high demand for diagnostic tests, shortage of testing kits, and cost of testing. the low detection efficiency that we have estimated may not be entirely explained by a low representation of the virus in many of the samples as the mean number of positive amplicons by rt-npcr for the false negatives ( niv + labgun) was . ( figure ). possible alternative explanations are variability in test performance or in representation of different portions of the viral rna. 
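one plausible way to turn a measured false negative rate into a detection efficiency is sketched below. it is an assumption about the arithmetic, not necessarily the calculation used for the values reported here: positives detected by rt-qpcr are divided by all positives, where the missed positives are estimated from the fraction of rt-qpcr negatives that test positive on retesting. all numbers in the example are hypothetical.

```python
def detection_efficiency(n_tested, prevalence, false_negative_rate):
    """Estimate the fraction of true positives that RT-qPCR detected.

    n_tested: number of samples screened by RT-qPCR.
    prevalence: fraction of those samples called positive by RT-qPCR.
    false_negative_rate: fraction of RT-qPCR negatives found positive on retesting.
    """
    if not (0 <= prevalence <= 1 and 0 <= false_negative_rate <= 1):
        raise ValueError("prevalence and false_negative_rate must be fractions")
    detected = n_tested * prevalence
    missed = n_tested * (1.0 - prevalence) * false_negative_rate
    if detected + missed == 0:
        raise ValueError("no positives detected or missed")
    return detected / (detected + missed)

# Hypothetical numbers, not taken from the study.
eff = detection_efficiency(n_tested=1000, prevalence=0.10, false_negative_rate=0.05)
print(round(eff, 2))  # 0.69: 100 detected out of an estimated 145 positives
```

the example illustrates why a small false negative rate can still imply a large escape rate when prevalence is low, because the pool of rt-qpcr negatives is much larger than the pool of positives.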
low detection efficiency could be a possible concern in other tests including those under development that are based on reverse transcription as a first step followed by dna amplification and detection (broughton et al., ; rauch et al., ; yan et al., ) . many rt-qpcr tests use rna corresponding to about % of swab sample. increasing the amount of sample for rna isolation and concentration of the sample are possible options for improving detection; however, these would also increase the number of operations and expense, and it would need to be assessed whether this would be compatible with medium to high throughput protocols. protocols based on initial detection of a single amplicon followed by a confirmatory test for a second amplicon may also contribute to false negatives and lower detection. in conclusion, the rt-npcr endpoint assay for sars-cov- described above uses widely available reagents, lowers costs, and obviates the need for a real time thermal cycler. the assay can therefore be performed in a large number of clinical and diagnostic laboratories lacking this expensive piece of equipment essential for rt-qpcr testing. the performance of rt-npcr is comparable to rt-qpcr and analysis of rt-qpcr tested samples by rt-npcr shows a high escape rate in detection of positives, highlighting a need for directly monitoring detection efficiency through assessment of false negatives in real testing environments. as more regions of the world move into community transmission phase, there is also a greater need for surveillance testing. pooled testing by rt-npcr can contribute to meeting this requirement. 
np swab samples were collected from patients suspected of being infected with sars-cov- and their contacts at different hospitals in the hyderabad vicinity based on indian council of medical research (icmr) guidelines (http://www.nie.gov.in/images/leftcontent_attach/covid-sari_sample_collection_sop_ .pdf) and in accordance with institutional ethics committee guidelines. samples were coded and anonymized before processing and data collection. µl of np swab sample in viral transport medium (vtm) was used for rna isolation by the qiaamp® viral rna (cat# ) or equivalent kit as per manufacturer's instructions. np sample was lysed in bsl facility. rna isolation steps were carried out in bsl facility. rna was eluted in µl of water. for pooled sample rna isolation, positive sample was pooled with negative samples in : , : and : ratios comprising µl of np sample and isolated as above. all safety precautions were followed as per icmr-niv guidelines. sars-cov- full length genomic sequence of an isolate from kerala -india was downloaded from ncbi (genbank id mt . ). primers were designed for regions specific for sars-cov- , but not sars-cov (genbank id nc_ . ). for the human rpp gene (genbank id u . ) primers were designed on exon-exon junctions to avoid genomic dna amplification. oligonucleotides were obtained from bioserve, hyderabad. pooled rna from two previously identified positive np swab samples was used for first strand cdna synthesis (takara primescript kit cat # a). µl of the cdna reaction was diluted ten-fold and used for primary pcr. primary pcr was performed using emeraldamp® gt pcr master mix (takara cat# rr a) µl, forward and reverse primers ( µm) µl each, diluted cdna µl, and water µl. thermal cycling conditions were ) °c - minutes ) °c - seconds ) °c - seconds ) °c - seconds ) repeat steps - for cycles ) °c - seconds ) °c - seconds ) °c - seconds ) repeat steps - for cycles ) °c - minutes ) °c -hold. 
the primary pcr product was diluted fifty-fold and µl was used for secondary pcr using a nested primer pair. thermal cycling conditions were the same as for primary pcr. pcr products were separated by electrophoresis on a . % agarose gel and visualized on a uv gel documentation system. for amplicon dilution experiments, bands were excised from the gel, dna was extracted (macherey-nagel nucleospin kit, cat # . ) and subcloned into pgem-t vector (promega). the insert was pcr amplified from a plasmid dna clone, followed by gel purification and extraction. dna was quantified by a qubit dsdna hs kit (thermofisher cat # q ). primary rt-pcr and secondary pcr were performed using the primescript iii -step rt-pcr kit and emeraldamp gt pcr mix (takara; detailed protocol provided in supplementary file ). for direct testing of sample without rna isolation, np sample was heat inactivated at °c for minutes in bsl . µl of % pvp (sigma cat# pvp - g) was included in primary pcr and other conditions were kept the same. for the sephadex g (sigma cat# ge - - ) experiment, a spin column was prepared as described in the sambrook and maniatis manual and other conditions were kept the same. the spin column was used outside the pcr area to avoid contamination from aerosols. images were edited using adobe photoshop; editing included removal of intervening lanes in the gel between samples and dna marker, indicated by a vertical line. niv method: rt-qpcr for e gene and rpp gene was done with µl of rna template by following the first line screening assay according to the national institute of virology (icmr-niv) https://www.icmr.gov.in/pdf/covid/labs/ _sop_for_first_line_screening_assay_for_ _ ncov.pdf. based on the e gene result, rdrp and orf b were tested with µl of rna sample by following the confirmatory assay given by icmr-niv https://main.icmr.nic.in/sites/default/files/upload_documents/ _sop_for_confirmatory_assay_for_ _ncov.pdf. 
labgun method: rt-qpcr for e gene and rdrp was performed along with internal control as per manufacturer's instructions (labgenomics -labgun tm covid- rt-pcr kit cat# cv b). µl of rna template was used for the reaction. the work described in this study was carried out in accordance with institutional ethics committee guidelines. 
crispr-cas -based detection of sars-cov- 
direct rt-qpcr detection of sars-cov- rna from patient nasopharyngeal swabs without an rna extraction step 
comparison of the accula sars-cov- test with a laboratory-developed assay for detection of sars-cov- rna in clinical nasopharyngeal specimens 
variation in false-negative rate of reverse transcriptase polymerase chain reaction-based sars-cov- tests by time since exposure 
stability issues of rt-pcr testing of sars-cov- for hospitalized patients clinically diagnosed with covid- 
interrater reliability: the kappa statistic 
sars-cov- detection by direct rrt-pcr without rna extraction 
comparison of sars-cov- detection from nasopharyngeal swab samples by the roche cobas® sars-cov- test and a laboratory-developed real-time rt-pcr test 
a scalable, easy-to-deploy, protocol for cas -based detection of sars-cov- genetic material 
massive and rapid covid- testing is feasible by extraction-free sars-cov- rt-qpcr 
comparison of seven commercial rt-pcr diagnostic kits for covid- 
false-negative of rt-pcr and prolonged nucleic acid conversion in covid- : rather than recurrence 
comparative performance of four nucleic acid amplification tests for sars-cov- virus 
rapid and visual detection of novel coronavirus (sars-cov- ) by a reverse transcription loop-mediated isothermal amplification assay 
clinical infectious diseases: an official publication of the infectious diseases society of america 
the telangana state government and directorate of medical education are acknowledged for providing the swab samples used in this study. we thank the covid- team of volunteers at ccmb for processing the samples. 
the full list of volunteers is provided as supplementary file . this work was supported by the council of scientific and industrial research (csir, india). the authors declare no competing interests key: cord- -nkql h x authors: muus, christoph; luecken, malte d.; eraslan, gokcen; waghray, avinash; heimberg, graham; sikkema, lisa; kobayashi, yoshihiko; vaishnav, eeshit dhaval; subramanian, ayshwarya; smilie, christopher; jagadeesh, karthik; duong, elizabeth thu; fiskin, evgenij; triglia, elena torlai; ansari, meshal; cai, peiwen; lin, brian; buchanan, justin; chen, sijia; shu, jian; haber, adam l; chung, hattie; montoro, daniel t; adams, taylor; aliee, hananeh; samuel, j.; andrusivova, allon zaneta; angelidis, ilias; ashenberg, orr; bassler, kevin; bécavin, christophe; benhar, inbal; bergenstråhle, joseph; bergenstråhle, ludvig; bolt, liam; braun, emelie; bui, linh t; chaffin, mark; chichelnitskiy, evgeny; chiou, joshua; conlon, thomas m; cuoco, michael s; deprez, marie; fischer, david s; gillich, astrid; gould, joshua; guo, minzhe; gutierrez, austin j; habermann, arun c; harvey, tyler; he, peng; hou, xiaomeng; hu, lijuan; jaiswal, alok; jiang, peiyong; kapellos, theodoros; kuo, christin s; larsson, ludvig; kyungtae lim, michael a. leney-greene; litviňuková, monika; lu, ji; maatz, henrike; madissoon, elo; mamanova, lira; manakongtreecheep, kasidet; marquette, charles-hugo; mbano, ian; mcadams, alexi marie; metzger, ross j; nabhan, ahmad n; nyquist, sarah k.; ordovas-montanes, jose; penland, lolita; poirion, olivier b; poli, sergio; qi, cancan; reichart, daniel; rosas, ivan; schupp, jonas; sinha, rahul; sit, rene v; slowikowski, kamil; slyper, michal; smith, neal; sountoulidis, alex; strunz, maximilian; sun, dawei; talavera-lópez, carlos; tan, peng; tantivit, jessica; travaglini, kyle j; tucker, nathan r.; vernon, katherine; wadsworth, marc h.; waldmann, julia; wang, xiuting; yan, wenjun; zhao, william; ziegler, carly g. k. 
title: integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of sars-cov- viral entry and highlights inflammatory programs in putative target cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nkql h x the covid- pandemic, caused by the novel coronavirus sars-cov- , creates an urgent need for identifying molecular mechanisms that mediate viral entry, propagation, and tissue pathology. cell membrane bound angiotensin-converting enzyme (ace ) and associated proteases, transmembrane protease serine (tmprss ) and cathepsin l (ctsl), were previously identified as mediators of sars-cov cellular entry. here, we assess the cell type-specific rna expression of ace , tmprss , and ctsl through an integrated analysis of single-cell and single-nucleus rna-seq studies, including lung and airways datasets ( unpublished), and datasets from other diverse organs. joint expression of ace and the accessory proteases identifies specific subsets of respiratory epithelial cells as putative targets of viral infection in the nasal passages, airways, and alveoli. cells that co-express ace and proteases are also identified in cells from other organs, some of which have been associated with covid- transmission or pathology, including gut enterocytes, corneal epithelial cells, cardiomyocytes, heart pericytes, olfactory sustentacular cells, and renal epithelial cells. performing the first meta-analyses of scrna-seq studies, we analyzed , , cells from nasal, airway, and lung parenchyma samples from donors spanning fetal, childhood, adult, and elderly age groups, associate increased levels of ace , tmprss , and ctsl in specific cell types with increasing age, male gender, and smoking, all of which are epidemiologically linked to covid- susceptibility and outcomes. notably, there was a particularly low expression of ace in the few young pediatric samples in the analysis. 
further analysis reveals a gene expression program shared by ace +tmprss + cells in nasal, lung and gut tissues, including genes that may mediate viral entry, subtend key immune functions, and mediate epithelial-macrophage cross-talk. amongst these are il , its receptor and co-receptor, il r, tnf response pathways, and complement genes. cell type specificity in the lung and airways and smoking effects were conserved in mice. our analyses suggest that differences in the cell type-specific expression of mediators of sars-cov- viral entry may be responsible for aspects of covid- epidemiology and clinical course, and point to putative molecular pathways involved in disease susceptibility and pathogenesis. covid- is a global health threat due to its rapid spread, morbidity, and mortality. despite progress in viral identification, sequencing of the full viral genome, creation of initial diagnostics, and the development of therapeutic hypotheses, many outstanding hurdles remain. these include deciphering the basis of the increased risk associated with certain demographic groups and identifying molecular mechanisms of disease pathogenesis. the clinical presentation and transmission of covid- is complex. common symptoms include fever, cough, shortness of breath, chest pain, malaise, fatigue, headache, myalgias, anosmia, and diarrhea, while laboratory and radiographic findings include lymphopenia and ground-glass opacities on chest imaging, respectively [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . of an initial cohort of , hospitalized patients diagnosed with covid- , many developed diffuse alveolar damage (dad) , pneumonia ( . %), not infrequently complicated by acute respiratory distress syndrome (ards, . %), and shock ( . %), with . % of patients requiring icu admission and . % requiring ventilation . 
as the number of patients has surged, multi-system pathologies have been increasingly described, including kidney injury , liver injury, gastrointestinal symptoms , cardiac injury and dysfunction , , , and multiorgan failure [ ] [ ] [ ] [ ] . in addition to nasal and throat secretions, sars-cov- rna has also been detected in saliva and stool specimens , , suggesting possible alternative routes of transmission beyond respiratory droplets , . sars-cov- may also infect the testis, similarly to sars-cov , . vertical transmission from mother to fetus remains a possibility. at least five neonates born to pregnant women with covid- pneumonia were reported to test positive for sars-cov- infection after birth [ ] [ ] [ ] and other studies report newborns with elevated virus-specific antibodies to sars-cov- born to mothers with covid- , . however, several other studies have thus far failed to find evidence for intrauterine transmission from pregnant women with covid- to their newborns in cohorts as large as patients [ ] [ ] [ ] . additionally, newborns from covid- patients who had cesarean deliveries in their third trimester tested negative for sars-cov- . there is substantial variation in the clinical consequences of infection across individuals, ranging from asymptomatic carrier status to death. it has been suggested that undocumented subclinical infection contributes to the rapid dissemination of the virus . as of april , , covid- has caused , , confirmed infections and , deaths worldwide (https://coronavirus.jhu.edu/map.html). while true case fatality rate (cfr) is difficult to assess early in an epidemic [ ] [ ] [ ] , estimates from modeling studies range from . % - . % , . disease severity and mortality rates show a striking rise with age , with cfr estimates ranging from < . % for patients under years old to > % for those over , with a slightly higher incidence and mortality in men , . 
children are significantly less likely than adults to develop severe disease, and reported pediatric deaths are rare . smoking is most likely associated with more severe disease , . finally, adults with pre-existing cardiovascular disease and acute myocardial injury have higher rates of disease acuity and death , . the coronavirus non-segmented, positive sense rna genome of ~ kb contains coding regions for the expression of structural proteins including spike (s), envelope (e), membrane (m), and nucleocapsid (n) proteins. virion recognition of host cells is initiated by interactions between the s protein and its receptor . ace [ ] [ ] [ ] , an essential regulator of the renin-angiotensin system , is the receptor for both sars-cov and sars-cov- . the receptor-binding domain of the sars-cov- s-protein has a higher binding affinity for human ace than sars-cov , , whereas the interaction with cd (encoded by the gene bsg), another reported receptor for the sars-cov- s-protein, is weak (cd : kd, . μm vs hace : kd, ~ nm) . following receptor binding, the virus gains access to the host cell cytosol through acid-dependent proteolytic cleavage of the s protein. for sars-cov, a number of proteases including tmprss and ctsl cleave at the s and s boundary and s domain (s ') to mediate membrane fusion and virus infectivity. for sars-cov- , both pharmacological inhibition of endogenous tmprss protein and tmprss overexpression support a role for tmprss -mediated cellular entry , . the identification of the specific cell types that can be infected by sars-cov- will inform our understanding of disease transmission and pathogenesis, which are often cell context-specific. studies suggest that key infection routes involve the nasal passages, airways, and alveoli, where epithelial cells play a key barrier role. identifying putative target cells in other organs could inform our understanding of extra-pulmonary covid- associated organ failure or of potential placental transmission. 
early analyses of the human lung cell atlas revealed that some of the cells of the nasal passages, airways, alveoli, and gut co-express ace and tmprss , . here, we perform integrated analysis of single-cell and single-nucleus rna-seq studies, including studies of the lung and airways, and additional studies of other diverse tissues, spanning both published and unpublished datasets. we comprehensively define the expression patterns of the ace viral receptor and accessory proteases genes. we test how their expression is related to age (from prenatal to old age), sex, and smoking status. we identify gene expression programs associated with cells that can be infected by the virus and compare these programs across specific cell types, organs, and species. to further inform future studies, we assess the conservation of these human features in mouse models and explore the expression of other proteases that may play a role in the viral replication cycle. previous analyses of human cell atlas datasets established that ace , the viral receptor, and one of its entry-associated proteases, tmprss , are expressed in nasal, lung, and gut epithelial cells . specifically, nasal goblet cells and multiciliated cells comprised the highest fraction of dual-positive ace + tmprss + cells , consistent with a plausible role for a nasal viral reservoir that supports transmissivity. in the distal lung, co-expression occurred in at cells , , . previous surveys across other tissues also showed a relatively high portion of ace + tmprss + cells within colonic enterocytes, another potential viral reservoir that promotes viral transmission . to perform a comprehensive survey, we enumerated the proportion of dual-positive ace + tmprss + cells and ace + ctsl + cells across human studies (including seven of the lung and airways) with single-cell or single-nucleus rna-seq (sc/snrna-seq) ( fig. , methods, supplementary table and ). 
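the enumeration of dual-positive cells can be illustrated with a small sketch. it assumes that a cell counts as positive for a gene when it has at least one count for that gene; this threshold, the function name, and the toy counts are assumptions made for illustration, and the datasets analyzed here may have been processed differently.

```python
def dual_positive_fraction(receptor_counts, protease_counts):
    """Fraction of cells with at least one count for both genes."""
    if len(receptor_counts) != len(protease_counts):
        raise ValueError("count vectors must have one entry per cell")
    if not receptor_counts:
        raise ValueError("no cells supplied")
    dual = sum(
        1 for r, p in zip(receptor_counts, protease_counts) if r > 0 and p > 0
    )
    return dual / len(receptor_counts)

# Toy UMI counts for five cells (hypothetical values).
receptor = [0, 2, 1, 0, 0]
protease = [3, 1, 0, 0, 4]
print(dual_positive_fraction(receptor, protease))  # 0.2
```

because both genes are lowly expressed, such fractions are sensitive to sequencing depth, so comparisons across datasets with very different total counts per cell should be read with care.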
these included a large survey of published datasets from diverse tissues, which we assigned to five broad cell categories, (fig. a,b , extended data fig. , , supplementary table ) . we further analyzed more finely annotated published and unpublished datasets (methods, fig. c ,d, supplementary table ) . consistent with previous reports , dual-positive ace + tmprss + cells in the proximal airways were largely secretory goblet and multiciliated cells, and dual-positive cells in the distal lung were largely at cells (fig. c, extended data fig. a) . ace expression in secretory (especially goblet) and at cells is also supported by scatac-seq from the primary carina and subpleural parenchyma, respectively (fig. , n= samples per location, n= patient), showing accessibility at the ace locus in a portion of at cells ( . %, out of , cells, methods), as well as secretory and multiciliated cells, and to a lesser extent some basal and tuft cells (fig. a-c) . the proportion of at cells with an open ace locus is somewhat higher than of ace + at cells by scrna-seq in the same patient and region (pleura: . %, out of , cells vs. . %, out of , cells). cells with accessible chromatin at both the ace and tmprss loci were also most commonly found in epithelial cells, especially at cells (fig. d, . %, of , at cells from a subpleural sample; comparable to . %, of , in matched scrna-seq), secretory cells ( . %, of secretory cells from small airway of the subpleural region, and %, of secretory cells from primary carina of the large airway), and multiciliated cells ( . %, of multiciliated cells from subpleural sample, and . %, out of multiciliated cells from primary carina). there were dual-positive ace + tmprss + cells in tissues beyond the respiratory system ( fig. 
a-c) , including enterocytes, pancreatic ductal cells, prostate luminal epithelial cells, cholangiocytes , oligodendrocytes in the brain, inhibitory enteric neurons, heart fibroblasts/pericytes , and fibroblasts and pericytes in multiple other tissues (fig. c) . ace + tmprss + epithelial cells were most prevalent (in order) within the ileum, liver, lung, nasal mucosa, bladder, testis, prostate, and kidney (fig. a) . enterocytes had a substantial proportion of dual-positive cells (fig. c) , and are possibly part of a renin-angiotensin multicellular circuit . in line with the kidney's role in the renin-angiotensin-aldosterone system, dual-positive cells are enriched in the proximal tubular cells and in principal cells of the collecting duct (fig. a,c) . interestingly, brain oligodendrocytes, multiciliated and sustentacular cells in olfactory epithelium, at cells in non-smoker lung, and ductal cells in pancreas -all ace + tmprss + -were also all enriched for myrf, a transcription factor necessary for myelination in the brain and sufficient to induce expression of the myelin proteins mog (myelin oligodendrocyte glycoprotein) and mbp (myelin basic protein) . ace + ctsl + cells were enriched in additional subsets, associated with covid- pathology, most notably the olfactory epithelium, ventricular cardiomyocytes, heart macrophages, and pericytes in multiple tissues, including the heart, lung, and kidney (fig. d) . the presence of dual-positive cells in the lung, heart, and kidney may reflect that cells in these organs may be direct targets of viral infection and pathology , . dual-positive cells in the sustentacular and basal cells of the olfactory epithelium ( fig. c ) may be associated with a loss of the sense of smell . dual-positive cells in the corneal and conjunctival epithelium, may contribute to viral transmission , . dual positive cardiomyocytes may be related to "direct" cardiomyocyte damage (see tucker et al. 
companion manuscript ), whereas heart pericytes may indicate a vascular component to the cardiac dysfunction, and could contribute to increased troponin leak in patients without coronary artery disease. notably, ace -expressing heart pericytes in another dataset (tucker et al) is even higher ( %) than any other tissue dataset analyzed here (max. . % in kidney, extended data fig. ) . despite the lymphopenia observed with covid- , , , , we did not typically observe ace mrna expression in scrna-seq profiles in the bone marrow or cord blood (fig. a,b) , although there was ace expression in some tissue macrophages, including alveolar and heart macrophages (extended data fig. ) . further studies of ace rna and protein expression in covid- disease tissue will help elucidate its expression in immune cells . to validate our findings from scrna-seq analysis and to determine the spatial expression patterns of ace , tmprss , and ctsl, and their corresponding proteins we performed fluorescence in situ hybridization and immunohistochemistry on tissue sections of airway and alveoli from healthy donor lungs that were rejected for lung transplantation. first, we performed triple fluorescence in situ hybridization to identify ace , ctsl and tmprss on alveolar sections. we observed co-expression, albeit at low levels, of all three genes in alveolar cells (fig. e) . we then performed co-staining with cell type-specific markers. we observed ace transcripts in a subset of type (at ) cells identified by canonical at protein markers, htii- and pro-sftpc (fig. f,g) . similarly, we observed tmprss gene expression in htii- + at cells (fig. f) . immunostaining for tmprss protein further confirmed at cell expression (extended data fig. a) . we also observed tmprss protein expression at low levels in some at cells identified by the canonical at protein marker ager (extended data fig. a ). of note, some non-epithelial cells also expressed these three genes. 
we further validated the expression of ace by bulk mrna-seq in sorted at cells, including those from long-term cultured alveolar organoids (extended data fig. b) . we then performed immunohistochemistry and deployed three different available putative ace antibodies to establish ace protein expression (supplementary table ). one of these antibodies, the one used previously to functionally block cellular viral entry, specifically labeled adult pro-sftpc-positive at cells (extended data fig. c) . as a cautionary note, the lack of agreement between antibody staining patterns suggests that some of these antibodies may be non-specific. previous studies have revealed that ace is highly enriched in mucous cells of the nasal and lower respiratory tract epithelium . in healthy lungs, both large and small airways contain mucous cells in surface airway epithelium, albeit at low numbers. however, submucosal glands (smgs) that reside deep within the airway tissues are composed of abundant mucous cells. to test whether these mucous cells also express ace , tmprss , and ctsl, we performed an integrated analysis on scrna-seq datasets obtained from microdissected smgs of the healthy donors. we observed overlapping expression and relatively high enrichment of ace , tmprss and ctsl in mucous cells of the smgs (extended data fig. e ). in situ transcript analysis for ace further confirmed the presence of transcripts in acinar epithelial cells of the smgs (fig. h) , and cells expressing ace in the large airway epithelium (fig. i) . we next sought to understand how the expression of each of these three key genes --ace , tmprss , and ctsl --in specific cell subsets may relate to three key covariates that have been associated with disease severity: age (older individuals are more severely affected), sex (males are more severely affected), and smoking (smokers are more severely affected) . 
we integrated samples across many studies, as no single dataset generated to date is sufficiently large to address this question. we assembled datasets (supplementary table , supplementary data file d ), comprised of , , cells from individuals, spanning healthy nasal, lung, and airway samples profiled by scrna-seq or snrna-seq from either biopsies, resections, entire lungs that could not be used for transplant, or post mortem examinations, allowing us to study a diversity of respiratory regions and cell types (fig. a) . these included published datasets [ ] [ ] [ ] [ ] [ ] [ ] and datasets that are not yet published [ ] [ ] [ ] [ ] [ ] . in the case of unpublished data, we only obtained single-cell expression counts for the three genes, as well as the total umi counts per cell, cell identity annotations, and the relevant anonymous clinical variables (age and sex, as well as smoking status when ascertained). cell identity annotations were manually harmonized using an ontology with three levels of annotation specificity (fig. b, supplementary table ) ; focusing on levels and allowed us to include a large number of datasets, while retaining relatively high cell subset specificity (fig. a,b) . to facilitate rapid data sharing, we analyzed data pre-processed by each data-generating team at the level of gene counts, using total counts as a size factor. we used poisson regression (diffxpy package; methods) to model the association between the expression counts of the three genes and age, sex, and smoking status, and their possible pair-wise interactions (fig. c) , using total counts as an offset, and dataset as a technical covariate to capture sampling and processing differences. it should be noted that modeling interaction terms was crucial as their omission resulted in reversed effects for age and sex for particular cell types (discussion). 
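to make the role of the offset concrete, the following is a minimal hand-written sketch of a poisson regression with log total counts as the offset and a single covariate. it is not the diffxpy interface and it omits the sex, smoking, interaction, and dataset terms of the model described above; the function name, the newton solver, and the simulated values are all assumptions made for illustration.

```python
import math

def fit_poisson_offset(counts, totals, x, n_iter=50, tol=1e-10):
    """Fit log(E[count]) = log(total) + b0 + b1 * (x - mean(x)) by Newton's method.

    counts: gene counts per cell; totals: total counts per cell (the offset);
    x: one covariate per cell (for example age in years).
    Returns (b0, b1), where b0 is the log rate at the mean of x.
    """
    n = len(counts)
    x_mean = sum(x) / n
    xc = [v - x_mean for v in x]              # centering improves conditioning
    b0 = math.log(sum(counts) / sum(totals))  # start from the pooled rate
    b1 = 0.0
    for _ in range(n_iter):
        mu = [t * math.exp(b0 + b1 * v) for t, v in zip(totals, xc)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * v for y, m, v in zip(counts, mu, xc))
        # Fisher information (negative Hessian) entries.
        h00 = sum(mu)
        h01 = sum(m * v for m, v in zip(mu, xc))
        h11 = sum(m * v * v for m, v in zip(mu, xc))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system analytically.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Simulated noise-free expected counts (hypothetical values, not study data).
ages = [20, 30, 40, 50, 60, 70]
totals = [800, 1200, 1000, 1500, 900, 1100]
counts = [t * math.exp(-7.0 + 0.02 * (a - 45)) for t, a in zip(totals, ages)]
b0_hat, b1_hat = fit_poisson_offset(counts, totals, ages)
print(round(b1_hat, 4))  # close to the simulated slope of 0.02 per year
```

the offset fixes the coefficient of log total counts at one, so b1 describes a change in the expression rate per unit of the covariate and is not driven by differences in sequencing depth between cells.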
this model was fitted to non-fetal lung data ( , cells, samples, donors, datasets) within each cell type to assess cell-type specific association of these covariates with the three genes. to further validate sex and age associations, we fit a simplified version of the model without smoking status covariates to the full non-fetal lung data ( , cells, samples, donors, datasets). uncertainty is challenging to model in our single-cell meta-analysis as variability exists on the levels of both donors and cells. for simplicity, we modeled the overall variance with both contributions covered implicitly by treating each cell as an independent observation. as cells from the same donor cannot typically be regarded as independent observations, this can result in inflated significance (p-values that are too small), especially when there are few donors for a particular cell type. to counteract this limitation, we employed three approaches: ( ) we used a simple noise model (poisson) to reduce the chance of overfitting donor variability to obtain spurious associations; ( ) we confirmed significant associations from the single-cell model in a pseudo-bulk analysis to ensure effect directions are consistent when modeling only donor variation (methods, fig. d); ( ) we investigated whether significant associations change direction when holding out any one dataset to ensure that the effect is not dominated by the inclusion of many cells from only one source (methods, fig. f, supplementary data d ) . we regarded an association that passes all of these validations as a robust trend, while associations that appear dominated by a single dataset (often because this dataset is a major contributor of a given cell type) were denoted as indications. we focused on trends or indications in those cell types where both tmprss and ace are predominantly expressed in the lung: airway epithelial cells (basal, multiciliated, and secretory cells), alveolar at cells, and submucosal gland secretory cells (fig. e) . 
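the pseudo-bulk step can be sketched as a simple aggregation in which counts are summed over all cells of a donor, so that each donor contributes one observation. the tuple layout, the function name, and the donor labels below are assumptions made for illustration; in practice the aggregation would be done per cell type.

```python
from collections import defaultdict

def pseudo_bulk(cells):
    """Sum gene counts and total counts over all cells of each donor.

    cells: iterable of (donor_id, gene_count, total_count) tuples.
    Returns {donor_id: (summed_gene_count, summed_total_count)}.
    """
    sums = defaultdict(lambda: [0, 0])
    for donor, gene_count, total_count in cells:
        sums[donor][0] += gene_count
        sums[donor][1] += total_count
    return {donor: (g, t) for donor, (g, t) in sums.items()}

# Toy data for two hypothetical donors.
cells = [
    ("d1", 1, 1000), ("d1", 0, 800),
    ("d2", 3, 1200), ("d2", 2, 900), ("d2", 0, 700),
]
bulk = pseudo_bulk(cells)
print(bulk["d1"])  # (1, 1800)
print(bulk["d2"])  # (5, 2800)
```

fitting the regression to these per-donor sums removes the within-donor dependence between cells, at the cost of far fewer observations.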
strikingly, we find robust trends of ace expression with age, sex, and smoking status in these cell types (fig. d , extended data fig. ): ace expression increases with age in basal and multiciliated cells. ace expression is elevated in males in airway secretory cells and alveolar at cells. furthermore, we find strongly elevated levels of ace in past or current smokers in multiciliated cells (log fold change (log fc): . , fig. d ). significant associations of ace expression indicate increased expression with age in at (largest age effect: slope of log expression per year of . ) and secretory cells, and increased expression of ace in males in multiciliated cells. further indications associate past or current smoking with decreased ace expression in at cells, and increased ace expression in basal cells. however, these last five indications are not robust trends and depend on the inclusion of a single dataset (fig. f), often because that dataset contributes a large number of cells of a particular type. specifically, when we held out the largest declined donor transplant dataset (supplementary table , "regev-rajagopal", most cells and most samples), a declined donor tracheal epithelium dataset ("seibold", supplementary table , most donors in the smoking analysis), or a further declined donor lung dataset ("kropski-banovich", supplementary table ), respectively, the effect is no longer present (methods, fig. f , supplementary data d ). the above trends and indications for sex and age were further validated in the simplified model on the full non-fetal lung dataset (extended data fig. , supplementary data d ). with the exception of the age association in basal and secretory cells, all associations were found to be significant at a false discovery rate (fdr) threshold of %, confirmed by pseudo-bulk analysis, and were supported at least at the level of indication. indeed, all robust trends were also supported as robust trends in the simplified model.
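the text above reports significance at an fdr threshold but does not say which procedure was used. as an illustration only, assuming the widely used benjamini-hochberg procedure, adjusted p-values across a set of tested associations can be computed as below. the function name, the p-values, and the threshold argument are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant(pvalues, threshold):
    """Flag tests whose adjusted p-value is at or below the chosen FDR threshold."""
    return [q <= threshold for q in benjamini_hochberg(pvalues)]

# Hypothetical p-values for four associations.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```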
fitting the simplified model on the smoking data subset shows that modeling smoking status is crucial to detect the basal and secretory age association. without modeling the effect of smoking status on basal and secretory cell ace expression, this variance is captured as uncertainty against which the age effect is evaluated as not significant. taking into account smoking status is of particular importance as the effect sizes associated with smoking tend to be much larger than age effects, and tend to be larger than sex effects. for example, in multiciliated cells the effect sizes assigned to smoking status, sex, and age associations with ace expression are β= . , β= . , and β= . respectively, where β represents the log fc and the slope of log expression per year. examining joint trends of ace and the protease genes within the same cell type, there are indications of up-regulation of both ace and tmprss in multiciliated cells (ace indication dependent on "seibold" dataset) in males and with age (both indications dependent on "regev-rajagopal" dataset). in at cells, there is an indication of joint up-regulation of ace and tmprss with age (dependent on "regev-rajagopal" dataset), and an indication of ace and ctsl down-regulation in smokers (dependent on "kropski-banovich" dataset). all above joint trends for age and sex covariates were confirmed on the full non-fetal lung data using the simple model without smoking covariates. in aggregate, elevated levels of cell-type specific ace and associated proteases are correlated to increasing age, in smokers, and in males. the age associations highlighted particularly low expression in samples from very young children (newborn to years old). as reports suggest that most infants and young children cases do not display severe disease , , we inspected the subsets of studies of human development and pediatric samples from our integrated analysis (supplementary table ). 
these included cells from first trimester samples ( donors) of fetal lung ( . weeks post conception; wpc), fetal lung samples ( donors) from the second trimester ( - weeks) , lung samples ( donors) spanning from third trimester premature births (n= ), full term newborns (n= ), ~ -month old (n= ), -year-old (n= ), and -year-old (n= ) children. because the number of samples here is small, all observations must be interpreted with caution. the extent of ace expression in lung cells changes during development (fig. g, supplementary table ). there are dual-positive cells present in the very early first trimester lungs by scrna-seq, and some ace expression in epithelial cells in the second trimester samples (extended data fig. a,b) . notably, spatial transcriptomics of . pcw fetal lung did not capture any ace expression (data not shown). in lungs from third trimester pre-term births, ace expression is high, with ace + tmprss + cells observed in alveolar at cell populations (extended data fig. b) . strikingly, ace expression is very low in normal lungs of newborns, the one ~ months' lung sample, and the -year-old lungs (fig. g) . this is further supported by single-cell chromatin accessibility by transposome hypersensitive sites sequencing (scths-seq ) from human pediatric samples (full gestation, no known lung disease) collected at day of life, months, years, and years (n= at each time point) (extended data fig. a) . ace gene activity scores (methods), when present, were in the at /at population, but no signal was present at birth, it was low in the -year-old and -year-old sample, and higher in the month-old (extended data fig. b-d) . notably, immunohistochemistry (ihc) of the ~ month-old infant lung also showed fewer ace -immunoreactive at cells (extended data fig. d ). 
we also assessed whether ace , tmprss , and ctsl are expressed in the human placenta during pregnancy, using data from three published scrna-seq studies: two from the first trimester ( , cells and , cells) and one from full-term placenta ( , cells) [ ] [ ] [ ] . ace was expressed ( . %) in maternal decidual/stromal cells, maternal pericytes, and fetal extravillous trophoblasts, cytotrophoblasts, and syncytiotrophoblast in both first-trimester and term placenta (fig. d). note that there was little expression of tmprss ( . %) in the placenta and accordingly few ace + tmprss + dual-positive cells (as we previously reported ). however, ctsl is expressed in most cells ( %) in the maternal-fetal interface, and there are ace + ctsl + dual-positive cells ( . %) among maternal decidual/stromal cells, pericytes, and fetal trophoblasts (hla-a, b negative) in both first-trimester and term placentae. overall, these patterns may be important in understanding why children are more resistant to covid- and in considering risk during pregnancy. our human lung cell atlas analyses have revealed immune signaling genes that co-vary with ace and tmprss in airway and lung cells , . these analyses identified antiviral response genes that are enriched in ace + tmprss + cells (e.g., ido , irak , nos , tnfsf , oas , mx ), and suggested that ace itself is interferon regulated , . to explore such gene programs in a broader context, we identified signatures for dual-positive ace + tmprss + cells compared to dual-negative ace -tmprss cells in the nasal epithelium, lung, and gut (supplementary tables , ) with two complementary approaches. the first aimed to find features that characterize programs of dual-positive cells that are shared by different cell types in one tissue ("tissue programs"). the second aimed to find features that are associated with dual-positive cells compared to other cells of the same type, and may or may not be shared with other types ("cell programs") (methods).
to infer tissue programs, we trained a random forest classifier to discriminate between dual-positive and dual-negative cells (excluding ace and tmprss ; : class balanced test-train split), generalizing across multiple cell types in one tissue, and ranked genes according to their importance scores in the classifier (methods). to infer cell programs, we performed differential expression analysis between dual-positive and dualnegative cells within each cell subset. we note that ace + tmprss + cells have more unique transcripts detected (extended data fig. b) : this can reflect a technical confounder, biological features, or both. we conservatively controlled for these differences (by sampling dual positive and dual negative cells from matched gene complexity bins; methods; extended data fig. b , extended data fig. ) . importantly, these methods do not assume that ace + tmprss + cells form a distinct subset within each cell type. rather, our goal is to leverage the variation among single cells within a single type to identify gene programs that are co-regulated with ace and tmprss within each expressing cell subset. tissue programs (fig. a, extended data fig. a, supplementary table , ) were enriched in several pathways related to viral infection and immune response (see fig. b , extended data fig. a for visualization of selected genes, and supplementary tables for the full list). these include phagosome structure, antigen processing and presentation, and apoptosis. 
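the complexity matching mentioned above (sampling dual-positive and dual-negative cells from matched gene complexity bins) can be sketched as follows. this is a minimal illustration under stated assumptions, not the method as implemented by the authors: the fixed bin width, the equal-size sampling per bin, the random seed, and all cell records are hypothetical.

```python
import random
from collections import defaultdict

def sample_matched_by_complexity(cells, bin_width=500, seed=0):
    """cells: list of (cell_id, n_genes_detected, is_dual_positive).

    Within each complexity bin, draw the same number of dual-positive and
    dual-negative cells, so both groups share one complexity distribution.
    """
    rng = random.Random(seed)
    bins = defaultdict(lambda: {True: [], False: []})
    for cell_id, n_genes, is_positive in cells:
        bins[n_genes // bin_width][bool(is_positive)].append(cell_id)
    positives, negatives = [], []
    for key in sorted(bins):
        pos, neg = bins[key][True], bins[key][False]
        k = min(len(pos), len(neg))  # bins lacking either group contribute nothing
        positives.extend(rng.sample(pos, k))
        negatives.extend(rng.sample(neg, k))
    return positives, negatives

# Hypothetical cells: (id, genes detected, dual-positive flag).
cells = [
    ("p1", 2100, True), ("p2", 2300, True), ("p3", 3900, True),
    ("n1", 600, False), ("n2", 2050, False), ("n3", 2400, False),
    ("n4", 2450, False), ("n5", 3600, False),
]
matched_pos, matched_neg = sample_matched_by_complexity(cells)
```

in this toy input the low-complexity cell "n1" has no dual-positive counterpart in its bin and is therefore never sampled.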
among the tissue program genes we highlight: ceacam (lung, nasal, gut programs) and ceacam (lung), surface attachment factors for coronavirus spike protein; slpi (lung, nasal), a secreted protease inhibitor that is associated with virus resistance ; pigr (lung, gut), the polymeric immunoglobulin receptor that may promote antibody-dependent enhancement via iga ; and cxcl (lung, nasal), a mucosal chemokine that attracts dendritic cells and monocytes to the lungs . cell programs (supplementary table ) were enriched in many of the same genes and pathways as tissue-specific programs (fig. d , supplementary table , , , , ), and highlight a potential role for tnf signaling in ace regulation. we first confirmed that the cell programs were not merely associated with the number of transcripts per cell (extended data fig. c ). while some genes were shared between the tissue and cell programs (e.g., many virus-related genes, such as ceacam , cxcl , slpi, and hla-dra), the cell programs further captured unique biological functions and activities. for example, dual-positive lung secretory cells differentially expressed genes involved in tnf signaling including ripk , a key regulator of inflammatory cell death via necroptosis, previously implicated in sars-cov pathogenesis . both lung dual-positive secretory and multiciliated cells differentially expressed lysosomal genes (mfsd , ctss, ctns, ctsh), potentially relevant for endolysosomal entry of coronaviruses . dual-positive at cell programs included genes involved in immunoproteasome (psmb , psmb , fig. c), class i and ii antigen presentation (hla-dma, hla-drb , hla-dpb , hla-dra, hla-dpa ), and phagocytosis.
dual-positive nasal goblet cells differentially expressed several cytokines and chemokines, including granulocyte-colony stimulating factor (csf ), which may impact hematopoiesis, the recruitment of neutrophils, and inflammatory pathology; cxcl and cxcl , chemoattractants for neutrophils; interleukin- (il ), which induces the production of il- and tnf ; and ccl , which is upregulated by tnf . the at cell program included the surfactant proteins, sftpa and sftpa ; the il- receptor (il r ), which may promote antiviral immune responses (below); and, multiple components of mhc-ii (e.g., hla-dpa , hla-dpb ), congruent with a role in antigen presentation. cell programs from multiple tissues (fig. c,d) included genes related to tnf signaling (e.g., birc , ccl , cxcl , cxcl , jun, nfkb ), raising the possibility that anti-tnf therapy may impact the expression of ace and/or tmprss . consistent with this hypothesis, ace expression in enterocytes was significantly lower in ulcerative colitis patients treated with anti-tnf compared to untreated patients (mean = . and . log (transcripts per , (tp k)+ ) in treated vs. untreated; adjusted p < e- ). however, we could not control for many important features, including disease severity, which is strongly associated with anti-tnf treatment, raising the need for future work. some of the genes are targets of known drugs . for example, dual-positive lung secretory cells expressed, in addition to ace (targeted by ace inhibitors), other drug targets, including c , hdac , il a, pik ca, ramp , and slc a . other program genes were shown to interact with sars-cov- proteins via affinity purification mass spectrometry . among those was gdf , which was identified as a putative interaction partner for the sars-cov- protein orf , is a central regulator of inflammation , and was a member of the dual-positive cell programs of both lung basal cells and nasal multiciliated cells. 
some program genes may be particularly related to covid pathological features and may indicate putative therapeutic targets. for example, muc is especially highly induced in dualpositive cells (in tissue and specific cell programs), which may be associated with respiratory secretions . importantly, the lung tissue and gut enterocyte programs include the gene encoding the il co-receptor (il st), and the at cell program includes il . il signaling has been implicated in uncontrolled immune responses in the lungs of covid patients, elevated serum il levels are associated with the need for mechanical ventilation , and anti-il r antibodies (tocilizumab) are being tested for clinical efficacy in covid- patients. indeed, il st and il are higher in dual positive vs. dual negative at cells (extended data fig. d ), although il expression is relatively low in these cells from healthy tissue. additional cell types, such as heart pericytes, are enriched for cells with co-expression of ace with il r or il st (extended data fig. ). the immune-like features of ace + epithelial cells are also reflected in the regulatory features of the ace locus by scatac-seq (fig. f) . note that because epithelial cells with an accessible ace locus tend to have a higher number of fragments in peaks than cells with inaccessible ace (extended data fig. f ), consistent also with higher umis in scrna-seq, some of the cells with inaccessible ace could be false negatives, reducing our power. previous studies in the healthy lung predicted that interactions between at cells and myeloidlineage macrophages may be important for immune regulation and surfactant homeostasis . to explore this possibility, we predicted interactions between at cells (in general, or ace + tmprss + dual-positives) and myeloid cells (methods ), using our large declined donor transplant dataset ("regev/rajagopal"; samples, patients, - locations each). 
at cells and myeloid cells were present in lung lobe samples from all patients, whereas samples from patients contained both ace + tmprss + dual-positive at cells and myeloid cells. we identified significant predicted interactions involving oncostatin m (osm), an il -type cytokine expressed by myeloid cells , with the oncostatin m receptor (osmr) and its paralog receptor lif receptor subunit alpha (lifr) expressed in at cells (both for at cells in general and in dual-positive ones). interactions involving the complement pathway were also predicted (for all at and dual positives) between complement c and c expressed by at cells and their cognate receptor expressed in myeloid cells. three samples had interactions between the il receptor on at cells and il b or il rn in myeloid cells. the il -receptor interactions were identified mostly (in out of samples) involving only dual-positive at cells, suggesting a possible role of ace + tmprss + dual-positive cells in il -mediated processes. finally, we identified interactions between csf , , or expressed in at cells (including dual positives) and their receptors expressed in myeloid cells. these predicted interactions further support the previously identified roles for cross-talk between at and myeloid cells, such as macrophages, in immune regulation (osm, complement, il ) and surfactant homeostasis (csf) . we next asked whether human cell types of interest were present in animal models. while such analyses cannot address molecular compatibility (due to sequence variation in ace across species, as shown for lower compatibility of sars-cov and mouse ace ), they can help determine if dual-positive cells are present in commonly employed models, and if their characteristics, proportions, and programs are similar to those of their human counterparts. in a separate study , our lung network showed strong similarities to the human data in a macaque model.
here, we focused on the more distant, but commonly used, mouse model. ace + tmprss + and ace + ctsl + dual-positive cells were present primarily in club and multiciliated cells in the airway epithelia of healthy mice (ace + tmprss + club . % [ . %, . %] and multiciliated . % [ . %, . %], ace + ctsl + club . % [ . %, . %] and multiciliated . % [ . %, . %]), consistent with the expression patterns found in human airways (fig. a) . furthermore, ace expression increased over a -month time-course of healthy mouse aging in both club (p= . e- ) and goblet (p= . ) cells (fig. a) . the proportion of ace + tmprss + dual-positive cells did not significantly increase with age during this time course (data not shown), but the proportion of ace + ctsl + dual-positive cells significantly increased in club cells during this time-course (fig. b) . interestingly, the mice were aged between - months, a -month period that is reported to reflect the maturation period from early to mature adults . examining bulk rna-seq profiles of sorted populations of alveolar at cells (sftpc + ), airway basal cells (krt + ), alveolar endothelial cells (cd -cd + ), alveolar epithelial cells (epcam + ), whole lung and whole trachea from a krt -creer/lsl-tdtomato/sftpc-egfp transgenic mouse model, and across tissues from encode, showed that ace , tmprss and ctsl are expressed in sorted at cells, whole trachea and whole lung, as well as in stomach, intestine, kidney and bladder. in human smokers, statistical modeling uncovered a robust trend of increased ace expression in airway epithelial cells, while expression in at cells was reduced (fig. d, extended data fig. ). to experimentally confirm these findings, we examined cell profiles from mice exposed daily to cigarette smoke for two months, followed by scrna-seq of whole lungs (fig. c) . epithelial specific expression patterns of mouse ace and the ace + tmprss + and ace + ctsl + dual-positive cells were largely consistent with the human data (fig. d) . 
upon smoke exposure, there was a significant increase in ace + airway secretory cell numbers, while the fraction of ace + at cells was unaltered (fig. e) . moreover, the expression levels of ace were significantly increased in airway secretory cells (fig. f ), but not in at cells (fig. g) . this was in agreement with bulk rna-seq of mouse lungs exposed to different doses of cigarette smoke , in which ace levels increased in a dose-dependent manner by daily cigarette smoke over months (fig. h) . notably, the covid- relevant proteases tmprss and ctsl were also significantly increased by smoke exposure in mice (fig. i,j) . thus, mouse smoking data shows similar trends as observed in humans and experimentally confirms the association of ace levels with smoking. we also compared the patterns between the mouse and human placenta, analyzing ace , tmprss , and ctsl expression across , cells from scrna-seq data during mouse placenta development from embryonic days . to (shu et al., unpublished). we find ace + tmprss + dual-positive cells ( . %) in a large fraction of fetal trophoblasts with strong epithelial signatures. ace + tmprss + dual-positive cells express signatures of at cells and hepatocytes, and many also express ctsl. ace + ctsl + dual-positive cells ( . %) are also present among fibroblasts, stromal cells, and fetal trophoblasts in both mice and humans (fig. k, extended data fig. ) . notably, while ace + ctsl + dual-positive fibroblasts and stromal cells in humans are of maternal origin, ace + ctsl + dual-positive fibroblasts and stromal cells are of fetal origin in mice. tmprss has been demonstrated to mediate sars-cov- infection in vitro , , but sars-cov- also infects cells in the absence of tmprss . thus, additional proteases likely play roles in proteolytic cleavage events of spike and other viral proteins that underlie entry (fusion) and egress. 
to systematically predict proteases potentially involved in sars-cov- pathogenesis, we tested for co-expression of each of annotated human protease genes with ace in the large declined donor transplant dataset ("regev/rajagopal") from patients. the analysis recovered tmprss as one of the proteases significantly co-expressed with ace in multiple lung epithelial cell types (fig. a, supplementary table , ). in addition, multiple members of the proprotein convertase subtilisin kexin (pcsk) family were also significantly co-expressed with ace in both proximal and distal airway epithelial cells (fig. a,b), including furin, pcsk , pcsk , pcsk and pcsk in at cells. proprotein convertases have known roles in coronavirus s-protein priming , , . we obtained similar results in an independent dataset of , cells from donors (extended data fig. a,b, "aggregated lung"). to further investigate the role of proprotein convertases as candidates for sars-cov- s-protein processing, we analyzed the sars-cov- spike protein sequence. multiple sequence alignment of s-protein sequences of sars-cov- and other beta-coronaviruses revealed a polybasic insert at the s /s junction present only in sars-cov- spike (extended data fig. c). while polybasic sites are found in multiple members of betacoronavirus lineages a and c (e.g., mers-cov), sars-cov- is the only known member of lineage b harboring a polybasic motif in the s /s region (extended data fig. c). as previously reported, this polybasic sequence corresponds well to cleavage motifs of multiple pcsk family proteases (extended data fig. d ) [ ] [ ] [ ] , and has a high predicted probability of pcsk-mediated cleavage (at amino acid ) (by prop and prosperous , ), as do additional sites including the s ' position (at amino acid ), cleavage at which would release predicted fusion-mediating peptides (extended data fig. e ). we next examined pcsks expression and co-expression with ace across lung cell subsets (fig. c, extended data fig. f).
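to illustrate what scanning a protein sequence for a polybasic cleavage motif involves, the sketch below searches for the minimal furin-type pattern r-x-x-r, where x is any residue. this is a plain pattern match, far simpler than the prop and prosperous predictors named above, and is not what those tools compute. the function name and the short peptide are illustrative assumptions, not data from this study.

```python
import re

# Minimal furin-type recognition pattern: Arg, any residue, any residue, Arg.
# The lookahead lets overlapping occurrences be reported as separate hits.
MINIMAL_MOTIF = re.compile(r"(?=(R..R))")

def find_polybasic_motifs(sequence):
    """Return (0-based start index, matched residues) for every R-X-X-R hit."""
    return [(m.start(), m.group(1))
            for m in MINIMAL_MOTIF.finditer(sequence.upper())]

# Illustrative toy peptide containing an R-R-A-R stretch.
peptide = "TNSPRRARSVAS"
hits = find_polybasic_motifs(peptide)
```

a hit only indicates that the minimal sequence pattern is present; whether a protease actually cleaves there depends on context that a pattern match does not capture.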
furin, pcsk and pcsk were broadly expressed across multiple lung cell types, and pcsk and pcsk were largely restricted to neuroendocrine cells, as previously reported , with pcsk further detected in . % of at cells (fig. d, extended data fig. g ). in many cell subsets we observed dual expression with ace at fractions comparable to or higher than those of ace + tmprss + cells (fig. e, extended data fig. h) . these include at cells (ace + tmprss + , ace + furin + and ace + pcsk + at . %, . %, and . %, fig. e) ; multiciliated cells in the proximal airway (ace + tmprss + , ace + furin + , ace + pcsk + , and ace + pcsk + at . %, . %, . %, and . %), and basal cells (ace + tmprss + , ace + furin + , and ace + pcsk + at . %, . %, and . %). coexpression is present across tissues in addition to the lung (extended data fig. i,j) , including the liver, ileum, kidney and nasal airways, with the highest percentages of ace + pcsk + dual positive cells in nasal airways (ace + pcsk + . %, ace + furin + . %), bladder (ace + pcsk + . %) and testis (ace + pcsk + . %). because different host proteases may contribute to different stages of the viral life cycle , , we also examined the prevalence of ace + tmprss + pcsk + triple-positive cells (tps) in the lung dataset. ace + tmprss + pcsk + were the main triple positive cells in multiciliated ( . %) and secretory cells ( . %) of proximal airways, and ace + tmprss + furin + tps were the most common within at cells ( . %) (extended data fig. k) . finally, when we examined all known human proteases for co-expression with ace in major lung epithelial cell types (fig. f) , we recovered cathepsins (ctsb, ctsc, ctsd, ctsl, ctss), proteasome subunits (e.g. psmb , psmb , psmb ), and complement proteases (c r, c , cfi) (fig. f, extended data fig. ), the latter also captured in our programs above (fig. ) . 
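the dual-positive and triple-positive percentages reported above are fractions of cells in which every gene of a set is detected. a minimal sketch of that computation follows, assuming that a gene counts as detected when its count is greater than zero; the detection rule, the function name, the placeholder gene keys, and the counts are all hypothetical.

```python
def fraction_positive(cells, genes):
    """Fraction of cells in which every gene in `genes` has a non-zero count.

    cells: list of dicts mapping gene name -> count (missing genes count as 0)
    genes: gene names that must all be detected (2 for dual, 3 for triple positives)
    """
    if not cells:
        raise ValueError("need at least one cell")
    n_positive = sum(
        1 for cell in cells if all(cell.get(gene, 0) > 0 for gene in genes)
    )
    return n_positive / len(cells)

# Hypothetical cells with placeholder gene names.
cells = [
    {"gene_a": 1, "gene_b": 2, "gene_c": 0},
    {"gene_a": 0, "gene_b": 5},
    {"gene_a": 3, "gene_b": 1, "gene_c": 4},
    {"gene_b": 1},
]
dual = fraction_positive(cells, ["gene_a", "gene_b"])
triple = fraction_positive(cells, ["gene_a", "gene_b", "gene_c"])
```

because detection depends on sequencing depth, fractions computed this way are best compared between populations of similar complexity.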
we performed integrative analyses of single-cell atlases in the lung and airways and across tissues to identify cell types and tissues that have the key molecular machinery required for sars-cov- infection. we then examined the relationship between specific cell types and three key covariates (age, sex, and smoking status) that have been related to disease severity. we further used the scale of these integrated atlases to identify gene programs in major epithelial cell subsets that may be infected by the virus, and to search for other potential accessory proteases. our hope is that this extensive analysis and resource will help with hypothesis generation (and refutation) towards better understanding of the molecular and cellular basis of covid- infection, and the identification of putative therapeutic avenues. our cross-tissue analysis substantially expands on our , , , and others' - earlier efforts, allowing us to identify cell subsets across diverse tissues that may be implicated in virus transmission, pathogenesis, or both. focusing on pathogenesis, in addition to key subsets in the lung, airways and gut, we identified ace + cells that co-express either tmprss or ctsl in diverse organs, many of which have been associated with severe disease. these include epithelial cells in the liver, kidney, pancreas, and olfactory epithelium, cardiomyocytes, pericytes and fibroblasts in the heart, and oligodendrocytes in the brain. for example, the presence of dual-positive cardiomyocytes and cardiac pericytes and fibroblasts may provide a pathological basis for the cardiac abnormalities noted in covid- patients including elevated troponin, a signature of cardiomyocyte injury, myocarditis, and sudden cardiac death . as the co-expression of genes involved in sars-cov infection is highest in cardiac pericytes in healthy heart, damage to vascular beds may trigger troponin release in otherwise normal hearts.
moreover, as myocardial ace expression is increased in patients with existing cardiovascular diseases (tucker et al. companion manuscript ), sars-cov infection may result in greater damage to cardiomyocytes, and account for greater disease acuity and poorer survival in these patients. one intriguing clinical observation is that some covid- patients display an array of neurologic symptoms , (helms et al. ) , reported as seizures and acute necrotizing encephalopathy, similar to that previously observed following other infections such as influenza , . neuroinflammation could result from direct viral infection of the brain, or a systemic cytokine storm [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . direct viral invasion of sars-cov and mers-cov is observed in multiple brain regions in human patients and mouse models , , consistent with widespread ace expression in numerous brain cell types. furthermore, sars-cov has been shown to infiltrate the brain via the olfactory epithelium-olfactory bulb axis ; olfactory transmission for sars-cov- has been recently proposed . other possible transmission routes could be through the infection of ace + tmprss + enteric neurons synapsing with vagal afferents, or entry through blood-cns interfaces such as the choroid plexus or meninges , [ ] [ ] [ ] . profiling immune cells at these sites after infection is an important future step to better understand how the viral response may lead to encephalitis. one intriguing possibility is that encephalitis might arise as an autoimmune response to myelin antigens expressed by infected cells. antibodies against peptides of myelin proteins have been clinically shown to be associated with autoimmune encephalitis and seizures [ ] [ ] [ ] , and myelin peptides are targets of t cells in demyelinating inflammatory neurological diseases such as acute demyelinating encephalomyelitis, guillain-barre syndrome, and multiple sclerosis [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . 
oligodendrocytes, the myelin-producing cells of the cns, are the main ace + tmprss + cell type in the brain; the myelin transcriptional regulator myrf was enriched in certain ace + tmprss + cell types as noted above; and myelin proteins mog and mbp were co-expressed in numerous ace + clusters across organs. myrf and mbp were significantly differentially expressed in ace + tmprss + subsets of the lung and gut (supplementary table ); myelin targeting th cells trained in the gut are able to infiltrate the cns in a mouse model of experimental autoimmune encephalomyelitis . taken together, the expression of myelin proteins across multiple ace + tmprss + cells could hypothetically contribute to antigen presentation and autoimmune response in the context of viral infection. one test of this hypothesis would be to establish whether demyelination occurs in covid- patients, and whether it can be triggered by anti-myelin specific immunity induced by virus-infected cells, related to observations following other viral infections like zika, influenza, and epstein-barr virus , , . our meta-analysis of scrna-seq across studies provided the required statistical power to uncover population-level signals at a molecular level and at single-cell resolution. we found that the sars-cov- receptor and associated proteases were up-regulated in airway epithelial and at cells with age and in males, an association that may shed light on the marked increase in mortality with age. furthermore, ace was up-regulated in airway epithelial cells (basal and multiciliated cells) in past or present smokers, but down-regulated in their at cells; we have also confirmed this in an experimental system in a mouse model. these contrasting smoking associations show the importance of single-cell resolution, as the down-regulation in at cells would be masked by the airway epithelial signal, leading to loss of association or misinterpretation of seemingly consistent ace signals in bulk rna-seq .
importantly, ace expression is particularly low in young pediatric samples, which is mirrored by a lack of chromatin accessibility at the ace locus. ace expression is known to be regulated in complex ways across different tissues and may be affected both by common therapies (acei/arb) (tucker et al., companion manuscript ) and by infection . moreover, both higher ace expression per cell and a higher fraction of ace + cells can in principle have implications for infection, but may have conflicting effects on pathogenesis, as ace knockout mice show more severe ards upon lung injury because of its role in the renin-angiotensin pathway, which seems to protect from consequences of lung injury and inflammation (for potential roles of this pathway on cov infection (pre-covid) see the review in ). as sars-cov binding will lead to internalization and therefore downregulation of ace on the cell surface, the protective function of ace via its proteolytic processing of angiotensin-ii may be lost. thus, the smoking-mediated downregulation of ace in at cells may not protect cells from being infected but rather may increase ards due to more severe loss of ace from the cell surface upon infection. other confounders, including acei use, which is not available to our meta-analysis, may further impact our results. to the best of our knowledge, this study is the first single-cell meta-analysis (in any setting). to perform this meta-analysis, we used a model that included the tested covariates (age, sex, and smoking status), technical covariates (dataset and the number of umis per cell), and several interaction terms. including these interaction terms was crucial, as omission resulted in increased background variation and reversed effect estimates. likewise, modeling the smoking status of a donor was important to reduce background variation and account for the unbalanced distribution of covariates in the dataset.
for example, while we have similar numbers of male ascertained smokers and non-smokers ( and donors), there are three times as many female ascertained non-smokers as female smokers ( and donors), which is reflective of this bias in the population . the addition of these terms increases the complexity of the model. indeed, only one dataset ("seibold") had sufficient numbers of donors of various ages, sex, and smoking status to fit the full model. thus, performing the meta-analysis was only possible due to the aggregation of a large number of healthy single-cell datasets enabled by the hca lung biological network and a community-wide effort. a limitation of our expression model is that each cell is treated as an independent observation. thus, the significance of associations with traits such as sex, age, and smoking status may be inflated, especially where the associations are determined from few donors. in this case, the variation between cells from a single donor dominates the variation between donors, background variation is underestimated, and effect significance can be overestimated. aggregating many datasets allows us to counteract this effect, yet p-value inflation may occur in cell types that are not as commonly shared across datasets. our main conclusions are drawn on airway epithelial and at cells, which are distributed widely across datasets and are modeled on the basis of many donors. furthermore, we have confirmed significant associations by pseudobulk analysis and by holding out datasets. this confirmation ensures that associations are consistent when only donor variation is considered, and it reveals when an association is dataset dependent (often when one dataset is a particularly major source of a given cell type). models that account for both single-cell count distributions and population structure in the data have the potential to improve future meta-analyses across single-cell atlases. 
having a cell type annotation with consistent resolution across datasets was instrumental for analyzing the association with clinical covariates up to the resolution of basal, secretory, or multiciliated cells. these cell type labels still aggregate over considerable diversity, which is the subject of ongoing scientific research. importantly, the labeled subtypes of these cell clusters differ between datasets. thus, in those cases where individual associations depend on a particular dataset not being held out, it may be the case that these associations become more robust at a higher level of cell type annotation. future high-resolution cell annotation efforts have the potential to further consolidate our single-cell meta-analysis results. in addition to modeling associations between gene expression and clinical co-variates, we also examined whether the proportion of ace + tmprss + cells per sample is associated with age or sex. while we can observe a trend of double positive cell proportions increasing with age (extended data fig. a) , the high compositional diversity across samples and studies (fig. a, extended data fig. ) , the potential confounders (total counts, dataset), and limited sample numbers are prohibitive to modeling these associations. further metadata that describe the sample diversity such as harmonized annotations on anatomical location, sampling methods, and sample processing can help to capture this heterogeneity in ace + tmprss + cell proportion models. the expression of ace and tmprss in lung, nasal and gut epithelial cells is associated with expression programs with many shared features, involving key immunological genes and genes related to viral infection, raising many hypotheses for future studies, especially as more patient tissue samples are analyzed in the coming months. 
in the lung, epithelial cells express il , il r and il st, which raises the hypothesis that infection may trigger cytokine expression from these cells and contribute to uncontrolled immunological responses. the immune-like programs in these cells are further reinforced by the accessibility of stat and irf binding sites in scatac-seq data, consistent with another study from our network showing the role of interferon in regulating ace expression in epithelial cells . notably, scrna-seq analysis of immune cells from bronchoalveolar lavage fluid of covid- patients identified high activity of transcription factors such as stat / and irf / / / / / in macrophage states increased in severe covid- patients . other hypotheses for future studies include lysosomal genes in dual positive lung secretory and multiciliated cells, which may be consistent with putative "viral entry" cells, and ripk expression in the cell programs of airway cells, which opens the hypothesis of necroptosis initiating a pro-inflammatory response. interestingly, we observed relatively high enrichment of ace in secretory cell types (mucous cells and at cells). we speculate that viruses may take advantage of the rich secretory pathway components in these cells for their efficient dispersal. additionally, smgs of the airways are recently shown to serve as reservoirs of reserve stem cells , . therefore, we also speculate that smgs similarly may serve as reservoirs for viruses where they can escape from muco-ciliary transport and mechanical expulsion associated with severe cough in the airway luminal surface. the gene programs of at cells can also contribute to cross talk with alveolar macrophages. our cell-cell interaction analysis suggests that at cells engage with alveolar macrophages through oncostatin m, csf, il and complement pathways, suggesting therapeutic hypotheses. the complement pathway is particularly intriguing in the context of covid- . 
first, viral protein glycosylation is a known trigger for the lectin pathway (lp) of the proteolytic complement cascade, such that in addition to classical complement activation via antibody complexes, other coronavirus glycoproteins can be recognized by lp-inducing host collectin proteins , . moreover, excessive complement activation resulting in acute lung injury and cytokine storms were also implicated in the pathogenesis of sars , and complement inhibition using the anti-c antibody eculizumab is currently evaluated as anti-inflammatory experimental emergency treatment for severe covid- in clinical trials (clinicaltrials.gov identifier nct ). in our analysis, multiple complement pathway proteases (e.g. c r, c , cfi) are co-expressed with ace across different lung cell subsets (fig. f,g, extended data fig. ) and complement inhibitory factor cd and complement protease c were preferentially expressed by ace + tmprss + dps within lung tissue, in multiciliated and secretory cells, respectively (fig. a,c, supplementary table , supplementary tables , ) . moreover, cell-cell interaction analysis predicted cross-talk between at cells expressing complement proteins c and c and macrophages expressing cognate receptors. at cell expression of negative complement regulators cfi and cd might represent a strategy for sars-cov- to at least partially escape complement surveillance. finally, to explore therapeutic hypotheses related to disruption of viral processing via protease inhibition, we explored the expression of other proteases across our integrated atlases. although a multitude of different sars-cov- features likely account for its high pathogenicity and transmissivity, it has been speculated that the prrar loop might contribute to increased covid- severity. introduction of similar polybasic cleavage sites into avian influenza viruses and human coronaviruses was shown to render them more pathogenic, increasing mortality and viral spread , . 
one hypothesis is that acquisition of a pcsk cleavage site would expand the number of cell types that can be directly infected by sars-cov- . a recent report has started to address expression of furin in cells expressing sars-cov- host factors ace or tmprss , and furin activity is inhibited by guanylate-binding proteins (gbps), a group of interferonstimulated genes, in order to restrict viral envelope processing . however, the highly overlapping recognition sequence of pcsk family members (extended data fig. d) suggests that multiple pcsks in addition to furin could mediate cleavage at the s /s prrar motif (extended data fig. e) . our expression analysis confirms that pcsk family members, in particular furin, pcsk and pcsk , are more broadly expressed than tmprss across lung cell types (fig. d) , as well as across tissues (extended data fig. i) . in the lung, we note the higher proportion of ace + pcsk + basal cells and ace + pcsk / + fibroblasts (fig. e, extended data fig. h) . interestingly, the host interactome of sars-cov- further suggests interaction of viral proteins with pcsk , which also showed significant ace co-expression in at cells (fig. b, extended data fig. b) . moreover, because pcsk localization is detected in different membrane compartments along the secretory and endocytic pathways , it is conceivable that pcsks could process sars-cov- s-proteins at different stages of the viral life cycle. moreover, further analysis is required to assess the extent to which sars-cov- relies on proteolytic activity provided in trans either by neighboring cells or extracellularly localized proteases . altogether, this could provide sars-cov- with an immense flexibility in different entry and egress pathways. taken together, our analyses provide a rich molecular and cellular map as context for the transmission, pathogenesis, clinical associations, and therapeutic hypotheses for covid- . 
as new single cell atlases are generated from covid- tissues and experimental models, they will help further advance our understanding of this disease.

sample collection underwent irb review and approval at the institutions where the samples were originally collected. "adipose_healthy_manton_unpublished" was collected under irb p / (orsp- ). tissue samples from breast, esophagus muscularis, esophagus mucosa, heart, lung, prostate, skeletal muscle and skin referred to as "tissue_healthy_regev_snrna-seq_unpublished" were collected under orsp- .

samples (supplementary table , )

publicly available single-cell rna-seq datasets were downloaded from gene expression omnibus (geo) . we searched geo for datasets that met all of the following three criteria: ( ) provided unnormalized count data; ( ) were generated using the x genomics chromium platform ; and ( ) profiled human samples. these tissue samples spanned a wide range, including primary tissues, cultured cell lines, and chemically or genetically perturbed samples. applying these filters increases standardization of samples, as the vast majority were prepared using the same x chromium instrument and cell ranger pipelines. datasets comprise one or more samples (individual gene expression matrices), which often correspond to individual experiments or patient samples. in total, this yielded , , cells from samples from distinct datasets (supplementary table ) . to allow comparison across samples and datasets, we mapped genes through a common dictionary of gene symbols and excluded unrecognized symbols. if a gene from an aggregated master list was not found in a sample, the expression was considered to be zero for every cell in that sample. after all datasets were collected, we quantified the percentage of cells with > umis for both ace and tmprss or ace and ctsl. for further analyses with broad cell classes, we only used datasets with more than double positive cells yielding , cells from samples. 
for integration across datasets, we used two levels of annotations. when possible, every sample was annotated with its tissue of origin based on the available metadata from geo. we excluded any sample for which tissue was not specified. for the smaller subset of , cells we then manually annotated cell clusters with broad cell type classes using marker genes. these clusters were generated using the harmony-pytorch python implementation (version . . ; https://github.com/lilab-bcb/harmony-pytorch ) of the harmony scrna-seq integration method for batch correction and leiden clustering from the scanpy package (version . . ) . clusters without clear markers distinguishing types were excluded from further analysis. data were processed using scanpy. individual datasets were normalized as log (umis/ , + ), i.e. by column sum followed by the log p function (ln( , * gij + )), where gij is the umi count of gene i in cell j divided by the sum of all umi counts for cell j. this data normalization step was only used for generating the clusters and cell type annotations. all other statistical tests for the integrated analysis were performed on the cell's binary classification as a double positive or not. for example, a cell is considered ace + if it has > ace transcripts. double positive cells have > transcripts for both genes of interest. we used fisher's exact test to test for statistical dependence between the expression of ace and tmprss or ctsl and corrected for multiple testing via benjamini-hochberg over all tests for each gene pair. we compiled a compendium of published and unpublished datasets consisting of , , cells from tissues and/or organs including adipose, bone marrow, brain, breast, colon, cord blood, enteric nervous system, esophagus mucosa, esophagus muscularis, anterior eye, heart, kidney, liver, lung, nasal, olfactory epithelium, pancreas, placenta, prostate, skeletal muscle and skin. 
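the binarized co-expression test described above (fisher's exact test on double-positive status, followed by benjamini-hochberg correction) can be sketched as follows. this is a minimal, standard-library illustration and not the authors' code; the function names `coexpression_table`, `fisher_exact_two_sided`, and `benjamini_hochberg` are hypothetical, and the two-sided alternative is an assumption, since the text does not state which alternative was used.

```python
from math import comb


def coexpression_table(gene_a_pos, gene_b_pos):
    """Count cells into a 2x2 table from per-cell boolean flags.

    Returns (both positive, A only, B only, neither).
    """
    a = b = c = d = 0
    for x, y in zip(gene_a_pos, gene_b_pos):
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x cells in the top-left corner.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

in practice one would likely call an established implementation such as `scipy.stats.fisher_exact`; the hand-written version is only meant to make the test explicit.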
after the harmonization of cell type annotations, ace -tmprss and ace -ctsl coexpression were assessed using a logistic mixed effect model: where yi was the binarized expression level of either tmprss or ctsl, and covariates were binarized ace expression in cell i and a sample-level random intercept. models were fit separately for each cell type in each dataset. in order to avoid spurious associations in cell types with very few ace + cells and due to very low expression of ace , we subsampled ace cells to the number of ace + cells within each cell type and discarded cell types containing fewer than cells expressing either ace or fewer than cells expressing the other gene being tested after the subsampling procedure. the significance of the association between ace and tmprss /ctsl is controlled for % fdr using the statsmodels python package (version . . ) . data processing was performed using scanpy python package (version . . ) and logistic models were fit using lme r package (version . . ) . library generation and sequencing. libraries were generated using the x chromium controller and the chromium single cell atac library & gel bead kit (# ) according to the manufacturer's instructions (cg -rev c; cg -rev b) with unpublished modifications relating to cell handling and processing. briefly, human lung derived primary cells were processed in . ml dna lobind tubes (eppendorf), washed in pbs via centrifugation at g, min, c, lysed for min on ice before washing via centrifugation at g, min, c. the supernatant was discarded and lysed cells were diluted in x diluted nuclei buffer ( x genomics) before counting using trypan blue and a countess ii fl automated cell counter to validate lysis. if large cell clumps were observed, a µm flowmi cell strainer was used prior to the tagmentation reaction, followed by gel bead-in-emulsions (gems) generation and linear pcr as described in the protocol. 
after breaking the gems, the barcoded tagmented dna was purified and further amplified to enable sample indexing and enrichment of scatac-seq libraries. the final libraries were quantified using a qubit dsdna hs assay kit (invitrogen) and a high sensitivity dna chip run on a bioanalyzer system (agilent). all libraries were sequenced using nextseq high output cartridge kits and a nextseq sequencer (illumina). x scatac-seq libraries were sequenced paired end ( x cycles). initial data processing and qc. fastq files were demultiplexed using x genomics cellranger atac mkfastq (version . . ). we obtained peak-barcode matrices by aligning reads to grch (cr v . . pre-built reference) using cellranger atac count. peak-barcode matrices from six channels were normalized per sequencing depth and pooled using cellranger atac aggr. the aggregated, depth-normalized, filtered dataset was analyzed with signac (v . . , https://github.com/timoast/signac), a seurat extension developed for the analysis of scatacseq data. all the analyses in signac were run with a random number generator seed set as . cells that appeared as outliers in qc metrics (peak_region_fragments ≤ or peak_region_fragments ≥ , or blacklist_ratio ≥ . or nucleosome_signal ≥ or tss.enrichment ≤ ) were excluded from the analysis. normalization and dimensionality reduction. the aggregated dataset was processed with latent semantic indexing , i.e. datasets were normalized using term frequency-inverse document frequency (tf-idf), then singular value decomposition (svd), ran on all binary features, was used to embed cells in low-dimensional space. uniform manifold approximation and projection (umap) was then applied for visualization, using the first dimensions of the svd space. gene activity matrix and differential motif activity analysis. 
a gene activity matrix was calculated as the chromatin accessibility associated with each gene locus (extended to include kb upstream of the transcription start site), as described in the vignette 'analyzing pbmc scatac-seq' (version: march , , https://satijalab.org/signac/articles/pbmc_vignette.html), using as gene annotation the genes.gtf file provided together with cellranger's atac grch - . . reference genome. clusters were annotated using label transfer from matching scrna samples or by literature / expert search of marker "active" (i.e. accessible) genes. differential motif activity analysis was performed using signac's implementation of chromvar , with motif position frequency matrices from jaspar (http://jaspar.genereg.net/) selecting transcription factor motifs from human (species= ), broadly following the vignette 'motif analysis with signac' (https://satijalab.org/signac/articles/motif_vignette.html). cells were identified as positive for ace and/or tmprss (i.e. with the loci accessible) if at least one fragment overlapped the gene locus or kb upstream. differential activity scoring between epithelial cells positive for ace (with the above-mentioned definition of 'positive') and epithelial cells negative for ace was performed with the findmarkers function of seurat (version . . ), using as test 'lr' (i.e. logistic regression) and as latent variable the number of counts in peaks. the following publicly available bulk-rnaseq datasets were obtained from the encode database: lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid ; lid , generated by the gingeras lab . these fastq datasets were aligned to the mm annotation build using star and processed using the standard tuxedo suite to yield a normalized fpkm matrix. 
immunohistochemistry analysis was performed on % pfa fixed, oct embedded tissue sections from human explant donors. briefly, cells were permeabilized with % triton-x in pbs. slides were either not treated with antigen retrieval or antigen retrieval was performed with the tris-citrate buffer as needed (supplementary table ). slides were incubated overnight with primary antibodies at indicated concentrations (supplementary table ) in donkey serum with % triton-x in pbs. slides were treated with alexa-fluor secondary antibodies mixed with dapi at : concentration in donkey serum with % triton-x in pbs for hour at room temperature. slides were mounted and imaged on a confocal microscope. proximity ligation in situ hybridization (plish) was performed as described previously . briefly, frozen human trachea and distal lung sections were fixed with . % paraformaldehyde for min, treated with protease ( μg/ml proteinase k for lung or pepsin for trachea for min) at °c, and dehydrated with up-series of ethanol. the sections were incubated with gene-specific oligos (supplementary table ) in hybridization buffer ( m sodium trichloroacetate, mm tris [ph . ], mm edta, . mg/ml heparin) for h at °c. common bridge and circle probes were added to the section and incubated for h followed by t ligase reaction for h. rolling circle amplification was performed by using phi polymerase (# , lucigen) for hours at °c. fluorophore-conjugated detection probe was applied and incubated for min at °c followed by mounting in medium containing dapi. to assess the association of age, sex, and smoking status with the expression of ace , tmprss , and ctsl, we aggregated scrna-seq datasets of healthy human nasal and lung cells, as well as fetal samples. aggregation of these datasets was enabled by harmonizing the cell type labels of individual datasets within scanpy (version . . . ). 
we harmonized annotations together with data contributors using a preliminary ontology generated on the basis of published datasets [ ] [ ] [ ] [ ] with levels of annotations (level -lowest resolution; supplementary table ) . we further harmonized metadata by collapsing the smoking covariate into "has smoked" and "has never smoked" and by taking mean ages where only age ranges were given. this endeavor produced a dataset of , , cells in samples from donors (supplementary data d ) . we divided the data into fetal ( , to get an overview of sample diversity, we clustered the samples using the proportion of cells in level cell types as features. clustering was performed using louvain clustering (resolution . ; louvain package version . . ) on a knn-graph (k= ) computed on euclidean distances over the top principal components of the cell type proportion data within scanpy. this produced five clusters. sample cluster labels were assigned based on metadata for anatomical location that was obtained from the published datasets and via input from the data generators. within adult datasets we modeled the association of age, sex, and smoking status with gene expression for ace , tmprss , and ctsl using a generalized linear model with the log total counts per cell as offset and poisson noise as implemented in statsmodels (version . . ) and using a wald test from diffxpy (www.github.com/theislab/diffxpy; version . . , batchglm version . . ). specifically we used the model: here, " denotes the raw count expression of gene i in cell j and age:sex, sex:smoking, and age:smoking are interaction terms of the three modeled covariates. these terms model whether there is a difference in the smoking effect in men and women, and likewise whether the age effect is different for smokers and non-smokers. while we model these interaction terms, we only tested age, sex, and smoking effects individually to reduce the multiple testing burden. 
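the model formula itself did not survive in this text. from the terms listed above (poisson noise, a log total counts offset, a dataset term, the three covariates, and their three pairwise interactions), it plausibly has the following form. this is a reconstruction under stated assumptions rather than the authors' notation: the coefficient symbols, the index d(j) for the dataset of cell j, and the symbol N_j for the scaled total counts of cell j are all introduced here for illustration.

```latex
\begin{aligned}
Y_{ij} &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= \log N_j
  + \beta_{i,0}
  + \beta_{i,d(j)}^{\mathrm{dataset}}
  + \beta_i^{\mathrm{age}}\,\mathrm{age}_j
  + \beta_i^{\mathrm{sex}}\,\mathrm{sex}_j
  + \beta_i^{\mathrm{smoking}}\,\mathrm{smoking}_j \\
 &\quad
  + \beta_i^{\mathrm{age:sex}}\,\mathrm{age}_j\,\mathrm{sex}_j
  + \beta_i^{\mathrm{sex:smoking}}\,\mathrm{sex}_j\,\mathrm{smoking}_j
  + \beta_i^{\mathrm{age:smoking}}\,\mathrm{age}_j\,\mathrm{smoking}_j
\end{aligned}
```

here Y_ij is the raw count of gene i in cell j, and the offset log N_j enters with a fixed coefficient of one, so the remaining terms describe expression relative to sequencing depth.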
we included the dataset term to model the batch effects between the diverse datasets we obtained, and the log total counts per cell was used as an offset. here, the total counts were scaled to have a mean of across all cells before the log was taken. in order to fit this model we pruned the data to contain only datasets that have at least donors and for which smoking status metadata was provided. this resulted in a dataset of , cells and samples from donors for adult lung data. only a single dataset remained for adult nasal data after this filtering on which the model could not be fit. to obtain cell-type specific associations the above model was fit within each cell type for all cell types with at least , cells. we performed wald tests over the age and sex covariates independently and corrected for multiple testing via benjamini-hochberg over all tests within a cell type. as metadata on smoking status was only available for a subset of the data, we also fitted a simpler model on a larger dataset to confirm sex and age associations. the simplified model: was fit on , cells in samples from donors of adult lung data. again, log total counts (scaled) was used as an offset. the above models treat cells as independent observations and thus model cellular and donor variation jointly. as donor variation tends to be larger than single-cell variation, when most cells come from few donors (either there are few donors, or few donors contribute most of the cells), this can lead to an inflation of p-values. to counteract this effect, we verified that significant associations are consistent when modeling only donor variation via pseudo-bulk analysis, and we tested whether effects are dependent on few donors by holding out datasets. pseudo-bulk data was generated by computing the mean for each gene expression value and numi covariate for cells in the same cell type and donor. after filtering as described above, models ( ) and ( ) were fit to the data. 
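the pseudo-bulk step described above amounts to a grouped mean over cell type and donor. a minimal sketch, assuming each cell is represented as a dictionary with the hypothetical keys `cell_type`, `donor`, `count` (raw counts of one gene), and `numi`:

```python
from collections import defaultdict


def pseudobulk_means(cells):
    """Average gene counts and nUMI over cells sharing cell type and donor.

    `cells` is an iterable of dicts with the (hypothetical) keys
    'cell_type', 'donor', 'count', and 'numi'.
    """
    # Per group: [sum of counts, sum of nUMI, number of cells].
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cell in cells:
        acc = sums[(cell["cell_type"], cell["donor"])]
        acc[0] += cell["count"]
        acc[1] += cell["numi"]
        acc[2] += 1
    return {
        key: {
            "mean_count": total / n,
            "mean_numi": numi / n,
            "n_cells": n,
        }
        for key, (total, numi, n) in sums.items()
    }
```

each resulting row then stands for one donor within one cell type, so a model fit on these rows reflects donor-level variation only.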
in contrast to the single-cell model, pseudo-bulk analysis underestimates certainty in modeled effects, as uncertainty in the pseudo-bulk means is not taken into account. thus, we used only effect directions from pseudo-bulk analysis to validate single-cell associations. we regarded as significant only those associations where the fdr-corrected p-value in the single-cell model is below . , and the sign of the estimated effect is consistent in both the single-cell and the pseudo-bulk analysis. we further separated significant associations into robust trends and indications depending on the holdout analysis. a significant association was regarded as a robust trend if the effect direction is consistent when holding out any dataset when fitting the model, irrespective of p-value. in the case that holding out one dataset caused the maximum likelihood estimate of the coefficient to be reversed, we denote this as the effect no longer being present, which characterized the association as an indication. for each of the lung, nasal, and gut datasets, we labeled the cells with non-zero counts for both ace and tmprss as dual-positive cells (dps), and the cells with zero counts for both ace and tmprss as dual-negative cells (dns). within each tissue, we identified cell types with greater than dps, and for each of these cell types, we selected the genes with increased expression (log fold change greater than ) in dps vs dns (so that we focus on important "positive" features). we trained a classifier with a : train:test split to classify the dps from dns within each of these cell types using the sklearn (version . . ) randomforestclassifier function with the following parameters: n_estimators set to , the criterion as gini, and the class_weight parameter set to balanced_subsample. 
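the grading of associations described earlier in this section (significant, then robust trend versus indication) reduces to a small decision rule. the sketch below is illustrative only: the function name `grade_association` is hypothetical, and the default `alpha=0.05` is a placeholder, because the actual threshold is not legible in this text.

```python
def grade_association(fdr, sc_effect, pb_effect, holdout_effects, alpha=0.05):
    """Classify one covariate-gene association.

    fdr             -- FDR-corrected p-value from the single-cell model
    sc_effect       -- effect estimate from the single-cell model
    pb_effect       -- effect estimate from the pseudo-bulk model
    holdout_effects -- single-cell estimates, one per held-out dataset
    alpha           -- placeholder significance threshold (assumption)
    """
    # Significant: small FDR and the same sign in both analyses.
    same_sign = sc_effect * pb_effect > 0
    if not (fdr < alpha and same_sign):
        return "not significant"
    # Robust trend: the direction survives holding out any dataset.
    if all(effect * sc_effect > 0 for effect in holdout_effects):
        return "robust trend"
    # Otherwise at least one holdout reversed the estimated direction.
    return "indication"
```

note that only the sign of the holdout estimates is used, matching the statement that consistency is judged irrespective of p-value.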
we first trained individual classifiers separately for each of the cell types, and pooled genes with positive feature importance values (using the feature_importances_ attribute of the trained randomforestclassifier object) to train a final dp vs dn classifier across each tissue. we used the top genes, as ranked by their feature importance scores, to define the signature for the gene expression program of dps for the tissue. this procedure was carried out in lung, nasal, and gut datasets, yielding tissue-specific signatures for gene expression programs of dps from each tissue. for visualization purposes only, we generated network diagrams using the networkx (version . ) tool with the forceatlas graph layout algorithm . we scored genes that appeared in signatures for multiple tissues by their aggregated feature importance (using a plotting heuristic that used the sum of importance ranks for genes in individual tissues and assigned a large valued rank ( ) to a gene that did not appear in a particular tissue) and selected the top genes that were shared by each pair of tissues or shared by all tissues, along with additional genes that included the ones unique to each tissue's signature, to plot in the network visualization. the go terms enriched in the gene expression programs shared by dps across tissues were found using gprofiler (version . . ) using the scanpy.queries.enrich tool. this analysis was performed in two ways: on the original data, as well as after accounting for differences in distribution of the number of umis (numi) per cell between dps and dns. this was done by binning the numi distribution in the dps for each tissue into bins and then randomly sampling from the numi distribution for the dns in each bin to match the distribution of the dps in that bin. the numi distributions before and after the matching are shown in extended data fig. b . in parallel, we used a regression framework to recover gene modules enriched in dp vs. dn cells (fig. 
c,d, extended data fig. a,b) in the nasal, lung, and gut datasets. we first restricted our analysis to cell subsets derived from at least two donor individuals that each contained a mixture of dn and dp cells (nawijn nasal: multiciliated, goblet; regev/rajagopal lung: at , at , basal, multiciliated, secretory; aggregated lung: at , multiciliated, secretory; regev/xavier colon: best + enterocytes, cycling ta (transit amplifying), enterocytes, immature enterocytes , ta- ). for each of these cell subsets, we then used mast (version . . ) to fit the following regression model to every gene with cells as observations: where yi is the expression level of gene i in cells, measured in units of log (tp k+ ), x is the binary co-expression state of each cell (i.e. dp vs. dn), and s is the donor that each cell was isolated from. to control for donor-specific effects (i.e. batch effects), we used a mixed model with a random intercept that varies for each donor. to fit this model, we subsampled cells from dp and dn groups to ensure that both the donor distribution and the cell complexity (i.e. the number of genes per cell) were evenly matched between the two groups, as follows. first, for each subset, we restricted our analysis to donors containing at least two dn and two dp cells. using these samples, we partitioned the cells into equally-sized bins based on cell complexity and subsampled dn cells from each bin to match the cell complexity distribution of the dp cells. finally, we fit the mixed model (above), controlling for both donor and cell complexity. to build gene modules for dp cells, we prioritized genes by requiring that they be expressed in at least % of dp cells, and to have a model coefficient greater than with an fdr-adjusted pvalue less than . (for the combined coefficient in the hurdle model). after this filtering step, genes were ranked by their model coefficient (i.e. estimated effect size). 
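the complexity-matched subsampling of dn cells described above can be sketched as below. this is an illustration under assumptions, not the authors' procedure: "equally-sized bins" is read here as quantile bins over the pooled cells, `n_bins=10` is a placeholder, and the function name `match_complexity` and the `(cell_id, n_genes)` input format are hypothetical.

```python
import bisect
import random
from collections import Counter, defaultdict


def match_complexity(dp, dn, n_bins=10, seed=0):
    """Subsample DN cells to match the complexity distribution of DP cells.

    dp, dn -- lists of (cell_id, n_genes) tuples
    n_bins -- placeholder number of quantile bins (assumption)
    Returns the ids of the selected DN cells.
    """
    rng = random.Random(seed)
    pooled = sorted([c for _, c in dp] + [c for _, c in dn])
    # Interior bin edges at quantiles of the pooled complexity values.
    edges = [pooled[(len(pooled) * k) // n_bins] for k in range(1, n_bins)]

    def bin_of(value):
        return bisect.bisect_right(edges, value)

    dp_per_bin = Counter(bin_of(c) for _, c in dp)
    dn_by_bin = defaultdict(list)
    for cell_id, c in dn:
        dn_by_bin[bin_of(c)].append(cell_id)

    selected = []
    for b, n_dp in dp_per_bin.items():
        pool = dn_by_bin.get(b, [])
        # Draw as many DN cells as there are DP cells in this bin,
        # capped by how many DN cells the bin actually holds.
        selected.extend(rng.sample(pool, min(n_dp, len(pool))))
    return selected
```

when applied within each donor separately, the same routine would also keep the donor distribution matched, as the text requires.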
the top genes were selected for network visualization within each cell type (fig. c,d, extended data fig. a,b) . in three cases (gut cycling ta, ta- and best + cells), rp -* antisense genes were flagged and excluded from visualizations. to visualize overlap across each network, we indicated whether each gene was among the top genes from each of the other cell types. putative drug targets were identified by querying the drugbank database . gene set enrichment analysis was performed using the r package enrichr (version . ) , selecting the top genes from each cell type for the pan-tissue analysis ("all" category; fig. e ), and the top genes from each cell type for the tissue-specific analyses ("gut", "nasal", and "lung" categories; fig. e ). we note a few caveats/challenges/limitations that may influence our results, including non uniform sampling across donors; variation in cell compositions across regions (e.g., distal lung vs carina), and additional cellular heterogeneity that the current level of broad subset annotation may not have been captured. cellphonedb v. . . was run with default parameters on the human lung samples of the regev/rajagopal dataset, analyzing the cells from each dissected region separately. for each sample (patient/location combination), for each cell type we distinguished double positive cells (ace > and tmprss > ) from all others. only interactions highlighted as significant, i.e. present in the "significant means" output from cellphonedb were considered. ace -protease co-expression (figure , extended data fig. ) and ace -il /il r/il st coexpression (extended data fig. ) were tested via the logistic mixed-effects model described in "integrated co-expression analysis of high resolution cell annotations across tissues" (equation , above). 
data and an interactive analysis examining the co-expression of genes across datasets can be accessed via the open-source data platform terra at https://app.terra.bio/#workspaces/kcoincubator/covid- _cross_tissue_analysis. interactive visualization and download of gene expression data can be accessed on the single cell portal at https://singlecell.broadinstitute.org/single_cell?scpbr=hca-covid- -integrated-analysis

n.k. was a consultant to biogen idec, boehringer ingelheim, third rock, pliant, samumed, numedii, indaloo, theravance, lifemax, three lake partners, and optikira, and received non-financial support from miragen, all of these outside the work reported. j.l. is a scientific consultant for x genomics inc. a.r. is a co-founder and equity holder of celsius therapeutics, an equity holder in immunitas, and an sab member of thermofisher scientific, syros pharmaceuticals, asimov, and neogene therapeutics. o.r.r. is a co-inventor on patent applications filed by the broad institute for inventions relating to single cell genomics applications, such as in pct/us / and us provisional application no. / , . a.k.s. reports compensation for consulting and sab membership from honeycomb biotechnologies, cellarity, cogen therapeutics, orche bio, and dahlia biosciences. s.a.t. was a consultant at genentech, biogen and roche in the last three years. f.j.t. reports receiving consulting fees from roche diagnostics gmbh, and ownership interest in cellarity inc. l.v. is a founder of definigen and bilitech, two biotech companies using hpscs and organoids for disease modelling and cell-based therapy.

(c) statistical model. model fitted to the data to assess sex, age, and smoking status associations with expression of the three genes. denotes gene counts and numi denotes the total umi counts per cell. (d) age, sex, and smoking status associations with expression of ace (blue), tmprss (orange), and ctsl (green) in epithelial cells.
effect size (x axis) of the association, in log fold change (sex, smoking status) or slope of log expression with age. colored bars: associations with an fdr-corrected p-value< . , where pseudo-bulk analysis shows a consistent effect direction. error bars: standard errors around coefficient estimates. (e) distribution of ace and tmprss expression across level lung cell types. red shading indicates the main cell types that express ace and tmprss . (f) hold out analysis shows the robustness of associations to holding out a dataset. the values show the number of held-out datasets that result in loss of association between a given covariate (rows) and ace , tmprss , or ctsl expression in a given cell type (columns). robust trends are determined by significant effects that are robust to holding out any dataset ( values). (g) low expression in pediatric samples. mean expression level (log cpm, y axis) of ace (blue), tmprss (orange), and ctsl (green) across age bins (x axis) in at (left) and ciliated (right) cells. pediatric samples: - years. samples from past or current smokers were removed from this plot to avoid smoking confounders. error bars are omitted due to y-axis limitations. they are typically -fold the mean value (supplementary table ). multiciliated and at cells are shown as these cell types are present in fetal data, and show significant age associations with ace expression. (a) gradual increase in ace expression by airway epithelial cell type with age. mean expression (y axis) of ace in different airway epithelial cells (x axis) of mice of three consecutive ages (color legend, upper right). shown are replicate mice (dots), mean (bar), and error bars (standard error of the mean (sem)). (b) increase in proportion of ace + ctsl + goblet and club cells with age. percent of ace + ctsl + cells (x axis) in different airway epithelial cell types (y axis) of mice of three consecutive ages (color legend, upper right). 
shown are replicate mice (dots), mean (bar), and error bars (sem). (c-j) increase in ace expression in secretory cells with smoking. mice were daily exposed to cigarette smoke or filtered air as control for two months after which cells from whole lung suspensions were analyzed by scrna-seq (drop-seq). (c,d) umap of scrna-seq profiles (dots) colored by experimental group (c) or by ace + cells and indicated double positive cells (d). alveolar epithelial cells (at and at ) and airway epithelial secretory and ciliated cells are marked. (e) the relative frequency of ace + cells is increased by smoking in airway secretory cells but not at cells. relative proportion (y axis) of ace + (red) and ace -(grey) cells in smoking and control mice of different cell types (x axis). (f, g) expression of ace is increased in airway secretory cells, but not in at cells. distribution of ace expression (y axis) in secretory (f) and at (g) cells from control and smoking mice (x axis). (h-j) re-analysis of published bulk mrna-seq of lungs exposed to different daily doses of cigarette smoke show increased expression of (h) ace , (i) tmprss , and (j) ctsl after five months of chronic exposure. extended data figure . age, sex, and smoking status associations with expression of ace , tmprss , and ctsl across level cell type annotations. effect size (y axis) of association as log fold changes (sex, smoking status) and slope of log expression with age. bars that are colored in indicate associations with an fdr-corrected p-value of < . where the pseudo-bulk analysis shows a consistent effect direction. error bars represent model uncertainties. extended data figure . age, sex, and smoking status associations with expression of ace , tmprss , and ctsl across level cell type annotations. effect size (y axis) of the association as log fold changes (sex, smoking status) and slope of log expression with age. bars that are colored in indicate associations with an fdr-corrected p-value of < . 
where the pseudo-bulk analysis shows a consistent effect direction. error bars represent model uncertainties.

fig. . cell programs for dual positive cells. (a,b) top genes from each cell program recovered for different lung (a) or gut (b) epithelial cell types (nodes, colors). colored concentric circles: overlap with a gene in the top significant genes in other cell types. ace and tmprss are included even if not among the top . (c) comparison of signature scores of cell programs between dp and dn cells for each cell type, stratified by gene complexity bin. cells were partitioned into gene complexity bins for every cell type. (d,e) expression of il and its receptors in specific cell types in lung and heart. (d) significance (dot size) and fold change (dot color) of differential expression between dp and dn cells within different types (rows) for il and its receptors il r and il st (columns) across tissues. (e) top: significance (dot size) and fold change (dot color) of differential expression between dp and dn cells within different cell types in the heart (rows) for il and its receptors il r and il st (columns). bottom: significance (dot size) and effect size (dot color) from a mixed effects model of co-expression of il , il r, or il st (columns) with ace . (f) distribution of number of counts in peaks (y axis) in ace + epithelial cells (having at least fragment in the ace gene locus) and ace cells.

figure . co-expression of ace and il , il r, il st. co-expression of ace and il , il r, il st in select single-cell datasets. p-values and significance (fdr %) derived from the logistic mixed-effects model.

figure . expression of ace , tmprss and ctsl in mouse placenta. umap embedding (as in fig. k ) of scrna-seq profiles of placenta cells collected at e. . (top) or along a time course (bottom), colored by expression level of ace , tmprss , and ctsl.

figure . additional analyses to identify other proteases that may have a role in infection.
(a) multiple proteases are co-expressed with ace in another human lung scrna-seq ("aggregated lung"). scatter plot of significance (y axis, -log (adjusted p value)) and effect size (x axis) of co-expression of each protease gene (dot) with ace within each indicated epithelial cell type (color). dashed line: significance threshold. tmprss and pcsks that significantly coexpressed with ace are marked. (b) ace -protease co-expression with pcsks, tmprss and ctsl across lung cell types ("aggregated lung"). significance (dot size, -log (adjusted p value)) and effect size (color) for co-expression of ace with selected proteases (columns) across cell types (rows). (c-d) predicted cleavage sites in the sars-cov- s-protein s /s region. (c) multiple amino acid sequence alignment of sars-cov- s-protein s /s region with orthologous sequences from other betacoronaviruses (top) and polybasic cleavage sites of other human pathogenic viruses (bottom).
key: cord- -jd ako c authors: kang, sisi; yang, mei; he, suhua; wang, yueming; chen, xiaoxue; chen, yao-qing; hong, zhongsi; liu, jing; jiang, guanmin; chen, qiuyue; zhou, ziliang; zhou, zhechong; huang, zhaoxia; huang, xi; he, huanhuan; zheng, weihong; liao, hua-xin; xiao, fei; shan, hong; chen, shoudeng title: a covid- antibody curbs sars-cov- nucleocapsid protein-induced complement hyper-activation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jd ako c although human antibodies elicited by severe acute respiratory distress syndrome coronavirus- (sars-cov- ) nucleocapsid (n) protein are profoundly boosted upon infection, little is known about the function of n-directed antibodies. herein, we isolated and profiled a panel of n protein-specific monoclonal antibodies (mab) from a quick recovery coronavirus disease- (covid- ) convalescent, who had dominant antibody responses to sars-cov- n protein rather than to spike protein. the complex structure of n protein rna binding domain with the highest binding affinity mab ncov reveals the epitopes and antigen’s allosteric changes. functionally, a virus-free complement hyper-activation analysis demonstrates that ncov specifically compromises n protein-induced complement hyper-activation, a risk factor for morbidity and mortality in covid- , thus paving the way for functional anti-n mabs identification. one sentence summary b cell profiling, structural determination, and protease activity assays identify a functional antibody to n protein. mechanics despite the severity of hypoxemia ( ). it is reported that complement-mediated thrombotic microvascular injury in the lung may contribute to atypical ards features of , accompanied by extensive deposition of the alternative pathway (ap) and lectin pathway (lp) complement components( ).
indeed, complement activation is found in multiple organs of severe covid- patients in several other studies ( , ), as well as in patients with severe acute respiratory distress syndrome (sars) ( , ). a recent retrospective observational study of , patients revealed that complement disorder is associated with morbidity and mortality of . although systemic activation of complement plays a pivotal role in protective immunity against pathogens, hyper-activation of complement may lead to collateral tissue injury. severe acute respiratory distress syndrome-associated coronavirus- (sars-cov- ) nucleocapsid (n) protein is a highly immunopathogenic and multifunctional viral protein ( ) ( ) ( ) ( ) ( ) ( ) , which elicits high titers of binding antibodies in humoral immune responses ( ) ( ) ( ) . a recent preprint study found that sars-cov- n protein bound to the lp complement component masp- (mannan binding lectin- associated serine protease- ), resulting in complement hyper-activation and aggravated inflammatory lung injury ( ) . several studies have reported the isolation of human monoclonal antibodies (mabs) targeting sars-cov- spike (s) protein, shedding light on the development of therapeutic interventions of [ ] [ ] [ ] [ ] [ ] . however, little is known about the potential therapeutic applications of n protein-targeting mabs in the convalescent b cell repertoire. herein, we report a human mab derived from a covid- convalescent that specifically targets sars-cov- n protein and functionally compromises complement hyper-activation ex vivo.

isolation of n protein-directed mabs

to profile the antibody response to sars-cov- n protein in early recovered patients, we collected six convalescent blood samples at seven to days after the onset of disease symptoms. all patients had recovered from covid- during the outbreak in zhuhai, guangdong province, china, with ages ranging from to years old ( table s ).
the sars-cov- nasal swab reverse transcription-polymerase chain reaction (rt-pcr) tests were confirmed to be negative at the time of blood collection for all six covid- patients. plasma samples and peripheral blood mononuclear cells (pbmc) were isolated for serological analysis and antibody isolation. serum antibody titers to sars-cov- s and n proteins were measured by enzyme-linked immunosorbent assays (elisa) (fig. a , b, table s ). serologic analysis demonstrated that serum antibody titers to the n protein were substantially higher than those to the s protein in most of the patients. for example, zd and zd had only minimal levels of antibody response to the s protein, while they had much higher antibody titers to the n protein. notably, the time from disease onset to complete recovery from clinical symptoms for covid patient zd was only days (table s ). because patient zd was still in the early recovery phase, with a high likelihood of a high percentage of antigen-specific plasma cells, single plasma cells ( fig. c) with the phenotype cd -/cd -/cd -/cd a -/cd + /cd low-neg /cd hi /cd hi , as well as antigen-specific memory b cells with the phenotype cd + /cd + (fig. d), were sorted from pbmc of patient zd by fluorescence-activated cell sorting (facs). to ensure an unbiased assessment, the sorting of antigen-specific memory b cells was carried out with combined probes of both fluorophore-labeled s and n recombinant proteins. variable regions of immunoglobulin (ig) heavy- and light-chain gene segment (v h and v l ) pairs from the sorted single cells were amplified by rt-pcr, sequenced, annotated and expressed as recombinant mabs using methods described previously ( ). recombinant mabs were screened against sars-cov- s and n proteins. in total, we identified mabs that reacted with sars-cov n protein, including mabs from plasma cells and mabs from memory b cells (table s ). we found that igg is the predominant isotype at .
% followed by igg ( . %), iga ( . %), igg ( . %) and igm ( . ) (fig. e ). v h gene family usage in sars-cov n protein-reactive antibodies was . % v h , . % v h , . % v h , . % v h and . % v h , respectively (fig. f), which was similar to the distribution of v h families collected in the ncbi database. nine of the sars-cov- n protein-reactive antibodies had no mutation from their germline v h and v h gene segments (fig. f, table s ). the average mutation frequency of the remaining mutated antibodies was . % (+/- . %) in v h and . % (+/- . %) in v l . consistent with the lower serum antibody titers to sars-cov- s protein, we identified only eight sars-cov- s protein-reactive mabs, including antibodies from plasma cells and three antibodies from memory b cells. the v h gene segments of the s protein-reactive antibodies had either no mutation ( / ) or minimal mutation ( / ) (fig. g ). there were no significant differences in complementarity-determining region (cdr ) length in amino acid residues between the n- fig. a) . among mabs binding to nfl; antibodies bound to n-ntd; one antibody bound to n-ctd (fig. b). a total of nine antibodies, including one antibody (ncov ) recognizing n-ctd, seven mabs binding n-ntd (ncov , ncov , ncov , ncov , ncov , ncov , ncov ) and one mab (ncov ) binding only to nfl but not to the other variant n proteins, were chosen as representatives for further study. purified antibodies were confirmed to bind the nfl protein by elisa (fig. c). the affinity of these antibodies for the nfl protein was measured by surface plasmon resonance (spr) (fig. d). in an effort to further characterize the structure-function relationship, three antibodies, ncov , ncov and ncov , were selected for production of recombinant fab antibodies based on their unique characteristics. mab ncov has a v h mutation frequency of . %, but a high binding affinity to the n protein with a kd of . nm (fig. d). mabs ncov and ncov have high v h mutation frequencies of . % and .
%, respectively, and have binding affinity to n protein with kd of . nm and . nm (fig. d, table s ). complex structure of mab with n-ntd. to investigate the molecular interaction mechanism of mab ncov with n protein, we next solved the complex structure of sars-cov- n protein ntd (n-ntd) with ncov fab fragments (ncov fab) at . Å resolution by x-ray crystallography. the final structure is fitted with visible electron density spanning residues - (sars-cov- n-ntd), - (ncov fab, the heavy chain of fab fragments), and - (ncov fab, the light chain of fab fragments, except for residues - ), respectively. the complete statistics for data collection, phasing, and refinement are presented in table s . with the help of the high-resolution structure, we were able to designate all complementarity determining regions (cdrs) in the ncov fab as l-cdr (light chain cdr , residues - ), with an unambiguous electron density map (fig. a, fig. s a). the interacting cdrs pinch the c-terminal tail of sars-cov- n-ntd (residues to ), with extensive binding contacts burying Å of surface area (table s ). light chain l-cdr and l-cdr of ncov fab interact with residues - of n-ntd via numerous hydrophilic and hydrophobic contacts (fig. b, fig. s b). of note, sars-cov- n-ntd residue q is recognized by l-cdr residue t via a hydrogen bond, while simultaneously stacking with l-cdr residue w and l-cdr residue y (fig. c). in addition, a network of interactions from heavy chain h-cdr and h-cdr of ncov fab to residues - of n-ntd suggests that the conserved sars-cov- n-ntd residue k has a critical role in ncov antibody binding. k is recognized via hydrogen bonds with the e d-carboxyl group and the main-chain carbonyl groups of t , d and s inside the h-cdr of ncov fab (fig. d). sars-cov- n-ntd l also interacts with i , v , n , and a of h-cdr and h-cdr of ncov fab through hydrophobic interactions (fig. e) . 
interestingly, all three residues (q , l , and k ) of sars-cov- n-ntd are relatively conserved in the n proteins of the highly pathogenic betacoronaviruses (fig. s b), which implies that ncov may cross-react with sars-cov n protein or mers-cov n protein. indeed, the binding affinities measured by spr analysis demonstrate that ncov interacts with sars-cov n protein and mers-cov n protein with kd of . nm (fig. s b, c). to identify the conformational changes between the apo state and the antibody-bound state of sars-cov- n-ntd, we next superimposed the complex structure on the n-ntd structure (pdb: m m)( ). the superimposition suggests that the c-terminal tail of sars-cov- n-ntd unfolds from the basic palm region upon ncov fab binding (fig. f), which likely contributes to allosteric regulation of the function of the normal full-length n protein. additionally, ncov fab binding results in a . Å movement of the b-finger region outward from the rna binding pocket, which may enlarge the rna binding pocket of the n protein (fig. f). to sum up, our crystal structural data demonstrated that the human mab ncov recognizes the and found to be linear (fig. c) . therefore, the addition of sars-cov- n protein does not change the single substrate binding site character of the enzymatic reaction. to assess the ability of ncov to suppress sars-cov- n protein-induced complement hyper-activation, we next performed the complement hyper-activation analysis at a series of n protein:ncov ratios. as shown in fig. 
d , the addition of n protein elevates the v max value up to -fold ( : ratio), whereas the addition of antibody ncov decreases the v max in a dose-dependent manner (table s ). the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) in china treatment of coronavirus disease (covid- ): a review abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia hospitalized patients with novel coronavirus-infected pneumonia in a novel coronavirus from patients with pneumonia in china clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study acute respiratory distress syndrome complement associated microvascular injury and thrombosis in the pathogenesis of severe covid- infection: a report of five cases complement activation in patients with covid- : a novel therapeutic target the case of complement activation in covid- multiorgan impact serum proteomic fingerprints of adult patients with severe acute respiratory syndrome plasma proteome of severe acute respiratory syndrome analyzed by two-dimensional gel electrophoresis and mass spectrometry immune complement and coagulation dysfunction in adverse outcomes of sars-cov- infection the coronavirus nucleocapsid is a multifunctional protein the origin, transmission and clinical therapies on coronavirus disease (covid- ) outbreak -an update on the status crystal structure of sars-cov- nucleocapsid protein rna binding domain reveals potential unique drug targeting sites the orf , orf and nucleocapsid proteins of sars-cov- inhibit type i interferon signaling pathway architecture and self-assembly of the sars-cov- nucleocapsid protein a neutralizing human antibody binds to the n-terminal domain of the spike protein of sars-cov- antibodies to sars-cov- and their potential for therapeutic passive immunization longitudinal isolation of potent near-germline sars-cov- -neutralizing antibodies 
from covid- patients potently neutralizing and protective human antibodies against sars-cov- a noncompeting pair of human neutralizing antibodies block covid- virus binding to its receptor ace a human monoclonal antibody blocking sars-cov- infection human neutralizing antibodies elicited by sars-cov- infection potent neutralizing antibodies against sars-cov- identified by high- throughput single-cell sequencing of convalescent patients' b cells high-throughput isolation of immunoglobulin genes from single human b cells and expression as monoclonal antibodies multiple domains of masp- , an initiating complement protease, are required for interaction with its substrate c specificity, cross-reactivity, and function of antibodies elicited by zika virus infection delineating antibody recognition against zika virus during natural infection complement regulators and inhibitory proteins covid- : complement, coagulation, and collateral damage study on interaction between sars-cov n and map . xi bao yu fen zi mian yi xue za zhi = chinese journal of cellular and molecular immunology for technical assistants of mabs isolation, production and characterization emerging prevention products contributed to protein purification and crystallization, in vitro protein-protein interaction analysis, and complement activation analysis. y. w. contributed to mabs isolation, in vitro protein-protein interaction analysis we thank the staffs of the bl u/ u/ u beamlines at ssrf for their help with the x-ray diffraction data screening and collections. we thank junlang liang, tong liu, nan the authors declare no conflict of interest. are shown. p values: *p < . ; **p < . ; "-" means that the kinetics did not conform to michaelis-menten kinetics. 
key: cord- - y xbgun authors: wierbowski, shayne d.; liang, siqi; chen, you; andre, nicole m.; lipkin, steven m.; whittaker, gary r.; yu, haiyuan title: a d structural interactome to explore the impact of evolutionary divergence, population variation, and small-molecule drugs on sars-cov- -human protein-protein interactions date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: y xbgun the recent covid- pandemic has sparked a global public health crisis. vital to the development of informed treatments for this disease is a comprehensive understanding of the molecular interactions involved in disease pathology. one lens through which we can better understand this pathology is through the network of protein-protein interactions between its viral agent, sars-cov- , and its human host. for instance, increased infectivity of sars-cov- compared to sars-cov can be explained by rapid evolution along the interface between the spike protein and its human receptor (ace ) leading to increased binding affinity. sequence divergences that modulate other protein-protein interactions may further explain differences in transmission and virulence in this novel coronavirus. to facilitate these comparisons, we combined homology-based structural modeling with the eclair pipeline for interface prediction at residue resolution, and molecular docking with pyrosetta. this enabled us to compile a novel d structural interactome meta-analysis for the published interactome network between sars-cov- and human. this resource includes docked structures for all interactions with protein structures, enrichment analysis of variation along interfaces, predicted ΔΔg between sars-cov and sars-cov- variants for each interaction, predicted impact of natural human population variation on binding affinity, and a further prioritized set of drug repurposing candidates predicted to overlap with protein interfaces†. 
all predictions are available online† for easy access and are continually updated when new interactions are published. † some sections of this pre-print have been redacted to comply with current biorxiv policy restricting the dissemination of purely in silico results predicting potential therapies for sars-cov- that have not undergone thorough peer-review. the results section titled “prioritization of candidate inhibitors of sars-cov- -human interactions through binding site comparison,” figure , supplemental table , and all links to our web resource have been removed. blank headers left in place to preserve structure and item numbering. our full manuscript will be published in an appropriate journal following peer-review. the ongoing global covid- pandemic caused by the infection of sars-cov- has to date infected impact of mutations on interaction binding affinity and performed a comparison of protein-protein and protein-drug binding sites. we compile all results from our structural interactome into a user-friendly web server allowing for quick exploration of individual interactions or bulk download and analysis of the whole dataset. further, we explore the utility of our interactome modeling approach in identifying key interactions undergoing evolution along viral protein interfaces, highlighting population variants on human interfaces that could modulate the strength of viral-host interactions to confer protection from or susceptibility to covid- , and prioritizing drug candidates predicted to bind competitively at viral- human interaction interfaces. enrichment of divergence between sars-cov and sars-cov- at spike-ace binding interface to highlight the utility of computational and structural approaches to model the sars-cov- -human interactome, we first examined the interaction between the sars-cov- spike protein (s) and human angiotensin-converting enzyme (ace ) (fig .a) . 
this interaction is key for viral entry into human cells and is the only viral-human interaction with solved crystal structures available in both sars-cov and sars-cov- [ ] [ ] [ ] . comparison between sars-cov and sars-cov- revealed that sequence divergence of the s protein was highly enriched at the s-ace interaction interface (fig .a; log oddsratio= . , p= . e- ), indicating functional evolution around this interaction. to explore the functional impact of these mutations on this interaction, we leveraged the rosetta energy function to estimate the change in binding affinity (ΔΔg) between the sars-cov and sars-cov- versions of the s-ace interaction (fig .b and .c) . the predicted negative ΔΔg value of - . rosetta energy units (reu) indicates an increased binding affinity using the sars-cov- s protein driven by better optimized solvation and hydrogen bonding potential fulfillment along the ace interface. our result is consistent with the hypothesis that increased stability of the s-ace interaction is one of the key reasons for elevated transmission of sars-cov- . moreover, recent experimental energy kinetics assays have shown that sars-cov- s protein binds ace with - -fold higher affinity than that of sars-cov s protein supporting the conclusions from our computational modeling. among individuals , , . several hypotheses for genetic predisposition models have been proposed including that expression quantitative trait loci (eqtls) may up-or down-regulate host response genes and that functional coding variants may alter viral-human interactions , . 
for instance, a recent rna-in order to add a structural component to our interactome map, and thereby enable modeling of the binding affinity for these interactions, we additionally performed docking in pyrosetta using our eclair interface likelihood predictions to refine the search space (supplemental after constructing the d interactome between sars-cov- and human, we first looked for evidence of interface-specific variation by mapping both gnomad-reported human population variants (supplemental table ) and sequence divergences between sars-cov and sars-cov- (supplemental table ) onto the predicted interfaces. in general, conserved residues have been shown to cluster at protein-protein interfaces , and a recent analysis of sars-cov- structure and evolution likewise concluded that highly conserved surface residues were likely to drive protein-protein interactions . consistent with these prior findings, at an interactome-wide level we observed significant depletion of both viral and human variation along the predicted interfaces, comparable to that observed on solved human-human interfaces (fig .a). nonetheless, considering each interaction individually, our analysis uncovered interaction interfaces enriched for human population variants (fig .b) , and enriched for recent viral sequence divergences (fig .c). a breakdown of variant enrichment on each interface is provided in supplemental table . the individual viral interfaces showing an unexpected degree of variation may, like the previously discussed s-ace interface, be indicative of recent functional evolution around the viral-human interaction. considering the slower rate of evolution in humans, enrichment of population variants along the human interfaces is unlikely to be a selective response to the virus. rather, interfaces with high population variation may represent edges in the interactome whose strength may fluctuate among individuals or between populations. 
alternatively, enrichment and depletion of variation along the human-viral interfaces could help distinguish viral proteins that bind along existing, and therefore conserved, human-human interfaces from those that bind using novel interfaces, which would be less likely to be under selective pressure. to further explore the functional impact of naturally occurring variants on the human interactors of sars-cov- , we considered variants with phenotypic associations as reported in hgmd , clinvar or the nhgri-ebi gwas catalog . interactors of sars-cov- were significantly more likely than the rest of the human proteome to harbor phenotypic variants in each of these databases (fig .d). notably, among the individual disease categories enriched in this gene set, several were consistent with reported comorbidities including heart disease, respiratory tract disease, and metabolic disease , (fig . e; supplemental table ). disruption of native protein-protein interactions is one mechanism of disease pathology, and disease mutations are known to be enriched along protein interfaces , . human population variants on the predicted human-viral interfaces were more likely to be annotated as deleterious by sift and polyphen but showed identical allele frequency distributions compared to those off the interfaces (supplemental figure ). however, mapping annotated disease mutations onto the protein interfaces revealed significant enrichment only along known human-human interfaces; no such enrichment was found on human-viral interfaces (fig .f). this is likely because, unlike mutations disrupting human-human interactions, mutations disrupting human-viral interactions would not disrupt natural cell function, and therefore would be unlikely to be pathogenic. 
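the interface enrichment and depletion statistics discussed above are reported as log odds ratios with standard errors. as a minimal sketch of how such a value can be computed from a 2x2 table of residue counts (on or off the interface, variant or not), assuming a wald standard error and a normal approximation for the p value, neither of which is confirmed by the text; the function name, the argument names and the base-2 logarithm are illustrative assumptions:

```python
import math


def log_odds_ratio(on_hit, on_miss, off_hit, off_miss, base=2.0):
    """Log odds ratio, its standard error and a two-sided p value.

    on_hit / on_miss:   interface residues with / without a variant
    off_hit / off_miss: non-interface residues with / without a variant
    The log base and the Wald (normal approximation) test are assumptions
    made for illustration; they are not taken from the paper.
    """
    counts = (on_hit, on_miss, off_hit, off_miss)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive")
    # Odds of a variant on the interface divided by the odds off the interface.
    ln_or = math.log((on_hit * off_miss) / (on_miss * off_hit))
    # Standard error of the natural-log odds ratio.
    ln_se = math.sqrt(sum(1.0 / c for c in counts))
    z = ln_or / ln_se
    # Two-sided p value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    scale = math.log(base)
    return ln_or / scale, ln_se / scale, p
```

a positive value indicates enrichment of variants on the interface and a negative value indicates depletion; an exact test would be a safer choice when any cell count is small.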
our finding that disease mutations and viral proteins affect human proteins at distinct sites is consistent with a two-hit hypothesis of comorbidities whereby proteins whose function is already affected by genetic background may be further compromised by viral infection. we next sought to explore the impact of sequence divergence in sars-cov- relative to sars-cov on viral-human interactions. mutations between the two viruses were identified by pairwise alignment and the impacts of these mutations on the binding energy (ΔΔg) for interactions amenable to docking were predicted using a pyrosetta pipeline , , . although the binding energy for most interactions was unchanged, either because no mutations occurred near the interface or because the mutations that did occur had only marginal effects, we observed that divergence from sars-cov to sars-cov- was more likely to result in decreased binding energy (i.e. a more stable interaction) (fig . a; supplemental table ). the significant outliers in these ΔΔg predictions can help pinpoint key differences between the viral-human interactomes of sars-cov and sars-cov- . we further note a wide range of affinity impacts among various human interactors of a single viral protein (fig .d) and hypothesize that these differences may help identify the most important interactions. to further explore the significance of these changes in interaction affinity, we considered those interactions with the largest decrease in binding energy, corresponding to the largest predicted increase in affinity. specifically, we highlight the interaction between coronavirus orf c and human mitochondrial nadh scanning mutagenesis along all docked interfaces in pyrosetta. we identified as binding energy hotspot mutations all mutations with a predicted ΔΔg at least one standard deviation away from the mean for identical amino acid substitutions across the rest of the interface. in total, out of , population variants on eligible interfaces, ( . 
%) were identified as hotspots predicted to disrupt interaction stability, and ( . %) were identified as hotspots predicted to contribute to interaction stability (fig .b) . most of the hotspot mutations were predicted to be driven by solvation or repulsive forces, with disruptive hotspots primarily being driven by repulsive forces and stabilizing hotspots primarily being driven by solvation forces (fig .c) . results summarizing the predicted impact of all , population variants on the docked interfaces are provided in supplemental table . the current version of the sars-cov- human structural interactome web server describes viral-human interactions reported by gordon et al. . we will continue support for the web server with periodic updates as additional interactome screens between sars-cov- and human are published. as we update, a navigation option to select between the current or previous stable releases of the web server will be provided. overall, we present a comprehensive resource to explore the sars-cov- -human protein-protein interactome map in a structural context. analysis through this framework allows us to consider the recent evolution of sars-cov- in the context of its interactome map and to prioritize for further functional characterization key interactions. likewise, our consideration of underlying variation in the human proteins that interact with sars-cov- may be valuable in explaining differences in response to infection. we particularly note that our observation that perturbation from underlying disease mutations and viral protein binding occur at distinct sites on the protein is of clinical interest. further investigation into the combined role of these two sources of perturbation to better understand the mechanisms linked to comorbidities is warranted. however, our work is not without limitation. 
firstly, we note that although structural coverage from our homology modelling of sars-cov- proteins was robust (supplemental figure ) , the same done to orient the most likely interface residues on each structure towards each other, protein-protein docking using incomplete protein models introduces some bias and low coverage may exclude some true interface residues. for this reason, the initial eclair interface annotations, which are less subject to structural coverage limitations, may provide orthogonal value. we additionally note that direct perhaps most importantly, we emphasize the importance of further experimental characterization to confirm the predictions made here. nonetheless, we believe our d structural sars-cov- -human interactome web server will prove to be a key resource in informing hypothesis-driven exploration of the mechanisms of sars-cov- pathology and host response. the scope and potential impact of our web server will continue to grow as we incorporate the results of ongoing and future interactome screens between sars-cov- and human. additionally, we note that our d structural interactome framework can be rapidly deployed to analyze future viruses. homology-based modeling of all sars-cov- proteins was performed in modeller using a multiple template modeling procedure. in brief, a list of candidate template structures for each protein to be modelled was obtained by running blast against a reference containing all sequences in the protein data bank (pdb) . templates were filtered to retain only those with at least % identity to the protein to be modelled, and the remaining templates were ranked using a weighted combination of percent identity and coverage as described previously . to compile the final set of overlapping templates for modeling, first the top ranked template was selected as a seed. 
overlapping templates were iteratively added to the set so long as ) the new template increased the overall coverage by at least %, and ) the new template retained a total percent identity no more than % worse than the initial seed template. pairwise alignments between the protein to be modelled and the template set were generated using a regions with large gaps (at least gaps in the alignment in a residue window). finally, alignment was to accommodate predictions between sars-cov- and human, slight alterations were made. using the predicted interface probabilities reported by eclair, we set up the initial docking conformation to explore a restricted search space for each docking simulation. in cases where multiple structures were available for the human protein, all structures were weighted based on the eclair scores for the covered residues in each structure to maximize both coverage and inclusion of likely interface residues. for each protein in the interaction, we performed a linear regression classification in scikit-learn to optimally separate the likely interface residues from likely non-interface residues. the plane defined by this linear regression served as a reference to orient the structures along the y-axis the x-z plane and separated by a distance of Å along the y-axis. for each docking attempt, a series of random perturbations from these initial conformations was made to search the nearby space. first, the human protein was rotated up to ° about the y-axis to allow full exploration of different rotations of the two interfaces relative to each other. second, to apply some flexibility to the plane predicting the interface vs. non-interface sides of each protein, up to ° of rotation about the x- and z-axes was allowed for both the viral and human proteins. finally, a random translation up to Å in magnitude was applied to the human protein along the x-z plane so that the docking could explore contact points other than the centers of mass along these axes. 
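the three random perturbations described above (a spin of the human protein about the y-axis, small tilts of both proteins about the x- and z-axes, and a shift of the human protein in the x-z plane) can be sketched as follows. this is an illustration only: the maximum angles and distance are elided in the text, so they are left as caller-supplied parameters, and the uniform sampling distributions, the class name and the function name are assumptions rather than the authors' implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Perturbation:
    human_spin_deg: float   # rotation of the human protein about the y-axis
    viral_tilt_deg: tuple   # (about x, about z) tilt of the viral protein
    human_tilt_deg: tuple   # (about x, about z) tilt of the human protein
    human_shift_xz: tuple   # (dx, dz) translation of the human protein


def sample_perturbation(rng, max_spin_deg, max_tilt_deg, max_shift):
    """Draw one random starting perturbation for a docking attempt.

    All three limits are hypothetical inputs; the source does not state
    which sampling distribution was used, so uniform draws are assumed.
    """
    def tilt():
        # Independent tilts about the x- and z-axes, each within the limit.
        return (rng.uniform(-max_tilt_deg, max_tilt_deg),
                rng.uniform(-max_tilt_deg, max_tilt_deg))

    # Uniform draw inside a disc so the shift magnitude never exceeds max_shift.
    radius = max_shift * math.sqrt(rng.random())
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return Perturbation(
        human_spin_deg=rng.uniform(0.0, max_spin_deg),
        viral_tilt_deg=tilt(),
        human_tilt_deg=tilt(),
        human_shift_xz=(radius * math.cos(angle), radius * math.sin(angle)),
    )
```

passing a seeded `random.Random` instance makes each set of starting conformations reproducible, which helps when comparing docking runs.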
after initializing these guided starting conformations, docking was simulated in pyrosetta using a modified version of the protein-protein docking methodology provided by gray . the initial demo (https://graylab.jhu.edu/pyrosetta/downloads/scripts/demo/d docking.py) takes two chains from a co-crystal structure, applies a random perturbation, and re-docks them. because randomized initial orientation was already handled as described previously, these steps were removed from our docking runs. in brief, the protein models were converted to centroid representation, slid into contact using the "interchain_cen" scoring function, and converted back to full atom representation, before having their side-chains optimized using the predefined "docking" and "docking_min" scoring table . to annotate interface residues from atomic resolution docked models, we used a previously described and established definition for interface residues . in brief, the solvent accessible surface area (sasa) for both bound and unbound docked structures was calculated using naccess. we define as accessibility) and ) in contact with the interacting chain (defined as any residue whose absolute accessibility decreased by ≥ . Å ). the full list of sars-cov- mutations is reported in supplemental table . human population variants in all human proteins shown to interact with sars-cov- proteins were obtained from gnomad reported as the most general term with no more significant ancestor term (supplemental table , sheet ). raw enrichment values for all terms are also predicted (supplemental table , sheet ). for curation of disease and trait associations from nhgri-ebi gwas catalog (http://www.ebi.ac.uk/gwas/) the scoring function used for these calculations is as described previously using the following weights; protein-ligand docking using smina the previous viral-human interactome screen by gordon et al. supplemental table . list of all predicted drug-target binding sites . 
comparison of the percentage of human genes that interact with (green) or do not interact with (orange) coronaviruses: an overview of their replication and pathogenesis a pneumonia outbreak associated with a new coronavirus of probable bat origin including severe acute respiratory syndrome (sars) and middle east respiratory syndrome (mers). mandell, douglas, and bennett's principles and practice of infectious diseases a novel bat coronavirus closely related to sars-cov- contains natural insertions at the s /s cleavage site of the spike protein extrapulmonary manifestations of covid- clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study severe obesity, increasing age and male sex are independently associated with worse in-hospital outcomes, and higher in-hospital mortality african-american covid- mortality: a sentinel event characteristics associated with hospitalization among patients with covid- greater risk of severe covid- in black, asian and minority ethnic populations is not explained by cardiometabolic, socioeconomic or behavioural factors, or by (oh)-vitamin d status: study of cases from the uk biobank disparities in incidence of covid- among underrepresented racial/ethnic groups in counties identified as hotspots during racial demographics and covid- confirmed cases and deaths: a correlational analysis of us counties the sars-coronavirus-host interactome: identification of cyclophilins as target for pan-coronavirus inhibitors global landscape of hiv-human protein complexes protein interaction mapping identifies rbbp as a negative regulator of ebola virus replication comparative flavivirus-host protein interaction mapping reveals mechanisms of 
dengue and zika virus pathogenesis a sars-cov- protein interaction map reveals targets for drug repurposing structure of the human receptor tyrosine kinase met in complex with the listeria invasion protein inlb. cell sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor chemokine receptor ccr antagonist maraviroc: medicinal chemistry and clinical applications inhibiting hiv- integrase by shifting its oligomerization equilibrium small molecule inhibitors of the ledgf site of human immunodeficiency virus integrase identified by fragment screening and structure based design virus-receptor interactions: the key to cellular invasion structural insights into the interaction of coronavirus papain-like proteases and interferon-stimulated gene product from different species mechanism of inhibition of retromer transport by the bacterial effector ridl solution structure of the complex between poxvirus-encoded cc chemokine inhibitor vcci and human mip- beta structural properties of the promiscuous vp activation domain crystal structure of a gamma-herpesvirus cyclin-cdk complex metabolic syndrome and viral pathogenesis: lessons from influenza and coronaviruses a unifying view of st century systems biology the molecular sociology of the cell network medicine: a network-based approach to human disease small molecules, big targets: drug discovery faces the protein-protein interaction challenge small-molecule inhibitors of protein-protein interactions: progressing toward the reality alphaspace: fragment-centric topographical mapping to target protein-protein interaction interfaces the development and current use of bcl- inhibitors for the treatment of chronic lymphocytic leukemia identification of protein-protein interaction inhibitors targeting vaccinia virus processivity factor for development of antiviral agents inhibition of human papillomavirus dna replication by small molecule antagonists of the e -e protein interaction 
optimization and determination of the absolute configuration of a series of potent inhibitors of human papillomavirus type- e -e protein-protein interaction: a combined medicinal chemistry, nmr and computational chemistry approach protein-protein interactions in virus-host systems. front microbiol interactome insider: a structural interactome browser for genomic studies pyrosetta: a script-based interface for implementing molecular modeling algorithms using rosetta stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis structural and functional basis of sars-cov- entry by using human ace sars-cov- and bat ratg spike glycoprotein structures inform on virus evolution and furin-cleavage effects structure, function, and antigenicity of the sars-cov- spike glycoprotein the rosetta all-atom energy function for macromolecular modeling and design structural basis of receptor recognition by sars-cov- cryo-em structure of the -ncov spike in the prefusion conformation who is most likely to be infected with sars-cov- ? the lancet infectious diseases comparative genetic analysis of the novel coronavirus ( -ncov/sars-cov- ) receptor ace in different populations genetic predisposition models to covid- infection the mutational constraint spectrum quantified from variation in , humans a simple physical model for binding energy hot spots in protein-protein complexes spatial chemical conservation of hot spot interactions in protein-protein complexes tests of concrete strength across the thickness of industrial floor using the ultrasonic method with exponential spot heads the sequence of human ace is suboptimal for binding the s spike protein of sars coronavirus . biorxiv conserved residue clusters at protein-protein interfaces and their use in binding site identification sars-cov (covid- ) structural/evolution dynamicome: insights into functional evolution and human genomics. 
sars-cov- that contain disease annotations in hgmd (log or= . , p= . 
e- ) sars-cov- proteins were significantly more likely to harbor disease mutations than non-interactors. error bars indicate ± se. e, a sample of individual disease terms enriched in human genes targeted by comparison of the enrichment of hgmd, clinvar, and gwas annotated mutations on human-viral interfaces or human-human interfaces for the same gene set. although disease mutations were enriched on human-human interfaces (hgmd, log or= ), no enrichment was observed on human-viral interfaces (hgmd, log or= . , p= . ; clinvar the gwas category was removed from this analysis because most lead gwas snps occurred in non-coding regions. error bars indicate ± se. predicted changes in binding affinity from sequence divergences between sars-cov and sars- an overall representation of these ΔΔg predictions is reported (mean=- . reu, std= . reu) with interactions sorted from those with the largest decrease in binding energy (most stabilized relative to sars-cov) to those with the largest increase in binding energy (most destabilized relative to sars-cov) < z-score ≤ - , n= ), or strongly stabilizing (z-score ≤ - , n= ) score < , n= , ) showed minimal impact on binding affinity. c, breakdown of the contribution of each term in the pyrosetta energy function used for in-silico scanning mutagenesis for all population variants. a breakdown of which term contributed most heavily to the classification of all interface hotspot population variants is shown on the right. d, individual sars-cov- -human interactions involving the same viral protein can have distinct interfaces with distinct predicted changes in binding affinity between sars-cov and sars-cov- versions of the protein. an example involving orf b is highlighted where some interactions (e.g. tomm and ptbp ) are predicted to be more stabilized in sars-cov- whereas others (e.g. bag , slc a r , and mark ) are predicted to be unaffected. 
e, docked structure for the interaction between sars-cov- orf c and human ndufaf sars-cov- (bottom) orf c. interface residues are colored by their predicted energy contribution from blue (stabilizing) to white (no impact) to red (destabilizing) - are labeled in red, while other residues with a major contribution to the binding affinity are labeled in green (ndufaf ) or blue (orf c). the overall predicted change in binding energy (ΔΔg=- . reu) suggests the interaction is more stable (lower energy) in the sars-cov- version of the interaction.
supplemental figure . summary of human population variant frequency and deleteriousness. summary of allele frequency for human population variants either on or off the predicted human-viral interface presented as either a raw distribution or a cumulative density respectively. variants in either category had roughly identical allele frequency distributions. c, d, summary of the sift deleteriousness score for human population variants either on or off the predicted human-viral interface presented as either a raw distribution or a cumulative density respectively. population variants on the interface were significantly more likely to be classified deleterious. f, g, summary of the polyphen deleteriousness score for human population variants either on or off the predicted human-viral interface presented as either a raw distribution or a cumulative density respectively. plots are colored based on the split between polyphen benign, possibly damaging, and probably damaging categories. e, pie chart breakdown of these categories. pie chart outlines distinguish interface (green) from non-interface (orange).
for each sars-cov- -human interaction with d structure available for both proteins, independent guiding docking trials were used to select a final docked configuration. the structure for the viral protein is colored from white to blue with darker blue corresponding to higher eclair prediction. 
the structure for the human protein is colored similarly using a green to white gradient. initial semi-random docked configurations were generated using five steps. first a plane separating eclair predicted likely interface from likely non-interface residues was drawn to divide each protein. second, the two protein chains were separated Å apart on the y-axis using the previously defined plane to orient the likely interface sides of each protein towards each other. third, the human protein was randomly rotated up to ° along the y-axis to sample different orientations of the two interfaces relative to each other. key: cord- -s hc fxs authors: ostaszewski, marek; niarakis, anna; mazein, alexander; kuperstein, inna; phair, robert; orta-resendiz, aurelio; singh, vidisha; aghamiri, sara sadat; acencio, marcio luis; glaab, enrico; ruepp, andreas; fobo, gisela; montrone, corinna; brauner, barbara; frischman, goar; monraz gómez, luis cristóbal; somers, julia; hoch, matti; gupta, shailendra kumar; scheel, julia; borlinghaus, hanna; czauderna, tobias; schreiber, falk; montagud, arnau; de leon, miguel ponce; funahashi, akira; hiki, yusuke; hiroi, noriko; yamada, takahiro g.; dräger, andreas; renz, alina; naveez, muhammad; bocskei, zsolt; messina, francesco; börnigen, daniela; fergusson, liam; conti, marta; rameil, marius; nakonecnij, vanessa; vanhoefer, jakob; schmiester, leonard; wang, muying; ackerman, emily e.; shoemaker, jason; zucker, jeremy; oxford, kristie; teuton, jeremy; kocakaya, ebru; summak, gökçe yağmur; hanspers, kristina; kutmon, martina; coort, susan; eijssen, lars; ehrhart, friederike; rex, d. a. 
b.; slenter, denise; martens, marvin; haw, robin; jassal, bijay; matthews, lisa; orlic-milacic, marija; senff ribeiro, andrea; rothfels, karen; shamovsky, veronica; stephan, ralf; sevilla, cristoffer; varusai, thawfeek; ravel, jean-marie; fraser, rupsha; ortseifen, vera; marchesi, silvia; gawron, piotr; smula, ewa; heirendt, laurent; satagopam, venkata; wu, guanming; riutta, anders; golebiewski, martin; owen, stuart; goble, carole; hu, xiaoming; overall, rupert w.; maier, dieter; bauch, angela; gyori, benjamin m.; bachman, john a.; vega, carlos; grouès, valentin; vazquez, miguel; porras, pablo; licata, luana; iannuccelli, marta; sacco, francesca; nesterova, anastasia; yuryev, anton; de waard, anita; turei, denes; luna, augustin; babur, ozgun; soliman, sylvain; valdeolivas, alberto; esteban-medina, marina; peña-chilet, maria; helikar, tomáš; puniya, bhanwar lal; modos, dezso; treveil, agatha; olbei, marton; de meulder, bertrand; dugourd, aurélien; naldi, aurelien; noel, vincent; calzone, laurence; sander, chris; demir, emek; korcsmaros, tamas; freeman, tom c.; augé, franck; beckmann, jacques s.; hasenauer, jan; wolkenhauer, olaf; wilighagen, egon l.; pico, alexander r.; evelo, chris t.; gillespie, marc e.; stein, lincoln d.; hermjakob, henning; d’eustachio, peter; saez-rodriguez, julio; dopazo, joaquin; valencia, alfonso; kitano, hiroaki; barillot, emmanuel; auffray, charles; balling, rudi; schneider, reinhard title: covid- disease map, a computational knowledge repository of sars-cov- virus-host interaction mechanisms date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: s hc fxs we hereby describe a large-scale community effort to build an open-access, interoperable, and computable repository of covid- molecular mechanisms - the covid- disease map. 
we discuss the tools, platforms, and guidelines necessary for the distributed development of its contents by a multi-faceted community of biocurators, domain experts, bioinformaticians, and computational biologists. we highlight the role of relevant databases and text mining approaches in enrichment and validation of the curated mechanisms. we describe the contents of the map and their relevance to the molecular pathophysiology of covid- and the analytical and computational modelling approaches that can be applied to the contents of the covid- disease map for mechanistic data interpretation and predictions. we conclude by demonstrating concrete applications of our work through several use cases. the coronavirus disease (covid- ) pandemic due to severe acute respiratory syndrome coronavirus (sars-cov- ) [ ] has already resulted in the infection of over million people worldwide, of whom one million have died . the molecular pathophysiology that links sars-cov- infection to the clinical manifestations and course of covid- is complex and spans multiple biological pathways, cell types and organs [ , ] . to gain the insights into this complex network, the biomedical research community needs to approach it from a systems perspective, collecting the mechanistic knowledge scattered across the scientific literature and bioinformatic databases, and integrating it using formal systems biology standards. with this goal in mind, we initiated a collaborative effort involving over biocurators, domain experts, modelers and data analysts from institutions in countries to develop the covid- disease map, an open-access collection of curated computational diagrams and models of molecular mechanisms implicated in the disease [ ] . 
to this end, we aligned the biocuration efforts of the disease maps community [ , ] , reactome [ ] , and wikipathways [ ] and developed common guidelines utilising standardised encoding and annotation schemes, based on community-developed systems biology standards [ ] [ ] [ ] , and persistent identifier repositories [ ] . moreover, we integrated relevant knowledge from public repositories [ ] [ ] [ ] [ ] and text mining resources, providing a means to update and refine contents of the map. the fruit of these efforts was a series of pathway diagrams describing key events in the covid- infectious cycle and host response. we ensured that this comprehensive diagrammatic description of disease mechanisms is machine-readable and computable. this allows us to develop novel bioinformatics workflows, creating executable networks for analysis and prediction. in this way, the map is both human and machine-readable, lowering the communication barrier between biocurators, domain experts, and computational biologists significantly. computational modelling, data analysis, and their informed interpretation using the contents of the map have the potential to identify molecular signatures of disease predisposition and development, and to suggest drug repositioning for improving current treatments. covid- disease map is a collection of diagrams containing interactions between elements, supported by publications and preprints. the summary of diagrams available in the covid- disease map can be found online in supplementary material . the map is a constantly evolving resource, refined and updated by ongoing efforts of biocuration, sharing and analysis. here, we report its current status. in section we explain the set up of our community effort to construct the interoperable content of the resource, involving biocurators, domain experts and data analysts. 
in section we demonstrate that the scope of the biological maps in the resource reflects the state of the art in the molecular biology of covid- . next, we outline analytical workflows that can be used on the contents of the map, including initial, preliminary outcomes of two such workflows, discussed in detail as use cases in section . we conclude in section with an outlook on further development of the covid- map and the utility of the entire resource in future efforts towards building and applying disease-relevant computational repositories. the covid- disease map project involves three main groups: (i) biocurators, (ii) domain experts, and (iii) analysts and modellers:
i. biocurators develop a collection of systems biology diagrams focused on the molecular mechanisms of sars-cov- .
ii. domain experts refine the contents of the diagrams, supported by interactive visualisation and annotations.
iii. analysts and modellers develop computational workflows to generate hypotheses and predictions about the mechanisms encoded in the diagrams.
all three groups have an important role in the process of building the map, by providing content, refining it, and defining the downstream computational use of the map. figure illustrates the ecosystem of the covid- disease map community, highlighting the roles of different participants, available format conversions, interoperable tools, and downstream uses. information about the community members and their contributions is disseminated via the fairdomhub [ ] , so that content distributed across different collections can be uniformly referenced. the biocurators of the covid- disease map diagrams follow the guidelines developed by the community, and specific workflows of wikipathways [ ] and reactome [ ] . the biocurators build literature-based systems biology diagrams, representing the molecular processes implicated in the covid- pathophysiology, their complex regulation and the phenotypic outcomes. 
these diagrams are the main building blocks of the map, and are composed of biochemical reactions and interactions (hereafter collectively called interactions) taking place between different types of molecular entities in various cellular compartments. as there are multiple teams working on related topics, biocurators can provide an expert review across pathways and across platforms. this is possible, as all platforms offer intuitive visualisation, interpretation, and analysis of pathway knowledge to support basic and clinical research, genome analysis, modelling, systems biology, and education. table lists information about the created content. for more details see supplementary material .
communicating to refine, interpret and apply covid- disease map diagrams. these diagrams are created and maintained by biocurators, following pathway database workflows or standalone diagram editors, and reviewed by domain experts. the content is shared via pathway databases or a gitlab repository; all can be enriched by integrated resources of text mining and interaction databases. the covid- disease map diagrams, encoded in layout-aware systems biology formats and integrated with external repositories, are available in several formats allowing a range of computational analyses, including network analysis and boolean, kinetic or multiscale simulations.
both interactions and interacting entities are annotated following a uniform, persistent identification scheme, using either miriam or identifiers.org [ ] , and the guidelines for annotations of computational models [ ] . viral protein interactions are explicitly annotated with their taxonomy identifiers to highlight findings from strains other than sars-cov- . moreover, tools like modelpolisher [ ] , sbmlsqueezer [ ] or memote help to automatically complement the annotations in the sbml format and validate the model (see also supplementary material ). 
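the annotation scheme described above can be illustrated with a minimal python sketch. it assumes the compact identifiers.org uri form `https://identifiers.org/<prefix>:<accession>`; the `Annotation` class, the `annotate_entity` helper and the dictionary layout are hypothetical and are not taken from the project's tooling. the uniprot accession and taxonomy identifier are used purely as illustrative inputs.

```python
from dataclasses import dataclass
from typing import Optional

# Base of the identifiers.org resolver (compact identifier form).
IDENTIFIERS_ORG = "https://identifiers.org"


@dataclass(frozen=True)
class Annotation:
    """A persistent identifier: a registered namespace prefix plus an accession."""
    prefix: str
    accession: str

    def uri(self) -> str:
        # Compact identifier resolved through identifiers.org.
        return f"{IDENTIFIERS_ORG}/{self.prefix}:{self.accession}"


def annotate_entity(name: str, annotation: Annotation,
                    taxonomy_id: Optional[str] = None) -> dict:
    """Hypothetical helper: attach an entity URI and, optionally, a taxonomy URI.

    Recording the taxonomy explicitly makes it possible to tell apart findings
    that come from different viral strains.
    """
    uris = [annotation.uri()]
    if taxonomy_id is not None:
        uris.append(Annotation("taxonomy", taxonomy_id).uri())
    return {"name": name, "uris": uris}


# Illustrative input: a viral spike protein entry with its NCBI taxonomy id.
spike = annotate_entity("S", Annotation("uniprot", "P0DTC2"), taxonomy_id="2697049")
print(spike["uris"])
```

keeping the taxonomy uri alongside the protein uri is one simple way to make strain provenance machine-checkable; real sbml annotations would carry the same uris inside rdf elements with miriam qualifiers.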
the knowledge on covid- mechanisms is rapidly evolving, as demonstrated by the rapid growth of the covid- open research dataset (cord- ), a source of scientific manuscript text and metadata on covid- and related coronavirus research [ ] . cord- currently contains over , articles and preprints, over four times more than when it was introduced . in such a quickly evolving environment, biocuration efforts need to be supported by other repositories of structured knowledge about molecular mechanisms relevant for covid- , like molecular interaction databases, or text mining resources. contents of such repositories may suggest improvements in the existing covid- disease map diagrams, or establish a starting point for developing new pathways (see section "biocuration of database and text mining content"). interaction and pathway databases contain structured and annotated information on protein interactions or causal relationships. while interaction databases focus on pairs of molecules, offering broad coverage of literature-reported findings, pathway databases provide detailed descriptions of biochemical processes, their regulation, and related interactions, supported by diagrams. both types of resources can be a valuable input for covid- disease map biocurators, given the comparability of identifiers used for molecular annotations, and the reference to publications used for defining an interaction or building a pathway (see table ). text-mining approaches can help to sift through such rapidly expanding literature with natural language processing (nlp) algorithms based on semantic modelling, ontologies, and linguistic analysis to automatically extract and annotate relevant sentences, biomolecules, and their interactions. this scope was recently extended to pathway figure mining: decoding pathway figures into their computable representations [ ] . 
altogether, these automated workflows lead to the construction of knowledge graphs: semantic networks incorporating ontology concepts, unique biomolecule references, and their interactions extracted from abstracts or full-text documents [ ] . the covid- disease map project integrates open-access text mining resources, indra [ ] , biokb , ailani covid- , and pathwaystudio . all platforms offer keyword-based search allowing interactive exploration. additionally, the map benefits from an extensive protein-protein interaction network (ppi) generated with a custom text-mining pipeline using opennlp and gnormplus [ ] . this pipeline was applied to the cord- dataset and the collection of medline abstracts associated with the genes in the sars-cov- ppi network [ ] using the entrez gene reference-into-function (generif). for detailed descriptions of the resources, see supplementary material . molecular interactions from databases and knowledge graphs from text mining resources discussed above (from now on called altogether 'knowledge graphs') have a broad coverage at the cost of depth of mechanistic representation. this content can be used by the biocurators in the process of building and updating the systems biology focused diagrams. biocurators can use this content in three main ways: by visual exploration, by programmatic comparison, and by direct incorporation of the content. first, the biocurators can visually explore the contents of the knowledge graphs using available search interfaces to locate new knowledge and encode it in the diagrams. moreover, solutions like covidminer project , pathwaystudio and ailani offer a visual representation of a group of interactions for a better understanding of their biological context, allowing search by interactions, rather than just isolated keywords. finally, indra and ailani offer assistant bots that respond to natural language queries and return meaningful answers extracted from knowledge graphs. 
second, programmatic access and reproducible exploration of the knowledge graphs is possible via data endpoints: sparql for biokb and application programming interfaces for indra, ailani, and pathway studio. users can programmatically submit keyword queries and retrieve functions, interactions, pathways, or drugs associated with submitted gene lists. this way, otherwise time-consuming tasks like an assessment of completeness of a given diagram, or search for new literature evidence, can be automated to a large extent. finally, biocurators can directly incorporate the content of knowledge graphs into sbml format using biokc [ ] . additionally, the contents of the elsevier covid- pathway collection can be translated to sbgnml preserving the layout of the diagrams. the sbgnml content can then be converted into other diagram formats used by biocurators (see section . below). the biocuration of the covid- disease map is distributed across multiple teams, using varying tools and associated systems biology representations. this requires a common approach to annotations of evidence, biochemical reactions, molecular entities and their interactions. moreover, the interoperability of layout-aware formats is needed for comparison and integration of the diagrams in the map. the covid- disease map diagrams are encoded in one of three layout-aware formats for standardised representation of molecular interactions: sbml [ ] [ ] [ ] , sbgnml [ ] , and gpml [ ] . these xml-based formats focus to a varying degree on user-friendly graphical representation, standardised visualisation, and support of computational workflows. for the detailed description of the formats, see supplementary material . 
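the programmatic comparison mentioned above (assessing how complete a diagram is relative to a knowledge graph) can be sketched in a few lines of python. this is a minimal sketch under stated assumptions: interactions are reduced to unordered pairs of identifiers, both sources already use comparable identifiers, and the function names and report keys are hypothetical rather than part of any of the listed platforms.

```python
def normalise(interactions):
    """Represent each interaction as an unordered pair of lower-cased identifiers."""
    return {frozenset((a.strip().lower(), b.strip().lower())) for a, b in interactions}


def completeness_report(diagram, knowledge_graph):
    """Compare a curated diagram against a knowledge graph.

    Returns the fraction of knowledge-graph interactions present in the diagram,
    the interactions a biocurator may want to review for inclusion, and the
    diagram interactions with no support in the knowledge graph.
    """
    d, k = normalise(diagram), normalise(knowledge_graph)

    def as_sorted_pairs(pairs):
        return sorted(tuple(sorted(p)) for p in pairs)

    return {
        "coverage": len(d & k) / len(k) if k else 1.0,
        "missing_from_diagram": as_sorted_pairs(k - d),
        "unsupported_in_graph": as_sorted_pairs(d - k),
    }


# Placeholder identifiers; real input would come from an exported diagram
# and from a query against one of the knowledge-graph endpoints.
diagram = [("A", "B"), ("B", "C")]
knowledge_graph = [("b", "a"), ("A", "D")]
report = completeness_report(diagram, knowledge_graph)
print(report)
```

treating interactions as unordered pairs ignores direction and interaction type, which is a deliberate simplification; a stricter comparison would key on (source, type, target) triples.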
each of these three languages has a different focus: sbml emphasizes standardised representation of the data model underlying molecular interactions, sbgnml provides standardised graphical representation of molecular processes, while gpml allows for a partially standardised representation of uncertain biological knowledge. nevertheless, all three formats are centered around molecular interactions, provide a constrained vocabulary to encode element and interaction types, encode layout of their diagrams and support stable identifiers for diagram components. these shared properties, supported by a common ontology [ ] , allow cross-format mapping and enable translation of key properties between the formats. therefore, when developing the contents of the map, biocurators use the tools they are familiar with, facilitating this distributed task. the covid- disease map community ecosystem of tools and resources (see figure ) ensures interoperability between the three layout-aware formats for molecular mechanisms: sbml, sbgnml, and gpml. essential elements of this setup are tools capable of providing cross-format translation functionality [ , ] and supporting harmonised visualisation processing. another essential translation interface is a representation of reactome pathways in wikipathways gpml [ ] and sbml. the sbml export of reactome content has been optimised in the context of this project and facilitates integration with the other covid- disease map software components. the contents of the covid- disease map diagrams can be directly transformed into inputs of computational pipelines and data repositories. besides the direct use of sbml format in kinetic simulations, celldesigner sbml files can be transformed into sbml qual [ ] using casq [ ] , enabling boolean modelling-based simulations (see also supplementary material ). in parallel, casq converts the diagrams to the sif format , supporting pathway modelling workflows using simplified interaction networks. 
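to make the simplified interaction networks mentioned above concrete, the following python sketch writes signed edges in sif, where each line has the form `source <relation> target`. it is an illustration only and does not reproduce casq: the relation labels `activates` and `inhibits`, the `to_sif` function and the edge tuples are assumptions chosen for this example.

```python
def to_sif(edges):
    """Serialise signed edges as SIF lines: source<TAB>relation<TAB>target.

    `edges` is an iterable of (source, sign, target) with sign +1 or -1.
    The relation labels are illustrative; SIF itself accepts any label.
    """
    lines = []
    for source, sign, target in edges:
        if sign not in (1, -1):
            raise ValueError(f"sign must be +1 or -1, got {sign!r}")
        relation = "activates" if sign > 0 else "inhibits"
        lines.append(f"{source}\t{relation}\t{target}")
    return "\n".join(lines)


# Placeholder node names standing in for diagram elements.
sif_text = to_sif([("X", 1, "Y"), ("Z", -1, "Y")])
print(sif_text)
```

the resulting text can be loaded by network tools that read sif; the layout and compartment information of the original diagram is lost in this representation, which is the trade-off of a simplified network.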
notably, the gitlab repository features an automated translation of stable versions of diagrams into sbml qual. finally, translation of the diagrams into xgmml format (the extensible graph markup and modelling language) using cytoscape [ ] or ginsim [ ] allows for network analysis and interoperability with molecular interaction repositories [ ] . thanks to the community effort discussed above supported by a rich bioinformatics framework, we constructed the covid- disease map, focussing on the mechanisms known from other coronaviruses [ ] and suggested by early experimental investigations [pmid: ]. then, we applied the analytical and modelling workflows to the contributed diagrams and associated interaction databases to propose initial map-based insights into covid- molecular mechanisms. the covid- disease map is an evolving repository of pathways affected by sars-cov- (figure ). it is currently centred on molecular processes involved in sars-cov- entry, replication, and host-pathogen interactions. as mechanisms of host susceptibility, immune response, cell and organ specificity emerge, these will be incorporated into the next versions of the map. the covid- map represents the mechanisms in a "host cell". this follows literature reports on cell specificity of sars-cov- [ , [ ] [ ] [ ] [ ] [ ] . some pathways included in the covid- map may be shared among different cell types, as for example the ifn- pathway found in dendritic cells, epithelial cells, and alveolar macrophages [ ] [ ] [ ] [ ] [ ] . while at this stage we do not address cell specificity explicitly in our diagrams, extensive annotations may allow identification of pathways relevant to the cell type of interest.
footnotes: systems biology ontology, http://www.ebi.ac.uk/sbo/main/ ; sif format, http://www.cbmc.it/fastcent/doc/sifformat.htm 
the sars-cov- infection process and covid- progression follow a sequence of steps ( figure ), starting from viral attachment and entry, which involve various dynamic processes on different time scales that are not captured in static representations of pathways. correlation of symptoms and potential drugs suggested to date helps downstream data exploration and drug target interpretation in the context of therapeutic interventions.
[figure: labels recovered from the extracted text. pathophysiology: asymptomatic/pre-symptomatic; anosmia, ageusia, cough, fever, diarrhea; shortness of breath; ards, complications; sirs, shock; multiple organ dysfunction; critical. virus-host cell interactions and host response: human host; nasal mucosa; nasal and respiratory epithelium, alveoli, vascular endothelial; lung, heart, kidney; ace ; tmprss ; golgi; er; integrative stress response; cellular stress; apoptosis; renin-angiotensin-aldosterone system (raas). immune cells: ilc; natural killer (nk) cells; granulocytes; dendritic cells; monocytes and macrophages; t cells, th and th response; b cells, antibody production; cd + ; cd + . interventions: vaccine?; pre-exposure prophylaxis?; antivirals?; oxygen therapy; systemic and ventilation support. abbreviations: ards, acute respiratory distress syndrome; raas, renin-angiotensin-aldosterone system; sirs, systemic inflammatory response syndrome.]
transmission of sars-cov- primarily occurs through contact with respiratory droplets, airborne transmission, and through contact with contaminated surfaces [ ] [ ] [ ] . upon contact with the respiratory epithelium, the virus infects cells mostly by binding the spike surface glycoprotein (s) to angiotensin-converting enzyme (ace ) with the help of serine protease tmprss [ ] [ ] [ ] [ ] . importantly, recent results suggest viral entry using other receptors of lungs and the immune system [ , ] . once attached, sars-cov- can enter cells either by direct fusion of the virion and cell membranes in the presence of proteases tmprss and furin or by endocytosis in their absence. regardless of the entry mechanism, the s protein has to be activated to initiate the plasma or endosome membrane fusion process. at the cell membrane, the s protein is activated by tmprss and furin, whereas in the endosome it is activated by cathepsin b (ctsb) and cathepsin l (ctsl) [ , ] . activated s promotes the fusion of the cell or endosome membrane [ ] with the virion membrane, and then the nucleocapsid is injected into the cytoplasm. these mechanisms are represented in the corresponding diagrams of the map . within the host cell, sars-cov- hijacks the rough endoplasmic reticulum (rer)-linked host translational machinery. it then synthesises viral proteins replicase polyprotein a (pp a) and replicase polyprotein ab (pp ab) directly from the virus (+)genomic rna (grna) [ , ] . through a complex cascade of proteolytic cleavages, pp a and pp ab give rise to non-structural proteins (nsps) [ ] [ ] [ ] . most of these nsps collectively form the replication transcription complex (rtc) that is anchored to the membrane of the double-membrane vesicle [ , ] .
endoplasmic reticulum stress and unfolded protein response
as discussed above, the virus hijacks the er to replicate. production of large amounts of viral proteins exceeds the protein folding capacity of the er, creating an overload of unfolded proteins. as a result, the unfolded protein response (upr) pathways are triggered to restore er homeostasis, using three main signalling routes of upr via perk, ire , and atf [ ]. their role is to mitigate the misfolded protein load and reduce oxidative stress. 
the resulting protein degradation is coordinated with a decrease in protein synthesis via eif alpha phosphorylation and induction of protein folding genes via the transcription factor xbp [ ] . when the er is unable to restore its function, it can trigger cell apoptosis [ , ] . the results are er stress and activation of the upr. the expression of some human coronavirus (hcov) proteins during infection, in particular the s glycoprotein, may induce activation of the er stress in the host cells [ ] . based on sars-cov results, this may lead to activation of the perk [ ], ire and in an indirect manner, of the atf pathways [ ] . processes of degrading malfunctioning proteins and damaged organelles, including the ubiquitin-proteasome system (ups) and autophagy [ ] are essential to maintain energy homeostasis and prevent cellular stress [ , ] . autophagy is also involved in cell defence, including direct destruction of the viruses via virophagy, presentation of viral antigens, and inhibition of excessive inflammatory reactions [ , ] . sars-cov- directly affects the process of ups-based protein degradation, as indicated by the host-virus interactome dataset published recently [ ] . this mechanism may be a defence against viral protein degradation [ ] . the map describes in detail the nature of this interaction, namely the impact of orf virus protein on the cul ubiquitin ligase complex and its potential substrates. interactions between sars-cov- and host autophagy pathways are inferred based on results from other covs. a finding that covs use double-membrane vesicles and lc -i for replication [ ] may suggest that the virus induces autophagy, possibly in atg -dependent manner [ ] , although some evidence points to the contrary [ ] . also, the cov nsp restricts autophagosome expansion, compromising the degradation of viral components [ ] . recently revealed mutations in nsp [ ] indicate its importance, although the exact effect of the mutations remains unknown. 
based on the connection between autophagy and the endocytic pathway of the virus replication cycle [ ] , autophagy modulation was suggested as a potential therapy strategy, either pharmacologically [ , [ ] [ ] [ ] , or via fasting [ ] . apoptosis, a synonym for programmed cell death, is triggered by virus-host interaction upon infection, as the early death of the virus-infected cells may prevent viral replication. many viruses block or delay cell death by expressing anti-apoptotic proteins to maximize the production of viral progeny [ ] . in turn, apoptosis induction at the end of the viral replication cycle might assist in viral dissemination while reducing an inflammatory response. for instance, sars-cov- [ ] and mers [ ] are able to invoke apoptosis in lymphocytes, compromising the immune system. apoptosis follows two major pathways [ ] , called extrinsic and intrinsic. extrinsic signals are transmitted by death ligands and their receptors (e.g., fasl and tnf-alpha). activated death receptors recruit adaptors like fadd and tradd, and initiator procaspases like caspase- , leading to cell death with the help of effector caspases- and [ , ] . in turn, the intrinsic pathway involves mitochondria-related members of the bcl- protein family. cellular stress causes bcl- proteins-mediated release of cytochrome c from the mitochondria into the cytoplasm. cytochrome c then forms a complex with apaf and recruits initiator procaspase- to form the apoptosome, leading to the proteolytic activation of caspase- . activated caspase- can now initiate the caspase cascade by activating effector caspases and [ ] . the intrinsic pathway is modulated by sars-cov molecules [ , ] . as intrinsic apoptosis involves mitochondria, its activity may also be exacerbated by sars-cov- disruptions of the electron transport chain, mitochondrial translation, and transmembrane transport [ ] . 
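the intrinsic pathway described above is a linear activation cascade, which makes it a convenient toy case for the boolean modelling that the map is intended to support. the python sketch below uses synchronous updates. it is a deliberately simplified illustration and not a model taken from the map: the node names, the rules and the single `anti_apoptotic_block` input (standing in for viral or host anti-apoptotic proteins) are assumptions made for this example.

```python
# Each rule maps the current state to the next value of one node.
RULES = {
    # Inputs are held constant during a simulation.
    "stress": lambda s: s["stress"],
    "anti_apoptotic_block": lambda s: s["anti_apoptotic_block"],
    # Cellular stress triggers cytochrome c release unless it is blocked.
    "cytochrome_c_release": lambda s: s["stress"] and not s["anti_apoptotic_block"],
    # Released cytochrome c leads to apoptosome assembly.
    "apoptosome": lambda s: s["cytochrome_c_release"],
    # The apoptosome activates the initiator caspase ...
    "initiator_caspase": lambda s: s["apoptosome"],
    # ... which in turn activates the effector caspases.
    "effector_caspases": lambda s: s["initiator_caspase"],
}


def initial_state(stress, anti_apoptotic_block=False):
    """All internal nodes start inactive; only the two inputs are set."""
    state = {name: False for name in RULES}
    state["stress"] = stress
    state["anti_apoptotic_block"] = anti_apoptotic_block
    return state


def step(state):
    """Synchronous update: every node is recomputed from the same previous state."""
    return {name: bool(rule(state)) for name, rule in RULES.items()}


def simulate(state, max_steps=20):
    """Iterate until a fixed point is reached or max_steps updates were applied."""
    trajectory = [state]
    for _ in range(max_steps):
        nxt = step(trajectory[-1])
        if nxt == trajectory[-1]:
            break
        trajectory.append(nxt)
    return trajectory


stressed = simulate(initial_state(stress=True))
blocked = simulate(initial_state(stress=True, anti_apoptotic_block=True))
print(stressed[-1]["effector_caspases"], blocked[-1]["effector_caspases"])
```

in this toy network the signal needs one update per step of the cascade to reach the effector caspases, and setting the blocking input keeps the whole cascade inactive; a model derived from the actual diagrams would have many more nodes and regulators.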
the resulting mitochondrial dysfunction may lead to increased release of reactive oxygen species and pro-apoptotic factors. another vital crosstalk is that of the intrinsic pathway with the pi k-akt pro-survival pathway. activated akt can phosphorylate and inactivate various pro-apoptotic proteins, including bad and caspase- [ ] . sars-cov uses the pi k-akt signalling cascade to enhance infection [ ] . moreover, sars-cov could affect apoptosis in a cell-type-specific manner [ , ] . sars-cov structural proteins s, e, m, n, and accessory proteins a, b, , a, a, and b have been shown to act as crucial effectors of apoptosis in vitro. structural proteins seem to affect mainly the intrinsic apoptotic pathway, with p mapk and pi k/akt pathways regulating cell death. accessory proteins can induce apoptosis via different cascades and in a cell-specific manner [ ] . sars-cov e and a proteins were shown to activate the intrinsic pathway by blocking anti-apoptotic bcl-xl localized to the er [ ] . sars-cov m protein and the ion channel activity of e and a were shown to interfere with pro-survival signalling cascades [ , ] . the viral replication and the consequent immune and inflammatory responses cause damage to the epithelium and pulmonary capillary vascular endothelium and activate the main intracellular defence mechanisms, as well as the humoral and cellular immune responses. resulting cellular stress and tissue damage [ , ] impair respiratory capacity and lead to acute respiratory distress syndrome (ards) [ , , ] . hyperinflammation is a known complication, causing widespread damage, organ failure, and death, followed by a not yet completely understood rapid increase of cytokine levels (cytokine storm) [ ] [ ] [ ] , and acute ards [ ] . 
other reported complications, such as coagulation disturbances and thrombosis are associated with severe cases, but the specific mechanisms are still unknown [ , [ ] [ ] [ ] , although some reports suggest that covid- coagulopathy has a distinct profile [ ] . the sars-cov- infection disrupts the coagulation cascade and is frequently associated with hyperinflammation, renin-angiotensin system (ras) imbalance and intravascular coagulopathy [ , [ ] [ ] [ ] . hyperinflammation leads in turn to detrimental hypercoagulability and immunothrombosis, leading to microvascular thrombosis with further organ damage [ ] . importantly, ras is influenced by risk factors of developing severe forms of covid- [ ] [ ] [ ] . ace , used by sars-cov- for host cell entry, is a regulator of ras and is widely expressed in the affected organs [ ] . the main function of ace is the conversion of angii to angiotensin - (ang - ), and these two angiotensins trigger the counter-regulatory arms of ras [ ] . the signalling via angii and its receptor agtr , elevated in the infected [ , ] , induces the coagulation cascade leading to microvascular thrombosis [ ] , while ang - and its receptor mas attenuate these effects [ ] . the innate immune system detects specific pathogen-associated molecular patterns (pamps), through pattern recognition receptors (prrs). detection of sars-cov- is mediated through receptors that recognise double-stranded and single-stranded rna in the endosome during endocytosis of the virus particle, or in the cytoplasm during the viral replication. these receptors mediate the activation of transcription factors such as ap , nfkappab, irf , and irf , responsible for the transcription of antiviral proteins, in particular, interferon-alpha and beta [ , ] . sars-cov- reduces the production of type i interferons to evade the immune response [ ] . 
the detailed mechanism is not clear yet; however, sars-cov m protein inhibits the irf activation [ ] and suppresses nfkappab and cox transcription. at the same time, sars-cov n protein activates nfkappab [ ] , so the overall impact is unclear. these pathways are also negatively regulated by sars-cov nsp papain-like protease domain (plpro) [ ] . the map contains the initial recognition process of the viral particle by the innate immune system and the viral mechanisms to evade the immune response. it provides the connection between virus entry (detecting the viral endosomal patterns), its replication cycle (detecting cytoplasmic viral patterns), and the effector pathways of pro-inflammatory cytokines, especially of the interferon type i class. the latter seems to play a crucial but complex role in covid- pathology: both negative [ , ] and positive effects [ , ] of interferons on virus replication have been reported. 
interferon type i signalling
interferons (ifns) are central players in the antiviral immune response of the host cell [ ] , specifically affected by sars-cov- [ ] [ ] [ ] [ ] . type i ifns are induced upon recognition of viral pamps by various host prrs [ ] as discussed earlier. the ifn-i pathway diagram represents the activation of tlr and ifnar and the subsequent recruiting of adaptor proteins and the downstream signalling cascades regulating key transcription factors including irf / , nf-kappab, ap- , and isre [ , ] . further, the map shows irf mediated induction of ifn-i, affected by the sars-cov- proteins. sars-cov nsp and orf interfere with irf signalling [ , ] and sars-cov m, n, nsp and nsp act as interferon antagonists [ , , , , ] . moreover, coronavirus orf a, orf and nsp proteins can repress interferon expression and stimulate the degradation of ifnar and stat during the unfolded protein response (upr) [ , ] . 
another mechanism of viral rna recognition is rig-like receptor signalling [ ] , leading to sting activation [ ] , and via the recruitment of traf , tbk and ikkepsilon to phosphorylation of irf [ ] . this in turn induces the transcription of ifns alpha, beta and lambda [ ] . sars-cov viral papain-like-proteases, contained within the nsp and nsp proteins, inhibit sting and the downstream ifn secretion [ ] . in line with this hypothesis, sars-cov- infection results in a unique inflammatory response defined by low levels of ifn-i and high expression of cytokines [ , ] . the ifnlambda diagram describes the ifnl receptor signaling cascade [ ] , including jak-stat signaling and the induction of interferon stimulated genes, which encode antiviral proteins [ ] . the interactions of sars-cov- proteins with the ifnl pathway are based on the literature [ ] or sars-cov homology [ ] . metabolic pathways govern the immune microenvironment by modulating the availability of nutrients and critical metabolites [ ] . infectious entities reprogram host metabolism to create favourable conditions for their reproduction [ ] . sars-cov- proteins interact with a variety of immunometabolic pathways, several of which are described below. heme catabolism is a well-known anti-inflammatory system in the context of infectious and autoimmune diseases [ , ] . the main effector of this pathway, heme oxygenase- (hmox ) was found to interact with sars-cov- orf a, although the nature of this interaction remains ambiguous [ , ] . hmox cleaves heme into carbon monoxide, biliverdin (then reduced to bilirubin), and ferrous iron [pmid: ]. biliverdin, bilirubin, and carbon monoxide possess cytoprotective properties, and have shown promise as immunomodulatory therapeutics [ , ] . importantly, activation of hmox also inhibits the nlrp inflammasome [ ] [ ] [ ] , which is a pro-inflammatory and prothrombotic multiprotein system [ ] highly active in covid- [ ] [ ] [ ] . 
it mediates production of the pro-inflammatory cytokines il- b and il- via caspase- [ ] . the sars-cov orf a, e, and orf a incite the nlrp inflammasome [ ] [ ] [ ] [ ] . still, the potential of the hmox pathway to fight covid- inflammation remains to be tested [ , , ] despite promising results in other models of inflammation [ , , [ ] [ ] [ ] . the tryptophan-kynurenine pathway is closely related to heme metabolism. the rate-limiting step of this pathway is catalysed by the indoleamine , dioxygenase enzymes (ido and ido ) in dendritic cells, macrophages, and epithelial cells in response to inflammatory cytokines like ifn-gamma, ifn- , tgf-beta, tnf-alpha, and il- [ ] [ ] [ ] . crosstalk with the hmox pathway also increases the expression of ido and hmox in a feed-forward manner. metabolomics analyses from severe covid- patients revealed enrichment of kynurenines and depletion of tryptophan, indicating robust activation of ido enzymes [ , ] . depletion of tryptophan [ , , ] and kynurenines and their derivatives affect the proliferation and immune response of a range of t cells [ , [ ] [ ] [ ] [ ] [ ] . however, despite high levels of kynurenines in covid- , cd + t-cells and th cells are enriched in lung tissue, and t-regulatory cells are diminished [ ] . this raises the question of whether and how the immune response elicited in covid- evades suppression by the kynurenine pathway. the sars-cov- protein nsp interacts with three human proteins: gla, sirt , and impdh [ ] . the galactose metabolism pathway, including the gla enzyme [ ] , is interconnected with amino sugar and nucleotide sugar metabolism. sirt is a nad-dependent desuccinylase and demalonylase regulating serine catabolism, oxidative metabolism and apoptosis initiation [ ] [ ] [ ] . moreover, nicotinamide metabolism regulated by sirt occurs downstream of the tryptophan metabolism, linking it to the pathways discussed above. 
finally, impdh is the rate-limiting enzyme in the de novo synthesis of gtp, allowing regulation of purine metabolism and downstream potential antiviral targets [ , ] . the pyrimidine synthesis pathway, tightly linked to purine metabolism, affects viral dna and rna synthesis. pyrimidine deprivation is a host targeted antiviral defence mechanism, which blocks viral replication in infected cells and can be regulated pharmacologically [ ] [ ] [ ] . it appears that components of the dna damage response connect the inhibition of pyrimidine biosynthesis to the interferon signalling pathway, probably via sting-induced tbk activation that amplifies interferon response to viral infection, discussed above. inhibition of de novo pyrimidine synthesis may have beneficial effects on the recovery from covid- [ ] ; however, this may happen only in a small group of patients. covid- pathways featured in the previous section cover mechanisms reported so far. still, certain aspects of the disease were not represented in detail because of their complexity, namely cell-type-specific immune response, and susceptibility features. their mechanistic description is of great importance, as suggested by clinical reports on the involvement of these pathways in the molecular pathophysiology of the disease. the mechanisms outlined below will be the next targets in our curation roadmap. cell type-specific immune response covid- causes serious disbalance in multiple populations of immune cells. some studies report that covid- patients have a significant decrease of peripheral cd + and cd + cytotoxic t lymphocytes (ctls), b cells, nk cells, as well as higher levels of a broad range of cytokines and chemokines [ , [ ] [ ] [ ] [ ] . the disease causes functional exhaustion of cd + ctls and nk cells, induced by sars-cov- s protein and by excessive pro-inflammatory cytokine response [ , ] . 
moreover, the ratio of naïve-to-memory helper t-cells, as well as the decrease of t regulatory cells, correlate with covid- severity [ ] . conversely, high levels of th and cytotoxic cd + t-cells have been found in the lung tissue [ ] . pulmonary recruitment of lymphocytes into the airways may explain the lymphopenia and the increased neutrophil-lymphocyte ratio in peripheral blood found in covid- patients [ , , ] . in this regard, an abnormal increase of the th :treg cell ratio may promote the release of pro-inflammatory cytokines and chemokines, increasing disease severity [ ] . sars-cov- infection is associated with increased morbidity and mortality in individuals with underlying chronic diseases or a compromised immune system [ ] [ ] [ ] [ ] . groups of increased risk are men, pregnant and postpartum women, and individuals with high occupational viral exposure [ ] [ ] [ ] . other susceptibility factors include the abo blood groups [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] and respiratory conditions [ ] [ ] [ ] [ ] [ ] [ ] . importantly, age is one of the key aspects contributing to the severity of the disease. the elderly are at high risk of developing severe or critical disease [ , ] . age-related elevated levels of pro-inflammatory cytokines (inflammation) [ ] [ ] [ ] [ ] , immunosenescence and cellular stress of ageing cells [ , , , , ] may contribute to the risk. in contrast, children are generally less likely to develop severe disease [ , ] , with the exception of infants [ , [ ] [ ] [ ] . however, some previously healthy children and adolescents can develop a multisystem inflammatory syndrome following sars-cov- infection [ ] [ ] [ ] [ ] [ ] . several genetic factors have been proposed and identified to influence susceptibility and severity, including the ace gene, hla locus, errors influencing type i ifn production, tlr pathways, myeloid compartments, as well as cytokine polymorphisms [ , , [ ] [ ] [ ] [ ] [ ] [ ] [ ] . 
we aim to connect the susceptibility features to specific molecular mechanisms and better understand the contributing factors. this can lead to a series of testable hypotheses, including the role of vitamin d counteracting pro-inflammatory cytokine secretion [ ] [ ] [ ] in an age-dependent manner [ , ] , and modifying the severity of the disease. another example of a testable hypothesis may be that the immune phenotype associated with asthma inhibits pro-inflammatory cytokine production and modifies gene expression in the airway epithelium, protecting against severe covid- [ , , ] . in order to understand complex and often indirect dependencies between different pathways and molecules, we need to combine computational and data-driven analyses. standardised representation and programmatic access to the contents of the covid- disease map enable the development of reproducible analytical and modelling workflows. here, we discuss the range of possible approaches and demonstrate preliminary results, focusing on interoperability, reproducibility, and applicability of the methods and tools. our goal is to work on the computational challenges as a community, involving the biocurators and domain experts in the analysis of the covid- disease map and rely on their feedback to evaluate the outcomes. in this way, we aim to identify approaches to tackle the complexity and the size of the map, proposing a state-of-the-art framework for robust analysis, reliable models, and useful predictions. visualisation of omics data can help contextualise the map with experimental data creating data-specific blueprints. these blueprints could be used to highlight parts of the map that are active in one condition versus another (treatment versus control, patient versus healthy, normal versus infected cell, etc.). 
combining information contained in multiple omics platforms can make patient stratification more powerful, by reducing the number of samples needed or by augmenting the precision of the patient groups [ , ] . approaches that integrate multiple data types without the accompanying mechanistic diagrams [ ] [ ] [ ] produce patient groupings that are difficult to interpret. in turn, classical pathway analyses often produce long lists mixing generic and cell-specific pathways, making it challenging to pinpoint relevant information. using disease maps to interpret omics-based clusters addresses the issues related to contextualised visual data analytics. footprints are signatures of a molecular regulator determined by the expression levels of its targets [ ] . for example, a footprint can contain targets of a transcription factor (tf) or peptides phosphorylated by a kinase. combining multiple omics readouts and multiple measurements can increase the robustness of such signatures. nevertheless, an essential component is the mechanistic description of the targets of a given regulator, allowing computation of its footprint. with available sars-cov- related omics and interaction datasets [ ] , it is possible to infer which tfs and signalling pathways are affected upon infection [ ] . combining the covid- disease map regulatory interactions with curated collections of tf-target interactions like dorothea [ ] will provide a contextualised evaluation of the effect of sars-cov- infection at the tf level. the virus-host interactome is a network of virus-human protein-protein interactions (ppis) that can help understanding the mechanisms of disease [ , [ ] [ ] [ ] . it can be expanded by merging virus-host ppi data with human ppi and protein data [ ] to discover clusters of interactions indicating human mechanisms and pathways affected by the virus [ ] . these clusters first of all can be interpreted at the mechanistic level by visual exploration of covid- disease map diagrams. 
in addition, these clusters can potentially reveal additional pathways to add to the covid- disease map (e.g., e protein interactions or tgfbeta diagrams) or suggest new interactions to introduce into the existing diagrams. computational modelling is a powerful approach that enables in silico experiments, produces testable hypotheses, helps elucidate regulation and, finally, can suggest via predictions novel therapeutic targets and candidates for drug repurposing. mechanistic models of pathways allow bridging variations at the scale of molecular activity to variations at the level of cell behaviour. this can be achieved by coupling the molecular interactions of a given pathway with its endpoint and by contextualising the molecular activity using omics datasets. hipathia is such a method, processing transcriptomic or genomic data to estimate the functional profiles of a pathway conditioned by the data studied and linkable to phenotypes such as disease symptoms or other endpoints of interest [ , ] . moreover, such mechanistic modelling can be used to predict the effect of interventions as, for example, the effect of targeted drugs [ ] . hipathia integrates directly with the diagrams of the covid- map using the sif format provided by casq (see section . ), as well as with the associated interaction databases (see section . ). the drawback of approaches like hipathia is their computational complexity, limiting the size of the diagrams they can process. an approach to large-scale mechanistic pathway modelling is to transform them into causal networks. carnival [ ] combines the causal representation of networks [ ] with transcriptomics, phosphoproteomics, or metabolomics data [ ] to contextualise cellular networks and extract mechanistic hypotheses. the algorithm identifies a set of coherent causal links connecting upstream drivers such as stimulations or mutations to downstream changes in transcription factor activities. 
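as a rough illustration of what "coherent causal links" means, the sketch below propagates signs through a small signed, directed network and reports which observed transcription factor changes agree with the prediction. it is a toy breadth-first propagation and not the carnival algorithm itself; the node names, edge signs and observed values are hypothetical.

```python
from collections import deque


def propagate_signs(edges, source, source_sign=1):
    """Predict the sign (+1 up, -1 down) of every node reachable from `source`.

    `edges` is a list of (source_node, target_node, sign) tuples, where sign is
    +1 for activation and -1 for inhibition. Each node keeps the sign of the
    first path that reaches it in breadth-first order (a simplification).
    """
    adjacency = {}
    for src, tgt, sign in edges:
        adjacency.setdefault(src, []).append((tgt, sign))

    predicted = {source: source_sign}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target, sign in adjacency.get(node, []):
            if target not in predicted:
                predicted[target] = predicted[node] * sign
                queue.append(target)
    return predicted


def coherent_targets(predicted, observed):
    """Return the observed nodes whose measured sign matches the prediction."""
    return sorted(n for n, s in observed.items() if predicted.get(n) == s)


# hypothetical toy network: one stimulus, one kinase, two transcription factors
toy_edges = [
    ("stimulus", "kinase_1", 1),
    ("kinase_1", "tf_a", 1),
    ("kinase_1", "tf_b", -1),
]
toy_prediction = propagate_signs(toy_edges, "stimulus")
toy_observed = {"tf_a": 1, "tf_b": 1}  # hypothetical measured activity signs
toy_coherent = coherent_targets(toy_prediction, toy_observed)
```

in this toy case only `tf_a` is explained by the stimulus; a real analysis would have to weigh alternative paths and conflicting evidence, which is where an optimisation-based formulation becomes necessary.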
analysis of the dynamics of molecular networks is necessary to deepen our understanding of crucial regulators behind disease-related pathophysiology. the discrete modelling framework provides this possibility. covid- disease map diagrams, translated to sbml qual (see section . ), can be directly imported by tools like cell collective [ ] or ginsim [ ] for analysis. preserving annotations and layout information ensures transparency and reusability of the models. importantly, cell collective is an online user-friendly modelling platform that provides features for real-time in silico simulations and analysis of complex signalling networks. the platform allows users without a computational background to simulate or analyse models to generate and prioritise new hypotheses. references and layout are used for model visualisation, supporting the interpretation of the results. the mathematics and code behind each model, however, remain accessible to all users. in turn, ginsim is a tool providing a wide range of analysis methods, including efficient identification of the states of convergence of a given model (attractors). model reduction functionality can also be employed to facilitate the analysis of large-scale models. viral infection and immune response are complex processes that span many different scales, from molecular interactions to multicellular behaviour. the modelling and simulation of such complex scenarios require a dedicated multiscale computational architecture, where multiple models run in parallel and communicate with each other to capture cellular behaviour and intercellular communications. multiscale agent-based models simulate processes taking place at different time scales, e.g., diffusion, cell mechanics, cell cycle, or signal transduction [ ] , proposed also for covid- [ ] . 
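the notion of an attractor in a logical model, mentioned above, can be made concrete with a very small example. the sketch below enumerates all states of a boolean network by brute force and keeps those that map onto themselves under a synchronous update (fixed points, the simplest kind of attractor). this only illustrates the concept: the two-node network is hypothetical, and dedicated tools rely on far more efficient methods than exhaustive enumeration.

```python
from itertools import product


def fixed_points(rules, nodes):
    """Return all states that are unchanged by one synchronous update.

    `rules` maps each node name to a function taking the current state
    (a dict node -> bool) and returning the node's next value.
    """
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        next_state = {node: rules[node](state) for node in nodes}
        if next_state == state:
            found.append(state)
    return found


# hypothetical toy model: two components that inhibit each other
toy_nodes = ["a", "b"]
toy_rules = {
    "a": lambda s: not s["b"],
    "b": lambda s: not s["a"],
}
toy_fixed = fixed_points(toy_rules, toy_nodes)
```

the mutual-inhibition motif has two fixed points, one with only `a` active and one with only `b` active; the remaining two states alternate with each other under synchronous updating and therefore form a cyclic attractor rather than a fixed point.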
physiboss [ ] allows such simulation of intracellular processes by combining the computational framework of physicell [ ] with the maboss [ ] tool for stochastic simulation of logical models to study transient effects and perturbations [ ] . implementation of detailed covid- signalling models in the physiboss framework may help to better understand complex dynamics of multi-scale processes such as interactions and crosstalk between immune system components and the host cell in covid- . in this case study, we combine computational approaches discussed above and present results derived from omics data analysis on the covid- disease map diagrams. we measured the effect of covid- at the transcription factor (tf) activity level by applying viper [ ] combined with dorothea regulons [ ] on rna-seq datasets of the sars-cov- infected cell line [ ] . then, we mapped the tfs' normalised enrichment scores (nes) on the interferon type i signalling pathway diagram of the covid- disease map using the sif files generated by casq (see section . ). as highlighted in figure , our manually curated pathway included some of the most active tfs after sars-cov- infection, such as stat , stat , irf and nfkb . these genes are well known to be involved in cytokine signalling and first antiviral response [ , ] . interestingly, they are located downstream of various viral proteins (e, s, nsp , orf a and orf a) and members of the mapk pathway (mapk , mapk and map k ). sars-cov- infection is known to promote mapk activation, which mediates the cellular response to pathogenic infection and promotes the production of proinflammatory cytokines [ ] . altogether, we identified signaling events that may capture the mechanistic response of the human cells to the viral infection. 
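to make the footprint idea used in this case study more tangible, the sketch below scores a transcription factor from the differential expression statistics of its targets, weighting each target by the sign of the regulation. this is a deliberately simple signed mean, not the enrichment procedure implemented in viper, and the gene names, statistics and regulon are hypothetical.

```python
def footprint_score(gene_stats, regulon):
    """Signed mean of target statistics as a crude regulator activity score.

    `gene_stats` maps gene -> differential expression statistic
    (positive means up in infected cells). `regulon` maps target gene ->
    mode of regulation (+1 activated by the regulator, -1 repressed).
    Targets without a measurement are ignored.
    """
    contributions = [
        mode * gene_stats[target]
        for target, mode in regulon.items()
        if target in gene_stats
    ]
    if not contributions:
        return 0.0
    return sum(contributions) / len(contributions)


# hypothetical statistics and regulon for an illustrative regulator
toy_stats = {"gene_a": 2.0, "gene_b": -1.0, "gene_c": 0.5}
toy_regulon = {"gene_a": 1, "gene_b": -1, "gene_x": 1}  # gene_x is unmeasured
toy_score = footprint_score(toy_stats, toy_regulon)
```

an activated target going up and a repressed target going down both count as evidence for higher regulator activity, which is why the toy score is positive even though one of the statistics is negative.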
in this use case, the hipathia [ ] algorithm was used to calculate the level of activity of the subpathways from the covid- apoptosis diagram, with the aim to evaluate whether covid- disease map diagrams can be used for a pathway modelling approach. to this end, a public rna-seq dataset from human sars-cov- infected lung cells (geo gse ) was used. first, the rna-seq gene expression data was normalized with the trimmed mean of m values (tmm) normalization [ ] , then rescaled to range [ ; ] for the calculation of the signal and normalised using quantile normalisation [ ] . the normalised gene expression values were used to calculate the level of activation of the subpathways, then a case/control contrast with a wilcoxon test was used to assess differences in signaling activity between the two conditions. 
figure caption: the activation levels have been calculated using transcriptional data from gse and the hipathia mechanistic pathway analysis algorithm. each node represents a gene (ellipse), a metabolite (circle) or a function (square). the pathway is composed of circuits from a receptor gene/metabolite to an effector gene/function, with interactions simplified to inhibitions or activations (see section . , sif format). significantly deregulated circuits are highlighted by coloured arrows (red: activated in infected cells). the color of the node corresponds to the level of differential expression of each node in sars-cov- infected cells vs normal lung cells. blue: down-regulated elements, red: up-regulated elements, white: elements without statistically significant differential expression. hipathia calculates the overall circuit activation, and can indicate deregulated interactions even if interacting elements are not individually differentially expressed. 
results of the apoptosis pathway analysis can be seen in figure and supplementary material . 
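the preprocessing and contrast steps described for this use case can be sketched in a few lines of plain python. the functions below show a min-max rescaling, a basic quantile normalisation across samples, and the rank-sum statistic underlying an unpaired wilcoxon test. several points are assumptions made for illustration: the target range is taken to be 0 to 1, ties in quantile normalisation are broken by position rather than averaged, the contrast is assumed to be unpaired, and tmm normalisation and p-value calculation are left out. all input values are made up.

```python
def min_max_rescale(values):
    """Linearly rescale a list of numbers to the range 0..1 (assumed range)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def quantile_normalise(samples):
    """Give every sample (list of gene values) the same value distribution.

    The reference distribution is the mean of the sorted samples; each value
    is replaced by the reference value at its rank within its own sample.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n_genes)
    ]
    normalised = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        result = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            result[gene_index] = reference[rank]
        normalised.append(result)
    return normalised


def rank_sum_u(case, control):
    """Mann-Whitney U statistic: pairs where case > control, ties count 0.5."""
    u = 0.0
    for x in case:
        for y in control:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# made-up numbers for illustration only
toy_rescaled = min_max_rescale([2, 4, 6])
toy_normalised = quantile_normalise([[5, 2, 3], [4, 1, 6]])
toy_u = rank_sum_u([3, 4, 5], [1, 2, 3])
```

in practice these steps would be carried out with established statistical packages; the sketch is only meant to show what each transformation does to the data before circuit activities are compared between infected and control samples.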
importantly, hipathia calculates the overall activation of circuits (series of causally connected elements), and can indicate deregulated interactions resulting from a cumulative effect, even if interacting elements are not individually differentially expressed. when discussing differential activation, we refer to the circuits, while individual elements are mentioned as differentially expressed. the analysis shows an overactivation of several circuits, in particular the one ending in the effector protein bax. this overactivation seems to be led by the overexpression of the bad protein, inhibiting the bcl -mcl -bcl l complex, which in turn inhibits bax. indeed, sars-cov- infection can invoke caspase -induced apoptosis [ ] , where bax, together with the ripoptosome/caspase- complex, may act as a pro-inflammatory checkpoint [ ] . this result is supported by studies in sars-cov, showing bax overexpression following infection [ , ] . overall, our findings recapitulate reported outcomes. with evolving contents of the covid- disease map and new omics data becoming available, new mechanism-based hypotheses can be formulated. in the covid disease map community we strive to produce interoperable content and seamless downstream analyses, translating the graphic representations of the molecular mechanisms to executable models. we are also aware of parallel efforts towards modelling of covid- mechanisms, which we plan to include as a part of our ecosystem. these efforts are not yet directly interoperable with the covid disease map content as they either use different notation schemes or require parameters not covered by our biocuration guidelines. at the same time, they provide a complementary source of information and the opportunity to create an even broader toolset to tackle the pandemic. 
the modified edinburgh pathway notation (mepn) scheme [ ] allows for the detailed visual encoding of molecular processes using the yed platform, but diagrams are constructed in such a way as to also function as petri nets. these can then be used directly for activity simulations using the biolayout network analysis tool [ ] . the current mepn covid- model details the replication cycle of sars-cov- , integrated with a range of host defence systems, e.g. type interferon signalling, tlr receptors, oas systems, etc. simulations of altered gene expression, interactions with drug targets or changes to interaction kinetics can be represented by introducing relevant transitions or nodes directly in the diagram. currently, models constructed in mepn can be saved as sbgn.ml files; however, there is a loss of information, and the associated computational features are not compatible with other covid- disease map diagrams (not modelled as petri nets). the covid- disease map can support dynamic kinetic modelling to quantify the behaviour of different pathways and evaluate the dynamic effects of perturbations. however, it is necessary to assign a kinetic equation or a rate law to every reaction in the diagram to be analysed. this process is challenging because any given reaction depends on its cellular and physiological context, which makes it difficult to parameterise. software support from tools like sbmlsqueezer [ ] and reaction kinetics databases like sabio-rk [ ] are indispensable in this effort. nevertheless, the most critical factor is the availability of experimentally validated parameters that can be reliably applied in sars-cov- modelling scenarios. the covid- disease map is both a knowledgebase and a computational repository. on the one hand, it is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. on the other hand, it is a computational resource of curated content for graph-based analyses and disease modelling. 
it offers a shared mental map for understanding the dynamic nature of the disease at the molecular level and also its dynamic propagation at a systemic level. thus, it provides a platform for a precise formulation of models, accurate data interpretation, monitoring of therapy, and potential for drug repositioning. the covid- disease map spans three platforms and assembles diagrams describing molecular mechanisms of covid- . these diagrams are grounded in the relevant published sars-cov- research, completed where necessary by mechanisms discovered in related beta-coronaviruses. this unprecedented effort of community-driven biocuration resulted in over forty diagrams with molecular resolution constructed since march . it demonstrates that expertise in biocuration, clear guidelines and text mining solutions can accelerate the passage from the published findings to a meaningful mechanistic representation of knowledge. the covid disease map can provide the tipping point to shortcut research data generation and knowledge accumulation, creating a formalized and standardized streamline of well defined tasks. this approach to an emerging pandemic leveraged the capacity and expertise of an entire swath of the bioinformatics community, bringing them together to improve the way we build and share knowledge. by aligning our efforts, we strive to provide covid- specific pathway models, synchronize content with similar resources and encourage discussion and feedback at every stage of the curation process. with new results published every day, and with the active engagement of the research community, we envision the covid- disease map as an evolving and continuously updated knowledge base whose utility spans the entire research and development spectrum from basic science to pharmaceutical development and personalized medicine. 
moreover, our approach includes a large-scale effort to create interoperable tools and seamless downstream analysis pipelines to boost the applicability of established methodologies to the covid- disease map content. this includes harmonisation of formats, support of standards, and transparency in all steps to ensure wide use and content reusability. preliminary results of such efforts are presented in the case studies. the covid- disease map community is open and expanding as more people with complementary expertise join forces. in the longer run, the map's content will help to find robust signatures related to sars-cov- predisposition or response to various treatments, along with the prioritization of new potential drug targets or drug candidates. we aim to provide the tools to deepen our understanding of the mechanisms driving the infection and help boost drug development supported with testable suggestions. we aim at building armor for new treatments to prevent new waves of covid- or similar pandemics. 
key: cord- -qpbgq d authors: walker, susanne n.; chokkalingam, neethu; reuschel, emma l.; purwar, mansi; xu, ziyang; gary, ebony n.; kim, kevin y.; schultheis, katherine; walters, jewell; ramos, stephanie; smith, trevor r.f.; broderick, kate e.; tebas, pablo; patel, ami; weiner, david b.; kulp, daniel w. title: sars-cov- assays to detect functional antibody responses that block ace recognition in vaccinated animals and infected patients date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qpbgq d sars-cov- (severe acute respiratory syndrome coronavirus ) has caused a global pandemic of covid- resulting in cases of mild to severe respiratory distress and significant mortality. the global outbreak of this novel coronavirus has now infected > million people worldwide with > million cases in the us (june th, ). 
there is an urgent need for vaccines and therapeutics to combat the spread of this coronavirus. similarly, the development of diagnostic and research tools to determine infection and vaccine efficacy is critically needed. molecular assays have been developed to detect viral genetic material present in patients. serological assays have been developed to determine humoral responses to the spike protein or receptor binding domain (rbd). detection of functional antibodies can be accomplished through neutralization of live sars-cov virus, but this requires significant expertise, an infectible stable cell line, and a specialized biosafety level (bsl- ) facility. as large numbers of people return from quarantine, it is critical to have rapid diagnostics that can be widely adopted and employed to assess functional antibody levels in the returning workforce. this type of surrogate neutralization diagnostic can also be used to assess humoral immune responses induced in patients from the large number of vaccine and immunotherapy trials currently on-going. here we describe a rapid serological diagnostic assay for determining antibody receptor blocking and demonstrate the broad utility of the assay by measuring the antibody functionality of sera from small animals and non-human primates immunized with an experimental sars-cov- vaccine and of sera from infected patients. options for rapid diagnostic of functional antibody responses, fast and simple functional assays may prove to be a critical assessment tool to discriminate between potential cpt donors. in parallel with cpt, major academic, industry and government entities are pushing for pseudovirus neutralization assays run in bsl- facilities were quickly developed to detect the functional antibody response in sera( ). while this is a critical tool for determining protective antibody titers, it requires several days for a readout and is not standardized between laboratories. 
the pseudoviruses produced in these assays are not easily manufactured and take time to express, harvest, and titer. one approach to augment the methods listed above is an enzyme-linked immunosorbent assay (elisa) employed in a competitive manner to determine levels of ace receptor blocking antibodies in a sample. in addition, recent advances in portable and field-deployable surface plasmon resonance (spr) devices( ) and the widespread availability of spr instruments in research laboratories make spr an additional platform for measuring ace receptor blocking. here, we describe a competition elisa and an spr assay developed to rapidly detect ace receptor blocking antibodies in iggs and sera of vaccinated mice, guinea pigs, rabbits and non-human primates, as well as human samples (fig. b). we next sought to confirm the functionality of ace -ighu. previous studies suggest sars-cov- binds to ace with an affinity range of - nm( , ). we determined that our ace -ighu binds with similar affinity to the receptor binding domain (rbd) of sars-cov- spike ( . nm) as assessed by spr (fig. c). next, using enzyme-linked immunosorbent assays (elisas), we immobilized full-length sars-cov- spike protein (containing both the s and s subunits) and incubated a dilution series of ace -ighu (fig. d). the binding curves confirmed the high affinity interaction of the receptor for the spike protein. we further showed similar binding for two independent batches of ace , as well as a sample that was frozen and thawed (fig. s a). from the binding curve, we hypothesized that a reasonable concentration of ace protein fusion needed to see competitive blocking while still binding > % of immobilized spike protein in the absence of blocking would be around . - ug/ml (fig. d, red arrow). to examine if we could construct a competition assay, we employed ace -igmu (mouse fc) to act as a competitor to ace -ighu. 
to match our initial binding elisa format, the competition elisa similarly captured a his x-tagged full-length spike protein by first immobilizing an anti-his polyclonal antibody. a dilution series of the competitor (ace -igmu) was pre-mixed with a constant concentration of soluble receptor (ace -ighu). a secondary anti-human antibody conjugated to horseradish peroxidase (hrp) determines the amount of ace -ighu present through a tmb colorimetric readout (fig. a). in order to formally determine the optimal concentration of soluble receptor (ace -ighu), we performed the assay at four concentrations ranging from . ug/ml to ug/ml (fig. b). the ace -ighu concentration of . ug/ml (red curve in fig. b) showed a complete inhibition curve in the presence of ace -igmu.

animal igg and serological competition

the proof-of-concept competition elisa displayed a full blocking curve, so we sought to utilize this assay for animals immunized with sars-cov- spike protein. the same design for the competition elisa was used for this assay, replacing the ace -igmu competitor with antibodies induced by vaccination (fig. a). in our previous work, balb/c mice were immunized with dna plasmids encoding sars-cov- spike protein( ). to examine the activity of antibodies in the sera, iggs from either naïve mice or vaccinated mice days post-immunization were purified using a protein g column. unlike the ace control, which only binds to the receptor binding site (rbs) on the receptor binding domain (rbd) of the spike protein, antibodies from immunized mice can bind to a multitude of epitopes on the spike protein, including epitopes on the s subunit (which includes the rbd), s subunit or s -s interfaces. while antibodies distal to the rbs should have less effect on ace binding, we hypothesized that such distal antibodies may inhibit ace binding directly by sterically obscuring the rbs or indirectly by causing allosteric conformational shifts in the spike protein. 
to test this, we immobilized either the full spike protein (s +s ) or s alone to examine the levels of detectable blocking antibodies. a mixture of ace -ighu at a constant concentration of . ug/ml and a dilution series of igg from a vaccinated mouse (iggm ) or from naïve mice (naïve iggm) was incubated on the plate. an hrp-conjugated anti-human antibody was added to determine ace binding in the presence of iggm. as fig. b illustrates, there is greater antibody blocking with the full spike protein than with the s subunit alone (fig. s a ). to show the utility of this assay in samples from larger mammals, we examined receptor blocking of rabbit sera from sars-cov- immunization studies. we pooled sera from five for full, uninhibited ace binding, the auc will be larger than the auc for a competitive curve (fig. c) . as seen with the mouse iggs, the pooled vaccinated rabbit igg displayed statistically significant blocking of ace receptor binding compared to the naïve animal pool and the day pool (fig. d, fig s b) . the high dose rabbits reduced ace signal relative to the low dose group, highlighting the utility of the assay to help discriminate between different vaccine regimens. up to this point we had analyzed purified iggs collected from sera; however, we wanted to validate the use of this assay on serological samples as well. the same rabbit sera pools were used as competitors in the competition elisa in a dilution series to compare blocking between sera and purified igg. the rabbit sera displayed statistically significant ace receptor blocking, as we saw in the purified igg assay (fig. e, fig. s c ). next, we sought to show that we could assess receptor blocking in a third animal model, guinea pigs, which were immunized with a sars-cov- spike-based vaccine( ). we compared a pool of sera collected on day post immunization to a day pool (fig. f) .
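the auc comparison described above can be sketched in a few lines. this is a minimal illustration, assuming the area is taken by the trapezoidal rule over log10-transformed dilution factors and that the signal is an absorbance reading; the function name `competition_auc` and all numbers are hypothetical and are not taken from the study.

```python
import math


def competition_auc(dilution_factors, signals):
    """Trapezoidal area under a signal-versus-log10(dilution) curve.

    Assumption: the x axis is log10 of the dilution factor. The source
    does not state how its AUC values were integrated.
    """
    if len(dilution_factors) != len(signals) or len(signals) < 2:
        raise ValueError("need at least two matched (dilution, signal) points")
    # Sort by log10 dilution so the integration runs left to right.
    points = sorted((math.log10(d), s) for d, s in zip(dilution_factors, signals))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# Hypothetical readings: a flat, uninhibited curve and a competitive curve
# whose signal recovers as the competitor is diluted out.
dilutions = [10, 100, 1000, 10000]
naive_signal = [2.0, 2.0, 2.0, 2.0]
vaccinated_signal = [0.2, 0.8, 1.6, 2.0]

auc_naive = competition_auc(dilutions, naive_signal)
auc_vaccinated = competition_auc(dilutions, vaccinated_signal)
```

with these made-up readings the uninhibited curve encloses the larger area, which is the direction of the comparison used in the text: a smaller auc indicates more blocking.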
the spike showed statistically significant ace blocking and, importantly, the pooled sera was comparable to the average auc from all six sera samples (fig. f, fig. s d ). the competition elisa was used to analyze both the iggs and the sera from the groups, both groups showed statistically significant blocking of the ace receptor in these assays (fig. s e, fig. s f ). for detection of ace binding in the presence of primate antibody inhibitors (fig. a) . to confirm the function of the replacement ace -igmu, an initial binding elisa was performed (fig. b) . we predicted a similar concentration was needed for optimal competition on the binding curve (fig. b , red arrow) and confirmed this by running a competition elisa assay using ace -ighu as the competitor at varying ace -igmu constant concentrations (fig. c fig. f, fig s b, s c ). this finding is consistent with the ace blocking data and we show a correlation between pseudovirus neutralization id and auc for residual ace blocking across all our datasets (fig. ) . thus, we have demonstrated that the ace competition assay can be employed to measure receptor inhibition levels of human samples. to quantitate blocking of the spike-ace bimolecular interaction in a second, independent experiment, we developed a sensitive surface plasmon resonance (spr) assay. spr is a widely used platform that does not require secondary antibodies and therefore we could use a single assay format for small animals, nhps and humans. in our assay, a cap sensor chip is used to capture single stranded dna coupled to streptavidin. we are then able to capture biotinylated spike protein to the surface of the sensor chip (fig. a) . in spr, changes in to demonstrate the feasibility of this assay, we used ace -ighu as both the sample (ace sample ) and the receptor (ace receptor ). the sensorgram for this experiment shows ace sample injections at various concentrations binding to sars-cov- rbd between and seconds ( fig. b) . 
at seconds we inject ace receptor at nm. we observe ace receptor binding to sars-cov- rbd at the lower ace sample concentrations, but not at the highest ace sample concentration. the binding signals of ace sample and ace receptor intersect close to nm, suggesting the assay is working as expected (fig c) . a measure of % ace receptor inhibition can be calculated to the dose dependence response of the sample (fig. d) . while the spike-ace interaction is also being considered as an important therapeutic target. indeed, there have been sars-cov- spike-ace inhibitors developed previously( ). to examine the functionality of these small molecules beyond direct binding to the spike protein, assays such as the one developed here are needed. the spr instrument is often used for drug discovery and the spr assay could be easily adapted to examine blocking capabilities of candidate drugs. the elisa assay does not depend on the molecular identity of the competitor, so small molecule or peptide inhibitors could be directly assessed in this assay. our study presents a new set of assays for assessing ability of antibody samples to inhibit sars-cov- spike interaction with its receptor. as with most assays, the limit of detection can be an issue. in some of our samples, we saw robust blocking and in others there was minimal blocking. this could be a property of the samples themselves or a limit in the ability to detect ace inhibition in our assays. 
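the percent inhibition measure mentioned above is not spelled out in the text. the sketch below shows one plausible way to compute it, assuming inhibition is defined as the fractional loss of ace receptor binding response relative to a buffer-only (uninhibited) injection; the function name, the clamping to the 0 to 100 range, and the response values are all assumptions made for illustration.

```python
def percent_receptor_inhibition(receptor_response, uninhibited_response):
    """Percent loss of receptor binding relative to an uninhibited control.

    Assumption: inhibition = 100 * (1 - response / uninhibited response).
    This formula is illustrative; the source does not give its definition.
    """
    if uninhibited_response <= 0:
        raise ValueError("uninhibited response must be positive")
    inhibition = 100.0 * (1.0 - receptor_response / uninhibited_response)
    # Clamp so that measurement noise cannot report less than 0 % or more than 100 %.
    return max(0.0, min(100.0, inhibition))


# Hypothetical receptor responses (arbitrary response units) measured after
# increasing concentrations of a blocking sample.
uninhibited = 200.0
responses = [200.0, 150.0, 50.0, 0.0]
inhibition_curve = [percent_receptor_inhibition(r, uninhibited) for r in responses]
```

plotting `inhibition_curve` against sample concentration would give a dose-response view of the kind described for the spr assay.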
in addition, discovering functional monoclonal antibodies can be the sars-cov- pseudovirus was produced by co-transfection of hek t cells with : ratio of dna plasmid encoding sars-cov- s protein (genscript) and backbone plasmid pnl - substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov- ) an interactive web-based dashboard to track covid- in real time angiotensin-converting enzyme is a functional receptor for the sars coronavirus a pneumonia outbreak associated with a new coronavirus of probable bat origin the sars-cov s glycoprotein: expression and functional characterization structure of the sars-cov- spike receptor- binding domain bound to the ace receptor structural basis of receptor recognition by sars-cov- characterization of the receptor- binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine structural basis for the recognition of sars- cov- by full-length human ace cryo-em structure of the -ncov spike in the prefusion conformation a serological survey on neutralizing antibody titer of sars convalescent sera treatment of critically ill patients with covid- with convalescent plasma effectiveness of convalescent plasma therapy in severe covid- patients convalescent plasma therapy for the treatment of patients with covid- : assessment of methods available for antibody detection and their correlation with neutralising antibody levels development of genetic diagnostic methods for novel coronavirus (ncov- ) in japan serology assays to manage covid- . science. animal igg and serological competition. a) auc is significantly decreased in the presence of vaccinated mouse igg competitors; however a greater decrease is observed when full-length cov- spike protein was immobilized versus naïve mouse igg samples. 
b) elisa competition curves for vaccinated rabbit igg (iggr low dose, blue; iggr high dose, red) or sera (sera low dose, blue; sera high dose, red) versus naïve rabbit igg or c) sera samples (grey) and pooled day rabbit igg or sera samples (black). d) elisa competition curves for week vaccinated guinea pig sera (pool, dark blue; individual animals, blue) versus naïve guinea pig sera samples (grey) and pooled prevaccinated guinea pig sera samples (black). e) auc for vaccinated guinea pig igg pool (blue) versus naïve guinea pig primate serological competition. a) four constant concentrations of ace -igmu were tested with varying concentrations of the ace -ighu competitor to establish an optimal ace -igmu concentration which displays a full blocking curve (red, . ug/ml) from the competitor dilution series while retaining a wide range in signal. b) elisa competition curves for vaccinated nhp sera (blue) versus human sera from nine sars-cov- positive covid- patients were tested in the primate competition assay and compared to sixteen naïve human sera collected pre-pandemic. the auc of the covid- patient serum (purple) is significantly decreased compared to the pre-pandemic human serum (grey) and normalized to a buffer control. b) pseudovirus neutralization assay for the nine the authors would like to thank matthew sullivan for providing feedback on the manuscript. funding key: cord- - jqdi authors: giobbe, giovanni giuseppe; bonfante, francesco; zambaiti, elisa; gagliano, onelia; jones, brendan c.; luni, camilla; laterza, cecilia; perin, silvia; stuart, hannah t.; pagliari, matteo; bortolami, alessio; mazzetto, eva; manfredi, anna; colantuono, chiara; di filippo, lucio; pellegata, alessandro; li, vivian sze wing; eaton, simon; thapar, nikhil; cacchiarelli, davide; elvassore, nicola; de coppi, paolo title: sars-cov- infection and replication in human fetal and pediatric gastric organoids date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: jqdi coronavirus disease (covid- ) pandemic caused by severe acute respiratory syndrome coronavirus (sars-cov- ) infection is a global public health emergency. covid- typically manifests as a respiratory illness but an increasing number of clinical reports describe gastrointestinal (gi) symptoms. this is particularly true in children in whom gi symptoms are frequent and viral shedding outlasts viral clearance from the respiratory system. by contrast, fetuses seem to be rarely affected by covid- , although the virus has been detected in placentas of affected women. these observations raise the question of whether the virus can infect and replicate within the stomach once ingested. moreover, it is not yet clear whether active replication of sars-cov- is possible in the stomach of children or in fetuses at different developmental stages. here we show the novel derivation of fetal gastric organoids from - post-conception week (pcw) fetuses, and from pediatric biopsies, to be used as an in vitro model for sars-cov- gastric infection. gastric organoids recapitulate human stomach with linear increase of gastric mucin ac along developmental stages, and expression of gastric markers pepsinogen, somatostatin, gastrin and chromogranin a. in order to investigate sars-cov- infection with minimal perturbation and under steady-state conditions, we induced a reversed polarity in the gastric organoids (rp-gos) in suspension. in this condition of exposed apical polarity, the virus can easily access viral receptor angiotensin-converting enzyme (ace ). the pediatric rp-gos are fully susceptible to infection with sars-cov- , where viral nucleoprotein is expressed in cells undergoing programmed cell death, while the efficiency of infection is significantly lower in fetal organoids. the rp-gos derived from pediatric patients show sustained robust viral replication of sars-cov- , compared with organoids derived from fetal stomachs. 
transcriptomic analysis shows a moderate innate antiviral response and the lack of differentially expressed genes belonging to the interferon family. collectively, we established the first expandable human gastric organoid culture across fetal developmental stages, and we support the hypothesis that fetal tissue seems to be less susceptible to sars-cov- infection, especially in early stages of development. however, the virus can efficiently infect gastric epithelium in pediatric patients, suggesting that the stomach might have an active role in fecal-oral transmission of sars-cov- . severe acute respiratory syndrome coronavirus is responsible for a pandemic that has proven catastrophic, due to the lack of immunity in the human population and the range of pathological features associated with infection, including severe and often life-threatening respiratory syndromes causing major health, social and economic consequences. the virus has been shown to infect respiratory epithelial cells and to spread mainly via the respiratory tract . as governments and international health agencies seek effective policies to minimise infections, maintain health care delivery, and eventually ease movement restrictions, understanding the pathogenesis and the various mechanisms for transmission is of the utmost importance. it is well established that adults are more likely than children to develop symptoms upon sars-cov- infection, but little is known on the role of children in transmission of the disease. a growing body of literature suggests that replication at the level of the gastrointestinal (gi) tract not only occurs in a large proportion of confirmed cases , but it also extends the overall duration of shedding, after viral clearance from the respiratory tract has occurred . 
interestingly, infected children have been shown to be particularly prone to develop gi symptoms which can be moderate-to-severe, leading to intensive care unit (icu) admission and mimicking, in some cases, symptoms of appendicitis . additionally, sars-cov- was detected by means of electron microscopy in stool samples, elevated concentrations of sars-cov- rna were detected in air samples collected in patients' toilet areas and rectal swabs from mildly symptomatic pediatric patients persistently tested positive, even after viral clearance from the upper respiratory tract had occurred . this evidence, together with the recent demonstration of a high receptor density at the level of the oral cavity and tongue , raises important questions about the likelihood of fecal-oral transmission and whether therapeutic interventions to reduce gastrointestinal infection will play a role in the control of the disease. however, a recent report of consecutive covid- positive children from wuhan did not find any difference in fecal nucleic acid rt-pcr between children with or without gi symptoms, suggesting that rt-pcr detection of the virus was not due to gut infection but coming instead from the respiratory tract from swallowed sputum . defining the role of the gi tract in sars-cov- infection may also help understand the risks of vertical transmission during gestation since amniotic fluid is swallowed by the fetus during gestation and viral contamination has been isolated from the placenta of an affected woman . while samples from affected mothers have so far failed to prove that amniotic fluid, cord blood, and breast milk contain sars-cov- , very limited data are available at this stage for pregnant women with covid- , and even fewer data are available on intrauterine vertical transmission . 
while severe symptoms and death have been recorded in infants as young as months of age , very few cases of pathology in neonates have been associated with infection, and when newborns from infected mothers were screened for sars-cov- , they tested negative for the virus . however, this limited cohort cannot exclude the possibility of infection of the fetuses. reliable human in vitro gi model systems that faithfully reproduce infection dynamics and disease mechanisms will prove key to advancing our understanding of sars-cov- replication and pathology in the gi tract. little information is available with respect to the distribution of the viral receptor angiotensin-converting enzyme (ace ) at the level of the gi tract of humans . in particular, we lack fundamental information regarding which region of the gi system is the target of replication and is primarily associated with the prolonged shedding of sars-cov- in both pediatric and adult patients. organoids have attracted great attention, enabling in vitro disease modeling and providing an ideal tool for studying infectious pathogens, particularly of the gi tract . recent studies have demonstrated that sars-cov- can efficiently infect human intestinal enteroids , , providing evidence in support of the hypothesis that sars-cov- is a pathogen transmissible by the fecal-oral route. however, it remains to be elucidated whether access to the duodenum depends on passive transport of infected oral fluids across the stomach, or on active viral replication in the gastric mucosa. human gastric organoids derived from adult patients and induced pluripotent stem cells have proved to be instrumental for the generation of reliable in vitro models for the characterization of infectious agents , . organoid derivation from human fetal organs has been shown for the intestine , liver and pancreas . here, we describe the novel derivation of proliferative progenitors from human fetal stomach and their expansion in vitro as enterospheres.
furthermore, we provide insight into the ability of sars-cov- to infect an organoid-based model of the gastric mucosa at both fetal and pediatric ages. this work aims to unravel the susceptibility of the stomach to sars-cov- infection through the development of an innovative expandable in vitro model that faithfully reproduces the gastric microenvironment. a deeper understanding of the susceptibility of the human stomach to sars-cov- infection and replication could lay the foundations for the development of therapeutic options to reduce gastrointestinal infection. organoids are organized three dimensional structures that can be grown from isolated stem cells found in adult and fetal tissues. in order to derive a novel in vitro gastric model of fetal origin, we firstly characterized the tissues isolated from human fetuses and compared them to gastric mucosa obtained from pediatric patients undergoing surgery (fig. a) . developing stomach structures are shown in fig. b from carnegie stage (cs) (corresponding to mid-week ) to post conception week (pcw) . gastric crypts start to invaginate between pcw and pcw and form a clearly defined crypt at around pcw (fig. c) . we characterized the appearance of gastric markers during fetal development. mucin ac positive pit mucous cells were evident at pcw , while pepsinogen c (marking chief cells) started to emerge at around pcw (fig. d) . mucin , a gland mucous cell marker, was constitutively expressed from early week (cs ), together with enteroendocrine cells marked by chromogranin a that were present from mid-week (cs ) (fig. e) . we then defined three distinct groups of gastric epithelial tissues based on gland maturity: ) early fetal stomachs from pcw to pcw ; ) late fetal stomachs from pcw to pcw ; ) pediatric stomachs. real time quantitative pcr (qpcr) was performed on gastric tissues obtained from these three groups to examine the gene expression changes of stem cell and differentiated cell markers. 
a significant correlation between developmental stage and mrna expression was observed for axin , mucin ac (muc ac) and pepsinogen a (pga ), with a similar trend for chromogranin a (chga) and atpase h+/k+ transporting subunit beta (atp b) (fig. f) . on the other hand, expression of leucine-rich repeat-containing g-protein coupled receptor (lgr ) and somatostatin (sst) was significantly higher in the late fetal stomachs. following gastric tissue characterization, we efficiently extracted glandular crypts from fetal and pediatric stomachs utilizing chelating buffers and mechanical stress. to improve compatibility with subsequent clinical application of this organoid system, isolated fetal cells were expanded in a chemically defined medium, without the use of animal serum or conditioned media. each gastric cytokine, selected on the basis of previous work, was screened by selectively removing it from organoids that had been split to single cells and grown for days to allow clonal organoid formation. while withdrawal of r-spondin , wnt- a and noggin led to a more differentiated morphology, chir (gsk- inhibitor) proved to be essential for the formation of fetal gastric organoids starting from single cells (fig. a) . no difference in growth in the medium was found between fetal and pediatric organoids. we then performed isolation of several gastric organoid lines (supplementary table ). the isolation protocol proved to be highly efficient and we obtained a biobank composed of lines of early fetal stage (from cs to pcw ), lines of late fetal stage (from pcw to pcw ), and lines of pediatric stage organoids (from months to years old). expanding organoids were stained for the epithelial marker ezrin (ezr) and luminal polarized f-actin (fig. c). muc ac was present on the luminal side of the organoids of all stages, with a relatively lower expression in the early pcw (fig. c ). organoids were expanded and counted for several months, showing a higher rate of expansion for earlier fetal stages (fig. d) .
no plateau was reached in any of the curves even after several months, showing the possibility to obtain stable fetal gastric organoid lines ( supplementary fig. ). after weekly passaging for more than weeks, we further characterized the organoid lines to evaluate genomic stability. single nucleotide polymorphism (snp) arrays on early fetal, late fetal and pediatric organoids showed no chromosomal duplications, no large deletions, nor other karyotype aberrations, demonstrating the organoids are genetically stable after prolonged in vitro culture (fig. e) . real time pcr was performed on organoids grouped in early fetal (cs to pcw ), late fetal (pcw to pcw ) and pediatric. stem cell crypt markers lgr and axin were expressed in these organoids, indicating the presence of proliferating cells. muc ac showed a pattern of increased expression along differentiation comparable to the tissue of origin in fig. f . the expression patterns of muc and sst were also comparable to the tissue of origin. on the other hand, chga showed an inverted pattern of expression, while transcript expression of proton pump transporter atp b, responsible for gastric acid secretion, was lost in the organoid model (fig. f) . next, we characterized the transcriptomics of gastric epithelial tissues and gastric-derived organoids, at three developmental stages. rna-seq was performed on the three groups of early fetal, late fetal and pediatric samples. principal component analysis (pca) showed smaller heterogeneity in the organoid groups derived at different stages of fetal and pediatric development with respect to the primary tissues analyzed at the same stages, which may also include some heterogeneity from the surrounding cells as a result of the isolation procedure (fig. a) . when pca was performed including only organoid samples, the overall variability due to the different developmental stage was comparable to that between biological replicates within the same group ( supplementary fig. a ). 
this analysis suggests that transcriptional differences related to the developmental stage of the tissue of origin could be more subtle than those captured at pca level. we then analyzed the expression of typical gastric markers in organoids derived from tissues at different stages . the only differentially expressed gene (deg) was muc ac, which was more highly expressed in organoids from tissues at later developmental stages (fig. b) , confirming the qpcr results above (fig. f) . we did not observe processes of "intestinalization" of the organoids in culture, as cdx expression was negligible ( supplementary fig. c ). consistent with the qpcr results in fig. f , expression of atp a and atp b proton transporters were not detectable in the rna-seq, confirming the absence of the parietal cells in the organoids ( supplementary fig. d ). on the other hand, most putative genes identifying gastric crypt stem cells (sox , olfm , procr, mki , tacstd ) were expressed at all developmental stages ( supplementary fig. e ). rna-seq analysis on the gastric primary tissues showed a significant increase in transcript levels of the functional markers along the developmental stage (fig. c) , confirming that the temporal trend shown by pca (fig. a) is related to specific gastric developmental stages. when we performed hierarchical clustering analyses of the previously reported genes representing the six stomach cellular subtypes , most of these genes from the analysis were not degs (fig. d) . indeed, these six cell types are known to be all co-present at different stages of embryo development from pcw to . furthermore, we clustered degs between pair of conditions to reproduce a pseudo-temporal profile between the three developmental stages considered (fig. e) . we highlighted in fig. f the results of a pathway enrichment analysis from selected clusters that displayed gastric-related functions. full results are reported in (supplementary file ). 
in order to validate both fetal and pediatric gastric organoids as functional in vitro models of sars-cov- infection and replication, we optimized the culture condition for viral infection in a d system (fig. a) . standard organoids of endodermal organs have a luminal polarity facing the internal portion of the structure, with an apical (inner) f-actin and zonula occludens- (zo- ), and basal (external) lamina marked by b- integrin (b -int) (fig. b) . such inner polarity might be an obstacle to an efficient viral infection in vitro, given that the apical side is luminal. in addition, matrigel might impede efficient diffusion of the virus, thus affecting the likelihood of establishing an infection, and subsequent detection and quantification of the viral progeny released from the infected organoids. to maximize the efficiency of infection and the effective quantification of viral progeny released, we reverted the polarity of the gastric organoids to expose the apical side of the cells on the outer side. organoids were removed from the surrounding extracellular matrix and cultured in suspension for days, resulting in the exposure of the apical f-actin on the outer side, accompanied with muc ac secretion externally (fig. c) . conversely, zo- and b -int expression was inverted compared to standard organoids in fig. b . full d deconvolution images of reversed organoids are shown in supplementary fig. a . we then analyzed the absolute expression of ace and transmembrane protease serine (tmprss ) sars-cov- receptors in our gastric models to evaluate the sars-cov- infection potential. rna-seq data analysis showed that expression of ace was significantly lower in early fetal stomachs compared to the pediatric ones, while late fetal samples' higher variability prevents drawing a final conclusion. on the other hand, tmprss mrna expression was consistently high throughout the stomach samples ( fig. d) . 
we further performed rna sequencing on rp-gos to evaluate the transcriptional changes. pca showed similar clustering among the different stages between rp-gos ( supplementary fig. b ) and normal polarity organoids ( supplementary fig. a) . a comparable pattern of expression for ace and tmprss was also observed in rp-gos, with ace expressed at significantly higher levels in pediatric organoids (fig. e ). protein expression of ace was further confirmed by immunofluorescence staining in all the rp-gos derived at pcw , pcw and pediatric stages (fig. f) . to investigate the susceptibility of organoids to sars-cov- infection, we selected a sars-cov- isolate obtained from the pharyngeal swab of a -year-old pediatric patient. purity of the isolate was confirmed by means of molecular testing comprising an extended panel of bacterial and viral respiratory agents. reversed-polarity gastric organoids derived at pcw , pcw and pediatric stages and normal-polarity organoids from the same stages were infected by trained virologists in a biosafety level (bsl ) laboratory. after a -hour infection, the organoids were cultured up to hours in suspension and checked for structural integrity and viability by visual examination on a daily basis (fig. a) . vero e cells were used as a susceptible substrate for sars-cov- , to validate the infection in vitro (supplementary fig. a ). next, we performed rna-seq analysis on non-infected and infected organoid samples at each developmental stage. interestingly, we identified significant degs in samples from pcw ( degs) and pediatric organoids ( degs), but not in pcw samples (fig. a ). all the degs in both developmental stages were up-regulated after the infection, among which genes (cmpk , ddx , dhx , herc , ifi , ifit , ifit , irf , mx , rsad ) were in common.
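the bookkeeping behind "degs in common" between stages can be sketched with plain set operations. this is a minimal illustration only: the function name `summarize_degs` and the gene labels are hypothetical placeholders, not the genes or the counts reported in the study.

```python
def summarize_degs(degs_by_group):
    """Return (shared, combined) DEG sets across sample groups.

    degs_by_group maps a group label to an iterable of gene identifiers.
    'shared' holds genes called in every group, 'combined' genes called
    in at least one group.
    """
    if not degs_by_group:
        raise ValueError("at least one group of DEGs is required")
    gene_sets = [set(genes) for genes in degs_by_group.values()]
    shared = set.intersection(*gene_sets)
    combined = set.union(*gene_sets)
    return shared, combined


# Hypothetical DEG calls for two infected-versus-non-infected comparisons.
deg_calls = {
    "late_fetal": ["GENE_A", "GENE_B", "GENE_C"],
    "pediatric": ["GENE_B", "GENE_C", "GENE_D", "GENE_E"],
}
shared_degs, all_degs = summarize_degs(deg_calls)
```

here `shared_degs` corresponds to the genes reported as common to both stages, while the size of `all_degs` gives the overall number of distinct degs across the comparisons.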
we further performed an analysis to find the degs associated with the infection irrespective of the developmental stage, identifying a further degs (bst , eif ak , herc , ifi l, lamp , slc a , stat , usp ), for an overall number of degs equal to . intriguingly, approximately % of the degs identified in this study as responsive to the infection were previously found to be degs in a literature survey of transcriptomic data on sars-cov, where genes were identified as degs at the intersection of at least studies (fig. b ). among the common degs, some were first responders of the infection process, such as ddx and ifih , encoding the viral rna sensors rig-i and mda respectively, and their regulators, such as dhx ; others were more downstream players of the response, such as oas , which is activated by detection of dsrna to inhibit viral replication, ifit , which inhibits the expression of viral mrnas, and bst , which limits viral secretion. all the degs identified in this study, except slc a , showed an up-regulation in response to the infection in samples from pcw and the pediatric patient (fig. c) . ifi l showed the highest fold change both in pcw and in pediatric samples. this gene was previously found to be a marker of viral infection compared with bacterial infection and more recently described as a negative modulator of innate immune responses induced after virus infections . type i, ii and iii interferon (ifn) transcripts were not differentially expressed between non-infected and infected rp-gos. we then performed an enrichment analysis within the reactome database to understand the functional implications of the up-regulation of degs after the infection in pcw and pediatric samples (fig. d ).
interestingly, the majority of degs fell within pathways associated with the innate response to viral infection, particularly those involved in the regulation of type i ifn alpha/beta by cytoplasmic pattern-recognition receptors (prrs) such as rig-i and mda , and the expression of ifn stimulated genes (isgs). moreover, to capture more subtle differences between non-infected and infected samples that do not emerge in the deg analysis, we performed a quantitative set analysis for gene expression (qusage) within the gene ontology database (supplementary file ). categories related to the ifn response to the infection were identified in all three sample groups (pcw , pcw , and pediatric); other categories were reported for completeness, but require further studies to understand their relevance. reliable in vitro models capable of reproducing complex in vivo systems are becoming increasingly important in life sciences and play a crucial role in the investigation of emerging pathogens with devastating health and economic impact, such as sars-cov- . in the context of the covid- pandemic, it is still unclear how gastrointestinal virus replication might affect the clinical outcome of infection, the development of immunity and the transmission dynamics in the population. while it has been shown that sars-cov- is frequently detected in rectal samples of affected children and adults, it remains to be determined if the virus is able to produce a primary infection throughout the entire gi tract, or if its presence could be related in part to a passive transport of contaminated sputum coming from the upper respiratory tract. most importantly, the ability of sars-cov- to persist in the gi tract after respiratory clearance has not yet been fully elucidated in terms of viral infectivity, possibly impairing important public health and policy measures for the control of the disease.
these concerns are particularly relevant in children, who appear on average to suffer a less severe respiratory illness compared to adults, despite recording more prominent gi symptoms with clinical pictures mimicking appendicitis , a hyperinflammatory shock syndrome (paediatric multisystem inflammatory syndrome temporally associated with sars-cov- , pims-ts) , or acting as relatively asymptomatic carriers of the virus. those risks have prompted clinical guidelines recommending the avoidance of aerosol producing procedures, including upper gi endoscopies, in children with confirmed or suspected covid- for the safety of frontline clinical staff and other patients. susceptibility of the different portions of the gi tract to sars-cov- infection has not been fully characterized, and owing to a paucity of autopsy reports targeting the gastric compartment , the capacity of sars-cov- to infect the gastric mucosa is still unclear. two recent studies show that the sars-cov- receptor angiotensin converting enzyme (ace ) is highly expressed on differentiated enterocytes and that intestinal organoids derived from the small intestine can be easily infected by sars-cov- , . interestingly, intestinal organoids derived from both humans and horseshoe bats are fully susceptible to sars-cov- infection and sustain robust viral replication. although vertical transmission of sars-cov- seems to be anecdotal, it is still unclear if this lack of infection relates to the inability of the virus to migrate through the placenta , to the low susceptibility of the fetal cells to infection, or simply to low viremic loads. whereas human fetal intestinal organoids have already been reported , a reliable d culture in vitro model of human gastric mucosa at different developmental stages has been challenging to achieve.
in this study, we describe successful derivation of human gastric organoids from both fetal and pediatric samples and we demonstrate that gastric cells are susceptible to sars-cov- infection. furthermore, we describe how a reversed-polarity organoid model can help expose the apical domains, in direct contact with the surrounding microenvironment, so that pathogens can easily access surface receptors on the cells. past studies showed laborious pathogen infection in human gastric organoids by microinjection of helicobacter pylori solution into the lumen of each organoid , . other studies showed disruption of d organoid organization in favor of a d monolayer culture to overcome inner polarity problems in h. pylori infection . recently, in sars-cov- infection studies, intestinal organoid d structures were sheared to expose the apical viral receptors and then reaggregated in ecm hydrogel droplets , . in these studies, infection of organoids upon shearing and embedding was attained, as proved by the immunofluorescent staining of viral antigens and detection of viral rna. however, release of the infectious progeny in the culture supernatants differed greatly, ranging from positive titers of around - tissue culture infectious dose % (tcid )/ml to - tcid /ml . such discrepancy could depend on multiple factors, but we believe that the laborious nature of this approach might increase inter-operator and inter-laboratory variability, resulting in the generation of less-reproducible data. to overcome these complications, we decided to prevent the system perturbation and infect human gastric organoids under steady-state conditions. taking advantage of a polarity reversion study , we generated fetal and pediatric cultures of rp-gos in suspension. in this condition of exposed apical polarity and absence of surrounding matrigel, we could infect organoids and readily titrate the infectivity of the progeny virus, recording infection level comparable to those shown by zhou et al. 
, taking into account an ffu-to-tcid conversion factor of . (data not shown, from previous validation of assays). interestingly, when we infected gastric organoids through shearing and re-embedding in matrigel, infection was achieved but we failed to detect virus in the supernatants, indicating that this approach was suboptimal for our purposes. we demonstrated that the rp-gos are fully susceptible to sars-cov- infection, with an efficiency of replication that correlates directly with the developmental stage of origin. quantification of gene transcripts coding for the viral receptors ace and tmprss suggested that the observed levels of replication do not depend on differences in the density of receptors, given their statistically comparable expression across the three stages. nonetheless, variation in the protein-to-mrna ratio across organoids of different developmental stages should be taken into account and receptivity investigated in future work. immunofluorescence staining for the nucleocapsid indicated a clear cytosolic localization of this protein that in some cells was associated with the presence of the cleaved caspase , confirming the occurrence of apoptosis in the gi compartment . apoptosis in infected gi mucosal cells might account at least in part for the frequent abdominal pain, vomiting and diarrhea described in covid- patients , in particular in pediatric populations. apoptosis is one of the key mechanisms by which cells restrict viral infections, through destruction of the cellular machinery indispensable for virus replication; on the other hand, selected viruses have evolved diverse adaptive strategies to control this phenomenon in their favor . in this respect, sars-cov was shown to replicate in vitro to high titers in cells undergoing apoptosis and to low titers in cells where cytopathic effect was limited and a persistent infection was established .
interestingly, induction of apoptosis for sars-cov was shown to be caused by a nuclear localization of the nucleocapsid protein that in turn resulted in its cleavage by caspases and . the precise mechanism linking nucleocapsid cleavage, apoptosis and the replication efficiency of sars-cov remains unexplored. we consider similar mechanistic studies to be of great interest for deciphering the pathology of sars-cov- in the gi system and its implications for virus shedding and transmissibility. rp-gos of late fetal and pediatric age infected with sars-cov- shared a transcriptional footprint surprisingly similar to that described for infected human small intestine enteroids , in which type i ifn genes were either poorly expressed or undetectable, despite enterocytes and gastric cells displaying moderate levels of isgs primarily involved in the recognition of viral rna. moreover, our transcriptional data are in considerable agreement with clinical and experimental profiles derived from covid- patients, infected normal human bronchial epithelial cells and in vivo studies in ferrets that highlighted a negligible expression of genes of the ifn family but a robust expression of chemokines and isgs. our data provide novel evidence in support of the hypothesis that pathogenesis of covid- is at least in part dependent on a reduced innate antiviral response and an unbalanced cytokine production. nevertheless, in our model chemokines were not differentially expressed, whereas in small intestine organoids, zhou et al. reported degs coding for both chemokine receptors and ligands. since we conducted a bulk rna-seq analysis and infected cells were still a minority in our organoids, we speculate that many processes specific to infected cells most likely did not reach statistical significance and might have gone undetected, which calls for caution in the interpretation of our data.
interestingly, a large overlap of degs with previous transcriptomic studies of sars-cov infection was found, including the peculiar feature of a limited/absent type i ifn induction and the recruitment of a subset of cytoplasmic prrs. similarly to sars-cov, in which orf b and orf are the main antagonists of ifn, a recent study currently under peer review identifies the sars-cov- orf b protein as a potent ifn inhibitor, supporting both our data and the published transcriptomic data discussed here. our gastric organoid system offers a unique tool to characterize the replication of viruses and some of the associated pathological consequences of infection. this innovative model could represent a scalable in vitro platform for the development and testing of antiviral drug candidates targeting the gi system. a deeper understanding of the pathogenic mechanism underpinning the viral colonization of the gi system will potentially expand the available therapeutic options for the inhibition and prevention of gi infection, in an attempt to suppress viral shedding and halt the spread of the disease. the clinical importance of our findings relates to the worrisome phenomenon of prolonged shedding of sars-cov- from the gi tract and calls for further research to assess the risk of vertical transmission in infected women. defining the susceptible age and the target anatomical sites will prove of crucial importance for the implementation of sensitive and sustainable diagnostic screening for the identification of contagious asymptomatic patients.
human fetal stomachs were dissected from tissue obtained immediately after termination of pregnancy from to pcw (post conception week), in compliance with the bioethics legislation in the uk. human pediatric gastric surgical biopsies were collected after informed consent, in compliance with all relevant ethical regulations for work with human participants, following the guidelines of the licenses nd and ds .
fetal stomachs and pediatric biopsies were collected in ice-cold sterile phosphate buffered solution (pbs -sigma-aldrich) and processed within a few hours of collection. gastric crypt stem cells were isolated from specimens following a well-established dissociation protocol . briefly, fetal stomachs were cut open longitudinally along the lesser curvature, while ~ . cm pediatric biopsies were processed as they were obtained. specimens were cold-washed in a plate with chelating buffer (sterile milli-q water (merck millipore) with . mmol/l na hpo , . mmol/l kh po , . mmol/l nacl, . mmol/l kcl, . mmol/l sucrose, . mmol/l d-sorbitol, . mmol/l dl-dithiothreitol, ph , all from sigma-aldrich). mucus was removed with a glass coverslip and the mucosa was stripped from the muscle layer. tissue was cut into small pieces, transferred to a ml tube in fresh chelating buffer and pipetted repeatedly. the supernatant was discarded, and ml of mm edta was added and incubated for min at room temperature. edta was discarded and mucosa pieces were washed in ice-cold pbs with ca + /mg + (sigma-aldrich). tissue was transferred to a new cm plate on ice and pressure was applied on top with a sterile . cm plate, to release the crypts from the mucosa. table . cells were passaged every - days. to passage the organoids, matrigel droplets were disrupted by pipetting in the well and transferred to tubes on ice. cells were washed with ml of cold basal admem+++ and centrifuged at g at °c. (first method) for single cell dissociation, the supernatant was discarded, and the pellet was resuspended in ml of tryple (thermo fisher) and incubated for min. after incubation, organoids were disaggregated by pipetting, and ml of ice-cold admem+++ was added to dilute and inhibit tryple. (second method) for standard organoid passage during expansion, the organoid pellet was resuspended in .
ml of ice-cold admem+++ and organoids were manually disrupted by narrowed (flamed) glass pipette pre-coated in bsa % in pbs, to avoid adhesion to the glass. cells were washed, pelleted and supernatant discarded. almost-dry pellets of disaggregated organoids (or single cells) were thoroughly resuspended in cold liquid matrigel, aliquoted in µl droplets in pre-warmed multi-well plates and incubated at °c for min to form a gel. rho-kinase inhibitor (tocris) was added to single cell dissociated organoids. medium was added and changed every days. fully grown gastric organoids at day after single cell disaggregation were removed from surrounding extracellular matrix using a modified published protocol . matrigel was dissolved with min treatment of the droplets with cell recovery solution (corning) at °c. organoids were retrieved from the plates using % bsa-coated cut-end tips and transferred to % bsa-coated ml tubes. cells were extensively washed with ice-cold pbs and centrifuged at g for min at °c. supernatant was discarded, the pellet was resuspended in complete medium and transferred to non-tissue culture treated low-adhesive multiwell plates (pre-coated in % bsa). organoids were cultured in suspension for days to allow reversion of polarity, before use in infection experiments. for rna isolation from stomach tissues, i) pediatric stomach biopsies consisted of only mucosal layer from surgical samples; ii) late fetal stomachs were cut open and mucosal layer was isolated; iii) early fetal stomachs were processed with no layer isolation, given the small size of the samples. mucus was removed from all the samples with a glass coverslip to prevent rna loss during the isolation protocol, and tissues washed in ice-cold pbs. then the tissues were finely cut with a scalpel on a petri-dish on ice and transferred to . ml tubes. recovery solution (corning) at °c. cells were then washed in ice cold pbs to remove matrix leftovers that could interfere with rna isolation. 
organoids were centrifuged at g at °c and supernatant discarded. dry pellets of tissues and organoids were lysed with rlt buffer (qiagen). rna was isolated with rneasy mini kit (qiagen) following manufacturer's instructions. total rna was quantified using the qubit . fluorimetric assay (thermo fisher scientific). rna reverse transcription was performed using the high-capacity cdna reverse transcription kit (thermo fisher), according to the manufacturer's instructions. reverse transcription was done using the t thermal cycler (bio-rad). the qrt-pcr was performed with taqman gene expression assay probes (thermo fisher) according to the manufacturer's instructions. the following probes (all from thermo fisher) were used: gapdh (glyceraldehyde -phosphate dehydrogenase), lgr (leucine-rich repeat-containing gprotein coupled receptor ), axin (axin-like protein), muc ac (mucin ac), muc (mucin ), pga (pepsinogen a ), sst (somatostatin), gast (gastrin), chga (chromogranin a), atp b (atpase h+/k+ transporting subunit beta). reactions were performed on step one plus real-time pcr system (applied biosystems) and results were analyzed with stepone (version . ) software (life technologies). gapdh expression was used to normalize ct values for gene expression, and data were shown as relative fold change to controls (early fetal stage), using ∆∆ct method, and presented as mean ± sem. for rna-seq data of original tissues and organoids with spontaneous polarity, total rna ( ng) from each sample was prepared using quantseq ' mrna-seq library prep kit (lexogen gmbh) according to manufacturer's instructions. the amplified fragmented cdna of bp in size were sequenced in singleend mode using the nova seq (illumina) with a read length of bp. illumina novaseq base call (bcl) files were converted into fastq files through bcl fastq (version v . . . ) following software guide. sequence reads were trimmed using bbduk software (bbmap suite . 
), following the software guide, to remove adapter sequences, poly-a tails and low-quality end bases (regions with average quality below ). alignment was performed with star . . a on the hg reference assembly obtained from the cellranger website (ensembl ), following the online site guide. the expression levels of genes were determined with htseq-count . . by using cellranger pre-built gene annotations (ensembl assembly ). all transcripts having < cpm in less than samples and percentage of multimap alignment reads > % simultaneously were filtered out. for rna-seq data of non-infected and infected rp-gos, a total of pg of rna was used as input for the synthesis of cdna with the smart-seq v ultra low input rna kit for sequencing (takara bio usa, mountain view, ca, usa). the manufacturer's suggested protocol was followed, with minor modifications. pg of dna generated with the smart-seq v kit were used for library preparation with the nextera xt dna library preparation kit (illumina inc., san diego, ca, usa), following the suggested protocol. libraries were sequenced in paired-end mode using a nova seq sequencing system on an sp, cycles flow cell (illumina inc., san diego, ca, usa). illumina novaseq base call (bcl) files were converted into fastq files through bcl fastq (version v . . . ) following the software guide. alignment was performed with star . . a on the hg reference assembly obtained from the gencode website (primary assembly v. ). estimated transcript counts were determined with rsem . . by using the gencode v. gene annotations. all genes having < cpm in less than replicates of the same condition were filtered out. differentially expressed genes (degs) were computed with edger , using a mixed criterion: a p-value, after false discovery rate (fdr) correction by the benjamini-hochberg method, lower than . and an absolute log (fold change) higher than log ( . ). this analysis was paired between non-infected and infected samples derived from the same original sample.
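the selection logic described above (expression filter, benjamini-hochberg correction, and a combined fdr and fold-change cut-off) can be sketched in a few lines. this is only an illustration under stated assumptions, not the edger pipeline used in the study: it takes per-gene p-values and log2 fold changes as already computed, it applies the expression filter across all samples rather than per condition, and the function names and the parameters `cpm_min`, `min_samples`, `fdr_max` and `fc_min` are hypothetical placeholders, because the actual thresholds are not reproduced in this text.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) to a library size of 1e6."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * 1e6

def keep_expressed(counts, cpm_min, min_samples):
    """Boolean mask of genes (rows) with CPM >= cpm_min in at least min_samples samples.
    Simplification: applied across all samples, not per condition."""
    return (cpm(counts) >= cpm_min).sum(axis=1) >= min_samples

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def select_degs(pvalues, log2_fc, fdr_max, fc_min):
    """Mixed criterion: adjusted p-value below fdr_max AND |log2 FC| above log2(fc_min).
    Both thresholds are placeholders chosen by the caller."""
    fdr = benjamini_hochberg(pvalues)
    return (fdr < fdr_max) & (np.abs(np.asarray(log2_fc)) > np.log2(fc_min))
```

for example, `select_degs([0.01, 0.04, 0.03, 0.005], [2.0, 0.1, -3.0, 0.2], fdr_max=0.05, fc_min=1.5)` flags the first and third genes only, since the other two pass the fdr cut-off but not the fold-change cut-off.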
for rna-seq data of organoids with spontaneous polarity, degs were clustered according to a flat, increasing, or decreasing profile according to the differential expression analysis between pairs of time points. principal component analysis was performed by singular value decomposition (svd) on log (cpm+ ) data, after centering, using matlab r a (the mathworks). degs over-representation analysis of gene ontology (go) and reactome categories was performed using cluego (version . . ) . reactome hierarchy was visualized using cluego within cytoscape . hierarchical clustering of degs was performed on median-centered log (cpm+ ) data in matlab, using euclidean distance and complete linkage. log-normalized expression data were analyzed by the quantitative set analysis for gene expression (qusage) bioconductor package. vero e cells (atcc® crl ™) were maintained in dulbecco's modified eagle's medium (dmem, thermo fisher) supplemented with % fetal calf serum (fcs), penicillin ( u/ml) and streptomycin ( u/ml) (all from thermo fisher) at °c in a humidified % co incubator. the sars-cov- isolate was obtained from a nasopharyngeal swab collected from a -year-old boy in italy. briefly, the swab viral transport medium was filtered through a . µm filter, serially diluted and incubated onto a confluent layer of vero e cells, for days. to ensure purity of the viral isolate, the supernatant of the highest dilution in which cytopathic effect was visible was tested for the presence of human respiratory pathogens including sars-cov- , using the qiastat-dx respiratory sars-cov- panel (qiagen). viral stocks were produced infecting at a multiplicity of infection (moi) of vero e cells cultured in dmem supplemented with % fcs, penicillin ( u/ml) and streptomycin ( u/ml) and incubating the cells for hours. supernatants were collected when % cells exhibited cytopathic effect and cleared by low-speed centrifugation before being stored at - °c. 
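the multiplicity of infection used for stock production and organoid infection is simple arithmetic on the stock titer and the number of target cells. the sketch below shows that relation; the function names and the example numbers are illustrative assumptions, not values from this study, and the poisson estimate of the infected fraction is a standard idealization (independent, uniformly distributed infectious units) that the authors do not state.

```python
import math

def inoculum_volume_ml(moi, n_cells, titer_per_ml):
    """Volume of virus stock (ml) delivering `moi` infectious units per cell.
    titer_per_ml is the stock titer in infectious units (e.g. FFU) per ml."""
    if moi <= 0 or n_cells <= 0 or titer_per_ml <= 0:
        raise ValueError("moi, n_cells and titer_per_ml must be positive")
    return moi * n_cells / titer_per_ml

def expected_fraction_infected(moi):
    """Expected fraction of cells receiving at least one infectious unit,
    assuming a Poisson distribution of units per cell (an idealization)."""
    return 1.0 - math.exp(-moi)

# Hypothetical example: 2e6 cells, stock at 1e7 FFU/ml, target MOI of 0.5.
volume = inoculum_volume_ml(0.5, 2e6, 1e7)   # 0.1 ml of stock
fraction = expected_fraction_infected(0.5)   # about 0.39
```

in organoid cultures the number of accessible cells is itself an estimate, so a nominal moi computed this way should be read as approximate.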
all infections in this paper were performed using the third culture passage of the original isolate. intact organoids were embedded in two µl-drops of matrigel per well, in -well plates. embedded organoids were washed once in dmem and infected at a moi of . by incubation with µl of an expansion medium viral suspension for hours. after removal of the inoculum, organoids were washed twice with a dmem solution and µl of complete medium were added to each well to maintain the culture at °c with % co . reversed-polarity organoids were infected at a moi of by incubation with µl of an expansion medium viral suspension for hours. after infection, organoids were washed twice in dmem to remove unbound virus. rp-gos were dispersed in a µl expansion medium at °c with % co . for all organoid cultures µl of supernatant were harvested at , , , and hours post infection. an equal volume of expansion medium replaced the sampled supernatant at each collection time. an extra sample at hours post infection was collected for the rp-gos. samples were stored at - °c before titration through the ffa. supernatants of organoid cultures and aliquots of viral stocks were serially diluted and incubated on confluent monolayers of vero e cells, in -well plates, for hour. culture medium formulation was the same used for virus propagation. after infection, the inoculum was removed and an overlay of mem, % fbs, penicillin ( u/ml) and streptomycin ( u/ml) and . % carboxy methyl cellulose was added. after hours, the overlay medium was removed and cells were fixed in a % paraformaldehyde (pfa) phosphate buffered solution (pbs), for minutes at °c. upon removal, cells were permeabilized by incubation with a . % triton x- solution for minutes. 
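the titration by serial dilution on vero e monolayers reduces to a short calculation once foci have been counted in a countable well. the sketch below is a generic focus-forming-unit titer calculation under assumed inputs; the function names, the dilution and the inoculum volume are hypothetical and are not taken from the protocol above.

```python
def ffu_per_ml(foci, dilution_factor, inoculum_ml):
    """Titer (FFU/ml) from one countable well.
    dilution_factor is the fold dilution of the sample (1000 for a 10^-3 dilution);
    inoculum_ml is the volume of diluted sample added to the well."""
    if dilution_factor <= 0 or inoculum_ml <= 0:
        raise ValueError("dilution_factor and inoculum_ml must be positive")
    return foci * dilution_factor / inoculum_ml

def mean_titer(wells):
    """Arithmetic mean titer over replicate wells given as
    (foci, dilution_factor, inoculum_ml) tuples."""
    titers = [ffu_per_ml(*well) for well in wells]
    return sum(titers) / len(titers)

# Hypothetical example: 25 and 35 foci in duplicate wells at a 10^-3 dilution,
# 0.1 ml of inoculum per well.
titer = mean_titer([(25, 1000, 0.1), (35, 1000, 0.1)])   # 300000 FFU/ml
```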
immunostaining of infected cells was performed by incubation of the j anti-dsrna monoclonal antibody ( : , ; scicons) for hour, followed by -hour incubation with peroxidase-labeled goat anti-mouse antibodies ( : ; dako) and a min incubation with the true blue™ (kpl) peroxidase substrate. solution of % bovine serum albumin and . % tween- in pbs was used for the preparation of working dilutions of immuno-reagents. after each antibody incubation, cells were washed times through a min incubation with a . % tween- pbs solution. focus forming units (ffu) were counted after acquisition of pictures at a high resolution of x dpi, on a flatbed scanner. human gastric tissues were fixed in % paraformaldehyde (pfa -sigma-aldrich) for hours and embedded in paraffin wax, then cut at µm on a microtome. hematoxylin and eosin (h&e) tissue slides were stained according to manufacturer's instructions with hematoxylin and eosin (h&e) (thermo fisher). immunostaining was performed by blocking and permeabilizing the tissue slides with pbs + triton x- . % with bsa . %. organoid whole-mounts were blocked and permeabilized with pbs + triton x- . % with bsa % for h at room temperature in rotation. primary antibodies were incubated in blocking buffer for h at °c in rotation and extensively washed in pbs + triton x- . %. secondary antibodies were incubated overnight at °c in rotation and extensively washed. slides were mounted in mounting medium, while floating organoids were moved to a glass-bottomed petri dish and blocked with a coverslip on top. the full list of primary and secondary antibodies is presented in supplementary table . organoids were imaged in bright field using a zeiss axio observer a . immunofluorescence images of whole-mount staining and sections were acquired on a confocal microscope zeiss lsm . infected organoid immunofluorescence images were acquired on a leica tcs sp . statistical analyses were performed using the following software: matlab (v. 
r a) for pca, pie plot, bar plot, hierarchical clustering with proteomic and rna-seq data. graphpad prism mac (v. . h) was used with all other graphs and charts. . c) hierarchical clustering of deg genes included in (b), data were median-centered for each pair of non-infected and infected conditions. d) results from an enrichment analysis within reactome database of degs highlighted in (a). symbol size is proportional to number of genes. p-value < - . white symbols were not enriched and were added to highlight the hierarchy between categories within reactome structure. virological assessment of hospitalized patients with covid- evidence for gastrointestinal infection of sars-cov- prolonged presence of sars-cov- viral rna in faecal samples gastrointestinal features in children with covid- : an observation of varied presentation in eight children aerodynamic analysis of sars-cov- in two wuhan hospitals characteristics of pediatric sars-cov- infection and potential evidence for persistent fecal viral shedding high expression of ace receptor of -ncov on the epithelial cells of oral mucosa comparative study of the clinical characteristics and epidemiological trend of covid- infected children with or without gi symptoms first case of placental infection with sars-cov- clinical characteristics and intrauterine vertical transmission potential of covid- infection in nine pregnant women: a retrospective review of medical records vertical transmission of coronavirus disease : severe acute respiratory syndrome coronavirus rna on the fetal side of the placenta in pregnancies with coronavirus disease -positive mothers and neonates at birth sars-cov- infection in children epidemiology of covid- among children in china digestive system is a potential route of covid- : an analysis of single-cell coexpression pattern of key proteins in viral entry process organoids as an in vitro model of human development and disease sars-cov- productively infects human gut enterocytes. 
science ( -. ) infection of bat and human intestinal organoids by sars-cov- modelling human development and disease in pluripotent stem-cell-derived gastric organoids in vitro expansion of human gastric epithelial stem cells and their responses to bacterial infection transplantation of expanded fetal intestinal progenitors contributes to colon regeneration after injury long-term expansion of functional mouse and human hepatocytes as d organoids extracellular matrix hydrogel derived from decellularized tissues enables endodermal organoid culture tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell rna-sequencing controlling epithelial polarity: a human enteroid model for host-pathogen interactions a qpcr expression assay of ifi l gene differentiates viral from bacterial infections in febrile children novel functions of ifi l as a feedback regulator of host antiviral responses quantitative set analysis for gene expression: a method to quantify gene set differential expression including gene-gene correlations clinical characteristics of children with a pediatric inflammatory multisystem syndrome temporally associated with sars-cov- hyperinflammatory shock in children during covid- pandemic does the human placenta express the canonical cell entry mediators for sars-cov- ? 
a novel human gastric primary cell culture system for modelling helicobacter pylori infection in vitro review article: gastrointestinal features in covid- and the possibility of faecal transmission caspase cleavage of viral proteins, another way for viruses to make the best of apoptosis cell type-specific cleavage of nucleocapsid protein by effector caspases during sars coronavirus infection imbalanced host response to sars-cov- drives development of covid- sars-cov- orf b is a potent interferon antagonist whose activity is further increased by a naturally occurring elongation variant sequence analysis star: ultrafast universal rna-seq aligner rsem: accurate transcript quantification from rna-seq data with or without a reference genome edger: a bioconductor package for differential expression analysis of digital gene expression data cluego: a cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks cytoscape: a software environment for integrated models of biomolecular interaction networks interferon-induced transmembrane protein (ifitm ) is upregulated explicitly in sars-cov- infected lung epithelial cells dc is founder, shareholder, and consultant of next generation diagnostic srl. all the other authors of the study declare that they do not have anything to disclose regarding funding or conflict of interest with respect to this manuscript. the authors declare that all data supporting the findings of this study are available within the article, its supplementary information, attached files, and online deposited data (gastric rna-seq data: gse_xxxx, it will be deposited during revision), or from the authors upon reasonable request. key: cord- -z aux t authors: bierig, tobias; collu, gabriella; blanc, alain; poghosyan, emiliya; benoit, roger. m. title: design, expression, purification and characterization of a yfp-tagged -ncov spike receptor-binding domain construct date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: z aux t -ncov is the causative agent of the serious, still ongoing, worldwide covid- pandemic. high quality recombinant virus proteins are required for research related to the development of vaccines and improved assays, and to the general understanding of virus action. the receptor-binding domain (rbd) of the -ncov spike (s) protein contains disulfide bonds and n-linked glycosylations, therefore, it is typically produced by secretion. here, we describe a construct and protocol for the expression and purification of yellow fluorescent protein (yfp) labeled -ncov spike rbd. the fusion protein, in the vector pcdna /to, comprises an n-terminal interferon alpha (ifnα ) signal peptide, an eyfp, a flag-tag, a human rhinovirus c protease cleavage site, the rbd of the -ncov spike protein and a c-terminal x his-tag. we stably transfected hek cells. following expansion of the cells, the fusion protein was secreted from adherent cells into serum-free medium. ni-nta imac purification resulted in very high protein purity, based on analysis by sds-page. the fusion protein was soluble and monodisperse, as confirmed by size-exclusion chromatography (sec) and negative staining electron microscopy. deglycosylation experiments confirmed the presence of n-linked glycosylations in the secreted protein. complex formation with the peptidase domain of human angiotensin-converting enzyme (ace ), the receptor for the -ncov spike rbd, was confirmed by sec, both for the yfp-fused spike rbd and for spike rbd alone, after removal of yfp by proteolytic cleavage. possible applications for the fusion protein include binding studies on cells or in vitro, fluorescent labeling of potential virus-binding sites on cells, the use as an antigen for immunization studies or as a tool for the development of novel virus- or antibody-detection assays. the membrane-anchored, trimeric spike (s) glycoproteins are the most prominent protrusions on the surface of -ncov. 
cov spike proteins typically comprise two subunits. the s subunit is responsible for receptor binding and the s subunit is involved in fusing the membranes of the virus and the host (li ). the s subunit is composed of an n-terminal domain (s -ntd) and a c-terminal domain (s -ctd) (li ). s -ctd comprises two subdomains, one functioning as a core structure, the other one as a receptor-binding motif (li ). the receptor-binding domain of -ncov binds human angiotensin-converting enzyme (ace ) with high affinity (wrapp et al. ). the spike rbd is an important target for drug discovery research (toelzer et al. ) and for the development of vaccines (wrapp et al. ; wang et al. ). within the s trimer, the receptor-binding domains (rbds) can be in a down conformation or alternatively in an up conformation, the latter being the receptor-accessible state (wrapp et al. ). recent complex structures (wang et al. ) confirmed that a single spike rbd, taken out of the trimeric context, is capable of binding its human receptor ace . here, we describe a construct and protocol for the production and purification of milligram amounts of n-terminally yfp-labeled spike rbd. the domain boundaries of our receptor-binding domain (rbd) construct are based on the construct used for the crystal structure by wang et al. (cell , pdb entry lzg), comprising amino acids - , which also includes the receptor-binding motif (amino acids - , uniprotkb -p dtc ). expression is performed by secretion into serum-free medium from adherent, stably transfected hek cells. the protocol involves only standard cell culture techniques and equipment. our experiments confirmed that the fusion protein (also after proteolytic removal of yfp) binds the human ace peptidase domain. the dna coding for the ifn -eyfp-flagtag-prescission_site-s_rbd- xhis-tag-stopstop fusion protein was ordered from genewiz and cloned into the hindiii and xbai sites of pcdna /to (invitrogen).
the human ace peptidase domain (amino acids - ) construct with an n-terminal interleukin- (il- ) peptide and a c-terminal xhis-tag and two stop codons was ordered as a fragmentgene from genewiz and cloned into the kpni and noti sites of pcdna /to. adherent hek cells were grown to ~ % confluence in a cm diameter cell culture dish at °c, % co . just before transfection, the cells were washed with ml pbs. µg plasmid dna and µg of kda, linear polyethylenimine (pei) were mixed in a sterile ml falcon tube and incubated at room temperature for minutes with occasional gentle mixing. next, ml of dulbecco's modified eagle medium (dmem), with high glucose and l-glutamine (bioconcept), without fbs, were added to the dna-pei mixture, followed by another minutes incubation at room temperature with occasional mixing. thereafter, the pbs was removed from the cells and the transfection mixture was added onto the cells and distributed well. after incubation at °c, % co for six hours, ml of dmem high glucose supplemented with % fbs were added, and the cells were incubated at °c, % co overnight. the next day, the cells were split : and grown in dmem high glucose medium supplemented with % fbs. after another overnight incubation, the medium was replaced by fresh medium of the same composition, and supplemented with zeocin (invivogen) to a final concentration of µg/ml and penicillin-streptomycin (pan biotech) to a final concentration of u/ml. the selective medium was exchanged on mondays, wednesdays and fridays until only zeocin-resistant cells remained and the cells were confluent. nine days after transfection, the cells were trypsinized and transferred to a cm cell culture flask in ml selective medium. fourteen days after transfection, / of the cells in the cm flask ( % confluent) were split into a new cm flask for expression, while the other / were transferred to another flask as a backup and for freezing.
sixteen days after transfection, when the cells in the expression flask were  % confluent, they were washed twice with pbs, and ml of serum-free, selective expression medium (opti-mem i reduced serum medium, gibco ref - ) supplemented with µg/ml zeocin (invivogen cat no ant-zn) and µg/ml tetracycline) were added. the selective expression medium was collected on mondays, wednesdays and fridays and replaced with fresh medium of the same composition. once in the serum-free medium, the cells reached confluence within days. the supernatant medium collected from the confluent culture typically contained some detached cells, which were removed by centrifugation at room temperature, rcf for minutes. for upscaling, the cells were expanded into two cell culture flasks with cm surface area each. the cells were grown to confluence and then washed twice in pbs before adding serum-free medium for expression. ni-nta imac after removal of detached cells by centrifugation, the expression medium containing the secreted fusion protein was supplemented with / tablet of protease inhibitor (complete edta-free protease inhibitor cocktail tablets, roche diagnostic gmbh, ) and transferred to a ml falcon tube containing l of washed, pre-equilibrated ni-nta agarose (qiagen cat. no. ) . the solution was incubated at o c with occasional agitation until the next batch of secreted protein became available. then, the ni-nta resin was collected by centrifugation at rcf, minutes at o c. the supernatant was discarded and the fresh batch of clarified, protease inhibitor treated medium was added to the resin. this process was repeated until the ni-nta agarose clearly became yellow. the ni-nta resin was then collected by centrifugation ( rcf, minutes, o c), the supernatant was discarded, and the resin was resuspended in wash buffer ( mm tris ph . , mm nacl) and transferred to a column. the resin was washed six times with ml of ice-cold wash buffer per wash, using gravity flow. 
the protein was eluted in three l steps in mm tris ph . , mm nacl, mm imidazole. for the upscaled purification, ml of ni-nta agarose was used. the imac-purified proteins were centrifuged for minutes at rcf, o c and run on a superdex increase / gl column (code - - ) in mm tris ph . , mm nacl, at a flow rate of . ml/minute on an Äkta ettan system at o c. for binding studies, separate proteins were centrifuged minutes at rcf, o c. equimolar amounts of the supernatants were then mixed and incubated on ice for hour prior to the sec run. for the complex containing the yfp-s_rbd fusion protein, g of sec-purified yfp-s_rbd were mixed with an equimolar amount of imac-purified ace peptidase domain, and the volume was adjusted to l using sec buffer. for the complex containing cleaved s rbd without yfp, g of imac-purified s_rbd were mixed with an equimolar amount of imac-purified ace peptidase domain, and the volume was adjusted to l using sec buffer. after the incubation step, before the sec run, the complexes were again centrifuged for minutes at rcf, o c. negative stain grid preparation . - µl of purified protein ( . mg/ml) were first applied to a glow discharged, carbon coated grid (plano, germany), thereafter excess liquid was blotted away using filter paper and grids were stained with - % uranyl acetate solution. cryo-em grid preparation the protein peak obtained from sec was collected and concentrated to . mg/ml. cryo-em grids were prepared by applying . μl of protein to the glow-discharged quantifoil r . / . -copper mesh grids from electron microscopy science (q -cr . ). the grids were blotted for s, plunge-frozen in liquid ethane using a vitrobot mark iv (thermo fischer scientific), operated at °c and % humidity, and stored in liquid nitrogen until cryo-em data collection. data acquisition was performed using a jem fs transmission electron microscope (jeol, tokyo, japan) equipped with an in-column energy filter and a field emission gun. 
micrographs were recorded with a k /xp direct electron detector (gatan, ametek) and gms software (gatan, ametek). for analysis by sds-page, l x pbs and l ( u) pngase f (from elizabethkingia meningoseptica, expressed in e. coli, sigma aldrich f - un) were added to l of sec purified yfp-s-rbd ( . mg/ml protein concentration), followed by overnight incubation at room temperature. for mass spectrometry, protein deglycosylation was achieved by incubating l ( . mg/ml) protein solution with l of pngase f ( u, glycerol free) at °c overnight. lc/ms analysis was performed on a waters lct premier mass spectrometer (esi-tof) and hplc waters . samples were chromatographed on a reprosil-pur c -aq column ( µm, mmx mm) heated to °c using the conditions shown in

our aim was to produce high-quality, soluble -ncov spike rbd labeled with a fluorescent protein for easy detection. spike rbd contains disulfide bonds and n-glycosylations (see e.g. pdb entry m , yan et al., ; pdb entry vsb, wrapp et al., ; pdb entry lzg, wang et al. ). therefore, this protein domain is usually produced by secretion from eukaryotic cells. only a few fusion proteins are commonly used for secreted proteins, notably the constant domain (fc) of igg and human serum albumin (dalton and barton, ). we instead used yellow fluorescent protein as a fusion protein. analysis of enhanced yellow fluorescent protein (eyfp, ormö et al., ), using the netnglyc . server and the netoglyc . server (steentoft et al., ), revealed no n-glycosylation sites, but a single putative o-glycosylation site, just above threshold, within the yfp sequence. analysis of the yfp structure showed that the putative o-glycosylation site is near the surface of the protein. furthermore, secretion of the enhanced green fluorescent protein (egfp) has previously been described (román et al., ). gfp is nearly identical to yfp in structure and sequence, and also contains the putative o-linked glycosylation site.
this same publication (román et al., ) also suggested improved protein secretion levels when using the interferon alpha (ifn ) signal peptide, compared to a number of commonly used signal peptides, including the signal peptide of interleukin- (il- ). for this reason, we used the ifn signal peptide in our construct, in nearly the same context to the fluorescent protein as in ref. (román et al., ) , except that we left out the start methionine of yfp, since translation starts at the start atg of the signal peptide upstream of the yfp. instead, we inserted a short linker (translating into gly-ser), which allowed the insertion of a bamhi restriction endonuclease recognition sequence for later use of the vector with the signal peptide for other targets. the construct was designed for insertion into the hindiii and xbai sites of the vector pcdna /to (invitrogen), a mammalian expression vector that allows tetracycline-inducible expression from a cmv promoter in cells expressing the tetracycline repressor protein, and constitutive expression in cells not containing the tetracycline repressor protein. at the start of the insert, we entered a noti site containing a partial kozak sequence (gcggccgccatgg), which we completed with additional nucleotides. the penultimate residue in the signal peptide is alanine, resulting in an optimal atgg dna sequence (kozak ) . a flag-tag for detection of the fusion protein or cleaved-off yfp was included at the c-terminus of yfp upstream of a human rhinovirus c protease cleavage site. an xhoi restriction endonuclease recognition site was included in the sequence coding for leu-glu of the c protease site, for later use of the vector with the signal peptide and yfp for other targets. the sequence coding for the -ncov-spike_rbd with a c-terminal, noncleavable x his-tag and two stop codons, was inserted just downstream of the rhinovirus c protease site. the resulting expression construct is depicted in figure . 
we transfected the expression plasmid into hek cells and generated stable cells by selection with zeocin. we then expanded the adherent, stably transfected cell culture in a flask with cm surface area. when a confluence of ~ % was reached, the dmem/fbs medium was replaced by serum-free opti-mem medium. the supernatant medium was collected three times a week, clarified by centrifugation, supplemented with protease inhibitor, and successively incubated with the same µl ni-nta agarose batch. this process was repeated until the ni-nta agarose clearly turned yellowish in color. this stage was reached after nine sequential incubations, each with - ml medium. we analyzed the yfp fluorescence of each medium batch that we collected. comparison to the yfp fluorescence of a purified yfp of known concentration allowed an initial estimation of the amount of secreted protein. ml of medium from hours incubation with the confluent culture typically produced a fluorescence peak height of  relative fluorescence units, which corresponds to a yfp concentration of  g/ml. the cells, originally at  % confluence, reached  - % confluence within a week in serum-free medium, and a subpopulation of cells detached in confluent cultures and had to be removed from the medium by centrifugation prior to addition to the ni-nta resin. after reaching confluence, the amount of protein secreted into the medium remained stable over more than six weeks, based on fluorescence measurements (data not shown). to upscale protein production, we expanded the stably transfected cells from a backup plate to two larger flasks ( cm surface area each). in the original expression flask, the cells had grown slowly after changing to serum-free medium, and the amount of secreted protein increased significantly as the cells reached higher confluence. furthermore, confluent cultures remained productive for several weeks. 
for those reasons, we grew the larger scale cultures to confluence before changing to serum-free medium. we then collected medium from one cm flask and two cm flasks and sequentially incubated the collected medium with a ml batch of ni-nta agarose until the resin turned yellow ( ml medium total, collected over days). the ni-nta resin was washed and the protein was eluted in an imidazole-containing buffer. the initial small scale imac purification from µl ni-nta resin yielded µg protein of high purity (figure a) . the protein solution was monodisperse according to analytical sec, resulting in a single main sec peak at the expected retention volume (figure b) . the upscaled purification from ml ni-nta resin yielded . mg of pure protein after imac. based on analysis by sds-page, the protein purity was already high after ni-nta imac. the sec profile of the yfp-s_rbd fusion protein confirmed the high purity and also showed that the protein solution was monodisperse. this was confirmed by negative staining electron microscopy (em) (figure a) . to test whether the s_rbd protein retains its properties after removal of the fluorescent protein tag, the yfp was removed by rhinovirus c protease cleavage. l ni-nta resin were loaded with protein in five steps with a total of ~ ml expression medium. after washing, the ni-nta resin was incubated overnight in the presence of prescission protease (gst-tagged human rhinovirus c protease). the yfp was then washed off and collected, while the his-tagged s_rbd protein remained on the column. the protein was eluted from the now colorless ni-nta resin using an imidazole-containing buffer and analyzed by sds-page (figure d) . the collected cleaved-off yfp was also analyzed on sds-page, after incubation with glutathione sepharose b to remove the gst-tagged protease (figure d) . the hs_rbd protein was analyzed by analytical sec. 
there was a single main peak at the expected retention volume, with only a slight shoulder, confirming that the s_rbd domain retained its solubility and monodispersity after removal of the yfp fusion protein (figure e). to test whether the purified yfp-s_rbd fusion protein binds its target receptor ace , we produced and purified human ace peptidase domain and analyzed the separate proteins as well as the complex of the two proteins by analytical sec experiments. the complex coeluted in a peak at a reduced retention volume compared to the peak from ace run alone or the peak of yfp-s_rbd run alone, clearly confirming complex formation (figure a-c). to test whether the s_rbd domain retains its ace -binding activity after proteolytic removal of the yfp, the analytical sec experiment was repeated with prescission protease cleaved, purified s_rbd. the two proteins co-eluted in a peak at a reduced retention volume compared to the separate proteins, confirming binding (figure d and e).

over-expression of secreted proteins from mammalian cell lines
an analysis of '-noncoding sequences from vertebrate messenger rnas
structure, function, and evolution of coronavirus spike proteins
crystal structure of the aequorea victoria green fluorescent protein. science
enhancing heterologous protein expression and secretion in hek cells by means of combination of cmv promoter and ifnα signal peptide
precision mapping of the human o-galnac glycoproteome through simplecell technology
free fatty acid binding pocket in the locked structure of sars-cov- spike protein. science
structural and functional basis of sars-cov- entry by using human ace . cell
cryo-em structure of the -ncov spike in the prefusion conformation
structural basis for the recognition of sars-cov- by full-length human ace

we thank takashi ishikawa, gebhard f.x. schertler and michel steinmetz for supporting the project.
this work was in part supported by grants from ubs promedica ( /m) and the swiss national science foundation (snf spark, crsk- _ ) to r.m.b., and by a sinergia grant from the swiss national science foundation (crsii _ / ). we furthermore acknowledge support in part from the psi covid emergency science fund. the project was initiated and coordinated by r.m.b. the ace construct was sub-cloned by t.b. cell culture experiments, protein purification and biochemical analysis were performed by r.m.b., t.b. and g.c. electron microscopy experiments were carried out by e.p. and t.b. mass spectrometry analysis was performed by a.b. the manuscript was written by r.m.b. and t.b. with contributions from all authors. the authors declare no competing interests.

key: cord- -yqe vdj authors: kumar, nilesh; mishra, bharat; mehmood, adeel; athar, mohammad; mukhtar, m. shahid title: integrative network biology framework elucidates molecular mechanisms of sars-cov- pathogenesis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yqe vdj

covid- (coronavirus disease ) is a respiratory illness caused by severe acute respiratory syndrome coronavirus (sars-cov- ). while the pathophysiology of this deadly virus is complex and largely unknown, we employ a network biology-fueled approach and integrate multiomics data pertaining to a lung epithelial cell-specific coexpression network and the human interactome to generate a calu- -specific human-sars-cov- interactome (csi). topological clustering and pathway enrichment analysis show that sars-cov- targets central nodes of the host-viral network that participate in core functional pathways. network centrality analyses discover high-value sars-cov- targets, which are possibly involved in viral entry, proliferation and survival to establish infection and facilitate disease progression.
our probabilistic modeling framework elucidates critical regulatory circuitry and molecular events pertinent to covid- , particularly the host modifying responses and cytokine storm. overall, our network centric analyses reveal novel molecular components, uncover structural and functional modules, and provide molecular insights into sars-cov- pathogenicity. from the epicenter of the covid- (coronavirus disease ) outbreak in china, the disease has spread globally in countries/territories with over . million confirmed cases and almost , fatalities as of april , , and the world health organization (who) warned that the pandemic is accelerating worldwide , . apart from the human tragedy, covid- has a growing detrimental impact on the global economy and will likely cause trillions in financial losses worldwide in alone. covid- is an infectious respiratory illness caused by a highly contagious and pathogenic sars-cov- (severe acute respiratory syndrome coronavirus ). this single-stranded rna virus belongs to the family coronaviridae and is closely related to another human coronavirus sars-cov with . % nucleotide similarity , . sars-cov and another human coronavirus mers-cov (middle east respiratory syndrome-cov) caused two previous global epidemics in and , respectively, both characterized by high fatality rates , . these coronaviruses mainly spread from a contagious individual to a healthy person through respiratory droplets derived from an infected person's cough or sneeze, and from direct contact with contaminated surfaces or objects, where the virus can maintain its viability for period ranging from hours to days , . unlike other coronaviruses, sars-cov- transmits more efficiently and sustainably in the community according to center for disease control (cdc) . 
while majority of the patients infected with sars-cov- develop a mild to moderate self-resolving respiratory illness, infants and older adults (≥ years) as well as patients with preexisting medical conditions such as cardiovascular disease, diabetes, chronic respiratory disease, renal dysfunction, obesity and cancer are more vulnerable , . the pathophysiology of sars-cov- is complex and largely unknown but is associated with an extensive immune reaction referred to as 'cytokine storm' triggered by the excessive production of interleukin beta (il- β), interleukin (il- ) and others. the cytokine release syndrome leads to extensive tissue damage and multiple organ failure . while no vaccine or antiviral drugs are currently available to prevent or treat covid- , identifying molecular targets of the virus could help uncover effective treatment. integrated interactome-transcriptome analysis to generate calu- -specific human- it is likely that the outcome of sars-cov- infection can largely be determined by the interaction patterns of host proteins and viral factors. to build the human -sars-cov- interactome, we first assembled a comprehensive human interactome encompassing experimentally validated ppis from string database . since the string database is not fully updated, we manually curated ppis from four additional proteomes-scale interactome studies, i.e. human interactome i and ii, bioplex, qubic, and cofrac (reviewed in ). this yielded us an experimentally validated high quality interactome containing , nodes and , edges (fig. a ). subsequently, we compiled an exhaustive list of host proteins interacting with the novel human coronavirus that was referred to as sars-cov- interacting proteins (sips) (supplementary data ). 
this comprises human proteins associated with the peptides of sars-cov- , whereas the remaining host proteins interact with the viral factors of other human coronaviruses including sars-cov and mers-cov , which could also be of significance in understanding the molecular pathogenesis of sars-cov- . by querying these sips in the human interactome, we generated a subnetwork of , nodes and , edges that covers first and second neighbors of sips (fig. a) . given that the sips-derived ppi subnetwork may not operate in all spatial or temporal conditions, coronavirus-specific co-expression data is used to filter the interactions in the context of covid- . it is important to note that no exceptionally high-resolution sars-cov- transcriptome was available at the time of analysis (details below). therefore, we took advantage of extensive temporal expression data available for sars-cov and mers-cov (fig. b) . towards this, we performed a weighted coexpression network analysis (wgcna) in human airway epithelial cells (calu- ) treated with sars-cov and mers-cov over time in vitro in culture. this analysis yielded a comprehensive co-expression network with , nodes and , , edges ( fig. b) . by integrating this calu- co-expression network with sips-derived ppi subnetwork, we generated calu- -specific human-sars-cov- interactome (csi) that contains sips interacting with their first and second neighbors make a network of , nodes and , edges (fig. c, supplementary data ) . we showed that csi follows a power law degree distribution with a few nodes harboring increased connectivity, and thus exhibits properties of a scale-free network (r = . ; (fig. d , supplementary data ), similar to the previously generated other human-viral interactomes , , , , , , , , , , , , , . taken together, we constructed a robust, high quality csi that was further utilized for network-aided architectural and functional pathway analyses. 
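the scale-free claim above rests on a power law fit to the degree distribution. the following is a minimal sketch of one common way to check this, assuming an ordinary least-squares line through the log-log degree distribution; the source does not state which fitting procedure was used, and the function names and the toy edge list are hypothetical.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

def loglog_fit(dist):
    """Least-squares line through (log k, log P(k)); returns (slope, r_squared).

    Assumes at least two distinct degrees with different frequencies.
    """
    total = sum(dist.values())
    xs = [math.log(k) for k in dist]
    ys = [math.log(n / total) for n in dist.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# hypothetical toy network: one hub connected to four leaves
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
slope, r_squared = loglog_fit(degree_distribution(toy_edges))
```

a negative slope with r squared close to one is the pattern usually read as a power law; on real interactomes a maximum-likelihood fit is often preferred over log-log regression, so this sketch is illustrative only.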
from a network biology standpoint, a viral infection as well as other pathogen attacks can be viewed as a set of strategic perturbations, at least in part, within the core components of the host interactome. since such central nodes correspond to proteins that exhibit increased connectivity and/or central positions within a network, we asked whether sars-cov- also attacks such important nodes within csi. towards this, we calculated the average degree (number of connections), betweenness (the fraction of all shortest paths that include a node within a network), load centrality (the fraction of all shortest paths that pass through a node), information centrality (the harmonic mean of all the information measures for a node in a connected network) and pagerank index (counting incoming and outgoing connections considering the weight of the edge) for sips, and compared them with their first and second neighbors. we demonstrated that these topological features of sips were significantly higher than those of the other nodes within csi (fig. a, b, c and supplementary fig. a and b, supplementary data ; t-test p < . ). we also showed that sips were significantly enriched in csi compared to the human interactome (fig. d, supplementary data ; hypergeometric p< . e- ). these results indicate that sars-cov- targets core structural components of the human-viral interactome, and prompted another question as to whether csi also activates common biological processes in response to viral infection. since nodes within csi not only form protein complexes with each other but also transcriptionally co-express, we reasoned that densely connected nodes within this network may participate in similar biological functions. towards this, we investigated the underlying modular structures (protein clusters ≥ nodes) in csi followed by ingenuity pathway analysis (ipa).
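the hypergeometric enrichment of sips in csi mentioned above can be illustrated with a short sketch. it assumes the usual one-sided upper-tail test (probability of seeing at least the observed overlap by chance); the source does not give the exact formulation, and the function name and argument names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are sampled without replacement
    from `population` items, of which `successes` belong to the set of interest."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# hypothetical toy numbers: 10 proteins in the background, 3 of them sips,
# a subnetwork of 3 proteins that contains all 3 sips
p_value = hypergeom_enrichment_p(population=10, successes=3, draws=3, observed=3)
```

here the background would be the human interactome, the drawn set the csi nodes, and the set of interest the sips; exact integer arithmetic via `math.comb` avoids rounding problems with very small p-values.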
this approach allowed us to identify modules ranging from to nodes for the smallest and largest modules, respectively. subsequently, we examined the biological processes, cellular pathways and signaling cascades that are modulated in the top modules. we also performed a human phenotype ontology analysis, which identifies phenotypic abnormalities encountered in human diseases. significantly enriched terms included mitochondrial inheritance, hepatic necrosis, respiratory failure and abnormality of the common coagulation pathway (supplementary fig. e). collectively, we showed that sars-cov- proteins interact with central nodes of csi, and that these proteins are implicated in core molecular and cellular pathways that establish infection and sustain disease progression. human-viral interactome landscapes of several viruses have previously shown that viral proteins interact with nodes corresponding to high degree (hubs) and high betweenness (bottlenecks), and such structural features have previously been used to predict viral targets. in addition to hubs and bottlenecks, the pagerank algorithm has also been used effectively to identify viral targets. moreover, these physical characteristics can also be used to prioritize the most influential genes in csi for biological relevance and drug target discovery. here, we used nine different centrality indices to identify the most influential nodes, referred to as csi significant proteins (csps).
these include the above-described degree, betweenness, information centrality, pagerank index and load centrality as well as additional features such as eigenvector centrality (a measure of the influence of a node in a network), closeness centrality (reciprocal of the sum of the length of the shortest paths between the node and network), harmonic centrality (reverses the sum and reciprocal operations of closeness centrality) and weighted k-shell decomposition (an edge weighting method based on adding the degree of two nodes in network partition). while weighted k-shell decomposition analysis was recently performed to increase the predictability of host targets of bacterial pathogens, we showed that the top % of nodes reside in the inner layers of csi (fig. a, supplementary data ). for other centrality measures, we also maintained a stringent threshold of top % to be considered as a highly influential node or csp. evidently, we can expect overlapping topological features for the same set of nodes. notably, we observed a strong positive correlation between information centrality and degree (fig. b; r = . ), betweenness and degree (fig. c; r = . ) and pagerank and degree (fig. d; r = . , supplementary data ). collectively, we identified csps that exhibit more than one high centrality measure (fig. e, supplementary data ). for instance, eef a , which has previously been implicated in sars, was enriched in all the centrality measures tested in our study (fig. f, supplementary data ). in addition, ube i, ppia, and phb were also associated with sars and were enriched in more than five centrality measures (fig. f, supplementary data ). we categorized these csps into three major groups based on their potential roles in covid- . while we expect some, if not all, of these proteins to have more than one function, the group- csps might be largely relevant to modifying host response following sars-cov- infection (fig. e).
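the selection of csps as nodes that rank in the top fraction for more than one centrality measure can be sketched as follows. this is a simplified illustration under stated assumptions: only three of the nine measures (degree, closeness, harmonic) are computed, closeness is normalized by the number of reachable nodes, ties are broken by insertion order, and all names and the toy network are hypothetical rather than taken from the source.

```python
import math
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def centralities(adj):
    """Degree, closeness and harmonic centrality for every node."""
    degree, closeness, harmonic = {}, {}, {}
    for node in adj:
        others = [d for n, d in bfs_distances(adj, node).items() if n != node]
        degree[node] = len(adj[node])
        closeness[node] = len(others) / sum(others) if others else 0.0
        harmonic[node] = sum(1.0 / d for d in others)
    return {"degree": degree, "closeness": closeness, "harmonic": harmonic}

def top_fraction(scores, fraction):
    """Nodes whose score ranks within the top `fraction` of all nodes."""
    k = max(1, math.ceil(fraction * len(scores)))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def count_top_memberships(measures, fraction):
    """For each node, the number of measures in which it is in the top fraction."""
    counts = {}
    for scores in measures.values():
        for node in top_fraction(scores, fraction):
            counts[node] = counts.get(node, 0) + 1
    return counts

# hypothetical toy network: one hub connected to four leaves
adj = adjacency([("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")])
memberships = count_top_memberships(centralities(adj), fraction=0.2)
```

nodes with a membership count above one would correspond to candidates that are high in several measures at once; the cutoff fraction is a free parameter here, since the threshold used in the study is not recoverable from the text.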
moreover, the proteins in the other two groups might be involved in viral entry, proliferation, survival and pathogenesis as well as cytokine storm ( fig. e ; see details in discussion). furthermore, we found that these csps are targets of some of the well-known sars-cov- viral proteins. sars-cov- nsp targets most of the csps (i.e. seven in total), sars-cov nsp targets five csps, and sars-cov- m has four csps targets, while other sars-cov- nsps' ( , , ) and sars-cov- orfs' ( b, , , c) possess relatively fewer targets. intriguingly, three of our csps (ppia, rps , and ndufa ) are targets of more than one sars-cov protein (fig g, supplementary fig. ), while phb is the target of several viral proteins tested as bait at low threshold. it is also important to note that phb is also targeted by viral proteins of sars-cov . these data support previous findings that an individual viral factor can target multiple host nodes and several viral proteins can interact with the same host protein , , , , , , , , , , , , , . collectively, these data strengthen our notion that centrality measures can be an effective method to predict highly influential nodes, leading us to discover such csps. to further understand the biological characteristics, regulatory relationships and molecular events associated with the nodes in csi, we incorporated transcriptome data of covid- patients derived from bronchoalveolar lavage fluid (balf) and peripheral blood mononuclear cells (pbmc) with our csi data . overall, sars-cov- infection exhibited largely different transcriptional signatures for balf and pbmc . we identified a set of and differentially expressed genes (degs) in balf and pbmc, respectively (p≤ . , fc ≥ . , fig. a , b, supplementary data ). thus, csi constitutes over % of transcriptomes pertaining to both balf and pbmc. 
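the deg selection and the overlap with csi described above can be sketched as a simple filter. the cutoffs are parameters because the values used in the study are not recoverable from the text, and treating the fold-change threshold symmetrically (up- and down-regulation) is an assumption; gene names and function names are hypothetical.

```python
def differentially_expressed(genes, p_cutoff, fc_cutoff):
    """Split genes into up- and down-regulated sets.

    genes: {name: (p_value, fold_change)}, fold_change given as a positive ratio.
    A gene passes if p_value <= p_cutoff and its ratio is at least fc_cutoff
    (up) or at most 1 / fc_cutoff (down).
    """
    up, down = set(), set()
    for name, (p_value, fold_change) in genes.items():
        if p_value > p_cutoff:
            continue
        if fold_change >= fc_cutoff:
            up.add(name)
        elif fold_change <= 1.0 / fc_cutoff:
            down.add(name)
    return up, down

def network_coverage(degs, network_nodes):
    """Fraction of differentially expressed genes that are nodes of the network."""
    degs = set(degs)
    return len(degs & set(network_nodes)) / len(degs) if degs else 0.0

# hypothetical toy input
toy_genes = {"g1": (0.01, 4.0), "g2": (0.01, 0.2), "g3": (0.5, 8.0), "g4": (0.01, 1.1)}
up, down = differentially_expressed(toy_genes, p_cutoff=0.05, fc_cutoff=2.0)
coverage = network_coverage(up | down, {"g1", "x"})
```

in practice the p-values would typically be corrected for multiple testing before filtering; the sketch leaves that step out.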
intriguingly, in balf, we observed that the upregulated cluster a is enriched with eif signaling/translation pathway, while the two down-regulated clusters (b and c) are enriched in retinoic acid-mediated apoptosis signaling pathway (fig. a ). conversely, one major cluster that is significantly upregulated in pbmc is enriched in t cell receptor regulation of apoptosis and protein ubiquitination pathway (fig. b) . these data further support the notion that significantly enriched protein modules in csi are involved in sars-cov- pathogenesis. to reveal the regulatory circuitry and molecular events pertinent to sars-cov- infection, we performed probabilistic modeling using idrem (interactive dynamic regulatory events miner) framework that incorporates protein-dna interaction data with transcriptomics . given that idrem requires time-course transcriptional profiling data, and in vivo or in vitro temporal sars-cov- transcriptome data is currently lacking, we made use of a high-resolution temporal sars-cov dataset ( time points) . however, we only focused on those upstream transcriptional factors (tfs) and downstream target genes that were also present in balf and pbmc, which allowed us to mimic sars-cov- -mediated dynamic regulatory networks. this dynamic regulatory modeling identified several bifurcation points, where a set of tfs regulates their potential co-expressed and downstream target genes ( data ). among them, we observed the first major wave of differential regulation and activation of tfs at -hour post infection. at this bifurcation transcriptional event, we found a set of tfs (yy , stat , stat , and srebf ), which were also expressed in balf transcriptome. the next major bifurcation occurred at -hour post infection, comprising and tfs expressed in balf and pbmc, respectively (supplementary data ). 
while we found similar sets of target genes regulated by diverse sets of tfs at different stages of infection, we also discovered multiple combinations of tfs regulating similar sets of downstream genes (fig. c). this reflects the intricate nature of dynamic regulatory relationships between tfs and their targets. next, we primarily focused on four major pathways/signaling events, i.e. cytokine storm, eif signaling/translation, protein ubiquitination pathway and t cell receptor regulation of apoptosis. in the first example of cytokine storm, we identified a total of tfs. predominantly, we found that two tfs (stat and stat ) and one master regulator (jun) are early transcriptional players activated at - and -hour post infection. in particular, we found csi genes cxcl and tnfaip co-regulated with cxcl and cxcl , and il- a and il- , respectively, indicating that members of csi participate in cytokine storm. the majority of these tfs are related to inflammatory/immune regulatory processes. similarly, during eif signaling/translation, we identified a total of tfs (mxi ,

in the last two decades, intra- and inter-species interactomes have been generated in a number of prokaryotes and eukaryotes including human, mouse, worm and plant models. investigating such interactomes has indicated that diverse cellular networks are governed by universal laws, and led to the discovery of shared and distinct molecular components and signaling pathways implicated in viral pathogenicity. in the present study, we constructed a calu- -specific human-sars-cov- interactome (csi) by integrating the lung epithelial cell-specific co-expression network with the human interactome. we determined that csi displayed features of scale-freeness and was enriched in different centrality measures. identification of structural modules revealed their relationships with a set of functional pathways in csi. in-depth network analyses revealed the most influential nodes.
additional noteworthy findings pertain to sars-cov- transcriptional signatures, regulatory relationships among diverse pathways in csi and overall sars-cov- pathogenesis including the cytokine storm. we constructed a comprehensive and robust csi, a human-viral interactome that displayed scale-free properties (r = . ; fig. d). we also showed that the sars-cov- interacting proteins (sips) exhibit increased average centrality indices compared to the remaining proteins in the network (fig. d, supplementary fig. a, b). numerous human-viral interactomes have previously been generated to uncover global principles of viral entry, infection and disease progression. these include human t-cell lymphotropic viruses, epstein-barr virus, hepatitis c virus, influenza virus, human papillomavirus, dengue virus, ebola virus, hiv- , and sars-cov, and all of these interactomes exhibited a power law distribution. another significant tenet of interactomes is the existence of modular structures or modules, defined as sets of densely connected clusters within a network that exhibit heightened connectivity among nodes within a module. such nodes within a module have previously been deemed to possess similar biological function or belong to the same functional pathways. since nodes in csi not only form protein complexes but also coexpress specifically in response to coronavirus infection, we extracted several functional modules from our network (fig. e-k). the most highly connected module pertains to eif signaling, and is composed of protein translation-related proteins such as rps and rpls. indeed, these ribosomal proteins have been shown to interact with viral rna for viral protein biosynthesis, and are subsequently required for viral replication in the host cells. notably, two ribosomal proteins, rpl and rps , were found to interact with several sars-cov- viral factors.
moreover, both of these proteins are also csi significant proteins (csps) that harbor increased centrality measures (fig. e). intriguingly, rps has been demonstrated to operate as an immune factor that activates tlr -mediated antiviral responses. it remains to be addressed whether rps is a "double whammy" target of sars-cov- for ( ) hijacking this important factor for viral translation and replication, and ( ) suppressing a critical immune signaling pathway. regardless, ribosomal proteins are critical targets of numerous viruses and play equally essential roles in developing antiviral therapeutics. the ubiquitin proteasome system (ups) constitutes the major protein degradation system of eukaryotic cells, participates in a wide range of cellular processes, and is another critical target of diverse viruses. ups plays an indispensable role in fine-tuning the regulation of inflammatory responses. for instance, proteasome-mediated activation of nf-κb regulates the expression of proinflammatory cytokines including tnf-α, il- β and il- . similarly, ups is indispensable in the regulation of leukocyte proliferation. the ups is generally considered a double-edged sword in viral pathogenesis. for example, ups is a powerhouse that eliminates viral proteins to control viral infection, but at the same time viruses hijack ups machinery for their propagation. in the case of herpes simplex virus type , varicella-zoster virus and simian varicella virus, induction of nf-κb-mediated host innate immunity is suppressed by the manipulation of ups components. moreover, it was revealed that ups plays crucial roles at multiple stages of coronavirus infection. in our study, the ubiquitin proteasome module was composed of several members of s proteasome atpase or non-atpase regulatory subunits, which include two csps, psmd and psma (fig. e).
it still needs to be determined whether these two csps play important roles in the expression of proinflammatory cytokines and are potentially involved in the cytokine storm. while the mechanistic evaluation of sars-cov- interaction with these two high-value targets needs to be explored, both the mrna and protein expression corresponding to psmd were recently shown to be decreased by up to % in aged keratinocytes. since reduced proteasome activity results in aggregation of aberrant proteins that perturb cellular functions, we hypothesized that sars-cov- targets these csps to interfere with er-mediated cellular responses. another noteworthy module is the t cell receptor regulation of apoptosis. indeed, it was recently reported that sars-cov- infection may cause lymphocyte apoptosis, as demonstrated by overall cell count and transcriptional signatures in pbmc of covid- patients. another significant csp in this pathway is mtch (fig. e), a proapoptotic protein that triggers apoptosis independent of bax and bak. we hypothesized that cytokine-mediated induction of cytokine storm is partially dependent on the sars-cov- interaction with mtch . taken together, our module-based functional analyses identified several novel molecular components, structural and functional modules, and overall provided insights into the pathogenesis of sars-cov- . our network topology analyses discovered csi significant proteins (csps) that have been implicated in several of the modules and pathways described above (fig. e). to provide a system-wide perspective of the importance of these csps in covid- , we categorized these csps into three groups based on their possible functionality. group- includes csps that are potentially relevant to modifying host response following sars-cov- infection. these include eef a , etfa, mrps , mrps , mtch , ndufa , rab a, rab a, rab c, rab a and rhoa (fig. e).
we hypothesized that such csps are important in creating a protective environment in host tissue following the viral infection. for example, the rab and rho groups of ras proteins may be involved in augmenting inflammatory signaling pathways. while antioxidants regulating mitochondrial and cytoplasmic proteins are possibly important in regulating and maintaining redox homeostasis, another csp, sccpdh, is involved in the metabolic production of lysine (lys) and α-ketoglutarate (α-kg). intriguingly, l-lysine supplementation appears to be ineffective for prophylaxis or treatment of herpes simplex lesions. we hypothesized that sars-cov- may target sccpdh to hijack the biosynthesis of this essential amino acid for its benefit. group- csps that we identified are likely to be hijacked by sars-cov- for its entry, proliferation and survival in the host tissue. in this category, one of the most important csps is prohibitin (phb; fig. e). phb is an important protein shown to be a receptor for dengue and chikungunya viruses. although it has been shown that ace serves as the main receptor for sars-cov- entry into the cells, it is quite interesting that pathogenesis of the viral infection is not significantly different between the populations of hypertensive patients who receive or do not receive ace inhibitors. therefore, it is plausible that under certain physiological conditions when sars-cov- does not engage with the ace receptor for its entry into the cell, phb serves as an alternative receptor. another csp, integrin β encoded by itgb , was recently shown to be required for the entry of rabies virus. whether itgb could also promote the entry of sars-cov- is another question that needs to be addressed. mepce is another important enzyme involved in rna stabilization by capping the ´ end of rna with methyl phosphate. it is also likely that mepce is utilized by the covid- virus for stabilization of its rna in the host tissue.
similarly, ppp ca was shown to regulate hiv- transcription by modulating cdk phosphorylation, and thus is potentially involved in the gene regulation of sars-cov- . as discussed above, psma and psmd are the two proteasomal csps. while infecting the lung epithelium, sars-cov- may utilize these ups proteins for the fusion with the host cell membrane (fig. e). similarly, nup can also be utilized for viral entry into the nucleus. three additional csps in this category, rpl and rps , as well as srp , could be employed for viral transcription and protein synthesis (fig. e). finally, group- csps are proteins that sars-cov- may utilize both to facilitate its proliferation and to induce a conducive environment in the host tissue for its sustenance and pathogenesis (fig. e). these csps include ap m , csnk b, eef a , etfa, larp , rtn and ube i. among these csps, eef a , ppia, psma , psmd , rab a, rab a, rab c, rab a, rhoa and ube i are identified as the ones that are potentially associated with the pathogenesis of the cytokine storm as observed in some severely affected patient populations. intriguingly, eef a , a target of several viruses, is known to be activated upon inflammation. this csp is independently identified as one of the major regulators in a human-sars-cov- predicted interactome. the csps that regulate protein folding and translation, for example eiaf , could be utilized by sars-cov- to halt host protein translation, folding and protein quality control. in addition, we also identified e f , tbx and smarcb as first neighbors of some of these csps. these csp complexes play key roles in promoting cell death, causing inflammation and acting enzymatically as viral integrases. collectively, these csps and their first neighbors could directly and indirectly perform intricate pathophysiological functions, but those mentioned here could be the key effects of covid- on host tissue dysregulation.
this classification is also crucial for the design of effective therapeutic interventions against covid- . finally, we presented transcriptional modeling of csi genes including csps that participate in cytokine storm, eif signaling/translation, protein ubiquitination pathway and t cell receptor regulation of apoptosis. thus, these signaling pathways and tfs discovered through our analyses could provide important clues about effective drug targets and their combinations that can be administered at different stages of covid- . in conclusion, we generated a human-sars-cov- interactome, integrated the virus-related transcriptome into the interactome, discovered covid- pertinent structural and functional modules, identified high-value viral targets, and performed dynamic transcriptional modeling. thus, our integrative network biology-based framework led us to uncover the underlying molecular mechanisms and pathways of sars-cov- pathogenesis. to build human interactome, we assembled a comprehensive protein-protein we obtained microarray data for gse , gse , gse from the geo database and used geo r, an interactive web tool, to generate differential gene expression between infection and mock treatments at their respective time points. briefly, geo r utilizes the limma r package. limma is an r package for the analysis of gene expression microarray data. specifically, it uses linear models for analyzing designed experiments and assessing differential expression. a threshold of log fold change and fdr ≤ . was set for differential expression analysis of all microarray experiments. for the comparative study of the sars-cov- expression pattern, we downloaded an expression data set of rnas isolated from the bronchoalveolar lavage fluid (balf) and peripheral blood mononuclear cells (pbmc) of covid- patients. the criteria for selecting significant genes were an adjusted p-value < .
and foldchange > . we mined calu- cells-specific datasets from the geo database, and downloaded gse (wild type), gse (icsarscov) and gse (locov). we ran the weighted gene co-expression network analysis (wgcna) package (r version . . ) on each dataset individually, and constructed three co-expression networks. moreover, we also generated topological overlap measure (tom) plots to compute a numerical entity that reflects interconnectedness among genes within a co-expression network. a cut-off of . was used to export the networks. subsequently, we merged these networks to generate a comprehensive calu- cells-specific co-expression to study the network connectivity pattern of interactome. to extract the calu- -specific human-sars-cov- interactome (csi) we integrated the cytoscape (version . . ) was used to visualize all the networks. the functional enrichment analysis was done with kyoto encyclopedia of genes and genomes (kegg), ingenuity pathway analysis (ipa), wikipathways, go biological process, cluego, and enrichr for human phenotype ontology and rare diseases terms with their statistically significant parameters. interactive visualization of dynamic regulatory networks (idrem) is a method which incorporates static and time series expression data to reconstruct condition-specific reaction network in an unsupervised manner. additionally, the regulatory model identifies specific stimulated pathways and genes, and uses statistical analysis to recognize tfs that vary in activity among models. we implemented idrem on , cumulative differentially expressed genes across hours of sars-cov infection with log normalization for dynamic regulatory event mining with all human , tfs/targets collections from the encode database. the dynamic activated pathways regulated by tfs were generated by ebi human gene ontology function. hypergeometric test, linear regression (r ), and student t-test were performed using r version . . as well as the online stat trek tool.
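the hypergeometric test listed among the statistical methods is commonly used to ask whether a gene list overlaps an annotated gene set more than expected by chance. the python sketch below shows that upper-tail calculation under stated assumptions: the function name and the example counts are hypothetical, and the study reports using r and an online tool rather than this code.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population: total number of genes in the background
    annotated:  number of background genes carrying the annotation
    drawn:      size of the gene list being tested
    overlap:    number of genes in the list that carry the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts: 10 background genes, 5 annotated, 5 drawn, all 5 overlap.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

exact integer arithmetic is used for the binomial coefficients, so the only rounding occurs in the final division; for genome-scale backgrounds a log-space implementation may be preferable.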
all datasets used for this study are accessible through supplementary data files. the authors declare no competing interests. the authors also declare no financial interests. transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid- patients. emerg microbes infect , - ( ). aruyama t, nara k, yoshikawa h, suzuki n. txk, a member of the non-receptor tyrosine kinase of the tec family, forms a complex with poly(adp-ribose) polymerase and elongation factor alpha and regulates interferon-gamma gene transcription in th cell hsu ly, chia py, lim jf. the novel coronavirus (sars-cov- ) epidemic. ann acad med singapore , - ( ). cascella m, rajnik m, cuomo a, dulebohn sc, di napoli r. f. ding j, hagood js, ambalav the regulators only expressed in balf and pbmc transcriptomes are highlighted. significant regulators (tfs) control the regulation dynamics (p< . ).
major bifurcation of pathways occurs at -hour with a total of tfs involved in dynamic modulation key: cord- - jejswuk authors: wang, nan; han, shengli; liu, rui; meng, liesu; he, huaizhen; zhang, yongjing; wang, cheng; lv, yanni; wang, jue; li, xiaowei; ding, yuanyuan; fu, jia; hou, yajing; lu, wen; ma, weina; zhan, yingzhuan; dai, bingling; zhang, jie; pan, xiaoyan; hu, shiling; gao, jiapan; jia, qianqian; zhang, liyang; ge, shuai; wang, saisai; liang, peida; hu, tian; lu, jiayu; wang, xiangjun; zhou, huaxin; ta, wenjing; wang, yuejin; lu, shemin; he, langchong title: chloroquine and hydroxychloroquine as ace blockers to inhibit viropexis of -ncov spike pseudotyped virus date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jejswuk background the novel coronavirus disease ( -ncov) has been affecting global health since the end of and there is no sign that the epidemic is abating. the major obstacle to controlling the infection is the lack of efficient preventive and therapeutic approaches. chloroquine (cq) and hydroxychloroquine (hcq) have been reported to treat the disease, but the underlying mechanism remains controversial. purpose the objective of this study is to investigate whether cq and hcq could act as ace blockers and be used to inhibit -ncov virus infection. methods in our study, we used cck- staining, flow cytometry and immunofluorescent staining to evaluate the toxicity and autophagy of cq and hcq, respectively, on ace high-expressing hek t cells (ace h cells). we further analyzed the binding character of cq and hcq to ace by molecular docking and surface plasmon resonance (spr) assays. -ncov spike pseudotyped virus was also used to observe the viropexis effect of cq and hcq in ace h cells. results results showed that hcq is slightly more toxic to ace h cells than cq. both cq and hcq could bind to ace with kd =( . ± . )e− m and ( . ± . )e− m, respectively.
they exhibited equivalent suppressive effects on the entrance of -ncov spike pseudotyped virus into ace h cells. conclusions cq and hcq both inhibit the entrance of -ncov into cells by blocking the binding of the virus with ace . our findings provide novel insights into the molecular mechanism of cq and hcq treatment effect on virus infection. at the same time, in a clinical trial of -ncov positive patients in china, cq showed a significant therapeutic effect without severe adverse reactions (mingxing et al., ). the above evidence suggests that the adverse effects of cq treatment in -ncov positive patients may be lower than those of hcq. the curative effect and anti- -ncov mechanism of cq and hcq are still controversial. in this study, we found that cq and hcq can antagonize ace and inhibit the entry of -ncov spike pseudotyped virus into ace expressed hek t cells (ace h cells). carlsbad, ca, usa). the survival rate of ace h cells was calculated using the following formula: immunofluorescence assays ace h cells ( × ) were seeded on mm× mm coverslips and incubated overnight at °c with % co . μm, μm or μm cq and hcq were added to the slides and treated for h. the slides were then fixed with % paraformaldehyde, followed by . % triton x- for min and % bsa solution for h at °c after washing three times with pbs. the cells were then continuously incubated with lc primary antibody at °c for h, and the fluorescent secondary antibody at °c for h followed by tritc-phalloidin stain for min at °c. finally, the cells were mounted with μl of dapi-containing anti-fluorescence quenching reagent. all the cells were observed using a laser confocal fluorescence microscope. data are presented as the mean ± standard deviation (sd) and were statistically analyzed using analysis of variance (anova). two-tailed tests were used for comparisons between two groups, and differences were considered statistically significant at p < . .
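the statistical summary described above (group means with standard deviations, compared by analysis of variance) can be sketched as follows. this is an illustrative python sketch with hypothetical function names and made-up example values, not the authors' analysis; it computes only the one-way anova f statistic and its degrees of freedom, and a p-value would additionally require the f distribution, which is not implemented here.

```python
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) for one group of measurements."""
    return statistics.mean(values), statistics.stdev(values)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of measurement groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual measurements around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up values standing in for three treatment groups.
    example = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    print([mean_sd(g) for g in example])
    print(one_way_anova_f(example))
```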
the expression of ace protein in human lung and bronchial-related cells was higher than that in hek t cells. the expression of ace protein in ace h cells was significantly higher than that in other cells, indicating that ace h cells were successfully constructed. it has been reported that at cells express the highest levels of ace receptors in lung and bronchial cells (zou et al., ). we confirmed that the highest expression of the ace protein occurred in at cells. in addition, this is the first report that eol- cells also express the ace protein (figure a). as shown in figure b, cq and hcq had no significant effect on the activity of ace h cells when the concentration was less than μm, and the survival rate of ace h cells was reduced in a dose-dependent manner when the concentration was above μm. the inhibition of hcq on the activity of ace h cells was more significant than that of cq. it can be concluded that the toxicity of hcq was higher than that of cq on ace h cells at different time points at the same concentrations (figure c). at a concentration of μm, the statistical difference appeared at h. ca + is an essential second messenger in several cell pathways; as shown in figure d, cq and hcq hardly affected ca + influx in ace h cells. figure e shows that within h, the concentrations of both drugs had no significant effect on apoptosis. the autophagosome is a spherical structure and an essential marker for autophagy, and lc is known to be stably associated with autophagosome membranes. lc includes two forms, lc -i and lc -ii: lc -i is found in the cytoplasm, whereas lc -ii is membrane-bound and converted from lc -i to initiate formation and lengthening of the autophagosome. therefore, to investigate the dapi staining were used. activating lysosomal (green) and filamentous actin (f-actin, red) was detected after stimulation with , and μm of cq and hcq in ace cells (figure a). autophagy proteins lc -i and lc -ii by western blotting.
we found that the expression level of lc and lc -ii increased in cq and hcq-treated ace h cells (figure b). the lc -ii/lc -i protein ratio was significantly increased compared to the control group (figure b). all of these results suggested that cq and hcq could induce lc -mediated autophagy in ace h cells. because the sars-cov- virus infects its host cells through binding to the ace protein followed by cleavage of the spike protein by human tmprss , we focused on whether cq or hcq could bind with ace . a virtual molecular docking test was performed to investigate the binding character of cq and hcq with ace . the chemical structures of both drugs are shown in figure a. figure b shows that both cq and hcq can bind to r and d (both in green) of ace with their quinoline and imino groups. in addition, due to the replacement of a methyl group by a hydroxymethyl group, hcq can form two additional hydrogen bonds with d and s (in red). we further used spr to confirm the binding between cq or hcq and ace . the binding constant kd of these two compounds and ace protein were ( . ± . )e- and data are presented as mean ± s.d. (*p < . , **p < . , ***p < . , compared with hek t, or concentration was , or hcq μm, # p < . compared with hcq μm at corresponding time points).
concomitant azithromycin among hospitalized patients testing positive for coronavirus disease (covid- ) preliminary evidence from a multicenter prospective observational study of the safety and efficacy of chloroquine for the treatment of covid- no evidence of rapid antiviral clearance or clinical benefit with the combination of hydroxychloroquine and azithromycin in patients with severe covid- infection establishment and validation of a pseudovirus neutralization assay for sars-cov- characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov current and future use of chloroquine hydroxychloroquine in infectious, immune, neoplastic, and neurological diseases: a mini-review therapy and pharmacological properties of hydroxychloroquine and chloroquine in treatment of systemic lupus erythematosus, rheumatoid arthritis and related diseases use of chloroquine in viral diseases new insights into the antiviral effects of chloroquine mechanisms of action of hydroxychloroquine and chloroquine: implications for rheumatology cell entry mechanisms of sars-cov- chloroquine is a potent inhibitor of sars coronavirus infection and spread. virol j a novel coronavirus outbreak of global health concern remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( - ncov) in vitro the treatment of malaria cardiotoxicity of antimalarial drugs cryo-em structure of the -ncov spike in the prefusion conformation a noncompeting pair of human neutralizing antibodies block covid- virus binding to its receptor ace ace : the key molecule for understanding the pathophysiology of severe and critical conditions of covid- : demon or angel? 
viruses structural basis for the recognition of sars-cov- by full-length human ace covid- and the cardiovascular system single-cell rna-seq data analysis on the receptor ace expression reveals the potential risk of different human organs vulnerable to - ncov infection key: cord- -pf wbw authors: de lamballerie, claire nicolas; pizzorno, andrés; fouret, julien; szpiro, lea; padey, blandine; dubois, julia; julien, thomas; traversier, aurélien; dulière, victoria; brun, pauline; lina, bruno; rosa-calatrava, manuel; terrier, olivier title: transcriptional profiling of immune and inflammatory responses in the context of sars-cov- fungal superinfection in a human airway epithelial model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pf wbw superinfections of bacterial/fungal origin are known to affect the course and severity of respiratory viral infections. an increasing amount of evidence indicates a relatively high prevalence of superinfections associated with covid- , including invasive aspergillosis, but the underlying mechanisms remain to be characterized. in the present study, to better understand the biological impact of superinfection we sought to determine and compare the host transcriptional response to sars-cov- versus aspergillus superinfection, using a model of reconstituted human airway epithelium. our analyses reveal that both simple infection and superinfection induce a strong deregulation of core components of innate immune and inflammatory responses, with a stronger response to superinfection in the bronchial epithelial model compared to its nasal counterpart. our results also highlight unique transcriptional footprints of sars-cov- aspergillus superinfection, such as an imbalanced type i/type iii ifn, and an induction of several monocyte- and neutrophil-associated chemokines, that could be useful both for the understanding of aspergillus-associated covid- and for the management of severe forms of aspergillosis in this specific context.
the current pandemic of novel coronavirus disease , caused by severe acute respiratory syndrome coronavirus (sars-cov- ) began in wuhan, hubei province, china, in december . as of may , , there have been more than , , confirmed covid- cases in the world as reported by the who, including , deaths (who). characterized and validated in terms of viral production, impact on trans-epithelial resistance hallmark of cov+asp superinfection, in contrast to cov infection (fig. d) . similar observations were performed using different terms related to inflammation (extended data fig. ), hence suggesting a very different inflammation signature resulting from superinfection.
altogether, our results indicate that the cov+asp superinfection presents a transcriptomic signature that recapitulates the overall signature of a simple cov infection, but with both a particularly distinct regulation of the inflammatory response and the additional regulation of many biological processes related to the physiology of epithelia. of note, we observed a relatively similar global pattern of regulation between the nasal and bronchial hae models, differing primarily in the magnitude rather than the nature of the responses to cov and cov+asp superinfection. in order to explore in more depth the transcriptomic signature of the superinfection, we then signaling pathways (fig. c), which is consistent with the most upregulated degs shown in fig. a and extended data file . to better visualize these observations, we applied a protein-protein interaction analysis using string network to investigate the degs corresponding to several reactome and go terms (immune system process, cytokine signaling in immune system, inflammatory response, interleukin- signaling) enriched in the bronchial and nasal superinfection signatures along with their functional interactions ( fig. a and b invasive pulmonary aspergillosis (ipa), which typically occurs in an immunocompromised host, represents an important cause of morbidity and mortality worldwide (clancy and nguyen, ). superinfections were extensively documented in the case of influenza infections, with the latter being usually described to "pave the way" for bacterial superinfections, but several severe influenza cases have also been reported to develop invasive pulmonary aspergillosis. an increasing amount of evidence points towards a relatively high prevalence of superinfections, including invasive aspergillosis, associated with covid- (alanio et al., ; lescure et al., ; zhou et al., ). however, the underlying mechanisms remain to be characterized.
in the present study, we sought to better understand (pizzorno et al., ) . whereas no major differences in terms of global superinfection signatures were observed between hae models of nasal or bronchial origin, the second part of our study highlighted more subtle differences between the differences of infectivity and consecutive host responses between different cell subsets (type ii pneumocytes, nasal goblet secretory cells) are linked to varying ace /tmprss levels, ace expression being linked to the ifn response (ziegler et al., ). the discrepancies we observed in the two hae models could be explained by differences in cell type composition that would be interesting to explore further using combinations of additional experimental models, including ace /tmprss expression and single cell rna-seq approaches. our data are not entirely consistent with these findings. whereas we also demonstrate a very higher this probability is, the less informative is the child term relative to its parents. this is also a statistic that can be used for independent filtering to reduce the p-value adjustment burden. therefore, all terms with a p_min probability higher than e- were filtered out before p-value adjustment using the bonferroni method (dunn, ). this multiple-testing correction extended data table statistics of rna-seq fragment pseudo-alignment to the human transcriptome.
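the filter-then-adjust step described above (dropping terms whose p_min exceeds a cutoff, then applying a bonferroni correction to the remaining p-values) can be sketched in python as follows. the function name, the input layout and the example cutoff are assumptions made for illustration; the cutoff actually used in the study is not recoverable from the text.

```python
def filter_then_bonferroni(terms, p_min_cutoff):
    """Independent filtering on p_min followed by Bonferroni adjustment.

    terms: dict mapping a term name to a (p_value, p_min) pair.
    p_min_cutoff: terms with p_min above this value are dropped before
        adjustment, which reduces the number of tests corrected for.
    Returns a dict mapping each retained term to its adjusted p-value.
    """
    retained = {
        name: p_value
        for name, (p_value, p_min) in terms.items()
        if p_min <= p_min_cutoff
    }
    n_tests = len(retained)
    # Bonferroni: multiply each p-value by the number of retained tests, cap at 1.
    return {name: min(1.0, p * n_tests) for name, p in retained.items()}


if __name__ == "__main__":
    # Hypothetical terms and cutoff, for illustration only.
    example_terms = {
        "term_a": (0.01, 1e-6),
        "term_b": (0.2, 1e-6),
        "term_c": (0.001, 0.5),  # uninformative term, removed by the filter
    }
    print(filter_then_bonferroni(example_terms, p_min_cutoff=1e-3))
```

because the filter does not look at the observed p-values themselves, it shrinks the correction factor without biasing the tests that remain, which is the point of independent filtering.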
[figure residue: enriched term labels, each followed by its source tag (go, rc or kw): cell junction organization (go); extracellular matrix organization (rc); extracellular structure organization (go); cell adhesion (kw); biological adhesion (go); reproductive process (go); regulation of cell population proliferation (go); regulation of cell death (go); cell surface receptor signaling pathway (go); signal transduction (rc); regulation of developmental process (go); multicellular organismal process (go); regulation of multicellular organismal process (go); regulation of cell communication (go); regulation of signaling (go); signal transduction (go); regulation of response to stimulus (go); developmental process (go); response to stimulus (go); negative regulation of biological process (go); positive regulation of biological process (go); biological regulation (go); epidermolysis bullosa (kw); endodermal cell differentiation (go); binding and uptake of ligands by scavenger recepto; immune system (rc); protein localization to cilium (go); ciliopathy (kw); cytoskeleton-dependent intracellular transport (go); cilium biogenesis/degradation (kw); organelle biogenesis and maintenance (rc); microtubule-based process (go); cell projection organization (go); locomotion (go); regulation of locomotion (go); regulation of cellular component movement (go); movement of cell or subcellular component (go); regulation of localization (go); signaling (go)] key: cord- - ablrwuo authors: guintivano, jerry; dick, danielle; bulik, cynthia m title: psychiatric genomics research during the covid- pandemic: a survey of psychiatric genomics consortium researchers date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ablrwuo between april , and june , we conducted a survey of the membership of the psychiatric genomics consortium (pgc) to explore the impact of covid- on their research and academic careers. a total of individuals responded representing academic ranks from trainee to full professor, tenured and fixed-term appointments, and all genders. the survey included both quantitative and free text responses.
results revealed considerable concern about the impact of covid- on research with the greatest concern reported by individuals in non-permanent positions and female researchers. concerns about the availability of funding and the impact of the pandemic on career progression were commonly reported by early career researchers. we provide recommendations for institutions, organizations such as the pgc, as well as individual senior investigators to ensure that the futures of early career investigators, especially those underrepresented in academic medicine such as women and underrepresented minorities, are not disproportionately disadvantaged by the covid- pandemic. since the initial outbreak in december , the novel coronavirus sars-cov- (covid- ) has mushroomed into a global pandemic affecting every aspect of life. in an effort to reduce transmission, many governments, universities, and other research institutions issued work from home orders for non-essential workers. interruptions were widespread on both work and home fronts. researchers or their family members were infected by covid- , schools were closed leaving little time or space for work, and the unpredictability of the course of the pandemic led to persistent anxiety and distress for most people in the world. specifically, many clinician-researchers were seconded to covid-related clinical duties; patient facing research was halted or postponed; many basic science researchers were mandated to halt all laboratory-based activities; and academic medical centers faced enormous financial consequences (colenda, applegate, reifler, & blazer, ; kim et al., ; weissman, klump, & rose, ) . in person teaching was suspended requiring rapid adaptation to remote teaching platforms. in addition, other academic activities, such as conferences and face-to-face meetings, were cancelled or transitioned to virtual formats, interrupting training, networking, and other means of scientific information exchange. 
several studies have documented challenges that have been faced by academics at different career levels, different genders, and different family structures, and concerns have been raised, especially for early-career researchers and women regarding the long-term impact of the pandemic on their career progression (denfeld et al., ) . as a service especially to our early career researchers, the psychiatric genomics consortium (pgc) wanted to better understand the effects of covid- on its members and their research. we modeled our questionnaire after weissman et al (weissman et al., ) and assessed: (a) the impact of covid- on pgc research now and in the foreseeable future; (b) the level of concern about covid- -related disruptions of research and the potential impact of these disruptions on individuals' careers; (c) strategies that respondents thought to be effective in coping with covid- -related research disruptions; and (d) respondents' suggestions for how the pgc or the field should respond to help researchers move through and beyond the current crisis. to address these aims, we administered an anonymous survey containing both quantitative and free text responses to pgc researchers. quantitative questions focused on the perceived impact of covid- -related research disruptions. we hypothesized that respondents in secure employment positions (e.g., with tenure or permanent contracts) would report less stress and less concern about potential adverse impact of covid- on their research and career, than respondents at earlier stages in their career (e.g., tenure-track or fixed-term contracts, postdoctoral fellows, graduate students). the primary goal of the qualitative, free text questions was to describe participants' strategies or suggestions, and highlight issues that they felt were important that we had not addressed. we had no hypotheses regarding the free text responses. 
all members of the psychiatric genomics consortium (pgc) identified by the consortium listserv were invited to participate in the survey. invitations were sent through the main pgc listserv, through individual workgroup listservs, and were promoted during regular pgc videoconferences. the survey was launched on april , and closed on june , . qualtrics was used to administer and store the survey. this study was approved by the university of north carolina institutional review board committee for the protection of human subjects. this mixed-methods survey was composed of likert-scale items measuring concern about covid- disruptions (rated from = no concern to = extreme concern) and highest level of stress experienced since the outbreak of the pandemic ( = no stress to = highest level of stress imaginable). one item asked respondents to characterize the proportion of their pgc-related research that had to be completely shut down due to covid- , using six options (from to up to % in % increments). three categorical items (yes, no, do not know/does not apply) addressed whether studies had been transferred online; whether researchers were anticipating making changes to their research practices; and whether their institution had made policy changes in response to covid- . demographic data were collected in a final set of questions. 
participants were asked to report their gender (female, male, gender variant/nonconforming, choose not to answer), current position ("faculty appointment (> years post training)", "faculty appointment (up to years post training) (early career researcher)", "graduate student", "post-doctoral fellow", "resident", "other"), type of position ("a position that could lead to tenure or a permanent contract, but i have not yet reached this status", "a position that is not in the tenure track or permanent contract system", "a tenured or permanent contract position", "other"), department/institution/organization of their primary appointment (genetics, government organization, medicine [other than psychiatry], non-academic hospital or clinic, other, psychiatry, psychology, public health, research institute), and the country where they hold their primary research appointment (a drop-down menu of countries in the world). descriptive statistics were calculated for quantitative items using r (r core team, ). chi-square tests and analyses of variance (anovas) were used to test hypotheses that individuals would differ on the basis of type of academic position and gender. for analyses on gender, only male and female respondents were included because of the small number of other respondents. a stringent significance threshold of p < . was used to account for multiple testing throughout the study. in addition, cohen's d was used to estimate effect sizes. responses to open-ended questions were grouped into thematic categories, as follows. for each open-ended question, the first author independently developed a set of themes to capture the responses. each theme was required to capture at least three responses to the open-ended question. because the goal of reporting the open-ended responses was strictly descriptive, we made no attempt at establishing reliability of the coding of major themes. a total of individuals completed the survey, with a majority of respondents being female (n = , . 
%) compared to male (n = , . %) and gender variant/nonconforming or unreported (n = , . %) ( table ). the majority of respondents held permanent/tenured positions (n = , . %) compared to non-permanent faculty (n = , . %), trainees (n = , . %) or "other" positions (n = , . %). six respondents ( . %) did not specify their academic position ( table ) . of those who did report their position, a majority held primary research appointments in psychiatry (n = , . %) or genetics (n = , . %). the remaining respondents indicated various other departments (medicine, psychology, public health), research institutes, or nonacademic hospitals. academic appointments were most commonly in the united states (n = , . %), followed by united kingdom, n = ; germany, n = ; sweden, n = ; australia, n = ; denmark, n = ; canada, n = ; brazil, n < ; greece, n < ; italy, n < ; mexico, n < ; netherlands, n < ; spain, n < ; switzerland, n < ; afghanistan, n < ; austria, n < ; estonia, n < ; japan, n < ; new zealand, n < ; norway, n < ; romania, n < ; south africa, n < . ten individuals ( . %) did not report the country of their appointment. a significantly greater proportion of men (n = , . % of male sample) reported holding a permanent appointment compared to women (n = , . % of female sample) (χ² ( , ) = . , p = . ). as shown in table , the largest proportion of participants ( . %) reported that % of their research was shut down due to covid- . a total of . % of respondents reported no shutdown of their research; many respondents in this category emphasized that their work was mostly on archived data. a total of . % of respondents reported their research to be totally (up to %) shut down. among those who reported any research disruptions (n = , . % of the total sample), some reported being able to transition their studies to online settings (n = , . %), whereas others (n = , . 
%) reported not having the need to move their studies online because their work consisted of theoretical or secondary data analyses. further, the majority of respondents who indicated that "up to %" or "up to %" of their research was shut down due to covid- restrictions (n = ) also reported that they were unable to move their research work to an online setting (n = ). tables and present research-related concerns stratified by academic position ( table ) and gender ( table ) . on average, the highest levels (mean ≥ . ) of research-related concern were related to cancellation of career opportunities, securing future funding, recruitment, and data collection. intermediate levels of concern ( . > mean ≥ . ) were reported for disruptions from having to work from home (both technological and domestic concerns), staffing, transferring teaching/supervision to remote/online, budget, and obtaining institutional approvals. specific problems related to supply procurement (mean = . ) and animal research (mean = . ) ranked lowest among respondents. there were no significant differences in research-related concerns between appointment groups. however, when stratified by gender, women reported significantly greater concerns regarding domestic issues around disruptions from having to work from home (p = . ). overall, the impact of covid- on career was an intermediate concern (mean = . ). however, very large and highly significant differences emerged in career concerns, with trainees and nontenured faculty reporting higher levels of concern compared to those with permanent positions (p = . e- , d = . ). moreover, women reported greater levels of career concerns, of medium effect size, compared to men (p = . e- , d = . ). stress levels displayed patterns similar to those for career impact. stress levels were high among all respondents (mean = . ), with higher levels reported among those with non-tenured positions (p = . , d = . ) and by females (p = . e- , d = . ). a minority (n = , . 
%) of respondents expected no changes would be made to their research moving forward as a result of covid- , while the majority (n = , . %) thought it was "too soon to tell" if any changes need to be made ( table ) . there was no significant difference across positions regarding the future practices of pgc research (p = . ). a full . % of respondents reported that there would be no changes in performance evaluations at their institutions related to covid- , with approximately one-third of respondents ( . %) indicating changes, and . % reporting that it was too soon to tell ( table ). respondents were asked to provide " - effective strategies for dealing with covid- in terms of your pgc research," which yielded responses. four main themes characterized the comments: maintain team dynamics (e.g., utilizing videoconferencing for regular team meetings, being flexible with deadlines, using clear communication) ( . % of responses); maintain good personal habits (e.g., keeping in mind productivity may be reduced, practicing self-care, keeping work and personal areas separate) ( . %); reprioritize research goals (e.g., spending more effort on dry-lab projects rather than wet-lab, using available time to complete analyses or manuscripts, utilizing existing data for new projects) ( . %); and shift recruitment to online approaches (e.g., phone interviews rather than face-to-face, development of online recruitment and consent protocols) ( . %). two themes emerged from the responses to the prompt, "effective strategies for transitioning your pgc research to online settings." both themes employ technological approaches to: maintain contact with research participants (e.g., using online questionnaires and other remote means for recruitment and consent) ( . %) and support activities of the research team (e.g., establishing connections to remote databases, regular teleconferencing with colleagues) ( . %). 
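the effect sizes reported above for career concern and stress (cohen's d) compare two group means in units of their pooled standard deviation. the following is a minimal sketch of that calculation, assuming the common pooled-variance form of cohen's d; the function name and the example ratings are hypothetical and are not survey data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Assumption: the pooled-variance definition; the survey's exact variant is not stated.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical likert-style ratings for two groups (illustration only).
non_permanent = [2, 4, 6]
permanent = [1, 3, 5]
effect = cohens_d(non_permanent, permanent)
```

by the usual convention, values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects, respectively.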
the responses to the prompt "describe - changes you expect to make in your pgc research practices" could be grouped into five themes: move to remote recruitment (e.g., phone/online recruitment, saliva rather than blood sample collection) ( . %); increased use of virtual meetings (e.g., video conferencing in lieu of lab or scientific meetings) ( . %); organizational changes (e.g., expectations for decreased budget, reduced personnel) ( . %); increased protective measures when collecting samples in-person (e.g., increased ppe for blood draws) ( . %); and shifting research priorities (e.g., add covid- as a research focus) ( . %). when asked to "describe - changes the pgc should make to support pgc researchers during and after covid- " responses were provided. these comments fell into six common themes: continue to support remote meetings (e.g., virtual world congress of psychiatric genetics annual meeting, worldwide lab meetings) ( . %), facilitate more secondary analyses (e.g., making data access easier, increasing time frame on existing proposals, prioritizing data access for junior researchers) ( . %), offer greater support to early career researchers (e.g., advocate for junior researchers, create support groups for early career researchers) ( . %), provide support for researchers directly impacted by covid- (e.g., leniency for those who see patients and other caregivers, support career advancement for those directly affected by covid- ) ( . %), provide funding to support researchers (e.g., bridge funding to sustain pgc research especially among junior researchers and those not part of core pgc funding) ( . %), and provide online training (e.g., statistical genetics and other key topic courses) ( . %). of the answers to the prompt, "briefly describe your institution's policy changes about performance evaluations of researchers," by far, the most reported change was an extension of performance evaluation periods ( . 
% of responses), specifically extending the tenure clock by one year. seventeen comments were made in response to the question "is there a question about covid- 's impact on your pgc research that we should have asked but didn't?" two themes met the minimum of three comments per theme. first, respondents indicated that we should have asked questions regarding clinical researchers and how their practice has shifted to support covid- patients ( . % of responses). second, respondents wanted more questions about how broader shutdowns (e.g., school closings, unemployment, work from home) have affected productivity, specifically childcare ( . % of responses). overall, our survey revealed high stress and concern among respondents about the impact of covid- on their careers, especially in individuals with non-permanent positions and in women. our results are consistent with those reported across various academic fields (andersen, nielsen, simone, lewiss, & jagsi, ; brubaker, ; denfeld et al., ; kibbe, ; weissman et al., ) , and highlight steps that can and should be taken to ensure the ability of early career researchers and female academics not only to survive but to thrive post-pandemic. results of the pgc survey align with other surveys reported in the literature that reveal the disproportionate impact that covid- -related interruptions have had on female researchers. from having primary responsibility for childcare at home while trying to work from home to concerns about an advancing tenure clock when their productivity is hampered by pandemic-related disruptions, women do appear to be more stressed and more directly impacted than men. this augments the already disproportionate burden of domestic and emotional labor shouldered by female academics (brubaker, ; jolly et al., ; rao, ) . 
other studies confirm this observation including a disproportionate number of male first authors in papers submitted to journals on covid- (andersen et al., ) , journal submissions and productivity in general (viglione, ) , and projections of serious interruptions of career progress for women that could adversely affect progress toward gender equity in academe (sheikh et al., ) . a surprising number of institutions were not intending to make allowances on performance evaluations or tenure clocks. this is of significant concern, especially given the documented differential burden placed on junior female faculty in their childbearing years who are entrusted with the majority of childcare duties (jolly et al., ) . senior mentors, institutions, and scientific organizations like the pgc should actively develop and deploy measures to support the careers of junior researchers and those of all genders who have had to take on additional child-or elder-care burdens during this time. likewise, although we did not assess ancestry, it has been widely documented in the united states that individuals from underrepresented minority groups have been disproportionately affected by covid- (moore et al., ) , which means that not only might more minority researchers be directly affected by covid- , but they are also more likely to have connections in socially vulnerable communities and have family members and members of their communities impacted (nayak et al., ) . this can divert both time and emotional energy away from career progress and further perpetuate existing systemic inequities in academe (davis & fry, ) . in many years to come, evaluations of productivity and hiring decisions should explicitly address and account for disruptions encountered during this time. although many institutions have implemented a single-year extension of tenure clocks, that may not suffice given the prolonged nature of the pandemic. 
applications should include the opportunity to describe the impact of the pandemic on an applicant's life such that it can be factored into the evaluation of the candidate. it is critically important that covid- not set back progress toward equity in science and academe in general, but definitive action must be taken in order to ensure that outcome. some caveats and limitations should be considered when interpreting the results. first, the extent to which our sample represents the larger pgc is unknown as we are unable to calculate response rate or representativeness as the survey link was shared widely across pgc groups and subgroups. the composition of the sample, namely primarily female ( . % of total sample) and in a permanent/tenured position ( . %), does not necessarily reflect the overall composition of the pgc and may reflect selective participation. second, given our goal of providing strictly descriptive results, we did not undertake formal efforts at establishing a coding scheme for the free text responses. third, the survey was deployed relatively early in the pandemic when it was not yet clear how long the disruption to research would go on. responses could change as the duration of the home-and work-related disruption continues and researchers become increasingly fatigued by the pervasive and persistent disruption. finally, our failure to assess race and ethnicity was a missed opportunity to capture specific concerns faced by researchers from underrepresented minority groups. as a field, genetics already has considerable underrepresentation of researchers from diverse ancestral backgrounds; our findings for other historically disadvantaged groups (e.g., women, early career investigators) suggest that the pandemic may further exacerbate this underrepresentation. 
recurring themes that emerged focused on the cancellation of career opportunities in terms of networking, but also the financial impact of covid- on job availability as many institutions have implemented hiring freezes. this, along with personal economic instability and concerns about the availability of sources of future research funding, leads many researchers to question their future job prospects and the viability of remaining in academe. many respondents expressed desire for the pgc and senior investigators to devise ways to help boost productivity and success in publications and grant applications, essentially devoting greater energy to ensuring the success of early career researchers during this time. the disproportionate underrepresentation of women at higher academic ranks (carr et al., ) is a known phenomenon in many fields of academic medicine. it is a critical juncture to ensure that we can shore up promising young investigators such that we can retain them in science and not erase the albeit slow and incremental advances that we have seen in striving for equity in academe (wingard, trejo, gudea, goodman, & reznik, ) . bulik is supported by r mh , r mh , r mh , r mh , r mh , r mh , u mh , and h sm . she also acknowledges funding from the swedish research council (vetenskapsrådet conflict of interest: j guintivano -none. cm bulik reports: shire (grant recipient pearson (author, royalty recipient). 
dm dick -none
covid- medical papers have fewer women first authors than expected
women physicians and the covid- pandemic
gender differences in academic medicine: retention, rank, and leadership comparisons from the national faculty survey
covid- : financial stress test for academic medical centers
college faculty have become more racially and ethnically diverse, but remain far less so than students
covid- : challenges and lessons learned from early career investigators
gender differences in time spent on parenting and domestic responsibilities by high-achieving young physician-researchers
consequences of the covid- pandemic on manuscript submissions by women
one academic health system's early (and ongoing) experience responding to covid- : recommendations from the initial epicenter of the pandemic in the united states
disparities in incidence of covid- among underrepresented racial/ethnic groups in counties identified as hotspots during
impact of social vulnerability on covid- incidence and outcomes in the united states. medrxiv
influences for gender disparity in academic psychiatry in the united states
are women publishing less during the pandemic? here's what the data say
conducting eating disorders research in the time of covid- : a survey of researchers in the field
faculty equity, diversity, culture and climate change in academic medicine: a longitudinal study
table: responses to free text questions. number of comments recorded and themes and illustrative (shortened, paraphrased) statements identified by open-ended questions. questions:
please share your - most effective strategies for dealing with covid- in terms of your pgc research. ( comments)
please share your - most effective strategies for transitioning your pgc research to online settings
please describe - changes you expect to make in your pgc research practices as a result of covid- . (respondents were instructed to skip the question if they did not anticipate making changes or had checked "it's too soon to tell") ( comments)
is there a question about covid- 's impact on your pgc research that we should have asked but didn't? ( comments)
please describe - changes the pgc should make to support pgc researchers during and after covid-
dr. jerry guintivano is supported by k mh . dr. danielle dick is supported by nih r aa (finnish twin study), p aa (alcohol research center), r aa (vcu great), r aa (personalized risk assessment), and u aa (coga) from the national institute on alcohol abuse and alcoholism (niaaa), and by r da (externalizing consortium) from the national institute on drug abuse (nida).
key: cord- - w ciglv authors: marquez-miranda, valeria; rojas, maximiliano; duarte, yorley; diaz-franulic, ignacio; holmgren, miguel; cachau, raul e.; gonzalez-nilo, fernando d. title: analysis of sars-cov- orf a structure reveals chloride binding sites date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: w ciglv sars-cov- orf a is believed to form ion channels, which may be involved in the modulation of virus release, and has been implicated in various cellular processes like the up-regulation of fibrinogen expression in lung epithelial cells, downregulation of type interferon receptor, caspase-dependent apoptosis, and increasing ifnar ubiquitination. orf a assembles as homotetramers, which are stabilized by residue c . a recent cryoem structure of a homodimeric complex of orf a has been released. a lower-resolution cryoem map of the tetramer suggests that it is formed by two dimers arranged side by side. the dimer’s cryoem structure revealed that each protomer contains three transmembrane helices arranged in a clockwise configuration, forming a six-helix transmembrane domain. this domain’s potential permeation pathway has six constrictions narrowing to about Å in radius, suggesting the structure solved is in a closed or inactivated state. at the cytosol end, the permeation pathway encounters a large and polar cavity formed by multiple beta strands from both protomers, which opens to the cytosolic milieu. we modeled the tetramer following the arrangement suggested by the low-resolution tetramer cryoem map. molecular dynamics simulations of the tetramer embedded in a membrane and solvated with . m of kcl were performed. our simulations show the cytosolic cavity is quickly populated by both k+ and cl-, yet with different dynamics. k+ ions moved relatively freely inside the cavity without forming proper coordination sites. in contrast, cl- ions enter the cavity, and three of them can become stably coordinated near the intracellular entrance of the potential permeation pathway by an inter-subunit network of positively charged amino acids. consequently, the central cavity’s electrostatic potential changed from being entirely positive at the beginning of the simulation to more electronegative at the end. 
the recent release of the sars-cov- open reading frame a (orf a) dimer structure, solved by cryo-em [ ] (pdb: xdc), offers new opportunities for sars-cov antiviral design. the deletion of orf a reduces viral titers in animal models, suggesting orf a as a target for developing therapeutic agents against sars-cov- [ ] ; however, the exact mechanism of function of orf a is not well understood. orf a is believed to form ion channels [ ] [ ] [ ] , which may be involved in the modulation of virus release [ ] . apoptosis initiates a viral cytopathic effect in sars-cov- -infected cells [ ] [ ] [ ] . blocking orf a channel activity has been reported to abolish caspase-dependent apoptosis. orf is related to viral pathogenicity in porcine epidemic diarrhea virus (pedv) [ ] . other cellular processes affected by orf a include the up-regulation of fibrinogen expression in lung epithelial cells, downregulation of type interferon receptor, and increasing ifnar ubiquitination. for a full list, see zhang et al. [ ] . the broad spectrum of responses linked to orf a and its distinct sequence and structure make orf a a valuable target. initial studies [ ] [ ] [ ] showed that orf a could form ion channels in xenopus laevis oocytes and yeast, with an expected tetrameric assembly as inferred from biochemical data [ ] . recently, nonetheless, channel activity has been detected using purified dimeric orf a proteins from sars-cov- that were reconstituted in proteoliposomes [ ] . these dimers' cryoem high-resolution structure revealed that each protomer contains three transmembrane helices, arranged in a clockwise configuration and forming a six-helix transmembrane domain [ ] . the potential permeation pathway of this domain is formed by tm and tm [ ] , consistent with functional observations [ ] [ ] . this pathway has six constrictions narrowing to about Å in radius, suggesting the structure solved is in a closed or an inactivated state. 
the potential permeation pathway contains a polar cavity within the inner half of the transmembrane region. the cytosol end of the structure forms a cytosolic domain composed of multiple beta strands from both protomers. here we used this dimeric structure to perform all-atom molecular dynamics simulations and electrostatic potential calculations to ask questions concerning the dimers' stability and whether ions could be populating specific regions of the channel. we show that the dimer core is strongly stabilized by more than methyl-methyl interactions, consistent with a non-conductive conformation. further, we show that the inner polar cavity has physicochemical properties that are better suited for hosting negatively charged species than cations. since biochemical data suggest that residue c is essential for tetramerization [ ] , we opted to establish an initial system composed of two dimers with high-resolution structures covalently joined by two c ( figure a ). this tetrameric complex is compatible with the low-resolution cryoem structure obtained from purified tetramer forms [ ] . the tetramer was embedded in a membrane solvated in a water box with . m kcl. a five-hundred-nanosecond molecular dynamics simulation of this system was carried out. for simulation details, see methods. all results described below were observed in both dimers of the tetrameric system. no significant changes in the transmembrane pore dimensions were observed during the simulation, with the pore radius widening by about Å around the extracellular and intracellular side ( figure b) . the stability of the pore dimensions can be explained by the strong vdw interactions dominating the interactions in the transmembrane region's narrowest section, formed by the residues from to . 
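the number of k+/cl- pairs needed to reach a target salt concentration in a simulation box follows from n = c · n_a · v. the following is a minimal sketch of that arithmetic, assuming a rectangular box and using the whole box volume as an approximation of the solvent volume; the concentration and box dimensions in the usage line are hypothetical and are not the values used in the simulation described above.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def ion_pairs_for_concentration(molar, box_angstrom):
    """Number of salt ion pairs giving `molar` (mol/L) in a rectangular box.

    box_angstrom: (lx, ly, lz) edge lengths in angstroms.
    Assumption: the whole box volume is treated as solvent volume.
    """
    lx, ly, lz = box_angstrom
    # 1 cubic angstrom = 1e-30 m^3 = 1e-27 litres.
    volume_litres = lx * ly * lz * 1e-27
    return round(molar * AVOGADRO * volume_litres)


# Hypothetical example: 0.15 mol/L salt in a 100 x 100 x 100 angstrom box.
n_pairs = ion_pairs_for_concentration(0.15, (100.0, 100.0, 100.0))
```

in a membrane system the lipid and protein volume is not available to the solvent, so setup tools typically base the count on the water volume or the number of water molecules instead; the sketch ignores that correction.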
free energy of binding between two monomers of a single dimer obtained using mm-gbsa analysis over the last frames of md indicates the vdw contribution is five times larger than that of the electrostatics (see methods). the large vdw contribution originates in the more than methyl-methyl interactions between the two monomers (chain a and chain b), resulting in a tightly packed core interaction that cannot be breached unaided. not surprisingly, no ions were observed to cross the potential permeation pathway. basic residues in the water-membrane interface are critical in the entry of chloride ions to the central polar cavity of orf a. in addition to the outer half of the potential permeation pathway, there are three possible access routes to the central polar cavity, so-called tunnels [ ] . the upper tunnel connects the polar cavity to the lipid bilayer. through the inter-subunit and the inner tunnels, the polar cavity opens to the bulk solution at the water-membrane interface and directly into the cytosol, respectively. the inter-subunit tunnel, located between z=- and z=- (with z measuring the vertical displacement from the membrane midpoint when the model central axis is aligned along the z coordinate), has a high density of positively charged residues (lys , lys , arg ), which may disfavor cation binding while favoring negatively charged species. indeed, our simulations show that this tunnel is the main entryway for cl- ions into the central polar cavity. within the first ns of the simulation, three cl- ions entered the central polar cavity and remained for the rest of the simulation. these three cl- ions are constantly alternating their position within the polar cavity. figure a shows a static view of the cl- ions inside the central polar cavity and the amino acids responsible for their retention in this region. the presence of anions in this region makes the polar cavity slightly wider than the experimental structure dimensions. 
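the statement that ions entered a region and remained there can be quantified as the fraction of trajectory frames each ion spends inside that region. the following is a minimal sketch, assuming the region is approximated by a cylinder around the z axis with z measured from the membrane midpoint; the data layout, the function name, and all numerical bounds are hypothetical and are not taken from the simulation described above.

```python
def residence_fraction(trajectory, z_min, z_max, radius):
    """Fraction of frames each ion spends inside a cylinder around the z axis.

    trajectory: list of frames; each frame maps an ion id to (x, y, z) in angstroms.
    Assumption: the region of interest is a cylinder z_min <= z <= z_max, r <= radius.
    """
    counts = {}
    for frame in trajectory:
        for ion_id, (x, y, z) in frame.items():
            inside = z_min <= z <= z_max and x * x + y * y <= radius * radius
            counts[ion_id] = counts.get(ion_id, 0) + (1 if inside else 0)
    n_frames = len(trajectory)
    return {ion_id: hits / n_frames for ion_id, hits in counts.items()}


# Hypothetical two-frame trajectory with two chloride ions.
toy_trajectory = [
    {"cl1": (0.0, 0.0, -10.0), "cl2": (30.0, 0.0, -10.0)},
    {"cl1": (1.0, 1.0, -12.0), "cl2": (0.0, 0.0, -11.0)},
]
fractions = residence_fraction(toy_trajectory, z_min=-15.0, z_max=-5.0, radius=8.0)
```

an ion with a fraction close to one over the later part of a trajectory would correspond to the stably retained ions described above, whereas ions that only visit the region transiently would score much lower.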
still, it does not allow ion translocation through a potential transmembrane permeation pathway. the central polar cavity shows the highest occupancy of cl- ions (fig. b) . the average density of the molecular species in the atomic system is depicted in supp. material figure s . regions with high k+ ion occupancy were assessed using the volmap tool in vmd [ ] . the cytosolic domain, composed mainly of beta-sheets, acts as a repository of k+ ions, which accumulate on the surface exposed to the cytosol ( figure a ). in contrast to cl- ions, no high occupancy sites for k+ were found inside the central polar cavity. we observed that most of the cytosolic domain regions where k+ ions accumulate do not form proper coordination sites, which means that k+ does not remain in those regions for a long time. there was only one k+ that remained stable at the cytosolic domain's surface, coordinated by e , d , s , and d , along with two water molecules ( figure b ). interestingly, residues e and d are relevant for the potassium channel activity of sars-cov- orf a [ ] , with alanine mutations decreasing conductance when compared to the wt channel. ion occupancies modify the electrostatic potential of the orf a channel. to assess the impact of the ion occupancies described above, we obtained the electrostatic potential maps of the orf a channel for the initial configuration and for the last frame of a ns molecular dynamics trajectory by employing the poisson-boltzmann approach implemented in the apbs package [ ] . this analysis shows that the entry of cl- ions through the inter-subunit tunnel into the central polar cavity and the accumulation of k+ ions at the cytosolic domain's surface changed the channel's electrostatic profile. the central cavity's electrostatic potential appears to be entirely positive in the first frame, changing dramatically after the cl- ions occupied the central polar cavity, becoming more electronegative ( figure a and b) . 
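an occupancy map of the kind used above to locate high-occupancy ion sites can be thought of as the fraction of frames in which each voxel of a grid contains at least one ion. the following is a minimal sketch of that idea, assuming a cubic grid anchored at the origin; it is not the vmd volmap implementation, and the function name, voxel size, and coordinates are hypothetical.

```python
from collections import defaultdict
from math import floor


def occupancy_map(frames, voxel=1.0):
    """Per-voxel occupancy: fraction of frames with at least one ion in the voxel.

    frames: list of frames; each frame is a list of (x, y, z) ion positions in angstroms.
    Assumption: cubic voxels of edge `voxel`, indexed by flooring each coordinate.
    """
    hits = defaultdict(int)
    for positions in frames:
        # A set ensures a voxel is counted once per frame, however many ions it holds.
        occupied = {tuple(floor(c / voxel) for c in pos) for pos in positions}
        for index in occupied:
            hits[index] += 1
    return {index: count / len(frames) for index, count in hits.items()}


# Hypothetical two-frame example.
toy_frames = [
    [(0.2, 0.2, 0.2), (5.5, 0.0, 0.0)],
    [(0.8, 0.1, 0.9)],
]
occupancy = occupancy_map(toy_frames, voxel=1.0)
```

voxels with occupancy close to one mark positions where an ion is present throughout the trajectory, while diffuse low-occupancy regions correspond to ions that accumulate without forming stable coordination sites.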
interestingly, this change propagated through the transmembrane domain's low dielectric field to the extracellular side, promoting a less positive electrostatic potential of the transmembrane region. the accumulation of k+ ions at the cytoplasmic domain's surface also produced substantial changes in the electrostatic potential of the channel, particularly at the intracellular end between the two dimers ( figure a and b) . the afal server [ ] was used to correlate our observations with previously characterized cation and anion binding sites. the afal server analyzes and catalogs the propensity of amino acid residues to interact with a specific ligand, based on the information available in the structures deposited in the protein data bank. the residues surrounding cl-ions with the highest probability are arg > asn > lys > ser > thr, in agreement with the residues observed at the entrance of the inter-subunit tunnel and those in the central polar cavity, specifically the residues lys , lys , arg , arg , arg , lys , ser , ser , and asn . the afal profile for k+ ions showed that the higher amino acid propensity near the ion is asp > ser > thr > glu, consistent with the amino acids that stabilized the single k+ at the surface of the cytoplasmic domain. this information supports our observation that the orf a central region is not well suited to interact with positively charged species. functional studies demonstrate that orf a has properties of cation-selective channels [ ] [ ] [ ] , an activity that has been linked to apoptosis in host cells [ ] . the recent cryoem high-resolution structure of a dimer form of orf a from sars-cov- and the expected tetrameric complex arrangement are unusual [ ] . here we presented an initial analysis of the sars-cov- orf a structure, explored with molecular simulations methods. we found that the central polar cavity has high-affinity sites for chloride ions. 
in contrast, no high-affinity regions for potassium ions were found in the potential permeation pathway. we only saw a high occupancy site for a single k+ in the cytosolic domain, close to the residues e and d . multiple k+ ions tend to concentrate intermittently in the cytosolic domain's cytoplasmic surface, acting as a repository of cations. orf a from sars-cov- (uniprot:p ) and sars-cov (uniprot: p dtc ) shares % of sequence similarity (as obtained using emboss needle [ ] ), yet all the key residues in the chloride anion binding sites are conserved. this observation is further supported by our analysis of distal sequences, mostly from bat viruses, where the key chloride coordinating residues are conserved (not shown). these observations indicate that the inter-subunit tunnel and the central polar cavity have evolved to coordinate negative ions. furthermore, the negative ions are stabilized in the central polar cavity located in the inner half of the potential permeation pathway, precisely in the middle of the transmembrane protein core. thereby, the low dielectric transmembrane region may contribute to propagating the electrostatic potential, which could help regulate the flow of cations through an alternative pore formed by a rearrangement of orf a monomers. it is difficult to reconcile the present orf a cryoem structure with cation conduction. additional experimental and structural evidence is needed to identify the pathway of cations. the proposed permeation pathway between tm and tm of both protomers is stabilized by numerous vdw interactions accounting for about - kcal/mol. the vdw contribution is the most important compared to the electrostatic energy contribution. this is mainly the result of more than methyl-methyl interactions between the two monomers. therefore, breaking these interactions to open the potential conduction pore would require a significant amount of energy that should be impossible to reach under a passive condition. 
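The comparison between the vdW and electrostatic contributions can be made explicit by averaging each per-frame energy term over the final frames of the trajectory and taking the ratio of their magnitudes. The following is a minimal sketch of that arithmetic only. It does not parse MMPBSA.py output, and the function names and energy values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def component_means(vdw: Sequence[float], elec: Sequence[float],
                    last_n: int) -> Tuple[float, float]:
    """Average the vdW and electrostatic terms over the last `last_n` frames."""
    if last_n <= 0 or last_n > min(len(vdw), len(elec)):
        raise ValueError("last_n must be between 1 and the number of frames")
    return mean(vdw[-last_n:]), mean(elec[-last_n:])


def contribution_ratio(vdw_mean: float, elec_mean: float) -> float:
    """Ratio of the magnitudes of the vdW and electrostatic contributions."""
    if elec_mean == 0:
        raise ValueError("electrostatic mean is zero; ratio undefined")
    return abs(vdw_mean) / abs(elec_mean)


# Hypothetical per-frame energies in kcal/mol (NOT values from this study).
toy_vdw = [-80, -100, -110, -90]
toy_elec = [-30, -18, -22, -20]

vdw_avg, elec_avg = component_means(toy_vdw, toy_elec, last_n=3)
ratio = contribution_ratio(vdw_avg, elec_avg)
```

With these toy numbers the vdW term is five times the electrostatic term, which mirrors the proportion reported in the text but is not derived from the actual data.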
interestingly, the most prevalent variant of orf a given by the analysis of sars-cov variants is q h, which is present in about % of sequenced viruses [ ] . in the same study, kern et al. observed that this mutant, located in the tm transmembrane segment, does not carry any change in the expression, stability, conductance, selectivity, or gating of the orf a channel. these results provide further suggestive evidence that the potential permeation pathway within the core of a dimeric complex might not be the actual conduction route for cations. in conclusion, we present a modeling study of the orf a dimeric and tetrameric complexes, based on the published cryoem structure. the results bring more questions than answers. what is the functional relevance of anions populating the central polar cavity? is the cytosolic domain necessary for conduction? which route do cations take to cross the membrane, and what is the role of orf a in transport? and finally, what is the functional oligomeric form of orf a? we share these early observations quickly in the hope that they are useful to guide future experiments. we will publish a more detailed analysis in future communications. a tetrameric model of orf a protein (code: xdc) was obtained by overlapping two dimer structures into the cryoem map deposited in the electron microscopy data bank (emdb) (code: ) [ ] . a disulfide bond was included to bind the cys residues of both dimers. the tetrameric structure was protonated according to the standard protonation states at ph and embedded into a popc ( -palmitoyl- -oleoyl-phosphatidylcholine) membrane, composed of lipids, and surrounded by water molecules and ions ( k+, cl−), resulting in a salt concentration of . m. the final dimensions of the system are x x Å . the system was equilibrated using the six-step scripts provided by the charmm-gui [ ] [ ] [ ] webserver under the amber molecular package [ ] [ ] [ ] . five hundred ns of molecular dynamics simulations were completed.
for proteins, the amber sb force field was used [ ] , together with the lipid force field for lipids [ ] , the tip p water model [ ] , and the ion parameters reported by joung and cheatham [ ] . an integration time step of fs was used. the cut-off used for van der waals interactions was . nm, and the dispersion correction for energy and pressure was applied. the particle mesh ewald (pme) method was used [ ] to treat electrostatic interactions. the velocity rescale (v-rescale) thermostat [ ] was used to keep the temperature constant at k. the semi-isotropic berendsen barostat was used to keep the pressure at atm, k, and bar. the mmpbsa.py [ ] script was used to calculate mm-gbsa [ ] energy values between two monomers. pore radius profiles were obtained using hole software [ ] and the mdanalysis package [ ] . we used python [ ] and the packages numpy [ ] , pandas [ ] , and matplotlib [ ] for data analysis. visual molecular dynamics software was used for visualization and to generate images [ ] . electrostatic potential maps were computed using apbs [ ] and pdb pqr to obtain the input files [ ] . figure : simulation system. a) initial orf a structure composed of two dimers obtained from the xdc cryoem structure, covalently joined by a disulfide bond formed by two c residues. b) pore radius profile for the reference structure pdb id: xdc, and for both dimers in the tetramer as obtained after ns of molecular dynamics simulations. profiles of both dimers show slight changes in the pore size, especially in the central polar cavity, where up to three chloride ions remain (~ z = - Å), widening the pore. for a single k+ is located in the cytosolic domain. residues e , d , s , and d , along with two alternating water molecules, are coordinating a single k+ (in magenta). figure : electrostatic potential maps were obtained from the orf a tetramer model at the first frame of md (a) and the last frame (b).
the maps show the change in the electrostatic potential in the intersubunit tunnel, which changes from electropositive to electronegative due to chloride ions, and in the interface between beta-sheets from both orf a dimers, where it becomes electropositive due to the accumulation of potassium ions. cryo-em structure of the sars-cov- a ion channel in lipid nanodiscs proc. natl. acad. sci key: cord- -hsc x j authors: dittmar, mark; lee, jae seung; whig, kanupriya; segrist, elisha; li, minghua; jurado, kellie; samby, kirandeep; ramage, holly; schultz, david; cherry, sara title: drug repurposing screens reveal fda approved drugs active against sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hsc x j there is an urgent need for antivirals to treat the newly emerged sars-cov- . to identify new candidates, we screened a repurposing library of ~ , drugs. screening in vero cells found few antivirals, while screening in human huh . cells validated diverse antiviral drugs. extending our studies to lung epithelial cells, we found that there are major differences in drug sensitivity and entry pathways used by sars-cov- in these cells. entry in lung epithelial calu- cells is ph-independent and requires tmprss , while entry in vero and huh . cells requires low ph and triggering by acid-dependent endosomal proteases. moreover, we found drugs are antiviral in lung cells, of which have been tested in humans, and are fda approved, including cyclosporine, which we found is targeting cyclophilin rather than calcineurin for its antiviral activity. these antivirals reveal essential host targets and have the potential for rapid clinical implementation. coronaviruses represent a large group of medically relevant viruses that were historically associated with the common cold. however, in recent years, members of the coronavirus family have emerged from animal reservoirs into humans and have caused novel diseases ( ).
first, severe acute respiratory syndrome (sars-cov) emerged in china in , followed by middle east respiratory syndrome (mers-cov) in ( , ) . while sars was, in the end, eradicated, mers continues to cause infections in the middle east. beginning in december and continuing into january , it became clear that a new respiratory virus was spreading in wuhan, china. rapid sequencing efforts revealed a coronavirus closely related to sars, which was named sars-cov- ( ). unfortunately, this virus is highly infectious and has spread rapidly around the world. identification of broadly acting sars-cov- antivirals is essential to clinically address sars-cov- infections. a potential route to candidate antivirals is through the deployment of drugs that show activity against related viruses. previous studies found that the antiviral drug remdesivir, which was developed against the rna-dependent rna polymerase of ebola virus, was also active against sars-cov- in vitro, with promising results in clinical trials ( ) ( ) ( ) . chloroquine and its derivatives, including hydroxychloroquine, are approved for use in malaria, and many in vitro studies have found that these drugs are also active against coronaviruses, including sars-cov- ( , ) . this led to early adoption of these agents to treat covid- (the disease caused by sars-cov- infection); however, little efficacy of these agents has been demonstrated in subsequent clinical trials ( ) . it remains unclear why these agents have not been more active in humans. there are currently more than fda approved drugs, and many others that have been tested in humans. we created an in-house library of drugs including ~ fda approved drugs and ~ drug-like molecules against defined molecular targets with validated pharmacological activity. in addition, we purchased drugs with reported anti-sars-cov- activity (e.g. remdesivir).
viruses encode unique proteins essential for infection, and most approved antivirals target these virally encoded essential targets. this class of antivirals has been termed direct-acting antivirals. viruses are also dependent on host cellular machineries for successful infection, and drugs that block these activities are host-targeted antivirals. given our dearth of effective treatments, we developed a screening platform that would allow us to identify both direct-acting and host-targeted antivirals that can be potentially repurposed for use against sars-cov- ( ). we developed a specific and sensitive assay to quantify viral infection using a cell-based high-content approach. we began our studies in african green monkey (cercopithecus aethiops) kidney epithelial cells (vero) because they are routinely used to propagate sars-cov- . they are robustly infected, and thus vero cells are widely used as a model system to screen for antivirals ( , ( ) ( ) ( ) . we screened our in-house repurposing library, identifying only six drugs that were antiviral with low toxicity in the primary screen. given how few candidates emerged, we reasoned that human cells might be a better model of infection and thus tested a panel of human cell lines to identify cells that are easy to grow and permissive to infection. we found that the human hepatocyte cell line huh . was readily infected with sars-cov- . screening in this human cell line, we validated drugs that were active in dose-response experiments and showed a favorable selective index versus toxicity ( ) . these candidates targeted a wide variety of cellular activities, but few were active in vero cells. however, one class, the chloroquines and their derivatives, was active in both cell types. the entry pathway of sars-cov- has only begun to be elucidated, with much of what we know being inferred from studies of the related sars-cov- ( , ) . the coronavirus glycoprotein, or spike, requires proteolytic processing for entry ( , , ) .
this processing can occur outside the cell, or within the endolysosomal compartment ( , ) . both sars-cov- and - engage angiotensin-converting enzyme (ace ) as their plasma membrane receptor ( ) ( ) ( ) . upon binding, the viruses along with the receptor are endocytosed into a low ph endosomal compartment where there are proteases, including cathepsins, that can cleave spike and allow for entry into the cytosol ( ) ( ) ( ) . since cathepsins require a low ph for activity, chloroquine and its derivatives that neutralize this low ph can effectively block viral entry ( ) ( ) ( ) . recent studies have also identified a plasma membrane-associated serine protease, tmprss , that is active against spike, cleaving the protein extracellularly and thereby bypassing the requirement for endosomal proteases ( ) ( ) ( ) . whether sars-cov- enters through different routes in different cell types remains unclear. lung epithelial cells are the major cellular target for sars-cov- in vivo and have been used to explore the role of tmprss in infection. perhaps surprisingly, while we found remdesivir was antiviral in calu- , hydroxychloroquine was not. since a panel of quinolines had no activity in calu- cells, these data suggest that entry in lung epithelial cells is independent of low-ph processing in the endosomal compartment. in contrast, the tmprss inhibitor camostat was highly active in calu- cells but inactive in vero and huh . cells. these data demonstrate distinct modes of entry in lung cells ( ) . further, these data suggest that there may be other fundamentally different cellular requirements in different cell types. we screened our validated candidates in calu- cells and found only drugs showed activity, including fda approved drugs: cyclosporine, dacomitinib and salinomycin. in additional studies, we found that cyclosporine analogs that target cyclophilin a were active against sars-cov- , but not compounds that target calcineurin.
identifying broadly acting antivirals is essential to move forward with clinical treatments for sars-cov- . vero cells are permissive to infection and can be used for antiviral screening for direct acting antivirals. sars-cov- is routinely propagated in vero e cells ( , , ) . when growing the virus in either vero e or vero ccl cells, two different strains of vero cells from atcc, we observed that sars-cov- is cytopathic in vero e , but not in vero ccl (data not shown) ( ) . moreover, viral stocks propagated from either of these cells produced similar titers of virus ( x pfu/ml), suggesting that viral replication and cytotoxicity are separable. therefore, we set out to develop a quantitative microscopy-based assay to measure the level of replication of sars-cov- more directly in infected cells, and we chose vero ccl to uncouple toxicity from infection. we first validated that our antibodies could detect infection of sars-cov- . we used an antibody to dsrna, and one to sars-cov- spike (figure a ) ( ) ( ) . we created an in-house library of drugs purchased from selleckchem. this library contains ~ fda approved drugs and ~ drug-like molecules against defined molecular targets with validated pharmacological activity. the library contains known kinase inhibitors, annotated cancer therapeutics, epigenetic regulators, anti-viral/infectives, gpcr and ion channel regulators. the remaining compounds fall into diverse target classes. we next optimized the dose and timing of infection by performing dose-response studies with known antivirals. indeed, we found that hydroxychloroquine and remdesivir were active in vero cells and presented with little cytotoxicity at the active doses ( figure b) ( ) . next, we validated the assay metrics, and observed a z'= . ( figure s ) ( ) . we used this assay pipeline to screen our in-house repurposing library in well plates at a final concentration of µm ( figure c ) ( ) .
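The Z' statistic quoted above is commonly computed from positive and negative control wells as Z' = 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|. The sketch below shows that standard formula; whether the authors used sample or population standard deviations is not stated, so the use of the sample standard deviation here is an assumption, and the control values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def z_prime(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Z'-factor from positive-control and negative-control well readouts.

    Uses the sample standard deviation (an assumption; the source does not
    say which estimator was used).
    """
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical percent-infected readouts (NOT data from this study):
# positive control = antiviral-treated wells, negative control = vehicle wells.
toy_positive = [2.0, 4.0, 3.0, 3.0]
toy_negative = [60.0, 62.0, 58.0, 60.0]

z = z_prime(toy_positive, toy_negative)
```

Values of Z' above roughly 0.5 are conventionally taken to indicate an assay with enough separation between controls for screening.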
we quantified the percentage of infected cells as well as the total cell number per well, to allow for exclusion of toxic compounds. we robustly identified the positive control remdesivir as antiviral ( figure s ) ( ) . using a threshold of < % infection and > % viability, as compared to the vehicle control, we identified only six drugs that were antiviral in our primary screen (table s ). this included the natural product nanchangmycin, which we previously found in a drug repurposing screen against zika virus ( ) . nanchangmycin was broadly antiviral against viruses that enter cells through endocytosis, consistent with the role of endosomal acidification for sars-cov- entry in these cells ( ) . we then repurchased powders and validated four of these candidates in dose-response assays where we observed antiviral activity in the absence of toxicity (figure d ). since vero cells are derived from african green monkeys, we set out to identify a human cell line permissive to infection. to this end, we infected a panel of human cell lines with sars-cov- and monitored infection by microscopy. we initially tested a , calu- , huh , huh . , hepg , hacat, imr , nci-h , cfbe o, and u os cells. we detected less than % infection of a , calu- , huh , hepg , hacat, imr , nci-h , cfbe o, and u os cells (data not shown). interestingly, while huh cells were largely non-permissive, the derivative cell line huh . was permissive to sars-cov- (fig a) . huh . cells are defective in innate immune signaling (rig-i) and are known to be more permissive to many viruses, including hepatitis c virus ( ) . remdesivir and hydroxychloroquine were antiviral against sars-cov- in huh . cells with ic s that were greater than -fold lower than those observed in vero cells (fig b) . we also found that nanchangmycin was antiviral against sars-cov- in huh . cells (fig s ) . these observations suggest that huh .
cells may be more sensitive to some classes of inhibitors, and may reveal antivirals that are selectively active against human targets. we optimized our image-based assay in huh . cells using remdesivir and observed a z'= . ( fig s ) ( ) . we screened our repurposing library at nm, quantifying both the percentage of infected cells and the cell number to exclude toxic compounds (fig c) . we found drugs had antiviral activity in the absence of cytotoxicity (< % infection, > % viability, as compared to vehicle control) (table s ). this included three of the six drugs identified in vero cells: z-fa-fmk, y- and salinomycin. we repurchased powders for drugs and tested their activity in dose-response assays in huh . cells against sars-cov- . cell number and the percent of infected cells were quantified. remdesivir and hydroxychloroquine were used as positive controls and a vehicle control (dmso) was included as a negative control ( ) . of those tested, drugs showed activity and fell into diverse classes (fig d) . dose-response curves are shown for candidates and the ic s and cc s were calculated (fig e) . the selectivity index (si, ratio between antiviral and cytotoxicity potencies) was calculated and the candidates were antiviral with si> ( figure e , table s ). dose-response curves for the other candidates that did not validate in huh . cells are shown in fig s . direct-acting antivirals are likely to be active against the virus in multiple cell types, as was observed for remdesivir. in addition, host-directed antivirals that target key steps in the viral lifecycle and are highly conserved and broadly expressed are also likely to emerge across cell types. one example is the endosomal acidification blocker hydroxychloroquine, which indeed scored as antiviral in both cell types ( , , ) . in total, we identified three drugs as antiviral in both screens. we performed dose-responses in vero cells against the candidates from the huh . screen.
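The primary-screen hit criterion described above (infection below a threshold and viability above a threshold, both relative to vehicle control) can be written down as a short filter. The sketch below is an assumption-laden illustration: the `Well` record, the `call_hits` function, the compound names, and the threshold values of 40% and 80% are all hypothetical, since the actual cutoffs are not recoverable from the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class Well:
    compound: str
    infected_cells: int
    total_cells: int


def call_hits(wells: Sequence[Well], vehicle_wells: Sequence[Well],
              max_infection_pct: float, min_viability_pct: float) -> List[str]:
    """Return compounds whose infection and viability, normalized to the
    vehicle control, pass both thresholds."""
    vehicle_infection = mean(w.infected_cells / w.total_cells for w in vehicle_wells)
    vehicle_count = mean(w.total_cells for w in vehicle_wells)
    if vehicle_infection == 0:
        raise ValueError("vehicle wells show no infection; cannot normalize")
    hits = []
    for w in wells:
        if w.total_cells == 0:
            continue  # no surviving cells: treated as toxic, never a hit
        infection_pct = 100.0 * (w.infected_cells / w.total_cells) / vehicle_infection
        viability_pct = 100.0 * w.total_cells / vehicle_count
        if infection_pct < max_infection_pct and viability_pct > min_viability_pct:
            hits.append(w.compound)
    return hits


# Hypothetical plate (NOT data from this study).
vehicle = [Well("dmso", 500, 1000), Well("dmso", 500, 1000)]
plate = [
    Well("drug_a", 50, 900),    # low infection, good viability
    Well("drug_b", 10, 200),    # low infection, but toxic
    Well("drug_c", 450, 1000),  # not antiviral
]

# Threshold values are placeholders chosen for illustration only.
hits = call_hits(plate, vehicle, max_infection_pct=40.0, min_viability_pct=80.0)
```

Requiring both conditions is what separates antiviral activity from apparent inhibition caused by cell loss, which is the reason the assay records cell number alongside the percentage of infected cells.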
we found that additional compounds were antiviral in vero cells with an si> (azd , bix , ebastine, mg- , and wye- ), albeit at higher concentrations ( figure s ). however, the majority of the antivirals that were validated in huh . cells were not active in vero cells. we next focused on lung epithelial models as these are the most relevant to human infections. we found that a number of lung-derived epithelial cell lines were refractory to infection (e.g., a , calu- , nci-h , cfbe o). however, we found that calu- cells, which have been shown to be permissive for many coronaviruses including sars-cov- , were readily infected (fig a) ( , , ) . we optimized assays using calu- cells and tested their sensitivity to remdesivir and hydroxychloroquine. as expected, the direct acting antiviral remdesivir was antiviral; however, hydroxychloroquine had little or no activity in calu- cells (fig b) . this led us to test the antiviral activity of a panel of chloroquine derivatives, and we found that none of these had activity against sars-cov- (fig c) , while these compounds are antiviral in both vero cells and huh . cells (fig d) . this suggests that there are major differences in the requirement for endosomal acidification during infection of sars-cov- in lung epithelial cells. endosomal acidification is thought to be required for sars-cov- entry to maintain the low ph necessary for endosomal cysteine protease activity required for priming spike for membrane fusion ( ) . consistent with the requirement for acidification in vero and huh . , the cathepsin inhibitor z-fa-fmk emerged as antiviral in both cell types (fig d, fig e) . we tested z-fa-fmk in calu- cells and found that it had no antiviral activity (fig d) , consistent with a lack of a requirement for endosomal acidification. recent studies found that the plasma membrane-associated serine protease, tmprss , can prime the viral glycoprotein for entry in lung epithelial cells ( ) .
therefore, we tested the role of tmprss by treating cells with the inhibitor camostat. we found that camostat was antiviral in calu- cells but had no activity in either vero or huh . cells (figure e -f) ( ) . moreover, the main endosomal kinase phosphatidylinositol- -phosphate/phosphatidylinositol -kinase, pikfyve, promotes internalization of diverse viruses and was recently shown to impact entry of coronaviruses including sars-cov- in hela cells ( ) . using the pikfyve inhibitor apilimod, we found that pikfyve promotes infection of sars-cov- in huh . and vero cells, with little importance in calu- cells ( fig s ) . these data suggest that the entry pathway used by sars-cov- is cell-type specific. to determine which of the antiviral candidates validated in huh . cells also had antiviral activity in calu- cells, we performed dose-response studies. we found that drugs were antiviral against sars-cov- in calu- cells with a selectivity index greater than (fig ) . these include: two drugs with unclear targets (salinomycin, y- ), kinase inhibitors (azd , bemcentinib, dacomitinib, wye- ), a histamine receptor inhibitor (ebastine), the iron chelator dp mt, and the cyclophilin inhibitor cyclosporine. many kinase inhibitors were quite potent, suggesting an important role for intracellular signaling in infection. the other drugs tested in calu- with a si< are shown in fig s . the full table of candidates from the huh . screen with ic , cc and si is shown in figure s . cyclosporine is an fda approved generic drug that is readily available and showed a submicromolar ic with high selectivity in both huh . and calu- cells (fig , fig , fig s ) . cyclosporine binds cyclophilin a and prevents activation of the phosphatase calcineurin, which is required for the nuclear translocation of the nuclear factor of activated t cells (nfat) ( ) ( ) ( ) . inhibition of this pathway in t cells is used as an immunosuppressant ( ) .
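The IC, CC and SI values referred to above relate to one another in a simple way: each half-maximal concentration is read off a dose-response curve, and the selectivity index is the cytotoxic concentration divided by the inhibitory one. The sketch below estimates the half-maximal dose by log-linear interpolation between the two bracketing doses. This is a deliberately simple stand-in and not the curve-fitting method used in the study, which is not described here; a four-parameter logistic fit would be the more usual choice. All function names and data values are hypothetical.

```python
import math
from typing import Optional, Sequence


def half_max_dose(doses: Sequence[float], responses: Sequence[float],
                  half: float = 50.0) -> Optional[float]:
    """Dose at which a decreasing response (% of vehicle) crosses `half`,
    by linear interpolation on log10(dose). Returns None if never crossed."""
    pairs = sorted(zip(doses, responses))
    for (d_lo, r_lo), (d_hi, r_hi) in zip(pairs, pairs[1:]):
        if r_lo >= half >= r_hi and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            log_dose = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10.0 ** log_dose
    return None


def selectivity_index(cc50: float, ic50: float) -> float:
    """SI = CC50 / IC50; larger values mean a wider window before toxicity."""
    return cc50 / ic50


# Hypothetical dose-response data (doses in micromolar; NOT from this study).
toy_doses = [0.01, 0.1, 1.0, 10.0]
toy_infection = [100.0, 75.0, 25.0, 5.0]    # % infection relative to vehicle
toy_viability = [100.0, 100.0, 90.0, 10.0]  # % cell number relative to vehicle

ic50 = half_max_dose(toy_doses, toy_infection)
cc50 = half_max_dose(toy_doses, toy_viability)
si = selectivity_index(cc50, ic50)
```

Interpolating on the log of the dose matches the way dose-response series are usually spaced, but the estimate depends only on the two bracketing points, so it is more sensitive to noise than a full curve fit.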
cyclosporins have been shown to have antiviral activity against a wide variety of viruses, including other coronaviruses ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . the activity of cyclosporine against previously studied coronaviruses is cyclophilin-dependent and independent of calcineurin ( , ) . we set out to perform initial structure-activity relationships (sar) and to determine if this activity was through its inhibition of cyclophilin or inhibition of calcineurin. for these studies, we obtained a panel of cyclosporine analogs including cyclosporin a, cyclosporin b, cyclosporin c, cyclosporin h and isocyclosporin a ( ). we found that isocyclosporin a, cyclosporin a, cyclosporin b, and cyclosporin c were active with increasing ic s (fig a-c) . cyclosporin h shows only weak binding to cyclophilin a, and has no immunosuppressant activity, as it does not inhibit the phosphatase activity of calcineurin ( , ) . cyclosporin h has a log reduction in activity compared to cyclosporine, but retains some activity. psc , a non-immunosuppressant derivative of cyclosporine that does not inhibit calcineurin, has a similar activity to cyclosporin c ( ). tmn is a cyclophilin a inhibitor that is times more potent than cyclosporine a in inhibiting the prolyl isomerase activity of cyclophilin a ( ) . we found the latter to lack antiviral activity, suggesting that the enzymatic activity of cyclophilin a is dispensable. we further validated that cyclosporine is antiviral in both cell types by performing rt-qpcr (fig d) . in addition, nim is another non-immunosuppressive derivative of cyclosporine, and we found that it is potently antiviral (fig e) , further suggesting that the antiviral activity is cyclophilin-dependent and separable from calcineurin. strikingly, the activity of this panel of drugs is similar in the two cell lines. these data suggest that cyclosporine has the same target and mechanism-of-action in both.
we also found that none of these drugs are antiviral in vero cells ( figure s ). to further assess the mechanism by which cyclosporine is antiviral, we tested fk , an inhibitor of calcineurin. fk binds the related immunophilin fkbp, rather than cyclophilin a, to block the phosphatase activity of calcineurin, and thus is also a potent immunosuppressant ( ) . we found that fk has no activity against sars-cov- (fig a-c) . moreover, since one of the major targets of calcineurin is the activation of nfat, we also tested whether an nfat inhibitor impacted viral infection ( ) . we found that the nfat inhibitor had no effect on infection (fig a-c) . altogether, we found that cyclosporins are potent antivirals against sars-cov- in lung epithelial cells, and that this activity is independent of calcineurin and nfat. the emergence of sars-cov- has led to devastating global morbidity and mortality, creating an immediate need for new therapeutics and vaccines. repurposing existing drugs can allow for rapid deployment of therapeutics that have already been tested in humans ( ) . remdesivir was developed against the ebola virus rna-dependent rna polymerase, and was also found to have robust activity against sars-cov- ( ) . importantly, we found that remdesivir is active against sars-cov- across cell types. chloroquine and hydroxychloroquine have been used for decades to treat malaria and have been shown to have in vitro antiviral activity against sars-cov- ( , ) . however, we find that this antiviral activity is cell-type specific. lung epithelial cells are resistant to these drugs, and this may explain the lack of efficacy seen in many trials ( ) . to determine if there are additional drugs that are active against sars-cov- in vitro, we screened a repurposing library that includes ~ fda approved drugs and ~ additional drugs that have been tested in humans.
repurposing can be used to reveal new and similar pathways and targets, but also the time and monetary investment associated with repurposing is potentially less since these drugs often bypass phase- trials ( , ) . initial screens in vero cells yielded few active drugs, leading us to pursue a screen in human huh . cells, a transformed hepatocyte line deficient in innate immune signaling. using this model system, we identified drugs, and validated with dose-response assays. this includes many drugs that were previously shown to have activity against other coronaviruses (tetrandrine, cepharanthine, cyclosporine, alixostatin, mg , salinomycin), and sars-cov- (salinomycin, tetrandrine, cepharanthine, cyclosporine, ebastine) ( , , , , ( ) ( ) ( ) ( ) ( ) ( ) ( ) . the drugs fall into distinct classes and most have known targets. however, two drugs that were active across cell types, salinomycin and y- , do not have clear targets. salinomycin, is a polyether antibiotic and chemotherapy drug, that has been shown to be antiviral against many viruses, including coronaviruses ( , ( ) ( ) ( ) . salinomycin was also identified in a vero cell screen ( ) . mechanistically, some studies have suggested that salinomycin is an ionophore that can attenuate viral entry by disrupting the acidification of the endosome ( ) . other studies have implicated salinomycin in er stress ( ) . studies in mice have shown antiviral activity against influenza ( ) . salinomycin has also been characterized as an activator of autophagy, which may influence sars-cov- infection ( , ) ( ). y- is a phenylpyrazoleanilide immunomodulatory agent that has been shown to inhibit il- production by t cells and has activity in monkeys ( ) . interestingly, treatment with y- is associated with decreased il- production, a cytokine that is thought to be highly expressed in sars-cov- infection ( ) ( ) ( ) . however, it is unclear how y- could attenuate sars-cov- in non-immune cells. 
ebastine is a potent h -histamine receptor antagonist, used for allergic disorders outside of the us, particularly in asia ( ). we found that ebastine is antiviral in all three cell types, although -fold less active in vero cells ( ) . ebastine is orally available with few side effects and there are clinical trials underway in china testing whether ebastine can impact covid- outcomes ( ) . since other h -histamine receptor antagonists were not active, it is unclear why this particular agent is more effective at inhibiting sars-cov- infection. interestingly, ebastine and its active metabolite, carebastine, are reported to inhibit expression of il- , while many other h -histamine receptor antagonists do not ( , ) . we also identified protease inhibitors as antiviral in huh . cells. two cysteine protease inhibitors, z-fa-fmk and mg- , had activity in both vero and huh . cells. none of the protease inhibitors were active in calu- cells. this observation suggests that they are not targeting the viral proteases. consistent with this, z-fa-fmk is an inhibitor of cathepsins, which are required for sars-cov- entry in cells where endosomal proteases are required for spike cleavage, and thus we observe no requirement in calu- cells where tmprss is required for infection ( , , ( ) ( ) ( ) . this has important implications in diverse sars-cov- studies where there may be cell-type specific requirements for different steps in the replication cycle. we also identified two inhibitors against the cellular histone methyltransferase g a as antiviral in huh . cells. however, these drugs were not active in calu- cells, suggesting that there are cell-type specific requirements. am is a selective cannabinoid cb receptor agonist that we found was antiviral in huh . cells. gw x, another cb agonist that has a -fold higher ec , was not active. moreover, dose-response studies found that am is not active in either vero or calu- cells.
cepharanthine and tetrandrine are both bis-benzylisoquinoline alkaloids produced as natural products from herbal plants ( ) . tetrandrine, a traditional chinese medicine and calcium channel blocker, has been shown to antagonize calmodulin. it has anti-tumor and anti-inflammatory effects, and can effectively inhibit fibroblasts, thereby inhibiting pulmonary fibrosis ( , ) . multiple studies have suggested that tetrandrine has antiviral activity, including against dengue virus and herpes simplex virus ( , ) . tetrandrine has also been shown to inhibit entry of ebola virus into host cells in vitro and showed therapeutic efficacy against ebola in preliminary studies on mice ( ) . currently, there is an ongoing clinical trial using tetrandrine in covid- patients to improve pulmonary function ( ) . cepharanthine is reported to have anti-inflammatory and immunoregulatory properties and is used to treat a variety of acute and chronic conditions outside of the us ( ) . both cepharanthine and tetrandrine were previously shown to have antiviral activity against the human coronavirus oc and, in recent studies, against sars-cov- in vero cell screens ( , , ) . while both of these molecules were antiviral in our huh . screen, neither was active in calu- cells. this may suggest that they are modulating endosomal entry pathways. we identified few metabolic regulators. dp mt is a potent iron chelator that we found to be antiviral against sars-cov- in huh . and calu- cells ( ) . a clinical trial with the iron chelator deferoxamine is underway (nct ). however, other iron chelators in our library, deferasirox and deferiprone, were not identified as antiviral, making the mechanism of action unclear. we identified several kinase inhibitors as antivirals against sars-cov- . frax is a p activated kinase (pak) inhibitor that is antiviral in huh . cells, but only modestly impacted infection of calu- cells ( ) . other pak inhibitors were not identified in our screens.
pak is required for entry by many viruses ( ). pd is a potent wee and chk inhibitor that is antiviral in huh . cells, but shows strong toxicity in calu- cells ( ). we also found that three mtor inhibitors, azd , pf- , and wye- , are antiviral against sars-cov- in huh- and calu- cells. these are highly potent atp competitive mtor inhibitors that target both torc and torc . in our library, none of the rapamycin analogs that selectively inhibit mtorc were active. we also identified two potent, selective, and irreversible inhibitors of egfr, dacomitinib and naquotinib. the other egfr inhibitors showed no activity in huh . cells. importantly, dacomitinib is a potent antiviral in calu- cells. it is unclear if the target is indeed egfr, but for many viruses egfr activation promotes viral entry, which may also be the case for sars-cov- ( - ). cyclosporine is a commonly used immunosuppressant that binds cyclophilin a and inhibits the calcium-dependent phosphatase calcineurin, which is required for the nuclear translocation of the nuclear factor of activated t cells (nfat) ( ) ( ) ( ) ( ) . inhibition of this pathway in t cells is used for immunosuppression. we found that cyclosporine is active in both huh . and calu- cells, but has no activity in vero cells, nor did the cyclosporine analogs. a recent screen in vero cells did find activity with cyclosporine against sars-cov- ( ) . cyclophilin a is a ubiquitously expressed peptidyl-prolyl cis-trans isomerase ( ). cyclophilin a and other cyclophilins have chaperone-like activity and take part in protein-folding processes ( ) . cyclophilin a has been shown to be an important cellular factor that facilitates many diverse viral infections. this includes human immunodeficiency virus type (hiv- ), influenza virus, hepatitis c virus (hcv), hepatitis b virus (hbv), vesicular stomatitis virus (vsv), vaccinia virus (vv), severe acute respiratory syndrome coronavirus (sars-cov) and rotavirus (rv) ( - , , ) .
the coronaviruses hcov- e, hcov-nl , fpiv, mouse hepatitis virus (mhv), avian infectious bronchitis virus, and sars-cov have been found to be attenuated by cyclosporin a ( , , ) . cyclosporine and its non-immunosuppressive derivatives can inhibit replication of a number of viruses including some coronaviruses. in most cases the responsible cyclophilin is cypa ( , ), but cypa and cypb were found to be required for fcov replication ( ) . for hcov-nl and hcov- e, cyclophilin a is required for infection in caco- cells ( ) and huh- . cells respectively ( , ) . it is generally thought that the activity of cyclosporine against coronaviruses is cyclophilin-dependent and independent of calcineurin. we found that a number of cyclosporins were antiviral with similar potencies, including cyclosporine, cyclosporin a, cyclosporin b and the metabolic breakdown product of cyclosporin a, isocyclosporin a. we also found that cyclophilin a is likely required, as cyclosporin h, which is a weak binder, had reduced activity. however, the enzymatic activity of cyclophilin a is likely dispensable, as tmn was inactive. to further address the role of calcineurin, we tested a non-immunosuppressant derivative of cyclosporine that does not inhibit calcineurin and found that it has a similar activity to cyclosporin c. we also found that fk , a calcineurin inhibitor independent of cyclophilin a, and nfat inhibitors have no antiviral activity. altogether, we found that cyclosporins are potent antivirals against sars-cov- in lung epithelial cells, and that this activity is independent of calcineurin. nim is a cyclophilin inhibitor independent of calcineurin, and we found that it is highly active in huh . cells, further suggesting that cyclophilin is required for sars-cov- infection. strikingly, the activities of all of these drugs are similar in the two cell lines, suggesting the same target and mechanism of action and that cyclosporine would block sars-cov- in diverse infected tissues in vivo.
one approach would be to use cyclophilin inhibitors that do not have immunosuppressive activity such as nim or others that have been tested for hcv infection (alisporivir (debio- ) and scy- ) ( ) or for hiv infection (nim ) ( ) ( ) . another possibility is to use cyclophilin inhibitors that also target calcineurin (e.g., cyclosporine). one of the major complications of covid- is the hyper-inflammatory response and cytokine storm associated with increased immune activation. to prevent hyper-activation, there has been interest in treating covid- patients with immunosuppressants ( ). there are ongoing trials for a variety of agents, including anti-il and jak inhibitors, and two clinical trials using sirolimus, the fda-approved mtor inhibitor that selectively inhibits mtorc . we find no antiviral activity of sirolimus or other rapamycin derivatives. in contrast, cyclosporin a is an approved immunosuppressant that we found is also antiviral at concentrations achieved in vivo ( ) . therefore, it may be useful to implement clinical trials using cyclosporin a as an immunosuppressant, as it would potentially ameliorate symptoms by two mechanisms ( ) . there have been a large number of screens posted in the literature that suggest antiviral activity of several existing drugs (e.g., azithromycin, favipiravir, lopinavir, ribavirin, ritonavir, and tetracycline). most screens of these drugs have been performed in vero cells, with toxicity as the read-out. medicines for malaria venture (mmv) has compiled a list of drugs that have support for antiviral activity against sars-cov- (https://www.mmv.org/mmv-open/covid-box). we tested > of the compounds and find that, in addition to the quinolines and drugs found in our screen, there are few additional compounds that show activity at less than µm.
while it is possible that some of these drugs are false negatives in our screens, it is likely that many of these candidates do not have antiviral activity either when measuring viral antigen production or when looking in different cell types. it is very important that identified antivirals be tested for their impact on viral replication more directly. moreover, given the striking differences in sensitivities across cell types, it is important to validate the activity of any new antivirals in lung epithelial cells. altogether, these studies highlight the roles of cellular genes in viral infection and cell type differences, and our discovery of nine broadly active antivirals suggests new avenues for therapeutic interventions. we found that of the drugs are antiviral in lung epithelial cells, have been used in humans, of these are fda approved in the us (cyclosporine, dacomitinib, and salinomycin), and ebastine is approved outside of the us. while clinical trials are underway with some of these candidates, additional trials will be needed to determine the efficacy of these antivirals in covid- patients, to inform future treatment strategies.
vero e cells and vero ccl were obtained from atcc and were cultured in dmem, supplemented with % (v/v) fetal bovine serum, % (v/v) penicillin/streptomycin, % (v/v) l-glutamax and were maintained at °c and % co . huh . cells were obtained from c. rice (rockefeller) and cultured in dmem, supplemented with % (v/v) fetal bovine serum, % (v/v) penicillin/streptomycin, % (v/v) l-glutamine, and were maintained at °c and % co . calu- cells (htb- ) were obtained from atcc and cultured in mem, supplemented with % (v/v) fetal bovine serum, % (v/v) penicillin/streptomycin, % (v/v) l-glutamine, and were maintained at °c and % co .
sars-cov- was obtained from bei (wa- strain). stocks were prepared by infection of vero e cells in % serum plus mm hepes for five days, freeze-thawed, and clarified by centrifugation (po).
the titer of the stock was determined by plaque assay using vero e cells and was x pfu/ml and . x tcid /ml ( ) . this seed stock was amplified in vero ccl (p ) at . x tcid /ml. all work with infectious virus was performed in a biosafety level laboratory and approved by the institutional biosafety committee and environmental health and safety.
infections: cells were plated in well plates ( µl/well) at , cells per well for vero, , cells per well for huh . , and , cells per well for calu- . the next day, nl of drugs were added. the positive control remdesivir and the negative control dmso were spotted on each plate. one hour later cells were infected with sars-cov- (vero, moi= ; huh . moi= ; calu- moi= . ). cells were fixed ( hpi vero and huh , , hpi calu- ) in % formaldehyde/pbs for min at room temperature and then washed three times with pbst. cells were blocked ( % bsa/pbst) for minutes and incubated in primary antibody (anti-dsrna j ) overnight at °c. cells were washed x in pbst and incubated in secondary (anti-mouse alexa and hoechst ) for h at room temperature. cells were washed x in pbst and imaged using imagexpress micro using a x objective. four sites per well were captured. the total number of cells and the number of infected cells were measured using metaxpress . . cell scoring module, and the percentage of infected cells was calculated. the aggregated infection of the dmso and remdesivir control wells (n= ) on each assay plate was used to calculate z'-factors, as a measure of assay performance and data quality. sample well infection was normalized to aggregated dmso plate control wells and expressed as percentage of control [poc = (%infection sample / average %infection dmso) * ] and z-score [z = (%infection sample - average %infection dmso) / standard deviation %infection dmso] in spotfire (perkinelmer). candidate hits were selected as compounds with poc< % and viability > %, compared to vehicle control.
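the plate normalization described above can be sketched in a few lines of python. this is a minimal illustration, not the authors' spotfire workflow: the function names, the use of the sample standard deviation, the toy well values, and the hit cutoffs passed as arguments are all assumptions (the actual cutoffs are not recoverable from the text). the z'-factor follows the commonly used definition, 1 - 3(sd_pos + sd_neg)/|mean_pos - mean_neg|.

```python
from statistics import mean, stdev


def percent_of_control(sample_pct_infected, dmso_pct_infected):
    """POC: sample %infection relative to the mean DMSO %infection, as a percentage."""
    return sample_pct_infected / mean(dmso_pct_infected) * 100.0


def z_score(sample_pct_infected, dmso_pct_infected):
    """Distance of a sample from the DMSO mean, in DMSO standard deviations.

    Assumption: sample standard deviation (n - 1) is used.
    """
    return (sample_pct_infected - mean(dmso_pct_infected)) / stdev(dmso_pct_infected)


def z_prime_factor(positive_ctrl, negative_ctrl):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive_ctrl) - mean(negative_ctrl))
    return 1.0 - 3.0 * (stdev(positive_ctrl) + stdev(negative_ctrl)) / separation


def is_candidate_hit(poc, viability_pct, poc_cutoff, viability_cutoff):
    """Hit rule: low infection (POC below cutoff) and viability above cutoff.

    Both cutoffs are hypothetical parameters supplied by the caller.
    """
    return poc < poc_cutoff and viability_pct > viability_cutoff


if __name__ == "__main__":
    # Toy %infection values for control wells on one plate (made up for illustration).
    dmso_wells = [40.0, 42.0, 38.0, 40.0]
    remdesivir_wells = [2.0, 1.0, 3.0, 2.0]
    print("Z' =", round(z_prime_factor(remdesivir_wells, dmso_wells), 3))
    print("POC =", percent_of_control(10.0, dmso_wells))
    print("z =", round(z_score(10.0, dmso_wells), 2))
```

a z'-factor close to 1 indicates well-separated controls with little spread; values are usually judged per plate before any sample wells are scored.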
candidate drugs were repurchased as powders from selleckchem, medchemexpress, and medkoo and suspended in dmso. drugs were arrayed in -pt dose-response in well plates. infections were performed using screening conditions. dmso (n= ) and µm remdesivir (n= ) were included on each validation plate as controls for normalization. infection at each drug concentration was normalized to aggregated dmso plate control wells and expressed as percentage-of-control (poc = %infection sample / avg %infection dmso cont ). a nonlinear regression curve fit analysis (graphpad prism ) was performed on poc infection and cell viability using log transformed concentration values to calculate ic values for infection and cc values for cell viability for each drug/cell line combination. selectivity index (si) was calculated as the ratio of a drug's cc and ic values (si = cc /ic ).
rt-qpcr: huh . ( , cells/well) or calu- cells ( , cells/well) were plated in well plates. the next day, drugs were added and one hour later cells were infected with sars-cov- (moi= . ). total rna was purified using trizol (invitrogen) followed by rna clean and concentrate kit (zymo research) at hpi for huh . or hpi for calu- . for cdna synthesis, reverse transcription was performed with random hexamers and moloney murine leukemia virus (m-mlv) reverse transcriptase (invitrogen). synthesized rna was used as a standard (bei). gene specific primers to sars-cov- (wuhan v , nsp ) and sybr green master mix (applied biosystems) were used to amplify viral rna and s rrna primers were used to amplify cellular rna using the quantstudio flex rt-pcr system (applied biosystems). relative quantities of viral and cellular rna were calculated using the standard curve method ( ) . viral rna was normalized to s rna for each sample (wuhan v / s).
wuhan-v _forward
s rrna_forward '-aacccgttgaaccccatt- '
s rrna reverse '-ccatccaatcggtagtagcg- '
we thank s. weiss and y.
li for sharing sars-related coronavirus , isolate usa-wa / (obtained from the centers for disease control and bei resources). we thank bei resources for quantitative sars-cov- rna. we thank m. diamond and s. hensley for providing anti-spike antibody (cr ), c. coyne for j antibody, m. diamond for oligo sequences. we thank e. grice for hacat cells. we thank c. kovacsics for biosafety support. we thank the cherry lab, the high-throughput screening core, david roth, and john epstein for discussions. we thank timothy wells and medicines for malaria venture for helpful discussions and compounds. we thank the nih, dean's innovation fund, linda and laddy montague, bwf for funding. origin and evolution of pathogenic coronaviruses coronavirus pathogenesis and the emerging pathogen severe acute respiratory syndrome coronavirus sars and mers: recent insights into emerging coronaviruses a new coronavirus associated with human respiratory disease in china remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov- infection in vitro a randomized trial of hydroxychloroquine as postexposure prophylaxis for covid- drug repositioning: identifying and developing new uses for existing drugs severe acute respiratory syndrome coronavirus from patient with coronavirus disease identification of antiviral drug candidates against sars-cov- from fda-approved drugs an orally bioavailable broad-spectrum antiviral inhibits sars-cov- in human airway epithelial cell cultures and multiple coronaviruses in mice highly permissive cell lines for subgenomic and genomic hepatitis c virus rna replication cell entry mechanisms of sars-cov- physiological and molecular triggers for sars-cov membrane fusion and entry into host cells activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites host cell proteases: critical 
determinants of coronavirus tropism and pathogenesis structural and functional basis of sars-cov- entry by using human ace functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses cellular entry of the sars coronavirus clinical pharmacokinetics and metabolism of chloroquine. focus on recent advancements targeting the endocytic pathway and autophagy process as a novel therapeutic strategy in covid- sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response enhanced isolation of sars-cov- by tmprss -expressing cells determination of preferential binding sites for anti-dsrna antibodies on double-stranded rna by scanning force microscopy potent binding of novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody a simple statistical parameter for use in evaluation and validation of high throughput screening assays calu- : a human airway epithelial cell line that shows camp-dependent cl-secretion characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov calcineurin is a common target of cyclophilin-cyclosporin a and fkbp-fk complexes two cytoplasmic candidates for immunophilin action are revealed by affinity for a new cyclophilin: one in the presence and one in the absence of csa transcriptional regulation by calcium, calcineurin, and nfat mechanisms of action of cyclosporine merscoronavirus replication induces severe in vitro cytopathology and is strongly inhibited by cyclosporin a or interferon-alpha treatment influences of cyclosporin a and non-immunosuppressive derivatives on cellular cyclophilins and viral nucleocapsid protein during human coronavirus e replication cyclophilin a restricts influenza a virus replication 
through degradation of the m protein requirement for cyclophilin a for the replication of vesicular stomatitis virus new jersey serotype redistribution of cyclophilin a to viral factories during vaccinia virus infection and its incorporation into mature particles cyclophilin a inhibits rotavirus replication by facilitating host ifn-i production cyclosporin a inhibits the influenza virus replication through cyclophilin a-dependent and -independent pathways cyclophilin a, trim , and resistance to human immunodeficiency virus type infection hepatitis b virus (hbv) surface antigen interacts with and promotes cyclophilin a secretion: possible link to pathogenesis of hbv infection cyclophilin a modulates the sensitivity of hiv- to host restriction factors chicken cyclophilin a is an inhibitory factor to influenza virus replication human coronavirus nl replication is cyclophilin a-dependent and inhibited by non-immunosuppressive cyclosporine a-derivatives including alisporivir genetic deficiency and polymorphisms of cyclophilin a reveal its essential role for human coronavirus e replication hepatocytes that express variants of cyclophilin a are resistant to hcv infection and replication cyclosporine analogues chapter optimization of cyclophilin inhibitors for use in antiviral therapy sdz psc , a non-immunosuppressive cyclosporine: its potency in overcoming p-glycoproteinmediated multidrug resistance of murine leukemia thiosemicarbazones from the old to new: iron chelators that are more than just ribonucleotide reductase inhibitors mode of action of tacrolimus (fk ): molecular and cellular mechanisms a new cell-permeable peptide allows successful allogeneic islet transplantation in mice potential therapeutic targets for combating sars-cov- : drug repurposing, clinical trials and recent advancements drug repurposing: progress, challenges and recommendations drug repurposing from an academic perspective repurposing of clinically approved drugs for treatment of coronavirus 
disease in a -novel coronavirusrelated coronavirus model natural bis-benzylisoquinoline alkaloids-tetrandrine, fangchinoline, and cepharanthine, inhibit human coronavirus oc infection of mrc- human lung cells. biomolecules cyclosporin a inhibits the replication of diverse coronaviruses coronavirus protein processing and rna synthesis is inhibited by the cysteine proteinase inhibitor e d severe acute respiratory syndrome coronavirus replication is severely impaired by mg due to proteasome-independent inhibition of m-calpain the ubiquitin-proteasome system facilitates the transfer of murine coronavirus from endosome to cytoplasm during virus entry repurposing of clinically developed drugs for treatment of middle east respiratory syndrome coronavirus infection potential antiviral options against sars-cov- infection salinomycin: a new monovalent cation ionophore salinomycin inhibits influenza virus infection by disrupting endosomal acidification and viral matrix protein function salinomycin triggers endoplasmic reticulum stress through atp a upregulation in pc- cells salinomycin, as an autophagy modulator--a new avenue to anticancer: a review salinomycin induces autophagy in colon and breast cancer cells with concomitant generation of reactive oxygen species salinomycin induces activation of autophagy, mitophagy and affects mitochondrial polarity: differences between primary and cancer cells a new phenylpyrazoleanilide, y- , inhibits interleukin production and ameliorates collageninduced arthritis in mice and cynomolgus monkeys y- , a novel immune-modulator, sensitizes multidrug-resistant tumors to chemotherapy a review of the second-generation antihistamine ebastine for the treatment of allergic disorders ongoing clinical trials for the management of the covid- pandemic ebastine inhibits t cell migration, production of th -type cytokines and proinflammatory cytokines histamine h -receptor antagonists with immunomodulating activities: potential use for modulating t 
helper type (th )/th cytokine imbalance and inflammatory responses in allergic diseases bisbenzylisoquinoline alkaloids targeting the ca( +)/calmodulindependent protein kinase ii by tetrandrine in human liver cancer cells tetrandrine potently inhibits herpes simplex virus type- -induced keratitis in balb/c mice differential effects of triptolide and tetrandrine on activation of cox- , nf-kappab, and ap- and virus production in dengue virus-infected human lung cells ebola virus. two-pore channels control ebola virus host cell entry and are drug targets for disease treatment cepharanthine: an update of its mode of action, pharmacological properties and medical applications rescue of fragile x syndrome phenotypes in fmr ko mice by the small-molecule pak inhibitor frax radiosensitization of p mutant cells by pd , a novel g( ) checkpoint abrogator viruses exploit the function of epidermal growth factor receptor the epidermal growth factor receptor (egfr) promotes uptake of influenza a viruses (iav) into host cells epidermal growth factor receptor is a host-entry cofactor triggering hepatitis b virus internalization epidermal growth factor receptor is a co-factor for transmissible gastroenteritis virus entry hepatitis c virus induces epidermal growth factor receptor activation via cd binding for viral internalization and entry r , a selective small molecule inhibitor of axl kinase, blocks tumor spread and prolongs survival in models of metastatic breast cancer axl mediates zika virus entry in human glial cells and modulates innate immune responses the mechanism of axl-mediated ebola virus infection first patient dosed in bemcentinib covid- trial peptidylproline cis-trans-isomerases: immunophilins peptidyl-prolyl cis-trans isomerases, a superfamily of ubiquitous folding catalysts cyclophilin a is an essential cofactor for hepatitis c virus infection and the principal mediator of cyclosporine resistance in vitro cyclophilin a and viral infections feline coronavirus 
replication is affected by both cyclophilin a and cyclophilin b cyclophilin inhibition as potential therapy for liver diseases sars-cov- pandemic : time to revive the cyclophilin inhibitor alisporivir cytokine release syndrome in severe covid- cyclosporin. a review of its pharmacodynamic and pharmacokinetic properties, and therapeutic use in immunoregulatory disorders covid- and calcineurin inhibitors: should they get left out in the storm? a standard curve based method for relative real time pcr data processing key: cord- - r mico authors: resnick, samuel j.; iketani, sho; hong, seo jung; zask, arie; liu, hengrui; kim, sungsoo; melore, schuyler; nair, manoj s.; huang, yaoxing; tay, nicholas e.s.; rovis, tomislav; yang, hee won; stockwell, brent r.; ho, david d.; chavez, alejandro title: a simplified cell-based assay to identify coronavirus cl protease inhibitors date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: r mico we describe a mammalian cell-based assay capable of identifying coronavirus cl protease ( clpro) inhibitors without requiring the use of live virus. by enabling the facile testing of compounds across a range of coronavirus clpro enzymes, including the one from sars-cov- , we are able to quickly identify compounds with broad or narrow spectra of activity. we further demonstrate the utility of our approach by performing a curated compound screen along with structure-activity profiling of a series of small molecules to identify compounds with antiviral activity. throughout these studies, we observed concordance between data emerging from this assay and from live virus assays. by democratizing the testing of cl inhibitors to enable screening in the majority of laboratories rather than the few with extensive biosafety infrastructure, we hope to expedite the search for coronavirus cl protease inhibitors, to address the current epidemic and future ones that will inevitably arise. 
we next conducted dose-response profiling for two additional sars-cov- clpro inhibitors, compound and compound a, and observed reversal of the toxic effect of the protease in a dose-dependent manner (fig. b-c) , . in agreement with the results obtained with gc , the ec value for compound was comparable to that obtained with live virus, . µm and . µm, respectively (table ) . unexpectedly, we calculated an ec of . µm for a, which is approximately -fold higher than the literature-reported value of . µm, based on viral plaque assay . we have noticed that literature-reported ec values from live virus testing can range over an order of magnitude depending on the exact method employed, as is the case for gc (table ). to resolve this discrepancy between the transfection-based approach and the live virus assay, we conducted live virus testing of a using the commonly employed readout of cytopathic effect in vero e cells and observed closer concordance with our transfection-based results (supplementary figure and table ) , , . to measure the toxicity of each compound, we exposed eyfp-transfected cells to each molecule and determined cc values (fig. ) . we also calculated the selectivity index (si) for each compound tested in this study (supplementary table ). we hypothesized that the assay would be able to distinguish between compounds that are only active on the purified sars-cov- clpro and those that are able to inhibit the live virus through protease inhibition. in general, we observe concordance between compounds showing activity within this transfection-based cl assay and live virus studies (supplementary fig. a-e) . however, we hypothesized that this assay may be used to study other coronavirus clpros to enable users to identify broad-acting inhibitors, as constructs containing other clpro enzymes could be readily variable amino acid identity compared with sars-cov- clpro (supplementary fig. a).
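as an illustration of how an ec or cc value and the selectivity index relate to a dose-response series, the sketch below estimates the half-maximal concentration by log-linear interpolation between the two doses that bracket the 50% level. this is a deliberately simplified stand-in for a nonlinear regression curve fit, not the method used in the study; the function names, the 50% threshold default, the toy data, and the definition si = cc / ec (the usual convention, assumed here) are illustrative assumptions.

```python
import math


def half_max_concentration(concs, responses, threshold=50.0):
    """Estimate the concentration at which the response crosses `threshold`.

    concs: ascending, positive concentrations (e.g. in µM).
    responses: response at each concentration, in percent (rising or falling).
    Interpolates linearly on a log10 concentration axis between the two
    bracketing doses; returns None if the threshold is never crossed.
    """
    points = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        crosses = (r_lo - threshold) * (r_hi - threshold) <= 0
        if crosses and r_lo != r_hi:
            frac = (threshold - r_lo) / (r_hi - r_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


def selectivity_index(cc50, ec50):
    """SI as the ratio of cytotoxic to effective concentration (assumed definition)."""
    return cc50 / ec50


if __name__ == "__main__":
    doses = [0.1, 1.0, 10.0, 100.0]            # µM, made-up series
    rescue = [10.0, 30.0, 70.0, 95.0]          # % rescue of protease toxicity (toy)
    viability = [100.0, 98.0, 90.0, 20.0]      # % viability in control cells (toy)
    ec50 = half_max_concentration(doses, rescue)
    cc50 = half_max_concentration(doses, viability)
    print("EC50 ~", round(ec50, 2), "µM; CC50 ~", round(cc50, 2), "µM")
    print("SI ~", round(selectivity_index(cc50, ec50), 1))
```

interpolation only uses the two bracketing doses, so it is more sensitive to noise than a full sigmoidal fit; it is shown here because it makes the definition of the half-maximal point explicit.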
for each of these proteases, we confirmed that expression in mammalian cells resulted in toxicity that is dependent upon the enzyme's catalytic activity (supplementary fig. b). next, we tested gc , compound , and a across this panel of proteases. gc , a drug originally identified for use against the feline infectious peritonitis virus, showed ec < µm for most, but not all, of the proteases tested . unexpectedly, compound , which was originally designed as a sars-cov clpro inhibitor, showed particular potency against ibv clpro (ec = . µm) along with broad activity (ec < µm) for all other cl proteases tested. in contrast to gc and compound , a had a relatively narrow activity spectrum with ec < µm against only sars-cov and sars-cov- clpro enzymes (fig. ) . of note, in all cases where previous live virus data were available, the ec values obtained from this transfection-based assay were similar (table ) . having further determined the assay's ability to examine the effects of active individual compounds, we sought to determine its suitability for small molecule screening. before performing the screen, we
we noted that gc is structurally similar to its prodrug gc , except for the change of the bisulfite salt adduct to an aldehyde warhead , . additional testing of gc revealed it to have a similar ec as gc in both the transfection assay and when tested against live sars-cov- virus, suggesting that the difference in structure has a minimal effect on their potency (supplementary fig. and table ), although solubility may be affected . the other hit from the screen, grl- , shares structural similarity to several other compounds within the library, one of which is a previously reported clpro inhibitor (mac- ) that failed to show activity at . µm against sars-cov- clpro within our transfection-based assay (fig. c ). to verify grlexamined and observed a narrow range of activity, with ec < µm only observed against sars- table ) .
we also tested gc against the full panel of clpro enzymes and observed concordance with gc , with sars-cov clpro, mers-cov clpro, and ibv clpro demonstrating an ec < µm (supplementary fig. ).
given the potential for protease inhibitors in the treatment of viral illnesses, small molecule inhibitors of coronavirus cl proteases represent a promising avenue for treating infections caused by this large family of viruses. here, we present a simplified assay to identify candidate inhibitors under physiologic cellular conditions. this approach presents significant advantages over other methods to detect cl protease inhibitory activity with its ease of use and ability to be performed with equipment and reagents commonly available to many biomedical research laboratories. while conventional methods for identifying cl protease inhibitors make use of in vitro purified protease, the isolation of sufficiently pure enzyme in its native state can be costly and labor intensive. furthermore, assays using purified protease fail to consider cell permeability and the influence of the extracellular and intracellular milieu on compound activity. in comparison to live virus-based assays, the outlined approach does not require extensive biosafety containment. these data also suggest that the approach described here is applicable to a number of coronaviruses for which live virus assays may not be available or would be deemed ethically challenging to perform even with extensive biosafety infrastructure , .
biorxiv preprint, doi: https://doi.org/ . / . . . , this version posted august , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder. it is made available under a cc-by-nc-nd . international license.
finally, because the phenotype assayed within this approach is driven solely by protease activity, it may enable the distinction between compounds with multiple biological targets and subsequent potential for off-target toxicity from those that function primarily as clpro (table ). these differences appear to be driven by variation in experimental setup such as cell line used, assay readout, incubation period, and initial concentration of virus added. while we have observed agreement between the ec values obtained from the described transfection-based method and those reported in the literature, given the differences in ec across assays, we suggest caution when comparing results across studies. by developing this transfection-based clpro testing platform, we hope to facilitate the discovery of new coronavirus inhibitors while also facilitating the comparison of existing inhibitors within a single simplified assay system. furthermore, we propose that this cellular protease assay system could be industrialized to screen and optimize a large number of compounds to discover potential treatments for future viral pandemics.
cell lines and cell culture
hek t and hek cells used in this study were obtained from atcc. cells were maintained at °c in a humidified atmosphere with % co . hek t and hek cells were grown in
during the drug screen, within each of the four plates screened, two positive control wells were included to ensure assay reliability, along with several wells with the negative control . % dmso. for hit selection, we employed a robust z-score method. we first normalized data using a robust z-
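the description of the robust z-score normalization is cut off in the text above, so the exact formula used in the screen is not recoverable here. as a hedged illustration, a common form of the robust z-score replaces the mean with the plate median and the standard deviation with the median absolute deviation (mad) scaled by 1.4826; the sketch below assumes that form. the function names, the toy plate values, the hit cutoff, and the direction of the hit rule (higher signal taken as rescue) are all assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-score per well: (x - median) / (1.4826 * MAD).

    The 1.4826 factor makes the MAD comparable to a standard deviation
    for normally distributed data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined for these values")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def select_hits(values, cutoff):
    """Indices of wells whose robust z-score exceeds a caller-supplied cutoff.

    The cutoff is a hypothetical parameter, not a value taken from the study.
    """
    return [i for i, z in enumerate(robust_z_scores(values)) if z > cutoff]


if __name__ == "__main__":
    # Toy plate signal: most wells near 1.0, one well with strong rescue.
    plate = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 5.0]
    print([round(z, 2) for z in robust_z_scores(plate)])
    print("hit wells:", select_hits(plate, cutoff=3.0))
```

because the median and mad are barely moved by a few extreme wells, strong hits do not inflate the scale they are measured against, which is the usual motivation for preferring this over a mean-based z-score in small screens.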
of saturated nahco was added. the cloudy mixture was extracted with ch cl , dried over na so , and concentrated in vacuo to give gc as a colorless oil ( . mg). all reagents generated in this study are without restriction. plasmids generated in this study will be some of the molecules described in this work. b.r.s. is an inventor on additional patents and patent applications related to small molecule therapeutics, and co-founded and serves as a consultant to inzen therapeutics and nevrox limited.
[figure panels: compound a; bat-cov-hku clpro; hcov-nl clpro; ibv clpro] fig. . small-scale drug screen and structure-activity profiling at µm identify two
key: cord- -rw yls authors: dominguez andres, ana; feng, yongmei; campos, alexandre rosa; yin, jun; yang, chih-cheng; james, brian; murad, rabi; kim, hyungsoo; deshpande, aniruddha j.; gordon, david e.; krogan, nevan; pippa, raffaella; ronai, ze’ev a. title: sars-cov- orf c is a membrane-associated protein that suppresses antiviral responses in cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rw yls disrupted antiviral immune responses are associated with severe covid- , the disease caused by sars-cov- . here, we show that the -amino-acid protein encoded by orf c of the viral genome contains a putative transmembrane domain, interacts with membrane proteins in multiple cellular compartments, and impairs antiviral processes in a lung epithelial cell line. proteomic, interactome, and transcriptomic analyses, combined with bioinformatic analysis, revealed that expression of only this highly unstable small viral protein impaired interferon signaling, antigen presentation, and complement signaling, while inducing il- signaling. furthermore, we showed that interfering with orf c degradation by either proteasome inhibition or inhibition of the atpase vcp blunted the effects of orf c. our study indicated that orf c enables immune evasion and coordinates cellular changes essential for the sars-cov- life cycle.
one-sentence summary sars-cov- orf c is the first human coronavirus protein localized to membrane, suppressing antiviral response, resembling full viral infection. sars-cov- is an enveloped, positive-sense single strand . kb rna virus ( , ) that causes severe respiratory disease in humans . this coronavirus was first identified in wuhan, china, at the end of ( ) . due to its easy human-to-human transmission and the lack of effective antiviral therapy, covid- has caused a pandemic with more than million cases and over , deaths worldwide (https://covid .who.int). mechanistically, the host protein ace serves as the viral receptor and host cellular proteases, such as tmprss , play key roles in sars-cov- entry into host cells ( ) ( ) ( ) ( ) . ace expression is high in alveolar epithelial cells ( ) , making the lung a highly vulnerable target for the virus. sars-cov- infection causes a wide range of disease, from asymptomatic to mild disease to severe disease that can lead to death ( ) . sars-cov- is most similar to the coronaviruses sars-cov and mers-cov ( , ) . however, neither of those became a global pandemic. current therapies are primarily palliative and supportive ( ) . more than clinical trials are currently in progress worldwide ( ) (https://clinicaltrials.gov/ct /who_table). without effective vaccines or treatments, there is an urgent need to understand the pathology of sars-cov- infection, the roles of each of the proteins encoded within the viral genome in the life cycle, virulence, and pathogenicity of the virus, and identify strategies for intervention or treatment. various therapeutic and vaccine strategies target viral entry mechanisms, such as vaccines or antibodies targeting on the spike (s) protein ( ) ( ) ( ) ( ) ; others target viral replication or assembly processes, such as the antiviral drug remdesivir, which interferes with rna replication and has emerged as superior to placebo in shortening recovery time in adults ( ). 
another strategy for treatment is to interfere with viral immune evasion mechanisms and thus enable the body's natural antiviral responses to be more effective at clearing the virus. indeed, investigation of mechanisms of immune evasion by sars-cov- is an active area of translational research, with immune evasion properties discovered for nonstructural protein (nsp ) ( ) . the sars-cov- genome contains open reading frames (orfs), which encode viral proteins ( ) ( ) ( ) . orf a and orf ab encode polyproteins that are cleaved into nonstructural proteins (nsp -nsp ) that comprise the replicase-transcriptase complex. spike (s) is encoded by orf , envelope (e) by orf , membrane (m) by orf , and nucleocapsid (n) by orf . an additional orfs encode "accessory" proteins: orf a, orf b, orf , orf a, orf b, orf , orf b, orf c, and orf . various studies have investigated the functions of the virally encoded proteins by performing interactome analysis in cells expressing individual viral proteins ( ) or by evaluating the proteomic or transcriptomic changes associated with viral infection ( ) ( ) ( ) ( ) ( ) ( ) . others have used computational approaches to investigate protein-protein interactions between sars-cov- viral proteins and host proteins ( ) . the interactome and proteome studies identified cellular processes affected by sars-cov- infection or specific viral proteins, notably innate immune signaling ( , , , ( ) ( ) ( ) , ubiquitin ligase activities ( , , , ( ) ( ) ( ) , and p mitogen-activated protein kinase (mapk) signaling ( , , , ( ) ( ) ( ) . the transcriptomic studies identified interferon signaling ( ) , cell death ( ) , interleukin (il- ), il- , and chemokine signaling ( ) . given the intense interest in catalytically active cov- proteins ( ) ( ) ( ) ( ) , we examined the less-studied group of orfs encoding accessory proteins, which are largely thought to maintain viral structural organization in replication organelles and within the viral particle ( , ) .
here, we showed that expression of only orf c is sufficient to alter cellular networks in a manner that resembles full sars-cov- virus infection. seven of the cov- proteins are orfs that lack catalytic activity and, in some cases, lack a known function ( ) . we tagged each of these with strep at the n terminus and expressed them individually in the lung cancer epithelial cell line a in the presence or absence of the proteasome inhibitor mg (fig. a ). the protein encoded by orf c was particularly unstable, with a profound increase in abundance evident in mg -treated compared to that in vehicle-treated a cells (fig. a) . orf c is present in previously characterized strains of sars-cov ( ), a conservation suggesting a function in coronavirus pathogenesis. phylogenetic analysis and alignment of the protein sequences showed that mutations are present in orf c among different coronavirus strains, with bat sars-like coronavirus orf as the closest ortholog sharing % sequence identity and only % identity with orf of sars-cov (fig. b, fig. s a ). tmhmm analysis ( ) of sars-cov- orf c predicted a transmembrane sequence in the c-terminal domain, a motif not present in the sars-cov- (or other human coronavirus) orf c sequence ( fig. b, fig. s b ). additionally, a single nucleotide mutation in sars-cov- orf c altered a termination codon, enabling the reading frame to extend by amino acids (fig. c ). we assessed potential functions of orf c by performing transcriptome, interactome, and proteome analysis of a lung cancer cells transfected with orf c tagged at the n terminus with copies of the strep tag ( ) . to map the orf c interactome, we conducted liquid chromatography tandem mass spectrometry (lc-ms/ms) of xstrep-tagged orf c (compared with control xstrep-tagged gfp) immunoprecipitated from a cells h after transfection. orf c interactome analysis revealed that most interacting proteins were classified as membrane proteins (fig.
d , table s ) according to gene ontology cellular component. because orf c is predicted to have a transmembrane domain, this was not surprising. however, we were surprised to find that the orf c interactome was distributed throughout the membrane-bound organelles (fig. d ), including > proteins in the protein biosynthesis and transport systems of the endoplasmic reticulum (er) and golgi, > proteins in the mitochondria, and > other membrane-related proteins. given the instability of orf c, we were not surprised to identify a group of membrane-associated proteins that function with the proteasome. comparison between orf c and orf , both expressed using the strep-tagged vector, confirmed that enrichment of membranal proteins as part of the interactome was selectively seen for orf c (fig. s c). we conducted both label free quantification (lfq) and tandem mass tag (tmt) mass spectrometry analysis to identify changes in the cellular proteome in a cells expressing orf c. we compared the proteomic changes associated with orf c expression in the presence or absence of proteasomal inhibition with mg , using dmso as the vehicle control for comparison in each set. principal component analysis (pca) revealed that orf c contributed to major variance in all data sets ( fig. s a, s b ). pairwise comparisons between orf c and control untransfected samples identified differentially expressed proteins in both the dmso and mg groups (fig. a) . in both the dmso and mg datasets, most changes induced by orf c were a reduction in protein abundance (downregulation) ( fig. a, table s ). downregulated proteins identified using both approaches consistently showed ~ % overlap, while no overlap was identified among the upregulated proteins. thus, to maximize the discovery of orf c dysregulated proteins, results from both technologies were combined.
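the comparison and pooling of lfq and tmt hit lists described above can be sketched in a few lines of python. this is only an illustration, not the authors' r code: the function names are hypothetical, the protein identifiers are placeholders, and the overlap definition (shared hits divided by the size of the smaller list) is an assumption, because the text does not state how the overlap percentage was computed.

```python
def overlap_fraction(hits_a, hits_b):
    """Shared hits divided by the size of the smaller list (assumed definition)."""
    a, b = set(hits_a), set(hits_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def combine_hits(lfq_hits, tmt_hits):
    """Union of both hit lists, tagging each protein with the method(s) that found it."""
    lfq, tmt = set(lfq_hits), set(tmt_hits)
    combined = {}
    for protein in sorted(lfq | tmt):
        if protein in lfq and protein in tmt:
            combined[protein] = "both"
        elif protein in lfq:
            combined[protein] = "lfq"
        else:
            combined[protein] = "tmt"
    return combined


# placeholder identifiers, not proteins from the study
lfq_down = ["P1", "P2", "P3", "P4"]
tmt_down = ["P3", "P4", "P5"]
print(overlap_fraction(lfq_down, tmt_down))
print(combine_hits(lfq_down, tmt_down))
```

keeping the method tag for each protein makes it possible to check later whether a pathway-level result is driven by hits seen with only one technology.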
including the differentially regulated proteins identified by both the tmt and lfq analysis revealed proteins were upregulated in common by orf c expression in the presence or absence of the proteasome inhibitor and proteins were downregulated in common (fig. b ). using the downregulated proteins and upregulated proteins identified for either the dmso or mg condition separately, we performed ingenuity pathway analysis (ipa) to assess signaling pathways deregulated in orf c-expressing cells. in both the dmso and mg condition, interferon (ifn) signaling exhibited the greatest difference, both in terms of the intensity of the downregulation and the number of proteins significantly associated with this pathway, in response to orf c (fig. c, table s ). other pathways affected by orf c and of particular importance to virulence were antigen presentation and innate immune response pathways, such as irf/cytosolic pattern recognition receptors. we further examined potential upstream regulators of these pathways using ingenuity pathway analysis for proteins that exhibited a change in abundance in the orf c-expressing cells. this analysis revealed that several components of the ifn machinery [interferons (ifnl , ifna), interferon responsive transcription factors (irf , irf ), and an interferon receptor (infar)] were reduced, consistent with the impaired ifn signaling, and an increase in mapk (also known as erk ) abundance (fig. c, table s ). to assess if there were notable differences in the intensity of the changes in protein abundance in response to proteasome inhibition, we calculated relative changes in protein abundance between control and orf c-expressing cells from both the dmso and mg conditions for proteins associated with ifn signaling or the ubiquitin proteasome (ubp) system and antigen presentation (fig. d ). 
the intensity of the changes was similar in the presence or absence of mg , suggesting that even small amounts of the unstable orf c are sufficient to induce cellular changes, including those that contribute to immune evasion. consistent with the ipa-based analysis (fig. b), ifn signaling components, including ifi , multiple ifit proteins, irf , isg , mx , psmb , and stat proteins, were downregulated in orf c-expressing cells in both the presence and absence of mg (fig. d, left). indicative of a decrease in antigen presentation capacity, multiple proteins involved in this process were decreased, including proteins involved in antigen loading and display [hla proteins, β m, and antigen transporters (tap and tap )] and proteins involved in ubp [ubiquitin-conjugating enzymes (ube i and ube l ), deubiquitination enzymes (usp and ups ), and proteasome components (psmb and psme proteins)] (fig. d, right). these changes in the proteome indicated that the expression of only orf c, even in the absence of proteasomal inhibition to stabilize this protein, is sufficient to elicit effective inhibition of ifn, immune recognition, and ubp components at the protein level. such a response suggested that orf c contributes to immune evasion of sars-cov- . we assessed transcriptional changes elicited by orf c expression in a cells using rna-seq analysis. pca showed that changes induced by orf c in both dmso- and mg -treated cells cluster in distinct experimental groups (fig. s c). in contrast to the proteomic results that revealed predominant downregulation of proteins following orf c expression, rna-seq analysis showed that a similar number of transcripts were increased or decreased in the presence or absence of mg (fig. a, table s ). additionally, the number of differentially regulated transcripts was higher than that for the differentially regulated proteins. using ipa, we identified the pathways significantly enriched in differentially regulated transcripts.
the same set of pathways were identified in the dmso and mg conditions, and similar to the proteomic results, most related to immune signaling (fig. b , upper, table s ). however, many of the specific pathways were different from those identified at proteomic level. at the transcriptional level, we detected the greatest effects on the complement system and several pathways involved in inflammatory signaling. thus, some components of antigen presentation and immune signaling pathways showed comparable changes at the protein and mrna levels; other changes elicited by orf c expression were unique to the transcriptional level, such as induction of il- signaling and p mapk signaling, or the protein level, such as impairment of ifn signaling. we analyzed the orf c-regulated transcripts for those encoding upstream regulators of the pathways altered at the transcriptional level by orf c. this analysis identified the classic immune modulators tumor necrosis factor (tnf), il- b, ifng, transforming growth factor β (tgfb), and nf-kb signaling components (fig. b , lower, table s ). we evaluated transcripts associated with the complement system or il- signaling in detail. in both the presence and absence of proteasome inhibition, complement system transcripts were mostly downregulated by orf c expression in a cells (fig. c, left) . for il- signaling, we found some differences between the mg and dmso conditions (fig. c, right) . for several transcripts the intensity of the upregulation was greater in the absence of proteasome inhibition (map k and map k , il and il r, socs and socs ); for others the presence of the proteasome inhibitor resulted in a greater reduction in transcript abundance (il b, tnfaip , cd , il r ). thus, these results suggested that orf c had a dose-dependent effect on some transcripts. we combined the results of the proteomic and transcriptomic (fig. 
a ), which revealed a small set of genes commonly upregulated or downregulated by orf c at both the transcription and protein levels (fig. b ). we performed ipa canonical pathway analysis and found commonly altered pathways at both the transcript and protein levels (fig. c , table s , s ). the direction of regulation (increased or decreased activity) was consistent between the transcripts and proteins. however, the number of components significantly enriched in most of the pathways differed between the protein and transcript levels. we compared our findings at the transcriptome, proteome, and interactome levels with those reported by stukalov et al. ( ) for proteins with altered ubiquitination (ubiquitinome) in response to sars-cov- infection of a cells. within the top enriched ipa canonical pathways, we noticed enrichment across all data sets for protein ubiquitination, sirtuin signaling, phagosome maturation, tight junction signaling, and caveolar-mediated endocytosis (fig. d , table s ). seven of the were common between the ubiquitinome data ( ) and our proteomic analyses by lfq or tmt mass spectrometry. we also compared the pathway enrichment across our transcriptomic data with those from blanco-melo et al. ( ), reporting transcriptomic changes upon sars-cov- infection in human primary epithelial cells or ace-expressing a cells, and with those from stukalov et al. ( ), reporting transcriptomic changes in ace -expressing a cells and hours after infection. in the presence or absence of the proteasome inhibitor, we observed substantial overlap ( - %) in the orf c-downregulated proteins ( fig. a) and - % overlap in orf c-upregulated transcripts and - % in the downregulated transcripts (fig. a). thus, despite orf c increasing in mg -treated a cells, the persistent changes in mg -treated cells suggested that even a low amount of orf c is sufficient to elicit its cellular effects.
however, some transcripts ( ) and proteins ( ) were downregulated by orf c in cells not treated with mg and were upregulated in cells treated with mg ( fig. a ). pathway analysis on the transcripts and proteins showed discordant regulation in dmso- or mg -treated cells in response to orf c. the most pronounced among all three datasets were components of the ubp and the unfolded protein response (upr) (fig. b, table s ). changes in events or pathways associated with the cell cycle were not surprising given the critical role ubp plays in their regulation. we thus hypothesized that both the ubp and upr were involved in degradation of orf c. finding that upr signaling was also reversed upon mg treatment ( fig. b) is consistent with the role of upr signaling in degradation of orf c. to directly assess the importance of ubp and upr components in orf c instability, we performed an sirna-based screen targeting over genes that encode components of both machineries in a cells stably expressing strep-tagged orf c (fig. s ). the top hits independently validated as blocking orf c degradation were sirnas targeting vcp [also known as p , an atpase involved in export of unfolded proteins from the er for er-associated degradation (erad) and in er to golgi transport] ( , ) , the proteasomal subunit psmd , and the proteasome maturation factor pomp ( ) , which has also been implicated in erad ( ) and in ifn-induced reorganization of proteasomes into immunoproteasomes ( ) (fig. c , table ). to assess if interfering with erad affected the cellular effects induced by orf c, we compared the transcript abundance for genes (ifngr , igs , irf , socs , psmb , tap ) that were downregulated by expression of orf c in dmso-treated cells with their abundance in cells treated with the vcp inhibitor mns- , the heat shock protein (hsp ) inhibitor geldanamycin, or the proteasome inhibitor bortezomib.
although the hsp inhibitor and the proteasome inhibitor increased transcript abundance for some of the genes tested, vcp inhibition was the most consistently effective at enabling expression of each of these transcripts in the orf c-expressing cells (fig. d ). these observations suggested that the ability of orf c to attenuate key cellular signaling involved in antiviral responses, including antigen presentation, immune, and ifn pathways, requires the activity of vcp. the key to our ability to control spread of the sars-cov- virus is to understand its mechanism of action and how the concerted action of its encoded proteins subverts cellular regulatory networks. one can divide viral "success" into two key phases: infection, which is the ability to enter a given cell type, and multiplication, which enables continuous infection through viral replication and packaging and exploits host cell machineries ( ) . a third aspect of viral success is evasion from immune clearance. disruption of either the infection or replication phase should effectively inhibit the sars-cov- life cycle. accordingly, many efforts focus on neutralizing interaction of the viral s protein with ace ( , ) . other efforts strive to interfere with the viral life cycle after it invades target cells, and many focus on catalytically active proteins encoded by the sars-cov- genome ( ). here, we analyzed one small, unstable sars-cov- protein, orf c, in the context of an epithelial lung cancer cell line. limited overlap between published studies of the orf c interactome can be attributed to their use of different cell systems (hek compared with the a lung cancer cells used here), as well as the use of different filtering criteria ( ) . however, many of the cellular changes elicited by orf c in our study also occur following infection with full replicative sars-cov- virus ( ) .
those phenotypes included changes in ifn and other cytokine signaling, immune recognition (including antigen presentation; dendritic cell, t cell, and acute immune responses; and pattern recognition), cell cycle, and the complement system, all of which were downregulated by orf c. additionally, as in cells infected with the virus, il- , il- , and p mapk signaling pathways were upregulated in cells expressing orf c. the primary change identified in our analysis was deregulation of the ifn system, coupled with changes in cytokines associated with tnf and stat signaling and factors implicated in innate immunity. in addition to mediating an antiviral response, aberrant ifn signaling is also critical for numerous pathological indications linked to covid- ( ). thus, we concluded that sars-cov- orf c elicits pathologies not seen with previously characterized coronavirus prototypes, primarily through effective modulation of ifn signaling. our findings suggested that orf c enables cells to escape from immune surveillance by reducing hla abundance and antigen presentation, while also slowing cell replication, which could viral replication of infected cells. strikingly, orf c is predicted to have a transmembrane domain, and we found that the orf c interactome was mostly comprised of membrane-associated proteins in multiple organelles, including er, golgi, mitochondria, cell surface membrane, and peroxisomes. indeed, many of the cellular changes that we observed following orf c expression are associated with membrane proteins or pathways mediated by proteins that associate with the membranes of various cellular compartments. importantly, sars-cov- orf c is the first human coronavirus orf c protein that has acquired this putative transmembrane sequence.
mutations have been acquired along the course of evolution of orf c, although ~ % of the sars-cov- orf c sequence is identical to the ortholog in other coronaviruses; greater similarity was identified with the bat sars-cov- sequence. the membranal anchoring capability identified in sars-cov- orf c is a novel feature that may mediate the effect on ifn signaling, antigen presentation, and immune evasion phenotypes, characteristics that make sars-cov- much more virulent and pathogenic than other coronaviruses. notably, . - . % of patients were found to possess a mutation that is expected to impair the transmembrane domain of sars-cov- orf c ( ) ; awaiting future assessment of clinical outcome, our data would predict a better clinical outcome, distinguishing these patients from those harboring the transmembrane domain. correspondingly, the interactome for the less virulent and pathogenic sars-cov- orf c ( ) did not overlap with that for sars-cov- orf c. a notable signature that we identified is the upregulation of histone and histone deacetylase-related factors, which suggested that histone modification may underlie the transcriptional repression. the increased transcription of ap family members (fosb, fosl , creb, and atf ), which participate in the cellular stress response, may reflect the response to stress imposed by orf c, which, in turn, can limit immune-related signaling identified in our study. another remarkable signature of orf c expression in a cells was the association with ubp components. together with the observations on cellular immune pathways, this association with the ubp suggested that orf c induces changes in ubp components that alter the stability of cellular proteins implicated in cytokine signaling, antigen presentation, innate immunity, and the cell cycle. additionally, we identified upr components as important for orf c instability, suggesting that this protein is misfolded or at least recognized as a misfolded protein by the host cell.
in this scenario, we propose that misfolded orf c engages upr (through vcp) and the ubp, which clears this protein. by engaging the ubp, orf c promotes enhanced proteasome activity, as suggested by our proteome analysis. our analysis revealed that interfering with vcp activity blunted the transcriptional repressive effects of sars-cov- orf c on immune system components, such as irf , ifngr , isg , socs and tap . proteasome inhibition with bortezomib was also effective, although not as consistently effective as mns- , the vcp inhibitor. these findings suggested that inhibition of vcp or the proteasome, which has inhibitors currently in clinical trials for cancer ( , ), may be considered among therapeutic measures to fight sars-cov- virulence and pathologies. however, we cannot exclude the possibility that orf c stability and degradation mechanisms differ based on cell type or the activity of other viral proteins in infected cells. another potential therapeutic opportunity involves targeting the membrane association of orf c, because this is a unique feature of the protein in the sars-cov- coronavirus. thus, identifying small molecules that could interfere with orf c localization to the membrane could limit orf c function, impede the ability of the virus to evade the immune response, and reduce viral replication. given that orf c is expected to affect immune evasion, virulence and pathogenesis, additional studies should assess the consequences of orf c inhibition in vivo, using primates and possibly mouse models where sars-cov- has been shown to impact ifn signaling and immune response ( ). secondary antibodies were used at : . immunoprecipitation of streptavidin-tagged cov- orfs was performed as previously described ( ) . briefly, frozen cell pellets were thawed on ice for - minutes and suspended in ml lysis buffer with mm tris-hcl, ph . at °c, mm nacl, mm edta and supplemented with .
% nonidet p- substitute, complete mini edta-free protease and phosstop phosphatase inhibitor cocktails (roche). samples were centrifuged minutes at °c at , g. protein quantification was performed using pierce's bca quantification kit as per the manufacturer's indications. supernatants ( mg protein) were incubated h at °c with magstrep "type " beads ( μl; iba lifesciences) that had been previously equilibrated twice with ml wash buffer (ip buffer supplemented with . % np ). beads were washed five times with ml wash buffer and then five times with ml ammonium bicarbonate mm. raw fastq files were processed using cutadapt v . ( ) figure d , figure a and b were selected based on bh-corrected p < . without fold change cutoff. differentially expressed proteins in figure d , figure a and b were selected based on p < . without fold change cutoff. protein sequences similar to sars-cov- orf c were retrieved through ncbi blastp using the nr database ( ). sequence alignment was performed using clustalo ( ). the phylogenetic tree was built using the phyml algorithm by times bootstrap, and visualized using seaview ( ) and geneious version . . (san diego, ca). transmembrane domain prediction was performed using tmhmm web server v . ( ) . a transmembrane domain was predicted based on a tmhmm posterior probability of more than . . statistical test results of rna-seq, proteomics and interactome data are provided in supplementary tables. analyses of omics data in this study were performed using custom r scripts. statistical analysis of proteomics data sets was performed using the msstats (label-free data) and msstatstmt (tmt data) bioconductor packages. differential expression analysis of rna-seq data was performed using the deseq bioconductor package, following a negative binomial distribution and wald test. pathway enrichment and upstream regulator analyses were performed using ipa, following fisher's exact test and z-score calculation considering directional changes in the ipa database.
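the benjamini-hochberg (bh) correction used for selecting differentially expressed features can be sketched as follows. this is a minimal pure-python illustration of the standard bh step-up adjustment, not the authors' r pipeline; the function names and the alpha value in the usage example are hypothetical, since the actual thresholds are not recoverable from this text.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(names, pvalues, alpha):
    """Keep the features whose BH-adjusted p-value is below alpha."""
    adjusted = bh_adjust(pvalues)
    return [name for name, q in zip(names, adjusted) if q < alpha]


# toy p-values and a hypothetical alpha, for illustration only
names = ["gene_a", "gene_b", "gene_c", "gene_d"]
pvals = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(pvals))
print(select_significant(names, pvals, alpha=0.03))
```

as in the text, no fold change cutoff is applied here; the selection depends only on the adjusted p-value.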
ana dominguez andres ^, yongmei feng ^, alexandre rosa campos ^, jun yin ^, chih-cheng beads were resuspended in m urea, mm ammonium bicarbonate, and cysteine disulfide bonds were reduced with mm tris ( -carboxyethyl) phosphine (tcep) at °c for min. high confidence interacting proteins were selected using the following filtering criteria: log fc > . ( x) and a p-value < . (to include the p-value of proteins detected in at least orf c pulldown replicates but not detected in the negative controls). we also considered the 'crapomescore' < . , which is the fraction of single affinity purification experiments a given protein-interacting candidate receives in the crapome database (crapome.org). a score of means the candidate is identified in all experiments in that database. cells were lysed in uab buffer ( m urea, mm ammonium bicarbonate (abc) and benzonase u/ ml) with vigorous shaking ( hz for min at room temperature using a retsch mm instrument). lysates were centrifuged at , xg for minutes to remove cellular debris, and protein concentration in supernatants was determined using bicinchoninic acid (bca) protein assay (thermo scientific fragmented precursors were detected in the ion trap as rapid scan mode with automatic gain control target set to x and a maximum injection time set at ms. the dynamic exclusion was set to seconds with a ppm mass tolerance around the precursor. for processing label-free lc-ms/ms data, all raw files were processed with maxquant (version . . . ) using the integrated andromeda search engine against a target/decoy version of the curated human uniprot proteome without isoforms (downloaded in january of ) and the gpm crap sequences (commonly known protein contaminants). first search peptide tolerance was set to ppm, and main search peptide tolerance was set to . ppm. fragment mass tolerance was set to ppm. trypsin was set as the enzyme in specific mode, and up to two missed cleavages was allowed. 
carbamidomethylation of cysteine was specified as fixed modification and protein n-terminal acetylation and oxidation of methionine were considered variable modifications. in addition, the phosphopeptide-enriched samples were also searched with phosphorylation of serine, threonine or tyrosine considered as variable modification. the target-decoy-based false discovery rate (fdr) filter for spectrum and protein identification was set to %. statistical analysis of label-free proteomics data was carried out using in-house r script (version . . , -bit), including r bioconductor packages. first, peptide feature intensities (maxquant evidence table) were log -transformed and normalized (loess normalization) across samples to account for systematic errors. then all non-razor peptide sequences were removed from the list. protein-level quantification and statistical testing for differential abundance were performed using msstats bioconductor package. cells were imaged with an ic high-content screening system (vala sciences) using a x objective to visualize strep-orf c proteins (alexa ) and nuclei (dapi). four images were obtained from different fields in each well for -well plates. images were analyzed with acapella high-content imaging and analysis software for valid cell numbers per field and to determine average alexa intensity per cell. a cells expressing sars-cov- strep-orf c were treated with dmso or mg ( um) served as negative and positive imaging controls, respectively. plate-to-plate variability was normalized using a control-based method; associated control samples were aggregated, and the mean and variance across wells were determined. the alexa mean intensity for all wells with sirna knockdown was normalized using unique non-targeting sirnas included in each plate as reference data points. the top scoring hits were obtained using a threshold of > . -fold increase in average intensity from duplicates (p-value < . ). 
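The control-based plate normalization and fold-increase hit call described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the data layout (one dictionary of per-siRNA mean intensities per plate), and the threshold value passed in the usage note are hypothetical, and the p-value filter mentioned in the text is deliberately omitted.

```python
from statistics import mean


def normalize_plate(intensities, control_ids):
    """Divide each knockdown well by the mean of the plate's non-targeting controls.

    intensities: {sirna_id: mean intensity per cell} for one plate (hypothetical layout).
    control_ids: set of ids of the non-targeting sirnas on that plate.
    Returns {sirna_id: fold change relative to controls} for non-control wells only.
    """
    controls = [v for k, v in intensities.items() if k in control_ids]
    if not controls:
        raise ValueError("plate has no control wells")
    reference = mean(controls)
    return {k: v / reference for k, v in intensities.items() if k not in control_ids}


def call_hits(replicates, threshold):
    """Average normalized fold changes across replicate plates and keep those above threshold.

    replicates: list of dicts returned by normalize_plate, one per replicate plate.
    threshold: fold-increase cutoff (hypothetical parameter; the source value is not reproduced here).
    """
    if not replicates:
        raise ValueError("need at least one replicate plate")
    shared = set.intersection(*(set(r) for r in replicates))
    averaged = {k: mean(r[k] for r in replicates) for k in shared}
    return sorted(k for k, v in averaged.items() if v > threshold)
```

For example, normalizing two replicate plates separately and then calling `call_hits([plate_a_norm, plate_b_norm], threshold=1.5)` returns the knockdowns whose average control-relative intensity exceeds the chosen cutoff; normalizing each plate against its own controls is what removes plate-to-plate offsets before replicates are averaged.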
ten of the sirna pools were selected for confirmation in a secondary deconvolution screen. for that screen, quantification data were converted to a z-score, and the average z-score from data in triplicate plates was determined. genes were defined as confirmed screen hits if they had or more individual positive sirna score (cut-off of > sd). table s . interactome and proteome data analysis, pathway enrichment and upstream regulators analyses table s . transcriptome data analysis, pathway enrichment and upstream regulators analyses table s . canonical pathway comparison of different omics technologies and public data sets table s . canonical pathway analysis of mg reversed genes and proteins a novel coronavirus outbreak of global health concern a new coronavirus associated with human respiratory disease in china a novel coronavirus from patients with pneumonia in china sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor cell entry mechanisms of sars-cov- structural and functional basis of sars-cov- entry by using human ace a pneumonia outbreak associated with a new coronavirus of probable bat origin single-cell rna expression profiling of ace , the receptor of sars-cov- sars-cov- : a comprehensive review from pathogenicity of the virus to clinical consequences genome composition and divergence of the novel coronavirus ( -ncov) originating in china systematic comparison of two animal-to-human transmitted human coronaviruses: sars-cov- and sars-cov characteristics of registered studies for coronavirus disease (covid- ): a systematic review an alphavirus-derived replicon rna vaccine induces sars-cov- neutralizing antibody and t cell responses in mice and nonhuman primates structure, function, and antigenicity of the sars-cov- spike glycoprotein a vaccine targeting the rbd of the s protein of sars-cov- induces protective immunity potently neutralizing and protective human antibodies against sars-cov- remdesivir for the 
treatment of covid- -preliminary report structural basis for translational shutdown and immune evasion by the nsp protein of sars-cov- a sars-cov- protein interaction map reveals targets for drug repurposing structural genomics of sars-cov- indicates evolutionary conserved functional regions of viral proteins the proteins of severe acute respiratory syndrome coronavirus- (sars cov- or n-cov ), the cause of covid- imbalanced host response to sars-cov- drives development of covid- proteomics of sars-cov- -infected host cells reveals therapy targets bulk and single-cell gene expression profiling of sars-cov- infected human cell lines identifies molecular targets for therapeutic intervention. biorxiv transcriptional landscape of sars-cov- infection dismantles pathogenic pathways activated by the virus, proposes unique sex-specific differences and predicts tailored therapeutic strategies a dynamic immune response shapes covid- progression transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid- patients interactome of sars-cov- / ncov modulated host proteins with computationally predicted ppis the global phosphorylation landscape of sars-cov- infection covid- : viral-host interactome analyzed by network based-approach model to study pathogenesis of sars-cov- infection drug design targeting the main protease, the achilles' heel of coronaviruses crystal structure of sars-cov- main protease provides a basis for design of improved alpha-ketoamide inhibitors structure of the rna-dependent rna polymerase from covid- virus structural and evolutionary analysis indicate that the sars-cov- mpro is a challenging target for small-molecule inhibitor design characterization of accessory genes in coronavirus genomes. 
biorxiv sars coronavirus accessory proteins insights into sars-cov- genome, structure, evolution, pathogenesis and therapies: structural genomics approach predicting transmembrane protein topology with a hidden markov model: application to complete genomes multi-level proteomics reveals host-perturbation strategies of sars-cov- and sars-cov. biorxiv ubiquilin and p /vcp bind erasin, forming a complex involved in erad vcp/p -mediated unfolding as a principle in protein homeostasis and signaling the proteasome maturation protein pomp facilitates major steps of s proteasome formation at the endoplasmic reticulum sirna silencing of proteasome maturation protein (pomp) activates the unfolded protein response and constitutes a model for klick genodermatosis ifn-gamma-induced immune adaptation of the proteasome system is an accelerated and transient response covid- infection: origin, transmission, and characteristics of human coronaviruses sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues competing interests: zar is a co-founder and serves as scientific advisor to pangea therapeutics. all other authors declare no competing interests.data and materials availability: all datasets will be deposited in publicly available data sets prior to publication; all reagents and study protocols are available by requests from the corresponding authors. key: cord- -fs dn ir authors: kim, so young; jin, weihua; sood, amika; montgomery, david w.; grant, oliver c.; fuster, mark m.; fu, li; dordick, jonathan s.; woods, robert j.; zhang, fuming; linhardt, robert j. title: glycosaminoglycan binding motif at s /s proteolytic cleavage site on spike glycoprotein may facilitate novel coronavirus (sars-cov- ) host cell entry date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: fs dn ir severe acute respiratory syndrome-related coronavirus (sars-cov- ) has resulted in a pandemic and continues to spread around the globe at an unprecedented rate. to date, no effective therapeutic is available to fight its associated disease, covid- . our discovery of a novel insertion of glycosaminoglycan (gag)-binding motif at s /s proteolytic cleavage site ( - (prrars)) and two other gag-binding-like motifs within sars-cov- spike glycoprotein (sgp) led us to hypothesize that host cell surface gags might be involved in host cell entry of sars-cov- . using a surface plasmon resonance direct binding assay, we found that both monomeric and trimeric sars-cov- spike more tightly bind to immobilized heparin (kd = pm and pm, respectively) than the sars-cov and mers-cov sgps ( nm and nm, respectively). in competitive binding studies, the ic of heparin, tri-sulfated non-anticoagulant heparan sulfate, and non-anticoagulant low molecular weight heparin against sars-cov- sgp binding to immobilized heparin were . μm, . μm, and . μm, respectively. finally, unbiased computational ligand docking indicates that heparan sulfate interacts with the gag-binding motif at the s /s site on each monomer interface in the trimeric sars-cov- sgp, and at another site ( - (yrlfrks)) when the receptor-binding domain is in an open conformation. our study augments our knowledge in sars-cov- pathogenesis and advances carbohydrate-based covid- therapeutic development. in march , the world health organization declared severe acute respiratory syndrome-related coronavirus (sars-cov- ) a pandemic less than three months after its initial emergence in wuhan, china [ , ] . sars-cov- is a zoonotic betacoronavirus transmitted through person-person contact through airborne and fecal-oral routes, and has caused over , confirmed coronavirus disease (covid- ) cases and , associated deaths worldwide [ ] [ ] [ ] [ ] . 
while there is limited understanding of sars-cov- pathogenesis, extensive studies have been performed on how its closely related cousins, sars-cov and mers-cov (middle east respiratory syndrome-related coronavirus), invade host cell. upon initially contacting the surface of a host cell, sars-cov and mers-cov exploit host cell proteases to prime their surface spike glycoproteins (sgps) for fusion activation, which is achieved by receptor binding, low ph, or both [ , ] . the receptor binding domain (rbd) resides within subunit (s ) while subunit (s ) facilitates viral-host cell membrane fusion [ ] . activated sgp undergoes a conformational change followed by an initiated fusion reaction with the host cell membrane [ ] . endocytosed virions are further processed by the endosomal protease cathepsin l in the late endosome [ , ] . both mers-cov and sars-cov require proteolytic cleavage at their s ' site, but not at their s -s junction, for successful membrane fusion and host cell entry [ , ] . additionally, receptors involved in fusion activation of sars-cov and mers-cov include heparan sulfate (hs) and angiotensinconverting enzyme (ace ), and dipeptidyl peptidase (dpp ), respectively [ ] [ ] [ ] . sars-cov and other pathogens arrive at a host cell surface by clinging, through their surface proteins, to linear, sulfated polysaccharides called glycosaminoglycans (gags) [ ] [ ] [ ] . the repeating disaccharide units of gags, comprised of a hexosamine and a uronic acid or a galactose residue, are often sulfated (s fig) [ ] . gags are generally found covalently linked to core proteins as proteoglycans (pgs) and reside inside the cell, at the cell surface, and in the extracellular matrix (ecm) [ ] . gags facilitate various biological processes, including cellular signaling, pathogenesis, and immunity, and possess diverse therapeutic applications [ ] . 
for example, an fda approved anticoagulant heparin (hp) is a secretory gag released from granules of mast cells during infection [ , ] . some gag binding proteins can be identified by amino acid sequences known as cardin-weintraub motifs corresponding to 'xbbxbx' and 'xbbbxxbx', where x is a hydropathic residue and b is a basic residue, such as arginine and lysine, responsible for interacting with the sulfate groups present in gags [ , ] . examination of the sars-cov- sgp sequence revealed that the gag-binding motif resides within s -s proteolytic cleavage motif (furin cleavage motif bbxbb) that is not present in sars-cov or mers-cov sgps (fig , s fig, s fig) [ ] . additionally, we discovered gag-binding-like motifs within rbd and s ' proteolytic cleavage site in sars-cov- sgp (fig , s fig, s fig) . this discovery prompted us to hypothesize that gags may contribute to sars-cov- fusion activation and host cell entry as a novel mechanism through sgp binding. we performed surface plasmon resonance (spr)-based binding assays to determine binding kinetics of the interactions between various gags and sars-cov- sgp in comparison with sars-cov, and mers-cov sgp to address this question. lastly, we performed blind docking on the trimeric sars-cov- sgp model to objectively identify the preferred binding gag-binding sites on the sgp. previous reports showed that various cov bind gags through their sgps to invade host cells [ ] . in the current study, we utilized spr to measure the binding kinetics and interaction affinity of monomeric and trimeric sars-cov- , monomeric sars-cov and mers-cov with sgp-hp using a sensor chip with immobilized hp. sensorgrams of cov sgp-hp interactions are shown in fig . the sensorgrams were fit globally to obtain association rate constant (ka), dissociation rate constant (kd) and equilibrium dissociation constant (kd) ( table ) using the biaevaluation software and assuming a : langmuir model. 
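The relationship between the three fitted quantities can be made concrete with a short sketch of the simple one-to-one Langmuir binding model. This is an illustrative reconstruction under standard textbook assumptions, not the fitting routine of the instrument software; the function names and every numeric value used with them are hypothetical.

```python
import math


def equilibrium_dissociation_constant(ka, kd):
    """Equilibrium dissociation constant K_D (M) from ka (1/(M*s)) and kd (1/s)."""
    if ka <= 0:
        raise ValueError("ka must be positive")
    return kd / ka


def association_response(t, conc, ka, kd, rmax):
    """Response (RU) at time t (s) while analyte at concentration conc (M) is injected.

    Solves dR/dt = ka*conc*(rmax - R) - kd*R with R(0) = 0.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # plateau reached at long injection times
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, kd):
    """Response (RU) at time t (s) after the injection ends, starting from r0."""
    return r0 * math.exp(-kd * t)
```

In this model a slow dissociation rate constant directly lowers K_D, and injecting analyte at a concentration equal to K_D gives a plateau at half of the maximal response, which is a convenient sanity check on fitted parameters.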
sars-cov- and mers-cov sgps exhibited markedly low dissociation rate constants (kd ~ - /s), suggesting excellent binding strength. the hp-binding properties of monomeric sars-cov- sgp were comparable to those of the trimeric form (kd of monomer and trimer were pm and pm, respectively). in comparison, the previously known hp-binding sars-cov sgp showed nearly -fold lower affinity, nm. the extremely high binding affinity of sars-cov- sgp to hp was supported by the chip surface regeneration conditions. the immobilized hp surface could only be regenerated using a harsh regeneration reagent, . % sds, instead of the standard m nacl solution used for removing hp-binding proteins. one possible reason for the extremely high affinity of the sars-cov- sgp monomer and trimer for immobilized heparin is that the high density of surface-bound ligand might promote polyvalent interactions. the difference in binding kinetics and affinity of the cov sgps for hp may also be due in part to differences in their protein sequences. based on amino acid alignment analysis using the basic local alignment search tool (blast), sars-cov and sars-cov- sgps share % similarity. the association rate constant (ka) for mers-cov sgp ( (± ) /m - s ) was the lowest, followed by those of monomeric and trimeric sars-cov- sgp ( . × (± . ) m - s - and . × (± ) m - s - , respectively) ( table ) . sars-cov sgp had the highest ka, which was . × (± ) m - s - . the differences in ka values suggest that, in addition to differing in binding strength, each sgp binds hp through a different mechanism. solution/surface competition experiments were performed by spr to examine the effect of the saccharide chain length of hp on the sars-cov- sgp-hp interaction. hp-derived oligosaccharides of different lengths, from tetrasaccharide (dp ) to octadecasaccharide (dp ), were used in these competition studies. the same concentration ( nm) of each hp oligosaccharide was mixed into the sars-cov- sgp ( nm)/hp interaction solution.
negligible competition was observed (s fig) when nm of oligosaccharides (from dp to dp ) were present in the protein solution, suggesting that the sars-cov- sgp-hp interaction is chain-length dependent and that the sgp prefers to bind full-chain (~dp ) hp. competition levels measured by spr for chemically modified hp derivatives are shown in using a modified version of autodock vina tuned for use with carbohydrates (vina-carb) [ , ] , we performed blind docking on the trimeric sars-cov- sgp model to objectively identify the preferred gag-binding sites on the sgp surface. the sgp contains three putative gag-binding motifs with the following sequences: - (yrlfrks), - (prrars), and - (skpskrs), which we define as sites , , and , respectively (fig , s fig, s fig) . an hs hexasaccharide fragment (glca( s)-glcns( s)) binds site in each monomer chain in the trimeric sgp ( fig c, s fig) . the docking results also indicate that hs may bind to site when the apex of the s monomer is in an open conformation, as this makes the basic residues more accessible for ligand binding. the site residues are less accessible for gag binding when the domain is in a closed conformation (fig d) . the electrostatic potential surface representation of the trimeric sgp confirms that the gag-binding poses generally prefer regions of positive charge, as expected, and illustrates that basic residues within site are not exposed for binding to hs on any of the chains (fig a) . finally, our blind docking analysis reveals that a longer hs polymer may span an inter-domain channel that contains site . the original sars-cov and numerous other pathogens exploit host cell surface gags during the initial step of host cell entry [ ] [ ] [ ] .
based on our discovery of gag-binding and gag-binding-like motifs at site (within the rbd, y -s ), site (at the proteolytic cleavage site at the s /s junction, p -s ), and site (at the s ' proteolytic cleavage site, s -s ), we hypothesized that sars-cov- may also interact with host cell surface gags through its sgps to invade host cells (fig , s fig, s fig) . the predominant gag in normal human lung is hs, followed by cs [ ] , and it is noteworthy that lung tissue is rich in mast cells and has been a source of commercial hp [ ] . using unbiased docking, we found that tris hs hexasaccharide d ). docking results indicated that the hs hexasaccharides could span an inter-domain channel that includes site , suggesting a mechanism for the binding of a longer hs sequence ( fig c) . next, we experimentally determined binding kinetics for the interactions between hp (rich ( - %) in tris domains) and monomeric sars-cov- , trimeric sars-cov- , monomeric sars-cov, and monomeric mers-cov sgps using spr binding assays (fig and table ). gag-protein interactions are mainly electrostatically driven [ ] ; thus, hs-binding proteins generally bind hp because of its higher degree of sulfation [ ] . we discovered that hp binds both monomeric and trimeric sars-cov- sgp with remarkable affinity (kd = pm and pm, respectively) (fig and table ). this binding was unexpectedly tight for a gag-protein interaction, as even one of the prototypical hp-binding proteins, fibroblast growth factor (fgf ), has a kd of nm [ ] . in comparison, sars-cov and mers-cov sgps also bind hp, but much more weakly, with binding strengths of kd = nm and nm, respectively (fig and table ). while hs facilitates sars-cov host cell entry and is an essential host cell surface receptor, neither its involvement in mers-cov host cell entry nor the binding kinetics for sars-cov and mers-cov sgps had previously been reported [ ] .
after discovering the high binding affinity between hp and sars-cov- sgp, we next found that the degree and position of sulfation within hp was important for its successful binding to monomeric sars-cov- sgp (figs and ) . the low ic of these gags suggest that the fda approved anticoagulant hp, or its nonanticoagulant derivatives, might have therapeutic potential against sars-cov- infection as competitive inhibitors. the location of proposed gag-binding sites is also of interest. unlike sars-cov and mers-cov sgps, sars-cov- sgp has a novel insert in the amino acid sequence ( - (prrars)) that fully follows gag-binding cardin-weintraub motif (xbbxbx) and a furin-cleavage motif (bbxbb) at the s /s junction (fig ) . this site was also shown to be a preferred gag-binding motif by our unbiased docking study (fig ) . proteolytic cleavage at s /s is not required for successful viral-host cellular membrane fusion in sars-cov and mers-cov sgps [ , ] . proteolytic cleavage primes the sgp for fusion activation and may additionally influence cell-cell fusion, host cell entry, and/or the infectivity of the virus [ , ] . possess both gag-binding and furin cleavage motifs at their s /s junction in their sgps [ ] . in the cases of mhv and ibv spike proteins, a single amino acid mutation near the gag-binding and furin cleavage motifs resulted from a cell culture adaptation, and determines whether a virion binds gags or exploits host cell surface protease, but not both [ ] . while not within the cov family, human immunodeficiency virus type (hiv- ) requires hs-binding to achieve optimal furin processing because hs binding allows selective exposure of furin cleavage site [ ] . while the idea of repurposing hp as covid- therapeutic sounds appealing, further questions, including in vitro relevance of gags as host cell surface receptors, proteolytic processing of sgps at s /s junction, and their relationship in host cell entry and infectivity, must first be carefully evaluated. 
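The motif matching discussed above (a basic residue at each "b" position and a non-basic residue at each "x" position) can be sketched with a small scanner. This is an illustrative sketch, not the authors' analysis: the function names are hypothetical, "b" is taken to mean arginine or lysine as named in the text, and treating "x" as any non-basic character is a simplifying assumption.

```python
import re

BASIC = "RK"  # arginine and lysine, the basic residues named in the text


def motif_to_regex(motif):
    """Translate a motif such as 'XBBXBX' into a compiled overlapping-match regex."""
    parts = []
    for symbol in motif.upper():
        if symbol == "B":
            parts.append(f"[{BASIC}]")
        elif symbol == "X":
            parts.append(f"[^{BASIC}]")  # assumption: X is any non-basic residue
        else:
            raise ValueError(f"unsupported motif symbol: {symbol!r}")
    # A lookahead with an inner capture group reports overlapping occurrences.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motifs(sequence, motif):
    """Return (zero-based start, matched residues) for every occurrence of motif."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

For instance, `find_motifs("AAPRRARSAA", "XBBXBX")` locates the prrars stretch at offset 2; the flanking alanines in that string are placeholders chosen for illustration, not residues from the spike sequence.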
based on our findings, we propose a model of how gags may facilitate host cell entry of sars-cov- (fig ) . first, virions land on the epithelial surface in the airway by binding to hs through their sgps (fig a) . host cell surface proteoglycans utilize their long hs chains to securely wrap around the trimeric sgp ( fig a) . during this step, heavily sulfated hs chains span the inter-domain channel containing gag-binding site on each monomer in the trimeric sgp and bind site within the rbd in an open conformation (fig ) . host cell surface and extracellular proteases, such as furin and transmembrane serine protease (tmprss ), may process site (s /s junction) and/or (s '), and gag chains come off from site upon cleavage (fig b) . hs and ace binding to the more readily accessible rbd containing site may drive a conformational change of the sgp and activate viral-cellular membrane fusion [ ] . finally, sgp on the endocytosed virion may utilize an endosomal host cell protease, such as cathepsin l, to further execute viral-cellular membrane fusion (fig c) . in conclusion, in the current work we have discovered that gags can facilitate host cell entry of sars-cov- by binding to sgp. spr studies demonstrate that both monomeric and trimeric sars-cov- sgp bind hp with remarkably high affinity and that the sgp prefers long, heavily sulfated (tris rich) structures. additionally, we reported low ic of hp and derivatives against hp and sars-cov- sgp interactions, suggesting therapeutic potential of hp and its derivatives as competitive inhibitors against covid- . lastly, unbiased computational ligand docking indicated that a tris hs oligosaccharide preferentially interacts with gag-binding motifs at the s /s junction and within the receptor binding domain, and hinted at a mechanism of binding. this study adds to our current understanding of sars-cov- pathogenesis and serves as a foundation for designing glycoconjugate vaccines and therapeutics to successfully contain and eliminate covid- . [ ] .
the -o-desulfated hp derivative, -des hp, mw = kda, was generously provided by prof. lianchun wang (university of south florida). nonanticoagulant low molecular weight hp (nach) was synthesized from dalteparin, a nitrous acid depolymerization product of porcine intestinal hp, followed by periodate oxidation as described in our previous work [ ] . tris hs (ns s s) was synthesized from n-sulfo heparosan with subsequent modification with c -epimerase and -o-and -o-sulfotransferases ( ost and ost / ost ) [ ] . hp oligosaccharides included tetrasaccharide (dp ), hexasaccharide (dp ), octasaccharide (dp ), decasaccharide (dp ), dodecasaccharide (dp ), tetradecasaccharide (dp ), hexadecasaccharide (dp ) and octadecasaccharide (dp ) and were prepared from porcine intestinal hp controlled partial heparin lyase treatment followed by size fractionation. the chemical structures of the gags are shown in s fig. sensor sa chips were from ge healthcare (uppsala, sweden). spr measurements were performed on a biacore operated using biacore control and biaevaluation software (version . . ). biotinylated hp was prepared by conjugating its reducing end to amine-peg -biotin (pierce, rockford, il). in brief, hp ( mg) and amine-peg -biotin ( mg, pierce, rockford, il) were dissolved in µl h o, mg nacnbh was added. the reaction mixture was heated at °c for h, after that a further mg nacnbh was added and the reaction was heated at °c for another h. after cooling to room temperature, the mixture was desalted with the spin column ( , mwco). biotinylated hp was collected, freeze-dried and used for sa chip preparation. the biotinylated hp was immobilized to streptavidin (sa) chip based on the manufacturer's protocol. the successful immobilization of hp was confirmed by the observation of a -resonance unit (ru) increase on the sensor chip. the control flow cell (fc ) was prepared by min injection with saturated biotin. l/min, respectively. 
after each run, the dissociation and the regeneration were performed as described above. solution competition studies between surface hp and soluble glycans (hp, tris hs and nach) to measure ic were performed using spr [ ] . in brief, sars-cov- s-protein ( nm) samples alone or mixed with different concentrations of glycans in spr buffer were injected over the hp chip at a flow rate of l/min, respectively. after each run, dissociation and regeneration were performed as described above. for each set of competition experiments, a control experiment (only protein without glycan) was performed to ensure the surface was completely regenerated. the d coordinates for the sgp trimer (ncbi reference sequence yp_ . ) were downloaded from the swiss-model homology modeling server [ ] . the selected model was generated with the cryo-em structure pdb id vsb as a template, which has a . % sequence identity and % coverage for amino acids to . the template and resulting model is the "prefusion" structure with one of the three receptor binding domains (chain a) in the "up" or "open" conformation [ ] . cryo-em studies have revealed that the sars-cov- sgp trimer exists in two conformational states in approximately equal abundance [ ] . in one state, all sgp monomers have their hace -binding domain closed, and in the other, one monomer has its hace -binding domain open, where it is positioned away from the interior of the protein. initial coordinates for a hexasaccharide fragment of hs (glca( s)-glcns( s)) were generated using the gag-builder tool [ ] at glycam-web (glycam.org) and used for unbiased (blind) docking. a hexasaccharide was chosen as being sufficiently long to represent a typical gag length found in protein co-complexes [ ] and to avoid introducing so many degrees of internal flexibility that the efficiency of the docking conformational search algorithm was impaired. 
docking was performed using a version of vina-carb [ ] that has been modified to improve its performance for gags. a grid box with dimensions (x = , y = , z = Å) was placed at the geometric center the protein enclosing its entire surface. docking was performed with default values, with the following exceptions: exhaustiveness = , chi_cutoff = , and chi_coeff = . . all sulfate and hydroxyl groups and glycosidic torsion angles were treated as flexible, resulting in ligand poses. world health organization. who director-general's opening remarks at the mission briefing on covid- - a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster aerosol and surface stability of sars-cov- as compared with sars-cov- enteric involvement of coronaviruses: is faecal-oral transmission of sars-cov- possible? world health organization. coronavirus disease (covid- ) situation report. in: world health organization activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites middle east respiratory syndrome coronavirus spike protein is not activated directly by cellular furin during viral entry into target cells sars coronavirus, but not human coronavirus nl , utilizes cathepsin l to infect ace -expressing cells receptor and viral determinants of sars-coronavirus adaptation to human ace inhibition of sars pseudovirus cell entry by lactoferrin binding to heparan sulfate proteoglycans dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc glycosaminoglycans in infectious disease mechanisms of coronavirus cell entry mediated by the viral spike protein interaction of zika virus envelope protein with glycosaminoglycans proteoglycans and sulfated glycosaminoglycans copper regulates the interactions of antimicrobial piscidin peptides from fish mast cells with formyl peptide receptors and heparin molecular modeling of 
protein-glycosaminoglycan interactions glycosaminoglycan-protein interactions: definition of consensus sites in glycosaminoglycan binding proteins the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade software news and updates autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading vina-carb: improving glycosidic angles during carbohydrate docking matrix proteoglycans as effector molecules for epithelial cell function comparison of low-molecular-weight heparins prepared from bovine lung heparin and porcine intestine heparin thermodynamic analysis of the heparin interaction with a basic cyclic peptide using isothermal titration calorimetry kinetic model for fgf, fgfr, and proteoglycan signal transduction complex assembly further evidence that periodate cleavage of heparin occurs primarily through the antithrombin binding site furin cleavage of the sars coronavirus spike glycoprotein enhances cell-cell fusion but does not affect virion entry cleavage of group coronavirus spike proteins: how furin cleavage is traded off against heparan sulfate binding upon cell culture adaptation heparin enhances the furin cleavage of hiv- gp peptides cryo-em structure of the -ncov spike in the prefusion conformation. 
science ( -) h and c nmr spectral assignments of the major sequences of twelve systematically modified heparin derivatives enzymatic synthesis of glycosaminoglycan heparin structural characterization of pharmaceutical heparins prepared from different animal tissues swiss-model: homology modelling of protein structures and complexes structure, function, and antigenicity of the sars-cov- spike glycoprotein gag builder: a web-tool for modeling d structures of glycosaminoglycans vmd: visual molecular dynamics genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan we appreciate prof. jason mclellan from university of texas austin for providing trimeric sars-cov- sgp. additionally, we thank professor lianchun wang from university of south florida for providing -o-desulfated hp derivative. key: cord- -afqfymbq authors: ryu, seungjin; shchukina, irina; youm, yun-hee; qing, hua; hilliard, brandon k.; dlugos, tamara; zhang, xinbo; yasumoto, yuki; booth, carmen j.; fernández-hernando, carlos; suárez, yajaira; khanna, kamal m.; horvath, tamas l.; dietrich, marcelo o.; artyomov, maxim n.; wang, andrew; dixit, vishwa deep title: ketogenesis restrains aging-induced exacerbation of covid in a mouse model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: afqfymbq increasing age is the strongest predictor of risk of covid- severity. unregulated cytokine storm together with impaired immunometabolic response leads to highest mortality in elderly infected with sars-cov- . to investigate how aging compromises defense against covid- , we developed a model of natural murine beta coronavirus (mcov) infection with mouse hepatitis virus strain mhv-a (mcov-a ) that recapitulated majority of clinical hallmarks of covid- . 
aged mcov-a -infected mice have increased mortality and higher systemic inflammation in the heart, adipose tissue and hypothalamus, including neutrophilia and loss of γδ t cells in lungs. ketogenic diet increases beta-hydroxybutyrate, expands tissue protective γδ t cells, deactivates the inflammasome and decreases pathogenic monocytes in lungs of infected aged mice. these data underscore the value of mcov-a model to test mechanism and establishes harnessing of the ketogenic immunometabolic checkpoint as a potential treatment against covid- in the elderly. highlights - natural mhv-a mouse coronavirus infection mimics covid- in elderly. - aged infected mice have systemic inflammation and inflammasome activation - murine beta coronavirus (mcov) infection results in loss of pulmonary γδ t cells. - ketones protect aged mice from infection by reducing inflammation. etoc blurb elderly have the greatest risk of death from covid- . here, ryu et al report an aging mouse model of coronavirus infection that recapitulates clinical hallmarks of covid- seen in elderly. the increased severity of infection in aged animals involved increased inflammasome activation and loss of γδ t cells that was corrected by ketogenic diet. aging-driven reduced resilience to infections is dependent in part on the restricted t cell repertoire diversity together with impaired t and b cell activation as well as inflammasomedriven low-grade systemic inflammation that compromises innate immune function (akbar and gilroy, ; camell et al., ; youm et al., ) . consequently, percent of deaths due to in us are in adults > years old (https://www.cdc.gov/) and aging is the strongest factor to increase infection fatality (pastor-barriuso et al., ; perez-saez et al., ; ward et al., ) . lack of an aging animal model that mimics sars-cov- immunopathology has been a major limitation in the effort to determine the mechanism of disease and to develop effective therapeutics for the elderly. 
the inability of mouse ace to bind sars-cov- is a significant hurdle in understanding the basic mechanism of covid- . accordingly, several approaches have been employed to develop models, including introduction of human-ace in mice and transient induction of hace through adenoviral-associated vectors. these models have begun to yield important information on the mechanism of disease development. for example, epithelial cell specific induction of hace (k -hace ) as a model of sars-cov- infection demonstrated that, after intranasal inoculation, animals develop lung inflammation and pneumonia driven by infiltration of monocytes, neutrophils and t cells . also, initial studies that employ lung ciliated epithelial cell-specific hfh /foxj promoter driven hace transgenic mice show that sars-cov- infection induces weight loss, lung inflammation and an approximately % mortality rate, suggesting the usefulness of this model to understand the mechanism of immune dysregulation (jiang et al., ) . however, significant hurdles remain to understand the mechanism and test therapeutic interventions that are relevant to disease severity in elderly, as complicated breeding and specific mutations need to be introduced in hace transgenic strains in addition to the time required to age these models. the mouse model of sars-cov- based on adeno-associated virus (aav)-mediated expression of hace may allow circumvention of the above constraints. the delivery of hace into the respiratory tract of c bl/ mice with aav causes a productive infection as revealed by a > fold increase in sars-cov- rna and shows similar interferon gene expression signatures as covid- patients . however, in young wild-type mice, this model induces mild acute respiratory distress syndrome (ards) and does not cause neutrophilia, weight loss or lethality . 
other studies using replication deficient adenovirus-mediated transduction of hace in mice and infection with sars-cov- produced % weight-loss and lung inflammation (hassan et al., ; sun et al., ) . furthermore, genetic remodeling of the sars-cov- spike receptor binding domain that allows interaction with mace demonstrated peribronchiolar lymphocytic inflammatory infiltrates and epithelial damage but no weight-loss in infected mice (dinnon et al., ) . moreover, middle aged female mice (one year old, analogous to approx. year old human) display greater lung pathology and loss of function post infection with % weight-loss followed by spontaneous recovery days post infection (dinnon et al., ) . however, it remains unclear if this model replicates the clinical course, systemic inflammation, immunological response and mortality observed in covid- . the mouse hepatitis virus (mhv) and sars-cov- are both ards-related beta coronaviruses with a high degree of homology (gorbalenya et al., ) . until the emergence of sars-cov- , natural infection with mhv mouse coronavirus-a (mcov-a ) was traditionally thought to be of minor relevance to human disease and was primarily a veterinary concern for maintaining the specific-pathogen free status of mouse research facilities (hickman and thompson, ) . however, lack of an aging mouse model of necessitates re-evaluation of mcov-a to investigate the mechanism of multi-organ inflammation, morbidity and mortality caused by the disease. importantly, mcov-a utilizes the entry receptor ceacam , which is expressed on respiratory epithelium, but also on enterocytes, endothelial cells, and neurons, much like ace (godfraind et al., ) , thus allowing the study of wide-ranging systemic impacts of infection. 
moreover, natural infection with mcov-a causes ards in c bl/ j animals, while all other mhv strains require the a/j or type-i interferon-deficient background, for the development of severe disease (de albuquerque et al., ; khanolkar et al., ; yang et al., ) limiting their use. lastly, another major practical advantage of natural infection with mcov-a mouse model is that it does not require limited bsl facilities, thus allowing the allocation of precious world-wide bsl specialized laboratory space to be prioritized for human sars-cov- virus studies in primates and other models, which cannot be achieved by using mouse-adapted sars-cov- models. aging-induced chronic inflammation in the absence of overt infections is predominantly driven by the nlrp inflammasome (bauernfeind et al., ; camell et al., ; youm et al., ) , a myeloid cell-expressed multiprotein complex that senses pathogen associated molecular patterns (pamps) and danger associated molecular patterns (damps) to cause the processing and secretion of il- β and il- . there is increasing evidence that sars-cov- infection activates the nlrp inflammasome (siu et al., ) with increased levels of il- and lactate dehydrogenase (ldh) levels due to inflammasome mediated pyroptotic cell death (lucas et al., ; zhou et al., ) . it is now known that increased glycolysis, which activates inflammasome is associated with worsened covid- outcome (codo et al., ) . this raises the question whether the substrate switch from glycolysis-to-ketogenesis, can be employed to stave off covid- in high risk elderly population. here, we establish that intranasal infection with mcov-a recapitulates clinical features of covid- seen in elderly and demonstrate that ketone metabolites protect against disease through inhibition of nlrp inflammasome and expansion of protective γδ t cells in lungs. 
to determine the underlying deficits in immune and inflammatory response in aging, we investigated the impact of mcov-a intranasal inoculation on adult ( - month) and old male mice ( - month) ( figure a ). the ld- infectious dose of mcov-a in adult (pfu e ) caused percent lethality in aged mice ( figure b ). compared to adults, the old infected mice displayed greater weight-loss ( figure c ), hypoxemia ( figure d ), and anorexia ( figure e ) without a significant difference in viral load in lungs ( figure s a ). interestingly, aging led to a significant reduction in cd , cd :cd ratio ( figure f and figure s b ) and γδ t cells ( figure g and figure s c ) in lungs and spleen ( figure s d ) with increased neutrophils ( figure h ), ly c hi monocytes ( figure i and s e) and no change in eosinophils ( figure s f ). in addition, the lungs of old infected mice displayed increased frequency of cd + mertk + cells ( figure j and s e) with no significant differences in the total population of alveolar or interstitial macrophages ( figure s g and s h). transmission electron microscopy confirmed the dissemination of the viral particles in pneumocytes in lungs ( figure a ). following mcov-a inoculation, ihc analyses by h&e and msb staining showed, in both month and - -month old mice, perivascular inflammation (arrows, arrowhead) as well as perivascular edema (*) and increased perivascular collagen/fibrosis (msb, blue) that is more severe in the - -month mcov-a infected mice ( figure b ). further, - month old mice inoculated with mcov-a have dense foci visible at low power (box) and amphophilic material (fibrosis) with few scattered brightly eosinophilic erythrocytes (grey arrowhead) admixed with lymphocytes and plasma cells. 
by msb stain, at higher power this same focus (***) in the - month infected mice revealed that the end of a small blood vessel (bv) terminates into a mass of collapsed alveoli, without obvious septa, admixed with inflammatory cells and disorganized fibrin/collagen fibers (blue), suggestive of ante mortem pulmonary thrombosis in contrast to post mortem blood clots where erythrocytes are yellow (** msb, yellow) ( figure b ). taken together, consistent with ards, the lungs of aged mice infected with mcov-a had increased foci of inflammation, immune cell infiltration, perivascular edema, hyaline membrane formation and type ii pneumocyte hyperplasia, organizing pneumonia, interstitial pneumonitis, and occasional hemorrhage and microthrombi, affecting approximately % of the lungs ( figure b ). we next investigated whether mcov-a infection in aged mice mimics the hyperinflammatory systemic response seen in elderly patients infected with covid- . compared to young animals, old mice infected with equivalent doses of mcov-a displayed a significant increase in circulating il- β, tnfα, il- ( figure a -c) and mcp- without affecting mip- β ( figure s a and s b). similar to covid- , the infection with mcov-a caused increased cardiac inflammation in old mice as evaluated by a greater number of infiltrating cd + myeloid cells ( figure d ). given that increased visceral adiposity is a risk factor for covid- severity and expression of ace is upregulated in adipocytes of obese and diabetic patients infected with sars-cov- (kruglikov and scherer, ) , we next studied whether mcov-a infection affects adipose tissue. given that the prevalence of obesity is % among younger adults aged - , % among adults aged - years and % among older adults aged and over (hales et al., ) , we investigated adipose tissue inflammation as a potential mechanism that contributes to infection severity in the aged. 
interestingly, consistent with the prior findings that adipose tissue can harbor several viruses (damouche et al., ) , the mcov-a rna was detectable in vat ( figure e ). despite similar viral loads, vat of aged infected mice had significantly higher levels of the pro-inflammatory cytokines il- β, tnfα and il- ( figure f -h). moreover, compared to young mice, old animals infected with mcov-a had increased caspase- cleavage (p active heterodimer), a marker of inflammasome activation ( figure i and s c). in addition, similar to sars-cov- invasiveness in cns, the mcov-a was detectable in the hypothalamus ( figure j ). compared to adults, the hypothalamus of aged infected mice showed increased expression of tnfα and caspase- ( figure k and l) with no significant differences in il- β, il- ( figure m and n) and nlrp ( figure s d ). infection in both young and aged mice caused significant increases in markers of astrogliosis and microglia activation ( figure s e and f). interestingly, mcov-a reduced the mrna expression of orexigenic neuropeptide-y (npy) ( figure s e and f), consistent with the fact that infected mice display anorexia ( figure c and e). however, mcov-a infection completely abolished the expression of proopiomelanocortin (pomc) in the hypothalamus, a transcript expressed by pomc neurons, which is involved in the control of the autonomic nervous system and integrative physiology. therefore, further investigation will be necessary to test the involvement of the hypothalamus in the pathogenesis of covid- and organ failure due to alterations in the autonomic nervous system. given that the switch from glycolysis to fatty acid oxidation reprograms the myeloid cell from a proinflammatory to a tissue reparative phenotype during infections (ayres, ; buck et al., ; galván-peña and o'neill, ) , we next investigated whether the mcov-a -driven hyperinflammatory response in aging can be targeted through immunometabolic approaches. 
hepatic ketogenesis, a process downstream of lipolysis that converts long-chain fatty acids into short chain β-hydroxybutyrate (bhb) as a preferential fatty acid fuel during starvation or glucoprivic states, inhibits the nlrp inflammasome activation (youm et al., ) and protects against influenza infection induced mortality in mice (goldberg et al., ) . moreover, given our recent findings that ketogenesis inhibits inflammation and expands tissue resident γδ t cells (goldberg et al., ) while sars-cov- infection in patients is associated with depletion of γδ t cells (lei et al., ; rijkers et al., ) , we next tested whether elevating bhb by feeding a ketogenic diet (kd) protects against mcov-a -driven inflammatory damage in aged mice. we infected bone marrow derived macrophages (bmdms) with mcov-a in vitro in tlr ( figure a and s a) and tlr / primed cells ( figure b and s b). infection with mcov-a caused robust activation of the inflammasome as measured by cleavage of active il- β (p ) in bmdm supernatants ( figure a and b) as well as in cell lysates ( figure s a and b). given our prior findings that ketone metabolites specifically inhibit the nlrp inflammasome in response to sterile damps such as atp, ceramides, silica and urate crystals (youm et al., ) , we next tested whether bhb impacts inflammasome activation caused by mcov-a . interestingly, bhb treatment reduced pro and active cleaved il- β (p ) in both conditions when protein level was measured in the supernatant ( figure a and b) and cell lysate ( figure s a and b). mechanistically, post mcov-a infection, bhb reduced the oligomerization of asc, which is an adaptor protein required for the assembly of the inflammasome complex ( figure c and d). these data provide evidence that the ketone metabolite bhb can lower inflammation in response to coronavirus infection and deactivate the inflammasome. 
however, inflammasome activation is also required for mounting an adequate immune response against pathogens including certain viruses. therefore, we next investigated if induction of ketogenesis and ketolysis in vivo, by feeding a diet rich in fat and low in carbohydrates that elevates bhb level, impacts the inflammasome and host defense against mcov-a infection in aged mice. ketogenesis is dependent on hydrolysis of triglycerides and conversion of long chain fatty acids in liver into the short chain fatty acid bhb, which serves as a primary source of atp for heart and brain when glucose is limiting. aging is associated with impaired lipid metabolism, which includes reduced lipolysis that generates free fatty acids that are essential substrates for bhb production. thus, it is unclear whether sufficient ketogenesis can be induced in the context of severe infection and aging. to test this, the aged male mice ( - months old) were fed a kd or control diet for days and then intranasally infected with mcov-a ( figure e ). despite mcov-a 's known effects in causing hepatic inflammation (navas et al., ) , we observed that compared to chow fed animals, old kd fed mice achieved mild physiological ketosis between . to mm over the course of infection for one week ( figure s c ). compared to chow fed animals, the mcov-a infected mice fed kd displayed similar levels of food intake, blood glucose, core-body temperature, heart rate, and respiration ( figure s d -i). interestingly, mcov-a infected kd fed mice were protected from infection-induced weight-loss and hypoxemia ( figure f and g). importantly, consistent with in vitro data, kd feeding caused significant reduction in inflammasome activation in the vat ( figure h ), a major source of inflammation in aging. together, these data show that kd lowers exuberant inflammasome activation in old mice and can potentially be therapeutically employed. elderly covid- patients exhibit multi-organ failure with systemic viremia and inflammation. 
therefore, we next investigated the impact of kd on the inflammatory response in lungs, adipose tissue and hypothalamus in old mice post mcov-a infection. consistent with the improved clinical outcome and protection afforded by ketone bodies in infected mice, we found that kd-fed mice inoculated with mcov-a had significantly reduced expression of pro-inflammatory cytokines il- β, tnfα and il- in lung, vat and hypothalamus ( figure a -c). aging and mcov-a increase inflammasome activation, which is increasingly implicated in the pathogenesis of covid- (vijay et al., ; youm et al., ) . the sars-cov open reading frame a (orf a) and orf b activate the nlrp inflammasome (siu et al., ) by inducing er stress and lysosomal damage (shi et al., ) . moreover, the ability of bats to harbor multiple viruses including coronaviruses is due to splice variants in the lrr domain of nlrp which prevent inflammasome mediated inflammatory damage (ahn et al., ) . interestingly, in the aging mouse model of mcov-a infection, kd significantly lowered nlrp and caspase- mrna in lung, vat and hypothalamus ( figure d and e) and decreased myeloid cell infiltration in heart ( figure f ). the ketogenesis in infected old mice did not affect the frequency of cd , cd effector memory or macrophage subsets in lungs, suggesting that the reduction in pro-inflammatory cytokines was not a reflection of reduced infiltration of these cell types ( figure s ). interestingly, we found that kd feeding rescued mcov-a -induced depletion of γδ t cells in lungs of aged mice ( figure g and s a). to determine the mechanism of ketogenesis-induced protection from mcov-a driven inflammatory damage in aging, we next investigated the transcriptional changes in lung at the single-cell level. 
the scrna sequencing of whole lung tissues ( figure a ) found that kd feeding in old infected mice caused significant increase in goblet cells ( figure b ), expansion of γδ t cells ( figure b ) and significant decrease in proliferative cell subsets and monocyte populations ( figure b ). comparison with scrna-seq of the lungs from young and old noninfected animals highlighted that only loss of proliferative myeloid cells was associated with the baseline aging process ( figure s ), while other age-specific changes in cellular subpopulations emerged as a result of interaction between virus and host. moreover, the old mice showed reduced interferon responses, suggesting increased vulnerability to the viral infection ( figure s ). interestingly, some of the most striking changes occurred in t cells, where ketogenesis led to a substantial increase in γδ but not αβ t cells ( figure c and figure s b ). to understand whether expansion of γδ t cells was also accompanied by the changes in their regulatory programs, we sorted the lung γδ t cells from aged mice fed chow diet and kd and conducted bulk rna sequencing to determine the mechanism of potential tissue protective effects of these cells in mcov-a infection. we found that kd in aging significantly increased the genes associated with reduced inflammation ( figure d ), increased lipoprotein remodeling and downregulation of tlr signaling, plk and aurora b signaling pathways in γδ t cells ( figure e ). furthermore, rna sequencing revealed that lung γδ t cells from ketogenic mcov-a infected old mice displayed elevated respiratory electron transport and complex i biogenesis ( figure e ). in addition, golgi to er retrograde transport and cell cycle are downregulated, suggesting the reduced activation status of γδ t cells ( figure e ). this may indicate that γδ t cells expanded with kd are functionally more homeostatic and immune protective against mcov-a infection. 
zooming in on the monocyte subpopulation, we observed three distinct monocyte clusters ( figure f ), characterized by ifi , lmna, and cd e respectively ( figure g ). strikingly, the ketogenesis-induced change in the monocyte compartment was driven by a loss of cluster (characterized by high levels of chil , lmna, il r , lcn , cd , cd a, figure h and figure s d and e). in addition, the loss of monocyte subpopulation was observed in cells with low interferon-response, further suggesting the induction of an immune protective response post ketogenesis in infected mice. this is an intriguing finding that is consistent with recent observations that dietary interventions can impact plasticity of the monocyte pool in both mouse and human (collins et al., ; jordan et al., ) . immune-senescence exemplified by inflammasome-mediated basal activation of myeloid cells, expansion of pro-inflammatory aged b cells, impaired germinal center and antibody responses together with thymic demise and restriction of t cell repertoire diversity all contribute to increased risk of infections and vaccination failures in elderly (akbar and gilroy, ; frasca et al., ; goldberg and dixit, ; goronzy and weyand, ) . it is likely that multiple mechanisms partake in aging-induced mortality and morbidity to sars-cov- . however, the study of immunometabolic mechanisms that control aberrant inflammatory response in elderly covid- patients is hindered by the lack of an aging mouse model of disease that recapitulates the key features of sars-cov- immunopathology and multi-organ inflammation. despite the severity of this viral infection, it is currently unclear what underlies the symptom diversity and the mortality of this pandemic. 
epidemiological data strongly support that elderly and aged individuals with late-onset chronic diseases (including diabetes, obesity, heart conditions, pulmonary dysfunctions and cancer) present a much higher disease severity compared to young healthy adults (cai et al., ; chen et al., ) . these observations suggest that it is the vulnerability of the various tissues that occurs in these chronic conditions that predisposes elderly to develop severe forms of covid- . rodent covs are natural, highly contagious pathogens of mice and rats (compton et al., ; compton et al., ) . they are efficient and safe platforms for recapitulating and examining factors and interventions that impact disease. these models enable basic and translational covid- studies by minimizing studies requiring sars-cov- infection, thereby conserving and reserving limited bsl- space for studies using the most promising candidates in a homologous sars-cov- model. among the diverse rodent covs, mouse hepatitis virus (mhv) is a collection of mouse cov strains that cause disease ranging from clinically silent (enterotropic) to lethal (polytropic/respiratory tropic). of particular interest for covid- are the strains of mhv that are respiratory tropic (yang et al., ) . given these advantages, mhv mcov-a infection in c bl/ mice can be a powerful tool to rapidly study the disease as well as test therapeutic interventions. we demonstrate that compared to all reported models (dinnon et al., ; hassan et al., ; israelow et al., ; jiang et al., ; sun et al., ; winkler et al., ) , mhv mcov-a infection recapitulates severe features of covid- , including up to % weight-loss, sickness behavior exemplified by anorexia, loss of oxygen saturation, lung pathology including neutrophilia, monocytosis, loss of γδ t cells, lymphopenia, increase in circulating pro-inflammatory cytokines, hypothalamic, adipose and cardiac inflammation and inflammasome activation. 
importantly, the ld dose of mhv mcov-a induces % lethality in year old male mice, suggesting that this model allows investigation of covid- relevant immunometabolic mechanisms that control disease development and severity with aging. mechanistically, the nlrp inflammasome has been demonstrated to be an important driver of aging-induced chronic inflammation and organ damage (bauernfeind et al., ; camell et al., ; youm et al., ) . covid- patients have inflammasome dependent pyroptosis and increase in il- (lucas et al., ; zhou et al., ) . consistent with the hypothesis that aging may exacerbate inflammasome activation in sars-cov- infection, our data demonstrate that in vivo, mcov infection increases nlrp inflammasome mediated inflammation. emerging evidence demonstrates that the ability of bats to harbor multiple viruses including coronaviruses is due to splice variants in the lrr domain of nlrp which prevent inflammasome mediated inflammatory damage (ahn et al., ) . furthermore, a recent study shows that mhv-a also activates the nlrp inflammasome in bone marrow derived macrophages in vitro . in addition, severe cases of covid- are accompanied by dysregulation of monocyte populations with increased level of s a /a or calprotectin (schulte-schrepping et al., ; silvin et al., ) , which can prime and induce the inflammasome activation . these data underscore that enhanced innate immune tolerance mediated by inflammasome deactivation may be an important strategy against covid- . the integrated immunometabolic response (iimr) is critical in regulating the setpoint of protective versus pathogenic inflammatory response (lee and dixit, ). the iimr involves sensing of nutrient balance by neuronal (sympathetic and sensory innervation) and humoral signals (e.g. hormones and cytokines) between the cns and peripheral tissues that allow the host to prioritize storage and/or utilize substrates for tissue growth, maintenance and protective inflammatory responses. 
peripheral immune cells, both in circulation and those residing within tissues, are subject to regulation by the metabolic status of the host. the ketone bodies bhb and acetoacetate are produced during starvation to support the survival of the host by serving as an alternative energy substrate when glycogen reserves are depleted (newman and verdin, ) . classically, ketone bodies are considered essential metabolic fuels for key tissues such as the brain and heart (puchalska and crawford, ; veech et al., ) . however, there is increasing evidence that immune cells can also be profoundly regulated by ketone bodies (youm et al., ; goldberg et al., ) . for example, stable isotope tracing revealed that macrophage oxidation of liver-derived acac was essential for protection against liver fibrosis (puchalska et al., ) . given our past findings that ketone bodies inhibit nlrp inflammasome activation induced by sterile damps, we next hypothesized that coronavirus mediated inflammasome activation and disease severity in aging could be improved by bhb driven improved metabolic efficiency and nlrp deactivation. in support of this hypothesis, we found that bhb inhibits the mcov-a induced nlrp inflammasome assembly and kd reduces caspase- cleavage as well as decreases gene expression of inflammasome components. we next investigated the mechanism of protection elicited by kd that is relevant to aging. interestingly, scrna sequencing analyses of lung homogenates of old mice fed kd revealed robust expansion of immunoprotective γδ t cells, which are reported to decline in covid- patients (lei et al., ; rijkers et al., ) . the kd activated mitochondrial function as evidenced by enhanced complex- biogenesis and upregulation of etc in immunoprotective γδ t cells. moreover, the kd feeding blocked infiltration of a pathogenic monocyte subset in lungs that has high s a / and low interferon expression. 
our data show that the mcov-a murine model offers an efficient and biosafety level- platform for recapitulating covid- to test mechanisms of age-related immune decline and can thereby fast-track testing of interventions that impact disease outcome. our findings assume strong clinical significance, as recent studies demonstrate that γδ t cells were severely depleted in covid- patients in two highly variable cohorts and disease progression was correlated with near ablation of vγ vδ cells, which are the dominant subtype of circulating γδ t cells (laing et al., ) . taken together, these data demonstrate that a ketogenic immunometabolic switch protects against mcov-a driven covid in mice and that this anti-inflammatory response in lung is coupled with reduction of inflammasome activation, restoration of protective γδ t cells and remodeling of the pool of inflammatory monocytes. finally, our results suggest that acutely switching infected or at-risk elderly patients to a kd may ameliorate covid- and, therefore, is a relatively accessible and affordable intervention that can be promptly applied in most clinical settings. the mouse remains an imperfect model of human biology and disease. instead of following the current approaches to make human sars-cov- amenable to infecting an unnatural murine host with unknown biological compatibility, we focused on natural mouse hepatitis virus (mhv)-a because, like sars-cov- , it belongs to the family of ards-related beta coronaviruses that are highly homologous. however, the obvious limitation of the model is that mhv-a is not the sars-cov- virus. although both these beta coronaviruses display a high degree of homology, mhv-a uses ceacam instead of ace for binding and infectivity. however, cellular expression of ceacam is similar to that of ace in humans, and the similarity of viral orfs offers significant advantages in studying tissue responses. 
like most models of disease, this study shows that not all features of sars-cov- infection seen in humans are seen in mice. this includes the absence of fever; instead, mice become hypothermic. also, despite monocyte infiltration in heart, the aged mice did not die due to cardiac failure and displayed normal heart rate. mcov-a virus induced hypothalamic inflammation and led to anorexia that included reduction in orexigenic npy but almost complete loss of pomc gene expression. it remains unclear if this alters pomc derived peptides including melanocortins and endogenous opioid peptides. in addition, these data suggest potential dysregulation of the autonomic nervous system, which can play a role in organ failure post infection. in terms of mechanism, our data show that the inflammasome is activated in infection and kd-induced protection is associated with nlrp inflammasome deactivation. future studies are required to test if aged nlrp deficient mice are protected from mcov-a or if absence of γδ t cells increases mortality. aw and vdd conceived the project and helped with data interpretation and manuscript preparation. the authors declare they have no competing interests. fed chow (old-chow, n= ) or ketogenic diet (old-kd, n= ). the mice were provided with diet from days before infection. after infection, the phenotype was evaluated until days post infection. weight change (%) (f), and % o saturation (g) in old mice fed chow or kd. (h) western blot analysis of caspase- inflammasome activation in vat of infected old-chow and old-kd mice. error bars represent the mean ± s.e.m. two-tailed unpaired t-tests were performed for statistical analysis. * p < . ; ** p < . . further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, vishwa deep dixit (vishwa.dixit@yale.edu). this study did not generate new unique reagents. 
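The figure legend above reports error bars as mean ± s.e.m. and comparisons by two-tailed unpaired t-tests. As a minimal sketch of those summary statistics, the following computes the mean, the standard error of the mean, and an unpaired t statistic. The input values are hypothetical, and the pooled-variance (Student) form is an assumption, since the text does not say whether equal variances were assumed. The two-tailed p-value would then be read from a t distribution with the returned degrees of freedom, which is not computed here.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def unpaired_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an unpaired t-test.

    Assumption: pooled-variance (Student) form; the source does not state
    whether equal variances were assumed.
    """
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    std_err = math.sqrt(pooled * (1 / na + 1 / nb))
    t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t_stat, na + nb - 2


# Hypothetical values for illustration only, not data from the study.
chow = [1.0, 2.0, 3.0]
kd = [2.0, 3.0, 4.0]
print(mean_sem(chow))
print(unpaired_t(chow, kd))
```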
the single cell rna-sequencing and bulk rna-sequencing data have been uploaded to gene expression omnibus (gse and gse , respectively). all mice used in this study were c bl/ mice. old mice ( - month old) were received from nia and maintained in our laboratory. young mice ( - month old) were from nia, purchased from jackson laboratories, or bred in our laboratory. the mice were housed in specific pathogen-free facilities with free access to sterile water through yale animal resources center. mice were fed a standard vivarium chow (harlan s) or a ketogenic diet (envigo, td. ) for the indicated time. the mice were housed under h light/dark cycles. all experiments and animal use were conducted in compliance with the national institute of health guide for the care and use of laboratory animals and were approved by the institutional animal care and use committee (iacuc) at yale university. mhv-a was purchased from bei resources (nr- ) and grown in bv cells. mice were anesthetized by intraperitoneal injection of ketamine/xylazine. or pfu of mhv-a was delivered in ul pbs via intranasal inoculation. vital signs were measured before and after infection. arterial oxygen saturation, breath rate, heart rate, and pulse distention were measured in conscious, unrestrained mice via pulse oximetry using the mouseox plus (starr life sciences corp.). lungs were fixed in % formaldehyde, osmicated in % osmium tetroxide, and dehydrated in ethanol. during dehydration, % uranyl acetate was added to the % ethanol to enhance ultrastructural membrane contrast. after dehydration, the lungs were embedded in durcupan and ultrathin sections were cut on a leica ultra-microtome, collected on formvar-coated single-slot grids, and analyzed with a tecnai biotwin electron microscope (fei). h&e and msb staining of lung tissues was performed on formalin-fixed paraffin-embedded sections at the comparative pathology research core at yale school of medicine. 
for immunohistochemistry, the hearts were harvested from mhv-a infected mice, fixed in % pfa overnight and embedded in oct after dehydration with % sucrose, and serial sections of aortic root were cut at μm thickness using a cryostat. sections were incubated at °c overnight with cd (serotec; #mca ) and alexa fluor™ phalloidin (thermofisher, a ) after blocking with blocking buffer ( % donkey serum, . % bsa, . % triton x- in pbs) for hour at rt, followed by incubation with alexa fluor secondary antibody (invitrogen, carlsbad, ca) for hour at rt. the stained sections were captured using a carl zeiss scanning microscope axiovert m imaging system and images were digitized under constant exposure time, gain, and offset. results are expressed as the percent of the total plaque area stained, measured with the image j software (imagej version . ). plaque assay: l cells ( . ml of x cells/ml) were seeded on well plates (corning, ) in supplemented dmem and allowed to adhere overnight. tissue samples were homogenized in unsupplemented dmem, and spun down at rpm for min. supernatant was serially diluted and μl of each sample was added to aspirated l cells in well plates. plates were agitated regularly for hour before adding overlay media consisting of part . % avicel and part x dmem (thermo fisher, ) supplemented with % fbs (thermo fisher, a ), penicillin-streptomycin (gibco, ), mem non-essential amino acids solution (gibco, ), and hepes (gibco, ). after a four-day incubation, cells were fixed in % formaldehyde (sigma aldrich, ) diluted with pbs for hour. cells were then stained in % (w/v) crystal violet (sigma aldrich, c ) for hour, washed once in distilled water, and then quantified for plaque formations. serum cytokine and chemokine levels were measured by procartaplex multiplex assay (thermo fisher scientific). the assay was prepared following the manufacturer's instructions. µl of collected serum from each mouse in this study was used. 
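The plaque assay described above counts plaques from serially diluted tissue supernatant. A minimal sketch of the standard conversion from a plaque count to a titer in plaque-forming units per ml follows. The function name and the example numbers are hypothetical, since the dilutions and inoculum volume are not recoverable from the text.

```python
def plaque_titer(plaque_count, dilution_factor, inoculum_volume_ml):
    """Return the viral titer in PFU/ml.

    plaque_count: plaques counted in one well
    dilution_factor: fold dilution of the sample plated in that well
                     (for example 1e4 for a 1:10,000 dilution)
    inoculum_volume_ml: volume of diluted sample added to the well, in ml
    """
    if plaque_count < 0:
        raise ValueError("plaque_count must be non-negative")
    if dilution_factor <= 0 or inoculum_volume_ml <= 0:
        raise ValueError("dilution_factor and inoculum_volume_ml must be positive")
    return plaque_count * dilution_factor / inoculum_volume_ml


# Hypothetical example: 25 plaques from a 1:10,000 dilution with a 0.2 ml inoculum.
print(plaque_titer(25, 1e4, 0.2))
```

In practice, wells with too few or too many plaques to count reliably are usually excluded before averaging replicate wells; that selection step is left out of this sketch.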
a customized assay including il- β, tnf , il- , mcp- , and mip- β was used. the luminex xponent system was used to perform the assay. to extract and purify rna from tissues, rneasy plus micro kit (qiagen) and direct-zol™ rna miniprep plus kit (zymo research) were used according to the manufacturer's instructions. cdna was synthesized from isolated rna using iscript cdna synthesis kit (bio-rad). to quantify the amount of mrna, real time quantitative pcr (qpcr) was done with the synthesized cdna, gene specific primers, and power sybr green detection reagent (thermo fisher scientific) using the lightcycler ii (roche). analysis was done by the ddct method: measured values for specific genes were normalized to gapdh as an endogenous control. bone marrow-derived macrophages (bmdm) were cultured by collecting mouse femurs and tibias in complete collecting media containing rpmi (thermo fisher scientific), % fbs (omega scientific), and % antibiotics/antimycotic (thermo fisher scientific). using needle and syringe, bone marrow was flushed into new complete media, followed by red blood cell lysis with ack lysing buffer (quality biological). in well plates, the collected cells were seeded and differentiated into macrophages by incubation with ng/ml m-csf (r&d) and l (atcc) conditioned media. cells were harvested on day and seeded at x cells/well in well plates for experiments. to infect bmdm, mhv-a was incubated with bmdm at an moi ( : ) for hour. for inflammasome activation, cells treated with lps ( ug/ml) or pam csk ( ug/ml) were pre-treated with or without bhb ( , , mm) for hour before mhv-a infection for hour or atp ( mm) treatment for hour. to prepare samples for western blotting, tissues were snap frozen in liquid nitrogen. ripa buffer with protease inhibitors was used to homogenize the tissues. after cell supernatant was collected, cells were harvested by directly adding ripa buffer to the cell culture plate. 
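the ddct normalization mentioned above for the qpcr data can be written out as a short calculation. this is a minimal sketch assuming the common 2^(-ddct) form with roughly 100% amplification efficiency; the function name and the ct values in the example are hypothetical and are not taken from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    Each argument is a cycle-threshold (Ct) value. The reference gene
    plays the role of the endogenous control (Gapdh in the text).
    Assumes amplification efficiency close to 100% for both genes.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # dCt, treated sample
    d_ct_control = ct_target_control - ct_ref_control   # dCt, control sample
    dd_ct = d_ct_sample - d_ct_control                   # ddCt
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 2 cycles earlier
    # (relative to the reference) in the sample than in the control.
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

a value above 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression.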
after quantification of protein amount by the dc protein assay (bio-rad), same amount of protein was run on sds-page gel followed by transferring to nitrocellulose membrane. specific primary antibodies and appropriate secondary antibodies (thermo fisher scientific) were used to probe blots and bands were detected by ecl western blotting substrate (pierce). the following primary antibodies were used for experiments. antibodies to caspase- ( : , genentech), βactin ( : , , l; cell signaling), il- β ( : , gtx , genetex), and asc ( : , ag- b- , adipogen) were used. to detect asc oligomers, cells were harvested in np- lysis buffer which contains mm hepes-koh (ph . ), mm kcl, % np- , . mm pmsf, and protease inhibitors. the cells in lysis buffer were incubated on ice for min, and centrifuged at , rpm at °c for min. supernatant was collected and kept for cell lysate western blotting. the pellet was vortexed with ml of np- lysis buffer, and centrifuged at , rpm at °c for min. the pellet was incubated with ul of np- lysis buffer and ul of mm dss (disuccinimidyl suberate) for min at room temperature, then centrifuged at , rpm at °c for min. the pellet with sds sample buffer and reducing reagent was loaded for western blotting. lung was digested in rpmi (thermo fisher) with . mg/ml collagenase i (worthington) and . mg/ml dnase i (roche) for hour. digested lung tissues were minced through μm strainer. spleen was directly minced through μm strainer. minced tissues were additionally filtered with μm strainer after red blood cell lysis by ack lysing buffer (quality biological). after incubation with fc block cd / antibodies (thermo fisher scientific), the cells from lung and spleen were further incubated with surface antibodies for min on ice in the dark. washed cells were stained with live/dead™ fixable aqua dead cell stain kit (thermo fisher scientific). bd lsrii was used for flow cytometry and results were analyzed by flowjo software. 
the following antibodies were used for flow cytometry analysis to detect cd t cells, cd t cells, γδ t cells, neutrophils, eosinophils, and macrophages: cd -bv , mertk-fitc, cd -bv , f / -efluor , ly c-percp-cy . , cd c-apc, cd -pe, cd -pe-cy , cd -bv , cd -pe-cy , cd -efluor , tcr γ/δ-pe, ly g-apc, siglecf-percp-cy . , cd l-percp-cy . , cd -apc-cy . lung cells were prepared as mentioned above for flow cytometry and equal numbers of cells were pooled as indicated in the experiments. single-cell rna sequencing libraries were prepared at yale center for genome analysis following the manufacturer's instructions ( x genomics). libraries were sequenced on a novaseq. the cell ranger single-cell software suite (v . . ) (available at https://support. xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-iscell-ranger) was used to perform sample demultiplexing, barcode processing, and single-cell ' counting. cellranger mkfastq was used to demultiplex raw base call files from the novaseq sequencer into sample-specific fastq files. subsequently, fastq files for each sample were processed with cellranger count to align reads to the mouse reference (version mm - . . ) with default parameters. for the analysis, the r (v . . ) package seurat (v . . ) (butler et al., ) was used. cell ranger filtered gene-by-barcode expression matrices were used as analysis inputs. samples were pooled together using the merge function. the fraction of mitochondrial genes was calculated for every cell, and cells with a high (> %) mitochondrial fraction were filtered out. expression measurements for each cell were normalized by total expression, scaled to , , and then log-transformed (normalizedata function). two sources of unwanted variation, umi counts and fraction of mitochondrial reads, were removed with the scaledata function. 
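the filtering and normalization steps above can be illustrated outside of seurat. the sketch below assumes the log-normalization that seurat's normalizedata applies by default (counts divided by the cell total, multiplied by a scale factor, then log1p); the mitochondrial cutoff and the scale factor are left as parameters because their values are not recoverable from the text, and all names and example values are hypothetical.

```python
from math import log1p


def filter_and_log_normalize(cells, mito_genes, max_mito_fraction, scale_factor):
    """Drop high-mitochondrial cells, then log-normalize the rest.

    cells:             list of dicts mapping gene name -> raw UMI count
    mito_genes:        set of gene names treated as mitochondrial
    max_mito_fraction: cells above this fraction are removed (assumed value)
    scale_factor:      per-cell scaling constant (assumed value)
    """
    normalized = []
    for counts in cells:
        total = sum(counts.values())
        if total == 0:
            continue  # empty barcode, nothing to normalize
        mito = sum(n for gene, n in counts.items() if gene in mito_genes)
        if mito / total > max_mito_fraction:
            continue  # likely damaged or dying cell
        normalized.append(
            {gene: log1p(n / total * scale_factor) for gene, n in counts.items()}
        )
    return normalized


if __name__ == "__main__":
    toy_cells = [
        {"Actb": 8, "mt-Co1": 2},   # 20% mitochondrial, kept
        {"Actb": 1, "mt-Co1": 9},   # 90% mitochondrial, removed
    ]
    print(filter_and_log_normalize(toy_cells, {"mt-Co1"}, 0.5, 10.0))
```

regressing out umi counts and mitochondrial fraction, as done with scaledata in the text, is a separate linear-model step that this sketch does not cover.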
for both datasets, platelet clusters as well as a cluster of degraded cells (no specific signature and low umi count) were removed and the data were re-normalized without them. in the case of the old-keto and old-chow dataset we additionally removed neutrophils, doublets, and red blood cells. the most variable genes were detected using the findvariablegenes function. pca was run using only these genes. cells are represented with umap (uniform manifold approximation and projection) plots. we applied the runumap function to normalized data, using the first pca components. for clustering, we used the functions findneighbors and findclusters, which implement the snn (shared nearest neighbor) modularity optimization-based clustering algorithm, on the top pca components using a resolution of . for both datasets. to identify marker genes, the findallmarkers function was used with the likelihood-ratio test for single cell gene expression. for each cluster, only genes that were expressed in more than % of cells with at least . -fold difference (log-scale) were considered. for heatmap representation, the mean expression of markers inside each cluster, calculated by the averageexpression function, was used. single cell rna-seq differential expression to obtain differential expression between clusters, the mast test was performed via the findmarkers function on genes expressed in at least % of cells in both sets of cells, and p-value adjustment was done using a bonferroni correction (finak et al., ) . pathway analysis was performed using the clusterprofiler package (v . . ) (yu et al., ) with hallmark gene sets from msigdb. significantly different genes were used (padj < . ) if the percent difference between conditions (|pct. -pct. |) was over %. to visualize pathway expression for each cell, z-scores of all pathway genes were averaged. to separate αβ t cells and γδ t cells we subset raw values of t cells (cluster ), then normalized and clustered the corresponding data separately as described above with clustering resolution . . 
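the per-cell pathway score described above (z-scores of all pathway genes, averaged) can be sketched as follows. this is an illustration under stated assumptions: z-scores are computed per gene across cells with the population standard deviation, genes absent from the matrix or with zero variance are skipped, and the data layout and names are hypothetical rather than the authors' implementation.

```python
from statistics import mean, pstdev


def pathway_scores(expr, pathway_genes):
    """Average per-gene z-scores across a pathway, one score per cell.

    expr:          dict mapping gene name -> list of normalized values,
                   one value per cell (same cell order for every gene)
    pathway_genes: iterable of gene names in the pathway
    """
    n_cells = len(next(iter(expr.values())))
    totals = [0.0] * n_cells
    used = 0
    for gene in pathway_genes:
        values = expr.get(gene)
        if values is None:
            continue  # gene not measured
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # constant gene carries no information
        for cell, value in enumerate(values):
            totals[cell] += (value - mu) / sd
        used += 1
    return [t / used for t in totals] if used else totals


if __name__ == "__main__":
    toy = {"a": [1, 3], "b": [10, 30], "c": [5, 5]}
    print(pathway_scores(toy, ["a", "b", "c", "not_measured"]))
```

averaging z-scores rather than raw expression keeps highly expressed genes from dominating the pathway score.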
the obtained subclusters were projected on the original umap, splitting cluster into _ (αβ t cells) and _ (γδ t cells). monocyte clusters and were subset and re-analyzed in the same manner with clustering resolution . . umap was recalculated for monocytes only as shown in figure f . bulk rna sequencing of sorted γδ t cells cells from lung were prepared as mentioned above for flow cytometry and γδ t cells were sorted by flow cytometry (live cd + cd + cd -cd -tcr γ/δ+). rna was isolated from sorted cells using rneasy plus micro kit (qiagen). quality checked rna was used for rna sequencing library preparation at yale center for genome analysis following the manufacturer's instructions (illumina). libraries were sequenced on a novaseq. fastq files for each sample were aligned to the mm genome (gencode, release m ) using star (v . . a) with the following parameters: star --genomedir $genome_dir --readfilesin $work_dir/$file_ $work_dir/$file_ --runthreadn --readfilescommand zcat --outfiltermultimapnmax --outfiltermismatchnmax --outreadsunmapped fastx --outsamstrandfield intronmotif --outsamtype bam sortedbycoordinate --outfilenameprefix ./$ (dobin et al., ) . quality control was performed by fastqc (v . . ), multiqc (v . ) (ewels et al., ) , and picard tools (v . . ). quantification was done using the htseq-count function from the htseq framework (v . . ): htseq-count -f bam -r pos -s no -t exon $bam $annotation > $output . differential expression analysis was done using the deseq function from the deseq package (love et al., ) (v . . ) with default settings. the significance threshold was set to adjusted p-value < . . gene set enrichment analysis via the fgsea r package (sergushichev, ) (v . . ) was used to identify enriched pathways and plot enrichment curves. to calculate statistical significance, a two-tailed student's t test was used. levels of significance are indicated as follows: *p < . ; **p < . ; ***p < . ; ****p < . . 
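the adjusted p-value threshold used above for the bulk differential expression analysis depends on a multiple-testing correction. assuming the deseq package cited above is deseq2, its results are adjusted with the benjamini-hochberg procedure by default; the sketch below shows that procedure in isolation. the function name and the example p-values are illustrative, and deseq2's independent filtering step is not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

unlike the bonferroni correction used for the single-cell comparisons, this procedure controls the false discovery rate rather than the family-wise error rate.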
all statistical tests used % confidence interval and normal distribution of data was assumed. biological replication numbers for each experiment are indicated in each figure and figure legend. data are shown as mean ± s.e.m. graphpad prism software was used for all statistical tests to analyze experimental results.
expression analysis in hypothalamus of young uninfected and young infected mice (e), and old uninfected and old infected mice (f). error bars represent the mean ± s.e.m. two-tailed unpaired t-tests were performed for statistical analysis. * p < . ; ** p < . ; *** p < . ; **** p < . .
figure s a split by sample. color represents expression of mki . (e) summary of cluster-by-cluster differential expression comparison of young and old samples. each dot represents a gene, significant genes are shown in red. (f) gene set enrichment analysis of significantly up- and down-regulated genes described in figure s e. (g, h) umap as in figure s a split by sample. color shows average z-scores of genes in selected pathways.
figure c . color represents expression of trac. (c) summary of cluster-by-cluster differential expression comparison of old-kd and old-chow. each dot represents a gene, significant genes are shown in red. (d) percentage of each monocyte subset as identified in figure f relative to total number of monocytes in the corresponding sample. (e) umap plot as in figure f showing average z-scores of genes in ifn-alpha and ifn-gamma pathways. 
dampened nlrp -mediated inflammation in bats and implications for a special viral reservoir host
aging immunity may exacerbate covid-
htseq-a python framework to work with high-throughput sequencing data
immunometabolism of infections
aging-associated tnf production primes inflammasome activation and nlrp -related metabolic disturbances
integrating single-cell transcriptomic data across different conditions, technologies, and species
obesity and covid- severity in a designated hospital
inflammasome-driven catecholamine catabolism in macrophages blunts lipolysis during ageing
clinical characteristics of deceased patients with coronavirus disease : retrospective study
elevated glucose levels favor sars-cov- infection and monocyte response through a hif- alpha/glycolysis-dependent axis
the bone marrow protects and optimizes immunological memory during dietary restriction
pathogenesis of mouse hepatitis virus infection in gamma interferon-deficient mice is modulated by coinfection with helicobacter hepaticus
the cellular and molecular pathogenesis of coronaviruses
adipose tissue is a neglected viral reservoir and an inflammatory site during chronic hiv and siv infection
murine hepatitis virus strain produces a clinically relevant model of severe acute respiratory syndrome in a/j mice
a mouse-adapted model of sars-cov- to test covid- countermeasures
star: ultrafast universal rna-seq aligner
multiqc: summarize analysis results for multiple tools and samples in a single report
mast: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell rna sequencing data
age-related factors that affect b cell responses to vaccination in mice and humans
metabolic reprograming in macrophage polarization
tissue and cellular distribution of an adhesion molecule in the carcinoembryonic antigen family that serves as a receptor for mouse hepatitis virus
hydroxybutyrate deactivates neutrophil nlrp inflammasome to relieve gout flares
drivers of age-related inflammation and strategies for healthspan extension
the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov-
mechanisms underlying t cell ageing
prevalence of obesity and severe obesity among adults: united states
a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies
multi-phase approach to eradicate enzootic mouse coronavirus infection
mouse model of sars-cov- reveals inflammatory role of type i interferon signaling
pathogenesis of sars-cov- in transgenic mice expressing human angiotensin-converting enzyme
dietary intake regulates the circulating inflammatory monocyte pool
the role of adipocytes and adipocyte-like cells in the severity of covid- infections
a consensus covid- immune signature combines immuno-protection with discrete sepsis-like traits associated with poor prognosis. medrxiv
the phenotypic changes of γδ t cells in covid- patients. medrxiv
moderated estimation of fold change and dispersion for rna-seq data with deseq
longitudinal analyses reveal immunological misfiring in severe covid-
murine coronavirus spike protein determines the ability of the virus to replicate in the liver and cause hepatitis
hydroxybutyrate: a signaling metabolite. annual review of nutrition
sars-cov- infection fatality risk in a nationwide seroepidemiological study. medrxiv
serology-informed estimates of sars-cov- infection fatality risk in
multi-dimensional roles of ketone bodies in fuel metabolism, signaling, and therapeutics
hepatocyte-macrophage acetoacetate shuttle protects against tissue fibrosis
more bricks in the wall against sars-cov- infection: involvement of γ δ t cells
severe covid- is marked by a dysregulated myeloid cell compartment
an algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. biorxiv
sars-coronavirus open reading frame- b triggers intracellular stress pathways and activates nlrp inflammasomes
elevated calprotectin and abnormal myeloid cell
severe acute respiratory syndrome coronavirus orf a protein activates the nlrp inflammasome by promoting traf -dependent ubiquitination of asc
generation of a broadly useful model for covid- pathogenesis, vaccination, and treatment
virus-induced inflammasome activation is suppressed by prostaglandin d /dp signaling
antibody prevalence for sars-cov- in england following first peak of the pandemic: react study in , adults. medrxiv
sars-cov- infection of human ace -transgenic mice causes severe lung inflammation and impaired function
coronavirus mhv-a infects the lung and causes severe pneumonia in c bl/ mice
the ketone metabolite β-hydroxybutyrate blocks nlrp inflammasome-mediated inflammatory disease
canonical nlrp inflammasome links systemic low-grade inflammation to functional decline in aging
clusterprofiler: an r package for comparing biological themes among gene clusters
impaired nlrp inflammasome activation/pyroptosis leads to robust inflammatory cell death via caspase- /ripk during coronavirus infection
clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study
key: cord- -rvhwl ea authors: miyashita, l; foley, g; semple, s; grigg, j title: traffic-derived particulate matter and angiotensin-converting enzyme expression in human airway epithelial cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rvhwl ea
background the mechanism for the association between traffic-derived particulate matter less than microns (pm ) and cases of covid- disease reported in epidemiological studies is unknown. to infect cells, the spike protein of sars-cov- interacts with angiotensin-converting enzyme (ace ) on host airway cells. 
increased ace expression in lower airway cells in active smokers suggests a potential mechanism whereby pm increases vulnerability to covid- disease. objective to assess the effect of traffic-derived pm on human airway epithelial cell ace expression in vitro. methods pm was collected from marylebone road (london) using a kerbside impactor. a and human primary nasal epithelial cells were cultured with pm for h, and ace expression (median fluorescent intensity; mfi) was assessed by flow cytometry. we included cigarette smoke extract as a putative positive control. data were analysed by either mann-whitney test, or kruskal-wallis with dunn’s multiple comparisons test. results pm at μg/ml and μg/ml increased ace expression in a cells (p< . , . vs. medium control, respectively). experiments using a single pm concentration ( μg/ml) found increased ace expression in both a cells (control vs. pm , median (iqr) mfi; ( . to ) vs ( to ), p< . ), and in human primary epithelial cells ( ( to ) vs. ( to ), p< . ). culture of a cells with % cigarette smoke extract increased ace expression (n= , ( to ) vs. ( to ), p< . ). conclusion traffic-related pm increases the expression of the receptor for sars-cov- in human respiratory epithelial cells. expression may be upregulated by mediators. for example, culture of primary human airway cells with interferon alpha in vitro increases ace transcripts ( ) . since the effect of traffic-related pm on the expression of ace in human airway cell populations is not known, we sought, in this study, to assess ace expression in human airway epithelial cells exposed to traffic-derived pm in vitro. traffic-derived pm was collected as dry particles using a high-volume cyclone placed within metres of marylebone road, london, uk ( ). marylebone road is one of the most polluted roads in europe, with diesel trucks dominating near-road traffic-derived pm emissions ( ). 
in order to obtain milligram amounts of pm , sampling was done between to h per day on occasions between may and september (i.e. before the uk lockdown). pm samples were pooled and stored at room temperature in a sterile glass container. an aliquot of pm was diluted in dulbecco's phosphate-buffered saline (dpbs) to a final concentration of mg/ml and stored as a master stock at - °c. cigarette smoke extract (cse) was collected onto a cotton filter through a peristaltic pump (jencons scientific ltd., east grinstead, uk) at a fixed rate from two marlboro red cigarettes, as previously described ( ) . cse was extracted by vortexing in ml dpbs and stored at - °c as a % master stock. the human alveolar type ii epithelial cell line a was purchased from sigma- culture of a cells with fossil-fuel derived pm ( to µg/ml) for h resulted in a concentration-dependent increase in ace expression, with a significant increase at both µg/ml and µg/ml (n= , p< . , p< . vs. medium control, figure ). at µg/ml ace increased by fold (iqr to ). using a single concentration of pm of µg/ml, ace expression increased in figure b ). culture of a cells with % cse, a putative positive control, increased ace expression (mfi, n= , ( to ) vs. ( to ), p< . , figure ). in this study we found that pm , collected next to a major london road dominated by diesel traffic ( ), upregulates ace expression in a human type ii pneumocyte cell line (a cells). we also found that traffic-derived pm upregulates ace expression in human primary nasal epithelial cells, suggesting that this response occurs throughout the respiratory tract. one strength of the present study is that collection of traffic-derived pm by a high-volume cyclone obviated the need to extract pm from filters in solution, and we could therefore accurately determine pm concentrations used in cell culture studies. 
although the effect of pm on ace expression in human airway cells has not previously been reported, our findings are compatible with an animal study that reported lung ace protein expression in wild type mice increased by . fold at days post intratracheal instillation of urban pm . ( ) . a putative protective effect of increased pulmonary ace was suggested in this mouse model by complete recovery of pm-induced acute lung injury in wild type mice, and incomplete recovery in ace knockout mice ( ) . we therefore speculate that increased ace expression may, on one hand, be a beneficial response to pm exposure, but on the other hand presents a trojan horse to the sars-cov- virus. we included cse as a putative positive control, since leung et al ( ) there are limitations to this study. first, we did not determine whether increased in conclusion, this study provides the first mechanistic evidence that traffic-derived air pollution increases ace expression in human airway cells and therefore vulnerability to sars-cov- infection. we conclude that there is biological plausibility for epidemiological studies reporting an association between either pm or active smoking and covid- disease. effect association between short-term exposure to air pollution and covid- infection: evidence from china association of respiratory allergy, asthma and expression of the sars-cov- receptor, ace e-cigarette vapour enhances pneumococcal adherence to airway epithelial cells instillation of particulate matter . 
induced acute lung injury and attenuated the injury recovery in ace knockout mice ace- expression in the small airway epithelia of smokers and copd patients: implications for covid- cigarette smoke and platelet-activating factor receptor dependent adhesion of streptococcus pneumoniae to lower airway cells impact of copd and smoking history on the severity of covid- : a systemic review and meta-analysis smoking is associated with covid- progression: a meta-analysis adhesion of streptococcus pneumoniae to human airway epithelial cells exposed to urban particulate matter key: cord- - x yubt authors: sawmya, shashata; saha, arpita; tasnim, sadia; anjum, naser; toufikuzzaman, md.; rafid, ali haisam muhammad; rahman, mohammad saifur; rahman, m. sohel title: analyzing hcov genome sequences: applying machine intelligence and beyond date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: x yubt covid- pandemic, caused by the sars-cov- strain of coronavirus, has affected millions of people all over the world and taken thousands of lives. it is of utmost importance that the character of this deadly virus be studied and its nature be analysed. we present here an analysis pipeline comprising phylogenetic analysis on strains of this novel virus to track its evolutionary history among the countries uncovering several interesting relationships, followed by a classification exercise to identify the virulence of the strains and extraction of important features from its genetic material that are used subsequently to predict mutation at those interesting sites using deep learning techniques. in a nutshell, we have prepared an analysis pipeline for hcov genome sequences leveraging the power of machine intelligence and uncovered what remained apparently shrouded by raw data. covid- was declared a global health pandemic on march , [ ] . it is the biggest public health concern of this century [ ] . 
it has already surpassed the previous two outbreaks due to coronaviruses, namely, severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov). the virus behind this epidemic is known as severe acute respiratory syndrome coronavirus , or sars-cov- for short. it is a single-stranded rna virus, , to , bases long on average [ ] . the novel coronavirus is spherical in shape and has spike proteins protruding from its surface. these spikes attach to human cells, then undergo a structural change that allows the viral membrane to fuse with the cell membrane. the viral genome then enters the host cell and is copied there, producing multiple new viruses [ ] . as of mid-april, , about , high-quality complete genome sequences, collected from clinicians and researchers around the world, were present in the gisaid initiative database [ ] . to understand the viral evolution and the nature of its spread among different countries, we present an analysis pipeline for the genome sequences leveraging the power of machine intelligence. this paper makes the following key contributions.
a. an alignment-free phylogenetic analysis is carried out with the goal of uncovering the evolutionary history of sars-cov- . the resulting phylogenetic tree highlights evolutionary relationships that can be explained by facts and figures and further identifies some mysterious relationships.
b. several machine learning and deep learning models are used to identify the virulence of the strains (i.e., to classify a virus strain as either severe or mild). additionally, from the classification pipeline, important features are identified as sites of interest (sois) in the virus strains for further analysis.
c. several cnn-rnn based models are used to predict mutations at specific sites of interest (sois) of the sars-cov- genome sequence, followed by further analyses of the same on several south-asian countries.
d. overall, we present an analysis pipeline that can be further utilized as well as extended and revised (a) to study where a newly discovered genome sequence lies in relation to its predecessors in different regions of the world; (b) to analyse its virulence with respect to the number of deaths its predecessors have caused in their respective countries; and (c) to analyse the mutation at specific important sites of the viral genome.
figure : the whole analysis pipeline consists of three phases. in the first phase, the genome sequences are divided into subsets based on country and a phylogenetic tree is constructed considering only the "representative" sequences of each such subset using an alignment-free sequence comparison approach. in the second phase, we employ state of the art classification algorithms, leveraging both traditional and deep learning pipelines, to learn to discriminate the viral strains of many countries as either mild or severe. we also identify the features that contributed the most as the discriminant factor in the classification pipeline. finally, we use the identified features from the previous stage to predict the mutation of the interesting sites in the viral strain using a deep learning model.
figure presents our overall analysis pipeline. below we present the details of the pipeline. we have collected hcov genome sequences up to the date april, (cut-off date) from the gisaid initiative dataset [ ] . these are high quality complete viral genome sequences submitted by the scientists and scientific institutes of individual countries. we have also collected country-wise death statistics (up to the cut-off date) from the official site of who [ ] . 
the label was assigned based on a threshold of deaths, which is the estimated median of the number of deaths in the data points. any genome sequence of a country having deaths below (above) the threshold was considered a mild (severe) strain, i.e., assigned a label ( ). a sample labelling is shown in the supplementary table . we also considered some other metrics for labeling purposes, albeit with unsatisfactory output (please see supplementary file for details) . we divided the whole dataset into training and testing subsets in a / ratio with a balanced number of data points per class for the traditional machine learning pipeline; for the deep learning classification routine, we created training/validation/testing subsets in a / / ratio. figure : the viral genome sequences were divided into subsets of sequences based on country. for each subset, each viral genome sequence is converted into a vector representation and pairwise euclidean distance is calculated among the vectors to create the distance matrix. as the matrix is very high-dimensional, we used principal component analysis to find the principal component matrix from the distance matrix. representative sequences were identified through k-means clustering on the pca matrix, and a phylogenetic tree was constructed from the representative sequence of each country. we aim to identify and interpret the evolutionary relationships among the hcov genome sequences uploaded at gisaid from different regions around the globe ( figure ). to do that we have used an alignment-free genome sequence comparison method as proposed in [ ] and briefly described below. notably, we do not consider any alignment-based method since it is not computationally feasible for us to align thousands of viral sequences for analysis and clustering purposes [ ] . at first the sequence set is divided into subsets of sequences based on the location. all sequences are converted into a representative ℝ vector. 
pairwise distance among vectors derived from the fast vector method [ ] are computed using euclidean distance. due to the high dimensionality of the resulting distance matrix, we resort to principal component analysis (pca) technique [ ] to reduce the dimension of the matrix. subsequently, we use k-means clustering [ ] to identify the corresponding cluster centers. for the k-means clustering algorithm, we have used the implementation of [ ] and used the default parameters except for the number of clusters which were set to for determining the cluster center for each of the subsets. for each location-based cluster, the representative sequence (i.e., the "centroid" of the cluster) is then identified and used in the subsequent step of the pipeline. the evolutionary relationship among the representative sequences of different clusters (from section . ) has been estimated by constructing a phylogenetic tree. we have used the neighbor joining algorithm [ ] for phylogenetic tree construction since it is more reliable [ ] . we have used euclidean distance among the vectors, as described in the section . , to prepare the distance matrix. while we predominantly have used the alignment-free method of [ ] , in this stage, we have only representative sequences and hence we have also attempted a few other alignment-free and alignment-based methods to estimate the phylogenetic tree; however, these didn't produce satisfactory results (more details are in supplementary file). for traditional machine learning, we use a pipeline similar to [ ] (see figure in supplementary file). we extracted three types of features from the genomic sequence of novel sars-cov- . inspired by the recent works [ ] [ ] [ ] [ ] that focus only on sequences, we also extract only sequence-based features. these features are: position independent features, n-gapped dinucleotides and position specific features (see details in section of supplementary file). 
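the representative-sequence step described above (vectorize each sequence, build a pairwise euclidean distance matrix, then locate the cluster centre) can be sketched in a few lines. this is a simplified illustration under stated assumptions, not the authors' implementation: plain k-mer counts stand in for the fast vector method, the pca reduction is omitted, and a single cluster per subset is assumed, in which case the k-means centre is simply the mean of the rows. all function names are hypothetical.

```python
from itertools import product
from math import dist


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Count occurrences of every k-mer over the alphabet (stand-in encoding)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:          # skip k-mers with ambiguous bases
            counts[kmer] += 1
    return [counts[m] for m in kmers]


def distance_matrix(vectors):
    """Pairwise Euclidean distances between sequence vectors."""
    return [[dist(a, b) for b in vectors] for a in vectors]


def representative_index(rows):
    """Index of the row closest to the mean row (single-cluster centre)."""
    n = len(rows)
    centroid = [sum(col) / n for col in zip(*rows)]
    return min(range(n), key=lambda i: dist(rows[i], centroid))


if __name__ == "__main__":
    toy_subset = ["AAAA", "AAAT", "TTTT"]            # hypothetical sequences
    vectors = [kmer_vector(s, k=1) for s in toy_subset]
    rep = representative_index(distance_matrix(vectors))
    print(toy_subset[rep])
```

with real genomes the distance matrix grows quadratically with the number of sequences per country, which is why the text reduces its dimension with pca before clustering.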
we use the gini value of the extremely randomized tree (extra tree) classifier [ ] to rank the features. subsequently, only the features with gini value greater than the mean of the gini values are selected for training a lightgbm classifier model [ ] (with default parameters) with -fold cross validation. lightgbm is a highly efficient and fast gradient boosting framework which uses tree-based algorithms. we use shap values and univariate feature selection to compare the importance of the features. shap (shapley additive explanations) is a game theoretic approach which is used to explain the output of a model [ ] . univariate feature selection works by selecting the best features based on univariate statistical tests [ ] . we use selectkbest univariate feature selection to get the top k highest scoring features according to the anova f_classif feature scoring [ ] function. we leverage the power of different deep learning (dl) classification models, namely, vanilla cnn [ ] , alexnet [ ] and inceptionnet [ ] . we transform the raw viral genome sequences into two different representations, namely, k-mers spectral representation [ ] and one hot vectorization [ ] , to feed those into the dl networks in a seamless manner. details of these representations are given in section . of the supplementary file. for k-mers spectral representation we experimented with different values of k (k = , , for vanilla cnn and k = & only for the rest due to resource limitation). for one hot vectorization, we have trained inceptionnet for epochs for both - and -mers and trained alexnet for , and epochs for -, - and -mers respectively. we design a pipeline to predict mutation on specific sites (chosen in an earlier stage of the pipeline) in the sars-cov- genome (figure ). we follow a protocol similar to that of [ ] and adapt it to fit our setting as follows. 
we divide all the available countries and the states of the usa into different time-steps by the date of the first reported incidence of sars-cov- infected patients in that location. thus, every resulting time-step represents a date (tk for cluster k) and contains the clusters of genome sequences of the countries/states. the time series samples are then generated by concatenating sites from different time-steps one-by-one, so that each sample represents an evolutionary path of the sars-cov- viral strain. for example, t is the very first date when the virus was discovered in china, so this time-step contains only one country, china. likewise, time-step t contains clusters for those countries where the virus was discovered on date t , and so on (check table in the supplementary file for more details). we generate time series sequences by concatenating genome sites from t ,t ,....,tn (in our case, n = ) and then feed the samples to the model, which consists of a one-dimensional convolutional layer and a recurrent neural network (rnn) layer [ ] . we experiment with both pure lstm and bidirectional lstm as our rnn layer (see section . of the supplementary file). the model ends with a dense layer of neurons, which predicts the probability of the next base pair at the next time-step. so, in a nutshell, the model takes concatenated genome sequences from t ,t ,....,tn- as input and predicts the mutation for time tn. we further use our mutation prediction pipeline to identify and analyze possible parents of a mutated strain. for this particular analysis, we trained the models specifically for some south-asian countries, namely, bangladesh, india and pakistan. we only used the best performing model for this analysis and generated five time series samples. when generating these samples, the country/location having the minimal euclidean distance was taken for each time-step. we have implemented our experiments mostly in python.
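the construction of time series samples described above can be sketched as follows. this is a hedged illustration only: it assumes every time-step is given as a list of site strings (one per cluster first reported on that date), enumerates every path that picks one cluster per time-step, and uses all but the last time-step as input and the last as the prediction target. the function name and the toy data are hypothetical, and the actual sampling rule of the pipeline (see the supplementary file) may differ.

```python
from itertools import product


def build_samples(time_steps):
    """Return (input, target) pairs, one per path through the time-steps.

    time_steps[k] lists the site strings of the clusters in the k-th time-step.
    """
    if len(time_steps) < 2:
        raise ValueError("need at least one input time-step and one target time-step")
    samples = []
    for path in product(*time_steps):
        inputs = "".join(path[:-1])  # concatenated sites of all earlier time-steps
        target = path[-1]            # site at the final time-step, to be predicted
        samples.append((inputs, target))
    return samples


# toy example: one cluster at the first date, two at the second, one at the third
steps = [["ACG"], ["ACG", "ATG"], ["ATG"]]
samples = build_samples(steps)
```

restricting each time-step to the location with the minimal euclidean distance, as done for the south-asian analysis, would amount to passing a single-element list per time-step.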
we have used the scikit-learn library [ ] for clustering and plotting the graphs. for the deep learning models, the scikit-learn, tensorflow and keras neural network libraries are used, and for the lightgbm classifier, the python lightgbm framework has been used. the phylogenetic trees are constructed using the dendropy library of python [ ] , keeping default parameters. we use the tree visualizer tools dendroscope [ ] and evolview [ ] for tree visualization and annotation. the experiments have been conducted on the following machines: a) clustering and phylogenetic analyses have been carried out on a machine with intel(r) core (tm) i - u cpu @ . ghz, ubuntu . os and gb ram. b) experiments involving the deep learning pipelines (i.e., both classification and mutation prediction) have been conducted on the work-stations of the galileo cloud computing platform [ ] and the default gpu provided by the google colaboratory cloud computing platform [ ] . c) the lightgbm classifier model was trained on a machine with intel core i - u cpu @ . ghz x , windows os and gb ram. all the code and data (except for the genome sequences) of our pipeline can be found at the following link: https://github.com/pythonloader/analyzing-hcov-genome-sequence. the genome sequence data have been extracted from, and are publicly available at, gisaid [ ] . we identify the representative sequence of each of the countries present in the gisaid dataset (up to the cut-off date). the estimated phylogenetic tree constructed from the representative sequences is shown in figure . in what follows, we will refer to this tree as the sc (sars-cov- ) tree. the phylogenetic tree generated is expected to reveal the evolutionary relationship of the viral strains. however, careful scrutiny yields some apparently unusual but interesting observations. for example, it is generally expected that countries sharing (open) borders (e.g., countries in europe) should be either neighbours or at least in the same clade in the tree.
however, surprisingly, geographically adjacent countries in europe do not appear as neighbors in the tree; rather we see, for example, that china and italy are immediate neighbors. it is to be noted that these two countries are also the first countries to have been hit by the first pandemic wave. in addition, although the usa and canada share the longest un-militarized international border in the world, their representative strains do not appear as sister branches, as would have been expected. also, we notice that the usa, uk, canada, turkey and russia, which have a higher number of deaths than most of the other countries, are in the same clade. all our classifiers are trained to learn whether a given strain is mild or severe. the classification accuracy of the lightgbm classifier (~ %) is superior to that of the deep learning classifiers (~ - %), which, while somewhat surprising, is in line with the recent findings of [ ] . it should be noted that lightgbm produced better results in significantly less time than the deep learning models for this dataset. the results of the classifier models are shown in figure . quantitative results aside, we have also applied our classifiers to the sequences that were deposited at gisaid after the cut-off date (i.e. april , ). since the cut-off date, the country-wise death statistics [ ] have changed significantly, and this has pushed a few countries, particularly from asian regions, and several states of the united states of america to transition from the mild to the severe state (based on our predefined threshold). interestingly, our classifiers have been able to predict the severity of the new strains submitted from these countries/states correctly. table in the supplementary file shows a snapshot of a few such countries/states with the relevant information. we preliminarily identify the top features of shap and selectkbest feature selection (with k= ).
from these features, we have selected as sois the features that are also biologically significant, i.e., that cover different significant gene expression regions ( figure ). in particular, we have selected the position specific features pos_ _ , pos_ _ , pos_ _ and pos_ _ as the sois for the mutation prediction analyses down the pipeline. here, pos_x_y indicates the site from positions x to y of the virus strains. the reasons for selecting these features as sois are outlined below. according to gene expression studies [ ] [ ], our sois pos_ _ and pos_ _ encode two non-structural proteins, nsp and nsp , respectively, and our other two sois, pos_ _ and pos_ _ , correspond to the spike protein of sars-cov- . nsp binds to viral rna, nucleocapsid protein, as well as other viral proteins, and participates in polyprotein processing. it is an essential component of the replication/transcription complex [ ] . so, a mutation in this protein is expected to affect the replication process of sars-cov- in host bodies. on the other hand, the spike protein sticks out from the envelope of the virion and plays a pivotal role in receptor host selectivity and cellular attachment. according to wan et al., there exists strong scientific evidence that sars and sars-cov- spike proteins interact with angiotensin-converting enzyme (ace ) [ ] . a mutation in this protein is expected to have a significant impact on human to human transmission [ ] . therefore, it is certainly interesting and useful to predict the mutation of such sois. cnn-lstm and cnn-bidirectional lstm performed similarly for the different sois of the genome, registering . % and % accuracy, respectively, considering all sois together. for detailed results please check table and table of the supplementary material. for the model involving only bangladesh, we applied the cnn-bidirectional lstm model (as this is the better performer of the two) and achieved almost % accuracy.
then we analyzed the ancestors in the time series test samples and noticed that some of the states of the usa are present in these samples. these states are california, massachusetts, texas, new jersey and maryland. for india and pakistan, we got similar results for some sites but for other sites, accuracy was not as high as bangladesh (check table of the supplementary file for details). our analyses reveal a very close (evolutionary) relationship between the genome sequences of china and italy. also, similarity was found among the virus strains of the usa, germany, qatar and poland. these countries have similar numbers of deaths and although not geographically directly adjacent (except for germany and poland) they have strong air connectivity among them. in fact, a number of interesting relationships can be inferred from the estimated phylogenetic tree as follows. chinese tourists [ ] . this relationship is clearly portrayed in the sc tree where the two strains appear to be immediate siblings. . poland's strain is in the same clade as that of germany, which can be explained by the fact that its strain (through poland's patient zero) came from germany [ ] . . taiwan is geographically very close to china. the virus was confirmed to have spread to taiwan on january , , through a -year-old woman who had been teaching in wuhan, china [ ] . the virus strains from these regions are also close together as can be seen from the sc tree, about branches apart. similar relationship can also be inferred from the tree between china and south korea: the strain of the virus in south korea is believed to be transmitted from china firstly through a -year old chinese woman and secondly by a -year old south korean national [ ] . interestingly, from the sc tree it can also be deduced that the south korean strain is very close to that of taiwan and also near to the strain from china. 
the incident of a taiwanese woman being deported from south korea after refusing to stay at a quarantine facility can be a probable explanation as to how the south korean strain might have found its path to taiwan [ ] . . on march , , the virus was confirmed to have reached portugal, when it was reported that a portuguese year-old man working in spain had tested positive for covid- after returning home [ ] . subsequently, within a span of days, more cases were reported, all originating from spain [ ] [ ] . the fact that the first cases of covid- in portugal originated from spain is clearly captured in our sc tree. . the sc tree suggests that india's strain is closely related to that from china and also italy (around branches) and that it is also connected to that from saudi arabia. these relationships can be explained as follows. a . turkey's first identified case was a man who was travelling in europe [ ] . turkey also announced a huge number of cases and subsequent deaths, which originated from europe [ ] . in our inferred relationship, we can see that the turkish representative strain is quite close to those of several central and western european countries like russia, iceland and ireland, which can be backed up by the two facts stated above. . it is visible from the sc tree that the strain of germany is very close to the strains of both poland and the usa. it might be the case that community transmission occurred concurrently in both the usa and poland from germany, which hit the peak of the pandemic before both the usa and poland [ ] . . qatar has the second highest number of covid- patients in the middle-east [ ] . the first case in qatar, reported on february , , was a man working in iran [ ] . qatar introduced a travel ban to and from germany and the usa as a precautionary measure in mid-march, quite a while after the first occurrence. qatar has air-routes with germany and the usa, with more than airlines operating on those routes [ ] [ ] .
though the first case originated from iran, it might be the case that subsequent patients were found to have travelled from the aforementioned countries, as a result of which the travel ban was introduced. our estimated sc tree places qatar very close to both the usa and germany. . while we can certainly explain many of the relationships identified by the estimated sc tree above, there are some relationships which are not that apparent. one such example is the direct relationship between vietnam and greece. while apparently there exists no direct relationship, further investigation revealed something interesting. patient zero of greece is believed to have been infected during her trip to the milan fashion week, which took place during february - , [ ] . interestingly, the first covid- patient in hanoi [ ] left hanoi on february to visit family members living in london, england, and three days later she traveled from london to milan. could she have been in contact with patient zero of greece, or with anyone else infected by the latter, before returning to london on february ? we cannot be certain, but our inferred relationship between vietnam and greece certainly lends legitimacy to that question. . finally, we are unable to find, in the reported news sources, any apparent explanation for a few other strong relationships inferred by the tree (e.g., congo-iran, panama-malaysia, sweden-singapore, japan-australia, etc). this could be because of the inherent inaccuracies of the distance matrices as well as the limitations of the tree estimation algorithms: none of these algorithms are % accurate. from another angle, perhaps the tree did identify these relationships correctly, but the relevant incidents were not accurately identified or not documented. in recent times, the number of deaths has been increasing rapidly in india. we have been closely following the change in the virus strains of india before and after the cut-off date.
a genome sequence (epi_isl_ ) was collected on april , (before our cut-off date) from a patient in ahmedabad, gujarat, india. it was predicted to be a severe strain (with low confidence) even though at that time we trained the classifier to consider the indian sequences as mild. according to our inferred evolutionary relationship, india is very close to both italy and china. so, we calculated the distance between this strain and the representative sequences of both italy and china. we then considered another strain (epi_isl_ ), which was collected from another patient from the same place in india on april , (after our cut-off date), and predicted its severity. the classifiers declared this isolate to be severe with very high confidence (about %). we calculated the distances as before. interestingly, this isolate is closer to the representative sequences of both italy and china than the previous, less severe one. this strongly suggests that there were some mutations that turned the indian sequences from mild or less severe to severe or highly severe, respectively. also, the sequences from the us states of pennsylvania, maryland, indiana, illinois and florida that were collected on may , (about one month after our cut-off date) were analyzed, and our classifiers could correctly capture the severity of the genome sequences (see table in the supplementary file). we conduct an analysis to predict possible parents of the (mutated) virus strains of the south asian region (bangladesh, india and pakistan). our mutation prediction pipeline suggests that the strains of some states of the usa, namely, california, massachusetts, texas, new jersey and maryland, could be the parents/ancestors of these south asian strains. now, the total deaths in these states up to june , are , , , and respectively [ ] and the strains thereof are also classified to be severe by our classification pipeline.
it thus seems quite likely that the sars-cov- situation in these south-asian countries will worsen in near future. bangladesh, india and pakistan are ranked th , th and nd in global health performance compared to the united states of america which is at the th position [ ] . in the majority of lower middle-income countries such as bangladesh, india and pakistan, available hospital beds are < bed per population and icu beds are < bed per , population [ ] . additionally, an uncontrolled epidemic is predicted to have , , deaths having a duration of nearly days in the majority of these countries [ ] . these predictions coupled with our findings call for stern actions (i.e., interventions) on part of these countries. bibliography: covid- ) outbreak situation genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding cryo-em structure of the -ncov spike in the prefusion conformation alignment-free sequence comparison: benefits, applications, and tools a novel fast vector method for genetic sequence comparison who coronavirus disease (covid- ) dashboard a deep learning approach to dna sequence classification dna sequence classification by convolutional neural network principal component analysis and factor analysis. (n.d.). 
principal component analysis springer series in statistics tempel: time-series mutation prediction of influenza a viruses via attention-based recurrent neural networks dendroscope : an interactive tool for rooted phy-logenetic trees and networks crisprpred(seq): a sequence-based method for sgrna on target activity prediction using traditional machine learning extra tree forests for sub-acute ischemic stroke lesion segmentation in mr sequences isgpt: an optimized model to identify sub-golgi protein types using svm and random forest based feature selection lightgbm: a highly efficient gradient boosting decision tree vietnam confirms th covid- patient -vnexpress international india confirms its first coronavirus case kerala defeats coronavirus; india's three covid- the weather channel, the weather channel india's first coronavirus death is confirmed in karnataka coronavirus: india 'super spreader' quarantines , people , indians quarantined after 'super spreader' ignores government advice responding to covid- -a once-in-a-century pandemic? data, disease and diplomacy: gisaid's innovative contribution to global health evolview, an online tool for visualizing, annotating and managing phylogenetic trees why neighbor-joining works coronavirus, primi due casi in italia: sono due turisti cinesi koronawirus w lubuskiem. godziny, dwa razy za wolno. 
daleko do laboratorium taiwan confirms st wuhan coronavirus case (update) austria's coronavirus cases are italian citizens greece confirms first coronavirus case, a woman back from milan as coronavirus takes hold, greece worries about migrant camps turkey remains firm, calm as first coronavirus case confirmed human mitochondrial genome compression using machine learning techniques google colaboratory the neighbor-joining method: a new method for reconstructing phylogenetic trees scikit, scikitlearn.org/stable/modules/generated/sklearn.cluster.kmeans.html dynamic interventions to control covid- pandemic: a multivariate prediction modelling study comparing worldwide countries imagenet classification with deep convolutional neural networks going deeper with convolutions europe's coronavirus numbers offer hope as us enters 'peak of terrible pandemic' algorithm as : a k-means clustering algorithm consistent individualized feature attribution for tree ensembles greece's 'patient zero' shares coronavirus experience (lead) taiwanese woman deported for refusing to stay at quarantine facility sağlık bakanı fahrettin koca: pozitif Çıkan yeni vakalarımız var -türkiye haberleri flights from qatar, www.qatar.to/united-states/qatar-to-united-states ministra confirma primeiro caso positivo de coronavírus em portugal scikit, scikitlearn.org/stable/modules/feature_selection.html#univariate-feature-selection nsp of coronaviruses: structures and functions of a large multi-domain protein receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus role of changes in sars-cov- spike protein in the interaction with the human ace receptor: an in silico analysis measuring overall health system performance for countries. global programme on evidence forhealth policy discussion paper no. 
qatar reports first case of coronavirus sklearn.feature_selection.f_classif ¶ dendropy: a python library for phylogenetic computing flights from qatar, www.qatar.to/germany/qatar-to-germany flights from qatar, www.qatar.to/united-states/qatar-to-united-states single-stranded rna genome of sars-cov sars-cov- (severe acute respiratory syndrome coronavirus ) sequences antigenic: an improved prediction model of protective antigens dpp-pseaac: a dna-binding protein prediction model using chou's general pseaac key: cord- - n tsk authors: roy, susmita title: dynamical asymmetry exposes -ncov prefusion spike date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n tsk the novel coronavirus ( -ncov) spike protein is a smart molecular machine that instigates the entry of coronavirus to the host cell causing the covid- pandemic. in this study, a structural-topology based model hamiltonian of c symmetric trimeric spike is developed to explore its complete conformational energy landscape using molecular dynamic simulations. the study finds -ncov to adopt a unique strategy by undertaking a dynamic conformational asymmetry induced by a few unique inter-chain interactions. this results in two prevalent asymmetric structures of spike where one or two spike heads lifted up undergoing a dynamic transition likely to enhance rapid recognition of the host-cell receptor turning on its high-infectivity. the crucial interactions identified in this study are anticipated to potentially affect the efficacy of therapeutic targets. one sentence summary inter-chain-interaction driven rapid symmetry breaking strategy adopted by the prefusion trimeric spike protein likely to make -ncov highly infective. movement that generates the 'up' and 'down' conformations ( , , ) . other betacoronaviruses, like sars-cov, mers-cov and distantly related alphacoronavirus porcine epidemic diarrhea virus (pedv) also have this apparently stochastic rbd movement ( , ) . 
the combination of rbd up-down rearrangements may lead each s -head of the trimeric prefusion spike protein of coronavirus to adopt different possible conformations: (i) down, (ii) up- down, (iii) up- down, and (iv) up (fig. c) . among them, down and up are symmetric conformers, and up- down and up- down are asymmetric conformers. single-particle cryo-electron microscopy (cryo-em) has determined a few such symmetric and asymmetric structures, referred to as the receptor-binding inactive state and the receptor-binding active state, respectively ( ) . the asymmetric structure where one of the rbds rotates up was thought to be less stable for sars-cov s ( ) . in comparison, the recent cryo-em study found three rbds in the up- down conformation as a predominant arrangement in the prefusion state of the -ncov s trimer ( ) . this arrangement appears plausible for sars-cov- s, as it would explain the higher affinity of up- down for the ace receptor than that of sars-cov s. however, we cannot rule out the possibility of the up- down conformation as a functional state, which may provide even stronger binding with ace considering the fact that ace is a dimeric receptor ( , ) . this hypothesis is consistent with a recent crystallographic study demonstrating that cr , a neutralizing antibody isolated from convalescent sars patients, targets the rbd when at least two rbds on the trimeric spike protein are in the up conformation ( ) . given all these experimental results, it is high time to understand the molecular mechanism of s -head coordination in trimeric sars-cov- s and to identify the important interactions regulating the spike up-down conformations. a schematic of receptor-bound spike protein including the receptor-binding subunit s , the membrane-fusion subunit s of a coronavirus is shown. b. side and top views of the homo-trimeric structure of sars-cov- spike protein with one rbd of the s subunit head rotated in the up conformation. c.
rbd up-down movement expected to lead s heads of the trimeric spike protein to attain the following possible conformers: (i) down (ii) up- down (iii) up- down, and (iv) up. these are an analogue demonstration of the spike protein top-view where ntds are represented by colored ovals, rbds are represented by flexible sticks and s domains are represented by filled circles. a major challenge was simulating the gigantic structure of the full-length trimeric spike, as it is associated with the largescale conformational transition. it is indeed a daunting task to explore the full conformational landscape at an atomic length-scale. to overcome this, a structure-based coarse-grained molecular dynamic simulation approach has been adopted ( ) . the simulation started with a full-length homo-trimeric spike protein structure generated from homology modeling which involves the alignment of a target sequence and a template structure (pdb: vsb) ( , ) . this also helped to build the missing loops. the domain-specific residuerange for the full-length, trimeric sars-cov- s is given in fig. a . the s head coordination of the trimeric spike is programmed by developing a super-symmetric topology-based modeling framework ( fig. b ) (described in the method pipeline in the supplementary material). with this, the molecular machine is ready to swing each of its s head between its 'up' and 'down' conformations (movie s , s ). a number of cryo-em structures captured the 'up' and 'down' conformations of the rbd domain of spike proteins of other coronaviruses including sars-cov- where the s subunit undergoes a hinge-like conformational movement prerequisite for receptor binding (fig. c) ( , , , ) . apart from the hinge-responsive rbd-cleft interaction, in this study, a few inter-chain interactions are found to assist the 'rbd-up' and the 'rbd-down' conformations (shown in fig. d and e, movie s ). these few interactions are identified to impact the breathing of rbd of sars-cov- s. 
this makes the 'rbd-up/down' conformations referred to earlier slightly different from the 's -head-up/down' conformations for trimeric sars-cov- s, as the former are regulated only by intra-chain interactions while the latter are regulated by both intra and inter-chain interactions (fig. s ). after identifying all these unique intra and inter-chain contacts ( , ) extracted from the corresponding 's -head-up' and 's -head-down' conformations, a super-symmetric contact map is generated. this is followed by the development of a structure-based model hamiltonian (materials and methods in supplementary), which is based on the energy landscape theory of protein folding ( ) ( ) ( ) ( ) ( ) . this approach not only enables the trimeric spike to adopt the c symmetric ' up' and ' down' states but also allows it to break the symmetry in a thermodynamically governed way ( fig. s -s ) ( , ) . residue-residue native contact map identifying unique intra and inter-chain contact-pairs formed by any single monomer in its s -head up and s -head down states. c. within intra-chain contacts, the unique contacts that drive hinge motion leading to rbd-up and rbd-down states are highlighted in the structure, as well as in the contact map. d. inter-chain unique contacts between rbd and ntd domains upholding the s -head-up state. e. inter-chain unique contacts are responsible for connecting the rbd of chaina with the s -stalk of chainb and the s stalk of chainc. to monitor the transition between the 's -head-up' and the 's -head-down' states for each monomer with the trimeric interactions, a large pool of unbiased long-time trajectories was generated, in which multiple occurrences of up and down states for each monomer have been sampled. we employ a reaction coordinate, q, the fraction of native contacts ( , ) , computed over the inter-chain contacts associated with the 's -head-up' and the 's -head-down' states.
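the reaction coordinate q introduced above can be illustrated with a short python sketch. it is a minimal example under stated assumptions: a contact counts as formed when the current bead-bead distance is within a tolerance factor of its native distance, the factor of 1.2 is a commonly used choice rather than a value taken from this study, and the function and argument names are hypothetical.

```python
import math


def fraction_native_contacts(coords, native_contacts, tolerance=1.2):
    """Fraction of listed contacts that are formed in the given frame.

    coords: list of (x, y, z) bead positions for one simulation frame.
    native_contacts: list of (i, j, r0) with bead indices and native distance r0.
    tolerance: assumed cutoff factor; a contact is formed if r_ij <= tolerance * r0.
    """
    if not native_contacts:
        raise ValueError("native_contacts must not be empty")
    formed = 0
    for i, j, r0 in native_contacts:
        if math.dist(coords[i], coords[j]) <= tolerance * r0:
            formed += 1
    return formed / len(native_contacts)


# toy frame with three beads and two listed contacts
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
contacts = [(0, 1, 1.0), (0, 2, 2.0)]
q = fraction_native_contacts(frame, contacts)
```

evaluating the function once with the contact list of the 's -head-up' state and once with that of the 's -head-down' state gives the two coordinates along which a trajectory can be tracked.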
a typical trajectory plot of q extracted from the equilibrium simulation of the trimeric prefusion spike clearly shows the hopping between different conformational states as hypothesized earlier (fig. a) . furthermore, the dynamic transitions between the two major asymmetric states ( up- down: q s -head-down ≈ . and up-down: q s -head-down ≈ . ) are evident in the q-trajectory. analysis of all the simulations yields the -d free energy landscape of the trimeric spike protein of sars-cov- ( fig b) with its all possible conformations. the conformations corresponding to the minima of the free energy landscape are shown in fig. c . the temperature dependence of conformational transition indicates that the configurational entropy and enthalpy compensation results in the enhanced population of the asymmetric up- down to up- down conformations ( fig s ) . while the predominant population of the up- down state is consistent with the recent cryo-em data, ( ) (movie s , s ) the other asymmetric structure ( up- down) emerges as a best binding epitope for cr (an antibody collected from convalescent sars patients) according to a recent antibody recognition study of sars-cov- s ( ). .conformational transition of sars-cov- spike protein in its prefused state. a. the fraction of native contact (q) dynamics counting inter-chains contact-pairs formed in the s head-up state and the s -head-down state. b. a two-dimensional free energy landscape of conformational transition as a function of inter-chain contacts supporting s -head-down (x-axis) and s -head-up state (y-axis) explores all possible conformations. c. the representative structure corresponding to each minimum of the free energy landscape is designated as follows: (i) up, (ii) up- down, (iii) up- down, and (iv) down state (as shown in the one-dimension population distribution plot). a. 
unique inter-chain interactions formed by the rbd of one chain with the ntd of the adjacent chain stabilizing the s -head-up conformation in sars-cov- s (pdb: vsb). inter-chain domain closure is analyzed by inter-chain proline-proline distance measurement. the same distance measured for the following spikes: b. sars-cov spike (pdb: x b) and c. mers-cov spike (pdb: x f). d. rbd up-down hinge dynamics triggered by the inter-chain rbd-ntd domain interaction. e. in the absence of the rbd-ntd inter-chain interaction, the hinge motion of the rbd is hindered: more 'rbd-down' conformations are populated and the 'rbd-up' conformation is sampled only rarely, in a stochastic manner. in this study, a sequence and interaction level (fig. s , fig. s ) comparison has been made over the cryo-em structures of sars-cov- s (pdb: vsb), sars-cov s (pdb: x b) and mers-cov s (pdb: x f) ( , ) . this comparison shows that sars-cov- s has an ntd-rbd domain association in which a proline residue of chaina forms a ch-π type interaction with the tyrosine residue ( ) and a hydrophobic interaction with another proline of chainb (fig. a, fig. s ). inter-chain proline-proline distance measurement shows that the corresponding rbd-ntd domains are far apart in the case of sars-cov s (fig. b ) and further apart in the case of mers-cov s (fig. c ). this measurement uses their respective cryo-em structures. despite the relatively high degree of sequence similarity between sars-cov- s and sars-cov s, and also the spike protein from the bat coronavirus ratg , a single histidine residue at the relevant rbd-ntd domain interface is found to be unique in the case of sars-cov- s (fig. s ) ( ) . the imidazole ring of this histidine points towards the hydrophobic assembly of the aforesaid proline-tyrosine, in juxtaposition to the rbd-s hinge region. such an inter-chain rbd-ntd connection is thus found to impact the rbd hinge interaction by promoting more rbd-up conformations (fig. d ).
in the absence of such an inter-chain interaction, the rbd mostly stays in the down conformation, allowing the rbd to break the symmetry only rarely, in a stochastic manner (fig. e) . the absence of the inter-chain rbd-ntd connection also appears to impact the sars-cov rbd hinge interaction. here, the opening of the rbd-s cleft is significantly less than that of sars-cov- s in their respective s -head-up states (fig. s ). the inter-chain rbd-s -stalk interfacial contacts are also found to modulate the population dynamics of the rbd-down conformation (fig. s ). the influence of this inter-chain rbd-s -stalk interaction has also been observed in an earlier cryo-em analysis, where two proline mutations at the top of the s stalk (inferring an rbd-s inter-chain connection) helped to stabilize the 'up' conformers of sars-cov s ( ) . the synergy between internal rbd-hinge interactions and inter-chain interactions allows trimeric sars-cov- s to adopt a dynamical feature distinct from those of other coronavirus spikes. it appears that the rapid symmetry breaking strategy driven by inter-chain interactions enables this spike machine to turn on its high infectivity. the energy landscape framework used in this study indeed helps to unify and compare different spike protein interactions present in other coronaviruses. while developing diagnostics and antiviral therapies is of utmost priority in the current situation, the information derived from the present structure-based model at the microscopic interaction level might provide deep insight for designing effective decoys or antibodies to fight against -ncov infection. movies s to s method pipeline of building a super-symmetric contact map of sars-cov- prefusion spike protein. coarse-grained structure-based simulations have been performed for the full-length trimeric sars-cov- spike protein.
the structure-based hamiltonians for the different simulations were derived after processing the recent cryo-em structure (pdb: vsb) through swiss-model to complete the loops missing from the structure ( , ) . this generates a homo-trimeric sars-cov- spike whose initial structure contains the important intra- and inter-chain contacts (interactions) leading to an 's -head-up' and an 's -head-down' conformation for each protomer. in this prevalent trimeric variant, only one monomer adopts the 's -head-up' conformation and the other two adopt the 's -head-down' conformation. a few characteristic intra-chain contacts cause the receptor-binding domain to perform a hinge motion, resulting in 'rbd-up' and 'rbd-down' conformations driven by $C_{intra}$ as defined in the pipeline method. contact calculation is performed using the shadow criterion ( ) . the components of interest are the inter-chain contacts residing at the dimer interfaces. there are two categories of interacting dimeric interfaces: the asymmetric-dimer interface and the symmetric-dimer interface. chaina (s -head-up) and the adjacent chainb (s -head-down) represent an asymmetric dimer unit. similarly, chainb (s -head-down) and the adjacent chainc (s -head-down) represent a symmetric dimer unit. at the asymmetric-dimer interface, the rbd domain of chaina forms a few unique contacts with the ntd domain of the adjacent chainb as shown in has been cycled over all the interfaces, making each of the interfaces dynamically capable of inducing s -head movement. developing a structure-based hamiltonian of trimeric spike protein simulation: a structure-based hamiltonian of the trimeric spike protein of sars-cov- is derived using the super-symmetric contact map. in the current structure-based model, amino acids are represented by single beads at the location of the c-α atom ( , , , ) .
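the idea of cycling interface contacts over all three interfaces can be sketched as below. this is an illustrative assumption about how such a super-symmetric contact list might be built, not the authors' pipeline: it assumes the three chains share the same residue numbering (as in a homo-trimer), and the names `CHAINS`, `rotate_chain` and `symmetrize_contacts` are hypothetical.

```python
CHAINS = ("A", "B", "C")  # chain labels of the homo-trimer (assumed)


def rotate_chain(chain, shift):
    """Map a chain label to the chain `shift` positions further around the trimer."""
    return CHAINS[(CHAINS.index(chain) + shift) % len(CHAINS)]


def symmetrize_contacts(contacts):
    """Copy every inter-chain contact onto all three interfaces.

    `contacts` is an iterable of ((chain_i, residue_i), (chain_j, residue_j)).
    A contact seen at the A-B interface is also assigned to the B-C and
    C-A interfaces by cyclic permutation of the chain labels.
    """
    symmetric = set()
    for (chain_i, res_i), (chain_j, res_j) in contacts:
        for shift in range(len(CHAINS)):
            symmetric.add(
                ((rotate_chain(chain_i, shift), res_i),
                 (rotate_chain(chain_j, shift), res_j))
            )
    return symmetric
```

with such a list, contacts that stabilize the head-up arrangement at one interface are available at every interface, so any of the three protomers can in principle undergo the transition.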
the coarse-grained structure-based model is well established and offers a way to investigate the mechanisms associated with protein folding and function ( - , , - ) . in the current context of decoding the virus entry mechanism, this model has successfully characterized class-i viral fusion protein dynamics, including the conformational rearrangement of a viral surface glycoprotein, influenza hemagglutinin (ha), between its prefusion and postfusion states ( , ) . as described in the pipeline method, the complete hamiltonian comprises two terms, $H^{A/B/C}_{intra}$ and $H_{inter}$ (with intra-chain up, down, and shared contacts). the first, non-local term, $H^{A/B/C}_{intra}$, represents the non-bonded interaction potential in the form of a - lennard-jones potential that describes the interactions stabilizing the native contacts ( ) . a native contact is defined, using the shadow criterion, for a pair of residues (i and j) present in the native state when (i−j)> . $\Delta_{ij}$ is defined such that if residues i and j belong to $C_{intra}$, $\Delta_{ij}$ = , turning on the - lennard-jones potential; otherwise $\Delta_{ij}$ = . for all non-native pairs, for which $\Delta_{ij}$ = , a repulsive potential with σ = Å is used. all the interaction coefficients used in this potential are given in table s . as described in the method pipeline, $H_{inter}$ includes only the non-local inter-chain contacts residing at the dimer interfaces, which comprises of accounting for asymmetric-dimer similar to our early approach, $\Delta_{ij}$ is defined such that if residues i and j belong to $C_{inter}$, $\Delta_{ij}$ = , turning on the - lennard-jones potential; otherwise $\Delta_{ij}$ = . here, inter to begin every simulation, an initial structure is energy-minimized under the structure-based hamiltonian using the steepest descent algorithm. the atomic coordinates of the energy-minimized structure have been evolved using langevin dynamics with a time step of . $\tau_R$ . we used an underdamped condition for rapid sampling ( ) .
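the switching role of Δ ij in the non-bonded term can be illustrated with a short sketch. the exponents of the lennard-jones form are not legible in the text above; the code below assumes the 10-12 form commonly used in c-α structure-based models, and every parameter value a caller supplies (`r0`, `epsilon`, `sigma_nc`, `epsilon_nc`) is a placeholder rather than a value from table s . the function names are hypothetical.

```python
def native_contact_energy(r, r0, epsilon=1.0):
    """Attractive contact term, assumed 10-12 Lennard-Jones form.

    The minimum lies at r = r0 (the native distance) with depth -epsilon.
    """
    ratio = r0 / r
    return epsilon * (5.0 * ratio ** 12 - 6.0 * ratio ** 10)


def repulsive_energy(r, sigma_nc, epsilon_nc=1.0):
    """Purely repulsive excluded-volume term for non-native pairs."""
    return epsilon_nc * (sigma_nc / r) ** 12


def pair_energy(r, delta_ij, r0, sigma_nc, epsilon=1.0, epsilon_nc=1.0):
    """Non-bonded energy of one residue pair.

    delta_ij = 1 switches on the native-contact potential (pair is in the
    contact map); delta_ij = 0 leaves only the repulsive term.
    """
    if delta_ij:
        return native_contact_energy(r, r0, epsilon)
    return repulsive_energy(r, sigma_nc, epsilon_nc)
```

summing `pair_energy` over the pairs in the intra-chain contact set gives one term of the hamiltonian, and the same function applied to the inter-chain contact set gives the other; the bonded terms of the model are not shown here.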
for explicit particles, reduced mass of r and a drag coefficient all temperatures mentioned here are in reduced units. the temperature dependence of the conformational transition has been examined over several temperatures. three representative reduced temperatures (t*= . t r , t*= . t r , and t*= . t r ) are shown for clarity in fig. s . the population distribution as a function of the fraction of native inter-chain contacts formed in the s -head-down state is monitored at these temperatures. four states emerge, as indicated in fig. b and fig. s . as the temperature increases, the population shifts towards the s -head-up state. at t*= . t r the up- down state is the predominant population in the conformational landscape, which correlates well with the recent cryo-em data ( ) . all our simulations have been performed at this selected temperature. the rmsd analyses confirm that the simulation progresses correctly and that the correct structure emerges (fig. s ) . the shift of the population towards the s -head-up conformations with increasing temperature suggests that the s -head-up states are more dynamic and entropically stable. note that the dynamical transition between up- down and up- down states may tolerate a wide range of temperatures by a population shift mechanism. so far, we have found that it tolerates temperatures from t*= . t r to t*= . t r . the temperature dependence of the rbd hinge motion has also been studied (fig. s ) . the population distribution as a function of the fraction of native intra-chain hinge-region contacts formed by the rbd has been monitored at different temperatures. a bimodal distribution reflects the populations of the 'rbd-up' and 'rbd-down' states for any individual chain in the trimeric spike. as the temperature increases, the rbd-up states become more populated.
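the population distributions discussed above can be turned into a free energy profile by boltzmann inversion of the sampled probabilities. the sketch below is a generic illustration under stated assumptions, not the authors' code: the reaction coordinate is taken to be a fraction of native contacts between 0 and 1, energies are reported in units of the reduced temperature (`kT`), and the function names and bin count are hypothetical.

```python
import math


def population_distribution(q_values, n_bins=10):
    """Normalized histogram of a reaction coordinate q with 0 <= q <= 1."""
    counts = [0] * n_bins
    for q in q_values:
        index = min(int(q * n_bins), n_bins - 1)  # q == 1.0 goes in the last bin
        counts[index] += 1
    total = len(q_values)
    return [count / total for count in counts]


def free_energy_profile(populations, kT=1.0):
    """F(q) = -kT ln P(q), shifted so the most populated bin is at zero.

    Bins that were never visited get an infinite free energy.
    """
    p_max = max(populations)
    return [(-kT * math.log(p / p_max) if p > 0 else math.inf)
            for p in populations]


def free_energy_difference(p_a, p_b, kT=1.0):
    """Free energy of going from state a to state b from their populations."""
    return -kT * math.log(p_b / p_a)
```

a bimodal histogram of the hinge-region contacts would show up here as two minima in `free_energy_profile`, and a population shift with temperature as a change in the sign or size of `free_energy_difference`.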
free energy calculation: in a system, if a state "a", described by its reaction coordinate $x_a$ (which in our case is the fraction of native contacts), is separated from another state "b", described by its reaction coordinate $x_b$, by a finite barrier, the free energy of transition from a to b can be expressed as $\Delta F_{a \to b} = -k_B T \ln\left[ P(x_b) / P(x_a) \right]$, where $P(x_b)$ is the probability of finding the system in state b at the reaction coordinate $x_b$. the same holds for $P(x_a)$. from a finite set of unbiased simulations of the trimeric spike protein, a complete thermodynamic description is obtained. probability distributions are obtained by sampling the configurational space with sets of molecular dynamics simulations.
fig. s : inter-chain interaction from the 's -head-up' and the 's -head-down' states of sars-cov- spike. a. inter-chain rbd-ntd domain closure in the s -head-up state. the domain closure is mediated by double hydrogen bonds connecting arg of chaina with asn and cys residue of chainb. b. inter-chain rbd-s domain closure in the s -head-down state. the s stalk connection with the rbd is mediated by a proline residue of chaina, which forms a ch-π type interaction with a tyrosine and a hydrophobic interaction with another proline of chainb.
fig. s : the structural alignment of two chains in the s -head-down state. chainb (orange) and chainc (green) in the s -head-down state extracted from the cryo-em structure (pdb: vsb) of the trimeric spike. the low rmsd between these two chains suggests that contact information extracted from either of these chains will be equivalent. this supports our contact map generation shown in the method pipeline.
fig. s . rms deviation of each chain from its initial state during a typical simulation. a. initially, chain a of the trimeric spike was in the 's -head-up' state and chains b and c were in the 's -head-down' state. b. the lower rmsd for chain a corresponds to chain a's head-up state. c. the lower rmsd for chain b corresponds to chain b's head-down state. d.
the lower rmsd for chain c corresponds to chain c's head-down state. the rmsd analyses confirm that the simulation progresses correctly and that the correct structure emerges.
fig. s . temperature dependence of the s -head up-down transition and the rbd open-close breathing transition. a. population distribution as a function of the fraction of native inter-chain contacts formed in the s -head-down state as shown in fig. s . four states emerge as shown in fig. b . as the temperature increases, the population shifts towards the s -head-up conformations, indicating that s -head-up states are more dynamic and entropically stable. note that the dynamical transition between up- down and up- down states may tolerate a wide range of temperatures by a population shift mechanism. b. population distribution as a function of the fraction of native intra-chain hinge-region contacts formed by the rbd. a bimodal distribution reflects the 'rbd-up' and the 'rbd-down' states for any individual chain in the trimeric spike. as the temperature increases, the rbd-up state becomes more populated. temperature analysis helps to choose an intermediate temperature that yields the correct population distribution.
fig. s . sequence alignment of sars-cov- spike (pdb: vsb) with that of sars-cov spike (pdb: x b), mers-cov spike (pdb: x f) and ratg spike. only the rbd is highlighted in green. the unique histidine residue (highlighted in yellow) of the rbd of sars-cov- is noted. identical residues are denoted by an "*" beneath the consensus position.
fig. s . the opening of the rbd-s cleft in the 's -head-up' state of sars-cov- s differs from that of sars-cov s. the opening is measured by a characteristic distance between a serine and a proline residue at the two edges of the cleft. for sars-cov- s the distance is . nm while for sars-cov s, it is . nm.
it appears that the inter-chain rbd-ntd connection influences the sars-cov s rbd hinge motion significantly, with the cleft opening supported by those inter-chain interactions. fig. s . the free energy landscape in the presence and absence of inter-chain rbd-s contacts. a. in the presence of inter-chain rbd-s contacts, the population of the up- down state is enhanced compared to the up- down state. b. in the absence of inter-chain rbd-s contacts, the population shifts from the up- down state to the up- down state. movie s : conformational dynamics of full-length trimeric sars-cov- spike protein showing rapid symmetry breaking. movie s : conformational dynamics of full-length trimeric sars-cov- spike protein showing rapid symmetry breaking. in this movie the ntd domains are not shown, to better demonstrate the rbd movement. movie s : conformational dynamics of a monomer of the full-length sars-cov- showing rbd hinge motion. and notes: . in microbial evolution and co-adaptation: a tribute to the life and scientific legacies of joshua lederberg: workshop summary plagues and peoples sars-cov- : an emerging coronavirus that causes a global threat evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission identifying sars-cov- related coronaviruses in malayan pangolins structure, function, and evolution of coronavirus spike proteins cryo-em structure of the -ncov spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein structural basis for the recognition of sars-cov- by full-length human ace cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding the .
-angstrom cryo-electron microscopy structure of the porcine epidemic diarrhea virus spike protein in the prefusion conformation cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov topological and energetic factors: what determines the structural details of the transition state ensemble and "en-route" intermediates for protein folding? an investigation for small globular proteins swiss-model: homology modelling of protein structures and complexes the human coronavirus hcov- e s-protein structure and receptor binding smog : a versatile software package for generating structure-based models the shadow map: a general contact definition for capturing the dynamics of biomolecular folding and function. the journal of physical chemistry protein folding funnels: a kinetic approach to the sequence-structure relationship symmetry and the energy landscapes of biomolecules chemical physics of protein folding levinthal's paradox funnels, pathways, and the energy landscape of protein folding: a synthesis the origin of minus-end directionality and mechanochemistry of ncd motors order and disorder control the functional rearrangement of influenza hemagglutinin landscape approaches for determining the ensemble of folding transition states: success and failure hinge on the degree of frustration pi-interactions in proteins the embl-ebi search and sequence analysis tools apis in stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis role of aaa domain in allosteric communication of dynein motor proteins intersubunit assisted folding of dna binding domains in dimeric catabolite activator protein. the journal of physical chemistry online service), computational modeling of biological systems : from molecules to pathways chemical physics of protein folding from levinthal to pathways to funnels from structure to function: the convergence of structure based models and co-evolutionary information protein folding mechanisms and the multidimensional folding funnel navigating the folding routes rotation-activated and cooperative zipping characterize class i viral fusion protein dynamics the nature of folded states of globular proteins strain mediated adaptation is key for myosin mechanochemistry: discovering general rules for motor activity
key: cord- -ymvrserl authors: crooke, stephen n.; ovsyannikova, inna g.; kennedy, richard b.; poland, gregory a. title: immunoinformatic identification of b cell and t cell epitopes in the sars-cov- proteome date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: ymvrserl a novel coronavirus (sars-cov- ) emerged from china in late and rapidly spread across the globe, infecting millions of people and generating societal disruption on a level not seen since the influenza pandemic. a safe and effective vaccine is desperately needed to prevent the continued spread of sars-cov- ; yet, rational vaccine design efforts are currently hampered by the lack of knowledge regarding viral epitopes targeted during an immune response, and the need for more in-depth knowledge on betacoronavirus immunology. to that end, we developed a computational workflow using a series of open-source algorithms and webtools to analyze the proteome of sars-cov- and identify putative t cell and b cell epitopes. using increasingly stringent selection criteria to select peptides with significant hla promiscuity and predicted antigenicity, we identified potential t cell epitopes ( hla class i, hla class ii) and potential b cell epitopes, respectively. docking analysis and binding predictions demonstrated enrichment for peptide binding to hla-b (class i) and hla-drb (class ii) molecules. overlays of predicted b cell epitopes with the structure of the viral spike (s) glycoprotein revealed that of epitopes were located in the receptor-binding domain of the s protein. to our knowledge, this is the first study to comprehensively analyze all (structural, non-structural and accessory) proteins from sars-cov- using predictive algorithms to identify potential targets for vaccine development. significance statement the novel coronavirus sars-cov- recently emerged from china, rapidly spreading and ushering in a global pandemic. despite intensive research efforts, our knowledge of sars-cov- immunology and the proteins targeted by the immune response remains relatively limited, making it difficult to rationally design candidate vaccines. 
we employed a suite of bioinformatic tools, computational algorithms, and structural modeling to comprehensively analyze the entire sars-cov- proteome for potential t cell and b cell epitopes. utilizing a set of stringent selection criteria to filter peptide epitopes, we identified t cell epitopes ( hla class i, hla class ii) and b cell epitopes that could serve as promising targets for peptide-based vaccine development against this emerging global pathogen. in december , public health officials in wuhan, china, reported the first case of severe respiratory disease attributed to infection with the novel coronavirus sars-cov- ( ). since its emergence, sars-cov- has spread rapidly via human-to-human transmission ( ), threatening to overwhelm healthcare systems around the world and resulting in the declaration of a pandemic by the world health organization ( ). the disease caused by the virus is characterized by fever, pneumonia, and other respiratory and inflammatory symptoms that can result in severe inflammation of lung tissue and ultimately death, particularly among older adults or individuals with underlying comorbidities ( - ). as of this writing, the sars-cov- pandemic has resulted in million confirmed cases of covid- and over , deaths worldwide ( ). sars-cov- is the third pathogenic coronavirus to cross the species barrier into humans in the past two decades, preceded by severe acute respiratory syndrome coronavirus (sars-cov) ( , ) and middle-east respiratory syndrome coronavirus (mers-cov) ( ). all three of these viruses belong to the β-coronavirus genus and have either been confirmed (sars-cov) or suggested (mers-cov, sars-cov- ) to originate in bats, with transmission to humans occurring through intermediary animal hosts ( ) ( ) ( ) ( ) .
while previous zoonotic spillovers of coronaviruses have been marked by high case fatality rates (~ % for sars-cov; ~ % for mers-cov), widespread transmission of disease has been relatively limited ( , cases of sars; , cases of mers) ( ). in contrast, sars-cov- is estimated to have a lower case fatality rate (~ - %) but is far more infectious and has achieved world-wide spread in a matter of months ( ). as the number of covid- cases continues to grow, there is an urgent need for a safe and effective vaccine to combat the spread of sars-cov- and reduce the burden on hospitals and healthcare systems. no licensed vaccine or therapeutic is currently available for sars-cov- , although there are over vaccine candidates reportedly in development worldwide. seven vaccine candidates have
peptide was removed from the structure using chimera . (university of california-san francisco) ( ) prior to running simulations. ten models of each peptide-hla complex were generated on the basis of minimized energy scores, and the top model for each complex was selected for comparative analysis.
prediction and structural modeling of sars-cov- b cell epitopes
linear b cell epitope predictions were performed on the three exposed sars-cov- structural proteins: s (genbank accession: qhd ), m (qhd ), and e (qhd ) using the bepipred . algorithm ( ). epitope probability scores were calculated for each amino acid residue using a threshold of . (corresponding to > . specificity and sensitivity below . ), and only epitopes > amino acid residues in length were further analyzed. the structure of the sars-cov- s protein was accessed from the protein data bank (pdb id: vsb) ( ). discontinuous (i.e., structural) b cell epitope predictions for the s protein structure were carried out using discotope . ( ) with a score threshold greater than - . (corresponding to > . specificity and sensitivity below . ).
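the post-processing of per-residue epitope scores described above (a score threshold followed by a minimum-length filter) can be sketched as follows. this is an illustration only: it does not call bepipred, the function name `linear_epitopes` is hypothetical, and the threshold and minimum length are left as arguments because their values are not legible in the text.

```python
def linear_epitopes(scores, threshold, min_length):
    """Return (start, end) residue ranges of candidate linear epitopes.

    `scores` holds one prediction score per residue. A candidate is a run of
    consecutive residues scoring at or above `threshold`; only runs strictly
    longer than `min_length` residues are kept. Positions are 1-based and
    inclusive, as is usual for protein sequences.
    """
    epitopes = []
    start = None
    for index, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = index  # a new run begins here
        elif start is not None:
            if index - start > min_length:
                epitopes.append((start + 1, index))
            start = None
    # close a run that extends to the end of the sequence
    if start is not None and len(scores) - start > min_length:
        epitopes.append((start + 1, len(scores)))
    return epitopes
```

the same routine could be applied to each structural protein in turn, using that protein's per-residue score list as input.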
the main protein structure was modeled in pymol (schrödinger, llc), with predicted b cell epitopes identified by both bepipred . and discotope . highlighted as spheres.
genetic similarity of sars-cov- isolates
the primary goal of our study was to identify peptide epitopes that would be broadly applicable in vaccine development efforts against sars-cov- . we identified point mutations and deletions across the genomes of clinical isolates, with all deletions and the majority of mutations (n= ) occurring in the orf ab polyprotein (supp. figure s ). single-point mutations were also found in the s protein (n= ), n protein (n= ), orf protein (n= ), orf a protein (n= ), orf protein (n= ), e protein (n= ), and m protein (n= ). despite the genetic diversity introduced by these events (figure d), matrix analysis determined that > % sequence identity was maintained across all viral genomes. based on these findings and for study feasibility, the genome from the original virus isolate (genbank: mn ) was selected as the consensus sequence for all further analyses. we next identified potential cd + t cell epitopes from all proteins in the sars-cov- proteome. using the netctl . predictive algorithm, we analyzed the complete amino acid sequence of each viral protein to generate sets of -mer peptides predicted to be recognized across at least one of the major hla class i supertypes (figure a, supp. figure s ). this approach yielded a significant number of potential epitopes from each viral protein (orf : , orf : , orf : , e: , orf : , n: , m: , orf a: , s: , orf ab: ), with the number directly related to the size of the parent protein. we used the netmhcpan . server to further refine the list of potential cd + t cell epitopes by predicting binding affinity across representative hla class i alleles (see methods) and assigning percentile scores to quantify binding propensity. peptides with percentile rank scores < .
% (i.e., strong binders) were filtered using a nm threshold for binding affinity to further delineate candidate hla class i epitopes from the viral proteome ( ). for feasibility reasons, we refined our selection to candidate epitopes by excluding peptides predicted to bind only one hla molecule (supp. table s ). the resultant peptides were enriched for predicted binders to hla-b molecules (hla-b* : = ; hla-b* : = ) (figure b). a final round of selection on the basis of hla promiscuity (i.e., predicted binding to > hla molecules) and predicted antigenicity scoring using the vaxijen . server produced a subset of five candidate peptides (four orf ab, one s protein) as potential targets for vaccine development (table ), with the hypothesis that increased hla binding promiscuity meant broader population base coverage by those peptides. these peptides were predicted to provide % global population coverage and had higher predicted binding affinities for hla-b molecules (b* : = . nm; b* : = . nm; b* : = . nm) compared to hla-a molecules (a* : = . nm; a* : = . nm), with the exception of one orf ab-derived peptide (mmisagfsl) that was predicted to bind hla-a* : with high affinity (ic = . nm) (figure c). we also sought to identify potential hla class ii peptides from sars-cov- , as the stimulation of cd + t-helper cells is critical for robust vaccine-induced adaptive immune responses. using the netmhciipan . server, we identified candidate hla class ii peptides from the viral proteome predicted to have high binding affinity (< nm) and percentile rank scores < % across a reference panel of hla molecules covering > % of the population ( , ). similar to hla class i epitope predictions, the number of class ii epitopes identified for each viral protein (orf : , e protein: , orf : , orf : , orf : , n: , m: , orf a: , s: , orf ab: ) was largely proportional to protein size.
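the successive filtering steps described above (enumerating fixed-length peptides, keeping strong binders, then discarding peptides that bind only a single hla molecule) can be sketched as below. this is a generic illustration and does not reproduce the netctl or netmhcpan servers: the peptide length default of 9, the cutoff arguments, and the function names are assumptions, and the prediction tuples stand in for whatever tabular output a prediction tool provides.

```python
def sliding_peptides(sequence, k=9):
    """All overlapping peptides of length k from a protein sequence.

    k=9 is an assumed default; the peptide length is not legible in the text.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def promiscuous_binders(predictions, max_rank, max_ic50, min_alleles=2):
    """Keep peptides predicted to bind several HLA molecules strongly.

    `predictions` is an iterable of (peptide, allele, percentile_rank, ic50_nM).
    A peptide-allele pair counts as a binder when both the percentile rank and
    the predicted IC50 fall below the supplied cutoffs. Peptides binding fewer
    than `min_alleles` distinct alleles are excluded.
    """
    alleles_by_peptide = {}
    for peptide, allele, rank, ic50 in predictions:
        if rank < max_rank and ic50 < max_ic50:
            alleles_by_peptide.setdefault(peptide, set()).add(allele)
    return {
        peptide: sorted(alleles)
        for peptide, alleles in alleles_by_peptide.items()
        if len(alleles) >= min_alleles
    }
```

raising `min_alleles` mimics the stricter promiscuity criterion of the final selection round; antigenicity scoring would be a further, separate filter on the returned peptides.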
after excluding peptides predicted to bind to only a single hla molecule in our panel, we refined our selection to peptides (supp . table s ), which were enriched for binding to hla-drb molecules (n= ) ( figure d ). filtering on hla promiscuity and predicted antigenicity scores yielded a subset of peptides ( orf ab, s protein, m protein, orf , orf a, orf , orf ) as cd + t cell epitopes for further study ( table ) . these peptides were predicted to collectively provide % population coverage and have significantly higher average binding affinities for hla-dr alleles (drb = . nm; drb = . nm; drb = . nm; drb = nm) compared to hla-dp ( . nm) or hla-dq ( . nm) molecules ( figure e) . characterization of hla class i peptide docking with hla-b* : the five candidate hla class i peptides identified by our computational approach were predicted to provide coverage across six hla alleles (a* : , a* : , a* : , b* : , b* : , b* : ). the peptide famqmayrf was the only candidate predicted to bind to a* : molecules, whereas mmisagfsl was predicted to uniquely bind a* : and b* : molecules. four of the five peptides were predicted to bind a* : and b* : molecules, but all were predicted to bind with relatively high affinity (average ic = . nm) to hla-b* : . therefore, we performed molecular docking studies of each peptide with the molecular structure of hla-b* : (pdb: c n). all peptides were predicted to bind within the peptide binding groove, forming hydrogen bond contacts with numerous amino acid side chains ( figure a) . the binding motif for hla-b* : is highly selective for residues at the p and p anchor positions, with a preference for bulky hydrophobic amino acids at the c-terminus ( figure b ) ( ). all candidate peptides possessed terminal residues (phe, tyr, leu) that fit into the hydrophobic binding pocket of the hla groove, further supporting that these peptides should be strong binders of hla-b* : and promising candidates for vaccine development studies. 
an effective vaccine should stimulate both cellular and humoral immune responses against the target pathogen; therefore, we also sought to identify potential b cell epitopes from sars-cov- proteins. we limited our analysis to the primary structural proteins exposed on the virus capsid (s, n, m, and e), as these are the most accessible antigens for engaging b cell receptors. using the bepipred . algorithm, we identified potential linear b cell epitopes in the s protein, potential epitopes in the n protein, and potential epitopes in the m protein ( table ) . no epitopes were identified in the e protein. studies have previously shown the s protein to be the predominant target of neutralizing antibodies against coronaviruses ( , ), and, as our findings indicate this to likely be the case for sars-cov- , we focused all subsequent analyses on the s protein. while the n protein is also a major target of the antibody response ( ), it is unlikely these antibodies have any neutralizing activity based on the viral structure. as epitope conformation can significantly influence recognition by antibodies, we also employed discotope . to identify discontinuous b cell epitopes in the protein structure. our analysis identified potential structural epitopes in the s protein ( in the s domain, in the s domain), with six regions having significant overlap with our predicted linear epitopes ( table ) . antigenic regions identified in both analyses were modeled using the recently published structure of the sars-cov- s protein ( ) to examine their accessibility for antibody binding. epitopes in the s domain (p -d ; y -d ) were clustered near the base of the spike protein, whereas regions in the s domain (d -d ; n -n ; g -p ; d -t ) were exposed on the protein surface (figure ). 
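cross-referencing the linear and structural predictions, as described above, amounts to finding linear epitope ranges that share residues with the set of residues flagged by the structural predictor. the sketch below is a minimal illustration under that reading; the function name `overlapping_regions` and the `min_shared` cutoff are hypothetical, and what counts as "significant overlap" in the study is not specified here.

```python
def overlapping_regions(linear_epitopes, structural_residues, min_shared=1):
    """Find linear epitopes supported by structural epitope predictions.

    `linear_epitopes` is a list of (start, end) residue ranges (1-based,
    inclusive); `structural_residues` is an iterable of residue numbers
    predicted to belong to discontinuous epitopes. Returns a list of
    ((start, end), shared_residue_numbers) for every linear epitope that
    shares at least `min_shared` residues with the structural prediction.
    """
    structural = set(structural_residues)
    regions = []
    for start, end in linear_epitopes:
        shared = [r for r in range(start, end + 1) if r in structural]
        if len(shared) >= min_shared:
            regions.append(((start, end), shared))
    return regions
```

regions returned by this kind of comparison are the ones that would then be inspected on the protein structure for surface accessibility.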
in the face of the covid- pandemic, it is imperative that safe and effective vaccines be rapidly developed in order to induce widespread herd immunity in the population and prevent the continued spread of sars-cov- . our study identified probable peptide targets of both cellular and humoral immune responses against sars-cov- using computational methodologies to investigate the entire viral proteome a priori. studies such as these are paramount during the early stages of pandemic vaccine development given the relative scarcity of biological data available on the viral immune response, and we employed an approach that allowed us to systematically refine our predictions using increasingly stringent criteria to select a subset of the most promising epitopes for further study. the data we have curated could inform the design of a candidate peptide-based vaccine or diagnostic against sars-cov- . as selective pressures are known to introduce viral mutations that promote fitness and can lead to evasion of immune responses ( , ), we first sought to investigate the genetic similarity of all reported sars-cov- clinical isolates and identify a consensus sequence for use in our epitope prediction studies. we identified mutations/deletions across the genomes of clinical isolates reported as of february . despite these variations, the viral genomic identity was > % conserved across all isolates. as the protein coding sequences were largely conserved, the genome of the original virus isolate (wuhan-hu- ) was deemed a representative consensus sequence for analysis of the sars-cov- proteome. cd + and cd + t cell responses will likely be directed against both structural and non-structural proteins during antiviral immune responses, as all viral proteins are accessible for processing and presentation on the hla molecules of infected cells. therefore, we sought to identify t cell epitopes across the entire viral proteome. our analysis identified potential cd + t cell epitopes (supp. 
table s ) and potential cd + t cell epitopes (supp. table s ), with stringent filtering for more promiscuous peptides with high predicted antigenicity yielding a subset of cd + t cell epitopes and cd + t cell epitopes (table ) as potential targets for vaccine development. a single study by grifoni and colleagues has recently reported the computational identification of cd + t cell epitopes from sars-cov- ( ), and peptides from our analysis shared sequence homology or were nested within peptides identified in their study. moreover, seven peptides from this initial report were replicated in our final subset of hla class ii epitopes, supporting that these peptides may be promising vaccine targets. an increasing number of studies have employed predictive algorithms to identify potential hla class i epitopes for sars-cov- , although relatively few have comprehensively analyzed the entire viral proteome. a report from feng et al. recently outlined the identification of potential class i epitopes in the main structural proteins from sars-cov- but did not consider any non-structural proteins ( ). grifoni and colleagues conducted a more rigorous analysis, identifying unique cd + t cell epitopes across all sars-cov- proteins but focusing their analyses solely on peptides with sequence homology to known sars-cov epitopes ( ). our approach initially identified ~ , potential cd + t cell epitopes across all viral proteins, which we refined to a subset of peptides (table ). one peptide derived from orf ab (mmisagfsl) was predicted to bind hla-a* : with high affinity (ic = . nm) (figure c). given the prevalence of this allele in the american and european populations ( - % frequency) ( ), mmisagfsl may represent a promising epitope capable of providing broad vaccine population coverage. we also observed a notable enrichment of epitopes predicted to bind hla-b molecules, particularly hla-b* : , as we imposed more stringent selection criteria (figure b).
all five peptides identified by our approach were predicted to be relatively strong binders for this allele (ic = . nm), with molecular docking simulations illustrating strong contacts with amino acid residues in the peptide binding groove (figure a, b). a recent computational study identified another hla-b allele (b* : ) as having a high capacity for presenting epitopes from sars-cov- that were conserved among other pathogenic coronaviruses ( ). these data collectively suggest the hla-b locus may be significantly associated with the immune response to sars-cov- (and potentially other coronaviruses), with further biological studies warranted to determine the true role of host genetics in sars-cov- immunology. lastly, we analyzed the primary structural proteins of sars-cov- (s, n, m, e proteins) for potential b cell epitopes, as an ideal vaccine would be designed to stimulate both cellular and humoral immunity. our analysis identified potential linear b cell epitopes in all proteins except for the e protein (table ). the greatest number of epitopes was predicted in the surface-exposed s protein (n= ), but a significant number of epitopes were also predicted for the n protein (n= ). this is not surprising, as previous reports identified the n protein as a significant target of the humoral response to sars-cov ( , ). as the s protein is the predominant surface protein and has been the primary target of neutralizing antibody responses against other coronaviruses ( , ), we elected to focus our subsequent analyses solely on antigenic regions in the s protein. we identified potential structural epitopes in the s protein structure and referenced these against our linear epitope predictions to identify six regions that were independently identified by both analyses (table , figure ). to further evaluate the potential of these six antigenic regions as targets for antibody binding, we modeled their surface accessibility on the crystal structure of the sars-cov- spike protein ( ).
four regions in the s domain (d -d ; n -n ; g -p ; d -t ) were solvent exposed (figure a, b) , with minimal steric hindrance for antibody accessibility. the s domain contains the residues (n -v ) important for virus binding to angiotensin converting enzyme (ace ) on the cell surface ( ), and studies have shown that antibodies with potent neutralizing activity against sars-cov target this domain ( - ). indeed, three of the four s epitopes identified in our analyses are located in the ace -binding region, supporting their potential utility in vaccine development against sars-cov- . two regions were identified in the s "stalk" domain of the s protein (figure a, c) . while y - d is located at the base of the s protein and likely inaccessible to antibodies, p -d is on the outer face of the protein and has been previously identified as part of a larger b cell epitope that is conserved with sars-cov ( ). as sars-cov s -specific antibodies have previously been shown to possess antiviral activity ( ), it is interesting to speculate whether a strategy similar to targeting the influenza hemagglutinin protein stalk could be employed for developing a broadly reactive coronavirus vaccine. our study possessed several strengths and limitations. rather than restricting our analyses of hla class i and class ii epitopes to specific proteins based on prior studies of sars-cov immunology, we investigated the complete proteome of sars-cov- using an unbiased approach. furthermore, we employed a multi-tiered strategy for identifying putative b cell and t cell epitopes from all viral proteins studied. our initial analyses were performed with liberal thresholds for epitope identification, and at each additional step, we imposed more stringent selection criteria to filter these peptides to a subset of b cell and t cell epitopes for further study. 
nevertheless, the results of this study are derived purely from computational methods, and it should be noted that computational algorithms can fail to capture a significant number of antigenic peptides ( ). experimental validation with biological samples will ultimately be needed. during the early stages of a pandemic, access to sufficient biological samples may be extremely limited, so we must continue to utilize methodologies-such as computational predictive algorithms- that allow us to explore the epitope landscape for experimental vaccine development. our approach in this study allowed us to identify and refine a manageable subset of t cell and b cell epitopes for further testing as components of a sars-cov- vaccine. based on our results, our proposed sars-cov- vaccine formulation could contain the following: ) one or more b cell peptide epitopes from the s protein to generate protective neutralizing antibodies; and ) multiple hla class i and class ii-derived peptides from other viral proteins to stimulate robust cd + and cd + t cell responses. based on global allele frequencies, these class i and class ii peptides would be expected to collectively provide % and % population coverage, respectively. while such a vaccine could be readily formulated as a synthetic polypeptide or an adjuvanted peptide mixture, these strategies may not retain the epitope structural features necessary to induce a robust antibody response. recombinant nanoparticles and assembly into vlps represent promising alternative vaccine platforms, as they have been extensively used for the controlled display and delivery of peptide-based vaccine components ( - ). by omitting whole viral proteins from the vaccine formulation, a peptide-based sars-cov- vaccine should have a well- tolerated safety profile and avoid the adverse events previously observed with experimental sars-cov vaccines ( ) ( ) ( ) ( ) . 
in summary, we have identified potential t cell epitopes ( hla class i, hla class ii) and potential b cell epitopes from across the sars-cov- proteome that are predicted to have broad population coverage and could serve as the basis for designing investigational peptide-based vaccines. further study on the biological relevance and immunogenicity of these peptides is warranted in an effort to develop a safe and effective vaccine to combat the sars-cov- pandemic. the authors would like to thank caroline l. vitse for editorial assistance with this manuscript. the research presented here was not supported by any specific funding source. huang lr, et al. ( ) evaluation of antibody responses against sars coronaviral nucleocapsid or spike proteins by immunoblotting or elisa. identification workflow illustrating the algorithms used ( , - , - , , ) and filtering criteria applied to refine peptide selection. (d) cladogram illustrating the genetic relationship of sars-cov- isolates. the original viral isolate and consensus sequence (wuhan-hu- ) is highlighted in red. figure . immunogenicity scoring of peptides in the sars-cov- proteome with predicted hla class i and ii coverage and binding affinities. discotope prediction algorithms highlighted on the trimeric structure of the s glycoprotein.
inset panels show the s domain (upper) and s domain (lower a new coronavirus associated with human respiratory disease in china a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster who declares covid- a pandemic epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study clinical characteristics of hospitalized patients with clinical features of patients infected with coronavirus disease (covid- ) situation report - identification of a novel coronavirus in patients with severe acute respiratory syndrome a novel coronavirus associated with severe acute respiratory syndrome. isolation of a novel coronavirus from a man with pneumonia in saudi arabia bats are natural reservoirs of sars-like coronaviruses a pneumonia outbreak associated with a new coronavirus of probable bat origin middle east respiratory syndrome coronavirus in bats middle east respiratory syndrome coronavirus in dromedary camels: an outbreak investigation structure, function, and antigenicity of the sars-cov- spike glycoprotein covid- : knowns, unknowns, and questions. msphere ( ). . 
world health organization ( ) draft landscape of covid- candidate vaccines tortoises, hares, and vaccines: a cautionary note for sars-cov- vaccine development immunization with sars coronavirus vaccines leads to pulmonary immunopathology on challenge with the sars virus vaccine efficacy in senescent mice challenged with recombinant sars- cov bearing epidemic and zoonotic spike variants prior immunization with severe acute respiratory syndrome (sars)- associated coronavirus (sars-cov) nucleocapsid protein causes severe pneumonia in mice infected with sars-cov characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine quantitative comparison of the efficiency of antibodies against s and s subunit of sars coronavirus spike protein in virus neutralization and blocking of receptor binding: implications for the functional roles of s subunit neutralizing epitopes of the sars-cov s-protein cluster independent of repertoire, antigen structure or mab technology identification and characterization of novel neutralizing epitopes in the receptor-binding domain of sars-cov spike protein: revealing the critical antigenic determinants in inactivated sars-cov vaccine discovery of naturally processed and hla-presented class i peptides from vaccinia virus infection using mass spectrometry for vaccine development development of autologous c vaccine nanoparticles to reduce intravascular hemolysis in vivo plug-and-display: decoration of virus-like particles via isopeptide bonds for modular immunization a novel candidate hpv vaccine: ms phage vlp displaying a tandem hpv l peptide offers similar protection in mice to gardasil- targeted immunomodulation using antigen-conjugated nanoparticles scoring function for automated assessment of protein structure template quality galaxypepdock: a protein-peptide docking tool based on interaction similarity and energy optimization key: cord- -g p 
lsr authors: maldonado, lucas l.; kamenetzky, laura title: molecular features similarities between sars-cov- , sars, mers and key human genes could favour the viral infections and trigger collateral effects date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: g p lsr in december rising pneumonia cases caused by a novel β-coronavirus (sars-cov- ) occurred in wuhan, china, and the virus has rapidly spread worldwide, causing thousands of deaths. the who declared the sars-cov- outbreak a public health emergency of international concern; therefore, several scientists are dedicated to the study of the new virus. since human viruses have codon usage biases that match highly expressed proteins in the tissues they infect and depend on host cell machinery for replication and co-evolution, we selected the genes that are highly expressed in human lung tissue to perform computational studies that allow their molecular features to be compared with those of sars, sars-cov- and mers genes. in our studies, we analysed molecular features for viral genes and human genes that consisted of codon positions. hereby, we found that a/t bias in viral genes could propitiate the viral infection, favoured by a host-dependent specialization using the host cell machinery of only some genes. the envelope protein e, the membrane glycoprotein m and orf could have been further benefited by a high rate of a/t in the third codon position. thereby, the mistranslation or de-regulation of protein synthesis could produce collateral effects, as a consequence of viral occupancy of the host translation machinery due to molecular similarities with viral genes. furthermore, we provided a list of candidate human genes whose molecular features match those of sars-cov- , sars and mers genes, which should be considered for incorporation into genetic population studies to evaluate the susceptibility to respiratory viral infections caused by these viruses.
the results presented here lay the basis for further research in the field of human genetics associated with the new viral infection, covid- , caused by sars-cov- and for the development of antiviral preventive methods. since its initial outbreak at huanan seafood wholesale market in wuhan, china, in late , covid- has affected more than million people and caused more than thousand deaths all around the world. since then, scientists have focused not only on studying the biology and dissemination of covid- to control the transmission and design proper diagnostic tools and treatments, but also on racing to design a vaccine that could prevent the infection caused by the coronavirus sars-cov- . this virus belongs to the genus betacoronavirus (β-coronavirus) of the coronaviridae family, which is also composed of three more genera: alphacoronavirus (αcov), gammacoronavirus (γcov) and deltacoronavirus (δcov) (chen et al., a). viruses from this family possess a single-stranded, positive-sense rna and the genome ranges from to kb (su et al., ).
coronaviruses have been identified in several host species including humans, bats, civets, mice, dogs, cats, cows and camels (cavanagh, ; clark, in order to contribute to solving the health emergency, here we provide a thorough and comprehensive analysis that could help to understand the viability of the virus as well as the susceptibility of the human host to the viral infection based on the molecular patterns of their genes. therefore, the main goals of our work were to study the molecular and evolutionary aspects of the human coronaviruses sars-cov- , sars and mers and to determine the level of similarity of the codon usage and molecular features between the genes of human coronaviruses and the human genes in order to identify the factors that are responsible for codon selection in the viruses. moreover, we proposed to identify the essential viral genes for viral replication and the human genes whose translation machinery is involved in propitiating the system for viral replication in order to determine whether the genetic population variability could be involved in modelling the gene features and therefore contributing to the human susceptibility to viral infections. up to late april, a total of sars-cov- β-coronavirus genomes had become available. the total available sequences of β-coronavirus were downloaded from the ncbi (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/) including the reference genomes of mers (nc_ ), sars (nc_ ) and sars-cov- (nc_ ) and were classified according to their host. different sars-cov- isolates from different countries were pre-analysed but only reference genomes were retained due to the low variability of the data. the genome quality was assessed and the genomes containing more than gaps were discarded. cds of representative viruses from the previous classification were selected and analysed.
since human viruses have codon usage biases (cub) that match highly expressed proteins in the tissues they infect (miller et (sueoka, ) , the three stop codons (uaa, uag, and uga) were excluded in the calculation of p , and the two single codons for methionine (aug) and tryptophan (ugg) were excluded from p , p , and p . the following codon indices were calculated: relative synonymous codon usage (rscu) (sharp and li, ), the effective number of codons (enc) (wright, ), codon adaptation index (cai) (lee et al., ; sharp and li, ), codon bias index (cbi) (bennetzen and hall, ), the frequency of optimal codons (fop) (ikemura, ), general average hydropathicity (gravy) (sharp and li, ), aromaticity (aromo) (lobry and gautier, ), gc-content at the first, second and third codon positions (gc , gc and gc ), frequency of either a g or c at the third codon position of synonymous codons (gc s), the average of gc and gc (gc ) and translational selection (trs ). enc indicates the degree of codon bias for individual genes. over a range of values from to , lower values indicate higher codon bias, while enc equal to means that all codons are used with equal probability (novembre, ; wright, ). cai values measure the extent of bias toward preferred codons in highly expressed genes. cai values range between and . , with higher cai values indicating higher expression and higher cub (lee et al., ; sharp and li, ) under the assumption that translational selection would optimize gene sequences according to their expression levels. cbi is another measure of directional codon bias, based on the degree of preferred codons used in a gene, similar to the frequency of optimal codons. it measures the extent to which a gene uses a subset of optimal codons. in genes with extreme codon bias, cbi will be equal to , whereas in genes with random codon usage the cbi values will be equal to (bennetzen and hall, ).
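as an illustration of two of the indices listed above, the following is a minimal sketch of how rscu and gc3s could be computed for a single coding sequence. it is not the authors' pipeline: the function names (`split_codons`, `rscu`, `gc3s`) are hypothetical, the standard genetic code is assumed, and rscu is taken in its usual form (observed codon count divided by the mean count over the synonymous family).

```python
from collections import Counter, defaultdict

# Standard genetic code (assumed), built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def split_codons(cds):
    """Split a coding sequence into in-frame codons (RNA input is accepted)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rscu(cds):
    """RSCU per codon: observed count / mean count of its synonymous family."""
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:  # skip stops, Met and Trp
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:  # amino acid absent from this sequence
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def gc3s(cds):
    """Fraction of G or C at the third position of synonymous codons."""
    codons = [c for c in split_codons(cds)
              if CODON_TABLE.get(c, "*") not in ("*", "M", "W")]
    if not codons:
        return 0.0
    return sum(c[2] in "GC" for c in codons) / len(codons)
```

for example, `rscu("GCTGCTGCCGCA")` assigns gct a value of 2.0, because it accounts for two of the four alanine codons in a four-codon family, and `gc3s` of the same sequence is 0.25.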
fop is a species-specific measure of bias towards particular codons that appear to be translationally optimal in particular species. it can be calculated as the ratio between the frequency of optimal codons and the total number of synonymous codons. its values range from if a gene contains no optimal codons to if a gene is entirely composed of optimal codons (ikemura, ) . the determination of optimal codons was carried out based on the axis ordination, the top and bottom % of genes were regarded as the high and low bias datasets, respectively. codon usage in the two data sets was compared using chi-square tests, with the sequential bonferroni correction to assess significance according to peden (peden, ) . optimal codons were defined as those that are used at significantly higher frequencies (p-value < . ) in highly expressed genes compared with the frequencies in genes expressed at low levels. the determination of codon pair biases in coding sequences was performed using cpbias (https://rdrr.io/github/alex-sbu/cpbias/) developed in r. as described by coleman et al. (coleman et al., ) . the cps is defined as the natural logarithm of the ratio of the observed over the expected number of occurrences of a particular codon pair in all protein-coding sequences of a species. the cpb was used as an index and also to determine the bias in cps among the virus and host genes. the expected number of codon pair occurrences estimates the number of codon pairs to be present if there is no association between the codons that form the codon pair. it is also calculated to be independent of codon bias and amino acid frequency (coleman et al., ) . (greenacre, ) . the data were normalized according to sharp and li (sharp and li, ) in order to define the relative adaptiveness of each codon (peden, ; suzuki et al., ) , codon usage indices described above were also included as variables. 
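the codon pair score described above can be sketched as follows. this is a hedged illustration, not the cpbias implementation: the function names are hypothetical, the expected count is assumed to follow the form given by coleman et al. (codon counts normalised by amino acid counts and scaled by the amino acid pair count), and the cpb of a gene is assumed to be the arithmetic mean of the cps of its codon pairs.

```python
import math
from collections import Counter

# Standard genetic code (assumed), built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def _codons(cds):
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def _is_sense(codon):
    return CODON_TABLE.get(codon, "*") != "*"


def codon_pair_scores(sequences):
    """CPS = ln(observed / expected) for every codon pair seen in the CDS set."""
    codon_n, aa_n, pair_n, aa_pair_n = Counter(), Counter(), Counter(), Counter()
    for seq in sequences:
        codons = _codons(seq)
        for c in codons:
            if _is_sense(c):
                codon_n[c] += 1
                aa_n[CODON_TABLE[c]] += 1
        for a, b in zip(codons, codons[1:]):
            if _is_sense(a) and _is_sense(b):
                pair_n[(a, b)] += 1
                aa_pair_n[(CODON_TABLE[a], CODON_TABLE[b])] += 1
    scores = {}
    for (a, b), observed in pair_n.items():
        x, y = CODON_TABLE[a], CODON_TABLE[b]
        # Expected count if the two codons were chosen independently,
        # corrected for codon usage and amino acid pair frequency.
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[(x, y)]
        scores[(a, b)] = math.log(observed / expected)
    return scores


def codon_pair_bias(cds, scores):
    """CPB of one gene: mean CPS over its codon pairs (assumed definition)."""
    codons = _codons(cds)
    values = [scores[p] for p in zip(codons, codons[1:]) if p in scores]
    return sum(values) / len(values) if values else 0.0
```

a positive cps marks a codon pair that occurs more often than expected from codon and amino acid frequencies alone, and a negative cps marks an under-represented pair. in practice the reference scores would be computed from all protein-coding sequences of the host, and viral genes would then be scored against them.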
pca analyses were performed using the "factoextra" r package (https://cloud.r-project.org/web/packages/factoextra/index.html). showed enc values that ranged from . to . , with the human mers gene, followed by the bat mers gene, being the viral genes that presented the lowest values. the genes that encode for the spike protein s presented enc values that ranged from . to . . the gene of the human sars-cov- showed the lowest enc value followed by the genes of bat and pangolin sars-cov- . orf genes presented enc values that ranged from . to . , with the human sars-cov- being the virus that presented the lowest and the highest enc value for orf and orf , respectively. distantly. this analysis also showed that cpb is highly related to the dinucleotide bias. in this cluster, the orf genes showed low values of cpb, the lowest of all the clusters. conversely, the genes that encode for the spike protein s presented high cpb values. found that the total gene repertoire had a similar enc average that differs by only unit with respect to the non-human host they come from, reflecting the molecular features of their original host. furthermore, as demonstrated in our clustering analysis, codon pair usage seems to be dependent on the dinucleotide bias, and the cpb was higher for human genes than for viral genes, as previously reported (kames et al., ; kunec and osterrieder, ). moreover, our analyses allowed us to distinguish not only the main factors that contribute to the distribution of the genes along the axes in pca, but also to determine some particular features that differ between human and non-human viruses in specific genes and that could be important for explaining the evolution of the viral infection.
in contrast to sars-cov- of bats and pangolins, human sars-cov- exhibited a differential distribution in particular genes that depended mostly on the a/t content in the third two viral genes that also present high cpb are orf a/b, which encodes the replicase complex (polyproteins pp a and pp ab), and the spike protein s, which participates in the early viral infection by attaching to the host receptor ace and mediating the internalization of the virus (guo et al., ). in our studies, orf a/b grouped with the gene that encodes the nucleocapsid protein n, indicating that their molecular features are highly conserved and are also present in several human genes. this result is in concordance with previous works that proposed these genes as candidates for deoptimization for the design of attenuated vaccines due to their high positive cpb values (kames et al., ). instead, the gene that encodes the spike protein s grouped with orf (involved in viral pathogenesis and apoptosis induction), which also presents high and similar positive cpb values. for all of them, a higher rate of a/t composition in the third codon position was observed. changes in the third position produce synonymous substitutions that could have led to a codon optimization in human cells using the host machinery that translates only genes whose molecular features match the viral needs. some viral genes seem to have been favoured for an increased viral replication in humans and optimized by using or mimicking some particular molecular patterns of human genes. but only some genes, such as the envelope e, the orf and , could be the key for an exacerbated viral pathogenesis.
furthermore, because of these molecular and codon usage similarities between some highly expressed human genes and viral genes that occupy the same clusters, the translation machinery of the host could propitiate the translation of viral genes to the detriment of human gene expression in lung tissues. indeed, mistranslation or de-regulation of protein synthesis has been reported as a consequence of trna miss- in our study, we described the main factors that shape cub in sars-cov- , sars and mers in comparison with highly expressed genes in human lung tissue and revealed matching features with human genes that could have favoured the virus for an increased pathogenesis. furthermore, we provided a list of candidate human genes that could be involved in the viral infection and had not been described yet, which could be the key for explaining collateral effects and the human susceptibility to viral infections and should be considered for incorporation into genetic population studies. . declarations cov- (nc_ ) to ) using a hierarchical method of viral genes for sars (nc_ ) of the human host and human genes based on the molecular features. cpb correlation is included in the left for each cluster relating the cpb of human genes (horizontal axis) and cpb of the viral genes to ) using a hierarchical method of viral genes for sars (nc_ ) of the human host and human genes based on the molecular features.
cpb correlation is included in the left for each cluster relating the cpb of human genes (horizontal axis) and cpb of the viral genes comparative genomic analysis mers cov isolated from humans and camels with special reference to virus encoded helicase sars-cov- codon usage bias downregulates host expressed genes with similar codon usage bats and coronaviruses codon selection in yeast chromosome architecture and genome organization neurologic complications of covid- coronavirus avian infectious bronchitis virus genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan emerging coronaviruses: genome structure emerging coronaviruses: genome structure bovine coronavirus virus attenuation by genome-scale changes in codon pair bias origin and evolution of pathogenic coronaviruses correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins modulation of host cell death by sars coronavirus proteins neurologic manifestations in an infant with covid- canine parvovirus type (cpv- ) and feline panleukopenia virus (fpv) codon bias analysis reveals a progressive adaptation to the new niche after the host jump codon usage in bacteria: correlation with gene expressivity theory and applications of correspondence analysis covid- ) outbreak-a n update on the status bats may be sars reservoir selection intensity for codon bias the structure of viruses correlation between the abundance of escherichia coli transfer rnas and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the e sequence analysis of sars-cov- genome reveals features important for vaccine design codon pair bias is a direct consequence of partitionfinder : new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses pathways to disease from natural variations in 
human cytoplasmic trnas relative codon adaptation index, a sensitive measure of codon usage bias prevalence and impact of cardiovascular metabolic diseases on covid- in china bats are natural reservoirs of sars-like coronaviruses hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in escherichia coli chromosome-encoded genes coronavirus genomic rna packaging human viruses have codon usage biases that match highly expressed proteins in the tissues they infect live attenuated influenza virus vaccines by computer-aided rational design how china sees america attenuation of human respiratory syncytial virus by genome-scale codon- pair deoptimization accounting for background nucleotide composition when measuring codon usage bias sequence comparison of the n genes of five strains of the coronavirus mouse hepatitis virus suggests a three domain structure for the nucleocapsid protein analysis of codon usage severe acute respiratory syndrome fasttree: computing large minimum evolution trees with profiles instead of a distance matrix analysis of codon usage bias of crimean-congo hemorrhagic fever virus and its adaptation to hosts the codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications large-scale recoding of an arbovirus genome to rebalance its insect versus mammalian preference fast, scalable generation of high quality protein multiple sequence alignments using host influence in the genomic composition of flaviviruses: a multivariate approach genetic recombination, and pathogenesis of coronaviruses directional mutation pressure and neutral molecular evolution a problem in multivariate analysis of codon usage data and a possible solution the adaptation of codon usage of +ssrna viruses to their hosts a comprehensive analysis of genome composition and codon usage patterns of emerging coronaviruses codon usage pattern of genes factors influencing codon usage of mitochondrial nd gene in 
pisces, aves and mammals recoding of the vesicular stomatitis virus l gene by computer-aided design provides a live, attenuated vaccine candidate review of bats and sars the need for urogenital tract monitoring in covid- covid and the digestive system the "effective number of codons" used in a gene focus on the crosstalk between deliberate reduction of hemagglutinin and neuraminidase expression of influenza virus leads to an ultraprotective live vaccine in mice mers, sars and other coronaviruses as causes of pneumonia isolation of a novel coronavirus from a man with pneumonia in saudi arabia covid- and the cardiovascular system fatal swine acute diarrhoea syndrome caused by an hku -related coronavirus of bat origin xp_ lrrk leucine-rich repeat serine/threonine-protein kinase isoform x xp_ col a collagen alpha- (vi) chain isoform x xp_ abca atp-binding cassette sub-family a member isoform x xp_ abca atp-binding cassette sub-family a member i key: cord- -amfv z y authors: nguyen-contant, phuong; embong, a. karim; kanagaiah, preshetha; chaves, francisco a.; yang, hongmei; branche, angela r.; topham, david j.; sangster, mark y. title: s protein-reactive igg and memory b cell production after human sars-cov- infection includes broad reactivity to the s subunit date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: amfv z y the high susceptibility of humans to sars-cov- infection, the cause of covid- , reflects the novelty of the virus and limited preexisting b cell immunity. igg against the sars-cov- spike (s) protein, which carries the novel receptor binding domain (rbd), is absent or at low levels in unexposed individuals.
to better understand the b cell response to sars-cov- infection, we asked whether virus-reactive memory b cells (mbcs) were present in unexposed subjects and whether mbc generation accompanied virus-specific igg production in infected subjects. we analyzed sera and pbmcs from non-sars-cov- -exposed healthy donors and covid- convalescent subjects. serum igg levels specific for sars-cov- proteins (s, including the rbd and s subunit, and nucleocapsid [n]) and non-sars-cov- proteins were related to measurements of circulating igg mbcs. anti-rbd igg was absent in unexposed subjects. most unexposed subjects had anti-s igg and a minority had anti-n igg, but igg mbcs with these specificities were not detected, perhaps reflecting low frequencies. convalescent subjects had high levels of igg against the rbd, s , and n, together with large populations of rbd- and s -reactive igg mbcs. notably, igg titers against the s protein of the human coronavirus oc in convalescent subjects were higher than in unexposed subjects and correlated strongly with anti-s titers. our findings indicate cross-reactive b cell responses against the s subunit that might enhance broad coronavirus protection. importantly, our demonstration of mbc induction by sars-cov- infection suggests that a durable form of b cell immunity is maintained even if circulating antibody levels wane. importance recent rapid worldwide spread of sars-cov- has established a pandemic of potentially serious disease in the highly susceptible human population. key questions are whether humans have preexisting immune memory that provides some protection against sars-cov- and whether sars-cov- infection generates lasting immune protection against reinfection. our analysis focused on pre- and post-infection igg and igg memory b cells (mbcs) reactive to sars-cov- proteins. 
most importantly, we demonstrate that infection generates both igg and igg mbcs against the novel receptor binding domain and the conserved s subunit of the sars-cov- spike protein. thus, even if antibody levels wane, long-lived mbcs remain to mediate rapid antibody production. our study also suggests that sars-cov- infection strengthens preexisting broad coronavirus protection through s -reactive antibody and mbc formation. the high susceptibility of humans to sars-cov- infection, the cause of covid- , reflects the novelty of the virus and limited preexisting b cell immunity. igg against the sars-cov- spike (s) protein, which carries the novel receptor binding domain (rbd), is absent or at low levels in unexposed individuals. to better understand the b cell response to sars-cov- infection, we asked whether virus-reactive memory b cells (mbcs) were present in unexposed subjects and whether mbc generation accompanied virus-specific igg production in infected subjects. we analyzed sera and pbmcs from non-sars-cov- -exposed healthy donors and covid- convalescent subjects. serum igg levels specific for sars-cov- proteins (s, including the rbd and s subunit, and nucleocapsid [n] ) and non-sars-cov- proteins were related to measurements of circulating igg mbcs. anti-rbd igg was absent in unexposed subjects. most unexposed subjects had anti-s igg and a minority had anti-n igg, but igg mbcs with these specificities were not detected, perhaps reflecting low frequencies. convalescent subjects had high levels of igg against the rbd, s , and n, together with large populations of rbd-and s -reactive igg mbcs. notably, igg titers against the s protein of the human coronavirus oc in convalescent subjects were higher than in unexposed subjects and correlated strongly with anti-s titers. our findings indicate cross-reactive b cell responses against the s subunit that might enhance broad coronavirus protection. 
importantly, our demonstration of mbc induction by sars-cov- infection suggests that a durable form of b cell immunity is maintained even if circulating antibody levels wane. importance the betacoronavirus sars-cov- , the causative agent of a respiratory disease termed covid- , emerged in china in late and rapidly spread worldwide ( ). a pandemic was declared in march and global deaths from covid- now exceed , . the rapid increase in cases in many countries has challenged healthcare systems and shutdowns and quarantine measures introduced to slow virus spread have caused major disruptions to society and economies ( ). sars-cov- infection produces a wide spectrum of outcomes. a proportion of infections, likely more than %, remain asymptomatic. most clinical cases develop mild to moderate respiratory symptoms, but up to % progress to a more severe disease with extensive pneumonia ( , ). when sars-cov- emerged and began to spread, the severity of the threat was primarily attributed to the novelty of the virus to the human immune system and, consequently, a lack of preexisting immune memory to quickly clear virus and limit disease progression. four types of common cold coronavirus are endemic in humans, the alphacoronaviruses e and nl and the betacoronaviruses oc and hku . however, limited relatedness between key structural proteins of these human coronaviruses (hcovs) and those of sars-cov- suggested that significant cross-reactive immunity was unlikely ( , ). initial studies of non-sars-cov- -exposed individuals found negligible levels of igg against the sars-cov- spike (s) protein, the viral attachment protein that binds the receptor angiotensin converting enzyme (ace ) on host cells to initiate infection ( ). more recently, however, studies have provided evidence of sars-cov- -reactive b and t cell memory in unexposed subjects that could confer some protection against sars-cov- or modulate disease pathogenesis. 
sera from non-sars-cov- -exposed individuals have been screened for igg binding to the s and s subunits of the sars-cov- s protein. the membrane-distal s subunit contains the receptor binding domain (rbd) for receptor recognition, and the membrane-proximal s , which has higher homology among coronaviruses than does s ( , ), mediates membrane fusion to release viral rna into the host cell. in two large cohorts of unexposed subjects, approximately % had igg that bound s , but not s or the rbd. approximately % of subjects had igg against the sars-cov- nucleocapsid (n) protein, which is highly conserved among coronaviruses ( , ) . although n is an internal viral protein and not a target of neutralizing antibodies (abs) , coronavirus infections typically elicit strong anti-n ab production ( ). the idea that circulating hcovs elicit igg that cross-reacts with sars-cov- is supported by the finding that sars-cov- infection increases igg titers against the s proteins of multiple hcovs ( ). in t cell studies, cd + t cells in up to % of non-sars-cov- -exposed donors responded to epitopes in s and non-s proteins of sars- ) . notably, s-reactive cd + t cells in unexposed subjects were mostly reactive to the conserved s subunit, consistent with cross-reactivity to circulating hcovs ( ). sars-cov- -reactive cd + t cells were also detected in unexposed donors, but the response was less marked than for cd + t cells ( ). are also likely to be present in non-sars-cov- -exposed individuals. indeed, mbcs might be more important than preexisting cross-reactive abs as a source of protection against sars-cov- . igg mbcs are more broadly reactive than abs generated against the same antigen, they persist after circulating ab levels wane, and they are readily activated to generate strong ab responses or seed germinal centers for additional rounds of affinity maturation ( ). 
concurrent early production of virus-specific igm and igg in the response to sars-cov- infection suggests a response mediated by igg mbcs as well as naïve b cells ( , ( ) ( ) ( ) . this picture is supported by to extend our understanding of the b cell response to sars-cov- infection, the current study compared ab and mbc immunity to sars-cov- in unexposed individuals and individuals in the convalescent phase of infection. in particular, we were interested in the presence of sars-cov- -reactive mbcs in unexposed subjects that could confer some protection against sars-cov- , and formation of mbcs by sars-cov- infection to provide durable protection against igg mbcs reactive to the novel rbd and the conserved s subunit of the s protein. mbcs are thus likely to be available to mediate rapid protective ab responses if circulating ab levels wane and reinfection occurs. our study also draws attention to preexisting sars-cov- -cross-reactive b cell memory to the s subunit in sars-cov- -naïve subjects. we speculate that the strong response to s after sars-cov- infection reflects preexisting s -reactive mbc activation and strengthens broad coronavirus protection. convalescent subjects sampled - weeks after symptom onset. reactivity was measured against the s (including the rbd and s subunit) and n proteins of sars-cov- and the s proteins of the human alphacoronavirus e and betacoronavirus oc . the h influenza virus hemagglutinin and tetanus toxoid (ttd) were included as control antigens that humans are commonly exposed to through infection and vaccination. serum igg levels were measured by elisa. approximately one-third of non-sars-cov- -exposed subjects in the healthy donor cohort had low levels of serum igg against the s and n proteins of sars-cov- , likely reflecting cross-reactivity with seasonal hcovs (figure a). notably, % of unexposed subjects had igg against the highly conserved s subunit of the s protein. 
it is possible that inherent features of the bulky s reagent used in our analysis reduced binding by anti-s abs. igg that bound the highly novel rbd was not detected in unexposed subjects. all non-sars-cov- -exposed subjects had igg against s proteins of the hcovs e and oc , indicating previous infection, and against the control proteins h and ttd ( figures c- f). response to the s subunit. levels of igg against s, rbd, s and n were markedly higher in convalescent subjects than unexposed subjects, indicating strong induction of these abs by sars- cov- infection ( figure a) . in a small number of convalescent subjects, high anti-s igg titers were associated with low levels of anti-n igg. indeed, more than % of convalescent subjects had anti-n igg levels within the range in unexposed subjects, questioning the reliability of using anti-n igg measurement to identify previous sars-cov- infection. notably, serum igg titers against s were consistently higher than against the rbd in convalescent subjects, perhaps reflecting the novelty of the rbd and a response dependent on naive b cell activation ( figure b) . interestingly, titers of igg were higher against the s protein of the hcov oc in convalescent subjects than in unexposed subjects, but this was not the case for the s protein of hcov e (or for the control proteins h and ttd) ( figures c- f ). the cov- infection ( figure g ). the particularly strong correlation between igg titers against oc s and the sars-cov- s suggests a cross-reactive response to the s subunit. since the healthy donor samples in our analysis were collected - years before the emergence of sars-cov- , we considered the possibility that a recently circulating hcov could have been responsible for the higher anti-oc s igg titers in the convalescent subjects. to exclude this possibility, we measured anti-oc s igg titers in sera collected from healthcare workers in . 
the healthcare workers cared for hospitalized sars-cov- patients, but all were negative for igg against sars-cov- s and rbd, consistent with the effectiveness of personal protective equipment and appropriate work practices. oc s-reactive igg levels in healthcare worker sera were similar to those in non-sars-cov- -exposed healthy donor sera and significantly lower than those in sera from convalescent subjects ( figure c ). taken together, our results indicate that sars-cov- infection generates a strong igg response that cross-reacts with the s of human betacoronaviruses. reactivity to the rbd and s subunit. pbmcs from non-sars-cov- -exposed subjects and convalescent subjects were analyzed for mbcs reactive to sars-cov- proteins. circulating proportion of unexposed subjects suggested that igg mbcs with the same specificity had also been formed. however, these mbcs were not detected, possibly because of very low frequencies in the circulation. in contrast, igg mbcs reactive to the s proteins of the hcovs oc and e and the control proteins h and ttd were detected in nearly % or more of non-sars-cov- - exposed subjects, consistent with the higher levels of serum igg against these antigens ( figure e - h) . as expected, sars-cov- rbd-reactive mbcs were not detected in unexposed subjects. in marked contrast to non-sars-cov- -exposed subjects, the vast majority of convalescent subjects had circulating igg mbcs reactive to the sars-cov- s, rbd, and s , indicating strong induction by sars-cov- infection of mbcs reactive to novel and conserved regions of the s protein ( figure a) . notably, numbers of igg mbcs reactive to the s protein of the hcov oc were higher in convalescent subjects than in unexposed subjects ( figure e generates igg mbcs reactive to the sars-cov- s that cross-react with the s of human betacoronaviruses. 
interestingly, only a small proportion of the convalescent subjects generated detectable n-reactive igg mbcs, even though most subjects produced high levels of anti-n igg in serum (figures c, d). it is unclear whether this reflects a real difference between s- and n-reactive mbc formation or an effect of the sampling time. overall, we demonstrate that sars-cov- infection induces strong s-reactive mbc formation that would be expected to provide lasting protection against reinfection and potentially broad protection against betacoronaviruses.
our goals in this study were to investigate sars-cov- -reactive b cell memory in unexposed subjects that could provide some protection against sars-cov- infection, and the generation of b cell memory by sars-cov- infection that could provide lasting protection against reinfection. in particular, we were interested in igg mbcs, which respond to cognate antigens with rapid, vigorous, and high-affinity ab production. importantly, mbcs are long-lived cells that continue to provide strong protection when circulating ab levels wane. our approach was to analyze circulating igg as well as igg mbcs from the sars-cov- -naïve and sars-cov- -convalescent subject groups. 
our key findings are as follows: (i) the presence of igg reactive to the s subunit of sars-cov- in most unexposed subjects, likely reflecting cross- reactivity to hcovs, (ii) markedly increased levels of igg against the sars-cov- s and n proteins, including reactivity to the rbd and s subunit of s, in convalescent subjects, (iii) increased igg binding to the s protein of the oc hcov, but not e hcov, in convalescent subjects, reflecting greater cross-reactivity between s subunits of betacoronaviruses, (iv) strong formation of igg mbcs reactive with the rbd and s subunit of the sars-cov- s protein in convalescent subjects, and (v) formation of igg mbcs reactive with the s protein of oc , but not e, in convalescent subjects, consistent with s subunit cross-reactivity between approximately one-third of our cohort of non-sars-cov- -exposed subjects had low levels of igg against the sars-cov- s and n proteins. the anti-n igg likely reflects infection with hcovs, which have low level ( - %) homology with the sars-cov- n protein ( ). however, a protective function for anti-n abs has not been established ( ). notably, % of unexposed subjects had igg against the s subunit, reflecting homology with hcovs, but none had igg against the highly novel sars-cov- rbd ( , , ) . abs that target the s subunit have been shown to have virus neutralizing activity, raising the possibility that preexisting anti-s igg confers some protection against sars-cov- ( ). the processes that generate anti-s igg are also likely to generate s -reactive igg mbcs and these might provide more significant protection than low levels of anti-s abs. however, s -reactive mbcs (or s-reactive and n-reactive mbcs) were not detected in non-sars-cov- -exposed subjects. taken together with the identification of s-reactive mbcs in unexposed healthy donors ( ), it is likely that s -reactive mbcs were below the limit of detection in our assays. 
most mbcs are resident in lymphoid tissues, except for mbcs against frequently seen immunogenic antigens (for example, the influenza h or ttd in this study), and are at very low frequencies in circulation in steady state ( , ) . anti-rbd, -s, and -n igg levels were markedly higher in the convalescent subjects than in non-sars-cov- -exposed subjects, indicating strong induction by sars-cov- infection. perhaps notably, the majority of convalescent subjects had higher igg titers against the s than against the rbd. this is particularly surprising because of the accessibility of the rbd to b cells and the expected immunodominance over the s subunit ( , ). our demonstration of strong anti-s igg production is consistent with the activation of a preexisting population of igg mbcs against the conserved s subunit in the absence of mbcs reactive to the novel rbd. however, we cannot exclude inherent differences in the stability or antigenicity of rbd and s reagents as an explanation. in convalescent subjects, igg levels against the s protein of hcov oc (but not e) were significantly higher than in non-sars-cov- -exposed subjects and correlated strongly with anti-s igg levels. these findings support stronger b cell cross-reactivity between the s subunits of sars-cov- and human betacoronaviruses than alphacoronaviruses ( ). importantly, we demonstrate that sars-cov- infection generates rbd-reactive and s - reactive igg mbcs. recently, long et al. ( ) found that levels of sars-cov- -reactive abs, including neutralizing abs, start to decrease within - weeks of infection, especially when the infection is asymptomatic. since mbc populations are maintained for many years, perhaps decades, our findings indicate that mbcs generated by sars-cov- infection will be available to rapidly generate protective abs if waning ab levels allow re-infection to occur ( ). 
notably, three convalescent subjects in our analysis had undetectable rbd-reactive igg, but nevertheless had rbd-reactive igg mbcs. this might reflect mbc production by germinal centers that remained active after recovery from infection ( ). the proportion of subjects with mbcs reactive to the hcovs oc and e was greater for the convalescent group than the unexposed group, likely reflecting the increase in s -reactive mbcs in the convalescent group and cross-reactivity with hcovs. s -reactive mbc expansion by sars-cov- infection could enhance protection against a broad range of coronaviruses ( ). n-reactive mbc formation in convalescent subjects was less than expected given the large number of subjects with high titers of n-reactive igg, but additional sampling times are required to confirm this observation. in conclusion, our analysis investigated ab and mbc immunity to sars-cov- in unexposed subjects and individuals soon after recovery from sars-cov- infection. findings emphasized the novelty of the sars-cov- s protein rbd in unexposed subjects. however, igg reactive to the s was widespread in unexposed subjects and likely resulted from exposure to hcovs. although our approach was unable to directly identify s -reactive mbcs in the unexposed subjects, we suggest that these cells are present and strongly contribute s -reactive igg early in the response to sars-cov- infection. the igg response in sars-cov- convalescent subjects was also strong against the rbd and, less consistently, against the n protein. importantly, sars-cov- convalescent subjects had generated rbd-reactive and s -reactive igg mbcs. the may, and consisted of pcr-confirmed patients and non-pcr-confirmed subjects who were contacts of confirmed cases or displayed covid- -like symptoms. the convalescent subjects were sampled - weeks after symptom onset. 
symptoms reported (percent of subjects) were fever ( %) cough ( %), sore throat ( %), stuffy/runny nose ( %), difficulty breathing ( %), fatigue ( %), headache ( %), body aches ( %), nausea/vomiting ( %), and diarrhea/loose stool ( %). (isolate wuhan-hu- ) were expressed in-house in hek cells using pcaggs plasmid constructs kindly provided by florian krammer (icahn school of medicine at mount sinai) ( ). baculovirus-expressed s subdomain and hek cell-expressed n protein were obtained from sino biological (chesterbrook, pa) and raybiotech (peachtree corners, ga), respectively. baculovirus-expressed s proteins from seasonal hcovs oc and e were obtained from sino biological. in-house hek cell-expressed hemagglutinin from egg-derived h n mabtech stockholm, sweden) and p-nitrophenyl phosphate substrate (thermo fisher) were subsequently added to detect bound antigen-specific abs. absorbance was read at nm after color development. a weight-based concentration method was used to quantify antigen-specific ab levels in test samples as described previously ( , ) . sera from healthy donors and convalescent subjects with high titers for test antigens were used to establish human serum standards. the cutoff for assay positivity was set at approximately x the mean od value for negative wells. statistical analyses. the medians with (q , q ) were summarized by subject group and compared by the wilcoxon rank-sum test. spearman correlation analysis together with corresponding robust regression models was used to assess monotonic associations among ab responses. multiple test adjustment was not applied for this explorative study and thus a p value < . was considered significant for all analyses. statistical analyses were performed using software sas . (sas institute inc, cary, nc). cov- -exposed and covid- convalescent subjects. sera were collected from ( proteins in non-sars-cov- -exposed and covid- convalescent subjects. 
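as an illustration of the statistical analyses described in the methods, the following python sketch computes the wilcoxon rank-sum statistic (in its equivalent mann-whitney u form) and the spearman correlation from average ranks. it is not the authors' sas code: the function names and the example values are hypothetical, and p values and the robust regression models are not reproduced.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def rank_sum_u(group_a, group_b):
    """Mann-Whitney U for group_a, derived from its rank sum in the pooled data."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical igg concentrations for two subject groups (arbitrary units).
    unexposed = [0.2, 0.4, 0.5, 0.9]
    convalescent = [1.1, 2.3, 3.0, 4.8]
    print("U (unexposed):", rank_sum_u(unexposed, convalescent))
    print("rho:", spearman_rho(unexposed, convalescent))
```

a u value of zero for the first group means every value in that group is lower than every value in the second group; converting u to a p value would require an exact or normal-approximation step that is omitted here.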
pbmcs for mbc analysis were collected from (i) healthy donors sampled from - (hd) and (ii) covid- convalescent subjects sampled - weeks after symptom onset (conv). pbmcs were stimulated in vitro to induce mbc differentiation into ab-secreting cells. antigen-specific a pneumonia outbreak associated with a new coronavirus of probable bat origin sars-cov- vaccines: status report clinical features of patients infected with novel coronavirus in wuhan clinical and immunological assessment of asymptomatic sars-cov- infections genome composition and divergence of the novel coronavirus ( -ncov) originating in china phylogenetic analysis and structural modeling of sars-cov- spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop a serological assay to detect sars-cov- seroconversion in humans reimer presence of sars-cov- reactive t cells in covid- patients and healthy donors infectious diseases (except hiv/aids) pre-existing and de novo humoral immunity to sars-cov- in humans characterization of a novel coronavirus associated with severe acute respiratory syndrome antibody response of patients with severe acute respiratory syndrome (sars) targets the viral nucleocapsid virological assessment of hospitalized patients with covid- targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals b cell responses: cell interaction dynamics and decisions antibody responses to sars-cov- in patients with covid- kinetics of sars-cov- specific igm and igg responses in covid- patients covid- serology at population scale: sars-cov- -specific antibody responses in saliva. 
infectious diseases (except hiv/aids) deep sequencing of b cell receptor repertoires from covid- patients reveals strong convergent immune signatures broad neutralization of sars-related viruses by human monoclonal antibodies convergent antibody responses to sars-cov- in convalescent individuals contributions of the structural proteins of severe acute respiratory syndrome coronavirus to protective immunity human monoclonal antibodies against highly conserved hr and hr domains of the sars-cov spike protein are more broadly neutralizing the transcription factor t-bet resolves memory b cell subsets with distinct tissue distributions and antibody specificities in mice and humans broad dispersion and lung localization of virus- specific memory b cells induced by influenza pneumonia a sequence approach can predict candidate targets for immune responses to sars-cov- the receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in sars-cov- patients cutting edge: long-term b cell memory in humans after smallpox vaccination role of memory b cells in hemagglutinin- specific antibody production following human influenza a virus infection broad hemagglutinin-specific memory b cell expansion by seasonal influenza virus infection reflects early-life imprinting and adaptation to the infecting virus assignment of weight-based antibody units to a human antipneumococcal standard reference serum, lot -s individual hd and conv subjects in order of ascending titers against s. the assigned cutoff for positivity is shown by the shaded bar. (b) proportions of serum igg against the sars-cov c) serum igg concentrations against the s protein of the hcov oc in conv, hd, and hcw subjects. (d-f) serum igg concentrations against the s protein of the hcov e (d), the influenza virus h hemagglutinin (e), and ttd (f) in conv and hd subjects. 
(g) correlation between serum igg concentrations against the s subunit of sars-cov- and the s protein of the hcov oc ; ns [not significant]) for comparisons of serum igg concentrations between subject groups was determined by the wilcoxon rank-sum test. correlations were tested by spearman correlation analysis with corresponding robust regression models. quantitation of mbc-derived ab (igg)-secreting cells (mascs) or mbc-derived polyclonal abs (mpabs) provided a measure of the abundance of specific igg mbcs. (a) igg mbcs reactive to the sars-cov- spike (s), receptor binding domain (rbd), and nucleocapsid (n) in conv subjects. mbc numbers were determined by enumeration of igg mascs by elispot assay after in vitro mbc stimulation. the assigned cutoff for positivity is shown by the shaded bar. mbcs reactive to the influenza virus h hemagglutinin and ttd in conv subjects. mbc numbers were determined by enumeration of igg mascs. (c) proportions of igg mbcs reactive to the sars-cov- rbd, s , and n for individual conv subjects. (d) comparison of serum igg concentrations (upper panels) and igg mbc numbers cov- s (left-hand side) and n (right-hand side) proteins. serum igg was measured by elisa. dilution curves are shown for individual conv subjects; curves for subjects are shown in different colors to identify particular response patterns.
key: cord- - hb ndr authors: resende, paola cristina; motta, fernando couto; roy, sunando; appolinario, luciana; fabri, allison; xavier, joilson; harris, kathryn; matos, aline rocha; caetano, braulia; orgeswalska, maria; miranda, milene; garcia, cristiana; abreu, andré; williams, rachel; breuer, judith; siqueira, marilda m title: sars-cov- genomes recovered by long amplicon tiling multiplex approach using nanopore sequencing and applicable to other sequencing platforms date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: hb ndr genomic surveillance has become a useful tool for better understanding virus pathogenicity, origin and spread. obtaining accurately assembled, complete viral genomes directly from clinical samples is still challenging. here, we describe three protocols using a unique primer set designed to recover long reads of sars-cov- directly from total rna extracted from clinical samples. this protocol is useful, accessible and adaptable to laboratories with varying resources and access to distinct sequencing methods: nanopore, illumina and/or sanger. the novel severe acute respiratory syndrome coronavirus (sars-cov- ), belonging to the family coronaviridae and to the genus betacoronavirus, emerged in wuhan, china in december , and has already been introduced in countries to date ( , ). in the uk it has caused more than , reported cases and , deaths and in brazil , cases and , deaths have already been reported, last update april th , ( ). on march th, the who declared a sars-cov- pandemic, reinforcing the need for all countries to implement measures for rapid detection and characterization of the virus to help mitigate virus transmission. genomic surveillance has become a useful tool for better understanding virus pathogenicity, origin and spread. obtaining accurately assembled, complete viral genomes directly from clinical samples is still a challenging task due to the low amount of viral nucleic acid in the clinical specimen compared to host dna, and to the size of the sars-cov- genome, which is around kb in length. despite those limitations, we developed a sequencing protocol that successfully obtained whole genomes from sars-cov- positive samples referred to the national reference laboratory at fiocruz in brazil. this protocol was further optimised for higher throughput sequencing at university college london pathogen genomics unit and ucl genomics to sequence genomes for the covid- genomics uk consortium (cog-uk). 
the tiling amplicon multiplex pcr method has been previously used for virus sequencing directly from clinical samples to obtain consensus genome sequences ( ). this protocol has been applied to ebola, zika, chikungunya and sars-cov- sequencing ( ) ( ) ( ) ( ) using preferentially short amplicons (~ base pairs). nanopore sequencing allows rapid turnaround times ( - days) for obtaining a consensus sequence directly from clinical samples and allows a faster response during an outbreak. in this study, we adapted this protocol to recover longer kb reads, decreasing the number of primers required and thus reducing possible mismatches and/or undesired interactions. additionally, it is easier to assemble larger viral genomes from longer reads, enabling higher depth of coverage (more than x) in a reduced sequencing time. here, we describe three protocols using a primer set designed to sequence sars-cov- directly from total rna extracted from clinical samples, which were initially diagnosed using real-time rt-pcr ( , ). the protocols described herein can be applied to different sequencing platforms, such as sanger, illumina and oxford nanopore, and therefore are useful, accessible and adaptable to laboratories with different resources and sequencing facilities. by using this protocol, we generated sars-cov- genomes ( from clinical samples and isolates from a set of primer pairs (supplementary table table ). the primers were tested in silico using the geneious r software against the sars-cov- genomes available at the time. to test the efficiency of each primer pair ( um) we performed conventional sanger sequencing with two positive samples detected in reverse transcription was initially performed using superscript™ iv first-strand synthesis system (invitrogen), using total rna from samples presenting ct values ≤ for gene e ( , ). 
two multiplexed pcr products (pool a = primers pairs and pool b = primers pairs) were generated using the q ® high-fidelity dna polymerase (neb) and the primer scheme described in table . the pcr products were purified using agencourt ampure xp beads (beckman coulter™) and the dna concentration measured by the qubit fluorometer (invitrogen) using the qubit dsdna hs assay kit (invitrogen). dna products (multiplex pcr pools a and b) were normalised and pooled together in a final concentration of fmol. the nanopore library protocol is straightforward as this method is optimised for long reads, such as the generated kb amplicons. library preparation was conducted using ligation sequencing d (sqk-lsk oxford nanopore technologies (ont) and native barcoding as illumina sequencing chemistry is geared towards sequencing short reads, dna libraries were generated from the pooled amplicons using nextera xt dna sample preparation kit (illumina, san diego, ca, usa) according to the manufacturer specifications. the size distribution of the libraries was evaluated using a bioanalyzer (agilent, santa clara, usa) and the samples were pair-end sequenced ( x bp) on a miseq v cycle (illumina, san diego, usa). different data analysis pipelines for illumina and oxford nanopore sequencing were used to extract the consensus files from the raw data. demultiplexed fastq files generated from the illumina sequencing data were used as an input for the analysis. reads were trimmed based on quality scores with a cutoff of q used to remove low quality regions and adapter sequences were removed. the reads were mapped to wuhan strain mn , duplicate reads were removed from the alignment and the consensus sequence called at a threshold of x. the entire workflow was carried out in clc genomics workbench software version . . for the oxford nanopore sequencing data, the high accuracy base called fastq files were used as an input for analysis. 
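the consensus step of the illumina pipeline described above (calling a base only where read depth reaches a threshold) can be sketched in python as follows. this is a simplified illustration, not the clc genomics workbench implementation: the function name, the per-position pileup representation, the majority-base rule, and the use of "N" for masked positions are assumptions, and min_depth is left as a parameter because the threshold value is not recoverable from the text.

```python
from collections import Counter


def call_consensus(pileup_columns, min_depth):
    """Return a consensus string from per-position read bases.

    pileup_columns: one string per reference position, holding the bases
    observed at that position across mapped, deduplicated reads.
    min_depth: minimum number of A/C/G/T observations needed to call a base
    (hypothetical parameter; positions below it are masked as 'N').
    """
    consensus = []
    for column in pileup_columns:
        # Count only unambiguous bases; gaps and ambiguity codes are ignored.
        counts = Counter(base for base in column.upper() if base in "ACGT")
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")
        else:
            # Majority rule: take the most frequently observed base.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical four-position pileup; the last position has no coverage.
    columns = ["AAAA", "AC", "GGGT", ""]
    print(call_consensus(columns, min_depth=3))
```

masking low-depth positions rather than guessing a base keeps uncovered regions visible in the consensus, which matters when downstream analyses exclude sequences with quality issues.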
the pipeline used was an adaptation of the artic-ncov medaka workflow (https://artic.network/ncov- /ncov -bioinformatics-sop.html). we used an earlier version of the workflow which used porechop to demultiplex the reads. the mapping to the wuhan reference sequence (mn ) was done using minimap with medaka used for error correction. this was all carried out within the artic-ncov -medaka conda environment (https://github.com/artic-network/artic-ncov ). to put the genomes from brazil and uk generated using this protocol in a global context, sars-cov- genomes from other countries were recovered from gisaid. any sequences of length less than nucleotides, having quality issues on gisaid, or where we detected an unusual frameshifting deletion or insertion relative to the sars-cov- reference sequence were not included in the phylogenetic reconstruction. to identify similar genomes to the genomes produced and not available yet in gisaid we used the cov-glue website (http://cov-glue.cvr.gla.ac.uk/#/home), and we used the nextstrain website (https://nextstrain.org/ncov/global), in order to observe the topology of genomes already available in gisaid. the final curated dataset consisting of sars-cov- genome sequences was aligned using mafft ( ) . model testing was carried out using jmodeltest ( ) here we introduce a versatile sequencing protocol to recover the complete sars-cov- genome based on reverse transcription plus an overlapping long amplicon multiplex pcr strategy, and associated with pipelines to report the data, and recover the consensus files. the protocol was validated with rna extracted from some of the first covid- cases detected in brazil and then optimized and developed for automation at two sequencing facilities at ucl (pgu and ucl genomics) in london uk. alternative protocols for illumina platform, based on an initial amplification of larger fragments ( kb and . kb) produced by one-step rt-pcr with high fidelity enzyme blends were also tested. 
however, they were prone to producing false mutations, likely due to errors during amplification. because sars-cov- remains conserved, presenting few mutations scattered throughout the genome, the possibility of artificial mutations must be ruled out. we have demonstrated that this overlapping long amplicon multiplex pcr protocol is suitable for samples with a wide range of viral loads, generating high coverage throughout the viral genome without artificial indels. it worked well on all four platforms tested (minion, gridion, illumina and sanger), making it suitable for labs with distinct expertise and enabling rapid recovery of the sars-cov- genome directly from clinical samples. the sequencing workflow optimizations were conducted with the purpose of protocol development and the samples used for this optimization were collected as part of the national brazilian surveillance and cog-uk london. we did not use any clinical information or any patient data in this study. * lineage based on pangolin version subtyping tool (https://github.com/hcov- /pangolin); ** ct = cycle threshold , samples from brazil and uk had the ct value measured by different rt-pcr protocols. coronaviridae study group of the international committee on taxonomy of viruses. the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- . 
nat microbiol an interactive web-based dashboard to track covid- in real time multiplex pcr method for minion and illumina sequencing of zika and other virus genomes directly from clinical samples real-time, portable genome sequencing for ebola surveillance circulation of chikungunya virus east/central/south african lineage in rio de janeiro ncov- sequencing protocol extraction-free covid- (sars-cov- ) diagnosis by rt-pcr to increase capacity for national testing programmes during a pandemic maria diagnostic detection of -ncov by real-time rt-pcr detection of novel coronavirus ( -ncov) by real-time rt-pcr mafft multiple sequence alignment software version : improvements in performance and usability jmodeltest: phylogenetic model averaging raxml-vi-hpc: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models we acknowledge the originators of sequences in gisaid (www.gisaid.orgacknowledgments in supplementary table key: cord- - tbb df authors: di gioacchino, andrea; Šulc, petr; komarova, anastassia v.; greenbaum, benjamin d.; monasson, rémi; cocco, simona title: the heterogeneous landscape and early evolution of pathogen-associated cpg dinucleotides in sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tbb df covid- can lead to acute respiratory syndrome in patients, which can be due to dysregulated immune signaling. we analyze the distribution of cpg dinucleotides, a pathogen-associated molecular pattern, in the sars-cov- genome. we find that cpg relative abundance, which we characterize by an adequate force parameter taking into account statistical constraints acting on the genome at the nucleotidic and amino-acid levels is, on the overall, low compared to other pathogenic betacoronaviruses. 
however, the cpg force widely fluctuates along the genome, with particularly low value, comparable to the circulating seasonal hku , in the spike protein (s) coding region and high value, comparable to sars and mers, in the highly expressed nucleocapside (n) coding regions, whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the ’utrs of all subgenomic rna. this dual nature of cpg content could confer to sars-cov- the ability to avoid triggering pattern recognition receptors upon entry, while eliciting a stronger response during replication. we then investigate the evolution of synonymous mutations since the outbreak of the covid- pandemic. using a model of the viral gene evolution under human host pressure, we find that synonymous mutations seem driven, in the n protein coding region, both by the viral codon bias and by the high value of the cpg content, leading to a loss in cpg. sequence motifs preceding these cpg-loss-associated loci match recently identified binding patterns of the zinc finger anti-viral protein. when a virus enters a new host, it can present pathogen-associated molecular patterns (pamps) that are rarely seen in circulating strains that have adapted to that host's immune environment over evolutionary timescales. the emergence of sars-cov- , therefore, provides a rare window into innate immune signaling that may be relevant for understanding immune-mediated pathologies of sars-cov- , anti-viral treatment strategies, and the evolutionary dynamics of the virus, where evidence for selective pressures on viral features can reflect what defines "self" in its new host. as a case in point, the influenza pandemic was likely caused by a strain that originated in water fowl and entered the human population after possible evolution in an intermediate host. 
that viral genome presented cpg dinucleotides within a context and level of density rarely found in the human genome, where they are severely underrepresented, particularly in a set of genes coding for the proteins associated with antiviral innate immunity [ , , , ] . over the past century the h n lineage evolved in a directed manner to lower these motifs and gain upa motifs, in a way that could not be explained by its usage of amino-acid codon bias [ , ] . it has since been found that these motifs can engage the pattern recognition receptors (prrs) of the innate immune system [ , ] , and directly bind the zinc finger anti-viral protein (zap), both in a cpg-dependent manner [ , , ] . hence, the interrogation of emergent viruses from this perspective can predict novel host-virus interactions. covid- presents, thus far, a different pathology than that associated with the h n , which was disproportionately fatal in healthy young adults. it has been characterized by a large heterogeneity in the immune response to the virus [ , , ] and likely dysregulated type i interferon signaling [ , , , ] . various treatments to attenuate inflammatory responses have been proposed and are currently under analysis or being clinically tested [ ] . it is therefore essential to quantify pathogen-associated patterns in the sars-cov- genome for multiple reasons. the first is to better understand the pathways engaged by innate immune agonism and the specific agonists, to help build better antiviral therapies. another is to better predict the evolution of motif content in synonymous mutations in sars-cov- , as it will help understand the process and timescales of attenuation in humans. a third is to offer a principled approach for optimizing vaccine strategy for designed strains [ , ] to better reflect human-genome features. in this work we will use the computational framework developed in [ ] to carry out a study of non-self associated dinucleotide usage in sars-cov- genomes. 
the statistical physics framework is based on the idea of identifying the abundance or scarcity of dinucleotides given their expected usage based on host features. it generalizes the standard dinucleotide relative abundance introduced in [ ] , as it can easily incorporate constraints in coding regions coming from amino-acid content and codon usage. the outcomes of the approach are forces [ ] that characterize the deviations with respect to null models in which the number of dinucleotides is the one statistically expected under a set of various constraints. cpg forces could be related to the evolutionary constraint to lower or increase cpg number, under the pressure of host prrs that recognize a pathogen. such formalism has further been applied to identify non-coding rna from repetitive elements in the human genome expressed in cancer that can also engage prrs [ ] , to characterize the cpg evolution through synonymous mutations in h n [ ] , and to characterize local and non-local forces on dinucleotides across rna viruses [ ] . we perform an analysis of the landscape of cpg motifs and associated selective forces in sars-cov- in comparison with other genomes in the coronavirus family in order to understand specific pamp-associated features in the new sars-cov- strains (sec. . ). we also focus on the heterogeneity of cpg motif usage along the sars-cov- genome (sec. . and sec. . ). finally we use a model of the viral gene evolution under human host pressure, characterized by the cpg force, to study synonymous mutations, and in particular those which change cpg content, observed since sars-cov- entered the human population (sec. . ). the latter approach points to hotspots where new mutations will likely attenuate the virus, while evolving in contact with the human host. we first compute the global force on cpg dinucleotides for sars-cov- and a variety of other viruses from the coronaviridae family affecting humans or other mammals (bat, pangolin), see fig. 
a , using as null model the nucleotide usage calculated from human genome [ ] (see methods sec. . ) . the value − . of the global force for sars-cov- is lower than for sars and mers, and other strongly pathogenic viruses in humans, such as h n , h n , ebola (suppl. fig. si . ). mers shows the highest cpg force among the human coronaviruses, followed by sars, while some bat coronaviruses have stronger cpg force. it is worth noticing that sars-cov- is among the viruses with smallest global cpg force and some hcov that circulate in humans with less pathogenicity have global cpg forces comparable or higher than that of sars-cov- . the absence of a straightforward correlation between global cpg force and the pathology of a coronavirus in humans calls for a finer, local analysis of cpg forces we report below. figure b compares the forces acting on cpg and upa motifs within the coronaviridae family, with a particular emphasis on the genera alphacoronavirus and betacoronavirus, and on those viruses which infect humans [ ] ; for other dinucleotides, see suppl. we observe an anti-correlation between upa and cpg forces. upa is the cpg complementary motif corresponding to the two nucleotidic substitutions more likely to occur in terms of mutations; transitions have larger probability with respect to transversions and are less likely to result in aminoacid substitutions. such anti-correlations are not observed with motifs that are one mutation away from cpg (suppl. fig. si. ) . to go beyond the global analysis we study the local forces acting on cpg in fixed-length windows along the genome. results for sars, mers, sars-cov- , hcov-hku and two representative sequences of bat and pangolin coronaviruses, chosen for their closeness to sars-cov- are reported in fig. c . 
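the windowed analysis described above can be illustrated with a minimal python sketch. it assumes the approximation stated in the methods, in which the force is the logarithm of the relative abundance, x ≈ log[f(cg) / (f(c) f(g))], with f(cg) the number of cpg motifs divided by the window length and f(c), f(g) taken from a reference nucleotide bias. the function names, the window length, the step and the uniform reference frequencies are hypothetical placeholders, not the values used in the paper.

```python
import math

def cpg_force_approx(seq, ref_freqs):
    """Approximate CpG force: log of the observed CpG frequency over the
    frequency expected from the reference nucleotide bias."""
    seq = seq.upper()
    n_cg = seq.count("CG")          # "CG" cannot overlap itself, so count() is exact
    if n_cg == 0:
        return float("-inf")        # no CpG in the window: the force is unbounded below
    observed = n_cg / len(seq)      # f(CG) = number of motifs / sequence length
    expected = ref_freqs["C"] * ref_freqs["G"]
    return math.log(observed / expected)

def local_forces(seq, window, step, ref_freqs):
    """Return (window start, approximate force) pairs for fixed-length windows."""
    return [
        (start, cpg_force_approx(seq[start:start + window], ref_freqs))
        for start in range(0, len(seq) - window + 1, step)
    ]

if __name__ == "__main__":
    # Placeholder bias; the paper uses the human nucleotide frequencies instead.
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    toy_genome = "CGCGCGCGAAAAAAAA"   # toy sequence, CpG-rich start and CpG-free end
    for start, force in local_forces(toy_genome, window=8, step=8, ref_freqs=uniform):
        print(start, force)
```

a positive value marks a window with more cpg than the reference bias predicts, a negative value marks a depleted window; windows without any cpg are reported as minus infinity rather than being smoothed.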
in some genomic regions, especially at the ' and ' extremities, sars-cov- , sars and mers (together with the bat and pangolin viruses) have a peak in cpg forces, which is absent in the hcov-hku (as well as in the other hcovs, see suppl. fig. si. ). the high cpg forces at the extremities could have an important effect on the activation of the immune response via sensing, as the life cycle of the virus is such that the initial and final part of the genome are those involved in the subgenomic transcription needed for viral replication [ , ] . during the infection many more rna fragments from these regions are present in the cytoplasm than from the other parts of the viral genome. consequently, despite the relatively low cpg content of sars-cov- compared to other coronaviruses, there can be high concentrations of cpg-rich rna due to the higher transcription of these regions. the similarity between the high values of the maximum local forces of sars-cov- and those of sars, bat and pangolin coronaviruses shown in fig. a confirm this pattern: mers and sars, viruses that are likely less well adapted to a human host, have the highest local peaks in cpg content, followed by sars-cov- and then by seasonal strains that circulate in humans. it is interesting to notice that high and very high levels of proinflammatory cytokines/chemokines (such as il- and tnf-α) have been observed in, respectively, sars and mers and, at times, sars-cov- infection [ , , ] . these results are qualitatively corroborated by the simpler analysis of cpg motif density (suppl. fig. si. ). we now restrict our analysis to the coding regions of sars-cov- and, in particular, on two structural proteins, n (nucleocapside) and s (spike) [ , , ] . to account for the extra constraints on the amino-acid content and codon usage acting over these coding regions we modify our local force calculation, see methods sec. . and sec. . . 
the landscape of forces when taking into account these coding constraints is shown, restricted to the coding regions of sars-cov- genome (see methods sec. . ), in fig. a , together with the forces computed without the coding constraints with the same human nucleotide frequency used in fig. (dashed lines). the global shift of the forces with or without coding constraints is due to the different background used. indeed, when the human nucleotide frequency is modified into that computed through only the human coding rnas [ ] , the result without coding constraints becomes much more similar to the one with coding constraints (dotted lines in fig. a) . apart from this global shift, the qualitative result does not substantially differ from the findings of fig. c . in particular the peak of high cpg density and force is still present at the ' and the ' ends of the genome, including the n-orf, the envelope e-orf and membrane glycoprotein m-orf regions. on the contrary in the s-orf region the cpg forces under coding constraint are small. detailed results for the s (s orf) and n (n orf) proteins are shown in, respectively, figs. b& c. these structural proteins are present and quite similar across the coronaviridae family, and allow us to compare several strains of coronaviruses.

(figure caption fragment) the force is highly variable along the genome, with much larger values in certain regions (such as the coding region for protein n) than in others (e.g. coding region for protein s). the maximum value of the local cpg force hints at the similarity of sars-cov- with the most pathogenic viruses. bat sequence analyzed in panels (a) and (c): ratg ; pangolin sequence obtained in guangdong in . data from vipr [ ] and gisaid [ ] , see methods sec. . and suppl. sec. si. .

in the s orf, sars-cov- shows the lowest global cpg force among the human-infecting betacoronaviruses, see fig. c . 
the cpg force is much higher for protein n in sars-cov- , immediately below the level of sars and above that of mers, see comparison with human-infecting members of the coronaviridae family presented in fig. b . the comparative analysis of forces in the e-orf (suppl. fig. si. b) gives results similar to the n-orf, while smaller differences in cpg force among coronaviruses that circulate in humans are observed for the m-orf (suppl. fig. si. c ). we now assess the ability of our cpg force model to predict biases in the synonymous mutations already detectable across the few months of evolution following the first sequencing of sars-cov- (data from gisaid [ ] , reference sequence wuhan, - - , last updated sequence - - , see methods sec. . ). barring confounding effects, we expect that high-force regions, such as n orf, will be driven by host mimicry towards a lower number of cpg motifs. other regions, such as s orf, have already low cpg content and would feel no pressure to keep the cpg content at that level, so random mutations would likely increase their cpg numbers. these predictions are in good agreement with the observed mutations in current sars-cov- data, as shown in fig. a . most of the mutations that decrease the number of cpg are located at the ' and '-end of the sequence, in correspondence with the high peak in cpg force, notably in the n orf region. conversely, mutations that increase the number of cpg are found in orf ab, in the low-cpg-force regions and in the s orf region. we now focus on the n protein. figure b shows the locations of synonymous mutations, indicated by bars, and the number of variants in which they are found to occur, indicated by star symbols of corresponding sizes (only if observed more than times). we observed a total number of ms = variants with synonymous mutations, out of which are unique. out of these ms variants and , respectively, lower and increase cpg, while the remaining leave cpg content unchanged. 
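the bookkeeping behind these counts (is a substitution synonymous, and does it lower, raise or preserve the cpg count) can be sketched as follows. this is an illustrative python example, not the authors' pipeline: the function name and the toy sequences are hypothetical, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the usual TCAG ordering of the three codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_mutation(cds, pos, new_base):
    """Classify a single-nucleotide substitution in an in-frame coding sequence.

    cds      : coding sequence (DNA alphabet, length a multiple of 3)
    pos      : 0-based position of the substituted nucleotide
    new_base : nucleotide found in the variant
    Returns (is_synonymous, delta_cpg), where delta_cpg is the change in the
    number of CpG dinucleotides over the whole sequence.
    """
    mutated = cds[:pos] + new_base + cds[pos + 1:]
    start = 3 * (pos // 3)                       # first position of the affected codon
    is_synonymous = CODON_TABLE[cds[start:start + 3]] == CODON_TABLE[mutated[start:start + 3]]
    # Counting on the full sequence also captures CpGs that straddle two codons.
    delta_cpg = mutated.count("CG") - cds.count("CG")
    return is_synonymous, delta_cpg

if __name__ == "__main__":
    # Toy example: GCC|GAA (Ala-Glu) holds a CpG across the codon boundary.
    # The C->T change at position 2 keeps alanine and removes that CpG.
    print(classify_mutation("GCCGAA", 2, "T"))
```

counting cpg on the whole sequence rather than codon by codon matters because a cpg can be formed by the last nucleotide of one codon and the first nucleotide of the next.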
it is remarkable that more than % of the variants differing in cpg count actually have it decreased. when restricting the analysis to the variants in which the cpg count decreases, the losses take place in at different loci. the nucleotide motifs preceding these loci are listed in the top lines of table , together with their positions along sars-cov- (wuhan, / / ) and their number of occurrences in the sequence data. out of of these motifs, which represent out of the observed cpg losses, are of the type cnxgxcg, where nx is a spacer of n nucleotides and were identified as zap binding patterns in [ ] . the binding affinity of zap to the motifs strongly depends on the spacer length, n, with top affinity for n= [ ] . notice that out of the cpg-suppression-related motifs in sars-cov- correspond to n= . other motifs of the type cnxgccg are also present in sars-cov- , but their cpg is not lost in sequence data, see last lines of table ; the dissociation constants associated with their spacer lengths are on average larger than those of the motifs showing cpg loss. for the s protein (see fig. c and star sizes), we observe ms = synonymous variants (with unique mutations). among these variants, and , respectively, lower and increase the cpg content. therefore, only about % of the variants that affect the cpg count decrease it. these results support the existence of early selection pressure to lower cpg occurrence in n orf, but not in s orf. our model can be further used to predict the odds of synonymous mutations from the original sars-cov- (wuhan, - - ) sequence (see sec. . ) . for this purpose, we introduce a synonymous mutation score (sms), defined in methods sec. . , whose value expresses how likely the mutation is to appear under the joint actions of the force and of the reference codon usage , see eq. . the effect of codon usage is important, in particular for synonymous mutations that do not change cpg and for which it is the only driving factor in our model. in fig. 
we show our predictions for synonymous mutations in the n ( b, a) and the s ( c, b) proteins. figs. b and c show sms along, respectively, the n and s sequences and the unique mutations, respectively, lowering (blue), increasing (red), or leaving unchanged (black) the cpg content. the majority of mutations, both unique and taking into account multiplicities (which are described by the stars in figs. b and c), in sars-cov- correspond to high sms, in agreement with our model. to make our arguments more quantitative, we tested the ability of our model to discriminate between observed and non-observed mutations. in figs. a and b we show the histograms of the sms corresponding to observed synonymous variants (in green) and to putative mutations that would leave amino-acid content unchanged but have not been observed so far (in yellow). the distribution of sms for observed variants is shifted to higher values compared to their counterparts for non-observed mutations, both for the proteins n and s. hence, our model is able to statistically discriminate between non-observed and observed synonymous mutations (anova f-test: for n and for s). note that for the null mutational models in which synonymous mutational rate are uniform, the score distribution for observed and unobserved mutations is equally peaked at zero (anova f-test= ). we have checked that our model is able to discriminate between non-observed and observed synonymous mutations even when we restrict only to unique mutations, dropping any information about multiplicity, (anova f-test= (n protein), (s protein) see suppl. fig. si. ). we have further performed comparative tests of our model, in which mutations are driven by codon bias and cpg forces, with simpler models using: i) only the transition versus transversion rate (with ratio : ), [ ] (trs-trv bias) (methods sec. . 
.), ii) the transition versus transversion rate and cpg force, iii) a uniform rate (null model described above), and iv) a uniform rate and cpg force. the results of these additional tests are shown in suppl. fig. si. . the anova f-test and p-values are shown in table and confirm that while the uniform rate and the transition versus transversion bias are not enough to separate the score distributions between observed and unobserved mutations, for the n orf adding a cpg force gives a very clear separation, in the two cases, while for the s orf we observe a still present but less marked separation. finally we checked the consistency of our results at different times since our first analysis (dated - - , see suppl. sec. si. ). these motifs were shown to be binding patterns for the zap protein in [ ] ; the dissociation constants were measured for repeated a spacers, with values (in µm) k d ( ) = . ± . , k d ( ) = . ± . , k d ( ) = . ± . , k d ( ) = . ± . , [ ] . the next lines show the other cpg lost through mutations and their preceding nucleotides, which do not correspond to motifs tested in [ ] . the last lines show other subsequences in the n protein, known as binding motifs of zap from [ ] , but for which no loss of cpg is observed in the sequence data. the present work reports analysis of dinucleotide motif usage, particularly cpg, in the early evolution of sars-cov- genomes. first, a comparative analysis with other genomes shows that the overall cpg force, and the associated cpg content are not as large as for highly pathogenic viruses in humans (such as h n , h n , ebola and sars and mers in the coronaviridae family). however, the cpg force, when computed locally, displays large fluctuations along the genome. this strong heterogeneity is compatible with viral recombination, in agreement with the hypothesis stated in [ ] . the degree to which this heterogeneity in any way reflects zoonotic origins should be further worked out using phylogenetic analysis. 
in particular, the segment coding for the spike protein has a much lower cpg force. the s protein has to bind ace human receptors and tmprss [ , ] . a fascinating reason that could explain the low cpg force on this coding region is that it may come (at least in part) from other coronaviruses that better bound human entry receptors [ ] . other regions, in particular the region after the slippage site in orf ab and the initial and final part of the genome including the n orf, are characterized by a larger density of cpg motifs (and corresponding cpg force), which are comparable to what is found in sars and mers viruses in the betacoronavirus genus. interestingly, the initial and final parts of the genome are involved in the full-genome and subgenomic viral replication. in particular, the coding region of the n protein and its rna sequence, present in the ' untranslated regions (utrs) of all sars-cov- subgenomic rnas, has been shown in [ ] to be the most abundant transcript in the cytoplasm. the high concentration of n transcripts in the cytoplasm could contribute to a dysregulated innate immune response. sars-cov- , due to its complex replication machinery, does not express its rna at uniform concentration. a mechanism generating different densities of pamps being presented to the immune system at different points in the viral life cycle can affect immune recognition and regulation. the precise way this can contribute to immuno-pathologies associated with covid- and how this is related to the cytokine signaling dysfunction associated with severe cases [ ] , need further experimental investigation. second, a first analysis of the evolution of synonymous mutations since the outbreak of covid- shows that mutations lowering the number of cpg have taken place in regions with higher cpg content, at the ' and ' ends of the sequence, and in particular in the n protein coding region. 
the sequence motifs preceding the loci of the cpg removed by mutations match some of the strongly binding patterns of the zap protein [ ] . natural sequence evolution seems to be compatible for protein n with our model, in which synonymous mutations are driven by the virus codon bias and the cpg forces leading to a progressive loss in cpg. these losses are expected to lower the cpg forces, until they reach the equilibrium values in human host, as is seen in coronaviruses commonly circulating in human population [ ] . more data, collected at an unprecedented pace [ , , ] , and on a longer evolutionary time are needed to confirm these hypotheses. since the data collected are likely affected by relevant sampling biases, a more precise analysis of synonymous mutations could be carried out using the available phylogenetic reconstruction of viral evolution [ ] . nevertheless our results seem robust, since they are consistent both considering unique mutations and all collected synonymous variants. they coherently point to the presence of putative mutational hotspots in the viral evolution. while the results presented here are preliminary due to the early genomics of this emerging virus, they point to interesting future directions for identifying the drivers of sars-cov- evolution and building better antiviral therapies. it would be interesting to further model transmission and mutation processes (in the presence of a proofreading mechanism [ ] ) in sars-cov- to predict the time scale at which natural evolution driven by host mimicry would bring the virus to an equilibrium with its host [ , ] . after our work was posted on biorxiv, r. nchioua and colleagues have shown the importance of zap in controlling the response against sars-cov- , see [ ] , by demonstrating that a knock-out of this protein increases sars-cov- replication. 
this finding supports our prediction that recognition of sars-cov- by zap imposes a significant fitness cost on the virus, as demonstrated by its early evolution to remove zap recognition motifs. two other recent theoretical works [ , ] , corroborate our results showing that at the single nucleotide level there is a net prevalence of c→u synonymous mutations (the most common nucleotide mutation which may cause a cpg loss) in the early evolution of sars-cov- . moreover a recent analysis of the immune profile of patients with moderate and severe disease revealed an association between early, elevated cytokines and worse disease outcomes identifying a maladapted immune response profile associated with severe covid- outcome [ ] . the aim of this section is to give an overview of the methods used throughout this work, and to explain why for some analyses we used one method rather than the others. we want to characterize the cpg content of a given genome. the different methods that we used to achieve this result, discussed with their usage cases and their limits, are the following: • a first possibility is to simply count the number of dinucleotide motifs (or to compute their density), along the whole genome. this simple count can be useful to see if there is an evolution of the motif number over time, or to study local fluctuations along a sequence to identify regions in which a motif is abundant or scarce, but it is not suitable to make comparisons among viruses of different families, mainly because of the different length and usage biases of viral genomes. however, since we focused mostly on the coronaviridae family, these differences are not so important, and indeed we can see in suppl. fig. si. that some of our results are also apparent from the motif density analysis. • the force defines the abundance or scarcity of a motif given its expected usage based on the nucleotide bias. it can be computed on the whole or part of the genome. 
in this work we always use, to calculate the force, the human nucleotide bias as reference bias. in the following we detail the force calculation. an important remark is that the force is directly related to the relative abundance f • as shown below, the calculation of the force can be extended to constrain variability of nucleotidic sequences at fixed codons, and using as reference bias the codon bias. this way of computing forces takes into account the fact that the virus has to code for certain specific proteins in its genome. we used here the human codon bias and the sars-cov- codon bias (the latter only for the computation of sms since the virus is not in equilibrium with the human host) as references to compute this force with codon constraints. calculating forces at fixed codon usage allows us to confirm also in this framework the identification of high-and low-cpg force regions, and it was crucial to investigate the dynamics of the synonymous mutations in viral evolution. the model at the core of many of the analyses made here is taken from [ ] . here we briefly review the model, together with its simplified version which does not take into account the codon constraint. let us start from the latter. given a motif m and a sequence s = {s , . . . , sn } of length n , we consider the ensemble of all sequences with length n , which we denote with s, and we suppose the probability of observing s out of this ensemble to be here, f (si) is the nucleotide bias, that is the probability of the i-th nucleotide being si (for example, we always used in this work the human frequency of nucleotides as f (si)), x is the force we want to compute, and nm is the number of times the motif m appears in the sequence. z is the normalization constant, that is therefore the force x is a parameter which quantifies the lack (if negative) or abundance (if positive) of occurrences of m with respect to the number of occurrences due to the local probabilities f (si). 
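to make the definition of the force concrete, here is a minimal python sketch for the model without codon constraints. it assumes that the probability of a sequence is proportional to exp(x nm(s)) times the product of the nucleotide biases f(si), which is how the surrounding text describes the model (the displayed equation is not legible in this copy), and it determines x by matching the model average of the motif count to the observed count. the transfer-matrix recursion, the finite-difference derivative, the bisection bounds and all function names are choices made for this illustration, not the authors' implementation.

```python
import math

NUCLEOTIDES = "ACGT"

def log_partition(x, length, freqs, motif="CG"):
    """log Z(x) for all sequences of a given length, via a 4-state recursion.

    weights[s] is the (rescaled) total weight of prefixes ending in nucleotide s;
    appending `second` right after `first` multiplies the weight by exp(x).
    freqs is assumed to be normalised (values sum to one).
    """
    first, second = motif
    weights = dict(freqs)
    log_z = 0.0
    for _ in range(length - 1):
        total_prev = sum(weights.values())
        new = {}
        for t in NUCLEOTIDES:
            incoming = total_prev
            if t == second:
                incoming += weights[first] * (math.exp(x) - 1.0)
            new[t] = freqs[t] * incoming
        total = sum(new.values())
        weights = {t: w / total for t, w in new.items()}  # rescale to avoid overflow
        log_z += math.log(total)
    return log_z

def mean_motif_count(x, length, freqs, motif="CG", h=1e-5):
    """Model average of the motif count, d log Z / dx, by central differences."""
    up = log_partition(x + h, length, freqs, motif)
    down = log_partition(x - h, length, freqs, motif)
    return (up - down) / (2.0 * h)

def count_motif(seq, motif="CG"):
    """Number of (possibly overlapping) occurrences of a dinucleotide."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == motif)

def fit_force(seq, freqs, motif="CG", lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection on x so that the model average equals the observed count.

    The average increases with x, so bisection is sufficient. A sequence with
    no motif at all has no finite solution and returns (approximately) `lo`.
    """
    n_obs = count_motif(seq.upper(), motif)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_motif_count(mid, len(seq), freqs, motif) < n_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    uniform = {n: 0.25 for n in NUCLEOTIDES}   # placeholder bias, not the human one
    print(fit_force("ACGCGCGTTTTTTTTTT", uniform))   # CpG-rich toy sequence
    print(fit_force("ACGTTTTTTTTTTTTTT", uniform))   # one CpG in 17 nucleotides
```

with a uniform bias, a 17-nucleotide sequence is expected to contain 16/16 = 1 cpg at zero force, so the second toy sequence should give a force close to zero and the first a positive one. the pure-python recursion costs a few operations per nucleotide per evaluation, so for whole genomes a vectorised version would be preferable.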
we can fix x by requiring that the number of motifs in the observed sequence, nm(s ) = n , is equal to the average number of motifs computed through the model, n , that is notice that this is exactly equivalent to the request that x maximizes the probability of having observed s . let us focus now on the specific case of a dinucleotide motif, that is our motif m consists of the pair ab, where a and b are two consecutive nucleotides (for example, a = c and b = g for the cpg motif). in this case, within an approximation discussed in the si, suppl. sec. si. , the force computed as above turns out to be the logarithm of the relative abundance index, that is where f (ab) is the number of motifs ab divided by the total length of the sequence n . in suppl. fig. si. we tested the accuracy of this approximation in our specific case. our model can be improved to take into account the fact that not all the possible sequences with length n can be observed if the genome is coding for one (or more) protein(s). if we restrict the ensemble of sequences to those coding for the same protein, we obtain the model with the codon constraints that we used for several of our analyses here. in this case, we write each sequence s as a series of codons, and its probability is defined as where now the bias takes the form of a codon usage bias, and the normalization constant z changes accordingly into a sum over all possible synonymous sequences. the force x can be computed, analogously with the procedure for the simpler case, by requiring that the number of motifs observed in s is equal to the model average (although this creates some technical difficulties, which have been overcome in [ ] ). we use the model introduced above in eq. to assign a score, which we call synonymous mutation score (sms), to each possible synonymous mutation of a reference sequence. we consider a system evolving for a small time scale, and a mutation which changes the i-th codon ci into a synonymous c i . 
the transition probability, that is the probability of observing the mutation, for such an evolution can be decomposed into the product of two evolution operators: the first, t(ncg → n′cg), representing the change in the number of cpg motifs in the mutated sequence, and the second, t(ci → c′i), representing the gain in mutating the particular codon in position i. the first operator can be computed from the dynamical equation introduced in [ ] for the evolution of the cpg number ncg of a sequence from an initial force x towards an equilibrium force xeq: the equilibrium force is the force computed on a viral strain which is assumed to be at equilibrium with the human innate immune system, because it has evolved in the human host for a long time. eq. was used in [ ] to describe the evolution of the cpg number in h n , taking as the equilibrium force the one of the influenza b strain. in analogy with this approach we take here as xeq the average force calculated for the given segment or coding region on the seasonal hcovs (that is hcov- e, hcov-nl , hcov-hku , hcov-oc ). τ is a parameter determining the characteristic time scale for synonymous mutations. based on eq. ( ) we define the transition operator for cpg number as where ∆ncg = n′cg − ncg. notice that for all the synonymous mutations that leave the cpg number unchanged the above operator is one. the codon mutational operator is where f(ci) is the frequency of codon ci in the chosen codon usage bias. putting together these two terms allows us to estimate how likely a specific synonymous mutation is to happen. the synonymous mutation score (sms) accompanying a mutation is defined as the logarithm of this quantity.

it is well known that transversions (i.e. mutations of purines into pyrimidines and vice versa) are suppressed with respect to transitions (i.e. mutations of purines into purines or pyrimidines into pyrimidines).
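the distinction between transitions and transversions can be made concrete with a short sketch. the helper names below are hypothetical, and counting transversions position by position between two codons of equal length is an assumption of this illustration rather than a statement about the authors' code.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

def is_transversion(x, y):
    """True if replacing x by y swaps a purine for a pyrimidine or vice versa."""
    for base in (x, y):
        if base not in PURINES and base not in PYRIMIDINES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    return (x in PURINES) != (y in PURINES)

def is_transition(x, y):
    """True if x and y differ but belong to the same chemical class."""
    return x != y and not is_transversion(x, y)

def count_transversions(codon, other):
    """Number of positions at which two codons differ by a transversion."""
    if len(codon) != len(other):
        raise ValueError("codons must have the same length")
    return sum(is_transversion(x, y) for x, y in zip(codon, other))
```

with these definitions a→g and c→t are transitions, while a→c, a→t, g→c and g→t (and their reverses) are transversions.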
we introduce here a simple way to account for transversion penalties in the uniform codon bias. we suppose that a mutation with n transversions happens times less often than a mutation with n − 1 transversions. therefore, starting from a uniform probability of mutating a codon c into a synonymous codon c′, we insert the transversion penalty and obtain that this probability becomes where t(c, c′) is the number of transversions needed to mutate c into c′. here n is a normalization constant, such that the sum runs over the synonymous codons of c (without including c). then we can define, for a fixed set of synonymous codons, a transition matrix $T_{c,c'} = p(c \to c')$. the codon usage bias for synonymous mutations with transversion penalties is the stationary probability distribution of the markov chain having the matrix defined in eq. as transition operator. this stationary distribution is therefore given by the unique vector of probabilities $b(c)$ such that

$$\sum_{c} b(c)\, T_{c,c'} = b(c').$$

by solving this set of equations, together with the requirement that $\sum_c b(c) = 1$, for each set of synonymous codons, we obtain our codon usage bias. we have repeated this calculation for all amino acids. we used this modified bias to perform anova f-tests together with other codon usage biases, see table and suppl. fig. si. .

sars-cov- sequences are taken from gisaid [ ] . we collected each sequence present in the database on - - (the most recent sequence was collected on - - ). before any of our analyses, we discarded all the sequences where one or more nucleotides were wrongly read. this left us with sars-cov- sequences. to obtain fig. we considered, in addition to the sars-cov- sequences taken from gisaid, other alphacoronavirus and betacoronavirus sequences (whole genomes and genes) which have been obtained from vipr [ ] . the pre-processing consisted again of discarding all the sequences with one error or more.
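a minimal sketch of this pre-processing step is given below. it assumes that a "wrongly read" nucleotide shows up as any symbol outside the four unambiguous letters (for example n or another iupac ambiguity code); this interpretation, the input format (a dictionary from sequence name to sequence) and the function names are assumptions of the example.

```python
def is_clean(seq, alphabet="ACGT"):
    """True if seq is non-empty and contains only unambiguous nucleotides."""
    return len(seq) > 0 and set(seq.upper()) <= set(alphabet)

def discard_ambiguous(records, alphabet="ACGT"):
    """Keep only the records whose sequence has no ambiguous symbol.

    records maps a sequence identifier to its nucleotide string.
    """
    return {
        name: seq for name, seq in records.items() if is_clean(seq, alphabet)
    }
```

for rna-alphabet input the `alphabet` argument can be set to `"ACGU"`.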
after this process we collected sars, mers, hcov- e, hcov-nl , hcov-hku , hcov-nl , bat-covs and pangolin-covs whole genomes. for fig. b we used the largest possible number of sequences, up to a maximum of . for fig. a (viral sequences) and fig. c we chose a single sequence for each species. however, we checked that the result is qualitatively the same if we use other sequences from the same species for human coronaviruses. to obtain the plots in fig. and fig. , we took as the reference sars-cov- sequence the one collected on - - (id: epi isl ). in fig. a the sars-cov- sequence has been processed to ensure the correct reading frame: the orf ab gene is read in the standard frame up to the ribosomal shifting site, and in the shifted frame from that site up to the end of the polyprotein. moreover, the small non-coding parts between successive proteins have been cut, resulting in a loss of nucleotides. to produce the bar plots in figs. c and b we collected gene data from vipr. then we discarded, as usual, all the sequences with one or more errors, and we computed for each gene an average over up to different sequences (coming from the same species). for some structural proteins we did not find different genes, but in any case the standard deviation of the averages of figs. c and b is smaller than . (and, for most of the viruses, much smaller). in particular, we used sequences of sars-cov- , mers, hcov-nl , hcov-oc proteins, sequences for hcov- e, for hcov-hku and for sars. more detailed information about the genomes used in this work is given in suppl. sec. si. . the code used to compute forces, both in the coding and non-coding cases, is publicly available at https://github.com/adigioacchino/dinucleotide_forces.

fig. and suppl. fig. si. , where all the coronaviruses associated with circulating human strains are compared with sars-cov- in terms of cpg force (panel (a)) or number in fixed-length windows (panel (b)).
again, even though the final regions of the hcovs have relatively high cpg force with respect to the other parts of their sequences, sars-cov- has a ' cpg force peak well above the final region of the hcovs.

sars-cov- genomes are collected at an unprecedented pace, with tens of genomes added daily. we have updated our analysis several times since the first version of this manuscript was posted on biorxiv, and our results are consistently verified by mutations observed in recent genomes. to compare the analyses performed with the most recent genomes and those that we obtained previously, we collected all the sars-cov- sequences submitted to gisaid up to - - (last updated sequence collected on - - ), and we report here the analogues of tables and , and of fig. , computed with those sequences, respectively as tables , and fig. si. . we also show in table the analogue of table , computed with the sequences submitted to gisaid by - - . although the availability of fewer data lowers the f-test results most of the time (and therefore gives a higher p-value) with respect to table , the qualitative results are very similar. for instance, it remains true that the score given through the transition-transversion bias alone cannot distinguish between the observed and non-observed mutations, while these two cases become distinguishable if the cpg force is added, especially for n orf.

table : total number of synonymous variants and cpg decreasing/increasing variants observed for orf n and s at two stages of the early evolution of sars-cov- in contact with the human host. the dates refer to the upper limit for submission to gisaid of the sequences used.

we want to show in which limit the cpg force (without codon constraints) is equivalent to the relative dinucleotide abundance [ ] , eq. . we start from the partition function:

$$Z = \sum_{s} e^{x N_m(s)} \prod_{i=1}^{N} f(s_i).$$

now we can compute each term in the cluster expansion, and we get for the $k$-th term (for $a \neq b$, as in the cpg case)

$$Z_k = \binom{N-k}{k} \, g^k,$$

where we defined $g = (e^x - 1)\, f(a)\, f(b)$.
now we suppose $N = 2M$, that is, $N$ is even (however, we will soon consider the large-$N$ limit, where this requirement is no longer necessary). therefore, we have

$$Z = \sum_{k=0}^{M} \binom{2M-k}{k} \, g^k.$$

to proceed further, we can consider the case where $g \ll 1$. this is a good approximation when x is small, and it is also fairly good as long as x is lower than , as in all the cases studied here. under this hypothesis, we have

$$Z \simeq \sum_{k} \frac{(N g)^k}{k!} \simeq e^{N g},$$

where in the last step we also used that $N \gg 1$. from this, by using that $\langle n \rangle = \partial_x \log Z$ and requiring $\langle n \rangle = n_0 = N f(ab)$, we obtain eq. . fig. si. shows the correlation between the cpg force with the nucleotide bias and the cpg relative abundance.

figure si. : comparison between the cpg force and the cpg relative abundance index. as discussed in sec. si. , these two quantities are almost identical when the genome is long. to show that, here different genomes for several coronavirus species are used to compute these two quantities, and the dashed blue line is a linear fit of the resulting points.

compositional differences within and between eukaryotic genomes why is cpg suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses cpg usage in rna viruses: data and hypotheses quantitative theory of entropic forces acting on constrained nucleotide sequences applied to viruses patterns of evolution and host gene mimicry in influenza and other rna viruses oligonucleotide motifs that disappear during the evolution of influenza virus in humans increase alpha interferon secretion by plasmacytoid dendritic cells sequence-specific sensing of nucleic acids cg dinucleotide suppression enables antiviral defence targeting non-self rna molecular mechanism of rna recognition by zinc-finger antiviral protein structure of the zinc-finger antiviral protein in complex with rna reveals a mechanism for selective targeting of cg-rich viral sequences covid- : a new virus, but an old cytokine release syndrome covid- : consider cytokine storm syndromes and immunosuppression.
the lancet a pneumonia outbreak associated with a new coronavirus of probable bat origin sars-cov and ifn: too little, too late impaired type i interferon activity and exacerbated inflammatory responses in severe covid- patients. medrxiv the covid- cytokine storm; what we know so far immunology of covid- : current state of the science effective treatment of severe covid- patients with tocilizumab sars-cov- vaccines: status report sequence analysis of sars-cov- genome reveals features important for vaccine design distinguishing the immunostimulatory properties of noncoding rnas expressed in cancer cells categorical spectral analysis of periodicity in human and viral genomes from sars to mers, thrusting coronaviruses into the spotlight the molecular biology of coronaviruses the architecture of sars-cov- transcriptome active replication of middle east respiratory syndrome coronavirus and aberrant induction of inflammatory cytokines and chemokines in human macrophages: implications for pathogenesis vipr: an open bioinformatics database and analysis resource for virology research data, disease and diplomacy: gisaid's innovative contribution to global health evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor rates of transition and transversion in coding sequences since the human-rodent divergence the proximal origin of sars-cov- understanding human coronavirus hcov-nl . the open virology journal nextstrain: real-time tracking of pathogen evolution introductions and early spread of sars-cov- in the new york city area the zinc finger antiviral protein restricts sars-cov- . biorxiv coronavirus genomes carry the signatures of their habitats. biorxiv cytosine deamination in sars-cov- leads to progressive cpg depletion. 
biorxiv longitudinal analyses reveal immunological misfiring in severe covid- database resources of the national center for biotechnology information we thank nicolas vabret for discussions and reading of the manuscript, eddie holmes and marta luksza for helpful exchanges. we gratefully acknowledge the authors, originating and submitting laboratories of the sequences from gisaids epicov(tm) database on which this research is based. this work was partially supported by the anr- decrypted ce - - grant and by the key: cord- -ps x es authors: li, wei; yang, shuai; xu, peng; zhang, dapeng; tong, ying; chen, lu; jia, ben; li, ang; ru, daoping; zhang, baolong; liu, mengxing; lian, cheng; chen, cancan; fu, weihui; yuan, songhua; ren, xiaoguang; liang, ying; yang, zhicong; li, wenxuan; wang, shaoxuan; zhang, xiaoyan; lu, hongzhou; xu, jianqing; wang, hailing; yu, wenqiang title: human identical sequences of sars-cov- promote clinical progression of covid- by upregulating hyaluronan via namirna-enhancer network date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ps x es the covid- pandemic is a widespread and deadly public health crisis. the pathogen sars-cov- replicates in the lower respiratory tract and causes fatal pneumonia. although tremendous efforts have been put into investigating the pathogeny of sars-cov- , the underlying mechanism of how sars-cov- interacts with its host is largely unexplored. here, by comparing the genomic sequences of sars-cov- and human, we identified five fully conserved elements in sars-cov- genome, which were termed as “human identical sequences (his)”. his are also recognized in both sars-cov and mers-cov genome. meanwhile, his-sars-cov- are highly conserved in the primate. 
mechanistically, his-sars-cov- , behaving as virus-derived mirnas, directly target human genomic loci and further interact with host enhancers to activate the expression of adjacent and distant genes, including cytokine genes, angiotensin converting enzyme ii (ace ), a well-known cell entry receptor of sars-cov- , and hyaluronan synthase (has ), which further increases hyaluronan formation. notably, the hyaluronan level in the plasma of covid- patients is tightly correlated with severity and with a high risk of acute respiratory distress syndrome (ards), and may act as a predictor of the progression of covid- . his antagomirs, which effectively downregulate the hyaluronan level, and -methylumbelliferone (mu), an inhibitor of hyaluronan synthesis, are potential drugs to relieve the ards-related ground-glass pattern in the lung in covid- treatment. our results revealed that previously unrecognized his elements of sars-cov- contribute to the cytokine storm and ards in covid- patients. thus, blocking his-involved activating processes, or inhibiting hyaluronan synthesis directly with -mu, may be effective strategies to alleviate covid- progression.

the pandemic of coronavirus disease has caused more than million identified cases with more than . million confirmed deaths worldwide by november , (dong et al., ) , and poses an unprecedented global threat. covid- is caused by the severe acute respiratory syndrome coronavirus (sars-cov- ) and is characterized clinically by fever, dry cough, and shortness of breath (guan et al., ; mao et al., ) . furthermore, covid- patients may develop acute liver and kidney injury, cardiac injury, bleeding, and coagulation dysfunction (wiersinga et al., ) . severe patients frequently experience acute respiratory distress syndrome (ards), causing % of deaths in fatal cases . unfortunately, we still lack a specific strategy for covid- treatment.
as a single-stranded positive-sense rna virus, sars-cov- is closely related to other highly pathogenic beta-coronaviruses such as severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov) (hassan et al., ) . it has been revealed that sars-cov- enters into the cell through the binding of spike (s) protein to angiotensinconverting enzyme (ace ) receptor (hoffmann et al., ; zhou et al., ) . sars-cov- infection activates innate and adaptive immune response (blanco-melo et al., ) , accompanied by elevated inflammation markers (such as c-reactive protein, il- r, il- , il- , and tnf-α) (chen et al., b) . noteworthily, as the antiviral immunity guardian, t cells are reduced significantly in covid- patients (diao et al., ) , which is negatively correlated with survival rates. besides, increased d-dimer concentration in covid- patients suggests sars-cov- infection remarkably activates the fibrinolytic system (levi et al., ) . however, the molecular mechanism underlying these clinical features caused by sars-cov- is still elusive. accumulating evidence has revealed an interesting phenomenon that both dna and rna virus can generate small rnas different from that of the host cells in the infected cells (mishra et al., ; shapiro, ; weng et al., ) , which are called virus-derived small rnas (vsrnas) or mirnalike non-coding rnas (v-mirnas) . for example, one of the vsrnas derived from enterovirus (ev ) inhibits viral translation and replication via targeting its internal ribosomal entry site (ires) (weng et al., ) . conversely, influenza a virus-generated vsrnas promote virus rna synthesis (perez et al., ) . except for the regulation of virus replication, vsrnas also play a key role in regulating host response and disease processes. for instance, repression of lmp a by mir-bart derived from epstein-barr virus (ebv) protects the infected cells from host immune surveillance (lung et al., ) . 
moreover, vsrnas can mediate the silencing of host genes in caenorhabditis elegans (guo et al., ) , and sars-cov virus n gene-derived small rna (vsrna-n) could enhance the lung inflammatory pathology (morales et al., ) . however, little is known about whether sars-cov- derived vsrnas participate in the replication of virus and host response in covid- patients. micrornas (mirnas) are ~ nucleotides (nt) non-coding rnas (ncrnas) that primarily regulate post-transcriptional silence via targeting the ′ untranslated region ( ′ utr) of mrna transcripts in the cytoplasm (pasquinelli, ) . however, our group uncovered that mirnas located in the nucleus were capable of activating gene expression through targeting enhancer and termed them as "nuclear activating mirnas (namirnas)" (xiao et al., ) . consistent with our findings, phillip a. sharp et al. deciphered the interaction between super-enhancers (ses) and mirna networks (suzuki et al., ) . significantly, sars-cov- was predicted to be enriched in the nucleolus by a computational model named rna-gps . besides, numerous unknown transcripts have been identified from the architecture of sars-cov- transcriptome in infected vero cells (kim et al., ) , which may serve as the precursor mirnas (pre-mirnas). therefore, these findings imply that sars-cov- may generate vsrnas that function in host cells. here, we identified five conserved fragments with high similarity across different primates and termed them as "human identical sequences (his)" by comparing the genomic sequences of sars-cov- and humans. further bioinformatics analysis indicated his embedded in sars-cov- could potentially be virus-derived small rnas. his-sars-cov- rna could directly bind to the conserved human dna loci in vitro. 
besides, these virus fragments containing his can increase the h k acetylation (h k ac) enrichment at their corresponding regions of the human genome in different mammalian cells and activate the expression of adjacent and distant genes associated with inflammation. notably, his can also activate hyaluronic acid synthase (has ) and increase the production of hyaluronic acid, which was further verified in covid- patients' plasma and shown to be correlated with the severity and clinical manifestations of sars-cov- infection. hyaluronan inhibitor treatment downregulated the hyaluronan level and thus emerged as a potential therapeutic strategy for covid- patients.

conserved and regulatory elements have been revealed in the viral genome (bernard, ; morales et al., ) . therefore, we hypothesized that nucleic acid sequences of sars-cov- might interact with the host genome as pathogenic factors. to investigate the underlying interaction between sars-cov- and the host, we analyzed the sequence conservation between the genome of sars-cov- (accession number nc_ ) and the human genome (grch /hg ). considering that the shortest regulatory rnas derived from the human genome are ~ nt mirnas, we set the length of conserved sequences to be greater than bp and the matching rate to %. surprisingly, we identified five fully conserved sequences in the sars-cov- genome (figure a), which were collectively termed "human identical sequences (his)". his-sars-cov- - (abbreviated as "his-sars - ") and his-sars - were located in chr (figure b), while his-sars - , his-sars - , and his-sars - were located in chr , chr , and chrx, respectively (figures a). to acquire a better understanding of the potential function of his, we interrogated the characteristics of the sequences in the human genome that are identical to his-sars .
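the search for fully identical stretches shared by a viral and a host sequence can be sketched as follows. this is an illustration under stated assumptions, not the pipeline used in the study: matches are exact, only the given strand is searched (no reverse complement), the minimum length `min_len` is a free parameter, and the function name is hypothetical. a genome-scale search would use an indexed aligner rather than python substring tests.

```python
def shared_segments(virus, host, min_len):
    """Return (start, segment) for maximal exact matches of virus in host.

    A k-mer set built from the host gives a fast first filter; every seed
    is then extended to the right for as long as the whole stretch still
    occurs contiguously in the host.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")
    host_kmers = {
        host[i:i + min_len] for i in range(len(host) - min_len + 1)
    }
    segments = []
    covered_end = 0  # end position of the last reported segment
    for start in range(len(virus) - min_len + 1):
        if virus[start:start + min_len] not in host_kmers:
            continue
        end = start + min_len
        while end < len(virus) and virus[start:end + 1] in host:
            end += 1
        if end > covered_end:  # skip matches nested in the previous one
            segments.append((start, virus[start:end]))
            covered_end = end
    return segments
```

each reported segment is guaranteed to occur as one contiguous stretch in the host string, because the extension step tests the full candidate rather than merging neighbouring k-mer hits.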
clearly, we found that all these human genomic loci were widely marked by h k ac, a well-known enhancer mark (figure c & figures b) , suggesting that his-sars may function as a regulator together with host enhancers. as enhancer elements embedded in the genome generally regulate their neighboring as well as distant genes, we extended the genomic scope and found that there were plenty of well-recognized genes near the targets of his-sars , including cytokine genes (figure b), which might explain why most severe covid- patients are characterized by a cytokine storm, the main cause of the ards that leads to death . furthermore, the kyoto encyclopedia of genes and genomes (kegg) pathway analysis of the neighboring (± kb) genes of his showed that they were enriched in the cgmp-pkg signaling pathway and muscle contraction (figure d), consistent with the proposal that modulation of cgmp-pkg by an inhibitor of its upstream regulator phosphodiesterase (pde ), which is highly expressed in airways and vascular smooth muscle, could be a potential treatment for covid- patients (giorgi et al., ) . gene ontology (go) analysis showed that they were enriched in vascular smooth muscle contraction and protein import into the mitochondrial matrix (figures c), consistent with recent reports that mitochondria might play a vital role in covid- . for instance, a cytokine storm could induce iron dysregulation, such as hyperferritinemia, which can lead to the production of reactive oxygen species (ros) and oxidative stress, and finally cause mitochondrial dysfunction, platelet damage, and apoptosis. abnormal platelets will cause clotting events and thrombus formation (saleh et al., ) . howard chang and colleagues predicted that sars-cov- rna is enriched in host mitochondria , and mitochondrial dysfunction was also identified in covid- patients (saleh et al., ) .
collectively, we identified his in the sars-cov- genome, and the enrichment of cytokine genes at the targeted human genomic loci suggested that his may underlie the clinical characteristics of covid- patients and serve as a vital player in the pathological progression.

sars-cov- belongs to the coronavirinae subfamily, which contains six well-known human coronaviruses (hcovs), including two alphacoronaviruses (hcov- e, hcov-nl ), two lineage a betacoronaviruses (hcov-oc , hcov-hku ), one lineage b betacoronavirus (sars-cov), and one lineage c betacoronavirus (mers-cov). to figure out whether his is a common feature embedded across the hcov genomes (the accession numbers of the hcovs are listed in tables ), we analyzed the hcov and human genome sequences using the same criteria. in addition to the five his in sars-cov- mentioned above, we overall identified his, including five his in hcov-hku , five his in . just as his-sars-cov- was abbreviated as his-sars , his-sars-cov and his-mers-cov were abbreviated as his-sars and his-mers, respectively (the detailed information on his in hcovs is given in tables ). these his correspond to identical sequences in the human genome, which are distributed across different chromosomes. it is noteworthy that one his may correspond to multiple loci in the human genome, and chr ( / ) and chr ( / ) are the hotspots for his (figures a), including five loci in the mhc region of chr (tables ). the length of his ranges from bp to bp, while out of his are bp (tables ). meanwhile, the gc ratio ranges from . % to . %; however, the gc ratio of out of ( . %) his was no greater than % (figures b). the universal distribution of his in hcovs suggests a comprehensive role of these unheeded elements.

sars-cov- was reported to originate from bats . however, many animals can be its potential hosts, including malayan pangolins (lam et al., ; xiao et al., ) , ferrets, and cats (shi et al., ) . we presumed that his might also exist in intermediate hosts.
to explore the distribution of his-sars-cov- across species, we analyzed the sequences of species, including eight primates (human, rhesus, crab-eating macaque, baboon, green monkey, marmoset, squirrel monkey, bushbaby), squirrel, mouse, rat, pig, cat, dog, four bats (megabat, david's myotis bat, microbat, big brown bat) and chicken (the accession number for each species mentioned here refers to tables ). it was demonstrated clearly that all these five his-sars were highly conserved in primates (figure b & figures ) . among them, his-sars - , his-sars - , and his-sars - were also highly conserved in bats and relatively less conserved in other mammals, such as squirrel, mouse, rat, cat, and dog (figure b, figures c & figures d) ; while his-sars - and his-sars - showed less conservation beyond primates (figures b & figures c) . when it comes to the domesticated animals exampled by chicken, which is the main host for avian influenza, no conserved his were identified at all. the conservation of his across species hinted the propagation evolution pathway of sars-cov- and highlighted the unrevealed significance of his. to investigate the potential functions of these his embedded fragments, we performed rna secondary structures prediction (mathews et al., ) and found that his-containing fragments were capable of forming the typical hairpin structure like mirna precursors (figure c & figures ) . emerging evidence exhibits how human mirnas interact with viral rna, including direct binding (chen et al., c; hosseini rad sm and mclellan, ; nersisyan et al., ) , thus it's reasonable to suppose that virus-derived mirna-like rnas could also regulate human rnas or even interact with human genome dna elements. indeed, sars-cov-encoded small viral rnas (svrnas) targeted ′utr of specific transcripts and repressed mrna expression (morales et al., ) . also, the function of small regulatory rnas is largely dependent on their subcellular locations. 
it was predicted that sars-cov- rna was highly enriched in the host nucleolus and mitochondria . previously, our team revealed that mirna could be allocated in the nucleus and target enhancer for gene activation (xiao et al., ; zou et al., ) . therefore, we suspected that his-sars could interact with host enhancers and function like namirna. to investigate whether his can activate the host gene expression, we constructed vectors containing his fragments identified in sars-cov- and sars-cov. we transfected them into transformed human embryonic kidney cell hek t, human fetal lung fibroblast cell mrc , and human umbilical vein endothelial cell (huvec), and detected the expression of surrounding as well as the distant genes. the expression level of coronavirus fragments was verified by qpcr (figures a&b&c) . fbxo , timm , and cyb a lie the upstream of targeted locus of his-sars - ; among them, fbxo was upregulated when his-sars - was transfected in hek t (figure a), mrc (figure b), and huvec (figure c); meanwhile, timm and cyb a upregulated in hek t (figure a) and mrc (figure b), respectively. as an e ligase subunit, fbxo impairs mitochondrial integrity and induces lung injury in pneumonia . both timm (mick et al., ) and cyb a (plitzko et al., ) encode the components of mitochondrial membrane, implying their fundamental roles in mitochondria, the dysfunction of which is tightly correlated with covid- pathogenesis (saleh et al., ) . similarly, his-sars - overexpression upregulated the upstream gene kalrn in hek t cell (figures d). kalrn plays a significant role in the development of sarcoidosis (besnard et al., ) , a systemic inflammatory disease involved in multiple organs throughout the body, including the kidney and lungs, implying kalrn may get involved in inflammatory response after sars-cov- infection. except for the adjacent genes, his can also upregulate distant genes. 
the transfected his-sars - upregulated myl in hek t and mrc (figure d) and epn (figure e) in mrc and huvec (figure f) . myl was demonstrated to be a functional ligand for cd that regulates airway inflammation (hayashizaki et al., ) . these data suggest that his contribute to covid- pathogenesis by upregulating adjacent or distant genes, which may cause mitochondrial dysfunction or participate in the inflammatory response. like sars-cov- , sars-cov is a typical fatal beta-coronavirus. patients infected with either of the two viruses largely share similar symptoms (chen et al., a) . transfection of his-sars- could also activate neighboring genes, such as has , zhx , and derl in hek t (figures e), mrc (figures f), and huvec (figures g). importantly, has encodes the critical enzyme for the production of hyaluronan, a glycosaminoglycan that plays a fundamental role in the inflammatory response. the accumulation of hyaluronan in the lung has been recognized in ards for more than three decades (hallgren et al., ) . besides, ards is the most typical clinical manifestation of severe covid- cases (guan et al., ; wang et al., a) . it is therefore natural to test whether his-sars could activate has . surprisingly, we found that his-sars - and his-sars - could activate the expression of has by about ~ fold in hek t (figure g) and mrc (figures h); however, the expression of has was upregulated more than tenfold in huvec (figures i), suggesting that small vascular cells may be the main target of sars-cov infection. in this case, his might largely explain most clinical symptoms of covid- patients, including ards and microvessel injury, by upregulating has and the level of hyaluronan, which has received very limited attention in covid- progression. additionally, we noticed that ace could also be upregulated by his-sars - and his-sars - (figure h). ace is the essential receptor for both sars-cov- and sars-cov to enter the host cell (hoffmann et al., ; zhou et al., ) .
these data indicate that his-sars not only induce the clinical manifestations of covid- patients but also enhance the spreading of sars-cov- by increasing the level of ace and making the host more susceptible, which partially explains the global threat of the covid- pandemic.

to explore the underlying mechanism of his-regulated gene expression, we transfected hek t with his fragments and performed h k ac chip-qpcr. it clearly showed that his fragments could induce the enrichment of h k ac at his-targeted loci (figure a). h k ac marks enhancers or super-enhancers, which are critical for gene activation. this indicates that his-regulated gene activation is largely dependent on activating the targeted enhancers. jq is a potent inhibitor of the bet family proteins, which function by binding to h k ac. after treatment with jq , the upregulation of his-targeted genes was abolished (figure b&c&d), which was also the case for his-sars- (figures a), further supporting that his-mediated gene regulation is achieved through enhancers. to further confirm whether his were indispensable for gene activation, we knocked down his expression by cas d and found that the activation of targeted genes was abolished (figure e&f&g), which was also the case for his-sars- (figures b), indicating the specificity of his in regulating host genes. the behaviors of his-sars rna are highly similar to those of our previously identified namirnas, which activate genes as enhancer triggers in the nucleus (xiao et al., ; zou et al., ) . overall, these data suggest that his activate host genes through the namirna-enhancer network.

sars-cov and sars-cov- are similar in many aspects; in particular, the infected patients are characterized by ards and pulmonary fibrosis. it has been reported that sars-cov infection promotes acute lung injury by regulating the hyaluronan level, which is mediated by has (hu et al., ) . among his-activated genes, has attracted our special attention for its ability to regulate the hyaluronan level.
accordingly, we found that his-sars- , his-sars - , and his-sars - significantly upregulated the hyaluronan level in the culture supernatant of hek t (figure a), which was also the case in mrc (figures a). we therefore speculated that hyaluronan may be involved in covid- pathogenesis and progression. to further decipher the role of hyaluronan in covid- , we collected plasma from covid- patients hospitalized at the shanghai public health clinical center. we categorized patients into mild (n= ) and severe (n= ) groups based on the characteristic pneumonia features of chest ct. in severe patients, the mean value of hyaluronan ( . ng/ml) (figure b) was significantly higher than that in mild patients ( . ng/ml), consistent with the recent report that the hyaluronan level was higher in severe patients (ding et al., ). hyaluronan can hold a large volume of water, which enables it to determine the water content of specific tissues (turino and cantor, ). it has been reported that the extravascular lung water volume is positively correlated with the hyaluronan level in normal animal lungs (bhattacharya et al., ). the water-absorbing characteristics of hyaluronan therefore make it plausible that the increased hyaluronan, induced by upregulated has in lung cells after sars-cov- infection, binds large amounts of water and forms jelly-like substances, which may underlie the ground-glass opacity commonly observed in covid- patients. in addition, the higher level of hyaluronan in the plasma of severe patients suggests that hyaluronan may act as a predictor of covid- progression, especially when physicians must quickly decide in the clinic which patients will need critical medical care. to test whether the hyaluronan level alone could serve as an indicator of the clinical manifestations of covid- patients, we classified patients into mild and severe groups based on the hyaluronan level.
we found that in severe patients, the level of lymphocytes decreased (figure c), d-dimer increased (figure d), and c-reactive protein (crp) increased (figure e), consistent with a report that, compared with non-intensive care unit (icu) patients, icu patients had significantly higher d-dimer and lower lymphocyte counts. additional evidence showed that lymphocyte count, c-reactive protein, and d-dimer were significantly associated with covid- severity. these data suggest that hyaluronan alone can function as a potential indicator of covid- severity. based on the findings that his-regulated has and the hyaluronan level contributed to the clinical features of covid- patients, we speculated that downregulation of hyaluronan by his antagomirs might be an effective way to abolish the upregulation of inflammatory genes and even relieve the clinical symptoms. to test this strategy, we transfected cells with antagomirs and found that antagomirs of his-sars - , his-sars - , and his-sars - downregulated the inflammatory genes that had been upregulated, such as kalrn (figure f), has , myl (figure g), fbxo , and timm (figure h). the antagomir of his-sars- also downregulated has and zhx (figures b). these data reveal the druggable potential of his antagomirs for covid- treatment. to further test whether hyaluronan could serve as a target for inhibiting covid- progression, we treated hek t with -methylumbelliferone ( -mu), a hyaluronan synthesis inhibitor (nagy et al., ). after treatment, hyaluronan was downregulated in all groups (figure i), which was also the case in mrc (figures c). we noticed that a prescription drug, hymecromone, also functions as a hyaluronan inhibitor. similarly, we treated hek t with dmso-dissolved hymecromone and found that hyaluronan was reduced accordingly in the cell culture supernatants (figure j).
taken together, hyaluronan promoted by his is emerging as a novel target for covid- treatment, and the inhibition of has by his antagomirs or -methylumbelliferone might be novel strategies to block the progression of covid- .

it is critical to understand how sars-cov- causes an inflammatory cytokine storm in the host response. in the current study, we showed that human identical sequences (his) of sars-cov- activate host cytokines through the namirna-enhancer-gene activation network. ectopic expression of fragments containing his-sars promotes h k ac enrichment at their corresponding regions in host cells. it is noteworthy that these his of sars-cov- can bind stably to their corresponding human dna fragments in vitro. importantly, his activate has expression and increase the hyaluronan level, while covid- patients with higher plasma hyaluronan levels tend to show severe symptoms. these findings contribute to understanding the progression of covid- after sars-cov- infection and may provide novel therapeutic targets. a growing number of studies indicate that infection by various viruses, including both dna and rna viruses, has species-specific and organ-specific signatures (de meyer et al., ; iwamoto et al., ; ohlund et al., ; rothenburg and brennan, ). for example, humans and chimpanzees are the only known hosts naturally infected by hepatitis c virus (hcv) (sandmann and ploss, ). notably, the liver is the principal site of hcv infection. the pathogenesis of hcv infection causing progressive liver disease is thought to involve an uncontrolled inflammatory response accompanied by changes in inflammatory cytokines. of note, not every viral infection is virulent for its host. a typical example is human immunodeficiency virus type (hiv- ), which is pathogenic only for humans and does not affect the health of its primary host (nomaguchi et al., ).
similarly, sars-cov- infection in humans results in covid- without being fatal for its other potential hosts, including bats and pangolins (xiao et al., ; zhou et al., ). by comparing the genomes of different primates with the sars-cov- genome, we found five conserved fragments with high similarity in human. interestingly, conserved fragments were also found between sars-cov- and its hosts, including bats. moreover, we identified his between other highly pathogenic betacoronaviruses (such as sars-cov and mers-cov) and different primates. consistent with these points, his were found within other pathogenic viruses, including avian influenza virus, swine flu virus, rabies virus, coxsackievirus, influenza a virus, hiv, ebolavirus, and zika virus (tables ). these conserved fragments, termed here "host identical sequences (his)", therefore appear to be widespread between viruses and their hosts, implying that his could be determinants of viral susceptibility and pathogenicity. it is thus natural to speculate that his in a pathogen genome may help to trace the lineage of its hosts, especially the intermediate host during sars-cov- evolution. since his activate host genes through enhancers, and enhancers are well known for their tissue specificity, his offer a way to investigate virus-host cell interactions and to understand why viruses usually show organ- or tissue-specific pathogenesis. in recent years, multiple dna and rna viruses have been shown to produce mirna-like non-coding rnas (mishra et al., ). bioinformatics analysis revealed that the fragments containing his in sars-cov- could form the primary structure of precursor mirna (pre-mirna). we have previously shown that mirnas can activate genome-wide gene transcription by targeting enhancers (xiao et al., ).
further bioinformatics analysis (betel et al., ) indicated that lung enhancers targeted by these his could regulate genes (tables ) overlapping with recently published transcriptome sequencing data from the bronchoalveolar lavage fluid (balf) of covid- patients (xiong et al., ), implying that his probably play a key role during the progression of covid- . a growing number of studies indicate that sars-cov- infection causes damage to multiple organs, including the lung, kidney, and liver, by activating the inflammatory response (wiersinga et al., ). additionally, the fibrinolytic system in covid- patients is disrupted by sars-cov- infection (levi et al., ). to test whether his affect inflammation in these organs, different his fragments were transfected into selected human cell lines, including hek t, mrc , and huvec. interestingly, his-sars - upregulated the expression of its upstream gene kalrn in hek t; kalrn contributes to the development of sarcoidosis (besnard et al., ), a systemic inflammatory disease affecting multiple organs such as the kidney and lungs. in addition, the adjacent gene fbxo and the distant gene myl were upregulated by transfected his-sars - and his-sars - in hek t. of note, both fbxo and myl are closely related to the inflammatory response (hayashizaki et al., ). similarly, his-sars - promoted the expression of its upstream genes cyb a and fbxo in mrc . consistent with these results, his in sars-cov also activated neighboring genes, such as has , zhx , and igf r. collectively, this evidence indicates that his can cause an inflammatory response by activating gene expression, supporting this novel mechanism underlying viral pathogenicity. in our previous study, mirnas activated gene transcription epigenetically by increasing h k ac enrichment at their target enhancers (xiao et al., ).
interestingly, chip-qpcr confirmed that the corresponding his fragment enriched h k ac at the his-sars - region of the human genome in hek t. we then treated his-sars - hek t cells with jq , an inhibitor of brd that causes preferential loss of enhancers (loven et al., ). blocking h k ac with jq remarkably downregulated fbxo expression in hek t, suggesting that enhancers are essential for his-mediated gene activation. to investigate the specificity of this activation process, we designed an antagomir against his-sars - and found that fbxo was decreased in his-sars - transfected hek t cells, which was further confirmed by degrading his-sars - with the casrx system, which efficiently and functionally knocks down targeted genes. as ago may serve as a guide mediating the binding of mirnas to their enhancers, resulting in gene activation (liang et al., ), we examined ago and found that it was also involved in his-induced host gene activation. his-sars - rna could hybridize with its target ssdna and form stable double strands, which were further stabilized by hago through a reduced dissociation constant (kd), supporting the view that ago guides the binding of mirnas to their target enhancers. these results demonstrate that his can activate gene transcription epigenetically, and that inhibition or degradation of his may offer a new strategy for controlling virus-induced disease. importantly, we identified hyaluronan as a potential novel target for covid- treatment. has , the major enzyme responsible for hyaluronan synthesis (csoka et al., ), first attracted our interest because it is located upstream of a his-targeted site in the human genome. accumulation of hyaluronan is closely associated with ards (hallgren et al., ), the typical symptom in patients with sars-cov, sars-cov- , and mers-cov infection. as expected, his-sars-cov and his-sars dramatically activated has expression in host cells, causing the upregulation of hyaluronan in the cell culture supernatant.
clinically, hyaluronan is significantly increased in severe covid- patients with ground-glass opacity on chest ct scans, and the level of hyaluronan is correlated with the clinical prognosis of patients with covid- (ding et al., ). interestingly, when patients are divided into mild and severe groups by their hyaluronan level, decreased lymphocytes and increased d-dimer and c-reactive protein appear more often in the severe group, in striking agreement with the pathological features of icu covid- patients. in other words, the hyaluronan level alone could be an indicator of the clinical progression of covid- patients. notably, the total number of t cells is significantly reduced in covid- patients compared with normal levels (qin et al., ). this reduction may be due to the binding of hyaluronan to its ligand cd , which can induce the death of activated t cells (mckallip et al., ). another ligand, habp (also called factor vii-activating protease), which binds hyaluronan, plays an important role in blood coagulation by activating the pro-urokinase-type plasminogen activator (kanse et al., ), which may cause the dysregulation of the fibrinolytic system in covid- patients. in addition, habp aggravates the disruption of the hyaluronan-mediated endothelial cell barrier (mambetsariev et al., ), which may explain the sudden brain hemorrhage seen in icu patients with covid- (barrios-lopez et al., ). further studies will be needed to elucidate the mechanism by which hyaluronan causes these clinical characteristics during the progression of covid- . -methylumbelliferone is reported to be an efficient inhibitor of hyaluronan production (nagy et al., ). -mu treatment reversed the upregulation of hyaluronan in response to his in hek t and mrc .
additionally, the hyaluronan level was also decreased in hek t after treatment with hymecromone, a commercially available hyaluronan inhibitor used in china for cholecystitis, cholelithiasis, and cholecystectomy syndrome. overall, decreasing hyaluronan could improve most of the clinical symptoms of covid- patients, and hyaluronan may serve as a promising therapeutic target for sars-cov- and other related diseases. taken together, our results indicate a novel mechanism by which sars-cov- induces the host response during covid- progression: regulation of host gene expression through direct interaction between its vsrnas and chromatin enhancers in the host. given these findings, blocking his with nucleic acid drugs or inhibiting hyaluronan production with a specific medicine such as hymecromone may provide new strategies for effective covid- treatment.

the modified cas d plasmid was a generous gift from professor pengyu huang at shanghaitech university. we thank yue yu for editorial help and comments on the manuscript.

(b) the location of the identical sequence of his-sars - and his-sars - in human chromatin and their surrounding genes. all genes within ± kb of these loci are listed. among them, inflammation- or immunity-related genes, defined by text mining (searches combining the keywords "gene name + inflammation" or "gene name + immunity") in pubmed and google scholar, are written in red. the search results were further confirmed by a literature review. in all figures above, the y-axis indicates the mrna fold change relative to gapdh detected by rt-qpcr. p-values were calculated using the unpaired, two-tailed student's t test in graphpad prism . . *, p< . ; **, p< . ; ***, p< . ; ns, not significant.

(c) hyaluronan level is negatively correlated with lymphocyte number in patients' plasma. ha (hyaluronic acid) level ( ng/ml) functions as a discriminator for patients' lymphocyte number. for patients whose hyaluronan < ng/ml ( patients), the mean value is .
× /l; while for patients whose hyaluronan ≥ ng/ml ( patients), the mean value is . × /l. p-values were calculated using the two-tailed nonparametric mann-whitney test in graphpad prism . . ****, p< . .

(d) hyaluronan level is positively correlated with d-dimer level in patients' plasma. ha (hyaluronic acid) level ( ng/ml) functions as a discriminator for patients' d-dimer levels. for patients whose ha< ng/ml ( patients), the mean value is . µg/ml; while for patients whose ha≥ ng/ml ( patients), the mean value is . µg/ml. p-values were calculated using the two-tailed nonparametric mann-whitney test in graphpad prism . . ****, p< . .

(e) hyaluronan level is positively correlated with c-reactive protein (crp) level in patients' plasma. ha (hyaluronic acid) level ( ng/ml) functions as a discriminator for patients' crp levels. for patients whose ha< ng/ml ( patients), the mean value is . mg/l; while for patients whose ha≥ ng/ml ( patients), the mean value is . mg/l. p-values were calculated using the two-tailed nonparametric mann-whitney test in graphpad prism . . ****, p< . .

all genes within ± kb of those loci are listed. among them, inflammation- or immunity-related genes, defined by text mining (searches combining the keywords "gene name + inflammation" or "gene name + immunity") in pubmed and google scholar, are written in red. the search results were further confirmed by a literature review.

supplemental table

further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, wenqiang yu (wenqiangyu@fudan.edu.cn). this study did not generate new unique reagents.

the human reference genome (hg ) was obtained from genbank and searched using blastn (chen et al., ) with sars-cov- as the query.
the detailed parameters were as follows: ) the maximum number of hits to report: ; ) maximum e-value for reported alignments: ; ) word size for seeding alignments: ; ) the match/mismatch scores: ,- ; ) gap penalties: opening: , extension: ; ) filter low complexity regions; ) filter query sequences using repeatmasker. kegg (kyoto encyclopedia of genes and genomes) and gene ontology analyses of the genes surrounding (± kb) the identical sequences of his in the human genome were performed using david (huang da et al., ). gene ontology biological processes and kegg pathways were ranked by p-value and the top terms were plotted.

five his in sars-cov- were chosen to construct plasmids, named his-sars - , his-sars - , his-sars - , his-sars - , and his-sars - . in addition, we chose his-sars- in sars-cov as the parallel group. in brief, as his precursors, ~ bp virus dna fragments containing ~ bp his in sars-cov and sars-cov- were obtained by annealing and extension with specific primers synthesized by shanghai sunnybio-technology co., ltd. these his were then cloned into the pcdh-cmv-mcs-ef -copgfp lentiviral vector (zou et al., ) at the ecor i ( ′) and bamh i ( ′) sites using the clonexpress ii one step cloning kit (vazyme, c ) according to the manufacturer's manual. in addition, diverse single guide rnas (sgrnas) targeting the his precursors were designed and cloned into modified lentivirus plasmids containing cas d. the antagomirs for his-sars - , his-sars - , his-sars - , his-sars - , his-sars - , and his-sars- were purchased from guangzhou ribobio co., ltd. the sequences of primers, his precursors, and antagomirs are listed in tables .

cells were transfected with plasmids or antagomirs at %- % confluency using hieff trans liposome nucleic acid transfection reagent (yeasen, es ) following the manufacturer's instructions.
we replaced the medium with fresh medium containing % fbs at - hours after transfection and harvested the transfected cells at hours after transfection to detect the expression of his and target genes or to perform the chip assay.

we co-transfected pcdh-pre-his, pspax , and pmd g plasmids into hek t cells in a ratio of : : . and collected the virus supernatant by filtering the cell culture supernatant with . filters at hours after changing to the serum-containing medium. cells were then infected with the different lentiviruses and cultured in medium with μg/ml puromycin to obtain stable cell lines.

total rna was extracted from freshly harvested cells using trizol reagent (invitrogen, ). complementary dna (cdna) was synthesized with the primescript™ rt reagent kit (takara, rr a), which includes a genomic dna removal step. quantitative pcr was performed using sybr green pre-mix (tiangen, fp ) on the roche lightcycler instrument. gapdh served as the endogenous control gene for normalization. relative gene expression was calculated according to the -ΔΔct method. the primers for target genes and diverse his precursor fragments are shown in tables .

the chip assay was carried out as described in our previous study (xiao et al., ). in brief, transfected cells were cultured in cm dishes and crosslinked with % formaldehyde in ×pbs for minutes at room temperature. after sonication, sheared chromatin was immunoprecipitated with h k ac antibody (abcam, ab ) and protein a magnetic beads (invitrogen, d) overnight at °c. dna from the chromatin immunocomplexes was extracted with the qiaquick pcr purification kit (qiagen, ) according to the manufacturer's guidelines. chip-derived dna was analyzed by quantitative pcr using sybr green premix (see tables for primer sequences), and data were normalized to input dna.
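the normalization of chip-qpcr signal to input dna mentioned above can be sketched as follows. this is a minimal illustration, not the authors' analysis script: it assumes the common "percent input" calculation, in which the input ct is first adjusted for the fraction of chromatin saved as input and amplification is assumed to be perfectly efficient (doubling per cycle). the function name, the input fraction, and all ct values are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Return ChIP signal as a percentage of input chromatin.

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input sample as measured
    input_fraction -- fraction of chromatin kept as input (hypothetical, e.g. 0.01)

    Assumes 100% PCR efficiency (signal doubles every cycle).
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to what it would be for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    # Each cycle of difference corresponds to a twofold difference in template.
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


if __name__ == "__main__":
    # Hypothetical Ct values for one locus in control and transfected cells.
    control = percent_input(ct_ip=25.0, ct_input=25.0, input_fraction=0.01)
    treated = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
    print(f"control: {control:.2f}% input, treated: {treated:.2f}% input")
    print(f"fold enrichment: {treated / control:.2f}")
```

comparing the percent-input values of transfected and control cells at the same locus gives a fold enrichment; whether the original analysis reported percent input or fold enrichment is not stated in the text.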
loss-of-function of his was performed by inhibiting the enhancer components with jq (selleck chemicals inc., s ) as previously described (suzuki et al., ) or by blocking his with antagomirs synthesized by guangzhou ribobio co., ltd (see tables for antagomir sequences). cells were cotransfected with antagomirs and his precursor vectors and harvested at h. separately, cells transfected with his precursor vectors were treated with nm jq for h. at the indicated time points, total rna was extracted from the harvested cells, and the effect of his loss-of-function on target gene expression was evaluated by rt-qpcr.

uv-vis spectroscopy experiments were performed on a shimadzu uv- uv-vis spectrophotometer (tokyo, japan) at room temperature. the dna and/or rna oligonucleotide, dna-rna hybrid, and dna duplex samples ( ml) were diluted to . μm in mm tris-hcl, mm nacl buffer at ph . . spectra were recorded over a wavelength range from to nm with a scan rate of nm/min and a data interval of nm. a mm optical path length quartz cuvette was used for uv-vis measurement. melting temperatures (tm) of self-complementary sequences were determined from the changes in absorbance at nm as a function of temperature in a mm path length quartz cuvette on a shimadzu uv- uv-vis spectrophotometer (tokyo, japan) equipped with a temperature control system. solutions ( ml) of μm pre-hybridized dna-rna hybrid or dna duplex in aqueous buffer ( mm tris-hcl, mm nacl, ph . ) were equilibrated at ºc for min and then slowly ramped to ºc in ºc steps at a rate of ºc/min. tm values were determined from the first derivatives of the heating curves.

all dna-rna hybrids and dna duplexes were prepared as follows: firstly, the oligonucleotide was mixed with an equimolar amount of the complementary target strand in hybridization buffer ( mm tris-hcl ph . , mm nacl, mm edta); the mixtures were then annealed by heating to ºc for min and slowly cooled to room temperature.
secondly, the annealed mixtures were separated on a % nondenaturing polyacrylamide gel at v for min using . ×tbe as electrophoresis buffer. thirdly, the target bands were excised and the nucleic acid probes were recovered from the gel according to the reference [molecular cloning: a laboratory manual, fourth edition, by green mr, sambrook j, cold spring harbor laboratory press, cold spring harbor, new york, usa].

the full-length human argonaute (hago ) coding sequence was amplified and cloned into the bamh i and hind iii restriction endonuclease sites of a home-reconstructed pmal-c x expression vector for protein purification. the constructed plasmid was transformed into laboratory-built dam-knockout rosetta (de ) cells for protein expression. a colony was inoculated into ml of lb containing μg/ml ampicillin and grown overnight at °c. this culture was diluted into l lb containing μg/ml ampicillin and grown at °c until a reached . ~ . . the culture was then induced overnight by the addition of . mm isopropyl- -thio-β-d-galactopyranoside at ºc. the cells were harvested at rpm for min at °c and then suspended in ml lysis buffer ( mm tris-hcl ph . , mm nacl, % glycerol, mm dtt) supplemented with edta-free protease inhibitor. the suspended cells were lysed with an ultrahigh-pressure continuous flow cell breaker in a low-temperature ( ºc) water bath. following centrifugation at , rpm for min at ºc, the cleared lysate was loaded onto a ml mbptrap hp column pre-equilibrated with mm tris-hcl ph . , mm nacl, % glycerol, . mm dtt. the mbp-hago recombinant protein was eluted with mm tris-hcl ph . , mm nacl, . mm maltose, % glycerol, . mm dtt. the eluted recombinant protein was digested with thrombin in an ice-water bath to remove the mbp tag, and the digested mixture was then further purified on a ml histrap hp column.
the oligonucleotide probes ( nm) were incubated with hago recombinant protein on ice for min in freshly prepared binding buffer ( mm pbs ph . , mm mgcl , and . % triton- ). the protein-substrate complexes were separated from the unbound substrate probes on % nondenaturing polyacrylamide gels at v for min using . ×tbe as electrophoresis buffer. after electrophoresis, the resolved oligonucleotide probes in the gel were detected using an odyssey clx dual-color ir-excited fluorescence imaging system (li-cor, lincoln, ne).

hyaluronic acid in covid- patients' plasma was measured at a : dilution by the enzyme-linked sandwich assay hyaluronan duoset elisa (r&d systems, minneapolis, mn, usa) following the manufacturer's instructions. to evaluate the effect of hyaluronic acid inhibitors, × cells were seeded in -well plates and incubated for h under the appropriate treatments (with µm -mu or µg/ml hymecromone, and dmso as the control group). we then collected the culture supernatants and quantified hyaluronic acid using the same elisa kit.

the relative mrna expression level for each sample was evaluated by the -ΔΔct method. the statistical analysis is described in the figure legends.
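the relative expression calculation referred to above can be sketched as follows. this is a minimal illustration rather than the authors' code: it assumes the standard livak form of the method, in which fold change equals 2 raised to the power of -ΔΔct, with gapdh as the reference gene as stated in the methods. the function name and all ct values are hypothetical.

```python
def fold_change_ddct(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Relative expression of a target gene, treated versus control.

    Assumes the standard Livak calculation and ~100% PCR efficiency:
      dCt  = Ct(target) - Ct(reference gene, e.g. GAPDH)
      ddCt = dCt(treated) - dCt(control)
      fold = 2 ** (-ddCt)
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target amplifies two cycles earlier
    # after transfection while the reference gene is unchanged.
    fold = fold_change_ddct(
        ct_target_treated=22.0,
        ct_ref_treated=18.0,
        ct_target_control=24.0,
        ct_ref_control=18.0,
    )
    print(f"fold change: {fold:.1f}")
```

a value above 1 indicates upregulation relative to the control and a value below 1 indicates downregulation; in practice the calculation would be repeated across replicates before the statistical tests named in the figure legends are applied.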
figures

[figure residue: genome browser views of human chromosome loci (hg ) with multiz alignments of vertebrates tracks for the his regions; the aligned base rows were not recoverable from extraction]

ischaemic stroke and sars-cov- infection: a causal or incidental association?
regulatory elements in the viral genome g r nod variant in a family with sarcoidosis comprehensive modeling of microrna targets predicts functional non-conserved and non-canonical sites hyaluronan affects extravascular water in lungs of unanesthetized rabbits imbalanced host response to sars-cov- drives development of covid- overview of lethal human coronaviruses e ligase subunit fbxo and pink kinase regulate cardiolipin synthase stability and mitochondrial function in pneumonia clinical and immunological features of severe and moderate coronavirus disease computational identification of small interfering rna targets in sars-cov- high speed blastn: an accelerated megablast search tool the six hyaluronidase-like genes in the human and mouse genomes organ and species specificity of hepatitis b virus (hbv) infection: a review of literature with a special reference to preferential attachment of hbv to human hepatocytes reduction and functional exhaustion of t cells in patients with coronavirus disease (covid- ) correlation analysis of the severity and clinical prognosis of cases of patients with covid- an interactive web-based dashboard to track covid- in real time phosphodiesterase inhibitors: could they be beneficial for the treatment of covid- ? 
clinical characteristics of coronavirus disease in china silencing of host genes directed by virus-derived short interfering rnas in caenorhabditis elegans accumulation of hyaluronan (hyaluronic acid) in the lung in adult respiratory distress syndrome a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies myosin light chains and are functional ligands for cd that regulate airway inflammation modulation of metabolic functions through cas d-mediated gene knockdown in liver sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor implications of sars-cov- mutations for genomic rna structure and host microrna targeting sars-cov regulates immune function-related gene expression in human monocytic cells clinical features of patients infected with novel coronavirus in wuhan, china. the lancet systematic and integrative analysis of large gene lists using david bioinformatics resources identification of host-specificity determinants in betanodaviruses by using reassortants between striped jack nervous necrosis virus and sevenband grouper nervous necrosis virus factor vii-activating protease (fsap): vascular functions and role in atherosclerosis the architecture of sars-cov- transcriptome identifying sars-cov- related coronaviruses in malayan pangolins coagulation abnormalities and thrombosis in patients with covid- hepatitis c: from inflammatory pathogenesis to anti-inflammatory/hepatoprotective therapy an epigenetic perspective on tumorigenesis: loss of cell identity, enhancer switching, and namirna network selective inhibition of tumor oncogenes by disruption of super-enhancers emerging roles of small epstein-barr virus derived noncoding rnas in epithelial malignancy hyaluronic acid binding protein is a novel regulator of vascular integrity manifestations and prognosis of gastrointestinal and liver involvement in patients with covid- : a systematic review and meta-analysis incorporating 
chemical modification constraints into a dynamic programming algorithm for prediction of rna secondary structure role of cd in activation-induced cell death: cd -deficient mice exhibit enhanced t cell response to conventional and superantigens mitrac links mitochondrial protein translocation to respiratory-chain assembly and translational regulation the interplay between viral-derived encoded small rnas contribute to infection-associated lung pathology -methylumbelliferone treatment and hyaluronan inhibition as a therapeutic strategy in inflammation the potential role of mir- - p in coronavirus-host interplay species tropism of hiv- modulated by viral accessory proteins insect-specific virus evolution and potential effects on vector competence micrornas and their targets: recognition, regulation and an emerging reciprocal relationship influenza a virus-generated small rnas regulate the switch from transcription to replication the involvement of mitochondrial amidoxime reducing components and and mitochondrial cytochrome b in n-reductive metabolism in human cells dysregulation of immune response in patients with covid- in wuhan species-specific host-virus interactions: implications for viral host range and virulence mitochondria and microbiota dysfunction in covid- pathogenesis barriers of hepatitis c virus interspecies transmission processing of virus-derived cytoplasmic primary-micrornas susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus super-enhancer-mediated rna processing revealed by integrative microrna network analysis hyaluronan in respiratory injury and repair clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in correlation analysis between disease severity and clinical and biochemical characteristics of cases of covid- in wuhan, china: a descriptive study mammalian rna virus-derived small rna: biogenesis and functional activity a cytoplasmic rna virus generates functional 
viral small rnas and regulates viral ires activity in mammalian cells pathophysiology, transmission, diagnosis, and treatment of coronavirus disease rna-gps predicts sars-cov- rna residency to host mitochondria and nucleolus isolation of sars-cov- -related coronavirus from malayan pangolins micrornas activate gene transcription epigenetically as an enhancer trigger transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid- patients clinical characteristics of cases of death from covid- a pneumonia outbreak associated with a new coronavirus of probable bat origin mirna-mediated rnaa by targeting enhancers mir- a-mediated downregulation of rhob inhibits the dephosphorylation of akt and induces osteosarcoma cell metastasis
key: cord- -tjmx msm authors: sardar, rahila; satish, deepshikha; birla, shweta; gupta, dinesh title: comparative analyses of sar-cov genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tjmx msm the ongoing pandemic of the coronavirus disease (covid- ) is an infectious disease caused by severe acute respiratory syndrome coronavirus (sars-cov ). we have performed an integrated sequence-based analysis of sars-cov genomes from different geographical locations in order to identify its unique features absent in sars-cov and other related coronavirus family genomes, conferring unique infection, facilitation of transmission, virulence and immunogenic features to the virus. the phylogeny of the genomes yields some interesting results. 
systematic gene level mutational analysis of the genomes has enabled us to identify several unique features of the sars-cov genome, including a unique mutation in the spike surface glycoprotein (a v ( c>t)) in the indian sars-cov , absent in the other strains studied here. we have also predicted the impact of the mutations on spike glycoprotein function and stability, using a computational approach. to gain further insights into host responses to viral infection, we predict that antiviral host-mirnas may be controlling the viral pathogenesis. our analysis reveals nine host mirnas which can potentially target sars-cov genes. interestingly, the nine mirnas do not have targets in sars and mers genomes. also, hsa-mir- b is the only unique mirna which has a target gene in the indian sars-cov genome. we also predicted immune epitopes in the genomes. the first case of covid- was reported in december at wuhan (china), and the disease has since spread worldwide to become a pandemic, with the highest number of deaths in italy, though initially the maximum mortality was reported from china ( ). according to a who report, as of th march there were , confirmed covid- cases and deaths, including cases which were locally transmitted or imported ( ). published reports suggest that sars-cov shares the highest similarity with bat sars-cov. scientists across the globe are trying to elucidate the genome characteristics using phylogenetic, structural and mutational analysis. a recent paper identified specific mutations in the receptor binding domain (rbd) of the spike protein, which is the most variable part of the coronavirus genome ( ). there are more than assembled sars-cov genomes available in the ncbi database. sequence analysis of the genomes can give us a plethora of information which can be of use for drug and vaccine development research. 
in the current work we collected sars-cov genomes from different geographical origins, mainly from india, italy, usa, nepal and wuhan, to identify notable genomic features of sars-cov by integrated analysis. these analyses include identification of notable mutational signatures, host antiviral-mirna identification and epitope prediction. as a host defense mechanism, a repertoire of host mirnas also targets invading viruses. we followed the parameters used in various anti-viral mirna databases to predict host anti-viral mirnas against sars-cov . our analysis shows unique host-mirnas targeting sars-cov virus genes. respectively, were retrieved from ncbi genome database. sars-cov genomes from india, italy, usa and nepal, along with sars-cov and mers, were used as query genomes to compare with the wuhan sars-cov genome. genes and protein sequences of sars-cov were retrieved from the vipr database ( ). all assembled query genomes in fasta format were analyzed using the genome detective coronavirus typing tool ( ). to understand the variation in genomes from the various geographical areas used in the study, we performed a phylogenetic analysis. the neighbor-joining method with a bootstrap value of replicates was used for the construction of a consensus tree using mega software ( ) ( . . version). the cello go ( ) server was used to infer the biological function of each protein of the sars-cov genome together with its localization prediction. the mutations reported in the literature ( ) were catalogued and evaluated for pathogenicity. we used the mutpred ( ) server to distinguish disease-associated amino acid substitutions from neutral substitutions, with a p-value of >= . . in order to assess the impact of snps on protein stability, we used two machine learning based prediction methods. the first method, the i-mutant server ( ), was used to predict the stability of the protein sequences at ph . and temperature ˚c. the second method was the mupro ( ) server; combining its predictions with those of the former method helps in obtaining a consensus prediction. 
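the consensus step between the two stability predictors can be illustrated with a short sketch. this is not the authors' procedure, only one plausible way to combine two sets of predictions: the mutation labels, the score values and the function name `consensus_stability` are hypothetical, and the sketch assumes that both tools report a stability change whose negative sign means decreased stability.

```python
def consensus_stability(scores_a, scores_b):
    """Combine two predictors' stability-change scores into a consensus call.

    Assumption (not from the source): a negative score means the mutation
    decreases stability and a positive score means it increases stability.
    Only mutations scored by both predictors receive a call.
    """
    calls = {}
    for mutation in scores_a.keys() & scores_b.keys():
        a, b = scores_a[mutation], scores_b[mutation]
        if a < 0 and b < 0:
            calls[mutation] = "decrease"
        elif a > 0 and b > 0:
            calls[mutation] = "increase"
        else:
            calls[mutation] = "no consensus"
    return calls


# Hypothetical scores for placeholder mutations m1..m4 (not real data).
method_a = {"m1": -1.2, "m2": 0.4, "m3": -0.3}
method_b = {"m1": -0.5, "m2": -0.1, "m3": -0.9, "m4": 1.0}
for mutation, call in sorted(consensus_stability(method_a, method_b).items()):
    print(mutation, call)
```

a call is reported only where both predictors agree in sign, which is the conservative reading of "consensus"; mutations scored by a single method are left out rather than guessed.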
to predict host mirnas targeting the virus, we collected a list of experimentally verified antiviral mirnas with their targets from the virmirna database ( ). only these host mirnas were processed for downstream analysis (figure ). to identify potential host microrna target sites in the virus genome sequences, we used miranda ( . a version) ( , ) software, with an energy threshold of - kcal/mol. we also used the psrnatarget server to compare the targets predicted by the two methods ( ). all the genes and protein sequences for sars-cov were retrieved from the vipr database. to identify ctl and b-cell epitopes we used the ctlpred ( ) and abcpred ( ) servers with default parameters. the chemopred ( ) and vaxijen ( ) servers were used to predict chemokines and probable protective antigens, respectively (figure (c)). assembled sars-cov genome sequences in fasta format from india, usa, china, italy and nepal were used for coronavirus typing tool analysis. using the tool, we were able to locate query sars-cov genomes with known sars-cov to obtain a cladogram for evolutionary analysis, as shown in figure . several mutations are revealed when sar-cov and sars-cov spike glycoproteins are compared. six frameshift mutations and an insertion in the genome that corresponds to s _q inssdld ( _ insagtgaccttgac) (table s ) were also revealed. we also observed that there are several mutations located in the regions associated with high immune response (table s ). from the snp analysis we observed that all the mutations might bring about a decrease in stability without changing their properties, i.e. hydrophobicity to hydrophilicity or vice versa. the l y mutation is predicted to alter the ordered interface, disordered interface, stability and transmembrane protein properties, and to result in gain of gpi-anchor amidation at the n position (table ). 
it is known, and also confirmed by gene ontology analysis, that the protein is involved in pathogenesis, membrane organization, reproduction, symbiosis, encompassing mutualism through parasitism, and locomotion. psrnatarget analysis, based on the complementary matching between the srna sequence and the target mrna sequence with a predefined scoring schema, identified mirnas out of the identified mirnas to target sars-cov genes. the mirnas are predicted to act on the viral genomes by cleaving their target sites (table ). intriguingly, our analysis (s. figure ) revealed that there is only a single host mirna we have used bioinformatics tools to investigate sars-cov sequences from different geographical locations. the phylogenetic analysis of the genomes, the nucleotide sequence diversity analysis of the genomes, the predicted antiviral host mirnas specific to the genomes and the prediction of immune active sequences in the genomes have yielded some interesting facts, including unique features. for the phylogenetic analysis, we compared the sequences of sars-cov isolates from different locations, namely wuhan, india, italy, usa and nepal, along with other coronavirus species (figure ). as reported earlier ( , ), the virus from wuhan showed higher similarity with sars-cov. there was no phylogenetic segregation of the genomes based on geographic origin, whether from the same continent or a neighboring country (figure ); instead, the clustering varied, with italy and nepal clustering together, followed by india and usa. this reiterates the findings indicating the massive exchange and importation of carriers between the epicenter wuhan and these countries. however, a detailed analysis, complemented with more sequences and patient metadata, will give further evolutionary insights regarding the fast spreading pandemic. 
the phylogenetic heterogeneity between different strains is explored by genome variation profiling to find alterations in genetic information during the course of evolution, outbreak, and clinical spectrum caused by the different strains. in the case of sars-cov and sars-cov too, a few clinical characteristics differentiate them from each other and from seasonal influenza infections, as reported recently ( ). interestingly, in the present analysis, in comparison to sars-cov, we observed at least one variation, such as indels, deletions, misalignments or frameshifts, in all the sars-cov proteins except orf , orf and orf (table s ). the ( ) . in line with the expectations for a rapidly transmitting pandemic virus, in our analysis we observed various mutations located in the regions associated with immune response (table s ). these mutations may have a significant impact on the antigenic and immunogenic changes responsible for differences in the severity of the outbreak in different geographical regions. to gain further insights, we compared the genetic mutation spectrum identified in the four countries, namely usa, italy, india and nepal. surprisingly, the mutation spectra were different among these countries ( ( ), combined with other factors, a speculation which may be verified with more evidence. from this analysis, we also speculate that the presence of a country-specific mutation spectrum may also be able to explain the current scenario in these countries, such as severity of illness, containment of the outbreak, the extent and timing of exposures to a symptomatic carrier, etc. non-structural proteins have their specific roles in replication and transcription ( ). previous studies on sars-cov revealed nsp as a potential candidate for the therapeutic target ( ). 
it is noteworthy to mention that in the present study; various mutations have been identified in all the non-structural proteins suggesting them to be an important and potential player in proposing therapeutic targets and should be explored experimentally. many studies have reported that mirnas not only act as the signature of tissue expression and function but also as potential biomarkers playing important role in regulating disease pathophysiology ( ) . in viral infections, host antiviral mirnas play a crucial role in the regulation of immune response to virus infection depending upon the viral agent. many known human mirnas appear to be able to target viral genes and their functions like interfering with replication, translation and expression. in the present study, we tried to predict the antiviral host-mirnas specific for ( ) . also there are studies on the regulatory role of mirna hsa-mir- b- p described in ace signaling ( ) . the results of the present study suggest a strong correlation between mirna hsa-mir- b- p and ace which needs to be confirmed experimentally in sars-cov cases. further, we tried to compare the mirnas in the genomes and observed some striking findings. we observed that out of all the mirnas, hsa-mir- b is the only unique based on our analysis, we speculate an important regulatory role of mir- b in sars-cov infection. the contradictory treatment outcomes may be due to the presence of the mir- b target in the indian genome specifically. it probably indicates that the specific genetic and mirna spectrum should be considered as the basis of the treatment management. the findings in the study have revealed unique features of the sars-cov genomes, which may be explored further. 
for example, one may analyse the link between severity of diseases to each of the variants, expression of the predicted host antiviral mirnas can be checked in the patients, the predicted epitopes may be explored for their immunogenicity, difference in treatment outcomes may also be correlated with genome variations, lastly the potential of the unique segments of the virus proteins and the unique host mirnas may be explored in development of novel antiviral therapies. probable pangolin origin of sars-cov- associated with the covid- outbreak coronavirus disease (covid- ). situation report - the proximal origin of sars-cov- vipr: an open bioinformatics database and analysis resource for virology research genome detective coronavirus typing tool for rapid identification and characterization of novel coronavirus genomes molecular evolutionary genetics analysis across computing platforms. molecular biology and evolution cello go: a web server for protein subcellular localization prediction with functional gene ontology annotation automated inference of molecular mechanisms of disease from amino acid substitutions : predicting stability changes upon mutation from the protein sequence or structure prediction of protein stability changes for single-site mutations using support vector machines virmirna: a comprehensive resource for experimentally validated viral mirnas and their targets. database : the journal of biological databases and curation the microrna.org resource: targets and expression. nucleic acids research pamirdb: a web resource for plant mirnas targeting viruses. 
scientific reports psrnatarget: a plant small rna target analysis server prediction of ctl epitopes using qm, svm and ann techniques prediction of continuous b-cell epitopes in an antigen using recurrent neural network prediction and classification of chemokines and their receptors vaxijen: a server for prediction of protective antigens, tumour antigens and subunit vaccines systematic comparison of two animal-to-human transmitted human coronaviruses: sars-cov- and sars-cov a novel coronavirus from patients with pneumonia in china composition and divergence of coronavirus spike proteins and host ace receptors predict potential intermediate hosts of sars-cov- recent progress in the discovery of inhibitors targeting coronavirus proteases coronavirus nonstructural protein mediates evasion of dsrna sensors and limits apoptosis in macrophages extracellular mirnas: the mystery of their origin and function. trends in biochemical sciences are patients with hypertension and diabetes mellitus at increased risk for covid- infection? the lancet respiratory medicine the ace /apelin signaling, micrornas, and hypertension. international journal of hypertension regulation of cyclin t and hiv- replication by micrornas in resting cd + t lymphocytes interferon-beta and interferon-gamma synergistically inhibit the replication of severe acute respiratory syndrome-associated coronavirus (sars-cov), virology diagnosis and treatment of novel coronavirus infection in children: a pressing issue key: cord- - jlg gkc authors: hopp, marie-thérèse; domingo-fernández, daniel; gadiya, yojana; detzel, milena s.; schmalohr, benjamin f.; steinbock, francèl; imhof, diana; hofmann-apitius, martin title: unravelling the debate on heme effects in covid- infections date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jlg gkc the sars-cov- outbreak was recently declared a worldwide pandemic. 
infection triggers the respiratory tract disease covid- , which is accompanied by serious changes of clinical biomarkers such as hemoglobin and interleukins. the same parameters are altered during hemolysis, which is characterized by an increase in labile heme. we present two approaches that aim at analyzing a potential link between available heme and covid- pathogenesis. four covid- related proteins, i.e. the host cell proteins ace and tmprss as well as the viral protein a and s protein, were identified as potential heme binders. we also performed a detailed analysis of the common pathways induced by heme and sars-cov- by superimposition of knowledge graphs covering heme biology and covid- pathophysiology. herein, focus was laid on inflammatory pathways, and distinct biomarkers as the linking elements. finally, the results substantially improve our understanding of covid- infections and disease progression of patients with different clinical backgrounds and expand the diagnostic and treatment options. in the beginning of , the coronavirus disease has been declared a pandemic of international concern and an unprecedented challenge for the country-specific health care systems (cucinotta and vanelli, ) . covid- is caused by infections with severe acute respiratory syndrome coronavirus (sars-cov- ) and is accompanied by pneumonia, acute respiratory distress syndrome (ards) associated with a cytokine storm, and death in the most severe cases (ye et al., ; zhou et al., a) . by taking a closer look into the molecular mechanisms of the infection and disease development, it is important to note that patients with severe covid- often had a history of hypertension, yet also chronic kidney disease, cardiovascular disease or diabetes mellitus compared to those with milder disease progression (ji et al., ; zhou et al., a) . the scientific evidence of a potential higher risk for these patients, however, is still pending. 
furthermore, there is evidence that the renin-angiotensin system (ras), which is associated with hypertension, is directly associated with viral transmission (hanff et al., ) . an essential part of ras is the enzyme angiotensin-converting enzyme (ace ) that is expressed on the cell surface of alveolar epithelial cells of the lungs (ren, ) . more precisely, a recent report identified specific bronchial transient secretory cells, a cell state between goblet (responsible for mucus production) and ciliated cells (responsible for airway clearance) in human bronchial epithelial cells to be primarily attacked during the viral infection (lukassen et al., ; xu et al., ) . the virus gains access to the host cell by docking of its spike proteins (s protein) to the membrane surface of the host cell, which primarily occurs via the transmembrane protein ace (hoffmann et al., ) . this interaction between the host cell and sars-cov-related viruses is known since and involves the residues gln of the s protein and lys of ace as the critical interacting amino acids (wu et al., ) . in this context, the membrane protein m (m protein) is discussed to be relevant for the entry and attachment of the virus by interacting with the s protein . furthermore, the m protein may also be important for the budding process of the virus, since it interacts with the nucleocapsid envelope protein (e protein) and the s protein during virus particle assembly (alsaadi and jones, ) . additionally, it has been proposed that the e protein oligomerizes to form ion channels, and also plays a role in the assembly of the viral genome (ruch and machamer, ) . although protein a has not yet been fully characterized it is already known to act as an accessory protein (as derived by similarity from sars-cov), while interacting with m and e proteins (fielding et al., ) . 
this process is essential for virus particle formation before release of the reproduced virus particles into the surrounding areas, such as the blood stream (fielding et al., ; kwon et al., ). recent studies have revealed that the affinity of sars-cov- for ace is - -times higher than the affinity of sars-cov, which would explain its much higher transmissibility (zhang et al., b). upon binding, the viral s protein is subjected to proteolytic cleavage by the host cell's transmembrane serine protease subtype (tmprss ). the virus' entry can be blocked by a clinically proven protease inhibitor, rendering tmprss an interesting target against covid- . interestingly, it was shown that sars-cov- does not use other receptors like aminopeptidase n or dipeptidyl peptidase for cell entry as described for other coronaviruses (zhou et al., b). therefore, these proteins are unlikely to represent suitable targets for therapy of covid- . the tissue distribution of one of the main actors, ace , in organs like heart, kidney, endothelium, and intestine might explain the multi-organ dysfunction observed in covid- patients . several studies have provided information about the main symptoms, risk factors for severe disease progression, and clinical diagnostic values including blood routine, blood biochemistry, and infection-related biomarkers (guan et al., ; han et al., ; zhou et al., a). although the details of these individual studies vary, there is a consensus on changes in numerous clinical parameters, which might be directly connected or must be considered together in a specific context. for example, hemoglobin is decreased in more than % of the patients, as is serum albumin in % (guan et al., ; yang et al., ; zhou et al., a). in intensive care unit (icu) patients, reduced levels of hemoglobin and cluster of differentiation proteins (cd) , cd , cd , cd , and cd / were observed . 
in contrast, values of absolute lymphocyte count and absolute monocyte count were comparable to non-icu patients . however, in contrast to human immunodeficiency virus (hiv) and cytomegalovirus, the cd / ratio was not inverted . main symptoms are fever, cough, and fatigue, all presenting reactions of an activated immune system (guan et al., ) . the activation of the immune and the complement system is also observed by a variety of markers including increased values for interleukin (il)- ( % of the patients), erythrocyte sedimentation rate ( %), serum ferritin ( %), and c-reactive protein (crp) ( %) risitano et al., ; wang et al., ; zhang et al., c; zhou et al., a) . furthermore, recent studies that monitored and compared coagulation parameters of covid- patients have suggested a tendency to procoagulant states ji et al., ; tang et al., ) and an increased risk of venous thromboembolism (giannis et al., ) , which was indicated by higher levels of fibrin/fibrinogen degradation products and fibrinogen itself, as well as lower antithrombin levels . moreover, d-dimer levels, a marker for coagulation and sepsis, are markedly increased in non-survivors of covid- (guan et al., ; zhou et al., a) . therefore, covid- patients often suffer from leukopenia, lymphocytopenia, and thrombocytopenia (guan et al., ) . overall, these clinical parameters are interrelated when viewed from the perspective of heme and its interaction radius (dutra and bozza, ; kühl and imhof, ; roumenina et al., ; humayun et al., ) . heme is well-known as the prosthetic group of diverse proteins, e.g., hemoglobin, where it is responsible for the oxygen transport in the blood (ascenzi et al., ) . under hemolytic conditions such as malaria, sickle cell anemia (scd), β-thalassemia, and hemorrhage, or in case of severe cellular damage, heme is released in enormous amounts as a result of hemoglobin degradation, leading to a pool of labile heme (ascenzi et al., ; roumenina et al., ) . 
in this case, the heme-detoxifying scavenger proteins, in particular hemopexin, become saturated, which allows heme to execute its wide-ranging effects (chiabrando et al., ; roumenina et al., ; detzel et al., ) . the response to heme in this context leads to cytotoxic, procoagulant, vasculotoxic, and proinflammatory effects, as well as an activation of the complement system (dutra and bozza, ; roumenina et al., ) . labile heme also plays a central role in the pathology of severe sepsis which leads to vascular inflammation and severe toxic effects in organs like liver, kidney or cardiac tissue (dutra and bozza, ) . these responses are, in part, mediated through direct interaction of heme with the responsible proteins (i.e. tumor necrosis factor α (tnfα), toll like receptor (tlr ), and complement factor (c ) (dutra and bozza, ; roumenina et al., ; kupke et al., ) , or by up-regulation of the respective cytokines, including il- β, il- , and tnfα (dutra and bozza, ) . in addition, for heme a downstream ros-dependent induction of distinct signaling pathways (mapk/erk pathway, nfκb signaling) is discussed, which can lead to stimulation of neutrophil recruitment, necrotic cell death or expression of adhesion molecules (dutra and bozza, ) . in fact, we recently contextualized the role of heme as an inflammatory mediator as well as its crosstalk with the tlr signaling pathway (humayun et al., ) . although the aforementioned findings suggest a connection between biological processes implicated in sars-cov- and those related to heme, the lack of information on both subject areas impedes the use of modeling approaches that can facilitate their interpretation. despite a considerable volume of research on sars-cov- over the past few months, knowledge of the molecular mechanisms responsible for the pathophysiology of the virus still remains scarce. 
likewise, data in the context of heme's biology is underrepresented, if available at all, in standard bioinformatics resources such as pathway databases. however, two of our recent scientific publications have focused on modeling knowledge around heme as well as sars-cov- (domingo-fernández et al., ; humayun et al., ). although both studies are tangential, each follows a knowledge-driven approach aimed at generating a context-specific knowledge graph (kg) that can subsequently be employed to investigate candidate mechanisms involved in hemolytic disorders and covid- . moreover, by leveraging the interoperability between the covid- and heme kgs, these kgs can be integrated to shed light on the shared mechanisms between these two seemingly independent domains. following the deposition of a manuscript stating that sars-cov- attacks hemoglobin -β chain and captures the porphyrin of its heme , a heated debate arose about the scientific substantiation of the claims. in this context, one should ask: how realistic is it, in view of the patients' constitution, that coronavirus components encounter heme and, following up on this, what might be the consequences of such an interaction, if any? thus, in order to contribute to the understanding of sars-cov- , its strategies of infection, as well as the pathogenesis and the course of disease in coronavirus patients, we combined our know-how and expertise for a joint analysis of the aforementioned issue. on the one hand, we examine the possibility of a direct interaction of heme with select sars-cov- proteins and specific host cell proteins by applying our webserver hemoquest (paul george et al., ), which is based on experimental data. one of the most promising findings was the prediction of heme-binding motifs (hbms) in the host cell proteins ace and tmprss . on the other hand, by superimposing the two knowledge graphs, i.e. 
heme kg (humayun et al., ) and covid- kg (domingo-fernández et al., ), we provide insights into pathways that might play a role when considering heme in the context of covid- infections. finally, our results suggest that proinflammatory pathways could connect the pathophysiology of elevated heme with covid- disease progression. hemoquest (http://bit.ly/hemoquest) (paul george et al., ) was used to identify potential hbms in the following proteins of sars-cov- : s protein, m protein, e protein, and protein a, and human: ace and tmprss . for the selection, the procedure described earlier was applied (wißbrock et al., ; detzel et al., ) . in all in silico experiments, we used available cryogenic electron microscopy (cryoem) structures or homology models (publicly available or in-house built). in case of ace the recently published, fully glycosylated cryoem structure (pdb: m ) was used, which was recorded in complex with sodium-dependent neutral amino acid transporter b (b at ) (yan et al., ) . b at was removed in order to focus on ace only. although s protein cryoem structures for open and closed states are available (pdb: vxx and vyb) (walls et al., ) , these structures lack several surface-exposed sequence stretches, in which some of the predicted motifs are located. we therefore used the structure available from the c-i-tasser structure prediction server reported earlier (zhang et al., a) . for sars-cov- protein a and tmprss , no structures are available so far. thus, homology models were built using yasara version . . and the hm_build macro with default settings (krieger and vriend, ) . we were able to build a hybrid model of the virion surface-exposed part of protein a from two structures of the sars virus protein a (pdb: xak and yo ). the model achieved an overall z-score of - . , which can be regarded as suitable. in contrast, a hybrid model of tmprss based on kallikrein and hepsin (pdb: i , o g, z g, ce ) exhibited only a poor overall z-score of - . 
and was therefore rejected. in this case, further in silico analysis was performed with the available swiss-model o (waterhouse et al., ) . in order to investigate the mechanisms linking sars-cov- and heme, we exploited the kgs generated in our previous work humayun et al., ) . we compiled the two kgs encoded in biological expression language (bel) using pybel (hoyt et al., ) directly from their public repositories (i.e. https://github.com/covid kg and https://github.com/hemekg/) and superimposed their interactions onto a merged network. given the high degree of expressivity of bel that enables the representation of multimodal biological information, the kgs were not only enriched with molecular information, but also with interactions from the molecular level to phenotypes and clinical readouts. we leveraged this multimodal information to hypothesize the pathways that connect key molecules associated with sars-cov- and heme to phenotypes observed in covid- patients. since both kgs comprise several thousands of interactions, manually inspecting all relations and evaluating the implication of the crosstalk between covid- and heme is largely infeasible. accordingly, this analysis primarily focuses on the set of nodes present in both kgs. prior to the crosstalk analysis, we conducted a one-sided fisher's exact test (fisher, ) to evaluate the significance of the overlap between human proteins present in each of the kgs (p-value < . ). we then classified the set of overlapping nodes into four pathways based on their functional role: i) immune response -inflammation, ii) immune response -complement system, iii) blood and coagulation system, and iv) organ-specific diagnostic markers. finally, for each of the pathways, we analyzed the similarity between the signatures for both kgs by superimposing the relations that connect each of the overlapping nodes to covid- and heme biology. 
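the overlap test and the selection of shared nodes described above can be sketched as follows. this is a minimal illustration and not the authors' pipeline: the node names, the universe size and the function name `kg_overlap` are hypothetical, and the p-value is computed directly from the hypergeometric tail that underlies a one-sided fisher's exact test, using only the python standard library.

```python
from math import comb


def kg_overlap(nodes_a, nodes_b, universe_size):
    """Return the nodes shared by two graphs and a one-sided overlap p-value.

    The p-value is P(X >= observed overlap) under the hypergeometric
    distribution, i.e. the "greater" tail of Fisher's exact test.
    universe_size is the assumed number of proteins that could have
    appeared in either graph (an assumption the analyst must supply).
    """
    set_a, set_b = set(nodes_a), set(nodes_b)
    if universe_size < len(set_a | set_b):
        raise ValueError("universe_size is smaller than the union of both node sets")
    shared = set_a & set_b
    size_a, size_b = len(set_a), len(set_b)
    total = comb(universe_size, size_b)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(size_a, k) * comb(universe_size - size_a, size_b - k)
        for k in range(len(shared), min(size_a, size_b) + 1)
    )
    return shared, tail / total


# Toy node sets (placeholders only, not the contents of either KG).
heme_nodes = {"P1", "P2", "P3"}
covid_nodes = {"P2", "P3", "P4"}
shared, p_value = kg_overlap(heme_nodes, covid_nodes, universe_size=6)
print(sorted(shared), p_value)
```

the choice of universe strongly affects the p-value, so in a real analysis it would need to be justified (for example, all human proteins that either curation effort could in principle have captured).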
the relations present in figure are also shown in table s -s together with their evidence and provenance information. in order to validate the knowledge-driven hypothesis coming from the kg, we compared the relations emerging from the overlap between the two kgs with experimental data published in the context of covid- (blanco-melo et al., ) . the concordance of the expression patterns in these datasets with each relation shown in figure is shown in table s . covid- progression severely diverges between affected patients with ards and other patients, which could even remain asymptomatic. current research is thus focusing on explaining the reasons for such discrepancy considering the physical conditions and (pre-)existing illnesses of those affected. with regard to the subject of a possible interrelation between covid- and heme, numerous options need to be regarded. first, the earlier claim of an interaction of protoporphyrin ix with sars-cov- (liu and li, ) must be questioned, since heme would appear before protoporphyrin ix as a consequence of e.g., hemolytic conditions. thus, the direct interaction of heme with viral surface proteins, as well as host cell proteins exposed to virus attack, needs to be considered. second, systemic hyperinflammation follows severe covid- infection. this is manifested by an increase in the abundance of numerous cytokines (e.g., il- , il- , il- , tnfα) (ye et al., ) , which is indicative of cytokine release syndrome (huang et al., ) and leads to elevated serum biomarkers in patients (e.g., crp, lactate dehydrogenase (ldh), d-dimer, ferritin) young et al., ; zhang et al., c; zhou et al., a) . several of these indications, however, were also reported for labile heme occurring in patients with hemolytic disorders (litalien et al., ; barcellini and fattizzo, ) . therefore, heme as a key player in initiating or mediating distinct processes in connection with a viral infection needs to be considered as well. 
this can be exemplified with the interaction of heme with e.g., zika, chikungunya, and hiv- viruses (gupta et al., ; lecerf et al., ; assunção-miranda et al., ; neris et al., ) . in the following, we present our results concerning the potential direct heme interaction with covid- -related proteins as well as a detailed analysis of common pathways of excess heme and covid- pathophysiology. numerous interesting target proteins of the virus and host cell surface were linked with pathological effects of sars-cov- , including e protein, s protein, m protein and protein a as well as the human proteins ace and tmprss . all proteins contain at least an extracellular, surface-exposed part and are thus accessible for interaction with heme (hänel and willbold, ; mendes de oliveira et al., ; mousavizadeh and ghasemi, ; walls et al., ) . this led us to examine these proteins for potential heme-binding sites. we identified potential hbms on all target proteins using the machine-learning algorithm hemoquest (paul . screening of the amino acid sequences of s protein, protein a, ace and tmprss resulted in , , , and potential hbms, respectively. m protein and e protein were dismissed as candidates, since no suitable hbms were found. hbms, which are part of the transmembrane or intravirion/intracellular domains, were removed from the selection. in addition, we excluded motifs in which the potential coordinating residue or a residue adjacent to the coordination site was involved in disulfide bonds or glycosylation. after this refinement of the hits, we identified motifs in s protein, two in protein a, in ace , and in tmprss (figure ). these motifs were then manually screened for surface accessibility using the protein structures, or if unavailable, homology models. consequently, three motifs for s protein, two motifs for protein a, five motifs for ace and ten motifs for tmprss remained and are discussed below (figure ). 
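the filtering steps described above (nonapeptide windows around a potential coordinating residue, exclusion of sites at or next to modified residues) can be sketched as follows. this is not the hemoquest algorithm, which is a trained machine-learning model; it is a simplified illustration under stated assumptions: coordinating residues are taken to be h, y and c, net charge counts k/r as +1 and d/e as -1 (histidine is ignored), the hydrophobic set is a/v/l/i/m/f/w, and all names (`Motif`, `scan_nonapeptides`, `excluded_positions`) are hypothetical.

```python
from typing import NamedTuple

COORDINATING = set("HYC")       # assumed potential heme-coordinating residues
HYDROPHOBIC = set("AVLIMFW")    # assumed hydrophobic set (tyrosine not counted)


class Motif(NamedTuple):
    position: int      # 1-based position of the central residue in the input
    window: str        # nonapeptide: 4 residues on each side of the center
    net_charge: int    # (K + R) - (D + E)
    hydrophobic: int   # number of hydrophobic residues in the window


def scan_nonapeptides(seq: str, excluded_positions=frozenset()) -> list:
    """Return candidate nonapeptide motifs centered on H, Y or C.

    A candidate is dropped when its central residue, or a residue directly
    adjacent to it, is listed in excluded_positions (for example residues
    involved in disulfide bonds or glycosylation).
    """
    seq = seq.upper()
    hits = []
    for i, residue in enumerate(seq):
        if residue not in COORDINATING:
            continue
        if i < 4 or i + 4 >= len(seq):
            continue  # a full 9-residue window is required
        pos = i + 1
        if {pos - 1, pos, pos + 1} & set(excluded_positions):
            continue
        window = seq[i - 4 : i + 5]
        charge = sum(window.count(r) for r in "KR") - sum(window.count(r) for r in "DE")
        n_hydrophobic = sum(1 for r in window if r in HYDROPHOBIC)
        hits.append(Motif(pos, window, charge, n_hydrophobic))
    return hits


# The s protein nonapeptide quoted in the text, numbered relative to the fragment.
for motif in scan_nonapeptides("FLGVYYHKN"):
    print(motif)
```

a surface-accessibility check against a structure or homology model, as done manually in the text, would be a separate step after this sequence-level screen.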
the potential hbms in s protein are all located in the n-terminal domain (figure a). the first occurring sequence, flgvy yhkn, may be the most promising hbm: it is based on a yyh motif and further equipped with phenylalanine at p- , two additional hydrophobic amino acids (val, leu), and a net charge of + , all beneficial for heme binding. the following two motifs, iyskh tpin and lhrsy ltpg, contain a y/h-based motif with two spacers between the potential coordinating residues, e.g., yxxh, which has been shown to be less favorable for heme binding. nonetheless, both motifs possess a net charge of + and several hydrophobic residues, and are thus likely to bind heme moderately. in protein a, only two overlapping motifs were predicted, which is not surprising given the small size of the protein ( amino acids) (figure b). both motifs, dgvkh vyql and vkhvy qlra, possess an hxy motif and three hydrophobic residues, which renders them moderate heme binders and, in turn, makes protein a a less interesting candidate for interaction with heme. the analysis of ace revealed five hbms in total, two of which represent promising h/y motifs (figure c). the most interesting hbm is ltahh emgh, comprising an hxxxh motif, which was recently shown to exhibit high heme-binding affinity. the central h is immediately adjacent to the site that is essential for cleavage by adam (heurich et al., ; lan et al., ). the occurrence of three histidines may be favorable for heme binding, as could l , while e might be slightly detrimental. the second interesting motif is plyeh lhay, since it contains the efficient hxh motif with further advantageous aromatic tyrosines (y , y ) and hydrophobic leucines (l , l ). the only limitation to affinity might be e . the remaining three motifs, sfiry ytrt, qaakh egpl, and amrqy flkv, are less promising because they contain only one coordinating amino acid or the weak motif yy. the largest number of motifs, i.e. ten in total, was identified in the transmembrane serine protease tmprss .
four of these motifs contained only one coordinating amino acid and can be dismissed for the aforementioned reasons. three additional motifs (cvrly gpnf, rkswh pvcq, cakay rpgv) contain cysteine as a possible further coordinating site. cysteine has been shown to function efficiently as an hrm in conjunction with proline in the cp motif, but without it, it lacks high heme-binding affinity (brewitz et al., ). two further overlapping motifs (kvish pnyd and shpny dskt) were found in the protease domain of tmprss (figure d). both are equally well-suited for moderate heme binding based on the hxxy motif and a positive net charge. in order to shed light on the crosstalk and common pathways between heme and covid- , we investigated the overlap between our two kgs (i.e., heme kg (humayun et al., ) and the covid- kg (domingo-fernández et al., )) (figure ). while the heme kg was generated from the analysis of scientific articles specifically selected to explain inflammatory processes related to labile heme, the covid- kg contains over articles. the difference in the size of these kgs thus explains the disproportionate number of molecules they possess. nonetheless, we observed that a significant number of proteins are shared, predominantly in three major systems, namely the blood coagulation, complement and immune systems. among the shared nodes, there are clinical phenotypes, proteins, immune system specific cells, and small molecules. nodes belong to immune response evoking (pro-)inflammatory pathways, to the complement system, and to the blood coagulation system (figure a). moreover, we also noticed the presence of clinical phenotypes related to organ dysfunction. further, we individually investigated the four systems to reveal the common relations observed in each of the two kgs (figure , table s -s ).
finally, we evaluated the concordance of these systems with experimental data published in the context of covid- (blanco-melo et al., ) and the vast majority of them are in line with the findings presented below (table s ). the largest consistency was found in inflammatory pathways ( figure b ) as indicated by a common set of inflammatory -mostly pro-inflammatory -molecules. these are changed with respect to their levels due to expression and/or secretion or their activity as a consequence of both, high heme concentrations and covid- infection, mediating inflammatory response. in particular, the proinflammatory cytokines tnfα, il- β, il- , il- , and the anti-inflammatory cytokine il- , as well as proteins related to tlr -mediated signaling pathways (i.e. cd , myd , nf-κb and tlr ) are influenced under both conditions ( figure b , table s ). within the complement system ( figure c , table s ), one of the main mediators, c , is activated under hemolytic conditions associated with high heme concentrations, thus leading to complement activation (roumenina et al., ) . the same was observed in covid- patients (risitano et al., ) . furthermore, other complement factors, like c a and c q, were reported to be activated by heme (roumenina et al., ) . so far, an increase of the activation of these proteins was not described for covid- . finally, the number of neutrophils is positively correlated with both heme and covid- infection. however, heme induces neutrophil activation through a ros-dependent mechanism (dutra and bozza, ) , a pathway that is not yet discussed in the context of covid- . the blood and coagulation system is pronounced by the connecting proteins ferritin and albumin ( figure d , table s ). both conditions lead to reduced levels of ferritin, a protein involved in iron uptake and release (mogl, ) . same applies for albumin in covid- patients guan et al., ; yang et al., ; zhou et al., a) . 
moreover, albumin is known as one of the common heme scavengers, neutralizing heme's toxic effects up to a certain extent (roumenina et al., ) . as indicated by the impact on different components of the blood coagulation system, such as plasminogen or fibrin in case of covid- and heme, respectively, both conditions can influence hemostasis. with regard to the impact on platelets, a decreased platelet count was observed in covid- patients , whereas for heme an induction of platelet aggregation was described (roumenina et al., ) . finally, a trend towards elevated levels of organ-specific diagnostic markers, i.e. ldh and bilirubin, is shared by both kgs ( figure e , table s ). currently, sars-cov- and its associated disease covid- keep the world in suspense. patients being most severely affected suffer from pneumonia, acute respiratory distress syndrome, and death (ye et al., ; zhou et al., a) . while covid- patients often exhibit high levels of proinflammatory markers as well as an activation of the complement and coagulation system, hemoglobin and albumin levels have been reported to be remarkably low risitano et al., ; ye et al., ) . these affected clinical parameters have generated a recent debate about the role of heme in the context of covid- that has not been conclusively explained to date . with this work, we intend to provide deeper insights into the connection between sars-cov- infection, covid- and the effects of heme, wherever possible and appropriate. such a connection would be in line with recent studies that already described the impact of heme in the context of different viruses gupta et al., ; neris et al., ) . lecerf et al. reported on the interaction of heme with antibodies (abs) resulting in the induction of new antigen binding specificity and acquisition of binding polyreactivity to gp hiv- in % of the antibodies from different b-cell subpopulations of seronegative individuals . 
in contrast, no difference in the sensitivity towards heme was found for abs originally expressed by naive, memory, or plasma cells. the transient interaction of heme with a fraction of circulating abs, which might change their antigen binding repertoire by means of cofactor association, was suggested as another possible regulatory function of heme. in addition, the novel antigen specificities of these circulating abs were proposed to be recruited only in case of certain pathological conditions that might depend on extracellular heme, as occurring in disorders such as malaria, sickle cell disease, hemolytic anemia, β-thalassemia, sepsis, and ischemia-reperfusion. a similar report by gupta et al. revealed the heme-mediated induction of monoclonal immunoglobulin g antibodies that acquired high-affinity reactivity towards antigen domain iii of the japanese encephalitis virus (jev) e glycoprotein and exhibited neutralizing activity against dominant jev genotypes (gupta et al., ). in both cases, heme was found to confer novel binding specificities to the respective abs without changing the binding to their cognate antigen and, as a consequence of the contact with heme, the anti-inflammatory potential of these abs was substantially increased (gupta et al., ). finally, assunção-miranda et al. and neris et al. described the inactivation of different arthropod-borne viruses like dengue, yellow fever, zika, chikungunya, mayaro and others by porphyrin treatment via targeting of the viral envelope and, thus, the early steps of viral infection (assunção-miranda et al., ; neris et al., ). altogether, these studies advocate for studying the impact of heme in coronavirus-infected patients (figure ). here, we have investigated the possibility of a direct interaction of heme with sars-cov- surface proteins and their human counterparts ace and tmprss . our analysis revealed that heme binding conferred by hbms would potentially be possible.
the quality, availability, and accessibility of the motifs follows the rank order: tmprss (good binder) > ace > s protein > protein a (poor binder). especially in tmprss , the location of the most suitable hbms correlates with the important catalytic protease domain. this potential heme interaction would be of a transient nature, as has been observed for other heme-binding proteins such as il- α and cbs (kumar et al., ; wißbrock et al., ) . intact heme would bind to the protein surface in a reversible fashion, which would be in contrast to the recently presented hypothesis by liu & li (liu and li, ) . therein, the authors describe heme extraction from hemoglobin through attack by viral proteins and subsequent iron and porphyrin release from heme, which does not occur in a physiological situation (belcher et al., ) . in addition, the docking analysis performed in their study is not based on experimental data concerning the porphyrin interaction, unlike the data-driven machine learning algorithm hemoquest used in our study. nevertheless, the effect of heme on the suggested proteins tmprss , ace , s protein, and protein a needs experimental verification. apart from investigating the direct impact of heme on proteins at the interface of the virus-host cell interaction, we also explored similarities between relevant pathways characterizing the respective pathologies, i.e. labile heme occurrence in hemolytic conditions and covid- disease progression ( figure ) . both, hemolytic conditions and covid- , have been found to trigger inflammatory pathways. covid- patients often develop respiratory distress syndrome, which is accompanied by a cytokine storm, and thus an activation of the immune system (ye et al., ) . clinically, this is manifested by an increase in the levels of a wide range of cytokines, including tnfα, il- , il- and il- (ye et al., ) , and the activation of the complement system (e.g. c ) (risitano et al., ) . 
interestingly, hemoglobin is described to be often decreased in covid- patients without indicating the molecular cause (huang et al., ) . this seems to correlate with increased levels of the iron-storage protein ferritin. an increase in ferritin concentration is observed in diseases like hemochromatosis or porphyria (mogl, ) . furthermore, it is upregulated during hemolytic diseases as a consequence of hemoglobin degradation and the associated increase of oxidative stress, e.g. induced by heme (belcher et al., ) . hemolytic disorders such as malaria, ischemiareperfusion, hemorrhage or hemolytic anemias are associated with an excess of labile heme and are, as in covid- infection, often accompanied by inflammatory events (chiabrando et al., ; barcellini and fattizzo, ) . therefore, similar clinical parameters are observed under these conditions (barcellini and fattizzo, ) . moreover, several studies have reported that heme directly binds or induces tnfα, il- β and il- , and triggers numerous inflammatory pathways (e.g. nf-κb signaling) (dutra and bozza, ; humayun et al., ) . taken together, these clinical observations suggested a correlation between both processes, which we aimed to analyze by superimposing the two kgs of both pathophysiologies humayun et al., ) . indeed, the results of the knowledge-driven analysis revealed a core of similar molecular patterns shared. the majority of these were related to the three major systems inflammation, complement, and coagulation system. as expected, inflammation was the most emphasized common system, suggesting several processes that are commonly mediated by heme and in covid- . the tlr signaling pathway was previously shown to play an important role in heme-mediated inflammatory processes. interestingly, this pathway with its components tlr , myd and nf-κb was pronounced in the overlay of the heme kg and covid- kg. 
the tlr pathway belongs to the innate immune system, and thus results in the production of several proinflammatory cytokines, such as tnfα, il-  and il- (humayun et al., ) . tnfα and il-  can further stimulate the release of inflammatory mediators, such as il- . exactly the same proteins have emerged as common key molecules in our analysis. clinical observations revealed their upregulation in covid- patients as well as during hemolytic events (dutra and bozza, ; humayun et al., ; ye et al., ) , which highlights even more the tlr signaling pathway in both situations. interestingly, tnfα and il-  were reported to be capable of regulating platelet aggregation. this might support the common link of both pathologies to blood coagulation (bar et al., ) . blood parameters, such as hemoglobin and albumin levels, may allow for a direct correlation between covid- and heme, since they are inevitably connected to the processing of heme under hemolytic conditions (chiabrando et al., ; roumenina et al., ) . at the current state of research, there is no explanation for the decreased levels of hemoglobin in covid- . it might be conceivable, that this is due to a rapid turnover of red blood cells, which would lead to a degradation of hemoglobin and, in turn, to an increase of heme. although for some viruses it is known that they cause hemolysis of red blood cells, such as hepatitis a (goel et al., ) , a similar behavior has not yet been described for sars-cov- and related viruses. however, our approach is not without limitations as our analysis is restricted to a limited number of scientific articles. furthermore, there is an unbalanced source of information when comparing the tremendous amount of literature that is currently being published on covid- versus the information that is currently included in heme kg for heme. 
the findings described herein require a more detailed experimental investigation with dedicated experiments for each of the reported relations that shed light on the underlying biochemical mechanisms as well as for the full characterization of the heme-binding capacity of the proposed proteins. nevertheless, the results of this study draw attention to a relationship that could be plausible based on the current characterization of covid- by clinical parameters. a correlation between the symptoms of covid- infection and the consequences of excess heme does not necessarily have to be related, but in specific cases it may correlate or even cause a more severe course of the disease in existing hemolytic conditions or hemolysis-provoking events. the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. this work has been supported by the mavo and icon programs of the fraunhofer society. financial support by the university of bonn is gratefully acknowledged. the location of the proteins is presented (first column, left), individually highlighting each target protein (s protein, red; protein a, orange; ace, green; tmprss , turquoise). all motifs predicted by hemoquest are shown excluding those with modifications (glycosylation, disulfide bonds) or located in intracellular or virion domains, i.e. motifs for s protein, motifs for protein a, motifs for ace , and motifs for tmprss . potential hemebinding residues are bold-written and numbered according to swissprot numbering system (bairoch and apweiler, ) . a refined analysis considering the surface accessibility of the motifs resulted in motifs for s protein, largely overlapping motifs for protein a, motifs for ace , and motifs for tmprss (third column). 
in addition, these motifs are highlighted in a zoom-in below the list with annotation of the respective potential coordinating residues (green; third column), as well as in the available monomer (fourth column) and oligomer (fifth column) structures, if applicable (s protein, homology model from c-i-tasser (zhang et al., a) ; protein a, in-house homology model; ace , pdb: m ; tmprss , swiss-model: o ). within the oligomers, the motifs were only depicted in one of the monomers (green). each time, the central, potential hemecoordinating residue is shown as stick model. since some surface-exposed motifs within s protein were not covered by the available em structure (pdb: vxx), motifs were highlighted within the monomer homology model from c-i-tasser (zhang et al., a ) (turquoise), which was then superimposed with the trimer (pdb: vxx). where applicable, glycosylation sites and ions are highlighted in blue. . the color of the nodes denotes whether it is present exclusively in the covid- kg (blue), in heme kg (green), or in both (red). finally, to highlight the areas close to the overlapping nodes, their neighbors are colored in light red. the most matching nodes are shown circled assigned to the respective major systems, i.e. inflammation, blood coagulation and complement system. immune response -inflammation, (c) immune response -complement system, (d) blood and coagulation system, and (e) organ-specific diagnostic markers. moreover, for each classification available clinical parameters were denoted risitano et al., ) . crp = creactive protein, c = complement component , cd = cluster of differentiation, g-csf = granulocyte-colony stimulating factor, gm-csf = granulocyte macrophage colony-stimulating factor, il = interleukin, ldh = lactate dehydrogenase, mcp = monocyte chemoattractant protein . top left: virus (grey) release after conquering the host cell (yellow) and taking over its protein synthesis machinery. 
top right: in case of hemolysis, erythrocyte lysis occurs in a blood vessel, leading to degradation of hemoglobin (blue/red, pdb: gzx) and, thus, to an excess of labile heme. circulating membrane binding proteins of coronaviruses hemoglobin and heme scavenging inactivation of dengue and yellow fever viruses by heme, cobalt-protoporphyrin ix and tin-protoporphyrin ix the swiss-prot protein sequence database and its supplement trembl in the regulation of platelet aggregation in vitro by interleukin- beta and tumor necrosis factor-alpha: changes in pregnancy and in pre-eclampsia clinical applications of hemolytic markers in the differential diagnosis and management of hemolytic anemia heme degradation and vascular injury sars-cov- envelope and membrane proteins: differences from closely related proteins linked to crossspecies transmission? preprints imbalanced host response to sars-cov- drives development of covid- role of the chemical environment beyond the coordination site: structural insight into fe(iii)protoporphyrin binding to cysteine-based heme-regulatory protein motifs epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study heme in pathophysiology: a matter of scavenging, metabolism and trafficking across cell membranes who declares covid- a pandemic revisiting the interaction of heme with hemopexin: recommendations for the responsible use of an emerging drug covid- knowledge graph: a computable, multi-modal, cause-and-effect knowledge model of covid- pathophysiology heme on innate immunity and inflammation hematologic parameters in patients with covid- infection severe acute respiratory syndrome coronavirus protein a interacts with hsgt statistical methods for research workers coagulation disorders in coronavirus infected patients: covid- , sars-cov- , mers-cov and lessons from the past hepatitis a virus-induced severe hemolysis complicated by severe glucose- -phosphate dehydrogenase 
deficiency clinical characteristics of coronavirus disease in china neutralization of japanese encephalitis virus by heme-induced broadly reactive human monoclonal antibody prominent changes in blood coagulation of patients with sars-cov- infection sars-cov accessory protein a directly interacts with human lfa- is there an association between covid- mortality and the renin-angiotensin system? -a call for epidemiologic investigations tmprss and adam cleave ace differentially and only proteolysis by tmprss augments entry driven by the severe acute respiratory syndrome coronavirus spike protein sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor pybel: a computational framework for biological expression language clinical features of patients infected with novel coronavirus in wuhan a computational approach for mapping heme biology in the context of hemolytic disorders elevated plasmin(ogen) as a common risk factor for covid- susceptibility yasara view -molecular graphics for all devices -from smartphones to workstations regulatory fe ii/iii heme: the reconstruction of a molecule's biography heme interaction of the intrinsically disordered n-terminal peptide segment of human cystathionine-β-synthase heme binding of transmembrane signaling proteins undergoing regulated intramembrane proteolysis post-donation covid- identification in blood donors structure of the sars-cov- spike receptor-binding domain bound to the ace receptor prevalence and gene characteristics of antibodies with cofactor-induced hiv- specificity circulating inflammatory cytokine levels in hemolytic uremic syndrome covid- : attacks the -beta chain of hemoglobin and captures the porphyrin to inhibit human heme metabolism sars-cov- receptor ace and tmprss are primarily expressed in bronchial transient secretory cells structural characterization and crystallization of human tmprss protease an unhappy triad: hemochromatosis, porphyria cutanea tarda and 
hepatocellular carcinoma-a case report genotype and phenotype of covid- : their roles in pathogenesis co-protoporphyrin ix and sn-protoporphyrin ix inactivate zika, chikungunya and other arboviruses by targeting the viral envelope characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov hemoquest: a webserver for qualitative prediction of transient heme binding to protein motifs analysis of ace in polarized epithelial cells: surface expression and function as receptor for severe acute respiratory syndrome-associated coronavirus complement as a target in covid- ? heme: modulator of plasma systems in hemolytic diseases the coronavirus e protein: assembly and beyond high-affinity binding and catalytic activity of his/tyr-based sequences: extending hemeregulatory motifs beyond cp abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia structure, function, and antigenicity of the sars-cov- spike glycoprotein epidemiological and clinical features of hospitalized patients with covid- in swiss-model: homology modelling of protein structures and complexes structural insights into heme binding to il- α proinflammatory cytokine mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus high expression of ace receptor of -ncov on the epithelial cells of oral mucosa structural basis for the recognition of sars-cov- by full-length human ace . science ( -. 
) clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study the pathogenesis and treatment of the`cytokine storm' in covid- epidemiologic features and clinical course of patients infected with sars-cov- in singapore protein structure and sequence reanalysis of -ncov genome refutes snakes as its intermediate host and the unique similarity between its spike protein insertions and hiv- angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target epidemiological, clinical characteristics of cases of sars-cov- infection with abnormal imaging findings clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study a pneumonia outbreak associated with a new coronavirus of probable bat origin the shown proteins contain domains in the extracellular space that have specific characteristics for heme binding. bottom right: prominent changes of clinical parameters in patients suffering from covid- infection ( ↑ increase, ↓ decrease). the terms depicted by an asterisk, have been reported in both key: cord- - t w z authors: campione, elena; lanna, caterina; cosio, terenzio; rosa, luigi; conte, maria pia; iacovelli, federico; romeo, alice; falconi, mattia; del vecchio, claudia; franchin, elisa; lia, maria stella; minieri, marilena; chiaramonte, carlo; ciotti, marco; nuccetelli, marzia; terrinoni, alessandro; iannuzzi, ilaria; coppeda, luca; magrini, andrea; moricca, nicola; sabatini, stefano; rosapepe, felice; bartoletti, pier luigi; bernardini, sergio; andreoni, massimo; valenti, piera; bianchi, luca title: pleiotropic effect of lactoferrin in the prevention and treatment of covid- infection: randomized clinical trial, in vitro and in silico preliminary evidences date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: t w z the current treatments against sars-cov- have so far proved inadequate. a potent antiviral drug is yet to be discovered. lactoferrin, a multifunctional glycoprotein secreted by exocrine glands and neutrophils, possesses an antiviral activity extendable to sars-cov- . we performed a randomized, prospective, interventional study assessing the role of oral and intra-nasal lactoferrin to treat mild-to-moderate and asymptomatic covid- patients to prevent disease evolution. lactoferrin induced an early viral clearance and a fast recovery of clinical symptoms, in addition to a statistically significant reduction of d-dimer, interleukin- and ferritin blood levels. the antiviral activity of lactoferrin was related to its binding to sars-cov- and cells, and protein-protein docking methods indicated a direct recognition between lactoferrin and spike s, thus hindering the attachment of spike s to the human ace receptor and, consequently, virus entry into the cells. lactoferrin can be used as a safe and efficacious natural agent to prevent and treat covid- infection. in december , in wuhan, china, a cluster of pneumonia cases was observed. this cluster was related to a novel member of the betacoronavirus genus, named sars-cov- , possessing more than % identity to sars-cov and % to mers-cov. coronaviruses are spherical, enveloped viruses. then, the efficacy of different concentrations of blf in inhibiting sars-cov- infection was tested on vero e and caco- cells according to different experimental procedures: i) control: untreated sars-cov- and cells; ii) blf pre-incubated with virus inoculum for h at °c before cell infection; iii) cells pre-incubated with blf for h at °c before virus infection; iv) blf added together with virus inoculum at the moment of infection; v) virus and cells separately pre-incubated with blf for h at °c before infection. the results obtained with vero e cells are shown in figure a (moi . ) and b (moi . ).
and p < . , respectively) ( figure a and b). on the contrary, the data illustrated in figure a and b, independently from the moi used, regarding caco- cells, at moi . , no significant differences were observed in all experimental conditions compared to the control ones when using blf at µg/ml ( figure a ). at moi . , an inhibition of viral load in supernatants was observed at hours post-infection (hpi) only when µg/ml of blf was pre-incubated with the viral inoculum and when the cells were pre-incubated with µg/ml of blf compared to the control one (p < . ) ( figure b ). at hpi, an inhibition of viral load was observed only when the cells were pre-incubated with blf (p < . ) ( figure b ). when blf was used at a concentration of µg/ml, a decrease of viral load up to hpi was observed when the viral inoculum was pre-incubated with blf compared to the control group, independently from the moi used (p < . ) ( figure c, d) . when the cells were pre-incubated with blf, a decrease of viral load up to hpi was observed compared to the control at moi . (p < . after hpi and p < . after hpi) ( figure c ), while at moi . the decrease of viral load remained statistically significant up to hpi compared to the control group (p < . ) ( figure d). when blf was added together with sars-cov- inoculum during the adsorption step a decrease of viral load up to hpi was observed compared to untreated sars-cov- infection, independently from the moi used (p < . after hpi and p < . after hpi for moi . ; p < . after and hpi for moi . ) ( figure c, d) . when the cells were pre-incubated with blf and infected with sars-cov- previously pre-incubated with blf, a decrease of viral load up to hpi was observed for moi . compared to untreated sars-cov- infection (p < . after hpi and p < . after hpi for moi . ) ( figure c ), while at moi . the decrease of viral load remains statistically significant up to hpi compared to untreated sars-cov- infection (p < . ) ( figure d ). 
computational results the molecular docking simulation suggests a potential interaction of the blf structure with the spike glycoprotein cdt domain in the up conformation (fig. a ). the first three solutions obtained by frodock clustering procedure account for more than % of the total generated (table s a , supplemental data). a detailed analysis of the interaction network reveals the presence of different interactions, which persist for more than % of the simulation time, in agreement with the high interaction energy calculated. in detail, we found salt bridges, hydrogen bonds and residue pairs involved in hydrophobic contacts (table s left side, supplemental data). to check if some of the spike residues targeted by the blf protein are involved in the binding with ace , we have compared the average structure extracted from the simulation with the ace /cdt domain complex structure (pdb id: lzg fig. ) . surprisingly, only two spike residues (gly and tyr ) are shared between the complexes interfaces (table s left side, supplemental data), as evaluated from the inspection of the superimposed structures and from the paper analysis . despite this, lf holds the same position assumed by the ace enzyme, i.e. above the up cdt domain. we performed the same analysis over the evaluated human lactoferrin (hlf)-spike complex, obtaining a binding pose superimposable to that observed for the bovine protein (fig. b) . besides supplemental data), we observed that also for the hlf only two residues (thr and tyr ) are shared between the complexes interfaces (table s right side, supplemental data). these results allow us to hypothesize that, in addition to the hspgs binding , both blf and hlf should be able to hinder the spike glycoprotein attachment to the ace receptor, consequently blocking the virus from entering into the cells. 
the current treatment approaches to covid- have so far proved to be inadequate, and a potent antiviral drug or effective vaccine is yet to be discovered and is eagerly awaited. the immediate priority is to harness innate immunity in order to accelerate early antiviral immune responses. to-moderate disease and in covid- asymptomatic patients. we focused our research on asymptomatic and mild-to-moderate covid- patients, considering them a transmission reservoir with possible evolution to the most severe disease form . li et al., analyzing the viral shedding dynamics in asymptomatic and mildly symptomatic patients infected with sars-cov- , observed long-term viral shedding, also in the convalescent phase of the disease, where specific antibody production to sars-cov- may not guarantee viral clearance after hospital discharge. in their study, the median duration of viral shedding appeared to be shorter in pre-symptomatic patients ( . days) than in asymptomatic ( days) and mild symptomatic cases ( days) . in our study, lf induced an early viral clearance just after days from the beginning of the treatment in % of patients, and after days of treatment in the rest of our liver function is known to be deranged in covid- and a meta-analysis showed that % and % of patients with covid- had alt and ast levels higher than the normal range . liver and sars-cov- were previously pre-incubated with blf ( figure d ). our experimental results indicate that blf exerts its antiviral activity either by direct attachment to taken together, these results reveal that, even if the definitive mechanism of action still has to be explored, the antiviral properties of lf are also extendable to sars-cov- virus. considering the risk of covid relapse , we also suggest additional long-term studies to evaluate the maintenance of viral clearance with lf continuous administration.
finally, due to ethical reasons, we could not include placebo arms in our study and therefore we could not evaluate properly the different disease evolution in treated and not-treated patients. however, considering the reported natural disease course we can state lf induced an early rt- pcr negative conversion and a fast clinical symptoms recovery. this study is part of the gefacovid . research program coordinated by the tor vergata university of rome. clinical trial we performed a randomized, prospective, interventional study to assess the efficacy of a liposomal blood parameters obtained at t in covid- group and control group were compared using t-test. data were then analyzed with a significant two-tailed p-value <= . . all parameters obtained at t and t in covid- group were then compared using paired t-test. in addition, the mean change between t and t was also assessed using paired t-test. normally genomic characterisation and epidemiology of novel coronavirus pattern of liver injury in adult patients with covid- : a retrospective analysis of patients clinical outcomes and adverse events in patients hospitalised with covid - , treated with off-label hydroxychloroquine and azithromycin adrenomedullin in covid- induced endotheliitis endothelial cell infection and endotheliitis in covid- pro-adrenomedullin to predict severity and outcome in community- acquired pneumonia influence of liposomes on tryptic digestion of insulin. 
ii kinetic stability and membrane structure of liposomes during in vitro infant intestinal digestion: effect of cholesterol and lactoferrin responsiveness of emulsions stabilized by lactoferrin nano- particles to simulated intestinal conditions antiviral activities of whey proteins antiviral activities of lactoferrin recombinant porcine lactoferrin expressed in the milk of transgenic mice cancer res effects of orally administered bovine lactoferrin and lactoperoxidase on influenza virus infection in mice lactoferrin inhibits hepatitis c virus viremia in patients with chronic hepatitis c: a pilot study randomized, double-blind, placebo-controlled trial of bovine lactoferrin in patients with chronic hepatitis c the clinical efficacy of a bovine lactoferrin/whey protein ig-rich fraction (lf/igf) for the common cold: a double blind randomized study effects of lactoferrin-containing formula in the prevention of enterovirus and rotavirus infection and impact on serum cytokine levels: a randomized trial viral entry mechanisms: human papillomavirus and a long journey from extracellular matrix to the nucleus herpes simplex virus: receptors and ligands for cell entry bovine lactoferrin inhibits japanese encephalitis virus by binding to heparan sulfate and receptor for low density lipoprotein inhibition of herpes simplex virus infection by lactoferrin is dependent on interference with the virus binding to glycosaminoglycans antiviral effects of milk proteins: acylation results in polyanionic d frodock . 
: fast protein- protein docking server frodock: a new approach for fast rotational protein-protein docking an overview of the amber biomolecular simulation package numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes scalable molecular dynamics with namd gromacs: high performance molecular simulations through multi- level parallelism from laptops to supercomputers mdtraj: a modern open library for the analysis of molecular the role of medium size facilities in the hpc ecosystem: the case of the new cresco cluster integrated in the eneagrid infrastructure ucsf chimera--a visualization system for exploratory research and analysis key: cord- -jxm ndw authors: karamitros, timokratis; papadopoulou, gethsimani; bousali, maria; mexias, anastasios; tsiodras, sotiris; mentis, andreas title: sars-cov- exhibits intra-host genomic plasticity and low-frequency polymorphic quasispecies date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jxm ndw in december , an outbreak of atypical pneumonia (coronavirus disease - covid- ) associated with a novel coronavirus (sars-cov- ) was reported in wuhan city, hubei province, china. the outbreak was traced to a seafood wholesale market and human to human transmission was confirmed. the rapid spread and the death toll of the new epidemic warrants immediate intervention. the intra-host genomic variability of sars-cov- plays a pivotal role in the development of effective antiviral agents and vaccines, but also in the design of accurate diagnostics. we analyzed ngs data derived from clinical samples of three chinese patients infected with sars-cov- , in order to identify small- and large-scale intra-host variations in the viral genome. we identified tens of low- or higher-frequency single nucleotide variations (snvs) with variable density across the viral genome, affecting out of protein-coding viral genes. 
the majority of these snvs corresponded to missense changes. the annotation of the identified snvs, and of all currently circulating strain variations, revealed colocalization of intra-host and strain-specific snvs with primers and probes currently used in molecular diagnostics assays. moreover, we de novo assembled the viral genome in order to isolate and validate intra-host structural variations and recombination breakpoints. the bioinformatics analysis disclosed genomic rearrangements over poly-a / poly-u regions located in orf ab and spike (s) gene, including a potential recombination hot-spot within s gene. our results highlight the intra-host genomic diversity and plasticity of sars-cov- , pointing out genomic regions that are prone to alterations. the isolated snvs and genomic rearrangements reflect the intra-patient capacity of the polymorphic quasispecies, which may arise rapidly during the outbreak, allowing immunological escape of the virus, offering resistance to anti-viral drugs and affecting the sensitivity of the molecular diagnostics assays. sars-cov- , covid- , intra-host variability, quasispecies, genomic recombination, wuhan seafood market epidemic coronaviruses (covs), considered to be the largest group of viruses, belong to the nidovirales order, coronaviridae family and coronavirinae subfamily, which is further subdivided into four genera, the alpha- and betacoronaviruses, which infect mammalian species, and the gamma- and deltacoronaviruses, which infect mainly birds [ ] , [ ] . small mammals (mice, dogs, cats) serve as reservoirs for hcovs, with significant diversity seen in bats, which are considered to be primordial hosts of hcovs [ ] . on the contrary, peridomestic animals are usually intermediate hosts, which enable long-term establishment of endemicity of the viruses, facilitating mutations and recombination events [ ] , [ ] .
until , minor consideration was given to hcovs, as they were associated with mild-to-severe disease phenotypes in immunocompetent people [ ] - [ ] . in , the severe acute respiratory syndrome (sars) outbreak began [ ] . in , after the discovery of sars-cov-related viruses in horseshoe bats (rhinolophus), palm civets were suggested as intermediate hosts, and bats as primordial hosts of the virus [ ] , [ ] . in the size of the ssrna genome of sars-cov- is , nucleotides; it encodes amino acids and is characterized by nucleotide identity of ~ % with bat sars-related-cov sl-zxc and ~ % with human sars-covs bj and tor [ ] . covs are enveloped positive-sense rna viruses, which are characterized by a very large non-segmented rna genome ( to kb length), ready to be translated [ ] , [ ] . the gene arrangement on the sars-cov- genome is: [ ] . the main difference between sars-cov- and sars-cov is in orf b, orf and spike. intra-host variability of pathogenic viruses and bacteria represents a significant barrier in the control of infectious diseases. in viral infections, this variation emerges from genomic phenomena taking place during error-prone replication, resulting in multiple circulating quasispecies of low or higher frequency [ ] , [ ] . these variants, in combination with the genetic profile of the host, can potentially influence the natural history of the infection, the viral phenotype, but also the sensitivity of molecular and serological diagnostics assays [ ] , [ ] . importantly, intra-host genomic variability leads to antigenic variability, which is of higher importance, especially for pathogens that fail to elicit long-lasting immunity in their hosts, and remains a major contributor to the complexity of vaccine design [ ] , [ ] .
to date, there are no clinically approved vaccines available for protection of the general population from sars- and mers-cov infections, as there is no effective vaccine to induce robust cell-mediated and humoral immune responses [ ] , [ ] . here, we explore intra-host genomic variants and low-frequency polymorphic quasispecies in next generation sequencing (ngs) data derived from patients infected by sars-cov- . our analyses provide insights into the intra-patient pool of viral genomes, identify the frequency levels of rare variants and highlight variable genomic regions and a potential recombination hot-spot within s gene. intra-host genomic variability is critical for the development of novel drugs and vaccines, which are urgently needed for the containment of this newly emerging epidemic. in this study we analysed ngs data derived from clinical specimens (oral swabs) from three chinese patients infected by sars-cov- (sra projects prjna and prjna ). we aligned the raw read data on reference strain mn . using bowtie [ ] , after quality check with fastqc (www.bioinformatics.bbsrc.ac.uk/projects/fastqc). the resulting alignments were visualized with the integrative genomics viewer (igv) [ ] . after removing pcr duplicates, snvs were called with a bonferroni-corrected p-value threshold of . using samtools [ ] and lofreq [ ] . variants supported by absolute read concordance (> %) were filtered out from intra-host variant frequency calculations. we annotated the variations to the reference strain using snpeff [ ] , further filtered the snv effects with snpsift [ ] and estimated the average mutation rate per gene across the viral genome using r scripts. we compared the localization of the intra-host snvs with all available snvs observed at population level up to february th (retrieved from www.gisaid.org).
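the coordinate comparisons described in the methods (intra-host snv positions against population-level snv positions, and against primer or probe target intervals) can be illustrated with a minimal python sketch. this is not the authors' r code; the function names, the 1-based inclusive interval convention and all coordinates used in any call are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def shared_positions(a: Iterable[int], b: Iterable[int]) -> List[int]:
    """Genomic positions present in both SNV sets, sorted ascending."""
    return sorted(set(a) & set(b))


def snvs_in_intervals(
    snv_positions: Iterable[int],
    intervals: Dict[str, Tuple[int, int]],
) -> Dict[str, List[int]]:
    """Map each oligo name to the SNV positions inside its target interval.

    Intervals are assumed to be 1-based and inclusive (start, end);
    oligos with no overlapping SNV are omitted from the result.
    """
    positions = set(snv_positions)
    hits: Dict[str, List[int]] = {}
    for name, (start, end) in intervals.items():
        inside = sorted(p for p in positions if start <= p <= end)
        if inside:
            hits[name] = inside
    return hits
```

in practice the snv positions would come from the variant caller output and the intervals from the published primer and probe coordinates; an oligo that appears in the returned mapping would be a candidate for closer inspection.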
we also compared all intra-host and population level snps with all primer and probe coordinates to check for potential interference with currently available molecular diagnostic assays [ ] (www.who.int/docs/default-source/coronaviruse/peiris-protocol- - - .pdf). to investigate intra-host genomic rearrangements, we performed de novo assembly of the sars-cov- genomes using spades [ ] , and the resulting contigs were analyzed with blast [ ] and confirmed by remapping of the raw reads. smaller contigs (< bp) were elongated where possible, after pair-wise realignment of the corresponding mapped reads. basic computations and visualizations were implemented in the r programming language (r version . . ), using in-house scripts. the secondary structures of the genomic regions surrounding the recombination breakpoints were predicted using rnafold [ ] . the mapping assembly of the viral genome was almost complete for all samples. the genome coverage and the average read depth across the genome were . % and . x for sample srr , . % and . x for sample srr , and . %, and . x for sample srr , respectively. the alignment statistics for all samples are summarized in suppl. table . in all cases we isolated the same snvs with - % read concordance, thus in total divergence from the reference strain (mn . ), which were excluded from the downstream analysis. for sample srr we isolated lower frequency snvs in total. of these, were present with frequencies ranging between and %, while only one was present in % of the intra-host viral population. the sequencing depth, which is also evaluated during the snv calling by the lofreq algorithm, ranged between x and x at the corresponding snv positions. the sequencing depth of sample srr at the polymorphic positions was substantially higher ( x - x), allowing the isolation of snvs with frequencies distributed between . % and %.
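the link between sequencing depth and the lowest variant frequency that can be observed can be made concrete with a small python sketch. this is a simplification under stated assumptions, not the lofreq model, which additionally weighs base and mapping qualities; the function names and the `min_alt_reads` parameter are hypothetical.

```python
def allele_frequency(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    return alt_reads / depth


def min_observable_frequency(depth: int, min_alt_reads: int) -> float:
    """Lowest frequency observable when at least `min_alt_reads`
    variant-supporting reads are required at a given depth.

    `min_alt_reads` is an illustrative requirement, not a value
    taken from the study.
    """
    return allele_frequency(min_alt_reads, depth)
```

under this simplification, requiring the same number of supporting reads at ten times the depth lowers the observable frequency tenfold, which is consistent with lower-frequency snvs being isolated in the more deeply sequenced sample.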
the depth over the polymorphic positions of sample srr was between x - x, allowing the isolation of intra-host snvs, with frequencies . % - . % (figure .a, suppl. table ). intra-host variants were distributed across out of the protein-coding genes of the viral genome, namely orf ab, s, orf a, orf , orf a, orf and n. after normalising for the gene length (variants / kb-gene-length, "v/kbgl"), the highest density was observed in the small orf ( . v/kbgl), followed by orf ( . v/kbgl), n ( . v/kbgl), s ( . v/kbgl), orf ab ( . v/kbgl), orf a ( . v/kbgl) and orf a ( . v/kbgl). interestingly, the majority of the snps corresponded to missense changes (leading to amino-acid change) compared to synonymous changes ( vs. respectively, ratio . : ) ( table ) . the average intra-host variant frequency did not differ substantially either between missense and synonymous polymorphisms (figure .c) or between their hosting genes (figure .d) . we did not detect any small-scale insertions or deletions in the samples (suppl. table ). the comparison of all snvs (intra-host and population level) with the genomic targets of the molecular diagnostics assays revealed colocalizations of three intra-host snvs and isolate-specific snvs with primers and probes currently in use. in detail, intra-host snvs colocalized with the probe of rdrp_sarsr reaction ( , the de novo assembly of the viral genomes revealed intra-host genomic rearrangements. for samples srr and srr , these large-scale structural events were systematically observed over poly-a / poly-u-rich genomic regions, located in orf ab and s genes. in all cases, similar or identical strings of nucleotides in close proximity appear to have served as seeds for homologous recombination events. all rearrangements were validated by remapping of the raw reads on the corresponding de novo assembled contigs, setting a threshold of at least supporting reads of high mapping quality (> ) in each case.
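the gene-length normalisation reported above (variants per kb of gene length, "v/kbgl") amounts to dividing each gene's variant count by its length in kilobases. a minimal python sketch follows; the function name is hypothetical, and any gene names, counts and lengths passed to it are placeholders rather than the study's data.

```python
from typing import Dict


def variants_per_kb(
    variant_counts: Dict[str, int],
    gene_lengths_nt: Dict[str, int],
) -> Dict[str, float]:
    """Variant density per gene in variants per kb of gene length."""
    density: Dict[str, float] = {}
    for gene, count in variant_counts.items():
        length_nt = gene_lengths_nt[gene]
        if length_nt <= 0:
            raise ValueError(f"non-positive length for gene {gene!r}")
        # divide by the gene length expressed in kilobases
        density[gene] = count / (length_nt / 1000.0)
    return density
```

sorting the returned mapping by value in descending order gives the kind of ranking reported in the text, where a short gene with few variants can still show the highest density.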
for sample srr we isolated three inversions/misassemblies in orf ab (suppl. figure ) and one inversion/misassembly in s gene (figure -a) . notably, we were able to validate the same inversion in s gene for sample srr as well (figure -b) . apart from inversions in orf ab supported by only reads each (not passing the validation threshold), there were no further large-scale intra-host events observed for sample srr . similarly, we identified one inversion/misassembly in sample srr that was supported by only one read. the alignment coordinates of all rearrangement-supporting contigs with respect to the reference strain are presented in table . the rapid spread and the death toll of the new sars-cov- epidemic warrant the immediate identification / development of effective antiviral agents and vaccines, but also the design of accurate diagnostics. the intra- and inter-patient variability of the viral genome plays a pivotal role in all the abovementioned efforts, since it affects the compatibility of molecular diagnostics but also impairs the effectiveness of the vaccines and the serological assays by altering the antigenicity of the virus. intra-host low-frequency variants are also the main source of resistance to anti-viral drugs. bioinformatics analysis of ngs data allows the generation of the consensus sequence of a viral genome from the majority nucleotides at each position but also the identification of non-consensus nucleotides, enabling the exploration of intra-host variability but also its consequences on intra-host viral evolution [ ] - [ ] . all samples analysed in this study were probably infected by the same viral strain since they shared the same set of consensus snvs. however, apart from intra-host snvs that were common between srr and srr , there was no other overlap observed between the low frequency variants of each sample (figure -b) .
this indicates that these variations have occurred in a rather random fashion and are not subject to selective pressures, which is also supported by the fact that the missense mutations were systematically more numerous than the synonymous mutations. on the other hand, missense substitutions are more common in loci involving pathogen resistance, indicating positive selection [ ] . the analysed viral rna might have originated from functional/packed virions, but also from unpacked viral genomes, which are unable to replicate and infect other host cells. even if a viral genome is unable to replicate independently, its abundant presence in the pool of viral quasispecies implies some functionality regarding the intra-host evolution and adaptation. for example, defective viral genomes might affect infection dynamics such as viral persistence but also the natural history of an infection [ ] - [ ] . at the same time, these variants may arise rapidly during an outbreak and can be used for tracking the transmission chains and the spatiotemporal characteristics of the epidemic [ ] - [ ] . studies involving large numbers of samples and in-vitro experiments on sars-cov- viral isolates are needed, in order to conclude whether these variations are advantageous or come with a fitness cost for the virus. snvs and quasispecies that are observed at low frequency could represent viral variations of low impact on the functionality of the genome. however, their abundance is largely affected by the population size and the epidemic characteristics. for example, a neutral substitution in a region that represents a primer target for a molecular diagnostic assay can drift to fixation rather quickly in a rapidly spreading virus, jeopardizing the sensitivity of the assay [ ] , [ ] . here, we highlight three intra-host but also two fixed variants that colocalized with primers or probes of real-time pcr diagnostics assays that are currently in use ( figure ).
since the alignment of these oligos with their genomic targets is directly linked to the performance of the corresponding diagnostic assays, the community should pay extra attention to the evaluation of these potentially emerging variations and be alerted in case redesigning of these oligos is needed. as is well documented, recombination events lead to substantial changes in genetic diversity of rna viruses [ ] , [ ] . in covs, discontinuous rna synthesis is commonly observed, resulting in high frequencies of homologous recombination [ ] , which can be up to % across the entire cov genome [ ] . for pathogenic hcovs, genomic rearrangements are frequently reported during the course of epidemic outbreaks, such as hcov-oc [ ] , and hcov-nl [ ] , sars-cov [ ] [ ] and mers-cov [ ] . we have isolated intra-host genomic rearrangements, located in poly-a and poly-u enriched palindrome regions across the sars-cov- genome (figure , suppl. figure ) . we have validated the majority of these events by visual inspection of the alignments. we conclude that these rearrangements do not represent artifacts derived from the ngs library preparation (e.g. pcr crosstalk artifacts), especially since none of the supporting reads were duplicated and, in some cases, they differed in polymorphic positions (suppl. figure ) . recombination processes involving the s gene in particular have been reported for sars- and sars-like cov but also for hcov-oc . in the case of sister species hcov-nl and hcov- e, recombination breakpoints are located near the '- and '-end of the gene [ ] [ ] . s is a trimeric protein, which is cleaved into two subunits, the globular n-terminal s and the c-terminal s [ ] . the s subunit consists of a signal peptide and the nt and receptor binding (rb) domains, with the latter sharing only % amino acid identity with other sars-related covs.
our analysis revealed that similarly to other genomic regions, the s subunit hosts many low-frequency snvs, characterized by higher density compared to the rest of the s gene sequence (figure -e) . the s subunit is highly conserved, with % identity compared to human sars-cov and two bat sars-like covs [ ] . the s subunit consists of two fusion peptides (fp, ifp), followed by two heptad repeats (hr and ) , the pretransmembrane domain (ptm), the transmembrane and the cytoplasmic domain (tm, cp) [ ] . in s gene, the same rearrangement event has taken place in two samples analyzed in this study. this observation highlights a potential recombination hot-spot in s gene. the rearrangement that was common between the two samples of this study is located in nt , of the -ncov genome, which corresponds to the ~ nt linking region between the fusion peptides fp and ifp (aa - ). examining closely the secondary structure of the rna genome around the breakpoints, we suggest a model where the palindromes '-ugguuuu- ' and '-aaaaccaa- ', have served as donor-acceptor sequences during the recombination event, since they are both exposed in the single-stranded internal loops formed in a highly structured rna pseudoknot (figure -c) . the rb domain of the s protein has been tested as a potential immunogen as it contains neutralization epitopes which appear to have a role in the induction of neutralizing antibodies [ ] , [ ] . it should be mentioned though that s protein of sars-cov is the most divergent in all strains infecting humans [ ] , [ ] , as in both c and n-terminal domains variations arise rapidly, allowing immunological escape [ ] . our findings support that apart from these variations, the n-terminal region also hosts a recombination hot-spot, which together with the rest of the observed rearrangements, indicates the genomic instability of sars-cov- over poly-a and poly-u regions. 
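the relationship between the two sequences quoted in the donor-acceptor model above can be checked with a few lines of python. this is only an illustrative sketch, not the authors' analysis (which relied on rnafold structure prediction); the helper name is hypothetical and the only inputs are the two sequences quoted in the text.

```python
# Hypothetical helper for illustration; not part of the study's pipeline.
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


seq_a = "UGGUUUU"   # first sequence quoted in the text
seq_b = "AAAACCAA"  # second sequence quoted in the text

# True when the reverse complement of seq_a occurs within seq_b,
# i.e. the two stretches could base-pair with each other.
is_complementary = reverse_complement_rna(seq_a) in seq_b
```

the reverse complement of ugguuuu is aaaacca, which is contained in aaaaccaa, so the two quoted stretches are complementary over seven bases. the sketch shows only this sequence relationship and says nothing about whether recombination actually occurred there.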
prediction of the secondary structure of the genomic region spanning the rearrangement breakpoint ( bases upstream and bases downstream). the corresponding donor-acceptor sequences, exposed in internal loops, are indicated in green bars. [figure panels: base-pair probabilities; mfe secondary structure]
hosts and sources of endemic human coronaviruses evolutionary insights into the ecology of coronaviruses coronavirus infections -more than just the common cold structural biology: structure of sars coronavirus spike receptor-binding domain complexed with receptor origin and evolution of pathogenic coronaviruses bats are natural reservoirs of sars-like coronaviruses novel coronavirus -ncov: early estimation of epidemiological parameters and epidemic predictions genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan rapid reversion of sequence polymorphisms dominates early human immunodeficiency virus type evolution assessment of phylogenetic sensitivity for reconstructing hiv- epidemiological relationships quasispecies diversity determines pathogenesis through cooperative interactions in a viral population the interferon receptor- promoter polymorphisms affect the outcome of caucasians with hb eag-negative chronic hbv infection cd + t cells mediate antibody-independent acquired immunity to pneumococcal colonization patterns of antigenic diversity and the mechanisms that maintain them a decade after sars: strategies to control emerging coronaviruses fast gapped-read alignment with bowtie variant
review with the integrative genomics viewer the sequence alignment/map format and samtools lofreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets a program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain w ; iso- ; iso- using drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, snpsift detection of novel coronavirus ( -ncov) by real-time rt-pcr spades: a new genome assembly algorithm and its applications to single-cell sequencing basic local alignment search tool the vienna rna websuite full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus the application of genomics to emerging zoonotic viral diseases unravelling the history of hepatitis b virus genotypes a and d infection using a full-genome phylogenetic and phylogeographic approach nature encyclopedia of the human genome long-term transmission of defective rna viruses in humans and aedes mosquitoes the outcome of acute hepatitis c predicted by the evolution of the viral quasispecies hcv defective genomes promote persistent infection by modulating the viral life cycle hiv- epidemic in russia: an evolutionary epidemiology analysis molecular investigation of hiv- cross-group transmissions during an outbreak among people who inject drugs an innovative study design to assess the community effect of interventions to mitigate hiv epidemics using transmission-chain phylodynamics simultaneous detection of severe acute respiratory syndrome, middle east respiratory syndrome, and related bat coronaviruses by real-time reverse transcription pcr a contaminant-free assessment of endogenous retroviral rna in human plasma the evolution and emergence of rna viruses spatiotemporal characteristics of the hiv- crf _ag/crf _ a epidemic in russia and central asia rna 
recombination in animal and plant viruses establishing a genetic recombination map for murine coronavirus strain a complementation groups molecular epidemiology of human coronavirus oc reveals evolution of different genotypes over time and recent emergence of a novel genotype due to natural recombination mosaic structure of human coronavirus nl , one thousand years of evolution evidence of the recombinant origin of a bat severe acute respiratory syndrome (sars)-like coronavirus and its implications on the direct ancestor of sars coronavirus co-circulation of three camel coronavirus species and recombination of mers-covs in saudi arabia bat-to-human: spike features determining 'host jump' of coronaviruses sars-cov, mers-cov, and beyond evaluation of serologic and antigenic relationships between middle eastern respiratory syndrome coronavirus and other coronaviruses to develop vaccine platforms for the rapid response to emerging coronaviruses severe acute respiratory syndrome coronavirus spike protein expressed by attenuated vaccinia virus protectively immunizes mice vaccines to prevent severe acute respiratory syndrome coronavirus-induced disease the recombinant n-terminal domain of spike proteins is a potential vaccine against middle east respiratory syndrome coronavirus (mers-cov) infection
key: cord- -if w n authors: bailey, adam l.; dmytrenko, oleksandr; greenberg, lina; bredemeyer, andrea l.; ma, pan; liu, jing; penna, vinay; lai, lulu; winkler, emma s.; sviben, sanja; brooks, erin; nair, ajith p.; heck, kent a.; rali, aniket s.; simpson, leo; saririan, mehrdad; hobohm, dan; stump, w. tom; fitzpatrick, james a.; xie, xuping; shi, pei-yong; hinson, j.
travis; gi, weng-tein; schmidt, constanze; leuschner, florian; lin, chieh-yu; diamond, michael s.; greenberg, michael j.; lavine, kory j. title: sars-cov- infects human engineered heart tissues and models covid- myocarditis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: if w n epidemiological studies of the covid- pandemic have revealed evidence of cardiac involvement and documented that myocardial injury and myocarditis are predictors of poor outcomes. nonetheless, little is understood regarding sars-cov- tropism within the heart and whether cardiac complications result directly from myocardial infection. here, we develop a human engineered heart tissue model and demonstrate that sars-cov- selectively infects cardiomyocytes. viral infection is dependent on expression of angiotensin-i converting enzyme (ace ) and endosomal cysteine proteases, suggesting an endosomal mechanism of cell entry. after infection with sars-cov- , engineered tissues display typical features of myocarditis, including cardiomyocyte cell death, impaired cardiac contractility, and innate immune cell activation. consistent with these findings, autopsy tissue obtained from individuals with covid- myocarditis demonstrated cardiomyocyte infection, cell death, and macrophage-predominate immune cell infiltrate. these findings establish human cardiomyocyte tropism for sars-cov- and provide an experimental platform for interrogating and mitigating cardiac complications of covid- . to explore whether human cardiomyocytes might be susceptible to sars-cov- infection, we examined the expression of angiotensin converting enzyme (ace ) within the human heart. previous studies have established that ace serves as a cell-surface receptor for sars-cov- through interactions with the spike protein in numerous human cell types , . immunostaining of human left ventricular myocardial tissue revealed evidence of ace expression in cardiomyocytes (fig. a) . 
we observed significant variation in ace expression between individual cardiomyocytes within the same myocardial specimen. ace mrna was abundantly expressed in the healthy human heart and further increased in the context of chronic heart failure (fig. b) . rna sequencing of human pediatric and adult heart failure specimens revealed robust expression of ace mrna within the human heart across the spectrum of age (fig. c) . consistent with our immunostaining findings, primary human cardiomyocytes obtained from the left ventricle and atria expressed ace mrna (fig. d) . these data are consistent with prior single cell and bulk rna sequencing analyses of human myocardium and suggest that cardiomyocytes might be permissive to sars-cov- infection . to ascertain whether human pluripotent stem cell-derived cardiomyocytes (hpsc-derived cms) can serve as an appropriate model to study cardiac sars-cov- infection, we measured ace mrna expression in hpsc-derived cms. quantitative rt-pcr revealed that hpsc- derived cms abundantly expressed ace mrna. in contrast, minimal ace mrna was detected in human dermal fibroblasts, hpsc-derived cardiac fibroblasts, or human fetal cord blood derived- macrophages (fig. s a-c) . human engineered heart tissues (ehts) self-assembled between two deformable polydimethylsiloxane (pdms) posts after mixing cells in an extracellular matrix composed of collagen and matrigel (fig. s d) . ehts composed of either hpsc-derived cms and fibroblasts or hpsc-derived cms, fibroblasts, and macrophages also expressed ace mrna (fig. s e) . immunostaining of ehts confirmed the presence of ace protein specifically in hpsc-derived cms (fig. s f) . these data suggest that hpsc-derived cms might be susceptible to sars-cov- infection and serve as a suitable experimental model to study cardiac manifestations of covid- . (fig. s g) . in contrast, two independent lines of hpsc-derived cardiomyocytes (hpsc- derived cms) were permissive to sars-cov- infection (fig. f) . 
undifferentiated hpsc lines did not demonstrate evidence of infection (fig s ) . to confirm cardiomyocyte tropism, we inoculated various combinations of hpsc-derived cms, fibroblasts, and macrophages grown in monolayer culture with wild-type sars-cov- (usa_wa / ). we analyzed tissue culture supernatants for production of infectious virus using a vero cell infection-based focus forming assay, and we measured intracellular viral rna transcript levels using rt-qpcr at days post-inoculation. these assays revealed the production of infectious virus (fig. a) and viral rna (fig. b) selectively in cultures that contained hpsc- derived cms. cultures lacking hpsc-derived cms contained viral loads that were equivalent to media-only controls. a time course of hpsc-derived cm infection showed that cardiomyocytes rapidly produced infectious virus with peak titers observed at day post-inoculation. these kinetics were closely mirrored by sars-cov- -mneongreen (fig. c) . using sars-cov- -mneongreen, we examined the relationship between viral replication and cell death using flow cytometry. although the percentage of hpsc-derived cms that were mneongreen-positive peaked at day post-inoculation, significant levels of hpsc-derived cm cell death were not observed until - days post-inoculation (fig. d) indicating that viral infection precedes hpsc-derived cm cell death. sars-cov- -infected cardiomyocytes also displayed characteristics of cytopathic effects. cellular rounding, clumping, and syncytium formation first were observed on day post-inoculation. distortion of cellular morphology was evident by day post-inoculation and cultures contained largely dead cells and debris by days - post-inoculation (fig. e) . to verify that cardiomyocytes are the primary target of sars-cov- in a simulated cardiac environment, we infected two-dimensional tissues assembled with hpsc-derived cms ( %), fibroblasts ( %), and macrophages ( %) with sars-cov- -mneongreen. 
flow cytometry performed days following infection revealed mneongreen expression only in cd -cd - tnnt + cardiomyocytes. mneongreen was not detected in cd + fibroblasts or cd + macrophages within infected two-dimensional tissues ( fig. f-g, fig. s ). these data to examine viral transcription and the host immune response to sars-cov- infection, we performed rna sequencing. cultures containing either hpsc-derived cms, fibroblasts, or macrophages were either mock-infected or inoculated with sars-cov- . we also examined two- dimensional co-culture tissues assembled with % cardiomyocytes, % fibroblasts, and % macrophages. cells and tissues were harvested on day post-inoculation. principal component analysis revealed separation between experimental groups consistent with their distinct cellular composition (fig. a) . classification of transcript types demonstrated that infected hpsc-derived cms and two-dimensional tissues comprised of hpsc-derived cms and fibroblasts or hpsc- derived cms, fibroblasts, and macrophages contained abundant viral transcripts (fig. s a) . we then assessed the expression of specific viral transcripts by aligning the rna sequencing data to the sars-cov- genome and transcriptome. subgenomic rnas were identified based on the presence of ' leader sequences . we observed robust expression of most sars-cov- genomic and subgenomic rnas in infected hpsc-derived cms and two-dimensional tissues with the exception of orf b (fig. b, fig. s b) . to facilitate differential expression analysis of host genes, we censored viral rnas from the rna sequencing computational model. this was necessary given the asymmetric prevalence of viral transcripts across samples. we identified numerous host genes that were differentially regulated upon sars-cov- infection in each of the examined cell types and two-dimensional tissues (fig. c) . 
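the reason for censoring viral rnas before host differential expression can be illustrated with a small normalization sketch. this is not the authors' pipeline (the source names limma and related r/bioconductor tools); it is a minimal python illustration, with hypothetical gene names and made-up counts, of how abundant viral reads inflate the library size and depress every host gene's counts per million in infected samples only.

```python
def cpm_without_viral(counts, viral_genes):
    """Counts per million computed after dropping viral genes.

    counts      -- dict mapping gene name to raw read count (one sample)
    viral_genes -- set of gene names to censor (hypothetical labels here)
    """
    host = {g: c for g, c in counts.items() if g not in viral_genes}
    library_size = sum(host.values())
    if library_size == 0:
        raise ValueError("no host reads left after censoring")
    return {g: c * 1_000_000 / library_size for g, c in host.items()}


# Toy infected sample: 90% of reads are viral (invented numbers).
sample = {"TNNT2": 300, "MYH7": 100, "ACTC1": 600, "viral_N": 9000}

host_cpm = cpm_without_viral(sample, {"viral_N"})

# Naive CPM keeps viral reads in the denominator, so the same host gene
# appears ten-fold lower purely because of viral load.
naive_cpm_tnnt2 = sample["TNNT2"] * 1_000_000 / sum(sample.values())

print(host_cpm["TNNT2"], naive_cpm_tnnt2)
```

in this toy case the host gene is unchanged in raw counts, yet the naive normalization would report a ten-fold drop; removing viral features before normalization avoids attributing that artifact to host gene regulation.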
conditions that supported viral replication (hpsc-derived cms and two- dimensional tissues) displayed the greatest overlap in differentially expressed genes. cell types that did not support viral replication (fibroblasts and macrophages) also demonstrated numerous differentially expressed host genes, indicating that sars-cov- might elicit changes in host gene expression in the absence of direct viral infection. notably, host genes differentially expressed in fibroblasts and macrophages exposed to sars-cov- were largely distinct (fig. d) . these findings suggest that elements within or on the surface of sars-cov- virions may serve as pathogen-associated molecular patterns (pamps) and stimulate distinct gene expression programs in differing cell types. go pathway analysis revealed that infected hpsc-derived cms and two-dimensional co- culture tissues showed upregulation of genes associated with immune cell activation, stress- induced transcription, and responses to pathogens including viruses. genes associated with metabolism, oxidative phosphorylation, and mitochondrial function were downregulated by infection. two-dimensional tissues displayed alterations in other pathways including upregulation of cellular responses to cytokines and downregulation of genes involved in muscle contraction ( fig. e-f) . host genes differentially expressed in macrophages and fibroblasts were associated with pathways involved in innate immune cell activation, migration, and cytokine responses (fig. g-h). examination of specific genes differentially regulated in infected hpsc-derived cms and two-dimensional tissues (fig. i ) revealed marked reduction in components of the electron transport chain (atp synthase, mitochondrial cytochrome c oxidase, nadph dehydrogenase) and key upstream metabolic regulators (glycerol- -phosphate dehydrogenase, pyruvate dehydrogenase, succinate dehydrogenase complex). 
pdk , an inhibitor of pyruvate dehydrogenase was upregulated in infected hpsc-derived cms and two-dimensional tissues. we also observed marked downregulation of numerous components and regulators of the contractile apparatus including cardiac actin, troponin subunits, myosin light and heavy chains, desmin, phospholamban, and calsequestrin in infected two-dimensional tissues. infected hpsc-derived cms displayed similar changes, albeit to a lesser extent. ace expression was diminished in infected cardiomyocytes and two-dimensional tissues. infected hpsc-derived cms and two-dimensional tissues also displayed upregulation of key regulators of innate immunity (fig. i) . type i interferon (ifn) activation was apparent by the increased expression of ifnb and numerous ifn stimulated genes including ifit , ifit , ifit , isg , mx , and oas . stress response programs (fos) and cytokine expression (tnf) were similarly upregulated in these cell types. consistent with a greater innate immune response in two-dimensional tissues, we found that several chemokines (ccl , ccl , ccl , ccl , and cxcl ) and cytokines (il b, il , and csf ) were selectively upregulated in infected two- dimensional tissues. macrophages and fibroblasts contributed to enhanced chemokine and cytokine responses in two-dimensional tissues. ccl , ccl , and ccl were selectively expressed in infected macrophages and csf , cxcl , il b, and il were induced in infected fibroblasts (fig. s c) . inhibitor of the sars-cov- rna-dependent rna polymerase - ( fig. a-b) . after binding to ace , the sars-cov (and sars-cov- ) spike protein must undergo proteolytic activation to initiate membrane fusion . host proteases located at the plasma membrane (tmprss ) or within endosomes (cathepsins) most commonly perform this function. the relative contributions of each of these protease families to sars-cov- infection varies by cell-type , . 
rna sequencing data revealed that hpsc-derived cms express robust levels of ace and multiple endosomal proteases including cathepsins and calpains (fig. c) . ace mrna was not abundantly expressed in either macrophages or fibroblasts. while tmprss expression was present at the lower limit of detection for rnaseq, we detected low, but measurable levels of tmprss by rt-qpcr in hpsc-derived cms, but not in fibroblasts or macrophages ( fig. c-d) . to determine whether sars-cov- enters cardiomyocytes through an endosomal or plasma membrane route, we inoculated hpsc-derived cms with sars-cov- -mneongreen and administered either the endosomal cysteine protease inhibitor e- , which blocks cathepsins, or the serine protease inhibitor camostat mesylate, which blocks tmprss (and possibly tmprss ) . notably, e- abolished sars-cov- infection of hpsc-derived cms as demonstrated by reduced mneongreen expression and viral rna within the supernatant (fig. e-f). in contrast, camostat had no effect on cardiomyocyte infection over a range of doses ( fig. g-h). thus, sars-cov- enters cardiomyocytes through an endosomal pathway that requires cathepsin but not tmprss -mediated cleavage. myocarditis is characterized by direct viral infection of cardiomyocytes and accumulation of immune cells at sites of active infection or tissue injury , . to examine whether sars-cov- infection of cardiomyocytes in a three-dimensional environment mimics aspects of viral myocarditis, we generated ehts containing either hpsc-derived cms and fibroblasts or hpsc- derived cms, fibroblasts, and macrophages. ehts were seeded in a collagen-matrigel matrix between two pdms posts, infected with sars-cov- , and harvested days after inoculation. hematoxylin and eosin (h&e) staining revealed evidence of tissue injury and increased interstitial cell abundance within the periphery of sars-cov- -infected ehts (fig. a) . 
immunostaining for the viral nucleocapsid protein demonstrated evidence of prominent infection at the periphery of the eht. nucleocapsid staining was localized within hpsc-derived cms. staining for cd demonstrated macrophage accumulation corresponding to sites of interstitial cell accumulation and viral infection (fig. b, fig. s ). enrichment of nucleocapsid staining at the periphery of the tissue suggests that viral diffusion might be limited by the three-dimensional eht environment. consistent with our immunostaining results, infected ehts (with and without macrophages) accumulated high levels of viral rna, as detected by quantitative rt-pcr (fig. c) . in situ hybridization for viral spike sense and antisense rna was also indicative of viral replication within ehts (fig. d, fig. s ) . ehts consisting of hpsc-derived cms and fibroblasts were assembled and allowed to mature for days prior to infection. ehts were inoculated with sars-cov- , and contractile function was analyzed daily. from days to post infection, the average maximal displacement generated during beating did not differ between the mock and sars-cov- -infected tissues. however, on days to post infection, the sars-cov- inoculated tissues showed reduced contraction relative to the mock-infected tissues ( fig. e-f ). on day after inoculation, the maximal displacement produced during contraction by the sars-cov- inoculated tissues was markedly lower than mock infected-tissues. moreover, the tissues show reduced speed of contraction and relaxation, consistent with systolic dysfunction (fig. g) . to examine whether cardiomyocyte cell death might serve as a mechanism explaining reduced eht contractility on days to post inoculation, we performed tunel staining. consistent with the temporal course of sars-cov- cardiomyocyte infection and cell death in our two-dimensional hpsc-cm cultures (fig. 
d) , we observed increased numbers of tunel positive cardiomyocytes in sars-cov- infected ehts on day post infection ( fig. a-b) . our rna sequencing data suggest that other mechanisms also may contribute to reduced eht contractility, including decreased expression of genes important for sarcomere function and metabolism as well as activation of host immune responses (fig. i) . consistent with the possibility that disrupted sarcomere gene expression might contribute to reduced eht contractility, immunostaining of hpsc-derived cms infected with sars-cov- revealed evidence of sarcomere loss days following infection (fig. c) , a time point that preceded cell death. furthermore, immunostaining of ehts demonstrated loss of troponin t expression in infected cardiomyocytes. (fig. d-e) . thus, the reduction in contractile function may be multifactorial with contributions from virus-induced cardiomyocyte cell death and loss of sarcomere elements. we then examined the mechanistic relationship between cardiomyocyte infection, inflammatory signaling, sarcomere breakdown, and cell death. inhibition of viral entry (ace neutralizing antibody) or viral replication (remdesivir) was sufficient to prevent type i ifn and tnf expression following sars-cov- infection ( fig. f-g) . remdesivir similarly reduced inflammatory gene expression in d ehts ( fig. s a-b) . these data establish that viral infection represents the upstream driver of inflammation in our model system. to examine the impact of cardiomyocyte inflammatory signaling on cardiomyocyte cell death, sarcomere gene expression, and sarcomere structure, we focused on inhibiting viral nucleic acid sensing. tbk (tank-binding kinase ) is an essential mediator of numerous nucleic acid sensing pathways including rig-i, mavs, sting, and tlrs , . 
inhibition of tbk activity was sufficient to reduce type i ifn activity (primary inflammatory signature identified in infected cardiomyocytes, fig i) without impacting viral load or cardiomyocyte infectivity ( fig. f-h) . inhibition of tbk activity during sars-cov- cardiomyocyte infection had no impact on cardiomyocyte cell death (fig. i) . while tbk inhibition prevented reductions in tnnt and myh mrna expression following cardiomyocyte sars-cov- infection, sarcomere breakdown remained prevalent in infected cardiomyocytes treated with the tbk inhibitor. in contrast, remdesivir prevented both reductions in tnnt and myh mrna expression and sarcomere loss following sars-cov- infection ( fig. j-k, fig. s c-d) . these data indicate that sars-cov- elicits an inflammatory response in cardiomyocytes that is at least partially dependent on viral nucleic acid sensing and tbk signaling. however, tbk -dependent cardiomyocyte inflammation does not appear responsible for sarcomeric disassembly or cardiomyocyte cell death. these findings do not rule out the possibility that other inflammatory pathways or cross- talk between infected cardiomyocytes and immune cells contributes to reduced eht contractility. to validate the myocarditis phenotype generated by sars-cov- infection of the eht model, we obtained autopsy and endomyocardial biopsy specimens from four subjects with confirmed sars-cov- infection and clinical diagnoses of myocarditis. evidence of myocardial injury (elevated troponin) and left ventricular systolic dysfunction were present in each case (table ) were noted accompanied by a mixed mononuclear cell infiltrate (fig. a) . these changes are distinct from postmortem autolytic changes. examination of the coronary arteries from the covid- myocarditis autopsy cases demonstrated non-obstructive mild atherosclerotic changes, consistent with the angiogram findings. there was no evidence of microvascular injury or thromboembolic events. 
two autopsy heart samples from subjects with metastatic carcinoma and an inherited neurodegenerative disease with similar tissue procurement times were included as negative controls. rna in situ hybridization for sars-cov- spike and nucleocapsid genes revealed evidence of viral rna within the myocardium of each covid- myocarditis subject. viral transcripts were located in cytoplasmic and perinuclear locations within cells that were morphologically consistent with cardiomyocytes (fig. b, fig. s a-b). viral transcripts also were identified in airway epithelial cells within the lung of this subject and other myocardial cell types including perivascular adipocytes and pericytes (fig. s c). immunostaining for the nucleocapsid protein further demonstrated the presence of viral protein in cardiomyocytes (fig. c). the covid- myocarditis immune cell infiltrate was characterized by accumulation of a mixed population of ccr and ccr + macrophages within injured areas of the myocardium (fig. d). minimal evidence of t-cell infiltration was noted (fig. e). macrophage abundance was highest in areas that demonstrated evidence of cardiomyocyte injury as depicted by complement deposition (c d staining), a pathological marker of cardiomyocyte cell death (fig. s d). together, these observations provide initial pathological evidence that sars-cov- infects the human heart and may contribute to cardiomyocyte cell death and myocardial inflammation that is distinct from lymphocytic myocarditis. ( mg/ml), % fbs, % non-essential amino acids, % glutamax supplement, and % pen-strep. all drug compounds were purchased from selleckchem (ruxolitinib, catalog number s ; mrt , catalog number s ; e , catalog number s ; camostat, catalog number s ) and resuspended to a stock concentration of μm in pbs or dmso (depending on the solubility profile), then diluted to working concentration in culture media (described above) and sterile-filtered. 
immunostaining was performed as previously described with a few modifications. briefly, cardiomyocytes were fixed for minutes in % formaldehyde in phosphate buffered saline (pbs). cells were then permeabilized with . % triton x- for minutes at room temperature. the cells were blocked for hour using a blocking solution containing % bovine serum albumin, % donkey serum, . % triton x- , and . % sodium azide in pbs. primary antibodies (rabbit anti troponin t, : , abcam, ab ) were added for - hours at room temperature or overnight at °c. cells were then washed with pbs before incubating for hour in secondary antibody (cy donkey anti-rabbit, jackson immunoresearch, ). ′, -diamidino- -phenylindole (dapi) was used at a : dilution to stain for nuclei. cells were visualized using a nikon a rsi confocal microscope (washington university center for cellular imaging). z-stacks of cells with x magnification were recorded in sequential scanning mode. images were processed in imagej and z-stacks were converted to standard deviation projections. to test for changes in expression of the log fold-changes reported by limma in each term versus the background log fold-changes of all genes found outside the respective term. the r/bioconductor package heatmap was used to display heatmaps across groups of samples for each go or msigdb term with a benjamini-hochberg false-discovery rate adjusted p-value less than or equal to . . perturbed kegg pathways where the observed log fold-changes of genes within the term were significantly perturbed in a single direction versus background or in any direction compared to other genes within a given term with p-values less than or equal to . were rendered as annotated kegg graphs with the r/bioconductor package . to find the most critical genes, the raw counts were variance stabilized with the r/bioconductor package deseq and then analyzed via weighted gene correlation network analysis with the r/bioconductor package wgcna . 
briefly, all genes were correlated with each other by pearson correlation and clustered by expression similarity into unsigned modules using a power threshold empirically determined from the data. an eigengene was created for each de novo cluster and its expression profile was then correlated across all coefficients of the model matrix. because these clusters of genes were created by expression profile rather than known functional similarity, the clustered modules were given the names of random colors; grey is the only module with a pre-existing definition, namely that it contains genes that do not cluster well with others. these de novo clustered genes were then tested for functional enrichment of known go terms with hypergeometric tests available in the r/bioconductor package clusterprofiler . significant terms with benjamini-hochberg adjusted p-values less than . were then collapsed by similarity into clusterprofiler category network plots to display the most significant terms for each module of hub genes in order to infer the function of each significant module. the information for all clustered genes for each module was combined with their respective statistical significance results from limma to identify differentially expressed genes. figure : ace is expressed in the human heart and in stem cell derived cardiomyocytes. 
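the module enrichment step described in the methods (hypergeometric tests on go terms followed by benjamini-hochberg adjustment) can be sketched in a few lines. the source performs this with the r/bioconductor package clusterprofiler; the python below is only a minimal standard-library illustration of the two underlying calculations, and the function names, gene counts, and p-values are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, in_term, module, overlap):
    """One-sided enrichment p-value P(X >= overlap).

    universe -- number of genes in the background
    in_term  -- background genes annotated to the term
    module   -- number of genes in the module being tested
    overlap  -- module genes annotated to the term
    """
    upper = min(in_term, module)
    total = comb(universe, module)
    tail = sum(
        comb(in_term, k) * comb(universe - in_term, module - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values are monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 10 background genes, 4 annotated to a term,
# a module of 3 genes, all 3 annotated (invented numbers).
p_term = hypergeom_enrichment_p(universe=10, in_term=4, module=3, overlap=3)

# Adjust a small set of hypothetical term-level p-values.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_term, adj)
```

terms would then be retained when their adjusted p-value falls below the chosen cutoff; the cutoff value itself is not recoverable from the text above, so none is hard-coded here.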
the authors would like to acknowledge funding support from the national institutes of health. a, immunohistochemistry of human heart tissue showing ace (red) expression in cardiomyocytes (green, sarcomeric actin). representative images from analyzed specimens. b, rna sequencing demonstrating ace mrna expression in myocardial biopsies obtained from adult controls and heart failure patients. data is displayed as counts per million (cpm). n= pediatric, n= adult. each data point indicates an individual sample. n= controls, n= heart failure. c, rna sequencing demonstrating ace mrna expression in adult and pediatric heart tissue. 
data is displayed as counts per million (cpm insets are high magnification images of the boxed areas. representative images from independent samples. b, immunostaining of mock or sars-cov- infected three-dimensional ehts for sarcomeric actin (cardiomyocytes, red), cd (macrophages, green), and nucleocapsid protein (white). ehts were harvested days after inoculation. blue: dapi. images are representative of independent experiments. representative images from independent samples. c, quantitative rt-pcr of sars-cov- n gene expression in ehts consisting of hpsc-derived cardiomyocytes (cm) and fibroblasts (fb) or hpsc-derived cardiomyocytes, fibroblasts, and macrophages. ehts were either mock infected or inoculated with sars-cov- (moi . ) and harvested days after inoculation. each data point represents individual samples/experiments. error bars denote standard error of the mean. bar height represents sample mean. dotted line: limit of detection. *p< . compared to uninfected control (mock, mann-whitney test). d, in situ hybridization for sars-cov- orf ab rna sense and anti-sense strands (red) in ehts days after mock or sars-cov- infection (moi . ). hematoxylin: blue. representative images from independent specimens. insets are high magnification images of the boxed areas. e, representative spontaneous beating displacement traces for an infected and an uninfected eht on day post-infection. videos used to generate these traces can be found in supplemental videos and . f, displacement (relative to uninfected mock condition) generated by spontaneous beating of ehts as a function of time following inoculation with sars-cov- (moi . ). each data point represents a mean value from - independent samples ( independent experiments), error bars denote standard error of the mean. g, quantification of absolute displacement (left) and contraction speed (right) generated by spontaneous beating of ehts days following mock or sars-cov- infection (moi . ). 
each data point denotes an individual eht, bar height corresponds to mean displacement, error bars represent standard error of the mean, *p< . compared to mock (mann-whitney test). figure . mechanisms of reduced eht contractility. a, combined immunostaining for cardiomyocytes (cardiac actin, green) and tunel staining (red) of ehts (cm+fb+mac) days after mock or sars-cov- infection (moi . ). dapi: blue. representative images from independent experiments. b, quantification of cell death (percent of tunel-positive cells) in areas of viral infection. each data point denotes an individual eht, bar height corresponds to the mean, error bars represent standard error of the mean, *p< . compared to mock (mann-whitney test). c, immunostaining of hpsc-derived cardiomyocytes for troponin t (red) days after inoculation with mock control or sars-cov- -neongreen (moi . ). blue: dapi. arrows denote areas of sarcomere disassembly. d, immunostaining of ehts for troponin t (red) and sars-cov- nucleocapsid (green) days after inoculation with mock control or sars-cov- -neongreen (moi . ). blue: dapi. arrows denote sars-cov- nucleocapsid positive cells with reduced troponin t staining. e, quantification of troponin t staining in mock (white) and sars-cov- (red) infected ehts. np: nucleocapsid. data is presented as mean fluorescence intensity (mfi). mfi was measured in infected (np+) cardiomyocytes and uninfected (np-) cardiomyocytes located proximal or remote to areas of infection. each data point denotes an individual eht, bar height corresponds to the mean, error bars represent standard error of the mean, *p< . compared to mock (mann-whitney test). f, quantitative rt-pcr measuring oas , mx , and tnf mrna expression in hpsc-derived cardiomyocytes days after inoculation with mock control (white) or sars-cov- (green, moi . ). cells were treated with vehicle, ace antibody (ace ab) ( µg/ml), remdesivir ( µm), or tbk inhibitor (mrt , µm). 
each data point denotes a biologically unique sample, bar height corresponds to the mean, and error bars indicate standard error of the mean. *p< . compared to mock control. g, quantitative rt-pcr of sars-cov- n gene expression in hpsc-derived cardiomyocytes that were either mock infected (white) or inoculated with sars-cov- (green, moi . ) and harvested days after inoculation. cells were treated with vehicle, ace ab ( µg/ml), remdesivir ( µm), or tbk inhibitor (mrt , µm). each data point represents an individual sample. error bars denote standard error of the mean. bar height represents sample mean. dotted line: limit of detection. *p< . compared to uninfected control. ***p< . compared to uninfected control and vehicle infected (mock, mann-whitney test). h-i, flow cytometry measuring the percent of infected (h) and viable (i) hpsc-derived cardiomyocytes following either mock infection (white) or inoculation with sars-cov- (green, moi . ). cells were harvested and analyzed days after inoculation. cells were treated with vehicle, remdesivir ( µm), or tbk inhibitor (mrt , µm). each data point represents an individual sample. error bars denote standard error of the mean. bar height represents sample mean. *p< . compared to uninfected control. **p< . compared to vehicle infected (mock, mann-whitney test). j, quantitative rt-pcr measuring tnnt mrna expression in hpsc-derived cardiomyocytes days after inoculation with mock control (white) or sars-cov- (green, moi . ). cells were treated with vehicle, ace ab ( µg/ml), remdesivir ( µm), or tbk inhibitor (mrt , µm). each data point denotes a biologically unique sample, bar height corresponds to the mean, and error bars indicate standard error of the mean. *p< . compared to mock control. k, immunostaining of hpsc-derived cardiomyocytes for troponin t (red) days after inoculation with mock control or sars-cov- -neongreen (moi . ).
hpsc-derived cardiomyocytes were treated with vehicle, remdesivir ( µm) or tbk inhibitor (mrt , µm). blue: dapi. arrows denote areas of sarcomere disassembly. merged images can be found in fig. s . figure . human autopsy and endomyocardial tissue from patients with suspected covid- myocarditis show evidence of sars-cov- cardiomyocyte infection. a, hematoxylin and eosin staining of cardiac autopsy (anterior left ventricular wall) and biopsy samples (right ventricular septum) from subjects without covid- (control case) and patients with a clinical diagnosis of covid- myocarditis (case - ). b, in situ hybridization of cardiac autopsy and biopsy tissue for sars-cov- spike and nucleocapsid rna (red) showing evidence of viral infection. hematoxylin: blue. arrows denote viral rna staining in cells with cardiomyocyte morphology. c, immunostaining of control and covid- myocarditis cardiac autopsy tissue for sars-cov- nucleocapsid (white) and cardiac actin (red). dapi: blue. arrows denote nucleocapsid staining in cardiomyocytes. d, immunostaining of control and covid- myocarditis cardiac autopsy and biopsy tissue for cd (green) and ccr (red). dapi: blue. e, immunostaining of control and covid- myocarditis cardiac autopsy and biopsy tissue for cd (brown). hematoxylin: blue. key: cord- - w ap gz authors: guo, hua; hu, bing-jie; yang, xing-lou; zeng, lei-ping; li, bei; ouyang, song-ying; shi, zheng-li title: evolutionary arms race between virus and host drives genetic diversity in bat sars related coronavirus spike genes date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w ap gz the chinese horseshoe bat (rhinolophus sinicus), reservoir host of severe acute respiratory syndrome coronavirus (sars-cov), carries many bat sars-related covs (sarsr-covs) with high genetic diversity, particularly in the spike gene. despite these variations, some bat sarsr-covs can utilize the orthologs of human sars-cov receptor, angiotensin-converting enzyme (ace ), for entry.
it is speculated that the interaction between bat ace and sarsr-cov spike proteins drives diversity. here, we have identified a series of r. sinicus ace variants with some polymorphic sites involved in the interaction with the sars-cov spike protein. pseudoviruses or sarsr-covs carrying different spike proteins showed different infection efficiency in cells transiently expressing bat ace variants. consistent results were observed by binding affinity assays between sars- and sarsr-cov spike proteins and receptor molecules from bats and humans. all tested bat sarsr-cov spike proteins had a higher binding affinity to human ace than to bat ace , although they showed a -fold lower binding affinity to human ace compared with their sars-cov counterpart. structure modeling revealed that the difference in binding affinity between spike and ace might be caused by the alteration of some key residues in the interface of these two molecules. molecular evolution analysis indicates that these residues were under strong positive selection. these results suggest that the sarsr-cov spike protein and r. sinicus ace may have coevolved over time and experienced selection pressure from each other, triggering the evolutionary arms race dynamics. it further proves that r. sinicus is the natural host of sarsr-covs. importance evolutionary arms race dynamics shape the diversity of viruses and their receptors. identification of key residues which are involved in interspecies transmission is important to predict potential pathogen spillover from wildlife to humans. previously, we have identified genetically diverse sarsr-cov in chinese horseshoe bats. here, we show the highly polymorphic ace in chinese horseshoe bat populations. these ace variants support sars- and sarsr-cov infection but with different binding affinity to different spike proteins. the higher binding affinity of sarsr-cov spike to human ace suggests that these viruses have the capacity of spillover to humans. 
the positive selection of residues at the interface between ace and sarsr-cov spike protein suggests a long-term and ongoing coevolutionary dynamics between them. continued surveillance of this group of viruses in bats is necessary for the prevention of the next sars-like disease. the first and essential step of virus infection is cell receptor recognition. the entry of the coronavirus is mediated by specific interactions between the viral s protein and cell surface receptor, followed by fusion between the viral and host membrane. the coronavirus s protein is functionally divided into two subunits, a cell attachment subunit (s ) and a membrane-fusion subunit (s ). the s region contains an n-terminal domain (ntd) and a c-terminal domain (ctd); both can be used for coronavirus receptor binding (rbd) ( ) . for sars-cov, its s -ctd serves as an rbd for binding to the cellular receptor, angiotensin-converting enzyme (ace ) ( ). biochemical and crystal structure analyses have identified a few key residues in the interface between the sars-cov s-rbd and human ace ( - ). have a smaller s protein, due to , , or amino acid deletions ( , ) . despite the variations in the rbd, all clade strains can use ace for cell entry, whereas clade strains, with deletions cannot ( , , ) . these results suggest that members of clade are likely to be the direct source of sars-cov in terms of genome similarity and ace usage. samples from three provinces (hubei, guangdong, and yunnan) were used for ace amplification, based on the prevalence of bat sarsr-covs and tissue sample availability and quality. in addition to previously sequenced bat ace by our group (sample id , , and , collected from hubei, guangxi, and yunnan, respectively) and others (genbank accession no. act ; sample collected from hong kong), we obtained ace gene sequences from r. sinicus bat individuals: five from hubei, nine from guangdong, and seven from yunnan. 
the ace sequences exhibited - % amino acid (aa) identity within their species and - % aa identity with human ace (table s ). major variations were observed at the n-terminal region, including in some residues which were previously identified to be in contact with sars-cov s-rbd (fig. a and fig. s ). analysis based on nonsynonymous snps helped identify eight residues, including , , , , , , , and . the combination of these residues produced eight alleles, including riesedyk, liefenyq, rtesenyq, riksedyq, qiksedyq, rmtsedyq, emktkdhq, and eiktkdhq, named allele - , respectively (fig. a). in addition to the ace genotype data from previous studies (allele , , and ), five novel alleles were identified in the r. sinicus populations in this study. alleles and were found in two and three provinces, respectively, whereas the other alleles seemed to be geographically restricted. in summary, three alleles ( , , and ) were found in guangdong, four ( , , , and ) in yunnan, three ( , , and ) in hubei, and one each in guangxi and hong kong. coexistence of four alleles was found in the same bat cave of yunnan where the direct progenitor of sars-cov was found (fig. b). taken together, these data suggest that ace variants have been circulating within the r. similar to our previous report, all four bat sarsr-cov strains with the same genomic background but different s proteins could use human ace and replicate at similar levels ( ). however, there are some differences in how they utilize r. sinicus ace s (fig. and fig. s ). all tested viruses could efficiently use alleles , , , for entry. rswiv and rswiv , which share an identical rbd, could not use allele (sample id ) from guangdong. rs and rsshc , which share an identical rbd, could not use allele (sample id ) and (sample id ) from yunnan and guangdong, respectively. sars-cov-bj , which shares high similarity with wiv and wiv rbd, was able to use the same bat ace alleles as rs and rsshc in the pseudotyped infection assay ( fig.
and fig. s ). these results indicate that cell entry was affected by both spike rbd and r. sinicus ace variants. ace (allele ) was found to bind rsshc and bj but not rswiv rbd; ace (allele ) was found to bind rswiv but not rsshc and bj rbd; ace (allele ) was found to bind all tested rbds. all tested rbds had a high binding affinity to human or bat ace . bj rbd had a higher binding affinity for human ace than did rswiv and rsshc rbds (fig. a , e, and i); however, it had a lower binding affinity to bat ace than the two bat sarsr-cov rbds ( the four tested spike proteins of bat sarsr-cov are identical in size and share over % aa identity with sars-cov, which suggests that these proteins have a similar structure. in this study, we built structural complex models of bat sarsr-cov-rswiv rbd with r. sinicus ace (allele ) and rsshc rbd with r. sinicus ace (allele ), in concordance with the results of the binding affinity assay between sars-cov rbd and human ace (fig. ) . compared with the contact residues in the interface between sars-cov rbd and in r. sinicus ace - , we found a threonine at , unlike human ace , which has a lysine at this position (fig. ) . therefore, both rswiv and rsshc rbd had a lower binding affinity to human ace than did bj , but they both showed a higher binding affinity with table ) , of those were found to be located on the rbd region, which faces its receptor ace , according to the crystal structure (fig. s ) . moreover, five of those ( , , , , and ), present in the sars-cov spike, have been previously identified to have a significant impact on binding affinity to human ace (fig. ( , , , , , , , and ) correspond to the residues in human ace , which were previously identified to be involved in direct contact with the human sars-cov spike protein (fig. , fig. s ) ( ). we also analyzed the ace gene of rhinolophus affinis (r. affinis), which has been reported to carry sarsr-cov occasionally ( ). used an alignment of ace gene sequences from r. 
affinis obtained in this study, we found that r. affinis ace was more conserved between different individuals in the entire coding region than r. sinicus ace (fig. s ) and no obvious positive selection sites were observed (data genes with important functions usually display a dn/ds ratio of less than (negative selection) because most amino acid alterations in a protein are deleterious. in a host-virus arms race situation, the genes involved tend to display dn/ds ratios codon-based analysis of molecular evolution bat ace and sarsr-cov spike sequences were analyzed for positive selection. in this study, bat ace sequences were either amplified or downloaded from ncbi and sarsr-cov spike sequences were downloaded from ncbi; the database accession numbers are listed in table s . sequences were aligned in clustal x. phylogenetic table s . discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china identification of a novel coronavirus in patients with severe acute respiratory syndrome a novel coronavirus associated with severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia a pneumonia outbreak associated with a new coronavirus of probable bat origin isolation and characterization of viruses related to the sars coronavirus from animals in southern china severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats bats are natural reservoirs of sars-like coronaviruses intraspecies diversity of sars-like coronaviruses in rhinolophus sinicus and its implications for the origin of sars coronaviruses in humans sars-coronavirus ancestor's foot-prints in south-east asian bat colonies and the refuge 
theory genomic characterization of severe acute respiratory syndrome-related coronavirus in european bats and classification of coronaviruses based on partial rna-dependent rna polymerase gene sequences isolation and characterization of a bat sars-like coronavirus that uses the ace receptor identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in china isolation and characterization of a novel bat coronavirus closely related to the direct progenitor of severe acute respiratory syndrome coronavirus discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus diversity of coronavirus in bats from eastern thailand origin and evolution of pathogenic coronaviruses rules of engagement: molecular insights from host-virus arms races cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human receptor and viral determinants of sars-coronavirus adaptation to human ace conformational states of the severe acute respiratory syndrome coronavirus spike protein ectodomain angiotensin-converting enzyme is an essential regulator of heart function evidence for ace -utilizing coronaviruses (covs) related to severe acute respiratory syndrome cov in bats angiotensin-converting enzyme (ace ) proteins of different bat species confer variable susceptibility to sars-cov entry identification of key amino acid residues required for horseshoe bat angiotensin-i converting enzyme to function as a receptor for severe acute respiratory syndrome coronavirus mega : molecular evolutionary genetics analysis version . 
receptor adaptation by severe acute respiratory syndrome coronavirus receptor recognition and cross-species infections of sars coronavirus paml : phylogenetic analysis by maximum likelihood structure of sars coronavirus spike receptor-binding domain complexed with receptor two-stepping through time: mammals and viruses molecular evolution of the sars coronavirus during the course of the sars epidemic in china mutation of a single residue renders human tetherin resistant to hiv- vpu-mediated depletion two loci controlling genetic cellular resistance to avian leukosis-sarcoma viruses a glycan shield on chimpanzee cd protects against infection by primate lentiviruses cd receptor diversity in chimpanzees protects against siv infection dual host-virus arms races shape an essential housekeeping protein parasite transmission modes and the evolution of virulence coevolution of host and pathogen identification of a cellular receptor for subgroup e avian leukosis virus human genes that limit aids persistent infection promotes cross-species transmissibility of mouse hepatitis virus a mouse-adapted sars-coronavirus causes disease and mortality in balb/c mice amino acid substitutions in the s subunit of mouse hepatitis virus variant v encode determinants of host range expansion mechanisms of zoonotic severe acute respiratory syndrome coronavirus host range expansion in human airway epithelium mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus adaptive evolution of mers-cov to species variation in dpp filovirus receptor npc contributes to species-specific patterns of ebolavirus susceptibility in bats human adaptation of ebola virus during the west african outbreak ebola virus glycoprotein with increased infectivity dominated the characterization of a filovirus (mengla virus) from rousettus bats in china trilogy of ace : a peptidase in the renin-angiotensin system, a sars receptor angiotensin-converting enzyme is an essential regulator of 
heart function retroviruses pseudotyped with the severe acute respiratory syndrome coronavirus spike protein efficiently infect cells expressing angiotensin-converting enzyme bat severe acute respiratory syndrome-like coronavirus wiv encodes an extra accessory protein, orfx, involved in modulation of the host immune response bat origins of mers-cov supported by bat coronavirus hku usage of human receptor cd exploring host-pathogen interactions through genome wide protein microarray analysis identification of host key: cord- - tg up authors: zheng, fan; zhang, she; churas, christopher; pratt, dexter; bahar, ivet; ideker, trey title: identifying persistent structures in multiscale ‘omics data date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tg up in any ‘omics study, the scale of analysis can dramatically affect the outcome. for instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? likewise, protein communities revealed from protein networks can vary widely in sizes depending on the method. here we use the concept of “persistent homology”, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of sars-cov- protein interactions suggests hijacking of wnt. the method, hidef, is available via python and cytoscape. significant patterns in data often become apparent only when looking at the right scale. for example, single-cell rna sequencing data can be clustered coarsely to identify broad categories of cells (e.g. mesoderm, ectoderm), or analyzed more sharply to delineate highly specific subtypes (e.g. pancreas islet β-cells, thymus epithelium) [ ] [ ] [ ] . likewise, protein-protein interaction networks can inform groups of proteins spanning a wide range of spatial dimensions, from protein dimers (e.g. 
leucine zippers) to larger complexes of dozens or hundreds of subunits (e.g. proteasome, nuclear pore) to entire organelles (e.g. centriole, mitochondria) [ ]. many different approaches have been devised or applied to detect structures in biological data, including standard clustering, network community detection, and low-dimensional data projection [ ] [ ] [ ], some of which can be tuned for sensitivity to objects of a certain size or scale (so-called 'resolution parameters') [ , ]. even tunable algorithms, however, face the dilemma that the particular scale(s) at which the significant biological structures arise are usually unknown in advance. guidelines for detecting robust patterns across scales come from the field of topological data analysis, which studies the geometric "shape" of data using tools from algebraic topology and pure mathematics [ ]. a fundamental concept in this field is "persistent homology" [ ], the idea that the core structures intrinsic to a dataset are those that persist across different scales. recently, this concept has begun to be applied to the analysis of 'omics data and particularly biological networks [ , ]. here, we sought to integrate concepts from persistent homology with existing algorithms for network community detection, resulting in a fast and practical multiscale approach we call the hierarchical community decoding framework (hidef). hidef works in three phases to analyze the structure of a biological dataset (methods). to begin, the dataset is formulated as a similarity network, depicting a set of biological entities (e.g. genes, proteins, cells, patients, or species) and pairwise connections among these entities (representing similarities in their data profiles). the goal of the first phase is to detect network communities, i.e. groups of densely connected biological entities.
communities are identified continually as the spatial resolution is scanned, producing a comprehensive pool of candidates across all scales of analysis (fig. a) . in the second phase, candidate communities arising at different resolutions are pairwise aligned to identify those that have been redundantly identified and are thus persistent (fig. b) . in the third phase, persistent communities are analyzed to identify cases where a community is fully or partially contained within another (typically larger) community, resulting in a hierarchical assembly of nested and overlapping biological structures ( fig. c,d) . hidef is implemented as a python package and can be accessed interactively in the cytoscape network analysis and visualization environment [ ] (availability of data and materials). we first explored the idea of measuring community persistence via analysis of synthetic datasets [ ] in which communities were simulated and embedded in the similarity network at two different scales (supplementary fig. a; methods) . notably, the communities determined to be most persistent by hidef were found to accurately recapitulate the simulated communities at the two scales (supplementary fig. b-g) . in contrast, applying community detection algorithms at a fixed resolution had limited capability to capture both scales of simulated structures simultaneously (supplementary fig. ; methods) . we next evaluated whether persistent community detection improves the characterization of cell types. we applied hidef to detect robust nested communities within cell-cell similarity networks based on the mrna expression profiles of , single cells gathered across the organs and tissues of mice (obtained from two datasets in the tabula muris project [ ] ; methods). these cells had been annotated with a controlled vocabulary of cell types from the cell ontology (co) [ ] , via analyses of cell-type-specific expression markers [ ] . 
we used groups of cells sharing the same annotations to define a panel of reference cell types and measured the degree to which each reference cell type could be recapitulated by a hidef community of cells (methods). we compared these results to toomanycells [ ] and conos [ ] , two recently developed methods that generate nested communities of single cells in divisive and agglomerative manners, respectively (methods). reference cell types tended to better match communities generated by hidef than those of other approaches, with % ( / ) having a highly overlapping community (jaccard index > . ) in the hidef hierarchy ( fig. a,b, supplementary fig. a,b) . this favorable performance was observed consistently when adjusting hidef parameters to formulate a simple hierarchy, containing only the strongest structures, or a more complex hierarchy including additional communities that are less persistent but still significant (fig. c, supplementary fig. c) . the top-level communities in the hidef hierarchy corresponded to broad cell lineages such as "t cell", "b cell", and "epidermal cell". finer-grained communities mapped to more specific known subtypes (fig. d) or, more frequently, putative new subtypes within a lineage. for example, "epidermal cell" was split into two distinct epidermal tissue locations, skin and tongue; further splits suggested the presence of still more specific uncharacterized cell types (fig. e) . hidef communities also captured known cell types that were not apparent from d visual embeddings (supplementary fig. a,b) , and also suggested new cell-type combinations. for example, astrocytes were joined with two communities of neuronal cells to create a distinct cell type not observed in the hierarchies of toomanycells [ ] , conos [ ] , or a two-dimensional data projection with umap [ ] (fig. f, supplementary fig. c ). 
this community may correspond to the grouping of a presynaptic neuron, postsynaptic neuron, and a surrounding astrocyte within a so-called "tripartite synapse" [ ] . next, we applied hidef to analyze protein-protein interaction networks, with the goal of characterizing protein complexes and higher-order protein assemblies spanning spatial scales. we benchmarked this task by the agreement between hidef communities and the gene ontology (go) [ ] , a database that manually assigns proteins to cellular components, processes, or functions based on curation of literature (methods). application to protein-protein interaction networks from budding yeast and human found that hidef captured knowledge in go more significantly than previous pipelines proposed for this task, including the nexo approach to hierarchical community detection [ ] and standard hierarchical clustering of pairwise protein distances calculated by three recent network embedding approaches [ ] [ ] [ ] (fig. a, fig. ) . we also applied hidef to analyze a collection of human protein interaction networks [ , ] . we found significant differences in the distributions of community sizes across these networks, loosely correlating with the different measurement approaches used to generate each network. for example, bioplex . , a network characterizing biophysical protein-protein interactions by affinity-purification mass-spectrometry (ap-ms) [ ] , was dominated by small communities of - proteins, whereas a network based on mrna coexpression [ ] tended towards larger-scale communities of > proteins. in the middle of this spectrum, the string network, which integrated biophysical protein interactions and gene co-expression with a variety of other features [ ] , contained both small and large communities (fig. c) . 
in agreement with the observation above, the hierarchy of bioplex had a relatively shallow shape in comparison to that of string (and other integrated networks including giant and pcnet [ , ] ), in which communities across many scales formed a deep hierarchy (fig. d ,e; availability of data and materials). in contrast to clustering frameworks, hidef recognizes when a community is contained by multiple parent communities, which in the context of protein-protein networks suggests that the community participates in diverse pleiotropic biological functions. for example, a community corresponding to the mapk (erk) pathway participated in multiple larger communities, including ras and rsk pathways, sodium channels, and actin capping, consistent with the central roles of mapk signaling in these distinct biological processes [ ] (supplementary fig. ) . the hierarchies of protein communities identified from each of these networks have been made available as a resource in the ndex database [ ] (availability of data and materials). to explore multiscale data analysis in the context of an urgent public health issue, we considered a recent application of ap-ms that characterized interactions between the sars-cov- viral subunits and human host proteins [ ] . we used network propagation to select a subnetwork of the bioplex . human protein interactome [ ] proximal to these proteins ( proteins and , interactions) and applied hidef to identify its community structure (methods). among the persistent communities identified (fig. f) , we noted one consisting of human transducin-like enhancer (tle) family proteins, tle , tle , and tle , which interacted with sars-cov nsp , a highly conserved rna synthesis protein in corona and other nidoviruses (fig. g) [ ] . tle proteins are well-known inhibitors of the wnt signaling pathway [ ] . inhibition of wnt, in turn, has been shown to reduce coronavirus replication [ ] and recently proposed as a covid- treatment [ ] . 
if interactions between nsp and tle proteins can be shown to facilitate activation of wnt, tles may be of potential interest as drug targets. community persistence provides a basic metric for distilling biological structure from data, which can be tuned to select only the strongest structures or to include weaker patterns that are less persistent but still significant. this concept applies to diverse biological subfields, as demonstrated here for single cell transcriptomics and protein interaction mapping. while these subfields currently employ very different analysis tools which largely evolve separately, it is perhaps high time to seek out core concepts and broader fundamentals around which to unify some of the ongoing development efforts. to that effect, the methods explored here have wide applicability to analyze the multiscale organization of many other biological systems, including those related to chromosome organization, the microbiome and the brain. consider an undirected network graph $G$, representing a set of biological objects (vertices) and a set of similarity relations between these objects (edges). examples of interest include networks of cells, where edges represent pairwise cell-cell similarity in transcriptional profiles characterized by single-cell rna-seq, or networks of proteins, where edges represent pairwise protein-protein biophysical interactions. we seek to group these objects into communities (subsets of objects) that appear at different scales and identify approximate containment relationships among these communities, so as to obtain a hierarchical representation of the network structure. the workflow is implemented in three phases. phase i identifies communities in $G$ at each of a series of spatial resolutions $\gamma$. phase ii identifies which of these communities are persistent by way of a pan-resolution community graph $G_C$, in which vertices represent communities, including those identified at each resolution, and each edge links pairs of similar communities arising at different resolutions. persistent communities correspond to large components in $G_C$. phase iii constructs a final hierarchical structure $H$ that represents containment and partial containment relationships (directed edges) among the persistent communities (vertices). community detection methods generally seek to maximize a quantity known as the network modularity, as a function of community assignment of all objects [ ]. a resolution parameter integrated into the modularity function can be used to tune the scale of the communities identified [ , , ], with larger/smaller scale communities having more/fewer vertices on average (fig. a). of the several types of resolution parameter that have been proposed, we adopted that of the reichardt-bornholdt configuration model [ ], which defines the generalized modularity as:

$$Q(G, \vec{c}, \gamma) = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$

where $\vec{c}$ defines a mapping from objects in $G$ to community labels; $k_i$ is the degree of vertex $i$; $m$ is the total number of edges in $G$; $\gamma$ is the resolution parameter; $\delta(c_i, c_j)$ indicates that vertices $i$ and $j$ are assigned to the same community by $\vec{c}$; and $A$ is the adjacency matrix of $G$. to determine two values satisfying the above formula are defined as -proximal. the sampling step, which was practically set to . to sufficiently capture the interesting structures in the data; it is conceptually similar to the nyquist sampling frequency in signal processing [ ]. we used $\gamma_{\min}$ = . , which we found always resulted in the theoretical minimum number of communities, equal to the number of connected components in $G$. we used $\gamma_{\max}$ = for single-cell data ( fig.
to identify persistent communities, we define the pairwise similarity between any two communities $a$ and $b$ as the jaccard similarity of their sets of objects, $s(a)$ and $s(b)$:

$$J(s(a), s(b)) = \frac{|s(a) \cap s(b)|}{|s(a) \cup s(b)|}$$

we initialize a hierarchical structure represented by $H$, a directed acyclic graph (dag) in which each vertex represents a persistent community. a root vertex is added to represent the community of all objects. the containment relationship between two vertices, $v$ and $w$, is quantified by the containment index (ci):

$$CI(v, w) = \frac{|s(v) \cap s(w)|}{|s(w)|}$$

which measures the fraction of objects in $w$ shared with $v$. an edge is added from $v$ to $w$ in $H$ if $CI(v, w)$ is larger than a threshold $\sigma$ ($w$ is $\sigma$-contained by $v$). since $J(s(v), s(w)) < \tau$ for all $v$, $w$ (a property established by the procedure for connecting similar communities in phase ii), setting $\sigma \geq 2\tau/(1 + \tau)$ guarantees $H$ to be acyclic. in practice we used a relaxed threshold $\sigma = \tau$, which we found generally maintains the acyclic property but includes additional containment relations. in the (in our experience rare) event that cycles are generated in $H$, i.e. $CI(v, w) \geq \sigma$ and $CI(w, v) \geq \sigma$, we add a new community to $H$, the union of $v$ and $w$, and remove $v$ and $w$ from $H$. finally, redundant relations are removed by obtaining a transitive reduction [ ] of $H$, which represents the hierarchy returned by hidef describing the organization of communities. the biological objects assigned to each community are expanded to include all objects assigned to its descendants. throughout this study, we used the parameters = . , = , = . note that since is a threshold of minimum persistence, the results under a larger value of ′ can be produced by simply removing communities with persistence lower than ′ (figs. c, a- fig. ). different combinations of parameters and typically do not significantly change the performance of hidef in the benchmark tests on protein-protein interaction networks (supplementary fig. ), except that certain parameters (e.g. = . ) are less robust to network perturbation (i.e. randomly deleting edges from networks). 
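The jaccard similarity, the containment index and the transitive reduction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the hidef implementation: the function names, the toy communities and the threshold value `sigma = 0.75` are all hypothetical, and the sketch uses `>=` for the containment test (the text says "larger than" in one place and "≥" in another).

```python
from itertools import permutations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two object sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment_index(v: set, w: set) -> float:
    """Fraction of the objects in w that are also in v."""
    return len(v & w) / len(w) if w else 0.0


def containment_edges(communities: dict, sigma: float) -> set:
    """Directed edges (v, w) for every ordered pair with CI(v, w) >= sigma."""
    return {
        (v, w)
        for v, w in permutations(communities, 2)
        if containment_index(communities[v], communities[w]) >= sigma
    }


def transitive_reduction(edges: set) -> set:
    """Drop edge (u, w) if w is still reachable from u without it (assumes a DAG)."""
    children = {}
    for u, w in edges:
        children.setdefault(u, set()).add(w)

    def reachable(start, target, skip_edge):
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in children.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(u, w) for u, w in edges if not reachable(u, w, (u, w))}


# Hypothetical toy communities; "root" holds all objects.
toy = {
    "root": {1, 2, 3, 4, 5, 6, 7, 8},
    "a": {1, 2, 3, 4},
    "b": {1, 2},
    "c": {5, 6},
}
edges = containment_edges(toy, sigma=0.75)  # sigma value is illustrative only
hierarchy = transitive_reduction(edges)
```

In this toy case the edge from `root` to `b` is redundant because `b` is already reached through `a`, so the reduction keeps only `root → a`, `root → c` and `a → b`.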
we found that combining hidef with node embedding resolved this issue and further improved the performance and robustness (supplementary fig. ; see sections below). simulated network data were generated using the lancichinetti-fortunato-radicchi (lfr) method [ ] (supplementary figs. , ) . we used an available implementation (lfr benchmark graphs package at http://www.santofortunato.net/resources) to generate benchmark networks with two levels of embedded communities, a coarse-grained (macro) level and a fine-grained (micro) level. within each level, a vertex was exclusively assigned to one community. two parameters, c and f, were used to define the fractions of edges violating the simulated community structures at the two levels. all other edges were restricted to occur between vertices assigned to the same community (supplementary fig. a) . we fixed other parameters of the lfr method to values explored by previous studies [ ] . some community detection algorithms include iterations of local optimization and vertex aggregation, a process that, like hidef, also defines a hierarchy of communities, albeit as a tree rather than a dag. we demonstrated that without scanning multiple resolutions, this process alone was insufficient to detect the simulated communities at all scales (supplementary fig. ) . we used louvain and infomap [ , ] , which have stable implementations and have shown strong performance in previous community detection studies [ ] . for louvain, we optimized the and other parameters to default. in general, these settings generated trees with two levels of communities. note that infomap sometimes determined that the input network was nonhierarchical, in which cases the coarse-and fine-grained communities were identical by definition. mouse single-cell rna-seq data ( fig. ; supplementary fig. identical analyses were applied to the facs and the droplet datasets respectively, yielding a hierarchy of and communities respectively (fig. d) . scanpy . . 
[ ] was used to create tsne or umap embeddings and associated two-dimensional visualizations [ ] as baselines for comparison (fig. e,f; supplementary fig. a,b) . through previous analysis of the single-cell rna data, all cells in these datasets had been annotated with matching cell-type classes in the cell ontology (co) [ ] . before comparing these annotations with the communities detected by hidef, we expanded the set of annotations of each cell according to the co structure, to ensure the set also included all of the ancestor cell types of the type that was annotated. for example, co has the relationship "[keratinocyte] (is_a) [epidermal_cell]", and thus all cells annotated as "keratinocyte" are also annotated as "epidermal cell". the co was obtained from http://www.obofoundry.org/ontology/cl.html and processed by the data driven ontology toolkit (ddot) [ ] retaining "is_a" relationships only. we compared hidef to toomanycells [ ] and conos [ ] as baseline methods. the former is a divisive method which iteratively applies bipartite spectral clustering to the cell population until the modularity of the partition is below a threshold; the latter uses the walktrap algorithm to agglomeratively construct the cell-type hierarchy [ ] . we chose to compare with these methods because their ability to identify multiscale communities was either the main advertised feature or had been shown to be a major strength. toomanycells (version . . . ) was run with the parameter "min-modularity" set to . as recommended in the original paper [ ] , with other settings set to default. this process generated dendrograms (binary trees) with communities. the walktrap algorithm was run from the conos package (version . . ) with the parameter "step" set to as recommended in the original paper [ ] , yielding a dendrogram. 
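The annotation expansion described above, in which each cell's label is propagated to all ancestor terms through "is_a" links, can be sketched as follows. This is an illustrative sketch only: the function names and the toy `is_a` table are assumptions, and the second relation in the table is included purely for illustration rather than taken from the text. The study itself used the ddot toolkit for this step.

```python
def ancestors(term: str, is_a: dict) -> set:
    """All terms reachable from `term` by following is_a links upward."""
    found, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def expand_annotations(cell_labels: dict, is_a: dict) -> dict:
    """Map each cell to its annotated type plus every ancestor type."""
    return {
        cell: {label} | ancestors(label, is_a)
        for cell, label in cell_labels.items()
    }


# Toy ontology fragment: child term -> list of parent terms (illustrative only).
toy_is_a = {
    "keratinocyte": ["epidermal cell"],
    "epidermal cell": ["epithelial cell"],
}
expanded = expand_annotations({"cell_1": "keratinocyte"}, toy_is_a)
```

With the expanded sets, a detected community can be compared against a broad cell type and a narrow one using the same annotation table.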
the greedymodularitycut method in the conos package was used to select n fusions in the original dendrogram, resulting in a reduced dendrogram with n+ communities (including n internal and n+ leaf nodes). here we used n = , generating a hierarchy with communities (fig. c) . the communities in each hierarchy were ranked to analyze the relationships between celltype recovery and model complexity (fig. c, supplementary fig. c) . hidef communities were ranked by their persistence; conos and toomanycells communities were ranked according to the modularity scores those methods associate with each branch-point in their dendrograms. conos/walktrap uses a score based on the gain of modularity in merging two communities, whereas toomanycells uses the modularity of each binary partition. we obtained a total of human protein interaction networks gathered previously by survey studies [ , ] , along with one integrated network from budding yeast (s. cerevisiae) that had been used in a previous community detection pipeline, nexo [ ] . this collection contained two versions of the string interaction database, with the second removing edges from text mining (labeled string-t versus string, respectively; fig. ). benchmark experiments for the recovery of the gene ontology (go) were performed with string and the yeast network ( fig. a,b, supplementary fig. ) . the reference go for yeast proteins was obtained from http://nexo.ucsd.edu/. a reference go for human proteins was downloaded from http://geneontology.org/ via an api provided by the ddot package [ ] . hidef was directly applied to all of the above benchmark networks. the nexo communities were obtained from http://nexo.ucsd.edu/, with a robustness score assigned to each community. to benchmark communities created by hierarchical clustering, we first calculated three versions of pairwise protein distances (hc. - ; fig. a,b; supplementary fig. ) using mashup, dsd and deepnf [ ] [ ] [ ] . 
mashup was used to embed each protein as a vector, with and dimensions for yeast and human, as recommended in the original paper. a pairwise distance was computed for each pair of proteins as the cosine distance between the two vectors. similarly, deepnf was used to embed each protein into a -dimensional vector by default. dsd generates pairwise distances by default. given these pairwise distances, upgma clustering was applied to generate binary hierarchical trees. following the procedure given in the nexo and mashup papers [ , ] communities with < proteins were discarded. since all methods had slight differences in the resulting number of communities, communities from each method were sorted in decreasing order of score, enabling comparison of results across the same numbers of top-ranked communities. hidef communities were ranked by persistence. nexo communities were ranked by the robustness value assigned to each community in the original paper [ ] . to rank each community c of hierarchical clustering (branch in the dendrogram), a one-way mann-whitney u-test was used to test for significant differences between two sets of protein pairwise distances: (set ) all pairs consisting of a protein in c and a protein in the sibling community of c; (set ) all pairs consisting of a protein in each of the two children communities of c. the communities were sorted by the one-sided p-value of significance that distances in set are greater than those in set . we adopted the average f1-score metric [ ] to evaluate the overall performance of multiscale structure identification, focusing on the recovery of reference communities. given a set of reference communities $C^*$ and a set of computationally detected communities $\hat{C}$, the score was defined as:

$$\bar{F}_1 = \frac{1}{|C^*|} \sum_{C_i \in C^*} F_1(C_i, \hat{C}_{g(i)})$$

where $g(i)$ is the best match of $C_i$ in $\hat{C}$, defined as follows:

$$g(i) = \arg\max_j F_1(C_i, \hat{C}_j)$$

and $F_1(C_i, \hat{C}_j)$ is the harmonic mean of precision($C_i$, $\hat{C}_j$) and recall($C_i$, $\hat{C}_j$). 
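The average f1 score described above can be sketched as below. This is a hedged illustration, not the xmeasures implementation used in the study. It assumes the one-sided form suggested by the phrase "focusing on the recovery of reference communities": each reference community is matched to its best-scoring detected community, and the best-match f1 values are averaged over the reference set. The function names and toy sets are hypothetical.

```python
def f1(reference: set, detected: set) -> float:
    """Harmonic mean of precision and recall for one pair of communities."""
    overlap = len(reference & detected)
    if overlap == 0:
        return 0.0
    precision = overlap / len(detected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def average_f1(references: list, detected: list) -> float:
    """Mean, over reference communities, of the F1 of the best-matching detected community."""
    if not references or not detected:
        return 0.0
    best = [max(f1(ref, det) for det in detected) for ref in references]
    return sum(best) / len(references)


# Toy example with hypothetical protein identifiers.
reference_sets = [{1, 2, 3, 4}, {5, 6}]
detected_sets = [{1, 2, 3}, {5, 6}, {7}]
score = average_f1(reference_sets, detected_sets)
```

Here the first reference community is best matched by `{1, 2, 3}` (precision 1, recall 0.75) and the second is recovered exactly, so the score is the mean of 6/7 and 1.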
the calculations were conducted by the xmeasures package (https://github.com/exascaleinfolab/xmeasures) [ ] . hidef was directly applied to the original networks in most of our analyses of protein-protein interaction networks, and compared with the results of hierarchical clustering following the network embedding techniques [ , ] . we sought to explore whether we could combine the strengths of network embedding and hidef to further improve the performance and robustness to parameter choices (supplementary fig. ) . we borrowed the idea of the shared-nearest neighbor (snn) graph that we had been using in the analyses of single-cell data. we wrote a custom script to use the -dimensional node embeddings of the string network as the input of the seurat findneighbors function [ ] . the parameters of this function were left at their defaults. the output snn graph has . × edges, which is of the same order of magnitude as the original network ( . × edges). we then applied hidef to this snn graph with different combinations of parameters ( supplementary fig. ) . human proteins identified to interact with sars-cov- viral protein subunits were obtained from a recent study [ ] . this list was expanded to include additional human proteins connected to two or more of the virus-interacting human proteins in the new bioplex . network [ ] . these operations resulted in a network of proteins and , interactions. hidef was applied to this network with the same parameter settings as for other protein-protein interaction networks (see previous methods sections), and enrichment analysis was performed via g:profiler [ ] (fig. f,g) . these models include the hierarchy of murine cell types (fig. ) , the hierarchies of yeast and human protein communities identified through protein network analysis, and the hierarchy of human protein complexes targeted by sars-cov (fig. ) . t.i. is cofounder of data cure, is on the scientific advisory board, and has an equity interest. t.i. . 
a yeast network [ ] and the human string network [ ] were used as the inputs of a and b, respectively. hc. - represent upgma hierarchical clustering of pairwise distances generated by mashup, dsd, and deepnf [ ] [ ] [ ] , respectively. c, distributions of community sizes (x-axis, number of proteins) for three human protein networks: bioplex . [ ] , coexpr-geo [ ] , and string [ ] . supplementary figure . exploring simulated networks. a, the lfr generative model [ ] was used to simulate networks with vertices and average degree (methods). the simulation included two layers of communities, "coarse" ( - communities, - vertices per community) and "fine" ( - companion plots to panels (b-d). points represent identified communities, delineated by size (y axis) and persistence (x axis). blue/gray point colors indicate a match/non-match to a true community in the simulated network (jaccard similarity > . ). note that when noise is low (e), the highest persistence communities correctly recover simulated communities with near-perfect accuracy, e.g. for persistence threshold > . hidef is compared with the louvain and infomap algorithms [ , ] , with louvain and infomap fixed at their default single resolutions (methods). the three plots (a-c) compare the performance of the three algorithms in recovering simulated communities at different settings of the coarse/fine mixing parameters (see supplementary fig. clustering following any of three protein pairwise distance functions (mashup, dsd, and deepnf) [ ] [ ] [ ] . using the performance analysis depicted in fig. b , the area under curve (auc) was computed for different sets of hidef parameters (p, ). this auc was compared to that of the best baseline tool, hc. (i.e. hierarchical clustering of pairwise distances generated by deepnf [ ] ) to generate an equal number of communities (methods). note the ratio hidef auc / hc. 
auc is usually higher than , indicating the favorable performance of hidef except for very high values of the t parameter. as per fig. b , the analysis was undertaken using the string network and the go cellular component branch. b, similar analysis with subsampling of network edges (in which a random % of network edges are removed prior to community detection at each resolution). higher persistence (y axis) than a given threshold (x axis). e-f, scatterplots of community size (y axis) versus persistence (x axis). the left column characterizes the single-cell transcriptomics data (fig. , supplementary fig. ) . the right column (panel b, d, f) characterizes the yeast and human protein-protein interaction datasets ( fig. a-b) . the human cell atlas data-driven phenotypic dissection of aml reveals progenitor-like cells that correlate with prognosis integrating single-cell transcriptomic data across different conditions, technologies, and species molecules into cells: specifying spatial architecture data clustering: a review community detection in networks: a user guide visualizing data using t-sne analysis of the structure of complex networks at different resolution levels van dooren p: significant scales in community structure persistent homology-a survey a topological paradigm for hippocampal spatial map formation using persistent homology homological scaffolds of brain functional networks cytoscape: a software environment for integrated models of biomolecular interaction networks benchmark graphs for testing community detection algorithms organ collection and p, library preparation and s, computational data a, cell type a, writing g, supplemental text writing g, principal i: single-cell transcriptomics of mouse organs creates a tabula muris the cell ontology : enhanced content, modularization, and ontology interoperability toomanycells identifies and visualizes relationships of single-cell clades joint analysis of heterogeneous single-cell rna-seq dataset 
collections dimensionality reduction for visualizing single-cell data using umap tripartite synapses: astrocytes process and control synaptic information gene ontology: tool for the unification of biology. the gene ontology consortium a gene ontology inferred from molecular networks compact integration of multi-network topology for functional analysis of genes going the distance for protein function prediction: a new distance metric for protein interaction networks deepnf: deep network fusion for protein function prediction systematic evaluation of molecular networks for discovery of disease genes assessment of network module identification across complex diseases architecture of the human interactome defines protein communities and disease networks a next generation connectivity map: l platform and the first , , profiles string v : protein-protein association networks with increased coverage, supporting functional discovery in genomewide experimental datasets understanding multicellular function and disease with human tissue-specific networks activation and function of the mapks and their substrates, the mapk-activated protein kinases ndex . 
: a clearinghouse for research on cancer pathways a sars-cov- protein interaction map reveals targets for drug repurposing dual proteome-scale networks reveal cellspecific remodeling of the human interactome the nonstructural proteins directing coronavirus rna synthesis and processing molecular functions of the tle tetramerization domain in wnt target gene repression inhibition of severe acute respiratory syndrome coronavirus replication by niclosamide broad spectrum antiviral agent niclosamide and its therapeutic potential finding and evaluating community structure in networks statistical mechanics of community detection introduction to digital signal processing the transitive reduction of a directed graph fast unfolding of communities in large networks maps of random walks on complex networks reveal community structure scanpy: large-scale single-cell gene expression data analysis ddot: a swiss army knife for investigating data-driven biological ontologies computing communities in large networks using random walks overlapping community detection at scale: a nonnegative matrix factorization approach accuracy evaluation of overlapping and multiresolution clustering algorithms on large datasets profiler: a web server for functional enrichment analysis and conversions of gene lists ( update) the reactome pathway knowledgebase we are grateful for the helpful discussions with drs. jianzhu ma, karen mei, and daniel carlin. reactome [ ] . key: cord- -uu ofc authors: kang, yuan-lin; chou, yi-ying; rothlauf, paul w.; liu, zhuoming; soh, timothy k.; cureton, david; case, james brett; chen, rita e.; diamond, michael s.; whelan, sean p. j.; kirchhausen, tom title: inhibition of pikfyve kinase prevents infection by zaire ebolavirus and sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: uu ofc virus entry is a multistep process. it initiates when the virus attaches to the host cell and ends when the viral contents reach the cytosol. 
genetically unrelated viruses can subvert analogous subcellular mechanisms and use similar trafficking pathways for successful entry. antiviral strategies targeting early steps of infection are therefore appealing, particularly when the probability for successful interference through a common step is highest. we describe here potent inhibitory effects on content release and infection by chimeric vsv containing the envelope proteins of zaire ebolavirus (vsv-zebov) or sars-cov- (vsv-sars-cov- ) elicited by apilimod and vacuolin- , small molecule inhibitors of the main endosomal phosphatidylinositol- -phosphate/phosphatidylinositol -kinase, pikfyve. we also describe potent inhibition of sars-cov- strain -ncov/usa-wa / by apilimod. these results define new tools for studying the intracellular trafficking of pathogens elicited by inhibition of pikfyve kinase and suggest the potential for targeting this kinase in developing small-molecule antivirals against sars-cov- . transporter ) -the late endosomal-lysosomal receptor protein ( ). proteolytic processing is also required for severe acute respiratory syndrome coronavirus (sars- cov) ( , ), and for the current pandemic sars-cov- ( ). lassa fever virus (lasv) uses a different mechanism, binding alpha-dystroglycan at the plasma membrane ( ), for internalization with a subsequent ph-regulated switch that leads to engagement of lysosomal associated membrane protein (lamp ) for membrane fusion ( ). lymphocytic choriomeningitis virus (lcmv) also uses alpha-dystroglycan ( ) and is internalized in a manner that depends on endosomal sorting complexes required for also interfered with vsv-megfp-lcmv and vsv-megfp-zebov infection (fig. c) . all of these viruses require low ph to trigger viral membrane fusion with the endosomal membranes, and as expected, infection was fully blocked by bafilomycin a , which inhibits the vacuolar type h + -atpase (v-atpase) acidification activity (fig. c) . 
using live-cell spinning disk confocal microscopy ( fig. , ) , we monitored the presence table i; primers used for screening are listed in table ii . endosomal proteolysis of the ebola virus glycoprotein is necessary for infection ebola virus entry requires the cholesterol transporter sars coronavirus, but not human coronavirus nl , utilizes cathepsin l to infect ace -expressing cells inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov identification of alpha-dystroglycan as a receptor for lymphocytic choriomeningitis virus and lassa fever virus virus entry. lassa virus entry requires a trigger-induced receptor switch old world arenaviruses enter the host cell via the multivesicular body and depend on the endosomal sorting complex required for transport coincidence detection in phosphoinositide signaling pikfyve, a mammalian ortholog of yeast fab p lipid kinase, synthesizes -phosphoinositides. effect of insulin cloning, characterization, and expression of a novel zn +-binding fyve finger-containing phosphoinositide kinase in insulin-sensitive cells mammalian cell morphology and endocytic membrane homeostasis require enzymatically active phosphoinositide -kinase pikfyve the mammalian phosphatidylinositol -phosphate - kinase (pikfyve) regulates endosome-to-tgn retrograde transport core protein machinery for mammalian phosphatidylinositol , -bisphosphate synthesis and turnover that regulates the progression of endosomal transport. 
novel sac phosphatase joins the arpikfyve- pikfyve complex functional dissection of lipid and protein kinase signals of pikfyve reveals the role of ptdins , -p production for endomembrane integrity a selective pikfyve inhibitor blocks ptdins( , )p( ) production and disrupts endomembrane transport and retroviral budding vacuolin- inhibits autophagy by impairing lysosomal maturation via pikfyve inhibition the small chemical vacuolin- inhibits ca( +)-dependent lysosomal exocytosis but not cell resealing pikfyve, a class iii pi kinase, is the target of the small molecular il- /il- inhibitor apilimod and a player in toll-like receptor signaling a family of pikfyve inhibitors with therapeutic potential against autophagy-dependent cancer cells disrupt multiple events in lysosome homeostasis active pikfyve associates with and promotes the membrane attachment of the late endosome-to-trans-golgi network transport factor rab effector p identification of novel vacuolin- analogues as inhibitors by virtual drug screening and chemical synthesis vacuolin- potently and reversibly inhibits autophagosome- lysosome fusion by activating rab a randomized, double-blind, placebo-controlled trial of the oral interleukin- / inhibitor apilimod mesylate for treatment of active crohn's disease apilimod inhibits the production of il- and il- and reduces dendritic cell infiltration in psoriasis the phosphatidylinositol- -phosphate -kinase inhibitor apilimod blocks filoviral entry and infection ebola virus requires phosphatidylinositol ( , ) bisphosphate production for efficient viral entry visualization of ebola virus fusion triggering in the endocytic pathway a phase / a trial of sta inhibitor, in patients with active moderate to severe crohn's disease brief report: a phase iia, randomized, double-blind, placebo-controlled trial of apilimod mesylate patients with rheumatoid arthritis arbidol and other low-molecular-weight drugs that inhibit lassa and ebola viruses identification of combinations 
of approved drugs with synergistic activity against ebola virus in cell cultures characterization of vps -in , a selective inhibitor of vps , reveals that the phosphatidylinositol -phosphate-binding sgk protein kinase is a downstream target of class iii phosphoinositide -kinase evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response a transmembrane serine protease is linked to the severe acute respiratory syndrome coronavirus receptor and activates virus entry efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor characterization of severe acute respiratory syndrome- associated coronavirus (sars-cov) spike glycoprotein-mediated viral entry identification of apilimod as a first-in-class pikfyve kinase inhibitor for treatment of b-cell non-hodgkin lymphoma the vp subunit of jc polyomavirus recapitulates early events in viral trafficking and is a novel tool to study polyomavirus entry the phosphoinositide kinase pikfyve promotes small molecule inhibitors reveal niemann-pick c is essential for ebola virus infection endosome-to-cytosol transport of viral nucleocapsids protease inhibitors targeting coronavirus and filovirus entry tracking the fate of genetically distinct vesicular stomatitis virus matrix proteins highlights the role for late domains in assembly uptake of rabies virus into epithelial cells by clathrin-mediated endocytosis depends upon actin a forward genetic strategy reveals destabilizing mutations in the ebolavirus glycoprotein that alter its protease dependence during cell entry efficient recovery of infectious vesicular stomatitis virus entirely from cdna clones genome engineering using the crispr-cas system identification and characterization of a novel broad 
spectrum virus entry inhibitor a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov the first five seconds in the life of a clathrin-coated pit we thank walter j. atwood for providing the parental svg-a cells, eric marino, justin h. houser and tegy john vadakkan for maintaining the spinning disc confocal [table i and table ii: contents not recovered] key: cord- -j enftje authors: zoltán, köntös title: in vitro efficacy of “essential iodine drops” against severe acute respiratory syndrome-coronavirus (sars-cov- ) date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: j enftje background aerosolization of respiratory droplets is considered the main route of transmission of coronavirus disease (covid- ). therefore, reducing the viral load of severe acute respiratory syndrome-coronavirus (sars-cov- ) shed via respiratory droplets is potentially an ideal strategy to prevent the spread of the pandemic. the in vitro virucidal activity of intranasal povidone-iodine (pvp-i) has been demonstrated recently to reduce sars-cov- viral titres. this study evaluated the virucidal activity of the aqueous solution of iodine-v (a clathrate complex formed by elemental iodine and fulvic acid) as in essential iodine drops (eid) with μg elemental iodine/ml content against sars-cov- to ascertain whether it is a better alternative to pvp-i. methods sars-cov- (usawa / strain) virus stock was prepared by infecting vero cells (atcc crl- ) until cytopathic effect (cpe). the virucidal activity of eid against sars-cov- was tested in three dilutions ( : ; : and : ) in triplicate by incubating at room temperature ( ± °c) for either or seconds. the surviving viruses from each sample were quantified by a standard end-point dilution assay. results eid ( μg iodine/ml) after exposure for and seconds was compared to controls. in both cases, the viral titre was reduced by % (lrv . ). 
the : dilution of eid with virus reduced sars-cov- virus from , cell culture infectious dose % (ccid ) to ccid within seconds. conclusion the substantial lrv achieved by iodine-v in eid confirmed the activity of eid against sars-cov- in vitro, demonstrating that iodine-v in eid is effective at inactivating the virus in vitro and therefore suggesting its potential application intranasally to reduce sars-cov- transmission from known or suspected covid- patients. the global pandemic of coronavirus disease (covid- ) caused by the severe acute respiratory syndrome-coronavirus (sars-cov- ) was first identified in the city of wuhan, china during late ( ) . it has since spread rapidly globally, with over million confirmed cases to date and over deaths. the rapid spread of this novel sars-cov- virus is enhanced by the shedding of viral particles from infected or symptomatic individuals through bodily secretions, especially respiratory droplets, including saliva and nasal fluid. it has been demonstrated that tiny droplets of - μm diameter produced during speaking, sneezing, coughing, and panting contain sars-cov- viral particles ( ) , which remained substantially viable and infectious in aerosols for up to hours, similar to sars-cov- ( ) . thus, it is presumed that the most effective infection mechanism is the aerosolization of viral particles in respiratory droplets ( ) , which are easily inoculated directly into the airway during breathing or indirectly by contact transfer via contaminated hands ( ) . the aerosolized microscopic infectious viral particles can linger in the air circulating in poorly ventilated rooms or spaces long enough to cause multiple infections in individuals who inhale them ( ) . a raft of recommended preventive measures, including wearing face masks and eye protection, washing hands and keeping a social distance of > m, have been demonstrated to be effective in reducing covid- transmission ( ) . 
the goal of face masks is to reduce the transmission of respiratory droplets from infected individuals and shield non-infected individuals from transmitted droplets. the use of face masks as vital personal protective equipment (ppe) against covid- disease is not fully accepted and has remained controversial from perspectives of attitude, effectiveness, and necessity ( ) . in the initial stage of covid- disease, sars-cov- viral titres of > /ml in saliva and nasal mucus have been reported; therefore, reduction of these titres should help to reduce disease transmission from known covid- patients to individuals who are uncomfortable using masks ( ) . this can be achieved using oral and nasal sprays with antiviral activity against sars-cov- virus. intranasal saline sprays containing elemental iodine as an active ingredient have been used for nasal moisturizing and prevention and/or treatment of sinusitis or rhinitis ( ) . iodine-containing intranasal/oral moisturizing saline sprays are being explored as drug agents against coronaviruses. their potential use has been supported by a recent clinical safety trial that demonstrated that intranasal povidone-iodine (pvp-i; . %) spray had no adverse effects for up to five months and therefore could be used against coronaviruses (sars-cov- / and middle east respiratory syndrome, mers) ( ) . regarding the antiviral efficacy of iodine-containing nasal/oral sprays, nasal and oral formulations containing pvp-i have been reported to inactivate sars-cov- in vitro ( , ) . thus, pvp-i oral antiseptic is potentially efficacious in reducing the transmission risk of coronaviruses in dental practice ( ) . sars-cov- is internalized into the host cells via two host cell receptors: angiotensin-converting enzyme (ace ) and cd (a highly-glycosylated transmembrane protein), to which it binds using virus spike proteins (sp) ( ) . the mechanism of action of pvp-i mouth rinse/nasal spray involves targeting of the host's ace for inhibition. 
it inactivates haemagglutinin esterase (he), a fifth important structural protein of beta-coronaviruses, and diminishes the ace receptors in lymphocytes by promoting their absorption from host epithelial tissue ( ) . this reduces the concentration of sars-cov- / shed in saliva and nasal fluid. encouraged by promising findings from a recent study by pelletier and colleagues ( ) , the present study investigated essential iodine drops (eid) as an oral/nasal decontaminant in known or suspected cases of covid- and as a potentially better alternative to pvp-i. iodine-v is a novel derivative of fulvic acid that forms a clathrate complex with the elemental iodine (i ) molecule. essential iodine drops (eid) is an aqueous solution of iodine-v containing µg elemental iodine/ml. eid is currently used as an oral dietary supplement to support healthy thyroid function; it is non-toxic and non-allergenic, and lends itself to prolonged continuous use if kept within the daily recommended upper limit of µg. its potential for use in protecting against the transmission of sars-cov- virus during elective surgical procedures in ophthalmology, otolaryngology and dentistry has been demonstrated recently using pvp-i solutions ( ) . whilst steps are already being taken to make essential iodine drops (eid) available for nasal administration, the present study was conducted to ascertain whether the eid solution is effective in inactivating sars-cov- in vitro, as demonstrated with pvp-i. therefore, this research aims to evaluate the in vitro virucidal efficacy of essential iodine drops (eid) on sars-cov- to ascertain whether it is a better alternative to pvp-i. the present study tested the virucidal efficacy of essential iodine drops (eid) against sars-cov- using materials and a protocol similar to those of pelletier and colleagues ( ) .
importantly, the present study was conducted in biosafety level (bsl- ) laboratories at the institute for antiviral research at utah state university, usu (logan, ut) following established standard operating procedures approved by the usu biohazards committee. before testing, a virus stock of sars-cov- (usawa / strain) sourced from the cdc following its original isolation from an infected patient in washington state, usa (wa- strain bei #nr- strain), was prepared according to pelletier and colleagues ( ) . the virus stock was prepared by infecting vero cells (atcc crl- ) until cytopathic effect (cpe) was visible at two days post-inoculation. the vero cells were cultured in minimal essential medium (mem) (quality biological), supplemented with % (v/v) fetal bovine serum (sigma), and μg/ml gentamicin (gemini bioproducts) ( ) . the test drug was essential iodine drops (eid), obtained from ioi investment zrt. the virucidal activity of essential iodine drops against sars-cov- was tested at three dilutions: : , : and : . the original concentration of essential iodine drops as supplied by ioi investment zrt was µg elemental iodine/ml. this was subsequently diluted with sars-cov- virus solution to : for seconds, : for seconds and : for seconds. briefly, the three dilutions of essential iodine drops (eid) in sars-cov- virus solution ( : , : and : ) were tested in triplicate for virucidal activity as described by pelletier and colleagues ( ) . the undiluted drug as supplied (without virus solution) in two tubes was used for the toxicity and neutralization controls. ethanol ( %) was used as the positive control, while water was used as the virus control. the test solution and virus were incubated together at room temperature ( ± °c) for either seconds or seconds, and the solutions were then neutralized by a / dilution in mem with fbs ( %) and gentamicin ( μg/ml).
the surviving virus from each sample was quantified by a standard end-point dilution assay according to pelletier and colleagues ( ) . briefly, neutralized samples were pooled and serially diluted using eight log dilutions in the test medium. subsequently, μl of each dilution was plated into quadruplicate wells of -well plates containing - % confluent vero cells. the toxicity controls were added to an additional wells of vero cells, and of those wells at each dilution were infected with the virus to serve as neutralization controls, ensuring that residual sample in the titre assay plate did not inhibit growth and detection of the surviving virus. plates were incubated ( ± °c with % co ) for days. each well was then scored for the presence or absence of infectious virus. the titres were measured using a standard endpoint dilution % cell culture infectious dose (ccid ) assay and calculated using the reed-muench ( ) equation, and the log reduction value (lrv) of each compound compared to the negative (water) control was calculated ( ) . infectious sars-cov- virus was quantified by endpoint dilution virus titration on vero cells, and the results were expressed as the log of the % cell culture infectious dose (log ccid ). virus titres and log reduction values (lrv) of sars-cov- when incubated with a single concentration of the test drug for each time point are shown in table . after seconds of incubation, : eid reduced viral titre by % from . log ccid / . ml to . log ccid / . ml, giving an lrv of . . it reduced the virus from , ccid to ccid per . ml. after seconds, : eid reduced viral titre by % from . log ccid / . ml to . log ccid / . ml, giving an lrv of . . no cytotoxicity was observed in any of the test wells, and both positive and neutralization controls performed as expected.
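the titre and log reduction calculations described above can be illustrated with a minimal sketch. it assumes a ten-fold dilution series scored as virus-positive wells per dilution, applies the reed-muench proportionate-distance rule, and then derives the lrv and percent reduction from two log titres. the function names and all well counts are hypothetical and are not the study's data.

```python
def reed_muench_log10_titre(log10_dilutions, infected, wells_per_dilution):
    """Return log10 of the 50% endpoint titre per inoculum volume (Reed-Muench).

    log10_dilutions: e.g. [-1, -2, -3], ordered from most to least concentrated.
    infected: number of virus-positive wells scored at each dilution.
    """
    uninfected = [wells_per_dilution - k for k in infected]
    n = len(infected)
    # Cumulative infected wells are summed from the most dilute end upwards;
    # cumulative uninfected wells are summed from the most concentrated end downwards.
    cum_inf = [sum(infected[i:]) for i in range(n)]
    cum_uninf = [sum(uninfected[: i + 1]) for i in range(n)]
    pct = [100.0 * a / (a + b) for a, b in zip(cum_inf, cum_uninf)]
    for i in range(n - 1):
        if pct[i] >= 50.0 > pct[i + 1]:
            # Proportionate distance between the two dilutions bracketing 50%.
            pd = (pct[i] - 50.0) / (pct[i] - pct[i + 1])
            step = log10_dilutions[i] - log10_dilutions[i + 1]  # 1 for a ten-fold series
            return -(log10_dilutions[i] - pd * step)
    raise ValueError("the dilution series does not bracket the 50% endpoint")


def log_reduction_value(log10_control, log10_treated):
    """LRV: difference between control and treated log10 titres."""
    return log10_control - log10_treated


def percent_reduction(lrv):
    """Convert an LRV into the percentage of infectious virus removed."""
    return 100.0 * (1.0 - 10.0 ** (-lrv))


# Hypothetical plate: 4 wells per dilution, ten-fold dilutions 10^-1 to 10^-5.
titre = reed_muench_log10_titre([-1, -2, -3, -4, -5], [4, 4, 3, 1, 0], 4)
lrv = log_reduction_value(4.5, 2.5)  # hypothetical control and treated log10 titres
```

with these made-up counts the 50% endpoint falls halfway between the third and fourth dilutions, and an lrv of 2 corresponds to a 99% reduction in infectious virus.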
iodine is established as having broad-spectrum antimicrobial activity against bacterial, viral, fungal and protozoal pathogens and has been used as an antiseptic for the prevention of infection and the treatment of wounds for decades. pvp-i currently has the best antiviral activity against sars-cov and mers-cov ( ), and its in vitro virucidal activity against the novel sars-cov- virus has been demonstrated recently ( ) . this paper aimed to demonstrate that the novel iodine-v in eid, as an active virucidal ingredient, is as efficacious as pvp-i and potentially safer. the in vitro virucidal assay in the present study demonstrated that % and % essential iodine drops (eid) solutions reduced sars-cov- virus titre after seconds and seconds of incubation by an lrv of . ( %). these findings indicate that iodine-v in eid lends itself to use as a drug with dual benefits: as a mineral supplement to maintain healthy thyroid function, and as a means to reduce the transmission of sars-cov- virus. when applied orally or intranasally, iodine-v in eid is potentially beneficial against the transmission of sars-cov- from covid- patients. this means that the health benefits of eid are superior to those of pvp-i or lugol's iodine (iodine and potassium iodide in water alone, forming mostly triiodide). although pvp-i was reported to be non-cytotoxic ( ) , in a few rare cases this formulation may cause serious rashes similar to chemical burns. it has been implicated in late-onset allergic contact dermatitis when used as a pre-operative antiseptic in oral and maxillofacial surgery ( ) . this has been attributed to the presence of free iodine (i ), which has a strong oxidizing effect on the skin or mucosa; hence the reported allergic reaction. free iodine concentrations in pvp-i ( . % dilution) and lugol's iodine are regarded as low and safe.
the virucidal activity of eid was lower than the lrv of . for pvp-i ( . %) demonstrated by pelletier et al. ( ) ; however, - % pvp-i contains - µg/ml iodine, compared to µg/ml iodine in eid. elemental iodine in iodine-v within the fulvic acid clathrate complex forms a solid, stable material that can be easily formulated into a tablet for slow, extended release of elemental iodine. this means that even an eid tablet formulation would be less likely to cause iodine-induced allergies in the oral and nasal mucosae. furthermore, eid is formulated with iodine-v without excipients, unlike pvp-i, and therefore has a potentially better virucidal activity against sars-cov- virus. moreover, the pvp-i excipient has been reported, in rare cases, to induce immediate (type ) hypersensitivity reactions in children ( ) . therefore, eid can be considered a safer alternative to pvp-i, as it can be used in children. unlike pvp-i, which is routinely used in surgical settings as a pre-operative antiseptic in oral and maxillofacial surgery ( ) , eid is currently used in the general populace as a mineral supplement and can be readily accepted as a drug to prevent the spread of the covid- pandemic. because eid is already on the market as a mineral supplement to help people with iodine deficiency, data on its potential adverse effects are already known or available, and it is therefore regarded as safe when used routinely orally or intranasally in known or suspected covid- patients. iodine-v (currently only available in the eid formulation) inactivated % of sars-cov- after and seconds, which is quite similar to povidone-iodine (pvp-i) ( . % at seconds) reported elsewhere. as eid is excipient-free, these data also suggest that iodine-v in eid is likely to have better stability and enhanced potency in vivo when compared with pvp-i. therefore, iodine-v offers an advantage as a nasal or oral antiseptic to reduce viral transmission from known or suspected covid- patients.
with the increasing use of home-based care for covid- patients, the risk of transmitting sars-cov- virus among family members and the immediate community can be reduced using intranasal/oral iodine-v, which has the potential to remove or inactivate shed virus from the respiratory secretions of known or suspected covid- cases. this in vitro study used sars-cov- (usawa / strain) sourced from the cdc, which obtained it anonymously from an infected patient in washington state, usa. therefore, ethics approval and patient consent are not applicable in this section. not applicable. the data supporting the study findings can be found in separate supplemental material that can be accessed by contacting the corresponding author, dr. köntös zoltán, by email: zkontos@ioi-investment.com. dr. köntös zoltán, the corresponding author, is the manufacturer of iodine-v and eid, already on the market, for istituto per le opere di innovazione (ioi) investment zrt. (budapest, hungary). the author is also the majority owner, chairman and ceo of this company. this work was supported solely through private funding by istituto per le opere di innovazione (ioi) investment zrt. (budapest, hungary), chaired by dr. köntös zoltán. thus, the author and this research have not received any external funding. dr. köntös zoltán is the sole researcher who coordinated laboratory analysis and wrote the manuscript.
novel coronavirus (covid- ) pandemic: built environment considerations to reduce transmission
transmission potential of sars-cov- in viral shedding observed at the university of
aerosol and surface stability of sars-cov- as compared with sars-cov-
air, surface environmental, and personal protective equipment contamination by severe acute respiratory syndrome coronavirus (sars-cov- ) from a symptomatic patient
modality of human expired aerosol size distributions
small droplet aerosols in poorly ventilated spaces and sars-cov- transmission
physical distancing, face masks, and eye protection to prevent person-to-person transmission of sars-cov- and covid- : a systematic review and meta-analysis
mask use during covid- : a risk adjusted strategy
the use of povidone iodine nasal spray and mouthwash during the current covid- pandemic may protect healthcare workers and reduce cross infection
a comparison of the effectiveness of tincture of iodine and potassium iodide on chronic sinusitis patients with biofilm
povidone-iodine use in sinonasal and oral cavities: a review of safety in the covid- era
in vitro efficacy of povidone-iodine nasal and oral antiseptic preparations against severe acute respiratory syndrome-coronavirus (sars-cov- )
rapid in-vitro inactivation of severe acute respiratory syndrome coronavirus (sars-cov- ) using povidone-iodine oral antiseptic rinse
ace receptor expression and severe acute respiratory syndrome coronavirus infection depend on differentiation of human airway epithelia
an unusual complication of late onset allergic contact dermatitis to povidone iodine in oral & maxillofacial surgery - a report of cases
anaphylactic reaction to povidone secondary to drug ingestion in a young child
i am particularly grateful for the service given by jonna b. westover with the institute for antiviral research at utah state university.
key: cord- - id sfsu authors: auerswald, heidi; yann, sokhoun; dul, sokha; in, saraden; dussart, philippe; martin, nicholas j.; karlsson, erik a.; garcia-rivera, jose a. title: assessment of inactivation procedures for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: id sfsu severe acute respiratory syndrome coronavirus (sars-cov- ), the causative agent of coronavirus disease (covid- ), presents a challenge to laboratorians and healthcare workers around the world. handling of biological samples from individuals infected with the sars-cov- virus requires strict biosafety and biosecurity measures. within the laboratory, non-propagative work with samples containing the virus requires, at minimum, biosafety level- (bsl- ) techniques and facilities. therefore, handling of sars-cov- samples remains a major concern in areas and conditions where biosafety and biosecurity for specimen handling is difficult to maintain, such as in rural laboratories or austere field testing sites. inactivation through physical or chemical means can reduce the risk of handling live virus and increase testing ability worldwide. herein we assess several chemical and physical inactivation techniques employed against sars-cov- isolates from cambodian covid- patients. this data demonstrates that all chemical (avl, inactivating sample buffer and formaldehyde) and heat treatment ( °c and °c) methods tested completely inactivated viral loads of up to log . to determine if any viable virus remained post inactivation, % polyethylene glycol (sigma-aldrich, st. louis, usa) in pbs was added ( / of total sample volume) to an aliquot from each sample condition and incubated overnight at °c. following incubation, virus was recovered by centrifugation at , rpm for h. precipitates were washed twice with sterile pbs, re-constituted with infection medium, and used for infecting the tcid on vero e cells and recovery cultures on vero cells. 
negative controls were treated the same way to examine cytotoxicity of possible remaining traces of inactivation solutions. sars-cov- real-time rt-pcr: following inactivation, rna from one aliquot per condition per virus isolate and negative control was immediately extracted with the qiaamp viral rna mini kit (qiagen) and stored at - °c until further processing. real-time rt-pcr assays for sars-cov- rna detection were performed in duplicate using the charité virologie algorithm (berlin, germany) to detect both e and rdrp genes [ ] . in brief, real-time rt-pcr was performed using the superscript™ iii one-step rt-pcr system with platinum™ taq inc., la jolla, ca, usa). analysis of variance was performed comparing mean ct values for each inactivation method. the difference between the standard (avl) and each specific inactivation method was determined using dunnett's test for many-to-one comparison. a p-value of less than . was considered to indicate statistical significance. agreement, including bias and % confidence interval, between ct values following inactivation by avl and other methods was assessed using all chemical and thermal inactivation methods resulted in the reduction of viable sars-cov- to undetectable levels. untreated virus isolates had a concentration of viable virus up to . x (isolate ) before treatment (table ) . previous studies have been conducted on the effectiveness of chemical inactivation techniques on sars-cov- [ , ] , the majority of these based on infectious agents of concern such as ebola [ ] and sars and mers coronaviruses [ ] . as with other viruses, the primary step in the molecular detection of sars-cov- is viral lysis to begin the extraction of nucleic acids. the buffers used in this lysis step yield varying results [ , , , ] ; however, unlike previous studies [ ] , this study found that avl buffer alone was able to fully inactivate up to log of virus from three different primary isolates of sars-cov- .
apart from differences in the isolates utilized and a slight reduction in titer, it is unclear why avl buffer fully inactivated virus in this study but not in others; further work is warranted to determine the exact effectiveness of this step alone. inactivating sample transport media, either made in-house or commercially available, also present an attractive way to inactivate samples at the point of sampling to ensure safe handling along the transport chain and within the laboratory. these inactivating transport media include the key components of many viral lysis buffers, including chaotropic agents (gitc), detergents (triton x- ) and buffering agents (edta, tris-hcl), to inactivate virus and preserve viral rna. previous studies have shown that gitc-lysis buffers are able to inactivate sars-cov- samples [ , ] ; however, the addition of triton-x may be necessary for complete inactivation [ ] . in line with these studies, commercial sample transport media containing both gitc and triton-x were able to inactivate up to log of virus with no loss of molecular diagnostic sensitivity. apart from sample media and buffers utilized for diagnostic testing, various disinfectant and inactivating chemicals are available for sample treatment. formaldehyde has a long history of use for inactivation of a number of viruses and in a number of fixation techniques, including vaccine preparations [ , ] . formaldehyde has been shown to successfully inactivate both sars and mers [ , , ] and has been suggested to be a viable alternative for disinfection and inactivation of ] . formaldehyde treatment did successfully inactivate up to log of virus; however, this treatment severely impacted viral detection in subsequent molecular testing. this decreased detection is not unexpected, as formaldehyde treatment results in rna degradation and modification [ ] .
therefore, formaldehyde treatment does not appear to be a solution for increased molecular sars-cov- testing; however, it does remain a viable alternative for sample inactivation or disinfection. perhaps the most studied technique thus far regarding sars-cov- has been thermal inactivation at various times and temperatures [ , ] . several previous studies have shown heat to be an effective inactivation technique against other coronaviruses, including sars, mers, and human seasonal strains [ , , ] . similar to previous studies, °c heat treatment for or minutes was able to fully inactivate up to log of sars-cov- from three different isolates [ , ] . interestingly, while other studies utilized °c for to minutes for inactivation, heat treatment at °c for only minutes was also able to completely inactivate up to log of virus. these results are very promising, as high heat treatment is extremely rapid and may be a vital addition to the testing arsenal, since rt-pcr can possibly be performed directly from these samples without the need for nucleic acid extraction [ , ] . interestingly, the shortened time period of high heat treatment may mitigate some of the reduction in detection seen in previous studies and make this technique more employable [ ] . overall, the agreement and retained sensitivity amongst rt-pcr results, combined with the fact that all methods resulted in % virus inactivation up to a viral load of log , suggest that any of the tested methods, except formaldehyde, is useful for inactivating sars-cov- samples. given the who recommendation to "test, test, test," these data can help to optimize sample inactivation for austere or remote areas.
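the agreement between ct values referred to above can be illustrated with a short sketch. it assumes a bland-altman style calculation of the bias (mean difference) and 95% limits of agreement; the study's own methods sentence is truncated, so this choice of method is an assumption, and the function name and ct values are hypothetical.

```python
import statistics


def bias_and_limits(reference_ct, method_ct):
    """Return (bias, lower limit, upper limit) for paired Ct values.

    bias is the mean of (method - reference); the limits are bias +/- 1.96 SD
    of the paired differences, assuming roughly normal differences.
    """
    if len(reference_ct) != len(method_ct) or len(reference_ct) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [m - r for r, m in zip(reference_ct, method_ct)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired Ct values: reference buffer vs. an alternative method.
reference = [20.0, 25.0, 30.0, 35.0]
alternative = [20.0, 26.0, 30.0, 36.0]
bias, lower, upper = bias_and_limits(reference, alternative)
```

a bias close to zero with narrow limits would indicate that the alternative inactivation method leaves rt-pcr detection essentially unchanged relative to the reference.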
indeed, it may be possible to use basic tools such as a stopwatch and boiling water to achieve % virus inactivation without compromising sample integrity, significantly decreasing possible exposure during sample transportation and handling, allowing for dissemination of testing to labs with decreased biosafety and biosecurity capacity, world health organization. coronavirus disease (covid- ) outbreak an interactive web-based dashboard to track covid- in real time. the lancet infectious diseases guidelines for handling and processing specimens associated with coronavirus disease world health organization. laboratory testing for coronavirus disease (covid- ) in suspected human cases: interim guidance un list of least developed countries developed-countries.aspx emerging infectious diseases and public health policy: insights from cambodia, hong kong and indonesia determination of % endpoint titer using a simple formula statistical methods for assessing agreement between two methods of clinical measurement evaluation of heating and chemical protocols for inactivating sars- validation of a lysis buffer containing m guanidinium thiocyanate (gitc)/ triton x- for extraction of sars-cov- rna for covid- comparison of formulated lysis buffers containing to m gitc, roche external lysis buffer and qiagen rtl lysis buffer. biorxiv buffer avl alone does not inactivate ebola virus in a representative clinical sample type inactivation of the coronavirus that induces severe acute respiratory syndrome, sars-cov virus inactivation by nucleic acid extraction reagents unreliable inactivation of viruses by commonly used lysis buffers inactivation methods for whole influenza vaccine production formaldehyde treatment and safety testing of experimental poliomyelitis vaccines. 
american journal of public health and the nation's health coronavirus disinfection in histopathology persistence of coronaviruses on inanimate surfaces and their inactivation with biocidal agents the effect of formaldehyde fixation on rna: optimization of heat inactivation of the severe acute respiratory syndrome inactivation of coronaviruses by heat resilient sars-cov- diagnostics workflows including viral heat inactivation heat inactivation of the middle east respiratory syndrome coronavirus. influenza and other respiratory viruses extraction-free covid- (sars-cov- ) diagnosis by rt-pcr to increase capacity for national testing programmes during a pandemic an alternative workflow for molecular detection of sars-cov- -escape from the na extraction kit-shortage key: cord- - hldozml authors: cortey, martí; li, yanli; díaz, ivan; clilverd, hepzibar; darwich, laila; mateu, enric title: sars-cov- amino acid substitutions widely spread in the human population are mainly located in highly conserved segments of the structural proteins date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hldozml the severe acute respiratory syndrome coronavirus (sars-cov- ) pandemic offers a unique opportunity to study the introduction and evolution of a pathogen into a completely naïve human population. we identified and analysed the amino acid mutations that gained prominence worldwide in the early months of the pandemic. eight mutations have been identified along the viral genome, mostly located in conserved segments of the structural proteins and showing low variability among coronavirus, which indicated that they might have a functional impact. 
at the time of writing this paper, these mutations present varied success in the sars-cov- virus population, ranging from a change in the spike protein that becomes absolutely prevalent, through two mutations in the nucleocapsid protein showing frequencies around %, to a mutation in the matrix protein that nearly fades out after reaching a frequency of %. the emergence of the novel severe acute respiratory syndrome coronavirus (sars-cov- ) and the subsequent pandemic has become a health problem unparalleled in the last century. sars-cov- is thought to have originated from an animal coronavirus that successfully adapted to humans. the species of origin of sars-cov- has not been fully identified, but the virus seems to be related to, although different from, sars-cov and other coronaviruses found in bats and other mammal species (chan et the sars-cov- genome size is around kb with the typical gene structure known in other betacoronaviruses: starting from the ′, more than two-thirds of the genome comprises orf ab encoding polyproteins (nsp to nsp ), while the last third consists of genes encoding major structural proteins, including spike (s or orf ), envelope (e or orf ), membrane (m or orf ), and nucleocapsid (n or orf ) proteins. additionally, sars-cov- contains at least minor structural proteins, encoded by the orf a, orf , orf a, orf b, orf , and orf genes (khailany et al. ) . the first cases of the novel coronavirus-associated disease (covid- ) have been traced to the chinese province of hubei in early december (https://www.who.int/csr/don/ -january- -novel-coronavirus-china/en/). although the actual index case is not known, the first sequence of the novel coronavirus was produced within weeks of the emergence of the disease (zhu et al. ). at the time of writing, more than , sequences have been produced in less than five months since the start of the pandemic.
this is a unique opportunity to gain insight into the evolution of a betacoronavirus in a completely naïve human population. in this context, efficiently transmitted viral variants will be less influenced by the selection exerted by the immune response, since most transmissions will occur from individuals who have not yet developed an efficient immune response to naïve recipients. the aim of the present study was to determine the amino acid substitutions in viral proteins that were widely present in available sequences of sars-cov- , relating them to the known chronology of the pandemic. the mutations found were also assessed to try to understand their potential significance for viral fitness. week, they were also reported in other continents (supplementary table s ). interestingly, the only substitution that became fully predominant was asp gly in the spike protein (orf -s). gly his in orf a reached a frequency of % at the time of writing this paper. it is worth noting that when the sequences were analysed by continent, all mutations were spread worldwide, except the met in orf -m, which was absent in africa (supplementary table s ). fig. s ). these clades did correspond to the l and s types reported by tang et al. ( ). this mutation was predicted to be neutral. finally, the last change was found in the nsp protein, leu phe, the significance of which was unclear. this mutation was also predicted to be neutral.
cfssp: chou and fasman secondary structure prediction server evolutionary analysis of sars-cov- : how mutation of non- structural protein (nsp ) could affect viral autophagy global spread of sars-cov- subtype with spike protein mutation d g is shaped by human genomic variations that regulate expression of tmprss and mx genes', biorxiv genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan predicting the functional effect of amino acid substitutions and indels the membrane m protein carboxy terminus binds to transmissible gastroenteritis coronavirus core and contributes to core stability sars-cov nucleocapsid protein binds to hubc , a ubiquitin conjugating enzyme of the sumoylation system bioedit: a user-friendly biological sequence alignment editor and analysis program for windows / /nt molecular evolution of the sars coronavirus during the course of the sars epidemic in china sars-cov- and orf a: non-synonymous mutations polyproline regions', msystems genomic characterization of a novel sars-cov- ' spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- genetic evidence for a structural interaction between the carboxy termini of the membrane and nucleocapsid proteins of mouse hepatitis virus genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding the coronavirus nucleocapsid is a multifunctional protein it is too soon to attribute ade to covid- ', microbes and infection on the origin and continuing evolution of sars-cov- ', national science review on the origin and continuing evolution of sars-cov- ', national science review is covid- receiving ade from other coronaviruses?', microbes & infection clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice immunodominant sars coronavirus 
epitopes in humans elicited both enhancing and neutralizing effects on infection in non-human primates cryo-em structure of the -ncov spike in the prefusion conformation a pneumonia outbreak associated with a new coronavirus of probable bat origin a novel coronavirus from patients with pneumonia in china key: cord- -t t i authors: schloer, sebastian; brunotte, linda; mecate-zambrano, angeles; zheng, shuyu; tang, jing; ludwig, stephan; rescher, ursula title: drug synergy of combinatory treatment with remdesivir and the repurposed drugs fluoxetine and itraconazole effectively impairs sars-cov- infection in vitro date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t t i the sars-cov- pandemic and the global spread of coronavirus disease (covid- ) urgently calls for efficient and safe antiviral treatment strategies. a straightforward approach to speed up drug development at lower costs is drug repurposing. here we investigated the therapeutic potential of targeting the host- sars-cov- interface via repurposing of clinically licensed drugs and evaluated their use in combinatory treatments with virus- and host-directed drugs. we tested the antiviral potential of repurposing the antifungal itraconazole and the antidepressant fluoxetine on the production of infectious sars-cov- particles in the polarized calu- cell culture model and evaluated the added benefit of a combinatory use of these host-directed drugs with remdesivir, an inhibitor of viral rna polymerase. drug treatments were well-tolerated and potent impaired viral replication was observed with all drug treatments. importantly, both itraconazole-remdesivir and fluoxetine-remdesivir combinations inhibited the production of infectious sars-cov- particles > % and displayed synergistic effects in commonly used reference models for drug interaction. 
itraconazole-remdesivir and fluoxetine-remdesivir combinations are promising therapeutic options to control sars-cov- infection and severe progression of covid- . the zoonotic coronavirus sars-cov- and the resulting covid- pandemic impressively the late endosome is an entry site for many zoonotically transmitted viruses, in particular for the results presented in this study strongly argue for the endolysosomal host-sars-cov- interface as a druggable target. however, host-directed drugs will rather suppress infection than completely eradicate the pathogen. the resulting demand for high drug doses and early and prolonged treatment is often associated with poor patient compliance. while drugs directly acting on virus structures are much more likely to completely eliminate the pathogens in shorter treatment time, emerging viral resistance to these antivirals is a major concern, as observed with the influenza neuraminidase inhibitor oseltamivir (kim et al., itraconazole antiviral activity in sars-cov- infected vero cells (fig. b) . of note, no detectable cytotoxicity was observed with these doses (fig. s a) . the itraconazole cytotoxic s a). the combination treatments were also well-tolerated, and no cytotoxic effects were seen when cells were simultaneously treated with the drug pairs (fig. s b, c) . for all drugs, we chose those concentrations that were not sufficient to achieve a % reduction when individually applied (fig a) . for both itrarem and fluorem combinations, a potent reduction in virus titers was detected in all cases. of note, several combinations yielded a reduction > % of the maximum virus titers produced in control cells (fig. b) . synergy scores were calculated with the lower concentration ranges of both drugs (fig. ) . the strong synergy led to an overall drug combination sensitivity score (css) of . , resulting in > % inhibition already at nm of remdesivir and nm of itraconazole. 
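the idea of scoring a drug pair against a reference model can be illustrated with a minimal sketch. it assumes bliss independence, one commonly used reference model for drug interaction; the text does not state which models were applied, so this choice is an assumption, and the function names and inhibition values are hypothetical.

```python
def bliss_expected(inhib_a, inhib_b):
    """Expected fractional inhibition (0 to 1) if the two drugs act independently."""
    for value in (inhib_a, inhib_b):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inhibition must be a fraction between 0 and 1")
    return inhib_a + inhib_b - inhib_a * inhib_b


def bliss_excess(inhib_a, inhib_b, inhib_combo):
    """Observed minus expected inhibition; positive values suggest synergy."""
    return inhib_combo - bliss_expected(inhib_a, inhib_b)


# Hypothetical single-drug inhibitions of 30% and 40%, observed combination 90%.
expected = bliss_expected(0.30, 0.40)
excess = bliss_excess(0.30, 0.40, 0.90)
```

with these made-up values the independent-action expectation is 58% inhibition, so an observed 90% gives an excess of 32 percentage points, which would be read as synergy under this model.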
fluorem combination treatment had a higher average synergy score, as well as a higher css score than itrarem ( . vs . ), suggesting that this drug combination is more likely to show synergy. importantly, for all models, the fluorem combinations that met the ≥ % inhibition criterion were well within the high synergy area (fig. ) . while remdesivir and the host-directed drugs itraconazole or fluoxetine target independent pathways, we found that drug combinations together with remdesivir (itrarem and fluorem) showed stronger antiviral activities against sars-cov- than the remdesivir monotherapy. moreover, the overall therapeutic effect of the combinations was larger than sars-cov- on virus entry and its immune cross-reactivity with sars-cov remdesivir inhibits sars-cov- in human lung cells and chimeric sars-cov expressing the sars-cov- rna polymerase in mice drug repurposing: progress, challenges and recommendations the many estimates of the covid- case fatality rate targeting the endolysosomal host-sars-cov- interface by clinically licensed functional inhibitors of acid sphingomyelinase (fiasma) including the antidepressant fluoxetine the clinically licensed antifungal drug itraconazole inhibits influenza virus in vitro and in vivo combinatory treatment with oseltamivir and itraconazole targeting both virus and host factors in influenza a virus infection host-directed drug targeting of factors hijacked by pathogens broad-spectrum antiviral gs- inhibits both epidemic and zoonotic coronaviruses comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against mers-cov dose-dependent pharmacokinetics of itraconazole after intravenous or oral administration to rats: intestinal first-pass antiviral drug resistance: mechanisms and clinical implications coronavirus membrane fusion mechanism offers a potential target for antiviral development triazoles inhibit cholesterol export from lysosomes by binding to npc therapeutic 
efficacy of the small molecule gs- against ebola virus in rhesus monkeys estimating clinical severity of covid- from the transmission dynamics in wuhan a pneumonia outbreak associated with a new coronavirus of probable bat origin reducing mortality from -ncov: host-directed therapies should be an option supporting figure analysis of the cytotoxicity of treatments. (a) itraconazole, (b) calu- cells were treated with the indicated drug combinations for h. bars display mean percentages of viable cells ± sem, with mean viability in solvent-treated control cells (c) set to % staurosporine (st)-induced cytotoxicity served as a positive control. n = , one-way anova followed by dunnett's multiple comparison test supporting figure analysis dose-response curve of remdesivir treatments in calu- cells were infected with . moi of sars-cov- for h and treated with the indicated drug combinations for h. mean percent inhibition ± sem of sars-cov- replication, with mean virus titer in control cells (treated with the solvent dmso) set to % logec and logec values were determined by fitting a non-linear regression model we thank jonathan hentrey for help with the plaque assays. this research was funded by grants from the german research foundation (dfg), crc "breaking barriers", project a (to u.r.) and b (to s.l.), crc "dynamic key: cord- - stbjl authors: liu, tianyuan; nogueira, leandro balzano; lleo, ana; conesa, ana title: transcriptional differences for covid- disease map genes between males and females indicate a different basal immunophenotype relevant to the disease date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: stbjl worldwide covid- epidemiology data indicate clear differences in disease incidence among sex and age groups. specifically, male patients are at a higher death risk than females. however, whether this difference is the consequence of a pre-existing sex-bias in immune genes or a differential response to the virus has not been studied yet. 
we created decovid, an r shiny app that combines gene expression data of different human tissues from the genotype-tissue expression (gtex) project and the covid- disease map gene collection to explore basal gene expression differences across healthy demographic groups. we used this app to study differential gene expression between men and women for covid- associated genes. we found that healthy women show higher expression of interferon genes and of the jak-stat pathway leading to cell survival. acknowledges that a vaccine solution for the general population might not be ready before mid or late . at the same time, very recent re-infection cases suggest that immunity may not always persist, and active disease surveillance of past cases may be required. as the second wave of the pandemic steadily progresses in europe and the us, possibly with new variants of the virus and affecting a more diverse and younger population, novel or improved treatment targets and options are likely to remain important for the management of the disease. one of the most intriguing aspects of covid- is the differing degree of severity with which it affects different people. although acknowledged risk factors, including pre-existing conditions such as cardiovascular disease (zheng et al., ; d. wang et al., ; chen et al., ), diabetes (apicella et al., ; fang et al., ; guo et al., , ), obesity (simonnet et al., ), age (liu et al., ; cdc covid- response team, ; litcovid) and sex (higher mortality in men than in women; gebhard et al., ; jin et al., ), suggest a relationship between physical condition and disease progression, the precise pathways of this relationship have not yet been clearly established. sex-associated differences between males and females that have been proposed to affect covid- incidence include lifestyle (e.g., smoking and drinking) and mental health. 
the proportion of smokers is higher among males than among females, and smoking may be a risk factor for severe covid- (cai, , ; jamal et al., ; grundy et al., ). men also differ from women during the pandemic in sleep and signs of depression (c. . a recent study showed that male and female covid- patients differed in their immune response, with the former showing a stronger cytokine response and the latter a higher t-cell activation pattern (takahashi et al., ). interestingly, the pattern of covid- risk factors is not fully shared with other similar recent pandemics such as sars, mers, and h n . for example, sars was more frequent among healthy young people (yin and wunderink, ). age and complications, but not gender, were the most significant risk factors for mortality in the arab and south korean studies of mers (park et al., ). several studies on h n influenza identified younger age, chronic conditions, and female sex as risk factors for the disease (viasus et al., ). this suggests that the observed covid- risk factors, especially those associated with age and sex groups, might be the result of specific interactions of sars-cov- with intrinsic physiological characteristics, including the basal immunophenotype, of these population groups. however, which specific immunological characteristics constitute such a pre-existing condition, and how this relates to covid- , remain to be explored. the covid- disease map is a valuable resource to investigate the molecular responses to sars-cov- infection and understand the biological pathways leading to the severe manifestations of the disease. this database also creates an opportunity for interrogating existing molecular data on differences associated with covid- risk factors in the general population. in this paper, we present the decovid app, a shiny app, to explore basal expression level differences in covid- disease map genes between men and women and between age groups. 
we used data from the gtex database, which contains rna-seq profiles for hundreds of demographically diverse healthy individuals in multiple tissues and allows us to interrogate covid- genes globally or individually. we used this resource to study basal expression differences in covid- genes between men and women and found that a different immunological state is prevalent in each sex. interestingly, some of these differential pathways have been shown to be relevant for disease severity progression and to be characteristic of the sex-biased immune response to the virus, providing grounds for hypotheses on the molecular basis of the covid- sex-bias. we anticipate that the decovid app could be a useful tool for researchers to explore the molecular etiology of covid- demographic differences. the decovid shiny app combines a selection of human tissue-specific gtex data with the covid- disease map database to allow quick exploration of basal gene expression values and differences in the healthy human population for genes described to be important for covid- . we included data from blood, lungs, heart, kidney, stomach, and brain because these tissues have been described as affected by sars-cov- infection. sample sizes differ among tissues in the gtex datasets. the largest dataset is whole blood with samples, while the smallest is stomach tissue with samples (figure a). the total number of genes collected from the covid- disease map is , . a gene ontology (go) term enrichment analysis indicates that genes related to cytokines, response to virus, innate immune response, transcriptional regulation, and kinase activity populate this collection (figure b). the decovid app is a user-friendly implementation that allows researchers without strong bioinformatics skills to analyze these data (figure ). extensive documentation and video tutorials are provided to facilitate a quick start. 
users should indicate the demographic factor for differential expression analysis. if age is selected, the age threshold value used to classify samples as old or young should be specified. once a significance p.value and a fold-change threshold are provided, the app computes differential expression and provides results as a list of genes. users can then explore the distribution of gene expression values for specific genes of the covid- disease map collection or investigate the enrichment of specific immune response-related functions among genes with sex or age expression biases.

screenshot of the decovid app showing on the left dialog panels for input parameters (contrast type and significance thresholds), and on the right results panels with list of differentially expressed genes, gene-specific expression plots and enrichment analysis.

using decovid, we analyzed the extent of gene expression differences among demographic groups across different covid- affected tissues (table ). the analysis revealed a wide range of significant gene expression differences for covid- genes when comparing sexes and age groups in all evaluated tissues. the number of differences was lower when imposing a fold-change threshold of . on the magnitude of the mean expression level. whole blood was the tissue with the highest number of expression differences for covid- genes, followed by the brain. from this simple analysis, we concluded that the covid- disease map represents a disease signature with significant differences across demographic groups that may be worth exploring for hypothesizing on the molecular basis for differences in disease incidence among these groups. 
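the gene selection described above, a significance threshold combined with a fold-change threshold, can be sketched as follows. this is an illustrative python sketch and not the decovid implementation, which is an r shiny app built on edger; it assumes benjamini-hochberg adjustment of the p-values, and the function names, gene labels, p-values and thresholds are all hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(results, alpha, min_fold_change):
    """Keep genes passing both the adjusted p-value and fold-change cut-offs.

    results: list of (gene, p_value, fold_change). A fold change below 1
    means lower expression in the first group, so the absolute log2 value
    is compared with log2(min_fold_change); min_fold_change should be >= 1.
    """
    adjusted = benjamini_hochberg([p for _, p, _ in results])
    min_log2 = math.log2(min_fold_change)
    return [
        gene
        for (gene, _, fc), q in zip(results, adjusted)
        if q < alpha and abs(math.log2(fc)) >= min_log2
    ]


# Hypothetical input: labels and numbers are made up for illustration.
toy = [
    ("geneA", 0.001, 2.0),
    ("geneB", 0.04, 1.1),
    ("geneC", 0.01, 0.4),
    ("geneD", 0.8, 3.0),
]
selected = select_genes(toy, alpha=0.05, min_fold_change=1.5)
print(selected)
```

working on the log2 scale treats up- and down-regulation symmetrically, so a gene expressed at 0.4 times the reference level passes a 1.5-fold cut-off just as a gene at 2.5 times would.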
decovid identified basal sex-biased expression differences in immune-response genes associated with differential disease incidence between men and women

since covid- fatality rates are consistently higher in men than in women in many countries worldwide (supplementary figure ), we hypothesized that sex is a pre-existing condition that influences the severity of the disease and that the study of basal expression differences in covid- relevant genes in healthy individuals could reveal insights into the molecular basis of these differences. we used the decovid app to explore these differences and concentrated on blood tissue to capture systemic immune responses. a total of genes (p.value < . and fold change > . ) were differentially expressed between men and women in blood samples, regardless of the age group (figure a). we found high expression in females of genes involved in the innate immune response, particularly type i interferons, such as ifna , ifna , ifne, ifit , and ifit , anti-apoptotic regulators such showed that female patients have a high t-cell level (takahashi et al., ), which was postulated as a critical factor for the differential fatality incidence between sexes. therefore, with a high level of baseline expression of type interferons, female patients are more likely to maintain a high t-cell level in the early stage, which might lead to a milder evolution of the disease. here we speculate that these innate immunity pathways may play a crucial role in the early control of respiratory coronavirus infection (figure b). moreover, the data indicate higher expression in women of jak, multiple members of the stat family, and cell survival genes such as mcl and bcl a. 
it has been shown that the jak-stat signaling pathway mediates the synthesis of anti-apoptotic regulators such as mcl , bcl a, and bcl l upon induction of high type interferon levels (sepúlveda et al., ), which agrees with the observed gene expression phenotype in females. finally, we detected significantly increased expression of the tumor necrosis factor gene tnfsf b in females. this gene stimulates the synthesis of chemokines ccl and cxcl through the nfkb subunit pathway, leading to b-cell homing through the synthesis of ccl (lópez-giral et al., ), lymphopoiesis through the synthesis of cxcl (silva et al., ; sen, ), and development and survival through the synthesis of tnfsf b (caamaño et al., ; gerondakis and siebenlist, ). since b-cells are part of the humoral immunity component of the adaptive immune system and are responsible for mediating the production of antigen-specific immunoglobulins (ig) directed against invasive pathogens, higher levels of these chemokines in females might lead to a faster response against a potential viral infection. in males, in turn, the mentioned processes were comparatively downregulated, while the only processes upregulated compared to females were those related to the expression of pro-inflammatory cytokines (figure b), particularly il , il and il . also, chemokine cxcl levels are higher in men, which supports the hypothesis that the first stage of the male response to a viral infection is more related to inflammation and th cell differentiation processes. here we present the decovid app as a resource to explore sex and age differential expression patterns in the healthy population for genes described to be involved in covid- disease pathways. the gtex data used in this work have recently been mined for sex-specific expression differences, with the conclusion that these are tissue-specific and generally small, as also observed in our analyses (oliva et al., ). 
this study focused on the sex-specific genetic effects on gene expression associated with complex genetic traits. in our application we repurpose the valuable gtex database for the study of covid- relevant pathways while providing a mechanistic model to interpret differential expression results. we used decovid to investigate the gene expression differences between men and women that could explain differences in disease severity between these demographic groups, with men being more likely to have a fatal outcome than women. we found that two major immune response pathways were differentially expressed between men and women in the healthy population. women showed higher expression of interferon genes, while men showed higher levels of pro-inflammatory th cytokine genes. these two pathways have been proposed to be critical factors for a fatal response to the virus infection, with patients that present an interferon-mediated response having a better prognosis than those responding with a massive cytokine activation (blanco-melo et al., , ; hadjadj et al., ). in agreement, animal models of sars-cov- and mers-cov infection showed that failure to elicit an early type interferon response correlates with disease severity (channappanavar et al., ). perhaps more importantly, these models demonstrate that timing is key, as type interferons are protective at the early stages of a viral disease. on the other hand, clinical studies showed differences in the immunophenotype between female and male covid- patients, with women having a more robust t-cell activation and male patients presenting a higher plasma level of pro-inflammatory cytokines (takahashi et al., ). in this study we showed that these immune system differences, which are critical to covid- progression, represent a sex-associated pre-existing condition that is likely to create a differential predisposition to disease outcome between men and women and may explain the observed sex-bias in covid- mortality. 
future studies should address whether these gene expression patterns translate into functional differences in patients infected with sars-cov- , whether they have prognostic value, and whether they present any relationship with hormonal levels, which have also been linked to covid- severity (strope et al., ; channappanavar et al., ; scully et al., ). altogether, our results show that decovid is a useful resource to investigate the status of covid- genes across demographic groups and human tissues and to postulate on the differential pathways to disease operating in each case. we used rna-seq data from the genotype-tissue expression project (gtex), which contains data from tissue sources obtained from deceased but considered healthy individuals of both sexes and a wide range of ages (lonsdale et al., ). the gene read count matrix and the annotation files were obtained from the gtex portal (https://www.gtexportal.org/home/datasets). in order to provide a homogeneous set of human tissue reference samples, gtex data were analyzed for possible biases, and samples belonging to individuals with a "ventilator case" label as cause of death were discarded, as they clearly segregated from the remaining samples on a principal component analysis plot (supplementary figure ). the covid- disease map genes were downloaded from https://github.com/wikipathways/cord- as available on march th, . the list is mined from pmc papers in the covid- open research dataset (cord- ) using machine learning approaches, so the genes are positively related to the covid- disease process (lu . the decovid software is a shiny app written in r with a user-friendly interface. the app can be installed through a docker image or downloaded directly from github (see supplementary materials for installation). decovid already integrates all essential data from gtex, and no additional downloads are necessary. 
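the sample exclusion step described above, discarding samples from donors labelled "ventilator case", can be sketched as a simple filter over sample annotations. this is a minimal python illustration and not the authors' code; the dictionary layout and the "death_class" field name are hypothetical stand-ins for the actual gtex annotation columns.

```python
def drop_ventilator_cases(samples, label="ventilator case"):
    """Return the samples whose death classification differs from `label`.

    Each sample is a dict. The "death_class" key is a hypothetical field
    name used for this sketch, not the real GTEx annotation column.
    """
    wanted = label.strip().lower()
    return [
        s for s in samples
        if s.get("death_class", "").strip().lower() != wanted
    ]


# Hypothetical sample annotations for illustration.
annotations = [
    {"sample_id": "S1", "tissue": "whole blood", "death_class": "Ventilator Case"},
    {"sample_id": "S2", "tissue": "whole blood", "death_class": "fast death"},
    {"sample_id": "S3", "tissue": "lung", "death_class": "slow death"},
]
kept = drop_ventilator_cases(annotations)
print([s["sample_id"] for s in kept])
```

the comparison is case-insensitive so that differently capitalized labels are treated alike; samples without the field are kept, which may or may not be the desired behavior for a real annotation table.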
differential gene expression is calculated using edger (robinson et al., ) and multiple testing correction is applied following the benjamini and hochberg (bh) method (benjamini and hochberg, ). results are provided as lists of differentially expressed genes, heatmaps of sex and age mean expression values, and gene-specific boxplots showing the distribution of expression values across demographic groups. go enrichment analysis of significant gene sets is provided through the clusterprofiler r package and uses the list of covid- disease map genes as a reference set (yu et al., ). covid- in people with diabetes: understanding the reasons for worse outcomes covid- disease map, building a computational repository of sars-cov- virus-host interaction mechanisms mers transmission and risk factors: a systematic review edger: a bioconductor package for differential expression analysis of digital gene expression data considering how biological sex impacts immune responses and covid- outcomes control of b lymphocyte apoptosis by the transcription factor nf-kappab bcl- expression is mainly regulated by jak/stat pathway in human cd + hematopoietic cells the tlr -trif pathway protects against h n influenza virus infection transcription factors of the alternative nf-κb pathway are required for germinal center b-cell development high prevalence of obesity in severe acute respiratory syndrome coronavirus- (sars-cov- ) requiring invasive mechanical ventilation sexx matters in infectious disease pathogenesis are sex discordant outcomes in covid- related to sex hormones? 
sex differences in immune responses that underlie covid- disease outcomes factors associated with severe disease in hospitalized adults with pandemic (h n ) in spain immediate psychological responses and associated factors during the initial stage of the coronavirus disease (covid- ) epidemic among the general population in china clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan mers, sars and other coronaviruses as causes of pneumonia key: cord- - y n authors: xu, cong; wang, yanxing; liu, caixuan; zhang, chao; han, wenyu; hong, xiaoyu; wang, yifan; hong, qin; wang, shutian; zhao, qiaoyu; wang, yalei; yang, yong; chen, kaijian; zheng, wei; kong, liangliang; wang, fangfang; zuo, qinyu; huang, zhong; cong, yao title: conformational dynamics of sars-cov- trimeric spike glycoprotein in complex with receptor ace revealed by cryo-em date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: y n the recent outbreaks of severe acute respiratory syndrome coronavirus (sars-cov- ) and its rapid international spread pose a global health emergency. the trimeric spike (s) glycoprotein interacts with its receptor human ace to mediate viral entry into host-cells. here we present cryo-em structures of an uncharacterized tightly closed sars-cov- s-trimer and the ace -bound-s-trimer at . -Å and . -Å-resolution, respectively. the tightly closed s-trimer with inactivated fusion peptide may represent the ground prefusion state. ace binding to the up receptor-binding domain (rbd) within s-trimer triggers continuous swing-motions of ace -rbd, resulting in conformational dynamics of s subunits. noteworthy, sars-cov- s-trimer appears much more sensitive to ace -receptor than sars-cov s-trimer in terms of receptor-triggered transformation from the closed prefusion state to the fusion-prone open state, potentially contributing to the superior infectivity of sars-cov- . 
we defined the rbd t -t loop and residue y as viral determinants for specific recognition of sars-cov- rbd by ace , and provided a structural basis for the enhanced infectivity induced by the spike d g mutation. our findings offer a thorough picture of the mechanism of ace -induced conformational transitions of s-trimer from ground prefusion state towards postfusion state, thereby providing important information for development of vaccines and therapeutics aimed to block receptor binding. coronaviruses are a family of large, enveloped, positive-stranded rna viruses that cause upper respiratory, gastrointestinal and central nervous system diseases in humans and other animals (song et al., ; walls et al., ). in the past few decades, newly evolved coronaviruses have posed a global threat to public health, including the outbreaks of the severe acute respiratory syndrome coronavirus (sars-cov) in - and the middle east respiratory syndrome coronavirus (mers-cov) in , which caused thousands of infections, with mortality rates of about % and . %, respectively (rabaan et al., ). the recent coronavirus disease pandemic is caused by a novel coronavirus named severe acute respiratory syndrome coronavirus (sars-cov- ). as of june , , there had been , , laboratory-confirmed sars-cov- infections globally, leading to , deaths. to date, there are no approved therapeutics or vaccines against sars-cov- and other human-infecting coronaviruses. as in other coronaviruses, the spike (s) glycoprotein of sars-cov- is a membrane-fusion machine that mediates receptor recognition and viral entry into cells and is the primary target of the humoral immune response during infection (rabaan et al., ; tang et al., ). the s protein is a homotrimeric class i fusion protein that forms large protrusions from the virus surface and undergoes a substantial structural rearrangement to fuse the viral membrane with the host-cell membrane once it binds to a host-cell receptor (bosch et al., ; li, ). 
the s protein ectodomain consists of a receptor-binding subunit s and a membrane-fusion subunit s (tang et al., ; walls et al., ; wrapp et al., ) . two major domains in coronavirus s have been identified, including an n-terminal domain (ntd), and a c-terminal domain (ctd) also called receptor binding domain (rbd). following the rbd, s also contains two sub-domains (sd and sd ) . the s contains a variety of motifs, starting with the fusion peptide (fp). the fp describes a short segment, conserved across the viral family and composed of mostly hydrophobic residues, which inserts in the hostcell membrane to trigger the fusion event (epand, ; tang et al., ) . recent cryoelectron microscopy (cryo-em) studies on the stabilized ectodomain of sars-cov- s protein revealed a closed state of s trimer with three rbd domains in "down" conformation (walls et al., ) , as well as an open state with one rbd in the "up" conformation, corresponding to the receptor-accessible state (walls et al., ; wrapp et al., ) . unlike in mers-cov s protein (pallesen et al., ) , the two or three rbd "up" conformation has not been detected for sars-cov- s trimer. sars-cov- s and sars-cov s share % amino acid sequence identity, yet, they bind the same host-cell receptor-human angiotensin-converting enzyme (ace ) (hoffmann et al., ; wang et al., ; zhou et al., ) . it is usually considered that the transition process towards the postfusion conformation is triggered when the s subunit binds to a hostcell receptor; receptor binding destabilizes the prefusion trimer, resulting in shedding of the s subunit and transition of the s subunit to a stable postfusion conformation (walls et al., b) . 
the available crystal structures of the rbd domain of sars-cov- interacting with the extracellular peptidase domain (pd) of ace , together with the cryo-em structure of the rbd domain associated with the full-length ace , provided important information on the rbd-ace interaction interface, revealing that the residues s to q , known as the receptor-binding motif (rbm), within rbd directly interact with ace (lan et al., ; wang et al., ; yan et al., ). however, a complete picture of ace associating with the sars-cov- trimeric s protein is still missing, and it remains unclear how ace binding induces sars-cov- s trimer conformational destabilization to facilitate transitions towards the postfusion state. here, we present cryo-em structures of sars-cov- s trimer in a tightly closed state, and the s trimer in complex with the receptor ace (termed sars-cov- s-ace ) at . Å and . Å resolution, respectively, in addition to an s trimer structure in the unliganded open state. the tightly closed ground prefusion state, with its originally dominant population, may indicate a conformational masking mechanism of immune evasion for sars-cov- spike. our data suggest that one rbd is in the "up" conformation and is trapped by ace in the s-ace complex; ace can greatly shift the conformational landscape of s trimer, and trigger continuous swing motions of ace -rbd in the context of the s trimer, resulting in conformational dynamics in s subunits. we demonstrated the rbm t -t loop and residue y as viral determinants for specific recognition of sars-cov- rbd by ace . our findings provide a blueprint for the understanding of the mechanisms of ace -induced conformational dynamics and the resulting conformational transitions of the s trimer towards the postfusion state, which may benefit anti-sars-cov- drug and vaccine development. a prefusion-stabilized ectodomain trimer of sars-cov- s glycoprotein was produced from hek f cells using the strategy also adopted in other studies (fig. 
s a ) (kirchdoerfer et al., ; miroshnikov et al., ; pallesen et al., ; tortorici et al., ; walls et al., a; walls et al., ; walls et al., ; walls et al., ; wrapp et al., ) , and was subjected to cryo-em single-particle analysis ( fig. s a-b ). our initial reconstruction suggested a preferred orientation problem associated with the s trimer (highly preferred "side" orientation but lacking tilted top views, fig. s c ), which is also the case for the influenza hemagglutinin (ha) trimer (but highly preferred "top" orientation) (tan et al., ) . to overcome this problem, we adopted the recently developed tilt stage strategy in data collection with additional data collected at º and º tilt angles (tan et al., ) . this allowed us to obtain a cryo-em structure of sars-cov- s trimer in a closed state at . Å resolution (with imposed c symmetry, termed s-closed) (figs. a, and s -s , movie ). excitingly, after overcoming the preferred orientation problem, our s-closed map very well resolved the peripheral edge of the ntd domain ( fig. a-c) , which was less well resolved in the recent reports (walls et al., ; wrapp et al., ) . this enabled us to build a more complete model of the sars-cov- s trimer containing the previously missing loop regions (including q -p , k -f , y -n , q -n , r -s , and s -a , fig. b, s g) ; additionally, the s -c loop in the rbm subdomain was also captured in our structure ( fig. d) . interestingly, compared with the recent closed state sars-cov- s trimer structure (walls et al., ) , our map represents an uncharacterized tightly closed conformation. for instance, the upper portion of s subunit especially ntd and rbd depicts an anti-clockwise rotation of . º and . º, respectively (fig. e ). accompanying this rotation, there is a slight inward tilt leading the peripheral edge of ntd exhibiting a . Å inward movement for ca of t (fig. s g ). these motions can be propagated to the central helix (ch) of s subunit, generating a clockwise rotation of . º (fig. 
e ). the clockwise rotation of this central portion, together with the opposite anti-clockwise rotation of the outer portion, in effect twists the complex into a more compact conformation. indeed, the average interaction interface between protomers increased from ~ , . Å in their structure to , . Å in our structure (fig. f ). taken together, our map represents a tightly closed state of the sars-cov- s trimer, not captured before. furthermore, when comparing our sars-cov- s-closed structure with the closed state sars-cov s trimer cryo-em structure (gui et al., ), there is an anti-clockwise rotation of . º and . º in ntd and rbd, respectively, and a clockwise rotation of . º in the ch region from their structure to our s-closed structure, associated with an rbd inward shift towards the central axis (rmsd of . Å, fig. s h ). collectively, our s-closed structure appears more compact than that of sars-cov s trimer ( , . Å vs. , . Å in interaction interface, fig. f ). altogether, our study revealed a tightly closed conformation of sars-cov- s trimer, not observed in the homologous sars-cov s either, extending the detected conformational space of sars-cov- spike protein.

the tightly closed state with stably packed fusion peptide may represent the ground prefusion state of sars-cov- s trimer

the hydrophobic fusion peptide, immediately after the s ' cleavage site and essential for host-cell membrane fusion, is highly conserved among sars-cov- , sars-cov, and mers-cov s proteins (tang et al., ). still, the majority of fp is missing in the available sars-cov- s trimer structures. thus, how it folds, where it is located within the s trimer of the virus, and how it can be activated remain unclear. here, our s-closed map enabled us to capture the entire fp of sars-cov- including the previously undetected l -q fragment, which is located on the flank surface of s trimer, surrounded by hr of s subunit from the same protomer, and sd /sd of s subunit from the clockwise neighboring protomer (fig. g-h). 
the fp fragment is well ordered, forming two small helices (y -g , l -f ) and connecting loops (fig. g-h) . this observation further substantiates the notion that our s-closed structure with inactivated fp most likely represents the ground prefusion state. further interaction analysis revealed that sd and hr can form hydrogen bonds/salt bridges with the fp fragment, and that sd plays a key role in this interaction, being involved in predicted hydrogen bonds/salt bridges (table s ) . notably, among the sd -fp interactions, d from sd contributes to the formation of hydrogen bonds/salt bridges, mainly through its side-chain atoms, with k , y and k of fp, suggesting that d may be essential for the interaction with and stabilization of fp (fig. i and table s ). this could be related to the recent reports suggesting that the d g mutation of sars-cov- s enhanced viral infectivity (more in discussion) (korber et al., ) . interestingly, it appears that before being activated, fp could serve as a linkage that wraps around the neighboring protomers in their s /s interface and simultaneously connects s with s , thereby coordinately locking the s trimer in the tightly closed ground prefusion state ( fig. g-h) . moreover, in this dataset the dominant population of the particles (~ %) is in the tightly closed state; despite multiple rounds of d classification, we eventually found only a minor population ( %) of the particles in the open state (fig. s ). our observations indicate that the open state of sars-cov- s might be intrinsically dynamic and exist only transiently to expose the rbd domain. interestingly, the dominant population of the sars-cov- s trimer is in the ground prefusion state with inactivated fp and all the rbd domains buried, which may result in "conformational masking" that prevents antibody binding and neutralization, similar to that described for hiv- envelope (env) (kwong et al., ; munro et al., ) . 
the population distribution of the closed and open states of sars-cov- s varies among different studies (walls et al., ; wrapp et al., ) , which is reminiscent of observations made with sars-cov s and mers-cov s trimers. this observed variation could potentially be due to subtle differences in the chemical conditions used by different research groups (gui et al., ; kirchdoerfer et al., ; pallesen et al., ; song et al., ; walls et al., ; yuan et al., ) . to gain a thorough picture of how binding of the receptor ace induces conformational dynamics of the sars-cov- s trimer and triggers the transition towards the postfusion state, we determined the cryo-em structure of the sars-cov- s trimer in complex with the human ace pd domain to . Å resolution (termed sars-cov- s-ace , figs. a, s a-e, and s ). further focused refinements improved the resolution of the s trimer portion of the map to . Å, and the connectivity in the ace -rbd portion of the map, respectively (fig. s e, s ) . we then built a pseudo atomic model of the complex with the combined map information (fig. b ). to the best of our knowledge, the structure of the sars-cov- s-ace complex has not been reported before. in this dataset we additionally captured an unliganded s trimer in the open state with one rbd up (resolved to . Å resolution, termed s-open), but did not detect the closed state . we should mention that our bio-layer interferometry (bli) assay revealed relatively rapid dissociation kinetics between ace and the s trimer (koff = . x - s - , fig. s e ). we therefore determined the complex structure in the presence of a trace amount of the cross-linker glutaraldehyde (methods). additionally, we also determined the s-ace complex structure without cross-linker at . Å resolution; the two maps are in comparable conformations, suggesting that addition of the cross-linker did not change the conformation of the complex (fig. s g ). we then used the s-ace map at . Å resolution for detailed structural analysis. 
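as a rough illustration of how a dissociation rate constant such as the koff quoted above can be estimated from the dissociation phase of a bli trace, the sketch below assumes a simple 1:1 binding model in which the response decays as R(t) = R0 * exp(-koff * t). the function name `estimate_koff`, the sampling interval, and the rate constant used to generate the synthetic trace are all hypothetical and chosen only for illustration; this is not the fitting procedure of the octet analysis software used in this study.

```python
import math


def estimate_koff(times, responses):
    """Estimate koff (1/s) and R0 from a dissociation trace.

    Assumes single-exponential decay, R(t) = R0 * exp(-koff * t), so that
    ln R(t) is a straight line in t with slope -koff.
    """
    if len(times) != len(responses) or len(times) < 2:
        raise ValueError("need at least two matched (time, response) points")
    if any(r <= 0 for r in responses):
        raise ValueError("responses must be positive for a log-linear fit")

    logs = [math.log(r) for r in responses]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n

    # Ordinary least-squares line through (t, ln R).
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)


# Synthetic, noise-free trace; 5e-3 1/s and 1.2 nm are illustrative values only.
times = [5.0 * i for i in range(61)]
trace = [1.2 * math.exp(-5e-3 * t) for t in times]
koff, r0 = estimate_koff(times, trace)
```

with real sensorgrams the log-linear fit is sensitive to baseline drift and noise near zero response, so a nonlinear fit over both association and dissociation phases would usually be preferred; the sketch only shows why a fast-decaying dissociation phase corresponds to a large koff.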
to inspect the conformational changes from the closed state to the unliganded open state, we first overlaid our s-open and s-closed structures. in the s-open structure, the only up rbd domain from protomer (termed rbd- ) shows a . º upwards/outwards rotation, resulting in an exposed rbm region accessible for ace binding (fig. c ). this rbd- rotation can be propagated to the underlying sd , inducing a downwards movement of sd (fig. c) . we also noticed a considerable clockwise rotation of . º, . º, and . º in ntd for protomer , , and , respectively, and anti-clockwise rotations in the ch of the corresponding s subunit, greatly untwisting the s trimer from the tightly closed state (fig. d ). associated with this s untwisting, there is a downwards/outwards movement of the ntds on the scale of ~ Å (fig. d, right panel) . these combined untwisting motions could weaken the original inter-protomer interactions, which would favor the transient raising of the rbd. moreover, our local resolution analysis of the s-open map also suggested that, other than rbd- , the consecutive rbd- also exhibits considerable dynamics (fig. s d ). our sars-cov- s-ace structure revealed that the s trimer binds one ace through the only up rbd domain, while the other two rbds remain in the down conformation ( fig. a-b) , suggesting that ace binding to sars-cov- strictly requires the up conformation of the rbd. unlike the observations made with sars-cov and mers-cov s trimers, we did not detect an s trimer with two rbd domains up with bound ace (kirchdoerfer et al., ; pallesen et al., ) . although our s-ace and s-open structures generally resemble each other, especially in the s region, there are noticeable differences in the s region. specifically, after ace binding, the up rbd- of the s-open state is tilted slightly downwards, with the angle to the horizontal plane of the s trimer reduced from . º to . º in the ace bound state (fig. e ). 
this ace binding induced motion of rbd- could be propagated to the neighboring rbd- and the consecutive rbd- (rmsd: . Å, fig. f ), collectively disturbing the allosteric network of the fusion machinery. indeed, the neighboring protomer interaction interface was reduced from the original ~ . Å in the s-closed state to . ~ . Å in the ace bound state (fig. g ). altogether, these s subunits untwisting and rbd- tilting motions could destabilize the prefusion state of s trimer, prepared for the subsequent conformational transitions towards the postfusion state. interestingly, our s-ace structure showed that the core region of the up rbd- and the rbm t -f loop of the neighboring rbd- could form aromatic interactions with the involvement of y /f from rbd- and f /y from rbd- ( fig. h ), potentially enhancing interactions between neighboring s subunits, thus beneficial for subsequent simultaneous release of s subunits. this interaction was not detected in the counterpart of the homologous sars-cov s-ace structure, likely due to longer distance between the adjacent "up" and "down" rbds in that structure (kirchdoerfer et al., ; song et al., ) . noteworthy, the originally stably packed fp from protomer surrounded by sd /sd of the neighboring protomer in the s-closed structure is now mostly missing in the s-ace structure, which is also the case in the s-open structure. this is mostly caused by the s trimer untwisting-motion induced downwards shift of sd (fig. c , i). indeed, the b /b strands within sd shift downwards for up to . Å; consequently, the c and t from b and the connecting loop could clash with the y and l of the originally packed a helix of fp ( fig. i ), potentially resulting in destabilization and activation of the fp motif from protomer . 
since the untwisting/downwards-shift motions of s subunits are allosterically coordinated within the s trimer in its opening process, the density corresponding to fps in protomer and are also missing, indicating a coordinated activation mechanism of fp, which may be one of the key elements prepared for the subsequent fusion of s trimer. according to our sars-cov- s-ace cryo-em structure, the overall ace -rbd interaction interface is comparable to that of the crystal structures of the rbd domain of sars-cov- s interacting with the ace pd domain (fig. a ) (lan et al., ; wang et al., ) , i.e. our structure revealed residues of rbd are in contact with residues of ace with a distance cut-off of Å (table s ). sequence alignment demonstrated that the rbm t -f loop is the most diversified region between sars-cov- and sars-cov s proteins (fig. s ). in line with this, structural comparison revealed that the conformation of the rbm t -f loop in our sars-cov- s-ace structure is very distinct from that in the sars-cov rbd-ace crystal structure ( fig. b ) (li et al., ) . noteworthy, the rbm t -f loop can originally be resolved in our s-closed structure, but is mostly missing in our s-open structure, indicating the t -f loop may be activated in the open state. in our s-ace structure, a portion of this loop forms contact with the n-terminal helix of ace (fig. a ), for instance, a within this loop could interact with s /t of ace (table s ), suggesting that the rbm t -f loop may play an important role in receptor recognition. moreover, the s-ace structure indicated that the q -y region located in the other edge of rbm could also form close contact with ace , i.e. y could form hydrogen bonds/contacts with to further define the subdomains/residues critical for rbd binding to ace , we designed and produced three sars-cov- rbd mutant proteins, each of which had a single subdomain substituted with the counterpart of sars-cov. 
these rbd mutants were termed rbd-(core), rbd-(rbm-r ) and rbd-(rbm-r ), which harbored r to n of the core region, l to k , and t to t of the rbm from sars-cov, respectively (figs. c and s ). results from ace -binding enzyme linked immunosorbent assay (elisa) showed that the binding activity of the three rbd mutants towards anti-rbd polyclonal antisera and the crossreactive monoclonal antibody a was comparable to that of the wildtype sars-cov- rbd protein ( fig. c) , indicating that the mutations did not significantly affect the overall conformation of the rbds. the mutants rbd-(core) and rbd-(rbm-r ) bound ace as efficiently as the wildtype rbd; in contrast, rbd-(rbm-r ) completely lost ace -binding ( fig. c ). these results pinpoint the rbm-r region (residues -teiyqagst- ) as the critical viral determinant for specific recognition of sars-cov- rbd by the ace receptor. additionally, we constructed three single-point mutants of sars-cov- rbd protein, rbd (q a), rbd (v a), and rbd (y a). our elisa ace -binding assay showed that the mutation y a was sufficient to completely abolish the binding of ace , while the other two mutations did not show such effect (fig. d) , demonstrating that the residue y of sars-cov- rbd is a key amino acid required for ace receptor binding. our sars-cov- s-ace map showed well defined density for the s trimer region, but relatively lower local resolution in the associated ace -rbd region (fig. s c) , suggesting considerable conformational heterogeneity of ace -rbd as well as relative dynamics between ace -rbd and the remaining part of s trimer with respect to each other. this is in line with the report showing that in sars-cov s trimer, the associated ace -rbd is relatively dynamic, showing three major conformational states with the angle of ace -rbd to the surface of s trimer at ~ º, º, and º, respectively (song et al., ) . 
to better delineate the conformational space of the ace engaged sars-cov- s trimer, we performed multi-body refinement in relion . (fernandez-leiro and scheres, ) . principal component analysis of the movement revealed that approximately % of the movement of the complex is described by the first three eigenvectors, which represent swing motions in distinct directions relative to the s trimer (fig. a) . eigenvector describes a swing motion of ace -rbd towards the rbd- direction with an angular range of . º, eigenvector corresponds to a swing motion of ace -rbd towards the original location of rbd- with an angular range of . º, and eigenvector describes a swing motion of ace -rbd along the ntd- to ntd- direction with an angular range of . º (fig. b) . histograms of the amplitudes along the three eigenvectors are unimodal, indicative of continuous motions (fig. c ). as the dynamic motions in the complex are formed by linear combinations of all eigenvectors, these data suggest that ace -rbd swings on top of the s trimer in a non-correlated manner. moreover, multi-body analysis of the non-cross-linked sars-cov- s-ace data showed similar swing motions (fig. s h) , indicating that the presence of the cross-linker did not disturb the mode of ace -rbd motions within the s trimer. additionally, compared with the homologous sars-cov s-ace complex, which shows discrete movements of ace -rbd in one direction (similar to our eigenvector direction) (song et al., ) , ace binding to sars-cov- s induces more complex, combined continuous swing motions of ace -rbd within the complex. taken together, our observations suggest that ace receptor binding to sars-cov- s triggers considerable conformational dynamics in s subunits that could destabilize the prefusion s trimer. indeed, the b-factor distribution of our s-ace complex demonstrated that ace binding induces strikingly enhanced dynamics in the s region including rbd and ntd domains (fig. 
d) , facilitating the release of the associated ace -s component and transitions of the s subunit towards a stable postfusion conformation. indeed, we found a notable drop in the interaction surface between s and s subunits from the s-closed state ( . Å ) to the s-ace state ( . Å ). it has been suggested that the large number of n-linked glycans covering the surface of the spike protein of sars-cov and mers-cov could pose challenge to antigen recognition, thus may help the virus evade immune surveillance yuan et al., ) . similar to sars-cov s, sars-cov- s also comprises n-linked glycosylation, with glycans in the s subunit and the other in the s subunit ( fig. a ) (walls et al., ; watanabe et al., ) . in our s-closed structure, we resolved the density for n-linked glycans per protomer (fig. a-b and s i ), including two undetected glycans at site n and n located in the ntd (fig. b) , while the three glycans located in the flexible c-terminal region are missing as in other studies (walls et al., ; wrapp et al., ) . similar to mers-cov and sars-cov s trimers (walls et al., b; walls et al., ; yang et al., ) , sars-cov- s trimer also forms a glycan hole at proximity of the s /s cleavage site and the fusion peptide (near the s ' cleavage site, fig. b) . although there is an extra glycan at n site near the s /s cleavage site in sars-cov- s, the hole region is still more sparsely glycosylated than the rest of the protomer. this glycan hole might be important for permitting the access of activating host proteases and for allowing membrane fusion to take place without obstruction (walls et al., b; walls et al., ; yang et al., ) . moreover, after ace binding, our s-ace structure revealed that the density corresponding to glycan at n site is weaker in protomer , while the other resolved glycans in the s-closed state can also be visualized in the s-ace structure (fig. c ). the outbreak of covid- caused by sars-cov- virus has become pandemic. 
several structures of sars-cov- spike rbd domain bound to ace have been reported (lan et al., ; wang et al., ; yan et al., ) . however, the complete architecture of sars-cov- trimeric s in complex with ace remains unavailable, leading to an incomplete understanding of the nature of this interaction and of the resulted conformational transitions of the s trimer towards postfusion and virus entry. in the present study, we determined an uncharacterized tightly closed state of sars-cov- s trimer revealing the stably packed fusion peptide, most likely representing a previously undetected ground prefusion state of s trimer. the tightly closed s trimer with originally dominant population may indicate a conformational masking mechanism of immune evasion for sars-cov- spike. importantly, we captured the complete architecture of sars-cov- s trimer in complex with ace . we found the presence of ace could dramatically shift the conformational landscape of the s trimer, and after engagement the continuous swing motions of ace -rbd in the context of the s trimer could generate considerable conformational dynamics in s subunits resulting in a significant decrease in s /s interface area. furthermore, our structural data combined with biochemical analysis revealed that the rbm t -t loop and residue y play vital roles in the binding of sars-cov- rbd to ace receptor. our findings depict a new role of fp in stabilizing s trimer and the mechanism of fp activation, expand the detected conformational space of the s trimer, and provide structural basis on the sars-cov- spike d g mutation induced enhanced infectivity. based on the data, we put forward a mechanism of ace binding-induced conformational transitions of sars-cov- s trimer from the tightly closed ground prefusion state transforming towards the postfusion state (fig. ) . 
in the receptor-free sars-cov- s, the majority of the s trimers are in the tightly closed ground prefusion state with inactivated fp, and only a minor population of the particles is in the transient open state with one rbd up, representing the fusion-prone state; the two states form a dynamic balance (step ). however, the presence of ace and subsequent trapping of the rbd (discussed later) could overcome the energy barrier, break the balance and shift the conformational landscape towards the open state with an untwisting/downwards-shift motion of the s subunits, leading to unpacked/activated fps, weakened interactions among the protomers, and an up rbd. in step , once the receptor ace grasps the up rbd, the rbd is trapped in the up conformation, and the associated ace -rbd shows combined continuous swing motions on the topmost surface of the s trimer. these motions and dynamics could disturb the allosteric network and release the constraints imposed on the fusion machinery, favoring the release of the ace -s component and thereby allowing the s trimers to refold and fuse the viral and host membranes (step ). the dominantly populated conformation ( %) of the unliganded sars-cov- s trimer is the tightly closed state (more compact than that of sars-cov s trimer) with all the rbd domains buried, resulting in conformational masking that prevents antibody binding and neutralization at sites of receptor binding. the conformational masking mechanism of neutralization escape suggested here for sars-cov- could affect all antibodies that bind to the receptor binding site, similar to that described for hiv- env (kwong et al., ; munro et al., ) . for the mers-cov and sars-cov s trimers, by contrast, the closed state is less populated ( . % and . %, respectively), indicating that the conformational masking mechanism may be less effective for these two viruses (gui et al., ; pallesen et al., ) . 
interestingly, our findings also suggest that unliganded s trimer proteins of sars-cov- are inherently competent to transiently display a conformation with one rbd up, ready for binding of the receptor ace ; ace facilitates the capture of pre-existing open s trimer conformations that are spontaneously sampled in the unliganded spike, rather than triggering a trimer opening event. therefore, the spontaneously sampled s trimer conformations may serve a functional role in infectivity. intriguingly, our data also suggest that the sars-cov- s trimer is very sensitive to (gui et al., ; song et al., ) . this demonstrates that the sars-cov- s trimer is much more sensitive to the ace receptor than sars-cov s in terms of receptor-triggered transformation from the closed prefusion state to the fusion-prone open state, which might have contributed to the observed superior infectivity of sars-cov- as compared to that of sars-cov. notably, the sars-cov- spike mutation d g has raised urgent concern; the mutated genotype g began spreading in early february, and it was found to reach a frequency of ~ % in early june according to the gisaid public repository (daniloski et al., ; korber et al., ) . moreover, it has been reported that the d g mutation promotes the infectivity of sars-cov- and enhances viral transmissibility in multiple human cell types (daniloski et al., ; hu et al., ; zhang et al., ) . however, the structural basis of the d g enhanced infectivity is not yet fully understood. here our s-closed structure in the ground prefusion state showed that d is heavily involved in the interaction with fp through its side-chain atoms (fig. i , table s ). this interaction could contribute greatly to the linkage between neighboring protomers as well as between s and s subunits. 
however, mutation of d to g, which lacks a side chain, could eliminate most of the hydrogen bonds/salt bridges that d originally forms with fp, and hence greatly reduce its interaction with fp, potentially leading to a coordinated unpacking/activation of fps. therefore, the d g mutation could ( ) reduce the constraints between neighboring protomers as well as the s /s interactions within the s trimer, and ( ) lower the energy barrier for the conformational transformation from the closed prefusion state to the fusion-prone open state, making the sars-cov- s trimer even more sensitive to ace binding. collectively, these factors may contribute to the enhanced infectivity and viral transmissibility of the g strain. in summary, our data revealed that the unliganded sars-cov- s trimer intrinsically transitions between two distinct pre-fusion conformations, whose relative occupancies could be dramatically remodeled by the receptor ace . these findings support a dynamics-based mechanism of immune evasion and ligand recognition (munro et al., ) . thus, our study delineates the properties of the sars-cov- spike glycoproteins that simultaneously allow the retention of function and the evasion of the humoral immune response. we also showed that the substantial conformational dynamics of s subunits induced by ace binding could trigger the transition of the spike protein towards the postfusion state in preparation for viral entry and infection. collectively, our findings suggest that stabilization of the tightly closed ground prefusion state of the s trimer with inactivated fps might be a general and effective means of inhibiting sars-cov- entry, and an understanding of the properties of the sars-cov- s trimer that permit neutralization resistance will guide attempts to create vaccines as well as therapeutics that target receptor binding. 
we are grateful to the staff of the ncpss electron microscopy facility, database and cryo-em maps have been deposited in the electron microscopy data bank, https://www.ebi.ac.uk/pdbe/emdb/ (accession nos. ***), and the associated models have been deposited in the protein data bank, www.rcsb.org (accession nos. **, **, and **).

the authors declare that they have no conflict of interest.

to express sars-cov- s glycoprotein ectodomain, the mammalian codon-optimized gene coding sars-cov- (wuhan-hu- strain, genbank id: mn . ) s glycoprotein ectodomain (residues m -q ) with proline substitutions at k and v and a "gsas" substitution at the furin cleavage site (r -r ) was cloned into vector pcdna . +. a c-terminal t fibritin trimerization motif, a tev protease cleavage site, a flag tag and a his tag were cloned downstream of the sars-cov- s glycoprotein ectodomain (fig. s a).

before bli experiments, sars-cov- s trimer protein was biotinylated using the ez-link™ sulfo-nhs-lc-lc-biotin kit (thermo fisher) and then purified using a zeba™ spin desalting column (thermo fisher), according to the manufacturer's protocols. to determine the binding affinity of ace , the bli assay was carried out using an octet red instrument (pall fortébio, usa). briefly, biotinylated sars-cov- s trimer protein was loaded onto streptavidin (sa) biosensors (pall fortébio). s-trimer-bound biosensors were dipped into wells containing varying concentrations of ace protein and the interactions were monitored over a -sec association period. finally, the sensors were switched to dissociation buffer ( . m pbs supplemented with . % tween and . % bovine serum albumin) for a -sec dissociation phase. data were analyzed using octet data analysis software version . (pall fortébio).

the purified sars-cov- s glycoprotein ectodomain and human ace pd domain were mixed at a molar ratio of : and were incubated on ice for hours. 
the mixture was purified by filtration chromatography using a superose increase / gl column (ge healthcare) pre-equilibrated with mm tris-hcl ph . , mm nacl, % glycerol. for the cross-linked complex, the buffer of the purified sars-cov- s glycoprotein ectodomain and human ace pd domain was exchanged to mm hepes ph . , mm nacl; sars-cov- s and human ace were then mixed at a molar ratio of : . after incubation on ice for hours, the complex was cross-linked with . % glutaraldehyde, which is commonly used in cryo-em studies of fragile macromolecular complexes (kastner et al., ; patel et al., ) . after incubation on ice for hour, the glutaraldehyde was neutralized by adding mm tris-hcl ph . . the mixture was run over a superose increase / gl column (ge healthcare) in mm tris-hcl ph . , mm nacl, % glycerol. the complex peak fractions were concentrated and assessed by sds-page and negative-staining electron microscopy. for the ns sample, a volume of µl of sars-cov- s-ace sample was placed on a plasma cleaned copper grid for one minute. excess sample on the grid was blotted off using filter paper, and a volume of µl of . % uf (sigma-aldrich) was added to wash the grid. after blotting, another volume of µl of . % uf was placed on the grid for one minute to stain. grids were visualized under a tecnai g spirit kv transmission electron microscope (thermo fisher scientific), and micrographs were taken using an eagle camera with a nominal magnification of , ×, yielding a pixel size of . Å. , particles were autopicked in eman (bell et al., ) . after d classification, we selected good averages with , particles for initial model building, which was performed in relion . (zivanov et al., ) . to prepare the cryo-em sample of sars-cov- s trimer, a . -µl aliquot of this sample was applied to a plasma cleaned holey carbon grid (r / , mesh; quantifoil) or graphene oxide-lacey carbon grid ( mesh, emr). 
the grid was blotted with vitrobot mark iv (thermo fisher scientific) and then plunged into liquid ethane cooled by liquid nitrogen. to prepare the cryo-em sample of s-ace complex with or without cross linking, we used graphene oxide-lacey carbon grid ( mesh, emr), and adopted the same vitrification procedure as for the s trimer. cryo-em movies of the samples were collected on a titan krios electron microscope (thermo fisher scientific) operated at an accelerating voltage of kv with a nominal magnification of , x (table s ). the movies were recorded on a k summit direct electron detector (gatan) operated in the super-resolution mode (yielding a pixel size of . Å after times binning), under low-dose condition in an automatic manner using serialem (mastronarde, ) . each frame was exposed for . s and the total accumulation time was . s, leading to a total accumulated dose of e -/Å on the specimen. to solve the problem of preferred orientation associated with sars-cov- s trimer, we additionally collected tilt datasets with the stage tilt at ° or °, while the other conditions remained the same. single particle analysis was mainly executed in relion . (fernandez-leiro and scheres, ) . all images were aligned and summed using motioncor software (zheng et al., ) . after ctf parameter determination using ctffind (rohou and grigorieff, ) , particle auto-picking, manual particle checking, and reference-free d classification, particles with s trimer features were maintained for further processing. for receptor-free s trimer sample, , particles were picked from non-tilt micrographs, and , remained after d classification (fig. s ) . these particles went through d auto-refine using available sars-cov- s trimer cryo-em map (emdb: ) lowpass filtered to Å resolution as initial model (walls et al., ) . these particles were refined into a closed state map of s trimer with imposed c symmetry. we then re-extracted the particles using the refinement coordinates to re-center it. 
after ctf refinement and polishing, these particles were refined with c symmetry again. notably, the euler angle distribution of the map suggested that the dataset lacks tilted top views (fig. s c left panel) . indeed, when the dataset was refined without imposing -fold symmetry, the top view of the map appeared distorted, indicating a preferred orientation problem associated with the sample. to overcome the preferred orientation problem, we additionally collected tilt data, and boxed out , particles from º tilt micrographs and , particles from º tilt micrographs. after d classification, , particles remained. we then used goctf software to determine the defocus for each tilt particle, and these particles were re-extracted with corrected defocus (su, ) . after combining the tilt with the non-tilt particles, we refined the dataset without imposing symmetry, then performed two rounds of d and d classification to further clean up the dataset, and obtained a dataset of , particles, of which , particles were from the tilt data. we then carried out heterogeneous refinement in cryosparc (punjani et al., ) , and obtained a closed state map from , particles and an open state reconstruction from , particles ( fig s ) . after ctf refinement and bayesian polishing, the closed state map was refined to . Å resolution with c symmetry, while the open state map remained at . Å resolution and could hardly be improved, indicating an intrinsically dynamic nature of the open state. the overall resolution was determined based on the gold-standard criterion using an fsc of . (scheres and chen, ) . for the sars-cov- s-ace cross-linked dataset, , particles were picked from the original micrographs, and , particles remained after d classification (fig. s ) . these particles were refined with an initial model built from our negative staining data. we then re-extracted the particles to re-center them. 
these particles went through a d- d classification step, resulting in a further cleaned-up dataset of , particles. we refined these particles into a map of the ace bound s trimer complex. we then used this map as the initial model to refine the originally picked , particles for one round to re-extract and re-center the particles. after d classification, , particles remained. after rounds of the d- d cleaning step, , particles were left for further structure determination. after heterogeneous refinement in cryosparc, class resembled an ace -free open state of the s trimer, and classes - adopted the s-ace engaged conformation. for class , after further d classification, we refined the , cleaned-up particles into an s-open map at . Å resolution using non-uniform refinement in cryosparc. among the other four classes with bound ace , we sorted out good particles for classes - by d classification and combined them with class , which exhibited good structural details, resulting in a dataset of , particles. after refinement, bayesian polishing, and ctf refinement, we reconstructed a . Å resolution sars-cov- s-ace map. the s trimer portion without the up rbd was rather stable and could be locally refined to . Å using local refinement in cryosparc with the non-uniform refinement option chosen. the ace associated with the up rbd was subtracted and refined in relion to obtain a . Å map with better connectivity. multi-body refinement in relion . was applied to analyze the motion of the complex. for the sars-cov- s-ace dataset without cross-linking, we followed a similar classification and clean-up strategy and obtained , particles. through heterogeneous refinement and d classification in cryosparc, we reconstructed a . Å resolution sars-cov- s-ace map from , particles using non-uniform refinement, and an unliganded open state map of . Å resolution from , particles, with populations of . % and . %, respectively. multi-body refinement was also applied to analyze the mobility of the complex. 
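to make the principal component analysis behind the multi-body motion analysis more concrete, the following sketch shows how the fraction of variance captured by the leading eigenvectors can be computed from a table of per-particle rigid-body parameters. it is an illustration under stated assumptions rather than the relion implementation: the function name `explained_variance`, the six-column parameter layout, and the synthetic data are all hypothetical, and numpy is assumed to be available.

```python
import numpy as np


def explained_variance(params):
    """PCA of per-particle rigid-body parameters.

    params: array of shape (n_particles, n_parameters), e.g. relative
    rotations and translations of one body (hypothetical layout).
    Returns (fractions, eigenvectors): the fraction of total variance per
    component in descending order, and the eigenvectors as columns.
    """
    x = np.asarray(params, dtype=float)
    x = x - x.mean(axis=0)                 # center each parameter
    cov = x.T @ x / (x.shape[0] - 1)       # sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]        # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    return evals / evals.sum(), evecs


# Synthetic data: three dominant, independent motions plus small residual ones.
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.1, 0.1, 0.1])
data = rng.normal(size=(5000, 6)) * scales

fractions, vectors = explained_variance(data)
top3 = fractions[:3].sum()                 # variance share of first three eigenvectors

# Per-particle amplitude along the first eigenvector; a histogram of these
# values is what distinguishes continuous (unimodal) from discrete motion.
amplitudes = (data - data.mean(axis=0)) @ vectors[:, 0]
```

in this toy case the first three components account for nearly all of the variance by construction; for experimental data the share depends on how many independent motions the complex samples.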
to build the pseudo-atomic model for our sars-cov- s-closed structure, we used the available atomic model of sars-cov- s (pdb: vxx) as the initial model (walls et al., ). we first refined the model against our map using the phenix.real_space_refine module in phenix (adams et al., ). for the missing loop regions in the s subunit, we either built a homology model based on the sars-cov s structure (pdb: crw) (kirchdoerfer et al., ) through the swiss-model web server (waterhouse et al., ), or built the loop manually according to the density in coot (emsley and cowtan, ). for the fp region, we first built a homology model with the modeller tool within chimera, using the mers-cov s structure (pdb: nb ) as template (pettersen et al., ; sali, ; walls et al., ), then used rosetta to refine this region against the density map (dimaio et al., ). eventually, we used phenix.real_space_refine again for the protomer and s-trimer model refinement against the map. for the sars-cov- s-ace structure, we used the sars-cov- rbd-ace crystal structure (pdb: m j) as the initial model for the ace and the associated up rbd portion, and our s-closed model as the initial model for the remaining portion. these models were first refined against the corresponding focused maps using rosetta and phenix (dimaio et al., ), then combined in coot. we then refined the combined model against our . Å resolution sars-cov- s-ace map using rosetta and phenix. for the s-open structure, we used the model of sars-cov- s-ace with ace removed as the initial model, and refined it against the map using rosetta. we used phenix.molprobity to evaluate the models, and calculated b-factors with the atom displacement refinement function in phenix.real_space_refine. we used ucsf chimera and chimerax for figure generation (goddard et al., ; pettersen et al., ), and also for rotation, translation, rmsd, and vdw contact measurements. interaction surface analysis was conducted with the pisa server (krissinel and henrick, ).
to uncover the amino acids important for ace receptor recognition, the ace ectodomain (residues q to s ) gene, with an n-terminal il signal peptide and tagged with human igg fc and a his tag at the c-terminus, was cloned into the pcdna . vector. the codon-optimized rbd (residues v to g ) gene fragment, with an n-terminal il signal peptide and tagged with a his tag at the c-terminus, was cloned into the pcdna . vector. three sars-cov- rbd mutants were constructed. for mutant rbd (core), amino acids r to n of the core region in the sars-cov- rbd were substituted by the corresponding region of sars-cov strain tor (genbank id: aap . ). for mutants rbd (rbm-r ) and rbd (rbm-r ), residues l to k and residues t to t of the rbm region in the sars-cov- rbd were mutated into the corresponding regions of sars-cov strain tor , respectively. for the single point mutants rbd (q a), rbd (v a), and rbd (y a), rbd residues q , v , and y were substituted by ala, respectively. all mutant plasmids were constructed using the mutexpress tm ii fast mutagenesis kit v (vazyme, china) according to the manufacturer's instructions. the proteins were generated using the hek f expression system and purified as described above. anti-rbd polyclonal antibody and monoclonal antibody (mab) a were prepared by immunizing balb/c mice with recombinant sars-cov- rbd fused with a c-terminal mouse iggfc tag (sino biological inc, beijing, china) using previously described protocols (qu et al., ). the purified rbd mutants were tested by elisa for reactivity with the receptor ace . briefly, elisa plates were coated with ng/well of the purified rbd mutants in pbs at °c for hours and then blocked with % milk in pbs-tween (pbst). next, the plates were incubated with ng/well of ace -hfc fusion protein, µl/well of culture supernatant of hybridoma a , or µl/well of mouse anti-rbd sera (diluted at / ) at °c for h.
after washing, the corresponding secondary antibodies, horseradish peroxidase (hrp)conjugated anti-human igg (abcam, usa) or hrp-conjugated anti-mouse igg (sigma, usa), were added and incubated at °c for h. after washing color development, absorbance at nm was determined. cryo-em data processing procedure for sars-cov- s trimer in the presence of ace . amino acid sequence alignment of sars-cov- s to sars-cov s. the secondary structure elements were defined based on an espript (robert and gouet, ) algorithm and are labeled based on our sars-cov- s-closed structure. the rbd domain is labeled in green frames, and the subdomains of rbm are also labeled. contacting residues at the sars-cov- rbd-ace interface (distance cutoff of Å) y h l k f t , d , k y t a s , t g s , q f l , m , y n q , y y t , f , k f k q k , h s h y h g k q y t y , d , r n k g k , g y k , g , a , r phenix: a comprehensive python-based system for macromolecular structure solution high resolution single particle refinement in eman the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex the d g mutation in sars-cov- spike increases transduction of multiple human cell types atomic-accuracy models from . 
-a cryo-electron microscopy data with density-guided iterative local refinement coot: model-building tools for molecular graphics fusion peptides and the mechanism of viral fusion a pipeline approach to single-particle processing in relion ucsf chimerax: meeting modern challenges in visualization and analysis cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor the d g mutation of sars-cov- spike protein enhances viral infectivity and decreases neutralization sensitivity to individual convalescent sera grafix: sample preparation for single-particle electron cryomicroscopy stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- inference of macromolecular assemblies from crystalline state hiv- evades antibody-mediated neutralization through conformational masking of receptor-binding sites structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structure, function, and evolution of coronavirus spike proteins structure of sars coronavirus spike receptor-binding domain complexed with receptor automated electron microscope tomography using robust prediction of specimen movements engineering trimeric fibrous proteins based on bacteriophage t adhesins conformational dynamics of single hiv- envelope trimers on the surface of native virions immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen structure of human tfiid and mechanism of tbp loading onto promoter dna ucsf chimera--a visualization system for exploratory research and analysis cryosparc: algorithms for rapid unsupervised cryo-em structure determination a new class of broadly neutralizing antibodies that target the glycan 
loop of zika virus envelope protein sars-cov- , sars-cov, and mers-cov: a comparative overview deciphering key features in protein structures with the new endscript server ctffind : fast and accurate defocus estimation from electron micrographs comparative protein modeling by satisfaction of spatial restraints prevention of overfitting in cryo-em structure determination cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace goctf: geometrically optimized ctf determination for single-particle cryo-em addressing preferred specimen orientation in single-particle cryo-em through tilting coronavirus membrane fusion mechanism offers a potential target for antiviral development structural basis for human coronavirus attachment to sialic acid receptors crucial steps in the structure determination of a coronavirus spike glycoprotein using cryo-electron microscopy function, and antigenicity of the sars-cov- spike glycoprotein cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion unexpected receptor functional mimicry elucidates activation of coronavirus fusion structural and functional basis of sars-cov- entry by using human ace site-specific analysis of the sars-cov- glycan shield swiss-model: homology modelling of protein structures and complexes cryo-em structure of the -ncov spike in the prefusion conformation structural basis for the recognition of sars-cov- by full-length human ace two mutations were critical for bat-to-human transmission of middle east respiratory syndrome coronavirus cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy a pneumonia outbreak 
associated with a new coronavirus of probable bat origin new tools for automated high-resolution cryo-em structure determination in relion- key: cord- -rstmd va authors: matsumura, yasufumi; shimizu, tsunehiro; noguchi, taro; nakano, satoshi; yamamoto, masaki; nagao, miki title: comparison of molecular detection assays for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rstmd va molecular testing for sars-cov- is the mainstay for accurate diagnosis of the infection, but the diagnostic performances of available assays have not been defined. we compared molecular diagnostic assays, including commercial kits using respiratory samples ( nasopharyngeal swabs, oropharyngeal swabs, and sputum) collected at japanese hospitals. sixty-eight samples were positive for more than one assay and one genetic locus and were defined as true positive samples. all the assays showed a specificity of % ( % confidence interval, . to ). the n assay kit of the us centers for disease control and prevention (cdc) and the n assay of the japanese national institute of infectious disease (niid) were the most sensitive assays with % sensitivity ( % confidence interval, . to ), followed by the cdc n kit, e assay by corman, and niid n assay multiplex with internal control reactions. these assays are reliable as first-line molecular assays in laboratories when combined with appropriate internal control reactions. shows the molecular assays evaluated in this study. real-time rt-pcrs were performed using n , n , and rnasep (rp) internal control assays developed by the ( ) (with/without eav), n and e assays developed by charité in germany ( ) (corman) with taqpath™ -step rt-qpcr master mix, cg (thermo fisher scientific). we also tested the lightmix® modular assays (roche) for e, rdrp, and n genes multiplexed loopampexia® real-time turbidimeter (eiken chemical, tokyo, japan). 
analytical sensitivity. we determined the limit of detection (lod) of each assay using a minimum of four replicates of two-fold serial dilutions of recombinant sindbis virus containing a partial sars-cov- genome (accuplex™ sars-cov- reference material kit, , copies/ml; seracare, milford, ma, usa). we calculated the % limit of detection (lod) using probit analysis. at the time of manuscript preparation, no gold standard existed. in this study, to ensure the presence of sars-cov- rna and to avoid false positives, a sample was defined as positive when positive test results were obtained for more than one genetic locus and assay; all other samples were defined as negative. is available as dataset s . all the assays exhibited a specificity of %, while sensitivity varied (table ). the cdc n , cdc n , niid n (with/without eav), and corman e assays were the most sensitive assays, with ≥ . % sensitivity. these assays displayed high overall agreement with the reference standard (kappa values of ≥ . ) and between any two of them (kappa values of ≥ . ). the cdc n and niid n assays exhibited % sensitivity; thus, their results were equal to the defined reference standard. the sensitivities of the remaining assays (corman n, roche e, roche rdrp, roche n, thermo combo, bgi and lamp assays; ≤ . %) were significantly lower than those of the most sensitive assays. the cdc protocol requires both n and n assays, and a sample is considered positive only if both produce positive results. in this study, one true positive nasopharyngeal sample was positive only for the n assay even after retesting. the sample was considered inconclusive, and the performance of the cdc protocol was considered the same as that of the cdc n assay. the niid protocol includes both niid n and corman n assays, and a sample is considered positive if either assay produces a positive result. in this study, . % of samples were positive for both assays, and . % table shows diagnostic performances for each specimen type.
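the agreement statistics described above (a composite reference standard, sensitivity, specificity, and cohen's kappa, plus the "both positive" and "either positive" protocol rules) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' analysis code: all function names and the toy result vectors are hypothetical, and confidence intervals and probit-based lod estimation are deliberately left out.

```python
from typing import Dict, List, Tuple


def reference_positive(results: Dict[str, bool], locus_of: Dict[str, str]) -> bool:
    """Composite reference: positive only if more than one assay AND
    more than one genetic locus gave a positive result (assumed reading)."""
    positive_assays = [assay for assay, is_pos in results.items() if is_pos]
    positive_loci = {locus_of[assay] for assay in positive_assays}
    return len(positive_assays) > 1 and len(positive_loci) > 1


def sensitivity_specificity(test: List[bool], ref: List[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of one assay against the reference calls."""
    tp = sum(t and r for t, r in zip(test, ref))
    fn = sum((not t) and r for t, r in zip(test, ref))
    tn = sum((not t) and (not r) for t, r in zip(test, ref))
    fp = sum(t and (not r) for t, r in zip(test, ref))
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(a: List[bool], b: List[bool]) -> float:
    """Cohen's kappa for two binary raters (chance-corrected agreement)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


def both_rule(first: bool, second: bool) -> bool:
    """'Both assays must be positive' rule, as described for the cdc protocol."""
    return first and second


def either_rule(first: bool, second: bool) -> bool:
    """'Either assay positive' rule, as described for the niid protocol."""
    return first or second


# Toy data (hypothetical): 4 reference positives, 4 reference negatives.
ref_calls = [True, True, True, True, False, False, False, False]
assay_calls = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(assay_calls, ref_calls)
kappa = cohen_kappa(assay_calls, ref_calls)
```

with these toy vectors the assay misses one of four reference positives and calls no false positives, so the three statistics can be checked by hand.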
nasopharyngeal swabs tended to have a higher sensitivity than the other samples. the sensitivity of the corman n assay for sputum samples and that of the roche n assay for oropharyngeal swabs and sputum samples were significantly lower than those for nasopharyngeal swabs. the lods of the roche rdrp and n assays were high (> , copies/ml). the current diagnosis of covid- mainly relies on rt-pcr tests ( ). we performed a manufacturer-independent evaluation of the molecular assays, including commercial kits that utilize otherwise-extracted rna templates. we found that the specificity was perfect for all the assays and that the cdc n , cdc n , niid n , and corman e assays were the most sensitive and highly concordant ( ). genetic variations that may compromise the sensitivity of the cdc n , n , and corman e assays have rarely been observed as of week of ( ). false negatives by the other assays occurred among low-copy number samples (presenting high ct values by the cdc n or niid n assay; dataset s ), suggesting a lack of sensitivity of these assays. the roche assays were based on corman's assays ( ) but had lower sensitivity for their e and n assays. this is likely due to lower ct cutoffs for the roche assays, rather than differences in reagents and reaction conditions (table and dataset s ). previous studies reported that the n assay was less sensitive than the e and rdrp assays ( ) and that the rdrp assay was less sensitive than the roche e assay ( ). the low sensitivities of the roche rdrp and n assays were concordant with their high lods (table ). to avoid false negatives due to technical errors such as extraction problems or pcr inhibition, it is recommended to include internal control reactions. the cdc assays were designed to be combined with a separate internal control reaction (table ).
different from multiplex assays that incorporate internal controls such as roche, thermo, or bgi kits, this approach needs extra reagents, time, and space in a reaction plate but can be combined with other in-house assays (niid n or corman e) without any modification. for the multiplex approach, we selected the niid n assay to be seegene. these reports are in agreement with our findings. the study limitations included a relatively small sample size of each specimen type and lack of clinical information, measurements by multiple investigators, and genomic variation analysis. in conclusion, we validated the niid n assay with eav control reaction and showed that samples ( true positive sputum samples, true positive pharyngeal sample, and true negative sputum sample) were negative for all genes, and these were considered negative. detection of novel coronavirus ( -ncov) by real-time rt-pcr rt-pcr for severe acute respiratory syndrome coronavirus improved molecular diagnosis of covid- by the novel, highly sensitive and specific covid- -rdrp/hel real-time reverse transcription-pcr assay validated genetic diagnostic methods for novel coronavirus -ncov) real-time rt-pcr diagnostic panel interpreting diagnostic tests for sars-cov- the measurement of observer agreement for categorical data in comparison with the sensitivity for nasopharyngeal swabs cdc n niid n with eav ci, confidence interval. key: cord- -tqdcb oo authors: pratibha,; shaju, c.; kamal, title: ubiquitous forbidden order in r-group classified protein sequence of sars-cov- and other viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tqdcb oo each amino acid in a polypeptide chain has a distinctive r-group associated with it. we report here a novel method of species characterization based upon the order of these r-group classified amino acids in the linear sequence of the side chains associated with the codon triplets. 
in an otherwise pseudo-random sequence, we search for forbidden combinations of kth order. we applied this method to analyze the available protein sequences of various viruses including sars-cov- . we found that these ubiquitous forbidden orders (ufo) are unique to each of the viruses we analyzed. this unique structure of the viruses may provide an insight into viruses’ chemical behavior and the folding patterns of the proteins. this finding may have a broad significance for the analysis of coding sequences of species in general. of the side chains associated with codon triplets. in an otherwise pseudo-random sequence, we search for forbidden combinations of k th order. the results indicate what nature has decided not to do rather than what to do. we found that these forbidden orders are ubiquitous to each of the viruses we analyzed. these ubiquitous forbidden orders (ufo) are unique structures of the viruses that may provide an insight into viruses' chemical behavior and the folding patterns of the proteins. this finding may have a broad significance for the analysis of coding sequences of species in general. coding sequences of species were downloaded from the national centre for the codon triplets of four bases (a, g, c, and t) in the sequence can have possible combinations that form amino acids. each of these amino acids is associated with a side chain that controls the folding patterns of the protein and its chemical behavior. based on the chemical properties of the side chain, each codon triplet can be sub-classified as non-polar (n), uncharged polar (p), acidic (a), or basic (b). a matlab code reads the sequence and classifies each amino acid triplet as its respective r-group side chain. based on the chemical properties of the side chain of an amino acid in a coding sequence of a species, we generated a sequence of n (non-polar), p (polar), a (acidic), and b (basic). 
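the r-group classification step can be sketched in python as below. this is a hedged illustration rather than the authors' matlab code: it assumes the input is already a one-letter amino acid string (translation from codon triplets is omitted), and it assumes one common textbook grouping of the 20 standard residues; some textbooks place gly, cys, tyr, or his differently. the names `R_GROUP` and `classify_r_groups` are hypothetical.

```python
# Assumed grouping of the 20 standard amino acids by side-chain chemistry.
R_GROUP = {}
for residue in "GAVLIMFWP":   # non-polar side chains
    R_GROUP[residue] = "N"
for residue in "STCYNQ":      # uncharged polar side chains
    R_GROUP[residue] = "P"
for residue in "DE":          # acidic side chains
    R_GROUP[residue] = "A"
for residue in "KRH":         # basic side chains
    R_GROUP[residue] = "B"


def classify_r_groups(protein: str) -> str:
    """Map a one-letter amino acid sequence to a string over N, P, A, B.

    Characters outside the 20 standard residues (stop '*', unknown 'X',
    gaps) are skipped; that choice is an assumption of this sketch.
    """
    return "".join(R_GROUP[aa] for aa in protein.upper() if aa in R_GROUP)


example = classify_r_groups("MKDSX*")
```

skipping unknown residues keeps the output alphabet at exactly four symbols, which is what the later cgr step expects.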
a new sequence of four symbols, n, b, a, and p, is thus created from the protein sequence of the species; it looks something like nbnnpaaabbnpnnnpabbababaa…, where each letter represents an amino acid with one of the chemical properties listed above. this r-group sequence is used to obtain cgr plots of any given protein sequence of a species (figure ). theoretically, all combinations of n, b, a, and p are possible in this sequence. in this study, we look at the sequence from a different perspective: instead of studying what is present in the protein sequence, we analyze what is absent from it. cgr of a driven ifs: a protein sequence x(k) can be considered as a string composed of n, p, a, and b. we consider a unit square u and name its corners c_i (i = 1, 2, 3, 4) as n, b, a, and p respectively, which correspond to the values of x(k). the initial point p(0) is the midpoint of the square. the second point p(1) is the midpoint between p(0) and c_x(1). in general, p(k) is plotted as the midpoint between p(k-1) and c_x(k) [ ]. after plotting the genetic sequence x in the unit square u, the unit square is divided into 2^k x 2^k sub-squares; each sub-square represents a unique sub-sequence of length k. an example of the movement of points in cgr is shown with the first eight members of the data sequence (pnaabnpa…) in figure a. an example of addresses of the sub-squares for different orders (k =, , , , and ) is given in figure b. pc-plots: to make these plots, the percentage of points plotted in each sub-square is calculated. this percentage value represents the intensity of points in each sub-square. after plotting points by cgr and dividing the unit square into 2^k x 2^k sub-squares, each sub-square is color-filled based on the calculated intensity values. figure c shows the percentage-cgr plot made for the human adenovirus for k= . the existing literature presents studies of these plots in a phylogenetic analysis of species [ , , , , ].
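the midpoint rule and the percentage computation behind a pc-plot can be sketched as follows. this is an illustrative sketch, not the authors' implementation: the corner coordinates assigned to n, b, a, and p are an assumption (the text names the corners but not their positions), the sub-square grid is taken to have 2^k cells per side so that each cell corresponds to one length-k sub-sequence, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed placement of the four corners of the unit square.
CORNERS = {"N": (0.0, 0.0), "B": (0.0, 1.0), "A": (1.0, 1.0), "P": (1.0, 0.0)}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Chaos game: start at the centre, then repeatedly move halfway
    towards the corner named by the next symbol."""
    x, y = 0.5, 0.5
    points = []
    for symbol in seq:
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def subsquare_percentages(seq: str, k: int) -> Dict[Tuple[int, int], float]:
    """Percentage of CGR points falling in each (row, col) cell of a
    2^k x 2^k grid. The first k-1 points are skipped because their cell
    still depends on the arbitrary starting point."""
    side = 2 ** k
    usable = cgr_points(seq)[k - 1:]
    counts = Counter(
        (min(int(y * side), side - 1), min(int(x * side), side - 1))
        for x, y in usable
    )
    total = len(usable)
    return {cell: 100.0 * n / total for cell, n in counts.items()}


last_point = cgr_points("NBAP")[-1]
pc = subsquare_percentages("NNNB", 1)
```

cells that never appear in the returned dictionary have zero intensity at that order.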
we take a step further but in the opposite direction and look for those combinations of n, b, a, and p, which are ubiquitously forbidden by nature for a given length (k). d. e. f. ubiquitous forbidden order (ufo) plots of human adenovirus for combinations of order (k) , , and . the vertices in the squares are the same as depicted in figure a the red color indicates that the corresponding address is forbidden. for example, in figure d , the address abab is forbidden. the forbidden order in the plots can only be visualized for lower orders. it can be seen that figures e and f are becoming more and more chaotic as the value of k increases and are difficult to analyze. next, we analyzed protein sequences of viruses (figures , , and supplementary figures - ) to search for a ubiquitous forbidden order in each one of them. our purpose was to find some clues for uniquely analyzing the sars-cov- to handle the covid- pandemic. figure shows the th level ufo plots of seven coronaviruses infecting humans. from ufo plots, the viruses seem to be getting optimized with time. the earlier coronaviruses, e, oc , nl , and hku (figure d, e, f, and g), which are also the mild coronaviruses, have a lot of forbidden addresses in the amino acid polypeptide chain. with evolution, the structure seems to be getting simpler for mers, sars-cov- , and sars-cov- (figure a, b, and c ). it appears that nature prefers to be simple and less complex to be able to survive and evolve. although the ufo plot is unique for each virus, they seem to optimize their evolution in time. a close examination of ufo plots of sars-cov- ( figure a) and sars-cov- ( figure b ) reveals that the evolution of the sars-cov- from sars-cov- is quite straight forward. one just has to prohibit a particular order bapb in the sars-cov- protein sequence to get a sars-cov- strain. this opens up a point of discussion whether this formation is possible in a laboratory setting or not. 
as non-biologists, we cannot comment on their origin, though our results provide an alternative approach for further exploration by the subject specialists. among the viruses studied, we noted that the forbidden order bpab is unique to sars cov- , rubella, and avian ib (figure ) . a forbidden order bpab in the ufo plot means that the th order combination bapb is prohibited by nature, i.e., a basic side chain(b), followed by an acidic side chain(a), followed by an uncharged polar side chain(p) cannot be followed by a basic side chain(b). this rule is found to be followed only by the sars-cov- , rubella, and the avian infectious bronchitis (aib) sequences among all viruses studied by us. the whole forbidden structure of sars-cov- is an inherent part of the rubella plot and partly of the aib plot. this feature in the sars-cov- protein sequence may be a pointer to support the idea of using the mmr vaccine in covid- as floated by fidel and noverr [ ] and the use of recombinant ace by kruse [ ] as a preventive measure to reduce the inflammation in covid- patients. the use of drugs/vaccines for existing viruses in managing untreatable viral diseases has been suggested by others also [ , ] including for covid- [ ] . next, we tried to analyze the studies on the evolutionary origin of the sars-cov- and its comparison to the bat coronavirus ratg isolate [ , ] . we found that at r-group classified sequences of n, b, a, and p in these two samples are identical up to level of the amino acid ordering in the protein structures (figures a, d, g) . however, at the next level, i.e. k= , the differences between the two protein sequences start to emerge (figures b, e , h) and become quite clear at the th level of ordering (figures c, f, i) . this supports the findings of wrobel et.al. [ ] . indicate the addresses that are forbidden in the second plot but are allowed in the first one. note the increase in complexity with the increase in order. 
there is no evident difference between the sars-cov- and its closest relative [ ] ratg structures at the th order (figure g ), but the higher-order comparison (figures h, i) reveals key differences between these two viruses. recently, flies et.al. [ ] emphasized shifting the focus of immunological research to new models and interdisciplinary studies. we have looked at the amino acid sequences, as non-biologists, from a different angle based on the chemical properties of the side chain of an amino acid in a coding sequence of a species. the forbidden order bapb is unique to sars cov- . a basic side chain, followed by an acidic side chain, followed by an uncharged polar side chain can not be followed by a basic side chain. this rule is found to be followed only by the sars-cov- , rubella, and the avian infectious bronchitis among the viruses studied by us. this study of the forbidden order of r-group side chains in a protein sequence opens up new directions for microbiologists to study coding sequences. the consequences of the forbidden order to the properties of a protein are yet to be ascertained. 
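the forbidden-order search and the pairwise comparison described above (addresses forbidden in the second sequence but allowed in the first) can be sketched with plain k-mer sets instead of plots. this is a simplified stand-in under stated assumptions, not the authors' method: words are read in sequence order, whereas a cgr address may list the same symbols in reverse, and the function names are hypothetical.

```python
from itertools import product
from typing import Set

ALPHABET = "NBAP"


def forbidden_orders(seq: str, k: int) -> Set[str]:
    """All length-k words over N, B, A, P that never occur in seq."""
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_words = {"".join(word) for word in product(ALPHABET, repeat=k)}
    return all_words - observed


def newly_forbidden(first: str, second: str, k: int) -> Set[str]:
    """Words that are allowed in `first` but forbidden in `second`."""
    return forbidden_orders(second, k) - forbidden_orders(first, k)


missing_pairs = forbidden_orders("NBAPNBAP", 2)
difference = newly_forbidden("NBAPN", "NBAP", 2)
```

the number of possible words grows as 4^k, so for short sequences most high-order words are absent simply because the sequence is too short to contain them; any interpretation of a forbidden word should account for sequence length.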
dna sequence alignment by microhomology sampling during homologous recombination gapped blast and psi-blast: a new generation of protein database search programs a novel method of characterizing genetic sequences: genome space with biological distance and applications numerical encoding of dna sequences by chaos game representation with application in similarity comparison the number of k-mer matches between two dna sequences as a function of k and applications to estimate phylogenetic distances chaos game representation of gene structure genomic signature: characterization and classification of species assessed by chaos game representation of sequences analysis of genomic sequences by chaos game representation similarity analysis for dna sequences based on chaos game representation case study: the albumin alignment-free genomic sequence comparison using fcgr and signal processing a categorization of covid- treatment strategies: a modified chaos game representation (cgr) analysis of genome sequences for thirty-two pathogens. covid- virtual conference machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: covid- case study a novel numerical representation for proteins: three-dimensional chaos game representation and its extended natural vector chaos game representation of protein sequences based on the detailed hp model and their multifractal and correlation analyses could an unrelated live attenuated vaccine serve as a preventive measure to dampen septic inflammation associated with covid- infection? therapeutic strategies in an outbreak scenario to treat the novel coronavirus originating in wuhan, china. f res. 
: developing the concept of beneficial nonspecific effect of live vaccines with epidemiological studies non-specific effects of bcg vaccine on viral infections a sars-cov- protein interaction map reveals targets for drug repurposing evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic sars-cov- and bat ratg spike glycoprotein structures inform on virus evolution and furin-cleavage effects rewilding immunology key: cord- - f gve authors: jeon, sangeun; ko, meehyun; lee, jihye; choi, inhee; byun, soo young; park, soonju; shum, david; kim, seungtaek title: identification of antiviral drug candidates against sars-cov- from fda-approved drugs date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: f gve covid- is an emerging infectious disease and was recently declared as a pandemic by who. currently, there is no vaccine or therapeutic available for this disease. drug repositioning represents the only feasible option to address this global challenge and a panel of fda-approved drugs that have been pre-selected by an assay of sars-cov was screened to identify potential antiviral drug candidates against sars-cov- infection. we found a total of drugs which exhibited antiviral efficacy ( . μm < ic < μm) against sars-cov- . in particular, two fda-approved drugs - niclosamide and ciclesonide – were notable in some respects. these drugs will be tested in an appropriate animal model for their antiviral activities. in near future, these already fda-approved drugs could be further developed following clinical trials in order to provide additional therapeutic options for patients with covid- . covid- is an emerging infectious disease caused by a novel coronavirus, sars-cov- . although the case fatality rate due to this viral infection varies from to % , the transmission rate is relatively high and recently, the who declared covid- outbreak a pandemic. 
currently, there are no vaccines or therapeutics available, and patients with covid- are being treated with supportive care. drug repositioning could be an effective strategy to respond immediately to emerging infectious diseases, since new drug development usually takes more than years . fda-approved drugs provide safe alternatives only in the case where at least modest antiviral activity can be achieved. accordingly, several drugs, including remdesivir, lopinavir, and chloroquine, are being tested in numerous clinical trials . in this study, we screened a panel of fda-approved drugs to identify antiviral drug candidates for the treatment of covid- and suggest that the identified drug candidates may be considered for therapeutic development. we screened a library of approximately , fda- and ind-approved drugs against sars-cov to identify antiviral drug candidates (manuscript in preparation). since sars-cov and sars-cov- are very similar ( . % sequence identity) , the drugs which show antiviral activity against sars-cov are expected to show a similar extent of antiviral activity against sars-cov- . a total of drugs were selected from the earlier sars-cov screening results. in addition, drugs were included based on recommendations from infectious diseases specialists (table ). for screening experiments, vero cells were used and each drug was added to the cells prior to the virus infection. at h after the infection, the infected cells were scored by immunofluorescence analysis with an antibody specific for the viral n protein of sars-cov- . the confocal microscope images of both viral n protein and cell nuclei were analyzed using our in-house image mining (im) software, and the dose-response curve (drc) for each drug was generated (figure ). chloroquine, lopinavir, and remdesivir were used as reference drugs, with ic values of . , . , and . µm, respectively (figure a).
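reading a half-maximal inhibitory concentration off a dose-response curve can be sketched as below. this is a deliberately simple stand-in, not the authors' in-house image mining software or their curve-fitting procedure: it assumes percent inhibition has already been computed per concentration and estimates the 50% crossing by linear interpolation on a log10 concentration axis, where a fitted sigmoidal model would normally be used. the function name and the toy numbers are hypothetical.

```python
import math
from typing import Optional, Sequence


def ic50_by_interpolation(
    concentrations_um: Sequence[float], inhibition_pct: Sequence[float]
) -> Optional[float]:
    """Estimate the concentration giving 50% inhibition.

    Points are sorted by concentration; the first adjacent pair that
    brackets 50% is interpolated linearly in log10(concentration).
    Returns None when the curve never crosses 50%.
    """
    pairs = sorted(zip(concentrations_um, inhibition_pct))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(pairs, pairs[1:]):
        if i_lo < 50.0 <= i_hi:
            fraction = (50.0 - i_lo) / (i_hi - i_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    return None


# Toy ten-fold dilution series (hypothetical values, not study data).
estimate = ic50_by_interpolation([0.1, 1.0, 10.0, 100.0], [5.0, 25.0, 75.0, 95.0])
```

interpolation only uses the two points around the crossing, so it is sensitive to noise in those wells; a fitted curve over all ten points would be more robust.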
among the drugs that were evaluated in our study, drugs showed potential antiviral activities against sars-cov- with ic values in second, ciclesonide is another interesting drug candidate for further development, although its antiviral potency was much lower (ic = . µm) than that of niclosamide. it is an inhaled corticosteroid used to treat asthma and allergic rhinitis . a recent report by matsuyama et al. corroborated our finding of ciclesonide as a potential antiviral drug against sars-cov- . a treatment report of three patients who were infected by sars-cov- in japan (https://www .nhk.or.jp/nhkworld/en/news/ _ /) warrants further clinical investigation of this drug in patients with covid- . intriguingly, an underlying mechanism for the suppression of viral infection by ciclesonide has been revealed by the isolation of a drug-resistant mutant . the isolation of the drug-resistant mutant indicated that nsp , a viral riboendonuclease, is the molecular target of ciclesonide. together, it is not unreasonable to consider that ciclesonide exhibits a direct-acting antiviral activity in addition to its intrinsic anti-inflammatory function. in the future, sirna targeting the hormone receptor will allow assessment of the extent of the direct-acting antiviral activity. with its proven anti-inflammatory activity, ciclesonide may represent a potent drug which can play dual roles (antiviral and anti-inflammatory) in the control of sars-cov- infection. prior to our evaluation of drugs against sars-cov- infection, we also tested the antiviral activity of several other drugs based on the cytopathic effect of the virus in the presence of each drug (figure ). in particular, the effects of favipiravir and atazanavir were compared to those of the reference drugs (chloroquine, lopinavir, remdesivir), because favipiravir is considered a drug candidate for clinical trials and atazanavir was recently predicted to be the most potent antiviral drug by ai-inference modeling .
however, in the current work, we did not observe any antiviral activity of either favipiravir or atazanavir. in summary, we selected and screened fda-approved drugs based on our sars-cov screening, and our screening campaign revealed potential antiviral drug candidates against sars-cov- . our findings could be further validated in an appropriate animal model, and hopefully developed through subsequent clinical trials in order to provide additional therapeutic options for patients with covid- . vero cells were obtained from the american type culture collection (atcc ccl- ) and ten-point drcs were generated for each drug. vero cells were seeded at .
a pneumonia outbreak associated with a new coronavirus of probable bat origin
estimating the risk of novel coronavirus death during the course of the outbreak in china
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
drug repositioning: identifying and developing new uses for existing drugs
more than clinical trials launch to test coronavirus treatments
remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro
broad spectrum antiviral agent niclosamide and its therapeutic potential
inhibition of severe acute respiratory syndrome coronavirus replication by niclosamide
skp attenuates autophagy through beclin -ubiquitination and its inhibition reduces mers-coronavirus infection
absorption of pyrvinium pamoate
ciclesonide: a safe and effective inhaled corticosteroid for the treatment of asthma
the inhaled corticosteroid ciclesonide blocks coronavirus rna replication by targeting viral nsp . biorxiv ( )
predicting commercially available antiviral drugs that may act on the novel coronavirus ( -ncov)
we thank drs. wang-shick ryu and spencer shorte for their helpful discussion and review of the manuscript.
the pathogen resource (nccp ) for this study was provided by the national culture collection for pathogens. this work was supported by the national research key: cord- -czh zfb authors: lu, shuaiyao; zhao, yuan; yu, wenhai; yang, yun; gao, jiahong; wang, junbin; kuang, dexuan; yang, mengli; yang, jing; ma, chunxia; xu, jingwen; qian, xingli; li, haiyan; zhao, siwen; li, jingmei; wang, haixuan; long, haiting; zhou, jingxian; luo, fangyu; ding, kaiyun; wu, daoju; zhang, yong; dong, yinliang; liu, yuqin; zheng, yingqiu; lin, xiaochen; jiao, li; zheng, huanying; dai, qing; sun, qiangmin; hu, yunzhang; ke, changwen; liu, hongqi; peng, xiaozhong title: comparison of sars-cov- infections among species of non-human primates date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: czh zfb covid- , caused by sars-cov- infection, has recently been announced as a pandemic all over the world. plenty of diagnostic, preventive and therapeutic knowledges have been enriched from clinical studies since december . however, animal models, particularly non-human primate models, are urgently needed for critical questions that could not be answered in clinical patients, evaluations of anti-viral drugs and vaccines. in this study, two families of non-human primates, old world monkeys ( macaca mulatta, macaca fascicularis) and new world monkeys ( callithrix jacchus), were experimentally inoculated with sars-cov- . clinical signs were recorded. samples were collected for analysis of viral shedding, viremia and histopathological examination. increased body temperature was observed in % ( / ) m. mulatta, . % ( / ) m. fascicularis and none ( / ) of c. jacchus post inoculation of sars-cov- . all of m. mulatta and m. fascicularis showed chest radiographic abnormality. viral genomes were detected in nasal swabs, throat swabs, anal swabs and blood from all species of monkeys. viral shedding from upper respiratory samples reached the peak between day and day post inoculation. 
from necropsied m. mulatta and m. fascicularis, the tissues showing virus positive were mainly lung, weasand, bronchus and spleen. no viral genome was seen in any of tissues from necropsied c. jacchus. severe gross lesions and histopathological changes were observed in lung, heart and stomach of sars-cov- infected animals. in summary, we have established a nhp model for covid- , which could be used to evaluate drugs and vaccines, and investigate viral pathogenesis. m. mulatta is the most susceptible to sars-cov- infection, followed by m. fascicularis and c. jacchus. one sentence summary m. mulatta is the most susceptible to sars-cov- infection as compared to m. fascicularis and c. jacchus. are going to answer with the animal models. about years ago, the nonhuman primate models recapitulated several important aspects of sars - . these models made great contributions to investigation of sars pathogenesis, evaluation of antiviral drugs and vaccine . is different from sars in some aspects, pathogens for these two diseases share some characters, such as ace receptor. here, to establish the covid- model, two families including species of non-human primates, which are widely used for animal models with their own advantages and disadvantages, were experimentally infected with sars-cov- , followed by comparisons of clinical symptoms, hematology, biochemical indexes, immunology and histopathology among species. we found that both given that host factors may be involved in viral pathogenesis, we designed an experiment in the present study to investigate whether host genetics, age and gender affect sars-cov- infection in non-human primates ( figure ). two families of non-human primates, old world monkeys ( m. mulatta, m. fascicularis) and new world monkeys ( c. jacchus), were chosen for this experiment after screening and randomly grouped based on species, age and gender ( figure ). animals in large size ( adults and old m. mulatta, m. fascicularis) were inoculated with . 
x pfu of sars-cov- via routes (intratracheally . ml, intranasally . ml and intra . ml). half the dosage of virus was given to young m. mulatta via the same routes. six c. jacchus were inoculated with . x pfu intranasally. animals were monitored daily and sampled at the indicated time points before and after viral inoculation ( figure ). increased body temperature (bt) (above ℃ ) was continuously observed in % ( / ) m. mulatta, . % ( / ) m. fascicularis and none ( / ) of c. jacchus post inoculation of sars-cov- . bt of / m. mulatta reached the first peak on day - post inoculation (dpi) and lungs is increased and thickened, and scattered in small patches ( figure b ). an overview of the chest radiographs throughout this study showed that progressive pulmonary infiltration was noted in all m. mulatta and m. fascicularis (supplemental figure ). to characterize the dynamics of viral replication and virus shedding, samples of nasal swabs, throat swabs, anal swabs, feces, blood and tissues were collected at the indicated time points, and sars-cov- genomes were quantitated by rt-qpcr. swab samples collected on dpi from m. mulatta and m. fascicularis showed surprisingly high levels of viral genome rna, particularly in nasal swabs. most swab samples showed a second peak of viral rna on - dpi. in some swab samples from old world monkeys, viral rna was still detectable on dpi ( figure a ). less viral rna was detected in throat swabs than in nasal swabs and anal swabs. in contrast to m. mulatta and m. fascicularis, lower levels of viral rna were detected in swab samples from c. jacchus during the two weeks post viral inoculation ( figure b ). virus shedding in feces from out of old world monkeys started on dpi. notably, higher levels of viral rna were detected in feces from sxh , hhh , hhh and hhh , from which anal swabs also gave correspondingly higher levels of viral rna. in peripheral blood, / old world monkeys and / new world monkeys had viral rna detectable on dpi.
after days, blood samples from nearly all old world monkeys became viral rna-positive. on dpi, viral rna was not detectable in blood samples from almost all ( / ) of experimental monkeys ( figure a ). viral load of sars-cov- was determined in tissue samples from animals necropsied on dpi (hhh ), dpi (hhh ), dpi (hhh ), dpi (sxh- , rh , rh ) and dpi (hhh , hhh ). rt-qpcr with sars-cov- specific primers and probes was performed to quantitate copy number of viral figure d and figure e ). under tem, no viral particle was observed in the ultrathin sections of lungs and other tissues. to determine the host response to sars-cov- infection, we firstly pandora's box after years , leaves us not just many mysteries but also hopes in its jar. animal models of covid- , particularly non-human primate models, will give us hopes to uncover many mysteries of sars-cov- and however, animals in this study were screened prior to experiments and supposed to be no severe complications/comorbidities and relatively healthy with normal immunity. this may explain that no matter age and gender within one species, monkeys showed similar susceptibility and responses to sars- monitoring body temperature is one of critical steps to define covid- suspected patients, which leads to the conclusion that fever is the most common in covid- patients, counting for more than % of all inpatients , , . here, all sars-cov- inoculated m. mulatta had increased body temperature at some time points with a peak of . ℃ . one-third of m. fascicularis and c. jacchus had a slightly elevated body temperature. nevertheless, this manifestation could not be defined as fever since we don't have the physiological body temperature of various species of monkeys as a reference. the other important clinical feature of covid- is abnormal changes of chest radiograph from x-ray and ct , which is generally considered to be direct evidence for pneumonia. ground glass opacities (ggo) were seen in %- - . 
% of covid- patients, followed by consolidation ( - . %) and other abnormal radiographs , , , . in this study, all animals (c. jacchus jacchus, showed severe histopathological changes in lung as pneumonia, and inflammation in liver and heart. in conclusion, the nhp model in this study simulates several important aspects of covid- . using this model, we should further explore pathogenesis of sars-cov- in order to answer some critical remaining questions in clinics, such as the refractory patients, cytokine storm, antibodies against sars-cov- , and so on. and this model is suitable for preclinical evaluation of anti-viral drugs and vaccines against sars-cov- . all animal procedures were approved by the institutional animal care and viral stock of sars-cov- was obtained from the center of diseases control, guangdong province china. viruses were amplified on vero-e cells and concentrated by ultrafilter system via kda module (millipore). amplified sars-cov- were confirmed via rt-pcr, sequencing and transmission electronic microscopy, and titrated via plaque assay ( pfu/ml). three species of monkeys from two families of primates (old world monkeys and new world monkey) were used for this study. detailed information about experimental animals was shown in figure a . animal groups and experimental schedules were outlined in figure a . referring to the nhp model of sars, we inoculated old world monkeys with total . ml of pfu/ml sars-cov- intratracheally ( ml), intranasally ( . ml) and on conjunctiva ( . ml), new world monkeys with . ml intranasally. animals were daily checked for clinical sign and body temperature. at the indicated time points in figure a , we anaesthetized animals with ketamine and performed the following experimental procedures. 
every other day post inoculation, we took chest radiographs of the animals, harvested peripheral blood to prepare samples of whole blood or serum, and collected nasal, pharyngeal, and rectal swabs in ul trizol ls solution (invitrogen, us) for further analysis. monkeys with severe signs were chosen for necropsy, and pathological changes of all organs were recorded and evaluated at gross, histological and ultrastructural levels. a chest x-ray image of each anaesthetized monkey was taken with - v and - . ma every other day using a mobile digital medical x-ray photography system (mobilecooper, browiner china). data were evaluated and scored double-blindly and independently by two radiologists. for paraffin-embedded sections, tissues were collected and fixed in % neutral- a novel nucleic acid hybridization technique, rnascope, was performed to detect and localize viral rna in paraffin-embedded tissue sections using a sars-cov- specific probe (rnascope® probe-v-ncov -s, acd, cat no. , targeting the region of - (nc_ . ) of sars-cov- genome). ffpe slides were pretreated by baking, de-paraffinizing, target retrieval and protease treatment. the sars-cov- specific probe was applied to the pretreated slides and hybridized with the viral genome. fluorophore was added to detect the hybridization on slides, followed by counter-staining with dapi. slides were scanned via d histech system for further analysis. levels of sars-cov- -specific antibodies in serum samples were evaluated via the commercially available sars-cov- antibody assay kit (elisa) (cat#xg h , china). in this kit, viral spike protein was coated in each well of a -well plate to capture spike-specific antibodies. hrp-conjugated goat anti-human igg (h+l) antibody was added to detect the captured spike-specific antibodies. data were plotted via the software graphpad.
virus discovered by chinese scientists investigating pneumonia outbreak. the wall street journal, updated jan. , : pm et ( and heart (right) fixed in . % glutaraldehyde and % osmium tetroxide solution were for ultrastructural analysis (c). histopathological changes were described in the text. inoculated with sars-cov- . biorxiv, . . . ( ). . chao shan, y.-f. y., xing-lou yang et al. . infection with novel coronavirus (sars-cov- ) cau hw did the animal experiments.; my, cm, sz, jl, yd detected viral rna; jx did rnascope; lj and xl performed immunofluorescent assay vero-e cells yz did histopathological work; hz and ck provided viral stock; qs, yh, qd, ql gave suggestions to the study. all authors have approved the submitted manuscript. the authors would like to thank all staff in national kunming high-level biosafety primate research center for providing absl -and absl -related services. this study was supported by yfc and yfc . . % ↓ *body weights (bw) of animals were measured on day and day post virus inoculation. body weight change (%) was expressed as |bw on dpi-bw on dpi|/bw on dpi. key: cord- - w xgt authors: kirchdoerfer, robert n.; wang, nianshuang; pallesen, jesper; wrapp, daniel; turner, hannah l.; cottrell, christopher a.; corbett, kizzmekia s.; graham, barney s.; mclellan, jason s.; ward, andrew b. title: receptor binding and proteolysis do not induce large conformational changes in the sars-cov spike date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: w xgt severe acute respiratory syndrome coronavirus (sars-cov) emerged in as a highly transmissible pathogenic human betacoronavirus. the viral spike glycoprotein (s) utilizes angiotensin-converting enzyme (ace ) as a host protein receptor and mediates fusion of the viral and host membranes, making s essential to viral entry into host cells and host species tropism.
as sars-cov enters host cells, the viral s undergoes two proteolytic cleavages at s /s and s ’ sites necessary for efficient membrane fusion. here, we present a cryo-em analysis of the trimeric sars-cov s interactions with ace and of the trypsin-cleaved s. surprisingly, neither binding to ace nor cleavage by trypsin at the s /s cleavage site impart large conformational changes within s or expose the secondary cleavage site, s ’. these observations suggest that s ’ cleavage does not occur in the s prefusion conformation and that additional triggers may be required. severe acute respiratory syndrome coronavirus (sars-cov) emerged in humans in and rapidly spread globally causing , cases and associated deaths in countries through july . sars-cov reappeared in a second smaller outbreak in , but has since disappeared from human circulation. however, closely related coronaviruses, such as wiv , currently circulate in bat reservoirs and are capable of utilizing human receptors to enter cells . the more recent emergence of middle east respiratory syndrome coronavirus (mers-cov) and the likelihood of future zoonotic transmission of novel coronaviruses to humans from animal reservoirs make understanding the coronavirus infection cycle of great importance to human health.
coronaviruses are enveloped viruses possessing large, trimeric spike glycoproteins (s) required for the recognition of host receptors for many coronaviruses as well as the fusion of viral and host cell membranes for viral entry into cells . during viral egress from infected host cells, some coronavirus s proteins are cleaved into s and s subunits. the s subunit is responsible for host-receptor binding while the s subunit contains the membrane-fusion machinery. during viral entry, the s subunit binds host receptors in an interaction thought to expose a secondary cleavage site within s (s ´) adjacent to the fusion peptide for cleavage by host proteases - . this s ´ proteolysis has been hypothesized to facilitate insertion of the fusion peptide into host membranes after the first heptad repeat region (hr ) of the s subunit rearranges into an extended α-helix - . subsequent conformational changes in the second heptad repeat region (hr ) of s form a six-helix bundle with hr , fusing the viral and host membranes and allowing for release of the viral genome into host cells. coronavirus s is also the target of neutralizing antibodies , making an understanding of s structure and conformational transitions pertinent for investigating s antigenic surfaces and designing vaccines. the sars-cov s subunit is composed of two distinct domains: an n-terminal domain (s ntd) and a receptor-binding domain (s rbd) also referred to as the s ctd or domain b. each of these domains have been implicated in binding to host receptors, depending on the coronavirus in question. however, most coronaviruses are not known to utilize both the s ntd and s rbd for viral entry . sars-cov makes use of its s rbd to bind to the human angiotensin-converting enzyme (ace ) as its host receptor , . recent examination using cryo-electron microscopy (cryo-em) has illuminated the prefusion structures of coronavirus spikes [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . 
initial examination of hcov-hku s showed that the receptor-binding site on the s rbd was occluded when the rbd was in a 'down' conformation and it was hypothesized that conformational changes were required to access this site . subsequent studies of the highly pathogenic human coronavirus s proteins of sars- cov , and mers-cov , showed that these viral s rbd do indeed sample an 'up' conformation where the receptor-binding site is accessible. these structural studies also located the positions of the s /s and s ´ cleavage sites on the prefusion spike. the s /s site lies within a surface exposed loop in the second subdomain of s . however, the s ´ site lies closer to base of the spike and though this region is located on the surface of the spike, cleavage at this site is prevented by surrounding protein elements . to examine the hypothesized conformational transitions induced by proteolysis and receptor binding, we used single-particle cryo-em to determine structures of s in uncleaved, s /s cleaved and ace -bound states. three-dimensional classification of the s rbd positions and corresponding atomic protein models revealed that neither ace -binding nor trypsin cleavage at the s /s boundary induced substantial conformational changes in the cov may use a distinct mechanism of fp membrane insertion. as observed in the previous sars-cov and mers-cov s structures , , the trimeric s adopts two distinct conformations related to each of the s rbd. the 'down' conformation caps the s helices and makes extensive contacts with the s ntd. the 'up' conformation of the s rbd exposes the s rbd receptor-binding site. it has been previously reported that for wild- type sars-cov s, % of the particles contained three 'down' rbd conformations while % contained a single 'up' s rbd conformation . 
to examine the conformation of the s rbd among our sars-cov s p ectodomains, we used a local masking and -d sorting strategy to more accurately classify the conformations as being either 'down' or 'up' at each of the three s rbd positions within the trimer. this analysis revealed that the majority of the s p proteins were in the single-'up' conformation ( %) with lesser amounts of double-and triple-'up' conformations ( % and % respectively) and with no all-'down' conformation observed. the increased propensity to adopt the 'up' s rbd conformation may indicate a difference in the coronavirus s containing the p mutations, however other differences in sample preparation cannot be ruled out. ace and s c-terminal domains to examine the structure of sars-cov s bound to its receptor, ace , we combined sars-cov s p ectodomain with an excess of soluble human ace with subsequent purification by size-exclusion chromatography and immediate cryo-em specimen preparation. initial sorting of particle heterogeneity indicated spikes could be split into ace -bound ( %) and unbound ( %) classes. using a similar masking and -d sorting strategy as above we sorted the unbound s class further into classes with s conformations of one or two 'up' s rbds (fig and supplementary tables - and supplementary fig. - ) . we did not observe an all-'down' class nor a three 'up' s rbd class indicating a low prevalence of these conformations among the unbound spikes. expanding our d sorting strategy, we classified our ace -bound particles at each s rbd position and identified single, double and triple ace -bound s. we were further able to identify s rbd conformations at the non-ace occupied rbd positions to represent each population of s rbd conformations among ace -bound s. as hypothesized by previous structural work - , , the s rbd recognizes ace with an 'up' s rbd conformation. 
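the per-position sorting described above assigns every trimer to a class by its number of 'up' s rbds, and an overall proportion of 'up' rbds can then be tallied across classes. the following is a minimal sketch of that bookkeeping only; the function name and the particle counts are hypothetical and are not values from the study.

```python
def fraction_up(class_counts):
    """Fraction of all RBD positions found in the 'up' state.

    class_counts maps the number of 'up' RBDs per trimer (0 to 3)
    to the number of particles assigned to that class.
    """
    total_particles = sum(class_counts.values())
    if total_particles == 0:
        raise ValueError("no particles")
    # Each particle contributes three RBD positions, n_up of which are 'up'.
    up_positions = sum(n_up * count for n_up, count in class_counts.items())
    return up_positions / (3 * total_particles)


# Hypothetical particle counts per class (assumption, not study data).
example_counts = {0: 0, 1: 600, 2: 300, 3: 100}
overall = fraction_up(example_counts)  # (600 + 600 + 300) / 3000 = 0.5
```

weighting each class by its number of 'up' positions, rather than counting classes, is what allows datasets with different class mixtures to be compared on a single scale.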
the proportion of total 'up' s rbd conformations within the ace -bound and -unbound classes is nearly identical within this dataset ( % 'up' s rbd), similar to the proportion of total 'up' s rbd in the sars s p ectodomain dataset ( %). this strongly suggests that binding of a single ace receptor does not induce adjacent s rbds to transition from a 'down' to 'up' conformation. hence, ace is more likely to bind to an already 'up' s rbd rather than inducing the conformational changes that are required for the s rbd to become accessible to ace . it is noteworthy that despite prolonged co-incubation and an excess of ace , we had difficulties in saturating the s rbd with ace in the context of trimeric s ectodomain. this poor saturation is illustrated by the small proportion of triple-bound ace and the majority of spikes that are unbound by receptor. in contrast, isolated recombinant s rbd easily binds ace and is capable of saturating ace on target cells to block s-mediated entry . our observed sub-stoichiometric ace binding to trimeric spikes is consistent with the difficulty in using soluble ace receptor to neutralize sars-cov s pseudotyped onto vsv . the reduced binding of ace to trimeric spikes is likely due to the incomplete exposure and conformational flexibility of the s rbd. incomplete neutralization with soluble receptor was not encountered for mhv, which binds ceacam a via its s ntd, which does not undergo conformational changes , . similar to recently published mers-cov s structures , the ace -bound rbd adopts a much more extended and rotated conformation compared to s rbd modeled in previous sars-cov s structures . this difference is likely due to poor density in the hinge regions between the s rbd and subdomain (sd- ) in these previous reconstructions , rather than the presentation of a unique receptor-bound conformation.
indeed, the bound ace receptor and s rbd for all reconstructions here show poorer density quality than the less mobile regions of the sars-cov s (fig ) . to improve the density for ace -bound s rbd, we used focused refinement on this region to overcome the flexibility of these domains relative to the rest of s. this yielded a . Å resolution reconstruction with improved local density quality (fig b and c) . we successfully placed the crystal structure of the sars-cov s rbd bound to ace ( ajf.pdb ) into this density as a rigid body indicating that the previously determined crystal structure accurately recapitulates the conformation between the ace -bound s rbd in the trimeric spike. the ace -bound, s rbd extends upwards and rotates away from contacts with nearby amino acids. hence, any conformational changes induced by receptor binding to the s rbd are more likely to be caused by the absence of the s rbd contacts in the 'up' conformation, rather than the formation of additional contacts (supplemental figure ) . this model provides a flexible mechanism for how different coronavirus spikes can bind to different protein receptors with their s rbd and facilitate fusion with host cells. moreover, movements of the s rbd to the 'up' fig. ). nearing the end of the time course additional lower molecular weight bands are observed which we interpret to be degradation of the s subunit. regardless of which construct was used or whether ace was bound to the s ectodomains, there is no prominent band that corresponds to a s ʹ′ cleavage product (approximately kda). to analyze the cleavage products in detail, we performed cryo-em analysis on the trypsin-cleaved sars-cov s p ectodomain. using all-particles and c symmetry yielded a reconstruction at . Å resolution (fig. , supplementary tables and and supplementary fig. ). the short loop containing the s /s cleavage site is disordered in the uncleaved spike reconstruction and remains disordered in the trypsin cleaved reconstruction. 
moreover, examination of the structure models indicates no significant differences between the trypsin- cleaved and uncleaved sars-cov s (fig. b) . fine sorting of s rbd positions of the trypsin- cleaved s reveals a very similar distribution of 'up' s rbd conformations available for receptor binding as in the uncleaved samples, although we additionally observe a small proportion of s rbd in the all-'down' conformation (fig. c) . these results indicate that trypsin-cleavage at s /s does not impart large conformational changes on the sars-cov s and justifies the removal of s /s cleavage sites for the production of more homogeneous material as vaccine immunogens. this suggests that although cleavage at s /s may remove an obstacle for conformational changes leading to fusion, s /s cleavage alone does not produce significant conformational changes. terminal helix of s hr (fig ) . exposure of this site for cleavage may require remodeling of this penultimate loop or hr beyond the conformation observed in the prefusion state. we hypothesize that additional triggers beyond cleavage at the s /s site or protein-receptor binding are needed to transition the spike from its prefusion state to a yet to be observed intermediate. changes and that the s ′ proteolysis does not occur in the s prefusion state (fig. ) . this grids were loaded onto a titan krios and data was collected using leginon at a total dose of e -/Å . frames were aligned with motioncor (ucsf) implemented in the appion workflow . particles were selected using dog picker . images were assessed and particle picks were masked using em hole punch . the ctf for each image was estimated using gctf . electron microscopy data processing initial particle stacks were cleaned using multiple rounds of d classification in relion . good particles were selected as resembling prefusion coronavirus spikes. 
for the sars s p and trypsin-treated sars s p, all particles from the clean stacks were used for reconstruction with c symmetry. all datasets were extensively sorted using d classification to examine heterogeneity in the s rbds as described previously . briefly, d masks were defined to encompass the possible heterogeneity at each s rbd position. the density within these masks was then removed from unfiltered, unsharpened reconstructions. we then used relion_project with image subtraction to create a particle stack containing only the signal arising from the masked density. finally, we used focused d classification to identify compositional and conformational states at each s rbd position. all d reconstructions were produced with relion and final refinements were performed with a six-pixel soft-edge solvent mask. post- processing was applied to each reconstruction to apply b-factor sharpening and amplitude corrections as well as to calculate local resolution maps. coordinate models were built for several of the high-resolution reconstructions using i .pdb , ajf.pdb and x s.pdb as template models with reference to a recently sars and mers: recent insights into emerging coronaviruses proceedings of the national academy of sciences of the united states of america mechanisms of coronavirus cell entry mediated by the viral spike protein two-step conformational changes in a coronavirus envelope glycoprotein mediated by receptor binding and proteolysis receptor-bound porcine epidemic diarrhea virus spike protein cleaved by trypsin induces membrane fusion proteolytic processing of middle east respiratory syndrome coronavirus spikes expands virus tropism inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex structure of influenza haemagglutinin at the ph of membrane fusion tectonic conformational changes of a 
computational resources for electron microscopy at the scripps research institute are supported by nih grant od processed electron microscopy data. r.n.k and c.a.c. built and refined atomic models we gratefully acknowledge travis nieusma, charles bowman, jean-christophe ducom and bill anderson for microscopy and computational support. we also thank lauren holden for a critical reading of this manuscript. this work was supported by grants from nih/niaid to a.b.w and key: cord- -ely aen authors: pickering, brad s.; smith, greg; pinette, mathieu m.; embury-hyatt, carissa; moffat, estella; marszal, peter; lewis, charles e. title: susceptibility of domestic swine to experimental infection with sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ely aen sars-cov- , the agent responsible for covid- has been shown to infect a number of species. the role of domestic livestock and the risk associated for humans in close contact remains unknown for many production animals.
determination of the susceptibility of pigs to sars-cov- is critical towards a one health approach to manage the potential risk of zoonotic transmission. here, we show that pigs undergoing experimental inoculation are susceptible to sars-cov- at low levels. viral rna was detected in group oral fluids and in nasal wash from at least two animals, while live virus was isolated from one pig. further, antibodies could be detected in two animals at and days post infection, while oral fluid samples at days post inoculation indicated the presence of secreted antibodies. these data highlight the need for additional livestock assessment to better determine the potential role domestic animals may play in the sars-cov- pandemic. severe acute respiratory syndrome coronavirus (sars-cov- ), the agent of coronavirus disease , was recently identified as a cause of severe respiratory distress in humans, with presentations ranging from asymptomatic or mild to severe and sometimes fatal ( ). rapidly spreading, this novel virus emerged in wuhan, china, and generated a pandemic, as declared by the world health organization on march th ( ). sars-cov- is predicted to have originated in bats, but its origins are still under intense investigation as reports continue to identify the ability of the virus to infect new animal species ( ) ( ) ( ) ( ) ( ) ( ) . detection of natural infections has recently shed light on knowledge gaps in understanding transmission, which has raised concerns regarding amplifying or reservoir hosts. in turn, a better understanding of wildlife and domestic animal susceptibility is required to assess their potential roles and the risks they present, in order to prevent future spread of disease. domestic swine, one of the most significant and highly produced agricultural species with previous impacts on public health, must be assessed ( - ).
the increase in "backyard" small stakeholder animal production in both rural and urban environments provides an important source of high-quality protein and income, but can also serve as a source for zoonotic disease; therefore, it is important to investigate their potential role during sars-cov- spread ( ). evidence for the involvement of production animals was recently highlighted in the netherlands where anthroponotic transmission of sars-cov- from humans to farmed mink with subsequent zoonotic transmission to at least two humans from mink has been proposed, further exemplifying the need to identify the potential role of production animals in disease transmission ( ) . angiotensin-converting enzyme (ace ) has been identified to be the receptor for sars-cov- ( ). a basic local alignment search tool (blast) query of the protein database using translated nucleotide (blastx) from the human ace coding sequence predicts % coverage and % identity for the homologous receptor in swine. interestingly, using the same search both mink ( %) and feline ( %) show similar identity to the human ace for their cognate receptors. moreover, both mink and cats have been reported to be susceptible to sars-cov- and have shown transmission to other animals ( , ). work by zhou et al. utilized in vitro infectivity studies testing ace receptor from laboratory mice, horseshoe bats, civets and the domestic pig. all of the respective receptors, except mice, were reported to enter hela cells indicating a functional target for sars-cov- . moreover, the authors employed additional known coronavirus receptors including both aminopeptidase n and dipeptidyl peptidase finding neither are used for cell entry outlining the specificity for the ace receptor ( ). the work reported here aims to determine whether domestic swine are susceptible to sars-cov- infection, providing critical information to aid public health risk assessments. 
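the coverage and identity figures quoted above for the ace blast comparison can be illustrated with a small calculation. the sketch below is not the blastx procedure itself: it assumes a single, already-aligned pair of protein sequences with `-` marking gaps, and the function name `identity_and_coverage` and the toy sequences are hypothetical. identity is taken as identical positions over alignment length and coverage as aligned query residues over the full query length, which is how these quantities are commonly reported.

```python
def identity_and_coverage(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent query coverage) for one gapped alignment.

    aligned_query / aligned_subject: equal-length strings, '-' marks a gap.
    query_length: length of the full, ungapped query sequence.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if query_length <= 0:
        raise ValueError("query_length must be positive")

    alignment_length = len(aligned_query)
    # Identical, non-gap columns of the alignment.
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Query residues that take part in the alignment (gap columns excluded).
    query_residues = sum(1 for q in aligned_query if q != "-")

    identity = 100.0 * matches / alignment_length
    coverage = 100.0 * query_residues / query_length
    return identity, coverage


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not ACE sequences.
    ident, cov = identity_and_coverage("MSSS-WLL", "MSGSSWLL", query_length=10)
    print(f"identity: {ident:.1f}%  coverage: {cov:.1f}%")
```

a real comparison would take these values directly from the blast report rather than recomputing them.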
following oronasal inoculation, swine were assessed for clinical signs and pathology, evidence of virus shedding, viral dissemination within tissues, and seroconversion. the data presented in this study provide evidence that live sars-cov- virus can persist in swine for at least days following experimental inoculation.
guidelines. group housing was carried out in the bsl- zoonotic large animal cubicles, and animals were provided with commercial toys for enrichment and access to food and water ad libitum. all invasive procedures, including experimental inoculation and sample collection (nasal washes, rectal swabs, and blood collection) were performed under isoflurane gas anesthesia, and hematology, chemistry, and blood gas analyses. hematology was performed on an hm analyzer (abaxis) using k edta-treated whole blood and the following parameters were evaluated: red blood cells, hemoglobin, hematocrit, mean corpuscular volume, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, red cell distribution width, platelets, mean platelet volume, white blood cells, neutrophil count (absolute (abs) and %), lymphocyte count (abs and %), monocyte count (abs and %), eosinophil count (abs and %), and basophil count (abs and %). blood chemistries were evaluated on a vetscan (abaxis) with the comprehensive diagnostic profile rotor (abaxis) using serum stored at - ˚c until tested and the following parameters were evaluated: glucose, blood urea nitrogen, creatinine, calcium, albumin, total protein, alanine aminotransferase, aspartate aminotransferase, alkaline phosphatase, amylase, potassium, sodium, phosphate, chloride, globulin, and total bilirubin.
sodium heparin-treated blood was used to analyze venous blood gases, which were performed on an istat alinity v machine (abaxis) using a cg + cartridge (abaxis) to measure the following parameters: lactate, ph, total carbon dioxide, partial pressure of carbon dioxide, partial pressure
experimental inoculation of sixteen eight-week-old swine was performed oronasally with x pfu of sars-cov- , distributed evenly between both nostrils and the distal pharynx. starting at day post inoculation (dpi), pigs developed a mild, bilateral ocular discharge and, in some cases, this was accompanied by serous nasal secretion. this was observed for only the first three days post inoculation. temperatures remained normal throughout the study (table s ). overall, animals did not develop clinically observable respiratory distress; however, one animal (pig - ) presented mild depression at dpi accompanied by a cough, which was maintained through dpi. this animal did not display additional clinical signs over the course of the study.
(table ) . every other day starting at dpi to dpi, oral, nasal, and rectal swabs were sampled to evaluate the potential for delayed onset ( ). nucleic acid was extracted from swabs and rt-qpcr was performed to identify sars-cov- by targeting the envelope gene (e gene). viral rna could not be detected in swabs from any animals over the course of the study (table , a) . nasal washes are a sensitive method for detection of pathogens in swine and were routinely sampled using sterile d-pbs to rinse nasal passages. two pigs ( - , - ) displayed low levels of viral rna by rt-qpcr at dpi (table ,
detection of sars-cov- was also attempted from whole blood by rt-qpcr, following the sampling schedule outlined in table . as outlined in table a , viremia, as indicated by the presence of viral rna in the blood, could not be detected in any animal throughout the study.
blood cell counts, chemistries, and gases were measured using the abaxis hm , vetscan , and istat , respectively. although some variation was observed throughout the study, changes were minimal and inconclusive, and profiles consistent with acute viral infection or subsequent organ damage were not observed. to identify potential target tissues or gross lesions consistent with sars-cov- disease, necropsy was performed on two animals starting at dpi and every other day up to day , with an additional two pigs necropsied at both and dpi (table ) (table ) .
the development of sars-cov- neutralizing antibodies was monitored over the course of the study. starting at dpi, serum was obtained from individual animals for both a virus neutralization test (traditional vnt) and a surrogate virus neutralization test (svnt; genscript). sera were first tested using the traditional vnt, with one pig ( - ) generating neutralizing antibody titers, albeit weak, at a : dilution with a % reduction of plaques at both and dpi (table ) . the svnt assay likewise identified the same animal, pig - , as antibody positive with . µg/ml antibody at dpi. a second pig ( - ) was shown to have generated antibody at dpi ( . µg/ml) and dpi ( . µg/ml). the svnt was also employed to identify secreted antibody in oral fluids throughout the study. interestingly, at dpi we detected positive antibody ( . µg/ml) from group oral fluid collected from cubicle (table ) .
the results presented in this study define domestic swine as a species susceptible, albeit at low levels, to sars-cov- infection. one animal was found to retain live virus, while two additional animals had detectable rna measured in the nasal wash, and two pigs developed antibodies. in total, of the sixteen animals experimentally inoculated, five displayed some level
we would like to thank the public health agency of canada for the sars-cov- isolate used in this study, in addition to the animal care and genomics units for their support during this project.
we would also like to thank dr. claire andreasen for her review of the clinical pathology early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia pathogenesis and transmission of sars-cov- in golden hamsters susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus sars-cov infection in farmed mink identifying sars-cov- related coronaviruses in malayan pangolins the proximal origin of sars-cov- . nature medicine evidence for sars-cov- infection of animal hosts emerging swine zoonoses. vector-borne and zoonotic diseases nipah virus: a recently emergent deadly paramyxovirus fatal encephalitis due to nipah virus among pig-farmers in malaysia japanese encephalitis virus infection, diagnosis and control in domestic animals veterinary vaccines and their importance to animal health and public health. procedia in vaccinology coronavirus rips through dutch mink farms, triggering culls functional assessment of cell entry and receptor usage for sars- cov- and other lineage b betacoronaviruses. nat microbiol transmission of sars-cov- in domestic cats a pneumonia outbreak associated with a new coronavirus of probable bat origin hematology and biochemistry reference intervals for ontario commercial nursing pigs close to the time of weaning the biomedical piglet: establishing reference intervals for haematology and clinical chemistry parameters of two age groups with and without iron supplementation recombinant nipah virus vaccines protect pigs against challenge experimental inoculation study indicates swine as a potential host for hendra virus detection of novel coronavirus ( -ncov) by real-time rt-pcr a genomic perspective on the origin and emergence of sars-cov- sars-cov- in fruit bats, ferrets, pigs, and chickens: an experimental transmission study. 
the lancet microbe key: cord- -i pic o authors: boris, bonaventure; antoine, rebendenne; de gracia francisco, garcia; marine, tauziet; joe, mckellar; valadão ana luiza, chaves; valérie, courgnaud; eric, bernard; laurence, briant; nathalie, gros; wassila, djilli; mary, arnaud-arnould; hugues, parrinello; stéphanie, rialle; olivier, moncorgé; caroline, goujon title: a genome-wide crispr/cas knock-out screen identifies the dead box rna helicase ddx as a broad antiviral inhibitor date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: i pic o genome-wide crispr/cas knock-out genetic screens are powerful approaches to unravel new regulators of viral infections. with the aim of identifying new cellular inhibitors of hiv- , we have developed a strategy in which we took advantage of the ability of type interferon (ifn) to potently inhibit hiv- infection, in order to create a cellular environment hostile to viral replication. this approach led to the identification of the dead-box rna helicase ddx as an intrinsic inhibitor of hiv- . depletion of endogenous ddx using sirna or crispr/cas knock-out increased hiv- infection, both in model cell lines and in physiological targets of hiv- , primary cd + t cells and monocyte-derived macrophages (mdms), and irrespectively of the ifn treatment. similarly, the overexpression of a dominant-negative mutant of ddx positively impacted hiv- infection, whereas wild-type ddx overexpression potently inhibited hiv- infection. the positive impact of endogenous ddx depletion on hiv- infection was directly correlated to an increase in viral dna accumulation. interestingly, proximity ligation assays showed that ddx , which can be mainly found in the nucleus but is also present in the cytoplasm, was in the close vicinity of hiv- capsid during infection of primary monocyte-derived macrophages. 
moreover, we show that ddx is also able to substantially decrease infection with other retroviruses and retrotransposition of long interspersed elements- (line- ). finally, we reveal that ddx potently inhibits other pathogenic viruses, including chikungunya virus and severe acute respiratory syndrome coronavirus (sars-cov- ). over the past years, a growing list of cellular proteins with various functions have been identified as capable of limiting different steps of hiv- life cycle (doyle et al., ; ghimire et al., ) . lentiviruses have generally evolved to counteract the action of these so-called restriction factors. however, type interferons (ifns) induce, through the expression of interferonstimulated genes, an antiviral state particularly efficient at inhibiting hiv- when cells are preexposed to ifn (ho et al., ; bednarik et al., ; coccia et al., ; baca-regen et al., ; goujon and malim, ; cheney and mcknight, ) . the dynamin-like gtpase mx , and, very recently, the restriction factor trim a, have both been shown to participate in this ifninduced inhibition (goujon et al., a; kane et al., ; ohainle et al., ; jimenez-guardeño et al., ) . with the hypothesis that additional hiv- inhibitors remained to be identified, we took advantage of the hostile environment induced by ifn to develop a wholegenome screen strategy in order to reveal such inhibitors. the development of crispr/cas as a genome editing tool in mammalian cells has been a major breakthrough, notably with the generation of pooled single guide (sg)rna libraries delivered with lentiviral vectors (lvs), allowing high-throughput screens at the whole-genome scale (shalem et al., , doench, ) . we used the genome-scale crispr knock-out (gecko) sgrna library developed by feng zhang's laboratory shalem et al., shalem et al., , to generate cell populations knocked-out for almost every human gene in the t g glioblastoma cell line. 
this model cell line is both highly permissive to lentiviral infection and potently able to suppress hiv- infection following ifn treatment (supplementary information, si figure ). the screen strategy is depicted in figure a . t g cells were first modified to stably express cas and a high number of t g-cas cells were then transduced with lvs coding for sublibraries a or b, the two halves of the gecko library. a low multiplicity of infection (moi) was used to avoid multiple integration events and increase the probability of expressing only one sgrna per cell. deep sequencing analysis of the gecko cell populations showed more than % coverage for both libraries (≥ reads for , and , sgrna-coding sequences out of
gecko populations were subjected to type ifn treatment in order to induce the antiviral state and, h later, incubated with vsv-g-pseudotyped, hiv- based lvs coding for an antibiotic resistance cassette. two days later, the cells successfully infected despite the ifn treatment were selected by cell survival in the presence of the corresponding antibiotic. in order to enrich the population with mutants of interest and to limit the presence of false-positives, two additional rounds of ifn treatment, infection and selection (using different antibiotics) were performed ( figure a ). as expected, the cells enriched after each round of the screen became less refractory to hiv- infection following ifn treatment (si figure ).
the gecko populations were then exposed to ifn for h and challenged with hiv- -based lvs coding for an antibiotic resistance gene. after selection by antibiotic addition, the surviving cells (i.e. efficiently infected despite the ifn treatment) were amplified. in total, the gecko population underwent three successive rounds of ifn treatment, infection and selection using lvs coding for different resistance cassettes.
the genomic dnas of the initial gecko populations and the three-time selected populations were extracted, and the sgrna-coding sequences were amplified by pcr and sequenced by next generation sequencing (ngs). b. the candidate genes were identified using the mageck computational statistical tool (li et al., ). mageck establishes a robust rank aggregation (rra) score for each gene based on the sgrna enrichment and the number of sgrnas targeting the same gene. here, genes belonging to the ifn-response pathway (indicated in blue) and ddx (in red) are represented (together with their respective rank in brackets) for the independent screens (the results of which were merged in the analysis). c. t g/cd /cxcr /cas /firefly ko populations were generated for the best candidate genes of each screen. the control (ctrl) condition represents the mean of four negative control cell populations (i.e. expressing different non-targeting sgrnas) and ifnar and mx ko cell populations were used as positive controls. ko cell populations were pre-treated with ifn and infected with hiv- renilla. the cells were lysed h post-infection and the two luciferase signals were measured (renilla signals were normalized to internal firefly control). the ifn inhibition (corresponding to the ratio of the untreated / ifn-treated conditions) was calculated and set at % inhibition for the average of the negative ctrl populations. a representative experiment is shown (mean and standard deviation from technical duplicates).
[figure panel labels: ctrl, ifnar, mx, wars, ddx and further candidate genes; gecko population versus selected population]
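the readout described in panel c (renilla normalized to firefly, ifn inhibition as the ratio of untreated to ifn-treated conditions, expressed relative to the mean of the negative controls) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' analysis script: the function names and all luminescence values are hypothetical.

```python
from statistics import mean


def normalized_signal(renilla, firefly):
    """Renilla luminescence normalized to the internal firefly control."""
    if firefly <= 0:
        raise ValueError("firefly signal must be positive")
    return renilla / firefly


def ifn_inhibition(untreated, ifn_treated):
    """Fold inhibition: normalized untreated signal / normalized IFN-treated signal.

    Each argument is a (renilla, firefly) pair of raw luminescence readings.
    """
    return normalized_signal(*untreated) / normalized_signal(*ifn_treated)


def percent_of_control(sample_fold, control_folds):
    """Express a fold inhibition relative to the mean of the negative controls (= 100)."""
    return 100.0 * sample_fold / mean(control_folds)


if __name__ == "__main__":
    # Hypothetical readings: four non-targeting control populations and one KO population.
    controls = [ifn_inhibition((1000.0, 10.0), (50.0, 10.0)) for _ in range(4)]
    knockout = ifn_inhibition((1000.0, 10.0), (200.0, 10.0))
    print(percent_of_control(knockout, controls))
```

a ko population in which ifn no longer protects gives a fold inhibition close to 1 and therefore a low percentage of the control value.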
to identify the genes of interest, the differential sgrna abundance between the starting gecko populations and the enriched ( -times selected) populations was analysed by ngs. , and , different sgrnas were identified (≥ reads) for screens a and b, which represented , % and % of the sgrnas present in the initial gecko population a and b, respectively. the mageck algorithm, which assigns a robust rank aggregation (rra) score, was used to rank the gene candidates from each screen ( figure b ). for both screens, we observed a positive enrichment for genes (rra score > , ), with the best hits being ifnar , jak and stat ( figure b ). all the crucial mediators of the type ifn signalling cascade were present among the top hits in both screens (with the notable exception of stat ), validating our approach and confirming the identification of relevant genes. interestingly, most of the other positively selected genes displayed unknown functions or functions that were a priori unrelated to the ifn response pathway or to innate immunity. of note, very little overlap was observed between the two independent screens, performed with two different sub-libraries. however, a poor overlap between independent screens has been observed before and does not preclude obtaining valid data (doench, ) . therefore, the top candidate genes from each independent screen were selected for further validation. as a first validation step, the sequence of the most enriched sgrna for each gene was chosen and cloned into the lentiguide-puro vector. t g-cas cells expressing hiv- cd and cxcr receptors, as well as the firefly luciferase as internal control (t g cas /cd /cxcr /firefly cells), were transduced with the sgrna-expressing lvs to generate individual ko populations. four irrelevant, non-targeting sgrnas, as well as sgrnas targeting ifnar and mx , were used to generate negative and positive control populations, respectively.
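the comparison of sgrna abundance between the starting and selected populations, described above, can be illustrated with a deliberately simplified enrichment calculation. this sketch is not mageck and does not compute an rra score: it assumes raw read counts per sgrna, normalizes them to counts per million, takes a log2 fold change with a pseudocount, and summarizes each gene by the median of its sgrnas. all names (`log2_fold_changes`, `gene_scores`, `geneA`, `geneB`) and counts are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def log2_fold_changes(initial_counts, selected_counts, pseudocount=1.0):
    """Per-sgRNA log2 fold change (selected vs initial) on counts-per-million values."""
    total_initial = sum(initial_counts.values())
    total_selected = sum(selected_counts.values())
    if total_initial == 0 or total_selected == 0:
        raise ValueError("both libraries need at least one read")

    lfc = {}
    for sgrna, n_initial in initial_counts.items():
        n_selected = selected_counts.get(sgrna, 0)  # sgRNAs lost during selection count as 0
        cpm_initial = 1e6 * n_initial / total_initial + pseudocount
        cpm_selected = 1e6 * n_selected / total_selected + pseudocount
        lfc[sgrna] = math.log2(cpm_selected / cpm_initial)
    return lfc


def gene_scores(lfc, sgrna_to_gene):
    """Summarize each gene by the median log2 fold change of its sgRNAs."""
    per_gene = defaultdict(list)
    for sgrna, value in lfc.items():
        per_gene[sgrna_to_gene[sgrna]].append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


if __name__ == "__main__":
    initial = {"geneA_sg1": 100, "geneA_sg2": 100, "geneB_sg1": 100, "geneB_sg2": 100}
    selected = {"geneA_sg1": 300, "geneA_sg2": 100, "geneB_sg1": 0, "geneB_sg2": 0}
    mapping = {name: name.split("_")[0] for name in initial}
    scores = gene_scores(log2_fold_changes(initial, selected), mapping)
    for gene, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(gene, round(score, 2))
```

ranking genes by such a median is only a rough proxy; a dedicated tool additionally models count variance and tests whether a gene's sgrnas are consistently ranked near the top.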
the ko cell populations were pre-treated with ifn and infected with an hiv- reporter virus expressing the renilla luciferase reporter and bearing hiv- envelope (nl - /nef-ires-renilla, hereafter called hiv- renilla). infection efficiency was analysed h later ( figure c ). as expected, ifnar and mx ko fully and partially rescued hiv- infection from the protective effect of ifn, respectively (goujon et al., a; bulli et al., ; xu et al., ) . the ko of two candidate genes, namely wars and ddx , allowed a partial rescue of hiv- infection from the ifn-induced inhibition, suggesting a potential role of these candidate genes. ddx is a member of the dexd/h box family of rna helicases with rna chaperone activities (uhlmann-schiffler et al., ) and, as such, retained our attention. indeed, various dead box helicases, such as ddx , ddx and ddx , are well-known to regulate hiv- life cycle (gringhuis et al., ; sithole et al., sithole et al., , soto-rifo et al., ; williams et al., ; yedavalli et al., ) . however, to our knowledge, the impact of ddx on hiv- replication had never been studied. in order to validate the effect of ddx ko on hiv- infection in another model cell line, two additional sgrnas were designed (sgrna- and - ) and used in parallel to the one identified in the gecko screen (sgddx - ) (figure a ). u -mg/cd /cxcr cells were used here, as we previously extensively characterized the ifn phenotype in these cells (goujon et al., a) . control and ddx ko cell populations were treated or not with ifn for h prior to infection with increasing amounts of hiv- renilla. ddx depletion improved hiv- infection with all three sgrnas used, confirming that endogenous ddx had a negative impact on hiv- infection. interestingly, the increase in infection efficiency induced by ddx ko was observed irrespectively of the ifn treatment. 
ddx is not known to be an isg (interferome database and our previous study (goujon et al., a) , geo accession number: gse ), which we confirmed in a number of cell types (si figure ). the fact that the ifn-induced state is at least partially saturable (si figure ) explains why an intrinsic inhibitor of hiv- , which is not regulated by ifn, could be identified by our approach: removing one barrier to infection presumably rendered the cells generally more permissive and, in this context, ifn had less of an impact. sgddx - , and sgddx - ) and different non-targeting sgrnas, respectively (for the ctrl condition, the average of the data obtained with the four cell populations is shown). cells were pre-treated or not with ifn h prior to infection with hiv- renilla ( ng p gag ) and the ratio of renilla/firefly activity was analysed, as in ) and the infection efficiency was measured by p gag intracellular staining followed by flow cytometry analysis. when indicated, the cells were treated with µm zidovudine (azidothymidine, azt) and lamivudine ( tc) reverse transcription inhibitors for h prior to infection. d. blood monocytes from healthy donors were isolated, differentiated into mdms, and transfected with non-targeting sirnas (sictrl and sictrl ) or sirnas targeting ddx (siddx - and siddx - ). two days after transfection, mdms were infected with a ccr tropic version of nl - -renilla ( ng p gag ). infection efficiencies were monitored h later by measuring renilla activity. the relative luminescence results from experiments performed with cells from different donors are shown. e. cd + t cells were isolated from peripheral blood mononuclear cells, activated with il- and phytohemagglutinin, and electroporated with cas -sgrna rnp complexes using two non-targeting sgrnas (sgctrl and sgctrl ) and five sgrnas targeting ddx (sgddx - , - , - , - , - ) two days later. four days after electroporation, the activated cd + t cells were infected with nl - renilla for h. 
relative infection efficiencies obtained with cells from three independent donors are shown. ddx protein levels were determined by immunoblot and actin served as a loading control (a representative experiment is shown).
in order to confirm ddx 's effect on hiv- infection with an independent approach, we used different sirnas to knock down ddx expression. we observed that depleting ddx with sirnas (with > % efficiency both at the mrna and protein levels, figure b ) improved hiv- infection efficiency by to -fold when using an hiv- renilla reporter in u -mg/cd /cxcr cells, irrespective of the presence of ifn ( figure b , right panel). of note, wild-type hiv- infection was also impacted by ddx silencing, as shown by capsid (p gag ) intracellular staining h post-infection ( figure c ). we then investigated whether ddx had an impact in hiv- primary target cells. in mdms, we observed that hiv- infection was increased by about fold following ddx silencing ( figure d ), whereas ddx mrna abundance was decreased by only % in these cells using sirnas (si figure ). as the sirna approach did not work in our hands in primary t cells, we used electroporation of pre-assembled cas -sgrna ribonucleoprotein complexes (rnps) to deplete ddx in primary cd + t cells ( figure e ). highly efficient depletion of ddx was obtained with all sgrnas as compared to the sgctrls ( figure e , bottom panel) and this depletion increased hiv- infection by -to -fold, showing a role of ddx as an intrinsic inhibitor of hiv- in primary cd + t cells. having established that endogenous ddx had an impact on hiv- infection, we then analysed the consequences of ddx overexpression. an irrelevant control (firefly) or ddx was ectopically expressed in u -mg/cd /cxcr cells and the cells were challenged with hiv- renilla ( figure f ). ddx ectopic expression induced a substantial inhibition of hiv- infection (about -fold decrease in infection efficiency in comparison to the control) ( figure f ).
we then tested a mutant version of ddx that is unable to hydrolyse atp and may supposedly act as a dominant negative, ddx k e (granneman et al., ; rocak, ) (si figure ) . interestingly, the expression of ddx k e mutant increased hiv- infection by -fold, reminiscent of what we observed with ddx depletion. altogether, these data showed for the first time that endogenous ddx is able to intrinsically inhibit hiv- infection. in order to determine the step of hiv- life cycle affected by ddx , we first analysed viral entry with a blam-vpr assay (cavrois et al., ) . consistent with the observation that vsv-gpseudotyping did not bypass ddx -mediated inhibition of infection (si figure ), we observed that ddx silencing did not impact hiv- entry (si figure ) . we then quantified hiv- dna accumulation over time in ddx -silenced and control cells. ddx depletion increased by -to -fold the accumulation of early and late reverse transcript products ( figure a , b and c), as well as integrated provirus and -ltr circles at h post-infection ( figure d and e). more than % knockdown was achieved with both sirnas targeting ddx ( figure f ). these data suggested that ddx rna helicase could inhibit the reverse transcription process and/or impact the stability of hiv- genome, leading to a decrease in viral dna accumulation. we hypothesized that if that was the case, ddx should be found in close proximity to hiv- reverse transcription complexes during infection. in agreement with this, proximity ligation assay (pla) performed on mdms infected with hiv- showed that ddx could indeed be found in close vicinity of capsid ( figure f and g). we next examined the ability of ddx to inhibit infection by a range of primate lentiviruses including laboratory-adapted strains of hiv- , hiv- -transmitted founder strains, hiv- and simian immunodeficiency virus derived from the rhesus macaque (sivmac). 
tzm-bl cells were transfected with ddx -targeting or scramble sirnas and infected with vsv-g-pseudotyped lentiviruses. infection efficiencies were monitored after h by measuring β-galactosidase activity ( figure h ). ddx depletion increased infection levels with all the tested hiv- strains to the same extent than what was observed with hiv- nl - (i.e. -to -fold). hiv- rod and sivmac infection efficiencies were also slightly improved in the absence of ddx (by about -fold). the analysis was then extended to two non-primate lentiviruses, the equine infectious anaemia virus (eiav) and feline immunodeficiency virus (fiv), using gfp-coding lvs derived from these viruses in comparison to hiv- and hiv- lvs (si figure ). ddx antiviral activity appeared less potent on hiv- lvs compared to replication-competent, full-length hiv- , which might suggest that viral components, absent in lvs, could be playing a role in ddx -mediated hiv- inhibition. nevertheless, ddx depletion appeared to increase hiv- , hiv- and fiv lv infection to the same extent, i.e. by about -fold, whereas eiav infection was less impacted by ddx (si figure ). we extended this study to the gammaretrovirus murine leukaemia virus (mlv) and observed that ddx depletion led to an increase in infection with gfp-coding mlv vectors ( figure i ). these results strongly support a general antiviral activity of ddx against retroviruses. ddx can be found in the cytoplasm but is predominantly located in the nucleus (si figure ; uhlmann-schiffler et al., ; zyner et al., ) . considering that ddx showed a broad activity against retroviruses and seemed to act at the level of reverse transcription, we sought to investigate whether ddx could inhibit retrotransposons. 
long interspersed nuclear elements (line)- are non-ltr retrotransposons, which have been found to be active in the germ line (branciforte and martin, ; ergün et al., ; trelogan and martin, ) and in some somatic cells (belancio et al., ; muotri et al., ; rangwala et al., ) . interestingly, ddx was identified among the suppressors of line- retrotransposition through a genome-wide screen in k cells, although not further characterized (liu et al., ) . to confirm that ddx could inhibit line- retrotransposition, hek t cells were co-transfected with two different, gfpexpressing line- plasmids (rps or lre ) or an inactive line- (jm ) together with a ddx -or a control (firefly)-expressing plasmid ( figure j ). gfp-line- retrotransposition was quantified by flow-cytometry days post-transfection (moran et al., ) . because the gfp cassette is cloned in antisense and disrupted by an intron in this reporter system, gfp is only expressed after line- transcription, splicing and orf p-mediated reverse-transcription and integration into the host genome (moran et al., ) . considering that most line- replication cycles lead to truncations and defective integrations (gilbert et al., ) , gfp expression derived from a new integration is a relatively rare event and, as expected, the percentage of gfp+ cells observed was very low ( figure j ) (figure ) . strikingly, ddx depletion did not have an impact on iav or vsv replication ( figure a and b), thereby confirming that manipulating ddx expression did not have a broad and unspecific impact on target cells. however, depletion of endogenous ddx increased infection with zikv, chikv and sars-cov- , and had a particularly high impact on the latter two (up to log and log increase in infection efficiency in ddx -depleted cells in comparison to control cells, for chikv and sars-cov- , respectively, figure d and f). of note, silencing efficiency was similar in the two types of target cells used here ( figure e and g). 
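the increases quoted above are given on a log scale, which is easy to misread next to the fold changes used elsewhere in the text. the short sketch below shows the conversion, assuming the usual convention that a "log increase" is the base-10 logarithm of the ratio between the depleted and control conditions; the function names and the example readouts are hypothetical and do not reproduce the measured values.

```python
import math


def log10_increase(depleted, control):
    """Base-10 log of the infection readout ratio (depleted cells vs control cells)."""
    if depleted <= 0 or control <= 0:
        raise ValueError("readouts must be positive")
    return math.log10(depleted / control)


def fold_from_log10(log_increase):
    """Convert a log10 increase back to a fold change."""
    return 10.0 ** log_increase


if __name__ == "__main__":
    # Hypothetical readouts (arbitrary units), not data from the study.
    increase = log10_increase(depleted=1000.0, control=10.0)
    print(increase, fold_from_log10(increase))
```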
interestingly, ddx was recently identified as a potential inhibitor of sars-cov- replication in a whole-genome crispr/cas screen in simian vero e cells (wei et al., ) , supporting our observations that endogenous ddx potently inhibits the replication of this highly pathogenic coronavirus. taken together, our data showed that ddx is a broad inhibitor of viral infections, albeit presenting some specificity. further work is now warranted to explore in depth the breadth of ddx antiviral activity. . in contrast, our study revealed broad activity of endogenous ddx among retroviruses and retroelements, which was observed in various cell types, including primary cd + t cells. interestingly, our pla assays showed a close proximity between ddx and hiv- capsid, which is a viral protein recently shown to remain associated with reverse transcription complexes until proviral dna integration in the nucleus (burdick et al., ; dharan et al., ; peng et al., ) . this observation could suggest a direct mode of action on viral ribonucleoprotein (rnp) complexes. interestingly, we observed that ddx was able to inhibit viruses from other families, which possess different replication strategies, including sars-cov- and chikv. however, ddx did not have an impact on all the viruses we tested, reminiscent of broad-spectrum antiviral inhibitors such as mxa, which show some specificity (haller et al., ) . ddx is known to be a non-processive helicase, which also possesses rna annealing activities and the ability to displace rna-binding proteins from single-stranded rnas (uhlmann-schiffler et al., ) . moreover, ddx binds g-quadruplexes (zyner et al., ) , which are secondary structures found in cellular and viral nucleic acids and involved in various processes, such as transcription, translation and replication (fay et al., ; ruggiero and richter, ) . all these known activities of ddx would be consistent with a potential role in rnp remodeling (uhlmann-schiffler et al., ; will et al., ) . 
nonetheless, further investigation will be needed to determine whether ddx acts directly by altering viral rnps, and, if that's the case, what are the determinants for viral rnp recognition. in conclusion, this work highlights the importance of understanding the mechanism of action of ddx rna helicase and its contribution to the control of rna virus replication, an understanding which may contribute to the development of future antiviral interventional strategies. plasmids were a gift from prof. f. zhang (addgene # , # , and # , respectively ). lvs coding for sgrnas targeting the candidate genes and bearing hiv- nl - , iiib and hiv- proviral clones have been described (adachi et al., ; simon et al., ; schaller et al., ) , as well as the transmitted founder hiv- molecular clones ch .t, ch .c, rejo.c (gifts from prof. b. hahn, (ochsenbauer et al., ) ) and hiv- rod and sivmac (ryan-graham and peden, ; gaddis et al., ) . pblam-vpr and padvantage have been described (cavrois et al., ) . gfp-coding hiv- based lv system (i.e. p . hiv- gag-pol, pmd.g, and gfp-coding minigenome), and hiv- , fiv, and eiavderived, gfp coding lvs, as well as mlv-derived, gfp coding retroviral vectors have all been described (naldini et al., ; bainbridge et al., ) , (o'rourke et al., ; saenz et al., ) . the line- plasmid rps-gfp pur (prps-gfp), rps-gfp jm pur (pjm ) and plre -gfp were developed by prof. kazazian's lab (moran et al., ; ostertag et al., ; goodier et al., ) . was added at u/ml for - h prior to virus infection or rna extraction, and azt and tc (aids reagent program) at µm for h prior to infection. sang, under agreement n° pler - . peripheral blood mononuclear cells (pbmcs) were isolated by centrifugation through a ficoll® paque plus cushion (sigma-aldrich). primary human cd + t cells and monocytes were purified by positive selection using cd and cd microbeads, respectively (miltenyi biotec), as previously described (goujon et al., a) . 
hiv- renilla and nl - hiv- were produced by standard pei transfection of hek t. when indicated, pmd.g was cotransfected with the provirus at a : ratio. the culture medium was changed h later, and virus-containing supernatants were harvested h later. viral particles were filtered, purified by ultracentrifugation through a sucrose cushion ( % weight/volume in tris-nacl-edta buffer) for min at °c and , rpm using a sw ti rotor (beckman coulter), resuspended in serum-free rpmi or dmem medium and stored in small aliquots at - °c. β-lactamase-vpr (blam-vpr)-carrying viruses, bearing the wild-type env, were produced by cotransfection of hek t cells with the nl - /nef-ires-renilla provirus expression vector, pblam-vpr and padvantage at a ratio of : : . , as previously described (cavrois et al., ) . viral particles were titrated using an hiv- p gag alpha-lisa kit and an envision plate reader (perkin elmer) and/or by determining their infection titers on target cells. wild-type and/or vsv-g pseudotyped-hiv- , target cells were plated at . x cells per well in -well plates or at x cells per well in -well plates and infected for - h before lysis and renilla (and firefly) luciferase activity measure (dual-luciferase® reporter assay system promega) or fixation with % paraformaldehyde (pfa)-pbs, permeabilization (perm/wash buffer, bdbiosciences) and intracellular staining with the anti-p gag kc -fitc antibody (beckman coulter), as described previously (goujon and malim, ) . for tzm-bl assays, the bgalactosidase activity was measured using the galacto-star™ system (thermofisher scientific). (corman et al., ) . blam-vpr assay for hiv- entry. these assays were performed as described previously (goujon and malim, ) . briefly, pnl - or ptopo- ltr (generated by ptopo cloning of a -ltr circle junction amplified from nl - infected cells, using ohc and u -reverse primers into pcr™ . 
-topo™) were diluted in ng/ml of salmon sperm dna to create dilution standards used to quantify relative cdna copy numbers and confirm the linearity of all assays. proximity ligation assay. the proximity ligation assays were performed using the duolink® in situ detection reagents (sigma-aldrich, duo ). for this, mdms were plated in -well plates with coverslips pre-treated with poly-l-lysin (sigma-aldrich) and infected with µg p gag of hiv- nl - (ba-l env) or mock-infected. h later, the cells were fixed with % paraformaldehyde in pbs x for min, washed in pbs x and permeabilized with . % triton x- for min. after a couple of washes in pbs x, either ngb buffer ( mm nh cl, % goat serum and % bovine serum albumin in pbs) or duolink® blocking solution was added for h. cells were incubated with ag . mouse anti-capsid antibody obtained from the national institutes of health (nih) aids reagent program (# ) and anti-ddx rabbit antibody (hpa , sigma-aldrich) diluted in ngb buffer or in duolink® blocking solution for h. after washes in pbs x, the cells were incubated with the duolink® in situ pla® probe anti-rabbit minus (duo ) and duolink® in situ pla® probe anti-mouse plus (duo ) for h at °c. after washes in pbs x, the ligation mix was added for min at °c. after washes in pbs x, the cells were incubated with the amplification mix for min at °c. finally, the cells were washed twice with pbs x and stained with hoechst at µg/ml for min, washed again and the coverslips mounted on slides in prolong mounting media (thermofisher scientific). zstack images were acquired using an lsm confocal microscope (zeiss) using a x lens. pla punctae quantification was performed using the fiji software (schindelin et al., ) . briefly, maximum z-projections were performed on each z-stack and the number of nuclei per field were quantified. then, by using a median filter and thresholding, pla punctae were isolated and quantified automatically using the analyse particles function. 
to obtain a mean number of dots per cell, the number of pla dots per field were averaged by the number of nuclei. for representative images, single cells were imaged using a lsm confocal microscope coupled with an airyscan module. processing of the raw airyscan images was performed on the zen black software. immunoblot analysis. cell pellets were lysed in sample buffer ( mm tris-hcl, ph . , . % sds, % glycerol, . % bromophenol blue, % β-mercaptoethanol), resolved by sds-page and analysed by immunoblotting using primary antibodies specific for human ddx (hpa , sigma-aldrich) and actin (mouse monoclonal a , sigma-aldrich), followed by secondary horseradish peroxidase-conjugated anti-mouse or anti-rabbit immunoglobulin antibodies and chemiluminescence (bio-rad). images were acquired on a chemidoc™ gel imaging system (bio-rad). we have described previously iav nanoluciferase reporter virus generation (doyle et al., ) . stocks were titrated by plaque assays on mdck cells. iav challenges were performed in serum- zikv production and infection. the nanoluciferase expressing zikv construct has been described (mutso et al., ) . the corresponding linearized plasmid was transcribed in vitro using the sp mmessage mmachine™ (thermofischer scientific) and hek t cells were transfected with the transcribed rna. after days, supernatants were harvested, filtered and stock titers were determined by plaque assays on vero cells. for infections, . x cells per well in -well plates were infected, at the indicated mois. h after infection, cells were lysed and nanoluciferase activity was measured using the kit nano glo luciferase assay (promega). chikv production and infection. the gaussi luciferase coding chikv construct has been described (pohjala et al., ) . the linearized plasmid coding chikv genome was transcribed with the t mmessage mmachine kit (thermofischer scientific) and x hek t were transfected with - µg of transcribed rna, using lipofectamine (thermofischer scientific). 
after h, supernatants were harvested, filtered and viruses were then amplified on baby hamster kidney (bhk ) cells. stock titers were determined by plaque assays on vero cells. for infections, . x cells per well in -well plates were infected at the indicated mois. h after infection, cells were lysed and gaussia luciferase activity was measured using the pierce™ gaussia luciferase flash assay kit (thermofischer scientific). the sars-cov- betacov/france/idf / isolate was supplied by pr. sylvie van der werf and the national reference centre for respiratory viruses hosted by institut pasteur (paris, france). the virus was amplified in vero e cells (moi , ) in serum-free media supplemented with , µg/ml l- -p-tosylamino- -phenylethyl chloromethylketone (tpck)-treated trypsin (sigma-aldrich). the supernatant was harvested at h post infection when cytopathic effects were observed (with around % cell death), cell debris were removed by centrifugation, and aliquots stored at - c. viral supernatants were titrated by plaque assays in vero e cells. typical titers were . plaque forming units (pfu)/ml. infections of a -ace cells were performed at the indicated multiplicity of infection (moi; as calculated from titers obtained in vero e cells) in serum-free dmem and % serum-containing dmem, respectively. the viral input was left for the duration of the experiment and cells lysed at h post-infection for rt-qpcr analysis. the datasets generated during and/or analysed during the current study are available from the corresponding authors on reasonable request. requests for material should be addressed to caroline goujon or olivier moncorgé at the corresponding address above, or to addgene for the plasmids with an addgene number. b.b. and cg designed the study, analysed the data and wrote the manuscript. b.b. and c.g. and mageck analyses, respectively. all authors have read and approved the manuscript. the authors have no conflicts of interest to declare in relation to this manuscript. 
supplementary production of acquired immunodeficiency syndrome-associated retrovirus in human and nonhuman cells transfected with an infectious molecular clone alpha interferoninduced antiretroviral activities: restriction of viral nucleic acid synthesis and progeny virion production in human immunodeficiency virus type -infected monocytes in vivo gene transfer to the mouse eye using an hiv-based lentiviral vector; efficient long-term transduction of corneal endothelium and retinal pigment epithelium inhibition of human immunodeficiency virus (hiv) replication by hiv-trans-activated alpha -interferon somatic expression of line- elements in human tissues developmental and cell type specificity of line- expression in mouse testis: implications for transposition complex interplay between hiv- capsid and mx -independent ifnα-induced antiviral factors hiv- uncoats in the nucleus near sites of integration a sensitive and specific enzyme-based assay detecting hiv- virion fusion in primary t lymphocytes specific inhibition of viral protein synthesis in hiv-infected cells in response to interferon treatment detection of novel coronavirus ( -ncov) by real-time rt-pcr nuclear pore blockade reveals that hiv- completes reverse transcription and uncoating in the nucleus am i ready for crispr? a user's guide to genetic screens hiv- and interferons: who's interfering with whom? 
the interferon-inducible isoform of ncoa inhibits endosome-mediated viral entry cell type-specific expression of line open reading frames and in fetal and adult human tissues rna g-quadruplexes in biology: principles and molecular mechanisms further investigation of simian immunodeficiency virus vif function in human cells novel host restriction factors implicated in hiv- replication multiple fates of l retrotransposition intermediates in cultured human cells mov rna helicase is a potent inhibitor of retrotransposition in cells characterization of the alpha interferon-induced postentry block to hiv- infection in primary human macrophages and t cells characterization of simian immunodeficiency virus sivsm/human immunodeficiency virus type vpx function in human myeloid cells human mx is an interferon-induced post-entry inhibitor of hiv- infection evidence for ifnα-induced, samhd -independent inhibitors of early hiv- infection comprehensive mutational analysis of yeast dexd/h box rna helicases required for small ribosomal subunit synthesis hiv- blocks the signaling adaptor mavs to evade antiviral host defense after sensing of abortive hiv- rna by the host helicase ddx mx gtpases: dynamin-like antiviral machines of innate immunity recombinant human interferon alfa-a suppresses htlv-iii replication in vitro immunoproteasome activation enables human trim α restriction of hiv- mx is an interferon-induced inhibitor of hiv- infection ultrafast and memory-efficient alignment of short dna sequences to the human genome the sequence alignment/map format and samtools mageck enables robust identification of essential genes from genome-scale crispr/cas knockout screens selective silencing of euchromatic l s revealed by genome-wide screens for l regulators the highly polymorphic cyclophilin a-binding loop in hiv- capsid modulates viral resistance to mxb cutadapt removes adapter sequences from high-throughput sequencing reads investigation of influenza virus polymerase activity in 
pig cells high frequency retrotransposition in cultured mammalian cells somatic mosaicism in neuronal precursor cells mediated by l retrotransposition reverse genetic system, genetically stable reporter viruses and packaged subgenomic replicon based on a brazilian zika virus isolate in vivo gene delivery and stable transduction of nondividing cells by a lentiviral vector generation of transmitted/founder hiv- infectious molecular clones and characterization of their replication capacity in cd t lymphocytes and monocyte-derived macrophages a virus-packageable crispr screen identifies host factors mediating interferon inhibition of hiv comparison of gene transfer efficiencies and gene expression levels achieved with equine infectious anemia virus-and human immunodeficiency virus type -derived lentivirus vectors determination of l retrotransposition kinetics in cultured cells quantitative microscopy of functional hiv post-entry complexes reveals association of replication with the viral capsid inhibitors of alphavirus entry and replication identified with a stable chikungunya replicon cell line and virus-based assays many line elements contribute to the transcriptome of human somatic cells a vesicular stomatitis virus replicon-based bioassay for the rapid and sensitive determination of multi-species type i interferon characterization of the atpase and unwinding activities of the yeast deadbox protein has p and the analysis of the roles of the conserved motifs g-quadruplexes and g-quadruplex ligands: targets and tools in antiviral therapy both virus and host components are important for the manifestation of a nef-phenotype in hiv- and hiv- restriction of feline immunodeficiency virus by ref , lv , and primate trim alpha proteins improved vectors and genome-wide libraries for crispr screening hiv- capsid-cyclophilin interactions determine nuclear import pathway, integration targeting and replication efficiency fiji: an open-source platform for biologicalimage analysis 
genome-scale crispr-cas knockout screening in human cells high-throughput functional genomics using crispr-cas complementation of vif-defective human immunodeficiency virus type by primate, but not nonprimate, lentivirus vif genes controls use of the hiv a / splice acceptor cluster and is essential for efficient replication of hiv ddx potentiates hiv- transcription as a co-factor of tat the dead-box helicase ddx substitutes for the cap-binding protein eif e to promote compartmentalized translation initiation of the hiv- genomic rna tightly regulated, developmentally specific expression of the first open reading frame from line- during mouse embryogenesis ddx p--a human dead box protein with rna chaperone activities the dead box protein ddx p modulates the function of aspp , a stimulator of apoptosis genome-wide crispr screens reveal host factors critical for sars-cov- infection characterization of novel sf b and s u snrnp proteins, including a human prp p homologue and an sf b dead-box protein identification of rna helicases in human immunodeficiency virus (hiv- ) replication -a targeted small interfering rna library screen using pseudotyped and wt hiv- fastq screen: a tool for multi-genome mapping and quality control role of mxb in alpha interferon-mediated inhibition of hiv requirement of ddx dead box rna helicase for hiv- rev-rre export function genetic interactions of g-quadruplexes in humans indirect immunofluorescence analysis of endogenous ddx in primary mdms. mdms were fixed, endogenous ddx and the nuclei were visualized using ddx -specific antibodies and hoechst staining, respectively, and confocal microscopy we wish to thank tom doyle and chad swanson for their useful comments on the manuscript, key: cord- -umuiovrd authors: bindayna, khalid mubarak; crinion, shane title: variant analysis of sars-cov- genomes in the middle east date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: umuiovrd

background: coronavirus (covid- ) was introduced into society in late and has now reached over million cases and , deaths. the middle east has a death toll of ∼ , and over , of these are in iran, which has over , confirmed cases. we expect that iranian cases caused outbreaks in the neighbouring countries and that variant mapping and phylogenetic analysis can be used to prove this. we also aim to analyse the variants of severe acute respiratory syndrome coronavirus- (sars-cov- ) to characterise the common genome variants and provide useful data in the global effort to prevent further spread of covid- .

methods: we used bioinformatics methods including multiple sequence alignment, variant calling and annotation, and phylogenetic analysis to identify the genomic variants found in the region. the analysis uses samples from the countries of the middle east sourced from the global initiative on sharing all influenza data (gisaid).

findings: we identified distinct genome variants including downstream gene variants, frame shift variants, missense variants, start lost, start gained, stop lost, synonymous variants and upstream gene variants. the most common, high impact variants were deltinsg, delcinsc, delcinsc and delainsa. variant alignment and phylogenetic tree generation indicate that samples from iran likely introduced covid- to the rest of the middle east.

interpretation: the phylogenetic and variant analysis provides unique insight into mutation types in genomes. initial introduction of covid- was most likely due to iranian transmission. some countries show evidence of novel mutations and unique strains. increased time in small populations is likely to contribute to more unique genomes. this study provides a more in-depth analysis of the variants found in the region than any other study. 
funding: none.

on january h , the china centre for disease control reported that of suspected cases of pneumonia were due to a novel human coronavirus (cov), now known as severe acute respiratory syndrome cov (sars-cov- ). the genome for this novel virus was made publicly available on the global initiative on sharing all influenza data (gisaid) the next day. sars-cov- is an easily transmissible virus that would evolve into a global pandemic of at least million cases and , deaths. one of the first countries to experience a significant outbreak was iran. the country reported its first confirmed case on th february, in a merchant in qom who had travelled from china. many of the first infections in other middle eastern countries were linked to travellers from iran, including those in lebanon, kuwait, bahrain, iraq, oman and uae. covid- continued to spread to the remaining middle eastern countries, with a death toll of over , people according to health authorities. this number is expected to be an underestimate because several countries in the region are affected by war, including libya, syria and yemen. the effects on the region have been devastating, and the real effects are expected to be under-reported. researchers are racing to develop a vaccine that can provide viral immunity and avoid additional deaths. sars-cov- infects host cells using the spike protein, which binds to the human angiotensin-converting enzyme (ace ) receptor; the virus is easily transmittable due to mutations in the receptor-binding (s ) and fusion (s ) domains of the strain. transmission could become even easier if more mutations accumulate. although mutations are rare, they can create new strains, and it is not guaranteed that the current leading vaccine trials will be effective as sars-cov- continues to mutate. by categorizing variants, we can identify any new strains and how the mutations are likely to affect spread. 
as the middle east is often under-reported, it is important to characterise the variants of the strains that are commonly present. analysis of the common variants in the middle east is essential to develop a vaccine that is effective against the strains in the region. this analysis helps to understand the viral genome landscape and to identify the clades of the region.

• our hypothesis is that variants found in sars-cov- genomes from middle eastern samples will indicate introduction from iran. we will use bioinformatics tools and publicly available samples to explore the composition of strains within each country. we expect that many strains will show evidence of iranian origin.
• the aim is to explore the structure of middle eastern genome strains using multiple sequence alignment, tree generation and variant prediction (and others). if we explore the structure and common variants of sars-cov- strains in these populations, we expect to learn more about how the virus spread.

sample source: we obtained the publicly available data from the global initiative on sharing all influenza data (gisaid). samples were also filtered to high quality when possible. samples was selected as the optimum number to cover all possible countries and remain within the alignment file limit of size mb (maximum size for the clustal omega tool). in countries with samples, to prevent sample sourcing from the same outbreak, the earliest and most recent samples were taken. all samples were downloaded from gisaid and then concatenated into a single multi-sample file and saved in fasta format.

multiple sequence alignment: multiple sequence alignment (msa) of the collected samples was performed using the clustal omega online tool (https://www.ebi.ac.uk/tools/msa/clustalo/). the online tool allows up to sequences or a maximum file size of mb; therefore, the maximum number of samples was used. 
the concatenated file of samples was uploaded to the online tool. for step , the output format selected was pearson/fasta. all other options were kept at their defaults. the output is an alignment file in fasta format that contains all sequences, with gaps denoted by '-'.

variant identification: variant calling was performed using the alignment fasta file and the snp extraction tool snp-sites (https://github.com/sanger-pathogens/snp-sites). this tool identifies the snp sites by taking a multi-sample fasta file as input. the program then restructures the data as a variant call format (vcf) file. the vcf file provides a clear mapping of snps from the aligned sequences; this allows us to easily identify the snp location and the genotype for each sample at a given locus. in the output vcf file, each row corresponds to a unique variant and each sample column gives the genotype of that sample at the given site. we used snpeff (https://pcingola.github.io/snpeff/snpeff.html) to annotate the variants with information such as the variant definition and the overlapping gene. snpeff also predicts the effect of the variants. snpeff is integrated into the galaxy web-based tool for bioinformatics analysis (found at: usegalaxy.org). we utilised the galaxy platform and uploaded our vcf file to galaxy using its online upload tool. to annotate variants, a database must first be built from the reference genome. this is performed using the "snpeff build" tool on galaxy. to create the snpeff database, we searched ncbi for the wuhan reference nc_ . and downloaded the corresponding gff file, which contains the annotations, and the fasta file, which contains the entire genome. we uploaded both files to galaxy and then selected the build database option. once the database was built, we selected the "snpeff eff" tool to annotate variants. 
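as an illustration of the snp extraction step described above, the following python sketch finds the variable columns of an aligned multi-sample fasta file. it is a simplified stand-in and not the snp-sites implementation: the function names are hypothetical, the first sequence is assumed to be the reference, and columns where the reference has a gap or 'n' are skipped, which may differ from how snp-sites treats such columns.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into an ordered {name: sequence} mapping."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def find_snp_sites(
    aligned: Dict[str, str], skip: str = "-N"
) -> List[Tuple[int, str, List[str]]]:
    """Return (alignment column, reference base, alternative bases) per SNP.

    Assumption: the first record is the reference. Columns are 1-based
    alignment columns, which equal reference coordinates only when the
    reference contains no gaps before that column.
    """
    names = list(aligned)
    ref = aligned[names[0]]
    if any(len(seq) != len(ref) for seq in aligned.values()):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i, ref_base in enumerate(ref):
        if ref_base in skip:
            continue
        observed = {aligned[n][i] for n in names[1:]}
        alts = sorted(observed - {ref_base} - set(skip))
        if alts:
            sites.append((i + 1, ref_base, alts))
    return sites
```

each returned tuple corresponds to one row of a vcf-like table; per-sample genotypes could be added by recording which samples carry each alternative base.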
galaxy populates the vcf field with the file uploaded in the previous step. the output format was set to vcf, and the csv report was also selected to provide additional information for downstream analysis. for the genome source parameter, the option "custom" was selected to use the newly created database. other parameters were left at their defaults, including bases for upstream / downstream length and bases as the set size for splice sites (donor and acceptor). all filter output and additional annotation options were deselected and the analysis was run. once the analysis had finished, the annotation data were output as an annotated vcf and an html report file.

data visualisation: once the annotated vcf was generated, it was imported into r for extraction of the variant annotation information. the annotated data were imported, manipulated and plotted using r v . . . the dplyr v . . package was used to summarise and align the data. the ggplot package was used to align the identified variants and visualise the types of mutations that recur. the x-axis in the plots indicates the variant position along the sars-cov- genome; the left y-axis indicates the sample name and the right y-axis represents the country of origin for each sample. this plot is used to compare the genome in different populations. the data were then ordered by date of first reported case, meaning that wuhan is followed by united arab emirates and the final country is cyprus.

phylogenetic analysis: phylogenetic analysis was performed using beast (bayesian evolutionary analysis sampling trees) v . . , which performs bayesian analysis of molecular sequences using markov chain monte carlo (mcmc). the analysis followed the approach recommended to reconstruct the evolutionary dynamics of an epidemic. the aim is to obtain an estimate of the origin of the epidemic in the region and to understand how it spread through the middle east. 
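the extraction of variant annotation information from the annotated vcf, described in the data visualisation step above, was done by the authors in r with dplyr. the python sketch below shows one way the same extraction could look. it assumes that the snpeff output stores annotations in the `ANN` entry of the info column, as pipe-separated fields in which the second, third and fourth fields are the effect, the impact and the gene name. the function names are hypothetical, and only the first annotation listed for each variant is kept.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_ann(info: str) -> List[Dict[str, str]]:
    """Return the annotations found in the ANN entry of a VCF INFO column."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            annotations = []
            # Several annotations for one variant are separated by commas.
            for ann in entry[len("ANN="):].split(","):
                fields = ann.split("|")
                if len(fields) >= 4:
                    annotations.append(
                        {"effect": fields[1], "impact": fields[2], "gene": fields[3]}
                    )
            return annotations
    return []


def variant_table(vcf_lines: Iterable[str]) -> List[Dict[str, object]]:
    """Build one row per annotated variant: position, alleles, first annotation."""
    rows = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        cols = line.rstrip("\n").split("\t")
        annotations = parse_ann(cols[7])  # INFO is the eighth VCF column
        if not annotations:
            continue
        rows.append(
            {"pos": int(cols[1]), "ref": cols[3], "alt": cols[4], **annotations[0]}
        )
    return rows


def effect_counts(rows: List[Dict[str, object]]) -> Counter:
    """Count how often each effect type occurs, as in a frequency table."""
    return Counter(row["effect"] for row in rows)
```

the rows returned by `variant_table` hold the variant position and effect type, which are the quantities placed on the x-axis and used to colour or group points in a per-sample dot plot.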
to undertake the analysis, we opened beauti, the graphical application used to generate the beast control file. although it requests a nexus file, the fasta file can also be used. the data were loaded using the import data option and appeared under the partitions section. beauti confirmed that sites are present in the uploaded data. the default options were selected for the site model and clock model. next, we specified the individual virus dates by selecting the "tips" panel and the "use tip dates" option. a tab-delimited file was uploaded which specified the upload date. this information was extracted from the names as they were downloaded from gisaid. next, we set the substitution model in the "sites" tab, keeping the default hky model and the default estimated base frequencies and selecting gamma as the site heterogeneity model. the molecular clock was then set under the "clock" tab as a strict clock, since we know that the frequency of mutation is low. the tree options were selected under the "tree prior" tab as "random starting tree" for the tree model and "coalescent: exponential growth", a coalescent model in which the effective population size changes exponentially through time rather than remaining constant. the estimated growth rate provides additional information on the reproductive rate. in the "priors" tab, the scale was selected as for the prior distribution, which models the expected growth for a pandemic. the operators required no changes from the default. the mcmc chain length was set as , and the sampling frequency to .

tree visualisation: finally, we summarised the tree using the treeannotator tool, which is distributed as part of beast. we first selected the tree file generated by beast and output a summary tree file. the output nexus file was then imported into the figtree program for display. in figtree, we re-ordered the nodes in increasing order and then switched on branch labels. 
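the tip dates specified in beauti above were taken from the sample names. the python sketch below illustrates how such a tab-delimited tip-date table could be prepared and how a calendar date can be converted to a decimal year. it assumes, purely for illustration, that each name ends with an iso date after the last '|' character; the function names and the example names used with them are hypothetical and do not come from the study.

```python
from datetime import date
from typing import Iterable, List


def date_from_name(name: str) -> date:
    """Extract the date from a sample name.

    Assumption: the text after the last '|' is an ISO date (YYYY-MM-DD).
    """
    return date.fromisoformat(name.rsplit("|", 1)[-1])


def decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year, accounting for leap years."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def tip_date_table(names: Iterable[str]) -> List[str]:
    """Return tab-delimited 'name<TAB>date' lines, one per sample."""
    return [f"{name}\t{date_from_name(name).isoformat()}" for name in names]


def latest_decimal_year(names: Iterable[str]) -> float:
    """Decimal year of the most recent sample in the set."""
    return max(decimal_year(date_from_name(name)) for name in names)
```

the value returned by `latest_decimal_year` is the kind of number that tree viewers typically accept as a time-axis offset when the most recent tip anchors the scale.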
we switched on node bars and selected the % highest posterior density (hpd) credible intervals for the node heights. we plotted a time scale by turning on the scale axis and then setting the offset in the time scale section as . , the latest date of collection for our samples.

sequence alignment and variant calling were completed successfully, after which variant annotation was performed. we identified distinct genome variants, which are recorded in table . the most common, high impact variants were deltinsg, delcinsc, delcinsc and delainsa. the frequency of each unique variant type can be found in table , which outlines the locus of all snps with over instances.

[table: frequency of variant types; surviving row labels: frame shift variants, start lost, stop lost, upstream gene variant]

these results were then used to generate the dotplot of variant alignment (figure ). the dotplot revealed a pattern in variants that could not be easily identified from the alignment or annotation files. the plot groups samples into facets by country and shows patterns of variants that differ between countries. for example, this is prominent in qatar, jordan and oman, where the pattern distinguishes each country from the variants plotted for other countries. in addition, the phylogenetic tree generated branching indicative of an iranian origin (figure ).

the aim of this study was to identify whether covid- was introduced to the middle east from iran and also to explore the genomic composition in the region. our study performed sequence alignment to compare all sequences against the reference genome. once this was complete, the annotated variants were extracted to generate a plot mapping variants, with samples grouped by country. the plot, as seen in figure , shows clear distinctive patterns within countries that are not obvious from the generated alignment and annotation files. this may indicate that a diverse, new strain is circulating in the country. 
cyprus has little diversity in the variant mapping, which is surprising given its late date for first reported cases. another interesting point is that time-varied samples were taken for countries with samples. we see no indication that there are distinct groups within countries. this further indicates that the mutation frequency is low. it also indicates that there is more variation in genomic composition between samples from different countries than between samples from different collection times. smaller populations can cause greater accumulation of variants through genetic drift. this may occur given local lockdowns and travel restrictions that have been enforced worldwide. it is possible that these genomic strains with new mutations may create a situation where a country develops a deadly strain that is not prominent in other parts of the world. this could result in a situation where a country is disproportionately affected by accumulating deaths or an ineffective vaccine. phylogenetic trees help in understanding the evolutionary relationships between groups. in the present context, we use them to identify the earliest strains and to track the spread of covid- across the middle east. the tree shows that uae samples are distinguished and form one clade. this correlates with their early intervention and lockdown, which appears to have resulted in a unique genome. samples from qatar also form the majority of clade, with many of the wuhan samples, indicating that they are similar to the wuhan samples and show little distinction. egypt also becomes a distinct branch earlier than most samples. these examples are indicative of the global response: the lockdown of each country and prevention of spreading has resulted in sars-cov- strains of great similarity within each country. if lockdowns were not enforced, it is likely that these clades would be less distinguishable as mutations are spread between countries.
as we expected, of the highest branch points attach to iranian samples, further implicating iran in the initial spread across the middle east. the phylogenetic tree therefore supports our hypothesis: most samples originate from the iranian sample. this is not surprising given the vast number of cases and early crisis state of the country. however, it is useful to see that the variant analysis shows what we suspected at the genome level. a related study also came to this conclusion by using contact tracing from cases related to religious events in the city of qom, iran .

risk assessment: outbreak of acute respiratory syndrome associated with a novel coronavirus
covid- situation reports
mapping the incidence of the covid- hotspot in iran - implications for travellers
challenges to testing covid- in conflict zones: yemen as an example
phylogenetic analysis and structural modeling of sars-cov- spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop
on the origin and continuing evolution of sars-cov-
data, disease and diplomacy: gisaid's innovative contribution to global health
fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega
snp-sites: rapid efficient extraction of snps from multi-fasta alignments
a program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain w
the galaxy platform for accessible, reproducible and collaborative biomedical analyses: update
r: a language and environment for statistical computing. r foundation for statistical computing
dplyr: a grammar of data manipulation
ggplot : elegant graphics for data analysis
bayesian phylogenetic and phylodynamic data integration using beast . virus evolution
is visiting qom spread covid- epidemic in the middle east?
the authors are grateful for the timely sequencing and release of genomes to make this study possible and to dr. anusha c p for her comments. this research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

iran: epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_
israel: epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_
jordan: epi_isl_ , epi_isl_
saudiarabia: epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_
turkey: epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_
unitedarabemirates: epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_
wuhan: epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_ , epi_isl_
table : catalogue of sample accession id by country.

key: cord- - nrzfon authors: stanifer, megan l.; kee, carmon; cortese, mirko; triana, sergio; mukenhirn, markus; kraeusslich, hans-georg; alexandrov, theodore; bartenschlager, ralf; boulant, steeve title: critical role of type iii interferon in controlling sars-cov- infection, replication and spread in primary human intestinal epithelial cells date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: nrzfon sars-cov- is an unprecedented worldwide health problem that requires concerted and global approaches to better understand the virus in order to develop novel therapeutic approaches to stop the covid- pandemic and to better prepare against potential future emergence of novel pandemic viruses. although sars-cov- primarily targets cells of the lung epithelium causing respiratory infection and pathologies, there is growing evidence that the intestinal epithelium is also infected. however, the importance of the enteric phase of sars-cov- for virus-induced pathologies, spreading and prognosis remains unknown. here, using both colon-derived cell lines and primary non-transformed colon organoids, we engage in the first comprehensive analysis of sars-cov- lifecycle in human intestinal epithelial cells. our results demonstrate that human intestinal epithelial cells fully support sars-cov- infection, replication and production of infectious de-novo virus particles. importantly, we identified intestinal epithelial cells as the best culture model to propagate sars-cov- . we found that viral infection elicited an extremely robust intrinsic immune response where, interestingly, type iii interferon mediated response was significantly more efficient at controlling sars-cov- replication and spread compared to type i interferon. taken together, our data demonstrate that human intestinal epithelial cells are a productive site of sars-cov- replication and suggest that the enteric phase of sars-cov- may participate in the pathologies observed in covid- patients by contributing in increasing patient viremia and by fueling an exacerbated cytokine response. coronaviridae is a large family of single-stranded positive-sense enveloped rna viruses that can infect most animal species (human as well as domestic and wild animals). they are known to have the largest viral rna genome and are composed of four genera (cui et al., ) . 
generally, infection by coronaviruses results in mild respiratory tract symptoms and they are known to be one of the leading causes of the common cold (moriyama et al., ; paules et al., ). however, in the last years, we have witnessed the emergence of highly pathogenic human coronaviruses: the severe acute respiratory syndrome-related coronavirus (sars-cov- ), the middle east respiratory syndrome-related coronavirus (mers-cov) and, at the end of , the severe acute respiratory syndrome-related coronavirus- (sars-cov- ). sars-cov- is responsible for the coronavirus-associated acute respiratory disease or coronavirus disease (covid- ) and represents a major global health threat; coordinated efforts are urgently needed to treat the viral infection and stop the pandemic. although sars-cov- primarily targets cells of the lung epithelium causing respiratory infection, there is growing evidence that the intestinal epithelium can also be infected. multiple studies have reported gastro-intestinal symptoms such as diarrhea at the onset of the disease and have detected the prolonged shedding of large amounts of coronavirus genomes in the feces even after the virus was no longer detectable in oropharyngeal swabs (wu et al., b; xiao et al., ; xing et al., ; xu et al., b; wölfel et al., ). although one study reported the isolation of infectious virus particles from stool samples, to date it remains unclear how many people shed infectious viruses in feces. most critically, it remains unknown whether or not there is a possibility for fecal transmission of sars-cov- , but multiple health agencies worldwide have highlighted this possibility. the presence of such a large amount of coronavirus genomes in feces is hardly explainable by the swallowing of virus replicating in the throat or by a loss of barrier function of the intestinal epithelium, which would allow the release of viruses or genomes from the inside of the body (circulation or lamina propria) to the lumen of the gut.
instead, it is likely due to an active replication in the intestinal epithelium. recently, intestinal biopsies of sars-cov- infected patients clearly showed the presence of replicating viruses in epithelial cells of the small and large intestine (xiao et al., ). sars-cov- infection of the gastrointestinal tract is supported by the fact that ace , the virus receptor (hoffmann et al., ), is expressed in intestinal epithelial cells (zhao et al., ; lukassen et al., ; wu et al., a; venkatakrishnan et al., ) and single cell sequencing analysis suggests that its expression is even higher on intestinal cells compared to lung cells (xu et al., a). this highlights that sars-cov- is not restricted to the lung but also infects the gastrointestinal tract. importantly, many animal coronaviruses are well known to be enteric and are transmitted via the fecal-oral route (wang and zhang, ). additionally, the presence of human pathogenic coronaviruses in the gastrointestinal tract was previously reported for sars-cov- and mers-cov but remained seriously understudied (leung et al., ; wong et al.; zhou et al., ). although it is now clear that human coronaviruses, particularly sars-cov- , are found in feces and can infect the gastrointestinal tract, the importance of the enteric phase for viremia, pathogenesis and patient prognosis remains unknown. to combat the current pandemic of covid- and to prepare for potential future emerging zoonotic coronaviruses, we need to gain a better understanding of the molecular basis of sars-cov- infection, replication and spread in a tissue-specific manner. here, we engaged in studying sars-cov- infection of human intestinal cells. for this, we exploited both human intestinal epithelial cell lines and human organoid culture models to characterize how these cells support sars-cov- replication and spread and how they respond to viral infection.
direct comparison of both primary and transformed cells shows that human intestinal epithelial cells fully support sars-cov- infection and de-novo production of infectious virus particles. interestingly, viral infection elicited a robust intrinsic immune response in which the type iii interferon mediated response was significantly more efficient at controlling sars-cov- replication and spread than type i interferon. importantly, human primary intestinal epithelial cells responded to sars-cov- infection by producing only type iii interferon. taken together, our data clearly highlight the importance of the enteric phase of sars-cov- and this should be taken into consideration when developing hygienic/containment measures and antiviral strategies, and when determining patient prognosis. as there is growing evidence that the gastro-intestinal tract is infected by sars-cov- , we engaged in studying virus infection, replication and spread in human intestinal epithelial cells (iecs). first, sars-cov- (strain bavpat ) was propagated in the green monkey cell line vero (see methods section). to detect viral infection, we used an antibody directed against a region of the nucleoprotein (np) which is conserved between sars-cov- and sars-cov- . additionally, we used the j antibody, which detects double-stranded rna (dsrna), a hallmark of rna virus replication (targett-adams et al., ). cells positive for np were always positive for dsrna; the np signal was dispersed within the cytosolic area whereas dsrna was found in discrete foci likely corresponding to replication compartments (harak and lohmann, ) (fig. s a). supernatants of infected vero cells were collected at hours post-infection (hpi) and the amount of infectious virus particles present was measured using a tcid approach on vero cells (fig. s b). the colon carcinoma derived lines t and caco- cells were then infected with sars-cov- at a moi of .
(as determined in vero cells) and, at different times post-infection, cells were fixed and immunostained using the anti-np and anti-dsrna antibodies (fig. a). results show that sars-cov- infected caco- cells were readily detected as early as hpi and, by hpi, most of the cells were infected (fig. b). similar results were observed in the t cells, but detection of infection was slightly delayed compared to caco- cells and, although the same amount of sars-cov- was used to infect these cells, only around % of the cells were found infected at hpi (fig. b). these observations were in agreement with the increase in viral genome copy numbers over time (fig. c) and the release of infectious virus particles in the supernatant of infected t and caco- cells (fig. d). interestingly, infection of iecs by sars-cov- was associated with the generation of an interferon (ifn)-mediated intrinsic immune response. concomitant with the differences in virus replication and de-novo virus production observed between t and caco- cells, t cells mounted a much stronger immune response compared to caco- cells (fig. e) although far fewer t cells were infected (fig. b). altogether, these results show that iecs are readily infected by sars-cov- and that infection of caco- cells leads to a weaker intrinsic immune response, which is associated with more de-novo infectious virus production compared to t cells. this observation suggests that the ifn-mediated immune response controls sars-cov- infection in iecs. to directly test the function of ifns in controlling/restricting sars-cov- replication and spread in human iecs, t and caco- cells were mock-treated or pre-treated with type i (ifnb ) or type iii (ifnl) ifns. hrs post-treatment, cells were infected with sars-cov- at a moi of . (as determined in vero cells), in the presence or absence of ifns (fig. a). hpi, cells were immunostained using both the anti-np and anti-dsrna antibodies.
results to address whether the endogenous levels of ifns generated by iecs controls sars-cov- replication and spread, we exploited our previously reported iecs depleted of either the type i ifn receptor (ifnar ) (ar-/-), the type iii ifn receptor (ifnlr ) (lr -/-) or depleted of both ifn receptors (dko). to control that our cells were functionally knocked-out for the type i ifn (ar-/-) and/or the type iii ifn receptor (lr-/-), t cells were treated with either ifn and the production of the ifn stimulated gene ifit was evaluated. as expected, type i ifn receptor knock-out cells (ar-/-) only responded to ifnl, whereas type iii ifn receptor knock-out cells (lr-/-) only responded to ifnb (fig. s ). the ifn receptor double knock-out cells (dko) did not respond to either ifn (fig. s ). wt or ifn receptor knock-out t cells were infected with sars-cov- at a moi of . (as determined in t cells). hpi, cells were immunostained using the anti-np antibody and the number of infected cells was quantified using fluorescent microscopy (fig. a) . results showed that depletion of the type i ifn receptor (ar-/-) resulted in a slight increase in the number of infected cells. importantly, depletion of the type iii ifn receptor (lr-/-) resulted in a massive increase of cell infectivity by a factor of around seven. similar results were obtained when both the type i and the type iii ifn receptors were depleted (dko) (fig. b) . interestingly, this increase in the number of infected cells upon depletion of the type iii ifn receptor (lr-/-) was associated with a significant increase in viral genome copy numbers (fig. c ) and with a three orders of magnitude increase in de-novo infectious virus production (fig. d ). together these results suggest that the type iii ifn-mediated immune response actively participates in controlling sars-cov- infection in human iecs. 
to unambiguously address the importance of the ifn-mediated antiviral response, we used the pan-jak inhibitor (pyridone- ) to inhibit stat phosphorylation and block the production of interferon stimulated genes (isgs). as we previously reported, treatment of t cells with the pan-jak inhibitor fully inhibits signal transduction downstream of both the type i and type iii ifn receptors (pervolaraki et al., , and data not shown). mock and pyridone- pre-treated t cells were infected with sars-cov- for hrs and analyzed using fluorescence microscopy following immunostaining using the anti-np antibody. results show both a significant increase in the number of infected cells (fig. e) and an increase of viral genome copy number in cells treated with the pan-jak inhibitor (fig. f). importantly, and in agreement with the results observed in cells depleted of the type iii ifn receptor, this increase in infectivity was also associated with an increase in infectious de-novo virus particle production (fig. g). altogether, these results strongly support a model where type iii ifn mediated signaling controls sars-cov- infection in human intestinal epithelial cells. to address whether primary human iecs can be infected by sars-cov- and support de-novo virus production, we used human colon derived organoids from two distinct donors. intact ultrastructural organization and differentiation to all cell types (e.g. enterocytes, goblet cells, enteroendocrine cells, and stem cells) were confirmed using confocal fluorescence microscopy (fig. a) and quantitative-rt-pcr against cell type specific transcripts (fig. b). as quantification of the number of infected cells in d organoids is very challenging, we exploited previously established protocols to differentiate and infect human intestinal organoids in two dimensions with viruses (ettayebi et al., ; stanifer et al., ). non-differentiated organoids were seeded on human collagen-coated ibidi chambers.
at hrs post-seeding, differentiation was induced by removal of wnt a and reducing the amounts of r-spondin and noggin for four days. upon full differentiation, organoids were infected with sars-cov- . at hpi, the infection was analyzed by immunostaining using the anti-np and anti-dsrna antibodies and by quantitative rt-pcr (fig. c). results show that, independent of the donor, colon organoids were readily infected by sars-cov- as shown by the presence of cells positive for both np and dsrna (fig. d). quantification revealed that around - % of cells were infected in each donor (fig. e). this infection was associated with an increase of viral genome copy number (fig. f). interestingly, infection of organoids led to no type i interferon (ifnb ) production but an extremely large upregulation of type iii ifn (ifnl) (fig. g and fig. s ). to determine if exogenously added ifns could prevent sars-cov- infection, colon organoids were mock or pre-treated with ifnb and ifnl and then subsequently infected with sars-cov- for hrs. results show that pre-treatment of colon organoids with both ifnb and ifnl significantly impaired infection (fig. h). this was associated with a reduction of sars-cov- genome copy numbers (fig. i) and a decrease in infectious de-novo virus particle production (fig. j). altogether, these results show that human colon organoids can support sars-cov- infection, replication and spread and that the type iii ifn response plays a critical role in controlling virus replication. human primary intestinal epithelial cells support robust replication of sars-cov- and secretion of infectious de-novo virus particles. we found that around % of the cells are infected in a human colon derived organoid, leading to a modest but significant increase in viral genome copy number (fig. f) and de-novo infectious virus particles (fig. j) over time.
this modest replication of sars-cov- is explained by the fact that (i) only a small fraction of the cells is infected by sars-cov- , and (ii) as shown in this work, ifns are potent inhibitors of sars-cov- replication (fig. and ) and organoids are highly immunoresponsive upon viral infection (stanifer et al., ) (fig. g). it is well known that in vivo intestinal epithelial cells are less immunoresponsive because of the gut microenvironment (e.g. microbiota and tissue specific immune cells) and as such they will very likely show a severely dampened immune response, allowing for even greater sars-cov- replication. importantly, the intestinal epithelium is the largest organ in the body and even if only a few percent of the cells are infected it will result in the generation of an extremely large amount of de-novo viruses. analysis of the single-cell rna sequencing data from the colon atlas revealed that only . % of human colon epithelial cells express very low levels of the sars-cov- receptor ace- (fig. s a-e). this is very different from the small intestine, where ace- appears to be more highly expressed (qi et al., ). this low ace- level could explain why we have a small percentage of infected cells in our colon organoids (fig. e). in contrast, tmprss does not seem to be a limiting factor in the colon (fig. s c-e). interestingly, q-rt-pcr and western blot analysis do not support the single cell analysis and clearly show that ace- is expressed in both our carcinoma derived lines and our colon organoids (fig. s f-g). the discrepancy between the single-cell rna sequencing and classical molecular and biochemical approaches is likely the result of (i) the sequencing not being deep enough to detect ace- in individual cells and (ii) rna expression not necessarily matching the protein expression levels.
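the first explanation, insufficient sequencing depth, can be illustrated with a small worked example. the sketch below assumes a deliberately simple binomial capture model in which every transcript in a cell is detected independently with the same probability; this model, the function names and any numbers passed to them are assumptions made for illustration, not measurements from the study.

```python
def dropout_probability(transcripts_per_cell, capture_efficiency):
    """Probability that a gene present in a cell yields zero detected molecules.

    Assumes each of the transcripts is captured independently with
    probability capture_efficiency (a simplifying assumption).
    """
    if transcripts_per_cell < 0:
        raise ValueError("transcripts_per_cell must be non-negative")
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must lie in [0, 1]")
    return (1.0 - capture_efficiency) ** transcripts_per_cell


def detected_fraction(expressing_fraction, transcripts_per_cell, capture_efficiency):
    """Fraction of all cells in which the gene would appear as expressed."""
    missed = dropout_probability(transcripts_per_cell, capture_efficiency)
    return expressing_fraction * (1.0 - missed)
```

under this model a gene carried at a few copies per cell is missed in most cells whenever the per-transcript capture probability is low, so a low apparent percentage of expressing cells in an atlas is compatible with bulk q-rt-pcr and western blot signals from the same tissue.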
these observations highlight that although analyzing data from single-cell rna sequencing atlases could be very informative, their findings should be validated in tissues as they may be misleading or miss important sites of virus replication. the human colon carcinoma caco- cells produce very large amounts of infectious viral particles (between and infectious particles per ml). this is higher than the titers obtained in vero cells, which are commonly used to isolate and propagate sars-cov- (harcourt et al., ) and in which we routinely obtained titers of and infectious particles per ml (fig. s b). interestingly, in a study comparing different human cell lines, caco- cells were the only cell type found to support sars-cov- replication and, compared to green monkey cells, caco- cells were very efficient in producing infectious de-novo virus particles and did not show cytopathic effects (mossel et al., ). these observations that caco- cells are excellent culture models supporting both sars-cov- and sars-cov- further highlight the potentially central role of intestinal epithelial cells in covid- patients. intriguingly, t cells, which are also colon-carcinoma derived cells, supported sars-cov- infection, replication and spread but to a much lesser extent compared to caco- cells. analysis of the intrinsic immune response generated upon infection of both cell lines revealed that t cells are more immunoresponsive compared to caco- cells. in light of the results obtained by pretreating cells with exogenous ifns, we propose that t cells, by being able to mount a stronger and faster immune response compared to caco- cells, can better restrict sars-cov- infection. this model is fully supported by our ifn receptor knock-out t cells, in which virus replication and de-novo virus production are drastically increased to levels similar to the ones observed in caco- cells.
exogenously added ifns (both type i and type iii ifns) induce an antiviral state in our human intestinal epithelial cells, thereby restricting sars-cov- replication in these cells. similar observations were made with type i ifn in vero cells (mantlo et al., ). interestingly, the calu- human lung epithelial cells also seem to mount an immune response upon infection by sars-cov- (lokugamage et al., ). importantly, our work shows that, compared to the airway epithelium, the intestinal epithelium produces a typical antiviral response. this highlights that host/pathogen interaction should be considered in a tissue specific manner as different cellular responses and viral countermeasures might be established between the lung, the gut and other organs. interestingly, in human colon organoids we observed that only type iii ifn is made upon sars-cov- infection, although human intestinal organoids are capable of making both type i and iii ifn upon enteric virus infection (pervolaraki et al., ; stanifer et al., ). the lack of type i induction appears to be specific to sars-cov- and it is likely that this virus encodes a specific antagonist which counteracts the production of type i ifn only. however, further studies are necessary to prove this novel concept. we propose that the gut is an active site of replication for sars-cov- and this could account for the viremia observed in covid- patients and for the presence of large amounts of sars-cov- genomes in the feces. the origin of the replicating sars-cov- in the intestinal epithelium is still not clear. to date only one paper reported the isolation of infectious virus from stool samples. further characterization of the sars-cov- enteric lifecycle is necessary to determine whether the viral infection observed in the gut is due to fecal/oral transmission or is a manifestation of virus spreading from the lung to the gut.
in the context of the gut we foresee that at the onset of sars-cov- infection, human intestinal epithelial cells will mount an antiviral response through the type iii ifn signaling pathway. as immune cells participate in mounting an innate immune response, type i ifn will be secreted from these cells and will be able to act on intestinal epithelial cells, further reinforcing their antiviral state against sars-cov- . with respect to the severe pathologies observed in the lung, which are believed to be caused by a cytokine storm, the finding that lung epithelial cells mount a muted immune response upon sars-cov- infection suggests that the cytokines are coming from an alternative source. this source could be local immune cells but also the gastro-intestinal mucosa, given the large immune response generated by primary human intestinal epithelial cells. as a matter of fact, many cytokines are released from epithelial cells towards the lamina propria (tissue side) (stanifer et al., ), and will quickly enter the circulation to potentially fuel and promote the inflammation and pathology observed in the lung. in conclusion, the gastro-intestinal tract is an active site of sars-cov- replication and this should be considered when developing antiviral strategies as it may participate in the viremia and potentially reinfection. from a patient prognosis point of view, the severity of the disease should be correlated with the extent of the enteric replication.

ifnl (il a) (# - k) and ifnl (il- b) (# - k) were purchased from peprotech and were used at a concentration of ng/ml each to make a final concentration of ng/ml. pyridone (calbiochem # - ) was used at a final concentration of um. all sars-cov- infections were performed at the moi indicated in the text. media was removed from cells and virus was added to cells for hour at °c. virus was removed, cells were washed x with pbs and media or media containing inhibitors/cytokines was added back to the cells. d organoid seeding.
-well ibidi glass bottom chambers were coated with . % human collagen in water for h prior to organoid seeding. organoids were collected at a ratio of organoids/transwell. collected organoids were spun at g for mins and the supernatant was removed. organoids were washed x with cold pbs and spun at g for mins. pbs was removed and organoids were digested with . % trypsin-edta (life technologies) for mins at °c. digestion was stopped by addition of serum containing medium. organoids were spun at g for mins, the supernatant was removed and organoids were re-suspended in normal growth media at a ratio of µl media/well. the collagen mixture was removed from the ibidi chambers and µl of organoids were added to each well.

rna isolation, cdna, and qpcr. rna was harvested from cells using the rneasy rna extraction kit (qiagen) as per the manufacturer's instructions. cdna was made using iscript reverse transcriptase (biorad) from ng of total rna as per the manufacturer's instructions. q-rt-pcr was performed using itaq sybr green (biorad) as per the manufacturer's instructions; tbp or hprt were used as normalizing genes. primer used:

performed, and cells having fewer than genes and more than % of umi count mapped to mitochondrial genes were discarded. subsequently, the resulting datasets were normalized and scaled, and high-variance genes were selected. reciprocal pca-based data integration was done to merge the samples. afterwards, the resulting batch-corrected counts were used for calculating pca-based dimensionality reduction and unsupervised louvain clustering. furthermore, umap visualization was calculated using neighboring points for the local approximation of the manifold structure. cell type annotation was based on the unsupervised clustering and the metadata provided by the colon atlas (smillie et al., ).

statistics. unless otherwise stated, statistical analysis was performed by a two-tailed unpaired t test using the graphpad prism software package.
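the per-cell quality filter described for the single-cell data above can be sketched as follows. this is a minimal illustration, not the authors' code: the field names `genes_detected`, `total_umi` and `mito_umi` are hypothetical, the thresholds are left as parameters because the values are not given in the text, and the sketch assumes that a cell is discarded when it fails either criterion.

```python
def passes_qc(cell, min_genes, max_mito_fraction):
    """Return True if a cell survives both quality criteria.

    cell: dict with 'genes_detected', 'total_umi' and 'mito_umi'
    (hypothetical field names used only in this sketch).
    """
    if cell["total_umi"] <= 0:
        # A cell without any counted molecules cannot be assessed.
        return False
    mito_fraction = cell["mito_umi"] / cell["total_umi"]
    enough_genes = cell["genes_detected"] >= min_genes
    low_mito = mito_fraction <= max_mito_fraction
    return enough_genes and low_mito


def filter_cells(cells, min_genes, max_mito_fraction):
    """Keep only the cells that pass the quality filter, preserving order."""
    return [c for c in cells if passes_qc(c, min_genes, max_mito_fraction)]
```

a call such as `filter_cells(cells, min_genes=200, max_mito_fraction=0.2)` uses placeholder thresholds only; normalization, integration and clustering would then run on the cells that remain.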
all samples were analyzed without blinding or exclusion of samples.

zhou, j., li, c., zhao, g., chu, h., wang, d., yan, h.h.-n., poon, v.k.-m., wen, l., wong, b.h.-y., zhao, x., et al. ( ). human intestinal tract serves as an alternative infection route for middle east respiratory syndrome coronavirus. sci. adv. , eaao .
sars-cov- launches a unique transcriptional signature from in vitro, ex vivo
comparative replication and immune activation profiles of sars-cov- and sars-cov in human lungs: an ex vivo study with implications for the pathogenesis of covid-
type i and type iii interferons drive redundant amplification loops to induce a transcriptional signature in influenza-infected airway epithelia
origin and evolution of pathogenic coronaviruses
replication of human noroviruses in stem cell-derived human enteroids
ultrastructure of the replication sites of positive-strand rna viruses
isolation and characterization of sars-cov- from the first us covid- patient (microbiology)
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
enteric involvement of severe acute respiratory syndrome-associated coronavirus infection
sars-cov- sensitive to type i interferon pretreatment
outbreak of pneumonia of unknown etiology in wuhan, china: the mystery and the miracle
sars-cov- receptor ace and tmprss are primarily expressed in bronchial transient secretory cells
potent antiviral activities of type i interferons to sars-cov- infection
seasonality of respiratory viral infections
exogenous ace expression allows refractory cell lines to support severe acute respiratory syndrome coronavirus replication
coronavirus infections - more than just the common cold
type i and type iii interferons display different dependency on mitogen-activated protein kinases to mount an antiviral state in the
differential induction of interferon stimulated genes between type i and type iii interferons is independent of interferon receptor abundance
ifn-λ determines the intestinal epithelial antiviral host defense
single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses
intra- and inter-cellular rewiring of the human colon during ulcerative colitis
reovirus intermediate subviral particles constitute a strategy to infect intestinal epithelial cells by exploiting tgf-β dependent pro-survival signaling
asymmetric distribution of tlr leads to a polarized immune response in human intestinal epithelial cells
visualization of double-stranded rna in cells supporting hepatitis c virus rna replication
knowledge synthesis from million biomedical documents augments the deep expression profiling of coronavirus receptors
animal coronaviruses: a brief introduction
emerging and re-emerging coronaviruses in pigs
detection of sars-cov- in different types of clinical specimens
virological assessment of hospitalized patients with covid-
covid- and the digestive system
single-cell rna expression profiling of ace , the putative receptor of wuhan -ncov
prolonged presence of sars-cov- viral rna in faecal samples
evidence for gastrointestinal infection of sars-cov-
prolonged viral shedding in feces of pediatric patients with coronavirus disease
high expression of ace receptor of -ncov on the epithelial cells of oral mucosa
characteristics of pediatric sars-cov- infection and potential evidence for persistent fecal viral shedding
interferon-λ enhances adaptive mucosal immunity by boosting release of thymic stromal lymphopoietin
single-cell rna expression profiling of ace , the receptor of sars-cov- (bioinformatics)

we would like to acknowledge vibor laketa and the infectious diseases imaging platform (idip) at the center for integrative infectious disease research, heidelberg, germany, for support with image acquisition and analysis. we also thank christian drosten at the charité, berlin and the european virus archive (evag) for the provision of the sars-cov- strain bavpat .
the authors declare no competing interests. key: cord- - e cn u authors: rad sm, ali hosseini; mclellan, alexander d. title: implications of sars-cov- mutations for genomic rna structure and host microrna targeting date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: e cn u the sars-cov- virus is a recently-emerged zoonotic pathogen already well adapted to transmission and replication in humans. although the mutation rate is limited, recently introduced mutations in sars-cov- have the potential to alter viral fitness. in addition to amino acid changes, mutations could affect rna secondary structure critical to viral life cycle, or interfere with sequences targeted by host mirnas. we have analysed subsets of genomes from sars-cov- isolates from around the globe and show that several mutations introduce changes in watson-crick pairing, with resultant changes in predicted secondary structure. filtering to targets matching mirnas expressed in sars-cov- permissive host cells, we identified twelve separate target sequences in the sars-cov- genome; eight of these targets have been lost through conserved mutations. a genomic site targeted by the highly abundant mir- - p, overexpressed in patients with cardiovascular disease, is lost by a conserved mutation. our results are compatible with a model that sars-cov- replication within the human host could be constrained by host mirna defence. the impact of these and further mutations on secondary structures, mirna targets or potential splice sites offers a new context in which to view future sars-cov- evolution, and a potential platform for engineered viral attenuation and antigen presentation. the sars-cov- virus has rapidly emerged as a zoonotic pathogen with broad cellular tropism in human, or zoonotic-host cells. host selection pressure on sars-cov- virus would be expected to have a major impact on the conservation of mutations that enhance viral fitness. 
of these selection pressures, the cellular-based adaptive and innate immune systems will be major constraints to viral evolution. intracellular detection and anti-viral pathways within infected cells are a critical frontline to control virus replication. the success of the sars coronaviruses is proposed to be due to their ability to suppress intracellular anti-viral pathways. for example, interference with dsrna detection and the interferon response is enabled through the activity of several nonstructural proteins (nsp). in addition, the sequestration of genomic viral rna into double membrane vesicles, and dsrna cleavage by nsp , is inferred from the closely related sars-cov- virus, and likely acts to prevent intracellular detection of the virus [ ] . in addition to encoded mechanisms of immune avoidance, the paucity of cpg runs in sars-cov- genome with unexpectedly low gc-content at codon position-three points to major selection pressure being placed on structural features of the genome [ ] . as a recently emerged zoonotic pathogen, it might be expected that bat-adaptations will not be optimal for infection and replication in human cells. however, extensive mutation and strain-radiation has not yet been observed [ ] . the low mutation rate in sars-cov- is reduced by the activity of the ' - ' exonuclease nsp in the rna-dependent rna polymerase (rdrp) complex. an alternative possibility is that the observed mutation rate is lower than the actual mutation rate, but that deleterious mutations have already been lost through natural selection. the short time frame of sars-cov- evolution, coupled to a low mutation rate is consistent with a founder effect for geographical bias in mutation patterns [ ] . a common primary focus of mutational analysis of emerging viruses is the alteration in amino acid sequence of viral proteins that may provide enhanced or new functions for virus replication, immune avoidance, or spread. 
however, nucleic acid secondary structure and sub-translational events have a critical impact on genome replication [ ] , virus maturation and genome packaging [ ] . in addition, the rna secondary structures of sars-cov- genes have been proposed to be druggable targets [ ] [ ] [ ] . because little is known of the influence of sars-cov- mutations on the rna secondary structure, and its possible implications for inhibition by host mirna, we have modeled the impact of common mutations on the structure and susceptibility of the sars-cov- genome to interference from host mirna. the incidental presence of host mirna targets within the sars-cov- genome may be pivotal for the action of host selection pressures to shape further viral evolution. viruses not only alter host mirna expression, but may also produce mirnas to promote their infectivity [ ] [ ] [ ] [ ] . on the other hand, the host targets viral transcripts for inhibition of translation, or mrna destruction, through an mirna-mediated defence system. since mirna are extremely divergent between species, it would be expected that bat-adapted sars-cov- will undergo selection pressure derived from human mirna interference [ ] [ ] [ ] [ ] . while perfect matches of mirna to target viral sequences result in mirna-induced silencing complex (mirisc)-mediated destruction of viral rna, imperfect matches interfere with translation [ ] . a growing body of evidence suggests that human mirnas act as a critical host defense against coronaviruses. an interaction between human coronavirus oc nucleocapsid and mir- can enhance the type i interferon response necessary to clear viral infection [ ] . several host mirnas (mir- - p, - , - , - , - and - a) bind to sars-cov encoded transcripts such as s, e, m, n and orf a [ , ] . on the other hand, sars-cov escapes from mirna-mediated defence through the manipulation of host mirna machinery [ , ] . 
additionally, sars-cov and sars-cov- express short rnas that resemble mirnas and could impact upon host house-keeping or immune defence processes [ ] [ ] [ ] . more recently, several studies have proposed that host mirnas bind sars-cov- transcripts [ , , ] . however, inhibition of viral replication by host mirnas is relevant only if the identified mirnas are expressed in target host cells. both dna viruses, and 'cytoplasmically-confined' rna viruses, use the host rna splicing-machinery to generate new viral transcripts, or to modify the host transcriptome in favor of their own replication [ ] [ ] [ ] [ ] [ ] . it has been suggested that the fused leader sequence at the ′ end of the mouse hepatitis virus (betacoronavirus) mrnas is the result of a non-canonical splicing process [ ] . moreover, deep rna sequencing has identified several unknown sars-cov- viral rnas, possibly the result of non-canonical splicing events [ ] . therefore, our study has additionally identified and mapped mutations to predicted mrna splice sites within the sars-cov- genome. no selective advantage of the identified sequence alterations in sars-cov- should be inferred by their inclusion here. however, the potential of these mutations to impact upon rna structure and mirna recognition provides a basis for ongoing monitoring of viral evolution at these sites in the sars-cov- genome. the interplay of viral genome sequences and host mirna might be viewed as academic and refractory to translation into usable clinical outcomes. however, the inclusion of host mirna binding sites into the orf of conserved viral regions essential for the viral life cycle is a feasible mechanism for enhancing the attenuation of live vaccines [ ] [ ] [ ] , or antigen availability of epitopes for the adaptive immune response [ ] [ ] [ ] [ ] . geneious alignment tools were used to perform multiple sequence alignment. 
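once isolates are aligned to a reference, point substitutions can be read off column by column and grouped by country of origin. the sketch below is an illustration only and is not the geneious workflow used in the study: the function names, the tuple layout and the toy sequences are hypothetical, and the default threshold of three countries follows the working definition of a conserved mutation given in these methods.

```python
from collections import defaultdict


def call_substitutions(reference, aligned):
    """List substitutions as (position, ref_base, alt_base).

    Both strings must come from the same alignment (equal length).
    Positions are 1-based and count reference bases only, so columns
    where the reference has a gap do not advance the coordinate.
    Gaps and ambiguous 'N' calls in the isolate are ignored.
    """
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    ref_pos = 0
    for ref_base, alt_base in zip(reference.upper(), aligned.upper()):
        if ref_base == "-":
            continue  # insertion relative to the reference
        ref_pos += 1
        if alt_base in ("-", "N") or alt_base == ref_base:
            continue
        substitutions.append((ref_pos, ref_base, alt_base))
    return substitutions


def conserved_mutations(reference, isolates, min_countries=3):
    """Keep substitutions observed in at least `min_countries` countries.

    `isolates` is an iterable of (country, aligned_sequence) pairs
    (a hypothetical layout chosen for this example).
    """
    countries = defaultdict(set)
    for country, sequence in isolates:
        for substitution in call_substitutions(reference, sequence):
            countries[substitution].add(country)
    return {
        substitution: sorted(seen)
        for substitution, seen in countries.items()
        if len(seen) >= min_countries
    }


# toy data, not real isolate sequences
reference = "ACGUACGU"
isolates = [
    ("usa", "ACGUAUGU"),
    ("india", "ACGUAUGU"),
    ("brazil", "ACGUAUGU"),
    ("brazil", "GCGUACGU"),
]
print(conserved_mutations(reference, isolates))
```

in the toy data only the c to u change at position 6 is seen in three countries, so it is the only substitution retained.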
mutations occurring in multiple sequences originating from different countries were categorized as conserved mutations. potential splice donor / acceptor sites, exon splicing enhancer (ese), exon splicing silencer (ess), intron splicing enhancer (ise) and intron splicing silencer (iss) motifs were predicted using regrna [ ] , hsf [ ] and nipu [ , ] tools. we used well-accepted methods to predict the rna secondary structure in both wild type and mutated sequences. minimum free energy (mfe) structures [ ] and centroid structures [ ] were calculated with the rnafold program to predict rna secondary structures. to evaluate the impact of mutations on rna secondary structure and base-pair probability, we utilized the rnafold, rnaalifold [ ] , mutarna [ , ] and rnasnp [ ] programs. for identifying potential mirna binding sites, the sars-cov- genome was screened with regrna and mirdb [ ] . we excluded mirnas that are not expressed in sars-cov- target cells such as lung, esophagus, kidney and small intestine [ , ] . the expression levels of mirnas in target cells were determined by tissueatlas [ ] , imota [ ] , or using published data. the impact of mutations on mirna binding was visualized by regrna , mirdb, intarna [ ] and copomus [ ] . we used intarna to illustrate mirna binding to its target. a total of sars-cov- patient isolate sequences collected from the ncbi and gisaid databases were aligned against the sars-cov- reference sequence (nc_ . ). the mutations present in multiple sequences and in at least three different countries were categorized as 'conserved mutations' [ ] . in line with previous reports, we confirmed the occurrence of conserved mutations at positions (nsp ), (nsp ), (nsp ), (nsp ), (nsp ), (s), (orf a) and / (n) [ ] [ ] [ ] [ ] [ ] . we also considered nine additional conserved mutations with occurrence in multiple countries (table & s ). most of these mutations are substitutions of c/g to u. the high a/u content (u= . %/ a= . %/ g= . %/ c= . 
%) and enrichment of codons in pyrimidines is likely due to apobec editing of viral rna and the fact that nsp (proof-reading) does not remove u (the product of cytosine deamination) [ ] . two mutations at and , are in the ' and ' untranslated regions (utrs). seven mutations are silent point mutations, including , , and , while the others result in amino acid changes ( table ) . none of these amino acid changes are conservative substitutions. interestingly, the c u mutation (s l in orf ) is found only in usa sequences, with the earliest isolated on march th (mt . ), after the usa underwent lockdown [ ] ( figure s ). (table fragment: mutation / amino acid change: ggg to aac, r k and g r; ' utr, g to u.) among all the mutations, only two were predicted to have an impact on the secondary structure of viral rna. first, a conserved mutation in nsp changed the secondary structure of nsp dramatically ( figure a ). this mutation also increased the watson-crick base-pair probability, which is predicted to result in a more stable rna secondary structure ( figure b , c, d & e). this mutation had no effect on rna accessibility, which is a consideration for rna-rna and rna-protein interactions ( figure f ). mutation occurs in a conserved region within the ′ utr known as the coronavirus ' stem-loop ii-like motif (s m). this mutation has a significant effect on the secondary structure of the ′ utr (figure a & b) . it is well known that s m is present in most coronaviruses and plays a vital role in viral replication and invasion [ ] [ ] [ ] . mutations in this region have been shown to increase the stability of the ′ utr and its interaction with the ′ utr [ ] . translation, transcription and ubiquitination regulation [ , ] . in addition, s m interacts with viral and host proteins such as the polypyrimidine tract-binding protein (ptb), to regulate viral replication and transcription [ , ] . collectively, these results suggest that and yield more stable rna structures. 
however, the relationship of changes in rna secondary structure of nsp and ′ utr to viral replication or infectivity must be tested in adequate experimental assays. [ ] . the mutated position is highlighted by a red line. in addition to previous studies, we have identified several human mirnas with potential binding sites across the sars-cov- genome. we restricted the mirnas considered to those with documented expression in sars-cov- target cells ( figure & s - ) and additionally focused on mirnas that have been reported as components of the anti-viral mirna-mediated defence system. we hypothesised that some mutations may represent escape mutations to avoid mirna-mediated defence. as shown in figure , ten mutations occur in mirna binding sites and abolish the binding of mirnas to their targets. only one mutation (c u) inside nsp did not destroy the mir- d binding site (gisaid: epi_isl_ ). mir- - p is upregulated in patients with cardiovascular disease and its detection has been proposed as a biomarker [ ] [ ] [ ] . in addition, it is well known that patients with cardiovascular disease are overrepresented in symptomatic covid- cohorts and have a higher mortality rate [ ] . the c u conserved, but synonymous, mutation within the nsp sequence abolished the mir- - p target sequence (figure ). this mutation was introduced in early january ( figure s ). this mirna was previously reported to act in defence against hepatitis viruses (hbv, hcv, hav) and enterovirus [ ] [ ] [ ] . three mutations within nsp result in loss of mir- and mir- b- p target sequences, although they are not conserved mutations. both mirna are expressed in sars-cov- target cells ( figure b , s & s ). nsp a g is in a sequence obtained from vietnam (gisaid: epi_isl_ ), nsp g u was reported in a chinese patient [ ] and nsp c u was identified in the netherlands (gisaid: epi_isl_ ) ( figure ). 
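the idea that a single substitution can abolish a mirna target can be illustrated with a deliberately simple check for perfect, full-length complementarity. this is a toy sketch and not the regrna / mirdb / intarna analysis reported here, which also weighs seed pairing, energetics and accessibility; the function names and the short sequences are hypothetical (real mirnas are about 22 nt long).

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(sequence):
    """Upper-case a sequence and write it in the rna alphabet."""
    return sequence.upper().replace("T", "U")


def target_site(mirna):
    """Return the 5'->3' target sequence that pairs perfectly with the mirna."""
    return to_rna(mirna).translate(COMPLEMENT)[::-1]


def find_sites(genome, mirna):
    """Return 1-based start positions of perfect target sites in the genome."""
    genome = to_rna(genome)
    site = target_site(mirna)
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start + 1)
        start = genome.find(site, start + 1)
    return positions


def sites_lost(genome, mirna, position, alt_base):
    """Return the target sites that disappear after one substitution.

    `position` is 1-based; `alt_base` replaces the base at that position.
    """
    before = find_sites(genome, mirna)
    mutated = genome[:position - 1] + alt_base + genome[position:]
    after = set(find_sites(mutated, mirna))
    return [start for start in before if start not in after]


# toy sequences, not taken from the sars-cov- genome or any real mirna
genome = "AAAGAUAAGCUACCC"
mirna = "UAGCUUAUC"
print(find_sites(genome, mirna))
print(sites_lost(genome, mirna, 10, "U"))  # substitution inside the site
print(sites_lost(genome, mirna, 14, "U"))  # substitution outside the site
```

a substitution inside the site removes the perfect match, while one outside it leaves the site intact. exact string matching is the strictest possible criterion, so a site reported as lost here could still pair imperfectly.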
we identified five mirnas with perfectly-matched complementary sequences within the s-gene: mir- b- p, mir- - p, mir- - p, mir- - p and mir- - p ( figure ). as shown in figure , four of these sites were altered by recently identified mutations in the s-gene. in particular, the mir- - p mirna is expressed at high levels in sars-cov- target cells ( figure b, s ) . the mir- - p target in the s-gene is removed by the c u conserved mutation identified in early january in multiple countries ( figure s ). the mir- - p mirna acts as a tumor suppressor in liver, lung and gastric cancers [ ] [ ] [ ] . the expression level of mir- - p declines during hbv infection [ , ] and mir- - p has a recognition site within the vaccinia virus genome [ ] . also, the g a mutation present in a sequence from brazil (gisaid: epi_isl_ ) destroys the complementarity of the mir- - p binding site. of interest, two more conserved mutations c a (q k) and a g (d g) occur in binding sites for mir- b- p and mir- - p, respectively. lastly, g u in a patient sample isolated in india (mt . ) removed the mir- - p binding site within the s gene ( figure ). in addition to the sites mentioned here, we identified an additional four host mirnas with perfect complementarity within the receptor binding domain (rbd) region of the s gene ( figure ). these mirna are not expressed by sars-cov- target cells (data not shown). however, because these sites exist within the critical ace- targeting region, these mirna target sequences may be relevant to mirna-mediated virus attenuation technology. for example, viral replication can be attenuated in a species-specific and tissue-specific manner by host mirna machinery, which controls viral tropism, replication and pathogenesis [ ] [ ] [ ] . atypical cytoplasmic rna splicing has been proposed to contribute to non-canonical viral transcripts, even for viruses that classically replicate in the cytoplasm [ ] [ ] [ ] [ ] [ ] [ ] . 
moreover, deep rna sequencing has identified several previously unidentified sars-cov- viral rnas that may be the result of non-canonical splicing events, or alternative transcriptional start sites [ ] . we used regrna [ ] , hsf [ ] and nipu [ , ] tools to identify the putative splice sites and motifs within the sars-cov- genome. our computational prediction identified several ′ donor and ′ acceptor splice sites, as well as splice enhancer / inhibitor motifs [ ] (table ). however, none of the conserved mutations introduced, or deleted, any potential splice sites. at present there are nearly mutations identified within global sars-cov- isolates. these mutations are mostly limited to point mutations, with little evidence for recombination events mediating the simultaneous transfer of multiple mutations. although mutations may be due to rdrp / nsp infidelity, the predominance of c → u and g → a mutations is consistent with base-editing defence (e.g. apobec / adar) [ , ] . the nsp exonuclease-based proof-reader is a critical counter-defence against host base-editor attack on the sars-cov- genome. it is also possible that the position of mutations within the genome could reflect accessibility of host base-editors to the sars-cov- genome upon uncoating, or during genome translation [ ] . in our study, we filtered mutations to common / conserved events according to published sources [ ] . there is little evidence that the existing mutations in sars-cov- have an impact on transmission, replication or viral load, but our study has flagged potential sites that could impact on viral fitness. it remains to be seen if these mutations undergo purifying selection in human populations over time. carriage of sars-cov- mutations through the rapid expansion into naive populations throughout the world is most likely due to neutral founder effects, rather than to fitness gains. 
for example, the high ratio of non-synonymous to synonymous mutations is close to the expected value for an emerging virus that has not undergone purifying selection [ ] . one of the mutations (a g; d g) in the spike protein gene studied here defines the rapidly emerging g clade present in one-third of global sars-cov- isolates. this mutation has been extensively studied by korber et al. with respect to clinical outcomes and viral load [ ] . the korber et al. study showed that the a g mutation did not enhance spike protein binding to ace- ; however, the mutation may be associated with higher viral load and poorer clinical outcomes. our own analysis demonstrated that this mutation abolished a potential interaction with mir- - p, a mirna highly expressed in target cells including oesophagus, lung epithelium and small intestine. we propose that the a g mutation may represent an escape variant from the action of host mir- - p. three mutations within nsp were predicted to abolish mir- and mir- b- p targeting. the expression of mir- and mir- b is altered upon viral infections [ ] [ ] [ ] [ ] . mir- expression is upregulated during h n , crimean-congo hemorrhagic fever virus, coxsackievirus a and enterovirus infection [ , , ] . on the other hand, mir- b was reported to be downregulated during hbv and ebola virus infections [ , ] . similar to what was observed for mir- - p, both mir- b and mir- are upregulated in patients with cardiovascular disease [ ] [ ] [ ] . it is not yet clear if anti-viral mirnas have evolved as a host defense against viral infection, or are simply critical gene regulatory elements that assume an additional role for targeting viral transcripts, particularly when the human cellular defence machinery is confronted by an emerging zoonotic virus [ , , ] . the possibility of including host mirna binding sites in the genome of live-attenuated viruses offers a further checkpoint for the attenuation of live vaccines, in a host-cell specific manner. 
for example, the identification of mirna target sites in viral pathogens opens up opportunities for further study of viral host cell-tropism, or to create cellspecific or species-specific viral vaccines [ ] [ ] [ ] . finally, mirna sites within the cds of viral genes may be critical for ribosomal stalling, leading to the production of pioneer translation products (ptp). enhanced production of ptp peptides may be critical for mhc-i loading for boosting the anti-viral ctl response [ ] [ ] [ ] [ ] . conflict of interest iran, taiwan, turkey, australia and hong kong iran, taiwan, sir lanka and turkey table s .newly identified common mutations with the counties of occurrence. sars coronavirus pathogenesis: host innate immune responses and viral antagonism of interferon. current opinion in virology a comprehensive analysis of genome composition and codon usage patterns of emerging coronaviruses no evidence for distinct types in the evolution of sars-cov- . virus evolution a g-quadruplex-binding macrodomain within the "sars-unique domain" is essential for the activity of the sars-coronavirus replicationtranscription complex structure of the sars coronavirus nucleocapsid protein rna-binding dimerization domain suggests a mechanism for helical packaging of viral rna de novo d models of sars-cov- rna elements and small-molecule-binding rnas to guide drug discovery rna genome conservation and secondary structure in sars-cov- and sars-related viruses an in silico map of the sars-cov- rna structurome five questions about viruses and micrornas micrornas in viral acute respiratory infections: immune regulation, biomarkers, therapy, and vaccines. exrna viruses, micrornas and cancer rna virus microrna that mimics a b-cell oncomir viruses and micrornas virus-specific host mirnas: antiviral defenses or promoters of persistent infection? 
mirnas: small changes, widespread effects human coronavirus oc nucleocapsid protein binds microrna and potentiates nf-κb activation micrornome analysis unravels the molecular basis of sars infection in bronchoalveolar stem cells the mirna complexes against coronaviruses covid- how mirnas can protect humans from coronaviruses covid- sola, i. sars-cov-encoded small rnas contribute to infection-associated lung pathology computational analysis of microrna-mediated interactions in sars-cov- infection implications of the virus-encoded mirna and host mirna in the pathogenicity of sars-cov- influenza viruses and mrna splicing: doing more with less herpes simplex virus inhibits host cell splicing, and regulatory protein icp is required for this effect alternative splicing of human immunodeficiency virus type mrna modulates viral protein expression, replication, and infectivity rna splicing in borna disease virus, a nonsegmented, negative-strand rna virus cytoplasmic viral rna-dependent rna polymerase disrupts the intracellular splicing machinery by entering the nucleus and interfering with prp characterization of leader rna sequences on the virion and mrnas of mouse hepatitis virus, a cytoplasmic rna virus the architecture of sars-cov- transcriptome harnessing endogenous mirnas to control virus tissue tropism as a strategy for developing attenuated virus vaccines attenuation of semliki forest virus neurovirulence by microrna-mediated detargeting engineering microrna responsiveness to decrease virus pathogenicity major source of antigenic peptides for the mhc class i pathway is produced during the pioneer round of mrna translation mhc i-associated peptides preferentially derive from transcripts bearing mirna response elements a novel class of microrna-recognition elements that function only within open reading frames flu drips in mhc class i immunosurveillance an enhanced computational platform for investigating the roles of regulatory rna and for identifying functional rna 
motifs human splicing finder: an online bioinformatics tool to predict splicing signals pre-mrna secondary structures influence exon recognition freiburg rna tools: a central online resource for rna-focused research and teaching optimal computer folding of large rna sequences using thermodynamics and auxiliary information rna secondary structure prediction by centroids in a boltzmann weighted ensemble rnaalifold: improved consensus structure prediction for rna alignments a pan-cancer analysis of synonymous mutations rna snp: efficient detection of local rna secondary structure changes induced by snp s. human mutation mirdb: an online database for prediction of functional microrna targets integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of sars-cov- viral entry and highlights inflammatory programs in putative target cells multiorgan and renal tropism of sars-cov- . the new england journal of medicine distribution of mirna expression across human tissues imota: an interactive multi-omics tissue atlas for the analysis of human mirna-target interactions intarna . : enhanced and customizable prediction of rna-rna interactions emerging sars-cov- mutation hot spots include a novel rnadependent-rna polymerase variant the proximal origin of sars-cov- genetic diversity and evolution of sars-cov- . 
infection genomic diversity of sars-cov- in coronavirus disease patients phylogenetic network analysis of sars-cov- genomes translation-associated mutational u-pressure in the first orf of sars-cov- and other coronaviruses insights on early mutational events in sars-cov- virus reveal founder effects across geographical regions rna accessibility in cubic time the structure of a rigorously conserved rna element within the sars virus genome structural lability in stem-loop drives a ′ utr- ′ utr interaction in coronavirus replication the structure and functions of coronavirus genomic ′ and ′ ends a sars-cov- -human protein-protein interaction map reveals drug targets and potential drug-repurposing sars-cov- proteins exploit host's genetic and epigenetic mediators for the annexation of key host signaling pathways that confers its immune evasion and disease pathophysiology schnabel, r.b. mirna- and mirna- predict cardiovascular death in a cohort of patients with symptomatic coronary artery disease association of mir- - p, a circulating biomarker for heart failure, with myocardial fibrosis and adverse cardiovascular events among patients with stage c or d heart failure circulating micrornas as biomarkers and potential paracrine mediators of cardiovascular disease pathological findings of covid- associated with acute respiratory distress syndrome mir- expression in peripheral blood mononuclear cells from hepatitis b virus-infected patients host microrna mir- plays a negative regulatory role in the enterovirus infectious cycle by targeting the ran protein reciprocal control of mir- and il- /stat pathway reveals mir- as potential therapeutic target for hepatocellular carcinoma patient-derived mutations impact pathogenicity of sars-cov- the effect of mir- - p on hbx deletionmutant (hbx-d ) mediated liver-cell proliferation through cyclind regulation the egfr/mir- - p/eya axis controls breast tumor growth and lung metastasis epigenetic silencing of mir- - p contributes to 
tumorigenicity in gastric cancer by targeting ssx ip differential plasma microrna profiles in hbeag positive and hbeag negative children with chronic hepatitis b. plos one the function of microrna in hepatitis b virus-related liver diseases: from dim to bright a computational analysis to construct a potential post-exposure therapy against pox epidemic using mirnas in silico silencer elements as possible inhibitors of pseudoexon splicing extreme genomic cpg deficiency in sars-cov- and evasion of host antiviral defense spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- highly sensitive in vitro diagnostic system of pandemic influenza a (h n ) virus infection with specific microrna as a biomarker hepatitis b virus x protein enhances hepatocarcinogenesis by depressing the targeting of nusap mrna by mir- b. cancer biology & medicine systematic identification and bioinformatic analysis of micrornas in response to infections of coxsackievirus a and enterovirus identification of potential microrna markers related to crimean-congo hemorrhagic fever disease circulating microrna profiles of ebola virus infection role of micrornas in cardiac hypertrophy, myocardial fibrosis and heart failure micrornas in heart disease: putative novel therapeutic targets? microrna- b* induces apoptosis in cardiomyocytes through targeting topoisomerase (top ) the authors declare that they have no conflicts of interest with the contents of this article. ah and am conceived of the study and wrote the paper. ah performed the data analysis and produced the figures. key: cord- -wxmr eki authors: meysman, pieter; postovskaya, anna; de neuter, nicolas; ogunjimi, benson; laukens, kris title: tracking sars-cov- t cells with epitope-t-cell receptor recognition models date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wxmr eki much is still not understood about the human adaptive immune response to sars-cov- , the causative agent of covid- . 
in this paper, we demonstrate the use of machine learning to classify sars-cov- epitope-specific t-cell clonotypes in t-cell receptor (tcr) sequencing data. we apply these models to public tcr data and show how they can be used to study t-cell longitudinal profiles in covid- patients to characterize how the adaptive immune system reacts to the sars-cov- virus. our findings confirm prior knowledge that sars-cov- reactive t-cell diversity increases over the course of disease progression. however, our results show a difference between those t cells that react to epitopes unique to sars-cov- , which show a more prominent increase, and those t cells that react to epitopes common to other coronaviruses, which begin at a higher baseline. the emergence of a novel coronavirus in , termed sars-cov- , has led to the most prominent global pandemic in recent history. infection by sars-cov- manifests as covid- , a disease whose symptoms and severity vary greatly, and which has caused substantial loss of life all over the world. characterizing the immune response against this novel virus has become a top priority, in the hope that novel insights can lead to new treatment plans or can aid in vaccine development. special attention is now being paid to the t-cell response in particular, which was already determined to be an important factor in long-term immunity against coronaviruses during sars and mers outbreaks ( ) ( ) ( ) . moreover, sars-specific t cells are still found in individuals years later and demonstrate robust cross-reactivity against sars-cov- , suggesting the possibility of long-term protection for sars-cov- ( ). current findings indicate that t cells play a key role in susceptibility to and severity of the ongoing covid- pandemic as well. in particular, disease severity has been found to be linked with a sub-optimal or excessive t-cell response ( ) . 
similarly, while most of the patients present with high antibody titers ( ) , frequently they do not provide necessary virus neutralization ( ) . moreover, some studies have reported that around % of patients recover with very low levels of neutralizing antibodies or none at all ( ) . most of the patients, however, developed cd + t cell response ( ) , with cells predominantly showing activation signals ( ) . activated cd + t cells were also detected in the majority of patients, sometimes even reaching % in convalescent patients ( ) . interestingly, % of patients were found to have cd + t cells specific to the sars-cov- spike protein in one study ( ) , and the magnitude of antibody response was correlated to the cd + t cell with the same protein specificity in another ( ) . surprisingly, t cells specific to the sars-cov- epitopes were even found in a high proportion of unexposed individuals ( , , ) , which was later demonstrated to be due to cross-reactivity with common cold coronaviruses ( ) and might explain pre-existing protection from the sars-cov- in some individuals. altogether those findings support a key role for t cells in the immune defense against sars-cov- and are important considerations for vaccine development. a multitude of computational approaches to discover t-cell epitopes for rational sars-cov- vaccine design have been performed using different perspectives: genetic similarity between sars and sars-cov- ( , - ), previous knowledge about sars immunogenic epitopes ( ), de novo epitope prediction with respect to affinity to mhc estimated utilizing structural biology ( , ) or machine learning ( , , ) , and mhc distribution across populations ( ) . different approaches resulted in various numbers of potential targets in different groups. for instance, lee et al. reported sars-cov- peptides identical to those of sars ( ), and fast et al. yielded de novo candidate t-cell epitopes ( ) . 
for some predictions, experimental validation was later performed by different groups ( ) . one key technique that facilitates insights into the makeup of an individual's t-cell repertoire is high-throughput t-cell receptor sequencing. several tcr sequencing studies have now been performed in covid patients and revealed some characteristics of the t-cell receptor repertoire during covid- . one study, for example, found clusters of t cells tied to disease severity, which predominantly comprised public clonotypes ( ) . another group identified statistically enriched public tcr sequences from a large number of repertoires to distinguish sars-cov- positive from healthy individuals ( ) . in addition, several studies have determined sars-cov- specific tcrs and distinct cdr motifs down to the individual epitope level ( , ) . however, one key downside of t-cell receptor sequencing is the high diversity of tcr sequences across individuals. only a handful of tcr sequences can be found across different individuals. these so-called 'public' clones can therefore be tracked across individuals and stored in databases for reference. however, tcr repertoires consist mostly of individual-specific sequences, which due to their individual nature cannot be expected to be found in a database. recently, computational methods have been developed to convert epitope-tcr pairing data to predictive models ( ) that generalize the epitope-tcr specificity determinants. such models can be used to screen tcr repertoires to find additional potential epitope-specific tcrs that are not contained in any database ( ) . these models are based on the concept that tcrs targeting the same epitope tend to have similar amino acid sequences ( ) . in this study, we create such prediction models for sars-cov- epitopes and apply them to track epitope-specificity over time. epitope-tcr data.
a collection of experimentally validated tcr-epitope pairs was established by combining two primary sources: • the vdjdb database, which contained tetramer-derived data from shomuradova et al. ( ), accessed on the th of may, . • the immunecode collection from adaptive technologies and microsoft, which contained pairs derived through mira assay ( ), accessed on the th of june, . for all extracted pairs, several data curation steps were performed. all pairs matching more than one possible sars-cov- epitope were removed from the training data. only valid tcr sequences that could be matched to standard imgt were kept. where needed, a limit was placed on unique tcrs, which were then selected randomly. longitudinal tcr data. public repertoire data were retrieved through the ireceptor gateway ( ) in the week of the th of july, . in particular, tcr data were extracted from those studies that had longitudinal tracking of covid- patients, namely from minervina et al. ( ) and schultheiß et al. ( ) . only those tcrs that occur at a frequency of at least in were retained, to compensate for the different sequencing depths between studies. metadata were made uniform so that the time points are annotated by days after onset of symptoms. protein and sequence data. protein sequence data for nidovirales species, which included the human and non-human sars viruses along with other coronaviruses and single-strand rna viruses, were downloaded from the corona oma orthology database ( ) . in this manner, the protein amino acid sequences used for sars-cov- correspond to genbank accession gca_ . , and the protein sequences for sars-cov to gca_ . . epitopes were matched to all proteins for all species with an exact match, as the degree of variation allowed in the epitope space while retaining tcr recognition is still an unsolved question. matches across all species for each epitope were tallied, and the annotation for sars-cov- was retained.
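The exact-match tally described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the nested-dictionary layout of the proteomes, the toy sequences, and the second epitope are all hypothetical. Only the ylqprtfll epitope comes from the text.

```python
def tally_epitope_matches(epitopes, proteomes):
    """Map each epitope to the sorted list of species carrying an exact copy.

    proteomes is assumed to look like {species: {protein_name: sequence}}.
    """
    tally = {}
    for epitope in epitopes:
        tally[epitope] = sorted(
            species
            for species, proteins in proteomes.items()
            # Exact substring match only: no mismatches are tolerated.
            if any(epitope in sequence for sequence in proteins.values())
        )
    return tally


def unique_to(tally, species):
    """Return the epitopes whose only exact match is in the given species."""
    return [epitope for epitope, hits in tally.items() if hits == [species]]


# Toy, made-up sequences for illustration only (not real viral proteins).
toy_proteomes = {
    "sars-cov-2": {"spike": "MKTYLQPRTFLLAA", "replicase": "GGKLWAVNDILHH"},
    "sars-cov": {"spike": "MKTYLNPRTFLLAA", "replicase": "TTKLWAVNDILHH"},
}
toy_epitopes = ["YLQPRTFLL", "KLWAVNDIL"]  # second epitope is invented

tally = tally_epitope_matches(toy_epitopes, toy_proteomes)
print(tally)
print(unique_to(tally, "sars-cov-2"))
```

Exact matching is deliberately strict here, mirroring the stated rationale that the tolerated degree of epitope variation is still unknown.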
sequence identity between proteins was established using a pairwise protein blast. model training and application. for the machine learning part of this study, we made use of the tcrex framework ( ) . we trained models for all epitopes that had more than distinct tcrs. only those models that had an auc roc higher than . and an auc pr higher than . in a cross-validated setting were retained, as per the default tcrex criteria. the models were then applied to full tcr repertoires, where a tcr was called a probable epitope-specific match if its score was higher than . and its bpr was lower than e- . for normalization, reported hits were divided by the unique tcr repertoire size. recognition model performance. in total, distinct epitope tcrex models could be trained for sars-cov- . an overview of all models and their performance can be found in table s . this is almost as many models as are available for all non-sars-cov- epitopes combined ( ), indicating the vast amount of data that has been generated in the past few months compared to what has been collected for all prior pathogens and diseases. of these epitopes match the sars-cov- replicase protein coded by orf ab, match the sars-cov- spike protein encoded by orf and the final are distributed across the remaining proteins. in addition, of the epitopes are unique to sars-cov- in our data set of nidovirales species. as can be seen in figure , these unique epitopes do not seem to be evenly distributed across the proteins of origin. out of the epitopes derived from the spike protein are unique to sars-cov- , whereas only out of are unique for the orf ab replicase protein. as previously reported ( ) , the spike protein of sars-cov- shares % amino acid sequence identity to that of sars-cov. in contrast, the replicase protein has % sequence identity between sars-cov- and sars-cov. it is well known that viruses accumulate mutations to avoid potential immunogenic epitopes ( , ) , and the same process may be playing a role here.
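The hit-calling rule and the normalization described under model training and application can be sketched as below. This is a hedged illustration rather than the tcrex code itself: the `Prediction` record, the function names, and the threshold values used in the example are assumptions, since the actual cut-offs are not legible in the text. Counting each predicted tcr once is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    cdr3: str      # CDR3 amino acid sequence of the TCR
    epitope: str   # epitope the model was trained on
    score: float   # classifier score for this TCR-epitope pair
    bpr: float     # bpr value reported alongside the score


def call_hits(predictions, min_score, max_bpr):
    """Keep predictions whose score exceeds min_score and whose bpr is below max_bpr."""
    return [p for p in predictions if p.score > min_score and p.bpr < max_bpr]


def normalized_hits(predictions, repertoire_cdr3s, min_score, max_bpr):
    """Number of distinct predicted TCRs divided by the unique repertoire size."""
    unique_size = len(set(repertoire_cdr3s))
    if unique_size == 0:
        raise ValueError("repertoire is empty")
    hit_tcrs = {p.cdr3 for p in call_hits(predictions, min_score, max_bpr)}
    return len(hit_tcrs) / unique_size


# Hypothetical toy repertoire and predictions; thresholds are placeholders.
repertoire = ["CASSA", "CASSB", "CASSC", "CASSD", "CASSA"]
predictions = [
    Prediction("CASSA", "YLQPRTFLL", score=0.95, bpr=1e-5),
    Prediction("CASSB", "YLQPRTFLL", score=0.95, bpr=1e-2),  # bpr too high
    Prediction("CASSC", "YLQPRTFLL", score=0.50, bpr=1e-5),  # score too low
]
fraction = normalized_hits(predictions, repertoire, min_score=0.9, max_bpr=1e-4)
print(fraction)
```

Dividing by the number of unique tcrs rather than by total reads keeps samples of different sequencing depth roughly comparable.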
as we are integrating models from different resources and diverse experimental methods, we wished to confirm whether these data are comparable. interestingly, one epitope has both tetramer ( tcrs) and mira data ( tcrs), namely ylqprtfll (ylq). however, in the case of the mira data, the tcrs were not uniquely assigned to this epitope, but all were assigned to the trio ylqprtfl, ylqprtfll, and yyvgylqprtf. these tcrs were thus excluded from the training data. thus the tetramer-based ylq model can be applied to the mira ylq data as an independent model. in this manner, tcrex predicted putative ylq-reactive t-cells in the ylq mira data out of tcrs. note that only tcrs matched between the two datasets based on cdr sequence (not accounting for v/j genes), showing that tcrex is able to extrapolate from found tcr patterns. this number of tcrex predictions was assigned an enrichment p-value of . e- based on the built-in binomial test. no other epitopes present in tcrex (including the non-sars-cov- models) were predicted to have a single tcr target within this data set. thus the data are comparable and the models can be used regardless of their origin. longitudinal tracking. once established, these models can be applied to any tcr repertoire data and thus can be used to study putative sars-cov- reactive t cells in the currently available covid- data. a large tcr data set was made available by schultheiß et al. ( ) , which featured longitudinal samples from both patients with active disease and those that have recovered. as can be seen in figure , the percentage of predicted sars-cov- reactive tcrs increases as time goes on after onset of symptoms (spearman rho = . , p-value = . ). this is both due to an expansion of distinct sars-cov- tcr sequences and a contraction of the remainder of the tcr repertoire.
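The spearman correlation used above to relate days after symptom onset to the fraction of predicted reactive tcrs can be illustrated with a small standard-library sketch. The function names and the toy values are hypothetical. The code returns only the coefficient and does not compute a p-value, for which a statistics library would normally be used.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy data: days after onset and fraction of predicted hits.
days = [2, 5, 9, 14, 20]
hit_fraction = [0.0010, 0.0008, 0.0020, 0.0015, 0.0030]
rho = spearman_rho(days, hit_fraction)
print(round(rho, 3))
```

A rank-based coefficient suits this setting because it only assumes a monotonic trend over time, not a linear one.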
this matches findings from prior studies investigating t-cell immunity, which have observed that sars-cov- specific t-cell immunity mounts as disease progresses, alongside an absolute decrease in t-cell population size ( , ) . in addition, the increase is irrespective of final disease outcome. as can be seen in figure , there seems to be a difference between those tcrs that are unique to sars-cov- versus those that are not. the fraction of tcrs that are predicted to match unique sars-cov- epitopes shows a similar increase as was found for all epitopes (spearman rho = . , p-value = . ), while that for the epitopes occurring across coronaviruses shows a markedly weaker increase that is not significant (spearman rho = . , p-value = . ). indeed, cross-reactive tcrs predicted to target epitopes not unique to sars-cov- start at a higher level, and only seem to gradually increase as the infection progresses. this may be in line with prior reports of existing cross-reactive t cells in uninfected patients ( , , ). the number of sars-cov- tcrs does decrease once the patient enters recovery. an example can be seen in figure s . this can also be seen based on the tcr data from minervina et al., where both donors were sampled after symptoms had disappeared as per figure s . t cells that contracted over time points within a single individual were considered as associated with sars-cov- . lists of and contracted tcrs (one set for each donor) were identified using edger in the original study and published as supplemental materials. at the time, no epitope-specific tcr data was available for sars-cov- and thus it could not be analysed in this manner. if we apply all tcrex models (both sars-cov- and other viruses), we find strong enrichment for two sars-cov- epitopes in one donor (donor m), as can be seen in figure .
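The enrichment reported here rests on a binomial test, which the text mentions as built into tcrex. A one-sided version can be sketched as follows, under the assumption that the null hypothesis is a fixed background hit rate per tcr. The function name and example numbers are hypothetical, and the exact test implemented by tcrex may differ.

```python
import math


def binomial_enrichment_p(hits, n_tcrs, background_rate):
    """Upper-tail binomial p-value: P(X >= hits) for X ~ Binomial(n_tcrs, background_rate)."""
    if not 0 <= hits <= n_tcrs:
        raise ValueError("hits must lie between 0 and n_tcrs")
    if not 0.0 < background_rate < 1.0:
        raise ValueError("background_rate must lie strictly between 0 and 1")
    log_p = math.log(background_rate)
    log_q = math.log(1.0 - background_rate)
    total = 0.0
    for k in range(hits, n_tcrs + 1):
        # Log-space binomial coefficient avoids overflow for large repertoires.
        log_choose = (
            math.lgamma(n_tcrs + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_tcrs - k + 1)
        )
        total += math.exp(log_choose + k * log_p + (n_tcrs - k) * log_q)
    return min(1.0, total)  # clamp small rounding overshoot


# Hypothetical example: 12 predicted hits among 1000 TCRs, background rate 0.1%.
print(binomial_enrichment_p(12, 1000, 0.001))
```

A small p-value indicates more predicted epitope-specific tcrs than the background rate alone would explain.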
the tcrs associated with the most prominent epitope, namely ylqprtfll, clearly decrease over time in donor m as seen in the original tcr data set as can been seen in figure s . note that the ylq epitope originates from the spike protein and is unique to sars-cov- , which matches the previous findings. the other donor had no such enriched epitopes of any origin, indicating that these tcrs might still be resulting from a not-included set of epitopes. in this paper, we have shown that there is sufficient sars-cov- epitope-tcr data to create a large number of epitopespecific tcr recognition models. these models can be used to screen tcr data from various individuals to track their t-cell immunity. in addition, using such models on longitudinal data reveals a potential difference in temporal dynamics between t cells predicted to react against epitopes that are unique to sars-cov- and those that are shared among other coronaviruses. t-cell immunity of sars-cov: implications for vaccine development against mers-cov understanding the t cell immune response in sars coronavirus infection long-lived effector/central memory t-cell responses to severe acute respiratory syndrome coronavirus (sars-cov) s antigen in recovered sars patients sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls t cell responses in patients with covid- kinetics of sars-cov- specific igm and igg responses in covid- patients convergent antibody responses to sars-cov- neutralizing antibody responses to sars-cov- in a covid- recovered patient cohort and their implications a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- eliisa kekäläinen, and petter brodin. systems-level immunomonitoring from acute to recovery phase of severe covid- presence of sars-cov- reactive t cells in covid- patients and healthy donors. 
medrxiv selective and cross-reactive sars-cov- t cell epitopes in unexposed humans in silico identification of vaccine targets for -ncov bioinformatic prediction of potential t cell epitopes for sars-cov- preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars immunoinformatics-aided identification of t cell and b cell epitopes in the surface glycoprotein of -ncov potential t-cell and b-cell epitopes of -ncov sars-cov- epitopes are recognized by a public and diverse repertoire of human t-cell receptors. medrxiv next-generation sequencing of t and b cell receptor repertoires from covid- patients showed signatures associated with severity of disease magnitude and dynamics of the t-cell response to sars-cov- infection at both individual and population levels on the feasibility of mining cd + t cell receptor patterns underlying immunogenic peptide recognition detection of enriched t cell epitope specificity in full t cell receptor sequence repertoires on the viability of unsupervised t-cell receptor sequence clustering for epitope preference ireceptor: a platform for querying and analyzing antibody/b-cell and t-cell receptor repertoire data across federated repositories longitudinal high-throughput tcr repertoire profiling reveals the dynamics of t cell memory formation after mild covid- infection the oma orthology database in : retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces phylogenetic analysis and structural modeling of sars-cov- spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop viruses selectively mutate their cd + t-cell epitopes-a large-scale immunomic analysis immunological evasion of immediate-early varicella zoster virus proteins characteristics of peripheral lymphocyte subset alteration in covid- pneumonia. 
the journal of infectious diseases longitudinal characteristics of lymphocyte responses and cytokine profiles in the peripheral blood of sars-cov- infected patients sars-cov- reactive t cells in uninfected individuals are likely expanded by beta-coronaviruses. biorxiv we wish to thank the scientists who have made their data available, and doing so have made these studies possible. key: cord- -ij gtesw authors: gultom, mitra; licheri, matthias; laloli, laura; wider, manon; strässle, marina; steiner, silvio; kratzel, annika; thao, tran thi nhu; stalder, hanspeter; portmann, jasmine; holwerda, melle; v’kovski, philip; ebert, nadine; stokar – regenscheit, nadine; gurtner, corinne; zanolari, patrik; posthaus, horst; schuller, simone; vicente – santos, amanda; moreira – soto, andres; corrales – aguilar, eugenia; ruggli, nicolas; tekes, gergely; von messling, veronika; sawatsky, bevan; thiel, volker; dijkman, ronald title: susceptibility of well-differentiated airway epithelial cell cultures from domestic and wildlife animals to sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ij gtesw severe acute respiratory syndrome coronavirus (sars-cov- ) has spread globally, and the number of cases continues to rise all over the world. besides humans, the zoonotic origin, as well as intermediate and potential spillback host reservoirs of sars-cov- are unknown. to circumvent ethical and experimental constraints, and more importantly, to reduce and refine animal experimentation, we employed our airway epithelial cell (aec) culture repository composed of various domesticated and wildlife animal species to assess their susceptibility to sars-cov- . in this study, we inoculated well-differentiated animal aec cultures of monkey, cat, ferret, dog, rabbit, pig, cattle, goat, llama, camel, and two neotropical bat species with sars-cov- . we observed that sars-cov- only replicated efficiently in monkey and cat aec culture models. 
whole-genome sequencing of progeny virus revealed no obvious signs of nucleotide transitions required for sars-cov- to productively infect monkey and cat epithelial airway cells. our findings, together with the previously reported human-to-animal spillover events warrants close surveillance to understand the potential role of cats, monkeys, and closely related species as spillback reservoirs for sars-cov- . introduction animals studied, only monkey and cat airway epithelial cells support efficient replication of sars-cov- the absence of infectious progeny virus in most animal species, except rhesus macaques and cats, indicates that certain animal species may be intrinsically refractory to sars-cov- infection, which may be due to incompatibility with the cellular receptor utilized by sars-cov- for cellular entry , . to assess whether the observed susceptibility to sars-cov- corresponds to the amino acid sequence conservation of the receptor-binding motif (rbm) in ace we performed in silico analysis on the available ace protein sequences , . the ace protein sequences from the two neotropical bat species (s. lilium and c. perspicillata) were not included in the analysis, due to their unavailability. similarly, the ace protein sequence for llama is not available, and therefore we used the sequence of alpaca (xm_ . ) as an alternative, as it is the closest relative. this revealed that in comparison to humans, the ace rbm regions interacting with the receptor-binding domain (rbd) of sars-cov- are well conserved in rhesus macaques and cats while being slightly more diverse in other species (fig. s a) . orthomyxoviridae and are known to have a broad host spectrum, including ferrets [ ] [ ] [ ] . the aec cultures from different species (rhesus macaque, cat, ferret, dog, rabbit, pig, cattle, goat, llama, camel, and two neotropical bats) were inoculated with . tcid of either iav or idv and incubated at °c and °c. 
after hours, the aec cultures were fixed and processed by immunofluorescence assays. this analysis showed that, in contrast to sars-cov- , iav antigen-positive cells could be detected in both companion animal aec cultures, as well as in the commonly used animal models, such as ferret, monkey, rabbit, and porcine aec cultures (fig. , fig. s a ) . for idv we observed antigen-positive cells in all aec models, except for rhesus macaque and one of the neotropical bat species, indicating that the aec cultures were all well-differentiated and susceptible to virus infection. in the immunofluorescence analysis we also incorporated an antibody against the beta-tubulin marker to discern ciliated and non-ciliated cell populations. for both rhesus macaques and cats, sars-cov- antigen-positive cells predominantly overlapped with the non-ciliated cell populations, irrespective of the incubation temperature. using a polyclonal antibody against the human ace we could observe that the cellular receptor expression in rhesus macaques predominantly overlaps with sars-cov- cell tropism, indicating the same cellular tropism (fig. s b) . unfortunately, the polyclonal antibody against the human ace did not bind the feline ace protein; we could therefore not formally demonstrate that sars-cov- virus-infected cat cells are indeed expressing ace on their surface. it has previously been shown that sars-cov- can undergo rapid genetic changes in vitro . since we observed efficient replication in rhesus macaque and cat aec cultures, we assessed whether any mutations suggestive of viral adaptation had occurred. we performed whole-genome sequencing (nanopore sequencing technology) on the viral inoculum used as well as on the progeny viruses collected after one passage, at hpi from the rhesus macaque and cat aec cultures incubated at °c and °c. this inoculum was either passage or passage virus stocks from the sars-cov- /münchen- . / / isolate we had received.
in the viral sequences in the hpi samples from virus-infected rhesus macaque and cat aec cultures, we observed no obvious signs of nucleotide transitions that lead to nonsynonymous mutations compared to the respective inoculums ( fig. ) , irrespective of temperature and animal species. this highlights that the currently circulating sars-cov- d g- variant can productively infect rhesus macaque and cat airway epithelial cells. sequencing indicated that the currently circulating sars-cov- d g-variant can efficiently infect rhesus macaque and cat airway epithelial cells. our data highlight that these two animals are potential models for the evaluation of therapeutic mitigation strategies for currently circulating viral variants. in conjunction with the previously documented spillover events, close surveillance of these animals, including closely related species, in the wild, captivity, and household situations is warranted. to date, there have been several published reports evaluating the suitability of animal models for sars-cov- infection, including cats, rhesus macaques, dogs, pigs, and ferrets , , - . interestingly, we observed that sars-cov- does not efficiently replicate in our tracheobronchial airway epithelial cells derived from ferrets, even though ferrets are used as animal models. this may be because viral infections in ferrets are mainly restricted to the nasal conchae and are dose-dependent, and additionally, the origin of the cells used as input for the aec may not recapitulate the cells of the nasal mucosa , , . it is known that there are differences in cellular composition and the host determinant expression levels along proximal and distal regions of the respiratory tract . additionally, sars-cov- may even utilize a different cellular receptor in ferrets .
therefore, it would be of interest to complement our current repository with aec cultures from different anatomical regions of animals like ferrets and to evaluate whether ace is employed by sars-cov- as the cellular receptor in the different animal species. it has been proposed that sars-cov- spillover into the human population, like sars-cov, has originated from bats, either directly or via an intermediate reservoir , . with more than bat species comprising more than % of all mammalian species, we restricted our experiments with sars- cov- to our established aec cultures from the two neotropical carollia perspicillata and sturnira lilium bat species (gultom et al manuscript in preparation) . we show that these two neotropical bats are not susceptible to sars-cov- , suggesting that they are not a likely reservoir host for sars-cov- , despite the detection of other coronaviruses and presumptive ace receptor usage by sars-cov- in the closely related bat species , . interestingly, it has recently been described that fruit bats (rousettus and stored at - °c for later analysis. following the collection of the apical washes, the basolateral medium was exchanged with fresh ali medium. each experiment was repeated as two independent biological replicates using aec cultures established from either one or two biological donors of each species, depending on the availability of procured animal tissue (table. for the quantification of sars-cov- apical washes were titrated by plaque assay on vero e cells. briefly, x cells/well were seeded in -well plates one day prior to the titration and inoculated with -fold serial dilutions of virus solutions. inoculums were removed hpi and replaced with overlay medium consisting of dmem supplemented with . % avicel (rc- , fmc biopolymer), % heat-inactivated fbs, µg/ml streptomycin, and iu/ml penicillin. cells were incubated at °c with % co for hours and fixed with % (v/v) neutral buffered formalin prior to staining with crystal violet . 
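The plaque assay titration described above yields a titer through a standard calculation: plaque count divided by the product of the dilution and the inoculum volume. The sketch below illustrates that arithmetic. The function name and all example values are hypothetical, since the dilution series and volumes are not legible in the text.

```python
def plaque_titer_pfu_per_ml(plaque_count, dilution, inoculum_volume_ml):
    """Titer in PFU/mL from one countable well.

    dilution is the fraction of the original sample in the well (e.g. 1e-4),
    inoculum_volume_ml is the volume of diluted virus added to the well.
    """
    if plaque_count < 0:
        raise ValueError("plaque_count cannot be negative")
    if not 0 < dilution <= 1:
        raise ValueError("dilution must be a fraction in (0, 1]")
    if inoculum_volume_ml <= 0:
        raise ValueError("inoculum_volume_ml must be positive")
    return plaque_count / (dilution * inoculum_volume_ml)


# Hypothetical well: 42 plaques at a 1e-4 dilution with 0.1 mL inoculum,
# which works out to roughly 4.2e6 PFU/mL.
titer = plaque_titer_pfu_per_ml(42, 1e-4, 0.1)
print(f"{titer:.2e} PFU/mL")
```

In practice the count is taken from the dilution that gives well-separated, countable plaques, and replicate wells are averaged.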
for the analysis on the conservation of ace among different species, the available ace protein clustalw in geneious . . . software (biomatters) using the default setting. ace protein residues interacting with sars-cov- , receptor binding motif (rbm), were selected based on previous described critical ace residues interacting with sars-cov- receptor binding domain (rbd) , . sequencing was performed on viral rna isolated from the sars-cov- stock and the hpi apical washes of sars-cov- -infected monkey and cat aec cultures according to the artic platform ncov protocols , . the v protocol was used as a basis for the reverse transcript and tiled multiplex pcr reaction using the artic ncov- v primer pool (see table. identification of a novel coronavirus in patients with severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia a pneumonia outbreak associated with a new coronavirus of probable bat origin early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia sars-cov- infection in farmed minks, the netherlands the risk of sars-cov- transmission to pets and other wild and domestic animals strongly mandates a one-health evidence of exposure to sars-cov- in cats and dogs from households in italy detection of sars-cov- in a cat owned by a covid- −affected patient in spain sars-cov- infection and longitudinal fecal screening in malayan tigers composition and divergence of coronavirus spike proteins and host ace receptors predict potential intermediate hosts of sars-cov- sars-cov- spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals livestock susceptibility to infection with middle east respiratory syndrome coronavirus limited susceptibility of chickens, turkeys, and mice to pandemic (h n ) virus susceptibility of pigs and chickens to sars coronavirus the contribution of animal models to the understanding of the host range and virulence of 
influenza a viruses influenza virus reservoirs and intermediate hosts: dogs, horses, and new possibilities for influenza virus exposure of humans emerging influenza d virus threat: what we know so far! well-differentiated primary mammalian airway epithelial cell tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus disparate temperature-dependent virus-host dynamics for sars-cov- and sars-cov in the human respiratory epithelium predicting infectious severe acute respiratory syndrome coronavirus from diagnostic samples susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus sars-cov- in fruit bats, ferrets, pigs, and chickens: an experimental transmission study dose-dependent response to infection with sars-cov- in the ferret model: evidence of protection to re-challenge structure of the sars-cov- spike receptor-binding domain bound to the ace receptor efficient activation of the severe acute respiratory syndrome coronavirus spike structural and functional basis of sars-cov- entry by using human ace respiratory syncytial virus infection of human airway epithelial cells is polarized isolation of a novel swine influenza virus from oklahoma in which is distantly related to human influenza c viruses the differentiated airway epithelium infected by influenza viruses maintains the barrier function despite a dramatic loss of ciliated cells sars-coronavirus- replication in vero e cells: replication kinetics, rapid adaptation and cytopathology infection and rapid transmission of sars-cov- in ferrets respiratory disease in rhesus macaques inoculated with sars-cov- sars-cov- is transmitted via contact and via the air between ferrets sars-cov- reverse genetics reveals a variable infection gradient in the respiratory tract lsectin interacts with filovirus glycoproteins and the spike protein of sars coronavirus bats are natural reservoirs of sars-like coronaviruses. science ( -. 
) genetic diversity of bats coronaviruses in the atlantic forest hotspot biome many bat species are not potential hosts of sars-cov and sars-cov- : evidence from ace receptor usage rapid reconstruction of sars-cov- using a synthetic genomics platform cathepsin w is required for escape of influenza a virus from late endosomes determining the replication kinetics and cellular tropism of influenza d virus on primary well-differentiated human airway epithelial cells fiji: an open-source platform for biological-image analysis quick-and-clean article figures with figurej detection of novel coronavirus ( -ncov) by real-time rt-pcr characterization of human coronaviruses on well-differentiated human airway epithelial cell cultures sequencing protocol v v (protocols.io.bdp i rn) ncov- sequencing protocol v (locost) key: cord- - qupdn authors: mirvakili, seyed m; sim, douglas; langer, robert title: reverse pneumatic artificial muscles for application in low-cost artificial respirators date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qupdn one of the main challenges associated with mechanical ventilators is their limited availability in pandemics and other emergencies. therefore, there is a great demand for mechanical ventilators to address this issue. in this work, we propose a low-cost, portable, yet high-performance design for a volume-controlled mechanical ventilator. we are employing pneumatic artificial muscles, such as air cylinders, in the reverse mode of operation to achieve mechanical ventilation. the current design of the device can operate in two modes: controlled mode and assisted mode. unlike most icu ventilators, our device does not need a high-pressure air pipeline to operate. with the current design, mechanical ventilation for respiration rate ranging from b/min to b/min with a tidal volume range of ml to ml and i:e ratio of : to : can be performed. we achieved a total cost of less $ usd to make one device. 
we estimate the device to cost less than $ usd when produced in larger volumes. respiratory dysfunction due to diseases, physical damage to the lungs, or air pollution can be fatal if not treated in time. artificial ventilation is used to deliver "air" (e.g., oxygen + air/helium/nitric oxide), in a pure form or mixed with drugs, to the lungs, as well as to assist or replace spontaneous breathing. the two major methods of pulmonary ventilation are manual insufflation of the lungs (via mouth-to-mouth resuscitation or compressing a bag-valve-mask (bvm)) and mechanical ventilation of the lungs with electronically or mechanically controlled equipment. the manual method can be sufficient for temporary cardiopulmonary resuscitation (cpr), while mechanical ventilation is necessary for prolonged pulmonary ventilation. the state-of-the-art mechanical ventilators address the full range of respiratory failures; however, their time-to-manufacture, design sophistication, cost, and limited scalability for deployment in mass emergencies such as pandemics make them an unviable solution in many situations. delivering air with a compliant bladder such as a bvm (e.g., ambu bag resuscitator) is considered to be among the simplest yet most effective methods for pulmonary ventilation. while being sufficient for emergency cases, the bvm manual resuscitators cannot be utilized for prolonged ventilation. moreover, the operation of bvms requires training and constant attention of the operator. this tedious and repetitive task can be automated to save the time of well-trained medical staff, who can then focus on other aspects of resuscitation. mit's e-vent solution addresses this issue by utilizing an electronically controlled mechanical gripper that periodically presses the compliant bladder of the bvm resuscitator ( ) . since its introduction, a variety of approaches have been proposed and implemented to automate the operation of bvm resuscitators ( ) .
although automation is achieved, there are several shortcomings associated with interfacing bvm resuscitators with mechanical grippers. due to the dynamics of deflating the ambu bag, there can be inconsistencies between breathing cycles. moreover, it is difficult to monitor the flow rate and volume of air delivered to the lungs due to the compliance of the bag and its geometry. additionally, excessive and prolonged compression/decompression of the bag, if not secured, can lead to material fatigue and leakage, which makes the bag fragile ( ) . there are two major types of mechanical ventilation: positive pressure ventilation and negative pressure ventilation. in positive pressure ventilation, a positive pressure of air is applied to the lungs to deliver a controllable volume to the respiratory system. in comparison, in negative pressure ventilation, the chest is placed under negative pressure, which expands the lungs and draws air into them (e.g., iron lung, chest cuirass). negative pressure ventilators are considered to be noninvasive. positive pressure ventilators were first introduced in the early s to treat polio patients with respiratory paralysis ( ) . they deliver air under a constant volume, pressure, flow rate, or a combination of these parameters. constant pressure ventilators can be noninvasive or invasive. in noninvasive ventilation, the air is delivered via an interface such as a nasal, oronasal, or facial mask, a mouthpiece, or a helmet. primary modes for noninvasive ventilation are continuous positive airway pressure (cpap), auto-titrating (adjustable) positive airway pressure (apap), and bilevel positive airway pressure (bipap). unlike the noninvasive technique, in the invasive technique the air is delivered with a tube via endotracheal intubation or tracheostomy. positive pressure ventilators are the most popular type of respirator. 
they are used at home (for obstructive sleep apnea), for resuscitation in emergencies, in icus, and in operating rooms during surgery. a typical icu ventilator starts around $ , and can be as expensive as $ , ( ) . in this work, we demonstrate a fully functional device built from readily available components such as pneumatic artificial muscles (pams). pams are among the most widely applied actuators in industry due to their simple design ( , ) . the energy and power density of such artificial muscles do not exceed those of thermal actuators such as shape memory alloy fibers ( , ) and highly oriented semicrystalline fibers ( , ) . however, in terms of cycle life and performance stability, pams are among the most reliable actuators suitable for precision applications such as biomedical devices. air cylinders, one of the sub-categories of artificial muscles, convert pneumatic energy to mechanical energy in a cylinder/piston type structure. we use this operating mechanism in reverse, applying a linear stroke to the actuator to obtain the air pressure/volume needed for mechanical ventilation. thanks to advances in manufacturing precision glassware, glass syringes of up to ml capacity are now commercially available. traditionally, in chemistry labs, glass syringes are used to smoothly deliver an exact volume of a gas to a system or collect a known volume of gas from a reactor. unlike in plastic syringes, the small coefficient of friction between the plunger and the barrel of a glass syringe offers a smooth profile for flow rate and pressure. in our design we use a ml glass syringe as the pressure source for mechanical ventilation. in some anesthesia ventilators, smooth and accurate ventilation is achieved by employing bellows. however, the mechanical work for moving the bellows is provided by the high-pressure ( psi) central air pipelines of medical centers. 
the reliance of pneumatically powered mechanical ventilators on high-pressure air pipelines makes them less desirable for scenarios where a portable ventilator is needed. examples of such situations include home care, transporting patients in ambulances, office-based anesthesia practices, and transferring patients from the icu to other units (e.g., imaging, operating room). moreover, unlike bellows, glass syringes exhibit low compliance; therefore, accurate tidal volume delivery can be performed. we believe our proposed solution is reliable for prolonged use, accurate, and low-cost (under $ usd). moreover, the device is scalable in emergencies and has a rapid manufacturing time. our design involves eight major components: a stepper motor, a motor driver, a crank-shaft linkage, a ml glass syringe, a power supply, a microcontroller, a sensor, and interface io (i.e., pushbuttons, lcd, switches). the mechanical system is responsible for pushing/pulling the plunger of the glass syringe and is digitally controlled by the microcontroller ( figure a ). the outlet of the glass syringe is connected to the breathing circuit ( figure a ). the primary role of the breathing circuit is to enable the syringe to take in air from a reservoir bag or the ambient environment when the glass syringe is pulling. the air is then delivered to the lungs when the glass syringe is pushing, without escaping through the intake pathway ( figure b ). in the current design, we are using a mechanical peep valve, as shown in figure b . it is possible to use a digitally controlled peep valve instead. the user interface lets the operator set parameters, stop/start the device, and monitor the pressure and flow rate on the lcd ( figure c ). the plunger of the glass syringe is linked to the crank by two joint linkages. the plunger is first attached to a rod end bolt by a resin (materials and methods). the rod end bolt is connected to a clevis rod end, which can freely rotate by °. 
the clevis rod end is linked to the crank via a threaded rod and a ball joint linkage. finally, the crank is attached to the stepper motor via a flange-mount shaft collar. the crank-shaft linkage converts the rotational motion of the stepper motor into a linear motion. the dynamics of the movement can be controlled by applying different pulse functions to the stepper motor (materials and methods). the outlet of the syringe is connected to a breathing circuit that connects to the lungs. we are employing the breathing circuit from an ambu bag for two reasons. first, the one-way respiratory valves, the positive end-expiratory pressure (peep) valve, and the pressure gauge valve are compact and easy to integrate with our design. second, in the case of failure, the ambu bag can be operated manually to resuscitate the patient in place of the device. our proposed device is categorized as an acv (assist-control ventilator) device, previously known as continuous mandatory ventilation (cmv). the device controls the volume and can operate in two modes. the first mode is the controlled mode (cm mode), which is suitable for sedated and paralyzed patients. the second mode is the assisted mode (am mode) for assisting patients who where h is the height measured in cm. figure illustrates the tidal volume as a function of the patient's height for different gravimetric tidal volumes. as shown in figure , the range for tidal volume is between ml and ml. respiratory rate (rr), also known as breaths per minute (bpm), is the number of respiration cycles that occur per minute. the typical range for an adult is between - b/min ( ) . a respiratory rate of up to b/min is also reported for covid- patients ( , ) . inhalation/exhalation ratio (i:e) defines the time ratio between the inhale and the exhale cycle. the typical value for a healthy human is : and is reduced to : or : in the presence of obstructive airway disease ( ) . 
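as a rough illustration of how the respiratory rate and the i:e ratio together fix the timing of one breath, the sketch below splits a cycle of 60/rr seconds into an inhale time and an exhale time. the function name `breath_timing` and the example settings (12 b/min, i:e of 1:2) are assumptions chosen for illustration; they are not values or code taken from the device firmware.

```python
def breath_timing(rr_bpm, inhale_part, exhale_part):
    """Split one breathing cycle into (inhale, exhale) durations in seconds.

    rr_bpm      -- respiratory rate in breaths per minute
    inhale_part -- the "i" of the i:e ratio
    exhale_part -- the "e" of the i:e ratio
    """
    if rr_bpm <= 0 or inhale_part <= 0 or exhale_part <= 0:
        raise ValueError("rate and ratio parts must be positive")
    cycle_s = 60.0 / rr_bpm  # duration of one full breath
    inhale_s = cycle_s * inhale_part / (inhale_part + exhale_part)
    return inhale_s, cycle_s - inhale_s


# hypothetical settings: 12 b/min with an i:e ratio of 1:2
inhale_s, exhale_s = breath_timing(12, 1, 2)
print(f"inhale {inhale_s:.2f} s, exhale {exhale_s:.2f} s")
```

with these assumed settings each breath lasts 5 s, of which one third is spent inhaling. any hold time expressed as a percentage of the inhale time would be taken out of `inhale_s` in the same way.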
end-inspiratory hold is defined as the hold time occurring at the end of the inspiration cycle. the end-inspiratory hold maneuver is performed to eliminate the pressure contribution from the airway resistance and reveal the pressure in the alveoli ( ) . in the current design, the value is set as a percentage of the inhale time. inhale trigger pressure is only applied in the assisted mode. this represents the slight negative pressure that is developed in the breathing circuit when the body attempts to inhale. by adjusting this parameter, we can ensure the device is synchronized with the spontaneous respiration rate of the patient. positive end-expiratory pressure (peep) is the gauge pressure in the lungs (alveolar pressure) developed at the end of expiration. the peep is typically applied by the ventilator at the end of each breath to reduce the likelihood of alveolar collapse. this positive pressure 'recruits' the closed alveoli in the sick lung and improves oxygenation. in this work, we employed a mechanical peep valve. depending on the condition of the patient (e.g., anesthesia, oxygen therapy, obstructive sleep apnea, sars), different combinations of parameters are used. for example, for acute respiratory distress syndrome (ards), which also occurs with covid- , a tidal volume of ml to ml with an i:e ratio of : to : and a respiration rate of b/min to b/min is used ( ) . we made a test setup to evaluate the performance of the design. as depicted in figure , the setup is made of readily available components. to accelerate the prototyping of the setup, we built the breathing circuit from plastic tubes and tubing adaptors from fast shipping vendors. it is easy to replace the current breathing circuit with low-cost fda approved tubing with close to no modification of the device. for logging data, we used a sampling interval of ms. for low respiration rates, ms sampling is sufficient. 
however, for higher respiration rates (above b/min), a faster sampling rate is required to log the data accurately, which comes at the expense of stressing the microcontroller. in volume-controlled mode, the ventilator does not sense any breathing effort from the patient. it provides the tidal volume at the respiration rate and i:e ratio configured by the operator. our device can operate within the range of the parameters listed in table . to evaluate the performance of the device, we swept the parameters, tested edge cases, and characterized boundary conditions that are used in real life scenarios as well. figure illustrates the volume, pressure, and flow rate for nine test cases (tabulated data in supporting information). we performed the end-inspiratory hold maneuver with a hold time of % to obtain the plateau pressure for each case. for the experiments in figure a , we kept the i:e ratio constant at : but increased the rr from b/min to b/min while decreasing the vt from ml to ml. as we decreased the tidal volume, both the peak pressure and plateau pressure decreased as well. this correlation can be explained by the fact that less air is moved to the test lung; therefore, a smaller pressure is developed inside it. for experiments in figure b , we kept the vt at ml and rr at b/min but increased the i:e ratio from to better evaluate the performance of our device, we performed ventilation of a test lung under identical device configurations used for a covid- patient with a maquet servo-i ventilator ( figure ). figure shows the results for a configuration of ml delivery at i:e ratio of : . , respiration rate of b/min, and peep of cmh o. this configuration resulted in a peak pressure of cmh o with a plateau pressure of cmh o with the ventilator. 
the slight discrepancies are due to the small differences in mechanical compliance between the test lung and breathing circuit that we used and the patient's lung and the breathing circuit that was used with the maquet servo-i ventilator. to test the reproducibility of our device, we logged the performance parameters for consecutive cycles for this configuration with an inspiration hold time of % ( . s). as figure a illustrates, the performance parameters are very consistent over the entire duration. we measured mean peak pressure of . ± . cmh o, mean plateau pressure of . ± . cmh o, and mean peep of . ± . cmh o. we performed a long cycle life test under vt = ml, i:e = : , rr = b/min, and inspiration hold = %. we measured mean peak pressure of . ± . cmh o, mean plateau pressure of . ± . cmh o, and mean peep of . ± . cmh o ( figure b ). the slight drift in the peep value is due to the mechanical nature of the peep valve that we used. this drift increased the peak and plateau pressure as well. it is important to note that due to the kinematics of the actuator (supporting information), the plunger of the glass syringe does not travel linearly but rather exhibits a sinusoidal profile (supporting information). therefore, the flow rate has a 'sinusoidal' characteristic. however, it is possible to digitally control the stepper motor to obtain a linear profile for the tidal volume, similar to that of figure . the tidal volume and flow rate were almost constant because the stepper motor's position was recalibrated with the limit switch in every cycle. therefore, there was no error, drift, or inconsistency between successive cycles. modern ventilators deliver oxygen (mixed with air) to the lungs and monitor the co during expiration. our device does not have a capnograph internally; however, an external capnograph can be used to monitor the co . the reservoir bag of the bvm that we employed in our design can be filled with oxygen to increase the fio . 
moreover, anesthetic gas agents (e.g., sevoflurane, isoflurane) can be delivered via the medication port of the bvm mask adaptor for patients under anesthesia. similar to the controlled mode, in the assisted mode a constant volume of air is delivered to the lungs. however, in this mode, the inspiratory cycle is triggered by the patient's attempt to inhale. the effort to inhale generates a slight negative pressure in the respiratory system which is used to trigger the device. depending on the condition of the patient, a trigger pressure between - to - cmh o is configured for this mode of operation. as a safety feature, if the inhale attempt is not detected within a fixed period (set by the respiration rate and the i:e ratio), the device automatically defaults to the cm mode and notifies the operator via an alarm. to test this mode, we used ml for the tidal volume, a respiration rate of b/min with an i:e ratio of : , and trigger pressure of - cmh o. to simulate the negative pressure generated in the attempt for inspiration by the lungs, we applied slight positive pressure to the open end of the differential pressure sensor ( figure ). as depicted in figure , when pressure drops below - cmh o, the device starts the ventilation and delivers the set tidal volume. in the third cycle of figure , we made no 'attempt' for inhale and the device defaulted to the controlled mode and operated continuously as expected (figure ) . our design rationale is based on optimizing the performance (e.g., accuracy, portability, cycle life), simplicity, time to manufacture, scalability, and cost while utilizing generic and readily available components. unlike pneumatically powered ventilators (e.g., bird mark , bio-med mvp- , servo-i), the driving force in our proposed device is provided by a robust high precision stepper motor, which makes it portable and reliable for prolonged use. 
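the assisted-mode behavior described earlier in this section (a breath triggered when the circuit pressure drops below a negative threshold, with a fallback to the controlled mode if no attempt is seen in time) can be sketched as follows. this is a minimal illustration under stated assumptions, not the device firmware: the function name `wait_for_trigger`, the sample list, the -2 cmh2o trigger, the 50 ms sampling interval, and the 3000 ms timeout are all hypothetical.

```python
def wait_for_trigger(samples_cmh2o, trigger_cmh2o, sample_ms, timeout_ms):
    """Scan pressure samples logged during the exhale wait.

    Returns ("assisted", elapsed_ms) at the first sample below the (negative)
    trigger pressure. Returns ("controlled", timeout_ms) if the timeout is
    reached, or if the samples run out, before any inhale attempt is seen.
    """
    for k, pressure in enumerate(samples_cmh2o):
        elapsed_ms = k * sample_ms
        if elapsed_ms >= timeout_ms:
            break  # waited long enough: fall back to the controlled mode
        if pressure < trigger_cmh2o:
            return "assisted", elapsed_ms
    return "controlled", timeout_ms


# hypothetical traces: one with an inhale attempt, one without
print(wait_for_trigger([0.0, -0.5, -1.2, -2.4, -0.3], -2.0, 50, 3000))  # ('assisted', 150)
print(wait_for_trigger([0.0] * 100, -2.0, 50, 3000))                    # ('controlled', 3000)
```

in a real device the timeout would be derived from the respiration rate and i:e ratio, and the "controlled" result would also raise the operator alarm.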
to simplify the manufacturing procedure, we avoided 3d printing any of the components and used readily available materials ( figure ). we achieved a total bill of materials cost of less than $ usd for making the prototype (materials and methods). in designing this device, we took into consideration the safety guidelines for designing a ventilator and incorporated multiple safety features in our design. these measures include: • the pressure is constantly monitored; if it passes certain levels, which are set in the software, an alarm is triggered to notify the operator. for the first level (currently set at cmh o), the alarm sounds periodically, and the device continues to function. for the second level (currently set at cmh o), the alarm sounds continuously and the machine halts mid-operation. a mechanical safety valve is used to release the pressure should this situation occur. • in the assisted mode, if the device does not detect an attempt for inspiration from the patient, it automatically switches to the controlled mode and operates according to the set values. • in the event of an electronic failure, by default the stepper motor turns off. this feature enables the operator to manually actuate the glass syringe easily from the crank joint. in this case, the tidal volume can be read from the measurement markings on the glass syringe. • in the event of a mechanical failure, the operator can immediately disconnect the device from the breathing circuit and use the already attached bvm to perform the ventilation manually. • the tidal volume is limited by two limit switches. the limit switches change the motor's direction of rotation when toggled. one of them is used to recalibrate the position of the motor in every breathing cycle. moreover, the limit switches prevent the motor from going past a certain angle, which could potentially damage the device. • the disassembly process of the device for maintenance and cleaning is very straightforward. 
moreover, glass is relatively chemically non-reactive and resistant to disinfectants. historically, the barrier to entry for making a ventilator has been remarkably high because its target field is niche and specific (i.e., the medical field). the high level of regulations and requirements to make and sell such medical devices adds further to this entry barrier. we are achieving a potential advantage by substantially lowering the barrier to entry for manufacturing, deploying, and owning a mechanical ventilator. the covid- pandemic has brought to light the need for simpler and readily available devices that can address the urgency and impact of the situation rather than relying only on high-end, all-round "perfect" devices. we focused our fundamental design on addressing the demand for ventilators in life-or-death situations. therefore, only the core and essential features, such as the safety features mentioned above and the two modes of operation, are implemented. contributing to the cost of high-end ventilators are the additional quality-of-life improvements and features that come with the device. common examples include a capnograph, a blood oximeter, additional storage for data logging/analysis, a larger and more sophisticated lcd interface, an ip rating, and an ik impact resistance rating. we avoided using custom made parts (e.g., 3d-printed components, machined gears, injection-molded pieces) and utilized off-the-shelf components to cut down the cost further. for example, the precision machining of rigid materials such as teflon or metals for making a ml cylinder/piston is a subtractive manufacturing method that is time-consuming, labor-intensive, and expensive. the demand for this device in underdeveloped countries will be much higher due to the sheer cost of high-end first-class ventilators. moreover, underdeveloped countries typically have a higher population density, which, in turn, creates a higher demand for medical devices. 
in emergencies such as pandemics where the need for ventilators overwhelms the number of ventilators available, our device can act as a temporary substitute for high-end medical grade ventilators. moreover, these ventilators could be part of the first aid or medical kit in large companies or institutions such as universities and airports. another possible application would be in education and training institutions where hands-on experience and time with such devices are not only required but beneficial. considering the precision of tidal volume delivery that we achieved, we anticipate that our proposed ventilator can be used as an anesthesia ventilator as well. future work on this device can be divided into two major areas: improving the device hardware and expanding functionality. currently, a generic arduino microcontroller is being used due to its availability and community support, making it ideal for prototyping. for optimization purposes, we could move away from arduino to a more specialized microcontroller or an soc (system on chip). a more feature-specific microcontroller unit would lay the foundation for more functionality, such as real-time data analysis, remote monitoring with wifi, and more compact circuitry. another significant area that could be improved is the user interface (ui) and user experience (ux) of the device, along with the addition of more sensors (e.g., flow sensor, temperature sensor, ekg, blood oximeter, capnograph). an example of improving the ui/ux of the device would be to add a real-time plot and display more parameters on a larger display similar to high-end ventilators. the other major area for improvement is the functionality of the device. currently, only constant volume ventilation is possible with our device, via the cm and am modes. constant pressure and constant flow rate ventilation with more modes can be incorporated with a minimum change to the hardware architecture. 
moreover, a servo-controlled heater can be attached to the body of the syringe to control the temperature of the gas before delivery. a more compact design can be implemented by utilizing a linear voice coil actuator and equipping the software with a system identification technique to measure the lung's compliance and other mechanical properties (e.g., viscoelastic behavior) in situ. finally, long-term (weeks) porcine trials should be run to validate the usage of the device for prolonged ventilation. clinical trials can then be performed to increase confidence in our device in the medical field. in this work we presented a low cost, high performance, and straightforward mechanical ventilator that can be deployed rapidly in public health emergencies. we successfully mimicked the functionality of the two primary modes of operation (i.e., cm and am) of modern ventilators. in addition, we achieved the accuracy and consistency of a certified commercialized ventilator. to further improve the device, more modes of operation can be implemented easily, as the underlying design and architecture accommodate these alternative modes. in addition, sensors for monitoring the blood oxygen saturation level, exhaled co level, temperature, ekg, and heartbeat can be easily integrated. hardware architecture: the artificial respirator device is constructed of eight major components. the specifications and rationale behind choosing these components are explained in the following: • stepper motor: the maximum safe pressure that can be exerted on the respiratory system is cmh o. to generate this pressure with the glass syringe (plunger diameter of mm), we need a force of . n. to deliver ml of air, we need an arm length of mm, which gives a torque requirement of . n⋅m for the motor. considering that the pressure transmission is not always %, we chose a nema , bipolar, n⋅m stepper motor. 
stepper motors with higher torque ratings can be utilized for higher respiration rates and i:e ratios. • motor driver: we chose a digital stepper driver with a peak current rating of . a to drive the stepper motor with our microcontroller. the inputs of the driver are optically isolated from the microcontroller to prevent backpropagation of back emf noise or any em noise from the stepper motor. we used a pulse/rev setting to achieve a balance between smooth motion and stressing the microcontroller. • crank-shaft linkage: we made the crank-shaft linkage from the components illustrated in figure . the crank-shaft linkage components include a clevis rod end, a piece of high-strength steel threaded rod, a partially threaded rod end bolt, a ball joint linkage, a medium-strength steel hex nut, a high-strength steel threaded rod, a piece of aluminum sheet, and a flange mount shaft collar to connect the crank to the stepper motor. to join the top end of the glass syringe to the rod end bolt, we used a fast curing urethane resin that has similar performance and mechanical characteristics to abs plastic. figure illustrates the procedure. • glass syringe: we used a ml glass syringe manufactured by blg. the syringe has a mm opening at the tip, which is large enough not to slow down the system at fast actuation rates. • power supply: a v switching power supply with a current rating of a is used to power the stepper motor. • microcontroller: we employed an mhz, -bit arm core microcontroller. • sensors: a differential pressure sensor (honeywell hscdrrn mdsa ) is used to monitor the pressure via the spi interface. • interface: a . -inch tft color display was used for this work with pushbutton switches and limit switches. the pushbutton switches are soldered on a pcb with no debounce capacitors. 
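the motor sizing argument in the stepper motor item above (a target pressure acting on the plunger area gives a force, and that force times the crank arm gives a torque) can be sketched numerically. because the actual values are not legible in this text, the numbers below (40 cmh2o, a 50 mm plunger, a 60 mm arm, 80 % transmission efficiency) are hypothetical placeholders, and `required_torque` is an illustrative helper rather than part of the authors' design files.

```python
import math

CMH2O_TO_PA = 98.0665  # 1 cmH2O expressed in pascals


def required_torque(pressure_cmh2o, plunger_diameter_m, crank_arm_m, efficiency=1.0):
    """Return (force in N, torque in N*m) needed to hold a pressure on the plunger.

    efficiency is the assumed fraction of motor torque that reaches the
    plunger (1.0 means lossless transmission).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    area_m2 = math.pi * plunger_diameter_m ** 2 / 4.0  # plunger face area
    force_n = pressure_cmh2o * CMH2O_TO_PA * area_m2   # F = P * A
    torque_nm = force_n * crank_arm_m / efficiency     # worst case: arm perpendicular to rod
    return force_n, torque_nm


# hypothetical inputs, not the paper's values
force_n, torque_nm = required_torque(40.0, 0.050, 0.060, efficiency=0.8)
print(f"force {force_n:.2f} N, torque {torque_nm:.3f} N*m")
```

choosing a motor whose rated torque sits well above this estimate leaves headroom for friction and for the higher accelerations needed at fast respiration rates.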
setup design: we used pine lumber ( / ", "× ") for the chassis of the device and pine slabs and studs for mounting the glass syringe and limit switches and for making the housing for the electronics. for the final commercialized device, these components will be housed in an enclosure. software architecture: we designed and implemented two similar state machines for the cm and am. the following section describes the high-level logic of the state machines. figure illustrates the high-level logic of the cm and am states. both modes have the same architecture but with slightly different logic. details of each state are described below: • exhale wait state: this state holds the motor in position for a set period. this period is determined at the start of the cycle from the i:e ratio, respiration rate, and the tidal volume parameters. in the am, the exhale wait state has additional logic to monitor pressure to detect the patient's attempt to inhale. • inhale state: similar to the reset state, the motor is moved cw back to the ° position in this state. the start of the inhale state is also considered the beginning of a new breathing cycle. therefore, any setting parameters such as respiration rate, i:e ratio, and tidal volume will be updated. • inhale wait state: similar to the exhale wait state, the motor is held in position for a set period of time. this state also calculates and displays the maximum flow rate, peak pressure, peep, and plateau pressure. the breathing cycle is then repeated until the device is stopped, rebooted, or powered off. cost analysis: the list of the materials and the cost to build a prototype of the device is tabulated in figure . the values for the set parameters and measured data for each cycle are tabulated in table s . for experiments to , we kept the i:e ratio constant at : but increased the rr from b/min to b/min while decreasing the vt from ml to ml. as we decreased the tidal volume, both the peak pressure and plateau pressure decreased as well. 
this correlation can be explained by the fact that less air is moved to the test lung; therefore, a smaller pressure is developed inside it. for experiments to , we kept the tidal volume and i:e ratio constant and increased the respiration rate. as expected, both the flow rate and the difference between the peak pressure and plateau pressure increase. to show this effect in more depth, for experiments to , we kept the vt at ml and rr at b/min but increased the i:e ratio from : to : . similar to the previous case, since the volume is kept constant, the plateau pressure is also constant; however, the peak pressure increases, which can be explained by the fact that the resistive pressure increases with the flow rate. more details are provided in the next section. this behavior of the test lung is similar to that of the human lungs. the lung is a porous structure made of soft tissues and a complex network of airway branches. due to the inhomogeneity of its structure, the pressure developed in the lungs has three components: • resistive pressure (presistive), which is the contribution from the resistive properties of the respiratory system. viscous and turbulent losses related to the flow of gas through the airway tree and the deformation of parenchymal and chest wall tissues are probably the main contributors to the resistive pressure ( ) . at small flow rates, the resistive pressure is a linear function of the flow rate (v̇). however, at higher flow rates (e.g., low i:e ratios, exercising), the resistive pressure scales nonlinearly, often in the form of $k_1\dot{v} + k_2\dot{v}^2$, where $k_1$ and $k_2$ are determined empirically. • elastic pressure (pelastic), which is the contribution from the recoil of the lungs and chest wall to their relaxed states when inflated with air either by contraction of the diaphragm and intercostal muscles or by mechanical insufflation (e.g., a ventilator). 
therefore, the elastance of the respiratory system (ers) can be seen as a linear summation of the elastance from the chest walls (ecw) and the elastance from the lungs (el). the reciprocal of the respiratory system's elastance gives its compliance (crs). the elastic pressure is a function of the tidal volume. • inertial pressure (pinertial), which is the contribution from the inertial forces from the chest wall, airway tree, and parenchymal tissues. the inertial pressure is a function of the volume's acceleration (v̈). therefore, the total pressure is ( , ): $p = i\ddot{v} + r\dot{v} + e v + p_o$, where i, r, and e are the inertial, resistive, and elastic properties of the respiratory system and $p_o$ is the distending pressure at the end of expiration. the resistance (r) can be estimated from the difference between the peak pressure and plateau pressure divided by the flow rate. any alterations in presistive (for a specified flow rate) can reflect changes in the airway caliber ( ). at zero flow rate, the elastance of the respiratory system can be determined from the difference between the plateau pressure and the peep divided by the tidal volume. it is important to note that the dynamic elastance is in general higher than the static elastance, which is due to viscoelasticity and gas redistribution. figure s illustrates the contribution of each component of the pressure in the respiratory system. figure s -pressure, flow rate, and volume profiles in volume-controlled ventilation (at constant flow rate). (a) pressure, flow rate, and volume profiles for a healthy patient. in this case the resistive pressure has a small contribution to the peak pressure, while the elastic pressure has the largest contribution. (b) the peak pressure is increased while the plateau remains similar to that in (a). this indicates an increase in the resistive pressure. this condition occurs in cases such as bronchospasm, mucous plug, retained secretions, and ett tip occlusion. 
(c) the elastic pressure is increased in this case, while the resistive pressure contribution is similar to that in part (a). conditions such as ards, pneumonia, pneumothorax, and pulmonary oedema lead to an increase in the elastic pressure. the position of the plunger is almost proportional to the cosine of the angle between the crank axis and the axis of the plunger ( figure s ). from the cosine law, we can find the position of the plunger (s) as a function of the crank radius (r), which is half of the linear stroke, and the length of the connecting rod (l), as the following equation suggests: $l^2 = r^2 + s^2 - 2rs\cos\alpha$ (1) the position of the plunger corresponds to the volume that the syringe can deliver. by rearranging the terms in equation (1), we find s to be: $s = r\cos\alpha + \sqrt{l^2 - r^2\sin^2\alpha}$ (2) therefore, the tidal volume (vt) as a function of rotation stroke (αi -αf) can be found to be: $v_t = d\,[s(\alpha_i) - s(\alpha_f)]$, where d is the volume capacity of the glass syringe (i.e., ml) over the length the plunger should travel inside the barrel to achieve that volume capacity (e.g., the amplitude of the maximum linear stroke). for the extreme cases of α = 0° and α = 180° we obtain vt = 2rd. since r is half of the maximum linear stroke, we obtain vt = ml as expected. figure s -diagram of the geometry of the crankshaft linkage. ( ) emergency ventilator design toolbox. mit e-vent mit emerg more lifesaving ventilators are available. hospitals can't afford them actuation of untethered pneumatic artificial muscles and soft robots using magnetically induced liquid-to-gas phase transitions artificial muscles: mechanisms, applications, and challenges fast torsional artificial muscles from niti twisted yarns universal equation for estimating ideal body weight and body weight at any bmi ventilator parameters for covid- patients what is the normal inspiration and expiration ratio in mechanical ventilation? 
benumof and hagberg's airway management
notice: importation or sale of ventilators - use of us fda guidance and canadian requirements for authorization under the interim order. aem ( )
interim order respecting the importation and sale of medical devices for use in relation to
partitioning airway and lung tissue resistances in humans: effects of bronchoconstriction
lung mechanics: an inverse modeling approach
image-based computational modeling of the human circulatory and pulmonary systems: methods and
the prediction of pressure drop and variation of resistance within the human bronchial airways
the author benefited substantially from ms. mahdieh chavoshi (nurse anesthetist) and ms. farnoush
key: cord- -mr o cu authors: li, fei; han, ming; dai, pengfei; xu, wei; he, juan; tao, xiaoting; wu, yang; tong, xinyuan; xia, xinyi; guo, wangxin; zhou, yunjiao; li, yunguang; zhu, yiqin; zhang, xiaoyu; liu, zhuang; aji, rebiguli; cai, xia; li, yutang; qu, di; chen, yu; jiang, shibo; wang, qiao; ji, hongbin; xie, youhua; sun, yihua; lu, lu; gao, dong title: distinct mechanisms for tmprss expression explain organ-specific inhibition of sars-cov- infection by enzalutamide date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mr o cu
the coronavirus disease (covid- ) pandemic, caused by severe acute respiratory syndrome coronavirus (sars-cov- ), has rapidly become a global public health threat due to the lack of effective drugs or vaccines against sars-cov- . the efficacy of several repurposed drugs has been evaluated in clinical trials. among these drugs, a relatively new antiandrogen agent, enzalutamide, was proposed because it reduces the expression of transmembrane serine protease (tmprss ), a key component mediating sars-cov- -driven entry into host cells, in prostate cancer cells. however, definitive evidence for the therapeutic efficacy of enzalutamide in covid- is lacking.
here, we evaluated the antiviral efficacy of enzalutamide in prostate cancer cells, lung cancer cells, human lung organoids and sars-cov- -infected ad-ace -transduced tmprss knockout (tmprss -ko) and wild-type (wt) mice. tmprss knockout significantly inhibited sars-cov- infection in vivo. enzalutamide effectively inhibited sars-cov- infection in human prostate cancer cells (lncap) but not in human lung cancer cells or patient-derived lung organoids. although tmprss knockout effectively blocked sars-cov- infection in ace -transduced mice, enzalutamide showed no antiviral activity due to the ar independence of tmprss expression in mouse and human lung epithelial cells. moreover, we observed distinct ar binding patterns between prostate cells and lung cells and a lack of direct binding of ar to tmprss in human lung cells. thus, our findings do not support the postulated protective role of enzalutamide in treating covid- .
coronavirus disease (covid- ), which is caused by the novel coronavirus severe acute respiratory syndrome coronavirus (sars-cov- ), has emerged as a new worldwide pandemic. covid- has led to nearly , , confirmed global cases and , deaths as of september , . sars-cov- is a serious worldwide threat due to its high infectivity , . the current pandemic and the potential for future pandemics have exposed the urgent need for the rapid development of efficient countermeasures. therefore, repurposing clinically proven drugs has been postulated as a promising strategy for developing treatments for sars-cov- infection. transmembrane serine protease (tmprss ) has been reported to play an essential role in mediating the entry of viruses, including sars-cov- , into host cells - . the spike glycoprotein (s) of sars-cov- and its receptor, angiotensin-converting enzyme (ace ), have been shown to mediate the attachment of sars-cov- to host cells , . the sars-cov- s protein is then primed by tmprss .
moreover, besides sars-cov- , both sars-cov, another type of coronaviruses and h n , an influenza virus, also employ tmprss for viral entry , , . since the conserved role of tmprss in provoking coronaviruses and influenza viruses-driven entry into host cells has been highlighted, modulating tmprss expression or its protease activity is postulated to be a potential method for antiviral intervention - enzalutamide is a potent inhibitor of the androgen receptor (ar) and has been approved for the treatment of castration-resistant prostate cancer (crpc) patients , . mechanistically, enzalutamide binds to ar, reduces the efficiency of its translocation from the cytoplasm to the nucleus, and impairs the ar- mediated signaling pathway . given the modulation of tmprss expression by ar in prostate cells , several clinical trials have been initiated to assess the therapeutic efficacy of enzalutamide in covid- patients (clinicaltrials.gov identifiers; nct and nct ). however, it remains elusive whether ar indeed controls tmprss expression in different organs, especially lung. thus, an investigation of enzalutamide in the treatment of sars-cov- infection is urgently needed. herein, we evaluated the antiviral efficacy of enzalutamide in human lung organoids (luos) and human ace recombinant adenovirus (ad-ace )-transduced tmprss knockout (tmprss -ko) and wild-type (wt) mice. with these powerful approaches, we comprehensively defined the antiviral effect of enzalutamide. moreover, we showed the potential mechanism of enzalutamide with its different antiviral activity in the human prostate and lung. results to elucidate whether tmprss is crucial for sars-cov- -driven entry into host cells, we employed previously established tmprss -ko mouse model (extended data fig. a ). in line with previous findings , , under physiological conditions, tmprss knockout exhibited little effect on multiple organs including lungs (fig. a) . 
to identify tmprss -positive cells in multiple organs, we next crossed tmprss -ko mice with rosa -eyfp mice expressing a cag-driven yfp cre-reporter (t y). when exposed to tamoxifen, this mouse model can be used to trace cells of the tmprss -positive lineage via detection of yfp expression (extended data fig. b). with this model, we further investigated the existence of tmprss -positive cells in multiple organs. notably, in addition to the prostate, other essential organs, including the lung, kidney and liver, which are permissive for sars-cov- infection in humans, contained tmprss -positive epithelial cells (fig. b and extended data fig. c). the broad distribution of tmprss -positive cells might indicate a general function of tmprss in mediating sars-cov- -driven entry in multiple organs. to confirm the role of tmprss in sars-cov- infection, we employed a previously reported ad-ace transduction method to overcome the natural resistance of mice to sars-cov- infection . briefly, we first transduced -to -week-old wt mice and tmprss -ko mice with . × pfu of flag-tagged ad-ace adenovirus. consistent with previous findings, predominant ace expression was observed in the alveolar epithelium, as indicated by flag staining (extended data fig. d). five days post ad-ace transduction, mice were challenged with × pfu of sars-cov- . notably, compared to wt mice challenged with sars-cov- , tmprss -ko mice exhibited markedly less severe lung pathology, as indicated by the lack of robust inflammatory responses (fig. d, e). we also quantified the percentage of lung cells with sars-cov- infection. based on the quantification of more than one million cells in mice per group via s protein staining, the percentage of s protein-positive lung cells was significantly lower in tmprss -ko mice than in wt mice (fig. f, g).
collectively, these findings suggested that the lack of tmprss had marked effects on sars-cov- infection, highlighting the important role of tmprss in mediating sars-cov- -driven entry into host cells. since tmprss expression is modulated by ar in prostate cells, we next asked whether ar inhibition can prevent sars-cov- infection by reducing tmprss expression. we first surveyed tmprss and ar expression across a panel of well-characterized prostate cancer cell lines and two previously established organoid lines, mskpca and mskpca . notably, both qrt-pcr and western blotting demonstrated high ar and tmprss expression in lncap and vcap cells (extended data fig. a, b). consistent with previous findings , , ar inhibition by enzalutamide treatment induced a marked reduction in tmprss protein and mrna expression, which was validated in both lncap and vcap cells (extended data fig. c, d, e and f). for sensitive and convenient detection of sars-cov- -driven entry into host cells, we employed a pseudovirus system in which the sars-cov- s protein and luciferase were incorporated into pseudoviral particles through cotransfection of pnl - .luc.re and pcdna . encoding the sars-cov- s protein. this system allowed the sensitive detection of sars-cov- pseudotype entry by measuring luciferase activity. the constructed pseudovirus was named sars-cov- -s. we first asked whether lncap and vcap cells were susceptible to sars-cov- -s-driven entry. because ace expression was undetectable in lncap and vcap cells, we observed no robust sars-cov- -s-driven entry into these cells, as expected (extended data fig. b, d). to render lncap and vcap cells permissive, we next transduced these cells with ad-ace (extended data fig. a, c).
given that enzalutamide treatment can reduce tmprss expression in prostate cells , , we next sought to ascertain whether enzalutamide can prevent sars-cov- from infecting prostate cells through downregulation of tmprss expression. we first investigated the therapeutic efficacy of enzalutamide in blocking sars-cov- -s-driven entry into lncap cells. in line with previous results generated from other tmprss positive cells , , camostat mesylate, a clinically proven inhibitor for serine protease including tmprss , significantly attenuated the infection of sars-cov- -s, as indicated by the reduction in luciferase activity, suggesting that tmprss is also an important factor for facilitating sars-cov- -driven entry into lncap cells (fig. a) . remarkably, enzalutamide also significantly blocked sars-cov- -s infection, which even exhibited much higher treatment efficacy than camostat mesylate (fig. a) . in addition, immunofluorescence staining for luciferase also demonstrated the consistent results that enzalutamide significantly reduced the percentage of cells with sars-cov- -s-driven entry (fig. b, c) . since pseudovirus system was limited to investigations on sars-cov- -s-driven entry into host cells, we next assessed whether enzalutamide interferes with authentic sars-cov- -driven entry and the subsequent steps of the viral replication cycle. consistent with findings from the pseudovirus system, enzalutamide efficiently exerted antiviral activity against sars-cov- in lncap cells, as demonstrated by the significantly reduced viral titers of sars-cov- in both culture medium supernatant and cellular contents (fig. d, e) . moreover, we evaluated whether enzalutamide can prevent sars-cov- -s-driven entry into vcap cells. recapitulating results in lncap cells, enzalutamide treatment also significantly blocked sars-cov- -s-driven entry into vcap cells (extended data fig. e ). 
taken together, utilizing both the pseudovirus system and authentic sars-cov- , we demonstrated that enzalutamide efficiently prevented sars-cov- -driven entry into prostate cells by inhibiting ar to reduce tmprss expression. since enzalutamide efficiently inhibited infection of human prostate cells with sars-cov- , we next sought to evaluate its therapeutic efficacy in human lung cells. to this end, we surveyed single-cell rna- sequencing data from healthy human lungs , consistent with previous results , tmprss -positive cells were broadly distributed in various cell types, potentially indicating an important role of tmprss in mediating sars-cov- infection in multiple lung cell types (extended data fig. a , b and d). in particular, high expression levels of both tmprss and ace were identified in sftpc-positive alveolar type ii (atii) cells, potentially indicating tmprss -dependent entry of sars-cov- into these cells (extended data fig. a , c, e and f). we further assessed ar expression and found that similar to tmprss , ar was also highly expressed with a wide distribution (extended data fig. a since human lungs were characterized with both ar and tmprss expression, we next sought to determine whether ar can modulate tmprss expression in the lungs. we firstly established human lung organoids (luos) derived from adjacent normal lung tissues with similar culture protocol as previously reported (fig. a) . to verify whether luos are an appropriate model in which to evaluate the therapeutic efficacy of enzalutamide, we performed immunofluorescence staining for ar and tmprss in luos. by staining of serial sections, we identified both ar expression and tmprss expression in luos, which also contained ar/tmprss double-positive cells (fig. b) . we next employed luos to explore whether enzalutamide could manipulate tmprss expression. 
distinct from above findings in prostate lncap cells, enzalutamide treatment did not significantly reduce tmprss expression, validated in three luos lines (fig. c, d and e) . to ensure the permissiveness of luos for sars-cov- -s-driven entry, we also transduced these organoids with ad-ace (fig. f) . twenty-four hours post ad-ace transduction, luos were pretreated with μm camostat mesylate or μm enzalutamide for hours before virus infection (fig. g ). camostat mesylate but not enzalutamide inhibited infection of luos with sars-cov- -s, confirming that enzalutamide could not protect lung cells against sars-cov- infection (fig. h , i and j). moreover, we also evaluated whether enzalutamide blocked authentic sars-cov- -driven entry and viral replication. consistent with the results obtained with the sars-cov- pseudovirus, enzalutamide did not exhibit antiviral activity against authentic sars-cov- (fig. k ). given that lack of treatment efficacy of enzalutamide in blocking sars-cov- -driven entry was characterized in human lung organoids, we also employed multiple lung cancer cell lines to validate whether these results could be recapitulated. among these cell lines, three of eight, namely, h , h and a cells were ar-positive, confirming the wide distribution of ar expression across multiple lung cell types (extended data fig. a ). since only h and h cells exhibited detectable tmprss expression, we treated these two cell lines with the ar ligand dihydrotestosterone (dht) and the ar inhibitor enzalutamide to assess changes in tmprss expression (extended data fig. b ). notably, unlike in lncap cells, in which dht stimulated and enzalutamide reduced tmprss expression, no obvious changes in tmprss expression were observed in h and h cells treated with these two agents (extended data fig. c , d, e and f). in addition, we also performed immunofluorescence for tmprss to validate these results. 
in contrast to lncap cells, in which dht stimulated and enzalutamide reduced tmprss expression, no obvious changes in tmprss expression were observed in h and h cells with these two treatments (extended data fig. g). to render these lung cells susceptible to sars-cov- -s-driven entry, we also transduced them with ad-ace (extended data fig. h). we next compared camostat mesylate and … human lung organoids and lung cancer cells, we demonstrated that tmprss expression was independent of ar expression in human lung epithelial cells; thus, ar inhibition using enzalutamide did not reduce tmprss expression and did not block sars-cov- -driven entry into human lung epithelial cells. to determine in vivo whether ar is capable of modulating tmprss expression, we next treated wt mice and castrated mice with enzalutamide for days. notably, in castrated mice, enzalutamide treatment impaired the function of ar by blocking its nuclear translocation in prostate cells (extended data fig. a). as observed in human prostate cells, reduced tmprss mrna levels were identified in prostate epithelial cells in enzalutamide-treated mice and castrated wt mice (fig. a). no significant changes in tmprss mrna levels in response to enzalutamide treatment or castration were observed in the lungs of male mice (fig. b). consistent results were obtained in the lungs of female mice treated with enzalutamide (fig. c). thus, consistent with the findings in human prostates and lungs, in vivo experimentation in mice also demonstrated the organ-specific role of ar in regulating tmprss expression. to assess in vivo whether enzalutamide treatment prevents sars-cov- -driven entry into lung cells, we also employed ad-ace -transduced mouse models (extended data fig. b).
briefly, we firstly treated - -week-old wild type c bl/ mice with or without enzalutamide treatment by daily intragastric gavage. we next transduced control and enzalutamide-treated mice with . × pfu of ad-ace adenovirus. five days post ad-ace transduction, mice were challenged with × pfu of sars-cov- . mouse lungs were collected for pathological analysis and viral load determination days post sars-cov- challenge. notably, histopathological analysis revealed similar levels of inflammatory infiltration in control and enzalutamide-treated mice (fig. d , e). in addition, the viral titers did not differ significantly between control and enzalutamide-treated mouse lungs (fig. f) . moreover, the percentage of lung cells infected with sars-cov- in control mice did not differ significantly from that in enzalutamide- treated mice (fig. g, h) . taken together, utilizing in vivo mouse models, we obtained consistent results indicating that enzalutamide did not inhibit sars-cov- -driven entry into lung cells and subsequent viral replication. given the discrepancy between prostate cells and lung cells in the changes in tmprss expression in response to enzalutamide treatment, we next sought to elucidate whether such discrepancy was attributed to distinct ar binding pattern. we first performed chromatin immunoprecipitation with sequencing (chip- seq) on ar in prostate cells lncap and assay for transposase-accessible chromatin using sequencing (atac-seq) in both prostate cells lncap and lung cells a , h and h . based on ar chip-seq in lncap cells, we compared chromatin accessibility among these four cell lines of ar binding sites. notably, distinct from extensive chromatin accessibility of these sites in lncap cells as expected, the other three lung cell lines were characterized with much less open chromatin (fig. a) . 
in principle, transcription factors modulate transcriptional regulation through binding to regulatory elements of target genes, which tightly associates with chromatin accessibility . the chromatin accessibility disparity might indicate distinct ar binding pattern between prostate cells and lung cells. to further characterize ar binding pattern in prostate cells and lung cells respectively, we categorized these ar binding sites into two main groups: compatible with ar binding in tmprss , two extra ar binding sites were verified with extensive chromatin accessibility in lncap cell but not in other lung cells (fig. e ). in addition, unlike in prostate cells, chip-qpcr demonstrated the lack of robust ar binding in the upstream region of tmprss locus in lung cells (fig. f ). these results indicated lack of ar binding in tmprss in lung cells, which coincided with the above findings that ar inhibition utilizing enzalutamide did not reduce tmprss expression to inhibit sars-cov- -driven entry. furthermore, gene set enrichment analysis (gsea) revealed that androgen response genes were significantly enriched in ar-positive prostate cells (lncap, vcap and rv ) when compared with ar-positive lung cells (a , h and h ) (fig. g) . in accordance with gsea results, a significantly higher sum of z-scores for androgen responsive genes was observed in ar-positive prostate cells than that in ar-positive lung cells (fig. h ). since we demonstrated lack of ar binding in tmprss in lung cells, utilizing freshly dissociated lung cells from normal human lung tissue samples, we next sought to validate whether the correlation between the expression of ar and tmprss coordinated these findings. concordant with above findings, no significant correlation relationship was identified between ar and tmprss expression (fig. i, j, k) . we also analyzed normal lung tissues and normal prostate tissues from tcga datasets. 
a significant and positive correlation between the mrna expression of ar and a both-open gene parp was observed in both lung and prostate tissues, in keeping with ar binding in this gene (extended data fig. a, d) . the mrna levels of both tmprss and fkbp , which were characterized with specific ar binding in prostate cells, significantly correlated with ar mrna levels in prostate tissues but not in lung tissues (extended data fig. b, c, e, f) . these findings established a distinct ar binding pattern between the prostate and the lungs, providing clinical evidence that tmprss expression is not responsive to ar inhibition in lungs. collectively, these results revealed a distinct ar binding pattern between human prostate and lung cells. this finding not only offers a mechanistic explanation for the inability of ar to modulate tmprss expression but also suggests that enzalutamide is not a promising drug for blocking sars-cov- -driven entry into host cells. tmprss has been demonstrated with a pivotal role in promoting sars-cov- -driven entry into host cells through facilitating s protein priming via its serine protease activity - , . these previous findings suggest that the modulation of tmprss expression may provide a novel strategy to treat sars-cov- infection by blocking viral entry into host cells. it is well known that tmprss expression is regulated by ar in prostate epithelial cells. enzalutamide, an ar inhibitor approved for use in crpc patients, can reduce tmprss expression in prostate cancer cells. thus, enzalutamide has been proposed as a promising repurposed drug to inhibit sars-cov- infection and subsequent replication, which even provoked the initiation of two clinical trials. here, we further confirmed the indispensable role of tmprss in sars-cov- infection using human ace -transduced tmprss -ko mice (fig. ) . consistently, enzalutamide significantly decreased tmprss expression and inhibited sars-cov- infection in human prostate cancer cells (fig. ) . 
however, we did not observe any antiviral activity of enzalutamide against sars-cov- in the lungs of ad-ace -transduced wt mice or human lung organoids. these results suggested that enzalutamide may have antiviral activity in the prostate in male covid- patients, but also indicated that enzalutamide may have no clinical efficacy in treating covid- patients with lung infection. … in human lung cancer cells. surprisingly, in ar/tmprss double-positive h and h lung cancer cells, neither ar inhibition using enzalutamide nor ar stimulation using dht resulted in a significant change in tmprss expression, implying that ar cannot regulate tmprss expression in human lung cancer cells. these findings seemed inconsistent with those of previous studies indicating that androgen exposure enhanced tmprss expression in another lung cell line, a . this discrepancy might be due to the nm testosterone used in those studies, since treating cells with such a high concentration of testosterone might produce misleading findings that do not reflect the physiological function of ar. in addition, tmprss mrna was hard to detect under physiological conditions in a cells, which exhibit high ar expression (extended data fig. b). given that enzalutamide did not downregulate tmprss expression in these cells, we further demonstrated that enzalutamide failed to inhibit the entry driven by sars-cov- -s, as expected. since lung cancer cell lines harbor many genetic alterations, which might lead to disparities in findings with respect to normal lung cells, we employed early-passage benign human luos for further study. consistent with the findings in lung cancer cells, enzalutamide showed no efficacy in preventing infection by authentic sars-cov- or the sars-cov- -s pseudovirus in benign human lung organoids. moreover, we also employed ad-ace -transduced mouse models and demonstrated that enzalutamide lacked antiviral activity against sars-cov- in vivo.
a previous study demonstrated that androgen-deprivation therapy (adt) significantly reduced the risk of sars-cov- infection in prostate cancer patients . however, a subsequent study demonstrated that the lethality rate of sars-cov- in metastatic prostate cancer patients with adt was not lower than that in other cohorts of infected italian male patients , which did not suggest that adt exhibited antiviral activity against sars-cov- in patients with metastatic prostate cancer. the inconsistent findings from these two studies might be attributed to the different populations selected for investigation. however, to date, no concordant and definitive clinical evidence indicates that adt, including enzalutamide treatment, significantly inhibits sars-cov- infection. utilizing multiple models, including human lung cancer cells, human lung organoids and ad-ace - transduced wt mice, we demonstrated that enzalutamide failed to inhibit sars-cov- infection, which was attributed to the lack of ar-driven modulation of tmprss expression in lung epithelial cells. to elucidate the mechanisms underlying the disparity of ar-driven regulation between prostates and lungs, we further performed ar chip-seq and atac-seq in ar-positive lung cells and ar-positive prostate cells. unlike in prostate cells, the lack of specific ar binding at tmprss locus in lung cells, as demonstrated by ar chip-seq, were consistent with the finding that tmprss expression was independent of ar expression in human lung epithelial cells. these findings indicated mechanisms explaining that the lack of antiviral activity of enzalutamide against sars-cov- is due to a lack of direct ar binding at the tmprss locus in lung epithelial cells. however, our study had limitations. microenvironmental components, including immune cells, nerve cells and stromal cells, are involved in viral infection and subsequent replication - . 
although we employed ad-ace -transduced in vivo mouse models, our models did not consider the human lung microenvironment. since stromal cells in multiple human organs, including the lungs, are also characterized by ar expression, we cannot exclude the possibility that enzalutamide might display antiviral activity by altering the expression of some essential cytokines or chemokines in stromal cells. it is also worth noting that sars-cov- could still infect the lungs of tmprss -ko mice, albeit with lower efficiency. in addition, when transduced with ad-ace , both tmprss -negative prostate cells pc (data not shown) and lung cells h were permissive for robust sars-cov- -driven entry (extended data fig. k). these findings implied that besides tmprss , other factors may also play a crucial role in promoting sars-cov- infection. further studies to identify these factors and their precise functions in mediating sars-cov- infection will be necessary.
finally, we took advantage of multiple models of human prostate and lung cells, patient-derived benign lung organoids and ad-ace -transduced tmprss -ko and wt mice to comprehensively confirm the pivotal function of tmprss in sars-cov- infection. our findings validated that enzalutamide significantly inhibits sars-cov- infection in ar and tmprss double-positive prostate cancer cells, showed that enzalutamide does not exhibit antiviral activity in human lung cancer cells and patient-derived benign lung organoids in vitro or in the lungs of ad-ace -transduced wt mice in vivo, and demonstrated the distinct ar binding pattern between prostate and lung epithelial cells. these findings will enhance our understanding of tmprss in sars-cov- infection and indicate the potential failure of clinical trials using enzalutamide to treat covid- patients.
we thank members of the core facility of microbiology and parasitology (shmc) and the biosafety level
the authors have no competing interests to declare.
transduction and infection of mice.
mice were anesthetized with avertin (sigma-aldrich, t - g) and transduced intranasally with . × ffu of ad -ace adenovirus in μl dmem (gibco c bt). mice were infected intranasally with × pfu of sars-cov- on the fifth day after ad-ace transduction. three days post infection, lungs were harvested for virus titer measurement and pathogenicity analysis using qpcr and immunohistology, respectively.
castration and enzalutamide treatment of mice. enzalutamide (selleck, s ) ( mg/kg; the vehicle contained % carboxymethyl cellulose, . % tween , and % dmso) was administered intragastrically to castrated mice daily for - days as previously described .
… for minutes and centrifuged at , rpm for minutes at °c. the aqueous phase was transferred into a new tube, and an equal volume of isopropanol was added. the mixture was centrifuged at , rpm for minutes at °c. the supernatant was discarded, and the pellet was resuspended in % ethanol and centrifuged at , rpm for minutes at °c. the supernatant was then thoroughly removed and discarded. the pellet was resuspended in μl of nuclease-free water. reverse transcription was then performed with primescripttm rt master mix (takara, rr a) with ng of total rna as input. qrt-pcr was conducted with sybr qpcr mix (qiagen, ) using the manufacturer's protocol. the primer sequences are listed as follows: mouse-actb-f: cattgctgacaggatgcagaagg; mouse-actb-r: tgctggaaggtggacagtgagg.
western blotting. cell lysates were prepared in ripa buffer supplemented with proteinase/phosphatase inhibitors. the protein content was quantified with a bca (thermo) assay. fifteen micrograms of protein were separated via sds-page and transferred onto a . mm pvdf membrane (ge). the membrane was blocked for hour at room temperature in tbst buffer containing % milk and was incubated overnight at °c or for h at room temperature with primary antibodies diluted in tbst buffer containing % milk.
the membrane was then incubated with rabbit hrp-conjugated secondary antibodies (sab, #l ) for h in % milk at rt. the primary antibodies included anti-β-actin (sigma-aldrich, a , : , ), anti-ar (abcam, ab , : , ), anti-tmprss (abcam, ab transduction of cells with ad-ace . cells were seeded in -well plates before transduction. the next day, ad-ace was transduced into cells at a multiplicity of infection (moi) of with polybrene. the culture medium supernatant was replaced with fresh medium hours post transduction. pseudovirus production. sars-cov- pseudovirus was produced by cotransfection of t cells with pnl - .luc.re and pcdna . encoding the sars-cov- s protein using vigofect transfection reagent (vigorous biotechnology, t ). one hour before transfection, the medium was replaced with fresh dmem (gibco, c bt). further transfection was performed according to the manufacturer's protocol. the supernatants were harvested at hours post transfection, filtered through a . μm cell strainer, and split into . ml tubes for storage at - °c. pseudovirus infection assay. cells transduced with or without ad-ace were seeded in -well plates at initial count of between , and , cells per well and treated with different agents ( μm camostat mesylate (selleck, s ) or μm enzalutamide (selleck, s )). two days post seeding and treatment, cells were incubated with pseudovirus for hours. the culture medium supernatant was then replaced with fresh medium. two days post virus infection, the culture medium supernatant was removed, chip-seq library preparation. chip-seq was performed as previously described . in brief, million cells were fixed with % formaldehyde at room temperature for minutes with rotation. then, mm glycine was added to quench the formaldehyde at room temperature for minutes. after preclearing using µl of protein g beads (invitrogen, d), µl of anti-ar antibody was added (abcam, ab ) for immunoprecipitation overnight. 
to bind the anti-ar antibody, μl of protein g beads was added and incubated with rotation for two hours at °c. the beads were washed twice each with low salt wash buffer, high salt wash buffer and licl wash buffer and resuspended in μl of freshly prepared dna elution buffer ( mm nahco and % sds). the chip sample beads were placed on a magnet, and the supernatant was collected into a new tube. the above elution step was repeated with another μl volume of elution buffer. the samples were then digested with μl of proteinase k (invitrogen, ) with incubation at °c for hours. dna was purified with dna clean & concentrator tm - (zymo research, d ). one nanogram of eluted dna was used as input for library construction with a trueprep dna library prep kit v for illumina (vazyme, td ). libraries were sequenced with the illumina novaseq sequencing system (pe × bp reads) at berry genomics. chip-qpcr. eluted dna ( . μl) was used as the template for chip-qpcr with sybr qpcr mix (qiagen, ) following the manufacturer's protocol. the primer sequences used for chip-qpcr are listed as follows: ar_chip_tmprss _are_f: tggtcctggatgataaaaaagttt; ar_chip_tmprss _are_r: gacatacgcccccacaacaga. chip-seq data processing and analysis. raw fastq files were first trimmed to remove adaptors using trimgalore- . . with the following parameter settings: -q --phred --length -e . --stringency . trimmed fastq files were then mapped to hg genome utilizing bowtie . sambamba_v . . was conducted to remove duplicates . for igv visualization, deeptools was then performed using function bamcoverage to generate normalized cpm .bw files . for peak calling, macs was utilized with -q . parameter setting. deeptools was further applied for heatmap visualization with the function of computematrix and plotheatmap. atac-seq library preparation. to reduce the amount of contaminating mitochondrial dna, we performed a previously reported optimized atac-seq protocol . 
in brief, , cells were collected and washed once with pbs. cells were then lysed in μl of ice-cold lysis buffer ( mm tris-hcl, ph . ; mm nacl; mm mgcl ; . % np- ; . % tween ; and . % digitonin) for minutes on ice. immediately after lysis, nuclei were washed with ml of wash buffer ( mm tris-hcl, ph . ; mm nacl; mm mgcl ; and . % tween ) and then centrifuged at g for minutes at °c. to prepare sequencing libraries, a trueprep dna library prep kit v for illumina (vazyme, td ) was utilized for the following steps. atac-seq data processing and analysis. the approach used for atac-seq data processing was similar to that used for chip-seq data processing. however, the peak calling step differed due to the lack of input control files. in brief, after raw reads were trimmed with trimgalore- . . , bowtie was used for mapping the reads to the hg genome . samtools was further utilized for bam file sorting and indexing. the bamcoverage function in deeptools was used to generate .bw files with counts per million (cpm) normalization . the r package diffbind was used to identify overlapping peaks between the ar chip-seq-generated peaks in lncap cells and the atac-seq-generated peaks in each of the four cell lines. then, both-open peaks were defined by overlapping the above generated peaks in all four cell lines. to further identify prostate-specific open peaks, we employed the intersect function in bedtools to exclude from the lncap peaks those that also appeared in any of the three lung cell lines. gsea analysis. we downloaded gene expression matrices of multiple cancer cell lines from cbioportal , (derived from cancer cell line encyclopedia). we next performed gsea to determine whether hallmark androgen response genes differ significantly between ar-positive prostate cancer cells (lncap, vcap and rv ) and ar-positive lung cancer cells (a , h and h ) . in addition, we also compared the sum of z-scores for hallmark androgen response genes between these two groups. 
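the sum-of-z-scores comparison described above can be sketched as follows. this is a minimal illustration, not the authors' pipeline: the gene names, cell line labels and expression values are hypothetical, and the choice to standardize each gene across all cell lines with the population standard deviation is an assumption, since the source does not state how the z-scores were computed.

```python
from statistics import mean, pstdev

def zscore_sums(expr, gene_set):
    """Sum per-gene z-scores over a gene set for every cell line.

    expr maps gene -> {cell_line: expression value}.
    Each gene is standardized across all cell lines (assumed convention).
    """
    cell_lines = sorted({line for g in gene_set for line in expr[g]})
    sums = {line: 0.0 for line in cell_lines}
    for gene in gene_set:
        values = expr[gene]
        mu = mean(values.values())
        sd = pstdev(values.values())
        if sd == 0:
            continue  # a gene with no variation carries no information
        for line, x in values.items():
            sums[line] += (x - mu) / sd
    return sums

# Hypothetical expression values for two genes in four cell lines.
expr = {
    "gene_a": {"prostate_1": 10, "prostate_2": 8, "lung_1": 2, "lung_2": 4},
    "gene_b": {"prostate_1": 6, "prostate_2": 6, "lung_1": 2, "lung_2": 2},
}
sums = zscore_sums(expr, ["gene_a", "gene_b"])
prostate_mean = mean(v for k, v in sums.items() if k.startswith("prostate"))
lung_mean = mean(v for k, v in sums.items() if k.startswith("lung"))
print(sums, prostate_mean, lung_mean)
```

a higher summed z-score for one group indicates that the gene set is, on aggregate, more highly expressed in that group; a formal comparison would still need a statistical test on the per-line sums.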
coronavirus disease (covid- ) weekly epidemiological update pandemic preparedness: developing vaccines and therapeutic antibodies for covid- aerodynamic analysis of sars-cov- in two wuhan hospitals evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss a transmembrane serine protease is linked to the severe acute respiratory syndrome coronavirus receptor and activates virus entry structure of sars coronavirus spike receptor-binding domain complexed with receptor receptor and viral determinants of sars-coronavirus adaptation to human ace tmprss is essential for influenza h n virus pathogenesis in mice tmprss contributes to virus spread and immunopathology in the airways of murine models after coronavirus infection inhibition of sars-cov- entry through the ace /tmprss pathway: a promising approach for uncovering early covid- drug therapies tmprss and covid- : serendipity or opportunity for intervention? 
a sars-cov- protein interaction map reveals targets for drug repurposing rapid repurposing of drugs for covid- emerging mechanisms of resistance to androgen receptor inhibitors in prostate cancer development of a second-generation antiandrogen for treatment of advanced prostate cancer covid- and androgen-targeted therapy for prostate cancer patients phenotypic analysis of mice lacking the tmprss - encoded protease generation of a broadly useful model for covid- pathogenesis, vaccination, and treatment organoid cultures derived from patients with advanced prostate cancer sox promotes lineage plasticity and antiandrogen resistance in tp -and rb -deficient prostate cancer acquired resistance to the second-generation androgen receptor antagonist enzalutamide in castration-resistant prostate cancer tmprss and tmprss promote sars-cov- infection of human small intestinal enterocytes a cellular census of human lungs identifies novel cell states in health and in asthma sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes long-term expanding human airway organoids for disease modeling chromatin accessibility and the regulatory epigenome sars-cov- receptor ace is an interferon-stimulated gene in human airway androgen receptor and androgen- dependent gene expression in lung androgen-deprivation therapies for prostate cancer and risk of infection by sars- cov- : a population-based study (n = ) on the relationship between androgen-deprivation therapy for prostate cancer and risk of infection by sars-cov- author correction: pathological inflammation in patients with covid- : a key role for monocytes and macrophages transplantation of ace (-) mesenchymal stem cells improves the outcome of patients with covid- pneumonia lung innervation in the eye of a cytokine storm: neuroimmune interactions and covid- erg orchestrates chromatin interactions to drive prostate cell fate reprogramming fast gapped-read alignment with bowtie 
sambamba: fast processing of ngs alignment formats deeptools : a next generation web server for deep-sequencing data analysis. nucleic an improved atac-seq protocol reduces background and enables interrogation of frozen tissues bedtools: a flexible suite of utilities for comparing genomic features integrative analysis of complex cancer genomics and clinical profiles using the cbioportal the cbio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles key: cord- - ugdxbmy authors: laskar, rezwanuzzaman; ali, safdar title: mutational analysis and assessment of its impact on proteins of sars-cov- genomes from india date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ugdxbmy the ongoing global pandemic of sars-cov- implies a corresponding accumulation of mutations. herein the mutational status of genomes from india along with their impact on proteins was ascertained. after excluding gaps and ambiguous sequences, a total of variable sites ( parsimony informative and singleton) were observed. the most prevalent reference nucleotide was c ( ) and substituted one was t ( ). nsp had the highest incidence of sites followed by s protein ( sites), nsp b ( sites) and orf a ( sites). the average number of mutations per sample for males and females was . and . respectively suggesting a higher contribution of mutations from females. non-uniform geographical distribution of mutations implied by odisha ( samples, mutations) and tamil nadu ( samples, mutations) suggests that sequences in some regions are mutating faster than others. there were mutations ( ‘neutral’ and ‘disease’) affecting amino acid sequence. nsp has a maximum of ‘disease’ variants followed by s protein and orf a with each. 
further, the proportion of ‘disease’ mutations in genomes from asymptomatic people was a mere %, whereas that in genomes from deceased patients was over threefold higher at %, indicating a contribution of these mutations to the pathophysiology of sars-cov- . the ongoing covid- global pandemic began in wuhan, china and has devastated millions of lives, economies and even nations as a whole. the first reported case was in created a scare as they had a relatively higher mortality rate. however, sars-cov- is by far the most contagious one [ ] [ ] [ ] [ ] . the higher incidence of viral infections would imply a faster evolution process for sars-cov- [ ] . this is because the more the virus replicates, the higher the chances of it accumulating mutations, which may alter the dynamics of its virulence, pathogenesis and interactions with the host. the changes may not necessarily favor the virus; however, the unpredictability demands caution. the sars-cov- genome encodes non-structural proteins in addition to the replicase polyprotein, the spike (s) glycoprotein, envelope (e), membrane (m), nucleocapsid (n) and other accessory proteins [ ] . the impact of mutations in all the regions of the genome needs to be assessed to understand viral evolution. with a definite possibility of india becoming the country most affected by sars-cov- in the near future, and given the demographic burden involved, it is pertinent to analyze the accumulating variations in the genome, accounting for possible changes in proteins and their potential to alter the virus in any manner. on th june, we retrieved fasta sequence congregations from india along with their associated metadata from the gisaid epicov server to construct the phylo-geo-network and analyze the haplogroups along with their geographical distribution across different states of india [ ] . 
herein we extend our study using the same congregation of sequences to analyze the nature and composition of the observed mutations and their impact on proteins of sars-cov- . gisaid epicov is an open access repository of genomic and epidemiologic information about novel coronaviruses from across the world, from which sequences were extracted and aligned as previously reported [ ] . briefly, fasta sequence congregations were retrieved along with their associated metadata from the gisaid epicov (www.epicov.org) server. for mutational profile analysis with clinical correlation, we selected genomes of deceased patients from the existing congregation. however, there were just two genomes from asymptomatic patients in the congregation. so, on . . , we downloaded fasta sequences with patient status from the same server and selected genomes from asymptomatic patients. as data filters for genome extraction, we used hcov- as the virus name, human as the host, india as the location, and complete sequences with high coverage. details of the asymptomatic samples are given in supplementary file . the sequences thus extracted were analyzed with nc_ . from wuhan, china as the reference. mega(v. ) is a multithreaded tool for molecular and evolutionary analysis. the multiple sequence alignment (msa) of the extracted sequences [ ] was initially visualized with this software, and the variable sites were then exported into spreadsheets, with or without missing/ambiguous and gap sites, along with their respective positions [ ] . using this software, we estimated the mcl (maximum composite likelihood) nucleotide substitution pattern and performed tajima's neutrality test to understand the transition/transversion bias and nucleotide diversity [ , ] . piro iglsf is a matlab-based simulation software; we used it to identify the location of each mutated nucleotide position on a specific gene [ ] . the coronavirus typing tool of genome detective (v. . 
) and the covid- genome annotator of coronapp are web tools for the analysis of protein and nucleotide mutations [ , ] . we used these tools for annotation, identification and classification of mutated proteins, followed by verification of the positions against the mutated nucleotide sites in the mega output. the nucleotide similarity percentage was validated by ncbi blast (blast.ncbi.nlm.nih.gov) to investigate the sequence diversity. sift, provean and ws-snps&go are prediction tools that report whether a variant is likely to affect protein phenotype. each tool bases its assessment on a score computed with its own algorithm. a sift score of < . is classified as disease ("affect protein function") and a score of > . as neutral ("tolerated"). a provean score of < − . is classified as disease ("deleterious") and a score of > − . as neutral. for the phd-snp method of ws-snps&go, a disease probability of > . is classified as disease and a probability of < . as neutral [ ] [ ] [ ] . composition and distribution of variable sites (table ) and its negative value indicated the significance of these variable sites. however, excluding the gaps and ambiguous sequences reduced this coverage to . %, encompassing variable sites, which we have used for the subsequent analyses reported in this study. this included parsimony informative (pi) sites and singleton sites (snp: single nucleotide polymorphism). the pi sites are those whose incidence was observed in multiple samples, whereas singleton sites were observed in a single sample only. the distribution of these sites according to various substitutions, protein localizations and impact therein has been summarized in figure , supplementary file . as evident therein, c→t ( sites) forms the most prevalent mutation in both pi and singleton sites and g→t ( sites) comes a distant second. the common aspect of the two most prevalent mutations is "t" being the substituted nucleotide. 
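the classification of variable sites and the tally of substitution types described above can be sketched as follows. this is a minimal illustration on a toy alignment, not a reimplementation of mega: it follows the working description given in the text (a site seen in multiple samples is pi, a site seen in one sample is a singleton), which is looser than the formal definition of parsimony-informative sites. the sequences and the function name are hypothetical.

```python
from collections import Counter

def classify_variable_sites(reference, samples):
    """Map 1-based position -> (kind, Counter of 'ref>alt' substitutions).

    kind is 'pi' when two or more samples differ from the reference at the
    site and 'singleton' when exactly one does. Gaps and ambiguous bases
    (anything other than a, c, g, t) are skipped.
    """
    valid = set("acgt")
    sites = {}
    for index, ref in enumerate(reference):
        if ref not in valid:
            continue
        subs = Counter()
        for seq in samples:
            alt = seq[index]
            if alt in valid and alt != ref:
                subs[f"{ref}>{alt}"] += 1
        if subs:
            kind = "pi" if sum(subs.values()) >= 2 else "singleton"
            sites[index + 1] = (kind, subs)
    return sites

# Toy aligned sequences of equal length (hypothetical).
reference = "acgtc"
samples = ["atgtc", "atgtc", "acgtt", "acgnc"]
sites = classify_variable_sites(reference, samples)

# Overall tally of substitution types across all variable sites.
totals = Counter()
for _kind, subs in sites.values():
    totals.update(subs)
print(sites, totals)
```

on real data the same tally would show which reference and substituted nucleotides dominate, which is the comparison the text makes for c→t and g→t.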
further, there were two multi-variable (mv) sites each in the pi and singleton categories, wherein two separate mutations were observed at the same site in different samples. the details of the observed mv sites have been summarized in table . the non-uniform distribution of the variable sites across proteins of sars-cov- reflects the differential contributions of proteins to evolution. as per our data, nsp had the maximum of variable sites followed by s protein ( sites), nsp b ( sites) and orf a ( sites) ( figure ; supplementary file ). these four proteins account for over half of the total variable sites of the genome and may be considered as drivers of genomic evolution for sars-cov- . the mutations of the s protein have been the focus of multiple research groups owing to their plausible impact on viral entry into the host cell, but mutations elsewhere may be equally relevant as the viral genome is known to harbor only what is essential [ ] [ ] [ ] . we believe a holistic approach is required to understand the evolution because, more often than not, the selection advantage offered by any mutation is a chance event and can arise in any part of the genome. in terms of the impact of these variable sites on the amino acid sequence of the viral proteins, we classified them into four categories. first, the sites located in the extragenic region, which have no influence on the encoded proteins. there were such variable sites localized to the utr regions ( in 'utr and in 'utr). secondly, snp-silent included those variable sites wherein the nucleotide change left the amino acid sequence unaltered. a total of such sites were distributed across the genome. thirdly, the variable sites which led to the introduction of a stop codon were referred to as snp-stop and there were such sites in our study. lastly, the variable sites which affected the protein sequence are referred to as snp in the study and there were such sites (supplementary file ). 
the prevalence and distribution of these sites has been summarized in figure and results of the prediction of their impact on protein has been discussed later. in order to understand the underlying dynamics of substitutions, we performed the maximum composite likelihood estimate of nucleotide substitution as shown in table we thereon looked at these variations in combination with their prevalence across samples. the most prevalent nucleotide at the variable sites in reference sequence was c ( ) followed by g ( ) whereas t was by far the predominantly substituted nucleotide ( , %). also, the other three nucleotides had an almost equal representation in substitutions (a- , g- , t- ). this biased prevalence was not restricted to the alignment but was also getting translated to population incidence. there was a total of mutations with c as reference nucleotide and mutations with t as substituted nucleotide across studied genomes. the composition of variable sites, their substitutions and prevalence across samples has been summarized in figure and supplementary file . evidently, any particular mutation may be incident across multiple samples and a single sample can harbor multiple mutations. a cumulative number for the same has been referred to as "sum of mutation incidence" herein and thereafter in this study. we subsequently analyzed the patient's dataset with reference to age and gender for the incidence of mutations. however, since patients' data wasn't cumulatively available, the data for this aspect isn't exhaustive but representative for samples ( females and males). the patients whose genomes were used in the study and age was known were classified into seven categories from infancy to over years. the maximum number of patients for both males and females belonged to mature adulthood category of to years with and samples respectively (figure , supplementary file ) . 
this is consistent with the fact that the older population is at a greater risk of infection owing to a possibly weaker immune system and other physiological conditions. the simple question of whether or not age and gender are associated with the accumulation of genome variations has a not so simple answer. the overall average number of mutations per sample was . and the corresponding values for males and females separately were . and . respectively. thus, samples from females contributed more to the mutational accumulation than those from males. the individual mutational load for different age groups in males and females has been represented in figure . evidently, females contribute more to the mutational load except in three age groups: - years, - years and - years. the highest difference on the basis of gender is for - years ( . ), but since there was just one female sample in that age group, it cannot be emphasized much in isolation, although the overall pattern does seem relevant. this is more so because, in terms of incidence, males are almost . times as numerous as females, but in terms of variations, fewer females are contributing more to the mutational load. possibly, the virus is behaving differently depending on gender. the mutational distribution across different states of india was subsequently ascertained. generally speaking, the more the virus replicates, the more variations it should accumulate. the fact that the samples used in the study are not uniformly distributed across states provides an intriguing template for analysis. the number of samples and the mutations therein for different states has been summarized in figure . however, we can say with confidence that some sequences are mutating more than others, but whether geographical location plays a role remains to be ascertained. a total of snps were present that altered the amino acid sequence. their details and positions have been summarized in table and supplementary file . 
we also ascertained the prevalence of these variants across samples. the most frequent variant, q h localized to orf a, was present in samples, followed by a d in nsp , present in samples. amongst the silent snps, y y in m protein was present in samples followed by d d in s protein with incidences. the overall data for variants present in genomes or more has been summarized in figure a . conversely, we also assessed the accumulation of variations in a given genome as summarized in figure b . interestingly, one sample (genome id ) had the highest incidence of mutations while samples harbored just a single mutation. there were samples with no mutations and with more than one mutation. to account for these, the sum of mutation incidence has been used in this study as explained above. the impact of mutations on proteins was predicted with three different tools (sift, provean and ws-snps&go), which classified the mutations as "neutral/tolerated" or "disease/affect protein function/deleterious". for the sake of simplicity, we refer to the results from all tools as neutral and disease. although the prediction outcomes of the three tools did not agree for all sites, their classifications follow similar lines, so the results can be represented as binary calls falling into four categories. the first two categories cover sites for which the three tools give the same prediction, either all "neutral" or all "disease". the other two categories cover sites for which the predictions differ: "disease by one, neutral by two" and "disease by two and neutral by one". for comparison between variants, any mutation predicted as disease by two or three tools is considered disease and any mutation predicted as neutral by two or three tools is considered neutral. the distribution of disease and neutral variants across the different genes of sars-cov- has been shown in table and supplementary file . these could be analyzed in three aspects. 
first, in terms of overall incidence. the maximum variants affecting protein sequence were present in nsp ( ) followed by spike (s) protein with variants. secondly, if we focus only on variants with predicted outcome as "disease" then nsp has a maximum of such variants followed by s protein and orf a with variants each. thirdly, we looked at those proteins which had more disease variants as compared to neutral. there were five such proteins namely: nsp , nsp , nsp , orf a, e, orf a. of these nsp had just two variants and both of them were predicted as disease by all three tools. others had differential bias towards disease variants. thus, we can say that though some regions of the genome have more variations but mostly neutral while others with fewer variations are more impactful in terms of their predicted impact due to more disease variants. conversely, mutations in some proteins can be relatively better tolerated by the viral genome. the overall protein prediction outcomes of the genomes have been summarized in figure . there were total of mutations ( %) and mutations ( %) which are predicted to be neutral and disease respectively by at least two tools. these predictions suggest that even though mutations are accumulating in sars-cov- , they are predominantly neutral. this is the possible reason that no major virulence or physiological deviations have been observed so far. in order to further assess impact of these variations we compared their prevalence across samples which were asymptomatic with those wherein the patient died. the idea was that if predictions are true, then asymptomatic samples should have more of neutral mutations whereas deceased ones should have more of disease mutations. the present congregation of samples in the study had just asymptomatic samples and deceased. thereon, we included mutations with those of deceased samples. their comparative data has been shown in table . 
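the two-of-three consensus rule used above to label a variant as neutral or disease, and the resulting disease fraction for a set of samples, can be sketched as follows. this is a minimal illustration under stated assumptions: the per-tool calls are taken as already binarized labels, and the variant names and calls are hypothetical rather than taken from the study's tables.

```python
def consensus_call(calls):
    """Return the label given by at least two of three tools.

    calls is a sequence of three labels, each 'disease' or 'neutral',
    e.g. the binarized outputs of sift, provean and ws-snps&go.
    """
    calls = list(calls)
    if len(calls) != 3 or any(c not in ("disease", "neutral") for c in calls):
        raise ValueError("expected three 'disease'/'neutral' labels")
    return "disease" if calls.count("disease") >= 2 else "neutral"

def disease_fraction(variants):
    """Fraction of variants whose consensus call is 'disease'."""
    labels = [consensus_call(calls) for calls in variants.values()]
    return labels.count("disease") / len(labels)

# Hypothetical variants with (tool_1, tool_2, tool_3) calls.
variants = {
    "variant_1": ("disease", "disease", "disease"),
    "variant_2": ("disease", "neutral", "disease"),
    "variant_3": ("neutral", "neutral", "disease"),
    "variant_4": ("neutral", "neutral", "neutral"),
}
print({name: consensus_call(c) for name, c in variants.items()})
print(disease_fraction(variants))
```

computing this fraction separately for the asymptomatic and deceased groups gives the kind of comparison the text describes; with three tools a tie is impossible, so every variant receives exactly one label.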
the p value therein represents the probability that a given variant chosen at random is neutral or disease. taking agreement between at least two tools as the threshold, the data give interesting insights. as shown and previously mentioned, for the original congregation of samples, % of mutations were neutral (p value . ) and % were disease. the mutational accumulation in sars-cov- genomes is a multifactorial event, with some areas of the genome more prone to mutations, selective mutations being more prevalent, nonlinear assimilation of mutations across various states and differential correlation between mutational impact on proteins and physiological state. an age- and gender-specific bias in the incidence of mutations was observed. the asymptomatic samples had a higher occurrence of neutral variants while deceased samples had a relatively higher incidence of disease variants. cross-linking mutational dynamics with patient history will provide better correlation and understanding of the variations in sars-cov- genomes. 
; d= ) a t, t i, y c, e d, p s, g v, s i, p l, m i, h y, a t, s i, t k, t i, m i, v a, s g, t i, g s, t i, t a, t i, n t, k r, s f, k e, g e, p s, l f, a v, v i, t i, t i, l v, p l, h y, a v, s f, ; d= ) l f, n y, e d, a s, s f, g s, q r, t i, t i, e q, a s, t i, e d, t i, v i, q h, a s, t s, g v, t i, a s, i k, s p, d y, a v h q, p l t i, f c, l p, t k, v f, d y, c f g c a t a v s f i n c f a s, t i, h y, v l, g v, k r, k n, g v, q k orf a (n= ; d= ) v l, g v, s a, v f, t i l f, s f, s l, t i i t, l f, t i i t, l f, l f, q h severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia a major outbreak of severe acute respiratory syndrome in hong kong a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster epitope-based chimeric peptide vaccine design against s, m and e proteins of sars-cov- etiologic agent of global pandemic covid- : an in silico approach identification of a novel coronavirus causing severe pneumonia in human: a descriptive study phylo-geo-network and haplogroup analysis of novel coronavirus (ncov- ) genomes from india molecular evolutionary genetics analysis across computing platforms statistical method for testing the neutral mutation hypothesis by dna polymorphism prospects for inferring very large phylogenies by using the neighbor-joining method microsatellite diversity, complexity, and host range of mycobacteriophage genomes of the siphoviridae family identification from high-throughput sequencing data coronapp : a web application to annotate and monitor sars-cov- mutations sift web server: predicting effects of amino acid substitutions on proteins predicting the functional effect of amino acid substitutions and indels : predicting stability changes upon mutation from the protein sequence or structure missense mutations in sars-cov genomes from indian patients geographic and genomic distribution of sars-cov- 
mutations mechanisms of viral mutation the authors thank the department of biological sciences, aliah university, kolkata, india for all the financial and infrastructural support provided. authors acknowledge all the authors associated with originating and submitting laboratories of the sequences from gisaid's epiflu™ (www.gisaid.org) database on which this research is based. the authors declare they have no competing interests. not applicable. all data pertaining to the study has been provided as supplementary material of the manuscript. rl: methodology, investigation, formal analysis and validation sa: conceptualization, supervision and writing. key: cord- -ebgusf o authors: cao, yipeng; yang, rui; wang, wei; lee, imshik; zhang, ruiping; zhang, wenwen; sun, jiana; xu, bo; meng, xiangfei title: computational study of ions and water permeation and transportation mechanisms of the sars-cov- pentameric e protein channel date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ebgusf o coronavirus disease (covid- ), caused by a novel coronavirus (sars-cov- ), is a potentially fatal disease and a public health emergency of international concern. coronaviruses, including sars-cov- , encode an envelope (e) protein, which is a small, hydrophobic membrane protein; the e protein of sars-cov- has high homology with that of severe acute respiratory syndrome coronavirus (sars-cov). in this study, we provide insights into the function of the sars-cov- e protein channel and the ion and water permeation mechanisms on the basis of combined in silico methods. our results suggest that the pentameric e protein promotes the penetration of monovalent ions through the channel. analysis of the potential of mean force (pmf), pore radius and diffusion coefficient reveals that leu and phe are the hydrophobic gates of the channel. 
in addition, the pore demonstrated a clear wetting/dewetting transition with monovalent cation selectivity under transmembrane voltage, which indicates that it is a hydrophobic voltage-dependent channel. overall, these results provide the structural insights and molecular dynamics information needed to understand the regulatory mechanisms of ion permeability in the pentameric sars-cov- e protein channel. covid- is a severe and highly contagious respiratory illness that was first reported in china in early december . subsequently, the virus has spread worldwide. as of may , , millions of cases have been confirmed, and hundreds of thousands have died. the world health organization (who) declared covid- a global pandemic in march . in addition to the hazards of the disease itself, it has also led to severe turbulence in international financial markets and may cause serious consequences such as a financial crisis. covid- is a disease caused by a new coronavirus named sars-cov- . it is speculated that it originated from bats and was transmitted to humans through an intermediate host (some kind of wildlife). its symptoms include fever, general malaise, dry cough, shortness of breath, and respiratory distress. among similar diseases that are also caused by coronaviruses, severe acute respiratory syndrome (sars) and middle east respiratory syndrome (mers) had mortality rates of % and %, respectively . currently, covid- has a mortality rate of to % in different countries but appears to be more contagious than sars and mers . similar to other coronaviruses, sars-cov- is a positive-sense, long single-stranded ( kb) rna virus. the structure of different human coronaviruses (hcovs) is similar. the viral genome is packaged by nucleocapsid (n) proteins, forming a helical nucleocapsid protected by a lipid envelope . 
several viral proteins, including the spike (s), envelope (e), and membrane (m) proteins, are embedded within a lipid envelope . studies have shown that s, m, and n proteins play important roles in receptor binding and virion budding. for example, the m protein participates in virus germination and interacts with the n and s proteins. the s protein has immune recognition sites and can be used to design vaccines . currently, the importance of the e protein has not been fully revealed. evidence suggests that the e protein maintains its morphology after virus assembly by interacting with the m protein . when an e protein gene mutation occurs, it promotes apoptosis. recent studies have shown that coronaviruses have a viroporin that can self-assemble into a pentameric structure and have ion selectivity. when there is a transmembrane voltage, the ion channel characteristics of viroporins are more significant. in addition, the asn ala and val phe mutations could destroy ion channel activity . this indicates that the e protein may play an important role in regulating the ion equilibrium inside and outside the viral envelope. the ion channel activity of the e protein can lead to increased levels of the inflammatory cytokines il- β, tnf and il- in the lungs, leading to the occurrence of an "inflammatory storm" . this plays a key role in the progression of the disease and may cause the patient's condition to suddenly deteriorate and lead to death. in addition, its evolutionary conservatism may be an important cause of viral cross-host infection . although previous studies have suggested that the e protein of coronaviruses such as sars and mers oligomerizes and has ion permeability, the specific mechanism of ion permeability and channel properties remain to be explored due to the lack of a crystal structure for the e protein. in , li et al identified the sars-cov e protein monomer structure. in , surya et al. 
determined the pentamer structure of the sars e protein by nuclear magnetic resonance (nmr), which provided strong support for the study of the sars-cov- e protein pentamer ion permeability mechanism. in this study, we obtained the amino acid sequence of the sars-cov- e protein from the national center for biotechnology information (ncbi) database. the e protein pentamer model of sars-cov- was built using homology modeling, and the quality of the model was evaluated. subsequently, µs-level molecular dynamics (md) simulations were performed to evaluate the pentamer's stability in the membrane environment. we used the potential of mean force (pmf) to characterize the permeability of different physiological ions and water molecules in the pore of the e protein pentamer. the characteristics of the pentameric channel were analyzed by combining the channel diffusion coefficient and geometric properties. in addition, computational electrophysiology was applied at different transmembrane voltages to reveal the effect of voltage on the ion permeability of the pentameric e protein. overall, exploring the mechanisms of the pentameric sars-cov- e protein not only provides valuable insights into the conduction of the channel but also has important implications for our understanding of the differences between sars-cov- and other coronaviruses.
sequence alignment and homology modeling
figure a shows the sequence alignment between the e proteins of two human coronaviruses (sars-cov and sars-cov- ), made with the clustal x software. the blue dotted rectangle in figure b
the rampage online program was used to evaluate the accuracy of the sars-cov- pentameric e protein model. more than . % of the amino acids are within the acceptable range, suggesting that the sars-cov- e protein model is similar to the sars e protein nmr model.
subsequently, a ns ( µs) md simulation was performed to evaluate the stability of pentameric e protein embedded in the membrane environment. the root-mean-square deviation (rmsd) of the model is shown in figure , and the red and blue curves represent the whole protein and tm region, respectively. in the first ns, the rmsd continued to rise, indicating that the model needed longer optimization (compared with other ion channels) to reach the pentameric structure equilibrium. during the last ns, the curves plateaued. the rmsd of the whole protein and the tm region converged at ~ . and ~ . nm, respectively. there is a . nm difference between the whole protein and tm region. these findings are consistent with other ion channel data that show that the tm region has a higher stability than the other parts of the membrane protein . the permeability of the sars-cov- pentameric e protein channel is very important for understanding the replication ability of viruses in cells and how they are secreted into the extracellular medium . it is possible to examine a profile of the free energy of a single ion or water as a function of its position along the pore axis by calculation of pmf. in this simulation, the ions or water molecules are restrained to a continuous position along the z-axis and move freely in the pore's xy plane. moreover, other parts of the system (proteins, ions, water molecules, and lipids) can move freely and reach equilibrium. the pmf of important physiological ions (mg + ; ca + ; cl-; k + and na + ) and the water molecules as a function of their position along the pore (z)-axis were calculated separately. figure a shows the pmf of ions and water molecules permeating through the sars-cov- e protein pentamer pore. z m is defined as the axial direction along the pore, which is from - ~ nm (the length of umbrella sampling is nm in total). from the pmf curve, we found that the energy barriers of monovalent and divalent ions have a significant difference. 
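to give a sense of why a difference in barrier height matters for selectivity, a common back-of-the-envelope estimate compares boltzmann factors exp(-ΔG/RT) for two barriers. the sketch below is illustrative only: the barrier values and the temperature are hypothetical placeholders rather than the values obtained in this study, the function name is invented for the example, and a boltzmann-factor ratio is a rough equilibrium argument, not a permeation rate calculation.

```python
import math

# Gas constant in kJ/(mol*K), so barriers can be given in kJ/mol.
R_KJ_PER_MOL_K = 8.314462618e-3


def relative_boltzmann_weight(barrier_a, barrier_b, temperature=300.0):
    """Return exp(-(barrier_a - barrier_b) / RT) for barriers in kJ/mol.

    A value much smaller than 1 means that crossing barrier_a is far less
    probable than crossing barrier_b at the given temperature.
    """
    rt = R_KJ_PER_MOL_K * temperature
    return math.exp(-(barrier_a - barrier_b) / rt)


# Hypothetical barrier heights (kJ/mol), chosen only for illustration.
divalent_barrier = 40.0
monovalent_barrier = 10.0

ratio = relative_boltzmann_weight(divalent_barrier, monovalent_barrier)
print(f"relative weight (divalent vs. monovalent): {ratio:.2e}")
```

because the weight depends exponentially on the barrier difference, even a moderate gap of a few tens of kj/mol corresponds to several orders of magnitude in relative probability.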
the maximum energy barriers of the two divalent ions ca + and mg + are kj/mol and kj/mol, respectively. in contrast, the pmf barriers of the monovalent ions na + and k + are ~ kj/mol and ~ kj/mol, respectively, while the cl - energy barrier is ~ kj/mol, which is significantly smaller than that of ca + and mg + . from an energy perspective, the sars-cov- pentameric e protein is almost impermeable to divalent ions. this is consistent with the previous hypothesis that the sars-cov and mers-cov e protein channel is a monovalent cation channel. the maximum energy barrier in descending order is as follows: na + . v. computational electrophysiology also showed that the transmembrane voltage led to an easier wetting transition for the channel. the range for keeping the channel open should be between . v and . v. na + and k + could be cotransported during water permeation, which is very similar to the behavior of many hydrophobic channels. intriguingly, we did not observe cl - conduction, which may be because chloride ions could not overcome the energy barrier under the maximum transmembrane voltage. in addition, the change in the pore radius at different voltages highlights two important sites where the geometric radius changes the most, corresponding to leu and phe . phe presents a benzene ring at the hydrophobic surface of the inner pore, acting as a hydrophobic gate. the isopropyl group of leu at the bottom of the pore prevents reverse penetration (fig a). previous studies indicated that the asn ala and val phe mutations make the channel dysfunctional. this may be due to the additional side chain groups that these mutations introduce into the pore (especially the benzene ring in val phe), which both decrease the pore radius and increase the hydrophobicity of the pore. in short, these mutations may increase the energy barrier, negatively affecting the wetting transition and causing the channel to be functionally closed.
therefore, the sars-cov- e protein pentamer is a voltage-dependent hydrophobic channel. we propose that the e protein may play an essential role in the virus infection and replication processes through the following mechanisms: 1) the monovalent selective permeation of the e protein pentamer ion channel may change the intracellular ph, providing a suitable microenvironment for virus replication. 2) selective permeation can form a transmembrane voltage, creating feedback regulation and maintaining an intracellular microenvironment that is suitable for viral growth. 3) the disruption of the intracellular ion equilibrium affects the charges coming from the cell through ion channels in the cell membrane, changing the ph and making it easier for the virus to fuse with the cell membrane. this is the infection mechanism of the avian coronavirus as well as some types of influenza viruses. 4) it has a certain signal transduction function. although the pentameric sars-cov- e protein crystal structure is not yet available,
preparation of the sars-cov- pentameric e protein-membrane simulation system
the amino acid sequences of the human coronavirus e proteins were downloaded from the national center for biotechnology information (ncbi
to obtain the equilibrated pentameric sars-cov- e protein model, the charmm all-atom force field was chosen for md simulation. the md time step was set at fs. electrostatic interactions were described using the particle mesh ewald (pme) algorithm with a cut-off of . nm. the lincs algorithm was used to constrain the bond lengths. the pressure was maintained semi-isotropically at bar in both the x and y directions using the parrinello-rahman barostat algorithm, and the system temperature was maintained at k by the nose-hoover thermostat. then, a ns ( µs) md simulation was performed for the e protein pentamer.
umbrella sampling
the initial system for umbrella sampling simulations was derived from the equilibrated pentameric sars-cov- e protein model mentioned above. a single physiologically relevant ion (na + , k + , ca + , mg + , cl -) or a water molecule was placed at successive positions along the central pore axis using the gromacs pull code. energy minimization was performed before each simulation to optimize the positions of the water and ions. the reaction coordinate was defined from z = + to - nm, with the center of mass at z = nm and a spacing of . nm between successive windows, resulting in umbrella sampling simulation systems. the probe ion or water molecule was harmonically restrained along the pore direction with a force constant of kj mol^-1 nm^-2. each window was simulated for a total of ns: the initial ns was used for system equilibration, and the subsequent ns was used for analysis. the pmfs were computed by the weighted histogram analysis method (wham), and the profile was generated with the gromacs tool 'g_wham'. bootstrap analysis (n = ) was used to estimate the statistical error, and the '-cycl' parameter was used to make the profile values at the two ends of the reaction coordinate equal. the diffusion coefficient was calculated using the method described by shirvanyants et al. we restrained the e-protein pentamer helix backbones and left the side chains, membrane, water and ions free to move. because ion selectivity is related to the side chains of the inner pore, the backbone restraint maintained the e-protein pentamer tertiary structure without affecting side-chain flexibility in the inner pore. with k + as the probe ion, the simulation systems were set up in the same manner as for umbrella sampling. the ion mean-square displacement (msd) was calculated along the pore's z-axis. a total of simulation systems were obtained, with the spacing between successive k + windows set to . nm. the umbrella restraint was used to maintain the k + ion's position on the x-y plane of the pore.
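the step from a mean-square displacement along z to a diffusion coefficient can be sketched with the one-dimensional einstein relation, msd(t) = 2dt, by fitting a straight line to msd versus time and halving the slope. the example below is a minimal sketch under stated assumptions: the function name, the units (nm^2 and ps) and the synthetic data are hypothetical, and it is not a reimplementation of the gromacs analysis tool used in the study.

```python
def diffusion_coefficient_1d(times, msd):
    """Estimate D from MSD(t) = 2*D*t + offset by ordinary least squares.

    times : sequence of lag times (e.g. ps)
    msd   : sequence of mean-square displacements along one axis (e.g. nm^2)
    Returns D in msd-units per time-unit (e.g. nm^2/ps).
    """
    n = len(times)
    if n < 2 or n != len(msd):
        raise ValueError("need two or more points and equal-length inputs")
    mean_t = sum(times) / n
    mean_m = sum(msd) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("lag times must not all be identical")
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(times, msd))
    slope = sxy / sxx
    # One-dimensional Einstein relation: slope = 2 * D.
    return slope / 2.0


# Synthetic, noise-free example with a hypothetical D of 0.5 nm^2/ps.
example_times = [0.0, 1.0, 2.0, 3.0, 4.0]
example_msd = [2.0 * 0.5 * t + 0.1 for t in example_times]
estimated_d = diffusion_coefficient_1d(example_times, example_msd)
print(estimated_d)
```

fitting with an intercept, as done here, keeps a constant short-time offset in the msd from biasing the slope; in practice only the linear regime of the curve should be passed to the fit.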
the einstein equation msd = 2d(z)t was used to calculate the diffusion coefficient with the gromacs tool g_msd. the umbrella restraint can be disregarded for these analyses because the restraint force was negligible compared to thermally induced rms fluctuations.
computational electrophysiology (ce) is a good tool for simulating the ion conduction ability of a channel under transmembrane voltage. we established a sandwich structure including three parts (membrane-protein-water) as described by kutzner et al. as shown in fig a, each water layer contained a different number of ions. because of the imbalance in the ion distribution between the water layers, the ion gradient produces a transmembrane voltage. a specific transmembrane voltage can be applied to the simulation system by adjusting the number of ions in each water layer. in this study, we built the sars-cov- e protein sandwich system. the initial single-layer system was equilibrated for ns to stabilize the structure. then the single-layer system was duplicated along the z direction. the resulting system had a size of around × × nm
references
sars and mers: recent insights into emerging coronaviruses
real estimates of mortality following covid- infection
case-fatality rate and characteristics of patients dying in relation to covid- in italy
coronavirus: covid- has killed more people than sars and mers combined, despite lower case fatality rate
clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study
the sars coronavirus nucleocapsid protein-forms and functions
characterization of a novel coronavirus associated with severe acute respiratory syndrome
isolation of a novel coronavirus from a man with pneumonia in saudi arabia
covid- spike-host cell receptor grp binding site prediction
infectious bronchitis virus e protein is targeted to the golgi complex and directs release of virus-like particles
nucleocapsid-independent assembly of coronavirus-like particles by co-expression of viral envelope protein genes
efficient assembly and release of sars coronavirus-like particles by a heterologous expression system
conductance and amantadine binding of a pore formed by a lysine-flanked transmembrane domain of sars coronavirus envelope protein
covid- : consider cytokine storm syndromes and immunosuppression
a decade after sars: strategies for controlling emerging coronaviruses
severe acute respiratory syndrome coronavirus envelope protein ion channel activity promotes virus fitness and pathogenesis
structure of a conserved golgi complex-targeting signal in coronavirus envelope proteins
structural model of the sars coronavirus e channel in lmpg micelles
a new coronavirus associated with human respiratory disease in china
structure validation by cα geometry: ϕ, ψ and cβ deviation
molecular dynamics simulations of membrane channels and transporters
a single polar residue and distinct membrane topologies impact the function of the infectious bronchitis coronavirus e protein
sars coronavirus e protein forms cation-selective ion channels
mers coronavirus envelope protein has a single transmembrane domain that forms pentameric ion channels
ion channels of excitable membranes
theoretical and computational models of biological ion channels
functional annotation of ion channel structures by molecular simulation
permeation process of small molecules across lipid membranes studied by molecular dynamics simulations
molecular transport through membranes: accurate permeability coefficients from multidimensional potentials of mean force and local diffusion constants
effect of field direction on electrowetting in a nanopore
voltage gating of a biomimetic nanopore: electrowetting of a hydrophobic barrier
viroporins: structure and biological functions
ion channel voltage sensors: structure, function, and pathophysiology
designing a hydrophobic barrier within biomimetic nanopores
stim activates crac channels through rotation of the pore helix to open a hydrophobic gate
-relaxation studies on cell membranes and lipid bilayers in the high electric field range
caver . : a tool for the analysis of transport pathways in dynamic protein structures
covid- infection: origin, transmission, and characteristics of human coronaviruses
repurposing therapeutics for covid- : supercomputer-based docking to the sars-cov- viral spike protein and viral spike protein-human ace interface
identification of the mechanisms causing reversion to virulence in an attenuated sars-cov for the design of a genetically stable vaccine
biophysical characterization of vpu from hiv- suggests a channel-pore dualism
identification of an ion channel activity of the vpu transmembrane domain and its involvement in the regulation of virus release from hiv- -infected cells
evidence for the formation of a heptameric ion channel complex by the hepatitis c virus p protein in vitro
hydrophobic gating in ion channels
electric-field-induced wetting and dewetting in single hydrophobic nanopores
voltage-gated hydrophobic nanopores
a role for hydrophobic residues in the voltage-dependent gating of shaker k+ channels
the avian coronavirus infectious bronchitis virus undergoes direct low-ph-dependent fusion activation during entry into host cells
variations in ph sensitivity, acid stability, and fusogenicity of three influenza virus h subtypes
swiss-model: modelling protein tertiary and quaternary structure using evolutionary information
charmm-gui: a web-based graphical user interface for charmm
charmm general force field (cgenff): a force field for drug-like molecules compatible with the charmm all-atom additive biological force fields
particle mesh ewald: an n⋅log(n) method for ewald sums in large systems
lincs: a linear constraint solver for molecular simulations
polymorphic transitions in single crystals: a new molecular dynamics method
the nose-hoover thermostat
the weighted histogram analysis method for free-energy calculations on biomolecules. i. the method
a free weighted histogram analysis implementation including robust error and autocorrelation estimates
pore dynamics and conductance of ryr transmembrane domain
principles of selective ion transport in channels and pumps
computational electrophysiology: the molecular dynamics of ion channel permeation and selectivity in atomistic detail
vmd: visual molecular dynamics
gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers
this study was supported by the national natural science foundation of china
all other authors declare no competing interests.
key: cord- -fglyfz p authors: minervina, anastasia a.; komech, ekaterina a.; titov, aleksei; koraichi, meriem bensouda; rosati, elisa; mamedov, ilgar z.; franke, andre; efimov, grigory a.; chudakov, dmitriy m.; mora, thierry; walczak, aleksandra m.; lebedev, yuri b.; pogorelyy, mikhail v. title: longitudinal high-throughput tcr repertoire profiling reveals the dynamics of t cell memory formation after mild covid- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fglyfz p
covid- is a global pandemic caused by the sars-cov- coronavirus. t cells play a key role in the adaptive antiviral immune response by killing infected cells and facilitating the selection of virus-specific antibodies. however, neither the dynamics and cross-reactivity of the sars-cov- -specific t cell response nor the diversity of the resulting immune memory are well understood. in this study we use longitudinal high-throughput t cell receptor (tcr) sequencing to track changes in the t cell repertoire following two mild cases of covid- .
in both donors we identified cd + and cd + t cell clones with transient clonal expansion after infection. the antigen specificity of cd + tcr sequences to sars-cov- epitopes was confirmed by both mhc tetramer binding and presence in a large database of sars-cov- epitope-specific tcrs. we describe characteristic motifs in tcr sequences of covid- -reactive clones and show preferential occurrence of these motifs in a large publicly available dataset of repertoires from covid- patients. we show that in both donors the majority of infection-reactive clonotypes acquire memory phenotypes. certain t cell clones were detected in the memory fraction at the pre-infection timepoint, suggesting participation of pre-existing cross-reactive memory t cells in the immune response to sars-cov- .
covid- is a global pandemic caused by the novel sars-cov- betacoronavirus [ ]. t cells are crucial for clearing respiratory viral infections and providing long-term immune memory [ , ]. two major subsets of t cells participate in the immune response to viral infection in different ways: activated cd + t cells directly kill infected cells, while subpopulations of cd + t cells produce signaling molecules that regulate myeloid cell behaviour, drive and support the cd response and the formation of long-term cd memory, and participate in the selection and affinity maturation of antigen-specific b cells, ultimately leading to the generation of neutralizing antibodies. in sars- survivors, antigen-specific memory t cells were detected up to years after the initial infection, when viral-specific antibodies were undetectable [ , ]. the t cell response was shown to be critical for protection in sars- -infected mice [ ]. patients with x-linked agammaglobulinemia, a genetic disorder associated with a lack of b cells, have been reported to recover from symptomatic covid- [ , ], suggesting that in some cases t cells are sufficient for viral clearance. thevarajan et al.
showed that activated cd +hla-dr+cd + t cells in a mild case of covid- significantly expand following symptom onset, reaching their peak frequency of % of cd + t cells on day after symptom onset, and contract thereafter [ ] . given the average time of days from infection to the onset of symptoms [ ] , the dynamics and magnitude of t cell response to sars-cov- is similar to that observed after immunization with live vaccines [ ] . sars-cov- specific t cells were detected in covid- survivors by activation following stimulation with sars-cov- proteins [ ] , or by viral protein-derived peptide pools [ ] [ ] [ ] [ ] [ ] [ ] [ ] . some of the t cells activated by peptide stimulation were shown to have a memory phenotype [ , , ] , and some potentially cross-reactive cd + t cells were found in healthy donors [ , , , ] . t cells recognise short pathogen-derived peptides presented on the cell surface of the major histocompatibility complex (mhc) using hypervariable t cell receptors (tcr). tcr repertoire sequencing allows for the quantitative tracking of t cell clones in time, as they go through the expansion and contraction phases of the response. it was previously shown that quantitative longitudinal tcr sequencing is able to identify antigen-specific expanding and contracting t cells in response to yellow fever vaccination with high sensitivity and specificity [ ] [ ] [ ] . not only clonal expansion but also significant contraction from the peak of the response are distinctive traits of t cell clones specific to the virus [ ] . in this study we use longitudinal tcralpha and tcrbeta repertoire sequencing to quantitatively track t cell clones that significantly expand and contract after recovery from a mild covid- infection, and determine their phenotype. 
we reveal the dynamics and the phenotype of the memory cells formed after infection, identify pre-existing t cell memory clones participating in the response, and describe public tcr sequence motifs of sars-cov- -reactive clones, suggesting a response to immunodominant epitopes. in the middle of march (day ), donor w (female) and donor m (male), both healthy young adults, returned to their home country from one of the centers of the covid- outbreak in europe at the time. upon arrival, according to local regulations, they were put into strict self-quarantine for days. on day of self-isolation both developed low-grade fever, fatigue and myalgia, which lasted days and was followed by a temporary loss of smell for donor m. on days , , , and we collected peripheral blood samples from both donors (fig. a). the presence of igg and igm sars-cov- specific antibodies in the plasma was measured at all timepoints using a sars-cov- s-rbd domain specific elisa (fig. s ). from each blood sample we isolated pbmcs (peripheral blood mononuclear cells, in two biological replicates), cd +, and cd + t cells. additionally, on days , and we isolated four t cell memory subpopulations (fig. s ): effector memory (em: ccr -cd ra-), effector memory with cd ra re-expression (emra: ccr -cd ra+), central memory (cm: ccr +cd ra-), and stem cell-like memory (scm: ccr +cd ra+cd +). from all samples we isolated rna and performed tcralpha and tcrbeta repertoire sequencing as previously described [ ]. for both donors, tcralpha and tcrbeta repertoires were obtained for other projects one and two years prior to infection. additionally, tcr repertoires of multiple samples for donor m, including sorted memory subpopulations, are available from a published longitudinal tcr sequencing study after yellow fever vaccination (donor m samples in [ ]).
based on previously described activated t cell dynamics for sars-cov- [ ] and immunization with live vaccines [ ], the peak of the t cell expansion is expected around day post-infection, and responding t cells significantly contract afterwards. however, weiskopf et al. [ ] report an increase of sars-cov- -reactive t cells at later timepoints, peaking in some donors after days following symptom onset. to identify groups of t cell clones with similar dynamics in an unbiased way, we used principal component analysis (pca) in the space of t cell clonal trajectories (fig. b and c). this exploratory data analysis method allows us to visualize major trends in the dynamics of abundant tcr clonotypes (occurring within the top at any post-infection timepoint) between multiple timepoints. in both donors, and in both tcralpha and tcrbeta repertoires, we identified three clusters of clones with distinct dynamics. the first cluster (fig. bc, purple) corresponded to abundant tcr clonotypes which had constant concentrations across timepoints, the second cluster (fig. bc, green) showed contraction dynamics from day to day , and the third cluster (fig. bc, yellow) showed an unexpected clonal expansion from day with a peak on day followed by contraction. the clustering and dynamics are similar in both donors and are reproduced in tcrbeta (fig. bc) and tcralpha (fig. s ab) repertoires. we next used edger, a software package for differential gene expression analysis [ ], and noiset, a bayesian differential expansion model [ ], to specifically detect changes in clonotype concentration between pairs of timepoints in a statistically reliable way and without limiting the analysis to the most abundant clonotypes. both noiset and edger use biological replicate samples collected at each timepoint to train a noise model for sequence counts. results for the two models were similar (fig.
s ) and we conservatively defined as expanded or contracted the clonotypes that were called by both models simultaneously. we identified tcralpha and tcrbeta clonotypes in donor w, and tcralpha and tcrbeta in donor m significantly contracted from day to day (largely overlapping with cluster of clonal trajectories, fig. s ). tcralpha and tcrbeta for donor w, and tcralpha and tcrbeta clonotypes for donor m were significantly expanded from day to (corresponding to cluster of clonal trajectories). note that, to identify putatively sars-cov- reactive clones, we only used post-infection timepoints, so that our analysis can be reproduced in other patients and studies where pre-infection timepoints are unavailable.
figure caption (partial): s shows overlap between clonal trajectory clusters and edger/noiset hits). right: each curve shows the average ± . se of normalized clonal frequencies from each cluster. contracting (d) and expanding (e) clones include both cd + and cd + t cells, and are less abundant in pre-infection repertoires. t cell clones significantly contracted from day to day (d) and significantly expanded from day to day (e) were identified in both donors. the fraction of contracting (d) and expanding (e) tcrbeta clonotypes in the total repertoire (calculated as the sum of frequencies of these clonotypes in the second pbmc replicate at a given timepoint and corresponding to the fraction of responding cells of all t cells) is plotted in log-scale for all reactive clones (left), reactive clones with the cd (middle) and cd (right) phenotypes. similar dynamics were observed in tcralpha repertoires (fig. s ), and for significantly expanded/contracted clones identified with the noiset bayesian differential expansion statistical model alone (fig. s ).
however, tracking the identified responding clones back to pre-infection timepoints reveals strong clonal expansions from pre- to post-infection (fig. de, fig. s cd).
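the two bookkeeping steps described above, keeping only clonotypes called by both models and summing their frequencies to obtain the responding fraction of the repertoire, can be sketched as follows. this is a minimal illustration, not the authors' pipeline: the function names, the clonotype labels and the read counts are hypothetical, and the hit lists are assumed to have been produced beforehand by the two statistical models.

```python
def consensus_hits(edger_hits, noiset_hits):
    """Clonotypes called as significant by both models (set intersection)."""
    return set(edger_hits) & set(noiset_hits)


def responding_fraction(clone_counts, responding):
    """Sum of frequencies of responding clonotypes in one repertoire sample.

    clone_counts : dict mapping clonotype -> read (or UMI) count in the sample
    responding   : collection of clonotypes considered responding
    """
    total = sum(clone_counts.values())
    if total == 0:
        return 0.0
    responding = set(responding)
    hit_count = sum(n for clone, n in clone_counts.items() if clone in responding)
    return hit_count / total


# Hypothetical example: three clonotypes in one PBMC replicate.
sample_counts = {"clone_A": 50, "clone_B": 30, "clone_C": 20}
edger_called = {"clone_A", "clone_B"}
noiset_called = {"clone_B", "clone_C"}

both = consensus_hits(edger_called, noiset_called)
fraction = responding_fraction(sample_counts, both)
print(sorted(both), fraction)
```

taking the intersection is the conservative choice described in the text: a clonotype flagged by only one model is not counted as responding.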
for brevity, we further refer to clonotypes significantly contracted from day to as contracting clones and clonotypes significantly expanding from day to as expanding clones. contracting clones corresponded to . % and . % of t cells on day post-infection, expanding clones reached . % and . % on day for donors m and w respectively (fig. de, left) . this magnitude of the t cell response is of the same order of magnitude as previously observed after live yellow fever vaccine immunization of donor m ( . % t cells on day post-vaccination). for each contracting and expanding clone we determined their cd /cd phenotype using separately sequenced repertoires of cd + and cd + subpopulations (see methods). both cd + and cd + subsets participated actively in the response (fig. de) . interestingly, clonotypes expanding after day were significantly biased towards the cd + phenotype, while contracting clones had balanced cd /cd phenotype fractions in both donors (fisher exact test, p < . for both donors). on days , and we identified both contracting ( fig. a -c) and expanding ( fig. s a -c) t cell clones in the memory subpopulations of peripheral blood. both cd + and cd + responding clones were found in the cm and em subsets, however cd + were more biased towards cm (with exception of donor w day timepoint, where a considerable fraction of cd + clones were found in cm), and cd + clones more represented in the emra subset. a small number of both cd + and cd + responding clonotypes were also identified in the scm subpopulation, which was previously shown to be a long-lived t cell memory subset [ ] . note that we sequenced more cells from pbmc than from the memory subpopulations (table s ), so that some low-abundant responding t cell clones are not sampled in the memory subpopulations. intriguingly, a number of responding cd + clones, and fewer cd + clones, were also represented in the repertoires of both donors and years before the infection. 
pre-existing clones were expanded after infection, and contracted afterwards for both donors (fig. s ). for donor m, for whom we had previously sequenced memory subpopulations before the infection [ ], we were able to identify pre-existing sars-cov- reactive cd + clones in the cm subpopulation year before the infection and a group of cd + clones in the pre-infection em subpopulation. interestingly, on day after infection the majority of pre-infection cm clones were detected in the em subpopulation, suggesting recent t cell activation and a switch of the phenotype from memory to effector. these clones might represent memory t cells cross-reactive for other infections, e.g. other human coronaviruses. a search for tcrbeta amino acid sequences of responding clones in vdjdb [ ], a database of tcrs with known specificities, resulted in essentially no overlap with tcrs not specific for sars-cov- epitopes: only two clonotypes matched. one match corresponded to the cmv (cytomegalovirus) epitope presented by the hla-a* mhc allele, which is absent in both donors (table s ), and a second match was for an influenza a virus epitope presented by the hla-a* allele. the near absence of matches suggests that contracting and expanding clones are unlikely to be specific for immunodominant epitopes of common pathogens covered in vdjdb. we next asked if we could map specificities of our responding clones to sars-cov- epitopes. on day post-infection donor m participated in a study by shomuradova et al. [ ] (as donor p ), where his cd + t cells were stained with hla-a* : -ylqprtfll mhc-i tetramer. tcralpha and tcrbeta of facs-sorted tetramer-positive cells were sequenced and deposited to vdjdb (see [ ] for the experimental details). we matched these tetramer-specific tcr sequences to our longitudinal dataset (fig. a for tcrbeta and fig. s for tcralpha). we found that their frequencies were very low on pre-infection timepoints and monotonically decreased from their peak on day ( .
· − fraction of bulk tcrbeta repertoire) to day ( . · − fraction), in close analogy to our contracting clone set. among the tetramer-positive clones that were abundant on day (with bulk frequency > − ), out of or tcrbetas and out of tcralphas were independently identified as contracting by our method. it was previously shown that tcrs recognising the same antigens frequently have highly similar tcr sequences [ , ]. to identify motifs in tcr amino acid sequences, we plotted similarity networks for significantly contracted (fig. bc, fig. ab) and expanded (fig. s b-e) clonotypes. the number of edges in all similarity networks except cd + expanding clones was significantly larger than would be expected by randomly sampling the same number of clonotypes from the corresponding repertoire (fig. d and fig. s a). in both donors we found clusters of highly similar clones in both cd + and cd + subsets for expanding and contracting clonotypes. clusters were largely donor-specific, as expected, since our donors have dissimilar hla alleles (si table ) and thus each is likely to present a non-overlapping set of t cell antigens. the largest cluster, described by the motif trav -cagxnyggsqgnlif-traj , was identified in donor m's cd + contracting alpha chains. clones from this cluster constituted . % of all of donor m's cd + responding cells on day , suggesting a response to an immunodominant cd + epitope in the sars-cov- proteome. the high similarity of the tcr sequences of responding clones in this cluster allowed us to independently identify motifs from donor m's cd alpha contracting clones using the alice algorithm [ ] (fig. s ). while the time-dependent methods (fig. ) identify abundant clones, the alice approach is complementary to both edger and noiset as it identifies clusters of t cells with similar sequences independently of their individual abundances.
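a similarity network of the kind described above can be sketched by connecting clonotypes whose cdr3 amino acid sequences are nearly identical and then reading off edges and clusters. the edge rule used here (equal length, at most one mismatch) is an assumption made for illustration, since this section only states that sequences are "highly similar"; the sequences and function names are hypothetical as well.

```python
from itertools import combinations


def at_most_one_mismatch(seq_a, seq_b):
    """True if both sequences have equal length and differ at <= 1 position."""
    if len(seq_a) != len(seq_b):
        return False
    return sum(x != y for x, y in zip(seq_a, seq_b)) <= 1


def similarity_edges(cdr3s):
    """All pairs of distinct CDR3 sequences linked under the assumed rule."""
    unique = sorted(set(cdr3s))
    return [(u, v) for u, v in combinations(unique, 2) if at_most_one_mismatch(u, v)]


def similarity_clusters(cdr3s):
    """Connected components of the similarity network, as a list of sets."""
    unique = sorted(set(cdr3s))
    neighbours = {seq: set() for seq in unique}
    for u, v in similarity_edges(unique):
        neighbours[u].add(v)
        neighbours[v].add(u)
    visited, components = set(), []
    for start in unique:
        if start in visited:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        visited |= component
        components.append(component)
    return components


# Hypothetical CDR3 amino acid sequences.
example = ["CASSLGF", "CASSLGY", "CASSLAY", "CAWSVQETQYF"]
edges = similarity_edges(example)
cluster_sizes = sorted(len(c) for c in similarity_clusters(example))
print(len(edges), cluster_sizes)
```

to judge whether an observed edge count is unusually high, the same count could be computed for repeated random samples of equally many clonotypes from the full repertoire, which is the comparison the text describes.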
mapping tcr motifs to sars-cov- epitopes. in cd + t cells, clusters of highly similar tcrbeta clonotypes in donor m and one cluster of tcralpha clonotypes correspond to the ylqprtfll-tetramer-specific tcr sequences described above. to map additional specificities for cd + tcrbetas, we used a large set of sars-cov- -peptide specific tcrbeta sequences from [ ] obtained using the multiplex identification of antigen-specific t cell receptors assay (mira) with combinatorial peptide pools [ ]. for each responding cd + tcrbeta we searched for identical or highly similar (same vj combination, up to one mismatch in cdr aa) tcrbeta sequences specific for given sars-cov- peptides. a tcrbeta sequence from our set was considered mapped to a given peptide if it had at least two highly similar tcrbeta sequences specific for this peptide in the mira experiment. this procedure yielded unambiguous matches for cd + tcrbetas: just one clonotype was paired to two peptide pools (table s ). the vast majority of matches to mira corresponded to groups of contracting clones. as expected, we found that all clusters corresponding to hla-a* : -ylqprtfll mhc-i tetramer-specific tcrs were matched to the peptide pool ylqprtfl,ylqprtfll,yyvgylqprtf in the mira dataset. another large group of matches corresponded to the hla-b* : -restricted [ ] nqklianqf epitope. interestingly, clonotypes corresponding to this cluster together made up % of the cd + immune response on day , suggesting immunodominance of this epitope. two tcrbeta clonotypes mapped to this epitope were identified in the effector memory subset one year before the infection, suggesting a potential cross-reactive response. we speculate that this response might have been initially triggered by nqklianaf, a homologous hla-b* : epitope from hku or oc , common human betacoronaviruses. to predict potential pairings between tcralpha and tcrbeta motifs, we used a method of alpha/beta clonal trajectory matching described in [ ] (see methods for details). 
we found consistent pairing between one of the motifs in tcralpha to the largest motif in tcrbeta t cells, which is associated to hla-b* : -nqklianqf. at the time of writing, no data on tcr sequences specific to mhc-ii class epitopes exist to map specificities of cd + t-cells in a similar way as we did with mira-specific tcrs. however, a recently published database of bulk tcrbeta repertoires from covid- patients allowed us to confirm the sars-cov- specificity of contracting clones indirectly. public tcrbeta sequences that can recognize sars-cov- epitopes are expected to be clonally expanded and thus sampled more frequently in the repertoires of covid- patients than in control donors. in fig. cd we show that the total frequency of tcrbeta sequences forming the largest cluster in donor m (fig. c ) and donor w (fig. d) is significantly larger in the covid- cohort than in the healthy donor cohort from ref. [ ] , suggesting antigen-dependent clonal expansion. we hypothesized that the difference between control and covid- donors in motif abundance should be even larger if we restrict the analysis to donors sharing the hla allele that presents the epitope. unfortunately, hla-typing information is not yet available for the covid- cohort. however, using sets of hla-associated tcrbeta sequences from ref. [ ] , we could build a simple classifier to predict the hla alleles of donors from both the control and covid- cohorts exploiting the presence of tcrbeta sequences associated with certain hla alleles (see methods for details). we found that the cd + tcrbeta motif from donor w occurs preferentially in donors predicted to have drb * : allele, while the motif from donor m appears to be associated with hla-drb * : -dqb * : haplotype. the frequency of sequences corresponding to these motifs can then be used to identify sars-cov- infected donors with matching hla alleles (fig. s ). 
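The matching rule used above to map responding tcrbeta clonotypes to mira peptide pools (same v and j, at most one cdr amino acid mismatch, at least two similar mira sequences per peptide) can be sketched as follows. This is a minimal illustration, not the authors' code: the `Tcr` record, the function names and the toy segment labels are hypothetical, and a "mismatch" is assumed here to mean a single substitution between cdr sequences of equal length.

```python
from collections import namedtuple

# Hypothetical record type: V segment, J segment, CDR amino acid sequence.
Tcr = namedtuple("Tcr", ["v", "j", "cdr"])


def is_similar(a, b, max_mismatches=1):
    """True if both TCRs share V and J and differ by at most
    max_mismatches substitutions (assumption: equal CDR length)."""
    if a.v != b.v or a.j != b.j or len(a.cdr) != len(b.cdr):
        return False
    mismatches = sum(x != y for x, y in zip(a.cdr, b.cdr))
    return mismatches <= max_mismatches


def map_to_peptides(query, mira, min_hits=2):
    """Return the peptide pools supported by at least min_hits
    identical or highly similar MIRA sequences.

    mira is an iterable of (Tcr, peptide_pool) pairs.
    """
    hits = {}
    for tcr, pool in mira:
        if is_similar(query, tcr):
            hits[pool] = hits.get(pool, 0) + 1
    return sorted(pool for pool, n in hits.items() if n >= min_hits)
```

A clonotype for which `map_to_peptides` returns exactly one pool corresponds to an unambiguous match in the sense used above; more than one returned pool corresponds to the single clonotype that was paired to two peptide pools.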
fig. . a, b, analysis of tcr amino acid sequences of cd + contracting clones reveals distinctive motifs. each vertex in the similarity network corresponds to a contracting clonotype. an edge indicates or fewer amino acid mismatches in the cdr region (and identical v and j segments). networks are plotted separately for cd alpha (a) and cd beta (b) contracting clonotypes. clonotypes without neighbours are not shown. sequence logos corresponding to the largest clusters are shown under the corresponding network plots. c, d, clonotypes forming the two largest motifs are significantly more clonally expanded (p< . , one sided t-test) in a cohort of covid- patients [ ] than in a cohort of control donors [ ]. each dot corresponds to the total frequency of clonotypes from motifs shaded in (b) in the tcrbeta repertoire of a given donor.

using longitudinal repertoire sequencing, we identified a group of cd + and cd + t cell clones that contract after recovery from a sars-cov- infection. our response timelines agree with t cell dynamics reported by thevarajan et al. [ ] for mild covid- , as well as with dynamics of t cell response to live vaccines [ ]. we further mapped the specificities of contracting cd + t cells using sequences of sars-cov- specific t cells identified with tetramer staining in the same donor, as well as the large set of sars-cov- peptide stimulated tcrbeta sequences from ref. [ ]. for large cd + tcrbeta motifs we show strong association with covid- by analysing the occurrence patterns and frequencies of these sequences in a large cohort of covid- patients. surprisingly, in both donors we also identified a group of predominantly cd + clonotypes which expanded from day to day after the infection. 
one possible explanation for this second wave of expansion is the priming of cd + t cells by antigen-specific b-cells, but there might be other mechanisms such as the migration of sars-cov- specific t cells from lymphoid organs or bystander activation of non-sars-cov- specific t cells. it is also possible that later expanding t cells are triggered by another infection, simultaneously and asymptomatically occurring in both donors around day . in contrast with the first wave of response identified by contracting clones, for now we do not have confirmation that this second wave of expansion corresponds to sars-cov- specific t cells. accumulation of tcr sequences for cd + sars-cov- epitope specific t cells may further address this question. we showed that a large fraction of putatively sars-cov- reactive t cell clones are later found in memory subpopulations and remain there at least months after infection. importantly, some of the responding clones are found in the long-lived stem cell-like (scm) memory subset, as also reported for sars-cov- convalescent patients in ref. [ ]. a subset of cd + clones were identified in pre-infection central memory subsets, and a subset of cd + t cells were found in effector memory. among these are cd + clones recognising nqklianqf, an immunodominant hla-b* : restricted sars-cov- epitope, for which a homologous epitope differing by aa mismatch exists in common human betacoronaviruses. the presence of sars-cov- cross-reactive cd + t cells in healthy individuals was recently demonstrated [ , [ ] [ ] [ ] [ ] ]. our data further suggest that cross-reactive cd + and cd + t cells can participate in the response in vivo. it is interesting to ask if the presence of cross-reactive t cells before infection is linked to the mildness of the disease (with the predicted hla-b* : cross-reactive epitope described above as a good starting point). 
larger studies with cohorts of severe and mild cases with pre-infection timepoints are needed to address this question. peripheral blood samples from two young healthy adult volunteers, donor w (female) and donor m (male) were collected with written informed consent in a certified diagnostics laboratory. both donors gave written informed consent to participate in the study under the declaration of helsinki. hla alleles of both donors (table s ) were determined by an in-house cdna high-throughput sequencing method. an elisa assay kit developed by the national research centre for hematology was used for detection of anti-s-rbd igg according to the manufacturer's protocol. the relative igg level (od/co) was calculated by dividing the od (optical density) values by the mean od value of the cut-off positive control serum supplied with the kit (co). od values of d , d and d samples for donor m exceeded the limit of linearity for the kit. in order to properly compare the relative igg levels between d , d , d and d , these samples were diluted : instead of : , the ratios d :d and d :d and d :d were calculated and used to calculate the relative igg level of d , d and d by multiplying d od/co value by the corresponding ratio. relative anti-s-rbd igm level was calculated using the same protocol with anti-human igm-hrp conjugated secondary antibody. since the control cut-off serum for igm was not available from the kit, on fig. s b . we show od values for nine biobanked pre-pandemic serum samples from healthy donors. pbmcs were isolated with the ficoll-paque density gradient centrifugation protocol. cd + and cd + t cells were isolated from pbmcs with dynabeads cd + and cd + positive selection kits (invitrogen) respectively. for isolation of em, emra, cm and scm memory subpopulations we stained pbmcs with the following antibody mix: anti-cd -fitc (ucht , ebioscience), anti-cd ra-efluor (hi , ebioscience), anti-ccr -apc ( d , ebioscience), anti-cd -pe (dx , ebioscience). 
cell sorting was performed on facs aria iii, all four isolated subpopulations were lysed with trizol reagent immediately after sorting. tcralpha and tcrbeta cdna libraries preparation was performed as previously described in [ ] . rna was isolated from each sample using trizol reagent according to the manufacturer's instructions. a universal primer binding site, sample barcode and unique molecular identifier (umi) sequences were introduced using the 'race technology with tcralpha and tcrbeta constant segment specific primers for cdna synthesis. cdna libraries were amplified in two pcr steps, with introduction of the second sample barcode and illumina truseq adapter sequences at the second pcr step. libraries were sequenced using the illumina novaseq platform ( x bp read length). raw data preprocessing. raw sequencing data was demultiplexed and umi guided consensuses were built using migec v. . . [ ] . resulting umi consensuses were aligned to v and j genomic templates of the tra and trb locus and assembled into clonotypes with mixcr v. . . [ ] . see table s for the number of cells, umis and unique clonotypes for each sample. identification of clonotypes with active dynamics. principal component analysis (pca) of clonal trajectories was performed as described before [ ] . first we selected clones which were present among the top abundant in any of post-infection pbmc repertoires, including biological replicates, i.e. considered clone abundant if it was found within top most abundant clonotypes in at least one of the replicate samples at one timepoint. next, for each such abundant clone we calculated its frequency at each post-infection timepoint and divided this frequency by the maximum frequency of this clone for normalization. then we performed pca on the resulting normalized clonal trajectory matrix and identified three clusters of trajectories with hierarchical clustering with average linkage, using euclidean distances between trajectories. 
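The trajectory preparation and clustering described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the pca step is omitted for brevity (clustering is run directly on the normalized trajectories), and each clone is assumed to have a non-zero frequency at one or more timepoints, which holds for clones selected as abundant.

```python
import math


def normalize(trajectories):
    """Divide each clone's frequencies by that clone's maximum frequency."""
    return [[f / max(row) for f in row] for row in trajectories]


def euclidean(a, b):
    """Euclidean distance between two trajectories of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage.

    Returns the clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters' members.
                d = sum(euclidean(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Toy frequencies (rows: clones, columns: post-infection timepoints).
toy = [
    [8e-3, 4e-3, 2e-3, 1e-3],  # contracting
    [4e-4, 2e-4, 1e-4, 1e-4],  # contracting, lower abundance
    [1e-3, 2e-3, 4e-3, 4e-3],  # expanding
    [1e-4, 3e-4, 6e-4, 5e-4],  # expanding, lower abundance
]
groups = average_linkage(normalize(toy), n_clusters=2)
```

Dividing by the per-clone maximum puts clones of very different abundance on the same scale, so that the clustering groups them by the shape of the trajectory rather than by clone size.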
we identified statistically significant contractions and expansions with edger as previously described [ ], using fdr adjusted p < . and a log fold change threshold of . noiset implements the bayesian detection method described in [ ]. briefly, a two-step noise model accounting for cell sampling and expression noise is inferred from replicates, and a second model of expansion is learned from the two timepoints to be compared. the procedure outputs the posterior probability of expansion or contraction, and the median estimated log fold change, whose thresholds are set to . and respectively. mapping of covid- associated tcrs to the mira database. tcrbeta sequences from t cells specific for sars-cov- peptide pools mira (immunecode release ) were downloaded from https://clients.adaptivebiotech.com/pub/covid- . v and j genomic templates were aligned to tcr nucleotide sequences from the mira database using mixcr . . . we consider a tcrbeta from mira matched to a tcrbeta from our data if it has the same v and j and at most one mismatch in the cdr amino acid sequence. we consider a tcrbeta sequence mapped to an epitope if it has at least two identical or highly similar (same v, j and up to one mismatch in cdr amino acid sequence) tcrbeta clonotypes reactive for this epitope in the mira database. computational alpha/beta pairing by clonal trajectories. computational alpha/beta pairing was performed as described in [ ]. for each tcrbeta we determine the tcralpha with the closest clonal trajectory (tables s and s ). we observe no stringent pairings between tcralpha and tcrbeta motifs, with the exception of two contracting cd tcrbeta clusters: trbv - /trbj - nqklianqf-associated clones from donor m paired to trav /traj alphas from the same cluster (cassledtnygytf-cavhssgtykyif and cassledtiygytf-caaltsgtykyif), and the trbv - /trbj - beta cluster paired to the largest alpha cluster (cassptgrgrtdtqyf-cayrsggseklvf and cassptgrggtdtqyf-cayrrpggekltf). computational prediction of hla-types. 
to predict hla-types from tcr repertoires of covid- cohort we used sets of hla-associated tcr sequences from [ ] . we use tcrbeta repertoires of donors from cohort from [ ] , for which hla-typing information is available in ref. [ ] as a training set to fit logistic regression model, where presence or absense of given hlaallele is an outcome, and the number of allele-associated sequences in repertoire, as well as the total number of unique sequences in the repertoire, are the predictors. a separate logistic regression model was fitted for each set of hla-associated sequences from ref. [ ] , and then used to predict the probability p that a donor from the covid- cohort has this allele. donors with p < . were considered negative for a given allele. raw sequencing data are deposited to the short read archive (sra) accession: prjna . processed tcralpha and tcrbeta repertoire datasets, resulting repertoires of sars-cov- -reactive clones, and raw data preprocessing instructions can be accessed from: https: //github.com/pogorely/minervina_covid. immunology of covid- : current state of the science the cd t cell response to respiratory virus infections expanding roles for cd + t cells in immunity to viruses memory t cell responses targeting the sars coronavirus persist up to years postinfection engineering t cells specific for a dominant severe acute respiratory syndrome coronavirus cd t cell epitope t cell responses are required for protection from clinical disease and for virus clearance in severe acute respiratory syndrome coronavirus-infected mice a possible role for b cells in covid- ? lesson from patients with agammaglobulinemia two xlinked agammaglobulinemia patients develop pneumonia as covid manifestation but recover. 
pediatric allergy and immunology p pai breadth of concomitant immune responses prior to patient recovery: a case report of non-severe covid- epidemiology and transmission of covid- in cases and of their close contacts in shenzhen, china: a retrospective cohort study human effector and memory cd + t cell responses to smallpox and yellow fever vaccines detection of sars-cov- -specific humoral and cellular immunity in covid- convalescent individuals phenotype of sars-cov- -specific t-cells in covid- patients with acute respiratory distress syndrome presence of sars-cov- reactive t cells in covid- patients and healthy donors. medrxiv magnitude and dynamics of the t-cell response to sars-cov- infection at both individual and population levels., (infectious diseases sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls single-cell transcriptomic analysis of sars-cov- reactive cd + t cells pre-existing t cell memory as a risk factor for severe covid- in the elderly broad and strong memory cd + and cd + t cells induced by sars-cov- in uk convalescent covid- patients selective and cross-reactive sars-cov- t cell epitopes in unexposed humans. 
science p eabd targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals robust t cell immunity in convalescent individuals with asymptomatic or mild covid- primary and secondary antiviral response captured by the dynamics and phenotype of individual t cell clones precise tracking of vaccineresponding t cell clones reveals convergent and personalized response in identical twins dynamics of the cytotoxic t cell response to a model of acute viral infection persisting fetal clonotypes influence the structure and overlap of adult human t cell receptor repertoires edger: a bioconductor package for differential expression analysis of digital gene expression data inferring the immune response from repertoire sequencing long-lasting stem celllike memory cd + t cells with a nave-like profile upon yellow fever vaccination vdjdb in : database extension, new analysis infrastructure and a t-cell receptor motif compendium sars-cov- epitopes are recognized by a public and diverse repertoire of human t-cell receptors quantifiable predictive features define epitope-specific t cell receptor repertoires identifying specificity groups in the t cell receptor repertoire detecting t cell receptors involved in immune responses from single repertoire snapshots immunosequencing identifies signatures of cytomegalovirus exposure history and hlamediated effects on the t cell repertoire multiplex identification of antigen-specific t cell receptors using a combination of immune assays and immune receptor sequencing generation of sars-cov- s spike glycoprotein putative antigenic epitopes in vitro by intracellular aminopeptidases human t cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity towards error-free profiling of immune repertoires mixcr: software for comprehensive adaptive immunity profiling key: cord- -bw lbzvt authors: pizzorno, andrés; padey, blandine; julien, thomas; trouillet-assant, 
sophie; traversier, aurélien; errazuriz-cerda, elisabeth; fouret, julien; dubois, julia; gaymard, alexandre; lescure, françois-xavier; dulière, victoria; brun, pauline; constant, samuel; poissy, julien; lina, bruno; yazdanpanah, yazdan; terrier, olivier; rosa-calatrava, manuel title: characterization and treatment of sars-cov- in nasal and bronchial human airway epithelia date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bw lbzvt in the current covid- pandemic context, proposing and validating effective treatments represents a major challenge. however, the lack of biologically relevant pre-clinical experimental models of sars-cov- infection as a complement of classic cell lines represents a major barrier for scientific and medical progress. here, we advantageously used human reconstituted airway epithelial models of nasal or bronchial origin to characterize viral infection kinetics, tissue-level remodeling of the cellular ultrastructure and transcriptional immune signatures induced by sars-cov- . our results underline the relevance of this model for the preclinical evaluation of antiviral candidates. foremost, we provide evidence on the antiviral efficacy of remdesivir and the therapeutic potential of the remdesivir-diltiazem combination as a rapidly available option to respond to the current unmet medical need imposed by covid- . one sentence summary new insights on sars-cov- biology and drug combination therapies against covid- . on dec , , a cluster of cases of pneumonia of unknown etiology was reported in wuhan, china. on jan , , a novel coronavirus, lately named severe acute respiratory syndrome coronavirus (sars-cov- ) and classified into the betacoronavirus genus, was identified as the causative agent ( ) . 
as of mar , , days after the world health organization (who) declared the covid- a pandemic ( ) , the new coronavirus disease (covid- ) had caused approximately deaths among more than confirmed cases reported mainly in china but also spreading to at least other countries or territories worldwide ( ). compared to the two other coronaviruses responsible for epidemic outbreaks in the past, sars-cov and mers-cov, the novel sars-cov- strain shares ⁓ % and ⁓ % genome sequence identity, respectively ( ) . not surprisingly, important differences in terms of the epidemiology and physiopathology between these three viruses have also been observed ( ) ( ) ( ) . as with most emerging viral diseases, no specific antiviral treatment nor vaccine against any of these three coronaviruses are currently available, with standard patient management relying mainly on symptom treatment and respiratory support when needed. in that regard, and considering that key features of the biology of sars-cov- and its induced covid- still require further characterization, the scarce readiness of biologically relevant pre-clinical experimental models of sars-cov- infection as a complement of the african green monkey veroe cell line represents a major barrier for scientific and medical progress in this area. we and others have previously reported the advantage of using more physiological models such as in-house or commercially available reconstituted human airway epithelia (hae) to isolate, culture and study a wide range of respiratory viruses ( , ) . developed from biopsies of nasal or bronchial cells differentiated in the air/liquid interphase, these models reproduce with high fidelity most of the main structural, functional and innate immune features of the human respiratory epithelium that play a central role in the early stages of infection and constitute robust surrogates to study airway disease mechanisms and for drug discovery ( ) . 
in this study, we initially isolated and amplified in veroe cells a sars-cov- virus directly form a nasal swab from one of the first hospitalized patients with confirmed covid- in france ( ) . the complete genome sequence of the isolated sars-cov- virus was deposited in the gisaid epicovtm database under the reference betacov/france/idf / (accession id epi_isl_ ). phylogenetic analysis confirmed that the isolated virus is representative of currently circulating strains ( ) . we first characterized the replicative capacities of this viral strain in veroe cells at different multiplicities of infection (mois) (fig. a) , using both classic infectious titer determination in cell culture (tcid ) and molecular semi-quantitative methods, the latter based on orf b-nsp -specific primers and probes designed by the school of public health/university of hong kong (details in supplementary materials). this double approach was facilitated by the appearance of clearly observable characteristic cytopathic effect from hpi (fig. b) , and enabled the validation of a large interval (range - log (tcid )) with high correlation (r-squared . ) between molecular and infectious viral titers (fig. c) . in parallel, we successfully inoculated nasal mucilair™ hae on the apical surface directly with nasal swab samples, as confirmed by transmission electron microscopy observations (fig. s ). characteristic features of coronavirus-induced cell ultrastructure remodeling were easily distinguishable in both the apical and basal sides of the hae at hpi, notably the high accumulation of progeny virions in mucus-producer goblet cells. then, we advantageously exploited the mucilair™ hae model and in-house adapted protocols previously optimized for different respiratory viruses ( ) to perform experimental infections with sars-cov- . viral replication was monitored through repeated sampling and tcid titration at the apical surface of hae (fig. d) . 
trans-epithelial electrical resistance (teer), considered as a surrogate of epithelium integrity, was also measured during the time-course of infection (fig. e) . in parallel, comparative molecular viral genome quantification was performed at the three levels of the air/liquid hae interphase: in apical washes (fig. f, apical) , total cellular rna (fig. g , intracellular) and basal medium (fig. h, basal) . sars-cov- viral production at the epithelial apical surface increased sharply at hpi, reaching . and . log tcid /ml in nasal and bronchial hae, respectively. the peak of viral replication was reached earlier in bronchial ( - hpi) than in nasal hae, in which a progressive increase in infectious viral titers was observed until at least hpi (fig. d) . this replication kinetics was validated by molecular viral genome quantification at the apical pole ( fig. f) . high viral replication correlated with a reduction in epithelium integrity at hpi, reflected by more than . -and -fold decreases in bronchial and nasal hae teer values, respectively, followed by a partial recovery in the case of bronchial hae (fig. e) . moreover, viral production at the apical pole was well correlated with intracellular viral genome detection during infection, except for the nasal hae at hpi, in which a strong relative increase of nsp rna was observed (fig. f) . interestingly, viral genome was detected in the basal medium from hpi, with the peak observed at hpi ( fig. h ) coinciding with the highest impact of sars-cov- infection on epithelium integrity. to further characterize the biology of the sars-cov- , we inoculated both nasal ( fig. a , b) and bronchial (fig. c, d) hae and analyzed the infection-induced remodeling of the cellular ultrastructure using transmission electron microscopy. at hpi, both hae exhibited a well-established infection, with ciliated, goblet and to a lesser extent basal cells showing active production of viral progeny. 
this observation is in accordance with the viral replication results described in fig. and with a recent study reporting high expression levels of the sars-cov- cell receptor angiotensin-converting enzyme- (ace ) in both ciliated and goblet respiratory cells ( ). as previously observed in structural studies of other coronaviruses, notably sars-cov and mers-cov ( - ), we distinguished characteristic clusters in the perinuclear region of infected hae cells. these clusters are mainly composed of numerous viral single- and double-membrane vesicles (dmv) and mitochondria ( fig. a, a

we therefore evaluated in both veroe and hae models the antiviral potential against sars-cov- of remdesivir, as monotherapy and in combination with diltiazem. diltiazem is a voltage-gated ca + channel antagonist currently used as an anti-hypertensive for the control of angina pectoris and cardiac arrhythmia ( ), which we have recently repurposed as an effective host-directed influenza inhibitor due to its so far undescribed capacity to induce the interferon (ifn) antiviral response, particularly type iii ifns (fig. s ) ( ). additionally, the rationale for testing such a virus-directed plus host-directed drug combination is consistent with a novel study describing hypertension as a potential risk factor observed among a cohort of inpatients with covid- ( ), and with two reports not anticipating potential adverse effects of diltiazem ( ) or negative pharmacological interactions between remdesivir and diltiazem for the treatment of covid- ( ).

cells has been proved functional, this cell line cannot produce type i ifns ( , ). this incomplete ifn response most likely accounts for the lack of significant antiviral effect observed with diltiazem monotherapy in our experimental conditions. nonetheless, addition of . µm diltiazem significantly potentiated the antiviral effect of remdesivir (fig. a-c), inducing % and % reductions in remdesivir ic values at and hpi, respectively. 
comparably, daily treatment with µm remdesivir resulted in . log and . log reductions of intracellular sars-cov- viral titers at hpi in nasal and bronchial hae, respectively (fig. d, upper panel) . not surprisingly for a model with a completely functional ifn response, daily treatment with µm diltiazem resulted in moderate yet substantial ( . log and . log , respectively) reductions of intracellular viral titers in nasal and bronchial hae at the same time-point (fig. d, upper panel) . on top of that, we observed an additional . log reduction in nasal hae viral titers for the remdesivir-diltiazem combination when compared with remdesivir monotherapy. in all cases, the antiviral effects induced by remdesivir, diltiazem or the remdesivir-diltiazem combination translated into a protection of the nasal but not the bronchial hae barrier integrity, preventing the drop on teer values induced by the infection (fig. d, lower panel) . importantly, remdesivir also showed a strong antiviral effect at hpi (fig. e, upper panel) . this time, the ⁓ log reductions in nasal and bronchial hae viral titers observed for the remdesivir and remdesivirdiltiazem treatments correlated higher teer values in both hae compartments (fig. 
e the continuing -ncov epidemic threat of novel coronaviruses to global health -the latest novel coronavirus outbreak in wuhan who director-general's opening remarks at the media briefing on covid situation (public) genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding statpearls clinical features of patients infected with novel coronavirus in wuhan unique epidemiological and clinical features of the emerging novel coronavirus pneumonia (covid- ) implicate special control measures repurposing of drugs as novel influenza inhibitors from clinical gene expression infection signatures culturing of respiratory viruses in well-differentiated pseudostratified human airway epithelium as a tool to detect unknown viruses. influenza and other respiratory viruses antiviral drug screening by assessing epithelial functions and innate immune responses in human d airway epithelium model clinical and virological data of the first cases of covid- in europe: a case series gisaid -next hcov- app characterization of cellular transcriptomic signatures induced by different respiratory viruses in human reconstituted airway epithelia sars-cov- entry genes are most highly expressed in nasal goblet and ciliated cells within human airways sars-coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum ultrastructure of human nasal epithelium during an episode of coronavirus infection van den hoogen, mers-coronavirus replication induces severe in vitro cytopathology and is strongly inhibited by cyclosporin a or interferon-α treatment clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study towards standardization of immune functional assays a novel coronavirus from patients with pneumonia in china covid- : consider cytokine storm syndromes and immunosuppression broad-spectrum antiviral gs- inhibits both epidemic and 
zoonotic coronaviruses comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against mers-cov remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro are patients with hypertension and diabetes mellitus at increased risk for covid- infection? liverpool covid- interactions regulation of the interferon system: evidence that vero cells have a genetic defect in interferon production new world hantaviruses activate ifnλ production in type i ifn-deficient vero e cells scale bar: . µm. (d ) enlargement of a double-membraned spherule containing virions (v), double-membrane vesicles and electron-dense viral materials key: cord- -t xjy y authors: nazneen akhand, mst rubaiat; azim, kazi faizul; hoque, syeda farjana; moli, mahmuda akther; joy, bijit das; akter, hafsa; afif, ibrahim khalil; ahmed, nadim; hasan, mahmudul title: genome based evolutionary study of sars-cov- towards the prediction of epitope based chimeric vaccine date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t xjy y sars-cov- is known to infect the neurological, respiratory, enteric, and hepatic systems of human and has already become an unprecedented threat to global healthcare system. covid- , the most serious public condition caused by sars-cov- leads the world to an uncertainty alongside thousands of regular death scenes. unavailability of specific therapeutics or approved vaccine has made the recovery of covi- more troublesome and challenging. the present in silico study aimed to predict a novel chimeric vaccines by simultaneously targeting four major structural proteins via the establishment of ancestral relationship among different strains of coronaviruses. conserved regions from the homologous protein sets of spike glycoprotein (s), membrane protein (m), envelope protein and nucleocapsid protein (n) were identified through multiple sequence alignment. 
phylogenetic analyses of the whole genome and of the four proteins (s, e, m and n) reflected the close ancestral relation of sars-cov- to sars-cov- and bat coronavirus. numerous immunogenic epitopes (both t cell and b cell) were generated from the common fragments and further ranked on the basis of antigenicity, transmembrane topology, conservancy level, toxicity, allergenicity pattern and population coverage analysis. top putative epitopes were combined with appropriate adjuvants and linkers to construct a novel multiepitope subunit vaccine against covid- . the designed constructs were characterized based on physicochemical properties, allergenicity, antigenicity and solubility, which revealed the superiority of construct v in terms of safety and efficacy. essential molecular dynamics and normal mode analysis confirmed minimal deformability of the refined model at the molecular level. in addition, disulfide engineering was investigated to improve the stability of the protein. a molecular docking study indicated high binding affinity between construct v and hla alleles, as well as with different host receptors. microbial expression and translational efficacy of the constructs were checked using the pet a(+) vector of e. coli strain k . the present study might aid the development of preventive measures to combat covid- infections. however, in vivo and in vitro validation through wet lab trials using model animals is needed before the presented data can be implemented. a novel coronavirus named sars-cov- / -ncov was identified at the end of in wuhan, a city in the hubei province of china, causing severe pneumonia that led to a large number of deaths . gradually this virus emerged as a new threat to the whole world, affecting almost all parts of it. to date, the pathogen has affected countries, thus becoming a global public health emergency. 
a global public health concern over the pandemic potential of covid- was declared on january th, by the world health organization (who, ) . again, a worsening situation was announced on march owing to the increasing number of covid- infections (kunz and minder, ) . as of april , , the total number of people infected around the world exceeded , , and more than , had died, while , people had fully recovered from the infection (who, ) . alarmingly, the number of confirmed cases worldwide has exceeded one million by this time. it took more than three months to reach the first confirmed cases, but only days to detect the next cases. the situation is getting worse in the european region. total deaths in italy, spain, the usa, france and the united kingdom were , , , and respectively (as of april , ), and these numbers are rising day by day (who, ) . common clinical manifestations of covid- are fever, sputum production, shortness of breath, cough, fatigue, sore throat and headache, which can lead to severe pneumonia. a few patients also have gastrointestinal symptoms with diarrhea and vomiting (guan et al., ) . though several early studies showed that the mortality rate for sars-cov- is not as high ( - %), the latest global death rate for covid- is . %, which indicates an increasing trend . an investigation by the chinese center for disease control and prevention ( ) revealed that covid- is more prevalent in people aged years than in younger age groups (jeong-ho et al., ) . high fever and lymphocytopenia were found to be more common in covid- , though the proportion of patients without fever is also higher than in the earlier outbreaks caused by sars-cov ( %) and mers-cov ( %) (chen et al., ) . sars-cov- is a betacoronavirus that has a positive-sense, single-stranded rna molecule, - kb in length, as its genetic material and belongs to the family coronaviridae, order nidovirales (hui et al., ) . 
it shares genome similarity with sars-cov ( . %) and bat coronavirus ( %) (zhu et al., ) . however, the vector or carrier of sars-cov- remains unclear, though its detection was primarily linked to wuhan's huanan seafood wholesale market (lu et al., ; who, ) . although sars-cov- and bat coronavirus share substantial sequence similarity with covid- , the known mechanism of host infection and the death rate are quite different in the case of the novel coronavirus. in addition, there is an evolutionary distance between sars-cov- and bat coronavirus as well as covid- (hu et al., ; wu, b) . because of the high sequence variability of the pathogen, many of the efforts undertaken to develop a vaccine against sars-cov have remained unsuccessful . therefore, there is an urgent need to develop vaccines against sars-cov- based on an understanding of the actual evolutionary ancestral relationship. while some natural metabolites and traditional medications may provide comfort and alleviate a few symptoms of covid- , there is no proof that existing treatment procedures can effectively combat the disease (who, ). however, inactivated or live-attenuated forms of pathogenic organisms are usually recommended for the initiation of antigen-specific responses that alleviate or reduce the possibility of host experience with secondary infections (thompson & staats, ) . moreover, not all proteins are targeted for protective immunity; only a few proteins are necessary, depending on the microbe (tesh et al., ; li et al., ) . because they depend on sufficient antigen expression in experimental assays, traditional vaccines can take years to develop and can sometimes lead to undesirable consequences (purcell et al., ; petrovsky & aguilar, ) . the reverse vaccinology approach, on the other hand, is an effective way to develop a vaccine against covid- . 
in this method, computational analysis of the genomic architecture of a pathogenic candidate can predict the antigens of the pathogen without the prerequisite of culturing it under laboratory conditions. vaccines against a few pathogens that have so far resisted effective vaccine development may become possible through such an approach (rappuoli, ), which marks a major step in the development of vaccines against deadly pathogens. the strategy involves the comprehensive use of bioinformatics algorithms and tools to develop epitope based vaccine molecules, though further validation and experimental procedures are also needed (moxon et al., ) . in addition, peptide based subunit vaccines are biologically safer due to the absence of continuous in vitro culture during the production period, and also allow an appropriate activation of immune responses (purcell et al., ; dudek et al., ) . such immunoinformatic approaches have already been employed by researchers to design vaccines against a number of deadly pathogens including ebola virus (khan et al., ) , hiv (pandey et al., ) , arenaviruses , marburgvirus , norwalk virus (azim et al., b) , nipah virus (saha et al., ) , influenza virus (hasan et al., b) and so on. at present, a suitable peptide vaccine against sars-cov- that could efficiently generate enough immune response to destroy the virus is urgently needed. hence, the study was designed to develop a chimeric recombinant vaccine against covid- by targeting four major structural proteins of the pathogen, while revealing the evolutionary history of different species of coronavirus based on whole genome and protein domain-based phylogeny. complete genomes of covid- and other coronaviruses were retrieved from the ncbi (https://www.ncbi.nlm.nih.gov/), using the keyword 'coronavirus' and the search option 'nucleotide'. a total of complete genomes were retrieved, each with a unique identity (supplementary file ). 
protein sequences of the spike, envelope, membrane and nucleocapsid proteins were also retrieved from the corresponding genome sequences found in ncbi (supplementary file ). the complete genome sequences of coronaviruses and the spike, envelope, membrane and nucleocapsid protein sequences were employed to construct different phylogenetic trees. multiple sequence alignment (msa) of the complete genome and protein sequences was performed using the mafft v . tool (katoh & standley, ) . for the whole genome alignment, we used the mafft auto algorithm, while for the protein sequence alignment, the mafft g-ins-i algorithm was used with default parameters. next, the alignment was visualized using jalview- . (waterhouse et al., ) . alignment positions with more than % gaps were pruned from the coronavirus genome alignment using the phyutility . . program (smith & dunn, ) . likewise, positions with more than % gaps were removed from the spike protein alignment. partitionfinder- . . (lanfear et al., ) indicated the best fit substitution model for the complete genome sequences and the protein sequences. the phylogeny of the whole genome sequences of coronavirus was constructed using both the maximum likelihood method and the bayesian method. raxml version . . (stamatakis, ) was used with the substitution model gtrgammai and rapid bootstrap replicates. mrbayes version . . (ronquist et al., ) with the invgamma model was used for the coronavirus genomes. phylogenetic analyses of the four different protein sequences were performed using the raxml- . . tool. for spike and nucleocapsid proteins, we found protgammaiwag and protgammaiwag as the best fit model, respectively. again, protgammawag was the best fit model of evolution for both the membrane and envelope proteins. for the retrieval of the domain sequences of the stated protein sequences, the interpro database (https://www.ebi.ac.uk/interpro/) was utilized. finally, the interactive tree of life (itol; embl, heidelberg, germany) was used for the visualization of the phylogenetic trees. 
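the gap-pruning step described above can be illustrated with a minimal python sketch. this is not the phyutility implementation; the function name, the toy alignment and the threshold value used in the example are assumptions for illustration only, and the actual threshold should be taken from the analysis settings.

```python
def prune_gappy_columns(alignment, max_gap_fraction):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    max_gap_fraction: hypothetical threshold in [0, 1]; not taken from the paper.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    # Keep a column only if its share of gap characters is within the limit.
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment (invented data): the fourth column is two-thirds gaps.
toy = ["ATG-C", "A-G-C", "ATGTC"]
pruned = prune_gappy_columns(toy, 0.5)
print(pruned)
```

columns are filtered on the fraction of gap characters only, so a column that is mostly ambiguous bases would still be kept under this sketch.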
all the trees were rooted at the midpoint. in the present study, the reverse vaccinology technique was utilized to model a novel multiepitope subunit vaccine against -ncov. the scheme in figure represents the complete methodology that was adopted to develop the final vaccine construct. among proteins (available in the ncbi database) from different strains of novel coronavirus, four structural proteins, i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein, were prioritized for further investigation (supplementary file ). after sequence retrieval from ncbi, the sequences were subjected to blastp analysis to find the homologous protein sequences. multiple sequence alignment was done using clustal omega to identify the conserved regions (sievers and higgins, ) . the topology of each conserved region was predicted by tmhmm server v. . (http://www.cbs.dtu.dk/services/tmhmm/), while the antigenicity of the conserved regions was determined by vaxijen v . (doytchinova and flower, a) . only the common fragments were used for t-cell epitope enumeration via the t-cell epitope prediction server of iedb (http://tools.iedb.org/main/tcell/) (vita et al., ) . again, the tmhmm server was utilized for the prediction of transmembrane topology of the predicted mhc-i and mhc-ii binding peptides, followed by antigenicity scoring via the vaxijen v . server (krogh et al., ; doytchinova and flower, b) . the epitopes with antigenic potency were picked and used for further analysis. the level of conservancy indicates the ability of epitope candidates to impart broad-spectrum immunity. homologous sequence sets of the chosen antigenic proteins were retrieved from the ncbi database using the blastp tool. later, the conservancy analysis tool (http://tools.iedb.org/conservancy/) in iedb was used to determine the conservancy level of the predicted epitopes among different viral strains. 
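the idea behind epitope conservancy can be sketched in a few lines of python. this is a simplified illustration and not the iedb conservancy tool itself: the function name, the ungapped sliding-window comparison, the identity threshold and the toy sequences are all assumptions.

```python
def conservancy(epitope, sequences, min_identity=1.0):
    """Fraction of sequences containing a window that matches the epitope.

    A sequence counts as a hit when some ungapped window of the epitope's
    length reaches at least min_identity (a hypothetical parameter, 0..1).
    """
    if not epitope or not sequences:
        return 0.0
    k = len(epitope)

    def best_identity(seq):
        # Slide the epitope along the sequence and keep the best match ratio.
        best = 0.0
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            matches = sum(a == b for a, b in zip(epitope, window))
            best = max(best, matches / k)
        return best

    hits = sum(best_identity(seq) >= min_identity for seq in sequences)
    return hits / len(sequences)


# Toy homolog set (invented sequences, not real coronavirus proteins).
homologs = ["AAKLNDCC", "KLNEAA", "GGGG"]
exact = conservancy("KLND", homologs)          # exact matches only
relaxed = conservancy("KLND", homologs, 0.75)  # allow one mismatch in four
print(exact, relaxed)
```

lowering `min_identity` trades strictness for coverage, which is the kind of choice a conservancy analysis has to make explicit.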
the toxicity of non-allergenic epitopes was predicted using the toxinpred server (gupta et al., ) . hla distribution varies around the world among different ethnic groups and geographic regions. a population coverage study was therefore conducted using the iedb population coverage calculation server (vita et al., ) . to check the allergenicity of the proposed epitopes, four distinct servers, i.e. allergenfp , allertop (dimitrov et al., ) , allermatch (fiers et al., ) and allergen online (http://www.allergenonline.org/), were utilized. three different algorithms from iedb, i.e. bepipred linear epitope prediction . (jespersen et al., ) , emini surface accessibility prediction (emini et al., ) and kolaskar and tongaonkar antigenicity scale (kolaskar and tongaonkar, ), predicted the potential b-cell epitopes within conserved fragments of the chosen viral proteins. top ctl, htl and b cell epitopes were compiled to design the final vaccine constructs in the study. each vaccine construct commenced with an adjuvant followed by top ctl epitopes, htl epitopes and bcl epitopes respectively. for construction of the novel coronavirus vaccine, the chosen adjuvants, i.e. l /l ribosomal protein, beta defensin (a mer peptide) and haba protein (m. tuberculosis, accession number: agv . ), were used (rana and akhter, ) . several linkers such as eaaak, gggs, gpgpg and kk, in association with the padre sequence, were incorporated to construct effective vaccine sequences against covid- . the constructed vaccines were then checked for non-allergenicity using the algpred tool (azim et al., ) . the most promising vaccine among the three constructs was then determined by assessing the antigenicity and solubility of the vaccines via vaxijen v . (doytchinova and flower, b) and proso ii server (smialowski et al., ) , respectively. 
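the construct layout described above (adjuvant first, then ctl, htl and b-cell epitopes joined by linkers) can be sketched as simple string assembly. the text names the linkers but does not state which linker joins which elements, so the assignment below (eaaak after the adjuvant, gggs before ctl epitopes, gpgpg before htl epitopes, kk before b-cell epitopes) is an assumption, as are the function name and the placeholder sequences.

```python
def assemble_construct(adjuvant, padre, ctl, htl, bcell):
    """Join an adjuvant, a PADRE sequence and epitope lists into one peptide.

    Linker placement is an illustrative assumption, not the paper's
    documented layout.
    """
    parts = [adjuvant, "EAAAK", padre]
    # Each epitope class gets its own (assumed) linker in front of every epitope.
    for linker, epitopes in (("GGGS", ctl), ("GPGPG", htl), ("KK", bcell)):
        for epitope in epitopes:
            parts.extend([linker, epitope])
    return "".join(parts)


# Placeholder strings only; these are not real adjuvant or epitope sequences.
construct = assemble_construct(
    adjuvant="MAD",
    padre="PPP",
    ctl=["AAA", "CCC"],
    htl=["DDD"],
    bcell=["FFF"],
)
print(construct, len(construct))
```

keeping the linker map in one place makes it easy to generate alternative constructs by swapping the adjuvant, as was done for the three candidate constructs.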
protparam tool (https://web.expasy.org/protparam/), provided by expasy server (hasan et al., c ) was used to functionally characterize (gasteiger et al., ) the vaccine constructs. the studied functional properties were isoelectric ph, molecular weight, aliphatic index, instability index, hydropathicity, estimated half-life, gravy values and other physicochemical characteristics. alpha helix, beta sheet and coil structures of the vaccine constructs were analyzed through gor secondary structure prediction method using prabi (https://npsaprabi.ibcp.fr/). in addition, espript . (robert & gouet, ) was also used to predict the secondary structure of the stated protein sequences. vaccine d model was generated on the basis of percentage similarity between target protein and available template structures from pdb by using i-tasser (peng and xu, ) . the modeled structures were further refined via fg-md refinement server. structure validation was performed by ramachandran plot assessment in rampage (hasan et al., b) . by utilizing dbd server, probable disulfide bonds were designed for the anticipated vaccine constructs (craig and dombkowski, ) . the value of energy was considered < . , while the chi value for the residue screening was chosen between - to + for the operation (hasan et al., b) . the b-cell epitopes of putative vaccine molecules were predicted via ellipro server (http://tools.iedb.org/ellipro/) with minimum score . and maximum distance of Å (ponomarenko et al., ) . moreover, ifn-inducing epitopes within the vaccine were predicted using ifnepitope with motif and svm hybrid detection strategy (hajighahramani et al., ). normal mode analysis (nma) was performed to predict the stability and large scale mobility of the vaccine protein. the imod server determined the stability of construct v by comparing the essential dynamics to the normal modes of protein (aalten et al, ; wuthrich et al., ) . 
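two of the physicochemical properties listed above have simple closed forms and can be sketched directly. this is an independent illustration, not the protparam code: gravy is taken as the mean kyte-doolittle hydropathy per residue, and the aliphatic index as x(ala) + 2.9 x(val) + 3.9 (x(ile) + x(leu)) with x in mole percent. function names and the toy peptide are assumptions.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide (invented), one residue each of Ala, Val, Ile and Leu.
toy_peptide = "AVIL"
print(gravy(toy_peptide), aliphatic_index(toy_peptide))
```

a positive gravy value indicates an overall hydrophobic peptide, and a higher aliphatic index is usually read as a sign of greater thermostability.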
it is a recommended alternative to costly atomistic simulation (tama and brooks, ; cui and bahar, ) and shows much quicker and efficient assessments than the typical molecular dynamics (md) simulations tools (prabhakar et al., ; awan et al., ) . the main-chain deformability was also predicted by measuring the efficacy of target molecule to deform at each of its residues. the motion stiffness was represented via eigenvalue, while the covariance matrix and elastic network model was also analyzed. patchdock server was prioritized for docking between different hla alleles and the putative vaccine molecules. in addition, the superior construct was also docked with different human immune receptors such as, ace , apn, dpp and tlr- .the d structure of these receptors were retrieved from rcsb protein data bank. detection of highest binding affinity between the putative vaccine molecules and the receptor was experimented based on the lowest interaction energy of the docked structure. jcat tool was utilized for codon adaptation in order to fasten the expression of vaccine construct v in e. coli strain k . for this, some restriction enzymes (i.e. bgli and bglii), rho independent transcription termination and prokaryote ribosome-binding site were put away from the work (grote et al., ) . after that, the mrna sequence of constructed v vaccine was ligated within bgli ( ) and bglii ( ) restriction site at the c-terminal and n-terminal sites respectively. snapgene tool was utilized for in silico restriction cloning (solanki and tiwari, ). in the phylogenetic analysis, we introduced different coronavirus from three different genera: (forni et al., ; zhou et al., ; zumla et al., ) . among these, the first five species belong to the beta coronavirus genera, while the last two belongs to the alpha genera. 
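the quantities checked in the codon adaptation step described above (codon adaptation index, gc content and the absence of bgli/bglii sites) can be sketched as follows. this is not the jcat algorithm: cai is computed here as the geometric mean of relative adaptiveness values, the codon usage table is a tiny invented example rather than real e. coli k data, and the function names are assumptions. the recognition patterns used are the standard ones for bgli (gccnnnnnggc) and bglii (agatct).

```python
import math
import re


def relative_adaptiveness(usage):
    """Map each codon to count / max count among its synonymous codons.

    usage: {amino_acid: {codon: count}} with positive counts.
    """
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights


def cai(dna, weights):
    """Geometric mean of the weights of all codons found in the table."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


def gc_content(dna):
    """Percentage of G and C bases."""
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


SITES = {"BglI": "GCC[ACGT]{5}GGC", "BglII": "AGATCT"}


def find_sites(dna):
    """Start positions (0-based) of each restriction site in the sequence."""
    return {name: [m.start() for m in re.finditer(pattern, dna)]
            for name, pattern in SITES.items()}


# Invented two-amino-acid usage table, for illustration only.
toy_usage = {"K": {"AAA": 3, "AAG": 1}, "L": {"CTG": 5, "CTT": 1}}
w = relative_adaptiveness(toy_usage)
print(cai("AAACTG", w), cai("AAGCTG", w), gc_content("AAGCTG"))
print(find_sites("TTAGATCTAA"))
```

an insert is suitable for cloning between two chosen sites only if `find_sites` returns empty lists for both enzymes within the insert itself.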
apart from the human coronaviruses, we introduced other coronaviruses which choose different species of bats, whale, turkey, rat, mink, ferret, swine, camel, rabbit, cow and others as host (supplementary table- domain analysis of spike protein of coronaviruses reveals that they contain mainly one signature domains namely, coronavirus s glycoprotein (ipr ), which is present in all the candidates. all other betacoronavirus contains spike receptor binding protein (ipr ), coronavirus spike glycoprotein hapted receptor domain (ipr ) and spike receptor binding domain superfamily (ipr ). sars-cov- contains an extra domain, namely spike glycoprotein n-terminal domain (ipr ), which is also present in some the sub-genera (embecovirus) of betacoronavirus, but not in covid- . one important finding in our study is that the covid- candidates do not contain the domain spike glycoprotein (ipr ), which is present in the sars-cov- (figure ) . the secondary structure prediction study shows a large numbers of cysteine residues which contribute to the formation of disulfide bonds within the spike protein. most of them fall within the s spike protein, which is amino acid long in sars-cov- , while amino acids long in covid- . the rgd motif which is conserved within the covid- is present in the vicinity of the s protein. it exists as kgd that clearly demonstrates the mutation over the short time period. again, the receptor binding domain and receptor binding motif analyses disclose variations within several region between the covid and sars-cov- (supplementary file ). the domain-based phylogenetic analysis reflects two main divisions, where the all the novel betacoronavirus i.e., covid form clade with the sars-cov- ; while other betacoronavirus fall in another clade which further divide to give rise different sub-genera. this clearly shows that the covid- exerts specific ancestral connection to the sars-cov- in terms of spike glycoproteins. 
interestingly, our study also revealed close relatedness of both the sars-cov- and covid- to the bat betacoronavirus that belongs to the hibecovirus sub-genus. however, in our study, the bat coronaviruses of nobecovirus subgenus did not fall into the same clade of novel coronaviruses. the phylogenetic study and msa also revealed that, the functional portion of the spike glycoprotein domain and spike glycoprotein n-terminal domain might be lost from the covid- during the course of evolution. the envelope proteins of both betacoronavirus and alphacoronavirus contain only one protein domain (ipr ) namely, nonstructural protein ns or small envelope protein e (ns /e). this domain is well conserved in coronavirus and also found in murine hepatitis virus. on the other hands, the gamma coronavirus shows the exception, which possess (ipr ) ibv c protein domain, which thought to be expressed from the orf c gene of infectious bronchitis virus (jia & naqi, (figure ) . in spite to the previous findings, where it was found that the envelope proteins of the mers virus and sars-cov- exerted close proximity in terms of secondary structure and functions (surya et al., ) . unlike to earlier finding, we got that gamma corona virus candidate in our study shows close connection with both sars-cov- and covid- in terms of envelope proteins. membrane the length of nucleocapsid proteins of betacoronavirus genus ranges from to amino acids. three signature domains are mainly present in the nucleocapsid proteins, which are: coronavirus nucleocapsid protein (ipr ), nucleocapsid proteins c-terminal (ipr ) and nucleocapsid proteins n-terminal (ipr ). however, in our experiment, we didn't find these domains in hcov-hku ( figure ) showed that among the immunogenic conserved sequences from the corresponding proteins except spike glycoprotein met the criteria of desired exomembrane characteristics (table ) . 
a plethora of immunogenic epitopes were generated from the conserved sequences that were able to bind with most noteworthy number of hla cells (supplementary table , supplementary table , supplementary table and supplementary table ). top epitopes with exomembrane characteristics were ranked for each individual protein after investigating their antigenicity score and transmembrane topology (table ) . epitopes from each protein showed high level of conservancy up to % (table ). toxinpred server predicted the relative toxicity of each epitope which indicated that the top epitopes were non-toxin in nature (supplementary table ). population coverage of four structures proteins were also done for the predicted ctl and htl epitopes. from the screening, results showed that population of the various geographic regions could be covered by the predicted t-cell epitopes ( figure ) . finally, the allergenic epitopes were excluded from the list based on the evaluation of four allergenicity prediction server (supplementary table ). top b-cell epitopes were predicted for spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein using distinct algorithms (i.e. bepipred linear epitope prediction, emini surface accessibility, kolaskar & tongaonkar antigenicity prediction) from iedb. epitopes were also allowed to analyze their vaxijen scoring and allergenicity (table ) . three putative vaccine molecules (i.e. v , v and v ) were constructed, each comprising a protein adjuvant, eight t-cell epitopes, twelve b-cell epitopes and respective linkers (supplementary table ). padre sequence was included to extend the efficacy and potency of the constructed vaccine. the putative vaccine constructs, v , v and v were , and residues long respectively. however, allergenicity score of v (- . ) revealed that it was superior among the three constructs in terms safety and efficacy. v also had a solubility score ( . ) ( figure e ) and antigenicity ( . 
) over threshold value (table ) . protparam tool was employed to analyze the physicochemical properties of v . figure ) . tertiary structure of the putative vaccine construct v was generated using i-tasser server ( figure a and b). the server used best templates with highest significant (measured via zscore) from the lomets threading program to model the d structure. after refinement, ramachandran plot analysis revealed that . % and . % residues were in the favored and allowed regions respectively, while only residues ( . %) occupied in the outlier region ( figure c ). the overall quality factor determined by errat server was . % ( figure d ) ellipro server predicted a total conformational b-cell epitopes from the d structure of the construct v . epitopes no. were considered as the broadest conformational b cell epitopes with amino acid residues (figure and supplementary table ). stability of the vaccine construct v was investigated through mobility analysis ( figure a and b), b-factor, eigenvalue & deformability analysis, covariance map and recommended elastic network model. results revealed that the placements of hinges in the chain was insignificant ( figure c ) and the b-factor column gave an averaged rms ( figure d ). the estimated higher eigenvalue . e - ( figure e ) indicated low chance of deformation of vaccine protein v . the correlation matrix and elasticity of the construct have been shown in figure g and figure h , respectively. the structural interaction between hla alleles and the designed vaccines were investigated by molecular docking approach. the server detected the complexed structure by focusing on complementarity score, ace (atomic contact energy) and estimated interface area of the compound ( table ). the molecular affinity between the putative vaccine molecules v and several immune receptors were also experimented. the result showed that construct v interacted with each receptor with significantly lower binding energy (figure ). 
the codon adaptation index (cai) and gc content for the predicted codons of the putative vaccine construct v were . and . % respectively. an insert of bp was obtained which lacked the restriction sites for bgli and bglii, thus making it safe for cloning. the codons were inserted into the pet a(+) vector alongside two restriction sites (bgli and bglii) and a clone of base pairs was generated ( figure ). in december , a new coronavirus outbreak emerged in wuhan, china, causing alarm among the medical community, as well as in the rest of the world . the new species has been renamed -ncov or sars-cov- , and has already caused a considerable number of infections and deaths in china, italy, spain, iran, the usa and, to a growing degree, throughout the world. the major outbreak and spread of sars-cov- in forced the scientific community to devote considerable investment and research activity to developing a vaccine against the pathogen. however, owing to its high infectivity and pathogenicity, the culture of sars-cov- needs biosafety level conditions, which may obstruct the rapid development of any vaccine or therapeutics. it has been reported that about companies and academic institutions are engaged in such work (spinney et al., ; ziady et al., ) . among the potential sars-cov- vaccines in the pipeline, four have nucleic acid based designs, four involve non-replicating viruses or protein constructs, two contain live attenuated virus and one involves a viral vector (pang et al., ) , while only one, called mrna- (developed by niaid in collaboration with moderna, inc.), has been confirmed to start a phase- trial (nih, ) . in this study, however, we took a different approach, using immunoinformatics to exploit the advantages of different genome and proteome databases. 
computational vaccine predictions were adopted by the researchers to design vaccines against both mers-cov (sudhakar et al., ; fernando et al., ) and sars-cov- (yang et al., ; oany et al., ) , targeting the outer membrane or functional proteins (sharmin and islam, ) . several in silico strategies have also been employed to predict potential t cell and b cell epitopes against sars-cov- , either emphasizing on spike glycoprotein or envelope proteins (behbahani, ; rasheed et al., ) . none of the studies, however, focused on other structural proteins. moreover, random genetic changes and mutations in the protein sequences (yin, ) may obstruct the development of effective vaccines and therapeutics against human coronavirus in the future. hence, the present study was employed to identify the similarity and divergence among the close relatives of the target pathogen and develop a novel chimeric recombinant vaccine considering all major structural proteins i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein simultaneously. the topology of the phylogenetic trees of the whole genome and the stated four proteins sequences from different species of coronaviruses reveal that sars-cov- and bat coronaviruses are the closest homologs of the novel coronaviruses. our results infer a significant level of similarities within the covid- and sars-cov- which was also aligned with the previous findings (jaimes et al., ; wu, a) . the sequence similarities between the sars-cov, bat coronaviruses and the covid- from the reported studies (hu et al., ; wu, b) suggests that those are distantly related, in spite those are capable of infecting the humans and therefore possess the adaptive convergent evolution. interestingly, the covid- envelope proteins form clade with the turkey coronavirus which belongs to gamma coronavirus genus. 
so, in terms of envelope proteins, the envelope gene of turkey coronavirus might contribute to the convergence process, which need further analysis. in addition, from the domain-based phylogeny of nucleocapsid proteins, it can be deduced that this protein might have originated in bats and was transmitted to camels and then later on choose human as the potential host. overall, the covid- might go through complex adaptation strategies in order to be transmitted into the human via different animals. the homologous protein sets for four structural proteins of coronavirus were sorted to identify conserved regions through blastp analysis and msa. only the conserved sequences were utilized to identify potential b-cell and t-cell epitopes for each individual protein (table ) . thus, our constructs are expected to stimulate a broad-spectrum immunity in host upon administration. cytotoxic cd +t lymphocytes (ctl) play a crucial role to control the spread of pathogens by recognizing and killing diseased cells or by means of antiviral cytokine secretion (garcia et al., ) . thus, t cell epitope-based vaccination is a unique process to confer defensive response against pathogenic candidates (shrestha, ) . approximately mhc-i peptides (ctl epitopes) and mhc-ii peptides (htl epitopes) were predicted via iedb server, from which we screened the top ones through analyzing the antigenicity score, transmembrane topology, conservancy level and other important physiochemical parameters employing a number of bioinformatics tools ( table ). the top epitopes from each protein was further assessed by investigating the toxicity profile and allergenicity pattern. different servers rely on different parameters to predict the allergenic nature of small peptides. therefore, we used distinct servers for such assessment and the epitopes predicted as non-allergen at least via servers were retained for further analysis (supplementary table ). 
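the consensus filter described above, in which an epitope is retained only if enough servers call it non-allergenic, can be sketched as a simple vote count. the minimum number of agreeing servers is left as a parameter rather than fixed, and the function name, the epitope labels and the example calls are invented for illustration; only the four server names come from the text.

```python
def retain_non_allergens(predictions, min_votes):
    """Keep epitopes called non-allergenic by at least min_votes servers.

    predictions: {epitope: {server_name: is_allergen (bool)}}.
    min_votes: hypothetical cutoff; set it to the value used in the study.
    """
    retained = []
    for epitope, calls in predictions.items():
        # Count the servers that predict the epitope to be a non-allergen.
        non_allergen_votes = sum(not is_allergen for is_allergen in calls.values())
        if non_allergen_votes >= min_votes:
            retained.append(epitope)
    return retained


# Invented calls for three placeholder epitopes across the four servers.
toy_calls = {
    "EPITOPE_1": {"allergenfp": False, "allertop": False,
                  "allermatch": False, "allergen online": True},
    "EPITOPE_2": {"allergenfp": True, "allertop": True,
                  "allermatch": False, "allergen online": True},
    "EPITOPE_3": {"allergenfp": False, "allertop": False,
                  "allermatch": False, "allergen online": False},
}
kept = retain_non_allergens(toy_calls, min_votes=3)
print(kept)
```

a majority-style cutoff tolerates disagreement from a single server, which matters because the servers rely on different parameters.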
vaccines initiate the generation of effective antibodies, which are usually produced by b cells and exert effector functions by specifically targeting foreign particles (cooper & nemerow, ) . the potential b cell epitopes were generated by three different algorithms (bepipred linear epitope prediction . , kolaskar and tongaonkar antigenicity prediction and emini surface accessibility prediction) from the iedb database (table ) . suitable linkers and adjuvants were used to combine the top finalized epitopes from each protein, which led to the development of multi epitope vaccine molecules (supplementary table ). as the padre sequence is usually recommended to lessen the effect of hla polymorphism in the population (ghaffari-nazari et al., ) , it was also included in the final vaccine molecule. here, adjuvants would enhance the immunogenicity of the vaccine constructs, and appropriate separation of epitopes in the host environment would be ensured by the linkers (yang et al., ) . allergenicity, physiochemical properties, antigenicity and three-dimensional structure of the vaccine constructs were characterized, and it was concluded that v was superior to the v and v vaccine constructs. the final construct was also found to contain several interferon-α producing epitopes (supplementary table ). the vaccine protein (v ) was subjected to disulfide engineering to enhance its stability. analysis of the normal modes in internal coordinates by imods was employed to investigate the collective motion of vaccine molecules (lopez-blanco et al., ) . a negligible chance of deformability at the molecular level was found for the putative vaccine construct v , thereby strengthening our prediction. moreover, molecular docking was performed to analyze the molecular affinity of the vaccine with different hla molecules i.e. drb * , drb * , drb * , drb * , drb * and drb * (table ) . 
it has been reported that a specific receptor-binding domain of the cov spike protein usually recognizes its host receptor ace (angiotensin-converting enzyme ) (li. et al., ; li, ). previous studies also identified dipeptidyl peptidase (dpp ) as a functional receptor for human coronavirus (raj et al., ). therefore, we performed another docking study prioritizing these immune receptors to strengthen our prediction (figure ). the results showed that the designed construct bound to the selected receptors with minimum binding energy, which was biologically significant. finally, in-silico restriction cloning was adopted to check the suitability of construct v for entry into the pet a (+) vector and expression in e. coli strain k (figure ). traditional approaches to vaccine development are time consuming and laborious. moreover, the results may not always be as expected or fruitful (stratton et al., ; hasan et al., ). in silico prediction and prescreening methods, by contrast, offer some advantages while saving time and production cost. therefore, the present study may aid in the development of preventive strategies and novel vaccines to combat infections caused by -ncov. however, further wet lab trials involving model organisms are needed to validate our findings. the darker the greys, the stiffer the springs (h). a comparison of techniques for calculating protein essential dynamics mutation-structure function relationship based integrated strategy reveals the potential impact of deleterious missense mutations in autophagy related proteins on hepatocellular carcinoma (hcc): a comprehensive informatics approach conglomeration of highly antigenic nucleoproteins to inaugurate a heterosubtypic next generation vaccine candidate against arenaviridae family. 
biorxiv immunoinformatics approaches for designing a novel multi epitope peptide vaccine against human norovirus in silico design of novel multi-epitope recombinant vaccine based on coronavirus spike glycoprotein genomic variance of the -ncov coronavirus epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study pathogenicity and transmissibility of -ncov-a quick overview and comparison with other emerging viruses the role of antibody and complement in the control of viral infections disulfide by design . : a web-based tool for disulfide engineering in proteins normal mode analysis theoretical and applications to biological and chemical systems allertop v. -a server for in silico prediction of allergens allergenfp: allergenicity prediction by descriptor fingerprints identifying candidate subunit vaccines using an alignmentindependent method based on principal amino acid properties a server for prediction of protective antigens, tumour antigens and subunit vaccines epitope discovery and their use in peptide based vaccines induction of hepatitis a virus-neutralizing antibody by a virus-specific synthetic peptide engineering a replication-competent, propagation defective middle east respiratory syndrome coronavirus as a vaccine candidate allermatch™, a webtool for the prediction of potential allergenicity according to current fao/who codex alimentarius guidelines molecular evolution of human coronavirus genomes structural basis of t cell recognition improving multi-epitope long peptide vaccine potency by using a strategy that enhances cd +t help in balb/c mice a decade after sars: strategies for controlling emerging coronaviruses jcat: a novel tool to adapt codon usage of a target gene to its potential expression host clinical characteristics of coronavirus disease in china silico approach for predicting toxicity of peptides and proteins vaccinomics strategy for developing a unique multi-epitope monovalent 
vaccine against marburg marburgvirus. infection, genetics and evolution contriving a chimeric polyvalent vaccine to prevent infections caused by herpes simplex virus (type- and type- ): an exploratory immunoinformatic approach reverse vaccinology approach to design a novel multi-epitope subunit vaccine against avian influenza a (h n ) virus, microbial pathogenesis genomic characterization and infectivity of a novel sars-like coronavirus in chinese bats. emerging microbes and infections clinical features of patients infected with novel coronavirus in wuhan, china the continuing -ncov epidemic threat of novel coronaviruses to global health-the latest novel coronavirusoutbreak in wuhan structural modeling of -novel coronavirus (ncov) spike protein reveals a proteolytically-sensitive activation loop as a distinguishing feature compared to sars-cov and related sars-like coronaviruses chinese scientists race to develop vaccine as coronavirus death toll jumps". south china morning post bepipred- . 
: improving sequence-based bcell epitope prediction using conformational epitopes sequence analysis of gene , gene and gene of avian infectious bronchitis virus strain cu-t mafft multiple sequence alignment software version : improvements in performance and usability epitope-based peptide vaccine design and target site depiction against ebola viruses: an immunoinformatics study a semi-empirical method for prediction of antigenic determinants on protein antigens predicting transmembrane protein topology with a hidden markov model: application to complete genomes covid- pandemic: palliative care for elderly and frail patients at home and in residential and nursing homes partitionfinder : new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses receptor recognition mechanisms of coronaviruses: a decade of structural studies peptide vaccine: progress and challenges angiotensin-converting enzyme is a functional receptor for the sars coronavirus imods: internal coordinates normal mode analysis server outbreak of pneumonia of unknown etiology in wuhan china: the mystery and the miracle subcellular location and topology of severe acute respiratory syndrome coronavirus envelope protein nih clinical trial of investigational vaccine for covid- begins. study enrolling seattle-based healthy adult volunteers. accessed on design of an epitope-based peptide vaccine against spike protein of human coronavirus: an in silico approach. drug design, development and therapy immunoinformatics approaches to design a novel multi-epitope subunit vaccine against hiv infection. vaccine potential rapid diagnostics, vaccine and therapeutics for novel coronavirus ( -ncov): a systematic review exploiting structure information for protein alignment by statistical inference vaccine adjuvants: current state and future trends. 
immunology and cell biology monomerization alters the dynamics of the lid region in campylobacter jejuni cstii: an md simulation study more than one reason to rethink the use of peptides in vaccine design dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc reverse vaccinology silico identification of novel b cell and t cell epitopes of wuhan coronavirus ( -ncov) for effective multi epitope-based peptide vaccine production deciphering key features in protein structures with the new endscript server mrbayes . : efficient bayesian phylogenetic inference and model choice across a large model space in silico identification and characterization of common epitope-based peptide vaccine for nipah and hendra viruses. asian pacific journal of tropical medicine a highly conserved wdypkcdra epitope in the rna directed rna polymerase of human coronaviruses can be used as epitope-based universal vaccine design role of cd + t cells in control of west nile virus infection clustal omega, accurate alignment of very large numbers of sequences protein solubility: sequence based prediction and experimental verification phyutility: a phyloinformatics tool for trees, alignments and molecular data subtractive proteomics to identify novel drug targets and reverse vaccinology for the development of chimeric vaccine against acinetobacter baumannii when will a coronavirus vaccine be ready?". the guardian. 
retrieved raxml version : a tool for phylogenetic analysis and post-analysis of large phylogenies immunization safety review: vaccinations and sudden unexpected death in infancy platform strategies for rapid response against emerging coronaviruses: mers-cov serologic and antigenic relationships in vaccine design potential factors influencing repeated sars outbreaks in china covid- : epidemiology, evolution, and cross-disciplinary perspectives mers coronavirus envelope protein has a single transmembrane domain that forms pentameric ion channels symmetry, form, and shape: guiding principles for robustness in macromolecular machines efficacy of killed virus vaccine, live attenuated chimeric virus vaccine, and passive immunization for prevention of west nile virus encephalitis in hamster model. emerging infectious diseases cytokines: the future of intranasal vaccine adjuvants. clinical and developmental immunology the immune epitope database (iedb) . a novel coronavirus outbreak of global health concern. the lancet jalview version -a multiple sequence alignment editor and analysis workbench coronavirus disease (covid- ) outbreak infection prevention and control during health care when covid- is suspected: interim guidance characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention a new coronavirus associated with human respiratory disease in china strong evolutionary convergence of receptor-binding protein spike between covid- and sars-related coronaviruses strong evolutionary convergence of receptor-binding protein spike between covid- and sars-related coronaviruses correlations between internal mobility and stability of globular proteins an evolutionary rgd motif in the spike protein of sars-cov- may serve as a potential high risk factor for virus infection ? 
in silico design of a dna-based hiv- multiepitope vaccine for chinese populations a dna vaccine induces sars coronavirus neutralization and protective immunity in mice genotyping coronavirus sars-cov- : methods and implications atomic-level protein structure refinement using fragment-guided molecular dynamics conformation sampling a pneumonia outbreak associated with a new coronavirus of probable bat origin network-based drug repurposing for novel coronavirus -ncov/sars-cov- a novel coronavirus from patients with pneumonia in china biotech company moderna says its coronavirus vaccine is ready for first tests coronaviruses-drug discovery and therapeutic options authors would like to acknowledge the department of biochemistry and chemistry, department of microbial biotechnology and department of pharmaceuticals and industrial biotechnology of sylhet agricultural university for the technical support of the project. this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. authors declare that they have no conflict of interests. supplementary table : predicted ctl and htl epitopes of spike glycoprotein. key: cord- -c y lygj authors: bozzo, caterina prelli; nchioua, rayhane; volcic, meta; wettstein, lukas; weil, tatjana; krüger, jana; heller, sandra; conzelmann, carina; müller, janis; gross, rüdiger; zech, fabian; schütz, desiree; koepke, lennart; stuerzel, christina m; schüler, christiane; stenzel, saskia; braun, elisabeth; weiß, johanna; sauter, daniel; münch, jan; stenger, steffen; sato, kei; kleger, alexander; goffinet, christine; sparrer, konstantin m.j.; kirchhoff, frank title: ifitm proteins promote sars-cov- infection of human lung cells date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: c y lygj interferon-induced transmembrane proteins (ifitms , and ) restrict numerous viral pathogens and are thought to prevent infection by severe acute respiratory syndrome coronaviruses (sars-covs). however, most evidence comes from single-round pseudoparticle infection of cells artificially overexpressing ifitms. here, we confirmed that overexpression of ifitms blocks pseudoparticle infections mediated by the spike proteins of β-coronaviruses including pandemic sars-cov- . in striking contrast, however, endogenous ifitm expression promoted genuine sars-cov- infection in human lung cells both in the presence and absence of interferon. ifitm was most critical for efficient entry of sars-cov- and enhanced virus production from calu- cells by several orders of magnitude. ifitms are expressed and further induced by interferons in the lung representing the primary site of sars-cov- infection as well as in other relevant tissues. our finding that ifitms enhance sars-cov- infection under conditions approximating the in vivo situation shows that they may promote viral invasion during covid- . highlights overexpression of ifitm , and restricts sars-cov- infection endogenous ifitm , and boost sars-cov- infection of human lung cells ifitm is critical for efficient entry of sars-cov- in calu- cells ( figures s b, s c) . taken together, our results show that all three ifitms prevent sars-cov- s/ace -mediated attachment and membrane fusion in single round pseudotype infection assays. it has been reported that overexpression of ifitms inhibits the function of the s proteins of the two to determine whether ifitms also prevent infection by genuine sars-cov- , we infected hek t cells overexpressing ace alone or together with ifitms with a wildtype isolate of only observed at a dilution of - , while silencing of ifitm expression reduced the cpe caused by infectious sars-cov- down to a dilution of - . 
no virus-induced cpe was observed at any dilution upon depletion of ifitm or all three ifitm proteins ( figure s d ). this result indicates that lack of ifitm expression reduced infectious sars-cov- yield by more than four orders of magnitude. our finding that ifitm had the strongest effect agrees with the results of the qpcr assays. notably, titration experiments showed that ifitms do not promote genuine sars-cov- infection in hek t cells over a broad range of expression levels ( figure s e ). thus, the opposing the exact reasons for the opposing effects of overexpressed and endogenous ifitms exposure to single cycle sars-cov- s pseudotyped virus and wildtype sars-cov- remain to be determined. previous studies suggested that the subcellular localization, membrane curvature, endocytic activity the authors declare no competing interests. ifitm-family proteins: the cell's first line of antiviral defense severe respiratory illness caused by a novel coronavirus imbalanced host response to sars-cov- drives development of covid- a sensitive and specific enzyme-based assay detecting hiv- virion fusion in primary t lymphocytes high-efficiency transformation of mammalian cells by plasmid dna regulation of the trafficking and antiviral activity of ifitm by post-translational modifications the broad-spectrum antiviral functions of ifit and ifitm proteins psgl- restricts hiv- infectivity by blocking virus particle attachment to target cells more than meets the i: the diverse antiviral and cellular functions of interferon-induced transmembrane proteins opposing activities of ifitm proteins in sars-cov- infection ifitm proteins -cellular inhibitors of viral entry intracellular detection of viral nucleic acids vpu modulates dna repair to suppress innate sensing and hyper-integration of hiv- ifitm proteins inhibit entry driven by the spike protein: evidence for cholesterol-independent mechanisms palmitoylome profiling reveals s-palmitoylation-dependent antiviral activity of ifitm 
cholesterol -hydroxylase suppresses sars-cov- replication by blocking membrane fusion antiviral protection by ifitm in vivo interferon induction of ifitm proteins promotes infection by human coronavirus oc identification of residues controlling restriction versus enhancing activities of ifitm proteins on entry of human coronaviruses ifitm genes, variants, and their roles in the control and pathogenesis of viral infections ly e restricts the entry of human coronaviruses, including the currently pandemic cov- a pneumonia outbreak associated with a new coronavirus of probable bat origin figure s (related to figure ). spectrum and determinants of the spike-mediated fusion inhibition of ifitms schematic depiction of the split-gfp fusion assay. (b) proximity ligation assay of ace and sars-cov- spike in hela cells. cells were transfected with sirna (ctrl, ifitm and ifitm ) and infected with vsv(luc)Δg*sars-cov- -s for h at °c. lines represent means of n=( - cells)±sem. (c) exemplary images of the pla the inset depicts a magnification, membrane outline of a cell depicted by a dotted white line. scale bar, µm. (d) alignment of the spike amino acid sequences from sars-cov- alanine substitutions are color coded. blue, ubiquitination-negative mutant. red, palmitoylation negative mutant. pink, y a. orange, ntΔ aa . (f) quantification of the entry of vsv(luc)Δg*-sars-cov- -s by luciferase activity in hek t cells transiently expressing indicated proteins (ifitm mutants) and infected h post-transfection with the vsvpp (moi . ) for h. bars represent means of n= ±sem. (g) quantification of the entry of hiv(fluc)Δenv*-sars-cov- -s by luciferase activity in hek t cells stably expressing indicated proteins (ifitm mutants) and ace standard curve (left) and raw qrt-pcr ct values (right) corresponding to the sars-cov- rna copy numbers per ml shown in figure panel a. the bar diagram shows mean raw qrt-pcr figure s (related to figure ). 
impact of endogenous and transient ifim expression on sars-cov- replication exemplary immunoblots of whole cell lysates of calu- cells transiently transfected with sirna either control (si.nt) or targeting ifitm , (si.ifitm , , ) as indicated d) cpe (white) after h caused by infection of monolayers of vero cells with serial dilutions of calu- supernatants form figure d. cells were stained with crystal violet (blue). (e) sars-cov- rna production from hek t cells transient expressing ace and increasing levels of the indicated ifitm proteins. quantification of viral n gene rna by qrt-pcr in the supernatant of hek t was performed h post-infection with key: cord- - zr b authors: ravichandran, supriya; coyle, elizabeth m.; klenow, laura; tang, juanjie; grubbs, gabrielle; liu, shufeng; wang, tony; golding, hana; khurana, surender title: antibody repertoire induced by sars-cov- spike protein immunogens date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zr b multiple vaccine candidates against sars-cov- based on viral spike protein are under development. however, there is limited information on the quality of antibody response generated following vaccination by these vaccine modalities. to better understand antibody response induced by spike protein-based vaccines, we immunized rabbits with various sars-cov- spike protein antigens: s-ectodomain (s +s ) (aa - ), which lacks the cytoplasmic and transmembrane domains (ct-tm), the s domain (aa - ), the receptor-binding domain (rbd) (aa - ), and the s domain (aa - as control). antibody response was analyzed by elisa, surface plasmon resonance (spr) against different spike proteins in native conformation, and a pseudovirion neutralization assay to measure the quality and function of the antibodies elicited by the different spike antigens. all three antigens (s +s ectodomain, s domain, and rbd) generated strong neutralizing antibodies against sars-cov- . 
the vaccination-induced antibody repertoire was analyzed by sars-cov- spike genome fragment phage display libraries (sars-cov- gfpdl), which identified immunodominant epitopes in the s , s -rbd and s domains. furthermore, these analyses demonstrated that, surprisingly, the rbd immunogen elicited a higher antibody titer with -fold higher affinity antibodies to native spike antigens compared with other spike antigens. these findings may help guide rational vaccine design and facilitate development and evaluation of effective therapeutics and vaccines against covid- disease. one sentence summary sars-cov- spike induced immune response the ongoing pandemic of sars-cov- has resulted in more than million human cases and , deaths as of th april . therefore, development of effective vaccines for commercially available sars-cov- spike protein and subdomains: the spike s +s ectodomain (aa - ), the s domain (aa - ), rbd domain (aa - ), and the s domain (aa - ) as a control, which is devoid of rbd (fig. a, suppl. fig. ). these spike proteins were produced either in hek mammalian cells (s and rbd) or in insect cells (s +s ectodomain and s domain). the purified s +s ectodomain, the s domain, and the rbd proteins retained functional activity, as demonstrated in an spr assay using human ace protein, the sars-cov- receptor (fig. b). the s +s ectodomain, s domain and rbd (black, blue and red binding curves, respectively) demonstrated high-affinity interaction with human ace . the control s domain protein (green curve), lacking the rbd, did not bind to human ace , demonstrating the specificity of this receptor-binding assay (fig. b). female new zealand white rabbits were immunized twice intra-muscularly at a -day interval with g of the purified proteins mixed with emulsigen adjuvant. sera were collected before (pre-vaccination) and after the first and second vaccination and analyzed for binding antibodies in elisa and spr, in a pseudovirion neutralization assay, and by gfpdl analysis.
igg to various spike proteins and domains in elisa (s +s ; black, s ; blue, rbd; red, and s ; green) (fig. c) . representative titration curves to spike ectodomain (s +s ) and to the rbd in igg-elisa are shown in suppl. fig. . end-point titers of the serum igg were determined as the reciprocal of the highest dilution providing an optical density (od) twice that of the negative control (fig. c ). all four immunogens elicited strong igg binding to the spike ectodomain (s +s ). binding to the individual domains (s , s , and rbd) was specific, in that sera generated by s vaccination bound to s , but not to s or rbd, and vice-versa (fig. c ). spr allows following antibody binding to captured antigens in real-time kinetics, including total antibody binding in resonance units (max ru) and affinity kinetics (suppl. fig. ). in elisa, the antigens directly coated in the wells can be partially denatured increasing the likelihood of presenting epitopes that are not seen on the native form of the protein by the polyclonal serum igg. on the other hand, in our spr, the purified recombinant spike proteins were captured to a ni-nta sensor chip to maintain the native conformation (as determined by ace binding) to allow comparisons of binding to and dissociation from the four proteins. importantly, the protein density captured on the chip surface is low ( ru) and was optimized to measure primarily monovalent interactions, so as to measure the average affinity of antibody binding in the polyclonal serum ( , ). additionally, while elisa measured only igg binding, in spr, all antibody isotypes contributed to antibody binding to the captured spike antigen. in the current study, all rabbit sera contained anti spike antibodies that were at least % igg (data not shown). serial dilutions of post-vaccination serum were analyzed for binding kinetics with different spike proteins (suppl. fig. ). 
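the end-point titer rule stated above for the igg elisa (the reciprocal of the highest dilution whose optical density is twice that of the negative control) can be written as a short calculation. this is a sketch under stated assumptions: "twice" is read here as "at least twice", and the dilution series and od readings are hypothetical values, not data from the study.

```python
from typing import Optional, Sequence


def endpoint_titer(
    reciprocal_dilutions: Sequence[float],
    od_values: Sequence[float],
    negative_control_od: float,
    fold: float = 2.0,
) -> Optional[float]:
    """Return the highest reciprocal dilution whose OD is >= fold * negative control.

    Returns None when no dilution reaches the cutoff.
    """
    cutoff = fold * negative_control_od
    positive = [
        dilution
        for dilution, od in zip(reciprocal_dilutions, od_values)
        if od >= cutoff
    ]
    return max(positive) if positive else None


# Hypothetical four-fold dilution series and OD readings.
dilutions = [100, 400, 1600, 6400, 25600]
ods = [2.10, 1.40, 0.60, 0.25, 0.09]
negative_od = 0.10  # hypothetical negative-control OD, giving a cutoff of 0.20

titer = endpoint_titer(dilutions, ods, negative_od)
```

with these illustrative numbers the last dilution above the cutoff is 1:6400, so the titer is reported as 6400. a noisy, non-monotonic titration curve would need an additional rule (for example, stopping at the first dilution that falls below the cutoff), which this sketch does not implement.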
the spike ectodomain (s +s ) generated antibodies that predominantly bound to s +s (black bar), followed by the s protein (blue bar), and -fold lower antibody binding to the rbd and the s domain (red and green bars, respectively) (fig. d ). the s domain antigen induced antibodies that bound with similar titers (max ru values) to the s +s , s and rbd proteins (black, blue and red bars, respectively), and did not show reactivity to the s domain (green bar). however, the antibody reactivity of rabbit anti-s serum to s +s domain was -fold lower than the antibodies in the rabbit anti-s +s serum. rbd immunization generated similar high-titer antibody binding to s +s , s and rbd (black, blue and red bars, respectively), (fig. d) . in contrast, the s domain induced antibodies that primarily bound to homologous s antigen (green bars) and only weakly binding to the s +s ectodomain (black bars), and no binding to either s or rbd (fig. d ). antibody off-rate constants, which describe the fraction of antigen-antibody complexes that decay per second, were determined directly from the serum sample interaction with sars- cov- spike ectodomain (s +s ), s , s , and rbd using spr in the dissociation phase only for sensorgrams with max ru in the range of - ru (suppl. fig. ) and calculated using the biorad proteon manager software for the heterogeneous sample model as described before( ). these off rates provide additional important information on the affinity of the antibodies following vaccination with the different spike proteins that are likely to have an impact on the antibody function in vivo, as was observed previously in studies with influenza virus, rsv and ebola virus ( - ). surprisingly, we observed significant differences in the affinities of antibodies elicited by the four spike antigens (fig. e) . 
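the off-rate constant described above, the fraction of antigen-antibody complexes that decay per second, can be illustrated with a single-exponential dissociation model, r(t) = r0 · exp(−kd · t), in which ln r(t) falls linearly with time and the slope is −kd. this is a deliberately simplified sketch: the study used the heterogeneous sample model of the bio-rad proteon manager software, which is not reproduced here, and the sensorgram values below are synthetic.

```python
import math
from typing import Sequence


def fit_off_rate(times_s: Sequence[float], responses_ru: Sequence[float]) -> float:
    """Estimate k_d (per second) from a dissociation phase.

    Assumes a single exponential decay, R(t) = R0 * exp(-k_d * t),
    and fits a straight line to ln(R) versus t by ordinary least squares.
    All responses must be positive.
    """
    logs = [math.log(r) for r in responses_ru]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, logs))
    slope = sxy / sxx
    return -slope  # the decay rate is the negative of the fitted slope


# Synthetic, noise-free dissociation curve with a known off-rate.
true_kd = 1e-3                      # per second (hypothetical)
times = [0, 60, 120, 180, 240, 300]  # seconds
responses = [100.0 * math.exp(-true_kd * t) for t in times]

estimated_kd = fit_off_rate(times, responses)
```

a smaller kd means slower dissociation and therefore higher-affinity binding, which is how the off-rates are interpreted in the text.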
specifically, the rbd induced -fold higher affinity antibodies (slower dissociation rates) against s +s (black), s (blue) and rbd (red) proteins, compared with the post-vaccination antibodies generated by other three immunogens (fig. e ). this region may not be highly exposed on the virions or infected cells but is clearly immunogenic in the soluble recombinant spike ectodomain. in addition, the rabbit anti-s +s antibodies bound diverse epitopes spanning the rbd and to a lesser degree to the n-terminal domain (ntd) and the c-terminal region of s , and the n-terminus of s , including the fusion peptide ( fig. b and suppl. table ). the s domain elicited very strong response against the c-terminal region of s protein and a diverse antibody repertoire recognizing the ntd and rbd/rbm regions ( fig. c and suppl. table ). the recombinant rbd induced high-titer antibodies that were highly focused to the rbd/rbm (fig. e , and suppl. table ). in contrast, the recombinant s immunogen after two immunizations in rabbits elicited antibodies primarily targeting the c-terminus of the s protein (cd-hr ). . table ). structural depiction of these antigenic sites on the sars-cov- spike (suppl. table ). the other epitopes identified in our study cover less conserved sequences between the two sars-cov viruses that are unique to the sars-cov- spike and were not identified in the in-silico approach by grifoni et al. surprisingly, the s domain doesn't appear to elicit as many neutralizing antibodies as rbd or s . although s contains the fusion peptide, it does not appear to be as immunogenic, compared with s or rbd, in generating binding antibodies to the intact spike (s +s ) ectodomain, as observed in both igg elisa and spr. even though we characterized the purified proteins in various assays, there is a possibility that the structure of the antigens used in the study is different from the corresponding authentic spike protein on the surface of sars-cov- virion particle. 
one unexpected finding in this study was the higher affinity of antibodies elicited by the rbd compared with the other spike antigens (s +s ectodomain, s and s domains). in earlier anti-spike reactivity of post-immunization rabbit sera. serial dilutions of post-second vaccination rabbit sera were evaluated for binding to various spike proteins and domains (s +s ; black, s ; blue, rbd; red, and s ; green) in elisa. representative titration curves are shown in fig. s . to spike protein and domains from sars-cov- (s +s ; black, s ; blue, rbd; red, and s ; (e) antibody off-rate constants, which describe the fraction of antigen-antibody complexes that decay per second, were determined directly from the serum/ sample interaction with sars-cov- spike ectodomain (s +s ), s , s , and rbd using spr in the dissociation phase only for the recombinant sars-cov- proteins were purchased from sino biologicals (s +s ectodomain; -v b , s ; -v h, rbd; -v h or s ; -v b). recombinant purified proteins used in the study were either produced in hek mammalian cells (s and rbd) or insect cells (s +s ectodomain and s domain). female new zealand white rabbits (charles river labs) were immunized twice intra- muscularly at -days interval with g of purified proteins mixed with emulsigen adjuvant. sera were collected before (pre-vaccination) and after st and nd vaccination and analyzed for binding antibodies in elisa, spr, neutralization assay and gfpdl analysis. hr, plates were washed as before and opd was added for min. absorbance was measured at nm. end titer was determined as -fold above the average of the absorbance values of the naïve serum samples. the end titer is reported as the last serum dilution that was above this cutoff. proteon manager software (version . ). all spr experiments were performed twice and the researchers performing the assay were blinded to sample identity. in these optimized spr conditions, the variation for each sample in duplicate spr runs was < %. 
the maximum resonance units (max ru) data shown in the figures was the ru signal for the -fold diluted serum sample. antibody off-rate constants, which describe the fraction of antigen-antibody complexes that decay per second, are determined directly from the serum/ sample interaction with sars cov- spike ectodomain (s +s ), s , s , and rbd using spr in the dissociation phase only for the sensorgrams with max ru in the range of - ru and calculated using the biorad proteon manager software for the heterogeneous sample model as described before( ). off-rate constants were determined from two independent spr runs. the datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. a crucial role of angiotensin converting enzyme (ace ) in sars coronavirus-induced lung injury cryo-em structure of the -ncov spike in the prefusion conformation structural basis for the recognition of sars-cov- by full-length human ace structure, function, and antigenicity of the sars-cov- spike glycoprotein sars-cov- vaccines: status report. 
immunity antigenic fingerprinting of h n avian influenza using convalescent sera and monoclonal antibodies reveals potential vaccine and diagnostic targets vaccines with mf adjuvant expand the antibody repertoire to target protective sites of pandemic avian h n influenza virus human antibody repertoire after vsv-ebola vaccination identifies novel targets and virus-neutralizing igm antibodies antigenic fingerprinting following primary rsv infection in young children identifies novel antigenic sites and reveals unlinked evolution of human antibody repertoires to fusion and attachment glycoproteins as -adjuvanted h n vaccine promotes antibody diversity and affinity maturation, nai titers, cross-clade h n neutralization, but not h n cross-subtype neutralization mf adjuvant enhances diversity and affinity of antibody-mediated immune response to pandemic influenza vaccines high-affinity h head and stalk domain-specific antibody responses to an inactivated influenza h n vaccine after priming with live attenuated influenza vaccine longitudinal human antibody repertoire against complete viral proteome from ebola virus survivor reveals protective sites for vaccine design intravenous immunoglobulin for adults with influenza a or b infection (flu-ivig): a double-blind, randomised, placebo-controlled trial antigenic fingerprinting of respiratory syncytial virus (rsv)-a-infected hematopoietic cell transplant recipients reveals importance of mucosal anti-rsv g antibodies in control of rsv infection in humans the covid- vaccine development landscape characterization of the receptor- binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- h n-terminal beta sheet promotes oligomerization of h -ha that induces better antibody affinity maturation and enhanced protection against h n and h n viruses 
compared to inactivated influenza vaccine differential human antibody repertoires following zika infection and the implications for serodiagnostics and disease outcome ) with a single receptor-binding domain (rbd) in the up conformation, wherever available using ucsf chimera software. the rbd region is shaded in red (residues - ) on every structure key: cord- -etf afd authors: moustaqil, mehdi; ollivier, emma; chiu, hsin-ping; van tol, sarah; rudolffi-soto, paulina; stevens, christian; bhumkar, akshay; hunter, dominic j.b.; freiberg, alex; jacques, david; lee, benhur; sierecki, emma; gambin, yann title: sars-cov- proteases cleave irf and critical modulators of inflammatory pathways (nlrp and tab ): implications for disease presentation across species and the search for reservoir hosts date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: etf afd the genome of sars-cov- (sars ) encodes for two viral proteases (nsp / papain-like protease and nsp / c-like protease or major protease) that are responsible for cleaving viral polyproteins for successful replication. nsp and nsp of sars-cov (sars ) are known interferon antagonists. here, we examined whether the protease function of sars nsp and nsp target proteins involved in the host innate immune response. we designed a fluorescent based cleavage assay to rapidly screen the protease activity of nsp and nsp on a library of human innate immune proteins (hiips), covering most pathways involved in human innate immunity. by expressing each of these hiips with a genetically encoded fluorophore in a cell-free system and titrating in the recombinant protease domain of nsp or nsp , we could readily detect cleavage of cognate hiips on sds-page gels. we identified proteins that were specifically and selectively cleaved by nsp or nsp : irf- , and nlrp and tab , respectively. 
direct cleavage of irf by nsp could explain the blunted type-i ifn response seen during sars-cov- infections while nsp mediated cleavage of nlrp and tab points to a molecular mechanism for enhanced production of il- and inflammatory response observed in covid- patients. surprisingly, both nlrp and tab each have two distinct cleavage sites. we demonstrate that in mice, the second cleavage site of nlrp is absent. we extended this comparative alignment of irf- and nlrp homologs and show that the lack or presence of cognate cleavage motifs in irf- and nlrp could contribute to the presentation of disease in cats and tigers, for example. our findings provide an explanatory framework for in-depth studies into the pathophysiology of covid- and should facilitate the search or development of more effective animal models for severe covid- . finally, we discovered that one particular species of bats, david’s myotis, possesses the five cleavage sites found in humans for nlrp , tab and irf . these bats are endemic to hubei province in china and we discuss their potential role as reservoir for the evolution of sars and sars . the ongoing pandemic of covid- (coronavirus disease- ) has already had a deep health, economic and societal impact worldwide [ ] . covid- is caused by a novel betacoronavirus, sars-cov- . other members of the coronaviridae family include the highly pathogenic sars-cov and mers-cov, responsible for widespread outbreaks in and , respectively [ ] . sars-cov- encodes a large ( kb) single stranded, positive sense rna genome that contains multiple open reading frames (orfs). orf a and ab produce two large replicase polyprotein precursors ( kda for orf a, kda for orf ab) which upon proteolytic cleavage generate non-structural proteins (nsp), to . other orfs encode the main structural proteins of sars-cov : spike (s), membrane (m), envelope (e) and nucleocapsid (n) proteins, as well as accessory proteins. 
processing of the polyprotein precursors relies on the two viral proteases, nsp and nsp . as shown in figure a , nsp or papain-like protease (plpro) is responsible for the proteolytic cleavage of nsp - . the protein nsp , or c-like protease ( clpro), is responsible for the processing of other cleavage sites that results in nsp - . nsp is uniquely cleaved by nsp on the n-terminus and nsp on the c-terminus [ ] . as these two proteases are essential for viral replication, they are obvious drug targets and considerable effort was spent by the structural biology community early in the sars-cov- outbreak. structures of the protease domains of nsp and nsp have already been reported, opening the door to the identification or development of inhibitors, using virtual or high-throughput screening (e.g. [ ] [ ] [ ] [ ] [ ] ). viral proteins, especially those from rna viruses that have stricter constraints on their genome size, often perform multiple tasks. in addition to performing their intrinsic functions in the viral life cycle, many have evolved to interfere with innate immune responses or otherwise co-opt the host cell's machinery to facilitate optimal viral replication [ , ] . the role of coronavirus proteases in mediating virulence has been shown before [ , ] . for example, plpro of both sars-cov [ , ] and mers-cov [ ] , as well as other coronaviruses [ , ] , leads to inhibition of the type i interferon pathway. inactivation of different components of the pathway, including rig-i [ ] , sting [ ] , traf / traf [ ] , tbk [ ] and irf [ , , , ] , has been documented. these effects are partly mediated by the protease activity but mainly derive from the deubiquitinating and deisgylating functions associated with full-length nsp [ ] [ ] [ ] . sars-cov plpro has also been reported to activate tgf-β signalling [ ] or down-regulate p [ ] . 
similarly, clpro from the feline coronavirus, feline infectious peritonitis virus (fipv), inhibits type i interferon signalling through cleavage of nemo [ ] , while the porcine deltacoronavirus (pdcov) clpro cleaves stat . sars-cov clpro is responsible for virus-induced apoptosis [ ] . therefore, we hypothesized that these proteases could cleave human innate immune pathway proteins (hiips), leading to interference with or dysregulation of the host response. to screen for hiips that might be targeted by sars-cov plpro or clpro, we first leveraged the systems virology and systems biology tools present in relevant databases like innatedb [ ] , pathbank [ ] vipr [ ] , virhostnet . , and virusmentha [ ] to downselect a core set of hiips that covers almost all pathways involved in human innate immune responses. we then designed a fluorescent based in-vitro protease activity assay to screen this library of full-length hiips (fig. b) . the protease domains of plpro and clpro of sars-cov were recombinantly purified and added to gfp-labelled target hiips, expressed in a eukaryotic cell-free system. proteolytic cleavage was assessed by sds-page gels. our screen of hiips (fig. c ) revealed that only proteins were directly cleaved by these two viral proteases. notably, we discovered that nsp directly cleaved irf , while nsp cleaved nlrp and tab . surprisingly, both nlrp and tab are cleaved at two different sites, creating three protein fragments. we identified the five cognate cleavage sites in these hiips targeted by the plpro and clpro domains of sars nsp and nsp , respectively. structure-function correlative analysis followed by comparative alignment of irf and nlrp homologs across relevant mammalian orders reveal the potential explanatory power of our findings. 
the cleavage of irf could explain the enigmatically blunted type-i ifn response that have been noted in early studies of sars-cov- infections [ , ] , while the nsp mediated cleavage of nlrp might explain the hyperinflammatory response linked to severe covid- disease [ , ] . indeed, the lack or presence of cognate cleavage motifs in irf and nlrp homologs presents interesting correlations with the presentation of disease in animal models; our results will enable the development of more effective animal models for severe covid- . finally, we searched the available genomes of potential hosts, to determine whether sars could have evolved into an animal where the different cleavage sites would be present. we found that out of species of bats, only one presents all five cleavage sites identical to humans for nlrp , tab and irf . as myotis davidii, is found endemically in hubei province of china, near the first epicentre of sars-cov- pandemic, we will discuss its potential role as reservoir host. an in-vitro protease assay identifies targets of sars-cov plpro and clpro the human target proteins to screen in the in-vitro assay were selected to contain major proteins associated with the signaling pathways of innate immunity and cell death [ ] . relevant to infection by viruses, proteins downstream of the nucleic acid sensors mda- and rig-i have been selected (e.g. mavs, traf , nfκb and irfs). also included are the effectors of the toll-like receptors, tlr and tlr , such as trif, tram, traf or tab . proteins involved in cell-death (e.g. traf , caspases, bcl , xiap) were also included. human proteins were cloned for expression as gfp-fusions in a cell-free expression system based on the eukaryotic organism of leishmania tarentolae (lte). 
this system produces full-length proteins up to kda with minimal truncations, minimal protein aggregation [ ] and was previously used by our group to study the behaviour of various apoptotic proteins such as myd [ ] , mal [ ] or asc and nlrp [ ] . the assay was designed as a one-pot reaction to rapidly identify proteolytic cleavage. purified recombinant protease domains (adjusted to a final concentration of µm) were added to the lte during expression of the target proteins (see supplementary figure ). the screening conditions were optimised to avoid offtarget effects and false positives. the human proteins targets were typically expressed at low concentration (reaching at most μm), in a crowded environment (lte) that recapitulates the host cytosol. the proteases were allowed to react to target protein de novo synthesis for about ½ hours, at °c (optimal temperature for protein expression using lte). under these conditions, it is probable that the activity of the proteases was greatly reduced. we used the gfp-tag on the target protein to directly visualize cleavage using reducing sds-page (fig. c ). as expected, partial denaturation (i.e. no thermal denaturation) maintained the fluorescence of gfp so that proteins could be imaged without any subsequent purification steps. comparing the protein migration patterns in the presence and absence of the protease identifies cleavable proteins. indeed, an intact protein would appear on the gel as a single fluorescent band. if the protein is cleaved, then the gel will show either a single band, at a lower molecular weight (in the event of a complete proteolysis of all target proteins), or multiple fluorescent bands, corresponding to the full-length protein and its cleavage product in the case of an incomplete cleavage process, as described in figure b . the use of a fluorescent tag also allows simple quantification of protein concentration based on fluorescence intensity [ , ] . 
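the band patterns described above can be sketched in a few lines of python. this is a minimal illustration only, assuming partial cleavage (so every tagged intermediate is present) and a single gfp tag at one terminus; the function name and the lengths and cut positions in the usage note are hypothetical and are not taken from the study.

```python
def fluorescent_fragments(length, cut_after, tag="N"):
    """List the species that still carry the GFP tag after partial cleavage.

    length    : number of residues in the target protein (hypothetical value)
    cut_after : residue numbers after which the protease cuts (1-based)
    tag       : "N" or "C", the terminus carrying the GFP fusion

    Returns (start, end) residue ranges, 1-based and inclusive. Fragments
    that lost the tagged terminus are not fluorescent and are not listed.
    """
    sites = sorted(cut_after)
    if any(s < 1 or s >= length for s in sites):
        raise ValueError("cut sites must lie inside the protein")
    if tag == "N":
        # every visible species starts at residue 1 and ends at a cut site
        # (or at the C-terminus for the uncleaved, full-length protein)
        return [(1, end) for end in sites + [length]]
    if tag == "C":
        # every visible species ends at the last residue
        return [(start, length) for start in [1] + [s + 1 for s in sites]]
    raise ValueError("tag must be 'N' or 'C'")


if __name__ == "__main__":
    # hypothetical 500-residue protein with two cut sites
    print(fluorescent_fragments(500, [100, 300], tag="N"))
    print(fluorescent_fragments(500, [100, 300], tag="C"))
```

with two cut sites and incomplete cleavage, the sketch returns the full-length protein plus two shorter tagged species for either tag position, while the internal fragment between the two sites never carries the tag.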
sds-page showed no difference of sizes in the presence or absence of viral proteases for most proteins, indicating that a large majority of human proteins were unaffected by the addition of plpro and clpro. this suggests that in our assay conditions, non-specific cleavage was not observed. however, cases of proteolytic degradation were identified, giving confidence that the viral proteases are active. the fact that only specific members of the same family of proteins (e.g. irfs, fig. c ) were cleaved (irf cleavage by plpro) suggests high specificity and the recognition of specific sequences. there was no common reactivity between plpro and clpro reinforcing the idea that each viral protease did indeed recognize a specific consensus sequence. the screen results also revealed that protein expression levels were unchanged upon addition of the proteases, suggesting that none of the components required for cell-free expression were cleaved during the experiments. as shown in supplementary figure , the cell-free lysate acts as a crowded environment made up of many different proteins. analysis of the coomassie stained gels shows that staining intensity and profile were similar, even at the highest plpro concentration, indicating that there was no significant cleavage of components of the cell-free reagent. to validate that plpro could cleave irf , we titrated different concentrations of the protease in the reaction. as shown in figure a , a strong concentration-dependence was observed, as expected. when the same experiment was performed in the presence of clpro, no proteolysis was detected, validating that the cleavage is indeed specific to pl-pro (suppl. fig. ). we then set out to identify the cleavage site on irf . based on the proteolysis sites on orf a and orf ab [ ] , and similarly to sars-cov plpro [ ] and mers-cov plpro [ ] , sars -cov plpro recognises and cleaves after lxgg sequences (fig. b ). 
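a search for candidate lxgg motifs of this kind can be sketched as follows. this is an illustrative regular-expression scan, not the procedure used in the study; the function name and the toy sequence are hypothetical, and a motif hit says nothing about accessibility of the site.

```python
import re

# L, any residue, G, G; the lookahead lets overlapping motifs be reported
LXGG = re.compile(r"(?=(L.GG))")


def find_lxgg(sequence):
    """Return (motif_start, motif, cut_after) for every LxGG motif.

    Positions are 1-based. cut_after is the residue number of the second G,
    i.e. the cut is assumed to occur directly after the motif.
    """
    hits = []
    for match in LXGG.finditer(sequence.upper()):
        start = match.start() + 1          # convert 0-based to 1-based
        hits.append((start, match.group(1), start + 3))
    return hits


if __name__ == "__main__":
    toy = "MKLGGGAALAGGT"   # made-up sequence, not a real protein
    for start, motif, cut in find_lxgg(toy):
        print(f"{motif} at residue {start}, predicted cut after residue {cut}")
```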
sequence analysis of the human irf shows the presence of a single lggg sequence at residues - of the canonical isoform. cleavage of the n-gfp tagged protein at this site would result in the formation of a gfp-tagged kda fragment ( kda + kda for the gfp) and a kda untagged fragment, which corresponds well to the band obtained by sds-page (fig. c ). the identified cleavage site would be present on an exposed loop, based on previously solved structures (see fig. d , pdb: j f [ ] ) and therefore accessible to the protease. no such motif was found in any other member of the irf family (fig. e ), in agreement with our data. on the contrary, recognition motifs present in other proteins of our test panel (lagg in β-catenin, lvgg in stat a and legg in nlrp ) did not get processed by plpro in our assay. in the case of stat a, the cleavage motif is partially buried in the protein (see supplementary figure ). however, the lagg motif in β-catenin is exposed at the surface in a structured a-helix (see supplementary figure ); the legg motif in nlrp cannot be located on the only existing structure nlrp (nlrp , pdb npy). it is possible in addition to local structures, the residue between l and gg would contribute to the selectivity of plpro cleavage. irf is a key mediator of type i interferon (ifn) response triggered by viral infections [ ] . the c-terminal part of the protein is responsible for mediating interactions with upstream receptors and effectors sting, mavs and trif [ ] . irf tail is also strongly targeted for post-translational modifications upon infection [ , ] , leading to its homodimerization (pdb: qwt, figure f ), translocation into the nucleus and transcriptional activation [ ] . therefore, we reasoned that plpro cleavage of irf would result in reduced ifn production, a feature that has been observed upon sars-cov infection [ ] . similarly, we set out to validate clpro proteolysis of tab and nlrp ( fig. a and a ). 
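the expected sizes of the two cleavage products can be estimated from the cut position alone. the sketch below is a rough approximation under stated assumptions: an average residue mass of about 0.11 kda, a gfp tag of about 27 kda, and no correction for linkers or anomalous gel migration. the function name and the numbers in the usage note are hypothetical and do not reproduce the values reported here.

```python
AVG_RESIDUE_KDA = 0.110   # assumption: average mass per amino acid
GFP_KDA = 27.0            # assumption: approximate mass of the GFP tag


def fragment_masses(length, cut_after, tag="N"):
    """Estimate masses (kDa) of the N- and C-terminal cleavage products.

    length    : number of residues in the untagged protein
    cut_after : residue number after which cleavage occurs (1-based)
    tag       : "N" or "C", the terminus carrying the GFP fusion
    """
    if not 1 <= cut_after < length:
        raise ValueError("cut_after must lie inside the protein")
    n_mass = cut_after * AVG_RESIDUE_KDA
    c_mass = (length - cut_after) * AVG_RESIDUE_KDA
    if tag == "N":
        n_mass += GFP_KDA      # only the N-terminal product stays fluorescent
    elif tag == "C":
        c_mass += GFP_KDA      # only the C-terminal product stays fluorescent
    else:
        raise ValueError("tag must be 'N' or 'C'")
    return round(n_mass, 1), round(c_mass, 1)


if __name__ == "__main__":
    # hypothetical 400-residue protein cut after residue 100
    print(fragment_masses(400, 100, tag="N"))
```

comparing such estimates with the observed bands is only a consistency check; the calibration of migration described in the supplementary figures remains the reference.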
as before, we observed a concentration-dependent cleavage of both proteins and verified that plpro did not have an effect, at any concentration (see supplementary figure and ). sds-page analysis reveals the presence of two cleavage sites on tab , that can be more easily visualized when the gfp tag is placed at either the c-terminus (fig. a ) or the n-terminus (fig. b ) of the protein. the recognition motif for cl-pro of coronaviruses sars-cov [ ] , mers-cov [ ] and sars -cov [ ] is often lq/(s,a,g) although the protease can also cut after fq or vq motifs [ ] . further, the presence of a combination of hydrophobic and positively charged residues at position p- and p- also seems to be preferred (fig. b ). in tab , we identified an ltlqs motif at position of the canonical form, that would give rise to a kda n-terminal fragment and a kda c-terminal fragment. another possible recognition motif (aslqs) is present at position , which would give a kda n-terminal fragment and a kda c-terminal fragment (fig. c) . therefore, the two proteolytic fragments for c-gfp tab correspond to amino acids - and - ( fig. a) whereas n-gfp tab cleavage leads to the formation of proteolytic fragments corresponding to residues - and - (fig. b ). more details on calculation of sizes based on migration are included in supplementary figure and . based on the reported structure of tab (fig. d , pdb: j o, [ ] ), the first cleavage site is on an exposed loop in the pseudo-phosphatase domain. the second cleavage site is in a disordered region of the protein that does not appear on the structure (see figure e for the full sequence of human tab .) another target of cl-pro revealed by our screen is nlrp . nlrp is an intracellular pattern-recognition receptor, from the nucleotide-binding and oligomerization domain-like (nlr) receptor family, which groups key mediators of the innate immune response and inflammation [ ] . 
nlrp modulates the expression of inflammatory cytokines [ ] through the regulation of the nfκb and mapk pathways [ ] . formation of nlrp inflammasome can activate caspase [ , ] , which produces interleukin and leads to cell death. but nlrp can also form mixed inflammasomes with other nlrps, including nlrp , and then play an inhibitory role [ , ] . nlrp is also involved in adaptive immunity and controls mhc class i expression through a yet ill-defined mechanism [ ] . unexpectedly, we also noted the presence of two additional bands when nlrp was subjected to clpro treatment, suggesting that two cleavage sites would also be found in nlrp (fig. a) . in figure b , we show the analysis of preferred residues for cleavage by clpro of sars-cov- . sequence analysis identified a canonical lqa motif at residue , at the c-terminus of nlrp , in the middle of the lr repeats (fig. c) . the cleavage at residue would create a small c-terminal fragment (residues - ), observed on the gel using the c-terminally gfp-tagged nlrp (fig. a ). there was no other canonical clpro recognition sequence (lqs) in nlrp , to explain the second proteolytic event. when expanding the search to degenerate sequences (fq/vq), we identified a klfqg sequence at residue (fig. c ). cleavage at this site would result in the formation of a kda c-terminal fragment (residues - , ), observed in figure a . analysis of the sequences of other nlrps (nlrp -nlrp ) shows that both motifs are unique to nlrp (fig. d) . interestingly, the mouse homolog of nlrp possesses the first but not the second recognition motif (fig. d) . to further validate our data, we compared the effect of clpro on human vs. mouse nlrp , each tagged at the n- and c-terminal position. figure e shows that human nlrp (left panel) is indeed cleaved twice, whereas mouse nlrp (right panel) is only processed once. 
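the two-step search described above, canonical lq/(s,a,g) motifs first and degenerate fq or vq motifs second, can be sketched as follows. this is an illustration under simplifying assumptions: only the p2, p1 and p1' positions are checked, the cut is placed directly after the glutamine, and the function name and toy sequence are hypothetical. the toy sequence merely embeds the two motif types mentioned in the text and is not a real protein.

```python
import re

# P2-P1-P1' patterns; the lookahead allows overlapping motifs to be found
CANONICAL = re.compile(r"(?=(LQ[SAG]))")
DEGENERATE = re.compile(r"(?=([FV]Q[SAG]))")


def find_clpro_sites(sequence):
    """Return sorted (cut_after, motif, label) tuples for candidate sites.

    cut_after is the 1-based residue number of the glutamine (P1), after
    which cleavage is assumed to occur.
    """
    seq = sequence.upper()
    hits = []
    for label, pattern in (("canonical", CANONICAL), ("degenerate", DEGENERATE)):
        for match in pattern.finditer(seq):
            cut_after = match.start() + 2   # 0-based P2 index -> 1-based P1
            hits.append((cut_after, match.group(1), label))
    return sorted(hits)


if __name__ == "__main__":
    toy = "AAKLFQGTTLTLQSPP"   # made-up sequence for illustration
    for cut, motif, label in find_clpro_sites(toy):
        print(f"{label:10s} {motif} cut after residue {cut}")
```

as the screen shows, most motif hits of this kind are not cleaved, so such a scan only nominates candidates for experimental testing.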
only the smaller n-terminal fragment and corresponding larger c-terminal fragment are conserved, indicating that cleavage at residue occurs for both species, while cleavage at residue is specific to human. the calibration of migration of the fragments on the sds-page gels is further described in supplementary figure . homology modelling of nlrp , using the structure of its close relative nlrp (pdb: npy [ ] ), shows that both sites are expected to be in exposed, unstructured loops, readily accessible to a protease (fig. f ). similar to sars, sars utilizes ace and tmprss to achieve its entry step [ ] [ ] [ ] . the cleavage of irf , tab , and nlrp by plpro or clpro was demonstrated in vitro in the previous figures. to validate the cleavage effects upon sars infection, we generated ace expressing and ace /tmprss double expressing t cells using a lentiviral expression system to enhance the infectivity of sars . t-ace and t-ace /tmprss cells were then infected with sars and analyzed for viral and host protein levels. fig. shows that the expression of irf , nlrp , and tab (panels a-c, respectively) was decreased in virus-infected cells (lanes - and - ) compared to the mock-infected controls (lanes ). the lower expression of ace in t-ace /tmprss cells compared with that in t-ace cells (lanes and , respectively) might be caused by the proteolytic cleavage of ace due to tmprss overexpression [ ] ; however, it did not affect the pattern of irf , tab , and nlrp during sars infection. combined with the results from the in vitro assay, we found that the proteases of sars (plpro and clpro) can degrade irf , tab , and nlrp , probably through their protease activity, leading to imbalanced host innate immune responses. experimental screens identify accessible cleavage sites in proteins and, more importantly, detect noncanonical sequences. 
in this report, we show that the viral proteases plpro and clpro of sars-cov lead to the in-vitro proteolytic cleavage of three important proteins of the host immune response: irf , tab and nlrp (fig. b) . these results first show the exquisite specificity of the viral proteases, with only positive hits observed out of experiments. in the case of plpro, the recognition sequence seems well defined and lxgg motifs are scarcely found in the proteins we studied. nevertheless, of the proteins that harbor such motifs, only (irf ) was cleaved in vitro. indeed, the motif is present on an unstructured loop of irf that made this site readily accessible. this shows that the presence of a recognition motif is required but not sufficient to predict biological activity. as such, experimental validation of potential targets identified by bioinformatics remains critical. the identification of nlrp and tab as substrates of clpro further demonstrates the importance of this type of screening approach. the clpro recognition motif is not as well defined as that of plpro, and lq/(s,a,g) motifs are ubiquitously found in the proteome. in our protein list, multiple bona fide recognition motifs based on lq/s, lq/a and lq/g, respectively, can be found as well as several degenerate motifs. yet only two targets (and four cleavage sites) were recognized by clpro, emphasizing again the importance of site accessibility. more importantly, our data identify an unexpected cleavage site on nlrp (klfq/g) that does not resemble the cleavage motifs present on the virus orf a/ ab, further indicating that the determinants of selectivity for clpro are yet to be identified. viral proteases have probably evolved to efficiently process their own polypeptides but their ability to target host proteins, especially the ones involved in host defence, provides them with an evolutionary advantage. 
the three hiips identified in this screen (irf , tab and nlrp ) are important contributors to the innate immune and inflammatory responses, either driving or dampening it. irf belongs to the interferon-regulatory factor (irf) family. all irfs possess a well-conserved dnabinding domain at the n-terminus and a variable c-terminal domain that mediates most of the interactions with the other irf proteins and other co-factors [ ] . fine-tuning of the choice of partners rely on posttranslation modifications, such as phosphorylation and ubiquitination, in this region. as mentioned before, irfs, especially irf , , and , are major contributors to the production of and response to type i interferons, which stimulate macrophages and nk cells to elicit anti-viral responses [ ] . therefore, irf has been found to be targeted by several different classes of viruses. paramyxoviruses, herpesviruses, reovirus and double stranded rna viruses, to cite some examples, have been shown to interfere with irf signalling through different strategies [ ] . these include the control of protein expression, protein cellular localisation, modifications of the ptms, inhibition of protein-protein interactions or induced cellular degradation [ ] . direct proteolytic cleavage of irfs by viral proteases has also been identified before. proteases of enteroviruses ev [ ] and ev-d [ ] have been shown to directly cleave irf . papain-like protease domains of several coronaviruses including sars-cov and mers-cov have been implicated in the observed reduction of ifn signalling upon viral infection [ ] . although this link seems well established, the molecular basis of this effect is still under debate. whether the enzymatic activity is required is also controversial. besides its proteolytic activity, plpro of coronaviridae also possess a deubiquitinase and deisgylation activity that contribute to inhibition of irf activity [ , , , ] . 
further, sars-cov plpro has been shown to target proteins upstream of irf . it can directly bind to sting, disrupt the formation of the sting-traf -tbk -ikkε complex or limit signalling by deubiquitinating sting, traf , traf or tbk [ ] . here we show that in-vitro, sars-cov plpro was able to cleave irf and validated that the effect could be observed in relevant virus infected cells ( figure ). direct proteolytic cleavage of irf may contribute to the reduction of type i interferon production observed in covid patients [ , ] . tab is part of the tab / / /tak complex [ ] that regulates the activity of tak (tgfβ-activated kinase ), in response to different stimuli including tgfβ, il , tnfα and upon viral and bacterial infection [ , ] . tak can then activate the nfκb pathway or signal through the map kinases pathway [ , ] . lei et al. showed that the c protease of enterovirus (ev ) cleaved tab (along with tab , tab and tak ) at two cleavage sites (q s and q s) to perturb the formation of the complex and inhibit cytokine release downstream of nfκb [ ] . our identified cleavage site, at q s, forms a protein that is reminiscent of the second isoform of tab (tab β, which lacks the c-terminus) that loses its interaction with tak [ , ] . further, the poly-ser region ( - ) is a substrate for p kinase [ ] (whose binding site has been mapped to residues - on tab ) and phosphorylation controls cellular localisation and activity of the protein [ ] . therefore, the loss of tab c-terminus through clpro cleavage would profoundly impact its ability to activate tak and result in decrease production of cytokine through nfκb signalling. finally, we identified nlrp as a substrate of clpro, and two cleavage sites could be identified. 
nlrp , like other members of the nlrp family, possesses a pyd domain, that binds the effector asc through homotypic interaction, followed by a nacht domain, which binds atp and mediates activation of the protein, and a series of lr repeats that gives specificity to each member by modulating protein-protein interactions [ ] . nlrp , by similarity with nlrp , is thought to be normally maintained in a monomeric, auto-inhibited conformation and release of this auto-inhibition is mediated by binding of a ligand (e.g. atp) to the nacht domain [ ] . atp binding to nlrp plays a major role in regulating the protein's activity as it has been shown to not only induce self-oligomerisation, but also promote interaction with nfκb-induced kinase (nik) and subsequent degradation of nik [ ] . mutations in the atp binding site are sufficient to increase production of proinflammatory cytokines and chemokines, mimicking loss of nlrp [ ] . the first cleavage site in nlrp , at residue , is located in the nacht domain, in between the two walker motifs that mediate atp binding (walker a; residues - and walker b: residues - , by analogy with nlrp [ ] ). this cleavage breaks the nucleotide binding site and whether this leads to activation or repression of the protein activity is an open question. of note, this proteolytic cleavage also releases the pyd. pyd of asc, the nlrp / adaptor, has been shown to drive polymerization of asc and formation of asc specks upon inflammasome formation [ ] . whether nlrp pyd is able to polymerise on its own and induce downstream signalling and pyroptosis remains to be explored. the second cleavage site, at residue , releases lrr motifs, and probably modifies protein-protein interactions. most relevant to covid- , nlrp is known to negatively regulate the release of proinflammatory cytokines [ ] . mutations in nlrp [ ] have been linked to autoinflammatory disorders, in particular the familial cold autoinflammatory syndrome (fcas , omim: ). 
of note, truncation r x, close to the identified cleavage site, has been identified in a patient suffering from hereditary periodic fever syndrome and leads in vitro to a dramatic increase in nf-κb activation [ ] . up-regulation (and potentially over-activation) of nlrp has also been noted in patients with kawasaki disease [ ] , a rare auto-inflammation of blood vessels in children. the emergence of kawasaki-like syndromes in children positive for sars-cov [ , ] could point to a molecular link between nlrp cleavage and inflammatory over-activation. the recognition sequences on irf and nlrp are unique in their respective families of proteins, so we wanted to investigate how conserved these motifs were. to this end, we compared the protein sequences of irf and nlrp among species, specifically around these sequences. in both cases, the cleavage sites were located on well conserved portions of the proteins but small differences at or near the cleavage site were identified. figure summarizes the conservation of the irf and nlrp sequences for species that could be used as infection models. the plpro recognition sequence in irf is present in most species, except for rodents and ferrets, where the g at p is replaced by k, r or n, which would not be permissive to proteolytic cleavage. we then examined the sequences of nlrp . the sequence around the first cleavage motif is mainly conserved, but we noticed that the amino acid directly after the cleavage site varied significantly. in primates, this amino acid is a small neutral amino acid (g) that is replaced by a bulkier, charged residue d or h in other species. a similar trend is observed for the second cleavage site where the small amino acid a found in human is replaced by larger amino acids (v or i). such substitutions would likely affect the electrostatic environment and most likely inhibit the formation of active enzyme-substrate complexes. 
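the kind of cross-species comparison described above can be sketched as a simple check of whether the motif window from each homolog still matches the protease recognition pattern. this is a toy illustration only: the species labels and four-residue windows are invented placeholders, not sequences from the alignment, and the lxgg pattern is used as the example rule.

```python
import re

PLPRO_MOTIF = re.compile(r"L.GG")   # recognition pattern discussed in the text


def compare_windows(windows, reference):
    """Compare aligned motif windows from several species to a reference.

    windows   : dict mapping a species label to its aligned motif window
    reference : label of the species used as reference

    Returns a dict: label -> (permissive, substituted_positions), where
    permissive is True if the window still matches LxGG and positions are
    1-based indices that differ from the reference window.
    """
    ref = windows[reference].upper()
    report = {}
    for label, window in windows.items():
        win = window.upper()
        if len(win) != len(ref):
            raise ValueError("windows must be aligned to the same length")
        diffs = [i + 1 for i, (a, b) in enumerate(zip(ref, win)) if a != b]
        report[label] = (bool(PLPRO_MOTIF.fullmatch(win)), diffs)
    return report


if __name__ == "__main__":
    # placeholder windows, NOT real homolog sequences
    toy = {"reference": "LGGG", "variant_1": "LGKG", "variant_2": "LAGG"}
    for label, (ok, diffs) in compare_windows(toy, "reference").items():
        print(label, "permissive" if ok else "not permissive", diffs)
```

a substitution inside the invariant positions breaks the match, whereas a change at the variable position does not; whether a matching window is actually cleaved still has to be tested experimentally.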
interestingly, apart from primates, only cats and tigers have / similar recognition sequences to humans, for both proteins. it has been reported that, besides primates, cats are amongst the few species that could not only be infected with sars-cov , but develop covid- symptoms [ ] . anecdotal evidence suggests that amur tigers (panthera tigris altaica) [ ] and european minks (mustela lutreola) [ ] could infect or be infected by humans and develop symptoms. unfortunately, the genome annotation for european minks is incomplete and does not allow for a comparison of its nlrp sequence. this comparison once again highlights the difficulty of finding an animal model suitable for the study of sars-cov infectivity and disease [ ] . it was rapidly established that mouse models of covid- were ill-adapted, as murine ace , the receptor for sars-cov , is significantly different from the human isoform [ ] . this led to efforts in developing transgenic mouse models with humanized ace , which were successfully used to study infection by sars-cov [ ] and sars-cov [ ] . according to our analysis however, this model may not fully recapitulate the lack of interferon production if this is driven by irf cleavage. we have also shown that mouse nlrp is only cleaved once, compared to two cleavage sites for human nlrp (fig. d and e) , although the functional impact of these cleavages remains to be studied. ferrets are also a preferred model of respiratory viral infections and have been shown to be infected by sars-cov , but develop only mild symptoms [ ] . so far, the most promising disease models amongst primates are rhesus macaques (macaca mulatta) that develop pneumonia [ ] , although results of infection in capuchins (sapajus apella) are yet to be published. in non-primates, cats present the most symptoms with massive lung lesions [ ] . 
we then extended our comparison to include wild animals that could have been host reservoirs of the virus, hypothesizing that the viral proteases had evolved to target innate immune proteins of their host. to date, the exact reservoir and intermediate hosts of the virus remain to be found. it is highly probable that the virus originates from a bat coronavirus (e.g. [ ] [ ] [ ] ) and the malayan pangolins (manis javanica) might have acted as intermediate hosts [ , ] . to this end, we compared the irf and nlrp sequences in different bats, the malayan pangolin and the chinese tree shrew (tupaia chinensis), based on data availability (figure ) . unfortunately, data for the masked palm civet (paguma larvata), the intermediate host for sars-cov, could not be easily retrieved. as before, the irf sequence around the cleavage site was remarkably conserved amongst species. on the contrary, the sequences for nlrp are highly divergent, especially at the c-terminus. interestingly, this (limited) analysis identified only one species (myotis davidii) that possesses cleavage sites similar to those in humans. overall, the method presented here enables medium to high-throughput screening of the activity of viral proteins or bacterial effectors and will help design new antiviral and antibiotic strategies. in this study, we presented the results on the first hiips (human innate immune proteins), and the screen will be expanded to cover more potential targets and pathways in the future. our results show that in addition to the de-ubiquitinase activity of nsp , sars-cov- uses its two proteases to further impact the host innate immune signalling. our results were validated in sars-cov- infected a t-ace and t ace /tmprss cells, where a decrease in irf , nlrp and tab was demonstrated by western blotting. our findings of irf cleavage are consistent with the literature and the previous work showing an inhibition of interferon beta production in covid- patients. 
more importantly, the direct cleavage of nlrp by clpro could explain the hyper-inflammation observed in some patients. the fact that the two proteases of sars-cov- could have evolved to interfere with innate immunity is attractive. if one considers that most of evolution is driven by lucky side-effects, the additional targets irf , tab and nlrp could give a selective advantage if their cleavage either enhanced transmission or delayed the response to infection. the discovery that all five cleavage sites are present in a single species of bats gives support to this theory. the fact that myotis davidii can be found near the epicentre of the sars pandemic makes it a possible candidate for a previous reservoir host, even if it does not exclude other hosts for sars and sars . further studies will be required to determine if these hosts display positive selection around the cleavage sites, and if zoonosis can be more precisely dated and determined.

the authors would like to thank katherina michie, jack bennett and key personnel at the protein production facility of unsw for the purification of plpro and clpro of sars-cov- . the authors would also like to thank dr kate schroeder for the gift of the nlrp mouse plasmid in a previous collaboration, and prof. alexandrov for the cell-free plasmids compatible with lte protein production.

(a) schematic of the organization of the genome of sars-cov- , focusing on the non-structural proteins nsp - . as depicted, two proteases are encoded in orf a: nsp , or papain-like protease (plpro), and nsp , or c-like protease ( clpro). plpro is responsible for three proteolytic cleavages, while the protein nsp cuts the large polyprotein at eleven different sites. (b) in our assay, the proteins are expressed as gfp fusions using a eukaryotic cell-free system, and visualized by fluorescence on non-reducing sds-page gels.
the proteases were added during protein expression, and the changes in banding pattern (protein size/integrity) were analysed to detect potential cleavage. here, the results obtained for the family of irf proteins are shown with plpro, and the additional band obtained for irf is indicated by a blue arrow (bottom panel, in the presence of nsp /plpro). (c) overview of the proteins tested in this study and the proteolytic events detected: out of the proteins tested, plpro cleaves only irf (indicated in blue) and clpro cleaves nlrp and tab (as shown in red).

note that the first g is believed to be crucial for successful cleavage. (c) representation of the domains found in irf and the position of the cleavage site. (d) representation of irf structure (from pdb j f). the cleavage sequence lggg is highlighted in blue; the site is presented in a flexible loop and seems fully exposed for cleavage by plpro. (e) alignment of the amino acids for human irf -irf , demonstrating that only irf would be predicted to be cleaved as observed. (f) structure of the irf homodimer (from pdb qwt), showing the two fragments (green for the n-terminal fragment and blue for the c-terminal fragment). the cleavage seems to affect the dimeric interface, but this would need to be demonstrated experimentally.

the gel shows two additional bands upon cleavage, corresponding to the fragment - and to the fragment to . these two fragments are visible as they carry the gfp tag; the fragments - , - and - are not visible in this configuration. (b) same, but for the n-terminal gfp. in this case, the fragments - and - are fluorescent and can be detected on the gel, while the fragments - , - and - are not fluorescent. (c) schematic representation of tab protein structure with the location of the identified cleavage sites on the primary sequence. (d) representation of tab structure (from pdb pom).
the cleavage sequence aslqs is highlighted in red; the site is presented in a flexible loop in a helix-loop-helix motif, and seems fully exposed for cleavage by clpro. (e) full sequence of amino acids for human tab , showing the two putative cleavage sites aslqs and ltlqs. the second cleavage site is not present in existing structures of tab but is predicted to be flexible and accessible. note that the removal of the c-terminal tail of tab is mimicked by a natural isoform.

(a) sds-page analysis of the cleavage of nlrp protein, with a c-terminal gfp tag. the protein was expressed alone or in the presence of increasing concentrations of the sars-cov- protease clpro. the gel shows that two separate cleavage sites produce two distinct fragments attached to the c-terminal gfp. (b) logo analysis of the cleavage site predicted for clpro, from the polyprotein cleavage of sars-cov- . note that the q residue in position p- is believed to be crucial for successful cleavage. (c) representation of the domains found in nlrp and the position of the cleavage sites. (d) alignment of the amino acids for human nalps (nlrps), demonstrating that only nlrp would be predicted to be cleaved as observed. below, alignment of mouse nlrp , showing that the first cleavage site is conserved, but the second site presents an a→v mutation that would disrupt cleavage. (e) sds-page analysis of human and mouse nlrp , with different tag orientations, to demonstrate the differences between species. with an n-terminal gfp tag, the human nlrp appears as two fragments ( - and - ), as the fragments - and - would be non-fluorescent. with a c-terminal tag, only the fragments - and - are fluorescent and are detected on the gel, and the fragments - and - are non-fluorescent. the banding patterns obtained in the presence of clpro are consistent with the predicted sizes. using the mouse nlrp constructs, only one cleaved fragment is observed in the n-term and c-term constructs.
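the reasoning in these legends about which fragments stay visible can be written down explicitly: a fragment is fluorescent only if it retains the tagged terminus. the sketch below enumerates every fragment that complete or partial cleavage can produce and keeps those carrying the tag. the protein length and site positions are hypothetical placeholders, since the real residue numbers are not given in this text, and the function names are illustrative.

```python
# Sketch: which cleavage fragments carry a terminal fluorescent tag.
# Length and site positions in the example are hypothetical placeholders.
from itertools import combinations


def cleavage_fragments(length, sites):
    """All fragments from complete or partial cleavage, as 1-based inclusive
    (start, end) pairs. A site s means the bond after residue s is cut.
    The uncleaved full-length protein (1, length) is included."""
    if any(s <= 0 or s >= length for s in sites):
        raise ValueError("each site must lie strictly inside the protein")
    bounds = [0] + sorted(set(sites)) + [length]
    return [(start + 1, end) for start, end in combinations(bounds, 2)]


def fluorescent_fragments(length, sites, tag):
    """Fragments that keep the tag: 'N' keeps residue 1, 'C' keeps the last residue."""
    frags = cleavage_fragments(length, sites)
    if tag == "N":
        return [f for f in frags if f[0] == 1]
    if tag == "C":
        return [f for f in frags if f[1] == length]
    raise ValueError("tag must be 'N' or 'C'")


# Two sites (human-like case) versus one site (mouse-like case).
two_sites_n = fluorescent_fragments(1000, [200, 900], "N")
two_sites_c = fluorescent_fragments(1000, [200, 900], "C")
one_site_n = fluorescent_fragments(1000, [200], "N")
print(two_sites_n, two_sites_c, one_site_n)
```

with two sites, each tag orientation reveals the full-length protein plus two cleaved fragments and hides three others, which is the pattern the legends describe; with a single site, only one cleaved fragment is visible per orientation.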
here, a myc-mcherry was used and the tag was detected by the red fluorescence of mcherry. in the n-term configuration, a single cleaved product is detected, corresponding to the fragment - . in the c-term configuration, the fragment - is detected; this confirms that the lqa→lqv mutation found in mice inhibits cleavage by clpro. (f) representation of nlrp structure (derived from the structure of nlrp from pdb npy). (top): the cleavage sequence lfqg (site , at residue ) is highlighted in red; the site is presented in a flexible loop and seems fully exposed for cleavage by clpro.

(a-c) stable t-ace and t-ace /tmprss cells generated by lentiviral transduction were infected with sars-cov- at . or . moi. uninfected cells plated and treated identically served as mock-infected controls. hpi, cell lysates were collected, and the indicated proteins were detected by western blot using the relevant antibodies as specified in methods. coxiv serves as the loading control. the relative amounts of irf , tab and nlrp were quantified by densitometry and presented as a ratio to coxiv, with the ratio obtained from mock-infected t-ace cells set at .

all primates tested presented both irf and nlrp cleavage sequences, except for the rhesus monkey and cynomolgus ("crab-eating") monkey where the second nlrp cleavage site is mutated or missing. in rodents, most of the typical species used for testing in laboratories (rat, hamsters, and mouse) have a mutation in the most important residue for plpro cleavage. cats, tigers (and most felines, not shown) present both irf and nlrp cleavage sites and we would predict that both proteins would be cleaved during sars-cov- infection. ferrets have variations in these sites, and it is unlikely that the proteins would be affected. cats and tigers can present respiratory symptoms, but dogs are not affected. unfortunately, the genome of the european mink (mustela lutreola) could not be found at this stage.
european minks can present the disease and transmit sars-cov- . horses, pigs and camels possess the irf cleavage site but not the nlrp sites; on the contrary, the rabbit has a mutation in irf , but we would predict that the second nlrp site would be cleaved. both cleavage sites of tab are exactly conserved across all these species. (*) the protein sequences for cotton rats (sigmodon hispidus) and minks (neovison vison) were found using tblastn against shotgun genomes, with the query aah . for irf protein [(isoform ) homo sapiens] and np_ . for nacht, lrr and pyd domains-containing protein [(isoform ) homo sapiens].

most "exotic" species that would be relevant for sars-cov, mers or sars-cov- present the correct cleavage site for irf . we found that one species of bats, david's myotis, presents the three cleavage sites in irf and nlrp (and also the two cleavage sites in tab ). this species of bats is endemic to the province of hubei, where the sars-cov- pandemic originated. another small animal found in the same province, the chinese tree shrew, displays at least two cleavage sites for irf and nlrp ; the first cleavage site in nlrp (klfrg) may potentially be cleaved. the species of pangolin described in china (manis javanica) does not possess the nlrp cleavage sites (note that, surprisingly, an african pangolin presents all three cleavage sites identical to humans). both cleavage sites of tab are exactly conserved across all these species. the presence of the five human-like cleavage sites for irf , tab and nlrp in a single species shows that it is possible that the sars viruses could have gained the new functionality of cleaving these human innate immune proteins in a single reservoir host, potentially in myotis davidii.

evolution of severe acute respiratory syndrome coronavirus (sars-cov- ) as coronavirus disease (covid- ) pandemic: a global health emergency. science of the total environment human coronaviruses with emphasis on the covid- outbreak.
virusdisease functional studies of the coronavirus nonstructural proteins. stemedicine, . identification of potential binders of the main protease clpro of the covid- via structure-based ligand design and molecular modeling structural basis of sars-cov- clpro and anti-covid- drug discovery from medicinal plants prediction of the sars-cov- ( -ncov) c-like protease ( cl (pro)) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates virtual screening and repurposing of fda approved drugs against covid- main protease artificial intelligence approach fighting covid- with repurposing drugs sars coronavirus and innate immunity. virus research rna-virus proteases counteracting host innate immunity the papain-like protease determines a virulence trait that varies among members of the sars-coronavirus species sars coronavirus papain-like protease inhibits the type i interferon signaling pathway through interaction with the sting-traf -tbk complex severe acute respiratory syndrome coronavirus papain-like protease ubiquitin-like domain and catalytic domain regulate antagonism of irf and nf-kappab signaling proteolytic processing, deubiquitinase and interferon antagonist activities of middle east respiratory syndrome coronavirus papain-like protease coronavirus papain-like proteases negatively regulate antiviral innate immune response through disruption of sting-mediated signaling plp of mouse hepatitis virus a (mhv-a ) targets tbk to negatively regulate cellular type i interferon signaling pathway sars coronavirus papain-like protease inhibits the tlr signaling pathway through removing lys -linked polyubiquitination of traf and traf the sars coronavirus papain like protease can inhibit irf at a post activation step that requires deubiquitination activity regulation of irf- -dependent innate immunity by the papain-like protease domain of the severe acute respiratory syndrome coronavirus mers-cov papain-like protease has deisgylating 
and deubiquitinating activities nsp of coronaviruses: structures and functions of a large multi-domain protein recognition of lys -linked di-ubiquitin and deubiquitinating activities of the sars coronavirus papain-like protease sars coronavirus papain-like protease up-regulates the collagen expression through non-samd tgf-β signaling p down-regulates sars coronavirus replication and is targeted by the sars-unique domain and plpro via e ubiquitin ligase rchy feline infectious peritonitis virus nsp inhibits type i interferon production by cleaving nemo at multiple sites. viruses severe acute respiratory syndrome coronavirus c-like protease-induced apoptosis innatedb: systems biology of innate immunity and beyond--recent updates and continuing curation pathbank: a comprehensive pathway database for model organisms virus pathogen resource (vipr) virusmentha: a new resource for virus-host protein interactions imbalanced host response to sars-cov- drives development of covid- type i ifn immunoprofiling in covid- patients clinical and immunological features of severe and moderate coronavirus disease complex immune dysregulation in covid- patients with severe respiratory failure antiviral innate immunity pathways performance benchmarking of four cell-free protein expression systems pathological mutations differentially affect the self-assembly and polymerisation of the innate immune system signalling adaptor molecule myd structural basis of tir-domain-assembly formation in mal-and myd -dependent tlr signaling single-molecule fluorescence reveals the oligomerization and folding steps driving the prion-like behavior of asc papain-like protease (plp ) from severe acute respiratory syndrome coronavirus (sars-cov): expression, purification, characterization, and inhibition catalytic function and substrate specificity of the papain-like protease domain of nsp from the middle east respiratory syndrome coronavirus x-ray crystal structure of irf- and its functional implications 
regulating irfs in ifn driven disease structural basis for concerted recruitment and activation of irf- by innate immune adaptor proteins virus-dependent phosphorylation of the irf- transcription factor regulates nuclear translocation, transactivation potential, and proteasome-mediated degradation positive regulation of interferon regulatory factor activation by herc via isg modification direct inhibition of irf-dependent transcriptional regulatory mechanisms associated with disease biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus c-like proteinase identification and evaluation of potent middle east respiratory syndrome coronavirus (mers-cov) cl(pro) inhibitors. antiviral research substrate specificity profiling of sars-cov- mpro protease provides basis for anti-covid- drug design tak -binding protein is a pseudophosphatase functions of nod-like receptors in human diseases nlrp suppresses colon inflammation and tumorigenesis through the negative regulation of noncanonical nf-κb signaling pypaf , a novel pyrin-containing apaf -like protein that regulates activation of nf-kappa b and caspase- -dependent cytokine processing nlrp , nlrp , and ifi inflammasomes induction and caspase- activation triggered by virulent hsv- strains are associated with severe corneal inflammatory herpetic disease. 
front immunol beyond the inflammasome: regulatory nod-like receptor modulation of the host immune response following virus exposure nlrp regulates anti-viral rig-i activation via interaction with trim cutting edge: monarch- : a pyrin/nucleotide-binding domain/leucine-rich repeat protein that controls classical and nonclassical mhc class i genes structural mechanism for nek -licensed activation of nlrp inflammasome angiotensin-converting enzyme is a functional receptor for the sars coronavirus efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor tmprss and adam cleave ace differentially and only proteolysis by tmprss augments entry driven by the severe acute respiratory syndrome coronavirus spike protein irf and stat transcription factors -from basic biology to roles in infection, protective immunity, and primary immunodeficiencies the molecular basis of viral inhibition of irf-and stat-dependent immune responses. frontiers in immunology enterovirus c inhibits cytokine expression through cleavage of the tak /tab /tab /tab complex c protease of enterovirus d inhibits cellular defense mediated by interferon regulatory factor viral innate immune evasion and the pathogenesis of emerging rna virus infections. viruses plp , a potent deubiquitinase from murine hepatitis virus, strongly inhibits cellular type i interferon production post-translational modifications of the tak -tab complex tak , more than just innate immunity. iubmb life pan-optosis). 
frontiers in cellular and infection microbiology the kinase tak can activate the nik-iκb as well as the map kinase cascade in the il- signalling pathway irak and tak are required for il- -mediated signaling tab β (transforming growth factor-β-activated protein kinase -binding protein β), a novel splicing variant of tab that interacts with p α but not tak transforming growth factor-β-activated protein kinase -binding protein (tΑΒ)- α, but not ΤΑΒ β, mediates cytokine-induced p mitogen-activated protein kinase phosphorylation and cell death in insulin-producing cells identification and functional characterization of novel phosphorylation sites in tak -binding protein (tab) . plos one nlr proteins: integral members of innate immunity and mediators of inflammatory diseases the inflammasomes. cell atp binding by monarch- /nlrp is critical for its inhibitory function cryopyrin/nalp binds atp/datp, is an atpase, and requires atp binding to mediate inflammatory signaling unified polymerization mechanism for the assembly of asc-dependent inflammasomes the multifaceted nature of nlrp nlrp in autoimmune diseases mutations in nalp cause hereditary periodic fever syndromes epigenetic hypomethylation and upregulation of nlrc and nlrp in kawasaki disease an outbreak of severe kawasaki-like disease at the italian epicentre of the sars-cov- epidemic: an observational cohort study sars-cov- -induced kawasaki-like hyperinflammatory syndrome: a novel covid phenotype in children susceptibility of ferrets, cats, dogs, and other domesticated animals to sarscoronavirus . science, . . coronavirus: tiger at bronx zoo tests positive for covid- , in bbc-news. . . 
dutch farm worker contracted coronavirus from mink: agriculture minister from mice to monkeys, animals studied for coronavirus answers structural basis of receptor recognition by sars-cov- mice transgenic for human angiotensin-converting enzyme provide a model for sars coronavirus infection the pathogenicity of sars-cov- in hace transgenic mice infection and rapid transmission of sars-cov- in ferrets age-related rhesus macaque models of covid- a genomic perspective on the origin and emergence of sars-cov- . cell genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a novel bat coronavirus closely related to sars-cov- contains natural insertions at the s /s cleavage site of the spike protein probable pangolin origin of sars-cov- associated with the covid- outbreak isolation of sars-cov- -related coronavirus from malayan pangolins gateway-compatible vectors for high-throughput protein expression in proand eukaryotic cell-free systems species-independent translational leaders facilitate cell-free expression cell-free gene expression: an expanded repertoire of applications a cell-free approach to accelerate the study of protein-protein interactions in vitro. interface focus leishmania cell-free protein expression system the controls and protease-treated lte reactions were then mixed with lds (bolt lds sample buffer, thermofisher) and loaded onto sds-page gels ( - % bis-tris plus gels, thermofisher); the proteins were detected by scanning the gel for green (gfp) or red (mcherry) fluorescence using a chemidoc mp system (biorad) and proteolytic cleavage was assessed from the changes in banding patterns, as shown in figure b . note that in this protocol, the proteins are not treated at high temperature with the lds and not fully denatured, to avoid destruction of the gfp/mcherry fluorescence. 
as proteins would retain some folding, the apparent migration on the sds-page gels may differ slightly from the expected migration calculated from their molecular weight. we have calibrated our sds-page gels and ladders using a range of proteins, as shown in supplementary information.

leishmania tarentolae extracts were prepared in house using the protocol described previously [ , ] . briefly, leishmania tarentolae parrot strain was obtained as lexsy host p from jena bioscience gmbh, jena, germany and cultured in tbgg medium containing . % v/v penicillin/streptomycin (life technologies) and . % w/v hemin (mp biomedical). cells were harvested by centrifugation at x g, washed twice by resuspension in mm hepes, ph . , containing mm sucrose, mm potassium acetate and mm magnesium acetate and resuspended to . g cells/g suspension. cells were placed in a cell disruption vessel (parr instruments, usa) and incubated under kpa nitrogen for minutes, then lysed by rapid release of pressure. the lysate was clarified by sequential centrifugation at x g and x g and anti-splice leader dna oligonucleotide was added to μm. the lysate was then desalted into mm hepes, ph . , containing mm potassium acetate and mm magnesium acetate, supplemented with a coupled translation/transcription feeding solution and snap-frozen until required. we verified that the expression patterns and cleavage of the proteins in this study were independent of the batch of lte used.

cells were lysed with ripa lysis buffer (pierce) containing a cocktail of protease inhibitors (cell signaling). equivalent amounts of proteins determined by the bradford protein assay (bio-rad) were separated by sds-page and transferred through the trans-blot turbo transfer system (bio-rad). to avoid nonspecific antibody reactions, the membranes were blocked with intercept blocking buffer (li-cor) prior to the addition of primary antibodies.
after incubation with primary antibodies, the blots were treated with alexa fluor conjugated secondary antibodies (invitrogen) and signals were detected using the chemidoc mp imaging system (bio-rad). the following commercial antibodies were used: anti-irf (ab ) and anti-ace (ab ) from abcam, anti-nlrp (pa - ) from invitrogen, anti-tab (# ) from cell signaling, anti-sars-cov- n (gtx ) from genetex, anti-tmprss (sc- ) from santa cruz biotechnology, and anti-coxiv ( - -ap) from proteintech.

sars-cov- proteases cleave irf and critical modulators of inflammatory pathways (nlrp and tab ): implications for disease presentation across species and the search for reservoir hosts.

purification of plpro (nsp ) and clpro (nsp ): e. coli c (de ) cells were transformed with the sars-cov- pet- a-nsp plasmid (idt dna). single colonies were used to inoculate lb media supplemented with kanamycin ( µg/ml). ml cultures were grown at °c until an od of . was reached, cooled to °c and induced for hours with mm iptg. following growth, the cells were pelleted by centrifugation ( x g, min, °c) and washed with x pbs before storage at - °c. frozen cell pellets were resuspended in buffer a ( mm sodium phosphate buffer ph , mm nacl, mm imidazole) and lysed on ice via sonication using a branson sfx sonifier ( minutes at % amplitude, s pulse on, s pulse off). cell debris was removed from the lysate by centrifugation ( , x g, min, °c) and subsequent filtration of the supernatant through a . µm syringe filter. ml of the clarified lysate was loaded onto a ml hitrap imac sepharose ff column (ge healthcare, illinois) charged with ni + and pre-equilibrated with buffer a. unbound proteins were removed from the column by washing with column volumes (cv) of buffer a. bound proteins were eluted with a stepwise gradient of buffer b ( mm sodium phosphate buffer ph , mm nacl, mm imidazole) as follows: - % b, cv; % b, cv hold; - % b, cv; % b, cv hold; - % b, cv; % b, cv hold.
fractions containing nsp were exchanged and concentrated into buffer c ( mm tris, ph . , % glycerol), flash frozen and then stored at − °c. all purification steps were performed on ice or at °c. protein concentrations were determined using a linearized bradford protein assay.

the human innate immune proteins (hiips) listed in figure c were cloned as gfp or mcherry fusions into dedicated gateway vectors for cell-free expression. open reading frames (orfs) were sourced from the human orfeome collections, versions . , . and . and transferred into gateway destination vectors that include n-terminal or c-terminal fluorescent proteins [ ] . most proteins screened were expressed as n-terminal enhanced gfp fusions (vector pcellfree g ); for tab and nlrp , c-terminal gfp fusions were also used to validate the cleavage sites. the specific gateway vectors were created by the laboratory of prof. alexandrov and sourced from addgene (addgene plasmid # ; http://n t.net/addgene: ; rrid:addgene_ ). mouse nlrp constructs were sourced from the laboratory of dr kate schroeder (imb, university of queensland).

the hiips were expressed in vitro using a cell-free expression system derived from leishmania tarentolae [ , ] . this eukaryotic system enables expression of full-length proteins with minimal truncations and non-specific aggregation, for proteins up to kda in size []. this system has been used recently to study the folding and oligomerisation of nlrp proteins and the polymerization of asc [ref jmb] or the formation of higher-order assemblies of myd [ref bostjan ref bmc] . the expression is simply set up as a one-pot reaction where the plasmid encoding the protein of interest is added to the leishmania tarentolae extracts (lte); expression occurs within h at ℃ and expression yields can be evaluated by the fluorescence intensity of the gfp/mcherry tags [ ] .
the hiips were expressed individually in µl reactions ( µl dna plasmid at concentrations ranging from ng/µl to ng/µl added to µl of lte reagent). the mixture was incubated for minutes at ℃ to allow the efficient conversion of dna into rna. the samples were then split into controls and protease-containing reactions. the proteases plpro (nsp ) and clpro (nsp ) were added at various concentrations, and the reactions were allowed to proceed for another . h at ℃ before analysis.

the cell-free expression system was loaded onto the sds-page gels following the classic protocol and stained with coomassie to reveal the proteins present in the lte system. the sars-cov- protease plpro (nsp ) was added at concentrations ranging from µm to µm (same as in figure a ). the gel shows on the right that the banding pattern of lte is not affected by plpro. (left): the density of bands across the control (lte without plpro added) was compared to the one obtained when µm of purified plpro was added in the expression system. no significant differences were noted, confirming that plpro does not have a non-specific cleavage activity on the proteins of lte. also, the intensity of the gfp-tagged protein bands on the gels did not vary significantly when plpro or clpro were added, suggesting that the expression levels were unaffected, and that the components of the cell-free systems essential for expression were not cleaved.

with different molecular weights, ranging from to kda; the hiips were expressed in lte, mixed with lds after h of expression, and separated on the sds-page gel. in this gel, two different ladders were used to calibrate migration of proteins of interest. the page ruler prestained ladder has been developed specifically for the - % bis-tris gels (thermofisher, black •) and seems more accurate to predict sizes of gfp-tagged proteins (in white diamonds ◊).
the trendline indicated on the graph (right) will be used to estimate the size of the gfp-tagged fragments of tab and nlrp .

probably due to the non-denaturation of the protein tab in our lds loading protocol, the full-length human tab protein migrates slower than expected on the sds-page gel (estimated size kda from the calibration, vs . kda expected). nevertheless, the variations in size due to clpro cleavage are consistent with the two sites at position and . on the gel, the fragments are indicated, and the migration was analysed on the master curve to estimate the size of the gfp-tagged fragments. the kda of the gfp were taken into account in the table (right) to compare the predicted and observed sizes of the proteins and protein fragments.

the full-length human nlrp proteins, tagged in n-term and c-term with gfp, were mixed with clpro and analysed by sds-page gels. the five bands obtained for the full-length nlrp and the four fluorescent cleavage products were analysed on the predictive size/migration plot. the variations in size are perfectly consistent with the two cleavage sites at position and . on the sds-page gel on the left, the fragments are indicated, and the migration was analysed on the master curve to estimate the size of the gfp-tagged fragments. the sizes obtained (after removing the kda contribution of the gfp) were compared to the predicted sizes of the proteins and protein fragments.
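the size estimation described in these supplementary legends follows a standard pattern: fit a trendline of log molecular weight against migration for the ladder, read fragment sizes off that line, then subtract the mass of the tag. the sketch below assumes a log-linear relationship, which is a common approximation and not necessarily the exact trendline used by the authors. the ladder values are hypothetical placeholders, and the tag mass of 27 kda is an approximate figure for gfp supplied as an assumption because the value used in the study is not given in this text.

```python
# Sketch: estimate fragment size from gel migration with a log-linear ladder fit.
# Ladder points and the tag mass are assumptions, not values from the study.
import math

# (relative migration, size in kDa): hypothetical ladder readings
ladder = [(0.10, 250), (0.22, 130), (0.30, 100), (0.42, 70),
          (0.52, 55), (0.70, 35), (0.85, 25)]


def fit_calibration(points):
    """Least-squares fit of log10(size_kda) = intercept + slope * migration."""
    xs = [m for m, _ in points]
    ys = [math.log10(kda) for _, kda in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(migration, slope, intercept, tag_kda=0.0):
    """Size in kDa read from the calibration line, minus the tag contribution."""
    return 10 ** (intercept + slope * migration) - tag_kda


slope, intercept = fit_calibration(ladder)
GFP_KDA = 27.0  # assumed approximate mass of GFP
band_size = estimate_size(0.35, slope, intercept, tag_kda=GFP_KDA)
print(round(band_size, 1))
```

because the samples are not fully denatured, such estimates are best used to compare fragments against each other on the same gel, as the legends do, and not as absolute molecular weights.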
key: cord- - ipkk authors: chen, dongsheng; sun, jian; zhu, jiacheng; ding, xiangning; lan, tianming; zhu, linnan; xiang, rong; ding, peiwen; wang, haoyu; wang, xiaoling; wu, weiying; qiu, jiaying; wang, shiyou; li, haimeng; an, fuyu; bao, heng; zhang, le; han, lei; zhu, yixin; wang, xiran; wang, feiyue; yuan, yuting; wu, wendi; sun, chengcheng; lu, haorong; wu, jihong; sun, xinghuai; zhang, shenghai; sahu, sunil kumar; chen, haixia; fang, dongming; luo, lihua; zeng, yuying; wu, yiquan; cui, zehua; he, qian; jiang, sanjie; ma, xiaoyan; feng, weimin; xu, yan; li, fang; liu, zhongmin; chen, lei; chen, fang; jin, xin; qiu, wei; yang, huanming; wang, jian; hua, yan; liu, yahong; liu, huan; xu, xun title: single-cell screening of sars-cov- target cells in pets, livestock, poultry and wildlife date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ipkk a few animals have been suspected to be intermediate hosts of severe acute respiratory syndrome coronavirus (sars-cov- ). however, a large-scale single-cell screening of sars-cov- target cells on a wide variety of animals is missing. here, we constructed the single-cell atlas for representative species in pets, livestock, poultry, and wildlife. notably, the proportion of sars-cov- target cells in cat was found considerably higher than other species we investigated and sars-cov- target cells were detected in multiple cell types of domestic pig, implying the necessity to carefully evaluate the risk of cats during the current covid- pandemic and keep pigs under surveillance for the possibility of becoming intermediate hosts in future coronavirus outbreak. furthermore, we screened the expression patterns of receptors for viruses, resulting in a comprehensive atlas of virus target cells. 
taken together, our work provides a novel and fundamental strategy to screen virus target cells and susceptible species, based on single-cell transcriptomes we generated for domesticated animals and wildlife, which could function as a valuable resource for controlling current pandemics and serve as an early warning system for coping with future infectious disease threats.

in the past two decades, the world has witnessed the outbreak and spread of sars, middle east respiratory syndrome (mers) , zika , avian influenza and swine influenza , which pose an urgent challenge to our infectious disease prevention and control system. recently, sars-cov- has caused a highly contagious pandemic disease named coronavirus disease (covid- ), which is rapidly spreading all over the world and has triggered a severe public health emergency. as of rd june , globally the total number of confirmed covid- cases and deaths has reached , , and , , respectively . the bat has been proposed to be the original host of sars-cov- ; however, the transmission from bats to humans requires some intermediate hosts. several studies have linked pangolins, cats, dogs and hamsters with sars-cov- infection and transmission [ ] [ ] [ ] [ ] [ ] [ ] [ ] , indicating the potential widespread prevalence across animals, which would pose potential threats to humans. the identification of the origin of this virus and its path to becoming a deadly human pathogen is needed to understand how such processes occur in nature and identify ways we can prevent the onset of these types of global crises in the future. evaluating host susceptibility is critical for controlling the infectious disease.
the screening of putative virus hosts is usually performed using in vivo assays or inoculation experiments, which help to reveal host susceptibility; however, there are several limitations: ) experiments for some dangerous and infectious viruses to cell receptor angiotensin-converting enzyme (ace ) and the cleavage of s protein by transmembrane serine protease (tmprss ). although sars-cov- like coronavirus has been isolated from pangolins and bats, their susceptible cell types for sars-cov- are not clear. given that the species barrier of sars-cov- was estimated to be relatively low and livestock, poultry and pets have very close contact with humans, it is crucial to evaluate animal susceptibility to sars-cov- . previous studies have proposed that animal tissues show high heterogeneity in terms of cellular composition and gene expression profiles, and ace is only expressed in a small proportion of specific cell populations, making single cell analysis of sars-cov- target cells an attractive field to investigate. here, we constructed the single cell atlas for livestock, poultry, pets and wildlife, then screened putative sars-cov- target cells (indicated by the co-expression patterns of sars-cov- entry receptor ace and sars-cov- entry activator tmprss ) and systematically evaluated their susceptibility, with the aim of understanding the virus transmission routes and providing clues to fight against covid- . both pangolin and cat are suspected to be sars-cov- intermediate hosts. sars-cov- like coronavirus has been isolated from pangolin, and sars-cov- was proposed to originate from the recombination of a pangolin coronavirus with a bat coronavirus. cat is also a suspected intermediate host, as human-to-cat and cat-to-cat transmission of sars-cov- have been reported. domestic pig is an animal in close contact with humans and has been reported to be susceptible to sars coronavirus.
although those animals have been linked with coronavirus, a comprehensive single-cell atlas for those species is missing. in this study, we generated single nuclei libraries for various tissues of pangolin (heart, liver, spleen, lung, kidney, large intestine, duodenum, stomach and esophagus), cat (heart, liver, lung, kidney, eyelid, esophagus, duodenum, colon and rectum), and pig (heart, liver, spleen, lung, kidney, hypothalamus, area postrema, vascular organ of lamina terminalis, subfornical organ and cerebellum) (fig. a, table ). in cat, ace and tmprss co-expressing cells were detected in lung (ati, atii, secretory cells, mesothelial cells, ciliated cells, endothelial cells, fibroblasts, macrophages), kidney (endothelial cells, non-proximal tubule cells, proximal tubule cells, stromal cells), eyelid (endothelial cells, epithelial cells and immune cells), esophagus (immune cells) and rectum (enterocytes). notably, we observed over % co-expression of ace and tmprss in proximal tubule cells of cat kidney, and around % in epithelial cells of cat eyelid (fig. d, supplementary table ). in pangolin, sars-cov- target cells were found in lung endothelial cells, kidney (endothelial cells, podocytes and proximal tubule cells), liver (hepatocytes) and spleen (immune cells) (fig. g, supplementary table ). in pig, ace and tmprss were mainly co-expressed in lung (ati, atii, ciliated cells, secretory cells, endothelial cells, fibroblasts, macrophages) and kidney (non-proximal tubule cells, proximal tubule cells, endothelial cells, podocytes) (fig. j, supplementary table ). overall, the proportions of sars-cov- target cells in cat are much higher than the proportions in corresponding cell types of all other species studied here. consistent with a previous report that sars-cov- replicates poorly in chicken and duck, no sars-cov- target cells were found in lung cells of poultry (chicken, duck, goose and pigeon) (fig. a, b).
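the per-cell-type proportions reported above can be sketched as a simple count of double-positive cells. the block below is a minimal illustration and not the authors' pipeline: the function name, the input layout and the detection threshold `min_count` are assumptions, and the gene identifiers are passed in by the caller rather than hard-coded.

```python
from collections import defaultdict


def coexpression_percent(cells, receptor_gene, protease_gene, min_count=1):
    """Return {cell_type: percent of cells expressing both genes}.

    cells: iterable of (cell_type, counts) pairs, where counts maps a gene
        name to that cell's read/UMI count.
    min_count: assumed detection threshold (the source does not state one).
    """
    totals = defaultdict(int)
    double_positive = defaultdict(int)
    for cell_type, counts in cells:
        totals[cell_type] += 1
        has_receptor = counts.get(receptor_gene, 0) >= min_count
        has_protease = counts.get(protease_gene, 0) >= min_count
        if has_receptor and has_protease:
            double_positive[cell_type] += 1
    # percentage of double-positive cells within each annotated cell type
    return {ct: 100.0 * double_positive[ct] / totals[ct] for ct in totals}


# toy data, purely illustrative (not values from the study)
toy_cells = [
    ("ciliated", {"receptor": 2, "protease": 1}),
    ("ciliated", {"receptor": 0, "protease": 3}),
    ("secretory", {"receptor": 1, "protease": 1}),
    ("secretory", {"receptor": 4, "protease": 2}),
]
toy_result = coexpression_percent(toy_cells, "receptor", "protease")
```

in practice the same count would be taken from the expression matrix of an annotated single-cell object; the dictionary layout here only keeps the sketch dependency-free.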
in cat, sars-cov- target cells were detected in eight out of eleven cell types investigated, with the top two cell types being ciliated cells and secretory cells. in pig, co-expression of ace and tmprss was observed in seven out of eleven cell types. in pangolin, a small proportion of endothelial cells were found to co-express ace and tmprss . in hamster, ciliated cells were the cell type with the most abundant sars-cov- target cells. in goat, we detected the co-expression of ace and tmprss in ati, fibroblasts, endothelial cells, ciliated cells and t cells. goats share a highly similar ace amino acid sequence with pig and human, implying that goat ace might have a similar capability for mediating virus entry into host cells. in lizard, we detected the co-expression of ace and tmprss , mainly in b cells. in dog, less than . % ace and tmprss co-expressing cells were detected in ciliated cells and atii (fig. a, b). infection with sars-cov- can lead to severe pneumonia, and respiratory diseases caused by other respiratory viruses are also noteworthy. to reveal the putative target lung cells of other respiratory viruses, we screened the expression patterns of virus receptors for a total of virus species derived from virus families (coronaviridae, orthomyxoviridae, adenoviridae, hantaviridae, matonaviridae, paramyxoviridae, parvoviridae, phenuiviridae, picornaviridae, pneumoviridae, reoviridae and rhabdoviridae) (fig. c, supplementary table ), which have been shown to be able to transmit via the respiratory system. generally, poultry lung cells express fewer types of virus receptors than those of mammals and reptiles. for example, coronavirus receptors were generally not expressed in poultry lung cells, except that human coronavirus e receptor anpep was expressed in chicken and duck lung cells but not in goose and pigeon lung cells.
mers coronavirus receptor dpp was expressed in chicken and goose lung cells but was barely detected in corresponding cell types in duck and pigeon. coronavirus sars receptor clec g was expressed in goat lung cells (ciliated cells, endothelial cells, fibroblasts and macrophages) but was absent in lung cells of other species. influenza a virus receptors uvrag and egfr were present in lung cells of most species we investigated, but were absent or exhibited meagre expression in pangolin and pigeon. adenoviridae virus receptors showed preferential expression in mammals, except adenovirus type c receptors, which were also present in poultry lung cells. rhinovirus c receptor displayed preferential expression in ciliated cells of human, cat, dog, hamster, pig, goat and goose, while respiratory syncytial virus receptor cd was only present in human macrophage cells. taken together, our work, for the first time, revealed the putative target cells for respiratory viruses in an important organ of the respiratory system (lung), which lays the foundation for dissecting the infection and transmission of respiratory system viruses at the single-cell level. while comparing the frequencies of sars-cov- target lung cells across different species, we noticed that cat clearly exceeds other species, with . % in ciliated cells, compared to pig ( . %) and hamster ( . %) (supplementary table ). moreover, comparing the proportions of sars-cov- target cells in distinct organs among cat, pangolin and pig further indicated that the proportion of ace and tmprss co-expressing cells was much higher in cat. for example, the proportion of ace and tmprss co-expressing cells was as high as % in cat kidney proximal tubular cells, while the proportions were only around % and % in the corresponding cell type in pangolin and pig, respectively (supplementary table ).
we also noticed that sars-cov- target cells were widely distributed among organs within the digestive system (esophagus, rectum), respiratory system (lung) and urinary system (kidney) of cat (fig. a), implying that cats could be infected by sars-cov- via multiple routes, such as dietary infection of the digestive tract or airborne transmission through the respiratory system. to highlight sars-cov- susceptible cell types, we summarized all cell types with a proportion of ace and tmprss co-expressing cells higher than %, which clearly shows that cat, as well as pig, has more susceptible cell types than the other species we investigated (fig. a). taken together, our data explain the observation that cats are highly permissive to sars-cov- and raise the necessity to carefully monitor and evaluate the possible role of cats as intermediate hosts in the current pandemic. sars-cov- replicates poorly in dog, chicken and duck. our data suggest that the co-expression of ace and tmprss is very rare in dog lung cells and absent in poultry lung cells (chicken, duck, goose and pigeon). besides, it has been predicted that dog and poultry ace cannot be utilized efficiently by sars-cov- spike glycoprotein because of mutations in critical amino acids of dog ace . therefore, our data, to some extent, explain why dogs and poultry are not as permissive for sars-cov- infection as cats. in addition to cats, pangolins and hamsters have been reported to be permissive for sars-cov- infection; however, the target cells for virus infection and putative transmission routes are largely unknown. our study identified the sars-cov- target cells in distinct tissues of cat, pangolin and hamster, indicated by the simultaneous expression of sars-cov- entry factors: ace and tmprss . a detailed comparative analysis among cat, pangolin and hamster revealed that the proportion of sars-cov- target cells in cat was much higher than in pangolin and hamster, implying that cats are more susceptible to sars-cov- .
besides, as companion animals, cats interact with humans more frequently than pangolins do; thus we propose that cats should be closely monitored in the current covid pandemic. in addition, we also detected the co-expression of ace and tmprss in lung cells of goat and lizard. to investigate the susceptibility of host cells to different kinds of viruses, we screened table ). intriguingly, we found rabies lyssavirus receptor ncam was widely expressed in cell types of the pig neural system but rarely expressed in non-neural tissues. besides, we found that vesicular stomatitis virus receptor uvrag was preferentially expressed in pig lung cells (extended data fig. ). to present the proportions of virus receptor expressing cells in an intuitive manner, we proposed a system named "traffic light system for virus entry" to assign each cell type one of the following three statuses: red light, yellow light and green light. briefly, if the receptor is expressed in less than % of the cells of a given cell type, a "red light" status is assigned. if the proportion is between % and %, a "yellow light" status is assigned. if the proportion is over %, a "green light" status is assigned (see methods). based on this system, we assigned a status to all the cell populations of all the investigated species (supplementary data table ). to demonstrate the potential usefulness of this system, we employed this analysis and our project provides a novel strategy to find putative susceptible hosts, based on the distribution of virus target cells among distinct cell populations, which makes it possible to screen the susceptibility of all existing species (using the available scrnaseq data) to all existing viruses (using the available virus receptor information) in an unbiased manner. with the development of single cell sequencing techniques and the progress of international single cell atlas collaborative projects, atlases for more species could be generated at an accelerated speed.
we anticipate that the information gained from the present study will support future research and provide some novel insights into prevention and control strategies against sars-cov- along with many other harmful viruses. sample collection and research were performed with the approval of the institutional review board on ethics committee of bgi (approval letter reference number bgi-no. bgi-irb a ). all procedures were conducted according to the guidelines of the institutional review board on ethics committee of bgi. a total of samples were collected in this study, including four pets: felis catus after the execution of animals in accordance with the ethics of animal experiment, dissection was carried out quickly to separate each organ. the collected tissues were rinsed with x pbs, then quick-frozen and stored in liquid nitrogen. single nuclei of each tissue were isolated by a mechanical extraction method. briefly, the tissues were first thawed, infiltrated with x homogenization buffer (containing mm cacl , mm mg(ac) , mm tris-hcl (ph . ), mm sucrose, . % np- , . mm edta and . u/µl rnase inhibitor), then cut into smaller pieces, and single nuclei were isolated with a ml dounce homogenizer set. after filtration with a µm strainer, the nuclei extract was resuspended in % bsa containing . u/µl rnase inhibitor and spun down at g for min at degrees (to carefully discard the cellular impurities within the supernatant). this step was repeated twice, and finally the nuclei were recollected in . % bsa containing . u/µl rnase inhibitor. subsequently, dapi was used to stain the nuclei, and the nuclei density was calculated under a fluorescence microscope to prepare for the subsequent library construction.
the mrna within the single nucleus samples of different organs of pig (heart, liver, spleen, lung, kidney, hypothalamus, area postrema, vascular organ of lamina terminalis, subfornical organ and cerebellum) was captured and the libraries were constructed using the in-house dnbelab c kit and sequenced using dnbseq-t . the separated single nucleus of different organs (including the lungs for pig, dog, cat, goat, pangolin, chicken, pigeon, goose, duck, lizard, and hamster; pangolin organs: heart, liver, spleen, lung, kidney, large intestine, duodenum, stomach and esophagus; cat organs: heart, liver, lung, kidney, eyelid, esophagus, duodenum, colon and rectum) to facilitate the integration of the cross-species lung single cell data sets, we converted genes from other species to mouse homologs. we downloaded homolog gene lists using biomart. then, if a : match existed between a non-mouse gene and a mouse gene, the non-mouse gene name was converted. as for pangolin, goose and pigeon, which lack homolog records on ensembl, single-copy orthologs were identified from two species genomes by cluster analysis of gene families using orthofinder (orthofinder version . . ) with the default parameters. sequencing data were filtered using a custom script and gene expression matrices were obtained using cell ranger . . ( x genomics). the genomes used for read alignment were downloaded from ncbi assembly (supplementary table ). single cell analysis was conducted using seurat. briefly, quality control was performed based on the following criteria: cells with a number of mapped genes less than or with a mitochondrial percentage higher than % were removed. variable genes were determined using seurat's findvariablegenes function with default parameters. clusters were identified using seurat's findclusters function and visualized using seurat's runtsne. all the degs for each seurat object were identified using seurat's findallmarkers function.
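the homolog conversion rule described above (rename a gene only when it has a one-to-one match with a mouse gene) can be sketched as follows. this is an illustration under stated assumptions, not the authors' script: the function names and the toy gene identifiers are hypothetical, the input is assumed to be a list of (non-mouse gene, mouse gene) pairs such as a homolog table export, and genes without a one-to-one match are assumed to be dropped, which the source does not specify.

```python
from collections import Counter


def one_to_one_homologs(pairs):
    """Keep only pairs where each side appears exactly once (a 1:1 match)."""
    pairs = set(pairs)  # ignore duplicated rows in the homolog table
    per_source = Counter(src for src, _ in pairs)
    per_mouse = Counter(mouse for _, mouse in pairs)
    return {
        src: mouse
        for src, mouse in pairs
        if per_source[src] == 1 and per_mouse[mouse] == 1
    }


def rename_genes(gene_names, mapping):
    """Convert gene names to mouse homologs; unmatched genes are dropped
    (an assumption made for this sketch)."""
    return [mapping[g] for g in gene_names if g in mapping]


# toy identifiers, purely illustrative
toy_pairs = [
    ("geneA", "Mm_a"),
    ("geneB", "Mm_b1"), ("geneB", "Mm_b2"),  # one-to-many: excluded
    ("geneC", "Mm_c"), ("geneD", "Mm_c"),    # many-to-one: excluded
]
toy_mapping = one_to_one_homologs(toy_pairs)
toy_renamed = rename_genes(["geneA", "geneB", "geneC"], toy_mapping)
```

restricting the shared feature space to one-to-one homologs keeps gene identity unambiguous across species, at the cost of discarding genes with duplicated or missing counterparts.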
cell types were annotated according to the expression of canonical cell type markers. the human lung single cell rnaseq data were obtained from the literature. data sets of lungs from different species were integrated using seurat's findintegrationanchors and integratedata functions with features after homolog conversion. the virus receptor list was downloaded from a virus-host receptor interaction database and supplemented with receptors manually collected from published literature (supplementary table ). single nuclei rnaseq data sets for frontal lobe, occipital lobe, parietal lobe, temporal lobe and hypothalamus of domestic pig were obtained from the literature. the proportions of virus receptor expressing cells were calculated and a "red, yellow, green" status was assigned to each cell type based on receptor proportions (less than %, red light; greater than or equal to % and less than %, yellow light; greater than or equal to %, green light). in the case of viruses with multiple receptors, the receptor with the highest proportion was considered for status assignment. the single cell atlases of all the species investigated in this study are available via http:// . . . : /sars-cov- . the raw data supporting the findings of this study will be made available upon request. raw transcriptome sequencing data have been deposited in the cnsa (cngb nucleotide sequence archive) with the accession number cnp (https://db.cngb.org/cnsa/) and will be released to the public after the manuscript is accepted for publication.
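the status assignment rule described in the methods above can be sketched in a few lines. the two cut-offs are not recoverable from this text, so they appear here as required parameters `low` and `high`; the function names and the values used in the toy call are hypothetical and are not the thresholds used in the study.

```python
def traffic_light(proportion, low, high):
    """Map a receptor-positive percentage to a status.

    proportion < low          -> "red"
    low <= proportion < high  -> "yellow"
    proportion >= high        -> "green"
    low and high are caller-supplied cut-offs in percent.
    """
    if not 0 <= low < high <= 100:
        raise ValueError("expected 0 <= low < high <= 100")
    if proportion < low:
        return "red"
    if proportion < high:
        return "yellow"
    return "green"


def virus_status(receptor_proportions, low, high):
    """For a virus with several receptors, use the receptor with the
    highest proportion, as stated in the methods."""
    best = max(receptor_proportions.values())
    return traffic_light(best, low, high)


# illustrative call with placeholder cut-offs
toy_status = virus_status({"receptor_1": 0.2, "receptor_2": 12.0}, low=1.0, high=10.0)
```

taking the maximum over receptors means a cell type is flagged as soon as any one known entry route is well represented, which is the more conservative choice for a screening tool.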
[figure: dot plot of tmprss and ace expression across duodenum cell types (including smooth muscle cells); marker-gene axis labels not recoverable]
references:
recent insights into emerging coronaviruses
origins and evolutionary genomics of the swine-origin h n influenza a epidemic
covid- situation reports
possible bat origin of severe acute respiratory syndrome coronavirus . emerg
identifying sars-cov- related coronaviruses in malayan pangolins
isolation of sars-cov- -related coronavirus from malayan pangolins
susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus . science
sars-cov- neutralizing serum antibodies in cats: a serological investigation
transmission of sars-cov- in domestic cats
infection of dogs with sars-cov-
pathogenesis and transmission of sars-cov- in golden hamsters
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss
the human cell atlas
sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes
sars-associated coronavirus transmitted from human to pig
the authors declare no competing interests.
key: cord- -cpiveo j authors: cao, xia; maruyama, junki; zhou, heyue; kerwin, lisa; sattler, rachel; manning, john t.; johnson, sachi; richards, susan; li, yan; shen, weiqun; blair, benjamin; du, na; morais, kyndal; lawrence, kate; lu, lucy; pai, chin-i; li, donghui; brunswick, mark; zhang, yanliang; ji, henry; paessler, slobodan; allen, robert d. title: discovery and development of human sars-cov- neutralizing antibodies using an unbiased phage display library approach date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cpiveo j sars-cov- neutralizing antibodies represent an important component of the ongoing search for effective treatment of and protection against covid- . we report here on the use of a naïve phage display antibody library to identify a panel of fully human sars-cov- neutralizing antibodies. following functional profiling in vitro against an early pandemic isolate as well as a recently emerged isolate bearing the d g spike mutation, the clinical candidate antibody, sti- , and the affinity-engineered variant, sti- , were evaluated for in vivo efficacy in the syrian golden hamster model of covid- . both antibodies demonstrated potent protection against the pathogenic effects of the disease and a dose-dependent reduction of virus load in the lungs, reaching undetectable levels following a single dose of micrograms of sti- . these data support continued development of these antibodies as therapeutics against covid- and future use of this approach to address novel emerging pandemic disease threats. global incidence of severe acute respiratory disease syndrome coronavirus (sars-cov- ) infection has continued to rapidly increase since the virus was first detected in december . to date, public health agency efforts to combat the pandemic level of infection and resultant coronavirus disease (covid- ) have relied mostly upon effective quarantine measures , . 
to further protect at-risk populations, a worldwide effort has been undertaken to develop additional countermeasures, such as antiviral compounds, immunomodulatory agents, vaccines, and neutralizing antibodies (nabs) , , . while vaccines have been among the most effective means in preventing and containing widespread infections caused by many pathogens, their development is often faced with drawbacks, including long development timelines, large clinical trial sizes, and uncertain efficacy due to the reliance on patientgenerated immune responses to the antigens , , . of note, it has been shown in the context of multiple virus infections that virus-specific antibodies can lead to exacerbation of disease symptoms through a process termed antibody dependent enhancement (ade) , . this ade of virus infection is a phenomenon in which certain virus-specific antibodies enhance the entry of virus due to engagement between igg fc regions and fc-receptors on the surface of monocytes leading to productive infection of these cells rather than virus clearance , . in addition, the presence of immunodominant epitopes, i.e. highly immunogenic sequences, in viral antigens can skew this immune response, resulting in production of antibodies that are neither neutralizing nor protective, as seen in the vaccine development against human immunodeficiency virus (hiv) . in the case of sars-cov- , nabs targeting the spike protein have been isolated from the sera of convalescent covid- patients as well as from sera donated by healthy normal patients prior to december , , , , , . while the patients' immune systems generated these nabs, they often recognize the same region of the spike protein or even a similar epitope due to immunodominant regions of the spike protein , . 
one way to overcome this inherent immune bias is the use of an artificial immune system in a test tube, namely phage display human antibody libraries, which allows for functional unbiased selection of spike-binding antibodies irrespective of the underlying natural immunogenicity . most antibody libraries are constructed in a way that allows for chain shuffling leading to random associations of heavy and light chain encoding genes and resulting in a vast repertoire of non-naturally occurring antigen specificities. in fact, few candidate anti-sars-cov- nabs have been isolated from phage display libraries , . using sorrento's g-mab™ library , a single chain variable fragment (scfv) antibody phage display library constructed from the antibody repertoire of over healthy individuals, a panel of antibodies that bind the sars-cov- spike s subunit was identified and characterized. herein, we report on the identification, characterization, and subsequent optimization of sti- , a sars-cov- neutralizing antibody isolated from the g-mab library. sti- neutralized both the wa- / isolate of sars-cov- as well as isolate , a clinical isolate of sars-cov- that encodes the g variant of the spike protein. isolates harboring the g spike variant have been reported to be more infectious and occur in greater frequency than viruses such as the wa- / isolate that encode the d spike variant . affinity maturation of sti- resulted in identification of sti- , an antibody with a -fold increased affinity for the sars-cov- spike receptor-binding domain (rbd) leading to a greater than -fold increase in virus neutralization potency against live wa- / and viruses in vitro. importantly, both nabs, sti- and sti- , provided protection against the pathogenic effects and replication of sars-cov- in the syrian golden hamster virus challenge model , . 
the s subunit of the sars-cov- spike protein (amino acids - ) bearing a c-terminal histidine tag (acro biosystems, newark, nj) was coated at μg/ml on a ni-nta plate (qiagen, valencia, ca). after washing and blocking, μg/ml antibody was added to the corresponding wells for -hour incubation. the plate was washed three times and incubated with : dilution of hrp-conjugated mouse anti-human igg (fc). color development was performed with , ′, , ′-tetramethylbenzidine (tmb). absorbance was read at nm. kinetic interactions between the antibodies and his-tagged receptor binding domain (rbd, amino acids - ) (acro biosystems, newark, nj) protein was measured at c using biacore t surface plasmon resonance (spr) (ge healthcare). sti- or sti- antibody was covalently immobilized on a cm sensor chip to approximately and resonance units (ru), respectively using standard n- sti- was expressed using a two-vector transient expression protocol. briefly, x /ml cho-s cells were co-transfected with g of total dna of the heavy and light chain plasmids, at : dna: polyethyleneimine (pei) ratio (g/g). after hours of incubation with shaking ( rpm) at °c, the culture conditions were adjusted to incubation at °c with shaking in a % co chamber for - days. sti- was purified from the cho culture supernatants by affinity chromatography with mabselectsurelx™ protein a resin using the aktapure platform. sti- was produced using a double gene vector stable pool expression protocol. briefly, the pxc double gene vector (lonza) encoding both the heavy and light chain genes was used to generate polyclonal pools of stably transfected cho-s cells, according to manufacturer's recommended transfection and selection protocols. the stable pool was cultured with shaking ( rpm) at °c in a % co incubator for days. sti- was purified to homogeneity from the cho culture supernatants first by affinity chromatography with mabselectsurelx™ protein a resin and then by ion exchange chromatography using spimpres resin. 
vero e cells were maintained in dulbecco's modified eagle's medium (dmem, corning, ny) supplemented with % fetal bovine serum (fbs, thermo fisher scientific, ma), % penicillin-streptomycin, and l-glutamine. the p stock of the sars-cov- (usa/wa- / ) isolate and the p stock of the sars-cov- ( ) isolate were obtained from the world reference center for emerging viruses and arboviruses (wrceva) at the university of texas medical branch. the viruses were propagated in vero e cells and cell culture supernatants of p and p stocks, respectively, were stored at - °c under bsl conditions. the day before infection, x vero e cells were plated to -well plates and incubated at ° c, % male and female syrian golden hamsters were obtained from charles river laboratories at weeks of age. hamsters were inoculated intranasally (i.n.) with tcid of sars-cov- in µl of sterile pbs on day . antibody treatments were administered intravenously (i.v.) with monoclonal antibodies (mabs) against sars-cov- spike, or isotype control mab, in up to µl of sterile pbs at hour post-inoculation. animals were monitored for illness and mortality for days post-inoculation and clinical observations were recorded daily. body weights and temperatures were recorded at least once every hours throughout the experiment. on day post-infection, animals from each treatment group were sacrificed and virus titers in the lungs of these animals were measured using a sars-cov- virus tcid assay. average % weight change on each experimental day was compared with that of the isotype control mab-treated group using -way anova followed by dunnett's multiple comparisons test. all animals were housed in animal biosafety level- (absl- ) and absl- facilities in galveston national laboratory at the university of texas medical branch.
all animal studies were reviewed and approved by the institutional animal care and use committee at the university of texas medical branch and were conducted according to the national institutes of health guidelines. prior to initiation of the experiment, five animals from each treatment group (n= ) were designated for virus titration in lung tissue at days post inoculation. on the assigned day, animals were euthanized, lung tissue samples were collected from each animal, and a portion of the tissue ( . - . g) was placed into pre-labeled microcentrifuge tubes containing mm stainless steel beads (qiagen inc., ca). lung samples were homogenized with dmem + % fbs in a tissuelyser (qiagen inc., ca) operated at - hz for four minutes. tubes were centrifuged and clarified homogenate was serially diluted -fold with dmem + % fbs. from material representing each serial dilution step, l was transferred to each of four wells of a -well plate seeded with vero e cells. plates were then incubated for - hours at ° c, % co . cells were subsequently fixed with % formalin and stained with . % crystal violet solution. tcid values were calculated by the method of reed and muench. the g-mab library was panned using a recombinant his-tagged sars-cov- spike s subunit to allow for selection of candidates with high affinity to the target antigen, as outlined in figure . in summary, following confirmation of s binding by elisa, clonal scfv preparations were tested in a competition elisa format for disruption of spike s :ace binding. candidate scfvs with high s binding affinity and/or the capacity to block spike s :ace binding were converted into and expressed as full length human igg antibodies. in an effort to mitigate the risk of ade resulting from administration of g-mab-derived nabs, the igg the nab candidate termed sti- displayed potent sars-cov- neutralizing activity superior to that of other g-mab nab hits, with an ic in the virus neutralization assay of . g/ml (figure a).
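the reed and muench endpoint calculation named in the titration protocol above can be sketched as follows. this is a generic textbook version of the method, not the authors' worksheet: the function name, the input layout and the toy well counts are assumptions, and the conversion from the endpoint dilution to a titer per gram of lung tissue (which depends on inoculum volume and homogenate preparation) is not attempted.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, wells_per_dilution):
    """Return the log10 dilution at which 50% of wells would be infected.

    log10_dilutions: e.g. [-1, -2, -3], ordered from most concentrated to
        most dilute.
    infected: number of positive wells at each dilution.
    wells_per_dilution: replicate wells inoculated per dilution.
    """
    n = len(log10_dilutions)
    if n < 2 or len(infected) != n or wells_per_dilution <= 0:
        raise ValueError("need matching inputs for at least two dilutions")
    uninfected = [wells_per_dilution - k for k in infected]
    # a well infected at a high dilution is assumed infected at every lower
    # dilution, so infected wells accumulate from the most dilute end upward
    cum_infected = [sum(infected[i:]) for i in range(n)]
    # uninfected wells accumulate from the most concentrated end downward
    cum_uninfected = [sum(uninfected[: i + 1]) for i in range(n)]
    fraction = [ci / (ci + cu) for ci, cu in zip(cum_infected, cum_uninfected)]
    for i in range(n - 1):
        if fraction[i] >= 0.5 > fraction[i + 1]:
            # proportionate distance between the two bracketing dilutions
            distance = (fraction[i] - 0.5) / (fraction[i] - fraction[i + 1])
            step = log10_dilutions[i] - log10_dilutions[i + 1]
            return log10_dilutions[i] - distance * step
    raise ValueError("the 50% endpoint is not bracketed by the dilution series")


# toy plate: four wells per dilution, ten-fold steps (illustrative values only)
toy_endpoint = reed_muench_log10_endpoint([-1, -2, -3, -4, -5], [4, 4, 3, 1, 0], 4)
toy_tcid50_per_inoculum = 10 ** (-toy_endpoint)
```

for the toy plate the cumulative infected fractions are 100%, 100%, 80%, 20% and 0%, so the endpoint falls halfway between the third and fourth dilutions.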
in parallel to sti- preclinical profiling and cell line development efforts, affinity maturation of sti- was undertaken, with systematic amino acid changes engineered within complementarity determination regions (cdrs) of the sti- heavy and light chains. for individual cdr regions from sti- , a library of variants was screened for binding to the spike s subunit by elisa ( figure b ). variants bearing combinations of affinity-enhancing cdr amino acid changes were then systematically engineered and expressed as igg lala antibodies. based on spike s elisa binding results, the most potent clone of the single cdr and combination cdr variants, sti- , was further profiled for binding affinity using surface plasmon resonance (spr). sti- bound to the rbd region of the spike protein with an affinity of . nm, a -fold affinity improvement over that of the parental sti- ( figure c ). sti- and sti- were tested for binding to full length spike proteins derived from naturally emerging viruses including the usa/wa- / (wa- ) isolate as well as an emerging sars-cov- isolate that bears a d →g mutation at amino acid residue in the spike protein (d g). binding studies demonstrated equivalent binding affinity for both nabs against cell surface-expressed spike protein with the d g mutation as well as that of the wa- isolate ( figure a ). to determine the neutralizing activity of sti- and sti- against the aforementioned sars-cov- clinical isolates (vide supra), nabs were evaluated in the virus neutralization assay. both nabs exhibited equipotent neutralization of the virus isolates. in keeping with observed improved binding properties, sti- neutralized both isolates with an ic of . g/ml and an ic of . g/ml, a greater than fold enhancement of neutralizing potency over the parental sti- nab ( figure b ). animals in the sti- treatment groups were administered a single dose of either , , , or g iv. 
administration of a g dose of sti- resulted in maximum average percentage body weight loss of - . %, which occurred on d.p.i.. after that, animals treated with ug sti- maintained an average body weight that as a percentage of day weight was significantly different on days , , , , , and than the average weight measured among isoctl-treated animals. as with sti- -treated hamsters, virus titers in the lung were measured in of animals from each treatment group. in isoctltreated animals, an average of . x tcid /g of lung tissue was detected. treatment with g sti- resulted in reduction of lung titers below the level of detection in all animals tested, a sti- treatment-related lung titer reduction of -fold at minimum. in the g sti- treatment group, the average lung titer was reduced below the level of detection in of animals tested, while of animals had lung titers of similar magnitude to those measured in isoctl-treated animals. animals with undetectable lung virus titers in the g treatment group also experienced only moderate weight loss compared to animals with detectable lung virus titers in this group (d values of - . % and - . %) no changes in average lung titer compared to isoctl-treatment were detected in animals from the g sti- treatment group. in summary, sti- demonstrated promising protective efficacy in the hamster model at a dose of g. the observed overall enhancement of in vitro potency for the affinity matured nab, sti- , translated into increased protection against disease in the hamster challenge model, with a greater than -fold increase in protective efficacy over that of sti- . in this study, we detail the initial discovery and profiling of a sars-cov- nab isolated from a phage display antibody library derived from the b-cell repertoire of over healthy normal individuals. 
this approach complements the isolation of nabs collected from natural sources, such as convalescent patients or vaccinated healthy individuals, and allows for isolation of nabs recognizing neutralizing epitopes developed outside the temporal and biological context of the pathogen/antigen-specific immune response. the parental nab sti- and the affinity-matured derivative sti- were characterized for their biochemical and function properties in vitro and in vivo against prevalent clinical isolates of sars cov- . both nabs demonstrated potent functionality in these assays, which is strongly supportive of continued preclinical and clinical development. notably, demonstration of improvement in neutralizing potency in vitro for sti- over that of sti- was predictive of increased protective efficacy in the hamster challenge model following post-exposure therapeutic treatment. in addition, in vivo efficacy was achieved in the presence of the ade-mitigating lala fc modification, providing further confidence that this modification might add to the utility of nabs under clinical evaluation for use in the treatment of covid- . studies aimed at establishing the therapeutic and prophylactic treatment window in the hamster model as well as demonstrating efficacy following administration of nab proteins by nonparenteral routes are currently underway. in addition, introduction of nab-encoding dna plasmids provides a low cost, long-acting means of establishing antibody-mediated immunity, and experiments in rodents are ongoing and will determine the suitability of sti- for use in this context. treatments using sti- nab protein or dna-encoding of nabs in combination with other identified nabs targeting non-overlapping epitopes on the spike s subunit to constitute a nab cocktail approach will provide a means of increasing the barrier to treatment resistance as the prevalence of naturally emerging and treatment-associated virus variants increases. 
this approach is under investigation and in preclinical development. as the emergence of novel pathogens will continue to pose a significant threat to humanity, the rapid discovery of potent nabs against these yet-unknown microbes will remain paramount to global health efforts. the discovery and development of anti-sars-cov- nabs sti- and sti- in less than months, without the need for samples from infected individuals, could serve as a blueprint for future rapid response efforts. panning of the g-mab library, which consists of fully human antibody sequences, allowed for the expeditious progression of lead nab candidates from initial screening to the production cell line generation stage, avoiding additional development steps such as subsequent humanization of the identified antibodies and the associated immunogenicity concerns of non-human antibody-based therapeutics. this has enabled an efficient and expedited discovery and development of sars-cov- nabs starting in march , thus taking less than months from initiation of the project to completed ind filing. notably, sti- has recently received fda clearance to commence first-in-human studies, making it the first clinical-stage anti-sars-cov- nab derived not from convalescent patients but from a fully human phage display antibody library, and it is anticipated that dosing of the first patients will begin in october . panned for sars-cov- spike s subunit-binding scfv fragments. following confirmation of binding activity and blocking of s :ace interactions by candidate scfvs, the most potent of these candidates were converted to igg antibodies bearing the lala fc modification. candidate nabs were characterized for binding of spike s subunit and neutralization of related clinical sars-cov- isolates. affinity maturation of potent nabs was carried out in parallel to biophysical profiling, cell line development, and evaluation of protective efficacy for the parental nab, sti- . and sti- . 
the antibody affinities were measured using spr on a biacore t instrument. competing interests: sorrento authors own options and/or stock of the company. this work has been described in one or more provisional patent applications. hj is an officer at sorrento therapeutics, inc. 
key: cord- -w dtfjel authors: peng, qi; peng, ruchao; yuan, bin; zhao, jingru; wang, min; wang, xixi; wang, qian; sun, yan; fan, zheng; qi, jianxun; gao, george f.; shi, yi title: structural and biochemical characterization of nsp -nsp -nsp core polymerase complex from covid- virus date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w dtfjel the ongoing global pandemic of coronavirus disease (covid- ) has caused huge number of human deaths. currently, there are no specific drugs or vaccines available for this virus. the viral polymerase is a promising antiviral target. however, the structure of covid- virus polymerase is yet unknown. 
here, we describe the near-atomic resolution structure of its core polymerase complex, consisting of nsp catalytic subunit and nsp -nsp cofactors. this structure highly resembles the counterpart of sars-cov with conserved motifs for all viral rna-dependent rna polymerases, and suggests the mechanism for activation by cofactors. biochemical studies revealed reduced activity of the core polymerase complex and lower thermostability of individual subunits of covid- virus as compared to that of sars-cov. these findings provide important insights into rna synthesis by coronavirus polymerase and indicate a well adaptation of covid- virus towards humans with relatively lower body temperatures than the natural bat hosts. in the end of , a novel coronavirus ( -ncov) caused an outbreak of pulmonary disease in china (zhu et al., ) , which was later officially named "severe acute respiratory syndrome viruses with a broad host-spectrum (vicenzi et al., ) . currently, a total of seven human- the interface domain ( figures b and d ). the two nsp subunits display significantly different conformations with substantial refolding of the n-terminal extension helix region, which mutually preclude the binding at the other molecular context ( figure c ). the importance of both cofactor-binding sites has been validated by previous biochemical studies on polymerase, which revealed their essential roles for stimulating the activity of nsp polymerase subunit (subissi et al., ) . given the residue substitutions between sars-cov- and sars-cov polymerase subunits albeit the high degree overall sequence similarity, we compared the enzymatic behaviors of the viral polymerases aiming to analyze their properties in terms of viral replication. both sets of core polymerase complex could well mediate primer-dependent rna elongation reactions templated by the '-vrna. 
intriguingly, the sars-cov- nsp -nsp -nsp complex displayed a much lower efficiency (~ %) for rna synthesis than the sars-cov counterpart ( figure a ). as all three nsp subunits harbor some residue substitutions between the two viruses, we further conducted cross-combination analysis to evaluate the effect of each subunit on the efficiency of rna production. in the context of the sars-cov- nsp polymerase subunit, replacement of the nsp cofactor subunit with that of sars-cov had no obvious effect on polymerase activity, whereas the introduction of the sars-cov nsp subunit boosted the activity to ~ . times that of the homologous combination. simultaneous replacement of the nsp and nsp cofactors further enhanced the efficiency of rna synthesis to ~ . times that of the sars-cov- homologous complex ( figure b ). consistent with this observation, combining the sars-cov- nsp -nsp subunits with the sars-cov nsp polymerase subunit compromised its activity relative to the native cognate cofactors, and the nsp subunit exhibited a more obvious effect than nsp ( figure c ). this evidence suggested that the variations in the nsp subunit had a significantly negative impact on the polymerase activity of sars-cov- nsp . the non-significant effect of nsp on polymerase activity is plausible, as only one residue substitution occurred between the two viruses ( figure b ). in addition, we compared the polymerase activity of different nsp subunits in the same context of nsp -nsp cofactors. combined with either cofactor set, the sars-cov- nsp polymerase showed a lower efficiency (~ %) for rna synthesis than the sars-cov counterpart ( figure d ). this observation demonstrated that the residue substitutions in nsp also contributed to the reduction of its polymerase activity, with an impact similar to that of the variations in the nsp cofactor. 
although there are amino acid substitutions in all three subunits of the core polymerase complex between sars-cov- and sars-cov, none of these residues is located at the polymerase active site or at the contacting interfaces between adjacent subunits ( figure b ), suggesting that these substitutions do not affect the inter-subunit interactions required for assembly of the polymerase complex. to test this hypothesis, we measured the binding kinetics between different subunits of the two viruses by surface plasmon resonance (spr) assays. each interaction pair exhibited similar kinetic features for the two viruses, all with affinities in the sub-micromolar range ( figure a and b). we also tested the cross-binding between subunits of the two viruses, which revealed similar affinities for heterologous pairs as compared to the native homologous interactions ( figure c and d). see also figures s -s and table s . the codon-optimized sequences of nsp and nsp were synthesized with an n-terminal ×histidine tag and inserted into the pet- a vector for expression in e. coli (synbio tec, suzhou, china). for the nsp l fusion protein, the sequence was also codon-optimized for the e. coli expression system and a ×histidine linker was introduced between the nsp and nsp subunits (genewiz tec, suzhou, china). protein production was induced with mm isopropylthio-β-galactoside (iptg) and the cultures were incubated for - hours at °c. bacterial cells were harvested by centrifugation ( , rpm, min), resuspended in buffer a ( mm hepes, mm nacl, mm tris ( -carboxyethyl) phosphine (tcep), ph . ), and lysed by sonication. cell debris was removed via centrifugation ( , rpm, h) and filtration with a . μm cut-off filter. an aliquot of μl protein solution ( . mg/ml) was applied to a glow-discharged quantifoil . / . holey carbon grid and blotted for . s at a humidity of % before plunge-freezing with an fei vitrobot mark iv. 
cryo-samples were screened using an fei tecnai tf electron microscope and transferred to an fei talos arctica operated at kv for data collection. the microscope was equipped with a post-column bioquantum energy filter (gatan) which was used with a slit width of ev. the data was automatically collected using serialem software (http://bio d.colorado.edu/serialem/). images were recorded with a gatan k -summit camera in super-resolution counting mode with a calibrated pixel size of . Å at the specimen level. each exposure was performed with a dose rate of e -/pixel/s (approximately . e -/Å /s) and lasted for . s, resulting in an accumulative dose of ~ e -/Å which was fractionated into movie-frames. the final defocus range of the dataset was approximately - . to - . μm. the image drift and anisotropic magnification was corrected using motioncor (zheng et al., ). initial contrast transfer function (ctf) values were estimated with ctffind . (rohou and grigorieff, ) at the micrograph level. images with an estimated resolution limit worse than Å were discarded. particles were automatically picked with relion- . (zivanov et al., ) following the standard protocol. in total, approximately , , particles were picked from ~ , micrographs. after rounds of extensive d classification, ~ , particles were selected for d classification with the density map of sars-cov nsp -nsp -nsp complex (emdb- ) as the reference which was low-pass filtered to Å resolution. after two rounds of d classification, a clean subset of ~ , particles was identified, which displayed clear features of secondary structural elements. these particles were subjected to d refinement supplemented with per-particle ctf refinement and dose-weighting, which led to a the structure of sars-cov nsp -nsp -nsp complex (pdb id: nur) was rigidly docked into the density map using chimera (pettersen et al., ) . 
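the exposure arithmetic in the data collection description above (a dose rate per pixel, a calibrated pixel size, an exposure time, and a total dose fractionated into movie frames) can be checked with a short calculation. the sketch below is only an illustration under stated assumptions: the function names are hypothetical, the numbers in the demo are placeholders rather than the values used in this study, and the pixel size passed in is assumed to be that of the detector pixel to which the dose rate refers.

```python
def dose_rate_per_area(dose_rate_per_pixel, pixel_size_angstrom):
    """Convert a dose rate in e-/pixel/s to e-/A^2/s for a square pixel."""
    if pixel_size_angstrom <= 0:
        raise ValueError("pixel size must be positive")
    return dose_rate_per_pixel / pixel_size_angstrom ** 2


def accumulated_dose(dose_rate_per_pixel, pixel_size_angstrom, exposure_s):
    """Total dose in e-/A^2 delivered over the whole exposure."""
    rate = dose_rate_per_area(dose_rate_per_pixel, pixel_size_angstrom)
    return rate * exposure_s


def dose_per_frame(total_dose, n_frames):
    """Dose in e-/A^2 carried by each movie frame after fractionation."""
    if n_frames <= 0:
        raise ValueError("frame count must be positive")
    return total_dose / n_frames


if __name__ == "__main__":
    # Placeholder values for illustration only (not from the study).
    rate_per_pixel = 8.0   # e-/pixel/s
    pixel_size = 1.0       # angstrom
    exposure = 5.0         # seconds
    frames = 40

    total = accumulated_dose(rate_per_pixel, pixel_size, exposure)
    print("accumulated dose (e-/A^2):", total)
    print("dose per frame (e-/A^2):", dose_per_frame(total, frames))
```

running the same functions with the reported dose rate, pixel size and exposure time should reproduce the approximate per-area dose rate and accumulated dose quoted in the text; a mismatch usually means the pixel size refers to super-resolution rather than physical pixels.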
the model was manually corrected for local fit in coot (emsley et al., ) and the sequence register was corrected based on alignment. the initial model was refined in real space using phenix (adams et al., ) with the secondary structural restraints and ramachandran restrains applied. the model was further adjusted and refined iteratively for several rounds aided by the stereochemical quality assessment using molprobity (chen et al., ) . the representative density and atomic models are shown in figure. images were taken using a vilber fusion system and analyzed with the image j software. python-based system for macromolecular structure solution mechanism of nucleic acid unwinding by sars-cov helicase biochemical characterization of a recombinant sars coronavirus nsp rna-dependent rna polymerase capable of copying viral rna templates shaping the flavivirus replication complex: it is curvaceous! the proximal origin of sars-cov- molprobity: all-atom structure validation for macromolecular crystallography the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- the crystal structure of zika virus ns reveals conserved drug targets features and development of structure of the rna-dependent rna polymerase from covid- virus structural insights into bunyavirus replication and its regulation by the vrna promoter crystal structure of zika virus ns rna-dependent rna polymerase structural basis for active site closure by the poliovirus rna-dependent rna polymerase crystal structure of the rna- dependent rna polymerase from influenza c virus united states clinical features of patients infected with novel coronavirus in wuhan china: implication for infection prevention and control measures structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors quantifying the local resolution of cryo-em density maps discovery of an essential nucleotidylating activity associated with a newly delineated conserved domain in the rna 
polymerase-containing protein of all nidoviruses genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding bat flight and zoonotic viruses structural insight into arenavirus replication machinery ucsf chimera--a visualization system for exploratory research and analysis structure of influenza a polymerase bound to the viral rna promoter structural insight into cap-snatching and rna synthesis by influenza polymerase ctffind : fast and accurate defocus estimation from electron micrographs transmission of -ncov infection from an asymptomatic contact in germany insights into rna synthesis, capping, and proofreading mechanisms of sars-coronavirus viral rna polymerase: a promising antiviral target for influenza a virus one severe acute respiratory syndrome coronavirus protein complex integrates processive rna polymerase and exonuclease activities coronaviridae and sars-associated coronavirus strain hsr clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china structural basis for the inhibition of the rna-dependent rna polymerase from sars-cov- by remdesivir. biorxiv insights into sars-cov transcription and replication from the structure of the nsp -nsp hexadecamer a genomic perspective on the origin and emergence of sars-cov- structure and function of the zika virus full-length ns protein motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy epidemiology and cause of severe acute respiratory syndrome people's republic of china a pneumonia outbreak associated with a new coronavirus of probable bat origin a novel coronavirus from patients with pneumonia in china the coronavirus replicase new tools for automated high-resolution cryo-em structure determination in relion- the affinities between nsp , nsp and nsp or nsp l proteins were measured at room temperature (r.t.) using a biacore k system with cm chips (ge healthcare). 
the nsp protein was immobilized on the chip with a concentration of μg/ml (diluted by . mm naac, ph . ), and the nsp protein was immobilized with a concentration of μg/ml (diluted by . mm naac, ph . ). for all measurements, the same running buffer was used which consists of mm hepes, ph . , mm nacl and . % tween- . proteins were pre- exchanged into the running buffer by sec prior to loading to the system. a blank channel of the cryo-em density map and atomic coordinates have been deposited to the electron microscopy data bank (emdb) and the protein data bank (pdb) with the accession codes emd- and bw , respectively. all other data are available from the authors on reasonable request. key: cord- -vy mvzeb authors: liu, hongbo; ye, fei; sun, qi; liang, hao; li, chunmei; lu, roujian; huang, baoying; tan, wenjie; lai, luhua title: scutellaria baicalensis extract and baicalein inhibit replication of sars-cov- and its c-like protease in vitro date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vy mvzeb covid- has become a global pandemic that threatens millions of people worldwide. there is an urgent call for developing effective drugs against the virus (sars-cov- ) causing this disease. the main protease of sars-cov- , c-like protease ( clpro), is highly conserved across coronaviruses and is essential for the maturation process of viral polyprotein. scutellariae radix (huangqin in chinese), the root of scutellaria baicalensis has been widely used in traditional chinese medicine to treat viral infection related symptoms. the extracts of s. baicalensis have exhibited broad spectrum antiviral activities. we studied the anti-sars-cov- activity of s. baicalensis and its ingredient compounds. we found that the ethanol extract of s. baicalensis inhibits sars-cov- clpro activity in vitro and the replication of sars-cov- in vero cells with an ec of . μg/ml. among the major components of s. 
baicalensis, baicalein strongly inhibits sars-cov- clpro activity with an ic of . μm. we further identified four baicalein analogue compounds from other herbs that inhibit sars-cov- clpro activity at micromolar concentrations. our study demonstrates that the extract of s. baicalensis has effective anti-sars-cov- activity and that baicalein and its analogue compounds are strong sars-cov- clpro inhibitors. coronaviruses (covs) are single-stranded positive-sense rna viruses that cause severe infections in respiratory, hepatic and various other organs in humans and many other animals [ , ] . within the years of the st century, there have already been three cov outbreaks causing global epidemics: sars, mers, and covid- . the newly emerged cov infectious disease (covid- ) had already caused more than . million confirmed infections and thousands of deaths worldwide up to april , (https://www.who.int/emergencies/diseases/novel-coronavirus- /situation-reports). there is an urgent call for drug and vaccine research and development against covid- . covid- was confirmed to be caused by a new coronavirus (sars-cov- ), whose genome was sequenced in early january [ , ] . the genomic sequence of sars-cov- is highly similar to that of sars-cov, with . % sequence identity [ ] , and has remained stable up to now [ ] . however, the sequence identities vary significantly for different viral proteins [ ] . for instance, the spike proteins (s-protein) in covs are diverse in sequence and even in the host receptors they bind, owing to rapid mutation and recombination [ ] . although it has been confirmed that both sars-cov and sars-cov- use ace as receptor and occupy the same binding site, their binding affinities to ace vary due to subtle interface sequence variations [ ] . in contrast, the c-like proteases ( cl pro ) in covs are highly conserved. the cl pro in sars-cov and sars-cov- share a sequence identity of . %, making it an ideal target for broad spectrum anti-cov therapy. 
although many inhibitors have been reported for sars-cov and mers-cov cl pro [ ] [ ] [ ] [ ] , unfortunately none of them has entered clinical trials. inspired by the previous studies, several covalent inhibitors were experimentally identified to inhibit the cl pro activity and viral replication of sars-cov- , and some of the complex crystal structures were solved [ , ] . in addition, a number of clinically used hiv and hcv protease inhibitors have been proposed as possible cures for covid- [ ] and some of them have now proceeded to clinical trials [ ] . several computational studies proposed potential sars-cov- cl pro inhibitors by virtual screening against the crystal or modeled three-dimensional structure of sars-cov- cl pro as well as by machine intelligence [ ] [ ] [ ] [ ] [ ] [ ] . highly potent sars-cov- cl pro inhibitors with diverse chemical structures need further exploration. traditional chinese medicine (tcm) herbs and formulae have long been used in treating viral diseases. some of them have been clinically tested to treat covid- [ ] . anti-microbial and anti-inflammatory activities have been reported [ ] . remarkably, the extracts of s. baicalensis have exhibited broad spectrum anti-viral activities, including against zika [ ] , h n [ ] , hiv [ ] and denv [ ] . in addition, a multicenter, retrospective analysis demonstrated that s. baicalensis exhibits more potent antiviral effects and higher clinical efficacy than ribavirin for the treatment of hand, foot and mouth disease [ ] . several s. baicalensis derived mixtures or pure compounds have been approved as antiviral drugs in china, such as baicalein capsule (to treat hepatitis) and huangqin tablet (to treat upper respiratory infection). most of the s. baicalensis ingredients are flavonoids [ ] . flavonoids from other plants were also reported to mildly inhibit sars and mers-cov cl pro [ , ] . here we studied the anti-sars-cov- activity of s. baicalensis and its ingredients. 
we found that the ethanol extract of s. baicalensis inhibits sars-cov- cl pro activity and that the most active ingredient, baicalein, exhibits an ic of . μm. in addition, the ethanol extract of s. baicalensis effectively inhibits the replication of sars-cov- in a cell assay. we also identified four baicalein analogue compounds from other herbs that inhibit sars-cov- cl pro activity at micromolar concentrations. we prepared the % ethanol extract of s. baicalensis and tested its inhibitory activity against sars-cov- cl pro . we expressed sars-cov- cl pro and performed the activity assay using a peptide substrate (thr-ser-ala-val-leu-gln-pna) according to the published procedure for the sars-cov cl pro assay [ , ] . the inhibitory ratio of s. baicalensis extract at different concentrations on sars-cov- cl pro activity is shown in figure a . the crude extract exhibits a significant inhibitory effect with an ic of . μg/ml, suggesting that s. baicalensis contains candidate inhibitory ingredients against sars-cov- cl pro . we tested the inhibitory activity of four major ingredients from s. baicalensis in vitro: baicalein, baicalin, wogonin and wogonoside. baicalein showed the most potent anti-sars-cov- cl pro activity with an ic of . μm ( figure b and table ). baicalin inhibited sars-cov- cl pro activity by about % at μm, while wogonin and wogonoside were not active at this concentration. we performed molecular docking to understand the inhibitory activity of s. baicalensis ingredients. in the docking model, baicalein binds well in the substrate binding site of sars-cov- cl pro with its -oh and -oh forming hydrogen bond interactions with the carbonyl group of l and the backbone amide group of g , respectively ( figure a ). in addition, the carbonyl group of baicalein is hydrogen bonded with the backbone amide group of e . the catalytic residues h and c are well covered by baicalein, accounting for its inhibitory effect. 
as the -oh in baicalin is in close contact with the protein, there may not be enough space for glycosyl modification, explaining the low activity of baicalin. as for wogonin, the absence of -oh together with its additional -methoxyl group alters the binding orientation and weakens the binding strength ( figure b ). a hydrogen bond is observed between its -oh and the backbone carbonyl group of l , while the interaction with e by its -methoxy group is weaker than that formed by the carbonyl group in baicalein. we searched for baicalein analogues from available flavonoid suppliers and selected flavonoids and glycosides for experimental testing. four flavonoid compounds were found to be potent sars-cov- cl pro inhibitors. among them, scutellarein is mainly distributed in genus scutellaria and erigerontis herba (dengzhanxixin or dengzhanhua in chinese) in its glucuronide form, scutellarin. scutellarin has long been used in cardiovascular disease treatment for its ability to improve cerebral blood supply [ ] . for all the active flavonoid compounds that we found, the introduction of a glycosyl group decreased the inhibition activity, probably due to the steric hindrance of the glycosyl group; this holds for baicalein/baicalin as well as for scutellarein/scutellarin and myricetin/myricitrin. glycosides and their corresponding aglycones are often interchangeable in vivo: for instance, baicalin was reported to be metabolized to baicalein in the intestine [ ] , while baicalein can be transformed to baicalin by hepatic metabolism [ ] . we therefore expect that both the flavonoid form of the active compounds and their glycoside form will function in vivo. we suggest that these compounds can be further optimized or used to search for other tcm herbs containing these compounds or substructures for the treatment of covid- . targetmol. 
the dna of sars-cov- cl pro (referred to genbank, accession number mn ) was synthesized (hienzyme biotech) and amplified by pcr using primers n clp-nhe ( '-catggctagcggttttagaaaaatggcattccc- ') and n clp-xho ( '-cactctcgagttggaaagtaacacctgagc- '). the pcr product was digested with nhe i/xho i and cloned into the pet a dna as reported previously [ ] . the resulting sars-cov- pet cl- x plasmid encodes a da sars-cov- cl pro with a c-terminal xhis-tag. the sars-cov- pet cl- x plasmid was further transformed to e. coli bl for protein expression as reported [ ] . the recombinant protein was purified through a nickelnitrilotriacetic acid column (ge healthcare) and subsequently loaded on a gel filtration column sephacryl s- hr (ge healthcare) for further purification as previously described [ ] . a colorimetric substrate thr-ser-ala-val-leu-gln-pna (gl biochemistry ltd) and assay buffer ( the structure of sars-cov- cl pro (pdb id lu ) [ ] and s. baicalensis concentrations for h. the supernatant was collected and the rna was extracted and analyzed by relative quantification using rt-pcr as in the previous study [ , ] . viral rna was extracted from μl supernatant of infected cells using the automated nucleic acid extraction system (tianlong, china), following the manufacturer's recommendations. sars-cov- virus detection was performed using the one step system (roche, rotkreuz, switzerland). orf ab was amplified from cdna and cloned into ms -ncov-orf ab and used as the plasmid standard after its identity was confirmed by sequencing. a standard curve was generated by determination of copy numbers from serially dilutions ( - copies) of plasmid. the following primers used for quantitative pcr were ab-f: ʹ-agaagattggttagatgatgatagt- ʹ; ab-r: ʹ-ttccatctctaattgaggttgaacc- ʹ; and probe ʹ-fam-tcctcactgccgtcttgttg acca-bhq - ʹ. the individual ec values were calculated by the origin software. 
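the quantification steps described above (a plasmid standard curve from serial dilutions, conversion of rt-pcr readings to copy numbers, and an ec value from the dose-response data) can be sketched in a few lines. this is a minimal illustration, not the authors' procedure: the text only states that origin was used and does not name the fitting model, so the sketch substitutes a linearised hill (logit) fit; all function names are hypothetical, the ct and concentration values in the demo are invented, and defining inhibition as one minus the ratio of treated to untreated copy numbers is an assumption.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_standard_curve(plasmid_copies, ct_values):
    """Regress Ct on log10(copy number) for the plasmid dilution series."""
    return linear_fit([math.log10(c) for c in plasmid_copies], ct_values)


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in a sample."""
    return 10 ** ((ct - intercept) / slope)


def inhibition_fraction(copies_treated, copies_untreated):
    """Assumed definition: fractional reduction relative to untreated cells."""
    return 1.0 - copies_treated / copies_untreated


def ec50_hill(concentrations, fractions):
    """Linearised Hill fit: log10(f / (1 - f)) = n * (log10 c - log10 EC50).

    Points with f <= 0 or f >= 1 cannot be logit-transformed and are skipped.
    Returns (ec50, hill_coefficient).
    """
    xs, ys = [], []
    for c, f in zip(concentrations, fractions):
        if c > 0 and 0.0 < f < 1.0:
            xs.append(math.log10(c))
            ys.append(math.log10(f / (1.0 - f)))
    hill_n, intercept = linear_fit(xs, ys)
    return 10 ** (-intercept / hill_n), hill_n


if __name__ == "__main__":
    # Invented standards and readings, for illustration only.
    slope, intercept = fit_standard_curve(
        [1e2, 1e3, 1e4, 1e5, 1e6], [34.0, 31.0, 28.0, 25.0, 22.0]
    )
    untreated = copies_from_ct(24.0, slope, intercept)
    doses = [0.5, 1.0, 2.0, 4.0, 8.0]
    cts = [24.3, 24.6, 25.0, 25.5, 26.2]
    fractions = [
        inhibition_fraction(copies_from_ct(ct, slope, intercept), untreated)
        for ct in cts
    ]
    print("EC50 and Hill coefficient:", ec50_hill(doses, fractions))
```

the logit linearisation is easy to audit but weights points near 0 % and 100 % inhibition heavily; a nonlinear four-parameter fit, as dose-response tools typically offer, is usually preferred for reported values.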
coronaviruses -drug discovery and therapeutic options antiviral drugs specific for coronaviruses in preclinical development a novel coronavirus from patients with pneumonia in china a new coronavirus associated with human respiratory disease in china a pneumonia outbreak associated with a new coronavirus of probable bat origin on the origin and continuing evolution of sars-cov- genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding structure, function, and evolution of coronavirus spike proteins structure of the sars-cov- spike receptor-binding domain bound to the ace receptor an overview of severe acute respiratory syndrome-coronavirus (sars-cov) cl protease inhibitors: peptidomimetics and small molecule chemotherapy isatin compounds as noncovalent sars coronavirus c-like protease inhibitors design of wide-spectrum inhibitors targeting coronavirus main proteases small molecules targeting severe acute respiratory syndrome human coronavirus structure of mpro from covid- virus and discovery of its inhibitors crystal structure of sars-cov- main protease provides a basis for design of improved alpha-ketoamide inhibitors therapeutic options for the novel coronavirus clinical trial analysis of -ncov therapy registered in china nelfinavir was predicted to be a potential inhibitor of -ncov main protease by an integrative approach combining homology modelling, molecular docking and binding free energy calculation potential inhibitors for -ncov coronavirus m protease from clinically approved medicines. biorxiv therapeutic drugs targeting -ncov main protease by high-throughput screening. biorxiv machine intelligence design of -ncov drugs. 
biorxiv prediction of the sars-cov- ( -ncov) c-like protease ( cl (pro)) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates through a drug-target interaction deep learning model can chinese medicine be used for prevention of corona virus disease (covid- )? a review of historical classics, research evidence and current prevention programs a comprehensive review on phytochemistry, pharmacology, and flavonoid biosynthesis of scutellaria baicalensis baicalein and baicalin as zika virus inhibitors anti-h n virus, cytotoxic and nrf activation activities of chemical constituents from scutellaria baicalensis inhibition of hiv replication by baicalin and s. baicalensis extracts in h cell culture extract of scutellaria baicalensis inhibits dengue virus replication efficacy of scutellaria baicalensis for the treatment of hand, foot, and mouth disease associated with encephalitis in patients infected with ev : a multicenter, retrospective analysis a targeted strategy to analyze untargeted mass spectral data: rapid chemical profiling of scutellaria baicalensis using ultra-high performance liquid chromatography coupled with hybrid quadrupole orbitrap mass spectrometry and key ion filtering inhibition of sars-cov cl protease by flavonoids characteristics of flavonoids as potent mers-cov c-like protease inhibitors c-like proteinase from sars coronavirus catalyzes substrate hydrolysis by a general base mechanism therapeutic effects of breviscapine in cardiovascular diseases: a review identification of myricetin and scutellarein as novel chemical inhibitors of the sars coronavirus helicase, nsp metabolism of constituents in huangqin-tang, a prescription in traditional chinese medicine, by human intestinal flora hepatic metabolism and disposition of baicalein via the coupling of conjugation enzymes and transporters-in vitro and in vivo evidences biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome 
coronavirus c-like proteinase maturation mechanism of severe acute respiratory syndrome (sars) coronavirus c-like proteinase in vitro antiviral activity and projection of optimized dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus (sars-cov- ) key: cord- -xrydkiu authors: pahmeier, felix; neufeldt, christoper j; cerikan, berati; prasad, vibhu; pape, costantin; laketa, vibor; ruggieri, alessia; bartenschlager, ralf; cortese, mirko title: a versatile reporter system to monitor virus infected cells and its application to dengue virus and sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xrydkiu positive-strand rna viruses have been the etiological agents in several major disease outbreaks over the last few decades. examples of that are flaviviruses, such as dengue virus and zika virus that cause millions of yearly infections and spread around the globe, and coronaviruses, such as sars-cov- , which is the cause of the current pandemic. the severity of outbreaks caused by these viruses stresses the importance of virology research in determining mechanisms to limit virus spread and to curb disease severity. such studies require molecular tools to decipher virus-host interactions and to develop effective interventions. here, we describe the generation and characterization of a reporter system to visualize dengue virus and sars-cov- replication in live cells. the system is based on viral protease activity causing cleavage and nuclear translocation of an engineered fluorescent protein that is expressed in the infected cells. we show the suitability of the system for live cell imaging and visualization of single infected cells as well as for screening and testing of antiviral compounds. given the modular building blocks, the system is easy to manipulate and can be adapted to any virus encoding a protease, thus offering a high degree of flexibility. 
importance reporter systems are useful tools for fast and quantitative visualization of viral replication and spread within a host cell population. here we describe a reporter system that takes advantage of virus-encoded proteases that are expressed in infected cells to cleave an er-anchored fluorescent protein fused to a nuclear localization sequence. upon cleavage, the fluorescent protein translocates to the nucleus, allowing for rapid detection of the infected cells. using this system, we demonstrate reliable reporting activity for two major human pathogens from the flaviviridae and the coronaviridae families: dengue virus and sars-cov- . we apply this reporter system to live cell imaging and use it for proof-of-concept to validate antiviral activity of a nucleoside analogue. this reporter system is not only an invaluable tool for the characterization of viral replication, but also for the discovery and development of antivirals that are urgently needed to halt the spread of these viruses. antibodies. the antibodies used in this study are listed in table . oligonucleotides encoding the protease cleavage sites were designed to allow insertion into the vector via mlui and bamhi restriction sites. the primer pairs (table table sequences of oligonucleotides used in this study. sars-cov- (moi = ) and hpi the medium was exchanged for imaging medium. lid was moved to the locked position and silicon was used to seal the dish in order to bioinformatics analysis. images were analyzed using the fiji software ( , ). graph generation and statistical analysis was performed using the graphpad prism . design and characterization of denv reporter constructs. in order to generate a reporter system that can specifically indicate virus infection, we designed a construct expressing a gfp fusion protein that could selectively be cleaved by viral proteases. 
the reporter construct was engineered for viruses that produce er which can be easily detected and quantified by light microscopy. the denv polyprotein is cleaved into the individual viral proteins by either the host signal peptidase or the viral ns b/ serine protease ( , ). the er-resident ns b protein acts as a co-factor of the ns protease and anchors it to er membranes ( , ). to determine an optimal system for reporting denv infection, several previously described ns b/ specific cleavage sequences were inserted into the reporter construct (table ) . recently, a plasmid-based expression system for induction of denv replication organelles in transfected cells has been described ( ). this system, designated "plasmid-induced replication organelle - dengue" (piro-d), encodes the viral polyprotein that is translated from an rna generated in the cytoplasm by a stably expressed t rna polymerase. in this way, the piro-d system allows the analysis of viral proteins in cells, independent of viral replication. however, since no fluorescent protein coding sequence is incorporated into the construct, expression of the denv polyprotein cannot be followed by live cell imaging. to overcome this limitation, we determined whether our denv reporter cell line (figures and ) . interestingly, for several denv reporter constructs, nuclear gfp localization was observed in absence of the viral protease (figure a) huh -rc and lunet-t -rc, we had to establish single cell clones for the were observed (figure a). this is likely due to differences in the ability of cell lines to respond to high levels of gfp fusion proteins. in addition to sorting for cells with lower reporter expression as done here, this problem might be overcome by employing a less active promoter or by using an alternative fluorescent protein. 
live cell imaging demonstrated that sars-cov- infected cells can be identified as epidemiological characteristics of human- infective rna viruses the continued threat of emerging flaviviruses sequence of the dengue virus genome in membranous replication organelle formation cov- infection induces a pro-inflammatory cytokine response through cgas- sting and nf-kb an orally bioavailable broad- spectrum antiviral inhibits sars-cov- in human airway epithelial cell cultures and multiple coronaviruses in mice a nanoluciferase sars-cov- for rapid neutralization testing and screening of anti-infective drugs for covid- an infectious cdna clone of sars-cov- rapid reconstruction of sars-cov- using a synthetic genomics platform remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro correlative light and electron microscopy methods for the study of virus-cell interactions development of a fluorescence based, high-throughput sars-cov- clpro reporter assay key: cord- -pxof pl authors: eskier, doğa; suner, aslı; oktay, yavuz; karakülah, gökhan title: mutations of sars-cov- nsp exhibit strong association with increased genome-wide mutation load date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pxof pl sars-cov- is a betacoronavirus responsible for human cases of covid- , a pandemic with global impact that first emerged in late . since then, the viral genome has shown considerable variance as the disease spread across the world, in part due to the zoonotic origins of the virus and the human host adaptation process. as a virus with an rna genome that codes for its own genomic replication proteins, mutations in these proteins can significantly impact the variance rate of the genome, affecting both the survival and infection rate of the virus, and attempts at combating the disease. 
in this study, we analyzed the mutation densities of viral isolates carrying frequently observed mutations for four proteins in the rna synthesis complex over time in comparison to wildtype isolates. our observations suggest that mutations in nsp , an error-correcting exonuclease protein, have the strongest association with increased mutation load in both regions without selective pressure and across the genome, compared to nsp , , and , which form the core polymerase complex. we propose nsp as a priority research target for understanding genomic variance rate in sars-cov- isolates, and nsp mutations as potential predictors for high mutability strains. covid- is an ongoing global pandemic characterized by long-term respiratory system damage in patients, and caused by the sars-cov- betacoronavirus. it is likely of zoonotic origin, but capable of human-to-human transmission, and since the first observed cases in the city of wuhan, china (chan et al., ; riou & althaus, ) , it has infected over million people, with recorded deaths. in addition to its immediate effects on the respiratory system, its long-term effects are still being researched, including symptoms such as neuroinvasion (li, bai & hashikawa, ; wu et al., ) , cardiovascular complications (kochi et al., ; zhu et al., ) , and gastrointestinal and liver damage (lee, huo & huang, ; xu et al., ) . due to its high transmissibility and capacity for asymptomatic transmission (wong et al., ) , the study of covid- and its underlying pathogen remains a high priority. as a result, the large amount of frequently updated data on viral genomes in databases such as gisaid (elbe & buckland-merrett, ) and nextstrain (hadfield et al., ) provides researchers with invaluable resources to track the evolution of the virus as it spreads across the world. 
sars-cov- has a linear, single-stranded rna genome, and does not depend on host proteins for genomic replication, instead using an rna synthesis complex formed from nonstructural proteins (nsp) coded by its own genome. four of the key proteins involved in the complex are nsp , nsp , nsp , and nsp , all of which are formed from cleavage of the polyprotein orf ab into mature peptides. nsp , also known as rdrp (rna-dependent rna polymerase), is responsible for synthesizing new strands of rna using the viral genome as a template. nsp and nsp act as essential co-factors for the polymerase unit, together creating the core polymerase complex (kirchdoerfer & ward, ; peng et al., ) , while nsp is an exonuclease which provides error-correcting capability to the rna synthesis complex (subissi et al., ; ma et al., ; romano et al., ) . owing to their role in maintaining replication fidelity and genome sequences, these proteins are key targets of study in understanding the mutation accumulation and adaptive evolution of the virus (peng et al., ) . in our previous study, we examined the top most frequent mutations in the sars-cov- nsp , and identified that four of them are associated with an increase in mutation density in two genes, the membrane glycoprotein (m) and the envelope glycoprotein (e) (the combination of which is hereafter referred to as moe, as we previously described), which are not under selective pressure, and mutations in these genes are potential markers of reduced replication fidelity (eskier et al., ) . in this study, we follow up on our previous findings and analyze the mutations in nsps , , and , in addition to nsp , to identify whether the mutations are associated with a nonselective increase in mutation load or not. we then examine whole genome mutation densities in mutant isolates in comparison to wildtype isolates using linear regression models, in order to understand whether the mutations are associated with potential functional impact. 
our findings indicate that mutations in nsp are most likely to be predictors of accelerated mutation load increase. genome sequence filtering, retrieval, and preprocessing as previously described (eskier et al., ) , isolate genomes obtained from the gisaid epicov database (date of accession: june, ) were filtered to remove low-coverage or incomplete genomes, aligned against the reference genomic sequence for sars-cov- , and processed to identify any snvs present in the isolates and their impact on peptide sequences, if any, using the mafft, snp-sites, bcftools, and annovar suite of software. following the alignment and variant annotation, isolates with incomplete sequencing location or date data were further removed from the pool, and the ' utr (bases - ) and the nucleotides at the ' end of the genome were masked due to their gap-heavy and low-quality nature. following the filters, , genomes were used for the analyses. variants were categorized as synonymous and nonsynonymous following annotation by annovar, with intergenic or terminal mutations being considered synonymous. gene mutation densities were calculated separately for synonymous and nonsynonymous mutations, as well as the total of snvs, for each isolate, using a non-reference nucleotides per kilobase of region metric. mutation densities were calculated for the combined membrane glycoprotein (m) and envelope glycoprotein (e) genes (moe), the surface glycoprotein gene (s), and the whole genome. descriptive statistics for continuous variable days were calculated with mean, standard deviation, median, and interquartile range. kolmogorov-smirnov test was used to check the normality assumption of the continuous variables. in cases of non-normally distributed data, the wilcoxon rank-sum (mann-whitney u) test was performed to determine whether the difference between the two moe status groups was statistically significant. 
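the per-kilobase mutation density metric described above can be sketched as follows. this is a minimal illustration under stated assumptions: the function name `mutation_density`, the region coordinates, and the snv positions are hypothetical placeholders and are not the real coordinates of the m, e, or s genes; regions are treated as 1-based inclusive intervals, and a combined region such as moe is passed as a list of intervals.

```python
def mutation_density(snv_positions, regions):
    """Non-reference nucleotides per kilobase across one or more regions.

    snv_positions: 1-based genome positions where the isolate differs
                   from the reference.
    regions:       list of (start, end) tuples, 1-based and inclusive.
    """
    total_length = sum(end - start + 1 for start, end in regions)
    hits = sum(
        1
        for pos in snv_positions
        if any(start <= pos <= end for start, end in regions)
    )
    return 1000.0 * hits / total_length

# Hypothetical isolate with four SNVs and a combined two-gene region.
isolate_snvs = [50, 150, 1200, 1600]
combined_region = [(101, 600), (1001, 1500)]  # placeholder coordinates

density = mutation_density(isolate_snvs, combined_region)
print(density)
```

to separate synonymous from nonsynonymous densities, the same function can be called once per category with the corresponding subset of positions.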
fisher's exact test and the pearson chi-square test were used for the analysis of categorical variables. univariate logistic regression was used to assess the association of each mutation with moe status as a single variable, and multiple logistic regression was then performed. the final multiple logistic regression model was built with the backward stepwise method. the relationship between mutation density and time in isolates with mutations of interest, as well as in the group comprising all isolates, was examined via a non-polynomial linear regression model and spearman's rank correlation. a p-value of less than . was considered statistically significant. all statistical analyses were performed using ibm spss version . (chicago, il, usa). to identify the trends in sars-cov- mutation load over time, we calculated the average mutation density per day for all isolates for the whole genome, s gene, and moe regions, capping outliers at the th and th percentile values to minimize the potential effects of sequencing errors (fig. ). our results show a very strong positive correlation between average mutation density and time, both at the genome level and for the s gene. in comparison, moe has a weak positive correlation, with a wider spread of mean density in early and late periods compared to the genome and the s gene. this is consistent with reduced selective pressure on the m and e genes, as has previously been described (dilucca et al., ) . the top nonsynonymous mutation is a>g (in , isolates), responsible for the d g substitution in the spike protein, followed by the c>t mutation (in , isolates) in the nsp region of the orf ab gene, causing the p l substitution in the rdrp protein, and the c>t mutation (in , isolates), responsible for the l s substitution in the orf protein. the most common synonymous mutation is the c>t mutation (in , isolates), which is found in the nsp coding region of the orf ab gene. 
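the outlier capping and rank correlation described above can be sketched as follows. this is an illustration only: the analyses in the text were run in spss, the percentile cut-offs are not recoverable from the text, so the 5th and 95th percentiles used here are assumptions, and the daily mean densities are made-up values.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def cap_outliers(values, lower_q, upper_q):
    """Clamp every value into the [lower_q, upper_q] percentile range."""
    lo, hi = percentile(values, lower_q), percentile(values, upper_q)
    return [min(max(v, lo), hi) for v in values]

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical daily mean mutation densities with one extreme value.
days = [1, 2, 3, 4, 5, 6]
daily_density = [0.1, 0.2, 0.25, 0.4, 0.5, 9.0]

capped = cap_outliers(daily_density, 5, 95)  # assumed cut-offs
rho = spearman(days, capped)
print(capped, rho)
```

because spearman's rho depends only on ranks, capping mainly matters for the linear regression fitted to the same data; it changes rho only when it creates ties.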
for the s gene, the most frequent synonymous mutation is the c>t mutation (in isolates), and the second most common nonsynonymous mutation, after the aforementioned d g mutation, is c>t (in isolates), responsible for the p l substitution. for moe, the most common synonymous and nonsynonymous mutations are c>t (in isolates) and c>t (in isolates), respectively, both of which are found in the m gene, and the latter of which causes the t m amino acid substitution. other than the d g mutation, all of the mentioned mutations are c>t substitutions, the prevalence of which in t- or a-rich regions of the sars-cov- genome has been previously documented (simmonds, ) . after identifying the increase in mutation load over time, which was more prominent in genes with high functional impact (s, orf ab) compared to other structural genes (m, e, n), we sought to examine possible associations between this increase and variants in proteins involved in sars-cov- genome replication. we first identified the five most frequently observed mutations for nsps , , (also known as rdrp) and , four of the proteins that are cleaved from the orf ab polyprotein and are involved in rna polymerization, and then analyzed the association of each mutation with the presence of moe mutations (hereafter referred to as moe status) using the chi-square test. out of the mutations were found to have a significant association with moe status (p-value < . ) (table ). compared to our previous findings on the top nsp mutations (eskier et al. ) , which were based on an analysis of , samples as of may , c>t and c>t have increased in rank of appearance, from th and th to th and th, respectively, and decreased in p-value to show statistically significant associations. in addition, the c>t mutation has increased in rank of appearance from th to rd. of the other nsps tested, nsp was found to have four significant mutations, while nsp had two and nsp had one. 
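the chi-square test of association between a mutation and moe status can be sketched for a single 2x2 table as follows. this is a minimal illustration, not the spss procedure used in the text: the counts are invented, the function name `chi_square_2x2` is hypothetical, and the statistic is the pearson chi-square without continuity correction, whose p-value for one degree of freedom equals erfc(sqrt(chi2 / 2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

                       MoE present   MoE absent
        mutant              a             b
        wildtype            c             d

    Returns (chi2, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts for one mutation of interest.
chi2, p_value = chi_square_2x2(30, 10, 10, 30)
print(chi2, p_value)
```

for tables with small expected counts, fisher's exact test, which the methods also mention, is the usual alternative.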
in addition to time and genotype, we also examined the potential association between the location of isolates and moe status as a possible confounding factor. we first examined whether there is a significant association between location, defined here as the continent from which the isolate was originally obtained, and moe status. our results indicate that there is a strong association between location and moe status, with the highest percentage of moe present isolates in asia ( . %), and the lowest percentage in south america ( . %) (p-value < . ). in comparison to our previous findings, south america had a dramatic decrease in moe present isolate percentage, likely as a result of the increased sequencing efforts (from isolates to ) removing potential sampling biases or localized founder effects. africa, asia, and north america had an increase in moe present proportion, while europe, oceania, and south america showed lowered percentages (table ). after observing the potential confounding effect of location on moe status, we sought to understand whether a location is more or less likely to predict moe status, using a logistic regression model (table ). comparing each individual region ( ) to the other five ( ), we found that asia, europe, and north and south america are all possible predictors of moe status (p-value < . ), with asia and europe . and . times as likely to be moe present as the other regions, and north and south america . and . times as likely, respectively. using these findings, we created different logistic regression models to identify which of the mutations are likely to be independent predictors of moe status (table ). in the single variable model, all mutations we previously identified and location were found to be potential we then examined the effects of each mutation on genomic mutation density to see whether the relationship between the mutations and moe status is indicative of a genome-wide trend. 
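the "times as likely" figures from a single-variable logistic regression with one binary predictor (for example, one region versus all others) correspond to the odds ratio of the matching 2x2 table. the sketch below illustrates that calculation with a wald-type 95% confidence interval. it is an assumption-laden illustration: the counts are invented, the function name `odds_ratio_with_ci` is hypothetical, and the multi-variable backward stepwise models in the text are not reproduced here.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the table

                          MoE present   MoE absent
        region of interest     a             b
        all other regions      c             d

    z = 1.96 gives an approximate 95% interval. All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper

# Hypothetical counts: isolates from one region versus all other regions.
odds_ratio, lower, upper = odds_ratio_with_ci(30, 10, 10, 30)
print(f"OR={odds_ratio:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

an interval that excludes 1 corresponds to a predictor that would be retained as significant in the single-variable model; adjusting for other predictors requires the full multiple regression.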
because selection may affect nonsynonymous mutations differently, we separated the mutations into the two categories and calculated mutation density separately for each category. our results show that nsp mutations show the most consistent association with mutations between moe and the whole genome. all three nsp mutations ( c>t, t>c, and c>t) which have a significant association with moe status also show a similar relationship with genomic mutation density (fig. ) . c>t (l l) has the lowest odds ratio for moe status (table ), and while it shows a slower increase in synonymous mutation density compared to wildtype isolates (fig. a) , it has a significant impact on faster mutation density increase in nonsynonymous mutations (fig. b ). in comparison, c>t (l l) (fig. c-d) and show much lower impact on altered mutation density increase rate. c>t, an nsp mutation, displays high divergence from wildtype isolate patterns; however, its low sample size (n = ) creates a skewed distribution of isolates across time, complicating any potential inference. it bears noting that not all mutations continue to be observed up to the date of retrieval. c>t and t>c were last observed on may and may, , respectively. whether this is due to sequence sampling or because the mutations confer an evolutionary disadvantage is unknown. c>t and c>t are both synonymous mutations, yet only the latter continues to be present in recent isolates. our previous work identified rdrp mutations as contributors to the evolution of the sars-cov- genome and this study confirmed those findings. furthermore, we hypothesized that mutations of the other critical components of the viral replication and transcription machinery may have similar effects. our results implicate nsp as a source of increased mutation rate in sars-cov- genomes. 
three of the five most common nsp mutations, namely c>t, c>t and c>t, are associated with increases in both genome-wide mutational load and moe status, an alternative indicator of mutational rate and virus evolution. interestingly, all three are located within the exon domain, which is responsible for the proofreading activity of nsp ; however, only the c>t mutation is non-synonymous (f l), while c>t and c>t are synonymous mutations, and therefore their effects on viral replication processes can be understood only after functional studies. the fate of the three nsp mutations is also intriguing: despite being present in the first case detected in washington state in the us in mid-january, and detected in , cases until may , the c>t mutation has not been detected since then. on the other hand, the c>t mutation, which arose around the end of january, likely in saudi arabia, and was detected in far fewer cases (n= ), is still present in many isolates. however, it should be noted that the c>t mutation arose within the dominant a>g / c>t lineage, while the other two nsp mutations are in different lineages. therefore, dominance or disappearance of different nsp mutations may have less to do with these particular mutations and more with the co-mutations. yet, we cannot rule out possible effects of these nsp mutations on the fitness of sars-cov- . the authors would like to thank mr. alirıza arıbaş from izmir biomedicine and genome center for his technical assistance. the authors also would like to extend their thanks to izmir biomedicine and genome center (ibg) covid platform ibg-covid for their support in implementing the study and the scientific and technological research council of turkey (tubitak) for their financial support of ibg-covid . yavuz oktay is supported by the turkish academy of sciences young investigator program (tüba-gebi̇p). the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. 
the following grant information was disclosed by the authors: turkish academy of sciences young investigator program (tüba-gebi̇p). the authors declare that they have no competing interests. doğa eskier, aslı suner, gökhan karakülah and yavuz oktay conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. the data is available at mendeley: eskier, doğa; suner, aslı; oktay, yavuz; karakülah, gökhan ( ), "sars-cov- gisaid isolates ( - - ) genotyping vcf", mendeley data, v . http://dx.doi.org/ . / t c xb c. supplemental materials are included with this research. correlation scores are calculated using spearman rank correlation. wildtype isolates in all graphs carry the reference nucleotide for the nine positions of interest ( , , , , , , , , ) a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster codon usage and phenotypic divergences of sars-cov- genes data, disease and diplomacy: gisaid's innovative contribution to global health rdrp mutations are associated with sars-cov- genome evolution nextstrain: real-time tracking of pathogen evolution structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors cardiac and arrhythmic complications in patients with covid- gastrointestinal and liver manifestations in patients with covid- the neuroinvasive potential of sars-cov may play a role in the respiratory failure of covid- patients structural basis and functional analysis of the sars coronavirus nsp -nsp complex structural and biochemical characterization of the nsp -nsp -nsp core polymerase complex from sars-cov- pattern of early human-to-human transmission of wuhan a structural view of sars-cov- rna replication machinery: rna synthesis, proofreading and final capping rampant c→u hypermutation in the genomes of sars-cov- and other coronaviruses: 
causes and consequences for their short-and long-term evolutionary trajectories. msphere one severe acute respiratory syndrome coronavirus protein complex integrates processive rna polymerase and exonuclease activities asymptomatic transmission of sars-cov- and implications for mass gatherings. influenza and other respiratory viruses nervous system involvement after infection with covid- and other coronaviruses liver injury during highly pathogenic human coronavirus infections cardiovascular complications in patients with covid- : consequences of viral toxicities and host immune response confidence interval; multiple logistic regression final model was executed on all these statistically significant variables, included together in the model, and selected with backward stepwise method key: cord- -jgbjxgh authors: graham, simon p.; mclean, rebecca k.; spencer, alexandra j.; belij-rammerstorfer, sandra; wright, daniel; ulaszewska, marta; edwards, jane c.; hayes, jack w. p.; martini, veronica; thakur, nazia; conceicao, carina; dietrich, isabelle; shelton, holly; waters, ryan; ludi, anna; wilsden, ginette; browning, clare; bialy, dagmara; bhat, sushant; stevenson-leggett, phoebe; hollinghurst, philippa; gilbride, ciaran; pulido, david; moffat, katy; sharpe, hannah; allen, elizabeth; mioulet, valerie; chiu, chris; newman, joseph; asfor, amin s.; burman, alison; crossley, sylvia; huo, jiandong; owens, raymond j.; carroll, miles; hammond, john a.; tchilian, elma; bailey, dalan; charleston, bryan; gilbert, sarah c.; tuthill, tobias j.; lambe, teresa title: evaluation of the immunogenicity of prime-boost vaccination with the replication-deficient viral vectored covid- vaccine candidate chadox ncov- date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: jgbjxgh clinical development of the covid- vaccine candidate chadox ncov- , a replication-deficient simian adenoviral vector expressing the full-length sars-cov- spike (s) protein, was initiated in april following non-human primate studies using a single immunisation. here, we compared the immunogenicity of one or two doses of chadox ncov- in both mice and pigs. whilst a single dose induced antigen-specific antibody and t cell responses, a booster immunisation enhanced antibody responses, particularly in pigs, with a significant increase in sars-cov- neutralising titres. as sars-cov- began to spread around the world at the beginning of several vaccine platform technologies were employed to generate candidate vaccines. several use replication-deficient adenoviral (ad) vector technology and express the sars-cov- spike (s) protein. the first phase i clinical study of an ad -vectored vaccine has been reported , chadox ncov- (azd ) phase i trials (nct ) began in april with phase ii and iii trials (nct ) started soon thereafter, and an ad -vectored vaccine is expected to enter phase i shortly. typically, only one dose of ad-vectored vaccines has been administered in early preclinical challenge studies or clinical studies against emerging or outbreak pathogens - . rhesus macaques immunised with a single dose of chadox ncov- were protected against pneumonia but there was no impact on nasal virus titres after high dose challenge to both the upper and lower respiratory tract . to increase antibody titres and longevity of immune responses, a booster vaccination may be administered. homologous prime-boost immunisation resulted in higher antibody titres including neutralising antibodies and a trend towards a lower clinical score in a mers-cov challenge study . here, we set out to test the immunogenicity of either one or two doses of chadox ncov- in mice and pigs, to further inform clinical development. 
'prime-boost' vaccinated inbred (balb/c) and outbred (cd ) mice were immunised on and days post-vaccination (dpv), whereas 'prime-only' mice received a single dose of chadox ncov- on day . spleens and serum were harvested from all mice on day ( weeks after boost or prime vaccination). analysis of sars-cov- s protein-specific murine splenocyte responses by ifn-γ elispot assay showed no statistically significant difference between the prime-only and prime-boost vaccination regimens, in either strain of mouse (figure a). intracellular cytokine staining (ics) of splenocytes (figure b) showed, in both mouse strains, that the response was principally driven by cd + t cells. the predominant cytokine response of both cd + and cd + t cells was expression of ifn-γ and tnf-α, with negligible frequencies of il- + and il- + cells, consistent with previous data suggesting adenoviral vaccination does not induce a dominant th response , . there were no significant differences in cd + and cd + t cell cytokine responses between prime-only and prime-boost mice. prime-only and prime-boost pigs were immunised on dpv and prime-boost pigs received a second immunisation on dpv. blood samples were collected weekly until dpv to analyse immune responses. ifn-γ elispot analysis of porcine peripheral blood mononuclear cells (pbmc) showed responses on dpv ( weeks after boost) that were significantly greater in the prime-boost pigs compared to prime-only animals (p < . ; figure c). the prime-boost dpv responses were greater than responses observed in either group on dpv, but inter-animal variation meant this did not achieve statistical significance. ics analysis of porcine t cell responses showed a dominance of th -type cytokines (similar to the murine response) but with a higher frequency of s-specific cd + t cells compared to cd + t cells (figure d). however, cd + and cd + t cell cytokine responses did not differ significantly between vaccine groups or timepoints ( vs. dpv). 
sars-cov- s protein-specific antibody titres in serum were determined by elisa using recombinant soluble trimeric s (fl-s) and receptor binding domain (rbd) proteins. a significant increase in fl-s binding antibody titres was observed in prime-boost balb/c mice compared to their prime-only counterparts (p < . ); however, the difference between vaccine groups for cd mice was not significant (figure a). antibody responses were evaluated longitudinally in pig sera by fl-s and rbd elisa. compared to pre-vaccination sera, significant fl-s specific antibody titres were detected in both prime-only and prime-boost groups from and dpv, respectively (p < . ; figure b). fl-s antibody titres did not differ significantly between groups until after the boost, when titres in the prime-boost pigs became significantly greater with an average increase in titres of > log (p < . ). rbd-specific antibody titres showed a similar profile with significant titres in both groups from dpv (p < . ) and a further significant increase in the prime-boost pigs from dpv onwards which was greater than the prime-only pigs (p < . ; figure c). sars-cov- neutralising antibody responses were assessed using a virus neutralisation test (vnt; figure d) and pseudovirus-based neutralisation test (pvnt; figure e). after the prime immunisation, sars-cov- neutralising antibody titres were detected by vnt in and dpv sera from / prime-boost and / prime-only pigs. two weeks after the boost ( dpv), neutralising antibody titres were detected and had increased in all prime-boost pigs, and were significantly greater than at the earlier timepoints and than the titres measured in the prime-only group (p < . ). in agreement with this analysis, serum assayed for neutralising antibodies using the pvnt revealed that antibody titres in dpv prime-boost pig sera were significantly greater than earlier timepoints and the prime-only group (p < . ). 
statistical analysis showed a highly significant correlation between pvnt and vnt titres (spearman's rank correlation r = . ; p < . ). in this study, we utilised both a small and a large animal model to evaluate the immunogenicity of either one or two doses of a covid- vaccine candidate, chadox ncov- (now known as azd ). small animal models have variable success in predicting vaccine efficacy in larger animals but are an important stepping stone to facilitate prioritisation of vaccine targets. in contrast, larger animal models, such as the pig and non-human primates, have been shown to more accurately predict vaccine outcome in humans [ ] [ ] [ ] . the mouse data generated in this study suggested that the immunogenicity profile was at the upper end of a dose response curve, which may have saturated the immune response and largely obscured our ability to determine differences between prime-only or prime-boost regimens. we have developed the pig as a model for generating and understanding immune responses to vaccination against human influenza [ ] [ ] [ ] and nipah virus , . the inherent heterogeneity of an outbred large animal model is more representative of immune responses in humans. extensive development of reagents to study immune responses in pigs in recent years has extended the usefulness and applicability of the pig as a model to study infectious disease. these data demonstrate the utility of the pig as a model for further evaluation of the immunogenicity of chadox ncov- and other covid- vaccines. we show here that t cell responses are higher in pigs that received a prime-boost vaccination when compared to prime-only at day , whilst comparing responses days after last immunisation demonstrates the prime-boost regimen trended toward a higher response. in addition, chadox ncov- immunisation induced robust th -like cd + and cd + t cell responses in both pigs and mice. 
this has important implications for covid- vaccine development as virus-specific t cells are thought to play an important role in sars-cov- infection [ ] [ ] [ ] [ ] [ ] . while no correlate of protection has been defined for covid- , recent publications suggest that neutralising antibody titres may be correlated with protection in animal challenge models , . a single dose of chadox ncov- induces antibody responses, but we demonstrate here that antibody responses are significantly enhanced after homologous boost in one mouse strain and to a greater extent in pigs. however, it is likely that a combination of neutralising antibodies and antigen-specific t cells would act in synergy to prevent and control infection, as we have recently shown in the context of influenza vaccination , . whilst human immunogenicity and clinical read-outs are a critically meaningful endpoint, studies in small animals and pigs will help prioritise candidates to be tested in humans. further clinical studies are needed to assess immunogenicity after prime-boost vaccination and the impact on clinical efficacy and durability of the immune response. mouse and pig studies were performed in accordance with the uk animals (scientific procedures) act and with approval from the relevant local animal welfare and ethical review body (mice -project license p b f , and pigs -project license pp ). the principles of the r's were applied for the duration of the study to ensure animal welfare was not unnecessarily compromised. vero e cells were grown in dmem containing sodium pyruvate and l-glutamine (sigma-aldrich, poole, uk), % fbs (gibco, thermo fisher, loughborough, uk), . % penicillin/streptomycin ( , u/ml; gibco) (maintenance media) at °c and % co . sars-cov- isolate england- stocks were grown in vero e cells using a multiplicity of infection (moi) of . for days at °c in propagation media (maintenance media containing % fbs). 
sars-cov- stocks were titrated on vero e cells using mem (gibco), % fcs (labtech, heathfield, uk), . % avicel (fmc biopolymer, girvan, uk) as overlay. plaque assays were fixed using formaldehyde (vwr, leighton buzzard, uk) and stained using . % toluidine blue (sigma-aldrich). all work with live sars-cov- virus was performed in acdp hg laboratories by trained personnel. the propagation, purification and assessment of chadox ncov- titres were as described previously . a synthetic dna, encoding the spike (s) protein receptor binding domain (rbd; amino acids - ) of sars-cov- (genbank mn ), codon optimised for expression in mammalian cells (idt technology) was inserted into the vector popinttgneo incorporating a c-terminal his tag. recombinant rbd was transiently expressed in expi ™ (thermo fisher scientific, uk) and protein purified from culture supernatants by immobilised metal affinity followed by gel filtration in phosphate-buffered saline (pbs) ph . buffer. a soluble trimeric s (fl-s) protein construct encoding residues - with two sets of mutations that stabilise the protein in a pre-fusion conformation (removal of a furin cleavage site and the introduction of two proline residues; k p, v p) was expressed as described . the endogenous viral signal peptide was retained at the n-terminus (residues - ), a c-terminal t -foldon domain incorporated to promote association of monomers into trimers to reflect the native transmembrane viral protein, and a c-terminal his tag included for nickel-based affinity purification. similar to recombinant rbd, fl-s was transiently expressed in expi ™ (thermo fisher scientific) and protein purified from culture supernatants by immobilised metal affinity followed by gel filtration in tris-buffered saline (tbs) ph . buffer. for analysis of t cell responses in pigs, overlapping mer peptides offset by residues based on the predicted amino acid sequence of the entire s protein from sars-cov- wuhan-hu- isolate (ncbi reference sequence: nc_ . 
) were designed and synthesised (mimotopes, melbourne, australia) and reconstituted in sterile % acetonitrile (sigma-aldrich) at a concentration of mg/ml. three pools of synthetic peptides representing residues - (pool ), - (pool ) and - (pool ) were prepared for use to stimulate t cells in ifn-γ elispot and intracellular cytokine staining (ics) assays. for analysis of t cell responses in mice, overlapping mer peptides offset by residues were designed and synthesised (mimotopes) and reconstituted in sterile % dmso (sigma-aldrich) at a concentration of mg/ml. two peptide pools spanning s region (pool : to and - , pool : - ) and peptide pools spanning s region (pool : to , pool : to ) were used for stimulating splenocytes for ifn-γ elispot analysis, and single pools of s (pool and pool ) and s (pool and pool ) were used to stimulate splenocytes for ics. mice: inbred female balb/colahsd (balb/c) (envigo) and outbred crl:cd (cd ) (charles river) of at least weeks of age were randomly allocated into 'prime-only' or 'prime-boost' vaccination groups (balb/c n= and cd n= ). prime-boost mice were immunised intramuscularly with infectious units (iu) ( . x virus particles; vp) chadox ncov- and boosted intramuscularly four weeks later with × iu chadox ncov- . prime-only mice received a single dose of iu chadox ncov- at the same time prime-boost mice were boosted. spleens and serum were harvested from all animals a further weeks later. pigs: six - -week-old, weaned, female, large white-landrace-hampshire cross-bred pigs from a commercial rearing unit were randomly allocated to two treatment groups (n = ): 'prime-only' and 'prime-boost'. both groups were immunised on day with × iu ( . × vp) chadox ncov- in ml pbs by intramuscular injection (brachiocephalic muscle). 'prime-boost' pigs received an identical booster immunisation on day . 
blood samples were taken from all pigs on a weekly basis at , , , , , and dpv by venepuncture of the external jugular vein: ml/pig in bd sst vacutainer tubes (fisher scientific) for serum collection and ml/pig in bd heparin vacutainer tubes (fisher scientific) for peripheral blood mononuclear cell (pbmc) isolation. mice: antibodies to sars-cov- fl-s protein were determined by performing a standardised elisa on serum collected -weeks after prime or prime-boost vaccination. maxisorp plates (nunc) were coated with ng/well fl-s protein overnight at °c, prior to washing in pbs/tween ( . % v/v) and blocking with blocker casein in pbs (thermo fisher scientific) for hour at room temperature (rt). standard positive serum (pool of mouse serum with high endpoint titre against fl-s protein), individual mouse serum samples, negative and an internal control (diluted in casein) were incubated for hours at rt. following washing, bound antibodies were detected by addition of alkaline phosphatase-conjugated goat anti-mouse igg (sigma-aldrich), diluted / in casein, for hour at rt and detection of anti-mouse igg by the addition of pnpp substrate (sigma-aldrich). an arbitrary number of elisa units were assigned to the reference pool and od values of each dilution were fitted to a -parameter logistic curve using softmax pro software. elisa units were calculated for each sample using the od values of the sample and the parameters of the standard curve. pigs: serum was isolated by centrifugation of sst tubes at × g for minutes at rt and stored at - °c. sars-cov- rbd and fl-s specific antibodies in serum were assessed as detailed previously with the exception of the following two steps. the conjugated secondary antibody was replaced with goat anti-porcine igg hrp (abcam, cambridge, uk) at / , dilution in pbs with . % tween and % non-fat milk. 
in addition, after the last wash, a µl of tmb (one component horse radish peroxidase microwell substrate, biofx, cambridge bioscience, cambridge, uk) was added to each well and the plates were incubated for minutes at rt. a µl of biofx nmstop reagent (cambridge bioscience) was then added to stop the reaction and microplates were read at nm. end-point antibody titres (mean of duplicates) were calculated as follows: the log od was plotted against the log sample dilution and a regression analysis of the linear part of this curve allowed calculation of the endpoint titre with an od of twice the average od values of dpv sera. the ability of pig sera to neutralise sars-cov- was evaluated using virus and pseudovirus neutralisation assays. for both assays, sera were first heat-inactivated (hi) by incubation at °c for hours. virus neutralization test (vnt): starting at a in dilution, two-fold serial dilutions of sera were prepared in well round-bottom plates using dmem containing % fbs and % antibiotic-antimycotic (gibco) (dilution media). μl of diluted pig serum was mixed with μl dilution media containing approximately plaque-forming units (pfu) sars-cov- for hour at °c. vero e cells were seeded in -well flat-bottom plates at a density of × cells/ml in maintenance media one day prior to experimentation. culture supernatants were replaced by µl of dmem containing % fcs and % antibiotic-antimycotic, before µl of the virus-sera mixture was added to the vero e cells and incubated for six days at °c. cytopathic effect (cpe) was investigated by brightfield microscopy. cells were further fixed and stained as described above, and cpe scored. each individual pig serum dilution was tested in quadruplet on the same plate and no sera/sars-cov- virus and no sera/no virus controls were run concurrently on each plate in quadruplet. 
wells were scored for cytopathic effect and neutralisation titres expressed as the reciprocal of the serum dilution that completely blocked cpe in % of the wells (nd ). researchers performing the vnts were blinded to the identity of the samples. pseudovirus neutralisation test (pvnt): lentiviral-based sars-cov- pseudoviruses were generated in hek t cells incubated at °c, % co . cells were seeded at a density of . x in well dishes, before being transfected with plasmids as follows: ng of sars-cov- spike, ng p . (encoding for hiv- gag-pol), ng csflw (lentivirus backbone expressing a firefly luciferase reporter gene), in opti-mem (gibco) along with µl pei ( µg/ml) transfection reagent. a 'no glycoprotein' control was also set up using carrier dna (pcdna . ) instead of the sars-cov- s expression plasmid. the following day, the transfection mix was replaced with ml dmem with % fbs (dmem- %) and incubated at °c. at both and hours post transfection, supernatants containing pseudotyped sars-cov- (sars-cov- pps) were harvested, pooled and centrifuged at , x g for minutes at °c to remove cellular debris. target hek t cells, previously transfected with ng of a human ace expression plasmid (addgene, cambridge, ma, usa) were seeded at a density of × in µl dmem- % in a white flat-bottomed -well plate one day prior to harvesting of sars-cov- pps. the following day, sars-cov- pps were titrated -fold on target cells, with the remainder stored at - °c. for pvnts, pig sera were diluted : in serum-free media and µl was added to a -well plate in quadruplicate and titrated fold. a fixed titred volume of sars-cov- pps was added at a dilution equivalent to signal luciferase units in µl dmem- % and incubated with sera for hour at °c, % co . target cells expressing human ace were then added at a density of x in µl and incubated at °c, % co for hours. 
firefly luciferase activity was then measured with brightglo luciferase reagent and a glomax-multi + detection system (promega, southampton, uk). pseudovirus neutralization titres were expressed as the reciprocal of the serum dilution that inhibited luciferase expression by % (ic ). mice: single cell suspensions of mouse spleens were prepared by passing cells through μm cell strainers and ack lysis (thermo fisher) prior to resuspension in complete media (mem supplemented with % fcs, pen-strep, l-glut and -mercaptoethanol). for analysis of ifn-γ production by elispot assay, splenocytes were stimulated with s peptide pools at a final concentration of g/ml on ipvh-membrane plates (millipore) coated with g/ml anti-mouse ifn-γ (clone an ; mabtech). after - hours of stimulation, ifn-γ spot forming cells (sfc) were detected by staining membranes with anti-mouse ifn-γ biotin mab ( µg/ml; clone r a , mabtech) followed by streptavidin-alkaline phosphatase ( µg/ml) and development with ap conjugate substrate kit (bio-rad). for analysis of intracellular cytokine production, cells were stimulated with μg/ml s peptide pools, media or cell stimulation cocktail (containing pma-ionomycin, biolegend), together with μg/ml golgiplug (bd biosciences) and μl/ml cd a-alexa for hours in a -well u-bottom plate, prior to placing at °c overnight. following surface staining with cd -buv , cd -percp-cy . , cd l-bv , cd -bv , cd -apc-cy and live/dead aqua (thermo fisher), cells were fixed with % neutral buffered formalin (containing % paraformaldehyde) and stained intracellularly with tnf-α-af , il- -pe-cy , il- -bv , il- -pe and ifn-γ-e diluted in perm-wash buffer (bd biosciences). sample acquisition was performed on a fortessa (bd) and data analysed in flowjo v (treestar). an acquisition threshold was set at a minimum of events in the live cd + gate. 
antigen-specific t cells were identified by gating on live/dead negative, doublet negative (fsc-h vs fsc-a), size (fsc-h vs ssc), cd + , cd + or cd + cells and cytokine positive. total sars-cov- s specific cytokine responses are presented after subtraction of the background response detected in the media stimulated control spleen sample of each mouse, prior to summing together the frequency of s and s specific cells. pigs: pbmcs were isolated from heparinised blood by density gradient centrifugation and cryopreserved in cold % dmso (sigma-aldrich) in hi fbs. resuscitated pbmc were suspended in rpmi medium, glutamax supplement, hepes (gibco) supplemented with % hi fbs (new zealand origin, life science production, bedford, uk), % penicillin-streptomycin and . % -mercaptoethanol ( mm; gibco) (crpmi). to determine the frequency of sars-cov- s specific ifn-γ producing cells, an elispot assay was performed on pbmc from , , and dpv. multiscreen -well plates (mahas ; millipore, fisher scientific) were pre-coated with µg/ml anti-porcine ifn-γ mab (clone p g , bd biosciences) and incubated overnight at °c. after washing and blocking with crpmi, pbmcs were plated at × cells/well in crpmi in a volume of µl/well. pbmcs were stimulated in triplicate wells with the sars-cov- s peptide pools at a final concentration of µg/ml/peptide. crpmi alone was used in triplicate wells as a negative control. after hours incubation at °c with % co , plates were developed as described previously . the numbers of specific ifn-γ secreting cells were determined using an immunospot ® s analyzer (cellular technology, cleveland, usa). for each animal, the mean 'crpmi only' data was subtracted from the s peptide pool , and data, which were then summed and expressed as the medium-corrected number of antigen-specific ifn-γ secreting cells per x pbmc. 
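the medium correction described for the porcine elispot data can be sketched in a few lines of python. this is a minimal illustration and not the authors' analysis script: the function name, the dictionary layout of the wells and the `per_cells` default are assumptions introduced here (the normalisation denominator is not recoverable from the text above), and negative corrected values are left unclipped because the text does not say how they were handled.

```python
from statistics import mean


def medium_corrected_sfc(pool_wells, medium_wells, cells_per_well, per_cells=1_000_000):
    """Sum background-subtracted spot counts across peptide pools.

    pool_wells     -- dict mapping a pool label to its replicate spot counts
    medium_wells   -- replicate spot counts for the medium-only control
    cells_per_well -- number of PBMC plated per well
    per_cells      -- cell number the result is expressed per (assumed default)
    """
    background = mean(medium_wells)
    # Subtract the mean medium-only count from the mean of each pool, then sum.
    total = sum(mean(wells) - background for wells in pool_wells.values())
    # Scale from "per well" to "per per_cells PBMC".
    return total * per_cells / cells_per_well


# Hypothetical counts for one animal, for illustration only.
example = medium_corrected_sfc(
    {"pool_a": [10, 12, 14], "pool_b": [5, 5, 5], "pool_c": [2, 2, 2]},
    medium_wells=[1, 2, 3],
    cells_per_well=500_000,
)
```

subtracting the background from each pool before summing matters when several pools are used, because the same medium-only signal would otherwise be counted once per pool.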
to assess intracellular cytokine expression pbmc from and dpv were suspended in crpmi at a density of × cells/ml and added to µl/well to -well round bottom plates. pbmcs were stimulated in triplicate wells with the sars-cov- s peptide pools ( µg/ml/peptide). unstimulated cells in triplicate wells were used as a negative control. after hours incubation at °c, % co , cytokine secretion was blocked by addition : , bd golgiplug (bd biosciences) and cells were further incubated for hours. pbmc were washed in pbs and surface labelled with zombie nir fixable viability stain (biolegend), cd -percp-cy . mab (clone - - , bd bioscience) and cd β-fitc mab (clone ppt , bio-rad antibodies). following fixation (fixation buffer, biolegend) and permeabilization (permeabilization wash buffer, biolegend), cells were stained with: ifn-γ-af mab (clone cc , bio-rad antibodies, kidlington, uk), tnf-α-bv mab (clone mab , biolegend), il- mab (clone a d f h , invitrogen, thermo fisher scientific) and il- bv mab (clone mp - d , biolegend) followed by staining with anti-mouse igg a-pe-cy (clone rmg a- , biolegend). cells were analysed using a bd lsrfortessa flow cytometer and flowjo x software. total sars-cov- s specific cytokine positive responses are presented after subtraction of the background response detected in the media stimulated control pbmc sample of each pig, prior to summing together the frequency of s-peptide pools - specific cells. graphpad prism . . (graphpad software, san diego, usa) was used for graphical and statistical analysis of data sets. anova or a mixed-effects model were conducted to compare responses over time and between vaccine groups at different time points post-vaccination as detailed in the results. neutralising antibody titre data were log transformed before analysis. neutralising antibody titre data generated by the vnt and pvnt assays were compared using spearman nonparametric correlation. p-values < . were considered statistically significant. 
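the comparison of vnt and pvnt titres by spearman nonparametric correlation can be illustrated with a short, dependency-free python sketch. the analysis in the study was run in graphpad prism; the code below is only an illustration of the statistic itself, with hypothetical helper names and made-up titres. because spearman's coefficient depends only on ranks, log-transforming the titres beforehand leaves it unchanged.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined (division by zero) if one input is constant


# Hypothetical paired titres, for illustration only.
vnt = [20, 40, 80, 160]
pvnt = [15, 60, 30, 320]
rho = spearman_rho(vnt, pvnt)
rho_log = spearman_rho([math.log10(v) for v in vnt], [math.log10(v) for v in pvnt])
```

the sketch returns only the coefficient; a p-value would need a permutation test or a statistics package.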
were immunised on day and with chadox ncov (prime-boost) or chadox ncov on day (prime-only); pigs (n= ) were immunised with chadox ncov- on days and (prime-boost), or only on day (prime-only). to analyse sars-cov- s-specific t cell responses, all mice were sacrificed on day for isolation of splenocytes and pigs were blood sampled longitudinally to isolate pbmc. following stimulation with sars-cov- s-peptides, responses of murine splenocytes (a) and porcine pbmc (c) were assessed by ifn-γ elispot assays. using flow cytometry, cd + and cd + t cell responses were characterised by assessing expression of ifn-γ, tnf-α, il- , il- and il- (mice; b) and ifn-γ, tnf-α, il- and il- (pigs; d). each data point represents an individual mouse/pig with bars denoting the median response per group/timepoint. : sars-cov- s protein-specific antibody responses following chadox ncov- prime-only and prime-boost vaccination regimens in mice and pigs. inbred balb/c (n= ) and outbred cd (n= ) were immunised on day and with chadox ncov (prime-boost) or chadox ncov on day (prime-only), whereas pigs were immunised with chadox ncov- on days and (prime-boost), or only on day (prime-only). to analyse sars-cov- s protein-specific antibodies in serum, all mice were sacrificed on day and pigs were blood sampled weekly until day . antibody units or end-point titres (ept) were assessed by elisa using recombinant sars-cov- fl-s for both mice (a) and pigs (b), and recombinant s protein rbd for pigs (c). sars-cov- neutralising antibody titres in pig sera were determined by vnt, expressed as the reciprocal of the serum dilution that neutralised virus infectivity in % of the wells (nd ; d), and pvnt, expressed as reciprocal serum dilution to inhibit pseudovirus entry by % (ic ; e). each data point represents an individual mouse/pig sera with bars denoting the median titre per group. a single dose of chadox mers provides broad protective immunity against a variety of mers-cov strains. 
biorxiv chadox ncov- vaccination prevents sars-cov- pneumonia in rhesus macaques. biorxiv a single dose of chadox mers provides protective immunity in rhesus macaques antigen encoded by vaccine vectors derived from human adenovirus serotype is preferentially presented to cd + t lymphocytes by the cd α+ dendritic cell subset immunization with an adenovirus-vectored tb vaccine containing ag a-mtb effectively alleviates allergic asthma the pig: a model for human infectious diseases large animal models for vaccine development and testing the contribution of non-human primate models to the development of human vaccines comparison of heterosubtypic protection in ferrets and pigs induced by a single-cycle influenza vaccine immunogenicity and protective efficacy of seasonal human live attenuated cold-adapted influenza virus vaccine in pigs aerosol delivery of a candidate universal influenza vaccine reduces viral load in pigs challenged with pandemic h n virus vaccine development for nipah virus infection in pigs bovine herpesvirus- -vectored delivery of nipah virus glycoproteins enhances t cell immunogenicity in pigs. vaccines (basel) targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals elevated exhaustion levels and reduced functional diversity of t cells in peripheral blood may predict severe progression in covid- patients clinical and immunological features of severe and moderate coronavirus disease transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid- patients presence of sars-cov- reactive t cells in covid- patients and healthy donors. 
medrxiv sars-cov- infection protects against rechallenge in rhesus macaques dna vaccine protection against sars-cov- in rhesus macaques vaccination with viral vectors expressing np, m and chimeric hemagglutinin induces broad protection against influenza virus challenge in mice a serological assay to detect sars-cov- seroconversion in humans this study was supported by engineering sarah gilbert and teresa lambe are named on a patent application covering chadox ncov- . the remaining authors declare no competing interests. the funders played no role in the conceptualisation, design, data collection, analysis, decision to publish, or preparation of the manuscript. correspondence and material requests to professor simon p. graham (simon.graham@pirbright.ac.uk) and professor teresa lambe (teresa.lambe@ndm.ox.ac.uk). key: cord- -m m mhye authors: fagre, anna c.; manhard, john; adams, rachel; eckley, miles; zhan, shijun; lewis, juliette; rocha, savannah m.; woods, catherine; kuo, karina; liao, wuxiang; li, lin; corper, adam; challa, dilip; mount, emily; tumanut, christine; tjalkens, ronald b.; aboelleil, tawfik; fan, xiaomin; schountz, tony title: a potent sars-cov- neutralizing human monoclonal antibody that reduces viral burden and disease severity in syrian hamsters date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: m m mhye the emergence of covid- has led to a pandemic that has caused millions of cases of disease, variable morbidity and hundreds of thousands of deaths. currently, only remdesivir and dexamethasone have demonstrated limited efficacy, only slightly reducing disease burden, thus novel approaches for clinical management of covid- are needed. we identified a panel of human monoclonal antibody clones from a yeast display library with specificity to the sars-cov- spike protein receptor binding domain that neutralized the virus in vitro. 
administration of the lead antibody clone to syrian hamsters challenged with sars-cov- significantly reduced viral load and histopathology score in the lungs. moreover, the antibody interrupted monocyte infiltration into the lungs, which may have contributed to the reduction of disease severity by limiting immunopathological exacerbation. the use of this antibody could provide an important therapy for treatment of covid- patients. the emergence of a novel coronavirus disease , caused by severe acute respiratory syndrome coronavirus (sars-cov- ), was first described in december in wuhan, china . symptoms include cough, dyspnea, chest tightness and fever, plus occasional adverse gastrointestinal (gi) disturbances. in a vulnerable subset of patients, the disease often progresses to an atypical pneumonia with high morbidity and mortality rates [ ] [ ] [ ] [ ] [ ] . mortality based on case fatality rates have ranged from early estimates of . % in hubei province, china, to . % in populations from countries with widespread testing. recent estimates of infection fatality rates (ifrs) are between . % to % and overall indicate that covid- has a -fold greater mortality than seasonal influenza [ ] [ ] [ ] [ ] [ ] . alarmingly, the ifr is considerably higher ( . %) in individuals ≥ years of age where the prognosis for those hospitalized patients ranges from guarded to poor; they frequently require long periods of breathing support, with mortality rates of - % , . consequently, the effect on healthcare resources is of major concern and there is an increasingly unmet medical need for effective therapies, particularly for older patients with comorbidities, including atherosclerosis, hypertension and diabetes mellitus, to prevent lethal pneumonia. cellular infection by coronaviruses is mediated by the viral homotrimeric spike glycoprotein binding to specific host cell receptors. 
each protomer consists of an s domain, which mediates receptor binding, and an s domain that mediates membrane fusion and cell infection following conformational changes induced by host cell receptor binding [ ] [ ] [ ] . the receptor for sars-cov, sars-cov- and hcov-nl is angiotensin converting enzyme (ace ), which is expressed on mucosal epithelia of the upper respiratory tract, bronchioles, lungs and the gi tract [ ] [ ] [ ] [ ] [ ] . convalescent serum and purified antibodies from recovering patients have yielded promising results in past viral outbreaks - , and this approach may also be promising for sars-cov- . one key challenge in the development of these antibodies is how to prevent viral escape whereby variant mutations nullify antibody binding to the rbd epitope. multiple variants of sars-cov- have emerged, some of which have mutations in the rbd of the s protein and are predicted to have increased binding to ace [ ] [ ] [ ] [ ] . the best neutralizing antibodies are likely to recognize epitopes with high sequence conservation due to structural or functional constraints. the rbm binding site to ace is such an example because sequence divergence within this region is constrained to the prerequisite that binding to ace must be maintained. an analysis of the crystal structures of the ace -rbd sars-cov complex and the ace -sars-cov- rbd complex , confirms that eight of the key rbm contact residues are conserved between these respiratory coronaviruses, although the sars-cov- rbm forms a more compact interface with ace than that formed by sars-cov rbd and has a higher affinity for ace , . many of the previously identified neutralizing antibody clones against sars-cov were shown to block s binding to its ace receptor by binding to epitopes located within the rbm [ ] [ ] [ ] [ ] , . however, not all monoclonal antibodies isolated previously against the sars-cov rbd recognize the sars-cov- spike protein in the context of the full-length spike homotrimer , . 
similarly, recent studies evaluating anti-sars-cov- antibodies isolated from convalescent patients indicate only a minor subset of the clones have the requisite viral neutralizing activity , , . together, these studies indicate that there is an urgent need for sars-cov- -specific human neutralizing antibody therapeutics that have proven ability to both neutralize sars-cov- viral host cell infection in vitro as well as the appropriate activity in vivo. the syrian hamster has been shown to be a suitable model for sars-cov- infection given the similarity of clinical signs and lung pathology to human disease in the first few days of infection [ ] [ ] [ ] . however, to date, there has been only a gross histological analysis of the lung pathological changes following infection and the impact of sars-cov- neutralizing antibody clones on lung immune infiltrates has yet to be fully assessed. in this study, we describe an anti-sars-cov- spike rbd clone isolated from a rationally designed, fully human antibody library that bound to native spike protein. this potent antibody could block the interaction of the spike protein with ace and could also block sars-cov- infection of vero e cells and the resultant cytopathic effect. this clone was then tested in a pilot therapeutic study in syrian golden hamsters (mesocricetus auratus) and was shown to reduce viral load and ameliorated the severity of bronchointerstitial pneumonia. detailed histological and image analysis suggests that macrophages play a key role in the lung response to sars-cov- infection in hamsters. production and purification of antibody clones. the dna inserts encoding the light and heavy chains of clone avgn-b and its variants were separately cloned into different expression vectors carrying the constant regions of human igg heavy chain and the kappa chain. the heavy and light chain plasmids were co-transfected into expi suspension cells. 
after - days, secreted antibody was purified from the culture supernatants by protein a chromatography and buffer-exchanged into pbs ph . , and its concentration was determined by nanodrop a assay. the quality of the igg was assessed by sds-page and by hplc. endotoxin levels were tested using a limulus amoebocyte lysate (lal) assay (thermofisher scientific, cat # a ). variants. the wells of immulon hb elisa -well plates were coated with avgn-b or its variant iggs at ° c overnight. the wells were blocked and washed, then serial dilutions of the monomeric biotinylated rbd-antigen were added to the wells and incubated for h at room temperature. bound antigen was then detected with hrp-labeled streptavidin. the od readings were plotted against concentration using prism software (graphpad ca) for curve fitting and determination of the apparent kd value. the ic values for inhibition of rbd binding to the ace receptor were determined by pre-mixing a serial dilution of antibody with nm of biotinylated sars-cov- -rbd and incubating for min before adding to wells coated with recombinant ace -fc fusion protein (residue gln -ser fused to human fc [kactus biosystems, cat# ace-hm ]). after h, bound rbd was detected using hrp-labeled streptavidin and the od readings were plotted against antibody concentration using prism software. real time kinetics were measured by biolayer interferometry (bli) at a temperature of ° c using vero e cells were plated overnight in -well plates at , cells per well. antibodies were diluted in complete dmem and serially diluted : resulting in a -point dose response dilution series run in or replicates. the dilutions of antibody were incubated with tcid per µl of sars-cov- (strain -ncov/usa-wa / ) for h and added to the assay plates. the plates were incubated for days at ° c, % co and % relative humidity and the inhibitory effects of the antibodies were assessed.
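the relationship between a fitted dose-response curve and its half-maximal concentration, as used above for the apparent kd and ic values, can be illustrated with a four-parameter logistic model. this is a minimal sketch under stated assumptions and not the prism procedure used in the study: the model choice, the function names and the example parameter values are all hypothetical.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic response at a given concentration.

    Assumed model (not confirmed by the source): the signal falls from
    `top` towards `bottom` as the inhibitor concentration rises.
    """
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def conc_at_response(resp, bottom, top, ic50, hill):
    """Invert the curve: concentration that produces a given response."""
    return ic50 * ((top - bottom) / (resp - bottom) - 1.0) ** (1.0 / hill)


# Illustrative parameters only; these are not values from the study.
params = dict(bottom=0.05, top=2.0, ic50=3.0, hill=1.2)
half_max = (params["top"] + params["bottom"]) / 2.0

# By construction, the response at conc == ic50 is the half-maximal signal.
response_at_ic50 = four_pl(params["ic50"], **params)
```

in practice the four parameters would be estimated from the od readings by nonlinear least squares; the sketch only shows why the fitted midpoint parameter is read off as the half-maximal concentration.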
approval of the study protocol was obtained from the colorado state university institutional animal care and use committee (protocol ). male syrian hamsters (n= , weeks of age) were obtained from charles river laboratory. the animals were held in the csu animal facility and provided access to standard pelleted feed and water ad libitum prior to being moved into the biosafety level facility for experimental challenge. of the hamsters, were intranasally infected with . x tcid /ml equivalents of sars-cov- (strain -ncov/usa-wa / ) and divided into treatment groups as follows: "avgn-b high" ( . mg avgn-b) (n= ), "avgn-b low" ( mg avgn-b) (n= ), "untreated" (no antibody) (n= ), and "ab control" ( . mg isotype control igg) (n= ). two uninfected hamsters received . mg avgn-b (termed "uninfected"). each animal was dosed intraperitoneally with the corresponding treatment at and hours post infection (hpi). at , , , , and hpi, each hamster was weighed and assessed for the presence of clinical signs (lethargy, ruffled fur, hunched back posture, nasolacrimal discharge, and rapid breathing). at hpi ( dpi), each hamster was anesthetized with isoflurane and then euthanized via cardiac exsanguination, and blood was collected. weight loss (calculated as percentage decrease from dpi weight) and viral rna load in lung were compared between treatment groups in prism using multiple t-tests and mann-whitney u tests, respectively. p< . was considered significant. swabs in viral transport medium were vortexed thoroughly and centrifuged to pellet cellular debris. lungs, tracheobronchial and hilar lymph nodes, thymus, esophagus, heart and liver from hamsters labeled with ear tags - were extirpated en bloc and fixed whole in % neutral buffered formalin for at least days to ensure virus inactivation prior to transfer to the csu veterinary diagnostic laboratory for trimming. four transverse whole-lung sections were stained with h&e or processed for ihc.
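the two quantities compared between treatment groups above, percentage weight loss from baseline and a rank-based comparison of lung viral rna load, can be sketched as follows. this is an illustration only: the study ran these tests in prism, the function names are hypothetical, the example numbers are invented, and the sketch computes the mann-whitney u statistic without a p-value.

```python
def percent_weight_loss(baseline_g, current_g):
    """Percentage decrease from the baseline weight (positive means loss)."""
    return 100.0 * (baseline_g - current_g) / baseline_g


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a.

    Counts the pairs (a, b) in which a exceeds b; a tie counts as 0.5.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Invented viral load values for two groups (arbitrary units).
treated = [3.1, 3.4, 2.9]
untreated = [5.2, 4.8, 5.9]
u_treated = mann_whitney_u(treated, untreated)
u_untreated = mann_whitney_u(untreated, treated)
```

the two u values always sum to the product of the group sizes, so a value near zero for one group indicates that almost all of its observations rank below those of the other group.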
sections, µm thick, were subjected to heat-induced epitope retrieval performed online on a leica bond-iii ihc automated stainer using bond-epitope retrieval solution. sections were incubated for minutes with antibodies to sars-cov- nucleocapsid protein (mouse, : ), pancytokeratin, factor-viii and ionized calcium binding adaptor molecule (iba- ) (leica biosystems); on negative control slides, the primary antibody was replaced by a rabbit non-specific igg isotype negative control antibody. labeling was performed on an automated staining platform. fast red was used as a chromogen and slides were counterstained with hematoxylin. immunoreactions were visualized and blindly scored by a single pathologist. in all treated categories, reactive lung sections incubated with primary antibodies were used as positive immunohistochemical controls. negative control sections were incubated in diluent composed of tris-buffered saline with a carrier protein and homologous nonimmune serum. all sequential steps of the immunostaining procedure were performed on negative controls following incubation. paraffin embedded tissue sections were stained for sars-cov- nucleocapsid protein ( : ) and ionized calcium binding adaptor molecule (abcam; ab ; : ) using a leica bond rx m automated staining instrument following permeabilization using . % triton x diluted in tris- table . the total affected pulmonary parenchyma and the number of inflammatory cells per area (roi, mm ) in histological sections were quantified digitally. a digital montage was compiled at × magnification to include the four tissue classes that were differentially characterized (table ) to establish the algorithm classifier. affected rois were subsequently automatically identified using olympus cellsens software by quantifying whole-lung mounts scanned from each hamster for total number of nuclei or nucleated cells (to exclude erythrocytes) stained with hematoxylin.
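once every pixel of a scanned section has been assigned to one of the four tissue classes, the percentage of affected parenchyma reduces to a ratio of counts. the sketch below is an assumption-laden illustration and not the cellsens workflow: the label strings, the function name, and the choice to exclude background and glass from the denominator are all hypothetical.

```python
from collections import Counter

# Assumption: only these two classes count as lung parenchyma.
TISSUE_CLASSES = ("affected", "unaffected")
# Assumption: these classes are left out of the denominator.
EXCLUDED_CLASSES = ("background", "glass")


def percent_affected(pixel_labels):
    """Percentage of parenchyma pixels classified as affected."""
    counts = Counter(pixel_labels)
    tissue = sum(counts[c] for c in TISSUE_CLASSES)
    if tissue == 0:
        raise ValueError("no lung tissue pixels in this section")
    return 100.0 * counts["affected"] / tissue


# Invented label counts for one section.
labels = (["affected"] * 30 + ["unaffected"] * 70
          + ["background"] * 50 + ["glass"] * 10)
affected_pct = percent_affected(labels)
```

a different denominator, for example the whole scanned area, would give a lower percentage for the same section, so the choice needs to be stated alongside any reported value.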
sections labeled by immunofluorescence for iba- and sars-cov- were analyzed using the count and measure module of cellsens. the algorithm predominantly extracted multispectral information from images with additional dia processing using spatial, logical and threshold separators after manual annotations. all four classes were accurately identified as expected by a board-certified pathologist who was blinded to experimental groups of hamsters. a rationally designed fully human antibody library displayed on yeast was screened by magnetic individual antibody clones were tested for their ability to block sars-cov- -rbd binding to ace using a competition elisa and for their ability to bind to native sars-cov- spike protein expressed as a gfp fusion protein by transfected cells, compared to binding activity to non-transfected cell controls. those antibody clones that blocked the interaction of the rbd with ace and bound to native spike protein were then tested for neutralization of sars-cov- in a cytopathic effect (cpe) assay with vero e cells. clone avgn-b was identified as the most potent in these assays. it exhibited an apparent affinity by elisa for sars-cov- rbd of . nm (fig. a) , an apparent ic value for blocking rbd binding to ace -fc of . nm (fig. b) , and an apparent kd for binding to native spike protein expressed by cells of . nm compared to . nm for ace -fc (fig. d) . real time binding kinetics of avgn-b to the isolated rbd recombinant protein measured by bli indicated a kd of . nm (fig. c) . when tested for its ability to neutralize sars-cov- infectivity, clone avgn-b exhibited % protection from sars-cov- induced cell death down to . µg/ml. using a colorimetric assay for quantitation of cell death, avgn-b exhibited an ic value that ranged from . µg/ml (experiment with replicates) to . µg/ml (experiment with replicates) in the cpe assay (fig. e) .
at dpi, infected hamsters appeared quiet and began to progressively lose weight over the course of the study (up to ~ % reduction in untreated animals). there was no significant reduction in weight loss associated with avgn-b treatment (p> . ) ( fig. a) . none of the hamsters in the study died or met euthanasia criteria prior to study termination at dpi. there was a significant reduction in viral rna in the lungs of hamsters treated with avgn-b (both . mg and mg doses) compared to those that were untreated (p = . and . , respectively) (fig. b ). lungs from the uninfected and the ab control hamsters (treated with . mg control isotype lungs were examined for the extent of macrophage infiltration and sars-cov- viral load at days post infection by histopathological examination and by immunofluorescence imaging (fig. ). whole mount sections of paraffin-embedded lung tissue were stained with hematoxylin and eosin and brightfield grayscale images were collected using a microscope equipped with a scanning motorized stage (fig. a-e) . hematoxylin-positive macrophage soma were rendered as focal points within the regions of interest to calculate the percent hypercellularity of tissue following infection with sars-cov- . by dpi, lung tissue showed extensive infiltration of macrophages (fig. a ) that was decreased in dose-dependent fashion by treatment with avgn-b (fig. b, c) . treatment of uninfected hamsters with avgn-b alone did not increase macrophage infiltration, whereas co-treatment with igg control ab during infection with sars-cov- (ab control group) still resulted in marked infiltration of macrophages (fig. e ). the percent of total lung area displaying macrophage hypercellularity was quantified in fig. n , and consequently, in parallel to the in vivo studies described above, antibody engineering of clone avgn-b has been performed and a panel of more potent variants has now been isolated. 
as shown in table future studies will explore the efficacy of avgn-b and/or its variants in non-human primate models, several of which have been described for use in sars-cov- pathogenesis and countermeasure development studies , , . larger studies will also be conducted to examine whether avgn-b and/or its variants, by reducing viral load and accumulation of macrophages within the lung, will prevent downstream inflammatory and coagulation sequelae of sars-cov- infection within other parenchymatous organs, particularly heart, kidneys and liver. should avgn-b and/or its variants advance to clinical trials in human patients, their effect on kawasaki-like disease (kd) in sars-cov- infected children merits clinical investigation. there is accumulating evidence that the monocyte/macrophage system releases cytokines that directly lead to vascular endothelial damage during acute kd , . it will be important to investigate the role of avgn-b and/or its variants in suppressing the cytokine storm by suppressing a pivotal player, the monocyte-macrophage system.
tissue class | color | features
affected lung area | "blue" and "red" for cellular density | intra-bronchiolar, intra-alveolar, peribronchiolar and perivascular inflammation or alveolar septal thickening, edema and hemorrhage
unaffected lung tissue | "light blue" | normal lung parenchyma
background | "grey" | intravascular erythrocytes, blood vessel walls, bronchiolar mucosa
glass | "white" | glass

a pneumonia outbreak associated with a new coronavirus of probable bat origin clinical characteristics of coronavirus disease in china risk factors of fatal outcome in hospitalized subjects with coronavirus disease from a nationwide analysis in china clinical features of patients infected with novel coronavirus in wuhan clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study clinical, laboratory and imaging features of covid- : a systematic review and meta-analysis case-fatality risk estimates for covid- calculated by using a lag time for fatality estimates of the severity of coronavirus disease : a model-based analysis estimating the infection and case fatality ratio for coronavirus disease (covid- ) using age-adjusted data from the outbreak on the diamond princess cruise ship serology-informed estimates of sars-cov- infection fatality risk in presenting characteristics, comorbidities, and outcomes among patients hospitalized with covid- in the new york city area the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex pre-fusion structure of a human coronavirus spike protein tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion angiotensin-converting enzyme is a functional receptor for the sars coronavirus human coronavirus nl employs the severe acute respiratory syndrome coronavirus receptor for cellular entry receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars
coronavirus functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses airways expression of sars-cov- receptor, ace , and tmprss is lower in children than adults and increases with smoking and copd the effectiveness of convalescent plasma and hyperimmune immunoglobulin for the treatment of severe acute respiratory infections of viral etiology: a systematic review and exploratory meta-analysis potent neutralization of severe acute respiratory syndrome (sars) coronavirus by a human mab to s protein that blocks receptor association molecular and biological characterization of human monoclonal antibodies binding to the spike and nucleocapsid proteins of severe acute respiratory syndrome coronavirus development and characterisation of neutralising monoclonal antibody to the sars-coronavirus potent cross-reactive neutralization of sars coronavirus isolates by human monoclonal antibodies structure of severe acute respiratory syndrome coronavirus receptorbinding domain complexed with neutralizing antibody importance of neutralizing monoclonal antibodies targeting multiple antigenic sites on the middle east respiratory syndrome coronavirus spike glycoprotein to avoid neutralization escape structure, function, and antigenicity of the sars-cov- spike glycoprotein the establishment of reference sequence for sars-cov- and variation analysis emergence of sars-cov- spike rbd mutants that enhance viral infectivity through increased human ace receptor binding affinity structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structure of sars coronavirus spike receptor-binding domain complexed with receptor structural basis of receptor recognition by sars-cov- evaluation of human monoclonal antibody r for immunoprophylaxis of severe acute respiratory syndrome by an animal study, epitope mapping, and analysis of spike variants human monoclonal antibody as prophylaxis for sars coronavirus infection in ferrets 
potent binding of novel coronavirus spike protein by a sars coronavirusspecific human monoclonal antibody cryo-em structure of the -ncov spike in the prefusion conformation pathogenesis and transmission of sars-cov- in golden hamsters simulation of the clinical and pathological manifestations of coronavirus disease (covid- ) in golden syrian hamster model: implications for disease pathogenesis and transmissibility syrian hamsters as a small animal model for sars-cov- infection and countermeasure development respiratory disease in rhesus macaques inoculated with sars-cov- detection of novel coronavirus ( -ncov) by real-time rt-pcr potent neutralizing antibodies against sars-cov- identified by high-throughput single-cell sequencing of convalescent patients' b cells cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody human neutralizing antibodies elicited by sars-cov- infection a human monoclonal antibody blocking sars-cov- infection a human neutralizing antibody targets the receptor binding site of sars-cov- isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies massive transient damage of the olfactory epithelium associated with infection of sustentacular cells by sars-cov- in golden syrian hamsters identification of oxidative stress and toll-like receptor signaling as a key pathway of acute lung injury sars-cov- and myocardial injury: a role for nox ? prospects for the use of regulators of oxidative stress in the comprehensive treatment of the novel coronavirus disease (covid- ) and its complications is macrophages heterogeneity important in determining covid- lethality? 
trained immunity: a program of innate immune memory in health and disease monocyte and macrophage immunometabolism in atherosclerosis covid- : immunopathology and its implications for therapy role of oxidized ldl-induced "trained macrophages" in the pathogenesis of covid- and benefits of pioglitazone: a hypothesis coagulopathy and antiphospholipid antibodies in patients with covid- metabolic modulation of inflammation-induced activation of coagulation infection with novel coronavirus (sars-cov- ) causes pneumonia in rhesus macaques chadox ncov- vaccine prevents sars-cov- pneumonia in rhesus macaques peripheral blood monocyte/macrophages and serum tumor necrosis factor in kawasaki disease nf-κb activation in peripheral blood monocytes/macrophages and t cells during acute kawasaki disease key: cord- - kxwkcbl authors: overholt, kalon j.; krog, jonathan r.; bryson, bryan d. title: dissecting the common and compartment-specific features of covid- severity in the lung and periphery with single-cell resolution date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kxwkcbl as the global covid- pandemic continues to escalate, no effective treatment has yet been developed for the severe respiratory complications of this disease. this may be due in large part to the unclear immunopathological basis for the development of immune dysregulation and acute respiratory distress syndrome (ards) in severe and critical patients. specifically, it remains unknown whether the immunological features of the disease that have been identified so far are compartment-specific responses or general features of covid- . additionally, readily detectable biological markers correlated with strata of disease severity that could be used to triage patients and inform treatment options have not yet been identified. here, we leveraged publicly available single-cell rna sequencing data to elucidate the common and compartment-specific immunological features of clinically severe covid- . 
we identified a number of transcriptional programs that are altered across the spectrum of disease severity, few of which are common between the lung and peripheral immune environments. in the lung, comparing severe and moderate patients revealed severity-specific responses of enhanced interferon, a /iκb, il- , and il- pathway signatures along with broad signaling activity of ifng, spp , ccl , ccl , and il across cell types. these signatures contrasted with features unique to ards observed in the blood compartment, which included depletion of interferon and a /iκb signatures and a lack of il- response. the cell surface marker s pr was strongly upregulated in patients diagnosed with ards compared to non-ards patients in γδ t cells of the blood compartment, and we nominate s pr as a potential marker for immunophenotyping ards in covid- patients using flow cytometry. highlights covid- disease severity is associated with a number of compositional shifts in the cellular makeup of the blood and lung environments. transcriptional data suggest differentially expressed cell surface proteins as markers for covid- immunophenotyping from balf and pbmc samples. severity-specific features of covid- manifest at the pathway level, suggesting distinct changes to epithelia and differences between local and systemic immune dynamics. immune-epithelial cellular communication analysis identifies ligands implicated in transcriptional regulation of proto-oncogenes in the lung epithelia of severe covid- patients. network analysis suggests broadly-acting dysregulatory ligands in the pulmonary microenvironment as candidate therapeutic targets for the treatment of severe covid- . december , the virus had spread to every major country on earth [ , , , ] .
the pandemic disease caused by sars-cov- , termed coronavirus disease (covid- ) , has diverse clinical presentations ranging from asymptomatic infection, to moderate symptomatic infection with possible pneumonia, to severe respiratory distress, to critical respiratory failure, septic shock, and/or multiple organ dysfunction or failure [ , , , ] . a hallmark of severe and critical covid- cases is a rampant dysregulation of the immune system concomitant with the development of a hypoxemic respiratory condition widely characterized as acute respiratory distress syndrome (ards) [ , , , , ] . the serological profile of severe covid- patients largely resembles the cytokine profile of ards [ ] and has been characterized by high levels of many cytokines including il- , il- , il- , il- , il- , ccl , ccl , tnf-α, and ifn-γ [ , , , , ] . post-mortem examinations of covid- patients reveal the aftermath of these disease dynamics: diffuse alveolar damage, multi-organ infiltration of lymphocytes and alveolar macrophages [ , , , ] , and pneumocyte hyperplasia and peribronchiolar metaplasia in the epithelium resembling adenocarcinomas [ , , ] . these morphological findings suggest local and systemic activation and infiltration of inflammatory immune cells that may contribute to the inflammatory injuries of the respiratory system consistent with ards phenotypes in severe covid- [ ] . while months of clinical observations in hospitals across the world have led to consistent descriptions of covid- severity at a clinical level [ , , ] , the biological underpinnings of immune hyperactivation in severe and critical covid- are only beginning to be defined. 
bulk rna sequencing (bulk rna-seq) and single-cell rna sequencing (scrna-seq) studies have identified stark transcriptional differences between bronchoalveolar lavage fluid (balf) and peripheral blood mononuclear cell (pbmc) samples in hospitalized covid- patients, indicating that immunological responses may be highly compartment-specific [ , ] . a number of recent studies have compared severe patients to healthy control subjects in an effort to define immunological hallmarks of disease severity in both the lung and peripheral circulation using scrna-seq. these studies have identified that severe disease in the lung compartment is associated with lymphopenia, t cell hyperactivation, and inflammatory macrophage polarization, while severe disease in the blood is associated with lymphopenia, suppression of type i and type ii interferon activity, and decreased monocyte hla class ii expression [ , , , ] . although these studies describe immunological dysregulation in severe covid- patients requiring intensive care, they do not explore immunological features distinguishing covid- patients who experience moderate or non-ards pathology from those who progress to life-threatening disease courses. single-cell resolution studies specifically comparing disease severity strata are needed to reveal the immunological mechanisms responsible for severity-specific immune dysregulation and to identify signatures of disease severity that may inform covid- patient triage and treatment. while many studies have endeavored to apply transcriptional profiling to understand the cellular dynamics underlying coronavirus infection, to our knowledge, a single integrative study comparing responses across the spectrum of disease severity in both the lung and blood compartments has not yet been reported. 
isolated analyses of balf samples have shown neutrophil and proliferating t cell enrichment in severe over moderate patients accompanied by cd + t cell lymphopenia, a shift towards inflammatory macrophage polarization, and increased transcriptional expression of a variety of cytokines [ , ] . previous investigations focusing only on peripheral blood responses have shown various compositional shifts between severe and moderate disease including cytopenias of monocytes, natural killer (nk) cells, dendritic cells (dcs), and t cells [ , , , , ] accompanied by increases in plasmablasts, b cells, and neutrophils [ , , , ] . serum cytokine levels are reportedly altered between severe and moderate patients, including increased il- , il- , and tnf-α [ , ] and deficient ifn-α and ifn-γ [ ] , while blood single-cell transcriptional analyses have shown alteration of the tnf-α/nf-κb pathway, interferon signatures, and hla class ii expression [ , , , ] . while individual studies have increased our understanding of severe covid- in the lung and blood separately, transcriptional dynamics have not been leveraged to propose detectable cell surface markers correlated with the level of a patient's disease severity in cell types of either compartment. another large gap in the literature persists regarding the transcriptional similarities and differences between the local and peripheral immune environments. critically, dysregulated biological pathways and their correlation with clinical severity levels remain poorly understood in immune cells of both compartments. immune-epithelial signaling dynamics at the site of local infection likely play a supporting role in severe responses but have also not been rigorously described.
an overarching unaddressed question is the possibility for therapeutic interventions to affect the lung and blood compartments differently; thus, generating comparative transcriptional data will be useful for determining efficacious treatment strategies for severe in this study, we present comparisons between immunological signatures of covid- based on clinical severity level in both the lung and blood compartments using identical methods. we re-analyzed two publicly available scrna-seq datasets: one containing balf cells from donors stratified by the original authors as "moderate" and "severe" covid- patients as well as healthy control subjects, obtained from liao et al. [ ] , and one containing pbmcs from donors stratified by the original authors as "non-ards" and "ards" patients as well as healthy control subjects, obtained from wilk et al. [ ] . for both of these datasets we conducted identical pre-processing, integration and analysis in order to obtain comparable results. we first evaluated differential gene expression and pathway-level changes across the disease severity stratifications mentioned above for each cell type in the balf and the pbmc datasets. next, we leveraged differential expression data to identify cell surface proteins that could be promising markers for covid- severity immunophenotyping. finally, we investigated ligand-receptor interactions implicated in regulating severity-specific immunophenotypes using nichenet network analysis [ ] , and we propose multiple 'pan-ligands' as potential key regulators specific to severe disease. our integrative analysis adds to a growing knowledge base and contributes to a finer understanding of the mechanisms that drive ards-related immune dysregulation in severe covid- . our findings may guide future work informing potential interventional strategies to improve patient outcomes as the covid- pandemic continues to unfold.
severe covid- has been thought to result in profound immune dysregulation at both a local and systemic level. here, we performed a re-analysis of single-cell rna sequencing (scrna-seq) data to identify common and compartment-specific signatures of covid- disease severity. we used identical methods to separately analyze multi-donor scrna-seq datasets from bronchoalveolar lavage fluid (balf) and peripheral blood mononuclear cells (pbmcs) in covid- patients classified by severity strata as well as healthy control subjects to investigate severity-specific immune dysregulation in the lung and periphery. balf and pbmc raw gene-barcode matrices were downloaded from separate studies in the geo [ , ] . the balf dataset consisted of patients ( severe and moderate) and healthy control donors, while the pbmc dataset consisted of patients ( patients with ards, patients without ards, where one patient who was sampled twice fell within both groups at different stages of the disease course) and healthy control donors. we performed our analyses on balf and pbmc datasets separately with the goal of identifying common and compartment-specific gene signatures. to analyze the compositional changes in cell populations occurring during severe covid- , we first sought to identify clusters in transcriptionally heterogeneous scrna-seq data of the lung. after preprocessing raw balf gene-barcode matrices, a filtered dataset consisting of , cells was clustered using seurat and visualized by uniform manifold approximation and projection (umap), as shown in figure a . a total of clusters were identified showing distinct separation in a two-dimensional umap space. annotation of these clusters based on the expression of canonical cell type markers showed the presence of myeloid cells, lymphoid cells, and epithelial cells, closely matching the cell types found by the original publication [ ] . 
of annotated cell types, contained cells from moderate, severe, and control donors, with the exception of plasma cells, which were not recovered from moderate donors ( figure s a ). compositional changes across disease severity categories were assessed by z-scored percentages of cell types in each donor sample. analyzing balf compositional changes showed trends of increased donor-specific percentages of nk cells, mixed t cells, and plasmacytoid dendritic cells (pdcs) in moderate donors over control and severe donors, as displayed in the figure a heatmap. we next preprocessed raw pbmc gene-barcode matrices, resulting in a filtered dataset of , cells. the blood single-cell landscape was visualized by umap ( figure b) . a total of clusters were identified, showing distinct populations in a two-dimensional umap space. annotation of these clusters revealed the presence of myeloid cells and lymphoid cells, roughly matching the cell types found in the original publication using a different method [ ] . we termed the blood cells observed in this dataset "pbmcs", although cell type annotation revealed the presence of anuclear platelets and erythrocytes, as well as polymorphonuclear basophils and neutrophils. all annotated cell types contained cells from ards, non-ards, and control donors ( figure s d ). cluster (cd +/cd d+/ighg +) was annotated as probable doublets using the marker gene panel shown in figure s f . analyzing pbmc compositional changes across severity categories using z-scored percentages as described above, we observed trends indicating a decrease in erythrocyte and cd + effector memory t cell percentages as well as an expansion of plasma cells and proliferating plasma cells in both ards and non-ards patients compared to control donors.
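the z-scored percentages used above for compositional comparisons can be sketched as two steps: per-donor cell type percentages, then standardization of each cell type across donors. this is a minimal illustration under assumptions, not the authors' pipeline: the function names and toy donors are hypothetical, and both the axis of standardization (across donors, per cell type) and the use of the population standard deviation are assumed.

```python
from collections import Counter
from statistics import mean, pstdev


def cell_type_percentages(cell_types):
    """Percentage of each annotated cell type within one donor sample."""
    counts = Counter(cell_types)
    total = sum(counts.values())
    return {ct: 100.0 * n / total for ct, n in counts.items()}


def zscore_across_donors(percent_by_donor, cell_type):
    """Standardize one cell type's percentage across all donors."""
    donors = sorted(percent_by_donor)
    values = [percent_by_donor[d].get(cell_type, 0.0) for d in donors]
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # No variation between donors: every z-score is zero.
        return {d: 0.0 for d in donors}
    return {d: (v - mu) / sd for d, v in zip(donors, values)}


# Toy donors with invented annotations.
annotations = {
    "donor_1": ["nk"] * 1 + ["t"] * 9,
    "donor_2": ["nk"] * 2 + ["t"] * 8,
    "donor_3": ["nk"] * 3 + ["t"] * 7,
}
percentages = {d: cell_type_percentages(c) for d, c in annotations.items()}
nk_z = zscore_across_donors(percentages, "nk")
```

standardizing per cell type makes rare and abundant populations comparable on one heatmap scale, at the cost of hiding the absolute percentages.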
to examine the population of nk and t cells in the balf at finer resolution, we performed iterative data integration on the "nk", "mixed t", and "proliferating t" populations, using severe donors, moderate donors, and healthy control donors. severe donor s and healthy control donor hc were omitted as a consequence of having too few nk and t cells to allow integration and alignment. after reintegration, a total of , nk and t cells were visualized via umap ( figure c) . a total of clusters were identified, which were annotated according to distinct cell type labels with additional cluster labeled as "uncertain" and clusters (cd d+/cd +) and (cd d+/fcgr b+) classified as probable doublets using the marker gene panel shown in figure s i ). analyzing nk and t cell compositional changes across severity categories, we observed trends indicating expansion of cd + naive t and cd + effector treg cells in moderate donors compared to severe and control donors. the distinctive heterogeneity observed in mononuclear phagocytes (mps) of the balf seen in figure a led us to address severity-specific changes in individual mp populations. we again performed iterative data integration on clusters identified as mps in the original balf dataset ( figure a) . following iterative data integration using severe donors, moderate donors, and healthy controls, a total of , mp cells were visualized by umap ( figure d ). the mp clusters were left unannotated in further analysis. the clusters were evaluated using a panel of balf marker genes to identify probable doublets (figure s l), and clusters (cd +/cd d+) and (cd +/ighg +) were classified as such. analysis of compositional changes of mp clusters across disease categories showed more pronounced compositional changes than we observed in the balf in general. 
clusters , , and (expressing chemokines ccl , ccl , ccl as well as spp ) appear to be expanded in severe donors compared to moderate and control donors, while clusters , , and (expressing mrc , c qa, and fabp ) appear to be expanded in control donors compared to moderate and severe donors. cluster (expressing hladqa and hladqa ) appears to be uniquely expanded in moderate donors compared to control and severe donors. cellular markers corresponding to clinical categories of covid- disease severity have not yet been rigorously characterized. here, we sought to leverage transcriptional differences across severity strata to identify surface-bound proteins in both the lung and blood that could be used as markers for immunophenotyping. to nominate putative cell surface markers, we conducted differential gene expression analysis between severe and moderate patient groups for lung (balf) cells and between ards and non-ards patient groups for blood (pbmc) cells and selected significant differentially expressed genes (degs) with a fold change (fc) cutoff of |log fc> |. figure a shows significantly up-and downregulated transcripts for pdcs in the balf, one of lung cell types studied. from the large number of significant degs, we observed the upregulation of areg and cd , which were verified as cell surface protein-coding genes using the cell surface protein atlas [ ] . the areg and cd transcripts show robustly increased expression in pdcs in severe compared to moderate disease ( figure b ) and compared to pdcs in healthy control subjects ( figure s a) . we additionally showed that areg and cd are not upregulated in moderate disease compared to controls ( figure s a ). the expression of areg and cd was cross-referenced with bulk rna-seq blood data from the human immune cell atlas (figure s c) , and expression of these transcripts in various subtypes of dcs was verified [ , ] . 
differential expression of transcripts coding for cell surface proteins was evaluated for every cell type in the balf, resulting in a matrix of potential severity-specific surface markers for each cell type ( figure c) . upregulation of the areg gene was also observed in myeloid dendritic cells (mdcs, also known as conventional dendritic cells), indicating a possible conserved response across dc subtypes. extending our analysis beyond immune cell types and relaxing the stringency cutoff of |log fc> |, we made the interesting observation that epithelial (club and ciliated) cells demonstrate significant upregulation of icam and ldlr, the two main entry receptors for respiratory rhinoviruses. we confirmed that markers identified as differentially regulated between severe and moderate disease were not differentially regulated in the same direction between moderate disease and control ( figure s b) , with the exception of hladra and vamp on mps and ccr on mdcs, indicating that all other identified cell surface protein transcripts in figure c are unique markers of severe disease. we also sought to identify ards-specific cell surface markers in the blood in order to nominate immunophenotyping markers on cells that may be accessible through a blood draw. in gamma delta t (γδ t) cells, we observed ards-specific differentially expressed genes including the surface marker s pr (figure d) , a g-protein coupled receptor that interacts with multiple inflammatory pathways such as jak/stat and mtor/pi k/akt [ , ] . s pr expression was robustly upregulated in ards compared to non-ards patients ( figure e ) and ards compared to control (figure s d) , and additionally did not show upregulation between non-ards patients and controls. similarly to our analysis of the balf, we analyzed differential expression of surface protein genes in every cell type of the pbmc pool, and the matrix of potential markers is shown in figure e . 
we found that s pr was also differentially expressed in cd + effector t cells, and its general expression on t cells was verified in bulk rna-seq data ( figure s f ). immunoglobulin heavy chain genes also appeared to be differentially expressed in unexpected blood cell types. in both balf and pbmcs, we observed that increasing disease severity was associated with the common downregulation of hla class ii genes hladqa and hladqa on cd + monocytes and balf mps and hladra on cd + monocytes and balf mps. overall, these results suggest cell surface proteins whose transcripts are differentially expressed across disease severity categories as candidate immunophenotyping markers in cells of the lung and blood. to complement our transcriptomic analysis, we employed gene ontology (go) analysis to identify the biological functions of degs ( figure s g -i). go analysis of the degs in balf and pbmcs shows differential expression of transcripts belonging to ontologies of the innate immune system, cytokine signaling, and adaptive immune cell proliferation and activation. to probe the molecular mechanisms of severe covid- at a broader scale, we sought to utilize the large number of differentially expressed transcripts to identify biological pathways that are altered across the spectrum of disease severity in the local lung microenvironment (balf) and systemic circulation (pbmcs). significant transcriptional changes in cells of the balf were abundant in all cell types with the exception of plasma cells. we first directed our attention to the epithelium, where substantial morphological damage and histological atypia were observed in deceased covid- patients [ , , , , ] . at the level of individual gene regulation, we first noted a surprisingly strong significant upregulation of fos and jun, which code for proteins implicated in the development of cancer ( figure a) . additionally, we noted a strong significant downregulation of the tumor suppressor gene c orf . 
when comparing ciliated cells between moderate disease and healthy control subjects ( figure s a ), we did not observe differential expression of proto-oncogenic transcripts. we next leveraged gene set enrichment analysis (gsea) to dissect pathway-level regulation. gsea provides positive and negative enrichment readouts; here, we define pathways with a positive normalized enrichment score as "enriched" and those with a negative normalized enrichment score as "depleted". gsea revealed the enrichment of pathways involved in the innate immune response, general inflammatory response, and, surprisingly, oncogenic signaling in severe versus moderate donors. specifically, we observed the significant enrichment of the epithelial-to-mesenchymal transition, k-ras, pi k/akt and p pathways in addition to the tnf-α signaling via nf-κb and il- /stat pathways. when we expanded this analysis to include all of the cell types found in the balf (figure a we next investigated pathway-level changes occurring in pbmcs and found that differential gene expression between ards and non-ards patients supported the detection of statistically enriched pathways through gsea. we first examined cd + blood monocytes, a cell type whose severe disease response could be compared to that of balf mps. in ards patients, cd + monocytes demonstrated pronounced downregulation of isg , tnfaip , and nfkbia compared to non-ards patients, manifesting in the depletion of the ifn-α, ifn-γ, and tnf-α signaling via nf-κb pathways ( figure b ). expression of the immunoglobulin g and m heavy chains appeared to be altered at the transcriptional level but did not contribute to significantly enriched pathways. when expanding this analysis to all pbmc cell types ( figure b ), we observed a striking depletion of ifn-α, ifn-γ, and tnf-α signaling via nf-κb pathways in a conserved trend across nearly every cell type. 
the il- /stat and il- /jak/stat pathways did not appear to be significantly altered in the majority of cell types. comparing pbmcs of non-ards donors to healthy controls revealed a strong enrichment of the ifn-α, ifn-γ, and tnf-α signaling via nf-κb pathways, as well as some inclusion of the il- /stat and il- /jak/stat pathways ( figure s b ). interestingly, comparing ards donors to healthy controls showed that these same pathways exhibited weaker enrichment ( figure s e ). to establish a comparison between blood monocytes and balf mps and explore the heterogeneity of balf mps observed in our dataset, we analyzed differential expression and pathway-level responses across of the clusters resulting from iterative data integration of the mp population. two clusters containing probable doublets were excluded from the analysis, namely cluster (cd +/cd d+) and to nominate potential transcription factors related to differentially expressed gene sets, we sought to identify transcription factors linked to deg lists using the encode and chea chip-x databases. consistent with the identified activity of the il- /jak/stat pathway, we observed enrichment of stat in many cells of the balf (figure s g ). cebpb and cebpd appeared to be depleted in the pbmc but enriched in the balf, and opposite directionality of these genes was observed between mps and cd + monocytes in particular. nelfe appeared to be enriched in t cells of the balf and pbmc (figure s g-h) , representing a commonality between the lung and blood responses to increasing disease severity. following the identification of severity-specific pathway-level regulation differentiating severe and moderate disease courses, we sought to construct a putative network for how these transcriptional programs could be induced by soluble and surface-level cell-cell interactions. 
specifically, we aimed to identify ligands acting as potential key regulators of severe disease in many cell types of the lung in order to nominate targets for further study as therapeutic options for severe covid- . figure a shows a flow diagram detailing the steps carried out to nominate severity-specific "pan-ligands" local to the site of viral infection. briefly, differential gene expression between moderate and severe donors was first evaluated for each cell type in the balf. we classified these transcripts as "target genes" whose differential expression is potentially regulated through ligand-receptor interactions [ ] . next, we employed nichenet to identify potential ligands linked to regulation of these differentially expressed target genes [ ] . we applied two filtering criteria to nominate potential ligands: ligands should (1) act on over one-third of the cell types in the balf and (2) be differentially expressed by at least one cell type in severe disease. to follow up on our observation of areg upregulation in pdcs during severe disease, we used nichenet to nominate potential soluble or cell-surface mediators of pdc differential gene expression. ligand activities were ranked using a nichenet-generated pearson correlation coefficient indicating the correlation between the target genes of a given ligand and the list of differentially expressed target genes in the "receiver" cell ( figure b ). the receptors for the top predicted ligands anxa , spp , tnf, csf, cxcl , ifng, cd lg, itgb , cd , icam , adm, and il all showed non-zero expression in pdcs. to identify which genes may be regulated by top-ranked ligands, putative ligand-gene interactions were scored by nichenet according to "regulatory potential", a graph-based likelihood for a ligand to regulate a particular target gene [ ] . we next sought to identify candidate "sender cells" expressing these ligands ( figure s a) . 
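the ligand activity ranking described above can be illustrated with a short python sketch. this is not the nichenet implementation; it only shows the idea of correlating a ligand's predicted target scores with a binary indicator of membership in the receiver cell's differentially expressed gene list. the ligand names, gene background, and score values are toy assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def rank_ligands(ligand_scores, background_genes, deg_set):
    """Rank ligands by correlation between predicted target scores and DEG membership.

    ligand_scores: ligand -> {gene: predicted regulatory score} (hypothetical input)
    background_genes: ordered list of genes considered in the receiver cell
    deg_set: genes differentially expressed in the receiver cell
    """
    response = [1.0 if gene in deg_set else 0.0 for gene in background_genes]
    activities = {}
    for ligand, scores in ligand_scores.items():
        predicted = [scores.get(gene, 0.0) for gene in background_genes]
        activities[ligand] = pearson(predicted, response)
    return sorted(activities.items(), key=lambda item: item[1], reverse=True)


# Toy example: two hypothetical ligands scored against five genes.
genes = ["areg", "fos", "jun", "actb", "gapdh"]
degs = {"areg", "fos"}
toy_scores = {
    "ligand_a": {"areg": 0.9, "fos": 0.8, "jun": 0.1, "actb": 0.0, "gapdh": 0.1},
    "ligand_b": {"areg": 0.1, "fos": 0.0, "jun": 0.2, "actb": 0.9, "gapdh": 0.8},
}
ranking = rank_ligands(toy_scores, genes, degs)
```

in this toy case the hypothetical ligand whose predicted targets overlap the differentially expressed genes ranks first, which mirrors how a higher correlation is read as stronger evidence of ligand activity in the receiver cell.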
we found that mps and neutrophils expressed transcripts for pdc ligands, and we chose to further investigate these relationships. ligand-mediated cell-cell interactions between mps, neutrophils, and pdcs are visualized in the network diagram in figure b . as shown, the differential expression of areg is potentially regulated by il- and spp from mps, and il -β from neutrophils, although the receptors for il- and il -β could not be verified at the transcript level in pdcs in our dataset. additionally, given the enrichment of oncogenic pathways in epithelial cells of the balf, we identified cell types and signaling molecules potentially regulating epithelial degs. we applied the nichenet procedure outlined above for pdcs to club and ciliated cells of the balf, resulting in the ligand-gene regulatory matrix shown in figure c . the ligands il b, il a, tnf, ifng, apoe, il rn, osm, edn , lif, mif, csf , spp , cd , ccl , vegfa, and ccl show potential for regulating differential expression, and receptors for all of these ligands were manually verified at the transcript level in club or ciliated cells. interestingly, fos and jun are among the genes potentially driven by the proposed ligands. as mps and neutrophils appear to express the ligands of interest (figure s b) , we investigated the role of these cells in regulating epithelial gene expression, visualized in the network diagram in figure c (at larger scale in figure s e ). as shown, mps may induce both fos and jun through il -α, while neutrophils may induce both fos and jun through il -β, tnf, and ifng, with additional regulation of jun by edn . finally, we aggregated ligand-receptor relationships across all of the cell types in the balf and proposed broadly-acting "pan-ligands" possibly contributing to the induction of transcriptional programs in the lung microenvironment. 
following the procedure described above ( figure a ) using each cell type in the balf as a receiver, we identified a list of potential pan-ligands and developed a ligand-receiver cell correlation matrix (figure s c ). this matrix was subsequently filtered to preserve ligands acting on over one-third of cell types in the balf (≥ cell types) with > % receptor expression and thresholded pearson correlation values. of these candidates, we selected ligands differentially expressed between severe and moderate disease in one or more balf cell types ( figure s d ) to further filter the ligand-receiver matrix. interestingly, the il transcript was not found to be differentially expressed in any of the balf cell types and failed at this filtering step. after filtering, we arrived at a list of differentially expressed pan-ligands implicated in broad severity-specific responses across cell types in the lung, consisting of ifng, il , ccl , ccl , and spp . these ligands were differentially expressed in at least one balf sender cell type and act on over one-third of the cell types in the balf (figure d ), indicating these ligands should be further investigated to elucidate their role in the life-threatening immunopathology of severe covid- . in the course of a severe sars-cov- viral infection, acute pulmonary damage has been observed concomitantly with elevated cytokine levels in serum and infiltration of macrophages and lymphocytes into multiple organs, indicative of both local and systemic immunological responses [ , ] . to better understand these dynamics, we leveraged scrna-seq data from balf and blood to compare severity-specific shifts in cellular composition at the local and systemic levels. in balf, compositional analysis indicated expansion of nk cells, t cells (general t cell population, cd + naive t cells, and cd + effector treg cells), and pdcs in moderate patients compared to both severe and control patients. 
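the two pan-ligand filtering criteria described above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' code: the correlation threshold `min_corr` is a hypothetical parameter, the ligand and cell type names are placeholders, and the receptor expression check is assumed to have been applied upstream so that only receivers passing it appear in the input.

```python
def nominate_pan_ligands(activity, de_ligands, n_cell_types, min_corr):
    """Keep ligands that act broadly and are differentially expressed.

    activity: ligand -> {receiver cell type: pearson correlation}
        (assumed to list only receivers that passed the receptor expression check)
    de_ligands: ligands differentially expressed in at least one sender cell type
    n_cell_types: total number of annotated cell types in the compartment
    min_corr: hypothetical correlation threshold for counting a receiver
    """
    selected = []
    for ligand, by_receiver in activity.items():
        n_receivers = sum(1 for corr in by_receiver.values() if corr >= min_corr)
        acts_broadly = n_receivers > n_cell_types / 3  # strictly over one-third
        if acts_broadly and ligand in de_ligands:
            selected.append(ligand)
    return sorted(selected)


# Toy example with six cell types, so a ligand must act on at least three of them.
toy_activity = {
    "ligand_x": {"type_1": 0.20, "type_2": 0.15, "type_3": 0.12},
    "ligand_y": {"type_1": 0.30, "type_2": 0.25, "type_3": 0.20, "type_4": 0.18},
    "ligand_z": {"type_1": 0.22, "type_2": 0.11},
}
toy_de_ligands = {"ligand_x", "ligand_z"}
pan_ligands = nominate_pan_ligands(toy_activity, toy_de_ligands, n_cell_types=6, min_corr=0.1)
```

here the hypothetical `ligand_y` acts on enough cell types but is dropped because it is not differentially expressed, which is the same step at which a broadly acting ligand without differential expression would fail the filter.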
although severity-specific compositional changes in balf samples remain largely unaddressed in the literature [ ] , the severity-specific lymphopenia of nk cells and cd + naive t cells that we report in balf agrees with findings from compositional studies of pbmcs [ , , ] . we additionally report severity-specific compositional shifts in lung mp populations and identify clusters expressing mrc , c qa, and fabp suggesting an m macrophage-like phenotype expanded in control subjects. moderate donors showed expanded clusters expressing hladqa and hladqa , indicating increased antigen presentation through hla class ii. expanded clusters in severe donors expressing spp as well as various chemokines including ccl , ccl , and ccl are likely associated with an m macrophage-like phenotype, in agreement with previous analyses [ ] . strong macrophage expression of osteopontin, the gene product of spp , has been previously observed during influenza a infection, suggesting that macrophages in severe covid- may be following a similar viral response to influenza [ ] . compositional changes were not readily apparent in pbmcs when comparing ards patients to non-ards patients, although we observed an expansion of multiple plasma cell types in ards patients compared to healthy control subjects, consistent with previous reports [ , , , ] . despite recent advances, the immunological signatures of severe covid- remain largely uncharacterized in both the lung [ , ] and systemic circulation [ , ] . in particular, a need exists to define markers of severity that can be used to assess patient disease trajectory in a clinical setting for the purposes of triage or to inform the use of immunomodulatory treatment strategies [ ] . here, we leveraged balf and pbmc transcriptional data to nominate cell surface proteins that could be incorporated into a flow cytometry panel for covid- patient immunophenotyping. 
notably, in balf samples we observed upregulation of areg in pdcs and mdcs of patients undergoing a severe disease course compared to patients who experienced moderate disease. the areg transcript was verified to be expressed by dendritic cells in bulk rna-seq data [ , ] , has been previously shown to orchestrate tissue homeostasis during influenza infection [ , ] , and has been previously labeled in flow cytometry studies [ ] . additionally, timp , vamp , and il r were strongly differentially expressed in balf mps during severe disease and have been previously documented to play a role in the immune response to respiratory viral infections. timp has been shown to promote deleterious immune responses in the lung during murine influenza infection [ ] , vamp is implicated in signaling pathways regulating influenza viral replication in vitro [ , ] , and il r encodes a decoy receptor for il -β that is upregulated during severe influenza [ ] . all of the balf identified surface markers in figure c with the exception of vamp , hla-dra, and ccr were specifically differentially expressed between severe and moderate donors but not between moderate and control donors (figure s b) . investigating cell surface markers in pbmcs, we found that il r on nk cells as well as s pr (cd ) on γδ t cells differentiate ards from non-ards covid- blood samples. il- stimulation through il- r has been shown to promote nk cell survival by inhibiting apoptosis [ , ] . s pr is a particularly interesting marker due to previous literature documenting its role in stimulating highly inflammatory pathways such as jak/stat and mtor/pi k/akt. additionally, suppression of the s pr cognate ligand s p decreased mortality rates during influenza infection in mice [ , , ] . all of the identified pbmc surface markers in figure f were specifically differentially expressed between ards and non-ards donors but not between non-ards and control donors ( figure s e ). 
our investigation revealed a small number of common surface markers of interest between balf and pbmc samples, namely hladqa and hladqa on mps/cd + monocytes and hladra on mps/cd + monocytes; these hla class ii transcripts are downregulated with increasing severity level in both compartments. similar observations of a severity-specific loss of hla-dr on cd + monocytes and t cells have been previously observed in recent scrna-seq and flow cytometry studies [ , , , , , ] . together, these findings suggest transcripts that should be studied further at the protein level as potential markers of a severe or ards disease course for immunophenotyping patients through balf sampling or a convenient blood draw. cell type-specific immunopathological responses in severe covid- have been investigated at both local and systemic levels in recent studies comparing severe patients to healthy controls [ , , , , , , , , , ] . we sought to build on this knowledge by determining the cell-type specific biological responses that differentiate more finely stratified disease severity levels in the lung and systemic circulation. in myeloid and lymphoid cells of the balf, we observed a concerted enrichment of the tnfα signaling via nf-κb pathway in severe compared to moderate donors, indicating the upregulated expression of factors such as nfkbia and tnfaip in severe disease. we further noted il- and il- signaling pathway enrichment in cell types including mps, mast cells, and neutrophils. il ra expression was observed in mast cells, il r was expressed in mps, mast cells and neutrophils, and neither the il nor il transcripts were well-captured in the dataset. ifn-α (type i) and ifn-γ (type ii) pathway responses showed mixed enrichment and depletion, with type i and ii interferon signaling apparently decreased in a majority of balf t cell subtypes. 
these data are consistent with previous observations of low expression of ifng and tnf in cytotoxic t lymphocytes of balf derived from severe patients [ ] . mps in the balf demonstrated a nearly homogenous enrichment of the il- , il- , and tnf-α signaling via nf-κb pathways across the diversity of clusters. when we compared disease strata based on ards diagnosis in pbmcs, fewer pathways demonstrate enrichment with increasing disease severity. nevertheless, a striking depletion of the type ii interferon response was observed in nearly every cell type studied, along with a broad depletion of the type i interferon response and tnf-α signaling via nf-κb pathways (including downregulation of tnfaip and nfkbia). this result corroborates a previously reported correlation between suppressed type i and ii interferon responses and disease severity in a study of pbmcs that was not specific to cell type [ ] , as well as extending findings of increased interferon response in monocytes during moderate disease but not severe disease [ , ] . our findings do not preclude the observation of increased tnf-α signaling in the bloodstream of severe and critical covid- patients [ ] , as tnfaip depletion has been shown to cause increased tnf-α expression [ ] . we next examined the similarity of pathway-level changes across the severity stratifications provided in the balf and pbmc datasets. comparing balf and pbmc responses to increasing disease severity level in consanguineous cell types showed shared suppression of the type i interferon response in cd + naive, cd + effector, and γδ t cells and of the type ii interferon response in mature b, proliferating plasma, cd + naive, cd + effector, and γδ t cells. for other cell types including mps/monocytes, nk and cd + effector t cells, the type ii interferon response appears to be increased in cells of the lung but decreased in the blood. 
moreover, our analysis identifies divergent severity-specific responses in the balf and blood in terms of the tnf-α signaling via nf-κb pathway for nearly all cell types. notably, the concerted tnf-α/nf-κb pathway enrichment across balf cell types is indicative of a strong upregulation of tnfaip and nfkbia. conversely, the depleted pathway across pbmc cell types corresponds to tnfaip and nfkbia downregulation. the a protein encoded by tnfaip is an nf-κb inhibitor that functions as a "brake" on antiviral signaling and inflammatory responses [ , , , ] whose deletion in mouse models improved survival during influenza infection [ ] . tnfaip differential expression was largely co-directional with nfkbia (coding for iκbα), a gene whose upregulation is a key feature of the human blood response to sars-cov as well as the lung response to sars-cov and mers-cov infection [ ] . our results suggest that the a /iκbα axis likely plays a role in sars-cov- infection as well, with tnfaip inhibiting nf-κb-mediated antiviral responses during severe disease at the site of local infection yet promoting a systemic inflammatory response through its absence during ards in the periphery. we note in this compartmental comparison that the balf and pbmc clinical stratifications represent different parts of the disease severity spectrum. therefore, we interpret differences in compartmental responses with respect to relative levels of clinical severity rather than absolute severity stages. taken together, our data suggest that transcriptional programs are differentially induced with increasing covid- severity, while the specific responses are nuanced according to cell type and local versus systemic immune environment. in epithelial cells of the balf, we detected the surprising upregulation of fos and jun, along with the downregulation of c orf in severe compared to moderate donors. 
due to the involvement of these genes in oncogenic programs, we further investigated pathway-level alterations in epithelial cells and detected the enrichment of epithelial-to-mesenchymal transition, k-ras, pi k/akt and p pathways. we note that a separate analysis of epithelial cells from the same balf dataset failed to detect differential enrichment of these pathways through gsea when comparing control samples to pooled moderate and severe samples [ , ] , likely as a result of this pooling and using a much shorter list of statistically tested genes (a single set of genes common to all epithelial cells compared to sets of , genes for ciliated cells and , genes for club cells). our results may be cautiously interpreted as providing evidence for the molecular underpinnings of the morphological changes known to occur in the severely damaged lung epithelia of severe covid- patients, previously described as cellular proliferation resembling atypical adenomatous hyperplasia, in situ adenocarcinoma, or even invasive adenocarcinoma [ ] . following the observation of pathways enriched in the epithelia of severe covid- patients, we employed ligand-receptor network analysis to nominate potential immune-epithelial communication networks that may explain transcriptional differences between severe and moderate disease courses. we identified ligands implicated in driving the differential expression in ciliated and club cells, of which il b, tnf, and ifng are predicted to regulate fos and jun, with il a additionally regulating jun. exploring the expression of these ligands across the diversity of balf cell types indicates that mps and neutrophils may act as "sender cells" signaling to the epithelium, as suggested in recent reports [ , , ] . in particular, aberrant neutrophil responses have been implicated in severe covid- through a number of studies focusing on either the lung or the blood [ , , , , , ] . 
our analysis of the transcriptional regulation of cell surface proteins implicated areg as a potential marker unique to severe disease in pdcs. this finding appears consistent with previous literature showing that amphiregulin aids in the maintenance of epithelial integrity and tissue repair during infection or injury, [ , ] as patients with severe covid- experience extreme pulmonary damage. expanding on this knowledge, we utilized ligand-receptor network analysis to provide orthogonal information suggesting how pdc differentially expressed genes, including areg, may be regulated through soluble or cell-surface mediators. nichenet revealed that mps and neutrophils appear to signal to pdcs through spp /il and il b respectively to regulate the expression of areg as well as various other genes. spp in particular appears to be better-evidenced than il and il b as its active receptors cd and itgb are present on over % of pdcs in severe donors. our network analysis suggests that the expansion of a population of spp -expressing mps during severe disease, observed here and in liao et al. [ ] , may partly account for the upregulation of areg by pdcs. we recommend further study of the potential for areg induction by spp in addition to investigation of areg surface expression as a marker of severe disease. immune blockades, including the il- r antagonist tocilizumab, are a class of treatments currently under clinical use and research for covid- intended to dampen the hyperactive immune response by targeting key nodes in a signaling network [ , , ] . here, we sought to utilize ligand-receptor network analysis to nominate ligands as potential key regulators of severity-specific dysregulation in the lung microenvironment during severe covid- . our analysis suggests ifng, il , ccl , ccl , and spp as candidate "pan-ligands" that may induce transcriptional regulation in over one-third of cell types identified in the balf during severe disease. 
notably, we did not identify il as a potential pan-ligand as its differential expression between moderate and severe patients in our study was not statistically significant in any cell type. all of the pan-ligands we report have been previously implicated in sarsr-cov infection, with il- , ccl , and ifn-γ having been detected in serum from covid- patients [ , , ] , spp upregulation measured in microarrays from sars-cov-infected nonhuman primates, and ccl , ccl , il , and ifng upregulation detected in single-cell or bulk rna-seq from the lungs of covid- patients [ , , , ] . importantly, our data support accumulating evidence for a nuanced interferon response in covid- depending on disease severity, cell type, and local versus systemic immune environment, as discussed above [ , , , , , , ] . we suspect that treatment strategies consisting of a direct type ii interferon blockade (nih clinical trial nct ) [ , ] could exert different effects systemically and locally depending on the patient's clinical condition when administered. taken together with previous findings, our data suggest ifng, il , ccl , ccl , and spp as candidate targets for the treatment of severe covid- that warrant further study. in summary, our findings emphasize unique roles for the cells of the lung microenvironment and systemic circulation in the immunopathology of ards and severe covid- . we recommend further investigation of differentially expressed cell surface markers to determine their utility in immunophenotyping patients according to suspected disease course to aid in triage or to inform optimal treatment options. our transcriptional analyses show that pathway enrichment differs between cells of the lung and blood, with concerted immunological responses within each compartment mirroring viral respiratory diseases including influenza, sars, and mers. 
finally, we nominate a small number of broadly-acting ligands as potential drivers of severity-specific transcriptional regulation in the severely damaged lung microenvironment of covid- patients. all single cell rna-sequencing (scrna-seq) data used in this analysis were obtained from publicly available datasets. balf scrna-seq gene-barcode matrices from covid- patients ( severe and moderate), and healthy control subjects were obtained from the geo under accession number gse . [ ] scrna-seq data from an additional healthy balf donor was obtained from the geo under accession number gsm . [ , ] severity-level stratifications for balf donors (severe/moderate/healthy) were used exactly as provided in the original manuscript. out of severe balf donors were invasively ventilated, with the exception of s . gene-barcode matrices obtained from the geo were preprocessed using seurat (v. . . ) in r (v. . . ). matrices were filtered to preserve cells with a umi count over , , gene count between and , , and expression of less than % mitochondrial rna. for pbmc data, cells were additionally filtered to preserve cells expressing less than % s rna and less than % s rna. gene-barcode matrices unique to each donor were subsequently normalized using the 'normalizedata' function and the , most highly variable genes (features) were identified via the 'findvariablefeatures' function using a variance stabilizing transformation. individual donor datasets were subsequently aligned and integrated using the standard seurat v multi-donor integration workflow, finding pairwise anchors using the 'findintegrationanchors' function acting on dimensions with default parameters (k.filter= ), then applying the 'integratedata' function on dimensions. multi-donor integration and all further analyses were performed separately on balf and pbmc datasets. 
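the per-cell quality control filter described above can be restated as a small python sketch. the analysis itself was carried out with seurat in r, so this is only a language-neutral illustration of the filtering logic. all threshold values below are hypothetical placeholders rather than the values used in the study, and treating the gene count bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell quality metrics (field names are illustrative)."""
    barcode: str
    umi_count: int
    gene_count: int
    pct_mito: float


def passes_qc(cell, min_umi, min_genes, max_genes, max_pct_mito):
    """Return True if a cell meets all three filtering criteria."""
    return (
        cell.umi_count > min_umi                          # UMI count over a minimum
        and min_genes <= cell.gene_count <= max_genes     # gene count within bounds (assumed inclusive)
        and cell.pct_mito < max_pct_mito                  # mitochondrial fraction below a maximum
    )


def filter_cells(cells, **thresholds):
    """Keep only the cells that pass the quality control filter."""
    return [cell for cell in cells if passes_qc(cell, **thresholds)]


# Toy example with hypothetical thresholds.
toy_cells = [
    CellQC("cell_a", umi_count=2500, gene_count=900, pct_mito=4.0),
    CellQC("cell_b", umi_count=300, gene_count=150, pct_mito=2.0),    # too few UMIs and genes
    CellQC("cell_c", umi_count=5000, gene_count=1800, pct_mito=25.0),  # high mitochondrial fraction
]
kept = filter_cells(
    toy_cells, min_umi=1000, min_genes=200, max_genes=6000, max_pct_mito=10.0
)
```

the additional ribosomal rna filters applied to the pbmc data would follow the same pattern, with one more metric and one more upper bound per cell.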
following integration, "raw" rna expression values were normalized using log-normalization via the 'normalizedata' function and the , most variable features were identified using 'findvariablefeatures'. raw expression levels were left unscaled so that scaling could later be applied as needed (e.g. for generating marker gene heatmaps). in parallel, "integrated" expression values were scaled using the 'scaledata' function, regressing out umi count, number of genes, and percent rrna. pbmc data contained rrna transcripts to regress, whereas balf data did not. dimensionality reduction via principal component analysis (pca) was carried out using the 'runpca' function and the first principal components were retained. this reduction was projected onto the entire dataset prior to clustering. clustering was performed by finding nearest neighbors using the 'findneighbors' function in seurat acting on dimensions, then running the louvain algorithm via the 'findclusters' function with a resolution parameter of . . the resulting clustered datasets for balf and pbmc were separately visualized by uniform manifold approximation and projection (umap) acting on the first principal components. cell type annotation was conducted using the resultant clusters for balf and pbmc data. clusters were annotated according to the average expression levels of canonical marker genes identified in the original papers for the balf and pbmc datasets [ , ] , the broader literature, and the human protein atlas [ ] . clusters deemed identical by similar presence of marker genes were merged during annotation. a list of the marker genes used for cell type annotation is available in supplemental figure s , along with a visualization of expression levels. 
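the marker-based annotation and merging of clusters described above was a curated, literature-guided process; the python sketch below only illustrates the underlying idea under simplifying assumptions. each cluster is scored by the mean of its average expression over each cell type's markers and assigned the best-scoring label, and clusters sharing a label are merged. the function names, the toy expression values, and the two-type marker panel are hypothetical and are not the panel from the supplemental figure.

```python
from statistics import mean


def score_clusters(cluster_avg_expr, marker_panel):
    """For each cluster, compute the mean average-expression of each
    cell type's marker genes (missing genes count as 0)."""
    scores = {}
    for cluster, expr in cluster_avg_expr.items():
        scores[cluster] = {
            cell_type: mean([expr.get(gene, 0.0) for gene in genes])
            for cell_type, genes in marker_panel.items()
        }
    return scores


def annotate(cluster_avg_expr, marker_panel):
    """Assign each cluster the cell type with the highest marker score."""
    scores = score_clusters(cluster_avg_expr, marker_panel)
    return {cluster: max(s, key=s.get) for cluster, s in scores.items()}


def merge_by_label(labels):
    """Group clusters that received the same annotation."""
    merged = {}
    for cluster, label in labels.items():
        merged.setdefault(label, []).append(cluster)
    return merged


# Illustrative panel only (well-known T- and B-cell markers).
marker_panel = {"T": ["CD3D", "CD3E"], "B": ["MS4A1", "CD79A"]}

# Toy per-cluster average expression values (made up for the example).
cluster_avg_expr = {
    0: {"CD3D": 2.1, "CD3E": 1.8, "MS4A1": 0.0, "CD79A": 0.1},
    1: {"CD3D": 0.1, "CD3E": 0.0, "MS4A1": 2.5, "CD79A": 2.2},
    2: {"CD3D": 1.9, "CD3E": 2.0, "MS4A1": 0.05, "CD79A": 0.0},
}

labels = annotate(cluster_avg_expr, marker_panel)
merged = merge_by_label(labels)
```

in practice an automatic assignment like this would only be a starting point, since ambiguous clusters and doublets still need manual inspection of the marker expression.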
following cell type annotation for the balf, populations annotated as mononuclear phagocytes (consisting of one merged cluster) and t/nk cells (consisting of clusters annotated as "nk", "proliferating t" and "mixed t" cells) were separately re-integrated to enable clustering at a finer resolution. for mononuclear phagocytes, iterative data integration was performed using the integration procedure described above, with a k.filter parameter of for 'findintegrationanchors' and a resolution parameter of . for 'findclusters'. the resulting fine-scale mononuclear phagocyte clusters were left unannotated during further analysis, but were tested using the balf marker gene panel to identify probable doublets. for t and nk cells, iterative data integration was performed using the integration procedure described above, with a k.filter parameter of and a resolution parameter of . . in order to filter anchors for t/nk re-integration, one healthy control donor (hc , cells) and one severe donor (s , cells) with small numbers of t/nk cells were excluded from the integration and subsequent analysis. t and nk cell fine-level clusters were annotated according to a panel of canonical t and nk cell marker genes, available in supplemental figure s . the original integration of pbmc multi-donor datasets allowed cell type annotation at a fine level and did not require iterative data integration. compositional changes across disease severity categories were assessed for balf and pbmc datasets separately using: 1) donor-specific fractional contributions to each cell type in the total pool of cells recovered from all donors, 2) donor-specific percentages of cell types comprising each pool of cells recovered from individual donor severity conditions (severe, moderate, or healthy control for balf; ards, non-ards, or healthy control for pbmcs), and 3) percentages of cell types in each donor sample z-scored across donors. 
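the third compositional measure above, per-donor cell type percentages z-scored across donors, can be sketched in python as follows. this is a minimal illustration rather than the analysis code used in the study; the function names and the toy counts are hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption.

```python
from statistics import mean, stdev


def cell_type_percentages(counts):
    """counts: {donor: {cell_type: n_cells}} -> {donor: {cell_type: percent}}."""
    percentages = {}
    for donor, by_type in counts.items():
        total = sum(by_type.values())
        percentages[donor] = {t: 100.0 * n / total for t, n in by_type.items()}
    return percentages


def zscore_across_donors(percentages, cell_type):
    """Z-score one cell type's percentage across all donors.
    Uses the sample standard deviation (an assumption)."""
    values = {d: p.get(cell_type, 0.0) for d, p in percentages.items()}
    mu = mean(list(values.values()))
    sd = stdev(list(values.values()))
    if sd == 0:
        return {d: 0.0 for d in values}  # no variation across donors
    return {d: (v - mu) / sd for d, v in values.items()}


# Toy cell counts per donor (made up for the example).
counts = {
    "donor_A": {"T": 50, "macrophage": 50},
    "donor_B": {"T": 25, "macrophage": 75},
    "donor_C": {"T": 75, "macrophage": 25},
}

pct = cell_type_percentages(counts)
z_t = zscore_across_donors(pct, "T")
```

z-scoring puts abundant and rare cell types on a common scale, so a donor with an unusual expansion or depletion of a given population stands out regardless of that population's typical frequency.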
here, we did not employ low-powered statistical tests but instead sought to use these methods to highlight donor-to-donor variation in cell type composition and identify robust trends across donors indicating possible expansion or cytopenia. differential expression analysis was performed in seurat using the 'findmarkers' function utilizing the model-based analysis of single-cell transcriptomics (mast) statistical framework through the "mast" r package (v. . ) [ ] . differential expression analysis was conducted for all cell types in the integrated balf dataset across three pairwise comparisons of donor groups: severe vs. moderate, moderate vs. control, and severe vs. control. for all cell types in the pbmc dataset, differential expression analysis was conducted across three pairwise comparisons of donor groups: ards vs. non-ards, non-ards vs. control, and ards vs. control. significantly differentially expressed genes (degs) were defined by a mast adjusted p< . , and a natural log fold change threshold of . was applied. the identification of cell surface markers indicative of severe disease was performed by cross-referencing all degs across disease severity levels in the integrated balf and pbmc datasets with entries in the cell surface protein atlas (cspa) [ ] . degs were considered to be differentially expressed surface markers if they were included in the "high confidence" cspa category and showed significant differential expression (adjusted p< . ) with an absolute value average natural log fold change greater than . (or log fold change > ). differentially expressed surface markers were further analyzed to verify low expression in moderate or non-ards and control donor samples. finally, differentially expressed surface markers were verified to be expressed in the cell type indicated by our data by cross-referencing with the immune cell atlas human bulk rna-seq data [ , ] . gene set enrichment analysis (gsea) [ ] was performed using the "fgsea" package (v. . . ) [ ] in r. 
differentially expressed gene lists generated using mast were ranked by -log (p) multiplied by the sign of the average natural log fold change, as previously demonstrated in debski et al. and reimand et al. [ , ] , using the mast adjusted p-value. average natural log fold change was used to break ranking ties and this value also served as an input to gsea to quantify the correlation between the gene and phenotype, as -log (p) values become arbitrarily large. gsea results were interpreted according to normalized enrichment score (nes) and an adjusted p-value, with a p< . significance threshold. pathways with positive nes were defined throughout the text as "enriched" and pathways with negative nes were defined as "depleted". further analysis of significantly differentially expressed genes was performed using the "enrichr" (v. . ) package in r. gene ontology (go) analysis was conducted on the degs between the severe and moderate samples for both the balf and pbmc, evaluating the go "biological function" annotations representing large scale biological programs. transcription factor (tf) analysis was similarly conducted on the deg list to query the regulation of tfs identified by the encode and chea chip-x databases [ , ] . go and tf results were ranked based on the enrichr 'combined score' metric and significance was determined using an adjusted p-value threshold of p< . . to investigate the intercellular interactions potentially contributing to the observed differential gene expression between the severe and moderate sample populations in the balf, we employed the ligand-receptor interaction tool nichenet via the "nichenetr" package (v. . . ) in r. [ ] differentially expressed target genes between severe and moderate disease in a "receiver" cell population of interest were identified using the 'findmarkers' function in seurat with criteria of p< . , average natural log fold change > . , and expression in over % of the receiver cells. 
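the gene ranking used as input to gsea earlier in this section (signed -log p, with ties broken by the average natural log fold change) can be sketched in python as follows. this is a hedged illustration, not the r code used in the study: the function names, the toy gene list, and the `p_floor` guard against p = 0 are hypothetical, and base-10 logarithms are assumed for the -log(p) term. a helper converting a natural log fold change to log2 is included because both scales are quoted in the thresholds above.

```python
import math


def rank_score(adj_p, avg_ln_fc, p_floor=1e-300):
    """Signed significance: -log10(adjusted p) times the sign of the
    average natural-log fold change. p_floor (hypothetical) avoids log(0)."""
    p = max(adj_p, p_floor)
    sign = (avg_ln_fc > 0) - (avg_ln_fc < 0)
    return -math.log10(p) * sign


def rank_genes(degs):
    """degs: list of (gene, adjusted_p, avg_ln_fc).
    Sort by signed score, breaking ties with the fold change itself."""
    return [
        gene
        for gene, adj_p, lfc in sorted(
            degs,
            key=lambda d: (rank_score(d[1], d[2]), d[2]),
            reverse=True,
        )
    ]


def ln_to_log2(ln_fc):
    """Convert a natural-log fold change to a log2 fold change."""
    return ln_fc / math.log(2)


# Toy differential expression results (made up for the example).
degs = [
    ("GENE_A", 1e-10, 1.2),
    ("GENE_B", 1e-10, 2.0),   # same p as GENE_A; larger fold change wins the tie
    ("GENE_C", 1e-2, -0.5),
    ("GENE_D", 1e-5, -1.0),
]

ranked = rank_genes(degs)
```

with this ordering the most significantly upregulated genes sit at the top of the list and the most significantly downregulated genes at the bottom, which is the layout a pre-ranked enrichment method expects.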
concurrently, a list of potential receptors expressed in over % of cells in the severe disease receiver population was generated using nichenet. a list of "sender" cells was created comprising all cell types in the balf, including the receiver cell type to account for the possibility of autocrine signaling. for each sender cell population, potential ligands were inferred using the nichenet ligand-receptor network applied to genes expressed in over % of the severe disease sender population. ligand activities were predicted using the 'predict_ligand_activities' function in nichenet, and the top ligands were selected by pearson correlation coefficient. differentially expressed target genes ranking among the most strongly predicted targets of the top ligands were given a "regulatory potential" interaction score. the upper % of targets by regulatory potential were visualized for plasmacytoid dendritic cells (pdcs), and the upper % of targets were visualized for epithelial cells (ciliated cells and club cells). to specifically analyze signaling from mononuclear phagocytes and neutrophils, differentially expressed target genes ranking among the most strongly predicted targets of the top ligands were used, with the upper % of these targets according to regulatory potential visualized in circos plots. non-signaling molecules and molecules acting on non-coding rna targets were manually removed. receptors for the top ligands were identified using the cellphonedb web server [ ] , and non-zero expression was verified in the severe disease receiver cell population of interest using seurat. identification of potential broadly acting "pan-ligands" was conducted by replicating the above nichenet procedure iteratively using each cell type of the balf as a receiver. for each receiver cell type, ligands with positive pearson correlation were filtered based on 1) either belonging to the list of top ligands or having a pearson correlation greater than . 
, and 2) presence of the corresponding receptor in over % of cells (manually validated using cellphonedb [ ] ). only ligands meeting these criteria for over one-third of cell types in the balf were preserved for further analysis. of these, the ligands differentially expressed in severe disease compared to moderate disease in at least one balf cell type were classified as potential pan-ligands. differential gene expression was analyzed using the "mast" package statistical framework through the 'findmarkers' function in seurat. differential expression statistical significance was qualified using the mast false discovery rate (fdr) adjusted p-value, with a significance threshold of p< . . pathway enrichment was analyzed using the standard gsea method in the "fgsea" package. statistical significance of normalized enrichment scores was qualified using the fgsea fdr adjusted p-value, with a significance threshold of p< . . differential gene expression relevant to ligand-receptor interactions was evaluated using the nichenet pipeline implementing the wilcoxon rank sum test through the 'findmarkers' function in seurat. statistical significance was qualified using the 'findmarkers' bonferroni-adjusted p-value, with a significance threshold of p< . . go and tf analysis was conducted using the "enrichr" package, and statistical significance was qualified using the enrichr adjusted p-value based on fisher's exact test, with a significance threshold of p< . . ligand's ability to induce the differential gene expression measured in the corresponding "receiver" cell. a new coronavirus associated with human respiratory disease in china the proximal origin of sars-cov- cross-country comparison of case fatality rates of covid- /sars-cov- . 
osong public health and research perspectives covid- : specific and non-specific clinical manifestations and symptoms: the current state of knowledge characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster pathological findings of covid- associated with acute respiratory distress syndrome planning and provision of ecmo services for severe ards during the covid- pandemic and other outbreaks of emerging infectious diseases cytokine profile in plasma of severe covid- does not differ from ards and sepsis covid- pneumonia: ards or not? an inflammatory cytokine signature helps predict covid- severity and death covid- : consider cytokine storm syndromes and immunosuppression imbalanced host response to sars-cov- drives development of covid- aveolar macrophage activation and cytokine storm in the pathogenesis of severe covid- pulmonary post-mortem findings in a large series of covid- cases from northern italy. 
the lancet infections disease postmortem examination of patients with covid- clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study baseline characteristics and outcomes of patients infected with sars-cov- admitted to icus of the lombardy region characteristics and outcomes of critically ill patients with covid- in washington state transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid- patients comprehensive transcriptomic analysis of covid- blood, lung, and airway tocilizumab treatment in severe covid- patients attenuates the inflammatory storm incited by monocyte centric immune interactions revealed by single-cell analysis blood single cell immune profiling reveals the interferon-mapk pathway mediated adaptive immune response for covid- single-cell analysis reveals macrophage-driven t cell dysfunction in severe covid- patients ellebedy ah ( ) targeted immunosuppression distinguishes covid- from influenza in moderate and severe disease single-cell landscape of bronchoalveolar immune cells in patients with covid- cross-talk between the airway epithelium and activated immune cells defines severity in covid- in-depth phenotyping of human peripheral blood mononuclear cells in convalescent covid- patients following a moderate versus severe disease course complex immune dysregulation in covid- patients with severe respiratory failure a single-cell atlas of the peripheral immune response to severe covid- terrier b ( ) impaired type i interferon activity and exacerbated inflammatory responses in severe covid- patients immunologic perturbations in severe covid- /sars-cov- infection emergence of low-density inflammatory neutrophils correlates with hypercoagulable state and disease severity in covid- patients nichenet: modeling intercellular communication by linking ligands to target genes a mass spectrometric-derived cell surface protein atlas 
immune cell atlas single-cell rna-seq reveals new types of human blood dendritic cells, monocytes, and progenitors h n avian influenza infection altered expression pattern of sphiogosine- -phosphate receptor in balb/c mice the alliance of sphingosine- -phosphate and its receptors in immunity osteopontin modulates the generation of memory cd + t cells during influenza virus infection deep immune profiling of covid- patients reveals patient heterogeneity and distinct immunotypes with implications for therapeutic interventions emerging functions of amphiregulin in orchestrating immunity, inflammation, and tissue repair new fronts emerge in the influenza cytokine storm il- promotes an innate immune pathway of intestinal tissue protection dependent on amphiregulin-egfr interactions timp- promotes the immune response in influenza-induced acute lung injury integrative gene network analysis identifies key signatures, intrinsic networks and host factors for influenza virus a infections patient-based transcriptome-wide analysis identify interferon and ubiquination pathways as potential predictors of influenza a disease severity interleukin- expression and its effect on natural killer cells in patients with multiple sclerosis endothelial cells are central orchestrators of cytokine amplification during influenza virus infection sars-cov- activates lung epithelia cell proinflammatory signaling and leads to immune dysregulation in covid- patients by single-cell sequencing immune cell profiling of covid- patients in the recovery stage by singlecell sequencing longitudinal peripheral blood transcriptional analysis of covid- patients captures disease progression and reveals potential biomarkers the tumor necrosis factor alpha-induced protein (tnfaip , a ) imposes a brake on antitumor activity of cd t cells a /tumor necrosis factor α-induced protein in immune cells controls development of autoinflammation and autoimmunity: lessons from mouse models. 
front immunol a and cell death-driven inflammation a deficiency in lung epithelial cells protects against influenza a virus infection network-based analysis of comorbidities risk during an infection: sars and hiv case studies neutrophils in cystic fibrosis display a distinct gene expression pattern the role of cytokines including interleukin- in covid- induced pneumonia and macrophage activation syndrome-like disease should we stimulate or suppress immune responses in covid- ? cytokine and anti-cytokine interventions proliferating spp /mertk-expressing macrophages in idiopathic pulmonary fibrosis mast: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell rna sequencing data gene set enrichment analysis: a knowledge-based approach for interpreting genomewide expression profiles fast gene set enrichment analysis etiology matters -genomic dna methylation patterns in three rat models of pathway enrichment analysis and visualization of omics data using g:profiler, gsea, cytoscape and enrichmentmap an integrated encyclopedia of dna elements in the human genome chea: transcription factor regulation inferred from integrating genome-wide chip-x experiments cellphonedb: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes all data used in the study are available in the geo under accession numbers gse , gsm , and gse . all code used for analysis will be made available in a public repository. the authors have no conflict of interest to declare. 
key: cord- -nmii mc authors: youk, jeonghwan; kim, taewoo; evans, kelly v.; jeong, young-il; hur, yongsuk; hong, seon pyo; kim, je hyoung; yi, kijong; kim, su yeon; na, kwon joong; bleazard, thomas; kim, ho min; ivory, natasha; mahbubani, krishnaa t.; saeb-parsy, kourosh; kim, young tae; koh, gou young; choi, byeong-sun; ju, young seok; lee, joo-hyeon title: robust three-dimensional expansion of human adult alveolar stem cells and sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nmii mc severe acute respiratory syndrome-coronavirus (sars-cov- ), which is the cause of a present global pandemic, infects human lung alveolar cells (hacs). characterising the pathogenesis is crucial for developing vaccines and therapeutics. however, the lack of models mirroring the cellular physiology and pathology of hacs limits the study. here, we develop a feeder-free, long-term three-dimensional ( d) culture technique for human alveolar type (hat ) cells, and investigate infection response to sars-cov- . by imaging-based analysis and single-cell transcriptome profiling, we reveal rapid viral replication and the increased expression of interferon-associated genes and pro-inflammatory genes in infected hat cells, indicating robust endogenous innate immune response. further tracing of viral mutations acquired during transmission identifies full infection of individual cells effectively from a single viral entry. our study provides deep insights into the pathogenesis of sars-cov- , and the application of long-term d hat cultures as models for respiratory diseases. several members of the family coronaviridae are transmitted from animals to humans and cause severe respiratory diseases in affected individuals . these include the severe acute respiratory syndrome (sars) and the middle east respiratory syndrome (mers) coronavirus. 
currently, coronavirus disease , caused by severe acute respiratory syndrome coronavirus (sars-cov- ), is spreading globally and more than . million confirmed cases with ~ k deaths have been reported worldwide as of th jul . the lung alveoli are the main target for these emerging viruses . to develop strategies for efficient prevention, diagnosis, and treatment, the characteristics of new viruses, including mechanisms of cell entry and transmission, kinetics in replication and transcription, host reactions and genome evolution, should be accurately understood in target tissues. although basic molecular mechanisms in sars-cov- infection have been identified [ ] [ ] [ ] [ ] , most findings have been obtained from experiments using non-physiological cell lines , model animals, such as transgenic mice expressing human angiotensin-converting enzyme (ace ) , ferrets and golden hamsters , or from observation in clinical cohorts and/or inference from in-silico computational methods [ ] [ ] [ ] . as a consequence, we do not fully understand how sars-cov- affects human lung tissues in the physiological state. development of three-dimensional ( d) stem cell-derived organotypic culture models, conventionally called organoids, has enabled various physiologic and pathological studies using human-derived tissues in vitro [ ] [ ] [ ] . organoid models established from induced pluripotent stem cells (ipscs), or adult stem cells in the human kidney, intestine, and airway have been used to investigate sars-cov- pathogenesis [ ] [ ] [ ] [ ] . although human alveolar type cells (hereafter referred to as hat s) are believed to be the ultimate target cells for sars-cov- , their infection model has not previously been introduced. we have established feeder-free, d hat organoids (hereafter referred to as haos; definition of organoid is available at ref. 
) with defined factors which support molecular and functional identity of hat cells over multiple passages, showing substantial improvements from the previous application of co-culture models [ ] [ ] [ ] . briefly, single-cell dissociated hat cells were isolated by fluorescence-activated cell sorting (facs) for the hat surface marker htii- (cd -cd -epcam + htii- + ) , (fig. a; extended data fig. a) . isolated htii- + cells showed higher expression of at cell marker sftpc, while htii- cells revealed higher expression of basal cell marker tp and secretory cell marker scgb a (extended data fig. b) . we then plated htii- + hat cells into matrigel for d cultures with our expansion medium, supplemented with chir , rspo (rspondin ), fgf , fgf , egf, nog (noggin), and sb , that are known to support the growth of human embryonic lung tip cells . htii- cells were also cultured under conditions supporting human bronchial (airway) organoids (hereafter referred to as hbos) that have previously been reported . haos established from single hat cells grew up to weeks with heterogeneous morphology including budding-like and cystic-like structures consisting of mature at cells expressing pro-sftpc, htii- , and abca , as well as exhibiting uptake of lysotracker, a fluorescent dye that stains acidic organelles such as lamellar bodies (fig. b and c) . in contrast, hbos grew quickly by day with cystic-like structures consisting of a number of airway cell types, including krt + tp + basal cells and scgb a + secretory cells, as previously reported (fig. d and e) . wnt activation was identified as an essential factor for hao formation, because no colony formation was found in the absence of wnt activator chir in culture (extended data fig. c) . importantly, our culture system allows the long-term expansion (> months) of hat s, although colony forming efficiency varied between tissue samples and reduced at later passages (extended data fig. d ). 
over passaging via single cells, hat s consistently formed organoids, exhibiting sftpc expression following months of continuous cultures, although growth began to slow, as evident by reduced organoid size and lower forming efficiencies (extended data fig. d and e) . alveolar type cells (hat s) expressing hopx and pdpn were also observed during early cultures, demonstrating differentiation capacity of hat cells in our haos (extended data fig. f) , although later passages exhibited loss of these cells. the expressions of ace and tmprss , which are necessary for sars-cov- infection, were observed in the membrane and cytoplasm of hao cells ( fig. f and g; extended data fig. g) . we next infected haos and hbos with sars-cov- at a multiplicity of infection (moi) of . the viral particles were prepared from a patient (known as kcdc ) who was diagnosed with covid- on th jan, , after traveling to wuhan, china . vero cells were also infected as a positive control, although this was not directly comparable to our d models due to different technical procedures. infectious virus particles increased to significant titers in haos (fig. h- k; extended data fig. ) , reaching maximum levels within the st day post infection (dpi), suggesting that full infection occurs within day from viral entry to haos. in hbos, the increment of viral particles was observed as consistent with another study , but their titers were < times lower than haos ( fig. h and i; extended data fig. ). in line with viral particles, the amount of the viral rna in haos and in its culture supernatant reached a plateau at dpi ( fig. j and k) . although infected vero cells exhibited significant cytopathic effects at dpi, typically cell rounding, detachment, degeneration and syncytium formation , sars-cov- infected haos and hbos did not show prominent macroscopic pathologies up until dpi. 
immunostaining for double-stranded viral rna (dsrna) and nucleocapsid protein (np) of sars-cov- identified widespread viral infection in hat cells co-expressing pro-sftpc and ace in haos ( fig. a and b; extended data fig. ) . to further determine subcellular events at a higher resolution, transmission electron microscopic analysis was performed at dpi ( fig. c- ) and showed discernible viral particles in the cytoplasm. a fraction of cells in haos showed much higher viral burdens than other cells, with as many as copies in the nm section, implying > , sars-cov- particles per cell. aggregated viral proteins, which appeared as electron-dense regions near nuclei , were also detected (fig. c) . accordingly, several key pathogenic phenotypes were observed in the infected haos. alveolar cells with enormous vacuoles were frequently observed (fig. c, f, h, i) , similar to cytopathic signals in zika virus . double-membrane vesicles (dmvs), subcellular structures known as viral replication sites frequently seen in the early phase of infection , , were observed in the vicinity of zippered endoplasmic reticulum in a small fraction of hao cells ( fig. j and k) . viral particles were dispersed in the cytosol (fig. e) or enclosed in the small vesicular structures ( fig. f and i) . diverse forms of viral secretion were also observed mainly through the apical surfaces of hat cells ( fig. g and l) . more ultrastructural pathologies are available at extended data fig. and the empiar data archive (see data availability). from strand-specific deep rna-sequencing, we explored gene expression changes in the infected haos. indeed, a set of human genes were differentially expressed as infection progressed (i.e., , and dpi), although most genes showed good correlations (extended data fig. a and supplementary table ). 
cytokeratin genes (including krt , krt a, krt b, and krt c), genes involved in keratinization (including sprr a), cytoskeleton (including s a ) and cell-cell adhesion genes (including dsg ), were significantly reduced to ~ - % in haos at dpi (fig a and b) . many more genes were upregulated in the infected haos specifically at dpi. in particular, transcription of a broad range of interferon-stimulated genes (isgs), known to be typically activated by type i and iii interferons , were remarkably increased ( fig. a and b) . these genes include interferon induced protein genes (such as ifi , ifi , ifi , ifi l), interferon induced transmembrane protein genes (such as ifitm ), interferon induced transmembrane proteins with tetratricopeptide repeats genes (ifit , ifit , ifit ), '- '-oligoadenylate synthetase genes (oas , oas ), and miscellaneous genes known to be involved in innate cellular immunity (mx , mx , rsad , isg ). these genes were expressed to > times higher levels in haos at dpi than at dpi. many other known isgs also showed moderate inductions ( - times) at dpi, including bts (~ times), oas (~ times), herc (~ times), herc (~ times) and usp (~ times). antiviral functions are known for these isgs including ( ) inhibition of virus entry (mx genes, ifitm genes), ( ) inhibition of viral replication and translation (ifit genes, oas genes, isg , herc , herc , usp ) and ( ) inhibition of viral egress (rsad and bst ). of note, given that immune cells are absent in our culture system, the innate immune response was completely autologous to alveolar cells, mimicking the initial phase of sars-cov- alveolar infection. in line with the notion, innate induction of some type i and type iii interferons was observed. of the interferon genes, an interferon beta gene (ifnb ) and three interferon lambda genes (ifnl , ifnl , and ifnl ) showed significant transcriptional induction, although their absolute changes were not substantial (fig. c) . 
the surface receptors of interferons were stably expressed in hao cells without reference to viral infection (fig. c ). downstream signalling genes of the receptors were also upregulated such as stat (~ . times), stat (~ . times) and their associated genes irf (~ . times) and irf (~ . times). of note, irf is known to be specific to type i interferon responses , while type i and type iii isgs are generally overlapping . in addition to isgs, genes in the viral sensing pathway in cytosol showed increased expression in the infected haos at dpi, for example, ddx (official gene name of rig- , from . to . tpm), ifih (also known as mda , from . to . tpm), and tlr (toll-like receptor , from . to . tpm), irf (interferon regulatory factor , from . to tpm) and il ( . to . at dpi; proinflammatory factor). notably, these transcriptional changes were much stronger in haos than in hbos. in the similar transcriptome profiling of the infected hbos, the genes aforementioned were not significantly altered (supplementary table and extended data fig. b ). in addition, we identified few hbo specific differentially expressed genes (extended data fig. c ). this finding implies cellular tropism of sars-cov- viral infection. we further analysed the viral rna sequences obtained from the infected models. in agreement with the plaque assay ( fig. h and i) , relative transcription of sars-cov- genes plateaued by dpi (fig. d) , which is earlier than the host gene expression changes. approximately % of the rna sequencing reads were mappable to the sars-cov- genome in haos from dpi (fig. d) , indicating prevailing viral gene expression in infected hao cells as observed in vero cells . of note, the proportion of viral transcripts was much lower in the infected hbos. 
transcripts from sars-cov- were not mapped uniformly to the viral genome sequence, but ' genomic regions, where canonical subgenomic rnas are located, showed much higher read-depth in all samples, consistent with the previous report (fig. e) . the vast majority of viral rna sequences produced from the infected haos and hbos was in the orientation of positive-sense rna strands ( fig. e; for example, . % vs . % for positive- and negative-sense rnas, respectively, from hao at dpi). this is in good agreement with the nature of sars-cov- , which is an enveloped, nonsegmented, and positive-sense rna virus. by cross-comparison of viral rna sequences produced from a total of infected haos (n= ) and hbos (n= ), we identified viral base substitutions (supplementary table ). no mutation was at % variant allele fraction (vaf) and exclusive to an infected sample. instead, sequence alterations showed a broad range of quasispecies heterogeneity in each culture (vaf ranges from . % to . %; fig. f ), and a large proportion of the mutations (n= ; %) were shared by two or more infected models (by the cut-off threshold of . %). therefore, we speculate that most of these sequence changes were originally present in the pool of viral particles before their inoculation. given the fact that these viral particles were prepared from one of the earliest covid- patients, our finding suggests that mutations can accumulate in the viral genomes in a small number of rounds of viral transmissions, and appear with dramatic changes in quasispecies abundance. a substantially higher proportion of specific mutations in a sample may suggest a bottleneck in viral entry or stochasticity in viral replication. to understand transcriptional changes of the infected haos at a single-cell resolution, we employed two x genomics single-cell rna-seq experiments for uninfected and infected haos at dpi (to a throughput of . gb and . gb, respectively). we (fig. ) . 
the number of viral transcripts, however, was not uniformly distributed in all infected hao cells, but enriched in cluster cells. infected cells in cluster exhibited . times more viral umi counts than cells in cluster , on average ( fig. c; , vs. umis, respectively), despite cells in cluster containing relatively lower total umi counts than ones in cluster ( , vs. , umis, respectively). when normalised with umi counts for human genes, cells in cluster showed a > times higher viral rna burden than cells in cluster (fig d) . interestingly, the infected cells in cluster showed reduced expression of canonical hat marker genes, including sftpb (surfactant protein b) and nkx - (nk homeobox ) ( fig. e; extended data fig. a ). compared with infected cells in cluster , expression levels of isgs, such as ifi l and oas , were also highly reduced. instead, these cells showed transcriptional induction of apoptosis mediator, gadd b (growth arrest and dna-damage-inducible, beta) and anti-apoptotic tnfaip (tumor necrosis factor, alpha-induced protein ), suggesting a catastrophic cellular pathway operating in a cell due to the extreme viral burdens. despite active protein expression ( fig. f and g; extended data fig. g) , we found cells ( . %) showing ace transcripts, cells ( . %) expressing tmprss transcripts, and cells ( . %) coexpressing both in single-cell transcriptome sequencing (fig. e) . these proportions are low at face value, but are consistent with a previous observation . although the previous report also suggested that ace rna expression can be stimulated as an infection-mediated response, particularly in human airway cells, such a trend was not observed in our dataset. finally, we statistically inferred the number of viral particles effectively entering each alveolar cell for infection. although we incubated cells at an moi of on average, it is generally not known how many viral particles are necessary for effective infection of an alveolar cell. 
in an extreme scenario, one viral particle is sufficient. alternatively, infection may be initiated by the entry of multiple viruses. we tracked the effective number of viral particles entering each cell using a mutation (nc_ . : , c>u) as a viral barcode. from our sequencing, the mutation was estimated to be present at . % vaf in the initial viral pool used for inoculation. if the first scenario dominantly applies, the infected alveolar cells will ; fig. f ). in a more sophisticated statistical analysis, infection by single viral entry is estimated to be > times more frequent than infection by multiple viral entry ( % vs. %, respectively; fig. g ). our calculation indicates that a single viral particle is mainly responsible for sars-cov- infection in most alveolar cells, although multiple viral entry is also possible. it may also reflect viral interference in sars-cov- alveolar infection. in this study, we established conditions for optimised d long-term cultures of adult hat cells, which provided an essential tool for studying initial intrinsic responses to sars-cov- infection. single hat cells were capable of self-constructing alveolus-like structures consisting of at cell and differentiated at cells. mature hat cells were maintained for > months over multiple passages, although self-renewal capacity and growth rate were reduced after month in culture. hat cells were also lost from later cultures, likely due to the persistent exposure to high wnt conditions, which favour expansion of hat cells over differentiation. alteration of wnt activity in the culture media (differentiation media) may make it possible to induce further at cell differentiation. haos showed remarkable phenotypic changes in the first few days after sars-cov- inoculation. the interferon response is the first line of host antiviral defense . contrary to a recent report , we observed substantial induction of isgs in the alveolar cells by endogenously produced type i and iii interferons.
however, the induction of interferon responses was seen at dpi in hao models, - days later than the timing of viral amplification at dpi. the timing of isg induction may be earlier in vivo in concert with exogenous interferons from immune cells. for more physiological understanding, co-culturing sars-cov- infected hao models with immune cells obtained from the same donor will be helpful. in summary, our study highlights the power of feeder-free haos to elucidate the intrinsic responses of tissue damage including virus infection. our data, including high-resolution electron microscopic images and the list of gene expression changes following infection, will be a great resource for the biomedical community to provide a deeper characterisation of sars-cov- infection specifically within adult hat cells. we believe that our hao models will enable more accurate and sophisticated analyses in the very near future, especially for studying the response of viral infection within vulnerable groups such as aged or diseased lungs, providing the opportunity to elucidate individual patient responses to viral infection. furthermore, our models can be applied to other techniques, such as co-culture experiments with immune cells and robust in vitro screening of antiviral agents applicable to alveolar cells, in addition to being applicable for the study of the basic biology of alveolar cells as well as chronic disorders of the lung. for the establishment of human lung organoid models, human distal lung parenchymal tissues from as with htii- + , although with a few minor differences. bronchial organoids (hbos) were passaged every - days due to accelerated growth compared with alveolar organoids (haos), and were cultured in previously reported medium conditions with the following concentration/factor edits; ng/ml human fgf , % r-spondin- , µm sb (instead of a - ). organoids were fixed and embedded in a paraffin block . 
pre-cut µm paraffin sections were dewaxed and rehydrated (sequential immersion in xylene, % etoh, % etoh, % etoh, distilled water) and either stained with hematoxylin and eosin (h&e) or immunostained. for antigen retrieval, slides were submerged in pre-heated citrate antigen retrieval buffer ( mm sodium citrate, ph . ) and allowed to boil for min. slides were cooled in the buffer for min, washed in running water for min, and permeabilised with . % triton-x in pbs for min. cells were protected from light and imaged immediately using an evos cell imaging system. organoids were fixed in % paraformaldehyde (pfa) for hrs on ice, and then dehydrated in pbs with % sucrose (v/v) (sigma). organoids were embedded in optimal cutting temperature (oct) compound (leica) and sectioned at µm. organoid sections were blocked with % normal donkey serum in pbs containing % triton-x (sigma). sections were incubated with primary antibodies overnight at °c. freshly sorted htii- + and htii- cells were lysed with trizol, and rna was extracted. rna was reverse transcribed using superscript iv (thermo fisher scientific), and transcripts were assessed using the following taqman probes: sftpc (hs _g ), tp (hs _m ), scgb a (hs _m ). viral rna samples were reverse-transcribed using superscript iv (thermo fisher scientific). the viral n gene was targeted for qrt-pcr. nucleotide sequences of the probes were as below (cdc). matrigel was sheared with the organoid media and frozen at - °c once. the solution was then thawed and diluted by a scale of . each well containing vero cells in wells was infected with the respective diluted solution at °c, % co for hr. after infection, the infection media was removed, the vero cells were washed twice with pbs, and a mixture of agar and modified eagle's medium (thermofisher) was poured onto each well. once the agar mixture had hardened, each well was fixed with % pfa for days and stained with crystal violet (sigma). when individual spots were visible, the viral titer of the original solution was calculated from them.
extracted cellular rna was processed through truseq stranded total rna gold kit, and cdna library was sequenced x bp using hiseq . fastq file was aligned to grch with virus sequence (nc . from ncbi) using star and normalized rna expression was calculated using rsem . differentially expressed genes are found from deseq . we obtained enriched gene sets using in-house scripts. for mutation calling, we used strelka , varscan , and samtools , and then manually checked the position through igv . fastq file was aligned and each umi count was calculated using cell ranger software provided by the manufacturer ( x genomics). cells with mitochondria rna percent < %, total rna number > subsets were used for downstream analysis. starting from the x gene counts, we have normalized data as follows. the x data includes sars-cov- genome as an extra gene besides , human genes. for uninfected cells, such sars-cov- gene would have zero read count, while for infected cells, the gene could account for a large portion of total reads. one of the goals of normalisation is to remove technical difference such as sequencing coverage before comparing gene expression levels. as our interest is to compare expression levels between uninfected and infected cells on human genes, we have applied a normalisation method ('scater r package's 'lognormcounts' function) to human genes as a whole set. the method calculated a scaling factor for each cell based on total human gene count, then scaled all genes before taking log-transformation. using the same scaling factor learned during human gene normalisation, sars-cov- count was also normalized. for clustering, we combined single-cell data from infected and uninfected haos. unsupervised clustering was performed using a shared nearest neighbor (snn) based clustering algorithm in seurat . contaminated cells (< %) were discarded. in house r scripts were used for more downstream analyses. 
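The normalisation described above (a per-cell scaling factor learned from human genes only, then reused for the viral count) can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes library-size factors scaled to a mean of 1 and a log2 transform with a pseudo-count of 1, which is how `logNormCounts` behaves by default to our understanding. The function names and the toy counts are hypothetical.

```python
import math

def library_size_factors(human_counts):
    """Per-cell scaling factors from total human counts, scaled to mean 1.

    Assumes every cell has a non-zero total (low-quality cells already filtered).
    """
    totals = [sum(cell) for cell in human_counts]
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]

def log_normalise(counts, factors, pseudo_count=1.0):
    """Scale each cell by its factor, then take log2(x + pseudo_count)."""
    return [
        [math.log2(value / factor + pseudo_count) for value in cell]
        for cell, factor in zip(counts, factors)
    ]

# Hypothetical toy data: rows are cells, columns are genes.
human = [[10, 30], [20, 60]]   # human gene counts only
viral = [[0], [300]]           # viral "gene" count, kept out of the factors

factors = library_size_factors(human)       # learned from human genes alone
human_norm = log_normalise(human, factors)
viral_norm = log_normalise(viral, factors)  # same factors reused for the virus
```

Keeping the viral count out of the scaling factor means that a heavily infected cell is not made to look as if its human genes were globally downregulated.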
to assess whether alveolar cells tend to be infected by a single viral particle or multiple particles, we employed a likelihood approach. as a proof-of-concept, we assumed only two scenarios exist, one supporting a single viral entry and the other supporting double viral entry, then aimed to estimate the proportion of cells infected by a single viral particle. our data consist of the observed reference and variant read counts for each of the cells reporting at least one read at the mutation site of nc_ . : , . assuming a sequencing error rate (ε) of . %, which will cover any illumina sequencing errors or misalignment, the likelihood of the data given the weight supporting a single virus scenario was computed as follows. we observed a distinct transcriptional feature of cells in cluster , which express lower levels of canonical hat marker genes, including sftpc, but detectable levels of airway marker genes, including sox , tp , krt , and krt (extended data fig. b) . these expression patterns were not affected by virus infection. hat cells expressing airway markers such as sox have been seen in chronic lung diseases such as lung cancer and idiopathic pulmonary fibrosis (ipf) , , representing pathologic phenotypes of alveolar bronchiolization. a recent study also suggested the potential transition of hat cells to krt + basal-like cells in the context of ipf . given that the haos used for our single-cell rna sequencing study were derived from hat cells isolated from adjacent normal counterparts of lung cancer and/or ipf, it is likely that this transcriptional feature reflects the cellular status of the original tissues rather than a virus-associated phenotype. this finding suggests that our hao models maintain the pathophysiologic features of the original tissues, although we used apparently normal background regions to establish our haos.
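The likelihood formula for the single versus double viral entry analysis is not reproduced in this text, so the following is only a sketch of one plausible form of such a two-scenario mixture, not the authors' equation. It assumes that a cell entered by one particle carries either all-variant or all-reference virus, that a cell entered by two particles may also carry a 50:50 mix, that read counts are binomial with error rate `eps`, and that the weight `w` for the single-entry scenario is estimated by a grid search. All names (`p`, `eps`, `w`) and the toy read counts are hypothetical.

```python
import math

def binom_pmf(k, n, q):
    """Binomial probability of k variant reads out of n at variant rate q."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)

def scenario_likelihoods(ref, var, p, eps):
    """Likelihood of one cell's read counts under single and double entry.

    p   : variant allele fraction in the inoculum (assumed known)
    eps : per-read error rate (assumed symmetric)
    """
    n = ref + var

    def read_lik(true_vaf):
        q = true_vaf * (1 - eps) + (1 - true_vaf) * eps
        return binom_pmf(var, n, q)

    single = p * read_lik(1.0) + (1 - p) * read_lik(0.0)
    double = (p * p * read_lik(1.0)
              + 2 * p * (1 - p) * read_lik(0.5)
              + (1 - p) ** 2 * read_lik(0.0))
    return single, double

def log_likelihood(w, cells, p, eps):
    """Log-likelihood of all cells given weight w on the single-entry scenario."""
    total = 0.0
    for ref, var in cells:
        single, double = scenario_likelihoods(ref, var, p, eps)
        total += math.log(w * single + (1 - w) * double)
    return total

def estimate_single_entry_weight(cells, p, eps, steps=1000):
    """Grid-search maximum-likelihood estimate of w in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: log_likelihood(w, cells, p, eps))

# Hypothetical (reference, variant) read counts per cell.
homogeneous_cells = [(20, 0)] * 8 + [(0, 20)] * 4
mixed_cells = [(10, 10)] * 10
w_homogeneous = estimate_single_entry_weight(homogeneous_cells, p=0.3, eps=0.001)
w_mixed = estimate_single_entry_weight(mixed_cells, p=0.3, eps=0.001)
```

Under these assumptions, cells whose reads are all reference or all variant pull the estimate towards single entry, while cells with intermediate allele fractions can only be explained by the double-entry scenario.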
further long-term tracing of changes in cellular identities and states in response to virus infection in hao cells will be of significant interest to understand the progression of pathologic features and reparative mechanisms for developing therapeutic interventions. furthermore, from our scrnaseq analysis, most captured cells were hat cells. it is likely that this might result from the enrichment of hat cells in our haos (p ) and the nature of fragile hat cells during the procedure of single-cell preparation for scrnaseq. supplementary table . rna expression levels (tpm) of all genes in seven human alveolar organoid samples. the authors declare no competing interests. all unique organoids generated in this study are available from young seok ju or joo-hyeon lee with a completed materials transfer agreement. bulk rna and single cell rna sequencing datasets will be uploaded on the european genome-phenome archive (ega). accession id is not assigned yet.
we thank jinwook choi key: cord- -wt crton authors: fajardo, Álvaro; pereira-gómez, marianoel; echeverría, natalia; lópez-tort, fernando; perbolianachis, paula; aldunate, fabián; moreno, pilar; moratorio, gonzalo title: evaluation of sybr green real time pcr for detecting sars-cov- from clinical samples date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: wt crton the pandemic caused by sars-cov- has triggered an extraordinary collapse of healthcare systems and hundreds of thousands of deaths worldwide. following the declaration of the outbreak as a public health emergency of international concern by the world health organization (who) on january th, , it has become imperative to develop diagnostic tools to reliably detect the virus in infected patients. several methods based on real time reverse transcription polymerase chain reaction (rt-qpcr) for the detection of sars-cov- genomic rna have been developed. in addition, these methods have been recommended by the who for laboratory diagnosis. since all these protocols are based on the use of fluorogenic probes and one-step reagents (cdna synthesis followed by pcr amplification in the same tube), these techniques can be difficult to perform given the limited supply of reagents in low and middle income countries. in the interest of economy, time and availability of chemicals and consumables, sybr green-based detection was implemented to establish a convenient assay. therefore, we adapted one of the who-recommended taqman-based one-step real time pcr protocols (from the university of hong kong) to sybr green. our results suggest that sybr green detection represents a reliable, cost-effective alternative to increase testing capacity. ever since sars-cov- was identified as the etiological agent of a novel disease, at the beginning of the current year (gorbalenya et al. ; zhu et al. a) , the world health organization (who) has been following up on its spread (world health organization (who) a). in addition, most of the scientific work has been focused on three areas: i) the characterization of this virus and the disease that it causes; ii) the rapid development of diagnostic methods; and iii) patient treatment (dennis lo and chiu ). 
the rapid spread of sars-cov- highlights the need for an effective surveillance method that can be widely used in different laboratory settings (thompson ) . this has prompted the development of a wide variety of molecular diagnostic methods based on the detection of viral genomic rna. the vast majority rely on reverse transcription real time pcr (rt-qpcr), due to its high sensitivity and specificity (corman et al. ; huang et al. ; world health organization (who) b; zhu et al. a) . this technique, either as a one-step or a two-step protocol, has accelerated pcr laboratory procedures and has had the strongest impact on virology, as it is being applied for detection, quantification, differentiation and genotyping of animal and human viruses (bankowski and anderson ; kaltenboeck and wang ) . furthermore, it is regarded as a gold standard for analysis and quantification of pathogenic rna viruses in clinical diagnosis (espy et al. ) . in particular, for the molecular diagnosis of covid- the who website recommends a few one-step rt-qpcr detection protocols that have been developed in different countries (world health organization (who) b). since all these protocols are based on the use of fluorogenic probes and one-step reagents (cdna synthesis followed by pcr amplification in the same tube), these techniques depend on more specific reagents and can be quite expensive. moreover, these protocols involve the amplification of more than one gene, which implies different probes and fluorescent channels. therefore, several researchers have attempted to develop alternative sars-cov- detection methods that might be faster or cheaper to implement, such as loop-mediated isothermal amplification (lamp) (jiang et al. ; park et al. ; yang et al. ; zhang et al. ; zhu et al. b) , droplet digital pcr (ddpcr) (dong et al. ; suo et al. ) , multiplex pcr or even protocols based on crispr-cas (curti et al. ) . 
furthermore, considering the shortage in the supply of rna extraction kits, others have evaluated alternative nucleic acid extraction methods (bruce et al. ; ladha et al. ; zhao et al. ) . despite all these approaches, there is still much to be done to generate strategies that might be helpful for different laboratory settings. quantitative pcr (qpcr) is a molecular technique widely used when detection and/or quantification of a specific dna target is needed. qpcr uses fluorescence to measure the amount of a dna target present at each cycle of amplification during the pcr. the most common ways of generating a fluorescent signal are the use of specific hydrolysis probes (e.g. taqman® probes) or of a double-stranded dna binding dye (e.g. sybr® green). sybr green-based detection has several advantages over taqman chemistry: it is cheaper and does not require the synthesis of specific probes. this technique has already been proposed and used for laboratory testing of different pathogens, including viruses (espy et al. ; fernández et al. ; gomes-ruiz et al. ; kumar et al. ), bacteria (kositanont et al. ; keerthirathne et al. ) and unicellular protozoan parasites (espy et al. ; haanshuus et al. ), among others. for sars-cov- detection, some preliminary reports have attempted to assess the sensitivity and predictive value of the different sets of primers and probes available (either commercially or in-house developed) (barra et al. ; casto et al. ; jung et al. ), but so far, no comparison has been made between the different real time chemistries for this emerging virus. the aim of the study was to set up an alternative molecular protocol to detect sars-cov- from clinical samples, without the need for taqman probes or post-pcr steps (e.g. gel electrophoresis), which can be implemented when specific reagents or kits are difficult to obtain because of the current pandemic situation. 
here we showed that the taqman-based one-step real time pcr protocol recommended by the who (poon et al. ) can be successfully adapted and alternatively used with sybr green-based two-step qpcr. in addition to comparing the different molecular techniques using dilutions of control vectors and rna standards for quantification, we tested our assay with clinical samples collected from confirmed covid- cases and one negative patient. overall, our results showed that both approaches were able to detect sars-cov- from clinical samples. positive controls were kindly provided by dr. leo poon from the university of hong kong. positive controls contain a region of the orf b-nsp or n targets of the sars-cov urbani strain cloned into a standard plasmid. residual de-identified nasopharyngeal samples were remitted to the institut pasteur montevideo, which has been validated by the ministry of health of uruguay as an approved center providing diagnostic testing for covid- . the one-step rt-qpcr protocol evaluated in this study corresponded to the one developed by the university of hong kong (poon et al. ) , with modifications, and consists of two monoplex real-time rt-pcr assays targeting the orf b-nsp and n gene regions of sars-cov- (supplementary table ). the concentrations used were lowered to avoid non-specific amplification (data not shown). briefly, a μl monoplex reaction contained μl of x taqman fast virus master mix (thermo fisher), . µl of each primer ( . μm final concentration each), . μl of the probe ( . μm final concentration) and μl of rna. these monoplexes were performed for both the n and orf b-nsp regions. thermal cycling was run on a step-one plus rt-pcr thermal cycler (applied biosystems) with the following cycle parameters: °c for min for reverse transcription, inactivation of reverse transcriptase at °c for s and then cycles of °c for s and °c for s. the expected amplicon sizes of orf b-nsp and n are bp and bp, respectively. 
this protocol was carried out with serial dilutions of plasmids containing n and orf b-nsp genes (kindly provided by dr. leo poon from the school of public health, university of hong kong), rna standards of the same targets constructed in our laboratory, and later validated with rna samples of covid- cases. a non-template control (nuclease-free water) was included in every one-step rt-qpcr run. we manually set the threshold value to . in all assays to determine the threshold cycle (ct). a test run for the amplification of the controls was done to select the appropriate dilution to use for the amplification of the clinical samples, which were, in turn, run in duplicates. first, complementary cdna of sars-cov- clinical samples was generated using superscript ii reverse transcriptase (invitrogen), random primers and μl of rna, according to the manufacturer's instructions. qpcr reactions were carried out using a step-one plus rt-pcr thermal cycler (applied biosystems), luna universal qpcr master mix (new england biolabs), following manufacturer's instructions, and the same primers previously used in the one step rt-qpcr taqman protocol. each μl reaction contained μl of x master mix (neb), . µl of each primer ( . μm final concentration each) and μl of cdna. again, non-template control (nuclease-free water) was included in every qpcr run as a negative control. in this case, we set the threshold value to . in all assays to determine the ct. as with the probe-based protocol, a test run for the amplification of the control plasmids was done to select the appropriate dilution to use for the amplification of the clinical samples, which were, in turn, run in duplicates. the cycling conditions were: initial denaturation at °c for min, pcr cycles of °c for s and °c for s, followed by a melting curve ranging from °c to °c (acquiring fluorescence data every . °c). 
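The melting curve step above is typically read by locating the peak of the negative first derivative of fluorescence with respect to temperature (−dF/dT). The sketch below illustrates that idea on synthetic data; it is not the instrument software's algorithm, and the function name, temperature range, step size and curve shape are all hypothetical choices for illustration.

```python
import math

def melting_temperature(temps, fluorescence):
    """Return the temperature where -dF/dT peaks (central differences).

    Only the dominant peak is reported; skewed or double peaks still need
    inspection of the full derivative curve.
    """
    best_temp, best_rate = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        rate = -slope  # fluorescence drops as the duplex melts
        if rate > best_rate:
            best_temp, best_rate = temps[i], rate
    return best_temp

# Synthetic melt curve: a sigmoidal loss of signal centred on 80 °C,
# sampled every 0.5 °C between 60 °C and 95 °C (hypothetical values).
temps = [60.0 + 0.5 * i for i in range(71)]
signal = [1.0 / (1.0 + math.exp((t - 80.0) / 0.8)) for t in temps]
tm = melting_temperature(temps, signal)
```

A product-specific peak and a primer-dimer peak usually sit at clearly different temperatures, which is why the melt curve can separate late non-specific amplification from true positives.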
with the aim of verifying specific amplification, in addition to the melting curve step during the run, we also confirmed the amplicon sizes by % agarose gel electrophoresis. fragments of and bp containing the orf b-nsp and n targets, respectively, were cloned into pcr™ . -topo® using the topo® ta cloning® kit (invitrogen) following the manufacturer's instructions and transformed into neb® -alpha competent e. coli (high efficiency) by the heat shock method ( °c, s). plasmids were isolated using the purelink quick plasmid miniprep kit (invitrogen) and quantified by spectrophotometric analysis (biophotometer, eppendorf). then, µg of each plasmid was linearized with spei and in vitro transcribed with t rna polymerase (thermo fisher) following the manufacturer's instructions. in vitro transcribed rna was treated with dnase and purified with the turbo dna-free™ kit (thermo fisher). the purified rna was checked for size and integrity by gel electrophoresis. the number of copies/µl was calculated as (na × c)/mw, where na is the avogadro constant expressed in mol⁻¹, c is the concentration expressed in g/µl, and mw is the molecular weight expressed in g/mol. a stock containing around x copies/µl (for both orf b-nsp and n) was used for standard curve and sensitivity determination of the qpcr assays. the standard curve and sensitivity were determined by -fold serial dilutions. in the case of the one-step probe-based qpcr assays, µl of rna was directly added to the mix and run, in triplicate, as mentioned above. for the two-step sybr green-based qpcr, µl of the same -fold serial dilutions of the in vitro transcribed rna for each target were reverse-transcribed and then µl of the cdna was used as template for sybr green qpcr. each cdna was run in duplicate. standard curves were represented as ct vs log copy number/reaction. the lower limit of detection was defined as the lowest copy number of target/qpcr, taking the dilution into account, which amplified reliably. 
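As a worked illustration of the copy number formula above, copies/µl = (NA × C)/MW, the sketch below computes a stock concentration and a dilution series. The input values and the dilution factor are hypothetical, not the ones used in the study, and the function names are ours.

```python
AVOGADRO = 6.02214076e23  # Avogadro constant in mol^-1 (exact SI value)

def copies_per_microlitre(concentration_g_per_ul, molecular_weight_g_per_mol):
    """copies/µl = (NA x C) / MW, with C in g/µl and MW in g/mol."""
    return AVOGADRO * concentration_g_per_ul / molecular_weight_g_per_mol

def serial_dilutions(stock_copies_per_ul, fold, steps):
    """Copy numbers along a serial dilution, starting from the undiluted stock."""
    return [stock_copies_per_ul / fold**i for i in range(steps)]

# Hypothetical example: 1 ng/µl of a transcript weighing 100,000 g/mol.
stock = copies_per_microlitre(1e-9, 1e5)
series = serial_dilutions(stock, fold=10, steps=5)
```

The molecular weight itself depends on transcript length and composition, so it has to be computed for each in vitro transcribed target before the formula is applied.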
pcr products generated by the qpcr protocol with sybr green contain da overhangs at the ´ ends. therefore, the fresh pcr products of the n target from samples , , and were directly cloned into pcr™ . -topo® using the topo® ta cloning® kit (invitrogen) following the manufacturer's instructions. next, cloning reactions were transformed into neb® -alpha competent e. coli (high efficiency) by the heat shock method ( °c, s), plated on lb medium containing µg/ml ampicillin (amp), µl xgal ( mg/ml) and µl iptg ( mm), and incubated at °c overnight. three individual white colonies for each cloning reaction were isolated and cultured overnight in lb containing µg/ml ampicillin. plasmids were isolated using the purelink quick plasmid miniprep kit (invitrogen) and sanger sequenced with the universal primers m forward and m reverse. ab files from sanger sequencing were analyzed using the staden package v . . (http://staden.sourceforge.net). megax (http://www.megasoftware.net) was used to perform sequence analysis. gc content and dna melting temperature of the orf b-nsp and n targets from the sars urbani isolate (mk ) and sars-cov- (mt ) were estimated using the dna melting temperature (tm) calculator (available at http://www.endmemo.com/bio/tm.php). the parameters were set to emulate, as closely as possible, the sybr green qpcr conditions used in this study. to do so, the salt and magnesium concentrations were set at mm and . mm, respectively, as indicated by the manufacturer. initial dna copy numbers were obtained using the ct values empirically observed in the sybr green qpcr for the positive control and the clinical samples for both targets (shown in table ). then, we interpolated them in their corresponding standard curve in order to estimate the initial number of copies in the qpcr reaction. 
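Interpolating a Ct value in a standard curve, as described above, amounts to fitting a straight line of Ct against log10 copy number and inverting it. The sketch below shows this with an ordinary least-squares fit; the standard-curve points are hypothetical, and the efficiency formula 10^(−1/slope) − 1 is the usual textbook relation rather than something stated in the source.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 would mean perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0

# Hypothetical standards: log10 copies per reaction and their observed Ct values.
log10_copies = [6, 5, 4, 3, 2]
ct_values = [18.0, 21.5, 25.0, 28.5, 32.0]

slope, intercept = fit_standard_curve(log10_copies, ct_values)
estimated_copies = copies_from_ct(25.0, slope, intercept)
efficiency = amplification_efficiency(slope)
```

Estimates are only meaningful for Ct values inside the range covered by the standards; extrapolating beyond the most dilute standard is unreliable.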
after this, we calculated the target copy number after one pcr cycle and estimated the dna concentration for each sequence, expressed as nm, using the dna/rna copy number calculator from the http://www.endmemo.com/bio/dnacopynum.php website. in order to select an appropriate amount of control vector to use in the comparison between the two real time qpcr methods, we prepared plasmid dilutions ( , , and copies/μl) and assayed them following both protocols: the probe-based one-step rt-qpcr developed by the university of hong kong (poon et al. ) and the in-house sybr green-based protocol adapted in this study. it is worth mentioning that previous results from our laboratory had indicated that a lower amount of primers and probes than initially suggested by poon et al. ( ) rendered similar positive results and diminished the amplification of primer dimers (data not shown). real time pcr results, from sybr green and taqman chemistries, for different dilutions of the control vectors for the targeted regions (orf b-nsp and n) are shown in figure (panels a, b, c and d). since all dilutions amplified correctly and below a ct of (fig. and table ), we decided to use copy number/μl as a positive control for subsequent assays (for both orf b-nsp and n genes). analyzing the specificity of the sybr green-based qpcr method (fig. , panels e to h) for orf b-nsp , we verified the presence of only one pcr product, corroborated by a unique melting peak (tm= . °c) (fig. e and table ). agarose gel electrophoresis (fig. g ) also allowed verification of the expected product size ( bp), with no amplification in the negative control. in the case of the sybr green-based qpcr method for n gene amplification, we observed (fig. d and f), for all n dilutions, a very clear peak at tm= . °c, together with a non-symmetric melting temperature peak slightly skewed to a higher temperature, which might suggest the presence of two pcr products. 
however, when we ran the pcr products on an agarose gel, only one product of the expected size ( bp) was observed (fig h) , demonstrating that the presence of a double peak was not indicative of non-specific amplification. for the non-template control we observed a slight amplification (ct= . ), although the melting curve evidenced a non-specific peak (tm= . °c). to validate the sybr green qpcr protocol, we assayed a set of rna samples from covid- cases with both qpcr methods and both genetic regions. these samples had previously been determined to be sars-cov- positive (samples to ) or negative (sample ), employing the diagnostic kit provided by the panamerican health organization (berlin protocol). the results obtained with both qpcr chemistries and both genetic regions were in agreement with the data previously gathered for these samples ( fig. and ) . the amplification data for the sybr green-based qpcr protocol showed that the orf b-nsp region was correctly amplified for all sars-cov- positive samples ( to ) (fig. ) . this was verified by melting curve analysis in every case, in agreement with the positive control ( fig. b and table ). in addition, sample , which was suspected to have a low viral load, was correctly amplified with this protocol. as for the negative viral rna sample (sample ), even though it seems to amplify in very late cycles ( fig. a and table ), the melting curve analysis reveals that the amplification corresponds to primer dimers and/or non-specific products ( fig. b and table ). as in the case of orf b-nsp amplification, the amplification of the n region allowed the correct assignment of all positive and negative clinical samples. the non-template controls, as well as sample (negative for sars-cov- ), showed delayed non-specific amplification ( fig. c and d, and table ), which in principle does not invalidate the results, because they are in agreement with the results of the assay using different dilutions of the control vectors. 
in addition, the clinical samples showed a skewed peak similar to the one that was previously observed for the positive control (fig. d ). taken together, these results suggest the specific amplification of the n target. although all sars-cov- positive samples amplify the same product, they exhibit a higher tm than the positive control ( . ± . °c vs . °c, respectively), an effect that was not observed for the orf b-nsp target. this result can be explained by the fact that the positive control used in this study corresponds to a sars-cov urbani isolate (genbank accession number mk ), which was used due to the unavailability of a sars-cov- positive control. the tm of a dna fragment depends on a variety of features such as its length, gc composition, sequence and concentration, among others. given the results previously described, we hypothesized that the difference between the tm of the positive control and the clinical samples assayed here was due to a higher gc content in the n target from uruguayan patients. to test this, we first cloned of pcr product obtained from the clinical samples and sequenced one molecular clone for each ( figure ) . the results showed that all sequences were identical to each other, and a blast search showed that all cloned sequences had % identity with sars-cov- , confirming that the sybr-green based qpcr specifically amplified viral rna present in the clinical samples. we then calculated the gc content and estimated the tm in silico for the amplicon sequences of the n target from clinical samples. comparisons were made taking as reference the n target of the sars-cov urbani strain, which was used as positive control (table ) . given the in silico tm estimates, the n amplicons obtained for sars-cov- samples should have a higher tm (around °c) than that of the sars-cov urbani strain, which supports the interpretation that the tm differences observed for the n amplicon derive from their different gc content. the magnitude of the tm gain correlated positively with the gc% (pearson, r = . 
, p < . ). therefore, we conclude that the observed differences on the tm of n targets from clinical samples were due to differences in their gc content. finally, to determine the limit of quantitation of both sars-cov- detection qpcr approaches, serial dilutions of an rna standard for each target were performed ( figure ). as expected, the ct of each reaction increased along with the lower number of target copies/reaction. the ct values showed an inverse linear relationship with the log value of the rna concentrations with a very high correlation (r > . in all cases). the results showed that the limit of quantitation for orf b-nsp and n targets were equal to copies/reaction for probe-based qpcr ( figure a and c) and × copies/reaction ( figure b and d) for sybr-green qpcr assays. the qpcr technique is widely used in clinical virology diagnostic laboratories because of its high sensitivity, specificity, reproducibility and no need of post pcr steps (josko ) . additionally, qpcr allows for quantification because during the amplification of the target, it reaches a threshold level that correlates with the amount of initial target sequence (valasek and repa ) . sybr-green based qpcr has relatively low cost benefit, whereas taqman-based qpcr are more expensive. in addition, the specificity of the qpcr is mainly provided by the use of specific primers, although taqman probes increase the specificity because only sequence-specific amplifications are measured (tajadini et al. ). our results with sybr green chemistry were consistent with the initial probe-based protocol designed by poon et al. ( ) showing sensitivity to specifically detect sars-cov- . it is worth noting that the original protocol poon et al. ) suggested the use of the n target for screening analyses, whereas the amplification of orf b-nsp was indicated as a confirmatory assay. 
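The standard-curve relationship described above (ct decreasing linearly with the log of the target copy number) can be sketched as follows. This is a minimal illustration in plain python, assuming an ordinary least-squares fit of ct against log10(copies); the dilution series and ct values are made-up numbers for illustration, not data from the study, and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Interpolate an observed Ct on the fitted curve to estimate copies."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical ten-fold dilution series and Ct values.
    dilutions = [1e2, 1e3, 1e4, 1e5]
    cts = [33.36, 30.04, 26.72, 23.40]
    slope, intercept = fit_standard_curve(dilutions, cts)
    print(slope, intercept, amplification_efficiency(slope))
    print(copies_from_ct(28.0, slope, intercept))
```

A slope near -3.32 corresponds to an efficiency close to 1.0; interpolating a sample ct on the fitted line gives the estimated starting copy number, which is only meaningful within the range covered by the dilution series.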
orf-nsp encodes a very conserved exoribonuclease present in all known coronaviruses, which is involved in replication fidelity (eckerle et al. , ). the n gene encodes the structural nucleoprotein, which is more exposed to recognition by the host immune system and therefore could be more prone to change than orf-nsp (woo et al. ). importantly, if mutations occur within the probe-binding site, they would prevent the annealing of the probe and its subsequent detection. although coronaviruses are among the rna viruses with lower mutation rates (sanjuan et al. ) , it is possible that new mutations could negatively affect probe detection. therefore, an alternative detection method such as sybr green-based two-step qpcr, which requires only two conserved regions for primer binding instead of three (as for a hybridization probe), might become useful. in this context, our results with sybr green chemistry may provide a simpler and cheaper alternative for sars-cov- detection. here we report a lower limit of detection for the taqman probe-based approach than for the sybr-green based qpcr. in addition to the decrease in specificity due to the lack of a probe, the sybr green-based qpcr approach needs a previous step of cdna synthesis. this extra step represents a possible source of contamination that can affect the results. in order to increase the specificity of the sybr green-based qpcr assayed here, we could evaluate the use of specific primers instead of random hexamers during the reverse transcription step. another disadvantage of sybr green compared with probe-based qpcr is that any non-specific product, including primer-dimers, can lead to false positive results. for this reason, melting curve analysis must be performed to confirm that only specific amplification was obtained. 
although multiplexed qpcr is more frequently developed for taqman technology, our results suggest that a multiplexed sybr green-based qpcr could be developed for sars-cov- detection. the difference of the gc% content among the targets orf-nsp and n which produce a tm difference of °c, seems to be enough for simultaneous detection of both targets in the same tube. in this case, after melting curve analysis two specific double peaks should be observed. altogether, both sybr green-based qpcr and taqman probe-based qpcr assays for detecting sars-cov- were set up in our laboratory conditions and their consistencies, as well as their advantages and disadvantages, were analyzed. this work could help to increase the testing capacity of some places in the world with limited access to taqman specific reagents, given the current lockdown of many countries. rapid detection of herpes simplex virus dna in genital ulcers by real-time pcr using sybr green i dye as the detection signal quantitative analyses of cytomegalovirus genome in aqueous humor of patients with cytomegalovirus retinitis real-time nucleic acid amplification in clinical microbiology analytical sensibility and specificity of two rt-qpcr protocols for sars-cov- detection performed in an automated workflow determination of viral load by quantitative realtime pcr in herpes simplex encephalitis patients rt-qpcr detection of sars-cov- rna from patient nasopharyngeal swab using qiagen rneasy kits or directly via omission of an rna extraction step comparative performance of sars-cov- detection assays using seven different primer/probe sets and one assay kit assessment of a rabies virus rapid diagnostic test for the detection of australian bat lyssavirus molecular diagnosis of a novel coronavirus ( -ncov) causing an outbreak of pneumonia detection of novel coronavirus ( -ncov) by real-time rt-pcr comparative evaluation of real-time pcr assays for detection of the human metapneumovirus an ultrasensitive, rapid, and 
portable coronavirus sars-cov- sequence detection method based on crispr-cas racing towards the development of diagnostics for a novel coronavirus ( -ncov) highly accurate and sensitive diagnostic detection of sars-cov- by digital pcr infidelity of sars-cov nsp -exonuclease mutant virus replication is revealed by complete genome sequencing high fidelity of murine hepatitis virus replication is decreased in nsp exoribonuclease mutants external quality assessment study for ebolavirus pcr-diagnostic promotes international preparedness during the - ebola outbreak in west africa real-time pcr in clinical microbiology: applications for routine laboratory testing comparison of the sybr green and the hybridization probe format for real-time pcr detection of hhv- sybr green and taqman real-time pcr assays are equivalent for the diagnosis of dengue virus type infections the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- assessment of malaria real-time pcr methods and application with focus on lowlevel parasitaemia clinical features of patients infected with novel coronavirus in wuhan development and validation of a rapid single-step reverse transcriptase loop-mediated isothermal amplification (rt-lamp) system potentially to be used for reliable and high-throughput screening of covid- molecular virology in the clinical laboratory comparative analysis of primer-probe sets for the laboratory confirmation of sars-cov- advances in real-time pcr: application to clinical laboratory diagnostics real-time pcr for rapid diagnosis of entero-and rhinovirus infections using lightcycler real time pcr for the rapid identification and drug susceptibility of mycobacteria present in bronchial washings detection and differentiation between pathogenic and saprophytic leptospira spp. 
by multiplex polymerase chain reaction comparative reproducibility of sybr green i and taqman real-time pcr chemistries for the analysis of matrix and hemagglutinin genes of influenza a viruses a -min rna preparation method for covid- detection with rapid detection of west nile virus from human clinical specimens, field-collected mosquitoes, and avian samples by a taqman reverse transcriptase-pcr assay high sensitivity detection of coronavirus sars-cov- using multiplex pcr and a multiplex-pcr-based metagenomic method development of reverse transcription loop-mediated isothermal amplification (rt-lamp) assays targeting sars-cov- comparison of real-time sybr green dengue assay with real-time taqman rt-pcr dengue assay and the conventional nested pcr for diagnosis of primary and secondary dengue infection detection of novel coronavirus ( -ncov) in suspected human cases by rt-pcr viral mutation rates detection of australian bat lyssavirus using a fluorogenic probe ddpcr: a more sensitive and accurate tool for sars-cov- detection in low viral load specimens comparison of sybr green and taqman methods in quantitative real-time polymerase chain reaction analysis of four adenosine receptor subtypes novel coronavirus outbreak in wuhan, china, : intense surveillance is vital for preventing sustained transmission in new locations the power of real-time pcr coronavirus genomics and bioinformatics analysis world health organization (who) ( a) rolling updates on coronavirus disease world health organization (who) ( b) coronavirus disease (covid- ) technical guidance: laboratory testing for -ncov in humans rapid detection of sars-cov- using reverse transcription rt-lamp method rapid molecular detection of sars-cov- (covid- ) virus rna using colorimetric lamp a simple magnetic nanoparticles-based viral rna extraction method for efficient detection of sars-cov- a novel coronavirus from patients with pneumonia in china reverse transcription loop-mediated isothermal amplification 
combined with nanoparticles-based biosensor for diagnosis of covid- we thank dr. leo poon and his group (school of public health, the university of hong kong) for kindly providing us with the control vectors for orf b-nsp and n regions of sars-cov used in this work. the authors declare no conflict of interest. information of primers and probes tested in this study from the university of hong kong protocol (poon et al. ) . orf b-nsp hku-orf b-nsp f key: cord- -rovyvv authors: wagner, teresa r.; kaiser, philipp d.; gramlich, marius; becker, matthias; traenkle, bjoern; junker, daniel; haering, julia; dulovic, alex; schweizer, helen; nueske, stefan; scholz, armin; zeck, anne; schenke-layland, katja; nelde, annika; strengert, monika; walz, juliane s.; ruetalo, natalia; schindler, michael; schneiderhan-marra, nicole; rothbauer, ulrich title: neutrobodyplex - nanobodies to monitor a sars-cov- neutralizing immune response date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rovyvv as the covid- pandemic escalates, the need for effective vaccination programs, diagnosis tools and therapeutic intervention ever increases. neutralizing binding molecules have become important tools for acute treatment of covid- and also provide a unique possibility to monitor the emergence and presence of a neutralizing immune response in infected or vaccinated individuals. here we identified unique nanobodies (nbs) with high binding affinities to the sars-cov- spike receptor domain (rbd). of these, effectively block the rbd:ace interface. via competitive binding analysis and detailed epitope mapping, we grouped all nbs into sets and demonstrated their neutralizing effect. combinations from different sets showed a profound synergistic effect by simultaneously targeting different epitopes within the rbd. finally, we established a competitive multiplex binding assay (“neutrobodyplex”) enabling the detection of neutralizing antibodies in serum of infected patients. 
overall, our nbs have high potential for prophylactic and therapeutic options and provide a novel approach to screen for a neutralizing immune response in infected or vaccinated individuals, helping to monitor immune status or guide vaccine design. phycoerythrin (pe)-labeled streptavidin after stringent washing. additionally, a non-specific nb (gfp-nb, negative control) and two inhibiting mouse antibodies (positive controls) were analyzed. data obtained by this multiplex binding assay showed that of the analyzed nbs inhibit ace binding to isolated rbd, s domain and homotrimeric spike. ic values calculated for inhibition of the ace :rbd interaction range between . nm for nm and nm for nm (figure ) . notably, ic values obtained for the most potent inhibitory nbs nm ( . nm), nm ( . nm) and nm ( . nm) are highly comparable to ic values measured for the mouse iggs (mm : . nm; mm : . nm). additionally, the assay revealed that all nbs except nm show a similarly strong inhibitory effect on ace binding to all tested antigens. nm seems to exclusively inhibit the rbd:ace interaction and does not prevent binding of ace to either the homotrimeric spike or the s domain. after identifying rbd-specific nbs which have an inhibitory effect on ace binding, we investigated the relative location of their epitopes within the rbd. first, we performed epitope binning experiments of nb combinations using biolayer interferometry. after coating sensors with biotinylated rbd, a nb was loaded until binding saturation was reached, followed by a short dissociation step to remove excess nb. a second nb from a different family was then exposed to the rbd-nb-complex. using this approach, we identified nbs which recognize overlapping and non-overlapping epitopes on rbd (figure , supplementary figure ) . as expected, nbs with only minor differences in their cdr (nm , nm and nm , nb-set ) were suggested to recognize an identical or highly similar epitope, as they cannot bind simultaneously to rbd. 
our analysis revealed that nbs with highly diverse cdr s such as nm , nm , nm and nm could not bind simultaneously, suggesting that these nbs recognize similar or at least overlapping epitopes. as a result, we clustered these diverse nbs in nb-set . overall, we identified five distinct nbs-sets, comprising at least one candidate targeting a different epitope within the rbd compared to any member of a different nb-set (figure ) . next, we performed hydrogen-deuterium exchange mass spectrometry (hdx-ms) with the most potent inhibitory nbs selected from the different nb-sets. this allowed us to more precisely locate their binding sites at the surface of rbd and compare with the rbd:ace interface. both members of nb-set , nm and nm , interacted with the rbd at the back/ lower right site (back view, figure ) . notably, the binding site of nm does not encompass amino acid residues involved in the rbd:ace interface. in contrast, nm (nb-set ) as well as nm (nb-set ) contacted the rbd at amino acid residues overlapping with the rbd:ace binding interface, whereas nm additionally covers parts of the spike- like loop region on one edge of the ace interface at the top front/ lower left side (front view, which did not contact any amino acid residues involved in the rbd:ace interface but rather binds to the opposite site (front view, figure ). comparing the data from epitope binning with the hdx-ms results, provides structural insights into the mechanism by which non- competing pairs of nbs can simultaneously bind the rbd. interestingly, the combination of nm (nb-set ) with nm (nb-set ) shows near complete coverage of the ace interface (figure ) whereas the observed inhibitory effect of nm might be due to steric hindrance. from these findings, we proposed that the combination of nb-set with nb-set might act synergistically on the inhibition of the interaction between rbd and ace . 
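The grouping of binders into sets from pairwise competition data, as described above, can be sketched as a small clustering step. This is a simplified illustration, assuming that any two nbs that cannot bind simultaneously belong to the same set and that sets are formed transitively; the actual assignment in the study also drew on sequence information and hdx-ms, and the binder names below are hypothetical.

```python
def bin_epitopes(binders, competing_pairs):
    """Group binders into sets; binders linked by competition share a set.

    binders: list of binder names.
    competing_pairs: iterable of (a, b) pairs that cannot bind simultaneously.
    Returns a sorted list of sorted groups.
    """
    parent = {b: b for b in binders}

    def find(b):
        # Follow parent links to the set representative, compressing the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for a, b in competing_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for b in binders:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical binders and competition results from a binning experiment.
    names = ["NbA", "NbB", "NbC", "NbD"]
    competition = [("NbA", "NbB"), ("NbB", "NbC")]
    print(bin_epitopes(names, competition))
```

Transitive grouping can merge binders whose epitopes only partially overlap, so the resulting sets are a starting point that structural data would need to refine.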
after identification of nbs which inhibit the rbd:ace interaction biochemically, we employed a cell-based viral infection assay to test for their neutralization potency. to this end, human caco- cells were co-incubated with the icsars-cov- -mng strain and serial dilutions of the inhibitory nbs nm , nm , nm and nm . h post-infection neutralization potency was determined via automated fluorescence-microscopy of fixed and nuclear-stained cells (supplementary figure ) . percentage of the infection rate following nb treatment normalized to a non-treated control was plotted and ic values were determined via sigmoidal inhibition curve fits. overall, data obtained from the multiplex binding assay and the viral infection assay were broadly consistent. representatives of nb-set , nm and nm , showed the highest neutralization potency with ic values of ~ nm and ~ nm followed by nm (~ nm) and nm (~ nm). as expected, nm (nb-set ) was not found to considering that nbs targeting diverse epitopes within the rbd:ace interface are beneficial in both reducing viral infectivity and preventing mutational escape, we next combined the most potent inhibitory and neutralizing candidates derived from nb-set (nm , nm ) and nb-set (nm ) and examined their response in both the multiplex binding assay and viral infection assay. in the multiplex binding assay the combination of nm and nm showed an increased effect in competing with ace binding to rbd illustrated by a ic of . nm which is -or -fold lower compared to treatment with individual nm or nm , respectively (figure a) . notably, the ic measured for the combination of nm and nm did not exceed the ic identified for nm alone indicating that nm by its own has a very high inhibiting effect (figure a). when we tested both combinations in the viral infection assay, we observed significantly improved effects in both as illustrated by an ic of ~ nm for the combination nm and nm and ~ . nm for nm and nm (figure b, supplementary figure ). 
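The sigmoidal inhibition curves used above to derive ic values can be illustrated with a minimal sketch of a four-parameter logistic model. It is not the fitting routine used by the authors: instead of a least-squares fit, the helper below estimates the half-maximal concentration by log-linear interpolation between the two bracketing data points. All names, parameter values and concentrations are hypothetical.

```python
import math


def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic response for an inhibitor concentration."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def ic50_by_interpolation(concs, responses, half=50.0):
    """Estimate the concentration giving the `half` response.

    Interpolates linearly in log10(concentration) between the two
    neighbouring points that bracket `half`; returns None if no pair does.
    """
    pairs = sorted(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pairs, pairs[1:]):
        if (r_lo - half) * (r_hi - half) <= 0 and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


if __name__ == "__main__":
    # Simulated normalized infection rates (% of untreated control).
    concs = [1.0, 4.0, 25.0, 100.0]
    rates = [four_pl(c, 0.0, 100.0, 10.0, 1.0) for c in concs]
    print(ic50_by_interpolation(concs, rates))
```

With real, noisy data a proper nonlinear fit of all four parameters is preferable; the interpolation only gives a quick estimate when the dilution series brackets the half-maximal response.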
from these findings we conclude that a combinatorial treatment with two nbs targeting different epitopes within the rbd:ace interaction site is beneficial for viral neutralization. context, multiple studies have convincingly shown that neutralizing antibodies preferably bind to the rbd and sterically inhibit viral entry via ace , . from this, we can assume that our rbd nbs covering large parts of the rbd:ace interface might be suitable to monitor the emergence and presence of neutralizing antibodies in patients. to test this hypothesis, we set up a high-throughput competitive binding assay, termed neutrobodyplex, by combining our most potent neutralizing nb combinations with a recently developed, automatable multiplex immunoassay (figure a) . we incubated our previously generated color-coded beads comprising rbd, s domain or homotrimeric spike with serum samples from patients or non-infected individuals, in addition to dilution series of the combinations nm / nm or nm / nm , and used this to detect patient-derived iggs bound to the respective antigens. depending on the nb concentration, neutralizing antibodies targeting the rbd:ace interaction site within the serum samples are displaced, resulting in a reduction of the detectable signal (figure a) . when analyzing rbd-specific iggs from serum samples, we detected a distinct signal reduction in the presence of increasing nb concentrations for all tested samples (figure to further demonstrate that our approach is able to determine the presence of iggs targeting the rbd:ace interaction site in detailed resolution, we highlight here the effect of competing iggs could be observed when measuring binding to rbd; however, using the s domain as target antigen, distinct differences between both serum samples became visible. 
while # comprise a substantial fraction of iggs addressing the rbd:ace interface also presented by the s domain, in sample # iggs binding to additional epitopes of the s domain cover the detectable signal reduction derived from displaced iggs (figure for functional analysis we employed a recently developed in vitro multiplex binding assay to monitor the replacement of ace as the natural ligand from binding to rbd, s domain or homotrimeric spike upon addition of rbd-specific nbs. with this assay, we were able to identify inhibiting nbs targeting those spike-derived antigens. interestingly, ic values obtained for inhibitory nbs on rbd and homotrimeric spike show a higher correlation compared to ic values obtained for the s domain. based upon detailed epitope mapping, we grouped our nbs in different nb-sets. of those nb-sets, comprise inhibitory nbs which were shown to target different epitopes within the rbd:ace interaction site. we confirmed the neutralizing potency of those nbs in a cell-based viral infection assay using fully intact sars-cov- . through this, we noted that the measurable viral neutralization effect of the individual nbs strongly correlates to the data obtained from the biochemical screen, which demonstrates that the multiplex binding assay as presented is highly relevant and suitable to identify virus neutralizing binders. as a result, we modified our previously described multiplex immunoassay (multicov-ab, ) and developed a novel diagnostic test called neutrobodyplex to monitor the presence and the emergence of neutralizing antibodies in serum samples of sars-cov- infected individuals. using combinations of high affinity nbs covering the rbd:ace interface, we were able to directly and specifically displace iggs present in serum samples from these particular rbd epitopes. according to previous studies, human iggs addressing those epitopes were classified as neutralizing antibodies , , . 
in our neutrobodyplex, we further demonstrated that such neutralizing antibodies can be detected best using the rbd. larger expression constructs for bacterial expression of nbs, sequences were cloned into the phen vector , thereby adding a c-terminal xhis-tag for imac purification as described previously , . the pcaggs plasmids encoding the stabilized homotrimeric spike protein and the receptor binding domain (rbd) of sars-cov- were kindly provided by f. krammer . the cdna encoding the s domain (aa - ) of the sars-cov- spike protein was obtained by pcr amplification using the forward primer s _cov -for ´-ctt ctg gcg tgt gac cgg - ´ and reverse primer s _cov -rev ´ -gtt gcg gcc gct tag tgg tgg tgg with high-confidence identification (q-value ≤ . ) were included to the list. peptides with overlapping mass, retention time and charge in nb and antigen digest, were manually removed. the deuterated samples were recorded in ms mode only and the generated peptide list was imported into hdexaminer v . . (sierra analytics, modesto, ca, usa). deuterium uptake was calculated using the increase of the centroid mass of the deuterated peptides. hdx could be followed for % of the rbd amino acid sequence. the calculated percentage deuterium uptake of each peptide between rbd-nb and rbd-only were compared. any peptide with uptake reduction of % or greater upon nb binding was considered as protected. cell culture caco- (human colorectal adenocarcinoma) cells were cultured at °c with % co in dmem containing % fcs, mm l-glutamine, μg/ml penicillin-streptomycin and % neaa. results from bead-based multiplex ace competition assay are shown for the three sars- cov- spike-derived antigens, rbd, s and homotrimeric spike. ace bound to the respective antigen was detected. for each nb, a dilution series from . µm to . nm is shown in the presence of ng/ml ace . mfi signals were normalized to the maximal signal per antigen as given by the ace -only control. 
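The normalization step described above (mfi signals expressed relative to the maximal signal of the ace -only control, summarized over technical replicates) can be sketched as follows. The function names and example readings are hypothetical and only illustrate the arithmetic.

```python
import statistics


def normalize_mfi(mfi_values, control_mfi):
    """Express MFI readings as a percentage of the control (maximal) signal."""
    if control_mfi <= 0:
        raise ValueError("control signal must be positive")
    return [100.0 * value / control_mfi for value in mfi_values]


def mean_sd(values):
    """Mean and sample standard deviation of technical replicates."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Made-up replicate readings for one binder concentration and one control.
    replicates = [480.0, 500.0, 520.0]
    control = 1000.0
    normalized = normalize_mfi(replicates, control)
    print(normalized, mean_sd(normalized))
```

Normalizing per antigen keeps curves for different antigens on a common 0 to 100 scale, so inhibition can be compared across them before any curve fitting.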
ic values were calculated from a four-parametric sigmoidal model and are displayed for each nb and antigen. data is presented as mean +/- sd of three technical replicates (n = ). isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model structural basis for the recognition of sars-cov- by full-length human human neutralizing antibodies elicited by sars-cov- infection characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine nanobodies: natural single-domain antibodies structural basis for potent neutralization of betacoronaviruses by neutralizing nanobodies bind sars-cov- spike rbd and block interaction with ace an ultra-high affinity synthetic nanobody blocks sars-cov- infection by locking spike into an inactive conformation. biorxiv humanized single domain antibodies neutralize sars-cov- by targeting spike receptor binding domain. biorxiv an alpaca nanobody neutralizes sars-cov- by blocking receptor interaction. biorxiv affinity nanobodies block sars-cov- spike receptor binding domain interaction with human angiotensin converting enzyme. biorxiv, fast isolation of sub-nanomolar affinity alpaca nanobody against the spike rbd of sars-cov- by combining bacterial display and a simple single-step density gradient selection. biorxiv multivalent nanobody cocktails for highly efficient sars- a potent neutralizing nanobody against sars-cov- with inhaled delivery potential. biorxiv spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- . 
biorxiv tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus potent neutralizing antibodies against sars-cov- identified by high- throughput single-cell sequencing of convalescent patients' b cells a neutralizing human antibody binds to the n-terminal domain of the spike protein of sars-cov- a serological assay to detect sars-cov- seroconversion in humans going beyond clinical routine in sars-cov- antibody testing -a multiplex corona virus antibody test for the evaluation of cross-reactivity to endemic coronavirus antigens. medrxiv quantum dot-conjugated sars-cov- spike pseudo-virions enable tracking of angiotensin converting enzyme binding and endocytosis evaluation of nine commercial sars-cov- immunoassays convergent antibody responses to sars-cov- in convalescent individuals a translational multiplex serology approach to profile the prevalence of anti-sars-cov- antibodies in home-sampled blood. medrxiv a high-throughput neutralizing antibody assay for covid- diagnosis and vaccine evaluation speed up to find the right ones: rapid discovery of functional nanobodies a sars-cov- surrogate virus neutralization test based on antibody- mediated blockage of ace -spike protein-protein interaction selection and identification of single domain antibody fragments from camel heavy- chain antibodies modulation of protein properties in living cells using nanobodies a versatile nanotrap for biochemical and functional studies with fluorescent fusion proteins targeting and tracing antigens in live cells with fluorescent nanobodies sars-cov- seroconversion in humans: a detailed protocol for antigen production, and test setup deuterium exchange mass spectrometry to study protein complexes optimization of feasibility stage for hydrogen/deuterium structure of the sars-cov- spike receptor-binding domain bound to the ace receptor s domain or homotrimeric spike of sars-cov- was incubated with nb combinations (concentrations ranging from . 
µm to . nm for each nb) and serum samples of convalescent sars-cov- patients and healthy donors at a : dilution. as positive control and maximal signal detection per sample, serum only was included and as negative control for nb binding a sars-cov- -unspecific gfp nanobody ( . µm) was used. to compare nb performance, the inhibiting mouse antibody ( -mm ) was added in concentrations of . µm to . nm. bound serum iggs were detected via anti-human-igg-pe as previously and fragments mass tolerance were set to ppm and . da, respectively. no enzyme selectivity was applied, however, identified peptides were manually evaluated to exclude peptides originated through cleavage after arginine, histidine, lysine, proline and the residue key: cord- -bhjw xc authors: olaleye, omonike a.; kaur, manvir; onyenaka, collins; adebusuyi, tolu title: discovery of clioquinol and analogues as novel inhibitors of severe acute respiratory syndrome coronavirus infection, ace and ace - spike protein interaction in vitro date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bhjw xc severe acute respiratory syndrome coronavirus (sars-cov- ), the etiological agent for coronavirus disease (covid- ), has emerged as an ongoing global pandemic. presently, there are no clinically approved vaccines nor drugs for covid- . hence, there is an urgent need to accelerate the development of effective antivirals. here in, we discovered clioquinol ( -chloro- -iodo- -quinolinol (clq)), a fda approved drug and two of its analogues ( -bromo- -chloro- -hydroxyquinoline (clbq ); and , -dichloro- -hydroxyquinoline (clcq)) as potent inhibitors of sars-cov- infection induced cytopathic effect in vitro. in addition, all three compounds showed potent anti-exopeptidase activity against recombinant human angiotensin converting enzyme (rhace ) and inhibited the binding of rhace with sars-cov- spike (rbd) protein. 
clq displayed the highest potency in the low micromolar range, with its antiviral activity showing strong correlation with inhibition of rhace and rhace -rbd interaction. altogether, our findings provide a new mode of action and molecular target for clq and validates this pharmacophore as a promising lead series for clinical development of potential therapeutics for covid- . shedding of s and transition of the s subunit to expose a hydrophobic fusion peptide , , . the initial priming at s /s boundary promotes subsequent cleavage at the s site by host proteases, which is critical for membrane fusion and viral infectivity , , . therefore, targeting the interaction between human ace receptor and the rbd in s protein of sars-cov- could serve as a promising approach for the development of effective entry inhibitors for potential prevention and/or treatment of covid- . in this study, we evaluated the effect of clq, and two of its analogues ( -bromo- - thereafter, the solution was discarded and the plate was washed consecutively four times with µl x wash buffer, followed by the addition of the detection antibody (anti-ace goat antibody). the reaction was allowed to go on for hr at room temperature ( °c) with shaking at rpm. then, the solution was discarded and the wash step was repeated as described above. next, the hrp-conjugated anti-goat igg was added to each well, and the reaction plate was further incubated for hr at room temperature ( °c) with shaking at rpm. again, the solution was discarded and the wash step was repeated as described above. then, µl of , ', , '-tetramethylbenzidine (tmb) one-step substrate was added to each well, and reaction compared to its counterparts, clbq exhibited the highest maximum inhibition at about . % inhibition at µm (table ). in addition, we compared the antiviral effects of clbq table ). these results suggest a potential new mechanism of action for clq and its congeners. 
notably, this is the first report, to our knowledge, revealing that clq and its analogues effectively inhibit the novel sars-
we determined the preliminary cytotoxicity of clq and its analogues (clbq and clcq) using a cell titer-glo luminescent cell viability assay . we assessed the cytotoxic effects of the various compounds in vero e cells and observed that the % cytotoxic concentration (cc ) values of clq and its derivatives were all greater than µm. however, in comparison to the other reference compounds tested, clq and its analogues displayed lower percent minimum viability at higher concentrations. on the other hand, we observed similar percent maximum viability for the clq pharmacophore and the other reference compounds at lower concentrations (table ). this suggests that cytotoxic effects may not be a concern at lower concentrations of clq and its analogues. additional concentrations need to be tested in future studies to determine the actual cc value (table ).
we determined the effect of clq, clbq and clcq on the exopeptidase activity of rhace using an adapted fluorometric assay (https://bpsbioscience.com/pub/media/wysiwyg/ .pdf). we found that all three compounds inhibited rhace activity with similar ic values in the low micromolar range, with clq being the most potent of the three analogues tested, with an ic of . µm (table ). to our knowledge, these results reveal for the first time that rhace is a biochemical target of clq and its analogues. because the known metal cofactor for ace is zinc , , using the same fluorometric assay described above in the methods
activity and its interaction with spike protein. in this study, we also compared the dose-response curves of antiviral effects of clq and its analogues with five other known inhibitors of sars-cov-
found that clq's potency was comparable to or better than that of aloxistatin, but that it had lower efficacy than the other reference inhibitors (table ).
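as a hedged illustration of how a half-maximal concentration (ic50 or cc50) can be read off a dose-response series like the ones summarized above, the sketch below interpolates on a log-concentration axis between the two tested concentrations that bracket 50% remaining activity. this is not the authors' curve-fitting procedure, which the text does not describe; the function names, the four-parameter logistic helper and the example concentrations are assumptions made only for illustration. when the response never crosses 50%, as reported for the cc values of clq and its analogues, the function returns None and the value can only be stated as greater than the highest concentration tested.

```python
import math


def hill_response(conc_um, ic50_um, hill_slope=1.0, top=100.0, bottom=0.0):
    """Percent activity remaining at conc_um under a four-parameter logistic model.

    Illustrative helper used only to generate synthetic data.
    """
    return bottom + (top - bottom) / (1.0 + (conc_um / ic50_um) ** hill_slope)


def interpolate_ic50(concs_um, activities_pct, half=50.0):
    """Estimate the concentration giving `half` percent activity.

    Uses log-linear interpolation between the two neighbouring data points
    that bracket `half`. Concentrations must be positive. Returns None when
    the series never crosses `half`.
    """
    points = sorted(zip(concs_um, activities_pct))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= half >= a_hi and a_lo != a_hi:
            frac = (a_lo - half) / (a_lo - a_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Synthetic example (hypothetical values, not data from the study).
concs = [0.1, 1.0, 10.0, 100.0]
activity = [hill_response(c, ic50_um=5.0) for c in concs]
estimate = interpolate_ic50(concs, activity)

# A series that stays above 50% viability yields no estimate.
no_crossing = interpolate_ic50([1.0, 10.0, 100.0], [95.0, 90.0, 80.0])
```

with only four widely spaced concentrations the interpolated value differs from the true midpoint of the synthetic curve, which is one reason a denser dilution series or a fitted model is usually preferred.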
it is important to note that the vero e cells used for the sars-cov- infection-induced cpe assay were first sorted by flow cytometry by sri to select cells with higher levels of ace expression and so increase the efficiency of infection.
(table ). this suggests that cytotoxic effects may not be a concern at lower concentrations of clq and its analogues. in addition, the observed ic values for inhibition of rhace exopeptidase activity and rhace -rbd interaction were in the low micromolar range.
recent studies have also shown that ace plays a key role in protecting the lungs from ards , , a severe complication of covid- disease . therefore, one has to proceed cautiously when targeting ace , without permanently inactivating its exopeptidase or other cellular functions, to avoid potential adverse effects on heart and/or lung function. our lead compound clq is a weak metal chelator and zinc ionophore that can shuttle free zinc across the
function and prevent its interaction with sars-cov- rbd protein, without permanently inhibiting its essential exopeptidase function. because rhace is a novel host target for clq and its analogues, the potential effect of clq inhibition on heart and lung function needs to be further explored in in vivo and pre-clinical studies.
the crystal structure of full length human ace revealed that the rbd on sars-cov- s binds directly to the metallopeptidase domain (mpd) of ace receptor , , that consists of amino acid residues that coordinates zinc, providing further support for the utility of zinc chelators the strengths of our study includes, the use of a rapid multi-prong approach via three
activity of clioquinol (clq) and analogues against ace exopeptidase activity and ace and sars-cov- spike (rbd) protein interaction the authors declare no competing interests.
key: cord- - vaarrji authors: gauttier, v.; morello, a.; girault, i.; mary, c.; belarif, l.; desselle, a.; wilhelm, e.; bourquard, t.; pengam, s.; teppaz, g.; thepenier, v.; biteau, k.; de barbeyrac, e.; kiepferlé, d.; vasseur, b.; le flem, fx.; debieuvre, d.; costantini, d.; poirier, n. title: tissue-resident memory cd t-cell responses elicited by a single injection of a multi-target covid- vaccine date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vaarrji
the covid- pandemic is caused by severe acute respiratory syndrome coronavirus- (sars-cov- ) which enters the body principally through the nasal and larynx mucosa and progress to the lungs through the respiratory tract.
sars-cov- replicates efficiently in respiratory epithelial cells, motivating the development of alternative and rapidly scalable vaccines that induce protective and long-lasting mucosal immunity. we have previously developed an immunologically optimized multi-neoepitope-based peptide vaccine platform which has already demonstrated tolerance and efficacy in hundreds of lung cancer patients. here, we present a multi-target cd t cell peptide covid- vaccine design targeting several structural (s, m, n) and non-structural (nsps) sars-cov- proteins with selected epitopes in conserved regions of the sars-cov- genome. we observed that a single subcutaneous injection of a series of epitopes induces robust immunogenicity in vivo as measured by ifnγ elispot. upon tetramer characterization we found that this series of epitopes induces a high proportion of virus-specific cd t cells expressing cd , cd , cxcr and cd a, the specific phenotype of tissue-resident memory t lymphocytes (trm). finally, we observed broad cellular responses, as characterized by ifnγ production, upon restimulation with structural and non-structural protein-derived epitopes using blood t cells isolated from convalescent asymptomatic, moderate and severe covid- patients. these data provide insights for further development of a second generation of covid- vaccines focused on inducing lasting protection by th -biased memory cd t cell sentinels, using immunodominant epitopes naturally observed after sars-cov- infection resolution.
statement of significance
humoral and cellular adaptive immunity are different and complementary immune defenses engaged by the body to clear viral infection. while neutralizing antibodies have the capacity to block virus binding to its entry receptor expressed on human cells, memory t lymphocytes have the capacity to eliminate infected cells and are required for viral clearance.
however, viruses evolve quickly, and their antigens are prone to mutations that avoid recognition by antibodies (a phenomenon named ‘antigenic drift’). this limitation of antibody-mediated immunity could be addressed by t-cell mediated immunity, which is able to recognize conserved viral peptides from any viral protein presented by virus-infected cells. thus, by targeting several proteins and conserved regions of the genome of a virus, t-cell epitope-based vaccines are less susceptible to mutations and may work effectively on different strains of the virus. we designed a multi-target t cell-based vaccine containing epitope regions optimized for cd + t cell stimulation that would drive long-lasting cellular immunity with high specificity, avoiding undesired effects such as antibody-dependent enhancement (ade) and antibody-induced macrophage hyperinflammation that could be observed in subjects with severe covid- . our in vivo results showed that a single injection of selected cd t cell epitopes induces memory viral-specific t-cell responses with a phenotype of tissue-resident memory t cells (trm). trm have attracted growing interest for developing vaccination strategies since they act as immune sentinels in barrier tissues such as the respiratory tract and the lung. because of their localization in tissues, they are able to immediately recognize infected cells and, because of their memory phenotype, they rapidly respond to viral infection by orchestrating local protective immune responses to eliminate pathogens. lastly, such a multiepitope-based vaccination platform uses robust and well-validated synthetic peptide production technologies, so the vaccine can be rapidly manufactured in a distributed manner.
covid- , the infectious disease caused by the zoonotic coronavirus sars-cov- , is a global pandemic which has infected more than
immune responses against this type of respiratory viruses but that antibody responses are short-term [ ] [ ] [ ] [ ] [ ] in contrast to the cellular immunity which is still observed and years after the infection , .
current and previous covs vaccine strategies have been almost exclusively focused on eliciting a humoral immune response, particularly anti-spike neutralizing igg antibody. however, the generation of non-neutralizing antibody responses , insufficient antibody titers , th -biased immune response or glycosylation changes in the igg fc tail may be associated with vaccine failure, and in the worst case scenario may enhance disease upon viral exposure, either through the induction of enhanced pulmonary macrophage-mediated hyper-inflammation , or fc receptor-mediated antibody-dependent enhancement (ade) . th -biasing immunization using cd t cells optimized peptide vaccination may offer an important alternative and complementary approach with a history of safe administration, may be developed and updated rapidly, and should avoid safety pitfalls in the pursuit of a covid- vaccine , , . the discovery of memory t lymphocytes resident in diverse tissues, in particular mucosal and barrier tissues, has highlighted the importance of site-specific responses and continuous surveillance mediated by a specific tissue-resident memory t cell population (trm) [ ] [ ] [ ] . a couple of studies demonstrated that after induction trm migrate and reside in the lung, skin, and gut long after infection resolution and provide localized protective immunity and immunosurveillance in tissues [ ] [ ] [ ] [ ] . trm represents an attractive target population and a growing interest for developing vaccine strategies since they act as immune sentinels in mucosal and barrier tissue and rapidly respond to infection by orchestrating local protective immune responses to eliminate pathogens . previous cd t cell-based vaccines against sars-cov- and influenza a viruses showed lasting virus-specific memory cd t cells induction in the spleen, lung and bronchoalveolar fluids (bals) and protection of mice from lethal sars-cov- or influenza challenges. 
expression of chemokine receptors, such as cxcr , is critical for cd + trm to populate the airways after vaccination and for protection against influenza a viruses . more recently, a trm-inducing hiv vaccine durably prevented mucosal infection in non-human primates even with lower neutralizing antibody titers . altogether, these data emphasize the value of generating tissue-resident memory viral-specific cd t cells upon vaccination for optimal protection against airway viruses. stimulation of a proper immune response that leads to protection is highly dependent on presentation of epitopes to circulating t-cells via the hla complex. memopi ® is a robust vaccine platform based on selection and immunogenicity optimization of hla-restricted peptides (neo-epitope) technologies , and formulation for multi-epitopes and targets combination with a pan-dr helper epitope (padre) providing help for memory cd t cell generation . a combination of multi-target antitumor neo-epitopes, tedopi ® , based on this platform already demonstrated a good safety profile and efficacy in a clinical phase trial and was more recently successfully validated in the first step of a phase clinical testing in lung cancer patients . while induction of mucosal immunity using parenteral administration of conventional virus vaccine technologies was challenging, we observed that subcutaneous injection of our neo-epitope multi-target cancer vaccine promotes th -biased antigen-specific memory cd t cell responses in the lung and bals of vaccinated mice in the absence of tumor.
based on this original preclinical observation and the significant survival increase measured during clinical trials in lung cancer patients correlating with epitope responses, we generated and screened individually a large number of immuno-dominant sars-cov- epitopes, and their neo-epitopes generated using artificial intelligence (ai) algorithms, covering all sequenced circulating sars-cov- strains and derived from structural (including spike) and non-structural proteins with significant homology with previous sars-cov- virus. previous research in sars-cov- suggests that the structural spike (s) protein is one of the main antigenic component responsible for inducing the host immune responses orf a/b non-structural proteins (nsp , nsp , nsp , nsp , nsp , nsp , nsp ) (figure and table ). based on our knowledge of key fixed-anchor positions to enhance hla binding and increase their immunogenicity potential, we designed mutated sequences for each individual peptide resulting in total analyzed sequences of the selected epitopes. we first screened these potential peptides using in-silico bioinformatic analyses (e.g. iedb immune epitope database, netmhcpan el . algorithm) and a first series of the most optimized mutant for each epitope was selected (neo-epitopes a). in parallel, in silico hla-a* peptide docking models were generated using computational tools and analyzed using newly europe, then spread worldwide and became the most prevalent form , . we eliminated t cell epitopes with recurrent mutation and homoplasic site in order to cover all circulating sars-cov- strains and anticipate future evolution of the virus in hotspot mutation regions. wt and mutated peptides (neoepitopes a and b) were produced using synthetic peptide synthesis (proteogenix, france). 
hla-a binding property characterization at °c, using uv peptide exchange assay on hla-a* monomer, showed that the majority of selected wt epitopes binds to hla-a with good efficacy (figure a ) as compared to our memopi ® internal positive neoepitope control (mutated peptide with increased hla-a* binding and in-vivo immunogenicity). hla-a binding was increased with several neoepitopes a and/or b, particularly when the corresponding wt peptide showed weak (< %: figure a) . similarly, broad immunogenicity response was observed by hla-a* -tetramer flow cytometry analyses to a higher number of epitopes since out of ( %) evaluated peptides, derived from out of the selected proteins, exhibited significant frequency ( . - %) of viral-specific cd t cells ( figure b ). phenotypic characterization of tetramer+ cells showed that out of ( %) positive peptides elicited viral-specific cd t cells with mainly a trm phenotype: co-expressing the memory marker cd , the cd αe integrin, the th -biased cxcr chemokine receptor and to a lesser extent the cd a α integrin (figure ) . altogether, these data showed that optimized peptide vaccination against selected sars-cov- epitopes elicits robust and broad th -biased immunogenicity against several structural (s, m, n) and non-structural proteins in hla-a expressing mice and that several peptides induce viral-specific memory cd t cells displaying all characteristics of t lymphocyte sentinels in barrier tissues. 
in order to identify and select naturally immunodominant sars-cov- cd t cell epitopes, peripheral blood mononuclear cells (pbmc) from asymptomatic and moderate or severe covid- patients with a previously confirmed (at least one month before sampling) and
altogether, when we compared convalescent sars-cov- individuals to unexposed healthy donors, we identified significantly different cd t cell immunodominant epitopes against structural proteins (s, m, n), accessory factor (orf a) and non-structural proteins (nsp , nsp , nsp , nsp , nsp , nsp , nsp ). of these epitopes are of particular interest for vaccination since they were also able to elicit in vivo immunogenicity (elispot response) against all structural and non-structural sars-cov- proteins after a single peptide injection. finally, we selected a combination of cd t cell epitopes based on manufacturing facilities, hla-i coverage, previous covs homology and sars-cov- protein diversity considerations (table ). these epitopes covered the selected proteins, epitope/protein excepting spike for which epitopes (including rbd epitope) have been selected. bioinformatic analyses illustrate that these epitopes are not restricted only to the hla-a* allele, and are predicted (netmhcpan score < ) to bind efficiently to different hla-i (a, b, c) alleles with high genetic coverage in all geographical regions of the world. despite hla polymorphism and different worldwide hla-i distribution, the combination of these t cell epitopes should induce at least to positive peptide responses in all individuals globally and achieve the - % 'herd immunity' threshold with at least to positive peptide responses in each geographical region (table ).
here we report a differentiated sars-cov- vaccine design based on memory t-cell induction technology.
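the population-coverage reasoning above can be sketched as follows. this is a simplified illustration and not the iedb tool or the authors' analysis: it assumes hardy-weinberg proportions at each hla-i locus and independence between loci (linkage disequilibrium is ignored), and the allele frequencies shown are hypothetical placeholders rather than values from the study. under these assumptions, a person is covered at a locus if at least one of their two alleles is among those predicted to bind a vaccine epitope.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one covered allele at one locus.

    allele_freqs: population frequencies of the alleles predicted to bind at
    least one epitope. Assumes Hardy-Weinberg proportions (illustrative
    assumption, not stated in the source).
    """
    covered = min(sum(allele_freqs), 1.0)
    return 1.0 - (1.0 - covered) ** 2


def combined_coverage(loci):
    """Coverage across several loci, assuming the loci are independent."""
    uncovered = 1.0
    for freqs in loci:
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


# Hypothetical frequencies for covered alleles at the A, B and C loci.
example_region = [[0.25, 0.10], [0.15], [0.20, 0.05]]
example_coverage = combined_coverage(example_region)
```

the independence assumption tends to overstate coverage when covered alleles at different loci are inherited together, so a haplotype-aware calculation would be needed for a precise regional estimate.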
using sequence design through a reverse vaccinology selection approach based on previous covs knowledge of immunodominant epitopes and computational immunology optimization, we developed a combination of cd t cell synthetic peptides originating from sars-cov- structural and non-structural proteins, capable of covering hla polymorphism with high coverage globally and of inducing immunogenicity to different proteins independently of hla allele expression. these epitopes are naturally immunogenic after sars-cov- infection in recovered individuals and, for most of them, elicit a specialized sub-population of viral-specific memory cd t cells with a tissue-resident phenotype, hence capable of migrating to, staying attached to and patrolling airway barrier tissues.
vaccine-induced t cell entry to the lung mucosal compartments [ ] [ ] [ ] [ ] [ ] . here we showed that, as previously observed with our memopi ® -based neoepitope cancer vaccine approach, several peptides induced viral-specific cd t cells expressing the e (cd ) and α (cd a) integrins and the cd memory marker, altogether characteristic of trm. these cells also express the cxcr marker of chemoattraction, which is also a surrogate marker of th cd t cells since cxcr is transactivated directly by the th master gene t-bet
animal housing and procedures were conducted according to the guidelines of the french agriculture ministry and were approved by the regional ethical committee (apafis .
t-cell wt and mutated peptide binding to hla-a was evaluated using the flex-t hla-a* : monomer ultraviolet (uv) exchange assay according to the manufacturer's recommendations (biolegend, san diego, usa). hla-a* : monomers ( µg/ml) were exposed to a -nm uv lamp in the presence or absence of µm of peptide. after uv exposure, hla-peptide complexes were incubated at °c for min to promote unfolding of peptide-free hla molecules.
the stability of hla-peptide complexes was detected by elisa with β microglobulin coated antibodies and incubation of ng/ml of complexes for h at room temperature under shaking conditions. avidin-hrp was used to reveal stable biotinylated hla-peptide complexes and absorbance was monitored at nm. data are expressed as percentage of binding relative to a memopi ® internal positive control neoepitope.
a visualization tool was used to determine t-cell and b-cell epitope location in sars-cov- genomes according to single nucleotide polymorphisms (snps) and homoplasic sites (https://macman .shinyapps.io/ugi-scov -alignment-screen/)
all subjects were enrolled in the covepit-
were not considered as an exclusion criterion. pbmc were isolated after a ficoll density-gradient centrifugation and a red blood cell lysis. hla-a phenotyping was performed by flow cytometry (clone bb . , bd bioscience). the ex vivo stimulation protocol was adapted from a previously described protocol
continuous variables were expressed as the mean ± sem, unless otherwise indicated, and raw data were compared with nonparametric tests: mann-whitney for groups or kruskal-wallis with dunn's comparison when the number of groups was > . p values of < . were considered statistically significant. all statistical analyses were performed on graphpad software (graphpad software, san diego, ca).
this work was supported by funding from nantes metropole as part of the metropolitan fund to support health innovations linked to the covid- health crisis. we thank clinicians and patients involved in the covepit- trial as well as the exystat company for biometric expertise and data management for the covepit- trial.
figure : t-cell epitopes location in sars-cov- genome. sars-cov- genome annotation by the krogan lab and schematic representation of t-cell epitope location in each encoded protein. n= structural proteins; n= non-structural proteins (nsps); n= accessory factors.
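for readers unfamiliar with the two-group nonparametric comparison named in the statistics paragraph above, the following sketch computes the mann-whitney u statistic from raw values using average ranks for ties. it is an illustration only: the authors ran their analyses in graphpad, the function names here are hypothetical, and the sketch stops at the statistic and does not compute a p value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); the smaller of the two is the usual test statistic."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    return u_a, n_a * n_b - u_a


# Hypothetical spot counts for two groups (not data from the study).
u_exposed, u_unexposed = mann_whitney_u([12, 15, 20, 31], [3, 5, 12, 8])
```

a u value near zero for one group means almost all of its observations rank below those of the other group; converting u to a p value requires either exact tables or a normal approximation, which is what statistical packages provide.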
hla-a binding characterization of wt and mutated t-cell epitopes. (a) wt and mutated peptides were incubated with hla-a* : monomer, exposed to uv for peptide exchange and then hla-peptide complexes stability at °c was measured by elisa. data are mean ± sem (n= ) expressed as percentage of binding relative to an internal memopi ® positive control neoepitope. (b) wt and mutated peptides ( µm) binding to tap-deficient human cell line (t ) expressing hla-a . data are expressed as percentage of binding relative to an internal memopi ® positive control neoepitope. x mark indicates that the peptide was not tested in the assay.
figure a ). medium and high response threshold were defined based on -fold or -fold increase respectively compared to the background frequency measured in the non-vaccinated mice control group. controls+ are memopi ® peptides with previously validated immunogenicity. data are mean ± sd of pooled female (n= ) and pooled male (n= ) vaccinated mice. medium level: two-fold background. high level: three-fold background.
ifnγ secretion responses for hours after one week of restimulation of human pbmc from unexposed hla-a + healthy donors (n= ), asymptomatic confirmed covid- hla-a + individuals (n= ) and moderate or severe covid- hla-a + convalescent patients (n= ) with each of the isolated peptides and hla-a + antigen-presenting cells. data were normalized to negative control peptides. data are expressed as mean ± min to max. *p< .
table : t-cell wt epitopes hla-i alleles numbers and regional hla-i coverage were determined using iedb public database and netmhcpan score < .
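the fold-over-background thresholds given in the legend above (medium: two-fold background; high: three-fold background) can be expressed as a small classifier. this is a sketch under stated assumptions: the function name and the "negative" label are hypothetical, and whether the authors treated a response exactly at a threshold as inclusive is not stated, so the inclusive comparison used here is an assumption.

```python
def classify_response(frequency, background, medium_fold=2.0, high_fold=3.0):
    """Label a tetramer or elispot response by its fold increase over background.

    frequency: response measured in vaccinated animals.
    background: response measured in the non-vaccinated control group.
    Thresholds default to the two-fold and three-fold levels in the legend;
    inclusive comparison (>=) is an assumption.
    """
    if background <= 0:
        raise ValueError("background frequency must be positive")
    fold = frequency / background
    if fold >= high_fold:
        return "high"
    if fold >= medium_fold:
        return "medium"
    return "negative"


# Hypothetical frequencies (percent of cd t cells), not data from the study.
labels = [classify_response(f, background=1.0) for f in (1.5, 2.5, 3.0)]
```

keeping the thresholds as parameters makes it straightforward to rerun the classification if a different background estimate or cut-off is preferred.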
performance preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies prioritization of sars-cov- epitopes using a pan-hla and global population inference approach a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- covid- vaccine candidates: prediction and validation of sars-cov- development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach personalized workflow to identify optimal tcell epitopes for peptide-based vaccines against in silico identification of vaccine targets for -ncov sars-cov- (covid- ) by the numbers a structural analysis of m protein in coronavirus assembly and morphology a sars-cov- protein interaction map reveals targets for drug repurposing gene of the month: the -ncov/sars-cov- novel coronavirus spike protein receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor angiotensin-converting enzyme is a functional receptor for the sars coronavirus structural basis of receptor recognition by sars-cov- structure of sars coronavirus spike receptor-binding domain complexed with receptor structural analysis of major species barriers between humans and palm civets for severe acute respiratory syndrome coronavirus infections recent advances in the vaccine development against middle east respiratory syndrome-coronavirus spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus selective and cross-reactive sars-cov- t cell epitopes in unexposed humans sars-cov- -reactive t cells in healthy donors and patients with covid- herd immunity thresholds for sars-cov- estimated 
from unfolding epidemics safety and immunogenicity of the chadox ncov- vaccine against sars-cov- : a preliminary report of a phase / , single-blind, randomised controlled trial immunogenicity and safety of a recombinant adenovirus type- -vectored covid- vaccine in healthy adults aged years or older: a randomised, double-blind an mrna vaccine against sars-cov- -preliminary report genomic diversity and hotspot mutations in , sars-cov- genomes: moving toward a universal vaccine for the "confined virus intranasal administration of rsv antigen-expressing mcmv elicits robust tissue-resident effector and effector memory cd + t cells in the lung lung-resident memory cd t cells (trm) are indispensable for optimal crossprotection against pulmonary virus infection vaccine-generated lung tissue-resident memory t cells provide heterosubtypic protection to influenza infection expression of chemokine receptors by lung t cells from normal and asthmatic subjects a small-molecule compound targeting ccr and cxcr prevents airway hyperresponsiveness and inflammation cxcr signaling is required for restricted homing of parenteral tuberculosis vaccine-induced t cells to both the lung parenchyma and airway lung tissue resident memory t-cells in the immune response to mycobacterium tuberculosis lung airway-surveilling cxcr hi memory cd + t cells are critical for protection against influenza a virus trm integrins cd and cd a differentially support adherence and motility after resolution of influenza virus infection cd a expression defines tissue-resident cd + t cells poised for cytotoxic function in human skin potently neutralizing and protective human antibodies against sars-cov- respiratory syncytial virus disease in infants despite prior administration of antigenic inactivated vaccine altered reactivity to measles virus atypical measles in children previously immunized with inactivated measles virus vaccines no evidence for increased transmissibility from recurrent mutations in sars-cov- 
spatially resolved analyses link genomic and immune diversity and reveal unfavorable neutrophil activation in melanoma key: cord- -aoti v authors: lupala, cecylia s.; kumar, vikash; li, xuanxuan; su, xiao-dong; liu, haiguang title: computational analysis on the ace -derived peptides for neutralizing the ace binding to the spike protein of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: aoti v the severe acute respiratory syndrome coronavirus (sars-cov- ), the causative agent of covid- , is spreading globally and has infected more than million people. it has been discovered that sars-cov- initiates entry into cells by binding to human angiotensin-converting enzyme (hace ) through the receptor binding domain (rbd) of its spike glycoprotein. hence, drugs that can interfere with the sars-cov- -rbd binding to hace could potentially inhibit sars-cov- from entering human cells. here, based on the n-terminal helix α of human ace , we designed nine short peptides that have potential to inhibit sars-cov- binding. molecular dynamics simulations of peptides in their free and sars-cov- rbd-bound forms allow us to identify fragments that are stable in water and have strong binding affinity to the sars-cov- spike proteins. the important interactions between peptides and rbd are highlighted to provide guidance for the design of peptidomimetics against sars-cov- . severe acute respiratory syndrome coronavirus (sars-cov- , also known as -ncov) caused the covid- , which has been declared by the world health organization to be a global pandemic. the covid- has caused over , fatalities (as of april th, ) with more was also investigated in another study, which aims to develop molecules that interfere with the binding of sars-cov- rbd to hace . their results showed that a -residue peptide (residues - ) of hace n-terminal helix was able to bind to the rbd with nanomolar affinity, comparable to that of full length hace . 
they also reported that a -residue peptide (residues - ) failed to bind to the sars-cov- rbd. in a computational study, a -residue peptide derived from s a q n t f , a , y f y d k , f h y , l d y , g y q , t , n q q , y l q m f y f , n forms. this observation suggests that those peptides prefer a helical conformation similar to the one they adopt in the full hace protein in solution, and that binding to the rbd induced conformational changes, which are more pronounced for shorter peptides, such as sif and sif . under the consideration of peptide stability, longer peptides are preferred according to the simulation results. the stability of the sif peptides was also measured by their helical content in the free and bound forms (table ). the peptides sif to sif showed high helical content (~ %) when they are in complex with the spike protein rbd. it is interesting to observe that longer peptides tend to maintain a stable helix to be effective in inhibiting hace binding. figure illustrates the dependency of binding energy on helical content, especially the helicity of free peptides (solid black circles in figure ) . among the strong binders, whose binding energies are lower than - kcal/mol, there is a shared sequence segment composed of the residues ( - ). 
the major difference for sif's with stronger binding affinity compared to the common sequence in all sif's are the residues of ( - coronavirus disease an emerging coronavirus causing pneumonia outbreak in china: calling for developing therapeutic and prophylactic strategies clinical features of patients infected with novel coronavirus in clinical characteristics of coronavirus disease in china review of the clinical characteristics of coronavirus disease (covid- ) a novel coronavirus from patients with pneumonia in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission angiotensin-converting enzyme is a functional receptor for the sars coronavirus a -amino acid fragment of the structural and functional basis of sars-cov- entry by using human ace trilogy of ace : a peptidase in the renin-angiotensin system, a sars receptor, and a partner for amino acid transporters angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor mice transgenic for human angiotensin-converting enzyme provide a model for sars coronavirus infection angiotensin-converting enzyme protects from severe acute lung failure a crucial role of angiotensin converting enzyme (ace ) in sars coronavirus-induced lung injury structure, function, and antigenicity of the sars-cov- spike cryo-em structure of the -ncov spike in the prefusion conformation computational simulations reveal the binding dynamics between human ace and the receptor binding domain of sars-cov- spike protein the first-in-class peptide binder to the sars-cov- spike protein inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace 
computational design of peptides to block binding of the sars-cov- spike protein to human ace identification of critical determinants on ace for sars-cov entry and development of a potent entry inhibitor a web-based graphical user interface for charmm optimization of the additive charmm all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ and χ dihedral constraint solver for molecular simulations gromacs: fast, flexible, and free vmd: visual molecular dynamics ucsf chimera -a visualization system for exploratory research and analysis the authors declare no competing interests. key: cord- -w a ec authors: sen, sanjana; sanders, emily c.; gabriel, kristin n.; miller, brian m.; isoda, hariny m.; salcedo, gabriela s.; garrido, jason e.; dyer, rebekah p.; nakajima, rie; jain, aarti; santos, alicia m.; bhuvan, keertna; tifrea, delia f.; ricks-oddie, joni l.; felgner, philip l.; edwards, robert a.; majumdar, sudipta; weiss, gregory a. title: predicting covid- severity with a specific nucleocapsid antibody plus disease risk factor score date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w a ec effective methods for predicting covid- disease trajectories are urgently needed. here, elisa and coronavirus antigen microarray (covam) analysis mapped antibody epitopes in the plasma of covid- patients (n = ) experiencing a wide-range of disease states. the experiments identified antibodies to a -residue epitope from nucleocapsid (termed ep ) associated with severe disease, including admission to the icu, requirement for ventilators, or death. importantly, anti-ep antibodies can be detected within six days post-symptom onset and sometimes within one day. furthermore, anti-ep antibodies correlate with various comorbidities and hallmarks of immune hyperactivity. we introduce a simple-to-calculate, disease risk factor score to quantitate each patient’s comorbidities and age. for patients with anti-ep antibodies, scores above . 
predict more severe disease outcomes with a . likelihood ratio ( . % specificity). the results lay the groundwork for a new type of covid- prognostic to allow early identification and triage of high-risk patients. such information could guide more effective therapeutic intervention. the covid- pandemic has triggered an ongoing global health crisis. more than million confirmed cases and . million deaths have been reported worldwide as of october , ( ) . the virus that causes covid- , severe acute respiratory syndrome coronavirus (sars-cov- ), belongs to the same family of viruses responsible for respiratory illness linked to recent epidemics, including severe acute respiratory syndrome ( ) . the current and previous outbreaks suggest coronaviruses will remain viruses of concern for global health. 
many risk factors and comorbidities, including age, sex, hypertension, diabetes, and obesity, can influence covid- patient outcomes ( ) . analysis of patient immune parameters has linked disease severity to elevated levels of biomarkers for inflammation (c-reactive protein and cardiac troponin i), organ damage (aspartate aminotransferase, abbreviated ast, and hypoalbuminemia), immune hyperactivity (il- and il- ), and clotting (d-dimer) ( ) . mortality in covid- is often caused by multi-organ injury and severe pneumonia attributed to an excessive immune response, termed a cytokine storm ( ) . given the rapid and wide spectrum of covid- disease progression, a more precise prognostic linking disease risk factors and specific immune responses can potentially predict disease trajectories and guide interventions. one hypothesis to explain differences in severity of covid- implicates weakly binding, non-neutralizing antibodies (abs) to sars-cov- proteins ( ). however, the potential harm of these suboptimal abs in covid- patient outcomes remains ill-defined. furthermore, a recent review on antibody-dependent enhancement of sars-cov- stated, "at present, there are no known clinical findings, immunological assays or biomarkers that can differentiate any severe infection from immune-enhanced disease, whether by measuring antibodies, t cells or intrinsic host responses ( ) ." this conclusion inspired our study. sars-cov- encodes four major structural proteins: spike (s), nucleocapsid (n), membrane (m), and envelope (e). the s, n, and m proteins from sars elicit an ab-based immune response ( , ) . the ab response and its effects on disease progression in sars-cov- remain under investigation ( , ) . bioinformatics has predicted > ab binding epitopes from sars-cov- ( ) ( ) ( ) ( ) ( ) ( ) . the epitopes for n, m or e proteins are less well-characterized than for s protein. several studies have reported comprehensive epitope mapping of the antibody response to sars-cov- ( ) ( ) ( ) . 
here, we sought to characterize epitopes from sars-cov- and their correlations with disease severity. elisas with phage-displayed epitopes (phage elisas) and coronavirus antigen microarray (covam) analysis ( ) examined plasma samples from covid- patients (n = ). the results demonstrate that abs to a specific epitope from n protein plus disease risk factors strongly correlate with covid- disease severity. twenty-one putative sars-cov- epitopes were predicted through bioinformatics ( ) ( ) ( ) and structure-based analysis. the candidate epitopes span the s, n, m, or e proteins and are on average amino acids in length (fig. and table s ). the structure of s protein bound to a neutralizing antibody ( , ) provided the starting point for of these antibody epitopes. epitopes were designed to potentially isolate even suboptimal abs binding to small portions of these structural proteins; such suboptimal abs were hypothesized to provide insight into disease severity. after display of each potential epitope on the surface of phage, the quality of the epitopes was evaluated by pcr, dna sequencing, and qc elisa (fig. s ) . a total of phage-displayed, putative epitopes passed quality control and were selected for further study. proteins illustrate our epitope design (colored). these epitopes were phage-displayed as fragments of the full-length protein and were likely unstructured. the depicted structural models were derived from an s protein x-ray structure (pdb: vxx) ( ) or computational modeling of n, m, and e proteins (protein gene bank: qhd , qhd , and qhd , respectively) ( ) . table s provides sequences and, where applicable, sources of each epitope. plasma from covid- patients was subjected to elisas with the phage-displayed sars-cov- epitopes (fig. a) . unless otherwise indicated (e.g., healthy controls), plasma refers to samples from pcr-verified, covid- patients. 
in this initial assay, plasma was pooled, diluted -fold, and coated on a microtiter plate ( pools of n = patients per pool). nonspecific interactions were blocked (chonblock), and phage-displayed epitopes were added for elisa. the resultant data were normalized by signal from the corresponding negative control (phage without a displayed epitope). seven promising epitopes from the pooled patients were further investigated with a larger number of individual patient samples (n = ) (fig. b) . the strongest binding was observed for three epitopes from m (ep ), n (ep ), and s (ep ) proteins. additional covid- plasma samples were profiled for binding to these three epitopes (n = total) ( fig. b) . the ep epitope from n protein demonstrated robust antibody binding in % of the patient plasmas (n = ). the other epitopes failed to produce statistically sufficient numbers of responses. levels of anti-ep abs (αep abs) were mapped over days. the highest levels of αep abs were observed at days to post-symptom onset (n = ) and were detectable within days (fig. c ). this phage elisa with the indicated epitopes (x-axis) examined plasma pooled from patients (n = pools of patients each, technical replicates). b) the epitopes with the highest signals were then further examined by elisa with plasma from individual patients (n as indicated). c) with samples from individual patients (designated as p# and by color) collected at the indicated times, αep abs were measured. d) ep orthologs from sars, mers, hku- , or nl (x-axis) examined the cross-reactivity of abs to ep ( technical replicates). error bars represent sem (panels a, b, and d) or range of two measurements (panel c). cross-reactivity of αep abs against orthologous epitopes from other coronaviruses next, the cross-reactivity of αep abs was examined with ep -orthologs from four phylogenetically related coronaviruses known to infect humans (fig. s a) . 
specifically, plasma with αep abs (n = ) and pooled plasma from healthy individuals (n = ) were assayed. the ep epitopes from sars-cov- and sars have % amino acid sequence homology. unsurprisingly, this high degree of similarity resulted in a cross-reactive ep epitope, and a strong antibody response was observed to ep epitopes from both viruses (fig. d) . the coronaviruses mers, hku- , and nl have %, %, and % sequence homology to sars-cov- ep , respectively (fig. s b) . these more distantly related orthologs exhibited no cross-reactivity with the αep abs. furthermore, no response was observed to ep in pooled plasma from healthy individuals. covam analysis tests cross-reactivity with a panel of antigens from strains of respiratory tract infection-causing viruses. in this assay, each antigen was printed onto microarrays, probed with human sera or plasma, and analyzed as previously described. covam distinguishes between igg and igm abs binding to the full-length n protein (fig. s and s , respectively). the elisa and covam data both demonstrate that αep abs are highly specific for lineage b betacoronaviruses, and unlikely to be found in patients before their infection with sars-cov- . direct comparison of data with full-length n protein from covam and ep phage elisa (n = patients assayed with both techniques) reveals five unique categories of patients (fig. a) . to enable this comparison, raw data from each assay were normalized as a percentage of the negative control. category (fig. a) . interestingly, the patients with αep abs suffer more prolonged illness and worse clinical outcomes compared to patients with non-ep αn abs or no αn abs. in this study, severe covid- cases are defined as resulting in death or requiring admission to the icu or intubation. the fraction of severe covid- cases was . times higher in αep abs patients than non-ep αn abs patients (fig. b, yellow panel) c and d). 
a larger data set of patient plasma analyzed by phage elisa confirmed this conclusion (p< . , fisher's exact test) (fig. b, blue panel) . our data further demonstrate that asymptomatic covid- patients (n = ) also tested negative for αep abs (table s ) . the data also reveal early seroconversion of αep iggs (fig. e) , but not αep igms (fig. f) . patients with αep abs have more severe disease. a) normalized and categorized data from measurements by covam (igms in yellow, iggs in green) and ep phage elisa (blue). anova comparing covam to elisa with dunnett's multiple comparisons yields p-values of **< . , ****< . , or ns: not significant. b) disease severity (color) binned by antibody response (covam in yellow, or elisa in blue). statistical analysis reveals significant differences between distributions of severe and non-severe disease comparing patient categories, p< . ( ) and p< . (fisher's exact test) for covam and elisa, respectively. patients with αep abs are c) symptomatic for longer durations and d) spend more days in the hospital than those with other αn abs or no αn abs. anova with tukey's multiple comparisons yields p-values of *< . and **< . . one outlier (black) (rout = . %) was omitted from statistical calculations for panels c and d. e) the αn iggs appear at high levels early in the course of disease only for αep -positive patients, but are lower in non-ep , αn-positive patients. after > days post symptom onset, αn igg levels increase for both groups of patients. f) however, igm levels do not change significantly. error bars depict sem with the indicated number of patients (n, numbers above columns). we compared risk factors, clinical parameters, and disease outcomes among patients with αep abs (n = ) (figs. a and s ) . a disease risk factor score (drfs) was developed to evaluate the relationship between clinical preconditions and disease severity in patients with αep abs. 
the drfs quantifies a patient's age, sex, and preexisting health conditions associated with covid- disease severity and mortality. risk factors include hypertension, diabetes, obesity, cancer, and chronic conditions of the following: cardiac, cerebrovascular, kidney, and pulmonary ( ) ( ) ( ) ( ) . using the age score from the charlson comorbidity index ( ) yields a patient's drfs as: drfs = Σ (risk factors) + (age score), where each risk factor is valued as either 0 or 1 if absent or present, respectively. the drfs of patients with αep abs strongly correlates with covid- disease severity (pearson's r = . , p-value < . , and r = . ) (fig. a) . the correlation in patients without αep abs is weak (r = . , p-value = . , r = . ) (fig. a) . amongst patients with αep abs (n = ), a drfs ≥ can determine disease severity with . % sensitivity ( / false negatives) and % specificity ( / false positives) (fig. b) . in the entire study cohort (n = ), patients with αep abs and drfs ≥ (n = ) have severe disease with a high degree of specificity ( . %) and a sensitivity of %. notably, drfs predicts disease severity only for patients with αep abs (n = ); for patients without such abs (n = ), drfs showed no correlation with disease outcomes. examining key contributors to high drfs, the presence of αep abs correlates with more severe disease in patients who have hypertension, diabetes, or age > years. such correlation is not observed for patients lacking αep abs (fig. c) . such risk factors are prevalent at roughly the same percentages in both populations of patients (table s ) . thus, these risk factors are particularly acute for patients with αep abs. a) the relationship between drfs and disease severity of covid- patients with αep abs (blue) or no αep abs (gray). each data point represents one patient. the solid lines indicate linear regression fits with % confidence intervals (dotted lines), and pearson's r-value as noted. b) correlation of disease severity with drfs in patients with αep abs. 
the data depict a significant correlation between drfs and disease severity in patients with αep abs (blue), but not in patients lacking αep abs (gray). in αep patients, a drfs threshold of . can predict severe disease (red). two-tailed, parametric t-tests were conducted to compare non-severe and severe disease outcomes of patients with and without αep abs, where ****p< . . the error bars represent sd with the indicated n. c) the color-indicated risk factors (diabetes, hypertension, and age score) are depicted on the x-axis as the fractions of patients in each disease severity category (y-axis). numbers indicate total patients (n) without αep abs (left) or with αep abs (right). the prevalence of risk factors (colors) increases with disease severity in patients with αep abs, but not in patients without these abs. d) patients with αep abs and drfs ≥ are predisposed to increased covid- severity and poorer outcomes. covid- patients can have elevated serum concentrations of > inflammatory cytokines and chemokines ( ) . however, information on the cytokine levels and their association with tissue damage and worse covid- outcomes has been inconsistent ( ) ( ) ( ) . for patients with il- concentrations measured in plasma, patients with (n = ) or without (n = ) αep abs were compared. interestingly, the comparison uncovered a strong positive sigmoidal association between il- and ast unique to patients with αep abs (r = . , spearman's r = . , p-value < . , n = ) (blue line, fig. s a) ; correlation of il- and ast in patients with αep abs remains strong even after removal of the data point at the highest il- concentration. conversely, a slight negative trend is observed in patients lacking αep abs (spearman's r = - . , p-value= . , n = ). thus, the presence of αep abs can disambiguate the sometimes contradictory association of il- with disease severity. this study introduces a two-step test as a prognostic for predicting covid- disease severity and its worst outcomes. 
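The two-step test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the drfs is the charlson age score plus a count of the risk factors named in the text, it assumes the commonly used age banding (0 points below age 50, then 1 point per decade, capped at 4), and it leaves sex out because the text does not show how sex enters the score. The function names and the example patient are hypothetical, and the drfs cutoff is a required argument because its value is not legible in this text.

```python
from typing import Iterable

# Comorbidities listed in the text; each one adds 1 when present, 0 when absent.
RISK_FACTORS = frozenset({
    "hypertension", "diabetes", "obesity", "cancer",
    "cardiac", "cerebrovascular", "kidney", "pulmonary",
})


def charlson_age_score(age_years: int) -> int:
    """Age points (assumed banding): 0 below 50, +1 per decade, capped at 4."""
    if age_years < 50:
        return 0
    return min((age_years - 40) // 10, 4)


def drfs(age_years: int, conditions: Iterable[str]) -> int:
    """Assumed form of the score: age score + number of risk factors present."""
    present = set(conditions)
    unknown = present - RISK_FACTORS
    if unknown:
        raise ValueError(f"unrecognized risk factors: {sorted(unknown)}")
    return charlson_age_score(age_years) + len(present)


def two_step_flag(has_anti_ep_abs: bool, score: int, threshold: int) -> bool:
    """Step 1: anti-Ep antibodies detected. Step 2: drfs at or above the cutoff."""
    return has_anti_ep_abs and score >= threshold


# Hypothetical patient; threshold=3 is an arbitrary illustrative cutoff.
score = drfs(67, ["hypertension", "diabetes"])
print(score, two_step_flag(True, score, threshold=3))
```

Keeping the antibody result and the score as separate inputs mirrors the two steps in the text: a patient without the antibody is never flagged, whatever the score.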
specifically, αep abs can effectively predict severe disease (specificity . %). however, combining presence of αep abs with drfs ≥ provides much higher specificity ( . %) for predicting severe disease. previously, αn iggs have been recognized as a focal site for an antibody response ( , , ) and associated with disease severity and poor outcomes ( , , ) . the results of our study of αep abs independently confirm exciting observations from a patient cohort in singapore, which focused on its use in diagnosing sars-cov- infection ( ) . the present investigation expands on previous reports. experimental differences include in-depth patient clinical histories, test results, and disease outcomes ranging from asymptomatic to fatal. such data allow calculation of the drfs. together with the presence of αep abs, patient drfs allows early discrimination of severe from non-severe disease outcomes. additionally, fine epitope mapping demonstrates that αep abs strongly and uniquely correlate with covid- disease severity relative to other αn abs. we hypothesize that the underlying mechanism relating αep abs to increased disease severity involves an overzealous immune response. specifically, we observe early seroconversion and strong early upregulation of αep iggs (fig. e) . similar igg observations have been correlated with poor viral neutralization and clearance, resulting in increased covid- severity ( , , ) . also, high levels of il- are observed for αep -positive patients with increased levels of the tissue damage marker ast; this correlation does not exist for patients lacking αep abs (fig. s a) . the sensitivity to il- concentration before ast-monitored organ damage suggests anti-il- therapeutics could be an effective management for αep -positive patients ( , ( ) ( ) ( ) . further investigation is required to determine the basis for increased disease severity in αep patients. the data demonstrate that αep positive patients with drfs ≥ are . 
times (likelihood ratio) more likely to have severe covid- disease symptoms within the study cohort (n = ). the presence of αep without drfs is less effective as a prognostic (likelihood ratio of . ). despite its high specificity ( . %), the sensitivity of this two-step test is % (n = ). however, this test could predict a subset of patients with a specific immune response (i.e., early igg response and il- dependent immune hyperactivity), and could suggest targeted treatment options (e.g., targeting il- and its pathways). importantly, αep abs appear early in the course of disease. thus, such a prognostic could outperform traditional markers for the cytokine storm such as il- , which appears - days after symptom onset ( , ) ; all plasma collected from αep positive patients (n = , fig. c) between to days post-symptom onset demonstrate detectable levels of αep igg (≥ fold over negative control). early detection of αep abs in patients could be used to triage and treat covid- prior to the onset of its most severe symptoms, after which drugs can lose efficacy ( , ( ) ( ) ( ) (fig. s b) . this study demonstrates the usefulness of fine epitope mapping, but the following limitations should be noted. post-translational modifications, such as glycosylation, were omitted for the phage-displayed s protein epitopes; covam antigens, however, are produced in baculovirus or hek- cells, which could include glycans. our analysis is based upon a population of covid- patients and healthy individuals, with the majority of hispanic descent. the conclusions could be further strengthened with follow-up investigations in a larger population. additionally, the population examined here only included three asymptomatic individuals, and additional testing is required to verify absence of αep abs in such patients. the sample size of patients with multiple antibody targets was too limited to allow correlation analysis; future investigations could examine associations between αep and other abs. 
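The sensitivity, specificity, and likelihood ratio quoted in this discussion are linked by standard definitions, which the following Python sketch makes explicit. The class and function names are hypothetical, and the counts in the example are invented for illustration; they are not the study's data. The example is chosen only to show how a test with modest sensitivity can still carry a large positive likelihood ratio when its specificity is high.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # severe disease, test positive
    fn: int  # severe disease, test negative
    fp: int  # non-severe disease, test positive
    tn: int  # non-severe disease, test negative


def sensitivity(c: Confusion) -> float:
    """Fraction of severe cases that the test flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of non-severe cases that the test leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def positive_likelihood_ratio(c: Confusion) -> float:
    """LR+ = sensitivity / (1 - specificity), written via the false-positive rate."""
    false_positive_rate = c.fp / (c.fp + c.tn)
    if false_positive_rate == 0:
        return float("inf")  # no false positives: the ratio is unbounded
    return sensitivity(c) / false_positive_rate


# Invented counts, for illustration only.
example = Confusion(tp=8, fn=12, fp=2, tn=78)
print(sensitivity(example), specificity(example), positive_likelihood_ratio(example))
```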
abs recognizing other sars-cov- structural proteins could also exhibit similar characteristics to αep abs. existing diagnostic platforms could readily be adapted to test for αep abs, and the drfs calculation is quite simple to implement. as shown here, αep abs do not recognize orthologous sequences from closely related coronaviruses, providing good specificity for αep as a prognostic. previous studies have shown that the high homology of n protein among related coronaviruses can lead to high false positive rates in serodiagnostics with full-length n antigen ( ) . thus, the two-step prognostic reported here could mitigate the worst outcomes of covid- , particularly for patients at high risk. detailed materials and methods for cloning, phage purification, patient sample collection, plasma phage-antibody elisa, serum covam, and statistical analysis are described in the supplementary materials. world health organization, coronavirus disease (covid- betacoronavirus genomes: how genomic information has been used to deal with past outbreaks and the covid- pandemic presenting characteristics, comorbidities, and outcomes among patients hospitalized with covid- in the new york city area predictors of covid- severity: a literature review cytokine storm induced by sars-cov- the potential danger of suboptimal antibody responses in covid- nat a perspective on potential antibody-dependent enhancement of sars-cov- evaluation of antibody responses against sars coronaviral nucleocapsid or spike proteins by immunoblotting or elisa identification of immunodominant epitopes on the membrane protein of the severe acute respiratory syndrome-associated coronavirus antibody responses to sars-cov- in patients with covid- potential t-cell and b-cell epitopes of -ncov a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- immunoinformatics-aided identification of t cell and b cell epitopes in the surface glycoprotein of -ncov novel 
antibody epitopes dominate the antigenicity of spike glycoprotein in sars-cov- compared to sars-cov comparative computational analysis of sars-cov- nucleocapsid protein epitopes in taxonomically related coronaviruses immunoinformatics-guided design of an epitope-based vaccine against severe acute respiratory syndrome coronavirus spike glycoprotein serological responses to human virome define clinical outcomes of italian patients infected with sars-cov- , medrxiv rescan, a multiplex diagnostic pipeline, pans human sera for sars-cov- antigens linear b-cell epitopes in the spike and nucleocapsid proteins as markers of sars-cov- exposure and disease severity a modular microarray imaging system for highly specific covid- antibody testing structure, function, and antigenicity of the sars-cov- spike glycoprotein structure of the sars-cov- spike receptor-binding domain bound to the ace receptor deep-learning contact-map guided protein structure prediction in casp predictors of mortality in hospitalized covid- patients: a systematic review and meta-analysis high prevalence of obesity in severe acute respiratory syndrome coronavirus- (sars-cov- ) requiring invasive mechanical ventilation covid- prevalence and mortality in patients with cancer and the effect of primary tumour subtype and patient demographics: a prospective cohort study clinical course and mortality of stroke patients with coronavirus disease a new method of classifying prognostic comorbidity in longitudinal studies: development and validation preventing mortality in covid- patients: which cytokine to target in a raging storm? 
epidemiology, clinical course, and outcomes of critically ill adults with covid- in new york city: a prospective cohort study plasma ip- and mcp- levels are highly associated with disease severity and predict the progression of covid- viral kinetics and antibody responses in patients with covid- , medrxiv sars-cov- proteome microarray for global profiling of covid- specific igg and igm responses covid- pneumonia treated with sarilumab: a clinical series of eight patients tocilizumab in patients with severe covid- : a single-center observational analysis early identification of covid- cytokine storm and treatment with anakinra or tocilizumab whole nucleocapsid protein of severe acute respiratory syndrome coronavirus may cause false-positive results in serological assays ef-tu binding peptides identified, dissected, and affinity optimized by phage display virus bioresistor (vbr) for detection of bladder cancer marker dj- in urine at pm in one minute heat inactivation of different types of sars-cov- samples: what protocols for biosafety, molecular detection and serological diagnostics? use of an influenza antigen microarray to measure the breadth of serum antibodies across virus subtypes evaluation of quantum dot immunofluorescence and a digital cmos imaging system as an alternative to conventional organic fluorescence dyes and laser scanning for quantifying protein microarrays protein microarray analysis of the specificity and cross-reactivity of influenza virus hemagglutinin-specific antibodies statistics corner: a guide to appropriate use of correlation coefficient in medical research tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus research fund (craft), the allergan foundation, and ucop emergency research seed funding. s.s. 
was supported by a public impact fellowship from the uci s thank the minority access to research careers (marc) program, funded by the nih (gm- ) collected patient samples and advised on patient clinical data analysis; and tables s to s references ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) key: cord- -kje rn authors: zheng, yue; larragoite, erin t.; lama, juan; cisneros, isabel; delgado, julio c.; slev, patricia; rychert, jenna; innis, emily a.; williams, elizabeth s.c.p.; coiras, mayte; rondina, matthew t.; spivak, adam m.; planelles, vicente title: neutralization assay with sars-cov- and sars-cov- spike pseudotyped murine leukemia virions date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kje rn antibody neutralization is an important prognostic factor in many viral diseases. to easily and rapidly measure titers of neutralizing antibodies in serum or plasma, we developed pseudovirion particles composed of the spike glycoprotein of sars-cov- incorporated onto murine leukemia virus capsids and a modified minimal mlv genome encoding firefly luciferase. these pseudovirions provide a practical means of assessing immune responses under laboratory conditions consistent with biocontainment level . coronaviruses are a group of enveloped rna viruses with a positive-sense single-stranded rna genome ranging from - kilobases, which can cause respiratory tract infections. in december , a novel coronavirus known as severe acute respiratory syndrome coronavirus (sars-cov- ) was identified in china and has caused a global ongoing pandemic of coronavirus disease (covid- ). to date, sars-cov- has spread to countries (https://coronavirus.jhu.edu/). more than million cases and , deaths have been reported at the time of this writing. enveloped viruses are known to efficiently package their core elements with heterologous envelope glycoproteins, giving rise to the so called 'pseudotypes' or 'pseudoviruses ' neutralization titers nt and nt were calculated using prism (graphpad, us). 
to generate pseudovirion particles, three plasmids were co-transfected into hek ft cells. the hiv- lai gp -pseudotyped virus is used as a negative control as it utilizes cd as a primary receptor, which is present in supt cells but absent in hek t. pseudotyped mlv viruses were tested on hek ft, hek t-ace , huh and supt cells. hek ft cells were used as a control cell line, which is known to lack of susceptibility of to test for specificity of neutralization, we asked whether neutralizing antibodies from sars- cov- patients would exhibit cross-reactivity against a pseudotype expressing sars-cov- ( figure ). we tested samples # , and , which had the highest nt and nt . none of these sera had characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov veesler d: structure, function, and antigenicity of the sars-cov- spike glycoprotein the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity high-efficiency gene transfer into cd + cells with a human immunodeficiency virus type -based retroviral vector pseudotyped with vesicular stomatitis virus envelope glycoprotein g fate of the human immunodeficiency virus type provirus in infected cells: a role for vpr pseudotyping viral vectors with emerging virus envelope proteins the coronavirus spike protein is a class i key: cord- - hmecfi authors: korber, b; fischer, wm; gnanakaran, s; yoon, h; theiler, j; abfalterer, w; foley, b; giorgi, ee; bhattacharya, t; parker, md; partridge, dg; evans, cm; freeman, tm; de silva, ti; labranche, cc; montefiori, dc title: spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: hmecfi we have developed an analysis pipeline to facilitate real-time mutation tracking in sars-cov- , focusing initially on the spike (s) protein because it mediates infection of human cells and is the target of most vaccine strategies and antibody-based therapeutics. to date we have identified thirteen mutations in spike that are accumulating. mutations are considered in a broader phylogenetic context, geographically, and over time, to provide an early warning system to reveal mutations that may confer selective advantages in transmission or resistance to interventions. each one is evaluated for evidence of positive selection, and the implications of the mutation are explored through structural modeling. the mutation spike d g is of urgent concern; it began spreading in europe in early february, and when introduced to new regions it rapidly becomes the dominant form. also, we present evidence of recombination between locally circulating strains, indicative of multiple strain infections. these findings have important implications for sars-cov- transmission, pathogenesis and immune interventions. keywords: covid- , pandemic, diversity, evolution, spike, positive selection, recombination, reinfection, antibodies, transmission rates the past two decades have seen three major highly pathogenic zoonotic outbreaks of betacoronaviruses (cui et al., ; de wit et al., ; fehr et al., ; lu et al., ; wu et al., a). the first was severe acute respiratory syndrome coronavirus (sars-cov) in , which infected over , people and killed (graham and baric, ). this was followed in by middle east respiratory syndrome, mers-cov, a difficult-to-transmit but highly lethal virus, with , cases as of , and % mortality (cui et al., ; graham and baric, ). the third, sars-cov- , is the cause of the severe respiratory disease covid- (gorbalenya et al., ). 
it was first reported in china in late december , and triggered an epidemic (wu et al., b ) that rapidly spread globally to become a pandemic of devastating impact, unparalleled in our lifetimes; today's world health organization (who) situation report reads: over . million confirmed cases of covid- , and over , deaths (who situation report number , april ); tomorrow's report will bring us new and markedly higher tallies of suffering, as the who continues to track the remarkable pace of expansion of this disease. three related factors combine to make this disease so dangerous: human beings have no direct immunological experience with this virus, leaving us vulnerable to infection and disease; it is highly transmissible; and it has a high mortality rate. estimates for the basic reproductive ratio, r , vary widely, but commonly range between . - . . estimates of mortality, deaths per confirmed cases, also vary widely, and range between . - % (mortality analyses, john hopkins university of medicine). differences in mortality estimates will reflect regional access to testing (a higher proportion of mild cases is detected when more testing is deployed), as well as regional differences in clinical care, and population differences in associated risk factors such as age. these basic numbers, r and mortality, are critical for public health response planning, but are difficult to resolve with confidence or to generalize across populations, given limited diagnostic testing and variations in the strategies of estimation. although the observed diversity among pandemic sars-cov- sequences is low, its rapid global spread provides the virus with ample opportunity for natural selection to act upon rare but favorable mutations. 
this is analogous to the case of influenza, where mutations slowly accumulate in the hemagglutinin protein during a flu season, and there is a complex interplay between mutations that can confer immune resistance to the virus, and the fitness landscape of the particular variant in which they arise (wu et al., c). antigenic drift in influenza, the accumulation of mutations by the virus during an influenza season, provides the baseline variation needed to enable selection for antibody resistance across populations, and this drift is the primary reason we need to develop new influenza vaccines every few seasons. the seasonality of influenza is likely to be dictated in part by weather patterns (chattopadhyay et al., ); longer seasonal epidemics allow selection pressure to continue over a more extended period, enhancing opportunities for the development of virus with novel antigenic surfaces that resist pre-existing immunity (boni et al., ). sars-cov- is new to us; we do not yet know if it will wane seasonally as the weather warms and humidity increases, but our lack of pre-existing immunity and its high transmissibility relative to influenza are among the reasons it may not. if the pandemic fails to wane, this could exacerbate the potential for antigenic drift and the accumulation of immunologically relevant mutations in the population during the year or more it will take to deliver the first vaccine. such a scenario is plausible, and by attending to this risk now, we may be able to avoid missing important evolutionary transitions in the virus that, if ignored, could ultimately limit the effectiveness of the first vaccines to reach clinical use. there is clearly an urgent need to develop an effective vaccine against sars-cov- , as well as antibody-based therapeutics (kumar et al., ). over vaccine approaches are currently being explored, and a wide variety of candidate sars-cov- vaccines are in development (landscape of who). 
most of these vaccine approaches target the trimeric spike protein (s) with the goal of eliciting protective neutralizing antibodies. spike mediates binding and entry into host cells and is a major target of neutralizing antibodies (yuan et al., ). each spike monomer consists of an n-terminal s domain and a membrane-proximal s domain, which mediate receptor binding and membrane fusion, respectively (hoffmann et al., ; walls et al., ; wrapp et al., ). notably, current immunogens and testing reagents are generally based on the spike protein sequence from the index strain from wuhan . sars-cov- is closely related to sars-cov; the two viruses share ~ % sequence identity (lu et al., ) and both use angiotensin converting enzyme- (ace ) as their cellular receptor (hoffmann et al., ; li et al., ; wrapp et al., ); however, the sars-cov- s-protein has a - -fold higher affinity for ace than the corresponding s-protein of sars-cov (wrapp et al., ). it remains to be seen to what extent lessons learned from sars-cov are helpful in formulating hypotheses about sars-cov- , but sars-cov studies suggest that the nature of the antibody responses to the spike protein is complex. in sars-cov infection, neutralizing abs are generally thought to be protective; however, rapid and high neutralizing ab responses that decline early are associated with greater disease severity and a higher risk of death (ho et al., ; liu et al., ; temperton et al., ; zhang et al., ). furthermore, some antibodies against spike mediate antibody dependent enhancement (ade) of sars-cov (jaume et al., ; kam et al., ; wan et al., ; wang et al., ; yilla et al., ; yip et al., ; yip et al., ). because of the short duration of the outbreak, there were no efficacy trials of sars-cov vaccines and therefore we lack critical information that would help guide sars-cov- vaccine development. 
given spike's vital importance both in terms of viral infectivity and as an antibody target, we felt an urgent need for an "early warning" pipeline to evaluate spike pandemic evolution. our primary intent is to identify dynamically changing patterns of mutation indicative of positive selection for spike variants. because recombination is an important aspect of coronavirus evolution (graham and baric, ; rehman et al., ), we also set out to determine whether recombination is playing a role in sars-cov- pandemic evolution. here we describe a three-stage data pipeline (analysis of gisaid data, structural modeling of sites of interest, and experimental evaluation) and the identification of several sites of positive selection, including one (d g) that may have originated either in china or europe, but began to spread rapidly first in europe, and then in other parts of the world, and which is now the dominant pandemic form in many countries. over the past two months, the hiv database team at los alamos national laboratory has turned to developing an analysis pipeline to track in real time the evolution of the sars-cov- spike (s) protein in the covid- pandemic, using the global initiative for sharing all influenza data (gisaid) sars-cov- sequence database as our baseline (sup. item is the gisaid acknowledgments table, listing all the groups who contribute sequences to this global effort) (elbe and buckland-merrett, ; shu and mccauley, ). gisaid is the primary sars-cov- sequence database resource, and our intent is to complement what they provide with visualizations and summary data specifically intended to support the immunology and vaccine communities, and to alert the broader community to changes in frequency of mutations that might signal positive selection and a change in either viral phenotype or antigenicity. new global sequences arrive at gisaid at a furious pace; currently, hundreds of new sars-cov- sequences are added each day (sup. fig ). 
by automating a series of key analysis steps with frequent inputs of gisaid data, critical analysis is provided in real time. the figures for this paper were based on an april , download of the gisaid data to enable the preparation of this manuscript. our analysis pipeline (www.cov.lanl.gov) enables readers to reproduce the key figures (e.g. figs. - , and an updated table of sites of interest, item s ) based on contemporary data downloaded from gisaid. the pipeline was developed in collaboration with the neutralizing antibody evaluation team at duke university, who are in the process of creating a neutralizing antibody pseudovirus testing facility to experimentally resolve the virological and immunological implications of mutations of interest in spike. while developing the computational tools to enable the analysis pipeline, the los alamos group has concurrently provided reports, at roughly two-week intervals, to colleagues at duke as well as to others in the community who are working in the early design and testing phases of spike-targeting vaccines and immunotherapeutics. this paper is essentially a formalized version of the fourth such report, based on the april th gisaid data, and marks the transition of our analysis pipeline to a public resource and webpage (www.cov.lanl.gov). our analysis pipeline begins by downloading the gisaid data, then discarding partial and problematic sequences using stringent inclusion criteria, and then trimming jagged ends of the sequences back to the beginning of the first open reading frame (orf) and the end of the last orf. 
we use this to create two basic codon-aligned alignments (kurtz et al., ) : a more comprehensive spike alignment ( , sequences as of april th ), to monitor mutations that are beginning to accrue in spike while maximizing the sample size; and a full genome alignment ( , sequences as of april th ), to enable tracking the spike mutations in a phylogenetic context informed by the evolutionary trajectories of the full genome ( fig. ) . frequent public updates of analyses based on these alignments are provided through the pipeline. mutations among the pandemic sars-cov- sequences are sparse, limiting the applicability of traditional sequence-based methods for detecting positive selection. an alternative analysis framework for identifying positive selection can be taken, however, based on gisaid data, which provides a rich database of thousands of sequences linked to geographic information and sampling dates. this enables the tracking of sites for early indications of positive selection by identifying shifts in mutational frequencies over time. early indicators include: (i) an increasing frequency of sequences that exhibit a particular mutation over time in a local region; (ii) frequent recurrent identification of a particular mutation in different geographic regions, and in different regions of the phylogenetic tree; (iii) the use of different codons to encode the same recurrent amino acid; and (iv) tight clusters of mutations in linear or structural sequence space. because spike mutations are rare, we set a low threshold for a site to be deemed "of interest" for further tracking. thus, when a mutation is found in . % of the sequences, we begin to track it by exploring its evolutionary trajectory and modeling its structural implications, e.g., its potential impact on antibody binding sites, trimer stability, and glycosylation patterns. 
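the threshold rule described above (track a site once a mutation reaches a set fraction of sequences) can be sketched as a per-column scan of an alignment. this is an illustrative sketch only, not the lanl pipeline: the function name, the gap and unknown symbols, the toy sequences, and the threshold value are all assumptions, and the alignment is assumed to be already trimmed so that every sequence has the same length as the reference.

```python
from collections import Counter

GAP_OR_UNKNOWN = {"-", "X"}  # assumed symbols for gaps and unresolved residues


def sites_of_interest(reference, aligned_seqs, threshold):
    """Flag alignment positions whose non-reference fraction meets the threshold.

    Returns {1-based position: (variant fraction, most common variant residue)}.
    The fraction is computed over sequences with a resolved residue at that site.
    """
    for seq in aligned_seqs:
        if len(seq) != len(reference):
            raise ValueError("all sequences must be aligned to the reference length")

    flagged = {}
    for i, ref_residue in enumerate(reference):
        column = [seq[i] for seq in aligned_seqs if seq[i] not in GAP_OR_UNKNOWN]
        if not column:
            continue  # nothing resolved at this position
        variants = Counter(res for res in column if res != ref_residue)
        fraction = sum(variants.values()) / len(column)
        if variants and fraction >= threshold:
            flagged[i + 1] = (fraction, variants.most_common(1)[0][0])
    return flagged


# Toy alignment: position 4 varies in 2 of 4 resolved sequences.
reference = "MFVDL"
alignment = ["MFVDL", "MFVGL", "MFVGL", "MFV-L", "MYVDL"]
print(sites_of_interest(reference, alignment, threshold=0.3))
```

a lower threshold for positions near known epitopes, as described in the text, could be handled by passing a per-position threshold instead of a single value.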
accumulation of mutations in known antibody epitopes is a special focus, including sites in or near the spike receptor binding domain (rbd), as well as sites that are near an immunodominant enhancing antibody epitope from the first sars epidemic that centers on the mutation d g ; thus, a lower tracking threshold of . % has been adopted for sites in spike that are within Å of these epitopes in the spike d structure. if, given these criteria, a site merits further characterization, the experimental pipeline is triggered, and reagents are ordered and put into the queue for experimental evaluation of its impact on infectivity, antigenicity, neutralization sensitivity, and capacity to bind to the ace receptor. a summary of the sites of interest (as of april th ) included positions, as well as one local cluster of mutations (table ); a detailed report for each site is provided in supplemental data item s . some of the sites are diminishing in frequency, and may have simply reflected a local sampling artifact from an earlier data set; they are still included in item s because they reached the . % threshold in a past sample of gisaid. others are increasing in frequency or persisting (sup. item ), or have other interesting features suggesting they merit continued monitoring, such as recurrence in many regions of a phylogenetic tree (fig. ), or being located in a part of spike that is of structural interest. we discuss a spike mutation of particular interest, d g, at some length, and then briefly summarize the other sites of interest. d g is increasing in frequency at an alarming rate, indicating a fitness advantage relative to the original wuhan strain that enables more rapid spread. increasing frequency and global distribution. the mutation d g (an a-to-g base change at position , in the wuhan reference strain) was the only site of interest identified in our first spike mutation report in early march (fig.
a) ; it was found times in sequences that were available at the time. four of these seven first d g strains were sampled in europe, and one each in mexico, brazil, and in wuhan. in / cases, d g was accompanied by other mutations: a silent c-to-t mutation in the nsp gene at position , , and a c-to-t mutation at position , which results in an rna-dependent rna polymerase (rdrp) amino acid change (rdrp p l).the combination of these three mutations forms the basis for the clade that soon emerged in europe (fig. ). by the time of our second report in mid-march, d g was being tracked at gisaid due to its high frequency, and referred to as the "g" clade; it was present in % of the global samples, but was still found almost exclusively in europe. it was recently reported by pachetti et al. to be found in europe and absent from regions globally, presumably because they capture gisaid data in roughly the time frame as our second report (pachetti et al., ) . the data available for study mid-march, given an approximate -week lag time between sampling and reporting, were consistent with the possibility of a founder effect in europe resulting in spread across the continent, coupled with an increase in european sampling in the database. however, an early april sampling of the data from gisaid showed that g 's frequency was increasing at an alarming pace throughout march, and it was clearly showing an ever-broadening geographic spread (fig. a) . to differentiate between founder effects and a selective advantage driving the increasing frequency of the g clade in the gisaid data, we applied the suite of tools that we had been developing for the sars-cov- analysis pipeline (fig. , fig. , fig. s and fig. s ) . a clear and consistent pattern was observed in almost every place where adequate sampling was available. 
in most countries and states where the covid- epidemic was initiated and where sequences were sampled prior to march , the d form was the dominant local form early in the epidemic (orange in figs. and ) . wherever g entered a population, a rapid rise in its frequency followed, and in many cases g became the dominant local form in a matter of only a few weeks ( fig. and s ). in europe, where the g first began its expansion, the d and g forms were cocirculating early in the epidemic, with d more common in most sampled countries, the exceptions being italy and switzerland ( fig. a) . through march, g became increasingly common throughout europe, and by april it dominated contemporary sampling ( fig. and ) . in north america, infections were initiated and established across the continent by the original d form, but in early march, the g was introduced into both canada and the usa, and by the end of march it had become the dominant form in both nations. washington state, the state with the greatest number of available gisaid sars-cov- sequences from the usa, exemplifies this pattern (figure and s ) , and a similar shift over time is evident in many other states with samples available throughout march (detailed data by state provided in item s ). sequences from new york were poorly sampled until some days into march (fig. ) , and the g form was the predominant form; the g form was coming into prominence elsewhere in the usa by that time. thus it is not clear whether the local pandemic was seeded by european contacts, as suggested in (brufsky, ) , if it was seeded by contacts from within the usa where it had already achieved high prevalence, or by a combination of both routes. australia follows the same transition pattern, from d to g dominance, as the usa and canada. 
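the regional shifts from the d form to the g form described above amount to tracking, for each region and time bin, the fraction of sampled sequences that carry g. a minimal sketch of that tabulation follows, assuming each sample is reduced to a (region, sampling date, form) record; the function name, record layout, region label, and dates are hypothetical and are not taken from gisaid.

```python
from collections import defaultdict
from datetime import date


def weekly_form_fraction(samples, form="G"):
    """Fraction of samples carrying `form`, per (region, ISO year, ISO week).

    `samples` is an iterable of (region, sampling_date, form) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [count of form, total count]
    for region, sampled_on, observed_form in samples:
        iso = sampled_on.isocalendar()  # (ISO year, ISO week, ISO weekday)
        key = (region, iso[0], iso[1])
        counts[key][1] += 1
        if observed_form == form:
            counts[key][0] += 1
    return {key: hits / total for key, (hits, total) in sorted(counts.items())}


# Invented records for one hypothetical region.
records = [
    ("region_a", date(2020, 3, 2), "D"),
    ("region_a", date(2020, 3, 3), "D"),
    ("region_a", date(2020, 3, 4), "D"),
    ("region_a", date(2020, 3, 5), "G"),
    ("region_a", date(2020, 3, 23), "G"),
    ("region_a", date(2020, 3, 24), "G"),
    ("region_a", date(2020, 3, 25), "G"),
    ("region_a", date(2020, 3, 26), "D"),
]
for key, fraction in weekly_form_fraction(records).items():
    print(key, fraction)
```

a rising fraction across consecutive weeks within one region is the signal of interest; as the text notes, sparse or uneven sampling can produce the same pattern, so bin counts should be inspected alongside the fractions.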
iceland is the single exception to the pattern; there sequencing was extensive and the epidemic seems to have started with a g form, but there was a transient increase in the d form, which then persisted at constant low level ( fig. and s ). asian samples were completely dominated by the original wuhan d form through mid-march, but by mid-march in asian countries outside of china, the g form was clearly established and expanding (figs and ). the status of the d g mutation in china remains unclear, as very few chinese sequences in gisaid were sampled after march . south america and africa remain sparsely sampled, but are shown for completeness in figs. and ; for details see item s . taken together, the data shows that g confers a selective advantage that is repeatedly reflected by dramatic shifts to g forms in regional epidemics over a period of several weeks. the earliest d g mutation in europe was identified in germany (epi_isl_ , sampled / / ), and it was accompanied by the c-to-t mutation at , , but not by the mutation at , . of potential interest for understanding the origin of the g clade, the g form of the virus was also found times in china among early samples (item s ). a wuhan sequence (epi_isl_ ) sampled on / / had the d g mutation, but it did not have either of the two accompanying mutations, and this single d g mutation may have arisen independently (fig. ) . the other three cases were potentially related to the german sequence. one sequence sampled on / / was from zhejiang (epi_isl_ ), which borders shanghai, and it had all three mutations associated with the g clade that expanded in europe. the other two samples were both from shanghai, and sampled on / / and / / (epi_isl_ , epi_isl_ ); like the german sequence they lacked the mutation at , . 
given that these early chinese sequences were also highly related to the german sequence throughout their genomes, it is possible that the g may have originated either in china or in europe as it was present in both places in late january. there are no recent gisaid sequences from shanghai or zhejiang at this time, so we do not know if g was preferentially transmitted there, but the d form was prevalent in shanghai in january and early february. d g and potential mechanisms for enhanced fitness. there are two distinct conceptual frameworks that may explain why the d g mutation is associated with increased transmission. the first is based on structure. d is located on the surface of the spike protein protomer, where it can form contacts with the neighboring protomer. examination of the cryo-em structure (wrapp et al., ) indicates that the sidechain of d potentially can form a hydrogen bond with t of the neighboring protomer as shown in fig. c ; the strength of asp-thr hydrogen bonding has been well documented (kandori et al., ) . this protomer-protomer hydrogen bonding may be of critical importance as it can bring together a residue from the s unit of one protomer to the s unit of the other protomer fig. d . these two sites in the spike protein bracket both the dibasic furin-and s -cleavage sites; thus, it is possible that the d g mutation diminishes the interaction between the s and s units, facilitating the shedding of s from viral-membrane-bound s . an alternative structure-based hypothesis is that this mutation may impact rbd-ace binding (albeit indirectly since this site is not proximal to the binding interface). the rbd needs to be in the "up" position to engage with the ace receptor, and it is possible that this site could allosterically alter the transitions of "up" and "down". in the only known sars-cov- spike structure with all rbds in "down" position, the distances between d and t remain the same between all protomers (walls et al., ) . 
however, in the structures with one rbd "up," the distances between these residues are altered (walls et al., ; wrapp et al., ) . the distance between the "up" protomer and the "down" neighboring protomer (clockwise) is slightly longer than the rest of the protomer-protomer distances (fig. s d ). these slight changes are, however, most likely within the conformational fluctuations of a dynamic spike trimer. more detailed experimental and modeling studies are needed to elucidate the effect of this mutation on rbd transitions. the second way the d g mutation might impact transmission is immunological. d is embedded in an immunodominant linear epitope in the original sars-cov spike, s - . this peptide had a very high level of serological reactivity ( %), and induced long term b-cell memory responses in convalescent-phase sera from individuals infected during the original sars-cov epidemic . antibodies against this peptide mediate antibody-dependent enhancement (ade) of sars-cov infection by an epitope-sequencedependent mechanism, both in vitro, and in vivo in rhesus macaques . the minimal linear core epitope for the ade mediating antibodies in sars-cov was lyqdvnc (sars-cov s - ), and this peptide is immediately proximal to a peptide that is targeted by potentially beneficial neutralizing antibodies (sars-cov s - ). the ade target peptide spans the sars-cov- d site, and is identical to the equivalent region in sars-cov, s - . wang et al. noted proximity of this epitope to the rbd, and speculated that antibody binding may mediate a conformational change in spike that increases rbd-ace interaction resulting in the enhancing effect. this mechanism is notably different from the more common fc receptor-mediated mechanism of sars-cov ade reported by others, and which occurs in both the presence and absence of ace (jaume et al., ; kam et al., ; wan et al., ; wang et al., ; yilla et al., ; yip et al., ; yip et al., ) . 
thus, based on currently available information, there are several ways the d g mutation may impact spike's infectivity: it may improve receptor binding, fusion activation, or ade antibody elicitation. another mechanism for the shift to the g form at later time points might simply be antigenic drift mediating antibody escape. if the d g mutation in sars-cov- were impacting neutralizing antibody sensitivity as well as, or instead of, the ade activity observed in the sars-cov study, d g could also be mediating escape that makes individuals susceptible to a second infection. d g and clinical outcome. we were concerned that if the d g mutation can increase transmissibility, it might also impact severity of disease. because clinical outcome data are not available in gisaid, we focused on a single geographic region, sheffield, england, where a large data set existed and was made available for an initial exploration of this question. sars-cov- sequences were generated from individuals presenting with covid- disease at the sheffield teaching hospitals nhs foundation trust. sheffield followed the pattern observed through much of the world, starting out with d , and shifting to predominantly g by the end of march. the sheffield data included age, gender, date of sampling, cycle threshold (ct) for positive signal in e gene-based rt-pcr (used here as a surrogate for relative levels of viremia) (corman et al., ), and clinical status: outpatient (op), inpatient (ip, requiring hospitalization), or admittance into the intensive care unit (icu). because the numbers admitted into the icu were small, and because that information was not readily available for all subjects, we grouped the ip and icu together for most of the statistical analysis. as anticipated, there was a significant relationship between hospitalization and age (fig. s ) (wilcoxon p < .
e- , median age , interquartile range (ir) - for the hospitalized patients, versus a median age of (with ir - ) for the outpatients) (dowd et al., ; promislow, ). also, males were hospitalized more often than females (fisher's p = . e- ) (conti and younes, ; promislow, ). furthermore, fewer pcr cycles were required for detection of virus among individuals who were admitted to the hospital, indicating higher viral loads (fig. s ) (wilcoxon p = . e- , median , ir - , versus median , ir - ). there was, however, no significant correlation found between d g status and hospitalization status; although the g mutation was slightly enriched among the icu subjects, this was not statistically significant (fig. c). we were concerned that sampling issues might have introduced a bias, particularly as the d viruses were more heavily sampled early, and g at later times, and clinical practice might have changed over the course of the time period. age, however, was evenly distributed between the g and d hospitalized groups (fig. b), and the relative number of hospitalizations stayed constant throughout the study period in both the d and g groups (fig. a). we also performed a multivariate generalized linear model (glm) analysis (bates et al., ) with outpatient vs. hospitalized status as the outcome, and age, gender, d , and pcr ct as potential predictors. pcr ct was a significant predictor of disease severity, after adjusting for age and gender, but d g status was not. again, age was the most significant predictor (p< x - ), followed by gender (p= x - ) and then pcr ct (p= . x - ). d g status did not significantly contribute to modeling hospitalization as an outcome, but there was a marginally significant interaction with pcr ct (p= . ). while d g did not predict hospitalization, there was a significant shift in cycle threshold to fewer pcr cycles being required for detection among the group that carried g relative to d (fig. d).
this indicated that patients carrying the g mutation had higher viral loads (wilcoxon p = . , median . , ir . - . , versus median . , ir . - . ). this comparison is limited by uncertainty regarding the time from infection at which the sample was taken, and by the fact that pcr is an indirect measure of viral load; still, it is notable that despite these limitations, a significant difference was observed. knowing that recombination plays an important role in coronavirus evolution generally (graham and baric, ; rehman et al., ), the possibility that recombination might also be contributing significantly to evolution in the current pandemic seemed plausible, but difficult to detect using standard methods given how little variation there is among pandemic strains. recombination requires simultaneous infection of the same host with different viruses, and the two parental strains have to be distinctive enough to manifest in a detectable way in the recombined sequence. to determine if potential recombination events could be identified in geographically regional data sets, we applied a computational method called rapr (song et al., ) that we had originally developed to explore the evolutionary role of within-patient hiv recombination in acute hiv infection, another situation of low viral diversity. rapr enables the comparison of all triplets (sets of three sequences) in an alignment, and applies a run-length statistic to evaluate the possibility of recombination. we began this analysis with the set of sars-cov- sequences from washington state, as it was a particularly well-sampled set from a geographically local population where co-infection might occur. using rapr, we found several recombination candidates, and the two most significant examples of these are shown in fig. .
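as a rough illustration of the triplet comparison described above, the following python sketch labels each site where two putative parents differ by which parent the child matches, and then applies an exact wald-wolfowitz runs test to those labels. this is a minimal sketch under stated assumptions, not the rapr implementation: the function names are hypothetical, the three sequences are assumed to be pre-aligned and of equal length, and the runs test is used here only as one standard run-based statistic.

```python
from math import comb


def informative_labels(parent_a, parent_b, child):
    """Label sites where the parents differ: 'A' if the child matches
    parent_a, 'B' if it matches parent_b. Sites where the child matches
    neither parent are skipped. (Hypothetical helper, not part of rapr.)"""
    labels = []
    for a, b, c in zip(parent_a, parent_b, child):
        if a == b:
            continue  # parents agree, so the site is uninformative
        if c == a:
            labels.append("A")
        elif c == b:
            labels.append("B")
    return labels


def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    return sum(1 for i, x in enumerate(labels) if i == 0 or x != labels[i - 1])


def runs_p_value(labels):
    """One-sided exact probability of seeing this few runs (or fewer)
    if the 'A' and 'B' labels were arranged at random."""
    n1, n2 = labels.count("A"), labels.count("B")
    if n1 == 0 or n2 == 0:
        return 1.0  # only one parent represented: no evidence of mixing
    observed = count_runs(labels)
    ways = 0
    for r in range(2, observed + 1):
        k = r // 2
        if r % 2 == 0:
            ways += 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            ways += (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                     + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return ways / comb(n1 + n2, n1)
```

few, long runs (for example four sites matching one parent followed by four matching the other) give a small p-value, whereas labels that alternate site by site do not. like the raw p-values reported here, a value from this sketch would still need correction for the number of triplets compared.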
to identify these cases, we did all possible comparisons of three sequences in the sample set, and we show raw p-values for the run-length statistic; these p-values did not withstand a multiple-testing correction. the statistic, however, only considered whether the similarities to the parents are clustered as expected in a recombination event, without regard to the plausibility of alternate hypotheses like recurring mutations. given the very low rate of mutation among the pandemic sequences, however, the patterns of shared sites shown in fig. seem highly unlikely to have arisen by serial spontaneous mutation; thus either recombination in vivo, or possibly recombination in vitro as a result of contamination during pcr, is a better explanation of the observed data. furthermore, rapr analysis indicated that one of the recombinant forms in washington founded a lineage that continued to spread. to identify other geographic regions where recombination might be occurring among the sequences in the global data set, we next focused on sequences sampled from areas with co-circulation of both haplotypes of the d g mutation, as defined by mutational events (fig. ): a c-to-t mutation at position , , a c-to-t mutation at position , , and the a-to-g base change at position , that gives rise to the d g amino acid change in spike. the original wuhan d form carried the bases 'c-c-a' in these positions, and the g form carries the bases 't-t-g'. given that these positions were well spaced in the genome, and the two forms were co-circulating in many communities over the month of march, we reasoned that deviations from either the 'c-c-a' or the 't-t-g' pattern would be indicative of recombination. such deviations were rare, found only in . % of our full genome alignment of , sequences, but with examples found in belgium, netherlands, minnesota, spain, iceland, latvia, nanchang, and australia.
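the haplotype screen described above can be sketched in a few lines of python. this is an illustrative sketch only, not the pipeline code: the function names are hypothetical, the sequences are assumed to be aligned to the reference so that 1-based reference coordinates index them directly, and the three positions and the two parental three-base patterns are supplied by the caller rather than hard-coded.

```python
def haplotype_at(genome, positions):
    """Return the bases at the given 1-based positions as an upper-case string."""
    return "".join(genome[p - 1].upper() for p in positions)


def classify_haplotype(genome, positions, form_a, form_b):
    """Classify a genome by its bases at the marker positions.

    form_a and form_b are the two parental patterns (e.g. three-letter
    strings); anything else built only from A/C/G/T is reported as 'mixed',
    i.e. a candidate for closer inspection, not proof of recombination."""
    bases = haplotype_at(genome, positions)
    if any(b not in "ACGT" for b in bases):
        return "ambiguous"  # N or other IUPAC code at a marker site
    if bases == form_a.upper():
        return "form_a"
    if bases == form_b.upper():
        return "form_b"
    return "mixed"


def find_mixed(records, positions, form_a, form_b):
    """Return the names of sequences whose marker bases match neither pattern.

    records maps a sequence name to its aligned genome string."""
    return [name for name, genome in records.items()
            if classify_haplotype(genome, positions, form_a, form_b) == "mixed"]
```

sequences flagged as mixed would then be passed to a triplet-based method such as rapr together with plausible parents from the same region, since a single back-mutation or a sequencing error can also produce a mixed pattern.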
using rapr to identify likely recombination among these sets, we were indeed able to identify multiple recombination candidates; some of the most significant examples from iceland and the netherlands are also shown in fig. . sites l f and l v. these are both signal peptide mutations, and it is difficult to anticipate how they might impact the virus. variation in the signal peptide of other viruses, for example hiv env, can impact posttranslational modifications in the endoplasmic reticulum, including folding, expression levels, and glycosylation (asmal et al., ; upadhyay et al., ). the l f mutation is intriguing because of its recurrence in many lineages throughout the sars-cov- phylogenetic tree, and in many different countries throughout the world (fig. ). once established, it is often regionally transmitted and contributes to multiple small local clusters (fig. , item s ; e.g., a cluster of infections in iceland with identical sequences all carrying f, and several comparable clusters in different states in the usa). this combination suggests that it may be a favorable mutation that tends to persist when it arises. despite its recurrences, it has maintained but not increased in frequency, and continues to be found at roughly . % of the global sampling through april. l v is potentially interesting because of a very different evolutionary trajectory. it is mostly found in a single lineage in hong kong, with one apparent recurrence in canada. because hong kong has not been recently sampled in gisaid, l v's overall global frequency appears to decline in april; however, l v may in fact be increasing in frequency in hong kong over time and thus merits continuing scrutiny (fig. s ). sites v f, g s, and v a. there are three mutational sites, v f, g s, and v a, that are found within the rbd domain (fig. a,b). of these three, only g s occurs directly at the binding interface of rbd and peptidase domain of ace . fig.
b shows that g is at the end of the binding interface that is predominantly driven by polar interactions (yan et al., ). the closest ace residue to g is q , which is near the end of a- , the helix of ace most engaged in rbd recognition. the ace residue q makes distinct electrostatic interactions with the backbone of a , the neighboring residue to g (fig. s b). the g s mutation may contribute to the rigidification of the local loop region in rbd near the binding interface, or bring this flexible loop even closer to the interface. the mutation at site v does not directly contact ace , although it is on the same face of the rbd that forms the binding interface with ace . site v is at the opposite end from where rbd binds to the ace receptor; it is on the same face as the epitope of cr , a neutralizing antibody that was isolated from a convalescent sars-cov patient, when at least two of the rbd regions of the spike trimer are "up" (fig. s c), though no direct contacts between v and cr are observed. sites v f and g s were identified as mutations of interest in early smaller datasets, and appear to be diminishing in overall global frequency in later samples, but v f merits continued scrutiny due to its potential for interactions with ace . the mutation v a has predominantly appeared in washington state (item s ). it is not increasing over time among samples from washington, but it maintains a steady, albeit very low, presence (fig. s ). sites h y, y h/del, q k. each of these mutations is located in the s n-terminal domain (ntd), a domain not well characterized functionally. the sites were identified as sites of interest in early smaller datasets, but appear to be diminishing in overall frequency in later samples. they recur in different countries, although k is predominantly found in the netherlands. sites a v and d y/n/e. a v and d are both being maintained at ~ . % in the global sample through april.
a v is found only in iceland and is in a single lineage that is stable in frequency over time (fig. s ), whereas d has been sampled in many countries in europe. both mutations occur in the region of the fusion peptide, although neither position is structurally resolved in the cryo-em reconstruction; site a is part of the canonical fusion peptide. we have also identified a small local cluster of mutations in s -s (fig. s ), focused in the fusion core of the hr , next to the region where the helix is broken in the trimeric pre-fusion spike (table s , fig. e). this cluster is rich in serine residues that have a high propensity to form hydrogen bonds. upon spatially localizing in a helix with a motif sxss ( - ) as seen here ( - : sxsstxs), it has the potential to enhance the association of helices. previously, sxxssxxt-like motifs have been shown to drive the association of trans-membrane helices (dawson et al., ) and could be even more relevant given the association with amphipathic helices. given that, this cluster in the hr region of the s unit could impact conformational rearrangements as the s unit transitions from pre-fusion to post-fusion by enhancing the association of the hr trimer and maintaining amphipathicity when a single hr helix is extended (fig. s ). even though such a motif is not seen in hiv gp , it is observed near the mper region in g-retroviral glycoproteins and has been suggested as a possible conserved mechanism to drive oligomerization (salamango and johnson, ). (of note, we had originally identified the s p as a mutation of interest because it had met our threshold criteria, and it was also located in the fusion core, but a closer examination of this mutation revealed that it was the result of a sequencing processing error (freeman et al., ) (see fig. s , and methods for details).) site p l. this mutation is not included in the sars-cov- structure, but is near the end of the cytoplasmic tail of the spike protein.
the mutation is found mostly in the uk, in both england and wales, but also in australia, and is emerging as a single related lineage (fig. ). it is maintaining its frequency both globally and locally in the uk. when we embarked on our sars-cov- analysis pipeline, our motivation was to identify mutations that might be of potential concern in the sars-cov- spike protein as an early warning system for consideration as vaccine studies progress; we did not anticipate such dramatic results so early in the pandemic. in a setting of very low genetic diversity, traditional means of identification of positive selection have limited statistical power, but the incredibly rich gisaid data set provides an opportunity to look more deeply into the evolutionary relationships among the sars-cov- sequences in the context of time and geography. this approach revealed that viruses bearing the mutation spike d g are replacing the original wuhan form of the virus rapidly and repeatedly across the globe (fig. - ). we do not know what is driving this selective sweep, nor for that matter if it is indeed due to the modified spike and not one of the other two accompanying mutations that share the gisaid "g-clade" haplotype. the spike d g change, however, is consistent with several hypotheses regarding a fitness advantage that can be explored experimentally. d is embedded in an immunodominant antibody epitope, recognized by antibodies isolated from recovered individuals who were infected with the original sars-cov; this epitope is also targeted by vaccination in primate models. thus, this mutation might be conferring resistance to protective d -directed antibody responses in infected people, making them more susceptible to reinfection with the newer g form of the virus. alternatively, the advantage might be related to the fact that d is embedded in an immunodominant ade epitope of sars-cov, and perhaps the g form can facilitate ade.
finally, the d g mutation is predicted to destabilize inter-protomer s -s subunit interactions in the trimer, and this may have direct consequences for the infectivity of the virus (fig. ). increased infectivity would be consistent with rapid spread, and also with the association of higher viral load with g that we observed in the clinical data from sheffield, england (fig. ). many of the ways we anticipated we might find evidence of positive selection in spike are being manifested among a subset of sites with accruing mutations. while the d g mutation is the only one that is dramatically increasing in frequency globally (fig. - ), the l v mutation may be on the rise in the local epidemic in hong kong (fig. s ). to date, mutations are extremely rare in the spike rbd, but the mutation g s is directly in an ace contact residue. the mutation l f occurs in many geographic regions in many distinct clades, suggesting it repeatedly arose independently, and was selected to the extent that it was frequent enough to be resampled, or is possibly a recurrent sequencing artifact. finally, we have found evidence of recombination among regional sample sets (fig. ). recombination among pandemic sars-cov- strains would not be surprising, given that it is also found among more distant coronaviruses with higher diversity levels (graham and baric, ; rehman et al., ). still, it has important implications. first, natural recombination cannot be detected without simultaneous coinfection of distinct viruses in one host. if the recombination events that are illustrated in fig. are indeed happening in vivo, co-infections that enable them might be happening prior to the adaptive immune response, or in series with reinfection occurring after the initial infection stimulated a response.
recombination may be more common in communities with less rigorous shelter-in-place and social distancing practices, in hospital wards with less stringent patient isolation because all patients are assumed to already be infected, or in geographic regions where antigenic drift has already begun to enable serial infection with more resistant forms of the viruses. also, recombination provides an opportunity for the virus to bring together, into a single recombinant virus, multiple mutations that independently confer distinct fitness advantages but that were carried separately in the two parental strains. tracking mutations in spike has been our primary focus to date because of the urgency with which vaccine and antibody therapy strategies are being developed; the interventions under development now cannot afford to miss their contemporary targets when they are eventually deployed. to this end, we built a data-analysis pipeline to explore the potential impact of mutations on sars-cov- sequences. the analysis is performed anew as the data becomes available through gisaid. experimentalists can make use of the most current data available to best inform vaccine constructs, reagent tests, and experimental design. while the gisaid data used for the figures in this paper was frozen at april , , many of the key figures included here are rebuilt on a frequent basis based on the newly available gisaid data. while our initial focus is on spike, the tools we have developed can be extended to other proteins and mutations in subsequent versions of the pipeline. meanwhile understanding both how the d g mutation is overtaking the pandemic and how recombination is impacting the evolution of the virus will be important for informing choices about how best to respond in order to control epidemic spread and resurgence. 
with the exception of d g, all other mutations in spike remain rare; we will nonetheless monitor them for potential immunological impact and/or for increased frequencies regionally or globally as the pandemic progresses. the ntd is the n-terminal domain, s is a membrane fusion subunit, and hr the first heptad repeat region. up/down conformations refer to a change in state in which the up conformation exposes the rbd (kirchdoerfer et al., ; kirchdoerfer et al., ). the sars-cov epitope was identified from the first sars epidemic, and is the immunodominant linear antibody epitope observed in natural infection. a) a basic neighbor joining tree, centered on the wuhan reference strain, with the gisaid g clade (named for the d g mutation, though a total of base changes define the clade) highlighted in yellow. the regions of the world where sequences were sampled are indicated by colors. by early april, g was more common than the original d form isolated from wuhan, and rather than being restricted to europe (red) it had begun to spread globally. b) the same tree expanded to show interesting patterns of spike mutations that we are tracking against the backdrop of the phylogenetic tree based on the full genome. note two distinct patterns: mutations that predominantly appear to be part of a single lineage (p l, orange in the uk and australia, and also a v, red, in iceland), versus a mutation that is found in very different regions both geographically and in the phylogeny, indicating the same mutation may be independently arising and sampled (l f green, rare but found in scattered locations worldwide). a chart showing how gisaid sequence submissions increase daily is provided in fig. s . the tree shown here can be recreated with contemporary data downloaded from gisaid at www.cov.lanl.gov. the tree shown here was created using paup (swofford, ); the trees generated for the website pipeline updates are based on parsimony (goloboff, ).
showing the tallies of each form, d and g , in different countries and regions, starting with samples collected prior to march , then following in day intervals. b) bar charts illustrating the relative frequencies of the original wuhan form (d , orange), and the form that first emerged in europe (g , blue) based on the numbers in part (a). a variation of this figure showing actual tallies rather than frequencies, so that the height of the bars represents the sample size, is provided as fig. s . c) a global mapping of the two forms illustrated by pie charts over the same periods. the size of the circle represents sampling. an interactive version of this map of the april th data, allowing one to change scale and drill down into specific regions of the world, is available at https://cov.lanl.gov/apps/covid- /map, and updates of this map based on contemporary data from gisaid are provided at www.cov.lanl.gov. fig. . running weekly average counts showing the relative amount of d (orange) and g (blue) in different regions of the world. in almost every case, soon after g enters a region, it begins to dominate the sample. fig. s shows the same data, illustrated as a daily cumulative plot. plots were generated with python matplotlib (hunter, ). the plots shown here and in fig. s can be recreated with contemporary data from gisaid at www.cov.lanl.gov. protomers and sub-units. s and s sub-units are defined based on the furin cleavage site (protomer # : s -blue, s -cyan, protomer # : s -grey, s -tan, protomer # : s -light green, s -dark green). the rbd of protomer # is in the "up" position for engagement with the ace receptor. red color is used to indicate individual mutational sites (ball). the dashed squares with labels indicate forthcoming detailed investigations in subsequent images. b) mutational sites near the rbd (blue)-ace (yellow) binding interface. the interfacial region is shown as a surface (pdb: m ). c) the proximity of d to t from the neighboring protomer.
the white dashed lines indicate the possibility for forming hydrogen bonds. d) a cartoon is used to capture how the potential protomer-protomer interactions shown in (c) bring together d from the s unit of one protomer and the t from the s unit of the neighboring protomer. e) cluster of mutations, s -s , in the hr region of the spike protein. these residues occur in a region that undergoes conformational transition during fusion. the left and right images show the pre-fusion (pdb: vsb) and post-fusion (pdb: lxt) conformations of this hr region. structural implications of different sites are noted in fig. s . structural evaluations and rendering of three dimensional images were carried out using visual molecular dynamics (vmd) (humphrey et al., ). the course we documented throughout the globe (see fig. ), with g overtaking d as the dominant form of sars-cov- . we were concerned that the rate of hospitalization might have varied over this time period, which could have biased the sample as g tended to be sampled later. the d and g panels show that the rate of hospitalized individuals from whom sequences were obtained, averaged per week, remained relatively constant across this time period for both groups. b) age distribution between clinical status groups. we were also concerned that the age distribution of people visiting the hospital might have differed between the groups, as age is highly associated with high risk, but part (b) shows that this distribution is very similar, and there was no statistical difference between the groups overall (wilcoxon p = . ; d had a median age of (ir - ), while g had a median age of (ir - )). c) d g status was not statistically associated with hospitalization status. d) g was associated with fewer rounds of pcr required for detection, suggesting that people with a g virus had higher viral loads. other associations with hospital status are shown in fig. s .
parents and child (the unique bases marked with black ticks emphasize this), we have sampled representatives from each of the lineages that were likely to be involved in a recombination event. putative parental strains are shown in red (top) and light blue (bottom), and the solid colored line represents the nearly , bases of their full length sars-cov- genomes. the recombinant child is shown below with color-coded tick marks representing mutations matching either parental strain, when the parents differ in the base at a given position. nucleotides in black are distinct in the child and do not match either parent. boxes below show nucleotides at each position of diversity across the triplet, with red or light blue boxes highlighting the parental strain they match. black arrows show the d g trio of mutations at positions , and (d g in the spike gene). the p-values are based on a run-length statistic, and are not corrected for multiple tests. top panel: a recombination event in which the putative recombinant "child" gave rise to a cluster of recombinants in the wa state sequence set. below, a second example from the same sequence set shows a more complex recombination event with two distinct breakpoints. middle panel: two examples of recombination events detected in the sequence set from the netherlands. bottom panel: two examples of recombination events detected in the sequence set from iceland. in both of these cases the red and blue parents were similar, and breakpoints were similar, but the red and blue parents were distinct in a few positions. in cases from the netherlands and iceland, the haplotypes representing the gisaid g clade that carries the d g mutation and the other two mutations were mixed; their positions are indicated with arrows. further information and requests for resources should be directed to and will be fulfilled by the lead contact, bette korber (btk@lanl.gov).
this study did not generate new unique reagents. sequence data are available from the global initiative for sharing all influenza data (gisaid), at https://gisaid.org. the user agreement for gisaid does not permit redistribution of sequences, but lists of the sequences used in our analyses, high-resolution figures, and code will be made available at www.cov.lanl.gov. sars-cov- sequences were generated using samples taken for routine clinical diagnostic use from individuals presenting with active covid- disease: female, male, no gender specified; ages - (median . ) years. detection and sequencing of sars-cov- isolates from clinical samples. nucleic acid was extracted from µl of sample on a magnapure extraction platform (roche diagnostics ltd, burgess hill, uk). sars-cov- rna was detected using primers and probes targeting the e gene and the rdrp genes for routine clinical diagnostic purposes, with thermocycling and fluorescence detection on an abi thermal cycler (applied biosystems, foster city, united states) using previously described primer and probe sets (corman et al., ). nucleic acid from positive cases underwent long-read whole genome sequencing (oxford nanopore technologies (ont), oxford, uk) using the artic network protocol (accessed the th of april, https://artic.network/ncov- ). following basecalling, data were demultiplexed using ont guppy using a high accuracy model. reads were filtered based on quality and length ( to bp), then mapped to the wuhan reference genome, and primer sites were trimmed. reads were then downsampled to x coverage in each direction. variants were called using nanopolish (https://github.com/jts/nanopolish) and used to determine changes from the reference. consensus sequences were constructed using the reference and the variants called.
the global initiative for sharing all influenza data (gisaid) (elbe and buckland-merrett, ; shu and mccauley, ) has been coordinating sars-cov- genome sequence submissions and making data available for download since early in the pandemic. at time of writing, dozens to hundreds of sequences were being added every day. these sequences result from extraordinary efforts by a wide variety of institutions and individuals: they are an invaluable resource, but are somewhat mixed in quality. the complete sequence download includes a large number of partial sequences, with variable coverage, and extensive 'n' runs in many sequences. to assemble a high-quality dataset for mutational analysis, we constructed a data pipeline using off-the-shelf bioinformatic tools and a small amount of custom code. from the sars-cov- sequences available from gisaid, we derived a "clean" codon-aligned dataset comprising near-complete viral genomes, without large insertions or deletions ("indels") or runs of undetermined or ambiguous bases. for convenience in mutation assessment, we generated a codon-based nucleotide multiple sequence alignment, and extracted translations of each reading frame, from which we generated lists of mutations. the cleaning process was in general a process of deletion, with alignment of retained sequences; the following criteria were used to exclude sequences: fragmented matching (> nt gap in match to reference); gaps at the ' or ' end (> nt); high numbers of mismatched nucleotides (> ), 'n' or other ambiguous iupac codes; and regions with concentrated ambiguity calls (> in any nt window). any sequence matching any of the above criteria was excluded in its entirety. sequences were mapped to a reference (bases : of genbank entry nc_ ; i.e., the first base of the orf ab start codon to the last base of the orf stop codon) using "nucmer" from the mummer package (version . ; kurtz et al., ).
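the exclusion criteria listed above amount to a set of threshold checks applied to each sequence after mapping. the python sketch below shows one way such checks could be organized; it is not the perl code used in the pipeline. all names are hypothetical, the input is assumed to be a sequence already aligned to an ungapped reference of the same length (with '-' for unmapped or deleted bases), the third criterion is interpreted here as a combined count of mismatched and ambiguous bases, and the numeric thresholds are left as caller-supplied parameters.

```python
import re
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Caller-supplied cut-offs (hypothetical names, no default values)."""
    max_internal_gap: int          # longest tolerated gap inside the match
    max_end_gap: int               # longest tolerated gap at either end
    max_mismatches: int            # mismatched or ambiguous bases, whole genome
    max_ambiguous_in_window: int   # ambiguous bases tolerated per window
    window: int                    # window length in nucleotides


def exclusion_reasons(ref, seq, t):
    """Return the list of criteria that an aligned sequence violates."""
    ref, seq = ref.upper(), seq.upper()
    reasons = []

    # 1. fragmented matching: a long run of gaps inside the mapped region
    core = seq.strip("-")
    longest = max((len(g) for g in re.findall(r"-+", core)), default=0)
    if longest > t.max_internal_gap:
        reasons.append("fragmented")

    # 2. gaps at either end of the alignment
    lead = len(seq) - len(seq.lstrip("-"))
    trail = len(seq) - len(seq.rstrip("-"))
    if max(lead, trail) > t.max_end_gap:
        reasons.append("end_gap")

    # 3. too many mismatched or ambiguous bases overall
    bad = sum(1 for r, s in zip(ref, seq)
              if s != "-" and (s not in "ACGT" or s != r))
    if bad > t.max_mismatches:
        reasons.append("mismatch")

    # 4. ambiguity calls concentrated in any single window (sliding sum)
    amb = [0 if s in "ACGT-" else 1 for s in seq]
    running = worst = sum(amb[:t.window])
    for i in range(t.window, len(amb)):
        running += amb[i] - amb[i - t.window]
        worst = max(worst, running)
    if worst > t.max_ambiguous_in_window:
        reasons.append("ambiguous_window")

    return reasons
```

a sequence would be retained only when the returned list is empty, which mirrors the rule that any sequence matching any criterion is excluded in its entirety; returning the reasons also makes it straightforward to tabulate how many sequences fall into each exclusion category.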
the nucmer output "delta" file was parsed directly using custom perl code to partition sequences into the various exclusion categories (sequence mapping table) and to construct a multiple sequence alignment (msa). the msa was refined using code derived from the los alamos hiv database "gene cutter" tool code base. at this stage, alignment columns comprising an insertion of a single "n" in a single sequence (generating a frame-shift) were deleted, and gaps were shifted to conform with codon boundaries. using the initial "good-sequence" alignment, a low-effort parsimony tree was constructed using paup: a single replicate heuristic search using stepwise random sequence addition. sequences in the alignment were sorted vertically to correspond to the (ladderized) tree, and reference-sequence reading frames were added. estimated phylogenies were inferred for three distinct data partitions: the full sequence set (near-complete genomes), the spike open reading frame (orf), and the full set with the spike orf excluded. the full genome tree was used for fig. . the tree with the spike orf excluded was intended to allow independent assessment of the phylogenetic distribution of changes within the spike protein, by preventing convergent or homoplastic mutations driven by phenotypic selection upon the spike protein from overwhelming phylogenetic signal from the rest of the genome. we confirmed that the phylogenetic observations discussed in fig. were supported in the spike-excluded tree, but do not include it here. it is available for cross-checking phylogeny-based inferences at www.cov.lanl.gov. trees were inferred by either of two methods: neighbor-joining using a p-distance criterion (swofford, ), or parsimony heuristic search using a version of the parsimony ratchet (goloboff, ). the covid- pie chart map is generated by overlaying leaflet (a javascript library for interactive maps) pie charts on maps provided by openstreetmap.
the interface is presented using rocker/shiny, a docker image for shiny server. we discovered a sequencing processing error that gave rise to what appeared to be a mutation at position ( a>c and c>g) in spike that was evident in sequences from belgium (fig. s ). we contacted the group in belgium, the source of the data; they were already aware of the issue, concurred with our interpretation, and had been in touch with gisaid with a request to remove the problematic sequences. the error was not found among more recent sequences from belgium (fig. s ). we identified the issue with this site as part of another study using a method to detect systematic sequencing errors (freeman et al., ); we are interrogating the quality of available sequencing data and these positions were highlighted as suspect. we interrogated these positions in the raw sequencing data from sheffield, and although these two variants are not present in the final consensus sequence from any of the sheffield isolates, the raw, untrimmed bam files show their presence in only one of the amplicons covering the site (fig. s a&b). we noticed that in fact this position is to the left of the ' primer of amplicon in what we believe to be an adapter sequence. comparison of the wuhan reference and the adapter sequence reveals similarity around this position: nanopore adapter sequence: cagcacctt; wuhan reference sequence: cagcaagtt. in our validation set, we see a c present at around % of called bases at both these positions in raw data, but this region is trimmed by the artic pipeline and is therefore not used to call variants or contribute to the final consensus sequence. although it is evident in amplicon , in this region, there is no evidence for these variants in the data from amplicon , which also covers these positions. we include a figure (fig. s ) that hopefully will help to explain our finding.
in summary, this is an error that has arisen due to a combination of improper trimming of adapter and primer regions from raw sequencing reads before downstream analysis, and the coincidental homology between the nanopore adapter sequence and the wuhan reference genome in this region. to assess possible associations of clinical and sequence variables with disease severity, we used a generalized linear model (glm) with outpatient vs. hospitalized status as outcome and age, gender, d and pcr cycle threshold for e gene amplification (e_gene_ct) as potential predictors. outpatient vs. hospitalized status, gender, and d were all categorized as binomial factors, while age and e_gene_ct were considered as continuous variables. we started with the largest model that included all variables and then used anova to down-select the best predicting model. all coding was done using r and the lme package (bates ). the phylogenetic tree was broken up into subtrees using the r ape package (paradis and schliep ), and those subtrees containing only belgian or only non-belgian tips were selected, and their total branch lengths calculated. for each subtree the r package phangorn (schliep ) was used to calculate the minimum number of changes required at each site. the maximum rate, in mutations per unit branch length, compatible with the non-belgian data was calculated as its % upper confidence level assuming a poisson distribution of mutations with the poisson parameter proportional to the branch length. the p-value of the belgian data was estimated from a poisson distribution with parameter given by the rate for non-belgian data multiplied by the total branch length of the belgian trees. only two sites were found to be significant after bonferroni correction for the number of sites in the alignment.
identification of recombination candidates using rapr. to identify the candidate recombination parent and child sequences shown in fig. s , we first did all triplet comparisons of all sequences from the local region using rapr, and the raw p-values, based on a run-length statistic, for all the comparisons were rank-ordered to identify the triplet candidates with strongest evidence for recombination to be used as a basis for further exploration. thus these p-values are uncorrected for multiple testing and not formally compared against the alternative hypothesis of stepwise convergent mutation, as we traditionally do in rapr analysis (song et al., ). however, given the very low overall mutation rate in the sars-cov- pandemic, it is extremely unlikely that the mutational patterns seen in the recombinant sequences were a result of stepwise convergence. archived data for the current manuscript, and current data updates, analytical results, and webtools: https://cov.lanl.gov. the r foundation for statistical computing, http://www.r-project.org; r packages (https://cran.r-project.org/ except as noted). for the geographic sampling, we only list one country or region if it dominates a sample.
for details see the complete listing of sequences with given mutation in item s highlighter: a tool to highlight matches, mismatches, and specific mutations in aligned protein or nucleotide sequences a signature in hiv- envelope leader peptide associated with transition from acute to chronic infection impacts envelope processing and infectivity fitting linear mixed-effects models using lme epidemic dynamics and antigenic evolution in a single season of influenza a distinct viral clades of sars-cov- : implications for modeling of viral spread potential for developing a sars-cov receptor-binding domain (rbd) recombinant protein as a heterologous human vaccine against coronavirus infectious disease (covid)- coronavirus cov- /sars-cov- affects women less than men: clinical response to viral infection detection of novel coronavirus ( -ncov) by real-time rt-pcr origin and evolution of pathogenic coronaviruses motifs of serine and threonine can drive association of transmembrane helices sars and mers: recent insights into emerging coronaviruses demographic science aids in understanding the spread and fatality rates of covid- data, disease and diplomacy: gisaid's innovative contribution to global health middle east respiratory syndrome: emergence of a pathogenic human coronavirus oblong, a program to analyse phylogenomic data sets with millions of characters, requiring negligible amounts of ram the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission neutralizing antibody response and sars severity sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor vmd: visual molecular dynamics matplotlib: a d graphics environment anti-severe acute respiratory syndrome coronavirus spike antibodies trigger infection of human immune cells via a ph-and cysteine proteaseindependent fcγr 
pathway antibodies against trimeric s glycoprotein protect hamsters against sars-cov challenge despite their capacity to mediate fcgammarii-dependent entry into b cells in vitro tight asp- --thr- association during the pump switch of bacteriorhodopsin pre-fusion structure of a human coronavirus spike protein stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis a short review on antibody therapy for covid- versatile and open software for comparing large genomes structure of sars coronavirus spike receptor-binding domain complexed with receptor extensive recombination and strong purifying selection -origin of sars-cov- two-year prospective study of the humoral immune response of patients with severe acute respiratory syndrome clinical and biochemical indexes from -ncov infected patients linked to viral loads and lung injury genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant characterizing the murine leukemia virus envelope glycoprotein membrane-spanning domain for its roles in interface alignment and fusogenicity tracking hiv- recombination to resolve its contribution to hiv- evolution in natural infection longitudinally profiling neutralizing antibody response to sars coronavirus with pseudotypes alterations of hiv- envelope phenotype and antibody-mediated neutralization by signal peptide mutations function, and antigenicity of the sars-cov- spike glycoprotein molecular mechanism for antibody-dependent enhancement of coronavirus entry a human monoclonal antibody blocking sars-cov- infection. 
biorxiv immunodominant sars coronavirus epitopes in humans elicited both enhancing and neutralizing effects on infection in non-human primates antibody-dependent sars coronavirus infection is mediated by antibodies against spike proteins cryo-em structure of the -ncov spike in the prefusion conformation genome composition and divergence of the novel coronavirus ( -ncov) originating in china a new coronavirus associated with human respiratory disease in china inhibition of sars-cov- (previously -ncov) infection by a highly potent pancoronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion structural basis for the recognition of sars-cov- by full-length human ace sars-coronavirus replication in human peripheral monocytes/macrophages antibody-dependent enhancement of sars coronavirus infection and its role in the pathogenesis of sars antibody-dependent infection of human macrophages by severe acute respiratory syndrome coronavirus a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov antibody responses against sars coronavirus are correlated with disease outcome of infected individuals a pneumonia outbreak associated with a new coronavirus of probable bat origin we thank sir andrew mcmichael, professor sarah rowland-jones, and dr. xiao-ning xu for their invaluable help with bringing together the clinical and theoretical biology teams necessary for this study. sequencing of sars-cov- samples was undertaken by the sheffield covid- genomics group as part of the cog-uk consortium. cog-uk and supported by funding from the medical research council (mrc) part of uk research & innovation (ukri), the national institute of health research (nihr) and genome research limited, operating as the wellcome sanger institute. tids is supported by a wellcome trust intermediate clinical fellowship ( /z/ /z). 
analysis strategies presented in this article were developed with the support of the laboratory directed research and development program of los alamos national laboratory. recombination analysis was conducted under project number ecr. the sequence data pipeline design and analysis of the structural immunological implications of spike mutations were conducted under project number ( er). the sequence data pipeline implementation was funded through the national institute of allergy and infectious diseases, national institutes of health, department of health and human services, under interagency agreement no. aai - - . we gratefully acknowledge the team at gisaid for creating the remarkable covid- outbreak global database and resources, and the many authors from the originating and submitting laboratories of sequence data on which this analysis is based; item s is the acknowledgment table from gisaid at the time of our april th data download, listing the many people responsible for generating the sequence data. bless them.
key: cord- -kzd vvci authors: digard, paul; lee, hui min; sharp, colin; grey, finn; gaunt, eleanor title: intra-genome variability in the dinucleotide composition of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kzd vvci cpg dinucleotides are under-represented in the genomes of single stranded rna viruses, and coronaviruses, including sars-cov- , are no exception to this. artificial modification of cpg frequency is a valid approach for live attenuated vaccine development, and if this is to be applied to sars-cov- , we must first understand the role cpg motifs play in regulating sars-cov- replication. accordingly, the cpg composition of the newly emerged sars-cov- genome was characterised in the context of other coronaviruses.
cpg suppression amongst coronaviruses does not significantly differ according to genera of virus, but does vary according to host species and primary replication site (a proxy for tissue tropism), supporting the hypothesis that viral cpg content may influence cross-species transmission. although sars-cov- exhibits overall strong cpg suppression, this varies considerably across the genome, and the envelope (e) open reading frame (orf) and orf demonstrate an absence of cpg suppression. while orf is only present in the genomes of a subset of coronaviruses, e is essential for virus replication. across the coronaviridae, e genes display remarkably high variation in cpg composition, with those of sars and sars-cov- having much higher cpg content than other coronaviruses isolated from humans. phylogeny indicates that this is an ancestrally-derived trait reflecting their origin in bats, rather than something selected for after zoonotic transfer. conservation of cpg motifs in these regions suggests that they have a functionality which over-rides the need to suppress cpg; an observation relevant to future strategies towards a rationally attenuated sars-cov- vaccine. cpg dinucleotides are under-represented in the dna genomes of vertebrates (cooper and krawczak ; simmonds, et al. ) . cytosines in the cpg conformation may become methylated, and this methylation is used as a mechanism for transcriptional regulation (medvedeva, et al. ). methylated cytosines have a propensity to undergo spontaneous deamination (and so conversion to a thymine). over evolutionary time, this has reduced the frequency of cpgs in vertebrate genomes (cooper and krawczak ) . however, loss of cpgs in promoter regions would affect transcriptional regulation, and so cpgs are locally retained, resulting in functionally important 'cpg islands' found in around half of all vertebrate promoter regions (deaton and bird ) . 
single strand rna (ssrna) viruses infecting vertebrate hosts reflect the cpg dinucleotide composition of their host in a type of mimicry (simmonds, et al. ) . it was hypothesised that this is because vertebrates have evolved a cpg sensor which flags transcripts with aberrant cpg frequencies gaunt, et al. ). this idea was strengthened by the discovery that the cellular protein zinc-finger antiviral protein (zap) binds cpg motifs on viral rna and directs them for degradation (takata, et al. ) , and further supported by observations that cpgs can be synonymously introduced into a viral genome to the detriment of virus replication without negatively impacting transcriptional or translational efficiency (tulloch, et al. ; gaunt, et al. ). current understanding is therefore that ssrna viruses mimic the cpg composition of their host at least in part to subvert detection by zap. ssrna viruses also under-represent the upa dinucleotide, but to a far more modest extent (simmonds, et al. ) , and the reasons behind upa suppression are less well understood. a consequence of dinucleotide bias is that certain codon pairs are under-represented (tulloch, et al. ; kunec and osterrieder ) (so, for example, codon pairs of the conformation nnc-gnn are among the most rarely seen codon pairs in vertebrates (tats, et al. ) ). whether the two phenomena of cpg suppression and codon pair bias (cpb) are discrete remains controversial (futcher, et al. ; kunec and osterrieder ; groenke, et al. ) . the coronaviridae have a generally low genomic cytosine content (berkhout and van hemert ) , but as with other ssrna viruses, nonetheless still under-represent cpg dinucleotides to a frequency below that predicted from individual base frequencies of cytosine and guanine (woo, et al. ) . the coronavirus family comprises four genera -the alpha, beta, gamma and delta-coronaviruses. human-infecting coronaviruses (hcovs) have been identified belonging to the alpha and beta genera (hu, et al. ) . 
alphacoronaviruses infecting humans include hcov- e and the more recently discovered hcov-nl (van der hoek, et al. ). betacoronaviruses include hcov-oc , hcov- hku (woo, et al. ) , severe acute respiratory syndrome (sars)-cov (rota, et al. ) , middle east respiratory syndrome (mers)-cov (zaki, et al. ) and the recently emerged sars-cov- ). prior to the emergence of sars-cov- , sars-cov had the strongest cpg suppression across human-infecting coronaviruses (woo, et al. transcription regulation sequences (trss)); this complementarity allows viral polymerase jumping from the ' leader sequence to directly upstream of orfs preceded by a trs (sawicki and sawicki ). the negative sense sub-genomic rnas serve as efficient templates for production of mrnas (sawicki, et al. ). generally, only the first orf of a sub-genomic mrna is translated (perlman and netland ), although leaky ribosomal scanning has been reported as a means for accessing alternative orfs for several coronaviruses including sars-cov (schaecher, et al. ). sars-cov- was recently reported to have a cpg composition lower than other members of the betacoronavirus genus, comparable to certain canine alphacoronaviruses; an observation used to draw inferences over its origin and/or epizootic potential (xia in gc content (from ~ . - . ) was seen across the coronaviridae, and as expected, all viruses exhibited some degree of cpg suppression, with cpg o:e ratios ranging from . to . (fig a) . to investigate the root of this variation, the coronavirus sequence dataset was refined to remove sequences with more than % nucleotide identity to reduce sampling biases (so, for example, sars- cov sequences of human origin were stripped from over representative sequences to just one). the cpg compositions of the remaining sequences (table s ) were compared between coronavirus genera (alpha, beta, gamma and delta). for the representative sequences, a genus could be assigned for . 
no differences in cpg composition between coronavirus genera were apparent, although the gamma genus exhibited a tighter range (fig b) . next, we examined whether differences in cpg composition between viruses isolated from different hosts explained the range in cpg composition across the coronaviridae. for the representative sequences, a host could be assigned to . coronavirus sequences were divided into host groups, and groups with at least three divergent sequences were compared; this included bat, avian, camelid, canine, feline, human, mustelid, rodent, swine and ungulate viruses. variation in cpg composition between coronaviruses detected in different host species was evident across groups (p = . ) and between groups, with coronaviruses detected in canine and human species having lower cpg content and rodent and bat coronaviruses having the highest (fig. c ). significant differences in cpg composition were detected between bat and canine (p = . ), avian and rodent (p = . ), canine and mustelid (p = . ), canine and rodent (p < . ), human and rodent (p = . ), and rodent and ungulate (p = . ) viruses. all frequency ranges overlapped however, indicating viral cpg frequency alone seems to be a poor predictor of virus origin, contradicting the recent suggestion of a canine origin of sars-cov- (xia ). where sequences in a host group representative of both alpha and betacoronaviruses were available (which was the case for bat, camelid, canine, human, rodent and swine viruses), these sequences were split by genus and compared to determine whether coronavirus genera influenced coronavirus cpg frequencies in a host species-specific manner. by this method, the lack of difference in cpg composition of coronaviruses of different genera was maintained (fig. d) . 
to test the hypothesis that coronavirus cpg content varies according to tissue tropism (xia ), we classified the viruses according to their primary site of replication, where this was known or could be inferred from the sampling route. samples were split into five categories -'respiratory', 'enteric', 'multiple', 'other', or 'unknown'. altogether, of the sequences were classifiable (detailed in table s ), with sequences categorised as 'unknown' and excluded from further analyses. by this admittedly inexact approach, viruses infecting the respiratory tract had a significantly lower mean cpg composition than viruses with enteric tropism (p = . ; fig. e ). however, the spread of respiratory virus cpg frequencies was contained entirely within the range exhibited by enteric viruses. furthermore, sequences were assigned to the enteric group, and only to the respiratory group. of these sequences, bat viruses accounted for , all of which were assigned to the enteric group (despite reasonable sampling of respiratory tract in bats) and this cohort of viruses maintained almost the full spread of cpg frequencies (fig. e , table s ). thus, while coronavirus cpg frequency may show some correlation with replication site, the dataset available does not permit strong conclusions to be drawn or predictions about zoonotic potential to be made. cpg o:e ratios, sars-cov- has a genomic cpg ratio of . (representing the mean of complete genome sequences). this is similar to the value calculated previously for a much smaller sample (n = ) of sars-cov- sequences (xia the genomic cpg ratio. however, two orfs in particular, e orf and orf , had cpg ratios higher than , indicating an absence of cpg suppression in those regions (fig. a) . these two orfs also did not suppress the upa dinucleotide, in contrast with other sars-cov- orfs (fig. b) . 
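The observed:expected (o:e) ratios discussed above compare how often a dinucleotide occurs with how often it would be expected from the frequencies of its two bases alone. The sketch below uses one common convention for this calculation, and it is an assumption that the software used in the study computes it in exactly this way. The function name `dinucleotide_oe` is hypothetical, and the input is assumed to contain no ambiguity codes.

```python
def dinucleotide_oe(seq: str, dinucleotide: str = "cg") -> float:
    """Observed:expected ratio of a dinucleotide in a nucleotide sequence.

    observed = count(XpY) / (N - 1)      (frequency over dinucleotide positions)
    expected = f(X) * f(Y)               (product of single-base frequencies)
    RNA input is accepted: 'u' is treated as 't'.
    """
    s = seq.lower().replace("u", "t")
    first, second = dinucleotide.lower().replace("u", "t")
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    hits = sum(1 for i in range(n - 1) if s[i] == first and s[i + 1] == second)
    observed = hits / (n - 1)
    expected = (s.count(first) / n) * (s.count(second) / n)
    if expected == 0:
        raise ValueError("expected frequency is zero; ratio is undefined")
    return observed / expected
```

Under this convention, a ratio below one means the dinucleotide is rarer than base composition predicts (suppression), and a ratio above one means it is over-represented. Passing `"ua"` as the second argument gives the corresponding upa ratio, and applying the function to each orf separately yields the kind of per-orf comparison described in the text.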
due to the difficulties in distinguishing between dinucleotide bias and cpb, cpb scores were also calculated for each orf and plotted against cpg composition (fig. c) . cpb scores provide an indication of whether the codon pairs encoded in each orf are congruous with usage in vertebrate genomes. a score below indicates use of codon pairs that are disfavoured in host orfs. an approximately linear relationship between cpg o:e ratio and cpb score for each sars-cov- orf was apparent (r = . ). e orf and orf both had negative cpb scores, indicating that they use under-represented codon pairs and in keeping with the observation that both orfs over-represent cpg and upa dinucleotides. to examine the precise location of the cpg hotspots, a sliding window analysis of cpg content across the ' end of the sars-cov- genome (averaged over complete genome sequences) as well as the closely related bat and pangolin sequences was performed. as expected, marked increases in cpg o:e ratio were observed concomitant with the genomic regions associated with e orf and orf (fig. d) . the e orf and orf regions associated with high cpg composition were maintained across the bat, pangolin and human sequences, indicating that since the bat sample was collected in , the higher cpg frequency in this region has not been negatively selected. while the increase in cpg presentation was apparent across the entire e orf, starting at the ' end of orf and ending at the beginning of the m gene, the cpg spike in orf was more narrowly associated with the putative coding region. additionally, a cpg spike between the '-end of orf and the '- end of the n gene was evident. the '-end of the n orf also contains the overlapping orf b gene, which when considered alone, has a cpg o:e ratio approaching (fig. a) cov- , ratios for e orf: genomic cpg o:e were calculated (fig. b) . 
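The sliding window analysis mentioned above can be sketched as follows. The window length and step size used in the study are not stated in the text, so the defaults here are placeholders, and the function names `cpg_oe` and `sliding_cpg` are hypothetical. The sketch handles a single sequence; averaging across many genomes, as the text describes, would require applying it to each aligned sequence and taking the mean per window.

```python
from typing import List, Optional, Tuple


def cpg_oe(window: str) -> Optional[float]:
    """CpG observed:expected ratio for one window, or None if undefined."""
    n = len(window)
    c, g = window.count("c"), window.count("g")
    if n < 2 or c == 0 or g == 0:
        return None  # no C or no G, so the expected frequency is zero
    observed = window.count("cg") / (n - 1)
    return observed / ((c / n) * (g / n))


def sliding_cpg(seq: str, window: int = 100,
                step: int = 10) -> List[Tuple[int, Optional[float]]]:
    """CpG o:e ratio in fixed-size windows along a sequence.

    Returns (start, ratio) pairs with 0-based start positions. The default
    window and step are placeholders, not the values used in the source.
    """
    s = seq.lower()
    return [
        (start, cpg_oe(s[start:start + window]))
        for start in range(0, len(s) - window + 1, step)
    ]
```

Plotting the ratio against the start position gives a profile in which local peaks correspond to cpg-rich regions. Shorter windows resolve narrower peaks but produce noisier ratios, because each window contains fewer dinucleotide positions.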
in non-bat non-avian host genomes, e orf usually displayed cpg suppression in line with or stronger than that seen at the genome level, whereas sars-cov and sars-cov- starkly contrasted with this, displaying far less cpg suppression in this region. to investigate the evolutionary history of e orf cpg composition in the human-infecting coronaviruses, a phylogenetic reconstruction of all human coronavirus and bat coronavirus e genes was performed to determine whether cpg ratios in this region were ancestrally derived. as expected (cotten, et al. ; , the human viruses were interspersed among the bat viruses, reflective of their independent emergence events (fig. c) coronaviridae is striking. if coronaviruses also produce a protein with anti-zap activity, it is possible that this has variable efficacy between strains, explaining the ability of coronaviruses to fluctuate cpg composition considerably. alternatively (or in addition), this may be host driven; we show that average cpg suppression varies with host species (fig c) and, as previously suggested (xia ), this may be linked with zap expression levels. we have demonstrated that cpg variation is not related to viral taxonomic grouping (fig. b) but we did find an association between viral cpg composition and primary replication site, with respiratory coronaviruses having a lower cpg composition than enteric ones (fig. e) . this is the opposite of what has been previously suggested (xia ), though this proposal was not supported by any comprehensive investigation. nevertheless, our meta-analysis was subject to the sampling preferences of many labs who have performed surveillance for coronaviruses, and many of the tissue tropism assignments we made have not been verified by experimental infections. another limitation of this analysis is that only sequences of greater than % divergence were included. 
tissue tropism can be defined by much smaller differences; for example, a deletion in the spike protein of transmissible gastroenteritis virus (a porcine coronavirus) altered the tropism of the virus from enteric to respiratory, while nucleotide identity was preserved at % (cox, et al. ; rasschaert, et al. ) and speculatively this may indicate that sars-cov- was genetically predisposed to make a host switch into humans. similarly, the genomic cpb score of . indicates that sars-cov- uses codon pairs which are preferentially utilised in the human orfeome, which may mean that the virus was well suited for translational efficiency in humans at its time of emergence. in coding regions which do not have overlapping orfs, there is no requirement at the coding level for cpg motifs to be retained (kanaya, et al. ). e orf and orf are not known to be in overlapping reading frames; conversely, orf b overlaps with the orf for nucleocapsid (n). some cpg retention in this region is therefore inevitable and may explain the high cpg composition of orf b. this nevertheless leaves open the question of why cpg motifs are retained in the e orf and orf regions (if this is not an ancestrally derived evolutionary hangover; as cpgs have not been lost from these regions between and now (fig. d) lower abundance than most other transcripts (kim, et al. ) . it is therefore possible that e orf is of sufficiently low abundance for a high cpg frequency to be physiologically inconsequential. similar logic can be applied to orf , which is just nucleotides in length. synonymous addition of cpgs into a virus genome has been suggested as a potential novel approach context of their cpg composition and find that sars-cov- has a low cpg composition in comparison with other coronaviruses, but with cpg 'hotspots' in genomically disparate regions. 
this highlights the potential for large scale recoding of the sars-cov- genome by introduction of cpgs into multiple regions of the virus genome as a mechanism for generation of an attenuated live vaccine. introduction of cpg into multiple sites could also be used to subvert the potential of the virus to revert to virulence through recombination. a challenge of live attenuated vaccine manufacture is to enable sufficient production of a vaccine virus that has a replication defect. strategic introduction of cpgs into specific regions of the virus genome has the potential to negate a replication defect in zap coronaviruses were downloaded from ncbi on the april ( sequences in total). sequences were then aligned and sequences less than % divergent at the nucleotide level, identified using the 'identify similar/ identical sequences' function in sse v . were removed from the dataset. sequences were annotated into animal groups and genera based on their description in the ncbi database. the trimmed dataset (table s ) included complete genome coronavirus sequences. individual groups were made for sequences originating from the following hosts: bat (n = ), avian ( ), camelid ( ), canine ( ), feline ( ), human ( ), mustelids ( ), rodents ( ), swine ( ), ungulates ( ) and 'other' (which included bottle-nosed dolphin ( ), hedgehog ( ), rabbit ( ), beluga whale ( ), civet ( ) and pangolin ( )). groups were loosely defined based on taxonomic orders, with some exceptions made to examine our specific research questions. bats are of the order chiroptera; multiple avian orders were grouped together (galliformes, anseriformes, passeriformes, gruiformes, columbiformes and pelicaniformes); even toed (artiodactyla) and odd toed (perissodactyla) ungulate orders were grouped, with camelids analysed separately due to their association with mers-cov (azhar, et al. 
); canidae (canine) and pantherinae (feline) sequences of the carnivora order were analysed separately, as canines have previously been suggested as an intermediate host species for sars-cov- (xia ) and cat infections with sars-cov- have been reported; humans were the only representatives from the primate order; all remaining carnivora, with the exception of a single civet sequence, belonged to the mustelidae (mustelids); rodents belong to the rodentia order; and swine belong to the artiodactyla order; whales are also artiodactyla, but swine were considered separately due to considerable interest in porcine coronaviruses (vlasova, et al. ). sequences were also annotated for genus by reference to the ncbi description ( of the sequences were assigned to a genus), and for primary replication site by literature reference (refer to table s ). replication site annotations were based on the sample type from which a coronavirus sequence was obtained: 'enteric' for faecal/gastrointestinal samples; 'respiratory' for nasal, oropharyngeal and other respiratory samples; 'multiple' if samples from multiple systems tested positive; 'other' if the sample was collected from a site not falling into the enteric or respiratory categories (e.g. brain); or 'unknown' if a sample type could not be determined. if only one sampling route was tested and returned a positive result, the sequence was categorised in accordance with the sole sampling route. the sequence datasets used in this paper are summarised in fig. these were then categorised by genera, host, and tissue tropism. the subset of sequences were also aligned over the e orf and grouped by host (blue shaded boxes). each box first describes the dataset used; the number of sequences in that dataset is then indicated in italicized font, and the figure to which the dataset corresponds is indicated in bold font.
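The replication-site annotation rules described above amount to a small decision procedure, sketched here for illustration. The annotations in the study were made by literature reference rather than by code. The function name `replication_site` is hypothetical, and the keyword sets are assumptions that include only the sample types named in the text; they would need extending for real metadata.

```python
from typing import Iterable

# Illustrative vocabularies, limited to the sample types named in the text.
ENTERIC = {"faecal", "gastrointestinal"}
RESPIRATORY = {"nasal", "oropharyngeal", "respiratory"}


def replication_site(positive_samples: Iterable[str]) -> str:
    """Assign a primary replication site from the positive sample types.

    Returns 'enteric', 'respiratory', 'multiple', 'other' or 'unknown'.
    """
    systems = set()
    for sample in positive_samples:
        label = sample.strip().lower()
        if label in ENTERIC:
            systems.add("enteric")
        elif label in RESPIRATORY:
            systems.add("respiratory")
        elif label:
            systems.add("other")  # any named site outside the two main systems

    if not systems:
        return "unknown"   # sample type could not be determined
    if len(systems) > 1:
        return "multiple"  # positives from more than one system
    return next(iter(systems))
```

For example, `replication_site(["faecal"])` gives `'enteric'`, while `replication_site(["nasal", "faecal"])` gives `'multiple'`. A sequence with a single tested and positive sampling route is categorised by that route, so the result reflects where the virus was sampled and is only a proxy for true tissue tropism.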
include only one representative from sequences with less than % nucleotide diversity to overcome epidemiologic biases ( representative sequences), which were analysed in the subsequent sub- figures. b. coronavirus genus against genomic cpg content. other human-infecting coronaviruses (hcov- e, hcov-nl (alphacoronaviruses) and hcov-hku and hcov-oc (betacoronaviruses) are represented using orange circles. c. vertebrate host of coronavirus against genomic cpg content. statistically significant differences between cpg compositions of viruses from different hosts are indicated above the x axis line, with 'c' denoting a statistically significant difference from canine coronaviruses and 'r' denoting a statistically significant difference from rodent coronaviruses. tukey's multiple comparisons test was used to identify differences in cpg composition between viruses infecting different hosts. a p value < . is indicated with *, p < . = **, p < . = *** and p < . = ****. d. vertebrate host of coronavirus, with further sub- division into coronavirus genus, against genomic cpg content. alphacoronaviruses are denoted with filled circles and betacoronaviruses with open circles. e. primary replication site against genomic cpg content by host. tukey's multiple comparisons test was used to identify differences in cpg composition between viruses infecting different tissues. for a full breakdown of how these were assigned, please refer to table s . 
the influence of cpg and upa dinucleotide frequencies on rna virus replication and characterization of the innate cellular pathways underlying virus attenuation and enhanced replication
evidence for camel-to-human transmission of mers coronavirus
bats and coronaviruses
on the biased nucleotide composition of the human coronavirus rna genome
genetic inactivation of poliovirus infectivity by increasing the frequencies of cpg and upa dinucleotides within and across synonymous capsid region codons
translation initiation at alternate in-frame aug codons in the rabies virus phosphoprotein mrna is mediated by a ribosomal leaky scanning mechanism
cytosine methylation and the fate of cpg dinucleotides in vertebrate genomes
full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus
sites of replication of a porcine respiratory coronavirus related to transmissible gastroenteritis virus
characterisation of the transcriptome and proteome of sars-cov- using direct rna sequencing and tandem mass spectrometry reveals evidence for a cell passage induced in-frame deletion in the spike glycoprotein that removes the furin-like cleavage site
cpg islands and the regulation of transcription
cytosine methylation by dnmt facilitates stability and survival of hiv- rna in the host cell during infection
khnyn is essential for the zinc finger antiviral protein (zap) to restrict hiv- containing clustered cpg dinucleotides
candidates in astroviruses, seadornaviruses, cytorhabdoviruses and coronaviruses for + frame overlapping genes accessed by leaky scanning
reply to simmonds et al.: codon pair and dinucleotide bias have not been functionally distinguished
elevation of cpg frequencies in influenza a genome attenuates pathogenicity but enhances host response to infection
patterns of evolution and host gene mimicry in influenza and other rna viruses
mechanism of virus attenuation by codon pair deoptimization
the zinc-finger antiviral protein recruits the rna processing exosome to degrade the target mrna
nonrandom utilization of codon pairs in escherichia coli
hostile takeovers: viral appropriation of the nf-kb pathway
the bat origin of human coronaviruses
high-resolution analysis of coronavirus gene expression by rna sequencing and ribosome profiling
codon usage and trna genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with cg-dinucleotide usage as assessed by multivariate analysis
identification of direct targets and modified bases of rna cytosine methyltransferases
the architecture of sars-cov- transcriptome
point mutations define a sequence flanking the aug initiator codon that modulates translation by eukaryotic ribosomes
mega x: molecular evolutionary genetics analysis across computing platforms
codon pair bias is a direct consequence of dinucleotide bias
requirement of the '-end genomic sequence as an upstream cis-acting element for coronavirus subgenomic mrna transcription
evidence for involvement of a ribosomal leaky scanning mechanism in the translation of the hepatitis b virus pol gene from the viral pregenome rna
human cytomegalovirus evades zap detection by suppressing cpg dinucleotides in the major immediate early genes
genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding
the genome sequence of the sars-associated coronavirus
asymmetrical distribution of cpg in an 'average' mammalian gene
effects of cytosine methylation on transcription factor binding sites
attenuation of rna viruses by redirecting their evolution in sequence space
downstream ribosomal entry for translation of coronavirus tgev gene b
the role of zap and oas /rnasel pathways in the attenuation of an rna virus with elevated frequencies of cpg and upa dinucleotides
coronaviruses post-sars: update on replication and pathogenesis
porcine respiratory coronavirus differs from transmissible gastroenteritis virus by a few genomic deletions
dinucleotide and stop codon frequencies in single-stranded rna viruses
characterization of a novel coronavirus associated with severe
translation reinitiation and leaky scanning in plant viruses
a new model for coronavirus transcription. coronaviruses and arteriviruses
a contemporary view of coronavirus transcription
the orf b protein of severe acute respiratory syndrome coronavirus (sars-cov) is expressed in virus-infected cells and incorporated into sars-cov particles
evidence for translation of the borna disease virus g protein by leaky ribosomal scanning and ribosomal reinitiation
bovine coronavirus i protein synthesis follows ribosomal scanning on the bicistronic n mrna
sse: a nucleotide and amino acid sequence analysis platform
modelling mutational and selection pressures on dinucleotides in eukaryotic phyla - selection against cpg and upa in cytoplasmically expressed rna and in rna viruses
widespread occurrence of -methylcytosine in human coding and non-coding rna
the expected equilibrium of the cpg dinucleotide in vertebrate genomes under a mutation model
cg dinucleotide suppression enables antiviral defence targeting non-self rna
the short form of the zinc finger antiviral protein inhibits influenza a virus protein expression and is antagonized by the virus-encoded ns
preferred and avoided codon pairs in three domains of life
sequence context at human single nucleotide polymorphisms: overrepresentation of cpg dinucleotide at polymorphic sites and suppression of variation in cpg islands
rna virus attenuation by codon pair deoptimisation is an artefact of increases in cpg/upa dinucleotide frequencies. elife :e .
van der hoek l. identification of a new human coronavirus
porcine coronaviruses. emerging and transboundary animal viruses
overlapping signals for translational regulation and packaging of influenza a virus segment
characterization and complete genome sequence of a novel coronavirus
cytosine deamination and selection of cpg suppressed clones are the two major independent biological forces that shape codon usage bias in coronaviruses
extreme genomic cpg deficiency in sars-cov- and evasion of host antiviral defense
the c protease of enterovirus a counteracts the activity of host zinc-finger antiviral protein (zap)
isolation of a novel coronavirus from a man with pneumonia in saudi arabia
a novel coronavirus from patients with pneumonia in china
key: cord- -scd f vk authors: pape, constantin; remme, roman; wolny, adrian; olberg, sylvia; wolf, steffen; cerrone, lorenzo; cortese, mirko; klaus, severina; lucic, bojana; ullrich, stephanie; anders-Össwein, maria; wolf, stefanie; cerikan, berati; neufeldt, christopher j.; ganter, markus; schnitzler, paul; merle, uta; lusic, marina; boulant, steeve; stanifer, megan; bartenschlager, ralf; hamprecht, fred a.; kreshuk, anna; tischer, christian; kräusslich, hans-georg; müller, barbara; laketa, vibor title: microscopy-based assay for semi-quantitative detection of sars-cov- specific antibodies in human sera date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: scd f vk emergence of the novel pathogenic coronavirus sars-cov- and its rapid pandemic spread presents numerous questions and challenges that demand immediate attention. among these is the urgent need for a better understanding of humoral immune response against the virus as a basis for developing public health strategies to control viral spread. for this, sensitive, specific and quantitative serological assays are required. here we describe the development of a semi-quantitative high-content microscopy-based assay for detection of three major classes (igg, iga and igm) of sars-cov- specific antibodies in human samples. 
the possibility to detect antibodies against the entire viral proteome together with a robust semi-automated image analysis workflow resulted in specific, sensitive and unbiased assay which complements the portfolio of sars-cov- serological assays. the procedure described here has been used for clinical studies and provides a general framework for the application of quantitative high-throughput microscopy to rapidly develop serological assays for emerging virus infections. the recent emergence of the novel pathogenic coronavirus sars-cov- [ ] [ ] [ ] and the rapid pandemic spread of the virus has dramatic consequences in all affected countries. in the absence of a protective vaccine or a causative antiviral therapy for covid- patients, testing for sars-cov- infection and tracking of transmission and outbreak events are of paramount importance to control viral spread and avoid the overload of healthcare systems. the sequence of the viral genome became publicly available only weeks after the initial reports on covid- witnessed in the early phases of the ongoing sars-cov- pandemic. thus, complementary strategies to test for antiviral antibodies that can be rapidly deployed in situations where commercially available kits are either not yet developed or not available are an important addition to the diagnostic toolkit. immunofluorescence (if) using virus infected cells as a specimen is a classical serological approach in virus diagnostics and has been applied to coronavirus infections, including the closely related virus sars-cov [ ] [ ] [ ] . the advantages of if are (i) that it does not depend on specific diagnostic reagent kits or instruments, (ii) that the specimen contains all viral antigens expressed in the cellular context and (iii) that the method has the potential to provide high information content (differentiation of staining patterns and intensities due to reactivity against various viral proteins). 
a major disadvantage of the if approach as it is typically used in serological testing is its limited throughput capacity due to the involvement of manual microscopy handling steps and sample evaluation based on visual inspection of micrographs. furthermore, visual classification is subjective and thus not well standardized, and it yields only binary results. here, we address those limitations, making use of advanced automated microscopy and image analysis strategies developed for basic research. we present the establishment and validation of a semi-quantitative, semi-automated workflow for sars-cov- specific antibody detection. with its -well format, semi-automated microscopy and automated image analysis workflow, it combines advantages of if with a reliable and objective semi-quantitative readout and high throughput compatibility. the protocol described here was developed in response to the emergence of sars-cov- , but it represents a general approach that can be adapted for the study of other viral infections and is suitable for rapid deployment to support diagnostics of emerging viral infections in the future. . results . setup of the if assay for sars-cov- antibody detection we decided to use cells infected with sars-cov- as samples for our if analyses, since this setup provides the best chance for detection of antibodies targeted at the different viral proteins expressed in the host cell context. african green monkey kidney epithelial cells (veroe cell line) have been used for infection with sars-cov- , virus production and if [ , ] . in preparation for our analyses we compared different cell lines for use in infection and if experiments, but all tested cell lines were found to be inferior to veroe cells for our purposes (see materials and methods and fig. s ). all following experiments were thus carried out using the veroe cell line. 
in order to allow for clear identification of positive reactivity in spite of a variable and sometimes high nonspecific background from human sera, our strategy involves a direct comparison of the if signal from infected and non-infected cells in the same sample. preferential antibody binding to infected compared to non-infected cells indicates the presence of specific sars-cov- antibodies in the examined serum. under our conditions, infection rates of ~ - % of the cell population were achieved, allowing for a comparison of infected and non-infected cells in the same well of the test plate. an antibody that detects dsrna produced during viral replication was used to distinguish infected from non-infected cells within the same field of view ( fig. a) . in order to define the conditions for immunostaining using human serum, we selected a small panel of negative and positive control sera. four sera from healthy donors collected before november were chosen as negative controls, and eight sera from pcr confirmed covid- inpatients collected at day or later post symptom onset were employed as positive controls. sera from this test cohort were used for primary staining, and bound antibodies were detected using fluorophore-coupled secondary antibodies against human igg, iga or igm. no difference between infected and non-infected cells in serum igg antibody binding was observed when sera collected before the onset of the sars-cov- pandemic were examined ( fig. b, fig. s ). in contrast, covid- patient sera were clearly characterized by higher serum igg antibody binding to infected compared to non-infected cells (fig. b) . all eight covid- patient serum samples yielded higher igg binding to infected compared to non-infected cells as assessed by visual inspection (fig. s ). similar results were obtained when an iga or igm specific secondary antibody was used for detection (fig. s ) . 
in order to allow for the parallel assessment of igg and iga or igm antibodies, we established conditions for the parallel detection of anti-igg coupled to alexafluor and anti-iga or anti-igm coupled to dylight or alexafluor secondary antibodies, respectively, without signal bleedthrough. using this approach, it was possible to implement detection of sars-cov- specific igg and iga or igm antibodies in a single experimental setup (fig. s ) . titration experiments were performed with positive control sera to determine the optimal range of serum concentration in the if experiments. all eight positive control samples showed visually detectable specific labelling of infected cells over the range of : and : , demonstrating robustness of the assay (fig. s ). serum concentrations of less than : did not yield detectable signals in all cases. we decided to employ a dilution of : in the further experiments . image analysis our next aim was to establish a semi-automated analysis workflow for image acquisition and analysis for a medium to high throughput setting. veroe cells were seeded into -well plates infected and immunostained using anti-dsrna antibody and patient serum, followed by indirect detection using a mixture of anti-igg and anti-iga/igm secondary antibodies. images were acquired using an automated widefield microscope (see materials and methods section for more detail). to obtain a measure for specific antibody binding we performed automated segmentation of cells and classified them into infected and non-infected cells based on the dsrna staining. we then measured fluorescence intensities in the serum channel per cell as a proxy for the amount of bound antibodies for both infected and non-infected cells and calculated the ratio between these values for infected and non-infected cells in a given specimen. 
to enable training of a machine learning approach for cell segmentation and to directly evaluate infected cell classification, we manually labelled cells and annotated them as infected/non-infected in images chosen from positive and control specimens. fig. presents a graphical overview of all analysis steps; the full description of every step can be found in materials and methods. briefly, our approach works as follows: first, we manually discarded all images that contained obvious artefacts such as large dust particles or dirt and out-of-focus images. then, images were processed to correct for the uneven illumination profile in each channel. next, we segmented individual cells with a seeded watershed algorithm [ ] , using nuclei segmented via stardist [ ] as seeds and boundary predictions from a u-net [ , ] as a heightmap. we evaluated this approach using leave-one- image-out cross-validation on the manual annotations and measured an average precision [ ] of . +- . (i.e., on average % of segmented cells are matched correctly to the corresponding cell in the annotations). combined with extensive automatic quality control which discards outliers in the results, the segmentation was found to be of sufficient quality for our analysis, especially since robust intensity measurements were used to reduce the effect of remaining errors. we then classified the segmented cells into infected and non-infected, by measuring the th percentile intensities in the dsrna channel and classifying cells as infected if this value exceeded . times the noise level, determined by the mean absolute deviation. this factor and the percentile were determined empirically using grid search on the manually annotated images (see above). using leave-one-out cross validation on the image level, we found that this approach yields an average f -score of . %. 
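the infected-cell classification described above can be illustrated with a minimal sketch. it assumes that the marker-channel pixel intensities of each segmented cell and of the plate background have already been extracted as flat lists; the function names, the quantile `q` and the factor `n` are hypothetical placeholders and not the values determined in the study, and the noise level is taken here as the mean absolute deviation around the mean, which is an assumption.

```python
from statistics import median


def quantile(values, q):
    """Quantile of a non-empty sequence with linear interpolation (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def mean_abs_deviation(values):
    """Mean absolute deviation around the mean (assumed definition of the noise level)."""
    center = sum(values) / len(values)
    return sum(abs(v - center) for v in values) / len(values)


def classify_infected(cell_pixels, background_pixels, q=0.95, n=4.0):
    """Map each cell id to True (infected) or False (non-infected).

    A cell counts as infected if its q-quantile marker intensity exceeds the
    median background by more than n times the background noise level.
    q and n are placeholders, not the values used in the study.
    """
    bg_median = median(background_pixels)
    threshold = n * mean_abs_deviation(background_pixels)
    return {
        cell_id: quantile(pixels, q) - bg_median > threshold
        for cell_id, pixels in cell_pixels.items()
    }
```

in such a setup, `q` and `n` would be the two parameters tuned by grid search against the manually annotated cells.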
in order to make our final measurement more reliable, we then discarded whole wells, images or individual cells that did not pass quality control (see materials and methods). to score each sample, we computed the intensity ratio $r = m_I / m_N$. here, $m_I$ is the median serum intensity of infected cells and $m_N$ the median serum intensity of non-infected cells. for each cell, we compute its intensity by computing the mean pixel intensity in the serum channel (excluding the nucleus area where we typically did not observe serum binding) and then subtracting the background intensity, which is measured on two control wells that did not contain any serum. we used efficient implementations for all processing steps and deployed the analysis software on a computer cluster in order to enhance the speed of imaging data processing. for visual inspection, we have further developed an open-source software tool (plateviewer) for interactive visualization of high-throughput microscopy data [ ] . plateviewer was used in a final quality control step to visually inspect positive hits. for example, plateviewer inspection allowed identifying a characteristic spotted pattern co-localizing with the dsrna staining ( fig. s ) that was sometimes observed in the iga channel upon staining with negative control serum. in contrast, sera from covid- patients typically displayed cytosol, er-like and plasma membrane staining patterns in this channel (fig. b, fig. s ). the dsrna co-localizing pattern observed for sera from the negative control cohort is by definition non-specific for sars-cov- , but would be classified as a positive hit based on staining intensity alone. using plateviewer, we performed a quality control on all iga positive hits and removed those displaying the spotted pattern co-localizing with the dsrna signal from further analysis. with the immunofluorescence protocol and automated image analysis in place we proceeded to test a larger number of control samples in a high throughput compatible manner for assay validation. 
all samples were processed for if as described above, and in parallel analysed by a commercially available semi-quantitative sars-cov- elisa approved for diagnostic use (euroimmun, lübeck, germany) for the presence of sars-cov- specific igg and iga antibodies. as outlined above, a main concern regarding serological assays for sars-cov- antibody detection is the occurrence of false positive results. a particular concern in this case is cross-reactivity of antibodies that originated from infection with any of the four types of common cold corona viruses (cccov) circulating in the population. the highly immunogenic major structural proteins of sars-cov- nucleocapsid (n) and spike (s) protein, have an overall homology of ~ % [ ] to their counterparts in cccov and subdomains of these proteins display a higher degree homology; cross-reactivity with cccov has been discussed as the major reason for false positive detection in serological tests for closely related sars-cov and mers-cov [ ] . also, acute infection with epstein-barr virus (ebv) or cytomegalovirus (cmv) may result in unspecific reactivity of human sera [ , ] . we therefore selected a negative control panel consisting of sera collected before the fall of , comprising samples from healthy donors (n= , cohort b), patients that tested positive for cccov several months before the blood sample was taken (n= , all four types of cccov represented; cohort a), as well as patients with diagnosed mycoplasma pneumoniae (n= ; cohort z), ebv or cmv infection (n= , cohort e). we further selected a panel of sera from rt-pcr confirmed covid- patients collected at different days post symptom onset as a positive sample set (cohort c, see below). sera were employed as primary antisera for if staining using igm, iga or igg specific secondary antibodies, and samples were imaged and analysed as described above. this procedure yielded a ratiometric intensity score for each serum sample. 
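as an illustration of how such a ratiometric score can be obtained, the following minimal sketch computes the ratio of the median background-corrected serum intensity of infected cells to that of non-infected cells. it assumes that the serum-channel pixels of each cell (nucleus excluded) are available as lists and that the background value from the serum-free control wells is already known; all names are hypothetical.

```python
from statistics import mean, median


def cell_intensity(serum_pixels, background):
    """Mean serum-channel intensity of one cell minus the background intensity."""
    return mean(serum_pixels) - background


def intensity_ratio(infected_cells, non_infected_cells, background):
    """Ratio of median per-cell intensities: infected over non-infected."""
    m_infected = median(cell_intensity(p, background) for p in infected_cells)
    m_non_infected = median(cell_intensity(p, background) for p in non_infected_cells)
    if m_non_infected == 0:
        raise ValueError("median intensity of non-infected cells is zero")
    return m_infected / m_non_infected
```

using medians rather than means keeps the score robust against a small number of mis-segmented or mis-classified cells.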
based on the scores obtained for the negative control cohort and the patient sera, we defined the threshold separating negative from positive scores for each of the antibody channels. for this, we performed roc curve analysis [ ] [ ] [ ] on a subset of the data (cohorts a, b, c, z). using this approach, it is possible to take the relative importance of sensitivity versus specificity as well as seroprevalence in the population (if known) into account for optimal threshold definition. by giving more weight to false positive or false negative results, one can adjust the threshold dependent on the context of the study. whereas high sensitivity is of importance for e.g. monitoring seroconversion of a patient known to be infected, high specificity is crucial for population based screening approaches, where large study cohorts characterized by low seroprevalence are tested. since we envision the use of the assay for screening approaches, we decided to assign more weight to specificity at the cost of sensitivity for our analyses (see materials and methods for an in-depth description of the analysis). optimal separation in this case was given using threshold values of . , . and . for iga, igg and igm channels respectively (fig. s ). we validated the classification performance on negative control cohort e (n= ) which was not seen during threshold selection, and detected no positive scores. results from the analysis of the negative control sera are presented in fig. and table . while the majority of these samples tested negative in elisa measurements as well as in the if analyses, some positive readings were obtained in each of the assays, in particular in the iga specific analyses ( fig. and table ). since samples from these cohorts were collected between and , and donors were therefore not exposed to sars-cov- before sampling, these readings represent false positives. 
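the weighting of specificity against sensitivity in the threshold selection can be sketched as follows. the sketch assumes the textbook form of cost-weighted roc analysis, in which the chosen operating point maximizes sensitivity minus a slope times the false-positive rate, with the slope given by the false-positive/false-negative cost ratio multiplied by (1 - prevalence)/prevalence; this formulation, the function names and the example values are assumptions and may differ in detail from the analysis performed in the study.

```python
def roc_slope(cost_ratio, prevalence):
    """Slope m from the false-positive/false-negative cost ratio and the prevalence."""
    return cost_ratio * (1 - prevalence) / prevalence


def roc_points(negative_scores, positive_scores):
    """List of (threshold, sensitivity, false_positive_rate) tuples.

    A sample is called positive if its score is strictly above the threshold.
    """
    thresholds = sorted(set(negative_scores) | set(positive_scores))
    points = []
    for t in thresholds:
        true_pos = sum(s > t for s in positive_scores)
        false_pos = sum(s > t for s in negative_scores)
        points.append((t, true_pos / len(positive_scores),
                       false_pos / len(negative_scores)))
    return points


def pick_threshold(negative_scores, positive_scores, slope):
    """Threshold where a line of the given slope touches the ROC curve."""
    best = max(roc_points(negative_scores, positive_scores),
               key=lambda p: p[1] - slope * p[2])
    return best[0]
```

a slope larger than one penalizes false positives more strongly and therefore moves the selected threshold towards higher specificity.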
of note, negative control cohort e displayed a particularly high rate of false positives in elisa measurements, but not in if (table ). we conclude that the threshold values determined achieve our goal of yielding highly specific if results (at the cost of sub-maximal sensitivity). roughly . % (iga) or % (igg) of the samples were classified as positive or potentially positive by elisa ( table ). the notably lower specificity of the iga determination in a seronegative cohort observed here is in accordance with findings in other studies [ , ]. accordance with other reports [ ] [ ] [ ] . consistent with other reports [ ] , sars-cov- specific igm was not detected notably earlier than the two other antibody classes in our measurements. at the earlier time points (up to day ), a similar or higher proportion of positive samples was detected by if compared to elisa for igg. although the sample size used here is too small to allow a firm conclusion, these results suggest that the sensitivity of igg detection by the semi-quantitative if approach is higher than that of an approved semi-quantitative elisa assay routinely used in diagnostic labs. in the case of iga detection at earlier time points (< day ), elisa performed slightly better ( / samples scored positive) compared to if ( / scored positive); however, this came at the price of a very low specificity of the elisa iga assay ( . % false positive detection) compared to if ( . %). . discussion here, we describe the development of a semi-quantitative if based assay for detection of sars-cov- specific antibodies in human samples that complements available elisa-based testing systems [ , ] . alternatives to elisa-based commercial test kits are important in situations where those kits are not available, either because they are not yet developed in the early days of the pandemic or due to high global demands for tests and required reagents. 
the microscopy-based assay described here was developed during the early phase of the covid- pandemic to support the serological testing needs of the university hospital heidelberg, germany, and is employed as a confirmatory assay in clinical studies ([ ] and ongoing studies). the assay displayed sensitivity and specificity comparable to or slightly better than those of a commercially available semi-quantitative sars-cov- elisa approved for diagnostic use at the time. more importantly, combining two technically different serological assays, if and elisa, and classifying as "positive hits" only those that scored positive in both assays was instrumental in minimizing false positive results while maintaining high sensitivity, and thus serves as a principle for serological studies or diagnostics where specificity of detection is of critical importance. specificity of detection is essential in settings of relatively low sars-cov- antibody prevalence [ ] [ ] [ ] in conjunction with high prevalence of potentially cross-reactive anti-cccov antibodies in a global population [ ] . one advantage of the if based assay presented here is that the specimens used for detection present the entire viral proteome, while elisa or chemiluminescent approaches use a single recombinantly expressed antigen. both the n and s protein of coronaviruses are highly immunogenic, and antibodies binding to the receptor binding domain on the s subunit are considered most relevant for neutralization. however, the relative importance of antibodies directed against the n protein for potential protective immunity against sars-cov- and the possible relevance of the overall breadth of the antibody response are currently unclear. other sars-cov- structural and non-structural proteins might play a role in the immune response, as was shown for proteins a and b of the closely related sars-cov [ ] . 
in addition, expression of the viral proteome in permissive cells ensures correct protein folding and post-translational modification patterns. alterations in post-translational modifications are likely to influence the ability of serum antibodies to bind to different viral epitopes as it was shown for other viruses such as hiv- [ ] . it has to be noted that the detection of viral rna requires fixation and permeabilization of cells, which has the potential to affect epitope preservation. however, based on the high sensitivity of antibody detection and the good correlation to elisa measurements observed we conclude that this was no major concern in this case. two major disadvantages of typical if-based serological assays as applied in the past are manual microscopy acquisition steps and evaluation of samples based on a visual inspection. this procedure is incompatible with high throughput approaches and results are subjective, not quantitative and difficult to standardize. we have addressed these disadvantages by implementing automated microscopy acquisition and developing a robust software platform that is able to identify individual cells, classify infected and non-infected cells and take into account specific and non-specific background in order to generate semi-quantitative results. high-throughput application [ , ] . combining such cell lines with spectral unmixing microscopy [ ] would not only enable simultaneous determination of levels of all three major classes of antibodies (igm, igg and iga), but also identification of the viral antigens recognized, in a single multiplexed approach. the high information content of the if data (differential staining patterns) together with a machine learning-based approach [ ] and the implementation of stable cell lines expressing selected viral antigens in the if assay will provide additional parameters for classification of patient sera and further improve sensitivity and specificity of the presented if assay. 
the described analysis pipeline can be readily applied for serological analysis of other virus infections, provided that an infectable cell line and a staining procedure that allows differentiating between infected and non-infected cells are available. the assay described here thus offers potential as an immediate response to any future virus pandemic, as it can be rapidly deployed from the moment the first isolate of the pathogen has been obtained without requiring information on the expression or immunogenicity of viral proteins. two of our processing steps require manually annotated data: in order to train the convolutional neural network used for boundary and foreground prediction, we needed label masks for the individual cells. to determine suitable parameters for the infected cell classification, we needed a set of cells classified as being infected or non-infected. we have produced these annotations for images with the following steps. first, we created an initial segmentation following the approach outlined in the segmentation subsection, using boundary and foreground predictions from the ilastik [ ] pixel classification workflow, which can be obtained from a few sparse annotations. we then corrected this segmentation using the annotation tool bigcat (https://github.com/saalfeldlab/bigcat). after correction, we manually annotated these cells as infected or non-infected. note that this mode of annotations can introduce two types of bias: the segmentation labels are derived from an initial segmentation. small systematic errors in the initial segmentation that were not found during correction could influence the boundary predictions. cell segmentation forms the basis of our analysis method. in order to obtain an accurate segmentation, we make use of both the dapi and the serum channel. first, we segment the nuclei on the dapi channel using the stardist method [ ] trained on data from caicedo et al. [ ] . 
note that this method yields an instance segmentation: each nucleus in the image is assigned a unique id. in addition, we predict per pixel probabilities for the boundaries between cells and for the foreground (i.e. whether a given pixel is part of a cell) using a d u-net [ ] based on the implementation of wolny et al. [ ] . this method was trained using the annotated images, see above. the cells are then segmented by the seeded watershed algorithm [ ] . we use the nucleus segmentation, dilated by pixels, as seeds and the boundary predictions as the height map. in addition, we threshold the foreground predictions, erode the resulting binary image by pixels and intersect it with the binarised seeds. the result is used as a foreground mask for the watershed. the dilation / erosion is performed to alleviate issues with very small nucleus segments / imprecise foreground predictions. in order to evaluate this segmentation method, we train different networks using leave-one-out cross-validation, training each network on of the manually annotated images and evaluating it on the remaining one. we measure the segmentation quality using average precision [ ] at an intersection over union (iou) threshold of . as described in https://www.kaggle.com/c/data-science-bowl- /overview/evaluation. we measure a value of . +- . with the optimum value being . . quantitation and scoring to distinguish infected cells from control cells we use the dsrna virus marker channel: infected cells show a signal in this channel while the non-infected control cells should ideally be invisible (see fig. ). we classified each cell in the cell segmentation (see above) individually, using the following procedure. first, we denoised the marker channel using a white tophat filter with a radius of pixels. 
to account for inaccuracies in the cell segmentation (the exact position of cell borders is not always clear), we then eroded all cell masks with a radius of pixels, thereby discarding pixels close to segment boundaries. this step does not lead to information loss, since the virus marker is mostly concentrated around the nuclei. on the remaining pixels of each cell, we compute the . quantile ($q$) of the intensity in the marker channel. for the pixels that the neural network predicts to belong to the background, we compute the median intensity ($m_{\mathrm{bg}}$) of the virus marker channel across all images in the current plate. finally, we classify the cell as infected if the . quantile of its intensity exceeds the median background by more than a given threshold $t$: $q - m_{\mathrm{bg}} > t$. for additional robustness against intensity variations, we adapt the threshold to the variation of the background in the plate. hence, we define it as a multiple of the mean absolute deviation of all background pixels of that plate, $t = n \cdot \mathrm{mad}_{\mathrm{bg}}$, with n= . . to determine the optimal values of the parameters used in our procedure, we used the cells manually annotated as infected / non-infected (see above) and performed a grid search over ranges of these parameters.

in order to determine the presence of sars-cov- specific antibodies in patient sera, it was necessary to define a decision threshold r*. if a measured intensity ratio r is above the decision threshold r*, then the serum is characterized as positive for sars-cov- antibodies. for this, an roc analysis was performed [ ] . each possible choice of r* for a test corresponds to a particular sensitivity/specificity pair. by continuously varying the decision threshold, we measured all possible sensitivity/specificity pairs, which together form the roc curve (fig. s ). to determine the appropriate r* we considered two factors [ ] : the relative cost of false-positive and false-negative results and the prevalence of disease. together they define the slope $m = \frac{C_{\mathrm{FP}}}{C_{\mathrm{FN}}} \cdot \frac{1 - P}{P}$, where $C_{\mathrm{FP}} / C_{\mathrm{FN}}$ is the false-positive/false-negative cost ratio and $P$ is the prevalence or prior probability of disease.
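the per-cell infected / non-infected rule described in this section can be sketched in a few lines of numpy. this is a minimal illustration under stated assumptions, not the authors' library code: the function name `classify_infected` and its arguments are hypothetical, the marker image is assumed to be already denoised, the cell masks are assumed to be already eroded, label 0 is assumed to mean "no cell", and the "mean absolute deviation" is read here as the mean absolute deviation around the background mean.

```python
import numpy as np


def classify_infected(marker, cell_labels, background_mask, q, n_mad):
    """Hypothetical sketch of the quantile-vs-background rule.

    marker          : 2D array, virus marker channel (assumed denoised)
    cell_labels     : 2D int array, 0 = no cell, k > 0 = pixels of cell k
                      (assumed to be the eroded cell masks)
    background_mask : 2D bool array, True where a pixel is background
    q               : quantile of the per-cell intensity to use (0..1)
    n_mad           : multiple of the background deviation used as threshold
    """
    bg = marker[background_mask]
    bg_median = np.median(bg)
    # Assumption: mean absolute deviation taken around the background mean.
    bg_dev = np.mean(np.abs(bg - bg.mean()))
    threshold = n_mad * bg_dev

    result = {}
    for label in np.unique(cell_labels):
        if label == 0:  # skip non-cell pixels
            continue
        cell_q = np.quantile(marker[cell_labels == label], q)
        # Infected if the cell quantile exceeds the background median
        # by more than the adaptive threshold.
        result[int(label)] = bool(cell_q - bg_median > threshold)
    return result
```

because the threshold is expressed in units of the background deviation, the same value of `n_mad` can be reused across plates whose absolute background intensity differs; the values of `q` and `n_mad` would come from the grid search on annotated cells.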
the optimal decision threshold r*, given the false-positive/false-negative cost ratio and prevalence, is the point on the roc curve where a line with slope m touches the curve. as discussed in the main text, a major concern regarding serological assays for sars-cov- antibody detection is the occurrence of false-positive results. therefore, we choose m to be larger than one in our analysis. in particular, we determine r* for the choice of m= (see fig. s ). we performed quality control of the images and analysis results at the level of wells, images and cells. the entities that did not pass quality control are not taken into account when computing the score during final analysis. we exclude wells that contain less than non-infected cells, that have a median serum intensity of infected cells smaller than times the noise level (measured by the median absolute deviation), or that have negative intensity ratios, which can happen due to the background subtraction. out of . wells, did not pass the quality control, corresponding to . % of wells. at the image level, we visually inspect all images and mark those that contain imaging artifacts using a viewer based on napari [ ] . we distinguish the following types of artifacts during the visual inspection: empty, unstained or over-saturated images, as well as images covered by a large bright object. in addition, we automatically exclude images that contain less than or more than cells. these thresholds are motivated by the observation that too few or too many cells often result from a problem in the assay. thus, of the total . images were excluded from further analysis, corresponding to . % of images. out of these, were manually marked as outliers and only a single one did not pass the subsequent automatic quality control. finally, we automatically exclude segmented cells with a size smaller than pixels or larger than . pixels that most likely correspond to segmentation errors. 
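the choice of r* described earlier in this section (the point where a line of slope m touches the roc curve) is equivalent to picking the threshold that maximises sensitivity minus m times the false-positive rate. the following is a minimal sketch of that idea, assuming labelled positive and negative control ratios are available; the function name `pick_threshold` and the toy numbers in the usage note are hypothetical and do not come from the study.

```python
import numpy as np


def pick_threshold(ratios, is_positive, m):
    """Hypothetical sketch: choose r* on the empirical ROC curve.

    A sample is called positive when its ratio is strictly above r.
    The line of slope m that touches the ROC curve does so at the
    point maximising TPR - m * FPR, so that quantity is scanned over
    all observed ratios used as candidate thresholds.
    """
    ratios = np.asarray(ratios, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    n_pos = is_positive.sum()
    n_neg = (~is_positive).sum()

    best_r, best_score = None, -np.inf
    for r in np.unique(ratios):  # sorted candidate thresholds
        called = ratios > r
        tpr = (called & is_positive).sum() / n_pos   # sensitivity
        fpr = (called & ~is_positive).sum() / n_neg  # 1 - specificity
        score = tpr - m * fpr
        if score > best_score:  # ties keep the lower threshold
            best_r, best_score = r, score
    return best_r
```

with a toy set of negatives at 1.0, 1.1, 1.2, 1.6 and positives at 1.5, 2.0, 2.5, 3.0, a large slope moves the threshold above the highest negative, while a small slope accepts one false positive in exchange for full sensitivity. this mirrors the reasoning for choosing m larger than one when false positives are the main concern.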
these limits were derived by the histogram of cell sizes investigated for several plates. two percent of the approximate . million segmented cells did not pass this quality control. in addition, we have also inspected all samples scored as positives. for the iga channel, we have found a dotty staining pattern in ten cases that produced positive hits based on intensity ratio in negative control cohorts, but does not appear to indicate a specific antibody response. we have also excluded these samples from further analysis. in order to scale the analysis workflow to the large number of images produced by the assay, we implemented an open-source python library to run the individual analysis steps. this library allows rerunning experiments for a given plate for newly added data on demand and caches intermediate results in order to rerun the analysis from checkpoints in case of errors in one of the processing steps. to this end, we use a file layout based on hdf [ ] to store multi-resolution image data and tabular data. the processing steps are parallelized over the images of a plate if possible. we use efficient implementations for the u-net [ ] , stardist [ ] and the watershed algorithm (http://ukoethe.github.io/vigra/) as well as other image processing algorithms [ ] . we use pytorch (https://pytorch.org/) to implement gpu-accelerated cell feature extraction. the total processing time for a plate (containing around images) is about two hours and thirty minutes using a single gpu and cpu cores. in addition, the results of the analysis as well as meta-data associated with individual plates are automatically saved in a centralized mongodb database (https://www.mongodb.com) at the end of the workflow execution. 
apart from keeping track of the analysis outcome and meta-data, a user can save additional information about a given plate/well/image in the database conveniently using the plateviewer (see below).

images that correspond to three different ratio scores (the ratio score is indicated above each image), determined from samples stained with three different human sera, followed by staining with an anti-igg secondary antibody coupled to alexa fluor . images represent overlays of three channels: nuclei (blue), igg (green) and dsrna (red). white boxes mark the zoomed area. cells in the insets are highlighted with yellow or cyan boundaries, indicating infected and non-infected cells, respectively. scale bar = m.

we thank embl, especially the embl it services department, for providing computational infrastructure and support, as well as wolfgang huber for discussions on computing image based scores and statistical tests. we thank the patients who participated in this study berlin and the european virus archive (evag) for the provision of the sars-cov- strain bavpat . individual images used in fig. a are courtesy of medical illustrations database. sb is supported by the heisenberg program (project number ) and mls is supported by the dfg (project number ). the funders had no role in study design, data collection bartenschlager microscopy development: s merle data interpretation laketa study design all authors have read and approved the final version of the manuscript.

miccai . miccai van den bogaard prevalence of covid- in children in baden-württemberg preliminary study report handb. open source tools

we would like to thank martin weigert and uwe schmidt for their help with setting up prediction for stardist. we would like to acknowledge the infectious disease imaging platform (idip) at the center for integrative infectious diseases research (ciid) for microscopy support.
we would like to

key: cord- -xp rkxc authors: nikolaev, en; indeykina, mi; brzhozovskiy, ag; bugrova, ae; kononikhin, as; starodubtseva, nl; petrotchenko, ev; kovalev, g; borchers, ch; sukhikh, gt title: mass spectrometric detection of sars-cov- virus in scrapings of the epithelium of the nasopharynx of infected patients via nucleocapsid n protein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xp rkxc detection of viral rna by pcr is currently the main diagnostic tool for covid- [ ]. the pcr-based test, however, shows limited sensitivity, especially at early and late stages of disease development [ , ], and is relatively time consuming. fast and reliable complementary methods for detecting the viral infection would be of help under the current pandemic conditions. mass spectrometry is one such possibility. we have developed a mass-spectrometry based method for the detection of the sars-cov- virus in nasopharynx epithelial swabs, based on the detection of the viral nucleocapsid n protein. the n protein of the sars-cov- virus, the most abundant protein in the virion, is the best candidate for mass-spectrometric detection of the infection, and ms-based detection of several peptides from the sars-cov- nucleoprotein has been reported earlier by the sinz group [ ]. our approach shows confident identification of the n protein in patient samples, even those with the lowest viral loads, using a much simpler preparation procedure. our main protocol consists of virus inactivation by heating and addition of isopropanol, and tryptic digestion of the proteins sedimented from the swabs, followed by ms analysis. a set of unique peptides, produced by proteolysis of the nucleocapsid phosphoprotein of sars-cov- , is detected. the obtained results can further be used to create fast parallel mass-spectrometric approaches for the detection of the virus in the nasopharyngeal mucosa, saliva, sputum and other physiological fluids.
severe acute respiratory syndrome coronavirus (sars-cov- ) is the causative agent of coronavirus disease (covid- ). the virus unit consists of a single-stranded rna with nucleoproteins enclosed within a capsid containing matrix proteins. the genome of the sars-cov- virus has been sequenced [ ] , and the polymerase chain reaction (pcr) method is currently the main diagnostic approach for covid- . however, it has shown limited accuracy and sensitivity, giving a high frequency of false positive and false negative results [ ] [ ] [ ] , and is relatively time consuming. the detection of viral proteins in body fluids by mass-spectrometry based methods could serve as a complementary diagnostic tool. in addition, alternative testing assays would allow us to better understand the biological activity of the virus and to suggest potential drug targets for patient treatment. sars-cov- proteins can be grouped into two major classes: structural and non-structural proteins. non-structural proteins are encoded by the virus, but are present only in the infected host cells and include the various enzymes and transcription factors necessary for virus replication. these proteins are not incorporated into the virion and are not as highly expressed, and are thus less likely to be detected. the four structural proteins incorporated in the virion particle of the sars-cov- virus are known as the s (spike), e (envelope), m (membrane), and n (nucleocapsid) proteins. the n protein encapsulates and protects the rna genome, and the s, e, and m proteins together create the viral envelope. from the data previously obtained for sars-cov [ ] , which has a structure similar to that of sars-cov- , it can be concluded that the number of copies of the e, m, and n proteins is much larger than that of s, but e and m are relatively short and tightly membrane-bound proteins, which makes them difficult to extract, detect, and identify.
thus, the main target for mass-spectrometry based detection of sars-cov- is the n protein. the possibility of developing such methods using gargle solution samples of covid- patients has been reported [ ] . we have performed a pilot study on nasopharynx epithelial swabs already collected from patients with covid- for rt-qpcr and showed confident identification of the n protein of the sars-cov- virus by mass-spectrometry with the use of a very basic sample preparation procedure. in addition, the unique, easily detectable peptides characteristic of the infection can further be used as targets for creating highly specific and sensitive prm based detection methods or fast parallel mass-spectrometric approaches based on immunoprecipitation and maldi analysis (imaldi), allowing very fast testing for the presence of the virus in nasopharyngeal mucosa, saliva, sputum and other physiological fluids. all procedures for collection, transport, and preparation of the samples were carried out according to the restrictions and protocols of sr . . - «safety procedures for work with microorganisms of the i -ii groups of pathogenicity (hazard)». swabs of the mucosa of the lower part of the nasopharynx and posterior wall of the oropharynx were used for the study. the sample was collected with a sterile velor swab with a plastic applicator. the swab was introduced along the outer wall of the nose to a depth of - cm to the lower shell and, after a rotational movement, was removed along the outer wall of the nose. after obtaining the material, the swab (up to the place of breakage) was placed into a sterile disposable tube with transport medium, and the end of the probe was broken off. oropharyngeal swab samples were taken with a dry sterile viscose swab by rotational movements along the surface of the tonsils, palatine arches and posterior wall of the oropharynx.
in order to increase the viral concentration, both the nasopharyngeal and oropharyngeal swabs were placed into a single tube, which was then sealed and marked. the samples were collected from patients with covid- infection confirmed by rt-qpcr. the negative control samples were collected from healthy individuals. until the virus has been inactivated and the outside of the tubes has been disinfected, all work must be carried out in accordance with the rules of biological safety level [ ] .

1. transfer . ml of the specimen to a clean polypropylene . - . ml eppendorf tube.
2. inactivate by heating at °c for minutes (using a water bath or a thermostated block heater) [ ]*.
3. after incubation, add . ml of % isopropanol to obtain a % solution and allow the sample to sit for minutes at room temperature [ ] .
4. treat the outer surfaces of the tubes with % isopropanol or % sodium hypochlorite: spill from a jet wash and let stand for minutes without getting wet.

after this procedure, the virus can be considered deactivated, and the surface of the tube safe to handle.

note: * it is important to note that during heat treatment all parts of the tube are heated, so that all of the virus, even on "inaccessible" surfaces, will be inactivated.

protocol : «standard proteomics preparation procedure» inactivated samples were lyophilized and resuspended in mm ammonium bicarbonate buffer, containing . % rapigest sf surfactant (waters). reduction was carried out by incubating for minutes at °c in mm dtt, followed by alkylation with mm iodoacetamide for minutes at room temperature in the dark, and tryptic digestion for hours at °c. the reaction was terminated by adding formic acid to a final concentration of . %.

inactivated samples were cooled down to - °c for hours and centrifuged at x g for minutes. the pellet was resuspended in mm ammonium bicarbonate buffer, containing . % rapigest sf surfactant (waters) and subjected to tryptic digestion for hours at °c.
the reaction was terminated by adding formic acid to a final concentration of . %.

the tryptic peptides were analyzed in duplicate on a nano-hplc dionex ultimate system (thermo fisher scientific, usa) coupled to a timstof pro (bruker daltonics, usa) mass spectrometer. the amount of sample loaded was ng per injection. hplc separation (injection volume µl) was carried out using a packed emitter column (c , cm x µm . µm) (ion optics, parkville, australia) [ ] by gradient elution. mobile phase a was . % formic acid in water; mobile phase b was . % formic acid in acetonitrile. lc separation was achieved at a flow of nl/min using a -min gradient from % to % of phase b. mass spectrometric measurements were carried out using the parallel accumulation serial fragmentation (pasef™) [ ] acquisition method. the esi source settings were the following: v capillary voltage, v end plate offset, . l/min of dry gas at a temperature of °c. the measurements were carried out over the m/z range from to th. the range of ion mobilities included values from . - . vs/cm ( /k ). the total cycle time was set at . sec and the number of pasef ms/ms scans was set to . for low sample amounts, the total cycle time was set to . sec. the obtained data were analyzed using peaks studio . and maxquant version . . . using the following parameters: parent mass error tolerance, ppm; fragment mass error tolerance, . da. due to the light denaturation conditions, the absence of reduction and alkylation steps in one of the sample preparation approaches and the short hydrolysis time, up to missed cleavages were allowed, but only peptides with both trypsin-specific ends were considered. oxidation of methionine and carbamidomethylation of cysteine residues were set as possible variable modifications and up to variable modifications per peptide were allowed. the search was carried out using the swissprot sars-cov- database with the human one set as the contamination database. fdr thresholds for all stages were set to .
( %) or lower. approximately - proteins were identified in each sample, among which the p dtc |ncap_sars nucleoprotein of severe acute respiratory syndrome coronavirus was detected ( figure ). depending on the viral content in the samples, the preparation protocol and the processing software used, - peptides of the n protein were reliably detected and identified in the covid- patient samples (tables and ) . the n protein, being the most abundant protein in the virion, is the best candidate for mass-spectrometric detection of the infection, and so its detection is expected. ms-based detection of several peptides from the sars-cov- nucleoprotein has also been reported earlier by the sinz group [ ] in gargle solution samples of covid- patients. we have performed a pilot study on nasopharynx epithelial swabs already collected from patients with covid- for rt-qpcr and showed confident identification of the n protein with the use of a very basic sample preparation procedure. moreover, the express procedure allowed better detection of the n protein than the more thorough one that is standardly used for proteomic analysis. this is probably due to the significantly lower amounts of peptides from the much more abundant host proteins, which require reduction, alkylation, deglycosylation and other preparative steps for good sequence coverage. also, since an untargeted lc-ms/ms approach with data-dependent acquisition was used to identify as many proteins as possible, it is expected that the use of targeted approaches aimed at monitoring the presence of this exact protein will result in significantly lower detection limits.
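the database search settings described above (missed cleavages allowed, but only peptides with two trypsin-specific ends) correspond to a simple in-silico digestion rule. the following is a minimal sketch of that rule, assuming the common convention that trypsin cleaves after k or r except when the next residue is p; the function name `tryptic_peptides` is hypothetical, the example sequence in the usage note is a toy string rather than the n protein, and the search engines named in the text implement this step internally in their own way.

```python
def tryptic_peptides(sequence, max_missed=0, min_len=1):
    """Hypothetical sketch: fully tryptic peptides with missed cleavages.

    Assumed rule: cleave C-terminal to K or R unless followed by P.
    Every returned peptide starts and ends at a cleavage site (or at a
    protein terminus), i.e. both ends are trypsin-specific.
    """
    # Positions where the chain is cut (index of the first residue
    # after each cut), plus the two protein termini.
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(cuts):
                break
            pep = sequence[cuts[start]:cuts[end]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides
```

for the toy sequence `MKAARPLLKGR`, no cut is made after the r that precedes p, so the fully cleaved peptides are `MK`, `AARPLLK` and `GR`; allowing one missed cleavage adds `MKAARPLLK` and `AARPLLKGR`. allowing more missed cleavages enlarges the search space, which is the trade-off behind permitting them only when digestion is expected to be incomplete.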
for example, the use of prm approaches on triple-quadrupole mass spectrometers, based on the peptides identified in this study, may allow detection limits below that of rt-qpcr, while application of immunoprecipitation methods with subsequent maldi analysis, or even maldi detection of viral proteins directly on the immobilized antibodies (imaldi), will allow processing times to be shortened to as little as hour. the observation of over host proteins also suggests that this approach could be used for detecting and studying the changes caused by the viral infection in the proteome of host cells, as well as the response of the organism to these conditions.

table . peptides from the p dtc |ncap_sars nucleoprotein identified via peaks studio in different samples. sample preparation protocol: samples - were collected from patients with covid- ( - were prepared by protocol ; - by protocol ); samples - are negative controls (healthy individuals). two lc ms/ms runs were performed for each sample. the numbers in the table correspond to spectral counts for each peptide.

detection of novel coronavirus ( -ncov) by real-time rt-pcr.
euro surveillance : bulletin europeen sur les maladies transmissibles
katrin zwirglmaier, christian drosten & clemens wendtner, virological assessment of hospitalized patients with covid-
risk of reactivation or reinfection of novel coronavirus (covid- )
mass spectrometric identification of sars-cov- proteins from gargle solution samples of covid- patients
a new coronavirus associated with human respiratory disease in china
severe acute respiratory syndrome coronavirus as an agent of emerging and reemerging infection
who laboratory biosafety guidance for novel coronavirus ( -ncov): interim recommendations
evaluation of inactivation methods for severe acute respiratory syndrome coronavirus in noncellular blood products
stability and inactivation of sars coronavirus
simplified high-throughput methods for deep proteome analysis on the timstof pro
online parallel accumulation-serial fragmentation (pasef) with a novel trapped ion mobility mass spectrometer

key: cord- - qgsq km authors: fignani, daniela; licata, giada; brusco, noemi; nigi, laura; grieco, giuseppina e.; marselli, lorella; overbergh, lut; gysemans, conny; colli, maikel l.; marchetti, piero; mathieu, chantal; eizirik, decio l.; sebastiani, guido; dotta, francesco title: sars-cov- receptor angiotensin i-converting enzyme type (ace ) is expressed in human pancreatic β-cells and in the human pancreas microvasculature date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qgsq km increasing evidence has demonstrated that expression of angiotensin i-converting enzyme type (ace ) is a necessary step for sars-cov- infection permissiveness. despite recent data highlighting an association between covid- and diabetes, a detailed analysis of the ace expression pattern in the human pancreas is still lacking.
here, we took advantage of the innodia network eunpod biobank collection to thoroughly analyse ace , at both the mrna and protein level, in multiple human pancreatic tissues and using several methodologies. using multiple reagents and antibodies, we showed that ace is expressed in human pancreatic islets, where it is preferentially expressed in subsets of insulin-producing β-cells. ace is also highly expressed in pancreas microvasculature pericytes and moderately expressed in rare scattered ductal cells. by using different ace antibodies, we showed that a recently described short-ace isoform is also prevalently expressed in human β-cells. finally, using rt-qpcr, rna-seq and high-content imaging screening analysis, we demonstrated that pro-inflammatory cytokines, but not palmitate, increase ace expression in the β-cell line endoc-βh and in primary human pancreatic islets. taken together, our data indicate a potential link between sars-cov- and diabetes through putative infection of pancreatic microvasculature and/or ductal cells and/or through direct β-cell virus tropism. the expression of molecules that act as receptors for viruses determines tissue-specific tropism. sars-coronavirus (sars-cov- ), which leads to the respiratory illness coronavirus disease (covid- ) , uses its surface envelope spike glycoprotein (s-protein) to interact with and gain access to host cells through the angiotensin-i converting enzyme- (ace ) receptor. as such, s-protein-ace binding is the key determinant for virus entry, propagation and transmissibility of covid- -related disease ( , ) . artificially induced ace de-novo expression in ace -negative cell lines is a necessary step for sars-cov and sars-cov- infection ( , ) . sars-cov- does not enter cells that do not express ace and does not use other coronavirus receptors, such as aminopeptidase n (apn) and dipeptidyl peptidase (dpp ), thus being fully dependent on ace presence in host cells ( ) .
additional host co-factors, such as transmembrane protease tmprss , cathepsin b/l and furin protease, have been shown to enhance efficiency of sars-cov- cell entry by processing the s-protein and eliciting membrane fusion and syncytia formation ( ) . the central role played by ace in sars-cov- infection has been further supported by evidence that sars-cov- infection is driven by ace expression level ( ) . sars-cov- mainly targets cells of the nasal, bronchial and lung epithelium, causing respiratory-related symptoms; however, growing evidence shows that other tissues can also be infected. several reports indicate a wide, although variable, distribution of ace expression patterns among different tissues ( - ), thus underlining a potential different virus infection susceptibility among cell types. the fact that covid- disease may lead to multiple organ failure ( , ) shows the crucial relevance for understanding the molecular mechanisms of host cell factors used by sars-cov- to infect their target tissues. recent studies showed that older adults and those with chronic medical conditions like heart and lung disease and/or diabetes mellitus are at the highest risk for complications from sars-cov- infection. of importance, a yet unresolved conundrum relies on the recently hypothesized bidirectional relationship between covid- and diabetes mellitus ( , ) . this concept is supported by reports in which impaired glycaemic control is associated with increased risk of severe covid- . indeed, elevated blood glucose concentrations and deterioration of glycaemic control may contribute to increased inflammatory response, to abnormalities in the coagulation system and to impairment of ventilatory function, thus leading to severe covid- disease and to a worse prognosis ( ) . interestingly, acute hyperglycaemia has been observed at admission in a substantial percentage of sars-cov- infected subjects, regardless of the past medical history of diabetes ( ) ( ) ( ) ( ) . 
the same observations were previously made in sars-cov- pneumonia during sars epidemic ( ) . a recently published case report described autoantibody-negative insulin-dependent diabetes onset in a young patient who was infected by sars-cov- seven weeks before diabetes symptoms occurrence ( ) . additional previous studies further support such observation ( , ) . this indicates the possibility of a link between sars-cov- infection and new-onset diabetes through potential direct infection of pancreatic islets or additional indirect mechanisms. indeed, an in-vitro infection model of human pluripotent stem cells derived β-cells exposed to sars-cov- ( ) showed permissiveness of these pre-β-cells to the virus. however, whether fully mature primary beta-cells or other cells of the human pancreas are indeed permissive to sars-cov- infection remains to be clarified. to address this question, we screened the ace expression pattern in human pancreata obtained from adult non-diabetic multiorgan donors and in the insulin-producing human β-cell line endoc-βh , using different methodologies, multiple reagents, and publicly available or in-house generated rna sequencing datasets. our data indicate that ace is expressed by pancreas microvasculature, by scattered ductal cells and by a subset of human β-cells. these different cell types are thus potentially prone to sars-cov- infection. we also identified a differential distribution of the two recently discovered ace isoforms ( , ) . exposure of endoc-βh human beta-cell line and human pancreatic islets to pro-inflammatory cytokines significantly increased ace expression. taken together, our data suggest a potential link between sars-cov- infection and new onset diabetes, which deserves further investigation based on long-term follow up of patients recovered from covid- disease. 
materials and methods human pancreatic sections analysed in this study were obtained from pancreata of brain-dead adult non-diabetic multiorgan donors within the european network for pancreatic organ donors with diabetes (eunpod), a project launched in the context of the innodia consortium (www.innodia.eu). whole pancreata were processed following standardized procedures at university of pisa. formalin fixed paraffin embedded (ffpe) pancreatic tissue sections and frozen oct pancreatic tissue sections were obtained from n= adult non-diabetic multiorgan donors, and from n= longstanding t d donor pancreas (table s ). in innodia eunpod network, pancreata not suitable for organ transplantation were obtained with informed written consent by organ donors' next-of -kin and processed with the approval of the local ethics committee of the pisa university. human pancreatic islets were obtained from n= non-diabetic multi-organ donors (table s ) . briefly, purified islets were prepared by intraductal collagenase solution injection and density gradient purification, as previously described ( ) . at the end of the isolation procedure, fresh human pancreatic islets preparations were resuspended in cmrl culture medium (cat. - - , thermofisher in order to evaluate the staining pattern of ace in human pancreatic tissues, we analyzed ffpe sections ( -μm thickness), prepared by using a microtome (cat. rm rts -leica microsystems, wetzlar, germany) and baked overnight at °c, from two different portions of pancreatic tissue for each multiorgan donor (listed in table s ). after deparaffinization and rehydration through decreasing alcohol series (xylene-i min, xylene-ii coverslip allowing them to dry. a negative control with only secondary antibody incubation (no primary antibody control sample) was also included in order to exclude potential background artifacts generated by the secondary antibody or the enzymatic detection reaction ( figure s a ). 
in order to further evaluate ace expression in pancreas sections, the same ace ihc protocol was applied to other primary antibodies anti-human ace : monoclonal rabbit anti-human ace (cat. all the primary antibodies anti-human ace , with respective secondary antibodies, were also used to perform a positive control staining in ffpe human lung sections ( -μm thickness), in order to double check the specificity of the primary antibodies. immunofluorescence staining for ace -insulin-glucagon and ace -cd counterstained with dapi and then mounted as described above. cultured endoc-βh cells were immunostained for ace and insulin as follows. cytokines-treated or untreated cells were fixed in % pfa for min, washed for min in . mol/l glycine, permeabilized in , % triton-x- for min and blocked in % bsa+ . % triton-x in pbs and were incubated with goat anti-mouse- : in % bsa in pbs x without ca + and mg + , or with goat anti-rabbit- diluted : in % bsa in pbs x without ca + and mg + . cells were counterstained with dapi and then mounted as described above. images were acquired using leica tcs sp confocal laser scanning microscope system (leica microsystems, wetzlar, germany). images were acquired as a single stack focal plane or in z-stack mode capturing multiple focal planes (n= ) for each identified islet or selected representative islets. sections were scanned and images acquired at × or × magnification. the same confocal microscope setting parameters were applied to all stained sections before image acquisition in order to uniformly collect detected signal related to each channel. colocalization analysis between ace and insulin and between ace and glucagon were performed using lasaf software (leica microsystems, wetzlar, germany). 
the region of interest (roi) was drawn to calculate the colocalization rate (which indicates the extent of colocalization between two different channels and reported as a percentage) as a ratio between the colocalization area and the image foreground. evaluation of the signal intensity of ace expression in human pancreatic islets of eunpod donors was performed using the lasaf software (www.leica-microsystem.com). this software calculates the ratio between intensity sum roi (which indicates the sum roi of the greyscale value of pixels within a region of interest) of ace channel and area roi (µm ) of human pancreatic islets. both in colocalization and intensity measurement analysis, a specific threshold was assigned based on the fluorescence background. the same threshold was maintained for all the images in all the cases analysed. cultured endoc-βh cells were immunostained for ace and insulin as reported above. cytokinestreated or untreated cells were fixed in % pfa for min, washed for min in . mol/l glycine, permeabilized in , % triton-x- for min and blocked in % bsa+ . % triton-x in pbs spot intensity" minus "spot background intensity") ( ). total rna of endoc-βh cells and of pancreatic human islets exposed or not to ifnα or to il- β + ifnγ for the indicated time points was obtained and prepared for rna sequencing as described ( ) ( ) ( ) ( ) . genes were considered significantly modified with a fdr < . . total proteins from endoc-βh cells were extracted using a lysis buffer ( mm chalfont, buckinghamshire, uk-rpn ). spectrometer (thermofisher scientific), equipped with electrospray (esi) ion source operating in positive ion mode. the instrument is coupled to an uhplc ultimate (thermofisher scientific). the chromatographic analysis was performed on a column acquity uplc waters csh c Å ( mm x mm, , µm, waters) using a linear gradient and the eluents were . % formic acid in water (phase a) and . % formic acid in acetonitrile (phase b). 
the flow rate was maintained at μl/min and column oven temperature at °c. the mass spectra were recorded in the mass to charge (m/z) range - at resolution k at m/z . the mass spectra were acquired using a "data dependent scan", able to acquire both the full mass spectra in high resolution and to "isolate and fragment" the ten ions with highest intensity present in the full mass spectrum. the raw data obtained were analyzed using the biopharma finder . software from thermofisher scientific. the elaboration process consisted in the comparison between the peak list obtained "in silico" considering the expected aminoacidic sequence of human ace protein (uniprot id: q byf ), trypsin as digestion enzyme and eventual modifications (carbamidomethylation, oxidation, etc.). ace proximal promoter sequence was retrieved from ensembl genome browser database the ncbi geo accession number for rna sequencing data reported in this paper are: gse , gse , gse . results were expressed as mean ± sd. statistical analyses were performed using graph pad prism software. comparisons between two groups were carried out using mann-whitney u test (for nonparametric data) or wilcoxon matched-pairs signed rank test. differences were considered significant with p values less than . . results to determine the ace protein expression pattern in human pancreatic tissue, we first performed a colorimetric immunohistochemistry analysis to detect ace on formalin-fixed paraffin embedded (ffpe) pancreatic sections obtained from seven (n= ) adult non-diabetic multiorgan donors collected by the innodia eunpod biobank (table s ). to specifically detect ace protein in such context, we initially used a previously validated monoclonal anti-human ace antibody (r&d mab ) ( ) which passed the validation criteria suggested by the international working group for antibody identified three main cell types positive for ace (figure , panel-a to -f ). 
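the two-group comparison named in the statistical analysis (mann-whitney u test) rests on a rank-sum statistic, which the following sketch computes. the analyses in this paper were run in graph pad prism; the function below is only an illustrative stand-in, its name is hypothetical, and it returns the u statistics without a p value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y
```

a u value of zero for one group means that every observation in that group ranks below every observation in the other, which is the most extreme separation the test can report.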
in the exocrine pancreas there was a marked and intense staining in a subset of vascular components (endothelial cells or pericytes) found in inter-acini septa (figure , panel-a and -b) . we also identified ace positive cells in the pancreatic ducts, even though only some scattered cells with a clear ace signal were detected (figure , panel-c and -d) . of interest, we observed a peculiar ace staining pattern in the endocrine pancreatic islets, showing a diffuse ace signal in a subset of cells within islet parenchyma ( figure , panel -e and -f; figure s ). however, the observed ace expression in islets was lower than the expression observed in microvasculature, the latter representing the main site for ace expression in the pancreas. in all cases analysed, including different blocks of the same case, a similar expression pattern of ace was observed, even though a certain degree of variability in terms of ace staining intensity within the islets was noted (figure s ). the highest signal of ace within the pancreas was observed in putative association with microvasculature (figure , panel-a and -b) . of note, in such context, a lobular staining pattern of microvasculature associated ace was evident, as demonstrated by the presence of positive cells in certain lobules and low or null expression in other lobules of the same pancreas section (figure a) . ace staining pattern in inter-acini septa suggested an overlap with cells associated to microvasculature, most likely endothelial cells. in order to explore such possibility, we performed a double immunofluorescent staining on pancreas ffpe sections for ace and the endothelial cell specific marker cd . the results showed that ace signal is associated, but not superimposed, to the cd -specific one, thus resembling the tight association of pericytes to endothelial cells and strongly suggesting the presence of ace in microvasculature pericytes ( figure b) . 
in order to confirm the ace cellular distribution observed in the pancreas using mab antibody, we tested two additional anti-ace antibodies from abcam: ab and ab (see resources table). according to the information provided by r&d and abcam, while mab and ab are reported to recognize the c-terminus portion of ace ( - aa and - aa, respectively), ab is specifically designed to react with a linear peptide located in the n-terminal ace protein sequence ( - aa). it has been recently reported that a aa short-ace isoform ( - aa of ace + aa at n-terminus) can be co-expressed alongside the full-length ace protein ( - aa) (figure a) ( , ); the short-ace lacks part of the n-terminal region targeted by the ab antibody. we observed ace islet-related signal using both mab (figure b) and ab (figure c), but the ab antibody did not show any positivity within the islet parenchyma (figure d), raising the possibility that the most prevalent ace isoform within pancreatic islets is the short one. of note, the three antibodies tested showed ace positivity in the microvasculature, thus suggesting a putative differential distribution of the two ace isoforms in the human pancreas. as a positive control for our immunohistochemistry method and the ace antibodies adopted, we evaluated ffpe lung tissue sections. as previously shown ( , , ), we observed scattered positive cells (putatively at pneumocytes) in the alveolar epithelium using both mab and ab (figure s a and s b). in contrast, we did not observe any signal using ab (figure s c). collectively, these results indicate that the same staining pattern was obtained by using out of antibodies that may recognize both ace isoforms (short-ace and long-ace ), thus confirming: (i) a high ace expression in microvasculature pericytes; (ii) rare scattered ace positive ductal cells; (iii) diffuse though weak ace positive staining in a subset of cells within human pancreatic islets.
using both mab and ab we observed ace signal in pancreatic islets which suggests that ace is expressed in endocrine cells. therefore, we sought to determine which pancreatic islet cell subset contributes to ace signal in such context. to do so, we performed a triple immunofluorescence analysis on the same set of ffpe pancreatic sections of non-diabetic multiorgan donors, aimed at detecting glucagon-positive α-cells, insulin-positive β-cells and ace signals (figure and figure s ). using r&d mab , ace preferentially overlapped with the insulin-positive β-cells (figure a, panel-a to -m), being mostly colocalized with insulin and low/not detectable in α-cells (figure a, panel-e, -f, -l , -m). such staining pattern was observed in all cases and was consistent between two different ffpe pancreas blocks of the same case ( table s ) . as expected, ace -only positive cells within or around pancreatic islets were also observed ( figure a, panel g) , potentially indicating the presence of ace -positive pericytes interspersed in the islet parenchyma or surrounding it. intriguingly, in the β-cells a major fraction of ace was observed in the cytoplasm and partially overlapped with insulin positive signal, while in a subsets of them only a minor fraction of the ace signal was attributable to several spots located on plasma membrane (figure a ; figure s a red arrow, and figure s b ). in microvasculature pericytes, the ace -positive signal was mainly observed in plasma membrane ( figure a , panel g; figure s green arrow) as previously described ( ) . there were also some ace -negative β-cells (figure s , white arrow) . colocalization rate analysis between ace -insulin and ace -glucagon, performed on a total of single pancreatic islets from seven different adult non-diabetic cases, confirmed the significant preferential expression of ace in β-cells compared to α-cells (colocalization rate: ace -ins . ± . % vs. ace -gcg . ± . 
% p< .± ) (figure b and figure s a). the comparison of colocalization rates between ace -insulin and ace -glucagon among all cases analysed confirmed the consistent preferential expression of ace in β-cells in comparison to α-cells (figure b). these results were confirmed when comparing different blocks of the same case (table s ). there was, however, heterogeneity in terms of the ace -insulin colocalization rate among different islets (ace -ins colocalization rate range: . - . %). such heterogeneity was also highlighted by the presence of rare ace -negative pancreatic islets in the same pancreas section. inter-islet heterogeneity was also clearly observed in the analysis of ace islet-related signal intensity (figure d). of note, some cases showed a lower ace -insulin mean colocalization rate and islet-ace signal intensity compared to the other ones (figure d), thus suggesting a high degree of heterogeneity among cases also in terms of islet-ace expression. no significant correlation between ace -islets signal intensity and age, bmi or cold-ischemia time was observed in our donor cohort (figure s b). using western blot (wb) and immunofluorescence analysis, we explored the expression of ace in the human β-cell line endoc-βh , a model of functional β-cells for diabetes research ( , ). to do so we used r&d mab , abcam ab and ab antibodies, as previously done in the above described pancreas immunohistochemistry experiments. in wb analysis, mab revealed the presence of a prevalent kda band corresponding to the short-ace isoform; use of this antibody showed the brightest signal in immunofluorescence staining among the three antibodies tested (figure a). abcam ab worked better in wb for the recognition of both ace isoforms, and indicated that the most prevalent ace isoform present in human β-cells is the short-ace ( kda, blue arrow) (figure b). in contrast, ab recognized only the long-ace isoform (> kda, red arrow) (figure c).
of note, the results obtained through wb analysis are in line with the immunofluorescence signal which revealed that ab only stained a minor fraction of endoc-βh and the obtained signal was mainly found on the plasma membrane ( figure c, panel-b) . conversely, ab and mab , which recognized both ace isoforms, showed a higher signal and a different subcellular localization respect to ab (figure a, b, panel-b) . mab ace -insulin double immunofluorescence staining confirmed the main punctuate and likely granular cytoplasmic ace signal which also partially overlapped with insulin-positive secretory granules ( figure d, panel -a to panel -h) . in addition, we also observed some spots putatively localised on the plasma membrane ( figure d ). of note, the specificity of ace mab signal observed in endoc-βh was orthogonally tested in comparison to hela cells which showed very low/absent ace mrna expression ( figure s a ) and resulted indeed negative for ace in immunofluorescence ( figure s b ). an additional evidence of the presence of ace in endoc-βh was provided by the shotgun proteomic analysis, aimed at detecting specific peptides derived from ace protein independently of the use of specific antibodies. by this independent approach, we observed the presence of both n-terminal and c-terminal unique ace -derived peptides (figure s ) , which further confirmed the presence of the ace- protein in human β-cells (supplementary file a, b) . to confirm the ace expression in human islets, we also evaluated its transcriptional activity both in collagenase-isolated and in laser-capture microdissected (lcm) human pancreatic islets, by measuring its mrna expression using taqman rt-real time pcr. in order to avoid detection of genomic dna, we used specific primers set generating an amplicon spanning the exons - junction of ace gene, thus uniquely identifying its mrna ( figure s a ). 
of note, the selected amplicon is shared between short and long-ace isoforms thus identifying total ace mrna. first, as a positive control we analysed total ace expression in rna extracted from a lung parenchyma biopsy tissue (figure a and e) . collagenase-isolated human pancreatic islets obtained from four different non-diabetic donors pancreata (table s ) showed ace mrna expression, as demonstrated by rt-real-time pcr raw cycle threshold (ct) values, reporting a ct range between - ( figure b and f) . since human pancreatic islets enzymatic isolation procedures may induce some changes in gene expression ( ), we microdissected human islets from frozen pancreatic tissues obtained from five non-diabetic multiorgan donors recruited within innodia eunpod network ( ) and evaluated ace mrna levels. the lcm procedure ( figure s b ) allowed us to extract high quality total rna ( figure s c ) from human pancreatic islets directly obtained from their native microenvironment, thus maintaining transcriptional architecture. ace mrna expression in lcm- human pancreatic islets showed a consistent expression among cases, similar to isolated islets, as shown by ace mrna raw ct and normalized values (figure c and g) . finally, we analysed total ace mrna expression in the human β-cell line endoc-βh . analysis of ace mrna expression in these cells demonstrated a similar expression level in comparison to human pancreatic islets (figure d and h) , with raw ct values ranging from to . in order to determine whether metabolic or inflammatory stress conditions modify pancreatic endocrine β-cell expression of ace , we exposed the human β-cell line endoc-βh and isolated human pancreatic islets to metabolic or inflammatory stressors and subsequently evaluated ace expression levels. exposure to palmitate ( mm palmitate for h) did not significantly modulate ace expression ( figure a ). 
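the relation between raw ct values and normalized expression values can be illustrated with the commonly used 2^-ΔCt transformation. the normalization actually applied to the data above is not specified in this section, so the reference-gene argument, the assumption of roughly 100% amplification efficiency, and the function names below are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to a reference gene (2^-dCt).

    Assumes both assays amplify with roughly 100% efficiency, so one
    cycle corresponds to a two-fold difference in template.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Treated-over-control fold change (2^-ddCt) from four Ct values."""
    ddct = ((ct_target_treated - ct_ref_treated)
            - (ct_target_control - ct_ref_control))
    return 2.0 ** (-ddct)
```

because ct is a cycle count, a lower normalized ct in treated cells translates into a fold change above one.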
in line with these observations, neither primary human islets exposed to palmitate ( ) nor human islets isolated from patients affected by type diabetes ( ) (figure a) . the same results were confirmed through immunofluorescence analysis aimed at measuring ace protein levels and subcellular localization in endoc-βh exposed or not to the same proinflammatory condition ( figure c). indeed, we observed a significant increase in ace mean intensity values upon cytokine treatment, confirming the upregulation of ace protein as well ( figure e) . these results were confirmed using an automated micro-confocal high content images screening system ( ) which allowed us to measure ace intensity in cytokine-treated vs not-treated endoc-βh cells ( figure d and figure f ). in support of the observed increase of ace upon pro-inflammatory stress, rna sequencing data analysis of endoc-βh cells exposed to il- β+ifnγ ( h) or to ifnα ( h) further confirmed such increase ( figure s a ) expression in endoc-βh cells treated with il- β+ifnγ or with ifnα, respectively. importantly, the same expression pattern was observed also in human pancreatic islets exposed to the same cytokines mix, as demonstrated by a . and . fold-increase in ace mrna expression following il- β+ifnγ or ifnα treatment respectively (p< . ) (table s and figure s b ). additionally, in order to strengthen such observations, we focussed on the ace gene promoter by analysing its upstream sequence (- bp), from ace transcriptional start site (tss). using two different transcription factors (tf) binding motifs databases, we found several binding sites for tfs related to cytokine signalling pathways such as stat or stat (figure s ) , thus reinforcing our results of an association between inflammation and ace expression, and confirming what previously reported ( ) . 
however, the analysis of ace expression distribution in ffpe pancreas sections from a t d longstanding donor (see table s ) did not show remarkable changes in the levels or distribution of ace in infiltrated islets ( figure s ). this is in line with rnaseq analysis of whole islets from two t d patients and four controls, which indicated similar ace expression (rpkm . - . in all cases) ( ) . analysis of additional recent-onset t d donors is needed to evaluate potential changes in ace expression in β-cells of highly infiltrated pancreatic islets. collectively these results demonstrate that ace is upregulated upon in vitro exposure to early and acute inflammatory, but not metabolic, stressors both in endoc-βh and in human pancreatic islets. in covid- disease, clinical complications involving the metabolic/endocrine system are frequently observed. these include critical alterations of glycaemic control in diabetic patients and new-onset hyperglycaemia at admission in individuals without previous clinical history of diabetes. although multiple causes have been indicated for covid- -related hyperglycaemia, a recently published case report described autoantibody-negative insulin-dependent diabetes onset in a young patient who was infected by sars-cov- seven weeks before occurrence of diabetes symptoms. this suggests a potential effect (direct or indirect) of the sars-cov- infection on the pancreatic islet insulin producing β-cells, but additional evidence is needed to allow solid conclusions. previous studies suggested that ace , the human host cell receptor for sars-cov- and sars-cov, which in other tissues has been shown to be a necessary component for infection permissiveness ( , ) is expressed in pancreatic tissue ( ) . however, an in-depth analysis aimed at evaluating ace expression pattern distribution in human pancreas is still lacking. 
here, we adopted multiple technologies and reagents to thoroughly analyse presence of ace , both at mrna and protein level, in order to evaluate its expression and localization in pancreatic tissue samples obtained from adult non-diabetic multiorgan donors from the innodia eunpod biobank collection, in enzymatic-and lcm-isolated primary adult human pancreatic islets and in human β-cell line endoc-βh . in human adult pancreas, we primarily observed ace expression in microvasculature component (endothelial cells-associated pericytes, both in endocrine and exocrine compartments). the expression of ace in the pancreatic microvasculature compartment was associated, but not superimposed, to the endothelial cells specific marker cd (or pecam- ). such staining pattern strongly suggests the presence of ace in pancreatic vascular pericytes which are tightly associated to endothelial cells. of interest, although the exocrine pancreas and the pancreatic islets are highly vascularized ( ) , only a subset of pancreatic pericytes cells markedly express ace . additionally, ace expression in microvascular compartment is surprisingly lobular, resembling the heterogeneous staining pattern of several inflammatory markers. additional observations on multiple pancreatic lobules are required to confirm such heterogeneous lobular patterning observed. the presence of ace in pancreatic pericytes is of sure interest. as a matter of fact, the vascular leakage and endothelitis were reported as a typical sign of sars-cov- infection in various organs, driving early local inflammation and the subsequent exacerbation of immune responses ( , ) . of note, multiple studies addressed the importance of an intact islet microvasculature in order to render pancreatic islets fully functional ( , ) . therefore, a pancreatic islet local vascular damage and inflammation due to sars-cov- direct infection of ace + pancreatic pericyte cells is a potential contributory factor for islet dysfunction. 
of note, two recent preprint manuscripts also indicated the presence of ace expression in pancreatic microvasculature ( , ) . our results indicate that ace is expressed also in the pancreatic islets and this expression is mostly located in β-cells as compared to α-cells. these results are in contrast with the two recent preprint manuscripts ( , ) which failed to observe ace expression in pancreatic islet endocrine cells. such discrepancies may be explained by a resulting sum of differences in primary antibodies sensitivity, different epitopes targeted, tissue sections preparation and pre-treatment, as well as immunodetection methodology sensitivity (immunohistochemistry vs. immunofluorescence). such variables may be of critical importance when detecting a low-expressed protein and may generate different results. furthermore, it should be taken into consideration that ace expression may vary greatly among individuals due to genetic or environmental factors ( ) . such intrinsic ace variability has been previously observed also in other cellular contexts, with some authors reporting high ace levels and others low or absent ( ). in our study, localization of ace in pancreatic islet endocrine cells was observed using two out of three different antibodies tested. surprisingly, we were not able to observe ace pancreatic islets positive signal using abcam monoclonal antibody ab , while signal was clearly evident in microvasculature and scattered ductal cells. however, these results are in line with kusmartseva et al ( ) . of note, ab monoclonal antibody specifically targets an epitope located in the nterminal domain of ace protein ( - aa) which is missing in the recently discovered truncated ace isoform (short-ace , - aa) ( , ) , thus being capable to recognize only the long-ace isoform. 
on the contrary, by using two different antibodies (mab and ab ), which can recognize the c-terminal domain shared between short- and long-ace , we obtained clear and concordant results, with identification of ace in pancreatic islet endocrine cells in addition to the microvasculature. as a positive control for the antibodies and our ihc procedure, mab and ab were also tested in ffpe lung tissue sections, showing overlapping results with previously published studies ( , , ). overall, these results suggest that the short-ace isoform may be the prevalent one expressed in β-cells. indeed, ace western blot analyses of the human β-cell line endoc-βh support this hypothesis by confirming that: (i) short-ace is prevalent over long-ace , the latter being present but expressed at low levels in β-cells; (ii) abcam ab cannot recognize the short-ace isoform. based on these results we suggest that in human β-cells both ace isoforms are present, with a predominance of the short-ace . although the presence of the short-ace isoform, alongside long-ace , is clearly evident in human β-cells, its functional role remains to be established. a previous study suggested the ability of the short-ace isoform to form homodimers or heterodimers with the long-ace isoform, thus potentially being able to modulate both the activity and the conformation of structural protein domains of the long-ace ( ). significantly, short-ace is missing the amino acid residues reported as fundamental for virus binding; however, due to the lack of detailed data regarding its function we cannot exclude that the short-ace may modulate sars-cov- susceptibility by interacting with long-ace or additional membrane proteins which may mediate the binding to sars-cov- spike protein [e.g. itga , as previously observed ( )]. of interest, nasal epithelium, reported to represent the main reservoir of sars-cov- ( ), showed higher levels of short-ace vs. long-ace ( ).
as described above, we report that in human pancreatic islets ace is enriched in insulin-producing ( ), which usually detects only , - , genes/cell, thus representing a minor fraction ( - %) of the total number of genes identified by bulk cells rna sequencing (> , genes). another recent study analyzed sars-cov- host receptors expression using two different methods (microarray and bulk rna-seq) and further confirmed that ace is indeed expressed in human pancreatic islets, and also demonstrated that ace expression was higher in sorted pancreatic β-cells relative to other endocrine cells ( ) . additional evidence of ace expression in endocrine pancreas and in β-cells derive also from mouse studies, which demonstrated ace expression in insulin-producing cells as well as its critical role in the regulation of β-cell phenotype and function ( ) ( ) ( ) ( ) . collectively, our in-situ expression data alongside with multiple published datasets and reports both in human and mouse show that ace is expressed in pancreatic islets, albeit at relatively low levels. ace expression in human β-cells may render these cells sensitive to sars-cov- entry ( ). such hypothesis is consistent with the known sensitivity of these cells to infection by several enterovirus serotypes. indeed, multiple evidence from our and other groups ( ) ( ) ( ) ( ) showed that enteroviruses are capable to competently infect β-cells but less so α-cells ( , ) ; these viruses are thus being considered as one of the potential triggering causes of type diabetes (t d) ( ) . of note, it has been previously demonstrated that human β-cells exclusively express virus receptor isoform coxsackie and adenovirus receptor-siv (car-siv), making them prone to infection by certain viruses ( ) . therefore, it would not be surprising that, under particular conditions, human β-cells could be directly infected by sars-cov- . 
importantly, a recent report showed that human pancreatic islets can be infected in vitro by sars-cov- ( ), supporting our observations of a specific tropism of the virus due to ace expression. notably, the subcellular localization of ace in β-cells recapitulates what was previously found for the virus receptor car-siv ( ). in our dataset, ace protein signal is mostly cytoplasmic/granular and partially overlaps with insulin granules. additional spots are also localized close to the plasma membrane, thus suggesting the existence of ace isoforms in multiple compartments within β-cells. such subcellular localization was observed both in vitro in endoc-βh and ex vivo in β-cells of primary human pancreatic tissues. although ace has been primarily observed on cell surfaces ( , ), some studies also described ace granular localization in other cell types of epithelial origin ( , ). a similar intracellular localization and putative trafficking was observed for the viral receptor car-siv ( ), also expected to be mainly localized on the cell membrane. against this background, we speculate that: (i) upon activation, ace can be internalized through the endosome/lysosome pathway ( ); (ii) in β-cells, ace trafficking to the cell membrane may be mediated by insulin granules; (iii) in β-cells the short-ace isoform could be mainly localized in the cytoplasm, while long-ace is in the plasma membrane; (iv) ace can be secreted and found in a soluble form or within exosomes ( , ). additional studies are needed to fully ascertain the subcellular localization and trafficking of ace isoforms in human β-cells, and to determine whether ace is indeed present in the secretome of β-cells. our data also indicate that in β-cells, total ace mrna expression is upregulated upon different proinflammatory conditions, but not following exposure to the metabolic stressor palmitate or to the t d environment.
importantly, these observations were obtained both in the human β-cell line endoc-βh and in human primary pancreatic islet cells, as shown by qrt-pcr, rna-seq datasets and immunofluorescence. as a matter of fact, ace has been previously indicated as an interferon-stimulated gene (isg) in a variety of cells ( ). of interest, a previous report suggested that the short-ace may represent the prevalent ace isoform upregulated upon inflammatory stress ( ). however, whether this is also the case in human β-cells should be examined in depth in future studies. although total ace expression is increased upon inflammatory stress, in-situ analysis of ace in infiltrated pancreatic islets derived from ffpe sections of a pancreas from a longstanding t d donor did not reveal significant changes of sars-cov receptor expression. however, we recognize that a high number of pancreatic islets with different degrees of inflammation, and pancreata from recent-onset t d donors showing highly infiltrated islets, are needed to adequately characterize ace expression in the early stages of the disease and to determine whether changes in ace expression contribute to: (i) the observed alteration of glycaemic control at admission in sars-cov- individuals without previous clinical history of diabetes; (ii) the increased severity of covid- in those subjects with previous inflammatory-based diseases. in conclusion, the presently described preferential expression of ace isoforms in human β-cells, the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
structure of the sars-cov- spike receptor-binding domain bound to the ace receptor cell entry mechanisms of sars-cov- exogenous ace expression allows refractory cell lines to support severe acute respiratory syndrome coronavirus replication functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses a pneumonia outbreak associated with a new coronavirus of probable bat origin sars-cov- reverse genetics reveals a variable infection gradient in the respiratory tract the protein expression profile of ace in human tissues heterogeneous expression of the sars-coronavirus- receptor ace in the human respiratory tract tissue distribution of ace protein, the functional receptor for sars coronavirus. a first step in understanding sars pathogenesis comorbidity and its impact on patients with covid- in china: a nationwide analysis clinical features of patients infected with the novel coronavirus (covid- new-onset diabetes in covid- diabetes and covid- : evidence, current status and unanswered research questions association of blood glucose control and outcomes in patients with covid- and preexisting type diabetes epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study outcomes in patients with hyperglycemia affected by covid- : can we do more on glycemic control? 
we thank dr. r. scharfmann (institut cochin, inserm, université de paris) who provided the human pancreatic β-cell line endoc-βh . we thank dr. laura salvini, dr. laura tinti and dr. vittoria

key: cord- -ouno jpl authors: mahajan, swapnil; kode, vasumathi; bhojak, keshav; magdalene, coral m.; lee, kayla; manoharan, malini; ramesh, athulya; sudheendra, hv; srivastava, ankita; sathian, rekha; khan, tahira; kumar, prasanna; chakraborty, papia; chaudhuri, amitabha title: immunodominant t-cell epitopes from the sars-cov- spike antigen reveal robust pre-existing t-cell immunity in unexposed individuals date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ouno jpl

the covid- pandemic has revealed a range of disease phenotypes in infected patients with asymptomatic, mild or severe clinical outcomes, but the mechanisms that determine such variable outcomes remain unresolved. in this study, we identified immunodominant cd t-cell epitopes in the rbd and the non-rbd domain of the spike antigen using a novel tcr-binding algorithm.
a selected pool of predicted epitopes induced robust t-cell activation in unexposed donors, demonstrating pre-existing cd and cd t-cell immunity to sars-cov- antigen. the t-cell reactivity to the predicted epitopes was higher than the reactivity to the spike-s and s peptide pools containing and peptides, both in unexposed donors and in convalescent patients, suggesting that strong t-cell epitopes are likely to be missed when larger peptide pools are used in assays. a key finding of our study is that pre-existing t-cell immunity to sars-cov- is contributed by tcrs that recognize common viral antigens such as influenza and cmv, even though the viral epitopes lack sequence identity to the sars-cov- epitopes. this finding is in contrast to multiple published studies in which pre-existing t-cell immunity is suggested to arise from shared epitopes between sars-cov- and other common cold-causing coronaviruses. whether the presence of pre-existing t-cell immunity provides protection against covid- or contributes to a severe disease phenotype remains to be determined in a larger cohort. however, our findings raise the expectation that a significant majority of the global population is likely to have sars-cov- reactive t-cells because of prior exposure to flu and cmv viruses, in addition to common cold-causing coronaviruses.

uncovering the immunological responses to covid- infection will help in designing and developing next-generation therapies and in managing the treatment of critical covid- patients. many host factors associated with mild or severe disease symptoms have been reported. for example, leukopenia, exhausted cd t-cells, higher levels of th cytokines in serum, a high titer of neutralizing antibodies, a blunted interferon response, dysregulation of the myeloid cell compartment, activated nk cells, and the size of the naïve t-cell compartment are associated with critical illness ( ) ( ) ( ) ( ) .
this wide range of variable factors shares a common immunological underpinning: a systemic dysregulation in immune homeostasis due to the failure of the host immune system to clear the virus during the early stages of the infection ( ) . many studies have shown that clearance of respiratory viruses requires cd t-cell immunity ( ) . a delay in the activation of cd t-cells and a lack of early ifn- production lead to an increase in viral load. this triggers overactivation of the innate and the adaptive arms of the immune system and a loss of immune homeostasis, resulting in a severe disease phenotype, including death. therefore, an early wave of strong cd t-cell response may delay viral titer build-up, allowing rapid clearance of the virus by the immune system without perturbing immune homeostasis.

healthy humans not exposed to covid- show pre-existing cd and cd t-cell immunity to sars-cov- antigens ( ) ( ) ( ) . this pre-existing cd and cd t-cell immunity was detected against structural and non-structural sars-cov- proteins using overlapping -mer peptide pools. the pool of sars-cov- -reactive t-cells in unexposed individuals is thought to arise from exposure to coronaviruses that cause the common cold ( , ) . whether pre-existing immunity provides any protection against sars-cov- infection, or contributes to a faster recovery from infection, remains speculative. it is also unclear whether pre-existing immunity involving either cd or cd t-cells, or both, is required for maximal protection. identifying robust pre-existing immunity against sars-cov- in the healthy population can be used as a measure to assess the mode of recovery and also viral spread in the global population.

in this study, we identified strong cd t-cell-activating epitopes from the sars-cov- spike protein by a combination of epitope prediction and t-cell activation assays in healthy donors unexposed to sars-cov- . the rationale for identifying epitopes that favor cd t-cell activation was two-fold.
first, robust cd t-cell activating epitopes can be formulated as second-generation vaccines for short and long-term protection against viral infection. second, detection of preexisting immunity in healthy donors using epitopes that favor cd t-cell activation may provide a framework to understand the complex immune responses observed in clinical settings. it may also shed light on the differences in morbidity and mortality in different population groups across the globe. we developed a proprietary algorithm oncopeptvac to predict cd t-cell activating epitopes across the sars-cov- proteome. oncopeptvac predicts binding of the hla-peptide complex to the t-cell receptor (tcr). we selected a cocktail of eleven -mer peptides with a broad class-i and class-ii coverage and favorable tcr engagement predicted by the algorithm. the cocktail of peptides was tested for t-cell activation in healthy donors from the usa and india unexposed to covid- . we observed higher cd t-cell activation by the -peptide pool compared to the overlapping -mer peptide pools from the spike-s and s proteins. homology analysis of the selected peptides with other coronavirus spike proteins indicated a lack of significant amino acid identity with any of the peptides, suggesting engagement of one or more peptides in the pool to cross-reactive tcrs from other viruses, not particularly from a coronavirus. bulk and single-cell tcr analysis revealed expanded clonotypes recognizing epitopes from cmv, influenza-a, and other viruses to which most of us are exposed. taken together, our findings support that strong pre-existing cd t-cell immunity in unexposed donors is contributed by cross-reactive tcrs from other viruses. significantly, we discovered multiple immunodominant epitopes in our predicted pool of peptides that favored cd t-cell activation. 
finally, we show that our cocktail of -peptides induced a robust immune response in convalescent patients, demonstrating that these peptides are recognized by infected patients. taken together, our study uncovered strong pre-existing cd t-cell immunity against sars-cov- using a small set of epitopes that engaged cross-reactive tcrs recognizing epitopes from other viruses, not necessarily common cold viruses belonging to the coronavirus family as hypothesized by other studies. additionally, our findings provide a basis for the generation of herd immunity against covid- without prior sars-cov- infection.

a deep cnn model, oncopeptvac, was implemented to predict the immunogenicity of a peptide based only on the peptide and hla sequences. a total of , immunogenic and nonimmunogenic peptide-hla pairs were obtained from the iedb ( ) . the blosum encoding was used to represent the peptide and hla molecules. the blosum substitution scores encode evolutionary and physicochemical properties of the amino acids ( ) . in addition, hydrophobicity indices and predicted hla binding scores were used to represent the peptide and hla sequences. oncopeptvac used a cnn model with multiple d convolutional layers combined with max-pooling, and several model versions were compared to confirm the additive effect of the different input features on model performance. all the model versions were trained using -fold cross-validation. the auc of the final model was . on a blind test dataset ( figure a ). the prediction algorithm showed a sensitivity of . and a specificity of . at a score cut-off of . ( figure b) . by increasing the cut-off score to . , the specificity could be further increased to . with a concomitant loss in sensitivity. oncopeptvac reduced the number of false positives significantly compared to the hla-binding rank (compare figures b and c), reducing the number of epitopes that needed to be screened in a t-cell activation assay by % to identify true immunogenic epitopes.
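The trade-off between sensitivity and specificity at a given score cut-off, as described above, can be illustrated with a small sketch. This is not the oncopeptvac evaluation code: the function name, the toy scores, and the cut-off values below are all hypothetical and chosen only to show how raising the cut-off raises specificity at the cost of sensitivity.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= cutoff are called positive.

    labels: 1 = immunogenic, 0 = non-immunogenic (toy convention for this sketch).
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical, not from the paper).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]

low_cutoff = sensitivity_specificity(toy_scores, toy_labels, 0.5)
high_cutoff = sensitivity_specificity(toy_scores, toy_labels, 0.75)
```

On this toy data the higher cut-off removes the single false positive, so specificity rises while sensitivity does not improve, which mirrors the direction of the effect reported in the text.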
for example, to identify % ( out of ) of the immunogenic peptide-hla pairs present in the blind test dataset, top peptide-hla pairs from oncopeptvac prediction needed to be screened, compared to top peptide-hla pairs predicted by netmhcpan- . . the prediction algorithm was applied to the sars-cov- proteome and screened against class-i hlas covering over % of the world population. a schematic of the in silico screening approach is shown in figure d . briefly, - -mer peptides from the sars-cov- proteome were screened for tcr-binding against hla, and peptides with oncopeptvac score > . were analyzed for class-i hla binding. peptide-hla pairs with a high predicted binding affinity (< percentile rank) were selected, their length extended to -mer and screened for class-ii hla binding. peptides with favorable tcr binding and class-i/ii hla binding features were selected for further validation. the number of predicted immunogenic epitopes from sars-cov- protein-coding genes is shown in figure e . the distribution of oncopeptvac scores against different class-i hla genes indicates a higher number of favorable tcr-binding peptides for hla-b and c compared to hla-a ( figure f ). natural biases in hla-restrictions have been reported for immunogenic hiv epitopes ( ) . we performed t-cell activation assay using the selected epitopes from the sars-cov- spike antigen in unexposed donors. the -mer peptides are distributed across different segments of the rbd and the non-rbd regions of the spike antigens and few peptides carry ace receptor binding sites (figure a and table- table-s ). activation of t-cells using the cocktail of peptides (all-peptide) was compared to the responses from spike-s ( peptides) and s ( peptides) pools (see methods for assay details). in a h assay, % of the unexposed donors responded strongly to the predicted peptide mix by inducing intracellular ifn + in both cd and cd t-cells. the responses to spike-s and s peptide pools were weaker ( figure b) . 
a strong h ifn- response suggested recall to pre-existing antigen-experienced cd t-cells. the peptide-mix also induced a strong - bb response in cd t-cells but not in cd t-cells ( figure b ). both ifn- and - bb levels increased in cd t-cells at day- by the all-peptide mix compared to the spike peptide pools ( figure c ). we observed higher expression of ifn- and - bb in the cd t-cells by the spike peptide pools at day- suggesting de novo activation ( figure c ). although the use of -mer peptides is expected to skew the response towards cd t-cells, we observed a stronger cd t-cell response to the peptide-mix suggesting that the -mer peptide though added exogenously, was processed and presented by class-i hlas efficiently. taken together, the results demonstrate that the use of oncopeptvac identified potent cd t-cell epitopes in the spike antigen that could not have been detected by using large overlapping peptide pools used in t-cell activation assays. next, we tested individual peptides from the mix to assess their contribution to t-cell activation. the magnitude and kinetics of ifn- induction in cd t-cells were variable in different donors ( figure d ). in most donors, the maximal response was detected by -days, but in donors d and d the response peaked at h and declined ( figure d ). we tested the effect of individual peptides in multiple donors as indicated by the arrows to determine their immunogenicity. as shown in figure multiple studies have reported pre-existing t-cell immunity in unexposed donors using spike peptide pools and attributed the response to t-cells recognizing epitopes from common coldcausing coronaviruses to which a large section of the global population is exposed ( , , ) . homology analysis of the selected epitopes (see methods) indicated that out of the peptides share > % sequence identity with sars-cov and only (peptide- ) out of the peptides has over % identity with multiple coronaviruses ( table ) . 
peptide- is in the s domain of the spike protein and showed ≥ % cd t-cell response at h in out of donors tested (d , figure e ). however, peptides , , , and lacking significant identity to other coronaviruses ( table ) showed ≥ % cd t-cell activation at h in at least one donor out of ( figure e -f and figure s ). peptide- induced high cd t-cell activation at h in two donors (d and d ( figure f and figure s ). taken together, the data suggest that pre-existing t-cell immunity to these peptides may be derived from cross-reactive tcrs recognizing other viruses. to identify cdr s amplified by individual peptides, or the all-peptide-mix, bulk tcr analysis was performed on antigen-stimulated pbmcs from donors d and d (see methods). both donors showed a robust ifn- response to pep- , and the pep-mix, but not to pep- ( figure s ). diversity and clonal amplification of unique public and private cdr s were analyzed at three different time points ( figure a -b). both the donors showed clonal expansion of multiple public cdr s recognizing hcmv, human herpes virus- (hhv- ), and influenza-a peptides when stimulated with pep- and all-peptide mix, but not with pep- (table s ). hcmv and hhv- cdr s were expanded in donor d ( figure a , top panel), whereas d showed expansion of hcmv and influenza-a cdr s ( figure a , bottom panel). significantly, these cdr s were not amplified by spike-s and s peptide pools or by pep- , the latter failed to activate t-cells in these donors. further, cdr s recognizing hcmv peptide nlvpmvatv in donors and were different, suggesting that the same antigen engages multiple crossreactive tcrs in different donors. next, we analyzed private cdr s in these two donors to identify novel sars-cov- antigen-specific cdr ( figure b ). donor showed a lack of specific amplification of private cdr s suggesting that the robust cd t-cell response detected in this donor may be contributed by the amplified public cdr s ( figure b , top panel). 
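Clonal amplification of a cdr sequence between two time points, as analyzed in the bulk tcr data, can be expressed as a fold change in clonotype frequency. The following is a minimal sketch under stated assumptions: each read is represented by its cdr amino-acid string, the pseudo-frequency used for clonotypes absent at the earlier time point is an arbitrary choice, and the sequences are made-up placeholders rather than clonotypes from the study.

```python
from collections import Counter
from typing import Dict, Sequence


def clonotype_frequencies(cdr3_reads: Sequence[str]) -> Dict[str, float]:
    """Fraction of reads carrying each CDR3 sequence."""
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    return {cdr3: n / total for cdr3, n in counts.items()}


def fold_expansion(
    before: Sequence[str], after: Sequence[str], pseudo_freq: float = 1e-6
) -> Dict[str, float]:
    """Fold change in frequency (after / before) for every clonotype seen after
    stimulation. Clonotypes missing from `before` use `pseudo_freq` as the
    denominator (an assumption of this sketch) to avoid division by zero.
    """
    freq_before = clonotype_frequencies(before)
    freq_after = clonotype_frequencies(after)
    return {
        cdr3: freq / max(freq_before.get(cdr3, 0.0), pseudo_freq)
        for cdr3, freq in freq_after.items()
    }


# Placeholder CDR3 strings (hypothetical).
reads_early = ["CASSA"] * 1 + ["CASSB"] * 3
reads_late = ["CASSA"] * 3 + ["CASSB"] * 1
expansion = fold_expansion(reads_early, reads_late)
```

Here the first placeholder clonotype rises from a quarter to three quarters of the reads, a three-fold expansion, while the second contracts.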
in contrast, two private cdr s were clonally amplified by pep- and all-peptide mix in d suggesting that the t-cell response is derived from both public and non-public tcrs in this donor ( figure b , bottom panel). a list of clonally amplified public and private cdr s detected in the two donors is given in table s . to further investigate the tcr repertoire profile of donors and we analyzed the vdj gene usage in the bulk cdr data. in d , two v segments trvb and trvb and a j gene trbj - were significantly over-represented in pep and peptide-mix treated samples ( figure c ), whereas in d trbv - and trbj - genes were amplified (fig. d ). to characterize the phenotype and functional state of activated t-cells and reveal differences between the different treatments, we performed single-cell sequencing on a x platform. single-cell transcriptomics and tcr data obtained from - cells identified - unique transcripts (see methods). using graph-based clustering of uniform manifold approximation and projection (umap) we captured transcriptomes of distinct cell types ( figure a and table s ). our assay method is enriched for the growth and proliferation of t-cells causing depletion of other immune cell types present in pbmc in a -day culture. three cell types, cd , /, and nk-t were detected in all the samples. compared to dmso and pep- in which the cd t-cell fraction was ~ %, in spike-s and spike-s the cd t-cell fraction was % and % respectively. conversely, the cd cluster was expanded in spike-s ( %) and s ( %) compared to dmso ( %) suggesting that the spike peptide pools engaged cd tcells (table s ). the single cell transcriptomic analysis further revealed that pep- induced effector phenotype in the cd t-cell cluster by the expression of activation markers ifn-, - bb ( figure b ) tnfrsf , fas, and tigit (compare figure s d with s a-c). the top pep- -expanded clonotypes were cd + /sellsuggesting transition towards effector memory phenotype ( figure b ). 
spike-s and s peptides induced cd + /sell -t-cells in the cd clones and respectively ( figure b ). single-cell data revealed amplification of trbv ( %) and trbj - ( %) in pep- stimulated t-cells ( figure c ) confirming the results from the bulk tcr analysis. next, we mapped cdr - to specific clones from each treatment ( figure d ). the dmso, spike-s , and s -treated samples shared many clonotypes among themselves in the same frequency range suggesting weak antigen-induced activation and proliferation of t-cells. next, we analyzed the clonal composition and phenotype of t-cells to investigate the dynamics of antigen-specific t-cell response in the treated samples. we analyzed the top- clones for their phenotype by the expression of marker genes ( figure s ). in all samples, including dmso, cd t-cell clonotypes were more frequent ( figure s a-d) . as expected, the cd t-cell compartment was expanded in spike-s and s -treated samples ( and % of all clonotypes respectively) compared to dmso ( . %) ( figure s b-c) . the cd t-cells expressed tnfsf (ox- ) suggesting activation, although they failed to express ifn- ( figure s b-c) . a few expanded cd clones in the spike-s and s treated samples showed a high expression of il rb suggesting polarization towards a th phenotype ( figure s b-c) . in the pep- treated sample, almost all clonotypes in the top- were cd t-cells. the highly expanded clones expressed multiple t-cell activation markers ( figure s d ). interestingly, in addition to the activation markers, these cells expressed higher levels of il ra (cd ) suggesting differentiation towards an effector memory phenotype ( figure s d ). cd expression was low in the cd t-cell compartment in other samples. taken together, the results of the transcriptomic analysis highlighted that the strong immunogenic cd t-cell epitope identified in this study preferentially engaged cd t-cells pushing them towards an effector and effector memory phenotype. 
the spike-s and s peptide pools on the other hand engaged both cd and cd t-cells and modulated the cd t-cells towards a th phenotype. to assess whether the predicted epitopes are recognized by covid- infected patient t-cells, we tested the all-peptide mix on seven asymptomatic, five with mild-moderate symptoms, and five severe convalescent patients requiring icu admission (table s ) and analyzed their cd and cd t-cell response after h. the patients experiencing mild to moderate symptoms exhibited higher induction of ifn- in cd t-cells ( figure a ). the ifn- induction in cd tcells was higher in the presence of the spike-s peptide pool compared to the all-peptide mix ( figure c ). spike-s peptide pool induced stronger - bb induction in cd and cd t-cells compared to the all-peptide mix ( figure b-d) . taken together, our results confirm that the epitopes prioritized by the algorithm were recognized by covid- infected patient t-cells, and the ifn- response induced by the all-peptide mix was skewed towards cd t-cells. spike peptide pools favored activation of the cd compartment in these convalescent subjects in line with our assay results and single cell transcriptome analysis showing a preferential expansion of the cd compartment by these peptide pools. a wide array of respiratory viruses induces severe pneumonia, bronchitis, and even death following infection. despite the immense clinical burden, there is a lack of efficacious vaccines with long-term therapeutic benefit. most current vaccination strategies employ the generation of broadly neutralizing antibodies, however, the mucosal antibody response to many respiratory viruses is short-lived and declines with age. in contrast, several studies on respiratory viruses have shown the presence of robust virus-specific cd -t cell responses which has been shown to last for decades. 
therefore, vaccine designs for emerging respiratory viruses need consideration and rational inclusion of cd epitopes to confer long term resistance ( ) . this study demonstrates the existence of strong cd t-cell activating epitopes in the spike antigen and uncovers robust pre-existing cd t-cell immunity in unexposed donors. several studies have reported pre-existing t-cell immunity in unexposed donors and attributed these to infections by common cold-causing human coronaviruses ( , , ) . other studies, on the contrary, have reported a lack of pre-existing t-cell immunity in unexposed donors ( , ) . to identify strong cd t-cell epitopes, we developed a novel tcr-binding algorithm oncopeptvac that selects epitopes favorable for tcr-binding. in all epitope screening methods, epitope selection is primarily based on class-i and ii hla-binding affinity, which predicts surface presentation of antigen in complex with hla ( ) , but not the interaction of the peptide-hla complex with a tcr ( ) . by incorporating features that predict tcr-binding of a peptide, our algorithm oncopeptvac successfully identified many cd t-cell epitopes in a small pool of peptides used in t-cell activation assays. the tcr-binding algorithm is especially suitable for reducing the number of epitopes that need to be screened to identify robust cd t-cell activating epitopes. for example, our algorithm predicted peptides from all sars-cov- proteins excluding orf , which is a much smaller number compared to the number of peptides screened in some of the published studies to identify pre-existing t-cell immunity ( , , , ) . a second factor that may have resulted in the identification of strong cd t-cell activating epitopes is the avoidance of epitope competition. using a large pool of peptides to screen for t-cell responses ensures broad coverage of all hlas, but has the disadvantage that strong immunogenic epitopes are not detected efficiently. 
some of the peptides predicted by our algorithm produced > % cd t-cell response in healthy donors by -days. in the same donors, the response from spike-s and s peptide pools containing and peptides respectively was much weaker. a similar finding was reported by mateus et al., where deconvolution of peptide pools identified a single peptide that evoked a -fold higher t-cell response compared to the pool ( ) . it is also important to note that the strategy of using -mer peptides with overlapping or -mer sequences may not identify immunodominant epitopes. for example, out of the -peptides tested in our assay, only three peptides were present in the spike peptide pools.

by using a smaller pool of immunodominant cd t-cell epitopes, our study uncovered a fundamental feature of the host immune response to sars-cov- : the existence of cross-reactive tcrs to viruses such as cmv and influenza that recognize sars-cov- antigen. an early and robust t cell response is driven by the size and the diversity of the tcr repertoire to a given antigen ( ) . the pep- epitope, derived from the rbd domain of the sars-cov- spike antigen and lacking homology to other coronaviruses, expanded multiple public cdr s recognizing the immunodominant cmv epitope nlvpmvatv and the influenza epitope gilgfvftl. further, tcr analysis demonstrated that although a donor's tcr repertoire contains many cmv-epitope-specific cdr s, only a few are expanded in the presence of the sars-cov- peptide. for example, the tcr repertoire of donor d has cmv and influenza cdr s, of which three and one were expanded respectively. similarly, donor carries nlvpmvatv-specific cdr s, of which only two expanded. these findings suggest specificity in the interaction between cross-reactive cdr s and specific peptides from sars-cov- . significantly, the expanded cdr s in the two donors d and d were different, even though they recognized the same cmv peptides.
it has been documented that conserved features within cdr - allow recognition of the same phla complex within a group of diverse cdr s ( ) . a robust antigen-specific t cell response utilizes a broad range of tcrs and for many viral infections, tcr usage diversity has been positively linked to disease outcomes ( ) ( ) ( ) . a diverse repertoire not only allows increased structural capacity to recognize variant epitopes ( ) but increases the chances that high-affinity tcrs may be present in an individual ( ) . a recent large-scale study mapped a few immunogenic regions in the sars-cov- proteome responsible for expanding many unique tcrs in a large number of convalescent covid- patients and unexposed healthy donors ( ) . immunodominant epitopes reported in our study cover some of the "hotspot" regions identified by this large-scale study ( ) . efforts to identify cross-reactive tcrs recognizing different antigens from diverse infectious organisms can lead to the development of broad-spectrum tcr-based therapeutics against infectious diseases. we compared the all-peptide mix with spike- and peptide pools on a small number of convalescent patients and identified a slightly higher cd /ifn- response by the peptide mix in mild to moderate disease, compared to patients with asymptomatic or severe disease. many studies have indicated that short and long-term protection against respiratory viruses requires cd t-cell immunity and antibody response alone is not sufficient ( , ) . in line with this observation, low plasma titers of neutralizing antibodies are detected in a large fraction of convalescent patients suggesting additional immune protective mechanisms, besides viral neutralization ( ) . on the contrary, high levels of neutralizing antibodies were associated with severe disease and icu visits in many covid- patients suggesting an imbalanced cd t-cell response is not optimal for protection ( ) ( ) ( ) . 
it has been challenging to demonstrate a strong cd t-cell response in covid- patients in many studies. however, our findings, along with a recent report from peng et al., showed that a higher cd t-cell response correlated with mild disease compared to patients with severe disease ( ) . in conclusion, our study demonstrates strong pre-existing cd t-cell immunity in many

data on , t cell assays were collected from the iedb ( ). there were , cd t cell assays in total, of which , had humans as the host. the cd t cell assays with hla allele names and peptide lengths ranging - residues were further selected. hla supertypes were replaced with their representative allele names; for example, hla-a was replaced with hla-a* : . immunogenic peptide-hla pairs tested on at least three donors with % response frequency, or tested on at least donors with a response frequency greater than %, were labelled as positive. non-immunogenic peptide-hla pairs tested on at least donors with % response frequency were labelled as negative. the final dataset contained , unique peptide-hla pairs, which were split randomly into % training and % test datasets. the training dataset had immunogenic and , nonimmunogenic peptide-hla pairs. the test dataset had and , immunogenic and nonimmunogenic peptide-hla pairs, respectively.

a deep convolutional neural network (cnn) was implemented to predict the immunogenicity of the peptide-hla pair (provisional patent pending). the hla alleles were represented as pseudosequences described as amino acid residues ( ) . the peptide and hla pseudo-sequences were converted to two-dimensional ( d) feature matrices of x and x dimensions, respectively, using blosum encoding ( ) . peptide sequences shorter than residues were padded with zeroes to maintain the x feature matrix dimensions. peptide sequences were also encoded into a x feature vector using the kyte-doolittle hydrophobicity scale ( ).
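The hydrophobicity encoding with zero padding described above can be sketched as follows. The scale values are the standard Kyte-Doolittle hydropathy indices; the function name and the `max_len` parameter are hypothetical, since the fixed vector length used by the authors is not recoverable from this text.

```python
from typing import List

# Standard Kyte-Doolittle hydropathy index for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_hydrophobicity(peptide: str, max_len: int) -> List[float]:
    """Encode a peptide as a fixed-length vector of hydropathy values.

    Peptides shorter than max_len are right-padded with zeros, mirroring the
    zero padding described in the text. max_len is an assumed parameter.
    A non-standard residue raises KeyError.
    """
    peptide = peptide.upper()
    if len(peptide) > max_len:
        raise ValueError("peptide is longer than max_len")
    values = [KYTE_DOOLITTLE[aa] for aa in peptide]
    return values + [0.0] * (max_len - len(peptide))


# Toy tripeptide padded to length 5 (both choices are illustrative).
toy_vector = encode_hydrophobicity("AIL", 5)
```

One limitation of zero padding on this scale is that zero is also close to the hydropathy of glycine, so a model cannot distinguish padding from a mildly hydrophilic residue by value alone.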
the hla binding percentile ranks and scores for each peptide-hla pair were obtained using netmhcpan- . ( ) and were appended to the kyte-doolittle hydrophobicity scale feature vector. the peptide and hla feature vectors were each processed by multiple d convolutional filters of two different sizes, followed serially by max-pooling layers of the same sizes. the peptide and hla max-pooled layers were concatenated and processed again with multiple d convolutional filters followed by max-pooling layers. the max-pooled layers were flattened, concatenated and then connected to a dense layer. the output of the peptide and hla dense layer was concatenated with the hydrophobicity and hla binding feature vector and again connected to two dense layers. the output of the final dense layer was connected to the output neuron. the cnn was trained using -fold cross-validation with the training dataset exclusively. the test dataset was used solely for model performance evaluation. model performance was evaluated using the auc (area under the roc curve), where an auc of 0.5 represents random predictions and an auc of 1.0 represents perfect predictions. the tensorflow library for the python programming language was used to implement the models.

full-length shortlisted peptide sequences of sars-cov- were searched with blast against the spike proteins of other coronaviruses, oc , nl , e and hku . an e-value cutoff of . and a minimum cutoff of amino acid residues were used to identify homologous peptides.

unexposed donor pbmcs were obtained from the us and india for this study. pbmcs from the us were collected between - and purchased from stemcell technologies, canada. pbmcs from india were collected between - . covid- convalescent patient blood from the us was purchased from ppa research (usa) and the indian samples were collected through hospitals. all participants in this study provided informed consent in accordance with protocols approved by the institutional review board.
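The AUC used for model evaluation has a simple rank-based interpretation: it is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It is an illustration only, not the authors' evaluation code; the function name and toy data are hypothetical, and the quadratic loop is meant for clarity rather than large datasets.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 = positive, 0 = negative. Ties between a positive and a negative
    score contribute 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): one negative outranks one positive.
toy_auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.7, 0.2], [1, 1, 1, 0, 0, 0])
```

A classifier that assigns every example the same score gets exactly 0.5 under this definition, and one that ranks every positive above every negative gets 1.0.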
pbmcs were thawed, counted and analyzed using the diagnostic panel of antibodies (table s ) . pbmcs were rested overnight in rpmi containing % human serum (table s ) . for t-cell activation assays, , pbmcs were incubated either with dmso (negative control) or with different peptide pools in . ml rpmi (gibco) + % human ab serum (sigma) + ng/ml il- and iu of il- (stemcell technologies, canada). the culture media was replenished every three days with fresh media containing iu of il- and ng/ml il- . on days , and of incubation, fresh peptides were added to the culture. for intracellular cytokine staining, cells were treated with brefeldin a (bd biosciences) for hours, fixed and permeabilized using bd lysis solution and perm solutions respectively followed by staining with t-cell activation panel of antibodies (table s ) . stained cells were analyzed in bd accuri c plus to detect the expression of activation markers ifn- and cd ( - bb) on cd and cd t cells. data was analyzed using bd accuri c plus software. , pbmcs was removed after h, , and days from the t-cell activation assay and processed for bulk tcr sequencing. tcr repertoire profiling was performed using the smarter tcr α/β profiling kit (takara bio, usa) according to the manufacturer's protocol. rna was isolated using the qiagen rna isolation kit. ng rna from antigen-induced pbmcs were used as the starting material. the kit uses smart technology (switching mechanism at 'end of rna template) with 'race to capture the entire v(d)j variable regions of tcr transcripts followed by two rounds of seminested pcr to obtain tcr- and the -chain. libraries are prepared analyzed for quality and quantity. sequencing is performed using the * miseq reagent kit v (illumina, inc.). for each sample, raw gene expression matrices were generated by cell ranger(v. . . ) coupled to the human reference version grch . the gene expression data was analyzed by r software (v. . . ) with the seurat package ( . . ) . 
in brief, low-quality cells were removed if they met one of the following criteria: > , unique molecular identifiers (umis); < or > , genes; > % umis derived from the mitochondrial genome; and > % of transcripts contributed by top genes ( figure s ). after removing low-quality cells, gene expression matrices were normalized by the normalizedata function and features with high cell-to-cell variation were calculated using the findvariablegenes function. next, the expression of the sphase and the g -m phase genes were used to calculate the cell cycle score for all the cells using the cellcyclescoring feature. to generate unbiased clustering of cells scaledata function was used, regressing out the expression of cell cycle genes, mitochondrial % and number umi from the features ( figure s ). the dimensionality of the datasets was reduced by runpca function using variable features identified by findvariablegenes function on lineartransformation scaled data generated by the scaledata function. next, the elbowplot and dimheatmap functions were used to identify the true dimensionality of each dataset. finally, cells were clustered using the findclusters function and nonlinear dimensional reduction with the runumap function using the euclidean distance feature. all details regarding the seurat analyses performed in this work can be found in the website tutorial (https://satijalab.org/seurat/v . /pbmc k_tutorial.html). after nonlinear dimensional reduction and projection of all cells into two-dimensional space by umap, cells were clustered together according to common features. the findallmarkers function in seurat was used to find markers for each of the identified clusters. clusters were then classified and annotated based on expressions of canonical markers of particular cell types. differential gene expression was performed using the findallmarkers function in seurat with default parameters. we selected top- upregulated degs with maximum fdr value of . 
and annotated the clusters based on the expression of these upregulated genes. next, we used mergeseuratfunction to integrate the four datasets from the four treatment conditions and performed the steps of data normalization, feature extraction, regressing out features as described above. the heatmap and dot plots were generated using the doheatmap and dotplot function in seurat. full-length tcr v(d)j segments were enriched using a chromium single-cell v(d)j enrichment kit according to the manufacturer's protocol ( x genomics). demultiplexing, gene quantification and tcr clonotype assignment were performed using cell ranger (v. . . ) vdj pipeline with grch as reference. tcr diversity metric, containing clonotype frequency and barcode information, was obtained. cells with at least one productive tcr α-chain (tra) and one productive tcr β-chain (trb) were retained for further analysis. each unique tra(s)-trb(s) pair of tra-trb was defined as a clonotype. the presence of identical clonotypes at least in two cells were considered to be clonal, and the number of cells containing the same tra-trb pairs defined clonal amplification of a clonotype. using barcode information, tcr clonotypes were projected on umap and dot plots. public tcrs were mapped to the iedb and vdjdb annotated databases using the trb sequence. 
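The clonotype rules above (keep cells with at least one productive tra and one productive trb, define a clonotype by its tra-trb combination, and call it clonal when seen in at least two cells) can be sketched as below. The input layout and the name `call_clonotypes` are assumptions made for illustration; this is not the cell ranger output format.

```python
from collections import Counter


def call_clonotypes(cells):
    """Assign clonotypes and find clonal ones.

    cells: dict mapping a cell barcode to a pair
           (list of productive TRA sequences, list of productive TRB sequences).
    Returns (clonotype_of, clonal), where clonotype_of maps each retained
    barcode to its clonotype and clonal maps each clonotype found in at
    least two cells to its cell count.
    """
    clonotype_of = {}
    for barcode, (tra, trb) in cells.items():
        # Retain only cells with at least one productive chain of each type.
        if not tra or not trb:
            continue
        # Sorting makes the clonotype independent of chain order.
        clonotype_of[barcode] = (tuple(sorted(tra)), tuple(sorted(trb)))
    sizes = Counter(clonotype_of.values())
    clonal = {clonotype: n for clonotype, n in sizes.items() if n >= 2}
    return clonotype_of, clonal
```

The cell count stored for each clonal clonotype corresponds to the clonal amplification described in the text.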
the trinity of covid- : immunity, inflammation and intervention clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan pathological inflammation in patients with covid- : a key role for monocytes and macrophages systems biological assessment of immunity to mild versus severe covid- infection in humans imbalanced host response to sars-cov- drives development of covid- news feature: avoiding pitfalls in the pursuit of a covid- vaccine targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals selective and cross-reactive sars-cov- t cell epitopes in unexposed humans phenotype and kinetics of sars-cov- -specific t cells in covid- patients with acute respiratory distress syndrome sars-cov- -reactive t cells in healthy donors and patients with covid- the immune epitope database (iedb): update reliable prediction of t-cell epitopes using neural networks with novel sequence representations hiv peptidome-wide association study reveals patient-specific epitope repertoires associated with hiv control t cell-mediated immune response to respiratory coronaviruses broad and strong memory cd (+) and cd (+) t cells induced by sars-cov- in uk convalescent individuals following covid- immunogenicity and safety of a recombinant adenovirus type- -vectored covid- vaccine in healthy adults aged years or older: a randomised, double-blind, placebo-controlled, phase trial immunological considerations for covid- vaccine strategies emerging pandemic diseases: how we got to covid- towards a systems understanding of mhc class i and mhc class ii antigen presentation t cell epitope predictions magnitude and dynamics of the t-cell response to sars-cov- infection at both individual and population levels. 
medrxiv revealing factors determining immunodominant responses against dominant epitopes identifying specificity groups in the t cell receptor repertoire clonally diverse ctl response to a dominant viral epitope recognizes potential epitope variants narrowed tcr repertoire and viral escape as a consequence of heterologous immunity limited t cell receptor repertoire diversity in tuberculosis patients correlates with clinical severity a molecular basis for the control of preimmune escape variants by hiv-specific cd + t cells direct link between mhc polymorphism, t cell avidity, and diversity in immune defense sars-cov- t cell immunity: specificity, function, durability, and role in protection convergent antibody responses to sars-cov- in convalescent individuals altered cytokine levels and immune responses in patients with sars-cov- infection and related conditions disease severity dictates sars-cov- -specific neutralizing antibody responses in covid- high neutralizing antibody titer in intensive care unit patients with covid- the immune epitope database (iedb): update netmhcpan, a method for mhc class i binding prediction beyond humans reliable prediction of t-cell epitopes using neural networks with novel sequence representations a simple method for displaying the hydropathic character of a protein netmhcpan- . and netmhciipan- . : improved predictions of mhc antigen presentation by concurrent motif deconvolution and integration of ms mhc eluted ligand data key: cord- -gytebbua authors: eydoux, cecilia; fattorini, veronique; shannon, ashleigh; le, thi-tuyet-nhung; didier, bruno; canard, bruno; guillemot, jean-claude title: a fluorescence-based high throughput-screening assay for the sars-cov rna synthesis complex date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gytebbua the severe acute respiratory syndrome coronavirus (sars-cov) emergence in introduced the first serious human coronavirus pathogen to an unprepared world. 
to control emerging viruses, existing successful anti(retro)viral therapies can inspire antiviral strategies, as conserved viral enzymes (e.g., viral proteases and rna-dependent rna polymerases) represent targets of choice. since , much effort has been expended in the characterization of the sars-cov replication/transcription machinery. until recently, a pure and highly active preparation of sars-cov recombinant rna synthesis machinery was not available, impeding target-based high throughput screening of drug candidates against this viral family. the current severe acute respiratory syndrome coronavirus- (sars-cov- ) pandemic revealed a new pathogen whose rna synthesis machinery is highly (> % aa identity) homologous to sars-cov. this phylogenetic relatedness highlights the potential use of conserved replication enzymes to discover inhibitors against this significant pathogen, which in turn contributes to scientific preparedness against emerging viruses. here, we report the use of a purified and highly active sars-cov replication/transcription complex (rtc) to set up a high-throughput screening of coronavirus rna synthesis inhibitors. the screening of a small ( , compounds) chemical library of fda-approved drugs demonstrates the robustness of our assay and should help speed up drug repositioning or novel drug discovery against sars-cov- . principle of sars-cov rna synthesis detection by a fluorescence-based high throughput screening assay highlights - a new sars-cov non radioactive rna polymerase assay is described - the robotized assay is suitable to identify rdrp inhibitors based on hts the rdrp core nsp and shown to confer full activity and processivity to nsp (subissi et al., ) . 
these finding were corroborated by the cryo-em structure determination of the nsp - - sars-cov rna synthesis complex in , followed by that of sars-cov- in (gao et al., ; kirchdoerfer and ward, ) . the structure shows the nsp rdrp core bound to one molecule of nsp and two molecules of nsp . furthermore, the structure described by hillen et al. suggest that nsp acts as a "sliding pole" for processivity of rna synthesis (hillen et al., ) . although it is not known if this represent one of the biologically relevant form(s) of the sars-cov replication transcription complex (rtc), it does suggest that there is both biochemical and functional relevance to consider the nsp - - complex as the minimum rtc required for synthesis of the coronavirus ~ , nt rna genome. the sars-cov and sars-cov- viruses are highly homologous, with a % overall amino acid sequence identity along the genome . remarkably, the sars-cov nsp amino acid sequence is % identical to that of sars-cov- , and amino acid polymorphisms involve conservative changes. the genome is translated into two large polyproteins originating from orf a and orf b, the latter being expressed from a ribosomal frameshift occurring at the end of orf a. orf b codes for a set of five conserved replication proteins, among which nsp carries the polymerase activity when supplemented with orf a products nsp and nsp . in the core polymerase domain, amino acid differences between sars-cov- and map to the surface of the protein, not onto any of the conserved motif a to g, indicating that sars-cov and sars-cov- rna synthesis properties should be nearly identical (shannon et al., ) . in this paper, we use the nsp , nsp , and nsp proteins of the sars-cov to assemble a highly active rna synthesis complex. we optimize reaction conditions to set-up a non-radioactive nucleotide polymerization assay with a signal-to-noise ratio appropriate for a robotized high-throughput screening (hts) assay. 
we validate this hts assay through a screening campaign using the prestwick chemical library® of , fda-ema approved compounds, and report detailed inhibition profiles of two series of hits. the fusion protein nsp -nsp was generated by inserting a gsgsgs linker sequence between the nsp and nsp coding sequences, and is named nsp l (subissi et al., ) . the nsp l and nsp proteins were produced and purified independently as described previously in a bacterial expression system free t -bacteriophage rna polymerase. the complex was reconstituted with a : ratio of nsp :nsp l as indicated. picogreen kinetic assay was based on polymerase activity of sars nsp in complex with nsp l , which catalyzed the reaction using a poly (a) template and uridine triphosphate (utp). the reaction ( µl) was carried out at o c in mm hepes ph , mm dtt, mm kcl, mm mgcl , mm mncl , µm utp and nm nsp as final concentrations. the final poly(a) concentration varied from nm to nm. to reconstitute an active replicase, nsp l was used in a -fold molar concentration excess compared to nsp as described. nsp and nsp l were incubated for min in the presence of poly(a) before starting the reaction with utp addition. µl of ethylenediaminetetraacetic acid (edta) mm were added into each well of a wells black flat bottom plate (greiner bio-one ref ). at each time interval ( ; . ; ; ; ; and min), µl of reaction was added into wells on the plate with edta to stop the reaction. the plate was then incubated minutes in the dark with / picogreen® in te buffer ( mm tris-hcl, mm edta, ph . ). the plate was read on tecan safire ii using software magellan (excited light at nm, emitted light at nm, optimal gain). the assay was done in triplicate. velocity values of each condition were determined using the prism software by calculating the slope of the linear phase and then plotted against the poly(a) template concentration to determine the km (poly(a)) and v max values by using michaelis-menten fitting. 
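To illustrate how km and v max are extracted from initial velocities, here is a minimal sketch. It is not the prism nonlinear fit used in the paper: it substitutes a Hanes-Woolf linearization ([S]/v = [S]/Vmax + Km/Vmax) solved by ordinary least squares, which needs only the standard library. The function name and the synthetic values in the usage note are assumptions.

```python
def fit_michaelis_menten(substrate, velocity):
    """Estimate (Km, Vmax) from paired substrate concentrations and velocities.

    Uses the Hanes-Woolf form: [S]/v plotted against [S] is a straight line
    with slope 1/Vmax and intercept Km/Vmax.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax
```

For example, noise-free velocities generated with Km = 2 and Vmax = 10 at substrate concentrations 1, 2, 4 and 8 are recovered by this function. With real, noisy data a direct nonlinear fit is usually preferred because linearization distorts the error structure.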
utp variation assay was performed in the same conditions as described above, with the defined poly(a) optimum concentration of nm, except for the final utp concentration, which varied from µm to mm. at each time interval ( ; . ; ; ; ; and min), µl of reaction was added into wells on the plate with edta to stop the reaction. the assay was repeated three times. velocity values of each condition were determined using the prism software by calculating the slope of the linear phase and then plotted against the utp concentration to determine the apparent km app (utp) value by using michaelis-menten fitting or hill fitting. enzyme variation assays were performed in the same conditions as described above (poly(a) nm, utp µm) except for the final enzyme concentration, which varied from nm to nm. background and optimal gain values (given by tecansafire ) of each assay were determined and analyzed, using prism software, to obtain the final best condition. to assess the optimized conditions of the polymerase activity of the sars nsp -nsp l complex, a reaction time course was performed using nm poly (a) template, µm utp and nm nsp enzyme, testing nsp alone, nsp l alone or the nsp /nsp l complex. nsp l was in a -fold molar concentration excess compared to the nsp concentration. nsp and nsp l were incubated at room temperature for min in the presence of poly (a) before starting the reaction by the addition of utp. as a comparison, the assay was performed with the dv- polymerase ( nm poly (u) template, µm atp, nm enzyme), as already described (benmansour et al., ) . experiments, performed in triplicate, were analyzed using the prism software. the assay was performed in -well nunc plates. the chemical library is from prestwick chemical. the compounds are distributed in plates, with compounds per plate and the first and th columns with dmso. each of the compounds was added to the reaction mix using a biomek i workstation (beckman) to a final concentration of µm in % dmso. 
reactions were conducted in μl final volume. the enzyme mix containing both nsp , nsp l and the poly (a) template was incubated min at room temperature to form the active complex. for each assay, μl of this enzyme mix was distributed in wells using a biomek (beckman), containing μl of the compounds. reactions were initiated by addition of μl of the nucleotide mix ( μm utp) and incubated at °c for min. assays were stopped by the addition of μl of edta ( mm). reaction mixes were transferred to a greiner plate using a biomek i automate (beckman). picogreen fluorescent reagent was diluted to / in te buffer according to manufacturer's data, and μl of reagent was distributed into each well of the greiner plate. the plate was incubated for min in the dark at room temperature, and the fluorescence signal was read at nm (excitation) and nm (emission) using a tecansafire . positive and negative controls consisted respectively of a reaction mix with % dmso final concentration or edta mm or hinokiflavone µm instead of compounds. for each compound, the percentage of inhibition was calculated as follows: inhibition % = (raw_data_of_compound − av(pos)) / (av(neg) − av(pos)). compounds leading to a % inhibition or more at µm were selected for further investigation. the z' factor is calculated using the following equation: z' = 1 − 3 × [(sd of max) + (sd of min)] / [(mean max signal) − (mean min signal)], where sd is the standard deviation. the compound concentration leading to a % inhibition of polymerase-mediated rna synthesis was determined in ic buffer ( mm hepes ph . , mm kcl, mm mncl , mm mgcl , mm dtt) containing nm of poly(a) template, nm of nsp in complex with . µm nsp l using different concentrations of compound. five ranges of inhibitor concentration were available ( , to µm / , to µm / , to µm / to µm / to µm). according to the inhibitory potency of the compound tested, a range was selected to determine the ic . reactions were conducted in µl on a -well nunc plate. 
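The two plate statistics defined above, percentage of inhibition and the z' factor, can be computed as in the sketch below. It assumes the standard z' definition (1 minus three times the summed standard deviations over the absolute difference of the means) and uses the sample standard deviation, which the text does not specify. Inhibition is scaled to percent, positive controls are the uninhibited dmso wells and negative controls the fully inhibited wells. Function names are hypothetical.

```python
from statistics import mean, stdev


def percent_inhibition(raw, pos_controls, neg_controls):
    """Percent inhibition of one well relative to the plate controls.

    pos_controls: signals of uninhibited reactions (0 % inhibition).
    neg_controls: signals of fully inhibited reactions (100 % inhibition).
    """
    av_pos = mean(pos_controls)
    av_neg = mean(neg_controls)
    return 100.0 * (raw - av_pos) / (av_neg - av_pos)


def z_prime(max_signal, min_signal):
    """Z' factor of a plate from its maximum- and minimum-signal controls."""
    spread = stdev(max_signal) + stdev(min_signal)
    window = abs(mean(max_signal) - mean(min_signal))
    return 1.0 - 3.0 * spread / window
```

A z' close to 1 indicates well-separated controls with little scatter. Values above roughly 0.5 are commonly taken to indicate an assay suitable for screening.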
all experiments were robotized by using a biomek automate (beckman). µl of each diluted compound in % dmso were added in wells to the chosen concentration ( % dmso final concentration). for each assay, the enzyme mix was distributed in wells after a min incubation at room temperature to form the active complex. reactions were started by the addition of the utp mix and were incubated at °c for min. reaction assays were stopped by the addition of µl edta mm. positive and negative controls consisted respectively of a reaction mix with % dmso final concentration or edta mm instead of compounds. reaction mixes were then transferred to greiner plate using a biomek i automate (beckman). picogreen® fluorescent reagent was diluted to / ° in te buffer according to the data manufacturer and µl of reagent were distributed into each well of the greiner plate. the plate was incubated min in the dark at room temperature and the fluorescence signal was then read at nm (excitation) and nm (emission) using a tecansafire . ic was determined using the equation: % of active enzyme = /( +(i) /ic ), where i is the concentration of inhibitor and % of activity is the fluorescence intensity without inhibitor. ic was determined from curve-fitting using prism software. for each value, results were obtained using triplicate in a single experiment. the nsp rdrp activity is dependent on the presence of the nsp and nsp proteins (subissi et al., ) . we made use of a : ratio of nsp :nsp l which provides appropriate levels of rdrp activity. a hts assay was developed with a non-radioactive readout. we used conditions similar to the hts dengue virus (dv) polymerase assay previously developed on our platform. briefly, the homo-polymeric ntp template was incubated with nsp :nsp l , and the polymerase activity is detected through a fluorescent dye which intercalates upon the synthesis of double stranded rna (dsrna). nsp protein and nsp l complex were expressed and purified separately ( fig s ) . 
we first evaluated the effect of the poly(a) concentration. maximal sars-cov rtc activity was obtained with a poly(a) concentration of nm, allowing a stable and reproducible signal (fig. a) . to define the optimal utp concentration, the velocity of the rtc was assessed at different utp concentrations. we observed a non-michaelis-menten curve with a significant lag-phase (fig. b, full line) , and therefore it was not possible to calculate a reliable km concentration for the robotized assay using standard kinetic modeling. using a hill equation fit, we observed an allosteric cooperation with a hill coefficient of . (fig. b, dotted line) . the utp concentration to be used was defined at µm, as it corresponds to the beginning of the saturation phase. finally, to reach % of the maximum activity, the nsp concentration was set to nm (fig. c ). using the conditions established above ( nm poly(a) template, µm utp with nm nsp and . µm nsp l ) a time course up to min was performed (fig. ) . in contrast to the dv polymerase, the sars-cov rtc complex exhibits a significant lag phase (fig. ) . to obtain sufficient levels of activity, a reaction time of min was determined to be suitable for inhibitor screening. as expected, nsp alone, or nsp l alone, in the same conditions, did not exhibit any polymerase activity (fig. ) . to develop the hts of the sars-cov rtc, reference inhibitors had to be selected. in the absence of specific inhibitors of the sars-cov rtc, we first tested a nucleotide analogue (na), 'dutp, along with a large spectrum of non-specific rna synthesis inhibitors: hinokiflavone, amentoflavone, quercetin and apigenin (coulerie et al., ) . ic values ranged from . µm for hinokiflavone to µm for apigenin (fig. ) , demonstrating the discriminating capacity of the hts assay. the 'dutp ic was measured at . µm under our experimental conditions. a total of , compounds from the prestwick chemical library® (pcl) were tested using the sars-cov-rtc assay. 
the prestwick chemical library ® is a unique collection of small molecules, mostly approved drugs (fda, ema and other agencies) selected by a team of medicinal chemists and pharmacists for their high chemical and pharmacological diversity as well as for their known bioavailability and safety in humans. the z' value was calculated based on the ten control wells on each microplate, resulting in an overall z'-score of , +/- , for all the chemical library used in the screen. the screening was performed using µm of compounds in % dmso in a single assay. based on a cut-off of % inhibition, the calculated hit rate is . % ( hits) and the repartition of identified compounds according their percentage of inhibition was indicated in figure . based on these results, three main families of molecules were identified: anthracyclines, tetracyclines and detergents, the two first of which were further investigated (tables a and a ). eight anthracyclins were identified with inhibitory potential from zero to % at µm (s table ). regarding the five determined ic s, they perfectly correlated with the calculated percentage of inhibition at µm of the hts. four of the five anthracyclins tested were in the same range of ic s (from to µm). the remaining one, prestw- , exhibited a sub-micromolar ic ( . µm ± . ) which was in the same range than hinokiflavone, the inhibitor control (fig. a ). ten tetracyclins were identified in the pcl, with inhibitory potential up to % at µm (s table ). again, the calculated ic values for these compounds was found to perfectly correlate to the percentage of inhibition at µm. three tetracyclins showed ic s of µm or higher (prestw- ; prestw- ; prestw- ) while prestw- and prestw- exhibited lower ic s of . and . µm respectively ( figure b ). in this study we have established a robust hts assay with a fluorescent readout for the sars-cov rtc. the overall hit rate for the pcl screen is ~ %, with a cut-off of % of inhibition. 
obviously, at this stage the identified hits are active only on the sars-cov rtc and have by no means been deemed suitable for medication. regarding the selected two families of molecules, the anthracyclins and the tetracyclins, the number of identified molecules did not allow structure activity relationship (sar) studies. rather, it is important to outline the accuracy of the ic s of the primary screening, as well as the reproducibility of the ic s regarding the calculated standard errors. with the exception of the anthracyclins, which have been described as potential intercalating molecules, it is not known at this stage if these compounds are indeed intercalating, denaturing, or low specificity agents having a mechanism of action suitable for advanced drug design. preliminary testing in sars-cov- infected cells indicate that none of the compounds described here show significant antiviral activity (data not shown). in summary, this work, to the best of our knowledge, is the first description of a robust hts assay based on a sars-cov rtc. it provides a new strategy for the rapid identification of potential anti-sars inhibitors. the next steps already under development include increasing the output by scaling up the assay to and potentially wells plates. furthermore, the screening of large chemical libraries derived from innovative approaches as virtual screenings or protein protein interaction inhibitors (ippi) will broaden the scope of potential antivirals and constitute a key step to speed up these objectives. identification of candidate anti-coronavirus drugs, based on a diversity of chemical libraries, should expedite drug discovery and design, and constitute an invaluable help to confirm the target of hits identified during cell-based or phenotypic screens. the polymerase activity of nm dv -ns pol (•), nm ns in complex with . µm nsp l (▲) , nm nsp alone (Δ) or . µm nsp l alone (□) were measured in a time course ( ; ; ; . ; ; ; and min). 
the produced doubled strand rna was detected by adding an intercalant reagent (picogreen®) and by measuring the fluorescence emission at nm. each assay was performed three times (mean value ± sd). number of compounds from pcl on the screening of sars-cov rtc, according to their inhibitory potential. based on their efficiency on the sars-cov rtc assay, the number of compounds with more of %; %; %; %; %; %; % and % inhibition were evaluated. the exact value was indicated in white above each bar graph. (only compounds with more % inhibition of the polymerase activity in the assay were represented. frequent hitter and fluorescent compounds were excluded). the z' value was calculated based on the ten control wells on each microplate, resulting in an overall z' score of , ± , . compounds were incubated with nm nsp , . µm nsp lnsp , µm utp, nm poly (a) at °c during min. the ic s were then calculated using graphpad prism equation (experiment were done twice in triplicate; mean value ± sd). ic : concentration for % inhibition. the ic of prestw- was approximate. 
novel -phenyl- -[(e)- -(thiophen- -yl)ethenyl]- , , -oxadiazole and -phenyl- -[(e)- -(thiophen- -yl)ethenyl]- , , -oxadiazole derivatives as dengue virus inhibitors targeting ns polymerase improving therapy of severe infections through drug repurposing of synergistic combinations biflavonoids of dacrydium balansae with potent inhibitory activity on dengue ns polymerase structure of the rna-dependent rna polymerase from covid- virus structure of replicating sars-cov- polymerase sustained virologic response to direct-acting antiviral therapy in patients with chronic hepatitis c and hepatocellular carcinoma: a systematic review and meta-analysis structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors nucleoside analogues for the treatment of coronavirus infections remdesivir and sars-cov- : structural requirements at both nsp rdrp and nsp exonuclease active-sites unique and conserved features of genome and proteome of sars-coronavirus, an early split-off from the coronavirus group lineage one severe acute respiratory syndrome coronavirus protein complex integrates processive rna polymerase and exonuclease activities rna dependent rna polymerases: insights from structure, function and evolution the establishment of reference sequence for sars-cov- and variation analysis a new coronavirus associated with human respiratory disease in china a pneumonia outbreak associated with a new coronavirus of probable bat origin quercetin (■) and apigenin (□) were incubated with nm nsp , . µ m nsp lnsp , µm utp, nm poly (a) at °c during min. 
the inhibitory concentrations (ic s) were then calculated using graphpad prism equation (experiments were done twice in triplicate this work was supported by the fondation pour la recherche médicale (aide aux équipes), the score project h sc -phe-coronavirus- (grant# ), inserm through the reacting initiative (research and action targeting emerging infectious diseases), and the anr-flash-covid (anr- -covi- - , tamac), supported by the fondation de france. we thank ml jung, e. decroly for helpful comments. key: cord- -ka n pft authors: arumugam, arunkumar; wong, season title: the potential use of unprocessed sample for rt-qpcr detection of covid- without an rna extraction step date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ka n pft quantitative reverse transcription polymerase chain reaction (rt-qpcr) assay is the gold standard recommended to test for acute sars-cov- infection. – it has been used by the centers for disease control and prevention (cdc) and several other companies in their emergency use authorization (eua) assays. with many pcr-based molecular assays, an extraction step is routinely used as part of the protocol. this step can take up a significant amount of time and labor, especially if the extraction is performed manually. long assay time, partly caused by slow sample preparation steps, has created a large backlog when testing patient samples suspected of covid- . using flu and rsv clinical specimens, we have collected evidence that the rt-qpcr assay can be performed directly on patient sample material from a nasal swab immersed in virus transport medium (vtm) without an rna extraction step. we have also used this approach to test for the direct detection of sars-cov- reference materials spiked in vtm. our data, while preliminary, suggest that using a few microliters of these untreated samples still can lead to sensitive test results. 
if rna extraction steps can be omitted without significantly affecting clinical sensitivity, the turn-around time of covid- tests and the backlog we currently experience can be reduced drastically. next, we will confirm our findings using patient samples. the sample preparation step is generally time-consuming, regardless of whether it is done manually or automated. in addition, there is a current shortage of the recommended viral rna extraction kits needed for the centers for disease control and prevention (cdc) rt-qpcr assay to diagnose sars-cov- . during a study in developing a rapid protocol for influenza (inf) and respiratory syncytial virus (rsv) diagnostics, we investigated the feasibility of omitting the sample preparation steps to expedite the test without significantly impacting the test's sensitivity. using inf and rsv clinical specimens, we successfully performed rt-qpcr reactions by simply adding a few microliters of the unprocessed sample in viral transport medium (vtm) directly into the rt-qpcr assay master mix. we then tested the approach using sars-cov- plasmid and seracare accuplex reference materials. the data presented below suggest that it is possible to skip the rna extraction step in covid- testing without a significant drop in assay sensitivity. we first tested the feasibility using a very small amount of sample in rt-qpcr reactions by using aerosol generating vials to spray the samples over the uncapped pcr tubes. the material in vials containing influenza a (infa), influenza b (infb), or rsv clinical specimens (swabs in vtm) were sprayed into pcr tubes containing primer and probe sets targeting infa, infb, rsv and rnasep (rp) in master mix prior to capping the tubes and performing rt-qpcr. 
fig. shows that there were positive pcr signals in the respective tubes, with no non-specific amplification, although the unprocessed, sprayed samples had higher ct values ( . , . and for infa, infb, and rsv, respectively) than the corresponding extracted rna template samples ( . , . , . for infa, infb, and rsv). it is important to note that we were able to detect low viral load samples (ct > ) when less than µl of sample entered the tubes, as measured by an analytical balance. to test the extent of pcr inhibition exerted by flu specimens in vtm, we used clinical samples from discovery life sciences, a biospecimen repository. between . µl and µl of the unprocessed samples were directly spiked into the master mix with infa primers in a µl reaction mix (fig. ). a template extracted from the same sample (in a promega maxwell device) was amplified alongside the unprocessed samples. adding more unprocessed sample (up to µl) improved (reduced) the ct values. the ct difference between the extracted template ( µl input and µl eluate) and the unprocessed sample is minimal ( . vs. . , respectively). adding more than µl of untreated sample resulted in a high ct (data not shown), meaning the inhibitory effect outweighed the benefit of having more copies of the target in the reaction. we next tested whether rna from sars-cov- can be detected by directly spiking samples of non-replicative recombinant virus particles (seracare accuplex sars-cov- reference material) in vtm into the master mix without an extraction step. the sars-cov- virus particles were mixed with vtm to get a final concentration of , copies per ml. different amounts ( , , and µl) of these mock clinical samples were spiked into the master mix containing the cdc-recommended sars-cov- rt-qpcr diagnostic panel primers n , n or n in µl pcr reactions, though we note that the n primers were recently removed by the cdc. 
purified nucleic acid template ( µl) isolated from a promega maxwell device was also amplified. the ct difference between the purified template and the directly spiked clinical samples (i.e., µl, µl and µl) is minimal. then, µl of accuplex sars-cov- reference material was processed in a promega maxwell device (using an as cartridge) and eluted in µl of elution buffer. the extracted template ( µl) was added to master mix, and rt-qpcr amplification was carried out in a bio-rad (cfx- ) thermal cycler along with the reactions with unprocessed samples. as shown in fig. , the sars-cov- rna from directly spiked samples was successfully detected by the rt-qpcr reaction without a nucleic acid extraction step (n target shown). up to µl of vtm and accuplex sars-cov- mix was used in the reaction, which did not exert major pcr inhibition. the ct value of the extracted template ( copies/reaction) is . , which is cycles lower than the result from µl of untreated sample ( copies) in the reaction (fig. ). we speculate that the seracare matrix, containing tris-buffered saline, glycerol, anti-microbial agents and human proteins, could be the reason for the pcr inhibition noted above. we also tested sars-cov- plasmid (purchased from integrated dna technologies, inc.) mixed with vtm using rt-qpcr. this plasmid is used as a positive control for the cdc's sars-cov- rt-qpcr assay. the positive control plasmid was mixed with vtm and µl of this mix was used for the rt-qpcr reaction. the rt-qpcr results showed that the ct values of control (without vtm) and vtm mixed reactions (all containing copies/reaction) were very similar for all three (n , n and n ) sars-cov- targets (table ). we also tested copies of sars-cov- plasmid using rt-qpcr, which gave a ct value of ~ (data not shown). this means direct spiking of a higher concentration of sars-cov- genome copies in vtm did not have a major pcr inhibitory effect in the reactions when the target concentration is high (table ). 
therefore, it is likely that specimens with higher viral load could also be detected by rt-qpcr without needing an rna extraction step. sars-cov- plasmid in vtm did not show any pcr inhibition when compared to plasmid alone. the cdc positive control plasmid for sars-cov- (from idt) was diluted in vtm or te buffer (control) to get , copies per ml. µl of sars-cov- plasmids in vtm and in te buffer were added to a µl pcr reaction mix targeting n , n and n genes for the detection of sars-cov- . the ct values of both the control and plasmid in vtm are very similar, indicating that the sample preparation step can be omitted in high viral load samples. our data using both high (ct < ) and low (ct > ) target concentrations suggest that, since rt-qpcr is highly sensitive, using raw samples or minimal sample preparation steps might not reduce the test sensitivity, as most patients tend to have a higher viral load. also, the efficiency of extraction methods tends to drop significantly at very low target concentrations when processed through numerous washing steps. therefore, nucleic acid loss during extraction steps may hurt the limit of detection. while further studies with patient specimens and a higher number of samples are needed to confirm our preliminary results, we report that the use of untreated samples can be a viable option during the covid- pandemic. the accuplex sars-cov- reference material from seracare containing non-replicative viral particles ( , viral particles per ml) was mixed with equal volumes of vtm to get a final concentration of , viral particles per ml. each microliter contained ~ . genomic material equivalents of sars-cov- . sars-cov- positive control plasmid (cdc recommended) was obtained from integrated dna technologies ( , copies/µl). this was diluted to , copies/µl in te buffer and copies/ml. a control was prepared by diluting µl of plasmid stock solution in µl of te buffer. 
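the dilution and input-volume arithmetic described in this section can be sketched in python as follows. this is a minimal illustration and not the authors' procedure: the function names are hypothetical, the numbers in the usage block are placeholders rather than values from the study, and the ct-to-fold conversion assumes ideal doubling per cycle (efficiency = 1.0), which real assays only approximate.

```python
def dilute(stock_conc: float, stock_vol: float, diluent_vol: float) -> float:
    """Concentration after mixing stock with diluent (volumes in the same unit)."""
    return stock_conc * stock_vol / (stock_vol + diluent_vol)


def copies_per_reaction(copies_per_ml: float, input_ul: float) -> float:
    """Template copies delivered by input_ul microliters of sample."""
    return copies_per_ml * input_ul / 1000.0  # 1 ml = 1000 microliters


def fold_difference(delta_ct: float, efficiency: float = 1.0) -> float:
    """Approximate template fold difference implied by a ct difference.

    efficiency=1.0 assumes perfect doubling each cycle (an idealisation).
    """
    return (1.0 + efficiency) ** delta_ct


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    mixed = dilute(stock_conc=10_000, stock_vol=1.0, diluent_vol=1.0)
    print(mixed)                          # 5000.0 (equal volumes halve the concentration)
    print(copies_per_reaction(mixed, 5))  # 25.0
    print(fold_difference(3))             # 8.0
```

under the ideal-efficiency assumption, each additional cycle of ct corresponds to roughly a two-fold lower amount of amplifiable template, which is why small ct differences between extracted and unprocessed samples are read here as limited inhibition.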
both the vtm and te buffer working solutions ( µl or copies in each reaction) were added into a µl pcr reaction. the promega maxwell extraction was performed using as cartridge (promega corporation, madison, wi). µl of the sample (accuplex sars-cov- or influenza a/b or rsv) was added to the cartridge and eluted in µl elution buffer provided in the kit. the authors declare no conflict of interest.
molecular diagnosis of a novel coronavirus ( -ncov) causing an outbreak of pneumonia a novel coronavirus from patients with pneumonia in china comparison of different samples for novel coronavirus detection by nucleic acid amplification tests detection of novel coronavirus ( -ncov) by real-time rt-pcr sars-cov- viral load in upper respiratory specimens of infected patients evaluation of three influenza a and b real-time reverse transcription-pcr assays and a new h n assay for detection of influenza viruses the burden of hospitalized lower respiratory tract infection due to respiratory syncytial virus in rural thailand division of viral diseases
key: cord- -qvz m authors: banerjee, shuvam; dhar, shrinjana; bhattacharjee, sandip; bhattacharjee, pritha title: decoding the lethal effect of sars-cov- (novel coronavirus) strains from global perspective: molecular pathogenesis and evolutionary divergence date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qvz m background covid- is a disease that has become a global public health emergency and has shaken the world since its first detection in china in december, . severe acute respiratory syndrome coronavirus (sars-cov- ) is the pathogen responsible for this pandemic. the lethality of different viral strains is found to vary in different geographical locations but the molecular mechanism is yet to be known. 
methods available data of whole genome sequencing of different viral strains published by different countries were retrieved and then analysed using multiple sequence alignment and pair-wise sequence alignment leading to phylogenetic tree construction. each location and the corresponding genetic variations were screened in depth. then the variations are analysed at protein level giving special emphasis on non synonymous amino acid substitutions. the fatality rates in different countries were matched against the mutation number, rarity of the nucleotide alterations and functional impact of the non synonymous changes at protein level, separately and in combination. findings all the viral strains have been found to evolve from the viral strain of taiwan (mt ) which is % identical with the ancestor sars-cov- sequences of wuhan (nc . ; submitted on th jan, ). transition from c to t (c>t) is the most frequent mutation in this viral genome and mutations a>t, g>a, t>a are the rarest ones, found in countries with maximum fatality rate i.e italy, spain and sweden. non synonymous mutations are located in viral genome spanning orf ab polyprotein, surface glycoprotein, nucleocapsid protein etc. the functional effect on the structure and function of the protein can favourably or unfavourably interact with the host body. interpretation the fatality outcome depends on three important factors (a) number of mutation (b) rarity of the allelic variation and (c) functional consequence of the mutation at protein level. the molecular divergence, evolved from the ancestral strain (s) lead to extremely lethal (e), lethal(l) and non lethal (n) strains with the involvement of an intermediate strain(i). in the new decade of st century, surfaced the first public health emergency of global concern in wuhan, china from the menace of covid- (ncov/β-coronavirus), also known as sars-cov- . 
covid- is one of the seven pathogenic members of family coronaviridae, other notable ones are severe acute respiratory syndrome (sars) coronavirus (sars-cov), identified first in southern china (november, ) , and middle east respiratory syndrome (mers) coronavirus (mers-cov), from saudi arabia ( ). less severe ones are hku , nl , oc- and e. the enveloped virus possesses crown-like spikes on their surface (latin word corona i.e crown) and a rna genome, single stranded positive-sense strand with to kb length. the microbe can co-infect different vertebrates including humans and affect organs of respiratory, gastrointestinal and central nervous system. according to world health organization (who), % covid infection are mild or asymptomatic, % are severe infection, while % are critical. crude mortality ratio (no. of reported deaths divided by reported cases) is between - %. the infection mortality rate (the number of reported deaths divided by the number of infection) will be lower than seasonal influenza ( · %). globally there are , , confirmed cases; , deaths and so far , , people recovered in countries and territories. , the most vulnerable group in corona epidemic is elderly, malnourished, hypertensive, diabetic, immune compromised, cancer and cardiovascular patients, as well as pregnant women. despite wide spread covid- infection, the numbers of fatally infected children cases are less reported so far, possible mechanism might be an unknown protective interaction between immune system and respiratory pathway. covid- spreads through droplet infection and fomites, from infected person during coughing and sneezing. majority of symptoms are sore throat, breathing difficulty and fever, although gi and musculoskeletal system are also involved. , clinical course is somehow showing a predictable pattern. 
on around day five, flu like symptom starts, common are fever, headache, dry cough, myalgia (and back pain), nausea (without vomiting), abdominal discomfort with some diarrhea, loss of smell, anorexia and fatigue. around same time symptoms can worsen leading to shortness of breath due to bilateral viral pneumonia from direct viral damage to lung parenchyma. around day , cytokine storm kicks in, subsequently acute respiratory distress syndrome (ards) and multi organ failure ensues. this degradation usually happens in matter of hours. hospitalized patients in moderate to severe cases usually come in hypoxic stage without dyspnea. china reported % cardiac involvement and serious final outcome. in most of the chest x-rays, bilateral interstitial pneumonia or ground glass opacities are seen. hypoxia mostly does not correlate well with the chest x-ray findings. chest auscultations are not of great help. blood reports show, in the most cases, wbc is low, i.e. mostly lymphocytopenia, low platelet count whereas pro-calcitonin are mostly normal. most consistently, crp and ferritin levels are elevated and similarly cpk, d-dimer, ldh, alkphos/ast-alt levels are also high. it has been seen a ratio of absolute neutrophil count to absolute lymphocyte count, greater than . , may be the highest predictor of poor outcome. an elevated level of il- is of grave concern of cytokine storm. , among the four classified subfamilies, α -and β -coronaviruses both were found in bats, e.g leafnose bats like hipposideros armige harbor α -coronavirus and β -coronavirus in hipposideros larvatus; while γ -and δ -coronavirus found in pigs and birds respectively. the genesis of β coronavirus cluster in afore mentioned bat species, consisting of primeval and genetically independent types might took favour of host/pathogen interaction despite of wide geographical distribution. 
genomic characterization of sars-cov- reveals % identity to that of bat coronavirus whole genome; while % and % to that of sars-cov and mers-cov respectively. like other betacoronaviruses, the genome of sars-cov- has a long orf ab polyprotein at the ' end, followed by four major structural proteins, including the spike like surface glycoprotein, small envelope protein, matrix protein, and nucleocapsid protein. , the spike (s) protein has two distinct functional domains, termed s and s , both of which are necessary for a coronavirus to successfully enter a cell. it has been found that s protein of sars-cov- is - times more likely to bind to human ace than the s protein of the early s sars-cov strain. ace is essentially a carboxypeptidase, which can remove carboxy-terminal of hydrophobic or basic amino acids. the expression of ace is comparatively higher in mouth and tongue and is also normally expressed in human lower lungs on type-i and type-ii alveolar epithelial cells. the heightened affinity for a prevalent cellular receptor may be a factor which increases a quick transmission of sars-cov- in the upper respiratory tract. as an rna virus, -ncov has the inherent feature of a high mutation rate. due to mutations and recombination effects, different viral strains are originating with new characteristics; however because of its genome encoded exonuclease, the mutation rate might be somewhat lower than other rna viruses. , in this study, we comprehensively analyzed the whole genome sequence homology from the available patient data uploaded by affected countries in ncbi virus database, identified the mutations developed by different strains from the ancestor strain and studied the impact of those mutations at functional level. our endeavour was to categorize all the strains into major groups depending upon their lethal effect and mutations observed. 
whole genomic data retrieval from the database: we retrieved the whole genome sequences from the "ncbi virus" database using the specific input "sars-cov- ", for the period from jan through march , . later submissions and sequences with many undetermined nucleotides (denoted as n) were not considered. for countries that had multiple entries, the entries with whole genome sequence were matched to identify a representative sequence to consider (e.g. for usa, out of submitted complete genomic sequences, the sequence with submissions were of complete genomic sequence with % identity and thus one representative sequence was considered for the present study). thus, complete whole genome sequences submitted by different affected countries were taken into consideration (figure ). clustalw (version . ) was employed to align multiple sequences, representative of countries, with the purpose (a) to understand the similarity and variation at all genomic positions, (b) to establish the evolutionary relationship among the viral strains affecting the whole world and (c) finally, to distinguish the ancestry of different viral strains using the fasttree tool, which is based on the neighbour joining method. pair-wise sequence alignment: emboss-stretcher was used for whole genomic pair-wise alignment between the ancestor sars-cov- sequence of wuhan (nc . ; submitted on th jan, ) and all other sequences. a similar strategy was used for pair-wise alignment between the genome sequence of sars-cov submitted by italy, and the sars-cov- strain of italy, . all alignment results were thoroughly screened to find the exact location of each mutation and the genetic variation affecting that position. 
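the screening step described above, reading a pair-wise alignment and recording where the strain differs from the reference, can be sketched in python as follows. this is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes the two sequences have already been aligned by an external tool (so they have equal length and use "-" for gaps), the function name `list_variants` is hypothetical, and positions are reported in 1-based reference coordinates. undetermined bases (n) are skipped rather than reported as mutations.

```python
from typing import List, Tuple


def list_variants(ref_aln: str, qry_aln: str) -> List[Tuple[int, str, str]]:
    """Return (reference position, reference base, query base) for each difference.

    Both inputs are rows of one pair-wise alignment: equal length, '-' for gaps.
    A query base of '-' marks a deletion; a reference base of '-' marks an
    insertion located after the reported reference position.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")

    variants = []
    ref_pos = 0  # 1-based coordinate in the ungapped reference
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == q or r == "N" or q == "N":
            continue  # identical column, or an undetermined base
        variants.append((ref_pos, r, q))
    return variants


if __name__ == "__main__":
    # Toy alignment: one C>T substitution and a three-nucleotide deletion.
    reference = "ACGTTTAC"
    strain = "ATGT---C"
    for pos, ref_base, alt_base in list_variants(reference, strain):
        print(pos, f"{ref_base}>{alt_base}")
```

consecutive deletion columns are reported one base at a time; grouping them into a single event (such as the tri-nucleotide deletion discussed later) would be a small extension of the same loop.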
functional implications of each type of nucleotide variation were inferred from the lod (log-odds) score, calculated by the formula $\mathrm{lod} = \log_e(\mathrm{obs}/\mathrm{exp})$, where 'obs' is the observed frequency of a specific alteration in the genome and 'exp' is the expected frequency of that specific alteration given the individual proportions of the nucleotides in the genome sequences. to obtain the different protein sequences of the translated genome, open reading frames were generated using orf finder. six different orfs were generated, which covered the entire genome sequence. our protein sequences were aligned with existing protein sequences of sars-cov- present in the 'ncbi virus' database to check which orf corresponds to which known protein. the amino acid alterations were identified from the nucleotide information and were classified into synonymous (s) and non synonymous (ns). the functional impacts of all ns mutations at protein level were analysed using different snp annotation tools (i.e. sift, snap, polyphen and metasnp). the fatality rate of a country was calculated by summing the total number of deaths and critical cases and dividing by the total number of detected cases [fatality rate = (total no. of deaths + total no. of severe cases) / total no. of detected cases]. the fatality rates were matched against the rarity of the nucleotide alterations and functional impact of the nonsynonymous changes at protein level. all the strains of different geographical locations were categorized depending upon the number and type of mutations they had. multiple sequence alignment of whole genome sequences, representative entries from countries, clearly showed extremely conserved sequence with only mutations (supplementary material ). only the viral genome sequence from the taiwan entry does not show any mutation, suggesting this is the ancestral genome sequence. all the mutations of sars-cov- identified are reported in table . 
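the two formulas given in the methods, the lod score and the fatality rate, can be written out in python as a short sketch. the function and parameter names are hypothetical, and the observed and expected frequencies are taken as inputs because the text does not spell out exactly how the expected frequency was derived from the nucleotide proportions; the numbers in the usage block are placeholders, not data from the study.

```python
import math


def lod_score(observed_freq: float, expected_freq: float) -> float:
    """Natural-log odds of a substitution: log_e(obs / exp).

    Negative values mean the alteration is seen less often than expected
    (rarer); positive values mean it is seen more often (more common).
    """
    if observed_freq <= 0 or expected_freq <= 0:
        raise ValueError("frequencies must be positive to take a logarithm")
    return math.log(observed_freq / expected_freq)


def fatality_rate(deaths: int, severe_cases: int, detected_cases: int) -> float:
    """(deaths + severe cases) / detected cases, as defined in the text."""
    if detected_cases <= 0:
        raise ValueError("detected_cases must be positive")
    return (deaths + severe_cases) / detected_cases


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(lod_score(observed_freq=0.02, expected_freq=0.06))   # negative: rarer than expected
    print(lod_score(observed_freq=0.40, expected_freq=0.06))   # positive: more common than expected
    print(fatality_rate(deaths=10, severe_cases=5, detected_cases=100))  # 0.15
```

note that this fatality rate counts severe cases in the numerator, so it is not directly comparable to a conventional case fatality rate based on deaths alone.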
among all mutations, a three nucleotide deletion (from - position) was observed only in the mt strain of india. the phylogenetic tree analysis showed the proximity of the viral strains of different covid- affected countries. three distinct clusters have been found ( ). all the mutations were screened by their rarity in this viral genome: a>t, g>a and t>a are found to be very rare alterations with lod score values less than - · , whereas c>t is the most common alteration with lod score · . variation analysis at protein level: different protein sequences were obtained using orf finder and matched with the existing orf ab polyprotein, spike like surface glycoprotein, nucleocapsid phosphoprotein etc. to confirm that each in silico orf corresponded to a sars-cov- protein. next, observed mutations were confirmed at amino acid level, identifying a total of nonsynonymous (ns) and synonymous (s) mutations (table ). analysis with different snp annotation tools (sift, snap, polyphen , metasnp) showed the functional impact of ns mutations at different levels, predicting either tolerance of the mutation, or disease causing ability, or risk or damage, which together can inform the fatality rate. the fatality rates of italy ( · %), spain ( · %) and sweden ( · %) were significantly higher, whereas it was less than % in countries like nepal (zero fatality), finland ( · %), vietnam ( · %), usa ( · ), australia ( · %) and india ( · ). in the case of japan, brazil and china, the fatality rate was moderate, i.e. approximately - % (figure ). the number of mutations and the presence of either very rare mutations or functionally important ns mutations, or both, are found to be strongly linked with the fatality rate (table a-table c). it is extremely important to understand how the ancestral strain (s) leads to the lethal strain (l) and other clusters observed according to phylogeny. 
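the synonymous / non-synonymous classification used in this analysis can be illustrated with a short python sketch that translates the reference and mutated codons with the standard genetic code and compares the resulting amino acids. this is only an illustration of the general technique, not the authors' workflow (which relied on orf finder and protein alignment); the function name `classify_codon_change` is hypothetical, and the example codon pairs are taken from the codon changes listed in the mutation table.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str):
    """Return (ref amino acid, alt amino acid, 'S' or 'NS') for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    label = "S" if ref_aa == alt_aa else "NS"
    return ref_aa, alt_aa, label


if __name__ == "__main__":
    for ref, alt in [("ggt", "agt"), ("aga", "agg"), ("ttt", "tgt"), ("gac", "gat")]:
        ref_aa, alt_aa, label = classify_codon_change(ref, alt)
        print(f"{ref}>{alt}: {ref_aa}>{alt_aa} ({label})")
```

a change labelled "s" here leaves the protein sequence unchanged; whether it is functionally silent is a separate question, as the discussion of synonymous mutations later in the paper points out.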
our analysis showed ancestral strain (s) give rise to an intermediate strain (i) with single very common mutation, a transition from c to t. from i, three different strains originate, i.e (a) lethal strain (l), with additional mutations over i, (b) extremely lethal strain (e) that contains very rare mutations and a (c) non lethal strain (n) that contains favourable mutations at surface glycoprotein, which possibly inhibit the interaction of ace and favouring non-lethal outcome ( figure ). the current global pandemic of the novel coronavirus covid- created two epicenters first at hubei province in people's republic of china and presently in europe. sars-cov- , like its close relatives sars-cov and mers-cov, is also pathogenic but with a higher infectivity rate. the increasing number of cases and wide spread disease raise grave concerns about the future trajectory of the pandemic and thus, a better understanding regarding the molecular divergence of viral strains and pathogenesis, is utmost important. the present study attempted to categorize covid affected countries based on molecular pathogenesis. three important factors were considered, i.e, number of mutations during evolution, rarity of the allelic substitution and functional alteration of the non-synonymous mutations. we screened and compared extent of mutations observed in genome sequence of sars-cov- . all reported genome sequences of affected countries have been analyzed. so far, studies indicated that there are only two strains, s, possibly the ancestral one and l, might be the lethal and more aggressive one. the distinct clusters that we observed from our phylogenetic tree construction raise the assumption of existence of many more unknown strains (figure ). one interesting report substantiating our finding is from shenzhen which showed possible generation of new strain which neither belongs to 's' nor to 'l' subtype. 
we evaluated the extent of molecular divergence by msa and pair-wise alignment and subsequently tried to decipher the protein alteration parallel to the mutations. as a rna virus, it was hypothesized that -ncov mutate faster than dna virus. the genome-wide phylogenetic tree indicated that -ncov was closest to sars-like coronavirus, with % sequence similarity. , when we evaluated the sars-cov ( ) and sars-cov- ( ) from italy, sequence identity was . %, suggesting new mutations, might be deleterious in later strain. our first approach was to categorize the countries depending on number of mutations and type of mutation they had. it was found that similar number of mutations does not correlate well with similar extent of fatality outcome. example, patients from australia, japan and italy all show mutations in the viral genome, but australia has least fatality rate, while italy has maximum. again, numbers of mutations are in china, in india and in both sweden and spain. the observations mentioned above strongly indicate that not the total number of mutation, but the nature of mutations finally guides the overall fatality outcome. in the case of italy, we found three deadly mutations in (i.e. a>t, very rare allele; g>a, very rare allele and g>t) with high disease outcome but the wild counterpart of these found in did not lead to any fatality, suggesting the significance of newly evolved mutations. although, a>t mutation does not alter amino acid (a a), increasing number of evidences suggest that synonymous mutations could have effects on splicing, transcription, that ultimately alter the phenotype, disrupting their silence. spain is also carrying a very rare allele transversion t>a, that occurs in orf ab gene of virus. orf ab gene transcribes into a polyprotein and cleaving by protease ( clpro) and papain-like protease (plpro) produces several non-structural proteins, which are important for replication as well as virulence for coronavirus. 
thus, an alteration in this region might alter the virulence, and associated fatality outcome. we assume, while some mutations are pathogenic, some will be favorable and will undergo positive selection pressure. herein, we tried to elucidate the possible interaction between ace and spike glycoprotein. it is well established that for virus entry, spike glycoprotein (s) present on the cov can be a neutralization antibody and it binds to its receptor followed by membrane fusion. it has been inferred that some favorable changes in viral glycoprotein may limit the increase of fatality. this kind of favorable mutation was found in some strains of covid- from india, australia and sweden. detailed screening depicted that australian strain carry a rare allele transversion (t>g) which results into a ns mutation (s r) on surface glycoprotein s domain which may affect the binding of hace molecules, thus these strains become non-lethal to human. in this study, only one tri-nucleotide deletion at s domain of surface glycoprotein has been found and that is found in indian strains along with a ns mutation (r i) in the receptor binding domain (rbd) of s subunit that disfavor viral entry by inhibiting hace interaction. on the other hand, brazilian strain acquires four common allelic transitions, of which only two non-synonymous mutations affect protein alteration. surprisingly, swedish strain carry both extreme lethal ( g>a, very rare allele leading to g s at orf ab polyprotein) and also favorable mutations ( t>g, rare allele leading to f c at surface glycoprotein) and thus cumulatively lower the severity of the disease. in summary, the present study reveals that the fatality rate increases with not only the number of mutations but also depending on its allelic rarity as well as functional alteration of protein. 
surface glycoprotein domain is very important for host-carrier interactions and hence the mutations affecting surface glycoprotein can be one of the important mechanisms which alter the viral entry and pathogenesis. future studies may uncover more genetic information at the molecular level as well as structural levels of the proteins, because without this knowledge it will be difficult to identify drug targets and prepare a vaccine. we hope our work will help in that direction.
sb, pb designed the study; sb, sd analyzed the data; sb, pb interpreted the data; sb, sd, sbh searched literatures; sd, sbh, pb wrote the manuscript; sd prepared figures; pb supervised overall study. we declare no competing interests.
zhou p, yang xl, wang
common
c t · c>t · · · very common
a g · a>g · · - · rare
t a · t>a · · - · very rare
c t · c>t · · · very common
g t/n · g>t · · · common
g>a · · - · very rare
c g · c>g · · · common
t c · t>c · · · common
c t · c>t · · · very common
c t · c>t · · · very common
c t · c>t · · · very common
c t · c>t · · · very common
c t · c>t · · · very common
a g · a>g · · - · rare
t c · t>c · · · common
c t · c>t · · · very common
- tta --- · na na na na na
t g · t>g · · - · rare
g t · g>t · · · common
t g · t>g · na na na
c t · c>t · ·
t>c a>g t>g g>t
ggt>agt aga>agg tcc>tcg ttt>ctt aca>acg ttt>tgt ggt>gtt
g s r r s s f l t t f c g v
ns s s ns s ns ns
· mt spain
c>t t>a c>t g>t t>c c>t c>t
agc>agt ttt>tat tac>tat gga>gta ttt>ttc gac>gat tca>tta
s s f y y y g v f f d d s l
s ns s
on the origin and continuing evolution of sars-cov- genetic diversity and evolution of sars-cov- mechanisms of viral mutation the establishment of reference sequence for sars cov and variation analysis sift: predicting amino acid changes that affect protein function hiv signature and sequence variation analysis predicting functional effect of human missense mutations using polyphen- collective judgment predicts disease-associated single nucleotide variants mutations in proteins 
nonstructural proteins ns b and ns are likely to be phylogenetically associated with evolution of -ncov the -new coronavirus epidemic: evidence for virus evolution genetic variation: synonymous mutations break their silence the authors acknowledge ugc-dae csr for providing fellowship to shuvam banerjee.
key: cord- -sbon aes authors: mok, chee keng; ng, yan ling; ahidjo, bintou ahmadou; hua lee, regina ching; choy loe, marcus wing; liu, jing; tan, kai sen; kaur, parveen; chng, wee joo; wong, john eu-li; wang, de yun; hao, erwei; hou, xiaotao; tan, yong wah; mak, tze minn; lin, cui; lin, raymond; tambyah, paul; deng, jiagang; hann chu, justin jang title: calcitriol, the active form of vitamin d, is a promising candidate for covid- prophylaxis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sbon aes covid- , the disease caused by sars-cov- ( ), was declared a pandemic by the world health organization (who) in march ( ). while awaiting a vaccine, several antivirals are being used to manage the disease with limited success ( , ). to expand this arsenal, we screened compound libraries: a united states food and drug administration (fda) approved drug library, an angiotensin converting enzyme- (ace ) targeted compound library, a flavonoid compound library as well as a natural product library. of the compounds identified with activity against sars-cov- , were shortlisted for validation. we show for the first time that the active form of vitamin d, calcitriol, exhibits significant potent activity against sars-cov- . this finding paves the way for consideration of host-directed therapies for ring prophylaxis of contacts of sars-cov- patients.
despite implementation of physical distancing, mask wearing, quarantine and the tireless efforts expended for contact tracing, the rapid transmissibility of sars-cov- even during the asymptomatic phase has made containment of this virus extremely difficult. the main proposed strategy to curb this pandemic is the implementation of mass vaccination programs. once a suitable vaccine is discovered, the significant challenges associated with vaccination programs, e.g. limitations in manufacturing capabilities and associated costs, are anticipated to significantly affect uptake of vaccinations globally. we therefore propose that ring prophylaxis, which had been previously proposed for influenza pandemics and involves treating close contacts of a confirmed case with an antiviral prophylaxis to further curb community spread ( ), be considered as a viable strategy to reduce transmission of sars-cov- . in an effort to identify potential candidates for sars-cov- chemoprophylaxis, we performed a virus-induced cytopathic effect (cpe) based screen of several small molecule libraries in sars-cov- infected vero e cells (fig. a). the african green monkey kidney epithelial vero e cells were used for the screen as these cells are highly susceptible to coronaviruses and exhibit obvious cpe upon infection. 
a -compound natural product library and a library of ace targeted inhibitors (the ace receptor was identified to be necessary for sars-cov- infection ( )) were used in a pre-infection treatment screen to identify potential viral entry inhibitors, while a post-infection treatment screen was performed using both a compound flavonoid library and an fda-approved compound library in order to identify potential inhibitors targeting post-entry steps of the sars-cov- replication cycle. for the pre-infection treatment screen, vero e cells were treated with compounds for two hours prior to infection with sars-cov- . the post-infection treatment screen, on the other hand, was performed by adding compounds to the vero e cells hour post-infection with sars-cov- . compounds which showed less than % cpe compared to the . % dmso vehicle control with sars-cov- infection were identified as hits (tables s -s ) . using this method, we identified compounds from the pre-infection treatment screen and compounds from the post-infection treatment screen with activity against sars-cov- (table s ) . as expected, our hit list included the tyrosine kinase inhibitors masitinib and imatinib mesylate ( ), the antiretroviral drug lopinavir ( ) and the calpain inhibitor calpeptin ( ), all compounds reported to inhibit sars-cov, sars-cov- or mers-cov. this supports the robustness of our primary screen for potential antivirals against sars-cov- and our confidence in it. of these primary hits, compounds were selected for downstream validation (table ). these included compounds from the pre-infection treatment screen (citicoline, pravastatin sodium and tenofovir alafenamide) and four compounds from the post-infection treatment screen (imatinib mesylate, calcitriol, dexlansoprazole, and prochlorperazine dimaleate). these compounds were selected based on level of cpe inhibition in the primary screens (fig. b) , known mechanism of action and existing fda approval or generally recognized as safe (gras) status.
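The hit criterion described above (cpe below a cutoff relative to the infected dmso vehicle control) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the normalization to the mean of the control wells, and the cutoff value passed in are all assumptions, and the actual cutoff is not recoverable from this text.

```python
def relative_cpe(well_cpe, control_cpes):
    """CPE of a treated well as a percentage of the mean infected vehicle control.

    Assumption: normalization to the mean control CPE; the source does not
    state how the comparison to the DMSO control was computed.
    """
    mean_control = sum(control_cpes) / len(control_cpes)
    return 100.0 * well_cpe / mean_control


def call_hits(cpe_by_compound, control_cpes, max_relative_cpe):
    """Return compounds whose relative CPE is below the cutoff, sorted by name.

    max_relative_cpe is a hypothetical parameter standing in for the
    screen's cutoff.
    """
    return sorted(
        name
        for name, cpe in cpe_by_compound.items()
        if relative_cpe(cpe, control_cpes) < max_relative_cpe
    )
```

For example, with made-up readings `{"a": 10.0, "b": 80.0, "c": 45.0}`, control wells `[90.0, 110.0]` and a cutoff of 50, compounds `a` and `c` would be called as hits.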
fda approval was considered an important factor as pre-existing data on safety and dosage would allow expedited decisions to be made regarding the potential use of these compounds in vulnerable populations to stymie the current pandemic. validation assays to determine changes in infectious virus titres upon treatment were carried out by testing selected hit compounds in dose-dependent assays in vero e cells to confirm the primary screen observation and also in the human hepatocarcinoma huh cell line, as the latter expresses high levels of the ace receptor ( ) and supports replication of coronaviruses ( ) . cell viability assays were also carried out to ensure that reduction of sars-cov- titres was not due to cytotoxic effects of the compounds on host cells. cc and ic values were obtained for each of the compounds in vero e cells (table s ) and huh cells (table s ) (table s ). given that the huh cell line is a hepatocarcinoma cell line and therefore not the first point of entry for sars-cov- in humans, we decided to test the three most promising compounds (imatinib mesylate, citicoline and calcitriol) against sars-cov- in the primary human nasal epithelial cell line (hnec), which is a known in vivo target of sars-cov- ( ) (fig. a) . despite its significant activity in the continuous cell lines (vero e and huh ), in hnecs, imatinib mesylate only displayed a . log reduction in viral titre (fig. ) . interestingly, out of the three compounds only calcitriol proved effective against sars-cov- with a reduction of . log in viral titre (fig. ). while recent data have shown that vitamin d levels are negatively associated with morbidity and mortality of covid- cases ( , ) , this is the first report of a direct inhibitory effect of calcitriol on sars-cov- . vitamin d is well known to modulate host immune responses through the production of antimicrobial peptides such as cathelicidin to promote autophagy ( ) .
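The "log reduction in viral titre" reported above is, in the usual convention, the base-10 logarithm of the ratio between the untreated and treated titres. A small sketch under that assumption (the function name and the example titres are illustrative and do not come from the paper):

```python
import math


def log_reduction(control_titre, treated_titre):
    """Base-10 log reduction of a treated titre relative to the untreated control.

    Both titres must be in the same units (for example pfu/ml) and positive.
    """
    if control_titre <= 0 or treated_titre <= 0:
        raise ValueError("titres must be positive")
    return math.log10(control_titre / treated_titre)
```

With hypothetical titres of 1e6 and 1e4, the function gives a 2 log reduction, i.e. a 100-fold drop in infectious virus.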
it has proven essential for host defenses against many intracellular pathogens including respiratory pathogens such as mycobacterium tuberculosis, and has been shown to also possess anti-inflammatory properties ( ) . a recent study by smith and colleagues ( ) showed an association between vitamin d deficiency and sars-cov- infection and covid- associated mortality. the authors speculated that vitamin d supplementation could protect against sars-cov- infection and improve patient disease outcomes ( ) , and our finding certainly provides credence to this hypothesis. given that calcitriolmediated inhibition occurred upon post-treatment of vero e cells and hnecs, it is likely that its mechanism of antiviral action targets the post-entry phase of viral replication. use of host-directed therapies (hdts) for prevention of infections is certainly not a new idea. most of these therapies however rely mainly on the use of vaccines, convalescent plasma and monoclonal antibodies ( , ) . small molecule hdts have been used adjunctively for diseases such as tuberculosis ( ) and have been proposed for viral pandemics ( ) . this strategy would overcome some of the costs and challenges associated with antiviral production, including the emergence of drug resistance ( ) . vitamin d is a dietary supplement that is cheap and widely available even in low and middle income countries and is converted by the liver and kidneys into the active compound calcitriol ( , ) . it is however important that our findings be confirmed in vivo as well as in clinical trials in order to assess efficacy, optimal dosage, treatment duration, toxicity and safety of calcitriol. given the high transmissibility of sars-cov- globally ( ), if these findings can be replicated in clinical trials, calcitriol may certainly prove to be an effective tool in the effort to control the pandemic while waiting for an effective vaccine to be rolled out globally. 
at the very least, these findings certainly pave the way for consideration of host-directed therapies for ring prophylaxis of contacts of sars-cov- patients. coronaviridae study group of the international committee on taxonomy of v. the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- who. rolling updates on coronavirus disease accessed treatments administered to the first reported cases of covid- : a systematic review current perspective of antiviral strategies against covid- tackling the next influenza pandemic a pneumonia outbreak associated with a new coronavirus of probable bat origin repurposing of clinically developed drugs for treatment of middle east respiratory syndrome coronavirus infection factors associated with prolonged viral shedding and impact of lopinavir/ritonavir treatment in hospitalised noncritically ill patients with sars-cov- infection inhibition of severe acute respiratory syndrome-associated coronavirus (sarscov) by calpain inhibitors and beta-d-n -hydroxycytidine highly infectious sars-cov pseudotyped virus reveals the cell tropism and its correlation with receptor expression replication of respiratory viruses, particularly influenza virus, rhinovirus, and coronavirus in huh hepatocarcinoma cell line sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes the role of vitamin d in the prevention of coronavirus disease infection and mortality evidence that vitamin d supplementation could reduce risk of influenza and covid- infections and deaths the vitamin d-antimicrobial peptide pathway and its role in protection against infection the role of vitamin d in the prevention of coronavirus disease infection and mortality accessed a review of sars-cov- and the ongoing clinical trials pandemic influenza and healthcare demand in the netherlands: scenario analysis vitamin d deficiency and liver disease therapeutic use of calcitriol. 
current vascular pharmacology presymptomatic transmission of sars-cov- -singapore human nasal epithelial cells derived from multiple subjects exhibit differential responses to h n influenza virus infection in vitro the use of nasal epithelial stem/progenitor cells to produce functioning ciliated cells in vitro key: cord- -pu fetq authors: zang, ruochen; castro, maria f.g.; mccune, broc t.; zeng, qiru; rothlauf, paul w.; sonnek, naomi m.; liu, zhuoming; brulois, kevin f.; wang, xin; greenberg, harry b.; diamond, michael s.; ciorba, matthew a.; whelan, sean p.j.; ding, siyuan title: tmprss and tmprss mediate sars-cov- infection of human small intestinal enterocytes date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pu fetq both gastrointestinal symptoms and fecal shedding of sars-cov- rna have been frequently observed in covid- patients. however, whether sars-cov- replicate in the human intestine and its clinical relevance to potential fecal-oral transmission remain unclear. here, we demonstrate productive infection of sars-cov- in ace + mature enterocytes in human small intestinal enteroids. in addition to tmprss , another mucosa-specific serine protease, tmprss , also enhanced sars-cov- spike fusogenic activity and mediated viral entry into host cells. however, newly synthesized viruses released into the intestinal lumen were rapidly inactivated by human colonic fluids and no infectious virus was recovered from the stool specimens of covid- patients. our results highlight the intestine as a potential site of sars-cov- replication, which may contribute to local and systemic illness and overall disease progression. correspondence: siyuan ding, siyuan.ding@wustl.edu infection of cultured cells (fig. s a) . we also observed multiple events of syncytia formation between iecs in both d monolayer (fig. s b ) and in d matrigel (fig. s c and video s ). 
the cell fusion and subsequent cytopathic effect may have important implications regarding the common gi symptoms seen in covid- patients ( ) ( ) ( ) ( ) ( ) . tmprss and matriptase (encoded by st ) shared a highly specific expression pattern in human iecs (fig. a) . previous studies indicated that both tmprss and st facilitate influenza a virus but neither play a role in sars-cov infection ( - ). notably, in our mouse scrna-seq dataset, both tmprss and st were found to be present in mature enterocytes and had an increased co-expression pattern with ace than tmprss (fig. s a) . to mechanistically dissect the entry pathway of sars-cov- in iecs, we set up an hek ectopic expression system to evaluate the sars-cov- chimera virus infectivity. as reported ( , ), ace conferred permissiveness to sars-cov- infection ( fig. b and s b). while tmprss alone did not mediate viral infection, co-expression of tmprss significantly enhanced ace -mediated infectivity (fig. b) . importantly, we found that expression of tmprss but not st also resulted in a significant increase in the levels of viral rna and infectious virus titers in the presence of ace ( fig. b and s c) . tmprss and tmprss had an additive effect and mediated the maximal infectivity in cell culture ( fig. b and s c-e) . we reasoned that tmprss may function as a cell surface serine protease that enhances s cleavage to promote viral entry. to test this hypothesis, we co-expressed a c-terminally strep-tagged full-length sars-cov- s protein in an hek cell line that stably expresses ace with or without additional introduction of tmprss or tmprss . in mock cells, we readily observed the full-length s and a cleaved product that corresponded to the size of s fragment, presumably cleaved by furin protease that is ubiquitously expressed ( ). importantly, the expression of tmprss or tmprss enhanced s cleavage, as evidenced by the reduction of full-length s and increase of s levels (fig. c) . 
based on these results, we hypothesize that tmprss serine proteases assist virus infection by inducing s cleavage and exposing the fusion peptide for efficient viral entry. indeed, tmprss expression did not affect virus binding (data not shown) whereas endocytosis was significantly enhanced (fig. d ). to assess the s protein's fusogenic activity, we examined sars-cov- s mediated cell-cell fusion in the presence or absence of tmprss or tmprss . previous work with sars-cov and mers-cov suggested that s mediated membrane fusion takes place in a cell-type dependent manner ( ). we found that the ectopic expression of s alone was sufficient to induce syncytia formation, independent of virus infection (fig. e ). this process was dependent on a concerted effort from ace and tmprss serine proteases. tmprss expression triggered s-mediated cell-cell fusion, although to a lesser extent than tmprss (fig. e) . collectively, we have shown that tmprss and tmprss activate sars-cov- s and enhance membrane fusion and viral endocytosis into host cells. with sars-cov- s containing donor cells that were transfected with tdtomato. we found that tmprss was able to function in cis, i.e. on the same cells as ace , and infection of primary human iecs, we used a crispr/cas -based method to genetically delete tmprss or tmprss in human duodenum enteroids. efficient knockout was confirmed by western blot (fig. s a) . importantly, abrogating tmprss expression led to a -fold reduction in sars-cov- chimera virus replication in human enteroids, even more significant than tmprss knockout (fig. b) , highlighting its importance in mediating virus replication in primary cells. in parallel to genetic depletion, we also tested the effect of pharmacological inhibition of tmprss serine proteases on virus replication. we pre-treated enteroids with camostat mesylate, a selective inhibitor of tmprss over other serine proteases including trypsin, prostasin and matriptase ( ).
while camostat treatment significantly inhibited sars-cov- chimera virus infection, soybean trypsin inhibitor (sbti) and e- d, a cysteine protease inhibitor that blocks cathepsin activity and has excellent inhibitory activity against sars-cov- in vitro ( ), did not have a major impact on virus replication in enteroids (fig. c) . combined with the human enteroid results ( fig. - ) , we hypothesize that sars-cov- has the potential to, and likely does, replicate in human iecs but is then quickly inactivated in the gi tract. we collected stool specimens from a small group of covid- patients. from out of fecal samples, we detected high rna copy numbers of the sars-cov- viral genome (fig. b ). however, we were unable to recover any infectious virus using a highly sensitive cell-based assay ( ). in this study we set out to address an important basic and clinically relevant question: is the persistent viral rna seen in covid- patients' stool infectious and transmissible? we showed that despite sars-cov- 's ability to establish robust infection and replication in human iecs ( fig. - ) , the virus is rapidly inactivated by colonic fluids and we were unable to detect infectious virus in fecal samples (fig. ) . thus, the large quantities of integrity and give rise to gi pathology seen in covid- patients. it is possible that in the small intestine, where sars-cov- is relatively stable, additional proteases such as trypsin likely enhance viral pathogenesis by triggering more robust iec fusion (fig. s a) . in this sense, although viruses are not released into the basolateral compartment ( were selected under μg/ml g . hek cells stably expressing human ace and tmprss were selected under μg/ml g and μg/ml blasticidin. table s ) in the presence of polybrene ( µg/ml). at hours post transduction, puromycin ( µg/ml) was added to the maintenance media. puromycin was adjusted to µg/ml upon the death of untransduced control enteroids. table s ).
gene knockout enteroids were seeded into monolayers and infected with . x pfus of sars-cov- chimera virus for hours. the expression of vsv-n was measured by rt-qpcr and normalized to that of (c) human duodenum enteroids seeded into collagen-coated -well plates were differentiated for days, pre-treated with μg/ml of soybean trypsin inhibitor (sbti), μm of camostat mesylate, or μm of e- d for minutes, and infected with . a new coronavirus associated with human respiratory disease in a pneumonia outbreak associated with a new coronavirus of probable bat origin sars-cov- cell entry depends on ace and tmprss functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses structure, function, and antigenicity of the sars-cov- spike peyrin-biroulet, diarrhea during covid- infection: pathogenesis, epidemiology, prevention and management clinical characteristics of covid- patients with digestive symptoms in hubei, china: a descriptive, cross-sectional, multicenter study gastrointestinal manifestations of sars-cov- infection virus load in fecal samples from the hong kong cohort and systematic review and meta-analysis first case of novel coronavirus in the united states molecular and serological investigation of -ncov infected patients: implication of multiple shedding routes prolonged presence of sars-cov- viral rna in faecal samples evidence for gastrointestinal infection of sars-cov- virological assessment of hospitalized patients with covid- tissue-specific amino acid transporter partners ace and collectrin differentially interact with hartnup mutations ace links amino acid malnutrition to microbial ecology and intestinal inflammation rotavirus vp targets mavs for degradation to inhibit type iii interferon expression in intestinal epithelial cells innate immune response to homologous rotavirus infection in the small intestinal villous epithelium at single-cell resolution controlling epithelial polarity: a human enteroid model for 
host- human vp * mabs neutralize rotavirus selectively in human intestinal epithelial cells activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites different host cell proteases activate the sars-coronavirus spike-protein for cell-cell and virus-cell fusion epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study clinical characteristics of coronavirus disease clinical features of patients infected with novel coronavirus in wuhan epidemiological, clinical and virological characteristics of cases of coronavirus-infected disease (covid- ) with gastrointestinal symptoms clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response matriptase, hat, and tmprss activate the hemagglutinin of h n influenza a viruses matriptase proteolytically activates influenza virus and promotes multicycle replication in the human airway epithelium tmprss and furin are both essential for proteolytic activation and spread of sars- cov- in human airway epithelial cells and provide promising drug targets. 
biorxiv structure, function, and evolution of coronavirus spike proteins the cutting edge: membrane-anchored serine protease activities in the pericellular microenvironment tmprss activates the human coronavirus e for cathepsin-independent host cell entry and is expressed in viral target cells in the respiratory epithelium simultaneous treatment of human bronchial epithelial cells with serine and cysteine protease inhibitors prevents severe acute respiratory syndrome coronavirus entry fields virology enteric involvement of severe acute respiratory syndrome- associated coronavirus infection viral shedding and antibody response in patients with clinical infectious diseases : an official publication of the infectious diseases society of america site-specific n-glycosylation characterization of recombinant sars cov- spike proteins using high-resolution mass spectrometry. biorxiv aerosol and surface stability of sars-cov- as compared with sars-cov- . the new england journal of medicine stability of sars-cov- in different environmental conditions cell-to-cell spread of hiv- and evasion of neutralizing antibodies a sars-cov- -human protein-protein interaction map stag deficiency induces interferon responses via cgas-sting pathway and restricts virus infection retinoic acid and lymphotoxin signaling promote differentiation of human intestinal m cells dynamic expression profiling of type i and type iii interferon-stimulated hepatocytes reveals a stable hierarchy of gene expression a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov nlrp b inflammasome restricts rotavirus infection in intestinal epithelial cells rna sequencing data with many zero counts spatial reconstruction of single-cell gene expression data recovering gene interactions from single-cell data using key: cord- - djnz p authors: bert, nina le; tan, anthony t; kunasegaran, kamini; tham, christine y l; hafezi, morteza; chia, adeline; chng, melissa; lin, meiyin; tan, 
nicole; linster, martin; chia, wan ni; chen, mark i-cheng; wang, lin-fa; ooi, eng eong; kalimuddin, shirin; tambyah, paul anantharajal; low, jenny guek-hong; tan, yee-joo; bertoletti, antonio title: different pattern of pre-existing sars-cov- specific t cell immunity in sars-recovered and uninfected individuals date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: djnz p memory t cells induced by previous infections can influence the course of new viral infections. little is known about the pattern of sars-cov- specific pre-existing memory t cells in human. here, we first studied t cell responses to structural (nucleocapsid protein, np) and non-structural (nsp- and nsp of orf ) regions of sars-cov- in convalescent from covid- (n= ). in all of them we demonstrated the presence of cd and cd t cells recognizing multiple regions of the np protein. we then show that sars-recovered patients (n= ), years after the outbreak, still possess long-lasting memory t cells reactive to sars-np, which displayed robust cross-reactivity to sars-cov- np. surprisingly, we observed a differential pattern of sars-cov- specific t cell immunodominance in individuals with no history of sars, covid- or contact with sars/covid- patients (n= ). half of them ( / ) possess t cells targeting the orf- coded proteins nsp and , which were rarely detected in covid- - and sars-recovered patients. epitope characterization of nsp -specific t cells showed recognition of protein fragments with low homology to “common cold” human coronaviruses but conserved among animal betacoranaviruses. thus, infection with betacoronaviruses induces strong and long-lasting t cell immunity to the structural protein np. understanding how pre-existing orf- -specific t cells present in the general population impact susceptibility and pathogenesis of sars-cov- infection is of paramount importance for the management of the current covid- pandemic. 
severe acute respiratory syndrome coronavirus- (sars-cov- ) is the cause of the coronavirus disease (covid- ) . this disease has spread pandemically placing lives and economies of the world under severe stress. sars-cov- infection is characterized by a broad spectrum of clinical syndromes, ranging from mild influenza-like symptoms to severe pneumonia and acute respiratory distress syndrome . it is common to observe in human the ability of a single virus to cause different pathological manifestations. this is often due to multiple contributory factors including the quantity of viral inoculum, the genetic background of patients and the presence of concomitant pathological conditions. moreover, an established adaptive immunity towards closely related or completely different viruses can increase protection or enhance disease severity . sars-cov- belongs to coronaviridae, a family of large rna viruses infecting many animal species. six other coronaviruses are known to infect human. four of them are endemically transmitted and cause common cold (oc , hku , e and nl ), while sars-cov (defined from now as sars-cov- ) and mers-cov have caused limited epidemics of severe pneumonia . all of them trigger antibody and t cell responses in infected patients: however, antibody levels appear to wane relatively quicker than t cells. in sars recovered patients, sars-cov-specific antibodies dropped below detection limit within to years , while sars-cov-specific memory t cells can be detected even at years after infection . since the sequences of selected structural and nonstructural proteins are highly conserved among different coronaviruses (i.e. nsp and nsp are % and % identical, respectively, between sars-cov- , sars-cov- and the bat-sl-covzxc ), we studied whether crossreactive sars-cov- -specific t cells are present in individuals who resolved from sars-cov- or sars-cov- infection. 
we also studied these t cells in individuals with no history of sars or covid- and who were also not in contact with sars-cov- infected cases. collectively these individuals are hereafter referred to as sars-cov- / unexposed. sars-cov- -specific t cells have just started to be characterized in covid- patients , and their potential protective role has been inferred from studies in sars and mers patients. to study sars-cov- specific t cells associated with viral clearance, we collected peripheral blood of individuals who recovered from mild to severe covid- (demographic, clinical and virological information are summarized in extended data table ) and studied the t cell response against selected structural (nucleocapsid protein-np) and non-structural proteins (nsp and nsp of orf ) of the large sars-cov- proteome ( figure a) . we selected nucleocapsid protein as it is one of the more abundant structural proteins produced and has large homology between different betacoronaviruses (extended data fig. ) . nsp and nsp were selected for their complete homology between sars-cov- , sars-cov- and other animal coronaviruses belonging to the betacoronavirus genus (extended data fig. ) , and because they are representative of the orf a/b polyprotein encoding the replicase-transcriptase complex . this polyprotein is the first to be translated upon coronavirus infection. we synthesized -mer peptides overlapping by amino acids (aa) covering the whole length of nsp ( aa), nsp ( aa) and np ( aa) that were organized in pools of approximately peptides each (np- , np- , nsp - , nsp - , nsp - ) and in a single pool of peptides spanning nsp ( figure b) . the unbiased method with overlapping peptides was utilized instead of peptide selection by bioinformatic approaches, since the performance of such algorithms in ethnically-diverse asians is often suboptimal .
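The overlapping-peptide design described above can be illustrated with a short sketch. The peptide length, overlap and pool size are left as parameters because the actual values are not recoverable from this text; the function names and the choice to anchor the last peptide at the c-terminus are assumptions made for illustration only.

```python
def overlapping_peptides(sequence, length, overlap):
    """Tile a protein sequence with fixed-length peptides that overlap.

    The final peptide is anchored to the C-terminus so the whole sequence
    is covered even when its length is not a multiple of the step.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    step = length - overlap
    peptides = []
    start = 0
    while start + length < len(sequence):
        peptides.append(sequence[start:start + length])
        start += step
    # Last window: take the final `length` residues (or the whole sequence if shorter).
    peptides.append(sequence[max(0, len(sequence) - length):])
    return peptides


def split_into_pools(peptides, pool_size):
    """Group consecutive peptides into pools of at most pool_size peptides."""
    return [peptides[i:i + pool_size] for i in range(0, len(peptides), pool_size)]
```

On the toy sequence `"ABCDEFGHIJ"` with length 4 and overlap 2, this yields `ABCD`, `CDEF`, `EFGH`, `GHIJ`; consecutive peptides share two residues, so any epitope shorter than the overlap plus one residue is contained whole in at least one peptide.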
peripheral blood mononuclear cells (pbmc) of recovered covid- patients were stimulated for h with the different peptide pools and virus-specific t cell responses were analyzed by ifn-γ elispot assay. in all tested individuals ( / ) we detected ifn-γ spots following stimulation with the pools of synthetic peptides covering np (figure c/d) . in nearly all individuals np-specific responses could be identified for multiple regions of the protein: / for region - aa (np- ) and / for - aa (np- ). in sharp contrast, responses to nsp and nsp peptide pools were detected at low levels only in out of covid- convalescents tested. direct ex vivo intracellular cytokine staining (ics) was performed to confirm and define the np-specific ifn-γ elispot response. due to the low frequency, np-specific t cells were more difficult to visualize by ics than by elispot, but a clear population of cd and/or cd t cells producing ifn-γ and/or tnf-α was detectable in out of tested subjects ( figure e) . to confirm and further delineate the multispecificity of the np-specific t cell response detected ex vivo in covid- recovered patients, we defined, in nine individuals, the distinct sections of np targeted by t cells. we organized the overlapping peptides covering the entire np into small peptide pools ( - peptides) that were used to stimulate pbmc either directly ex vivo or after an in vitro expansion protocol previously used in hbv or sars recovered subjects . a schematic representation of the peptide pools is shown in figure a . we found that out of covid- recovered patients possess t cells that recognize multiple regions of np of sars-cov- (figure a) . importantly, we then defined single peptides that were able to activate t cells in patients. utilizing a peptide matrix strategy , we first deconvoluted individual peptides responsible for the detected t cell response by ifn-γ elispot.
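A peptide matrix strategy is commonly implemented by laying the peptides out on a grid, testing one pool per row and one pool per column, and taking the peptides at the intersections of positive row and column pools as candidates for individual confirmation. The sketch below assumes that generic layout; the grid shape and function name are hypothetical and are not taken from the paper.

```python
def matrix_candidates(peptides, n_cols, positive_rows, positive_cols):
    """Return peptides located at intersections of positive row and column pools.

    Peptide i sits at row i // n_cols and column i % n_cols of the grid.
    Several positive rows and columns give several candidates, some of which
    may be false; each candidate still needs to be tested on its own.
    """
    candidates = []
    for row in sorted(positive_rows):
        for col in sorted(positive_cols):
            index = row * n_cols + col
            if index < len(peptides):  # the last row of the grid may be incomplete
                candidates.append(peptides[index])
    return candidates
```

The design choice is economy: a grid of r rows and c columns screens r × c peptides with only r + c pools, at the cost of ambiguity when more than one row and column respond.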
subsequently, we confirmed the identified single peptide by testing, with ics, its ability to activate cd or cd t cells ( figure b ). figure b summarizes the different t cell epitopes defined by both elispot and ics, in covid- recovered individuals. remarkably, we observed that covid- convalescents developed t cells specific to regions that were also targeted by t cells of sars recovered subjects. for example, the np region - which is a described cd t cell epitope in sars-cov- to explore this possibility, we tested np and nsp / -specific t cell responses in sars-cov- / unexposed donors. the blood samples were collected either before july or were serologically negative for both sars-cov- neutralizing antibodies and sars-cov- np antibodies . different coronaviruses known to cause common cold in humans like oc , hku , nl and e present different degrees of amino acid homology with sars-cov- (extended data fig. , ) and recent data demonstrated the presence of sars-cov- cross-reactive cd t cells (mainly specific for spike) in sars-cov- unexposed donors . remarkably, we detected np-specific t cells in some of our sars-cov- / unexposed individuals. the pattern of t cell reactivity, however, was different compared to covid- and sars recovered. t cells from sars-cov- / unexposed were directed against a single peptide pool: i.e. none of the donors responded to the np- peptide pool ( figure a ). moreover, a different pattern was observed for nsp -and nsp -specific t cells. these cells were detected in only out of covid- and in out of sars recovered tested, but were present in out of unexposed donors ( figure a/b) . the cumulative proportion of all studied subjects responding to np and orf- -coded nsp and proteins is shown in figure b . 
these latter two t cell specificities were particularly intriguing since the homology between the two protein regions of sars-cov- / and other "common cold" coronaviruses (oc , hku , nl and e) was minimal ( figure d) , especially for the cd peptide epitope. this may suggest that not only human "common cold" coronaviruses, but also other presently unknown coronaviruses, possibly of animal origin, can induce cross-reactive sars-cov- memory t cells in the general population. it was remarkable to find that nsp / -specific t cells were detected in out of ( %) sars-cov- / unexposed donors, despite the fact that our analysis was performed with peptides that cover only % ( aa) of the orf- proteome ( aa). notably, t cells specific for orf- -coded proteins were rarely detected in our sars and covid- convalescents. this is consistent with the findings of grifoni et al : using selected peptides, they detected orf- specific t cells preferentially in some sars-cov- unexposed donors while t cells of covid- recovered donors preferentially recognized structural proteins. the cause of this observed different pattern of immunodominance is presently unknown. we might speculate that a robust t cell response against structural proteins is induced by a productive infection (occurring in covid- and sars recovered patients). individuals exposed to but not infected with possible unknown coronaviruses might just prime orf- -specific t cells.
indeed, induction of virus-specific t cells in "exposed but not we np- np- nsp nsp - nsp - nsp - a pneumonia outbreak associated with a new coronavirus of probable bat origin coronavirus infections: epidemiological, clinical and immunological features and hypotheses cd + t cells cross-reactive with dengue and zika viruses protect against zika virus infection no one is naive: the significance of heterologous t-cell immunity epidemiology of seasonal coronaviruses: establishing the context for the emergence of coronavirus disease origin and evolution of pathogenic coronaviruses disappearance of antibodies to sars-associated coronavirus after recovery memory t cell responses targeting the sars coronavirus persist up to years post-infection genome composition and divergence of the novel coronavirus ( -ncov) originating in china detection of sars-cov- -specific humoral and cellular immunity in covid- convalescent individuals targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals t cell responses to whole sars coronavirus in humans recovery from the middle east respiratory syndrome is key: cord- -kq gu cc authors: klein, joshua a.; zaia, joseph title: assignment of coronavirus spike protein site-specific glycosylation using glycresoft date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kq gu cc widely-available lc-ms instruments and methods allow users to acquire glycoproteomics data. complex glycans, however, add a dimension of complexity to the data analysis workflow. in a sense, complex glycans are post-translationally modified post-translational modifications, reflecting a series of biosynthetic reactions in the secretory pathway that are spatially and temporally regulated. one problem is that complex glycan is micro-heterogeneous, multiplying the complexity of the proteome. 
another is that glycopeptide glycans undergo dissociation during tandem ms that must be considered for tandem ms interpretation algorithms and quantitative tools. fortunately, there are a number of algorithmic tools available for analysis of glycoproteomics lc-ms data. we summarize the principles for glycopeptide data analysis and show use of our glycresoft tool to analyze sars-cov- spike protein site-specific glycosylation. the analysis of glycopeptides from glycoprotein digests using liquid chromatography-mass spectrometry (lc-ms) is well established [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . as with many protein post-translational modifications, the depth and sensitivity of glycopeptide analysis is highest when an enrichment step is used [ , , [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . glycopeptide lc-ms methods provide maximal dynamic range but require specialized processing steps (figure ) to account for glycopeptide heterogeneity and glycosidic bond dissociation [ , ] . in this review, we summarize bioinformatics methods for processing glycopeptide lc-ms data. in proteomics, in order to assign the neutral mass of a molecule, it is necessary to convert the raw data from the m/z space to the neutral mass space. for unmodified peptides, the elemental composition is approximated using an average amino acid (averagine) to allow estimation of the protein composition [ ] . for glycopeptides, it is necessary to adjust the averagine value to include glycosylation. tryptic glycopeptides tend to be observed over a larger m/z and charge state range ( + to +) than typical tryptic peptides ( + to +). in addition, as shown in figure , glycosylation skews the isotopic distribution relative to unmodified peptides. therefore, specialized deconvolution algorithms are required for glycoproteomics data. 
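the conversion from m/z space to neutral mass space described above can be sketched in a few lines. this is a minimal illustration, not the glycresoft deconvolution algorithm: it assumes positive-mode ions whose charge comes only from added protons, and the function names `neutral_mass` and `mass_to_charge` are hypothetical.

```python
# Minimal sketch: convert between m/z and neutral mass for protonated ions.
# Assumption: every charge is carried by one added proton (positive mode).

PROTON_MASS = 1.007276466812  # mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Return the neutral (uncharged) mass for an ion observed at `mz`."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    # Remove one proton per charge, then scale by the charge state.
    return (mz - PROTON_MASS) * charge


def mass_to_charge(mass: float, charge: int) -> float:
    """Return the m/z at which a neutral mass appears for a charge state."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return mass / charge + PROTON_MASS
```

the two functions are inverses of each other, so a neutral mass converted to m/z at any charge state and back returns the starting value. deisotoping, which must account for the glycan-skewed isotopic distribution, is a separate step that this sketch does not attempt.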
the glycresoft suite of tools for glycomics and glycoproteomics uses an lc-scale deisotoping and charge state deconvolution algorithm for precursor and product ions [ ] . glycopeptide identification algorithms use peptide-centric, glycan-centric or complete approaches. peptide-centric methods focus on identifying the peptide backbone sequence and may use peptide + y ions, but do not control for the false discovery rate of the glycan [ ] . by contrast, glycan-centric methods [ , ] identify the attached glycan but do not use peptide backbone dissociation to assign the peptide sequence. combined methods [ ] [ ] [ ] employ a single score that includes both peptide and glycan components and controls the total uncertainty but not the uncertainty of the separate components. complete methods [ ] [ ] [ ] control the uncertainty of glycan and peptide components separately and combined. some methods use oxonium ions to constrain the range of glycopeptide glycans in a manner that complements use of peptide + y ions for assigning glycan composition. these approaches assume that there is no ion co-isolation of more than one glycopeptide ion. a glycoproteomics database search engine includes functions for (i) search space construction, (ii) mass spectrum pre-processing, (iii) a scoring model that evaluates the match between a spectrum and a search space structure, and (iv) a model that evaluates the identification uncertainty for estimation of false discovery rates of glycopeptide sequence matches. the search space uses an input protein list to calculate proteolytic peptides with a list of constant and variable modification rules that include glycosylation. the input protein list may be derived from a fasta file, an annotated protein sequence format, or an exported proteomics search mzidentml file. 
the advantage of using a well-annotated proteome is that the extent of combinatorial expansion of the search space due to inclusion of glycosylation is minimized. there is a degree of subjectivity regarding the makeup of the glycan search space used to construct theoretical glycopeptides. the best practice is to use a measured glycome for this purpose, but this is not always practical. while glycan databases such as glytoucan [ ] can be used, care must be taken to use the subset of glycans appropriate for the biological system in question. approaches for estimating glycan search spaces have been described using biosynthetic simulation [ , ] , manual curation [ , , ] , and combinatorial expansion [ , ] . the sweetnet algorithm used a small combinatorial glycan list to extrapolate the set of n-glycans, o-glycans, and gag linker saccharides using a spectral network to infer monosaccharide gain/loss in networks of spectra [ ] . glycopeptide tandem ms scoring models depend on the dissociation method and glycopeptide size, meaning that there is no one optimal model that applies to all tandem ms data. for collisional dissociation, collision energy strongly influences the appearance and informational value of glycopeptide tandem mass spectra. as shown in figure , glycopeptide tandem mass spectra contain low m/z oxonium ions that act as signatures for glycosylation and high m/z ions from loss of monosaccharide units from the precursor ion. peptide backbone product ions are typically observed only for elevated collision energies. therefore, use of stepped collision energy has become popular for glycopeptide studies [ ] . while electron activated dissociation methods generally favor peptide backbone dissociation over glycosidic bond dissociation, the degree to which vibrational excitation is observed is technique and instrument dependent [ ] [ ] [ ] [ ] [ ] [ ] . 
as with proteomics of unmodified peptides, empirical models are used to estimate false discovery rate (fdr) for glycopeptides. as in proteomics, glycopeptide data are searched using target decoy analysis [ ] [ ] [ ] whereby targets and decoys compete for spectral matching. some published methods for glycopeptides use structural properties to optimize model performance [ , , ] or employ hierarchical filters [ , , , ] to optimize results. for hcd, stepped collision energies most consistently produce peptide+y ions and peptide bn and yn ions that characterize the glycopeptide glycan and peptide backbone independently [ , , ] . whole pathogenic organism vaccines work well against viruses whose life cycles do not require evasion of the host immune system, including measles, polio, and smallpox [ ] . by contrast, viruses whose life cycles depend on the ability to evade the host immune system have evolved mechanisms that result in suboptimal antibody responses. immune evasion by molecular mimicry and glycan shielding has been observed and characterized for spike proteins of viruses including hiv- envelope protein [ ] , influenza hemagglutinin [ ] , lassa virus glycoprotein complex [ ] , and coronavirus s protein [ ] . glycosylation of the hiv envelope trimer corresponds to about half of its total mass [ ] . the dense glycan shield limits the extent of biosynthetic processing, resulting in primarily high mannose n-glycans that are thought to interfere with proteolytic processing of envelope peptides for presentation to the major histocompatibility complex [ , ] . although studies have identified broadly neutralizing antibodies that recognize the hiv envelope glycan shield, it has not been possible to induce such antibodies in response to vaccine challenge [ ] . by contrast, glycosylation of influenza a virus hemagglutinin reflects a balance of immune evasion versus receptor binding. 
if hemagglutinin glycosylation becomes too dense, it interferes with receptor binding and/or membrane fusion [ ] [ ] [ ] . four respiratory coronaviruses cause mild, cold-like symptoms in humans. while most adults have antibodies against these coronaviruses, they have circulated in the human population for centuries [ ] . the severe acute respiratory syndrome coronavirus (sars-cov) zoonotic outbreak in humans was contained within three months after its discovery in . the middle east respiratory syndrome (mers) coronavirus has spread zoonotically to humans repeatedly but has so far had limited human-to-human spread [ ] . by contrast, the sars-cov- virus jumped from animals to humans in and caused a global pandemic with incalculable damage to human culture world-wide. glycosylation of the sars-cov- s protein is of interest for development of antiviral strategies that target the virus-angiotensin-converting enzyme (ace ) receptor recognition [ ] . the s protein is composed of the amino-terminal receptor binding s and carboxy-terminal s membrane fusion subunits [ ] . proteolytic cleavage between s and s is required for receptor binding and membrane fusion [ ] . because antibodies against s receptor binding domain have the potential to neutralize the virus, there is interest in using s protein constituents as vaccine candidates [ , ] . the use of glycan masking and molecular mimicry has been described for human respiratory coronavirus hcov-nl and other coronaviruses [ , ] . we chose to analyze a published lc-ms data set on sars-cov- recombinant s protein [ ] using our publicly available, open-source glycresoft program [ ] . we show how any biomedical scientist with access to a windows desktop computer can query publicly available data for s protein site-specific glycosylation. the site-specific glycosylation of recombinant s protein expressed in human cells was characterized using glycoproteomics liquid-chromatography-mass spectrometry [ ] . 
the authors expressed the pre-fusion s domain with two proline substitutions to stabilize the trimer [ ] . a "gsas" substitution at the furin cleavage site and a c-terminal trimerization motif were used to facilitate maintenance of quaternary architecture during glycan processing [ ] . they digested separate samples using trypsin, chymotrypsin and alpha-lytic protease, respectively, in order to map glycosylation at all sequons. size fractionated, reduced, and alkylated s protein was digested with protease and the resulting peptides analyzed using µm internal diameter, cm length reversed phase lc-ms with a min linear gradient. the scan range was - and hcd collision energy was set to %. the instrument was set for top-n data dependent acquisition. a single raw lc-ms data file for each proteolytic enzyme was posted publicly to the massive database [ ] . glycopeptides were assigned using the glycresoft graphical user interface [ ] available at http://www.bumc.bu.edu/msr/glycresoft/. raw files were converted to mzml format using proteowizard msconvert [ ] and deconvoluted/deisotoped using the glycresoft preprocessing algorithm. a glycan search space was constructed by combining an n-glycan biosynthesis simulation with up to one sulfate per glycan composition. a glycopeptide search space was built for each protease using the corresponding mzidentml or fasta file and the glycomics search space. glycopeptides were identified using - ammonium adducts, with a precursor mass error tolerance of ppm and a product mass error tolerance of ppm. the complete glycresoft html reports are included as supplemental files. total ion chromatograms for the tryptic, chymotryptic and alpha lytic protease digests are shown in figure . 
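the ppm mass error tolerances used for matching precursor and product ions can be illustrated as follows. this is a hedged sketch rather than the glycresoft implementation: the function names are hypothetical, and the 10 ppm value in the usage comment is an assumed number chosen only for illustration, not the setting used in this analysis.

```python
# Minimal sketch: relative mass error in parts per million (ppm) and a
# tolerance check of the kind used when matching observed to theoretical masses.


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error of `observed` relative to `theoretical`, in ppm."""
    if theoretical <= 0:
        raise ValueError("theoretical mass must be positive")
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True when the absolute ppm error does not exceed `tol_ppm`."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Usage with an assumed (illustrative) 10 ppm tolerance:
# within_tolerance(3000.015, 3000.0, 10.0)  # error is about 5 ppm
```

because the tolerance is relative, the allowed absolute deviation grows with mass, which matters for glycopeptides since they are typically heavier than unmodified tryptic peptides.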
the use of a long lc gradient combined with a single hcd collision energy value of % maximized the number of glycopeptides that were while the glycopeptide tandem mass spectrum shown in figure was acquired using stepped collision energy, the s protein tandem ms data were acquired using hcd set at %. under these conditions, glycopeptides were extensively fragmented and the abundances of peptide + yn ions were very low, skewing tandem ms scores to the lower range (see for example figure b ). the peptide sequence is identified unambiguously but the lack of peptide+yn ions limited glycan characterization to intact mass and oxonium ions, leaving core structure unknown. as a balancing factor, it is possible to dissociate more precursor ions when a single collision energy is specified than with stepped collision energy. we processed lc-ms runs acquired for three proteolytic digests: trypsin, chymotrypsin and alpha lytic protease. the trypsin and alpha-lytic protease search parameters were set to specify one site of glycosylation per peptide, and the searches were run on a desktop computer using processors. the chymotryptic digest was first considered using only one site of glycosylation per peptide, but the set of identified glycans from that search was used to re-generate the search space allowing up to two sites of glycosylation per peptide for the final reported results, searched on a shared high performance computing cluster utilizing processors. the glycoforms identified for each glycopeptide are shown in figure -figure . the results shown correspond to the enzyme digest that produced the highest glycopeptide abundances for a given glycosite. overall, the abundances of high mannose, hybrid, and complex n-glycan compositions are consistent with those in the original publication [ ] . 
n-glycan sulfation is a topic of interest for influenza a virus because this modification influences viral replication, receptor binding, antigenicity and interactions with lectins of the innate immune system [ , ] . in influenza a virus, the virus neuraminidase enzyme removes all or nearly all of the sialic acid residues from hemagglutinin n-glycans. sulfation has been identified on c- of gal and c- of glcnac residues of n-glycans as a biosynthetic event taking place in the trans-golgi network [ ] . researchers investigated several influenza vaccine preparations and found sulfation at several n-glycan sites for h n , h n , h n , h n and influenza b [ ] . in contrast to influenza hemagglutinin, both sialylated and asialo n-glycans of s protein are abundant. we therefore included sulfation as a modification to the n-glycan search space we used for our analyses. we found position to carry abundant sulfated tri-antennary and tetra-antennary n-glycans (figure a ). a total of glycan compositions were identified at this position from the chymotryptic digest. an example annotated glycopeptide tandem mass spectrum is shown in figure b . as shown in figure c hexnac - neuac - , indicating that sulfation is likely placed on a non-reducing end hexnac residue. sulfation was also detected at trace levels for sites ( figure ), ( figure ) and ( figure ). as expected, each glycosite reflects a distribution of glycan compositions, consistent with the existence of populations of mature s glycoprotein molecules differing by glycosylation. as shown in figure - figure , sites , , , are occupied primarily by high mannose n-glycans with minimal processing to complex type compositions. note that sites and were identified in the same chymotryptic peptide ( figure ) and we assumed one glycan per site. 
glycans at sites , , , and display an abundant hex hexnac composition, indicating processing by mannosidases, along with hybrid, complex biantennary and complex triantennary compositions, indicating that the s protein population undergoes a range from low to high degree of golgi-mediated biosynthetic processing at these sites. sites , , , , , , , and contain extensively processed bi-, tri-, and tetra-antennary compositions, consistent with high degree of accessibility to biosynthetic enzymes at these sites. glycresoft is an open-source, publicly available software program that can be used to analyze glycoproteomics lc-ms data. the program allows the user to specify glycan modifications including sulfation. we show an example of the use of glycresoft to assign sars-cov- s protein glycosylation from a published data set in which we identify sulfated n-glycans not identified in the original manuscript. glycresoft html output summary tiles are provided for the sars-cov- s protein tryptic and chymotryptic digests, respectively. this work was supported by u. s. nih grant u ca current protocols in bioinformatics emerging microbes & infections sars-cov- spike site-specific n-linked glycan analysis. massive database (accessed key: cord- -scokdxp authors: tani, hideki; tan, long; kimura, miyuki; yoshida, yoshihiro; yamada, hiroshi; fukushi, shuetsu; saijo, masayuki; kawasuji, hitoshi; ueno, akitoshi; miyajima, yuki; fukui, yasutaka; sakamaki, ippei; yamamoto, yoshihiro; morinaga, yoshitomo title: evaluation of sars-cov- neutralizing antibodies using a vesicular stomatitis virus possessing sars-cov- spike protein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: scokdxp sars-cov- is a novel coronavirus that emerged in and is now classified in the genus coronavirus with closely related sars-cov. 
sars-cov- is highly pathogenic in humans and is classified as a biosafety level (bsl)- pathogen, which makes manipulating it relatively difficult due to its infectious nature. to circumvent the need for bsl- laboratories, an alternative assay was developed that avoids live virus and instead uses a recombinant vsv expressing luciferase and possesses the full length or truncated spike proteins of sars-cov- . furthermore, to measure sars-cov- neutralizing antibodies under bsl conditions, a chemiluminescence reduction neutralization test (crnt) for sars-cov- was developed. the neutralization values of the serum samples collected from hospitalized patients with covid- or sars-cov- pcr-negative donors against the pseudotyped virus infection evaluated by the crnt were compared with antibody titers determined from an immunofluorescence assay (ifa). the crnt, which used whole blood collected from hospitalized patients with covid- , was also examined. as a result, the inhibition of pseudotyped virus infection was specifically observed in both serum and whole blood and was also correlated with the results of the ifa. in conclusion, the crnt for covid- is a convenient assay system that can be performed in a bsl- laboratory with high specificity and sensitivity for evaluating the occurrence of neutralizing antibodies against sars-cov- . 
was designated as pcag-sars-cov- . the plasmid, which contains the s protein gene with a aa truncation at the c-terminus, was constructed using the cdna of pcag-sars-cov- . the s proteins with the aa deletion of coronaviruses were previously reported to show increased efficiency regarding incorporation into virions of vsv ( , ). human (huh and t), monkey (vero), hamster (bhk and cho), and mouse (nih t ) cell lines were obtained from the american type culture collection (summit pharmaceuticals international, tokyo, japan). all cell lines were grown in dulbecco's modified eagle's medium (dmem; nacalai tesque, inc., kyoto, japan) containing % heat inactivated fetal bovine serum (fbs). pseudotyped vsvs bearing the s protein, the aa-truncated s protein of sars-cov- , or vsv-g were generated as described below. briefly, t cells were grown to % confluence on collagen-coated tissue culture plates and then transfected with each expression vector: pcag-sars-cov- s-full, pcag-sars-cov- s-t , and pcag-vsv-g. after h of incubation, the cells transfected with each plasmid were infected with g-complemented (*g) vsv∆g/luc (*g-vsv∆g/luc)( ) at a multiplicity of infection (moi) of . per cell. then, the virus was adsorbed and the cells were extensively washed four times with % fbs dmem. after h of incubation, the culture supernatants containing pseudotyped vsvs were centrifuged to remove cell debris and then stored at − °c until use. the pseudotyped vsvs bearing sars-cov- s protein or sars-cov- truncated s protein are referred to as sfullpv or st pv, respectively. the infectivity of sfullpv, st pv, or vsvpv to t cells was assessed by measuring the luciferase activity. 
the value of the relative light unit (rlu) of luciferase was determined using a picagene luminescence kit (toyo b-net co., ltd, tokyo, japan) and glomax navigator system g (promega corporation, madison, wi), according to the manufacturer's protocol. (pbs) containing % np . then, the lysates were centrifuged to separate insoluble pellets from supernatants. the supernatants were used as samples. the sfullpv or st pv, which were generated as described above, were pelleted through a % (wt/vol) sucrose cushion at , rpm for h in an sw rotor (beckman coulter, tokyo, japan). then, the pellets were resuspended in pbs. each sample that was boiled in loading buffer was subjected to % sodium dodecyl sulfate-polyacrylamide gel electrophoresis (sds-page). according to the manufacturer's protocol, the proteins in the gel were stained with cbb stain one (nacalai tesque, inc.). next, the proteins in another gel were electrophoretically transferred to a methanol-activated polyvinylidene difluoride (pvdf) membrane (millipore, billerica, ma) and reacted with covid- hospitalized patient sera (# ). then, immune complexes were visualized with supersignal west dura extended duration substrate (pierce, rockford, il) and detected by an las analyzer (fuji film, tokyo, japan). twenty-three serum samples were collected from hospitalized patients with covid- who were admitted to the university of toyama hospital, toyama, japan. in addition, nineteen serum samples were collected from covid- pcr-negative donors at the university of toyama hospital. the diagnosis of covid- in all patients or donors was assessed using the real-time pcr method with specific primers, which were developed at the national institute of infectious diseases, japan ( ). by using a blood collection tube containing edta, whole blood samples were obtained from hospitalized patients (the university of toyama hospital) with covid- . 
the patient sera used in this study were collected from participants after obtaining informed consent. to examine neutralization of the human serum or whole blood samples against pseudotyped viruses, vero cells were treated with serially diluted sera or whole blood of convalescent patients with covid- or pcr-negative donors and then inoculated with sfullpv, st pv, or vsvpv. to remove hematopoietic cells from whole blood samples, centrifugation was performed at , × g for min. infectivity of the pseudotyped viruses was determined by measuring luciferase activities after h of incubation at °c. and ∆gpv by cbb staining (fig. a, left panel) . overall, the amount of st proteins incorporated was higher than that of sfull proteins, although the amount of structural proteins of vsv was almost the same among all the virions. these results indicate that the incorporation of truncated s proteins into vsv particles was more efficient than that of the full-length s protein. next, ∆gpv, sfullpv, st pv, and vsvpv were inoculated into the indicated cell lines to examine the infectivity of pseudotyped viruses to various mammalian cell lines (fig. b) . among the tested cell lines, huh , t, vero, and bhk cells were susceptible to sfullpv and st pv infection. showed no susceptibility. notably, the infectivity of st pv was higher than that of sfullpv in vero to determine the specificity of infection of sfullpv and st pv, a neutralization assay of the pseudotyped viruses was performed using sera of two hospitalized covid- patients. the infectivity of sfullpv and st pv, but not that of vsvpv, in vero cells was clearly inhibited by sera of the patients in a dose-dependent manner (fig. ) . these data indicated that sfullpv and st pv infection exhibited an s protein-mediated entry. to examine the neutralization of st pv by sera of covid- hospitalized patients or covid- pcr-negative donors, vero cells treated with each serum were infected with st pv and vsvpv. 
neutralization of st pv was observed based on the sixteen sera of covid- hospitalized patients at a rate of more than % (fig. a) . sera, which did not show the neutralization of st pv, were derived from covid- hospitalized patients who were hospitalized for a short period before antibody production (such as within days after onset). no neutralization was observed in the vsvpv infection by any of the sera of covid- pcr-negative donors ( fig. a and b) . the dot plot graph shows a classification of each pseudotyped virus from fig. a and b graphs (fig. c) . due to the presence of convalescent and non-convalescent patient sera, the degree of neutralization activity of st pv by covid- hospitalized patient sera was variable. to examine the correlation of the antibody titers using crnt compared to those determined by the ifa, the ifa was also performed using covid- hospitalized patients or covid- pcr-negative donors. the fluorescence intensity of ifa was correlated with the sera, which exhibit a high neutralizing activity in the crnt (fig. and table ). sera with low neutralizing activity in the crnt showed weak fluorescence intensity, and sera that demonstrated no neutralizing activity in the crnt were also negative by ifa. also, we compared the neutralizing effect of pseudotyped viruses between the sera and whole blood of covid- hospitalized patients with or without centrifugation (fig. ) . after centrifugation of whole blood, hematopoietic cells, including red blood cells, may be removed. as a result, the neutralization of both sera and whole blood against st pv infection was observed with or without centrifugation (fig. ) . although the neutralizing activities of whole blood were higher than those of the sera in st pv infection, the infectivity of vsvpv was reduced by approximately / without centrifugation (fig. a) . the reduction of infectivity of both st pv and vsvpv was suppressed, by removing hematopoietic cells in whole blood after centrifugation (fig. b) . 
therefore, some inhibitory factors, such as hematopoietic cells, may be involved in pseudotyped virus infection in whole blood. a rapid, safe, and highly sensitive crnt system using vsv-based pseudotyped viruses with sars-cov- s or truncated s proteins was developed. because this system utilizes replication and translation of vsv, neutralization against pseudotyped virus infection can be determined within - hours. another pseudotyped viral system that uses retroviral or lentiviral vectors takes approximately h to obtain results. therefore, the vsv-based pseudotyped viral system is considered more useful. in addition, since measurement of luciferase activity is a quantitative method, it is not necessary to count gfp-positive cells. therefore, this crnt system permits a simple and objective evaluation for the neutralization. for many viral species, crnt systems were developed using vsv-pseudotyped viruses with their own envelope proteins ( , ( ) ( ) ( ) . in sars-cov- , researchers recently demonstrated the construction of pseudotyped viruses and evaluation of the presence of neutralizing antibodies ( ). in this study, we prepared a pseudotyped virus that possesses a truncated sars-cov- s protein, which showed higher infectivity. furthermore, the neutralizing activity of the test sera and whole blood against the pseudotyped virus was quantitatively detected in a convalescent patient with covid- , while the donor sera of the covid- pcr-negative patient showed a negative reaction. in the crnt of st pv, the infectivity of st pv was reduced by % or more by the convalescent phase patient sera. this demonstrated that the convalescent phase patient sera of covid- exhibited a high neutralizing antibody activity. the results determined by the crnt also correlated with those determined by the ifa. antibodies against the s protein of sars-cov- in covid- convalescent patient sera were capable of neutralizing the viral infection. 
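one common way to express the reduction of infectivity measured by luciferase activity is as a percentage relative to a control infected without serum. the sketch below assumes that formula; the text does not state the exact calculation used for the crnt, and the function `percent_neutralization` and its parameter names are hypothetical.

```python
# Minimal sketch: percent reduction of pseudotyped virus infectivity computed
# from relative light units (RLU). The formula is an assumption for
# illustration; the source does not specify the calculation.


def percent_neutralization(rlu_sample: float,
                           rlu_no_serum: float,
                           rlu_background: float = 0.0) -> float:
    """Percent reduction of the luciferase signal relative to a no-serum control.

    rlu_sample     -- signal from cells infected in the presence of serum
    rlu_no_serum   -- signal from cells infected without serum (100% infectivity)
    rlu_background -- signal from uninfected cells (optional)
    """
    window = rlu_no_serum - rlu_background
    if window <= 0:
        raise ValueError("no-serum control must exceed background")
    relative_infectivity = (rlu_sample - rlu_background) / window
    return 100.0 * (1.0 - relative_infectivity)
```

running the same calculation for the vsvpv control alongside st pv would help separate s protein-specific neutralization from non-specific inhibition, such as the reduction seen with uncentrifuged whole blood.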
if using whole blood in the crnt becomes possible, the work of separating serum will no longer be necessary. the crnt can be performed with an extremely small amount of blood sample (only a few microliters). therefore, if the crnt with whole blood of a convalescent phase patient with covid- is feasible, a high neutralizing activity should be observed by the crnt. however, the infectivity of the control vsvpv was also reduced by the whole blood control. since many hematopoietic cells, including red blood cells, are contained in whole blood, these cells are present on the vero cells used in the crnt with whole blood. it is highly likely that this inhibition is due to the presence of hematopoietic cells, because removal of the cells by centrifugation suppressed the non-specific reduction of the pseudotyped viral infection. however, st pv infection was inhibited more strongly than vsvpv infection. therefore, evaluating the crnt using whole blood is possible. the neutralizing antibody measurement system using pseudotyped viruses for sars-cov- is an effective tool for evaluating the presence or duration of the neutralizing antibody in convalescent patients and for screening for those who present with the neutralizing antibody among suspected populations. in addition, this crnt system does not require the use of infectious viruses to measure neutralizing antibodies. therefore, once the pseudotyped virus system is established, it can be made available at many laboratories without bsl- facilities. furthermore, because the measuring system is based on chemiluminescence, the results can be obtained safely and quickly. finally, the crnt using whole blood is a simpler and safer method because it can be performed with only a very small amount of blood from an eligible person. the authors declare no conflicts of interest in association with the present study. 
clinical features of patients infected with novel coronavirus in wuhan remdesivir for the treatment of covid- : a systematic review of the literature current epidemiological and clinical features of covid- ; a global perspective from virology, epidemiology, pathogenesis, and control of covid- current status on the development of pseudoviruses for enveloped viruses pseudotyped lentiviral vectors: one vector generation of vsv pseudotypes using recombinant deltag-vsv for studies on virus entry, identification of entry inhibitors, and immune responses to vaccines recombinant nucleoprotein-based enzyme-linked immunosorbent assay for detection of immunoglobulin g antibodies to crimean-congo hemorrhagic fever virus vesicular stomatitis virus pseudotyped with severe acute respiratory syndrome coronavirus spike protein protease-mediated entry via the endosome of human coronavirus e involvement of ceramide in the propagation of japanese encephalitis virus development of genetic diagnostic methods for detection for novel coronavirus (ncov- ) in japan analysis of vsv pseudotype virus infection mediated by rubella virus envelope proteins analysis of lujo virus cell entry using pseudotype vesicular stomatitis virus characterization of glycoprotein-mediated entry of severe fever with thrombocytopenia syndrome virus protocol and reagents for pseudotyping lentiviral particles with sars-cov- spike protein for neutralization assays indicated dilutions of sera from two different convalescent . key: cord- - in wzv authors: lemoine, frédéric; blassel, luc; voznica, jakub; gascuel, olivier title: covid-align: accurate online alignment of hcov- genomes using a profile hmm date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: in wzv motivation the first cases of the covid- pandemic emerged in december . until the end of february , the number of available genomes was below , , and their multiple alignment was easily achieved using standard approaches. 
subsequently, the availability of genomes has grown dramatically. moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. a more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. results hcov- genomes are highly conserved, with very few indels and no recombination. this makes the profile hmm approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. using a core of ∼ , high quality genomes, we estimated a profile using hmmer, and implemented this profile in covid-align, a user-friendly interface to be used online or as a standalone tool via docker. the alignment of , genomes requires less than min on our cluster. moreover, covid-align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). availability https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align contacts olivier.gascuel@pasteur.fr, frederic.lemoine@pasteur.fr supplementary information supplementary information is available at bioinformatics online. since the emergence of the hcov- virus (or sars-cov- ) responsible for the covid- pandemic, unprecedented efforts have been under way across the world to sequence genomes of this virus and share the data. as of today ( / / ), gisaid (shu et al., ) provides access to more than , full genomes, and the ncbi and ebi more than , and , , respectively. the first genomes were sequenced in china by the end of december . their number first increased slowly and then rapidly when the pandemic appeared on all continents. submissions of several thousand sequences to gisaid in a single day have become common. moreover, some genomes may be submitted incomplete, with sequencing and assembly errors. 
these characteristics pose major challenges to bioinformatics, notably that of multiple sequence alignment (msa; chatzou et al., ), which is crucial for subsequent analyses (phylogeny, transmission clusters, mutation study, structure, etc.). to solve this difficulty, we use a profile hmm-based approach (durbin et al., ), which is the norm for hiv (www.hiv.lanl.gov), and is particularly well suited to hcov- , as its genome is highly conserved, without known recombination in human hosts (xiaolu et al., ; de maio et al., ). using a profile, the addition of new data to an existing msa requires computing time that is linear in the number of input genomes. moreover, profile-based msa has proved to be very accurate (earl et al., ; nute and warnow, ). this approach is implemented in covid-align, which is available as a web service and via docker. to estimate our profile hmm, we proceeded in several steps, in order to select an appropriate set of sequences and obtain a clean and reliable msa to give as input to hmmer (www.hmmer.org): • we downloaded all hcov- genomes available on gisaid (april , ) and performed pairwise alignments using mafft (katoh and standley, ) of each of these genomes with the reference strain hcov- /wuhan/wiv / , sequenced in china december , . this genome was found perfectly conserved not only in china, but also in thailand, japan, usa, uk, etc. and is considered as the origin of the virus (li et al. ; www.gisaid.org). then, using loose thresholds, we removed the genomes that were excessively divergent from the reference and had too many unknown (n) characters. we edited the remaining ones (e.g. removing the first gappy positions and the poly-a tail) and aligned them with mafft. the msa so obtained was further filtered by removing the genomes having too many unique (i.e. not shared by any other genome) mutations and indels. we used more stringent thresholds than in the previous stage. 
this resulted in an msa of , genomes, where the first and last positions of the reference genome were removed due to poor alignment and low signal, but all other reference positions were preserved and showed high conservation. we used hmmer to estimate our profile from this curated msa. all details and program options are available in supplementary information. the resulting profile was implemented in a nextflow (di tomaso et al. ) and galaxy workflow combining hmmalign from hmmer to align the input genomes to the profile, goalign to format the input/output files (https://github.com/evolbioinfo/goalign), and python to compute summary statistics. these statistics help users evaluate the sequencing quality and potential evolutionary novelties of input genomes, for example the number of unique mutations and indels and the number of mutations compared to the reference genome. a user-friendly interface, implemented in go (similar to lemoine et al. ), allows users to launch their analyses without having to know how to use the galaxy system. for advanced users, covid-align can be installed locally via docker (https://www.docker.com). all results are given in a zipped file containing: • the msa of the input genomes plus the reference genome, which is displayed first, with the first and last positions removed. with small datasets, this msa can be visualized using msaviewer (fig. ; yachdav et al., ). • the hmmalign output in fasta format, for each of the input genomes. this can be used to recover the insertions, deletions and match positions (reported relative to the reference genome). • a csv file with all statistics computed for each of the input genomes. unique mutations and indels are possibly due to errors (sequencing, assembly etc.), while new ones (seen at least twice in submitted genomes, for the first time) likely correspond to evolutionary novelties (see sup. info. for details). • a table in csv format, summarizing the main average statistics and features of submitted genomes (fig. ). 
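the per-genome statistics described above can be illustrated with a minimal python sketch. this is not the covid-align implementation: the function name `summarize_alignment` and the key `mut_ref_list` are hypothetical, positions are 1-based columns of the aligned reference, and leading and trailing gaps are assumed to mark missing sequence ends rather than deletions.

```python
def summarize_alignment(ref: str, seq: str) -> dict:
    """Count substitutions and core gaps of one aligned genome against the
    aligned reference (both strings come from the same MSA)."""
    if len(ref) != len(seq):
        raise ValueError("reference and sequence must be aligned to the same length")
    ref, seq = ref.upper(), seq.upper()

    # Leading and trailing gaps are reported separately from core gaps.
    start = len(seq) - len(seq.lstrip("-"))
    end = len(seq.rstrip("-"))
    if end <= start:  # the sequence consists of gaps only
        return {"mut_ref": 0, "mut_ref_list": [], "gap_start": len(seq),
                "gap_end": 0, "gap": 0, "gap_opening": 0}

    mutations = []      # (1-based position, observed nucleotide)
    gaps = 0            # gap characters in the core sequence
    gap_openings = 0    # runs of consecutive gaps in the core sequence
    in_gap = False
    for pos in range(start, end):
        r, s = ref[pos], seq[pos]
        if s == "-":
            gaps += 1
            if not in_gap:
                gap_openings += 1
            in_gap = True
            continue
        in_gap = False
        # Only differences between two unambiguous nucleotides count as mutations.
        if r in "ACGT" and s in "ACGT" and r != s:
            mutations.append((pos + 1, s))

    return {"mut_ref": len(mutations), "mut_ref_list": mutations,
            "gap_start": start, "gap_end": len(seq) - end,
            "gap": gaps, "gap_opening": gap_openings}


if __name__ == "__main__":
    print(summarize_alignment("ACGTACGTAC", "--GTTC--AC"))
```

counting gap openings separately from gap characters keeps one long deletion from weighing as much as many independent single-site events.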
our web service processes , genomes in less than minutes, thanks to parallelization that is easy to set up with profiles. comparison with mafft-based gisaid msa shows that our msa: ( ) can be used as is, while mafft's cannot due to ~ highly gappy columns resulting from sequencing and assembly errors; ( ) helps to detect and filter these errors; ( ) is similar for most sequences to a properly trimmed version of mafft's msa, and more accurate for the few others (sup. info). importantly, our profile and statistics will be regularly updated to account for user needs and the evolutionary novelties (mutations, indels...) of the emerging genomes to come. sequenced ones, and the bat and pangolin genomes (bottom). the site numbering corresponds to that of the reference, to be used to recover the orfs and genes. in rbd region the pangolin virus genome is closer to human's than is bat's, suggesting a possible recombination. on the opposite, human viruses are highly conserved. right: statistics summary, displaying the number of high and low quality genomes, and the number of evolutionary events (mutations, gaps, gap openings, insertions, insertion openings). we distinguish the number of unique events (not seen yet and present only once in submitted genomes, possibly corresponding to errors) and the number of new events (seen at least twice, likely corresponding to evolutionary novelties to estimate our profile hmm, we proceeded in several steps, in order to select an appropriate set of sequences and obtain a clean and reliable msa to give as input to hmmer (www.hmmer.org) [we provide all details of this procedure below using bracketed, italic insertions in the main text]: we downloaded all hcov- genomes available on gisaid (april , ; human host only) and performed pairwise alignments using mafft (katoh and standley, ) [options: mafft --add ] of each of these genomes with the reference strain hcov- /wuhan/wiv / [genome id: epi_isl_ ], sequenced in china december , . 
this genome was found perfectly conserved not only in china, but also in thailand, japan and usa, and is considered as the origin of the virus (li et al. ; www.gisaid.org). the msa so obtained was further filtered by removing the genomes having too many unique (i.e. not shared by any other genome) mutations and indels. we used more stringent thresholds than in the previous stage [a genome is removed if in the second, global msa it has: > unique mutations, or > unique internal indels]. this resulted in an msa of , genomes, where the ~ first and last positions of the reference genome were removed due to poor alignment and low signal, but all other reference positions were preserved and showed high conservation [> . % on average among all positions, but variable positions with less than % conservation; average fraction of gaps per site = . %, but positions with more than . % gaps]. we used hmmer to estimate our profile from this curated msa [options: hmmbuild -n covid covid .hmm ]. for each of the input genomes, covid-align computes a series of summary statistics to help users analyze their data, remove problematic sequences, and detect those containing evolutionary novelties. as explained in the main text, we compute (among other statistics) the number of unique and new mutations/insertions/deletions. to achieve these computations, we regularly analyze all the data available on gisaid and count for every msa position the number of a, c, g, t and gaps, and the number of times this position is followed by an insertion and the length of that insertion. when a set of genomes is submitted, we compute the same quantities, which are used in combination with gisaid-based ones to obtain our summary statistics. definitions are as follows: • a unique mutation/insertion/deletion is present once and only once in the submitted sequences, but not in the gisaid sequences. 
• a new mutation/insertion/deletion is either ( ) not present in the gisaid sequences and seen at least twice in the submitted sequences, or ( ) unique in the gisaid sequences and seen at least once in the submitted sequences. importantly, this does not apply to sequences already available on gisaid, as these would be counted twice. the summary statistics returned for each of the submitted genomes are as follows: • length: length of unaligned sequence (not counting start/end gaps and unknown characters), to be compared to the length of the msa ( , see above). • high_quality: our quality index (yes/no) based on the following rule: the sequence is deemed of high quality if it has: at most unique mutations, at most unique gap openings, at most unique insertion openings, at most mutations compared to the reference sequence, less than % n, less than % n + start gaps + end gaps. • mut_unique: # unique dna mutations (see above definition for unique/new). • mut_new: # new dna mutations (does not apply to gisaid sequences). • mut_ref: # dna mutations compared to the reference genome (epi_isl_ ). • mut_orf: # mutations occurring in orfs. • mut_density: highest number of dna mutations in a window of size (to be used to detect poor quality genomes). • mut_unique_list: list of unique mutations, as pairs of (position, nucleotide). • mut_new_list: list of new mutations (does not apply to gisaid sequences). • mut_orf_list: list of mutations compared to the reference sequence, occurring in orfs. each mutation is represented as a triple of (position, mutated nucleotide, name of orf). • gap_start: # gaps (i.e. deletions) at the beginning of the sequence (not counting those in the first positions of the reference sequence). • gap_end: # gaps at the end of the sequence (not counting those in the last positions of the reference sequence). • gap: # gaps in the core sequence (i.e. not counting start/end gaps). • gap_unique: # unique core gaps (see above definition for unique/new). 
• gap_new: # new core gaps (does not apply to gisaid sequences). • gap_opening: # core gap openings. • gap_opening_unique: # unique core gap openings. • gap_opening_new: # new core gap openings. • gap_orf: # gaps occurring in orfs. • gap_segment_unique: # unique gap segments in the core sequence, having a unique set of starting position and length (see above definition for unique/new). • gap_segment_new: # new gap segments in the core sequence (does not apply to gisaid sequences). • gap_orf_list: list of gaps occurring in orfs, as pairs of (position, orf name). • insertion: # non-n insertions in the core sequence (i.e. not counting start/end insertions). • insertion_opening: # core insertion openings. • insertion_opening_unique: # unique core insertion opening positions (see above definition for unique/new). • insertion_opening_new: # new core insertion opening positions (does not apply to gisaid sequences). • insertion_orf: # insertion openings in orfs. • insertion_segment_unique: # unique insertion segments in the core sequence, having a unique set of opening position and length (see above definition for unique/new). • insertion_segment_new: # new insertion segments in the core sequence (does not apply to gisaid sequences). • insertion_opening_unique_list: list of unique opening insertion positions. • insertion_opening_new_list: list of new opening insertion positions. • insertion_segment_list: list of insertion segments as pairs of (opening position, length). • insertion_orf_list: list of insertions occurring in orfs, as pairs of (opening position, orf name). • a: # a in the whole sequence. 
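the unique/new definitions given above can be written as a short decision rule. the python sketch below is an illustration under stated assumptions, not the covid-align code: the function name `classify_events`, the label "known" for all remaining events, and the representation of an event as a (position, nucleotide) pair are hypothetical, and the submitted genomes are assumed not to be part of the gisaid counts already.

```python
from collections import Counter


def classify_events(submitted: dict, gisaid_counts: Counter) -> dict:
    """Label each event seen in the submitted genomes.

    submitted: genome id -> set of events carried by that genome.
    gisaid_counts: event -> number of GISAID sequences carrying it.
    """
    # Number of submitted genomes carrying each event.
    submitted_counts = Counter(
        event for events in submitted.values() for event in events
    )
    labels = {}
    for event, n_submitted in submitted_counts.items():
        n_gisaid = gisaid_counts.get(event, 0)
        if n_gisaid == 0 and n_submitted == 1:
            # Seen exactly once overall: possibly a sequencing or assembly error.
            labels[event] = "unique"
        elif (n_gisaid == 0 and n_submitted >= 2) or n_gisaid == 1:
            # Absent from GISAID but seen at least twice in the submission,
            # or seen once in GISAID and at least once in the submission.
            labels[event] = "new"
        else:
            labels[event] = "known"
    return labels


if __name__ == "__main__":
    submitted = {
        "genome_1": {(100, "T"), (200, "A")},
        "genome_2": {(200, "A"), (300, "G")},
        "genome_3": {(400, "C")},
    }
    gisaid = Counter({(300, "G"): 1, (400, "C"): 25})
    for event, label in sorted(classify_events(submitted, gisaid).items()):
        print(event, label)
```

the same rule applies unchanged to gaps, gap segments and insertion openings once each is encoded as a hashable event, for example (start position, length) for a gap segment.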
from these statistics, we compute average results for all submitted genomes (csv format): for example, in the following table (also provided in main text) we display the average statistics obtained for all available gisaid sequences from human host added between april and may , with "unique" and "new" statistics based on all available gisaid sequences up to april . these results confirm that insertions are very rare. the number of shared insertion openings is . % per genome, that is, in total, with length of to nucleotides, corresponding to sequences. most of them are shared by or sequences only, and could be sequencing or assembly errors. only one insertion of length found in orf ab is shared by sequences from uk and australia. this contrasts with deletions (gaps), which are much more frequent, with some long shared deletions, e.g. the -nt deletion found in over a dozen sequences from singapore and taiwan. when new sequences with confirmed insertions and deletions become available from emerging genomes, they will be incorporated in the profile and the resulting msa will closely account for these indels. the "new" statistics shown in the above table are based on all human sequences available on gisaid up to april . this is intended to illustrate the behavior of covid-align and the type of results users should expect. in real use, however, these statistics are based on a database that is regularly updated to account for the latest evolutionary events observed among emerging genomes. in the above table, covid-align was used to align, assess quality and summarize features of newly submitted sequences sampled from human hosts. nevertheless, covid-align can also provide high quality alignments of sequences sampled from various animal hosts, environment or cell cultures, or even sequences of more distant viral species from the coronaviridae family. furthermore, as an hmm profile, it can be used to search for related sequences in a data pool. 
the genomes submitted to the gisaid and the other repositories may be incomplete with assembly and sequencing errors, and long stretches of unknown characters and gaps. these unusual characteristics, in addition to the number of genomes available, make multiple alignment difficult, despite the fact that these genomes are highly conserved up to now, after ~ months of evolution. the sequences need not only to be compared and aligned, but also to be trimmed. in that respect, a profile hmm approach is especially well suited. the gisaid web site provides an msa of all high-quality (< % n) complete genomes (> , length), without duplicates. this msa is inferred using mafft version (katoh and standley, ) with options: --thread - --nomemsave. we downloaded the msa available on april , comprising , sequences. this msa is much longer ( , sites) than the genome length (epi_isl_ = , ). this is due to the low quality of certain sequences. even if these sequences were manually curated and assessed to be of high quality, they still contain a relatively large fraction of unknown characters (n), gaps (-) and assembly errors. moreover, the length of n stretches seems to be approximate and poorly correlated to the real length of the corresponding sequences. this msa thus contains a large number of columns containing ~ % gaps, but for one or a few (mostly n) characters. figure provides an example where a large number of gaps is caused by an assembly error, plus a long stretch of unknown n characters. in some cases, likely due to the combination of poor sequences and progressive alignment strategy, mafft produces difficult to understand errors, as in figure where a portion of sequence with perfect match is shifted, resulting in a number of mismatches and gaps. moreover, as noticed by several groups, the beginning and end of this msa (and any other) are of particularly low quality, due to incomplete sequences, poly-a tails, etc. 
consequently, this msa cannot be used as is for most applications, e.g. to infer phylogenies. figure . epi_isl_ is the reference genome. sequence epi_isl_ most likely contains an assembly error, as the segment aatggtttaataggcacaggtgttc is repeated twice. this, combined with a long stretch of n (unknown) characters, creates a large number of gaps in the reference sequence and in the whole msa. covid-align detects this error as an insertion (unique among all gisaid sequences) and does not have any gap in this region. figure . a portion of the sequence is shifted, while in this region this sequence does not contain any insertions, gaps or n characters. in this region covid-align produces a perfect match, as expected. in contrast, the covid-align msa starts at position in the reference genome, stops at position , , and has a fixed length of , . the assembly error in figure is trimmed, and the shifted region in figure is perfectly aligned. all sites in the msa are highly conserved. insertions are short and very rare, while some long deletions are found and confirmed as they are observed in several sequences of different origins (see above). our profile will be updated regularly. if well-assessed insertions and deletions are found (as expected) in new emerging genomes, they will be added to the profile to reflect these important features of genome diversity. to compare the two msas on the same basis, we trimmed the mafft msa by removing all columns corresponding to gaps in the reference genome, as well as the first and last reference positions. thus, both msas have the same length, refer to the same positions in the reference genome and become similar, with , sequences having % identical alignment, and , sequences showing at least one mismatch (two different characters at the same position; n and gap characters are considered the same due to ambiguities and errors in the input sequences). 
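the trimming and comparison just described can be sketched in python as follows. this is a minimal illustration rather than the authors' script: the function names are hypothetical, the msa is assumed to be held in memory as a mapping from sequence id to aligned string, and `trim_start` and `trim_end` stand for the numbers of first and last reference positions to drop, whose actual values are not reproduced here.

```python
def trim_to_reference(msa: dict, ref_id: str,
                      trim_start: int = 0, trim_end: int = 0) -> dict:
    """Keep only the columns where the reference has a nucleotide, then drop
    trim_start leading and trim_end trailing reference positions."""
    ref = msa[ref_id]
    keep = [i for i, char in enumerate(ref) if char != "-"]
    keep = keep[trim_start:len(keep) - trim_end]
    return {name: "".join(seq[i] for i in keep) for name, seq in msa.items()}


def same_character(a: str, b: str) -> bool:
    """N and gap characters are treated as equivalent, as in the comparison."""
    unknown = {"N", "-"}
    a, b = a.upper(), b.upper()
    return a == b or (a in unknown and b in unknown)


def count_mismatches(x: str, y: str) -> int:
    """Number of positions where two alignments of the same genome differ."""
    if len(x) != len(y):
        raise ValueError("both alignments must have the same length")
    return sum(not same_character(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    mafft_msa = {"ref": "AC--GT", "s1": "ACNNGT", "s2": "A---GT"}
    trimmed = trim_to_reference(mafft_msa, "ref")
    print(trimmed)
    print(count_mismatches(trimmed["s2"], "ANGT"))
```

after trimming, every column index maps to one reference position, so a row from the trimmed mafft msa can be compared directly with the corresponding row of a profile-based msa of the same length.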
visual inspection shows that most differences between both msas are situated at the beginning and end of the sequences, due to n characters, poly-a tails, incompleteness of the sequences, etc. thus, for each msa we searched in each of the , differing sequences for the "real start" and "real end" of the aligned part of the given sequence, that is, the first and last windows of length with at most mismatch with the reference genome. when both msas indicated different start/end, we used the common part. restricting the comparison to this common part, , sequences have identical alignment, and show at least one mismatch. moreover, the discarded parts (before the "real start" and after the "real end") represent a very small fraction (~ . %) of the , -sequence msa, with ~ % mismatch in average with the reference genome. on the opposite, the conserved part (~ . % of the msa) has ~ . % mismatch in average with the reference genome. this confirms that the discarded start and end parts contain too many sequencing errors and uncertainties to be used in most analyses. we compared both msas for the genomes with core differences, using the number of substitutions with the reference genome (a substitution is a difference at the same position between two a, t, g, c characters; gaps are not considered as they are sometimes confounded with unknown n characters; moreover, after trimming of mafft's both msas have the same length). results are displayed in figure . to summarize, genomes are slightly better aligned by the trimmed version of the mafft msa (with differences of at most substitutions), and are better aligned by covid-align (with differences up to substitutions). for example, for the extreme sequence (id epi_isl_ ), covid-align has substitutions with the reference genome, while the mafft msa has substitutions. figure displays a portion of both msas with strong differences. 
while this sequence is evolutionarily close to the reference genome, it will appear as one of the most distant using the trimmed mafft msa. even if the number of such sequences is relatively low, the presence of these alignment errors will profoundly perturb analyses. to summarize, these results show the importance of trimming the msas obtained using mafft and any other standard aligner, and the accuracy of our profile hmm approach in both aligning the sequences and trimming the poorly sequenced or assembled regions, thus providing an msa that is ready to use for further evolutionary and phylogenetic studies. multiple sequence alignment modeling: methods and applications; issues with sars-cov- sequencing data; nextflow enables reproducible computational workflows; biological sequence analysis: probabilistic models of proteins and nucleic acids; alignathon: a competitive assessment of whole-genome alignment methods; mafft multiple sequence alignment software version : improvements in performance and usability; ngphylogeny.fr: new generation phylogenetic services for non-specialists; similarities and evolutionary relationships of covid- and related viruses; scaling statistical multiple sequence alignment to large datasets; gisaid: global initiative on sharing all influenza data - from vision to reality; on the origin and continuing evolution of sars-cov- ; msaviewer: interactive javascript visualization of multiple sequence alignments. sincere thanks to amandine perrin (institut pasteur) for her help, and the gisaid team and all its data contributors for sharing their genome data. conflict of interest: none declared. key: cord- -y ufzx authors: ye, qing; zhou, jia; yang, guan; li, rui-ting; he, qi; zhang, yao; wu, shu-jia; chen, qi; shi, jia-hui; zhang, rong-rong; zhu, hui-min; qiu, hong-ying; zhang, tao; deng, yong-qiang; li, xiao-feng; xu, ping; yang, xiao; qin, cheng-feng title: sars-cov- infection causes transient olfactory dysfunction in mice date: - - journal: biorxiv doi: . 
/ . . . sha: doc_id: cord_uid: y ufzx olfactory dysfunction caused by sars-cov- infection represents as one of the most predictive and common symptoms in covid- patients. however, the causal link between sars-cov- infection and olfactory disorders remains lacking. herein we demonstrate intranasal inoculation of sars-cov- induces robust viral replication in the olfactory epithelium (oe), resulting in transient olfactory dysfunction in humanized ace mice. the sustentacular cells and bowman’s gland cells in oe were identified as the major targets of sars-cov- before the invasion into olfactory sensory neurons. remarkably, sars-cov- infection triggers cell death and immune cell infiltration, and impairs the uniformity of oe structure. combined transcriptomic and proteomic analyses reveal the induction of antiviral and inflammatory responses, as well as the downregulation of olfactory receptors in oe from the infected animals. overall, our mouse model recapitulates the olfactory dysfunction in covid- patients, and provides critical clues to understand the physiological basis for extrapulmonary manifestations of covid- . were validated to possess a high level expression of ace in mouse model (brann et al., ), which play a key role on the maintenance of blood-brain barrier, as well as the regulation of blood pressure and host immune response (armulik et al., ) . interestingly, some respiratory viruses, such as influenza virus, respiratory syncytial cov- infection on olfactory system, groups of - weeks old hace mice were intranasally infected with . × plaque-forming units (pfu) of sars-cov- . mice inoculated with the same volume of culture media were set as mock infection controls. at -and -days post infection (dpi), tissues from the respiratory tract and olfactory system were collected from the necropsied mice, respectively, and subjected to virological and immunological assays ( figure a ). 
as expected, high levels of sars-cov- rna were detected in the nasal respiratory epithelium (re), trachea and lung at and dpi, and peak viral rna ( . × rna copies/mouse) was detected in the lung at dpi (figure s a). robust viral nucleocapsid (n) protein was detected in the lung from sars-cov- infected hace mice, but not from the control animals (figure s b). strikingly, high levels of viral rna ( . × rna copies/mouse) were also detected in the olfactory mucosa (om) at dpi and remained at a high level ( . × rna copies/mouse) until dpi (figure b), while the viral rna levels were much lower in the ob and other parts of brain on dpi and decreased to a marginal level on dpi. furthermore, immunofluorescence staining detected a large amount of sars-cov- n protein in the oe along the om (figure c), while no viral n protein was detected in the ob and other parts of brain from sars-cov- infected hace mice (figure s c). additionally, in situ hybridization (ish) by rnascope demonstrated that sars-cov- rna was predominantly detected in the oe (figure s d), but not in the ob (figure s e). to examine whether sars-cov- infection directly impairs the olfactory function of infected mice, a standard bfpt was conducted on and dpi, respectively. remarkably, a significantly increased latency ( . s vs. . s; p= . ) to locate food pellets was observed in sars-cov- infected mice as compared with the control animals on dpi (figure d). of particular note, out of infected mice developed severe symptoms of anosmia as they failed to locate the food pellet within the observation period. interestingly, recovery from olfactory dysfunction of infected mice was observed at dpi, as the latency to locate food pellets did not differ from that of the control animals ( . s vs. . s; p= . ). thus, these results demonstrate that sars-cov- primarily infects oe and leads to olfactory dysfunction in mice. 
to overcome this, we took advantage of the tdtomato cassette downstream of hace transgene with an internal ribosome entry site (ires), which allows the detection of hace expression by cytoplasmic fluorescence of tdtomato ( figure s b ). an (figures a and c ). the sustentacular cells ( . %) and bowman's gland cells ( . %) represent as the major target cell types at dpi, while some microvillar cells ( . %) and hbcs ( . %) were also infected by sars-cov- (figures a and b) . additionally, a small population of iosns ( . %) were also infected by sars-cov- , while none mosn was infected at dpi (figures a and b) . interestingly, sars- cov- -positive hbcs and iosns were found adjacent to infected sustentacular cells (fig. a) . additionally, substantial viral protein was detected within the cilia, the of particular note, kegg pathway enrichment of down regulated transcripts and proteins in oe showed that genes belonging to "olfactory transduction" were significantly enriched ( figure c ). among all down regulated transcripts at dpi, were ors ( figures d and s b) , while among down regulated transcripts at dpi, were ors (figures s b and s e) . further rt-qpcr assay showed a dozen of or genes were significantly down regulated in response to sars-cov- infection ( figure e ), which may also attribute to the observed olfactory dysfunction. interestingly, sars-cov- positive signals were also observed in mosns and hbcs of infected animals, although we didn't detect any hace expression in these cells. the underlying mechanism remains elusive and a hace -independent spread of sars-cov- infection may be considered. 
we observed many ors were significantly down regulated at and dpi, suggesting pericytes: developmental, physiological, and pathological perspectives, problems, and promises sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes string v : protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets proinflammatory cytokines in the olfactory mucosa result in covid- induced anosmia inflammatory obstruction of the olfactory clefts and olfactory loss in humans: a new syndrome? chemical senses virological assessment of hospitalized patients with covid- antigenicity of the sars-cov- spike glycoprotein clinical characteristics of hospitalized patients with receptor-transporting protein short (rtp s) mediates translocation and activation of odorant receptors by acting through multiple steps systematical optimization of reverse-phase chromatography for shotgun proteomics association of chemosensory dysfunction and covid- in patients presenting with influenza-like symptoms simple behavioral assessment of mouse olfaction. current protocols in neuroscience regeneration and rewiring of rodent olfactory sensory neurons recombinant expression, refolding, purification and characterization of pseudomonas aeruginosa protease iv in escherichia coli development of a rapid high-efficiency scalable process for acetylated sus scrofa cationic trypsin production from escherichia coli inclusion bodies. protein expr purif a pneumonia outbreak associated with a new coronavirus of probable bat origin metascape provides a biologist-oriented resource for the analysis of systems-level datasets sars-cov- receptor ace is an interferon-stimulated figure . sars-cov- primarily infects the oe and causes olfactory dysfunction schematic diagram of experimental design. briefly, groups of - weeks old hace mice were infected with . × pfu of sars-cov- intranasally. 
olfactory function of infected mice was measured by the buried food pellet test at indicated times post inoculation. mice were sacrificed at dpi and dpi for viral detection and histopathological analysis schematic view of the om in the nasal cavity of mice in a sagittal plane, the dotted line indicated a coronal section (upper) immunostaining of om from sars-cov- infected mice for sars-cov- n protein (red) and dapi (blue). scale bar buried food pellet test. latency to locate the food pellets for mice infected with sars-cov- (n= ) or dmem (n= ) was measured at dpi and dpi a) representative multiplex immunofluorescent staining shows sars-cov- (sars-cov- n protein-positive) infects sustentacular cells (ck -positive, yellow arrows), bowman's gland cells (sox /ck -positive, white arrows), microvillar cells (cd /ck -positive, cyan arrows), hbcs (ck -postitive, gold arrows) and iosns (gap -positive b) statistical analysis of the percentage of each cell compartment within the sars cov- -positive cells. data were presented as mean ± sd multiplex immunofluorescent staining shows an om sample at dpi with sars cov- detected in the omp-positive mosns and the underlying nerve bundles. the framed areas labelled as c and c are shown adjacently at larger magnifications. scale bar sars-cov- infection induces apoptosis and immune cell infiltration in oe. 
(a) representative hematoxylin-eosin (he) shows histopathological changes of oe representative multiplex immunofluorescent detection of sustentacular cells (ck - positive) and microvilli (ezrin-positive) of oe representative immunofluorescent detection of mosns (omp-positive) of oe apoptosis of olfactory epithelial cells (cleaved-caspase -positive, white) after the panels below shows apoptosis of sustentacular cells (ck - positive, yellow; indicated by cyan arrows), hbcs (ck -positive, gold; indicated by gold arrows), mosn (omp-positive, green; indicated by magenta arrows), iosn (gap -positive, magenta; indicated by magenta arrows) and olfactory nerve bundles (omp/gap -positive representative multiplex immunofluorescent staining shows infiltration of macrophages (cd -positive, magenta), dendritic cells (cd -positive, green) and neutrophils (ly- g-positive, white) in the oe after infection representative multiplex immunofluorescent staining shows infiltration of cd cytotoxic t lymphocytes (magenta) with expression of perforin (green) and granzyme b (white) in the olfactory mucosa after infection. the framed areas are shown adjacently at larger magnifications. scale bar figure . sars-cov- infection triggers regeneration of oe representative immunofluorescent staining of ck (gold), sox (red) and ki (white) shows the increase of actively cycling olfactory stem cells as labelled the framed areas labelled as a and a are shown adjacently at larger magnifications representative immunofluorescent staining of ck (gold), ck (yellow), cd (cyan) and gap (magenta) shows the transition states during the differentiation of the framed areas labelled as b -b are shown adjacently at larger magnifications. green arrows in b denote ck /gap double-positive cells. gold arrows in b denote ck /ck double-positive cells. cyan arrows and red arrow in b denote ck /ck and ck /cd double-positive cells (a) dotplot visualization of enriched go terms of up regulated genes/proteins at / dpi in oe. 
gene enrichment analyses were performed using metascape against the go dataset for biological processes interaction map of proteins which consistently up regulated at both transcriptomic and proteomic levels along the course of sars-cov- infection in oe table s . key: cord- -h zoplkm authors: ghobrial, moheb; charish, jason; takada, shigeki; valiante, taufik; monnier, philippe p.; radovanovic, ivan; bader, gary d.; wälchli, thomas title: the human brain vasculature shows a distinct expression pattern of sars-cov- entry factors date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: h zoplkm a large number of hospitalized covid- patients show neurological symptoms such as ischemic- and hemorrhagic stroke as well as encephalitis, and sars-cov- can directly infect endothelial cells leading to endotheliitis across multiple vascular beds. these findings suggest an involvement of the brain- and peripheral vasculature in covid- , but the underlying molecular mechanisms remain obscure. to understand the potential mechanisms underlying sars-cov- tropism for brain vasculature, we constructed a molecular atlas of the expression patterns of sars-cov- viral entry-associated genes (receptors and proteases) and sars-cov- interaction partners in human (and mouse) adult and fetal brain as well as in multiple non-cns tissues in single-cell rna-sequencing data across various datasets. we observed a distinct expression pattern of the cathepsins b (ctsb) and -l (ctsl) - which are able to substitute for the ace co-receptor tmprss - in the human vasculature with ctsb being mainly expressed in the brain vasculature and ctsl predominantly in the peripheral vasculature, and these observations were confirmed at the protein level in the human protein atlas and using immunofluorescence stainings. 
this expression pattern of sars-cov- viral-entry associated proteases and sars-cov- interaction partners was also present in endothelial cells and microglia in the fetal brain, suggesting a developmentally established sars-cov- entry machinery in the human vasculature. at both the adult and fetal stages, we detected a distinct pattern of sars-cov- entry associated genes’ transcripts in brain vascular endothelial cells and microglia, providing a potential explanation for an inflammatory response in the brain endothelium upon sars-cov- infection. moreover, ctsb was co-expressed in adult and fetal brain endothelial cells with genes and pathways involved in innate immunity and inflammation, angiogenesis, blood-brain-barrier permeability, vascular metabolism, and coagulation, providing a potential explanation for the role of brain endothelial cells in clinically observed (neuro)vascular symptoms in covid- patients. our study serves as a publicly available single-cell atlas of sars-cov- related entry factors and interaction partners in human and mouse brain endothelial- and perivascular cells, which can be employed for future studies in clinical samples of covid- patients. ace is expressed in vascular endothelial-and smooth muscle cells of the brain, as previously shown at the protein level using immunohistochemistry we visualized the data using t-distributed stochastic neighbor embedding (t-sne) and computed differential gene expression analyses for each specific organ in adult ( organs/tissues, ca. ' cells) and fetal ( organs/tissues, ca. ' cells) human organs/tissues (figure a ,b). in the adult, the expression of ace was generally low in all tissues, in agreement with the previously reported low expression level of ace in scrna seq datasets , demonstrating the highest expression in the jejunum, duodenum, kidney, and ileum ( figure c) . 
tmprss expression displayed a somewhat broader distribution than ace (as reported in sungnak et al., nat med, ) with the highest expression in the prostate, followed by trachea, colon, stomach, pancreas, kidney, jejunum, duodenum, and lung ( figure c ). in the temporal lobe and the cerebellum, both ace and tmprss demonstrated low expression ( figure c ). as mentioned above, ctsb and ctsl can substitute tmprss protease activity. therefore, to address potential alternative mechanisms for protein s protease priming and thus sars-cov- entry into cells, we hereafter mainly focus on ctsb and ctsl. ctsb and ctsl were highly expressed in various organs, and ctsb showed the highest expression in the thyroid, heart, liver, and temporal lobe, followed by the lung, artery, and cerebellum ( figure c ). ctsl, in contrast, showed low expression in the temporal lobe and cerebellum (figure c ), and ctsb showed higher expression than ctsl in both the temporal lobe and the cerebellum ( figure c ). in the fetus, the expression of both ace and tmprss was low in most tissues, displaying, however highest expression values in the intestine and adrenal gland ( figure d ); ace and tmprss showed very similar expression patterns across all organs, as revealed by dot plots (figure d ). similar to the adult, ctsb and ctsl were variably but overall highly expressed in all tested organs, and ctsb showed higher expression than ctsl in both the spinal cord and brain (figure d ). to examine differential expression patterns of the brain/cns and the periphery (non-cns), we compared the brain/cns (pool of temporal lobe and cerebellum in the adult, and pool of brain and spinal cord in the fetus) to a pool of all peripheral organs (figure e ,f). interestingly, ace and tmprss showed a higher expression in the periphery in both the adult and the fetus (figure e ,f). 
in the adult, ctsb was more highly expressed in the brain/cns, both in intensity and in percent of expression, whereas ctsl showed higher expression in the periphery (figure e ). in the fetus, both ctsb and ctsl showed a higher expression in the periphery (figure f ). ctsb also displayed a higher expression than ctsl in the adult and fetal mouse brain/cns as compared to the periphery, thereby confirming the observations made in humans (extended data). interestingly, ctsb was the most highly expressed sars-cov- entry-associated gene in adult and fetal brain endothelial cells, whereas ctsl showed the highest expression of all sars-cov- entry-associated genes in adult and fetal peripheral endothelial cells (figure g ,h). together, these data suggest that the distinct expression pattern of sars-cov- proteases and associated enzymes in the cns is developmentally established. the brain vasculature is composed of endothelial cells and perivascular cells of the neurovascular unit, including neurons, pericytes, astrocytes, and immune cells such as microglia and macrophages , , , , (figure a ). to examine the expression patterns of sars-cov- entry factor transcripts in the neurovascular unit and thereby address potential entry mechanisms of sars-cov- , we next examined brain endothelial and perivascular cells ( figure a) . first, looking at all brain cells (endothelial and perivascular cells pooled together), we observed that ctsb showed the highest transcript expression of sars-cov- entry-associated genes in both the adult and fetal brain, and that the sars-cov- entry-associated genes were more highly expressed in the adult than in the fetal stage while displaying a broadly similar pattern ( figure b ). next, we compared the sars-cov- entry-associated gene expression patterns between endothelial and perivascular cells at the adult and fetal stages (figure c ,d,f,h,j).
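the two quantities compared throughout this section, average expression ("intensity") and percent of expressing cells, are the usual dot-plot summaries. a minimal sketch of how they can be computed per group for one gene is given below, in plain python. the function name, the input layout, and the toy values are assumptions made for illustration only; they are not the authors' pipeline or data, and "expressed" is assumed here to mean a value greater than zero.

```python
from collections import defaultdict


def dotplot_stats(expression, groups):
    """Return {group: (average_expression, percent_expressed)} for one gene.

    expression: per-cell expression values for a single gene (hypothetical input)
    groups:     per-cell group labels, same length as `expression`
    A cell counts as "expressing" when its value is > 0 (an assumption).
    """
    if len(expression) != len(groups):
        raise ValueError("expression and groups must have the same length")

    # Collect the per-cell values of each group.
    by_group = defaultdict(list)
    for value, group in zip(expression, groups):
        by_group[group].append(value)

    stats = {}
    for group, values in by_group.items():
        average = sum(values) / len(values)
        percent = 100.0 * sum(1 for v in values if v > 0) / len(values)
        stats[group] = (average, percent)
    return stats


# Toy values for illustration only, not data from the study.
toy_expression = [2.0, 0.0, 4.0, 0.0, 0.0, 1.0]
toy_groups = ["cns", "cns", "cns", "non-cns", "non-cns", "non-cns"]
toy_stats = dotplot_stats(toy_expression, toy_groups)
```

pooling organs into cns and non-cns, as done for the comparisons above, amounts to relabelling each cell with the pooled group before calling the function.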
next, we wondered whether these observed ctsb/ctsl and sars-cov- entry-associated factor expression patterns were retained in in vitro brain organoid models, which were recently employed to study sars-cov- invasiveness to the brain . in a recent single-cell rna-seq dataset of brain organoids , ctsb showed the highest expression in cells undergoing epithelial-to-mesenchymal transition (emt), followed by endothelial-like and endothelial-like progenitor cells and neuronal-like progenitor cells (extended data figure a -c). when comparing the expression patterns in endothelial-like cells to the pool of perivascular cells, ctsb was higher in endothelial-like cells and ctsl was higher in perivascular cells (extended data figure c ). these expression patterns were similar to the ones in the adult and fetal human brain datasets, indicating that brain organoids might constitute a model system to further explore the brain vascular-related observations made here. together, these data indicate that endothelial cells and perivascular cells (mainly microglia) of the fetal and adult cns neurovascular unit express ctsb but not tmprss , suggesting a potential alternative route by which sars-cov- might enter the brain endothelium and neurovascular unit. importantly, additional experiments are needed to validate these findings at the protein level. to further characterize the sars-cov- entry factor transcript expression patterns in brain endothelial cells, we examined the different brain endothelial cell clusters according to their arterio-venous specification, as previously described in mouse brain , , (figure a ). we found that ctsb was expressed in multiple brain endothelial cell types, including veins, capillaries, endothelial cells (ec ), antigen-presenting endothelial cells, and antigen-presenting vein endothelial cells (figure b ).
in all adult and fetal brain endothelial cell clusters, ctsb was among the most highly expressed sars-cov- entry-associated transcripts (figure a ,b and extended data figure a ,b). in the adult brain vasculature, the veins cluster showed the highest average expression of ctsb, followed by ec , capillaries, antigen-presenting veins, and antigen-presenting endothelial cells (figure a,b) , whereas in the fetal brain ctsb expression ranked veins > capillaries > arteries > ec (extended data figure a,b) . notably, recent studies using single-cell transcriptomics in human and mouse tissues revealed endothelial clusters in a variety of tissues, including the human bladder and kidney (and others) as well as the mouse and human lung, that express mhc class ii genes such as hla-dpa and hla-dra involved in mhc class ii-mediated antigen processing, loading and presentation . these observations of professional antigen-presenting signatures of endothelial cells suggested potential immune functions of endothelial cells, and ctsb might be presented at the cell surface upon sars-cov- infection of brain endothelial cells. these results suggest that ctsb expression in different endothelial clusters, including antigen-presenting endothelial cells, is developmentally established and might be reactivated in pathological conditions, for instance upon endothelial cell infection with sars-cov- .
ctsb correlates with genes and pathways involved in inflammation, angiogenesis, coagulation, and blood-brain-barrier permeability in brain endothelial cells
next, we wondered which genes were associated with ctsb in brain endothelial cells. we performed spearman's correlation analysis of ctsb in brain endothelial cells within the datasets.
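to make the correlation step concrete, the sketch below computes spearman's rank correlation between one target gene and a set of other genes and sorts the genes by coefficient. it is a plain-python illustration under stated assumptions: the helper names, the dictionary input, and the toy values are hypothetical, and the source does not specify which software was used for this analysis. spearman's coefficient is computed here as the pearson correlation of average ranks, which handles the many tied zeros typical of scrna-seq data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_correlated_genes(target, others):
    """Sort genes in `others` ({name: values}) by rho with `target`, descending."""
    scored = [(gene, spearman(target, values)) for gene, values in others.items()]
    scored = [(gene, rho) for gene, rho in scored if rho == rho]  # drop NaN
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy per-cell values for illustration only, not data from the study.
toy_target = [0, 1, 2, 3, 4]
toy_others = {
    "gene_a": [0, 2, 4, 6, 100],  # monotone increasing with the target
    "gene_b": [5, 4, 3, 2, 1],    # monotone decreasing with the target
    "gene_c": [0, 0, 0, 0, 0],    # never detected, correlation undefined
}
toy_ranking = rank_correlated_genes(toy_target, toy_others)
```

genes that are never detected have no defined rank correlation and are dropped rather than scored as zero, so that dropout does not masquerade as an absence of association.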
the observed correlation coefficients of the top ctsb-correlated genes were relatively low, which might be due to dropout effects and technical noise in the scrna-seq datasets, and the top ctsb-correlated genes showed a slight predominance for the capillary-venous side, consistent with that of ctsb ( figure c ). notably, we observed various genes associated with immune functions/inflammation, including innate and antiviral immunity (table ). expression of most of these inflammatory genes is highest in venous endothelial cells, and for some in vein antigen-presenting endothelial cells, antigen-presenting endothelial cells, and ec (figure c ), indicating immune functions for these endothelial cell clusters. the relatively high expression of inflammation-related genes in antigen-presenting (venous) endothelial cells is plausible, whereas the relative preference for the venous side remains to be further explored. moreover, genes involved in angiogenesis and cell-extracellular matrix interactions, including spred , fscn , slc a r , gpr , and others, were also abundantly present in the rank list and were highest in capillaries, veins, and ec (figure c, green and supplementary table ). we also observed several genes associated with endothelial glucose and fatty acid metabolism, namely mrpl and txn , which were highest in ec and veins (figure c, blue and supplementary table ) . we observed a similar expression pattern of the top ctsb-correlated genes in the fetal brain vascular endothelial cells (extended data figure c and supplementary table ). genes involved in angiogenesis and cell-extracellular matrix interactions, including sparc, rhoa, tmsb , vim, cldn , and others, as well as genes associated with endothelial glucose and fatty acid metabolism, namely sepw and pomp, were also abundantly present and were all highest in veins, followed by capillaries, arteries, and ec (extended data table ) .
we next performed pathway analysis using gsea and cytoscape on the ctsb-correlated genes. notably, the top regulated pathways included inflammation, angiogenesis, coagulation, cell-extracellular matrix interaction, viral-host interaction, vascular metabolism, blood-brain-barrier permeability, and reactive oxygen species (ros) in both the adult and the fetal brain endothelium. together, these data reveal that ctsb is highly expressed in various endothelial cell clusters of the fetal and adult human brain and that pathways downstream of ctsb might provide a suggestive explanation of some of the neurovascular symptoms observed in covid- patients. upon entry of the sars-cov- virus into the host cell, the virus interacts with multiple intracellular proteins . therefore, we next addressed the expression of the recently described sars-cov- interaction partners (protein-protein interactions (ppis), that is, proteins that are physically associated with sars-cov- proteins) in the different adult and fetal organs (extended data figure a -d). in the adult, sars-cov- interaction partners were highly expressed in trachea, cerebellum, artery, kidney, and temporal lobe (extended data figure e ), and the pooled analysis showed a higher expression in cns versus non-cns organs (extended data figure f ). in the fetus, sars-cov- interaction partners were highly expressed in fetal male gonad, adrenal gland, and brain, together with heart, intestine and kidney (extended data figure g ). in the pooled analysis, however, due to the relatively low expression in the fetal spinal cord, the expression was higher in non-cns as compared to cns organs (extended data figure h ). next, we focused on the endothelial cells in the different adult and fetal organs (extended data figure i -l).
in the adult, endothelial cells of the temporal lobe and the cerebellum were among the organs/tissue displaying high expression of sars-cov- interaction partners, together with endothelial cells from artery, cervix, trachea, uterus, kidney, muscle and spleen (extended data figure m ). the relatively high expression in cns versus non cns endothelial cells was further revealed in the pooled analysis (extended data figure n ). at the fetal stage, brain endothelial cells showed the highest expression of sars-cov- interaction partners among all organs/tissues, followed by endothelial cells from male gonad, adrenal gland, pancreas, heart, and muscle (extended data figure o ). again, due to the relatively when addressing the different brain endothelial cell clusters according to their arterio-venous specification, we found that in both adult and fetal brain endothelial cell clusters, sars-cov- interaction partners showed a tendency of higher expression in the veins and capillaries as compared to the arteries (extended data figure y -z). in the adult endothelial cells, the cluster endothelial cell ec showed the highest expression of sars-cov- interaction partners, followed by antigen-presenting veins, veins, capillaries, endomt, and arteries (extended data figure y ), whereas sars-cov- interaction partners showed the highest expression in fetal capillaries > arteries > veins > ec (extended data figure z ). with ctsb being a sars-cov- entry associated gene and sars-cov- interaction partners being intracellular virus-interaction partners post viral cellular entry, we next wondered whether there was a correlation between the top ctsb correlated pathways and the sars-cov- interaction partners. 
notably, the top ctsb-correlated pathways including inflammation, angiogenesis, coagulation, cell-extracellular matrix interaction, vascular metabolism, blood-brain-barrier permeability, and reactive oxygen species (ros) all displayed a high correlation with sars-cov- interaction partners in both the adult and fetal brain endothelium (extended data figure a -zvi, extended data figure a -zvi). thus, all these pathways that were recently linked to covid- patients with neurological-and vascular symptoms , , , showed a strong correlation with the intracellular sars-cov- interaction partners, therefore suggesting a link between ctsb-mediated cellular entry and intracellular signaling post viral entry into the host cell. ctsb -but not tmprss -protein is expressed in the human brain vasculature whereas ctsl-but not tmprss -protein is expressed in the peripheral vasculature finally, to examine the observed mrna expression patterns of ctsb, ctsl, and tmprss at the protein level, we took advantage of the publicly available human protein atlas dataset (https://www.proteinatlas.org/). tmprss did not show protein expression in the brain, but was expressed in the prostate, jejunum, kidney, and others, again similar to the observed mrna expression patterns ( figure a ). 
interestingly, and in line with the observations made in the scrna-seq data, immunohistochemical analysis revealed that tmprss was neither expressed in the brain endothelium ( in contrast to ctsb, ctsl did not show protein expression in the brain, but was expressed in numerous other organs known to be affected by systemic sars-cov- , such as the prostate, lung, gastrointestinal tract, liver, kidney, jejunum, and colon, among others (extended data immunohistochemical analysis using the validated antibody cab , , revealed that ctsl was expressed in endothelial cells of various organs but not in the brain vasculature, displaying high cytoplasmic/membranous expression in peripheral vascular endothelial cells across various peripheral organs including the liver, placenta, lung, spleen, colon, kidney, pancreas, and heart (extended data figure a -m). to further validate these findings, we performed double immunostainings for an endothelial cell marker (cd ) and ctsb, ctsl, and tmprss on human temporal lobe specimens from our neurosurgical operating theatre. ctsb showed high expression in cd + endothelial cells as well as in cd cells of the neuropil and intermediate expression in neun + neurons, gfap + astrocytes, ib + microglia (figure a -zii), in agreement with the observations made in the hpa and the scrna seq datasets described above. tmprss and ctsl either were absent in brain endothelial-and perivascular cells or showed very low expression levels (figure a -l), again confirming the scrna-seq-and human protein atlas data. together, these data indicate that whereas tmprss is absent in brain-and peripheral vascular endothelial cells, ctsb is expressed in the brain vasculature and ctsl in the peripheral vasculature at both the mrna and protein levels, suggesting potential alternative mechanism for sars-cov- cellular entry into the brain and peripheral vasculature. 
in this study, analysing multiple scrna-seq datasets as well as referring to the human protein atlas and performing immunofluorescent stainings, we found that the human brain vasculature expresses a distinct profile of sars-cov- entry-associated genes. most interestingly, the viral entry-associated protease ctsb but not tmprss is highly expressed in brain vascular endothelial cells, whereas ctsl but not tmprss is highly expressed in vascular endothelial cells of peripheral organs. these observations suggest a potential mechanism of sars-cov- viral entry into brain and peripheral endothelial cells that might underlie the neurovascular and vascular symptoms observed in some covid- patients (extended data figure ). the expression patterns we found can provide insight into the susceptibility of the human vasculature to sars-cov- infections and could, in a next step, be followed by similar studies on the currently still limited clinical samples from covid- patients. towards this end, we examined gene expression of sars-cov- entry-associated genes and interaction partners in multiple scrna-seq datasets from different tissues, including the brain, spinal cord and various peripheral tissues. we are aware of the limitations of these analyses: for instance, sparse cell types might be lacking or underrepresented/under-detected due to their low abundance, technical limitations related to isolation protocols, or bioinformatic analyses including technical/computational dropout effects. thus, the specificity is high (meaning positive results are highly reliable) whereas the sensitivity is limited, and negative results should therefore be interpreted with care. sars-cov- shows systemic effects affecting multiple organ systems including the brain, liver, kidney, heart, and others, and increasing evidence suggests that blood vessel endothelial cells exert crucial roles in the underlying pathogenesis , , , .
whereas cellular entry of sars-cov- exclusively depends on ctsl and tmprss , , cellular entry of sars-cov- can occur either via tmprss or ctsb/ctsl , (extended data figure ). thus, the widespread expression of ctsb in the brain (vasculature) and the expression of ctsl in multiple other sars-cov- affected organs/tissues suggest that ctsb and ctsl might be involved in alternative entry mechanisms and transmission routes that could be responsible for the neurovascular and vascular phenotypes observed in covid- patients , . furthermore, our findings that ctsb is, in addition to the brain endothelium, also highly expressed in microglia (and that ctsl shows high expression in peripheral endothelial cells and macrophages) suggest a potential mechanism involving these two cell types, e.g. via an endothelial-to-microglia crosstalk , , that could explain endotheliitis/endothelialitis in peripheral organs , and in the brain . it has been suggested that the sars-cov- affected endothelial cell is characterized by increased vascular leakage and by pro-coagulative and proinflammatory states in the lung , and that a combination of those might account for the % of covid- patients with severe lung damage, in part owing to an overreacting inflammatory response and multi-organ failure . interestingly, our endothelial cluster analysis revealed a relative preference of ctsb expression for the venous endothelial side, while the correlation analysis showed a high expression of inflammation-related genes in antigen-presenting (venous) endothelial cells, suggesting a preference of sars-cov- entry factors and downstream pathways for the venous endothelium.
in light of the findings that sars-cov- caused venous vasculitis in small veins of the brain and lung , and that sars-cov- causes severe endothelialitis particularly in the venous vascular bed , consistent with our findings, it will be exciting to further investigate the potentially different susceptibility of endothelial cells along the arteriovenous specification. moreover, ctsb is known to exert pivotal roles in cancer including brain tumors and in brain inflammation/inflammatory brain disease via interleukin- β (il -β) and tumor necrosis factor-α (tnf-α) , and is capable of crossing the blood-brain-barrier . thus, we speculate that the co-expression of ctsb in microglia and endothelial cells could account for vascular leakage and opening of the blood-brain-barrier with subsequent increased leukocyte migration across the bbb and infection of the brain vascular endothelium , which might explain neurological symptoms such as stroke, epilepsy, necrotizing hemorrhages, and encephalopathy , . the results from our correlation and pathway analyses, showing that ctsb is tightly linked to genes and pathways involved in viral entry, inflammation, angiogenesis, coagulation, vascular metabolism, and blood-brain-barrier leakage, further support these speculations. for instance, we observed clusters of antigen (mhc class ii)-presenting brain endothelial cells expressing ctsb, possibly explaining cellular crosstalk (for instance with microglia) required for inflammatory responses , as endothelial cells cannot act as antigen-presenting cells on their own. along these lines, the other cathepsins that, in addition to ctsb, showed higher expression in the adult (ctsa, ctsd) and fetal (ctsc, ctsf) brain endothelium were all linked to inflammatory processes. for instance, ctsb is known to interact with ctsa and ctsd , , - and to be involved in brain inflammation and other immune functions .
moreover, the viral receptor/receptor-associated enzyme st gal , encoding the human beta-galactoside alpha- , -sialyltransferase , is involved in the generation of the cell-surface carbohydrate determinants and differentiation antigens hb- , cdw , and cd , and is found in mouse high endothelial cells of mesenteric lymph node and peyer's patches, where it is suggested to be involved in b cell homing to peyer's patches . interestingly, our correlation analysis indeed showed a high correlation of ctsb with inflammatory pathways, angiogenesis, and viral-host interaction, therefore further suggesting a direct link of ctsb with inflammatory, angiogenic, and coagulative responses in the brain endothelium, pathways that were all recently shown to exert pivotal roles in sars-cov- pathogenesis and in covid- patients in peripheral and cns organs , , . moreover, inflammatory, angiogenic, coagulative, cell-ecm interaction, vascular/bbb permeability, metabolism, and oxidative stress pathways all correlated with both the sars-cov- entry-associated gene ctsb and the sars-cov- intracellular interaction partners, suggesting a link between sars-cov- entry mechanisms and intracellular signaling. thus, the role of ctsb and the other sars-cov- entry-associated genes in inflammatory, angiogenic, and the other aforementioned responses within the brain endothelium deserves further investigation. for instance, we observed a high correlation of ctsb with endothelial glucose and fatty acid metabolism, known to be key for viral replication and propagation , , indicating that endothelial cells represent an attractive metabolic target for sars-cov- infection. in that regard, it was recently suggested that risk factors for covid- such as old age, obesity, hypertension, and diabetes mellitus are all characterized by pre-existing vascular dysfunction with altered vascular endothelial metabolism .
thus, whether certain endothelial clusters in specific organs and certain pathological conditions display a metabolic signature that is more prone to sars-cov- infection remains to be explored. interestingly, ctsb and ctsl have been shown to substitute tmprss for viral entry of the ebola virus, which affects the vasculature in the brain and in peripheral organs , but their role in sars-cov- remains unknown. whereas the expression of ctsb in the brain and in multiple peripheral tissues was previously reported - , , , the high expression of ctsb in the brain vasculature endothelium and of ctsl in the peripheral vascular endothelium at both the mrna and protein levels was not reported to our knowledge, nor was the absence of tmprss . our own experiments revealed ctsb expression in endothelial cells, neurons, astrocytes, pericytes and microglia within the human brain neurovascular unit and might indeed provide the basis of a potential explanation for some of the putative sars-cov- mediated effects on the human blood-brain-barrier as well as some of the observed neurovascular symptoms in covid- patients , . we thus propose a working model in which sars-cov- (and to a much lower extent sars-cov- ) can infect brain endothelial cells via ace and mainly ctsb and peripheral endothelial cells via ace and mainly ctsl (extended data figure ). however, regarding the periphery, further validation is needed addressing the ctsl expression patterns within the perivascular niche in various peripheral vascular beds taking into account their tissue-specific properties . we are also well aware of the limitations of our study needing further confirmation using functional assays in vivo and in vitro. taken together, our findings may have important implications for understanding sars-cov- cellular entry and viral transmissibility in the brain, the brain vasculature, and in peripheral vascular beds. 
targeting ctsb (and ctsl) using already approved drugs/known inhibitors (for instance e- d, ammonium chloride) could inhibit angiogenesis, vascular metabolism, vascular leakage, and the inflammatory response, resulting in vascular normalization , . as ctsb is expressed in the brain endothelium, and as sars-cov- invades the brain (putatively via ctsb) and might affect blood-brain-barrier integrity , , our discoveries might have important translational implications for both neurovascular and vascular symptoms observed in covid- patients. in summary, our work further illustrates the opportunities emerging from integrative analyses of publicly available datasets including scrna sequencing and the human protein atlas. tissue sections ( µm) of human adult brain (derived from temporal lobectomies) were stained for tmprss , ctsb, ctsl (green), the vascular endothelial cell marker cd (pan-endothelial marker, red), the microglial marker iba (red), the neuronal marker neun (red), the pericyte marker pdgfrb (red), the astrocytic marker gfap (red), and dapi nuclear counterstaining (blue). a-l expression of tmprss , ctsb, and ctsl in blood vessel endothelial cells in the adult human brain. datasets of multiple human and mouse tissues were retrieved from the published human and mouse cell atlas , . adult brain datasets were retrieved from publicly available sources including (jäkel et al. benjamini-hochberg correction false-discovery rate (fdr) q-value that ranges from (highly significant) to (not significant).
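for readers unfamiliar with the benjamini-hochberg correction mentioned above, the following plain-python sketch shows the standard procedure for turning a list of p-values into fdr q-values. the function name and the toy p-values are assumptions for illustration; the source does not say how its pathway analysis tools compute the q-value internally, so this is the textbook procedure rather than a description of that software.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Each sorted p-value p_(k) is scaled by m / k (m = number of tests,
    k = 1-based rank), then made monotone by taking a running minimum
    from the largest p-value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values


# Toy p-values for illustration only, not results from the study.
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_q = benjamini_hochberg(toy_p)
```

a pathway would then be called significant when its q-value falls below a chosen fdr threshold; the threshold itself is an analysis choice that is not recoverable from the text here.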
the resulting pathways are ranked using nes.
figure and extended data figure panels (image residue; dot plots with axes "average expression" and "percent expressed"): expression of sars-cov- entry-associated genes (ace, tmprss, ctsb, ctsl, ctsa, ctsc, ctsd, ctse, ctsf, ctsg, ctsh, ctsk, ctso, ctss, ctsv, ctsw, ctsz, anpep, dpp, st gal) in adult and fetal endothelial cells (ecs), non-cns ecs, and perivascular cells (pvcs), with additional panels for the top ctsb-correlated genes, sars-cov- interaction partners, and fa metabolism.
the neuroinvasive potential of sars-cov may play a role in the respiratory failure of covid- patients epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study a novel coronavirus from patients with pneumonia in china the socio-economic implications of the coronavirus pandemic (covid- ): a review coronavirus patients are reporting neurological symptoms. here's what you need to know sars-cov- : underestimated damage to nervous system guillain-barre syndrome associated with sars-cov- infection in a pediatric patient guillain-barre syndrome associated with sars-cov- detection and a covid- infection in a child sars-cov- -associated guillain-barre syndrome with dysautonomia guillain-barre syndrome associated with sars-cov- guillain-barre syndrome associated with sars-cov- infection guillain-barre syndrome associated with sars-cov- infection guillain-barre syndrome associated with sars-cov- infection: causality or coincidence?
the neuroinvasive potential of sars-cov may play a role in the respiratory failure of covid- patients cov- : olfaction, brain infection, and the urgent need for clinical samples allowing earlier virus detection neurological involvement of coronavirus disease : a systematic review neurological associations of covid- neurological and neuropsychiatric complications of covid- in patients: a uk-wide surveillance study inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss a transmembrane serine protease is linked to the severe acute respiratory syndrome coronavirus receptor and activates virus entry proteolytic cleavage of the sars-cov- spike protein and the role of the novel s /s site effect of angiotensin-converting enzyme inhibition and angiotensin ii receptor blockers on cardiac angiotensin-converting enzyme tissue distribution of ace protein, the functional receptor for sars coronavirus. 
a first step in understanding sars pathogenesis a sars-cov- protein interaction map reveals targets for drug repurposing construction of a human cell landscape at single-cell level sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes single-cell rna-seq data analysis on the receptor ace expression reveals the potential risk of different human organs vulnerable to -ncov infection single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses the neurovascular unit -concept review the blood-brain barrier/neurovascular unit in health and disease the neurovascular unit coming of age: a journey through neurovascular coupling in health and disease altered human oligodendrocyte heterogeneity in multiple sclerosis single-cell multi-omic integration compares and contrasts features of integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain massively parallel single-nucleus rna-seq with dronc-seq single-cell genomics identifies cell type-specific molecular changes in autism molecular diversity of midbrain development in mouse, human, and stem cells neuroinvasion of sars-cov- in human and mouse brain. 
biorxiv engineering of human brain organoids with a functional vascular-like system a molecular atlas of cell types and zonation in the brain vasculature single-cell transcriptome atlas of murine endothelial cells an integrated gene expression landscape profiling approach to identify lung tumor endothelial cell heterogeneity and angiogenic candidates pulmonary vascular endothelialitis, thrombosis, and angiogenesis in covid- covid- : the vasculature unleashed profiling of atherosclerotic lesions by gene and tissue microarrays reveals pcsk as a novel protease in unstable carotid atherosclerosis matrix metalloproteinase- cleavage of the beta integrin ectodomain facilitates colon cancer cell motility heterogeneity in signaling pathways of gastroenteropancreatic neuroendocrine tumors: a critical look at notch signaling pathway microglial interactions with the neurovascular system in physiology and pathology microglia-blood vessel interactions: a double-edged sword in brain pathologies the clinical pathology of severe acute respiratory syndrome (sars): a report from china direct evidence of sars-cov- in gut endothelium the nlrp inflammasome contributes to brain injury in pneumococcal meningitis and is activated through atp-dependent lysosomal cathepsin b release the authors declare no competing financial interests. key: cord- -v s ski authors: belmonte-reche, efres; serrano-chacón, israel; gonzalez, carlos; gallo, juan; bañobre-lópez, manuel title: exploring g and c-quadruplex structures as potential targets against the severe acute respiratory syndrome coronavirus date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v s ski in this paper we report the analysis of the -ncov genome and related viruses using an upgraded version of the open-source algorithm g -im grinder. this version improves the functionality of the software, including an easy way to determine the potential biological features affected by the candidates found. 
the quadruplex definitions of the algorithm were optimized for -ncov. using a lax quadruplex definition ruleset, which accepts amongst other parameters two-residue g- and c-tracks, hundreds of potential quadruplex candidates were discovered. these sequences were evaluated by their in vitro formation probability, their position in the viral rna, their uniqueness and their conservation rates (calculated in over three thousand different covid- clinical cases sequenced at different times and locations during the ongoing pandemic). these results were compared sequentially to other coronaviridae members, other group iv (+)ssrna viruses and the entire realm. sequences found in common with other species were further analyzed and characterized. sequences with high scores unique to the -ncov were studied to investigate the variations amongst similar species. quadruplex formation of the best candidates was then confirmed experimentally. using nmr and cd spectroscopy, we found several highly stable rna quadruplexes that may be suitable theranostic targets against the -ncov. graphical abstract the severe acute respiratory syndrome coronavirus (sars-cov- or -ncov) is a positive-sense single-stranded rna virus from the betacoronavirus genus, within the coronaviridae family of the nidovirales order. although it is believed to have originated from a bat-borne coronavirus, ( - ) the -ncov can spread between humans with no need of other vectors or reservoirs for its transmission. the virus is responsible for the ongoing covid- pandemic that has caused hundreds of thousands of deaths, millions of infections, and a disastrous strain on the economy of most countries and citizens worldwide. the origin of the virus has been traced back to the chinese city of wuhan, where the first cases of infected individuals were reported amongst the workers of the huanan seafood market ( , ) . 
this wet exotic animal market, where wild animals including bats and pangolins are sold and prepared for consumption, offers ample opportunities for pathogenic bacteria and viruses to adapt and thrive. such circumstances led cheng and colleagues to predict the current pandemic back in ( ) . in their own words: "the presence of a large reservoir of sars-cov-like viruses in horseshoe bats, together with the culture of eating exotic mammals in southern china, is a time bomb. the possibility of the re-emergence of sars and other novel viruses from animals or laboratories and therefore the need for preparedness should not be ignored". the fight against the -ncov has now become a global problem. in this current scenario, the scientific community is playing a fundamental role in defeating the virus and minimizing the number of victims. their work includes, to name a few, the development of fast and reliable detection methods, the identification of therapeutic targets within the virus, and the development of active drugs and vaccines to cure and to prevent infections, respectively. g-quadruplexes (g s) and i-motifs (ims) have been proposed as therapeutic targets in many disease aetiologies. g s are guanine (g)-rich dna or rna nucleic acid sequences where successive gs stack in a planar fashion via hoogsteen bonds to form four-stranded structures, stabilized by monovalent cations ( ) . ims, in contrast, are cytosine (c)-rich regions that fold into tetrameric structures of stranded duplexes ( ) ( ) ( ) . these are sustained by hydrogen bonds between the intercalated nucleotide base pairs c·c + when under acidic physiological conditions. the importance of these genomic secondary structures has been abundantly studied during the last years ( ) ( ) ( ) ( ) ( ) ( ) . they have been found to be regulatory elements in the human genome implicated in key functions such as telomere maintenance and genome transcription regulation, replication and repair ( ) . 
g structures have also been identified in fungi ( ) ( ) ( ) ( ) , bacteria ( ) ( ) ( ) ( ) ( ) and parasites ( ) ( ) ( ) ( ) ( ) ( ) . their occurrence is known in many viruses that afflict humans as well. these include the hiv- ( ) ( ) ( ) , epstein-barr ( , ) , human and manatee papilloma ( , ) , herpes simplex ( , ) , hepatitis b ( ) , ebola ( ) and zika ( ) viruses. here they can regulate the viral replication, recombination and virulence ( , , ) . we reported the presence of the known cmyb.s ( ) im within the epstein-barr virus ( ) . despite the lack of reports, ims present great potential as viable targets against viruses. for example, the in silico analysis of the rubella virus revealed an extremely dense genome of potential ims (density as counts per genomic length) that surpassed its human counterpart by over an order of magnitude ( ) . other viruses, such as the measles and hepacivirus c, were also very rich in potential ims with densities similar to the human genome. in this work, we wished to contribute to the ongoing effort against the covid- pandemic by investigating the relationship between the -ncov and quadruplex targets. with this aim, we analysed the prevalence, distribution and relationships of potential g sequences (pqs) and potential im sequences (pims) in its genome. these pqs and pims have been assessed according to their potential to form, uniqueness, frequency of appearance, conservation rates between different -ncov clinical cases, confirmed quadruplex-forming sequence presence and localization within the genome. the study of the -ncov and its quadruplex results was expanded to integrate the coronaviridae family, group iv of the baltimore classification and the entire virus realm, so as to allow a wider range of interpretation. with all this information at hand, our final objective was to identify biologically important pqs and pims candidates in the virus. 
to substantiate our bioinformatic analysis, we analysed some of these sequences experimentally by cd and nmr spectroscopy. our in vitro results confirmed the formation of stable quadruplexes that can form in the viral genome, suggesting that they may be suitable targets for new therapeutic or diagnostic agents ( , ) . hence, our analysis into the virus realm, and especially the -ncov, may provide useful insights into using quadruplex structures as targets in future anti-viral treatments. in this work, we have used the g -im grinder (gig) package for the analysis of all viruses ( ) . gig is an r-based algorithm that locates, quantifies and qualifies pqs, pims and their potential higher-order versions in rna and dna genomes. in order to extract more information and better analyse the viruses, we first upgraded gig. two new functions were developed and are now incorporated in the gig-package (as of gig version . . ) named gig.seq.analysis and gig.df.genomicfeatures (supplementary material, section ). to help us locate any g or im already studied in the literature, we updated the gig's database (gig.db) to version . . . the library now includes quadruplex-related sequences that can be identified within any of gig's results. the database is categorized by the capability of the sequence to form or not form quadruplex structures ( do, do not), their relation to g-based or c-based quadruplexes ( g , im) and the genome type ( dna, rna). the reference information (including doi and/or pubmedid) of each sequence is also listed and accessible to facilitate further studies. with these upgrades at hand, we retrieved the -ncov's reference sequence (gcf_ . ) from the ncbi database ( ) . we also downloaded those of other viruses which can cause fatal illness in humans, including six other pathogenic coronaviruses, for comparison (supplementary material, section ). 
as a workflow, we applied the functions gig.seq.analysis (to study their g- and c-run characteristics), g imgrinder (to locate quadruplex candidates) and g .listanalysis (to compare quadruplex results between genomes) from the g -im grinder package (gig) to all the viruses. the 'size restricted overlapping search and frequency count' method (method , m a and m b) was used to locate all the potential candidates. then, these pqs and pims were evaluated by their frequency of appearance in the corresponding genome, the presence of known-to-form quadruplex structures sequences within and their probability of quadruplex-formation score (as the mean of g hunter ( ) and the adaptation of the pqsfinder algorithm ( ) ). to compare between virus species, we calculated the density of potential quadruplex sequences per nucleotides ($\text{density} = k \times \text{results} / \text{genome length}$, with $k$ the reference number of nucleotides). we previously saw that viruses have a wider range of pqs and pims densities than that of the human, fungi, bacteria and parasite genomes ( ) . some were totally void whilst others were very rich in potential candidates. so, we explored different quadruplex definitions to determine the most useful configurations for the analysis of the viruses at hand. these different definitions control the characteristics of what the algorithm considers a quadruplex. they include the acceptable size of g- or c-repetitions to be considered a run, the acceptable amount of bulges within these runs, the acceptable loop sizes between runs, the acceptable number of runs to constitute a pqs or pims, and the total acceptable length of the sequence (figure , a) . a flexible configuration of quadruplex definitions will detect larger amounts of candidates at the expense of requiring more computing power and accepting sequences that are more ambiguous in forming quadruplex structures in vitro (as determined by their score; with longer loops, smaller runs, more bulges and more complementary g/c %, figure , b). more constrained definitions result in the opposite. 
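The role of the quadruplex definitions (run size, loop size, number of runs) and of the density measure can be illustrated with a small sketch. This is not the g -im grinder algorithm: it is a simplified regular-expression search that ignores bulges and total-length limits, and the function names, default parameter values and the `per` scaling constant are all hypothetical choices made for illustration.

```python
import re


def find_candidates(seq, base="G", min_run=2, max_run=5,
                    min_loop=1, max_loop=12, runs=4):
    """Return (start, sequence) pairs for overlapping quadruplex candidates.

    Simplified illustration only: no bulges inside runs, a fixed number of
    runs, and no limit on total length. Use base="G" for G-quadruplex
    candidates and base="C" for i-motif candidates.
    """
    seq = seq.upper()
    run = "%s{%d,%d}" % (base, min_run, max_run)
    loop = "[ACGTU]{%d,%d}?" % (min_loop, max_loop)
    pattern = run + ("(?:" + loop + run + ")") * (runs - 1)
    # A zero-width lookahead lets candidates that overlap each other be found.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(seq)]


def density(n_results, genome_length, per=100_000):
    """Candidates per `per` nucleotides (`per` is an assumed scaling value)."""
    return n_results * per / genome_length


if __name__ == "__main__":
    hits = find_candidates("AAGGAGGTGGCGGTT")
    print(hits)
    print(density(len(hits), genome_length=15))
```

Loosening the defaults (shorter minimum runs, longer loops) makes the pattern match more candidates, which mirrors the trade-off described above between a lax and a constrained configuration.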
hence, for the analysis, we chose three different configurations: a lax configuration (which accepts run bulges and longer ranges of runs, loops and total sizes), the predefined configuration of the package (which restricts sizes but still accepts run bulges), and the original formation. this scale is the mean of g hunter (that considers g richness and c skewness for pqs or vice versa for pims as factors) ( ) and an adaptation of pqsfinder (that considers run, loop and bulge effects on the structure) ( ) . positive values mean that the sequence is more capable of forming g s, whilst more negative values mean that it is more capable of forming i-motifs. values near zero are not good candidates. c. left, gig's quadruplex definitions used in this work. granting more freedom to the quadruplex search will increase the number of structures found, at the expense of requiring more computational power and potentially finding more sequences with ambiguous quadruplex formation potential. c. right, total results found within the -ncov by configuration and score criteria. d. pqs and pims densities (per nucleotides) found per different configuration and score criteria for viruses that cause mortal illness in humans. x scale is in logarithmic scale (base ). results are categorized by their score: intense colours (blue for pqs, yellow for pims) are the most probable to form in vitro (score over | |), lighter bars are the density of structures with at least a score of | | and grey bars are the densities without the score filter (hence accepting all potential structures). the different -ncov viral genomes sequenced during the pandemic (from december- to july- , by different laboratories worldwide) were retrieved from the online database gisaid ( , ) . these genomes are the result of filtering the database by their coverage (< % n content), completeness (> nucleotides) and association to a clinical patient history (only those that have it). 
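The sign convention of the score (positive for g-rich candidates, negative for c-rich candidates, near zero for poor candidates) follows from how a G4Hunter-style score is built. The sketch below implements only the published G4Hunter scoring idea (each G in a run scores the run length, capped at 4; each C scores the negative of that; other bases score 0; the sequence score is the mean). The adaptation of pqsfinder that the package averages with it is not reproduced here, and the function name is hypothetical.

```python
def g4hunter_score(seq):
    """Mean G4Hunter-style score of a sequence.

    Every base in a run of n consecutive G scores min(n, 4), every base in a
    run of n consecutive C scores -min(n, 4), and all other bases score 0.
    """
    seq = seq.upper()
    scores = []
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        # Extend j to the end of the current run of identical bases.
        while j < len(seq) and seq[j] == base:
            j += 1
        run_len = j - i
        if base == "G":
            value = min(run_len, 4)
        elif base == "C":
            value = -min(run_len, 4)
        else:
            value = 0
        scores.extend([value] * run_len)
        i = j
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    for s in ("GGGTGGGTGGGTGGG", "CCCACCCACCCACCC", "ACGTACGT"):
        print(s, g4hunter_score(s))
```

A g-rich candidate scores positive and its c-rich complement scores the same magnitude with a negative sign, which is why a single scale can rank both pqs and pims.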
all other viral genomes used were retrieved from the ncbi database. to analyse these genomes, we employed the workflow described in the pre-analysis section using the lax parameter configuration. we investigated the biological features potentially data were smoothed using the means-movement function within the jasco graphing software. melting transitions were recorded by monitoring the decrease of the cd signal at nm. heating rates were °c/h. transitions were evaluated using a nonlinear least squares fit assuming a two-state model with sloping pre- and post-transitional baselines. oligonucleotide solutions for cd measurements were prepared at the same buffer conditions as the nmr experiments. oligonucleotide concentration was m. we further expanded the search to the remaining viruses classified in the ncbi database to validate the predictions of our bioinformatics search, we selected three candidates using the criteria mentioned in the material and methods section. two of them were potential g forming sequences and the third one was an im candidate. the first g (covid-rna-g - ) examined is found in the n-gene of the -ncov with a conservation rate of . % ( figure , a entry ) . the nmr spectra of this rna exhibited a broad set of signals around - ppm, characteristic of guanine imino protons involved in g-tetrads. these signals are observed at high temperatures, indicating that the g-quadruplex is quite stable ( figure , b left). additional signals around - ppm, which are characteristic of watson-crick base-pairs, can also be observed at low temperature. these interactions may arise from loops between g-tracts or alternative conformations such as hairpin-like structures. the cd spectra of the candidate revealed a positive band at nm and a negative band with a minimum at nm, consistent with the formation of a g of a parallel topology ( figure , d, left) . 
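The two-state model with sloping baselines used for the melting analysis can be written down compactly. The sketch below is an illustration under stated assumptions, not the authors' fitting code: it uses the standard van 't Hoff form for the unfolding equilibrium, the parameter names and baseline tuples are hypothetical, and the least-squares fitting step itself is omitted (any nonlinear least-squares routine could be applied to `cd_signal`).

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def fraction_folded(temp_k, tm_k, delta_h):
    """Folded fraction at temperature temp_k (kelvin).

    delta_h is the unfolding enthalpy in J/mol (positive); tm_k is the
    melting temperature, where folded and unfolded states are equally
    populated.
    """
    k_unfold = math.exp((delta_h / R) * (1.0 / tm_k - 1.0 / temp_k))
    return 1.0 / (1.0 + k_unfold)


def cd_signal(temp_k, tm_k, delta_h, folded=(1.0, 0.0), unfolded=(0.0, 0.0)):
    """Model CD signal with linear (intercept, slope) baselines per state."""
    theta = fraction_folded(temp_k, tm_k, delta_h)
    folded_baseline = folded[0] + folded[1] * temp_k
    unfolded_baseline = unfolded[0] + unfolded[1] * temp_k
    return theta * folded_baseline + (1.0 - theta) * unfolded_baseline


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the sigmoidal transition.
    for t in (300.0, 320.0, 330.0, 340.0, 360.0):
        print(t, round(cd_signal(t, tm_k=330.0, delta_h=200000.0), 4))
```

In this form the melting temperature is the temperature at which the folded fraction is one half, and the sloping baselines absorb the linear drift of the signal before and after the transition.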
melting experiments monitored by cd confirmed the great stability of the g , whose melting temperature (tm) was calculated to be . ºc at [k + ] = mm. encouraged by these results, we additionally selected another candidate for experimental analysis with a very high conservation rate (covid-rna-g - ; figure , a, entry ). this candidate is located in the orf ab gene within the nsp region. as for the previous analysis, nmr and cd spectra revealed a stable parallel g quadruplex ( figure , b and d, right), with a cd-monitored tm of . ºc. in this case, the cd spectra presented an additional band at nm, most likely related to the association between two quadruplexes to form a dimeric structure. this is consistent with the number of imino signals observed in the nmr spectra at high temperatures, which suggests the presence of more than two g-tetrads. in the case of pims, we selected a very conserved candidate ( %) found in the orf ab -nsp region of the virus (covid-rna-im- ; figure , a entry ). in nmr, only two small signals appeared in the . - . ppm range at neutral ph. under acidic conditions more signals were observable, including a peak at ppm which could be associated with c·c+ iminos. however, further analysis by d nmr spectroscopy revealed that this signal arises from an au base pair (supplementary material, section . figure ). we must conclude that this sequence, although folded, does not form an im. this result is not totally unexpected, since the lower stability of rna vs dna im is well known. in spite of this negative result, and to check the capability of our algorithm to detect ims, we decided to study the dna version of this sequence (covid-dna-im- ; figure , a, entry ). most interestingly, the nmr spectra of this dna oligonucleotide exhibited several imino signals in the - ppm range, characteristic of c·c+ base pairs. these signals are observable in the . to . ph range ( figure , c, and supplementary material, section . figure ) . 
additionally, amino groups from c·c+ (in the - in this work, we have used g -im grinder to analyse the genome of the -ncov, and that of many other viruses, in search of potential quadruplex (both g and im) therapeutic targets. to or entamoeba histolytica may be less rich in g and c content ( ) , the size of these genomes enables finding rich g- or c-tracks that can ultimately form potential quadruplexes. in most viruses, however, this does not take place because of the small size of the genomes (in the range of tens to hundreds of thousands of nucleotides versus the tens of millions for the parasites mentioned, and thousands of millions for humans). furthermore, most of the g s found in viruses are complex sequences, with short runs and bulges (for example, hiv- ( , ) and ebola ( )), which elude detection when following traditional quadruplex definitions. to overcome these problems, we took advantage of the great modularity of g -im grinder, and developed, tested and successfully employed a lax quadruplex definition configuration for the analysis. with these settings, the number of candidates found increased greatly and included the complex sequences expected in viruses, at the expense of needing more computational power. with all these updates and configurations at hand, we focused on the reference -ncov and located unique pqs and pims sequences (each occurring only once in the genome) dispersed unevenly in the genome. % of these candidates had at least a medium probability of formation (score over | |). these were concentrated in the orfab gene (especially nsp and areas for pqs; and nsp , and for pims), the n-gene and s-gene (a highly variable gene that binds to the ace membrane receptor and controls the viral penetration into the cell ( )). the orf a (related to virulence by necrotic death inducement and cytokine expression ( )), m-gene (which encodes membrane glycoprotein ( )), orf and utr regions also presented these candidates. 
here they may play their biological role if formed. other genes, such as orf a and b, and orf were found totally void of any quadruplex candidates. we calculated the -ncov candidates' quadruplex conservation rates and quadruplex-related the highest scoring candidates found in -ncov were however not common to any other coronaviridae member species. so, we investigated the differences between them through genome alignments and found that most of the sequence versions amongst species ( out of ) were still able to form potential quadruplex structures even with modifications. therefore, these pqs and pims, although different from those in the -ncov, maintain their potential biological role and importance. expanding the search for common candidates to the entire virus realm, we matched one pqs and pims from the -ncov with the potential quadruplexes found in four viruses from group i belonging to the herpesviridae, podoviridae and siphoviridae families (all dsdna). with g -im grinder, we analysed the entire virus realm in a similar fashion to other studies in the literature ( , ) . however, we used a lax definition of quadruplexes to detect g- and c-structures and focused the comparison of the realm on the -ncov. whilst the -ncov did not present any of the published quadruplex sequences listed in the gig.db within its genome, other viruses including a wigeon-afflicting coronavirus did. in the entire virus realm, viruses presented at least one confirmed g sequence in their genome, while at least one confirmed im sequence (the dimensional discrepancies between both results may partially be due to the difference in the number of g and im entries in the database; and respectively). the sheer volume of species with confirmed quadruplex structures in all groups of viruses suggests that quadruplexes may be common and necessary genomic regulatory elements for viruses to "live", thrive and adapt; as seen in other organisms such as humans. 
however, the prevalence is not homogeneous and varies broadly at the group level although not that much at the family level. for example, some families like group i's herpesviridae and we, therefore, selected the best candidates to evaluate in vitro. the highly conserved candidate covid-rna- formed a parallel g stable even at ºc. this g , located in the n-gene, can possibly interact with the viral rna packaging, transcription and replication functions of the virus ( ) . the second sequence, covid-rna- , also formed a stable parallel quadruplex structure. in this case, the quadruplex monomers interacted amongst themselves to form a higher order structure. covid-rna- is located in the nsp region of orf ab very near its sud domain. this area has been associated with the increased pathogenicity of the virus compared to other coronaviridae that do not present it ( ) . additionally, it has been suggested that the sud domain interacts with g-quadruplexes of the host. these results, however, open the possibility of an intrinsic gene modulation that may be linked with an increased virulence. such a hypothesis can be extended to the sars-cov, as another stable pqs candidate was found in its genome in the same location (figure , b ). for pims, the dna version of a candidate located in the orf ab gene of the -ncov and with a % conservation rate formed an im at almost neutral ph. however, the sars-cov version of the im (which differs by one nucleotide in the first loop, from tt to tg) was unable to form even at ph . . as tt base pairs are common capping positions, the substitution of the t might prevent the folding in sars-cov. additionally, the presence of c in g s lowers overall stability of the quadruplex as c can base pair with g and ultimately hinder g-quartet formation ( ) . for c-based structures, the opposite but with the same effect might also be happening. when we analysed the rna version of the -ncov im, it did not form a quadruplex structure. 
although the sequences found in -ncov have an intermediate probability of formation, rna ims are known to be less stable than their dna versions ( ) . still, the g -im grinder methodology identified several more candidates with the potential to form ims in the virus. these results indicate that, especially for dna, g -im grinder can be used to find and characterize ims in even c-poor genomes. overall, these results greatly expand the current knowledge we have regarding quadruplexes and the -ncov ( ) , and open the door for targeting viruses in general, and the -ncov in particular, through the use of these nucleic acid sequences as therapeutic targets in future anti-viral treatments. the supplementary material is available online and includes information regarding the genomes used, how to access the results and additional figures. the -new coronavirus epidemic: evidence for virus evolution another decade, another coronavirus a pneumonia outbreak associated with a new coronavirus of probable bat origin clinical features of patients infected with novel coronavirus in wuhan epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study severe acute respiratory syndrome coronavirus as an agent of emerging and reemerging infection helix formation by guanylic acid i-motif dna: structure, stability and targeting with ligands fundamental aspects of the nucleic acid i-motif structures i-motif dna: structural features and significance to cell biology evidence for intramolecularly folded i-dna structures in biologically relevant ccc-repeat sequences genome-wide analysis reveals regulatory role of g dna in gene transcription '-utr rna g-quadruplexes: translation regulation and targeting g-quadruplexes and their regulatory roles in biology identification of multiple genomic dna sequences which form i-motif structures at neutral ph g-quadruplexes: prediction, characterization, and biological application the 
regulation and functions of dna and rna g-quadruplexes genomic distribution and functional analyses of potential g-quadruplex-forming sequences in saccharomyces cerevisiae g-quadruplex dna sequences are evolutionarily conserved and associated with distinct genomic features in saccharomyces cerevisiae dna replication through g-quadruplex motifs is promoted by the saccharomyces cerevisiae pif dna helicase a novel g-quadruplex binding protein in yeast-slx genome-wide prediction of g dna as regulatory motifs: role in escherichia coli global regulation control of bacterial nitrate assimilation by stabilization of g-quadruplex dna case studies on potential g-quadruplex-forming sequences from the bacterial orders deinococcales and thermales derived from a survey of published genomes the presence and localization of g-quadruplex forming sequences in the domain of bacteria rna g-quadruplex structures mediate gene regulation in bacteria structural polymorphism of the four-repeat oxytricha nova telomeric dna sequences g-quadruplexes in pathogens: a common route to virulence control? 
genome-wide regulatory dynamics of g-quadruplexes in human malaria parasite plasmodium falciparum telomeric g-quadruplexes: from human to tetrahymena repeats g-quadruplex identification in the genome of protozoan parasites points to naphthalene diimide ligands as new antiparasitic agents parasitic protozoa: unusual roles for g-quadruplexes in early-diverging eukaryotes a dynamic g-quadruplex region regulates the hiv- long terminal repeat promoter formation of a unique cluster of g-quadruplex structures in the hiv- nef coding region: implications for antiviral activity topology of a dna g-quadruplex structure formed in the hiv- promoter: a potential target for anti-hiv drug development role for g-quadruplex rna binding by epstein-barr virus nuclear antigen in dna replication and metaphase chromosome attachment g-quadruplexes regulate epstein-barr virus-encoded nuclear antigen mrna translation human papillomavirus g-quadruplexes identification of g-quadruplex forming sequences in three manatee papillomaviruses the herpes simplex virus- genome contains multiple clusters of repeated g-quadruplex: implications for the antiviral activity of a g-quadruplex ligand signals of human herpesviruses contain a highly conserved g-quadruplex motif a g-quadruplex motif in an envelope gene promoter regulates transcription and virion secretion in hbv genotype b chemical targeting of a g-quadruplex rna in the ebola virus l gene zika virus genomic rna possesses conserved g-quadruplexes characteristic of the flaviviridae family genome-wide analysis of regulatory g-quadruplexes affecting gene expression in human cytomegalovirus g-quadruplexes and g-quadruplex ligands: targets and tools in antiviral therapy a dynamic i-motif with a duplex stem-loop in the long terminal repeat promoter of the hiv- proviral genome modulates viral transcription i-motif formation in gene promoters: unusually stable formation in sequences complementary to known g-quadruplexes g -im grinder: when size and 
frequency matter. g-quadruplex, i-motif and higher order structure search and analysis tool g-quadruplexes in viruses: function and potential therapeutic applications database resources of the national center for biotechnology information re-evaluation of g-quadruplex propensity with g hunter pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in r prevalence of quadruplexes in the human genome highly prevalent putative quadruplex sequence motifs in human dna data, disease and diplomacy: gisaid's innovative contribution to global health: data, disease and diplomacy gisaid: global initiative on sharing all influenza datafrom vision to reality systematic investigation of sequence requirements for dna i-motif formation the rna i-motif sars-coronavirus open reading frame- a drives multimodal necrotic cell death evolutionary trajectory for the emergence of novel coronavirus sars-cov- . pathogens, g-quadruplex forming sequences in the genome of all known human viruses: a comprehensive guide relationship between g-quadruplex sequence composition in viruses and their hosts characterization of n protein selfassociation in coronavirus ribonucleoprotein complexes the sars-unique domain (sud) of sars coronavirus contains two macrodomains that bind g-quadruplexes new scoring system to identify rna g-quadruplex folding discovery of gquadruplex-forming sequences in sars-cov- the authors thank dr. matilde arévalo, rafael ferreira and sarah heselden for their help regarding this topic. the authors declare no conflict of interest. key: cord- -krim zt authors: wang, deguo title: one-pot detection of covid- with real-time reverse-transcription loop-mediated isothermal amplification (rt-lamp) assay and visual rt-lamp assay date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: krim zt background rapid and reliable diagnostic assays were critical for prevention and control of the coronavirus pneumonia caused by covid- . 
objective this study aimed to establish a one-pot real-time reverse-transcription loop-mediated isothermal amplification (rt-lamp) assay and a one-pot visual rt-lamp assay for the detection of covid- . methods six specific lamp primers targeting the n gene of covid- were designed, the rt-lamp reaction system was optimized with plasmid puc containing the n gene sequence, the detection limit was determined with a serial dilution of the same plasmid, and the one-pot real-time rt-lamp assay and one-pot visual rt-lamp assay for the detection of covid- were established. results our results showed that the one-pot rt-lamp assays can detect covid- with a limit of ≥ copies per μl of puc containing the n gene sequence. conclusion this study provides rapid, reliable and sensitive tools for facilitating preliminary and cost-effective prevention and control of covid- . the outbreak of covid- , first reported in wuhan, china, in december , has spread to many other countries in the last few weeks , and has caused almost infections and more than deaths. the virus spreads so quickly that it has attracted global attention, and rapid diagnosis is one of the effective ways to prevent and control the disease. chest ct and rt-pcr have been used for the clinical diagnosis of the coronavirus pneumonia , ; however, chest ct requires ct equipment and puts the professional operator at risk of infection, and rt-pcr places high professional and technical demands on operators. loop-mediated isothermal amplification (lamp) can amplify nucleic acids under isothermal conditions , and it has advantages over real-time pcr assays in specificity, sensitivity, cost effectiveness and rapidity , , . the objective of this study was to develop a one-pot real-time rt-lamp assay and a one-pot visual rt-lamp assay for diagnosis of the pneumonia caused by covid- , which would be suitable for use in less developed areas. 
the rt-lamp primers targeting the n gene of covid- (genbank accession no. mn . ) were designed using primerexplorer v (http://primerexplorer.jp/e/) and oligo (molecular biology insights, inc. colorado springs, co, usa) software packages. the primer sequences are gcc aaa agg ctt cta cgc a (f ), ttt ggc ctt gtt gtt gtt gg (b ), tcc cct act gct gcc tgg agt ttt cgg cag tca agc ctc ttc (fip), tcc tgc tag aat ggc tgg caa ttt ttt ttg ctc tca agc tgg ttc a (bip) , cga cta cgt gat gag gaa cga (lf) and gcg gtg atg ctg ctc t (lb), table , and the length of the targeted sequence was bp. the n gene (genbank accession no. mn . ) of covid was chemically synthesized and cloned into puc plasmid (herein referred to as puc -n dna) by general biosystems (anhui) co., ltd, the puc -n dna was used as the template for optimization of the rt-lamp system, as well as for determination of sensitivity. the real-time rt-lamp assay with above designed rt-lamp primers was performed in a -μl reaction mixture containing . mm each of forward inner primer (fip) and backward inner primer (bip), . mm each of forward outer primer (f ) and backward outer primer (b ), . mm of forward loop primer (lf) and backward loop primer (lb), . mm dntps, × bst dna polymerase buffer (zhengzhou shenxiang industrial co., ltd, china), × evagreen, × rox, pg puc -n dna, and u bst dna/rna polymerase . (new england biolabs, inc., ma, usa) , . the reaction mixtures were heated at °c , °c , °c and °c for min ( s per cycle), individually. the amplification plot and melt curve were obtained using a stepone tm system (applied biosystems, foster city, ca, usa). because the rna extraction step was omitted, the effect of inhibitors in blood samples on the amplification efficiency of the rt-lamp assays was to be determined. µl, . µl, µl and . 
µl pork blood samples (purchased from xuchang market, china) free of covid- were dissolved in µl -ncov-fast-sample nucleic acid releasing agent, respectively, which were added to above µl real-time rt-lamp reaction system and heated at the optimized temperature for min ( s per cycle) in stepone tm system (applied biosystems, foster city, ca, usa). the analytic sensitivities of the newly developed real-time rt-lamp assay and visual rt-lamp assay were determined with puc -n dna ranging from - . fg, and the reaction mixtures were heated at the optimal temperature for min in a steponetm system (applied biosystems, foster city, ca, usa) or in a water bath, when water bath was used, the reaction tube was sunk into water. the stability and the repeatability of the real-time lamp system were tested with one-month interval. the test was carried out at the optimized temperature determined as described above with four positive controls ( pg puc -n dna) and four negative controls (ddh o) in a stepone tm system (applied biosystems, foster city, ca, usa). for the rt-lamp reaction, as shown in figure , all positive controls (with puc -n dna as template) had amplification, and the results of all negative controls (dna template substituted by ddh o) were negative at °c, °c, °c and °c , however, there was no significant difference in the amplification efficiency, and °c was temporarily was selected for the subsequent experiments. the effect of inhibitors in blood samples on the amplification efficiency had been determined, as figure indicated, when the real-time rt-lamp reaction system was added with µl blood samples dissolved in µl -ncov-fast-sample nucleic acid releasing agent (hunan shengxiang biology technology co., ltd, china), there was no amplification; when . µl blood samples and µl -ncov-fast-sample nucleic acid releasing agent added, there was slight amplification; when µl or . 
µl blood samples dissolved in µl -ncov-fast-sample nucleic acid releasing agent added, there was acceptable effect on the real-time rt-lamp reaction. therefore, the blood sample volume of µl was selected for µl one-pot real-time or visual rt-lamp reaction. the detection limits of the one-pot real-time rt-lamp assay and the one-pot visual rt-lamp assay were determined using puc -n dna ranging from - . fg at °c for min in a steponetm system (applied biosystems, foster city, ca, usa) or in a water bath. the detection limits of both the one-pot real-time rt-lamp assay and the one-pot visual rt-lamp assay were found to be ≥ . fg puc -n dna (figure ) , which was equivalent to ≥ copies. the stability and the repeatability of the real-time lamp system were tested with one-month interval. as shown in figure , the reactions of four positive controls were positive with the same amplification plot and melt curve, while the reactions of all negative controls were negative with the same melt curve. this demonstrated that the newly established real-time lamp assay was robust and repeatable. the one-pot real-time rt-lamp assay and one-pot visual rt-lamp assay for detection of covid- were established in the study, and the detection limit was ≥ copies. although the specificity of the established rt-lamp assays had been not determined, they were still considered to be highly specific for following reason. upon alignment in dna data bank of japan, the target sequence of established rt-lamp assays had % identities ( / ) with that of covid- strains, and had % identities ( / ) with that of congeneric sars coronavirus strains, among which there were bp on the rt-lamp primers, it was reported that bp primer-template mismatches extend the detection time from min to min , therefore, the established rt-lamp assays were theoretically highly specific. the bst dna/rna polymerase . (new england biolabs, inc., ma, usa) had been used in the study. 
the enzyme has favorable performance in both amplification and reverse transcription, so it can use either dna or rna as template , ; no separate reverse transcriptase is needed, and the established assays can be used directly for detection of the rna virus. nucleic acid extraction was omitted in the established one-pot real-time and visual rt-lamp assays, which greatly reduces the infection risk for operators. furthermore, the visual rt-lamp assay is recommended over the real-time rt-lamp assay: one of four negative controls in the above sensitivity determination of the real-time rt-lamp assay showed amplification, while none of the four negative controls of the visual rt-lamp assay did. because any aerosol that leaks is washed away by the water in the water bath, the false positive rate caused by aerosol was higher for the real-time rt-lamp assay than for the visual rt-lamp assay, in which the reaction tube is submerged in water. this study established the one-pot real-time rt-lamp assay and one-pot visual rt-lamp assay for the detection of covid- , and provided rapid, reliable and sensitive tools for facilitating preliminary and cost-effective prevention and control of covid- . no external funding the covid- epidemic chest imaging appearance of covid- infection detection of novel coronavirus ( -ncov) by real-time rt-pcr loop-mediated isothermal amplification of dna accelerated reaction by loop-mediated isothermal amplification using loop primers development and evaluation of a loop-mediated isothermal amplification (lamp) method for detecting listeria monocytogenes in raw milk rapid detection of listeria monocytogenes in raw milk with loop-mediated isothermal amplification and chemosensor evaluation and improvement of lamp assays for detection of escherichia coli serogroups o , o , o , o , o , o , and o . 
afri health sci development of a real-time loop-mediated isothermal amplification (lamp) assay and visual lamp assay for detection of african swine fever virus (asfv) effect of internal primer-template mismatches on loop-mediated isothermal amplification a novel thermostable polymerase for rna and dna loop-mediated isothermal amplification (lamp) innate reverse transcriptase activity of dna polymerase for isothermal rna direct detection the author declares no competing interests. key: cord- -c lljdi authors: lopez-rincon, alejandro; tonda, alberto; mendoza-maldonado, lucero; mulders, daphne g.j.c.; molenkamp, richard; perez-romero, carmina a.; claassen, eric; garssen, johan; kraneveld, aletta d. title: classification and specific primer design for accurate detection of sars-cov- using deep learning date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: c lljdi in this paper, deep learning is coupled with explainable artificial intelligence techniques for the discovery of representative genomic sequences in sars-cov- . a convolutional neural network classifier is first trained on sequences from available repositories, separating the genome of different virus strains from the coronavirus family with considerable accuracy. the network’s behavior is then analyzed, to discover sequences used by the model to identify sars-cov- , ultimately uncovering sequences exclusive to it. the discovered sequences are first validated on samples from other repositories, and proven able to separate sars-cov- from different virus strains with near-perfect accuracy. next, one of the sequences is selected to generate a primer set, and tested against other state-of-the-art primer sets on existing datasets, obtaining competitive results. finally, the primer is synthesized and tested on patient samples (n= previously tested positive), delivering a sensibility similar to routine diagnostic methods, and % specificity. 
in this paper, deep learning is coupled with explainable artificial intelligence techniques for the discovery of representative genomic sequences in sars-cov- . a convolutional neural network classifier is first trained on sequences from ngdc, separating the genome of different virus strains from the coronavirus family with accuracy . %. the network’s behavior is then analyzed, to discover sequences used by the model to identify sars-cov- , ultimately uncovering sequences exclusive to it. the discovered sequences are validated on samples from ncbi and gisaid, and proven able to separate sars-cov- from different virus strains with near-perfect accuracy. next, one of the sequences is selected to generate a primer set, and tested against other state-of-the-art primer sets, obtaining competitive results. finally, the primer is synthesized and tested on patient samples (n= previously tested positive), delivering a sensibility similar to routine diagnostic methods, and % specificity. the proposed methodology has a substantial added value over existing methods, as it is able to both identify promising primer sets for a virus from a limited amount of data, and deliver effective results in a minimal amount of time. considering the possibility of future pandemics, these characteristics are invaluable to promptly create specific detection methods for diagnostics. the coronaviridae family presents a positive sense, single-strand rna genome. these viruses have been identified in avian and mammal hosts, including humans. coronaviruses have genomes from . kilo base-pairs (kbps) to . kbps, with g + c contents varying from % to %; human-infecting coronaviruses belonging to this family include sars-cov, mers-cov, hcov-oc , hcov- e, hcov-nl and hcov-hku . in december , sars-cov- , a novel, human-infecting coronavirus was identified in wuhan, china, using next generation sequencing (ngs) . 
as of the th august of , the new sars-cov- has , , confirmed cases across almost all countries, with , , cases in the european region . in addition, sars-cov- has an estimated mortality rate of - %, and it is spreading faster than sars-cov and mers-cov . as in a typical rna virus, new mutations appear in every replication cycle of the coronavirus, and its average evolutionary rate is roughly - nucleotide substitutions per site each year . in the specific case of sars-cov- , rt-qpcr testing using primers in the orf ab and n genes has been used to identify the infection in humans . this method has come into question; yang et al., in a study of respiratory specimens, showed that for - days after onset of illness, the sputum samples had a negative rate of . % in severe and . % in mild cases, followed by . % and . % in nasal swabs and finally % and . % for throat swabs . zhao et al. report that . % of patients did not test positive by rt-pcr , which has been further explored by arevalo et al. and woloshin et al. . these problems could be the result of the variation of viral rna sequences within virus species, and of the viral load in different anatomic sites . it has been noted that the population mutation frequency of site , located in the orf ab gene, and site , located in the orf gene, gradually increased from to % as the epidemic progressed . apart from the false negative test problems, sars-cov- assays can yield a small portion of false positives through nonspecific detection of other coronaviruses, as the virus is closely related to other coronavirus organisms . in addition, sars-cov- may be present together with other respiratory infections, hindering its identification , . thus, it is fundamental to improve existing diagnostic tools to contain the spread. for example, diagnostic tools combining computed tomography (ct) scans with deep learning have been proposed, achieving an improved detection accuracy of . % . 
another solution being used for studying sars-cov- is sequencing of the viral complementary dna (cdna). for example, we can use sequencing data from cdna resulting from the pcr of the original viral rna, e.g., real-time pcr amplicons, to identify sars-cov- . classification using viral sequencing techniques is mainly based on alignment methods such as fasta and blast . these methods rely on the assumption that cdna sequences share common features, and that their order is preserved among different sequences , . however, these methods have the drawback of requiring base sequences for the detection . nevertheless, it is necessary to develop innovative, improved diagnostic tools that target the genome to improve the identification of pathogenic variants, as sometimes several tests are needed to reach an accurate diagnosis. therefore, as an alternative, deep learning methods have been suggested for classification of dna sequences. the advantage of these methods is that they do not need pre-selected features to identify or classify dna sequences. deep learning has been efficiently used for classification of dna sequences, using one-hot label encoding and convolutional neural networks (cnn) , , although the examples in the literature feature dna sequences of length up to bps only. in particular, for the case of viruses, ngs genomic samples might not be identified by blast, as there are no reference sequences valid for all genomes, since viruses have a high mutation frequency . alternative solutions based on deep learning have been proposed to classify viruses, by dividing sequences into pieces of fixed length, ranging from bps to , bps . however, this approach has the negative effect of potentially ignoring part of the information contained in the input sequence, which is disregarded if it cannot completely fill a piece of fixed size. 
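the two ideas in the paragraph above, one-hot encoding of a nucleotide sequence and splitting a sequence into pieces of fixed length, can be sketched as follows. this is a minimal illustration and not the code of any of the cited works; the function names, the toy sequence and the piece length of 4 are all hypothetical. it shows how the bases that do not fill a complete piece are left over and would be ignored.

```python
# Illustrative sketch only: names, toy sequence and piece length are assumptions.
BASES = "ACGT"


def one_hot(sequence):
    """Encode a nucleotide string as rows of 4 values in the order A, C, G, T.

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


def split_fixed(sequence, piece_length):
    """Split a sequence into non-overlapping pieces of exactly piece_length bases.

    Returns (pieces, discarded_tail); the tail is the part that cannot
    completely fill a piece and is therefore lost to a fixed-length model.
    """
    usable = len(sequence) - len(sequence) % piece_length
    pieces = [sequence[i:i + piece_length] for i in range(0, usable, piece_length)]
    return pieces, sequence[usable:]


toy = "ATGCGTACGTTAGC"  # hypothetical 14-base sequence
pieces, tail = split_fixed(toy, piece_length=4)
encoded_first_piece = one_hot(pieces[0])
```

with this toy input, three full pieces are produced and the last two bases remain in `tail`, which is the information loss described above.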
the global impact of sars-cov- prompted researchers to apply effective alignment-free methods to the classification of the virus: for example, in the authors propose the use of machine learning digital signal processing for separating the virus from similar strains, with remarkable accuracy. nevertheless, there is no human-readable information that can be extracted from their black-box procedure, so the biological insight provided by their approach is limited. given the impact of the world-wide outbreak, international efforts have been made to simplify the access to viral genomic data and metadata through international repositories, such as the national genomics data center (ngdc) repository , the national center for biotechnology information (ncbi) repository and the global initiative on sharing all influenza data (gisaid) repository , expecting that the easiness to acquire information would make it possible to develop medical countermeasures to control the disease worldwide, as it happened in similar cases earlier [ ] [ ] [ ] . thus, taking advantage of the available information of international resources without any political and/or economic borders, we propose an innovative system based on viral gene sequencing. using a cnn to separate coronaviruses belonging to different strains , including sars-cov- , we apply techniques inspired by explainable ai in computer vision to discover representative cdna sequences that the network uses to classify sars-cov- . we then validate the discovered sequences on datasets not used during the training of the cnn, and show how to exploit them to create a novel, highly informative set of sequence features (e.g. viral sequences). such sequences can be later inspected and analyzed by human experts. experimental results show that the new set of sequence features leads traditional, simple classifiers, to correctly assess sars-cov- with remarkable accuracy (> %). 
a few of the discovered sequences also possess the correct characteristics for potentially becoming primers, as just checking for their presence in samples is enough to specifically identify sars-cov- ( fig. ) . laboratory testing on the most promising sequences identified showed that the primers found by our approach can be a viable alternative to the commonly adopted primers at the time of writing. figure . overall procedure to find the specific sars-cov- -bps rna sequences to create a primer set. summarizing the results of experiments - ( fig. ) , we discovered meaningful -bps sequences that best characterize sars-cov- . for all the analyzed data, these sequences appear only in sars-cov- samples and not in any other viruses, as summarized in table . remarkably, our results outperform earlier publications using machine learning for identifying sars-cov- (see for example ) , with the added benefit of producing human-readable results instead of a plain black box classifier. we calculated the frequency of appearance of different primer sets' sequences used in sars-cov- rt-pcr tests developed by who referral laboratories and compared it to our primer design in the dataset from the gisaid ( table ) repository. all of the sequences have a frequency of appearance of > %, with the exception of china-cdc-n-f with a . %. this is consistent with the percentage of genomes with mutation in the primer region in the gisaid latest update summary of august th . in the analysis of specificity in silico, we compared all the primers sets' sequences with the ncbi-b and ngdc dataset, the results show that hku-n-f, hku-n-r, charite-e-f, charite-e-r and us-cdc-n -f are not specific to sars-cov- as they detect sars-cov- too. the rest of the sequences, including our design, only appear in sars-cov- . 
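the in silico comparison described above amounts to counting in how many genomes a primer sequence appears. a minimal sketch of that count is given here, assuming exact string matching with no mismatches or degenerate bases, and assuming that reverse primers are searched as their reverse complement; the function names and the toy genomes are hypothetical and are not taken from the gisaid, ncbi or ngdc data.

```python
# Illustrative sketch only: exact matching, toy data, hypothetical names.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def appearance_frequency(primer, genomes, is_reverse=False):
    """Fraction of genomes that contain the primer target site exactly.

    A reverse primer anneals to the sense strand, so its reverse complement
    is what appears in the genome sequence as written.
    """
    if not genomes:
        return 0.0
    target = reverse_complement(primer) if is_reverse else primer.upper()
    hits = sum(1 for genome in genomes if target in genome.upper())
    return hits / len(genomes)


toy_genomes = ["TTAAGCTT", "CCCCGGGG", "GGAAGCAA"]  # hypothetical sequences
forward_freq = appearance_frequency("AAGC", toy_genomes)
reverse_freq = appearance_frequency("GCTT", toy_genomes, is_reverse=True)
```

running the same function on a set of sars-cov- genomes gives a frequency of appearance, while running it on genomes of other viruses gives a simple specificity check: a specific primer should have a frequency of zero there.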
thus, in summary from different primer sets, of them are not specific to sars-cov- , and from the remaining , considering frequency of appearance only, our design is in rd best option calculated with the lower limit. . % china-cdc-n- to validate the data obtained in silico by laboratory methods a conventional pcr was performed on cdna obtained from rna from sars-cov- and other human coronaviruses. in addition, rnas from nasopharyngeal swabs from six patients previously diagnosed with sars-cov- infection and four patients negative for sars-cov- by routine diagnostic method were analyzed with the same conventional pcr (fig. ) . different dilutions of sars-cov- rna were detected with similar sensitivity compared to the diagnostic reference assay. (fig. lanes - ) . our candidate primer set exclusively detected sars-cov- and did not amplify rna from other human coronaviruses (figure , lanes - ) . the candidate primer set was able to detect sars-cov- rna from patient samples previously found positive for sars-cov- , but not in patients previously found negative (fig. , lanes - ) . although further validation will be required to develop this candidate primer set into a diagnostic assay, our results clearly demonstrate the power of our method to select potential sequences for further validation. being able to reliably identify sars-cov- and distinguish it from other similar pathogens is important to contain its spread. the time of processing samples and the availability of reliable diagnostic tests is a challenge during an outbreak. developing innovative diagnostic tools that target the genome to improve the identification of pathogens, can help reduce health costs and time to identify the infection, instead of using unsuitable treatments or testing. moreover, it is necessary to perform an accurate classification to identify the different species of coronavirus, the genetic variants that could appear in the future, and the co-infections with other pathogens. 
given the high transmissibility of sars-cov- , proper diagnosis of the disease is urgent to stop the virus from spreading further. considering the false negatives given by standard rt-qpcr detection, better approaches, such as those using deep learning, are necessary to properly detect the virus. while the accuracy of current rt-qpcr testing is around %, and ct scans with deep learning reach %, we believe that the use of the sequences detected by a cnn-based methodology has the potential to improve the accuracy of the diagnosis. our results show that by targeting one out of the selected -bps specific sequences, we are able to distinguish sars-cov- from any other virus (> %). further testing is necessary to confirm these promising results, so it is essential to create multidisciplinary groups that work to stop the outbreak. finally, as an interesting remark, by comparing the discovered sequences against other hosts, we noticed that, of the sequences exclusive to sars-cov- , one appears in of samples from manis javanica. in contrast, of the sequences of sars-cov- appear in the only sample available from rhinolophus affinis and out of in canine samples (table ) . this is consistent with the findings of zhang et al. , , and could point to the zoonotic origin of the virus. nevertheless, more data is necessary. as a result of high population densities and ever-growing interaction between people, it is possible that other pandemics may occur. we believe that our methodology has a substantial added value over traditional methods, because it is fast and needs only a limited set of viral sequencing data. moreover, this procedure led to a primer set with a very high specificity for sars-cov- , with at least the same accuracy as the best primer sets in the world developed by who referral laboratories. 
thus, looking forward, our methodology can be applied in future viral pandemics to speed up the development of accurate detection methods for diagnosis and thereby contribute to limiting the spread of a virus. the cnn used during all the experiments is composed of one convolutional layer with different filters or weights (each with window size ) with max pooling (with pool size and stride ), a fully connected layer ( rectified linear units with dropout probability . ), and a final softmax layer with units, to differentiate the different classes of coronavirus strains. the optimizer used is adaptive moment estimation (adam) , with learning rate − and a batch size of samples, run for , epochs . the convolutional layer of the network, in simple terms, analyzes subsequences of base pairs that can appear at different points of the virus genome. we selected this window size because primers designed for rt-pcr tests normally have a length of - bps. the pool size of the max pooling represents the interval in which a specific -bps sequence can be recognized (in this case, positions). through the training process, the convolutional layer is de facto learning new features to characterize the problem, directly from the data. in this specific case, the new features are -bps sequences that can more easily separate different virus strains. by analyzing the result of each filter in a convolutional layer, and how its output interacts with the corresponding max pooling, it is possible to detect human-readable sequences of base pairs that might provide domain experts with relevant information. it is important to note that these sequences are not bound to specific locations of the genome; thanks to its structure, the cnn is able to detect them and recognize their importance even if their position is displaced in different samples. we downloaded sequences (*.fasta files) from the ngdc on march th , (table ) . 
we left out sars-cov- sequences and then divided the rest of the data into % training, % validation, % testing. the trained cnn described above obtained a mean accuracy of . % in a -fold cross-validation. once the network is trained, in a first analysis, we plot the inputs and outputs of the convolutional layer, to visually inspect for patterns. as an example, in fig. a we report the visualization of the first , bps of each of the samples from the ngdc repository . each filter slides a -bps window over the input, and for each step produces a single value. the output of a filter is thus a sequence of values in ( , ). the output of the max pooling for each of the filters is then further inspected for patterns. it is noticeable that samples belonging to different classes can already be visually distinguished. at this step, we identify filter as the most promising, as it seems to focus on a few relevant points in the genome that could correspond to meaningful cdna sequences. given this data, it is now possible to identify the -bps sequences that obtained the highest output values in the max pooling layer of filter , in a section of positions. this process results in ( , divided by ) max pooling features, each one identifying the -bps sequence that obtained the highest value from the convolutional filter in a specific -position interval of the original genome: the first max pooling feature will cover positions - , the second will cover positions - , and so on. we graph the whole set of max pooling features for the complete data , ( * ), fig. b . the cnn architecture and the visualization of the filter and max pooling are available in the supplementary material section . analyzing the different sequence values appearing in the max pooling feature space, we find a total of , unique -bps cdna sequences that can potentially be very informative for identifying different virus strains. 
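the extraction step described above can be sketched in a few lines: slide one convolutional filter over the sequence, then, inside each pooling interval, keep the subsequence whose window produced the highest filter output. this is a simplified illustration under stated assumptions, not the authors' implementation: the filter is written as one weight dictionary per window position (equivalent to a dot product with the one-hot input), the bias and activation function are omitted (a monotonic activation does not change which window wins), and the 3-base window, pool size of 5 and toy sequence are hypothetical stand-ins for the values used in the paper.

```python
# Illustrative sketch only: toy filter, toy sizes, hypothetical names.


def filter_outputs(sequence, weights):
    """Output of one convolutional filter at every window start position.

    weights is a list with one dict per window position, mapping a base to
    its weight; bases not listed contribute 0.0.
    """
    window = len(weights)
    return [
        sum(weights[j].get(sequence[i + j], 0.0) for j in range(window))
        for i in range(len(sequence) - window + 1)
    ]


def max_pool_subsequences(sequence, weights, pool_size):
    """For each pooling interval, return the subsequence with the top output."""
    window = len(weights)
    scores = filter_outputs(sequence, weights)
    features = []
    for start in range(0, len(scores), pool_size):
        block = scores[start:start + pool_size]
        # index of the first maximum inside this pooling interval
        best = start + max(range(len(block)), key=block.__getitem__)
        features.append(sequence[best:best + window])
    return features


toy_filter = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}]  # responds most to "ACG"
toy_sequence = "TTACGTTTACTA"
features = max_pool_subsequences(toy_sequence, toy_filter, pool_size=5)
```

here the first pooling interval yields the exact motif and the second yields its closest partial match, which illustrates why the resulting features are tolerant to small displacements of a sequence within an interval.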
for example, sequence agg taa caa acc aac caa ctt is only found inside the class of sars-cov- , in out of available samples. sequence cac gag taa ctc gtc tat ctt is again present only in sars-cov- , in out of the samples. the combination of the convolutional and max pooling layers allows the cnn to identify sequences even if they are slightly displaced in the genome (by up to positions). thus, we create a table of feature appearance for each of the sequences selected in the previous step. this yields a compact set of features for differentiating sars-cov- from other viruses. the experiments presented in the following subsections to validate our method have different objectives and make use of different datasets. a summary of all the experiments and datasets used is shown in figure . table . organism, assigned label, and number of samples in the unique sequences for the ngdc repository (left) and query: gene="orf ab" and host="homo sapiens" and "complete genome" in the ncbi repository (right). we use the ncbi organism naming convention. we downloaded the dataset from the ngdc repository on march th . we removed repeated sequences and applied the procedure to translate the data into the sequence feature space. this leaves us with a frequency table of , features ( -bps sequences) with samples (table (left) ). next, we ran a state-of-the-art feature selection algorithm , , to reduce the sequences needed to identify different virus strains to the bare minimum. remarkably, we are then able to correctly differentiate all the coronavirus (mers-cov, sars-cov- , sars-cov- , etc) samples using only of the original , sequences, obtaining a % accuracy in a -fold cross-validation with a simpler and more traditional classifier, such as logistic regression. the list of the features is available in the supplementary material section . we downloaded data from ncbi on march th , with the following query: gene="orf ab" and host="homo sapiens" and "complete genome". 
the query resulted in non-repeated sequences (table (right)). we call this dataset ncbi-a, where sequences belong to sars-cov- . then, we applied the procedure to translate the data into the set of sequence features, and we ran the same state-of-the-art feature selection algorithm . the result is a list of different sequences (table ) , for which just checking for their presence is enough to differentiate between sars-cov- and other viruses in the dataset, with a % accuracy. each of the sequences, in fact, only appears in sars-cov- samples. , for a total of more than viruses. then, we applied the procedure to translate the data into the sequence feature space and ran the feature reduction algorithm . this results in extra sequences of bps: just by checking for their presence, we are able to separate sars-cov- from the rest of the samples with a % accuracy. the sequences are: aat aga aga att att cta ttc and cga taa caa ctt ctg tgg ccc. from the gisaid repository , we downloaded , sequences available on august th , for sars-cov- , from different countries; of these, , have < % ns, high coverage, and host="homo sapiens". then, we calculated the frequency table of the -bps sequences obtained from experiments and , to verify which sequences remain and could be used for detection. the appearance frequency of the target sequences among the samples in the gisaid dataset is reported in table , second column. in addition, we downloaded sequences from the gisaid repository for other hosts (manis javanica, rhinolophus affinis, canine and felis catus) to compare the sequences from experiments and . experiment : design of the candidate primer set. after the analysis carried out on the deep learning model, we ran an analysis with primer plus , to see which of the sequences could be used as a forward primer, using sample ncbi nc . as the reference sars-cov- sequence. we uncover the sequence tag cac tct cca agg gtg ttc that shows a frequency of appearance of .
% in viral genomes available from different countries in gisaid and . % in the ncbi datasets. using the reference sars-cov- sequence, we identify that this discovered sequence is located between nucleotides , and , in the orf a gene. in sars-cov, this gene encodes a protein of aa that is associated with necrotic cell death , chemokine production such as interleukin (il- ) and rantes/ccl , and nfκb activation resulting in an inflammatory response, and that may play an important role in the virus life cycle . we design a specific primer set for detection of sars-cov- using primer plus . we use tag cac tct cca agg gtg ttc as forward primer and gca aag cca aag cct cat ta as reverse primer, obtaining an amplicon size of bps. then, we run an in-silico pcr test using fastpcr . with default parameters and nc . as the reference sars-cov- sequence, this yields tm = . °c for the forward primer, tm = . °c for the reverse primer and ta = °c. in addition, we calculated the frequency of appearance of the sequences of different primer sets used in sars-cov- rt-qpcr tests developed by who referral laboratories and compared it to that of our primer design sequences in , sequences from the gisaid repository and in the samples of different coronaviruses from the ngdc dataset from experiment . the primer sets used were developed by university of hong kong (hku-n); charite, berlin, germany (charite-e); us-cdc, united states (us-cdc-n ,us-cdc-n ,us-cdc-n ) and china cdc, china (china-cdc-orf ab, china-cdc-n) ( table ) . we selected these primers because they are the most commonly used, as stated in the gisaid status update of august , . we do not consider degenerate primer sets. coronavirus genomics and bioinformatics analysis.
viruses genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding who report coronavirus disease (covid- ) (world health organization combination of rt-qpcr testing and clinical features for diagnosis of covid- facilitates management of sars-cov- outbreak detection of novel coronavirus ( -ncov) by real-time rt-pcr evaluating the accuracy of different respiratory specimens in the laboratory diagnosis and monitoring the viral shedding of -ncov infections antibody responses to sars-cov- in patients of novel coronavirus disease false-negative results of initial rt-pcr assays for covid- : a systematic review false negative tests for sars-cov- infection-challenges and implications next generation sequencing of viral rna genomes correlation of chest ct and rt-pcr testing in coronavirus disease (covid- ) in china: a report of cases co-infections in people with covid- : a systematic review and meta-analysis clinical diagnosis of samples with -novel coronavirus in wuhan a deep learning algorithm using ct images to screen for corona virus disease (covid- ). 
medrxiv the first case of novel coronavirus pneumonia imported into korea from wuhan, china: implication for infection prevention and control measures rapid and sensitive sequence comparison with fastp and fasta basic local alignment search tool applications of alignment-free methods in epigenomics alignment-free sequence comparison-a review phylogenetically diverse tt virus viremia among pregnant women dna sequence classification by convolutional neural network a deep learning approach to dna sequence classification viraminer: deep learning on raw dna sequences for identifying viral genomes in human samples identifying viruses from metagenomic data by deep learning machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: covid- case study dbsnp: the ncbi database of genetic variation global initiative on sharing all influenza data-from vision to reality how ownership rights over microorganisms affect infectious disease control and innovation: a root-cause analysis of barriers to data sharing as experienced by key stakeholders managing severe acute respiratory syndrome (sars) intellectual property rights: the possible role of patent pooling threats to timely sharing of pathogen sequence data accurate identification of sars-cov- from viral genome sequences using deep learning a genomic perspective on the origin and emergence of sars-cov- extreme genomic cpg deficiency in sars-cov- and evasion of host antiviral defense a method for stochastic optimization genbank: the nucleotide sequence database. 
the ncbi handb automatic discovery of -mirna signature for cancer classification using ensemble feature selection machine learning-based ensemble recursive feature selection of circulating mirnas for cancer tumor classification primer plus, an enhanced web interface to primer sars-coronavirus open reading frame- b triggers intracellular stress pathways and activates nlrp inflammasomes augmentation of chemokine production by severe acute respiratory syndrome coronavirus a/x and a/x proteins through nf-κb activation severe acute respiratory syndrome coronavirus orf a protein interacts with caveolin fastpcr software for pcr primer and probe design and repeat search
viral rna was isolated from cell-cultured sars-cov- , sars- , mers-cov, hcov-nl , hcov-oc , hcov- e, and from nasopharyngeal swabs from n = patients by magna pure lc (roche diagnostics, the netherlands) using the total nucleic acid isolation kit. the rna was converted into cdna using superscriptiii (thermo-fisher scientific, usa) and random hexamers. subsequently, conventional pcr was performed on the cdna using hotstar taq dna polymerase (qiagen, the netherlands) with nm forward primer ( '-ag cac tct cca agg gtg ttc- ') and nm reverse primer ( '-gca aag cca aag cct cat ta- ') and the following cycling conditions: min at °c, followed by cycles of min at °c, min at °c and min at °c. the pcr products were visualized by electrophoresis. the same rna was used in a diagnostics reference assay by corman et al. and the cycle threshold values from this reference assay were used for estimating sensitivity. the study was approved by the medical ethical commission of the erasmus mc (mec- - ). lmm and cap performed the biological analysis and primer design. alr and at performed the programming, data collection and in silico experiments. dm and rm performed the pcr validation. ec, adk and jg designed the experiments and the study. all the authors contributed to the writing.
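as an illustration of the two in-silico checks described in the primer design experiment above (frequency of appearance of a target sequence across genomes, and the amplicon size spanned by a primer pair), a minimal sketch is given here. it is not the fastpcr or primer plus procedure used in the study: it assumes exact matching only, so mismatches, degenerate bases and melting temperatures are ignored, and the function names are hypothetical.

```python
from typing import List, Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def _clean(seq: str) -> str:
    """Remove spaces used for codon-style grouping and upper-case the sequence."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(_clean(seq)))


def appearance_frequency(target: str, genomes: List[str]) -> float:
    """Fraction of genomes that contain the target as an exact substring."""
    if not genomes:
        return 0.0
    target = _clean(target)
    return sum(target in _clean(g) for g in genomes) / len(genomes)


def amplicon_size(template: str, forward: str, reverse: str) -> Optional[int]:
    """Length from the start of the forward primer site to the end of the
    reverse primer site on the plus strand, or None if either is absent."""
    template = _clean(template)
    fwd = _clean(forward)
    rev_site = reverse_complement(reverse)  # reverse primer binds the minus strand
    start = template.find(fwd)
    if start < 0:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


forward = "tag cac tct cca agg gtg ttc"
reverse = "gca aag cca aag cct cat ta"
# toy template built for illustration only; it is not a real genome fragment
template = "AAAA" + _clean(forward) + "C" * 10 + reverse_complement(reverse) + "GGG"
size = amplicon_size(template, forward, reverse)
freq = appearance_frequency(forward, [template, "ACGT" * 10])
```

exact substring matching is the strictest possible criterion; a real primer evaluation would also tolerate a small number of mismatches away from the 3' end.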
key: cord- -xmbnpawj authors: meekins, david a.; morozov, igor; trujillo, jessie d.; gaudreault, natasha n.; bold, dashzeveg; artiaga, bianca l.; indran, sabarish v.; kwon, taeyong; balaraman, velmurugan; madden, daniel w.; feldmann, heinz; henningson, jamie; ma, wenjun; balasuriya, udeni b. r.; richt, juergen a. title: susceptibility of swine cells and domestic pigs to sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xmbnpawj the emergence of sars-cov- has resulted in an ongoing global pandemic with significant morbidity, mortality, and economic consequences. the susceptibility of different animal species to sars-cov- is of concern due to the potential for interspecies transmission, and the requirement for pre-clinical animal models to develop effective countermeasures. in the current study, we determined the ability of sars-cov- to (i) replicate in porcine cell lines, (ii) establish infection in domestic pigs via experimental oral/intranasal/intratracheal inoculation, and (iii) transmit to co-housed naive sentinel pigs. sars-cov- was able to replicate in two different porcine cell lines with cytopathic effects. interestingly, none of the sars-cov- -inoculated pigs showed evidence of clinical signs, viral replication or sars-cov- -specific antibody responses. moreover, none of the sentinel pigs displayed markers of sars-cov- infection. these data indicate that although different porcine cell lines are permissive to sars-cov- , five-week old pigs are not susceptible to infection via oral/intranasal/intratracheal challenge. pigs are therefore unlikely to be significant carriers of sars-cov- and are not a suitable pre-clinical animal model to study sars-cov- pathogenesis or efficacy of respective vaccines or therapeutics. the emergence of sars-cov- , the causative agent of covid- , has resulted in a global pandemic with over million cases and , deaths as of august , [ , ] . 
sars-cov- causes a respiratory disease in humans with a broad clinical presentation, ranging from asymptomatic or mild illness to severe fatal disease with multi-organ failure [ ] [ ] [ ] [ ] . sars-cov- is rapidly transmissible via contact with infected respiratory droplets and can also be transmitted by asymptomatic carriers [ ] [ ] [ ] . to curb viral spread, countries have instituted varying levels of social distancing policies, which have significant negative economic and social impacts [ ] . mitigating the effects of this unprecedented pandemic will necessitate the development of effective vaccines and therapeutics, which will require well-characterized and standardized preclinical animal models. sars-cov- is a member of the betacoronavirus genus that includes the pathogenic human viruses sars-cov- and mers-cov [ , [ ] [ ] [ ] . while details of the origin of sars-cov- are unknown, evidence indicates it emerged from a zoonotic spillover event, with bats and perhaps pangolins as probable origin species [ , [ ] [ ] [ ] . the potential for a reverse zoonotic event, i.e. human-to-animal transmission, is possible and of significant concern to animal and public health [ ] [ ] [ ] . instances of natural human-to-animal transmission of sars-cov- have been reported with covid- patients in domestic settings (dogs and cats), zoos (lions and tigers), and farms (mink) [ ] [ ] [ ] . therefore, investigations into the infectivity of sars-cov- in various animal species with human contact are essential to assess and control the risk of a spillover event and to establish the role these animals may play in the ecology of the virus. several studies have determined the susceptibility of different animal species to sars-cov- via experimental infection [ , ] . 
cats, hamsters, and ferrets are highly susceptible to sars-cov- infection, demonstrate varying clinical and pathological disease manifestations, readily transmit the virus to naïve animals, and mount a virus-specific immune response [ ] [ ] [ ] [ ] [ ] [ ] [ ] . dogs are mildly susceptible to experimental sars-cov- infection, with limited viral replication but with clear evidence of seroconversion in some animals [ ] . poultry species seem to be resistant to sars-cov- infection [ , ] . these findings establish the respective utility of different animal species as pre-clinical models to study sars-cov- . several lines of evidence suggest that pigs could be susceptible to sars-cov- infection. pigs are susceptible to both experimental and natural infection with the related betacoronavirus, sars-cov- , and demonstrate seroconversion [ , ] . structure-based analyses predict that the sars-cov- spike (s) protein receptor binding domain (rbd) binds the pig angiotensin-converting enzyme (ace ) entry receptor with an efficiency similar to that of human ace [ ] . single-cell screening also indicated that pigs co-express ace and the tmprss activating factor in a variety of different cell lines, and sars-cov- replicates in various pig cell lines [ , , , ] . despite these preliminary data indicating that pigs could be susceptible to sars-cov- infection, two recent studies revealed that intranasal inoculation of three and twelve pigs, respectively, with pfu or tcid of sars-cov- did not lead to any detectable viral replication or seroconversion [ , ] . however, because these studies used only the intranasal route of inoculation, additional investigations are necessary before definitive conclusions can be drawn regarding the susceptibility of pigs to sars-cov- . in the present study, we determined the susceptibility of swine cell lines and domestic pigs to sars-cov- infection.
two different porcine cell lines were found to be permissive to sars-cov- infection showing cytopathic effects (cpe). domestic pigs were challenged via simultaneous oral/intranasal/intratracheal inoculation with a tcid dose of sars-cov- . sars-cov- did not replicate in pigs and none of them seroconverted. furthermore, the virus was not transmitted from sars-cov- inoculated animals to sentinels. the present findings, combined with the other studies [ , ] , confirm that pigs seem resistant to sars-cov- infection despite clear susceptibility of porcine cell lines. pigs are therefore unlikely to play an important role in the covid- pandemic as a virus reservoir or as a pre-clinical animal model to study sars-cov- pathogenesis or develop novel countermeasures. sars-cov- usa-wa / isolate (genbank accession # mn ) [ ] was eighteen pigs (mix of males and females, five weeks of age) were used in the study. pigs were acquired from a source guaranteed free of swine influenza virus (siv), porcine circovirus- (pcv- ), and porcine reproductive and respiratory syndrome virus (prrsv) infection. the study outline is illustrated in figure . upon arrival, pigs were acclimated for days prior to sars-cov- inoculation. nine pigs were designated as uninfected negative controls and housed in separate bsl- facilities. three of these uninfected negative control pigs were humanely euthanized at days post challenge (dpc) to provide negative control clinical and tissue samples. the nine principal infected pigs were housed in the same room in two separate groups ( or pigs each; gross pathological examinations on major organs were performed and respiratory tissue samples were collected and either stored in % neutral-buffered formalin or stored as fresh samples at - ˚c. blood and swab samples were all filtered using a . µm filter prior to storage at - ˚c. rna was isolated from blood, swabs, and tissue samples, using a magnetic bead-based protocol in a bsl- + laboratory at the bri at ksu. 
lung tissue homogenates ( mg per ml dmem; % w/v) were prepared by thawing tissue, mincing it into mm sections, followed by lysis in a ml sure-lock tube containing mm stainless steel homogenization beads using the tissuelyser lt (qiagen, germantown, md, usa) for seconds at hz followed by min of hz while keeping the sample cold. following clarification via a -minute centrifugation ( , xg; room temperature), supernatants were mixed with an equal volume of rlt lysis buffer. blood and clinical swabs were directly mixed with an equal volume of rlt lysis buffer. µl of each sample lysate was used to extract rna using a magnetic bead-based during post mortem examinations, the upper and lower respiratory tract, central nervous system, lymphatic and cardiovascular systems, gastrointestinal and urogenital systems, and integument were evaluated. lungs were removed in toto and the percentage of the lung surface that was affected by macroscopic lesions was estimated by single veterinarian experienced in evaluating gross porcine lung pathology as previously described [ , ] . lungs were evaluated for gross pathology such as edema, congestion, discoloration, atelectasis, and consolidation. tissue samples of interest were collected and either fixed in % neutral-buffered formalin for histopathological examination or frozen at - ˚c for rt-qpcr testing. tissues were fixed in formalin for days, then transferred to % ethanol (thermofisher scientific, waltham, ma, usa) prior to trimming and paraffin embedding following standard automated protocols used in the histology section of the kansas state veterinary diagnostic laboratory. following embedding, tissue sections were cut and stained with hematoxylin and eosin and evaluated by a board-certified veterinary pathologist who was blinded to the treatment groups. to detect sars-cov- antibodies in sera, indirect elisas were performed observed. 
neutralizing sera from sars-cov- -infected cats from a separate study [ ] were used as a positive control. to determine the consensus sequence of the usa-wa/ / virus and to analyze whether there were any nucleic acid substitutions in the sars-cov- virus after passage in porcine cell lines, rna was extracted from cell culture supernatant as described above. the rna was then subjected to rt-pcr amplification using a tiled-primer approach to amplify the entire sars-cov- genome as described previously [ ] . briefly, the pcr amplicons were pooled and subjected to library preparation for next generation sequencing using the nextera xt library prep kit (illumina, san diego, ca, usa). the library was normalized and sequenced using a miseq nano v x sequencing kit. the sequence was then analyzed by mapping reads to the parent sequence (genbank accession # mn ) [ ] to generate a consensus sequence. the sars-cov- usa-wa / isolate, which was isolated from a human patient in washington state, usa, was used as the parent stock for the study [ ] . the to determine the effect of sars-cov- infection in domestic pigs, nine six-week-old sars-cov- seronegative piglets were inoculated with a total of x tcid of the usa-wa / isolate, which was passaged once in swine st cells ( figure ). the challenge material (total ml) was administered orally ( ml), intranasally ( ml; . ml each nostril) and intratracheally ( ml) after sedation of the animals. at -day post challenge (dpc), six uninoculated sentinel contact pigs were comingled with the principal inoculated animals ( animals per pen). rectal temperatures were recorded and clinical signs were monitored daily for each pig, including observations for signs of lethargy, hyporexia, respiratory distress (coughing, labored breathing, nasal discharge), and digestive issues (diarrhea or vomiting).
no significant temperature elevation or change in rectal temperature was observed in either the principal inoculated or the sentinel contact pigs throughout the study (figure ) . moreover, no obvious clinical signs were observed for any of the principal inoculated or sentinel pigs throughout the -day observation period. to detect viral replication in the principal and sentinel pigs, clinical samples were subjected to rt-qpcr to detect the sars-cov- n gene (table ). the only exception was a low suspect positive result in a nasal swab at dpc in a principal inoculated pig # , for which one of two qpcr replicates yielded a low fluorescent amplification curve with a ct of (table ) . moreover, viral rna was not detected in any lung sample collected at post-mortem examination on , and dpc (table ). in addition, gross and histopathological analysis of trachea and lung from the principal challenged pigs did not reveal the presence of any obvious pathological lesions ( table , figure ). these results indicate that sars-cov- failed to replicate in the respiratory and digestive tract as well as the blood in orally/intranasally/intratracheally inoculated pigs throughout an observation period of days. this is confirmed by the fact that the principal infected pigs failed to transmit sars-cov- to co-mingled sentinel animals. to determine whether the orally/intranasally/intratracheally inoculated pigs
sars-cov- is a zoonotic agent, and a detailed understanding of the susceptibility of various animal species to sars-cov- is central to controlling its spread [ , ] . in addition, the development of animal models that emulate covid- in humans is essential for pre-clinical testing of novel vaccines and therapeutics [ ] . in this study, we inoculated nine pigs with a high dose of sars-cov- that was passaged once in porcine cells.
simultaneous oral/intranasal/intratracheal inoculation did not result in any detectable viral rna in the blood, the oral/nasal/rectal cavities, or the lungs. also, none of the co-mingled, sentinel contact pigs shed viral rna. moreover, a virus-specific immune response characteristic of infection was not observed within the -day study period in the principal infected or sentinel pigs. the transient nature of the igm and igg response observed in pig # could indicate cross-reactivity of antibodies directed against a porcine coronavirus such as porcine epidemic diarrhea virus [ ] . such antibodies could be maternally derived and therefore transient as the lack of sars-cov- specific reactivity by the end of the study might suggest. in contrast to previous sars-cov- swine studies [ , ] , the present study used a more stringent inoculation procedure (intratracheal and oral, in addition to intranasal) and log higher titer of virus inoculum ( vs ). in addition, the inoculum in the present study was passaged once in porcine st cells. these results, combined with previous intranasal pig inoculation studies [ , ] , indicate that pigs seem to be resistant to sars-cov- infection, are unlikely to be a sars-cov- carrier animal species, and are also not suitable as an animal model for research. the results of the present and previous sars-cov- inoculation studies in pigs are intriguing in light of the findings that the porcine ace receptor seems highly compatible with the sars-cov- rbd, suggesting that pigs could be susceptible to sars-cov- infection [ , ] . pigs are susceptible to both experimental and natural infection with sars-cov- [ , ] . however, the experimental sars-cov- infection was via simultaneous intranasal/oral/intraocular/intravenous inoculation [ ] , thus the actual route(s) of sars-cov infection cannot be determined. 
recently, several porcine cell lines have been shown to be permissive to sars-cov- infection [ , ] ; in addition, single-cell screening studies showed that porcine ace /tmprss expression is compatible with infection [ ] . in contrast to previous reports that some porcine cell lines are susceptible to sars-cov- infection but show no cpe [ , ] , we found that both st and pk- cell lines are susceptible to infection and observed cpe after two or four passages, respectively. the absence of sars-cov- replication and transmission in the present and two previous pig studies [ , ] seems to lessen the need to monitor pig populations for sars-cov- during the ongoing pandemic. however, the evidence described above suggests pig susceptibility should not be disregarded, because all pig studies to date have used rather young pigs and commercially available pig breeds/genetics. we also have to be aware that unforeseen genetic changes in the sars-cov- genome may result in a better compatibility of the virus for pigs in the future. pigs are considered to be an excellent model for studying human infectious diseases based on their relatedness to humans in terms of anatomy and immune responses, and they have been found to be much more predictive of the efficacy of therapeutics than rodent models [ ] . however, the results presented here indicate that pigs are not a suitable preclinical model for sars-cov- pathogenesis studies and the development and efficacy testing of therapeutics and/or vaccines. a recently available article indicates that while pigs are not susceptible to sars-cov- infection, neutralizing antibody responses were detected in pigs inoculated via intramuscular or intravenous routes [ ] ; this indicates that pigs could be used for immunogenicity studies related to sars-cov- . however, studies that use pigs to monitor sars-cov- immune responses must carefully screen for cross-reactive maternal antibodies derived from other coronaviruses [ ] .
alternate pre-clinical animal models, namely non-human primates, syrian hamsters, transgenic or transduced mice expressing human ace , ferrets, or even cats need to be considered to gain additional insights into sars-cov- pathogenesis and virulence. comprehensive characterization of sars-cov- pathogenesis in preclinical animal models and the establishment of standardized infection and testing protocols will be crucial for the development of much-needed countermeasures to combat covid- . swabs/blood were tested from samples on , , , , , and dpc. lung tissue was collected , , and dpc. swabs/blood were tested from samples on , , , and dpc. lung tissue was collected on dpc swabs/blood were tested on dpc. lung tissue was collected on dpc for these uninfected controls. *one pig (# ) had a ct signal of . ( . x copy number/ml) for / of rt-qpcr wells on dpc. magnification is x for main images and x for inserts. coronavirus disease (covid- ) situation report - a pneumonia outbreak associated with a new coronavirus of probable bat origin clinical presentation of covid- : case series and review of the literature epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study clinical characteristics of coronavirus disease in china transmission, diagnosis, and treatment of coronavirus disease (covid- ): a review early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia temporal dynamics in viral shedding and transmissibility of covid- the socio-economic implications of the coronavirus pandemic (covid- ): a review coronaviridae study group of the international committee on taxonomy of v. the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- .
nat microbiol severe acute respiratory syndrome (sars): a review middle east respiratory syndrome coronavirus (mers-cov): a review the proximal origin of sars-cov- probable pangolin origin of sars-cov- associated with the covid- outbreak possible bat origin of severe acute respiratory syndrome coronavirus . emerg infect dis is covid- the first pandemic that evolves into a panzootic? vet ital covid- and veterinarians for one health, zoonotic-and reverse-zoonotic transmissions a critical needs assessment for research in companion animals and livestock following the pandemic of covid- in humans. vector borne zoonotic dis are animals a neglected transmission route of sars-cov- ? pathogens evidence for sars-cov- infection of animal hosts. pathogens infectivity, virulence, pathogenicity, host-pathogen interactions of sars and sars-cov- in experimental animals: a systematic review susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus . science transmission of sars-cov- in domestic cats infection and rapid transmission of sars-cov- in ferrets sars-cov- is transmitted via contact and via the air between ferrets simulation of the clinical and pathological manifestations of coronavirus disease (covid- ) in golden syrian hamster model: implications for disease pathogenesis and transmissibility pathogenesis and transmission of sars-cov- in golden hamsters susceptibility of pigs and chickens to sars coronavirus sars-associated coronavirus transmitted from human to pig. emerg infect dis receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus single-cell screening of sars-cov- comparative tropism, replication kinetics, and cell damage profiling of sars-cov- and sars-cov with implications for clinical manifestations, transmissibility, and laboratory studies of covid- : an observational study. 
lancet microbe severe acute respiratory syndrome coronavirus from patient with coronavirus disease, united states. emerg infect dis division of viral diseases. real-time rt-pcr panel for detection - -novel coronavirus pathogenic and antigenic properties of phylogenetically distinct reassortant h n swine influenza viruses cocirculating in the united states comparison of pathogenicity and transmissibility of influenza b and d viruses in pigs. viruses sars-cov- infection, disease and transmission in domestic cats ncov- sequencing protocol v v. lactogenic immunity and vaccines for porcine epidemic diarrhea virus (pedv): historical and current concepts the pig: a model for human infectious diseases pigs are not susceptible to sars-cov- infection but are a model for viral immunogenicity studies emerging and re-emerging we gratefully thank the staff of ksu biosecurity research institute, the the authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. key: cord- -ixun c g authors: su, haixia; yao, sheng; zhao, wenfeng; li, minjun; liu, jia; shang, weijuan; xie, hang; ke, changqiang; gao, meina; yu, kunqian; liu, hong; shen, jingshan; tang, wei; zhang, leike; zuo, jianping; jiang, hualiang; bai, fang; wu, yan; ye, yang; xu, yechun title: discovery of baicalin and baicalein as novel, natural product inhibitors of sars-cov- cl protease in vitro date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ixun c g human infections with severe acute respiratory syndrome coronavirus (sars-cov- ) cause coronavirus disease (covid- ) and there is currently no cure. the c-like protease ( clpro), a highly conserved protease indispensable for replication of coronaviruses, is a promising target for development of broad-spectrum antiviral drugs. 
to advance the speed of drug discovery and development, we investigated the inhibition of sars-cov- clpro by natural products derived from chinese traditional medicines. baicalin and baicalein were identified as the first non-covalent, non-peptidomimetic inhibitors of sars-cov- clpro and exhibited potent antiviral activities in a cell-based system. remarkably, the binding mode of baicalein with sars-cov- clpro determined by x-ray protein crystallography is distinctly different from those of known inhibitors. baicalein is perfectly ensconced in the core of the substrate-binding pocket by interacting with two catalytic residues, the crucial s /s subsites and the oxyanion loop, acting as a “shield” in front of the catalytic dyad to prevent the peptide substrate approaching the active site. the simple chemical structure, unique mode of action, and potent antiviral activities in vitro, coupled with the favorable safety data from clinical trials, emphasize that baicalein provides a great opportunity for the development of critically needed anti-coronaviral drugs. traditional chinese medicines (tcms) have evolved over thousands of years and are an invaluable source for drug discovery and development. as a notable example, the discovery of artemisinin (qinghaosu), which was originally isolated from the tcm artemisia annua l. (qinghao), is a milestone in the treatment of malaria. tcms as well as purified natural products also provide a rich resource for novel antiviral drug development. several herbal medicines and natural products have shown antiviral activities against viral pathogens ( ) ( ) ( ) ( ) . among these, the roots of scutellaria baicalensis georgi (huangqin in chinese) are frequently used in tcm for the prophylaxis and treatment of hepatitis and respiratory disorders ( ) ( ) ( ) . in the present study, an enzymatic assay was performed to test if the ingredients isolated from s. baicalensis are inhibitors of sars-cov- clpro. 
as a result, baicalin and baicalein, two bioactive components from s. baicalensis, were identified as novel inhibitors of sars-cov- clpro with antiviral activity in sars-cov- infected cells. a crystal structure of sars-cov- clpro in complex with baicalein, the first non-covalent, non-peptidomimetic small-molecule inhibitor, was also determined, revealing a unique binding mode of this natural product with the protease. the pivotal role of the cl protease in processing polyproteins into individual functional proteins for viral replication and the highly conserved substrate specificity of the enzyme among various covs make it a promising target for screening of inhibitors. a fluorescence resonance energy transfer (fret) protease assay was applied to measure the proteolytic activity of the recombinant sars-cov- clpro on a fluorogenic substrate. details of the assay are described in the experimental section. this fret-based protease assay was used to screen natural products as novel inhibitors of sars-cov- clpro. it was first used to determine the inhibitory activities of the total aqueous extract, fractions, and purified compounds from s. baicalensis against sars-cov- clpro (see supplementary materials, fig. s ). as a result, two fractions from s. baicalensis showed significant inhibition of sars-cov- clpro at . g/ml (table s ) . surprisingly, baicalin, the major component in fraction , shows an ic of . m against the protease, while baicalein, the major component in fraction , has an ic of . m (fig. s ; table ). accordingly, baicalin and baicalein are identified as novel non-peptidomimetic inhibitors of sars-cov- clpro with single-digit micromolar potency. 
to validate the binding of baicalin and baicalein with sars-cov- clpro and to rule out the possibility that they are pan-assay interference compounds (pains) ( ) , their binding affinities with the protease were measured by isothermal titration calorimetry (itc), a widely used tool for determining thermodynamic parameters of protein-ligand interactions such as kd (fig. , a and b ; table ). the resulting kd values for baicalin and baicalein binding with sars-cov- clpro are . and . m, respectively, which correlate well with the ic s mentioned above, demonstrating that specific binding of the compounds with the enzyme is responsible for their bioactivities. moreover, the itc profiles in combination with their chemical structures suggest that baicalin and baicalein act as noncovalent inhibitors of sars-cov- clpro with a high ligand binding efficiency. native state electrospray ionization mass spectrometry (esi-ms) has been used extensively to directly observe native state proteins and protein complexes, allowing direct detection of protein-ligand non-covalent complexes with kds as weak as mm ( ) . the m/z difference between the [protein + ligand] peaks and the [unbound protein] peaks identifies a ligand as a binder with the correct molecular weight, while the ratio of the intensity of the [protein + ligand] peaks relative to the [unbound protein] peaks provides a qualitative indication of the ligand-binding affinity. herein, an esi-ms analysis using high-resolution magnetic resonance mass spectrometry (mrms) was carried out to detect the binding of baicalin and baicalein with sars-cov- clpro. for performance optimization with the free protease, the mass range around the charge state + was isolated with a quadrupole center mass of m/z (fig. s ). for the ligand-binding screening studies, two charge states ( + and +) were used for calculation of the free protease and protein-ligand complex intensities. 
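the intensity-ratio readout described above can be sketched in a few lines. this is a minimal illustration, not the authors' processing pipeline: the function names are hypothetical, peak intensities are assumed to be summed over the charge states used, and free and ligand-bound protease are assumed to ionize with equal efficiency. the second function is the textbook single-site binding model with ligand depletion, included only to show how a bound fraction relates to a kd under that assumed model.

```python
import math


def bound_fraction(free_intensities, complex_intensities):
    """Fraction of protease bound, from peak intensities summed over charge states.

    Assumes (hypothetically) equal ionization efficiency for the free
    protease and the protease-ligand complex.
    """
    free = sum(free_intensities)
    bound = sum(complex_intensities)
    total = free + bound
    if total <= 0:
        raise ValueError("no protease signal in the selected charge states")
    return bound / total


def predicted_bound_fraction(kd, protein_total, ligand_total):
    """Bound fraction for a 1:1 complex, using the exact quadratic solution.

    All three arguments must share the same concentration unit.
    """
    s = kd + protein_total + ligand_total
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total
```

a kd estimate would then come from adjusting `kd` until `predicted_bound_fraction` matches the measured fractions across the ligand concentrations tested, for example by least squares.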
representative spectra of samples containing sars-cov- clpro ( m) and baicalin ( . m) or baicalein ( . m) acquired under both optimized and screening conditions are shown in fig. c and d, demonstrating specific binding of baicalin or baicalein with the protease. moreover, the plot of the fraction of the bound protease versus the total concentration of baicalin or baicalein yielded kds of . and . µm for baicalin and baicalein, respectively (fig. e and f), in keeping with the results from the itc measurements. the mode of action of baicalein and the structural determinants associated with its binding with sars-cov- clpro were further explored using x-ray protein crystallography. the crystal structure of sars-cov- clpro in complex with baicalein was determined at a resolution of . Å ( fig. ; table s ). the protease has a catalytic cys -his dyad and an extended binding site, features shared by sars-cov clpro and mers-cov clpro ( hydrophobic interactions. consequently, baicalein is perfectly ensconced in the core of the substrate-binding pocket and interacts with two catalytic residues, the oxyanion loop (residues - ), glu , and the s /s subsites, which are the key elements for recognition of substrates as well as peptidomimetic inhibitors ( ) . although baicalein did not move deeply into the s sub-pocket, its phenolic hydroxyl groups did form contacts with the crucial residue of this sub-pocket, his , via a water molecule. with the aid of an array of direct and indirect hydrogen bonds with leu /gly /ser , baicalein fixed the conformation of the flexible oxyanion loop, which serves to stabilize the tetrahedral transition state of the proteolytic reaction. these results together provide the molecular details of baicalein recognition by sars-cov- clpro and an explanation for the observed potent activity of such a small molecule against the protease. the amino acid sequence of sars-cov- clpro displays % sequence identity to sars-cov clpro. 
there are residues that differ between the two proteases, and none of them participates in direct contacts with baicalein. the high level of sequence identity between the two proteases allows one to assume that baicalein will bind to sars-cov clpro in the same way as it does to sars-cov- clpro. the inhibition assay shows that baicalein can also inhibit sars-cov clpro, with an ic of . ± . µm. thus, a three-dimensional model of sars-cov clpro in complex with baicalein was constructed by superimposing the crystal structure of sars-cov- clpro/baicalein with that of sars-cov clpro/tg- (pdb code zu ) (fig. s a ). the binding mode of baicalein with sars-cov clpro is distinctly different from those of known inhibitors. all of the crystal structures of the inhibitor-bound sars-cov clpro were collected for a comparison analysis (fig. s b ). if those peptidomimetic inhibitors are likened to "swords" that compete with the binding of substrate, baicalein works as a "shield" in front of the catalytic dyad to prevent the approach of the substrate to the active site (fig. s b ). such a unique binding mode, in combination with its high ligand-binding efficiency and small molecular weight, renders baicalein valuable for drug development. we substrate-binding sites, particularly for the crucial s /s subsites ( , , ) . accordingly, substrate analogs or mimetics bearing a chemical warhead targeting the catalytic cysteine were designed as peptidomimetic inhibitors of clpros with a covalent mechanism of action ( ). a series of diamide acetamides acting as non-covalent sars-cov clpro inhibitors and their binding modes examined by crystal structures have been reported, but they are largely peptidomimetic inhibitors and these compounds have not been developed further ( ) . although several other small molecules have been reported as clpro inhibitors, a solid validation of their binding with clpros by itc or complex structure determination is lacking. 
as none of the known inhibitors has advanced to clinical trials, considerable efforts to discover novel small molecule inhibitors of clpros are urgently needed. the authors sincerely thank prof. zihe rao and prof. haitao yang for kindly providing the protein as well as the substrate for the enzymatic assay. we also thank the staff from beamlines bl u and bl u at shanghai synchrotron radiation facility. we thank letpub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript. the authors declare no competing interests. the sars-cov- clpro-baicalein complex structure was deposited with the protein data bank with accession code m n. all other data are available from the corresponding author upon reasonable request. figures s -s tables s and s references ( - ) chemical shifts are reported in ppm (δ) with me si as a reference standard, and coupling constants (j) in hertz. the cdna of full length sars-cov- clpro or sars-cov clpro was cloned into the for ligand binding screening and dissociation constant (kd) determination, the magnitude size was set to k and single scans were added. ( ). the purified sars-cov- - clpro protein was concentrated to mg/ml for crystallization. the protein was incubated with mm baicalein for one hour before crystallization condition screening. crystals of the complex were obtained at °c by mixing equal volumes of protein-baicalein and reservoir solution ( % peg , mm mes, ph . , % dmso) using the hanging-drop vapor diffusion method. crystals were flash frozen in liquid nitrogen in the presence of the reservoir solution supplemented with % glycerol. x-ray diffraction data were collected at beamline bl u at the shanghai synchrotron radiation facility ( ) . the data were processed with hkl software packages ( ) . the complex structure was solved by molecular replacement using the program phaser ( ) with a search model of pdb code lu . 
the model was built using coot ( ) and refined with a simulated-annealing protocol implemented in the program phenix ( ) . the refined structure was deposited to protein data bank with an accession code listed in table s . the complete statistics as well as the quality of the solved structure are also shown in table s . the vero e cell line was obtained from american type culture collection (atcc, manassas, usa) and maintained in minimum eagle's medium (mem; gibco invitrogen) supplemented with % fetal bovine serum (fbs; invitrogen, uk) in a humid incubator with baicalein is shown as green spheres and other inhibitors together with two catalytic residues are shown as sticks. coronaviruses -drug discovery and therapeutic options sars and mers: recent insights into emerging coronaviruses early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster a pneumonia outbreak associated with a new coronavirus of probable bat origin an overview of severe acute respiratory syndrome-coronavirus (sars-cov) cl protease inhibitors: peptidomimetics and small molecule chemotherapy design of wide-spectrum inhibitors targeting coronavirus main proteases antiviral effect of forsythoside a from forsythia suspensa (thunb.) 
vahl fruit against influenza a virus through reduction of viral m protein antiviral activity of chlorogenic acid against influenza a (h n /h n ) virus and its inhibition of neuraminidase chemistry and pharmacology of the herb pair flos lonicerae japonicae-forsythiae fructus antiviral natural products and herbal medicines baicalin and its aglycone: a novel approach for treatment of metabolic disorders the comparative study of the therapeutic effects and mechanism of baicalin, baicalein, and their combination on ulcerative colitis rat therapeutic potentials of baicalin and its aglycone, baicalein against inflammatory disorders the ecstasy and agony of assay interference compounds native state mass spectrometry, surface plasmon resonance, and x-ray crystallography correlate strongly as a fragment screening combination ph-dependent conformational flexibility of the sars-cov main proteinase (m(pro)) dimer: molecular dynamics simulations and multiple x-ray structure analyses remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro coronavirus main proteinase ( clpro) structure: basis for design of anti-sars drugs sars-cov cl protease cleaves its c-terminal autoprocessing site by novel subsite cooperativity discovery, synthesis, and structure-based optimization of a series of n-(tert-butyl)- -(n-arylamido)- -(pyridin- -yl) acetamides (ml ) as potent noncovalent small molecule inhibitors of the severe acute respiratory syndrome coronavirus (sars-cov) cl protease combination of western medicine and chinese traditional patent medicine in treating a family case of covid- in wuhan safety, tolerability, and pharmacokinetics of a single ascending dose of baicalein chewable tablets in healthy subjects optimization of electrospray ionization by statistical design of experiments and response surface methodology: protein-ligand equilibrium dissociation constant determinations upgrade of macromolecular crystallography beamline bl u at 
ssrf hkl- : the integration of data reduction and structure solution--from diffraction images to an initial model in minutes phaser crystallographic software coot: model-building tools for molecular graphics phenix: building new software for automated crystallographic structure determination a clinical isolate sars-cov- ( ) was propagated in vero e cells. all the infection experiments were performed at biosafety level- (bsl- ). after that, the virus-compound mixture was removed and cells were further cultured with fresh compound-containing medium. at h p.i., the cell supernatant was collected and the viral rna in the supernatant was subjected to qrt-pcr analysis as described previously ( ). dmso was used in the controls. the experiments were performed in triplicate and three hplc-ms profiling of the active fraction and hplc-ms profiling of fraction . (b) hplc-ms profiling of fraction
key: cord- - xsypzt authors: nelson-sathi, shijulal; umasankar, pk; sreekumar, e; radhakrishnan nair, r; joseph, iype; nori, sai ravi chandra; philip, jamiema sara; prasad, roshny; navyasree, kv; ramesh, shikha; pillai, heera; ghosh, sanu; santosh kumar, tr; radhakrishna pillai, m. title: mutational landscape and in silico structure models of sars-cov- spike receptor binding domain reveal key molecular determinants for virus-host interaction date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xsypzt protein-protein interactions between virus and host are crucial for infection. sars-cov- , the causative agent of the covid- pandemic, is an rna virus prone to mutations. formation of a stable binding interface between the spike (s) protein receptor binding domain (rbd) of sars-cov- and angiotensin-converting enzyme (ace ) of the host actuates viral entry. yet, how this binding interface evolves as the virus acquires mutations during the pandemic remains elusive. 
here, using a high fidelity bioinformatics pipeline, we analysed , sars-cov- genomes across the globe, and identified non-synonymous mutations that cause distinct amino acid substitutions in the rbd. molecular phylogenetic analysis suggested independent emergence of these rbd mutants during pandemic. in silico structure modelling of interfaces induced by mutations on residues which directly engage ace or lie in the near vicinity revealed molecular rearrangements and binding energies unique to each rbd mutant. comparative structure analysis using binding interface from mouse that prevents sars-cov- entry uncovered minimal molecular determinants in rbd necessary for the formation of stable interface. we identified that interfacial interaction involving amino acid residues n and g on either ends of the binding scaffold are indispensable to anchor rbd and are well conserved in all sars-like corona viruses. all other interactions appear to be required to locally remodel binding interface with varying affinities and thus may decide extent of viral transmission and disease outcome. together, our findings propose the modalities and variations in rbd-ace interface formation exploited by sars-cov- for endurance. importance covid- , so far the worst hit pandemic to mankind, started in january and is still prevailing globally. our study identified key molecular arrangements in rbd-ace interface that help virus to tolerate mutations and prevail. in addition, rbd mutations identified in this study can serve as a molecular directory for experimental biologists to perform functional validation experiments. the minimal molecular requirements for the formation of rbd-ace interface predicted using in silico structure models may help precisely design neutralizing antibodies, vaccines and therapeutics. our study also proposes the significance of understanding evolution of protein interfaces during pandemic. 
cov- and related sars-cov provided initial clues regarding molecular architecture of the interface. the rbd comprises a amino acid long peptide in the s -region of the s-protein ( table ) (walls et al., ) . however, ace binding information is confined to a variable loop region within the rbd called the receptor binding motif (rbm). these structures elucidated key interfacial interactions responsible for the higher binding affinity of sars-cov- for ace compared with sars-cov . they also suggested that a few amino acid changes in the rbm can remodel the interface, resulting in altered binding affinities and viral transmission. however, all these studies were based on the parental sars-cov- wuhan strain and several questions remain unanswered. what are the mutations acquired on the rbd during covid- and what are the interfacial molecular rearrangements induced by these mutations? can we gain valuable insights regarding rbd-ace interface formation by analyzing these mutations? to address these questions we investigated the mutational landscape of sars-cov- rbd. a total of , spike proteins of sars-cov- were directly downloaded on th june from the gisaid database. we removed partial sequences, sequences with more than % unidentified 'x' amino acids, and sequences from low-quality genomes. further, , spike protein sequences along with the wuhan reference spike protein (yp_ . ) were aligned using mafft (maxiterate , and global pair-ginsi) (katoh et al. ) . the alignments were visualized in jalview (waterhouse am et al., ) and the amino acid substitutions at each position were extracted using a custom python script. we ignored substitutions that were present in only one genome and those involving the unidentified amino acid x. only mutations present in at least two independent genomes at a given position were considered further. these two criteria were used to avoid mutations due to sequencing errors. the mutated amino acids were further tabulated and plotted as a matrix using an r script. 
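the two filtering criteria described above can be sketched as follows. the authors' custom script is not shown in the text, so this is only an illustration under stated assumptions: the function name, the input layout (a dictionary of already aligned sequences of equal length), and the decision to skip gap characters as well as 'x' are all hypothetical choices, not the published implementation.

```python
from collections import defaultdict


def recurrent_substitutions(reference, sequences, min_genomes=2):
    """Return substitutions seen in at least `min_genomes` genomes.

    reference: aligned reference protein sequence (may contain '-').
    sequences: dict mapping genome id -> aligned sequence of equal length.
    Positions are numbered along the ungapped reference, starting at 1.
    """
    carriers = defaultdict(set)
    for genome_id, seq in sequences.items():
        if len(seq) != len(reference):
            raise ValueError(f"{genome_id} is not aligned to the reference")
        ref_pos = 0
        for ref_aa, aa in zip(reference, seq):
            if ref_aa == "-":
                continue  # alignment column is an insertion relative to the reference
            ref_pos += 1
            if aa in ("-", "X") or aa == ref_aa:
                continue  # skip gaps, unidentified residues, and matches
            carriers[(ref_pos, ref_aa, aa)].add(genome_id)
    return {
        f"{ref_aa}{pos}{alt}": sorted(ids)
        for (pos, ref_aa, alt), ids in sorted(carriers.items())
        if len(ids) >= min_genomes
    }
```

called on a toy alignment, the function keeps a substitution shared by two genomes and drops both a singleton and an 'x' residue, mirroring the rationale of avoiding calls caused by sequencing errors.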
for the maximum-likelihood phylogeny reconstruction, we have used the sars-cov- genomes containing rbd mutations, and genomes were sampled as representatives for each known subtype with wuhan refseq strain as root. sequences were aligned using mafft (maxiterate , and global pair-ginsi), and phylogeny was reconstructed using iq-tree (nguyen et al., ) . the best evolutionary model (gtr+f+i+g ) was picked using the modelfinder program (kalyaanamoorthy et al., ) . the structural analysis of the mutated spike glycoprotein of sars-cov- rbd domain was done to assess the impact of interface amino acid residue mutations on binding affinity towards the human ace (hace ) receptor. the crystal structure of the sars-cov- rbd-hace receptor complex was downloaded from protein data bank (pdb id: lzg) and the mutagenesis analysis to capture changes that affect viral tropism during pandemic, we searched for non-synonymous mutations in rbd sequences from sars-cov- genomes. using unbiased and stringent filtering criteria, we analyzed , genomes deposited in gisaid till th june, . altogether, non-synonymous mutations in rbd were identified that belong to viral genomes from countries. these mutations were found to substitute amino acid residues in which residues lie within rbm ( table ) . these residues include those that directly engage ace (g , l , a , g , e , f and q ) and those that are in the near binding vicinity (figure a and b). hot spot mutations were also identified that caused recurrent substitutions of amino acid residues in the same position (n , p , q , i , s , v , f , a , p and a ). each rbd mutation was found to be unique to the genome; a combination of mutations was never observed in our analysis. overall, rbd mutations accounted for ~ % of the total non-synonymous mutations in s-protein. to see the evolutionary trend in rbd mutations, we compared rbds from sars-cov- , the related sars-cov and the bat coronavirus ratg , a suspected precursor of sars-cov- . 
sars-cov- rbd is . % identical to sars-cov and . % identical to ratg ( figure a ). we identified several rbd mutations on residues that are unique to sars-cov- (n k, v a/f/i, e d, f s/l, q l and s p) or are conserved in all three viruses ( figure a ). in addition, we observed micro evolutionary reversion mutations in sars-cov- that interchange residues to that in sars-cov or ratg (r k, n d, n k, l r, e v and s g) ( figure a) . interestingly, most of the unique and reversion mutations were located in the rbm region and thus may have implications in viral tropism. we performed phylogenomic analysis to understand the evolutionary pattern of rbd variants during pandemic and observed an unbiased clustering of rbd variants among distinct sars-cov- subtypes likely indicating independent emergence of these mutants (figure c ). rbd is divided into a structured core region comprising five antiparallel β-sheets and a variable random coil region, rbm that directly binds to ace (figure b) . on the contrary, binding information on ace is located across long α-helices. structurally, rbm scaffold resembles a concave arch that makes three contact points with ace α-helix; cluster-i, ii and iii. cluster-i and cluster-iii are on two ends and cluster-ii is towards the middle of the interface (figure a) . we analysed the effect of observed rbm mutations on the molecular interactions at rbd-ace binding interface. it has been shown that differences in ace residues render mouse resistant to infection from sars-like coronaviruses (zhao et al., ) . hence, to gain insights into relevant interactions that can create stable interface in mutants, we also included rbd-mouse ace interface in our analysis ( figure b) . structure models were created for all mutants based on the information from three recently reported crystal/ cryo-em structures of sars-cov- rbd-ace bound complex (figure a) . 
comparative analysis of structures showed key differences in all three binding clusters of sars-cov- rbd wild type and mutant interfaces with human or mouse ace (figure c, d and table s ). in cluster-i, f of sars-cov- rbm is found buried in the hydrophobic pocket made of human ace residues l , m and y . an f>l mutation in sars-cov disrupts this pocket and thus weakens the binding affinity, suggesting the importance of this interaction (wan et al., ) . in addition, n of sars-cov- rbm forms hydrogen bonds with q and y of human ace . the hydrophobic pocket and n -y interactions were completely abolished in the mouse interface owing to the natural ace substitutions l t, m s and y f. however, these interactions were retained in all rbm mutants, suggesting their importance in the pandemic. nevertheless, interactions of a / g of sars-cov- rbd with s of ace in cluster-i, which were present in human and mouse, were disrupted in mutants. in addition, sars-cov- genomes containing a v and g s replacements were identified in our analysis, suggesting these mutations can be well tolerated. an additional hydrogen bond between y -y was seen in mutants lacking the a /g -s interaction. this could possibly be a compensatory mechanism to stabilize cluster-i interactions (figure c, d and table s ). a set of interactions in cluster-iii involving g /y /q of rbm and q of ace were present in human but abolished in mouse and rbm mutants. however, additional interactions to compensate for these were not seen. a hydrogen bond formed between g of rbm and k of human ace appeared significant as this was completely abolished in mouse, owing to the k h substitution, but retained in all rbm mutants. in addition, other interactions in the same cluster (g -k , y -d and t -y ) were maintained in human, mouse and mutants, likely suggesting a supportive role ( figure c, d and table s ). the varying interface arrangements in mutants were consistent with the binding affinity differences (ΔΔg). 
compared to wild type, ΔΔg values of mutants ranged within ~ + kcal/mol, with the lowest value close to that of sars-cov ( figure s ) . since the rbm is a variable loop, mutations on any residue could alter the spatial arrangement of the backbone, leading to altered binding affinities. consistently, we did not find a considerable difference in binding energies between mutations on residues that are involved in ace interaction and those in the near vicinity. in conclusion, we could pinpoint two interfacial interactions that remain unaffected in all mutants analysed. these are interactions mediated through rbd residues n and g and are located in cluster-i and cluster-iii respectively. based on their spatial arrangement, these residues appear critical in directly anchoring the rbm loop onto ace . this may help initiate interface formation that favours viral entry. both n and g are highly conserved in all sars-like coronaviruses, further reinforcing this notion. the significant remodelling in cluster-ii interactions indicates they are dispensable for anchoring but might be important for stabilizing the interface. since the rbd-ace interface is a direct determinant of viral infectivity, along with other factors, varying interface architecture and binding affinities in sars-cov- rbm mutants may account for global variations in covid- transmission and outcome. sars-cov- s-protein is highly immunogenic, so recombinant vaccines and neutralizing antibodies that target the whole s-protein or rbd are currently being considered in clinics . our investigations reveal key molecular determinants and their modalities for rbd-ace interaction. this information might be used to design vaccines, synthetic nanobodies or small molecules that could specifically target rbm anchoring residues or their binding pockets. 
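the ΔΔg comparison used in this section reduces to simple arithmetic, sketched below. the sign convention (mutant minus wild type, so a positive value means weaker predicted binding) and the default temperature are assumptions made for this illustration; the text does not state which convention or temperature the authors used, and the function names are hypothetical. the conversion to a dissociation constant uses the standard relation Δg = rt ln(kd).

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal/(mol*K)


def ddg(dg_mutant, dg_wild_type):
    """Binding free energy difference in kcal/mol (assumed: mutant minus wild type)."""
    return dg_mutant - dg_wild_type


def kd_from_dg(dg, temperature_k=298.15):
    """Dissociation constant in mol/L from a binding free energy in kcal/mol."""
    return math.exp(dg / (GAS_CONSTANT_KCAL * temperature_k))
```

under this convention a mutant with a less negative Δg than wild type has a positive ΔΔg and a larger kd, that is, weaker predicted binding.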
tepymol molecular graphics system the spike protein of sars-cov -a target for vaccine and therapeutic development sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor modelfinder: fast model selection for accurate phylogenetic estimates mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform increasing the precision of comparative models with yasara nova-a selfparameterizing force feld structure of the sars-cov- spike receptor-binding domain bound to the ace receptor iq-tree: a fast and effective stochastic algorithm for estimating maximum likelihood phylogenies characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov zdock server: interactive docking prediction of protein-protein complexes and symmetric multimers swiss-model: an automated protein homology-modeling server pic: protein interactions calculator ligplot: a program to generate schematic diagrams of protein-ligand interactions structure, function, and antigenicity of the sars-cov- spike glycoprotein receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus structural and functional basis of sars-cov- entry by using human ace enhanced receptor binding of sars-cov- through networks of hydrogen-bonding and hydrophobic interactions jalview version -a multiple sequence alignment editor and analysis workbench cryo-em structure of the -ncov spike in the prefusion conformation prodigy: a web server for predicting the binding affinity of protein-protein complexes mining of epitopes on spike protein of sars-cov- from covid- patients broad and differential animal angiotensin-converting enzyme receptor usage by sars-cov- the authors wish to acknowledge john b johnson, mahendran kr and sara jones for critical comments. the work was supported by the department of biotechnology, government of india. 
key: cord- -vb bih authors: ahmed, shiek ssj; paramasivam, prabu; raj, kamal; kumar, vishal; murugesan, ram; ramakrishnan, title: interplay of host regulatory network on sars-cov- binding and replication machinery date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vb bih we dissect the mechanism of sars-cov- in the human lung host from the initial phase of receptor binding to the viral replication machinery. we constructed two independent lung protein interactomes to reveal the signaling process on receptor activation and the host protein hijacking machinery in the pathogenesis of the virus. we then tested the functional role of the hubs derived from both interactomes. most hub proteins were differentially regulated upon sars-cov- infection. also, the proteins of the viral replication hubs were related to cardiovascular disease, diabetes and hypertension, confirming the vulnerability to and severity of infection in at-risk individuals. additionally, the hub proteins were closely linked with other viral infections, including mers and hcovs, which suggests a similar infection pattern in sars-cov- . we identified five interconnecting cascades between hubs of both networks that show the preparation of an optimal environment in the host for the viral replication process upon receptor attachment. interestingly, we propose seven potential mirnas, targeting the intermediate phase that connects the receptor and viral replication processes, as a better choice of drug for sars-cov- . coronavirus disease (covid- , sars-cov- ) has become a pandemic, spreading to almost countries worldwide [ ] . sars-cov- causes lower respiratory infections leading to pneumonitis and multi-organ failure [ ] . to date, cases have been reported as of th april, and the number is expected to increase every day through human-to-human transmission arising from close contact with one another [ ] . an increase in patient numbers increases the fatality rate because of the lack of a specific treatment against sars-cov- . 
today, covid- has become a serious threat in both developed and developing countries [ ] . there is an urgent need to accelerate work on specific treatments to decrease disease, morbidity and mortality [ , ] . repurposing of existing drugs has been considered as an immediate remedy rather than waiting for a new molecule to pass through human clinical trials. in addition, sars-cov- screening at the early stage may benefit the vulnerable population with the risk factors of diabetes, hypertension [ ] [ ] [ ] , and cardiovascular disease [ ] . however, understanding the mechanism of sars-cov- and host involvement will provide knowledge that guides the development of biomarkers and drugs to treat this pandemic disease. sars-cov- is a single-stranded rna virus derived from the human sars-cov- group of the ß-genus, with a genome size of kbp. the sars-cov- genome encodes most structural proteins at ' orfs and non-structural proteins (nsp) at ' end orfs [ , ] . the nsp proteins are categorized as nsp - , which play a potential role in replicase / transcriptase activity, whereas the orfs at ' encode nine putative accessory factors along with four structural proteins: nucleocapsid (n), envelope (e), membrane (m), and spike (s) proteins [ , ] . the envelope (e), membrane (m), and spike (s) proteins are localized at the viral surface and are suggested to have host-binding capacity. on host attachment, sars-cov- initiates the infectious cycle, undergoes rna replication inside the host, and assembles its substrates into progeny virus. in general, at the initial phase of host binding, the virus may create an optimal environment for its replication by interacting with the host proteins. simultaneously, the host alters its gene expression to react against the virus for self-defense, leading to the overexpression of inflammatory genes and cytokine storm [ ] . due to the viral genome complexity and % sequence dissimilarity with other pathogenic hcovs [ ] . 
notably, the sequence variations at the replicase complex (orf ab), envelope (e), spike (s), nucleocapsid (n) and membrane (m) suggest the possibility of different mechanisms adopted by sars-cov- , which calls into question the repurposing of existing drugs. the current knowledge of how sars-cov- utilizes host molecular components is still at an early stage compared with other known human-infecting rna viruses. understanding the regulatory behavior will enhance our knowledge of the virus-host mechanism, which may throw light on diagnosis and treatment. in this study, we report for the first time a systems biological framework (fig ) to investigate the molecular interplay between sars-cov and human lung tissue. the novel approach uses high-throughput experimental data and the human protein interactome to reveal the sars-cov driven host mechanism. our approach provides (i) functional hubs that are activated upon viral attachment to the receptor and viral genome invasion of the host; (ii) the association of virus-modulated functional hubs with diabetes, hypertension, and cardiovascular disease; (iii) an interdependency between the functional hubs that demonstrates the utilization of a host environment created upon receptor activation for the viral replication process; and (iv) hubs representing mirnas for therapeutic intervention. overall, our framework draws on information such as the human tissue proteome, transcriptome, text mining, and mirna data to postulate a mechanism in the host on sars-cov infection. first, we analyzed whether sars cov- proteins interact with endogenous human (host) proteins. for this, we searched and collected sars-cov- host proteomics data derived from affinity purification mass spectrometry [ ] . the data provide human endogenous proteins interacting with sars cov- proteins. further, these human endogenous proteins were categorized as set-a and set-b. 
set-a is composed of human proteins showing physical affinity with structural proteins, namely the envelope (e), spike (s), and membrane (m) proteins localized at the sars-cov surface, whereas set-b contains host proteins showing affinity with the sars-cov non-structural proteins and the nucleocapsid (n) protein. we then investigated host proteins that favor the attachment of sars-cov . annotation of the cellular localization of set-a proteins identified atp v a, ap b , stom, and zdhhc as localized at the host cell surface, where they may enable sars-cov attachment. further, the proteins in set-a and set-b were mapped to an in-house lung-expressed gene database to confirm their expression in the lungs, so that the mechanism can be related to lung tissue. next, we investigated high-throughput protein-protein interaction networks that reveal the sars-cov mechanism in the host. two independent protein interactome networks were generated from set-a and set-b, respectively. the first interactome, the receptor-mediated network from set-a, contains four cell surface seed proteins that were extended to have neighboring proteins, with every node directly connected to its seed protein. the receptor-mediated network describes the molecular signaling initiated when sars-cov attaches to the host. the second, the viral replication machinery network from set-b, has seed proteins extended to neighboring proteins with interacting edges, representing the mechanism attributed to invasion of the host by the sars-cov genome. the proteins involved in the networks were mapped to the in-house lung-expressed gene database. as a result, all proteins with four seeds in the receptor-mediated network were expressed in the lungs (fig ) . similarly, the viral replication machinery network was acquired with proteins with interacting edges showing the complex sars-cov- mechanism in the human lungs. here, we looked for densely connected proteins, which provide essential functional hubs within the network. 
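the network construction described above (seed proteins extended to their directly connected neighbors, then checked against a lung-expressed gene list) can be sketched as follows. this is a minimal illustration in python under stated assumptions, not the authors' pipeline: the function name `first_neighbor_network`, the toy edge list, and the placeholder protein identifiers are all hypothetical.

```python
from collections import defaultdict


def first_neighbor_network(edges, seeds, lung_expressed=None):
    """Map each seed protein to the proteins that interact with it directly.

    edges: iterable of (protein_a, protein_b) pairs from an interactome.
    seeds: seed proteins (for example, the cell-surface proteins of set-a).
    lung_expressed: optional collection of lung-expressed proteins; when
        given, neighbors outside it are dropped.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adjacency[a].add(b)
        adjacency[b].add(a)

    allowed = None if lung_expressed is None else set(lung_expressed)
    network = {}
    for seed in seeds:
        # .get() avoids creating empty entries for seeds with no edges
        neighbors = set(adjacency.get(seed, ()))
        if allowed is not None:
            neighbors &= allowed
        network[seed] = neighbors
    return network


# Hypothetical toy interactome; identifiers are placeholders, not real data.
toy_edges = [
    ("SEED1", "P1"), ("SEED1", "P2"), ("P2", "P3"),
    ("SEED2", "P4"), ("P4", "SEED2"), ("SEED2", "P5"),
]
toy_network = first_neighbor_network(
    toy_edges, ["SEED1", "SEED2"], lung_expressed={"P1", "P2", "P4"}
)
```

in this toy case, the second-order neighbor `P3` is excluded because it is not directly connected to a seed, and `P5` is excluded because it is absent from the assumed lung-expressed list.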
the molecular complex detection (mcode) algorithm [ ] was used to extract hubs that define the core regulatory signaling and cellular processes within the network. eleven core functional hubs were obtained from the viral replication machinery network (s - fig) . for the receptor-mediated network, the mcode algorithm was not applied because of its limited number of nodes and edges, which by themselves form hubs. hence, the interacting proteins for all four distinct receptors were considered as hubs for the receptor-mediated network (fig ) . all hub proteins were mapped to differentially expressed genes from rna-seq data. of hub proteins from both networks, were differentially regulated in primary human lung epithelium (nhbe) on sars-cov infection (rna-seq dataset gse ). for instance, eight differentially expressed hub proteins were noticed in the receptor-mediated network (s table) . similarly, among hub proteins of the viral replication machinery network, were altered significantly, which confirms the hijacking of endogenous human proteins for the viral replication process (s table) . next, we investigated why individuals with metabolic diseases such as cardiovascular disease, diabetes, and hypertension are susceptible to sars-cov infection. for that, we tested the associations of hub proteins with these metabolic diseases using natural language processing in our in-house r code. among hub proteins from the network, , , and proteins were associated with cardiovascular disease, diabetes, and hypertension, respectively (s table) . we adopted a similar mining process to determine the relevance of hub proteins to lung diseases such as asthma, chronic bronchitis, pneumonia, chronic obstructive pulmonary disease (copd), and emphysema. simultaneously, the relevance of hub proteins was tested in viral infections such as adenovirus, influenza, metapneumovirus, parainfluenza, respiratory syncytial virus, rhinovirus, hcov-nl , hcov- e, hcov-hku , hcov-oc , sars-cov and mers-cov. 
these results suggest that hub proteins have relevance to lung disease (s table) , and proteins were associated with known viral infections (s table) . these associated hubs convey the likelihood of vulnerability in the at-risk population, of symptoms related to lung diseases, and of similar modes of viral infection. subsequently, protein enrichment analysis was performed to check the molecular involvement of hubs in sars-cov infection. the enrichment analysis showed several over-represented pathways for the hubs of the receptor-mediated ( fig ) and viral replication machinery networks (fig ) . the receptor-mediated network hubs exhibit several molecular signaling and cellular events upon sars-cov attachment (s table) . notably, most hub proteins showed significant association with immune mechanisms, signal transduction, metabolism of proteins, metabolism of rna, transcription machinery, vesicle-mediated transport, metabolism, transport of small molecules, cell cycle and homeostasis (s table) . also, a moderate involvement was seen in cellular response mechanisms, cell-cell communication, and organelle biogenesis and maintenance. similarly, the enrichment analysis of hubs derived from the viral replication machinery network (fig ) showed a major involvement of immune mechanisms, signal transduction, metabolism of proteins, metabolism of rna, transcription machinery, dna repair, dna replication, programmed cell death, and cellular responses to external stimuli (table s ) . interestingly, several common mechanisms were noticed that arise from interconnected proteins shared between the hubs of both networks (fig - ). 
these common molecules represent the interconnecting mechanisms involved in transcription machinery, immune response, cell growth and/or maintenance, transport, metabolism, protein metabolism, cell communication and signal transduction that are activated upon virus binding and subsequently utilized for the viral replication process (s table) . we further investigated these interconnecting proteins for possible therapeutic use. for this, the mirnas targeting the interconnecting proteins (genes) were identified. among proteins, were shown to be targeted by mirna. of these, mir- - p, let- g- p, mir- a- p, mir- b, mir- - p, mir- - p and mir- - p were noticed to target more than three interconnected proteins (fig ) . these mirnas may be useful for therapeutic intervention. we implemented a comprehensive systems biological framework ( fig. ) that utilizes a variety of datasets to illustrate the critical mechanism mediated by sars-cov in the lungs. two independent lung protein interactomes were constructed. the first interactome, termed the receptor-mediated network, revealed the molecular interactions that mediate cellular and signaling processes on sars-cov attachment to the host. in the receptor-mediated network, atp v a, ap b , stom, and zdhhc are the seed proteins localized at the host surface that showed significant affinity to the outer surface envelope (e), spike (s), and membrane (m) proteins of sars-cov . on binding to atp v a, ap b , stom, and zdhhc , the host activates multiple signaling and cellular events, as demonstrated in the receptor-mediated protein network (fig ) . for instance, the viral envelope (e) binds to the ap b cell-surface protein, which initiates the interaction with its neighboring proteins, ap g , ap m , ap d , ap m , ap s , ap s , arf , arf , arrb , arrb , bub , bub b, cltc, and csnk a . similarly, spike (s) interacts with zdhhc , which connects cldnd , flot , fxyd , fyn, lrrc a, and pif . 
likewise, the sars-cov membrane (m) protein interacts with the atp v a and stom host surface proteins, which signal to human endogenous proteins (fig ) . each host cell surface protein and its interacting neighbors form a closed hub with no extensive interconnection with another hub. pathway enrichment showed that these hubs take part in a wide variety of molecular pathways, including endosome transport, vesicle-mediated transport, protein modification, regulation of cell cycle, immune response, kinase activity, signal transduction, protein metabolism, cell communication, energy and metabolic pathways, cell growth and maintenance, and regulation of nucleic acid metabolism. to our knowledge, most of these mechanisms are well connected with viral pathogenesis. notably, at an early phase of surface attachment, the host initiates inflammatory signaling such as interleukin signaling (il , il , il , il , il ), tnf receptor, and nf-kappa b signaling pathways that may initiate a cytokine storm in lung cells [ , ] . simultaneously, the involvement of transcriptional and translational regulatory processes was noticed in the pathway enrichment of receptor hubs. these results suggest that, on surface receptor activation, the host prepares an optimal environment for viral replication. the second lung protein interactome was named the viral replication machinery network. in this network, we demonstrated the complex interconnectivity of the human endogenous proteome with the nucleocapsid (n) and non-structural sars-cov proteins. the viral replication machinery network conveys the utilization of host factors for sars-cov replication and the host self-defense mechanism. using the mcode algorithm, eleven hubs (s - fig) were extracted from the viral replication machinery network, which play a significant role in cellular and molecular mechanisms (s table) . 
further, mapping the differentially expressed genes to the hubs showed that approximately % of hub proteins were altered upon sars-cov infection (s table) . this result confirms the hijacking of host hub proteins and their molecular pathways by the sars-cov machinery. some hub proteins have been linked to diabetes, hypertension, and cardiovascular disease (s table) . these results suggest that sars-cov- alters and utilizes these disease proteins for its replication machinery, which may be a considerable cause of susceptibility, severity, and complications. similar observations were made when mapping hub proteins to proteins related to lung disease and other viral infections. on mapping, hub proteins were shared between the lung diseases, suggesting a common mechanism relating disease symptoms and/or vulnerability (s table) . also, on mapping to other viral infection datasets, hub proteins of the replication machinery network were noticed in influenza virus infection (s table) , which suggests that sars-cov and influenza may have a similar mode of host infection machinery [ ] . apart from this, several common pathways between the hubs of the two networks were noticed (s table) . we obtained these common pathways as an intersection of the interacting molecules between the hubs (fig - ). for example, the ap b hub derived from the receptor-mediated network interacts with "hub " of the viral replication machinery network through the gtf f , polr e, arrb , and ap b interconnecting proteins ( fig ) . overall, five interacting hubs were noticed between the receptor-mediated and viral replication machinery networks (fig - ). the molecular pathways of the interconnecting protein hubs could be the intermediate phase that connects the receptor activation mechanism and the viral replication process (fig ) . such intersecting proteins may aid in developing a drug for sars-cov . here, we suggest a few mirnas that regulate the proteins of the interconnecting hubs, which may have therapeutic potential. 
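the two steps just described, intersecting hub memberships to find interconnecting proteins and then shortlisting mirnas by how many of those proteins they target, can be sketched as below. this is an illustrative python sketch only: the function names, the `min_targets` parameter, and all hub, protein, and mirna identifiers are hypothetical placeholders, and the threshold is left adjustable because the text states it in two slightly different ways.

```python
def interconnecting_proteins(receptor_hubs, replication_hubs):
    """Return {(receptor_hub, replication_hub): shared proteins} for every
    pair of hubs that has at least one protein in common."""
    links = {}
    for r_name, r_members in receptor_hubs.items():
        for v_name, v_members in replication_hubs.items():
            shared = set(r_members) & set(v_members)
            if shared:
                links[(r_name, v_name)] = shared
    return links


def mirnas_by_target_count(mirna_targets, proteins, min_targets=3):
    """Keep miRNAs that target at least `min_targets` of `proteins`."""
    proteins = set(proteins)
    selected = {}
    for mirna, targets in mirna_targets.items():
        hits = set(targets) & proteins
        if len(hits) >= min_targets:
            selected[mirna] = hits
    return selected


# Hypothetical toy inputs; none of these identifiers come from the study.
receptor_hubs = {"R1": {"A", "B", "C", "D"}, "R2": {"E", "F"}}
replication_hubs = {"H1": {"B", "C", "D", "X"}, "H2": {"Y", "Z"}}
links = interconnecting_proteins(receptor_hubs, replication_hubs)

# Pool the interconnecting proteins across all linked hub pairs.
shared_all = set().union(*links.values())

mirna_targets = {"mir-x": {"B", "C", "D", "Q"}, "mir-y": {"B", "Q"}}
candidates = mirnas_by_target_count(mirna_targets, shared_all, min_targets=3)
```

here only the placeholder `mir-x` is retained, because it targets three of the pooled interconnecting proteins while `mir-y` targets one.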
mirnas have been regarded as drug molecules for various diseases [ ]. of mirnas, mir- - p, let- g- p, mir- a- p, mir- b, mir- - p, mir- - p and mir- - p target a minimum of three interconnecting proteins (fig ) . these seven mirna molecules deserve closer attention in future research to screen and treat the current sars-cov pandemic. overall, this study offers several advances and advantages: ) it establishes the interplay between sars-cov and the human lung host mechanism by integrating experimental evidence from high-throughput multi-omics observations on sars-cov ; ) our approach provides a possible clue to susceptibility and severity in individuals with metabolic disease complications and suggests possible proteomic relevance to lung diseases and other viral infections; ) our study increases knowledge of the molecular interconnectivity between the receptor binding mechanism and the viral replication machinery, which can guide drug target research. although our approach provides a multi-dimensional view of the sars-cov and host interaction, two major limitations need to be considered: ) the interaction between the sars-cov spike protein and the host angiotensin (ace ) receptor is not defined in the affinity purification mass spectrometry data, hence no direct ace interaction has been established in our interactome; ) the few mirnas proposed by our approach need to be validated further for clinical application. in summary, our systems biological approach provides an extensive investigation of the sars-cov host interaction by constructing interactomes based on experimental evidence. we identified crucial functional hubs that relate the mechanism activated on sars-cov attachment through the receptor to the subsequent utilization of functional hubs for viral replication in the host. enrichment analysis supports our hypothesis, showing common mechanisms associated with transcriptional and protein translation processing that are activated upon host attachment. 
hub proteins were linked with diabetes, hypertension, and cardiovascular disease proteins, which provides a clue to severity and susceptibility in the diseased population. the relationship of hub proteins with lung diseases like copd, asthma, and pneumonia indicates the likelihood of similar symptoms. also, comparing the hubs with the reported proteins of other viral infections indicates similarity in the mode of viral infection. interestingly, we propose mir- - p, let- g- p, mir- a- p, mir- b, mir- - p, mir- - p, and mir- - p, identified from the genes of interconnecting proteins and their molecular pathways, as mirna drugs for sars-cov . however, further work is needed to confirm their utility for clinical application. we believe our results will add to the existing information and may open up mirnas as drugs for sars-cov infection. datasets relating severe acute respiratory syndrome coronavirus (sars-cov ) and human protein interactions were searched electronically. from an extensive search, the experimentally proven dataset reporting the physical association of sars-cov proteins with human proteins, determined using affinity purification mass spectrometry (ap-ms) [ ] , was retrieved. we converted all collected interacting human proteins to official gene/protein symbols using the uniprot database (https://www.uniprot.org/). next, we collected lung protein expression data from the hprd proteins. subsequently, in order to determine the involvement of each hub in viral pathogenesis, the expression data of sars-cov infection were analyzed, and the differentially expressed genes were mapped to hub proteins. we searched the gene expression dataset in ncbi gene expression omnibus were retrieved from ncbi, sra database. after quality assessment, the reads were aligned to the human genome (hg ) following the rna-seq analysis pipeline to determine the differentially expressed genes using deseq with p-value < . . 
further, the significant differentially expressed genes in the host on sars-cov infection were collected and mapped to the hubs. an r program was employed to find hub genes reported in the literature as associated with the risk factors diabetes, hypertension, and cardiovascular disease. in brief, the r code collects abstracts from ncbi pubmed for the keywords related to each risk factor. all abstracts were automatically examined for the presence of risk factors (example: "diabetes") and hub proteins using a natural language processing method. further, the co-occurrence of risk factors with each protein symbol in the abstracts was assessed using a point-wise mutual information method. a similar data mining process was carried out for lung diseases (asthma, chronic bronchitis, pneumonia, and emphysema) and other viral pathogens such as adenovirus, influenza, metapneumovirus, parainfluenza, respiratory syncytial virus, rhinovirus, hcov-nl , hcov- e, hcov-hku , hcov-oc , sars-cov and mers-cov. simultaneously, all hubs were enriched using kyoto encyclopedia of genes and genomes (reactome; https://reactome.org//) database to evaluate the relevance of the pathways of each hub in the sars-cov infection and replication process. in particular, the pathway enrichment of receptor-mediated network hubs shows the mechanisms initiated upon sars-cov receptor activation, whereas the pathways of viral replication machinery network hubs explain the mechanisms attributed to invasion of the host by the sars-cov genome for its replication process. this cluster is termed as hub that represents the closed interconnected proteins (rectangular box) showing no direct connectivity with sars-cov proteins. however, these proteins are involved in the host inflammatory process, protein serine/threonine kinase, dna topoisomerase, transcription factor, transcription regulator, chaperone, and ubiquitin-specific protease activity. 
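the point-wise mutual information scoring mentioned in the text-mining step above can be illustrated with a short sketch. the authors report using r; the python below is a stand-in written for illustration, and it assumes the standard definition pmi(x, y) = log2(p(x, y) / (p(x) p(y))) with probabilities estimated as the fraction of abstracts containing each term. the function names, the toy abstracts, and the placeholder symbols `prota` and `protb` are hypothetical, and only single-word terms are matched.

```python
import math
import re


def tokens(text):
    """Lowercase a text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pointwise_mutual_information(abstracts, term, protein):
    """PMI (base 2) of `term` and `protein` co-occurring in an abstract.

    Returns None when the two never co-occur, since log(0) is undefined.
    """
    term, protein = term.lower(), protein.lower()
    n = len(abstracts)
    n_term = n_protein = n_both = 0
    for abstract in abstracts:
        words = tokens(abstract)
        has_term = term in words
        has_protein = protein in words
        n_term += has_term
        n_protein += has_protein
        n_both += has_term and has_protein
    if n_both == 0:
        return None
    # p(x, y) / (p(x) * p(y)) simplifies to n_both * n / (n_term * n_protein)
    return math.log2((n_both * n) / (n_term * n_protein))


# Hypothetical toy corpus; not drawn from PubMed.
toy_abstracts = [
    "protA signalling is altered in diabetes",
    "diabetes and obesity in adults",
    "kinase activity in t cells",
    "plant root growth",
]
score = pointwise_mutual_information(toy_abstracts, "diabetes", "protA")
```

a positive score means the pair co-occurs more often than expected if the two terms were independent; multi-word risk factors such as "cardiovascular disease" would need phrase matching, which this sketch does not attempt.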
this cluster is termed as hub that represents the closed interconnected proteins this cluster is termed as hub that represents the closed interconnected proteins transport. over-representation of these molecular events suggests host self-defense mechanism and preparation of the optimal environment in the host for the viral replication process on receptor attachment. additionally, the red highlighting node represents the differentially expressed genes on sars-cov infection derived from rna-seq analysis. interconnecting hub proteins are represented. these mirnas will be the potential molecule as a drug that helps in inhibition of connective between host attachment and the viral replication process. who strategic and technical advisory group for infectious hazards. covid- : towards controlling of a pandemic hypothesis for potential pathogenesis of sars-cov- infection-a review of immune changes in patients with viral pneumonia coronavirus disease (covid- ) situation report - managing covid- in low-and middle-income countries covid- -navigating the uncharted real estimates of mortality following covid- infection covid- and diabetes: knowledge in progress a new coronavirus associated with human respiratory disease in china china hypertension survey investigators. status of hyperten-sion in china: results from the china hypertension survey european centre for disease prevention and control. novel coronavirus disease (covid- ) pandemic: increased transmission in the eu/eea and the uk -sixth update genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan hlh across speciality collaboration, uk. 
covid- : consider cytokine storm syndromes and immunosuppression network-based drug repurposing for novel coronavirus -ncov/sars-cov- a sars-cov- -human protein-protein interaction map reveals drug targets and potential drug an automated method for finding molecular complexes in large protein interaction networks how to reduce the likelihood of coronavirus- (cov- or sars-cov- ) infection and lung inflammation mediated by il- sars-cov- detection in patients with influenza-like illness mirnas in alzheimer disease -a therapeutic perspective microrna therapeutics: towards a new era for the management of cancer and other diseases all authors thank chettinad academy of research and education (care) for the support. we specially thank professor ram murugesan, research director (care) for the critical comments and feedback on the manuscript. key: cord- -f zqhpx authors: slaine, patrick; kleer, mariel; duguay, brett; pringle, eric s.; kadijk, eileigh; ying, shan; balgi, aruna d.; roberge, michel; mccormick, craig; khaperskyy, denys a. title: thiopurines activate an antiviral unfolded protein response that blocks viral glycoprotein accumulation in cell culture infection model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: f zqhpx enveloped viruses, including influenza a viruses (iavs) and coronaviruses (covs), utilize the host cell secretory pathway to synthesize viral glycoproteins and direct them to sites of assembly. using an image-based high-content screen, we identified two thiopurines, -thioguanine ( -tg) and -thioguanosine ( -tgo), that selectively disrupted the processing and accumulation of iav glycoproteins hemagglutinin (ha) and neuraminidase (na). selective disruption of iav glycoprotein processing and accumulation by -tg and -tgo correlated with unfolded protein response (upr) activation and ha accumulation could be partially restored by the chemical chaperone -phenylbutyrate ( pba). 
chemical inhibition of the integrated stress response (isr) restored accumulation of na monomers in the presence of -tg or -tgo, but did not restore na glycosylation or oligomerization. thiopurines inhibited replication of the human coronavirus oc (hcov-oc ), which also correlated with upr/isr activation and diminished accumulation of orf ab and nucleocapsid (n) mrnas and n protein, which suggests broader disruption of coronavirus gene expression in er-derived cytoplasmic compartments. the chemically similar thiopurine -mercaptopurine ( -mp) had little effect on the upr and did not affect iav or hcov-oc replication. consistent with reports on other cov spike (s) proteins, ectopic expression of sars-cov- s protein caused upr activation. -tg treatment inhibited accumulation of full length s or furin-cleaved s fusion proteins, but spared the s ectodomain. dbeq, which inhibits the p aaa-atpase required for retrotranslocation of ubiquitinated misfolded proteins during er-associated degradation (erad) restored accumulation of s and s proteins in the presence of -tg, suggesting that -tg induced upr accelerates erad-mediated turnover of membrane-anchored s and s glycoproteins. taken together, these data indicate that -tg and -tgo are effective host-targeted antivirals that trigger the upr and disrupt accumulation of viral glycoproteins. importantly, our data demonstrate for the first time the efficacy of these thiopurines in limiting iav and hcov-oc replication in cell culture models. importance secreted and transmembrane proteins are synthesized in the endoplasmic reticulum (er), where they are folded and modified prior to transport. during infection, many viruses burden the er with the task of creating and processing viral glycoproteins that will ultimately be incorporated into viral envelopes. some viruses refashion the er into replication compartments where viral gene expression and genome replication take place. 
this viral burden on the er can trigger the cellular unfolded protein response (upr), which attempts to increase the protein folding and processing capacity of the er to match the protein load. much remains to be learned about how viruses co-opt the upr to ensure efficient synthesis of viral glycoproteins. here, we show that two fda-approved thiopurine drugs, -tg and -tgo, induce the upr in a manner that impedes viral glycoprotein accumulation for enveloped influenza viruses and coronaviruses. these drugs may impede the replication of viruses that require precise tuning of the upr to support viral glycoprotein synthesis for the successful completion of a replication cycle. enveloped viruses encode integral membrane proteins that are synthesized and post-translationally modified in the endoplasmic reticulum (er) prior to transport to sites of virion assembly. 
when er protein folding capacity is exceeded, the accumulation of unfolded proteins in the er causes activation of the unfolded protein response (upr) whereby activating transcription factor- (atf ), inositol requiring enzyme- (ire ) and pkr-like endoplasmic reticulum kinase (perk) sense er stress and trigger the synthesis of basic leucine zipper (bzip) transcription factors that initiate a transcriptional response ( ) . upr gene expression causes the accumulation of proteins that attempt to restore er proteostasis by expanding er folding capacity and stimulating catabolic activities like er-associated degradation (erad) ( ). erad ensures that integral membrane proteins that fail to be properly folded are ubiquitinated and retrotranslocated out of the er for degradation in the s proteasome. there is accumulating evidence that bursts of viral glycoprotein synthesis can burden er protein folding machinery, and that enveloped viruses subvert the upr to promote efficient viral replication ( , ) . influenza a viruses (iavs) encode three integral membrane proteins: hemagglutinin (ha) neuraminidase (na) and matrix protein (m ). ha adopts a type i transmembrane topology in the er, followed by addition of n-linked glycans, disulfide bond formation, and trimerization prior to transport to the golgi and further processing by proteases and glycosyltransferases ( - ); na adopts a type ii transmembrane topology in the er, is similarly processed by glycosyltransferases and protein disulfide isomerases, and assembles into tetramers prior to traversing the secretory pathway to the cell surface ( , ). the small m protein also forms disulfide-linked tetramers in the er, which is a prerequisite for viroporin activity ( - ). iav replication causes selective activation of the upr; ire is activated, but perk and atf are not ( ), although the precise mechanisms of regulation remain unknown. 
furthermore, chemical chaperones and selective chemical inhibition of ire activity inhibit iav replication, suggesting that ire has pro-viral effects. ha is sufficient to activate the upr ( ) and is subject to erad-mediated degradation ( ) . by contrast, little is known about how na and m proteins affect the upr. however, these inhibitors triggered sg formation and cytotoxic effects in uninfected cells as well, limiting their potential utility as antivirals. because sg formation correlates with antiviral activity, we conducted an image-based high-content screen to identify molecules that selectively induce sg formation in iav infected cells. we identified two fda-approved thiopurine analogs, - thioguanine ( -tg) and -thioguanosine ( -tgo), that blocked iav and hcov-oc replication in a dose-dependent manner. unlike pateamine a and silvestrol, these thiopurines selectively disrupted the processing and accumulation of viral glycoproteins, which correlated with upr activation. synthesis of viral glycoproteins could be partially restored in -tg treated cells by the chemical inhibition of the upr or isr. our data suggest that upr-inducing molecules could be effective host-targeted antivirals against viruses that depend on er processes to support efficient replication. induction of upr by -tg and -tgo represents a novel host-directed antiviral mechanism triggered by these drugs and reveals a previously unrecognized unique mechanism of action that distinguishes them from other closely related thiopurines and nucleoside analogues. through this screen, we identified two thiopurines, -thioguanine ( -tg) and - thioguanosine ( -tgo) (fig. a) , that triggered dose-dependent sg formation in iav-infected cells (fig. b) . specifically, sgs formed in approximately % of -tg-treated or -tgo-treated infected cells; no sgs were detected in mock infected cells treated with either drug at the highest concentration (fig. b) . 
these findings were confirmed in parental a cells infected with iav strain a/california/ / (h n ; iav-ca/ ); -tg treated cells displayed the formation of foci that contained sg constituent proteins g bp and poly a binding protein (pabp) (fig. c) . these foci also contained canonical sg proteins tiar and eif a (fig. d) , supporting their identity as bona fide sgs. next, we wanted to determine whether thiopurine-mediated sg formation indicated a disruption of viral replication. a cells were infected with iav strain a/puertorico/ / (h n ; iav-pr ) and treated with -tg, -tgo or controls at hpi. cell supernatants were harvested at hpi and infectious virions enumerated by plaque assay. despite sg induction in only a fraction of virus-infected cells, we observed a sharp dose-dependent decrease in virion production following treatment with either thiopurine analog. treatment with m -tg reduced virion production by ~ -fold, whereas m -tgo reduced virion production by ~ -fold (fig. a) . furthermore, treatment of iav infected cells with µm concentrations of either -tg or - tgo led to even greater inhibition of iav production ( fig. a ). this suggests that sg formation correlates with the disruption of the viral replication cycle. however, the sharp decrease in infectious virion production in -tg/ -tgo-treated cells suggests that sg formation is not required for their antiviral effect. the nucleoside analog -fluorouracil ( -fu) had no effect on iav replication at m and m doses ( fig. a) . using an alamarblue assay, we observed a ~ % reduction in a cell viability in the presence of m doses of -tg/ -tgo (fig. b) . compared to sg-inducing translation inhibitor silvestrol, which causes apoptosis in a cells upon prolonged exposure, we did not observe significant disruption of cell monolayer by -tg treatment ( fig. c ) or induction of apoptosis as measured by parp cleavage (fig. d ). 
this is consistent with a recent report of -tg-mediated cytostatic rather than cytotoxic effects on a cells ( ). in vero cells, -tg treatment partially protected cellular monolayers from iav-induced cell death over -h incubation (fig. e ). taken together, our data suggest that -tg and -tgo elicit a broad dose-dependent antiviral effect against iav that was not shared by the nucleoside analog -fu. faster-migrating, presumably un-glycosylated species, but these were difficult to visualize as they migrated to the same position as np on immunoblots probed with polyclonal anti-iav antibodies that concurrently detect np, m and ha. -fu, which had no effect on viral replication over the h time course in these cells ( fig. a) , likewise had no effect on the accumulation of these iav proteins (fig. a) . consistent with the notion of selective inhibition of iav glycoprotein synthesis and maturation, we observed that -tg had no effect on the accumulation of iav-pr ha or na transcripts or function of the rdrp in genome replication, as -tg had little effect on the accumulation of ha and na genome segments (fig. b ). taken together, these data support a novel mechanism of action for thiopurine analogs in selectively inhibiting processing and accumulation of iav glycoproteins and significantly impairing iav replication. -tg and -tgo activate the upr and chemical mitigation of er stress restores synthesis of by inhibiting n-linked glycosylation, tm impedes proper processing of secreted and transmembrane proteins in the lumen of the er, which elicits er stress and activates the upr ( ). indeed, we observed that tm treatment of a cells caused accumulation of xbp s and the er chaperone binding immunoglobulin protein (bip) (fig. a ). bip upregulation is an excellent measure for upr activation because it requires both atf (n)-dependent transcription of bip and perk-mediated activation of the isr and uorf-skipping-dependent translation ( ) . 
we observed that both -tg and -tgo caused bip and xbp s accumulation in a cells, whereas the chemically similar thiopurine -mercaptopurine ( -mp) did not. nucleoside analogs -fu and ribavirin also did not affect bip or xbp s levels (fig. a ). these data demonstrate that -tg and -tgo, but not all thiopurines, activate the upr in a cells. to determine whether thiopurines could activate the upr during iav infection, a cells infected with iav-pr were treated with -tg. we observed that -tg caused strong accumulation of bip which coincided with diminished accumulation of ha (fig. b ). by contrast, co-administration of -tg and the chemical chaperone -phenylbutyrate ( -pba) ( ) diminished accumulation of bip and partially restored ha levels in infected cells, without affecting levels of np and m proteins. this suggests that thiopurine-mediated activation of the upr/isr is at least partially responsible for the diminished accumulation of ha glycoproteins in infected cells. to corroborate our observation of -tg-mediated upr activation, a cells were mock infected or iav-pr infected, and treated with -tg, -mp, or tm for h before harvesting rna for rt-qpcr analysis of upr gene expression. we analyzed transcripts produced from target genes linked to each arm of the upr; atf target gene chop, xbp s target genes edem and erdj , and atf (n) target genes bip and herpud . as expected, tm treatment caused strong induction of all arms of the upr and increased transcription of all five target genes in mock-infected cells and infected cells alike (fig. c ). this strong and consistent transcriptional output between mock-infected and infected cells suggests that the upr remains largely intact during iav-pr infection. treatment with -mp had little effect on upr gene expression (fig. c ). by contrast, -tg treatment caused statistically significant increases in transcription from all five upr target genes (fig. c) . 
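The fold changes in this RT-qPCR analysis were calculated by the ΔΔCt method, normalized to a reference rRNA and standardized to mock, as stated in the figure legend. A minimal sketch of that arithmetic follows, assuming perfect amplification efficiency (one doubling per cycle); the function name and the Ct values are hypothetical illustrations, not data from this study.

```python
def ddct_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_mock, ct_ref_mock):
    """Relative expression by the 2^-ddCt method.

    Assumes 100% amplification efficiency (template doubles each cycle).
    """
    # Step 1: normalize the target gene to the reference gene in each sample.
    dct_treated = ct_target_treated - ct_ref_treated
    dct_mock = ct_target_mock - ct_ref_mock
    # Step 2: standardize the treated sample to the mock sample.
    ddct = dct_treated - dct_mock
    # Step 3: convert the cycle difference into a fold change.
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in the treated sample, while the reference gene is unchanged.
fold = ddct_fold_change(ct_target_treated=22.0, ct_ref_treated=10.0,
                        ct_target_mock=25.0, ct_ref_mock=10.0)
print(fold)  # 8.0
```

A value above 1 indicates induction relative to mock; a value of 1 indicates no change.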
these observations confirm that -tg activates all three arms of the upr, whereas the chemically similar thiopurine -mp does not. inhibition of the integrated stress response does not restore na processing and oligomerization in -tg treated cells iav glycoproteins are translocated into the er, where they are modified with n-linked glycans and organize into oligomeric complexes. upon synthesis in the er, the type ii transmembrane protein na is glycosylated and forms dimers linked by intermolecular disulfide bonds in the stalk region ( ) that then assemble into tetramers ( ). we investigated the effect of -tg on na processing and oligomerization using sds-page/immunoblotting procedures in the presence or absence of the disulfide bond reducing agent dithiothreitol (dtt). since na tetramers are known to dissociate into dimers during electrophoresis ( , ) we annotated the ~ kda band as dimers/tetramers (fig. a ). we observed intact glycosylated na dimers/tetramers and monomers in mock-treated iav-pr -infected cells, which were resolved into ~ kda glycosylated na monomers in the presence of dtt (fig. a ). unglycosylated na monomers were undetectable in mock-treated cells at steady state, confirming that n-glycosylation is a rapid initial step in na processing in the er. tm treatment eliminated na dimers/tetramers, leaving a minor fraction of unglycosylated na monomers. the -tg treatment diminished accumulation of all forms of na, yielding a distinct residual band that migrated closer to the size of the unglycosylated na monomers from tm-treated cells; this suggests that -tg treatment interferes with proper n-glycosylation of nascent na. treatment with integrated stress response inhibitor (isrib), which prevents isr-mediated translation arrest by maintaining eif b activity ( , ), rescued accumulation of na monomers in both tm-and -tg-treated cells. however, isrib was not able to restore na glycosylation and oligomerization. 
these data provide further evidence that -tg inhibits iav glycoprotein accumulation via upr/isr activation and extend our understanding by demonstrating that isr suppression does not fully reverse these effects. this is further supported by our observations that administration of isrib alone had no impact on iav replication, while co-administration of isrib with tm or -tg failed to restore virion production in single-cycle infection assays (fig. b). in vitro studies have shown that thiopurines -tg and -mp can reversibly inhibit sars-cov- and mers-cov papain-like cysteine proteases pl(pro) ( - ); however, whether these thiopurines could inhibit viral replication was not assessed. our observations of upr activation and selective inhibition of iav ha and na processing and accumulation by low micromolar doses of -tg and -tgo, but not -mp, suggest a distinct antiviral mechanism of action for these thiopurines. if true, the antiviral activity of -tg and -tgo may be broadly applicable to other viruses with envelope glycoproteins, such as coronaviruses. to test this directly, we performed hcov- with the strong effect on infectious virion production, we also observed significant, dose-dependent reductions in viral protein accumulation due to -tg and -tgo treatment in hcov-oc -infected hct- cells (fig. c ; the main band recognized by the anti-oc antibody is consistent with the size of the nucleocapsid n protein). treatment with a higher dose of -mp caused a detectable decline in n protein levels in hcov-oc -infected cells, but not to the levels observed with -tg or -tgo treatment (fig. c ). in hcov-oc -infected hct- cells, bip and chop expression was upregulated following thiopurine treatment (fig. c ). this is consistent with the significant induction of bip protein and chop mrna levels that was observed in thiopurine-treated a cells (fig. c ). 
to test the effects of -tg on cov replication, we analysed hcov- oc mrna synthesis by harvesting rna from infected cells treated with -tg or vehicle control. we observed that -tg treatment caused significant decreases in steady-state levels of (+) genomic rna (orf ab) as well as (+) subgenomic rna (sgrna) that encodes n (fig. d) . thus, despite the previously reported effects of -tg and -mp on hcov cysteine protease activity in vitro, we observed that -mp had only modest effects on hcov-oc replication whereas -tg and -tgo had clear antiviral effects similar to our previous observations of inhibition of iav replication. thiopurine antiviral activity in our hcov-oc infection assays correlated with upr activation and hampered viral genome synthesis and viral protein production. because -tg activates the upr/isr and inhibits the processing and accumulation of iav glycoproteins, we reasoned that coronavirus glycoproteins would be similarly affected by -tg treatment. due to the ongoing sars-cov- pandemic, numerous reagents and constructs have been rapidly developed to study this virus, including expression plasmids. we therefore sought to determine if sars-cov- s glycoprotein is sensitive to -tg in ectopic expression experiments. s is first translated as full-length s proprotein, before cleavage to s and s domains by cellular proprotein convertases like furin ( ). we observed that the s protein co-localised with the er marker calnexin when expressed alone or co-expressed with m protein (fig. a ). m also caused some of the s protein to accumulate in distinct regions of the cytoplasm proximal to, but not overlapping with, calnexin-stained er, which likely represents the ergic (fig. a) . ectopic expression of s led to accumulation of ~ kda full-length n-glycosylated s monomers; detection of ~ kda s ectodomains demonstrated efficient s n-glycosylation and trimerization in the er and transport to the golgi for furin cleavage (fig. b ). 
we observed that ectopic s expression was sufficient to activate the upr/isr, as indicated by accumulation of bip (fig. b) , consistent with previous reports of sars-cov- s ( , ). -tg causes a loss of membrane- bound s and s , but spared the cleaved s subunit (fig b) . co-expression of s with m altered s processing leading to different accumulation of s protein species, possibly due to altered s trafficking by m and retention at the ergic compartment ( ). pngase f treatment of the lysates to remove n-glycosylations confirmed that m and -tg altered glycosylation of s, but did not affect cleavage (fig. c) . treatment with either the chemical chaperone pba or dbeq, a selective chemical inhibitor of the p aaa-atpase, led to partial restoration of s and s (fig d ). together, these observations suggest that like iav glycoproteins, sars-cov- s glycoprotein is vulnerable to -tg mediated activation of the upr/isr and suggest a mechanism involving accelerated turnover of membrane-anchored s and s proteins by erad. discussion compared to current direct-acting antiviral drugs, effective host-targeting antivirals may provide a higher barrier to the emergence of antiviral drug-resistant viruses. however, it remains challenging to identify cellular pathways that can be targeted to disrupt viral replication without causing adverse effects on bystander uninfected cells. here, we report that two chemically similar fda-approved thiopurine analogues, -tg and -tgo, have broad antiviral effects that result from activation of upr and disruption of viral glycoprotein synthesis and maturation. importantly, our data demonstrate for the first time that -tg and -tgo are effective antivirals against influenza virus and coronavirus and may be effective against other glycoprotein-containing viruses. 
-tg is currently used in clinical settings to treat acute lymphoblastic leukemia and other hematologic malignancies, with the main mechanism of action involving conversion into thioguanine nucleotides and subsequent incorporation into cellular dna, which preferentially kills cycling cancer cells ( , ). furthermore, the active -tg metabolite, -thioguanosine ′- triphosphate, was shown to inhibit small gtpase rac ( ), which is believed to be largely with minimal effects on other viral proteins suggests that upr induction by -tg and -tgo is the main antiviral mechanism. indeed, chemical chaperones and the isr inhibitor isrib partially restored iav ha, na, and sars-cov- s protein accumulation in cells treated with -tg. inhibition of the erad pathway with dbeq also restored accumulation of er membrane- anchored subunits of sars-cov- s protein (uncleaved precursor s and cleaved s ) in -tg- treated cells. this indicates that the -tg-induced upr causes both the phospho-eif a dependent decrease in viral glycoprotein mrna translation and the erad-mediated degradation of newly synthesized er-anchored proteins. in the case of iav which replicates in the nucleus of infected cells, depletion of viral envelope glycoproteins blocks infectious virion production but minimally affects replication of viral nucleic acids. by contrast, synthesis of coronavirus genomic rna is inhibited by thiopurine-induced upr. this reduction could be due to inhibition of pl(pro) activity as previously suggested ( - ); however, we suggest that thiopurines likely inhibit viral replication due to its occurrence on the multivesicular network generated from virus-rearranged er membranes which may be highly sensitive to upr-induced alterations. despite multiple mechanisms deployed by iav to block sg formation, -tg and -tgo treatment induced sgs in infected cells, which allowed us to identify these molecules in our image- based screen. 
we previously reported that in a cells, iav inhibited sg formation triggered by treatment with thapsigargin, a potent inducer of er stress, with only % of infected cells forming sgs compared to % of mock-infected cells ( ). thus, induction of sgs in approximately % of infected cells by -tg and -tgo treatment appears consistent with our previous observations. however, unlike thapsigargin, -tg and -tgo did not trigger sg formation in uninfected cells. the levels of upr induction by these drugs were similar between infected and uninfected cells, highlighting that in iav-infected cells sg formation may not be triggered exclusively by er stress and perk activation and may only partially contribute to antiviral effects of thiopurines. indeed, sgs formed in a fraction of infected cells while accumulation of viral glycoproteins ha and na was nearly completely blocked by -tg and -tgo. consistent with previous reports, ectopic expression of sars-cov- s protein was enhanced during m co-expression and s alone was sufficient to trigger er stress. in this system, -tg treatment further potentiated upr responses, as measured by increased bip accumulation. our data also highlight the sensitivity of membrane anchored viral proteins to -tg treatment. it is currently unknown if host glycoproteins will be similarly affected by -tg treatment, but this is an important question to answer due to the prevalent use of thiopurines clinically. while we suspect that -tg and -tgo will be effective against a wide-range of enveloped viruses, our future studies will investigate if sars-cov- replication can be negatively impacted following what is the mechanism of upr induction by -tg and -tgo? 
our results suggest that the effects are unlikely to be mediated through dna or rna incorporation of -tg because 1) replicative stress does not specifically induce upr; 2) among viral proteins, glycoprotein accumulation and processing were preferentially disrupted; 3) messenger rna levels of ha and na were not affected. furthermore, the closely related thiopurine -mp, which can be converted into -thioguanosine triphosphate and incorporated into nucleic acids, did not induce upr and had no effect on iav glycoproteins or oc replication. another nucleoside analogue, -fu, which is also incorporated into nucleic acids and can even trigger sg formation upon prolonged -hour incubation ( ), was similarly inactive in our assays. the second previously described antiviral mechanism of action of -tg and -mp, which involves direct inhibition of viral cysteine proteases, is similarly unlikely to contribute substantially to the observed phenotypes because upr induction was triggered in both infected and uninfected cells and because, as mentioned above, -mp was not active in our assays. thus, by process of elimination, we speculate that the mechanism of upr induction by -tg and -tgo could involve gtpase inhibition. numerous gtpases regulate er homeostasis, including rab gtpases that govern vesicular trafficking events and dynamin-like gtpases that regulate homotypic er membrane fusion events required for the maintenance of branched tubular networks ( ). future studies will focus on identifying specific molecular targets of these upr-inducing thiopurines using orthogonal biochemical and genetic screens.

human lung adenocarcinoma a cells, human embryonic kidney (hek) t and a cells,

table . primer sequences for rt-qpcr analysis
primer sequences ( '- ')
s rrna

hpi, cells were treated with , , and µm doses of thiopurine analogs -thioguanine ( -tg) or -thioguanosine ( -tgo). at hpi, cells were fixed and stained with hoechst . automated image capture was performed using a cellomics arrayscan vti hcs reader. images were captured for each well and average punctate egfp-g bp intensity was calculated.
(c) a cells were infected with iav-ca/ at a moi of . at hpi, cells were treated with -tg or mock-treated. at hpi, cells were fixed and immunostained with antibodies directed to stress granule marker proteins g bp (red), pabp (green) and a polyclonal iav antibody (blue) that detects antigens from np, m , and ha, followed by staining with alexa-conjugated secondary antibodies.
(d) a cells were infected with iav-ca/ at a moi of . at hpi, cells were treated with -tg ( µm). at hpi, cells were fixed and immunostained with antibodies directed to stress granule marker proteins g bp (red), tiar (green) and eif a (green), followed by staining with alexa-conjugated secondary antibodies. images were captured on a zeiss axioimager z fluorescent microscope. representative images are shown. scale bars represent µm.

a cells were treated with -thioguanine ( -tg), -thioguanosine ( -tgo), -mercaptopurine ( -mp), -fluorouracil ( -fu) or ribavirin at the indicated concentrations for h (xbp s) or h (bip) prior to harvesting lysates for immunoblotting. µg/ml tunicamycin (tm) served as a positive control for upr activation, whereas dmso served as the mock treatment. membranes were probed with anti-bip and anti-xbp s antibodies to measure upr activation. -actin served as a loading control.
(b) a cells were mock-infected or infected with iav-pr at moi of . after h, cells were washed and incubated with µm -tg or vehicle control, with or without mm pba, a chemical chaperone. at hpi, cell lysates were harvested and probed with antibodies for the indicated target proteins. western blots are representative of independent experiments.
(c) a cells were infected with iav-pr at moi of , washed and overlaid with media containing -mp, -tg, or tm. cell lysates were collected at hpi and rna was isolated and processed for rt-qpcr. 
changes in chop, bip, edem , erdj , and herpud mrna levels were calculated by the ΔΔct method and normalized using s rrna as a reference gene and standardized to mock. error bars represent the standard deviation between biological replicates (n= ); circles represent biological replicates; lines represent the average value. statistical significance was calculated via a two-way anova followed by a dunnett multiple comparisons test.

the unfolded protein response: from stress pathway to homeostatic regulation
mechanistic insights into er-associated protein degradation
herpesviruses and the unfolded protein response
cellular proteostasis during influenza a virus infection-friend or foe? cells
structure of the haemagglutinin membrane glycoprotein of influenza virus at a resolution
expression of wild-type and mutant forms of influenza hemagglutinin: the role of folding in intracellular transport
folding, trimerization, and transport are sequential events in the biogenesis of influenza virus hemagglutinin
monoclonal antibodies localize events in the folding, assembly, and intracellular transport of the influenza virus hemagglutinin glycoprotein
role of conserved glycosylation sites in maturation and transport of influenza a virus hemagglutinin
membrane glycoprotein folding, oligomerization and intracellular transport: effects of dithiothreitol in living cells
n-linked glycans direct the cotranslational folding pathway of influenza hemagglutinin
synthesis and processing of the influenza virus neuraminidase, a type ii transmembrane glycoprotein
steps in maturation of influenza a virus neuraminidase
influenza virus m protein is an integral membrane protein expressed on the infected-cell surface
influenza virus m integral membrane protein is a homotetramer stabilized by formation of disulfide bonds
structural characteristics of the m protein of influenza a viruses: evidence that it forms a tetrameric channel
influenza a viral replication is blocked by inhibition of the inositol-requiring enzyme (ire ) stress pathway
acute lung injury results from innate sensing of viruses by an er stress pathway
innate sensing of influenza a virus hemagglutinin glycoproteins by the host endoplasmic reticulum (er) stress pathway triggers a potent antiviral response via er-associated protein degradation
upregulation of chop/gadd during coronavirus infectious bronchitis virus infection modulates apoptosis by restricting activation of the extracellular signal-regulated kinase pathway
the endoplasmic reticulum stress sensor ire α protects cells from apoptosis induced by the coronavirus infectious bronchitis virus
the coronavirus spike protein induces endoplasmic reticulum stress and upregulation of intracellular chemokine mrna concentrations
the perk arm of the unfolded protein response negatively regulates transmissible gastroenteritis virus replication by suppressing protein translation and promoting type i interferon production
a human coronavirus oc variant harboring persistence-associated mutations in the s glycoprotein differentially induces the unfolded protein response in human neurons as compared to wild-type virus
comparative host gene transcription by microarray analysis early after infection of the huh cell line by severe acute respiratory syndrome coronavirus and human coronavirus e
modulation of the unfolded protein response by the severe acute respiratory syndrome coronavirus spike protein
transcriptional profiling of vero e cells over-expressing sars-cov s subunit: insights on viral regulation of apoptosis and proliferation
coronavirus infection modulates the unfolded protein response and mediates sustained translational repression
post-translational modifications of coronavirus proteins: roles and function
assembly of coronavirus spike protein into trimers and its role in epitope expression
comparative analysis of the activation of unfolded protein response by spike proteins of severe acute respiratory syndrome coronavirus and human coronavirus hku
a sars-cov protein er stress and jnk-dependent apoptosis
coronavirus a protein causes endoplasmic reticulum stress and induces ligand-independent downregulation of the type interferon receptor
the ab protein of sars-cov is a luminal er membrane-associated protein and induces the activation of atf
sars-coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum
membrane topology of murine coronavirus replicase nonstructural protein
localization and membrane topology of coronavirus nonstructural protein : involvement of the early secretory pathway in replication
molinari m. . coronaviruses hijack the lc -i-positive edemosomes derived vesicles exporting short-lived erad regulators, for replication
translation inhibition and stress granules in the antiviral immune response
the mechanism of eukaryotic translation initiation and principles of its regulation
rna-binding proteins tiar link the phosphorylation of eif- alpha to the assembly of mammalian stress granules
complexes mediate stress granule condensation and associate with s subunits
influenza a virus inhibits cytoplasmic stress granule formation
influenza a virus host shutoff disables antiviral stress-induced translation arrest
eukaryotic translation initiation factor a inhibitors block influenza a virus replication
antioxidant antagonises chemotherapeutic drug effect in lung cancer cell line a
mammalian stress response: induction of the glucose-regulated protein family
clinical and experimental applications of sodium phenylbutyrate
three-dimensional structure of the neuraminidase of influenza virus a/tokyo/ / at . a resolution
antigenicity of the n influenza a virus neuraminidase: existence of an epitope at the subunit interface of the neuraminidase
pharmacological brake-release of mrna translation enhances cognitive memory
the small molecule isrib reverses the effects of eif α phosphorylation on translation and stress granule assembly
thiopurine analogues inhibit papain-like protease of severe acute respiratory syndrome coronavirus
thiopurine analogue inhibitors of severe acute respiratory syndrome-coronavirus papain-like protease, a deubiquitinating and deisgylating enzyme
thiopurine analogs and mycophenolic acid synergistically inhibit the papain-like protease of middle east respiratory syndrome coronavirus
cleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell-cell but not virus-cell fusion
the cytoplasmic tail of the severe acute respiratory syndrome coronavirus spike protein contains a novel endoplasmic reticulum retrieval signal that binds copi and promotes interaction with membrane protein
oxidation-mediated dna cross-linking contributes to the toxicity of -thioguanine in human cells
incorporation of -thioguanine into nucleic acids mediate immunosuppressive effects by interfering with rac protein function
drug insight: pharmacology and toxicity of thiopurine therapy in patients with ibd
-fluorouracil affects assembly of stress granules based on rna incorporation
fusion of the endoplasmic reticulum by membrane-bound
new low-viscosity overlay medium for viral plaque assays
mammalian stress granules and processing bodies

key: cord- -zcbyhsf authors: wilamowski, m.; sherrell, d.a.; minasov, g.; kim, y.; shuvalova, l.; lavens, a.; chard, r.; maltseva, n.; jedrzejczak, r.; rosas-lemus, m.; saint, n.; foster, i.t.; michalska, k.; satchell, k.j.f.; joachimiak, a title: methylation of rna cap in sars-cov- captured by serial crystallography date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: zcbyhsf

the genome of the sars-cov- coronavirus contains proteins, of which are nonstructural. nsp and nsp form a complex responsible for the capping of mrna at the ′ terminus. in the methylation reaction, s-adenosyl-l-methionine serves as the donor of the methyl group that is transferred to cap- at the first transcribed nucleotide to create cap- . the presence of cap- makes viral rnas mimic the host transcripts and prevents their degradation. to investigate the ′-o methyltransferase activity of sars-cov- nsp / , we applied fixed-target serial synchrotron crystallography (ssx), which allows data collection from thousands of crystals at physiological temperature, significantly reducing the x-ray dose while maintaining a biologically relevant temperature. we determined crystal structures of nsp / that revealed the states before and after the methylation reaction, for the first time illustrating coronavirus nsp / complexes with the m gpppam ′-o cap- , where ′oh of ribose is methylated. we compare these structures with structures of nsp / at k and k collected from a single crystal. these data provide important mechanistic insight and can be used to design small molecules that inhibit viral rna maturation, making sars-cov- sensitive to host innate response.

five coronaviruses can induce clusters of severe respiratory diseases in humans: e, oc , sars, mers and sars-cov- , . the outbreaks of severe acute respiratory syndrome (sars-cov) in and middle east respiratory syndrome (mers-cov) in showed high fatality rates of % and %, respectively, but had limited geographical spread , . in contrast, in the past months sars-cov- (the cause of covid- ) has spread rapidly around the world and infected millions of people, killing over half a million. although it has a significantly lower fatality rate (estimated at . - %) than mers and sars, the number of people infected is hundreds of times higher, and it has caused significant economic hardship and extraordinary social restrictions. in many countries, six months after the first reports of covid- , the number of cases is still increasing . in the absence of herd immunity, vaccines, or drugs against the virus, the situation is frightful. since the beginning of the worldwide outbreak of covid- , the scientific community has joined efforts to understand sars-cov- biology and to find chemical compounds that either block virus replication or enhance human immunological response. initial focus on repurposing existing drugs that potentially could provide immediate treatment has thus far resulted only in limited success. finding inhibitors of a biological cycle of sars-cov- or vaccines is, therefore, critical for global health, safety, and wellbeing. sars-cov- β-coronavirus has a large (~ kb) and complex ( proteins) (+) sense single-stranded rna genome . the rna present in the mature virion resembles human mrna: (i) it is capped on its ′-end, (ii) it contains a ′-poly-a tail, and (iii) after infection it can be directly translated to the two polyproteins pp a and pp ab using host machinery. these polyproteins are then matured into non-structural proteins (nsps) that assemble into a large replication-transcription complex. the rna is also used as a template for biosynthesis of (-) sense rna that serves to make additional copies of (+) sense rna and several sub-genomic rnas for the translation of four structural and - accessory proteins . for these rnas to serve as mrnas, they must undergo post-transcriptional modifications to resemble human mrna. 
the rna maturation involves several enzymatic steps performed by viral nsps; nsp is a bifunctional rna/ntp triphosphatase (tpase) and helicase; nsp is a bifunctional ′- ′ exonuclease and guanine n methyltransferase; and nsp is mg + dependent ribose ′-o methyltransferase and an elusive guanylyltransferase. in eukaryotes, the attachment of the m g cap at the ′-end of rna protects the transcript from mrna turnover in the pathway dependent on ′ → ′ exoribonucleases activity, such as xrns , and is required for several cellular processes: maturation of mrna, pre-mrna nuclear export, and protein synthesis . the cap at the ′ terminus of eukaryotic transcripts starts with methylguanosine at the reversed position and is connected through an uncommon ′ to ′ triphosphate bridge, which is attached to a nucleotide that undergoes methylation at ′-o-ribose. the canonical pathway of the mrna capping mechanism requires the subsequent action of multiple enzymes . at first, rna ′-triphosphatase (rtpase) cleaves the ′-terminal γ-β phosphoanhydride bond of the nascent mrna and enables further modification of the newly synthesized transcript at its ′-terminus. secondly, the diphosphate mrna is capped using gmp by guanylyltransferase (gtase). next, the (guanine-n )-methyltransferase adds the methyl group to the ′-terminal guanosine creating a precursor: cap- . additionally, the ′-o methyltransferase transfers the methyl group to the ′-o-ribose of the first transcribed nucleotide at mrna, making the m gpppnm ′-o (cap- ) structure . s-adenosyl-l-methionine (sam) is a donor of the methyl group and is converted to s-adenosyl-l-homocysteine (sah) during the reaction . in humans and other higher eukaryotes, the second nucleotide of mrna then undergoes methylation to produce cap- . the post-transcriptional modification of rna is also essential for efficient translation of viral transcript in a eukaryotic host . 
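The canonical capping pathway described above is an ordered sequence in which each enzyme acts on the product of the previous one. It can be sketched as a small state machine. This is an illustrative model only: the state labels use the standard cap nomenclature (m7G, cap-0, cap-1), and the names `CAPPING_STEPS` and `run_capping` are hypothetical, introduced here for the example.

```python
# Each step: (enzyme, substrate 5' end, product 5' end, consumes SAM?)
# Labels follow standard cap nomenclature; they are illustrative assumptions.
CAPPING_STEPS = [
    ("RTPase", "pppN-RNA", "ppN-RNA", False),          # removes the gamma phosphate
    ("GTase", "ppN-RNA", "GpppN-RNA", False),          # adds GMP via a 5'-5' bridge
    ("N7-MTase", "GpppN-RNA", "m7GpppN-RNA (cap-0)", True),
    ("2'-O-MTase", "m7GpppN-RNA (cap-0)", "m7GpppNm-RNA (cap-1)", True),
]

ALL_ENZYMES = {step[0] for step in CAPPING_STEPS}


def run_capping(start="pppN-RNA", enzymes_present=ALL_ENZYMES):
    """Apply the capping steps in order.

    Returns the final 5' end and the number of SAM molecules converted
    to SAH. Processing stops at the first step whose enzyme is absent.
    """
    state = start
    sam_used = 0
    for enzyme, substrate, product, uses_sam in CAPPING_STEPS:
        if state != substrate:
            continue  # this step does not act on the current 5' end
        if enzyme not in enzymes_present:
            break  # pathway stalls here
        state = product
        sam_used += int(uses_sam)
    return state, sam_used


print(run_capping())
# Without the 2'-O methyltransferase the transcript stalls at cap-0.
print(run_capping(enzymes_present=ALL_ENZYMES - {"2'-O-MTase"}))
```

In this toy model, removing the 2'-O methyltransferase leaves the transcript at cap-0 with one SAM consumed, which mirrors the rationale for targeting that activity.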
influenza, ebola, measles, poxvirus, and coronaviruses attach cap to their genomic and sub-genomic rna transcripts to mimic host molecules and escape from the innate immunological response . studies done in the last decade show that the ′-o methylation of the cap- facilitates this process by preventing activation of type i interferon that is induced by the cytoplasmic rna sensors, melanoma differentiation-associated protein (mda ), and retinoic acid inducible gene-i (rig- ) [ ] [ ] [ ] they all utilize a lys-asp-lys-glu catalytic tetrad essential for the enzymatic activity , . it has been shown that the nsp / complex methylates the cap- ( m gpppa ′-oh) to form cap- ( m gpppam ′-o) by adding a methyl group to the ribose ′-o of the first nucleotide (usually adenosine, in covs) of the nascent mrna using sam as the methyl group donor . the nsp activity is necessary for nsps' translation from new copies of viral (+) sense rna as well as structural and accessory proteins translation from sub-genomic rna which are all transcribed from the viral (-) sense rna template. vaccination with nsp defective sars-cov or an immunogenic disruption of the nsp / interface protects mice from a lethal sars challenge . therefore, blocking nsp activity should reduce viral proliferation, making the protein an attractive drug target. while several structures of nsp / in a complex with cap- have been determined by crystallography (pdb entries: wq , wrz, wvn, wks), none have captured the postmethylation state. we conducted serial synchrotron crystallography (ssx) experiments at k to test whether low radiation dose could help uncover the structure of nsp / in a complex with cap- . previously, x-ray free-electron laser (xfel) serial crystallography (sfx) had established methods to study molecular dynamics and chemical reactions in protein crystals occurring at femtosecond to millisecond timescales [ ] [ ] [ ] . 
the xfel-based protocol made it possible to analyze protein structures with no radiation damage, in the so-called "diffraction before destruction" mode. sfx has been successfully applied to determine the structure of the membrane protein complex of photosystem i and of several other macromolecules using a time-resolved approach. the synchrotron equivalent, ssx, uses a similar approach to sfx but can access longer, biologically relevant time scales. moreover, ssx is a much more accessible technique at a number of light sources and consumes less sample than sfx. nonetheless, delivering hundreds of thousands of batch-grown crystals to the x-ray beam still presents challenges. two approaches are common: the first uses a liquid injector of microcrystals, and the second uses a fixed-target system where crystals are deposited on a chip and scanned through the x-ray beam. here we present three crystal structures of the nsp / complex determined by fixed-target ssx at k: in the presence of sam, with cap- /sam, and with cap- /sah generated by metal-dependent conversion of cap- /sam substrates. we compare these structures with structures of nsp / at k and k collected from a single crystal. we observe the state of the molecules in the crystal before and after methylation. the uniqueness of the cap- structure shows the advantages of the ssx method in structural studies: allowing the use of larger crystals, reducing radiation damage, and being closer to physiological temperatures. we collected ssx data for the nsp / crystals using the fixed-target ssx system (advanced lightweight encapsulation crystallography (alex) mesh-holder) as depicted in figure a. for initial data processing, we used the kanzus automated data pipeline. kanzus integrates the -id data collection system with the argonne leadership computing facility (alcf), using the theta supercomputer for high-speed on-demand data analysis.
extended figure a presents an image with labeled indexed reflections and shows that most observed reflections of nsp / crystals were indexed. prime identified inadequate low-resolution statistics from the integration files, so we subjected them to outlier rejection. integration files with resolution lower than . Å, containing fewer than reflections, or with a unit cell volume . % different from the average were removed from the scaling process (extended data tab. , fig. c, d ). the total number of integrated files that went into prime for nsp / bound with sam, cap- /sah, and cap- /sam were , ; , and , respectively. the overall hit-rate (images used for structure divided by images taken) for the five meshes was %. some positions on the mesh had multiple crystals: % of 'hits' had two lattices, and fewer than % had three (extended data tab. , fig. b ). for structure determination we used all indexed lattices in each diffraction image. two structures from ssx were solved at . Å (nsp / /sam and nsp / /cap- /sah), and a third at . Å (nsp / /cap- /sam) (extended data tab. ). the highest resolution shell in prime analysis was specified using a cc / cut-off below . . completeness for the ssx data was close to % and rwork was . %, . %, and . % for nsp / /sam, nsp / /cap- /sah, and nsp / /cap- /sam, respectively (extended data tab. ). for comparison with the ssx data, we also collected data from a single crystal of nsp / with cap- /sam and cap- /sah in a capillary at k. these data could only be processed to . Å with reasonable statistics (extended data tab. , ). using raddose- d, we estimated the accumulated x-ray dose for nsp / from ssx to be . mgy, whereas the dose for the capillary-mounted crystal was . mgy, over times higher. the nsp / complex is an a/b heterodimer (fig. ). the amino acid long nsp has a unique fold formed by α-helices and a pair of antiparallel β-strands that face nsp in the complex.
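the outlier rejection and hit-rate definitions above can be illustrated with a minimal python sketch. it is not the prime implementation: the record type, function names, and all threshold values are hypothetical placeholders (the actual cut-offs are not recoverable from this text), and the average unit cell volume is assumed to be taken over all integration files before any rejection.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IntegrationFile:
    """Hypothetical per-image summary; field names are illustrative only."""
    name: str
    resolution: float     # high-resolution limit in angstrom (larger = lower resolution)
    n_reflections: int    # number of integrated reflections
    cell_volume: float    # unit cell volume in cubic angstrom


def reject_outliers(files, max_resolution, min_reflections, max_volume_dev_pct):
    """Keep files that pass all three criteria described in the text.

    Assumption: the reference volume is the mean over *all* input files.
    """
    avg_volume = mean(f.cell_volume for f in files)
    kept = []
    for f in files:
        deviation_pct = abs(f.cell_volume - avg_volume) / avg_volume * 100.0
        if f.resolution > max_resolution:       # resolution too low
            continue
        if f.n_reflections < min_reflections:   # too few reflections
            continue
        if deviation_pct > max_volume_dev_pct:  # unit cell volume outlier
            continue
        kept.append(f)
    return kept


def hit_rate(images_used, images_taken):
    """Hit-rate as defined in the text: images used divided by images taken."""
    return images_used / images_taken


if __name__ == "__main__":
    demo = [
        IntegrationFile("a", 2.0, 500, 1000.0),
        IntegrationFile("b", 4.5, 500, 1000.0),   # rejected: low resolution
        IntegrationFile("c", 2.1, 10, 1000.0),    # rejected: few reflections
        IntegrationFile("d", 2.2, 400, 1200.0),   # rejected: volume outlier
        IntegrationFile("e", 2.3, 450, 1010.0),
    ]
    # placeholder thresholds, chosen only for this demonstration
    kept = reject_outliers(demo, max_resolution=4.0,
                           min_reflections=100, max_volume_dev_pct=10.0)
    print([f.name for f in kept], hit_rate(len(kept), len(demo)))
```

the three criteria are applied independently, so a file is dropped as soon as it fails any one of them.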
nsp possesses two zinc ions coordinated by the cchc and cccc motifs that are % conserved in β-coronaviruses. nsp is involved in forming complexes with multiple sars-cov- and - proteins. the best characterized is the complex with nsp ′- ′ exoribonuclease (and guanine n methyltransferase) and nsp that is ′-o-ribose methylase. previous research showed that nsp is only active in the presence of nsp . interestingly, the activity of ′-o mtase of the nsp / complex was reduced by using a peptide compound based on the nsp sequence. nsp is a molecular co-factor for various sars-cov- enzymes and therefore a good candidate for the design of molecules that will affect its structure or disrupt the interaction with nsp or nsp ′-o mtases. the structure of nsp is characterized by a rossmann fold, with a large β-sheet surrounded by α-helices, β-strands and loops. nsp contains α-helices, β-strands and loops. nsp has a centrally positioned β-sheet (β ↑,β ↑,β ↑,β ↑,β ↑,β ↓,β ↑) with only one antiparallel strand β . the sam and the cap binding sites are located at the surface of nsp . none of the nsp residues have a direct connection with the ligands' binding sites ( fig. a, c ), but nsp is required for nsp mtase activity. additionally, we observed partial electron density for the -methyl-guanosine- ′-triphosphate ( m gppp) of the cap in the possible allosteric site of the we also compared the two structures of nsp / /sam determined at k (this work, pdb entry jib) and k (pdb entry w h) (cα rmsd: . Å) and observed significant differences: one loop from the cap binding pocket is shifted approximately . Å in k structures, which makes the cap binding site more accessible (extended data fig. d). therefore, the k structures of nsp / in a complex with sam, cap- /sam, and cap- /sah, as determined by ssx, depict states important for molecular modeling studies and structure-based drug design.
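the cα rmsd used above to compare two structures is the root-mean-square distance between equivalent cα atoms. the python sketch below shows that formula only. it assumes the two coordinate lists are already superposed and matched atom by atom; the superposition step itself (for example a least-squares fit) is not shown, and the function name is illustrative rather than taken from any package used in this work.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in angstrom between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that element i of
    each list refers to the same residue's C-alpha atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # mean of squared per-atom distances, then square root
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # toy coordinates: every atom displaced by 1 angstrom along z
    model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
    print(ca_rmsd(model_a, model_b))
```

because the value is an average over all matched atoms, a small global rmsd can still hide a large local shift such as the loop movement described above.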
to better understand nsp activity from sars-cov- , we compared it with ns ′-o mtase from dengue virus (pdb entry dto) (fig. ). we aligned the structure of nsp with the n-terminal domain of ns that spans residues from the n-terminus. both nsp and ns ′-o mtases assume a rossmann fold with a large β-sheet decorated with α-helices (fig. b, d). mtases contain a central β-sheet that is composed of six parallel strands and one antiparallel strand. despite low sequence identity ( . %), the rmsd between the two mtases is reasonable ( . Å). both enzymes possess a canonical ′-o mtase catalytic tetrad lys-asp-lys-glu with the aspartic acid residue forming a hydrogen bond with a water molecule potentially relevant for catalysis ( fig. a, c). the sam binding sites share high similarity but there are major differences in the mrna cap binding sites. the mg + or mn + ion is necessary for nsp activity; however, it is not localized near the active site in the currently available structures of nsp / from sars-cov- and sars-cov- . the mg + ion in the n-terminal domain of ns ′-o mtase is coordinated by the phosphate oxygens of the cap ′ to ′ triphosphate linker and three water molecules that bridge to three bases (a -g -u ) on the ′-end. the sars-cov- nsp / ′-o mtase complex provides a molecular arrangement for binding of the mrna cap- and subsequent methylation of the first transcribed nucleotide. the catalytic core is in the center of the rossmann fold and it binds the sam molecule in the deep, narrow groove that is buried inside the nsp active site. we observed that nsp had recruited sam during expression in e. coli (pdb entry w ). however, addition of mm sam to the crystallization solution improves crystallization efficiency, reduces crystallization time, and increases the number of crystals, three highly desirable traits for ssx batch crystallization.
the sam binding site is negatively charged and is formed by several nsp residues: asn , tyr , gly , ala , gly , ser , gly , thr , asp , leu , asn , asp , cys , asp , met , and phe (fig. , , extended data fig. b ). the sam carboxylate moiety binds to the positively charged n-terminus of the third α-helix spanning pro -trp . this interaction provides additional electrostatic stabilization. nsp directly interacts with cap- through several residues that form a positively charged, elongated binding groove accommodating the mrna cap- (fig. , , extended data a). the n -methyl guanosine binding pocket is formed by cys , asp , leu , tyr , thr , glu , and ser . the ′ to ′ triphosphate bridge of the cap is stabilized through interaction with tyr , lys , thr , his , ser , ser , and glu . the first nucleotide of the mrna cap (adenosine in the presented structure) is bound through lys , asp , tyr , pro , lys , asn , and glu . sam-dependent mtases share a conserved catalytic mechanism wherein the methyl group is transferred to the acceptor substrate via an sn reaction, which requires a linear alignment of the acceptor substrate (nucleophile), methyl group (electrophile), and the sulfur atom (leaving group) of the sah product. it was proposed that nsp ′-o mtases follow the same general mechanism. in these enzymes, the reaction is facilitated by the catalytic tetrad lys-asp-lys-glu, where lys sandwiched between the two acidic residues (asp and glu ) serves as a proton abstractor (fig. a, b). it was shown previously in biochemical studies that substitution of any residue of the ′-o mtase catalytic tetrad with ala results in an inactive enzyme. in sars-cov- nsp the catalytic tetrad lys -asp -lys -glu superposes well with lys -d -lys -glu of the dengue virus homolog (fig. a, c). in the d context, lys binds to glu which then binds to lys interacting with asp , which further links to the amino group of sam.
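the linear alignment requirement mentioned above (nucleophile, methyl carbon, and sulfur leaving group in line) can be checked geometrically by measuring the angle at the methyl carbon. the python sketch below is an illustration under stated assumptions: the coordinates are invented, the function names are hypothetical, and the angular tolerance is an arbitrary placeholder, not a value from this work.

```python
import math


def angle_deg(a, b, c):
    """Angle in degrees at vertex b for the points a-b-c (each an (x, y, z) tuple)."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    # clamp to [-1, 1] to guard against floating-point round-off
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cosine))


def is_inline(acceptor_o, methyl_c, sulfur, tolerance_deg=20.0):
    """True if the O...C-S angle is within tolerance of 180 degrees.

    tolerance_deg is a hypothetical cut-off used only for illustration.
    """
    return 180.0 - angle_deg(acceptor_o, methyl_c, sulfur) <= tolerance_deg


if __name__ == "__main__":
    # invented coordinates: acceptor oxygen, methyl carbon, sulfur on one line
    print(angle_deg((-3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.8, 0.0, 0.0)))
    print(is_inline((-3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.8, 0.0, 0.0)))
```

in a refined model, the same measurement could be applied to the ribose acceptor oxygen, the sam methyl carbon, and the sam sulfur atom to judge how close a captured state is to an in-line geometry.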
in this network, a proton can be transferred between multiple residues, occupying different sites potentially depending on the reaction state (fig. ). lys is well-positioned to act as a general base deprotonating ′-oh. asp may serve multiple functions: as an acid deprotonating lys (alternating with glu ), as an anchoring point for the cofactor via interaction with its amino group, and as a stabilizer of the sulfonium cation. however, the reaction does not occur in the presence of edta, as demonstrated by the ability to capture the cap- /sam complex. it was reported previously using biochemical assays that the activity of nsp / ′-o mtase is magnesium dependent. however, we did not observe any trace of metal near the active site in the k structures reported in this work, or in any other cov- nsp / structures reported to date, despite the presence of magnesium in the crystallization buffer. we hypothesize that the magnesium ion (or other metal ion) can transiently bind to the active site, possibly replacing one of the water molecules (for example water ) and promoting formation of a reactive conformation by changing the electrostatics and geometry of the catalytic residues to stimulate the methylation reaction. magnesium has a compact and tight coordination sphere with strict octahedral geometry and a typically short mg-o distance of . Å. by coordinating the ′ oxygen and several water molecules, magnesium could shorten the distance between the ′ oxygen and the sam methyl moiety, thus promoting formation of the transition state and methyl transfer. during the reaction a positively charged, sp planar transition state is formed and the methyl group inverts its stereochemistry. after the methyl-transfer reaction is completed the product is released from the active site. this is consistent with the structure of cap- /sah, where several active site residues move; these include tyr , the entire α-helix spanning pro -lys , and tyr on the opposite side ( fig. a, extended data fig. c ).
all these residues are involved in interactions with cap- and the observed changes may allow cap- to leave. interestingly, the sam/sah binding site remains virtually unchanged in all three structures (extended data fig. ), suggesting that the sah exchange with sam may require dissociation of nsp that controls the conformation of the important loop gly -gly . opening this loop may help sah to leave and then a new sam molecule can bind. biochemical assays show that the optimal ph for mers-cov nsp / activity is approximately - . . additionally, the ′-o mtase activity of nsp / from mers-cov is significantly reduced at ph below . assuming that nsp / ′-o mtase efficiency is ph dependent, we performed the k ssx data collection using a buffer with ph . that could potentially slow down the reaction. comparison of the k and the k structures revealed clear methylation only at k; we hypothesize that cap- could be sensitive to radiation damage and thus can only be observed in low x-ray dose experiments. interestingly, the k structure of nsp / obtained from a single crystal has a mixture of states with % of the cap- /sah products and % of the cap- /sam substrates. it is well known that radiation damage can impact redox systems. high x-ray dose could affect electron density around amino acid residues and lead to photoreduction of metalloproteins. these free radical reactions can occur in crystals under cryogenic conditions. enzymes that catalyze transmethylation reactions using sam as the methyl group donor have been described in many cellular processes involving nucleic acids, proteins, phospholipids, and this clearly needs further investigation because ssx low radiation dose may be advantageous in helping reveal subtle and diverse chemical transformations in enzymes that otherwise may be degraded during x-ray or electron diffraction experiments.
therefore, the structure of nsp / with cap- presented here provides unique information for understanding the mechanism that allows sars-cov- to mimic mature eukaryotic mrna and escape recognition by the human innate immune response. recent developments in microfocusing x-ray beams at synchrotron light sources and improvements in sample delivery technology, data collection, detectors, and computing will allow rapid determination of new structures using ssx and reveal significant biological information. the further development of ssx and implementation of time-resolved ssx crystallography is an approach that could visualize chemical processes and protein molecular dynamics, such as the transfer of the methyl group catalyzed by nsp / ′-o mtase from sars-cov- . thus far, cap- has only been observed in the structure obtained using ssx. for diffraction experiments conducted at k, ssx provides significant advantages over data collected using a single crystal, as it considerably increases resolution, uses less sample in comparison with a liquid jet delivery system, and reduces levels of x-ray dose. studies of other enzymes can also significantly benefit from using this approach. the recombinant proteins nsp and nsp from sars-cov- were expressed in e. coli and tube. these volumes allowed us to get the number of crystals suitable for data collection from one chip. we prepared several batches using the same conditions to get reproducible data collection and have the ability to merge data from multiple chips. the crystals grew to optimum sizes in days at °c before the ssx data collection. to obtain the structure of nsp / /cap- /sam, the crystal batch was supplemented with mm edta two days before data collection. the crystals from batch crystallization were centrifuged at rcf for minutes at k. the excess solution was removed with a pipette, until µl was left.
crystals sedimented on the walls of the eppendorf polypropylene tube were gently resuspended using µl pipette tips with the end cut to widen the tip and minimize mechanical damage to crystals. then . µl of mm stock of the m gpppa (s l, new england biolabs) was added to the crystal slurry and the mix was loaded on a µm grid made from nylon (ny , millipore) which was placed on a µm layer of mylar polyester film. the nylon mesh was covered on the top with a second layer of µm mylar and sealed with the alex magnetic holder, so that the sample was hermetically sealed from the outside environment. data collection was started approximately minutes after assembly of the chip. we collected ssx data for the nsp / crystals at the -id beamline at the advanced photon source using the fixed-target ssx (alex) mesh-holder developed at the structural biology center (sbc) as depicted in figure a and extended data fig. . a crystal slurry is deposited on a nylon mesh, which immobilizes the crystals; they are then encapsulated between two polyester films. the rod-shaped crystals grew without seeding and the average crystal size was x x µm (extended data fig. a). serial data were collected using three smaract slc- stages configured in an xyz geometry, each having sufficient movement range to cover the sample area of the specially designed alex holder (patent application serial # / , ). the beamline was configured at an energy of , ev, with collimator slit sizes set to x µm and a step size (distance between exposures) of µm, so that exposed areas overlapped to maximize crystal hits. the five mesh-covered samples used a grid of steps in the x-direction (columns) by steps in the y-direction (rows), covering a total area of approximately . x . mm. the number of steps and the resulting area varied slightly per sample depending on the chip mount position or possible false starts.
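the fixed-target raster described above (a grid of columns by rows visited at a constant step, with the beam larger than the step so that exposures overlap) can be sketched in python as follows. this is a minimal illustration, not the beamline control code: the serpentine visiting order is an assumption, the function names are hypothetical, and the numbers in the demonstration are placeholders because the actual grid and step values are not recoverable from this text.

```python
def raster_positions(n_cols, n_rows, step_um):
    """Yield (x, y) stage offsets in micrometres for a serpentine raster.

    Assumption: even rows run left to right and odd rows right to left,
    which avoids a long fly-back move between rows.
    """
    for row in range(n_rows):
        cols = range(n_cols) if row % 2 == 0 else reversed(range(n_cols))
        for col in cols:
            yield (col * step_um, row * step_um)


def scanned_area_mm(n_cols, n_rows, step_um):
    """Approximate (width, height) of the scanned region in millimetres."""
    return (n_cols * step_um / 1000.0, n_rows * step_um / 1000.0)


def exposures_overlap(beam_um, step_um):
    """True when the beam footprint is wider than the step between exposures."""
    return beam_um > step_um


if __name__ == "__main__":
    # placeholder values for illustration only
    print(list(raster_positions(3, 2, 50.0)))
    print(scanned_area_mm(100, 200, 50.0))
    print(exposures_overlap(beam_um=60.0, step_um=50.0))
```

the total number of exposures per chip is simply the product of columns and rows, which is why small changes in the grid dimensions change the covered area only slightly.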
extended data table contains details of the number of chips and detector distances used for data collection for nsp / ssx structures. crispy, the data acquisition gui for serial collection at sector , allows for quick alignment and acts as a source of information for downstream processing. metadata in the form of json files and beamline/collection strategy parameters are input and passed into the system before collection. these parameters include grid dimensions, detector distance/resolution, unit cell dimension, protein pdb coordinates, and a handful of others. the kanzus pipeline orchestrates ssx data acquisition, analysis, and cataloging, and publishes processing metrics. kanzus uses a cloud-hosted research automation system called globus automate to manage these multi-step data "flows". the first phase of the pipeline is integrated with the aps data management system at the beamline, which deposits each newly acquired image into a globus-accessible storage system at the aps. as new images are acquired, globus automate "flows" are launched to process them as follows: 1) move new files from aps to theta by using the globus transfer service; 2) perform dials stills_process on batches of images by using funcx, a function-as-a-service computation system (funcx uses parsl to abstract and acquire nodes on theta as needed, and dispatches tasks to available nodes); 3) extract metadata from files regarding identified diffractions, and generate visualizations (funcx) showing the locations of positive hits on the mesh; and 4) publish raw data, metadata, and visualizations to a portal on the alcf petrel data system. the result of this automated process is an indexed, searchable data collection that provides full traceability from data acquisition to processed data, and that can be used to inspect and update the running experiment. images were collected at hz, meaning that a -image batch totaling . gb was generated every seconds.
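the four-step flow above (transfer, stills processing, metadata extraction, publication) can be pictured as a chain of functions applied to each image batch. the python sketch below is a schematic stand-in only: it does not call globus, funcx, parsl, or dials, every function name is hypothetical, and the hit test is a placeholder that merely inspects a file-name suffix. the last helper shows the batch interval arithmetic (batch size divided by frame rate) with invented numbers.

```python
def transfer(state):
    """Placeholder for step 1: move the batch to the compute facility."""
    state["location"] = "compute"
    return state


def process_stills(state):
    """Placeholder for step 2: pretend images ending in '_hit' were indexed."""
    state["hits"] = [img for img in state["images"] if img.endswith("_hit")]
    return state


def extract_metadata(state):
    """Placeholder for step 3: summarize the processing results."""
    state["metadata"] = {"n_images": len(state["images"]),
                         "n_hits": len(state["hits"])}
    return state


def publish(state):
    """Placeholder for step 4: mark the batch as published to a portal."""
    state["published"] = True
    return state


# ordered list of (step name, function); the order mirrors the text
STEPS = [("transfer", transfer), ("process", process_stills),
         ("extract_metadata", extract_metadata), ("publish", publish)]


def run_flow(batch, steps=STEPS):
    """Run every step in order on one batch and record which steps ran."""
    state = {"images": list(batch), "log": []}
    for name, step in steps:
        state = step(state)
        state["log"].append(name)
    return state


def batch_interval_s(batch_size, rate_hz):
    """Seconds needed to acquire one batch at a given frame rate."""
    return batch_size / rate_hz


if __name__ == "__main__":
    result = run_flow(["img001_hit", "img002", "img003_hit"])
    print(result["log"], result["metadata"])
    print(batch_interval_s(256, 10.0))  # invented numbers
```

keeping each step as a separate function with a shared state record is what makes the chain traceable: the log shows exactly which steps a given batch has passed through.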
the data transfers to alcf ran at up to mb/s via globus, and theta nodes processed images by using dials stills_process at images per second in steadystate. as experimental configuration values were refined, reprocessing tasks were submitted as required. funcx managed these tasks by expanding the number of theta nodes being used to a maximum of , which enabled a processing rate of greater than images a second. successfully processed images with diffraction-produced integration files were returned to the beamline computers and later refined and merged using prime . the crystal structures of the nsp / complex were solved by molecular replacement using lavens a., and chard r. performed serial crystallography data collection contributed to the distributed processing pipeline saint n., and chard r analyzed serial crystallography data sars -beginning to understand a new virus clinical features of patients infected with novel coronavirus in wuhan who | middle east respiratory syndrome coronavirus (mers-cov). 
who covid- map -johns hopkins coronavirus resource center genome composition and divergence of the novel coronavirus ( -ncov) originating in china xrn '→ ' exoribonucleases: structure, mechanisms and functions mrna capping: biological functions and applications the viral rna capping machinery as a target for antiviral drugs in vitro reconstitution of sars-coronavirus mrna cap methylation ′-o-ribose methylation of cap in human: function and evolution in a horizontally mobile family influence of '-terminal cap structure on the initiation of translation of vaccinia virus mrna innate immune restriction and antagonism of viral rna lacking '-o methylation ribose '-o-methylation provides a molecular signature for the distinction of self and non-self mrna dependent on the rna sensor mda coronavirus non-structural protein : evasion, attenuation, and possible treatments a conserved histidine in the rna sensor rig-i controls immune tolerance to n - 'o-methylated self rna ′-o methylation of the viral mrna cap evades host restriction by ifit family members rna methyltransferases involved in ′ cap biosynthesis coronavirus nonstructural protein is a cap- binding enzyme possessing (nucleoside- ′o)-methyltransferase activity the crystal structure of nsp -nsp heterodimer from sars-cov- in complex with s-adenosylmethionine coronavirus nsp , a critical co-factor for activation of multiple replicative enzymes in silico identification, structure prediction and phylogenetic analysis of the -o-ribose (cap ) methyltransferase domain in the large structural protein of ssrna negative-strand viruses molecular phylogenetics of the rrmj/fibrillarin superfamily of ribose ′-o-methyltransferases biochemical and structural insights into the mechanisms of sars coronavirus rna ribose ′-o-methylation by nsp /nsp protein complex attenuation and restoration of severe acute respiratory syndrome coronavirus mutant lacking '-o-methyltransferase activity direct observation of ultrafast collective motions in 
co myoglobin upon ligand dissociation. science ( -. ) enzyme intermediates captured 'on the fly' by mix-and-inject serial crystallography proton uptake mechanism in bacteriorhodopsin captured by serial synchrotron crystallography. science ( -. ) femtosecond x-ray protein nanocrystallography high-resolution protein structure determination by serial femtosecond crystallography. science ( -. ) serial millisecond crystallography for routine room-temperature structure determination at synchrotrons liquid sample delivery techniques for serial femtosecond crystallography high-viscosity injector-based pink-beam serial crystallography of microcrystals at a synchrotron radiation source a modular and compact portable mini-endstation for high-precision, high-speed fixed target serial crystallography at fel and synchrotron sources pink-beam serial crystallography nylon mesh-based sample holder for fixed-target serial femtosecond crystallography raddose- d: time-and space-resolved modelling of dose in macromolecular crystallography coronavirus nsp /nsp methyltransferase can be targeted by nsp -derived peptide in vitro and in vivo to reduce replication and pathogenesis binding of the methyl donor s-adenosyl-l-methionine to middle east respiratory syndrome coronavirus ′-o-methyltransferase nsp promotes recruitment of the allosteric activator nsp crystal structure of guanidinoacetate methyltransferase from rat liver: a model structure of protein arginine methyltransferase mediation of donor-acceptor distance in an enzymatic methyl transfer reaction crystal structure and functional analysis of the sars-coronavirus rna cap ′-o-methyltransferase nsp /nsp complex the methyltransferase domain of the sudan ebolavirus l protein specifically targets internal adenosines of rna substrates, in addition to the cap structure checkmymetal: a macromolecular metal-binding validation tool dose-resolved serial synchrotron and xfel structures of radiationsensitive metalloproteins redox-coupled proton 
transfer mechanism in nitrite reductase revealed by femtosecond crystallography the catalytic pathway of cytochrome p cam at atomic resolution. science ( -. ) x-ray-radiation-induced cooperative atomic movements in highly versatile enzymes in biocatalysis structural basis for m g recognition and ′-o-methyl discrimination in capped rnas by the innate immune receptor rig-i the role of the cap structure in rna processing and nuclear export crystallization and diffraction analysis of the sars coronavirus nsp -nsp complex structural basis of rna cap modification by sars-cov- coronavirus globus platform-as-a-service for collaborative science applications efficient and secure transfer, synchronization, and sharing of big data electron diffraction data processing with dials funcx: a federated function serving fabric for science parsl: pervasive parallel programming in python petrel: a programmatically accessible research data service enabling x-ray free electron laser crystallography for challenging biological systems from a limited number of crystals model preparation in molrep and examples of model improvement using x-ray data refmac for the refinement of macromolecular crystal structures coot: model-building tools for molecular graphics ucsf chimera -a visualization system for exploratory research and analysis crystallographic data processing for free-electron laser sources global indicators of x-ray data quality linking crystallographic model and data quality. science ( -. ) molprobity: structure validation and all-atom contact analysis for nucleic acids and their complexes key: cord- -jmav xi authors: bridgland, victoria m. e.; moeck, ella k.; green, deanne m.; swain, taylor l.; nayda, diane; matson, lucy a.; hutchison, nadine p.; takarangi, melanie k.t. title: why the covid- pandemic is a traumatic stressor date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: jmav xi the covid- pandemic does not fit into prevailing post-traumatic stress disorder (ptsd) models, or diagnostic criteria, yet emerging research shows traumatic stress symptoms as a result of this ongoing global stressor. current pathogenic event models focus on past, and largely direct, trauma exposure to certain kinds of life-threatening events. nevertheless, among a sample of online participants (n = , ) in five western countries, we found participants had ptsd-like symptoms for events that had not happened and when participants had been directly (e.g., contact with virus) or indirectly exposed to covid- (e.g., via media). moreover, . % of our sample were likely ptsd-positive, despite types of covid- “exposure” (e.g., lockdown) not fitting dsm- criteria. the emotional impact of “worst” experienced/anticipated events best predicted ptsd-like symptoms. our findings add to existing literature supporting a pathogenic event memory model of traumatic stress. posttraumatic stress disorder checklist- (pcl- ( )), adapted to measure pre/peri/post- traumatic reactions, and measures of general emotional reactions, well-being, psychosocial functioning, and depression, anxiety, and stress symptoms. importantly, although emerging research on covid- and traumatic stress reactions has typically not specified whether participants anchored their reactions to covid- itself (e.g., ( ) ), we asked our participants to respond to the pcl- in relation to covid- . these populations (e.g., non-naivete, worker inattention, fraudulent responses and worker treatment). to minimize "bots"/server farmers completing the survey ( , ), participants had to pass a captcha, a simple arithmetic question (presented as an image to make it difficult for bots to read), and score at least / on an english proficiency test. 
we are confident that these entry requirements screened out almost all bots/server farmers; in one estimate, the addition of an english proficiency test screened out % of bots/server farmers ( ). in addition to these entry requirements, participants had to pass at least one caucasian . %. others were of asian ( . %); african (including "black", . %); middle eastern (including "eurasian", . %); european ( . %); and hispanic ( . %) descent, or indigenous ( . %); pacific islander ( . %); mixed ( . %) ethnicity. some participants provided nationality (e.g., "australian" . %) or no answer ( . % four additional--and modify --categories. we recategorized seven "other" responses into new categories and into modified/existing categories. we then re-presented the same list of events, but asked participants to select events they were concerned about happening in the future ("other" events led to three additional categories [seven responses the ptsd checklist (pcl- ( )). participants rated how much they have been bothered by dsm- ptsd symptoms (e.g., "having difficulty concentrating"; = not at all, = extremely; current study α = . ). we made three modifications: measured symptoms in relation to covid- experiences, over the past week (rather than month) due to the rapidly changing circumstances, and asked participants to indicate if each symptom (rated > ) related to something that happened in the past, was currently happening, or may happen in the future. the -item world health organization well-being index (who- ( )). participants rated how five statements (e.g., "i have felt calm and relaxed") applied to them depression, anxiety and stress scale (dass- ( )). participants rated the degree to which each statement (e.g., "i felt down-hearted and blue") applied to them over the past week ( = did not apply to me at all, = applied to me very much). current study: depression, α = . ; anxiety, α = . ; stress, α = . . 
for context, over our days of data collection, confirmed cases worldwide increased from ~ . to ~ . million (deaths from ~ , to ~ , ). in the us, total cases jumped from , to over , , president trump released the "opening up minister johnson was released after hospitalization for covid- , the queen addressed the nation, and lockdown restrictions were extended. in canada, deaths reached , , and an unrelated shooting occurred in nova scotia. in australia/nz, lockdown procedures were introduced or maintained, and both countries showed signs of reduced covid- spread from the first wave. we ran analyses using null-hypothesis significance tests (α = . ) in spss table . third, we examined evidence that ptsd-like symptoms occur for events that do not involve actual or threatened death, injury, or sexual violation. arguably, none of our events/categories meet criterion a; medically-based trauma is limited to sudden catastrophe (e.g., waking during surgery, anaphylactic shock ( )). even our most extreme direct exposure variables (e.g., being hospitalized in a critical condition) do not qualify. step , and emotion variables at step . after controlling for demographics, exposure table ). the final model explained . % of the variance in pcl total (f ( , ) = . , p <. ), and, notably, experienced event totals was no longer a significant predictor.
key: cord- -k svq n authors: pollet, jeroen; chen, wen-hsiang; versteeg, leroy; keegan, brian; zhan, bin; wei, junfei; liu, zhuyun; lee, jungsoon; kundu, rahki; adhikari, rakesh; poveda, cristina; mondragon, maria-jose villar; de araujo leao, ana carolina; rivera, joanne altieri; gillespie, portia m.; strych, ulrich; hotez, peter j.; bottazzi, maria elena title: sars-cov- rbd -n c : a yeast-expressed sars-cov- recombinant receptor-binding domain candidate vaccine stimulates virus neutralizing antibodies and t-cell immunity in mice date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k svq n there is an urgent need for an accessible and low-cost covid- vaccine suitable for low- and middle-income countries. here we report on the development of a sars-cov- receptor-binding domain (rbd) protein, expressed at high levels in yeast (pichia pastoris), as a suitable vaccine candidate against covid- . after introducing two modifications into the wild-type rbd gene to reduce yeast-derived hyperglycosylation and improve stability during protein expression, we show that the recombinant protein, rbd -n c , is equivalent to the wild-type rbd recombinant protein (rbd -wt) in an in vitro ace- binding assay. immunogenicity studies of rbd -n c and rbd -wt proteins formulated with alhydrogel® were conducted in mice, and, after two doses, both the rbd -wt and rbd -n c vaccines induced high levels of binding igg antibodies. using a sars-cov- pseudovirus, we further showed that sera obtained after a two-dose immunization schedule of the vaccines were sufficient to elicit strong neutralizing antibody titers in the : , to : , range, for both antigens tested. the vaccines induced ifn-γ, il- , and il- secretion, among other cytokines. 
overall, these data suggest that the rbd -n c recombinant protein, produced in yeast, is suitable for further evaluation as a human covid- vaccine, in particular, in an alhydrogel® containing formulation and possibly in combination with other immunostimulants. introduction the number of coronavirus disease (covid- ) cases globally is readily approaching the - million-person mark, with over . million deaths. in response to the pandemic, an international enterprise to develop effective and safe vaccines is underway. there are many ways to categorize the more than potential covid- vaccine candidates , but one approach is to divide them as those employing new technologies for production, but that have not yet been licensed for use, versus terms of production, scale-up, potential efficacy and safety, and delivery. we have previously reported on recombinant protein-based coronavirus vaccine candidates, formulated with alhydrogel ® to prevent severe acute respiratory syndrome (sars) - and middle east respiratory syndrome (mers) . in both cases, the receptor-binding domain (rbd) of the sars or mers spike proteins was used as the target vaccine antigen. in a mouse model, the sars- cov rbd -n /alhydrogel ® vaccine induced high titers of virus-neutralizing antibodies and protective immunity against a mouse-adapted sars-cov virus challenge. it was also found to minimize or prevent eosinophilic immune enhancement compared to the full spike protein . the rbd of sars-cov- has likewise attracted interest from several groups now entering clinical trials with rbd-based vaccines , - . our approach was to apply the lessons learned from the development of the sars-cov vaccine candidate and accelerate the covid- vaccine induction temperature was set to °c and the ph to . and, the methanol feed rate was between - ml/l/hr. the fermentation supernatant (fs) was filtered ( . m pes filter) and stored at - °c before purification. 
a hexahistidine-tagged sars-cov- rbd -wt was purified from fermentation supernatant (fs) by immobilized metal affinity chromatography followed by size exclusion chromatography (sec). the fs was concentrated and buffer exchanged to buffer a ( mm tris-hcl ph . and . m nacl) using a pellicon cassette with a kda mwco membrane. to evaluate the size of rbd -wt and rbd -n c , μg of these two proteins were loaded onto a - % tris-glycine gel under non-reduced and reduced conditions. these two proteins were also treated with pngase-f (neb, ipswich, ma, usa) under the reduced condition to remove n-glycans and loaded on the gel to assess the impact of the glycans on the protein size. gels were stained using coomassie blue and analyzed using a bio-rad g densitometer with image alhydrogel ® formulations were centrifuged at , x g for min, and the supernatant was removed. the protein in the supernatant and pellet fractions was quantified using a micro bca assay (thermofisher, waltham, ma, usa). for the ace- binding study, the alhydrogel ® -rbd vaccine formulations were blocked overnight with . % bsa. after hace- -fc (lakepharma, san carlos, ca, usa) was added, the samples were incubated for hours at rt. after incubation, the alhydrogel ® was spun down at , x g for washed once with l pbst using a biotek ts plate washer and diluted mouse serum samples were added to the plate in duplicate, l/well. as negative controls, pooled naïve mouse serum ( : diluted) and blanks ( . % bsa pbst) were added as well. plates were incubated for hours at room temperature, before being washed four times with pbst. subsequently, : , diluted goat anti-mouse igg hrp antibody ( l/well) was added in . % bsa in pbst. plates were incubated hour at room temperature, before washing five times with pbst, followed by the addition of l/well tmb substrate. plates were incubated for min at room temperature while protected from light. 
after incubation, the reaction was stopped by adding l/well m hcl. the absorbance at a wavelength of nm was measured using a biotek epoch spectrophotometer. duplicate values of raw data from the od were averaged. the titer cutoff value was calculated using the following formula: titer cutoff = x average of negative control + x standard deviation of the negative control. for each sample, the titer was determined as the lowest dilution of each mouse sample with an average od value above the titer cutoff. when a serum sample did not show any signal at all and a titer could not be calculated, an arbitrary baseline titer value of was assigned to that sample (baseline). sample/ rlu of negative control) x . serum from vaccinated mice was also characterized by the ic value, defined as the serum dilution at which the virus infection was reduced to % compared with the negative control (virus + cells). when a serum sample did not neutralize % of the virus when added at a : dilution, the ic titer could not be calculated and an arbitrary baseline titer value of was assigned to that sample (baseline). as a control, human convalescent sera for sars- for the re-stimulation assays, splenocyte suspensions were diluted to x live cells/ml in a -ml deep-well dilution plate and l of each sample was seeded in two -well tissue culture-treated plates. splenocytes were re-stimulated with g/ml rbd -wt, ng/ml pma + g/ml ionomycin or media alone (unstimulated). for the flow cytometry plate, the pma/i was not added until the next day. l ( x concentration) of each stimulant was mixed with the l splenocyte suspension in the designated wells. after all the wells were prepared, the plates were incubated at °c % co . one plate was used for the cytokine release assay, while the other plate was used for flow cytometry. for flow cytometry, another plate was prepared with splenocytes, which would later be used as fluorescence minus one controls (fmos). 
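the elisa cutoff rule and endpoint titer described above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' analysis: the two multipliers in the cutoff formula and the baseline titer are not legible in the text, so they are passed in as parameters, and the values 3, 3, and 67 used in the example are hypothetical. the sketch also assumes a sample standard deviation and reads "lowest dilution" as the most dilute serum (largest dilution factor) whose mean od is above the cutoff.

```python
from statistics import mean, stdev


def titer_cutoff(negative_ods, mean_factor, sd_factor):
    """Cutoff = mean_factor * mean(negatives) + sd_factor * SD(negatives).

    Both factors are assumptions supplied by the caller; the source text
    does not preserve them. stdev() is the sample standard deviation.
    """
    return mean_factor * mean(negative_ods) + sd_factor * stdev(negative_ods)


def endpoint_titer(od_by_dilution, cutoff, baseline):
    """Return the largest dilution factor whose mean duplicate OD exceeds the cutoff.

    od_by_dilution maps a dilution factor (e.g. 200 for 1:200) to the list of
    duplicate OD readings. If no dilution is above the cutoff, the arbitrary
    baseline titer is returned instead.
    """
    positive = [
        dilution
        for dilution, ods in od_by_dilution.items()
        if mean(ods) > cutoff
    ]
    return max(positive) if positive else baseline


# Hypothetical readings, for illustration only.
negative_control_ods = [0.05, 0.07, 0.06, 0.06]
cutoff = titer_cutoff(negative_control_ods, mean_factor=3, sd_factor=3)

serum = {
    200: [1.9, 2.1],
    1600: [0.8, 0.9],
    12800: [0.25, 0.27],
    102400: [0.07, 0.08],
}
titer = endpoint_titer(serum, cutoff, baseline=67)
```

keeping the multipliers and the baseline as explicit arguments avoids hard-coding values that the text does not preserve.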
after hours in the incubator, splenocytes were briefly mixed by pipetting. then plates were centrifuged for min at x g at rt. without disturbing the pellet, l supernatant was transferred to two skirted pcr plates and frozen at - °c until use. for the in vitro cytokine release assay, splenocytes were seeded in a -well culture plate at x live cells in µl crpmi. splenocytes were then (re-)stimulated with either µg/ml rbd -wt protein, µg/ml rbd -n c protein, pma/ionomycin (positive control), or nothing (negative control) for hours at °c % co . after incubation, -well plates were centrifuged to pellet the splenocytes and the supernatant was transferred to a new -well plate. the supernatant was stored at - °c until assayed. a milliplex mouse th luminex kit (md millipore) with analytes il- β, il- , il- , il- , il- , il- (p ), il- , il- a, il- , ifn-γ, and tnf-α was used to quantify the cytokines secreted in the supernatant by the re-stimulated splenocytes. a protocol based on the manufacturer's recommendations was used, adjusted to use less sample and kit materials . the readout was performed using a magpix luminex instrument. raw data were analyzed using bio-plex manager software, and further analysis was done with excel and prism. surface staining and intracellular cytokine staining followed by flow cytometry were performed to measure the amount of activated (cd =) cd + and cd + t cells producing ifn-γ, il- , tnf-α, and il- upon re-stimulation with s rbd wt. five hours before the end of the -hour re-stimulation incubation, brefeldin a was added to block cytokine secretion. pma/i was also added to designated wells as a positive control. after the incubation, splenocytes were stained for the relevant markers. a viability dye and an fc block were also used to exclude dead cells from the analysis and to minimize non-specific staining, respectively. 
results here we report on the expression of a modified, recombinant rbd of the sars-cov- spike protein using the yeast (p. pastoris) expression system. the candidate antigen selection, modifications, and production processes were based on eight years of process development, manufacture, and preclinical prior experience with a sars-cov recombinant protein-based receptor-binding domain (rbd) - . the rbds of the sars-cov- and sars-cov share significant amino acid sequence similarity (> % identity, > % homology) and both use the human angiotensin-converting enzyme (ace ) receptor for cell entry , . process development using the same procedures and strategies used for the production, scale-up, and manufacture of the sars-cov recombinant protein allowed for a rapid acceleration in the development of a scalable and reproducible production process for the sars-cov- rbd -n c protein, suitable for its technological transfer to a manufacturer. we found that the modifications used to minimize yeast-derived hyperglycosylation and optimize the yield, purity, and stability of the sars-cov rbd -n protein were also relevant to the sars-cov- rbd expression and production process. the modified sars-cov- antigen, rbd -n c , when formulated on alhydrogel ® , was shown to induce virus-neutralizing antibodies in mice, equivalent to those levels elicited by the wild-type (rbd -wt) recombinant protein counterpart. the wild-type sars-cov- rbd amino acid sequence comprises residues - of the spike (s) protein (genbank: qhd . ) of the wuhan-hu- isolate (genbank: mn . ) (figure ) . in the rbd- -wt construct, the gene fragment was expressed in p. pastoris. after fermentation at the l scale, the hexahistidine-tagged protein was purified by immobilized metal affinity chromatography, followed by size-exclusion chromatography. 
we observed glycosylation and aggregation during these initial expression and purification studies, and therefore, similar to our previous strategy , we generated a modified construct, the rbd -n c , by deleting the n residue and mutating the c residue to alanine. the additional mutation of c to a was done because we observed that in the wild-type sequence nine cysteine residues likely would form four disulfide bonds. therefore, the c residue was likely available for intermolecular cross-linking, leading to aggregation. as a result, in the rbd -n c construct, and based on the modifications, the pichia-derived hyperglycosylation, as well as aggregation via intermolecular disulfide bridging, were greatly reduced. we note that the deleted and mutated residues are structurally far from the immunogenic epitopes and specifically the receptor-binding motif (rbm) of the rbd (figure ) when mixing µg of either rbd -wt or rbd -n c proteins to µg of alhydrogel ® , we observed that > % of the proteins bind to alhydrogel ® after min of incubation. only when the alhydrogel ® was reduced to less than µg (alhydrogel ® /rbd ratio < ), the alhydrogel ® surface was saturated, and protein started to be detected in the supernatant (figure a) . it is known that unbound protein may impact the immunogenicity of the vaccine formulation, therefore we proceeded to only evaluate formulations with alhydrogel ® /rbd ratios higher than . figure b shows that hace- -fc, a recombinant version of the human receptor used by the virus to enter the host cells, can bind with the rbd proteins that are adsorbed on the surface of the alhydrogel ® . this demonstrates that bound rbd proteins are structurally and possibly functionally active and that after adsorption the protein does not undergo any significant conformational changes that could result in the loss of possible key epitopes around the receptor-binding motif (rbm). 
we saw no statistical differences between the binding of hace- -fc to rbd -wt (red, figure b ) or rbd -n c (green, figure b ) proteins, based on an unpaired t-test (p= . ). likewise, we saw no relation between the amount of alhydrogel ® to which the rbd was bound and the interaction with hace- -fc, indicating that the surface density of the rbd proteins on the alhydrogel ® plays no role in the presentation of ace binding sites. alhydrogel ® , produced a lower igg response, albeit slightly higher than the negative control that had been immunized with g alhydrogel ® alone (figure b, supplemental table ) . importantly, based on a mann-whitney test, we determined that there was no statistical difference between the groups vaccinated with the modified and the wild-type version of the rbd protein (p= . ). the average neutralizing antibody titers observed on day (ic range: . x to . x , supplemental (figure c) . on day , days after receiving the boost vaccination, half of the mice in each group (n= ), those with the highest igg titers, were sacrificed to determine the total igg, the igg subtypes, and the neutralizing antibody titers. as we observed on day , all animals that had received the vaccine produced strong antibody titers, with the groups receiving > g alhydrogel ® eliciting a higher titer than those that received only g of alhydrogel ® , albeit no statistical significance was detected (figures b) . for all animals, as typical for vaccine formulations containing aluminum, the igg a:igg titer ratio was < . (supplemental figure ) . in the pseudovirus neutralization assay for the day samples (figure c) , all vaccines containing > g alhydrogel ® elicited ic titers that, on average, were several-fold higher than on day (ic range: . x to . x , supplemental table ). there again was no difference between the rbd -wt and rbd-n c vaccines. on day , all remaining animals were sacrificed. 
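the group comparison reported above for igg titers uses a mann-whitney test. the following python sketch shows how the u statistic is obtained from ranks; it is an illustration only, and the titers in the example are hypothetical rather than data from the study. the p-value uses the normal approximation without tie or continuity correction, which is an assumption made here for brevity; with groups as small as those in a mouse study, an exact test (as offered by standard statistics packages) would normally be preferred.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U, two-sided p) using the normal approximation.

    U is the smaller of U1 and U2. No tie or continuity correction is
    applied, so the p-value is only a rough guide for small samples.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical endpoint titers for two vaccine groups.
wild_type_titers = [12800, 25600, 6400, 12800]
modified_titers = [25600, 12800, 12800, 51200]
u_stat, p_value = mann_whitney_u(wild_type_titers, modified_titers)
```

because the test works on ranks, it gives the same result whether titers are entered as dilution factors or on a log scale.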
in contrast to the animals studied on days and , these animals had received a second boost vaccination. a robust immune response in all vaccinated mice, including those immunized with the protein adsorbed to g alhydrogel ® achieved high average igg titers. the total igg titers in the mice sacrificed on day , had increased after the third vaccination, compared to the titers seen on day . likewise, we observed a corresponding increase in the average ic values (ic range: . x to . x , supplemental table ) for all animals, including those immunized with the protein adsorbed to g alhydrogel ® . interestingly, for this time point, the cohort receiving g rbd -n c with g alhydrogel ® appeared to show higher neutralizing antibody titers than the corresponding for all samples, we employed flow cytometry to quantify intracellular cytokines in cd + and cd + cells after restimulation ( figure a) . on day , high percentages of cd + -il- and, to a slightly lesser extent cd + -tnf producing cells were detected. conversely, as expected for an alhydrogel ® -adjuvanted vaccine, low levels of il- producing cd + cells were seen. in a cytokine release assay, strong ifn-, il- , and il- secretion was observed independent of whether the animals had received two or three immunizations, whereas low amounts of secreted th -typical cytokines such as il- or il- were seen ( figure b ). cytokine concentrations of non-stimulated controls were subtracted from re-stimulated samples. discussion here we report on a yeast-expressed sars-cov- rbd -n c protein and its potential as a vaccine candidate antigen for preventing covid- . building on extensive prior experience developing vaccines against sars-cov and mers-cov - , we initially selected and compared the sars-cov- rbd -wt and the sars-cov- rbd -n c proteins for their potential to induce high titers of virus-neutralizing antibodies, t-cell responses, and protective immunity. 
previously we observed that the sars-cov rbd -n antigen, formulated with alhydrogel ® , elicited high levels of neutralizing antibodies without evidence of eosinophilic immune enhancement. that rbd-based vaccine was even superior to the full-length spike protein in inducing specific antibodies and fully protected mice from sars-cov infection while preventing eosinophilic pulmonary infiltrates upon challenge . in this work, using the sars-cov- rbd protein analog, we observed that, just like in the case of the sars-cov rbd antigen, the deletion of the n-terminal asparagine residue reduced hyperglycosylation, thus allowing for easier purification of the antigen obtained from the yeast expression system. moreover, mutagenesis of a free cysteine residue further improved protein production through the reduction of aggregation. based on the predicted structure of the rbd, no impact on the functionality of the rbd -n c antigen was expected, and using an ace- in vitro binding assay we indeed showed similarity to the rbd -wt antigen. in addition, we showed that, in mice, the modified rbd -n c antigen triggered an equivalent immune response to the rbd -wt protein when both proteins were adjuvanted with alhydrogel ® . similar to our previous findings with the sars-cov rbd antigen , we show that the rbd -n c protein, when formulated with alhydrogel ® , elicits a robust neutralizing antibody response with ic values up to . x in mice, as well as an expected t-cell immunological profile. some of the titers of virus-neutralizing antibodies exceed the titer, . x , measured in-house with human convalescent serum research reagent for sars-cov- (nibsc / , national institute for biological standards and control, uk). 
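the ic titers compared here are defined in the methods as the serum dilution at which infection is reduced to half of the negative control (virus + cells). one simple way to estimate such a titer from a dilution series is log-linear interpolation between the two dilutions that bracket the threshold, sketched below in python. interpolation is a stand-in chosen for this illustration; the text does not state how the authors fitted their curves. the 50% threshold, the normalization of rlu to a percentage of the negative control, the rlu readings, and the baseline value of 10 are all assumptions or hypothetical example values.

```python
import math


def percent_infection(rlu_sample, rlu_negative_control):
    """Infection as a percentage of the negative control (virus + cells)."""
    return 100.0 * rlu_sample / rlu_negative_control


def ic50_titer(rlu_by_dilution, rlu_negative_control, baseline):
    """Estimate the dilution factor at which infection crosses 50%.

    rlu_by_dilution maps a dilution factor (e.g. 100 for 1:100) to an RLU
    reading. Returns the baseline if the least dilute serum fails to reduce
    infection to 50%, and the highest tested dilution if infection never
    rises above 50% (a lower bound on the true titer).
    """
    points = sorted(
        (dilution, percent_infection(rlu, rlu_negative_control))
        for dilution, rlu in rlu_by_dilution.items()
    )
    if points[0][1] > 50.0:
        return baseline
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= 50.0 < p_hi:
            # Interpolate linearly in log10(dilution) between the bracketing points.
            fraction = (50.0 - p_lo) / (p_hi - p_lo)
            log_d = math.log10(d_lo) + fraction * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    return points[-1][0]


# Hypothetical readings, for illustration only.
readings = {10: 100, 100: 300, 1000: 700, 10000: 950}
estimate = ic50_titer(readings, rlu_negative_control=1000, baseline=10)
```

a four-parameter logistic fit is the more common choice when the full curve is available; the interpolation above only needs the two points around the threshold.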
in a mouse virus challenge model for the sars-cov rbd recombinant protein vaccine, we found that alhydrogel ® formulations induced high levels of protective immunity but did not stimulate eosinophilic immune enhancement, suggesting that alhydrogel ® may even reduce immune the selection of the p. pastoris expression platform for the production of the rbd antigen was motivated by the intent to develop a low-cost production process that could easily be transferred to manufacturers in lmics. currently, there are several types of covid- vaccine candidates in advanced clinical trials , - . some of the initiatives behind these vaccines focus on providing vaccines for the developed world, and those vaccines might struggle to be successful without advanced infrastructure. being able to match the existing experience in lmics with the production of other biologics in yeast increases the probability of successful technology transfer . for example, the recombinant hepatitis b vaccine is currently produced in yeast by several members of the developing countries vaccine manufacturers network (dcvmn), and we foresee that, given the existing infrastructure and expertise, those facilities could be repurposed to produce a yeast-produced covid- vaccine . recently, the research cell bank and production process for the rbd -n c antigen were transferred to a vaccine manufacturer in india and the antigen was produced under cgmp conditions with the intent to enter into clinical development. in addition, preclinical studies using the rbd -wt and rbd -n c antigens are ongoing to further optimize and evaluate other novel formulations, including a challenge study in a non-human primate model. 
the authors declare that baylor college of medicine recently licensed the rbd -n c technology to an indian manufacturer for further development. the research conducted in this paper was performed in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. key: cord- -pgrrurrc authors: tripathi, satyendra c; deshmukh, vishwajit; creighton, chad j.; patil, ashlesh title: renal carcinoma is associated with increased risk of coronavirus infections date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pgrrurrc the current covid- pandemic has most severely affected older people and those with comorbidities such as hypertension, diabetes mellitus, chronic kidney disease, copd, and cancers. cancer patients are twice as likely to contract the disease because of the malignancy or treatment-related immunosuppression; hence identification of the vulnerable population among these patients is essential. it is speculated that along with ace , other auxiliary proteins (dpp , anpep, enpep, tmprss ) might facilitate the entry of coronaviruses into host cells. we took a bioinformatics approach to analyze the gene and protein expression data of these coronavirus receptors in human normal and cancer tissues of multiple organs. 
here, we performed an extensive rna and protein expression profiling analysis of these receptors across solid tumors and normal tissues. we found that, among all tissues examined, renal tumor and normal tissues exhibited increased levels of ace , dpp , anpep, and enpep. our results revealed that tmprss may not be the co-receptor for coronavirus in renal carcinoma patients. the receptors’ expression levels varied across tumor stages and across molecular and immune subtypes of renal carcinoma. in clear cell renal cell carcinomas, coronavirus receptors were associated with high immune infiltration, markers of immunosuppression, and t cell exhaustion. our study indicates that cov receptors may play an important role in modulating the immune infiltrate and hence cellular immunity in renal carcinoma. as our knowledge of pathogenic mechanisms improves, it may help us in designing focused therapeutic approaches. coronavirus (cov) disease- has been declared a pandemic by the world health organization (who) after the outbreak of severe acute respiratory syndrome-cov- (sars-cov- ) (ng et al., ). the primary host for sars-cov- has been identified as bats and the terminal host as humans (khan et al., ). related research revealed that sars-cov and sars-cov- share approximately % amino acid identity . the primary symptoms include fever, dry cough, dyspnea, diarrhea, myalgia, headache, and hyposmia, with common complications such as acute respiratory distress ( %), acute cardiac injury ( %), and acute renal injury ( %) (huang et al., ). it has been well established that covs require the ace or dpp receptors for entry into the host cells (seys et al., ; walls et al., ; wrapp et al., ). the organs or cells expressing ace are more vulnerable to cov infection (glowacka et al., ). according to hoffmann et al. 
, viral entry into the host cell depends on the sars-cov receptor ace for binding and requires tmprss activity for priming (hoffmann et al., ). therefore, it is speculated that other auxiliary proteins or co-receptors might facilitate the entry of covs into host cells. these co-receptors or auxiliary proteins include tmprss , anpep, and enpep (qi et al., ). it has accordingly been proposed that organs co-expressing ace or dpp together with co-receptors or auxiliary proteins such as tmprss , anpep, or enpep are more susceptible to viral entry and replication and to severe disease. given the presence of these receptors and co-receptors, the possibility of extrapulmonary spread cannot be ruled out. the populations at increased risk of virus infection are older patients and those with comorbidities such as hypertension, diabetes mellitus, chronic kidney disease, copd, and cancers (sun et al., ). viral infection with comorbidities is responsible for higher mortality. according to lee and colleagues, cancer patients are twice as likely to contract the infection as the general population (lee et al., ). patients who received chemotherapy or surgery within the days before the covid- pandemic have a higher risk of infection than patients who had not undergone chemotherapy or surgery (sharma et al., ). according to an analysis of italian patients published in march, % of those who died from covid- in the country had active cancer . notably, the guidelines for cancer patients during the covid- pandemic focus on lung cancer patients undergoing active chemotherapy or radical radiotherapy, and on patients with blood cancers (burki, ). the association of these receptors with the pathogenicity of solid tumors remains unresolved. in the present study, we investigated molecular profiling data of the various proteins required for the entry of covs in normal tissues and cancer tissues. 
immunological aspects of the study of pathogenesis cannot be overlooked. therefore, we also explored an immune perspective concerning cancer. understanding how the various covs use multiple receptors and co-receptors can open new avenues for understanding the pathogenesis and developing intervention strategies. sars-cov- requires ace , which is involved in various biological functions, primarily in the renin-angiotensin system, as its receptor for host cell entry ( figure a ). dpp , anpep, enpep, and tmprss have also been proposed as co-receptors to initiate sars-cov- infection (qi et al., ). string pathway analysis revealed high-confidence interactions between these proteins, which have peptidase activity and are involved in the angiotensin system, peptide metabolism, and viral entry into the host cell ( figure b and table ). hence, we extracted the rna and protein expression data of all these receptors in healthy tissues and solid tumors. we found enrichment of ace at the rna level in testis, small intestine, and kidney ( figure c ). along with ace , enrichment of rnas for the other four co-receptors occurred in normal lung, mammary, liver, prostate, thyroid, head and neck, small intestine, and kidney tissues ( figure c ). the protein expression levels of ace and the co-receptors showed an expression pattern similar to that of their rna ( figure e ). renal tumors exhibited the highest expression of the ace receptor among solid tumor tissues, followed by gastrointestinal cancers such as colorectal, pancreatic, and stomach cancer ( figure d ). dpp , anpep, and enpep rna expression was also elevated in renal tumors compared to other cancer tissues. of note, tmprss rna expression was highest in prostate cancer tissues, whereas renal tumors featured among the lowest expressing tissues. tcga data analysis also showed increased expression of all receptors except tmprss in renal tumor tissues as compared to their adjacent normal tissues (sup fig ). 
we found that protein-level data were also concordant with rna expression data in solid tumors. renal cancer showed a higher percent positivity for all receptors except tmprss , which exhibited the highest percent positivity in prostate tumor tissues (figure f, g). we observed that ace , anpep, enpep, and dpp gene expression was higher in renal papillary carcinoma (kirp) and renal clear cell carcinoma (kirc), whereas tmprss showed increased expression in renal chromophobe (kich) (figure a and sup fig a). we further analyzed the correlation of ace with all four receptors in each renal cancer type. we observed a statistically significant correlation of ace with dpp in kich (p< . , ρ = . ), kirc (p< . , ρ = . ), and kirp (p< . , ρ = . ), and also with anpep in kirp (p< . , ρ = . ) (figure b). and also negatively correlated with tumor stages in kirp tumor tissues along with anpep (ρ = - . , p< . ). enpep showed a significant (ρ = - . , p< . ) negative correlation with tumor stage only in kirc tumors (figure c). as the molecular subtypes of kirp are well defined, we also analyzed the expression pattern of these receptors across them. we observed that all cov receptors showed significantly (p< . , kruskal-wallis test) increased expression in the c , c a, and c b subtypes compared to the c c-cimp subtype of kirp (the subtype with high dna methylation) (figure d). had a higher expression of tmprss (figure a and supl fig b). we also analyzed the correlation between the cov receptor genes and immune cell signatures for each of the cancer types in tcga. the cov receptors tended to show a high correlation with immune signatures in most cancer types, though ace and tmprss exhibited weaker correlations than the other receptors (supl fig c).
further, we specifically analyzed ace , which is a primary sars-cov receptor, and dpp , which is highly correlated with ace across the kidney cancer subtypes, along with immune cell signatures (innate and adaptive immunity, inflammatory cytokines and chemokines). we observed that ace and dpp exhibited increased expression in all molecular subtypes except kich, the cimp subtype of kirp, and the cc-e. subtype of kirc (figure b). we found that these receptors are highly correlated with innate and adaptive immunity-related cells, as well as with il- , il- , cxcl , ccl -ccl , and tgfb in kirc tumors, whereas in kirp tumor tissues il , il a, and tnf were highly correlated (figure b). our results revealed that, in kirc tumors, the expression of various markers of exhausted t cells (cd , pd , ctla ) and of an immunosuppressive microenvironment (pdl , pdl ) was also significantly (p < . , t-test) correlated with cov receptors (supl fig d). we ) . ace was found to be significantly positively correlated only with macrophage (r = . , p = . e- ) immune infiltrate in the kirp subtype (figure a). on the other hand, dpp was significantly correlated, positively or negatively, with macrophage (r = . , p = . e- ), b cell (r = - . , p = . e- ), and cd + t cell (r = . , p = . e- ) immune infiltrates in the kirp subtype (figure b). we observed either negative or no correlation of tmprss expression with immune infiltrates in renal carcinoma subtypes (suppl fig a). we also found a significant correlation of anpep and enpep with immune infiltrate in kirc as compared to kirp tumors (suppl fig b and c). immune infiltrates of b cells, cd + t cells, macrophages, and dcs significantly correlated with these receptors, except tmprss , in kich tumors (figure and suppl fig ). these results strongly suggest a variable host immune response to cov infection depending upon the renal carcinoma subtype. there is an increased risk of coronavirus-related fatalities in the subpopulation with underlying health conditions or comorbidities.
the risk factors for increased susceptibility to cov infection include, but are not limited to, diabetes, heart disease, hypertension, chronic renal disease, chronic obstructive pulmonary disease, smoking, and cancer. cancer patients, especially those undergoing chemotherapy and other anti-cancer treatment, have an increased risk of mortality due to cov infection (lee et al., ). the precise mechanism of increased severity in cancer remains unclear. in this study, we performed landscape profiling of cov receptors and co-receptors (viz. ace , tmprss , anpep, enpep, and dpp ) in various normal and cancer cells. our findings are in concordance with previous studies reporting ace expression in stratified epithelial cells, colon, lung, liver, and kidney. single-stranded rna viruses can have multiple receptors for host cell entry (zhang et al., ). sars-cov utilizes ace , cd , clec g, and clec m to infect the host (marzi et al., ; yang et al., ; gramberg et al., ; wang et al., ). a recent study has suggested a few other receptors, such as dpp , anpep, enpep, and tmprss , as co-receptors/auxiliary proteins that complement ace in initiating sars-cov- infection (qi et al., ). we analyzed the data available on the gtex portal and observed the co-occurrence of these receptors in the small intestine and kidney at both rna and protein levels. coronavirus infection is a multiorgan disease and is not limited to the lungs. a few studies have reported low levels of ace in lung parenchyma as compared to other normal tissues (jia et al., ; qi et al., ). ace expression is also cell type-specific: in the lung, expression is mainly in alveolar cells (type pneumocytes) and immune cells (b cells, t cells or myeloid cells) (qi et al., ). in kidneys, most of the cells of proximal collecting tubules and proximal straight tubules exhibit increased expression. and enpep along with ace receptors.
we found that the expression of these receptors is inversely correlated with tumor stage and varies by molecular subtype in renal carcinoma. as the host immune response is crucial to eradicating viral infection, immunological aspects related to these receptors cannot be overlooked. we observed that cov receptors tend to show a high correlation with immune signatures in most cancer types. we further explored whether cov receptors are involved in modulating tumor immunity. our analysis revealed for the first time that these receptors were correlated with immune cell infiltration in renal carcinoma. our findings support immunoregulatory functions as follows: ace , dpp , anpep, and enpep expression is closely related to the infiltration level of b cells, cd + t cells, macrophages, neutrophils, and dendritic cells. cytokine storm is a well-described feature of the pathogenesis of the disease. chemokines are involved in several biological processes, such as the development of innate and acquired immunity, embryogenesis, and cancer metastasis (coperchini et al., ). these chemokines, along with cytokines, recruit different immune cells (poeta et al., ). we found that ace and dpp were highly expressed and significantly correlated with innate and adaptive immunity-related cells, as well as with il- , il- , cxcl , ccl -ccl , and tgfb , in kirc tumors only. upregulation of cxcl can enhance the levels of tumor-infiltrating cd + t cells and natural killer cells (humblin and kamphorst, ; kikuchi et al., ; petty et al., ). t regulatory cells and macrophages are recruited by ccl (walens et al., ). these results indicate that cov receptors can play an important role in cellular immunity by modulating the immune infiltrate via cytokine and chemokine secretion. cancer cells can also evade the immune system by promoting t cell dysfunction and exhaustion (jiang et al., ; thommen and schumacher, ).
we analyzed the expression of inhibitory immune-checkpoint molecules in renal carcinoma subtypes to understand the dysregulation of the tumor microenvironment. our results revealed that the expression of various markers of exhausted t cells (cd , pd , ctla ) and of an immunosuppressive microenvironment (pdl , pdl ) is highly correlated with cov receptors in kirc tumors. therefore, targeting these cov receptors along with immune checkpoint inhibitors in coronavirus-positive kirc patients may be beneficial. our study has some limitations, as our findings are based on correlations and associations drawn from analysis of data extracted from several public databases. further experiments are warranted to confirm the role of cov receptors in immune modulation of renal carcinoma. in conclusion, our bioinformatics analysis revealed that renal carcinoma patients might be more susceptible to cov infection. we found evidence that tmprss may not be the auxiliary protein for coronavirus infection in renal carcinoma. ace and dpp showed increased expression in renal carcinoma tissues as compared to normal kidney. this association suggests that these patients are at higher risk of case-related fatality than healthy subjects. ace , dpp , anpep, and enpep were each associated with a high level of immune infiltration, inflammatory chemokines, cytokines, and markers of an immunosuppressive microenvironment and t cell exhaustion in kirc tumors. our study indicates that cov receptors may play an important role in modulating the immune infiltrate and hence cellular immunity in renal carcinoma. we downloaded rna-seq gene expression profiling datasets from the genotype-tissue expression study (gtex consortium) for human normal tissues. the normalized rna-seq data in transcripts per million (tpm) were utilized for further analysis.
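to make the tpm unit mentioned above concrete, the following python sketch shows how tpm values are commonly derived from raw read counts and transcript lengths. the gtex data used in this study were downloaded already normalized, so this is purely illustrative: the function name, the toy counts, and the toy lengths are assumptions and are not part of the original analysis.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- reads assigned to each gene (hypothetical toy values)
    lengths_kb -- effective gene/transcript length in kilobases
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    if any(length <= 0 for length in lengths_kb):
        raise ValueError("lengths must be positive")
    # Step 1: length-normalize each count (reads per kilobase).
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero")
    # Step 2: rescale so the values of one sample sum to one million.
    return [1e6 * r / total for r in rates]


# Toy sample with three genes; values are made up for illustration.
toy_counts = [10, 20, 30]
toy_lengths_kb = [1.0, 2.0, 3.0]
print(tpm(toy_counts, toy_lengths_kb))
```

because every sample is rescaled to the same total, tpm values can be compared across genes within a sample more directly than raw counts.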
we also downloaded rna-seq gene expression profiling datasets from the cancer genome atlas (tcga) (https://www.cancer.gov/aboutnci/organization/ccg/research/structural-genomics/tcga) (rsem normalized) for human normal and cancer tissues. the human protein atlas database (https://www.proteinatlas.org/humanproteome/tissue) includes expression profiles of rna and protein corresponding to ∼ % of the human protein-coding genes of specific tissues and organs. we downloaded the immunohistochemistry (ihc) data analysis of cov receptors protein in cancer and normal tissues (pmid: ). the score of ihc-based protein expression was determined as the fraction of positive cells defined in different tissues: = - %, = - %, = - %, > % and intensity: = negative, = weak, = moderate, and = strong staining. we utilized the combined data of positive fraction and intensity, which is represented as high ( ), moderate ( ), low ( ) or no ( ) staining. the ihc data representation for normal tissues was presented as staining of the protein. for carcinoma tissues, we used the percentage positivity based on tumor tissues with high or moderate staining compared to low or no staining detected. representative images of normal kidney and renal tumor tissues were also acquired from the same source. for analyses of interactions among cov receptors, the string database, which enables analysis for the structural and functional component of proteins (szklarczyk et al., ) was used. the sources for establishing the interactions and enrichment of molecular, functional processes were text mining, experiments, databases, co-expression, neighbourhood, and co-occurrence, and . was set as the cut-off criterion. the tumor immune system interaction database (tisidb) (http://cis.hku.hk/tisidb/) platform was used to analyze the correlation of ace with other cov receptors (dpp , anpep, enpep, tmprss ) expression in different renal carcinoma subtypes. 
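the percent-positivity summary described above for carcinoma tissues (tumors with high or moderate staining versus low or no staining) can be sketched in python as follows. this is a minimal illustration, not the human protein atlas pipeline: the label strings, the function name, and the example calls are hypothetical.

```python
from collections import Counter

# Staining categories named in the text; the exact label strings are assumed.
POSITIVE_LABELS = {"high", "moderate"}
VALID_LABELS = {"high", "moderate", "low", "none"}


def percent_positivity(calls):
    """Percentage of tumor samples with high or moderate staining."""
    if not calls:
        raise ValueError("no samples provided")
    unknown = set(calls) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected staining labels: {sorted(unknown)}")
    counts = Counter(calls)
    positive = sum(counts[label] for label in POSITIVE_LABELS)
    return 100.0 * positive / len(calls)


# Hypothetical staining calls for eight tumors of one cancer type.
example_calls = ["high", "moderate", "low", "none",
                 "moderate", "low", "none", "none"]
print(percent_positivity(example_calls))  # 3 of 8 positive -> 37.5
```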
the correlation and association of cov receptors with tumor stage, molecular subtypes, and immuno-subtypes of renal carcinoma were also analyzed using the tisidb interface. the tumor immune estimation resource (timer) (https://cistrome.shinyapps.io/timer/) platform, which comprises immune infiltrate data from tcga patients (li et al., , ), was used to investigate the association between cov receptor expression and the infiltration level of b cells, cd + t cells, cd + t cells, neutrophils, macrophages, and dendritic cells. the "diffexp" module was used to investigate the spp expression between tumor and adjacent normal tissues across all tcga tumors. the partial spearman's correlation and statistical significance after purity correction were shown on the generated scatterplots. to computationally infer the infiltration level of specific immune cell types using rna-seq data from renal cell carcinoma samples from tcga, as described previously (chen et al., ), we used a set of genes specifically overexpressed in one of immune cell types (bindea et al., ). we analyzed various immune signatures, including innate immunity, adaptive immunity, pro- and anti-inflammatory cytokines, and inflammatory chemokines. for pan-cancer correlation analyses, we used the set of rna-seq profiles from different cancer types as featured in our previous study (chen et al., ), using the bindea immune cell signature scores as computed for this study. the correlation among cov receptor expression levels, and between receptor expression and immune infiltration level, was determined by the timer interface using spearman's correlation analysis and statistical significance, and the strength of the correlation was determined using the following guide for the absolute value: . - . "weak," . - . "moderate," . - . "strong." the association between cov receptor expression and molecular or immune subtypes in renal carcinoma was analyzed by the tisidb interface using the kruskal-wallis test.
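the spearman correlation and the weak/moderate/strong guide described above can be sketched in plain python as follows. this is an illustration only and not the timer implementation; it also omits the purity correction. the numeric cut-offs of the guide are not recoverable from the text, so the default thresholds below are hypothetical placeholders, as are the function names and the example values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def strength_label(rho, weak_upper=0.3, moderate_upper=0.6):
    """Label |rho|; both thresholds are HYPOTHETICAL, not from the source."""
    magnitude = abs(rho)
    if magnitude < weak_upper:
        return "weak"
    if magnitude < moderate_upper:
        return "moderate"
    return "strong"


# Made-up receptor expression and infiltration scores for five tumors.
expression = [1.2, 3.4, 2.2, 5.1, 4.0]
infiltration = [0.10, 0.35, 0.20, 0.50, 0.30]
rho = spearman(expression, infiltration)
print(rho, strength_label(rho))
```

working on ranks rather than raw values makes the measure robust to the skewed distributions typical of expression data, which is one reason a rank-based correlation is a natural fit here.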
for preparing the bar graph, the data was analyzed using graphpad™ software (version . , graphpad software, inc., usa) and presented as mean ± sd. p-values < . were considered statistically significant. previously (chen et al., ) . three of these subtypes-cc-e. , cc-e. , and cc-e. -are enriched for clear cell rcc cases; four other subtypes-p-e. a, p-e. b, p-e. , and p.cimpe-are enriched for papillary rcc cases; one subtype, ch-e, is enriched for chromophobe rcc cases; and one subtype ("mixed") is not enriched for any of the above. treg cells, spatiotemporal dynamics of intratumoral immune cells reveal the immune landscape in human cancer cancer guidelines during the covid- pandemic pan-cancer molecular classes transcending tumor lineage across cancer types, multiple data platforms, and over , cases multilevel genomics-based taxonomy of renal cell carcinoma the cytokine storm in covid- : an overview of the involvement of the chemokine/chemokinereceptor system evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response lsectin interacts with filovirus glycoproteins and the spike protein of sars coronavirus clinical characteristics of coronavirus disease in china sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor clinical features of patients infected with novel coronavirus in wuhan cxcr -cxcl : it's all in the tumor ace receptor expression and severe acute respiratory syndrome coronavirus infection depend on differentiation of human airway epithelia t-cell exhaustion in the tumor microenvironment covid- : a worldwide, zoonotic, pandemic outbreak forced expression of cxcl prevents liver metastasis of colon carcinoma cells by the recruitment of natural killer cells covid- mortality in patients with cancer on chemotherapy or other anticancer treatments: a prospective cohort study comprehensive analyses of 
tumor immunity: implications for cancer immunotherapy timer: a web server for comprehensive analysis of tumor-infiltrating immune cells cancer patients in sars-cov- infection: a nationwide analysis in china dc-sign and dc-signr interact with the glycoprotein of marburg virus and the s protein of severe acute respiratory syndrome coronavirus case fatality rate of cancer patients with covid- in a new york hospital system sars-cov- infection among travelers returning from wuhan hedgehog signaling promotes tumor-associated macrophage polarization to suppress intratumoral cd + t cell recruitment chemokines and chemokine receptors: new targets for cancer immunotherapy single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses dpp , the middle east respiratory syndrome coronavirus receptor, is upregulated in lungs of smokers and chronic obstructive pulmonary disease patients severity and risk of covid in cancer patients: an evidence based learning early epidemiological analysis of the coronavirus disease outbreak based on crowdsourced data: a population-level observational study the string database in : quality-controlled protein-protein association networks, made broadly accessible call for ensuring cancer care continuity during covid- pandemic t cell dysfunction in cancer ccl promotes breast cancer recurrence through macrophage recruitment in residual tumors structure, function and antigenicity of the sars-cov- spike glycoprotein the genetic sequence, origin, and diagnosis of sars-cov- roles of tnf-alpha gene polymorphisms in the occurrence and progress of sars-cov infection: a casecontrol study cancer-foxp directly activated ccl to recruit foxp + treg cells in pancreatic ductal adenocarcinoma cryo-em structure of the -ncov spike in the prefusion conformation a dna vaccine induces sars coronavirus neutralization and protective immunity in mice host lipids in positive-strand rna virus genome replication key: cord- - psoaer 
authors: ogando, natacha s.; zevenhoven-dobbe, jessika c.; posthuma, clara c.; snijder, eric j. title: the enzymatic activity of the nsp exoribonuclease is critical for replication of middle east respiratory syndrome-coronavirus date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: psoaer coronaviruses (covs) stand out for their large rna genome and complex rna-synthesizing machinery comprising nonstructural proteins (nsps). the bifunctional nsp contains an n-terminal ’-to- ’ exoribonuclease (exon) and a c-terminal n -methyltransferase (n -mtase) domain. while the latter presumably operates during viral mrna capping, exon is thought to mediate proofreading during genome replication. in line with such a role, exon-knockout mutants of mouse hepatitis virus (mhv) and severe acute respiratory syndrome coronavirus (sars-cov) were previously found to have a crippled but viable hypermutation phenotype. remarkably, using an identical reverse genetics approach, an extensive mutagenesis study revealed the corresponding exon-knockout mutants of another betacoronavirus, middle east respiratory syndrome coronavirus (mers-cov), to be non-viable. this is in agreement with observations previously made for alpha- and gammacoronaviruses. only a single mers-cov exon active site mutant could be recovered, likely because the introduced d e substitution is highly conservative in nature. for other mers-cov exon active site mutants, not a trace of rna synthesis could be detected, unless – in some cases – reversion had first occurred. subsequently, we expressed and purified recombinant mers-cov nsp and established in vitro assays for both its exon and n -mtase activities. all exon knockout mutations that were lethal when tested via reverse genetics were found to severely decrease exon activity, while not affecting n -mtase activity. 
our study thus reveals an additional function for mers-cov nsp exon, which apparently is critical for primary viral rna synthesis, thus differentiating it from the proofreading activity thought to boost long-term replication fidelity in mhv and sars-cov. importance the bifunctional nsp subunit of the coronavirus replicase contains ’-to- ’ exoribonuclease (exon) and n -methyltransferase (n -mtase) domains. for the betacoronaviruses mhv and sars-cov, the exon domain was reported to promote the fidelity of genome replication, presumably by mediating some form of proofreading. for these viruses, exon knockout mutants are alive while displaying an increased mutation frequency. strikingly, we now established that the equivalent knockout mutants of mers-cov exon are non-viable and completely deficient in rna synthesis, thus revealing an additional and more critical function of exon in coronavirus replication. both enzymatic activities of (recombinant) mers-cov nsp were evaluated using newly developed in vitro assays that can be used to characterize these key replicative enzymes in more detail and explore their potential as target for antiviral drug development. interestingly, a quite different phenotype was described for the corresponding exon- knockout mutants of two betacoronaviruses, mouse hepatitis virus (mhv) and cov.
while exon inactivation decreased replication fidelity in these viruses, conferring a 'mutator phenotype', the mutants were viable, both in cell culture ( , ) and in animal models ( ). these findings suggested that exon may indeed be part of an error correction mechanism. subsequently, the ability of exon to excise '-terminal mismatched nucleotides from a double-stranded (ds) rna substrate was demonstrated in vitro using recombinant sars-cov nsp ( ). furthermore, this activity was shown to be strongly enhanced (up to -fold) by the presence of nsp , a small upstream subunit of the cov replicase ( ). the two subunits were proposed to operate, together with the nsp -rdrp, in repairing mismatches that may be introduced during cov rna synthesis ( , ) . in cell culture, mhv and sars-cov mutants lacking exon activity exhibit increased sensitivity to mutagenic agents like - fluoracil ( -fu), compounds to which the wild-type virus is relatively resistant ( , ) . recently, exon activity was also implicated in cov rna recombination, as an mhv exon knockout mutant exhibited altered recombination patterns, possibly reflecting its involvement in other activities than error correction during cov replication and subgenomic mrna synthesis ( ). outside the order nidovirales, arenaviruses are the only other rna viruses known to employ an exon domain, which is part of the arenavirus nucleoprotein and has been implicated in fidelity control ( ) and/or immune evasion, the latter by degrading viral dsrna ( , ) . based on results obtained with tgev and mhv exon knockout mutants, also the cov exon activity was suggested to counteract innate responses ( , ) . in the meantime, cov nsp was proven to be a bifunctional protein by the discovery of an (n -guanine)-methyltransferase (n -mtase) activity in its c-terminal domain ( ) (fig. ). this enzymatic activity was further corroborated in vitro, using biochemical assays with purified recombinant sars-cov nsp . 
the enzyme was found capable of methylating cap analogues or gtp substrates, in the presence of s-adenosyl methionine (sam) as methyl donor ( , ). the n -mtase was postulated to be a key factor for equipping cov mrnas with a functional '-terminal cap structure, as n -methylation is essential for cap recognition by the cellular translation machinery ( ). although the characterization of the nsp n -mtase active site and reaction mechanism has not been completed, alanine scanning mutagenesis and in vitro assays with nsp highlighted several key residues (fig. ) ( , , ). moreover, crystal structures of sars-cov nsp in complex with its nsp co-factor (pdb entries c u and nfy) revealed several unique structural and functional features (ma et al., ; ferron et al., ). these combined structural and biochemical studies confirmed that the two enzymatic domains of nsp are functionally distinct ( ) and physically independent ( , ). still, the two activities are structurally intertwined, as it seems that the n -mtase activity depends on the integrity of the n-terminal exon domain, whereas the flexibility of the protein is modulated by a hinge region connecting the two domains ( ). coronaviruses are abundantly present in mammalian reservoir species, including bats, and pose a continuous zoonotic threat ( ) ( ) ( ) ( ). to date, seven covs that can infect humans have been identified, and among these the severe acute respiratory continues to circulate and cause serious human disease, primarily in the arabian peninsula ( ). occasional spread to other countries has also occurred, including an outbreak with confirmed cases in south korea in ( - viable, while displaying a -to -fold increased mutation rate (eckerle, lu et al. , eckerle, becker et al. ). an alignment of cov nsp amino acid sequences is presented in fig. , including sars-cov- , which emerged during the course of this project.
it highlights the key motifs/residues of the two enzymatic domains of nsp , as well as other structural elements, like the nsp binding site, the hinge region connecting the exon and n -mtase domains, and three previously identified zinc finger domains ( , ). the alignment also illustrates the generally high degree of nsp sequence conservation across different cov (sub)genera. in the present study, we targeted all five predicted active site residues of the mers-cov exon domain (d , e , e , d , and h ) by replacing them with alanine as well as with more conservative substitutions (d to e or q; e to d or q). this yielded a total of exon active site mutants (fig. a). transcripts were electroporated into bhk- cells, which lack the dpp receptor required for natural mers-cov infection (chan, chan et al. , raj, mou et al. ) but are commonly used to launch engineered cov mutants because of their excellent survival of the electroporation procedure ( , , , , , ). as bhk- cells have a severely compromised innate immune response ( ), they would seem an appropriate cell line to launch exon knockout mutants, also in case the enzyme would be needed to counter innate immunity (becares, pascual-iglesias et al. , case, li et al. ). to amplify any progeny virus released, transfected bhk- cells were mixed with either innate immune-deficient (vero) or -competent (huh ) cells, both of which are naturally susceptible to mers-cov infection. in stark contrast to what was previously described for mhv and sars-cov, mutagenesis of exon active site residues was found to fully abrogate mers-cov replication. when transfected cell cultures were analyzed using immunofluorescence microscopy at days post transfection (d p.t.), abundant signal was always observed for wild-type mers-cov, but no sign of virus replication was observed for out of mutants tested (fig. ), regardless of whether vero or huh cells were used for propagation of recombinant virus.
furthermore, infectious progeny was not detected when transfected cell culture supernatants were analyzed in plaque assays ( fig. and data not shown). the single exception was the mutant carrying the conservative e d replacement in exon motif ii (fig. ) , which was alive but somewhat crippled, as will be discussed in more detail below. these results were consistent across a large the lack of mers-cov-specific rna synthesis was further analyzed using rt-pcr assays specifically detecting genomic rna or subgenomic mrna . rna specifying an a d substitution in the betacoronavirus-specific marker (βsm) domain of nsp , which has been predicted to be a non-enzymatic domain ( ) and is absent in alpha-and delta-coronaviruses ( , ). thus, we assumed that any changes in viral replication were likely caused by the e d mutation in nsp exon. the same virus stock was used to assess growth kinetics in huh cells (fig. b ) and vero cells (fig. a ), which were found to be very similar for wt and mutant virus. still, the e d mutant was found to be somewhat crippled, yielding smaller plaque sizes and somewhat lower progeny titers in huh cells (fig. b-c), but not in vero cells (fig. a ). we next examined the sensitivity of e d and wt virus to the mutagenic agent - fu, which intracellularly is converted into a nucleoside analogue that is incorporated into viral rna ( , ). previously, mhv and sars-cov exon knockout mutants were found to exhibit increased sensitivity to -fu treatment, in particular in multi-cycle experiments, which was attributed to a higher mutation frequency in the absence of exon-driven error correction ( ). we employed this same assay to assess the phenotype of the e d mutant in more detail, by performing plaque assays in huh previously, the exon activity of sars-cov nsp was found to be dramatically stimulated by the addition of nsp as co-factor ( ). 
consequently, we also expressed and purified mers-cov nsp and optimized the exon assay by testing different molar ratios between nsp and nsp (fig. a , left-hand side), different nsp concentrations (fig. b , left-hand side), and by different incubation times (fig. , left- hand side). mers-cov nsp exon activity was found to be stimulated by nsp in a dose-dependent manner (fig. a) , while nsp did not exhibit any nuclease activity by itself (fig. b, nsp lane) . the full-length substrate is more completely degraded when a fourfold (or higher) excess of nsp over nsp was used compared to the effect of merely increasing the nsp concentration in the assay (fig. b ). similar (bouvet, imbert et al. , ma, wu et al. . using a : ratio of nsp versus nsp , mers-cov exon activity was analyzed in a time-course experiment, (fig. ) . over time, the full-length substrate was progressively converted to a set of degradation products in the size range of - nt. we anticipated that the structure of the h rna substrate would change from a when the two proteins were combined in the same reaction, a strong increase of exon activity was observed for both nsp -nsp pairs, with the sars-cov pair appearing to be somewhat more processive than the mers-cov pair (fig. , lanes and ) . (fig. ) . we also evaluated the impact of the h c zf mutation, which -despite its conservative nature -yielded a crippled mutant virus (fig. ) and of two n -mtase mutations (discussed below). the n -mtase mutants displayed wt nsp -like exon activities (fig. ) , suggesting that -as in sars-cov nsp -exon and n -mtase activities are functionally separated ( ). analyzing the substrate degradation pattern of the h c mutant (fig. ) , the enzyme seems to be somewhat crippled when compared to wt nsp . this suggests that this mutation alters exon activity in vitro, potentially by affecting the structure of the exon domain, as zf is in close proximity of the nsp interaction surface ( ). 
however, a similar reduction of exon activity was observed for the e d mutant (fig. ) , which was much more viable than the h c recombinant mers-cov nsp was found to methylate gpppa, but not m gpppa (fig. a) , which yielded a signal that was similar to the background signal in assays lacking nsp or substrate (data not shown). methylation increased with time until reaching a plateau after min (fig. b) . the n -mtase activity of the various nsp mutants was compared with that of wt nsp after reaction times of and min (fig. c) . while the r a and d a control mutations fully inactivated the n - exon catalytic residues abolished all detectable viral rna synthesis (fig. ) and the release of viral progeny (fig. ) . the only exception was the conservative e d mutant, which was found to exhibit near-wild-type levels of exon activity ( fig. and ). based on nsp conservation (fig. ) and the viable phenotype of sars-cov and mhv exon-knockout mutants, mers-cov was expected to tolerate exon inactivation, in particular since the enzyme was proposed to improve the fidelity of cov replication without being essential for rna synthesis per se ( , , - , , ) . this notion is in the only viable mers-cov exon active site mutant obtained, e d, the catalytic motif was changed from deedh to the deddh that is characteristic of all members of the exonuclease family that exon belongs to ( , , ) . the phenotype of the e d virus mutant was comparable to that of wt virus (fig. ) . biochemical assays revealed that e d-exon enzyme is able to hydrolyze a dsrna substrate with an activity level approaching that of the wt protein (fig. ) . additionally, the e d mutant behaved in this study, we developed an in vitro assay to evaluate mers-cov exon activity using a largely double-stranded rna substrate ( fig. and ) . 
as previously observed is interchangeable between cov subgenera in its role as co-factor for the nsp ′-o-methyltransferase, which was attributed to the high level of conservation of the nsp -nsp interaction surface ( ). as nsp and nsp share a common interaction surface on nsp ( , , ), we explored whether a similar co-factor exchange was possible in the context of nsp 's exon activity, which was indeed found to be the case (fig. ). structurally, nsp interacts with nsp in a conformation that figuratively resembles a "hand (nsp ) over fist (nsp )" ( ). in the formation of this complex, nsp induces conformational changes in the n-terminal region of exon that adjust the distance between the catalytic residues in the back of the nsp palm and, consequently, impact exon activity ( ). the exchange of the nsp co-factor between the two beta-covs ), indicating that each of these residues is important for catalysis. our study suggests that, in addition to the active site residues, other motifs in mers-cov exon are also important for virus viability, specifically the two zf motifs, each of which was probed using two point mutations (fig. a). in previous zf studies, a mutation equivalent to h a created solubility issues during expression of recombinant sars-cov nsp ( ) and resulted in a partially active exon in the case of white bream virus, a torovirus that also belongs to the nidovirus order ( ). it was suggested that zf contributes to the structural stability of exon, as it is close to the surface that interacts with nsp ( ). here, we demonstrate that the more conservative h c replacement, which converts zf from a non-classical ccch type zf motif into a classical cccc type ( ), was tolerated during recombinant protein expression and yielded an exon that is quite active in vitro (fig. ). this likely contributed to the fact that the h c virus mutant retained a low level of viability (fig. 
), although its overall crippled phenotype and the non-viable phenotype of mutant c h clearly highlight the general importance of zf for virus replication. in contrast, the corresponding tgev mutant (zf-c) was not strongly affected and could be stably
key: cord- -zna qkkv authors: wirden, marc; feghoul, linda; bertine, mélanie; nere, marie-laure; le hingrat, quentin; abdi, basma; boutolleau, david; ferre, valentine marie; jary, aude; delaugerre, constance; marcelin, anne-genevieve; descamps, diane; legoff, jérôme; visseaux, benoit; chaix, marie-laure title: multicenter comparison of the cobas system with the realstar rt-pcr kit for the detection of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zna qkkv background rt-pcr testing is crucial in the diagnostic of sars-cov- infection. the use of reliable and comparable pcr assays is a cornerstone to allow use of different pcr assays depending on the local equipment. 
in this work, we provide a comparison of the cobas® (roche) and the realstar® assay (altona). methods assessment of the two assays was performed prospectively in three parisian reference hospitals, using clinical samples. they were tested with the cobas® assay, selected to obtain as wide a distribution of cycle threshold (ct) values as possible, and tested with the realstar assay with three widely available extraction platforms: qiasymphony (qiagen), magnapure (roche) and nuclisens-easymag (biomérieux). results overall, the agreement (positive for at least one gene) was %. this rate differed considerably depending on the cobas ct values for gene e: below (n = ), the concordance was %. regarding the positive ct values, linear regression analysis showed a coefficient of determination (r ) of . and the deming regression line revealed a strong correlation with a slope of . and an intercept of - . . bland-altman analysis showed that the mean difference (cobas® minus realstar®) was + . ct, with a sd of + . ct. conclusions in this comparison, both realstar® and cobas® assays provided comparable qualitative results and a high correlation when both tests were positive. discrepancies occurred above ct and varied depending on the extraction system used for the realstar® assay, probably due to a low viral load close to the detection limit of both assays. in this study, we compared two different widely used tests in three major parisian university hospital laboratories. these are the realstar® sars-cov- rt-pcr kit . (altona diagnostics, france), which can be combined with different extraction and amplification devices, and the cobas® sars-cov- kit used on the cobas® system (cobas ; roche diagnostics, mannheim, germany). samples in april , patients were included in this prospective study performed in virological laboratories located in paris (saint louis hospital (n= ), bichat hospital (n= ) and la salpêtrière hospital (n= )).
then, each laboratory selected to samples first detected with the cobas assay, stratified according to the ct of the cobas e gene result so as to cover the whole linear range of the assays. thus, three categories were retained: ct < , ct between and , and ct > . on the same day or within hours, the leftover samples stored at + °c were tested with the realstar assay. thirty nasopharyngeal swab samples collected in (in the pre-epidemic covid period) were also tested with both techniques ( in each laboratory). the cobas® sars-cov- test is a single-well dual target assay, which targets the non-structural orf a/b region specific to sars-cov- and the structural protein envelope e gene for pan-sarbecovirus detection. the test uses an rna internal control for sample extraction and pcr amplification process control. to take into account the available sample volume and the safety conditions required for this virus before loading on the cobas system, the pre-analytical protocol was adapted as recommended by the manufacturer as follows: μl of each sample were transferred at room temperature into barcoded secondary tubes containing μl of cobas lysis buffer for sars-cov- neutralization. then, the tube was loaded on the cobas where μl from those μl were used for rna extraction, and eluted in μl of which µl were used for the rt-pcr. the test was performed in positive results either for both orf a/b and e genes or for the orf a/b gene only. in the case of single e gene positivity, the result should be reported as sars-cov- presumptive positive and the test repeated, but such results were considered positive for this study. the realstar® sars-cov- rt-pcr kit . assay targets the e gene specific for sarbecoviruses, and the s gene specific for sars-cov- . it includes a heterologous the cobas assay.
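the bland-altman comparison of paired ct values reported in the results can be sketched as follows. this is a minimal illustration, not the authors' analysis code: the function name `bland_altman` and the paired ct values are hypothetical, and the 95% limits of agreement (bias ± 1.96 sd) are the conventional choice rather than something stated in the text.

```python
from statistics import mean, stdev


def bland_altman(ct_a, ct_b):
    """Return (bias, sd, (lower, upper)) for paired measurements.

    bias  : mean of the paired differences (a minus b)
    sd    : sample standard deviation of the differences
    limits: conventional 95% limits of agreement, bias +/- 1.96 * sd
    """
    if len(ct_a) != len(ct_b) or len(ct_a) < 2:
        raise ValueError("need two paired series with at least two values")
    diffs = [a - b for a, b in zip(ct_a, ct_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical paired ct values (assay a minus assay b), for illustration only.
cobas_ct = [20.0, 25.0, 30.0, 35.0]
realstar_ct = [19.0, 23.0, 29.0, 33.0]

bias, sd, (lower, upper) = bland_altman(cobas_ct, realstar_ct)
print(f"mean difference: {bias:+.2f} ct, sd: {sd:.2f} ct")
print(f"95% limits of agreement: {lower:+.2f} to {upper:+.2f} ct")
```

a positive bias here means that the first assay tends to report higher ct values than the second for the same sample, which is the direction of the difference described for cobas minus realstar.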
this is in accordance with the cobas insert information reporting a higher sensitivity for the e gene detection than for the orf /a, and also a drop in the positivity rate above ct for the e target. this may explain why the realstar test yielded many negative results in such cases, as both tests probably reached their detection limits. this is a limitation of our study: we did not comparatively assess the limit of detection of the two methods, but rather the reliability of their ct values among cobas-positive samples, thereby excluding samples that could be negative with cobas and positive with realstar in this range of low viral loads. our work highlights the impact of the extraction system on the sensitivity of the realstar assay. overall, we demonstrated the good performance of, and concordance between, the two assays, at least for viral loads above the detection limit of both assays. this concordance allows ct values obtained from both methods to be reliably compared. however, the variation observed between the ct values of the two assays, evaluated here as about . additional ct with the cobas assay, has to be taken into account when ct values are followed up in the most severe patients if the two methods are used successively, depending on reagent and analyser availability.
the authors declare no conflict of interest. we acknowledge all the laboratory staff of saint louis hospital virology department, bichat hospital virology department and la pitié-salpêtrière hospital virology department.
key: cord- -wyx ib s authors: sinegubova, maria v.; orlova, nadezhda a.; kovnir, sergey v.; dayanova, lutsia k.; vorobiev, ivan i title: high-level expression of the monomeric sars-cov- s protein rbd - in stably transfected cho cells by the eef a -based plasmid vector date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wyx ib s the spike (s) protein is one of the three proteins forming the coronaviruses’ viral envelope. the s protein of the severe acute respiratory syndrome coronavirus (sars-cov- ) has a spatial structure similar to the s proteins of other mammalian coronaviruses, except for a unique receptor-binding domain (rbd), which is a significant inducer of host immune response. recombinant sars-cov- rbd is widely used as a highly specific minimal antigen for serological tests. correct exposure of antigenic determinants has a significant impact on the accuracy of such tests – the antigen has to be correctly folded, contain no potentially antigenic non-vertebrate glycans, and, preferably, should have a glycosylation pattern similar to the native s protein. 
based on the previously developed p . vector, containing the regulatory sequences of the eukaryotic translation elongation factor alpha gene (eef a ) from chinese hamster, we created two expression constructs encoding sars-cov- rbd with c-terminal c-myc and polyhistidine tags. rbdv contained the native viral signal peptide, and rbdv contained the human tpa signal peptide. we transfected a cho dg cell line, selected stably transfected cells, and performed a few rounds of methotrexate-driven amplification of the genetic cassette in the genome. for the rbdv variant, a high-yield clonal producer cell line was obtained. we developed a simple purification scheme that consistently yielded up to mg of rbd protein per liter of simple shake flask cell culture. purified proteins were analyzed by polyacrylamide gel electrophoresis in reducing and non-reducing conditions and by gel filtration; for rbdv protein, the monomeric form content exceeded % for several series. deglycosylation with pngase f and mass spectrometry confirmed the presence of n-glycosylation. the antigen produced by the described technique is suitable for serological tests and similar applications. humanity is faced with an unprecedented challenge: the pandemic of coronavirus disease (covid- ), a severe respiratory illness caused by the severe acute respiratory syndrome coronavirus (sars-cov- ). countries went into lockdown; people could not make informed decisions about the possibility of social contacts; the need for diagnostic tests is very high. existing tests for sars-cov are reviewed in [ ]. at the beginning of the pandemic, pcr testing methods dominated, since such test systems can be developed urgently, soon after the emergence of a new virus in the population. the disadvantages of pcr tests include a high sensitivity to contamination, dependence on correct sampling, and a high proportion of false-positive signals. 
unlike pcr diagnostics, serological testing gives positive results long after the event of infection, at least for several months. this testing method makes it possible to reliably determine whether a person has been infected with sars-cov- , even in the absence of disease symptoms. we need serological tests, both in express format and as screening tests based on elisa. serological tests are also needed to identify convalescent plasma of therapeutic interest and to assess the effectiveness of emerging vaccines. in order for serological testing to have a more significant predictive value, mapping of the epitopes to which neutralizing antibodies appear should be carried out, as was done for sars-cov [ ], and convalescent or postvaccinal sera should be tested on a large scale for the presence of neutralizing antibodies, for example, with a surrogate virus neutralization test based on antibody-mediated blockage of ace -spike protein-protein interaction [ ] or another test that can be carried out on a relatively large scale. the use of highly specific and high-affinity viral antigens is already a big step towards improving diagnostic accuracy. the immunodominant antigen of sars-cov is the rbd domain of the spike protein [ ]. another antigen widely used for diagnostics, the nucleocapsid (n) protein, combines high sensitivity and low specificity; therefore, it needs accurate antigen mutagenesis to remove highly conserved areas without compromising affinity. cases are described for sars-cov in which the results of testing with n-protein were clarified using two subunits of spike protein [ ]. the coronaviruses' spike (s) protein forms large corona-like protrusions on the virion surface, hence the name of the family coronaviridae. the s protein plays a crucial role in receptor recognition, cell membrane fusion, internalization of viruses, and their exit from the endosomes. it is described in detail in the review [ ]. 
it consists of s and s subunits and, in the case of the sars-cov- virus, has amino acids [ ] . the s protein is co-translationally incorporated into the rough endoplasmic reticulum (er) and is glycosylated by n-linked glycans. glycosylation is essential for proper folding and transport of the s protein. the s protein trimer is transported from the er. interacting with the m and e proteins s protein trimer is transported to the virus's assembly site. s protein is required for cell entry but not necessary for virus assembly [ ] . during their intracellular processing, s proteins of many types of coronaviruses, including sars-cov- and mers-cov, but not sars-cov, undergo partial proteolytic degradation at the furin signal protease recognition site with the formation of two subunits s and s . apparently, most of the s protein copies on the membrane of sars-cov- viral particles are trimers of s subunits that are incapable of interacting with the receptor. the full-length s protein trimer on the viral particle's surface also undergoes complex conformational rearrangements during the formation of the rbd-receptor complex and the virus's penetration into the cell. the s protein homotrimer binds to the ace dimer, detailed study of this interaction is available here [ ] . as part of the trimer, the spike protein's monomer "moves its head" -the s subunit can form the open or closed conformation; that is, it can have a raised fragment or a lowered rbd domain, this can influence the affinity of antibodies targeted to it. the s protein of sars-cov- amino acid sequence is variable, with more than and relatively frequent s protein amino acid variations. a glycan shield is formed by n-linked glycans on the s protein surface, which is likely to help viral immune escape. 
in a comparative study of genome-wide sequencing data of natural isolates of sars-cov- [ ], all potential n-glycosylation sites within the s protein's ectodomain were completely conserved across the detected variants of the s protein, which confirms the importance of each of these sites for maintaining the integrity of the s protein oligosaccharide envelope. it should be noted that not all of the potential n-glycosylation sites are occupied: for s and s subunits obtained from transiently transfected hek cells [ ], n-glycosylation events were experimentally confirmed only for out of sites. in addition, at least one o-glycosylation site with mucin-like structures was experimentally found inside the rbd-domain area of the s subunit. non-vertebrate cells may be used to produce the s protein or its fragments; in this case, n-glycans are present mostly in the form of bulky high mannose or paucimannose structures, possibly blocking the interaction of antibodies with the folded s protein [ ]. computational modeling of the glycan shield, performed for the hek -derived s protein, revealed that in the case of human cells, around % of the protein's surface is effectively shielded from igg antibodies [ ]. the use of full-length s protein for practical serological testing is nearly impossible due to its insolubility, caused by the presence of the transmembrane domain. an artificial trimer of its ectodomain has been successfully used as an antigen in serological tests; however, such a complex protein cannot be obtained in large quantities in mammalian cells, apparently due to the limitation on the folding of the trimerized, abundantly glycosylated protein and subsequent difficulties in its isolation and purification. it is generally believed that the sars-cov- s protein receptor-binding domain is a minimal proteinaceous antigen, adequately resembling the immunogenicity of the whole spike protein. 
this domain contains only two occupied n-glycosylation sites [ ] and - occupied o-glycosylation sites. it does not contribute to the trimer formation, and its surface is mostly unshielded. isolated rbds of the s proteins of beta-coronaviruses were produced in various expression systems. bacterial expression of the rbd from mers-cov produced no soluble target protein, and refolding attempts were also unsuccessful [ ]. the budding yeast pichia pastoris was a suitable host for the secretion of mers-cov rbd with at least two (of three) n-linked glycosylation sites present. similar data were obtained for the rbd from sars-cov virus: removal of all n-glycosylation sites resulted in a sharp drop of the protein secretion rate in the p. pastoris yeast, and in the case of the full rbd domain (residues - ), secretion of the unglycosylated target protein stopped completely [ ]. it may be proposed that the addition of n-glycans at these sites is needed for correct folding of the rbd in the er of eukaryotic cells. the sars-cov- s protein rbd, expressed in e. coli, was also detected only as inclusion bodies and was found to be unreactive even on blotting [ ]. hyperglycosylated yeast-derived sars-cov- rbd was obtained in reasonable quantities ( mg/l in bioreactor culture) by the p. pastoris expression system and successfully used for immunization of mice [ ]. unfortunately, yeast-derived glycosylated proteins contain immunogenic glycans and cannot be used for immune assays with human antibodies. similarly, sars-cov- rbd may be produced in the nicotiana benthamiana plant, resulting in the addition of non-vertebrate n-glycans, potentially reactive with human antibodies [ ]. most early preprints and peer-reviewed articles describing production methods for the sars-cov- s protein and its rbd domain were focused on transient transfection of hek cells [ ] [ ] and purification of small protein lots in a very short time. for example, d. 
stadlbauer [ ] reports more than mg/l target protein titer in transiently transfected hek- cells. simultaneously, the scalability of transiently transfected cell lines cultivation is still questionable, and gram quantities of rbd, needed for large scale in vitro diagnostic activity, may be produced only by stably transfected cell lines. previously we have developed the plasmid vector p . , containing large fragments of non-coding dna from the eef a gene of the chinese hamster and fragment of the epstein-barr virus long terminal repeat concatemer [ ] and employed it for unusually high-level expression of various proteins in cho cells, including blood clotting factors viii [ ] , ix [ ] , and heterodimeric follicle-stimulating hormone [ ] . cho cells were successfully used for transient sars-cov rbd expression at mg/l secretion level [ ] . we have proposed that sars-cov- rbd, suitable for in vitro diagnostics use, may be expressed in large quantities by stably transfected cho cells, bearing the eef a -based plasmid. p . -tr -rbdv construction. the rbd coding sequence was synthesized according to [ ] . the dna fragment encoding the rbdv orf with kozak consensus sequence and c-terminal c-myc and xhis tags were obtained by pcr using primers ad-cov-absf and ad-rbd-myc hnher (listed in table the resulting ptm vector was sequenced as described above, available from addgene, plasmid # . ptm-rbdv construction. rbd orf was amplified using adaptor primers ad-sfr -nhef and ad-sfr -xmar restricted by nhei and xmai (sibenzyme, novosibirsk, russia) and cloned into ptm vector, restricted by nhei and asigi (sibenzyme, novosibirsk, russia). the resulting construct was sequenced using sq- ch -f and sq-mych-r primers. ptm-rbdv is available from addgene, plasmid # plasmids for cell transfections were purified by the plasmid midiprep kit (evrogen, moscow, russia) and concentrated by ethanol precipitation in sterile conditions. 
the transgene copy number in the cho genome was determined by quantitative real-time pcr (qpcr) as described in [ , ]. serial dilutions of p . -egfp [ ] or pgem-rab plasmids were used to generate calibration curves. the weight of one cho haploid genome was taken as pg, according to [ ]. genomic chinese hamster ovary dg- cells (thermo fisher scientific) were cultured in the procho medium (lonza, switzerland), supplemented with mm glutamine, mm alanyl-glutamine and hypoxanthine-thymidine supplement (ht) (paneco, moscow, russia). cells were grown as a suspension culture in sterile ml erlenmeyer flasks with vented caps, routinely passaged to days with centrifugation ( g, min) and seeding density - * cells/ml. the - µg of each plasmid were precipitated by the addition of % ethanol and m sodium acetate, washed with % ethanol, dried, and resuspended in µl of sterile r-buffer, neon transfection kit (thermo seeding cell culture was grown in ml erlenmeyer shake flasks with ml of lonza procho medium, supplemented with mm glutamine, mm alanyl-glutamine and - µm mtx until the cell concentration exceeded - . million cells/ml. the cell suspension was transferred to four ml erlenmeyer flasks, each containing ml of culture medium, and grown to the same cell density. the entire cell suspension was transferred to a single l erlenmeyer flask with l culture medium, final seeding density - * cells/ml. cells were cultured for three days; on the fourth day of culture, daily glucose measurements were started. glucose concentration in the cell supernatant was measured by the accutrend plus system (roche, switzerland); if the glucose level was below mm, glucose was added up to mm as a sterile % solution. the culture in the l flask was grown for to days until the cell viability, measured by trypan blue exclusion, dropped below %. the clonal cell line was obtained by the limiting dilution method from the cell population, cultured in µm mtx. 
methotrexate was omitted in the culture medium for two d passage before cloning. cells were additionally split by : dilution hours before the cloning procedure. cells were diluted in excell-cho (merck, germany) culture medium supplemented with mm glutamine, mm alanyl-glutamine, ht and % of untransfected cho dg conditioned medium resulting in seeding density . cell/well, and the suspension was seeded into -well plates ( μl/well). plates were left undisturbed for days at °c, % co atmosphere. wells with single colonies were screened by microscopy; well grown colonies were detached by pipetting and transferred to the wells of -well plate, containing ml of the excell-cho, supplemented as described above and grown for days undisturbed. product titer was measured by elisa, as described below, wells with highest rbdv titer were used for further cultivation. best-producing clonal cell lines were transferred to ml erlenmeyer flasks with the procho culture medium supplemented with mm glutamine, mm alanyl-glutamine and µm mtx and after days in suspension culture, the best producing clone was determined by measuring the product titer and cell concentration. sds-page was performed with the . % acrylamide in the separating gel, in reducing conditions, if not stated otherwise, with the pageruler prestained marker, µl/lane (thermofisher scientific). gels were stained by the colloidal coomassie blue according to [ ] , scanned by the conventional flatbed scanner in the transparent mode as -bit grayscale images and analyzed by the totallab tl gel densitometry software (nonlinear dynamics, uk). sds-page was performed as described above, protein transfer, blocking, hybridization and color development were done according to [ ] using nitrocellulose transfer membrane (gvs group, bologna, italy) and towbin buffer with methanol. 
primary anti-c-myc antibody (sci store, moscow, russia #psm - ) was used at the : dilution, anti-mouse-hrp conjugate (abcam, cambridge, uk, ab ) was used at : dilution; membrane was developed by the dab-metal substrate and scanned by the flatbed scanner in the reflection mode. multimeric forms of the rbd were quantified by size exclusion chromatography, utilizing waters extracts were vacuum-dried and redissolved in the . % trifluoroacetic acid (tfa), % acn solution. prepared solutions were mixed at : ratio with % α-cyano- -hydroxycinnamic acid (merck) solution in % acn, . % tfa on the target plate. solutions of intact and deglycosylated proteins were passed through the ziptip c microcolumns (millipore), washed and eluted according to manufacturer protocol. one and a half µl of protein solutions were mixed on the target plate with . µl of the % , -dihydroxybenzoic acid (merck) solution in % acn, . % tfa. mass spectra were obtained by the maldi-tof mass spectrometer ultraflextreme peptides identification was performed by the gpmaw . software (lighthouse data, denmark) and by the mascot server (matrix science, boston, usa). glycopeptides mass assignment was performed by the glycomod online software tool [ ] . sandwich elisa with anti-s protein antibodies was performed using a prototype of the sars-cov- antigen detection kit (xema co., ltd., moscow, russia, a generous gift of dr. yuri lebedin). pre-covid- normal human plasma sample (renam, moscow, russia) was used for preparation of the sars-cov- negative serum sample. serum samples of five patients with the pcr-confirmed sars-cov- infection were pooled for testing and one serum sample with the borderline igg titer level was tested separately. the blood sampling protocol conformed to the local hospital human ethics committee guidelines. antibody capture elisa with human serum samples was performed according to [ ] at the ng per well antigens load. 
antigens were applied on elisa -well plates (corning, usa) overnight at + °c, in pbs. the t-test was performed using the graphpad quickcalcs web site: https://www.graphpad.com/quickcalcs/ttest .cfm (accessed november ). the native n-terminal signal peptide of sars-cov- s protein (amino acid sequence mfvflvllplvssq) was fused to the rbd sequence ( - , according to yp_ . ) and joined with a c-terminal c-myc epitope (eqkliseedl), a short linker sequence, and a hexahistidine tag. the n-terminal part of the rbdv gene was constructed according to [ ] , utilizing the optimized codon usage gene structure. c-terminal tags were not optimized for codon usage frequencies. the resulting synthetic gene was cloned into the p . -tr vector plasmid, a shortened derivative of the p . plasmid [ ] , and used for transfection of dhfr-deficient cho dg cells. the resulting expression plasmid p . -tr -rbdv [genbank: mw ] is shown in fig a. the stably transfected cell population was obtained by selection in the presence of nm of the dhfr inhibitor methotrexate; an rbd titer of . mg/l was detected for the -days culture (fig. s ). one-step target gene amplification was performed by increasing the mtx concentration tenfold and maintaining the cell culture for days until cell viability was restored to more than %; the resulting polyclonal cell population could secrete up to , mg/l rbd in the -days culture. the target protein was purified by a single imac chromatography step, utilizing the ida-based resin chelating sepharose fast flow (cytiva), ni + ions, and step elution by increasing imidazole concentrations (fig b, fig c) . the resulting protein production method was found to be sub-optimal due to an unexpectedly low secretion rate, signs of cellular toxicity of the target gene: - h cell duplication time, maximal cell density in shake flask of . mln cells/ml (fig s ) , and an unacceptable level of contaminant proteins co-eluting with the rbdv . 
at the same time, the rbdv protein was stable in the culture medium during the extended batch cultivation of cells for at least days (fig d) , making the long-term feed batch cultivations a viable option for its production in large quantities. we proposed that target protein secretion rate and its purity after one-step purification could be significantly improved by a simultaneous shift of the rbd domain boundaries, exchange of the sars-cov- s protein native signal peptide to the signal peptide of more abundantly expressed protein, two-step genome amplification and switch from ida-based resin to the nta-based one (fig e) . human tissue plasminogen activator signal peptide (htpa sp, amino acid sequence mdamkrglccvlllcgavfvsas) is commonly used for heterologous protein expression in mammalian cells. it was successfully used for the expression of sars-cov s protein in the form of dna vaccine [ ] and envelope viral protein gp [ ] . in the case of mers-cov s protein rbd -fc fusion protein, various heterologous signal peptides modulate target protein secretion rate by the factor of two [ ] . corrected boundaries of the sars-cov- rbd were determined according to the cryo-em data [pdb id: vxx] [ ] obtained for the trimeric sars-cov- s protein ectodomain. initially used - coordinates, described in the [ ] include one unpaired cys residue originated from the n-terminal part of the next domain sd (structural domain ), so we excluded lys from the n-terminus of the mature rbd protein, aiming at the maximization of signal peptide processing, and removed c-terminal aminoacids c vnf , which form the structure of the sd domain. both linker areas surrounding the folded rbd domain core remain present in the rbdv protein ( - , according to yp_ . ). 
additionally, we redesigned c-terminal tags by introducing the pro residue immediately upstream of the c-myc tag, adding the short linker sequence sagg between the c-myc tag and polyhistidine tag, and extending the polyhistidine tag up to residues. we expected this structure to expose the c-myc tag properly on the protein globule's surface and move the decahistidine tag away from possible masking negatively charged protein surface areas. we constructed an expression vector ptm [genbank: mw ], where consensus kozak sequence, htpa sp and c-myc and -histidine tags are coded in the polylinker. rbd coding fragment was cloned in-frame, resulting ptm-rbdv expression plasmid [genbank: mw ] is shown on fig. a . cho dg cells were transfected by the ptm-rbdv plasmid, stably transfected cell population was established at the nm mtx selection pressure. target protein titer was similar to the previous plasmid design - . mg/l for -days culture, but after one step of the mtx-driven genome amplification, it increased eleven-fold to . mg/l at µm mtx ( fig b) and then increased by a factor of . after second amplification step at µm mtx, resulting titer was . mg/l for -days culture (fig a) . a steady increase of the target protein titer was detected for the extended batch cultivation of polyclonal cell population obtained at µm mtx, peaking at mg/l at days of cultivation in the l shake flask (fig d, e) . a similar ratio of product titer increase after multi-step mtx-driven genome amplification was described for the mers-cov rbd - -fold increase after steps of consecutive increments of mtx concentration, overall amplification period length was days [ ] . vector plasmid ptm, used in this study, allowed a much more rapid amplification course -a -fold titer increase in two steps, days total. 
all cell populations secreting rbd proteins were analyzed by quantitative pcr, and it was found that the increased productivity of populations adapted to higher concentrations of mtx corresponds to higher copy numbers of the target gene (fig c) . higher cell productivity in the case of rbdv protein was not due to higher target gene copy numbers than in the case of rbdv . the cell culture medium pro cho (lonza), utilized in this study, contains unknown components that block the interaction of his-tagged rbd protein with the ni-nta chromatography resin. clarified conditioned medium, used for protein purification, was concentrated approximately tenfold by tangential flow ultrafiltration on the kda mwco cassettes and completely desalted by diafiltration, diafiltration volumes of the mm imidazole-hcl, ph . solution. rbdv and rbdv proteins were purified by imac utilizing ni-nta agarose (thermo fisher scientific, usa) in the same conditions. desalted conditioned medium was applied onto the column in the presence of mm imidazole; the column was washed by the solution containing elution was performed by the mm imidazole solution; a further column strip by the mm edta-na solution revealed no detectable target protein rbdv in the eluate (fig c) . purified proteins were desalted by another round of ultrafiltration/diafiltration on centrifugal concentrators with kda mwco membranes; the diafiltration solution was pbs; final concentration - mg/ml. purified proteins were flash-frozen in liquid nitrogen and stored frozen in aliquots. the overall protein yield for rbdv was %; mg of purified rbdv were obtained from the l shake flask culture. the apparent molecular weight of intact rbdv was determined as . kda, deglycosylated rbdv - . kda, theoretical molecular weight - da. the rbdv molecular weight was determined as . kda for the intact protein, deglycosylated protein - . kda, theoretical molecular mass - da (fig a) . 
both protein variants possess two distinct forms of intramolecular disulfide bonds sets, visible as two closely adjacent bands in non-reducing conditions and complete absence of such band pattern in reducing conditions. previously it was reported that sars-cov- rbd - , expressed transiently in hek- cells, tends to form a covalent dimer, around % from the total, visible as the kda band on the denaturing gel in nonreducing conditions [ ] . we confirmed this observation; in the case of stably transfected cho cells, covalent dimerization was also % according to gel densitometry data. at the same time, it should be noted that the rbdv protein, redesigned explicitly for mitigation of this unwanted dimerization and containing an even number of cys residues, still forms % of the covalent dimer. purified rbdv was tested by size exclusion chromatography. the major monomer form's apparent molecular weight was determined as . kda (fig s ) , admixtures peaks apparent molecular masses corresponded well to rbd dimer, tetramer, and two high molecular mass oligomers accounting for % of all peak areas (fig b) . mass-spectrometry analysis of rbdv and rbdv revealed that both proteins' molecular masses diminished ( fig s , s ). this long peptide was completely absent in both spectra of de-glycosylated proteins (table s -s ) . a more detailed analysis of this area of the rbd protein may be of some interest for the s protein structure-function investigation but is out of scope for the present study. purified rbd variants were used as antigens for microplates coating and subsequent direct elisa with pooled sera obtained from patients with the rt-pcr-confirmed covid- diagnosis, weakly positive serum sample from the rt-pcr-confirmed covid- patient, and serum sample obtained from a healthy volunteer before december (fig e) . both rbd variants perform equally -all serum samples produce highly similar od readings for all dilutions tested with both antigens. 
here we describe a method of generating stably transfected cho cell lines secreting large quantities of monomeric sars-cov- rbd, suitable for serological assays. at present, serological assays for the detection of seroconversion upon sars-cov- infection are mostly based on two viral antigens: nucleoprotein (np) and s protein or fragments of the s protein, including the rbd. there are various reports on the specificity and sensitivity of assays based on these two antigens. in some cases, the sensitivity of clinically approved np-based assays was challenged by direct re-testing of np-negative serum samples by rbd-based assays [ ] . other studies question the specificity of np-based elisa tests, demonstrating a significant level of false-positive results for the full-length sars-cov- np [ ] . it may be proposed that testing of serum samples with both sars-cov- antigens will produce the most accurate results, as was done, for example, in the south-east england population study [ ] ; this conclusion was made in the microarray study of a limited number of patient serum samples [ ] . it is not yet clear which part of the s protein is the optimal antigen for serological assays; microarray analysis revealed that the s fragment generates more false-positive results than s or rbd antigen variants [ ] in the case of igg detection, while at the same time the rbd protein generated much lower signals on covid- patient serum samples than s or s +s antigens. in another microarray study it was found that the igg response toward the rbd domain in convalescent plasma samples correlates well with the response toward the full-length soluble s protein [ ] . in the conventional elisa test format, rbd demonstrated nearly % specificity and sensitivity on a limited number of sars-cov- patient and control serum samples [ ] . as of . . , at least various immunoassays for sars-cov- antibodies were authorized for in vitro diagnostic use in the eu [ ]; many of them use rbd as the antigen. 
a simple elisa screening test with the -well microplate will consume around µg of the rbd antigen for test samples, so even one million tests will require mg of the purified rbd protein, making the antigen supply a critical step in the production of such tests. a method for generating a highly productive stably transfected cho cell line secreting the rbd protein may be important for ivd test manufacturers in securing sources of rbd antigen with highly predictable properties. although the rbd fragment of the s protein from sars-cov- is not the most popular antigen variant in the current efforts of anti-sars-cov- vaccine development [ ] , it may be considered a viable candidate for a simple subunit vaccine. it induced a significant protective immune response in rodents without signs of the ade effect [ ] , and some rbd-based protein subunit vaccines have advanced to phase ii clinical trials. cultured cho cells are a reliable source of rbd protein for this kind of vaccine; at the productivity level achieved in our study, only m of cell culture supernatant will provide enough antigen material for mln of typical µg/vial vaccine doses. c, d - protein sequence coverage by tryptic peptides, maldi-tof analysis. glycosylated peptides found are not pictured, signal peptides are yellow, detected tryptic peptides are violet, experimentally obtained masses, [m+h]+, are stated in the boxes. e - immunoreactivity of rbdv and rbdv by elisa with pooled serum samples from pcr-positive patients, (+)pooled; single serum sample from a pcr-positive patient, (+); and pre-covid- pooled sera, (-). all sera samples were analyzed in duplicates, data are means. supporting figure s . cell growth and viability dynamics of initial selection and mtx-driven target gene amplification. supporting figure s . cell growth curve for the extended batch cultivation of rbdv and rbdv producing cell populations, µm mtx selection pressure. supporting figure s . 
size exclusion chromatography trace of molecular mass calibrators and molecular mass calibration curve. supporting figure s . maldi-tof spectra traces of intact proteins in glycosylated and deglycosylated forms. supporting figure s . maldi-tof spectra traces of tryptic peptide mixtures from intact and deglycosylated rbdv . supporting figure s . maldi-tof spectra traces of tryptic peptide mixtures from intact and deglycosylated rbdv . supporting table s . peptide mass list of the rbdv intact protein, in-gel digestion, reduced protein. supporting table s . peptide mass list of the rbdv intact protein, in-gel digestion, reduced protein. supporting molecular and immunological diagnostic tests of covid- : current status and challenges. iscience antigenic and immunogenic characterization of recombinant baculovirus-expressed severe acute respiratory syndrome coronavirus spike protein: implication for vaccine design a sars-cov- surrogate virus neutralization test based on antibody-mediated blockage of ace -spike protein-protein interaction the receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in sars-cov- patients false-positive results in a recombinant severe acute respiratory syndrome-associated coronavirus (sars-cov) nucleocapsid-based western blot assay were rectified by the use of two subunits (s and s ) of spike for detection of antibody to sars-cov structure, function, and evolution of coronavirus spike proteins structural and functional properties of sars-cov- spike protein: potential antivirus drug development for covid- fenner and white's medical virology molecular interaction and inhibition of sars-cov- binding to the ace receptor variations in sars-cov- spike protein cell epitopes and glycosylation profiles during global transmission course of covid- . 
front immunol deducing the n-and o-glycosylation profile of the spike protein of novel coronavirus sars-cov- site-specific n-glycosylation characterization of recombinant sars-cov- spike proteins analysis of the sars-cov- spike protein glycan shield reveals implications for immune recognition engineering a stable cho cell line for the expression of a mers-coronavirus vaccine antigen. vaccine yeast-expressed recombinant protein of the receptor-binding domain in sars-cov spike protein with deglycosylated forms as a sars vaccine candidate recombinant sars-cov- spike proteins for sero-surveillance and epitope mapping. biorxiv structural and functional comparison of sars-cov- -spike receptor binding domain produced in pichia pastoris and mammalian cells rapid production of sars-cov- receptor binding domain (rbd) and spike specific monoclonal antibody cr in nicotiana benthamiana purification of recombinant sars-cov- spike, its receptor binding domain, and cr mab for serological assay sars-cov- seroconversion in humans: a detailed protocol for a serological assay, antigen production, and test setup improved elongation factor- alpha-based vectors for stable high-level expression of heterologous proteins in chinese hamster ovary cells stable high-level expression of factor viii in chinese hamster ovary cells in improved elongation factor- alpha-based system a highly productive cho cell line secreting human blood clotting factor ix high-level expression of biologically active human follicle stimulating hormone in the chinese hamster ovary cell line by a pair of tricistronic and monocistronic vectors a -mer cho-expressing receptor-binding domain of sars-cov s protein induces potent immune responses and protective immunity a serological assay to detect sars-cov- seroconversion in humans eukaryotic genome size databases highly sensitive and fast protein detection with coomassie brilliant blue in sodium dodecyl sulfate-polyacrylamide gel electrophoresis antibodies : a laboratory 
manual glycomod--a software tool for determining glycosylation compositions from mass spectrometric data identification of two neutralizing regions on the severe acute respiratory syndrome coronavirus spike glycoprotein produced from the mammalian expression system extracellular matrix proteins mediate hiv- gp interactions with alpha beta structure, function, and antigenicity of the sars-cov- spike glycoprotein testing for responses to the wrong sars-cov- antigen? whole nucleocapsid protein of severe acute respiratory syndrome coronavirus may cause false-positive results in serological assays estimates of the rate of infection and asymptomatic covid- disease in a population sample from se england analysis of sars-cov- antibodies in covid- convalescent blood using a ): p. . . database. foundation for innovative new diagnostics. sars-cov- diagnostic pipeline a systematic review of sars-cov- vaccine candidates the sars-cov- receptor-binding domain elicits a potent neutralizing response without antibody-dependent enhancement we thank mr. arthur isaev (genetico, moscow, russia) and dr. alexander ivanov (institute of molecular biology russian academy of sciences, moscow, russia) for valuable comments and early access to the sars-cov- s protein sequence data, dr. yuri lebedin, eugenia kostrikina and xema co., ltd., for providing anti-rbd mabs and conjugates.the measurements were carried out on the equipment of the shared-access equipment centre "industrial biotechnology" the research center of biotechnology of the russian academy of sciences. dna sequencing was carried out in the inter-institutional center for collective use "genome" imb ras, organized with the support of the russian foundation of basic research.the authors would like to acknowledge all the doctors who diagnose and treat patients during the covid- pandemic. 
primers for rbdv cloning, restriction sites are underlined
ad-cov-absf: aacctcgaggccgccaccatgttcatgccttctt
ad-rbd-myc hnher: gctagcctaatggtgatggtgatgatgaccggtatgcatattcagatcctcttctgagatgagtttttgttcgaagttcacgcatttgtt
primers for ptm construction, sticky ends of annealed pairs are underlined
ctagtgatggtgatggtgatggtgatggtgatgaccgcctgcagacagatcctcttcgctgatcagtttttgttcaccggta
primers for rbdv cloning, restriction sites are underlined
ad-sfr -nhef: gctagcgtgcagcccaccgaatcc
ad-sfr -xmar: cccgggtttgttcttcacgagattggt
sequencing primers
sq- ch -f: gccgctgcttcctgtgac
iresa rev: aggtttccgggccctcacattg
sq-mych-r: gatgaccgcctgcagac

key: cord- - dtk kyh authors: nguyen, thanh thi; abdelrazek, mohamed; nguyen, dung tien; aryal, sunil; nguyen, duc thanh; khatami, amin title: origin of novel coronavirus (covid- ): a computational biology study using artificial intelligence date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dtk kyh the origin of the covid- virus has been intensely debated in the scientific community since the first infected cases were detected in december . the disease has caused a global pandemic, leading to the deaths of thousands of people across the world, and thus finding the origin of this novel coronavirus is important in responding to and controlling the pandemic. recent research results suggest that bats or pangolins might be the original hosts for the virus based on comparative studies using its genomic sequences. this paper investigates the covid- virus origin by using artificial intelligence (ai) and raw genomic sequences of the virus. more than genome sequences of covid- infected cases collected from different countries are explored and analysed using unsupervised clustering methods. the results obtained from various ai-enabled experiments using clustering algorithms demonstrate that all examined covid- virus genomes belong to a cluster that also contains bat and pangolin coronavirus genomes. 
this provides evidence strongly supporting scientific hypotheses that bats and pangolins are probable hosts for the covid- virus. at the whole genome analysis level, our findings also indicate that bats are more likely to be the hosts for the covid- virus than pangolins. the covid- pandemic has rapidly spread across many countries and disturbed the lives of millions of people around the globe. there have been approximately million confirmed cases of covid- globally, including nearly , deaths, reported to the world health organization at the end of june [ ] . studies on understanding the virus, which was named severe acute respiratory syndrome coronavirus (sars-cov- ), are important for proposing appropriate intervention strategies and contributing to the development of therapeutics and vaccines. finding the origin of the covid- virus is crucial as it helps to understand where the virus comes from via its evolutionary relationships with other biological organisms and species. this will facilitate the process of identifying and isolating the source and preventing further transmission of the pathogen to the human population. this will also help to understand the outbreak dynamics, leading to the creation of informed plans for public health responses [ ] . a study by wu et al. [ ] using a complete genome obtained from a patient who was a worker at a seafood market in wuhan city, hubei province, china shows that the virus is closely related to a group of sars-like covs that were previously found in bats in china. it is believed that bats are the most likely reservoir hosts for the covid- virus as it is very similar to a bat coronavirus. these results are supported by a separate study by lu et al. [ ] using genome sequences acquired from nine covid- patients who were among the early cases in wuhan, china. outcomes of a phylogenetic analysis suggest that the virus belongs to the genus betacoronavirus, sub-genus sarbecovirus, which includes many bat sars-like covs and sars covs. 
another study in [ ] confirms this finding by analysing genomes obtained from three adult patients admitted to a hospital in wuhan on december , . likewise, zhou et al. [ ] advocate a probable bat origin of sars-cov- by using complete genome sequences of five patients at the beginning of the outbreak in wuhan, china. one of these sequences shows . % similarity to a genome sequence of a coronavirus, denoted ratg , which was previously obtained from a rhinolophus affinis bat found in yunnan province of china. zhang and holmes [ ] also highlight a similarity of approximately % between sars-cov- and ratg in their receptor binding domain, which is an important region of the viral genomes for binding the viruses to the human angiotensin-converting enzyme receptor. in another study, lam et al. [ ] found two related lineages of covs in pangolin genome sequences sampled in guangxi and guangdong provinces in china, which have similar genomic organizations to sars-cov- . that study suggests that pangolins could be possible hosts for sars-cov- although they are solitary animals in an endangered status with relatively small population sizes. these findings are corroborated by zhang et al. [ ] who assembled a pangolin cov draft genome using a reference-guided scaffolding approach based on contigs taxonomically annotated to sars-cov- , sars-cov, and bat sars-like cov. xiao et al. [ ] furthermore suggest that sars-cov- may have been formed by a recombination of a pangolin cov-like virus with one similar to ratg , and pangolins are potentially the intermediate hosts for sars-cov- . on the other hand, by analysing genomic features of sars-cov- , i.e. mutations in the receptor binding domain portion of the spike protein and distinct backbone of the virus, andersen et al. [ ] determined that this novel coronavirus originated through natural processes rather than through a laboratory manipulation. 
this study takes a step further in suggesting the likely origin of the covid- virus by using artificial intelligence (ai) methods to explore genome sequences obtained from more than covid- patients across the world. we use ai-enabled unsupervised clustering methods to demonstrate and emphasize the relationships between the covid- virus, bat covs and pangolin covs. by analysing the results of the clustering methods, we are able to suggest that sars-cov- belongs to the sub-genus sarbecovirus of the genus betacoronavirus and that a bat origin of the virus is more likely than a pangolin origin. we downloaded complete genome sequences of sars-cov- available from the genbank database, which is maintained by the national center for biotechnology information (ncbi), in early april . among these sequences, were reported from the usa, were from china and the rest were distributed across various countries from asia to europe and south america. accession numbers and the detailed distribution of these genome sequences across different countries are presented in tables and in appendix . most of the reference sequences, e.g. ones within the alphacoronavirus and betacoronavirus genera, were also downloaded from the ncbi genbank and virus-host db (https://www.genome.jp/virushostdb/), which covers ncbi reference sequences (refseq, release , march , ). genome sequences of guangxi pangolin covs [ ] were downloaded from the gisaid database (https://www.gisaid.org) with accession numbers epi_isl_ -epi_isl_ . a guangdong pangolin cov genome [ ] was also downloaded from gisaid with accession number epi_isl_ . we employ three sets of reference sequences in this study, with details presented in tables - . the selection of reference genomes at different taxonomic levels is based on a study in [ ] that uses the ai-based supervised decision tree method to classify novel pathogens, which include sars-cov- sequences. 
we aim to traverse from high to low taxonomic levels to search for the covid- virus origin by discovering its genus and sub-genus taxonomy and its closest genome sequences. unsupervised clustering methods are employed to cluster datasets comprising both query sequences (sars-cov- ) and reference sequences. in this paper, we propose the use of the hierarchical clustering algorithm [ ] and the density-based spatial clustering of applications with noise (dbscan) method [ ] for this purpose. with these two methods, we perform two steps to observe the clustering results that lead to interpretations about the taxonomy and origin of sars-cov- . in the first step, we apply the clustering algorithms to the set of reference sequences only, and then use the same settings (i.e. values of parameters) to cluster a dataset that merges reference sequences and sars-cov- sequences. through this step, we can find the reference sequences with which sars-cov- sequences form a group. in the second step, we vary the settings of the clustering algorithms and observe changes in the clustering outcomes. with the second step, we are able to discover the closest reference sequences to the sars-cov- sequences and compare the similarities between genomes. in the hierarchical clustering method, the cut-off parameter c acts as a threshold in defining clusters and thus c is allowed to change during our experiments. with regard to the dbscan method, the neighbourhood search radius parameter ε and the minimum number of neighbours parameter, which is required to identify a core point, are crucial in partitioning observations into clusters. in our experiments, we set the minimum number of neighbours to and allow only the search radius parameter ε to vary. outputs of the dbscan method may also include outliers, which are normally labelled as cluster "- ". 
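the first clustering step described above can be sketched as follows. this is a minimal illustration, not the implementation used in the study: the function name `dbscan_precomputed`, the toy distance values, and the settings `eps=0.2` and `min_pts=2` are hypothetical, and the sketch assumes that a pairwise distance matrix has already been computed and that a point counts itself among its neighbours.

```python
def dbscan_precomputed(dist, eps, min_pts):
    """Minimal DBSCAN on a square distance matrix (list of lists).

    Returns one label per item: 1, 2, ... for clusters and -1 for outliers.
    A point is a core point if at least min_pts points (itself included)
    lie within eps of it.
    """
    n = len(dist)
    labels = [None] * n              # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = -1           # provisional outlier, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is a core point, keep expanding
    return labels


# Step 1a: cluster the reference sequences alone (hypothetical toy distances).
ref_names = ["ref_A", "ref_B", "ref_C"]
dist_ref = [
    [0.00, 0.10, 0.90],
    [0.10, 0.00, 0.80],
    [0.90, 0.80, 0.00],
]
labels_ref = dbscan_precomputed(dist_ref, eps=0.2, min_pts=2)

# Step 1b: add one query sequence and re-cluster with the same settings.
all_names = ref_names + ["query"]
dist_all = [
    [0.00, 0.10, 0.90, 0.05],
    [0.10, 0.00, 0.80, 0.12],
    [0.90, 0.80, 0.00, 0.85],
    [0.05, 0.12, 0.85, 0.00],
]
labels_all = dbscan_precomputed(dist_all, eps=0.2, min_pts=2)

print(dict(zip(ref_names, labels_ref)))
print(dict(zip(all_names, labels_all)))
```

in this toy example the query joins the existing cluster of `ref_A` and `ref_B` rather than forming a new one, which is the kind of outcome the first step looks for; repeating the call with a smaller `eps` would correspond to the second step, in which the search radius is varied.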
to facilitate the execution of the clustering methods, we propose the use of pairwise distances between sequences based on the jukes-cantor method [ ] and the maximum composite likelihood method [ ]. the jukes-cantor method estimates evolutionary distances by the maximum likelihood approach using the number of substitutions between two sequences. with nucleotide sequences, the distance is defined as $d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$ where p is the ratio between the number of positions where the substitution is to a different nucleotide and the number of positions in the sequences. on the other hand, the maximum composite likelihood method considers the sum of log-likelihoods of all pairwise distances in a distance matrix as a composite likelihood because these distances are correlated owing to their phylogenetic relationships. tamura et al. [ ] showed that estimates of pairwise distances and their related substitution parameters such as those of the tamura-nei model [ ] can be obtained accurately and efficiently by maximizing this composite likelihood. the unweighted pair group method with arithmetic mean (upgma) is applied to create hierarchical cluster trees, which are used to construct dendrogram plots for the hierarchical clustering method. the upgma method is also employed to generate phylogenetic trees in order to show results of the dbscan algorithm. we start the experiments to search for taxonomy and origin of sars-cov- with the first set of reference genome sequences (set in table ). this set consists of much more diversified viruses than the other two sets (sets and in tables and ) as it includes representatives from major virus classes at the highest available virus taxonomic level. with a large coverage of various types of viruses, the use of this reference set minimizes the probability of missing out any known virus types. outcomes of the hierarchical clustering and dbscan methods are presented in figs. and , respectively.
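as a small worked example of the jukes-cantor distance discussed above, the sketch below computes $d = -\frac{3}{4}\ln(1 - \frac{4}{3}p)$ for two aligned nucleotide sequences. it assumes the sequences are already aligned to equal length, and it skips any site where either sequence has a gap or an ambiguous base; that site-filtering rule, the function names and the toy sequences are assumptions for illustration, not details taken from the paper.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: only sites with an unambiguous base in both sequences count.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturation: the logarithm is undefined at p >= 3/4
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pairwise_distances(sequences):
    """Symmetric matrix of Jukes-Cantor distances for a list of aligned sequences."""
    return [[jukes_cantor_distance(a, b) for b in sequences] for a in sequences]


# Hypothetical toy alignment: one substitution in four sites gives p = 0.25.
d = jukes_cantor_distance("ACGT", "ACGA")
print(round(d, 4))
```

the correction matters more as sequences diverge: for small p the distance is close to p itself, while it grows without bound as p approaches 3/4, the expected proportion of differences between unrelated sequences under this model.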
in these experiments, we use sars-cov- sequences representing countries in table (appendix ) for the demonstration purpose. the first released sars-cov- genome of each country is selected for these experiments. clustering outcomes on all sequences are presented in fig. in appendix , which shows results similar to those reported here. both clustering methods consistently demonstrate that sars-cov- sequences form a cluster with a representative virus of riboviria among major virus classes (adenoviridae, anelloviridae, caudovirales, geminiviridae, genomoviridae, microviridae, ortervirales, papillomaviridae, parvoviridae, polydnaviridae, polyomaviridae, and riboviria). the middle east respiratory syndrome (mers) cov, which caused the mers outbreak in , is chosen as a representative of the riboviria realm. in hierarchical clustering (fig. ), when combined with reference genomes, sars-cov- genomes do not create a new cluster on their own but form a cluster with the mers cov, i.e. cluster " ". with the dbscan method (fig. ), sars-cov- genomes also do not create their own cluster but form the cluster " " with the mers cov. these clustering results suggest that sars-cov- belongs to the riboviria realm.

(table ) with the cut-off parameter c equal to * − (left), and using a set that merges representative sars-cov- sequences and reference sequences with c also set to * − (right). a number at the beginning of each virus name indicates the cluster that virus belongs to after clustering.

once we have been able to identify sars-cov- as belonging to the riboviria realm, we move to the next lower taxonomic level that consists of virus families within riboviria. these families are presented in set (

, and using a set that merges sars-cov- sequences and reference sequences with ε also set to . (right). as set includes representatives of major virus classes and the minimum number of neighbours is set to while ε is set to . , dbscan considers individual viruses as outliers (left). when the dataset is expanded to include sars-cov- sequences, dbscan forms cluster " " that includes all sars-cov- sequences and the mers cov, which represents the riboviria realm (right).

(table ) with the cut-off parameter c equal to . (left), and using a set that merges sars-cov- sequences and reference sequences with c also set to . (right).

covs and bat sars-like covs. notably, we also include in this set sequences of guangxi pangolin covs deposited to the gisaid database by lam et al. [ ] and a sequence of guangdong pangolin cov by xiao et al. [ ]. evolutionary distances between each of the reference genomes in set (table ) to the sars-cov- genomes based on the jukes-cantor method are presented in fig. . we can observe that these distances are almost constant across sars-cov- sequences, which are collected in countries (table )

table ) with the search radius parameter ε equal to . (left), and using a set that merges sars-cov- sequences and reference sequences with ε also set to . (right).

group contains genomes of alphacov viruses (refer to the taxonomy in table ) that are much evolutionarily divergent from sars-cov- sequences. the middle group of lines comprises most of the betacov viruses, especially those in the sarbecovirus sub-genus. the bottom lines identify reference viruses that are closest to sars-cov- , which include bat cov ratg , guangdong pangolin cov, bat sars cov zc and bat sars cov zxc . the bat cov ratg line at the bottom is notably distinguished from other lines while the guangdong pangolin cov line is the second closest to sars-cov- . the similarities between bat cov ratg , guangdong pangolin cov and guangxi pangolin cov gx/p l with sars-cov- /australia/vic / , produced by the simplot software [ ], are displayed in fig. . consistent with the results presented in fig. , bat cov ratg is shown closer to sars-cov- than pangolin covs. fig. shows outcomes of the hierarchical clustering method using set of reference sequences in table .
with the cut-off parameter c set equal to . , the hierarchical clustering algorithm separates the reference sequences into clusters in which cluster " " comprises all examined viruses of the sarbecovirus sub-genus, including many sars covs, bat sars-like covs and pangolin covs (fig. a). it is observed that the algorithm reasonably groups viruses into clusters, for example, the genus alphacov is represented by cluster " " while the sub-genera embecovirus, nobecovirus, merbecovirus, and hibecovirus are labelled as clusters " ", " ", " ", and " ", respectively. using the same cut-off value of . , we next perform clustering on a dataset that merges reference sequences and representative sars-cov- sequences (see fig. b). results on all sars-cov- sequences, which are similar to those on the representative sequences, are provided in fig. in appendix . the outcome presented in fig. b shows that the representative sars-cov- sequences fall into cluster " ", which comprises the sarbecovirus sub-genus. the number of clusters is still and the membership structure of the clusters is the same as in the case of clustering reference sequences only (fig. a), except that the sarbecovirus cluster has now been expanded to also contain sars-cov- sequences. by comparing figs. a and b, we believe that sars-cov- is naturally part of the sarbecovirus sub-genus. this conclusion is substantiated by fig. c, which shows the clustering outcome when the cut-off parameter c is decreased to . . in fig. c, while members of the merbecovirus sub-genus (i.e. pipistrellus bat cov hku , tylonycteris bat cov hku and mers cov) are divided into clusters (" ", " " and " ") and members of the sarbecovirus cluster are themselves separated into clusters " " and " ", sequences of sars-cov- still join cluster " " with other members of sarbecovirus such as bat viruses (bat sars cov zc , bat sars cov zxc , bat cov ratg ) and pangolin covs.
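the effect of lowering the cut-off parameter c can be sketched with a small average-linkage (upgma-style) clustering routine. this is an illustrative sketch only: the distance values and names below are invented, and it assumes the cut-off is compared directly with the average distance between two clusters, which may differ from how the software used by the authors scales dendrogram heights.

```python
def average_linkage_clusters(dist, cutoff):
    """Merge clusters by average linkage until the closest pair exceeds `cutoff`.

    Returns one cluster label (1, 2, ...) per row of the distance matrix.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # Mean of all pairwise distances between members of the two clusters
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (average_distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > cutoff:
            break                      # every remaining merge lies above the cut-off
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical distances: "query" is nearest to ref_A, then ref_B; ref_C and
# ref_D are more distant references.
names = ["query", "ref_A", "ref_B", "ref_C", "ref_D"]
toy = [
    [0.00, 0.04, 0.10, 0.25, 0.60],
    [0.04, 0.00, 0.10, 0.25, 0.60],
    [0.10, 0.10, 0.00, 0.25, 0.60],
    [0.25, 0.25, 0.25, 0.00, 0.20],
    [0.60, 0.60, 0.60, 0.20, 0.00],
]

for c in (0.30, 0.15, 0.05, 0.01):
    labels = average_linkage_clusters(toy, c)
    print(c, dict(zip(names, labels)))
```

with these invented distances, each reduction of the cut-off splits off more references, until the query shares a cluster only with its nearest reference and finally stands alone. the same reading underlies the panels described in the text: the last reference to separate from the query sequences is the closest one.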
as the cut-off parameter c decreases, the number of clusters increases. this is an expected outcome because the cut-off threshold line moves closer to the leaves of the dendrogram. when the cut-off c is reduced to . (fig. d), there are only viruses (bat cov ratg and guangdong pangolin cov) that can form a cluster with sars-cov- (labelled as cluster " "). these are the viruses closest to sars-cov- based on the whole genome analysis. results in figs. c and d therefore provide evidence that bats or pangolins could be possible hosts for sars-cov- . we next reduce the cut-off c to . as in fig. e. at this stage, only bat cov ratg is within the same cluster as sars-cov- (cluster " "). we thus believe that bats are more probable hosts for sars-cov- than pangolins. the inference of our ai-enabled analysis is in line with a result in [ ] that investigates the polyprotein ab of sars-cov- and suggests that this novel coronavirus is more likely to have arisen from viruses infecting bats rather than pangolins. when the cut-off c is reduced to . as in fig. f, we observe that the total number of clusters now increases to and, more importantly, sars-cov- sequences do not combine with any other reference viruses but form their own cluster " ". could we use this clustering result (fig. f) to infer that sars-cov- might not originate in bats or pangolins? this is a debatable question because the answer depends on the level of detail we use to differentiate between the species or organisms. the cut-off parameter in hierarchical clustering can be considered as the level of detail. with the results obtained in fig. d (and also in the experiments with the dbscan method presented next), we support a hypothesis that bats or pangolins are the probable origin of sars-cov- . this is because we observe that the similarity between sars-cov- and bat cov ratg (or guangdong pangolin cov) is considerably large compared to the similarity between viruses that originated in the same host.
for example, bat sars-like covs such as bat sars cov rf , bat sars cov longquan- , bat sars cov hku - , bat sars cov rp , bat sars cov rs / , bat sars cov rsshc , bat sars cov wiv , bat sars cov zc and bat sars cov zxc had the same bat origin. in fig. d , these viruses however are separated into different clusters (" " and " ") while all sars-cov- representatives are grouped together with bat cov ratg and guangdong pangolin cov in cluster " ". this demonstrates that the difference between the same origin viruses (e.g. bat sars cov wiv and bat sars cov zc ) is larger than the difference between sars-cov- and bat cov ratg (or guangdong pangolin cov). therefore, sars-cov- is deemed very likely originated in the same host with bat cov ratg or guangdong pangolin cov, which is bat or pangolin, respectively. clustering outcomes of the dbscan method via phylogenetic trees using set of reference sequences (table ) are presented in fig. . we first apply dbscan to reference sequences only, which results in clusters and several outliers (fig. a) . the search radius parameter ε is set equal to . . as we set the minimum number of neighbours parameter to , it is expected that viruses of the sub-genera embecovirus, nobecovirus and hibecovirus are detected as outliers "- " because there are only or viruses in these sub-genera. three viruses of the merbecovirus sub-genus (i.e. tylonycteris bat cov hku , pipistrellus bat cov hku and mers cov) are grouped into the cluster " ". all examined viruses of the sarbecovirus sub-genus are joined in cluster " " while the alphacov viruses are combined into cluster " ". fig. b shows an outcome of dbscan with the same ε value of . and the dataset has been expanded to include representative sars-cov- sequences. we observe that genomes of sars-cov- fall into the cluster " ", which includes all the examined sarbecovirus viruses. when ε is decreased to . in fig. 
c , all members of the merbecovirus cluster or the alphacov cluster become outliers while sars-cov- genomes still stick with the sarbecovirus cluster. in line with the results obtained by using hierarchical clustering in fig. , those obtained in fig. b and c using the dbscan method give us the confidence to confirm that sars-cov- is part of the sarbecovirus sub-genus. fig. d shows that bat cov ratg and guangdong pangolin cov are closest to sars-cov- as they join with sars-cov- representatives in cluster " ". this again substantiates the probable bat or pangolin origin of sars-cov- . by reducing ε to . as in fig. e , the guangdong pangolin cov becomes an outlier whilst sars-cov- sequences form a cluster (" ") with only bat cov ratg . this further confirms our findings when using the hierarchical clustering in fig. that bats are more likely the reservoir hosts for the sars-cov- than pangolins. when ε is decreased to . as in fig. f , sars-cov- genomes form its own cluster " ", which is separated with any bat or pangolin genomes. as with the result in fig. f by the hierarchical clustering, this result also raises a question whether sars-cov- really originated in bats or pangolins. in fig. d , it is again observed that the similarity between sars-cov- and bat cov ratg (or guangdong pangolin cov) is larger than the similarity between bat sars covs, which have the same bat origin. specifically, sars-cov- , bat cov ratg and guangdong pangolin cov are grouped together in cluster " " while bat sars covs are divided into clusters, i.e. bat sars cov zxc and bat sars cov zc are in cluster " " whereas other bat sars covs are in cluster " ". we thus suggest that sars-cov- probably has the same origin with bat cov ratg or guangdong pangolin cov. in other words, bats or pangolins are the probable origin of sars-cov- . all results presented above are obtained using the pairwise distances estimated by the jukes-cantor method. 
results based on distances calculated by the maximum composite likelihood method are reported in appendix , which are similar to those obtained by using the jukes-cantor method. these ai-based quantitative results using the unsupervised hierarchical clustering and dbscan methods provide more evidences to suggest that ) sars-cov- belongs to the sarbecovirus sub-genus of the betacoronavirus genus, ) bats and pangolins may have served as the hosts for sars-cov- , and ) bats are the more probable origin of sars-cov- than pangolins. the severity of covid- pandemic has initiated a race in finding origin of the covid- virus. studies on genome sequences obtained from early patients in wuhan city in china suggest the probable bat origin of the virus based on similarities between these sequences and those obtained from bat covs previously reported in china. other studies afterwards found that sars-cov- genome sequences are also similar to pangolin cov sequences and accordingly raised a hypothesis on the pangolin origin of the covid- virus. this paper has investigated origin of the covid- virus using unsupervised clustering methods and more than raw genome sequences of sars-cov- collected from various countries around the world. outcomes of these ai-enabled methods are analysed, leading to a confirmation on the coronaviridae family of the covid- virus. more specifically, the sars-cov- belongs to the sub-genus sarbecovirus within the genus betacoronavirus that includes sars-cov, which caused the global sars pandemic in - [ ; ] . the results of various clustering experiments show that sars-cov- genomes are more likely to form a cluster with the bat cov ratg genome than pangolin cov genomes, which were constructed from samples collected in guangxi and guangdong provinces in china. this indicates that bats are more likely the reservoir host for the covid- virus than pangolins. 
this study among many ai studies in the fight against the covid- pandemic [ ] has shown the power and capabilities of ai in this challenging battle, especially from the computational biology and medicine perspective. the findings of this research on the large dataset of sars-cov- genomic sequences provide more insights about the covid- virus and thus facilitate the progress on discovering medicines and vaccines to mitigate its impacts and prevent a similar pandemic in future. the race to produce treatment drugs and vaccines is still ongoing and no effective results have been reported yet. a further research in this direction is strongly encouraged by a recent success of ai in identifying powerful new kinds of antibiotic from a pool of more than million molecules as published in [ ] . as ai is capable of analysing large datasets and discovering knowledge from them in an intelligent and efficient manner, finding a covid- vaccine using ai is a realistic hope [ ] . in this appendix, we first present results of the hierarchical clustering method applied to the dataset that combines set of reference sequences (table ) with all sars-cov- sequences (see fig. ). we then show results of the hierarchical clustering (fig. ) and dbscan (fig. ) on a dataset that combines all sars-cov- sequences and reference sequences in set (table ) . fig. . results shown via a dendrogram plot (left) of the hierarchical clustering method applied to the dataset that combines reference sequences in set (table ) and all sars-cov- sequences. the middle figure shows in detail (zoom in) the top part of the dendrogram plot while the right figure shows the bottom part of the plot. all sars-cov- sequences are grouped in cluster " ", which also includes the middle east respiratory syndrome cov of the riboviria realm. this means that sars-cov- belongs to the riboviria realm. these results are consistent with those shown in fig. 
b that, for the demonstration purpose, employed only sars-cov- genomes, which are representatives of countries in table . this appendix presents results of two clustering methods, i.e. hierarchical clustering and dbscan, using the sequence distances computed by the maximum composite likelihood method [ ] , which was conducted in the mega x software [ ] . these results are greatly similar to those obtained by using the jukes-cantor distance method shown throughout the paper. in these experiments, the clustering methods are applied to a dataset that combines reference sequences in set (table ) and representative genomes of countries in table . when a country has more than one collected genome, the first released genome of that country is selected for this experiment. fig. demonstrates the distances estimated by the maximum composite likelihood method between each of the reference sequences and representative sars-cov- genomes. the lines are almost parallel indicating that sars-cov- genome is not altered much across countries, which is in line with the results obtained using the jukes-cantor distance estimates in fig. . the bat cov ratg is again shown much closer to sars-cov- than pangolin covs and other reference viruses although the distance range in fig. is larger than that in fig. in fig. a , when the hierarchical clustering cut-off parameter is set equal to . , all representative sars-cov- genomes are grouped into cluster " ", which also includes other viruses of the sarbecovirus sub-genus of the betacov genus. when moving from fig. a to fig. b , even though members of the sarbecovirus cluster (" " in fig. a ) are split into clusters " " and " " in fig. b , the sars-cov- sequences are still grouped into cluster " " with other members of the sarbecovirus sub-genus such as bat cov ratg , guangdong pangolin cov, bat sars cov zxc and bat sars cov zc . these results provide us with a confidence on confirming the sarbecovirus sub-genus of the sars-cov- . 
this is consistent with the result based on the jukes-cantor distances shown in fig. . fig. c shows that sars-cov- genomes are combined only with that of bat cov ratg when the cut-off parameter is decreased to . . this again indicates that bats are the more likely origin of sars-cov- than pangolins. when we reduce the cut-off parameter to . , the sars-cov- sequences create their own cluster " " and this questions the probable bat or pangolin origin of sars-cov- . however, in fig. b , we also find that the similarity between sars-cov- and bat cov ratg (or guangdong pangolin cov) is larger than the similarity between viruses having the same origin. for example, bat sars cov wiv and bat sars cov zc have the same bat origin but they are divided into clusters (" " and " ") while all sars-cov- representatives are grouped into cluster " " with bat cov ratg and guangdong pangolin cov. this implies that sars-cov- may have originated in bats or pangolins. results of dbscan using distances estimated by the maximum composite likelihood method are presented in fig. , which are also consistent with those obtained by the jukes-cantor distance method in fig. , leading to the same suggestions on the sub-genus sarbecovirus membership of sars-cov- , its likely bat or pangolin origin, and the more probable bat origin than the pangolin origin of the virus at the whole genome analysis level. who coronavirus disease (covid- ) dashboard. 
available at origin of sars-cov- a new coronavirus associated with human respiratory disease in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a novel coronavirus from patients with pneumonia in china a pneumonia outbreak associated with a new coronavirus of probable bat origin a genomic perspective on the origin and emergence of sars-cov- identifying sars-cov- related coronaviruses in malayan pangolins probable pangolin origin of sars-cov- associated with the covid- outbreak isolation of sars-cov- -related coronavirus from malayan pangolins the proximal origin of sars-cov- machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: covid- case study clustering methods a density-based algorithm for discovering clusters in large spatial databases with noise evolution of protein molecules prospects for inferring very large phylogenies by using the neighbor-joining method estimation of the number of nucleotide substitutions in the control region of mitochondrial dna in humans and chimpanzees full-length human immunodeficiency virus type genomes from subtype c-infected seroconverters in india, with evidence of intersubtype recombination an exclusive amino acid signature in pp ab protein provides insights into the evolutive history of the novel human-pathogenic coronavirus (sars-cov- ) identification of a novel coronavirus in patients with severe acute respiratory syndrome origins of major human infectious diseases artificial intelligence in the battle against coronavirus (covid- ): a survey and future research directions a deep learning approach to antibiotic discovery ai can help scientists find a covid- vaccine mega x: molecular evolutionary genetics analysis across computing platforms table . 
accession numbers of sars-cov- genome sequences obtained from ncbi genbank in early april , sorted by date released mn , mn , mn , mn , mn , mn , mn , mn , mn , mn , mn , mn , mn , mn , mn , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt , mt tables and in this appendix present accession numbers and detailed distribution of sars-cov- complete genomes across different countries obtained from the ncbi genbank database. key: cord- -ysr l authors: perisé-barrios, ana judith; tomeo-martín, beatriz davinia; gómez-ochoa, pablo; delgado-bonet, pablo; plaza, pedro; palau-concejo, paula; gonzález, jorge; ortiz-diez, gustavo; meléndez-lazo, antonio; gentil, michaela; garcía-castro, javier; barbero-fernández, alicia title: humoral response to sars-cov- by healthy and sick dogs during covid- pandemic in spain date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ysr l covid- is a zoonotic disease originated by sars-cov- . infection of animals with sars-cov- are being reported during last months, and also an increase of severe lung pathologies in domestic dogs has been detected by veterinarians in spain. therefore it is necessary to describe the pathological processes in those animals that show symptoms similar to those described in humans affected by covid- . the potential for companion animals contributing to the continued human-to-human disease, infectivity, and community spread is an urgent issue to be considered. forty animals with pulmonary pathologies were studied by chest x-ray, ultrasound study, and computed tomography. nasopharyngeal and rectal swab were analyzed to detect canine pathogens, including sars-cov- . twenty healthy dogs living in sars-cov- positive households were included. immunoglobulin detection by different immunoassays was performed. 
our findings show that sick dogs presented severe alveolar or interstitial pattern, with pulmonary opacity, parenchymal abnormalities, and bilateral lesions. forty dogs were negative for sars-cov- but mycoplasma spp. was detected in of dogs. five healthy and one pathological dog presented igg against sars-cov- . here we report that despite detecting dogs with igg α-sars-cov- , we never obtained a positive rt-qpcr, not even in dogs with severe pulmonary disease; suggesting that even in the case of a canine infection transmission would be unlikely. moreover, dogs living in covid- positive households could have been more exposed to be infected during outbreaks.
we are currently in an international health emergency generated by the emerging zoonotic coronavirus sars-cov- that began its expansion by the end of the year in wuhan (china) and has caused a pandemic in a few months. covid- is a pathology with various clinical manifestations caused by sars-cov- , and the severity of its infection is mainly associated with lung injury, with findings similar to macrophage activation syndrome that causes hyperinflammation and lung damage by an uncontrolled activation and proliferation of t lymphocytes and macrophages. , four genera of coronavirus have been described: alphacoronavirus, betacoronavirus, gammacoronavirus, and deltacoronavirus (α-cov, β-cov, γ-cov, and δ-cov) according to their genetic structure. with the detection of sars-cov- in humans, seven coronaviruses had been isolated from people, but only two of them (sars-cov and mers-cov) predominantly infect the lower airways and cause fatal pneumonia. severe pneumonia is associated with rapid viral replication, massive inflammatory cell infiltration, and proinflammatory cytokine responses. in humans, the other coronaviruses cause mild upper respiratory tract infections in immunocompetent adults and serious symptoms in children and elderly people. α-cov and β-cov infect mammals and have been also described in dogs and cats; mostly they are responsible for respiratory infections in humans and gastroenteritis in animals.
in dogs, canine enteric coronavirus (ccov), an α-cov, causes an enteritis of variable severity (rarely fatal), after which dogs develop immunity; however, some of the recovered dogs become carriers with the ability to infect other dogs. the described fecal-oral transmission pattern for ccov includes only the canine species, and ccov is not currently postulated as a possible zoonotic agent. in contrast, canine respiratory coronavirus (crcov), which belongs to the β-cov genus (like sars-cov- ), causes respiratory symptoms in dogs, in general with mild clinical signs and occasionally as a coinfection with other respiratory pathogens. recently, the first cases of asymptomatic dogs infected with sars-cov- have been described. due to the zoonotic origin of sars-cov- and the described transmission between species, the hypothesis of the spread between animals becomes more plausible. cases of infected cats, dogs, tigers, lions, minks, and ferrets have been reported during sars-cov- outbreaks, and all of them had close contact with infected people. experimental infections using different animal species report an increased susceptibility to infection in ferrets and cats, and indicate a low susceptibility to sars-cov- infection in golden hamsters, macaques, fruit bats, and dogs. furthermore, under natural conditions, infections in more than mink farms have been reported. the transmission pattern postulated is that the virus was transmitted to the animals by infected people and then spread between minks. later, in may , two mink farm workers who could have acquired the infection from the minks were reported. these would be the first described transmissions of sars-cov- from animal to human (apart from the first one that originated the pandemic). some data suggest that transmission from minks to cats and dogs also occurred on the farms.
the world organization for animal health-oie stated that some animals can become infected by being in permanent contact with infected people, although they note that there are no evidences defining the role of infected pets in the spread of sars-cov- . to date, no cases of transmission from domestic or captive wild animals to humans have been described (excluding, if confirmed, mink farm workers). information regarding the possibility of companion animals becoming infected is confusing and controversial. some authors describe that dogs whose owners were positive for sars-cov- , showed serological negative to sars-cov- , postulating that pets are not virus carriers. by contrast, some cases of companion dogs have been reported that were positive by rt-qpcr detection and others that have developed neutralizing antibodies against sars-cov- . currently, around ten rt-qpcr-positive dogs have been detected worldwide (hong kong, denmark, and usa, but none in spain), all of them in close contact with covid- positive humans. less than half of them were asymptomatic, one presented mild respiratory illness, and only one present also neutralizing antibodies coursing with hemolytic anemia. on the other hand, two negative-pcr dogs have developed neutralizing antibodies, one was asymptomatic and the other had breathing problems, but it is not clear if those were related to the infection. further, molecular testing of dogs, cat and horse companion animals were done by idexx company in usa and korea and no positive cases were found. no positive cases were reported in dogs exposed to sars-cov- in france. another recent study in italy carried out with pets has shown that none of the animals studied was positive to sars-cov- by rt-qpcr test but dogs and cats had neutralizing antibodies. in this manuscript we report that, during the months of the pandemic, an increase of aggressive lung pathologies in dogs was detected by veterinarians in spain. 
therefore, it is important to determine the infectious agent and the potential role of sars-cov- infection. considering the information currently available, it is necessary to better describe the pathological processes that could occur in animals that may be infected by sars-cov- and that show symptoms similar to those described in humans affected by covid- . it is also highly relevant to determine whether dogs could become infected in a home environment where close human-pet relationships occur. the potential for companion animals to contribute to continued human-to-human disease, infectivity, and community spread is therefore an urgent issue to be considered. here, we describe the study of sick and healthy dogs regarding a potential infection with sars-cov- . a prospective study of forty dogs presenting pneumonia was performed between april and june in spain. a clinical follow-up of all patients was performed and mortality was also recorded. twenty healthy dogs living with people affected by covid- were included as animals exposed to the virus. inclusion/exclusion criteria and more information are available in suppl. methods. the study was approved by the ethical committee of the faculty of health sciences, alfonso x el sabio university, and all dog owners gave written informed consent. thoracic radiography (chest x-ray, cxr) and an ultrasound study were performed in sick dogs. the pattern type, distribution and intensity were analyzed. a pathological lung was recognized when ultrasound lung rockets, also called b lines, were observed ( figure ) or by the presence of other pulmonary ultrasound findings of consolidation (crushing, tissue, nodule sign) ( figure ). computed tomography (ct) was performed to assess the distribution of the lesions, classified as generalized or focal, and unilateral or bilateral. more information is available in suppl. methods. 
blood samples from both sick and healthy-exposed dogs were analyzed for immunoglobulins (igg) against sars-cov- . a highly sensitive sars-cov- spike s protein elisa kit was used. values > · od of the negative control were considered positive. to determine antibodies (igm and igg) against ccov, dog plasma samples were analyzed by eia. to determine neutralizing antibodies (igg) against canine adenovirus (cav), canine parvovirus (cpv) and canine distemper virus (cdv), plasma samples were analyzed by solid-phase elisa. more information is available in suppl. methods. nasopharyngeal and rectal swabs were collected from sick dogs and analyzed by laboklin gmbh & co.kg using conventional pcr or real-time pcr (qpcr and rt-qpcr). all samples were tested for canine adenovirus type (cav- ), bordetella bronchiseptica, cdv, canine parainfluenza virus (cpiv), canine influenza a virus (civ), canine herpesvirus- (canid alphaherpesvirus- : cahv- ), and sars-cov- by taqman real-time pcr, and for mycoplasma spp. by conventional pcr. more information is available in suppl. methods. lungs of two dogs (ser and ser ) were histologically evaluated after necropsy. the macroscopic exam evaluated congestion, oedema and the lung injury pattern. lung samples were fixed in formalin % for hours and paraffin-embedded, and µm thick sections were stained with hematoxylin-eosin. more information is available in suppl. methods. categorical variables were presented as percentages. for continuous variables, normality of the data distribution was evaluated with the kolmogorov-smirnov test. continuous data were presented as mean with standard deviation (sd) or median with interquartile range (iqr). forty pathologic dogs met the inclusion criteria, with a mean age of years (range: months to years). fifteen breeds were recorded, the most common being cross-breed · % ( / ), yorkshire terrier · % ( / ), and german shepherd % ( / ). there were females and males. 
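the reporting rule described above (check normality, then report mean with sd or median with iqr) can be sketched in python. this is a minimal illustration, not the authors' analysis: the source does not name the software, the significance level, or whether a lilliefors-type correction was applied. the function names, the asymptotic 5% critical value `1.36 / sqrt(n)`, and fitting the normal from the sample itself are all assumptions made here.

```python
import math
import statistics
from statistics import NormalDist


def ks_statistic_normal(values):
    """One-sample KS distance between the data and a normal fitted to it."""
    n = len(values)
    # Assumption: the reference normal uses the sample mean and sample SD.
    fitted = NormalDist(statistics.mean(values), statistics.stdev(values))
    d = 0.0
    for i, x in enumerate(sorted(values), start=1):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just before and at x.
        d = max(d, abs(i / n - cdf), abs(cdf - (i - 1) / n))
    return d


def summarize(values, critical_coeff=1.36):
    """Return ('mean_sd', mean, sd) or ('median_iqr', median, (q1, q3))."""
    n = len(values)
    d = ks_statistic_normal(values)
    # Assumption: asymptotic 5% critical value, critical_coeff / sqrt(n).
    if d <= critical_coeff / math.sqrt(n):
        return ("mean_sd", statistics.mean(values), statistics.stdev(values))
    q1, _, q3 = statistics.quantiles(values, n=4)
    return ("median_iqr", statistics.median(values), (q1, q3))
```

calling `summarize` on a roughly symmetric sample returns the mean and sd, while a strongly skewed sample falls through to the median and quartiles. with small samples and estimated parameters the plain kolmogorov-smirnov threshold is conservative, so a real analysis would normally rely on a dedicated statistics package.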
the most common clinical signs were crackles on lung auscultation, followed by cough, tachypnoea, fatigue, fever, tachycardia, vomiting, and diarrhoea ( table ). the radiographic findings in all analyzed dogs were consistent with mild to severe alveolar or interstitial pattern, with pulmonary opacity accentuated in the caudodorsal lung field. in · % ( / ) in order to assess the overall status of dogs, and considering that the number of white blood cells is frequently altered in an infection, we consider it relevant to perform a hematologic evaluation on twenty-four pathologic patients. the count of white blood cells was out of range in · % of dogs being the number of neutrophils abnormal in %, lymphocytes in · %, and monocytes in · % (table ) . considering the altered number of immune cells observed in peripheral blood, together with the clinical course, we proceeded to evaluate possible pulmonary pathogens. in order to determine whether the observed pathologies could be related to a sars-cov- infection or to other pathogens, pcr analysis was performed. all forty dogs were negative for sars-cov- (table ) . furthermore, thirty-three dogs were analyzed for a complete profile including the most common canine infectious agents. all of them were negative for cpiv, civ, and cahv- . mycoplasma spp. and cdv were detected as a single agent with · % ( / ) and % ( / ), respectively ( table ). the pathologies in our patients were very aggressive and · % ( / ) of dogs died of pneumonia during follow-up ( table ) scant and disperse inflammatory cells, mainly macrophages with intracytoplasmic brown granular pigment were observed in alveolar septa. interestingly, some of these findings, mainly the scattered syncytia, are usually present in some viral infections. after analyzing the possible pathogens in the diagnosed dogs and the findings in the evaluated tissues, we set out to study the immune response against some of these infectious agents in dogs. 
further, we also decided to study dogs that lived with people diagnosed with covid- , as a group of dogs exposed to the virus but which did not present symptoms at the time of sampling. first of all, information about vaccination was gathered to determine the immune status of dogs. ten sick dogs had been vaccinated routinely as it is recommended by veterinarians but eight sick dogs did not received any vaccine (suppl. table s ). we have not detected any association between the vaccination patterns of pathological animals compared to healthy dogs. immunoglobulins g (igg) against cav, cpv, and cdv were analyzed in peripheral blood samples from these sick and healthy dogs (table ). further, antibodies (igm and igg isotypes) against canine coronavirus that affects enteric tract, and also igg against sars-cov- were studied in both groups ( table ). the number of dogs that presented igg antibodies against sars-cov- was higher in the group of healthy dogs ( %; / ), compared to the pathological ones ( · %; / ). interestingly, the sick dog that presented antibodies against sars-cov- was negative for the detection of the virus in swabs studied by rt-qpcr, however mycoplasma spp. and cdv were detected in this patient ( table ). all of the five igg α-sars-cov- positives healthy dogs showed the same pattern of antibodies against the other studied pathogens, being positives for igg α-cav, igg α-cpv, and igg α-cdv (table ). nevertheless, two of them presented igg α-ccov while the remaining three were not protected against canine coronavirus. twelve healthy dogs presented igg α-ccov and two of them were positive for igg α-sars-cov- (table ). seven pathological dogs presented igg α-ccov but in this group all of them were negative to α-sars-cov- (table ). professionals and pet owners demand more information about a possible infection of their animals with sars-cov- . 
a survey of us veterinarians reported that % of them had seen owners concerned that their pets had covid- ; many owners are therefore worried and demand a more in-depth clinical study of their animals. pets are often in close contact with humans, and thus it is important to determine their susceptibility to sars-cov- as well as the risk that infected pets pose as a source of infection for humans. dogs are currently not considered to be susceptible hosts for sars-cov- , although a few positive rt-qpcr test results in dogs have been reported. surveillance data from idexx laboratories, a multinational veterinary diagnostics company, showed no positive results for sars-cov- in any of more than dog specimens submitted for respiratory pcr panels, in line with the idea that transmission from human to pet is very rare. however, veterinarians in spain have observed an increase in aggressive lung pathologies in dogs during the months of the human pandemic, which did not respond to conventional antibiotic treatments. moreover, in veterinary medicine respiratory disease is rarely lethal in the pet dog population. a mortality rate of · % due to respiratory disease (only · % due to pneumonia) has been reported; nevertheless, in our study we found a mortality rate of · % during follow-up without a clarified etiology. these dogs, with very aggressive lung diseases, showed an appearance very similar to that described for covid- pneumonia in human medicine. historically, the most common pathogens associated with canine infectious respiratory disease complex have been cpiv, cav- , bordetella bronchiseptica, streptococcus equi subsp. zooepidemicus, mycoplasma cynos, chv- , cdv, civ, and crcov. in our study, we detected classical primary respiratory pathogens in eight of analyzed dogs, and we also detected igm against ccov in four dogs ( / pathological dogs and / healthy dog). 
in this regard, crcov is detected more frequently in dogs with mild clinical signs than in dogs with moderate or severe clinical signs; we would therefore rule it out as the agent responsible for the severe respiratory pathologies in these three dogs. moreover, mycoplasma cynos is the only mycoplasma sp. significantly associated with pneumonia in dogs, but it remains unclear whether m. cynos is a primary or secondary pathogen, because it can be cultured from the lungs of dogs both with and without other identifiable infectious agents. in a european study of dogs with canine infectious respiratory disease, the seroprevalence of mycoplasma spp. ranged from · % to · %, but in another study of healthy dogs, mycoplasma were isolated from % to % of throat swabs. moreover, mycoplasma infections are usually associated with other infections. it is interesting to note that mycoplasma coinfections are very common in human covid- patients, and it has also been suggested that co-infection with mycoplasma, or activation of latent mycoplasma infections, in covid- disease may be important in determining a fatal disease course. normally, the response to therapy when treating respiratory tract diseases with drugs (antibiotics, bronchodilators, anti-inflammatories, antitussives, decongestants, mucolytics, mucokinetics or expectorants) is adequate or complete; nevertheless, our patients did not respond adequately to the therapeutic protocol. a major pathogen has not been detected in our patients, so at the moment the causative agent of the pathologies is unknown. further, the number of deaths was more than times higher than expected, without a clarified etiology, and notably coincided with the peak of the covid- pandemic in spain. 
in the deceased dogs, we detected interstitial pneumonia, which usually presents with nonspecific lesions and has also been described in canine pathologies such as canine distemper, herbicide poisoning or systemic processes (principally septicemia or uremia). however, it should be noted that the lesions observed are similar to those described in humans affected by covid- , especially the striking lesions observed in vessels: both lymphocytic vasculitis and hyalinosis of the arteriolar wall. nevertheless, all of the dogs were negative in rt-qpcr tests for sars-cov- using nasopharyngeal and rectal samples. these results agree with a recent large-scale study assessing sars-cov- infection in companion animals living in northern italy, in which no animals tested positive by rt-qpcr. likewise, it should be considered that viral particles have been detected in the skin endothelium of human patients even though they were negative when tested by rt-qpcr. therefore, it would be useful to analyze sars-cov- by ihc in necropsy samples from our patients. regarding the presence of immunoglobulins against sars-cov- in peripheral blood of pets, in a previous study dogs were tested in china. they were serologically negative for anti-sars-cov- iggs. among them, pet dog and street dog sera were collected from wuhan city, but it should be noted that only one pet dog living with a confirmed covid- human patient presented antibodies against sars-cov- . however, in the italian study, · % of dogs (and · % of cats) had measurable sars-cov- neutralizing antibody titers. none of these animals with neutralizing antibodies displayed respiratory symptoms at the time of sampling. interestingly, dogs from covid- positive households seem to be significantly more likely to test igg positive than those from covid- negative households. finally, it has been determined that only half of the dogs artificially inoculated with sars-cov- seroconverted. 
here we detected specific anti-sars-cov- canine immunoglobulins in one sick dog ( / ) infection; nevertheless, they were not tested and confirmed. other owners had not exhibited symptoms during the pandemic. therefore, it was decided to exclude these data from the statistical study due to a lack of reliable information. most people infected with sars-cov- display an antibody response between day and day after infection, and several studies have suggested that previous antibodies and t cells against endemic human coronaviruses may provide some degree of cross-protection against sars-cov- infection. further, pre-existing memory cd + t cells that are cross-reactive with sars-cov- and the common human cold coronaviruses hcov-oc , hcov- e, hcov-nl , or hcov-hku have been reported. in our data we did not find any correlation between ccov and sars-cov- igg-positive dogs, although the low number of cases makes it difficult to reach a valid conclusion. in sum, we analyzed dogs affected by severe pulmonary disease, all of which were negative for sars-cov- by rt-qpcr; however, some of them presented igg α-sars-cov- , as did some of the healthy dogs, suggesting that even if canine infection occurs, it would be poorly transmissible. moreover, dogs with owners positive for sars-cov- could have been more exposed to infection during outbreaks. 
the authors would like to thank the owners of the dogs for their participation, as well as the veterinary clinics and hospitals that have collaborated in this study (veterinary hospital vetcare, veterinary hospital madrid norte, among others). the authors would like to thank the "centro de transfusión veterinario" for their donation of samples of virus not-exposed dogs.
key: cord- -ya uvoki authors: böszörményi, kinga p.; stammes, marieke a.; fagrouch, zahra c.; kiemenyi-kayere, gwendoline; niphuis, henk; mortier, daniella; van driel, nikki; nieuwenhuis, ivonne; zuiderwijk-sick, ella; meijer, lisette; mooij, petra; remarque, ed j.; koopman, gerrit; hoste, alexis c. r.; sastre, patricia; haagmans, bart l.; bontrop, ronald e.; langermans, jan a.m.; bogers, willy m.; verschoor, ernst j.; verstrepen, babs e. title: comparison of sars-cov- infection in two non-human primate species: rhesus and cynomolgus macaques date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ya uvoki sars-cov- is a coronavirus that sparked the current covid- pandemic. to stop the shattering effect of covid- , effective and safe vaccines, and antiviral therapies are urgently needed. to facilitate the preclinical evaluation of intervention approaches, relevant animal models need to be developed and validated. rhesus macaques (macaca mulatta) and cynomolgus macaques (macaca fascicularis) are widely used in biomedical research and serve as models for sars-cov- infection. 
however, differences in study design make it difficult to compare and understand potential species-related differences. here, we directly compared the course of sars-cov- infection in the two genetically closely-related macaque species. after inoculation with a low passage sars-cov- isolate, clinical, virological, and immunological characteristics were monitored. both species showed slightly elevated body temperatures in the first days after exposure while a decrease in physical activity was only observed in the rhesus macaques and not in cynomolgus macaques. the virus was quantified in tracheal, nasal, and anal swabs, and in blood samples by qrt-pcr, and showed high similarity between the two species. immunoglobulins were detected by various enzyme-linked immunosorbent assays (elisas) and showed seroconversion in all animals by day post-infection. the cytokine responses were highly comparable between species and computed tomography (ct) imaging revealed pulmonary lesions in all animals. consequently, we concluded that both rhesus and cynomolgus macaques represent valid models for evaluation of covid- vaccine and antiviral candidates in a preclinical setting. author summary sars-cov- infection can have a wide range of symptoms. it can cause asymptomatic or mild disease, but can also have a severe, potentially deadly outcome. vaccines and antivirals will therefore be crucial in fighting the current covid- pandemic. for testing these prophylactic and therapeutic treatments, and investigating the progression of infection and disease development, animal models play an essential role. in this study, we compare the course of sars-cov- infection in rhesus and cynomolgus macaques. both species showed moderate disease symptoms as shown by pulmonary lesions by ct imaging. shedding of infectious virus from the respiratory system was also documented. 
this study provides a detailed description of the pathogenesis of a low-passage sars-cov- isolate in two macaque models and suggests that both species represent an equally good model in research for both covid- prophylactic and therapeutic treatments. were described in proven their value in research on the related coronaviruses that caused the sars and mers epidemics [ , ] , and thus are considered relevant nhp models for preclinical studies. cynomolgus macaques have been deployed in studies describing aspects of cov- pathogenesis [ , , ] , and have been utilized to evaluate the efficacy of hydroxychloroquine as an antiviral compound [ ] . rhesus macaques have also been applied in covid- pathogenesis studies [ , , , ] , and to test the efficacy of remdesivir in the treatment of sars-cov- infection [ ] . additionally, several prototype covid- vaccine candidates have received their first efficacy evaluation in the rhesus macaque model [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . some research groups [ , ] shed light on the heterogeneity in sars-cov- infection and investigated disease progression in different nhp species. most of these studies were conducted by different research teams, and a controlled comparative approach is lacking thus far. in other nhp disease models, like those developed for aids, tb, and influenza research, the choice of macaque (sub)species can influence the disease outcome considerably [ ] [ ] [ ] [ ] [ ] . the choice of a specific nhp species for research on a new and complex disease, like covid- , is therefore not a trivial one and the key question which macaque species is best suited to investigate specific aspects of covid- research needs to be answered. to address this issue, we compared sars-cov- replication in rhesus and cynomolgus macaque species and monitored signs of covid- -like disease symptoms for three weeks after infection. 
the macaques were infected in parallel with the same virus stock and received identical treatment, and the course of infection was followed using the same analyses, including monitoring of lung pathology using computed tomography (ct) and continuous telemetric recording of body temperature and activity of the animals. after administration of the virus in the upper trachea and nose, viral rna was detectable in the tracheal and nasal swabs of all monkeys at day pi. viral rna remained evident in swab samples for several days. in the tracheal swab sample of rhesus macaque r , viral rna first fell below the detection limit at day pi. (fig a, s table) . the individual variation in sars-cov- rna levels detected in the macaques, regardless of species, was considerable. peak viral rna levels in the trachea varied between . x copies/ml (r ; day pi) and . x copies/ml (j ; day pi). the time frame in which viral genetic material could be detected varied from only one day (r ; day pi) up to day pi. (animal r , rna in the trachea). peak viral loads detected in nasal swabs were generally lower than levels observed in the throat samples and did not exceed . x copies/ml (r ; day pi). the high virus loads measured in the first two days post-infection may suggest that some remaining rna from the original inoculum was still present. however, in all macaques, viral rna was also detected in nasal swabs at later time points, showing that sars-cov- was excreted via the nose, which is indicative of viral replication. the total viral rna production over time is shown in fig b. the patterns of viral rna detection in swabs also varied between individuals. the most striking observation was made for cynomolgus macaque j , which was positive in the nose at day pi, then had no detectable viral rna for a period of three days, but later became positive again in the nose swabs for three consecutive days. 
other animals (r , j , j , and ji ) also became pcr-positive again after a period of one or more days characterized by undetectable levels. in the anal swabs, viral genetic material was rarely detected. only at a few time points, three macaques tested positive, with a maximum viral rna load of x copies/ml at day pi., namely in cynomolgus macaque j . one animal tested positive for viral rna in blood at a single time point; r at day pi. (s table) . notably, no significant differences in viral rna loads were calculated between the macaque species (fig b) . body temperature, activity, clinical symptoms and blood parameters after sars-cov- infection body temperature and activity of each animal was continuously monitored using a physiotel digital telemetric device during the entire study. upon infection, elevated body temperatures were measured in both macaque species, which could be correlated to the episodes of viral replication in the nose and trachea as was evidenced by qrt-pcr. in fig , we show the body temperature alterations from the baseline during the study. in both groups of animals, the body temperature was significantly higher during the first two weeks after infection as compared to later time points (fig ) . the temperature curves for the individual animals are depicted in the supplementary data (s fig). the group of cynomolgus macaques showed elevated body temperature in the first to days following infection. this is in contrast with the measurements of the rhesus macaques where no substantial rise in temperature was measured, except for two animals (r and r ) that showed a sudden peak in body temperature of . °c at day pi. we applied a clinical scoring list to enumerate clinical symptoms that may be caused by the sars-cov- infection (s table) . the cumulative clinical scores per week did not exceed (of a maximum score per week; data not shown), confirming the absence of serious covid- -related symptoms. 
however, in the second week of infection, cynomolgus macaques showed more, but still mild, clinical symptoms than rhesus macaques. this was less evident during the first and third weeks, probably due to outlier clinical scores of individual animals (fig ) . blood samples were analyzed for changes in cell subsets and in biochemical parameters. these data were related to a set of normal (standard) values derived from uninfected, healthy rhesus and cynomolgus macaques from the same breeding colony. no significant deviations from the normal values were seen in blood cell subsets of the infected monkeys. c-reactive protein levels, which are increased in covid- patients with pneumonia [ ], were not elevated in infected macaques. in humans, acute kidney injury has been related to ] , and elevated levels of serum creatinine and blood urea were detected in - % of a cohort of . hence, we measured creatinine and urea levels in macaque blood samples at days , , , , and pi., but did not find evidence of kidney malfunction in the infected, but otherwise seemingly healthy, monkeys. equally, depending on the severity of the disease, blood coagulation disorders, like highly elevated d-dimer levels, chest cts of the macaques after infection revealed several manifestations of covid- with a variable time course and lung involvement ( table ). the most common lesion types found in both rhesus and cynomolgus macaques were ground glass opacities (ggo), consolidations, and crazy paving patterns (cpp) (fig ) . lung lesions (max. ct score / ) were already seen in cts early after infection, on day , in out of monkeys: three rhesus and two cynomolgus macaques. thereafter, lung involvement was seen in most animals and ct scores increased. 
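the ct scores reported here come from a semi-quantitative system that the methods describe as the sum of lobar scores: each zone is graded by its percentage of involvement, and a small increase or decrease can be added to reflect changes in lesion density. the arithmetic can be sketched in python as below. the percentage thresholds and the size of the density adjustment are not legible in this text, so `edges` and the adjustment values are left as caller-supplied parameters, and the numbers in the usage lines are hypothetical placeholders rather than the study's values. the function names are also introduced here for illustration only.

```python
from bisect import bisect_right


def lobe_score(percent_involved, edges):
    """Map percent involvement of one lobe to an ordinal score.

    `edges` are ascending percentage thresholds supplied by the caller
    (hypothetical here; the source's values are not given in this text).
    """
    if not 0.0 <= percent_involved <= 100.0:
        raise ValueError("percent_involved must be between 0 and 100")
    if percent_involved == 0.0:
        return 0  # no involvement
    # Score 1 below the first edge, then one step up per edge reached.
    return 1 + bisect_right(edges, percent_involved)


def total_ct_score(lobe_percents, edges, density_adjustments=None):
    """Sum the lobar scores, plus optional per-lobe density adjustments."""
    adjustments = density_adjustments or [0.0] * len(lobe_percents)
    if len(adjustments) != len(lobe_percents):
        raise ValueError("one adjustment per lobe is required")
    return sum(lobe_score(p, edges) + a
               for p, a in zip(lobe_percents, adjustments))


# Usage with made-up thresholds and a made-up density adjustment.
example_edges = (5.0, 25.0, 50.0, 75.0)  # hypothetical placeholders
print(total_ct_score([0.0, 10.0, 80.0], example_edges))
print(total_ct_score([0.0, 10.0], example_edges, [0.0, 0.5]))
```

keeping the thresholds as an argument means the same function can be reused once the actual scoring table from the cited scoring system is available, and two independent readers' totals can be compared lobe by lobe.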
around days and pi., lung lesions were manifest in all animals, and in several macaques the coverage had increased (table ) the cytokine profiles after sars-cov- were highly comparable between species, except for ip- and mcp- , suggesting differential involvement of monocyte activation between the two species. the similarity in cytokine response after sars-cov- infection contrasts with observations made after infection of macaques with another respiratory virus, pandemic h n influenza [ ] . in that study, macaque species-specific cytokine responses (il- , mcp- , il- , il- ra, mip- α, and il- ) were induced upon infection with ph n , highlighting the virus type-specific reaction of the chemokine system. unlike most published studies, we decided not to conduct a necropsy on animals early, - days, post-infection. at that time point after infection, evidence was found for acute viral interstitial pneumonia [ , , , ] . instead, we performed ct imaging to visualize lung pathology induced by sars-cov- . in humans, the sensitivity of ct scanning for lung pathology is high (positive predictive value of %), but the type of lesions found are not covid- specific, and can also be observed in a number of other infectious and non-infectious diseases [ , ] . in this study, we used purpose-bred nhps with a well-documented health status and we could compare the scans with a ct obtained just before infection. therefore, ct imaging provides a valuable tool to specifically monitor the progression of covid- -related lung pathology during the entire course of the study. based on the criteria set to determine clinical severity [ ], the macaques in our panel featured moderate disease levels as all eight individuals show levels of pneumonia. in another study using only cynomolgus macaques and using ct imaging as well, lesions were found as early as days post-infection in infected animals [ ] . 
type-wise, the lung lesions described in that report were comparable to the ones in this communication, but they tend to be located deeper in the lungs. an explanation for this difference may be that the method of instillation of the virus is the underlying cause. studies [ , , , ] . this demonstrates that tracheal swabs are a good alternative for bal sampling. in addition, the collection of tracheal swabs is a less invasive technique that causes relatively minor discomfort to the animals. in most sars-cov- studies in non-human primates, the animals are euthanized shortly after infection in the first week, or after a period of weeks. the animals from this study were not euthanized to be able to perform re-infection studies or to monitor them for late clinical signs, or co-morbidities related to we conclude that the course of sars-cov- infection of both macaque species is highly similar, indicating that they are equally suitable models to test vaccines and antivirals in a preclinical setting for safety and efficacy. the macaque model for sars-cov- infection in humans manifests important virological aspects of this disease in humans. given their immunological and physiological resemblance to humans, nhps likely will continue to play a pivotal role in research for both covid- prophylactic and therapeutic treatments. four indian-origin rhesus macaques and four cynomolgus macaques were used in this study (s table) . all macaques were mature, outbred animals, purpose-bred, and housed at the bprc. the animals were in good physical health with normal baseline biochemical and hematological values. all were pair-housed with a socially compatible cage-mate in cages of at least m with bedding to allow foraging and were kept on a -hour light/dark cycle. the monkeys were offered a daily diet consisting of monkey food pellets (ssniff, soest, germany) supplemented with vegetables and fruit. 
enrichment was provided daily in the form of pieces of wood, mirrors, food puzzles, and a variety of other homemade or commercially available enrichment products. drinking water was available ad libitum via an automatic watering system. animal care staff provided daily visual health checks before infection, and twice-daily after infection. the animals were monitored for appetite, general behavior, and stool consistency. all possible precautions were taken to ensure the welfare and to avoid any discomfort to the animals. all experimental interventions (intratracheal and intranasal infection, swabs, blood samplings, and ct scans) were performed under anesthesia. the animals were infected with the sars-cov- strain betacov/bavpat / . this strain was isolated from a patient who traveled from china to germany, and an aliquot of a vero e cell culture was made available through the european virus archive-global (evag). the viral stock for the infection study was propagated on vero e cells. for this study, a fifth passage virus stock was prepared with a titer of . x tcid per ml. the integrity of the virus stock was confirmed by sequence analysis. three weeks before the experimental infection, a physiotel digital device (dsi implantable telemetry, data sciences international, harvard bioscience, uk) was implanted in the abdominal cavity of each animal. this device allowed the continuous real-time measurement of the body temperature and the animals' activity remotely using telemetry throughout the study. at day , all animals were exposed to a dose of x tcid of sars-cov- , diluted in ml phosphate buffered saline (pbs). the virus was inoculated via a combination of the intratracheal route ( . ml) and intranasal route ( . ml per nostril). virus infection was monitored for days, during which period the animals were checked twice-daily by the animal caretakers and scored for clinical symptoms according to a previously published, adapted scoring system [ ] (s table) . 
a numeric score of or more per observation time point was predetermined to serve as an endpoint and justification for euthanasia. every time an animal was sedated, the body weight was measured. blood was collected using standard aseptic methods from the femoral vein at regular time points post-infection (pi). in parallel, tracheal, nasal, and anal swabs were collected using copan floqswabs (mls, menen, belgium). swabs were placed in ml dmem, supplemented with . % bovine serum albumin (bsa), fungizone ( . μg/ml), penicillin ( u/ml), and streptomycin ( for the final reconstruction, the expiration phases were exclusively used and manually selected. a semi-quantitative scoring system for chest ct evaluation was used to estimate sars-cov- -induced lung disease [ , ] . quantification of the cts was performed independently by two persons based on the sum of the lobar scores. the degree of involvement in each zone was scored as: for no involvement, for < %, for - %, for - %, for - % and for >= % involvement. an additional increase or decrease of . was used to indicate alterations in ct density of the lesions. by using this scoring system, a a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease tissue distribution of ace protein, the functional receptor for sars coronavirus. 
a first step in understanding sars pathogenesis estimates of the severity of coronavirus disease : a model-based analysis severe covid- stool viral rna testing, and outcomes the emerging spectrum of covid- neurology: clinical, radiological and laboratory findings outcomes of cardiovascular magnetic resonance imaging in patients recently recovered from covid- ) coronavirus as a possible cause of severe acute respiratory syndrome lethal infection of k -hace mice infected with severe acute respiratory syndrome coronavirus epub / / covid- preclinical models: human angiotensin- converting enzyme transgenic mice a single-dose live-attenuated yf d-vectored sars-cov vaccine candidate stat signaling as double-edged sword restricting viral dissemination but driving severe pneumonia in sars-cov- infected hamsters sars virus infection of cats and ferrets infection and rapid transmission of sars-cov- in ferrets cov- is transmitted via contact and via the air between ferrets nonhuman primate models of human viral infections animal models of mechanisms of sars-cov- infection and covid- pathology pneumonitis and multi-organ system disease in common marmosets (callithrix jacchus) infected with the severe acute respiratory syndrome-associated coronavirus comparative pathology of rhesus macaque and common marmoset animal models with middle east respiratory syndrome coronavirus comparison of nonhuman primates identified the suitable model for covid- animal models for coronaviruses: sars-cov- establishment of an african green monkey model for covid- cov- infection of african green monkeys results in mild respiratory disease discernible by pet/ct imaging and shedding of infectious virus from both respiratory and gastrointestinal tracts characteristic and quantifiable covid- -like abnormalities in ct-and pet/ct-imaged lungs of sars-cov- - infected crab-eating macaques (macaca fascicularis) comparative pathogenesis of covid- , mers, and sars in a nonhuman primate model epub / / 
hydroxychloroquine use against sars-cov- infection in non-human primates cov- infection protects against rechallenge in rhesus macaques ocular conjunctival inoculation of sars-cov- can cause mild covid- in rhesus macaques clinical benefit of remdesivir in rhesus macaques infected with sars-cov- development of an inactivated vaccine candidate for sars-cov- alphavirus-derived replicon rna vaccine induces sars-cov- neutralizing antibody and t cell responses in mice and nonhuman primates ad vaccine protects against sars-cov- in rhesus macaques dna vaccine protection against sars-cov- in rhesus macaques intradermal- delivered dna vaccine provides anamnestic protection in a rhesus macaque sars-cov- challenge model evaluation of the mrna- vaccine against sars-cov- in nonhuman primates a vaccine targeting the rbd of the s protein of sars-cov- induces protective immunity a macaque model of hiv- infection siv infection of rhesus macaques of chinese origin: a suitable model for hiv infection in humans nonhuman primate model of tuberculosis experimental animal modelling for tb vaccine development pandemic swine-origin h n influenza virus replicates to higher levels and induces more fever and acute inflammatory cytokines in cynomolgus versus rhesus monkeys and can replicate in common marmosets c-reactive protein levels in the early stage of covid- kidney involvement in covid- and rationale for extracorporeal therapies management of acute kidney injury in patients with covid- kidney disease is associated with in-hospital death of patients with covid- severe hypercoagulability in patients admitted to intensive care unit for acute respiratory covid- and its implications for thrombosis and anticoagulation respiratory disease in rhesus macaques inoculated with sars-cov- the cytokine storm in covid- : an overview of the involvement of the chemokine/chemokine-receptor system clinical features of patients infected with novel coronavirus in wuhan a role for ct in covid- ? 
what data really tell us so far radiological society of north america expert consensus statement on reporting chest ct findings related to covid- . endorsed by the society of thoracic radiology, the american college of radiology, and rsna -secondary publication chinese experience and recommendations concerning detection, staging and follow-up thoracic radiography as a refinement methodology for the study of h n influenza in cynomologus macaques (macaca fascicularis) detection of novel coronavirus ( -ncov) by real-time rt-pcr time course of lung changes at chest ct during recovery from coronavirus disease (covid- ) two serological approaches for detection of antibodies to sars-cov- in different scenarios: a screening tool and a point-of-care test a) viral rna quantification in tracheal and nasal swabs of rhesus and cynomolgus macaques by qrt-pcr. the limit of quantification ( rna copies/ml) is indicated by the dotted horizontal line. (b) total virus loads in throat and nose samples of macaques throughout the study. horizontal bars represent geometric means the different colors used for each animal as shown in the legend of a are used to denote the same individual in all figures of this manuscript. the group of rhesus macaques is indicated by yellow to red colors; cynomolgus macaques by green to blue the body temperature was measured by telemetry throughout the study. the daily average body temperature of rhesus and cynomolgus macaques was calculated and the deviations from baseline body temperature (in °c) are depicted cumulative clinical scores. the cumulative clinical scores were calculated per week and per individual animal (day - , - and - ). 
horizontal bars represent medians types of lung lesions detected via ct scans in sars-cov- -infected macaques ground glass opacities (ggo), (b) consolidations, and (c) crazy paving patterns (ccp) key: cord- -zxg dsm authors: bernasconi, anna; canakoglu, arif; pinoli, pietro; ceri, stefano title: empowering virus sequences research through conceptual modeling date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zxg dsm the pandemic outbreak of the coronavirus disease has attracted attention towards the genetic mechanisms of viruses. we hereby present the viral conceptual model (vcm), centered on the virus sequence and described from four perspectives: biological (virus type and hosts/sample), analytical (annotations and variants), organizational (sequencing project) and technical (experimental technology). vcm is inspired by gcm, our previously developed genomic conceptual model, but it introduces many novel concepts, as viral sequences significantly differ from human genomes. when applied to sars-cov virus, complex conceptual queries upon vcm are able to replicate the search results of recent articles, hence demonstrating huge potential in supporting virology research. in addition to vcm, we also illustrate the data dictionary for patient’s phenotype used by the covid- host genetic initiative. our effort is part of a broad vision: availability of conceptual models for both human genomics and viruses will provide important opportunities for research, especially if interconnected by the same human being, playing the role of virus host as well as provider of genomic and phenotype information. despite the advances in drug and vaccine research, diseases caused by viral infection pose serious threats to public health, both as emerging epidemics (e.g., zika virus, middle east respiratory syndrome coronavirus, measles virus, or ebola virus) and as globally well-established epidemics (such as human immunodeficiency virus, dengue virus, hepatitis c virus). 
the pandemic outbreak of the coronavirus disease covid- , caused by the "severe acute respiratory syndrome coronavirus " virus species sars-cov (according to the genbank [ ] acronym), has brought unprecedented attention towards the genetic mechanisms of coronaviruses. thus, understanding viruses from a conceptual modeling perspective is very important. the sequence of the virus is the central information, along with its annotated parts (known genes, coding and untranslated regions...) and the nucleotide variants with respect to the reference sequence for the specific species. each sequence is identified by a strain name, which belongs to a specific virus species. viruses have complex taxonomies, as they belong to genera, sub-families, and finally families (e.g., coronaviridae). other important aspects include the host organisms and isolation sources from which viral materials are extracted, the sequencing project, and the scientific and medical publications related to the discovery of sequences; virus strains may be searched and compared intra- and cross-species. luckily, all these data are made publicly available by various resources, from which they can be downloaded and re-distributed. our recent work is focused on data-driven genomic computing, providing contributions in the area of modeling, integration, search and query answering. we have previously proposed a conceptual model focused on human genomics [ ], which was based on a central entity item, representing files of genomic regions. the simple schema evolved into a knowledge graph [ ], including ontological representation of many relevant attributes (e.g., diseases, cell lines, tissue types...). the approach was validated through the practical implementation of the integration pipeline meta-base, which feeds an integrated database, searchable through the genosurf interface [ ].
very recently we have also been involved in the covid- host genetics initiative, a collaborative effort that aims at joining forces of the broader human genetics community to generate, share, and analyze data to learn the genetic characteristics and outcomes of covid- . in this project, we built a conceptual data definition (and related questionnaire) for describing the phenotype of covid- , to be used by clinicians who contribute to the project. thus, we created a conceptually solid definition of the clinical information of patients affected by covid- , acting as hosts to the sars-cov virus. based on these considerations, in this paper we contribute as follows: -we propose a new viral conceptual model (vcm), a general conceptual model for describing viral sequences, organized along specific dimensions that highlight a conceptual schema similar to gcm [ ] ; -focusing on sars-cov , we show how vcm can be profitably linked to a phenotype database with information on covid- infected patients; -we provide a list of interesting queries replicating newly released literature on infectious diseases; these can be easily performed on vcm. the manuscript is organized as follows: section overviews current technologies available for virus sequence data management. section proposes our vcm, while section shows its possible intersection with a general clinical database. we show examples of applications in section and review related works in section . section discloses our vision for future developments. the landscape of relevant resources and initiatives dedicated to data collection, retrieval and analysis of virus sequences is shown in fig. . we partitioned the space of contributors by considering: institutions that host data sequences, main sequence databases, tools provided for querying and searching them, and then organizations and tools hosting data analysis interfaces that also connect to viral sequence databases. 
the three main organizations providing open-source viral sequences are ncbi (us), ddbj (japan), and embl-ebi (europe); they operate within the broader context provided by the international nucleotide sequence database collaboration (insdc). ncbi hosts the two most relevant viral sequence databases so far: genbank [ ] contains the annotated collection of all publicly available dna and rna sequences; refseq [ ] provides a stable reference for genome annotation, gene identification and characterization, and mutation/polymorphism analysis. genbank is continuously updated thanks to the abundant sharing of multiple laboratories and data contributors around the world (note that sars-cov nucleotide sequences have increased from about around the end of march , to , as of april th). embl-ebi hosts the european nucleotide archive [ ], which has a broader scope, accepting submissions of nucleotide sequencing information, including raw sequencing data, sequence assembly information and functional annotations. several tools are available for querying and searching these databases; among them, e-utilities [ ], ncbi virus [ ], and pathogens are tools and portals directly provided by the insdc institutions to support access to their viral resources, but they lack the possibility of querying based on annotations and variants. a number of databases and data analysis tools refer to these viral sequence databases.
we mention: viralzone [ ] by the sib swiss institute of bioinformatics, which provides access to sars-cov proteome data as well as cross-links to complementary resources; the virus pathogen database and analysis resource (vipr, [ ]), an integrated repository of data and analysis tools for multiple virus families, supported by the bioinformatics resource centers program; virusite [ ], an integrated database for viral genomics; the viral genome organizer, implemented by the canadian viral bioinformatics research centre, focusing on search for sub-sequences within genomes. while the insdc consortium provides full open access to sequences, the gisaid initiative [ , ] was created in with the explicit purpose of offering an alternative to traditional public-domain data archives, as many scientists hesitated to share influenza data due, among other reasons, to their legitimate concern about not being properly acknowledged. gisaid hosts epiflu tm , a large sequence database, which started its mission for influenza data and is now expanding with epicov tm , which has a particular focus on the sars-cov pandemic ( , sequences for sars-cov on april th). some interesting portals have become interfaces to gisaid data with particular focuses: nextstrain [ ] overviews emergent viral outbreaks based on the visualization of sequence data integrated with geographic information, serology, and host species; cov-glue, part of the glue suite [ ], contains a database of replacements, insertions and deletions observed in sequences sampled from the pandemic. many other resources link to viral sequence data, including: drug databases, particularly interesting as they provide information about clinical studies (see clinicaltrials ), protein sequence databases (e.g., uniprotkb/swiss-prot [ ]), and cell line databases (e.g., cellosaurus [ ]).
we previously proposed the genomic conceptual model (gcm, [ ] ), an entity-relationship diagram that recognizes a common organization for a limited set of concepts supported by most genomic data sources, although with different names and formats. the model is centered on the item entity, representing an elementary experimental file of genomic regions and their attributes. four views depart from the central entity, recalling a classic star-schema organization that is typical of data warehouses [ ] ; they respectively describe: i) the biological elements involved in the experiment: the sequenced sample and its preparation, the donor or patient; ii) the technology used in the experiment, including a specific assay (i.e., technique); iii) the management aspects: the projects/organizations involved in the preparation and production; iv) the extraction parameters used for internal selection and organization of items. gcm is employed as a driver of integration pipelines for genomic datasets, fuelling user search-interfaces such as genosurf [ ] . lessons learnt from that experience include the benefits of having: a central fact entity that helps structuring the search; a number of surrounding dimensions capturing organization, biological and experimental conditions to describe the facts; a data layout that is easy to learn for first-time users and that helps the answering of practical questions (as demonstrated in [ ] ). we hereby propose the viral conceptual model (vcm), which is influenced by our past experience with human genomes. there are significant differences between the two conceptual models. the human dna sequence is long ( billions of base pairs) and has been understood in terms of reference genomes (named h and grch ) to which all other information is referred, including genetic and epigenetic signals. 
instead, there are many viruses, their sequences are short (on the order of thousands of base pairs), and each virus has its own reference sequence; moreover, virus sequences are associated with a host sample of another species. from a bird's-eye view, the vcm is centered on the sequence entity that describes individual virus sequences; sequences are analyzed from the biological perspective (hostsample and virus), the technological perspective (experimenttype), and the organizational perspective (sequencingproject). two other entities, annotation and variant, represent an analytical perspective of the sequence, allowing the analysis of its characteristics, its sub-parts, and the differences with respect to reference sequences for the specific virus species. we next illustrate the central entity and the four perspectives.

central entity. a viral sequence can regard either dna or rna; in either case, databases and sequencing data write the sequence as a dna nucleotidesequence (i.e., guanine (g), adenine (a), cytosine (c), and thymine (t)) that has a specific strand (positive or negative), length (typically thousands), and a percentage of read g and c bases (gc%). each sequence is uniquely identified by an accessionid, which is retrieved directly from the source database (genbank identifiers are usually formed by two capital letters followed by six digits, gisaid identifiers by the string "epi isl " and six digits). sequences can be complete or partial (as encoded by the boolean flag iscomplete) and they can be a reference sequence (stored in refseq) or a regular one (encoded by isreference). in the latter case, sequences have a corresponding strainname assigned by the sequencing lab, which often hard-codes relevant information (e.g., hcov- /nepal/ / or -ncov ph ncov ).

technological perspective. the sequence derives from one experiment or assay, described in the experimenttype entity (cardinality is :n from the dimension towards the fact).
it is performed on biological material analysed with a given sequencingtechnology platform (e.g., illumina miseq) and an assemblymethod, which collects the algorithms that have been applied to obtain the final sequence, for example: bwa-mem, to align sequence reads against a large reference genome; bcftools, to manipulate variant calls; megahit, to assemble ngs reads. another technical measure is captured by coverage (e.g., x or x).

biological perspective. each sequence belongs to a specific virus, which is described by a complex taxonomy. the most precise definition is the speciesname (e.g., severe acute respiratory syndrome coronavirus ), corresponding to a speciestaxonid (e.g., ), related to a simpler genbankacronym (e.g., sars-cov ) and to many comparable forms, contained in the equivalentlist (e.g., -ncov, covid- , sars-cov- , sars , wuhan coronavirus, wuhan seafood market pneumonia virus, ...). the species belongs to a genus (e.g., betacoronavirus), part of a subfamily (e.g., orthocoronavirinae), finally falling under the most general category of family (e.g., coronaviridae). each virus species corresponds to a specific moleculetype (e.g., genomic rna, viral crna, unassigned dna), which has either a double- or single-stranded structure; in the latter case the strand may be either positive or negative. these possibilities are encoded within the issinglestranded and ispositivestranded boolean variables. an assay is performed on a tissue extracted from an organism that has hosted the virus for an amount of time; this information is collected in the hostsample entity. the host is defined by a species, corresponding to a speciestaxonid, usually represented using the ncbi taxonomy [ ] (e.g., for homo sapiens).
the sample is extracted on a collectiondate, from an isolationsource that is a specific host tissue (e.g., nasopharyngeal or oropharyngeal swab, lung), in a certain location identified by the quadruple originatinglab (when available), region, country, and geogroup (such as continent). both entities of this perspective are in :n cardinality with the sequence.

organizational perspective. the entity sequencingproject describes the management aspects of the production of the sequence. each sequence is connected to a number of studies, usually represented by a research publication (with authors, title, journal, publicationdate and possibly a pubmedid referring to the most important biomedical literature portal). when a study is not available, just the sequencinglab and submissiondate are provided. on rare occasions, a project is associated with a popset number, which identifies a collection of related sequences derived from population studies (submitted to genbank), or with a bioprojectid (an identifier to the bioproject external database). we also include the name of databasesource, denoting the organization that primarily stores the sequence. in this perspective all cardinalities are :n as sequences can be part of multiple projects; conversely, sequencing projects contain various sequences.

analytical perspective. this perspective stores information that is useful during the secondary analysis of genomic sequences.
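the central entity and the biological, technological, and organizational perspectives described so far can be sketched as plain python data classes. this is only an illustrative reading of the model, not the authors' implementation: the snake_case attribute names are transliterations of the vcm attribute names, only a subset of attributes is shown, and all values in the example (accession id, species, genus, acronym) are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Virus:  # biological perspective
    species_name: str
    genbank_acronym: str
    genus: str
    sub_family: str
    family: str
    molecule_type: str
    is_single_stranded: bool
    is_positive_stranded: Optional[bool] = None  # meaningful only if single-stranded
    species_taxon_id: Optional[int] = None
    equivalent_list: List[str] = field(default_factory=list)


@dataclass
class HostSample:  # biological perspective
    species: str
    collection_date: Optional[str] = None
    isolation_source: Optional[str] = None
    originating_lab: Optional[str] = None
    region: Optional[str] = None
    country: Optional[str] = None
    geo_group: Optional[str] = None


@dataclass
class ExperimentType:  # technological perspective
    sequencing_technology: str
    assembly_method: Optional[str] = None
    coverage: Optional[str] = None


@dataclass
class SequencingProject:  # organizational perspective
    database_source: str
    sequencing_lab: Optional[str] = None
    submission_date: Optional[str] = None
    pubmed_id: Optional[str] = None


@dataclass
class Sequence:  # central entity
    accession_id: str
    nucleotide_sequence: str
    strand: str
    is_reference: bool
    is_complete: bool
    virus: Virus
    host_sample: HostSample
    experiment_type: ExperimentType
    projects: List[SequencingProject] = field(default_factory=list)
    strain_name: Optional[str] = None

    @property
    def length(self) -> int:
        return len(self.nucleotide_sequence)

    @property
    def gc_percentage(self) -> float:
        # share of g and c bases, expressed as a percentage
        if not self.nucleotide_sequence:
            return 0.0
        gc = sum(base in "GC" for base in self.nucleotide_sequence.upper())
        return 100.0 * gc / self.length


# hypothetical example record; none of these values come from a real database
example = Sequence(
    accession_id="XX000000",
    nucleotide_sequence="GGCCAATT",
    strand="positive",
    is_reference=False,
    is_complete=False,
    virus=Virus("example virus species", "EXV", "example genus",
                "example subfamily", "example family", "genomic RNA", True, True),
    host_sample=HostSample(species="homo sapiens", isolation_source="lung"),
    experiment_type=ExperimentType("illumina miseq", "megahit"),
    projects=[SequencingProject(database_source="genbank")],
)
```

length and gc% are derived from the nucleotide sequence here rather than stored, which is one possible design choice; a database implementation might well materialize them.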
annotations include a number of sub-sequences representing segments (defined by start and stop coordinates) of the original sequence with a particular featuretype (e.g., gene, peptide, coding dna region, or untranslated region, molecule patterns such as stem loops and so on), the recognized genename to which it belongs (e.g., gene "e"), the product it concurs to produce (e.g., leader protein, nsp protein, rna-dependent rna polymerase, membrane glycoprotein, envelope protein...), and optionally an externalreference when the protein is present in a separate database such as uniprotkb. the variant entity contains subsequences of the main sequence that differ from the reference sequence of the same virus species. they can be identified with respect to the reference one, just by using the altsequence (i.e., the nucleotides used in the analyzed sequence at position start coordinate for an arbitrary length, typically just equal to ) and a specific type, which can correspond to insertion (ins), deletion (del), single-nucleotide polymorphism (snp) or others. the content of the attributes of this entity is not retrieved from existing databases; instead it is computed in-house by our procedures. specifically, we use the well-known dynamic programming algorithm of needleman-wunsch [ ], which computes the optimal alignment between two sequences. from a technical point of view, we compute the pair-wise alignment of every sequence to the reference sequence of refseq (nc ); from such alignment we then extract all insertions, deletions, and substitutions that transform (edit) the reference sequence into the considered sequence. a similar computation is performed within cov-glue (http://covglue.cvr.gla.ac.uk/). after the spread of the covid- pandemic, several informal consortia have been created to foster international cooperation among researchers.
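the variant computation described above (global pairwise alignment to the reference, followed by extraction of insertions, deletions, and substitutions) can be sketched as follows. this is a minimal illustration, not the in-house procedure: the scoring values (match, mismatch, gap), the linear gap penalty, the tie-breaking order in the traceback, and the coordinate convention (1-based reference positions; an insertion is reported at the reference position it follows) are all assumptions made for the example. the pure-python dynamic programming table is quadratic in time and memory, so it is only practical for short toy sequences.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Variant:
    start: int         # 1-based reference coordinate (see convention above)
    length: int
    type: str          # "SNP", "INS" or "DEL"
    alt_sequence: str  # bases in the analyzed sequence ("" for a deletion)


def needleman_wunsch(ref: str, seq: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Return one optimal global alignment as two gapped strings."""
    n, m = len(ref), len(seq)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == seq[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # traceback from the bottom-right corner, preferring diagonal moves
    a_ref, a_seq = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == seq[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a_ref.append(ref[i - 1]); a_seq.append(seq[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a_ref.append(ref[i - 1]); a_seq.append("-"); i -= 1
        else:
            a_ref.append("-"); a_seq.append(seq[j - 1]); j -= 1
    return "".join(reversed(a_ref)), "".join(reversed(a_seq))


def extract_variants(aligned_ref: str, aligned_seq: str) -> List[Variant]:
    """Read the edits that turn the reference into the analyzed sequence."""
    variants: List[Variant] = []
    ref_pos = 0  # number of reference bases consumed so far
    k, total = 0, len(aligned_ref)
    while k < total:
        if aligned_ref[k] == "-":            # run of inserted bases
            end = k
            while end < total and aligned_ref[end] == "-":
                end += 1
            variants.append(Variant(ref_pos, end - k, "INS", aligned_seq[k:end]))
            k = end
        elif aligned_seq[k] == "-":          # run of deleted reference bases
            end = k
            while end < total and aligned_seq[end] == "-":
                end += 1
            variants.append(Variant(ref_pos + 1, end - k, "DEL", ""))
            ref_pos += end - k
            k = end
        else:                                # aligned pair of bases
            ref_pos += 1
            if aligned_ref[k] != aligned_seq[k]:
                variants.append(Variant(ref_pos, 1, "SNP", aligned_seq[k]))
            k += 1
    return variants


def call_variants(ref: str, seq: str) -> List[Variant]:
    return extract_variants(*needleman_wunsch(ref, seq))
```

for example, `call_variants("ACGTACGT", "ACGAACGT")` reports a single snp at reference position 4 with alternative base `A`. for full-length viral genomes, an optimized alignment library would be a more realistic choice than this toy implementation.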
we participate in the covid- host genetics initiative, aiming at bringing together the human genetics community to generate, share and analyze data to learn the genetic determinants of covid- susceptibility, severity and outcomes. in this setting, we are coordinating the production of a data dictionary for the phenotype definition, which will be used as a reference by participating institutions, hosted by ega [ ], the european genome-phenome archive of embl-ebi. the dictionary, illustrated in fig. , contains patient phenotype information, collected at admission and during the course of hospitalizations (hosted by a given hospital); each patient can be connected to a virus sequence (in that case, the patient is the host organism providing the hostsample of vcs) and can have multiple encounters. for ease of visualization, attributes are clustered within attribute groups, indicated with white squares instead of black circles. note that the dictionary representation deviates from a classic entity-relationship diagram as some attribute groups would typically deserve the role of entity; however, this simple format allows an easy mapping of the dictionary to questionnaires and an implementation by ega in the form of a spreadsheet. attribute groups of patients describe: demography&exposure, riskfactors, comorbidities, admissionsymptoms, hospitalizationcourse; attribute groups of encounters describe: encountersymptoms, treatments, laboratoryresults. attributes within groups can be further clustered within subgroups; for instance, comorbidities include the subgroups immunesystem, respiratory, genitourinary, cardiovascular, neurological, cancer. the data dictionary includes two possible uses in further analysis (i.e., the course of hospitalization and longitudinal studies); for these uses we set each attribute to either mandatory or optional.
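the per-use mandatory/optional marking described above can be sketched as a small lookup structure. the group names are taken from the dictionary description, but the individual attribute names (age, smoking, fever), their assignment to uses, and the use labels are hypothetical, chosen only to make the example concrete; the actual dictionary content is not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List

# labels for the two uses mentioned in the text (names are assumptions)
HOSPITALIZATION = "hospitalization_course"
LONGITUDINAL = "longitudinal_study"


@dataclass(frozen=True)
class Attribute:
    name: str
    group: str                     # attribute group, e.g. "riskfactors"
    mandatory_for: FrozenSet[str]  # uses for which the attribute is mandatory


# hypothetical excerpt of a patient-level dictionary
DICTIONARY: List[Attribute] = [
    Attribute("age", "demography&exposure", frozenset({HOSPITALIZATION, LONGITUDINAL})),
    Attribute("smoking", "riskfactors", frozenset({LONGITUDINAL})),
    Attribute("fever", "admissionsymptoms", frozenset({HOSPITALIZATION})),
]


def missing_mandatory(record: Dict[str, Any], use: str) -> List[str]:
    """List the attributes that are mandatory for `use` but absent from `record`."""
    return [a.name for a in DICTIONARY
            if use in a.mandatory_for and record.get(a.name) is None]


record = {"age": 54, "fever": True}
print(missing_mandatory(record, HOSPITALIZATION))  # attributes missing for this use
print(missing_mandatory(record, LONGITUDINAL))
```

keeping the mandatory flag per use, rather than per attribute, lets the same record be accepted for one analysis and flagged as incomplete for another.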
in addition to very general questions that can be easily asked through our conceptual model (e.g., retrieve all viruses with given characteristics), in the following we propose a list of interesting application studies that could be backed by the use of our conceptual model. in particular, they refer to sars-cov virus as it is receiving most of the attention of the scientific community. fig. represents the reference sequence of sars-cov , highlighting the major structural sub-sequences that are relevant for the encoding of proteins and other functions. it has region annotations, of which fig. represents only the genes (orf ab, s, orf a, e, m, orf , orf a, orf b, orf , n, orf ) plus the rna-dependent rna polymerase enzyme, with approximate indication of the corresponding coordinates. we next describe biological queries supported by vcm, from the easiest to the most complex ones, typically suggested by existing studies. q . the most common variants found in sars-cov sequences can be selected for us patients; the query can be performed only on specific genes. country is in blue as samples will be distributed according to such field. q . according to [ ] , e and rdrp genes are highly mutated and thus crucial in diagnosing covid- disease; first-line screening tools of -ncov should perform an e gene assay, followed by confirmatory testing with the rdrp gene assay. conceptual queries are concerned with retrieving all sequences with mutations within genes e or rdrp and relating them to given hosts, e.g. humans affected in china. q . tang et al. [ ] claim that there are two clearly definable "major types" (s and l) of sars-cov in this outbreak, that can be differentiated by transmission rates. intriguingly, the s and l types can be clearly distinguished by just two tightly linked snps at positions , (within the orf ab gene from c to t) and , (within orf from t to c). 
then, queries can correlate these snps to other variants or the outbreak of covid- in specific countries (e.g., [ ]). q . to inform sars-cov vaccine design efforts, it may be necessary to track antigenic diversity. typically, pathogen genetic diversity is categorised into distinct clades (i.e., a monophyletic group on a phylogenetic tree). these clades may refer to 'subtypes', 'genotypes', or 'groups', depending on the taxonomic level under investigation. in [ ], specific sequence variants are used to define clades/haplogroups (e.g., the a group is characterized by the , and , nucleotides, originally c mutated to t, by the , nucleotide t mutated to c, and by the , , from a to g). vcm supports all the information required to replicate the definition of sars-cov clades requested in the study. fig. illustrates the conjunctive selection of sequences with all four variants corresponding to the a clade group defined in [ ] and the resulting retrieved sequences. q . morais junior et al. [ ] propose a subdivision of the global sars-cov population into sixteen subtypes, defined using "widely shared polymorphisms" identified in nonstructural (nsp , nsp , nsp , nsp , nsp and nsp ) cistrons, structural (spike and nucleocapsid), and accessory (orf ) genes. vcm supports all the information required to replicate the definition of such subtypes. the above examples of complex queries refer to virus sequences and can be answered by vcm (fig. ). due to the pressing interest in sars-cov , we are currently making an effort to collect sars-cov sequences and provide a search interface for a first release of a vcm-based query engine. even more interesting queries will be enabled by combining phenotypes with virus sequences; along this direction, we also contributed to the data dictionary effort (fig. ). once both datasets are accessible, more powerful studies will be possible.
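the conjunctive selection used in the clade query above (retrieve the sequences that carry all variants defining a group) can be sketched with plain set containment. the accession ids and the coordinates 100, 200, 300, and 400 are placeholders, since the actual positions are not legible in this text; only the base changes (two c to t, one t to c, one a to g) follow the description of the clade. the in-memory dictionary stands in for whatever store a vcm-based query engine would use.

```python
from typing import Dict, List, Set, Tuple

# a variant is (start, reference base, alternative base)
VariantKey = Tuple[int, str, str]

# placeholder definition of a clade by four substitutions (hypothetical positions)
CLADE_VARIANTS: Set[VariantKey] = {
    (100, "C", "T"),
    (200, "C", "T"),
    (300, "T", "C"),
    (400, "A", "G"),
}


def sequences_with_all(variants_by_sequence: Dict[str, Set[VariantKey]],
                       required: Set[VariantKey]) -> List[str]:
    """Return accession ids whose variant set contains every required variant."""
    return sorted(acc for acc, found in variants_by_sequence.items()
                  if required <= found)


# hypothetical variant sets, keyed by hypothetical accession ids
variants_by_sequence = {
    "XX000001": set(CLADE_VARIANTS),
    "XX000002": {(100, "C", "T"), (300, "T", "C")},
    "XX000003": CLADE_VARIANTS | {(500, "G", "A")},
}

clade_members = sequences_with_all(variants_by_sequence, CLADE_VARIANTS)
```

here `clade_members` contains the first and third accession ids, because the second sequence lacks two of the four defining variants. the same pattern covers the two-snp distinction between types in the earlier query, with a two-element set of required variants.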
some early findings connecting virus sequences with phenotypes have already been published, so far with very small datasets (e.g., [ ] with only patients, [ ] with patients, and [ ] with sequenced sars-cov genomes). as reaffirmed by these works, there is a need for additional comprehensive studies linking the viral sequences of sars-cov to the phenotype of patients affected by covid- . we are confident that in the near future there will be many more studies like [ , , ]. the use of conceptual modeling to describe genomics databases dates back to the late nineties, including a functional model for dna databases named "associative information structure" [ ]; a model representing genomic sequences [ ]; and a set of data models for describing transcription/translation processes [ ]. later on, a stream of works on conceptual modeling-based data warehouses includes the gedaw uml conceptual schema [ ], driving the construction of a gene-centric data warehouse for microarray expression measurements; the genomics unified schema [ ]; the genome information management system [ ], a genome-centric data warehouse; and the genemapper warehouse [ ], integrating expression data from a number of genomic sources. more recently, there has been a solid stream of works dedicated to data quality-oriented conceptual modeling: [ ] presents the human genome conceptual model and [ ] applies it to uncover relevant information hidden in genomics data lakes. conceptual modeling has been mainly concerned with aspects of the human genome, even when more general approaches were adopted; in [ ] we presented the genomic conceptual model (gcm), describing the metadata associated with genomic experimental datasets available for humans or other model organisms; gcm was essential for driving the data integration pipeline and building search interfaces [ ].
among the many types of genomic databases [ ], several resources are dedicated to viruses [ ]; however, very few works relate to conceptual data modeling. among them, [ ] considers host information and normalized geographical location, and [ ] focuses on influenza a viruses. the work closest to ours, described in [ ], is a flexible software system for querying virus sequences; it includes a basic conceptual model. in comparison, vcm covers more dimensions, which are useful for supporting research queries on virus sequences. this paper responds to an urgent need: understanding the conceptual properties of sars-cov so as to facilitate research studies. however, the model applies to any type of virus, and will form the basis for the development of new instruments. in the past, we first presented the conceptual model for human genomics [ ], then we developed the web-based search system genosurf [ ]; our ongoing effort is to develop a search system for viral conceptual schemas, inspired by genosurf. while the need for data is pressing, there is also a need for conceptually well-organized information. in our broad vision, the availability of conceptual models for both human genomics and viruses will provide important opportunities for research, amplified to the maximum when human and viral sequences are interconnected by the same human being, playing the role of host of a given virus sequence as well as provider of genomic and phenotype information. in the future we will continue our modeling and integration efforts for virus genetics in the context of humans, by interacting with the community of scholars who study viruses. we may add more discovery-oriented entities to the model, which could be of use in a future scenario, e.g., a new pandemic offspring. a user researching diagnosis could ask, for example, what sequence patterns are unique to the whole or sub-part of the database (i.e., do not appear in viruses within the database).
by contrast, a user working on vaccine development could be interested in which epitopes (i.e., antigen parts to which antibodies attach) cover the whole database or a partition of it, for mhc types prevalent in different infected humans. possibly, other dimensions will be necessary, such as drug resistance information and drug resistance-associated mutations. the european nucleotide archive in gus the genomics unified schema a platform for genomics databases the cellosaurus, a cell-line knowledge resource exploiting conceptual modeling for searching genomic metadata: a quantitative and qualitative empirical study from a conceptual model to a knowledge graph for genomic datasets conceptual modeling for genomics: building an integrated repository of open data designing data marts for data warehouses genosurf: metadata driven semantic search system for integrated genomic datasets detection of novel coronavirus ( -ncov) by real-time rt-pcr gims: an integrated data storage and analysis environment for genomic and functional data a summary of genomic databases: overview and discussion flexible integration of molecular-biological annotation data: the genmapper approach data, disease and diplomacy: gisaid's innovative contribution to global health the ncbi taxonomy database the european genotype archive: background and implementation spread of sars-cov- in the icelandic population integrating and warehousing liver gene expression data and related biomedical resources in gedaw nextstrain: real-time tracking of pathogen evolution virus variation resource-improved response to emergent viral outbreaks viralzone: a knowledge resource to understand virus diversity the global population of sars-cov- is composed of six major subtypes clinical and virological data of the first cases of covid- in europe: a case series influenza a virus informatics: genotype-centered database and genotype annotation genomic characterisation and epidemiology of novel coronavirus: implications
for virus origins and receptor binding imagene: an integrated computer environment for sequence annotation and analysis a general method applicable to the search for similarities in the amino acid sequence of two proteins formal design and implementation of an improved ddbj dna database with a new schema and object-oriented library reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation a method to identify relevant genome data: conceptual modeling for the medicine of precision conceptual modelling of genomic information vipr: an open bioinformatics database and analysis resource for virology research uniprot: a worldwide hub of protein knowledge applying conceptual modeling to better understand the human genome the e-utilities in-depth: parameters, syntax and more. entrez programming utilities help unraveling the web of viroinformatics: computational tools and databases in virus research gisaid: global initiative on sharing all influenza data-from vision to reality glue: a flexible software system for virus sequence data virusite-integrated database for viral genomics named entity linking of geospatial and host metadata in genbank for advancing biomedical research on the origin and continuing evolution of sars-cov- acknowledgements. this research is funded by the erc advanced grant geco (data-driven genomic computing), - . the authors thank prof. limsoon wong for his precious suggestions and inspiration for future works. key: cord- -ywaefpe authors: rodon, jordi; muñoz-basagoiti, jordana; perez-zsolt, daniel; noguera-julian, marc; paredes, roger; mateu, lourdes; quiñones, carles; erkizia, itziar; blanco, ignacio; valencia, alfonso; guallar, víctor; carrillo, jorge; blanco, julià; segalés, joaquim; clotet, bonaventura; vergara-alert, júlia; izquierdo-useros, nuria title: pre-clinical search of sars-cov- inhibitors and their combinations in approved drugs to tackle covid- pandemic date: - - journal: biorxiv doi: . 
/ . . . sha: doc_id: cord_uid: ywaefpe there is an urgent need to identify novel drugs against the new coronavirus. although different antivirals are given for the clinical management of sars-cov- infection, their efficacy is still under evaluation. here, we have screened existing drugs approved for human use in a variety of diseases, to compare how they counteract sars-cov- -induced cytopathic effect and viral replication in vitro. among the potential antivirals tested herein that were previously proposed to inhibit sars-cov- infection, only % had in vitro antiviral activity. moreover, only eight families had an ic below µm or iu/ml. these include chloroquine derivatives and remdesivir, along with plitidepsin, cathepsin inhibitors, nelfinavir mesylate hydrate, interferon -alpha, interferon-gamma, fenofibrate and camostat. plitidepsin was the only clinically approved drug displaying nanomolar efficacy. four of these families, including novel cathepsin inhibitors, blocked viral entry in a cell-type specific manner. since the most effective antivirals usually combine therapies that tackle the virus at different steps of infection, we also assessed several drug combinations. although no particular synergy was found, inhibitory combinations did not reduce their antiviral activity. thus, these combinations could decrease the potential emergence of resistant viruses. antivirals prioritized herein identify novel compounds and their mode of action, while independently replicating the activity of a reduced proportion of drugs which are mostly approved for clinical use. combinations of these drugs should be tested in animal models to inform the design of fast track clinical trials.
a novel betacoronavirus, the severe acute respiratory syndrome coronavirus (sars-cov- ), is causing a respiratory disease pandemic that began in wuhan, china, in november , and has now spread across the world (chen et al., ) . to date, remdesivir is the only approved antiviral drug for the specific treatment of this coronavirus infectious disease or covid- (beigel et al., ; grein et al., ) .
however, several drugs are being used on the frontline of clinical management of sars-cov- -infected individuals in hospitals all around the world, to try to avoid the development of covid- -associated pneumonia, which can be fatal. by the end of september , almost a million people had died from covid- , and over million people had been infected (who situation report). although different drug regimens are being applied to hospitalized patients, no clinical study has yet demonstrated their efficacy. under this scenario, initiatives launched by the world health organization (who), such as the solidarity study that has compared remdesivir, hydroxychloroquine, ritonavir/lopinavir and ritonavir/lopinavir plus β-interferon regimens, have been of critical importance to prioritize the use of the most active compounds (who, ). unfortunately, although remdesivir has proven efficacy in randomized controlled trials (beigel et al., ; grein et al., ), a recent update of the who clinical trial has failed to detect any effect on overall mortality, initiation of ventilation and duration of hospital stay with any of the antivirals tested (pan et al., ). thus, there is an urgent need to identify novel therapeutic approaches for individuals with covid- developing severe disease and fatal outcomes. in this report we present a prioritized list of effective compounds with proven antiviral efficacy in vitro to halt sars-cov- replication. compounds were analyzed according to their expected mechanism of action, to identify candidates tackling diverse steps of the viral life cycle. sars-cov- entry requires viral binding and spike protein activation via interaction with the cellular receptor ace and the cellular protease tmprss (hoffmann et al., ), a mechanism favored by viral internalization via endocytosis. interference with either of these processes has been shown to decrease sars-cov- infectivity (hoffmann et al., ; monteil et al.
), and therefore, inhibitors targeting viral entry may prove valuable. in addition, sars-cov- enters cells via endocytosis and accumulates in endosomes where cellular cathepsins can also prime the spike protein and favor viral fusion upon cleavage (hoffmann et al., ; mingo et al., ; simmons et al., ), providing additional targets for antiviral activity. once sars-cov- fuses with cellular membranes, it triggers viral rna release into the cytoplasm, where polyproteins are translated and cleaved by proteases (song et al., ). this leads to the formation of an rna replicase-transcriptase complex driving the production of negative-stranded rna via both replication and transcription (song et al., ). negative-stranded rna is transcribed into positive-stranded rna genomes, allowing for the translation of viral nucleoproteins, which assemble into viral capsids in the cytoplasm (song et al., ). these capsids then bud into the lumen of endoplasmic reticulum (er)-golgi compartments, and viruses are finally released into the extracellular space by exocytosis. potentially, any of these viral cycle steps could be targeted with antivirals, so we also searched for compounds acting at these steps. finally, as the most effective antiviral treatments are usually based on combined therapies that tackle distinct steps of the viral life cycle, we also tested the active compounds in combination. these combinations may be critical to abrogate the potential emergence of resistant viruses and to increase antiviral activity, improving the chances of a better clinical outcome. we have tested the antiviral activity of different clinically available compounds and their combinations by assessing their ability to inhibit viral-induced cytopathic effect in vitro. our strategy was circumscribed mostly to compounds approved for clinical use, since they are ideal candidates for entering into fast track clinical trials.
drug selection criteria first focused on compounds already being tested in clinical trials, along with well-known human immunodeficiency virus- (hiv- ) and hepatitis c virus (hcv) protease inhibitors, as well as other compounds suggested to have potential activity against sars-cov- in molecular docking analysis or in vitro assays. we first assessed the activity of compounds with hypothetical capacity to inhibit viral entry, and then we focused on drugs thought to block viral replication upon sars-cov- fusion. molecular docking studies provided an additional candidates, which were predicted to inhibit the sars-cov- main protease. finally, compounds with unknown mechanism of action were also assessed. by these means, we have compared drugs and of their combinations for their capacity to counteract sars-cov- induced cytopathic effect in vitro. we first tested compounds that could have an effect before viral entry by impairing virus-cell fusion (supp. table ). hydroxychloroquine is an anti-malarial drug that exerts its activity by disrupting the endosome pathway, and that has been proposed as an anti-sars-cov- agent (wang et al., ). we confirmed the inhibitory effect of hydroxychloroquine against sars-cov- -induced cellular cytotoxicity on vero e cells. a constant concentration of a clinical isolate of sars-cov- (id epi_isl_ ) was mixed with increasing concentrations of hydroxychloroquine and added to vero e cells. to control for drug-induced cytotoxicity, vero e cells were also cultured with increasing concentrations of hydroxychloroquine in the absence of sars-cov- . by these means we calculated the concentration at which hydroxychloroquine achieved a % maximal inhibitory capacity (ic ). as shown in fig. a, this drug was able to inhibit viral-induced cytopathic effects at concentrations where no cytotoxic effects of the drug were observed. the mean ic value of this drug in repeated experiments was always below µm (supp. table ).
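the dose-response calculation described above can be sketched as follows. this passage does not state how the curves were fitted, so the code is only a simplified stand-in: it normalizes a viability readout against virus-only and uninfected controls, then estimates the half-maximal inhibitory concentration by log-linear interpolation between the two tested concentrations that bracket half-maximal inhibition, instead of a full nonlinear curve fit. the function names and all numeric values are hypothetical.

```python
import math


def percent_inhibition(signal: float, virus_only: float, cells_only: float) -> float:
    """Express a viability readout as percent inhibition of cytopathic effect.

    virus_only: readout of infected, untreated cells (0 % inhibition).
    cells_only: readout of uninfected, untreated cells (100 % inhibition).
    """
    return 100.0 * (signal - virus_only) / (cells_only - virus_only)


def ic50_by_interpolation(concs, inhibitions):
    """Estimate the half-maximal inhibitory concentration.

    Interpolates linearly on a log10 concentration axis between the two
    neighbouring doses whose inhibition values bracket 50 %.
    Returns None when 50 % inhibition is never reached.
    Concentrations must be positive.
    """
    pairs = sorted(zip(concs, inhibitions))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(pairs, pairs[1:]):
        if y_lo < 50.0 <= y_hi:
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical dilution series (arbitrary concentration units) and
# the corresponding percent inhibition values.
doses = [0.1, 1.0, 10.0, 100.0]
inhibition = [5.0, 20.0, 80.0, 95.0]
estimate = ic50_by_interpolation(doses, inhibition)
```

the same normalization applied to the drug-only wells gives the cytotoxicity curve, so that an estimate is only reported where the drug alone leaves the cells viable.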
these results aligned with previous reports highlighting the in vitro inhibitory capacity of chloroquine derivatives (wang et al., ), but contrasted with data on animal models (maisonnasse et al., ; rosenke et al., ) or currently ongoing clinical trials that have failed to detect any associated benefit of hydroxychloroquine treatment (boulware et al., ; cavalcanti et al., ). since hydroxychloroquine was first administered in combination with the antibiotic azithromycin (gautret et al., ), which induces antiviral responses in bronchial epithelial cells (gielen et al., ), we further tested the activity of this compound in our assay. however, in the vero e model, azithromycin did not show any antiviral effect (fig. a), and the combination of hydroxychloroquine with azithromycin had a similar activity to that of the chloroquine derivative alone (fig. a). indeed, this was also the case when we tested hydroxychloroquine in combination with different hiv- protease inhibitors and other relevant compounds currently being tested in clinical trials (supp. table ). additional food and drug administration (fda)-approved compounds previously used to abrogate viral entry via clathrin-mediated endocytosis were also tested in this sars-cov- -induced cytotoxicity assay (supp. table ). indeed, interference with clathrin-mediated endocytosis is one of the potential mechanisms by which hydroxychloroquine may exert its therapeutic effect against sars-cov- . one of these compounds was amantadine, which blocks coated pit invaginations at the plasma membrane (phonphok and rosenthal, ) and is licensed against influenza a virus infections and as a treatment for parkinson's disease. in addition, we also tested chlorpromazine, an antipsychotic drug that inhibits clathrin-mediated endocytosis by preventing the assembly and disassembly of clathrin networks on cellular membranes or endosomes (wang et al., ).
when we assessed the antiviral efficacy of these clathrin inhibitors against sars-cov- , we did not find any prominent effect; only a partial inhibition at µm for amantadine (fig. b). the broad cathepsin b/l inhibitor e- d also showed partial inhibitory activity (fig. b). e- d exerts activity against viruses cleaved by cellular cathepsins upon endosomal internalization, as previously described using pseudotyped sars-cov- (hoffmann et al., ). while these results could not be confirmed using the specific cathepsin b inhibitor ca- -me due to drug-associated toxicity (supp. table ), it is important to highlight that none of these cathepsin inhibitors is approved for clinical use. these data suggest that sars-cov- entry in vero e partially relies on clathrin-mediated endocytosis and cellular cathepsins, which cleave the viral spike protein and allow viral fusion once sars-cov- is internalized in endosomes. however, hydroxychloroquine antiviral activity was much more potent than that exerted by amantadine or e- d (fig. a-b). recently, it has also been suggested that hydroxychloroquine could block sars-cov- spike interaction with gm gangliosides (fantini et al., ). gm gangliosides are enriched in cholesterol domains of the plasma membrane and have been previously shown to bind to sars-cov spike protein (lu et al., ). this mode of viral interaction is consistent with the capacity of methyl-beta cyclodextrin, which depletes cholesterol from the plasma membrane, to abrogate sars-cov- induced cytopathic effect (fig. b), as previously reported for sars-cov (lu et al., ). removal of cholesterol redirected ace receptor to other domains, but did not alter the expression of the viral receptor (lu et al., ). moreover, nb-dnj, an inhibitor of the ganglioside biosynthesis pathway, also decreased sars-cov- cytopathic effect (fig. b).
these results highlight the possible role of gangliosides in viral binding, although the polar head group of gm ganglioside ( ' sialyllactose) was not able to reduce viral-induced cytopathic effect (supp. table ). agents involved in autophagy, such as the becn -stabilizing compounds niclosamide or ciclesonide, inhibit the release of infectious sars-cov- to the supernatant (gassen et al., ) or reduce the expression of viral nucleoprotein h post-infection (jeon et al., ). however, these autophagy inhibitors were highly toxic in our three-day assay (supp. table ). arbidol is a compound that intercalates into membrane lipids leading to the inhibition of membrane fusion between viruses and cells, and between viruses and endosomal membranes (haviernik et al., ). although arbidol has exhibited in vitro efficacy against sars-cov- (li, ), it showed drug-associated cytotoxicity in our assay (supp. table ). we also tested the antiviral activity of two jak inhibitors: baricitinib and tofacitinib. baricitinib was previously suggested to reduce viral entry by interfering with ap -associated protein kinase (aak ), which is necessary for clathrin-mediated endocytosis (stebbing et al., ). however, neither this compound nor tofacitinib protected vero e cells from sars-cov- induced cytopathic effect (supp. table ). while we did not detect an antiviral effect for jak inhibitors, these compounds may still be useful to control hyperinflammation and cytokine storm at later stages of infection. finally, we tested camostat, a serine protease inhibitor with capacity to abrogate sars-cov- spike priming on the plasma membrane of human pulmonary cells and avoid viral fusion (hoffmann et al., ). camostat showed no antiviral effect on vero e cells (fig. c), which indicates that the alternative viral endocytic route is the most prominent entry route in this renal cell type.
of note, a broader cellular protease inhibitor such as the alpha -antitrypsin (att), used to treat severe att human deficiency, was able to exert an antiviral effect on vero e . however, this required high concentrations, and the effect most likely depends on the activity of these proteases in the endosomal route (fig. c). in order to confirm that previously identified compounds listed in supp. table specifically inhibit the viral entry step, we next employed a luciferase-based assay using pseudotyped lentivirus expressing the spike protein of sars-cov- , which allows detection of viral fusion on hek- t cells transfected with ace . as a control, we used the same lentiviruses pseudotyped with a vsv glycoprotein, where no entry inhibition above % was detected for any of the drugs tested (data not shown). in sharp contrast, sars-cov- pseudoviruses were effectively blocked by most of the drugs previously tested on vero e with wild-type virus (fig. d). the main differences were observed with ca- -me, ciclesonide and arbidol. these compounds showed a partial blocking effect on ace hek- t cells that was not obvious when using replication-competent sars-cov- on vero e (supp. fig. ). in addition, nb-dnj failed to block viral entry (fig. d), suggesting that ganglioside dependence may be reduced in ace overexpressing cells or, alternatively, that this drug requires a longer exposure time to effectively reduce the content of gangliosides via biosynthesis blocking. overall, using alternative sars-cov- viral systems, we could identify chloroquine derivatives, cathepsin inhibitors and cholesterol depleting agents as the most promising candidates to block sars-cov- endocytosis in vero e and hek- t cells transfected with ace . however, chloroquine derivatives were the only ones that displayed an ic below µm (supp. table ), and were also active in abrogating pseudoviral entry into hek- t cells expressing ace (fig. e). although camostat failed to inhibit viral fusion on ace hek- t cells (fig. 
e), its activity was rescued when these cells were transfected with tmprss . the opposite effect was observed for chloroquine, which showed reduced inhibitory activity on tmprss transfected cells (fig. e). thus, the expression of cellular proteases on the plasma membrane facilitates the fusion with viral membranes, decreasing the likelihood of viral entry through the endosomal route. these data concur with previous findings in pulmonary cells, where viral entry via the endosomal route was not active since chloroquine failed to abrogate viral fusion (maisonnasse et al., ); however, camostat effectively blocked this entry (hoffmann et al., ). our results highlight that alternative routes govern sars-cov- viral entry and these pathways vary depending on the cellular target. thus, effective treatments may need to block both plasma membrane fusion and endosomal routes to fully achieve viral suppression. in our search for antivirals inhibiting post-viral entry steps, we first focused on remdesivir, which has in vitro activity against sars-cov- after viral entry and has already been approved for the treatment of covid- by the fda and ema. we further confirmed the in vitro capacity of remdesivir to inhibit sars-cov- induced cytopathic effect on vero e (fig. a). the mean ic value of this drug in repeated experiments was always below µm (supp. table ). in combination with hydroxychloroquine, however, the antiviral effect of remdesivir was not significantly modified (fig. b), either when hydroxychloroquine was added at increasing concentrations or at different fixed concentrations of the drug. this was also the case for other antivirals tested in combination (supp. table ). of note, other rna polymerase inhibitors such as galdesivir, which was proposed to tightly bind to sars-cov- rna-dependent rna polymerase (elfiky, ), showed no antiviral effect (supp. table ).
favipiravir, approved by the national medical products administration of china as the first anti-covid- drug in china (tu et al., ), showed only partial inhibitory activity at the non-toxic concentration of µm (supp. table ). we also assessed clinically approved protease inhibitors with potent activity against hiv- . however, none of the hiv- protease inhibitors detailed in supp. table showed remarkable protective antiviral activity against sars-cov- infection on vero e cells, with the exception of nelfinavir mesylate hydrate, which showed an ic value below µm (supp. table and fig. c). lopinavir and tipranavir inhibited sars-cov- induced cytopathic effect at the non-toxic concentration of µm, and amprenavir exhibited activity at the non-toxic concentration of µm (fig. c). darunavir, which is currently being tested in ongoing clinical trials, showed partial inhibitory activity at µm, although this concentration was associated with . ± . % cytotoxicity (fig. c). of note, we tested hiv- reverse transcriptase inhibitors such as tenofovir disoproxil fumarate, emtricitabine, tenofovir alafenamide, and their combinations, but they also failed to show any antiviral effect against sars-cov- (supp. fig. ). these results indicate that future clinical trials should take into account the limited antiviral effect displayed by these anti-hiv- inhibitors against sars-cov- in vitro. we also assessed the inhibitory capacity of hcv protease inhibitors, but none showed any antiviral activity (supp. table ). of note, exogenous interferons alpha and gamma displayed antiviral activity against sars-cov- (supp. table ). in light of these results, we tested the inhibitory effect of the tlr agonist vesatolimod, which triggers interferon production. although this agonist was not able to protect from the viral-induced cytopathic effect on vero e (supp . 
table ), as expected since this cell line is deficient in interferon production (emeny and morgan, ), it could still be useful in other competent cellular targets. since severe covid- patients display impaired interferon responses (hadjadj et al., ), these strategies may be valuable to avoid disease complication. in addition, we also assessed several compounds with the best computational docking scores among approved drugs against the cl protease of sars-cov- , but none of them protected vero e from viral-induced cytopathic effect (supp. table ). the most potent antiviral tested was plitidepsin (fig. d), which targets the eukaryotic elongation factor a (eef a ) and has been previously used for the treatment of multiple myeloma. the mean ic value of this drug in repeated experiments was always in nm concentrations (supp. table ). in combination with other active antivirals, we did not observe a reduction in ic values (supp. table ). this result indicates no significant synergy, but also highlights the possibility of using plitidepsin without reducing its antiviral activity in combined therapies (fig. d), which could be relevant to avoid possible selection of resistant viruses. overall, plitidepsin showed the lowest ic values of all the compounds tested in this in vitro screening ( table ). we also assessed the inhibitory capacity of several inhibitors and broad anti-bacterial, anti-parasitic, anti-malarial, anti-influenza and anti-fungal compounds, along with other pharmacological agents previously suggested to interfere with sars-cov- infection (supp. table ). such was the case of ivermectin, an fda-approved broad spectrum anti-parasitic agent previously reported to inhibit the replication of sars-cov- in vitro as measured by rna accumulation (caly et al., ).
however, among these potential antivirals, only three types of molecules exerted detectable antiviral activity in our assay: itraconazole, fenofibrate, and calpain and cathepsin inhibitors such as mdl and npo compounds. itraconazole, an antifungal that may interfere with internal sars-cov- budding within infected cells (wu et al., ), displayed an ic value of µm (fig. a and supp. table ). fenofibrate is clinically used to treat dyslipidemia via activation of pparα, and also inhibited the cytopathic effect exerted by sars-cov- on vero e at µm (fig. b and supp. table ). as fenofibrate is a regulator of cellular lipid metabolism, we made use of the luciferase-based viral entry assay to try to elucidate its mode of action. when lentiviruses pseudotyped with the spike protein of sars-cov- were added to ace- expressing hek- t cells in the presence of fenofibrate, viral entry was abrogated (fig. c). the most potent of these agents was mdl , a calpain iii inhibitor in a pre-clinical stage of development that displayed activity in the nanomolar range (fig. d and supp. table ) (riva et al., ). moreover, three out of four different calpain and cathepsin inhibitors named npo showed potent antiviral activity too (supp. figure ). of note, in combination with other active antivirals, we did not observe a reduction in ic values of mdl (supp. table ). inhibitors of calpains, which are cysteine proteases, might impair the activity of viral proteases like cl (main protease) and plpro (papain-like protease) (riva et al., ; schneider et al., ). however, calpain inhibitors may also inhibit cathepsin b-mediated processing of viral spike proteins or glycoproteins, including sars-cov and ebola (schneider et al., ; zhou and simmons, ). to understand the mechanisms of action of calpain and cathepsin inhibitors such as mdl , we added lentiviruses pseudotyped with the spike protein of sars-cov- to ace- -expressing hek- t cells and the same cells also expressing tmprss in the presence of this drug.
importantly, mdl only blocked viral entry in ace- -expressing cells (fig. e). this result indicates that mdl blocks cathepsins that are implicated in sars-cov- entry via the alternative endosomal pathway, as described for chloroquine derivatives and e- d (fig. e), which are all active when tmprss is not present and its inhibitor camostat displays no activity (fig. e). in conclusion, among the compounds and their combinations tested herein for their potential capacity to abrogate sars-cov- cytopathic effect, we only found compounds with antiviral activity, and only eight types of these drugs had an ic below µm or iu/ml ( table ). these eight families of compounds were able to abrogate sars-cov- release to the supernatant in a dose-dependent manner (fig. ), indicating that the reduction in the cytopathic effect that we had measured in cells correlates with viral production. as these eight families of compounds tackle different steps of the viral life cycle, they could be tested in combined therapies to abrogate the potential emergence of resistant viruses. we have assessed the anti-sars-cov- activity of clinically approved compounds that may exert antiviral effect alone or in combination. although we were not able to detect any remarkable synergy in vitro, combined therapies are key to tackle viral infections and to reduce the appearance of viral resistance. we have tested more than seventy compounds and their combinations, and verified a potent antiviral effect of hydroxychloroquine and remdesivir, along with plitidepsin, cathepsin and calpain inhibitors mdl and npo, nelfinavir mesylate hydrate, interferon-alpha, interferon-gamma and fenofibrate. these are therefore the most promising agents found herein that were able to protect cells from viral-induced cytopathic effect by preventing viral replication.
our findings highlight the utility of using hydroxychloroquine and mdl or other cathepsin inhibitors to block viral entry via the endosomal pathway in kidney cell lines such as vero e or hek- t. however, the endosomal viral entry route is absent in pulmonary cells and, therefore, camostat should be considered as the primary inhibitor to limit sars-cov- entry in pulmonary tissues or in cells expressing tmprss . these findings can explain why randomized clinical trials using hydroxychloroquine have failed to show a significant protective effect (boulware et al., ; cavalcanti et al., ) . nonetheless, in combined therapies, it should be noted that agents targeting the alternative endosomal sars-cov- entry route such as hydroxychloroquine or mdl could be key to stop viral dissemination in other extrapulmonary tissues where viral replication has been already detected , and viral entry could take place through this endosomal pathway. this could partially explain why in a retrospective observational study including more than patients, hydroxychloroquine treatment showed a significant reduction of in-hospital mortality (arshad et al., ) . thus, since alternative routes govern sars-cov- viral entry depending on the cellular target (ou et al., a) , effective treatments might be needed to block both plasma membrane fusion and endosomal entry to broadly achieve viral suppression. sars-cov- replication could be effectively blocked using nelfinavir mesylate hydrate, remdesivir and plitidepsin. while nelfinavir showed lower potency, remdesivir and plitidepsin were the most potent agents identified. however, remdesivir and plitidepsin are not yet suitable for oral delivery and require intravenous injection, complicating their clinical use for prophylaxis. finally, we also confirmed the antiviral effect of type i and ii interferons as well as fenofibrate, which have been extensively used in the clinic for many years and may therefore prove valuable for therapeutic use. 
the data presented herein should be interpreted with caution, as the ic values of drugs obtained in vitro may not reflect what could happen in vivo upon sars-cov- infection. the best antiviral compounds found in the present study need to be tested in adequate animal models. this strategy already helped to confirm the activity of remdesivir against sars-cov- , while also questioning the use of hydroxychloroquine in monotherapy (maisonnasse et al., ) . thus, assessing antiviral activity and safety in animal models is key to identify and advance those compounds with the highest potential to succeed in upcoming clinical trials. in turn, in vitro results confirmed in animal models will provide a rational basis to perform future clinical trials not only for treatment of sars-cov- -infected individuals, but also for pre-exposure prophylaxis strategies that could avoid novel infections. prophylaxis could be envisioned at a population level or to protect the most vulnerable groups, and should be implemented until an effective vaccine is developed. in particular, orally available compounds with proven safety profiles, such as fenofibrate, could represent promising agents. germans trias i pujol (hugtip) approved this study. the individual who provided the sample to isolate virus gave a written informed consent to participate. eagle medium, (dmem; lonza) supplemented with % fetal calf serum (fcs; euroclone), u/ml penicillin, µg/ml streptomycin, and mm glutamine (all thermofisher scientific). hek- t (atcc repository) were maintained in dmem with % fetal bovine serum, iu/ml penicillin and µg/ml streptomycin (all from invitrogen). hek- t overexpressing the human ace were kindly provided by integral molecular company and maintained in dmem (invitrogen) with % fetal bovine serum, iu/ml penicillin and µg/ml streptomycin, and µg/ml of puromycin (all from invitrogen). 
tmprss human plasmid (origene) was transfected using x-tremegene hp transfection reagent (merck) into hek- t cells overexpressing the human ace , which were maintained in the previously described media containing mg/ml of geneticin (invitrogen) to obtain tmprss /ace hek- t cells.

virus isolation, titration and sequencing. sars-cov- was isolated from a nasopharyngeal swab collected from an -year-old male patient who gave informed consent and had been treated with betaferon and hydroxychloroquine for days before sample collection. the swab was collected in ml medium (deltaswab vicum) to reduce viscosity and stored at - ºc until use. vero e cells were cultured in a cell culture flask ( cm ) at . x cells overnight prior to inoculation with ml of the processed sample for h at ºc and % co . afterwards, ml of % fcs-supplemented dmem were supplied and cells were incubated for h. supernatant was harvested, centrifuged at x g for min to remove cell debris and stored at - ºc. cells were assessed daily for cytopathic effect, and the supernatant was subjected to viral rna extraction and specific rt-qpcr using the sars-cov- upe, rdrp and n assays (corman et al., ). the virus was propagated for two passages and a virus stock was prepared by collecting the supernatant from vero e . viral rna was extracted directly from the virus stock using the indimag pathogen kit (indical biosciences) and transcribed to cdna using the primescript™ rt reagent kit (takara) with oligo-dt and random hexamers, according to the manufacturer's protocol. dna library preparation was performed using the swift amplicon sars-cov- panel (swift biosciences). sequencing-ready libraries were then loaded onto the illumina miseq platform with a bp paired-end sequencing kit. sequence reads were quality filtered and adapter primer sequences were trimmed using trimmomatic. amplification primer sequences were removed using cutadapt (martin, ). sequencing reads were then mapped against the coronavirus reference (nc_ . ) using the bowtie tool (langmead and salzberg, ). the consensus genomic sequence was called from the resulting alignment at a x x average coverage using samtools (li et al., ). the genomic sequence was deposited at the gisaid repository (http://gisaid.org) with accession id epi_isl_ .

pseudovirus production. hiv- reporter pseudoviruses expressing sars-cov- spike protein and luciferase were generated using two plasmids. pnl - .luc.r-.e- was obtained from the nih aids repository. sars-cov- .sctΔ was generated (geneart) from the full protein sequence of sars-cov- spike with a deletion of the last amino acids at the c-terminus, human-codon optimized and inserted into pcdna . -topo (ou et al., b). spike plasmid was transfected with x-tremegene hp transfection reagent µm to . nm at ⅕ serial dilutions. plitidepsin was also assayed at concentrations ranging from µm to . nm at / dilutions. interferons were assayed at concentrations ranging from to . iu/ml at ⅕ serial dilutions. when two drugs were combined, each one was added at a : molar ratio at concentrations ranging from µm to . nm at ⅕ serial dilutions. in combination with other drugs, plitidepsin was also assayed at concentrations ranging from µm to . nm at / dilutions. e cells together with . tcid /ml of sars-cov- , a concentration that achieves a % of cytopathic effect. untreated non-infected cells and untreated virus-infected cells were used as negative and positive controls of infection, respectively. to detect any drug-associated cytotoxic effect, vero e cells were equally cultured in the presence of increasing drug concentrations, but in the absence of virus. cytopathic or cytotoxic effects of the virus or drugs were measured days after infection, using the celltiter-glo luminescent cell viability assay (promega). luminescence was measured in a fluoroskan ascent fl luminometer (thermofisher scientific). were adjusted to a non-linear fit regression model, calculated with a four-parameter logistic curve with variable slope.
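as a rough illustration of the four-parameter logistic curve with variable slope and of the serial dilution series mentioned above, the following sketch may help. it is not the authors' analysis, which was run in graphpad prism: the function names, the fixed plateaus, the dilution values and the crude grid search (used here in place of true non-linear regression) are all assumptions made for the example.

```python
def four_parameter_logistic(conc, bottom, top, ic50, hill_slope):
    """4PL response at concentration `conc` (same units as `ic50`).

    The response tends to `top` as conc -> 0 and to `bottom` as conc grows;
    at conc == ic50 it sits halfway between the two plateaus.
    """
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill_slope)


def serial_dilution(start, factor, steps):
    """Concentrations of a serial dilution, highest first."""
    return [start / factor ** i for i in range(steps)]


def fit_ic50(concs, responses, bottom=0.0, top=100.0):
    """Crude least-squares grid search over ic50 and hill slope.

    Hypothetical stand-in for proper non-linear regression; the two
    plateaus are held fixed, which a real fit would not assume.
    """
    best = None
    for k in range(-300, 301):  # ic50 candidates from 1e-3 to 1e3
        ic50 = 10 ** (k / 100)
        for hill in (0.5, 1.0, 1.5, 2.0, 3.0):
            sse = sum(
                (four_parameter_logistic(c, bottom, top, ic50, hill) - r) ** 2
                for c, r in zip(concs, responses)
            )
            if best is None or sse < best[0]:
                best = (sse, ic50, hill)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical 1/5 dilution series and noise-free synthetic responses.
    concs = serial_dilution(100.0, 5, 8)
    observed = [four_parameter_logistic(c, 0.0, 100.0, 2.0, 1.0) for c in concs]
    ic50, hill = fit_ic50(concs, observed)
    print(f"estimated ic50 ~ {ic50:.2f}, hill slope {hill}")
```

the grid search only resolves the ic50 to the spacing of its candidate grid, so it is suitable for illustration rather than for reporting values.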
cells not exposed to the virus were used as negative controls of infection, and were set as % of viability to normalize data and calculate the percentage of cytopathic effect. statistical differences from % were assessed with a one sample t test. all analyses and figures were generated with the graphpad prism v . b software. in silico drug modeling. we performed glide docking using an in-house library of all approved drug molecules on the cl protease of sars-cov- . for this, two different receptors were used, the lu pdb structure, after removing the covalently bound inhibitor, and a combination of two crystals from the diamond collection (https://www.diamond.ac.uk/covid- ). receptors were prepared with the schrodinger's protein wizard and glide sp docking was performed with two different hydrogen bond constraints: glu and his (with epsilon protonation); we enforced single constraints and also attempted the combination of both. the best molecules, based on glides's docking score were selected. top docking scores, however, did not exceed - , indicating poor potential binding. pseudovirus assay. hek- t overexpressing the human ace and tmprss were used to test antivirals at the concentrations found to be effective for sars-cov- without toxicity, which were the following: µm for niclosamide; µm for chloroquine, chlorpromazine, ciclesonide, mdl and fenofibrate; µm for hydroxychloroquine, ca- -me and arbidol hcl; µm for e- d; µm for baricitinib; µm for amantadine, nb-dnj, ' sialyl-lactose na salt, tofacitinib, and camostat mesylate; µm for methyl-b-cyclodextrin, and , mg/ml for att. a constant pseudoviral titer was used to pulse cells in the presence of the drugs. h postinoculation, cells were lysed with the glo luciferase system (promega). luminescence was measured with an ensight multimode plate reader (perkin elmer). table . compounds with antiviral activity grouped in colors depending on their ic values, expressed in µm unless otherwise indicated. 
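the normalization to uninfected controls and the one-sample t test described in the statistical analysis above can be sketched as follows. this is a minimal sketch under stated assumptions: the helper names are hypothetical, the reference value of 100 is assumed because the values are percentages of control (the source text elides the number), and only the t statistic and degrees of freedom are computed, since the p-value would require a t distribution from a statistics package.

```python
import math
import statistics


def percent_of_control(values, control_values):
    """Express raw readings as a percentage of the mean control reading."""
    control_mean = statistics.mean(control_values)
    return [100.0 * v / control_mean for v in values]


def cytopathic_effect(viability_percent):
    """Percentage of cytopathic effect as the complement of viability."""
    return [100.0 - v for v in viability_percent]


def one_sample_t(values, reference=100.0):
    """t statistic and degrees of freedom for H0: mean == reference.

    `reference` defaults to 100 (assumed, see lead-in).
    """
    n = len(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    t_stat = (statistics.mean(values) - reference) / sem
    return t_stat, n - 1


if __name__ == "__main__":
    # Hypothetical luminescence readings, not data from the study.
    uninfected = [2000.0, 2100.0, 1900.0]
    treated = [1500.0, 1600.0, 1400.0, 1550.0]
    viability = percent_of_control(treated, uninfected)
    print(cytopathic_effect(viability))
    print(one_sample_t(viability))
```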
activity of hydroxychloroquine and azithromycin. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of hydroxychloroquine, azithromycin, and their combination. drugs were used at a concentration ranging from . nm to µm. when combined, each drug was added at the same concentration. non-linear fit to a variable response curve from one representative experiment with two replicates is shown (red lines), excluding data from drug concentrations with associated toxicity. the particular ic value of this graph is indicated. cytotoxic effect on vero e cells exposed to increasing concentrations of drugs in the absence of virus is also shown (grey lines). b. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of amantadine, a clathrin-mediated endocytosis inhibitor; e- d, a pan-cathepsin inhibitor acting downstream once viruses are internalized in endosomes; nb-dnj, an inhibitor of ganglioside biosynthesis; and methyl-b-cyclodextrin, a cholesterol-depleting agent. all drugs were used at a concentration ranging from . nm to µm aside from methyl-b-cyclodextrin, which was used times more concentrated. non-linear fit to a variable response curve from one experiment with two replicates is shown (red lines). cytotoxic effect on vero e cells exposed to increasing concentrations of drugs in the absence of virus is also shown (grey lines). c. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of camostat, a tmprss inhibitor, and att, an alpha- antitrypsin that acts as a broad cellular protease inhibitor, as described in a. d. effect of entry inhibitors on luciferase expression of reporter lentiviruses pseudotyped with sars-cov- spike in ace -expressing hek- t cells. values are normalized to luciferase expression by mock-treated cells set at %. mean and s.e.m.
from two experiments with two replicates. cells were exposed to fixed amounts of sars-cov- spike lentiviruses in the presence of a non-toxic constant concentration of the drugs tested on vero e . statistical deviations from % were assessed with a one sample t test. e. comparison of entry inhibitors blocking viral endocytosis, such as chloroquine, with inhibitors blocking serine protease tmprss expressed on the cellular membrane, such as camostat, on different cell lines. ace expressing hek- t cells transfected or not with tmprss were exposed to sars-cov- spike lentiviruses as described in b. values are normalized to luciferase expression by mock-treated cells set at %. mean and s.e.m. from at least two representative experiments with two replicates. statistical deviations from % were assessed with a one sample t test. a. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of remdesivir. drug was used at a concentration ranging from . nm to µm. non-linear fit to a variable response curve from one representative experiment with two replicates is shown (red lines), excluding data from drug concentrations with associated toxicity. the particular ic value of this graph is indicated. cytotoxic effect on vero e cells exposed to increasing concentrations of drugs in the absence of virus is also shown (grey lines). b. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of remdesivir and its combination with hydroxychloroquine, as detailed in a. drugs in combination were used at a concentration ranging from . nm to µm (left panel). alternatively, remdesivir was used at a concentration ranging from . nm to µm at the fixed indicated concentrations of hydroxychloroquine (right panel). c. 
cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of protease inhibitors against hiv- . nelfinavir mesylate hydrate was the only drug with activity. inhibitors were used at a concentration ranging from . nm to µm. the particular ic value of this graph is indicated d. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of plitidepsin and its combinations with hydroxychloroquine and remdesivir. when combined, each drug was added at the same concentration. drugs were used at a concentration ranging from . nm to µm. the particular ic value of these graphs is indicated. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of itraconazole. drug was used at a concentration ranging from . nm to µm . non-linear fit to a variable response curve from one representative experiment with two replicates is shown (red lines), excluding data from drug concentrations with associated toxicity. the particular ic value of this graph is indicated. cytotoxic effect on vero e cells exposed to increasing concentrations of drugs in the absence of virus is also shown (grey lines). b. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence were assessed with a one sample t test. supp. table . antiviral activity of potential entry inhibitors tested against sars-cov- . na; not active. ic values are reported in µm unless otherwise indicated. supp. table . antiviral activity of potential inhibitors against sars-cov- tested in combination. na; not active. table . antiviral activity of potential post-entry inhibitors against sars-cov- . na; not active. ic values are reported in µm unless otherwise indicated. supp. table . antiviral activity of potential inhibitors against sars-cov- with predicted capacity to block sars-cov- viral protease. 
na; not active. supp. table . antiviral activity of potential inhibitors against sars-cov- with unknown mechanism of action. na; not active. cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of hiv- reverse transcriptase inhibitors. drugs were used at a concentration ranging from . nm to µm. non-linear fit to a variable response curve from one experiment with two replicates is shown (red lines). cytotoxic effect on vero e cells exposed to increasing concentrations of drugs in the absence of virus is also shown (grey lines). cytopathic effect on vero e cells exposed to a fixed concentration of sars-cov- in the presence of increasing concentrations of calpain and cathepsin inhibitors npo. drugs were used at a concentration ranging from . nm to µm. non-linear fit to a variable response curve from one experiment with two replicates is shown (red lines). cytotoxic effect on vero e cells exposed to increasing concentrations of drugs in the absence of virus is also shown (grey lines). the research of cbig consortium (constituted by irta-cresa, bsc, & irsicaixa) is supported by grifols pharmaceutical. the authors also acknowledge the crowdfunding initiative #yomecorono (https://www.yomecorono.com). js, jva and niu have nonrestrictive funding from pharma mar to study the antiviral effect of plitidepsin. the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. a patent application based on this work has been filed (ep . ). the authors declare that no other competing financial interests exist. 
treatment with hydroxychloroquine, azithromycin, and combination in patients hospitalized with covid- remdesivir for the treatment of covid- -preliminary report a randomized trial of hydroxychloroquine as postexposure prophylaxis for covid- the fdaapproved drug ivermectin inhibits the replication of sars-cov- in vitro hydroxychloroquine with or without azithromycin in mild-to-moderate covid- epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study detection of novel coronavirus ( -ncov) by real-time rt-pcr ribavirin, remdesivir, sofosbuvir, galidesivir, and tenofovir against sars-cov- rna dependent rna polymerase (rdrp): a molecular docking study regulation of the interferon system: evidence that vero cells have a genetic defect in interferon production structural and molecular modelling studies reveal a new mechanism of action of chloroquine and hydroxychloroquine against sars-cov- infection analysis of sars-cov- -controlled autophagy reveals spermidine hydroxychloroquine and azithromycin as a treatment of covid- : results of an open-label non-randomized clinical trial azithromycin induces anti-viral responses in bronchial epithelial cells compassionate use of remdesivir for patients with severe covid- impaired type i interferon activity and inflammatory responses in severe covid- patients histopathological findings and viral tropism in uk patients with severe fatal covid- : a post-mortem study arbidol (umifenovir): a broad-spectrum antiviral drug that inhibits medically important arthropod-borne flaviviruses the novel coronavirus ( -ncov) uses the sarscoronavirus receptor ace and the cellular protease tmprss for entry into target cells insights from nanomedicine into chloroquine efficacy against covid- identification of antiviral drug candidates against sars-cov- from fda-approved drugs (microbiology) fast gapped-read alignment with bowtie an exploratory randomized controlled study on the efficacy 
and safety of lopinavir/ritonavir or arbidol treating adult patients hospitalized with mild/moderate covid- (elacoi) the sequence alignment/map format and samtools hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov- infection in vitro lipid rafts are involved in sars-cov entry into vero e cells hydroxychloroquine use against sars-cov- infection in non-human primates cutadapt removes adapter sequences from high-throughput sequencing reads ebola virus and severe acute respiratory syndrome coronavirus display late cell entry kinetics: evidence that transport to npc + endolysosomes is a rate-defining step inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace . hydroxychloroquine-mediated inhibition of sars-cov- entry is attenuated by tmprss (microbiology) characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov repurposed antiviral drugs for covid- -interim who solidarity trial results stabilization of clathrin coated vesicles by amantadine, tromantadine and other hydrophobic amines baricitinib as potential treatment for -ncov acute respiratory disease discovery of sars-cov- antiviral drugs through large-scale compound repurposing hydroxychloroquine proves ineffective in hamsters and macaques infected with sars-cov- (pharmacology and toxicology) severe acute respiratory syndrome coronavirus replication is severely impaired by mg due to proteasome-independent inhibition of m-calpain inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry from sars to mers, thrusting coronaviruses into the spotlight covid- : combining antiviral and anti-inflammatory treatments a review of sars-cov- and the ongoing clinical trials mis-assembly of clathrin lattices on endosomes reveals a regulatory switch for coated pit formation remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus 
( -ncov) in vitro solidarity" clinical trial for covid- treatments clinical benefit of remdesivir in rhesus macaques infected with sars-cov- (microbiology) analysis of therapeutic targets for sars-cov- and discovery of potential drugs by computational methods development of novel entry inhibitors targeting emerging viruses we are grateful to patients at the hospital germans trias i pujol that donated their samples for research. for his excellent assistance and advice, we thank jordi puig from fundació lluita contra la sida. we are most grateful to lidia ruiz and the clinical sample management team of irsicaixa for their outstanding sample processing and management, and to m. pilar armengol and the translational genomics platform team at key: cord- - hj hzt authors: yang, jianling; wu, meng; liu, xu; liu, qi; guo, zhengyang; yao, xueting; liu, yang; cui, cheng; li, haiyan; song, chunli; liu, dongyang; xue, lixiang title: cytotoxicity evaluation of chloroquine and hydroxychloroquine in multiple cell lines and tissues by dynamic imaging system and pbpk model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hj hzt chloroquine (cq) and hydroxychloroquine (hcq) have been used in treating covid- patients recently. however, both drugs have some contradictions and rare but severe side effects, such as hypoglycemia, retina and cardiac toxicity. to further uncover the toxicity profile of cq and hcq in different tissues, we evaluated the cytotoxicity of them in cell lines, and further adopted the physiologically-based pharmacokinetic models (pbpk) to predict the tissue risk respectively. retina, myocardium, lung, liver, kidney, vascular endothelium and intestinal epithelium originated cells were included in the toxicity evaluation of cq and hcq respectively. the proliferation pattern was monitored in - hours by incucyte s , which could perform long-term continuous image and video of cells upon cq or hcq treatment. 
cc and the ratio of tissue trough concentrations to cc (rttcc) were used to build the predicted toxicity profiles. the cc at h, h, h of cq and hcq decreased in a time-dependent manner, which indicates an accumulative cytotoxic effect. hcq was found to be less toxic in cell types except cardiomyocyte h c cells (cc - h= . μm; cc - h= . μm). in addition, rttcc was significantly higher in the cq treatment group than in the hcq group, which indicates the relative safety of hcq. both cq and hcq show cytotoxicity in a time-dependent manner, which indicates the necessity of short-period administration clinically. hcq has less impact on cell line proliferation and less toxicity than cq in heart, liver, kidney and lung. severe acute respiratory syndrome coronavirus (sars-cov- ) first emerged in china and has spread globally due to its high transmissibility and infectivity, resulting in an unprecedented global public health challenge ( , ). as of april , , more than , , cases have been confirmed around the world, according to data supplied by johns hopkins university, and at least , people have died from the disease ( ). judging from the current status, most patients have a good prognosis; nevertheless, approximately % of the patients with covid- experienced critical complications, including arrhythmia, acute kidney injury, pulmonary edema, septic shock, and acute respiratory distress syndrome (ards) ( - ). apart from the primary inflammation in the lungs, autopsies indicate that other vital organs such as the kidneys, heart, gut and liver also suffered severe damage, suggesting that individuals or older with chronic underlying diseases appear to have a higher risk of developing severe outcomes. such huge numbers of infected people create an urgent demand for effective and available drugs to manage the pandemic.
unfortunately, at present, there are still no specific antiviral drugs for the prevention or treatment of covid- . recent publications have demonstrated that chloroquine (cq) and hydroxychloroquine (hcq) efficiently inhibited sars-cov- infection in in vitro assays ( - ). cq, together with its derivative hcq, has been marketed as an antimalarial drug for several decades. hcq has also been broadly used in the treatment of autoimmune diseases, such as systemic lupus erythematosus (sle) and rheumatoid arthritis ( - ). several clinical trials have confirmed that both cq and hcq were superior to the control group in inhibiting the exacerbation of pneumonia, improving lung imaging findings, promoting virus-negative conversion and shortening the disease course. moreover, the u.s. food and drug administration (fda) also approved cq and hcq for emergency use to treat patients hospitalized for covid- . although they exhibit apparent efficacy and an acceptable safety profile for covid- treatment, cq and hcq still raise some concerns with prolonged usage, including heart rhythm disturbances, gastrointestinal upset and retinal toxicity, in particular retinopathy ( , ). additionally, risambaf et al. found that cq/hcq may increase the risk of liver and renal impairment when it is used to treat . toxicity tolerability in key tissues about drug effectiveness and side effect were critical to understand their mechanism and to optimize dosing regimen by integrating calculated at the given target organ, respectively. the data suggest that hcq is much less toxic than cq, at least in certain key tissues (heart, liver, kidney, and lung).
taken together, this study provides information regarding cytotoxicity across a wide spectrum and will be beneficial for both pharmacologists and the effect of cq on cell proliferation to gain the more comprehensive cytotoxic information upon cq and hcq treatment, both cq and hcq show strong and immediate toxicity on all cell lines at concentrations above μm. as shown in figure and , when the concentration of cq or hcq is higher than μm, proliferation shows a sudden decline or arrest compared with lower dosing regimens. h c (heart), hek (kidney), and iec- (intestine) are more sensitive to cq than the other cell lines, as their cc values at h are less than μm ( . with that of h in vero, which may be due to special drug metabolism or stability in it. as the selectivity index (si) is the safe range used to evaluate the drug effect. table ) ( ). therefore, we can preliminarily conclude that the selectivity index (si) of hcq is higher than that of cq in most cell types. using our pbpk models, we simulated the tissue concentrations of hcq ( mg bid for day, mg bid for day to ) and cq ( mg bid for days) ( , ). the cmax values of the tissue concentrations are summarized in table . the simulation showed that the tissue trough concentration of cq in liver and lung reached the highest level of drug accumulation ( . μg/ml), which is times higher than that in heart ( . μg /ml). however, the tissue trough concentration of hcq is highest in lung ( . μg/ml) compared with liver, kidney and heart (table and figure ). to better predict the toxicity risk of cq and hcq in different tissues, we used the ratio of simulated tissue trough concentration to cc (r ttcc ) as an indicator of the risk of tissue toxicity of these two drugs in the given tissues.
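the two ratios used in this comparison, the selectivity index (cc to ec ) and r ttcc (tissue trough concentration to cc ), can be written out as a small sketch. the function names are hypothetical, and so are all numbers in the usage example, including the molar mass; the unit conversion is included only because the text reports tissue concentrations in μg/ml and cc in μm, and whether the authors converted units this way is an assumption.

```python
def ug_per_ml_to_um(conc_ug_per_ml, molar_mass_g_per_mol):
    """Convert a mass concentration (ug/mL) to a molar one (uM).

    1 ug/mL equals 1 mg/L; dividing by g/mol gives mmol/L, hence the x1000.
    """
    return conc_ug_per_ml / molar_mass_g_per_mol * 1000.0


def selectivity_index(cc50_um, ec50_um):
    """Ratio of cytotoxic to antiviral half-maximal concentration."""
    return cc50_um / ec50_um


def r_ttcc(tissue_trough_um, cc50_um):
    """Ratio of tissue trough concentration to the cytotoxic concentration.

    A larger value means the simulated tissue level sits closer to (or above)
    the concentration that is cytotoxic in vitro.
    """
    return tissue_trough_um / cc50_um


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    trough_um = ug_per_ml_to_um(3.2, 320.0)      # hypothetical molar mass
    print(r_ttcc(trough_um, cc50_um=50.0))       # tissue risk ratio
    print(selectivity_index(cc50_um=50.0, ec50_um=5.0))
```

both quantities are plain ratios, so the tissue concentration and the cc must share units before they are divided.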
as shown in figure , we systematically compared the toxicity of cq and hcq. the r ttcc value of cq is - times higher than that of hcq in lung, heart, kidney and liver, which suggests that the toxicity risk of hcq in these tissues is much lower than that of cq. were obtained as previously reported. the lung to blood concentration ratio for cq and hcq (obtained from animal studies) was used to predict the drug concentration in the lungs, heart, liver, and kidney. to better investigate the potential toxicity in vivo and in vitro, we proposed r ttcc (the ratio of tissue concentration to cc ), derived from the pbpk model, to predict the risk of toxicity in different tissues. we compared the r ttcc data for heart, liver, kidney and lung, which revealed that hcq has a significantly safer profile than cq ( ). however, recent publications reported that cq was safer than hcq according to si ( , ). we speculate that the difference in safety might be due to the complex pharmacokinetic characteristics of the drugs in vivo, which include specific tissue distribution and a long half-life of around days. in short, building on our recently published study, we developed a novel parameter to predict potential toxicity in addition to the traditional selectivity index (si; the ratio of cc to ec ), which is commonly accepted as a measure of the window between cytotoxicity and antiviral capacity ( ). as a result, our data show that kidney, lung and heart are prone to the toxicity of cq, whereas lung and kidney are relatively vulnerable upon hcq treatment ( figure ). in the meantime, the non-negligible effect on cardiomyocytes and retinal cells should be considered, because most patients with severe symptoms are older and more likely to already suffer from dysfunction of the heart and eyesight. therefore, ecg monitoring should be necessary during clinical usage, even for patients infected with covid- who have no underlying diseases.
in addition, more attention should be paid to changes in patients' eyesight when using hcq. in this study, we used a dynamic imaging system to monitor the whole proliferation process accurately and precisely, rather than the conventional cck assay. health organization declares global emergency: a review of the novel coronavirus (covid- ) transmission of -ncov infection from an asymptomatic contact in germany the origin, transmission and clinical therapies on coronavirus disease (covid- ) outbreak -an update on the status population movement, city closure in wuhan and geographical expansion of the -ncov pneumonia infection in china in covid- : a novel coronavirus and a novel challenge for critical care hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov- infection in vitro lupus erythematosus-an ongoing need for international consensus and collaborations from pathogenesis, epidemiology, and genetics to definitions, diagnosis, and treatments of cutaneous lupus erythematosus and dermatomyositis: a report from the rd international conference on cutaneous lupus erythematosus (iccle) hydroxychloroquine-induced hyperpigmentation in systemic diseases: prevalence, clinical features and risk factors: a cross-sectional study of cases antimalarials and ophthalmologic safety updated recommendations on the use of hydroxychloroquine in dermatologic practice effects of chloroquine on viral infections: an old drug against today's diseases? 
liver and kidney injuries in covid- and their effects on drug therapy; a letter to editor kinetics of the distribution and elimination of chloroquine in the rat simultaneous quantitation of hydroxychloroquine and its metabolites in mouse blood and tissues using lc-esi-ms/ms: an application for pharmacokinetic studies drug treatment options for the -new coronavirus ( -ncov) chloroquine and hydroxychloroquine as available weapons to fight covid- response to recent commentaries regarding the involvement of angiotensin-converting enzyme (ace ) and renin-angiotensin system blockers sars-cov- infections what is the role of covid- infection in hypertensive patients with diabetes? new insights on the antiviral effects of chloroquine against coronavirus: what to expect for covid- ? mechanisms of action of hydroxychloroquine and chloroquine: implications for rheumatology chloroquine is a potent inhibitor of sars coronavirus infection and spread key: cord- - fxrqorg authors: guebre-xabier, mimi; patel, nita; tian, jing-hui; zhou, bin; maciejewski, sonia; lam, kristal; portnoff, alyse d.; massare, michael j.; frieman, matthew b.; piedra, pedro a.; ellingsworth, larry; glenn, gregory; smith, gale title: nvx-cov vaccine protects cynomolgus macaque upper and lower airways against sars-cov- challenge date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fxrqorg there is an urgent need for a safe and protective vaccine to control the global spread of sars-cov- and prevent covid- . here, we report the immunogenicity and protective efficacy of a sars-cov- subunit vaccine (nvx-cov ) produced from the full-length sars-cov- spike (s) glycoprotein stabilized in the prefusion conformation. cynomolgus macaques (macaca fascicularis) immunized with nvx-cov and the saponin-based matrix-m adjuvant induced anti-s antibody that was neutralizing and blocked binding to the human angiotensin-converting enzyme (hace ) receptor. 
following intranasal and intratracheal challenge with sars-cov- , immunized macaques were protected against upper and lower infection and pulmonary disease. these results support ongoing phase / clinical studies of the safety and immunogenicity of nvx-cov vaccine (nct ). highlights full-length sars-cov- prefusion spike with matrix-m ™ (nvx-cov ) vaccine. induced hace receptor blocking and neutralizing antibodies in macaques. vaccine protected against sars-cov- replication in the nose and lungs. absence of pulmonary pathology in nvx-cov vaccinated macaques. the in-life portion of the study was conducted at bioqual, inc (rockville, md anti-sars-cov- spike (s) protein igg elisa titers were measured as described [ ]. anti-s igg ec titers were calculated by -parameter fitting using softmax pro . . gxp software. individual animal anti-s igg ec titers, group geometric mean titers (gmt) were plotted using graphpad prism . software. antibodies that block binding of hace receptor to the s-protein and neutralize in a cytopathic effect assay (cpe) in vero e cells were measured as described previously as the serum titer that blocks % cpe [ ]. serum antibody titer at % binding inhibition (ic ) of hace to sars-cov- s protein was determined in the softmax pro program. individual animal hace receptor inhibiting titers, mean titers, and sem were plotted using graphpad prism . software. neutralizing antibody titers were figure a) . in contrast, sars-cov- anti-s antibody in convalescent human sera was . -to . -fold less with at gmt ec of , ( figure b) . and, hace receptor inhibition titers of , , , and , in . , , and µg nvx-cov dose groups respectively were . - . -fold higher than in convalescent sera ( figure c) . finally, sars-cov- gmt neutralization antibody titers of , - , cpe in immunized macaques, were . - . -fold higher than in convalescent sera ( figure d) . 
to evaluate the potential efficacy of nvx-cov vaccine, macaques were challenged with sars-cov- virus in upper and lower airways. macaques in the placebo group had , sgrna copies/ml in the bal at days post challenge and remained elevated at day except for one animal. in contrast, immunized animals had no detectable sgrna in bal fluid other than one animal in the low dose group at day which cleared replicating virus rna by day (figure e) . half of the controls had ~ log of virus sgrna copies in nasal swabs and in contrast, no detectable sgrna was in the nose of nvx-cov vaccinated animals ( figure f) . [figure: nvx-cov nhp vs convalescent anti-s igg titer] funding for certain studies was provided by the coalition for epidemic preparedness author contributions conceptualization of experiments, generation of data and analysis, and interpretation of the results drafting and making critical revisions with the help of others sars-cov- spike glycoprotein vaccine candidate nvx-cov elicits immunogenicity in baboons and protection in mice first-in-human trial of a sars cov recombinant spike protein cryo-em structure of the -ncov spike in the prefusion conformation virological assessment of hospitalized patients with covid- infection with novel coronavirus (sars-cov- ) causes pneumonia in rhesus macaques sars-cov- infection protects against rechallenge in rhesus macaques development of an inactivated vaccine candidate for sars-cov- neeltje van doremalen chadox ncov- vaccination prevents sars-cov- pneumonia in rhesus macaques dna vaccine protection against sars-cov- in rhesus macaques evaluation of the mrna- vaccine against sars-cov- in nonhuman primates comparative pathogenesis of covid- , mers, and sars in a nonhuman primate model authors mgx, np, jht, bz, sm, kl, adp, mjm, gg, gs and le are current or past employees of novavax, inc., a for-profit organization, and these authors own stock or hold stock options.
these interests do not alter the authors adherence to policies on sharing data and materials. mbf and pap declare no competing interests. key: cord- - rpguepv authors: yan, kexin; rawle, daniel j.; le, thuy t.t.; suhrbier, andreas title: simple rapid in vitro screening method for sars-cov- anti-virals that identifies potential cytomorbidity-associated false positives date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rpguepv the international sars-cov- pandemic has resulted in an urgent need to identify new anti-viral drugs for treatment of covid- patients. the initial step to identifying potential candidates usually involves in vitro screening. here we describe a simple rapid bioassay for drug screening using vero e cells and inhibition of cytopathic effects (cpe) measured using crystal violet staining. the assay clearly illustrated the anti-viral activity of remdesivir, a drug known to inhibit sars-cov- replication. a key refinement involves a simple growth assay to identify drug concentrations that cause cellular stress or “cytomorbidity”, as distinct from cytotoxicity or loss of viability. for instance, hydroxychloroquine shows anti-viral activity at concentrations that slow cell growth, arguing that its purported in vitro anti-viral activity arises from non-specific impairment of cellular activities.
the global sars-cov- pandemic has resulted in widespread activities seeking to identify new anti-viral drugs that might be used to treat covid- patients [ ] [ ] [ ] [ ] . remdesivir has emerged as a lead candidate with clear anti-viral activity in vitro and non-human primates , with some promising early results in human trials , . the quest for new anti-viral drugs for sars-cov- (as for other viruses) usually begins with in vitro screening to identify potential candidates - . initial screening usually involves assessing whether drugs can inhibit virus replication in a permissive cell line, with vero e cells widely used for sars-cov- . such in vitro screening approaches often identify drugs that work well in vitro, but ultimately fail to have anti-viral activity in vivo. for translational inhibition is also a key anti-viral response, which is able to inhibit replication of many viruses , , including coronaviruses . thus a drug that has no specific anti-viral activity, but that is able to induce cellular stress, may therefore inhibit virus replication non-specifically and generate a potential false positive in the screening assay. a key outcome of stress responses is usually to slow cell growth, allowing the cell to either recover, or if stress and/or damage is excessive, to induce cell death [ ] [ ] [ ] . cells that are slightly poisoned or otherwise compromised (without induction of stress responses) would likely also show reduced growth rates.
cell growth of vero e cells can be very simply measured by seeding cells per well in triplicate into a well flat bottom plate and culturing with a range of drug concentrations for days followed by crystal violet staining. the percentage of protein staining relative to a no-drug control is then calculated and provides a simple measure of the drug concentration that slows cell growth. perhaps not surprisingly, the drug concentrations that caused inhibition of cell growth were usually lower than the drug concentrations that caused cytotoxicity (fig. , compare black circles with green squares). for some drugs the concentration differences for these two activities were > fold ( fig. , ribavirin, cycloheximide, oleuropein, didemnin b). inhibition of cell growth is not really cytostasis, which generally means no growth, and not really cytotoxicity, which is generally viewed as cell death. the reason(s) for reduced cell growth induced by any given drug may not be clear, and may be related to stress responses or some other phenomenon that compromises the cells' normal metabolic activities. hence we suggest the term "cytomorbidity" to denote a level of cytotoxicity insufficient to kill the cells or induce cytostasis, but sufficient to stress or compromise the cells, with a simple growth bioassay used to indicate cytomorbidity. a simple rapid bioassay for screening drugs for potential antiviral activity against sars-cov- is to determine whether the drug can inhibit virus-induced cytopathic effects (cpe) in vero e cells. remdesivir is known to inhibit sars-cov- replication and is used herein to illustrate the behavior of an effective drug in this bioassay. remdesivir was able to inhibit virus-induced cpe by % at ≈ µg/ml and the drug caused % cytotoxicity at ≈ µg/ml, providing a selectivity index of ≈ . importantly, remdesivir showed cytomorbidity at ≈ µg/ml, which still leaves a selectivity index of ≈ ( fig. and , remdesivir).
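the two calculations described above (staining as a percentage of the no-drug control, and the selectivity index as a ratio of two concentrations) can be sketched as follows. this is a minimal illustration, not the authors' analysis code: the function names, the optional background subtraction and all numeric values are hypothetical assumptions introduced here.

```python
from statistics import mean


def percent_of_control(treated_ods, control_ods, background_od=0.0):
    """Staining of drug-treated wells as a percentage of the no-drug control.

    treated_ods / control_ods are replicate optical densities (e.g. triplicates).
    background_od is an assumed optional blank value, not taken from the source.
    """
    control = mean(control_ods) - background_od
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (mean(treated_ods) - background_od) / control


def selectivity_index(impairing_conc, antiviral_conc):
    """Ratio of the concentration that impairs cells to the antiviral concentration.

    impairing_conc may be a cytotoxic concentration (conventional index) or a
    cytomorbid concentration (the more conservative index discussed above).
    Both concentrations must be in the same units.
    """
    if antiviral_conc <= 0:
        raise ValueError("antiviral concentration must be positive")
    return impairing_conc / antiviral_conc


if __name__ == "__main__":
    # Hypothetical triplicate readings, for illustration only.
    growth = percent_of_control([0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
    print(f"growth relative to no-drug control: {growth:.1f}%")
    # Hypothetical concentrations in µg/ml, for illustration only.
    print("index vs cytotoxicity:", selectivity_index(100.0, 2.0))
    print("index vs cytomorbidity:", selectivity_index(20.0, 2.0))
```

computing the index against the cytomorbid concentration as well as the cytotoxic one makes any overlap between antiviral activity and impaired growth visible as an index close to 1.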
hydroxychloroquine was able to inhibit viral cpe by % at ≈ µg/ml and showed a % loss of viability using the mts assay at ≈ µg/ml, suggesting a selectivity index of ≈ . however, cytomorbidity was clearly evident at ≈ µg/ml, so the anti-viral activity occurred at similar concentrations to those that caused cytomorbidity ( fig. , hydroxychloroquine), indicating a potential false positive. the overlapping activities are clearly evident when the crystal violet stained plates are viewed (fig. ) . the close relationship between anti-viral activity and translation inhibition (inherent in the stress responses described above) can be seen with the use of the translation inhibitors, cycloheximide and didemnin b. these drugs provide selectivity indices of > , when comparing viral cpe inhibition and cytotoxicity. however, concentrations that inhibited viral cpe again overlapped with those that caused cytomorbidity ( fig. , cycloheximide, didemnin b). the drug γ-mangostin would appear to have a small level of anti-viral activity with a low selectivity index, but again this activity overlapped with the cytomorbidity (fig. , γ-mangostin). thus, as for hydroxychloroquine, the assay results for these latter drugs provide no supportive data for anti-viral activity; instead, they suggest these drugs inhibit viral replication non-specifically by impairing cellular activities. nitazoxanide showed some anti-viral activity, but this coincided with cytotoxicity, providing an example of the conventional cytotoxicity control that would be used to argue that the drug has no specific anti-viral activity and has a selectivity index of (fig. , nitazoxanide) . curiously, higher concentrations of nitazoxanide were needed to inhibit cell growth than were needed to induce cytotoxicity, likely an example of cell density-associated toxicity. the frequently used mts assay, as expected, often gave results similar to those provided by the cytotoxicity assay.
importantly the mts assay did not provide a measure of cytomorbidity, presumably because mitochondria largely remain active even in stressed cells and/or cells in g (cytostasis). for oleuropein, cyclosporine a and γ-mangostin, cytomorbidity was associated with an increase in mts activity (fig. ). the mts bioassay may thus provide slightly misleading information in this context; i.e. increased mitochondrial activity, rather than indicating increased cell numbers, can sometimes be associated with stress or mild toxicity. the cpe based assay described herein is really only useful for screening drugs that target the virus directly. for instance, drugs whose mechanism of action requires induction of type i interferons, would be ineffective as vero e cells do not make type i interferons. another limitation of using virus-induced cpe as a read-out for anti-viral drugs is sensitivity. higher drug concentrations are likely needed to prevent viral cpe (overwhelming infection resulting in cell death) than would be needed to inhibit viral replication as measured (for instance) by qrt-pcr of virus released into culture supernatants . nevertheless, the cpe-based assay represents a screening tool able rapidly to identify promising anti-viral candidates. more sensitive assays could be also envisaged for assessing cytomorbidity, such as measuring activation of stress factors such as atf , analyzing cell cycle perturbations by flow cytometry or cell growth kinetics using the incucyte live-cell analysis system. however, the simple growth assay proposed herein allows rapid identification of drug concentrations that disrupt cellular activities/functions, which are often sufficient to inhibit viral replication non-specifically. the cytomorbidity assay thereby flags potential false positives. mts assay; at % same od as control cells with no drug or virus, at % background od. cytomorbidity; at % cells are growing at their normal rate, % no cell growth. 
viral cpe; at % the drug has completely prevented viral cpe, % represents full viral cpe (no antiviral activity). cell debris removed by centrifugation at x g for min at °c, and virus aliquoted and stored at - °c. virus titers were determined using standard ccid assays (see below). the virus was determined to be mycoplasma free using co-culture with a non-permissive cell line (i.e. hela) and hoechst staining as described . cycloheximide, nitazoxanide, ribavirin, hydroxychloroquine sulfate, γ-mangostin and oleuropein were all purchased from sigma aldrich ribavirin and hydroxychloroquine sulfate was dissolved in ultrapure distilled all other drugs were dissolved in dmso (sigma aldrich) vero e cells were plated as above, /well in triplicate in µl medium and cultured overnight. the drug was diluted in fold serial dilutions in rpmi supplemented with % fcs, and µl was then added per well the plates were cultured for days, after which they were fixed and stained with crystal violet as above. a mts assay was performed where indicated (before fixation and crystal violet staining) using celltiter aqueous one solution cell proliferation assay (mts) (promega) as per manufacturer's instructions vero e cells were plated at cells per well in µl medium all other steps were performed as described above for cytotoxicity testing vero e cells were plated as above, /well in triplicate in µl medium and cultured overnight. 
the drug was added at times the indicated final concentration in µl rpmi supplemented with % fcs after days of culture the cells were fixed and stained with crystal violet as above heat shock protein inhibits lipopolysaccharide-induced inflammatory mediator production complete removal of mycoplasma from viral preparations using solvent extraction we thank dr i anraku for his assistance in managing the pc (bsl ) facility at qimr key: cord- -ruxz i authors: hennighausen, lothar; lee, hye kyung title: activation of the sars-cov- receptor ace by cytokines through pan jak-stat enhancers date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ruxz i ace , in concert with the protease tmprss , binds the novel coronavirus sars-cov- and facilitates its cellular entry. the ace gene is expressed in sars-cov- target cells, including type ii pneumocytes (ziegler, ), and is activated by interferons. viral rna was also detected in breast milk (wu et al., ), raising the possibility that ace expression is under the control of cytokines through the jak-stat pathway. here we show that ace expression in mammary tissue is induced during pregnancy and lactation, which coincides with the establishment of a candidate enhancer. the prolactin-activated transcription factor stat binds to tandem sites that coincide with activating histone enhancer marks and additional transcription components. the presence of pan jak-stat components in mammary alveolar cells and in type ii pneumocytes combined with the autoregulation of both stat and stat suggests a prominent role of cytokine signaling pathways in cells targeted by sars-cov- . ace , the receptor for sars-cov (imai et al., ) and sars-cov- (hoffmann et al., ) , has been identified in several target cells, including absorptive enterocytes (lamers et al., ) , secretory goblet cells (zhao, ) , the olfactory system (brann, ) and several epithelial cell types (brann, ; lukassen et al., ; qi et al., ) . 
a study in pneumocytes demonstrated that ace expression is induced by interferons (ziegler, ) , possibly through the transcription factors signal transducer and activator of transcription (stat) and , as the authors suggest. the stat family is comprised of seven transcription factors (stat , , , , a, b and ) that are activated by type i and ii cytokines through their respective receptors and the jak/tyk family of tyrosine kinases (stark and darnell, ) . although each cytokine receptor has some preference for individual stat members, it has become clear that any given cytokine can activate several, if not all, stat members, which subsequently bind to a shared dna motif, the gamma interferon activated sequence (gas) (hennighausen and robinson, ) . this permits individual genes to be activated by more than one cytokine through different receptors and several stat family members. while sars-cov- infection of lung epithelium is driving the disease, disturbances in other cell types (qi et al., ) , such as the olfactory system (brann, ) , have been observed. sars-cov- rna has also been detected in breast milk of infected patients suggesting that the virus can enter differentiated mammary alveolar cells and be vertically transmitted through breast feeding. based on the overlapping activities of jak-stat components and their potential redundancy, it is likely that ace expression is activated by a wide range of cytokines through stats , , and . this has profound implications for strategies to mitigate ace levels. interfering with individual stats will result in the compensational recruitment of other stat members to cytokine receptors (cui et al., ) with all its transcriptional consequences (hennighausen and robinson, ; shin et al., ) . ace mrna levels vary widely between cell types, with high expression detected in lactating mammary and intestinal tissues ( figure a -b) and type ii pneumocytes (ziegler, ) . 
to explore the possibility that ace gene expression in sars-cov- target cells is regulated not only by interferons but also by a range of cytokines through the family of stat transcription factors, we mined available scrna-seq data (ziegler, ) (table ) . interferon receptors (ifnar) and its downstream mediators jak , jak , tyk as well as stats , and are highly expressed, thus supporting the mechanism of ace induction by ifn-a/b and ifn-g. stat levels increase sharply in cells treated with ifns, supporting the notion of an autoregulatory loop (yuasa and hijikata, ) . moreover, these expression data point to the presence of functional stat and stat signaling cascades. interleukin receptors, such as il- r, that are dependent on the common gamma chain (il rg), jak and jak are also highly expressed. the presence of a wide range of cytokine receptors, jaks and stats, suggests that ace might be activated by a broad selection of extracellular cues and most cytokines, including growth hormone and prolactin. we have tested this premise and explored whether ace is activated in mouse mammary tissue through stat transcription factors. gene expression in mammary epithelium during pregnancy and lactation is activated by prolactin through stat (liu et al., ) . we observed an approximately -fold increase of ace mrna during pregnancy and lactation ( figure b ), which coincided with the establishment of a putative enhancer (figure a ). tmprss mrna levels were similar throughout pregnancy and lactation ( figure b ), suggesting that its expression is not under overt control of the jak/stat pathway. stat was recruited to two distinct gas (bona fide stat binding motifs) in the candidate enhancer and cooccupancy of the glucocorticoid receptor (gr), nuclear factor b (nfib) and mediator complex subunit (med ) is likely not through their individual recognition motifs but through contacting stat . 
the presence of h k me enhancer marks, h k ac marks and rna polymerase ii (pol ii) occupancy further supports the validity of this regulatory region. of note, no stat occupancy was observed, suggesting a predominance of stat . in contrast to mammary tissue, limited stat binding was observed in liver and no stat and stat binding was observed in kidney tissue ( figure b ). the putative autoregulatory enhancer in the stat gene served as a positive control for stat binding ( figure c ). our study demonstrates the presence of pan jak/stat components in type ii pneumocytes, suggesting that ace is not only activated by ifn-a/b and ifn-g but also by other cytokines. moreover, we demonstrate an ~ -fold increase of ace expression by pregnancy and lactation hormones in mouse mammary tissue. future inquiries aimed at understanding the mechanism of ace gene regulation in potential sars-cov- target cells need to address the pan jak-stat pathway as well as steroid hormones, which might explain some of the sex differences seen in covid- morbidity and mortality. such investigations would need to include experimental approaches that comprehensively interrogate regulatory elements controlling ace expression in vivo in human tissues, both in males and females at different ages. since underlying preexisting conditions, such as obesity, diabetes and high blood pressure, can affect the severity and progression of covid- , it would be prudent to take this into account when analyzing the control of ace regulation. chromatin immunoprecipitation sequencing (chip-seq) analysis. quality filtering and alignment of the raw reads was done using trimmomatic (bolger et al., ) was used for visualization. coverage plots were generated using homer (heinz et al., ) software with the bedgraph from deeptools as input. r and the packages dplyr (https://cran.r-project.org/package=dplyr) and ggplot (love et al., ) were used for visualization. each chip-seq experiment was conducted for two replicates.
sequence read numbers were calculated using samtools (masella et al., ) software with sorted bam files. the correlation between the chip-seq replicates was computed with deeptools using spearman correlation. rna-seq analysis. rna-seq reads were analyzed using trimmomatic (bolger et al., ) (version . ) to check read quality (with the following parameters: leading: , trailing: , slidingwindow: : , minlen: ). the alignment was performed with the bowtie aligner (langmead et al., ) (version . . ) in paired-end mode. rna-seq data from human and mouse tissues shown in figure a were obtained from encode. rna-seq data shown in fig. b the authors declare no competing interests. table . mrna levels of genes associated with the pan jak-stat pathway in primary human basal epithelial cells. scrna-seq data were extracted from the study by ziegler and colleagues (ziegler, ) . the human bronchial cell line (beas- b) and airway basal cells from human donors had been exposed to interferons (ifna and ifng) and cytokines (il and il a). scrna-seq libraries were generated with , cells. mrna levels for genes in the jak/stat signaling pathway were collected from the data, and averages of independent biological replicates were normalized to the value of the untreated group. genes that were regulated more than -fold by interferons and cytokines are marked in red and highlighted colors.
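the normalization described in the table legend (replicate averages divided by the untreated value, then flagged against a fold-change cutoff) can be sketched as below. this is an illustrative sketch under stated assumptions: the condition names, expression values and cutoff are hypothetical, and treating "regulated" as covering both induction and repression is an assumption made here, not something the legend states.

```python
from statistics import mean


def fold_changes(replicates_by_condition, untreated_key="untreated"):
    """Average replicates per condition and normalize to the untreated group."""
    averages = {cond: mean(vals) for cond, vals in replicates_by_condition.items()}
    baseline = averages[untreated_key]
    if baseline <= 0:
        raise ValueError("untreated average must be positive")
    return {cond: avg / baseline for cond, avg in averages.items()}


def flag_regulated(fold_change_by_condition, cutoff):
    """Return conditions whose fold change exceeds the cutoff in either direction.

    Assumption: repression by the same factor (fold change <= 1/cutoff)
    also counts as regulated.
    """
    return sorted(
        cond
        for cond, fc in fold_change_by_condition.items()
        if fc >= cutoff or fc <= 1.0 / cutoff
    )


if __name__ == "__main__":
    # Hypothetical expression values for one gene, two replicates per condition.
    gene = {
        "untreated": [2.0, 2.0],
        "ifna": [7.0, 9.0],
        "il4": [3.0, 3.0],
    }
    fc = fold_changes(gene)
    print(fc)
    print(flag_regulated(fc, cutoff=2.0))  # cutoff value is hypothetical
```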
trimmomatic: a flexible trimmer for illumina sequence data non-neuronal expression of sars-cov- entry genes in the olfactory system suggests mechanisms underlying covid- -associated anosmia loss of signal transducer and activator of transcription leads to hepatosteatosis and impaired liver regeneration simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities interpretation of cytokine signaling through the transcription factors stat a and stat b sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor angiotensin-converting enzyme protects from severe acute lung failure sars-cov- productively infects human gut enterocytes ultrafast and memoryefficient alignment of short dna sequences to the human genome stat a is mandatory for adult mammary gland development and lactogenesis moderated estimation of fold change and dispersion for rna-seq data with deseq sars-cov- receptor ace and tmprss are primarily expressed in bronchial transient secretory cells bamql: a query language for extracting reads from bam files single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses deeptools : a next generation web server for deepsequencing data analysis hierarchy within the mammary stat -driven wap super-enhancer the jak-stat pathway at twenty integrative genomics viewer (igv): high-performance genomics data visualization and exploration coronavirus disease among pregnant chinese women: case series data on the safety of vaginal birth and breastfeeding distal regulatory element of the stat gene potentially mediates positive feedback control of stat expression single-cell rna expression profiling of ace , the receptor of sars-cov- sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues key: cord- -sygnmiun authors: lam, sd; 
bordin, n; waman, vp; scholes, hm; ashford, p; sen, n; van dorp, l; rauer, c; dawson, nl; pang, csm; abbasian, m; sillitoe, i; edwards, sjl; fraternali, f; lees, jg; santini, jm; orengo, ca title: sars-cov- spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sygnmiun sars-cov- has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to infections in humans and other mammals. to enter host cells, the viral spike protein (s-protein) binds to its receptor, ace , and is then processed by tmprss . whilst receptor binding contributes to the viral host range, s-protein:ace complexes from other animals have not been investigated widely. to predict infection risks, we modelled s-protein:ace complexes from vertebrate species, calculated changes in the energy of the complex caused by mutations in each species, relative to human ace , and correlated these changes with covid- infection data. we also analysed structural interactions to better understand the key residues contributing to affinity. we predict that mutations are more detrimental in ace than tmprss . finally, we demonstrate phylogenetically that human sars-cov- strains have been isolated in animals. our results suggest that sars-cov- can infect a broad range of mammals, but few fish, birds or reptiles. susceptible animals could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance. severe acute respiratory syndrome coronavirus (sars-cov- ) is a novel coronavirus that emerged towards the end of and is responsible for the coronavirus disease (covid- ) global pandemic. available data suggests that sars-cov- has a zoonotic source [ ] , with the closest sequence currently available deriving from the horseshoe bat [ ] . as yet, the transmission route to humans, including the intermediate host, is unknown. 
so far, little work has been done to assess the animal reservoirs of sars-cov- , or the potential for the virus to spread to other species living with, or in close proximity to, humans in domestic, rural, agricultural or zoological settings. coronaviruses, including sars-cov- , are major multi-host pathogens and can infect a wide range of non-human animals [ ] [ ] [ ] . sars-cov- is a member of the betacoronavirus genus, which includes viruses that infect economically important livestock, including cows [ ] and pigs [ ] , together with mice [ ] , rats [ ] , rabbits [ ] , and wildlife, such as antelope and giraffe [ ] . severe acute respiratory syndrome coronavirus (sars-cov), the betacoronavirus that caused the - sars outbreak [ ] , likely jumped to humans from its original bat host via civets. viruses genetically similar to human sars-cov have been isolated from animals as diverse as racoon dogs, ferret-badgers [ ] and pigs [ ] , suggesting the existence of a large host reservoir. it is therefore probable that sars-cov- can also infect a wide range of species. real-world sars-cov- infections have been reported in cats [ ] , lions and tigers [ ] , dogs [ , ] and minks [ , ] . animal infection studies have also identified cats [ ] and dogs [ ] as hosts, as well as ferrets [ ] , macaques [ ] and marmosets [ ] . recent in vitro studies have also suggested an even broader set of animals may be infected [ ] [ ] [ ] . to understand the potential host range of sars-cov- , the plausible extent of zoonotic and anthroponotic transmission, and to guide surveillance efforts, it is vital to know which species are susceptible to sars-cov- infection. the receptor binding domain (rbd) of the sars-cov- spike protein (s-protein) binds to the extracellular peptidase domain of angiotensin i converting enzyme (ace ) mediating cell entry [ ] . the sequence of ace is highly conserved across vertebrates, suggesting that sars-cov- could use orthologues of ace for cell entry. 
the structure of the sars-cov- s-protein rbd has been solved in complex with human ace [ ] . identification of critical binding residues in this structure have provided valuable insights into viral recognition of the host receptor [ ] [ ] [ ] [ ] [ ] [ ] . deep mutagenesis studies have also revealed residues important for stability [ , ] . compared with sars-cov, the sars-cov- s-protein has a - -fold higher affinity for human ace [ , , ] , due to more contacts in the interface that cover a larger surface area [ ] , and three mutational hotspots in the sprotein that lead to a more specific and compact conformation [ , ] . similarly, variations in human ace have also been found to increase affinity for s-protein receptor binding [ ] . these factors may contribute to the host range and infectivity of sars-cov- . both sars-cov- and sars-cov additionally require the transmembrane serine protease (tmprss ) to mediate cell entry. together, ace and tmprss confer specificity of host cell types that the virus can enter [ , ] . upon binding to ace , the s-protein is cleaved by tmprss at two cleavage sites on separate loops, which primes the s-protein for cell entry [ ] . tmprss has been docked against the sars-cov- s-protein, which revealed its binding site to be adjacent to these two cleavage sites [ ] . an approved tmprss protease inhibitor drug is able to block sars-cov- cell entry [ ] , which demonstrates the key role of tmprss alongside ace [ ] . as such, both ace and tmprss represent attractive therapeutic targets against sars-cov- [ ] . recent work has predicted possible hosts for sars-cov- using the structural interplay between the s-protein and ace . these studies proposed a broad range of hosts, covering hundreds of mammalian species, including tens of bat [ ] and primate [ ] species, and more comprehensive studies analysing all classes of vertebrates [ ] [ ] [ ] , including agricultural species of cow, sheep, goat, bison and water buffalo. 
in addition, sites in ace have been identified as under positive selection in bats, particularly in regions involved in binding the s-protein [ , ] . the impacts of mutations in ace orthologues have also been tested, for example structural modelling of ace from primate species [ ] demonstrated that apes and african and asian monkeys may also be susceptible to sars-cov- . however, whilst cell entry is necessary for viral infection, it may not be sufficient alone to cause disease. for example, variations in other proteins may prevent downstream events that are required for viral replication in a new host. hence, examples of real-world infections [ ] [ ] [ ] [ ] and experimental data from animal infection studies [ ] [ ] [ ] [ ] [ ] are required to validate hosts that are predicted to be susceptible. here, we analysed the effect of known mutations in orthologues of ace and tmprss from a broad range of vertebrate species, including primates, rodents and other placental mammals; birds; reptiles; and fish. for each species, we generated a -dimensional model of the ace protein structure from its protein sequence and calculated the impacts of known mutations in ace on the stability of the s-protein:ace complex. we correlated changes in the energy of the complex with changes in the structure of ace , chemical properties of residues in the binding interface, and experimental covid- infection phenotypes from in vivo and in vitro animal studies. to further test our predictions and rationalise the key sites contributing to energy changes of the complex, we performed detailed manual structural analyses, presented as a variety of case studies for different species. unlike other studies that analyse interactions that the s-protein makes with the host, we also analyse the impact of mutations in vertebrate orthologues of tmprss . 
our results suggest that sars-cov- could infect a broad range of vertebrates, which could serve as reservoirs of the virus, supporting future anthroponotic and zoonotic transmission. we aligned protein sequences of vertebrate orthologues of ace . most orthologues have more than % sequence identity with human ace (supplementary fig. a) . for each orthologue, we generated a -dimensional model of the protein structure from its protein sequence using funmod [ , ] . we were able to build high-quality models for vertebrate orthologues, with ndope scores < - (supplementary table ). low-quality models were removed from the analysis. ace residues directly contacting the s-protein (dc residues) were identified in a structure of the complex (pdb id m j; fig. a , supplementary results , supplementary fig. ) . we also identified a more extended set of both dc residues and residues within Å of dc residues likely to be influencing binding (dcex residues). after analysing the orthologue interfaces, we removed models that were missing > dcex residues. models were removed from the analysis, leaving models to take forward for further analysis. we observed high sequence (> % identity) and structure similarity (score > out of ) between ace proteins for all species (supplementary results ). we used multiple methods to assess the relative change in binding energy (ΔΔg) of the sars-cov- s-protein:ace complex following mutations in dc residues and dcex residues that are likely to influence binding. we found that protocol employing mcsm-ppi (henceforth referred to as p( )-ppi ), calculated over the dcex residues, correlated best with the phenotype data (supplementary results , supplementary fig. , table ), justifying the use of animal models to calculate ΔΔg values in this context. since this protocol considers mutations from animal to human, lower ΔΔg values correspond to stabilisation of the animal complex relative to the human complex, and therefore higher risk of infection. 
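The model quality filter described above, together with the choice of the lowest-ndope model per species described in the methods, can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the names `Model`, `best_model` and `passes_filter` are hypothetical, and the cutoffs and scores in the usage example are placeholders rather than the values used in the analysis.

```python
from dataclasses import dataclass


@dataclass
class Model:
    species: str
    ndope: float        # normalised DOPE score; lower (more negative) is better
    missing_dcex: int   # number of DCEX residues absent from the model


def best_model(candidates):
    """Pick the candidate model with the lowest nDOPE score."""
    return min(candidates, key=lambda m: m.ndope)


def passes_filter(model, ndope_cutoff, max_missing_dcex):
    """Keep a model only if its nDOPE is below the cutoff and it is not
    missing more than the allowed number of DCEX residues."""
    return model.ndope < ndope_cutoff and model.missing_dcex <= max_missing_dcex


# Placeholder candidates and cutoffs, for illustration only.
candidates = {
    "species_a": [Model("species_a", -0.9, 0), Model("species_a", -1.2, 0)],
    "species_b": [Model("species_b", -0.1, 0)],
    "species_c": [Model("species_c", -1.5, 12)],
}
selected = [best_model(ms) for ms in candidates.values()]
kept = [m for m in selected
        if passes_filter(m, ndope_cutoff=-0.5, max_missing_dcex=10)]
print([(m.species, m.ndope) for m in kept])
```

In this toy input, one species fails the nDOPE cutoff and another is missing too many DCEX residues, so only one model is carried forward.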
we show the residues that p( )-ppi reports as stabilising or destabilising for the sars-cov- s-protein:ace animal complex for dc ( supplementary fig. ) and dcex (supplementary fig. ) residues. to consider ΔΔg values in an evolutionary context, we annotated phylogenetic trees for all vertebrate species analysed ( supplementary fig. ) and for a subset of animals that humans come into close contact with in domestic, agricultural or zoological settings (fig. ). in general we see a high infection risk for most mammals, with a notable exception for all nonplacental mammals. ΔΔg values measured by p( )-ppi correlate well with the infection phenotypes (table ) . ΔΔg values are significantly lower for animals that can be infected by sars-cov- than for animals for which there is no evidence of infection ( fig. ; mann-whitney one-sided p = . x - ). two animals are outliers in the infected boxplot, corresponding to horseshoe bat (ΔΔg = . ) and marmoset (ΔΔg = . ). to be cautious, since in vivo experiments have shown that marmosets can be infected, and in vitro experiments have shown that horseshoe bats can be infected [ , [ ] [ ] [ ] ( table ), we consider animals that have ΔΔg values less than, or equal to, the ΔΔg = . for horseshoe bat to be at risk. additionally, there is a clear sampling bias in the set of animals that have so far been experimentally characterised: all but chicken and duck are mammals. as more nonmammals are tested, the median ΔΔg value for non-infection is likely to increase. in further support of these predictions we analysed the animals having experimental evidence using an orthogonal method, haddock [ ] , and found ~ % agreement between the two independent approaches for animals predicted to be at risk (see supplementary results and supplementary figure ). (fig. ) . as shown in previous studies, and supported by experimental data, many primates are predicted to be at high risk [ , , ] . 
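The one-sided comparison of ΔΔG values between infected and non-infected animals reported above can be illustrated with a small, self-contained Mann-Whitney test. The source does not state which implementation was used, so the sketch below substitutes an exact permutation version that is only practical for small samples; the function names and the ΔΔG values are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """Number of (xi, yj) pairs with xi < yj; ties count as 0.5.
    A large value means the x group tends to be lower than the y group."""
    return sum(1.0 if xi < yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)


def one_sided_exact_p(x, y):
    """Exact permutation p-value for the alternative 'x tends to be lower than y'."""
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    as_extreme = total = 0
    # Enumerate every way of relabelling n_x of the pooled values as group x.
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mann_whitney_u(gx, gy) >= observed:
            as_extreme += 1
    return as_extreme / total


# Placeholder ΔΔG values, for illustration only.
infected = [0.1, 0.2, 0.3]
not_infected = [1.0, 1.1, 1.2, 1.3]
p_value = one_sided_exact_p(infected, not_infected)
print(p_value)
```

With complete separation of the two groups, only one of the 35 possible relabellings is as extreme as the observed one, so the p-value is 1/35. For larger data sets a library implementation with a normal approximation would be the usual choice.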
in agricultural settings, camels, cows, sheep, goats and horses also have relatively low ΔΔg values, suggesting comparable binding affinities to humans, in agreement with experimental data [ , ] . in domestic settings, dogs [ ] , cats [ ] , hamsters [ ] , and rabbits [ ] [ ] [ ] also have ΔΔg values suggesting risk, again in agreement with experimental data (table ) . zoological animals that come into contact with humans, such as pandas, leopards and bears, are also at risk of infection, as shown experimentally [ ] . in predicting susceptibility, we have chosen thresholds supported by in vivo or in vitro experimental data. previous work contrasted the binding energy of the s-protein of sars-cov and sars-cov- with human ace protein [ , , ] . sars-cov is able to infect humans despite a ~ -fold lower binding affinity [ , , ] , suggesting that even where mutations in different animal species make the interfaces less compatible for sars-cov , a considerably decreased binding energy may still be sufficient to enable infection. by applying this threshold we correctly predict all animals in our dataset that have experimental evidence of infection to be at risk (table ) . however, for a few animals that we predict to be at risk using this threshold, in vitro experimental studies to date have not shown infection. for example, donkeys are predicted to be at risk of infection (ΔΔg = . ) but no infections were observed in vitro for these animals [ ] . however, infection has been observed in vitro for horse [ ] , and horse and donkey have identical dcex residues and the same ΔΔg. amongst new world monkeys, marmosets have been experimentally infected [ ] . we predict that the closely related capuchin and squirrel monkey are also at risk, although they have not been shown to be infected using functional assays [ ] . we performed detailed structural analyses to characterise the key residues contributing to binding energy changes and to consider these discrepancies further.
our analyses reveal that the interfaces in both capuchin and squirrel monkey are similar to marmoset, suggesting that these two new world monkeys are also likely to be at risk even though there are currently no experimental data supporting this [ ] (supplementary results ). furthermore, all these monkeys have high global sequence similarity to human. for capuchin and squirrel monkey this is >~ % and their dcex residues are identical to those of human, further supporting risk. in marmoset, which has experimental evidence of infection, the global sequence identity is % and % over the dcex residues. additionally, we compared changes in energy of the s-protein:ace complex in sars-cov- and sars-cov and found similar changes, suggesting that the range of animals susceptible to the virus is likely to be similar for sars-cov- and sars-cov (supplementary results ). ace and tmprss are key factors in the sars-cov- infection process. both are highly coexpressed in susceptible cell types, such as type ii pneumocytes in the lungs, ileal absorptive enterocytes in the gut, and nasal goblet secretory cells [ ] . since both proteins are required for infection of host cells, and since our analyses clearly support suggestions of conserved binding of s-protein:ace across animal species, we decided to analyse whether tmprss was similarly conserved. there is no known structure of tmprss , so we built a high-quality model (ndope = - . ) from a template structure (pdb id i ). since tmprss is a serine protease, and the key catalytic residues are known, we used funfams [ ] to identify highly conserved residues in the active site and the cleavage site that are likely to be involved in substrate binding. this resulted in two sets of residues that we analysed: the active site and cleavage site residues (ascs), and the active site and cleavage site residues plus residues within Å of catalytic residues that are highly conserved in the funfam (ascsex).
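The thresholding described above, in which any animal whose ΔΔG is no higher than that of a reference species with experimental evidence of infection is treated as at risk, can be sketched as below together with a comparison against experimental phenotypes. The function names, species labels and ΔΔG values are hypothetical placeholders; the actual reference value used in the study is not reproduced here.

```python
def predict_at_risk(ddg_by_species, reference_species):
    """Flag a species as at risk when its ΔΔG is less than or equal to
    the ΔΔG of the reference species (lower ΔΔG = higher risk)."""
    threshold = ddg_by_species[reference_species]
    return {sp: ddg <= threshold for sp, ddg in ddg_by_species.items()}


def discrepancies(predicted, observed_infection):
    """Compare predictions with experimental phenotypes.

    Returns (missed, unconfirmed):
      missed      - infected experimentally but not predicted at risk
      unconfirmed - predicted at risk but no infection observed so far
    """
    missed = sorted(sp for sp, inf in observed_infection.items()
                    if inf and not predicted[sp])
    unconfirmed = sorted(sp for sp, inf in observed_infection.items()
                         if not inf and predicted[sp])
    return missed, unconfirmed


# Placeholder values, for illustration only.
ddg = {"reference": 3.0, "species_a": 0.5, "species_b": 2.0, "species_c": 4.5}
observed = {"reference": True, "species_a": True,
            "species_b": False, "species_c": False}

predicted = predict_at_risk(ddg, "reference")
missed, unconfirmed = discrepancies(predicted, observed)
print(missed, unconfirmed)
```

A permissive threshold of this kind is designed so that the `missed` list stays empty, at the cost of some entries in `unconfirmed`, which are the cases that call for closer structural inspection.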
the sum of grantham scores for mutations in the active site and cleavage site for tmprss is zero or consistently lower than ace in all organisms under consideration, for both ascs and ascsex residues (fig. ) . this means that the mutations in tmprss involve more conservative changes. mutations in dcex residues seem to have a more disruptive effect in ace than in tmprss . whilst we expect orthologues from organisms that are close to humans to be conserved and have lower grantham scores, we observed some residue substitutions that have high grantham scores for primates, such as capuchin, marmoset and mouse lemur. in addition, primates, such as the coquerel sifaka, greater bamboo lemur and bolivian squirrel monkey, have mutations in dcex residues with high grantham scores. mutations in tmprss may render these animals less susceptible to infection by sars-cov- . a small-scale phylogenetic analysis was performed on a subset of sars-cov- assemblies in conjunction with a broader range of sars-like betacoronaviruses (supplementary table ), including sars-cov isolated from both humans and civets. consistent with previous phylogenetic work [ ] , sars-like viruses isolated from horseshoe bats (ratg , epi_isl_ ; rmyn , epi_isl_ ) are the closest relatives of sars-cov- strains currently available in genomic repositories (fig. ) , though still remain many decades divergent from sars-cov- [ ] . aided by a large community sequencing effort, tens of thousands of human-associated sars-cov- genome assemblies are now accessible on gisaid [ , ] . at the time of writing, these also include one complete assembly generated from a virus infecting a domestic dog (epi_isl_ ), one genome obtained from a zoo tiger (epi_isl_ ), one directly isolated from a domestic cat (epi_isl_ ) and high coverage complete genomes obtained from farmed mink (including epi_isl_ ). 
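Returning to the Grantham score comparison between ace and tmprss described above, the per-species sum over a chosen set of residue positions can be sketched as follows. This is an illustrative sketch only: the lookup table holds a handful of pairwise Grantham distances, which should be checked against the published matrix before any real use, and the sequences, positions and function names are hypothetical.

```python
# A few pairwise Grantham distances (symmetric). This is a small subset for
# illustration; verify against the published matrix and extend to all pairs.
GRANTHAM = {
    frozenset("DE"): 45,
    frozenset("DN"): 23,
    frozenset("IL"): 5,
    frozenset("KR"): 26,
}


def grantham(a, b):
    """Grantham distance between two residues; identical residues score 0."""
    if a == b:
        return 0
    return GRANTHAM[frozenset((a, b))]


def grantham_sum(human_aln, animal_aln, positions):
    """Sum Grantham distances over the given 0-based alignment columns."""
    return sum(grantham(human_aln[i], animal_aln[i]) for i in positions)


# Hypothetical aligned interface residues, for illustration only.
human = "DKLD"
animal = "EKIN"
total = grantham_sum(human, animal, range(len(human)))
print(total)
```

A sum of zero means every selected residue is identical to human; larger sums indicate less conservative substitutions across the selected positions.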
sars-cov- strains from animal infections fall among the phylogenetic diversity observed in a representative set of human strains (fig. a) , as also seen in larger phylogenetic analyses available on nextstrain (https://nextstrain.org/ncov/global). irrespective of host, the sars-cov- spike receptor binding domain is conserved (fig. b) across tested human and animal associated sars-cov- , suggesting mutations in the rbd are not required for infections observed in non-human species to date. of note, whilst genome-wide data indicates a closer phylogenetic relationship between sars-cov- strains and species in circulation in horseshoe bats, the receptor binding domain alignment instead supports a closer relationship with a sars-like virus isolated from pangolins [ ] (epi_isl_ ; fig. b ), in line with previous reports [ ] . the ongoing covid- global pandemic has a zoonotic origin, necessitating investigations into how sars-cov- infects animals, and how the virus can be transmitted across species. given the role that the stability of the complex, formed between the s-protein and its receptors, could contribute to the viral host range, zoonosis and anthroponosis, there is a clear need to study these interactions. however, to our knowledge there have been few studies of relative changes in the energies of the sprotein:ace complex [ ] . a number of recent studies [ , , ] have suggested that, due to high conservation of ace , some animals are vulnerable to infection by sars-cov- . concerningly, these animals could, in theory, serve as reservoirs of the virus, increasing the risk of future zoonotic events, though transmission rates across species are currently not known. therefore, it is important to try to predict which other animals could potentially be infected by sars-cov- , so that the plausible extent of transmission can be estimated, and surveillance efforts can be guided appropriately. 
animal susceptibility to infection by sars-cov- has been studied in vivo [ , , , , ] and in vitro [ ] [ ] [ ] during the course of the pandemic. parallel in silico work has made use of the protein structure of the s-protein:ace complex to computationally predict the breadth of possible viral hosts. most studies simply considered the number of residues mutated relative to human ace [ , , ] , although some also analysed the effect that these mutations have on interface stability [ , , ] . the most comprehensive of these studies analysed the number and locations of mutated residues in ace orthologues from species [ ] , but did not perform detailed energy calculations as we have done. few studies have explored changes in the energy of the s-protein:ace complex on a large scale. shortly after we reported our work on biorxiv, rodrigues et al. [ ] submitted a paper to biorxiv also reporting changes in binding energy of the complex for different animal species, measured using a different approach (haddock [ ] ). the results are in good agreement with ours (nearly % of the risk assessments agree). furthermore, when we applied haddock to the animals for which experimental data exist, we also observed significant agreement in risk assessments with those predicted using mcsm-ppi (see supplementary results , supplementary figure and supplementary methods ). our haddock analysis showed slightly better correlation with experiment than the rodrigues et al. study, possibly due to use of a different template structure when building the animal models ( m j, a better resolved structure than m used by rodrigues et al.) . our work is the only study that has so far explored changes in the energy of the s-protein:ace complex on a very large scale ( animals) in order to assess risk of infection across a broad range of animal species. furthermore, it is the only study to assess whether changes in tmprss could also be influencing risk.
in this study, we performed a comprehensive analysis of the major proteins that sars-cov- uses for cell entry. we predicted structures of ace and tmprss orthologues from vertebrate species and modelled s-protein:ace complexes. we calculated relative changes in energy (ΔΔg) of sprotein:ace complexes, in silico, following mutations from animal residues to those in human. our predictions suggest that, whilst many mammals are susceptible to infection by sars-cov- , most birds, fish and reptiles are not likely to be. however, there are some exceptions. we manually analysed residues in the s-protein:ace interface, including dc residues that directly contacted the other protein, and dcex residues that also included residues within Å of the binding residues, that may affect binding. we clearly showed the advantage of performing more sophisticated studies of the changes in energy of the complex, over more simple measures--such as the number or chemical nature of mutated residues--used in other studies. furthermore, the wider set of dcex residues that we identified near the binding interface had a higher correlation to the phenotype data than the dc residues. in addition to ace , we also analysed how mutations in tmprss impact binding to the s-protein. we found that mutations in tmprss are less disruptive than mutations in ace , indicating that binding interactions in the s-protein:tmprss complex in different species will not be affected. to increase our confidence in assessing changes in the energy of the complex, we developed multiple protocols using different, established methods. we correlated these stability measures with experimental infection phenotypes in the literature, from in vivo [ , , , ] and in vitro [ ] [ ] [ ] studies of animals. protocol using mcsm-ppi (p( )-ppi ) correlated best with the number of mutations, chemical changes induced by mutations and infection phenotypes, so we chose to focus our analysis employing this protocol. 
our method cannot determine relative changes in energy that are associated with no risk. instead, we used experimental in vivo and in vitro infection data as the gold standard to identify animals at risk. of note, horseshoe bats, heavily advocated as a putative reservoir host, are indicated by in vitro experiments to be susceptible to infection, despite the considerable disruption in the interface that our detailed structural analysis shows. we found that our predicted ΔΔg values for animals that can be infected by sars-cov- are significantly lower than for animals that showed no infection when tested experimentally (fig. , table ). ΔΔg values for horseshoe bat and marmoset were outliers for infected animals. these ΔΔg values are higher than the median ΔΔg value for animals that are not infected and are approximately the same value as the median ΔΔg = . for all animals included in this study. however, this may be a result of the biased sampling of animals that have been tested experimentally, where most have been mammals to date. going forward, if more distantly related animals are experimentally characterised, it is plausible that non-placental animals, of which many have ΔΔg > . (the value obtained for horseshoe bat), would be found to not be infected. the difference between ΔΔg values for animals that can, and cannot, be infected by sars-cov- is therefore likely to increase. overall, our measurements of the change in energy of the complex for the sars-cov s-protein were highly correlated with sars-cov- , so our findings are also applicable to sars-cov. humans are likely to come into contact with of these species in domestic, agricultural or zoological settings (fig. ) . of particular concern are sheep, which have no change in energy of the s-protein:ace complex, as these animals are farmed and come into close contact with humans. indeed, sars-cov- is already responsible for infections in various animal species.
sars-cov- genomes [ , ] have been isolated from natural infections in zoo lions and tigers [ ] , companion animals including cats and dogs [ , ] and following widespread outbreaks in multiple mink farms in the netherlands resulting in mass culling [ ] (fig. ) . in most cases natural infections have been linked to human infections supporting cross-species transmission and high levels of exposure [ ] . to date, minks provide the only well supported example of sustained intraspecies transmission with secondary zoonotic transmission back to humans [ ] . consistently, we predict american mink to be at risk of infection by sars-cov- , with ΔΔg = . . to gain a better understanding of the nature of the s-protein:ace interface, we performed more detailed structural analyses for a subset of species. in a few cases, we had found discrepancies between our energy calculations and experimental phenotypes, namely predicting risk for some animals where in vitro experiments showed no infection (table ) . to test our predictions, we manually analysed how the shape or chemistry of residues may impact complex stability for all dc residues and a selection of dcex residues. previous studies have identified a number of important locations in human ace for binding the s-protein [ , ] and we found agreement with these in structural studies using our animal models. these locations, namely the hydrophobic cluster near the n-terminus and two hotspot locations near residues and , stabilise the binding interface. sars-cov- exploits the hydrophobic pocket by mutations that alter the conformation and flexibility of the rbd loop, together with point mutation l f that provides a compact interface which is more dynamically and energetically favourable compared with sars-cov [ , ] . our structural analysis showed how sars-cov- can utilise this pocket for binding at the interface in all the species we examined, including those for which current experimental test data suggest no risk. 
hotspot shows more structural variability. in agreement with our calculations of large changes in energy of the s-protein:ace complex in horseshoe bat, our structural studies show that the variant d n causes the loss of a salt bridge and h-bonding interactions between ace and s-protein at hotspot . these detailed structural analyses are supported by the high grantham score and calculated total ΔΔg for the change in energy of the complex. both dog and cat have a physicochemically similar variant at this hotspot (d e), which although disrupting the salt bridge still permits alternative h-bonding interactions between the spike rbd and ace . for marmosets and other new world monkeys, capuchin and squirrel monkey, our structural analyses revealed similarity to human at hotspot in the ace interface. in fact, capuchin resembled the human ace interface even more closely than marmoset, which can be infected [ ] , even though in vitro experiments have not reported infection in capuchin [ ] . our ΔΔg value for capuchin (ΔΔg = . ) suggests risk of infection, and also for squirrel monkey (ΔΔg = . ), despite the fact that squirrel monkey also failed to show risk in in vitro experimental studies. of note is the fact that marmoset showed no infection in in vitro studies [ ] , whilst recent in vivo experiments [ ] have shown risk, perhaps suggesting that it can be difficult to detect infection in vitro for these monkeys. alternatively, the lack of infection may suggest additional factors influencing infection and indicate that these animals, which are primates closely related to human, may be useful models for studying immune, or other factors, related to resistance. finally, our structural analyses showed that some dcex residues were likely to be allosteric sites, which may represent promising drug targets [ ] . 
the value of our study is not in determining an absolute ΔΔg threshold for risk, but rather in providing information about relative changes in binding energy that will allow the host range of the virus to be more accurately gauged once more experimental work has been conducted. we believe that false positive predictions are more acceptable than false negatives. so, within the context of possible transmission events between species, and particularly to human, we consider that an animal can be infected if there is any experimental evidence of infection. we applied protocols that enabled a comprehensive study of host range, within a reasonable time, for identifying species at risk of infection by sars-cov- , or of becoming reservoirs of the virus. although we felt that these faster methods were justified by the need for timely answers to these questions, there are clearly caveats to our work that should be taken into account. whilst we use a state of the art modelling tool [ ] and an endorsed method for calculating changes in energy of the complex [ ] , molecular dynamics may give a more accurate picture of energy changes by sampling rotamer space more comprehensively [ ] . however, such an approach would have been prohibitively expensive at a time when it is clearly important to identify animals at risk as quickly as possible. each animal could take orders of magnitude longer to analyse using molecular dynamics. further caveats include the fact that although the animals we highlight at risk from our changes in binding energy calculations correlate well with the experimental data, there is only a small amount of such data currently available, and many of the experimental papers reporting these data are yet to be peer reviewed. finally, we restricted our analyses to one strain of sars-cov- , but other strains may have evolved with mutations that give more complementary interfaces. 
for example, recent work suggests sars-cov- can readily adapt to infect mice following serial passages [ ] . in summary, our work is not aiming to provide an absolute measure of risk of infection. rather, it should be considered an efficient method to screen a large number of animals and suggest possible susceptibility, and thereby guide further studies. any predictions of possible risk should be confirmed by experimental studies and computationally expensive, but more robust methods, like molecular dynamics. the ability of sars-cov- to infect host cells and cause covid- , sometimes resulting in severe disease, ultimately depends on a multitude of other host-virus protein interactions [ ] . while we do not investigate them all in this study, our results suggest that sars-cov- could indeed infect a broad range of mammals. as there is a possibility of creating new reservoirs of the virus, we should now consider how to identify such transmission early and to mitigate against such risks. in particular, farm animals and other animals living in close contact with humans could be monitored, protected where possible and managed accordingly [ ] . ace protein sequences for vertebrates, including humans, were obtained from ensembl [ ] version and eight sequences from uniprot release _ (supplementary table ). tmprss protein sequences for vertebrate sequences, including the human sequence, were obtained from ensembl (supplementary table ) . a phylogenetic tree of species, to indicate the evolutionary relationships between animals, was downloaded from ensembl [ ] . the structure [ ] of the sars-cov- s-protein bound to human ace at . Å was used throughout (pdb id m j). we used standard methods to analyse the sequence similarity between human ace and other vertebrate species (supplementary methods ). we also mapped ace and tmprss sequences to our cath functional families to detect residues highly conserved across species (supplementary methods ). 
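The sequence similarity comparison mentioned above can be illustrated with a percent-identity calculation over a pairwise alignment. The source only refers to "standard methods", so the choice of denominator here (aligned columns where neither sequence has a gap) is an assumption, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned sequences, for illustration only.
human_aln = "MSSS-WLL"
other_aln = "MSTSAWLL"
identity = percent_identity(human_aln, other_aln)
print(round(identity, 1))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which gives slightly different percentages for gapped alignments.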
in addition to residues in ace that contact the s-protein directly, various other studies have also considered residues that are in the second shell, or are buried, and could influence binding [ ] . therefore, in our analyses we built on these approaches and extended them to compile the following sets for our study: . direct contact (dc) residues. this includes a total of residues that are involved in direct contact with the s-protein [ ] identified by pdbe [ ] and pdbsum [ ] . . direct contact extended (dcex) residues. this dataset includes residues within Å of dc residues, that are likely to be important for binding. these were selected by detailed manual inspection of the complex, and also considering the following criteria: (i) reported evidence from deep mutagenesis [ ] , (ii) in silico alanine scanning (using mcsm-ppi [ ] ), (iii) residues with high evolutionary conservation patterns identified by the funfam-based protocol described above, i.e. residues identified with dops ≥ and scorecons score ≥ . , (iv) allosteric site prediction (supplementary methods ), and (v) sites under positive selection (supplementary methods ). selected residues are shown in supplementary fig. and residues very close to dc residues (i.e. within Å) are annotated. we also included residues identified by other related structural analyses, reported in the literature (supplementary methods ) . using the ace protein sequence from each species, structural models were generated for the sprotein:ace complex for animals using the funmod modelling pipeline [ , ] (supplementary methods ). funmod searches for structural templates by mapping sequences to a cath funfam and selecting the structure of the closest relative of known structure, to use as a template for homology modelling [ ] . sequences are mapped by scanning them against the cath funfam hmm library using hmmer [ ] . 
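The distance criterion used above to extend the dc residues into the dcex set can be sketched as a simple neighbourhood search over atomic coordinates. This covers only the geometric step: in the study the final set was also shaped by manual inspection and the additional criteria listed above. The function name, residue identifiers, coordinates and the cutoff value below are hypothetical placeholders.

```python
import math


def residues_within(coords, seed_residues, cutoff):
    """Return non-seed residues with any atom within `cutoff` angstroms
    of any atom belonging to a seed residue.

    coords: dict mapping residue id -> list of (x, y, z) atom coordinates
    """
    seed_atoms = [atom for res in seed_residues for atom in coords[res]]
    nearby = set()
    for res, atoms in coords.items():
        if res in seed_residues:
            continue
        if any(math.dist(a, s) <= cutoff for a in atoms for s in seed_atoms):
            nearby.add(res)
    return nearby


# Toy coordinates, for illustration only.
coords = {
    1: [(0.0, 0.0, 0.0)],
    2: [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    3: [(10.0, 0.0, 0.0)],
}
dc = {1}
extended = dc | residues_within(coords, dc, cutoff=5.0)
print(sorted(extended))
```

For full-size structures a spatial index would be faster than this all-against-all loop, but the brute-force version keeps the selection rule easy to read.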
the structural template selected was pdb id m j, a high-resolution crystal structure of sars-cov s-protein:human ace complex. we generated query-template alignments using hh-suite [ ] and predicted d models using modeller v. . [ ] . the 'very_slow' schedule was used for model refinement to optimise the geometry of the complex and interface. for each species, we generated models and selected the model with the lowest ndope [ ] score. only high-quality models were used in this analysis, with ndope score < - and with < dcex residues missing. this gave a final dataset of animals for further analysis. the modelled structures of ace were compared against the human structure (pdb id m j) and pairwise, against each other, using ssap [ ] . ssap measures the similarity between d protein structures by calculating similarity in vector views between aligned residues. a vector view for a given residue is the set of vectors from the cβ atom of that residue to the cβ atom of all other residues in the protein structure. ssap returns a score in the range - , with identical structures scoring [ ] . we also built models for tmprss proteins in all available species and identified the residues likely to be involved in the protein function (see supplementary methods ). we calculated the changes in binding energy of the sars-cov- s-protein:ace complex and the sars-cov s-protein:ace complex of different species, compared to human, following two different protocols: . protocol : using the human complex and mutating the residues for the ace interface to those found in the given animal sequence and then calculating the ΔΔg of the complex using both mcsm-ppi [ ] and mcsm-ppi [ ] (supplementary methods ) . this gave a measure of the destabilisation of the complex in the given animal relative to the human complex. ΔΔg values < are associated with destabilising mutations, whilst values ≥ are associated with stabilising mutations. . 
protocol : we repeated the analysis with both mcsm-ppi and mcsm-ppi as in protocol , but using the animal -dimensional models, instead of the human ace structure, and calculating the ΔΔg of the complex by mutating the animal ace interface residue to the appropriate residue in the human ace structure. this gave a measure of the destabilisation of the complex in the human complex relative to the given animal. values ≤ are associated with destabilisation of the human complex (i.e. animal complexes more stable), whilst values > are associated with stabilisation of the human complex (i.e. animal complexes less stable).

we subsequently correlated ΔΔg values with available in vivo and in vitro experimental data on covid- infection data for mammals. protocol , mcsm-ppi , correlated best with these data.

to measure the degree of chemical change associated with mutations occurring in dc and dcex residues, we computed the grantham score [ ] for each vertebrate compared to the human sequence (supplementary methods ).

we performed phylogenetic analyses for a subset of sars-cov (n = ), sars-like (n = ) and sars-cov- (n = ) viruses from publicly available data in ncbi [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] and gisaid [ , ] (supplementary methods ).

the funmod structural models for the sars-cov- spike-rbd:ace complex and tmprss are available on zenodo at https://zenodo.org/record/ [ ] .

world health organisation. covid- situation report - available from
a pneumonia outbreak associated with a new coronavirus of probable bat origin
isolation and characterization of viruses related to the sars coronavirus from animals in southern china
bats, civets and the emergence of sars
the phylogenetic range of bacterial and viral pathogens of vertebrates
bovine respiratory coronavirus
porcine coronaviruses. emerg transbound anim viruses
discovery, diversity and evolution of novel coronaviruses sampled from rodents in china
discovery of a novel coronavirus, china rattus coronavirus hku , from norway rats supports the murine origin of betacoronavirus and has implications for the ancestor of
isolation and characterization of a novel betacoronavirus subgroup a coronavirus, rabbit coronavirus hku , from domestic rabbits
biologic, antigenic, and full-length genomic characterization of a bovine-like coronavirus isolated from a giraffe
severe acute respiratory syndrome) [internet]. who. world health organization
sars-associated coronavirus transmitted from human to pig. emerg infect dis
sars-cov- neutralizing serum antibodies in cats: a serological investigation
from people to panthera : natural sars-cov- infection in tigers and lions at the bronx zoo
disease and diplomacy: gisaid's innovative contribution to global health: data, disease and diplomacy. glob chall
global initiative on sharing all influenza data -from vision to reality
susceptibility of ferrets, cats, dogs, and different domestic animals to sars-coronavirus- [internet]. microbiology
comparison of sars-cov- infections among species of non-human primates
functional and genetic analysis of viral receptor ace orthologs reveals broad potential host range of sars-cov- . biorxiv
potential host range of multiple sars-like coronaviruses and an improved ace -fc variant that is potent against both sars-cov- and sars-cov- [internet]. microbiology
broad and differential animal ace receptor usage by sars-cov- . biorxiv
angiotensin-converting enzyme is a functional receptor for the sars coronavirus
structure of the sars-cov- spike receptor-binding domain bound to the ace receptor
structural basis of receptor recognition by sars-cov-
structure of sars coronavirus spike receptor-binding domain complexed with receptor
exceptional diversity and selection pressure on sars-cov and sars-cov- host receptor in bats compared to other mammals. biorxiv
covid- : epidemiology, evolution, and cross-disciplinary perspectives
the sars-cov- exerts a distinctive strategy for interacting with the ace human receptor. viruses
the sequence of human ace is suboptimal for binding the s spike protein of sars coronavirus . biorxiv
deep mutational scanning of sars-cov- receptor binding domain reveals constraints on folding and ace binding
cryo-em structure of the -ncov spike in the prefusion conformation
emergence of rbd mutations in circulating sars-cov- strains enhancing the structural stability and human ace receptor affinity of the spike protein
human ace receptor polymorphisms predict sars-cov- susceptibility
sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is enriched in specific cell subsets across tissues. social science research network
structural basis of sars-cov- spike protein priming by tmprss . biorxiv
tmprss and adam cleave ace differentially and only proteolysis by tmprss augments entry driven by the severe acute respiratory syndrome coronavirus spike protein
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
ace and tmprss variants and expression as candidates to sex and country differences in covid- severity in italy.
medrxiv a sars-cov- protein interaction map reveals targets for drug repurposing comparative ace variation and primate covid- risk broad host range of sars-cov- predicted by comparative and structural analysis of ace in vertebrates insights on crossspecies transmission of sars-cov- from structural modeling evidence of significant natural selection in the evolution of sars-cov- in bats, not humans gene d: expanding the utility of domain assignments an overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences the haddock . web server: user-friendly integrative modeling of biomolecular complexes composition and divergence of coronavirus spike proteins and host ace receptors predict potential intermediate hosts of sars-cov- susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus infection and rapid transmission of sars-cov- in ferrets infection with novel coronavirus (sars-cov- ) causes pneumonia in the rhesus macaques tissue distribution of ace protein, the functional receptor for sars coronavirus. 
a first step in understanding sars pathogenesis functional classification of cath superfamilies: a domain-based approach for protein function annotation evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic isolation of sars-cov- -related coronavirus from malayan pangolins probable pangolin origin of sars-cov- associated with the covid- outbreak absence of sars-cov- infection in cats and dogs in close contact with a cluster of covid- patients in a veterinary campus sars-cov- spike protein favors ace from bovidae and cricetidae analysis of the mutation dynamics of sars-cov- reveals the spread history and emergence of rbd mutant with lower ace binding affinity first reported cases of sars-cov- infection in companion animals infection of dogs with sars-cov- sars-cov- infection in farmed minks, the netherlands computational design of ace -based peptide inhibitors of sars-cov- comparative protein structure modeling using modeller mcsm-ppi : predicting the effects of mutations on protein-protein interactions rapid adaptation of sars-cov- in balb/c mice: novel mouse model for vaccine efficacy host range of sars-cov- and implications for public health ensembl comparative genomics resources pdbe: improved findability of macromolecular structure data in the pdb pdbsum: summaries and analyses of pdb structures mcsm: predicting the effects of mutations in proteins using graph-based signatures challenges in homology search: hmmer and convergent evolution of coiled-coil regions hh-suite for fast remote homology detection and deep protein annotation statistical potential for assessment and prediction of protein structures ssap: sequential structure alignment program for protein structure comparison amino acid difference formula to help explain protein evolution characterization of severe acute respiratory syndrome coronavirus genomes in taiwan: molecular epidemiology and genome evolution genomic characterisation of the severe acute 
respiratory syndrome coronavirus of amoy gardens outbreak in hong kong discovery of a rich gene pool of bat sarsrelated coronaviruses provides new insights into the origin of sars coronavirus isolation and characterization of a bat sars-like coronavirus that uses the ace receptor bats are natural reservoirs of sars-like coronaviruses ecoepidemiology and complete genome comparison of different strains of severe acute respiratory syndrome-related rhinolophus bat coronavirus in china reveal bats as a reservoir for acute, self-limiting infection that allows recombination events severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats genomic characterization and infectivity of a novel sars-like coronavirus in chinese bats supplementary structural models (sars-cov- spike-rbd:ace complex and tmprss ) -sars-cov- spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals we thank gal horesh, caitlin lee carpenter and mohd firdaus raih for insightful discussions; alan hunns for help in making figures; and laurel woodridge, sean le cornu and declan torin cook for comments on the manuscript. we would also like to thank francois balloux, whose team member, lucy van dorp, contributed the phylogenetic analysis of sars-like viruses. key: cord- -ezrkg dc authors: myerson, jacob w.; patel, priyal n.; habibi, nahal; walsh, landis r.; lee, yi-wei; luther, david c.; ferguson, laura t.; zaleski, michael h.; zamora, marco e.; marcos-contreras, oscar a.; glassman, patrick m.; johnston, ian; hood, elizabeth d.; shuvaeva, tea; gregory, jason v.; kiseleva, raisa y.; nong, jia; rubey, kathryn m.; greineder, colin f.; mitragotri, samir; worthen, george s.; rotello, vincent m.; lahann, joerg; muzykantov, vladimir r.; brenner, jacob s. title: supramolecular organization predicts protein nanoparticle delivery to neutrophils for acute lung inflammation diagnosis and treatment date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ezrkg dc acute lung inflammation has severe morbidity, as seen in covid- patients. lung inflammation is accompanied or led by massive accumulation of neutrophils in pulmonary capillaries (“margination”). we sought to identify nanostructural properties that predispose nanoparticles to accumulate in pulmonary marginated neutrophils, and therefore to target severely inflamed lungs. we designed a library of nanoparticles and conducted an in vivo screen of biodistributions in naive mice and mice treated with lipopolysaccharides. we found that supramolecular organization of protein in nanoparticles predicts uptake in inflamed lungs. specifically, nanoparticles with agglutinated protein (naps) efficiently home to pulmonary neutrophils, while protein nanoparticles with symmetric structure (e.g. viral capsids) are ignored by pulmonary neutrophils. we validated this finding by engineering protein-conjugated liposomes that recapitulate nap targeting to neutrophils in inflamed lungs. we show that naps can diagnose acute lung injury in spect imaging and that nap-like liposomes can mitigate neutrophil extravasation and pulmonary edema arising in lung inflammation. finally, we demonstrate that ischemic ex vivo human lungs selectively take up naps, illustrating translational potential. this work demonstrates that structure-dependent interactions with neutrophils can dramatically alter the biodistribution of nanoparticles, and naps have significant potential in detecting and treating respiratory conditions arising from injury or infections. the covid- pandemic tragically illustrates the dangers of acute inflammation and infection of the lungs, for both individuals and societies. acute alveolar inflammation causes the clinical syndrome known as acute respiratory distress syndrome (ards), in which inflammation prevents the lungs from oxygenating the blood. 
severe ards is the cause of most covid- deaths and was a major cause of death in the influenza epidemic, but ards is common even outside of epidemics, affecting ~ , american patients per year with a ~ - % mortality rate. - ards is caused not just by viral infections, but also by sepsis, pneumonia (viral and bacterial), aspiration, and trauma. , largely because ards patients have poor tolerance of drug side effects, no pharmacological strategy has succeeded as an ards treatment. , [ ] [ ] [ ] therefore, there is an urgent need to develop drug delivery strategies that specifically target inflamed alveoli in ards and minimize systemic side effects. neutrophils are "first responder" cells in acute inflammation, rapidly adhering and activating in large numbers in inflamed vessels and forming populations of "marginated" neutrophils along the vascular lumen. [ ] [ ] [ ] [ ] [ ] [ ] [ ] neutrophils can be activated by a variety of initiating factors, including pathogen- and damage-associated molecular patterns such as bacterial lipopolysaccharides (lps). , after acute inflammatory insults, neutrophils marginate in most organs, but by far most avidly in the lung capillaries. , , , , neutrophils are therefore a key cell type in most forms of ards. in ards, marginated neutrophils can secrete tissue-damaging substances (proteases, reactive oxygen species) and extravasate into the alveoli, leading to disruption of the endothelial barrier and accumulation of neutrophils and edematous fluid in the air space of the lungs (figure a). , , , [ ] [ ] [ ] targeted nanoparticle delivery to marginated neutrophils could provide an ards treatment with minimal side effects, but specific delivery to marginated neutrophils remains an open challenge. antibodies against markers such as ly g have achieved targeting to neutrophils in mice, but also deplete populations of circulating neutrophils. 
[ ] [ ] [ ] [ ] additionally, while ly g readily marks neutrophils in mice, there is no analogous specific and ubiquitous marker on human neutrophils. therefore, antibody targeting strategies have not been widely adopted for targeted drug delivery to these cells. as another route to neutrophil targeting, two previous studies noted that activated neutrophils take up denatured and agglutinated bovine albumin, concluding that denatured protein was critical in neutrophil-particle interactions. , nanoparticle structural properties such as shape, size, and deformability can define unique targeting behaviors. [ ] [ ] [ ] [ ] [ ] here, we screened a diverse panel of nanoparticles to determine the nanostructural properties that predict uptake in pulmonary marginated neutrophils during acute inflammation. as a high-throughput animal model for ards, we administered lps to mice, causing a massive increase in pulmonary marginated neutrophils. we show that two initial leads in our screen, lysozyme-dextran nanogels (ldngs) and crosslinked albumin nanoparticles (anps), selectively home to marginated neutrophils in inflamed lungs, but not naïve lungs. in our subsequent screen of over diverse nanoparticles, we find that protein nanoparticles, all defined by agglutination of protein in amorphous nanostructures (nanoparticles with agglutinated proteins, naps), but not by denatured protein, have specificity for lps-inflamed lungs. in contrast to naps, we demonstrate that three symmetric protein nanostructures (viruses/nanocages) have biodistributions unaffected by lps injury. we show that polystyrene nanoparticles and five liposome formulations do not accumulate in injured lungs, indicating that nanostructures that are not based on protein are not intrinsically drawn to marginated neutrophils in acute inflammation. 
we then engineered liposomes (the most clinically relevant nanoparticle drug carriers) as naps, through conjugation to protein modified with hydrophobic cyclooctynes, encouraging protein agglutination on the liposome surface by hydrophobic interactions. we thus show that supramolecular organization of proteins, rather than chemical composition, best predicts uptake in marginated neutrophils in acutely inflamed lungs. we then demonstrate proof of concept for naps as diagnostic and therapeutic tools for ards. we show that: a) in-labeled naps provide diagnostic imaging contrast that distinguishes inflammatory lung injury from cardiogenic pulmonary edema; b) nap-liposomes can significantly ameliorate edema in a mouse model of severe ards; c) naps, but not crystalline protein nanostructures, accumulate in ex vivo human lungs rejected for transplant due to ards-like conditions. collectively, our results demonstrate that supramolecular organization of protein, namely protein agglutination, predicts strong, intrinsic nanoparticle tropism for marginated neutrophils. this finding indicates that naps, encompassing a wide range of nanoparticles based on or incorporating protein, have biodistributions that are responsive to inflammation. naps could be useful beyond ards, since marginated neutrophils play a pathogenic role in a diverse array of inflammatory diseases, including infections, heart attack, and stroke. [ ] [ ] [ ] [ ] [ ] most immediately, however, our findings provide a clear path forward for using naps to improve diagnosis and treatment of ards. to quantify the increase in pulmonary marginated neutrophils after inflammatory lung injury, radiolabeled clone a anti-ly g antibody (specific for mouse neutrophils) was administered intravenously (iv) to determine the location and concentration of neutrophils in mice. iv injection of lps was used to model mild ards in mice. accumulation of anti-ly g antibody in the lungs was dramatically affected by iv lps, with . 
% of injected antibody adhering in lps-injured lungs, compared to . % of injected antibody in naïve control lungs (figure b). in agreement with previous studies addressing the role of neutrophils in systemic inflammation, biodistributions of anti-ly g antibody indicated that systemic lps injury profoundly increased the concentration of neutrophils in the lungs. , , , single cell suspensions prepared from mouse lungs were probed by flow cytometry to further characterize pulmonary neutrophils in naïve mice and in mice following lps-induced inflammation. to identify intravascular populations of leukocytes, mice received iv fluorescent cd antibody five minutes prior to sacrifice. single cell suspensions prepared from iv cd -stained lungs were then stained with anti-ly g antibody to identify neutrophils. a second stain of single cell suspensions with cd antibody indicated the total population of leukocytes in the lungs, distinct from the intravascular population indicated by iv cd . flow cytometry showed greater concentrations of neutrophils in lps-injured lungs, compared to naïve lungs (figure c, counts above horizontal threshold indicate positive staining for neutrophils; figure d, rightmost peak indicates positive staining for neutrophils). comparison of ly g stain to total cd -positive cells indicated that . % of leukocytes in the lungs were ly g-positive after lps injury, compared to . % in the naïve control (figure d, center panel). comparison of ly g stain to iv cd stain indicated that the majority of neutrophils were intravascular in both naïve and lps-injured mice. in naïve mice, . % of neutrophils were intravascular and in lps-injured mice, . % of neutrophils were intravascular (figure d, right panel). the presence of large populations of intravascular neutrophils following inflammatory injury is consistent with previously published observations. 
, , , , histological analysis confirmed results obtained with flow cytometry and radiolabeled anti-ly g biodistributions. staining of lung sections indicated increased concentration of neutrophils in the lungs following iv lps injury ( figure e , left panels). co-registration of neutrophil staining with tissue autofluorescence (indicating tissue architecture) broadly supported the finding that pulmonary neutrophils reside in the vasculature ( figure e , right panels). previous work has traced the neutrophil response to bacteria in the lungs, determining that pulmonary neutrophils pursue and engulf active bacteria following either intravenous infection or infection of the airspace in the lungs. , , we injected heat-inactivated, oxidized, and fixed e. coli in naïve and iv-lps-injured mice. with the bacteria stripped of their functional behavior, e. coli did not accumulate in the lungs of naïve control mice ( . % of initial dose in the lungs, blue bars in figure f ). however, pre-treatment with lps to recapitulate the inflammatory response to infection led to enhanced accumulation of the deactivated e. coli in the lungs ( . % of initial dose in the lungs, red bars in figure f ). with e. coli structure maintained but e. coli function removed, the inactivated bacteria were taken up more avidly in lungs primed by an inflammatory injury. in order to identify nanostructural parameters that correlate with nanoparticle uptake in inflamed lungs, we conducted an in vivo screen of a diverse array of nanoparticle drug carriers. the screen was based on the method used above for tracing inactivated bacteria: inject radiolabeled nanoparticles into mice and measure biodistributions, comparing naïve with iv-lps mice. to validate that the radiotracing screen would measure uptake in pulmonary marginated neutrophils, we more fully characterized the in vivo behavior of two early hits in the screen. 
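the radiotracing readout described above reduces to a few ratios per organ. the following is a minimal python sketch of that arithmetic, not the authors' analysis code: the `OrganSample` type, the function names, and the assumption that gamma counts are already decay- and background-corrected are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class OrganSample:
    """Hypothetical container for one harvested organ."""
    counts: float    # gamma counts in the organ (assumed decay/background corrected)
    weight_g: float  # wet organ weight in grams


def percent_injected_dose(sample: OrganSample, injected_counts: float) -> float:
    """Percent of the injected dose recovered in the organ (%ID)."""
    return 100.0 * sample.counts / injected_counts


def percent_id_per_gram(sample: OrganSample, injected_counts: float) -> float:
    """Percent injected dose normalized to organ weight (%ID/g)."""
    return percent_injected_dose(sample, injected_counts) / sample.weight_g


def localization_ratio(target: OrganSample, reference: OrganSample,
                       injected_counts: float) -> float:
    """Ratio of %ID/g in a target organ to a reference organ, e.g. lungs:liver."""
    return (percent_id_per_gram(target, injected_counts)
            / percent_id_per_gram(reference, injected_counts))


def fold_change(injured_value: float, naive_value: float) -> float:
    """Fold change of any metric between injured and naive animals."""
    return injured_value / naive_value
```

in use, one `OrganSample` per organ per animal would be built from the gamma counter output, and `fold_change` applied to group means of `percent_id_per_gram` for naïve versus iv-lps mice.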
lysozyme-dextran nanogels (ldngs, ngs) and poly(ethylene)glycol (peg)crosslinked albumin nps have been characterized as targeted drug delivery agents in previous work. [ ] [ ] [ ] here, ldngs ( . ± . nm diameter, . ± . pdi, supplementary figure a ) and peg-crosslinked human albumin nps ( . ± . nm diameter, . ± . pdi, supplementary figure b) were administered in naïve and iv-lps-injured mice. neither np was functionalized with antibodies or other affinity tags. the protein component of each particle was labeled with i for tracing in biodistributions, and assessed minutes after iv administration of nps. both absolute ldng lung uptake and ratio of lung uptake to liver uptake registered a ~ -fold increase between naïve control and lps-injured animals (figure a , supplementary table ) . specificity for lps-injured lungs was recapitulated with peg-crosslinked human albumin nps. albumin nps accumulated in naïve lungs at . % injected dose per gram organ weight (%id/g), and in lps-injured lungs at . %id/g, accounting for a -fold increase in lung uptake after intravenous lps insult ( figure b , supplementary table ) . single cell suspensions were prepared from lungs after administration of fluorescent ldngs or peg-crosslinked albumin nps. flow cytometric analysis of cells prepared from lungs after np administration enabled identification of cell types with which nps associated. firstly, the total number of cells containing ldngs or albumin nps increased between naïve and lps-injured lungs. in naïve control lungs, . ly g stain for neutrophils indicated that the bulk of ldng and albumin np accumulation in lps-injured lungs could be accounted for by uptake in neutrophils. in figure c and d, counts above the horizontal threshold indicate neutrophils and counts to the right of the vertical threshold indicate cells containing ldngs ( figure c ) or albumin nps ( figure d ). 
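the two-threshold reading of the flow cytometry plots described above can be sketched as a simple quadrant gate. this is an illustrative python sketch only: the event representation (pairs of neutrophil-marker signal and nanoparticle signal), the threshold values, and the function names are assumptions, and real gating would be done on compensated data in dedicated cytometry software.

```python
def gate_quadrants(events, marker_threshold, np_threshold):
    """Count events in each quadrant.

    events: iterable of (marker_signal, np_signal) pairs, one per cell.
    A cell is marker-positive (neutrophil) above marker_threshold and
    nanoparticle-positive to the right of np_threshold.
    """
    counts = {"marker+np+": 0, "marker+np-": 0, "marker-np+": 0, "marker-np-": 0}
    for marker_signal, np_signal in events:
        marker_label = "marker+" if marker_signal > marker_threshold else "marker-"
        np_label = "np+" if np_signal > np_threshold else "np-"
        counts[marker_label + np_label] += 1
    return counts


def summarize(counts):
    """Derive the two cell-count fractions discussed in the text."""
    neutrophils = counts["marker+np+"] + counts["marker+np-"]
    np_positive = counts["marker+np+"] + counts["marker-np+"]
    return {
        # share of neutrophils that contain nanoparticles
        "neutrophils_np_positive": (
            counts["marker+np+"] / neutrophils if neutrophils else 0.0),
        # share of nanoparticle-positive cells that are neutrophils
        "np_positive_cells_that_are_neutrophils": (
            counts["marker+np+"] / np_positive if np_positive else 0.0),
    }
```

the upper right quadrant (`marker+np+`) corresponds to nanoparticle-positive neutrophils; note that the second fraction counts cells, not nanoparticle signal, so it only approximates the share of total uptake.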
in iv-lps-injured lungs, ldng and albumin np uptake was dominated by neutrophils (figure c, figure d, upper right quadrants indicate np-positive neutrophils). in lps-injured lungs, the majority of neutrophils, > % of cells, contained significant quantities of nanoparticles, compared to < % in naïve lungs. likewise, the majority of nanoparticle uptake in the lungs (> %) was accounted for by nanoparticle uptake in neutrophils (figure e, f, g, h, supplementary table ). for np uptake not accounted for by neutrophils, cd staining indicated that the remaining np uptake was attributable to other leukocytes. co-localization of albumin np fluorescence with cd stain showed that . % of albumin np uptake was localized to leukocytes in naïve lungs and . % of albumin np uptake was localized to leukocytes in injured lungs (supplementary figure c, supplementary figure d). for ldngs, localization to neutrophils in injured lungs was confirmed via histology. ly g staining of lps-injured lung sections confirmed colocalization of fluorescent nanogels with neutrophils in the lung vasculature (figure i). slices in confocal images of lung sections indicated that ldngs were inside neutrophils (figure j). intravital imaging of injured lungs allowed real-time visualization of ldng uptake in leukocytes. ldng fluorescent signal accumulated over minutes and reliably colocalized with cd staining for leukocytes (figure k, supplementary movie ). ldng pharmacokinetics were evaluated in naïve and iv-lps-injured mice (supplementary figure ). in both naïve and injured mice, bare ldngs were rapidly cleared from the blood with a distribution half-life of ~ minutes. in naïve mice, transient retention of ldngs in the lungs ( . %id/g at five minutes after injection) leveled off over one hour. 
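a half-life such as the one quoted above can be estimated from a blood concentration time course. the python sketch below assumes a single exponential phase over the early time points, which simplifies the usual multi-phase pharmacokinetic models; the fitting method used in the study is not stated here, and the function name and inputs are hypothetical.

```python
import math


def half_life_from_timecourse(times_min, concentrations):
    """Estimate a half-life (minutes) by least-squares fit of ln(concentration) vs time.

    Assumes mono-exponential decay, c(t) = c0 * exp(-k * t), over the supplied
    points, so that half-life = ln(2) / k.
    """
    if len(times_min) != len(concentrations) or len(times_min) < 2:
        raise ValueError("need at least two matched (time, concentration) points")
    logs = [math.log(c) for c in concentrations]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx  # equals -k for a decaying signal
    if slope >= 0:
        raise ValueError("concentration does not decay over the supplied points")
    return math.log(2) / -slope
```

restricting the fit to the first few minutes after injection would approximate the distribution phase; later points would be dominated by slower elimination.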
in iv-lps-treated mice, ldng concentration in the lungs reached a peak value at minutes after injection, as measured either by absolute levels of lung uptake or by lungs:blood localization ratio. ldng biodistributions were also assessed in mice undergoing alternative forms of lps-induced inflammation. intratracheal (it) instillation of lps led to concentration of ldngs in the lungs at . %id/g. liver and spleen ldng uptake was also reduced following it lps injury, leading to a -fold increase in the lungs:liver ldng localization ratio induced by it lps injury (supplementary figure ) . as with iv lps injury, it lps administration leads to neutrophil-mediated vascular injury focused in the lungs. mice were administered lps via footpad injection to provide a model of systemic inflammation originating in lymphatic drainage. ldng uptake in the lungs and in the legs was enhanced by footpad lps administration. at hours after footpad lps administration, ldngs concentrated in the lungs at . %id/g, an -fold increase over naïve. at hours, ldngs concentrated in the lungs at . %id/g (supplementary figure a) . total ldng accumulation in the legs accounted for . % of initial dose (%id) in naïve mice, . %id in mice hours after footpad lps injection, and . %id at hours after footpad injection (supplementary figure b) , indicating ldngs can concentrate in inflamed vasculature outside the lungs. previous work has indicated that nps based on denatured albumin accumulate in neutrophils in inflamed lungs and at sites of acute vascular injury, whereas nps coated with native albumin do not. , we have characterized lysozyme-dextran nanogels and crosslinked human albumin nps with circular dichroism (cd) spectroscopy to compare secondary structure of proteins in the nps to secondary structure of the native component proteins (supplementary figure a-b) . identical cd spectra were recorded for ldngs vs. lysozyme and for albumin nps vs. human albumin. 
deconvolution of the cd spectra via a neural network algorithm trained against a library of cd spectra for known structures verified that the secondary structure composition of lysozyme and albumin was unchanged by incorporation of the proteins in the nps. free protein and protein nps were also probed with -anilino- naphthalenesulfonic acid (ansa), previously established as a tool for determining the extent to which hydrophobic domains are exposed on proteins. consistent with known structures of the two proteins, ansa staining indicated few available hydrophobic domains on lysozyme and substantial hydrophobic exposure on albumin (supplementary figure c -d, blue curves). ldngs had increased hydrophobic accessibility vs. native lysozyme, whereas albumin nps had reduced hydrophobic accessibility compared with native albumin. therefore, our data indicate that lysozyme and albumin are not denatured in ldngs and albumin nps, but the nps composed of the two proteins present a balance of hydrophobic and hydrophilic surfaces differing from the native proteins. the previous section demonstrates that two different nanoparticles based on protein, shown not to be denatured in cd spectroscopy studies, have uptake in lps-inflamed lungs driven by uptake in marginated neutrophils. we next undertook a broader study considering how aspects of np structure including size, composition, surface chemistry, and structural organization impact np uptake in lps-injured lungs. as examples of different types of protein nps, variants of ldngs (representing nps based on hydrophobic interactions between proteins), crosslinked protein nps, and nps based on electrostatic interactions between proteins were traced in naïve control and iv-lps-injured mice. as examples of nps based on site-specific protein interactions (rather than site-indiscriminate interactions leading to crosslinking, gelation, or charge-based protein nps), we also traced viruses and ferritin nanocages in naïve and lps-treated mice. 
liposomes and polystyrene nps were studied as examples of lipid and polymeric nanostructures. nanoparticles based on hydrophobic protein interactions ldng size was varied by modifying lysozyme-dextran composition of the nps and ph at which particles were formed. figure , all sizes of ldngs accumulated in lps-injured lungs at higher concentrations than in naïve lungs, with accumulation in injured lungs reaching ~ % of initial dose for all types of ldngs (supplementary table ). variations in size and composition of ldngs therefore did not affect ldng specificity for lps-injured lungs. expanding on data with peg-nhs ester-crosslinked human serum albumin particles, we varied the geometry and protein composition of nps based on peg-nhs protein crosslinking. human serum albumin nanorods (aspect ratio : ), bovine serum albumin nps ( . table ). lysozyme nps accumulated in naïve lungs at a uniquely high concentration of . %id/g, compared to . %id/g in inflamed lungs. degree of uptake in injured lungs, along with injured vs. naïve contrast, did vary with protein np composition. however, acute inflammatory injury resulted in a minimum three-fold increase in lung uptake for all examined crosslinked protein nps, excluding crosslinked lysozyme, which still accumulated in injured lungs at a high concentration ( . % of initial dose). we traced recently-developed poly(glutamate) tagged green fluorescent protein (e-gfp) nps, representing a third class of protein np based on electrostatic interactions between proteins and carrier polymer or metallic particles. negatively-charged e-gfp was paired to arginine-presenting gold nanoparticles ( . ± . nm diameter, pdi . ± . ) or to poly(oxanorborneneimide) (poni) functionalized with guanidino and tyrosyl side chains ( . ± . nm diameter, pdi . ± . ) (supplementary figure d) . 
for biodistribution experiments with poni/e-gfp hybrid nps, tyrosine-bearing poni was labeled with i and e-gfp was labeled with i, allowing simultaneous tracing of each component of the hybrid nps. the two e-gfp nps, with structure based on charge interactions, had specificity for iv lps-injured lungs. comparing uptake in lps-injured lungs to naïve lungs, we observed an lps:naïve ratio of . for poni/e-gfp nps as traced by the poni component, . for poni/e-gfp nps as traced by the e-gfp component, and . for au/e-gfp nps (figure c, supplementary figure ). poni/e-gfp particles, specifically, accumulated in lps-injured lungs at . % initial dose as measured by poni tracing and . % initial dose as measured by gfp tracing, indicating effective co-delivery in the inflamed organ. acute inflammatory injury therefore resulted in a two- to three-fold increase in pulmonary uptake of nps constructed via electrostatic protein interactions. nanoparticles based on symmetric protein organization. adeno-associated virus (aav), adenovirus, and horse spleen ferritin nanocages were employed as examples of protein-based nps with highly symmetrical structure (see supplementary figure d for dls confirmation of structure). [ ] [ ] [ ] for each of these highly ordered protein nps, iv lps injury had no significant effect on biodistribution and levels of uptake in the injured lungs were minimal (figure d, table ). therefore, highly ordered protein nps traced in our studies did not have tropism for the lungs after acute inflammatory injury. liposomes and polystyrene nps were studied as example nps that are not structurally based on proteins. dota chelate-containing lipids were incorporated into bare liposomes, allowing labeling with in tracer for biodistribution studies. carboxylate polystyrene nps were coupled to trace amounts of i-labeled igg via edci-mediated carboxy-amine coupling. liposomes had a diameter of . ± . nm (pdi . ± . ) and igg-polystyrene nps had a diameter of . ± . nm (pdi . ± . 
) (supplementary figure c -d). liposomes accumulated in inflamed lungs at a concentration of . %id/g, representing no significant change relative to naïve lungs. lps injury actually induced a reduction in the lungs:liver metric, from . for naïve mice to . for lps-injured mice. polystyrene nps accumulated in inflamed lungs at . %id/g ( . % initial dose), so iv lps injury did in fact induce increased levels of np uptake in the lungs, from a concentration of . %id/g in the naïve lungs (figure e, supplementary figure ). however, neither bare liposomes nor polystyrene nps were drawn to lps-injured lungs in significant concentrations. notably, isolated proteins themselves did not home to lps-inflamed lungs. we traced radiolabeled albumin, lysozyme, and transferrin in naïve control and iv lps-injured mice (supplementary figure , supplementary table ). in injured mice, albumin, lysozyme, and transferrin localized to the lungs at low concentrations and no significant differences were recorded when comparing naïve to lps-injured lung uptake. the data presented in figure and supplementary figures - indicate that a variety of protein-based nanostructures have tropism for acute inflammatory injury in the lungs. nps based on agglutination of proteins in non-site-specific interactions (naps, figure a -c, supplementary figures - ) all exhibited either significant increases in lung uptake after lps injury or high levels of lung uptake in both naïve control and lps-injured animals. nanostructures based on highly symmetrical protein organization had no specific tropism for inflamed lungs (figure d). representative nanostructures not based on proteins, bare liposomes and polystyrene beads, did not home to inflamed lungs (figure e). we next engineered naps from liposomes, a nanoparticle shown above to have no intrinsic neutrophil tropism. 
our methods for engineering nap-like liposomes serve to validate the finding that supramolecular organization of protein in nanoparticles predicts neutrophil tropism. liposomes were functionalized with rat igg conjugated via sata-maleimide chemistry (sata-igg liposomes) or via recently demonstrated copper-free click chemistry methods. briefly, click chemistry methods entailed nhs-ester conjugation of an excess of strained alkyne (dibenzocyclooctyne, dbco) to igg, followed by reaction of the dbco-functionalized igg with liposomes containing peg-azide-terminated lipids (dbco-igg liposomes, figure a ). dbco-igg liposomes had a diameter of . ± . nm and a pdi of . ± . and sata-igg liposomes had a diameter of . ± . nm and a pdi of . ± . (supplementary figure c) . in mice subjected to iv-lps, sata-igg liposomes accumulated in the lungs at a concentration of . %id/g ( figure b , yellow bars). dbco-igg liposomes, by contrast, concentrated in the lungs at . %id/g, corresponding to . % of initial dose and roughly matching the accumulation of nm ldngs in the inflamed lungs ( figure b , brown bars). for comparison, bare liposomes, as in figure e , concentrated in the inflamed lungs at . %id/g ( figure b , green bars). for dbco-igg liposomes, the inflamed vs. naïve lung uptake accounted for a twelve-fold change. dbco-igg liposomes specifically accumulated in injured lungs, whereas sata-igg liposomes and bare liposomes did not (supplementary figure , supplementary table ) . it lps instillation also led to elevated concentrations of dbco-igg liposomes in the lungs. biodistributions of the dbco-igg liposomes indicated a pulmonary concentration of . %id/g at hour after it lps, . %id/g at hours after it lps, and . %id/g at hours after it lps (supplementary figure ). even at early time points after direct pulmonary lps insult, dbco-igg liposomes accumulated in the inflamed lungs. 
results in figure b were obtained by introducing a -fold molar excess of nhs-ester-dbco to rat igg before dbco-igg conjugation to liposomes (dbco( x)-igg liposomes). optical density quantification of dbco indicated ~ dbco per igg following reaction of dbco and igg at : molar ratio (supplementary figure ) . to test the hypothesis that dbco functions as a tag that modifies dbco-igg liposomes for neutrophil affinity in settings of inflammation, we varied the concentration of dbco on igg prepared for conjugation to azide liposomes. dbco was added to igg at -fold, five-fold, and . -fold molar excesses. a -fold molar excess resulted in ~ dbco per igg, a -fold molar excess resulted in ~ dbco per igg, and a . -fold molar excess resulted in ~ dbco per igg (supplementary figure ) . igg with different dbco loading concentrations was conjugated to azide liposomes. dbco-igg liposomes had similar sizes across all dbco concentrations (supplementary figure c) , with diameters of ~ nm and pdis < . . the different types of dbco-igg liposomes were each traced in iv-lps injured mice. titrating the quantity of dbco on dbco-igg liposomes indicated that liposome accumulation in the lungs of injured mice was dependent on dbco concentration on the liposome surface. concentration of dbco-igg liposomes in inflamed lungs attenuated with decreasing dbco concentration on igg (supplementary table , figure c ). therefore, only igg with high concentrations of dbco served as a tag for modifying the surface of liposomes for specificity to pulmonary injury. flow cytometry verified the specificity of dbco-igg liposomes for neutrophils in injured lungs ( figure d -e). as with ldngs and albumin nps in figure c -h, single cell suspensions were prepared from lps-inflamed and naïve control lungs after circulation of fluorescent dbco-igg liposomes. confirming the results of biodistribution studies, . % of cells were liposome-positive in naïve lungs, compared to . 
% of all cells in lps-inflamed lungs (supplementary figure a-b) . dbco-igg liposomes predominantly accumulated in pulmonary neutrophils after iv lps. there were more neutrophils in the injured lungs and a greater fraction of neutrophils took up dbco-igg liposomes in the injured lungs, as compared to the naïve control ( figure d -e). approximately one half of neutrophils in iv lps-injured lungs contained liposomes. dbco-igg liposomes were also highly specific for neutrophils in inflamed lungs, with ~ % of liposome-positive cells in the injured lungs being neutrophils (supplementary table ). the remaining dbco-igg liposome uptake in the lungs was accounted for by other cd -positive cells (supplementary figure c -e). . % of liposome uptake colocalized with cd -positive cells in lps-injured lungs and . % of liposome uptake in the naïve lungs was associated with cd -positive cells. accordingly, less than % of liposome uptake was associated with endothelial cells (supplementary figure f -g). dbco( x)-igg itself did not have specificity for inflamed lungs (supplementary figure ). uptake of dbco( x)-igg in naïve and injured lungs was statistically identical and the biodistribution of the modified igg resembled published results with unmodified igg. these results verify that dbco-igg modifies the structure of immunoliposomes, but does not function as a standard affinity tag by acting as a surface motif with intrinsic affinity for neutrophils. indeed, cd spectroscopic and ansa structural characterization of dbcomodified igg and dbco-igg liposomes resembled results obtained for ldngs and crosslinked albumin nps. igg secondary structure, as assessed by cd spectroscopy, was unchanged by dbco modification (supplementary figure a) . deconvolution of cd spectra via neural network algorithm indicated identical structural compositions for dbco( x)-igg, dbco( x)-igg, dbco( x)-igg, dbco( . x)-igg, and unmodified igg, showing that igg was not denatured by conjugation to dbco. 
ansa was used to probe accessible hydrophobic domains on dbco( x)-igg and dbco( x)-igg liposomes (supplementary figure b). ansa fluorescence indicated more hydrophobic domains available on dbco( x)-igg liposomes than on dbco( x)-igg itself, resembling results for lysozyme and ldngs. therefore, addition of a hydrophobic moiety to protein on the surface of liposomes led to uptake of the liposomes in pulmonary marginated neutrophils after inflammatory insult. this result indicates that hydrophobic interactions between proteins on the surface of functionalized liposomes, like the protein interactions in naps, predict liposome tropism for marginated neutrophils in inflamed lungs. including nps from our four classes of protein-based nps, two non-protein nps (bare liposomes and polystyrene nps), and five types of igg-coated liposomes, we traced nanoparticles in naïve and inflamed mice. direct assessment of naïve-to-inflamed shifts in lung uptake led us to identify naps with specificity for inflamed lungs. to verify this assessment and derive additional patterns in the broader data set, we undertook linear discriminant and principal components analyses of the biodistribution data for our nanoparticles, along with three isolated proteins. grouping the nanoparticles and three proteins according to the classes defined in figure and supplementary figures - , we completed a linear discriminant analysis of the naïve-to-inflamed shift for particle retention in the lungs, blood, liver, and spleen (supplementary figure a). data for particle uptake in each organ were normalized by subtracting and then dividing by the mean uptake over all particles. the first two eigenvectors, dominated by splenic uptake and a combination of liver and lung uptake, respectively, accounted for % of variation in the data.
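the normalization step described above (subtract the mean uptake over all particles, then divide by that mean, organ by organ) can be sketched as follows. this is a minimal illustration, assuming the data are arranged with one row per particle and one column per organ; the array layout, function name, and example values are assumptions made for the sketch, not details taken from the study.

```python
import numpy as np


def normalize_by_mean(uptake):
    """Return (x - mean) / mean for each organ column.

    uptake: 2D array-like, rows = particles, columns = organs
    (for example lungs, blood, liver, spleen).
    """
    uptake = np.asarray(uptake, dtype=float)
    organ_means = uptake.mean(axis=0)  # mean over all particles, per organ
    if np.any(organ_means == 0):
        raise ValueError("an organ column has zero mean; cannot normalize")
    return (uptake - organ_means) / organ_means


if __name__ == "__main__":
    # Hypothetical uptake values for three particles in two organs.
    example = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
    print(normalize_by_mean(example))
```

after this step every organ column has zero mean, so organs with very different absolute uptake contribute on a comparable relative scale to a later discriminant or component analysis.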
the resulting projection of the data along the first two linear discriminant analysis eigenvectors was analyzed by k-means clustering to confirm the classes of nanoparticle with specificity for the inflamed lungs (supplementary figure b). indeed, division of the data into two clusters supported the delineation of the nanoparticles with specificity for inflamed lungs. naps, nanoparticles based on protein gelation, crosslinking, and charge association, all aligned in one cluster. as an exception, dbco( x)-igg liposomes were considered as a unique class of particle and the linear discriminant analysis indicated that the inflammation-specific liposomes had in vivo behavior resembling that of ldngs or poni-gfp nanoparticles. this analysis of the liposome biodistributions supports the classification of dbco( x)-igg liposomes as naps. igg-coated polystyrene nanoparticles and dbco( x)-igg liposomes were part of the k-means cluster without inflammation specificity, but data for these two particles resided close to the voronoi boundary distinguishing the two clusters. principal component analysis comparing normalized nanoparticle uptake in inflamed lungs to normalized retention in liver, spleen, and blood provided a reductive metric to compare the distinct in vivo behavior of nanoparticles in the classes identified by linear discriminant analysis. most variation in the biodistribution data was accounted for by an eigenvector closely aligned to variation in pulmonary uptake (supplementary figure a). data were projected along that first eigenvector and the magnitude of the projection was determined for each nanoparticle (supplementary figure b). first eigenvector projection values were then grouped according to the classes examined above via linear discriminant analysis. only the classes in the inflammation-specific k-means cluster had positive average first eigenvector projections.
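the first-eigenvector projection described above can be illustrated with a short numpy sketch. it assumes a matrix of normalized uptake values with one row per particle and one column per organ, and it orients the eigenvector so that the loading of one chosen column (here assumed to be lung uptake) is positive, since the sign of a principal axis is otherwise arbitrary. the function name, the `orient_column` parameter, and the example data are hypothetical; the study's own software and settings are not stated in the text.

```python
import numpy as np


def first_component_projection(uptake, orient_column=0):
    """Project rows of `uptake` onto the first principal axis.

    Returns (scores, axis, explained_fraction).
    orient_column: column whose loading is forced to be non-negative
    (an assumption of this sketch, e.g. the lung-uptake column).
    """
    uptake = np.asarray(uptake, dtype=float)
    centered = uptake - uptake.mean(axis=0)
    # Rows of vt are the principal axes, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    if axis[orient_column] < 0:
        axis = -axis
    variance = singular_values ** 2
    explained_fraction = variance[0] / variance.sum()
    return centered @ axis, axis, explained_fraction


if __name__ == "__main__":
    # Hypothetical normalized uptake: columns = lungs, liver.
    data = [[2.0, -0.1], [1.5, 0.0], [-1.0, 0.1], [-2.5, 0.0]]
    scores, axis, frac = first_component_projection(data)
    print("scores:", scores)
    print("axis:", axis)
    print("explained fraction:", frac)
```

averaging the returned scores within each particle class gives a per-class value analogous to the average first eigenvector projections compared in the text.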
all other particle classes had average first eigenvector projections indistinguishable from isolated protein (supplementary figure b). principal component and linear discriminant analyses of our compiled biodistributions confirmed a) identification of naps as nanoparticles with distinct tropism for inflamed lungs and b) alignment of dbco( x)-igg liposome in vivo behavior with that of other naps. computerized tomography (ct) imaging is a standard diagnostic tool for ards. ct images can identify the presence of edematous fluid in the lungs, but ct cannot distinguish between the two major types of pulmonary edema: non-inflammatory cardiogenic pulmonary edema (cpe) and ards-associated edema. we sought to use naps to distinguish inflammatory lung injury from cpe in diagnostic imaging experiments. we induced cpe in mice via prolonged iv propranolol infusion. edema was confirmed via ct imaging of inflated lungs ex vivo and in situ. three-dimensional reconstructions of chest ct images were partitioned to distinguish airspace and low-density tissue, as in normal lungs (white, yellow, and light orange signal in figure a ), from high-density tissue and edema (red and black/transparent signal in figure a ). quantification of ct attenuation and gaps in the reconstructed three-dimensional lung images indicated profuse edema in lungs afflicted with model cardiogenic pulmonary edema ( figure a ). nm ldngs were traced in mice with induced cardiogenic pulmonary edema. ldngs accumulated in the edematous lungs at . %id/g concentration, statistically indistinguishable from lung uptake in naïve mice and an order of magnitude lower than the level of lung uptake in mice treated with iv lps ( figure c ). naïve and iv lps-injured mice were dosed with ldngs labeled with in via chelate conjugation to lysozyme. in uptake in naïve and lps-injured lungs was visualized with ex vivo spect-ct imaging to indicate capacity of ldngs for imaging-based diagnosis of inflammatory lung injury ( figure d ).
in signal was colocalized with anatomical ct images for reconstructions in figure d . in spect signal was detectable in lps-injured lungs, but in spect signal was at background level in naïve lungs (supplementary movies and ). reduced spect signal in the liver of lps-injured mice, in agreement with biodistribution data, was also evident in coregistration of spect imaging with full body skeletal ct imaging (supplementary movies and ). therefore, naps with tropism for marginated neutrophils have the ability to detect and assess ards-like inflammation via spect-ct imaging. since those same naps do not accumulate in lungs afflicted with cpe, naps have potential for differential diagnosis of acute lung inflammation against cpe. in recent work, we demonstrated that human donor lungs rejected for transplant due to ards-like phenotypes can be perfused with nanoparticle solutions. these perfusion experiments evaluate the tendency of nanoparticles to distribute to human lungs ex vivo. we used this perfusion method to evaluate nap retention in inflamed human lungs. first, fluorescent ldngs were added to single cell suspensions prepared from human lungs. µg, µg, or µg of ldngs were incubated with x cells in suspension for hour at room temperature. after three washes to remove unbound ldngs, cells were stained for cd and analyzed with flow cytometry ( figure a -b). the majority of ldng uptake in the single cell suspensions was attributable to cd positive cells. ldngs accumulated in the human leukocytes, extracted from inflamed lungs, in a dose-dependent manner, with . % of leukocytes containing ldngs at a loading dose of µg. therefore, our prototype nap was retained in leukocytes from human lungs. to test ldng tropism for inflamed intact human lungs, fluorescent or i-labeled ldngs were infused via arterial catheter into ex vivo human lungs excluded from transplant.
immediately prior to ldng administration, tissue dye was infused via the same arterial catheter to stain regions of the lungs directly perfused by the catheterized branch of the pulmonary artery ( figure c ). after infusion of ldngs, phosphate buffered saline infusion was used to rinse away unbound particles. perfused regions of the lungs were dissected and divided into ~ g segments, then sorted into regions deemed to have high, medium, or low levels of tissue dye staining. for lungs receiving fluorescent ldngs, well-perfused and poorly-perfused regions were selected for sectioning and fluorescent imaging. fluorescent signal from ldngs was clearly detectable in sections of well-perfused tissue, but not poorly-perfused tissue ( figure d ). in experiments with i-labeled ldngs, i-labeled ferritin was concurrently infused (i.e. a mix of ferritin and ldngs was infused) as an internal control particle shown to have no tropism for injured mouse lungs. with ldngs and ferritin infused into the same lungs via the same branch of the pulmonary artery, ldngs retained in the lungs at . % initial dose and ferritin retained at . % initial dose ( figure e ). ldng accumulation in human lungs was focused in regions of the lungs with high levels of perfusion stain, with concentrations of . %id/g in the "high" perfusion regions, compared to . %id/g in the "medium" perfusion regions. ferritin accumulation was more diffuse, with . %id/g in the "high" perfusion regions, compared to . %id/g in the "medium" perfusion regions (supplementary figure ) . ldngs, a prototype nap shown to home to neutrophils in acutely inflamed mouse lungs, specifically accumulated in perfused regions of inflamed human lungs, but ferritin nanocages, a particle with no tropism for neutrophils, concentrated at much lower levels in injured human lungs. our data thus indicate that nap tropism for neutrophils in inflamed mouse lungs may be recapitulated in human lungs. 
previous studies indicate that nanoparticles can interfere with neutrophil adhesion in inflamed vasculature. we designed studies to evaluate whether or not naps mitigate the neutrophil-mediated effects of lung inflammation. namely, we administered ldngs, dbco( x)-igg liposomes, or bare liposomes in mice subjected to model ards and determined whether or not the nanoparticles prevented lung edema induced by inflammation. mice were treated with nebulized lps as a high-throughput model for severe ards. to evaluate physiological effects of the model injury, bronchoalveolar lavage (bal) fluid was harvested from mice at hours after exposure to lps. in three separate experiments, nebulized lps induced elevated concentrations of neutrophils, cd -positive cells, and protein in the bal fluid. in naïve mice, cd -positive cells concentrated at . x cells per ml bal and neutrophils concentrated at . x cells per ml bal. after lps injury, cd -positive cells and neutrophils concentrated at . x and . x cells per ml bal, respectively. in naïve mice, protein concentrated in the bal fluid at . mg/ml and in lps-injured mice, protein concentrated in the bal at . mg/ml ( figure , white and grey bars). vascular disruption after nebulized lps treatment thus led to accumulation of protein-rich edema in the alveolar space. dbco( x)-igg liposomes, ldngs, and bare liposomes were compared for effects on vascular permeability in model ards. nps were administered as an iv bolus ( mg per kg body weight) two hours after nebulized lps administration. as in untreated mice, bal fluid was harvested and analyzed at hours after exposure to nebulized lps. bare liposomes or ldngs did not have significant effects on vascular injury induced by nebulized lps, as measured by either leukocyte or protein concentration in bal fluid ( figure , red and green bars). dbco( x)-igg liposomes, however, had a significant salient effect on both protein leakage and cellular infiltration in the bal ( figure , brown bars). 
with dbco( x)-igg liposomes administered two hours after nebulized lps, cd -positive cells and neutrophils in bal were reduced to concentrations of . x and . x cells per ml, respectively. protein concentration in the bal was reduced to . mg/ml by dbco( x)-igg liposome treatment. as measured by protection against cellular or protein leakage, relative to untreated mice, dbco( x)-igg liposomes provided . % protection against leukocyte leakage, . % protection against neutrophil leakage, and . % protection against protein leakage. dbco( x)-igg liposomes, without any drug, altered the course of inflammatory lung injury to limit protein and leukocyte edema in the alveoli. our results with dbco( x)-igg liposomes indicate that some naps can interfere with neutrophil extravasation into the alveoli and thus limit edema following inflammatory injury. however, our results with ldngs show that tropism for marginated neutrophils alone is not sufficient to limit the neutrophil-mediated effects of inflammatory lung injury. neutrophils concentrate in the pulmonary vasculature during either systemic or pulmonary inflammation. these marginated neutrophils can recognize and engulf bacteria. therefore, neutrophils surveil the vasculature for potentially pathogenic foreign species, with the pulmonary vasculature serving as a "surveillance hub" in the case of systemic or pulmonary infection and inflammation. our results with e. coli are noteworthy in this context: when e. coli are stripped of functional properties by heat treatment, oxidation, and fixation, but maintain their structure, uptake of the bacteria in the lungs only occurs after systemically prompting neutrophils with an inflammatory signal, lps. inflammation thus leads to pulmonary uptake of the e. coli-shaped particles.
in large part, the overall outcome of this study is an accounting of nanoparticle structural properties that lead to recognition by "surveilling" neutrophils in the inflamed lungs, analogously to e. coli recognition by pulmonary neutrophils. including different liposomal formulations, nanoparticles were screened in our biodistribution studies comparing pulmonary nanoparticle uptake in naïve and lps-inflamed mice. thirteen different nanoparticles exhibited specificity for inflamed lungs over naïve lungs, with flow cytometry data indicating that at least three of those nanoparticle species specifically and avidly gather in neutrophils. the thirteen nanoparticles with specificity for the inflamed lungs have a range of properties. seven different proteins were used in the inflammation-specific particles. the particles have sizes ranging from ~ nm to ~ nm, include both spheres and rods, and have a range of zeta potentials. however, our analyses classify the inflammation-specific nanoparticles as; ) nanoparticles with structure based on hydrophobic interactions between proteins; ) nanoparticles with structure based on non-site-specific protein crosslinking; ) nanoparticles based on charge interactions between proteins. put broadly, these three classes can all be grouped as structures based on protein agglutination, without regard for site-specific interactions or symmetry in the resulting protein superstructure. we define the term nanoparticles with agglutinated proteins (naps) to indicate that particles with tropism for pulmonary marginated neutrophils during inflammation share commonalities in supramolecular organization. we identify naps as a broad class, rather than a single particle type. accordingly, we have presented diverse nap designs, implying a diversity of potential nap-based strategies for targeted treatment and diagnosis of ards and other inflammatory disorders in which marginated neutrophils play a role (e.g. local infections or thrombotic disorders). 
, , , , the diversity of naps will allow versatile options for engineering neutrophil-specific drug delivery strategies to accommodate different pathologies. in contrast to naps, three particles (adenovirus, aavs, and ferritin) characterized by highly symmetric arrangement of protein subunits into a protein superstructure [ ] [ ] [ ] did not accumulate in the inflamed neutrophil-rich lungs. these three particles have evolved structures that lead to prolonged circulation or evasion of innate immunity in mammals. [ ] [ ] [ ] [ ] it is conceivable that neutrophils more effectively recognize less patterned and more variable protein arrangements that may better parallel the wide variety of structures presented by the staggering diversity of microbes against which neutrophils defend. , to support our conclusions regarding supramolecular organization and neutrophil tropism, we re-engineered liposomes, particles with no intrinsic neutrophil tropism, to behave like naps. protein arrangement on the surface of dbco-igg liposomes was predicted to recapitulate protein agglutination seen in naps based on hydrophobic interactions. introduction of dbco to igg entails conjugation of a highly hydrophobic moiety to hydrophilic residues on the igg. replacing dbco with the less hydrophobic modifying group used in sata-maleimide conjugation abrogates the inflammation specificity observed with dbco-igg liposomes. likewise, titrating down the amount of dbco on the igg, thus limiting the hydrophobic groups on the protein, also ratchets down the targeting behavior of the dbco-igg liposomes. our data therefore points towards hydrophobic interactions between proteins on the liposome surface being a determinant in liposome uptake in neutrophils in the inflamed lungs. essentially, the dbco-igg liposomes may reproduce the hydrophobic interaction structural motif seen in naps produced by protein gelation (i.e. ldngs). 
nap-liposomes may be particularly attractive for future clinical translatability. liposomes are prominent among fda-approved nanoparticle drug carriers. further, even without cargo drugs, nap-liposomes conferred significant therapeutic effects in a mouse model of severe ards. ldngs, despite high levels of uptake in inflamed lungs, did not have the same therapeutic effect as the nap-liposomes. this result suggests that the composition of the liposomes may be important for their therapeutic effect. among possible mechanisms for the therapeutic effect, we note that lipid rafts are major signaling hubs in neutrophils. , the lipid content of the nap-liposomes (particularly the cholesterol content) may modulate neutrophil lipid rafts dependent on cholesterol. we have also observed that neutrophil content in the inflamed alveoli is markedly reduced by nap-liposomes. in this context, we note published work demonstrating that certain nanoparticles, in a still undetermined manner dependent on particle composition, can drive redistribution of neutrophils from the lungs to the liver. as a major corollary, our findings indicate many protein-based or proteinincorporating nanoparticles developed for therapeutic applications may accumulate in inflamed lungs, even when those nanoparticles were designed to accumulate elsewhere. the variety of protein nanostructures accumulating in inflamed lungs in our data includes particles that have been investigated as targeted drug delivery vehicles where marginated neutrophils are not the intended site of accumulation. , , , , the patterns in our data indicate that future studies may reveal additional nanoparticles that accumulate in the lungs following inflammatory insult. this study therefore serves as evidence that inflammatory challenges may prompt profound off-target changes in the biodistributions of nanomaterials, including dramatic shunting of nanoparticles and any associated drug payload to the lungs. 
the nanoparticle targeting profiles documented in naïve or, for instance, tumor model studies may be overturned by, for instance, bacterial infection in a patient receiving the nanoparticle. in conclusion, supramolecular organization in nanoparticle structure predicts nanoparticle uptake in pulmonary marginated neutrophils during acute inflammation. specifically, nanoparticles with agglutinated protein (naps) accumulate in marginated neutrophils, while nanoparticles with more symmetric protein organization do not. nap tropism for neutrophils allowed us to develop naps as diagnostics and therapeutics for ards, and even to demonstrate nap uptake in inflamed human lungs. future work may more deeply explore therapeutic effects of naps in ards and other diseases in which neutrophils play key roles. this study also motivates future testing of supramolecular organization as a variable in in vivo behavior of nanoparticles, including screens of tropism for other pathologies and cell types. these studies could in turn guide engineering of new particles with intrinsic cell tropisms, as with our engineering of nap-liposomes with neutrophil tropism. these "targeting" behaviors, requiring no affinity moieties, may apply to a wide variety of nanomaterials. but our current findings with neutrophil-tropic naps indicate that many protein-based and protein-coated nanoparticles could be untapped resources for treatment and diagnosis of devastating inflammatory disorders like ards. lysozyme-dextran nanogels (ldngs) were synthesized as previously described. kda rhodamine-dextran or fitc-dextran (sigma) and lysozyme from hen egg white (sigma) were dissolved in deionized and filtered water at a : or : mol:mol ratio, and ph was adjusted to . before lyophilizing the solution. for maillard reaction between lysozyme and dextran, the lyophilized product was heated for hours at °c, with % humidity maintained via saturated kbr solution in the heating vessel.
dextran-lysozyme conjugates were dissolved in deionized and filtered water to a concentration of mg/ml, and ph was adjusted to . or . . solutions were stirred at °c for minutes. diameter of ldngs was evaluated with dynamic light scattering (dls, malvern) after heat gelation. particle suspensions were stored at °c. crosslinked protein nanoparticles and nanorods were prepared using previously reported electrohydrodynamic jetting techniques. the protein nanoparticles were prepared using bovine serum albumin, human serum albumin, human lysozyme, human transferrin, or human hemoglobin (all proteins were purchased from sigma). protein nanorods were prepared using chemically modified human serum albumin. for electrohydrodynamic jetting, protein solutions were prepared by dissolving the protein of interest at a . w/v% (or . w/v% for protein nanorods) concentration in a solvent mixture of di water and ethylene glycol with : (v/v) ratio. the homobifunctional amine-reactive crosslinker, o,o′-bis[ -(n-succinimidylsuccinylamino)ethyl]polyethylene glycol with molecular weight of kda (nhs-peg-nhs, sigma) was mixed with the protein solution at w/w%. protein nanoparticles were kept at °c for days for completion of the crosslinking reaction. the as-prepared protein nanoparticles were collected in pbs buffer and their size distribution was analyzed using dynamic light scattering (dls, malvern). glutamic acid residues (e -tag) were inserted at the c-terminus of enhanced green fluorescent protein (egfp) through restriction cloning and site-directed mutagenesis as previously reported. proteins were expressed in an e. coli bl strain using a standard protein expression protocol. briefly, protein expression was carried out in xyt media with an induction condition of mm iptg and °c for h. at this point, the cells were harvested, and the pellets were lysed using % triton-x- ( min, °c)/dnase-i treatment ( minutes). proteins were purified using hispur cobalt columns.
after elution, proteins were preserved in pbs buffer. the purity of native proteins was determined using % sds−page gel. polymers (poni) were synthesized by ring-opening metathesis polymerization using third generation grubbs' catalyst as previously described. in brief, solutions in dichloromethane of guanidinium-functionalized monomer and grubbs' catalyst were subjected to freeze-thaw cycles for degassing. after warming the solutions to room temperature, the degassed monomer solutions were added to degassed catalyst solutions and allowed to stir for minutes. the polymerization reaction was terminated by the addition of excess ethyl vinyl ether. the reaction mixture was stirred for another min. the resultant polymers were precipitated from excess hexane or anhydrous diethyl ether, filtered, washed and dried under vacuum to yield a light-yellow powder. polymers were characterized by h nmr and gel permeation chromatography (gpc) to assess chemical compositions and molecular weight distributions, respectively. for deprotection of boc functionalities, polymer was dissolved in dcm with the addition of tfa at a : ratio. the reaction was allowed to stir for hours and dried under vacuum. excess tfa was removed by azeotropic distillation with methanol. afterwards, the resultant polymers were re-dissolved in dcm and precipitated in anhydrous diethyl ether, filtered, washed and dried. polymers were then dissolved in water and transferred to biotech ce dialysis tubing membranes with a g/mol cutoff and dialyzed against ro water ( − days). the polymers were then lyophilized to yield a light white powder. poni polymer/e-tag protein nanocomposites (ppncs) were prepared in polypropylene microcentrifuge tubes (fisher) through a simple mixing procedure. . nmol of kda poni was incubated with . nmol of egfp at room temperature for minutes prior to dilution to µl in sterile pbs and subsequent injection. similarly, . 
nmol of arginine-tagged gold nanoparticles, prepared as described, were combined with . nmol of egfp to prepare egfp/gold nanoparticle complexes. azide-functionalized liposomes were prepared by thin film hydration techniques, as previously described. the lipid film was composed of mol% dppc ( , dipalmitoyl-sn-glycero- -phosphocholine), mol% cholesterol, and mol% azide-peg -dspe (all lipids from avanti). . mol% top fluor pc ( -palmitoyl- -(dipyrrometheneboron difluoride) undecanoyl-sn-glycero- -phosphocholine) was added to prepare fluorescent liposomes. . mol% dtpa-pe ( , -distearoyl-sn-glycero- phosphoethanolamine-n-diethylenetriaminepentaacetic acid) was added to prepare liposomes with capacity for radiolabeling with in. lipid solutions in chloroform, at a total lipid concentration of mm, were dried under nitrogen gas, then lyophilized for hours to remove residual solvent. dried lipid films were hydrated with dulbecco's phosphate buffered saline (pbs). lipid suspensions were passed through freeze-thaw cycles using liquid n / °c water bath then extruded through nm cutoff track-etched polycarbonate filters in cycles. dls assessed particle size after extrusion and after each subsequent particle modification. liposome concentration following extrusion was assessed with nanosight nanoparticle tracking analysis (malvern). for conjugation to liposomes, rat igg was modified with dibenzocyclooctyne-peg -nhs ester (dbco, jena bioscience). igg solutions (pbs) were adjusted to ph . with m nahco buffer and reacted with dbco for hour at room temperature at molar ratios of . : , : , : , or : dbco:igg. unreacted dbco was removed after reaction via centrifugal filtration against kda cutoff filters (amicon). dbco-modified igg was incubated with azide liposomes at igg per liposome overnight at room temperature. unreacted antibody was removed via size exclusion chromatography, and purified liposomes were concentrated to original volume against centrifugal filters (amicon).
maleimide liposomes were also prepared via lipid film hydration. lipid films comprised % dppc, % cholesterol, and % mpb-pe ( , -dioleoyl-sn-glycero- phosphoethanolamine-n-[ -(p-maleimidophenyl) butyramide]), with lipids prepared, dried, resuspended, and extruded as described above for azide liposomes. igg was prepared for conjugation to maleimide liposomes by one-hour reaction of sata (n-succinimidyl s-acetylthioacetate) per igg at room temperature in . mm edta in pbs. unreacted sata was removed from igg by passage through kda cutoff gel filtration columns. sata-conjugated igg was deprotected by one-hour room temperature incubation in . m hydroxylamine in . mm edta in pbs. excess hydroxylamine was removed and buffer was exchanged for . mm edta in pbs via kda cutoff gel filtration column. sata-conjugated and deprotected igg was added to liposomes at igg per liposome for overnight reaction at °c. excess igg was removed by size exclusion column purification, as above for azide liposomes. nm carboxylate nanoparticles (phosphorex) were exchanged into mm mes buffer at ph . via gel filtration column. n-hydroxysulfosuccinimide (sulfo-nhs) was added to the particles at . mg/ml, prior to incubation for minutes at room temperature. edci was then added to the particles at . mg/ml, prior to incubation for minutes at room temperature. igg was added to the particle mixture at igg per nanoparticle, prior to incubation for hours at room temperature while vortexing. for radiotracing, i-labeled igg was added to the reaction at % of total igg mass. the igg/particle mixture was diluted with -fold volume excess of ph . mes buffer and the diluted mixture was centrifuged at xg for minutes. supernatant was discarded and pbs with . % bsa was added at desired volume before resuspending the particles via probe sonication (three pulses, % amplitude). particle size was assessed via dls after resuspension, and particles were used immediately after dls assessment. top e.
coli were grown overnight in terrific broth with ampicillin. bacteria were heat-inactivated by -minute incubation at °c, then fixed by overnight incubation in % paraformaldehyde. after fixation, bacteria were pelleted by centrifugation at xg for minutes. pelleted bacteria were washed three times in pbs, prior to resuspension by pipetting. bacterial concentration was verified by optical density at nm, prior to radiolabeling as described for nanoparticles below. bacteria were administered to mice ( . x colony forming units in a µl suspension per mouse). protein, horse spleen ferritin nanocages (sigma), or adeno-associated virus (empty capsids, serotype ) were prepared in pbs at concentrations between and mg/ml in volumes between and µl. films of oxidizing agent were prepared in borosilicate tubes by drying µl of . mg/ml iodogen (perkin-elmer, chloroform solution) under nitrogen gas. alternatively, iodobeads (perkin-elmer) were added to borosilicate tubes (one per reaction). protein solutions were added to coated or bead-containing tubes, before addition of na / i at µci per µg of protein. protein was incubated with radioiodine at room temperature for minutes under parafilm in a ventilated hood. iodide-protein reactions were terminated by purifying protein solutions through a kda cutoff gel column (zeba). additional passages through gel filtration columns or against centrifugal filters (amicon, kda cutoff) were employed to remove free iodine, assuring that > % of radioactivity was associated with protein. lysozyme-dextran nanogels, crosslinked protein nanoparticles, e. coli, or adenovirus were similarly iodinated. at least µl of particle suspension was added to a borosilicate tube containing two iodobeads, prior to addition of µci of na i per µl of suspension. particles were incubated with radioiodine and iodobeads for minutes at room temperature, with gentle shaking every minutes.
to remove free iodine, particle suspensions were moved to a centrifuge tube, diluted in ~ ml of buffer and centrifuged to pellet the particles ( xg/ minutes for nanogels, xg/ minutes for crosslinked protein particles, xg/ minutes for adenovirus, and xg/ minutes for e. coli). supernatant was removed and wash/centrifugation cycles were repeated to assure > % of radioactivity was associated with particles. particles were resuspended by probe sonication (three pulses, % amplitude) for nanogels or crosslinked protein nanoparticles or pipetting for adenovirus or e. coli. labeling of nanoparticles with in followed previously described methods, with adaptation for new particles. all radiolabeling chelation reactions were performed using metal-free conditions to prevent contaminating metals from interfering with chelation of in by dtpa or dota. metals were removed from buffers using chelex metal affinity resin (bio-rad laboratories, hercules, ca). lysozyme-dextran nanogels were prepared for chelation to in by conjugation to s- -( -isothiocyanatobenzyl)- , , , -tetraazacyclododecane tetraacetic acid (p-scn-bn-dota, macrocyclics). nanogels were moved to metal-free ph . m nahco buffer by three-fold centrifugation ( xg for minutes) and pellet washing with metal-free buffer. p-scn-bn-dota was added to nanogels at : mass:mass ratio, prior to reaction for minutes at room temperature. free p-scn-bn-dota was removed by three-fold centrifugal filtration against kda cutoff centrifugal filters, with resuspension of nanogels in metal-free ph citrate buffer after each centrifugation. dota-conjugated nanogels or dtpa-containing liposomes in ph citrate buffer were combined with incl for one-hour chelation at °c. nanoparticle/ incl mixtures were treated with free dtpa ( mm final concentration) to remove in not incorporated in nanoparticles.
efficiency of in incorporation in nanoparticles was assessed by thin film chromatography (aluminum/silica strips, sigma) with µm edta mobile phase. chromatography strips were divided between origin and mobile front and the two portions of the strip were analyzed in a gamma counter to assess nanoparticle-associated (origin) vs. free (mobile front) in. free in was separated from nanoparticles by centrifugal filtration and nanoparticles were resuspended in pbs (liposomes) or saline (nanogels). for spect/ct imaging experiments (see spect/ct imaging methods below) with nanogels, µci of in-labeled nanogels, used within one day in labeling as described above, were administered to each mouse. for tracing in-labeled liposomes in biodistribution studies, liposomes were labeled with µci in per µmol of lipid. nanoparticle or protein biodistributions were tested by injecting radiolabeled nanoparticles or protein (suspended to µl in pbs or . % saline at a dose of . mg/kg with tracer quantities of radiolabeled material) in c bl/ male mice from jackson laboratories. biodistributions in naïve mice were compared to biodistributions in several injury models. biodistribution data were collected at minutes after nanoparticle or protein injection, unless otherwise stated, as in pharmacokinetics studies. briefly, blood was collected by vena cava draw and mice were sacrificed via terminal exsanguination and cervical dislocation. organs were harvested and rinsed in saline, and blood and organs were examined for nanoparticle or protein retention in a gamma counter (perkin-elmer). nanoparticle or protein retention in harvested organs was compared to measured radioactivity in injected doses. for calculations of nanoparticle or protein concentration in organs, quantity of retained radioactivity was normalized to organ weights. mice subject to intravenous lps injury were anesthetized with % isoflurane before administration of lps from e.
coli strain b at mg/kg in µl pbs via retroorbital injection. after five hours, mice were anesthetized with ketamine-xylazine ( mg/kg ketamine, mg/kg xylazine, intramuscular administration) and administered radiolabeled nanoparticles or protein via jugular vein injection to determine biodistributions as described above. for mice subject to intratracheal (it) lps injury, b lps was administered to mice (anesthetized with ketamine/xylazine) at mg/kg in µl of pbs via tracheal catheter, followed by µl of air. biodistributions of lysozyme-dextran nanogels in it-lps-injured mice were assessed as above hours after lps administration. biodistributions of liposomes in it-lps-injured mice were assessed at , , or hours after lps administration. mice subject to footpad lps administration were provided b lps at mg/kg in µl pbs via footpad injection. biodistributions of lysozyme-dextran nanogels were obtained at or hours after footpad lps administration. lysozyme-dextran nanogel biodistributions were also traced in a mouse model of cardiogenic pulmonary edema. to establish edema, mice were anesthetized with ketamine/xylazine and administered propranolol in saline ( µg/ml) via jugular vein catheter at µl/min over minutes. lysozyme-dextran nanogel biodistributions were subsequently assessed as above. single cell suspensions were prepared from lungs for flow cytometric analysis of cell type composition of the lungs and/or nanoparticle distribution among different cell types in the lungs. c bl/ male mice were anesthetized with ketamine/xylazine ( mg/kg ketamine, mg/kg xylazine, intramuscular administration) prior to installation of tracheal catheter secured by suture. after sacrifice by terminal exsanguination via the vena cava, lungs were perfused by right ventricle injection of ~ ml of cold pbs. the lungs were then infused via the tracheal catheter with ml of a digestive enzyme solution consisting of u/ml dispase, . mg/ml collagenase type i, and mg/ml of dnase i in cold pbs. 
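the two radiotracing calculations described earlier in these methods can be sketched in a few lines of python: labeling efficiency from the chromatography strip (origin counts vs. mobile-front counts), and organ retention expressed relative to the injected dose and normalized to organ weight. this is a minimal sketch under stated assumptions, not the authors' analysis code: the names `OrganSample`, `labeling_efficiency`, and `percent_injected_dose_per_gram` are hypothetical, and counts are assumed to be background-corrected gamma-counter readings.

```python
from dataclasses import dataclass


@dataclass
class OrganSample:
    """One harvested organ (hypothetical container, not from the paper)."""
    name: str        # organ label, e.g. "lungs"
    counts: float    # gamma-counter reading, assumed background-corrected
    weight_g: float  # wet organ weight in grams


def labeling_efficiency(origin_counts: float, front_counts: float) -> float:
    """Fraction of strip activity left at the origin (nanoparticle-associated)."""
    total = origin_counts + front_counts
    if total <= 0:
        raise ValueError("strip has no measurable activity")
    return origin_counts / total


def percent_injected_dose_per_gram(sample: OrganSample, injected_counts: float) -> float:
    """Organ activity as a percentage of injected activity, per gram of tissue."""
    if injected_counts <= 0:
        raise ValueError("injected dose must be positive")
    if sample.weight_g <= 0:
        raise ValueError("organ weight must be positive")
    return 100.0 * sample.counts / injected_counts / sample.weight_g


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the study.
    print(labeling_efficiency(origin_counts=950.0, front_counts=50.0))
    lungs = OrganSample(name="lungs", counts=2000.0, weight_g=0.2)
    print(percent_injected_dose_per_gram(lungs, injected_counts=100000.0))
```

dividing by organ weight is what makes small organs such as the lungs comparable with large ones such as the liver; whether decay correction is also needed depends on the isotope and counting schedule and is left out of this sketch.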
immediately after infusion, the trachea was sutured shut while removing the tracheal catheter. the lungs with intact trachea were removed via thoracotomy and kept on ice prior to manual disaggregation. disaggregated lung tissue was aspirated in ml of digestive enzyme solution and incubated at °c for minutes, with vortexing every minutes. after addition of ml of fetal calf serum, tissue suspensions were strained through µm filters and centrifuged at xg for minutes. after removal of supernatant, the pelleted material was resuspended in ml of cold ack lysing buffer. the resulting suspensions were strained through µm filter and incubated for minutes on ice. the suspensions were centrifuged at xg for minutes and the resulting pellets were rinsed in ml of facs buffer ( % fetal calf serum and mm edta in pbs). after centrifugation at xg for minutes, the rinsed cell pellets were resuspended in % pfa in ml facs buffer for minutes incubation. the fixed cell suspensions were centrifuged at xg for minutes and resuspended in ml of facs buffer. for analysis of intravascular leukocyte populations in naïve and inflamed lungs, mice received an intravenous injection of fitc-conjugated anti-cd antibody five minutes prior to sacrifice and preparation of single cell suspensions as described above. populations of intravascular vs. extravascular leukocytes were assessed by subsequent stain of fixed cell suspensions with percp-conjugated anti-cd antibody and/or apc-conjugated clone a anti-ly g antibody. to accomplish staining of fixed cells, µl aliquots of the cell suspensions described above were pelleted at xg for minutes, then resuspended in labeled antibody diluted in facs buffer ( : dilution for apc-conjugated anti-ly g antibody and : dilution for percp-conjugated anti-cd antibody). samples were incubated with staining antibodies for minutes at room temperature in the dark, diluted with ml of facs buffer, and pelleted at xg for minutes.
stained pellets were resuspended in µl of facs buffer prior to immediate flow cytometric analysis on a bd accuri flow cytometer. all flow cytometry data was gated to remove debris and exclude doublets. control samples with no stain, obtained from naïve and iv-lps-injured mice, established gates for negative/positive staining with fitc, percp, and apc. single stain controls allowed automatic generation of compensation matrices in fcs express software. comparison of percp anti-cd signal with fitc anti-cd signal indicated intravascular vs. extravascular leukocytes. comparison of apc anti-ly g signal with fitc anti-cd signal indicated intravascular vs. extravascular neutrophils, with percp and apc co-staining verifying identification of cells as neutrophils. similar staining and analysis protocols enabled identification of nanoparticle distribution among different cell types in the lungs. to enable fluorescent tracing, lysozyme-dextran nanogels contained fitc-dextran, dbco-igg liposomes contained green fluorescent top fluor pc lipid, and crosslinked albumin nanoparticles were labeled with nhs ester alexa fluor . alexa fluor labeling of albumin nanoparticles was accomplished by incubation of the nhs ester fluorophore with nanoparticles at : mass:mass fluorophore:nanoparticle ratio for two hours on ice. excess fluorophore was removed from nanoparticles by -fold centrifugation at xg for minutes followed by washing with pbs. nanoparticles were administered at . mg/kg via jugular vein injection and circulated for minutes, prior to preparation of single cell suspensions from lungs as above. fixed single cell suspensions were stained with apc-conjugated anti-ly g or percp-conjugated anti-cd as above. additional suspensions were stained with : dilution of apc-conjugated anti-cd , in lieu of anti-ly g, to identify endothelial cells. association of nanoparticles with cell types was identified by coincidence of green fluorescent signal with anti-cd , anti-ly g, or anti-cd signal. 
as described previously, thirty minutes after injection of µci of in-labeled nanogels, anesthetized mice were sacrificed by cervical dislocation. mice were placed into a milabs u-spect (utrecht, netherlands) scanner bed. a region covering the entire body was scanned for min using listmode acquisition. the animal was then moved, while maintaining position, to a milabs u-ct (utrecht, netherlands) for a full-body ct scan using default acquisition parameters ( µa, kvp, ms exposure, . ° step with projections). for naïve mice and mice imaged after cardiogenic pulmonary edema, ct data were acquired as above without spect data. the spect data were reconstructed using reconstruction software provided by the manufacturer, with µm voxels. the ct data were reconstructed using reconstruction software provided by the manufacturer, with µm voxels. spect and ct data, in nifti format, were opened with imagej software (fiji package). background signal was removed from spect images by thresholding limits determined by applying renyi entropic filtering, as implemented in imagej, to a spect image slice containing nanogel-associated in in the liver. background-subtracted pseudo-color spect images were overlaid on ct images and axial slices depicting lungs were selected for display, with ct thresholding set to emphasize negative contrast in the airspace of the lungs. imagej's built-in d modeling plugin was used to co-register background-subtracted pseudo-color spect images with ct images in three-dimensional reconstructions. ct image thresholding was set in the d modeling tool to depict skeletal structure alongside spect signal. for three-dimensional reconstructions of lung ct images, thresholding was set, as above, for contrast emphasizing the airspace of the lungs, with thresholding values standardized between different ct images (i.e. identical values were used for naïve and edematous lungs).
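the entropy-based background threshold used for the spect images can be illustrated with a short python sketch. this is a simplified single-parameter variant of rényi entropy thresholding, written from the general definition of rényi entropy; it is an assumption that it approximates the behavior of the imagej method used in the study, and imagej's implementation differs in detail. the function names `renyi_threshold` and `subtract_background` and the default `alpha` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def renyi_threshold(hist: Sequence[float], alpha: float = 2.0) -> int:
    """Return the bin index t that maximizes the summed Renyi entropies of the
    background (bins 0..t) and foreground (bins t+1..end) distributions.

    Simplified sketch: a single alpha is used, unlike ImageJ's implementation.
    """
    if alpha <= 0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram is empty")
    p = [h / total for h in hist]

    best_t, best_score = 0, float("-inf")
    for t in range(len(p) - 1):
        p_bg = sum(p[: t + 1])
        p_fg = sum(p[t + 1:])
        if p_bg <= 0 or p_fg <= 0:
            continue  # one class is empty, so no valid split here
        # Renyi entropy of each class: log(sum(q_i ** alpha)) / (1 - alpha)
        s_bg = sum((x / p_bg) ** alpha for x in p[: t + 1])
        s_fg = sum((x / p_fg) ** alpha for x in p[t + 1:])
        score = (math.log(s_bg) + math.log(s_fg)) / (1.0 - alpha)
        if score > best_score:  # strict comparison keeps the first maximum
            best_t, best_score = t, score
    return best_t


def subtract_background(pixels: Sequence[int], threshold: int) -> List[int]:
    """Zero every pixel at or below the threshold, keep the rest unchanged."""
    return [v if v > threshold else 0 for v in pixels]


if __name__ == "__main__":
    # Toy bimodal histogram (illustrative only): low-intensity background
    # in bins 0-2, signal in bins 7-9.
    hist = [10, 10, 10, 0, 0, 0, 0, 5, 5, 5]
    t = renyi_threshold(hist)
    print(t, subtract_background([0, 1, 2, 3, 8, 9], t))
```

deriving the threshold from one reference slice and then applying it to the whole volume, as the methods describe, keeps the background cut-off identical across slices of the same image.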
images were cropped in a cylinder to exclude the airspace outside of the animal, then contrast was inverted, allowing airspace to register bright ct signal and denser tissue to register as dark background. three-dimensional reconstructions of the lung ct data, and co-registrations of spect data with lung ct data, were generated as above with imagej's d plugin applied to ct data cropped and partitioned for lung contrast. quantification of ct attenuation employed imagej's measurement tool iteratively over axial slices, with measurement fields of view manually set to contain lungs and exclude surrounding tissue. mice were exposed to nebulized lps in a 'whole-body' exposure chamber, with separate compartments for each mouse (mpc- aero; braintree scientific). to maintain adequate hydration, mice were injected with ml of sterile saline warmed to °c, intraperitoneally, immediately before exposure to lps. lps (l - mg, sigma aldrich) was reconstituted in pbs to mg/ml and stored at - °c until use. immediately before nebulization, lps was thawed and diluted to mg/ml with pbs. lps was aerosolized via a jet nebulizer connected to the exposure chamber (neb-med h, braintree scientific, inc.). ml of mg/ml lps was used to induce the injury. nebulization was performed until all liquid was nebulized (~ minutes). liposomes or saline sham were administered via retro-orbital injections of µl of suspension ( mg/kg liposome dose) at hours after lps exposure. mice were anesthetized with % isoflurane to facilitate injections. bronchoalveolar lavage (bal) fluid was collected hours after lps exposure, as previously described. briefly, mice were anesthetized with ketamine-xylazine ( mg/kg ketamine, mg/kg xylazine, intramuscular administration). the trachea was isolated and a tracheostomy was performed with a -gauge catheter. the mice were euthanized via exsanguination. . ml of cold bal buffer ( .
mm edta in pbs) was injected into the lungs over ~ min via the tracheostomy and then aspirated from the lungs over ~ min. injections/aspirations were performed three times for a total of . ml of fluid added to the lungs. recovery bal fluid typically amounted to ~ . ml. bal samples were centrifuged at xg for minutes. the supernatant was collected and stored at - °c for further analysis. protein concentration was measured using bio-rad dc protein assay, per manufacturer's instructions. the cell pellet was fixed for flow cytometry as follows. µl of . % pfa in pbs was added to each sample. samples were incubated in the dark at room temperature for minutes, then ml of bal buffer was added. samples were centrifuged at xg for min, the supernatant was aspirated, and ml of facs buffer ( % fetal calf serum and mm edta in pbs) was added. at this point, samples were stored at °c for up to week prior to flow cytometry analysis. to stain for flow cytometry, samples were centrifuged at xg for min, the supernatant was aspirated, and µl of staining buffer was added. staining buffer used was a : dilution of stock antibody solution (apc anti-mouse cd ; alexa fluor anti-mouse ly g, biolegend) into facs buffer. samples were incubated with staining antibody for minutes at room temperature in the dark. to terminate staining, ml of facs buffer was added, samples were centrifuged at xg for minutes, and supernatant was aspirated. cells were resuspended in µl of facs buffer and immediately analyzed via flow cytometry. flow cytometric analysis was completed with a bd accuri flow cytometer as follows: sample volume was set to µl and flow rate was set to 'fast'. unstained and single-stained controls were used to set gates. forward scatter (pulse area) vs. side scatter (pulse area) plots were used to gate out non-cellular debris. forward scatter (pulse area) vs. forward scatter (pulse height) plots were used to gate out doublets. 
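the gating sequence described above (debris removed on forward vs. side scatter area, doublets removed on forward scatter area vs. height, then positive cells counted in a fluorescent channel) can be sketched as follows. this is an illustrative python sketch only: the `Event` fields, the rectangular debris gate, the area-to-height ratio cut-off, and all threshold values are assumptions, since the study set its gates from unstained and single-stained controls in the cytometer software.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Event:
    """One flow cytometry event; field names are illustrative."""
    fsc_a: float   # forward scatter, pulse area
    fsc_h: float   # forward scatter, pulse height
    ssc_a: float   # side scatter, pulse area
    signal: float  # fluorescence in the channel of interest


def is_cell(e: Event, min_fsc_a: float, min_ssc_a: float) -> bool:
    """Debris gate: keep events above both scatter minima (assumed gate shape)."""
    return e.fsc_a >= min_fsc_a and e.ssc_a >= min_ssc_a


def is_singlet(e: Event, max_area_to_height: float) -> bool:
    """Doublet gate: doublets show inflated pulse area relative to pulse height."""
    return e.fsc_h > 0 and (e.fsc_a / e.fsc_h) <= max_area_to_height


def gate_events(events: Iterable[Event], min_fsc_a: float, min_ssc_a: float,
                max_area_to_height: float) -> List[Event]:
    """Apply the debris gate and then the doublet gate."""
    return [e for e in events
            if is_cell(e, min_fsc_a, min_ssc_a)
            and is_singlet(e, max_area_to_height)]


def fraction_positive(events: List[Event], threshold: float) -> float:
    """Share of gated events whose fluorescence exceeds the control-derived threshold."""
    if not events:
        raise ValueError("no events left after gating")
    return sum(e.signal > threshold for e in events) / len(events)


if __name__ == "__main__":
    # Made-up events: debris, a stained singlet, a doublet, an unstained singlet.
    events = [
        Event(100.0, 90.0, 50.0, 5.0),
        Event(50000.0, 48000.0, 20000.0, 900.0),
        Event(90000.0, 45000.0, 30000.0, 800.0),
        Event(52000.0, 50000.0, 21000.0, 10.0),
    ]
    gated = gate_events(events, min_fsc_a=10000.0, min_ssc_a=5000.0,
                        max_area_to_height=1.2)
    print(len(gated), fraction_positive(gated, threshold=100.0))
```

in practice the fluorescence threshold would be placed using the unstained control, as the methods state, rather than hard-coded.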
the appropriate fluorescent channels were used to determine stained vs. unstained cells. the gates were placed using unstained control samples. single-stain controls were tested and showed there was no overlap/bleed-through between the fluorophores. final analysis indicated the quantity of leukocytes (cd -positive cells) and neutrophils (ly g-positive cells) in bal samples. human lungs were obtained after organ harvest from transplant donors whose lungs were in advance deemed unsuitable for transplantation. the lungs were harvested by the organ procurement team and kept at °c until the experiment, which was done within hours of organ harvest. the lungs were inflated with low pressure oxygen and oxygen flow was maintained at . l/min to maintain gentle inflation. pulmonary artery subsegmental branches were endovascularly cannulated, then tested for retrograde flow by perfusing for minutes with steen solution containing a small amount of green tissue dye at cm h o pressure. the pulmonary veins through which efflux of perfusate emerged were noted, allowing collection of solutions after passage through the lungs. a ml mixture of i-labeled lysozyme-dextran nanogels and i-labeled ferritin nanocages was injected through the arterial catheter. ~ ml of % bsa in pbs was passed through the same catheter to rinse unbound nanoparticles. a solution of green tissue dye was subsequently injected through the same catheter. the cannulated lung lobe was dissected into ~ g segments, which were evaluated for density of tissue dye staining. segments were weighed, divided into 'high', 'medium', 'low', and 'null' levels of dye staining, and measured for i and i signal in a gamma counter. for experiments with cell suspensions derived from human lungs (chosen for research use according to the above standards), single cell suspensions were generously provided by the laboratory of edward e. morrisey at the university of pennsylvania.
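the segment-level analysis of the perfused human lung lobe (weigh each segment, assign a dye-staining level, count radioactivity) lends itself to a simple grouped summary. the python sketch below is one possible way to summarize such data and is not taken from the study: the function name `mean_signal_per_gram`, the tuple layout, and the choice of a per-gram mean are assumptions, and the numbers in the usage example are invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Dye-staining categories named in the methods.
DYE_LEVELS = ("high", "medium", "low", "null")


def mean_signal_per_gram(
    segments: Iterable[Tuple[str, float, float]]
) -> Dict[str, float]:
    """Average gamma counts per gram of tissue within each dye-staining level.

    Each segment is a (dye_level, weight_g, counts) tuple (assumed layout).
    Levels with no segments are omitted from the result.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for level, weight_g, counts in segments:
        if level not in DYE_LEVELS:
            raise ValueError(f"unknown dye level: {level!r}")
        if weight_g <= 0:
            raise ValueError("segment weight must be positive")
        grouped[level].append(counts / weight_g)
    return {level: mean(grouped[level]) for level in DYE_LEVELS if level in grouped}


if __name__ == "__main__":
    # Invented example values, not data from the study.
    segments = [
        ("high", 1.0, 100.0),
        ("high", 2.0, 400.0),
        ("null", 1.0, 10.0),
    ]
    print(mean_signal_per_gram(segments))
```

running the same function once per isotope would give one summary for the nanogels and one for the ferritin nanocages, which were co-injected with different radiolabels.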
aliquots of , cells were pelleted at xg for minutes and resuspended in µl pbs containing different quantities of lysozyme-dextran nanogels synthesized with fitc-labeled dextran. cells and nanogels were incubated at room temperature for minutes before two-fold pelleting at xg with ml pbs washes. cells were resuspended in µl facs buffer for staining with apc-conjugated anti-human cd , applied by -minute incubation with a : dilution of the antibody stock. cells were pelleted at xg for minutes and resuspended in µl pbs for immediate analysis with flow cytometry (bd accuri). negative/positive nanogel or anti-cd signal was established by comparison to unstained cells. single-stained controls indicated no spectral overlap between fitc-nanogel fluorescence and anti-cd apc fluorescence. proteins were prepared in deionized and filtered water at concentrations of . mg/ml for human albumin, . mg/ml for hen lysozyme, and . mg/ml for igg. crosslinked albumin nanoparticles, lysozyme-dextran nanogels, and igg-coated liposome suspensions were prepared such that albumin, lysozyme, and igg concentrations in the suspensions matched the concentrations of the corresponding protein solutions. protein and nanoparticle solutions were analyzed in quartz cuvettes with mm path length in an aviv circular dichroism spectrometer. the instrument was equilibrated in nitrogen at °c for minutes prior to use and samples were analyzed with sweeps between and nm in nm increments. each data point was obtained after a . s settling time, with a s averaging time. cdnn software deconvolved cd data (expressed in millidegrees) via neural network algorithm assessing alignment of spectra with library-determined spectra for helices, antiparallel sheets, parallel sheets, beta turns, and random coil. -anilino- -naphthalenesulfonic acid (ansa) at . mg/ml was mixed with lysozyme, human albumin, or igg at . mg/ml in pbs.
for nanoparticle analysis, nanoparticle solutions were prepared such that albumin, lysozyme, and igg concentrations in the suspensions matched the . mg/ml concentration of the protein solutions. protein or nanoparticles and ansa were reacted at room temperature for minutes. excess ansa was removed from solutions by centrifugation against kda cutoff centrifugal filters (amicon). after resuspension to original volume, ansa-stained protein solutions or nanoparticle suspensions were examined for fluorescence (excitation nm, emission - nm) and absorbance ( - nm) maxima corresponding to ansa. for imaging neutrophil content in naïve and iv-lps-injured lungs, mice were intravenously injected with rat anti-mouse anti-ly g antibody (clone a ) and sacrificed minutes later. lungs were embedded in m medium, flash frozen, and sectioned in µm slices. sections were stained with percp-conjugated anti-rat secondary antibody and neutrophil-associated fluorescence was observed with epifluorescence microscopy. similar procedures enabled histological imaging of lysozyme-dextran nanogels in iv-lps-injured lungs. nanogels synthesized with rhodamine-dextran were administered intravenously in injured mice minutes prior to sacrifice. lungs were sectioned as above and stained with clone a anti-ly g antibody, followed by brilliant violet-conjugated anti-rat secondary antibody. sections of human lungs were obtained after ex vivo administration (see nanoparticle administration in human lungs above) of lysozyme-dextran nanogels synthesized with rhodamine dextran. regions of tissue delineated as perfused and non-perfused, as determined by arterial administration of tissue dye as above, were harvested, embedded in m medium, flash frozen, and sectioned in µm slices. epifluorescence imaging indicated rhodamine fluorescence from nanogels, co-registered to autofluorescence indicating tissue architecture.
a mouse was anesthetized with ketamine/xylazine five hours after intravenous administration of mg/kg b lps. a jugular vein catheter was fixed in place for injection of lysozyme-dextran nanogels, anti-cd antibody, and fluorescent dextran during imaging. in preparation for exposure of the lungs, a patch of skin on the back of the mouse, around the juncture between the ribcage and the diaphragm, was denuded. while the mouse was maintained on mechanical ventilation, an incision at the juncture between the ribs and the diaphragm, towards the posterior, exposed a portion of the lungs. a coverslip affixed to a rubber o-ring was sealed to the incision by vacuum. the exposed portion of the mouse lung was placed in focus under the objective by locating autofluorescence signal in the "fitc" channel. with ms exposure, fluorescent images from channels corresponding to violet, green, near red, and far red fluorescence were sequentially acquired. a mixture of rhodamine-dextran nanogels ( . mg/kg), brilliant violet-conjugated anti-cd antibody ( . mg/kg), and alexa fluor labeled kda dextran ( mg/kg) for vascular contrast was administered via jugular vein catheter and images were recorded for minutes. images were recorded in slidebook software and opened in imagej (fiji distribution) for composition in movies with coregistration of the four fluorescent channels. all animal studies were carried out in strict accordance with guide for the care and use of laboratory animals as adopted by national institute of health and approved by university of pennsylvania institutional animal care and use committee (iacuc). male c bl/ j mice, - weeks old, were purchased from jackson laboratories. mice were maintained at - °c and on a / hour dark/light cycle with food and water ad libitum. 
ex vivo human lungs were donated from an organ procurement agency, gift of life, after determination the lungs were not suitable for transplantation into a recipient, and therefore would have been discarded if they were not used for our study. gift of life obtained the relevant permissions for research use of the discarded lungs, and in conjunction with the university of pennsylvania's institutional review board ensured that all relevant ethical standards were met. error bars indicate standard error of the mean throughout. significance was determined through paired t-test for comparison of two samples and anova for group comparisons. linear discriminant analysis and principal components analysis were completed in gnu octave scripts (adapted from https://www.bytefish.de/blog/pca_lda_with_gnu_octave/, and made available in full in the supplementary materials). findings in this study contributed to united states provisional patent application number / . raw imaging, flow cytometry, gamma counter, and spectroscopy data supporting the findings of this study are available from the corresponding author upon reasonable request. all other data supporting the findings of this study are available within the paper and its supplementary information files.
red: anti-ly g stain. green: tissue autofluorescence. (f) biodistributions of heat-inactivated, fixed, and i-labeled e. coli in naïve (n= ) and iv-lps-injured (n= ) mice tissue autofluorescence). (k) single frame from real-time intravital imaging of ldng (red) uptake in leukocytes (green) in iv-lps-inflamed pulmonary vasculature (blue, alexa fluor -dextran) biodistributions in iv-lps-injured mice for azide-functionalized liposomes conjugated to igg loaded with . , , , and dbco molecules per igg (bars further to the right correspond to more dbco per igg). (d) mouse lungs flow cytometry data indicating ly g anti-neutrophil staining density vs. levels of dbco( x)-igg liposome uptake.
(e) flow cytometry data verifying increased dbco( x)-igg liposome uptake in and specificity for neutrophils following lps insult (inset: verification of increased concentration of neutrophils in the lungs following lps key: cord- -w xaa f authors: römer, rudolf a.; römer, navodya s.; wallis, a. katrine title: flexibility and mobility of sars-cov- -related protein structures date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w xaa f the worldwide covid- pandemic has led to an unprecedented push across the whole of the scientific community to develop a potent antiviral drug and vaccine as soon as possible. existing academic, governmental and industrial institutions and companies have engaged in large-scale screening of existing drugs, in vitro, in vivo and in silico. here, we are using in silico modelling of sars-cov- drug targets, i.e. sars-cov- protein structures as deposited on the protein databank (pdb). we study their flexibility, rigidity and mobility, an important first step in trying to ascertain their dynamics for further drug-related docking studies. we are using a recent protein flexibility modelling approach, combining protein structural rigidity with possible motion consistent with chemical bonds and sterics. for example, for the sars-cov- spike protein in the open configuration, our method identifies a possible further opening and closing of the s subunit through movement of sb domain. with full structural information of this process available, docking studies with possible drug structures are then possible in silico. in our study, we present full results for the more than thus far published sars-cov- -related protein structures in the pdb. at the end of a cluster of pneumonia cases was discovered in wuhan city in china, which turned out to be caused by a novel coronavirus, sars-cov- . since then the virus has spread around the world and currently has caused over million infections with more than , deaths worldwide (july st). 
sars-cov- is the seventh coronavirus identified that causes human disease. four of these viruses cause infections similar to the common cold and three, sars-cov, mers-cov and sars-cov- , cause infections with high mortality rates. as well as affecting humans, coronaviruses also cause a number of infections in farm animals and pets, and the risk of future cross-species transmission, especially from bats, has the potential to cause future pandemics. thus, there is an urgent need to develop drugs to treat infections and a vaccine to prevent this disease. some success has already been achieved for sars-cov- with dexamethasone reducing mortality in hospitalised patients, and a number of vaccine trials are currently ongoing. the viral spike protein is of particular interest from a drug- and vaccine-development perspective due to its involvement in recognition and fusion of the virus with its host cell. the spike protein is a heavily glycosylated homotrimer, anchored in the viral membrane. it projects from the membrane, giving the virus its characteristic crown-like shape. the ectodomain of each monomer consists of an n-terminal subunit, s , comprising two domains, s a and s b , followed by an s subunit forming a stalk-like structure. each monomer has a single membrane-spanning segment and a short c-terminal cytoplasmic tail. the s subunit is involved in recognition of the human receptor, ace . this subunit has a closed or down configuration where all the domains pack together with their partners from adjacent polypeptides. however, in order for recognition and binding to ace to take place, one of the three s b domains dissociates from its partners and tilts upwards into the open or up configuration. binding of ace to the open conformation leads to proteolytic cleavage of the spike polypeptide between s and s . s then promotes fusion with the host cell membrane, leading to viral entry. drugs that target the spike protein thus have the potential to prevent infection of host cells. 
the main protease of the virus (m pro ) is another important drug target. m pro is responsible for much of the proteolytic cleavage required to obtain the functional proteins needed for viral replication and transcription. these proteins are synthesised in the form of polyproteins which are cleaved to obtain the mature proteins. m pro is active in its dimeric form but the sars-cov m pro is found as a mixture of monomers and dimers in solution. sars-cov m pro has been crystallised in different, ph-dependent conformations suggesting flexibility, particularly around the active site. md simulations support this flexibility. the protease is highly conserved in all coronaviruses so targeting either dimerization or enzymatic activity may give rise to drugs that can target multiple coronaviruses, known and yet unknown. since the discovery of sars-cov- , a plethora of structures have been determined including m pro and the ectodomain of the spike protein, as well as other potential drug and vaccine targets. these structures provide the opportunity for rational drug design using computational biology to identify candidates and optimise lead compounds. however, crystal structures only provide a static picture of proteins, whereas proteins are dynamic and this property is often important in drug development. for example, agonists and antagonists often bind different conformations of g protein-coupled receptors. flexibility also affects thermodynamic properties of drug binding, yet the ability to assess flexibility is often hampered by the long computational times needed for md simulations. we use a recent protein flexibility modeling approach, combining methods for deconstructing a protein structure into a network of rigid and flexible units (first) with a method that explores the elastic modes of motion of this network, and a geometric modeling of flexible motion (froda). 
the usefulness of this approach has recently been shown in a variety of systems. methods similar in spirit, exploring flexible motion using geometric simulations biased along "easy" directions, have also been implemented using frodan and nmsim. we have performed our analysis through multiple conformational steps starting from the crystal structures of sars-cov- -related proteins as currently deposited in the pdb. this results in a comprehensive overview of rigidity and the possible flexible motion trajectories of each protein. we emphasize that these trajectories do not represent actual stochastic motion in a thermal bath as a function of time, but rather the possibility of motion along a set of most relevant low-frequency elastic modes. each trajectory leads to a gradual shift of the protein from the starting structure and this shift may reach an asymptote, where no further motion is possible along the initial vector, as a result of steric constraints. energies associated with such a trajectory for bonds, angles, electrostatics, and so forth, have been estimated in previous studies of other proteins and shown to be consistent and physically plausible. computing times for the method vary with the size of the proteins, but range from minutes to a few hours and are certainly much shorter than those of thermodynamically equilibrated md simulations. the approach hence offers the possibility of large-scale screening of protein mobilities. we have downloaded protein structure files as deposited on the protein data bank, including all pdb codes that came up when using "sars-cov- " and "covid- " as search terms, as well as minor variations in spelling. in our first search of april th, , this gave protein structures. a second and third search on may th and th, , respectively, resulted in a total of structures. 
in addition, we included a few protein structures from links provided in selected publications to give a grand total of protein structures included in this study. many of the structures found, as outlined above, have been deposited on the pdb in dimer or trimer forms. hence one has the choice to study the rigidity and motion of the monomer or the dimer/trimer. clearly, the computational effort for a dimer/trimer is much larger than for a monomer since, in addition to the intra-monomer bond network, inter-monomer bonds also need to be taken into account. furthermore, it is not necessarily clear whether the possible motion of a monomer or dimer should be computed to have the most biological relevance. we have computed the motion of full dimer and trimer structures only for a few selected, biologically most relevant structures, while the default results concentrate on the monomers. nevertheless, we wish to emphasize that when results for a certain monomer exist, it should nearly always be possible to also obtain their motion in dimer/trimer configuration. for some protein structures, we have found that steric clashes were present in the pdb structures that made a flexibility analysis, and sometimes even just the rigidity analysis, impossible. usually, this is due to a low crystal resolution. a list of all current protein pdb ids included in this work is given in table s . in figs. and we show examples of different rigidity patterns that emerge from the first analysis. in line with previous studies comparing these rigidity patterns across various protein families, we find that they can be classified into about four types. in fig. (a) we see that for the crystal structure of sars-cov- nucleocapsid protein n-terminal rna binding domain (pdb: m m), the largest rigid cluster in the pristine structure, i.e. at e cut = , largely remains rigid through the dilution process of consecutively lowering e cut values. 
when bonds are opened the rigid cluster shrinks but the newly independent parts are flexible and not part of any new large rigid cluster themselves. only very small parts of the protein chain break to form their own independent rigid structures before at a certain e cut the whole protein is essentially flexible. we shall denote such a behaviour as brick-like. in contrast, in fig. (b) we observe that for chain a of the co-factor complex of nsp and the c-terminal domain of nsp , which forms part of the rna synthesis machinery from sars cov- (pdb: wiq), already the crystal structure has fallen into three independent rigid structures. opening bonds, we find that the largest rigid cluster breaks into rigid structures. then these now four domains retain their character upon opening more and more bonds until they simply dissolve into full flexibility. gives chain a of the co-factor complex of nsp and the c-terminal domain of nsp from sars cov- (pdb: wiq). different rigid clusters of the polypeptide chain appear as identically coloured blocks along the protein chain with each c α labelled from its n-terminal at to its c-terminal. when the energy cutoff e cut decreases (left-most column, downward direction towards larger, negative e cut magnitudes), rigid clusters break up and more of the chain becomes flexible. the colour coding shows which atoms belong to which rigid cluster. flexible regions appear as black thin lines. the second column on the left indicates the mean number r of bonded neighbours per atom as e cut changes. we note that the e cut scale is not linear since new rows are only added when the rigidity of a structure has changed. while brick- and domain-like behaviour is found only occasionally, more prevalent are two further types of rigidity dissolution. in fig. (a) we see that for, e.g. 
the monomer of the crystal structure of covid- main protease (m pro ) in apo form (pdb: m ), the rigid cluster dominating the crystal structure quickly falls apart upon change in e cut with five newly formed independent rigid clusters emerging towards the n-terminal of the protein chain. these new clusters remain stable to the opening of further bonds, even when the remnants of the original rigid cluster have become fully flexible. such a behaviour relies on a certain critical e cut value to dominate the rigidity dissolution and is reminiscent of so-called first-order phase transitions in statistical physics. hence this rigidity type is usually denoted as st order. the behaviour seen in (b) for the crystal structure of the complex resulting from the reaction between the sars-cov main protease and tert-butyl here, there are many values of e cut where large parts of the original rigid structure break off one after another so that towards the end of the bond-opening process, the original cluster is still present, but no longer dominates the rigidity pattern. this behaviour is called nd order. in table s , we have indicated the classification for each protein structure into the four classes. this classification is not perfect and there are sometimes intermediate rigidity patterns. nevertheless, this rigidity classification already provides a first insight into the possible flexibility and range of motion for each structure. it should be clear that a brick-like rigidity should show the least flexibility until it dissolves completely. on the other hand, for domain-like structures, one can expect possible intra-domain motion while inter-domain motion might be harder to spot. similarly, for a st-order rigidity, one would expect little dynamic mobility until the "transition" value for e cut has been reached, although high levels of flexibility should be possible afterwards. 
last, a protein with nd-order rigidity should have the most complex behaviour in terms of flexibility since new possible mobility can be expected throughout the range of e cut values. for each value of e cut , the first analysis has provided us with a map of rigid and flexible regions in a given crystal structure. we can now translate this into propensity of motion by allowing the flexible parts to move perturbatively, subject to full bonding and steric constraints (see methods section). moving along directions proposed by an elastic normal mode analysis of the crystal structure, we can therefore construct possible motion trajectories that are fully consistent with the bond network and steric constraints. each trajectory corresponds to one such normal mode, denoted m up to m for the first low-frequency non-trivial such modes, as well as the chosen e cut value. generally, a larger value of |e cut | implies less rigidity and results in larger scale flexible motion. in fig. we give examples of such motion trajectories. fig. (a) shows a monomer of the sars-cov- spike glycoprotein (closed state) (pdb code: vxx ). we can see that there is a good range of motion from the crystal structure when following the normal modes into either positive or negative changes along the mode vector. fig. (b) shows the motion for the dimer structure of the sars-cov- main protease (pdb code: lu , in complex with inhibitor n ) . again, one can see considerable motion, although due to the complexity of the structure, it is difficult to distinguish individual movement patterns from such a frozen image. much better insight can be gained when watching the full range of motion as a movie. we have therefore included movies for all figures as supplemental materials. we have also made a dedicated web download page where movies for the complete set of pdb codes as given in table s are publicly available. 
the movies are being offered for various modes m to m as well as e cut values, at least containing results for e cut = kcal/mol, kcal/mol and kcal/mol, respectively. in addition, the site includes the rigidity resolutions discussed above and all intermediate structures needed to make the movies. this allows for the calculation of relative distances and other such structural measures along each motion trajectory as desired. for a detailed analysis of the sars-cov- main protease with similar methods, we refer the reader to ref. . as discussed above, the trans-membrane spike glycoprotein mediates entry into host cells. as such, it is a "main target for neutralizing antibodies upon infection and the focus of therapeutic and vaccine design". it forms homotrimers protruding from the viral surface. up to the end of may , structures of the trans-membrane spike glycoprotein have been deposited in the pdb. these transmission electron cryomicroscopy (cryo-em) studies have led to structures with pdb codes vsb , vxx and vyb now being available. with rms resolution of . Å, vsb has a slightly lower resolution than vxx at . Å and vyb at . Å. in the following, we shall discuss the resulting rigidity and flexibility properties of these three structures in their full trimer form. results for individual monomer rigidities, as given in fig. (a) , are also available at ref. . the rigidity pattern of the homotrimer for this structure (pdb: vsb, . kda, atoms, residues) is dominated by a large rigid cluster encompassing most of the trimer structure except for a region from roughly residue to residue . this indeed corresponds to the s b domain of s in each monomer. in the trimer configuration, it is known that one of the s b domains can change from the closed to an open configuration. at |e cut | = . kcal/mol, the vsb structure of the trimer breaks into many different rigid parts, but the original large cluster remains a dominating feature across the rigidity dilution plot. 
a motion analysis does not run to completion but fails with bad sterics, apparently due to the comparably low resolution of the crystal structure. for the closed state structure ( vxx, . kda, atom count: atoms, residues) of the sars-cov- spike glycoprotein, the rigidity pattern is again "brick"-like and the whole of the crystal structure is part of a single cluster. at |e cut | = . kcal/mol, there is a first order break of the large rigid cluster into dozens of smaller rigid units. nevertheless, the original cluster retains a good presence in each chain of the trimer. in terms of motion, we are now able to produce motion studies for the full set of modes m to m and various e cut 's. in fig. we show motion results, using again the structures for the extreme ranges of the motion as in fig. . in addition, we are showing side, cf. fig. (a) and top, cf. fig. (b) , views similar to ref. . looking at the whole range of modes and e cut 's computed, we find that motion is very reminiscent of the vibrational excitations of a rigid cone or cylinder. there is twist motion around the central axis, bending of the trimer along the central axis, twist motion of chains relative to the remaining chain (cf. fig. (b) ), etc. already at m (with |e cut | = kcal/mol), large scale motion has stopped and one only observes smaller scale motion in flexible parts of the trimer chains. overall, this behaviour is very consistent with the elastic behaviour of a "closed" structure similar to a cone or cylinder. we now turn to the last sars-cov- spike ectodomain structure ( vyb, . kda, atoms, residues). this structure is in the open configuration similar to the conformation seen in vsb. the structure is shown in fig. , again in side and top view. from the rigidity plots we find, in addition to the largest rigid cluster, already for the crystal structure at |e cut | = kcal/mol a second rigid cluster, also roughly spanning from residue to . 
again, this region identifies the s b domain as in the vsb structure. compared to the closed state ( vxx), we see that this cluster has more internal structure, i.e. consists of more flexible parts in the s b region from to and also seems to fall apart upon further changing e cut . this suggests that it is more flexible as a whole. upon motion simulation, we observe a very high mobility in that prominent s b subdomain. starting from the crystal structure, we find for |e cut | = kcal/mol a clear further opening towards a negative froda mode, m , while in the other, positive direction of m , the structure can again close the trimer. the distance range of the motion can be expressed as follows: the distance from residue in the middle of the central β -sheet of the s b to the most opened conformation is Å while the distance to the same residue in the most closed conformation is Å. the distance from open to closed is Å. hence the motion simulation adds additional insight into the distinction between the open ( vyb) and the closed ( vxx) structures while also showing that a transition from open to closed is indeed possible. as stated in ref. , this interplay of closing and opening is expected to be central to the viral entry into the human cell. sars-cov- infectivity is dependent on binding of the spike protein to ace . this binding is only possible when the spike protein is in the open conformation. structures of both the open and closed conformation have already been determined and the flexibility of the s b domain inferred from these static structures. the spike trimer consists of almost amino acids and hence is not an easy target for dynamics simulations due to its size. in the open structures ( vsb and vyb) the s b domain is clearly identifiable in the rigidity analysis as a separate cluster to the rest of the trimer. this shows that this domain has increased flexibility. 
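the open-to-closed distances quoted for the s b domain are displacements of one c α atom between conformations. a minimal sketch of such a measurement is given here, assuming the intermediate structures are ordinary fixed-column pdb files. the helper names, the residue number 100, chain "A" and all coordinates are hypothetical placeholders chosen for illustration; they are not values from this study.

```python
import math


def make_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format one fixed-column PDB ATOM record (synthetic data for this sketch)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, x, y, z)


def ca_position(pdb_text, chain, resseq):
    """Return (x, y, z) of the C-alpha atom of one residue in a PDB-format string."""
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # PDB columns: atom name 13-16, chain 22, residue number 23-26, x/y/z 31-54
        if (line[12:16].strip() == "CA" and line[21] == chain
                and int(line[22:26]) == resseq):
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise ValueError("no CA atom for chain %s residue %d" % (chain, resseq))


def ca_displacement(pdb_a, pdb_b, chain, resseq):
    """Distance (in Angstrom) moved by one C-alpha atom between two conformations."""
    return math.dist(ca_position(pdb_a, chain, resseq),
                     ca_position(pdb_b, chain, resseq))


# Hypothetical single-atom "conformations": crystal, most open, most closed.
crystal = make_atom_line(1, "CA", "GLY", "A", 100, 0.0, 0.0, 0.0)
most_open = make_atom_line(1, "CA", "GLY", "A", 100, 3.0, 4.0, 0.0)
most_closed = make_atom_line(1, "CA", "GLY", "A", 100, 0.0, 0.0, -12.0)

crystal_to_open = ca_displacement(crystal, most_open, "A", 100)
crystal_to_closed = ca_displacement(crystal, most_closed, "A", 100)
open_to_closed = ca_displacement(most_open, most_closed, "A", 100)
print(crystal_to_open, crystal_to_closed, open_to_closed)
```

for real trajectory frames, the same functions would be applied to the text of two downloaded intermediate structures, provided both share one coordinate frame and the same chain and residue numbering.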
we can clearly observe the hinge movement of the s b domain in the open configuration ( vyb) with the s b domain moving back into the closed configuration. the range of movement from the most opened to the most closed conformation can be measured to be quite large and the flexibility within the s b domain itself during the hinge movement is also seen to be considerable. all these findings suggest that the s b domain of the spike protein has the necessary flexibility to attach itself readily to ace . however, when starting from the closed structure ( vxx), we do not see an opening. this suggests possibly stronger bonds and steric constraints which need to be overcome before the structure is able to open up. nevertheless, to our knowledge this is the first time the hinge motion of the s b domain has been predicted solely based on the dynamics of a structure. in principle, the full structural information provided in our download site we start the rigidity, flexibility and mobility modelling in each case with a given protein crystal structure file in pdb .pdb format. hydrogen atoms absent from the pdb x-ray crystal structures are added using the software reduce . alternate conformations that might be present in the protein structure file are removed if needed and the hydrogen atoms renumbered in pymol. we find that for some protein structures the addition of hydrogen atoms is not possible without steric clashes. consequently, identification of a viable structure and its continued analysis is not possible. these proteins are labelled in table s . for the remaining proteins we produce the 'rigidity dilution' or rigid cluster decomposition (rcd) plot using first. the plots show the dependence of the protein rigidity on an energy cutoff parameter, e cut < . it parametrizes a bonding range cutoff based on a mayo potential , such that larger (negative) values of e cut correspond to more open bonds, i.e. a smaller set of hydrogen bonds to be included in the rigidity analysis. 
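the role of e cut in the rigidity dilution can be illustrated with a small sketch. it assumes, as described above, that a hydrogen bond enters the rigidity analysis only if its energy is at or below the cutoff, so that more negative cutoffs retain fewer bonds. the bond list, the energies and the cutoff values are invented for illustration and are not taken from first or from this study; only hydrogen bonds are modelled here.

```python
from typing import List, NamedTuple, Tuple


class HBond(NamedTuple):
    donor: str
    acceptor: str
    energy: float  # kcal/mol; more negative means a stronger bond


def retained_bonds(hbonds: List[HBond], e_cut: float) -> List[HBond]:
    """Keep only hydrogen bonds at least as strong as the cutoff (energy <= e_cut)."""
    return [b for b in hbonds if b.energy <= e_cut]


def dilution_table(hbonds: List[HBond],
                   cutoffs: List[float]) -> List[Tuple[float, int]]:
    """Number of retained bonds for each cutoff, from weakest to strongest cutoff."""
    return [(e, len(retained_bonds(hbonds, e)))
            for e in sorted(cutoffs, reverse=True)]


# Invented example network (labels and energies are placeholders).
EXAMPLE_BONDS = [
    HBond("res12:N", "res8:O", -0.3),
    HBond("res13:N", "res9:O", -0.8),
    HBond("res40:N", "res61:O", -1.5),
    HBond("res41:N", "res60:O", -2.4),
    HBond("res77:N", "res73:O", -4.0),
]

table = dilution_table(EXAMPLE_BONDS, [0.0, -1.0, -2.0, -3.0])
for e_cut, n_bonds in table:
    print("e_cut = %5.1f kcal/mol -> %d hydrogen bonds retained" % (e_cut, n_bonds))
```

each row of a rigidity dilution plot corresponds to one such reduced bond set, on which the rigid cluster decomposition is then recomputed.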
elastic network modes. we obtain the normal modes of motion using elastic network modelling (enm) implemented in the elnemo software. this generates a set of elastic eigenmodes and associated eigenfrequencies for each protein. the low-frequency modes are expected to have the largest motion amplitudes and thus be most significant for large conformational changes. the six lowest-frequency modes (modes - ) are just trivial combinations of rigid-body translations and rotations of the entire protein. here we consider the six lowest-frequency non-trivial modes, that is, modes - for each protein. we will denote these modes as m , m , . . ., m . the modes are next used as starting directions for a geometric simulation, implemented in the froda module within first. this explores the flexible motion available to a protein within a given pattern of rigidity and flexibility. froda then reapplies bonding and steric constraints to produce an acceptable new conformation. since the displacement from one conformation to the next is typically small, we record only every th conformation. the computation typically continues for several thousand conformations. a mode run is considered complete when no further projection along the mode eigenvector is possible (due to steric clashes or bonding constraints). this manifests itself in slow generation of new conformations. we have performed froda mobility simulations for each protein at several selected values of e cut . this allows us to study each protein at different stages of its bond network, roughly corresponding to different environmental conditions, such as different temperatures as well as different solution environments. in a previous publication, we discussed the criteria for a robust selection of e cut . ideally, for each protein structure, a bespoke set of e cut values should be found, with the rcd plots providing good guidance on which e cut values to select. 
clearly, for a large-scale study as presented here, this is not readily possible due to time constraints. instead, we have chosen e cut = − kcal/mol, − kcal/mol and − kcal/mol for each protein. these values have been used before in a multi-domain protein with kdalton and shown to reproduce well the behaviour of (i) a mostly rigid protein at e cut = − kcal/mol, (ii) a protein with large flexible substructures/domains at e cut = − kcal/mol and (iii) a protein with mostly flexible parts connecting smaller sized rigid subunits at e cut = − kcal/mol. in addition, we have also performed the analysis at other values of e cut when upon inspection of the rcd plots it was seen that the standard values (i)-(iii) would not be sufficient. the exact values used are given in table s . we emphasize that these trajectories do not represent actual stochastic motion in a thermal bath as a function of time, but rather the possibility of motion along the most relevant elastic modes. each trajectory leads to a gradual shift of the protein from the starting structure. this shift eventually reaches an asymptote, where no further motion is possible along the initial vector, as a result of steric constraints. energies associated with such a trajectory for bonds, angles, electrostatics, and so forth, can be estimated and shown to be consistent and physically plausible. 
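the statement that the six lowest-frequency modes are trivial rigid-body motions, followed by six non-trivial modes used as starting directions, can be checked on a toy system. the sketch below is a plain anisotropic network model written with numpy. it is not the elnemo implementation, and the helix-like c α trace, the 8 Å contact cutoff and the unit spring constant are assumptions made only for illustration.

```python
import numpy as np


def helix_coords(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Idealised helix-like C-alpha trace (toy geometry, not a real protein)."""
    idx = np.arange(n)
    t = np.deg2rad(turn_deg) * idx
    return np.column_stack((radius * np.cos(t), radius * np.sin(t), rise * idx))


def anm_hessian(coords, cutoff=8.0, k=1.0):
    """Hessian of a spring network linking all atom pairs closer than `cutoff`."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -k * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def normal_modes(coords, cutoff=8.0):
    """Eigenvalues (ascending) and eigenvectors (columns) of the network Hessian."""
    return np.linalg.eigh(anm_hessian(coords, cutoff))


coords = helix_coords(15)
evals, evecs = normal_modes(coords)

# Rigid-body translations and rotations cost no energy: six (near-)zero eigenvalues.
n_trivial = int(np.sum(np.abs(evals) < 1e-8))
# The next six columns play the role of the six lowest non-trivial modes.
nontrivial_modes = evecs[:, 6:12]
print("trivial modes:", n_trivial)
print("lowest non-trivial eigenvalues:", np.round(evals[6:12], 4))
```

in a geometric simulation of the kind described above, each column of `nontrivial_modes` would be reshaped to one displacement vector per atom and followed in both the positive and the negative direction.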
a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster who coronavirus disease (covid- ) dashboard | who coronavirus disease (covid- ) dashboard viral infections of humans: epidemiology and control vaccination against coronaviruses in domestic animals bat-borne virus diversity, spillover and emergence dexamethasone for covid- -preliminary report effect of dexamethasone in hospitalized patients with covid- -preliminary report recovery collaborative structure, function, and evolution of coronavirus spike proteins cryo-em structure of the -ncov spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein unexpected receptor functional mimicry elucidates activation of coronavirus fusion structure of mpro from sars-cov- and discovery of its inhibitors coronavirus main proteinase ( clpro) structure: basis for design of anti-sars drugs ph-dependent conformational flexibility of the sars-cov main proteinase (mpro) dimer: molecular dynamics simulations and multiple x-ray structure analyses structures of two coronavirus main proteases: implications for substrate binding and antiviral drug design targeting the dimerization of the main protease of coronaviruses: a potential broad-spectrum therapeutic strategy structure of m pro from covid- virus and discovery of its inhibitors importance of protein dynamics in the structure-based drug discovery of class a g protein-coupled receptors (gpcrs protein conformational flexibility modulates kinetics and thermodynamics of drug binding rapid simulation of protein motion: merging flexibility, rigidity and normal mode analyses protein flexibility and dynamics using constraint theory normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic conformational change of proteins arising from normal mode calculations on the potential of 
normal mode analysis for solving difficult molecular replacement problems elnemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement normal mode analysis and applications in biological physics constrained geometric simulation of diffusive motion in proteins docking of photosystem i subunit c using a constrained geometric simulation protein flexibility is key to cisplatin crosslinking in calmodulin inhibition of hiv- protease: the rigidity perspective structure and function in homodimeric enzymes: simulations of cooperative and independent functional motions the flexibility and dynamics of protein disulfide isomerase something in the way she moves': the functional significance of flexibility in the multiple roles of protein disulfide isomerase (pdi) generating stereochemically acceptable protein pathways multiscale modeling of macromolecular conformational changes combining concepts from rigidity and elastic network theory the protein data bank flex-covid data repository asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation the pymol molecular graphics visualization programme automated design of the surface positions of protein helices comparative analysis of rigidity across protein families amplitude elastic motions in proteins from a single-parameter, atomic analysis this work received funding by the cy initiative of excellence (grant "investissements d'avenir" anr- -idex- ) and developed during r.a.r.'s stay at the cy advanced studies, whose support is gratefully acknowledged. we thank warwick's scientific computing research technology platform for computing time and support. special thanks to overleaf.com for free premium access during covid- lockdown. uk research data statement: data accompanying this publication are available for download. 
rar conceived the study, nsr and rar assembled table s and identified e cut values, rar performed the computations and curated the data, akw identified biologically most relevant structures. all authors wrote and reviewed the manuscript. the authors declare no competing interests. is chain a of complex resulting from the reaction between the sars-cov main protease and a carbamate (pdb: y m). different rigid clusters of the polypeptide chain appear as identically coloured blocks along the protein chain with each c α labelled from its n-terminal at to its c-terminal. when the energy cutoff e cut decreases (left-most column, downward direction towards larger, more negative e cut magnitudes), rigid clusters break up and more of the chain becomes flexible. the colour coding shows which atoms belong to which rigid cluster. flexible regions appear as black thin lines. the second column on the left indicates the mean number r of bonded neighbours per atom as e cut changes. . colors are chosen identical to fig. . as in fig. , the arrows in each panel show the range of motion for chain b (blue shades) in (a+b) and also for chain c (reds) in (b). key: cord- - prczym authors: urra, josé miguel; ferreras-colino, elisa; contreras, marinela; cabrera, carmen m.; fernández de mera, isabel g.; villar, margarita; cabezas-cruz, alejandro; gortázar, christian; de la fuente, josé title: the antibody response to the glycan α-gal correlates with covid- disease symptoms date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: prczym the coronavirus disease (covid- ) pandemic caused by severe acute respiratory syndrome coronavirus (sars-cov- ) has affected millions of people worldwide. the characterization of the immunological mechanisms involved in disease symptomatology and protective response is important to advance in disease control and prevention. 
humans evolved by losing the capacity to synthesize the glycan galα - galβ -( ) glcnac-r (α-gal), which resulted in the development of a protective response against pathogenic viruses and other microorganisms containing this modification on membrane proteins mediated by anti-α-gal igm/igg antibodies produced in response to bacterial microbiota. in addition to anti-α-gal antibody-mediated pathogen opsonization, this glycan induces various immune mechanisms that have shown protection in animal models against infectious diseases without inflammatory responses. in this study, we hypothesized that the immune response to α-gal may contribute to the control of covid- . to address this hypothesis, we characterized the antibody response to α-gal in patients at different stages of covid- and in comparison with healthy control individuals. the results showed that while the inflammatory response and the anti-sars-cov- (spike) igg antibody titers increased, reduction in anti-α-gal ige, igm and igg antibody titers and alteration of anti-α-gal antibody isotype composition correlated with covid- severity. the results suggested that the inhibition of the α-gal-induced immune response may translate into more aggressive viremia and severe disease inflammatory symptoms. these results support the proposal of developing interventions such as probiotics based on commensal bacteria with α-gal epitopes to modify the microbiota and increase the α-gal-induced protective immune response and reduce the severity of covid- .
the coronavirus disease (covid- ), a pandemic caused by severe acute respiratory syndrome coronavirus (sars-cov- ), has rapidly evolved from an epidemic outbreak to a disease affecting the global population. sars-cov- infects human host cells by binding to the angiotensin-converting enzyme (ace ) receptor [ ] . it acute respiratory failure who needed mechanical ventilation support were admitted to a hospital icu.
the patients were discharged from the hospital due to the clinical and radiological improvement of pneumonia caused by the sars-cov- , along with the normalization of analytical parameters indicative of inflammation, such as c-reactive protein (crp), d-dimer and blood cell count (table ) . samples from asymptomatic covid- cases with positive anti-sars-cov- igg antibody titers but negative by rt-pcr (n = ) were collected in may - , and included in the analysis. samples from healthy control individuals (n = ) were collected prior to covid- pandemic in april . the use of samples and individual's data was approved by the ethical and scientific committee (university hospital of ciudad real, c- and sescam c- ). . . inflammatory biomarkers are associated with severity in covid- patients in the blood cell analysis, the icu patients showed higher lymphocytopenia, percentage and neutrophil counts when compared to hospital discharge and hospitalized individuals (p < . ; figure a and table ). the cellular and biochemical indicators of systemic inflammation, neutrophil-lymphocyte count ratio (nlr), c-reactive protein (crp) and d-dimer levels were higher in icu patients when compared to other patients (p < . ; figure a and table ). although more severe symptoms have been associated with elderly patients, herein older patients were recorded in the hospitalized and not the icu group (p < . ; table the serum iga, ige, igm and igg antibody response to α-gal was characterized in healthy individuals and covid- patients at different disease stages (figures and a) . a negative correlation was observed for ige, igm and igg between anti-α-gal antibody titers and disease severity (rs < ; p = ; figure a) . the anti-α-gal iga antibody titers did not vary between the different groups (p = . ; fig. a ) nor correlate with disease severity (rs = . ; p = . ; figure a ). for anti-α-gal igm and igg antibodies, the titers decreased from healthy to icu individuals (p < . ; figures and a) . 
however, in asymptomatic cases the anti-α-gal ige titers were higher than in healthy individuals and symptomatic covid- patients (p < . ; figure a ). in covid- patients, the ige but not igm and igg antibody titers were higher in hospitalized patients than in hospital discharge and icu cases (p < . ; figure ). the profile of anti-α-gal antibody isotypes was qualitatively compared between groups including reference values for serum immunoglobulin levels (figure b ). the results evidenced gal antibody isotypes in covid- cases that may be associated with different disease stages ( figure ). these results suggested that higher anti-α-gal ige levels in asymptomatic cases may reflect an allergic response mediated by this glycan, which reflects the trade-off associated with the immune response to α-gal that benefit humans by providing immunity to pathogen infection while increasing the risk of developing allergic reactions to this molecule [ , , ] . in healthy individuals as in hospital discharge cases, the higher representation of anti-α-gal igm and/or igg antibodies may be associated with a protective response to covid- . however, in hospitalized patients the representation of anti-α-gal antibody isotypes did not vary, which could reflect the absence of protection. finally, the higher representation of anti- α-gal iga antibodies in icu patients may be associated with the inflammatory response observed in these cases. 
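The rank correlation (rs) between antibody titers and disease severity reported above can be illustrated with a short sketch. This is a minimal, standard-library implementation of Spearman's coefficient, not the authors' analysis code; the severity scores (0 = healthy to 3 = icu) and the titer values are hypothetical numbers chosen only to show a negative trend.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical example: severity score per subject and an antibody titer (a.u.).
severity = [0, 0, 1, 1, 2, 2, 3, 3]
titer = [1.9, 2.1, 1.6, 1.7, 1.2, 1.4, 0.8, 0.9]
rs = spearman_rs(severity, titer)
print(f"rs = {rs:.3f}")  # negative: titers fall as severity rises
```

A negative rs in this sketch corresponds to the pattern described for ige, igm and igg; a value near zero would correspond to the iga result.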
in accordance with these results, it was recently shown in endogenous α-gal-negative turkeys that treatment with probiotic bacteria with high α-gal content results in protection against aspergillosis through reduction by still unknown mechanisms in the pro- based on the fact that natural antibodies against α-gal are produced in response to bacteria with this modification in the microbiota [ ], our hypothesis is that the dysbacteriosis observed in covid- patients [ ] translates into a reduction in total anti-α-gal antibody titers and alteration of anti-α-gal antibody isotype composition due to the reduction in the microbiota of α-gal-containing commensal bacteria and other still uncharacterized mechanisms ( figure ) that may be implicated in the human protection to covid- . in conclusion, according to these results and previous findings in retrovirus [ , ] , the authors declare that there is no conflict of interest regarding the publication of this paper. (university of castilla la mancha, uclm, spain) for the critical reading of the manuscript. we acknowledge uclm, spain support to grupo sabio. mc was funded by the ministerio de ciencia, innovación y universidades, spain (grant fjc- - -i). igfm was supported by the uclm. mv was supported by the uclm and the fondo europeo de desarrollo regional, feder, eu. 
a pneumonia outbreak associated with a new coronavirus of probable bat origin lymphopenia predicts disease severity of covid- : a descriptive and predictive study expression of the sars-cov- cell receptor gene ace in a wide variety of human tissues increased tnf-alpha-induced apoptosis in lymphocytes from aged humans: changes in tnf-alpha receptor expression and activation of caspases efficacy of glutathione therapy in relieving dyspnea associated with covid- pneumonia: a report of cases clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china novel coronavirus infection (covid- ) in humans: a scoping review and meta-analysis a possible probiotic (s. salivarius k ) approach to improve oral and lung microbiotas and raise defenses against sars-cov- lung microbiota in the acute respiratory disease: from coronavirus to metabolomics gut microbiota elicits a protective immune response against malaria transmission catastrophic-selection" interplay between enveloped virus epidemics, mutated genes of enzymes synthesizing carbohydrate antigens, and natural anti-carbohydrate antibodies environmental and molecular drivers of the α-gal syndrome the alpha-gal syndrome: new insights into the tick-host conflict and cooperation vaccination with alpha-gal protects against mycobacterial infection in the zebrafish model of tuberculosis regulation of the immune response to α-gal and vector-borne diseases toll-like receptor signaling induces nrf pathway activation through p -triggered keap degradation human natural antibodies to mammalian carbohydrate antigens as unsung heroes protecting against past, present, and future viral infections the exquisite corpse for the advance of science sensitivity in detection of antibodies to nucleocapsid and spike proteins of severe acute respiratory syndrome coronavirus in patients with coronavirus disease antibody detection of sars-cov spike and nucleocapsid protein structure and function of 
immunoglobulins covid- : consider cytokine storm syndromes and immunosuppression prognostic value of neutrophil-to-lymphocyte ratio in sepsis: a meta-analysis ródenas, i. selective cd cell reduction by sars- cov- is associated with a worse prognosis and systemic inflammation in covid- patients a case series describing the epidemiology and clinical characteristics of covid- infection in jilin province relationship between anti-spike protein antibody titers and sars-cov- in vitro virus neutralization in convalescent plasma gut microbiota abrogates anti-α-gal iga response in lungs and protects against experimental aspergillus infection in poultry alterations in gut microbiota of patients with covid- during time of hospitalization effect of blood type on anti-α-gal immunity and the incidence of infectious diseases the association between ixodes holocyclus tick bite reactions and red meat allergy ige antibodies to alpha-gal in the general adult population: relationship with tick bites, atopy, and cat ownership atypical food allergen or model ige hypersensitivity? curr α-gal and atherosclerosis relationship between abo blood group distribution and clinical characteristics in patients with covid- testing the association between blood type and covid- infection, intubation, and death genomewide association study of severe covid- with respiratory failure blood groups in infection and host susceptibility biological functions of fucose in mammals ige production to α-gal is accompanied by elevated levels of specific igg antibodies and low amounts of ige to blood group b a novel mechanism of retrovirus inactivation in human serum mediated by anti-alpha-galactosyl natural antibody the alpha-galactosyl epitope: a sugar coating that makes viruses and cells unpalatable immunosuppression for hyperinflammation in covid- : a double-edged sword? this work was partially supported by the consejería de educación, cultura y deportes, jccm, spain, project ccm -pic- (sbply/ / / ). 
we thank antonio mas key: cord- - p dqacd authors: lee, cheryl yi-pin; amrun, siti naqiah; chee, rhonda sin-ling; goh, yun shan; mak, tze-minn; octavia, sophie; yeo, nicholas kim-wah; chang, zi wei; tay, matthew zirui; torres-ruesta, anthony; carissimo, guillaume; poh, chek meng; fong, siew-wai; bei, wang; lee, sandy; young, barnaby edward; tan, seow-yen; leo, yee-sin; lye, david c.; lin, raymond t. p.; maurer-stroh, sebastien; lee, bernett; cheng-i, wang; renia, laurent; ng, lisa f.p. title: neutralizing antibodies from early cases of sars-cov- infection offer cross-protection against the sars-cov- d g variant date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: p dqacd the emergence of a sars-cov- variant with a point mutation in the spike (s) protein, d g, has taken precedence over the original wuhan isolate by may . with an increased infection and transmission rate, it is imperative to determine whether antibodies induced against the d isolate may cross-neutralize against the g variant. in this report, profiling of the anti-sars-cov- humoral immunity reveals similar neutralization profiles against both s protein variants, albeit waning neutralizing antibody capacity at the later phase of infection. these findings provide further insights towards the validity of current immune-based interventions. importance random mutations in the viral genome is a naturally occurring event that may lead to enhanced viral fitness and immunological resistance, while heavily impacting the validity of licensed therapeutics. a single point mutation from aspartic acid (d) to glycine (g) at position of the sars-cov- spike (s) protein, termed d g, has garnered global attention due to the observed increase in transmissibility and infection rate. 
given that a majority of the developing antibody-mediated therapies and serological assays are based on the s antigen of the original wuhan reference sequence, it is crucial to determine if humoral immunity acquired from the original sars-cov- isolate is able to induce cross-detection and cross-protection against the novel prevailing d g variant. laurent renia: infectious diseases horizontal technology centre, a*star, a biomedical grove, immunos # - , singapore . phone: (+ )- . email: renia_laurent@immunol.a-star.edu.sg
coronavirus disease is the consequence of an infection by severe acute respiratory syndrome coronavirus (sars-cov- ), which emerged in wuhan, china, in december ( ). the rapid expansion of the covid- pandemic has affected countries and territories, with a global count of more than million laboratory-confirmed human infection cases to date ( ). an inevitable impact of this pandemic is the accumulation of immunologically relevant mutations among the viral populations due to natural selection or random genetic drift, resulting in enhanced viral fitness and immunological resistance ( , ). for instance, antigenic drift was previously reported in other common cold coronaviruses, oc and e, as well as in sars-cov ( - ). in early march , a non-synonymous mutation from aspartic acid (d) to glycine (g) at position of sars-cov- spike (s) protein was identified ( ) figure a and b). all patients showed a decrease in igm response ( figure a) , and a prolonged igg response over time ( figure b) . although one recent study has demonstrated similar neutralization profiles against both d and g sars-cov- pseudoviruses, the virus clade by which the six individuals were infected with was not identified ( ). according to singapore's sars-cov- clade pattern from december till july , the d g mutation only appeared in february ( figure c ). hence, with knowledge on the d g status of a subset of covid- patients (n= infected with d , n= infected with g , n= containing all other clades: o, s, l, v, g, gh or gr; table , figure c infections. in addition, determining the level of cross-reactivity is essential for immunosurveillance, as well as to identify broadly neutralizing antibodies or epitopes ( ). here, we confirm that cross-reactivity occurs at the functional level of the humoral response on both the s protein variants. 
our results, together with the recent serological evaluation ( ), strongly suggest that existing serological assays will be able to detect both d and g clades of sars-cov- with a similar sensitivity. however, it is of clinical relevance to assess if cross-reactivity between the variants may enhance viral infection when neutralizing antibodies are present at suboptimal concentrations ( ). more importantly, further studies using monoclonal antibodies are necessary to validate the cross-reactivity profiles between both sars-cov- s variants. overall, our study shows that the d g mutation on the s protein does not impact sars-cov- neutralization by the host antibody response, nor confer viral resistance against the humoral immunity. hence, there should be negligible impact towards the efficacy of antibody-based therapies and vaccines that are currently being developed. the authors would like to thank the study participants who donated their blood samples to this project, and the healthcare workers caring for the covid- patients. the authors also wish to thank ding ying and the singapore infectious new sars-like virus in china triggers alarm insights into rna synthesis, capping, and proofreading mechanisms of sars-coronavirus making sense of mutation: what d g means for the covid- pandemic remains unclear genetic drift of human coronavirus oc spike gene during adaptive evolution analysis of human coronavirus e spike and nucleoprotein genes demonstrates genetic drift between chronologically distinct strains cross- host evolution of severe acute respiratory syndrome coronavirus in palm civet and human the d g mutation in the sars-cov- spike protein reduces s shedding and structures of human antibodies bound to sars-cov- spike reveal common epitopes and recurrent features of antibodies epidemiologic perspective on surveillance and control. 
frontiers in immunology safety and efficiency of endoscopic resection versus laparoscopic resection in gastric gastrointestinal stromal tumours: a systematic review and meta-analysis viral mutation rates models of rna virus evolution and their roles in vaccine design d g spike variant does not alter igg, igm, or iga spike seroassay performance serologic cross-reactivity of sars-cov- with endemic and seasonal betacoronaviruses. medrxiv : the preprint server for health sciences a perspective on potential antibody-dependent enhancement of sars-cov- . key: cord- -fzfl qa authors: manomaipiboon, anan; pupipatpab, sujaree; chomdee, pongsathorn; boonyapatkul, pathiporn; trakarnvanich, thananda title: the new silicone n half-piece respirator, vjr-nmu n : a novel and effective tool to prevent covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fzfl qa filter facepiece respirators (ffrs) are critical for preventing the transmission of respiratory tract infection disease, especially the dreadful coronavirus (sars-cov- ). the n mask is a prototype, high-efficiency protective device that can effectively protect against airborne pathogens of less than . μm. the n mask is tightly fitting and has high filtration capacity. the ongoing covid- pandemic has led to a greater requirement for ffr. this rising demand greatly exceeds current production capabilities and stockpiles, resulting in shortages. to address this, our team has invented a new type of half-piece respirator made from silicone and assembled with hepa or elastostatic filter. a variety of methods have been used to evaluate this new device, including a qualitative fit test with the bitrex® test kit and filtration test. the preliminary results showed that the new n respirators pass the fit test. the filtration tests also confirmed the superiority of n over traditional n masks, with a mean performance of protection greater than %. 
for the filters, we used two types: safestar, which is a kind of hepa filter; and carestar, which is considered an electrostatic filter. carestar was developed to filter viruses and bacteria in the operating room, with a maximum duration of use of h, while safestar was designed for h use and has a quality equivalent to a hepa filter. our study demonstrated superior filtration efficacy of both filters, more than % even after h of use. carestar has significantly higher filtration efficacy than safestar (p < . ). in conclusion, our new n half-piece respirator should ultimately be applicable to healthcare workers, with at least non-inferiority to the previously used n respirators. currently, an adequate supply of such equipment is not feasible. the advent of the new protective device will help protect healthcare workers and relieve the shortage of n respirators during the covid- pandemic. since the rapid spread of sars-cov- worldwide, resulting in the novel coronavirus disease pandemic, the shortage of personal protective equipment (ppe), including surgical masks and n respirators, has been a serious concern [ ] . the transmission of sars-cov- can occur by contact or by droplets released from infected persons when coughing, talking, and sneezing [ ] . airborne transmission might occur during aerosol-generating procedures, such as tracheal intubation [ ] . therefore, health care personnel (hcp), especially frontline workers, should be properly protected with appropriate equipment. filtering facepiece and n respirators are used as high-performance filtering masks to protect staff against both droplets and aerosols [ ] . to avoid cross contamination, these devices are designed to be disposable after a single use [ ] . consequently, the consumption of ffp masks has been overwhelming, and a supply shortage for hcp has already been reported in many countries [ ] .
to meet the need for ffrs during a pandemic, navamindradhiraj university has invented a new model of silicone n facepiece respirator by using the silicone mask and the hepa filter normally used in the operating room (carestar or safestar, draeger, germany) (fig ) . this filter has a minimum filtration efficiency of ≥ % against challenge aerosols consisting of particles in the most penetrating particle size range (approximately . µm). this can prevent penetration of viruses and bacteria and improves on the protective efficacy of the traditionally used n respirators. here we describe a novel silicone n respirator aimed at hcps who are in direct contact with patients, and capable of contributing to the replenishment of n masks. an important part of the silicone mask is the o-ring strap, which can be adjusted to prevent face-seal leakage (fig ) . the silicone n mask is made using silibione mm series u silicone (bluestar silicones, shanghai, china), an elastomer composed of polymethyl vinyl siloxane gums and silica. the silicone rubbers in this series are cured after the addition of a vulcanizing agent (chosen as a function of the production process). the heat cure is done after the addition of an organic peroxide compound, with post-curing at °c after vulcanization. this series includes four products that differ by their hardness once cured: , , , and types (data in the supplemental file). once processed, the silibione mm series u are intended for food contact and biomedical applications. the advantages of the silibione mm series u include easy processing, high transparency, and excellent mechanical properties (including high tear strength and a good compromise between tear strength and compression set). the silibione mm series u is also very resistant to oxidizing agents for all sterilization modes, and is chemically inert.
moreover, samples of the silicone mm series have been subjected to migration tests in accordance with european and american regulations. the masks are made in three sizes: small, medium, and large. we assembled the silicone mask with a hepa filter (carestar® electrostatic filter), which is normally used together with a ventilator to filter airborne bacteria and viruses ( fig ) . the efficiency of the silicone mask is based on the efficiency of the air filter, which is considered a high-performing electrostatic filtration medium (fig ) . as required by the occupational safety and health administration (osha) [ ] , half-piece respirators need to pass the fit test to identify those individuals who do not achieve the fit necessary for adequate protection. the performance of this respirator was also determined by measuring percent leakage under constant airflow [ ] . the purposes of this study were to evaluate the fitting characteristics of our newly invented silicone n respirator and its performance against -µm particles using nacl aerosols. the study was conducted to achieve two specific research objectives: ( ) to investigate the fit characteristics of the novel silicone mask and whether the strap adjustment can help reduce the face-seal leakage; and ( ) to determine the level of performance by measuring the inward leakage of the generated aerosols with a new filter and a used filter (for up to h). participants were healthcare workers at vajira hospital, bangkok, thailand. the vajira institutional review board, faculty of medicine, vajira hospital, navamindradhiraj university, approved the protocol, and all participants provided informed consent. the inclusion criteria were: healthy volunteers, to years old. the major exclusion criteria were: contraindications to fit tests, such as asthma, congestive heart failure, anosmia, and ageusia. forty-three people ( males, females; mean age . ± . years) participated in this study.
we excluded one male due to test intolerance (difficulty in breathing while putting on the hood cover). one female was excluded because of a failed sensitivity test (no taste after squeezes of sensitivity test solution). the remaining people ( males and females) chose the proper size of silicone respirator to perform the qualitative fit testing. the newly invented silicone half-piece respirators were tested. the configuration and model have been described above. there are three available sizes: small, medium, and large. none of the models tested in this study had exhalation valves. the carestar filter is an electrostatic filter, and safestar is a high-performance hepa filter that retains at least . % of bacteria and viruses, with high hydrophobicity and a transparent housing for visual control. a qualitative fit test was performed with bitrex solution aerosol. the protocol was conducted in accordance with the osha respiratory protection standard [ ] , including the number, type, and duration of the exercises, and the seal checks were performed in accordance with the manufacturer's instructions [ ] . the bitrex test uses a person's ability to taste a bitter solution to determine whether a respirator fits properly [ ] . each subject was given a taste-threshold screening test prior to each fit test to ensure that he or she could taste the solution. this process was done without the subject wearing a respirator. after passing the sensitivity test, the subject proceeded through the seven steps of the fit test, as follows: breathing normally; breathing deeply; turning the head side to side; moving the head up and down; bending over at the waist, jogging; talking; breathing normally. each step took s, and the bitrex solution was refilled into the hood every s, with half the dosage of the previous test. we asked the participant if he/she could taste the solution during each step of the test. the test was recorded as pass or fail.
we chose the qualitative fit test because it is more widely used [ ] , simpler to use, easy to transport [ ] , faster to perform, and cheaper to set up than the quantitative fit test [ ] . whenever the test failed, we would adjust the strap behind the respirator to tighten the face seal and repeat the procedure. we would repeat the test twice before considering the test a failure. finally, we took note of the collected data, such as gender, age, size of the respirators, number of sprays, threshold level, and test result (pass, fail). the real-time respirator performance test method was developed using a mt- u machine (sibata model, saitama, japan), which measures particle concentrations of . - . µm diameter using a particle generator and a laser-beam-scattering particle counter that measures particles outside and inside the mask. the ratio between these two values is considered the percentage leak, as follows: $\%\,\text{leak} = \dfrac{\text{particles inside the filter}}{\text{particles outside the filter}} \times 100$ ( ) the percentage filter performance was calculated as $100 - \%\,\text{leak}$. we tested at an airflow rate of l/min, which is - times higher than normal physiology, assuming that there should be no leakage through the filter. we tested two types of filters, carestar and safestar, that were incorporated into the silicone mask. carestar is limited to -h usage, and safestar is for -h duration. we designed the filter test as shown in fig . we put the filter on the airflow generator, which generates an airflow of about l/min. the nacl aerosol condensation was generated by a mt- machine and passed through the cannula with a concentration of at least particles/cc (the minimum level needed to conduct the test; in this study the average was particles/cc) into the test chamber (green tube). we then measured the real-time percentage leak by counting the particles inside the filter compared to outside. we included at least persons.
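The leak and performance calculation described above can be sketched as follows. This is an illustrative sketch only: it takes the percentage leak as the ratio of the particle count inside the filter to the count outside, as the text describes, and the function names and the example counts are hypothetical rather than values from the study.

```python
def percent_leak(particles_inside: float, particles_outside: float) -> float:
    """Percentage leak: inside count as a percentage of the outside count."""
    if particles_outside <= 0:
        raise ValueError("outside particle count must be positive")
    if particles_inside < 0:
        raise ValueError("inside particle count cannot be negative")
    return particles_inside / particles_outside * 100.0


def filter_performance(particles_inside: float, particles_outside: float) -> float:
    """Percentage filter performance, defined as 100 minus the percentage leak."""
    return 100.0 - percent_leak(particles_inside, particles_outside)


# Hypothetical counts (particles/cc) read from the counter on each side of the filter.
inside, outside = 12.0, 2400.0
print(f"leak: {percent_leak(inside, outside):.2f} %")
print(f"performance: {filter_performance(inside, outside):.2f} %")
```

Rejecting a non-positive outside count keeps the ratio well defined, which mirrors the requirement that a minimum aerosol concentration be present before the test is run.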
we calculated the sample size using the test for non-inferiority for two dependent means as follows: where ∆ is the difference of mean of the population, δ is the statistically significant value, and δ^ is the variance. since there are no previous studies to cite for reference, we used the g*power version . . . program to compute the sample size for the filter performance test. two groups were compared using a paired t-test, where α = . (one-tailed) and the power of the test is %. we set the effect size to . (medium effect size) [ ] . we used the panel n respirators, which had been used for to h, to test filtration performance relative to a control (new filter) respirator. the data analysis was performed using spss version . , and microsoft office excel was used for presenting the demographic data and respirator size. descriptive statistics were calculated. the normality of the data was tested using the shapiro-wilk test. the fit-test passing rates (i.e., the number of subjects passing each fit test divided by the total number of subjects performing that fit test) were calculated. filter penetration data were reported as mean ± sd and compared to a specified target protection value of % in our respirator model. table provides a summary of the baseline demographics of all participants. forty-three subjects entered the study ( male, female). we excluded two persons due to test intolerance (fatigue during the test) and insensitive taste. the remaining subjects ( male, female), mean age . ± . years old, used three sizes of n respirators: small (n = ), medium (n = ), large (n = ) ( table ). table shows the filter penetration of two filter types (safestar and carestar, draeger) [ ] . we tested the baseline filter performance (first use) and filters that had been used for between and h. all of the filters had at least % protection (mean, . ± . for safestar, and . ± . for carestar, p < . ). the safestar filter had better efficiency than carestar (table ). 
even with a use time of up to h, the protection value remained > % (fig ) . it is widely accepted that wearing face masks in public can help to prevent inter-human transmission of sars-cov [ ] . healthcare personnel (hcp) who remain at risk of covid- should follow appropriate infection control procedures. the required level of protection depends on the setting and on the modes of viral transmission that are relevant there. filtering facepiece respirators are essential devices to protect hcw from bioaerosol particles [ , ] . according to the national institute for occupational safety and health (niosh) regulations cfr , the n respirators are recommended for personal protection from exposure to respiratory aerosol particles [ ] . however, due to the covid- pandemic, there is a global supply shortage for the most exposed persons, hcws. this underlines the urgent need to replenish the masks using a variety of available materials. our team has invented a novel n half-piece respirator by combining a silicone facemask (as used in operating rooms) with an electrostatic filter plug (carestar or safestar, draeger, lubeck, germany). we developed an o-ring strap made from silicone, similar to the silicone mask, to tighten the mask to the face. the strap consists of three parts: the silicone strap; the strap locker to adjust the length (made from polypropylene); and a -way hook (made from silicone, to connect the strap with the mask). by adjusting the o-ring straps, almost all participants passed the fit test. this shows a good level of protection. adjustable head straps allow a better-customized seal because they can be tightened to better secure the respirator. at first, three different sizes of the face pieces were produced. the majority of our participants used the medium size. it should be noted that fit testing is just one factor that determines the level of protection provided by a respirator. other tests, such as filter penetration, should also be done. 
according to the specification of the electrostatic filter (carestar or safestar), the estimated filtration efficiency should be more than % after testing with the sibata machine. the new filter had a filtration efficiency of more than %. we then evaluated the filters after use ( to h) since the filter efficacy might be hampered by humidity and the formation of a biofilm layer over the filter surface. the test protocol that we adapted measures the percent leakage through the filters by counting the aerosol particles generated from sodium chloride. the percent leakage was less than %. thus, we can confirm that the efficiency of our new respirators was comparable to that of the n type of respirator and that they can be used for at least up to h before changing the filter. a limitation of this study was the fit test method, which was qualitative and should be confirmed with a quantitative fit test , . the silicone mask still has some drawbacks, such as the hardness of the material, which can be uncomfortable for the wearer. those who wear eyeglasses might have difficulty placing their eyeglasses on the bridge of the silicone mask. the levels of protection afforded by the respiratory protective devices in this study might not be representative of all respirator wearers, who might have different facial size distributions than those of the subjects in this study. the filters that we tested had limited use times (up to h), while their efficiency remained excellent. further tests should be performed for a filter that has been used for more than h. the filtration test was designed with the equipment available in the unit because of the urgency of the search for alternative protective devices. despite the limitations of our study, a strength of this innovation is that the materials are available and the efficacy is acceptable, and even superior to the long-used n masks. here we report the efficacy of our innovation, a silicone n mask that we have named vjr-nmu n half-piece respirators. 
the vjr-nmu n respirators surpassed the expected levels of protection, and can be useful in the context of a global shortage of ppe. however, this is the first version of our masks, and further modifications are needed to improve user friendliness and provide adequate protection. we believe that the findings of this study will contribute to the provision of safe and superior healthcare services for hcw, and that the vjr-nmu n respirators can help to replenish the shortage of essential healthcare worker protection. world health organization. rational use of personal protective equipment for coronavirus disease (covid- ) and considerations during severe shortages . geneva: world health organisation mild or moderate covid- the role of particle size in aerosolized pathogen transmission: a review respiratory virus shedding in exhaled breath and efficacy of face masks infection prevention and control during health care when covid- is suspected: interim guidance. geneva: world health organization covid- ) advice for the public: when and how to use masks. geveva: world health organization occupational safety and health administration (osha). cfr parts and respiratory protection: final rule centre for disease control and prevention. laboratory performance evaluation of n filtering facepiece respitators respiratory protection program standards-fit testing procedures (mandatory) safety data sheets (sds) of bitrex sensitivity and fit test solutions evaluation of the bitrex qualitative fit test method using n filtering-facepiece respirator respirators fit testing. m occupational health and environmental safety division (oh&osd) can homemade fit testing solutions be as effective as commercial products? 
healthc infect development of a new qualitative test for fit testing respirators statistical power analysis for the behavioral sciences identifying airborne transmission as the dominant route for the spread of covid- surgical mask vs n respirator for preventing influenza among health care workers: a randomized trial n respirators vs medical masks for preventing influenza among health care personnel: a randomized clinical trial national institute for occupational safety and health (niosh). niosh respirator selection logic key: cord- -d xnj authors: aktas, emre title: bioinformatic analysis reveals that some mutations may affect on both spike structure damage and ligand binding site date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d xnj there are some mutations are known related to sars-cov- . together with these mutations known, i tried to show other newly mutations regionally. according to my results which whole sequences are used, i found that some mutations occur only in a certain region, while some other mutations are found in each regions. especially in asia, more than one mutation(three different mutations are found in qla isolated from south korea) was seen in the same sequence. although i detected a huge number of mutations (more than seventy in asia) by regions, some of them were predicted that damage spike’s protein structure by using bioinformatic tools.the predicted results are g v(isolated from north america), t i(isolated from south korea), g v(isolated from north america), m i(isolated asia), l m(isolated from asia), p h(isolated from asia), t p(isolated from europe), p s(isolated from asia), d g(isolated from all regions) respectively. also, in this study, i tried to show how possible binding sites of ligands change if the spike protein structure is damaged and whether more than one mutation affects ligand binding was estimated using bioinformatics tools. 
interestingly, mutations that were predicted to damage the structure do not affect ligand binding sites, whereas ligand binding sites were affected in sequences with multiple mutations. focusing on mutations may open a window to exploit new therapeutic targets in the future. in the last two decades, mankind has come across more than one lethal outbreak caused by betacoronaviruses [ ] [ ] [ ] [ ] [ ] . the first was severe acute respiratory syndrome coronavirus (sars-cov) in , which infected over , people, nearly of whom died [ ] . in , middle east respiratory syndrome coronavirus (mers-cov) followed and resulted in , cases [ ] . the latest, sars-cov- , is the cause of the severe respiratory disease covid- [ ] . the first case was reported in china at the end of december [ ] and triggered an epidemic that quickly spread across the whole world and resulted in a pandemic [ ] . the world health organization (who) situation report reads: over million confirmed cases and over deaths (who situation report number , july ) . information about viral mutations (covid- ) is very important for assessing viral drug resistance, immune escape and pathogenesis-related mechanisms; moreover, it may play a vital role in designing new vaccines, antiviral drugs and diagnostic assays. however, the mutagenic process is complex and many factors play a role in it, such as nucleic acid replication with little or no proofreading capability and/or post-replicative nucleic acid repair, host enzymes, spontaneous nucleic acid damage due to physical and chemical mutagens, recombination events and also particular genetic elements [ , ] . in addition, a combination of factors is thought to make covid- so dangerous. humanity has no direct immunological experience with sars-cov- , which leaves us prone to infection and to some other diseases. it is highly transmissible from person to person, and it has a high mortality rate. 
the commonly reported range is . - . , and the range of deaths per confirmed case is . - % [ , ] (mortality analyses, johns hopkins university of medicine). the rapid global spread of covid- may provide the virus with plentiful opportunity for natural selection to act on mutations. this may resemble the case of influenza (where mutations slowly accumulate in the hemagglutinin protein during a flu season), and there is a complex interplay between mutations that can confer immune resistance to the virus and the fitness landscape of the particular variant in which they arise [ ] . sars-cov- , whose mutation rate is remarkably high and many of whose variants have already been characterized, has been shown to have gone through certain mutations in both its structural and non-structural proteins within several months while spreading throughout the world [ , ] . i focused my study both on determining the mutations that occurred in each region and on evaluating whether these new mutations had an effect on the shape of the spike protein. in addition, i tried to predict how mutations affect the ligand binding site. characterization of the detected variants may open new routes to vaccine design, treatments and diagnostic approaches. almost all complete sequences of the surface glycoprotein of covid- (taxid: ) isolated from humans were downloaded from the ncbi virus website according to their geographic region (www.ncbi.nlm.nih.gov/labs/virus). the dataset was aligned using the megax program (alignment by muscle). geographical regions were evaluated separately, one by one [ ] . phyre is a suite of web tools to predict and analyze protein structure, function and mutations. all predicted structures were obtained with this tool [ ] (www.sbg.bio.ic.ac.uk/phyre /html). the missense d online tool was used to predict the structure of each missense variant and compare it with the normal structure [ ] . the genome detective coronavirus typing tool was used. 
this application allows us identify phylogenetic clusters and genotypes from assembled genomes in fasta format [ ] . site: dligandstie method was used to do an automated for the prediction of ligand binding sites [ ] . after downloading whole spike sequences, i performed all different regions seperately. on africa region, the most common mutations are q h ( mutations), d g ( mutations), r i ( mutations ) others ( mutations) respectively in my whole sequences and there are found eight different mutations.they are shown on table .interestingly, one of them qjx isolated from tunisia has two mutations a t, q r respectively.the predicted structure damage for both do not affect structure damage of spike based on resulting missense d online tool. in this area only when d g mutation occur, the predicted structure has damage.for others, there is no any predicted structure damage results based on their sequences after using missense d online tool. table eight different mutataions types are shown based on whole sequences of surface glycoprotein(africa )from ncbi by using megax manually. the mutation qkr s f according to whole sequences of spike protein ,the most common mutations are d g ( mutations), h y( mutations), y f ( mutations), g d( mutations), a s ( mutations), t i( mutations), s f( mutations), i v( mutations) respectively and others have only one mutation are shown on the table .it is clear that same mutation can occur different position.for example, alanine can turn into serine at two different positions such as;a s,a s. besides, threonine(t) can change into isoleucine in three different positions(t i, t i,t i). only two predicted structure damage are estimated, they are t p and d g.. the mutation access. 
number the mutation qkm t i qjs n t qjt l f qjt t n qhu h y qjc k r qjd q r qjt t i qjt m i qjt l f qjt l i qkm d g qjs n d qjt t i qjs i v qjz m i qkj v l for whole sequences from ocenia and south america ,the most finding mutations are g v ( mutations), d g ( mutations), other different mutations tend to increase such as; s l( mutations),a t( mutations),l f( mutations),d h( mutations),s l( mutations),g r( mutations). all finding mutation are shown on the table .like europe and africa, there are same mutations occurred different position such as; t i, t i, t i.besides qkv isolated whole sequences of spike protein has two mutations. they are t i and s l.like all regions, when d g occur, structure's damage is predicted by tool. l f qhr d g qkv t i qjr p s qkv h y qjr a v qjr s l qkr q h qjr t i qjr q h qjr i f qkv s l qjr d h qjr m i qjr l f qjr t i qjr s l qkv p s qhr s r qjr a v qjr w l qjr d y qjr a t qjr p s qjr i t qkv g v qkr g r qjr d g qkr h q qjr d n qjr p l north america shown on the table has a quite high d g mutatition rate based on my complete sequences of only spike proteins. i determined over mutations for d g. based on my samples ,some other mutations are seen such as;l f( mutations) ,d h( mutations), e d(( mutations), p l( mutations) respectively.besides,two different mutations are found at the same position too.an example is qkg (a d) and qkv (a v)have the same position but different mutation .the other example is qkg (q p) and qkg (q l). these two example may be a proof that some position more vulnarable to change into other mutations.for both, there was no predicted structure damage according to missense d online tool.like europe and africa threonine(t) change into isoleucine(i) in three different positions.besides i did not find any predicted structure damage result. table fifty two different mutations are shown based on whole seguences ( sequences)of spike proteins in the north america . 
access.number the mutation h r qlc a v qkx d g qlc r l qlc a s qkg t i qkg v l qkg v a qkv r s qkv l f qkv p s qlc p s qkv t s qkg e q qlc n k qks g v qkv p l qlc v l qki e d qlc p l even number of sequences used are not so high according to europe and north america,many mutations are found by using megax program for asia. it is the region where the most mutation types are seen ,they are shown on the table . like nearly all regions, d g are the most variant, nearly isolated samples are found. in addition, some mutations more than three l f ( mutations) , r m ( mutations), v f( mutations), a t( mutations), h q( mutations), t i( mutations), q h( mutations), e d( mutations), t i( mutations), l v( mutations) are found even some of other have one or two mutations.all mutations are shown on table .surprisingly qla isolated from south korea has three different mutations which are l f, f s, t i and qky isolated from india has four mutations that are q h,p s,y n,k n.interestingly, when all of this mutation occur, no any predict results about structure damage by using missense d online tool. some mutations were found more than one. one can see on the table are threonine(t) change into isoleucine (i) occur different position such as; t i, t i, t i, t i, t i ,t i and glutamine (q) turn into histidine(h)( qla , qkw ). i found only mutations belonging to this region.some of them arec f,q k,k n,d y,p s etc.another example of three mutation occur at the same isolated sequences is qky . it has q h, y n, p s respectively.like qla isolated from south korea ,qjd isolated from wilayah persekutuan malaysia has four mutations which are l m,d i,p h, h q.i conducted all mutations( for qjd ) one by one,however, i did not find any predicted structure damages.these both results do not affect on structure damage according to my results after resulting of bioinformatic process. 
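the regional mutation lists above come from comparing aligned spike sequences with a reference sequence. a minimal sketch of that comparison follows, assuming two amino-acid strings that are already aligned to the same length, with '-' marking gaps. the function name and the toy sequences are hypothetical, and this is not the megax workflow itself, only an illustration of how substitutions can be read off an alignment.

```python
def list_substitutions(reference: str, sample: str) -> list[str]:
    """Return substitutions such as 'V6G' between two aligned protein sequences.

    Positions are numbered along the reference, skipping reference gaps.
    Gaps ('-') and unknown residues ('X') in the sample are not reported.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    position = 0
    for ref_aa, sample_aa in zip(reference.upper(), sample.upper()):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        position += 1
        if sample_aa in ("-", "X"):
            continue  # deletion or ambiguous residue, not a substitution
        if ref_aa != sample_aa:
            substitutions.append(f"{ref_aa}{position}{sample_aa}")
    return substitutions


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only.
    print(list_substitutions("MFVFLVLLPL", "MFVFLGLLPL"))
```

running the same function over every sequence of a region and counting the returned labels would give per-region tallies of the kind reported in the tables.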
table seventy six mutations are shown based on whole seguences ( sequences)of spike proteins in the asia region .only one of isolated accession number are used even many are found. access. number the mutation qjx f l qjd h q qit l v qiu a v qjx s i qjt t i qko q h qkj d y qjq t i qjr e d qia y n qkj q h qko h y qit d g qiu s l qiu a v qko t i qkt h y qla l f qkw q h qjy r m qla q h qla f s qkv r q qla t i qkn r w qko d h qku a v qke n y qjx a s qkv w qjd r l qkq m i qiz v i qjt e d qjd t i qjy s i qky k n qkj q h qky k q qjw m i qko p s qla k n qjq a t qko n y qjt t i qhz s w qjx a s qla w l qjd s f qky a s qjt a v qkx g r qia a v qjc q r qkk s y qjd l m qkf q e qjd d i qjy h q qjd p h qki f l qkv v f qko v l qjx e q qjr k r qky q h qkj d y qky y n qko k n qky p s qkj q k qky p h qjr c f all missense mutation are used to predict structure damage and results are shown on the figure . predicted structure damage's reason of d g found for all regions is substitution replaces glycine originally located in a bend curvature in this area (fig. a) . t p isolated from europe substitution introduces a buried proline and it triggers disallowed phi/psi alert. the phi/psi angles are in favored region for wild-type residue but outlier region for mutant residue (fig. b) . the predicted reason for this m i isolated from asia is that substitution results in a change between buried and exposed state of the target variant residue. met is buried (rsa . %) and arg is exposed (rsa . %)[(rsa < % for buried and the difference between rsa has to be at least %. (fig. c) . this substitution(p s isolated from asia) replaces a buried uncharged residue (pro, rsa . %) with a charged residue (his) (fig. d) . substitution (p h isolated from asia) replaces a buried uncharged residue (pro, rsa . %) with a charged residue (his) and leads to the expansion of cavity volume by . Å^ (fig. e) . substitution(l m) results in a change between buried and exposed state of the target variant residue. 
leu is buried (rsa . %) and met is exposed (rsa . %)( criterion: the substitution results in a change between buried and exposed state of the target variant residue. (rsa < % for buried and the difference between rsa has to be at least %.) (fig. f) .substitution g v isolated from north america) replaces a buried gly residue (rsa . %) with a buried val residue (rsa . %) (fig. g ). this (g v)substitution triggers disallowed phi/psi alert. the phi/psi angles are in allowed region for wild-type residue but outlier region for mutant residue and it replaces glycine originally located in a bend curvature (fig. h) . the substitution(t i isolated from both asia and north america) disrupts all side-chain / side-chain h-bond(s) and/or side-chain / main-chain bond(s) h-bonds formed by a buried thr residue (rsa . %) (fig. i) .the phylogenetic tree of my mutations are shown on the figure by using bioinformatic tools.they tend to close both bat sars cov and outgroup according to genome detective coronavirus typing tool. interestingly, mutations that damage the structure do not affect ligand binding sites ( figure ), whereas ligands' binding sites were affected in those with multiple mutations (figure ) . the result of all mutations detected to be affected by the structure was the same and is shown in figure . for example, the same source structure( dd _s, ajf_e pdb ) was taken for the structure predicted for all ligand binding. besides, all amino acids were same( figure ). qjx isolated from africa has two mutations at the same sequence and the first(a on the figure ) represents result of sequence. .like figure ,source are used to predict of structure is dd _s, ajf_e but predicting binding sites are different from figure . they are phe( contact: ,av distance: . ), gly (contact: ,av distance: . ), phe(contact: ,av distance: . ) , asn (contact: ,av distance: . ) respectively. the second (b on the figure ) is about qla (has three mutation) whole sequence isolated from south korea. 
the source structures used for the prediction were ww _a, ulf_ a and ulc_b, and the predicted binding-site residues are leu (contact: , av distance: . ), val (contact: , av distance: . ), val (contact: , av distance: . ), lys (contact: , av distance: . ), phe (contact: , av distance: . ), val (contact: , av distance: . ), tyr (contact: , av distance: . ), glu (contact: , av distance: . ). this result tends to differ from figure . i may say that more than one mutation has an effect on the ligand binding site according to the dligandsite method. figure shows the ligand binding site and some varying features when two or more mutations occur. blue represents predicted residues, while cyan represents heterogens according to the dligandsite method. it has been reported that d g increases the infectivity of the covid- virus; similarly, other mutations predicted to damage the structure may increase infectivity [ ] . in addition to this mutation, my study reveals many mutations, shown in table - ; some of them are seen in all regions, while others belong to a specific region. for example, d g is seen in all regions, whereas p h is seen only in asia. one can look up the region of each mutation using the accession numbers in the tables. in addition, more than one mutation was detected in sequences isolated in some regions, especially in asia, where even four mutations were seen in the same sequence. this may be due to human error, but when these four mutations were used, they were not predicted to affect the spike structure (figure ). on the other hand, some of these mutations were seen to affect the ligand binding site (figure ). 
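the buried/exposed switch criterion quoted in the results (a residue counts as buried below a relative solvent accessibility cutoff, and the rsa difference between wild type and mutant must reach a minimum) can be written as a small check. this is a sketch under stated assumptions: the two thresholds are parameters because their values are not legible in the text, and the numbers in the example call are placeholders rather than the settings of the missense d tool.

```python
def buried_exposed_switch(rsa_wild: float, rsa_mutant: float,
                          buried_cutoff: float, min_difference: float) -> bool:
    """True if a substitution flips a residue between buried and exposed.

    rsa_wild and rsa_mutant are relative solvent accessibilities in percent.
    A residue is treated as buried when its RSA is below buried_cutoff; the
    switch is only flagged when the RSA change is at least min_difference.
    """
    wild_buried = rsa_wild < buried_cutoff
    mutant_buried = rsa_mutant < buried_cutoff
    state_changed = wild_buried != mutant_buried
    return state_changed and abs(rsa_wild - rsa_mutant) >= min_difference


if __name__ == "__main__":
    # Placeholder RSA values and thresholds, for illustration only.
    print(buried_exposed_switch(2.0, 30.0, buried_cutoff=10.0, min_difference=10.0))
```

requiring both conditions avoids flagging residues whose rsa merely crosses the cutoff by a small margin.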
mers: recent insights into emerging coronaviruses middle east respiratory syndrome: emergence of a pathogenic human coronavirus origin and evolution of pathogenic coronaviruses genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding genome composition and divergence of the novel coronavirus ( -ncov) originating in china recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- emerging sars-cov mutation hot spots include a novel rna -dependent -rna polymerase variant coronavirus disease (covid- ): a scoping review the establishment of reference sequence for sars-cov- and variation analysis the phyre web portal for protein modeling, prediction and analysis can predicted protein d structures provide reliable insights into whether missense variants are disease associated genome detective coronavirus typing tool for rapid identification and characterization of novel coronavirus genomes dligandsite: predicting ligand-binding sites using similar structures tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus key: cord- -nnv e gr authors: mulgaonkar, nirmitee; wang, haoqi; mallawarachchi, samavath; fernando, sandun; martina, byron; ruzek, daniel title: bcr-abl tyrosine kinase inhibitor imatinib as a potential drug for covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nnv e gr the rapid geographic expansion of severe acute respiratory syndrome coronavirus (sars-cov- ), the infectious agent of coronavirus disease (covid- ) pandemic, poses an immediate need for potent drugs. enveloped viruses infect the host cell by cellular membrane fusion, a crucial mechanism required for virus replication. 
the sars-cov- spike glycoprotein, due to its primary interaction with the human angiotensin-converting enzyme (ace ) cell-surface receptor, is considered as a potential target for drug development. based on in silico screening followed by in vitro studies, here we report that the existing fda-approved bcr-abl tyrosine kinase inhibitor, imatinib, inhibits sars-cov- with an ic of nm. we provide evidence that although imatinib binds to the receptor-binding domain (rbd) of sars-cov- spike protein with an affinity at micromolar, i.e., . ± . μm levels, imatinib does not directly inhibit the spike rbd:ace interaction – suggesting a bcr-abl kinase-mediated fusion inhibition mechanism is responsible for the inhibitory action. we also show that imatinib inhibits other coronaviruses, sars-cov, and mers-cov via fusion inhibition. based on promising in vitro results, we propose the abl tyrosine kinase inhibitor (atki), imatinib, to be a viable repurposable drug against covid- . in early december , the chinese health authorities reported several cases of pneumonia of unknown cause that had originated in wuhan, a city in the hubei province of china. the causative agent of this outbreak was identified to be a virus that belonged to the sarbecovirus subgenus, orthocoronavirinae subfamily which was previously referred to by its interim name novel coronavirus ( -ncov) [ , ] and was later named as sars-cov- [ ] . due to the rapid spread of covid- , the world health organization (who) declared it a global pandemic in march [ ] . by mid-august , over million cases have been confirmed around the world, resulting in more than , deaths [ ] . unfortunately, there is no approved antiviral treatment or preventive vaccine for coronaviruses in humans. since supportive care is the only recommended interim treatment, it is imperative to identify repurposable lead compounds to rapidly treat covid- patients until a sars-cov- -specific drug and a vaccine is developed. 
although the coronavirus genome consists of numerous conserved druggable enzymes, including papain-like protease (plpro), c-like protease ( clpro), non-structural proteins rna-dependent rna polymerase (rdrp) and helicase, development of clinically approved antiviral therapies has proven to be a difficult task [ ] . the surface structural spike glycoprotein (s), a key immunogenic cov antigen essential for virus and host cell-receptor interactions, is an important target for therapeutic development. the spike protein consists of an n-terminal s subunit (receptor binding) and a c-terminal s subunit (membrane fusion). the s subunit contains the receptor-binding domain (rbd) which attaches to the host membrane, thus playing an important role in viral entry. sars-cov- utilizes the ace receptor for entry and the transmembrane protease, serine (tmprss ) for spike protein priming [ ] . crystallographic studies have shown that sars-cov- binds to the ace receptor, with a binding mode nearly identical to that of sars-cov [ ] [ ] [ ] [ ] . the binding affinity of the ace receptor to the rbd of the sars-cov- spike protein is reported to be significantly higher as compared to sars-cov [ , ] . based on the importance of virus membrane fusion events in the viral life cycle and its infectivity, the spike protein of sars-cov- was targeted for drug screening. this study utilizes in silico methodology followed by in vitro experimental validation to screen existing fda-approved small molecule drugs specific to the rbd of the spike protein of sars-cov- to identify repurposable drugs targeting further clinical validation. a model for sars-cov- spike protein was constructed using the crystal structure ( vsb_chain a) to correct missing residues. the amino acid sequence identity between the target sequence (genbank: qhd . ) and template ( vsb_chain a) was . %. the sars-cov- model showed an rmsd of . Å relative to the crystal structure ( vsb_chain a). 
structure assessment of the predicted model using the ramachandran plot showed . % residues in the most favored regions with . % outliers. none of the outliers contained the residues present at the active site of the protein. the predicted model was further used for in-silico studies. in virtual screening, a library of approximately , compounds was docked against the sars-cov- spike rbd protein. the output was analyzed for common classes of drugs with highest (most negative) docking scores that resulted in seven compounds with three compounds, antiviral , antiviral and antiviral with docking scores of - . ± . , - . ± . and - . ± . kcal/mol from the enamine antiviral library, and four compounds, ponatinib, imatinib, ergotamine, and glecaprevir with docking scores of - . ± . , - . ± . , - . ± . and - . ± . kcal/mol from the zinc fda library respectively. the above libraries were chosen to help identify a repurposable drug that can potentially inhibit the sars-cov- . the screened compounds had the highest scores within their respective sets and had one or more binding conformations at the ace binding domain of the spike protein. the most common class of drugs was found to be abl tyrosine kinase inhibitors (atki), and hence two drugs (ponatinib and imatinib) with the highest scores were selected for in vitro testing. the binding scores for the seven screened compounds at the rbd are shown in fig. a and detailed description of the screened drugs is given in table s under supplementary data. the high affinity of the screened compounds is visible when compared with the negative control dimethyl sulfoxide (dmso), which is ineffective against coronaviruses [ ] . based on promising in silico data, and initial viral plaque assay results (fig. a) , imatinib was chosen to be advanced for further experimental validation. (due to a supply-shortage ponatinib and ergotamine were unavailable for purchase and hence could not be included in the initial viral plaque assays). 
previous studies have shown imatinib to inhibit sars-cov and mers-cov by blocking endosomal fusion at the cell-culture level [ , [ ] [ ] [ ] [ ] . it has been suggested that tyrosine-kinase inhibitors do not affect the cleavage of the spike protein but inhibit spike-mediated endosomal fusion [ , , ] . the high affinity of tyrosine-kinase inhibitors towards the spike protein is inferred from the initial docking results, where both imatinib and ponatinib showed highly negative binding free energies. first, we evaluated the toxicity of imatinib when incubating the compound on vero cells for one hour or eight hours. in the experiments where the compound remained on the cells for eight hours, toxicity was measured at concentrations of µm, . µm, . µm, and . µm. however, in the -hour design, no toxicity was observed. next, we evaluated the ability of imatinib to inhibit replication and entry. at concentrations as low as . µm the compound was effective in suppressing % of plaque formation in the -hr design, and the ic value determined using linear regression was nm. consistent with the toxicity data, toxicity was observed between and . µm. the compound also showed efficacy in the -hour design, with higher ic values. these data indicate that imatinib inhibits virus replication in vitro as shown in fig. a and b. to evaluate if imatinib inhibits viral entry, we performed two fusion assays: endosomal (vero) and plasma membrane (vero-tmprss ) as shown in fig. c and d, respectively. based on cytotoxicity, at concentrations below nm, no toxicity was observed microscopically (red arrow in the graph). the vsv-g control revealed % infectivity (cytopathic effect) at every concentration below this, suggesting the inhibitor did not affect vsv-g entry. vsv-g particles do not carry spike proteins, and no significant inhibition of their entry occurred, suggesting that the entry inhibition observed is likely mediated through the spike protein. 
however, the effect on vero-tmprss cells was less clear for any of the coronaviruses used when compared to the vsv-g control. a similar level of toxicity was observed in these cells. it is worth noting that toxicity is probably the result of incubating cells with imatinib for hours in the assay. taken together, there is evidence that imatinib inhibits spike fusion and prevents viral entry, possibly by preventing endosomal entry. the binding kinetics of imatinib to the rbd of sars-cov- spike protein were evaluated using biolayer interferometry (bli), as shown in fig. a . the analysis showed that imatinib binds to the sars-cov- rbd protein with an on-rate (kon) of ( . ± . ) × m - s - and dissociates with an off-rate (koff) of ( . ± . ) × - s - . this yielded an equilibrium dissociation constant (kd) of . ± . µm, calculated as the ratio of koff to kon. the affinity values indicate that % of the rbds on the surface spike glycoproteins will be occupied at micromolar concentrations of imatinib. however, this value is close to the toxicity levels observed in the above assays and very high compared with both the ic value and the nanomolar affinity of ace for immobilized rbd (fig. s ), suggesting that imatinib is likely to inhibit spike fusion by the other, previously suggested mechanism of action (moa) [ ] . in vitro colorimetric assays were performed over a pm to nm range in imatinib concentration to assess the ability of imatinib to directly inhibit the rbd:ace interaction. the colorimetric signal of the positive control (no inhibitor) reaction was strong, and the blank wells exhibited an absorbance of ~ . at nm as per the manufacturer's instructions. the test wells (with inhibitor) showed absorbance comparable to the positive control wells, indicating that imatinib did not affect the sars-cov- rbd:ace interaction in the indirect competitive enzyme-linked immunosorbent assay (elisa), as shown in fig. b . 
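the relationship between the kinetic rates and the equilibrium constant described above can be sketched in a few lines. this is a minimal illustration, assuming simple 1:1 binding; the function names and the numeric rates are hypothetical placeholders, not the measured values from the bli experiments.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_D in molar units from k_on (M^-1 s^-1) and k_off (s^-1)."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def fractional_occupancy(ligand_conc: float, k_d: float) -> float:
    """Fraction of binding sites occupied at equilibrium for 1:1 binding.

    Uses the standard relation theta = [L] / ([L] + K_D), which assumes the
    free ligand concentration is not depleted by binding.
    """
    if ligand_conc < 0 or k_d <= 0:
        raise ValueError("ligand_conc must be >= 0 and k_d must be > 0")
    return ligand_conc / (ligand_conc + k_d)


if __name__ == "__main__":
    # Hypothetical placeholder rates, NOT the values reported in the study.
    k_on = 1.0e3    # M^-1 s^-1
    k_off = 1.0e-2  # s^-1
    k_d = dissociation_constant(k_on, k_off)
    print(f"K_D = {k_d:.2e} M")
    for conc in (1.0e-6, 1.0e-5, 1.0e-4):
        print(f"[L] = {conc:.0e} M -> occupancy = {fractional_occupancy(conc, k_d):.2f}")
```

under this model, half of the sites are occupied when the ligand concentration equals kd, which is why a micromolar kd implies that micromolar drug concentrations are needed for substantial rbd occupancy.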
a pharmacophore analysis was performed to evaluate why imatinib showed promising in-silico results yet failed to directly inhibit the rbd:ace interaction. the primary binding site of the sars-cov- rbd was revealed via docking. pharmacophore analyses were done to further elucidate the interactions between the drug molecules and their receptors (fig. ) . twenty-five pharmacophores were collected from the top five binding positions of imatinib at the primary active site of abl tyrosine kinase (native receptor), where each purple sphere represents a pharmacophore as depicted in fig. a. similarly, an additional pharmacophores were collected from the first five binding sites of imatinib at the primary active site of the sars-cov- rbd. the rbd pharmacophores are represented as yellow spheres in fig. b. from the results of the elixir-a alignment, it is evident that four pharmacophores between abl kinase and rbd overlapped (red spheres). this overlap helps explain why the compounds that were originally screened using sars-cov- rbd also bound to abl kinase, ultimately resulting in the inhibitory action. this point is further supported by the . % identity between the active sites of abl (uniprotkb: p aa - ) and the sars-cov- spike rbd (uniprotkb: p dtc aa - ) generated by a protein blast (blastp) [ , ] search. there is an urgent need to find a treatment for the current sars-cov- pandemic. health experts across the globe are trying to use existing clinically approved drugs to treat patients until a specific drug is developed. the present study, using a combination of computational techniques followed by in vitro studies, identified imatinib, an fda-approved anti-cancer drug, as a potential treatment of sars-cov- infection. the data indicate inhibition of sars-cov- replication at ic of nm. our results suggest that imatinib prevents viral replication by inhibiting the virus at the fusion stage, possibly by preventing endosomal entry. 
binding studies revealed that the affinity of imatinib for the sars-cov- spike rbd protein is still lower (higher kd value) than the previously published values in the nanomolar range (ligand id: bdbm ) [ ] for imatinib on abl tyrosine kinase [ ] and in range with the micromolar affinities of imatinib to the src-family kinases, frk and fyn [ ] . although imatinib is not a promiscuous drug, it has been found to bind tightly to tyrosine kinases other than abl [ , ] . pharmacophore mapping between abl and sars-cov- rbd and a . % identity at the active site of the two proteins explain why imatinib binds to the sars-cov- rbd as well. however, imatinib failed to directly inhibit the sars-cov- spike rbd:ace interaction in the competitive elisa assays. therefore, it is likely that imatinib causes inhibition of virus fusion via a cellular kinase pathway, resulting in inhibition of virus replication, as previously described for other coronaviruses [ ] . the results provide further evidence supporting the recent clinical trials (clinicaltrials.gov identifier: nct , nct , nct , nct , and nct ) for covid- patients with imatinib. the swiss-model server [ ] was used to construct a homology model of the sars-cov- spike protein using the crystal structure of the sars-cov- spike protein (pdb: vsb_chain a) as the template [ ] . the genome sequence wuhan-hu- (genbank: mn . ) was used as a representative of sars-cov- . the spike protein sequence (genbank: qhd . ) was used as the target sequence [ ] . the swiss-model structure assessment tool was used to validate the quality of the predicted model. around , compounds, including , nucleoside-like compounds from the enamine targeted antiviral library (enamine.net) and , food and drug administration (fda)-approved drugs from the zinc database [ ] were used for molecular docking. all molecules were prepared with obabel [ ] from .sdf or .mol format to .pdbqt format. 
the d compound structures from the enamine library were resolved by obabel --gen d command. the docking file of the protein model was prepared with mgltools v . . [ ] and the molecules were docked at the rbd of the spike protein via autodock vina . . [ ] . the grid box of × × size with . Å spacing was fixed around the rbd (thr -val ) of the spike protein. each docking was done in three replicates, and the conformation with the highest binding score was recorded. the batch processing of docking and data collection was performed using an in-house python script which is deposited in github. data were analyzed statistically using r studio [ ] and graphs were constructed with ggplot in r [ ]. the ligand-receptor interactions were studied using schrödinger maestro [ ] , and molecules with high docking scores were selected from each screening library for further studies. codon-optimized mers-cov (isolate emc, vg -g-n) and sars-cov (isolate cuhk-w ; vg -g-n) s expression plasmids (pcmv) were ordered from sino-biological and subcloned into pcaggs using the clai and kpni sites. the last amino acids of the sars-cov spike protein were deleted to enhance pseudovirus production. codon-optimized cdna encoding sars-cov- s glycoprotein (isolate wuhan-hu- ) with a c-terminal amino acid deletion was synthesized and cloned into pcagss in between the ecori and bglii sites. pvsv-egfp-dg (# ), pmd .g (# ), pcag-vsv-p (# ), pcag-vsv-l (# ), pcag-vsv-n (# ) and pcaggs-t opt (# ) were ordered from addgene. s expressing pcaggs vectors were used for the production of pseudoviruses, as described below. the cdna encoding human tmprss (nm_ ; ohu d) was obtained from genscript. the cdna fused to a c-terminal ha tag was subcloned into pqxcih (clontech) in between the noti and paci sites to obtain the pqxcih-tmprrs -ha vector. vero-tmprss cells were produced by retroviral transduction. to produce the retrovirus, μg pqxcih-tmprrs -ha was co-transfected with polyethylenimine (pei) with . 
μg pbs-gagpol (addgene # ) and μg pmd .g in a cm dish of % confluent hek- t cells in opti-mem i ( x) + glutamax. retroviral particles were harvested at hours post-transfection, cleared by centrifugation at x g, filtered through a . μm low protein-binding filter (millipore), and used to transduce vero cells. polybrene (sigma) was added at a concentration of μg/ml to enhance transduction efficiency. transduced cells were selected with hygromycin b (invitrogen). hek- t cells were maintained in dulbecco's modified eagle's medium (dmem, gibco) supplemented with % fetal bovine serum (fbs), x non-essential amino acids (lonza), mm sodium pyruvate (gibco), mm l-glutamine (lonza), μg/ml streptomycin (lonza) and u/ml penicillin. vero, vero-tmprss , and veroe cells were maintained in dmem supplemented with % fbs, . mg/ml sodium bicarbonate (lonza), mm hepes (lonza), mm l-glutamine, μg/ml streptomycin and u/ml penicillin. all cell lines were maintained at °c in a % co , humidified incubator. the protocol for vsv-g pseudovirus rescue was adapted from whelan and colleagues ( ). briefly, a % confluent cm dish of hek- t cells was transfected with µg pvsv-egfp-dg, µg pcag-vsv-n (nucleocapsid), µg pcag-vsv-l (polymerase), µg pmd .g (glycoprotein, vsv-g), µg pcag-vsv-p (phosphoprotein) and µg pcaggs-t opt (t rna polymerase) using pei at a ratio of : (dna:pei) in opti-mem i ( x) + glutamax. forty-eight hours post-transfection the supernatant was transferred onto new plates transfected hours prior with vsv-g. after a further hours, these plates were retransfected with vsv-g. after hours the resulting pseudoviruses were collected, cleared by centrifugation at x g for minutes, and stored at - °c. subsequent vsv-g pseudovirus batches were produced by infecting vsv-g transfected hek- t cells with vsv-g pseudovirus at a moi of . . titres were determined by preparing -fold serial dilutions in opti-mem i ( x) + glutamax. 
aliquots of each dilution were added to monolayers of × vero cells in the same medium in a -well plate. three replicates were performed per pseudovirus stock. plates were incubated at °c overnight and then scanned using an amersham typhoon scanner (ge healthcare). individual infected cells were quantified using imagequant tl software (ge healthcare). all pseudovirus work was performed in a class ii biosafety cabinet under bsl- conditions at erasmus medical center. for the production of mers-cov, sars-cov, and sars-cov- s pseudovirus, hek- t cells were transfected with µg s expression plasmids. twenty-four hours post-transfection, the medium was replaced with opti-mem i ( x) + glutamax, and cells were infected at a moi of with vsv-g pseudotyped virus. two hours post-infection, cells were washed three times with opti-mem and the medium was replaced with medium containing anti-vsv-g neutralizing antibody (clone g f ; absolute antibody) at a dilution of : , to block remaining vsv-g pseudovirus. the supernatant was collected after hours, cleared by centrifugation at x g for minutes and stored at °c until use within days. coronavirus s pseudovirus was titrated on veroe cells as described above. transduction experiments were carried out by incubating pseudovirus with imatinib at concentrations ranging from - nm in opti-mem i ( x) + glutamax for hour at °c. pseudovirus-imatinib mixes were added to monolayers of × vero or vero-tmprss cells in a -well plate. plates were incubated for hours before quantifying gfp-positive cells using an amersham typhoon scanner and imagequant tl software. to determine the toxicity profile of imatinib, we performed the mtt assay using a -hr and an -hr design. briefly, a serial dilution of imatinib was prepared and incubated on vero cells for hr at °c. subsequently, cells were washed and cultured for a further eight hrs. in the -hr design, cells were incubated with a serial dilution of imatinib for eight hours without a washing step. 
we tested serial dilutions of imatinib for its ability to neutralize sars-cov- (german isolate; gisaid id epi_isl ; european virus archive global # v- ) using a plaque reduction neutralization test (prnt) as previously described [ ] . fifty μl of the virus suspension ( spot forming units) was added to each well and incubated at °c for hr. following incubation, the mixtures were added on vero cells and incubated at °c for either hr or hrs. the cells incubated for hr were then washed and further incubated in medium for hrs. after the incubation, the cells were fixed and stained with a polyclonal rabbit anti-sars-cov antibody (sino biological; : ). staining was developed using a rabbit anti-sars-cov serum and a secondary alexa-fluor-labeled conjugate (dako). the number of infected cells per well was counted using the imagequant tl software. the binding kinetics of imatinib on sars-cov- rbd protein were studied using a blitz® system (fortébio). experiments were conducted using the advanced kinetics mode, at room temperature, in a buffer system consisting of x kinetics buffer (fortébio) and % anhydrous dimethyl sulfoxide (dmso; sigma aldrich). recombinant his-tagged sars-cov- rbd protein ( -v h; sino biological) at a concentration of µg/ml was loaded on anti-penta-his (his k) biosensors (fortébio), followed by a washing step with assay buffer to block the unoccupied sensor surface. the association and dissociation profiles of imatinib (sigma aldrich) were measured at various concentrations (four-point serial dilutions from . µm to . µm). a reference biosensor loaded in the same manner with µm imatinib was used for baseline correction in each assay. the final binding curves were analyzed with the blitz pro . software (fortébio) using the : global-fitting model. the assay was repeated twice to validate the binding constants. here, data are represented as mean ± sd. 
similarly, sars-cov- rbd was immobilized on his k biosensors to study the binding kinetics of mfc-tagged hace ( -h h; sino biological) before being dipped into tubes containing the x kinetics buffer. various concentrations of ace (four-point serial dilutions from to nm) were used to measure the association and dissociation profiles. data were reference subtracted and fit to a : binding model using the blitz pro . software. the ability of imatinib to inhibit the interaction of spike rbd:ace proteins was evaluated by using the spike rbd (sars-cov- ): ace inhibitor screening colorimetric assay kit (bps bioscience). the avi-his-tagged spike s rbd (sars-cov- ) protein ng/well in pbs was coated onto -well microplate by overnight incubation at °c. blocking buffer was used to block the nonspecific binding sites by incubation for hour. different concentrations of imatinib were added and incubated for hour at room temperature with slow shaking. for the wells designated "blank" and "positive control", inhibitor buffer (pbs with . % dmso) was added. the reaction was initiated by adding ace his-avi-tagged biotin-labeled hip tm protein ( ng/well) in x immuno buffer to the "positive control" and "test inhibitor" wells by incubation for hour at room temperature with slow shaking. streptavidin-hrp (dilution : , in blocking buffer ) was added to each well and incubated at room temperature for hour with slow shaking. washing procedure ( × µl x immuno buffer ) was performed after each step. the chromogenic reaction was initiated by adding colorimetric hrp substrate to each well and incubated at room temperature until blue color was developed (approximately minutes) in the "positive control" well. after the blue color was developed, the reaction was terminated by adding n hcl, and absorbance at nm was measured using synergy h hybrid multi-mode microplate reader (biotek instruments). 
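the absorbance readings from the colorimetric assay described above are typically converted to percent inhibition relative to the blank and positive-control wells. the sketch below shows one common way of doing this; the formula, the function name and the example absorbances are assumptions for illustration and are not taken from the kit protocol or from the study data.

```python
def percent_inhibition(a_test: float, a_positive: float, a_blank: float) -> float:
    """Percent inhibition of the RBD:ACE signal in a test well.

    Assumed convention: the blank-corrected positive control (no inhibitor)
    corresponds to 0% inhibition, and a well reading at the blank level
    corresponds to 100% inhibition.
    """
    window = a_positive - a_blank
    if window <= 0:
        raise ValueError("positive control must read above the blank")
    return 100.0 * (1.0 - (a_test - a_blank) / window)


if __name__ == "__main__":
    # Hypothetical absorbance values, chosen only to illustrate the arithmetic.
    blank, positive = 0.10, 1.00
    for a_test in (1.00, 0.55, 0.10):
        print(f"A = {a_test:.2f} -> {percent_inhibition(a_test, positive, blank):.1f}% inhibition")
```

with this convention, test wells that read the same as the positive control give a value near zero, which is the pattern the elisa result for imatinib describes.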
the pharmacophore model was generated using pharmit [ ] , an online interactive platform to elucidate pharmacophores from the receptor and ligand complex. top five binding conformations of drug-protein complexes were produced by autodock vina. the pharmacophores of the ligands interacting with the receptor were considered active pharmacophores while the rest were defined as inactive pharmacophores. the pharmacophores from the native receptor of imatinib (abl tyrosine kinase; pdb: gvu) and rbd of sars cov- were generated using imatinib as the ligand. using the enhanced ligand exploration and interaction recognition algorithm (elixir-a), the two sets of pharmacophores were merged and processed to identify any overlap in d space. a detailed description of elixir-a can be found in our previous work [ ] and the algorithm has been deposited in github. the python script 'elixir-a-vina-batch-screening-module' used for running docking jobs in batch mode and elixir-a, the algorithm used for pharmacophore mapping have been deposited in github and will be made public upon publication of the manuscript. with md simulations. we are thankful to mart lammers for allowing us to use the fusion assays. funding: dr was supported by the ministry of health of the czech republic (project no. - - ). author contributions: sf, bm, and dr conceived and designed the study. first author nm designed the experiments, performed bli studies and immunoassays, reviewed literature, and compiled the manuscript and figures. co-first author hw conducted in silico experiments and compiled figures. sm did literature review on the resulting compounds and compiled the manuscript and figures. bm performed virology experiments. sf directed and verified studies and authored the manuscript. all authors reviewed and edited the paper. competing interests: all the authors declare that there are no conflicts of interest. percent inhibition compared to the amount of plaques on cells. 
here, -fold dilution of the compounds were done in duplo. then tcid of sars-cov- was added to each well, and plates were incubated at c for hour. then, the mixes were added onto vero cells and incubated for hours at c. subsequently, cells were fixed for min with % pfa, followed by another min fixation with % ethanol. fixed cells were stained with a monoclonal antibody, followed by alexafluor . here imatinib shows significant % inhibition as compared to other compounds tested. ( µg/ml) were incubated for hour at room temperature with slow shaking in the presence of various imatinib concentrations. streptavidin hrp ( : , ) was added to the reaction mixture. colorimetric substrate was added to initiate the chromogenic reaction, and minutes were allowed for color development. the reaction was terminated with the addition of n hcl and absorbance was measured at nm. positive control (no inhibitor) was assumed to represent % inhibition. values obtained from test wells (with imatinib) compared to the positive control showed % inhibition of rbd:ace interaction, indicating that imatinib does not inhibit spike fusion by direct inhibition. pharmacophore distribution of five most stable conformations on tyrosine kinase abl (purple spheres); and b] pharmacophore distribution (yellow spheres) on sars-cov- spike protein rbd with pharmacophores common to both receptors depicted in red. a novel coronavirus from patients with pneumonia in china the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- who declares covid- a pandemic an interactive web-based dashboard to track covid- in real time. 
the lancet infectious diseases coronaviruses-drug discovery and therapeutic options sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor functional assessment of cell entry and receptor usage for lineage b β-coronaviruses discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin. biorxiv cryo-em structure of the -ncov spike in the prefusion conformation structure of the sars-cov- spike receptor-binding domain bound to the ace receptor coronavirus s protein-induced fusion is blocked prior to hemifusion by abl kinase inhibitors abelson kinase inhibitors are potent inhibitors of severe acute respiratory syndrome coronavirus and middle east respiratory syndrome coronavirus fusion corona virus drugs -a brief overview of past, present and future repurposing of clinically developed drugs for treatment of middle east respiratory syndrome coronavirus infection gapped blast and psi-blast: a new generation of protein database search programs protein database searches using compositionally adjusted substitution matrices. the febs journal bindingdb: a web-accessible database of experimentally determined protein-ligand binding affinities. nucleic acids research a quantitative analysis of kinase inhibitor selectivity a small molecule-kinase interaction map for clinical kinase inhibitors molecular therapeutics: is one promiscuous drug against multiple targets better than combinations of molecule-specific drugs? swiss-model: homology modelling of protein structures and complexes complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in wuhan docking.org: over . billion compounds you can search and buy; million leadlike you can dock. 
abstracts of papers of the open babel: an open chemical toolbox autodock and autodocktools : automated docking with selective receptor flexibility software news and update autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading gplots: various r programming tools for plotting data. . . release, s., : maestro. schrödinger, llc severe acute respiratory syndrome coronavirus -specific antibody responses in coronavirus disease pharmit: interactive exploration of chemical space a non-beta-lactam antibiotic inhibitor for enterohemorrhagic escherichia coli o : h pharmacophore distribution on sars-cov- spike protein rbd with imatinib we gratefully acknowledge the support from texas a&m high performance research computing (hprc) and tamu laboratory for molecular simulation (lms). we would like to thank dr. lisa perez (associate director for advanced computing enablement, hprc tamu) for guidance key: cord- -ug ovsws authors: hosie, margaret j; epifano, ilaria; herder, vanessa; orton, richard j; stevenson, andrew; johnson, natasha; macdonald, emma; dunbar, dawn; mcdonald, michael; howie, fiona; tennant, bryn; herrity, darcy; da silva filipe, ana; streicker, daniel g; willett, brian j; murcia, pablo r; jarrett, ruth f; robertson, david l; weir, william title: respiratory disease in cats associated with human-to-cat transmission of sars-cov- in the uk date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ug ovsws two cats from different covid- -infected households in the uk were found to be infected with sars-cov- from humans, demonstrated by immunofluorescence, in situ hybridisation, reverse transcriptase quantitative pcr and viral genome sequencing. lung tissue collected post-mortem from cat displayed pathological and histological findings consistent with viral pneumonia and tested positive for sars-cov- antigens and rna. 
sars-cov- rna was detected in an oropharyngeal swab collected from cat that presented with rhinitis and conjunctivitis. high throughput sequencing of the virus from cat revealed that the feline viral genome contained five single nucleotide polymorphisms (snps) compared to the nearest uk human sars-cov- sequence. an analysis of cat ’s viral genome together with nine other feline-derived sars-cov- sequences from around the world revealed no shared cat-specific mutations. these findings indicate that human-to-cat transmission of sars-cov- occurred during the covid- pandemic in the uk, with the infected cats developing mild or severe respiratory disease. given the versatility of the new coronavirus, it will be important to monitor for human-to-cat, cat-to-cat and cat-to-human transmission. pandemic, naturally occurring sars-cov- infections linked to transmission from humans have been reported in domestic cats ( , ) , non-domestic cats ( ), dogs ( ) and mink ( ) . in addition, in vivo experiments have shown that while cats, ferrets and hamsters are susceptible to sars-cov- infection, ducks, chickens and pigs are apparently not susceptible ( , ) . cat-to-cat transmission has been demonstrated experimentally ( ) ( ) , but the significance of sars-cov- as a feline pathogen, as well as its reverse zoonotic potential, remains poorly understood. if sars-cov- were to establish new animal reservoirs, this could have implications for future emergence in humans. at present, there is no evidence of cat-to-human transmission or that cats, dogs or other domestic animals play any appreciable role in the epidemiology of human infections with sars-cov- . however, although the pandemic is currently driven by human-to-human transmission, it is important to address whether domestic animals are susceptible to disease or pose any risk to humans, particularly those individuals who are more vulnerable to severe disease. 
domestic animals could also act as a viral reservoir, allowing continued transmission of the virus, even when ro < in the human population. recent reports from dutch mink farms of both mink-to-cat and mink-to-human transmission of the virus provide support for this scenario ( , ) . we used a range of laboratory techniques to show that two domestic cats from households with suspected cases of covid- , and which displayed either mild or severe respiratory disease, were infected with sars-cov- . these findings confirm that human-to-cat transmission of sars-cov- occurs and can be associated with signs of respiratory disease in cats. sections of lung tissue were collected post-mortem from cat , placed in virus transport medium and stored at - °c on april ; on june the virus transport medium (vtm) was removed, and rnalater ® was added. lung tissue was also stored in formalin from april until june, when it was processed to wax prior to immunohistochemistry. infection of cat was identified via a retrospective survey of oropharyngeal and/or conjunctival swabs collected from cats with respiratory signs that had been submitted to the university of glasgow veterinary diagnostic service (vds) between march and july for routine pathogen testing. ethical approval for this study was granted by the university of glasgow school of veterinary medicine ethics committee (ea / ). permission was given for the retrospective analysis of feline swabs submitted to vds for routine respiratory pathogen testing. permission was also granted for a public appeal to practising veterinary surgeons via the veterinary record, to solicit the submission of samples from suspect sars-cov- cases ( ). this appeal was in line with guidance to veterinarians on the testing of animal samples for sars-cov- from the animal and plant health agency (apha), issued on may ( ). this briefing note confirmed that testing of animals for the purpose of clinical research was permitted under appropriate ethical review. 
approval to test tissue samples collected post-mortem from cat in the study was obtained from the primary veterinary surgeon. on submitting samples to scotland's rural college (sruc) veterinary services, veterinary practices agree that any sample may be used to investigate new and emerging diseases. samples were received in vtm and screened for feline herpes virus (fhv), feline calicivirus (fcv) and chlamydia felis (c. felis). dna extracts from vtm samples were tested for the presence of fhv and c. felis using a multiplex quantitative polymerase chain reaction (qpcr) approach. the assay incorporated published c. felis primers ( ) together with primers/probes for fhv and a feline host control gene which were designed in-house. standard respiratory virus isolation was also attempted using proprietary feline embryonic (fea) cells. the remnants of these samples were stored at °c prior to testing for sars-cov- . trizol™ reagent (thermofisher scientific, paisley, uk) was added to lyse the sample and ensure inactivation of sars-cov- , followed by organic solvent extraction using chloroform:isoamyl alcohol. subsequent steps were performed using rneasy® mini kits (qiagen, manchester, uk) as per the manufacturer's instructions, with elution of the final rna sample in µl nuclease-free water. one mock rna extraction was performed for every seven samples. all samples were tested using two reverse ( ) . negative controls processed in parallel retrieved no viral mapped reads after primer trimming. the resulting viral genome sequence for cat was uploaded to gisaid with the accession number epi_isl_ . the closest uk human sars-cov- sequence was initially identified using the cog-uk cluster identification tool civet (https://github.com/cog-uk/civet). 
a maximum-likelihood phylogenetic tree of all unique human sars-cov- sequences from the same county as cat (n = ), along with the cat genome, the closest uk human sequence and the wuhan-hu- reference, was created using iq-tree ( ) with the gtr substitution model (selected by iq-tree modelfinder) and bootstraps. existing feline (n = ; belgium, china, france, spain, usa) and mink (n = ; netherlands) sars-cov- viral genome sequences were downloaded from the gisaid website (https://www.gisaid.org) on july . suggesting that type i pneumocytes were infected ( figure b ). in contrast, neither viral protein nor rna was detected in the liver. to characterise cat 's viral genome, we performed high-throughput sequencing on rna derived from the clinical specimen. the generated viral genome sequence was . % complete and contained single nucleotide polymorphisms (snps) when compared with the original wuhan_hu- reference sequence. sequence data from the symptomatic owner were not available and therefore we compared the feline genome with human sars-cov- sequences, using data from the covid- genomics uk (cog-uk) consortium. the mutational hamming distance (ignoring ns and ambiguities) between the cat viral genome and all cog-uk human viral genomes available on august revealed that the closest human sars-cov- sequences from the uk differed from the feline sequence by five snps (n = ; table ); these human sequences were distributed throughout the uk but predominantly ( %) ( ) assigned to one lineage. the closest sequences (n = ) from the same county as cat were an additional snp away. phylogenetic analyses of these sequences reinforced the close relationship between the cat viral genome and human-derived uk sars-cov- genomes ( figure ). as we do not have the owner's virus sequence, we cannot determine whether the observed mutations in cat 's viral genome arose in a human prior to transmission. 
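the mutational hamming distance described above (counting only positions where both genomes carry an unambiguous base) can be sketched as follows. this is a minimal illustration assuming the sequences are already aligned to the same length; the function names and the toy sequences are hypothetical, and this is not the pipeline used in the study.

```python
UNAMBIGUOUS_BASES = frozenset("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two aligned sequences.

    Positions where either sequence has an N, a gap or an IUPAC ambiguity
    code are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in UNAMBIGUOUS_BASES and b in UNAMBIGUOUS_BASES and a != b
    )


def closest_sequences(query: str, references: dict) -> tuple:
    """Return (minimum distance, sorted names of references at that distance)."""
    if not references:
        raise ValueError("no reference sequences supplied")
    distances = {name: snp_distance(query, seq) for name, seq in references.items()}
    best = min(distances.values())
    return best, sorted(name for name, d in distances.items() if d == best)


if __name__ == "__main__":
    # Toy aligned sequences with made-up names, for illustration only.
    cat = "ACGTACGTAC"
    humans = {
        "human_1": "ACGTACGTAT",
        "human_2": "ACTTACGTAT",
        "human_3": "ACGTNNGTAT",
    }
    print(closest_sequences(cat, humans))
```

one consequence of skipping masked positions is that a heavily masked genome can appear artificially close to the query, so in practice low-coverage sequences are often filtered out before this comparison.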
table details the snps observed in the cat viral genome, and their frequency in the existing uk human population and among existing feline sars-cov- sequences. six of the snps are widespread (> %) in the uk human population and only three have not been observed previously. it is most likely that the three novel snps arose recently through evolutionary bottlenecks during human-to-human transmission and represent an unsampled cluster of human variants. given that no other feline or mink sequences contained these mutations, there is little indication that they correspond to a host species adaptation of the virus. next, we examined all globally available feline sars-cov- sequences from the gisaid database for evidence of convergent mutations. each of the six existing complete feline viral genomes contained snps in common with cat resulting in the d g mutation in spike, the p l mutation in nsp , and a synonymous mutation in nsp . however, as these mutations are widespread in the human population, it is likely that they evolved in humans and are not associated with feline adaptation. the existing feline viral sequences were mutation distances of (n = ), (n = ), and (n = ) snps away from the closest human sars-cov- sequence in their respective countries. it has been suggested that the d g mutation in spike (shared by the feline sars-cov- genomes) confers a fitness advantage to the virus in humans ( , ); whether the same mutation renders the virus more infectious for cats remains to be established. this is the first report of human-to-cat transmission of sars-cov- in the uk. although the ongoing sars-cov- pandemic is driven by human-to-human transmission, these findings have potential implications for the management of cats owned by people who develop sars-cov- infection.
currently, there is no evidence that domestic cats have played any role in the epidemiology of the covid- pandemic, but a better understanding of how efficiently virus is transmitted from humans to cats will require cats in covid- households to be monitored. the two cases of reverse zoonotic infections that are reported here serve to highlight the importance of a coordinated one health approach between veterinary and public health organisations. first detection and genome sequencing of sars-cov- in an infected cat in france first reported cases of sars-cov- infection in companion animals infection of dogs with sars-cov- sars-cov- infection in farmed minks, the netherlands susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus pathogenesis and transmission of sars-cov- in golden hamsters transmission of sars-cov- in domestic cats jumping back and forth: anthropozoonotic and zoonotic transmission of sars-cov- on mink farms send cat and dog samples to test for sars-cov- factors associated with upper respiratory tract disease caused by feline herpesvirus, feline calicivirus, chlamydophila felis and bordetella bronchiseptica in cats: experience from european catteries fast and accurate short read alignment with burrows wheeler transform an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar iq-tree : new models and efficient methods for phylogenetic inference in the genomic era cog. the covid- genomics uk consortium. evaluating the effects of sars-cov- spike mutation d g on transmissibility and pathogenicity. medrxiv a critical needs assessment for research in companion animals and livestock following the pandemic of covid- in humans. vector borne zoonotic dis the risk of sars-cov- transmission to pets and other wild and domestic animals strongly mandates a onehealth strategy to control the covid- pandemic. 
one health this study was supported by an award to mjh, bjw, rfj, prm and ww from the the authors have no potential competing interests. key: cord- -t neub d authors: fu, ziyang; huang, bin; tang, jinle; liu, shuyan; liu, ming; ye, yuxin; liu, zhihong; xiong, yuxian; cao, dan; li, jihui; niu, xiaogang; zhou, huan; zhao, yong juan; zhang, guoliang; huang, hao title: structural basis for the inhibition of the papain-like protease of sars-cov- by small molecules date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t neub d sars-cov- is the pathogen responsible for the covid- pandemic. the sars-cov- papain-like cysteine protease has been implicated in virus maturation, dysregulation of host inflammation and antiviral immune responses. we showed that plpro preferably cleaves the k -ubiquitin linkage while also being capable of cleaving isg modification. the multiple functions of plpro render it a promising drug target. therefore, we screened an fda-approved drug library and also examined available inhibitors against plpro. inhibitor grl showed a promising ic of . μm. the co-crystal structure of sars-cov- plpro-c s in complex with grl suggests that grl is a non-covalent inhibitor. nmr data indicate that grl blocks the binding of isg to plpro. the antiviral activity of grl reveals that plpro is a promising drug target for therapeutically treating covid- . one sentence summary co-crystal structure of plpro in complex with grl reveals the druggability of plpro for sars-cov- treatment. the covid- pandemic has caused devastating damage to the world and it has resulted in over million confirmed cases and over half a million deaths as of july , ( ). the novel sars-cov- coronavirus is the etiological agent responsible for the pandemic, and it belongs to the beta coronavirus family ( - ). similar to the other two known beta coronaviruses, i.e. sars and mers, sars-cov- also causes severe acute respiratory syndromes ( , ).
unexpectedly, sars-cov- has been reported to have milder symptoms but a much higher transmission rate ( , ); it has therefore caused the biggest catastrophe to world healthcare since the spanish flu in - ( ). there are currently no fda-approved drugs for sars, mers or sars-cov- . remdesivir, an antiviral drug developed by gilead, was given emergency use authorization (eua) by the u.s. food and drug administration (fda) for covid- treatment ( ). meanwhile, the discovery of antibodies or vaccines for covid- is still in progress. therefore, anti-sars-cov- drugs are urgently needed. besides mpro, plpro has been considered another promising target for drug discovery to treat length sars-cov- plpro protein was expressed in e. coli and subsequently purified. a commercially available fluorogenic peptide substrate rlrgg-amc, representing the c-terminal residues of ubiquitin, was used to report the enzymatic activity of plpro. the st round of screening provided ~ compounds with over % inhibition at μm. hits from the st round of screening went into the nd round of validation using the same enzymatic assay. after removing compounds with poor solubility, strong reactivity and low intrinsic fluorescence, seven relatively potent compounds were measured for ic (n= ). these drugs showed modest ic values ranging from to μm (fig. s ). although these compounds can potentially provide a starting point for further optimization, their low potency implies that such optimization would require large amounts of resources and time. therefore, we cherry-picked grl from promising sars-cov plpro inhibitors based on the structural similarity between the sars-cov and sars-cov- plpro proteins ( ). the in-vitro ic of grl against sars-cov- plpro was . ± . μm, suggesting a promising lead compound, which was therefore subjected to further structural and mechanistic studies (fig. a).
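The text reports half-maximal inhibitory concentrations from the enzymatic assay but does not describe how they were derived from the dose-response data. As a hedged illustration only, the sketch below shows a Hill-type dose-response model and a simple log-linear interpolation of the concentration giving 50% inhibition. The function names and the interpolation approach are assumptions for illustration, not the authors' fitting procedure; a full analysis would normally fit the curve by nonlinear regression.

```python
import math


def hill_inhibition(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Percent inhibition predicted by a Hill model at inhibitor concentration conc."""
    if conc <= 0:
        return 0.0
    return 100.0 / (1.0 + (ic50 / conc) ** hill)


def estimate_ic50(concs, inhibitions) -> float:
    """Interpolate, on a log-concentration axis, the concentration at 50% inhibition.

    Illustrative shortcut: it only uses the two measurements that bracket 50%.
    """
    points = sorted(zip(concs, inhibitions))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi > y_lo:
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")
```

With measured percent inhibition at a dilution series, `estimate_ic50(concs, inhibitions)` gives a quick first estimate in the same concentration units as `concs`; it raises an error when the series never crosses 50%, which is the situation in which a compound would be reported only as showing weak inhibition.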
a co-crystal structure would be crucial to understand the mechanism of inhibition of sars-cov- by grl ; we therefore determined the x-ray co-crystal structure of sars-cov- plpro c s at . Å (fig. a-e and table s ). the crystal of plpro/grl belongs to the space group i with just one protein molecule in each asymmetric unit. grl resides in the s and s subsites of plpro and is apart from the catalytic triad with a minimum distance of . Å to s in the c s structure. based on the binding site of grl , it inhibits sars-cov- plpro in a non-covalent manner. the overall grl -bound plpro structure is very similar to the available apo-structure of c s (pdb wrh) with a backbone rmsd of . Å, except for two residues on the bl loop, i.e. y and q (fig. d). upon binding grl , the sidechains of y and q shifted towards grl to form polar and hydrophobic interactions with the compound and stabilize its binding (fig. d). specifically, the sidechain oxygen of y forms a hydrogen bond with the nh group on the benzene ring of grl , and the backbone nh of q with the carbonyl oxygen of grl ( grl . in addition, hydrophobic interactions also play a pivotal role in grl 's binding to plpro. the naphthalene group of grl is buried in the pocket formed by aromatic residues y , y and the hydrophobic sidechains of p and p (fig. b and c). that the original pocket in mers plpro might be too shallow to allow grl to bind with extensive contacts, and the naphthalene moiety of grl would also be in steric clash with the pocket of mers plpro (fig. s c). in contrast to sars and sars-cov- , the bl- loop of mers is one residue longer but it lacks the critical y of sars-cov- (fig. s ). the extra residue of mers plpro may rearrange the hydrogen-bonding interaction network of the bl loop and the lack of the aromatic tyrosine clearly resulted in the removal of t-shaped pi-pi stacking and van der waals interactions with the naphthalene group of grl (fig. e and fig. s a-f).
response of peak intensity recovery was evident, which indicated that grl competes with isg for the binding site on plpro, namely the s and s sites and blocked the binding of isg to plpro (fig. s a) . the superposition of the hsqc spectra of . mm isg only and the . mm/ . mm plpro/ . mm grl mixture are essentially identical, which indicated that grl is a potent binder to plpro and almost completely blocked the binding of isg to plpro at a molar ratio of . ( . mm / . mm) (fig. c and fig. s b ). no peak shifting was observed in the superimposed hsqc (fig. s b) , suggesting that grl is a bona fide plpro binder because the hsqc spectrum of n-isg is not disturbed at all by . excess molar ratio of grl . in comparison with the complex structure of sars-cov- plpro with ub (pdb id: xaa) (fig. f ) or isg (pdb id: xa ) (fig. e) , grl blocked the access of the c- terminal tail of ub and isg to the catalytically active site of sars-cov- plpro. as suggested from the linkage cleaving preference experiments (fig. ) , only k linkage can be effectively cleaved by sars -plpro, therefore a very weak binding of monoub to plpro was expected. indeed, titrations of plpro into n-ub caused very limited peak shifting even at molar ratio of , confirming the suspected weak binding (fig. s ) . therefore, grl was not further titrated into the n-ub/plpro mixture. taken together, our nmr and x-ray analysis indicates that grl is a potent ppi (protein-protein interaction) inhibitor for plpro blocking the binding of isg . inhibitor remdesivir ( , ). in our study, we show that plpro is an equally promising target but more challenging because the co-crystal structure is hard to obtain and often irreproducible, like other coronaviral plpros ( ). our co-crystal structure of plpro c s in complex with the potent inhibitor grl validated that sars-cov- plpro is a druggable target. 
our structural characterization paves the way for future drug discovery targeting plpro and provides a solid template for modeling and modifying potential inhibitors, including grl and its naphthalene analogs. based on the structure, grl resides in the s /s site of plpro; it will therefore also inhibit the processing of viral polyproteins of sars-cov- , since these viral polyproteins share the same substrate cleavage sequence with ub and isg . therefore, inhibition of sars-cov- plpro can simultaneously prevent viral maturation and the virus's attack on the host antiviral immune system. our nmr study reveals that grl is a potent protein-protein interaction (ppi) inhibitor targeting plpro. in comparison, mpro is another highly explored drug target, with several potent covalent inhibitors having been reported ( , , ). however, no covalent inhibitors of plpro have ever been reported. our study suggests that focusing on the discovery of non-covalent inhibitors could be an effective strategy for targeting plpro. although the seven fda-approved drugs obtained in our screening show low potency against plpro, we cannot rule out the potential of these drugs to therapeutically treat covid- because they may have higher antiviral activities by inhibiting other pathways in the virus lifecycle.
the catalytic triad residues (s in place of c , h and d ) are shown in yellow; coronavirus disease (covid- ) situation reports- " ( );www.who.int/emergencies/diseases/novel-coronavirus- /situation-reports a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster a novel coronavirus from patients with pneumonia in china the sars-cov- outbreak: what we know sars-cov- and covid- : the most important research questions reorganize and survive-a recommendation for healthcare services affected by covid- -the ophthalmology experience remdesivir in covid- : a critical review of pharmacology, pre-clinical and clinical studies structure of m(pro) from sars-cov- and discovery of its inhibitors structure-based design of antiviral drug candidates targeting the sars-cov- main protease structural basis for the inhibition of sars-cov- main protease by antineoplastic drug carmofur crystal structure of sars-cov- main protease provides a basis for design of improved alpha-ketoamide inhibitors boceprevir, gc- , and calpain inhibitors ii, xii inhibit sars-cov- viral replication by targeting the viral main protease the sars-coronavirus papain-like protease: structure, function and inhibition by designed antiviral compounds ubiquitin-linkage specificity and deisgylating activity of sars-cov papain-like protease deubiquitinating and interferon antagonism activities of coronavirus papain-like proteases modulation of extracellular isg signaling by pathogens and viral effector proteins research and development on therapeutic agents and vaccines for covid- and related human coronavirus diseases therapeutic options for the novel coronavirus ( -ncov) rapid 
repurposing of drugs for covid- sars hcov papain-like protease is a unique lys linkage-specific di- distributive deubiquitinating enzyme recognition of lys -linked di-ubiquitin and deubiquitinating activities of the sars coronavirus papain-like protease a noncovalent class of papain-like protease/deubiquitinase inhibitors blocks sars virus replication catalytic function and substrate specificity of the papain-like protease domain of nsp from the middle east respiratory syndrome coronavirus structural basis for inhibition of the rna-dependent rna polymerase from sars-cov- by remdesivir structural basis for rna replication by the sars-cov- polymerase inhibitor recognition specificity of mers-cov papain-like protease may differ from that of sars-cov y and q are shown in marine in bound state, and in cyan in unbound state sars-cov plpro c s/grl (marine sticks) and mers-cov plpro figure nmr studies show that grl blocks the binding of isg to sars-cov- a) h, n-hsqc spectrum of n labeled isg ; (b) hsqc spectrum of n labeled peak broadening and peak intensity loss indicates binding of isg to plpro; (c) hsqc spectrum of n labeled isg ( . mm) in the mixture of . mm plpro and . mm grl . recovery of peak intensity suggests that grl binds to plpro and displaces isg pdb xa ) was modelled on (d) by superposition, showing steric clash of grl with the isg c-terminal tail; (f) ub in the complex structure of ubpa/sars-cov- plpro (pdb xaa) was modelled on showing steric clash of grl with the ub c-terminal tail key: cord- -ba f mf authors: sikora, mateusz; von bülow, sören; blanc, florian e. c.; gecht, michael; covino, roberto; hummer, gerhard title: map of sars-cov- spike epitopes not shielded by glycans date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ba f mf the severity of the covid- pandemic, caused by the sars-cov- coronavirus, calls for the urgent development of a vaccine. the primary immunological target is the sars-cov- spike (s) protein. 
s is exposed on the viral surface to mediate viral entry into the host cell. to identify possible antibody binding sites not shielded by glycans, we performed multi-microsecond molecular dynamics simulations of a . million atom system containing a patch of viral membrane with four full-length, fully glycosylated and palmitoylated s proteins. by mapping steric accessibility, structural rigidity, sequence conservation and generic antibody binding signatures, we recover known epitopes on s and reveal promising epitope candidates for vaccine development. we find that the extensive and inherently flexible glycan coat shields a surface area larger than expected from static structures, highlighting the importance of structural dynamics in epitope mapping. the ongoing covid- pandemic, caused by the sars-cov- coronavirus, has emerged as the most challenging global health crisis within a century [ ] . as such, the development of vaccines and antiviral drugs effective against sars-cov- is absolutely urgent. as for other enveloped viruses [ ] , the primary vaccine target is the trimeric spike (s) protein in the envelope of sars-cov- . s mediates viral entry into the target cell [ ] [ ] [ ] [ ] [ ] . after binding to the human angiotensin-converting enzyme (ace ) receptor, the ectodomain of s undergoes a drastic transition from a prefusion to a postfusion conformation. this transition drives the fusion between viral and host membranes, which triggers internalization of sars-cov- via endocytic and possibly non-endocytic pathways [ , ] . locking the conformation of s or blocking its interaction with ace would prevent cell entry and infection. however, the dense glycan coat of s effectively shields the virus from an immune response and hinders pharmacological targeting. a detailed understanding of the exposed viral surface is instrumental in vaccine design [ ] . 
thanks to the extraordinary response of the global scientific community, we already have atomistic structures of s [ , , , ] and detailed views of the viral envelope [ ] [ ] [ ] [ ] . however, static structures do not capture conformational changes of s or the motion of the highly dynamic glycans covering it. molecular dynamics (md) simulations add a dynamic picture of s and its glycan protective shield [ ] [ ] [ ] . intriguingly, amaro's group predicted that glycans also play a crucial role in the infection mechanism [ ] . recent experiments validated these results [ ] , confirming the potential of accurate atomistic models. here, we report on the ~ µs-long md simulation of a full-length atomistic model of four s trimers in the prefusion conformation, giving us in aggregate ~ µs of s dynamics. the model includes the transmembrane domain (tmd) embedded in a realistic membrane, along with realistic post-translational modification patterns, i.e., glycosylation of the ectodomain and palmitoylation of the tmd. although independently developed, our s protein model and its structural dynamics are in quantitative agreement with recent high-resolution electron cryo-tomography (cryoet) images [ ] . we identify possible immunogenic epitopes on sars-cov- s by combining information on steric accessibility and structural flexibility with bioinformatic assessments of sequence conservation and epitope characteristics. we recover known epitopes in the ace receptor-binding domain (rbd) and identify several epitope candidates on the spike surface that are exposed, structured, and conserved in sequence. in particular, target sites for antibodies emerge in the functionally important s domain harboring the fusion machinery. we propose the structural domains presenting these epitopes as possible immunogens. full-length molecular model of sars-cov- s glycoprotein.
our simulation system contained four membrane-embedded sars-cov- s proteins assembled from resolved structures where available and models for the missing parts (si appendix, fig. s ). the spike head was modeled based on a recently determined structure (pdb id: vsb [ ] ) with one rbd domain in an open conformation and glycans modeled according to [ ] . the stalk connecting the s head to the membrane was modeled de novo as trimeric coiled coils, consistent with an experimental structure of the hr domain in sars-cov s (pdb id: fxp [ ] ). the tmd as well as the cytosolic domain were modeled de novo. see si appendix, supplementary methods for further details and si appendix, fig. s for a view of the final model. molecular dynamics simulations. we assembled four membrane-embedded full-length s proteins to form one large membrane patch with proteins spaced at about nm distance [ , ] , totaling˜ . million atoms in the system. we performed md simulations of the four s proteins for . µs in the npt ensemble with gromacs . [ ] . we used the charmm m protein and glycan force fields [ ] [ ] [ ] , in combination with the tip p water model, and sodium and chloride ions ( mm). rigidity analysis. we quantified the local rigidity in terms of rmsf values. for each frame, the c α atoms of residues within Å of the residue of interest were rigid-body aligned to the average structure. the rmsf values were then averaged over the c α atoms, weighted by the relative surface area of each residue [ ] . these flexibility profiles were averaged over the four spike copies and three chains. the local rigidity was then defined as the reciprocal of the flexibility. accessibility analysis. the accessibility of the s protein surfaces was probed by illuminating the simulation protein in diffuse light, as detailed below, and by rigid body docking of the fab of the antibody cr [ ] , as detailed in the si appendix. 
for the illumination analysis, rays of random orientation emanate from a half-sphere with radius nm around the center of mass of the protein. they are absorbed by the first heavy atom they pass within . Å. simulation structures collected at ns intervals were each probed with rays. to quantify the effect of glycosylation, the analysis was performed with and without including the glycan shield. the fraction of rays absorbed was used as one measure of accessibility, and possible contact with an fab (si appendix) as the other. sequence variability analysis. to estimate the evolutionary variability of the s protein, we analyzed the aligned amino acid sequences released by the gisaid initiative on may (https://www.gisaid.org/). we first built the consensus sequence with the most common amino acid (the mode) at each position across the whole data set. we then kept only amino acid long sequences, and filtered out corrupted sequences by discarding those having a hamming distance from the consensus larger than . . with the remaining , sequences, we estimated the conservation at each position [ ] . our conservation score is defined as the normalized difference between the maximum possible entropy and the entropy of the observed amino acid distribution at a given position, $\mathrm{cons}(i) = 1 + \sum_k p_k(i) \log p_k(i) / \log 20$, where $p_k(i)$ is the probability of observing amino acid $k$ at position $i$ in the sequence and $\log 20$ is the maximum entropy for the 20 standard amino acids. sequence-based epitope predictions. we estimated the epitope probability using the bepipred . webserver (http://www.cbs.dtu.dk/services/bepipred/), with an epitope threshold of . [ ] . bepipred . uses a random forest model trained on known epitope-antibody complexes. consensus score for epitope prediction. we integrated the information of the different analyses into the consensus epitope score. we first applied a d gaussian filter with σ = Å to the ray and docking scores.
we then mapped each score to the interval [ , ], with outliers mapped to the extremes listed in si appendix, table s . finally, we multiplied the individual scores together to obtain the consensus score, which was also mapped to [ , ]. as basis for our search for possible epitopes, we constructed a detailed structural model of glycosylated fulllength s. whereas high-resolution structures of the s head are available [ , ] , the stalk and membrane anchor have so far not been resolved at atomic level. moreover, the glycosylation partially resolved in the s head structures may differ from that under infection conditions because of its passage through an intact golgi in the expression system [ ] . we built a model of the complete s by combining experimental structural data and bioinformatic predictions. our full-length model of the s trimer consists of the large ectodomain (residues - ) forming the head, two coiled-coil domains, denoted cc (residues - ) and hr (residues - ), forming the stalk, the α-helical tmd (residues - ) with flanking amphipathic helices ( - ) and multiple palmitoylated cysteines, and a short c-terminal domain (residues - ). the model fits high-resolution cryoet electron density data of s proteins on the surface of virions extracted from a culture of infected cells remarkably well [ ] . we performed a˜ µs long atomistic md simulation of a viral membrane patch with four flexible s proteins, embedded at a distance of about nm [ , ] (fig. ) . during the simulation, the four s proteins remained folded and anchored in the membrane with wellseparated tmds. the s heads tilted dynamically and interacted with their neighbors (si appendix, movie s ). high-resolution cryoet images [ ] and a recent md study [ ] independently revealed significant head tilting associated with flexing of the joints in the stalk, in strong support of our observations. being highly mobile, the glycans on the surface of s cover most of its surface ( fig. a-c ) . 
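The entropy-based conservation score and the multiplicative consensus score described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the normalization by the logarithm of 20 assumes the standard amino acid alphabet, the `low` and `high` bounds passed to `rescale` stand in for the outlier limits listed in the SI appendix (not reproduced here), and the Gaussian smoothing step is omitted. All function names are hypothetical.

```python
import math
from collections import Counter

N_AMINO_ACIDS = 20  # standard amino acid alphabet (assumed normalization)


def conservation_scores(sequences):
    """Per-position conservation, 1 - H(i) / log(20), with H the Shannon entropy."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in sequences)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
        scores.append(1.0 - entropy / math.log(N_AMINO_ACIDS))
    return scores


def rescale(values, low, high):
    """Map values linearly to [0, 1]; values beyond low/high go to the extremes."""
    return [min(1.0, max(0.0, (v - low) / (high - low))) for v in values]


def consensus_score(score_tracks):
    """Multiply rescaled per-residue scores, so one low feature vetoes a site."""
    return [math.prod(column) for column in zip(*score_tracks)]
```

Taking the product rather than the mean is the design choice that matters here: a residue that is perfectly conserved but inaccessible receives a consensus score near zero, whereas an average would still rank it fairly highly. The text also maps the final product back to the unit interval, which could be done with another call to `rescale`.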
antibody binding sites predicted from accessibility, rigidity, sequence conservation, and sequence signature accessibility of the s ectodomain. antibody binding requires at least transient access to epitopes. the general accessibility of s on the viral membrane and the surface coverage by glycans was extracted from the md simulations by ( ) ray and ( ) antigen-binding fragment (fab) docking analyses. in the ray analysis, we illuminated the protein model by diffuse light; in the fab docking analysis, we performed rigid body monte carlo simulations of s and the sars-cov- antibody cr fab to determine the steric accessibility to an antibody fab. to account for protein and glycan mobility, we performed both analyses individually for × snapshots taken at ns time intervals from the . µs md simulation with four glycosylated s proteins. the glycan shield dramatically reduces the accessibility to the s surface (fig. a,b ). ray accessibility time traces show that the accessibility of a given site varies considerably in time (si appendix, fig. s ). even though glycans cover only a small fraction of the protein surface at a given moment (fig. ) , their high mobility leads to a strong effective shielding of s (fig. ) . a comparison of the fab docking results for glycosylated and unglycosylated s further illustrates this effect (si appendix, fig. s ). ray and docking analyses show that glycans cause a reduction in accessibility by about % and %, respectively. the most marked effect occurs in the hr coiled coil close to the membrane. without glycosylation, hr is fully accessible; with glycosylation, hr becomes inaccessible to fab docking (fig. a,b ). whereas small molecules may interact with the hr protein stalk, antibodies are blocked from surface access, in agreement with recent simulations by casalino et al. [ ] . rigidity of s. structured epitopes are expected to bind strongly and specifically to antibodies. 
by contrast, mobile regions tend to become structured in the bound state, entailing a loss in entropy and may not retain their structure when presented in a vaccine construct. with the aim of eliciting a robust immune response, we chose to include rigidity in our epitope score. here, we are less concerned with the large-scale conformational dynamics associated with the flexible hinges in the stalk and membrane anchor, as analyzed in another paper [ ] . instead, we concentrate on motions of domains on the scale of about nm. for this, we determined the rootmean-square fluctuations (rmsf) locally by superimposing local protein regions and converting the rmsf into a rigidity score, as described in methods. the surface of s presents both dynamic and rigid regions (fig. c ) . interestingly, the rbd and its surrounding are comparably flexible, consistent with the experimental finding of large differences in the structure of the three peptide chains in open and closed states [ ] . by contrast, the protein surface of the s domain covering the fusion machinery is relatively rigid (fig. c ) , possibly to safeguard this functionally critical domain in the metastable prefusion conformation. sequence conservation. targeting epitopes whose sequences are highly conserved will ensure efficacy across strains and prevent the virus from escaping immune pressure through mutations with minimal fitness penalty. we estimated the sequence conservation from the naturally occurring variations at each amino acid position in the sequences collected and curated by the gisaid initiative (https://www.gisaid.org/). the analysis of , amino acid sequences revealed that s is highly conserved, with no mutation recorded for % of the amino acid positions. as conservation score, we mapped the entropy at each position to the interval between zero and one (see methods). even surface regions are mostly well conserved in sequence (fig. d) . sequence-based immunogenicity predictor. 
conserved, rigid, and accessible regions present good candidates for binding of protein partners in general. to complement this information, we assessed the immunogenic potential based on sequence signatures targeted by antibodies. the epitope-like motifs in the s sequence identified using the bepipred . server [ ] lie scattered across the s ectodomain and include known epitopes ( fig. e and fig. e ), but also contain buried regions inaccessible to antibodies. consensus epitope score. we combined our accessibility, rigidity, conservation, and immunogenicity scores into a single consensus epitope score (figs. f and f ) . by taking the product of all individual scores, we ensured that epitope candidates have high scores in all features. this rigorous requirement eliminates many candidate sites, mostly because accessibility scores (fig. a,b ) and the rigidity score (fig. c ) show opposite trends, in line with the extensive occurrence of flexible loops on the s surface. using our consensus score, we identified nine epitope candidates (e -e ; fig. and table i ). epitope candidates e -e recover known epitopes (fig. f ,g and fig. f ) , achieving residue-level accuracy in some cases (si appendix, fig. s ); in addition, we identify epitope candidates e , e , and e -e . all epitope candidates reside in the structured head of s. by contrast, low accessibility and high flexibility in the hinges [ ] give the stalk low overall epitope scores. validation through identification of known epitopes. even though sars-cov- was identified only a few months ago, several groups have already reported on antibodies binding the sars-cov- s protein [ , [ ] [ ] [ ] [ ] [ ] [ ] . most notably, yuan and co-workers structurally characterized the binding of sars-cov-neutralizing antibody cr to the sars-cov- s protein ectodomain [ , , ] . 
their structure reveals an epitope distal to the ace binding site that requires at least two of the s protomers to be in the open conformation to permit binding without steric clash. interestingly, while our simulations do not probe the doubly open configuration, the epitope reported by yuan et al. [ ] is still successfully identified with a significant consensus score. moreover, epitopes for other reported antibodies h [ ] , cb [ ] , p b- f [ ] , s [ ] , and a [ ] also match regions of high consensus score. in particular, our candidate epitopes e and e overlap with the reported binding sites in the rbd for neutralizing antibodies. we conclude that our epitope-identification methodology is robust. dependence on detailed glycosylation pattern. the complexity and variability of s glycosylation in situ remains poorly understood. mass spectrometry on recombinant s confirmed its extensive glycosylation [ ] . cryoet images of intact viral particles revealed branching points in the glycans [ ] , indicative of complex glycans. we addressed this uncertainty by repeating our docking accessibility analysis for different glycosylation patterns. we considered pruned (mannose-only) glycans by removing fucose, sialic acid, and galactose. remarkably, this pruned glycan shield impedes fab accessibility almost as effectively (∼ %) as the full shield (∼ %), even if epitopes e and e become somewhat more exposed with trimmed glycans (si appendix, fig. s ). overall, we conclude that even a light glycan coverage strongly reduces the antibody accessibility of the protein. structural and dynamic characteristics of candidate epitopes. epitopes e -e are part of the n-terminal domain, which is formed mostly of antiparallel β sheets (residues - ). all three epitopes include flexible loops and folded β strands (si appendix, fig. s a ). epitope e is located on a two-turn α-helix flanked by a short twin α-helix and lying on a five-strand antiparallel β-sheet.
this arrangement provides the epitope with remarkable stability (si appendix, fig. s b ). epitopes e and e are located on the apical part of s in the rbd, and are composed mostly of flexible loops. e and e jointly span a contiguous surface in chain a, which is in the open conformation. by contrast, in the closed chains b and c, this surface is altered and e is buried (si appendix, fig. s c ). the epitope e is part of a stable helix that connects neighboring β-sheets (si appendix, fig. s d ). e comprises two quite long and flexible loops (residues - and - ), and two shorter and less flexible ones (si appendix, fig. s e ). finally, e is located on a short and flexible loop (si appendix, fig. s f ). guidance for immunogen and vaccine design. having identified accessible, relatively structured, and conserved epitope sequences, one of the challenges is to present these epitopes in an immunogenic manner to induce a robust antibody response. the structure of sars-cov- s comprises distinct domains with residue numbers (a) - , (b) - , and (c) - . domains a and b encompass the predicted epitopes e , e and e , respectively, and e is contained within domain b. we speculate that these domains may fold independently, possibly after suitable sequence redesigns, and could thus be used to present the epitopes faithfully. glycans as epitopes. there have been reports of glycan-mediated antibody binding to sars-cov- s [ ] and to hiv- env [ ] . while this could open up possibilities for epitope binding, the natural variability of the glycan shield [ ] , along with its extensive structural dynamics demonstrated here, currently preclude a systematic search for glycan-involving epitopes. moreover, with human and viral proteins carrying chemically equivalent glycan coats, the risk of autoreaction is significant [ ] . therefore, we concentrated here on amino acid epitopes. 
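the consensus epitope score described above (the product of the accessibility, rigidity, conservation, and immunogenicity scores) can be illustrated with a minimal sketch. the text states only that the individual scores are multiplied; the min-max rescaling of each score to the range from 0 to 1, the function names, and the toy per-residue values below are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def min_max_normalize(values):
    """Rescale a per-residue score to [0, 1] (assumed normalization).

    A constant profile carries no information and is mapped to zeros.
    """
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0.0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def consensus_epitope_score(accessibility, rigidity, conservation, immunogenicity):
    """Multiply the normalized per-residue scores.

    Because the scores are multiplied, a residue scores high only if
    every individual feature is high; one low feature suppresses it.
    """
    features = [accessibility, rigidity, conservation, immunogenicity]
    normalized = np.vstack([min_max_normalize(f) for f in features])
    return normalized.prod(axis=0)


# Toy values for five residues (hypothetical numbers, not data from the paper).
accessibility = [0.0, 5.0, 10.0, 10.0, 2.0]
rigidity = [1.0, 0.5, 1.0, 0.0, 0.8]
conservation = [1.0, 1.0, 1.0, 1.0, 0.0]
immunogenicity = [0.2, 0.9, 1.0, 1.0, 0.6]

score = consensus_epitope_score(accessibility, rigidity, conservation, immunogenicity)
best_residue = int(np.argmax(score))
```

in this toy profile the fourth residue is as accessible as the third but fully flexible, so its product vanishes. this mirrors the observation that accessibility and rigidity show opposite trends and that requiring both eliminates many candidate sites.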
we identified epitope candidates on the sars-cov- s protein surface by combining accurate atomistic modelling, multi-microsecond md simulations, and a range of bioinformatics and analysis methods. we concentrated on sites that are accessible to antibodies, unencumbered by the glycan shield, and fairly rigid. we also required these sites to be conserved in sequence and to display signatures expected to elicit an immune response. from all these features, we determined a combined consensus epitope score that enabled us to predict nine distinct epitope sites. validating our methodology, we recovered five epitopes that overlap with experimentally characterized epitopes, including a "cryptic" site [ ] . highly dynamic glycans shield a large fraction of the s surface. even though the instantaneous surface coverage of the glycans is low, the long-time average density of a few well-placed glycans covers most of the protein surface. in particular, only three n-glycosylation sites per protein chain suffice to shield the stalk domain and block antibody binding to this functionally critical part of the protein. new and conflicting reports emerge on the glycan types on the s surface [ , ] , with glycan composition possibly varying from host to host. we considered both light and heavy glycan coverages in our analysis, which should encompass most of the glycan variability. both extremes show that glycosylation strongly protects s from interactions with antibodies. the different epitopes we predicted are the starting point for engineering stable immunogenic constructs that robustly elicit the production of antibodies. a fragment-based epitope presentation avoids the many challenges of working with full-length s, a multimeric and highly dynamic membrane protein, whose prefusion structure is likely metastable. epitopes e , e , e , and e are particularly promising candidates.
they are located on distinct s domains that could fold independently and present these epitopes in a native-like manner. additionally, epitopes that are distributed on the surface of s will make the onset of resistance due to mutations less likely. the approach we introduced in this paper could be extended to predict epitopes from an integrated analysis of diverse betacoronaviruses, with the ultimate aim of producing a universal vaccine that guarantees broad protection against the whole virus family. full-length molecular model of sars-cov- s glycoprotein. the modeling procedure of the full-length sars-cov- s glycoprotein is outlined in fig. s . we based our model of the sars-cov- s /s s domain on a recently determined structure (pdb id: vsb [ ] ). we added missing loops using modeller [ ] . we modeled the stalk connecting the s head to the membrane as two distinct coiled coils (ccs, henceforth denoted cc and hr ) based on cc predictions [ , ] . cc and hr at positions - and - are predicted with low and high confidence, respectively. however, since the n-terminal ends of the three helices in cc have been resolved in the experimental structures [ , ] , we modeled both segments as trimeric ccs with ccbuilder [ ] , using the heptad repeat register prediction of [ ] and generously extending all termini by several residues to prevent destabilization of the ccs from solvation effects at the termini. thus, the first model of cc comprised residues - , while hr comprised residues - . we then performed µs-long md simulations of the solvated cc and hr models individually with procedures and parameter settings as described below. in cc and hr , residues - and - retained stable cc structures, respectively (fig. s ) . the cc structures of snapshots at contrary to some observations [ ] , the complete glycosylation pattern including heavy glycosylation of the stalk seems to reflect better the situation in situ [ ] . modeling of the transmembrane domain. 
lacking a structure for the s transmembrane domain (tmd), we used a hierarchical procedure to model the tmd trimer. secondary structure predictions revealed that the tmd is likely to be formed of two helical stretches with a long transmembrane helix (residues - ), followed by a shorter c-terminal helix (residues - ) with features of an amphipathic helix. the remaining c-terminal residues were predicted as disordered. we hypothesized that the c-terminal helix extends to k and encompasses all cysteine residues, leaving a total of disordered residues at the c-terminus. we used a manually curated sequence alignment to build a homology model of the s protein tmd trimer helical core (residues - ) with modeller [ ] . we palmitoylated all cysteines, inserted the trimer into a lipid bilayer (see below and table s ), and relaxed the system using molecular dynamics (md; see parameters below) for µs, to properly equilibrate the relative orientation of the protomers. separately, we built an l-shaped tmd monomer model by appending the c-terminal helix (residues - , modeled as an ideal α-helix) to the tmd core helix (residues - ). the c-terminal helix was oriented such that all cysteines pointed into the hydrophobic core of the membrane. the five residues connecting the tmd and c-terminal helix, as well as the c-terminal residues were modeled as unstructured loops using modeller [ ] . all cysteines were palmitoylated and the monomer was inserted into a lipid bilayer, then relaxed by molecular dynamics for µs for proper positioning of the c-terminal helix with respect to the lipid head groups. finally, a tmd trimer model was obtained by structurally fitting the relaxed l-shaped monomer onto the relaxed transmembrane trimer. in two out of three monomers, this resulted in an outwardpointing, clash-free c-terminal helix. in the third monomer, the c-terminal helix was manually rotated around the z-axis to relieve clashes. assembly of full-length s model. 
a full-length model of s was built by manually matching the separate structural domains using pymol [ ] , and then building missing connecting residues as unstructured linkers with modeller [ ] . membrane lipid composition. coronaviruses like mers-cov and sars-cov are assembled in the endoplasmic reticulum (er) [ ] . we therefore modeled the viral envelope with an er-like composition [ ] as detailed in table s . the transmembrane domain structures described above were inserted into the er-like membrane using charmm-gui [ ] [ ] [ ] [ ] [ ] . molecular dynamics simulations. molecular dynamics simulations were performed with gromacs . [ ] , using the charmm m protein and glycan force field [ ] [ ] [ ] , in combination with the tip p water model [ ] . ion parameters were those of luo and roux [ ] . after energy minimization using the steepest descent algorithm for steps, the system was equilibrated in the nvt ensemble for ps with a time step of fs, followed by ps with a time step of fs. in the equilibration runs, the berendsen thermostat [ ] was used for temperature coupling, with the coupling constant τ = ps. after ps, we used the parrinello-rahman barostat [ ] to apply semi-isotropic pressure coupling, using τ = ps and compressibility . × − bar − . lincs constraints [ ] were applied to all bonds involving hydrogen atoms, allowing us to use a fs integration timestep for equilibration. during equilibration, restraints on positions and dihedrals were gradually decreased from kj mol − nm − to . due to the large system size, we adopted specific strategies to enhance the simulation speed during production. we used an integration timestep of fs. all hydrogen masses were doubled, corresponding to deuterium, to avoid instabilities from high-frequency vibrations. cutoffs for non-bonded interactions were set to nm. in addition, temperature control was switched to the v-rescale thermostat [ ] .
we used mdbenchmark to perform scaling studies and determine the optimal hardware configuration and run settings (mpi ranks/openmp threads) [ ] . rigid-body docking. we probed the steric accessibility for antibody binding using rigid-body docking. the fab of antibody cr (pdb id: w [ ] ) was used for a coarse-grained rigid-body monte carlo (mc) docking analysis, following the procedure described in [ ] . backbone cα atoms were recorded every ns of the md simulation. each snapshot was centered in a . nm × . nm × nm orthorhombic simulation box. the fab was subjected to × translation and rotation mc moves, recorded every moves. in a first step, we probed the steric accessibility of the protein surface without glycans using rigid-body mc simulations at high temperature (t = k). contacts between the complementarity-determining region of the fab (heavy chain residues - , - , - ; and light chain residues - , - , - ) and s were then counted based on a distance criterion of twice the sum of van der waals (vdw) radii of the amino acids involved in the contact (with radius definitions following [ ] ). in a second step, we assessed the influence of glycans on the steric surface coverage by excluding all snapshots in which the fab clashed with glycans. the full glycans and a mannose-only ("pruned") version of the glycans were considered. every sugar residue of a glycan was represented by a pseudoparticle positioned at the residue center of mass. the effective vdw radius of this sugar bead was estimated from the sugar residue radius of gyration and found to be roughly equal to the vdw radius of an alanine residue, as defined in [ ] . a distance cutoff of the sum of fab residue vdw radius and glycan (≈ alanine) vdw radius was used to determine clashes. evaluation of the accessibility reduction due to the glycan shield.
we quantified the glycan coverage by comparing global accessibility (to rays or to the rigid, coarse-grained fab) of the s surface with full glycans and without glycans. first, the global accessibility was computed as the sum over all residues of the numbers of hits for a given probing method and glycosylation pattern. then, we considered the ratio of global accessibility with glycans over global accessibility without glycans. finally, the relative accessibility reduction due to glycan coverage was taken as the complement of this global accessibility ratio (relative accessibility reduction = 1 − global accessibility ratio).
the proximal origin of sars-cov- common features of enveloped viruses and implications for immunogen design for next-generation vaccines structures and mechanisms of viral membrane fusion proteins: multiple variations on a common theme viral membrane fusion. virology - ready, set, fuse! the coronavirus spike protein and acquisition of fusion competence structure, function, and antigenicity of the sars-cov- spike glycoprotein cryo-em structure of the -ncov spike in the prefusion conformation structural basis of receptor recognition by sars-cov- host cell proteases: critical determinants of coronavirus tropism and pathogenesis rational vaccine design in the time of covid- vulnerabilities in coronavirus glycan shields despite extensive glycosylation structural basis for the recognition of the sars-cov- by full-length human ace structures, conformations and distributions of sars-cov- spike protein trimers on intact virions sars-cov- structure and replication characterized by in situ cryo-electron tomography in situ structural analysis of sars-cov- spike reveals flexibility mediated by three hinges a molecular pore spans the double membrane of the coronavirus replication organelle developing a fully-glycosylated full-length sars-cov- spike protein model in a viral membrane shielding and beyond: the roles of glycans in sars-cov- spike protein
citizen scientists create an exascale computer to combat covid- glycans on the sars-cov- spike control the receptor binding domain conformation site-specific glycan analysis of the sars-cov- spike solution structure of the severe acute respiratory syndrome-coronavirus heptad repeat domain in the prefusion state supramolecular architecture of the coronavirus particle in advances in virus research, coronaviruses architecture of the sars coronavirus prefusion spike gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers charmm m: an improved force field for folded and intrinsically disordered proteins charmm additive all-atom force field for glycosidic linkages between hexopyranoses charmm-gui glycan modeler for modeling and simulation of carbohydrates and glycoconjugates maximum allowed solvent accessibilites of residues in proteins a highly conserved cryptic epitope in the receptor binding domains of sars-cov- and sars-cov sequence logos: a new way to display consensus sequences bepipred- . 
: improving sequence-based b-cell epitope prediction using conformational epitopes a cryptic site of vulnerability on the receptor binding domain of the sars-cov- spike glycoprotein structural basis for neutralization of sars-cov- and sars-cov by a potent therapeutic antibody a human neutralizing antibody targets the receptor binding site of sars-cov- human neutralizing antibodies elicited by sars-cov- infection an alpaca nanobody neutralizes sars-cov- by blocking receptor interaction cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody human monoclonal antibody combination against sars coronavirus: synergy and coverage of escape mutants a neutralizing human antibody binds to the n-terminal domain of the spike protein of sars-cov- structure of hiv- gp v /v domain with broadly neutralizing antibody pg deducing the n-and o-glycosylation profile of the spike protein of novel coronavirus sars-cov- impact of the glycosylation pattern on steric antibody accessibility. for each chain, the number of monte carlo rigid-body docking hits without glycans, with pruned glycans and with full glycans is shown a rolling average over a residue window was applied for legibility. epitopes that undergo significant accessibility increases upon glycan pruning (e and e ) are indicated schematic illustration of the strategy used to obtain an atomistic model of the full-length s protein. for clarity, we do not show the solvent and membrane atomistic model of the full-length membrane-embedded s protein shown in cartoon representation. the chains are differentiated by color. palmitoylated cysteine residues are shown in pink licorice (only one chain shown for clarity). glycans are shown in green licorice representation rigid-body-aligned simulation structure of the hr coiled-coil (residues - , blue) and sars-cov hr nuclear-magnetic-resonance solution structure fxp [ ] (residues - , orange). beads. 
water and ions are omitted for clarity structure, function, and antigenicity of the sars-cov- spike glycoprotein comparative protein structure modeling using modeller predicting coiled coils from protein sequences logicoil-multi-state prediction of coiled-coil oligomeric state cryo-em structure of the -ncov spike in the prefusion conformation ccbuilder . : powerful and accessible coiled-coil modeling site-specific glycan analysis of the sars-cov- spike in situ structural analysis of sars-cov- spike reveals flexibility mediated by three hinges deducing the n-and o-glycosylation profile of the spike protein of novel coronavirus sars-cov- the pymol molecular graphics system the molecular biology of coronaviruses in advances in virus research the ins and outs of endoplasmic reticulum-controlled lipid biosynthesis automated builder and database of protein/membrane complexes for molecular dynamics simulations charmm-gui: a web-based graphical user interface for charmm charmm-gui membrane builder for mixed bilayers and its application to yeast membranes charmm-gui membrane builder toward realistic biological membrane simulations openmm simulations using the charmm additive force field gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers charmm m: an improved force field for folded and intrinsically disordered proteins charmm additive all-atom force field for glycosidic linkages between hexopyranoses charmm-gui glycan modeler for modeling and simulation of carbohydrates and glycoconjugates comparison of simple potential functions for simulating liquid water simulation of osmotic pressure in concentrated aqueous salt solutions molecular dynamics with coupling to an external bath polymorphic transitions in single crystals: a new molecular dynamics method lincs: a linear constraint solver for molecular simulations canonical sampling through velocity rescaling mdbenchmark: a toolkit to optimize the performance of 
molecular dynamics simulations a highly conserved cryptic epitope in the receptor binding domains of sars-cov- and sars-cov coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding solution structure of the severe acute respiratory syndrome-coronavirus heptad repeat domain in the prefusion state we thank martin beck, beata turoňová, and philipp s. schmalhorst for stimulating discussions, the max planck society for generous support, the max planck computing and data facility for providing computational resources, and the leibniz supercomputing centre munich for the superspike computing allocation. r.c. acknowledges the support by the frankfurt institute for advanced studies. m.s. acknowl- key: cord- -mymndjvd authors: higuchi, yusuke; suzuki, tatsuya; arimori, takao; ikemura, nariko; kirita, yuhei; ohgitani, eriko; mazda, osam; motooka, daisuke; nakamura, shota; matsuura, yoshiharu; matoba, satoaki; okamoto, toru; takagi, junichi; hoshino, atsushi title: high affinity modified ace receptors prevent sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mymndjvd the sars-cov- spike protein binds to the human angiotensin-converting enzyme (ace ) receptor via receptor binding domain (rbd) to enter into the cell. inhibiting this interaction is a main approach to block sars-cov- infection and it is required to have high affinity to rbd independently of viral mutation for effective protection. to this end, we engineered ace to enhance the affinity with directed evolution in human cells. three cycles of random mutation and cell sorting achieved more than -fold higher affinity to rbd than wild-type ace . the extracellular domain of modified ace fused to the fc region of the human immunoglobulin igg had stable structure and neutralized sars-cov- pseudotyped lentivirus and authentic virus with more than -fold lower concentration than wild-type. 
engineering ace decoy receptors with directed evolution is a promising approach to develop a sars-cov- neutralizing drug that has affinity comparable to monoclonal antibodies yet displaying resistance to escape mutations of virus. coronavirus disease has spread across the world as a tremendous pandemic and presented an unprecedented challenge to human society. the causative agent of covid- , sars-cov- is a single-stranded positive-strand rna virus that belongs to lineage b, clade of the betacoronavirus genus - . the virus binds to host cells through its trimeric spike glycoprotein composed of two subunits; s is responsible for receptor binding and s for membrane fusion . angiotensin-converting enzyme (ace ) is lineage b clade specific receptor including sars-cov- . the receptor binding domain (rbd) of s subunit directly binds ace with high affinity, therefore, it is the most important targeting site to inhibit viral infection. in fact, the rbd is the common binding site of effective neutralizing antibodies identified from convalescent patients [ ] [ ] [ ] . rna viruses such as sars-cov- have high mutation rates , which are correlated with high evolvability including the acquisition of anti-viral drug resistance. neutralizing antibodies are one of the promising approaches to combat covid- . accumulating evidence demonstrated that monoclonal antibodies isolated from convalescent covid- patients have high potency in neutralizing viruses. however, mutations in the spike gene can lead to the sars-cov- adaptation to such neutralizing antibodies. in the replicating sars-cov- pseudovirus culture experiment, escape mutation was observed against monoclonal antibody as early as in the first passage and evasion was seen even against the polyclonal convalescent plasma . 
notably, some mutations identified in the in vitro replicating culture experiment are present in the natural population according to the database. similarly to the anti-rbd antibodies, the extracellular domain of ace , soluble ace (sace ), can also be used to neutralize sars-cov- as a decoy receptor. the therapeutic potency was confirmed using human organoids , and apeiron biologics is now conducting a european phase ii clinical trial of recombinant sace against covid- . in addition, fusing sace to the fc region of the human igg has been shown to enhance neutralization capacity as well as to improve the pharmacokinetics to the level of igg in mice . most importantly, sace has a great advantage over antibodies because of its resistance to escape mutations. a virus carrying an escape mutation from sace should have limited binding affinity to cell surface native ace receptors, leading to diminished or eliminated virulence. unfortunately, many reports, including our current study, have revealed that the binding affinity of wild-type sace to the sars-cov- spike rbd is much weaker (kd ~ nm) than that of clinical grade antibodies , , [ ] [ ] [ ] . thus, the therapeutic potential of the wild-type sace as a neutralizing agent against sars-cov- is uncertain. here we conducted protein engineering with human cell-based directed evolution to improve the binding affinity of ace to the spike rbd. random mutations were introduced in the protease domain containing the interface to the rbd, then the full-length ace mutant library was expressed in t cells and incubated with fluorescence-labelled rbd. the high-binding population was sorted and its dna extracted, and the bulk of this dna was further randomly mutagenized for the next cycle of selection. three cycles of screening resulted in the identification of mutant ace clones with more than -fold higher binding affinity to the rbd and lower half-maximal inhibitory concentration (ic ) for sars-cov- pseudotyped lentivirus as well as authentic virus.
the present protein engineering system generates a virus-neutralizing drug that has high affinity comparable with antibodies and can resolve the issue of drug resistance caused by escape mutation. we engineered ace to bind the rbd of the sars-cov- spike protein with the combination of surface display of mutagenized library and fluorescence-activated cell sorting (facs) to perform the evolution in t human cells. the protease domain (pd) of ace is known to harbor the interface to viral spike protein, located in the top-middle part of ace ectodomain. in this study, ace residues - and - , referred to as pd and pd , respectively, were mutagenized independently. synthetic signal sequence and ha tag were appended and restriction sites were introduced in both sides of pd and pd by optimizing codon (fig. a) . we used error-prone pcr to mutagenize the protease domain of ace with an average of about one amino acid mutation per bp, then inserted the fragment into the introduced restriction site by homologous recombination. the reaction sample was transformed to competent cell, generating a library of ~ mutants. mutant plasmid library was packaged into lentivirus, followed by expression in human t cells in less than . moi (multiplicity of infection) to yield no more than one mutant ace per cell. cells were incubated with recombinant rbd of sars-cov- spike protein fused to superfolder gfp (sfgfp; fig. b) . we confirmed the level of bound rbd-sfgfp and surface expression levels of ha-tagged ace with alexa fluor in twodimensional display of flow cytometry. top . % cells showing higher binding relative to expression level were harvested from ~ x cells by facs. to exclude the structurally unstable mutants, cells with preserved signal of surface ace were gated. genomic dna was extracted from collected cells and mutagenized again to proceed to the next cycle of screening (fig. c) . random mutagenesis screening for pd was performed times and mutated sequences from top . 
% population were reconstructed into the backbone plasmid, and expressed in t cells individually. one to three hundred clones were validated for the binding capacity to the rbd-sfgfp. as the selection cycle advances, the two-dimensional distribution of library cells in flowcytometry became broader and higher in rbd-binding signal, and individual clone validation identified several mutants with higher binding capacity (fig. a) . to evaluate the neutralization activity in the form of sace , we first generated fusion protein of the soluble extracellular domain of mutagenized ace residues - and sfgfp (sace -sfgfp) and used them to compete with the cell-surface wt ace for the rbd binding. to this end, concentration of each mutant sace -sfgfp in the cultured medium from transfected cells was quantitatively standardized with sfgfp signal, serially diluted, preincubated with rbd-sfgfp for min, and then transferred to wild-type ace expressing t cells. after min, the rbd-bound cells were analyzed in flowcytometry. higher neutralization activity against the rbd was confirmed for the mutants that have accumulated mutations (fig. b , table. s ). second mutagenesis based on the top hit of first screening, - mutant, was also performed, but the distribution of the library cells did not expand so much, and we could not isolate clones with significantly higher affinity than the bulk of top . % (fig. s ). we next performed pd mutagenesis in both the bulk of top . % and one of the highest mutants of the rd library, the clone n . again, the binding distribution of the pd library cells was similar to the basal cells, suggesting the inability of this strategy to further increase the affinity to rbd. a recent study reported, via deep mutational scanning, that several specific mutations in pd were enriched in high rbd-binding clones . when we added these mutations in n , it did not improve the capacity of the rbd neutralization further ( fig. s ). 
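the library transduction described above was carried out at low moi so that each cell carries no more than one mutant ace . a minimal sketch of the reasoning, assuming that lentiviral integrations per cell follow a poisson distribution with mean equal to the moi (a standard simplification that is not stated in the text), estimates how many transduced cells would still carry two or more mutants. the function name and the moi values used below are hypothetical and are not the value used in the study.

```python
import math


def fraction_multiply_transduced(moi):
    """Among transduced cells, the fraction with two or more integrations.

    Assumes integrations per cell are Poisson distributed with mean `moi`.
    """
    if moi <= 0:
        raise ValueError("moi must be positive")
    p_zero = math.exp(-moi)        # cells with no integration
    p_one = moi * math.exp(-moi)   # cells with exactly one integration
    p_transduced = 1.0 - p_zero    # cells with at least one integration
    return (p_transduced - p_one) / p_transduced


# Hypothetical MOI values chosen only to show the trend.
for moi in (0.05, 0.1, 0.3, 1.0):
    print(f"moi={moi}: {fraction_multiply_transduced(moi):.3f}")
```

under this model the fraction of displaying cells that express more than one variant grows roughly as half the moi when the moi is small, which is why a low moi keeps the link between the sorted phenotype and a single mutant sequence.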
to identify essential mutation(s) in the high affinity ace mutants, each mutation was altered to wild-type in mutants n , j and j . mutant n contains mutations of a v, k e, k n, e k, n i, l f, and n h. among them, individual back-mutation of v , n , k , and f to wild-type residues resulted in modest to severe reduction of the rbd-neutralization capacity (fig. a) , while multiple back mutations of e , i , and h in various combinations did not alter the high activity of the original n ( fig. b) , indicating that a v, k n, e k, and l f were the necessary and sufficient components. in the case of mutant j , which is composed of k m, e k, q r, s f, l f, and n d, a similar back mutation experiment revealed that k m, e k, and q r were essential (fig. c) . simultaneous back mutations of two (f /f and f /d ) but not three (f /f /d ) nonessential residues were tolerated ( fig. d) , suggesting that l f and n d may exert their positive effect on the activity only when they coexist. the third mutant j has t i, a v, h a, t r, t q, and q h. single and multiple back mutation experiments showed that h a, t q, and q h were essential in securing the high inhibitory activity of j (figs. e, f). these top high affinity mutants exhibited higher rbd-neutralization activity when compared with the same sace scaffold carrying the two high affinity mutation sets reported recently , (fig. g ). the first mutant library was also sorted into populations with high and low rbd-sfgfp binding signal (fig. s a ). the affinity value of each mutant was defined as the ratio of high and low read counts. then, the impact of each amino acid mutation on rbd binding was analyzed as a semi-deep mutational scanning (dms) (fig. s b ). this analysis revealed that some mutations such as k n, e k, and q r in the top high affinity mutants had only a very mild impact on their own.
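the affinity value used in this semi-deep mutational scanning can be sketched as follows. the text defines it from the high and low read counts of each mutant, which the methods normalize by total counts before taking a log ratio, and then aggregates the values of all mutants carrying a given amino acid mutation. the log base, the pseudocount that guards against zero counts, the use of the mean for aggregation, and the placeholder mutation labels are assumptions made for illustration.

```python
import math
from collections import defaultdict


def affinity_values(high_counts, low_counts, pseudocount=1.0):
    """Per-mutant log2 ratio of normalized high-gate to low-gate read counts.

    Keys are tuples of mutation labels; the pseudocount (an assumption)
    keeps the ratio finite when a mutant is missing from one gate.
    """
    mutants = set(high_counts) | set(low_counts)
    high_total = sum(high_counts.values()) + pseudocount * len(mutants)
    low_total = sum(low_counts.values()) + pseudocount * len(mutants)
    values = {}
    for mutant in mutants:
        high_frac = (high_counts.get(mutant, 0) + pseudocount) / high_total
        low_frac = (low_counts.get(mutant, 0) + pseudocount) / low_total
        values[mutant] = math.log2(high_frac / low_frac)
    return values


def aggregate_by_mutation(values):
    """Mean affinity value over all mutants that carry each mutation."""
    per_mutation = defaultdict(list)
    for mutant, value in values.items():
        for mutation in mutant:
            per_mutation[mutation].append(value)
    return {m: sum(v) / len(v) for m, v in per_mutation.items()}


# Placeholder read counts (hypothetical; "mutA" and "mutB" are not real substitutions).
high = {("mutA",): 80, ("mutA", "mutB"): 15, ("mutB",): 5}
low = {("mutA",): 10, ("mutA", "mutB"): 30, ("mutB",): 60}

per_mutation = aggregate_by_mutation(affinity_values(high, low))
```

because each mutation is scored by averaging over every clone that contains it, a substitution that helps only in a particular combination can receive a modest value on its own, which is consistent with the mild individual impact reported for some mutations in the top clones.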
simple combination of high value mutations, a v, k t, q l, l v and t k, referred to here as the dms mutant, and its derivatives showed less neutralization activity than top three high affinity mutants, indicating that each mutation works coordinately in high affinity mutants (fig. h ). next, we characterized the binding affinities of the mutant sace s for spike rbd using surface plasmon resonance, where igg -fc fused rbd was immobilized as a ligand and the association and dissociation kinetics of his-tagged sace were determined. the kd value of wild type sace was . nm, whereas those of mutants j and n were determined to be . nm and . nm (fig. a) . analytical size exclusion chromatography (sec) showed no signs of protein aggregation in the ace mutant samples, confirming that the apparent high affinity was not caused by the avidity effect and the observed kd values represent genuine : affinity toward rbd (fig. s a ). recombinant soluble ace (rsace ) was reported to have a fast clearance rate in human blood with a half-life of hours , . recently, it was demonstrated that a rsace fused with a fc fragment show high stability in mice as well as higher neutralization activity toward both pseudotyped and authentic sars-cov- in cultured cells . we formulated our high affinity mutant sace s as fc fusion (sace -fc) and found that the purified proteins were folded well and devoid of aggregation, showing solution behavior indistinguishable from wild type protein (fig. s b) . to evaluate their efficacy in neutralizing sars-cov- infections, affinity-enhanced sace -fc mutants were assayed for viral neutralization against pseudotyped lentivirus and authentic sars-cov- . the ic values of wild-type, j , and n for pseudovirus neutralization in ace -expressing t cells were . , . , and . μg/ml, respectively. in the same way, n neutralized pseudovirus very efficiently with an ic value more than times lower than the wild-type in tmprss -expressing veroe cells, (fig. b) . 
most importantly, when the neutralization potential against the authentic sars-cov- in tmprss expressing veroe cells was evaluated, wild-type sace -fc showed no efficiency even at μg/ml, whereas n sace -fc demonstrated significant neutralizing effect in . μg/ml (fig. c) . sars-cov- neutralization is one of the preventative or therapeutic approaches against covid- . monoclonal antibodies have become one of the common drug modalities, especially as therapeutics against autoimmune diseases and cancer. as virus-neutralizing antibodies, palivizumab is clinically used to prevent hospitalization from respiratory syncytial virus infection in high-risk infants , and cocktail of monoclonal antibodies has been shown to reduce mortality from ebola virus disease . engineered recombinant decoy receptor drugs are also developed to neutralize various cytokines including vascular endothelial growth factor, tumor necrosis factor alpha, and ctla- and approved for orbital vascular diseases and rheumatoid arthritis. recombinant sace or sace -fc fusion protein has potency to neutralize sars-cov- , , however its modest binding affinity requires higher dose than monoclonal antibody. we developed the screening system based on the cycle of random mutation and sorting of high affinity population in t cells followed by validation of neutralizing activity in a soluble form. in this screening, an additional random mutation was induced in the bulk of sorted mutants, which worked better than mutagenesis in the top mutant. engineering of decoy receptors with improved affinity was previously reported for cancer-related molecules and ace , , . they used yeast display system to perform directed evolution. large scale library (~ mutants) was prepared and high affinity mutants were identified by repeating sorting from initial library. fast growth rate of yeast is suitable for library screening involving repeated sorting and propagation. we on the other hand employed human cells for the display purpose. 
since post-translational modification can modulate protein binding affinity, human cell-based screening is better suited both to understanding the impact of ace variants on viral affinity and to proceeding to biologics development. repeating mutagenesis after cell sorting without propagation enabled us to conduct screening with a relatively small library (~ mutants) in human cells. during validation, we noticed that a high-affinity pattern in the flow cytometry assay of full-length ace binding rbd-sfgfp did not always correlate with neutralization activity. thus, experimental validation of each mutation at the level of sace protein was important for efficient identification of high-affinity mutants. our mutant ace s have affinity comparable to typical anti-spike monoclonal antibodies, but they also offer some advantages over antibodies when considered as drug candidates. the interface of ace with the rbd is larger than that of antibodies, which potentially increases efficacy. an escape mutation against modified ace is likely to result in lower affinity for the native receptor, making such a virus much less virulent. sars-cov- also enters host cells via endocytosis: infection is mediated not only by tmprss family proteases but also by cathepsin l, which is catalytically active at ph . - . . some antibodies are susceptible to impaired affinity at lower ph, leading to lower viral neutralization. high-affinity modified ace fused with fc is therefore a promising strategy to neutralize sars-cov- . the time frame for running one cycle of mutagenesis and sorting was just one week in our system, and we succeeded in developing optimized mutants in a couple of months without depending on patient-derived cells or tissues. thus, our system can rapidly generate therapeutic candidates against various viral diseases and may be well suited to fighting future viral pandemics.
lenti-x t cells were purchased from clontech and cultured at °c with % co in dulbecco's modified eagle's medium (dmem, wako) containing % fetal bovine serum (gibco) and penicillin/streptomycin ( u/ml, invitrogen). veroe /tmprss cells were a gift from national institutes of biomedical innovation, health and nutrition (japan) and cultured at °c with % co in dmem (wako) containing % fetal bovine serum (gibco) and penicillin/streptomycin ( u/ml, invitrogen). all the cell lines were routinely tested negative for mycoplasma contamination. for a semi-deep mutational scanning of ace residues - , % of ha positive cells with the highest and % with the lowest gfp fluorescence were also collected in st mutated library (fig. s a) and their genomic dna was extracted by nucleospin tissue (takara). ace residues - plasmid sequence was amplified with primers containing adaptor and barcode sequence to perform deep sequencing on the illumina miseq platform using nt paired-end protocol. data were analyzed as follows; high and low gating read count of each mutant was normalized with total counts and log ratio of high/low was defined as affinity value. then, each amino acid mutation-containing mutant affinity values were aggregated. . the pcdna to ha-ace plasmid was transfected into t cells ( ng dna per ml of culture kinetic binding measurement using biacore (spr) the binding kinetics of sace (wild-type or mutants) to rbd were analyzed by spr using a biacore pseudotyped reporter virus assays were conducted as previously described . a plasmid coding sars-cov- spike was obtained from addgene # , and deletion mutant cΔ (with amino acids deleted from the c terminus) was cloned into pcdna to (invitrogen) to enhance virus titer . spike-pseudovirus with a luciferase reporter gene was prepared by transfecting plasmids (cΔ , pspax , and plenti firefly) into lentix- t cells with lipofectamine (invitrogen). after hours, supernatants were harvested, filtered with a . 
μm low protein-binding filter (sfca), and frozen at - °c. ace -expressing t cells were seeded at , cells per well in -well plate. pseudovirus and three-fold dilution series of sace -fc protein were incubated for hour, then this mixture was administered to ace -expressing t cells. after hour pre-incubation, medium was vero-tmprss were seeded on well plates ( , cells/well) and incubated for overnight. the culture supernatants serially diluted by medium were inoculated and incubated for hours. culture medium was removed, fresh medium containing % methylcellulose ( . ml) was added, and the culture was further incubated for days. the cells were fixed with % paraformaldehyde phosphate buffer solution (nacalai tesque) and plaques were visualized by using a crystal violet. table. s amino acid sequence and rbd neutralization activity value of validated mutants. the value of rbd neutralization activity was calculated as -log concentration of % rbd-sfgfp bound competing relative to n .
we would like to thank sho hashimoto, toshiyuki nishiji, tomohiro hino, and keiko tamura- key: cord- - q oopaz authors: dobaño, carlota; vidal, marta; santano, rebeca; jiménez, alfons; chi, jordi; barrios, diana; ruiz-olalla, gemma; melero, natalia rodrigo; carolis, carlo; parras, daniel; serra, pau; de aguirre, paula martínez; carmona-torre, francisco; reina, gabriel; santamaria, pere; mayor, alfredo; garcía-basteiro, alberto; izquierdo, luis; aguilar, ruth; moncunill, gemma
title: highly sensitive and specific multiplex antibody assays to quantify immunoglobulins m, a and g against sars-cov- antigens date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q oopaz reliable serological tests are required to determine the prevalence of antibodies against sars-cov- antigens and to characterise immunity to the disease in order to address key knowledge gaps in the context of the covid- pandemic. quantitative suspension array technology (qsat) assays based on the xmap luminex platform overcome the limitations of rapid diagnostic tests and elisa with their higher precision, dynamic range, throughput, miniaturization, cost-efficacy and multiplexing capacity. we developed three qsat assays to detect igm, iga and igg to a panel of eight sars-cov- antigens including spike (s), nucleoprotein (n) and membrane (m) protein constructs. the assays were optimized to minimize processing time and maximize the signal to noise ratio. we evaluated the performance of the assays using plasmas obtained before the covid- pandemic (negative controls) and plasmas from individuals with sars-cov- diagnosis (positive controls), of whom were asymptomatic, had mild symptoms and were hospitalized. pre-existing igg antibodies recognizing n, m and s proteins were detected in negative controls, suggestive of cross-reactivity with common cold coronaviruses. the best performing antibody isotype/antigen signatures had specificities of % and sensitivities of . % at ≥ days since the onset of symptoms and . % at ≥ days since the onset of symptoms, with auc of . and . , respectively. combining multiple antibody markers as assessed by qsat assays offers the highest efficiency, breadth and versatility for accurately detecting low-level antibody responses and obtaining reliable data on the prevalence of exposure to novel pathogens in a population. our assays will provide insights into the antibody correlates of immunity required for vaccine development to combat pandemics such as covid- .
in a globalized world where emerging infectious diseases of broad distribution can put at stake the health and economy of millions of people, there is a need for versatile and reliable serological tools that can be readily applicable (i) to determine the seroprevalence of antibodies against any new pathogen and, more importantly, (ii) to characterise immunity to the disease at the individual and community levels. in the case of the covid- pandemic caused by sars-cov- , one of the main priorities since the beginning of the epidemics in china by the end of ( ) was to ascertain the percentage of the population that had been exposed to the virus, considering that a considerable number of people could have been asymptomatic ( ) ( ) . the lack of sensitive and specific serological tests early in the covid- pandemic delayed the precise estimation of the burden of infection for the rational implementation of public health measures to control viral spread ( ) . furthermore, immunological assays that can measure a high breadth of antibody types and specificities are needed to dissect which are the naturally acquired protective responses and identify correlates of immunity ( ) . additionally, when a vaccine becomes available, such assays would be valuable to evaluate immunogenicity of candidate vaccines and monitor duration of immunity at the population level ( ) . common tools for antibody studies are (i) rapid diagnostic tests (rdt) as point of care (poc) devices that usually measure either total immunoglobulins or igg and igm, qualitatively ( ), (ii) traditional enzyme-linked immunosorbent assays (elisa) ( ) that can quantify different isotypes and subclasses of antibodies against single antigens at a time, and that require certain previous expertise, personnel and equipment, and (iii) chemiluminescent assays (clia), widely used in clinical practice, faster and with higher throughput than elisa ( ) . 
the performance characteristics of the commercial kits available in the early months of the covid- pandemic were questionable ( ) , while external evaluations validating their reliability and accuracy were not published. a number of in-house elisa assays have also been developed in hospital and research laboratories ( ) , but they have the limitations that (i) a relatively large amount of sample is required, (ii) the large surface area of the individual microplate wells and the hydrophobic binding of capture antibody can lead to non-specific binding and increased background, and (iii) most elisas rely upon enzyme-mediated amplification of signal in order to achieve reasonable sensitivity ( ). an alternative technique that offers the benefits of elisa but also a larger dynamic range of antibody quantification and higher sensitivity ( ) ( ) is based on the xmap luminex® platform (www.luminexcorp.com/bibliography). secondary antibodies are labelled with fluorescent phycoerythrin (pe) directly or with biotin that mediates binding to streptavidin-rphycoerythrin (sape), which does not depend on an additional reaction. the technique has the added value of higher throughput (up to -well plate format), increased flexibility, and lower cost with the same workflow as elisa, particularly if using magnetic magplex® microspheres. paramagnetic beads allow for automation of workflow and better reproducibility compared to the previous generation of microplex® microspheres. since the beads have the capture antigen immobilized on their much smaller surface area compared to a -well microplate well, reduced sample volumes are required and non-specific binding is diminished ( ) . furthermore, a chief advantage over elisa is the multiplex nature of the assay that allows measuring antibodies to different antigens simultaneously. 
because the human antibody response is heterogeneous, this increases the probability of detecting a positive antibody response and therefore gives a higher sensitivity for identifying seropositive individuals. the luminex technology, capable of simultaneously measuring antibodies against (magpix®), (luminex / ®) and up to different antigens (flexmap d®), is an invaluable tool for antigen and epitope screening. finally, the ease with which adapted antigen panels can be set up makes luminex an excellent platform for better preparedness and a faster response to future emerging diseases and pandemics. here we report on the establishment and validation of three quantitative suspension array technology (qsat) assays to measure igm, iga and igg antibodies against eight sars-cov- antigens, based on the adaptation of previous in-house protocols that measured antibodies to other infectious diseases, including malaria ( )( )( ) ( ) . because of the need to process a large number of samples with minimal time and cost during a pandemic such as covid- , we optimized several conditions to reduce the duration of the assays and report them here for the three main isotypes that have proved useful for seroprevalence studies ( ) . positive samples were plasmas from individuals with a confirmed past/current diagnosis of covid- . one hundred and eleven had sars-cov- infection confirmed by real-time reverse-transcriptase polymerase chain reaction (rrt-pcr). fifty-five were recruited in a study of health care workers in hospital clínic in barcelona, most of them with mild symptoms, of them hospitalized and without symptoms, all rrt-pcr positive ( ) .
fiftyseven were covid- patients recruited at the clínica universidad de navarra in pamplona (spain), of which had severe symptoms and were hospitalized and had mild symptoms (one clinically diagnosed with positive radiology and serology, and negative rrt-pcr); were asymptomatic health workers with positive diagnosis confirmed by four serological tests but no rrt-pcr data. time since onset of symptoms ranged from to days. positive samples were used individually or as pools of up to samples depending on the tests. for optimization tests, only a subset of samples were used. negative controls were plasmas from healthy european donors collected before the covid- pandemic, and were used individually. numbers of positive and negative samples were in line with protocol recommendations from the foundation for innovative new diagnostics (find). ethics. samples analyzed in this study received ethical clearance for immunological evaluation and/or inclusion as controls in immunoassays, and the protocols and informed consent forms were approved by the institutional review board (irb) at hcb (refs. ceic- and hcb/ / ) or universidad de navarra (ref. un/ / ) prior to study implementation. the receptor-binding domain (rbd) of the spike (s) glycoprotein of sars-cov- , the leading vaccine candidate target, was selected as the primary antigen to develop the initial qsat assay because (i) s is one of the most immunogenic surface proteins together with the nucleocapsid protein (n) ( ) (ii) rbd is the fragment of the virus that mediates binding to the host receptor ace in the lung cells ( ) (iii) antibodies to rbd correlate with neutralizing antibodies ( )( ) that could be associated with protection based on studies of other coronaviruses and animal models ( ) ( ) ( ) ( ) , and (iv) an elisa based on this same protein has received fda approval for covid- serology ( ) . the rbd was from the krammer lab different test concentrations of protein antigens were coupled to magnetic magplex . 
µm cooh-microspheres from luminex corporation (austin, tx) in reactions of a maximum of , beads, at , beads/µl ( ) . first, beads were washed twice with . µl of distilled water using a magnetic separator (life technologies, d), and resuspended in µl of activation buffer, mm monobasic sodium phosphate (sigma, s ), ph . we compared the performance of the assays when a subset of positive and negative plasma samples were incubated at different dilutions with the antigen-beads for h or h at rt in relation to our previous protocol on at ºc. antigen-coupled beads, initially including rbd singleplex, were added to a -well µclear® flat bottom plate (greiner bio-one, ) at beads/well in a volume of µl/well pbs-bn. next, individual positive plasma samples (range of dilutions tested from / to / ) and individual negative controls (at the same dilutions as the positive samples), were added per plate in a final volume of µl per well. two blank control wells with beads in pbs-bn were set up in each plate to control for background signal. plates were incubated on a microplate shaker at rpm and protected from light, and then washed three times with µl/well of pbs-tween . %, using a magnetic manual washer (millipore, . for more accurate igm measurements, we tested whether diluting samples : with gullsorb™ igg inactivation reagent (meridian bioscience™) prior to testing for igm levels could reduce high responses observed in some negative samples ( ) . additionally, we tested the levels of rbd and s antibodies obtained at different plasma dilutions when incubated in a multiplex panel with additional antigenbeads including s , s , m and n constructs, compared to those obtained in singleplex, to check for potential interferences. finally, since viral proteins have diverse immunogenicity, definitive plasma dilutions were established with titration experiments in individual positive and negative samples once the final multiplex antigen panel and all assay conditions had been selected. 
we compared the performance of the assays when using biotinylated secondary antibodies days (top markers), ≥ days (top markers) and ≥ days (top markers). for each model, we calculated the auc and selected three seropositivity cutoffs aiming at specificities of i) %, ii) ≥ % and < %, and iii) ≥ % and < %, and obtained the corresponding sensitivity. models with % specificity and the highest sensitivities were selected for roc curve representations. the analysis was carried out using the statistical software r studio version r- . . ( ) (packages used: randomforest ( ) and proc ( )). the characteristics of sars-cov- infected participants whose plasma samples have been used in the study, with regards to age, sex, days since rrt-pcr diagnosis and days since onset of symptoms, are included in table s . the optimal amount of protein to be coupled to beads depended on the antigen and needs to be tested with each new lot. among the concentrations tested ( , or µg/ml protein), titration curves did not usually change substantially, in which case the lower concentration was chosen for the subsequent experiments. an illustrative example is shown for which the medium concentration was slightly superior when tested for igg and igm and thus selected ( figure s ). plasmodium falciparum antigens were based on on incubations at ºc ( ) . for the covid-incubations at ºc versus shorter times at rt. we tested that the range of dilutions was still adequate when reducing the incubation time (figure a ) and compared antibody levels and number of seropositive samples incubating on at ºc versus h rt at / ( figure b) . although the mfi readings in positive samples generally diminished with shorter times, the mfi readings in the negative samples also reduced, i.e. the signal to noise ratio was the same or sometimes better, maintaining or increasing the overall proportion of seropositive among the positive samples and thus the sensitivity. 
based on these data, we adopted the h incubation time for an initial covid- seroprevalence study ( ) . we subsequently tested shorter incubations more extensively and found that h was non-inferior to h incubation (figure c), and thus h was selected for the optimized sop. reduction of background in igm assay. treatment with gullsorb™ reduced or did not change the mfi signal, depending on the sample, antigen and dilution (figure s a). this additional incubation generally increased the signal to noise ratio, and thus the sensitivity and the number of seropositive igm responses among the positive controls, particularly at the lower dilutions; the gullsorb™ incubation was therefore adopted for this assay (figure s b). igm reactivity in negative controls was lower against s-based antigens than against m- or n-based antigens, and thus gullsorb™ treatment benefited the signal to noise ratio the most for these latter proteins. singleplex versus multiplex antigen testing. multiplexing the antigens ( -plex panel) did not significantly decrease the mfi antibody levels to rbd or s compared to singleplex testing (figure a), nor to any of the other antigens (figure s a). interestingly, there was no evidence of any interference between rbd, s, s or s antigens despite their sharing epitopes within the same multiplex panel. a number of negative pre-pandemic samples had pre-existing antibodies recognizing sars-cov- proteins for certain isotypes and dilutions (figure s a): igg to s , s , m and n constructs, and iga to s and n-term & c-term of n. furthermore, testing plasmas against multiple antigens increased the sensitivity of the assay, since some individuals who were seronegative or low responders to rbd responded with higher antibody levels to s (figure b).
once the multiplex antigen panel was established, a set of positive and negative samples was tested at different dilutions covering the diverse immunogenicity of the proteins, and / and / were selected for the assay performance evaluation (figures c and s b). secondary antibodies conjugated to pe performed as well as a two-step secondary antibody conjugated to biotin followed by sape incubation (figure ). the pe-antibody reagent, which resulted in a shorter assay, was selected as the preferred option. finally, min incubation was non-inferior to min incubation (figure s ). we sought the combination of ig and antigen responses that yielded the highest specificity (primarily), sensitivity and auc to detect seropositive responses. for rbd and s, igg and iga at / dilution, and igm responses at / , gave higher percentages of seropositive responses among the positive controls and thus were selected for the calculations; for n constructs, igg and iga performed better at / , except for n c-term, for which igg was better at / . antibodies to m, s (igg & iga) and n n-term (igm) did not discriminate well between positive and negative responses and were not included in the random forest (rf) models. the contribution of each antibody/antigen variable was ranked according to an rf algorithm at different periods since onset of symptoms (figure s ) and the top - variables were selected. we performed rf for all combinations of these variables and assessed the sensitivity of each combination at three different seropositivity thresholds aiming at specificities of %, % and %. the specificity of the qsat assays in samples from participants with sars-cov- positive diagnosis with ≥ days since the onset of symptoms (n= ) was up to % with sensitivity up to . %, and auc up to . , for the best combinations of ig isotypes/antigens. the top performing antibody signatures for three different seropositivity thresholds targeting specificities of %, % and % are shown in table , and their roc curves are shown in figure .
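the threshold step described above (choose a seropositivity cutoff that targets a given specificity in the negative controls, then read off the sensitivity in the positive controls) can be sketched as follows. this is a minimal illustration in python, not the authors' r code: the function names, the scores and the convention that a sample is called seropositive when its score is strictly above the cutoff are all assumptions made for the example. the scores could be, for instance, predicted probabilities from a classifier such as a random forest.

```python
import math


def cutoff_for_specificity(negative_scores, target_specificity):
    """Return the lowest observed negative score that, used as a cutoff,
    gives at least the target specificity (a sample is called positive
    only when its score is strictly greater than the cutoff)."""
    if not negative_scores:
        raise ValueError("at least one negative control is required")
    if not 0.0 < target_specificity <= 1.0:
        raise ValueError("target_specificity must be in (0, 1]")
    ordered = sorted(negative_scores)
    # Number of negatives that must fall at or below the cutoff.
    # The small epsilon guards against floating-point overshoot in the product.
    k = max(1, math.ceil(target_specificity * len(ordered) - 1e-9))
    return ordered[k - 1]


def specificity(negative_scores, cutoff):
    """Fraction of negative controls correctly called negative."""
    return sum(s <= cutoff for s in negative_scores) / len(negative_scores)


def sensitivity(positive_scores, cutoff):
    """Fraction of positive controls correctly called positive."""
    return sum(s > cutoff for s in positive_scores) / len(positive_scores)


# Hypothetical scores, invented for illustration only (not data from the study).
negative_scores = [0.05, 0.10, 0.12, 0.20, 0.30, 0.08, 0.15, 0.02, 0.25, 0.40]
positive_scores = [0.35, 0.90, 0.80, 0.45, 0.95, 0.28, 0.70, 0.60]

for target in (1.0, 0.9):
    cutoff = cutoff_for_specificity(negative_scores, target)
    print(target, cutoff,
          specificity(negative_scores, cutoff),
          sensitivity(positive_scores, cutoff))
```

in this sketch, lowering the target specificity can only lower or keep the cutoff, so the sensitivity can only stay the same or rise, which is the trade-off the three thresholds are meant to expose.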
in samples from participants with ≥ days since the onset of symptoms (n= ), the specificity was up to % and the sensitivity up to . %, with auc up to . for the best combinations of ig isotypes/antigens (table , figure ). in samples from all participants regardless of time since symptoms onset (n= ), the specificity was up to % and the sensitivity up to . %, depending on the combinations of ig isotypes/antigens, with auc up to . for the best combinations (table s , figure ). the performance of the qsat assays to predict positivity was clearly superior using combinations of multiple ig isotypes/antigens to using single isotype/antigen markers ( figure ). higher sensitivities were obtained when specificities were set to % or % ( tables , & s ), reaching % for samples ≥ or ≥ days since the onset of symptoms. we developed three novel multiplex immunoassays for quantifying igm, iga and igg to eight sars-cov- protein constructs and evaluated by machine learning classification algorithms the performance of several isotype/antigen combinations to detect any positive antibody response to infection, obtaining specificities of % and sensitivities of . % (≥ days since symptoms onset) or . % (≥ days since symptoms onset), and very high predictability (auc ≥ . ). our qsat assays, based on the xmap technology, provide the best precision, accuracy and widest range of detection compared to classical qualitative (rdt) or quantitative (elisa) assays. for any given test, there is usually a trade-off between sensitivity and specificity. to evaluate the performance of the assays here, we prioritized specificity over sensitivity for the implications that false positives may pose at a personal level and the impact that specificity has in seroprevalence studies. particularly when prevalence of infection is low, the positive predictive value of a test strongly relies on a high specificity. 
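the dependence of the positive predictive value (ppv) on specificity follows directly from bayes' rule: ppv = (prevalence × sensitivity) / (prevalence × sensitivity + (1 − prevalence) × (1 − specificity)). a minimal python sketch of this relationship is given below; the function name and the input values are hypothetical and chosen only to illustrate the effect at low prevalence. they are not the figures used in this study.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV = TP / (TP + FP), with all inputs given as proportions in [0, 1]."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    if true_positives + false_positives == 0.0:
        raise ValueError("no positive results expected; PPV is undefined")
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario (assumed values): 5% prevalence, 90% sensitivity,
# and three candidate specificities.
for spec in (1.00, 0.99, 0.95):
    ppv = positive_predictive_value(prevalence=0.05, sensitivity=0.90, specificity=spec)
    print(f"specificity={spec:.2f} -> ppv={ppv:.3f}")
```

because the false-positive term is weighted by (1 − prevalence), even a small loss of specificity dominates the denominator when few people in the population are truly infected.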
for example, in a scenario of % prevalence and % sensitivity, the positive predictive value of the test decreases from % to %, with a reduction in specificity from % to %. however, other seropositivity thresholds can be used to obtain a balanced specificity/sensitivity or to maximise sensitivity. a time period after the onset of symptoms is usually established for these analyses, because antibodies take on average - days after infection to be produced and detected, depending on the isotype and test (igm - days, iga - days, igg - days) ( )( )( )( )( ) ( ) . thus, antibodies are not necessarily expected to be detected in individuals who are acutely infected and diagnosed around the time of plasma collection. accordingly, when considering all samples, which included and individuals with less than and days since onset of symptoms, respectively, sensitivity was lower (up to . %) at specificities of %. however, we detected igm or iga as early as days, and igg as early as day, from onset of symptoms. in fact, since samples were collected in the early days of the covid- pandemic, it is expected that igm and iga, which are induced upon primary infection earlier than igg, could contribute to a higher sensitivity of detection. most of the best signatures identified included igm and iga besides igg, regardless of the time period since onset of symptoms, including beyond days. however, over time, the only antibodies expected to remain in blood are igg, owing to the decay of igm and iga; e.g. igm levels may become undetectable by the fifth week after symptoms onset ( ) . therefore, at longer times since infection, serological assays to detect the maintenance of antibodies could focus on igg detection. the superior performance of the qsat assays is partly based on direct fluorescence detection as opposed to colorimetric detection mediated by an enzyme.
also, antigens are covalently coupled to beads as opposed to passively coated onto elisa plates, leading to a higher density of antigen per surface area and less antigen wash-off during the assay. the higher background of elisa microplates is related to the fact that they have a much larger surface area than the combined area of the microspheres, which makes them more prone to the binding of non-specific antibodies if blocking is not performed correctly ( ) . the sensitivities and specificities of other sars-cov- serological assays externally validated with > positive and > negative samples (as recommended by find protocols), some of them approved by the usa fda, are summarized in table s ( ) ( ) . while luminex assays generally have high correlation to elisas in singleplex (r ~ . ) ( ), it is important that the assays perform equally well in multiplex format, with no interference noted between antigens, even if they have overlapping epitopes. a key value of multiplexing is that it captures a wider breadth of responses, which is needed because some individuals may not respond to one antigen (e.g. rbd) but may do so to other antigens (e.g. s or n proteins) ( )( ) ( ) . here, we substantially increased the sensitivity of the assay when combining isotypes/antigens compared to using only one isotype/antigen. the addition of n was more beneficial for detecting seropositive responses when the onset of symptoms was recent, as this antigen is the most abundant and immunogenic, and specific antibodies appear to be elicited earlier ( ) . in contrast, combinations of s antigens seemed to be sufficient to detect seropositive responses with longer periods since the onset of symptoms. an added advantage of multiplexing is the reduced usage of sample volume, resources and time when antibodies to several antigens are to be evaluated.
the possibility of performing miniaturized assays on small amounts of blood is very attractive in paediatric studies, in large field surveys where fingerprick sampling may be more logistically feasible, and for testing special tissues of interest including mucosal fluids. these combined advantages have a direct impact on the cost-efficacy of the qsat assay, which is overall cheaper than rdt or elisa assays. the cost of the xmap assay can be less than one-fifth of the least expensive commercial elisa and less than one-sixteenth of the most expensive commercial kit. cost is reduced because less protein is used, owing to the smaller surface area, and smaller amounts of other materials and reagents are needed. we reduced the dilutions of plasma and titrated the secondary antibody to use the minimal amounts of samples and reagents, without compromising sensitivity. the economy of scale will improve further when the assays are adapted to the high-throughput flexmap d -well plate format, but they are also easily adaptable to the bench-top magpix -well format, which is more affordable and easy to maintain even in remote laboratory settings. interestingly, positive antibody responses to m, s , s and n antigen constructs were detected in samples collected before the covid- pandemic. the presence of such antibodies has been interpreted as cross-reactivity with antigens of coronaviruses causing the common cold ( )( ) ( ) . indeed, higher sequence homology at the protein level between sars-cov- and coronaviruses has been reported for n (particularly n-terminal and central regions), m and s ( )( ) ( ) . pre-existing sars-cov- -specific t cells have been recently reported and also attributed to cross-reactivity with human coronaviruses previously encountered ( ) ( ) .
the multiplex nature of the assay will allow this hypothesis to be tested in the future by adding antigens from the related coronaviruses e, hku , nl and oc to the same assay panel and comparing the patterns of antibody reactivity, in order to address the significance of this in immunity to sars-cov- . here, antibody responses to m were very marginal and did not contribute to higher assay sensitivity, which could partly be because the purity of the protein was not high. however, this antigen may be valuable in studies establishing the antibody correlates of protection, since at present the targets of immunity have not been elucidated. it is possible that, in addition to neutralizing antibodies directed to the rbd region of s, antibodies of other specificities with non-neutralizing functions, for example fc-mediated opsonisation and phagocytosis, could be relevant in protection. in fact, t cell responses to epitopes located on m have been detected at high frequencies ( ), and it is possible that antibodies to this or other less immunogenic antigens may also have a role in protection in some individuals. in our study, the addition of s from a commercial supplier did not have any added value, but for future versions of the assay we will test s from different sources, as this subunit is expected not to cross-react with other beta-coronaviruses and to be specific for sars-cov- diagnostics ( ) ( ). the performance of the assays was excellent, but further testing needs to be performed with longer periods of time since the onset of symptoms; we expect high specificity and sensitivity to be maintained, although antibody signatures would differ and be based on igg only. future studies will include additional positive samples from asymptomatic individuals, who probably have lower antibody levels than mild or severe cases and are rarely included in the validation of commercial kits.
in addition, it will be interesting to include negative controls reacting with other coronaviruses or other infections (e.g. malaria) and pathologies known to induce polyclonal responses or rheumatoid factor, which may increase background responses. in conclusion, we developed % specific and fast assays with possibly some of the best diagnostic characteristics reported in the published literature to assess seroprevalence of covid- . considering their high sensitivity, these qsat assays would be suited to identifying individuals with antibody levels below the lower limit of detection of rdt or the lower limit of quantification of elisa, such as asymptomatic children or immunosuppressed individuals, or individuals with long-term decaying antibodies ( ). in addition, this approach would be particularly suited to identifying hyperimmune donors with very high levels of antibodies and the largest antigenic breadth for immunotherapy. the assays are highly versatile, being easily adaptable to quantify other antibody igg and iga subclasses, avidity with the use of chaotropic agents, and even functional activity such as inhibition of binding to the virus receptor ace . the multiplex capabilities also make them ideal for sizeable peptide screenings to accelerate epitope mapping and selection for identifying the fine specificity of immune correlates of protection for vaccine development, and they would also be applicable in vaccine evaluation when the first candidates reach larger-scale phase and clinical trials. assay development and sample collection were performed with internal funds from the investigators' groups and institutions, and the performance analysis received support from
a novel coronavirus from patients with pneumonia in china covid- : four fifths of cases are asymptomatic, china figures indicate asymptomatic sars coronavirus infection among healthcare workers, singapore.
emerg infect dis the important role of serology for covid- control what policy makers need to know about covid- protective immunity immune surveillance for vaccine-preventable diseases lateral flow assays enzyme-linked immunosorbent assay (elisa) chemiluminescent immunoassay technology: what does it change in autoantibody detection? auto-immun highlights. / / serological assays for emerging coronaviruses: challenges and pitfalls luminex corporation. overcoming the cost and performance limitations of elisa with xmap(r) technology. tech note characterization and development of a luminex(®)-based assay for the detection of human il- simultaneous quantitation of cytokines using a multiplexed flow cytometric assay development of quantitative suspension array assays for six immunoglobulin isotypes and subclasses to multiple plasmodium falciparum antigens optimization of incubation conditions of plasmodium falciparum antibody multiplex assays to measure igg, igg( - ), igm and ige using standard and customized reference pools for sero-epidemiological and vaccine studies development of a high-throughput flexible quantitative suspension array assay for igg against multiple plasmodium falciparum antigens analysis of factors affecting the variability of a quantitative suspension bead array assay measuring igg to multiple plasmodium antigens seroprevalence of antibodies against sars-cov- among health care workers in a large spanish reference hospital neutralizing antibodies against sars-cov- and other human coronaviruses structural and functional basis of sars-cov- entry by using human ace neutralizing epitopes of the sars-cov s-protein cluster independent of repertoire, antigen structure or mab technology neutralizing antibodies against sars-cov- and other human coronaviruses the time course of the immune response to experimental coronavirus infection of man reinfection could not occur in sars-cov- infected rhesus macaques two year prospective study of the humoral immune 
response of patients with severe acute respiratory syndrome immunoassay for serodiagnosis of zika virus infection based on time-resolved förster resonance energy transfer r: a language and environment for statistical computing. r foundation for statistical computing classification and regression by randomforest. r news proc: an open-source package for r and s+ to analyze and compare roc curves antibody responses to sars-cov- in covid- patients: the perspective application of serological tests in clinical practice antibody responses to sars-cov- in patients of novel coronavirus disease temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by sars-cov- : an observational cohort study antibody detection and dynamic characteristics in patients with covid- profile of igg and igm antibodies against severe acute respiratory syndrome coronavirus (sars-cov- ) profiling early humoral response to diagnose novel coronavirus disease (covid- ) profile of specific antibodies to sars-cov- : the first report antibody testing for covid- diagnostic testing for severe acute respiratory syndrome-related coronavirus- : a narrative review antibody responses to individual proteins of sars coronavirus and their neutralization activities profiles of igg antibodies to nucleocapsid and spike proteins of the sars-associated coronavirus in sars patients development of an enzyme-linked immunosorbent assay-based test with a cocktail of nucleocapsid and spike proteins for detection of severe acute respiratory syndrome-associated coronavirus-specific antibody detection of nucleocapsid antibody to sars-cov- is more sensitive than antibody to spike protein in covid- patients. medrxiv analysis of serologic cross-reactivity between common human coronaviruses and sars-cov- using coronavirus antigen microarray. 
biorxiv antigenic crossreactivity between severe acute respiratory syndrome-associated coronavirus and human coronaviruses e and oc sars-cov- specific antibody responses in covid- patients beyond the spike: identification of viral targets of the antibody response to sars-cov- in covid- patients. medrxiv targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals presence of sars-cov- reactive t cells in covid- patients and healthy donors. medrxiv serological signatures of sars-cov- infection: implications for antibody-based diagnostics. medrxiv we thank the volunteers who donated blood for covid- studies and the clinical and laboratory staff who participated in the sample collection and processing. special thanks to key: cord- -tded ih authors: gilmore, kerry; zhou, yuyong; ramirez, santseharay; pham, long v.; fahnøe, ulrik; feng, shan; offersgaard, anna; trimpert, jakob; bukh, jens; osterrieder, klaus; gottwein, judith m.; seeberger, peter h. title: in vitro efficacy of artemisinin-based treatments against sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tded ih effective and affordable treatments for patients suffering from coronavirus disease (covid- ), caused by severe acute respiratory syndrome coronavirus (sars-cov- ), are needed. we report in vitro efficacy of artemisia annua extracts as well as artemisinin, artesunate, and artemether against sars-cov- . the latter two are approved active pharmaceutical ingredients of anti-malarial drugs. proof-of-concept for prophylactic efficacy of the extracts was obtained using a plaque-reduction assay in veroe cells. subsequent concentration-response studies using a high-throughput antiviral assay, based on immunostaining of sars-cov- spike glycoprotein, revealed that pretreatment and treatment with extracts, artemisinin, and artesunate inhibited sars-cov- infection of veroe cells. 
in treatment assays, artesunate ( % effective concentration (ec ): μg/ml) was more potent than the tested plant extracts ( - μg/ml), artemisinin ( μg/ml) and artemether (> μg/ml), while ec values in pretreatment assays were generally slightly higher. the selectivity index (si), calculated based on treatment and cell viability assays, was highest for artemisinin ( ), and roughly equal for the extracts ( - ), artesunate ( ) and artemether (< ). similar results were obtained in human hepatoma huh . cells. peak plasma concentrations of artesunate exceeding ec values can be achieved. clinical studies are required to further evaluate the utility of these compounds as covid- treatment. the pandemic caused by severe acute respiratory syndrome coronavirus (sars-cov- ) has been associated worldwide with over million deaths from coronavirus disease (covid- ). this febrile respiratory and systemic illness is highly contagious and in many cases life-threatening. remdesivir, the only antiviral drug with proven in vitro and clinical efficacy, was approved for treatment of covid- . still, covid- treatment remains largely supportive, with an urgent need to identify effective antivirals against sars-cov- . an attractive approach is repurposing drugs already licensed for other diseases. teas of a. annua plants have been employed to treat malaria in traditional chinese medicine, as well as in clinical trials, and are used widely in many african countries, albeit against who recommendations. artemisinin (figure , ), a sesquiterpene lactone with a peroxide moiety and one of many bioactive compounds present in a. annua, is the active ingredient used to treat malaria infections. the artemisinin derivatives artesunate (figure , ) and artemether (figure , ) exhibit improved pharmacokinetic properties and are the key active pharmaceutical ingredients (api) of who-recommended anti-malaria combination therapies used in millions of adults and children each year with few side effects. a.
annua extracts are active against different viruses, including sars-cov. therefore, we set out to determine whether a. annua extracts, as well as pure artemisinin, artesunate, and artemether, are active against sars-cov- in vitro. artemisinin-based drugs would be attractive repurposing candidates for treatment of covid- considering their excellent safety profiles in humans, and since they are readily available for worldwide distribution at a relatively low cost. a. annua plants grown from a cultivated seed line in kentucky, usa, were extracted using either absolute ethanol or distilled water at °c for minutes, as described in materials and methods and supplementary information (figure s ). for a third preparation, to protect artemisinin from degradation by reducing agents, ground coffee, a natural source of polyphenols such as chlorogenic acid (figure , ( )) that also exhibit mild antiviral activities, was coextracted with the plant material using ethanol (see supporting information). solids were removed by filtration and the solvents were evaporated. the extracted materials were dissolved in dimethylsulfoxide (dmso) (both ethanol extracts) or a dmso:water mixture ( : for the aqueous extract) and filtered (see supporting information for details). artemisinin (figure , ( )) was synthesized and purified following a published procedure. to initially screen whether extracts and pure artemisinin were active against sars-cov- , their antiviral activity was tested by pretreating veroe cells at different time points during minutes with selected concentrations of the extracts or compounds prior to infection with the first european sars-cov- isolated in münchen (sars-cov- /human/germany/bavpat / ). the virus-drug mixture was then removed and cells were overlaid with medium containing . % carboxymethylcellulose to prevent virus release into the medium. dmso was used as a negative control.
plaque numbers were determined either by indirect immunofluorescence using a mixture of antibodies to sars-cov n protein or by staining with crystal violet. the addition of either ethanolic or aqueous a. annua extracts prior to virus addition resulted in reduced plaque formation in a concentration-dependent manner, with median effective concentration (ec ) values estimated to range between and µg/ml (supplemental figure s a-c). artemisinin exhibited little antiviral activity, with an ec > µg/ml (supplemental figure s d). concentration-response experiments using the danish sars-cov- isolate sars-cov- /human/denmark/dk-ahh / were performed employing a -well plate based high-throughput antiviral assay, allowing for multiple replicates per concentration, as described in materials and methods and supplementary information (figures s and s ). seven replicates were measured at each concentration and a range of concentrations was evaluated to increase data accuracy compared to the plaque-reduction assay, which was carried out in duplicate. extracts or compounds were added to veroe cells either . h prior to infection (pretreatment (pt)) or h post infection (treatment (t)), followed by a two-day incubation of virus with extracts or compounds. both protocols yielded similar results, with slightly lower ec values observed for treatment assays. the ethanolic extracts showed similar potency: for a. annua alone ec values were µg/ml (pt) and µg/ml (t), and for a. annua with coffee ec values were µg/ml (pt) and µg/ml (t) (figures , and table ). the aqueous extract was slightly less potent, with ec values of µg/ml (pt) and µg/ml (t) (figures , and table ). with all extracts, almost complete virus inhibition was achieved at high concentrations: for the a. annua ethanolic extract at µg/ml (pt) and µg/ml (t), for the a. annua + coffee ethanolic extract at µg/ml (pt) and µg/ml (t), and for the a. annua aqueous extract at µg/ml (pt) and µg/ml (t) (figures and ).
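how a median effective concentration can be read from concentration-response data is sketched below. this is a simplified illustration that interpolates linearly on a log-concentration scale between the two tested concentrations bracketing 50% residual infectivity; it is not the curve-fitting procedure used in the study, which is not described in this section, and the function name and the toy data are hypothetical.

```python
import math


def ec50_by_interpolation(concentrations, residual_infectivity, threshold=50.0):
    """Estimate the concentration giving `threshold` % residual infectivity.

    concentrations: tested concentrations in ascending order (all > 0).
    residual_infectivity: mean % infectivity relative to untreated controls.
    Returns None when the response never crosses the threshold in the tested
    range (the value would then be reported as "> highest concentration").
    """
    points = list(zip(concentrations, residual_infectivity))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo >= threshold >= y_hi and y_lo != y_hi:
            # Fraction of the way from the lower to the higher concentration.
            frac = (y_lo - threshold) / (y_lo - y_hi)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ec50
    return None


# Hypothetical toy data (µg/ml and % residual infectivity), not study values.
toy_conc = [1.0, 10.0, 100.0, 1000.0]
toy_infectivity = [100.0, 75.0, 25.0, 0.0]

estimate = ec50_by_interpolation(toy_conc, toy_infectivity)
print(f"interpolated ec50: {estimate:.1f} µg/ml")
```

a nonlinear regression of a sigmoidal model to all replicates would use the full data set and give confidence intervals, so the interpolation above is best seen as a quick plausibility check.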
the highest evaluated concentrations used in our assays were informed by the cytotoxicity of the extracts or compounds, as only concentrations resulting in cell viability greater than % were evaluated (figures , , s and table ). cell viability assays table ). selectivity indexes (si) were determined by dividing cc by ec and revealed similar results for the a. annua ethanolic extract being (pt) and (t), the a. annua + coffee ethanolic extract being (pt) and (t) as well as the a. annua aqueous extract being (pt) and (t) ( table ). the two ethanolic extracts were diluted with dmso that by itself caused reduction of cell viability to < % when used at a : dilution, but not at dilutions ≥ : ( figure s ). thus, the cytotoxicity observed when using the extracts at relatively high concentrations was most likely not caused by dmso (figures and ). dmso at dilutions > : including the dilutions used in antiviral assays did not have antiviral effects, defined as reduction of residual infectivity to < % ( figure s ). thus, the observed antiviral effect of the tested extracts was most likely not caused by dmso. a pure coffee extract estimated to contain . -fold higher coffee concentrations than the a. annua + coffee ethanolic extract did not result in reduction of cell viability to < % at dilutions ≥ : ( figure s ). the cytotoxicity observed when using the a. annua + coffee extract at relatively high concentrations was most likely not caused by coffee (figures and ) . interestingly, coffee extract alone showed some antiviral activity at dilutions ≤ : ( figure s ). thus, the observed antiviral effect of the a. annua + coffee extract may be influenced by coffee. veroe cells. a. annua plants contain, in addition to many other bioactive compounds, artemisinin that is responsible for the potent anti-malarial activities of a. annua. 
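the selectivity index calculation described above (cc divided by ec) can be written as a short sketch. the names `cc50` and `ec50` assume that both values are median (50%) concentrations, and the numbers in the example are hypothetical, not results from this study.

```python
def selectivity_index(cc50, ec50):
    """Selectivity index: cytotoxic concentration divided by effective concentration.

    Both values must be positive and expressed in the same unit (e.g. µg/ml).
    """
    if cc50 <= 0 or ec50 <= 0:
        raise ValueError("cc50 and ec50 must be positive")
    return cc50 / ec50


# Hypothetical example: cc50 of 1000 µg/ml and ec50 of 50 µg/ml.
example_si = selectivity_index(1000.0, 50.0)
print(f"si = {example_si:.0f}")
```

when no 50% inhibition is reached at the highest concentration tested, ec can only be reported as a lower bound, and the same division then gives an upper bound for si, which corresponds to the "si <" notation used for artemether.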
to investigate whether artemisinin is the active component responsible for the antiviral activities of the plant extracts described above, the pure compound and synthetic derivatives were tested in pretreatment and treatment assays. artemisinin was found to be active in sars-cov- assays with ec µg/ml (pt) and µg/ml (t) (figures , , and table ). close to complete virus inhibition was achieved in both assays at the highest concentration evaluated in the assays, (pt) and µg/ml (t). the si for artemisinin is relatively high, (pt) and (t), based on a cc of , µg/ml (figures , , s , and table ). the observed cytotoxicity of artemisinin appeared to be at least partially caused by dmso, as cytotoxicity was only observed at drug dilutions where dmso was found to reduce cell viability (figures , , and s ). the antiviral effects observed when using artemisinin at relatively high concentrations were most likely not due to the diluent dmso (figures , , and s ). the synthetic artemisinin derivative artesunate, the api of who-recommended first-line malaria therapies with improved pharmacokinetic properties, showed the highest potency of all compounds tested, with ec being µg/ml (pt) and µg/ml (t) (figures and ). in the treatment assay, close to complete virus inhibition was achieved at the highest evaluated concentration ( µg/ml), as determined by cytotoxicity data, compared to % inhibition at this concentration in the pretreatment assay. higher artesunate concentrations were not used considering its cytotoxicity in this assay (cc : µg/ml) (figures , , s , and table ). si of (pt) and (t) were calculated ( table ). the cytotoxicity and the antiviral effects observed when using artesunate at relatively high concentrations were most likely not due to the diluent dmso (figures , , and s ). 
artemether, another artemisinin-derivative that is used globally as the active ingredient in malaria medications, did not show a significant antiviral effect at concentrations of up to µg/ml (figures and ). considering artemether´s cytotoxicity (cc of , µg/ml), an si < was calculated (figures , , s , and table ). the cytotoxicity observed when using artemether at relatively high concentrations was most likely not due to the diluent dmso table ). in huh . cells, the ec for the ethanolic a. annua extract was µg/ml, with % virus inhibition at the highest evaluated concentration ( µg/ml), as determined by cytotoxicity data; the cc was µg/ml and the si was (figures , s and table ). artemisinin showed no significant virus inhibition at the highest evaluated concentration ( µg/ml) and an si < , based on a cc of , µg/ml (figures , s and table ). in huh . cells, dmso caused reduction of cell viability to < % when used at a : dilution, but not at dilutions ≥ : ( figure s ). thus, the cytotoxicity observed when using the ethanolic extract or the pure compounds at relatively high concentrations was most likely not caused by dmso ( figure ). dmso at dilutions > : including dilutions used in antiviral assays did not have any antiviral effects ( figure s ). thus, the observed antiviral effect of the ethanolic a. annua extract and the pure compounds was most likely not caused by dmso. here, we demonstrate the in vitro efficacy of artemisinin-based treatments against sars-cov- . initially, several a. annua extracts, as well as artemisinin, were screened for antiviral activity using a plaque-reduction assay in a pretreatment setting using a german sars-cov- strain from munich. based on these findings, three a. annua extracts and pure, synthetic artemisinin, artesunate, and artemether were studied in detail to establish concentration-response curves for extracts and compounds for pretreatment and treatment settings using a danish sars-cov- strain from copenhagen. 
high-throughput antiviral assays facilitated testing of drug concentrations in multiple replicates resulting in accurate ec values. the ec values in the pretreatment setting were slightly higher than ec values determined in the treatment setting possibly because preincubation may have a negative impact on the stability of the extracts and pure compounds. generally, ec values depend on the specific assay employed. while the type of assay we used with a single treatment and subsequent incubation of virus and drug is state of the art for antiviral efficacy measurements, assay modifications, such as repeated administration of treatment, might result in slightly different ec values. since the active antiviral substance may be an artemisinin metabolite, such that the artemisinin derivatives and extracts can be considered prodrugs, we used the human huh . cell line to confirm the ec determined in veroe cells. while a. annua extracts have been considered "natural combination therapies" as they contain several bioactive compounds, the who discourages the use of non-pharmaceutical forms of artemisinin as a therapeutic option for malaria due to lack of standardization with its sourcing and preparation, implying risks of suboptimal efficacy and resistance development. in this context, it is important to note that the extracts used in this study were prepared from plants grown under optimized and standardized conditions, in a manner where concentrations of the extracted material are reproducible. interestingly, we found that coffee extracts exhibited in vitro efficacy against sars-cov- . while modelling studies suggested that ingredients in coffee such as chlorogenic acid, caffeic acid, and tannins show activity against sars-cov- , when the typically used doses of to . mg/kg intravenously were administered, reported peak plasma concentrations (cmax) were between . and . µg/ml in patients. based on these observations and our treatment data in veroe and huh . 
cells, the calculated cmax/ec values are between . and . . in animal studies following administration of a single dose of artesunate, tissue concentrations including lung, kidney, intestine, and spleen concentrations were several-fold higher than plasma concentrations. in contrast, following administration of artemisinin, artemether, and a. annua teas, cmax values between − ng/ml were reported, which is close to three orders of magnitude below ec values for sars-cov- . plasma and tissue concentrations that can be achieved with standardized a. annua extracts with high artemisinin content used in this study still have to be determined. in vivo, immunomodulatory effects of artemisinin-based treatments have been reported for this class of drugs. such effects that may involve cytokine signaling cannot be monitored in in vitro assays performed here and will have to be carefully studied in subsequent clinical evaluations. infected and nontreated as well as noninfected and nontreated control wells were included in the assay. after ± hours incubation at °c and % co , cultures were immunostained for sars-cov- spike glycoprotein and evaluated as described below. cells were fixed and virus was inactivated by immersion of plates in methanol (j.t.baker, gliwice, poland) for min. unless specified, immunostaining was done at room temperature. plates were washed twice with pbs (sigma, gillingham, uk) containing . % tween- (sigma, saint louis, missouri, usa). endogenous peroxidase activity was blocked by incubation with % h o for ten minutes followed by two washes with pbs containing . % tween- and blocking with pbs containing % bovine serum albumin (roche, mannheim, germany) and . % skim milk powder (easis, aarhus, denmark) for minutes. figure s and representative images of single wells are show in figure s . which an antiviral effect (< % residual infectivity) / cytotoxic effect (< % cell viability) due to dmso is expected according to figure s . 
the continuing -ncov epidemic threat of novel coronaviruses to global health-the latest novel coronavirus outbreak in wuhan, china importation and human-to-human transmission of a novel coronavirus in vietnam clinical features of patients infected with novel coronavirus in wuhan, china. lancet. coronavirus infections-more than just the common cold outbreak of pneumonia of unknown etiology in wuhan, china: the mystery and the miracle remdesivir for the treatment of covid- -preliminary report n engl artemisia annua and artemisia afra tea infusions vs. artesunateamodiaquine (asaq) in treating plasmodium falciparum malaria in a large scale, double blind, randomized clinical trial artemisia annua l. infusion consumed once a week reduces risk of multiple episodes of malaria: a randomized trial in a ugandan community qinghaosu (artemisinin): an antimalarial drug from china flavonoids from artemisia annua l. as antioxidants and their potential synergism with artemisinin against malaria and cancer effect of the affordable medicines facility -malaria (amfm) on the availability, price, and market share of quality-assured artemisinin-based combination therapies in seven countries: a before-and-after analysis of outlet survey data natural products as promising drug candidates for the treatment of hepatitis b and c valacyclovir combined with artesunate or rapamycin improves the outcome of herpes simplex virus encephalitis in mice compared to antiviral therapy alone antiviral activities of coffee extracts in vitro anti-hepatitis b virus activity of chlorogenic acid, quinic acid and caffeic acid in vivo and in vitro recovery of artemisinin from a complex reaction mixture using continuous chromatography and crystallization antigenic and cellular localization analysis of the severe acute respiratory syndrome coronavirus nucleocapsid protein using monoclonal antibodies the α-tif (vp ) homologue (etif) of equine herpesvirus is essential for secondary envelopment and virus egress 
efficient culture of sars-cov- in human hepatoma cells enhances viability of the virus in human lung cancer cell lines permitting the screening of antiviral compounds from ancient herb to modern drug: artemisia annua and artemisinin for cancer therapy. sem. cancer bio a natural products may interfere with sars-cov- attachment to the host cell anti-sars-cov- potential of artemisinins in vitro drug therapy in dental practice: general principles part -pharmacodynamic considerations review of the clinical pharmacokinetics of artesunate and its active metabolite dihydroartemisinin following intravenous, intramuscular, oral or rectal administration the distribution pattern of intravenous [ c] artesunate in rat tissues by quantitative whole-body autoradiography and tissue dissection techniques artemisinins and immune system highly permissive cell lines for subgenomic and genomic hepatitis c virus rna replication differential efficacy of protease inhibitors against hcv genotypes a, a, a, and a ns / a protease recombinant viruses recombinant hcv variants with ns a from genotypes - have different sensitivities to an ns a inhibitor but not interferon-. gastroent novel infectious cdna clones of hepatitis c virus genotype a (strain s ) and a (strain ed ): genetic analyses and in vivo pathogenesis studies key: cord- -kb fgfaq authors: mendonça, luiza; howe, andrew; gilchrist, james b.; sun, dapeng; knight, michael l.; zanetti-domingues, laura c.; bateman, benji; krebs, anna-sophia; chen, long; radecke, julika; sheng, yuewen; li, vivian d.; ni, tao; kounatidis, ilias; koronfel, mohamed a.; szynkiewicz, marta; harkiolaki, maria; martin-fernandez, marisa l.; james, william; zhang, peijun title: sars-cov- assembly and egress pathway revealed by correlative multi-modal multi-scale cryo-imaging date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: kb fgfaq since the outbreak of the sars-cov- pandemic, there have been intense structural studies on purified recombinant viral components and inactivated viruses. however, investigation of sars-cov- infection in the native cellular context is scarce, and comprehensive knowledge of the sars-cov- replicative cycle is lacking. understanding the genome replication, assembly and egress of sars-cov- , a multistage process that involves different cellular compartments and the activity of many viral and cellular proteins, is critically important as it bears the means of medical intervention to stop infection. here, we investigated sars-cov- replication in vero cells under the near-native frozen-hydrated condition using a unique correlative multi-modal, multi-scale cryo-imaging approach combining soft x-ray cryo-tomography and serial cryofib/sem volume imaging of the entire sars-cov- infected cell with cryo-electron tomography (cryoet) of cellular lamellae and cell periphery, as well as structure determination of viral components by subtomogram averaging. our results reveal, at the whole cell level, profound cytopathic effects of sars-cov- infection, exemplified by a large number of heterogeneous vesicles in the cytoplasm for rna synthesis and virus assembly, formation of membrane tunnels through which viruses exit, and drastic cytoplasm invasion into the nucleus. furthermore, cryoet of cell lamellae reveals how viral rnas are transported from the double-membrane vesicles where they are synthesized to viral assembly sites; how viral spikes and rnps assist in virus assembly and budding; and how fully assembled virus particles exit the cell, thus establishing a model of sars-cov- genome replication, virus assembly and egress pathways.
krios to identify each individual infected cell ( . % for moi of . and . % for moi . ) where cryoet tilt series were collected at the cell periphery. the grids were then transferred to a fib/sem dualbeam instrument and the same infected cells were subjected to either serial cryofib/sem volume imaging (zhu et al., ) or cryofib milling of cellular lamellae where additional cryoet tilt series were collected (sutton et al., ). alternatively, we imaged infected cells on cryoem grids by soft x-ray cryo-tomography figure d , movie ). we noticed that in a recent study, such cytoplasm invagination was also seen in one of the conventional em images of stained plastic sections of sars-cov- infected cells, although no description was given (lamers et al., ). independently, we investigated the cytopathic effect of sars-cov- infection using soft x-ray cryo-tomography. consistent with the cryofib/sem volume imaging results, the overview soft x-ray images display profound changes in mitochondrial morphology, as the mitochondria appear fragmented in the sars-cov- infected cell ( figure s b&d on the cell surface are clearly distinguishable in the soft x-ray cryo-tomogram ( figure s e , black arrow). also clearly observed are extensive vesiculation ( figure s f ) and a partial nucleus invagination in the infected cell ( figure g). cov- (total portals from dmvs) compared to those of mhv ( portals per dmv), signifying the difference between coronaviruses. we also observed vaultosomes near the dmv outer membranes ( figure f , black arrow).
it is still unclear what the physiological role of vaults is, but they have been associated with rna nucleocytoplasmic transport, innate suggests that proteolytic processing has taken place in these compartments resulting in s shedding. therefore, we suggest that assembly at the ergic smvs is the only pathway which will lead to infectious viral progeny. there have not been many detailed studies on how sars-cov- viruses are released from the cell. we investigated sars-cov- egress using both serial cryofib/sem volume imaging and cryoet. cryofib/sem images clearly reveal virus exiting tunnels in d at the cell periphery connecting to the cell membrane ( figure a -b, movie ). this likely resulted from the fusion of very large multi-virus containing vesicles with the cell membrane, i.e. egress through exocytosis. consistent with cryofib/sem analysis, we also observed virus exiting tunnels in cryo-tomograms ( figure c ). the fact that these compartments often contained many viral particles suggests that this is a snapshot of viral exit, rather than cellular entry. however, in addition to exocytosis, we also frequently found plasma membrane discontinuities next to viral particles outside the cell ( figure e we used a multi-modal, multi-scale and correlative approach to investigate the sars-cov- infection process in native cells, from the whole cell to subcellular and to the molecular level. the integration of multi-scale imaging data, achieved through this advanced workflow ( figure s ), has led us to propose a pathway for sars-cov- replication, in particular virus genome replication, assembly and egress. the replication of sars-cov- appears spatially well-organized and highly efficient. from genome replication, to protein synthesis and transport, to virus assembly and budding, these processes take place in close-knit cytoplasmic compartments. as illustrated in figure the cells were incubated at room temperature for minutes after which a further . ml of media was added to each well.
the plate was then incubated at °c for hours following which supernatants were discarded and cells washed with ml of pbs. the cells were then fixed by addition of ml of % paraformaldehyde in pbs for hour at room temperature. after fixation, grids were plunge-frozen on a leica grid plunger . ul of concentrated nm gold fiducials was applied to the gold-side of the em grid and blotted from the gold-side. the grid was quickly immersed in liquid ethane after blotting. frozen grids were stored in liquid nitrogen until data collection. milling of sars-cov- infected cells was carried out using a scios dualbeam cryofib (thermofisher scientific) equipped with a pp t transfer system and stage (quorum technologies). grids were sputter coated within the pp t transfer chamber maintained at - °c. after loading onto the scios stage at - °c, the grids were inspected using the sem (operated at kv and pa) and cells, identified as infected from tem, were found. the grid surface was coated using the gas injection system (trimethyl(methylcyclopentadienyl)platinum(iv), thermofisher scientific) for s, yielding a thickness of ~ um. milling was performed using the ion beam operated at kv and currents decreasing from pa to pa. at pa lamella thickness was less than nm. during the final stage of milling, sem inspection of the lamellae was conducted at kv and pa. samples were imaged on a zeiss crossbeam xl fitted with a quorum transfer station and cryo-stage. they were mounted on a quorum-compatible custom sample holder and coated with platinum for sec at ma on the quorum transfer stage, prior to loading on the cryo-stage. stage temperature was set at - °c, while the anticontaminator was held at - °c. samples were imaged at ° tilt after being coated again with pt for x sec using the fib-sem's internal gis system, with the pt reservoir held at °c.
initial trapezoid trenches were milled at kv na over μm to reach a final depth of μm, with a polish step over a rectangular box with a depth of μm performed at kv . na. serial sectioning and image acquisition were performed as follows: fib milling was set up using the kv pa probe, a z-slice step of nm and a depth of μm over the entire milling box; sem imaging was performed at a pixel depth of x pixels, which resulted in a pixel size of . nm, with the beam set at kv pa, dwell time nsec and scan speed , averaging the signal over line scans as a noise-reduction strategy. cryoet image processing the frames in each tilt angle in a tilt series were processed to correct drift using motioncor (zheng et al., ). for the intact cells dataset, all tilt series were aligned using the default parameters in imod version . . with the etomo interface, using gold-fiducial markers (kremer et al., ). for the lamella dataset, the tilt series were aligned in the framework of the appion-protomo fiducial-less tilt-series alignment suite (noble and stagg, ). after tilt series alignment, the tilt-series stacks together with the files describing the projection transformation and fitted tilt angles were transferred to emclarity for the subsequent sub-tomogram averaging analysis (himes and zhang, ). all sub-tomogram averaging analysis steps were performed using emclarity, mostly following the previously published workflow (himes and zhang, ). the ctf estimation for each tilt was performed using emclarity version . . , and the subvolumes were selected using the automatic template matching function within emclarity, with a reference derived from emdb- (walls et al., ) that was low-pass filtered to -Å resolution in emclarity. the template matching results were cleaned manually by comparison of the binned tomograms overlaid with the emclarity-generated imod model showing the x,y,z coordinates of each cross-correlation peak detected.
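the template matching step described above reports cross-correlation peaks between a reference and the tomogram. the following is a minimal one-dimensional sketch of that idea only; it is not emclarity's implementation, which works on three-dimensional volumes and searches over orientations. the function names, the toy signal and the threshold of 0.9 are illustrative assumptions.

```python
import math

def normalized_cross_correlation(signal, template):
    """Slide the template along the signal and return one score per offset.

    Scores are in [-1, 1]; 1 means the window matches the template up to
    an additive offset and a positive scale factor.
    """
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        w_mean = sum(window) / n
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # flat window or flat template: undefined, treat as no match
        else:
            dot = sum(a * b for a, b in zip(w_dev, t_dev))
            scores.append(dot / (w_norm * t_norm))
    return scores

def pick_peaks(scores, threshold):
    """Return offsets whose score is a local maximum at or above the threshold."""
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < len(scores) - 1 else float("-inf")
        if s >= threshold and s >= left and s >= right:
            peaks.append(i)
    return peaks

# Toy data (hypothetical): two copies of the template embedded in a flat background.
signal = [0, 0, 1, 3, 1, 0, 0, 0, 1, 3, 1, 0]
template = [1, 3, 1]
scores = normalized_cross_correlation(signal, template)
print(pick_peaks(scores, 0.9))
```

in practice the detected peaks are candidates only, which is why the text above describes a manual cleaning step against the binned tomograms.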
after manual template cleaning, a total of subvolumes from the lamella dataset and a total of subvolumes from the extracellular viruses dataset were retained, deriving from tilt-series and tilt-series respectively, for the following averaging and alignment steps in emclarity. for the extracellular viruses dataset, the d iterative averaging and alignment procedures were carried out gradually with binning of x, x, x , each with - iterations with increasingly restrictive search angles and translational shifts. -fold symmetry was applied during all the steps. the final converged average map was generated using bin tomograms with pixel size of . Å/pixel and a box size of × × voxels. resolution indicated by . fsc cut-off was . Å. the same process was carried out for the lamella dataset, except that the final average map was generated with pixel size of . Å/pixel and a box size of × × voxels and a final resolution at Å (gold standard fsc at . cut-off). cell structures were manually segmented from stacks of images using imagej (koppensteiner et al., ) (kounatidis et al., ). grids were loaded on the x-ray microscope at b and were first mapped using visible light with a x objective. the resulting coordinate map was used to locate areas of interest where d x-ray mosaics were collected (x-ray light used was at ev) and used to identify areas of interest within. tilt series of - º were collected for each field of view area of interest at . or . º steps with constant exposure of . sec keeping average pixel intensity to between - k counts. all tilt series were background subtracted, saved as raw tiff stacks and reconstructed using either imod (kremer et al., ) or batchruntomo (mastronarde, ). the membrane rearrangements mediated by coronavirus nonstructural proteins and . 
virology - cryo-soft x-ray tomography: using soft x-rays to explore the ultrastructure of whole cells emclarity: software for high-resolution cryo-electron tomography and subtomogram averaging sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor '-triphosphate rna is the ligand for rig-i structures and distributions of sars-cov- spike proteins on intact virions sars- coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum macrophage internal hiv- is protected from neutralizing antibodies d correlative cryo-structured illumination fluorescence and soft x-ray microscopy elucidates reovirus intracellular computer visualization of three-dimensional image data using imod sars-cov- productively infects human gut enterocytes structure of the sars-cov- spike receptor-binding domain bound to the ace receptor the sars-cov- nucleocapsid phosphoprotein forms mutually exclusive condensates with rna and the membrane-associated m protein. 
biorxiv : the preprint server for biology automated electron microscope tomography using robust prediction of specimen movements the hepatitis c virus- induced membranous web and associated nuclear transport machinery limit access of pattern recognition receptors to viral replication sites a structural analysis of m protein in coronavirus assembly and morphology automated batch fiducial-less tilt- series alignment in appion using protomo sars-coronavirus- replication in vero e cells: replication kinetics, rapid adaptation and cytopathology expression and cleavage of middle east respiratory syndrome coronavirus nsp - polyprotein induce the formation of double-membrane vesicles that mimic those associated with coronaviral rna replication morphological and biochemical characterization of the membranous hepatitis c virus replication compartment ucsf chimera--a visualization system for exploratory research and analysis er-derived vesicles exporting short-lived erad regulators, for replication structural basis of receptor recognition by sars-cov- cardiomyocytes are susceptible to sars-cov- infection characterization of severe acute respiratory syndrome- associated coronavirus (sars-cov) spike glycoprotein-mediated viral entry a unifying structural and functional model of the coronavirus replication organelle: tracking down rna synthesis cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor the intracellular sites of early replication and budding of sars-coronavirus structural model of the sars coronavirus e channel in lmpg micelles assembly intermediates of orthoreovirus captured in the cell free fatty acid binding pocket in the locked structure of sars cov- spike protein. 
science in situ structural analysis of sars-cov- spike reveals flexibility mediated by three hinges nucleocapsid-independent assembly of coronavirus-like particles by co-expression of viral envelope protein genes structure, function, and antigenicity of the sars-cov structural and functional basis of sars-cov- entry by using human ace a molecular pore spans the double membrane of the coronavirus replication organelle membrane vesicles as platforms for viral replication. trends in microbiology direct visualization of vaults within intact cells by electron cryo-tomography structural basis for the recognition of the sars-cov- by full-length human ace molecular architecture of the sars-cov- virus motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy short article serial cryofib / sem reveals cytoarchitectural disruptions in leigh syndrome patient cells serial cryofib / sem reveals cytoarchitectural disruptions in leigh syndrome patient cells key: cord- - gsgd t authors: mohseni, amir hossein; taghinezhad-s, sedigheh; su, bing; wang, feng title: inferring mhc interacting sars-cov- epitopes recognized by tcrs towards designing t cell-based vaccines date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gsgd t the coronavirus disease (covid- ) is triggered by severe acute respiratory syndrome mediated by coronavirus (sars-cov- ) infection and was declared by who as a major international public health concern. while worldwide efforts are being advanced towards vaccine development, the structural modeling of tcr-pmhc (t cell receptor-peptide-bound major histocompatibility complex) regarding sars-cov- epitopes and the design of effective t cell vaccine based on these antigens are still unresolved. here, we present both pmhc and tcr-pmhc interfaces to infer peptide epitopes of the sars-cov- proteins. 
accordingly, significant tcr-pmhc templates (z-value cutoff > ) along with interatomic interactions within the sars-cov- -derived hit peptides were clarified. also, we applied the structural analysis of the hit peptides from different coronaviruses to highlight a feature of evolution in sars-cov- , sars-cov, bat-cov, and mers-cov. peptide-protein flexible docking between each of the hit peptides and their corresponding mhc molecules was performed, and a multi-hit peptides vaccine against the s and n glycoprotein of sars-cov- was designed. the antigenicity and physicochemical properties of the designed vaccine were then evaluated with different immunoinformatics tools. finally, vaccine-structure modeling and immune simulation of the desired vaccine were performed aiming to create robust t cell immune responses. we anticipate that our design based on the t cell antigen epitopes and the frame of the immunoinformatics analysis could provide valuable support for the development of a covid- vaccine. designing of multi-hit peptides vaccine sequence a set of highly immunogenic hit peptides derived from n and s proteins of sars-cov- with high binding events to hla-a , hla-b , hla-b , hla-b , hla-b , and hla-e were selected on the basis of their solvent exposed residues and hydrophobicity scales. the aay and gpgpg linkers were applied for linking the candidate n and s hit peptides together, respectively. additionally, human beta defensin , which acts as an adjuvant to improve the immunogenicity of the multi-hit peptides vaccine, was joined at the n-terminus of the vaccine construct using an eaaak linker. the sopma web server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html) was identified for them, respectively.
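the linker scheme described above (aay between n-derived hit peptides, gpgpg between s-derived hit peptides, and eaaak between the adjuvant and the rest of the construct) can be sketched as a simple string assembly. this is a minimal illustration under stated assumptions: the peptide and adjuvant sequences below are made-up placeholders rather than the hit peptides or the beta defensin sequence of the study, and joining the n block to the s block with gpgpg is our assumption, since the text does not state which linker connects the two blocks.

```python
def assemble_construct(adjuvant, n_peptides, s_peptides,
                       adjuvant_linker="EAAAK", n_linker="AAY",
                       s_linker="GPGPG", block_linker="GPGPG"):
    """Build a linear multi-peptide sequence: adjuvant, linker, N block, S block.

    block_linker (between the N block and the S block) is an assumption;
    the source only names the linkers used within each block and after
    the adjuvant.
    """
    blocks = []
    if n_peptides:
        blocks.append(n_linker.join(n_peptides))
    if s_peptides:
        blocks.append(s_linker.join(s_peptides))
    epitope_region = block_linker.join(blocks)
    return adjuvant + adjuvant_linker + epitope_region

# Placeholder sequences (hypothetical, for illustration only).
adjuvant = "MKTAY"
n_peptides = ["NPEPTIDEA", "NPEPTIDEC"]
s_peptides = ["SPEPTIDEA", "SPEPTIDEC"]

construct = assemble_construct(adjuvant, n_peptides, s_peptides)
print(construct)
print(len(construct))
```

keeping the linkers as parameters makes it easy to compare alternative linker choices before the construct is passed to downstream antigenicity or structure-prediction tools.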
given our data, five and ten hit peptide antigen candidates (z-value cutoff > ), and and homologous peptide antigens in and organisms by using hla-a -peptide-tcr template [pdb entry p e and oga] and the experimental peptide database were inferred in orf a and s proteins, respectively. no hit peptide antigen candidates with z-value cutoff > were detected for orf , orf a, and orf queries by using hla-a -peptide-tcr template ( figure s a ). our results showed that by using hla-b ( figure s d ), hla-b (figure s b), and hla-b ( figure s e ) template, tcr-pmhc models with z-value cutoff > were only predicted for s ( hit peptide antigen candidate and homologous peptide antigens in organisms with pdb entry mi ), n ( hit peptide antigen candidate and homologous peptide antigens in organisms with pdb entry nx ), and s ( hit peptide antigen candidate and homologous peptide antigens in organisms with pdb entry ak ) proteins, respectively. in contrast, only the m and s proteins were used to predict the tcr-pmhc complex by using hla-b ( figure s c ) with pdb entry dxa ( homologous peptide antigens in organisms). also, only the n, orf , and s proteins were used for prediction of the tcr-pmhc complex by hla-e (figure s f) with pdb entry esv ( homologous peptide antigens in organisms) with z-value cutoff > . no hit peptide antigen candidates with z-value cutoff > were detected for m, e, and orf a queries by using hla-e-peptide-tcr template ( figure s f ). moreover, no model was predicted for the orf ab query because its sequence length was ≥ . the d structure of the tcr-pmhc complex for each hit peptide derived from each protein was illustrated by swiss-model. detailed information about the derived hit peptides for sars-cov- along with bat-cov, mers-cov, and sars-cov is tabulated in table .
the binding events among hit peptides derived from n proteins related to hla-a our tcr-pmhc models predicted that position of the homologous peptide antigens ( figure a ) related to tcr-pmhc complex of sars-cov- as well as sars-cov and bat-cov n proteins prefers the hydrophobic amino acid residues (e.g. ile, leu, met, and phe), and the second position of these hit peptides is a hydrophobic amino acid residue pro forming five strong vdw forces with residues y , v , m , y , and f and two h-bonds with residues k and e on the mhc molecule ( figure s a , left). by contrast, the second position of the mers-cov n protein-derived hit peptide is the charged residue arg forming three strong vdw forces with residues m , f , and v and two h-bonds with residues k and e (the same as in sars-cov- ) on the mhc molecule. surprisingly, position of all homologous peptide antigens ( figure a ) prefers the hydrophobic amino acid residues (e.g. leu, ile, val, and met), and the position of these hit peptides is the hydrophobic amino acid residue leu (in sars-cov- , sars-cov, and bat-cov) and gly (in mers-cov). our results showed that leu attaches to the mhc with three strong vdw forces with residues l , i , and w and three h-bonds with residues d , y , and t , while gly forms three strong vdw forces with residues l , v , and w and two h-bonds with residues d and v on the mhc molecule ( figure s a , left). moreover, positions , , and of hit peptides in sars-cov- , sars-cov, and bat-cov form one h-bond with residues q , q and d in chain e of tcr, with residue s in chain d of tcr (table s ). visualization of interactions in the atomic level structure of a tcr-pmhc complex in the hit peptide of sars-cov- n protein for hla-a ( figure s a ) within and Å was generated on-the-fly using pymol. accordingly, position of these hit peptides forms two h-bonds with residues y and y and position of these hit peptides forms an h-bond with residue e and four strong vdw forces with residues a , y , w , and m .
moreover, position of these hit peptides forms an h-bond with residue n and three hit peptides lack any contacts, although the position and of hit peptides form both h-bonds and/or strong vdw forces on both mhc molecule and tcr ( figure s a and s c, center) (table s ). visualization of interactions in the atomic level structure of a tcr-pmhc complex in the hit peptide of sars-cov- n protein for hla-e ( figure s c ) within and Å was generated on-the-fly using of the homologous peptide antigens of all queries has no detectable binding to both mhc and tcr. additionally, position of these hit peptides with an h-bond and strong vdw forces connects to the residues q and y on tcr, respectively ( figure s d , right). the hit peptide correlates well with the amino acid profile on the conserved positions (tyr) that forms one strong vdw force with residue w on mhc ( figure s b , right) and four strong vdw forces with residues a , h , h , and l on tcr ( figure s d , right and table s ). visualization of interactions in the atomic level structure of a tcr-pmhc complex in the hit peptide of sars-cov- s protein for hla-b ( figure s d ) within and Å was generated on-the-fly using pymol. cov and sars-cov orf proteins. our data indicated that the hit peptide derived from the sars-cov- s protein by using hla-a had lower immunogenicity than those of bat-cov, mers-cov, and sars-cov. revealed a high degree of immunogenicity between sars-cov- , sars-cov, and bat-cov but a more limited immunogenicity with mers-cov. based on the hypothesis that solvent exposed residues, by increasing tcr binding, can provide appropriate evidence about peptide immunogenicity, we measured the solvent exposed area (sea) for each hit peptide. our results showed that the sea > Å for hit peptides related to hla-b , hla-b , and hla-b were . , . , and . , respectively. indeed, we found that the highest solvent accessibility of amino acids was in the hit peptide m (sea > Å : . ), s (sea > Å : . 
), and orf (sea > Å : ) for hla-a , hla-b , and hla-e, respectively. over the past few months, studies in humans have begun to unravel the underlying relationship between hydrophobicity scales and eradication of immune responses. currently, identification of peptide regions exposed at the surface has gained much attention in the field of immunogenicity of peptides to hla-e-peptide-tcr ( figure e ) and s protein related to hla-b peptide-tcr ( figure d ) had greater hydrophobicity than orf and m proteins due to differences at positions and , and , predicted as a single domain without disorder. among them, the first model had a better quality in most cases with c-score = - . , estimated tm-score = . ± . , and estimated rmsd = . ± . Å. however, because of the modelling errors and unavailability of an appropriate template such as angles and irregular bonds, generation of the d models may not reach the accuracy level necessary for some biological purposes, especially where experimental data is rare. as such, to correct local errors, bring the d model of the vaccine closer to native structures, and increase the accuracy of the primary d model, refinement of the d structure of the vaccine is vital, particularly for further in-silico studies [ ]. therefore, the refinement of the model was performed by using d refine tool. on the basis of the overall quality of the refined model, the model exhibited the best results with rmsd . Å ( figure b). the quality of the best model of the multi-hit peptides vaccine construct was validated by prosa-web. it is well-known that the z score is in relation with the length of the protein, indicating that negative z-scores are more appropriate for a trustworthy model. in fact, the z score shows the overall quality and (range − to ) and also was located within the space of protein related to x-ray, suggesting that the obtained model is reliable and close to an experimentally determined structure.
evaluation of the overall quality of the finalized model of the vaccine construct by ramachandran plot analysis showed that . % ( / ) of all residues in the finalized model ( figure f ) compared to . % ( / ) of all residues in the initial model ( figure e ) were in favored ( %) regions; also, . % ( / ) of all residues in the finalized model ( figure f) compared to . % ( / ) of all residues in the initial model ( figure e ) were in allowed (> . %) regions. after that, protein structure and visualization of the measured interactions between atoms to the best of our knowledge, from the standpoint of immunoinformatics approaches, the concept discussed in this study is the first structural modeling to investigate both the tcr and pmhc interfaces for the sars-cov- proteins. our results provide a blueprint for inferring the sars-cov- -derived hit peptides with high accuracy towards vaccine development. therefore, it is tempting to speculate that the aforementioned models will offer a valuable framework for identifying specific peptides with a potential to activate t cell- hit peptide; vdw: van der waals (vdw) forces c-immsim immune simulator web server was used for determining the ability of the vaccine to induce t cell immunity. this server yielded results consistent with actual immune responses as evidenced by a general marked increase in the generation of secondary responses. to better follow the effects of the final vaccine construct on stimulation of t cell immunity, a construct with point mutations on key residues, replacing the hydrophobic amino acids with charged amino acids, was constructed ( figure a ).
according to data, after vaccination with native vaccine, there was a consistent rise in th (helper) cell key: cord- -i cti ok authors: díez, josé-maría; romero, carolina; vergara-alert, júlia; belló-perez, melissa; rodon, jordi; honrubia, josé manuel; segalés, joaquim; sola, isabel; enjuanes, luis; gajardo, rodrigo title: cross-neutralization activity against sars-cov- is present in currently available intravenous immunoglobulins date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: i cti ok background there is a crucial need for effective therapies that are immediately available to counteract covid- disease. recently, elisa binding cross-reactivity against components of human epidemic coronaviruses with currently available intravenous immunoglobulins (ivig) gamunex-c and flebogamma dif ( % and %) have been reported. in this study, the same products were tested for neutralization activity against sars-cov- , sars-cov and mers-cov and their potential as an antiviral therapy. methods the neutralization capacity of six selected lots of ivig was assessed against sars-cov- (two different isolates), sars-cov and mers-cov in cell cultures. infectivity neutralization was measured by determining the percent reduction in plaque-forming units (pfu) and by cytopathic effects for two ivig lots in one of the sars-cov- isolates. neutralization was quantified using the plaque reduction neutralization test (prnt ) in the pfu assay and the half maximal inhibitory concentration (ic ) in the cytopathic/cytotoxic method (calculated as the minus log dilution which reduced the viral titer by %). results all ivig preparations showed neutralization of both sars-cov- isolates, ranging from to . % with prnt titers from . to > for the pfu method and ranging from . %- . % with an ic ~ for the cytopathic method. all ivig lots produced neutralization of sars-cov ranging from . to . % and prnt values ranging from . to . . 
no ivig preparation showed significant neutralizing activity against mers-cov. conclusion in cell culture neutralization assays, the tested ivig products contain antibodies with significant cross-neutralization capacity against sars-cov- and sars-cov. however, no neutralization capacity was demonstrated against mers-cov. these preparations are currently available and may be immediately useful for covid- management. the outbreak of the novel severe acute respiratory syndrome coronavirus (sars-cov- ) which causes the respiratory disease covid- was declared a pandemic by the who in march . most infected patients ( %) have mild symptoms. however, about % of covid- patients can progress to severe pneumonia and to acute respiratory distress syndrome (ards) which is associated with multiorgan failure and death . the current critical situation demands an effective and reliable therapy that is immediately available to control the progression of the disease. convalescent plasma or plasma-derived immunoglobulin (ig) (either polyvalent ig prepared from healthy donors or hyperimmune ig prepared from donors with high antibody titers against a specific antigen) have been historically used as a readily available therapeutic option in outbreaks of emerging or re-emerging infections . to date, seven human coronaviruses (hcov) have been identified. four of them are endemic and globally distributed (hcov- e, hcov-nl , hcov-oc and hcov-hku ) . these viruses typically cause mild symptoms and are associated with about % of common colds [ ] [ ] [ ] . more recently (december ), the novel coronavirus sars-cov- emerged in china and because of its extraordinary human-to-human transmissibility is currently causing an unprecedented pandemic . although several therapeutic approaches against sars-cov- are under investigation, therapeutic agents of proven efficacy are still lacking.
interestingly, coronaviruses share some morphological and functional properties that may be associated with cross-reactive immune responses. this cross-reactivity may have important therapeutic implications . sars-cov, sars-cov- , and mers-cov are classified within the family coronaviridae, genus betacoronavirus, subgenera sarbecovirus (sars-cov, sars-cov- ) and merbecovirus (mers-cov). the spike protein (s), which is exposed on the virion surface, is the main determinant of the coronavirus entry into the host cell and is also the major target of neutralizing antibodies . spikes are formed by trimers of protein s, which is in turn formed by a subunit (s ) that mediates the binding to the cell receptor and a membrane-anchored subunit (s ) that mediates the fusion of the virus with cell membranes . potent neutralizing antibodies often target the receptor interaction site on s . however, the s subunit shows a higher variability than s . antibodies targeting s are often virus-specific, making s a better target for cross-neutralizing antibodies , . the amino-acid sequence identity among the s proteins of human betacoronaviruses causing mild (hcov-oc and hcov-hku ) and severe (sars-cov, sars-cov- , and mers-cov) respiratory infections varies between % and % . however, the s proteins of sars-cov and sars-cov- share % amino-acid identity and more than % rna sequence homology . cross-reactivity in antigenic responses has been described among human coronaviruses of the same genus, particularly betacoronaviruses. cross-reactivity between sars-cov, mers-cov and other endemic human coronaviruses has been reported in some neutralization assays - . recently, cross-reactivity in elisa binding assays against antigens of sars-cov, sars-cov- , and mers-cov has been reported with currently available intravenous immunoglobulins (ivig) such as gamunex-c and flebogamma dif .
in this study, the neutralization capacity of the ivig products gamunex-c and flebogamma dif against these epidemic human coronaviruses (sars-cov, sars-cov- , and mers-cov) was evaluated. ivig products used in this study were flebogamma ® dif % and % (instituto grifols s.a., barcelona, spain) and gamunex ® -c % (grifols therapeutics inc., raleigh nc, usa), two highly purified (≥ %- % igg), unmodified human immunoglobulins. each product is manufactured from plasma collected from thousands of donors in the us and/or several european countries. igg concentrations in flebogamma dif products were mg/ml and mg/ml ( % and %) and in gamunex-c, the concentration was mg/ml ( %). to ensure a virus-free product, both ivig manufacturing processes contain dedicated steps with high pathogen clearance capacity, such as solvent/detergent treatment, heat treatment, caprylate treatment and planova™ nanofiltration down to nm pore size. the plasma used to manufacture the ivig lots tested was collected from march to october . six different lots of flebogamma dif and gamunex-c were tested at several dilutions for cross-reactivity against sars-cov, sars-cov- , and mers-cov by: i) elisa techniques; and ii) well-established neutralization assays in cell cultures. lots were identified as f and f for flebogamma % dif, f and f for flebogamma % dif and g and g for gamunex-c. each experiment was performed in duplicate. handling of viruses and cell cultures was carried out at the recombinant sars-cov was generated from urbani strain using a previously described reverse genetic technique . two different sars-cov- isolates were tested: a) sars-cov- mad isolated from a covid- patient in spain; and b) sars-cov- (accession id epi_isl_ at gisaid repository: http://gisaid.org) isolated from a covid- patient in spain. both stock viruses (a and b) were prepared by collecting the supernatant from vero e cells, as previously described .
recombinant mers-cov was generated using a previously described reverse genetic system from the reference sequence of mers-cov isolated from the index patient emc/ (genbank jx ) . huh is a well differentiated human hepatocyte-derived carcinoma cell line, kindly provided by dr. luis carrasco vero e is a cell line isolated from kidney epithelial cells extracted from an african green monkey. vero e is composed of epithelial-like cells susceptible to infection by sars-cov and sars-cov- . at cnb-csic, vero e cell lines were kindly provided by dr. eric snijder (university of leiden medical center, the netherlands). both huh and vero e cell lines were cultured in dulbecco-modified eagle medium (dmem) supplemented with mm hepes buffer, mm l-glutamine (sigma-aldrich, st. louis, mo, usa), % nonessential amino-acids (sigma-aldrich), % fetal bovine serum (fbs; biowhittaker, inc., walkersville, md, usa). in the post-infection semisolid medium, the percentage of fbs was reduced to %, and deae-dextran was added to a final concentration of . mg/ml. at irta-cresa, vero e cells were obtained from the atcc (atcc crl- ) and cultured in dmem (lonza, basel, switzerland) supplemented with % fbs (euroclone, pero, italy), u/ml penicillin, µg/ml streptomycin, and mm glutamine (all thermofisher scientific, waltham, ma, usa). in the post-infection medium, the percentage of fbs was reduced to %. qualitative determination of igg class antibodies cross-reactivity against antigens of the tested coronaviruses was performed using elisa techniques. ivig samples were serially diluted using the buffer solutions provided in each igg elisa kit. the following kits were used for the qualitative determination of igg class antibodies in the experimental ivig lots: sars coronavirus igg elisa kit (creative diagnostics, shirley, ny, usa), against virus lysate; human anti-sars-cov- virus spike [s ] igg elisa kit (alpha diagnostic intl. 
inc., san antonio, tx, usa), against s subunit spike protein; rv- - , human anti-mers-np igg elisa kit (alpha diagnostic intl. inc.), against n protein; rv- - , human anti-mers-rbd igg elisa kit (alpha diagnostic intl. inc.), against receptor-binding domain (rbd) of s subunit spike protein (s /rbd); rv- - , human anti-mers-s igg elisa kit (alpha diagnostic intl. inc.), against s subunit spike protein; rv- (formerly rv- - ). in all cases the determinations were carried out following the manufacturer's instructions. reactivity was rated as negative if no reaction was observed with neat ivig or positive if the lowest ivig dilution demonstrated reactivity. aliquots of µl of each ivig dilution-virus complex were added in duplicate to confluent monolayers of vero e cells (for sars-cov and sars-cov- ) or huh (for mers-cov), seeded in -well plates and incubated for h ( °c; % co ). after this adsorption time, the igg-virus complex inoculum was removed, and a semi-solid overlay was added (dmem % fbs + . % agarose). cells were incubated for h at ºc. the semi-solid medium was removed, and the cells were fixed with % neutral buffered formaldehyde (sigma-aldrich) for hour at room temperature and stained with . % aqueous gentian violet for min, followed by plaque counting. the sensitivity threshold of the technique was pfu per ml. the neutralization potency of the ivig products was expressed in two ways: ) percent reduction in pfu calculated from the pfu count after neutralization by ivig relative to initial pfu count inoculated onto the cells; and ) plaque reduction neutralization test (prnt ) value, calculated as the -log of the reciprocal of the highest ivig dilution to reduce the number of plaques by % compared to the number of plaques without ivig. a fixed concentration of a sars-cov- stock ( . 
tcid /ml, a concentration that achieves % cytopathic effect) was mixed with decreasing concentrations of the ivig samples (range : to : ), each mixture was incubated for h at º c and added to vero e cells. to assess potential plasma-induced cytotoxicity, vero e cells were also cultured with the same decreasing concentrations of plasma in the absence of sars-cov- . uninfected cells and untreated virus infected cells were used as negative and positive infection controls, respectively. plasma from a covid- positive patient with a high half-maximal inhibitory concentration (ic ) was included as an active positive control (expressed as the -log of the reciprocal of the dilution). all the cultures were incubated at ºc and % co for days. cytopathic or cytotoxic effects of the virus or plasma samples were measured at days post infection, using the cell titer-glo luminescent cell viability assay (promega, madison, wi, usa). luminescence was measured in a fluoroskan ascent fl luminometer (thermofisher scientific). neutralization curves are shown as nonlinear regressions. ic values were determined from the fitted curves as the plasma dilutions that produced % neutralization. details of the technique are available elsewhere . ivig products showed consistent reactivity to antigens of sars-cov (culture lysate) at - mg/ml igg, sars-cov- (s subunit protein) at µg/ml igg, and mers-cov (n protein, s subunit/rhd protein and s subunit protein) at µg/ml igg (table ) . all the assayed ivig preparations had neutralizing activity against sars-cov ranging from % to % (figure ). all % igg ivig preparations (f , f , g , and g ) showed prnt neutralization titers between . and . , corresponding to - % pfu reduction ( figures b, c) . the highest pfu reductions, . % and . % (prnt neutralization titers of . and . ), were observed with lots f and g , respectively, at and . mg/ml igg (dilution factors and ). the f and f lots, ( % igg) showed a lower neutralization capacity with pfu reductions of . 
% and . %, respectively ( figure a ). for the sars-cov- mad isolate, all ivig lots except f (inconclusive results) showed significant neutralizing activity and reached prnt titers ranging from . to > (figure ). pfu reductions ranging from . % to . % were observed with lots f , f and f at a dilution factor of . even at the highest dilution factor ( = . and µg/ml), the pfu reduction ranged from . % to . %, corresponding to prnt titers of . - . (figures a, b) . for lots g and g , the pfu reduction was even higher, ranging from . % - . % at a dilution factor of to . % - . % at a dilution factor of , with prnt titers greater than ( figure c ). for the sars-cov- epi_isl_ isolate, the f and g lots neutralized . % and . % of tcid counts, respectively, at a dilution factor of ( figure ). as shown in figure , one replicate of the f product failed to demonstrate neutralization. no ivig lot showed any significant pfu reduction (i.e., > %) against mers-cov, even at the lowest dilution factor ( mg/ml igg). the results presented here demonstrate, for the first time, significant cross-neutralization activity against sars-cov and especially sars-cov- in therapeutic ivig concentrates (flebogamma dif and gamunex-c). this neutralizing activity correlates with the cross-reactivity to different coronavirus antigens observed in elisa binding assays with ivig, as shown in a previous study . the plasma used to manufacture the tested ivig lots was collected prior to the detection of sars-cov- in europe and the usa. therefore, these results should be ascribed to cross-reactivity against sars-cov- by antibodies against endemic human coronaviruses in the human population at large. similar results have been reported for sars-cov and mers-cov. these neutralization studies showed that ivig products contain antibodies with cross-neutralizing capacity against sars-cov ( - %) and sars-cov- ( %- %), but not against mers-cov (< %). 
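The two neutralization read-outs defined in the methods (percent reduction in pfu, and the prnt titer as the log of the reciprocal of the highest dilution reaching a given plaque reduction) can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code: the function names are hypothetical, the reduction threshold is left as a parameter because the percentage is missing from the extracted text (the default of 50 is an assumption), and the titer is taken as the positive log10 of the dilution factor, which is an interpretation consistent with the positive titers reported. All plaque counts are made-up example values.

```python
import math


def percent_pfu_reduction(pfu_after_ivig, pfu_inoculated):
    """Percent reduction in plaque count relative to the inoculated PFU count."""
    if pfu_inoculated <= 0:
        raise ValueError("inoculated PFU count must be positive")
    return 100.0 * (1.0 - pfu_after_ivig / pfu_inoculated)


def prnt_titer(reduction_by_dilution, threshold_percent=50.0):
    """log10 of the highest dilution factor whose PFU reduction meets the threshold.

    reduction_by_dilution maps a dilution factor (e.g. 100 for a 1:100
    dilution) to the percent PFU reduction observed at that dilution.
    Returns None when no dilution reaches the threshold.
    threshold_percent=50.0 is an assumed default, not a value from the text.
    """
    passing = [factor for factor, reduction in reduction_by_dilution.items()
               if reduction >= threshold_percent]
    if not passing:
        return None
    return math.log10(max(passing))


# Hypothetical plaque counts for one lot at three dilution factors.
inoculated = 100
plaques = {10: 5, 100: 40, 1000: 80}
reductions = {factor: percent_pfu_reduction(count, inoculated)
              for factor, count in plaques.items()}
titer = prnt_titer(reductions)
print(reductions, titer)
```

Returning None rather than zero for a lot that never reaches the threshold keeps "no neutralization detected" distinct from a genuine titer at the lowest dilution tested.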
these results suggest that the cross-neutralizing antibodies target antigenic regions more conserved in sars-cov and sars-cov- than in mers-cov. no significant differences in the neutralizing capacity were observed among ivig lots regardless of the country of origin of the plasma. this reinforces the broad applicability of these results. two different neutralization techniques were used for sars-cov- , and both techniques showed not only the ivig neutralization capacity, but also the reliability of the results. in addition, results obtained with two different sars-cov- isolates confirm that the neutralization capacity is not dependent on the isolate. this was not unexpected since no significant sequence differences have been observed among sars-cov- isolates currently circulating throughout the world. the percentage of sars-cov- cross-neutralization was higher in the pfu reduction technique than in the cytopathic effect/cytotoxic technique, with very low or negative values in a few cases (inconclusive for lot f by the pfu study, and cytopathic effect in one replicate of lot f ). this suggests that the technique used and/or slight variations in methodology may significantly influence the nature or magnitude of the results. therefore, further evaluation of this cross-neutralizing activity should be conducted. cross-neutralization is gaining attention as a protective mechanism against viral infection in the context of the covid- health emergency. the results of this study are in agreement with recent studies that describe cross-neutralization of sars-cov- by monoclonal antibodies from memory b cells of an individual who was infected with sars-cov . furthermore, sars-cov- -reactive cd + t cells have been detected in around half of unexposed individuals, suggesting that there is cross-reactive t cell recognition between circulating common cold coronaviruses and sars-cov- . 
however, the levels of cross-neutralizing antibodies against sars-cov- in the sera of sars-cov patients can be highly variable . ivig products are prepared using plasma from thousands of different donors, and hence contain a broad representation of the state of immunity in the population at that time. this is consistent with the low variability found among the different lots of ivig products tested. nevertheless, greater variability is expected among individuals with respect to infection by a given endemic human coronavirus. therefore, it has been hypothesized that the diversity of symptoms observed in sars-cov- -infected individuals, and even the potential for getting infected, may depend on pre-existing cross-immunity due to previous exposure to other endemic human coronaviruses. in this regard, a detailed study of the state of immunity in the general population, distinguishing those affected and not affected by sars-cov- , may be warranted. the higher cross-neutralizing capacity of the tested ivig preparations against sars-cov and sars-cov- than against mers-cov may be explained by the higher sequence identity of the s proteins of circulating mild human coronaviruses (hcov-oc and hcov-hku ) with sars-cov and sars-cov- than with mers-cov ( %- % vs. %- ) , . additionally, differences in specific domains of the s protein between sars-cov and sars-cov- might explain the higher cross-reactivity of the tested ivig against sars-cov- compared to sars-cov ( %- % vs. %- %). the absence of cross-neutralization against mers-cov despite the cross-reactivity observed in elisa assays suggests that these antibodies are not neutralizing. this does not necessarily indicate that the antibodies are not functional by another mechanism. for example, these non-neutralizing antibodies could be labelling the virion for identification by immune cells and subsequent destruction . despite the limitations of the in vitro nature of this study, the clinical implications of the findings are encouraging. 
although ivig is considered a therapeutic option for hyperinflammation in patients with severe covid- , the results of this study may support the use of high-dose ivig as a therapy for covid- . positive results have already been reported for ivig in case studies , . ivig is being tested in an ongoing clinical trial . further studies looking at the functionality of these antibodies could improve our understanding of acquired immunity to human coronaviruses. this could pave the way for ivig (and other igg products such as intramuscular or subcutaneous preparations) as a potential therapeutic/prophylactic approach to fight future epidemics by emerging human coronaviruses. in conclusion, under the experimental conditions of this study, ivig (flebogamma dif and gamunex-c) contained antibodies with significant neutralization capacity against sars-cov and sars-cov- , but not against mers-cov. additional research is warranted to advance ivig towards clinical use for covid- . strategies for the prevention and management of coronavirus disease use of human immunoglobulins as an anti-infective treatment: the experience so far and their possible re-emerging role genetic recombination, and pathogenesis of coronaviruses update on human rhinovirus and coronavirus infections sars: prognosis, outcome and sequelae sars-cov- /covid- : viral genomics, epidemiology, vaccines, and therapeutic interventions middle east respiratory syndrome european centre for disease prevention and control. covid- . situation update worldwide global patterns in coronavirus diversity. 
virus evolution identification of the receptor-binding domain of the spike glycoprotein of human betacoronavirus hku characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding sars-cov- , sars-cov, and mers-cov: a comparative overview phylogenetic analysis and structural modeling of sars-cov- spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop profiling early humoral response to diagnose novel coronavirus disease (covid- ) an outbreak of human coronavirus oc infection and serological cross-reactivity with sars coronavirus cross-reactive antibodies in convalescent sars patients' sera against the emerging novel human coronavirus emc ( ) by both immunofluorescent and neutralizing antibody tests antigenic crossreactivity between severe acute respiratory syndromeassociated coronavirus and human coronaviruses e and oc currently available intravenous immunoglobulin contains antibodies reacting against severe acute respiratory syndrome coronavirus antigens construction of a severe acute respiratory syndrome coronavirus infectious cdna clone and a replicon to study coronavirus rna synthesis search for sars-cov- inhibitors in currently approved drugs to tackle covid- pandemia engineering a replication-competent, propagation-defective middle east respiratory syndrome coronavirus as a vaccine candidate genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans comparative host gene transcription by microarray analysis early after infection of the huh cell line by sars coronavirus and human coronavirus e the life cycle of sars coronavirus in vero e cells crossneutralization of sars-cov- by a human monoclonal sars-cov antibody targets of t cell responses to sars-cov- coronavirus in humans with covid- disease 
and unexposed individuals lack of crossneutralization by sars patient sera towards sars-cov- protective antiviral antibodies that lack neutralizing activity: precedents and evolution of concepts covid- : consider cytokine storm syndromes and immunosuppression high-dose intravenous immunoglobulin as a therapeutic option for deteriorating patients with coronavirus disease effect of regular intravenous immunoglobulin therapy on prognosis of severe pneumonia in patients with covid- the efficacy of intravenous immunoglobulin therapy for severe -ncov infected pneumonia jordi bozzo phd, cmpp and michael k. james phd (grifols) are acknowledged for medical writing and editorial support in the preparation of this manuscript. contribution from antonio páez md (grifols) who provided his expert opinion is acknowledged. the authors acknowledge the expert technical assistance from daniel casals, eduard sala, judith luque, laura gómez, and gonzalo mercado (grifols, viral and cell culture laboratory). neutralization experiments were funded by grifols, the manufacturer of flebogamma ® dif and gamunex ® -c. jva, jr, mbp, jmh, is, and le declare having no other conflict of interest. jmd, cr and rg are full-time employees of grifols. key: cord- - nruf g authors: tian, jing-hui; patel, nita; haupt, robert; zhou, haixia; weston, stuart; hammond, holly; lague, james; portnoff, alyse d.; norton, james; guebre-xabier, mimi; zhou, bin; jacobson, kelsey; maciejewski, sonia; khatoon, rafia; wisniewska, malgorzata; moffitt, will; kluepfel-stahl, stefanie; ekechukwu, betty; papin, james; boddapati, sarathi; wong, c. jason; piedra, pedro a.; frieman, matthew b.; massare, michael j.; fries, louis; lövgren bengtsson, karin; stertman, linda; ellingsworth, larry; glenn, gregory; smith, gale title: sars-cov- spike glycoprotein vaccine candidate nvx-cov elicits immunogenicity in baboons and protection in mice date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: nruf g the covid- pandemic continues to spread throughout the world with an urgent need for a safe and protective vaccine to effectuate herd immunity to control the spread of sars-cov- . here, we report the development of a sars-cov- subunit vaccine (nvx-cov ) produced from the full-length spike (s) protein, stabilized in the prefusion conformation. purified nvx-cov s form . nm nanoparticles that are thermostable and bind with high affinity to the human angiotensin-converting enzyme (hace ) receptor. in mice and baboons, low-dose nvx-cov with saponin-based matrix-m adjuvant elicits high titer anti-s igg that is associated with blockade of hace receptor binding, virus neutralization, and protection against sars-cov- challenge in mice with no evidence of vaccine-associated enhanced respiratory disease (vaerd). nvx-cov vaccine also elicits multifunctional cd + and cd + t cells, cd + t follicular helper t cells (tfh), and the generation of antigen-specific germinal center (gc) b cells in the spleen. these results support the ongoing phase / clinical evaluation of the safety and immunogenicity of nvx-cov with matrix-m (nct ). titers that were detected - days after a single immunization (fig. a, right) . mice immunized with μg dose of nvx-cov /matrix-m induced antibodies that blocked hace receptor binding to s-protein and virus neutralizing antibodies - -days after a single priming dose ( fig. b and c ). animals immunized with the prime/boost regimen had significantly elevated anti-s igg titers that were detected - days following the booster immunization across all dose levels. animals immunized with μg and μg nvx-cov /matrix-m had similar high anti-s igg titers following immunization (gmt = , and , , respectively). importantly, mice immunized with . μg, μg, or μg nvx-cov/matrix-m had significantly (p ≤ . ) higher anti-s igg titers compared to mice immunized with μg nvx-cov without adjuvant (fig. a, left) . 
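The antibody responses above are summarized as geometric mean titers (gmt). As a brief illustration of that summary statistic, the sketch below computes a gmt on the log scale and a fold difference between two groups. The function names and all titer values are hypothetical examples, not data from the study, and the handling of titers below the detection limit (which the text does not describe) is deliberately left out.

```python
import math


def geometric_mean_titer(titers):
    """Geometric mean of strictly positive titers, computed on the log scale."""
    if not titers:
        raise ValueError("at least one titer is required")
    if any(t <= 0 for t in titers):
        raise ValueError("titers must be strictly positive")
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


def fold_difference(titers_a, titers_b):
    """Ratio of the geometric mean titers of group A over group B."""
    return geometric_mean_titer(titers_a) / geometric_mean_titer(titers_b)


# Hypothetical end-point titers for an adjuvanted and an unadjuvanted group.
adjuvanted = [64000, 128000, 256000]
unadjuvanted = [400, 800, 1600]
print(geometric_mean_titer(adjuvanted), fold_difference(adjuvanted, unadjuvanted))
```

Averaging on the log scale is what makes the gmt suitable for titers, which come from serial dilutions and therefore vary multiplicatively rather than additively.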
these results indicate the potential for a -fold or greater across all dose levels ( fig. b and c) . showing marked protection from weight loss compared to the unvaccinated placebo animals (fig. e) . the mice receiving a prime and boost vaccination with adjuvanted vaccine also demonstrated significant protection against weight loss at all dose levels (fig. f) . in addition, we compared the prime/boost regimens using μg of either adjuvanted or unadjuvanted nvx-cov . the mice receiving the prime/boost with adjuvant were significantly protected from weight loss relative to placebo mice, while the group immunized with μg nvx-cov alone were not protected against weight loss (fig. g) . these results confirm that nvx-cov confers protection against sars-cov- and that low doses of the vaccine associated with lower serologic responses do not exacerbate weight loss or demonstrate exaggerated illness. with increased periarteriolar cuffing. the thickened alveolar septa remain with increased diffuse interstitial inflammation throughout the alveolar septa (fig. ) . the nvx-cov immunized mice showed significant reduction in lung pathology at both day and day post infection in a dose-dependent manner. the prime only group displays reduced inflammation at the μg and μg dose with a reduction in inflammation surrounding the bronchi and arterioles compared to placebo mice. in the lower doses of the prime-only groups, lung inflammation resembles that of the placebo groups, correlating with weight loss and lung virus titer. the prime/boost immunized groups displayed a significant reduction in lung inflammation for all doses tested, which again correlated with lung viral titer and weight loss data. the epithelial cells in the large and small bronchi at day and were substantially preserved with minimal bronchiolar sloughing or signs of viral infection. the arterioles of animals immunized with μg, μg and . μg doses have minimal inflammation with only moderate cuffing seen in the . 
μg dose, similar to placebo. alveolar inflammation was reduced in animals that received the higher doses with only the lower . μg dose with inflammation (fig. ) . these data demonstrate that nvx-cov reduces lung inflammation after challenge and that even doses and regimens of nvx-cov that elicit minimal or no detectable neutralizing activity are not associated with any obvious exacerbation of the inflammatory response to the virus. to determine the role of matrix-m in generating t cell responses, we immunized groups of mice (n = /group) with μg nvx-cov alone or with μg matrix-m in a -dose regimen spaced -days apart. antigen-specific t cell responses were measured by elispot and intracellular cytokine staining (iccs) from spleens collected -days after the second immunization (study day ). the number of ifn-γ secreting cells after ex vivo stimulation increased -fold in spleens of mice immunized with nvx-cov /matrix- m compared to nvx-cov alone as measured by the elispot assay (fig. a) . in order to examine cd + and cd + t cell responses separately, iccs assays were performed in combination with surface marker staining. data shown were gated on cd hi cd leffector memory t cell population. importantly, we found the frequency of ifn-γ + , tnf-α + , and il- + cytokine-secreting cd + and cd + t cells was significantly higher (p < . ) in spleens from the nvx-cov /matrix-m immunized mice compared to mice immunized without adjuvant ( fig. b and c) . further, we noted the frequency of multifunctional cd + and cd + t cells, which simultaneously produce at least two or three cytokines was also significantly increased (p < . ) in spleens from the nvx-cov /matrix-m immunized mice ( fig. b and c) . immunization with nvx-cov /matrix-m resulted in higher proportions of multifunctional phenotype within both cd + and cd + t cell populations. the proportions of multifunctional phenotypes detected in memory cd + t cells were higher than those in cd + t cells (fig. d) . 
type cytokine il- and il- secretion from cd + t cells was also determined by iccs and elispot respectively. we found that immunization with nvx- cov /matrix-m also increased type cytokine il- and il- secretion ( -fold) compared to immunization with nvx-cov alone, but to a lesser degree than enhancement of type cytokine production (e.g. ifn-γ increased -fold). these tfh cells (cd + cxcr + pd- + ) (p = . ), as well as the frequency of gc b cells (cd + gl + cd + ) (p = . ) in spleens ( fig. a and b) . anti-s protein igg titers were detected within -days of a single priming immunization in animals immunized with nvx-cov /matrix-m across all the dose levels (gmt = , - , ). anti-s protein igg titers increased over a log (gmt = , - , ) within to weeks following a booster immunization (days and ) across all of the dose levels. importantly, animals immunized with nvx-cov without adjuvant had minimum or no detected anti-s igg titer (gmt < ) after one immunization, which was not boosted by a second immunization (fig. a) . we also determined the functionality of the antibodies. low levels of hace receptor blocking antibodies were detected in animals following a single immunization with or μg nvx-cov alone had no detectable antibodies that block s-protein binding to hace (fig. b) . neutralizing titers increased -to -fold following the second immunization (gmt = , - , ) (fig. c) . animals receiving the nvx-cov alone had little or no detectable neutralizing antibodies (gmt < ). there was a significant correlation (p < . ) between anti-s igg levels and neutralizing antibody titers (fig. d) . the immunogenicity of the adjuvanted vaccine in nonhuman primates is consistent with the mouse immunogenicity results and further supports the role of matrix-m adjuvant in promoting the generation of neutralizing antibodies and dose sparing. pbmcs were collected days after the second immunization (day ) and t cell cov alone or µg nvxcov /matrix-m (fig. e) . 
by iccs analysis, immunization with µg nvxcov /matrix-m also showed the highest frequency of ifn-γ + , il- + , and tnf-α + cd + t cells (fig. f) . this trend was also true for multifunctional cd + t cells, in which at least two or three type cytokines were produced simultaneously (fig. f) . type cytokine il- level were too low to be detected in baboons by elispot analysis. we also compared the level of functional hace receptor inhibiting ( % ri) titers. baboons receiving the vaccine had -fold higher binding and receptor inhibiting antibodies ( % ri = , %ci, . - . ) compared to covid- convalescent serum ( % ri = , %ci, . - . ) (fig. ) . therefore, nvx-cov vaccine induced binding and functional antibodies in a nonhuman primate at levels comparable or higher than individuals recovered from covid- . collectively these results support the development of nvx-cov for prevention of covid- here, we showed that a full-length, stabilized prefusion sars-cov- spike glycoprotein with tris buffer containing np- detergent, clarified by centrifugation at , x g for min. s-proteins were purified by tmae anion exchange and lentil lectin affinity chromatography. hollow fiber tangential flow filtration was used to formulate the purified spike protein at - μg ml - in mm sodium phosphate (ph . ), mm nacl, cumulants analysis of the scattered intensity autocorrelation function was performed with instrument software to provide the z-average particle diameter and polydispersity index (pdi). heated from °c to °c at °c per minute and the differential heat capacity change was measured in a nanodsc (ta instruments, new castle, de). a separate buffer scan was performed to obtain a baseline, which was subtracted from the sample scan to produce a baseline-corrected profile. the temperature where the peak apex is located is the transition temperature (tmax) and the area under the peak provides the enthalpy of transition (Δhcal). transmission electron microscopy (tem) and d class averaging. 
electron microscopy was performed by nanoimaging services (san diego, ca) with a fei tecnai t electron microscope, operated at kev and equipped with a fei eagle k x k ccd camera. sars-cov- s proteins were diluted to . µg ml - in formulation buffer. the samples ( µl) were applied to nitrocellulose-supported -mesh copper grids and stained with uranyl formate. images of each grid were acquired at multiple scales to assess the overall distribution of the sample. high-magnification images were acquired at nominal magnifications of , x ( . nm/pixel) and , x ( . nm/pixel). the images were acquired at a nominal underfocus of - . µm to - . µm ( , x) and electron doses of ~ e/Å . for class averaging, particles were identified at high magnification prior to alignment and classification. the individual particles were selected, boxed out, and individual sub-images were combined into a stack to be processed using reference-free classification. individual particles in the , x high magnification images were selected using an automated picking protocol . an initial round of alignments was performed for each sample, and from the alignment, class averages that appeared to contain recognizable particles were selected for additional rounds of alignment. these averages were used to estimate the percentage of particles that resembled single trimers and oligomers. a reference-free alignment strategy based on the xmipp processing package was used for particle alignment and classification . binding kinetics were determined by bio-layer interferometry (bli) using an octet qk system (pall forté bio, fremont, ca). his-tagged human ace ( μg ml - ) was immobilized on nickel-charged ni-nta biosensor tips. after baseline, sars-cov- s protein-containing samples, -fold serially diluted over a range of . nm to nm, were allowed to associate for sec, followed by dissociation for an additional sec. data were analyzed with octet software ht : global curve fit. 
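The association and dissociation phases described for the bli measurement are commonly interpreted with a 1:1 (langmuir) binding model. The text does not state which model the global curve fit used, so the 1:1 model here is an assumption, and the sketch below only shows the model's equations, not the octet software's fitting procedure. All names and rate constants are hypothetical.

```python
import math


def dissociation_constant(kon, koff):
    """Equilibrium dissociation constant KD = koff / kon (in M)."""
    return koff / kon


def equilibrium_response(conc, kon, koff, rmax):
    """Plateau response at analyte concentration conc for a 1:1 model."""
    kd = dissociation_constant(kon, koff)
    return rmax * conc / (conc + kd)


def association_response(t, conc, kon, koff, rmax):
    """Response during association: Req * (1 - exp(-(kon * C + koff) * t))."""
    k_obs = kon * conc + koff
    return equilibrium_response(conc, kon, koff, rmax) * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, koff):
    """Response during dissociation, starting from r0: r0 * exp(-koff * t)."""
    return r0 * math.exp(-koff * t)


# Hypothetical rate constants: kon in 1/(M*s), koff in 1/s, rmax in nm shift.
kon, koff, rmax = 1.0e5, 1.0e-3, 1.0
conc = 1.0e-8  # analyte concentration in M, here equal to KD
r_end = association_response(600.0, conc, kon, koff, rmax)
print(dissociation_constant(kon, koff), r_end, dissociation_response(600.0, r_end, koff))
```

In a global fit, a single kon and koff are shared across all analyte concentrations, which is why a serial dilution series is measured rather than a single concentration.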
specificity of sars-cov- s binding to hace receptor by elisa. ninety-six well plates were coated with μl sars-cov- spike protein ( μg ml - ) overnight at °c. similarly, baboon ifn-γ and il- assays were carried out using nhp ifn-γ and human il- assay kit from mabtech using pbmc collected at day following the second immunization (day ). systematic comparison of two animal-to-human transmitted human coronaviruses: sars-cov- and sars-cov severe acute respiratory syndrome coronavirus (sars-cov- ) and coronavirus disease- (covid- ): the epidemic and the challenges spatial-temporal distribution of covid- in china and its prediction: a data-driven modeling analysis a strategic approach to covid- vaccine r&d tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion the clinical pathology of severe acute respiratory syndrome (sars): a report from china identification of human neutralizing antibodies against mers-cov and their role in virus adaptive evolution structure, function, and evolution of coronavirus spike proteins the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen antigenicity of the sars-cov- spike glycoprotein dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc host cell entry of middle east respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites angiotensin-converting enzyme is a functional receptor for the sars coronavirus appion: an integrated, database-driven pipeline to facilitate em image processing xmipp: a new generation of an open-source image processing 
package for electron microscopy cryo-em structure of the -ncov spike in the prefusion conformation characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov improved titers against influenza drift variants with a nanoparticle fries l et al. a randomized, blinded, dose-ranging trial of an ebola virus ebov gp) nanoparticle vaccine with matrix-m™ adjuvant in healthy adults anti-spike igg causes severe acute lung injury by skewing macrophage responses during acute sars-cov infection complement activation contributes to severe acute respiratory syndrome coronavirus pathogenesis. mbio key: cord- -kifqgskc authors: lupala, cecylia s.; li, xuanxuan; lei, jian; chen, hong; qi, jianxun; liu, haiguang; su, xiao-dong title: computational simulations reveal the binding dynamics between human ace and the receptor binding domain of sars-cov- spike protein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kifqgskc a novel coronavirus (the sars-cov- ) has been identified in january as the causal pathogen for covid- pneumonia, an outbreak started near the end of in wuhan, china. the sars-cov- was found to be closely related to the sars-cov, based on the genomic analysis. the angiotensin converting enzyme protein (ace ) utilized by the sars-cov as a receptor was found to facilitate the infection of sars-cov- as well, initiated by the binding of the spike protein to the human ace . using homology modeling and molecular dynamics (md) simulation methods, we report here the detailed structure of the ace in complex with the receptor binding domain (rbd) of the sars-cov- spike protein. the predicted model is highly consistent with the experimentally determined complex structures. plausible binding modes between human ace and the rbd were revealed from all-atom md simulations. 
the simulation data further revealed critical residues at the complex interface and provided more details about the interactions between the sars-cov- rbd and human ace . two mutants mimicking rat ace were modeled to study the mutation effects on rbd binding to ace . the simulations showed that the n-terminal helix and the k of the human ace alter the binding modes of the cov -rbd to the ace . the outbreak of a new type of severe pneumonia, covid- , which started in december , has spread worldwide, causing over , fatalities and infecting more than , individuals globally. although the earlier infected cases were mainly found in china before march, , particularly in hubei province, confirmed covid- cases had been reported in more than countries or territories by the end of march, , and were still increasing rapidly. one urgent need in coping with this global crisis is to develop or discover drugs that can treat the diseases caused by the novel coronavirus, sars-cov- (also known as -ncov) . according to comparative genome studies, sars-cov- belongs to the genus beta-coronavirus, with nucleotide sequence identity of about % compared to the closest bat coronavirus ratg , about % compared to two other bat sars-like viruses (bat-sl-covzc & bat-sl-covzxc ), and % compared to sars-cov , . furthermore, the sars-cov- spike protein has a protein sequence identity of % for the receptor binding domain (rbd) with the sars-cov rbd (denoted as sars-rbd in the following). sars-cov and sars-cov- both utilize the human angiotensin converting enzyme protein (ace ) to initiate the spike protein binding and facilitate the fusion to host cells - . the -residue rbd of the sars-cov spike protein has been found to be sufficient to bind the human ace . based on this fact, the rbd of sars-cov- becomes a critical protein target for drug development to treat covid- . 
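The sequence identities quoted above are percentages of matching positions between aligned sequences. A minimal sketch of that calculation is given below, assuming the two sequences have already been aligned to equal length with "-" as the gap character. The function name and the toy sequences are hypothetical, and the convention of counting only columns where neither sequence has a gap is one of several in use; the text does not say which the cited studies applied.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (not real spike protein sequences).
print(percent_identity("ACDEF-GH", "ACDQFKGH"))
```

Because the denominator choice changes the reported value, identities from different sources are only comparable when they use the same alignment and the same gap convention.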
when this study was started, neither the crystal structure of the sars-cov- spike protein nor that of the rbd segment had been determined, so the homology modeling approach was applied to construct the model of the sars-cov- spike rbd in complex with the human ace binding domain (denoted as cov -rbd/ace in the following). a similar approach has been applied to predict the complex structure and estimate the binding energies . because of the high sequence similarity between cov -rbd and sars-rbd, the predicted structure was found to be highly consistent with the resolved crystal structures (see another crystal structure at http://nmdc.cn/ncov entry:nmdcs , pdbid: lzg). these structures laid the foundation for investigating the dynamics of the cov -rbd/ace complex using computational simulation methods. the predicted cov -rbd/ace model was subjected to all-atom molecular dynamics (md) simulations to study the binding interactions. although the crystal structure and the predicted model of the cov -rbd/ace complex provide important information about the binding interactions at the molecular interfaces, md simulations can extend the knowledge to a dynamics regime in a fully solvated environment. the importance of the ace residues was investigated by simulating the complexes with ace mutants, in which partial dissociation from the ace was observed within ns simulations. the control simulations of the sars-rbd/ace complexes allowed a detailed comparison of receptor binding for the two viruses. the results showed that the wild type cov -rbd/ace complex is stable in ns simulations, especially in the well-defined binding interface. on the other hand, mutations on the helix- or k of the ace can alter the binding, revealing new binding poses with reduced contacts compared to those in the crystal structures.
the analysis of the interaction energy showed that the binding is enhanced by adjusting conformations to form more favorable interactions as the simulation progressed, consistent with the increased hydrogen bonding patterns. furthermore, the analysis also showed that sars-rbd and cov -rbd have comparable binding affinities to the ace , with the former slightly stronger than the latter. the dynamic information obtained in this study should be useful for understanding sars-cov- host interactions and for designing inhibitors to block cov -rbd binding.

the computer model of the sars-cov- spike rbd in complex with human ace

the spike rbd of sars-cov- (genbank: mn ) comprises cys -gly residues according to the sequence homology analysis with the sars-cov spike rbd. the predicted three-dimensional structure model of these residues was obtained with the swiss-model server . this predicted sars-cov- rbd model was subsequently superimposed onto the x-ray structure of sars-cov rbd in complex with human ace (pdb code: ajf, chain d ). finally, the computer model of sars-cov- rbd with human ace (cov -rbd/ace ) was obtained for further simulations and analysis. based on the analysis of the predicted model, sequence alignment, and literature survey, two other systems containing mutations in the human ace were prepared and subjected to md simulation studies. the mutant constructs are based on the fact that rat ace shows markedly diminished interactions with the sars spike protein , and it was proposed that the rat ace likely has reduced binding affinity to the cov -rbd . to investigate the roles of critical residues on the ace , we created two mutants of the human ace (see table ): ( ) mutant mut_h , with the ace n-terminal (residue - ) mutated to the residues of rat ace ; and ( ) mutant k h, in which the highly conserved k was mutated to histidine (the corresponding amino acid in rat and mouse ace proteins).
to focus on the impact of these two binding sites, the rest of the ace was kept the same as in human ace . the predicted model of the cov -rbd/ace complex was used as the starting model for md simulations. the spike protein rbd is composed of residues ( - ), while the ace protein contains residues from the n-terminal domain. the simulation parameterization and equilibration were prepared for the complex structures, including the mutant systems, using the charmm-gui webserver . each system was solvated in tip p water, with sodium and chloride ions added to neutralize the system at a salt concentration of mm. each system was composed of about , atoms that were parametrized with the charmm force field . after energy minimization using the steepest descent algorithm, each system was equilibrated at human body temperature . k, which was maintained by the nose-hoover scheme . three independent trajectories, starting from random velocities based on maxwell distributions, were simulated for both the cov -rbd/ace and sars-rbd/ace complex systems in their wild types. in all simulations, a time step of . fs was used and the particle mesh ewald (pme) method was applied for long-range electrostatic interactions. the van der waals interactions were evaluated within the distance cutoff of . Å. hydrogen atoms were constrained using the lincs algorithm . the human ace mutants in complex with cov -rbd were constructed as described previously. each mutant complex model was simulated in two independent trajectories. furthermore, as the crystal structure of the cov -rbd/ace complex became available, two additional simulations were carried out using the crystal structure as the starting model to cross-validate the simulation results based on the homology model. each trajectory was propagated to ns by following the same protocol as the wild type cov -rbd/ace complex simulations.
analyses were carried out with tools in gromacs (such as rmsd, rmsf, energy, and pairdist) to examine the system properties, such as the overall stability and the local residue and general structure fluctuations throughout the simulations. the g_mmpbsa program was applied to extract the molecular mechanics energy emm (lennard jones and electrostatic interactions) between ace and the rbd of spike proteins. vmd and chimera were applied to analyze the hydrogen bonds, the molecular binding interface, and the water distributions, and to visualize and render model images , . the homology structure of the cov -rbd/ace was compared to the sars-rbd/ace crystal structure and the newly resolved crystal structure of the cov -rbd/ace . the results indicated that the homology model is accurate, especially at the binding interface. the md simulations further refined the side chain orientations to improve the model quality. the simulation data revealed stable binding between the cov -rbd and the ace , in spite of the conformational changes of the ace . the relative movement between the cov -rbd and the ace was mainly a swinging motion pivoted at the binding interface. simulations also revealed the roles of water molecules in the binding of the rbd to the ace receptor. the md simulations of the complex with ace mutants suggested that mutations of the ace helix- and the k can alter the binding modes and binding affinity. the predicted cov -rbd/ace complex structure is highly similar to the sars-rbd/ace , as shown in figure . the rbd domain has an rmsd of . Å for the aligned residues ( . Å for all residue pairs), indicating that the homology model of the cov -rbd is in good agreement with the sars-rbd. for ace residues near the binding interface (within . Å of the rbd), the rmsd is smaller than . Å compared to the sars-rbd/ace complex. the superposed structures revealed that the rbd/ace interfaces are almost identical in the two complexes (figure c ).
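the rmsd values quoted above compare matched atoms of two structures. as a minimal sketch (not the gromacs implementation the authors used), the function below computes the rmsd for two coordinate sets that are assumed to be already superposed and listed in the same atom order; the function name and the coordinates are hypothetical and serve only as an illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both structures are already superposed, and atom i of
    coords_a corresponds to atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# hypothetical coordinates in angstrom, for illustration only
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model, crystal))  # every atom is displaced by 1 angstrom
```

in practice the two structures must first be optimally superposed (for example on the aligned residues), a step this sketch leaves out.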
in a retrospective comparison, the predicted complex structure was superposed to the newly resolved crystal models (see figure d for a detailed comparison at the interface). the results indicated that the homology model is very accurate, especially for the binding interface. the residues near the cov -rbd/ace interface (defined as the combined set of ace residues within . Å of rbd and the rbd residues within . Å of the ace ) exhibited a difference of . Å rmsd, which is comparable to the difference between the two independently reported crystal models (an rmsd of . Å for the same comparison). the rmsd is about . Å for residues in an extended region within . Å of the binding interface. the rbd domain of the spike protein showed an overall rmsd of less than . Å, and the ace domain an rmsd of about . Å, between the predicted model and the crystal structures. in three simulations of the cov -rbd/ace systems, the binding interface was highly stable, exhibiting very small conformational changes, especially for the interfacing residues of the ace protein. the rmsd for the residues at the rbd binding interface is . Å (+/- . Å) on average. side chain atom positions were refined to form more favorable interactions (figure d) . one outstanding example is the k side-chain: it pointed in the wrong orientation in the predicted structure, but was quickly refined to the correct orientation, consistent with the crystal structure (right panel of figure d ).
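the interface definition used above (residues of one protein within a distance cutoff of the other) can be sketched as follows. this is only an illustration under stated assumptions: the residue labels, coordinates, and cutoff value are hypothetical, and a single cutoff is applied in both directions.

```python
import math


def interface_residues(protein_a, protein_b, cutoff):
    """Return (residues of A near B, residues of B near A).

    Each protein is a dict mapping a residue label to a list of (x, y, z)
    atom positions. A residue belongs to the interface if any of its atoms
    lies within `cutoff` of any atom of the other protein.
    """
    near_a, near_b = set(), set()
    for res_a, atoms_a in protein_a.items():
        for res_b, atoms_b in protein_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                near_a.add(res_a)
                near_b.add(res_b)
    return near_a, near_b


# hypothetical one-atom residues, coordinates in angstrom
ace = {"a1": [(0.0, 0.0, 0.0)], "a2": [(10.0, 0.0, 0.0)]}
rbd = {"r1": [(0.0, 0.0, 3.0)], "r2": [(0.0, 0.0, 20.0)]}
near_ace, near_rbd = interface_residues(ace, rbd, cutoff=4.0)
print(sorted(near_ace), sorted(near_rbd))
```

the combined interface set described in the text would then be the union of the two returned sets.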
in terms of collective conformational changes, the cov -rbd/ace complex showed two interesting movements: ( ) the loop (l ) between β and β (residues between s and g in particular) of cov -rbd was found to expand its interactions with the n-terminal helix (the helix- ) of the ace (figure a) , while it pointed away from the helix- in the predicted and the crystal structures (figure d, left) ; ( ) a tilting movement of the rbd relative to the ace was observed, which can be depicted as a swinging motion with the binding interface as the pivot (see figure for an illustration). in both the predicted and the crystal structures of the cov -rbd/ace complex, the l does not form close contacts with the ace . the analysis of the crystal packing revealed that this loop participated in the interaction with another asymmetric unit (see supplemental information). interestingly, the simulation data suggested that the l could move towards the ace and form contacts with the helix- . this can potentially enhance the binding, as reflected in the change of interaction energies. in the crystal structure, the c and c of the rbd are cross-linked via a disulfide bond that reduces the flexibility of the l region, limiting its access to the ace . on the other hand, it has been reported that the binding of sars-rbd to ace is to a large extent insensitive to the redox states of the cysteines . based on the simulation results, we hypothesize that the reduced form of c and c can also exist during virus invasion of host cells, and the reduced cysteines can potentially enhance the binding to ace . in the other two simulation trajectories, we found that the l remained in conformations similar to that in the crystal structure and the cysteines (c and c ) were close enough for disulfide bond formation.
by examining the binding interface of cov -rbd and the ace , we found that polar and charged residues account for a large fraction; therefore, electrostatic interactions play critical roles in complex formation. based on the distances between the two proteins, the key residues at the binding interface were identified and summarized in table for the three representative models (see figure ). the majority of these residues are conserved across the three models, except that the model# (figure a) has additional contacts to the ace from residues in the l region (highlighted with green color in table ). as shown in figure , the l remained in the starting position for the other two representative models (figure b,c) . the same simulations were carried out for the sars-rbd/ace complex, serving as a comparative system. interestingly, the sars-rbd counterpart of the l in cov -rbd did not form close contacts with the ace in three simulations. it is worth mentioning that the sequence identity between cov -rbd and sars-rbd is low in this loop region, suggesting the loop region might be partially responsible for the difference in receptor binding. the hydrogen bonds between the cov -rbd and ace were extracted using vmd . based on the statistics of three simulation trajectories, the cov -rbd/ace complex has . hydrogen bonds between the subunits on average under stringent criteria. in comparison, the sars-rbd/ace has . hydrogen bonds on average (see supplementary information ). the statistics of hydrogen bonds suggest a slightly weaker binding between the cov -rbd and the ace , relative to the sars-rbd/ace complex. it is also worth pointing out the important roles of water molecules at the interface of the cov -rbd/ace complex. at any instant, there are approximately water molecules at the binding interface, simultaneously located within . Å of both the cov -rbd and the ace (figure ).
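the interfacial waters described above are those located within a cutoff of both proteins at the same time. a minimal sketch of that selection for a single simulation frame is given below; the function name, the cutoff, and all coordinates are hypothetical, and the authors' own analysis was done with vmd rather than with this code.

```python
import math


def bridging_waters(water_oxygens, atoms_a, atoms_b, cutoff):
    """Indices of waters whose oxygen lies within `cutoff` of BOTH proteins."""

    def near(point, atoms):
        return any(math.dist(point, atom) <= cutoff for atom in atoms)

    return [
        index
        for index, oxygen in enumerate(water_oxygens)
        if near(oxygen, atoms_a) and near(oxygen, atoms_b)
    ]


# hypothetical single-frame coordinates in angstrom
rbd_atoms = [(0.0, 0.0, 0.0)]
ace_atoms = [(5.0, 0.0, 0.0)]
waters = [(2.5, 0.0, 0.0), (0.0, 0.0, 2.0), (20.0, 0.0, 0.0)]
interfacial = bridging_waters(waters, rbd_atoms, ace_atoms, cutoff=3.0)
print(interfacial)  # only the first water is close to both proteins
```

repeating the selection over all frames and tracking how long each index stays in the list would give an estimate of the dwelling time mentioned in the text.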
these water molecules can function as bridges by forming hydrogen bonds with the residues from the rbd or the ace . the dwelling time of water molecules at the interface can be a few nanoseconds, as revealed by the md simulations. this result is also consistent with the crystal structure, which has water molecules at the interface (figure c ). these findings emphasize the role of the water molecules, which deserves detailed quantification to understand the interactions between the rbd and the ace . it has been demonstrated that the ace proteins from several mammalian species possess high sequence similarities, yet their binding to the sars-rbd differs significantly. in particular, the binding of sars-rbd to the rat ace is much weaker, as discovered in experiments . inspired by this information, two mutants of the cov -rbd/ace were constructed: ( ) ace -mut-h by mutating n-terminal helix- to that of the rat ace ; ( ) ace -k h by mutating k to histidine (the amino acid in wild type rat ace ). two ns md simulations were carried out for each mutant system. the simulations showed that the mutations in ace -mut-h reduced the interaction between the cov -rbd and the helix- , and the ace -k h showed weaker binding between the cov -rbd and the β-hairpin centered at the h . using clustering analysis, the representative structures were identified from each simulation trajectory (figure ) . although the overall topology is very similar to the wild type complex structure, there are pronounced differences. for the ace -mut-h system, the cov -rbd tilted further away from the ace helix- in one simulation (figure a) ; and the cov -rbd lost its contact with helix- (g to n ) in another simulation for the ace -k h (figure c ). in the wild type ace , the k is a hydrogen donor, and its mutant h cannot form the hydrogen bond with the cov -rbd as in the wild type cov -rbd/ace complex. the number of contacting residue pairs was significantly reduced in the ace -k h mutant system.
this is in line with the report that k is more critical than the other residues, as its hydrophobic neighborhood gives this positively charged residue high selectivity for the rbd , . the physical interactions between the rbd and the ace were quantified for the simulated structures. we considered the molecular mechanics energy emm , which is composed of the van der waals and the electrostatic interactions. furthermore, the number of residue contacts (nc) between the rbd and ace was extracted from simulated structures. both the emm and nc indicate that the rbd interactions with the ace are comparable for cov and sars spike proteins (figure ) . from the simulations, the we would like to point out that the energy emm is the physical interaction between the rbd and the ace , rather than the binding energy, which requires accurately incorporating solvation energy and entropy. furthermore, the standard deviations of emm are . kj/mol and . kj/mol for the two complexes. therefore, we infer that the binding affinities are comparable for cov -rbd/ace and sars-rbd/ace . the simulations started from the predicted and crystal models yielded very similar results (purple triangles). this is in line with a recent study, in which the authors showed similar binding affinity to human ace for both sars-cov- and sars-cov spike proteins . they found the association rate constants kon to be the same at . x m- s- , while the sars-cov spike protein showed a faster dissociation, with a rate constant koff of . x - s- , about . times larger than that of the sars-cov- spike protein (koff = . x - s- ). similar kon values and equilibrium dissociation constants kd in the nanomolar range were reported in other studies for sars-cov- spike protein (or rbd) binding to human ace , .
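the kinetic quantities above are linked by the standard relation kd = koff / kon for simple one-to-one binding. the sketch below illustrates the arithmetic only: the rate constants are hypothetical placeholders, not the values from the cited study (those digits are missing from this text), and the function name is ours.

```python
import math


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant Kd (M) from kon (M^-1 s^-1) and koff (s^-1)."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


# hypothetical values for illustration only
k_on = 1.0e5            # same association rate assumed for both proteins
k_off_slow = 1.0e-3     # slower dissociation
k_off_fast = 4.0e-3     # faster dissociation

kd_slow = dissociation_constant(k_on, k_off_slow)
kd_fast = dissociation_constant(k_on, k_off_fast)
# with equal kon, the ratio of the kd values equals the ratio of the koff values
print(kd_slow, kd_fast, kd_fast / kd_slow)
```

with identical kon, a faster koff translates directly into a proportionally larger kd, i.e. weaker binding, which is why the two spike proteins can have affinities of the same order despite different dissociation rates.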
more interestingly, the mutation impacts were reflected in the emm and nc analysis: the ace -mut_h mutations are likely to reduce the binding to the ace due to the tilting movement of cov -rbd, which moves it further from the ace helix- (the blue triangle symbol at lower right, see figure a for the representative structure). in the other simulation trajectory for the cov -rbd/ace -mut_h complex (blue triangle at the left upper corner), the largest nc was observed among all simulations. for simulations of the complex with ace -k h mutants (green diamonds), the number of contacts was reduced in both trajectories compared to the wild type system. in one simulation, the contacts between the cov -rbd and the helix- of the ace were completely lost (see figure c) , consistent with the less favorable interactions reflected in an increase of emm. for the sars-rbd interaction with the ace -mut_h , both simulations revealed fewer contacts compared to the wild type sars-rbd/ace complex (purple stars in figure ). the homology modeling of the cov -rbd/ace complex yielded highly consistent models compared to the crystal structures. all-atom molecular dynamics simulations were carried out to study the dynamic interactions of cov -rbd with human ace , and the results were compared to the sars-rbd/ace system. the human ace mutants were also constructed to mimic the rat ace to investigate the roles of critical residues, and possible binding modes in other mammals. we observed that md simulations improved the structure at the binding interface and strengthened the interactions between the subunits. the structure of the complex interface is highly stable for all simulations of the cov -rbd/ace complex in the wild type. the loop region between β and β can potentially form more contacts with the ace , as observed in one simulation trajectory.
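the molecular mechanics energy emm used in this analysis is the sum of lennard-jones (van der waals) and electrostatic terms between the two proteins. the sketch below shows such a pairwise sum under explicit assumptions: gromacs-style units (nm, kj/mol, elementary charges), lorentz-berthelot combining of the per-atom parameters, no cutoff and no pme, and hypothetical atom parameters. it is not the g_mmpbsa implementation.

```python
import math

# electric conversion factor in GROMACS units: kJ mol^-1 nm e^-2
COULOMB_CONSTANT = 138.935458


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j):
    """Coulomb energy between two point charges at separation r (vacuum)."""
    return COULOMB_CONSTANT * q_i * q_j / r


def interaction_energy(atoms_a, atoms_b):
    """Return (E_vdw, E_elec) summed over all atom pairs between two groups.

    Each atom is (position, charge, sigma, epsilon). Assumption: pair
    parameters follow Lorentz-Berthelot rules (arithmetic mean of sigma,
    geometric mean of epsilon).
    """
    e_vdw = e_elec = 0.0
    for pos_i, q_i, sig_i, eps_i in atoms_a:
        for pos_j, q_j, sig_j, eps_j in atoms_b:
            r = math.dist(pos_i, pos_j)
            e_vdw += lennard_jones(r, 0.5 * (sig_i + sig_j), math.sqrt(eps_i * eps_j))
            e_elec += coulomb(r, q_i, q_j)
    return e_vdw, e_elec


# hypothetical two-atom example: opposite unit charges 0.3 nm apart
group_a = [((0.0, 0.0, 0.0), +1.0, 0.3, 0.5)]
group_b = [((0.3, 0.0, 0.0), -1.0, 0.3, 0.5)]
e_vdw, e_elec = interaction_energy(group_a, group_b)
print(e_vdw + e_elec)  # E_MM for this toy pair
```

as the text notes, emm measures the direct physical interaction only; a binding free energy would additionally require solvation and entropy terms.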
the simulation results also reveal that the interactions between cov -rbd and the ace are mediated by water molecules at the interfaces, stressing the necessity of accounting for the explicit water molecules when quantifying the binding affinity. the interactions between the rbd and the ace were quantified by physical interaction energies (molecular mechanics energy) and the number of contacting residues. the detailed comparison results suggest that the cov -rbd and the sars-rbd bind to human ace with comparable affinity. in the comparison between the sars-rbd/ace and the cov -rbd/ace complexes, the former forms fewer contacts than the latter (figure ), yet exhibits stronger interactions. the decomposition of the emm into the van der waals and the electrostatic interactions revealed that the major difference is attributed to the electrostatic interactions. furthermore, we compared the major contacting residues and found that the sars-rbd has two charged residues (r and d ) and the cov -rbd has only one charged residue (k ) at the complex interface. the polar and hydrophobic residues are comparable in the two rbds. this is consistent with the statistics of hydrogen bonds at the complex interfaces. this study was started with a structure predicted using the homology modeling method, which was later found to be highly consistent with the crystal structure, demonstrating the potential of structure prediction and dynamics simulation in revealing molecular details before the availability of high resolution experimental information. furthermore, the interactions between cov -rbd and the ace mutants mimicking rat ace protein were investigated. the results provide valuable information at the atomic level for the reduced binding affinity. the recent report on the sars-cov- infection to a dog

table of ace mutants (mutant; mutations; remark)
mut_h ; t l, q k, k e, t s, d n, h q, f s; l , n , q , and s are conserved between rat and mouse.
k h; k h; h is conserved between rat and mouse

table .
contact residues between the cov -rbd and the ace . green color denotes new interaction not observed in crystal structure.
model# t f d k h e e d y q m y n k g d r k y l f q a g s t g f n y q y g q t n g y s q t f d k h e d y q l m y n k g d r k g y y l f a f n y q y g q t n g y q t f d k h e e d y q m y n k g d r k g y y l f a f n y q y g q t n g y

table s . contact residues at the sars-cov-rbd/ace interface
traj traj traj ace cov ace cov ace cov s q t k h e d y q l l l l m y q e n k g d r r y y y l d l n y n g y t t g i y q t d k h e d y q l l l l m y q e n k g d r r y y y l p d g l n y n g y t t g i y q t k h e d y q l l l l m y q e n k g d r r y y l p d g p l n y l n g y t t g i y m t k k y e m t k k y e

fig. : cov -rbd/ace and sars-rbd/ace superposed (panels a, b, c). fig. : panels a, b, c.

a novel coronavirus from patients with pneumonia in china
a new coronavirus associated with human respiratory disease in china
evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission
genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding
angiotensin-converting enzyme is a functional receptor for the sars coronavirus
a -amino acid fragment of the sars coronavirus s protein efficiently binds angiotensin-converting enzyme
structure of sars coronavirus spike receptor-binding domain complexed with receptor
trilogy of ace : a peptidase in the renin-angiotensin system, a sars receptor, and a partner for amino acid transporters
the pathogenicity of novel coronavirus in hace transgenic mice
crystal structure of the -ncov spike receptor-binding domain bound with the ace receptor
swiss-model: homology modelling of protein structures and complexes
efficient replication of severe acute respiratory syndrome coronavirus in mouse cells is limited by murine angiotensin-converting enzyme
receptor recognition by novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars
simulations using the charmm additive force field
optimization of the additive charmm all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ and χ dihedral angles
a unified formulation of the constant temperature molecular dynamics methods
canonical dynamics: equilibrium phase-space distributions
polymorphic transitions in single crystals: a new molecular dynamics method
gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers
particle mesh ewald: an n·log(n) method for ewald sums in large systems
lincs: a linear constraint solver for molecular simulations
g-mmpbsa - a gromacs tool for high-throughput mm-pbsa calculations
vmd: visual molecular dynamics
ucsf chimera - a visualization system for exploratory research and analysis
receptor and viral determinants of sars-coronavirus adaptation to human ace
significant redox insensitivity of the functions of the sars-cov spike glycoprotein: comparison with hiv envelope
structural analysis of major species barriers between humans and palm civets for severe acute respiratory syndrome coronavirus infections
receptor recognition mechanisms of coronaviruses: a decade of structural studies
structure, function, and antigenicity of the sars-cov- spike glycoprotein
cryo-em structure of the -ncov spike in the prefusion conformation. science ( -. )
coronavirus: hong kong confirms a second dog is infected

the authors declare no competing interests.
key: cord- -pubhq authors: bryche, bertrand; st albin, audrey; murri, severine; lacôte, sandra; pulido, coralie; gouilh, meriadeg ar; lesellier, sandrine; servat, alexandre; wasniewski, marine; picard-meyer, evelyne; monchatre-leroy, elodie; volmer, romain; rampin, olivier; le goffic, ronan; marianneau, philippe; meunier, nicolas title: massive transient damage of the olfactory epithelium associated with infection of sustentacular cells by sars-cov- in golden syrian hamsters date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pubhq anosmia is one of the most prevalent symptoms of sars-cov- infection during the covid- pandemic. however, the cellular mechanism behind the sudden loss of smell has not yet been investigated. the initial step of odour detection takes place in the pseudostratified olfactory epithelium (oe) mainly composed of olfactory sensory neurons surrounded by supporting cells known as sustentacular cells. the olfactory neurons project their axons to the olfactory bulb in the central nervous system offering a potential pathway for pathogens to enter the central nervous system by bypassing the blood brain barrier. in the present study, we explored the impact of sars-cov- infection on the olfactory system in golden syrian hamsters. we observed massive damage of the oe as early as days post nasal instillation of sars-cov- , resulting in a major loss of cilia necessary for odour detection. these damages were associated with infection of a large proportion of sustentacular cells but not of olfactory neurons, and we did not detect any presence of the virus in the olfactory bulbs. we observed massive infiltration of immune cells in the oe and lamina propria of infected animals, which may contribute to the desquamation of the oe. the oe was partially restored days post infection. 
anosmia observed in covid- patient is therefore likely to be linked to a massive and fast desquamation of the oe following sustentacular cells infection with sars-cov- and subsequent recruitment of immune cells in the oe and lamina propria. respiratory viruses are very common worldwide and impact human and animal health with important economic consequences. although most symptoms of respiratory virus infection are related to airway inflammation and focused on the respiratory tract, extra respiratory organs are also targeted by these viruses including the central nervous system (bohmwald et al., ) . the recent pandemic of coronavirus disease emerging in december has been accompanied by a prevalence of olfaction alteration. indeed, approximately % of patients infected by the severe acute respiratory syndrome coronavirus (sars-cov- ) that is the cause of covid- suffered from mild (hyposmia) to total loss of olfactory capacities (anosmia) (spinato et al., ) . the origin of these alterations remains to be established. olfaction starts with olfactory sensory neurons (osn) located in the olfactory epithelium (oe) present in the most dorsal part of the nasal cavity. the oe along with the underlying lamina propria constitutes the olfactory mucosa. osns have cilia in direct contact with the environment in order to detect odorants. they are therefore accessible to respiratory viruses and some of these viruses are able to infect the osn and reach the central nervous system by following the olfactory nerves which project to the olfactory bulbs (forrester et al., ; bryche et al., a) . interestingly, sars-cov- (netland et al., ) was described as a virus able to use this olfactory pathway. ace (angiotensin-converting enzyme ) was characterized as the main entrance receptor for sars-cov- (letko et al., ) , similarly to sars-cov- . sars-cov- was described as being able to enter the central nervous system in mice transgenic for human ace (netland et al., ) . 
in this model, the brain infection starts from the olfactory bulbs and while the authors did not look for a potential infection of the olfactory mucosa, it may be that sars-cov- could infect olfactory sensory neurons, reaching the brain through the olfactory bulbs where these neurons project (forrester et al., ) . if sars-cov- infection follows the same pathway, anosmia following sars-cov- infection may correlate with the encephalopathies observed in some covid- patients (roe, ) and further highlights the need to unravel the cellular basis of the observed anosmia. golden syrian hamsters have been successfully used as a model of sars-cov- infection (roberts et al., ) and have also been recently shown to be a good model for sars-cov- as well (sia et al., ) . indeed, the expression profile of ace , the entry receptor for sars-cov- , is very similar in hamsters and humans (luan et al., ) . anosmia in covid- patients provided impetus to study the expression profile of ace in the nasal cavity. several studies show that ace is present in the oe but expressed in sustentacular cells, the supporting cells surrounding the olfactory neurons, rather than in the olfactory neurons themselves (bilinska et al., ; fodoulian et al., ) . however, a recent study suggested that sars-cov- could infect olfactory sensory neurons in hamsters (sia et al., ) . in the present work, we focused on the pathological impact of two different sars-cov- strains on the nasal cavity and the central nervous system of the golden syrian hamster. viral strains were isolated from nasopharyngeal swabs obtained from patients suffering from respiratory infection, suspected of covid- and submitted to molecular diagnosis. nasopharyngeal flocked swabs were suspended in utm media (copan, italy) , and kept at °c for less than hours. a volume of µl was homogenized and microfiltered ( . 
µm) previous to inoculation on vero cells ccl- (passage , from atcc, usa) grown at % confluence in a bsl virology laboratory (virology unit, chu de caen, france). supernatants were harvested at day after inoculation and immediately used in subsequent passage one (p ) of the virus following the same protocol. p was used for stock production and each stock was aliquoted and conserved at - °c before titration and genomic quantification. two strains were used in the model : ucn and ucn , isolated at one week interval during the course of the active epidemic in normandy, france (end of march). animal experiments were carried out in the animal-biosafety level (a-bsl ) facility of the animal experimental platform (anses -laboratoire de lyon). the protocol for animal use was approved by the french ministère de la recherche et de l'enseignement supérieur number (apafis n° - ). twelve female golden syrian hamsters (eight weeks old) were purchased from janvier's breeding centre (le genest, st isle, france). infection was done by nasal instillation ( µl in each nostril) of anaesthetised animals (vetflurane). five animals were infected with . plaque forming units (pfu) of sars-cov strain ucn , five animals were infected with . pfu of sars-cov strain ucn and two mock-infected animals received only dulbecco's minimal essential medium (dmem). sars-cov -infected hamsters were euthanized after general anaesthesia (vetflurane) at , , , or days after infection. at post-mortem, lungs were harvested, weighed and homogenized in ml of dmem with stainless steel beads (qiagen) for s at hz using tissuelyserii (qiagen). lung homogenates were clarified by centrifugation ( , × g at °c for min), aliquoted and stored at − °c until rna extraction. rna extraction was done with qiaamp viral rna minikit (qiagen) following manufacturer's instructions. real time quantitative pcr was performed on the e gene as described recently (corman et al., ) on a lightcycler lc (roche). 
the immunohistochemistry analysis of the olfactory mucosa tissue sections was performed as described previously in mice (bryche et al., a) . briefly, the whole animal head was fixed for days at room temperature in % paraformaldehyde pbs, then decalcified for one week ( % edta -ph . at °c). the nasal septum and endoturbinates were removed as a block and post-fixed overnight at °c in % paraformaldehyde pbs and further decalcified for days. the same procedure was applied to the cortex and brainstem but without decalcification. blocks and tissues were cryoprotected with sucrose ( %). cryo-sectioning ( μm) was performed on median transversal sections of the nasal cavity, perpendicular to the hard palate, in order to highlight the vomeronasal organ, olfactory mucosa, steno's gland and olfactory bulb. the brain was cryo-sectioned ( µm) in the coronal plane. sections were stored at - °c until use. non-specific staining was blocked by incubation in % bovine

for histology, some sections were rehydrated then stained in hematoxylin (h- vector laboratories) for s and washed with water. slides were then transferred to an acid-alcohol solution ( . % hcl in % ethanol) for s and washed in water, then immersed in eosin (sigma ht ) for s and washed again in water. finally, slides were gradually dehydrated and mounted in eukitt (sigma - - ). nissl staining was performed on some sections following conventional histological procedures, including immersion in . % cresyl violet solution for minutes. images were taken at x magnification using a dmbr leica microscope equipped with an olympus dp- ccd camera using cell f software (olympus soft imaging solutions gmbh, osis, munster, germany). we measured the oe thickness, the osn cilia quality (based on g olf staining) and the immune cell infiltration of the olfactory mucosa (based on the iba staining) on images per animal taken from different slides spread along the nasal cavity.
for the oe thickness, measurements along the oe septum were performed for each image, as previously described (francois et al., ) . the percentage of apical oe with a preserved g olf staining and the percentage of iba + cell infiltration in the oe were measured using imagej to threshold specific staining. all values were averaged per animal. we also globally estimated the impact of sars-cov- on the oe on these parameters. by comparing mock-infected animals with sars-cov- infected animals, we evaluated: the integrity of the structure of the oe using he staining; the preservation of the osn cilia using g olf signal in the apical part of the oe; the presence and shape of iba + cells in the oe and in the underlying lamina propria. a ramified iba + cell is considered representative of a resting state while an amoeboid shape is linked to an activated state in the oe (mori et al., ) . for each parameter, we used a scale ranging from to scored on three sections of the olfactory mucosa spread along the nasal cavity per animal. this evaluation was performed by three experienced investigators blinded to the animal treatment. scores were averaged per animal between investigators. we used a cohort of hamsters to examine the impact of two different isolates of sars-cov- (ucn and ucn ) on the nasal cavity from to days post infection (dpi) through nasal instillation. while the olfactory epithelium was well preserved in control animals, we observed massive damage of the olfactory epithelium in sars-cov- -infected animals as early as dpi (fig. a) . the nasal cavity lumen was filled with cellular aggregates and most of the olfactory epithelium was strongly disorganized. in the most affected oe zones, axon bundles from the olfactory sensory neurons under the olfactory epithelium were almost in direct contact with the environment (fig. b) . as an objective measure of olfactory mucosa damage, we first measured the septal olfactory epithelium thickness (fig. 
c) and we then scored the overall lesions of the oe (fig. d) . most of the oe had disappeared at dpi, after which point it started to recover but had not reached pre-infection thickness at dpi when our measurements stopped. the results were similar for both viruses. we then examined the cellular localisation of sars-cov- in the nasal cavity by immunohistochemistry, using an antibody raised against the nucleocapsid protein of the virus. we observed that a large proportion of the oe was infected at dpi, which decreased at dpi and subsequently disappeared ( fig. a and supp. fig. a, b) . the virus was also present in the vomeronasal organ (supp. fig. a ) and in the steno's gland (supp. fig. b ) at dpi and dpi. we also examined various areas of the brain. we could not find any presence of the virus in the olfactory bulbs (supp. fig. a) , in the olfactory cortex (piriform cortex and olfactory tubercle), hypothalamus, hippocampus, and in the brainstem where we mainly focused on the respiratory control centres: the ventral respiratory column and the nucleus of the solitary tract (supp. fig. b) . the virus presence in the nasal cavity followed a kinetic similar to its presence in the lungs (supp. fig. c) . in order to identify which cells were infected with sars-cov- in the oe, we performed double staining of the olfactory mucosa with omp and ck , specific markers of mature osns and sustentacular cells respectively. while most of the cells infected with sars-cov expressed ck as well, we were not able to find any viral antigen in omp expressing cells. these results show that sars-cov- infects sustentacular cells but not osns (fig. ) . interestingly, we observed staining of omp, ck , sars-cov- as well as hoechst in the lumen of the nasal cavity indicating that part of the oe containing infected and non-infected cells was desquamated in response to sars-cov- infection. 
in order to understand the extent of the damage of the osns following sars-cov- infection, we examined the preservation of the osn cilia layer, which contains all the transduction complex allowing the detection of odours (dibattista et al., ) . we performed immunohistochemistry against g olf (fig. a, b) , a specific g protein from this complex (jones and reed, ) . we quantified the percentage of septal olfactory epithelium containing g olf staining in the apical part of the oe (fig. c) and scored globally the quantity of g olf staining in the oe (fig. d) . while the g olf signal was completely preserved in control animals, it quickly diminished in sars-cov- -infected animals without any clear improvement from dpi to dpi. we observed a small recovery of the g olf signal only in the septal part of the oe dpi with the ucn virus. we finally examined the presence of immune cells in the olfactory mucosa using iba + as a specific marker of monocyte/macrophage lineage (bryche et al., b) ; (fig. a) . in control animals, iba + cells were mainly localized in the lamina propria and exhibited a ramified morphology with many processes (fig. b) , which is typical of a resting phenotype (mori et al., ) . we measured the presence of iba + cell (fig. c) and scored the presence and shape of iba + cells in the oe and the lamina propria separately (fig. d) . while iba + cells were mostly absent from the oe and moderately present in the lamina propria in control animals, their presence was drastically increased up to dpi in infected animals, both in the oe and the lamina propria (fig. a, b) . furthermore, iba + cells were mostly of amoeboid shape which is typical of an activated state at to dpi (fig. b) . we also observed iba + cells in the lumen of the nasal cavity in the cellular aggregates containing sars-cov- (fig. a ). immune cell infiltration gradually diminished. this reduction was first observed at dpi in the oe but was not seen in the lamina propria before dpi. 
at dpi, it reached the mock level both in the oe and the lamina propria for both virus strains (fig. c, d) . in the present study, we focused on the impact of sars-cov- in the nasal cavity and central nervous system using the golden syrian hamster as a model. we chose to work with different isolates of sars-cov- , with the potential to obtain slightly different strain behaviour. we explored the kinetics of the impact of these two different isolates. we identify sustentacular cells as the main target cells of sars-cov- in the oe. this observation is consistent with the expression pattern of ace mainly present in these cells along with the facilitating protease tmprss (bilinska et al., ; fodoulian et al., ) . two recent reports indicate that olfactory neurons in hamster (sia et al., ) and respiratory cells in ferret (ryan et al., ) may be the target of sars-cov- but these studies did not focus on the nasal cavity and they did not use double staining to clearly identify the infected cells in the oe. we observed that sars-cov- can infect other non-neuronal cells in the nasal cavity, notably in the epithelium covering the lumen of steno's gland, which is however poorly described in the literature (bryche et al., b) . further studies are required to clearly identify the other non-neuronal cells infected by sars-cov- in the nasal cavity which may facilitate the systemic infection of other cell types and tissues lower in the respiratory airways. we did not observe any presence of the virus in the brain, notably in the olfactory bulbs where osns project and in the piriform cortex where the olfactory signal is integrated. we also focused on the hypothalamus which contains ace expressing neurons (mukerjee et al., ) as well as on the respiratory centres of the brainstem as the latter are suspected to be infected in covid- patients suffering from heavy respiratory disorders (gandhi et al., ) . 
our lack of virus detection in the central nervous system may be due to the low number of animals examined but, nevertheless, we can rule out a systematic and important infection of the brain following sars-cov- infection in hamster. this is consistent with a recent review of the literature which failed to indicate any central nervous manifestation of sars-cov- presence in human central nervous system (romoli et al., ) ; however, the neurotropic ability of sars-cov- remains controversial (wang et al., ) . interestingly, sars-cov- has been shown to be neurotropic only using ace humanized mice. the osns of these mice must also express ace as it is under the control of the k ubiquitous promoter (netland et al., ) . sars-cov- may thus infect osns expressing ectopic ace and the observation of presence of the virus in the brain may not be relevant for a more physiological model. in our model, we observed a fast desquamation of the oe following sars-cov- infection of sustentacular cells. further studies are required to decipher whether it could be an anti-viral strategy to limit access of the virus to the brain through the olfactory pathway and to what extent the recruitment of immune cells contributes to this process through immunopathological mechanisms. globally, we present here the first data focusing on the impact of sars-cov- in the nasal cavity. our results could explain the high prevalence of anosmia observed in covid- patients and will need to be confirmed by analysis of human oe. cells of the olfactory epithelium: identification of cell types and trends with age neurologic alterations due to respiratory virus infections respiratory syncytial virus tropism for olfactory sensory neurons in mice il- c is involved in olfactory mucosa responses to poly(i:c) mimicking virus presence detection of novel coronavirus ( -ncov) by real-time rt-pcr. 
euro surveillance : bulletin europeen sur les maladies transmissibles = european communicable disease bulletin smell and taste disorders during covid- outbreak: a cross-sectional study on patients the long tale of the calcium activated cl(-) channels in olfactory transduction sars-cov- receptor and entry genes are expressed by sustentacular cells in the human olfactory neuroepithelium. biorxiv cns infection and immune privilege olfactory epithelium changes in germfree mice is the collapse of the respiratory center in the brain responsible for respiratory breakdown in covid- patients? g olf : an olfactory neuron specific-g protein involved in odorant signal transduction functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses spike protein recognition of mammalian ace predicts the host range and an optimized ace for sars-cov- infection olfactory and gustatory dysfunctions in patients hospitalized for covid- : sex differences and recovery time in real-life olfactory receptor neurons prevent dissemination of neurovirulent influenza a virus into the brain by undergoing virus-induced apoptosis severe acute respiratory syndrome coronavirus infection causes neuronal death in the absence of encephalitis in mice transgenic for human ace severe acute respiratory syndrome coronavirus infection of golden syrian hamsters explanation for covid- infection neurological damage and reactivations. transboundary and emerging diseases a systematic review of neurological manifestations of sars-cov- infection: the devil is hidden in the details dose-dependent response to infection with sars-cov- in the ferret model: evidence of protection to rechallenge. 
biorxiv pathogenesis and transmission of sars-cov- in golden hamsters alterations in smell or taste in mildly symptomatic outpatients with sars-cov proinflammatory cytokines in the olfactory mucosa result in covid- induced anosmia clinical manifestations and evidence of neurological involvement in novel coronavirus sars-cov- : a systematic review and meta-analysis we would like to thank pr. astrid vabret (head of the laboratoire de virology, chu de caen) for bsl facilities access and isolates production, estelle leperchois funding: this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. key: cord- -c o ukm authors: silvas, jesus a.; jureka, alexander s.; nicolini, anthony m.; chvatal, stacie a.; basler, christopher f. title: inhibitors of vps and lipid metabolism suppress sars-cov- replication date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: c o ukm therapeutics targeting replication of sars coronavirus (sars-cov- ) are urgently needed. coronaviruses rely on host membranes for entry, establishment of replication centers and egress. compounds targeting cellular membrane biology and lipid biosynthetic pathways have previously shown promise as antivirals and are actively being pursued as treatments for other conditions. here, we tested small molecule inhibitors that target membrane dynamics or lipid metabolism. included were inhibitors of the pi kinase vps , which functions in autophagy, endocytosis and other processes; orlistat, an inhibitor of lipases and fatty acid synthetase, is approved by the fda as a treatment for obesity; and triacsin c which inhibits long chain fatty acyl-coa synthetases. vps inhibitors, orlistat and triacsin c inhibited virus growth in vero e cells and in the human airway epithelial cell line calu- , acting at a post-entry step in the virus replication cycle. of these the vps inhibitors exhibit the most potent activity. 
sars-cov- , a member of the betacoronavirus genus, is an enveloped positive-sense, rna virus responsible for a current pandemic . because of its profound impact on society and human health there is an urgent need to understand sars-cov- replication requirements and to identify therapeutic strategies . repurposing drugs developed for other purposes may provide a shortcut to therapeutic development - . the use of compounds known to target specific host factors may also elucidate key pathways needed for virus replication. in murine embryonic stem cell lines, autophagy was found to be critical for dmv formation and replication of the beta-coronavirus mouse hepatitis virus . however, studies in bone marrow derived macrophages or primary mouse embryonic fibroblasts lacking atg indicated that autophagy is not essential for dmv formation or mhv replication . an alternate model indicates that beta coronaviruses usurp vesicles known as edemosomes, which associate with compound. infections with sars-cov- at an moi of . were carried out by directly adding development of -well format assay to measure sars-cov- cytopathic effects sars-cov- induces significant cytopathic effects in infected vero e cells. based on this property, we standardized a -well format assay that provides continuous real-time, label- free monitoring of the integrity of cell monolayers, thereby providing assessment of virus growth through decreased cell viability. this assay was standardized using the maestro z platform (axion biosystems, atlanta, ga), an instrument that uses -well plates containing electrodes in each well (cytoview-z plates). the electrodes measure electrical impedance across the cell monolayer every minute throughout the course of the experiment. as sars-cov- replication damages the cell monolayer, impedance measurements decrease over time, providing a detailed assessment of infection kinetics. 
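As an illustration of how a continuous impedance trace like the one described above can be reduced to a single kinetic value, the following is a minimal Python sketch. It is not the authors' analysis and not part of any instrument software: the function name, the choice of the first reading as baseline, and the 50% threshold are all assumptions made for this example.

```python
from typing import Optional, Sequence


def time_to_fraction(
    times_h: Sequence[float],
    impedance: Sequence[float],
    fraction: float = 0.5,
) -> Optional[float]:
    """Hypothetical helper: first time (hours) at which a trace falls to
    `fraction` of its baseline, using linear interpolation between readings.

    Assumptions (not from the source): the baseline is the first reading,
    and the threshold is a fixed fraction of that baseline.
    Returns None if the trace never reaches the threshold.
    """
    if len(times_h) != len(impedance) or len(times_h) == 0:
        raise ValueError("times_h and impedance must be non-empty and equal length")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")

    target = impedance[0] * fraction
    for i in range(1, len(times_h)):
        prev, cur = impedance[i - 1], impedance[i]
        if prev > target >= cur:
            # Interpolate linearly inside the interval where the crossing occurs.
            step = (prev - target) / (prev - cur)
            return times_h[i - 1] + step * (times_h[i] - times_h[i - 1])
    return None


# Made-up readings: the monolayer is stable, then degrades.
example_times = [0.0, 10.0, 20.0, 30.0]
example_trace = [100.0, 100.0, 60.0, 40.0]
crossing = time_to_fraction(example_times, example_trace)
```

Computing such a crossing time per well would give one number per condition that can be compared across conditions; the readings above are invented purely to exercise the function.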
the capacity of the system to differentiate different levels of virus replication was first assessed. confluent vero e monolayers in cytoview-z plates were infected with sars-cov- . cc-by-nc-nd . international license was not certified by peer review) is the author/funder. it is made available under a the copyright holder for this preprint (which this version posted july , . . https://doi.org/ . / . . . doi: biorxiv preprint at multiple mois ( to . ) and resistance measurements were acquired for hours post- infection. as shown in figure a , the progression of infection at each moi was clearly distinct. a decrease in resistance could be observed as early as - h.p.i. at an moi of and , and as late as h.p.i. at an moi of . . depending on moi, signals reached their nadirs between to h.p.i. to correlate with a decrease in resistance, the raw kinetic data was used to determine the median time to cell death for each moi ( figure b ). based on its desirable kinetics, the moi of . was chosen for the screening of compounds for antiviral activities. to establish the maestro z as a potential instrument for screening of anti-sars-cov- therapeutics, we first tested remdesivir, a well-described inhibitor of sars-cov- that has been granted emergency use authorization (eua) for the treatment of covid- , . vero e cells were seeded on a cytoview-z plate, incubated overnight to allow cells to stabilize, pretreated with -fold dilutions of remdesivir for hour and infected with sars-cov- . resistance measurements were recorded for h.p.i. ( figure c ). in agreement with previous studies, we determined an % inhibitory concentration (ic ) for remdesivir of . µm ( figure d ) . taken together, these data validate the impedance-based assay described as a tool for screening of potential sars-cov- therapeutics. inhibitors of vps activity impair sars-cov- growth vps is a multifunctional protein involved in autophagy and membrane trafficking. 
since coronaviruses induce formation of double membrane vesicles for replication, we wanted to determine if vps activity was essential for sars-cov- replication. therefore, we tested two coa synthetases. to test these against sars-cov- , veroe cells were pre-seeded onto a cytoview-z plate, allowed to stabilize and then pre-treated with triacsin c or orlistat for hour before infection with sars-cov- at an moi of . . based on the toxicity window of - h.p.t. determined with the vps inhibitors, neither triacsin c nor orlistat induced early cytotoxic effects, even at the highest concentrations of um and um, respectively ( figure a and c). both compounds exhibited inhibition at the higher concentrations tested, although complete inhibition was not achieved even with µm of orlistat. based on the data we extrapolated an ic of . um for orlistat and calculated an ic of . um for triacsin c ( figure b and d ). viruses such as hcv and rotavirus that are sensitive to inhibition by triacsin c are also impaired by inhibitors of dgats , . therefore, we tested the effects of dgat and dgat inhibitors t and pf , . neither compound displayed any inhibitory activity (supplemental figure ) . this data suggests that metabolism of fatty acids supernatants were collected at h.p.i. and titered on veroe cells by plaque assay. in parallel, to determine cytotoxicity of these compounds, calu- cells were seeded onto -well black walled -well plates, allowed to reach % confluency and treated with vps -in , pik-iii, triacsin c, orlistat, dmso, or mock treated with media alone. celltox green was added at the time of dosing and fluorescence measured at h.p.i. in order to assess cytotoxicity. 
each of the compounds inhibited production of infectious virus, as measured by plaque assay on vero e cells ( figure a , c, e, and g). in contrast to veroe cells, no cytotoxicity was observed even at the highest dose for each compound in calu- cells. we observed ic s of . µm (vps - in ), . µm (pik-iii), . µm (orlistat), and . µm (triacsin c), as shown in figure b , because each compound exhibited inhibitory effects when added after viral entry, we next asked whether the compounds altered the establishment of viral replication centers. calu- cells were seeded onto fibronectin coated glass cover slips and allowed to reach % confluency. cells were pre-treated with approximately the ic of vps -in ( µm), pik-iii ( µm), orlistat ( µm), or triacsin c ( µm) and infected with sars-cov- at a moi of . at h.p.i. cells were fixed, permeabilized, and indirect immunofluorescence performed using primary antibodies against sars-cov- nucleoprotein (n) and dsrna. we observed that when compared to the media only or dmso controls, n became completely cytoplasmic and did not form any large inclusion-like formations in the presence of the compounds (figure ). additionally, even though dsrna could be detected both distributed throughout the cytoplasm and associated with n in large inclusion-like formations in the media only and dmso controls, in the cells treated with inhibitors, dsrna was only found distributed throughout the cytoplasm. these data suggest that the compounds disrupt replication center formation. here, we demonstrate that two vps inhibitors, orlistat, and triacsin c each have clear effects on sars-cov- replication and the morphology of viral replication centers. generation of replication centers is a key feature of the replication of many viruses - . these can serve as sites where required components concentrate within a relatively closed environment and hide viral replication products from the host innate immune response . 
in order to generate these centers, many viruses usurp host cellular pathways that are used to generate membranes or organelles . betacoronaviruses have been shown to target the erad-edemosome-er pathways to generate formation. that antiviral activity against hcv and rotavirus is connected to lipid droplet formation is supported by the fact that these viruses are sensitive to inhibition by the dgat inhibitors, t and pf . in contrast, the compounds did not exhibit any activity against sars-cov- in vero e cells whereas triacsin c did. this suggests an alternate role for long chain fatty acyl coa or its downstream metabolites other than triacylglycerol and lipid droplets. it is notable that the ic for triacsin c was substantially lower in the calu- cell assay as compared to the vero cell assay. 
a lesser decrease in ic was also noted for orlistat in the coronavirus pandemic-therapy and vaccines review of the novel coronavirus cov- ) based on current evidence drug repurposing for new, efficient, broad spectrum antivirals drug repurposing approaches for the treatment of influenza viral infection: reviving old drugs to fight against a long-lived enemy repurposing anticancer drugs for covid- -induced inflammation, immune dysfunction, and coagulopathy coronavirus replication complex formation utilizes components of cellular autophagy coronaviruses hijack the lc -i-positive edemosomes, er-derived vesicles exporting short-lived erad regulators, for replication unconventional use of lc by coronaviruses through the alleged subversion of the erad tuning pathway a unifying structural and functional model of the coronavirus replication organelle: tracking down rna synthesis coronavirus replication does not require the autophagy gene atg rab and class iii phosphoinositide -kinase vps are involved in hepatitis c virus ns b-induced autophagy recruitment of vps pi k and enrichment of pi p phosphoinositide in the viral replication compartment is crucial for replication of a positive-strand rna virus modulation of triglyceride and cholesterol ester synthesis impairs assembly of infectious hepatitis c virus novel triacsin c analogs as potential antivirals against rotavirus infections rotaviruses associate with cellular lipid droplet components to replicate in viroplasms, and compounds disrupting or blocking lipid droplets inhibit viroplasm formation and viral replication evaluation of the antiviral activity of orlistat (tetrahydrolipstatin) against dengue virus, japanese encephalitis virus, zika virus and chikungunya virus involvement of fatty acid synthase in dengue virus infection the anti-obesity drug orlistat reveals anti-viral activity lipase inhibitor orlistat prevents hepatitis b virus infection by targeting an early step in the virus life cycle orlistat, a new lipase 
inhibitor for the management of obesity impedance-based cell monitoring: barrier properties and beyond identification and characterization of severe acute respiratory syndrome coronavirus replicase proteins remdesivir is a direct-acting antiviral that inhibits rna-dependent rna polymerase from severe acute respiratory syndrome coronavirus with high potency compassionate use of remdesivir in covid- characterization of vps -in , a selective inhibitor of vps , reveals that the phosphatidylinositol -phosphate-binding sgk protein kinase is a downstream target of class iii phosphoinositide -kinase fatty acid metabolism: target for metabolic syndrome fatty acid synthase and stearoyl-coa desaturase- are conserved druggable cofactors of old world alphavirus genome replication modulation of fatty acid synthase enzyme activity and expression during hepatitis c virus replication efficient hepatitis c virus particle formation requires diacylglycerol acyltransferase- targeting acyl-coa:diacylglycerol acyltransferase (dgat ) with small molecule inhibitors for the treatment of metabolic diseases discovery and optimization of imidazopyridine-based inhibitors of diacylglycerol acyltransferase (dgat ) severe acute respiratory syndrome coronavirus infection of human ciliated airway epithelia: role of ciliated cells in viral spread in the conducting airways of the lungs sars-cov replication and pathogenesis in an in vitro model of the human conducting airway epithelium building viral replication organelles: close encounters of the membrane types making of viral replication organelles by remodeling interior membranes acknowledgments. 
this work was supported by nih grants r ai and p ai key: cord- - aifz authors: laamarti, meriem; kartti, souad; alouane, tarek; laamarti, rokia; allam, loubna; ouadghiri, mouna; chemao-elfihri, m.w.; smyej, imane; rahoui, jalila; benrahma, houda; diawara, idrissa; essabbar, abdelomunim; boumajdi, nasma; bendani, houda; bouricha, el mehdi; aanniz, tarik; elattar, jalil; hafidi, naima el; jaoudi, rachid el; sbabou, laila; nejjari, chakib; amzazi, saaid; mentag, rachid; belyamani, lahcen; ibrahimi, azeddine title: genetic analysis of sars-cov- strains collected from north africa: viral origins and mutational spectrum date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: aifz in morocco two waves of sars-cov- infections have been recorded. the first one occurred from march , with infections mostly imported from europe and the second one dominated by local infections. at the time of writing, the genetic diversity of moroccan isolates of sars-cov- has not yet been reported. the present study aimed first to analyze the genomic variation of the twenty-eight moroccan strains of sars-cov- isolated from march , to may , , then to compare their distributions with twelve other viral genomes from north africa and to identify their possible sources. our findings revealed mutations in the moroccan genomes of sars-cov- compared to the reference sequence wuhan-hu- / , of them ( . %) were present in two or more genomes. focusing on non-synonymous mutations, ( . %) were distributed in five genes (orf ab, spike, membrane, nucleocapsid and orf a) with variable frequencies. the non-structural protein coding regions nsp -multi domain and nsp -rdrp of the orf ab gene harbored the most mutations, with six for each. the comparison of genetic variants of forty north african strains revealed that two non-synonymous mutations d g (in spike) and q h (in orf a) were common in four countries (morocco, tunisia, algeria and egypt), with a prevalence of . % (n = ) and . 
% (n = ), respectively, of the total genomes. phylogenetic analysis showed that the moroccan and tunisian sars-cov- strains were closely related to those from different origins (asia, europe, north and south america) and distributed in distinct subclades. this could indicate different sources of infection with no specific strain dominating yet in these countries. these results have the potential to lead to new comprehensive investigations combining genomic data, epidemiological information and the clinical characteristics of patients with sars-cov- . the new coronavirus , also known as severe acute respiratory syndrome coronavirus (sars-cov- ) ( ) is the causative agent of covid- , a new type of pneumonia that caused an epidemic in wuhan, china, in late december, , and then spread to countries around the world. in february, , covid- emerged in north african countries, notably in egypt, tunisia, algeria and morocco ( , ) . the first case was reported in egypt on february , followed by algeria on february , then morocco and tunisia on the same day, march , ( , ) . due to the rapid transmission of the virus across continents and the large number of confirmed cases, the world health organization (who) declared covid- a global pandemic (march , ) ( ) . as of june th , , and , ( . %) of confirmed and deceased cases, respectively, have been reported worldwide ( ) . it should be noted that mortality from sars-cov- differs considerably according to the geographic region. the usa has the largest number of confirmed cases ( , , ) and deaths ( , ) ( ) . meanwhile, south america and europe were also hit hard with , , and , confirmed cases in brazil and russia, on their respective continents, while the african region had the fewest cases, with , ( ) . 
sars-cov- is a single-stranded positive-sense rna virus, coding for four structural proteins (spike (s), envelope (e), membrane (m) and nucleocapsid (n)), nonstructural proteins (nsp to nsp ) and several accessory proteins (orf a, orf , orf a, orf b, and orf ) ( , ) . the s protein, which is responsible for binding to membrane receptors in host cells (ace ) via its receptor-binding domain (rbd), is therefore considered the most important target for candidate vaccines ( , , ) . it is known that the mutation rate of rna viruses contributes to viral adaptation, creating a balance between the integrity of genetic information and the variability of the genome, thus allowing viruses to escape host immunity and develop drug resistance ( , ) . our recent study ( ) based on the analysis of , genomes of sars-cov- variants belonging to countries, revealed . % of total mutations with a frequency greater than % of all the sequences analyzed, suggesting that this virus is not yet adapted to its host. from the viral rna extracted from six clinical samples, the cdna was synthesized using reverse transcriptase with random hexamers, then amplified for genome enrichment using q hot start high-fidelity dna polymerase (neb) using a set of primers targeting regions of the sars-cov- genome designed by artic network a set of sars-cov- genomes: from morocco, including six sequenced in the present study, from tunisia, from algeria, and from egypt, were downloaded from gisaid database (http://www.gisaid.org/) ( ) ( table ). the reads generated by minion nanopore-oxford of the six isolates were mapped to the reference sequence genome wuhan-hu- / using bwa-mem v . . -r ( ) with default parameters, while the data downloaded from gisaid database were mapped using minimap v . -r ( ) . the bam files were sorted using samtools ( ) and were subsequently used to call the genetic variants in variant call format (vcf) by bcftools ( ) . 
the final call set of the genomes, was annotated and their impact was predicted using snpeff v . t ( ) . we performed multiple sequence alignment using muscle v . ( ) for the moroccan strains with genomes of sars-cov- circulating in the world from different geographical areas (africa, asia, europe, north and south america and oceania) ( table s ) . maximum-likelihood trees were inferred with iq-tree v . . under the gtr model ( ) . generated trees were visualised using in order to identify the genetic variants of the sars-cov- moroccan genomes, genomes were studied, including six sequenced in the present study and twenty-two others available in gisaid database ( table ) . . % to . % of the reads produced for the six genomes were mapped on the reference sequence wuhan-hu- / (table s ). in all moroccan sars-cov- genomes, the analysis of genetic variants revealed mutations compared to the reference sequence (fig ) , including non-syn- (fig a) . these mutations have been distributed in seven genes, (orf ab, s, e, m, n, orf a and orf ) with variable frequencies. as regard to non-synonymous mutations (fig b) it is interesting to note that among the non-synonymous mutations, ( . %) were recurrent in two or more genomes (fig a) . the most frequent one was the d g mutation (in s protein) with a prevalence of . % (n = ) among the genomes included in this study, the second one was q h (in orf a) with a prevalence of . % (n = ). these two mutations have been observed within the four north african countries (fig b) . however, the eleven other mutations were variable between these four countries, for example, t i (in nsp ) was found in % of the genomes, including those of moroccan, algerian and tunisian origins. likewise, t i mutation (in nsp -rdrp) was found with a prevalence of . % within genomes belonging to morocco and tunisia. in addition, k r mutation (in nsp transmembrane domain- ) was present in % of the genomes from tunisia and egypt. 
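The prevalence figures reported above (the share of genomes carrying a given mutation, and the subset of mutations recurring in two or more genomes) come down to simple counting once each genome has a list of annotated variants. The sketch below illustrates that calculation under stated assumptions: the function names, the input layout (a mapping from genome identifier to a set of mutation labels), and the placeholder labels such as `mutA` are all hypothetical and do not reproduce the authors' scripts or data.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def mutation_prevalence(
    genome_mutations: Dict[str, Set[str]]
) -> Dict[str, Tuple[int, float]]:
    """Hypothetical helper: for each mutation, return (n genomes carrying it,
    percentage of all genomes in the input)."""
    total = len(genome_mutations)
    if total == 0:
        return {}
    counts = Counter(m for muts in genome_mutations.values() for m in muts)
    return {m: (n, 100.0 * n / total) for m, n in counts.items()}


def recurrent_mutations(
    prevalence: Dict[str, Tuple[int, float]], min_genomes: int = 2
) -> List[str]:
    """Mutations found in at least `min_genomes` genomes, sorted by label."""
    return sorted(m for m, (n, _) in prevalence.items() if n >= min_genomes)


# Placeholder data only; labels and genome ids are invented for illustration.
example = {
    "genome_1": {"mutA", "mutB"},
    "genome_2": {"mutA"},
    "genome_3": {"mutA", "mutC"},
    "genome_4": set(),
}
prev = mutation_prevalence(example)
shared = recurrent_mutations(prev)
```

Using sets per genome means a mutation is counted once per genome regardless of how many reads support it, which matches the "n genomes out of the total" style of reporting used in the text.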
the emergence and monitoring of genetic variants play a major role in guiding the therapeutic approach and the development of candidate vaccines in order to limit this sars-cov- pandemic ( ) . to date, the genetic diversity of sars-cov- strains from north africa is poorly documented. in this study, we performed a genetic analysis of forty sars-cov- genomes from north africa, including twenty-eight from morocco ( newly sequenced), seven from tunisia, three from algeria and two from egypt, to provide new information on genetic diversity and transmission of sars-cov- . genetic diversity could potentially increase the fitness of the viral population and make it difficult to fight or, conversely, make the virus weaker, which could be correlated with a loss of virulence and a decrease in the number of critical cases ( ) . compared to the reference sequence of wuhan-hu- / , strains from north africa harbored to genetic variants, of which to are involved in the change of amino acids. these results are consistent with the mutation rate previously reported in sars-cov- from different geographic areas ( , ( ) ( ) ( ) . in morocco, tunisia, algeria and egypt, five non-synonymous mutations were common to at least two countries. among them, d g (in s protein) and q h (in orf a) were observed in strains from the four countries. the d g mutation is proximal to the s cleavage domain of the spike glycoprotein ( ) and was of great interest due to its predominance in the six continents ( , ) . alouane et al. ( ) showed that this mutation appeared for the first time on january , in the asian region (china); a week later it was also observed in europe (germany). the q h mutation was first noted at the end of february in africa (senegal), europe (france and belgium) and north america (usa and canada). likewise, our previous study ( ) showed that d g had no impact on the two-dimensional or three-dimensional structure of the spike glycoprotein.
of the other three non-synonymous mutations that are variable between the strains from the four countries, t i mutation (in nsp ) was the orf ab polyprotein is known to be cleaved into non-structural proteins (nsp -nsp ) ( ) . we observed two domains rich in non-synonymous mutations, the first, nsp -multi domain due to its large size compared to other non-structural proteins and previously described as playing a different role in sars-cov- infection ( ) . likewise, nsp -rdrp displays the same number of non-synonymous mutations although it has a smaller size and considered as a key element of the replication/ transcription mechanism ( ) . a novel coronavirus from patients with pneumonia in china covid- : are africa's diagnostic challenges blunting response effectiveness preparedness and vulnerability of african countries against importations of covid- : a modelling study who declares covid- a pandemic coronavirus disease (covid- ) situtation report genome composition and divergence of the novel coronavirus ( -ncov) originating in china properties of coronavirus and sars-cov- subunit vaccines against emerging pathogenic human coronaviruses characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine the sars-cov- vaccine pipeline: an overview viruses at the edge of adaptation rna virus mutations and fitness for survival. 
annual review of microbiology genomic diversity and hotspot mutations in , sars-cov- genomes: moving toward a universal vaccine for the" confined virus gisaid: global initiative on sharing all influenza data-from vision to reality fast and accurate short read alignment with burrows-wheeler transform minimap : pairwise alignment for nucleotide sequences the sequence alignment/map format and samtools a program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain w muscle: multiple sequence alignment with high accuracy and high throughput iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies a review of sars-cov- and the ongoing clinical trials cross-species virus transmission and the emergence of new epidemic diseases large scale genomic analysis of sars-cov- genomes reveals a clonal geodistribution and a rich genetic variations of hotspots mutations sars-cov envelope protein: non-synonymous mutations and its consequences decoding the evolution and transmissions of the novel pneumonia coronavirus (sars-cov- /hcov- ) using whole genomic data prediction of the effectiveness of covid- vaccine candidates could the d g substitution in the sars-cov- spike (s) protein be associated with higher covid- mortality? bioinformatic prediction of potential t cell epitopes for sars-cov- covid- : the role of the nsp and nsp in its pathogenesis emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant pasteur institute (morocco) illumina-nextseq morocco/ n epi_isl_ - - anoual laboratory (morocco) appliedbiosystems pgm tunisia/cov tunisia/cov tunisia/cov tunisia/cov pasteur institute (algeria) illumina-nextseq we sincerely thank the authors and laboratories around the world who have sequenced and shared the full genome data for sars-cov- in the gisaid database. 
all data authors can be contacted directly via www.gisaid.org.this work was carried out under national funding from the moroccan ministry of higher education and scientific research (covid- program) to ai. this work was also supported by a grant to ai from institute of cancer research and the ppr- program to ai. the authors declare that they have no competing interests. key: cord- -iguhy z authors: calcagnile, matteo; forgez, patricia; iannelli, antonio; bucci, cecilia; alifano, marco; alifano, pietro title: ace polymorphisms and individual susceptibility to sars-cov- infection: insights from an in silico study date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: iguhy z the current sars covid- epidemic spread appears to be influenced by ethnical, geographical and sex-related factors that may involve genetic susceptibility to diseases. similar to sars-cov, sars-cov- exploits angiotensin-converting enzyme (ace ) as a receptor to invade cells, notably type ii alveolar epithelial cells. importantly, ace gene is highly polymorphic. here we have used in silico tools to analyze the possible impact of ace single-nucleotide polymorphisms (snps) on the interaction with sars-cov- spike glycoprotein. we found that s p (common in african people) and k r (common in european people) were, among the most diffused snps worldwide, the only two snps that were able to potentially affect the interaction of ace with sars-cov- spike. firedock simulations demonstrated that while s p may decrease, k r might increase the ace affinity for sars-cov- spike. this finding suggests that the s p may genetically protect, and k r may predispose to more severe sars-cov- disease. in principle, any new infectious agent that challenges a totally susceptible population with little or no immunity against it is able to totally infect the population causing pandemics. 
pandemics spread rapidly, affecting a large part of the population and causing many deaths with significant social disruption and economic loss. however, if we look at even the worst pandemics in human history, we realize that ethnic and geographical differences in the susceptibility to disease actually exist, even though the infectious sources and transmission routes are the same for all individuals. the current sars covid- (a shortened form of "coronavirus disease of ") epidemic spread appears to be similarly influenced by ethnic and geographical factors. after its initial spread in china, the pandemic is now progressing at an accelerating rate in western europe and the united states of america. in these regions, the causative agent, the severe acute respiratory syndrome coronavirus (sars-cov- ), is spreading extremely quickly between people, due to its newness (no one on earth has immunity to sars covid- ) and its transmission route. yet, in the other regions of the world, the kinetics of diffusion and mortality seem less impressive, although the world has become highly interconnected as a result of a huge growth in trade and travel. a multitude of factors may concur to explain the ethnic and geographical differences in pandemic progression and severity, including cultural, social and economic inequality, as well as health care system organization and climate. above all, considerable individual differences in genetic susceptibility to diseases may be involved. genomic predisposition is a major concept in modern medicine, and understanding the molecular bases of genetic predisposition can help to find prevention and treatment strategies for the corresponding diseases. in sars covid- , even subtle inter-individual genetic differences may affect both the sars-cov- viral life cycle and the host innate and acquired immune response.
sars-cov- is an enveloped positive-stranded rna virus that replicates in the cytoplasm, and uses envelope spike projections as a key to enter human airway cells. in coronaviruses, spike glycoproteins, which form homotrimers protruding from the viral surface, are a primary determinant of cell tropism, pathogenesis, and host interspecies transmission. spike glycoproteins comprise two major functional domains: an n-terminal domain (s ) for binding to the host cell receptor, and a c-terminal domain (s ) that is responsible for fusion of the viral and cellular membranes. following the interaction with the host receptor, internalization of viral particles into the host cells is accomplished by complex mechanisms that culminate in the activation of the fusogenic activity of spike, as a consequence of major conformational changes that, in general, may be triggered by receptor binding, low ph exposure and proteolytic activation. in some coronaviruses, spike glycoproteins are cleaved by furin, a golgi-resident protease, at the boundary between the s and s domains, and the resulting s and s subunits remain non-covalently bound in the prefusion conformation, with important consequences for fusogenicity. notably, at variance with sars-cov and other sars-like cov spike glycoproteins, the sars-cov- spike glycoprotein contains a furin cleavage site at the s /s boundary, which is cleaved during viral biogenesis and may affect the major entry route of the virus into the host cell. productive entry of coronaviruses that harbor non-cleaved spike glycoproteins (such as sars-cov) relies on endosomal proteases, suggesting that this entry is accomplished by hijacking the host endocytic machinery. indeed, it has been reported that sars-cov infection is inhibited by lysosomotropic agents because of the inhibition of the low-ph-activated protease cathepsin l.
however, sars-cov is also able to fuse directly to the cell membrane in the presence of relevant exogenous proteases, and this entry route is believed to be much more efficient than the endocytic route. in fact, proteases from the respiratory tract, such as those belonging to the transmembrane protease/serine subfamily (tmprss), tmprss or hat (tmprss d), are able to induce sars-cov spike glycoprotein fusogenic activity. the first cleavage at the s -s boundary (r ) facilitates the second cleavage at position r , releasing the fusogenic s ' subdomain. on the other hand, there is also evidence that cleavage of the ace c-terminal segment by tmprss can enhance spike glycoprotein-driven viral entry. notably, it has very recently been demonstrated that sars-cov- cell entry also depends on tmprss , and is blocked by protease inhibitors. sars-cov- and severe acute respiratory syndrome coronavirus (sars-cov) spike proteins share very high phylogenetic similarity ( %), and, indeed, both viruses exploit the same human cell receptor, namely angiotensin-converting enzyme (ace ), a transmembrane enzyme whose expression predominates in lung alveolar epithelial cells. this receptor is an -amino acid long captopril-insensitive carboxypeptidase with a -amino acid n-terminal signal peptide and a c-terminal membrane anchor. it catalyzes the cleavage of angiotensin i into angiotensin - , and of angiotensin ii into the vasodilator angiotensin - , thus playing a key role in systemic blood pressure homeostasis, counterbalancing the vasoconstrictive action of angiotensin ii, which is generated by cleavage of angiotensin i catalyzed by ace . although ace mrna is expressed ubiquitously, ace protein expression predominates in lung alveolar epithelial cells, enterocytes, arterial and venous endothelial cells, and arterial smooth muscle cells.
there is evidence that ace may serve as a chaperone for membrane trafficking of an amino acid transporter b at (also known as slc a ), which mediates the uptake of neutral amino acids into intestinal cells in a sodium dependent manner . recently, . Å resolution cryo-em structure of full-length human ace in complex with b at was presented, and structural modelling suggests that the ace -b at can bind two spike glycoproteins simultaneously , . it has been hypothesized that the presence of b at may block the access of tmprss to the cutting site on ace , . b at (also known as slc a ) is expressed with high variability in normal human lung tissues, as shown by analysis of data available in oncomine from the work by weiss et al . notably, a wide range of genetic polymorphic variation characterizes the ace gene, which maps on the x chromosome, and some polymorphisms have been significantly associated with the occurrence of arterial hypertension, diabetes mellitus, cerebral stroke, coronary artery disease, heart septal wall thickness and ventricular hypertrophy , , . the association between ace polymorphisms and blood pressure responses to the cold pressor test led to the hypothesis that the different polymorphism distribution worldwide may be the consequence of genetic adaptation to different climatic conditions , . in this study we have used a combination of in silico tools to analyze the possible impact of ace single-nucleotide polymorphisms (snps) on the interaction with sars-cov- spike glycoprotein. results seem to suggest that ace polymorphism can contribute to ethnic and geographical differences in sars covid- spreading across the world. , lzg (sars-cov- spike rbd /ace complex) , m j (sars-cov- spike rbd /ace complex) , vw (chimeric sars-cov/sars-cov- spike rbd /ace complex) , and m (sars-cov- spike rbd /ace /b at complex) , (fig. a) . 
clustalo alignments of human ace amino acid sequences. in all models, similar to the sars-cov rbm, the sars-cov- rbm forms a concave surface that houses a convexity formed by two helices on the exposed surface of ace . a strong network of hydrogen-bond and salt-bridge interactions mediates the receptor-ligand binding. global energy and several distinctive features of the d models with and without glycosylation are reported in supplementary table . contact residues are classified as: "permanent" (predicted as binding residues in all models), "stable" (predicted as binding residues in or out models), "unstable" (predicted as binding residues in or out models), "hyper-unstable" ( or models out of ). twenty-seven substitutions were predicted to influence the ace /spike interface in at least one of the different d pdb models. fifteen and seventeen were predicted to affect the ace /b at and ace /ace interfaces, respectively. some residues, which are described in the uniprot database (https://www.uniprot.org/uniprot/q byf ) as important for the interaction between spike and ace , are permanent contact residues (predicted as binding), but all of these are non-polymorphic ( fig. b) . in contrast, polymorphic residues are stable, unstable or hyper-unstable. a list of snps from dbsnp (s p , i t, i v, e k, a t, k e, k r, t a, e d, e k, e k, s r, e g, m i, g e, e g, g v; d n), which were predicted to affect the ace /spike interface, was used for further analysis. snps possibly affecting ace glycosylation. supplementary table illustrates amino acid glycosylation sites, and the structure of the glycosidic chains as inferred from different ace /spike complex pdb models. putative polymorphic sites (q r, n h, n d, n s) from the dbsnp database that may affect ace glycosylation are also reported. one of these amino acid variations, n d, is rather common in south asia (supplementary table ).
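the contact-residue classes described above ("permanent", "stable", "unstable", "hyper-unstable") amount to counting in how many structural models a residue is predicted as a binding residue. the sketch below illustrates that counting logic only. the exact model counts that separate the classes are not recoverable from the text, so the cutoffs are passed in as explicit parameters; the model names, residue numbers and cutoff values in the usage example are hypothetical.

```python
def classify_contact_residues(predictions, stable_min, unstable_min):
    """Classify residues by the number of models predicting them as binding.

    predictions  : dict mapping model name -> iterable of residue positions
    stable_min   : minimum number of models for the "stable" class
    unstable_min : minimum number of models for the "unstable" class
    Residues found in every model are "permanent"; residues below
    `unstable_min` are "hyper-unstable". Both cutoffs are caller-supplied
    assumptions, not values taken from the paper.
    """
    n_models = len(predictions)
    if not 1 <= unstable_min <= stable_min <= n_models:
        raise ValueError("cutoffs must satisfy 1 <= unstable_min <= stable_min <= number of models")
    counts = {}
    for residues in predictions.values():
        for residue in set(residues):
            counts[residue] = counts.get(residue, 0) + 1
    classes = {}
    for residue, count in counts.items():
        if count == n_models:
            classes[residue] = "permanent"
        elif count >= stable_min:
            classes[residue] = "stable"
        elif count >= unstable_min:
            classes[residue] = "unstable"
        else:
            classes[residue] = "hyper-unstable"
    return classes


if __name__ == "__main__":
    # hypothetical models and residue positions, for illustration only
    example = {
        "model_A": {10, 20, 30, 40},
        "model_B": {10, 20, 30},
        "model_C": {10, 20},
        "model_D": {10},
    }
    for residue, label in sorted(classify_contact_residues(example, 3, 2).items()):
        print(residue, label)
```

intersecting the resulting classes with the list of polymorphic positions would then show which class each polymorphic residue falls into.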
firedock was used to estimate the effects of removal of glycosidic residues or chains on ace interaction with sars-cov- spike rbd by calculating Δg values. the data indicated that removal of glycosidic chains results in either an increased or a decreased Δg value, depending on the pdb model ( fig. c ). in particular, removal of glycosidic moieties apparently strengthened the ace /spike interaction in sars-cov spike/ace in the ajf model, while it appeared to weaken the interaction between sars-cov- spike and ace in the vw model. in both cases, the effect was mostly due to removal of the terminal beta-mannose (bma) (fig. c) , which was predicted to decorate a glycosidic chain attached to aspartic amino acid residue at position that maps in a helix that is involved in the interaction with spike, as shown in fig. d . notably, in the vw model, bma is involved in two h-bonds and one pseudo-bond (fig. e ), and these bonds are lost in non-glycosylated models. in contrast, in the ajf model, the bma forms only one h-bond, and after removal of terminal bma, the thr- acquires more grads for binding, thereby strengthening the interaction with ace . these results seem to suggest that ace glycosylation may play a different role in modulating the interaction with sars-cov spike and sars-cov- spike. s p and k r were, among the most widespread snps worldwide, the only two snps that were able to potentially affect the interaction of ace with sars-cov spike and sars-cov- spike (supplementary table ). in particular, the s p snp is rather common in african people, with a frequency of about . %, while the k r snp is frequent in european people, with a frequency of about . % (supplementary table ). firedock results indicated that the s p substitution decreased the affinity of ace for spike in the ajf and vw models (fig. c ), and similar results were obtained with all other models. moreover, this amino acid substitution also seems to affect the ace n-terminal cleavage site (fig.
d ), and when firedock simulations were carried out on ace with the alternative cleavage site, the effects of the s p snp were much more pronounced (fig. c) . in contrast, the k r and the less common k e substitutions appeared to increase the affinity of ace for sars-cov- spike ( ajf model), and slightly decrease the affinity of ace for sars-cov spike ( vw and m models) (fig. a) . as vw was generated with a chimeric sars-cov/sars-cov- spike, to support our results we performed an additional simulation by challenging the ace structure from vw with the spike structures that were generated by the different models (fig. b) , and the results confirmed those shown in fig. a . notably, the receptor-ligand interactions were much weaker in m (sars-cov- spike rbd /ace /b at complex) than in the other models, confirming an inhibitory function of b at . however, in this model, at lower energy values, the effects of the k r/e substitutions were much more evident. shows that ace (computed by using d pdb model m as input file) is characterized by a high deformation tract that is located immediately upstream of the transmembrane domain (fig. b) , whereas the c-terminal tail is characterized by high fluctuation (fig. c ). suggest that the hydrophobic domain alone is highly unstable in the membrane, confirming that a chaperone is required for correct topology maintenance. this function was assigned to the moonlighting amino acid transporter b at . to investigate dynamic properties of the ace globular head, the trans-membrane helix and conserved domains were first mapped on a d structure. then, a dynamut simulation was carried out on ace by using the wv pdb model (without the transmembrane domain) (fig. ab) . results indicate that some residues of the ace interface, which are involved in the interaction with sars-cov- spike glycoprotein, can actually fluctuate (fig. cd ). dynamic properties of sars-cov- and sars-cov spike proteins were also investigated.
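the fluctuation of interface residues can be summarised, as in the oscillation and variance plots listed in the figure legends, by following the distance between two residues across an ensemble of conformations and taking its variance. the following is a minimal sketch of that calculation, assuming the ensemble has already been reduced to one representative coordinate per residue per frame; the input layout and residue labels are hypothetical and are not the output format of any specific tool named in the text.

```python
import math
from statistics import pvariance


def distance_series(frames, residue_a, residue_b):
    """Distance between two residues in every frame of an ensemble.

    frames : list of dicts mapping residue label -> (x, y, z) coordinates,
             e.g. one alpha-carbon position per residue (an assumption).
    """
    return [math.dist(frame[residue_a], frame[residue_b]) for frame in frames]


def distance_variance(frames, residue_a, residue_b):
    """Population variance of the inter-residue distance over the ensemble."""
    return pvariance(distance_series(frames, residue_a, residue_b))


if __name__ == "__main__":
    # hypothetical three-frame ensemble with two labelled residues
    frames = [
        {"helix1_res": (0.0, 0.0, 0.0), "helix2_res": (3.0, 0.0, 0.0)},
        {"helix1_res": (0.0, 0.0, 0.0), "helix2_res": (4.0, 0.0, 0.0)},
        {"helix1_res": (0.0, 0.0, 0.0), "helix2_res": (5.0, 0.0, 0.0)},
    ]
    print(distance_series(frames, "helix1_res", "helix2_res"))
    print(distance_variance(frames, "helix1_res", "helix2_res"))
```

comparing this variance between the native structure and a polymorphic variant is one simple way to flag substitutions that change the mobility of the binding helices.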
between the two helices and beta-sheet, and between the residues of the beta-sheet are illustrated in supplementary fig. d (fig. a, supplementary fig. and supplementary table ). all amino acid residues that were reported as polymorphic in dbsnp were analyzed. in that women are probably more prone to infection but often present a less severe disease. although a higher incidence of cardiac, respiratory and metabolic co-morbidities is probably responsible for the more severe form of infection in men, estrogen-induced upregulation of ace expression would explain the increased susceptibility of women to a less severe and often asymptomatic form of disease. furthermore, the ace gene is located on xp , in an area where genes are reported to escape from x-inactivation, further explaining higher expression in females. on the other hand, it has been hypothesized that, regardless of sex, pharmacological (antihypertensive drugs, such as ace inhibitors and sartans) or environmental factors (no pollution) capable of inducing an overexpression of ace could be responsible for increased susceptibility to infection and/or greater severity. ace plays an essential role in the renin-angiotensin-aldosterone system, and its loss of function due to the massive binding of viral particles and internalization could constitute an essential element of the pathophysiology of pulmonary and cardiac damage during covid- infection. in this context it should be underlined that ace probably plays a dual role in the dynamics of infection and disease course. while at the beginning ace overexpression may increase the entry of the virus into the cell and its replication, its consequent viral-induced loss of function results in an unopposed accumulation of angiotensin ii that further aggravates the acute lung injury response to viral infection.
indeed, in the rodent blockade of the renin-angiotensin-aldosterone system limits the acute lung injury induced by the sars-cov- spike protein , suggesting that if ace function is preserved (because of increased baseline expression, as especially seen in pre-menopausal women), clinical course of infection might be less severe. it has been suggested that polymorphisms in the ace gene could reduce the spike affinity, with subsequent lower susceptibility to infection: in this hypothesis, their geographical / ethnical distribution could explain the strong discrepancies in infection rate and/or lethality observed worldwide . effectively, we showed by network plot and non-metric multidimensional scaling that most of the snps diffused worldwide did not affect significantly the interaction of ace with sars-cov- spike. s p was one of the rare polymorphisms able to potentially affect this interaction, by lowering the affinity. this polymorphism is more frequent in african populations, but its diffusion ( . %) remains too low to explain, except in minimal part, the reduced death toll . this allowed the authors to predict that hamsters could be infected, which was experimentally confirmed -underlining the reliability of in silico modeling-and could be subsequently at the origin of inter-animal transmission. however, hamsters, although developing clinical signs of the infection and relative histopathological changes, did not die : we speculate that lethality may be related to spike/ace affinity. on the other hand, the lower affinity in bat could explain -besides a better immune control-why these animals are carriers without dying. 
in the same study, chan and colleagues so, if a modestly lower, snp-determined affinity between spike and ace does not seem to explain the differences in the distribution and lethality of the disease in humans, we hypothesize that the question can be addressed in the opposite way: perhaps polymorphisms responsible for higher affinity can be responsible for higher severity of disease, especially when very high affinity receptors are overexpressed because of the above mentioned environmental and pharmacological factors. obviously, underlying diseases would contribute to an even more severe course of the disease, with an intense viral replication capable of infecting in turn a large number of persons, including some individuals with similar ace polymorphisms, and so on. our in silico models allowed us to identify k r and k e as snps with a possible increase in spike/ace affinity. the k r snp is relatively frequent in european people, with a frequency of about . %, which would correspond to a potential target population of , , people at the european union level. in addition to the firedock simulations that led us to predict the possible effects of the s p, k r and k e ace snps, the dynamut and encom tools were used to compare dynamic features of ace and its polymorphic variants in order to analyze the possible indirect effects on binding interfaces of snps that are located outside these interfaces. snps i v, v a and a t were identified as the most common snps that may produce these indirect effects in dynamic models. although the precise effects of these snps on the interaction between ace and sars-cov- or sars-cov spike proteins have to be determined in more detail, it is desirable to use dynamic modeling to unmask indirect effects of snps.
it seems necessary to confirm in vivo that, among patients with serious disease and/or fatal outcome, polymorphisms responsible for a very high spike/ace affinity are more frequent than among patients with less severe/asymptomatic disease or even than in the general population. obviously, the impact of these polymorphisms on severity of outcome should be weighted by appropriate demographic and clinical factors. if these differences were confirmed, this would pave the way for the identification, on a population scale, of healthy individuals whose molecular phenotypes would be responsible for more serious disease. apart from the usual social distancing measures, which could be reinforced for these cases, targeted drug prevention strategies could be evaluated. it could be logical to assess pharmacological prophylactic interventions, as proposed for categories of healthy people at particular risk of exposure, such as care-givers. in particular, chloroquine, by interfering with n-terminal glycosylation of ace , could lower its affinity for spike, thus representing an interesting candidate. in our in silico model, we found that removal of glycosidic moieties weakened the interaction between sars-cov- spike and ace . the serine protease inhibitor camostat mesylate, approved in japan to treat unrelated diseases, has been shown to block tmprss activity, and is thus another interesting candidate. on the other hand, the identification of broader categories of people with a lower risk of developing severe disease could allow a safer exit from the lock-down phase, while facilitating the faster establishment of herd immunity, while awaiting reliable serological tests and, above all, effective vaccines. on the basis of our in silico study we speculate that infection and mortality are databases. d structures of proteins were downloaded from pdb (rcsb protein data bank ).
we focused our analysis on ajf for sars-cov , was used to identify the ace receptor snps, and to select the most widespread ones. functional information was acquired from the uniprot database. chimera was used as a tool for image generation, d mapping, pdb managing and to analyze the results. binding interface characterization. the selected pdb models were analyzed from a structural point of view using chimera software in order to identify the glycosylation sites and the secondary structures of proteins involved in the binding between ace and the spike protein receptor binding domain (rbd). to estimate the effect of glycosylation we implemented a static model. chimera was used to remove glycosidic residues, while firedock was used to compute the global energy scores of the native structures and the de-glycosylated models. on the other hand, starting from the entire list of snps, ssipe (evoef) was used to identify the residues involved in the binding interfaces. a second step, which was carried out with ssipe (ssipe), was aimed at estimating the effects of single snps, and at generating mutant models. different snp lists were obtained, which were compared, and used to identify the most stable binding amino acid residues. ssipe analysis performed with the pdb model m was used to map: the ace /spike protein interaction interface, the ace /ace dimerization interface, and the b at /ace interaction interface. the model contains ace in the dimeric form (with the hydrophobic domains) and b at , while in all the other models the transmembrane domains, the ace /ace interface and the b at /ace interface are absent. spike models reported in this study as ligands. in a similar manner, other static models from ssipe were used as models to estimate the variation in terms of free energy related to polymorphisms that map on the ace dimerization interface and the ace -b at binding interface. legends to figures fig. .
d oscillation plot of the distance between ace amino acid residues mapping in the two helices that are involved in binding with spike proteins. b d images of regions containing the amino acid residues shown in panel a. c variance of the distance between ace amino acid residues mapping in the two helices that are involved in binding with spike proteins. d-e oscillation plot of the distance between the two helices and beta-sheet (d), and between the residues of the beta-sheet (e). f-g variance of the distance of amino acid residues between the two helices and beta-sheet (f), and between the residues of the beta-sheet (g). showing the results of snps that mapping outside the ace interfaces.
we wish to thank prof. diane damotte (university of paris) for advice and critical reading of the manuscript.
m.a., p.a.: conception, coordination, designing, writing. m.c.: experimental set-up, pipeline development, in silico analysis; p.f., a.i.: designing, data providing. the authors declare no competing interests. key: cord- -pp vlaye authors: li, jingjing; quan, weipeng; yan, shuge; wu, shuangju; qin, jianhu; yang, tingting; liang, fan; wang, depeng; liang, yu title: rapid detection of sars-cov- and other respiratory viruses by using lamp method with nanopore flongle workflow date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pp vlaye the ongoing novel coronavirus disease (covid- ) outbreak, a global public health emergency caused by sars-cov- , has caused devastating loss around the world. currently, many diagnostic methods are used to detect the infection. nucleic acid (na) testing is reported to be the clinical standard for covid- infection. evidence shows that a faster and more convenient method for detection in the early phase will help control the spread of sars-cov- . here, we propose a method to detect sars-cov- infection within two hours by combining a loop-mediated isothermal amplification (lamp) reaction with the nanopore flongle workflow. in this approach, rna reverse transcription and nucleic acid amplification are performed in one step in minutes at a constant temperature of - °c, the nanopore flongle rapid adapter is ligated within minutes, and flongle flow cell sequencing and analysis proceed in real time. the method described here has the advantages of rapid amplification, convenient operation and real-time detection, which are most important for rapid and reliable clinical diagnosis of covid- . moreover, this approach can be used not only for sars-cov- detection but can also be extended to other respiratory viruses and pathogens. as of april , the covid- outbreak caused by sars-cov- has resulted in almost , , infection cases and more than , deaths around the world. 
early studies show that covid- has an incubation period, the time from when a person catches the virus until symptoms start. during the incubation period, human-to-human transmission of sars-cov- can also occur. at this moment, there is no effective vaccine or drug for covid- . therefore, rapid and convenient diagnosis of patients in the early phase is considered the most effective way to control the covid- epidemic. here, we combine the nanopore flongle workflow with the lamp reaction to propose a faster and more convenient method to detect sars-cov- and other respiratory viruses in two hours. lamp amplification is a rapid, convenient, highly sensitive and specific technique for clinical samples. the amplification reaction can be completed in less than minutes in a friendly environment. nanopore flongle is designed to be the quickest, most accessible and most cost-efficient option for real-time sequencing. we use the nanopore flongle workflow for genome sequencing, with analysis after minutes of sequencing, for sars-cov- identification. this study presents a lamp-based method combined with the nanopore flongle rapid real-time sequencing workflow to detect sars-cov- at concentrations as low as . × ^ copies/ml in both laboratory and wild-caught environments. it takes less than two hours to diagnose covid- from the rna isolate. in addition, it does not require highly trained people and does not involve expensive and sophisticated equipment for amplification. in summary, this approach can be used at the point of care by field and local personnel for the rapid diagnosis of sars-cov- as well as the investigation of outbreaks of other respiratory diseases in the future. we propose a fast and efficient method (figure ) for the detection of covid- infection and other respiratory viruses. the method described here can detect sars-cov- and influenza b virus within two hours at concentrations as low as . × ^ copies/ml in a friendly environment. the species-specific genes (suppl. 
table ) were chosen as target genes for sars-cov- and influenza b virus. all the target genes were synthesized and then cloned into a pgem-t easy vector by tsingke (wuhan, china). primers for the lamp reaction were synthesized by invitrogen biotech (shanghai, china). each primer (suppl. figure figure ) of amplification was a smear band, which looked like staircases, and the longest fragment was up to kb. the reaction solution was purified with μl ampure xp beads (beckman, a ), and quantified with the qubit® dsdna hs assay kit (invitrogen, q ). ng of input . × ^ copies/ml amplification product was used to construct an ont rapid sequencing library (oxford nanopore, sqk-rbk ) according to the manufacturer's instructions. a μl total reaction mixture, containing . μl of purified amplification dna and water and . μl of fragmentation mix (fra), was assembled in the reaction tube. the tube was mixed well and incubated at °c for minute and then at °c for minute, then briefly put on ice to cool down. the fra is a barcoded transposome complex, which randomly cleaves dna and adds barcoded transposase adapters to the cleavage sites. μl of rapid adapter (rap) was added to μl of the previous reaction to attach the sequencing adapter. the reaction was mixed well and incubated for minutes at room temperature. the nanopore library was then loaded into a flongle flowcell for real-time sequencing. the sequencing data were processed as described for pathogen identification after running for minutes. to test the limit of detection, the amplification products of the dilution gradient ( . × ^ , . × ^ , . × ^ , . × ^ , . × ^ copies/ml) and the negative control, samples in total, were used to construct another barcoded library (oxford nanopore, sqk-rbk ) as described above and sequenced on a promethion flowcell to obtain more data. during real-time sequencing, basecalling was performed on the nanopore platform by guppy (version . . ) in high accuracy mode. quality control was processed by minknow (version . . 
) software pipeline on the instrument with a read quality score cutoff of . reads that passed quality control (passed reads) generated by the nanopore platform were processed in real time. we use qcat (version . . ) to demultiplex the barcoded samples. statistical analysis of the nanopore long-read sequencing data (read length distribution, read count, read score) was performed with an in-house pipeline. we align nanopore long reads to a virus genome database with minimap (version . ) and use samtools (version . ) to report coverage and depth. we determine whether a sample is positive for the virus from the aligned read ratio, aligned read count and aligned read identity. the study design ( figure ) for sars-cov- detection is based on lamp rapid amplification of specific genes followed by sequencing with the nanopore flongle workflow. we use standard plasmids of the orf ab gene and n gene from sars-cov- and plasmids of the ha gene and m gene from influenza b virus as a mixed sample. furthermore, the homo sapiens gapdh gene was added to the mixed genes as one of the reaction controls. the amplified products of those target genes were ligated to the nanopore flongle rapid adapter, followed by sequencing with the nanopore flongle workflow. to assess the detection sensitivity of sars-cov- identification by lamp reaction at different reaction times, we diluted the mixed sample to . × ^ , . × ^ , . × ^ , . × ^ , . × ^ copies/ml and ran the lamp reaction for minutes and minutes. in total, dilution samples and a negative control (pure water) were prepared for lamp pcr amplification. in order to generate more data and improve detection sensitivity, we barcoded the samples and pooled them for the nanopore promethion platform. compared with qpcr testing, our method outputs the nucleic acid sequence of the target genes if the sample is positive. so, we can identify whether the sample is positive or negative by aligned read ratio, aligned read count, aligned coverage and identity. 
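the positive/negative decision described above (aligned read ratio, aligned read count and identity) can be sketched as follows. this is a minimal illustration and not the authors' in-house pipeline: the `AlignmentSummary` record, the `call_sample` function and all threshold values are hypothetical, since the source does not state its cutoffs.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSummary:
    """Per-sample alignment statistics (hypothetical record layout)."""
    total_reads: int      # passed reads for the sample
    aligned_reads: int    # reads aligned to the target reference
    mean_identity: float  # mean alignment identity, between 0 and 1


def call_sample(summary, min_aligned_reads, min_aligned_ratio, min_identity):
    """Return True when every criterion meets its threshold.

    The thresholds are assumptions supplied by the caller; the source
    names the criteria but not the cutoff values.
    """
    if summary.total_reads == 0:
        return False  # nothing sequenced, so nothing can be called positive
    ratio = summary.aligned_reads / summary.total_reads
    return (
        summary.aligned_reads >= min_aligned_reads
        and ratio >= min_aligned_ratio
        and summary.mean_identity >= min_identity
    )


# Illustrative numbers only, not data from the study.
positive = AlignmentSummary(total_reads=1000, aligned_reads=950, mean_identity=0.93)
negative = AlignmentSummary(total_reads=12, aligned_reads=2, mean_identity=0.90)
print(call_sample(positive, 50, 0.5, 0.85))
print(call_sample(negative, 50, 0.5, 0.85))
```

requiring all three criteria together means a handful of stray reads, for example from barcode misassignment, is not enough to call a sample positive.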
after amplification, we use the nanopore flongle workflow to generate sequence data in real time. this sequencing mode allows us to align the reads to the genome and process the data in real time, so we can identify covid- infection after a few minutes of sequencing. in order to detect sars-cov- , the species-specific orf ab and n genes were chosen as target genes after alignment to other hcovs (human coronaviruses) that infect humans, such as mers, sars, oc and hku . lamp primers were designed using ncbi primer-blast. the primers were aligned to other coronaviruses and published sars-cov- sequences in genbank by online blast alignment at ncbi. the alignment results showed that the primers we chose were species-specific. the influenza b virus primers were designed with the same approach; the ha gene and m gene were chosen as target genes. a nanopore flongle flowcell sequencing library was prepared using ng of dna amplification product (amplification minutes) as input to the sqk-rbk kit (oxford nanopore technology, uk). this library was sequenced on a nanopore gridion device with running times of minutes, hour, hours and hours for real-time analysis (suppl. figure a ). nanopore reads were base called using guppy (version . . ) in high accuracy mode. output fastq files were quality controlled by filtering on read quality value to give passed reads (suppl. figure b ). after running for minutes we obtained more than passed reads for covid- detection; the maximum read length was bp, the average read length was bp and the mean read quality score was . after hours of sequencing, a total of reads had been generated, and read length and quality were comparable to those obtained after minutes of sequencing (suppl. figure ) . because the nanopore promethion is a higher-throughput sequencer than the nanopore flongle, we raise the barcode score to when using qcat to improve demultiplexing accuracy. after demultiplexing, the negative control (amplification minutes) generated reads, probably caused by nanopore barcode demultiplexing errors. 
its read count is orders of magnitude smaller than those of the positive samples. in order to achieve rapid detection of covid- infection, the sequenced data were analysed in real time. we compare the results along the sequencing timeline at minutes, hour, hours and hours. minimap (version . ) was used to map the nanopore long reads against the reference genome and to count the detected reads at an alignment identity of %. for each running time, more than % ( figure a ) of the sequenced reads were aligned to the reference genome with high identity. the detected reads (table ) at each time point were then assigned to each target gene ( figure b ). aligned reads and assigned reads show uniform performance across target species and target genes (suppl. figure ). compared with sequencing for hours, sequencing for minutes yielded more than nanopore reads usable for sars-cov- detection and showed the same performance and results. so, we can identify sars-cov- infection with this method after a few minutes of sequencing. to confirm the detection sensitivity for sars-cov- on the nanopore platform, we use the mixed sample diluted to different gradients (from - . × ^ genome copies/ml) with reaction times from minutes to minutes, together with a negative control, samples in total. after amplification (table ) the lamp can complete target nucleic acid amplification in minutes at a constant temperature of - °c. furthermore, products of the lamp reaction were ligated to the nanopore flongle adapter for sequencing and analysis in real time. to confirm that lamp is highly sensitive, we diluted the samples to . × ^ , . × ^ , . × ^ , . × ^ , . × ^ copies/ml; the results show that the sars-cov- n gene can be detected after dilution to ^ copies/ml. within two hours, the method has a high sensitivity at . × ^ copies/ml. as is known, lamp and rt-lamp have been established as highly sensitive methods for pathogen detection, including for sars in . 
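one way to read a limit of detection off such a dilution series is sketched below. this is an assumption-laden illustration, not the authors' analysis: the `limit_of_detection` function, the rule that a dilution counts as detected when its aligned read count exceeds a multiple of the negative-control count, and every number in the example are hypothetical.

```python
def limit_of_detection(aligned_counts, negative_count, fold_over_negative=10):
    """Return the lowest concentration whose aligned read count clears the
    background threshold, or None if no dilution does.

    aligned_counts maps concentration (copies/ml) to aligned read count.
    The fold_over_negative rule is an assumption, not taken from the source.
    """
    # Use at least one read as background so a clean negative control
    # does not collapse the threshold to zero.
    threshold = max(1, negative_count) * fold_over_negative
    detected = [conc for conc, n in aligned_counts.items() if n >= threshold]
    return min(detected) if detected else None


# Illustrative dilution series; these are not values from the study.
series = {1e5: 5000, 1e4: 2000, 1e3: 300, 1e2: 8}
print(limit_of_detection(series, negative_count=3))
```

tying the threshold to the negative control reflects the observation above that a few reads can appear in the negative control through barcode demultiplexing errors.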
we combine lamp and the nanopore flongle workflow to detect sars-cov- in two hours. in conclusion, we report a rapid, accurate and convenient method for detecting covid- infection using lamp amplification with the nanopore flongle workflow. the lamp reaction has been demonstrated to be a simple, fast, and highly sensitive method for sequence-specific viral nucleic acid detection. the nanopore flongle workflow was designed to offer rapid, low-cost and on-demand sequencing in real time. with additional barcode design for lamp and individual samples , this method can be scaled to test millions of samples in one day using the nanopore gridion or promethion platform. moreover, this method for rapid covid- detection can also be extended to other pathogens by additional primer design. accelerated reaction by loop-mediated isothermal amplification using loop primers loop-mediated isothermal amplification of dna loop-mediated isothermal amplification (lamp): a rapid, accurate, and cost-effective diagnostic method for infectious diseases direct nucleic acid analysis of mosquitoes for high fidelity species identification and detection of wolbachia using a cellphone detection of viral pathogens with multiplex nanopore minion sequencing: be careful with cross multiplex logic processing isothermal diagnostic assays for an evolving virus infection by a novel dna amplification method , loop-mediated isothermal amplification simple differentiation method of mumps hoshino vaccine strain from wild strains by reverse transcription loop-mediated isothermal amplification (rt-lamp rapid and simple detection of ebola virus by reverse transcription -loop -mediated isothermal amplification minimap : pairwise alignment for nucleotide sequences multiple cross displacement amplification coupled with gold nanoparticles-based lateral flow biosensor for detection of the mobilized colistin resistance gene mcr- novel methodology for rapid detection of kras mutation using pna-lna mediated loopmediated isothermal 
amplification development of an allele-specific, loop-mediated, isothermal amplification method (as-lamp) to detect the l f kdr-w mutation in anopheles gambiae s. l streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time minion™ sequencing loopmediated isothermal amplification (lamp) for point-of-care detection of asymptomatic low-density malaria parasite carriers in zanzibar the development of loop-mediated isothermal amplification targeting alpha-tubulin dna for the rapid detection of plasmodium vivax multiplex pcr assay for identification of human diarrheagenic escherichia coli rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis next-generation sequencing confirmation of real-time rt-pcr false positive influenza-a virus detection in waterfowl and swine swab samples rapid detection of sars-cov- using reverse transcription rt-lamp method rapid molecular detection of sars-cov- (covid- ) virus rna using colorimetric lamp key: cord- -f q j iu authors: nick, benjamin c.; pandya, mansi c.; lu, xiaotao; franke, megan e.; callahan, sean m.; hasik, emily f.; berthrong, sean t.; denison, mark r.; stobart, christopher c. title: identification of a critical horseshoe-shaped region in the nsp (mpro, clpro) protease interdomain loop (idl) of coronavirus mouse hepatitis virus (mhv) date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: f q j iu human coronaviruses are enveloped, positive-strand rna viruses which cause respiratory diseases ranging in severity from the seasonal common cold to sars and covid- . of the human coronaviruses discovered to date, emergent and severe human coronavirus strains (sars-cov, mers-cov, and sars-cov- ) have recently jumped to humans in the last years. the covid- pandemic spawned by the emergence of sars-cov- in late has highlighted the importance for development of effective therapeutics to target emerging coronaviruses. 
upon entry, the replicase genes of coronaviruses are translated and subsequently proteolytically processed by virus-encoded proteases. of these proteases, nonstructural protein (nsp , mpro, or clpro) mediates the majority of these cleavages and remains a key drug target for therapeutic inhibitors. efforts to develop nsp active-site inhibitors for human coronaviruses have thus far been unsuccessful, establishing the need for identification of other critical and conserved non-active-site regions of the protease. in this study, we describe the identification of an essential, conserved horseshoe-shaped region in the nsp interdomain loop (idl) of mouse hepatitis virus (mhv), a common coronavirus replication model. using site-directed mutagenesis and replication studies, we show that several residues comprising this horseshoe-shaped region either fail to tolerate mutagenesis or were associated with viral temperature-sensitivity. structural modeling and sequence analysis of these sites in other coronaviruses, including all human coronaviruses, suggest that the identified structure and sequence of this horseshoe region are highly conserved and may represent a new, non-active-site regulatory region of the nsp ( clpro) protease to target with coronavirus inhibitors. importance in december , a novel coronavirus (sars-cov- ) emerged in humans and triggered a pandemic which has to date resulted in over million confirmed cases of covid- across more than countries and territories (june ). sars-cov- represents the third emergent coronavirus in the past years and the future emergence of new coronaviruses in humans remains certain. critically, there remains no vaccine nor established therapeutics to treat cases of covid- . the coronavirus nsp protease is a conserved and indispensable virus-encoded enzyme which remains a key target for therapeutic design. 
however, past attempts to target the active site of nsp with inhibitors have failed, stressing the need to identify new conserved non-active-site targets for therapeutic development. this study describes the discovery of a novel conserved structural region of the nsp protease of coronavirus mouse hepatitis virus (mhv) which may provide a new target for coronavirus drug development. we used a combination of alanine-scanning mutagenesis and c-terminal additions and deletions to initially mutate the mhv nsp idl ( table ) . of the amino acids comprising the loop, a total of virus mutants were successfully recovered (p a, r a, a i, v i, v i, p a, q a, and y a), amino acid residues failed to permit virus recovery despite multiple attempts at rescue (y a, d a, q a, q a, and t a), and amino acid residues were not evaluated (l , v , and d ). among the unrecovered mutants, additional attempts to rescue using more conservative amino acid substitutions at residues d (d e) and q (q n) were also unsuccessful. a total of four different c-terminal modifications were also attempted, which included different c-terminal additions (a duplication of residues - and a duplication of residue ) and different c-terminal deletions (a deletion of residues - and a deletion of residue ). all four of these c-terminal modifications to the nsp idl failed to permit virus recovery. analyses of plaque formation, replication, and protease activity reveal a novel temperature-sensitive mutant in the mhv nsp idl. to evaluate the replication kinetics of each of the recovered mhv nsp idl mutants, we infected confluent dbt- cells with an moi of . of each of the idl mutants and titered aliquots over a h period ( fig. a) . all recovered mhv idl mutants exhibited indistinguishable replication kinetics compared to wt mhv. 
previously, we described a total of separate temperature-sensitive mutations (tsv a, tss a, and tsf l) in the mhv nsp protease whose phenotypes could be suppressed through long-distance second-site suppressor mutations ( , , ). to evaluate whether any of the recovered mhv nsp idl mutants may exhibit a temperature-sensitive phenotype, we performed an efficiency of plating (eop) analysis by comparing the titers of each idl virus by plaque assay determined at a physiologic ( °c) and elevated temperature ( °c) (fig. a) . average eop values were determined by the average ratios of titers at °c compared to °c, with those eop values less than - indicating a greater than -fold reduction in titers at the elevated temperature as being temperature-sensitive (ts). wt mhv exhibited an average eop of . x - . in contrast, previously described ts nsp mutant virus s a, exhibited an average eop of . x - , consistent with the eop previously reported ( ). two separate mhv nsp idl mutants exhibited average eop values less than - and were significantly lower than wt mhv (p< . ): p a and r a. mutant p a exhibited an average eop of . x - . in contrast, idl mutant r a resulted in a much lower average eop of . x - , which was not significantly different from the known ts mutant s a. no other idl mutants exhibited average eops significantly different from wt mhv. these data suggested that mutagenesis of two separate idl residues (p a and r a) have resulted in novel temperature-sensitive phenotypes. to determine whether the observed differences in phenotype for idl mutants p a and r a are due specifically to defects in nsp protease activity or some other long- distance effect, we performed a western blot to evaluate the ability for the p a and r a nsp proteases to process the maturation cleavage of a downstream replicase (pp ab) protein, nsp , during virus replication (fig. b) . 
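the efficiency of plating calculation described above can be written out as a short sketch. it assumes that titers are paired per replicate and that the ts cutoff is supplied by the caller (the numeric cutoff is not legible in the source); the function names and the example titers are hypothetical.

```python
from statistics import mean


def efficiency_of_plating(titers_physiologic, titers_elevated):
    """Average the per-replicate ratios of the titer at the elevated
    temperature to the titer at the physiologic temperature.

    Pairing replicates by position is an assumption of this sketch.
    """
    ratios = [
        elevated / physiologic
        for physiologic, elevated in zip(titers_physiologic, titers_elevated)
    ]
    return mean(ratios)


def is_temperature_sensitive(eop, cutoff):
    """A virus is called ts when its average EOP falls below the cutoff."""
    return eop < cutoff


# Illustrative titers in pfu/ml; these are not measurements from the study.
eop = efficiency_of_plating([1e7, 2e7], [1e3, 2e3])
print(eop, is_temperature_sensitive(eop, cutoff=1e-1))
```

an eop near one means plaques form equally well at both temperatures, while a very small eop indicates a large drop in titer at the elevated temperature.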
lysates from wt-, p a-, and r a-infected dbt- cells were compared for nsp -mediated nsp processing at °c compared to °c. wt-mhv and p a exhibited approximately equivalent levels (ratios of . and . , respectively) of nsp protein detected at both temperatures. consistent with its temperature-sensitive eop, virus mutant r a exhibited reduced nsp protein detected at °c compared to °c (ratio of . ) and when normalized to wt, exhibited an approximate % reduction in mature nsp protein produced at the elevated temperature. these data demonstrate that mhv nsp idl mutation r a is associated with reduced nsp activity at °c, whereas no appreciable difference in processing at °c was detected for mutant p a. to assess the impact of elevated temperature on replication of the recovered mhv idl mutant viruses, we repeated the moi . replication assay in dbt- cells at °c (fig. b) . in contrast to replication at °c, the replication kinetics among the mhv idl strains were far more variable, with most strains exhibiting a delay in logarithmic growth compared to wt mhv. mutant p a, which had shown a temperature-sensitive eop of . x - , failed to exhibit replication kinetics that were significantly different from wild-type or the other mhv idl strains. in contrast, mutant strain r a showed significantly delayed replication kinetics to reach the maximal logarithmic growth rate (p< . ) compared to wt mhv, consistent with its temperature-sensitive eop of . x - . collectively, these data indicate that mutant r a exhibits both significantly reduced capacity to form plaques and delayed replication kinetics at the elevated temperature of °c compared to wt mhv. reversion analysis of ts mhv nsp idl mutant r a reveals three compensatory second-site suppressor mutations. 
to identify potential interacting residues and novel regulatory networks within the mhv nsp protease associated with residue r , we performed reversion analysis at °c by expanding and sequencing plaques formed at the inhibitory temperature ( fig. a). a total of plaques were selected and expanded in t flasks for virus collection and sequencing. of these, plaques resulted in the original r a mutant virus while plaques yielded r a in addition to one of each of three different second-site putative suppressor mutations in nsp : p s, l v, and l i (fig. b) . additional sequencing was performed on these recovered viruses throughout the orf ab coding region and no other mutations were identified. the p s mutation arose within the mhv nsp idl, while residue l is located on the same loop housing the c catalytic residue of the active site. to evaluate whether the emergence of these second-site suppressor mutations aids in viral growth at °c, an eop analysis was performed using these viruses at °c and °c (fig. c) . bottom part of the binding pocket for residues p -p of the substrate (fig. a and b) . in modeling using the crystal structure of sars-cov- , residues d and t formed a distinct pocket in and around the p residue of leu, residues t and q establish the back wall of the p binding pocket, and residues q , t , and q are responsible for forming the back (q and t ) and base (q ) of the p and p binding pockets. among the mhv idl mutants which failed to rescue were d a, q a, and q a. amino acid residues d and q are structurally conserved in all sequenced nsp proteases to date ( fig. b) . both d and q are located in a conserved horseshoe-shaped region in the n-terminus of the idl. the d side chain projects from the top of the horseshoe-shaped region towards domain and the protease active site and forms the inner wall pocket for the p binding site. 
in an alignment of the d residues of mhv, sars-cov, mers-cov, and sars-cov- , the positioning and orientation of the side chain are highly conserved with predicted polar contacts with two additional highly conserved residues r (which is immediately adjacent to the catalytic h ) and y (fig. c) . the q side chain is conserved in its positioning towards the center of the horseshoe-shaped region where it shares predicted polar contacts with several other idl residues including a and r (in mhv), r and r (in sars-cov- ), k (in mers-cov), and t (in sars-cov) (fig. d) (including the current sars-cov- pandemic) which collectively highlight both the importance for rapid development of effective therapeutics for the treatment of covid- , but also the need to be prepared for potential future coronavirus outbreaks. in the present study, we evaluated the structure and function of the nsp protease idl, a poorly studied and structurally conserved region of the protease. using site-directed mutagenesis, we demonstrated that some residues and regions of the protease were capable of accepting mutations without apparent defects in viral replication, however a number of residues mostly located within a horseshoe-shaped region in the n-terminus of the protease either failed to permit virus recovery or resulted in a viral temperature-sensitivity. of the amino acid residues comprising the loop, we were able to successfully recover viral mutants at different locations ( table ) . despite the overall structural conservation of the entirety of the loop, the majority of these mutations resulted in no apparent defects in viral replication compared to wt. a few of these residues (a , v , and v ) with no apparent viral defects are known to form the basis of part of the p -p substrate binding pockets of the protease ( , ). 
yet, compared to the rest of the idl, these residue positions showed among the least sequence conservation (figure b) , which may explain the plasticity with which these residues could tolerate mutagenesis as well as cleavage site variability among coronaviruses ( ). similarly, more c-terminal residues p , q , and y are also found in more variable sequence locations within the idl. collectively, these residue positions may simply represent flexible linker residues rather than serving additional structural supportive or enzymatic roles within the protease. residues p and r , while rescued when mutated to alanine, exhibited reduced capacity to form plaques at °c. p is found at a bend leading into the horseshoe-shaped region of the idl and may be responsible for helping stabilize the n-terminal anchor of the loop within domain . replication analysis and western blots of the p a mutant virus failed to show significant differences from wt mhv; however, the selection of a p s mutation in reversion analysis of r a may suggest that these two residues represent stabilizing and interacting nodes within the protease (figure b ). we previously described different temperature-sensitive mutations in mhv-a (s a, v a, and f l) which all shared overlapping compensatory second-site suppressor mutations ( , , ). all viruses selected for an h y mutation, while the temperature-sensitive v a mutation selected for an s n mutation. furthermore, second-site mutations were identified for f l which were located greater than Å away from the initial mutation. p a is located on an adjacent loop in domain to both s and h (less than Å in distance) (not shown). mhv viral mutant r a was found to exhibit delayed replication kinetics (figure ) , reduced capacity to form plaques (figure ) , and reduced nsp -mediated proteolytic processing at the elevated temperature of °c, consistent with a temperature-sensitive phenotype (figure ) . 
perhaps surprising, the r residue position was the most variable and least conserved structurally among all hcovs evaluated ( figure b) . structural analysis of the mhv, sars-cov, sars- cov- , and mers-cov revealed that the side chain of the % conserved q appears to form conserved polar interactions with the backbone amino and carboxyl termini of the residue position (figure d) . these data may suggest that q is stabilized within the horseshoe show a high level of amino acid conservation with two of these residues (d and q ) being % conserved across all known coronavirus nsp protease sequences to date (figure b) . all four of these residues are found within a conserved horseshoe-shaped region within the n- terminus of the nsp idl. we propose that this horseshoe-shaped region is a critical region of the protease for both structure and function based on the following observations: ( ) residues with the catalytic dyad h and c residues labeled. predicted polar contacts between q and other residues of the idl are shown. 
sars-cov has an additional and unique predicted polar interaction with t (shown in red coronaviruses: an overview of their replication and pathogenesis conservation of substrate specificities among coronavirus main inhibition of sars-cov cl protease by flavonoids prediction of novel inhibitors of the main protease (m-pro) of sars-cov- through consensus docking and drug reposition crystal structure of sars-cov- main protease provides a basis for design of improved α-ketoamide inhibitors evaluation of a non-prime site specificities of c and c-like proteases by zinc-coordinating and peptidomimetic insights for wide spectrum anti-coronavirus drug design structure of the main protease from a global infectious human coronavirus, hcov-hku human coronavirus oc cl protease and the potential of ml as a broad-spectrum lead compound: homology modelling and molecular dynamic studies modeling of the key: cord- -r d rx authors: grant, paul r; turner, melanie a; shin, gee yen; nastouli, eleni; levett, lisa j title: extraction-free covid- (sars-cov- ) diagnosis by rt-pcr to increase capacity for national testing programmes during a pandemic date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: r d rx severe acute respiratory syndrome coronavirus (sars-cov- ) causes coronavirus disease (covid- ), a respiratory tract infection. the standard molecular diagnostic test is a multistep process involving viral rna extraction and real-time quantitative reverse transcriptase pcr (qrt-pcr). laboratories across the globe face constraints on equipment and reagents during the covid- pandemic. we have developed a simplified qrt-pcr assay that removes the need for an rna extraction process and can be run on a real-time thermal cycler. the assay uses custom primers and probes, and maintains diagnostic sensitivity within . % compared to the assay run on a high-throughput, random-access automated platform, the panther fusion (hologic). 
this assay can be used to increase capacity for covid- testing for national programmes worldwide. coronavirus disease (covid- ) is a respiratory tract infection caused by a newly emergent coronavirus, severe acute respiratory syndrome coronavirus (sars-cov- ), which was first recognised in wuhan, hubei province, china, in december . genetic sequencing of the virus suggests that sars-cov- is a betacoronavirus closely linked to sars coronavirus (wu et al. ). the standard molecular diagnostic test for this virus is a multistep process involving viral rna extraction and real-time quantitative reverse transcriptase pcr (qrt-pcr). although many companies have produced pcr kits to amplify the viral rna, rna extraction at any scale in a diagnostic laboratory is performed on a limited number of automated platforms that require specific reagents and consumables. this has led to significant effort to build large laboratories with existing research equipment to increase testing capacity, and to extract rna on more open platforms that enable non-specific reagents and plastics to be used. the covid- pandemic placed severe constraints on the availability of laboratory equipment, reagents and consumables required for molecular diagnostics in the uk and europe. this delayed the ability to scale up the testing capacity required for healthcare and population screening. at health services laboratories (hsl), we developed a qrt-pcr that can be run on a high-throughput, random-access automated platform, the panther fusion (hologic). using the open channel facility on this platform, custom primers and probes designed in-house can be added to a dna/rna extraction cartridge. in london, this qrt-pcr was used for large-scale testing of patients hospitalized with suspected covid- . however, the covid- pandemic also led to these cartridges being in short supply.
using the same primers and probes, we have now developed a qrt-pcr that can be run on a real-time thermal cycler without the need for an rna extraction process. this qrt-pcr maintains sensitivity to within . % of the assay run on the panther fusion. a panel of sars-cov- positive and negative samples was used to compare the rna extraction and rna-extraction-free methods. μl viral transport medium (vtm) from a swab was added to μl qiagen lysis buffer containing guanidinium to inactivate the virus. this was then processed on a qiasymphony sp using the qiasymphony dsp virus/pathogen mini kit and complex protocol. the elution volume was set to μl, and μl of the purified rna was added to the pcr. μl, μl and μl of sample expressed in viral transport medium were added directly to the pcr without any heating step; the plate was sealed and thermal cycling begun. μl of sample expressed in viral transport medium was added to a pcr tube and heated to °c for min prior to loading into the pcr at μl, μl and μl. the pcr was set up as a μl reaction containing μl rna, μl x taqman fast virus -step master mix (applied biosystems) and μl primer and probe mix as shown in table . where vtm was added directly to the pcr at μl or μl, this was made up to μl with rnase-free water. cycling was performed at °c for min for reverse transcription, followed by °c for s and then cycles of °c for s and °c for s, using an applied biosystems quantstudio real-time pcr system (thermofisher scientific). the primer/probe mix was made up in bulk and contained pmol primer/probe per reaction. (for example, a mix for reactions would have μl of each n gene primer/probe and . μl of each rnasep primer/probe at μm stock concentration, made up to μl per reaction with water.) the rna extraction method was compared to the direct addition of samples to the rt-pcr with and without prior heating. when μl of the heated or unheated sample was added to the pcr, no amplification was observed.
both direct addition methods gave lower median ct values than samples added after heat treatment and were equivalent to the ez (qiagen) extraction (figure ). the lowest ct values were achieved by adding μl of the vtm directly to the pcr without any prior heating (median ct value . vs . using ez rna). this method was selected for further analysis. the direct addition of μl sample to the pcr was compared to the standard method in use within the clinical laboratory using the open access channel of the panther fusion. an overall accuracy of . % was achieved compared to the panther fusion assay (see table ). the analytical sensitivity was compared by diluting a positive clinical sample to end point, and testing using the extracted rna and adding vtm directly to the pcr. the results are shown in table . direct addition of samples to the qrt-pcr without extraction gave a diagnostic sensitivity of . %, specificity of % and accuracy of . % compared to the method on the panther fusion. this simplifies the process for covid- testing, and will enable increased capacity in diagnostic laboratories. implementation of this method will enable laboratories to provide a covid- testing service without the need for rna extraction equipment, reagents and consumables. turn-around times are similar to those of high-throughput rna-based assays, and faster than a two-step rna extraction and qrt-pcr. capacity can be significantly increased without the extraction step but is dependent on the number of safety cabinets for swab processing and the number of real-time pcr thermal cyclers. heating at °c for minutes causes sars cov (sars coronavirus) to lose infectivity (who, ). health and safety assessments have been completed and the process has been deemed safe to perform with relevant precautions and safety practices. samples can be processed in batches of ; each batch takes minutes to run the rt-pcr on the thermal cycler, the rate-limiting step being the swab processing.
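the diagnostic sensitivity, specificity and accuracy quoted above are the usual ratios from a two-by-two comparison against the reference method (here the panther fusion assay). a minimal python sketch of that arithmetic follows; the class name, function name and the counts used in the usage note are hypothetical and are not taken from the study's tables.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing a new assay against a reference assay.

    All field values used in examples are hypothetical, not study data.
    """
    tp: int  # positive by both assays
    fp: int  # positive by new assay, negative by reference
    fn: int  # negative by new assay, positive by reference
    tn: int  # negative by both assays


def diagnostic_metrics(c: Confusion) -> dict:
    """Return sensitivity, specificity and accuracy as fractions (0 to 1)."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # detected fraction of reference positives
        "specificity": c.tn / (c.tn + c.fp),  # cleared fraction of reference negatives
        "accuracy": (c.tp + c.tn) / total,    # overall agreement with the reference
    }
```

for instance, with illustrative counts of 45 concordant positives, 5 missed positives, 50 concordant negatives and no false positives, `diagnostic_metrics(Confusion(tp=45, fp=0, fn=5, tn=50))` gives a sensitivity of 0.90, a specificity of 1.00 and an accuracy of 0.95.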
lower numbers would be processed more rapidly, within an equivalent time to a point-of-care test. standard swab processing can be automated to speed up the initial process on a large scale. many laboratories use real-time thermal cyclers, so this method can be used to increase national screening capacity without the need for other specialized equipment or rna extraction reagents.
(table residue: column headings "rna extracted" and "μl added direct, no rna extract"; cell values not recoverable.)
applying an extraction-free pcr protocol as described here would avoid limitations on covid- screening capacity in the uk and elsewhere caused by global pcr reagent supply constraints. we recommend this method is explored further by other medical laboratories using alternative pcr reagents to improve the resilience and capacity of virology laboratories during the pandemic. the sensitivity of the assay will be dependent upon the pcr efficiency, and so other pcr protocols will need to be carefully evaluated with this new approach. first data on stability and resistance of sars coronavirus compiled by members of who laboratory network real-time reverse transcription-polymerase chain reaction assay for sars-associated coronavirus key: cord- -h oyjz authors: scherf-clavel, oliver; kaczmarek, edith; kinzig, martina; friedl, bettina; feja, malte; höhl, rainer; nau, roland; holzgrabe, ulrike; gernert, manuela; richter, franziska; sörgel, fritz title: tissue level profiling of sars-cov- antivirals in mice to predict their effects: comparing remdesivir’s active metabolite gs- vs. the clinically failed hydroxychloroquine date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: h oyjz background and objectives remdesivir and hydroxychloroquine are or were among the most promising therapeutic options to tackle the current sars-cov- pandemic.
besides the use of the prodrug remdesivir itself, the direct administration of gs- , the resulting main metabolite of remdesivir, could be advantageous and even more effective. none of these substances was originally developed for the treatment of covid- , and especially for gs- little is known about its pharmacokinetic and physical-chemical properties. to justify the application of new or repurposed drugs in humans, pre-clinical in vivo animal models are mandatory to investigate relevant pk and pd properties and their relationship to each other. in this study, an adapted mouse model was chosen to demonstrate its suitability to provide sufficient information on the model substances gs- and hcq regarding plasma concentration and distribution into relevant tissues, a prerequisite for treatment effectiveness. methods gs- and hcq were administered intravenously as a single injection to male mice. blood and organ samples were taken at several time points and drug concentrations were quantified in plasma and tissue homogenates by two liquid chromatography/tandem mass spectrometry methods. in vitro experiments were conducted to investigate the degradation of remdesivir in human plasma and blood. all pharmacokinetic analyses were performed with rstudio using non-compartmental analysis. results high tissue-to-plasma ratios for gs- and hcq were found, indicating a significant distribution into the examined tissues, except for the central nervous system and fat. for gs- , measured tissue concentrations exceeded the reported in vitro ec values by more than -fold and, in consideration of its high efficacy against feline infectious peritonitis, gs- could indeed be effective against sars-cov- in vivo. for hcq, relatively high in vitro ec values are reported, which were not reached in all tissues. given its slow tissue distribution, hcq might not lead to sufficient tissue saturation for a reliable antiviral effect.
conclusion the mouse model was able to characterise the pk and tissue distribution of both model substances and is a suitable tool to investigate early drug candidates against sars-cov- . furthermore, we could demonstrate a high tissue distribution of gs- even if not administered as the prodrug remdesivir. due to the threat of the current sars-cov- pandemic, scientists around the world are working vigorously on the development of therapeutic options, be it new chemical entities, antibodies, or repurposed known drug molecules. in order to advance quickly from pre-clinical to clinical studies, successful in vivo experiments in animal models are mandatory, a fact recently acknowledged by dinnon and colleagues [ ]. an adapted mouse model could be a suitable proof-of-concept for new compounds as demonstrated for remdesivir (rem) [ ]. however, to step up to humans, the relationship between pharmacodynamics (pd) and pharmacokinetics (pk) needs to be assessed in such a model to calculate an appropriate dose. two prominent examples of drug candidates against covid- are rem and hydroxychloroquine (hcq). rem is a prodrug developed to convert quickly to the corresponding nucleoside (gs- ) intracellularly or even in plasma (figure ). however, using gs- would be a more direct treatment strategy which does not rely on enzymatic conversion, and would thus be more suitable to demonstrate the applicability of a model. gs- is the major circulating plasma metabolite of rem but only very limited data on the pk of this compound are available. furthermore, the efficacy of rem for covid- should be reevaluated in comparison to its main metabolite (gs- ), which was highly effective against feline coronavirus (fcov) causing feline infectious peritonitis (fip) [ ] [ ] [ ]. in fact, theoretical considerations lead to the conclusion that the prodrug rem might not be the most appropriate nucleoside strategy for the treatment of covid- .
after all, it was specifically designed as prodrug to target the ebola virus [ ] [ ] [ ] , while the broad and extensive multi-organ pathology of covid- could actually benefit from direct application of gs- . the aim of this study was to demonstrate the applicability of mice to assess in vivo pharmacokinetics of two pharmacologically extremely different compounds with regards to their pharmacokinetic and physical-chemical properties. moreover, not only plasma concentrations, but also the amount and levels in relevant tissue should be assessable to obtain information on the distribution, an important aspect regarding the question whether or not the foci of infection could even be reached, in vivo. in the present work, we chose a mouse model using gs- and hcq as model compounds. we further investigated the degradation of remdesivir in human plasma and blood in vitro in order to add data on the aspect of its short plasma half-life and origin of gs- after administration as the phosphoramidate prodrug. animal care was provided in accordance with the guidelines of the eu directive / /eu and the german animal welfare agency. all experiments were approved by an ethics committee and the governmental agency (lower saxony state office for consumer protection and food safety; laves; protocol number: az a ). all efforts were made to minimize both the suffering and the number of animals. a total of naive male mice (n= for hcq, n= for gs- ) at - weeks of age and . - . g body weight on a bdf (c bl /dba hybrid) background bred and housed in the institute's facility were used. animals were group housed ( - animals per cage) on standard bedding (shredded wood) and maintained on a reversed -h light/ -h dark cycle (lights off at a.m.). room temperature in the mouse holding room was °c ± °c and relative humidity was about %; values were recorded during the daily animal check. 
food (altromin standard diet) and water were available ad libitum and material for nest-building (paper rolls) was provided in addition to red plastic houses as enrichment. all drugs or vehicle were administered intravenously as a single injection at a volume of ml/kg via the tail vein. for pharmacokinetic studies in mice, gs- was dissolved in sterile aqua ad injectabilia with . % ethanol, . % propylene glycol, and . % peg . the ph of the drug solution and the vehicle was adjusted to . with sodium hydroxide (naoh) and hydrochloric acid (hcl), respectively. mice were randomly assigned to gs- at mg/kg (n= ) or the respective vehicle (n= ). hcq was dissolved in sterile aqua ad injectabilia. isotonic saline solution served as vehicle control. solutions were prepared just prior to injection, and the final concentrations of compounds were verified from aliquots using lc-ms/ms. a cohort of mice was randomly assigned to hcq at mg/kg (free base, n= ) or the respective vehicle (n= ). the target and actual time of blood sampling were recorded relative to the time of injection. blood samples were collected into microvette cb potassium-edta tubes, briefly mixed by slow inversion and placed on ice immediately. within hour after collection, plasma was separated by centrifugation at , g for minutes at °c. blood samples were collected from all mice immediately prior to injection of drugs or vehicle (baseline control) from the tail vein ( - µl) , and immediately after injection from the facial, sub-mandibular, vein ( - µl) . the plasma or whole blood was protected from light and stored at - °c until analyzed by lc-ms/ms for quantification. 
the animal group for testing gs- was split in half to allow further repeated blood sampling at , , , and min as follows: n= (and n= vehicle) were bled at and min post injection from the facial vein, and at min after sacrifice by decapitation from the vena cava; the other n= (and n= vehicle) were bled at and min post injection from the facial vein, and at min after sacrifice from the vena cava. from gs- or respective vehicle injected mice, the following organs were taken at sacrifice min post injection: cerebrum, cerebellum, lung, liver, kidneys (partly separated in cortex and medulla), spleen, heart, muscle (quadriceps). for animals sacrificed min post injection, additional samples were taken as follows: stomach, intestine (separated in small intestine and colon), nasal mucosa, and a sample of the cortex separated from cerebrum. the animal group for testing hcq was also split in half to allow further blood sampling. in addition to the immediate sample, blood sampling was repeated at and min from the facial vein, and at min after sacrifice from the vena cava in one group (n= plus n= vehicle) and at , min and hrs from the facial vein, and at hrs after sacrifice from the vena cava in the other group (n= plus n= vehicle). an additional µl of whole blood was taken at sacrifice from the vena cava and transferred into plastic tubes without additives. from hcq and respective vehicle injected mice, all the above mentioned organ samples (gs- ) were taken, with the addition of pancreas. surgical equipment was carefully cleaned with ddh o and dried between samples. in both cases, organs were quickly dried with moistened gauze to remove excess water, blood or content in case of intestines, placed in cryotubes and snap frozen in liquid nitrogen and kept at - °c until analyzed by lc-ms/ms for quantification. 
gs- and hcq concentrations were quantified in mouse plasma and several mouse tissue homogenates by two liquid chromatography/tandem mass spectrometry (lc-ms/ms) methods using a sciex api tm triple quadrupole (gs- ) and an api triple quadrupole (hcq) mass spectrometer (sciex, concord, ontario, canada). both instruments were equipped with a turbo ion spray interface (sciex, concord, ontario, canada). details about the methods including quality data are described in two separate publications currently in preparation. data acquisition and processing of raw data was performed using analyst software version austria, ) using the packages 'ggplot ', 'noncompart', and 'tidyverse' [ ] [ ] [ ] [ ]. auc was calculated using the linear up, logarithmic down method. the terminal slope (λz) was calculated by log-linear regression of the last three data points. published pk data were imported using the webplotdigitizer version . [ ]. commercially available edta-plasma was spiked with remdesivir at a concentration of μg/ml. a sciex x qtof mass spectrometer (sciex, darmstadt, germany) was coupled to an agilent infinity uhplc (agilent, waldbronn, germany) and operated in esi negative mode. chromatographic separation was achieved on a hypercarb µm x . mm column (thermofisher scientific, germany) as stationary phase using mm ammonium acetate at ph . (nh oh %) and acetonitrile containing . % (v/v) nh oh % as mobile phase a and b, respectively. the gradient was programmed as follows: - . min: % a, . - min: % a, - . min: % a, . - . min: % a, . - min: % a. the flow rate was set to µl/min. a sciex x qtof mass spectrometer was coupled to an agilent infinity uhplc and operated in esi positive mode. chromatographic separation was achieved on a kinetex f . µm x . mm column (thermofisher, germany) as stationary phase using water containing . % (v/v) formic acid and acetonitrile containing . % (v/v) formic acid as mobile phase a and b, respectively.
the gradient was programmed as follows: - . min: % a, . - min: % a, - . min: % a, . - . min: % a, . - min: % a. the flow rate was set to µl/min. a sciex triple quadrupole mass spectrometer was coupled to an agilent hplc. no obvious adverse effects were observed after injection of gs- or hcq. the pooled data of mice were used to estimate the pk properties of gs- in wildtype mice (see figure a). based on the estimated λz and auc -∞ , the volume of distribution (v z ) was estimated at . l/kg, whereas clearance was estimated at . l/h/kg. the pooled data of mice were used to estimate pk parameters of hcq in wildtype mice. v z and clearance were estimated at . l/kg and . l/h/kg, respectively. the mean blood-to-plasma ratio (b/p) and hours after infusion of mg/kg hcq was . ± . (n = ) and . ± . (n = ), respectively. the obtained plasma concentration-time profile is presented in figure b. one hour after administration of mg/kg, mean tissue-to-plasma ratio (t/p) ranged from . (brain) to . (adrenal cortex). four hours after i.v. administration, mean t/p was as high as . and . in brain and adrenal cortex, respectively (see figure a). highest mean t/p were observed for liver ( . ), kidneys ( . - . ) and intestines (> . ). obtained absolute tissue concentrations are presented in table . the full summary is presented in table s . six hours after administration of mg/kg, mean t/p ranged from . (cerebellum) to . (lungs). with the exception of tissue found in the cns, fat, muscle, colon and stomach wall, mean t/p was at least for all tissues (see also table ). the full summary is presented in table s . tof-ms and tof-ms/ms experiments revealed a rapid decrease of rem at elevated temperature in plasma in vitro. at the same time, the peak due to the alanine metabolite appeared and increased accordingly. gs- was not detected in those experiments (fig. s -s ).
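the pharmacokinetic estimates reported above (λz, auc extrapolated to infinity, v z and clearance) follow from the non-compartmental steps named in the methods: linear-up, logarithmic-down auc and a log-linear regression over the last three data points. the authors state that they used r packages for this; the python sketch below is only an illustrative re-expression of the same arithmetic for an intravenous bolus, and all function names and the example profile in the usage note are hypothetical.

```python
import math


def auc_linear_up_log_down(times, concs):
    """AUC from first to last sample: linear trapezoid while concentrations
    rise or stay flat, logarithmic trapezoid while they fall."""
    auc = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        c1, c2 = concs[i], concs[i + 1]
        if 0 < c2 < c1:
            # logarithmic trapezoid for a declining segment
            auc += (c1 - c2) * dt / math.log(c1 / c2)
        else:
            # linear trapezoid (rising, flat, or a zero concentration)
            auc += 0.5 * (c1 + c2) * dt
    return auc


def terminal_slope(times, concs, n_points=3):
    """lambda_z: negative slope of ln(conc) versus time over the last
    n_points samples (ordinary least squares)."""
    t = times[-n_points:]
    y = [math.log(c) for c in concs[-n_points:]]
    t_mean = sum(t) / n_points
    y_mean = sum(y) / n_points
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return -sxy / sxx


def nca_iv_bolus(dose, times, concs):
    """Basic non-compartmental parameters after a single i.v. bolus."""
    lambda_z = terminal_slope(times, concs)
    auc_last = auc_linear_up_log_down(times, concs)
    auc_inf = auc_last + concs[-1] / lambda_z  # extrapolation to infinity
    clearance = dose / auc_inf
    return {
        "lambda_z": lambda_z,
        "auc_inf": auc_inf,
        "clearance": clearance,
        "vz": clearance / lambda_z,
        "half_life": math.log(2) / lambda_z,
    }
```

as a consistency check on made-up data: a mono-exponential profile generated with a dose of 10 mg/kg, a volume of 2 l/kg and an elimination rate constant of 0.5 per hour, sampled at 0, 0.5, 1, 2 and 4 h, is recovered with a clearance of 1 l/h/kg and a v z of 2 l/kg, because the logarithmic trapezoid is exact for exponential decline.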
in the low-resolution lc-ms/ms experiments, gs- and the alanine metabolite were present at about the same level, expressed as peak area ratio (analyte/internal standard), prior to incubation. after incubation at °c the peak ratios increased by a factor of . and . for the alanine metabolite and gs- , respectively. more than % of rem was degraded within hours at °c. gs- is the major circulating plasma metabolite of rem but only very limited data are available on the pk of this compound. most of the information obtainable from published sources refers to the pk after administration as the prodrug rem (gs- ) either in animal models [ , ] or healthy human subjects [ ]. there is also a case report of two covid- patients receiving rem [ ]. further hints on the pk of gs- can be found in the assessment report for gilead sciences' veklury (rem) [ ]. to our knowledge, the only other publications investigating the pk after administration in the form of the nucleoside are investigations by murphy and colleagues in cats [ ]. in comparison to the estimated pharmacokinetics in cats, our results are in the same order of magnitude but suggest increased clearance and reduced volume of distribution (see table ). these discrepancies could be due to interspecies differences, further highlighting the importance of thorough pharmacokinetic studies in species used to model covid- . it was suggested by yan and muller that rem in vivo is predominantly hydrolysed in plasma to yield gs- rather than being activated intracellularly [ ]. however, this theory seems to contradict the increasing, dose-dependent half-life of gs- after administration as rem (see table ). the dose dependency furthermore suggests that in this case half-life is not defined by the elimination rate constant. instead, the liberation of gs- from tissue/organs saturated with the prodrug is suggested to be the rate-limiting process at higher doses.
this is underlined by the fact that renal clearance increases with increasing half-life (table ). the mass-balance study for rem (gs-us- - ) disclosed that renal clearance is, in addition to hepatic extraction (via the alanine metabolite and the monophosphate), the most relevant elimination pathway, and that gs- is the predominant metabolite detected in urine ( %), followed by rem ( %). it is therefore not surprising that in one human patient with renal impairment receiving rem for the treatment of covid- , all gs- plasma concentrations were significantly elevated compared to another critically ill patient without renal impairment [ ]. gs- exposure in a patient suffering from end-stage renal disease receiving rem was dramatically increased compared to healthy volunteers or non-impaired patients [ ]. being a comparatively old drug, the pk of hcq has been studied extensively. however, there is still limited information on its pharmacokinetics, especially with regard to distribution. reports considering the terminal half-life in plasma or blood are inconclusive and range from approx. to days [ , ]. these values refer to an elimination half-life calculated after discontinuation from steady state. due to the high degree of tissue distribution, these long half-lives include redistribution from tissue and thus are considerably longer than a half-life calculated from a single dose or within steady state. the plasma half-life calculated from our data is comparable to the half-lives obtained in mice and humans (table ). hcq exhibits complex pharmacokinetics with an extremely high volume of distribution ranging from approx. to l for a kg adult. the value obtained from our experiments falls in that range and is comparable to the v z obtained in mice from blood concentrations and to plasma v z measured in humans (see table ). the substance is subject to renal and hepatic elimination.
three major metabolites are known: bis-desethyl chloroquine, desethyl chloroquine, and monodesethyl hydroxychloroquine. the latter two are considered active metabolites [ ]. hcq is predominantly metabolized via cyp c , cyp d , and cyp a [ ] and approximately % of hcq is eliminated renally as parent drug by filtration and most likely also tubular secretion [ , ]. due to the structural similarity to chloroquine, which is a mate- substrate, secretion via this transporter seems probable [ ]. tett et al. were the first to report bioavailability and terminal half-life of hcq. they described the pharmacokinetics of a single i.v. and oral dose using a four- and tri-exponential equation, respectively [ ]. the estimated oral systemically available fraction ranged from . to . . later investigations by carmichael and colleagues confirmed this finding with their estimate of . [ ]. in plasma, the compound is bound to serum albumin and α -acid glycoprotein (aag), which is not unusual for alkaloids [ ]. the plasma protein binding ranged from . to . %, was dependent on the concentration of aag, and was higher (approx. . -fold) for the s-enantiomer [ ]. thrombocytes and especially leukocytes have been identified as a deep compartment for hcq, leading to a blood-to-plasma ratio (b/p) ranging from . to [ , ]. due to the acidic environment and organelles in those cells, hcq is trapped in the protonated form within those kinds of cells, but not in erythrocytes. to our knowledge, b/p of hcq has not been studied in mice before and seems to be lower according to our results. this might be explained by the fact that the murine cellular immune system differs from the human one. whereas human blood is particularly rich in neutrophils ( - % of leukocytes), this cell type plays a minor role in murine blood ( - % of leukocytes), which is dominated by lymphocytes [ ]. pharmacokinetics, including the metabolites of hcq, in mice [ ].
however, their method was not validated for mouse plasma and therefore not suitable to determine t/p or b/p of hcq. it is noteworthy that the blood concentrations in the terminal phase obtained by chhonker et al. furthermore, it could be agreed that for the purpose of tdm, whole blood concentrations might be more suitable than plasma concentrations, due to the lower variability in whole blood. on the other hand, the determination of pharmacokinetic parameters on the basis of whole blood concentrations might not be the best choice, since distribution and clearance processes usually only affect (unbound) drug molecules in the plasma and redistribution from blood cells might be slow. in cynomolgus monkeys, rem ([ c]gs- , mg/kg) showed a tissue-to-plasma ratio (t/p) for testes and epididymis of about to . after h. a considerably lower amount of radioactivity was recovered from eyes and brain (t/p: ~ . and ~ . , respectively), indicating a low permeability of the blood-brain barrier for rem or any of the downstream metabolites [ ]. furthermore, t/p < or > indicate involvement of influx and efflux transporters, which can play an important role regarding rate and extent of tissue distribution [ ]. t/p < is due to efflux processes, as mediated especially in the brain by p-gp. as rem was found to be a substrate for p-gp, low t/p values found in the brain of cynomolgus monkeys are plausible. to our knowledge, gs- itself was not investigated as a transporter substrate but according to our data, the tissue-to-plasma ratio in the brain is far below . a hypothetical affinity of gs- to p-gp, besides its low membrane permeability, would therefore also be conceivable. in covid- patients receiving rem, a csf-to-plasma ratio of . % was reported by tempestilli et al. [ ]. furthermore, they found gs- concentrations in bronchoalveolar aspirate of . and . ng/ml on day and . and . ng/ml on day , respectively ( patients).
our data revealed a significant distribution of gs- into numerous organs, such as liver and kidney, with high k t/p values both after one and after four hours. hence, the phosphoramidate prodrug structure in the form of rem might not be necessary to achieve sufficiently high tissue and even intracellular concentrations. the high efficiency of gs- in treating fip corroborates this finding [ ]. given the multi-organ pathology observed in covid- , successful treatment strategies require broad distribution. lack of penetration into brain tissue could explain the limited efficacy of most therapeutics against neurological symptoms, which are frequent and debilitating in severe covid- patients [ ]. as gs- represents the nonphosphorylated nucleoside form of rem, an active uptake process into the cell via equilibrative (ent) or concentrative (cnt) transporters is plausible [ ]. yan and muller also brought up the idea that, besides slow passive diffusion, gs- crosses membranes using nucleoside transporters [ ]. a variety of nucleoside-based antiviral drugs (e.g. ribavirin) and their corresponding transporter proteins is presented by pastor-anglada et al. [ ]. due to the adenosine structure in gs- , purine-preferring cnt is one of the most probable nucleoside transporters (k m value of µm for adenosine) to mediate intracellular uptake [ ]. in comparison, k m values for adenosine uptake by ent (k m µm) and ent (k m µm) are much higher [ ]. nevertheless, they should not be overlooked, as several nucleoside transporters can be expressed by a single cell. in , mcchesney described the distribution of chloroquine and hcq in albino rats in a qualitative and quantitative way. they found organ concentrations in the following order: bone, fat, and brain < muscle < eye < heart < kidney < liver < lung < spleen < adrenal tissue [ ].
it was also demonstrated that tissue accumulation did not reach steady state before seven months of treating albino rats with mg/kg hcq per day orally (stomach tube). after weeks of treating albino rats with . mg/kg per day orally (stomach tube), t/p ranged from (muscle) to (spleen) with the values for heart, kidney, liver, and lung in between ( , , , ) [ ] . we could confirm the extensive tissue distribution of hcq, which explains the frequently reported high v z . in general, a high t/p would be desirable for an antiviral drug, since penetration into tissue is a necessary step for intracellular uptake. accumulation in internal organs like liver, kidney, lungs, and spleen was also observed in our experiments. the disparity between our t/p and the reported values in albino rats can mostly be attributed to the fact that our experiment was based on a single dose, so steady state pharmacokinetics had not been attained. parts of the central nervous system did not show accumulation of hcq, which is in accordance with earlier investigations [ ] and may limit its use for the neurological manifestations potentially caused by sars-cov- penetration into the brain [ ] . reported in vitro ec values for hcq against sars-cov- in infected vero e cells range from . to . - . µm [ , ] . given the pronounced t/p, the comparatively high ec values (~ µm) could be reached in our mouse model in some tissues (e.g. lungs and adrenal cortex). however, a multiple of the ec value might be necessary for a reliable antiviral effect. due to the long terminal half-life and potentially slow tissue distribution (that is, a long time to reach steady state in tissue), sufficiently high concentrations in tissue might not be achieved in time. this could be one reason for the failure of the -day hcq therapies tested [ ] . a quicker tissue saturation is not feasible due to the comparatively high toxicity of hcq. 
a prophylactic effect of hcq could be studied infecting the current mouse model after long term ( - weeks) treatment with hcq ( mg/kg per day) in order to reach high hcq tissue concentration at the time of infection. it could be demonstrated, that rem quickly degrades to two major products in plasma in vitro. one of those degradation products is gs- , the other is the alanine metabolite as an intermediate. gs- could only be found in the more sensitive lc-ms/ms approach. due to the lack of external standards for the alanine metabolite, we cannot make a statement regarding the sensitivity in lc-ms/ms (the transition could not be optimized) and therefore not give a measured concentration. however, using a mass balance approach assuming that no other degradation products were formed, only a minor percentage (approx. . %) of remdesivir was converted to gs- in plasma. if this finding translates to in vivo, the major part of plasma gs- originates from redistribution after cellular uptake of remdesivir and intracellular metabolism rather than hydrolysis in plasma as suggested by yan and muller (vide supra). we could demonstrate that the mouse model platform is suitable to characterise the pk and tissue distribution of two extremely different compounds. we could demonstrate that gs- is distributed into tissue even if not administered as the prodrug rem. thus, we believe that the mouse model and general procedure presented here is a useful tool in the early investigation of drug candidates targeted against sars-cov- . this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. none of the authors of this work has a financial or personal relationship with other people or organizations that could inappropriately influence or bias the content of the paper. a mouse-adapted sars-cov- model for the evaluation of covid- medical countermeasures. 
biorxiv remdesivir inhibits sars-cov- in human lung cells and chimeric sars-cov expressing the sars-cov- rna polymerase in mice oral mutian(r)x stopped faecal feline coronavirus shedding by naturally infected cats efficacy and safety of the nucleoside analog gs- for treatment of cats with naturally occurring feline infectious peritonitis the nucleoside analog gs- strongly inhibits feline infectious peritonitis (fip) virus in tissue culture and experimental cat infection studies broad-spectrum antiviral gs- inhibits both epidemic and zoonotic coronaviruses adenine c-nucleoside (gs- ) for the treatment of ebola and emerging viruses therapeutic efficacy of the small molecule gs- against ebola virus in rhesus monkeys welcome to the tidyverse team, r.c., r: a language and environment for statistical computing noncompart: noncompartmental analysis for pharmacokinetic data safety, tolerability, and pharmacokinetics of remdesivir, an antiviral for treatment of covid- , in healthy subjects pharmacokinetics of remdesivir and gs- in two critically ill patients who recovered from covid- advantages of the parent nucleoside gs- over remdesivir for covid- treatment pharmacokinetics of remdesivir in a covid- patient with end-stage renal disease on intermittent hemodialysis. 
medrxiv pharmacokinetics of hydroxychloroquine and chloroquine during treatment of rheumatic diseases hydroxychloroquine concentration-response relationships in patients with rheumatoid arthritis simulated assessment of pharmacokinetically guided dosing for investigational treatments of pediatric patients with coronavirus disease a dose-ranging study of the pharmacokinetics of hydroxy-chloroquine following intravenous administration to healthy volunteers molecular mechanism of renal tubular secretion of the antimalarial drug chloroquine population pharmacokinetics of hydroxychloroquine in patients with rheumatoid arthritis hematologic disposition of hydroxychloroquine enantiomers of mice and not men: differences between mouse and human immunology simultaneous quantitation of hydroxychloroquine and its metabolites in mouse blood and tissues using lc-esi-ms/ms: an application for pharmacokinetic studies physiologically-based pharmacokinetic (pbpk) modeling and simulations neurological associations of covid- . the lancet neurology metabolic efficacy of phosphate prodrugs and the remdesivir paradigm cell entry and export of nucleoside analogues kinetic and pharmacological properties of cloned human equlibrative nucleoside transporters, ent and ent , stably expressed in nucleoside transporter-deficient pk cells coronavirus susceptibility to the antiviral remdesivir (gs- ) is mediated by the viral polymerase and the proofreading exoribonuclease. 
mbio pharmacokinetics and tissue distribution of remdesivir and its metabolites nucleotide monophosphate, nucleotide triphosphate, and nucleoside in mice review of the basic and clinical pharmacology of sulfobutylether-beta-cyclodextrin (sbecd) animal toxicity and pharmacokinetics of hydroxychloroquine sulfate tissue distribution of chloroquine, hydroxychloroquine, and desethylchloroquine in the rat hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov- infection in vitro in vitro antiviral activity and projection of optimized dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus (sars-cov- ) pharmacokinetics of hydroxychloroquine and its clinical implications in chemoprophylaxis against malaria caused by plasmodium vivax population pharmacokinetics of hydroxychloroquine in japanese patients with cutaneous or systemic lupus erythematosus pharmacokinetics of hydroxychloroquine in pregnancies with rheumatic diseases we are grateful to martina gramer and larsen kirchhoff for technical assistance. key: cord- -jz d m e authors: hasan, md. mahbub; das, rasel; rasheduzzaman, md.; hussain, md hamed; muzahid, nazmul hasan; salauddin, asma; rumi, meheadi hasan; rashid, s m mahbubur; siddiki, amam zonaed; mannan, adnan title: global and local mutations in bangladeshi sars-cov- genomes date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jz d m e corona virus disease- (covid- ) warrants comprehensive investigations of publicly available severe acute respiratory syndrome-coronavirus- (sars-cov- ) genomes to gain new insight about their epidemiology, mutations and pathogenesis. nearly . million mutations were identified so far in ∼ , sars-cov- genomic sequences. 
in this study, we compared sars-cov- genomes reported from different parts of bangladesh with globally reported sequences to understand the origin of the viruses, possible patterns of mutations, the presence of unique mutations, and their apparent impact on the pathogenicity of the virus in the bangladeshi population. phylogenetic analyses indicate that in bangladesh, sars-cov- viruses might have arrived through infected travelers from european countries, and the gr clade was found to be predominant in this region. we found mutations including missense mutations, synonymous mutations, insertions and deletions with other mutation types. in line with the global trend, the d g mutation in the spike glycoprotein was highly prevalent ( . %) in bangladeshi isolates. interestingly, we found that the average number of mutations in orf ab, s, orf a, m and n was significantly higher (p≤ . ) in genomes carrying g (n= ) than in those carrying d (n= ). previously reported frequent mutations such as p l, d g, r k, g r and i f were also prevalent in bangladeshi isolates. additionally, unique amino acid changes were revealed and were categorized as originating from different cities of bangladesh. the analyses would increase our understanding of variations in virus genomes circulating in bangladesh and elsewhere and help develop novel therapeutic targets against sars-cov- . severe acute respiratory syndrome-coronavirus- (sars-cov- ) is the etiological agent of the disease called coronavirus disease- (covid- ) . as of august , globally there have been , , confirmed cases of covid- , including , deaths, reported to who. to explore the viral pathogenesis, modern genomics tools are crucial and have been employed by researchers around the world. hundreds of whole virus genomes are now submitted every day to publicly accessible databases from different parts of the globe. 
it is high time to analyze the variations among those sequences, which will help future strategic efforts toward preventive measures such as vaccine design and therapeutics. sars-cov- consists of positive-sense single-stranded rna with a genome size ranging from ~ to kb. it contains a variable number ( ) ( ) ( ) ( ) ( ) ( ) of open-reading frames (orfs). the first orf is almost two-thirds of the whole genome and encodes four structural, non-structural and eight accessory proteins [ , ] . according to a recent study on , of sars cov- genome sequences, , mutation events have been observed globally in comparison to the reference genome of wuhan. among these sequences, those from india, congo, bangladesh and kazakhstan have significantly higher numbers of mutations per sample than the global average [ ] . among these mutations, the d g mutation (causing aspartate to glycine in s protein position ) is reported to be the most prevalent mutation in europe, oceania, south america and africa [ ] . zhang et al. ( ) reported that the level of angiotensin-converting enzyme (ace ) expression was distinctly higher by the retroviruses pseudotyped with g compared to that of d [ , ] . the functional properties of the (d ) and (g ) forms were compared in that study, and g was found to be more stable than d with higher transmission efficiency, supporting the previous epidemiological data [ ] . another reported mutation, p l in orf ab, is linked with d g and has been reported to have a strong relationship with higher fatality rates in countries and states of the united states [ , ] . three other mutations, namely c t (orf b), c t ( ' utr) and c t (orf a), are reported to be common and to co-occur in the same genome, while g t has been found mostly in asian countries [ ] . hassan et al. ( ) investigated the accumulation of orf a mutations of sars-cov- from india, where they revealed four types of mutations (q>h, d>y and s>l) near traf, ion channel and caveolin binding domain, respectively [ ] . 
notably, all these mutations might have implications in maintaining the virulence of the virus and nlrp inflammasome activation. in bangladesh, as of august, , nearly , people have been infected and people have died due to covid- (https://iedcr.gov.bd/). among them, of sars-cov- genome sequences were deposited in the gisaid database (https://www.gisaid.org). analysis of these sequences is sparsely reported in the literature. analyzed sequences and identified the presence of mutations in the coding regions of the viruses, and mutations at nsp were the most prevalent [ ] . due to the small number of genome sequences analyzed, most of these findings were neither conclusive nor representative. further comprehensive analysis is therefore necessary to better understand the circulating virus in the country. in this study, we first compared genome sequences isolated from bangladesh with time-resolved phylogenetic analysis and investigated the origin of imported covid- cases to bangladesh. then, we studied the variants present in different isolates of bangladesh to investigate the pattern of mutations, identify unique mutations (ums), and discuss the pseudo-effect of these mutations on the structure and function of encoded proteins, with their role in pathogenicity. most interestingly, we found ums with a total count of in bangladeshi isolates, which will increase our understanding of the distribution of sars-cov- virus in different regions and the associated pathogenicity. as of june, , a total of , whole genomic rna sequences of sars-cov- had been submitted to gisaid. from these downloaded sequences, a custom python script was used to retrieve unique sequences. the same script also removed any sequence containing "n" and other ambiguous iupac codes [ ] . this resulted in a total of complete genomic sequences. 
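the curation step described above (keeping unique sequences and discarding any that contain "n" or other ambiguous iupac codes) can be sketched as follows. this is a minimal illustration and not the authors' script: the function names are hypothetical, and it assumes fasta input written with the dna alphabet (a, c, g, t), as is usual for deposited sars-cov- genomes.

```python
UNAMBIGUOUS = set("ACGT")  # assumption: genomes are stored with DNA letters


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_unambiguous(records):
    """Keep the first copy of each sequence that has no ambiguous codes."""
    seen, kept = set(), []
    for header, seq in records:
        seq = seq.upper()
        if not seq or set(seq) - UNAMBIGUOUS:
            continue  # empty, or contains N / other IUPAC ambiguity codes
        if seq in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(seq)
        kept.append((header, seq))
    return kept


# Toy input for illustration only.
demo = ">a\nACGT\n>b\nACGN\n>c\nacgt\n>d\nGGCC\n"
kept = unique_unambiguous(parse_fasta(demo))
```

in the toy input, record b is dropped for containing "n" and record c is dropped as a duplicate of a once case is normalized.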
to select representative sequences from curated sequences and make a comparison against sequences from bangladesh, priorities were given to those countries that had a higher number of infections in each continent (source: https://www.worldometers.info/coronavirus/). we selected these sequences in such a way that each continent must contain at least one sequence from each gisaid clade. the number of sequences selected from a country was based on the total number of unique sequences retrieved. this resulted in a total of unique representative sequences from countries (see supporting file table s ). from bangladesh, whole-genome rna sequences of sars-cov- were uploaded to gisaid as of july, . only high coverage complete sequences (n= ) were kept for analysis. all these sequences were aligned with the previously selected representative sequences along with that of wuhan- (accession id mn ) as a reference sequence [ ] . to ensure comparability, the flanks of all the sequences were truncated to the consensus range from to , [ ] , with nucleotide position numbering according to the wuhan reference sequence, prior to alignment. multiple sequence alignment (msa) and phylogenetic tree construction were carried out using molecular evolutionary genetic analysis (mega x) software [ ] . all selected sequences were aligned using muscle software tool [ ] . later, an nj (neighbor-joining) phylogenetic tree [ ] was constructed using the tamura-nei method. tree topology was assessed using a fast bootstrapping function with replicates. tree visualization and annotations were performed in the interactive tree of life (itol) v [ ] . the genome detective coronavirus typing tool version . was used for variant analyses of sars-cov- genome which is specially designed for this virus analysis (https://www.genomedetective.com/app/typingtool/cov/) [ ] . 
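the flank truncation described above, with positions numbered according to the wuhan reference, can be illustrated with a short sketch. it assumes each genome is available as a row that has been pairwise aligned to the reference row (gaps written as "-"), which is one possible way to map reference coordinates onto a query and is not necessarily how the authors proceeded; the function name and the toy coordinates are hypothetical.

```python
def trim_to_reference_range(ref_row, query_row, start, end):
    """Trim query_row to reference positions start..end (1-based, inclusive).

    ref_row and query_row are equal-length aligned strings; '-' marks a gap.
    Gap columns in the reference do not advance the reference coordinate.
    """
    if len(ref_row) != len(query_row):
        raise ValueError("aligned rows must have equal length")
    if not 1 <= start <= end:
        raise ValueError("need 1 <= start <= end")
    pos, first, last = 0, None, None
    for col, base in enumerate(ref_row):
        if base == "-":
            continue
        pos += 1
        if pos == start:
            first = col
        if pos == end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("range lies outside the reference")
    return query_row[first:last + 1]


# Toy rows for illustration only: reference positions 2..4 span columns 1..4.
trimmed = trim_to_reference_range("AC-GTAC", "ACTGTA-", start=2, end=4)
```

counting only non-gap reference columns keeps the numbering tied to the reference even when a query carries insertions.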
for analysis of unique mutations (um) among genomic sequences from bangladesh, we used the cov server hosted on the gisaid server (https://www.gisaid.org/epiflu-applications/covsurver-app/). the server analyzed our dataset against all available genomic sequences of sars-cov- including the wuhan reference sequence deposited on gisaid until july , . descriptive and inferential statistics were used to analyze different mutations and their correlation with different categorical variables. for correlation, we used one-way analysis of variance using spss statistics (ibm, armonk, new york) licensed to king's college london. to understand the sars-cov- viral transmission in bangladesh, we performed phylogenetic analysis on the selected viral genomes reported from different districts of bangladesh along with selected globally submitted sequences as reported from countries and continents ( figure ). this apparently represents the overall clade distribution of all global sequences along with sequences from bangladeshi isolates. the gr clade was found to be predominant in bangladesh, as about % of the sequences were grouped into this clade, followed by gh and g with ~ and ~ %, respectively. a similar clade distribution has been found in isolates submitted from european countries. we also attempted to compare the sequence data among the different districts of bangladesh from which patient samples were collected for sequencing. looking at the district-wise distribution of clades (figure ), we found that the sequences from three districts, namely dhaka, narayanganj and rangpur, primarily belong to the gr clade. conversely, only sequences from chattogram district were from the s and gh clades. on another note, given phylogenetic analyses of the clade distribution of isolates from countries like saudi arabia, where the gh clade was predominant [ ] , it is highly likely that the introduction of the gh and s clades in bangladesh could be of middle-eastern origin. 
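the clade shares and the district-wise clade distribution discussed above amount to simple tallies over (district, clade) pairs. a minimal sketch follows, assuming each genome has already been assigned a gisaid clade; the function names and the toy records are hypothetical and do not reproduce the study data.

```python
from collections import Counter, defaultdict


def clade_percentages(clades):
    """Return {clade: percentage of all sequences}, most common first."""
    counts = Counter(clades)
    total = sum(counts.values())
    return {clade: 100.0 * n / total for clade, n in counts.most_common()}


def clades_by_district(records):
    """Return {district: {clade: count}} from (district, clade) pairs."""
    table = defaultdict(Counter)
    for district, clade in records:
        table[district][clade] += 1
    return {district: dict(counts) for district, counts in table.items()}


# Toy records for illustration only.
records = [
    ("dhaka", "GR"),
    ("dhaka", "GR"),
    ("chattogram", "GH"),
    ("chattogram", "S"),
]
overall = clade_percentages(clade for _, clade in records)
per_district = clades_by_district(records)
```

the same two tables underlie statements such as "the gr clade was predominant" and "only chattogram carried the s and gh clades".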
based on the datasets, we hypothesized that bangladeshi sars-cov- isolates belonging to different clades might have critical implications concerning viral transmission rate, virulence, severity and other aspects of disease pathogenesis. in addition, the presence of different clades of sars-cov- strains in different districts could also have implications for the accuracy of diagnostic tests that are underway. our analyses revealed a total of mutations among bangladeshi genomic sequences, comprising missense, synonymous, insertion/deletion and other mutations (table ) . we identified i f (nsp ), p l (nsp ), d g (s glycoprotein), r k and g r (n protein) as the most frequently occurring common mutations in bangladesh, with frequencies of , , , and , respectively ( figure ). notably, no particular mutation occurred at any specific time period; rather, they were observed over the whole period of disease incidence. firstly, a>t (i f) can be a destabilizing factor for nsp and may thus modulate the strategy of host cell survival [ , ] . this mutation can lead to a reduction of conformational entropy due to the presence of the side chains that can result in charge neutralization of the phosphorylated serine residues [ ] . secondly, rna dependent rna polymerase (nsp ) is significant for replication and transcription of the viral rna genome. therefore, p > l at in nsp may have some effects on rna transcription. this mutation was also observed in most of the usa states ( out of ). the same mutation was prevalent in european countries like spain, france etc. this alteration could affect the pathogenesis triggered by antibody escape variants with the epitope loss [ , ] . thirdly, korber et al. ( ) stated that the g type might have originated either in europe or china [ ] . they also reported that the original wuhan d form was predominant in asian samples. 
meanwhile, the g form had clearly established and started expanding in countries outside of china. we also noticed . % of genomes from bangladesh have d g mutation, which is also dominant in the world. however, the average number of mutations per orf is varied among d and g containing genomes that we have studied (n= ) as revealed by table . the average number of mutations in orf ab, s, orf a, m and n of genomes having mutated g (n= ) are significantly higher (p≤ . ) than those having wild d (n= ) in s glycoprotein (table ; figure ). interestingly, the average mutation number is declined in orf of genomes having g mutation (p< . ). this correlation indicates that the genomes containing d g mutation are more prone to bear other mutations which may facilitate the notion that the link of this mutation with the transmission and pathogenesis of sars-cov- [ , , ] . finally, r k and g r mutations in n protein were previously reported in indian, spanish, italian, and french samples [ , ] . these mutations are located in the site of the sr-rich region which has been reported to be intrinsically disordered [ ] . this region further incorporates a few phosphorylation sites [ ] , including the gsk phosphorylation at ser and a cdk phosphorylation site at ser which are located close to the position of this mutation. the 'srgts' ( - ) and 'spar' ( - ) sequence motifs are dependent on gsk and cdk phosphorylation motifs, respectively. other variations ( g>a and g>a) together convert polar to non-polar amino acid (r k) and g>c variation converts nonpolar to polar amino acid (g r). we observed unique shifts from the different proteins of sars-cov- genomes found in bangladesh (table s ). details of their pseudo-effects on viral replication, assembly, transmissivity and pathogenicity are corroborated in table s . surprisingly, most of these um were localized except a few exceptions. 
for example, the ums e d (e), s t (n), l j and is involved in the transcription and replication of cov rnas. we observed ums in nsp , but the location of these amino acids was not in the kh domain (k and h of nsp ), which binds with ribosome s subunit [ ] . however, nsp acts as a primary virulence factor in sars-cov- infection, and mutation in this protein could affect the structure and functional properties, thereby altering its virulence properties. nine ums were seen in nsp , but their effects on host cells were merely reported in the literature. since nsp interacts with the host proteins and disrupts the host cell survival signaling pathway [ ] , any mutation in nsp may play a crucial role in sars-cov infections. in a recent study, it has been found that, compared to bat sars-cov, sars-cov- has a stabilizing mutation at amino acid position , t q, which alters the viral pathogenicity and makes the virus more contagious [ ] . we did not observe this mutation in our study. the mutations of nsp are responsible for affecting the virus assembly and hence their replication. it is due to the disruption of replicase polyprotein processing into nsps. these nsps assemble with cellular membranes and facilitate virus replication. the ums found in nsp might have some probable effects. firstly, the ums (a v, r g, s f and v f) were found in the main domain of nsp that is important for processing endopeptidases from coronaviruses. secondly, many um (e.g. a v, a s, g c, i s, a s, d b, k r, l m, n s, n d, r g, t is, y c and y h etc) are found in topological (cytoplasmic) domain rather than transmembrane domain which could interfere less on cytoplasmic double-membrane vesicle formation, necessary for viral replication. thirdly, we observed three ums (l m, y c and y h) in adp-ribose- ′-phosphatase (adrp) or (macro) domain. 
it has been shown that mutations of the adrp domain do not diminish virus replication in mice, but reduce the production of the cytokine il- , which is an important pro-inflammatory molecule [ ] . however, we did not observe any mutation in the active sites and zinc finger motif, suggesting normal catalytic activity of nsp . the general opinion is that the sars-cov pl-pro domain is important for the development of antiviral drugs, but the actual role of this enzyme in the biogenesis of the covid- replicase complex has not yet been explored. it is proposed that the proteins nsp , nsp and nsp , through their transmembrane domains, are involved in the replicative and transcription complex [ ] . in our study, we observed only ums in nsp and nsp , respectively. meanwhile, nsp encodes c-like proteinase, which cleaves the c-terminus of replicase polyprotein at sites. the five ums that we found in nsp did not fall in its active sites ( and in orf ab), which requires further investigation with a large number of sequence datasets. the present global outbreak of covid- , caused by sars-cov- , has already taken ~ . million lives. to combat this deadly disease, we need a greater understanding of the pathobiology of the virus. hence, it is essential to minimize the translational gap between viral genomic information and its clinical consequences for developing effective therapeutic strategies. in this study, we have attempted to explore genomic variations of bangladeshi sars-cov- viral isolates in comparison with a large cohort of global isolates. our analyses will facilitate the understanding of the origin and mutation patterns of the virus and their possible effect on viral pathogenicity. this study tries to address the importance of the variations in the viral genomes and their relevance for therapeutic interventions. 
the unique insights from this study will undoubtedly be supportive for a better understanding of sars-cov- molecular mechanism and to draw an end to the current life-threatening pandemic. emerging coronaviruses: genome structure, replication, and pathogenesis geographic and genomic distribution of sars-cov- the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity polymorphism and selection pressure of sars-cov- vaccine and diagnostic antigens: implications for immune evasion and serologic diagnostic performance sars-cov- genomic variations associated with mortality rate of covid- molecular conservation and differential mutation on orf a gene in indian sars-cov genomes emergence of european and north american mutant variants of sars-cov- in south-east asia genetic analysis of sars-cov- isolates collected from bangladesh: insights into the origin, mutation spectrum genome analysis of sars-cov- isolate from bangladesh in silico comparative genomics of sars-cov- to determine the source and diversity of the pathogen in bangladesh spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- , biorxiv a new coronavirus associated with human respiratory disease in china phylogenetic network analysis of sars-cov- genomes mega x: molecular evolutionary genetics analysis across computing platforms muscle: multiple sequence alignment with high accuracy and high throughput the neighbor-joining method: a new method for reconstructing phylogenetic trees interactive tree of life (itol) v : recent updates and new developments genome detective coronavirus typing tool for rapid identification and characterization of novel coronavirus genomes complete genome sequence of a novel coronavirus (sars-cov- ) isolate from bangladesh mutations in sars-cov- viral rna identified in eastern india: possible implications for the ongoing outbreak in india and impact on viral structure and host susceptibility mutational spectra of sars 
cov orf ab polyprotein and signature mutations in the united states of non-synonymous mutations of sars-cov- leads epitope loss and segregates its varaints spike: evidence that d g increases infectivity of the covid- virus making sense of mutation: what d g means for the covid- pandemic remains unclear variant analysis of sars-cov- genomes the sars coronavirus nucleocapsid protein-forms and functions the severe acute respiratory syndrome coronavirus nucleocapsid protein is phosphorylated and localizes in the cytoplasm by - - -mediated translocation emergence of rbd and d g mutations in spike protein: an insight from indian sars-cov- genome analysis structural basis for translational shutdown and immune evasion by the nsp protein of sars-cov- , biorxiv mutational screening of the proteome of sars-cov- isolates: mutability of orf a covid : the role of the nsp and nsp in its pathogenesis murine coronavirus ubiquitin-like domain is important for papainlike protease stability and viral pathogenesis sars-cov- : virus mutations in specific european populations key: cord- -q i sz authors: bai, lei; zhao, yongliang; dong, jiazhen; liang, simeng; guo, ming; liu, xinjin; wang, xin; huang, zhixiang; sun, xiaoyi; zhang, zhen; dong, lianghui; liu, qianyun; zheng, yucheng; niu, danping; xiang, min; song, kun; ye, jiajie; zheng, wenchao; tang, zhidong; tang, mingliang; zhou, yu; shen, chao; dai, ming; zhou, li; chen, yu; yan, huan; lan, ke; xu, ke title: co-infection of influenza a virus enhances sars-cov- infectivity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q i sz the upcoming flu season in the northern hemisphere merging with the current covid- pandemic raises a potentially severe threat to public health. through experimental co-infection of iav with either pseudotyped or sars-cov- live virus, we found that iav pre-infection significantly promoted the infectivity of sars-cov- in a broad range of cell types. 
remarkably, increased sars-cov- viral load and more severe lung damage were observed in mice co-infected with iav in vivo. moreover, such enhancement of sars-cov- infectivity was not seen with several other viruses probably due to a unique iav segment as an inducer to elevate ace expression. this study illustrates that iav has a special nature to aggravate sars-cov- infection, and prevention of iav is of great significance during the covid- pandemic. the outbreak of severe acute respiratory syndrome coronavirus (sars-cov- ) at the end of has become pandemic worldwide. up to date, there had been more than million confirmed infected cases and million deaths globally (https://covid .who.int/). the ending time and the final severity of the current covid- pandemic wave are still uncertain. meanwhile, the upcoming seasonal influenza merging with the current pandemic might bring more challenges and pose a bigger threat to public health. there are many debates on whether seasonal flu would impact the severity of the covid- pandemic and whether massive influenza vaccination is necessary for the coming winter. however, no experimental evidence is available concerning iav and sars-cov- co-infection. it is well known that disease symptoms from sars-cov- and iav infections are quite similar, such as fever, cough, pneumonia, acute respiratory distress syndrome, etc( , ). moreover, both sars-cov- and iav are airborne transmitted pathogens that infect the same human tissues such as the respiratory tract, nasal, bronchial, and alveolar epithelial cultures ( , ) . besides, alveolar type ii cells (at pneumocytes) appeared to be preferentially infected by sars-cov- , which were also the primary site of iav replication( , ). therefore, the overlap of the covid- pandemic and seasonal influenza would pose a large population under the high risks of co-occurrent infection by these two viruses( ). 
unfortunately, during the last winter flu season in the southern hemisphere, there was little epidemiological evidence about the interaction between covid- and flu, probably due to a low iav infection rate resulting from social distancing ( , ) . a case report showed that three out of four sars-cov- and iav co-infected patients rapidly developed respiratory deterioration( ). on the contrary, other reports only observed mild symptoms in a limited number of co-infected outpatients( ). thus, the clinical outcomes of co-infection are still unclear when a large population faces the threat of both viruses. in this study, we tested whether iav infection could affect the subsequent sars-cov- infection in both infected cells and mice. the results demonstrate that pre-infection with iav strongly enhances the infectivity of sars-cov- by boosting viral entry in the cells and by elevating viral load and causing more severe lung damage in infected mice. these data suggest a clear auxo-action of iav on sars-cov- infection, which implies the great importance of influenza virus and sars-cov- co-infection to public health. iav promotes sars-cov- virus infectivity. to study the interaction between iav and sars-cov- , a (a hypotriploid alveolar a was converted to be highly sensitive (up to , -fold) against the psars-cov- virus after different doses of iav infections (from low moi of . to high moi of , also shown by psars-cov- with mcherry reporter in fig. s ). in contrast, pre-infection with iav had no impact on pseudotyped vsv particles bearing vsv-g protein (fig. c ). we further tested more cell lines to show that the enhancement of psars-cov- infectivity by iav was a general effect, although the fold increase differed (the lower the basal level of infectivity, the higher the enhancement fold) (fig. d ). to validate the above results, we substituted the psars-cov- with live sars-cov- (experimental scheme shown in fig. e ). 
we found that the pre-infection of iav strongly increased the copy numbers of the sars-cov- genome (e and n genes) in both cell lysates and supernatants of a (~ folds) (fig. f) . notably, in calu- (fig. g ) and nhbe ( fig. h ) cells that are initially susceptible to sars-cov- , iav pre-infection could further increase > folds of sars-cov- infectivity. collectively, these data suggest an auxo-action of iav on sars-cov- in a broad range of cell types. load and more severe lung damage. the hace transgenic mice were applied to study the interaction between iav and sars-cov- in vivo. mice were infected with x pfu of sars-cov- with or without pfu of iav pre-infection and were then sacrificed two days later after sars-cov- infection (the experimental scheme is shown in fig. a) . the viral rna genome copies from lung homogenates confirmed that sars-cov- efficiently infected both groups (more than x n gene copies) (fig. b) , while the influenza np gene was only detected in iav pre-infection group (fig. b ). intriguingly, a significant increase in sars-cov- viral load ( . -fold increase in e gene and . -fold increase in n gene) was observed in lung homogenates from co-infection mice compared to that from sars-cov- single- infected mice (fig. c ). the histological data in fig. d further illustrated that iav and sars-cov- co-infection induced more severe lung pathologic changes with massive infiltrating cells and obvious alveolar necrosis as compared to sars-cov- single infection or mock infection. iav components specifically facilitate the entry process of sars-cov- . we further tested if several other viruses on hand had similar effects to promote sars- catl were increased around three folds (a in fig. a , calu- in fig. s ). an obvious switch of intracellular ace expression was triggered at h post-iav-infection (fig. c ). in the meantime, influenza np, mx , and isg increased accordingly confirming a successful infection of iav (fig. b ). 
the data indicated that iav permitted increased sars-cov- infection through the up- regulation of ace expression. enhanced sars-cov- infectivity is independent of ifn signaling. ace was reported to be an interferon-stimulated gene (isg) in human airway epithelial cells ( ) . iav infection will also stimulate type i ifn signaling. we, therefore, tested whether the augment of ace expression is dependent on ifn or not. for this, cells were the a/wsn/ virus was generated by reverse genetics as previously described ( ). all the mrna levels were normalized by β-actin in the same cell. the relative number of sars-cov- viral genome copy number were determined using ifnα for hours. cells were then infected with psars-cov- for another hours followed by measuring luciferase activity and mrna expression levels of indicated genes. the data of mrna levels were expressed as fold changes relative to non-treatment cells. figure s . iav facilitates the entry process of psars-cov- (fig. ) . figure s . iav infection induces elevated ace expression (fig. ) . figure s . enhanced sars-cov- infection is independent of ifn signaling (fig. ) . figure s . iav facilitates viral entry of wt or mutant sars-cov- . mutations in the spike protein of middle east respiratory syndrome structural and functional basis of sars-cov- entry by using human ace structure of the sars-cov- spike receptor-binding domain bound to the ace receptor sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor furin, a potential therapeutic target for covid- . iscience could an endo-lysosomal ion channel be the achilles heel of sars-cov ? 
structure, function, and antigenicity of the sars-cov- spike glycoprotein receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues human antibody responses after dengue virus infection are highly cross-reactive to zika virus key: cord- -tpp j g authors: jin, zhenming; zhao, yao; sun, yuan; zhang, bing; wang, haofeng; wu, yan; zhu, yan; zhu, chen; hu, tianyu; du, xiaoyu; duan, yinkai; yu, jing; yang, xiaobao; yang, xiuna; yang, kailin; liu, xiang; guddat, luke w.; xiao, gengfu; zhang, leike; yang, haitao; rao, zihe title: structural basis for the inhibition of sars-cov- main protease by antineoplastic drug carmofur date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tpp j g the antineoplastic drug carmofur was shown to inhibit sars-cov- main protease (mpro). here the x-ray crystal structure of mpro in complex with carmofur reveals that the carbonyl reactive group of carmofur is covalently bound to catalytic cys , whereas its fatty acid tail occupies the hydrophobic s subsite. carmofur inhibits viral replication in cells (ec = . μm) and it is a promising lead compound to develop new antiviral treatment for covid- . the antineoplastic drug carmofur was shown to inhibit sars-cov- main protease (m pro ). here the x-ray crystal structure of m pro in complex with carmofur reveals that the carbonyl reactive group of carmofur is covalently bound to catalytic cys , whereas its fatty acid tail occupies the hydrophobic s subsite. carmofur inhibits viral replication in cells (ec = . μm) and it is a promising lead compound to develop new antiviral treatment for covid- . covid- , a highly infectious viral disease, has spread since its appearance in december , causing an unprecedented pandemic. the number of confirmed cases worldwide continues to grow at a rapid rate, but there are no specific drugs or vaccines available to control symptoms or the spread of this disease at this time. 
the etiological agent of the disease is the coronavirus sars-cov- . this virus has a ~ , nt rna genome. the n-terminus of the viral genome encodes two translational products, polyproteins a and ab (pp a and pp ab) , , which are processed into mature non-structural proteins, by the main protease (m pro ) and a papain- like protease . m pro has been proposed as a therapeutic target for anti-coronavirus (cov) drug development - . we previously screened over , compounds and identified carmofur as compound that can inhibit m pro in vitro, with an ic of . μm . carmofur ( -hexylcarbamoyl- -fluorouracil) is a derivative of -fluorouracil ( - fu) (fig. a) and an approved antineoplastic agent. carmofur has been used to treat colorectal cancer since s , and has shown clinical benefits on breast, gastric, and bladder cancers - . the target for carmofur is believed to be thymidylate synthase , , but it has also been shown to inhibit human acid ceramidase (ac) , through covalent modification of its catalytic cysteine . the molecular details for how carmofur inhibits m pro activity were unresolved. here, we present the . Å x-ray crystal structure of sars-cov- m pro in complex with carmofur (fig. b, domain ii feature the catalytic dyad residues cys and his (fig. c, d) . the substrate-binding pocket is divided into a series of subsites (including s , s , s , and s ′), each accommodating a single but consecutive amino acid residue in the substrate. the first residue serine of one protomer interacts with residue phe and glu of the other protomer to stabilize the s subsite (extended data fig. c) , and this structural feature is essential for catalysis . the electron density map unambiguously shows that the fatty acid moiety . another difference is that carmofur only occupies the s subsite (fig. d) , whereas n occupies four subsites (s , s , s and s ′, see extended data fig. b , c). 
the lactam ring of n is located in the s subsite, which is filled by a dmso molecule in the m pro - carmofur structure (extended data fig. b, c) . these observations demonstrate the potential for structural elaboration of carmofur and will be useful to design more potent derivatives against the m pro of sars-cov- . we previously showed that treatment with μm ebselen (ec = . μm) inhibited infection of vero cells with sars-cov- whereas carmofur did not showed detectable antiviral activity at this concentration . here we determined the inhibitory effect of carmofur against sars-cov- infection on vero e cells, as previously described (fig. ) . by measuring viral rna in supernatant, we determined the ec for carmofur as . μm (fig. a) . to verify this result, we fixed infected cells and stained them using anti-sera against viral nucleocapsid protein (np) and observed a decrease in np levels after carmofur treatment (fig. b) . we also performed cytotoxicity assays for carmofur in vero e cells and determined the cc value of . μm (fig. c) . thus, carmofur has a favorable selectivity index (si) of . , but further optimization will be required to develop into an effective drug. in conclusion, the crystal structure of m pro in complex with carmofur shows that the compound directly modifies the catalytic cys of sars-cov- m pro . our study also provides a basis for rational design of carmofur analogs with enhanced inhibitory efficacy to treat covid- . since m pro is highly conserved among all cov m pro s, carmofur and its analogs may be effective against a broader spectrum of coronaviruses. the best crystals were grown using a well buffer containing . to immunofluorescence to monitor intracellular np level as described previously . for cytotoxicity assays, vero e cells were suspended in growth medium in -well plates. the next day, appropriate concentrations of carmofur were added to the medium. 
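the selectivity index mentioned above is conventionally the ratio of the cytotoxic concentration to the effective concentration (si = cc50 / ec50, both in the same units). a minimal sketch of that arithmetic follows. the function name is hypothetical and the numbers in the usage line are placeholders, not the values measured in this study.

```python
def selectivity_index(cc50_um: float, ec50_um: float) -> float:
    """Return SI = CC50 / EC50; both concentrations must share units (e.g. micromolar)."""
    if cc50_um <= 0 or ec50_um <= 0:
        raise ValueError("CC50 and EC50 must be positive concentrations")
    return cc50_um / ec50_um


# Placeholder values for illustration only (NOT taken from the paper).
example_si = selectivity_index(cc50_um=100.0, ec50_um=20.0)
```

a larger ratio means the compound suppresses the virus at concentrations well below those that harm the host cells.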
after ceramidase in glioblastoma: a review of its role, potential treatment, and challenges molecular mechanism of inhibition of acid ceramidase by structures of two coronavirus main proteases: implications for substrate binding and antiviral drug design the newly emerged sars-like coronavirus hcov-emc also has an "achilles' heel": current effective inhibitor targeting a c-like protease structure of main protease from human coronavirus nl : insights for wide spectrum anti-coronavirus design of wide-spectrum inhibitors targeting coronavirus main references phaser crystallographic software macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix features and development of coot towards automated crystallographic structure refinement with phenix.refine we are grateful to the staff at the bl u , bl u and bl u at shanghai this work was supported by grants from national key r&d program of china (grants no. department of science and technology of guangxi zhuang autonomous region ab ), and the natural science foundation of china (grant no conceived the project performed qrt- pcr and cytotoxicity assay analysis the authors declare no competing interests. key: cord- -pvn qq f authors: sadykov, mukhtar; mourier, tobias; guan, qingtian; pain, arnab title: short sequence motif dynamics in the sars-cov- genome suggest a role for cytosine deamination in cpg reduction date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pvn qq f rna viruses use cpg reduction to evade the host cell defense, but the driving mechanisms are still largely unknown. in an attempt to address this we used a rapidly growing genomic dataset of sars-cov- with relevant metadata information. remarkably, by simply ordering sars-cov- genomes by their date of collection, we find a progressive increase of c-to-u substitutions resulting in ’-ucg- ’ motif reduction that in turn have reduced the cpg frequency over just a few months of observation. 
this is consistent with apobec-mediated rna editing resulting in cpg reduction, thus allowing the virus to escape zap-mediated rna degradation. our results thus link the dynamics of target sequences in the viral genome for two known host molecular defense mechanisms, mediated by the apobec and zap proteins. viruses utilize numerous mechanisms to avoid the host cell defense. one such mechanism is the cpg dinucleotide reduction observed in many single-stranded rna fraction of the observed c>u changes represent multiple, independent events ( figure s ). the reported c>u frequency is therefore most likely an underestimate. over this period of five months, we find a steady increase in c>u substitutions ( figure b '-uu- ') in viral dinucleotide frequencies for the -month period ( figure s ). among all dinucleotides, upc showed the highest degree of decrease, while upu exerted the highest rates of increase. cpg, cpa, cpc, cpu and gpa also showed a negative net gain but not as prominent as upc, cpa or apg. we find that the majority of dinucleotide losses were due to c>u changes, in agreement with a recent study by rice and et al. ). when analyzing the context of genomic sites undergoing c>u changes we noticed an enrichment for '-ucg- ' motifs (table s ) . to assess the contribution of c>u changes in cpg loss, we examined the dynamics of [a/c/g/u]cg trinucleotides over time ( figure d ). the progressive change (~ % over a -month period) of '-ucg- ' to '-uug- ' is most striking when supported by a larger number of genomes (days to ), whereas no such pattern is observed for the other trinucleotides ( figure d ). the association between cytosine deamination and cpg loss is further underlined by the rapid, progressive increase in '-ucg- ' > '-uug- ' changes compared to other '- for all di-, tri-, and tetra-nucleotide motifs containing c in the reference genome, the noted. 
the ratio between these two measures was compared to the expected ratio, defined as the number of c's with a c-to-u substitution divided by the total number of c's in the genome. the probability was calculated using a binomial distribution. all statistical tests were performed using rstudio v. . . (booth et al. ) . folding potential the reference sequence was divided into overlapping -nucleotides windows, each shifted by nucleotides. these sequences were folded using rnafold (lorenz et al. we thank all laboratories which have contributed sequences to the gisaid database and zhadyra yerkesh for giving her comments and helpful discussions. the data underlying this article are available in gisaid, at https://gisaid.org. the id numbers of genomes used are provided in table s . upa dinucleotide frequencies on rna virus replication and characterization of the innate cellular pathways underlying virus attenuation and enhanced replication imbalanced host response to sars cov- drives development of covid- rstudio: integrated development for r diverse mechanisms used by cellular restriction factors to inhibit virus infections the heterogeneous landscape and early evolution of pathogen-associated cpg dinucleotides in sars-cov- evidence for host-dependent rna editing in the transcriptome of sars-cov- patterns of evolution and host gene mimicry in influenza and other rna viruses dinucleotide evolutionary dynamics in influenza a virus induced mutation of human immunodeficiency virus type- contributes to adaptation and evolution in natural infection snapshot: antiviral restriction factors viennarna package . molecular mechanism of rna recognition by zinc-finger antiviral protein mansky lm. . deamination hotspots among apobec family members are defined by both target site sequence context and ssdna secondary structure the zinc finger antiviral protein restricts sars-cov- . 
biorxiv evidence for strong mutation bias towards, and selection against, u content in sars-cov- : implications for vaccine design acute respiratory syndrome coronavirus sequence characteristics and evolutionary rate estimate from maximum likelihood analysis modeling the embrace of a mutator: apobec selection of other coronaviruses -causes and consequences for their short and long evolutionary trajectories the evolutionary pathway to virulence of an rna virus bieniasz pd. . cg dinucleotide suppression enables antiviral defence targeting non- self rna innate immune signaling induces high levels of tc-specific deaminase activity in primary monocyte-derived cells through expression of apobec a isoforms human sars-cov- has evolved to reduce cg dinucleotide in its open reading frames extreme genomic cpg deficiency in sars-cov- and evasion of host antiviral defense multi-site co- mutations and 'utr cpg immunity escape drive the evolution of sars-cov- key: cord- -nsp lv authors: rath, soumya lipsa; kumar, kishant title: investigation of the effect of temperature on the structure of sars-cov- spike protein by molecular dynamics simulations date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nsp lv statistical and epidemiological data imply temperature sensitivity of the sars-cov- coronavirus. however, the molecular level understanding of the virus structure at different temperature is still not clear. spike protein is the outermost structural protein of the sars-cov- virus which interacts with the angiotensin converting enzyme (ace ), a human receptor, and enters the respiratory system. in this study, we performed an all atom molecular dynamics simulation to study the effect of temperature on the structure of the spike protein. after ns of simulation at different temperatures, we came across some interesting phenomena exhibited by the protein. 
we found that the solvent exposed domain of the spike protein, namely s , is more mobile than the transmembrane domain, s . structural studies implied the presence of several charged residues on the surface of the n-terminal domain of s which are optimally oriented at - °c. bioinformatics analyses indicated that it is capable of binding to other human receptors and should not be disregarded. additionally, we found that the receptor binding motif (rbm), present on the receptor binding domain (rbd) of s , begins to close at a temperature of around °c and attains a completely closed conformation at °c. the closed conformation abolishes its ability to bind to ace , because its receptor binding residues become buried. our results clearly show that there are active and inactive states of the protein at different temperatures. this would not only prove beneficial for understanding the fundamental nature of the virus, but would also be useful in the development of vaccines and therapeutics. graphical abstract highlights statistical and epidemiological evidence shows that external climatic conditions influence sars-cov infectivity, but we still lack a molecular level understanding of the same. here, we study the influence of temperature on the structure of the spike glycoprotein, the outermost structural protein of the virus, which binds to the human receptor ace . results show that the spike’s s domain is very sensitive to external atmospheric conditions compared to the s transmembrane domain. the n-terminal domain comprises several solvent exposed charged residues that are capable of binding to human proteins. the region is specifically stable at temperatures ranging around - °c. the receptor binding motif adopts a closed conformation at °c and completely closes at higher temperatures, making it unsuitable for binding to human receptors. severe acute respiratory syndrome coronavirus or sars-cov- attacks the cells of the human respiratory system.
recent studies have found that the virus also interacts with cells of the digestive system, renal system, liver, pancreas, eyes and brain [ ] . it is known to cause severe illness and is fatal in many cases [ ] . the virus is believed to have originated in bats, which act as the natural reservoir, and was subsequently transmitted to humans. it then gradually spread across almost all nations through airborne transmission, resulting in one of the worst global pandemics of this century [ ] . sars-cov- is one of the seven coronaviruses that affect the human population. the other known coronaviruses include hcov- e, hcov-oc , sars-cov, hcov-nl , hcov-hku and mers-cov [ , ] . the infections they cause range from the common cold to sars, mers or covid [ ] . these viruses have been observed to affect the human population predominantly during a particular season. for instance, the sars infections began during the cold winter of november and after eight months the number of reported cases became almost negligible [ ] . statistics show that countries with hot and humid weather conditions had fewer cases of sars [ ] . however, mers-cov, which was identified in the middle east, affected individuals during the summer [ ] . thus, the disease epidemiology suggests that each virus is prominent only in certain climatic conditions. the viability of sars-cov- was measured on different surfaces by chin et al., who found that the virus droplets survived at but were quickly deactivated at elevated temperatures of [ ] . smooth surfaces, plastics and iron show greater viability of the virus than paper, tissue, wood or cloth. surgical masks had detectable virus even on the th day [ , ] . soaps and disinfectants, which disintegrate the virus membrane and structural proteins, are a potent example of how the modulation of atmospheric conditions can affect virus viability. statistical reports by cai et.
al., and several others showed that tropical countries like malaysia, indonesia or thailand, with high temperature and high relative humidity, did not have major community outbreaks of sars [ , [ ] [ ] . although viruses cannot be killed like bacteria by autoclaving, temperature sensitivity of viruses has been reported several times in the past. seasonal rhinoviruses could not replicate at °c whereas - °c is ideal for their survival in the nasal cavity [ ] . influenza was found to be effective at a temperature around °c whereas higher temperatures of °c resulted in clumping of viruses on cell surfaces [ , , ] . similarly, the viability of the sars virus, which persisted for days at temperatures ranging between - °c and humidity, was lost when the temperature was raised to °c and % humidity [ ] . when the virus is exposed to different temperature conditions, the initial interactions with the atmosphere occur at the structural proteins. there are four major structural proteins present on the virus: the spike glycoprotein, the envelope protein, the membrane protein and the nucleocapsid. each of the proteins performs specific functions in receptor binding, viral assembly and genome release [ ] . one of the first and largest structural proteins of the coronavirus is the spike glycoprotein [ ] . the protein exists as a homotrimer in which each monomer consists of amino acid residues ( figure ) and the monomers are intertwined with each other. each monomer has two domains, namely s and s [ ] . the s and s domains are cleaved at a furin site by a host cell protease [ , ] . the s domain lies predominantly above the lipid bilayer. the s domain, which is a class i transmembrane domain, travels across the bilayer and ends towards the inner side of the lipid membrane [ ] . figure shows the two domains of the spike glycoprotein. the s domain comprises mostly beta pleated sheets. it can be further classified into the receptor binding domain (rbd) and the n-terminal domain (ntd).
the rbd binds to angiotensin converting enzyme (ace ) on the host cells [ ] . it lies at the top of the complex, where around residues from the rbd bind to the ace receptor of the host [ , ] . the ntd is the outermost domain; it is relatively more exposed and lies on the three sides, giving the protein a triangular shape when viewed from the top ( figure ) . the ntd has a galectin fold and is known to bind to sugar moieties [ ] . the s domain, on the other hand, is a transmembrane region with strong interchain bonding between the residues. it is mostly α-helical and forms a triangle when viewed from the bottom, though the top and bottom triangles do not overlap. temperature is a very significant variable for proteins because proteins respond differently in high and low temperature conditions. many proteins have high thermal stability while others can unfold or even denature at high temperatures [ , ] . during november, , when the first outbreak of covid was reported, the temperature in wuhan, china was around in the morning and at night. tropical countries such as india, where a large number of cases still persist, had temperatures of over [ , ] . although statistical and experimental evidence shows that temperature influences the activity and virulence of the virus, we still lack an understanding of the molecular level changes that take place in the virus under different weather conditions. to date, there is no concrete evidence on whether atmospheric conditions actually influence the structure of the virus. here, using all atom molecular dynamics (md) simulations, we explore the dynamics of the spike glycoprotein of sars-cov- at different temperatures. this is the first molecular study of the environmental influence on the protein structure. results suggest that the s domain is more flexible than s . in the s domain, we observed the sensitivity of the receptor binding motif to different temperatures.
we also found that the n-terminal domain of the protein has the potential to bind to different human receptors. the study will not only help us understand the nature of the virus but will also be useful for designing effective therapeutic strategies. the crystal structure of the spike glycoprotein (pdb: vxx) was found to have missing residues. thus, for our study we used the complete model of the trimeric spike protein generated by zhang et al., which had a template modeling score of . [ ] . the model was devoid of the n-acetyl glucosamine (nag) sugar moieties that are known to bind and stabilize the protein. the envelope lipid bilayer was not included in this work, to avoid a large system size in the atomistic simulations. after initial minimization and equilibration, we generated five different systems having temperatures ranging from to at an interval of degrees. this was done to maintain the uniformity of the simulations, with temperature as the only variable that differed. in addition, a temperature of was also imposed on the system to observe any possible deformation in the structure of the spike protein, although this high temperature does not realistically imitate environmental conditions (table s ). the production run of ns was carried out in the isothermal isobaric (npt) ensemble. after performing ns of classical molecular dynamics simulations, the root mean square deviation (rmsd) of the trajectory, with respect to the starting structure, was calculated to check whether the systems had attained stability. figure s shows the complete rmsd of all the systems at different temperatures. it can be seen that stability was attained within the first ns of the simulation time, thus indicating that the systems are well equilibrated. the rmsd values lie between nm for all the systems, with an exception at , where a marginally higher rmsd was seen after ns of simulation time. at temperatures and , a small rise in the rmsd curves after ns of simulation time was observed.
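the rmsd stability check described above can be sketched in plain python. this is a minimal illustration under stated assumptions, not the authors' analysis pipeline: frames are assumed to be already superposed on the starting structure (trajectory tools normally do this fit before computing rmsd), coordinates are in nm, and the names `rmsd`, `rmsd_series` and `plateau_reached` as well as the window and tolerance parameters are hypothetical.

```python
import math
from typing import List, Sequence

Coord = Sequence[float]   # one atom: (x, y, z) in nm
Frame = Sequence[Coord]   # one snapshot: all atoms


def rmsd(frame: Frame, reference: Frame) -> float:
    """Root mean square deviation between two pre-aligned frames."""
    if len(frame) != len(reference) or not reference:
        raise ValueError("frames must be non-empty and contain the same atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))


def rmsd_series(trajectory: Sequence[Frame]) -> List[float]:
    """RMSD of every frame with respect to the first (starting) frame."""
    start = trajectory[0]
    return [rmsd(frame, start) for frame in trajectory]


def plateau_reached(series: Sequence[float], window: int, tol: float) -> bool:
    """Assumed criterion: the last `window` RMSD values vary by less than `tol`."""
    tail = series[-window:]
    return max(tail) - min(tail) < tol
```

applied per system (one trajectory per temperature), a flat tail in `rmsd_series` corresponds to the "well equilibrated" behaviour described in the text, while a late rise shows up as a failed `plateau_reached` check.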
this implies that the spike protein was more stable at temperatures and . since the protein comprises two distinct domains, s and s , we checked the rmsd of the s and s domains individually, with respect to the starting structure, to understand the cause for the higher rmsd values observed at and ( figure ). the rmsd values of the s domain at and were found to be around nm, nearly nm more than in the simulations at and , respectively. a similar trend was observed in the rmsd of the s domain, but the difference in values was only nm. although in this study we have not considered the bilayer lipid membrane of the sars-cov- envelope inside which the spike glycoprotein resides, the s domain shows remarkable stability in its rmsd values ( figure ). the stability of the s domain can be attributed to the strong interchain interactions within the highly α-helical s domain. since the spike protein is a homotrimer, the s domain of the individual chains was also checked to account for the difference in fluctuations. figure (c) -(e) shows the rmsd of the s domain of chains a, b and c at different temperatures. in chain a, it can be clearly seen that the rmsd is quite high at temperatures of °c and , respectively. at °c, however, the fluctuations are negligible and the system is very stable. similarly, for chain b at and the chains were stable. in the s domain of chain c, except for the simulation at °c, the rmsd values were found to be quite low at all other temperatures along the length of the simulation time. the above data indicate that the protein chains, especially the s domains, are quite flexible around temperatures of - °c in comparison to the low temperature of or the high simulation temperature of . irrespective of the presence of the bilayer membrane, at different temperature conditions the stalk of the spike protein remains stable. b.
domain flexibility of s is more pronounced. in order to identify the region on the spike protein that causes the deviations in rmsds, we the rmsf of s domain on the other hand shows marked stability compared to domain s ( figure s ). this is in good agreement with our earlier observations of the rmsd of the s domain. since it is a triple helical coil, the coiled-coil motif of the s domain, which is further supported by three shorter helices, supports domain stability [ ] . however, the c-terminal residues - show greater flexibility compared to the rest of the domain. it should be noted that the c-terminal region of the spike glycoprotein is exposed towards the inner side of the envelope bilayer and does not participate in the interchain interactions. it also has a more relaxed packing compared to the rest of the s [ , ] . although the ntd is not known to directly bind to the receptor, in mouse hepatitis coronavirus it was found that the ntd binds to a ceacam a receptor [ ] . similarly, vaccines developed against the ntd of the spike protein in mice showed that the ntd can also be a potential therapeutic target [ , ] . moreover, comparison between bovine coronavirus and the bovine hemagglutinin-esterase enzyme indicated a close evolutionary link between the virus and the host proteins, which could facilitate attachment in the host cells [ ] . we performed a multiple the ntd is relatively more exposed to solvents and more susceptible to external environmental conditions. however, unlike the rbd, the ntd does not have a defined open or closed conformation. the coronavirus ntd is composed of a three layered beta-sheet sandwich with , and antiparallel β strands in each layer, making a total of β strands, with prominent β hairpin loops ( figure s ). the crystal structures of the mouse hepatitis coronavirus (mhc) spike protein and its receptor show that the β and β of the ntd are the binding motif for the ceacam a protein [ ] .
however, unlike the mhc ntd, the arrangement of strands in sars-cov- is in the opposite direction. the upper layer of the beta sandwich is composed of beta strands β β β β β β β ( figure s ). the three prominent regions which are exposed to the solvent and capable of interacting with potential receptors are the n-terminal β strand and the β -β , β -β and β -β loops. comparison of the ntd at different temperatures ( figure ) shows a differential arrangement of the solvent exposed loops. the loops are formed by residues from the n-terminal β strand, β -β , β -β and β -β . the time averaged conformation of the loops after ns of simulation shows that the loops are oriented close to each other at temperatures - °c; however, at °c and °c they move farther away from each other. since there was similarity to the ephrin a proteins that bind to the ephrin a receptors, we compared the residues involved in protein-protein interaction in the crystal structure of the human epha ectodomain in complex with human ephrin a (pdb id: bka). there are three salt bridges and seven hydrogen bonds between the ephrin protein and its receptor. moreover, it can be clearly seen that the ntd loops host a large number of polar residues ( figure ) . these residues form a stable motif at temperatures - °c, primarily due to the stability between the loops. at °c and °c, a hydrophobic patch from the n-terminal β strand is exposed towards the solvent. the polar residues from β -β and β -β move away from the n-terminal β strand and the β -β loop, reducing the possibility of protein-protein interaction. hence, a strong possibility exists for the ntd to act as a protein binding site at lower temperature ranges. from the bioinformatics and structural analyses, we observed that the ntd not only acts as a glycan binding site but can also act as a site for binding of several human proteins.
the motif formed by several polar residues on the solvent exposed loops at - c could form salt bridges and hydrogen bonds with partner proteins. at higher temperatures, the propensity to form such interactions would be lost owing to the differential orientation of the loops. nonetheless, the ntd could act as a possible target for the development of vaccines and inhibitors.

the receptor binding domain (rbd) of the spike glycoprotein is a potential target for vaccine and drug development [ , ] . it is highly conserved among the human coronaviruses and binds to the ace receptor present on lung tissues [ ] . residues - of the rbd comprise the receptor binding motif (rbm). the rbm has residues which are identical, and this conserved region primarily interacts with the ace receptor; hence, scientists often target the rbd for developing therapeutic agents [ , , and ] . earlier, in figure , we saw that the rbd, spanning residues - , shows higher stability than the ntd of the s1 domain across the range of temperatures. we compared the time averaged conformation of the rbd generated from the last ns of the simulation at different temperatures ( figure ). the core β pleated sheet was very stable, showing no loss of secondary structure at higher temperatures. however, the rbm (highlighted in magenta in figure ) shows a very dynamic conformation across the temperature range. the dynamics was more pronounced at c, c and c, whereas at c and c the rbm had a more confined conformation. the rbd flexibility was more apparent at c and c, where the three chains moved further away from each other. however, a tighter and well packed structure was found for the protein at c. the figures suggest that although residue-wise movements in the rbd were not visible in the rmsf (figure ), the rbd domains and motifs show intrinsic flexibility along particular temperature ranges.
previous studies have indicated that the rbd can adopt either an open or a closed conformation in the virus [ ] . we examined the conformation in the spike protein-ace crystal structure and found that in the open conformation, the rbd exposes its rbm residues phe , ala , phe , asn , tyr , gln , gly , gln , thr , asn , gly and tyr to facilitate the binding of the receptors. it is fascinating to see that at c, and more interestingly at c, the rbm is in a closed loop conformation and very compact, which hinders its association with partner proteins. to validate the findings, we ran another simulation of the spike protein at a higher temperature of c. after ns of simulation, we found significant similarity between the closed conformation observed at c and the conformation at c. the rbm residues, specifically phe , ala , phe , asn , tyr , gln , gly , gln , thr , asn , gly and tyr , were found to be clearly buried between the interchain subunits at c (figure ,

of the coronaviruses inside the host cells. it exists as a homotrimer and is partly exposed to the outer environment and partly immersed inside the lipid bilayer of the viral envelope. here, we studied the differential response of the spike protein at different temperature conditions. our results show that the s2 transmembrane domain remains stable even without the bilayer membrane, whereas the solvent exposed s1 domain is quite flexible. moreover, the s1 comprises two subdomains, namely the n-terminal domain (ntd) and the receptor binding domain (rbd). the simulation results show that the rbd is relatively less mobile. its flexibility is limited to the receptor binding motif, or rbm, which interacts with the angiotensin converting enzyme (ace ), its human receptor. however, the ntd was found to be quite mobile. although the ntd does not directly interact with the receptor in humans, it has been found to bind to receptors in other mammals [ ] .
the flexible ntd hosts a large number of charged residues on the top layer of its tri-layered beta sandwich architecture. however, at - c of temperature, the polar residues were found to be less solvent exposed. the similarity of the ntd sequence to several human proteins such as ephrins, briakinumab, anti-tslp, etc. indicated a possibility that the subdomain is involved in binding to alternate human proteins. the rbm present on the rbd is crucial in the initial protein-protein interaction between the host and the virus. we found that this domain is

equilibration, by maintaining harmonic restraints on the protein heavy atoms, the system was gradually heated to k in a canonical ensemble. the harmonic restraints were gradually reduced to zero and the solvent density was adjusted under isobaric and isothermal conditions at atm and k. this was followed by ps nvt and ps npt equilibration with harmonic restraints of kj mol - nm - on the heavy atoms. the production run for all the systems was carried out for ns until it reached a stable rmsd. all simulations were carried out in gromacs with the amberff sb-ildn force field for proteins [ , ] . the long-range electrostatic interactions were treated using the particle-mesh ewald sum, and shake was used to constrain all bonds involving hydrogen atoms. after equilibration, systems were heated or cooled to different temperatures (table s ) and simulated for ns. all analyses were carried out using gromacs analysis tools [ ] . protein blast was used to search for similar sequences in the human proteome. the blast tree view widget helped us generate the phylogenetic tree, which is a simple distance based clustering of the sequences based on the pairwise alignment results of blast relative to the query sequence [ ] . vmd was used for visualization of results and generation of figures [ ] . supporting figures, figs s -s and table s are provided online.
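the rmsd and rmsf quantities discussed in the results can be illustrated with a minimal sketch. it is not the gromacs analysis used in the study: it assumes a trajectory already superimposed on a common reference and stored as a numpy array of shape (frames, atoms, 3), and the function names `rmsf` and `rmsd` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def rmsf(traj):
    """Per-atom root mean square fluctuation.

    traj: array of shape (n_frames, n_atoms, 3), assumed to be
    pre-aligned to a common reference (an assumption of this sketch).
    """
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                    # time-averaged position per atom
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared displacement per frame, per atom
    return np.sqrt(sq_dev.mean(axis=0))             # average over frames, one value per atom


def rmsd(traj, ref):
    """Per-frame root mean square deviation from a reference structure.

    ref: array of shape (n_atoms, 3).
    """
    traj = np.asarray(traj, dtype=float)
    ref = np.asarray(ref, dtype=float)
    sq_dev = ((traj - ref) ** 2).sum(axis=2)        # squared displacement per frame, per atom
    return np.sqrt(sq_dev.mean(axis=1))             # average over atoms, one value per frame
```

rmsf averages over time for each atom (which residues move), whereas rmsd averages over atoms for each frame (how far the whole structure drifts), which is why the two can highlight different regions of the protein.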
a sars-cov- protein interaction map reveals targets for drug repurposing covid- : a novel zoonotic disease caused by a coronavirus from hina: hat we know and what we don't i ro iol ust from sars to mers: evidence and speculation genetic recombination, and pathogenesis of coronaviruses the effects of temperature and relative humidity on the viability of the sars coronavirus stability of sars-cov- in different environmental conditions effects of air temperature and relative humidity on coronavirus survival on surfaces aerosol and surface stability of sars-cov- as compared with sars-cov- influence of meteorological factors and air pollution on the outbreak of severe acute respiratory syndrome an initial investigation of the association between the sars outbreak and weather: with the view of the environmental temperature and its variation temperature-dependent innate defense against the common cold virus limits viral replication at warm temperature in mouse airway cells roles of humidity and temperature in shaping influenza seasonality temperature-sensitive viral infection: inhibition of hemagglutinating virus of japan (sendai virus) infection at ° highly heterogeneous temperature sensitivity of pandemic influenza a(h n ) viral isolates severe acute respiratory syndrome coronavirus (sars-cov- ): an overview of viral structure and host response structure, function, and evolution of coronavirus spike proteins structure, function, and antigenicity of the sars-cov- spike glycoprotein activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites structural and functional basis of sars-cov- entry by using human ace cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains structure of the sars-cov- spike receptor-binding domain bound to the ace receptor thermal stability of globins: implications of flexibility and heme coordination studied by molecular dynamics simulations 
structural flexibility and protein adaptation to temperature: molecular dynamics analysis of malate dehydrogenases of marine molluscs effects of temperature and humidity on the daily new cases and new deaths of covid- in countries will coronavirus pandemic diminish by summer? how significant is a protein structure similarity with tm-score = . ? the protein data bank nuc cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer identification of the membrane-active regions of the severe acute respiratory syndrome coronavirus spike membrane glycoprotein using a / -mer peptide scan: implications for the viral fusion mechanism j structure of mouse coronavirus spike protein complexed with receptor reveals mechanism for viral entry purified coronavirus spike protein nanoparticles induce coronavirus neutralizing antibodies in mice the recombinant n-terminal domain of spike proteins is a potential vaccine against middle east respiratory syndrome coronavirus (mers-cov) infection immunogenicity and protection efficacy of monomeric and trimeric recombinant sars coronavirus spike protein subunit vaccine candidates covid- and multiorgan response receptor-binding domain of sars-cov spike protein induces highly potent neutralizing antibodies: implication for developing subunit vaccine characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine structure, function, and evolution of coronavirus spike proteins receptor recognition mechanisms of coronaviruses: a decade of structural studies gromacs development team improved side-chain torsion potentials for the amber ff sb protein force field database resources of the national center for biotechnology information key: cord- -eui zyg authors: chandler-brown, devon; bueno, anna m.; atay, oguzhan; tsao, david s. 
title: a highly scalable and rapidly deployable rna extraction-free covid- assay by quantitative sanger sequencing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: eui zyg there is currently an urgent unmet need to increase coronavirus disease (covid- ) testing capability to effectively respond to the covid- pandemic. however, the current shortage in rna extraction reagents as well as limitations in qpcr protocols have resulted in bottlenecks in testing capacity. herein, we describe a novel molecular diagnostic for covid- based on sanger sequencing. this assay uses the addition of a frame-shifted spike-in, a modified pcr master mix, and custom sanger sequencing data analysis to detect and quantify sars-cov- rna at a limit of detection comparable to existing qpcr-based assays, at - genome copy equivalents. crucially, our assay was able to detect sars-cov- rna from viral particles suspended in transport media that was directly added to the pcr master mix, suggesting that rna extraction can be skipped entirely without any degradation of test performance. since sanger sequencing instruments are widespread in clinical laboratories and commonly have built-in liquid handling automation to support up to samples per instrument per day, the widespread adoption of qsanger covid- diagnostics can unlock more than , , tests per day in the us. coronavirus disease (covid- ) caused by severe acute respiratory syndrome coronavirus (sars-cov- ; covid- ) is a public health emergency of international concern. this virus first emerged in wuhan, china in december [ ] , and spread to over countries [ ] . as of april , , more than . million individuals are infected and over , deaths are reported [ ] . one study found that undocumented infections were the infection source for % of documented cases [ ] . evidence from other reports suggests that some asymptomatic or minimally symptomatic patients have high levels of viral shedding [ , ] . 
the substantial number of undocumented infections, along with the nonspecific clinical features and uncertainties in the transmissibility and virulence of sars-cov- , have presented significant challenges in containment of this virus. until an effective vaccine or treatment is available, rapid, accurate, and widely available diagnostic tests, not only for symptomatic patients but also for those who have come in contact with any positive cases, are critical to curb the spread of the virus. however, such tests have been in limited supply in the us and many other countries in the world. the backbone of covid- diagnosis has been quantitative polymerase chain reaction (qpcr) tests. the limited number of qpcr machines and the limitations of qpcr protocols, which do not have safe stopping points or a readily automatable process, have hampered large-scale testing. furthermore, the sensitivity of qpcr requires specific rna extraction reagents that are currently in short supply. other diagnostics for covid- include blood-based immunoassays, which are often less accurate than molecular assays and indirectly detect the immune response to covid- only after symptoms appear, and crispr-based assays that are still in development [ ] . in this report, we describe a molecular diagnostic method for covid- based on quantitative sanger (qsanger), a technique for reconstructing quantitative data from sanger sequencing chromatograms that we originally developed and validated for non-invasive prenatal testing (nipt) of cell-free dna (cfdna) [ ] . the workflow of qsanger-based covid- testing is highly similar to traditional sanger sequencing of reverse transcription (rt)-pcr amplicons (fig. ), but it includes the addition of a frame-shifted synthetic covid- spike-in dna in the reaction master mix. qsanger covid- is designed to support one-step rt-pcr directly from the viral transport media (vtm) of specimens, without an rna purification step (fig. a) .
the amplification products are then purified and sanger sequenced by automated capillary electrophoresis. synthetic dna included in rt-pcr master mix prior to pcr amplification serves as an internal control that enables specimens to be readily identified as either positive or negative for covid- ( fig. b-c) . quantitative analysis of the sanger sequence chromatogram gives qsanger an extremely high sensitivity and specificity for all positive results with a limit of detection of - genome copy equivalents (gce), equivalent to gold-standard qpcr methods. furthermore, the presence of a spike-in as an intra-sample control in the qsanger assay allows for easy interpretation of results and determination of sources of error (e.g. extraction or amplification or sequencing failure), and also allows population-level analyses such as mutational analysis and contact tracing. in addition, the ratio of the amplitudes of corresponding bases between the endogenous and spike-in sequences at offset positions reflects the ratio of the molecular abundances of the two sequences. computationally combining the amplitude ratios of multiple corresponding bases can then be used to estimate the viral load over a -fold dynamic range with poisson-limited coefficient of variation. as an initial demonstration of qualitative detection of covid- by qsanger, we designed pcr primers and synthetic dna spike-in to target sars-cov- n protein (fig. b) . a one-step rt-pcr mix (neb) containing both was used to perform reverse-transcription of sars-cov- rna and subsequent pcr amplification in one pot. in each rt-pcr, we added either nuclease-free water as a no-template control (ntc) or - gce of synthetic sars-cov- rna (twist biosciences). all reactions also contained~ gce of spike-in in the rt-pcr master mix. after rt-pcr and sanger sequencing, we observed a qualitatively clean chromatogram for the spike-in sequence for the ntc condition in which no sars-cov- rna was added ( fig. a ). 
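the idea of combining amplitude ratios across multiple corresponding bases can be sketched as a least-squares slope through the origin. this is a simplified illustration, not the authors' analysis code: the methods mention robust linear regression in r, which this sketch does not reproduce, and the function names `qsanger_ratio` and `estimate_copies` as well as the input layout (paired peak heights per base position) are assumptions made here.

```python
def qsanger_ratio(target_peaks, spike_peaks):
    """Slope of target peak heights against spike-in peak heights.

    Fits target = ratio * spike through the origin by ordinary least
    squares (an assumption of this sketch; a robust fit would down-weight
    outlier positions).
    """
    if len(target_peaks) != len(spike_peaks) or len(target_peaks) == 0:
        raise ValueError("need equal-length, non-empty peak lists")
    sxx = sum(s * s for s in spike_peaks)
    if sxx == 0:
        raise ValueError("spike-in signal is zero; ratio is undefined")
    sxy = sum(t * s for t, s in zip(target_peaks, spike_peaks))
    return sxy / sxx


def estimate_copies(ratio, spike_in_copies):
    """Convert a ratio to target copies, given the known spike-in input."""
    return ratio * spike_in_copies
```

because the spike-in copy number is known, the fitted ratio converts directly into an absolute estimate of target copies, which is the property the text relies on for quantification.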
at the gce level, the sanger chromatogram showed clear mixed bases corresponding to approximately equal abundance of spike-in dna and sars-cov- rna. at gce sars-cov- rna, the chromatogram exhibited a relatively pure trace for the sars-cov- target sequence, suggesting that the sars-cov- signal overwhelmed the spike-in signal when it was present at -fold greater abundance. to determine the limit of detection of qsanger covid- , we performed assays on dilutions of sars-cov- rna corresponding to gce, gce, gce, gce, or gce. four replicates at each dilution were assayed by both qsanger and qpcr. as expected, all replicates of gce were negative for covid- by qpcr, and addition of or more sars-cov- gce produced a clear logarithmic decrease in qpcr cycle threshold (ct) (fig. b) . similarly, no sars-cov- sequence was apparent on sanger chromatograms for the ntc condition. no spike-in sequence was qualitatively discernible at the gce dilution. mixed bases were clearly present for both the gce and gce conditions, suggesting that the limit of detection is about gce sars-cov- . we developed custom bioinformatic analyses to extract the relative abundance of sars-cov- and spike-in amplified products from sanger chromatograms and to automate analysis of qsanger chromatograms (see methods). briefly, peak amplitudes were assigned to either the spike-in or sars-cov- sequence at each base position, and a linear regression analysis was performed to determine the qsanger ratio between sars-cov- and spike-in trace intensities. qsanger ratios near were recovered in the samples with gce, indicating the complete absence of sars-cov- rna (fig. c ). since all of the samples with sars-cov- rna at gce or more had qsanger ratios of % or greater, this provided further evidence that the limit of detection of qsanger covid- is~ gce or fewer. quantitative analysis of chromatogram peak heights was able to recover a qsanger ratio for even the gce condition, and the qsanger ratios had excellent linearity over - gce (fig.
c) . furthermore, qsanger ratios were in good agreement with qpcr ct values (fig. d ).

qsanger detects sars-cov- rna when amplified directly from viral particles in transport medium without rna purification.

since a major limitation for increasing testing capacity has been supply chain and lab workflow bottlenecks related to rna extraction, we next attempted to detect sars-cov- directly from the specimen matrix (viral transport medium). there has been a previous report that rt-qpcr can be successfully performed when up to - ul (total reaction volume of ul) of vtm without extraction is used as the template for rt-pcr [ ] . we hypothesized that qsanger could be a more reliable method for detection of covid- without rna extraction because of (i) increased robustness against pcr inhibitors in the specimen matrix, since quantification of sars-cov- is performed via comparison with the internal spike-in control; (ii) an improved limit of detection by adding more vtm to a correspondingly larger reaction size; and (iii) avoidance of false positives by examining sequencing data for the correct spike-in or sars-cov- sequence. to test this hypothesis, we obtained reference materials in which either sars-cov- (positive control) or human rna (negative control) is packaged inside viral particles and suspended in vtm (seracare). since polymerase mixes can have varying resiliency to pcr inhibitors, we also evaluated both luna rt-qpcr and onetaq rt-pcr kits. for both seracare negative and positive control specimens, eight replicates each were performed on the cross-product of conditions for luna vs onetaq polymerases and direct vtm vs purified rna, for reactions in total. ul of the seracare specimen (corresponding to gce) was added to a ul total reaction volume. additionally, replicates of no-template controls were performed for each polymerase, wherein nuclease-free water was added to the reaction. all ntc samples across all conditions were negative by qsanger assay and analysis (fig.
a) . nearly all seracare negative control samples were determined to be negative by qsanger; indeterminate results were obtained from purified rna luna specimens and purified onetaq specimen. all seracare positive controls were identified by qsanger except for a onetaq direct vtm specimen (fig. a ). samples were classified as undetermined due to low chromatogram signal intensity (signal to noise ratio < ) or lack of sequence alignment. possible reasons for undetermined chromatograms could be failure in rna extraction, pcr amplification, or cycle sequencing. since the majority of undetermined specimens used purified rna, the possibility that the majority of assay failures were due to the rna extraction process itself, perhaps by carryover of high salt buffers, should be considered. to further evaluate the feasibility of a direct vtm, extraction-free method for sars-cov- detection, we also examined the ability of qsanger to quantify the amount of viral particles in the seracare positive control specimens (fig. b ). since each reaction contained gce of sars-cov- and gce of spike-in, we expected to measure a qsanger ratio of . . the onetaq results yielded a qsanger ratio consistently around - . . this discrepancy could be due to a slightly more efficient amplification of the sars-cov- sequence compared to spike-in sequence. remarkably, luna polymerase mix yielded a qsanger ratio of . +/- . s.e.m. for vtm and a qsanger ratio of . +/- . s.e.m. for purified rna, which is very close to the expected ratio. the~ % difference is on par with typical imprecisions for pipetting and dna quantification. the coefficient of variation (cv) for the vtm and rna purified luna assays were % and %, respectively. notably, this is in good agreement with the imprecision associated with measuring~ molecules at the poisson limit. 
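the poisson-limit argument can be made concrete with a short sketch. for a count of n independent molecules the standard deviation is sqrt(n), so the coefficient of variation is 1/sqrt(n). the molecule counts below are hypothetical placeholders, since the values used in the study are not recoverable from this text, and the ratio formula is a first-order approximation for two independent counts.

```python
import math


def poisson_cv(n_molecules):
    """Coefficient of variation of a Poisson count with mean n_molecules."""
    if n_molecules <= 0:
        raise ValueError("n_molecules must be positive")
    return 1.0 / math.sqrt(n_molecules)


def ratio_cv(n_target, n_spike):
    """Approximate CV of the ratio of two independent Poisson counts.

    First-order (delta-method) approximation: relative variances add.
    """
    return math.sqrt(poisson_cv(n_target) ** 2 + poisson_cv(n_spike) ** 2)


if __name__ == "__main__":
    # hypothetical inputs, chosen only to show the scaling
    for n in (25, 100, 400):
        print(n, round(poisson_cv(n), 3))
```

quadrupling the number of input molecules halves the counting noise, so at low copy numbers the sampling of molecules into the reaction sets a floor on the achievable precision.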
since luna exhibited better accuracy and precision compared to onetaq, and the luna direct vtm method resulted in correct calls for all ntc, seracare positive, and negative specimens without any failed reactions, subsequent experiments were performed with luna polymerase mix. finally, we sought to demonstrate that omitting rna extraction does not adversely affect qsanger sensitivity. we added gce (corresponding to x the lod in fig. ) of viral particles in vtm containing either negative control or sars-cov- rna (fig. ) . sanger chromatograms clearly showed the absence and presence of sars-cov- signal in the negative and positive controls, respectively (fig. a ). the qsanger assay correctly identified the negative and positive samples in out of the samples tested, with negative control specimen returning an undetermined result due to sequencing failure (fig. b) . overall, the excellent performance shown by our qsanger results on unextracted vtm vs. purified rna with respect to absolute quantification accuracy, poisson-limited coefficient of variation, and limit of detection that is comparable to gold-standard rt-qpcr, suggests that qsanger can be performed on unprocessed specimen matrix without any loss in performance. in fact, it might be possible that rna-extraction free methods are more reliable because it eliminates the carryover risk of high-salt, pcr-inhibiting buffers used in rna extraction procedures. the covid- pandemic is one of the most deadly and disruptive public health emergencies that we have had to face in the last hundred years. to curb its rapid spread, most countries issued shelter-in-place rules which were able to flatten the pandemic curve, but at a significant economic cost [ ] . 
social distancing interventions can only be relaxed if a vaccine is developed, which is estimated to take months to develop and widely distribute, or after widespread and rapid testing can be performed on symptomatic patients as well as anyone who has been identified as likely to have come into contact with those who test positive [ ] . this suggests that for effective contact tracing, up to additional people may need to be tested for each positive case. however, on april , , the us daily testing capacity was only~ , to~ , tests when , patients tested positive [ ] . this suggests that us testing capacity might need to be increased by a factor of x or more for effective contact tracing. however, most covid- tests that are currently available rely on variations on qpcr and thus suffer from the same supply chain issues. in particular, the limited availability of rna extraction kits has been an important problem for increasing covid- testing capacity [ ] . here, we describe a novel quantitative sanger (qsanger) assay that can detect covid- without rna extraction. we show that qsanger performs as well as qpcr in estimates of viral rna abundance and consistently detects as low as - viral genome copy equivalents, even when vtm is added directly into the reaction mix without rna extraction. because qsanger is an end-point pcr reaction with an internal spike-in control, it is more robust to inhibitors that can exist in vtm, and failures in amplification result in undetermined results that require a repeat reaction, as opposed to false negatives that would be obtained by qpcr. it also has higher specificity than qpcr, as the examination of the sequencing information can be used to distinguish similar viruses and rule out false-positives due to non-specific amplifications. since qsanger has an extremely high specificity enabled by the sequencing information, it can be used for routine testing of asymptomatic individuals with high positive predictive value (ppv). 
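the link between specificity and positive predictive value when screening asymptomatic individuals follows from bayes' rule and can be sketched as below. the sensitivity, specificity and prevalence figures are hypothetical inputs for illustration and are not measurements from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # hypothetical screening scenario with low prevalence
    for spec in (0.99, 0.999, 0.9999):
        ppv = positive_predictive_value(0.99, spec, 0.01)
        print(f"specificity={spec}: ppv={ppv:.3f}")
```

at low prevalence, false positives from even a small loss of specificity can rival the true positives, which is why a highly specific assay matters for routine testing of asymptomatic individuals.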
this can be a new paradigm of routine and repeated testing of individuals who are at high risk of contracting the disease, e.g., hospital staff, or those who are older or have comorbidities. early detection can improve individual healthcare outcomes and also enable relaxation of population-scale non-pharmaceutical interventions like social distancing measures. in addition to not requiring rna extraction kits, our qsanger-based covid- assay has a number of further advantages compared to existing qpcr-based tests for covid- . importantly, qsanger thermal cycling occurs in higher-throughput end-point pcr instruments, rather than specialized qpcr instruments, and the sequencing can be run in automated sanger sequencers with plate feeders, such as applied biosystems xl dna analyzers, which have the capacity to sequence samples per day. the large existing install base of end-point pcr and high-throughput sanger instruments throughout the us and the world [ ] supports rapid scale-up of qsanger-covid- assays without requiring any new device or instrument manufacturing. given that sanger sequencing is still the most widely used method of clinical sequencing worldwide, the widespread adoption of the qsanger-covid- assay described here can create a covid- testing capacity of > m tests per day. more broadly, qsanger can enable an even higher volume of population-scale testing if clinical laboratories and sanger sequencing centers are allowed to collaborate for covid- testing. while sanger sequencers exist in all molecular diagnostic laboratories, they are most commonly utilized in high volume in genome centers, academic sequencing core facilities, and commercial sanger sequencing service laboratories. in this model, clinical laboratories could buy -well master-mix reaction plates that simply require the addition of each patient sample into a reaction well in a biosafety cabinet followed by pcr thermocycling, whereas the sequencing service laboratory would sequence the samples for next-day results.
this would enable a rapidly deployed and distributed population-scale testing. qsanger also has a number of other advantages that may prove to be invaluable as we learn more information about sars-cov- . since sars-cov- mutates quickly, the availability of sequence information can be used to identify growing clusters of mutations and aid with contact tracing via phylogenetic analysis of the mutation data. longer sequences can be designed to capture a wide range of mutations in the qsanger reaction, as the virus mutates and creates sub-strains with different clinical implications. moreover, as opposed to relative measurements obtained by qpcr, absolute measurements of viral load are obtained in qsanger due to the known molecular count of the spike-in. quantification of viral abundance in a sample may prove to be useful for determining who is infectious, as well as for more accurate environmental monitoring. the quantitative dynamic range of qsanger can be broadened from - gce to - , , gce by employing two qsanger reactions with different molecular levels of spike-ins. at the time of the human genome project, sanger instruments were the automated state-of-the-art devices that generated more sequencing information in a single day than entire laboratories could produce in a year [ ] . they accelerated the release of the first human genome by five years and ushered in an era with exponentially decreasing sequencing costs [ ] [ ] . with the qsanger method presented here, they can be repurposed to enable the millions of covid- diagnostics needed to suppress this global pandemic. primer and spike-in: spike-in sequences were designed using the viral genomic region corresponding to the cdc designed n qpcr assay. spike-in molecules have sequence identical to sars-cov- sequence (lc ) including base positions to but lacking bases - . primers flanking the deletion were used for amplification. 
sequencing was performed using a primer containing the forward amplification binding region.

synthetic sars-cov- genomic rna from twist biosciences was used for rna detection linearity experiments. the accuplex sars-cov- reference material kit manufactured by seracare (cat. # - ) was used as a proxy for clinical samples. viral rna purification was performed using the purelink viral rna/dna mini kit from thermofisher scientific using µl (at gce/ml) of accuplex positive or negative sample. gce inputs from purified rna were estimated from the corresponding fraction of eluent, assuming % recovery.

qpcr was performed using the n primers and taqman probes provided in the -ncov ruo kit manufactured by idt (cat. # ). amplification was performed as described in the cdc eua protocol. briefly, µl of synthetic rna template diluted in rnase-free te + . % tween- to , , , or gce/µl was added to each reaction. rna samples were combined with water, taqpath -step rt-qpcr master mix, primers ( . µl to a final concentration of nm), and probes to a total final volume of µl. the reaction mixture was amplified and probe fluorescence was detected using a mastercycler ep realplex real-time pcr system. estimation of the first cycle above threshold (ct) was performed with default settings using realplex software.

reverse transcription and amplification for figure was performed using the onetaq one-step rt-pcr kit from neb (cat. # e s). both the onetaq one-step rt-pcr kit and the luna universal one-step rt-qpcr kit (cat. # e e) were used for figure . figure was performed exclusively with the luna universal one-step rt-qpcr kit. buffer and enzyme were used according to manufacturer recommendations for µl total volume. all reactions contained tween- at a final concentration of % v/v, nm final concentration of each amplification primer, and gce of synthetic dsdna spike-in molecules.
synthetic rna samples were added at µl/reaction and synthetic sample was added to achieve the appropriate number of viral particles. thermocycling was performed using an applied biosystems veriti thermal cycler with the following cycling programs:

for concordance calls, sanger sequencing was analyzed using the following procedure. the primary base sequence from automatic base calling was aligned to the viral genome. if the aligned sequence matches the viral genomic sequence without any deletion, the sample is called positive for viral rna. if the sequence does not match the reference, the signal-to-noise ratio is checked; any value less than indicates insufficient signal and returns an indeterminate result. if the signal-to-noise ratio is greater than , the ratio of genomic sequence to spike-in sequence is quantified by performing robust linear regression of genomic peak heights against spike-in peak heights. quantitation for figure was performed with the same quality check as above, but quantifying the terminal bases of the reference and spike-in sequences for all primary sequences, regardless of whether genomic, spike-in, or mixed sequence dominates. all analysis was performed using custom scripts in r, employing the seqinr and tidyverse packages.

since the spike-in is an internal control for amplification and sequencing, observing the spike-in signal only indicates that sars-cov- rna was absent, and therefore the specimen is negative for covid- . in contrast, observing sars-cov- signal only indicates that sars-cov- rna was so abundant in the specimen that it is above the quantifiable range. in a mixed chromatogram of spike-in and sars-cov- , the abundance of sars-cov- rna is determined by the relative contributions of sars-cov- and spike-in signal intensities. ( b -d ) synthetic viral genomic rna was added to rt-pcr reactions at , , , and gce. the same dilutions were subjected to qsanger testing and rt-qpcr.
( b ) rt-qpcr exhibits a linear estimate across the dilution series, consistent with previous results. ( c ) across the same dilution series, the ratio (r) of genomic sequence to spike-in sequence scales with rna added. ( d ) when the qpcr estimate of abundance is compared to qsanger estimates of abundance, they exhibit a strong linear relationship, indicating that qsanger performs as well as qpcr in estimates of viral rna abundance. all results were concordant with seracare and ntc. three samples were no-calls (undetermined) due to low signal-to-noise ratio in the sequencing results. ( b ) positive seracare samples were added to rt-pcr master mix either directly from the vtm or after purification with rna extraction kit at gce. the ratio of reference and spike-in intensities were measured by custom data analysis. the mean qsanger ratio was . (± . s.e.m., n= ) for direct addition, and . (± . s.e.m., n= ) for purified. the coefficient of variation (cv) of positive seracare samples were measured for both luna and onetaq polymerase mixes. the cv for luna direct vtm was . % (n= ), and for luna purified was . % (n= ). this is consistent with the theoretical counting noise associated with quantifying ~ molecules. a novel coronavirus from patients with pneumonia in china covid- ) world map mers: ( title-abs-key ( ( ( "middle east respiratory syndrome" or "mers" ) and ( coronavirus ) ) or ( "mers-cov" or "mers virus" or "mers disease" or "middle east respiratory syndrome virus" or "middle east respiratory syndrome disease" ) ) and not title-abs-key ( ( "ncov" or "covid- " or "covid " or "sars-cov" or "sars-cov- " or "sars" or "severe acute respiratory syndrome" ) ) ) and pubyear > covid- : title-abs- key ( "covid- " or "covid " or "coronavirus disease " or " ) and pubyear > the search was last time updated on may where it returned , items on sars, , items on mers and , items on covid- . 
figures , and show the distribution of the studies on sars, mers and covid- , respectively, across subject areas. figure (a) also shows the composition of the covid- literature in terms of the document types, demonstrating that only nearly % of the studies on this topic have so far been in the form of full-length articles, while letters, notes, reviews, and other document formats constitute a large portion (i.e. nearly half) of the literature on this topic at the time of this investigation. full records of the three datasets on sars, mers and covid- were retrieved in csv excel format from scopus, all on the same day. this included the citation information, bibliographic information, abstract and keywords, funding details and the references. the scopus restriction of maximum , document to export posed challenges for the retrieval of the sars and covid- datasets whose size were bigger than , documents. for the sars dataset, the challenge was circumvented by further limiting the search to specific year(s), in separate bundles, in a way that the size of each bundle was less than , items, therefore allowing us to export the items of each bundle separately. the extraction of the covid- dataset, however, posed a further layer of complication, given that nearly all studies of covid- have been published in one year, i.e. . therefore, the year of publication could not be used as a criterion to form a set of mutually exclusive smaller-size exportable bundles for this literature. to decompose the search outcome to bundles of , documents or less, the following strategy was adopted. the document type was used to initially limit the search to mutually exclusive (non-overlapping) categories. first, the search was limited to "review or short survey or erratum or conference paper or data paper". this formed a set of , documents which was extracted in one single export (see figure (a) for details of the number of items within each document type category). 
subsequently, the search was reset to the original query and limited to notes ( , items) and then to editorials ( , items). as each of these two subsets was smaller than , , they were exported separately. there were , documents of letter type. this set was further decomposed into two mutually exclusive subsets based on the publication stage criterion ( , article in press, and , final) and was retrieved in two separate exports. for the remaining , article documents, the following strategy was devised. of the , items, , were article in press and , were final. first, the , article in press items were considered. the list of those studies was sorted by first author (a-z) and the first , items were extracted in one export. then the list was sorted by first author (z-a) and the first items were exported. a similar strategy was used to extract the remaining , final documents. a supplementary search was also conducted on the general topic of coronaviruses using the string title-abs-key ( "coronavirus*" ) and pubyear > which returned , documents on the same day. only the data related to the number of documents by year was extracted for this search. the increase continued, though at a slower rate, to and was then followed by a gradual decline till . the mers outbreak triggered another spike in the number of publications on coronaviruses, though not as large as that of sars. this time, the intensified attention to the topic lasted for about three years till before another decline began. the spike of coronavirus studies prompted by the covid- outbreak, however, seems to be occurring at a completely different scale, one that can be deemed unprecedented in the history of coronavirus studies. the number of studies that emerged in the first five months of is nearly equivalent to % of the total size of the coronavirus literature accumulated over more than years .
in figure (c), the temporal distributions of the sars, mers and covid- studies are shown separately according to the three datasets explained earlier. note that the quantities associated with sars and mers are represented by the left vertical axis whereas those of covid- are represented by the right vertical axis, whose scale is ten times bigger than that of the left axis. the history of previous coronavirus research suggests that the number of studies will likely keep rising for at least a few years before it peaks. but given the unprecedented magnitude of research and the explosive rate of publications since the beginning of , it would be interesting to observe whether this pattern repeats itself and whether the peak occurs at an earlier or later stage compared to those of the previous outbreaks, a question that only time will answer. the co-occurrence of keywords associated with the sars, mers and covid- literature was analysed using vosviewer (van eck and waltman ). each analysis was performed on the separate set of data associated with the literature of interest. the maps of keyword co-occurrences associated with the sars, mers and covid- literatures are provided in figures , and respectively. the minimum number of occurrences for a keyword to be included in the map was set to in all three cases. the unit of analysis was set to all keywords (that is, both author and index keywords) and the method of counting was full counting. figures a and a in the appendix illustrate the map associated with the sars literature overlaid respectively with the average year of publication and average number of citations associated with the studies in which these keywords have occurred. figure a and a present the counterpart outputs for the mers literature analysis.
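the keyword co-occurrence counting described above (all keywords, full counting, a minimum-occurrence threshold) can be sketched as follows. this is a minimal illustration, not vosviewer's implementation: the function name `keyword_cooccurrence`, the input layout, and the `min_occurrences` parameter are assumptions made for the example, and "full counting" is taken here to mean that each co-occurrence of two keywords in a document adds 1 to their link.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(docs, min_occurrences):
    """Count keyword occurrences and pairwise co-occurrences.

    docs: list of keyword lists, one per document (author and index
          keywords already merged; hypothetical input layout).
    min_occurrences: keywords occurring in fewer documents are dropped
          (the threshold used in the study is not reproduced here).
    """
    # Normalise and deduplicate keywords within each document.
    doc_sets = [{k.strip().lower() for k in kws if k.strip()} for kws in docs]

    # Occurrences = number of documents in which a keyword appears.
    occurrences = Counter(k for s in doc_sets for k in s)
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    # Full counting (assumed): each co-occurrence adds 1 to the link weight.
    links = Counter()
    for s in doc_sets:
        for a, b in combinations(sorted(s & kept), 2):
            links[(a, b)] += 1

    return {k: occurrences[k] for k in kept}, dict(links)


if __name__ == "__main__":
    toy_docs = [
        ["SARS", "quarantine", "public health"],
        ["sars", "quarantine"],
        ["sars", "vaccine"],
    ]
    nodes, links = keyword_cooccurrence(toy_docs, min_occurrences=2)
    print(nodes)
    print(links)
```

the returned node counts correspond to node sizes on a map and the link weights to link strengths; clustering and layout, which vosviewer performs on top of these counts, are outside the scope of this sketch.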
figure a is a heatmap of covid- keyword co-occurrence and figure a overlays the covid- keywords map in figure with the colour-coding of the average number of citations. given that almost all studies of covid- are items, the colour-coding related to the average publication year was forgone in regard to this literature. maps of term occurrences based on the analysis of the title and abstract of studies on sars, mers and covid- have also been presented in figures , and respectively. while the below analysis focuses mainly on the interpretation of the keyword maps, similar patterns are by-and-large observable through analysis of the title and abstract terms of these studies. with respect to each of the three literatures, three distinct clusters of keywords were identifiable. these clusters showed certain patterns of commonality across the three datasets. each map presents a distinct cluster of keywords that seem to be associable to the studies related to public health emergency management and the prevention of epidemic. this would be the red cluster in figure (sars), the green cluster in figure (mers) and the red cluster in figure . here, this is referred to as cluster (i). in this cluster, one can observe terms such as those associated with general public health including "wold health organisation", "public health", "public health service", "global health", as well as those associated with disease outbreaks including "emergency", "health risk" "epidemics", "pandemic", "outbreak", "viral diseases", "virus infection", "communicable disease", "transmission", "travel". terms representing measures of emergency severity also appear in this cluster including "mortality", "fatality", "morbidity", "infection risk". this cluster also includes terms that are linked to the prediction of disease propagation. these are terms such as "mathematical model", "modelling", "simulation", "statistical model" and "prediction" that have commonly occurred in this cluster. 
the cluster includes terms affiliated with measures of disease control and spread prevention such as "(social/patient) isolation", "quarantine", "hygiene", "handwashing", "prevention", "infection control", "(population) surveillance", "mass screening", "(face) mask", "contact tracing". the cluster also contains keywords that are attributable to public policy making and social protection such as "health care planning", "health care policy", "health care quality", "leadership", "disaster planning", "polices". across all three datasets, cluster (i) also distinctly contains keywords that are attributable to studies on the mental health impact of the epidemic. these are keywords such as "mental health (service)", "psychiatry", "psychology", "mental stress", "anxiety", "fear", "mental disease". these studies have often used methods such as "questionnaire(s)" and "survey(s)", terms that are commonly reflected in this cluster across the three literatures. issues surrounding the safety of medical facilities and medical staff also appear to have been addressed mainly by studies whose keywords are attributable to this cluster. these studies have generated keywords such as "health care personnel", "nurse(s)", "medical staff", "hospital", "health care facility", "personal protective equipment" that are distinctly observable in cluster (i) of keywords across all three datasets. the economic aspects of the epidemics also seem to have been addressed particularly by covid- studies, as reflected in cluster (i) of the covid- literature. these are reflected in terms such as "economics" and "economic aspect", which have occurred frequently enough in covid- studies to appear distinctly on the map. the trace of such a cohort of studies is, however, not as clearly identifiable on the sars and mers maps as it is with respect to the covid- literature.
this could be explained by the greater magnitude of the societal impact of covid- outbreak compared to sars and mers. the names of the countries and regions have almost invariably appeared in cluster (i) across all three datasets. in certain cases, the country names that have occurred most are those from which the outbreaks originated or those that suffered most from the impact of the outbreak. for example, "saudi arabia" appears quite distinctly on the cluster (i) of the mers dataset. similarly, the occurrence of the names of south-east asian countries/regions such as "china", "hong kong", "taiwan", "singapore" on the cluster (i) of the sars map, or the term "wuhan" on the cluster (i) of the covid- map are quite notable. the occurrence of the name of the countries also could be a reflection of the early studies with respect to each outbreak that have addressed the local impacts/spread of the outbreaks on their own society. on the issue of early studies, the terms "letters", "editorial", and "review" (which have intentionally been kept on the maps) seem to also have distinctly occurred in cluster (i) of each literature which is another indication that this cluster includes early studies that appeared at a time where the amount of data and clinical trials were insufficient for full-length articles. an inspection of the figures a and a does, in fact, confirm this hypothesis at least in association with the sars and mers literature, that the cluster (i) of keywords represent studies that on average emerged early during the developments of their respective literatures. 
figures a , a and a , which illustrate the colour-coding of the average number of citations on the maps, also show that, although cluster (i) is associated with the early studies that generally predated studies of the two other clusters and although it represents the largest variety of topics compared to the two other clusters, it is also associated with studies that have, on average, received fewer citations than those of the two other clusters. this pattern also appears to have commonly occurred across all three datasets. a second cluster of keywords, here referred to as cluster (ii), was also identified in each of the three literatures. it is attributable to the studies on the chemistry and physiology of the virus, or viral pathogenesis, or in other words, the chemical constitution of the virus (knight ), a part of virology that investigates the biological processes and activities of viruses that take place in infected host cells and result in the replication of a virus. for the sars map in figure , as well as the covid- map in figure , this would be the green cluster, whereas for the mers map ( figure ), this cluster is red. according to the maps, the most distinct terms associated with this cohort of virology studies on sars, mers and covid- are terms such as "virus protein", "virus entry", "chemistry", "metabolism", "physiology", "pathology", "cell line", "(virus/viral) protein(s)", "molecular model(s)", "virus genome", "virus rna", "virus replication", "mutation", and "enzyme activity". as this sector of studies often uses "animal model(s)", terms such as "animal cell", "animal experiment", "controlled study", "mice" and "mouse" have frequently appeared in cluster (ii) associated with each of the three literatures.
reflecting the fact that this cohort of studies ultimately seeks "drug design", in addition to generic common terms such as "drug design/potency/structure/synthesis", the names of the specific potential drugs that have been investigated in relation to each disease have appeared in this cluster. this includes terms such as "hydroxychloroquine" or "remdesivir" on the covid- map. an inspection of the maps overlaid with the average year of publications for sars and mers in figures a and a in the appendix suggests that, on average, this cohort of studies is generally the last to emerge in the published domain compared to the two other major clusters, but it receives relatively high citations on average (according to figures a , a and a ). a third and relatively smaller cluster of keywords was commonly identifiable in relation to each of the three literatures. this cluster has been visualised in blue across all three maps of keyword co-occurrence. the studies represented by this cluster of keywords, here referred to as cluster (iii), appear to have been more closely focused on the development of antibodies and vaccines. the terms "treatment", "treatment outcome", "disease severity", "antiviral therapy", "prognosis", "drug safety", "prospective/retrospective study", "immunology", "immunotherapy", "innate immunity", "immune response", "virus/viral vaccine(s)", "virus/viral antibody" are notable across these studies. terms affiliated with studies related to treatments and clinical care of respiratory disease patients also appear in this cluster. this includes terms such as "artificial ventilation", "intensive care unit", as well as symptom and organ terminologies associated with each disease, terms such as "fever", "headache", "diarrhea", "lung (injury)", "coughing", "liver injury", "kidney". terms affiliated with cohort analysis studies have appeared in this cluster of the maps associated with each literature.
this is reflected in terms such as "female", "male", "child", "infant", "young adult", "adult", "age", "middle aged", "pregnant", "pregnancy". this pattern of the cohort analysis keywords appearing in cluster (iii) is particularly common across the mers and covid- studies. for sars, these terms have largely appeared in the red cluster, at the border between the red and blue clusters. bibliographic coupling of the studies on sars, mers and covid- were analysed at the level of their sources/journals. figure , and show the maps of journal bibliographic coupling associated with sars, mers and covid- literatures respectively. the node sizes are proportional to the number of documents published by the corresponding sources and the thickness of the links are proportional to the degree of bibliographic couplings between the sources connected by each link. the minimum number of documents associated with each node/journal to appear on the map has been set to . no minimum strength was set for links to be visualised on the map. a first-glance comparison shows that while the maps associated with sars and mers are well connected, connections across the covid- map are rather sparse. both the sars and mers maps include three major and distinct clusters of bibliographically coupled journals in addition to one minor and smaller cluster. these clusters show relatively strong degrees of inter-connectivity, whereas, this feature is not shared by the covid- map. the observation is understandable in light of the fact that the sars and mers literatures are relatively well established and have each been under development over a period of several years, whereas the covid- literature is an emerging field, and newly published studies do not seem to be sharing many references as of yet. 
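the journal-level bibliographic coupling described above can be illustrated with a short sketch. it assumes that two documents are coupled by the number of cited references they share and that a journal pair's link strength is the sum of those shared-reference counts over all cross-journal document pairs; this is one plausible aggregation, not necessarily the exact weighting or normalisation applied by the mapping software. the function name `journal_coupling`, the record layout, and the `min_documents` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def journal_coupling(records, min_documents):
    """Aggregate document-level bibliographic coupling to journals.

    records: iterable of (journal_name, cited_reference_ids) pairs,
             one per document (hypothetical input layout).
    min_documents: journals with fewer documents are left off the map
             (the threshold used in the study is not reproduced here).
    """
    records = [(journal, frozenset(refs)) for journal, refs in records]

    # Node size: number of documents published by each journal.
    doc_counts = Counter(journal for journal, _ in records)
    kept = {j for j, n in doc_counts.items() if n >= min_documents}

    # Link strength: shared references summed over cross-journal pairs.
    links = Counter()
    for (j1, refs1), (j2, refs2) in combinations(records, 2):
        if j1 == j2 or j1 not in kept or j2 not in kept:
            continue
        shared = len(refs1 & refs2)
        if shared:
            links[tuple(sorted((j1, j2)))] += shared

    return {j: doc_counts[j] for j in kept}, dict(links)


if __name__ == "__main__":
    toy_records = [
        ("journal a", {"r1", "r2"}),
        ("journal a", {"r2", "r3"}),
        ("journal b", {"r2", "r3"}),
        ("journal c", {"r9"}),
    ]
    nodes, links = journal_coupling(toy_records, min_documents=1)
    print(nodes)
    print(links)
```

under this definition a young literature whose papers cite few common references yields few and weak links, which matches the sparse appearance reported for the covid- map.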
the comparison also suggests that the covid- studies are generally scattered across a broader variety of journals and subject areas, as opposed to the sars and mers publications that seem to have been concentrated across a smaller set of specialty journals. this is also consistent with our observations from figures - showing that studies of covid- are scattered across a broader variety of subject areas compared to the sars and mers literature. though not shown in figure , due to the respective values being smaller than %, journals in the following subject areas (that are deemed minor areas in relation to covid- literature) have each published a relatively considerable number of studies on this topics (a phenomenon that is not necessarily common with respect to the literature of other coronaviruses): arts and humanities ( items , where the most active journal has been social anthropology ( items) covering topics such as "climate change reactions" (bychkova ), or "legal voids linked to declared states of emergency" (karaseva )), economics, econometrics and finance ( items, with economic and political weekly ( items) being the most active journal of that category, covering topics such as "food supply chains" (reardon et al. ) , "economic stimulus packages" (mulchandani ) or "reverse migration" (dandekar and ghai )), physics and astronomy ( items, where chaos solitons and fractals ( items) has been the most active publication outlet, covering topics such as "mathematical models for forecasting the outbreak" (barmparis and tsironis , bekiros and kouloumpou , boccaletti et al. , ndaïrou et al. , postnikov , ribeiro et al. , zhang et al. )), energy ( items, with international journal of advanced science and technology ( items) being the most active journal in that category, covering topics such as "flexible work arrangement in manufacturing" (sedaju et al. 
) ), material sciences ( items, with acs nano ( items) being the most active outlet in that category, covering topics such as " -d printed protective equipment" (wesemann et al. ) ), decision sciences ( items, with lancet digital health ( items) and transportation research interdisciplinary perspectives ( items) being the most active outlets in that category, covering topics such as "the effect of social distancing on travel behaviour" (de vos ) or "the implementation of drive-through and walk-through diagnostic testing" (lee and lee )), earth and planetary sciences ( items, with indonesian journal of science and technology ( items) being most active in that domain, covering topics such as "the deployment of drones in sending drugs and patient blood samples" (anggraeni et al. ) ). the analyses of journal citations also showed similar patterns of scatter and relatively unclear clusters in relation to the covid- literature compared to well-defined clusters of journal citations for sars and mers literatures. consistent with the previous observation with respect to journal bibliographic coupling, the covid- literature seems to be also much less cohesive in terms of its journal citation networks, when compared to the sars and mers literatures. as discussed earlier in relation to bibliographic couplings, this could also be partly explained by the fact that covid- papers are scattered across a more diverse range of journals and broader variety of subject categories. for the maps of journal citation relations presented in figures , and in figures a -a in the appendix, the nodes of the bibliographic coupling maps have been colour-coded by the average year of publications and the average citations per document associated with the journals that each node represent (except for the covid- map that has only been overlaid with the average citations). 
according to these maps, emerging infectious diseases and the lancet have been major sources of publications for early studies on both sars and mers. this pattern for the lancet seems to have extended to covid- studies as well, as this journal has published a substantial portion of early studies on this topic. for sars, the strong representation of chinese medical journal and chinese journal of microbiology and immunology among the journals that published early studies is notable, a pattern that could be explained by the geographical origin of the sars outbreak. such a pattern is, to a less obvious extent, observable in regard to the mers literature, where outlets such as eastern mediterranean health journal and saudi medical journal appear on the bibliographic coupling map in colours associated with relatively early publications. collaborations of authors, aggregated at the level of countries, were also analysed with respect to the sars, mers and covid- literatures. outputs of the analysis are presented in figures , and for sars, mers and covid- respectively. in each map, the size of each node, which corresponds to a country, is proportional to the number of published documents with an author affiliated with the institutes of that country. the links connecting the nodes indicate co-authorships between authors residing in the countries, while the thickness of the links represents the strength (i.e. frequency) of the co-authorships. the colour assigned to each node represents the average number of citations that documents authored by the countries have received. the minimum number of documents for country names to appear on the maps has been set to . comparison across the three maps of co-authorships shows a pattern of author involvement from the regions where each viral outbreak originated.
studies authored by researchers affiliated with chinese institutes are well represented in all three cases, but clearly more so with respect to the sars and covid- literature, diseases whose first cases were recorded in china. the involvement of chinese authors is relatively less notable in relation to the mers studies whose origin was in the middle east. instead, with respect to the mers literature, it appears that authors affiliated with institutes in saudi arabia have been exceptionally overrepresented in the publications. this is also, to a lesser extent, the case with egypt being notably presented on the mers map of the country co-authorships. , has a very well spread and rather more evenly distributed network of collaborations with countries across the world, when compared with its network of collaboration on sars and mers. while its strongest collaboration has been with the united states, the names of many other countries appear on its network with no particular country standing out distinctly. italy, as a country that was highly affected by the viral outbreak, has been exceptionally well represented on this map with a very strong link of collaboration with the united states, followed by united kingdom at a smaller scale. this pattern of unique over-representation has to a lesser extent extended to iran, spain, france and brazil as other countries also severely affected by the covid- outbreak at early stages of the global spread. the sars studies with involvements of the authors affiliated with the institutes in the netherlands have on average received the highest number of citations and this is followed by authors from germany, as two countries whose authors have both published considerable number of documents and received high number of citations at the same time. 
this pattern was, to some extent, repeated in relation to the mers literature, with studies from the netherlands, germany and united kingdom having received on average the highest number of citations. for studies published on covid- , studies from china have so far stood out in terms of both the magnitude of research activities and the average number of citations. the map of country co-authorships associated with the sars literature. the map of country co-authorships associated with the mers literature. the map of country co-authorships associated with the covid- literature. outbreaks of infectious diseases have often shown a pattern of generating a quick surge of publications on their respective topics, such that they often create an entirely new literature over a short amount of time (olijnyk , tian and zheng ) . by all measures, however, the influx of research publications that began to emerge following the novel coronavirus outbreak outsizes those of the previous cases in the history of coronaviruses, and arguably, in the history of infectious diseases (tian and zheng ) . this has certainly marked a new milestone in the timeline of research on coronaviruses which dates back to (almeida et al. ). according to the editor-in-chief of the journal of virology, as quoted in an article of the scientist magazine (jarvis ), this surge of research outputs has inundated established coronavirus researchers and domain experts with peer review requests to an extent that they are unable to cope. parallel to such intensified efforts on the research, peer review and editorial fronts, widespread efforts are underway in synthesising, summarising and visualising these rapid developments, a pattern that has also been observed, though at much smaller scales, in relation to the previous epidemics of viral diseases (kostoff and morse ) .
in line with these endeavours, this work also aimed at quantifying and analysing scientometric aspects of the

the involvement of authors from various countries in the publications linked to these three diseases seems to be distinctly correlated with the regions where the outbreaks originated, with authors from china, for example, being much more strongly represented in sars and covid- studies, two diseases whose outbreaks were attributed to this country. middle eastern countries, on the other hand, are exceptionally well represented in the mers literature. the questions of where the covid- literature is headed, how big it will grow in the coming years, at what point in time the rate of publications on this topic is going to slow down (if ever) and how widely this literature is going to spread across journals and subject categories are only a few examples of questions that only time will answer. the answers may also be influenced down the line by possible highly sought medical discoveries in relation to vaccine and treatment development, or lack thereof. but given the current rate at which scholarly outputs are emerging, the extent of studies, projects, and trials that have already been conceived on this topic around the world, and the seemingly long-lasting and far-reaching consequences of this global emergency, which have affected many aspects of life, it will probably be some time before we observe a decline in covid- research interest.
the map of keyword co-occurrence for sars overlaid with the colour-coding of the average year of publication the map of keyword co-occurrence for sars overlaid with the colour-coding of the average citation number the deployment of drones in sending drugs and patient blood samples covid- estimating the infection horizon of covid- in eight countries with a data-driven approach sbdiem: a new mathematical model of infectious disease dynamics modeling and forecasting of epidemic spreading: the case of covid- and beyond sars-cov, mers-cov and now the -novel cov: have we investigated enough about coronaviruses?-a bibliometric analysis scientists are drowning in covid- papers. can new tools keep them afloat? science covid- and climate change reactions: sts potential of online research coronaviridae: a review of coronaviruses and toroviruses. coronaviruses with special emphasis on first insights concerning sars a bibliometric analysis of covid- research activity: a call for increased output coronavirus disease : coronaviruses and blood safety emerging coronaviruses: genome structure, replication, and pathogenesis sars: the first pandemic of the st century a scientometric overview of cord- a systematic review on the efficacy and safety of chloroquine for the treatment of covid- migration and reverse migration in the age of covid- the effect of covid- and subsequent social distancing on travel behavior bibliometric analysis of global scientific research on coronavirus (covid- ) the impact of early scientific literature in response to covid- : a scientometric perspective the scientific literature on coronaviruses, covid- and its associated safety-related research dimensions: a scientometric analysis and scoping review current status of global research on novel coronavirus disease (covid- ): a bibliometric analysis and knowledge mapping journals, peer reviewers cope with surge in covid- publications scientometric trends for coronaviruses and other emerging viral infections the 
legal void and covid- governance the chemical constitution of viruses structure and infrastructure of infectious agent research literature: sars author productivity of covid- research output globally: testing lotka's law. available at ssrn . visualising covid- research. arxiv testing on the move: south korea's rapid response to the covid- pandemic human coronaviruses: a review of virus-host interactions coronaviruses: a comparative review. current topics in microbiology and immunology/ergebnisse der mikrobiologie und immunitätsforschung covid- crisis: economic stimulus packages and environmental sustainability human coronaviruses: a brief review mathematical modeling of covid- transmission dynamics with a case study of wuhan an algorithmic historiography of the ebola research specialty: mapping the science behind ebola estimation of covid- dynamics "on a back-of-envelope": does the simplest sir model provide quantitative parameters and predictions? covid- 's disruption of india's transformed food supply chains short-term forecasting covid- cumulative confirmed cases: perspectives for brazil flexible work arrangement in manufacturing during the covid- pandemic: an evidence-based study of indonesian employees world health organization declares global emergency: a review of the novel coronavirus (covid- ) emerging infectious disease: trends in the literature on sars and h n influenza open access and altmetrics in the pandemic age: forescast analysis on covid- related literature software survey: vosviewer, a computer program for bibliometric mapping review of the novel coronavirus (sars-cov- ) based on current evidence . -d printed protective equipment during covid- pandemic predicting turning point, duration and attack rate of covid- outbreaks in major western countries key: cord- - hof it authors: cross, thomas j.; takahashi, gemma r.; diessner, elizabeth m.; crosby, marquise g.; farahmand, vesta; zhuang, shannon; butts, carter t.; martin, rachel w. 
title: sequence characterization and molecular modeling of clinically relevant variants of the sars-cov- main protease date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hof it the sars-cov- main protease (mpro) is essential to viral replication and cleaves highly specific substrate sequences, making it an obvious target for inhibitor design. however, as for any virus, sars-cov- is subject to constant selection pressure, with new mpro mutations arising over time. identification and structural characterization of mpro variants is thus critical for robust inhibitor design. here we report sequence analysis, structure predictions, and molecular modeling for seventy-nine mpro variants, constituting all clinically observed mutations in this protein as of april , . residue substitution is widely distributed, with some tendency toward larger and more hydrophobic residues. modeling and protein structure network analysis suggest differences in cohesion and active site flexibility, revealing patterns in viral evolution that have relevance for drug discovery. severe acute respiratory syndrome coronavirus (sars-cov- ) emerged in late ( ) and rapidly spread worldwide, causing an ongoing pandemic. although the sequence of its rna genome is highly similar to that of sars-cov- , sars-cov- is believed to have arisen independently from a bat coronavirus ( ) , to which it shares % similarity ( ) . the emerging sars-cov- subsequently gained a modified spike protein due to recombination in an intermediate host, the pangolin ( , ) , followed by purifying selection for binding to the human ace protein ( ) . no therapeutic agents able to reduce sars-cov- mortality in clinical settings are yet known, although extensive efforts are underway to discover new drugs or repurpose existing ones to inhibit key viral proteins. here we focus on the main protease (m pro ), which plays a critical role in viral replication. 
like other betacoronaviruses, sars-cov- is a positive-sense rna virus that expresses all of its proteins as a single polypeptide chain, which is cleaved by m pro to yield the mature proteins ( ) . inhibiting this key enzyme would prevent viral replication, reducing viral load and thus symptom intensity. a similar approach was instrumental in making hiv a manageable disease ( ) ( ) ( ) . however, the proteins in question differ markedly, rendering hiv protease inhibitors ineffective against sars-cov- ; indeed, a standard hiv protease inhibitor combination did not prove effective against covid- in a recent clinical trial ( ) . specifically, hiv protease is an aspartic protease (and functional only as a dimer, as the active site comprises one residue from each monomer), whereas m pro is a cl cysteine protease that is likewise most active in the dimeric state, although each monomer has its own catalytic dyad ( ) . the cl cysteine proteases are characterized by a chymotrypsin-like fold and a cysteine-histidine catalytic dyad in the active site, implying both different structures and distinct chemical mechanisms. while the general strategy of seeking protease inhibitors is hence viable for both sars-cov- and hiv, drug development for the former depends on characterizing this novel enzyme. molecular modeling is an important tool for guiding inhibitor discovery, making it possible to evaluate large numbers of candidate drugs in silico to select experimental targets; however, standard approaches screen against only one version of the protein, typically the reference or wild-type (wt) sequence. in a host population, mutations accumulate with each viral passage, generating a mutational landscape rather than a single protein. the design of robust inhibitors that can protect against the multiple strains encountered in clinical settings requires characterization of this sequence space and the populations of conformations it engenders. 
furthermore, effective and rapid response to future emerging coronavirus diseases requires both in silico screening and experimental testing of antiviral agents and a validated library of relatively general inhibitors that can be used as a basis for the development of specialized therapeutics. central to the success of that effort will be developing an understanding of structural and functional variation in sars-cov- proteins, particularly as mutations accumulate and new strains emerge. here we characterize all known variants of m pro as of april, , and analyze trends in amino acid substitutions and the resulting structural changes using network analysis and molecular modeling. to our knowledge this is the first detailed analysis of clinically relevant mutations in m pro . our analysis shows a trend toward substitution for larger and more hydrophobic residues versus the wt protein. analysis of active site networks (asn) from m pro variants suggests differences in active site flexibility and cohesion that may serve to guide the design of robust, mutation-resistant inhibitors. mutations in m pro are geographically distributed and occur throughout the protein from the gisaid (https://www.gisaid.org/) ( ) epicov database (through apr, ), unique non-synonymous mutations to m pro were found in addition to the wt sequence, including single point variants and double variants. for genome sequences containing these m pro variants, full genome alignments were performed using muscle ( ) , and neighbor-joining trees were generated using mega x ( ) . overall, the variation in sars-cov- sequences observed so far is relatively low, with mutation hotspots not evenly distributed throughout the genome, but localized to specific sequence regions ( ) . 
because m pro is critical for viral replication, mutations that have a large deleterious effect on virus replication are unlikely to be observed in clinical isolates; all m pro variants investigated here are therefore assumed to be enzymatically competent. in general, codon usage and amino acid frequency in viruses of eukaryotes are essentially identical to those of their eukaryotic hosts, reflecting the viruses' use of the host translation machinery ( ) . m pro sequences found in sequences isolated from human hosts will therefore likely reflect bias toward human codon usage, somewhat limiting the scope of the observed mutation space. the known mutations in m pro are summarized in figure . the tree was generated based on overall genome similarity; however, only sequences containing at least one mutation in m pro were included in the analysis, along with the wt human sequence and two non-human reference sequences. the accession numbers and geographical sources are listed in supplementary table s . the solid arcs around the outside of the diagram indicate m pro mutations; color coding corresponds to the geographical source. several mutations appear to have arisen more than once in the virus's evolutionary history so far. notably, k r variants appear in multiple distantly related subtrees; five of these unique evolutionary events can be verified in nextstrain's sars-cov- phylogenetic tree ( ) . further, l f, p s, and n d arise at least twice in both trees. these phylogenetic comparisons appear to support a multiple event hypothesis, but are subject to errors resulting from the sparsity of testing. the repeated occurrence of the same mutation in seemingly unrelated subtrees may be due to missing data that would show their evolutionary connectedness. the average branch length of figure , which shows only topology, is . x − base substitutions per site (including those from the bat ( ) and pangolin ( ) ); . 
% of the branches have, to ten significant figures, base substitutions per site. for a genome of roughly , base pairs, this amounts to an average of only substitutions per branch. all of these unique mutants therefore effectively belong to the same strain, making them difficult to place in an evolutionary context. for more diverged mutants, unfortunately-placed ambiguous nucleotides ( ) could push them from one subtree to another. with the exception of five double variants, a majority of the sequences in figure arise from single point mutations. whether and how m pro mutations have affected viral fitness is not yet known, but at least three mutants have remained in the population long enough to accumulate another mutation: l f to a v/l f, g s to g s/d e and g s/v l, and k r to v a/k r. it is worth noting that although a single variant a v exists, the a v/l f double variant likely stemmed from an l f ancestor due to its shared lineage with l f single variants. a fifth double variant, a t/r c, was found but did not stem from any single mutation in our dataset; its origins remain unclear. while a mutation's prevalence and evolution in a population may be interpreted as a sign of stable viral function, the opposite does not necessarily indicate reduced virulence. testing rates, social behavior, and time of first infection in each region are all factors that contribute to the spread of the disease and the availability of sequencing data. for instance, a large number of k r mutants were collected in iceland, where the number of tests per , people is nearly twice as many as the next leading country's and more than seven times as many as the united states' (iceland: . , usa: . , as of april, ) ( ) . consequently, further investigation is needed to determine whether m pro mutations affect viral fitness on a global scale. 
as such, without greater divergence and more sequences, it is difficult to tell whether the presence of an m pro mutation in unrelated subtrees is evidence of multiple evolutionary events or an artifact of sparse testing. because only sequences harboring m pro mutations were retained for analysis, certain geographical areas appear to be underrepresented. it is likely that the strains that had spread to underrepresented regions prior to our data collection simply did not have m pro mutations. different regions tend to be dominated by different mutants, a feature that might be explained by the timing at which these mutations arose or arrived. for instance, of the m pro mutants from iceland were k r, and most stemmed from a single shared ancestor (supplementary table s ). further, it is likely that heterogeneity in sequencing rates has resulted in a less-than-complete dataset. as of april th, the only north american, south american, and african m pro mutants reported in the gisaid database that passed our filtering parameters were from costa rica and the usa, argentina and brazil, and the drc, respectively. this does not necessarily indicate a lack of m pro mutations in other subregions, and may instead reflect differences in sequencing rates. in the structural analyses that follow, we focus on the differences in protein properties, relative to wt, of the clinically observed m pro variants. m pro mutations to date suggest selection for larger, more massive, and more hydrophobic residues as substitutes ( figure ). the weights of the edges indicate the frequency of the mutation across known m pro variants, while node color reflects residue hydrophobicity on the scale of ( ) (larger numbers indicate greater hydrophobicity). the most obvious trend observed in the pattern of mutation so far is the preferential substitution of larger, more hydrophobic amino acids in place of smaller, less hydrophobic ones.
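The substitution network described above (edges weighted by how often a residue change occurs, nodes scored by hydrophobicity) can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the hydropathy values are the standard Kyte-Doolittle scale, which we assume is the scale cited; the residue pairs are only those named in the prose, with positions omitted, so they are an illustrative subset rather than the full variant set; and the function names are hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values (larger = more hydrophobic).
# Assumption: this is the hydrophobicity scale referred to in the text.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative subset only: (wild-type, substituted) residue pairs named in
# the prose, without positions. Not the complete set of observed variants.
SUBSTITUTIONS = [
    ("K", "R"), ("L", "F"), ("P", "S"), ("N", "D"), ("A", "V"), ("G", "S"),
    ("D", "E"), ("V", "L"), ("V", "A"), ("A", "T"), ("R", "C"),
]


def edge_weights(substitutions):
    """Count how often each directed edge (old -> new residue) occurs."""
    return Counter(substitutions)


def hydropathy_deltas(substitutions):
    """Hydropathy of the new residue minus that of the residue it replaces."""
    return [KYTE_DOOLITTLE[new] - KYTE_DOOLITTLE[old] for old, new in substitutions]


def one_sample_t(values):
    """t statistic for the null hypothesis that the mean of values is zero."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(variance / n)


if __name__ == "__main__":
    for (old, new), weight in sorted(edge_weights(SUBSTITUTIONS).items()):
        print(f"{old} -> {new}: weight {weight}")
    deltas = hydropathy_deltas(SUBSTITUTIONS)
    print("mean hydropathy change:", sum(deltas) / len(deltas))
    print("t statistic vs. no change:", one_sample_t(deltas))
```

Because the list above is a small, position-free subset, its summary statistics should not be expected to reproduce the trends reported for the full dataset; with real data each observed substitution would appear once per variant, so repeated changes would raise the corresponding edge weight.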
overall, the pattern is consistent with increased incidence of amino acids that are more likely to be present in folded domains, rather than in linker regions ( ) . in particular, it is notable that alanine has very few incoming ties and a large number of outgoing ties, mostly to valine, which has a larger and more hydrophobic side chain. alanine is at the same time one of the most common amino acids and one of those with the most variable prevalence in the human genome ( ) . similarly, observed ties to isoleucine are mostly incoming from smaller residues, and leucine, which is also large and hydrophobic, likewise has more incoming than outgoing ties overall, with the bulk of its outgoing ties going to phenylalanine. however, aromatic residues per se do not appear to be selected at a higher rate than can be explained by their hydrophobicity. also notable is the selection away from the secondary structure-breakers proline and glycine, both of which have only outgoing ties, and the propensity for lysine to be replaced by arginine even though both side chains are positively charged. arginine is both larger and capable of making more and stronger hydrogen bonds, as well as cation-π interactions not available to lysine, leading to its known overrepresentation in inter-domain and inter-monomer interfaces ( ) ( ) ( ) ( ) . the mean differences in sidechain properties for observed m pro mutations are summarized in table . as observed in the network representation ( figure ), mutated residues are, on average, larger and more hydrophobic than those they replace. although substituted residues are on average larger and more massive, we do not see strong evidence favoring bulky over compact residues net of mass: residue bulk (measured as volume/mass) for substituted residues did not differ significantly from wt (mean difference= . Å /da, t = . , p = . codes: * p < . , ** p < . , *** p < . 
table : mean differences in side chain properties for substituted residues, versus wt (n = ; substitutions from double mutants considered separately). on average, substituted residues are significantly more hydrophobic, massive, and larger than those they replace (all p-values for two-tailed t-tests versus no difference). for wt and each m pro variant, a molecular model was constructed using modeller . ( ) , based on the a chain monomer of the pdb structure y e ( ), followed by annealing, correction of protonation states, and all-atom molecular dynamics simulation in explicit solvent (see methods). examples of representative models are shown in supplementary figure s , with the positions of all mutated residues shown mapped onto the wt structure in supplementary figure s . we do not observe gross differences in structure or dynamics across variants, as expected given that all variants were found in clinical isolates and are therefore necessarily functional; mutations leading to radically altered or misfolded structures would likely be strongly selected against. however, analysis of md trajectories does suggest more subtle differences across variants, providing insight into function-preserving changes. to assess the overall degree to which local structure is conserved across m pro variants, we compute the cross-variant variance in average φ, ψ backbone torsion angles by residue. in order to control for overall flexibility, we normalize this by the estimated variance in torsion angles within each trajectory. for arbitrary angle α i at residue i, this leads to the local variation index where α ij is the vector of angles of type α i over the trajectory of variant j with corresponding angular mean α ij , α i is the vector of such means across variants, var b is the "between variant" angular variance in mean angles, and var w is the "within variant" angular variance in α ij . 
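The local variation index described above is a ratio of between-variant to within-variant angular variance, which can be sketched as follows. This is a hedged illustration rather than the authors' implementation: we assume the angular variance is the usual circular variance (one minus the mean resultant length), that the angular mean is the circular mean, and that the within-variant term is averaged over variants. The function names and toy angles are hypothetical.

```python
import math


def circular_mean(angles):
    """Circular (angular) mean of angles given in radians."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def circular_variance(angles):
    """Circular variance: 1 minus the mean resultant length (0 = no spread)."""
    n = len(angles)
    s = sum(math.sin(a) for a in angles) / n
    c = sum(math.cos(a) for a in angles) / n
    return 1.0 - math.hypot(s, c)


def local_variation_index(trajectories):
    """Between-variant over within-variant angular variance for one residue.

    trajectories: one list of angles (radians) per variant, e.g. the phi
    angle of residue i sampled along each variant's MD trajectory.
    Assumption: the within-variant variance is averaged across variants.
    """
    means = [circular_mean(t) for t in trajectories]
    between = circular_variance(means)
    within = sum(circular_variance(t) for t in trajectories) / len(trajectories)
    if within == 0.0:
        raise ValueError("within-variant variance is zero; index undefined")
    return between / within


if __name__ == "__main__":
    # Two toy variants, each tightly clustered around a different mean angle.
    variant_a = [-0.01, 0.0, 0.01]
    variant_b = [1.49, 1.50, 1.51]
    print(local_variation_index([variant_a, variant_b]))
```

Under these assumptions, variants that fluctuate little but settle around different mean angles give a large index, while variants that share the same mean give an index near zero regardless of how flexible the residue is.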
intuitively, high values of v(α i ) indicate relatively large between-variant variation in α i relative to angular variation seen within the trajectories themselves. for v(φ i ) and v(ψ i ), such values correspond to systematic changes in local conformation associated with m pro mutations. by turns, low values of v(φ i ) and v(ψ i ) indicate residues whose local structure does not vary meaningfully across variants. it should be noted that such regions can be either flexible or rigid. whereas the latter is of obvious relevance to substrate processing and specificity. this motivates a more detailed examination of variation in the active site, to which we return below. were significantly above it; similarly, when directly compared to wt, variants were observed to be significantly less constrained, while were significantly more constrained (i.e., bootstrap t-scores less than - or greater than , respectively). out of variants ( %) showed nominally higher levels of mean constraint than wt (discounting significance), suggesting a lack of uniform selection pressure for active sites that are more or less constrained than wild type (the fraction greater does not differ significantly from random deviation, p = . , exact binomial were used for molecular modeling. initial conditions for the wt trajectory used here are based on the a monomer of pdb structure y e ( ), representing a mature (i.e., cleaved pro-sequence) protein. initial variant protein structures were predicted using modeller . ( ) , using the y e structure as a template; three rounds of annealing and md refinement were performed using the "slow" optimization level for each. initial structures were then processed to correct protonation states to reflect their predicted cellular environment (with protonation states predicted using propka . ( ) ). 
each corrected model structure was then minimized and equilibrated in explicit solvent; simulations were performed using namd ( ) with the charmm force field ( ) in tip p water ( ) at k under periodic boundary conditions (with a Å margin water box). solvated protein models were energy-minimized for , iterations before being simulated for . ns to adjust water box size, after which a ns trajectory was simulated with conformations being sampled every ps; an npt ensemble was used, with temperature controlled via langevin dynamics with a damping coefficient of /ps and nosé-hoover langevin piston pressure control set to atm ( , ). • molecular models of representative variants (s ); • sites of known m pro mutations, mapped on the wild-type structure (s ); • additional information on the fibril model reference measure (s ); • local variation index values for m pro backbone torsion angle, by residue number (s ); • additional detail on the psns used in this work (s ) tables: • mean cohesion scores (k-core number) and autocorrelation-corrected bootstrap standard errors by variant (s ); • accession numbers, locations, and dates of collection of all m pro variants referred to in this work (s ) additional files available for download: • uncompressed version of the tree depicted in figure (muscle gisaid .txt); • full acknowledgments for all sequences used in this work ( acknowledgements.pdf) figure s : the positions of each mutated residue are shown plotted on the wild-type protein (pdb id: ) ( ). panels a-c show different views of the m pro dimer (left) and monomer (right). one chain of the dimer is shown in black; on this monomer, only the active site residues his (magenta) and cys (yellow) are shown in space-filling representations.
on the second monomer (gray), side chains of the mutated residues are also shown, using the following color coding to indicate the properties of the substituted residue: light gray - polar to nonpolar; teal - polar to polar; sky blue - nonpolar to polar; light green - polar to nonpolar; and salmon - multiple mutations (i.e., two or more independent substitutions with different properties). table s : accession numbers, locations, and dates of collection of variants as they appear in the uncompressed version of the tree depicted in figure , which is available for download as a .txt file. variants in bold are shown as individual branches in figure . those without accession numbers or dates represent subtrees that were compressed; their constituent variants reside underneath. a new coronavirus associated with human respiratory disease in china the proximal origin of sars-cov- a pneumonia outbreak associated with a new coronavirus of probable bat origin viral metagenomics revealed sendai virus and coronavirus infection of malayan pangolins (manis javanica) probable pangolin origin of sars-cov- associated with the covid- outbreak emergence of sars-cov- through recombination and strong purifying selection from sars to mers, thrusting coronaviruses into the spotlight hiv- protease inhibitors: a review for clinicians abt- , a highly potent inhibitor of the human immunodeficiency virus protease lopinavir/ritonavir in the treatment of hiv- infection: a review a trial of lopinavir-ritonavir in adults hospitalized with severe covid- only one protomer is active in the dimer of sars c-like proteinase data, disease and diplomacy: gisaid's innovative contribution to global health muscle: multiple sequence alignment with high accuracy and high throughput mega x: molecular evolutionary genetics analysis across computing platforms the establishment of reference sequence for sars-cov- and variation analysis trend of amino acid composition of proteins of different taxa nextstrain: real-time
tracking of pathogen evolution nomenclature for incompletely specified bases in nucleic acid sequences: recommendations coronavirus disease (covid- ) a new coronavirus associated with human respiratory disease in china a simple method for displaying the hydropathic character of a protein proteome-wide comparison between the amino acid composition of domains and linkers comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes hydrogen bonding motifs of protein side chains: descriptions of binding of arginine and amide groups cation-π interactions in structural biology cation-π interactions in protein-protein interfaces the molecular origin of like-charge arginine-arginine pairing in water comparative protein structure modeling using modeller αketoamides as broad-spectrum inhibitors of coronavirus and enterovirus replication: structure-based design, synthesis, and activity assessment a chemical group graph representation for efficient highthroughput analysis of atomistic protein simulations structure prediction and network analysis of chitinases from the cape sundew, drosera capensis network analysis provides insight into active site flexibility in esterase/lipases from the carnivorous plant drosera capensis centrum voor wiskunde en informatica amsterdam the neighbor-joining method: a new method for reconstructing phylogenetic trees confidence limits on phylogenies: an approach using the bootstrap crystal structure of sars-cov- main protease provides a basis for design of improved α-ketoamide inhibitors propka : consistent treatment of internal and surface residues in empirical pka predictions scalable molecular dynamics with namd optimization of the additive charmm all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ( ) and χ( ) dihedral angles comparison of simple potential functions for simulating liquid water constant pressure molecular dynamics algorithms 
constant pressure molecular dynamics simulation: the langevin piston method statnet: software tools for the representation, visualization, analysis and simulation of network data network: a package for managing relational data in r social network analysis with sna rpdb: read, write, visualize and manipulate pdb files bio d: an r package for the comparative analysis of protein structures r: a language and environment for statistical computing. r foundation for statistical computing a cartography of the van der waals territory network structure and minimum degree social network analysis: methods and applications an introduction to the bootstrap key: cord- -fw pmaoc authors: huang, jiao-mei; jan, syed sajid; wei, xiaobin; wan, yi; ouyang, songying title: evidence of the recombinant origin and ongoing mutations in severe acute respiratory syndrome coronavirus (sars-cov- ) date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fw pmaoc the recent global outbreak of viral pneumonia, designated coronavirus disease (covid- ) and caused by the coronavirus sars-cov- , has threatened global public health and made it urgent to investigate its source. whole genome analysis of sars-cov- revealed ~ % genomic similarity with bat cov (ratg ), and the two clustered together in the phylogenetic tree. furthermore, ratg also showed . % spike protein similarity with sars-cov- , suggesting that ratg is the closest strain. however, the rbd and the key amino acid residues thought to be crucial for human-to-human and cross-species transmission are homologous between sars-cov- and pangolin covs. these results suggest that sars-cov- is a recombinant virus of bat and pangolin covs. moreover, this study also reports mutations in coding regions of sars-cov- genomes, indicating its capacity for evolution.
in short, our findings suggest that homologous recombination occurred between bat and pangolin covs, triggering cross-species transmission and the emergence of sars-cov- , and that sars-cov- is still evolving and adapting during the ongoing outbreak. the family coronaviridae comprises large, enveloped, single-stranded, positive-sense rna viruses that can infect a wide range of animals, including humans (guan et al., ). the viruses are further classified into four genera: alpha, beta, gamma, and delta coronavirus (king et al., ). so far, all coronaviruses (covs) identified in humans belong to the genera alpha and beta. among them, betacovs are of particular importance. several novel strains of highly infectious betacovs have emerged in human populations in the past two decades, causing severe health concern worldwide. severe acute respiratory syndrome coronavirus (sars-cov) was first recognized in , causing a global outbreak (zhong, ; peiris et al., ; cherry, ). it was followed by another pandemic event in caused by a novel strain of coronavirus designated middle east respiratory syndrome coronavirus (mers-cov) (lu et al., ). both covs were zoonotic pathogens that evolved in animals. bats in the genus rhinolophus are natural reservoirs of coronaviruses worldwide, and it is presumed that both sars-cov and mers-cov were transmitted to humans through intermediate mammalian hosts (li et al., a; bolles et al., ; al-tawfiq and memish, ). recently, the emergence of another pandemic, termed coronavirus disease (covid- ) by the world health organization (who) and caused by a novel severe acute respiratory syndrome coronavirus (sars-cov- ), has been reported (zhu et al., ). to date, more than , people have been infected and over , have died, with transmission clusters worldwide, including china, italy, south korea, iran, japan, usa, france, spain, germany and several other countries, causing alarming global health concern.
the large trimeric spike glycoprotein (s) located on the surface of covs is crucial for viral infection and pathogenesis, and is subdivided into an n-terminal s subunit and a c-terminal s domain. the s subunit is specialized in recognizing receptors on the host cell and comprises two separate domains, located at the n- and c-termini, which can fold independently and facilitate receptor engagement (masters, ). receptor-binding domains (rbds) of most covs are located at the s c-terminus and enable attachment to the host receptor (li et al., b). the host specificity of a virus particle is determined by the amino acid sequence of the rbd, which usually differs among covs. therefore, the rbd is a core determinant of tissue tropism and host range of covs. this article presents sars-cov- phylogenetic trees and a comparative analysis of genome, spike protein, and rbd amino acid sequences of different covs, in order to deduce the source and etiology of covid- and the evolutionary relationships among sars-cov- strains in humans. to determine the evolutionary relationship of sars-cov- , phylogenetic analysis was performed on whole genome sequences of different covs from various hosts. the maximum-likelihood (ml) phylogenetic tree is shown in figure , which illustrates four main groups representing the four genera of covs: alpha, beta, gamma, and delta. in the phylogenetic tree, strains of sars-cov- (colored red) cluster together and belong to the genus betacoronavirus. among beta-covs, sars-cov, civet sars cov, bat sars-like covs, bat/ratg cov, and sars-covs- clustered together, forming a clade distinct from mers-covs. the clade is further divided into two branches, one of which comprises all sars-cov- strains clustered together with bat/yunnan/ratg cov, forming a monophyletic group. bat/yunnan/ratg exhibited ~ % genomic similarity with sars-cov- . this indicates that sars-cov- is closely related to bat/yunnan/ratg cov.
the ml phylogenetic tree shows that covs from bat sources are found at the inner joint or in the neighboring clade of sars-cov- . this suggests that bat covs, particularly bat/yunnan/ratg , are the source of sars-cov- , which emerged and was transmitted from bats to humans through recombination and transformation events in an intermediate host. to explore the emergence of sars-cov- in humans, we investigated the cov s-protein and its rbd, which determine host range ( table ). comparison of s-protein amino acid sequence identity between sars-cov- and related beta-covs showed that bat/yunnan/ratg shares the highest similarity, . %. however, the amino acid sequence identity of the rbd of sars-cov- with bat/yunnan/ratg is . %. on the other hand, beta-covs from pangolin sources (pangolin/guandong/ / and pangolin/guangdong/lung ) showed the highest rbd amino acid sequence identity with sars-cov- , . % and . % respectively. these observations suggest homologous recombination events within the s-protein gene between bat and pangolin covs. similarity plot analysis of cov genome sequences from bat, pangolin and human also indicated a possible recombination within the s-protein of sars-covs- ( figure s ). amino acid residue changes in the s-protein of sars-cov- were further analyzed against sars-cov and pangolin and bat covs, including pangolin/guandong/ / , pangolin/guangdong/lung , and bat/yunnan/ratg (figure ). despite low overall homology between sars-cov- (wuhan-hu- _mn ) and sars-cov (sars_aar ), they share many homologous regions in the s-protein. the five key amino acid residues of the s-protein at positions , , , , and of sars-covs are reported to lie at the angiotensin-converting enzyme- (ace ) receptor complex interface and are thought to be crucial for human-to-human and cross-species transmission (li et al., b; wu et al., ).
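The identity comparisons and similarity-plot analysis above can be illustrated with a short sketch. It assumes two sequences that are already aligned to equal length with `-` marking gaps, skips any column containing a gap, and reports plain percent identity; dedicated similarity-plot tools use their own windowing and distance models, so this is only an approximation of the idea. The function names and toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared


def sliding_identity(seq_a, seq_b, window, step):
    """Percent identity in successive windows along an alignment.

    Returns (window start, percent identity) pairs; windows made up entirely
    of gapped columns are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(seq_a) - window + 1, step):
        try:
            value = percent_identity(seq_a[start:start + window],
                                     seq_b[start:start + window])
        except ValueError:
            continue  # window contained only gapped columns
        profile.append((start, value))
    return profile


if __name__ == "__main__":
    # Toy aligned fragments, not real coronavirus sequences.
    query = "AAAACCCC"
    other = "AAAACCGG"
    print(percent_identity(query, other))
    print(sliding_identity(query, other, window=4, step=4))
```

A drop in windowed identity against one reference, paired with a rise against another over the same region, is the kind of pattern a similarity plot uses to flag a candidate recombination region.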
figure b and table s show that all key amino acid residues of the rbd (except at two positions) are completely homologous between sars-cov- (wuhan-hu- _mn ) and pangolin covs (pangolin/guandong/ / and pangolin/guangdong/lung ), supporting our postulated recombination event in the s-protein gene. although all five crucial amino acid residues of sars-cov- for binding to ace differ from those of sars-cov, their hydrophobicity and polarity are similar, giving the same s-protein structural conformation and an identical rbd -d structure (xu et al., ). in addition, the six critical residues in the mers-cov rbd that bind its receptor, dipeptidyl peptidase (dpp ), all differ in sars-cov and sars-cov- related coronaviruses (figure a). we also investigated some of the important evolutionary and phylogenetic aspects of sars- table . among the orfs of sars-cov- , orf a was the most variable segment, with a total of dissimilar amino acid substitutions. it was followed by the spike segment s orf, with amino acid residue substitutions. however, orf and orf b are the most conserved regions, without amino acid changes. in addition, orf , e, m and orf a tend to be more conserved, with only one or two amino acid substitutions. with the global spread of sars-cov- , its amino acid sequence has also varied significantly (figure ). rna viruses usually have a high rate of genetic mutation, which drives evolution and provides them with increased adaptability (lin et al., ). to further explore sars-cov- evolution in humans, we performed phylogenetic analysis of the aforementioned sars-cov- genomes in relation to their amino acid substitutions. one hundred and twenty-five newly sequenced sars-cov- complete genomes were obtained from the global initiative on sharing all influenza data epiflu database (gisaid epiflu tm ) and genbank. closely related beta-cov genome sequences from different hosts were also collected and analyzed together with sars-cov- .
open reading frames (orfs) of covs genomes were predicted using orffinder (v . . ) with default parameters ignoring nested orfs. raw pair-end reads of pangolin dataset sample (srr ) obtained from ncbi were filtered with bbmap.sh (v . ) by removing adaptors, trimming low quality reads from both sides (quality value < ), and reads length less than nt were ignored. host reference genome (pangolin manjav . , gcf_ . ) contaminant reads were removed by bowtie (v . . . ) [ ] . pangolin cov genome fragments were assembled via megahit (v . . ) (li et al., ) . the sequences of covs were aligned using multiple sequence alignment mafft (v . ) (katoh et al., ) . aligned sequences were visualized with jalview (v . . ) (waterhouse et al., ) . poorly aligned regions and gaps were removed by trimal (v . .rev ) (capellagutiérrez et al., ) . maximum likelihood (ml) phylogenetic trees of whole genome sequences were constructed in iq-tree (v . . ) (nguyen et al., ) . support for inferred relationships in the phylogenetic tree was assessed by bootstrap analysis with replicates and the best-fit substitution model was determined by iq-tree model test. to be involved in recombination. similarity scores between genomic sequences were generated by simplot (v . . ) (lole et al., ) . red color highlights interaction positions of sars-cov- and pangolin covs with different amino acids residues. middle east respiratory syndrome coronavirus: transmission and phylogenetic evolution sars-cov and emergent coronaviruses: viral determinants of interspecies transmission trimal: a tool for automated alignment trimming in large-scale phylogenetic analyses the chronology of the - sars mini pandemic isolation and characterization of viruses related to the sars coronavirus from animals in southern china mafft online service: multiple sequence alignment, interactive sequence choice and visualization virus taxonomy. 
ninth report of the international committee on taxonomy of viruses fast gapped-read alignment with bowtie megahit: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de bruijn graph structure of sars coronavirus spike receptor-binding domain complexed with receptor bats are natural reservoirs of sars-like coronaviruses naturally occurring mutations in pb affect influenza a virus replication fidelity, virulence, and adaptability full-length human immunodeficiency virus type genomes from subtype c-infected seroconverters in india, with evidence of intersubtype recombination middle east respiratory syndrome coronavirus (mers-cov): challenges in identifying its source and controlling its spread the molecular biology of coronaviruses iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies severe acute respiratory syndrome from sars coronavirus to novel animal and human coronaviruses jalview version -a multiple sequence alignment editor and analysis workbench mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission management and prevention of sars in china a novel coronavirus from patients with pneumonia in china the authors have declared that no competing interests exist. key: cord- -tqtdjh m authors: enes, ak; pir, pınar title: transcriptional response of signalling pathways to sars-cov- infection in normal human bronchial epithelial cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tqtdjh m sars-cov- virus, the pathogen that causes covid- disease, emerged in wuhan region in china in , infected more than m people and is responsible for death of at least k patients globally as of may . 
identification of the cellular response mechanisms to viral infection by sars-cov- may shed light on progress of the disease, indicate potential drug targets, and make design of new test methods possible. in this study, we analysed transcriptomic response of normal human bronchial epithelial cells (nhbe) to sars-cov- infection and compared the response to h n infection. comparison of transcriptome of nhbe cells hours after mock-infection and sars-cov- infection demonstrated that most genes that respond to infection were upregulated ( genes) rather than being downregulated ( genes).while upregulated genes were enriched in signalling pathways related to virus response, downregulated genes are related to kidney development. we mapped the upregulated genes on kegg pathways to identify the mechanisms that mediate the response. we identified canonical nfκb, tnf and il- pathways to be significantly upregulated and to converge to nfκb pathway via positive feedback loops. although virus entry protein ace has low expression in nhbe cells, pathogen response pathways are strongly activated within hours of infection. our results also indicate that immune response system is activated at the early stage of the infection and orchestrated by a crosstalk of signalling pathways. finally, we compared transcriptomic sars-cov- response to h n response in nhbe cells to elucidate the virus specificity of the response and virus specific extracellular proteins expressed by nhbe cells. large single-stranded and positive-sense rna viruses. infection by coronaviridae family can cause gastrointestinal and hepatic disorders but the most severe effect is on respiratory system of the patients (weston and frieman, ). sars-cov- , initially known as -ncov, was detected in patients in wuhan region of china in , the disease covid- caused by sars-cov- infection was announced to be a pandemic by who in february (yi et al., ) . 
symptoms of sars-cov- infection vary among patients, but according to clinical reports, high fever ( %) is the major symptom of covid- . this is followed by other symptoms such as dry cough (% ) and myalgia or fatigue ( %) (wu et al., ). symptoms are significantly milder in infants and children than in older people, and asymptomatic infection can be seen in younger age ranges, whereas patients older than years have a higher risk of developing severe symptoms (singhal, ). the main cause of death from sars-cov- is acute respiratory distress syndrome (ards). inflammation mechanisms are vital for the defence against pathogens. a crosstalk between the immune system, the nervous system and coagulative-fibrinolytic pathways, orchestrated by "inflammatory mediators", regulates the response to inflammation. inflammatory mediators can be classified into two types, cell-derived and plasma-derived; cell-derived mediators include cytokines (coskun benlidayi, ). sars-cov leads to increased production of cytokines and chemokines in cells in response to virus infection (dosch et al., ), and a similar response is observed in sars-cov- infection. synthesis of cytokines is regulated by signalling pathways such as nfκb and il- , which are known as proinflammatory signalling pathways (lawrence, ; mcgeachy et al., ). excessive production of cytokines is a complication that affects the severity of the infection via hyperinflammation. recently, integration of transcriptomic profiles with metabolic networks has demonstrated that metabolic networks are also potentially dysregulated in response to sars-cov- infection (karakurt and pir, ). here we investigate the effect of sars-cov- infection on signalling pathways in nhbe cells to elucidate the infection mechanisms, which may help the identification of potential drug targets or the design of new test methods.
we analysed the functions of significantly changed genes via their enrichment in go biological process terms and kegg signalling pathways. mapping of the genes on pathways indicated that the crosstalk between the pathways is mediated by positive feedback loops. we compared the genes that were upregulated in sars-cov- infection in nhbe cells to those that were upregulated in h n infection to identify the sars-cov- -specific response when compared to influenza. two sets of transcriptome data from the normal human bronchial epithelial (nhbe) cell line were used in this study (accession number: gse , blanco-melo et al., ). the rna-seq count matrix was downloaded from the national center for biotechnology information (ncbi) gene expression omnibus (geo) database. one set is composed of samples collected hours after mock or sars-cov- virus infection ( replicates) and the other set is composed of samples collected hours after mock or h n virus infection ( replicates). nhbe cells that received the mock treatment are referred to as 'control samples' throughout this report. statistical analysis of the rna-seq data was performed in r version . . . principal component analysis (pca) was applied to the normalized and standardized count matrix to inspect the structure of the data in reduced dimensions. differential gene expression analysis was performed to compare infected and control samples using the deseq r package, which applies the wald test to determine significant differences between the means of two groups of samples (love et al., ). an adjusted p-value threshold of . was chosen after applying the benjamini-hochberg correction to p-values from the wald test (benjamini and hochberg, ).
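the multiple-testing step named above, the benjamini-hochberg correction, can be sketched in a few lines. this is a plain-python illustration of the adjustment only, with a hypothetical function name and made-up p-values; it is not the code used in the study, and it does not reproduce other steps that a differential expression package may apply before adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# made-up p-values for four hypothetical genes
raw = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw)
# genes passing a hypothetical adjusted p-value threshold
threshold = 0.05
significant = [i for i, p in enumerate(padj) if p < threshold]
```

each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing with the raw values. the threshold in the example is an assumption for illustration.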
enrichment analysis of the significantly changed genes in go biological process terms and kegg pathways was performed using the panther (mi et al., ) and g:profiler (raudvere et al., ) online enrichment analysis tools, respectively. fisher's exact test was applied to calculate the p-values for significant enrichment, and p-values were corrected for multiple testing using the benjamini-hochberg method. visualization and summarization of the gene ontology results were performed with the revigo tool (supek et al., ). to map the significantly changed genes onto signalling pathways, the kegg pathway database gui (qiu, ) was used. prior to downstream transcriptome data analysis, pca was performed on the dataset composed of control / sars-cov- infected and control / h n infected samples, and scores on the first two principal components were plotted (figure ). variance explained by pc is % and variance explained by pc is %; pc separates the two batches, whereas control samples and infected samples are separated on pc . hence, the main source of variation in the dataset, apart from the batch effect, is infection by either sars-cov- or h n . next, differential gene expression analysis was applied to identify the genes upregulated and downregulated in response to sars-cov- infection. genes were identified as significantly changed (adjusted p-value < . ); genes were upregulated and genes were downregulated. these two groups of genes were further investigated to identify the biological processes and pathways that significantly responded to virus infection. go terms enriched with upregulated genes were related to the immune system and response to external stimulus (table and supplementary figure ). the three most significant go terms were immune system process (padj = . e- ), defence response (padj = . e- ) and response to cytokine (padj = . e- ).
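the enrichment p-values mentioned above come from fisher's exact test. as a rough sketch of the one-sided version commonly used for over-representation, assuming a hypothetical function name and toy counts (the online tools named above may use different backgrounds and corrections):

```python
from math import comb


def enrichment_pvalue(overlap, list_size, term_size, background_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `overlap` term genes in a gene list of
    `list_size` drawn from `background_size` genes, of which `term_size`
    are annotated to the term.
    """
    max_overlap = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# toy numbers (hypothetical): 10 background genes, 3 annotated to a term,
# 3 genes in the list, all 3 of them annotated
p = enrichment_pvalue(overlap=3, list_size=3, term_size=3, background_size=10)
```

the resulting p-values, one per go term or pathway, would then be corrected for multiple testing as described in the text.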
significantly enriched go terms were not specific enough to identify the mechanism of the cellular response, whereas kegg pathway enrichment analysis (table ) contrary to infection-related biological functions of the upregulated genes, the biological functions of downregulated genes are related with organ development, especially kidney development as shown in supplementary table . although adjusted p-value of enrichment in kidney development pathway is relatively high, and not many genes in this pathway are upregulated ( / ), this result may indicate a biologically relevant phenomenon, as a connection between sars-cov- and acute kidney injury is recently reported (fanelli et al., ) . acute renal dysfunction has been observed as a comorbidity in % of the patients in a small cohort of patients (yang et al., ), a meta-analysis of data from publications indicated that existing chronic kidney disease and acute kidney injury are strongly correlated with increased disease severity in covid- patients (wang et al., ) . further, secondary effects of the infection such as low levels of oxygen being delivered to organs or cytokine storm leading to clots may worsen the kidney dysfunction with adverse effects on the patient. it has been also reported that respiratory viral infections increase the incidence of rheumatoid arthritis (ra). to date, three viruses which increases the ra incidence significantly were identified, one of which is sars-cov- (joo et al., ) . the genes upregulated in ra pathway response to sars-cov- are provided in supplementary unsurprisingly, influenza a signalling pathway was also found to be upregulated significantly. however, virus release pathway of influenza a does not seem to be affected transcriptionally, this may mean that virus release follows an alternative route in sars-cov- infection in nhbe cells. 
influenza a signalling pathway can be seen in figure , the transcriptional response of the pathway in covid- and swine flu is discussed in below sections. other pathways that play important roles in initiating the inflammation are also discussed below. one of the most upregulated pathways in nhbe cells in response to sars-cov- infection is interleukin (il- ) signalling pathway, il- is strongly associated with immunopathology but it also has important roles in host defence (veldhoen, ) . and play a critical role in the immune response to pathogens and several inflammatory diseases such as rheumatoid arthritis, asthma, allergy (rahimi et al., ) . the branch of il- signalling pathway triggered by il- e which induces th production and supress th production is not activated transcriptionally, as expected. however, most targets of the branch triggered by il- a and il- f are upregulated. this branch of the signalling pathway induces th cells, which in turn regulates autoimmune pathology, neutrophil recruitment and the immunity to extracellular pathogens. neutrophils are also part of the innate immune system, which generate the early response to infections and activate adaptive immune system by antigen presentation. chemokines such as cxcl , cxcl , cxcl have chemotactic activity in neutrophils, they play role in inflammation and target injured or infected tissues (li et al., ) . upregulated mmps and chemokines demonstrate the response generated by nhbe cells to trigger the immune system, these two subgroups of genes are further discussed below comparatively in sars-cov- and h n infection. il- β and tnfα are also among the upregulated targets of il- pathway, these two cytokines activate the canonical nfκb pathway. the transcription factor nfκb itself is activated by the il- pathway and upregulates transcription of its targets, hence, forms a positive feedback loop in response to viral infection. 
interestingly, none of the il- ligands were expressed in detectable amounts in nhbe cells, and none of the ligands or receptors were upregulated in response to virus infection, therefore the positive feedback loop is likely to be initiated by activation of nfκb via other signalling pathways. nfκb signalling pathway is a proinflammatory signalling pathway with nfκb as its key transcription factor. nf-κb is a member of rel-related protein family. this family also includes rela (p ), relb, c-rel, nfκb (p /p ) and nfκb (p /p ). the target genes of nfκb are (tnf)-α, il- , il , il- and interferon-β which play role in immune response and inflammation (liao et al. ) . thus, it has critical role in pathogenesis and lung diseases. this pathway is composed of typically two branches, these are canonical, and non-canonical branches (liao et al., ; dosch et al., ) nfκb signalling pathway is implicated in pathogenesis of inflammatory bowel disease (ibd), asthma, chronic obstructive pulmonary disease (copd) and ra (lawrence, ). unlike canonical nfκb, non-canonical nfκb is associated with development processes, such as b-cell survival and maturation of b-cell, bone metabolism and dendritic cell activation. in addition, the activation of canonical part of this pathway is faster but less persistent than non-canonical nfκb (sun, ) . upregulated genes in nfκb signalling pathway is shown in figure , most targets of the canonical branch are upregulated, whereas only one target of the non-canonical branch is upregulated. targets of the third branch, which is "atypical branch", activated by a crosstalk of the two branches, were also upregulated in nhbe cells. it is plausible that infected tissues give priority to defence and survival rather than signalling for production of b cells. in this pathway, unlike the il- signalling pathway, many ligands are upregulated although the receptors seem to be unaffected transcriptionally. 
tumour necrosis factor (tnf) plays important role in pro-inflammatory and antiinflammatory processes. it provides protection against cancer and infectious pathogens. tnf signalling pathway includes types of receptors, tnfr and tnfr . the branch of signalling pathway which includes tnfr receptor lead to apoptosis, necroptosis and immune system related biological functions via crosstalk with other signalling pathways such as nfκb and mapk. its role is implicated in immune system homeostasis, antitumour responses, and control of inflammation (mehta et al., ) . signalling via tnfr , activates cd and cd t cells or innate immune system cells which induce the death of infected cells (kalliolias and ivashkiv, ; mehta et al., ) . tnfr expression is limited with some cell types such as neurons, immune cells and endothelial cells. tnfr promotes homeostatic effects locally such as cell survival and tissue regeneration (kalliolias and ivashkiv, ) . tnf, a target of nfκb signalling pathway is upregulated in infected nhbe cells, triggering the tnf signalling pathway ( figure ). most targets of the pathway such as inflammatory cytokines are transcriptionally upregulated including the cytokines involved in leukocyte recruitment. angiotensin converting enzyme (ace ) is expressed in human airway epithelia. coronaviruses enter target cells via binding of their spike proteins to ace proteins of the target cells. therefore, the expression level of ace has an impact on the possibility of infection transmission (jia et al., ; hoffmann et al., ) . it is reported that overexpression of human ace increases the severity of sars-cov in infected mice, and the expression of ace decreases after infection. ace , which is then re-injected into mice, causes high levels of lung injury (kuba et al., ) . spike protein of sars-cov- and sars-cov share ~ % amino acid identity, and expression level of ace may also have an impact on sars-cov- infection (hoffmann et al., ) . 
according to our differential gene expression analysis, the expression of ace did not change significantly. a cellular protease, tmprss , reported to take part in mediating virus entry, also had invariant expression levels in nhbe cells. nhbe cells produced a strong transcriptional response to sars-cov- although counts from the rna-seq data analysed here indicated very low expression of the ace gene in the cells, as previously reported by blanco-melo et al. ( ). it is therefore not clear whether very small amounts of ace protein are enough to mediate the infection or whether another unknown protein also takes part. further, our results imply that ace or tmprss transcription is not regulated by the virus infection, either negatively or positively, in bronchial epithelial cells. h n infection, which causes swine flu (influenza a), is lethal in only . - . % of patients, as opposed to the anticipated - % lethality in covid- . we analysed the transcriptional response in nhbe cells to h n infection, then compared it to the response to sars-cov- . gene expression levels from four replicates of h n infected and control cells were compared; genes were found to be upregulated and genes were found to be downregulated in nhbe cells hours after h n infection (padj < . ). go terms and kegg pathways that were enriched with upregulated genes were similar to those of sars-cov- , but we found additional terms such as metabolism and ribosomes, indicating that the impact of h n infection was not limited to virus response (supplementary table table ). although some of the differences observed between the responses may be attributed to sampling after hours in sars-cov- infection as opposed to sampling after hours in h n infection, the expected response profile in the influenza a pathway suggests that sampling time has a minor effect on the results.
infection finally, we compared expression levels of a subset of genes which encode the protein that are secreted to extracellular matrix in sars-cov- and h n infection ( figure ). cytokine csf was very significantly upregulated in response to sars-cov- infection, but was invariant in response to h n infection. also, il a, cxcl and csf had high fold changes in sars-cov- infection when compared to h n infection ( figure a , b). blood samples collected from patients in acute phase of covid- were analysed for levels of cytokines (wang et al. ), csf was one of three cytokines which had significantly different expression in healthy individuals, patients in intensive care and patients not in intensive care, hence csf had a discriminative power not only between healthy and disease cases, but also between severe and less severe disease cases. the other two cytokines were tnfα and mip α, former had similar fold changes in sars-cov- and h n infections, later was not included in our dataset. the samples were collected hours after the sars-cov- infection to generate the dataset analysed in this study, therefore the data reflects the early response to infection. fold change in expression of csf in nhbe cells in hours may mean that this cytokine is an early marker of the infection and potentially can be detected in blood samples before the patients develop any symptoms of the disease. another subgroup we analysed are matrix metalloproteinases (mmps), proteolytic enzymes which play multiple roles in immune response, tissue degradation and regulation of inflammation via activation and inactivation of cytokines and chemokines (elkington et al., ; dandachi and shapiro, ) . transcriptional response of mmp genes in sars-cov- versus h n infections are shown in figure c and d, mmp and mmp responded differentially in two types of infection. 
mmp is involved in respiratory epithelial healing and its level is elevated in rheumatoid arthritis, whereas mmp may take part in development of emphysema, an obstructive lung disease. both csf and mmp are targets of il- pathway, mmp is also a target of tnf pathway, hence upregulation of these two genes are likely to be a result of activities of il- and tnf pathways. our analyses demonstrated that multiple immune response pathways are activated in response to sars-cov- infection in nhbe cells. in the in vitro system we analysed, no external signals are available and nhbe cells do not express ligands of il- pathway themselves, although targets of the pathway are significantly upregulated. two of those, tnfα and il β, mediate the positive feedback loops with tnfα and nfκb pathways. in (frank and lisanti, ) . similarly, regulation by the three pathways converge in activation of inflammatory response related cytokines such as cxcl , cxcl , cxcl and ptgs . there is an ongoing global effort to better understand the covid- disease caused by sars-cov- to be able to develop high sensitivity / high specificity diagnosis tests and better treatment methods. in this study, we examined the signalling pathways triggered by sars-cov- infection and found that nfκb is central to upregulated signalling pathways. our study is limited to response generated in hours in nhbe cells grown in vitro. however, analysis of the early response of the cells gave important clues about the cytokine storm experienced by most patients which increases the severity of the disease. we identified a subgroup of genes related to renal system disorders and rheumatoid arthritis which also respond to sars-cov- infection. further, we compared the transcriptional response to sars-cov- and h n infection in nhbe cells. 
we find that response generated in h n infection is much stronger in the influenza a pathway as expected, but other immune response pathways such as il- , nfκb and tnf pathways yielded a stronger response in sars-cov- infection, which may explain the severity and higher mortality rates of inflammation in patients when compared to swine flu. we identified a cytokine, csf , and a matrix metal protease, mmp to be exclusive to sars-cov- when compared to h n . these two extracellular proteins should be further investigated as potential markers of the infection in blood samples. this study is limited with samples collected at two time points and infection by two controlling the false discovery rate: a practical and powerful approach to multiple testing t helper cells: a new player in immune-related diseases role of inflammation in the pathogenesis and treatment of fibromyalgia a protean protease: mmp- fights viruses as a protease and a transcription factor sars coronavirus spike proteininduced innate immune response occurs via activation of the nf-kappab pathway in human monocyte macrophages in vitro the paradox of matrix metalloproteinases in infectious disease acute kidney injury in sars-cov- infected patients icam- : role in inflammation and in the regulation of vascular permeability ace receptor expression and severe acute respiratory syndrome coronavirus infection depend on differentiation of human airway epithelia respiratory viral infections and the risk of rheumatoid arthritis tnf biology, pathogenic mechanisms and emerging therapeutic strategies integration of transcriptomic profile of sars-cov- infected normal human bronchial epithelial cells with metabolic and protein-protein interaction networks a crucial role of angiotensin converting enzyme (ace ) in sars coronavirus-induced lung injury the nuclear factor nf-kappab pathway in inflammation il- receptorbased signaling and implications for disease activation of nf-kappab by the full-length nucleocapsid protein 
of the sars coronavirus t follicular helper cells during immunity and tolerance moderated estimation of fold change and dispersion for rna-seq data with deseq the il- family of cytokines in health and disease tnf activity and t cells panther version : more genomes, a new panther go-slim and improvements in enrichment analysis tools comprehensive phenotyping of t cells using flow cytometry kegg pathway database. encyclopedia of systems biology targeting the balance of t helper cell responses by curcumin in inflammatory and autoimmune states g:profiler: a web server for functional enrichment analysis and conversions of gene lists ( update) a review of coronavirus disease- (covid- ) the non-canonical nf-kappab pathway in immunity and inflammation revigo summarizes and visualizes long lists of gene ontology terms interleukin is a chief orchestrator of immunity we declare no conflict of interest. key: cord- -ss g jkg authors: jakhar, renu; gakhar, s.k title: an immunoinformatics study to predict epitopes in the envelope protein of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ss g jkg covid- is a new viral emergent human disease caused by a novel strain of coronavirus. this virus has caused a huge problem in the world as millions of the people are affected with this disease in the entire world. we aimed to design a peptide vaccine for covid- particularly for the envelope protein using computational methods to predict epitopes inducing the immune system and can be used later to create a new peptide vaccine that could replace conventional vaccines. a total of available sequences of sars-cov- were retrieved from ncbi for bioinformatics analysis using immune epitope data base (iedb) to predict b and t cells epitopes. then we docked the best predicted ctl epitopes with hla alleles. ctl cell epitopes namely interacted with mhc class i alleles and we suggested them to become universal peptides based vaccine against covid- . 
potentially continuous b cell epitopes were predicted using tools from iedb. the allergenicity of predicted epitopes was analyzed with the allertop tool and the population coverage was determined worldwide. we found these ctl epitopes to be t helper epitopes also. the b cell epitope srvknl and the t cell epitope flafvvfll are suggested as universal candidates for a peptide-based vaccine against covid- . we hope to confirm our findings with complementary in vitro and in vivo studies to support these newly predicted universal candidates. the coronavirus has brought movement across the entire world to a halt. the virus is so deadly that it is taking the lives of more than thousands of people every day and affecting millions of people around the globe. the disease was first reported in wuhan city, china, where the virus was isolated from a patient with respiratory symptoms in dec , [ , ] and was later named covid- [ ]. the world health organization (who) declared the disease a pandemic that had spread from china to more than a hundred countries. by may , , the disease had already struck more than million persons, of whom thousands died from covid- infection; the majority of them were reported from china, italy, the united states of america, britain and spain. coronaviruses are a large group of viruses belonging to the family coronaviridae and the order nidovirales that are common among animals [ ]. the coronaviridae family is divided into four genera based on their genetic properties: the alpha, beta, gamma and delta coronavirus genera [ ]. the -ncov is an enveloped, positive-sense rna beta coronavirus with a genome of . kb [ ]. coronaviruses are zoonotic, transmitted from animals to humans [ ]. covid- affects the respiratory system (lungs and breathing tubes). most covid- patients developed severe acute respiratory illness with symptoms of fever, cough, and shortness of breath.
most reported cases of covid- have been linked to travel to or residence in countries in this region [ , ]. presently there are no clinically approved vaccines available in the world for this disease. the development of a new vaccine for this newly emergent strain, using therapeutic and preventive approaches, could be readily applied to save human lives. the use of peptides or epitopes as therapeutics is a good strategy because of advances in their design, stability, and delivery [ , ]. moreover, there is growing emphasis on the use of peptides in vaccine design by predicting immunogenic ctl, htl and b cell epitopes from tissue-specific proteins of organisms [ , ]. among the structural proteins of sars-cov- , the cov envelope (e) protein is a small integral membrane protein involved in the life cycle of the virus. it is involved in envelope formation and in other aspects such as assembly, budding, and pathogenesis. thus, it is considered to be a promising target for effective covid- vaccine design [ ]. more importantly, t-cell-based cellular immunity is essential for clearing sars-cov- infection because it is memory based [ , ]. in addition, the e protein has a low mutation rate and is highly conserved, and a protein that can elicit both cellular immunity and neutralizing antibodies against covid- is necessary for efficient vaccine development [ , ]. therefore, in this study, an immunoinformatics-based approach was adopted to identify candidate epitopes in the envelope protein of sars-cov- that could appropriately activate a significant cellular and humoral immune response [ , ]. the aim of this study is to analyze envelope protein strains using in silico approaches, looking for conservancy, and then to predict all potential epitopes that can be used, after in vitro and in vivo confirmation, as a therapeutic peptide vaccine [ , , ].
the protein sequence of the envelope protein from the severe acute respiratory syndrome coronavirus isolate indian strain (sars-cov- / /human/ /ind), accession no. qia . , was retrieved from the ncbi. the antigenicity of this sequence was predicted by the vaxijen v . server [ ] with default parameters. in the present study, the envelope protein was found to be a potential antigenic protein with a good antigenicity score. a total of envelope protein sequences were retrieved from the ncbi database up to april . these sequences were collected from different parts of the world; the retrieved sequences and their accession numbers are listed in the supplementary file. multiple sequence alignment of the envelope protein sequences was then carried out with clustal w. the envelope protein d structure was obtained with swissmodeller, which uses homology detection methods to build d models [ ] . ucsf chimera was used to visualize and minimize the d structures [ ] , and structure validation was carried out with saves [ ] . homology modelling was performed to enable conformational b cell epitope prediction, to further verify the surface accessibility and hydrophilicity of the predicted b lymphocyte epitopes, and to visualize all predicted t cell epitopes at the structural level. a b cell epitope is the portion of an immunogen that interacts with b-lymphocytes. as a result, the b-lymphocyte differentiates into an antibody-secreting plasma cell and a memory cell. the iedb resource was therefore used for analysis. 
the envelope protein was subjected to the bepipred linear epitope prediction [ ] , emini surface accessibility [ ] , kolaskar and tongaonkar antigenicity [ ] , parker hydrophilicity [ ] , chou and fasman beta turn [ ] and karplus & schulz flexibility [ ] prediction methods in iedb, which predict the probability of specific regions of the protein binding to the b cell receptor, being on the surface, being immunogenic, being in a hydrophilic region, being in a beta turn region and being flexible, respectively. potentially continuous b cell epitopes were predicted using the ellipro tool from the iedb resource [ ] . the allergenicity of the predicted epitopes was analyzed with the allertop tool [ ] . the toxinpred server was used to assess the toxicity of the epitopes [ ] . t-cell epitopes were predicted by the netctl server [ ] . the parameter was set at to obtain the highest specificity and sensitivity of . and . , respectively, and all supertypes were selected when the protein sequence was submitted. a combined algorithm integrating major histocompatibility complex (mhc)- binding, transporter of antigenic peptide (tap) transport efficiency and proteasomal cleavage efficiency was used to predict the overall scores [ ] . on the basis of the combined score, the five best epitopes were first selected for further testing as putative epitope vaccine candidates. mhc- binding t cell epitopes were predicted by iedb using the stabilized matrix method (smm) for each peptide [ ] . prior to prediction, all epitope lengths were set to mers, and conserved epitopes that bind to many hla alleles with a percentile rank equal to or less than . were selected. for further analysis, alleles with ic less than nm were selected. peptides with higher immunogenicity are more likely to be ctl epitopes than those with lower immunogenicity. therefore, the iedb immunogenicity prediction tool was used to predict the immunogenicity of the candidate epitopes [ ] . 
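the ctl selection step described above (a combined score built from mhc binding, proteasomal cleavage and tap transport, followed by picking the five best peptides) can be sketched as follows. this is a minimal illustration, not the netctl implementation: the `Candidate` class, the function names, the linear weighting and every number passed in are assumptions introduced here, and the actual weights and threshold used in the study are not stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate peptide with its three per-step prediction scores."""
    peptide: str
    mhc_score: float       # predicted MHC class I binding score
    cleavage_score: float  # predicted proteasomal C-terminal cleavage score
    tap_score: float       # predicted TAP transport efficiency score


def combined_score(c, w_cleavage, w_tap):
    """Weighted sum of the three scores (weights are caller-supplied assumptions)."""
    return c.mhc_score + w_cleavage * c.cleavage_score + w_tap * c.tap_score


def top_candidates(cands, threshold, w_cleavage, w_tap, n=5):
    """Keep peptides whose combined score reaches the threshold, best first.

    n=5 mirrors the "five best epitopes" mentioned in the text.
    """
    scored = [(combined_score(c, w_cleavage, w_tap), c.peptide) for c in cands]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(reverse=True)  # highest combined score first
    return kept[:n]
```

keeping the weights and the threshold as explicit arguments avoids hard-coding values that the text does not report.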
binding of peptides to mhc class ii molecules was assessed with the iedb mhc ii prediction tool, using the smm-based netmhciipan . server [ ] . it covers all hla class ii alleles, including hla-dr, hla-dq, and hla-dp [ ] . an ic below nm indicates maximum interaction potential between an htl epitope and an mhc ii allele [ ] . accordingly, the five top epitopes were selected. the predicted htl epitopes were submitted to the ifn epitope server to check whether the mhc ii binding epitopes had the ability to induce ifn-γ [ ] . all potential mhc i and mhc ii binders from the envelope protein were assessed for population coverage against the whole world population in which covid- cases had been reported. calculations were performed with the iedb population coverage calculation tool using the selected mhc-i and mhc-ii interacting alleles [ ] . epitopes predicted to bind mhc i alleles with a percentile rank below . were selected as ligands and modeled using the pep-fold online peptide modeling tool [ ] . the d structure of the receptor mhc i allele was obtained from the pdb server [ ] . the patchdock program was used for all dockings [ ] . pymol and chimera were used for visualization, for determination of binding affinity and to show the suitable epitopes binding with the lowest energy. the protein sequence of the envelope protein from the severe acute respiratory syndrome coronavirus isolate indian strain, retrieved in fasta format, was screened using the vaxijen server to predict its immunogenicity. in the present study, the envelope protein (qia . ) was predicted to be an antigenic protein based on the overall score from the vaxijen server, indicating that it is immunogenic. a total of envelope protein sequences retrieved from the ncbi database were aligned to assess the conservation of the predicted epitopes. b and t cell epitopes were predicted, and population coverage was calculated, using the iedb analysis resource. 
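the conservation check mentioned above can be illustrated with a short sketch that reports the fraction of sequences containing an epitope as an exact match. this is a simplification and not the tool used in the study: the function name `conservancy` is hypothetical, exact substring matching is an assumption (dedicated conservancy tools typically allow identity thresholds), and the sequences in the usage note are made-up toy strings, not real envelope protein sequences.

```python
def conservancy(epitope, sequences):
    """Return the fraction of sequences that contain the epitope exactly.

    Alignment gap characters ('-') are removed before matching, so aligned
    rows from a multiple sequence alignment can be passed in directly.
    """
    if not sequences:
        raise ValueError("no sequences supplied")
    target = epitope.upper()
    hits = 0
    for seq in sequences:
        ungapped = seq.upper().replace("-", "")
        if target in ungapped:
            hits += 1
    return hits / len(sequences)
```

for example, `conservancy("flafvvfll", ["AAFLAFVVFLLAA", "FLAF--VVFLL"])` counts both toy sequences as containing the epitope, because matching is case-insensitive and ignores gaps.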
the three-dimensional structure of the envelope protein of sars-cov- was modelled using the homology modelling tool swissmodeller (fig. ) . a good model was obtained with swissmodeller using pdb id: x as a template, which has more than % identity and % similarity with the query structure. these models were energy minimized using chimera. the ramachandran plot and prosa z-score validation (fig. ) indicated that > % of residues lie in the favoured region for the modelled envelope protein. the conformational b-cell epitopes were also obtained in the five chains of the envelope protein using ellipro. ellipro assigns each output epitope a score, which is the protrusion index (pi) value averaged over the epitope residues [ ] . some ellipsoids approximated the tertiary structure of the protein. the highest probability of a conformational epitope was calculated at % (pi score: . ). the residues involved in conformational epitopes, their number, location and scores are shown in table ; the srvknl residues were found to have the highest pi score. this epitope is antigenic, nonallergic, nontoxic, and conserved in sars-cov- . their positions on the d structures are also shown in . the envelope protein from sars-cov- was analyzed using the iedb mhc- binding prediction tool to predict t cell epitopes suggested to interact with different types of mhc class i alleles. based on the netctl and smm-based iedb mhc-i binding prediction tools, epitopes with higher affinity (ic less than ) were predicted to interact with different mhc- alleles. the proteasome score, tap score, mhc score, processing score, and mhc-i binding are summarized as a total score in table . among these t-cell epitopes, the -mer epitope flafvvfll was found to have the highest immunogenicity, exceeding that of the other epitopes, and to have more allelic interactions with good population coverage than the other epitopes. 
in the same way as with the iedb mhc- binding prediction tool, t-cell epitopes from sars-cov- were analyzed using the mhc-ii binding prediction method, based on smm-based netmhciipan with ic less than . there were top predicted epitopes, found to be nonallergic and antigenic, that interact with mhc-ii alleles; the peptides and their cores are listed in table- and . epitopes suggested to interact with mhc-i and ii alleles (especially high-affinity binding epitopes and those that can bind to different sets of alleles) were selected for population coverage analysis. the population coverage results for all epitopes are listed in table and . the flafvvfll epitope, which interacts with the most frequent mhc class i and ii alleles, gave a high coverage percentage for the whole world population with the iedb population coverage tool. the maximum combined class i and ii population coverage ( . %) for this proposed epitope was found in north america (table- ); high population coverage was also found in europe ( . %) and east asia ( . %), followed by south asia ( . %) and north africa ( . %), then northeast asia ( . %) and southeast asia ( . %). table gives the combined mhc class i and ii coverage for populations in other areas. proposed t-cell epitope flafvvfll (green) and b cell epitope srvknl (yellow) in the pentamer structure of the e protein of sars-cov- . the predicted t cell epitope flafvvfll, which interacted with selected human mhc- and ii alleles, was used as the ligand (fig. ) to examine its interaction with the alleles/receptors by docking with the online software patchdock. after successful docking with patchdock, refinement and re-scoring of the docking results were carried out by the firedock server. after refinement of the docking scores, the firedock server generates global energies/binding energies for the best solutions. chimera was used to visualize the best results. 
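the idea behind the population coverage figures reported above can be sketched with a simplified calculation. this is not the iedb tool's implementation: the sketch assumes hardy-weinberg proportions within each hla locus and independence between loci, the function names are hypothetical, and the allele frequencies in the usage note are invented purely for illustration.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one epitope-binding allele
    at a single locus, assuming Hardy-Weinberg proportions.

    allele_freqs: frequencies of the binding alleles at this locus.
    """
    p = sum(allele_freqs)
    if not 0.0 <= p <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    # An individual is uncovered only if neither chromosome carries a binder.
    return 1.0 - (1.0 - p) ** 2


def combined_coverage(freqs_by_locus):
    """Coverage across several loci, assuming the loci are independent."""
    uncovered = 1.0
    for freqs in freqs_by_locus.values():
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered
```

for instance, `combined_coverage({"class_i": [0.2, 0.1], "class_ii": [0.25]})` combines two hypothetical loci; the locus labels and frequencies are placeholders, and real analyses would use population-specific allele frequency data.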
the d structures of the epitopes were predicted using pep-fold, and energy minimization was carried out using chimera. the pose with the lowest binding energy (kcal/mol) was selected as the best binding pose, in order to predict ctl and htl epitopes as realistically as possible. the receptors used for the docking studies were reported hlas: hla-c* : (pdb id: efx) for class i and hla-drb * : (pdb id: aqd) for class ii. hla-c* : and hla-drb * : were observed to interact with the flafvvfll epitope with low binding energies of - . kcal/mol and - . kcal/mol, respectively ( fig. and ) . the predicted peptide showed significant binding affinities with all hlas. the binding energies of the predicted epitopes were also compared with those of experimentally verified peptides and were found to be negative [ ] . in this study, we aimed to determine highly immunogenic epitopes for b and t cells. because the t cell immune response is longer lasting than the b cell response, in which the antigen can easily escape the antibody memory response [ ] , and because cd + t and cd + t cell responses play a major role in antiviral immunity [ ] , designing a vaccine against t cell epitopes is much more promising. the flafvvfll epitope could be used as a potential candidate because it had the maximum combined score and immunogenicity score. moreover, it had the largest number of hla binding alleles among the ctl and htl epitopes. this epitope was found to be antigenic, non-toxic and nonallergic. an ideal epitope should be highly conserved. conservancy analysis indicated that this epitope is conserved in all sars-cov- sequences considered in this study. we found these ctl epitopes to also be htl epitopes. the overlap between mhc class i and ii t cell epitopes suggests the possibility of antigen presentation to immune cells via both the mhc class i and ii pathways, especially for the overlapping sequences. 
to conclude, by using e protein one epitope, srvknl was proposed for an international therapeutic peptide vaccine for b cell. regarding t cell, the flafvvfll epitope was highly recommended as a therapeutic peptide vaccine to interact with both mhc class i and ii. we recommend in vitro and in vivo validation for the efficacy and efficiency of these predicted candidate epitopes as a vaccine as well as to be used as a diagnostic screening test. author contribution: renu jakhar conducted the study, performed in silico analysis and wrote the manuscript. s.k. gakhar plans the study and revises the manuscript. outbreak of pneumonia of unknown etiology in wuhan china: the mystery and the miracle a new coronavirus associated with human respiratory disease in china the -new coronavirus epidemic: evidence for virus evolution emerging coronaviruses: genome structure, replication, and pathogenesis a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster detection of novel coronavirus ( -ncov) by real-time rt-pcr cross-species transmission of the newly identified coronavirus -ncov recent advances in the detection of respiratory virus infection in humans the continuing -ncov epidemic threat of novel coronaviruses to global health-the latest novel coronavirus outbreak in wuhan epitope-based vaccine target screening against highly pathogenic mers-cov: an in silico approach applied to emerging infectious diseases structural basis of development of multi-epitope vaccine against middle east respiratory syndrome using in silico approach epitope-based peptide vaccine design and target site depiction against middle east respiratory syndrome coronavirus: an immune-informatics study recent advances in the vaccine development against middle east respiratory syndrome-coronavirus preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological 
studies coronavirus envelope protein: current knowledge the membrane protein of severe acute respiratory syndrome coronavirus acts as a dominant immunogen revealed by a clustering region of novel functionally and structurally defined cytotoxic tlymphocyte epitopes a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- analysis of the genome sequence and prediction of b-cell epitopes of the envelope protein of middle east respiratory syndrome-coronavirus the membrane protein of severe acute respiratory syndrome coronavirus functions as a novel cytosolic pathogen-associated molecular pattern to promote beta interferon induction via a toll-like-receptor-related traf -independent mechanism exceptionally potent neutralization of middle east respiratory syndrome coronavirus by human monoclonal antibodies evaluation of candidate vaccine approaches for mers-cov a decade after sars: strategies for controlling emerging coronaviruses more than one reason to rethink the use of peptides in vaccine design vaxijen: a server for prediction of protective antigens, tumour antigens and subunit vaccines swiss-model: homology modelling of protein structures and complexes ucsf chimera, a visualization system for exploratory research and analysis stereochemistry of polypeptide chain configurations prediction of residues in discontinuous b-cell epitopes using protein d structures induction of hepatitis a virus-neutralizing antibody by a virus-specific synthetic peptide a semi-empirical method for prediction of antigenic determinants on protein antigens new hydrophilicity scale derived from highperformance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and x ray-derived accessible sites prediction of the secondary structure of proteins from their amino acid sequence prediction of chain flexibility in proteins protection from ebola virus mediated by cytotoxic t lymphocytes 
specific for the viral nucleoprotein allertop -a server for in silico prediction of allergens open source drug discovery consortium. in silico approach for predicting toxicity of peptides and proteins large-scale validation of methods for cytotoxic t-lymphocyte epitope prediction sensitive quantitative predictions of peptide-mhc binding by a 'query by committee'artificial neural network approach the immune epitope database (iedb) . properties of mhc class i presented peptides that enhance immunogenicity netmhciipan- . , a common panspecific mhc class ii prediction method including all three human mhc class ii isotypes protection from ebola virus mediated by cytotoxic t lymphocytes specific for the viral nucleoprotein toward more accurate pan-specific mhcpeptide binding prediction: a review of current methods and tools novel immunoinformatics approaches to design multi-epitope subunit vaccine for malaria by investigating anopheles salivary protein predicting population coverage of t-cell epitope-based diagnostics and vaccines pep-fold: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides the rcsb protein data bank: a redesigned query system and relational database based on the mmcif schema patchdock and symmdock: servers for rigid and symmetric docking a comprehensive analysis of aminopeptidase n protein (apn) from anopheles culicifacies for epitope design using immuno-informatics models key: cord- -ofpna k authors: schubert, katharina; karousis, evangelos d.; jomaa, ahmad; scaiola, alain; echeverria, blanca; gurzeler, lukas-adrian; leibundgut, marc; thiel, volker; mühlemann, oliver; ban, nenad title: sars-cov- nsp binds ribosomal mrna channel to inhibit translation date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ofpna k the non-structural protein (nsp ), also referred to as the host shutoff factor, is the first viral protein that is synthesized in sars-cov- infected human cells to suppress host innate immune functions , . by combining cryo-electron microscopy and biochemical experiments, we show that sars-cov- nsp binds to the human s subunit in ribosomal complexes including the s pre-initiation complex. the protein inserts its c-terminal domain at the entrance to the mrna channel where it interferes with mrna binding. we observe potent translation inhibition in the presence of nsp in lysates from human cells. based on the high-resolution structure of the s-nsp complex, we identify residues of nsp crucial for mediating translation inhibition. we further show that the full-length ’ untranslated region of the genomic viral mrna stimulates translation in vitro, suggesting that sars-cov- combines inhibition of translation by nsp with efficient translation of the viral mrna to achieve expression of viral genes . specialized mechanisms to hijack the host gene expression machinery and employ cellular resources to regulate viral protein production. such mechanisms are common for many viruses and include inhibition of host protein synthesis and endonucleolytic cleavage of host messenger rnas (mrnas) , . in cells infected with the closely related sars-cov, one of the most enigmatic viral proteins is the host shutoff factor nsp . nsp is encoded by the gene closest to the '-end of the viral genome and is among the first proteins to be expressed after cell entry and infection to repress multiple steps of host protein expression , , , . initial structural characterization of the isolated sars-cov nsp protein revealed the structure of its n-terminal domain, whereas its c-terminal region was flexibly disordered . interestingly, sars-cov nsp suppresses host innate immune functions, mainly by targeting type i interferon expression and antiviral signaling pathways . 
taken together, nsp serves as a potential virulence factor for coronaviruses and represents an attractive target for live attenuated vaccine development , . to provide molecular insights into the mechanism of nsp -mediated translation inhibition, we solved the structures of ribosomal complexes isolated from hek lysates supplemented with recombinant purified nsp as well as of an in vitro reconstituted s-nsp complex using cryo-em. we complement our findings by reporting in vitro translation inhibition in the presence of nsp that is relieved after mutating key interacting residues. furthermore, we show that the translation output of reporters containing full length viral ΄utrs is greatly enhanced, which could explain how nsp inhibits global translation while still translating sufficient amounts of viral mrnas. to elucidate the mechanism of how nsp inhibits translation, we aimed to identify the structures of potential ribosomal complexes as binding targets. previously, it has been suggested that nsp mainly targets the ribosome at the translation initiation step . therefore, we treated lysed hek e with bacterially expressed and purified nsp and loaded the cleared lysate on a sucrose gradient. fractions containing ribosomal particles were then analyzed for the presence of nsp . interestingly nsp not only co-migrated with s particles, but also with s ribosomal complexes (fig. a) , suggesting that it interacts with a range of different ribosomal states. we then pooled all sucrose gradient fractions containing ribosomal complexes and investigated them using cryo-em. this analysis revealed a s pre-initiation complex (pic) encompassing the initiation factor eif core, eif , the ternary complex comprising eif and initiator trnai with additional density in the mrna entrance channel that could not be assigned to mrna (fig. b,c; extended data fig. ) . to unambiguously attribute this extra density to nsp , we assessed whether nsp binds purified ribosomal s subunits alone. 
in vitro binding assays using sucrose density centrifugation showed that nsp associates with s ribosomal subunits since it co-pelleted with the s (fig. d) . however, nsp did not interact with s subunits, suggesting that the interaction with s subunits is specific. based on these results we assembled in vitro a s-nsp complex and determined its structure at . Å resolution using cryo-em (extended data fig. ). the molecular details revealed by these maps allowed us to identify the density as the c-terminal region of nsp and build an atomic model. docking of the model into the maps of the s pic obtained from the hek e cell lysates clearly showed that the c-terminus of nsp is also associated with the s pic ( fig. b; extended data fig. ). as observed in the high-resolution structure of the s-nsp complex, the c-terminal part of nsp in the mrna entrance channel (fig. e ) folds into two helices that interact with h of the s rrna as well as proteins us in the head and us and es in the body, respectively ( fig. f; fig. a ). in both complexes, nsp binds in the mrna entrance channel on the s subunit, where it would partially overlap with the fully accommodated mrna. consequently, mrna was not observed due to nsp binding. the high-resolution reconstruction also revealed the network of molecular interactions between nsp and the s subunit. the first c-terminal helix (residues - ) interacts with us and us through multiple hydrophobic side chains such as y , f and w (fig. b) . the two helices are connected by a short loop containing the kh motif that establishes stacking interactions with helix h of the s rrna through u and u as well as backbone binding (fig. c) . the second helix (residues - ), localized in proximity of the es c-terminus, interacts with the phosphate backbone of h via the two conserved arginines r and r (fig. d ). an additional weak density at the head of the small subunit in the proximity of es between h and us was observed. 
this may correspond to the flexibly disposed n-terminal domain of nsp considering the amino acid long unstructured linker between the n- and c-terminal domains (fig. e, extended data fig. ). however, we cannot exclude that this density corresponds to unassigned ribosomal protein segments in the vicinity, as the c-terminal amino acids of es (head) or the n-terminal amino acids of us (body) could become better ordered in the context of nsp binding. thus, it appears that nsp is tightly bound to the s subunit through anchoring of its c-terminal helices to the mrna channel, while the n-terminal domain can sample space in the radius of approximately Å from its attachment point. luciferase-encoding reporter mrna (rluc) in an s hela lysate in vitro translation system ( fig. d) . wt nsp was recombinantly expressed and purified and its effect on translation was tested by adding increasing amounts of the protein to hela cell lysates containing capped and polyadenylated rluc mrna control transcript. we observed a concentration-dependent inhibition of translation where almost full inhibition was reached at µm of nsp (fig. b) . to dissect the contributions of the observed interactions between the s and nsp on the inhibition of translation, we used our structural information to design several mutants targeting key amino acids in helix (double mutant y a / f a), the kh motif (double mutant k a / h a), and in helix (double mutant r e / r e) as summarized in fig. a . we also rationalized our mutations based on the high conservation of the nsp c-terminus between sars-cov- , sars-cov and closely related bat coronaviridae, with sequence identities above % for the orf ab (encoding for polyproteins pp a and pp ab) and key amino acids highly conserved ( fig. e ; extended data fig. ) . furthermore, the kh mutant had been described to abolish interaction with s in sars-cov . 
in contrast to the wt protein, the three mutants did not affect translation of the rluc control mrna, even at concentrations of µm (fig. b) . consistently, the mutants lost their ability to bind s ribosomal subunits indicating that the c-terminal domain is primarily responsible for the affinity of nsp for the ribosome (fig. c) . these results also agree with our structural findings where the c-terminal domain of nsp is responsible for specific contacts with the ribosome, whereas the n-terminal domain is flexibly disposed. interestingly, this mrna binding inhibition mechanism may be unique to sars-cov- and closely related beta-coronaviruses, since the c-terminal region of nsp is shorter in alpha-coronaviruses and is not highly conserved amongst other beta-coronaviruses including mers-cov, the latter being consistent with the observation that mers-cov nsp does not bind the ribosome . since translation of viral mrna competes with translation of cellular mrnas, the inhibitory effects of nsp in the context of special features of the sars-cov- genomic rna need to be considered. therefore, we investigated the differences in the translation of reporter mrnas with viral vs. cellular ´utrs and the relative inhibitory effect of nsp . using the in vitro translation system described above, we compared the translation efficiency of rluc reporters harboring the full-length ΄utr of the sars-cov- genomic rna (fl-rluc) with the translation of equimolar amounts of a native rluc reporter (fig. d) . we observed a significant five-fold increase in translation when the reporter mrna included the viral ΄utr, suggesting that the viral mrna is more efficiently translated than host mrnas (fig. e) . nevertheless, titration of wt nsp inhibited translation of both mrnas, fl-rluc and native rluc, to the same extent (fig. f ). these findings, as well as evidence from sars-cov , indicate that nsp acts as a general inhibitor of translation initiation. 
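the comparisons described above (inhibition relative to a reaction without added protein, and the fold change between two reporters) reduce to simple ratios. the sketch below shows one way such luciferase readings could be normalised; the function names and the dictionary layout are assumptions introduced here, and no values from the study are used.

```python
def relative_translation(signal_by_conc):
    """Normalise luciferase signals to the reaction without added protein.

    signal_by_conc maps protein concentration to raw luciferase signal and
    must contain the key 0.0 (the untreated control).
    """
    if 0.0 not in signal_by_conc:
        raise ValueError("a 0.0 concentration control is required")
    control = signal_by_conc[0.0]
    if control <= 0:
        raise ValueError("control signal must be positive")
    # Sorted by concentration so the titration reads from low to high.
    return {conc: sig / control for conc, sig in sorted(signal_by_conc.items())}


def fold_change(test_signal, reference_signal):
    """Ratio of a test reporter signal to a reference reporter signal."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return test_signal / reference_signal
```

a value of 1.0 from `relative_translation` corresponds to the untreated control, and values approaching 0 indicate near-complete inhibition.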
our structural data suggest that sars-cov- nsp inhibits translation by sterically occluding the entrance region of the mrna channel and interfering with binding of cellular mrnas (fig. a,b) . however, the question of how ribosomes in virus-infected cells are recruited to efficiently translate the viral mrna remains open. our results on the inhibitory mechanism of nsp together with the translation-stimulating features of the viral mrna provide a possible explanation. first, nsp will act as a strong inhibitor of translation that tightly binds ribosomes and reduces the pool of available ribosomes that can engage in translation. under ribosome limiting conditions, translation from more efficient viral mrna is then likely to be favored (fig. c) . therefore, the combination of an nsp -mediated general translation inhibition and an enhanced translation efficiency of viral transcripts appears to lead to an effective switch of translation from host cell mrnas towards viral mrnas. considering that nsp can inhibit its own translation, the virus tunes cellular levels of nsp exactly below the concentrations necessary to inhibit viral mrna translation, but possibly enough to inhibit translation initiation from less efficiently recruited cellular mrnas. through this mechanism, we propose that nsp would be able to inhibit global cellular translation particularly for mrnas responsible for the host innate immune response, while the remaining ribosomes would still be able to translate viral mrnas with high efficiency. during the course of viral infection, the effect of viral mrnas on shifting protein synthesis machinery towards production of viral proteins would be increasingly strong since their levels are known to increase to % of total cellular rnas . 
the identification of the c-terminal region of nsp as the key domain for ribosome interactions that are essential for controlling cellular response to viral infections will be helpful in designing attenuated strains of sars-cov- for vaccine development. furthermore, these results provide an excellent basis for structure-based experiments aimed at investigating nsp function in vivo by using viral model systems. (a) sucrose gradient fractionation of hek lysate supplemented with nsp . nsp co-migrates with s and s ribosomal particles in a - % (w/v) sucrose gradient. his -tagged nsp is visualized by western blot using an α-his antibody, while the rrna content in corresponding fractions is monitored on an agarose gel. all samples for the western blot derive from the same experiment and the blots were processed in parallel. (b-c) overview of nsp (red) binding to a s pic containing the core of initiation factor eif (cyan), eif (blue) and the eif -trna ternary complex (magenta). (d) in the in vitro binding assay, wt nsp was added to s and s ribosomal su and loaded on a % (w/v) sucrose cushion. unbound proteins remained in the supernatant (sn), while bound nsp co-pelleted with s (p). (e) overview of nsp binding to the small ribosomal subunit. nsp (red) binds close to the mrna entry site and contacts us (blue) from the ribosomal s head as well as us (green), the c-terminus of us (orange) and h of the s rrna (grey) of the s body. human ribosomal subunits were purified as described , and final samples were flash-frozen in liquid nitrogen at a concentration of mg/ml (od of ) and stored at - °c. to verify nsp - s complex formation, we performed binding assays using sucrose density centrifugation. thawed human s and s ribosomal subunits were adjusted to a final and that had been treated with the same enzymes. 
the nt-long ΄utr of sars-cov genomic mrna sequence was subcloned to replace the rluc ' utr by fusion pcr using primers tctgcagaattcgcccttcatg and gccctatagtgagtcgtattacaattcact for vector amplification and the pair gactcactatagggcaactttaaaatctgtgtggctgtcact and ggcgaattctgcagacttacctttcggtcacacccg for amplification of the ΄utr fragment using 'utr-egfp cloned in puc vector as a template, which was designed to possess the sars-cov- 'utr sequence in front of the egfp coding sequence. preparation of in vitro transcribed mrnas was performed as described capping buffer (new england biolabs). the capping reaction was carried out at °c for h and quenched by the addition of acidic p.c.i., followed by rna purification. finally, the integrity of the capped mrnas was verified by agarose gel electrophoresis. hela s lysates were prepared similarly as described before . briefly, lysates were prepared from s hela cell cultures grown to a cell density ranging from - x in vitro translation reactions were performed similarly as described before additionally, before precipitation, samples were taken for analysis on agarose gels ( . % bleach, % (w/v) agarose). quantifoil for each sample, one grid was selected for data collection using a titan krios cryo-transmission electron microscope (thermo fisher scientific) operating at kv and equipped with either a falcon ec camera (thermo fisher scientific) in integration mode or a k camera (gatan), which was run in counting and super-resolution mode, mounted to a gif quantum ls operated with an energy filter slit width of ev. the falcon ec datasets were collected at a nominal magnification of ' x (pixel size of . Å/pixel), while for the k datasets a nominal magnification of ' x was used (physical pixel size of . Å/pixel, which corresponds to a super-resolution pixel size of . Å/pixel). for counting mode, illumination conditions were adjusted to an exposure rate of e -/pixel/second. 
micrographs were recorded as movie stacks at an electron dose of ~ e -/Å applied over frames. for both datasets, the defocus was varied from approximately - to - μm. the stacks of frames were first aligned to correct for motion during exposure, dose-weighted and gain-corrected using motioncor ( )). in short, the particle set was first cleaned from the preferentially oriented particles based on their orientation parameters, which reduced the particle set to ' particles. those particles were then further classified for their quality and for the presence of nsp using a focused d classification approach. the final set of particle images was refined using a global d refinement. to further improve the local resolution of the s-nsp complex, masks around the s head and body were generated using ucsf chimera by creating a mask which was extended by Å around a fitted model of the s subunit. those masks were used for a multi-body refinement in relion . . finally, the two focused maps were combined to generate a composite d map of the entire in vitro reconstituted s-nsp complex. for the hek cell extract, after d classifications, ab initio reconstruction was performed in cryosparc , and the determined volumes were used as starting references for a heterogeneous refinement in cryosparc (extended data fig. ). the ' particle images corresponding to the s ribosomal subunit were selected for a further round of heterogeneous refinement in cryosparc , which resolved a density corresponding to initiation factor eif in a fraction of the particles. to improve the occupancy of eif , particle images belonging to the s subunit class were then subjected to a focused d classification in relion . using a circular mask on the eif region. the d class depicting the best density for eif was selected ( ' particle images) and was then used for a global d refinement. to further improve the resolution of the nsp -bound region, a focused refinement was done using a mask on the body of the s subunit. 
for building of the s-nsp complex, the head and body of pdb oa were docked as rigid bodies into the . Å head and body maps that were obtained by focused classification (extended data fig. ). the structures were adjusted manually into the high-resolution maps using coot , and the c-terminus of nsp (residues - ), which was well-resolved in the map of the s body, was built de novo. the coordinates were subjected to cycles of real space refinement using phenix . . to stabilize the refinement in less well-resolved peripheral areas, protein secondary structure and ramachandran as well as rna base pair restrains were applied. remaining discrepancies between models and maps as well as missing mg + ions were detected using real space difference maps, and after model completion the coordinates were refined for two additional cycles. the resulting final models have excellent geometries and correlations between the maps and models (table , extended data fig. ). the structures were validated using molprobity and by comparison of the model vs. map fscs at values of . , which coincided well with the fscs between the half-sets of the em reconstruction using the fsc= . criterion (extended data fig. ). to assemble the full s-nsp complex, both refined structures were docked into a . Å chimeric map comprising the complete s-nsp . after readjustment of the head-to-body connections, the complete model was subjected to two additional rounds of real space refinement as described above. the . Å and . Å maps of the nsp - s pic shown in extended data fig. 
and the high-resolution cryo-em maps of the complete s-nsp complex, the s body and the viral and cellular mrna translation in coronavirus-infected cells severe acute respiratory syndrome coronavirus nsp protein suppresses host gene expression by promoting host mrna degradation the architecture of sars-cov- transcriptome the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- human coronaviruses: a review of virus-host interactions identification and characterization of severe acute respiratory syndrome coronavirus replicase proteins mechanisms and enzymes involved in sars coronavirus genome expression a two-pronged strategy to suppress host protein synthesis by sars coronavirus nsp protein sars coronavirus nsp protein induces template-dependent endonucleolytic cleavage of mrnas: viral mrnas are resistant to nsp -induced rna cleavage severe acute respiratory syndrome coronavirus protein nsp is a novel eukaryotic translation inhibitor that represses multiple steps of translation initiation novel β-barrel fold in the nuclear magnetic resonance structure of the replicase nonstructural protein from the severe acute respiratory syndrome coronavirus severe acute respiratory syndrome coronavirus nsp suppresses host gene expression, including that of type i interferon, in infected cells severe acute respiratory syndrome coronavirus evades antiviral signaling: role of nsp and rational design of an attenuated strain coronavirus non-structural protein is a major pathogenicity factor: implications for the rational design of coronavirus vaccines human nmd ensues independently of stable ribosome stalling genomic variance of the -ncov coronavirus middle east respiratory syndrome coronavirus nsp inhibits host gene expression by selectively targeting mrnas transcribed in the nucleus while sparing mrnas of cytoplasmic origin structural and functional insights into human re-initiation complexes motioncor : anisotropic correction of 
beam-induced motion for improved cryo-electron microscopy gctf: real-time ctf determination and correction new tools for automated high-resolution cryo-em structure determination in relion- prevention of overfitting in cryo-em structure determination algorithms for rapid unsupervised cryo-em structure determination integrated tools for structural and sequence alignment and analysis characterisation of molecular motions in cryo-em single-particle data by multi-body refinement in relion features and development of coot macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix molprobity: all-atom structure validation for macromolecular crystallography structural insights into the mammalian late-stage initiation complexes the phyre web portal for protein modeling, prediction and analysis conformational differences between open and closed states of the eukaryotic translation initiation complex structure and interactions of the translation initiation factor eif deciphering key features in protein structures with the new endscript server we thank the eth scientific center for optical and electron microscopy (scopem) and the cryoem knowledge hub (cemk), in particular d. böhringer, for technical support and the opportunity to continue our work in spite of the eth lockdown due to the covid- pandemic.we thank the functional genomics center zurich (fgcz) for the help with mass-spectrometry.the authors would like to thank their teams for the support in the lab, and especially to m. jia, p. bhatt and d. yudin for creating a productive working atmosphere. nb and ks initiated the project and designed the experiments. ks expressed proteins, together with be, and prepared samples for cryo-em. ks, aj and as prepared grids, carried out data collection and processing. edk and om designed translation experiments, edk and lg were involved in cloning and edk performed in vitro translation reactions, with the help of lg. 
ks and be performed sucrose binding assays.ml was involved in structure modelling and refinement as well as in figure preparation. nb and ks coordinated the project. all authors contributed to the final version of the manuscript. the authors declare no competing interests. extended data figure : data processing of the hek cell extract cryo-em dataset.scheme for the processing of the hek cell extract sample. local resolution estimates are plotted as heat map on the final volume accompanied with a slice through the volume. the half map vs. half map fsc curves are shown for the overall refinement (purple) and the refinement focused on the body (blue). figure : data processing of the in vitro reconstituted s-nsp cryo-em dataset. scheme of the processing steps performed for the sample of the in vitro binding experiment. the local resolution distribution is plotted on the final volumes as heat map, together with additional slices through the volumes. the half map vs. half map fsc curves are plotted for the overall refinement (purple), the refinement focused on the body (blue), on the head (cyan), as well as for the composite map (green). the map vs. model fscs are plotted for the body (yellow) and the head (orange) in their respective focused maps, as well as for the full s in the composite map (red). nsp sequences of human sars-cov, human sars-cov- and sars-related bat coronaviruses were obtained from the uniprot (www.uniprot.org) and genbank (www.ncbi.nlm.nih.gov/genbank) databases. the sequences were aligned using clustal omega (www.ebi.ac.uk/tools/msa/clustalo). the alignment was visualized with espript . note that sequences of mers nsp and other human coronaviruses were not included in the alignment due to lack of sequence homology. for displaying the secondary structure, the atomic coordinates of the sars-cov nsp n-terminus (pdb hsx) and of the nsp c-terminus (this publication) were combined. 
regions of known structure are highlighted with blue (n-terminal domain) and red (c-terminal domain) bars. unresolved regions are indicated by dotted lines, including the ~ Å unstructured linker (black) and the n-terminus (blue). double mutations analyzed in this study are shown as asterisks in different colors. sugar pucker outliers (%): key: cord- -rfuyd authors: dellicour, simon; durkin, keith; hong, samuel l.; vanmechelen, bert; martí-carreras, joan; gill, mandev s.; meex, cécile; bontems, sébastien; andré, emmanuel; gilbert, marius; walker, conor; de maio, nicola; faria, nuno r.; hadfield, james; hayette, marie-pierre; bours, vincent; wawina-bokalanga, tony; artesi, maria; baele, guy; maes, piet title: a phylodynamic workflow to rapidly gain insights into the dispersal history and dynamics of sars-cov- lineages date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rfuyd since the start of the covid- pandemic, an unprecedented number of genomic sequences of the causative virus (sars-cov- ) have been generated and shared with the scientific community. the unparalleled volume of available genetic data presents a unique opportunity to gain real-time insights into the virus transmission during the pandemic, but also a daunting computational hurdle if analysed with gold-standard phylogeographic approaches. we here describe and apply an analytical pipeline that is a compromise between fast and rigorous analytical steps. as a proof of concept, we focus on the belgium epidemic, with one of the highest spatial density of available sars-cov- genomes. at the global scale, our analyses confirm the importance of external introduction events in establishing multiple transmission chains in the country. at the country scale, our spatially-explicit phylogeographic analyses highlight that the national lockdown had a relatively low impact on both the lineage dispersal velocity and the long-distance dispersal events within belgium. 
our pipeline has the potential to be quickly applied to other countries or regions, with key benefits in complementing epidemiological analyses in assessing the impact of intervention measures or their progressive easement. * corresponding author (simon.dellicour@ulb.ac.be) keywords: covid- , sars-cov- , phylodynamic, phylogeography, phylogenetic clusters, lockdown measures. first reported in early december in the province of hubei (china), covid- (coronavirus disease ) is caused by a new coronavirus (severe acute respiratory syndrome coronavirus ; sars-cov- ) that has since rapidly spread around the world , , causing an enormous public health and social-economic impact , .
since the early days of the pandemic, there has been an important mobilisation of the scientific community to understand its epidemiology and help provide a real-time response. to this end, research teams around the world have massively sequenced and publicly released tens of thousands of viral genome sequences to study the origin of the virus, and to trace its spread at global, country or community-level scales. in this context, a platform like nextstrain, already widely used and recognised by the academic community and beyond, has quickly become a reference to follow the travel history of sars-cov- lineages. in the context of the covid- pandemic, the volume of genomic data available presents a unique opportunity to gain valuable real-time insights into the dispersal dynamics of the virus. yet, the number of available viral genomes is increasing every day, leading to substantial computational challenges. while bayesian phylogeographic inference represents the gold standard for inferring the dispersal history of viral lineages, these methods are computationally intensive and will fail to provide useful results in an acceptable amount of time. to tackle this practical limitation, we here describe and apply an analytical pipeline that is a compromise between fast and rigorous analytical steps. in practice, we propose to take advantage of the rapid time-scaled phylogenetic tree inference process used by the online nextstrain platform. specifically, we aim to use the resulting time-scaled tree as a fixed empirical tree along which we infer the ancestral locations with the discrete and spatially-explicit phylogeographic models implemented in the software package beast . . in belgium, there are two main laboratories (from the university of leuven and the university of liège) involved in sequencing sars-cov- genomes extracted from confirmed covid- positive patients.
to date, some genomes (n= ) have also been sequenced at the university of ghent, but metadata about their geographic origin are unavailable. as of june , , a total of genomes have been sequenced by these research teams and deposited on the gisaid (global initiative on sharing all influenza data ) database. in the present study, we exploit this comprehensive data set to unravel the dispersal history and dynamics of sars-cov- viral lineages in belgium. in particular, our objective is to investigate the evolution of the circulation dynamics through time and assess the impact of lockdown measures on spatial transmission. specifically, we aim to use phylogeographic approaches to look at the belgian epidemic at two different levels: (i) the importance of introduction events into the country, and (ii) viral lineages circulating at the nationwide level. on june , , we downloaded all belgian sars-cov- sequences (n= ) available on gisaid, as well as non-belgian sequences ( , ) originating from different countries and used in nextstrain to represent the overall dispersal history of the virus. we generated a time-scaled phylogenetic tree using a rapid maximum likelihood approach and subsequently ran a preliminary discrete phylogeographic analysis along this tree to identify internal nodes and descending clades that likely correspond to distinct introductions into the belgian territory (fig. , s ). we inferred a minimum number of introduction events ( % hpd interval = [ - ]). when compared to the number of sequences sampled in belgium (n= ), this number illustrates the relative importance of external introductions in establishing transmission chains in the country. introduction events resulted in distinct clades (or "clusters") linking varying numbers of sampled sequences (fig. ). however, many clusters only consisted of one sampled sequence. according to the time-scaled phylogenetic tree and discrete phylogeographic reconstruction (fig. s ), some of these introduction events could have occurred before the return of belgian residents from carnival holidays (around march , ), which was considered as the major entry point of transmission chains in belgium.
figure : a cluster is here defined as a phylogenetic clade likely corresponding to a distinct introduction into the study area (belgium). we delineated these clusters by performing a simplistic discrete phylogeographic reconstruction along the time-scaled phylogenetic tree while only considering two potential ancestral locations: "belgium" and "non-belgium". we identified a minimum number of lineage introductions ( % hpd interval = [ - ]), which gives the relative importance of external introductions considering the number of sequences currently sampled in belgium ( ). on the tree, lineages circulating in belgium are highlighted in green, and green nodes correspond to the most ancestral node of each belgian cluster (see also figure s for a non-circular visualisation of the same tree). besides the tree, we also report the distribution of cluster sizes (number of sampled sequences in each cluster) as well as the number of sequences sampled through time.
to analyse the circulation dynamics of viral lineages within the country, we then performed spatially-explicit phylogeographic inference along the previously identified belgian clades (fig. a ). our reconstructions reveal the occurrence of long-distance dispersal events both before (fig. b ) and during (fig. c ) the lockdown. by placing phylogenetic branches in a geographical context, spatially-explicit phylogeographic inference allows treating those branches as conditionally independent movement vectors. here, we looked at these movement vectors to assess how the dispersal dynamics of lineages were impacted by the national lockdown, of which the main measures were implemented on march , . firstly, we investigated if the lockdown was associated with a change in lineage dispersal velocity.
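A minimal sketch of how such a before/during comparison could be computed from movement vectors, assuming the weighted lineage dispersal velocity is the total great-circle distance divided by the total time elapsed across branches. The branch dictionary layout, the function names and the rule of discarding branches that straddle the lockdown date are illustrative assumptions, not the interface of the "seraphim" package.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_dispersal_velocity(branches):
    """Total distance travelled divided by total time elapsed (km/day)."""
    total_d = sum(
        great_circle_km(b["start_lat"], b["start_lon"], b["end_lat"], b["end_lon"])
        for b in branches
    )
    total_t = sum(b["end_day"] - b["start_day"] for b in branches)
    if total_t <= 0:
        raise ValueError("branches must span a positive total duration")
    return total_d / total_t


def split_by_lockdown(branches, lockdown_day):
    """Split branches into those ending before and those starting after the lockdown.

    Branches straddling the lockdown date are dropped (an assumption of this sketch).
    """
    before = [b for b in branches if b["end_day"] <= lockdown_day]
    during = [b for b in branches if b["start_day"] >= lockdown_day]
    return before, during


# Hypothetical movement vectors (coordinates in degrees, times in days).
TOY_BRANCHES = [
    {"start_lat": 0.0, "start_lon": 0.0, "end_lat": 0.0, "end_lon": 1.0, "start_day": 0, "end_day": 2},
    {"start_lat": 0.0, "start_lon": 0.0, "end_lat": 0.0, "end_lon": 0.0, "start_day": 1, "end_day": 3},
    {"start_lat": 0.0, "start_lon": 1.0, "end_lat": 0.0, "end_lon": 1.5, "start_day": 5, "end_day": 9},
]
```

Calling `split_by_lockdown(TOY_BRANCHES, lockdown_day=4)` and then `weighted_dispersal_velocity` on each subset gives one velocity per period. In a Bayesian setting the same computation would be repeated for every posterior tree to obtain an interval rather than a single value.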
we estimated a substantially higher dispersal velocity before the lockdown ( . km/day, % hpd [ . - . ]) compared to during the lockdown ( . km/day, % hpd [ . - . ]). this trend is further confirmed when focusing on the province of liège for which we have a particularly dense sampling: in that province, we estimated a lineage dispersal velocity of . km/day ( % hpd [ . - . ]) before the lockdown and of . km/day ( % hpd [ . - . ]) during the lockdown. however, the evolution of the dispersal velocity through time is less straightforward to interpret (fig. ): while the lineage dispersal velocity was generally higher at the early phase of the belgian epidemic, which corresponds to the week following the returns from carnival holidays, it then seemed to drop just before the beginning of the lockdown before increasing again to reach a plateau. in the second half of april, our estimates indicate that the lineage dispersal velocity drops again. however, this result may be an artefact associated with the notably lower number of phylogenetic branches currently inferred during that period (fig. ). secondly, we further investigated the impact of the lockdown on the dispersal events among provinces. our analyses indicate that among-provinces dispersal events tended to decrease during the epidemic (fig. ): such dispersal events were more frequent at the beginning of the epidemic and then progressively decreased until reaching a plateau at the beginning of the lockdown. again, the relatively limited number of phylogenetic branches currently inferred from mid-april does not allow us to interpret the fluctuations in the proportion of among-provinces dispersal events during that period. our preliminary phylogeographic investigation reveals the important contribution of external introduction events for the establishment of the sars-cov- epidemic in belgium.
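The counting of introduction events can be illustrated with a small sketch. It assumes a rooted tree whose nodes already carry one of two location states ("belgium" or "non-belgium"), and it treats the top-most "belgium" ancestor of each sampled sequence as the root of its cluster. The tree encoding, the toy tree and the function name are hypothetical, and only introductions with at least one sampled descendant are counted, which is why the result is a minimum.

```python
from collections import defaultdict


def delineate_clusters(parent, state, tips, focal="belgium"):
    """Group focal-location tips by their most ancestral focal-location node.

    parent: dict mapping each node to its parent (None for the root)
    state:  dict mapping each node to its inferred location
    tips:   list of sampled (tip) nodes
    Returns {introduction_node: [tips in that cluster]}.
    """
    clusters = defaultdict(list)
    for tip in tips:
        if state[tip] != focal:
            continue
        node = tip
        # Climb towards the root while the parent is still in the focal location.
        while parent.get(node) is not None and state[parent[node]] == focal:
            node = parent[node]
        clusters[node].append(tip)
    return dict(clusters)


# Hypothetical annotated tree: root -> (A, B); A -> (t1, t2); B -> (t3, t4).
PARENT = {"root": None, "A": "root", "B": "root", "t1": "A", "t2": "A", "t3": "B", "t4": "B"}
STATE = {
    "root": "non-belgium", "A": "non-belgium", "B": "belgium",
    "t1": "non-belgium", "t2": "belgium", "t3": "belgium", "t4": "belgium",
}
TIPS = ["t1", "t2", "t3", "t4"]

CLUSTERS = delineate_clusters(PARENT, STATE, TIPS)
CLUSTER_SIZES = sorted(len(members) for members in CLUSTERS.values())
```

In this toy tree, tip `t2` forms a singleton cluster and node `B` roots a cluster of two tips, so two introductions are inferred from three sequences sampled in the focal location.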
this highlights that transmission chains circulating in belgium were not established by a relatively restricted number of isolated infectious cases, e.g. people returning from skiing holidays in northern italy. on the contrary, we identify a large number of distinct clades given the number of analysed sequences sampled in belgium. this overall observation is in line with our spatially-explicit phylogeographic analyses uncover the spatiotemporal distribution of belgian sars-cov- clusters, indicating a relatively low impact of the lockdown on both the dispersal velocity of viral lineages and on the frequency of long-distance dispersal events. while it has been demonstrated that the national lockdown had an overall impact on the virus transmission, i.e. reducing its effective reproduction number to a value below one , our results highlight that the lockdown did not clearly decrease the velocity at which the viral lineages travelled or their ability to disperse over long distances within the country. this finding may be important to consider in the context of potential future lockdown measures, especially if more localised (e.g. at the province or city level). indeed, locally reduced transmission rates will not automatically be associated with a notable decrease in the average velocity or distance travelled by lineage dispersal events, which could in turn limit the effectiveness of localised lockdown measures in containing local upsurge of the virus circulation. applying the present phylodynamic pipeline in a real-time perspective does not come without risk as new sequences can sometimes be associated with spurious nucleotide changes that could be associated with sequencing or assembling errors. directly starting from inference results kept up to date by a database like gisaid allows for fast analytical processing but also relies on newly deposited data that could sometimes carry potential errors. 
to remedy such potentially challenging situations, our proposed pipeline could be extended with a sequence data resource component that makes use of expert knowledge regarding a particular virus. the glue software package allows new sequences to be systematically checked for potential issues, and could hence be an efficient tool to safely work with frequently updated sars-cov- sequencing data. such a "cov-glue" resource is currently being developed (http://cov-glue.cvr.gla.ac.uk/#/home). while we acknowledge that a fully integrated analysis (i.e. an analysis where the phylogenetic tree and ancestral locations are jointly inferred) would be preferable, fixing an empirical time-scaled tree represents a good compromise to rapidly gain insights into the dispersal history and dynamics of sars-cov- lineages. indeed, the number of genomes available, as well as the number of different sampling locations to consider in the analysis, would lead to a joint analysis requiring weeks of run-time in a bayesian phylogenetic software package like beast. to illustrate the computational demands of such an approach, we ran a classic bayesian phylogenetic analysis on a smaller sars-cov- data set ( , genomic sequences) using beast . (data not shown). this analysis required over hours to obtain enough samples from the joint posterior, while using the latest gpu accelerated implementations on parallel runs. with a combined chain length of over . x states, and an average runtime of . hours per million states, the computational demands make this approach impractical when speed is critical. on the other hand, we here use a maximum likelihood method implemented in the program treetime to infer a time-scaled phylogenetic tree in a short amount of time (~ hours for the data set analysed here). given the present urgent situation, we have deliberately assumed a time-scaled maximum-likelihood phylogenetic tree as a fair estimate of the true time-scaled phylogenetic tree.
our analytical workflow has the potential to be rapidly applied to study the dispersal history and dynamics of sars-cov- lineages in other restricted or even much broader study areas. we believe that spatially-explicit reconstruction can be a valuable tool for highlighting specific patterns related to the circulation of the virus or assessing the impact of intervention measures. while new viral genomes are sequenced and released daily, a limitation could paradoxically arise from the non-accessibility of associated metadata. indeed, without sufficiently precise data about the geographic origin of each genome, it is not feasible to perform a spatially-explicit phylogeographic inference. in the same way that viral genomes are deposited in databases like gisaid, metadata should also be made available to enable comprehensive epidemiological investigations with an approach similar to the one presented here. inference of a time-scaled phylogenetic tree. to infer our time-scaled phylogenetic tree, we selected all non-belgian sequences in the nextstrain analysis, along with all available belgian sequences in gisaid to be included in our analysis as of june , . once the accessions of interest were identified, we downloaded the latest whole genome alignment from gisaid and removed all non-relevant sequences. we then cleaned the alignment by manually trimming the 5' and 3' untranslated regions (refseq nc_ . ) and gap-only sites. to obtain a maximum-likelihood phylogeny, we ran iq-tree . . under a general time reversible (gtr) model of nucleotide substitution with empirical base frequencies and four free rate site categories. this model configuration was selected as the best gtr model using iq-tree's modelfinder tool. the tree was then inspected for outlier sequences using tempest . . and, once the outliers were removed, time-calibrated using treetime . . .
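Outlier inspection of this kind usually rests on a root-to-tip regression: genetic distance from the root is regressed on sampling time, and samples far from the fitted line are flagged. The sketch below assumes an ordinary least-squares fit and a residual cutoff expressed in interquartile distances (four, matching the filter used in this workflow). The function name and toy data are hypothetical, and this is not the code of tempest or treetime.

```python
import statistics


def root_to_tip_outliers(dates, distances, n_iqd=4.0):
    """Fit distance ~ slope * date + intercept and flag large residuals.

    Returns (slope, intercept, flags), where flags[i] is True when the absolute
    residual of sample i exceeds n_iqd times the interquartile distance of all residuals.
    """
    mean_x = statistics.fmean(dates)
    mean_y = statistics.fmean(distances)
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx            # estimated clock rate
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(dates, distances)]
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqd = q3 - q1
    flags = [abs(r) > n_iqd * iqd for r in residuals]
    return slope, intercept, flags


# Hypothetical data: 21 samples on days 0..20, a clock-like signal with a small
# alternating deviation, and one sample (index 10) carrying excess divergence.
TOY_DATES = [float(day) for day in range(21)]
TOY_DISTANCES = [
    0.01 + 0.001 * day + (0.0005 if day % 2 == 0 else -0.0005) for day in range(21)
]
TOY_DISTANCES[10] += 0.02

SLOPE, INTERCEPT, FLAGS = root_to_tip_outliers(TOY_DATES, TOY_DISTANCES)
```

Flagged samples would be removed before time calibration. Note that with very clean data the interquartile distance can become tiny, so a fixed multiple of it may flag more samples than intended.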
to replicate the nextstrain workflow as closely as possible, we specified a clock rate of x - in treetime and removed samples that deviate more than four interquartile ranges from the root-to-tip regression. preliminary discrete phylogeographic analysis. we performed a preliminary phylogeographic analysis using the discrete diffusion model implemented in the software package beast . . the objective of this first analysis was to identify independent introduction events of sars-cov- lineages into belgium. to this end, we used our time-scaled phylogenetic tree as a fixed empirical tree and only considered two possible ancestral locations: belgium and non-belgium. bayesian inference through markov chain monte carlo (mcmc) was run on this empirical tree for generations and sampled every , generations. mcmc convergence and mixing properties were inspected using the program tracer . to ensure that effective sample size (ess) values associated with estimated parameters were all > . after having discarded % of sampled trees as burn-in, a maximum clade credibility (mcc) tree was obtained using treeannotator . . we used the resulting mcc tree to delineate belgian clusters here defined as phylogenetic clades corresponding to independent introduction events in belgium. continuous and post hoc phylogeographic analyses. we used the continuous diffusion model available in beast . to perform a spatially-explicit (or "continuous") phylogeographic reconstruction of the dispersal history of sars-cov- lineages in belgium. we employed a relaxed random walk (rrw) diffusion model to generate a posterior distribution of trees whose internal nodes are associated with geographic coordinates . specifically, we used a cauchy distribution to model the among-branch heterogeneity in diffusion velocity. 
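To make the Cauchy relaxed random walk more concrete, the sketch below simulates branch displacements using the standard scale-mixture representation: a Gaussian displacement whose variance is divided by a branch-specific scalar drawn from a gamma distribution with shape 1/2 and rate 1/2 yields Cauchy-distributed displacements. This is a textbook construction written with the Python standard library, not code from beast. The parameter `sigma` (a per-coordinate diffusion scale), the function names and the use of a single isotropic scale are assumptions.

```python
import math
import random
import statistics


def simulate_rrw_branch(duration, sigma, rng):
    """Draw a (dx, dy) displacement for one branch under a Cauchy relaxed random walk.

    duration: branch length in time units
    sigma:    per-coordinate diffusion scale (assumed isotropic)
    rng:      a random.Random instance
    """
    # Branch-specific rate scalar: gamma with shape 1/2 and scale 2 (i.e. rate 1/2).
    phi = rng.gammavariate(0.5, 2.0)
    sd = sigma * math.sqrt(duration / phi)
    return rng.gauss(0.0, sd), rng.gauss(0.0, sd)


def median_abs_dx(n, duration, sigma, seed=0):
    """Median absolute x-displacement over n simulated branches."""
    rng = random.Random(seed)
    return statistics.median(
        abs(simulate_rrw_branch(duration, sigma, rng)[0]) for _ in range(n)
    )
```

Because each coordinate is marginally Cauchy with scale `sigma * sqrt(duration)`, the median absolute displacement should be close to that scale, while occasional very long jumps reflect the heavy tails that let some branches disperse much faster than others.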
we performed a distinct continuous phylogeographic reconstruction for each belgian clade identified by the initial discrete phylogeographic inference, again fixing a time-scaled subtree as an empirical tree. as phylogeographic inference under the continuous diffusion model does not allow identical sampling coordinates assigned to the tips of the tree, we avoided assigning sampling coordinates using the centroid point of each administrative area of origin. for a given sampled sequence, we instead retrieved geographic coordinates from a point randomly sampled within its municipality of origin, which is the maximal level of spatial precision in available metadata. this approach avoids using the common "jitter" option that adds a restricted amount of noise to duplicated sampling coordinates. using such a jitter could be problematic because it can move sampling coordinates to administrative areas neighbouring their actual administrative area of origin . furthermore, the administrative areas considered here are municipalities and are rather small (there are currently municipalities in belgium). the clade-specific continuous phylogeographic reconstructions were only based on belgian tip nodes for which the municipality of origin was known, i.e. out of genomic sequences. furthermore, we only performed a continuous phylogeographic inference for belgian clades linking a minimum of three tip nodes with a known sampling location (municipality). each markov chain was run for generations and sampled every , generations. as with the discrete phylogeographic inference, mcmc convergence/mixing properties were assessed with tracer, and mcc trees (one per clade) were obtained with treeannotator after discarding % of sampled trees as burn-in. we then used functions available in the r package "seraphim" , to extract spatiotemporal information embedded within the same , posterior trees and visualise the continuous phylogeographic reconstructions. 
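The sampling of a random point within a municipality can be sketched with rejection sampling: draw points uniformly in the bounding box of the polygon and keep the first one that falls inside it. The polygon below is a hypothetical L-shaped area in arbitrary planar units, and the function names are illustrative. Real municipality borders would come from a shapefile, and sampling uniformly in longitude and latitude is only approximately uniform in area, which is assumed acceptable for small areas.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_point_in_polygon(polygon, rng, max_tries=10000):
    """Sample a point uniformly within the polygon by rejection from its bounding box."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    for _ in range(max_tries):
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            return x, y
    raise RuntimeError("no interior point found; check the polygon definition")


# Hypothetical L-shaped "municipality".
TOY_POLYGON = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
```

Unlike a jitter added to a centroid, every sampled point is guaranteed to stay within the area of origin, and two sequences from the same municipality almost surely receive distinct coordinates.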
we also used "seraphim" to estimate the weighted lineage dispersal velocity, and we verified the robustness of our estimates through a subsampling procedure consisting of re-computing the weighted dispersal velocity after having randomly discarded % of branches in each of the , posterior trees. the weighted lineage dispersal velocity is defined as follows, where $d_i$ and $t_i$ are the geographic distance travelled (great-circle distance in km) and the time elapsed (in days) on each of the $n$ phylogeny branches, respectively:

$$v_{\text{weighted}} = \frac{\sum_{i=1}^{n} d_i}{\sum_{i=1}^{n} t_i}$$

a pneumonia outbreak associated with a new coronavirus of probable bat origin a novel coronavirus from patients with pneumonia in china if the world fails to protect the economy, covid- will damage health not just now but also in the future multidisciplinary research priorities for the covid- pandemic: a call for action for mental health science the proximal origin of sars-cov- genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding an emergent clade of sars-cov- linked to returned travellers from iran genomic epidemiology of sars-cov- in guangdong province screening of healthcare workers for sars-cov- highlights the role of asymptomatic carriage in covid- transmission nextstrain: real-time tracking of pathogen evolution recent advances in computational phylodynamics bayesian phylogeography finds its roots phylogeography takes a relaxed random walk in continuous space and time bayesian phylogenetic and phylodynamic data integration using beast . .
virus evol global initiative on sharing all influenza data -from vision to reality treetime: maximum-likelihood phylodynamic analysis unifying the spatial epidemiology and molecular evolution of emerging epidemics a genomic survey of sars-cov- reveals multiple introductions into northern california without a predominant lineage covid- report on a meta-population model for belgium: a first status report glue: a flexible software system for virus sequence data improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics iq-tree : new models and efficient methods for phylogenetic inference in the genomic era some probabilistic and statistical problems in the analysis of dna sequences a space-time process model for the evolution of dna sequences exploring the temporal structure of heterochronous sequences using tempest (formerly path-o-gen) posterior summarization in bayesian phylogenetics using tracer . phylodynamic assessment of intervention strategies for the west african ebola virus outbreak explaining the geographic spread of emerging epidemics: a framework for comparing viral phylogenies and environmental landscape data seraphim: studying environmental rasters and phylogenetically informed movements data accessibility the new sequences have been deposited in gisaid and all data (sequence metadata, beast input, output files, and r scripts for our analyses) are available in the following github repository acknowledgments we are grateful to sébastien kozlowskyj for his assistance in using the dragon cluster of the university of mons, and to ine boonen for her assistance during sars-cov- sequencing. sd and mg are supported by the fonds national de la recherche scientifique (fnrs, belgium). kd and ma, and the uliège sequencing effort, are supported by the grant walgemed from the walloon region (convention n • ). 
bv is supported by a fwo sb grant for strategic basic research of the "fonds wetenschappelijk onderzoek"/research foundation flanders ( s n). jmc is supported by a doctoral grant from honours (host switching pathogens, infectious outbreaks and zoonosis) marie-sklodowska-curie training network ( ) key: cord- -iyxyennq authors: guo, youjia; kawaguchi, atsushi; takeshita, masaru; sekiya, takeshi; hirohama, mikako; yamashita, akio; siomi, haruhiko; murano, kensaku title: potent mouse monoclonal antibodies that block sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: iyxyennq coronavirus disease (covid- ), caused by severe acute respiratory syndrome coronavirus (sars-cov- ), has developed into a global pandemic since its first outbreak in the winter of . an extensive investigation of sars-cov- is critical for disease control. various recombinant monoclonal antibodies of human origin that neutralize sars-cov- infection have been isolated from convalescent patients and will be applied as therapies and prophylaxis. however, the need for dedicated monoclonal antibodies in molecular pathology research is not fully addressed. here, we produced mouse anti-sars-cov- spike monoclonal antibodies that exhibit not only robust performance in immunoassays including western blotting, elisa, immunofluorescence, and immunoprecipitation, but also neutralizing activity against sars-cov- infection in vitro. our monoclonal antibodies are of mouse origin, making them compatible with the experimental immunoassay setups commonly used in basic molecular biology research laboratories, and large-scale production and easy distribution are guaranteed by conventional mouse hybridoma technology. the outbreak of covid- caused by severe acute respiratory syndrome coronavirus (sars-cov- ) is a threat to global public health and economic development (huang et al., ; . vaccine and therapeutic discovery efforts are paramount to restrict the spread of the virus. 
passive immunization could have a major effect on controlling the virus pandemic by providing immediate protection, complementing the development of prophylactic vaccines (klasse & moore, ; walker & burton, ) . passive immunization against infectious diseases can be traced back to the late th century and the work of shibasaburo kitasato and emil von behring on the serotherapy of tetanus and diphtheria. there have been significant developments in therapies and prophylaxis using antibodies over the past years (graham & ambrosino, ) . the advent of hybridoma technology in provided a reliable source of mouse monoclonal antibodies (kohler & milstein, ) . with the development of humanized mouse antibodies and subsequent generation of fully human antibodies by various techniques, monoclonal antibodies have become widely used in therapy and prophylaxis for cancer, autoimmune diseases, and viral pathogens (walker & burton, ) . indeed, a humanized mouse monoclonal antibody neutralizing respiratory syncytial virus (rsv), palivizumab, is widely used in clinical settings prophylactically to protect vulnerable infants (connor, ) . in recent years, highly specific and often broadly active neutralizing monoclonal antibodies have been developed against several viruses (caskey, klein, & nussenzweig, ; corti et al., ; davide corti et al., ; corti, passini, lanzavecchia, & zambon, ; walker & burton, ) . passive immunization with a monoclonal antibody is currently under consideration as a treatment for covid- caused by sars-cov- (dhama et al., ; jawhara, ; jiang, hillyer, & du, ; klasse & moore, ; ni et al., ) . isolation of multiple human neutralizing monoclonal antibodies against sars-cov- has been reported (cao et al., ; chen et al., ; chi et al., ; hassan et al., ; ju et al., ; pinto et al., ; robbiani et al., ; rogers et al., ; shi et al., ; wan et al., ; wang et al., ; wu et al., ; zeng et al., ; zost et al., ) . 
these antibodies can avoid the potential risks of human-antimouse antibody responses and other side effects (hansel, kropshofer, singer, mitchell, & george, ) . they will be appropriate for direct use in humans since they are humanized even if these monoclonal antibodies are recombinant. owing to the recent rapid development of single-cell cloning technology, the process of antibody isolation has been dramatically shortened compared with the generation of a conventional monoclonal antibody secreted from a hybridoma resulting from the fusion of a mouse myeloma with b cells (wan et al., ) . however, since they are recombinant human antibodies produced in hek cell lines derived from human embryonic kidney, they have a disadvantage compared to conventional hybridoma-produced antibodies in terms of their lot-to-lot quality control and manufacturing costs (cohen, ) . instead, monoclonal antibodies produced by hybridomas are secreted into the culture supernatant, thus their production is straightforward and of low cost, and their quality is stable. it is also easy to distribute them to researchers worldwide, although they will not be applicable for treatment, if not chimeric and humanized, due to their immunogenicity (hansel et al., ; reichert, rosensweig, faden, & dewitz, ) . in addition to the impact of monoclonal antibodies on therapy and prophylaxis, they significantly impact the characterization of sars-cov- . to overcome the long-term battle with the virus, we need a detailed understanding of the replication mechanisms underlying its lifecycle, including viral entry, genome replication, budding from the cellular membrane, and interaction with host immune systems. these essential pieces of information are required for drug discovery, vaccine design, and therapy development. 
despite the large number of neutralizing antibodies reported to inhibit infection, there is a striking lack of well-characterized antibodies suitable for basic research techniques such as western blotting, immunofluorescence, and immunoprecipitation to study the viral life cycle. here, we established six monoclonal antibodies against the spike glycoprotein of sars-cov- . the trimeric spike glycoproteins of sars-cov- play a pivotal role in viral entry into human target cells through the same receptor, angiotensin-converting enzyme (ace ) as sars-cov- (hoffmann et al., ) . our antibodies were produced by a hybridoma resulting from the fusion of a mouse myeloma sp / with splenocytes obtained from balb/c mice immunized with purified recombinant spike proteins. we evaluated these antibodies for application in molecular pathology research. among them, two antibodies were shown to attenuate the interaction of spike proteins with ace and to neutralize infection of veroe /tmprss cells by sars-cov- . our antibodies will accelerate research on sars-cov- and lead to new therapies and prophylaxis.

the sars-cov- spike glycoprotein is a homotrimeric fusion protein composed of two subunits: s and s . during infection, the receptor binding domain (rbd) on the s subunit binds to ace , resulting in destabilization of the spike protein's metastable conformation. once destabilized, the spike protein is cleaved into the n-terminal s and c-terminal s subunits by host proteases such as tmprss and changes conformation irreversibly from the prefusion to the postfusion state (hoffmann et al., ; ou et al., ; song, gui, wang, & xiang, ) , which triggers a fusion process mediated by the s region (tai, zhang, he, jiang, & du, ; walls et al., ) . this instability must be addressed to obtain high-quality spike proteins for downstream applications. we adopted the design principle reported by wrapp et al.
(wrapp et al., ) , in which the sars-cov- spike protein was engineered to form a stable homotrimer that was resistant to proteolysis during protein preparation. in our practice, recombinant spike protein rbd and ectodomain were constructed. a t fabritin trimerization motif (foldon) was incorporated into the c-terminal of the recombinant spike ectodomain to promote homotrimer formation (miroshnikov et al., ) (fig. a) . recombinant rbd proteins tagged with gst or mbp were produced using an e. coli expression system (fig. b) . both recombinant spike protein rbd and ectodomain (s∆tm) were produced using a mammalian expression system that retained proper protein glycosylation equivalent to that observed during virus replication (fig. c , s a). mice were immunized with these recombinant spike proteins to generate antibodies against the sars-cov- virus, followed by cell fusion to generate a hybridoma-producing antibody. culture supernatants were pre-screened by enzyme-linked immunosorbent assay (elisa), western blotting (wb), and immunoprecipitation (ip), and six monoclonal hybridomas were isolated and evaluated. to characterize these antibodies in detail, they were first purified from the culture supernatant and examined in terms of elisa and wb performance. four monoclonal antibodies derived from the antigen produced by e. coli (clones r , r , r , and r ) and two from mammalian cells (s d and s d ) showed remarkable performance. in the elisa binding assay, all six clones bound glycosylated rbd with high affinity. when tested against spike glycoprotein (s∆tm), two clones (r and r ) could not be distinguished from non-immune igg (fig. d ). we noted that igg subclass members tended to have higher binding affinities. half maximal effective concentration (ec ) required for these antibodies to bind rbd and s∆tm glycoproteins falls at the low hundreds ng/ml (fig. e ). in wb, where target proteins are reduced and denatured, all clones established by e. 
coli produced-antigens performed well at detecting rbd and s∆tm proteins regardless of glycosylation ( fig. f , left, and g, s b). among them, clones r and r showed higher sensitivity in wb. in addition, r was capable of detecting not only artificial spike glycoprotein carrying t foldon, but also native spike glycoprotein expressed in t cells on wb (fig. h ). however, neither rbd nor s∆tm could be detected by antibody clones established by the mammalian antigen (s d and s d ) on wb, suggesting a strong preference for intact tertiary structure (fig. f , right). an antibody capable of recognizing the intact tertiary structure of spike proteins would contribute to research dissecting the molecular mechanism of sars-cov- infection, especially cell entry, where these proteins play a significant role. the ip activity of antibodies can be correlated with the activity of capturing the native structure of the target protein and neutralizing the infection. we examined the ip performance of our monoclonal antibodies. although all clones were capable of immunoprecipitating rbd and s∆tm glycoproteins, clone r , r , s d , and s d demonstrated superior ip efficiency for s∆tm, whereas r , s d , and s d showed higher ip efficiency for rbd glycoprotein (fig a) . as shown in fig. b , our antibodies recognize the spike protein in a glycosylation-independent manner, and the ip efficiencies of r , r , s d , and s d , although mild, outperformed others. noticeably, although clones s d and s d are not capable of performing wb (fig. f ), a strong preference for tertiary structure grants them remarkable performance in ip, where rbd and s∆tm glycoproteins were pulled down in their native conformation. of note, we found that s d and s d could maintain intact ip efficiency under highly stringent experimental conditions where sodium dodecyl sulfate (sds) was present ( fig s a) . next, we examined whether our antibodies could be used in the immunofluorescence assay (if). 
an antibody applicable for ip would also have activity in if. cellular localization of spike proteins is essential for elucidating the mechanism of packaging and maturation of virions during release from the cellular membrane. we tested our antibodies' performance in if using hela cells overexpressing trimeric spike protein with the transmembrane domain. consistent with their performance in the abovementioned assays ( fig. a and b ), both s d and s d could detect spike proteins expressed homogeneously on the apical side of hela cells with a high signal-to-noise ratio ( fig. c and s b ). however, their localization pattern is different from a previous report that observed spike proteins exclusively in the golgi during sars-cov- infection (stertz et al., ) . one possible reason for the difference could be that the spikes were expressed with no other viral proteins (see also fig. b ). mouse hepatitis coronavirus spike protein localizes in the endoplasmic reticulum-golgi intermediate compartment (ergic) in a membrane (m) protein dependent manner. in contrast, when expressed by itself, the spike had a faint reticular appearance (artika, dewantari, & wiyatno, ; opstelten, raamsman, wolfs, horzinek, & rottier, ) . the manner in which antibodies bind and pull down spike glycoproteins in an ip experiment resembles the process of antibody-mediated neutralization, where spike-ace interaction is intercepted by competitive binding between neutralizing antibodies and spike glycoprotein. the performance of our antibodies in ip experiments prompted us to examine whether they were capable of inhibiting spike-ace binding or even neutralizing sars-cov- infection. first, we performed a spike pull-down assay in which the spike glycoprotein was pulled down by ace in the presence of monoclonal antibodies ( fig. a and s a). clones s d and s d clearly inhibited spike-ace binding, as shown by the dimmed spike signal in wb (fig. b ). 
to quantify this inhibitory ability, we performed a bead-based neutralization assay by measuring the amount of ace bound to rbd beads after blocking with monoclonal antibodies (fig. c ). antibodies r and r showed no disruption of ace -rbd interaction, whereas s d and s d showed robust hindrance of ace -rbd binding with ic values of . ng/ml and . ng/ml, respectively ( fig. d and e ). the ability of s d and s d to inhibit spike-ace binding was consistent with their superior performance in ip experiments. next, we asked whether our antibodies inhibit sars-cov- infection in veroe /tmprss (tm ) cells, which are more susceptible to sars-cov- infection than the parental veroe cell line because they express tmprss (matsuyama et al., ) . in wb, antibodies r and r , but not s d and s d , could detect spike glycoprotein along with the progression of sars-cov- infection in veroe /tm cells (fig. a ). on the other hand, s d and s d were applicable to if in infected veroe /tm cells. spike showed a punctate distribution pattern in the perinuclear region resembling er and ergic (sadasivan, singh, & sarma, ) (fig. b ). the subcellular localization of spike resembled that of the n protein in vero cells infected with sars-cov- (stertz et al., ) , suggesting assembly of sars-cov- virions in the cytoplasm. we then conducted a live virus neutralization assay to examine whether clones s d and s d inhibit live virus infection. as expected, although clone r failed to protect veroe /tm from sars-cov- infection, s d and s d blocked sars-cov- infection significantly with ic values of . ng/ml and ng/ml, respectively, even at relatively high titers of tcid (fig. c , table ). a cocktail of s d and s d showed intermediate neutralizing activity ( . ng/ml), suggesting that s d and s d share an inhibitory mechanism.

emerging sars-cov- is a global public health threat that is predicted to persist for several years (kissler, tedijanto, goldstein, grad, & lipsitch, ) .
although there are multiple ongoing endeavors to develop neutralizing antibodies, vaccines, and drugs against the virus (callaway, ; riva et al., ) , the lack of adequate, licensed countermeasures underscores the need for a more detailed and comprehensive understanding of the molecular mechanisms underlying the pathogenesis of the virus (artika et al., ) . fundamental knowledge has significant implications for developing countermeasures against the virus, including diagnosis, vaccine design, and drug discovery. due to the above reasons, and our experiences with routine antibody productions (iwasaki et al., ; murano et al., ) , we have established and characterized mouse monoclonal antibodies that can be used to dissect the molecular mechanism of the virus life cycle. these antibodies would serve as a reliable toolset for basic research investigating the expression profile and subcellular localization of spike glycoprotein during viral entry, replication, packaging, and budding. these antibodies could help to identify novel host factors interacting with spike glycoprotein when used in ip in combination with mass spectrometry. therefore, advancement in basic research would accelerate the discovery of drugs targeting virus transmission. since passive immunization with neutralizing antibodies has been proposed as a treatment for covid- (dhama et al., ; jawhara, ; klasse & moore, ; ni et al., ) , research interests have largely focused on cloning human neutralizing antibodies from covid- patients. our antibodies, s d and s d , have been shown to attenuate the interaction of spike proteins with ace and neutralize infection of veroe /tm cells by sars-cov- . it is worth noting that although their neutralizing activities (ic of . ng/ml and ng/ml) appeared to be lower than those of human antibodies reported previously ( fig. 
c and table ), the stringency of our experimental conditions (relatively high virus titer of tcid ) tends to lead to underestimation of the neutralizing activities of our antibodies relative to those reported by other research groups. specifically, we used a high multiplicity of live sars-cov- virus to infect veroe /tm cells, which are more prone to virus infection than the commonly adopted veroe cell line. therefore, it is difficult to compare antibody efficacy across studies (tse, meganck, graham, & baric, ) . in addition to in vitro infection, their neutralizing activity in vivo should be examined in animal models that recapitulate sars-cov- disease. our mouse antibodies will not be applicable for use in clinical treatment, unless made chimeric or humanized, due to their immunogenicity (hansel et al., ; reichert et al., ) . on the other hand, they may be valuable for investigating the mechanism of immune responses to the virus during passive immunization using mouse models for sars-cov- infection (bao et al., ; dinnon et al., ; hassan et al., ; israelow et al., ; r. d. jiang et al., ; winkler et al., ) . they could show stable performance due to lot-to-lot consistency and act as benchmarks for other antibodies and drug developments.

the authors declare no competing interests.

synthetic dna sequences encoding sars-cov- spike protein ectodomain (s∆tm, residue - ; strain wuhan-hu- ; genbank: qhd . ) and rbd (residue - ; strain wuhan-hu- ; genbank: qhd . ) fused with an n-terminal signal peptide, a c-terminal trimerization motif, an hrv c cleavage site, an sbp purification tag, and an xhis-tag were inserted into pefx mammalian expression vector. s /s ( the mouse myeloma cell line sp / -ag (rcb ) was provided by the riken bioresource center (tsukuba, japan). the cells were cultured in rpmi (nissui) supplemented with % heat-inactivated calf serum (biowest) and ng/ml recombinant human interleukin (il- , peprotech). hela and t cells were cultured in dmem (nacalai tesque) with % fetal bovine serum (biowest).
we maintained hybridoma clones against spike glycoproteins in hybridoma serum-free medium (fujifilm wako) supplemented with ng/ml il- . for the immunoprecipitation assay, µg of purified antibodies was conjugated to µl dynabeads protein g (thermo fisher) for min at room temperature, followed by washing twice in ip buffer ( mm tris-hcl (ph . ), mm nacl, . % np- ). antibody-conjugated beads were incubated with ng s∆tm in µl ip buffer for hours at room temperature. beads were washed three times in ip buffer and eluted with sds-page loading dye at °c for min. immunoprecipitation of s∆tm was examined by sds-page followed by western blotting using antibody r . before performing immunofluorescence, hela cells seeded on cover glasses were transfected with plasmids encoding full-length sars-cov- spike protein for days using lipofectamine (thermo fisher). cells were fixed with % formaldehyde in pbs for min at room temperature, washed in pbs-t once, and permeabilized with . % triton x- in pbs for min at room temperature. cells were blocked with % non-fat skim milk in pbs-t for min, then incubated with . µg/ml antibody for h at room temperature. after three washes in pbs-t, cells were incubated in : nunc

for the spike pull-down assay, s∆tm glycoprotein was incubated with µg anti-spike antibody in µl binding buffer (pbs supplemented with . % np- ) at room temperature for hour, then µg of ace -sbp recombinant protein was added to the reaction for hour. the ace -sbp was pulled down with µl dynabeads m- streptavidin (thermo fisher) for min at room temperature, followed by washing twice with binding buffer and elution with sds-page loading dye at °c for min. ace -spike binding inhibition was examined by sds-page, followed by wb using antibody r .

fixed in % pfa and subjected to indirect immunofluorescence assays using s d antibody as described above. the number of infected cells was determined by imaging and analysis using arrayscan (thermo fisher).
mouse anti-flag m antibody (sigma) was also used as a control. experiments with sars-cov- were performed in a biosafety level (bsl ) containment laboratory at university of tsukuba. were separated by sds-page, followed by wb using antibody r . a. immunoprecipitation (ip) of trimeric glycosylated spike protein (s∆tm) using established monoclonal antibodies. s , s d ; s , s d ; ni, non-immune mouse igg; in, input; s∆tm, trimeric spike protein without transmembrane domain; hc, igg heavy chain; lc, igg light chain. all clones were capable of pulling down rbd and spike glycoprotein. higher ip efficiency of spike glycoprotein was observed in clones r , r , s d , and s d . for rbd glycoprotein, clone r , s d , and s d showed higher ip efficiency. b. ip of trimeric spike protein de-glycosylated by pngase f using established monoclonal antibodies. "s∆tm" indicates s∆tm glycoprotein untreated with pngase f. all clones are capable of pulling down de-glycosylated spike protein. higher ip efficiency was observed in clone r , r , s d , and s d . c. immunofluorescence (if) staining of spike glycoprotein expressed in hela cells with monoclonal antibodies s d and s d . spike protein localized on the apical surface of transfected hela cells scale bar, µm. a. a schematic of the spike pull-down assay designed to evaluate inhibition of ace spike binding by monoclonal antibody. spike glycoprotein lacking tm domain (s∆tm) was mixed with a monoclonal antibody. ace -sbp was applied to capture s∆tm onto streptavidin beads competitively. captured s∆tm was detected by wb as a measurement of the antibody's inhibitory ability. s , s d ; s , s d ; ni, nonimmune mouse igg. b. wb of spike pull-down assay using antibody r . in the presence of clones s d and s d , ace was not able to pull down s∆tm. c. schematic of bead-based neutralization assay designed to quantify inhibition of ace -rbd binding by monoclonal antibody. 
rbd-sbp glycoprotein immobilized on streptavidin beads was mixed with a monoclonal antibody. ace -flag was applied to bind competitively with rbd. ace -rbd binding was quantified by measuring the signal given by an anti-flag antibody conjugated with apc fluorophore using facs. d. one set of representative facs results of a bead-based neutralization assay in the presence of µg/ml monoclonal antibodies. clones s d and s d significantly inhibited ace -rbd interaction, shown as lowered fluorescence intensity of apc. e. binding profiles of potent neutralizing antibodies. ni, non-immune mouse igg. error bars indicate standard deviation (n= ). clones r and r showed no inhibition of ace -rbd binding, while s d and s d inhibited ace -rbd interaction at lower ng/ml levels. a. recombinant spike glycoproteins were treated with hrv c protease to remove sbptag before immunizing mice. b. clone r showed the highest performance on western blotting among our antibodies and detected even . ng s∆tm glycoprotein. a. monoclonal antibody clones s d and s d maintain high efficiency even in the presence of . % sds. s , s d ; s , s d ; ni, non-immune igg; in, input. b. immunofluorescence (if) staining of spike glycoprotein expressed in hela cells with all six monoclonal antibodies. s d and s d showed higher performance in if. images were captured using a keyence bz-x fluorescence microscope. scale bar, µm. a. ace -sbp protein was purified from the culture supernatant of expi f cells transfected with a plasmid encoding ace -sbp. r r s d s d table guo et al. 
molecular biology of coronaviruses: current knowledge the pathogenicity of sars-cov- in hace transgenic mice the race for coronavirus vaccines: a graphical guide potent neutralizing antibodies against sars-cov- identified by high-throughput single-cell sequencing of convalescent patients' b cells broadly neutralizing anti-hiv- monoclonal antibodies in the clinic human monoclonal antibodies block the binding of sars-cov- spike protein to angiotensin converting enzyme receptor a neutralizing human antibody binds to the n-terminal domain of the spike protein of sars-cov- the race is on for antibodies that stop the new coronavirus palivizumab, a humanized respiratory syncytial virus monoclonal antibody, reduces hospitalization from respiratory syncytial virus infection in hieh-risk infants. the impact-rsv study group tackling influenza with broadly neutralizing antibodies protective monotherapy against lethal ebola virus infection by a potently neutralizing antibody rapid generation of a human monoclonal antibody to combat middle east respiratory syndrome covid- , an emerging coronavirus infection: advances and prospects in designing and developing vaccines, immunotherapeutics, and therapeutics a mouse-adapted model of sars-cov- to test covid- countermeasures history of passive antibody administration for prevention and treatment of infectious diseases the safety and side effects of monoclonal antibodies a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor clinical features of patients infected with novel coronavirus in wuhan mouse model of sars-cov- reveals inflammatory role of type i interferon signaling piwi modulates chromatin accessibility by regulating multiple factors including histone h to repress transposons could intravenous immunoglobulin collected from recovered coronavirus patients protect against covid- and strengthen 
the immune system of new patients? pathogenesis of sars-cov- in transgenic mice expressing human angiotensin-converting enzyme neutralizing antibodies against sars-cov- and other human coronaviruses human neutralizing antibodies elicited by sars-cov- infection projecting the transmission dynamics of sars-cov- through the postpandemic period antibodies to sars-cov- and their potential for therapeutic passive immunization. elife continuous cultures of fused cells secreting antibody of predefined specificity early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia potent neutralizing antibodies against multiple epitopes on sars-cov- spike enhanced isolation of sars-cov- by tmprss -expressing cells engineering trimeric fibrous proteins based on bacteriophage t adhesins nuclear rna export factor variant initiates pirna-guided co-transcriptional silencing detection of sars-cov- -specific humoral and cellular immunity in covid- convalescent individuals envelope glycoprotein interactions in coronavirus assembly characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody monoclonal antibody successes in the clinic discovery of sars-cov- antiviral drugs through large-scale compound repurposing convergent antibody responses to sars-cov- in convalescent individuals isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model cytoplasmic tail of coronavirus spike protein has intracellular targeting signals a human neutralizing antibody targets the receptor-binding site of sars-cov- cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace the intracellular sites of early replication and budding of sars-coronavirus identification of sars-cov rbd-targeting monoclonal antibodies with cross-reactive or neutralizing activity against 
we thank ayako ishida, mie kobayashi, and yasuyuki kurihara for technical assistance and advice on the production of antibodies. this work was supported by the key: cord- - kcqhgjw authors: dai, manman; li, huanan; yan, nan; huang, jinyu; zhao, li; xu, siqi; jiang, shibo; pan, chungen; liao, ming title: long-term survival of salmon-attached sars-cov- at °c as a potential source of transmission in seafood markets date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kcqhgjw several outbreaks of covid- were associated with seafood markets, raising concerns that fish-attached sars-cov- may exhibit prolonged survival in low-temperature environments. here we showed that salmon-attached sars-cov- at °c could remain infectious for more than one week, suggesting that fish-attached sars-cov- may be a source of transmission.
countries have identified sars-cov- in meat or meatpacking workers ( - ), raising concerns that fish- or meat-attached sars-cov- could be a potential source of covid- transmission. therefore, it is essential to determine the survival time of sars-cov- in the low-temperature environment of seafood markets. in this study, we measured the titer ( % tissue culture infectious dose/ml, tcid /ml) of viable sars-cov- attached to salmon, or of untreated sars-cov- in culture medium, stored at °c (the temperature in refrigerators or cold rooms used for the temporary storage of fish) or at °c (regular room temperature), using an end-point titration assay on vero e cells as described previously ( ). as shown in figure a and b, salmon-attached sars-cov- remained viable at °c and °c for and days, respectively, while the untreated sars-cov- in culture medium remained infectious at °c and °c for more than days. sars-cov- attached to salmon or suspended in culture medium and stored at °c remained viable for at least days, while virus stored at °c lost infectivity very quickly. the result from the samples stored at °c is consistent with that reported by van doremalen et al., who showed that sars-cov- remained viable in aerosols, or on the surface of copper, cardboard, stainless steel, and plastic, at ~ °c and % relative humidity for ~ hours ( ), confirming that the loss of sars-cov- viability is associated with increased temperature. imported and exported fish must be transported at a low temperature (e.g., ~ °c ). in conclusion, fish-attached sars-cov- can survive for more than one week at °c, the temperature of refrigerators, cold rooms, or transport carriers used to store fish before sale in the fish or seafood market.
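the end-point titration readout mentioned above can be turned into a tcid50 estimate in several ways, and the text does not say which estimator was used. the following is a minimal sketch assuming the spearman-kärber method, a ten-fold dilution series and a fixed number of wells per dilution; the function names, the well counts and the inoculum volume are hypothetical and are not taken from the study.

```python
def spearman_karber_log10_tcid50(first_log10_dilution, log10_step,
                                 positive_wells, wells_per_dilution):
    """Return log10(TCID50) per inoculum volume (Spearman-Karber estimate).

    first_log10_dilution: log10 of the first dilution factor (1 for 10^-1).
    log10_step: log10 of the fold change between dilutions (1 for ten-fold).
    positive_wells: CPE-positive well counts, most concentrated dilution first.
    Assumes every well would be positive at any dilution more concentrated
    than the first one tested, and none beyond the last one tested.
    """
    if wells_per_dilution <= 0:
        raise ValueError("wells_per_dilution must be positive")
    proportions = [count / wells_per_dilution for count in positive_wells]
    return first_log10_dilution + log10_step * (sum(proportions) - 0.5)


def tcid50_per_ml(log10_tcid50_per_inoculum, inoculum_ml):
    """Convert a per-inoculum titer to TCID50/ml for a given inoculum volume."""
    return 10 ** log10_tcid50_per_inoculum / inoculum_ml


# Hypothetical plate: 4 wells per dilution, dilutions 10^-1 to 10^-5.
log_titer = spearman_karber_log10_tcid50(1, 1, [4, 4, 4, 0, 0], 4)
titer_per_ml = tcid50_per_ml(log_titer, inoculum_ml=0.1)
```

a reed-muench calculation would be an equally plausible choice; the two estimators usually give similar values when the dilution series brackets the end point.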
this calls for strict inspection or detection of sars-cov- as a critical new protocol in fish importation and exportation before allowing sales. dashed lines indicate the limit of detection, which were tcid /ml. aerosol and surface stability of sars-cov- as compared with sars-cov- individual salmon cubes ( x x mm) were placed in a ml tube containing ml liquid of sars-cov- at . x tcid /ml, and the tube was gently inverted times. the salmon cubes were transferred into the cm dish and incubated for seconds at room temperature and then put on filter paper in another cm dish to remove the excess virus liquid. salmon cubes were transferred to . ml freezing tubes and stored at °c and °c, respectively. on day , , , , , , , , , and , respectively, one freezing tube was taken out, to which ml dmem culture medium was added, oscillated for seconds, and then centrifuged at , rpm for minutes at °c. about . ml of the liquid was transferred to a new freezing tube and kept at - °c until viral titration ( figure a) . the untreated virus in culture medium was stored at °c and °c, respectively. on day , , and , ml of the viral liquid was taken out and put in a freezing tube, which was kept at - °c until viral titration ( figure a) . the virus titer ( % tissue culture infectious dose, tcid per ml) was quantified by end-point titration assay on vero e cells ( ). the detection limit of the typical tcid assay used in this study was tcid / ml ( ). key: cord- - sdxqmkb authors: khan, md. abdullah-al-kamran; islam, abul bashar mir md. khademul title: sars-cov- proteins exploit host’s genetic and epigenetic mediators for the annexation of key host signaling pathways that confers its immune evasion and disease pathophysiology date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sdxqmkb the constant rise of the death toll and cases of covid- has made this pandemic a serious threat to human civilization. 
understanding of host-sars-cov- interaction in viral pathogenesis is still in its infancy. in this study we aimed to correlate how sars-cov- utilizes its proteins for tackling the host immune response; in parallel, we investigated how host epigenetic factors might play a role in this pathogenesis. integrating the experimentally validated host interactome proteins and the host genes differentially expressed upon sars-cov- infection, we have used a blend of computational and knowledgebase approaches to delineate the interplay between host and sars-cov- in various signaling pathways. also, we have shown how host epigenetic factors are involved in the deregulation of gene expression. strikingly, we have found that several transcription factors and other epigenetic factors can modulate some immune signaling pathways, helping both host and virus. we have identified the mirna hsa-mir- , whose transcription factor was also upregulated and whose targets were downregulated; this mirna may have a pivotal role in the suppression of host immune responses. while searching for the pathways in which viral proteins interact with host proteins, we have found pathways such as hif- signaling, autophagy, rig-i signaling, toll-like receptor signaling, fatty acid oxidation/degradation, and il- signaling to be significantly associated. we observed that these pathways can be either hijacked or suppressed by the viral proteins, leading to improved viral survival and life-cycle. moreover, aberration of pathways such as relaxin signaling in the lungs by the viral proteins might lead to the lung pathophysiology found in covid- . also, enrichment analyses suggest that deregulated genes in sars-cov- infection are involved in heart development, kidney development, age-rage signaling pathway in diabetic complications etc.
which might suggest why patients with comorbidities are becoming more prone to sars-cov- infection. our results suggest that sars-cov- integrates its proteins into different immune signaling pathways and other cellular signaling pathways to develop efficient immune evasion mechanisms, while leading the host to a more complicated disease condition. our findings would help in designing more targeted therapeutic interventions against sars-cov- . though several human coronavirus outbreaks have caused severe public health crises over the past few decades, the recent coronavirus disease (covid- ) outbreak caused by the severe acute respiratory syndrome coronavirus (sars-cov- ) has beaten the records of the previous ones, and case counts are still on the upswing. about countries and territories around the globe have been affected by this outbreak, and a total of around millions of people are already infected with sars-cov- ; the number was still rising steadily at the time of writing this article (worldometer, ) . out of the closed cases, almost % of the patients have died and about % of the active cases are in critical condition (worldometer, ) . though the death rate from covid- was estimated to be as small as . % (who, ), at present the global fatality rate is changing very rapidly; therefore, more comprehensive studies need to be done to control this pandemic efficiently. coronaviruses are single-stranded, positive-sense, enveloped rna viruses with a ~ kb genome (lu et al., ) . amongst the four genera, sars-cov- (accession no. nc_ . ) belongs to the betacoronavirus genus and has a ~ . kb genome encoding genes (ncbi-gene, between sars-cov and sars-cov- ; as in sars-cov- , there have been amino acid substitutions, a deletion of orf a, an elongation of orf b, and a truncation of orf b observed (lu et al., ) .
though the overall mortality rate from sars-cov is higher than that of sars-cov- , sars-cov- has several unique features, such as an increased incubation period and dormancy inside the host, that allow it to spread more efficiently (lauer et al., ) . this suggests that sars-cov- might be using some immune evasion strategies to maintain its survival and essential functions within its host. upon viral infection, the host innate immune system detects the virion particles and elicits the first set of antiviral responses (katze et al., ) to eliminate the viral threat. however, viruses themselves have evolved various modes of action to evade those immune responses by modulating the host's intracellular signaling pathways (kikkert, ) . this arm-wrestling between the host and the infecting virus results in immunopathogenesis. different human coronaviruses also show similar features of host-pathogen interaction, which range from viral entry, replication, transcription, translation, and assembly to evasion of the host innate immune response (fung and liu, ) . in addition, different antiviral cellular responses such as autophagy (ahmad et al., ) and apoptosis (barber, ) can be modulated by the virus to ensure its survival inside the host cells. apart from these, several other host-virus interactions are also observed, such as modulation of the activity of host transcription factors (lyles, ) and of host epigenetic factors (e.g. histone modifications, host mirnas etc.) (adhya and basu, ) . all of these multifaceted interactions can lead to the ultimate pathogenesis and progression of the disease. the interplay between different human coronaviruses and the host was previously reported (fung and liu, ); however, sars-cov- interactions with the host immune response and their outcome in the pathogenesis are still to be elucidated.
gordon we have obtained the transcription factors (tfs) which bind to the given promoters from cistrome data browser (zheng et al., ), which provides tfs from experimental chip-seq data. we utilized the "toolkit for cistromedb", uploaded a bed file of the promoters of the deregulated genes ( kb upstream to kb downstream of the transcription start site (tss)), and set the peak number parameter to "all peaks in each sample". we extracted the experimentally validated target genes of human mirnas from the mirtarbase database (huang et al., a) . we have downloaded the experimentally validated tfs which bind to mirna promoters and modulate them. we have considered those tfs that are themselves expressed and that can 'activate' or 'regulate' mirnas. we used the epifactors database (medvedeva et al., ) to find human genes related to epigenetic activity. we have utilized the kegg mapper tool (kanehisa and sato, ) for the mapping of deregulated genes and sars-cov- interacting host proteins in different cellular pathways. we then searched for and targeted the pathways found to be enriched for sars-cov- deregulated genes. from this pathway information, we have manually sketched the pathways to provide a brief overview of the interplay between sars-cov- and the host immune response and of their outcomes in viral pathogenesis. differentially expressed genes showed that deregulated genes of sars-cov- infection can exert biological functions such as regulation of inflammatory response, negative regulation of type-i interferon, response to interferon-gamma, interferon-gamma mediated signaling, nik/nf-kappab signaling, regulation of apoptotic process, cellular response to hypoxia, angiogenesis, negative regulation of inflammatory response, zinc ion binding, calcium ion binding etc., which were not enriched for sars-cov infection ( figure a , b). also, different organ-specific functions such as heart development, kidney development etc. were only enriched for differentially expressed genes in sars-cov- infection ( figure a ).
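the promoter bed file described in the methods above (a window around each tss) can be sketched as follows. this is an illustration only: the window sizes are not recoverable from the text, so `upstream` and `downstream` are left as parameters, and the function names, the gene name and the coordinates are hypothetical. coordinates follow the bed convention (0-based start, half-open end), and the window is flipped for minus-strand genes.

```python
def promoter_window(chrom, tss, strand, upstream, downstream, name="."):
    """Return a BED6 record covering the promoter around a TSS.

    tss is the 0-based TSS position; upstream/downstream are lengths in bp,
    measured relative to the direction of transcription.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream + 1
    elif strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates.
        start, end = tss - downstream, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(0, start)  # clamp at the chromosome start
    return (chrom, start, end, name, 0, strand)


def to_bed_line(record):
    """Format a BED record as a tab-separated line."""
    return "\t".join(str(field) for field in record)


# Hypothetical gene on the plus strand with a 200 bp / 50 bp window.
example = promoter_window("chr1", 1000, "+", 200, 50, name="geneA")
example_line = to_bed_line(example)
```

the clamp at zero only guards the chromosome start; a complete version would also clamp at the chromosome length, which requires a chromosome sizes table.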
deregulated genes of sars-cov- infection were found to be related to pathways such as nf-kappab signaling, jak-stat signaling, rig-i-like receptor signaling, natural killer cell mediated cytotoxicity, phagosome, hif- signaling, calcium signaling, gnrh signaling, arachidonic acid metabolism, insulin signaling, adrenergic signaling in cardiomyocytes, ppar signaling etc. ( figure c , supplementary figure , ), which were found to be absent for sars-cov infection. cov p, hsa-mir- a- p, hsa-mir- - p, hsa-mir- b- p, hsa-mir- a- p, hsa-mir- b- p, hsa-mir- - p) were found to be targeting in both infections. although there are some similarities between the genetic architectures of sars-cov and sars-cov- , it is not yet known whether they modulate common host pathways. also, it is largely unknown how sars-cov- exhibits some unique clinical features despite the high similarity of the viral genes. now that the probable genetic and epigenetic regulators behind the differential gene expression have been identified, we aimed to explore how these deregulated genes play a role in the battle between virus and host. to obtain a detailed idea of the outcomes resulting from viral-host interactions and how sars-cov- uses its proteins to evade the host innate immune response, we have mapped the significantly deregulated genes and host interacting proteins in different overrepresented functional pathways using kegg mapper ( phagosome ( figure a ), arachidonic acid metabolism ( figure b ), pvr signaling ( figure ) etc., aberration of these pathways might provide sars-cov- an edge over the host immune response. also, sars-cov- can prevent the relaxin downstream signaling ( figure ), which plays a crucial role in the lung's overall functionality, and its abnormal regulation might result in the respiratory complications found in covid- .
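the notion of an overrepresented pathway used above is commonly made precise with a one-sided hypergeometric test. the text does not state which statistic was applied, so the following is only a sketch under that assumption; the function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe, pathway, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, pathway, selected).

    universe: number of background genes
    pathway:  number of background genes annotated to the pathway
    selected: number of deregulated genes
    overlap:  number of deregulated genes annotated to the pathway
    """
    if not (0 <= pathway <= universe and 0 <= selected <= universe):
        raise ValueError("pathway and selected must lie within the universe")
    largest = min(pathway, selected)
    total = comb(universe, selected)
    # Sum the upper tail; comb() returns 0 for impossible configurations.
    tail = sum(comb(pathway, k) * comb(universe - pathway, selected - k)
               for k in range(overlap, largest + 1))
    return tail / total


# Hypothetical counts: 40 of 500 deregulated genes fall in a 300-gene pathway
# drawn from a 20000-gene background.
p_value = hypergeom_enrichment_pvalue(20000, 300, 500, 40)
```

in practice such p-values would be corrected for multiple testing across all pathways tested, for example with the benjamini-hochberg procedure.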
from previous studies we have compiled information on deregulated genes (blanco -melo et al., ) and virus-host interactome (gordon et al., ) in sars-cov- infection, however, to get detailed pictures of the affected pathways, which is still remained obscure, we have investigated how our identified host genetic and epigenetic factors are playing a role and how viruses are utilizing those. giving a closer look we have found some pathways which sars-cov- might be using but not sars-cov. figure a) . apoptosis is one important intracellular host immune response to reduce further spread of viruses from the infected cells (barber, ) . several signaling pathways are involved to elicit this apoptotic response inside the infected cells which can be suppressed by the viral proteins, for example-nsp is found to target rip , thus it might fail to relay signaling to casp /fadd mediated apoptosis, and necrosis by rip /rip complex ( figure ) ; nsp might block glutathione peroxidase which is involved in (s)-hete production and ultimately -oxoete production, thus apoptotic induction by these metabolites through pparγ signaling axis will not take place ( which in turn results in vascular remodeling and leakage of inflammatory cytokines (powell and rokach, ). -oxoete another arachidonic acid metabolism product that can also induce inflammation (powell and rokach, ) , production of both compounds might be hindered by sars-cov- nsp as it interacts with an upstream metabolic enzyme glutathione peroxidase ( figure b ). previously it was reported that il- signaling enhances antiviral immune responses (ma et al., ). sars-cov- nsp can bind il- receptor and inhibit the downstream signaling from il- receptor to traf for activating nfκb/mapks/cebpb signaling axis, thus decreasing the antiviral inflammatory responses (figure ). during acute viral infections toll-like receptor (tlr ) signaling plays important roles in eliciting inflammatory responses (olejnik et al., ) . 
sars-cov- protein nsp interacts with tbk which might reduce the signaling from irf , resulting in less ifn-i productions; while nsp interacts with rip ; as a result, the activation of downstream nfκb and mapks (p and jnk) pathways and induction of inflammatory responses from these pathways will be stalled ( figure ) . from the enrichment analysis, we have found that deregulated genes were also involved in processes and functions like-heart development, kidney development, age-rage signaling pathway in diabetic complications, zinc ion binding, calcium ion binding etc ( figure a, b flicek epigenetic modulation of host: new insights into immune evasion by viruses adrenergic signaling at the interface of allergic asthma and viral infections autophagy-virus interplay: from relaxin protects rat lungs from ischemia-reperfusion injury via inducible no synthase: role of erk- / pi k, and forkhead transcription factor fkhrl differential expression analysis for sequence count data gene ontology: tool for the unification of biology host defense, viruses and apoptosis ncbi geo: archive for functional genomics data sets-update emerging microbes & infections human coronavirus: host-pathogen interaction on the importance of host micrornas during cov- -human protein-protein interaction map reveals drug targets and potential drug-repurposing. biorxiv biases in illumina transcriptome sequencing caused by random hexamer priming histone deacetylases in viral infections mirtarbase : updates to the experimentally validated microrna-target interaction database the ncats bioplanet -an integrated platform for exploring the universe of cellular signaling pathways for toxicology nasal mucosal microrna expression in children with respiratory syncytial virus infection. 
bmc infectious diseases alveolar macrophage ingestion and phagosome-lysosome fusion defect associated with virus pneumonia an emerging coronavirus causing pneumonia outbreak in wuhan, china: calling for developing therapeutic and prophylactic strategies kegg: kyoto encyclopedia of genes and genomes kegg mapper for inferring cellular functions from protein sequences innate immune modulation by rna viruses: emerging insights from functional genomics. arrayqualitymetrics-a bioconductor package for quality assessment of microarray data innate immune evasion by human respiratory rna viruses the transcription factor mafb antagonizes antiviral responses by blocking recruitment of coactivators to the transcription factor irf hypoxia-induced angiogenesis: good and evil initial sequencing and analysis of the human genome the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application stat -dependent immune responses ensure host survival despite the presence of a potent viral antagonist foxo negatively regulates cellular antiviral response by promoting degradation of irf . ecsit bridges rig-i-like receptors to visa in signaling events of innate antiviral responses micrornas in the regulation of tlr and rig-i pathways the subread aligner: fast, accurate and scalable read mapping by seed-and-vote microrna- prevents necrotic cell death in human cardiomyocyte progenitor cells via targeting rip blimp- represses cd t cell expression of pd- using a feed-forward transcriptional circuit during acute viral infection genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding epigenetic regulators: multifunctional proteins modulating hypoxia-inducible factor-α protein stability and activity. 
cellular and molecular life sciences cytopathogenesis and inhibition of host gene expression by rna viruses hypoxia activates - pgdh and its metabolite -kete to promote pulmonary artery endothelial cells proliferation via erk / signalling the protective and pathogenic roles of il- in viral infections: friend or foe? pulmonary haemorrhage and cardiac dysfunction in a neonate with medium-chain acyl-coa dehydrogenase (mcad) deficiency cullin e ligases and their rewiring by viral factors micrornas and other mechanisms regulate interleukin- cytokines and receptors lipid involvement in viral infections: present and future perspectives for the design of antiviral strategies. lipid metabolism trim in the regulation of the antiviral innate immunity epifactors: a comprehensive database of human epigenetic factors and complexes pivotal role of receptor-interacting protein kinase and mixed lineage kinase domain-like in neuronal cell death induced by the human neuroinvasive coronavirus oc molecular pathways in virus-induced cytokine production eif e as a control target for viruses. 
viruses re: gene links for nucleotide (select ) -gene -ncbi toll-like receptor in acute viral infection: too much of a good thing epigenetic reprogramming of host genes in viral and microbial pathogenesis gitools: analysis and visualisation of genomic data using interactive heat-maps biosynthesis, biological effects, and receptors of hydroxyeicosatetraenoic acids (hetes) and oxoeicosatetraenoic acids (oxo-etes) derived from arachidonic acid degradation to attenuate the hepatitis c virus-induced innate immune response through pcbp transforming growth factor beta/smad signaling regulates irf- function and transcriptional activation of the beta interferon promoter the role of zinc in antiviral identification of a novel coronavirus causing severe pneumonia in human: a descriptive study mirnas regulate the hif switch during hypoxia: a novel therapeutic target hif and the lung: role of hypoxia-inducible factors in pulmonary development and disease. american journal of respiratory and critical care medicine wikipathways: a multifaceted pathway database bridging metabolomics to other omics research limma: linear models for microarray data string v : protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets attenuates myocardial ischemia/reperfusion injury by suppressing ripk expression in mice transmir v . 
: an updated transcription factor-microrna regulation database tophat: discovering splice junctions with rna-seq the trimendous role of trims in virus-host interactions mice exhibit deficient calcium flux in immune cells and impaired immune responses re: who director-general's opening remarks at the media briefing on covid- - re: coronavirus cases micrornas in autophagy and their emerging roles in crosstalk with apoptosis characterization of the lipidomic profile of human coronavirus-infected cells: implications for lipid metabolism remodeling upon coronavirus replication mir- a inhibits oxidized low-density lipoprotein-induced lipid accumulation and inflammatory response via targeting toll-like receptor endothelial cell control of thrombosis parp - dtx l ubiquitin ligase targets host histone h bj and viral c protease to enhance interferon signaling and control viral infection cistrome data browser: expanded datasets and new tools for gene regulatory analysis key: cord- - o yd h authors: thépaut, michel; luczkowiak, joanna; vivès, corinne; labiod, nuria; bally, isabelle; lasala, fátima; grimoire, yasmina; fenel, daphna; sattin, sara; thielens, nicole; schoehn, guy; bernardi, anna; delgado, rafael; fieschi, franck title: dc/l-sign recognition of spike glycoprotein promotes sars-cov- trans-infection and can be inhibited by a glycomimetic antagonist date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: o yd h the efficient spread of sars-cov- resulted in a pandemic that is unique in modern history. despite early identification of ace as the receptor for viral spike protein, much remains to be understood about the molecular events behind viral dissemination. we evaluated the contribution of c-type lectin receptors (clrs) of antigen-presenting cells, widely present in air mucosa and lung tissue. dc-sign, l-sign, langerin and mgl bind to diverse glycans of the spike using multiple interaction areas. 
using pseudovirus and cells derived from monocytes or t-lymphocytes, we demonstrate that while virus capture by the clrs examined does not allow direct cell infection, dc/l-sign, among these receptors, promote virus transfer to permissive ace + cells. a glycomimetic compound designed against dc-sign enables inhibition of this process. thus, we describe a mechanism potentiating viral capture and spreading of infection. early involvement of apcs opens new avenues for understanding and treating the imbalanced innate immune response observed in covid- pathogenesis in the detection of carbohydrate-based pathogen-associated molecular patterns by antigen-presenting cells (apc), including macrophages and dendritic cells, and in the elaboration of the immune response (geijtenbeek and gringhuis, ; takeuchi and akira, ) . many innate immune cells express a wide variety of clrs, which differ between cell types, allowing specific adjustments of the immune response upon target recognition. thus, clrs such as dectin- , mincle, mgl (macrophage galactose lectin), langerin and dc-sign are major players in the recognition of pathogenic fungi, bacteria, parasites and viruses (de jong et al., ; van kooyk and geijtenbeek, ; mnich et al., ; van breedam et al., ) . the interaction of these clrs with their ligands allows dendritic cells (dc) to modulate the immune response towards either activation or tolerance. this is done in particular through antigen presentation in lymphoid organs (the primary mission of apcs) but also through the release of cytokines. thus, dcs have a major role in modulating the immune response from the early stages of infection. to fulfill their sentinel function, dcs are localized at and patrol the sites of first contact with a pathogen, such as epithelia and mucosal interfaces, including the pulmonary and nasopharyngeal mucosae. similarly, alveolar macrophages are found in the lung alveoli.
in this battle for infection, some pathogens have evolved strategies to circumvent the initial role of clrs in activating immunity and even to divert clrs to their benefit for their infection process. many viruses associate with clrs and other host factors at the cell surface to facilitate their transfer towards their specific target receptors, which will trigger fusion of viral and host membranes. this kind of viral subversion has been reported for several c-type lectin receptors, including l-sign (also called dc-signr) and especially dc-associated dc-sign, which promotes cis- and/or trans-infection by several viruses such as hiv, cytomegalovirus, dengue, ebola and zika viruses (alvarez et al., ; carbaugh et al., ; geijtenbeek et al., ; halary et al., ; navarro-sanchez et al., ) . in particular, dc-sign mediates direct hiv infection of dcs (cis-infection) and can also induce trans-infection of t cells, the primary target of the virus (de witte et al., ) , while in the case of dengue and ebola, dc-sign allows direct cis-infection of the receptor-carrying cells (alvarez et al., ; navarro-sanchez et al., ) . even more noteworthy nowadays, dc-sign and l-sign (hereinafter collectively referred to as dc/l-sign) have also been reported to be involved in the enhancement of sars-cov- infection (jeffers et al., ; marzi et al., ; yang et al., ) . in the context of the current covid- pandemic, attention is now focused on the sars-cov- virus (zhou et al., ) . coronaviruses use a homotrimeric glycosylated spike (s) protein protruding from their viral envelope to interact with cell membranes and promote fusion upon proteolytic activation. in the case of sars-cov- , a first cleavage occurs within infected cells, at the level of a furin site (s /s site), generating two functional subunits s and s that remain complexed in a prefusion conformation in newly formed virus.
s contains the fusion machinery of the virus, while the surface unit s contains the receptor-binding domain (rbd) and stabilizes s in its pre-fusion conformation. the s protein of both sars-cov- (hoffmann et al., ; letko et al., ; walls et al., ; zhou et al., ) and sars-cov- (li et al., ) use ace (angiotensin-converting enzyme ) as their primary receptor. for sars-cov- spike, interaction of its rbd with ace , as well as a second proteolytic cleavage at a s ' site, trigger further irreversible conformational changes in s , thus engaging the fusion process (hoffmann et al., ) . the sequence of events around the s protein/ace interaction are becoming increasingly clearer, but much remains to be unraveled about additional factors facilitating the infection such as sars-cov- delivery to the ace receptor. indeed, s proteins from both sars-cov- and sars-cov- have identical affinity for ace (walls et al., ) , but this translates to very different transmission rates. we posit that the enhanced transmission rate of sars-cov- relative to sars-cov- (hca lung biological network et al., ) might result from a more efficient viral adhesion through host-cell attachment factors, which may promote efficient infection of ace + cells. this type of mechanism is frequently exploited by viruses using alternatively heparan sulfate, glycolipids or clrs to concentrate and scan cell surface for their receptor. additionally, in the case of sars-cov- , a new paradigm is needed to untangle the complex clinical picture, resulting in a vast range of possible symptoms and in a spectrum of disease severity associated on one hand with active viral replication and cell infection through interaction with ace along the respiratory tract, and, on the other hand, to the development of excessive immune activation, i.e. the so called "cytokine storm", that is related to additional tissue damage and potential fatal outcomes. in this framework, c-type lectin prrs and the apcs displaying them, i.e. 
dcs and macrophages, can play a role both as viral attachment factors and in immune activation. thus, their role in sars-cov- infection deserves attention and we focused on dc-sign and l-sign because of their involvement in sars-cov- infections (jeffers et al., ; marzi et al., ; yang et al., ) . l-sign is expressed in type ii alveolar cells of human lungs as well as in endothelial cells and was identified as a cellular receptor for sars-cov- s glycoprotein (jeffers et al., ) . dc-sign was also characterized as a sars-cov- s protein receptor (marzi et al., ) able to enhance virus cellular entry by dc transfer to ace + pneumocytes (yang et al., ) . recent thorough glycan and structural analyses comparing both sars-cov- / spike glycoproteins have shown that glycosylation is mostly conserved in the two proteins, both in position and nature of the glycan exposed (watanabe et al., a, b; wrapp et al., ) , creating a glycan shield which complicates neutralization by antibodies. secondly, elegant molecular dynamics simulations suggested how some of the spike glycans may directly modulate the dynamics of the interaction with ace , stabilizing the up conformation of the rbd domain (casalino et al., ; zhao et al., ) . finally, and as yet unexplored, spike glycans may contribute to infectivity by acting as anchor points for dc-sign and l-sign on host cell surfaces. indeed, % of glycans are of the oligomannose type and could therefore constitute ligands for clrs and notably for dc-sign and l-sign. this also argues for the potential use of these clrs by sars-cov- , as does sars-cov- . additionally, some mutations modulating sars-cov- virulence have an impact on the glycosylation level of the spike. as an example, the d g mutation, which increases virulence, has been reported as potentially increasing glycosylation at a neighboring asparagine (brufsky and lotze, ; jia et al., ; korber et al., ) .
a recent proteomic profiling study pointed to dc-sign as a mediator of genetic risk in covid- (katz et al., ) and finally it is of note that dc/l-sign expression is induced by proinflammatory cytokines such as il- , il- , il- and il- , known to be overexpressed in severe sars and covid- cases (lucas et al., ; relloso and puig-kroger) . these observations prompted us to investigate the potential interaction of c-type lectin receptors, notably dc/l-sign, with sars-cov- through glycan recognition of the spike envelope glycoprotein, as well as their potential role in sars-cov- transmission. in order to accurately analyze the interaction properties of the spike protein from sars-cov- with c-type lectin receptors, we expressed and purified recombinant spike protein using an expression system well characterized in terms of its site-specific glycosylation. we used here the same construct that was used both to obtain the cryo-electron microscopy structure of the spike (wrapp et al., ) and for extensive characterization of glycan distribution on the spike surface (watanabe et al., a) . expression was performed as reported, without using kifunensine to avoid blocking glycan processing. the spike protein was purified by exploiting its xhis tag, followed by superose size exclusion chromatography (sec). deconvolution of the sec chromatogram allowed selection of the best fractions ( figure b ). sds-page analysis confirmed protein purity, and differences in migration after reduction supported the presence of the expected disulfide bridges and thus proper folding ( figure a ). furthermore, sample quality and trimeric assembly were confirmed by d class averages of the spike obtained from a negatively stained sample observed under the electron microscope ( figure c ). this construct contains " p" stabilizing mutations at residues and (pallesen et al., ) , an inactivated furin cleavage site at the s /s interface, and a c-terminal sequence optimizing trimerization (wrapp et al., ) .
nonetheless, we observed limited stability over a one-week time scale at °c. to further improve protein stability, and therefore ensure the quality of the following investigations, we optimized the storage buffer. increasing the ionic strength of the purification buffer up to mm nacl proved successful, preserving the trimeric state at °c for at least three weeks (figure d to g). this "high-salt" concentration does not modify the structural properties of the protein, as shown by the identical elution profile in sec (figure b); in addition, negative-stain em images are better in "high-salt" conditions (figure d and f). all protein samples were therefore subsequently produced in mm nacl and stored at - °c. the buffer was then modified according to the analysis performed.
(a) sds-page analysis ( % acrylamide gel) of µg purified sars-cov- s protein; non-reduced and reduced with mercapto-ethanol, nr and r, respectively.
(b) chromatograms of the gel filtration profile of sars-cov- s protein using buffer with mm nacl (green line), mm nacl (blue line) and mm nacl (thick red line). manual deconvolution of the gel filtration chromatogram at mm nacl: principal peak (thin red line) and contaminants (dashed red line). collected fractions are represented by the grey area.
(c) classification of particles of sars-cov- s protein after the first step of purification on a histrap hp column, using relion (auto-picking mode).
(d) to (g) quality control of sars-cov- s protein performed by negative staining transmission electron microscopy (tem) using uranyl acetate as stain ( % w/v). scale bar is nm. (d) and (e) sample produced in mm nacl buffer, on the day of production and after days at °c, respectively. (f) and (g) sample produced in mm nacl buffer, on the day of production and after days at °c, respectively.
dc-sign and l-sign have already been described as receptors of sars-cov- , and twenty out of the twenty-two sars-cov- s protein n-linked glycosylation sequons are conserved.
glycan shielding represents to % of the spike surface considering the head or the stalk of the s ectodomain, respectively (casalino et al., ; sikora et al., ). one third of the n-glycans of the sars-cov- spike are of the oligomannose type (watanabe et al., a). these glycans are common ligands for dc-sign and l-sign, and also for langerin, a clr of langerhans cells, a subset of tissue-resident dcs of the skin, also present in mouth and vaginal mucosae (hussain and lehner, ). to compare their recognition capabilities, spr interaction experiments were performed with the various clrs on immobilized sars-cov- s proteins. first, an s protein functionalized surface was generated using a standard procedure for covalent amine coupling onto the surface. the functionalization degree of this "non-oriented" surface depends upon the number of solvent-exposed lysine residues (figure a), which may be severely restricted by the glycan shield discussed above. such restricted protein orientation could preclude the accessibility of some specific n-glycan clusters located close to the linkage site and the sensor surface, thus hampering recognition by the oligomeric clrs tested. in order to overcome these limitations, we devised and generated a so-called "oriented surface" where the s protein is captured via its c-terminal streptagii extremities onto a streptactin functionalized surface (figure b). in this set-up, no lateral parts of the s protein are attached to, and thus masked by, the sensor surface. moreover, in the "oriented surface" set-up the spike protein is presented as it would be at the surface of the sars-cov- virus, which might better reflect the physiological interaction with host receptors. considering both surface set-ups for the spike, the "non-oriented" one may favor access to n-glycans of the spike's stalk domain while the "oriented" one may favor access to n-glycans of the head domain.
on the c-type lectin receptor side, we tested exclusively recombinant constructs corresponding to the extracellular domains (ecd), containing both their carbohydrate recognition domain (crd) and their oligomerization domain. thus, the specific topological presentation of their crd as well as their oligomeric status is preserved for each clr, going from tetramers for dc-sign and l-sign to trimers for mgl and langerin, ensuring interactions with avidity properties as close as possible to the physiological conditions for each clr. sensorgrams of interaction with both types of surface for the various clrs are presented in figure a and b. dc-sign and l-sign, initially tested on both surfaces, recognized the spike with the same profile, whatever the set-up. thus, the next two clrs were tested on only one type of surface. langerin was found to interact with the s protein, in agreement with the presence of oligomannose-type glycans. finally, mgl, a lectin that specifically recognizes glycans bearing terminal gal or galnac residues, also interacted with the s protein (figure a). this shows that complex n-glycans may also serve as potential anchors for the sars-cov- s protein to cell surface clrs. while all clrs tested interacted with the spike, the interactions observed are not all equivalent. unfortunately, the complexity of the process, which probably involves multiple binding sites per oligomeric clr, prevented fitting with classical kinetic models and thus precluded the determination of kinetic rate constants. nevertheless, an apparent equilibrium dissociation constant (kd) could be obtained by steady-state fitting for dc-sign, l-sign and mgl. for langerin, despite a longer injection time, a much higher range of concentrations would have been required to reach equilibrium and accurately evaluate an apparent kd.
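as an illustration of the steady-state analysis mentioned above, the following python sketch fits equilibrium responses to a one-site langmuir isotherm, req = rmax · c / (kd + c), to obtain an apparent kd. it is a minimal sketch under stated assumptions, not the fitting routine used in this work: the one-site model, the function names and the example concentrations and responses are all hypothetical, and with oligomeric lectins the fitted value reflects avidity rather than a true single-site affinity.

```python
import math


def steady_state_response(conc, kd, rmax):
    """Equilibrium response for a one-site Langmuir isotherm (assumed model)."""
    return rmax * conc / (kd + conc)


def _best_rmax(concs, responses, kd):
    # For a fixed KD the model is linear in Rmax, so least squares is closed-form.
    occupancy = [c / (kd + c) for c in concs]
    num = sum(f * r for f, r in zip(occupancy, responses))
    den = sum(f * f for f in occupancy)
    return num / den


def _sse(concs, responses, log_kd):
    # Sum of squared residuals at a given log10(KD), with Rmax profiled out.
    kd = 10.0 ** log_kd
    rmax = _best_rmax(concs, responses, kd)
    return sum((r - steady_state_response(c, kd, rmax)) ** 2
               for c, r in zip(concs, responses))


def fit_apparent_kd(concs, responses, log_lo=-3.0, log_hi=3.0, step=0.01):
    """Return (apparent KD, Rmax); KD is in the same units as concs."""
    # Coarse scan over log10(KD) to locate the neighbourhood of the minimum.
    n = int(round((log_hi - log_lo) / step))
    grid = [log_lo + i * step for i in range(n + 1)]
    best = min(grid, key=lambda x: _sse(concs, responses, x))
    a, b = best - step, best + step
    # Golden-section refinement inside the bracketing interval.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if _sse(concs, responses, c) < _sse(concs, responses, d):
            b = d
        else:
            a = c
    kd = 10.0 ** ((a + b) / 2.0)
    return kd, _best_rmax(concs, responses, kd)


# Hypothetical, noise-free example data (concentrations in uM, responses in RU).
concs = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
responses = [steady_state_response(c, 5.0, 300.0) for c in concs]
kd, rmax = fit_apparent_kd(concs, responses)
print(f"apparent KD = {kd:.2f} uM, Rmax = {rmax:.0f} RU")
```

such a fit is only meaningful when the highest concentrations approach saturation, which is the limitation noted above for langerin.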
dc/l-sign and mgl showed affinities in the µm range, from around to µm (table ), depending on the clr and the surface type, while langerin has an affinity at least one order of magnitude lower. although kinetic association and dissociation rate constants (kon and koff) could not be evaluated, visual inspection of the sensorgrams clearly reveals different behaviors for dc-sign and l-sign, independent of the surface set-up. while association and dissociation seem to be very fast for dc-sign, l-sign sensorgrams suggest much slower association and dissociation rates that compensate each other to provide a kd similar to that of dc-sign. however, while the higher kon value for dc-sign argues for a faster formation of the dc-sign/s protein complex, the lower koff value for l-sign suggests that the l-sign/s protein complex might be more stable. finally, for dc-sign and l-sign, which were tested on both the "non-oriented" and "oriented" s surfaces, no obvious difference was observed in the interaction sensorgrams. this suggests that the interaction is not restricted to a limited glycan cluster, but rather that oligomannose-type glycans are multiple, accessible and distributed over the whole s protein.
figure . they are the average from to independent measurements with different s protein preparations.
the spr interaction analysis argues for multiple potential binding sites for clrs on the s protein. such an initial host adhesion mechanism could be essential for efficient viral capture, concentration of viral particles on the cell surface and subsequent enhanced ace targeting and infection. negative stain electron microscopy was used to visualize potential dc-sign/s protein complexes. immediately after a fresh purification of s protein, sec fractions corresponding to the pure trimeric spike were mixed with a dc-sign ecd preparation in a molecular ratio : (meaning trimeric spike for tetrameric dc-sign ecd).
in order to enrich the proportion of complex in the sample for em observation, we directly reinjected this mix onto the same sec column and recovered fractions in the elution profile corresponding to higher molecular weight, thus potentially corresponding to the dc-sign/s protein complex. these fractions were immediately used to prepare and observe negatively stained electron microscopy grids (figure ). figure showed a great infectivity of all primary cell lines. however, this infection was dc-sign independent, since anti-dc-sign antibodies did not impact the infection level (figure a). pseudovirions dc/l-sign are known to enhance viral uptake for direct infection in the process referred to as cis-infection and can also internalize viral particles into cells for storage in non-lysosomal compartments and subsequent transfer to susceptible cells in the process known as trans-infection (alvarez et al., ; geijtenbeek et al., ). to study the potential function of dc/l-sign in sars-cov- trans-infection, mddcs were incubated with vsv/sars-cov- for h and, after extensive washing, they were placed onto susceptible vero e cells, the reference ace + cell line for sars-cov- cell culture (zhou et al., ). interestingly, dc-sign promoted efficient sars-cov- trans-infection from mddcs to vero e cells (figure b). an anti-dc-sign antibody substantially reduced the infectivity observed ( % inhibition), confirming the role of this clr in the process of sars-cov- trans-infection. the figure c ). polyman (pm , figure a) is a multivalent glycomimetic mannoside tailored for optimal interaction with dc-sign (ordanini et al., ). it is known to bind the dc-sign carbohydrate recognition domain (crd), eliciting a th- type response from human immature monocyte-derived dendritic cells (berzi et al., ). it also inhibits dc-sign mediated hiv infection of cd + t lymphocytes with an ic of nm (ordanini et al., ).
pm was used in spr competition experiments to inhibit dc-sign binding to immobilized spike protein, in both the oriented and non-oriented set-ups (figure b-c). the lectin ( µm) was co-injected with variable concentrations of pm (from µm to . µm), and the results showed clear dose-dependent inhibition. no significant difference was observed between the oriented and non-oriented surfaces, which is consistent with the binding data previously discussed (figure ). thus, an ic of , ± , µm correlates with the interaction affinity between dc-sign and the spike functionalized surfaces. this suggests that, in such a competition test where the reporting interaction can be limiting (porkolab et al., ), a higher true avidity towards dc-sign can be expected for pm (figure c). trans-infection, respectively, which is consistent with the results described for hiv inhibition and confirming an effective affinity in the nanomolar range for pm (ordanini et al., ). even if viruses mainly target one specific cellular entry receptor within their infection cycle, their efficiency often largely depends upon additional binding events at the cell surface, which promote access to the so-called primary receptor. although such additional receptors may not promote any fusion step, they can drive viral internalization through endocytic processes or simply by viral adhesion to the host cell, accumulation of viral particles on the cell surface and finally engagement with the primary receptor followed by the fusion event. different types of attachment factors can be found on the host cell surface: either glycans, such as heparan sulfate, glycolipids or protein n-glycans, often targeted by viral envelope proteins with lectin-like properties (dimitrov, ), or immune lectin-type receptors including clrs and siglecs (chiodo et al., ).
given the importance of the role played by glycan determinants in this recognition event, particular attention must be paid to the quality of the s protein sample used. indeed, it should ideally be as close as possible to the physiological product in terms of glycosylation pattern and distribution. in particular, the expression system considered as well as the local protein environment may have a strong impact on the type of glycans added as well as on their level of maturation. viral envelope proteins display a dense array of glycans resulting from evolutionary pressure to mask immunogenic epitopes at their surfaces. this glycan density, coupled with specific structural features of envelope proteins, generates steric constraints preventing proper access of glycan processing enzymes to some substrate glycans (behrens and crispin, ). expressing the whole spike ectodomain or just the single rbd may therefore lead to very different n-glycan distributions, especially considering that the rbd contains only n-glycosylation sites, while up to n-glycan sites are found over the whole spike protein. for these reasons, we selected the entire ectodomain of s as our model to investigate additional attachment factors for sars-cov- . we expressed the protein using the same construct that enabled the spike em structure (wrapp et al., ) and its glycan profiling (watanabe et al., a), using a hek -derived expression system known to provide a glycosylation pattern similar to that of epithelial tissues. similarly, we expressed the entire ectodomain for the clrs as well, avoiding fc-crd constructs, in order to preserve their specific oligomeric assembly and therefore their avidity properties. using spr we showed that all the c-type lectin receptors tested interact with the spike protein. three of those, dc-sign, l-sign and langerin, share the ability to recognize high-mannose oligosaccharides.
in particular, l-sign is tightly specific for high-mannose, while dc-sign additionally recognizes fucose-based ligands (several lewis-type glycans) and langerin binds sulfated sugars as well. mgl is specific for gal- and galnac-terminated glycans and may bind to complex n-glycans as a function of their level of maturation (valverde et al., ). in the glycosylation pattern of the spike protein reported in figure b, all glycosylation sites depicted in green or orange are potential ligands for l-sign and langerin, with different levels of probability from site to site, while mgl's ligands will be found at magenta sites. dc-sign might potentially recognize all of them. besides all considerations about specificity, the accessibility of n-glycan sites upon spike presentation at the sars-cov- virus surface is also of paramount importance for recognition. dc-sign and l-sign share the same tetrameric organization and they recognize the spike functionalized spr surface with similar avidity, suggesting that they share a primary recognition epitope, i.e. high-mannose glycans. the spr experiments described here have been performed sequentially on the same spike surfaces with the different clrs. of these, dc-sign and l-sign have similar organization and molecular weight (feinberg et al., ); thus the difference in ru level reached by the two lectins at equilibrium (approx. ru higher for dc-sign) suggests that there is more dc-sign binding and thus more epitopes available for it, implying that high-mannose glycans are not the only glycan epitope used here by dc-sign. this suggests that dc-sign may also be able to bind to some of the complex n-glycosylation sites (in magenta in figure b), possibly presenting a proper fucosylation pattern that generates lewis-type epitopes.
these considerations, in addition to the oligomeric state of the clrs examined, lead us to rule out a simple interaction with a single preferential epitope and a : stoichiometry in favor of a more complex picture with multiple and simultaneous binding events, much like the "velcro effect" often recalled when discussing glycan-protein interactions. this is clearly supported by the em characterization of spike/dc-sign complexes (figure c), which shows several interaction areas on the spike, and can also explain the absence of affinity differences between non-oriented and oriented spike surfaces in spr. the complexity of the binding event(s) described above does not allow kinetic association and dissociation rates to be extracted from the sensorgrams. only a global apparent kd could be inferred, giving avidity levels. however, l-sign may have a slightly better affinity (around µm, while values ranging from to µm have been obtained for dc-sign and mgl) and seems to generate more stable complexes. such a µm range of affinity, as determined here for soluble forms of the clrs, will result in a surface avidity several orders of magnitude higher at the cell membrane (porkolab et al., attachment point for viral capture (cambi et al., ). and initial dissemination of a number of viral agents has been described in animal models for measles (mesman et al., , ), japanese encephalitis virus (liang et al., ) and in vivo for hiv- . the founder viruses that initiate hiv infection through mucosa exhibit a higher content of high-mannose carbohydrates (go et al., ), as well as higher binding to dcs dependent on dc-sign expression (parrish et al., ). in the case of ebola virus, dcs and macrophages have been shown to be the initial targets of infection in macaques (geisbert et al., ; martinez et al., ) and circulating dc-sign + dcs have been shown to be the first cell subset infected upon intranasal ebov inoculation in a murine model (lüdtke et al., ).
in sars-cov- infection it was shown that dc/l-sign can enhance viral infection and dissemination (marzi et al., ; yang et al., ) and it has even been proposed that l-sign could act as an alternative cell receptor to ace (jeffers et al., ). our work shows that dc/l-sign are important enhancers of infection mediated by the s protein of sars-cov- that greatly facilitate viral transmission to susceptible cells. in vivo, dc-sign is largely expressed in immature dendritic cells in submucosa and in tissue-resident macrophages, including alveolar macrophages (tailleux et al., ), whereas l-sign is highly expressed in human type ii alveolar cells and lung endothelial cells (jeffers et al., ). infection (geijtenbeek et al., ). an obvious increase in infection was observed when sars-cov- pseudovirions were incubated with these primary cells and then placed in contact with susceptible veroe cells (figure b). similar results were obtained with the t lymphocyte jurkat cell line. t lymphocytes lack ace expression (hamming et al., ) and neither the parental jurkat cell line nor jurkat cells expressing dc/l-sign were directly infected by sars-cov- pseudovirions (figure a). therefore, we did not observe that these clrs can function as alternative receptors to ace in non-permissive cells such as t lymphocytes or hek (supplementary information), as has been recently suggested by amraie et al. (amraie et al., ). on the extracellular domains (ecd) of dc-sign (residues: - ), l-sign (residues: - ), langerin (residues: - ) and mgl (residues: - ) were produced and purified as already described (achilli et al., ; chabrol et al., ; maalej et al., ; reina et al., ) while sars-cov- spike protein was expressed and purified as follows. the mammalian expression vector used for the s ectodomain, derived from a p h vector, was a kind gift from j. mclellan (wrapp et al., ). this construct possesses, at its c-terminus, an xhis tag followed by streptagii.
expi cells grown in expi expression medium were transiently transfected with the s ectodomain vector according to the manufacturer's protocol (thermo fisher scientific). cultures were harvested five days after transfection and the medium was negative-stain grids were prepared using the mica-carbon flotation technique (valentine et al., ) . µl of spike samples from purifications diluted at about . - . mg/ml were adsorbed on the clean side of a carbon film previously evaporated on mica and then stained using % w/v uranyl acetate for s. the sample/carbon ensemble was then fished using an em grid and air-dried. images were acquired under low dose conditions (< e−/Å ) on a tecnai fei electron microscope operated at kv using a gatan orius sc camera (gatan, inc., pleasanton, ca) at , x nominal magnification. to facilitate the visualization of the molecules, a gaussian filter was applied to the images using photoshop, then the gray levels were saturated and the background eliminated. for the d classification, images were processed with relion . (scheres, ) . ctf was estimated with ctffind- . (zhang, two types of surface plasmon resonance (spr) experiments were performed at °c on a biacore t . the first experiments with non-oriented spike surfaces were performed using a cm sensor chip, functionalized at μl/min. the second type of experiments used oriented spike surfaces and were performed using a cm sensor chip functionalized at μl/min. the procedure for oriented functionalization has been described in our recent work (porkolab et al., baby hamster kidney cells (bhk- , - - maw, kerafast, boston, ma) and african green monkey cell line (veroe ) were cultured in dulbecco´s modified eagle medium (dmem) supplemented with % heat-inactivated fetal bovine serum (fbs), μg/ml gentamycin and mm l-glutamine. jurkat, jurkat dc-sign, jurkat l-sign (alvarez et al., ) and jurkat langerin were maintained in rpmi supplemented with % heat-inactivated fbs, μg/ml gentamycin and mm l-glutamine. 
blood samples were obtained from healthy human donors (hospital for differentiation of m -mdms, cells were incubated at ºc with % co for days and activated with m-csf ( u/ml) (miltenyi biotec) every second day. rvsv-luc pseudotypes were generated following a published protocol ( after h incubation at ºc, cells were washed exhaustively with pbs and then dmem supplemented with % heat-inactivated fbs, μg/ml gentamycin and mm lglutamine were added. pseudotyped particles were collected - h post-inoculation, clarified from cellular debris by centrifugation and stored at - ºc (hoffmann et al., ; letko et al., ; whitt, ) . infectious titers were estimated as tissue culture infectious dose per ml by limiting dilution of rvsv-luc-pseudotypes on vero e cells. luciferase activity was determined by luciferase assay (steady-glo luciferase assay system, promega). cell lines: jurkat, jurkat dc-sign and jurkat l-sign ( x cells) or primary cells: mddcs, m -mdms ( x cells) were challenged with sars-cov- , ebov-gp or vsv-g pseudotyped recombinant viruses (moi: . - ). after h of incubation, cells were washed twice with pbs and lysed for luciferase assay. for trans-infection studies, jurkat dc-sign, jurkat l-sign, jurkat langerin ( x cells) or mddcs ( x cells) were challenged with recombinant sars-cov- , ebov-gp or vsv-g pseudotyped viruses (moi: . - ) and incubated during h at room temperature with rotation. cells were then centrifuged at rpm for minutes and washed twice with pbs supplemented with . % bovine serum albumin (bsa) and mm cacl . jurkat dc-sign and mddcs were then resuspended in rpmi medium and co-cultivated with adherent vero e cells ( . x cells/well) on a -well plate. after h, the supernatant was removed and the monolayer of vero e was washed with pbs three times and lysed for luciferase assay. polyman (pm ) is a known glycomimetic ligand of dc-sign and an antagonist of dc-sign mediated hiv trans-infection (berzi et al., ; ordanini et al., ) . 
it was synthesized as previously described and tested in spr studies as an inhibitor of dc-sign interaction to the spike protein of sars-cov- , using both the oriented and nonoriented s surface described above. in both cases, a µm solution of dc-sign in a running buffer composed of mm tris ph , mm nacl, mm cacl , . % p surfactant was co-injected with variable concentrations of polyman , from µm to . µm, in the same buffer. ic values were determined from the plot of pm concentration vs % inhibition by fitting four-parameter logistic model as previously described (varga et al., ) . in the infection studies, cells were first incubated with the compound pm for min at room temperature with rotation and then challenged with sars-cov- recombinant viruses (moi: . - ) during h at room temperature with rotation. the concentrations tested for compound pm were and . μm. as a control, inhibition experiment was performed in the presence of anti-dc/l-sign antibody (r&d systems). cells were then washed as described above, resuspended in rpmi medium and co-cultivated with adherent vero e cells ( . x cells/well) on a -well plate. after h, the supernatant was removed and the monolayer of vero e was washed with pbs three times and lysed for luciferase assay. the authors declare no conflict of interest. tetralec, artificial tetrameric lectins: a tool to screen ligand and pathogen interactions c-type lectins dc-sign and l-sign mediate cellular entry by ebola virus in cis and in trans cd l/l-sign and cd /dc-sign act as receptors for sars-cov- and are differentially expressed in lung and kidney epithelial and endothelial cells (biorxiv) is innate immunity our best weapon for flattening the curve? 
structural principles controlling hiv envelope glycosylation c-type lectin receptors in antiviral immunity and viral escape pseudo-mannosylated dc-sign ligands as imbalanced host response to sars-cov- drives development of covid- dc/l-signs of hope in the covid- pandemic microdomains of the c-type lectin dc-sign are portals for virus entry into dendritic cells envelope protein glycosylation mediates zika virus pathogenesis shielding and beyond: the roles of glycans in sars-cov- spike protein (biorxiv) glycosaminoglycans are interactants of langerin: comparison with gp highlights an unexpected calcium-independent binding mode novel ace -independent carbohydrate-binding of sars-cov- spike protein to host lectins and lung microbiota (biorxiv) virus entry: molecular mechanisms and biomedical applications the dc-sign-related lectin lsectin mediates antigen capture and pathogen binding by human myeloid cells structural basis for selective recognition of oligosaccharides by dc-sign and dc-signr signalling through c-type lectin receptors: shaping immune responses dc-sign, a dendritic cell-specific hiv- -binding protein that enhances trans-infection of t cells pathogenesis of ebola hemorrhagic fever in cynomolgus macaques characterization of glycosylation profiles transmitted/founder envelopes by mass spectrometry analysis of the sars-cov- spike protein glycan shield: implications for immune recognition human cytomegalovirus binding to dc-sign is required for dendritic cell infection and target cell trans-infection tissue distribution of ace protein, the functional receptor for sars coronavirus. 
a first step in understanding sars pathogenesis sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor clinical features of patients infected with novel coronavirus in wuhan comparative investigation of langerhans' cells and potential receptors for hiv in oral, genitourinary and rectal epithelia cd l (l-sign) is a receptor for severe acute respiratory syndrome coronavirus analysis of the mutation dynamics of sars-cov- reveals the spread history and emergence of rbd mutant c-type lectin langerin is a beta-glucan receptor on human langerhans cells that recognizes opportunistic and pathogenic fungi proteomic profiling in biracial cohorts implicates dc-sign as a mediator of genetic risk in covid- dc-sign: escape mechanism for pathogens tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses angiotensin-converting enzyme is a functional receptor for the sars coronavirus dc-sign binding contributed by an extra n-linked glycosylation on japanese encephalitis virus envelope protein reduces the ability of viral brain invasion longitudinal immunological analyses reveal inflammatory misfiring in severe covid- patients ebola virus infection kinetics in chimeric mice reveal a key role of t cells as barriers for virus dissemination the human macrophage galactose-type lectin, mgl, recognizes the outer core of e. coli lipooligosaccharide. 
chembiochem eur the role of antigen-presenting cells in filoviral hemorrhagic fever: gaps in current knowledge dc-sign and dc-signr interact with the glycoprotein of marburg virus and the s protein of severe acute respiratory syndrome coronavirus analysis of the interaction of ebola virus glycoprotein with dc-sign (dendritic cell-specific intercellular adhesion molecule -grabbing nonintegrin) and its homologue dc-signr a prominent role for dc-sign+ dendritic cells in initiation and dissemination of measles virus infection in non-human primates measles virus suppresses rig-i-like receptor activation in dendritic cells via dc-sign-mediated inhibition of pp phosphatases human coronavirus nl utilizes heparan sulfate proteoglycans for attachment to target cells c-type lectin receptors in host defense against bacterial pathogens dendritic-cell-specific icam -grabbing non-integrin is essential for the productive infection of human dendritic cells by mosquito-cell-derived dengue viruses designing nanomolar antagonists of dc-sign-mediated hiv infection: ligand presentation using molecular rods immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen structures of mers-cov spike glycoprotein in complex with sialoside attachment receptors phenotypic properties of transmitted founder hiv- anti-siglec- antibodies block ebola viral uptake and decrease cytoplasmic viral entry development of c-type lectin-oriented surfaces for high avidity glycoconjugates: towards mimicking multivalent interactions on the cell surface docking, synthesis, and nmr studies of mannosyl trisaccharide ligands for dc-sign lectin dc-sign (cd ) expression is il- dependent and is negatively regulated by ifn, tgf-β, and anti-inflammatory agents relion: implementation of a bayesian approach to cryo-em structure determination map of sars-cov- spike epitopes not 
shielded by glycans (biorxiv) dc-sign induction in alveolar macrophages defines privileged target host cells for mycobacteria in patients with tuberculosis pattern recognition receptors and inflammation structural basis for human coronavirus attachment to sialic acid receptors immunology of covid- : current state of the science regulation of glutamine synthetase. xii. electron microscopy of the enzyme from escherichia coli molecular recognition in c-type lectins: the cases of dc-sign, langerin, mgl, and l-sectin. chembiochem cbic bitter-sweet symphony: glycan-lectin interactions in virus biology a multivalent inhibitor of the dc-sign dependent uptake of hiv- and dengue virus function, and antigenicity of the sars-cov- spike glycoprotein site-specific glycan analysis of the sars-cov- spike vulnerabilities in coronavirus glycan shields despite extensive glycosylation generation of vsv pseudotypes using recombinant Δg-vsv for studies on virus entry, identification of entry inhibitors, and immune responses to vaccines langerin is a natural barrier to hiv- transmission by langerhans cells distinct roles for dc-sign+-dendritic cells and langerhans cells in hiv- transmission cryo-em structure of the -ncov spike in the prefusion conformation distinct cellular interactions of secreted and transmembrane ebola virus glycoproteins covid- : immunopathogenesis and immunotherapeutics ph-dependent entry of severe acute respiratory syndrome coronavirus is mediated by the spike glycoprotein and enhanced by dendritic cell transfer through dc-sign gctf: real-time ctf determination and correction virus-receptor interactions of glycosylated sars-cov- spike and human ace receptor (biorxiv) a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord- -s hbpvn authors: straus, marco r.; kinder, jonathan t.; segall, michal; dutch, rebecca ellis; whittaker, gary r. 
title: spint inhibits proteases involved in activation of both influenza viruses and metapneumoviruses date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: s hbpvn viruses possessing class i fusion proteins require proteolytic activation by host cell proteases to mediate fusion with the host cell membrane. the mammalian spint gene encodes a protease inhibitor that targets trypsin-like serine proteases. here we show the protease inhibitor, spint , restricts cleavage-activation efficiently for a range of influenza viruses and for human metapneumovirus (hmpv). spint treatment resulted in the cleavage and fusion inhibition of full-length influenza a/ca/ / (h n ) ha, a/aichi/ (h n ) ha, a/shanghai/ / (h n ) ha and hmpv f when activated by trypsin, recombinant matriptase or klk . we also demonstrate that spint was able to reduce viral growth of influenza a/ca/ / h n and a/x h n in cell culture by inhibiting matriptase or tmprss . moreover, inhibition efficacy did not differ whether spint was added at the time of infection or hours post-infection. our data suggest that the spint inhibitor has a strong potential to serve as a novel broad-spectrum antiviral. influenza-like illnesses (ilis) represent a significant burden on public health and can be caused by a range of respiratory viruses in addition to influenza virus itself ( ). an ongoing goal of anti-viral drug discovery is to develop broadly-acting therapeutics that can be used in the absence of definitive diagnosis, such as in the case of ilis. for such strategies to succeed, drug targets that are shared across virus families need to be identified. for the spint inhibition assays, trypsin (which typically resides in the intestinal tract and expresses a very broad activity towards different ha subtypes and hmpv f) served as a control ( ). in addition, furin was used as a negative control that is not inhibited by spint . 
as none of the peptides used in combination with the aforementioned proteases has a furin cleavage site we tested furin-mediated cleavage on a peptide with a h n hpai cleavage motif in the presence of nm spint (supplementary figure ) . we continued by measuring the vmax values for each protease/peptide combination in the presence of different spint concentrations and plotted the obtained vmax values against the spint concentrations on a logarithmic scale (supplementary figure ) . using prism software, we then determined the ic that reflects at which concentration the vmax of the respective reaction is inhibited by half. spint cleavage inhibition of a representative h n cleavage site by trypsin results in an ic value of . nm (table a) while the inhibition efficacy of spint towards matriptase, hat, klk and klk ranged from nm to nm (table a) . however, inhibition was much less efficient for plasmin compared with trypsin ( nm). we observed a similar trend when testing peptides mimicking the h n and h n ha cleavage sites using trypsin, hat, klk , plasmin and trypsin, matriptase, plasmin, respectively (table a) . with the exception of plasmin, we found that human respiratory tract proteases are inhibited with a higher efficacy compared to trypsin. we expanded our analysis to peptides mimicking ha cleavage sites of h n , h n (lpai and hpai), h n and h n that all reflected the results described above (table a) . only cleavage inhibition of h n ha by klk did not significantly differ from the observation made with trypsin (table a ). when we tested the inhibition of hmpv cleavage by trypsin, matriptase and klk , spint demonstrated high inhibition efficacy for all three tested proteases with measured ic s for trypsin, matriptase and klk of . nm, . nm and . nm, respectively (table b ).compared to the ic values observed with the peptides mimicking influenza ha cleavage site motifs the ic values for the hmpv f peptide were very low. . cc-by-nc-nd . 
international license is made available under a the copyright holder for this preprint (which was not peer-reviewed) is the author/funder. it . https://doi.org/ . / doi: biorxiv preprint vivo situation and requires validation by expressing the full-length fusion proteins in a cell culture model to test cleavage and cleavage inhibition of the respective protease ( ). however, before conducting these experiments we wanted to ensure that spint does not have a cytotoxic effect on cells. therefore, t cells were incubated with various concentrations of spint over a time period of hours. pbs and µm h o served as cytotoxic negative and positive controls respectively. we observed a slight reduction of about - % in cell viability when spint was added to the cells (figure ) . to test spint -mediated cleavage inhibition of full-length ha we expressed the has of a/ca/ / (h n ), a/x (h n ) and a/shanghai/ / (h n ) in t cells and added recombinant matriptase or klk protease that were pre-incubated with nm or nm spint . trypsin and the respective protease without spint incubation were used as controls. cleavage of ha was analyzed via western blot and the signal intensities of the ha bands were quantified using the control sample without spint incubation as a reference point to illustrate the relative cleavage of ha with and without inhibitor (figure a and b-d) . trypsin cleaved all tested ha proteins with very high efficiency that was not observed with matriptase or klk ( figure b -d). however, h n ha was cleaved by matriptase and klk to a similar extent without and with nm spint . nm spint led to a relative cleavage reduction of about % and % for matriptase and klk , respectively (figure a and b). klk -mediated cleavage of h n ha was reduced by about % when klk was pre-incubated with nm spint and by about % when nm spint was used (figure a and c). 
when we tested the cleavage inhibition of matriptase with h n ha as a substrate we found that nm and nm spint reduced cleavage to % and %, respectively, compared to the control (figure a and d). in contrast, nm spint had no effect on klk -mediated cleavage of h n ha while nm reduced relative cleavage by approximately % (figure a and d). in order to determine whether spint also prevented cleavage of hmpv f we first examined which proteases, in addition to trypsin and tmprss , were able to cleave hmpv f. first, we co-transfected the full length tmprss , hat and matriptase with hmpv f in vero cells. the f protein was then radioactively labeled with s methionine and cleavage was examined by quantifying the f full length protein and the f cleavage product. we found that tmprss and hat were able to efficiently cleave hmpv f while matriptase decreased the expression of f, though it is not clear if this was due to general degradation of protein or lower initial expression. however, matriptase demonstrated potential low-level cleavage when co-transfected ( figure a and b). we then examined cleavage by the exogenous proteases klk , klk and matriptase. compared with the trypsin control, klk and matriptase were able to cleave hmpv f, while klk was not ( figure c and d). in agreement with the peptide assay, cleavage of hmpv f by klk and matriptase was less efficient than for trypsin, and both the peptide and full-length protein assays demonstrate that klk does not cleave hmpv f. this also serves as confirmation that matriptase likely cleaves hmpv f, but co-expression with matriptase may alter protein synthesis, stability or turnover during synthesis and transport to the cell surface.
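the densitometry-based quantification of f cleavage described above can be illustrated with a minimal sketch. it assumes that percent cleavage is the cleavage product signal divided by the summed signal of the full-length and cleaved bands; the function name and the band intensities are hypothetical and are not taken from the study.

```python
def percent_cleavage(full_length: float, cleaved: float) -> float:
    """Share of the total F signal found in the cleavage-product band, in percent.

    Assumption: percent cleavage = cleaved / (cleaved + full_length) * 100.
    """
    total = full_length + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * cleaved / total


# Hypothetical densitometry readings in arbitrary units (not data from the study):
# each lane maps to (full-length band, cleavage-product band).
lanes = {"trypsin": (200.0, 800.0), "no protease": (950.0, 50.0)}

for name, (f_full, f_cleaved) in lanes.items():
    print(f"{name}: {percent_cleavage(f_full, f_cleaved):.1f}% cleaved")
```

normalising to the summed signal of both bands makes lanes comparable even when total f expression differs between conditions, which is relevant where a co-expressed protease lowers overall f levels.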
next, we tested spint inhibition of exogenous proteases trypsin, klk and matriptase. we pre-incubated spint with each protease, added it to vero cells expressing hmpv f and analyzed cleavage product formation. spint pre-incubation minimally affected cleavage at a concentration of nm but addition of nm spint resulted in inhibition of trypsin, klk and matriptase-mediated cleavage of hmpv, similar to our findings for ha ( figure e and f). the above presented biochemical experiments demonstrate that spint is able to efficiently inhibit proteolytic cleavage of hmpv f and several influenza a ha subtypes by a variety of proteases. for a functional analysis to determine whether spint inhibition prevents cell-cell fusion and viral growth, we examined influenza a infection in cell culture. while hmpv is an important human pathogen, influenza grows significantly better in cell culture compared with hmpv. first, we tested whether cleavage inhibition by spint resulted in the inhibition of cell-cell fusion. as described above, matriptase and klk were pre-incubated with nm and nm spint and subsequently added to vero cells expressing a/ca/ / (h n ) ha or a/shanghai/ / (h n ) ha. cells were then briefly exposed to a low ph buffer to induce fusion and subsequently analyzed using an immunofluorescence assay. when matriptase and klk were tested with nm spint and incubated with vero cells expressing h n ha, we still observed syncytia formation ( figure a ). however, mm spint resulted in the abrogation of syncytia formation triggered by cleavage of the respective ha by matriptase and klk . we made the same observation when we tested klk and h n ha ( figure b ). matriptase-mediated h n ha syncytia formation was inhibited by the addition of nm spint ( figure b ).
to ensure that cell-cell fusion inhibition is a result of ha cleavage inhibition through spint but not a side effect of spint treatment per se we expressed a/vietnam/ / (h n ) ha in vero cells and treated them with the inhibitor. h n ha possesses a hpai cleavage site and is cleaved intracellularly by furin during its maturation process. spint does not inhibit furin and is not able to cross cell membranes. thus, spint cannot interfere with the proteolytic processing of h n ha and therefore this control allows us to examine whether spint interferes with cell-cell fusion. figure c shows that h n ha forms large syncytia in the absence of spint as well as in the presence of nm spint . hence, we conclude that spint does not have a direct inhibitory effect on cell-cell fusion. to understand whether spint was able to inhibit or reduce the growth of virus in a cell culture model over the course of hours we transfected cells with human tmprss and human matriptase, two major proteases that have been shown to be responsible for the activation of distinct influenza a subtype viruses. tmprss is essential for h n virus propagation in mice and plays a major role in the activation of h n and h n viruses ( - ) . matriptase cleaves h n ha in a sub-type specific manner, is involved in the in vivo cleavage of h n ha and our results described above suggest a role for matriptase in the activation of h n ( , ) . at hours post transfection we infected mdck cells with a/ca/ / (h n ) at a moi of . and subsequently added spint protein at different concentrations. non-transfected cells served as a control and exogenous trypsin was added to facilitate viral propagation. the supernatants were harvested hours post infection and viral titers were subsequently analyzed using an immuno-plaque assay.
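growth inhibition in titer experiments of this kind is usually summarised as a log10 reduction relative to the control without inhibitor. the following is a minimal sketch of that arithmetic; the function name and the titers are hypothetical and are not values from the study.

```python
import math


def log10_reduction(control_titer: float, treated_titer: float) -> float:
    """Log10 drop in viral titer of a treated sample relative to its control."""
    if control_titer <= 0 or treated_titer <= 0:
        raise ValueError("titers must be positive to take a logarithm")
    return math.log10(control_titer / treated_titer)


# Hypothetical titers in plaque-forming units per ml (not data from the study).
control = 1e6   # no inhibitor
treated = 1e5   # with inhibitor

reduction = log10_reduction(control, treated)
print(f"log10 reduction: {reduction:.2f}")
print(f"fold reduction:  {10 ** reduction:.0f}x")
```

a sample below the limit of detection has no defined titer, so it cannot be passed to this function directly; how such samples are handled is a reporting decision that the sketch does not make.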
spint initially mitigated trypsin-mediated growth of h n at a concentration of nm and the extent of inhibition slightly increased with higher concentrations ( figure a , table a ). the highest tested spint concentration of nm reduced viral growth by about log ( figure a , table a ). we observed a similar pattern with cells transfected with human matriptase ( figure b , table a ). growth inhibition started at a spint concentration of nm and with the application of nm growth was reduced by approximately . logs ( figure b , table a ). when we infected cells expressing tmprss with h n and added spint , viral growth was significantly reduced at a concentration of nm. addition of nm spint led to a reduction of viral growth of about . logs ( figure c , table a ). we also tested whether spint could reduce the growth of a h n virus because it is a major circulating seasonal influenza subtype. however, tmprss and matriptase do not seem to activate h n viruses ( , ) . hence, trypsin and spint were added to the growth medium of cells infected with a/x h n . compared to control cells without added inhibitor spint significantly inhibited trypsin-mediated h n growth at a concentration of nm ( figure d , table a ). at the highest spint concentration of nm viral growth was reduced by about log ( figure d ). we also examined the effect of spint inhibition of hmpv spread over time. vero cells were infected with rghmpv at moi and subsequently treated with nm of spint and . µg/ml of tpck-trypsin. every hours, cells were scraped and the amount of virus present was titered up to hpi with spint and trypsin replenished daily. we find that untreated cells are infected and demonstrate significant spread through hpi. conversely, cells infected and treated with spint had no detectable virus up to hpi and very minimal virus detected at and hpi, demonstrating that spint significantly inhibits hmpv viral replication and spread ( figure ). antiviral therapies are often applied when patients already show signs of disease. therefore, we tested if spint was able to reduce viral growth when added to cells hours after the initial infection. cells were infected with . moi of a/ca/ / (h n ) and trypsin was added to promote viral growth. at the time of infection, we also added nm spint to one sample. a second sample received nm spint hours post infection. growth supernatants were harvested hours later, and viral growth was analyzed. we found that viral growth was significantly reduced regardless of whether spint was added at the time of infection or hours later ( figure , table b ). influenza a virus has caused four pandemics since the early th century and infects millions of people each year as seasonal flu, resulting in up to , deaths annually ( ). vaccination efforts have proven to be challenging due to the antigenic drift of the virus and emerging resistance phenotypes ( ). moreover, the efficacy of vaccines seems to be significantly reduced in certain high-risk groups ( ). prevalent antiviral therapies to treat influenza a virus-infected patients such as adamantanes and neuraminidase inhibitors target viral proteins but there is an increasing number of reports about circulating influenza a subtypes that are resistant to these treatments ( ). hmpv causes infections in the upper and lower respiratory tract with symptoms very similar to those of influenza infections and results in significant morbidity and mortality ( , ) . the most susceptible groups are young children, older adults and people who are immunocompromised ( - ). currently, there are no treatments available against hmpv infections.
in this study we focused on a novel approach that uses antiviral therapies targeting host factors rather than viral proteins, offering a broader and potentially more effective therapeutic approach ( ). we demonstrate that spint , a potent inhibitor of serine-type proteases, is able to significantly inhibit cleavage of hmpv f and ha, impair ha-triggered fusion of cells and hence reduce the growth of various influenza a strains in cell culture. we assessed cleavage of the hmpv fusion protein in vitro using a peptide cleavage assay modified from previous work on other viral fusion proteins ( , ). the hmpv f peptide was cleaved by trypsin, plasmin, matriptase and klk but was not cleaved by klk . to confirm these findings in a system in which the entire hmpv f protein was subject to cleavage, we co-expressed the fusion protein with tmprss , hat and matriptase proteases, or treated f with exogenous proteases klk , klk and matriptase. these findings are the first to identify proteases besides trypsin and tmprss that are able to cleave hmpv f. in addition, hmpv appears to utilize many of the serine proteases that influenza uses for ha processing and therefore offers strong potential as an antiviral target. spint demonstrates an advantage over other inhibitors of host proteases such as aprotinin, which was shown to be an effective antiviral but also seemed to be specific only for a subset of proteases ( ). it can be argued that a more specific protease inhibitor which inhibits only one or very few proteases might be more advantageous because it may result in fewer side effects.
with respect to influenza a infections, tmprss could represent such a specific target as it was shown to be a major activating protease for h n and h n in mice and human airway cells ( , , ( ) ( ) ( ) . however, there is no evidence that application of a broad-spectrum protease inhibitor results in more severe side effects than a specific one, as side effects may not be a consequence of the protease inhibition but of the compound itself acting against different targets in the body. the reports demonstrating that tmprss is crucial for h n and h n virus propagation in mice and cell culture suggest that it also plays a major role in the human respiratory tract. so far, however, it is unclear whether the obtained results translate to humans, and other studies have shown that, for example, human matriptase is able to process h n and h n ( , ). our peptide assay suggests that spint has a wide variety of host protease specificity. with the exception of plasmin, all the tested proteases in combination with peptides mimicking the cleavage site of different [...] cleavage inhibition of hmpv f were substantially lower, in the picomolar range. this suggests that the hmpv cleavage may be more selectively inhibited by spint . however, the western blot data showed that addition of the lowest concentration ( nm) of spint did not result in cleavage inhibition of hmpv f by the tested proteases. differences in sensitivity of spint between influenza ha and hmpv require further investigation. spint poses several potential advantages over other inhibitors that target host proteases. cell culture studies showed that, for example, matriptase-mediated h n ha cleavage was efficiently inhibited at a concentration of nm spint .
in contrast, the substrate range for aprotinin, a serine protease inhibitor shown to reduce influenza a infections by targeting host proteases, seemed to be more limited ( ). other synthetic and peptide-like molecules designed to inhibit very specific serine proteases such as tmprss , tmprss and tmprss d (hat) were only tested with those proteases and their potential to inhibit other proteases relevant for influenza a activation remains unclear ( - ). currently, the most promising antiviral protein inhibitor is camostat which is already approved in japan for the treatment of chronic pancreatitis ( ). recently, it was demonstrated that camostat inhibited influenza replication in cell culture and prevented the viral spread and pathogenesis of sars-cov in mice by inhibition of serine proteases ( , ). however, camostat was applied prior to the virus infection and it was administered into the mice via oral gavage ( ) . a previous study showed that spint significantly attenuated influenza a infections in mice using a concentration that was x lower than the described camostat concentration and intranasal administration was sufficient ( ). our current study suggests that spint is able to significantly inhibit viral spread during an ongoing infection and does not need to be applied prior to or at the start of an infection. the mouse study also showed that spint can be applied directly to the respiratory tract while camostat is currently distributed as a pill and is therefore less organ specific. in addition, camostat is synthetic whereas spint is a naturally occurring molecule which may attenuate potential adverse effects due to non-native compounds activating the immune system.
future research will be conducted to test if spint can be applied more efficiently via an inhaler and to explore potential side effects in mouse studies. however, when we tested the potential of spint to inhibit viral replication in a cell culture model we were only able to achieve growth reductions of approximately - . logs after hours with a concentration of nm spint . one potential explanation is that nm spint was unable to saturate the proteases present in the individual experiments and was not sufficient to prevent viral growth. in addition, the continuous overexpression of matriptase and tmprss may have produced an artificially high quantity of protein that exceeded the inhibitory capacities of spint . this problem could be solved either by using a higher concentration of spint or by optimizing its inhibitory properties. however, the data also demonstrate that spint has the ability to inhibit proteases that are expressed on the cell surface and that inhibition is not limited to proteases that were added exogenously and pre-incubated with the inhibitor ( figure ). spint did not show any cytotoxic effects up to a concentration of mm, significantly above the therapeutic dosage required for inhibition. in comparison with other studies, the spint concentrations we used here were in the nanomolar range while other published inhibitors require micromolar concentrations ( - ). however, we believe that future research will allow us to fully utilize the potential of spint as a broad-spectrum antiviral therapy. wu et al. recently described that the kunitz domain i of spint is responsible for the inhibition of matriptase ( ). in future studies we will explore whether the inhibitory capabilities of spint can be condensed into small peptides that may improve its efficacy. its ability to inhibit a broad range of serine proteases that are involved in the activation of influenza a suggests that a spint based antiviral therapy could be efficient against other pathogens too.
tmprss , for example, not only plays a major role in the pathogenesis of h n but is also required for the activation of sars-cov, mers-cov and hmpv ( , ). currently, treatment options for these viruses are very limited and therefore spint could become a viable option if its potential as an antiviral therapeutic can be fully exploited. however, while spint has therapeutic potential to treat ilis caused by viruses that require activation by trypsin-like serine proteases, it may be limited as a treatment option for infections caused by influenza hpai viruses, such as h n ( ). these viruses are believed to be activated by furin and pro-protein convertases that belong to the class of subtilisin-like proteases ( ). preliminary data from our lab demonstrated that spint did not inhibit furin-mediated cleavage of hpai cleavage site peptide mimics or of peptides carrying described furin cleavage sites (data not shown). in addition, furin acts intracellularly and we have no evidence that spint is able to penetrate the cell membrane and thus inhibit proteases located in intracellular compartments. therefore, it seems unlikely that spint is able to inhibit furin in cell culture-based studies or in vivo experiments. in conclusion we believe spint has potential to be developed into a novel antiviral therapy. in contrast to most similar drugs that are synthetic, spint is an endogenously expressed protein product that confers resistance to a variety of pathogenic viruses and which can potentially be delivered directly into the respiratory tract as an aerosol.
most importantly, spint demonstrated the ability to significantly attenuate an ongoing viral infection in cell culture and further research will be conducted to explore the time period during which spint demonstrates the highest efficacy. cells, plasmids, viruses, and proteins were maintained in dulbecco's modified eagle medium (dmem) supplemented with mm hepes (cellgro) and % fetal bovine serum (vwr). vero cells used for hmpv experiments were maintained in dmem (hyclone) supplemented with % fbs (sigma). the plasmid encoding a/ca/ / (h n ) ha was generated as described ( ) . the plasmid encoding hmpv f s h was generated as described ( ). the plasmids encoding a/shanghai/ / (h n ) ha, human tmprss and human matriptase were purchased from sino biological inc. the plasmid encoding a/aichi/ / (h n ) ha was generously donated by david steinhauer. a/ca/ / and a/x viruses were propagated in eggs. all recombinant proteases were purchased as described ( ). expression and purification of spint spint was expressed and purified as described with minor modification ( ). in brief, e. coli ril (de ) arctic express cells (agilent) were transformed with spint -psumo. cells were then grown in . l luria broth containing µg/ml kanamycin at °c. at od . - . cells were chilled on ice and protein expression was induced with . mm iptg. cells were then grown overnight at °c. cells were harvested and protein was purified as previously described ( ). spint protein was eluted by a -hour incubation with ulp - xhis. glycerol was added to the eluted spint protein to a final concentration of % and protein aliquots were stored at - °c.
protein concentration was determined by analyzing different dilutions of spint on an sds-page gel along with defined concentrations of bsa between ng and µg. the gel was then stained with coomassie, scanned with chemidoc imaging system (bio-rad) and bands were quantified using image lab software (bio-rad). concentrations of the spint dilutions were determined based on the bsa concentrations and the final spint concentration was calculated based on the average of the different spint dilutions. peptide assays were carried out as described ( ). the sequence of the hmpv f peptide mimicking the hmpv f cleavage site used in this assay is enprqsrfvl including the same n- and c-terminal modifications as described for the ha peptides ( ). the vmax was calculated by graphing each replicate in microsoft excel and determining the slope of the reaction for every concentration ( nm, nm, nm, nm, nm, nm, nm, nm, nm and nm). the vmax values were then plotted in the graphpad prism software against the log of the spint concentration to produce a negative sigmoidal graph from which the ic , or the concentration of spint at which the vmax is inhibited by half, could be extrapolated for each peptide protease mixture. since the x-axis was the log of the spint concentration, the inverse log was then taken for each number to calculate the ic in nm. the cytotoxicity assay was performed with a cell counting kit- (dojindo molecular technologies) according to the manufacturer's instructions. in brief, approx. x t cells were seeded per well of a -well plate and grown overnight. spint was added at the indicated concentrations. dmem and µm h o were used as controls. hours later µl cck- solution were added to each well and incubated for hour.
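the half-maximal inhibitory concentration analysis described above for the peptide assays can be approximated with a short sketch. it is not the sigmoidal fit performed in graphpad prism: as a simpler stand-in it estimates each reaction rate as a least-squares slope and then interpolates linearly on the log10 concentration axis between the two points that bracket half of the uninhibited rate. all function names and numbers are hypothetical.

```python
import math


def slope(times, signal):
    """Least-squares slope of signal versus time (the rate the text calls vmax)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def half_max_concentration(conc_nm, rates, uninhibited_rate):
    """Inhibitor concentration at which the rate falls to half the uninhibited rate.

    Stand-in for a sigmoidal fit: interpolates linearly in log10(concentration)
    between the two bracketing data points, then takes the inverse log.
    All concentrations must be positive.
    """
    half = uninhibited_rate / 2.0
    points = sorted(zip(conc_nm, rates))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= half >= r_hi and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("half-maximal rate is not bracketed by the data")


# Hypothetical fluorescence trace for one reaction (time in minutes, signal in a.u.).
rate = slope([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
print(f"rate from slope: {rate:.1f} a.u./min")

# Hypothetical dose-response series (not data from the study).
conc = [1.0, 10.0, 100.0, 1000.0]   # inhibitor concentration in nm
rates = [95.0, 75.0, 25.0, 5.0]     # rate at each concentration
print(f"half-maximal concentration: {half_max_concentration(conc, rates, 100.0):.1f} nm")
```

interpolation needs no fitting library and makes the inverse-log step explicit, but it uses only two data points; a four-parameter logistic fit over the full series, as done in prism, is the more robust estimate.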
absorbance at nm was measured using a spark microplate reader (tecan). per sample and treatment three technical replicates were used and the average was counted as one biological replicate. the experiment was conducted three times. vero cells were transfected with µg of pdna using lipofectamine and plus reagent (invitrogen) in opti-mem (gibco) according to the manufacturer's protocol. the following day, cells were washed with pbs and starved in cysteine/methionine deficient media for min and radiolabeled with µci/ml s - cysteine/methionine for hours. cells were lysed in ripa lysis buffer and processed as described previously ( ) and the fusion protein of hmpv was immunoprecipitated using anti-hmpv f g monoclonal antibody (john williams, u. pitt). samples were run on a % sds-page and visualized using the typhoon imaging system. band densitometry was conducted using imagequant software (ge). repository. respective secondary antibodies had an alexa tag (invitrogen). western blot membranes were scanned using a chemidoc imaging system (bio-rad). for quantification the pixel intensity of the individual ha bands was measured using imagej software and cleavage efficiencies were calculated by the following equation: ha nm or nm spint /ha nm spint x %. cell-cell fusion assays were carried out as described ( ). mdck cells were seeded to a confluency of about % in -well plates. one plate each was then transfected with a plasmid allowing for the expression of human matriptase or human tmprss . one plate was transfected with empty vector. hours post transfection cells were infected with the respective egg-grown virus at a moi of approx. . . different spint concentrations were added as indicated. . nm trypsin was added to the cells transfected with the empty vector. hours post infection supernatants were collected, centrifuged and stored at - °c. viral titers were determined using an immuno-plaque assay as described ( ). vero ( , ) cells were plated into -well plates.
the following day, cells were infected with moi rghmpv for hours. cells were washed with pbs and µl opti-mem with or without nm spint and . µg/ml of tpck-trypsin was added and incubated for up to hours. the spint and trypsin were replenished in new opti-mem every hours. for each time point, media was aspirated and µl of opti-mem was added to cells followed by scraping and flash freezing. these samples were then titered on confluent vero cells up to - dilution to calculate viral titer. graph shows independent replicates with internal duplicates plotted as individual points. some data points not shown are due to sample loss, with a minimum of points per group or independent replicates. hepatocyte growth factor activator inhibitor /placental bikunin (hai- /pb) gene is frequently hypermethylated in human hepatocellular carcinoma. cancer res : - . analysis was performed using a non-paired student's t-test comparing the samples tested with nm spint against the respective sample incubated with nm spint . error bars indicate standard deviation. * indicates p = < . . figure : tmprss , hat, matriptase and klk cleave hmpv f and spint is able to prevent cleavage by exogenous proteases. hmpv f was either expressed alone or co-transfected with protease and allowed to express for ~ hours. cells were then metabolically starved of cysteine and methionine followed by radioactive s labeling of protein for hours in the presence of tpck-trypsin or specified protease. spint treated proteases were incubated at room temperature for minutes and placed onto cells for hours. radioactive gels were quantified using imagequant software with percent cleavage equal to the f cleavage product signal divided by the sum of the f cleavage product and f full length signals, expressed as a percentage.
a and b) co-transfected proteases tmprss , hat and matriptase are able to cleave hmpv f (n= ) while c and d) exogenous proteases klk and matriptase but not klk are able to cleave hmpv f (n= ). e and f) spint prevented cleavage of hmpv f by trypsin, klk and matriptase at nm concentrations demonstrated by the loss of the f cleavage product (n= ). statistical analysis was performed using a one-way anova followed by a student's t-test with a bonferroni multiple comparisons test correction. p< . *, p< . **, p< . ***, p< . ****. n values represent independent replicates for each treatment group. error bars represent sd. antibodies and a secondary fluorogenic alexa antibody. nuclei were stained using dapi. magnification x. (c) vero cells expressed a/vietnam/ / h n ha that was cleaved during its maturation process in the cell. spint was added at nm or nm at the time of transfection. magnification x. for human matriptase or human tmprss and allowed to express the proteins for ~ hours. cells expressing human matriptase (b) or human tmprss (c) were then infected with a/ca/ / h n at a moi of . and different spint concentrations were added to each well. non-transfected cells to which trypsin was added served as a control (a). (d) mdck cells were infected with a/x h n at an moi of . and trypsin was added to assist viral propagation. different spint concentrations were added as indicated. after hours of infection the supernatants were collected and used for an immuno-plaque assay to determine the viral loads. the experiment was repeated three times and each dot represents the viral titer of a single experiment. statistical analysis was performed using a non-paired student's t-test comparing the control ( h) against the respective sample. error bars indicate standard deviation. * indicates p = < . .
extended horizontal line within the error bars represents mean value of the three independent replicates. completed in duplicate (all data points plotted within the graph). statistical analysis was performed using a student's t-test. p < . *, p < . **, p < . ***, p < . ****. nd indicates that the sample was below the limit of detection.
protease inhibitors targeting coronavirus and filovirus entry efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane key: cord- -aes l s authors: steffen, tara l.; stone, e. taylor; hassert, mariah; geerling, elizabeth; grimberg, brian t.; espino, ana m.; pantoja, petraleigh; climent, consuelo; hoft, daniel f.; george, sarah l.; sariol, carlos a.; pinto, amelia k.; brien, james d. title: the receptor binding domain of sars-cov- spike is the key target of neutralizing antibody in human polyclonal sera date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: aes l s natural infection of sars-cov- in humans leads to the development of a strong neutralizing antibody response, however the immunodominant targets of the polyclonal neutralizing antibody response are still unknown. here, we functionally define the role sars-cov- spike plays as a target of the human neutralizing antibody response. in this study, we identify the spike protein subunits that contain antigenic determinants and examine the neutralization capacity of polyclonal sera from a cohort of patients that tested qrt-pcr-positive for sars-cov- . using an elisa format, we assessed binding of human sera to spike subunit (s ), spike subunit (s ) and the receptor binding domain (rbd) of spike. to functionally identify the key target of neutralizing antibody, we depleted sera of subunit-specific antibodies to determine the contribution of these individual subunits to the antigen-specific neutralizing antibody response.
we show that epitopes within rbd are the target of a majority of the neutralizing antibodies in the human polyclonal antibody response. these data provide critical information for vaccine development and development of sensitive and specific serological testing. severe acute respiratory syndrome coronavirus (sars-cov- ) was initially identified in patients with severe pneumonia in wuhan, china in december of . due to its initial zoonotic transmission and human to human spread within an immunologically naïve population, it has since caused over million confirmed cases and over , deaths worldwide (who ), with approximately % of all cases occurring in the united states as of july th - .
infection with sars-cov- can result in a range of states from asymptomatic to symptomatic, with symptomatic cases ranging from mild non-specific symptoms, like malaise, to severe pneumonia and multiple organ failure - , , . sars-cov- is a positive sense, single stranded, enveloped rna virus with a ~ kb genome that is virologically similar to the zoonotic beta-coronaviruses sars-cov and mers-cov. the sars-cov- genome encodes non-structural proteins and structural proteins: spike (s), nucleocapsid (n), envelope (e), and membrane (m). the coronavirus n protein functions by interacting with viral rna to form the ribonucleoprotein, while e and m function in virion assembly and budding [ ] [ ] [ ] [ ] . spike is a homotrimeric transmembrane protein composed of two subunits per monomer, s and s , that are responsible for binding the host cell receptor and viral fusion, respectively. similar to the human coronavirus nl- and sars-cov, sars-cov- spike uses human angiotensin converting enzyme (ace ) to gain entry into target cells [ ] [ ] [ ] . specifically, the s subunit of sars-cov- contains the receptor binding motif (rbm) within the receptor binding domain (rbd) that makes direct contact with the ace receptor for receptor-mediated entry [ ] [ ] [ ] . importantly for antibody structural determinants, the pre-fusion conformation of the trimeric spike has a range of states that are described as "up" or "down" based on the angle of rbd within s . for a virion to interact with ace and gain entry into host cells, rbd must be in the "up" conformation, between ° and °, which represents a receptor-binding active state , . when the interaction between rbd and ace is disrupted, the entry of sars-cov- into susceptible cells is blocked , . spike is known to be a major antibody antigenic determinant for both mers-cov and sars-cov that leads to the generation of protective immune responses including the production of highly neutralizing antibodies [ ] [ ] [ ] .
targets for these antibodies within spike include both conformation dependent and linear epitopes of rbd and the s fusion peptide. these neutralizing antibodies are proposed to block rbd-ace receptor interactions or prevent s fusion with host membranes , . spike being a major antigenic determinant for the antibody response against closely related beta-coronaviruses contributes to our hypothesis that the neutralizing polyclonal antibody response to sars-cov- will target spike and its sub-domains. to determine the current antigenic variation and display that variation within the structure of sars-cov- spike we interrogated , sars-cov- genomes derived from human samples available from gisaid on june , . the spike homotrimer contains multiple subunits, including s and s , both of which contain a total of glycosylated residues which can affect spike protein folding, receptor interactions and potentially block antibody recognition and are represented as lollipops in the schematic ( figure a ). the s subunit (residues - ) of spike contains the n terminal domain (ntd), c terminal domain (ctd), the receptor binding domain (rbd, residues - ), and the receptor binding motif (rbm, residues - ). the s subunit contains the fusion peptide (fp, residues - ), and heptad repeat and (hr , residues - , hr , residues - ), the transmembrane domain and cytoplasmic tail. in our analysis of naturally occurring amino acid (aa) variation, low quality sequences determined by gaps or ambiguous nucleotides > nt were removed (- sequences). the , remaining sequences were translated and aligned using muscle (supplemental file ), then duplicate sequences were removed. this resulted in a multiple sequence alignment of amino acids (aa), with , unique aa sequences and an overall pairwise identity of . %. the prevalence of aa variation per site and aa conservation was determined using the sequence variation tool (viprbrc) and the consurf server , respectively. 
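the sequence clean-up steps described above (quality filtering, removal of duplicates, and pairwise identity) can be sketched as follows. this is a minimal illustration, not the authors' pipeline: the function names and the `max_ambiguous` parameter are hypothetical, the actual cutoff used in the study is not reproduced here, and translation and alignment (done with muscle in the study) are assumed to have happened outside this code.

```python
def passes_quality(nt_seq, max_ambiguous):
    """True when a genome has at most `max_ambiguous` gap or ambiguous characters.

    `max_ambiguous` is a hypothetical parameter; the study's cutoff is not stated here.
    """
    ambiguous = sum(1 for base in nt_seq.upper() if base not in "ACGT")
    return ambiguous <= max_ambiguous


def unique_sequences(aa_seqs):
    """Remove exact duplicate sequences while preserving first-seen order."""
    return list(dict.fromkeys(aa_seqs))


def pairwise_identity(seq_a, seq_b):
    """Percent of aligned columns in which two equal-length sequences match."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)
```

averaging `pairwise_identity` over all pairs of unique aligned sequences would give an overall pairwise identity of the kind reported for the alignment.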
the level of aa variation was measured by calculating the aa frequency at each position within the multiple sequence alignment, then shannon entropy was used to define aa conservation using data on the potential aa present. these conservation scores were broken down into discrete color coded categories with a score of being most variable, representing > , mutations at that site, and being most highly conserved with - mutations per site. the aa conservation was then displayed in the context of the spike pre-fusion trimer (pdb: vsb) to represent exposure to the human antibody response, where chain a displays the rbd in the up position and the b and c chains display the rbd in the down position (figure b). the spike trimer color coded for aa variation is located next to the spike protein where the subunits are color coded with s ntd in cyan, rbd in dark green, and s in light green. from the aa sequence variation analyses, we observed the well documented d g variation, which may have a fitness advantage. we also observed an additional positions that contained a range of variation from / ( . %) to / ( . %). once the aa variation was mapped onto the trimer structure, we observed that the greatest level of aa variation is found within the s ntd ( . % identity), while the lowest level of aa variation is within the rbd ( . % identity) and s ( . %). the low level of aa variation within rbd was also recently described by starr et al . our data, in addition to that of starr et al, indicate that overall the rbd and s domains are highly conserved and are currently genetically stable targets for vaccine and therapeutic intervention. we next investigated the specificity and immunodominance of the polyclonal antibody response to the subunits of spike, given its potential as a target for the development of vaccines and therapeutics.
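the per-position entropy calculation mentioned at the start of this passage can be illustrated with a short sketch. it computes plain shannon entropy from residue frequencies in each alignment column; it is not the implementation used by the viprbrc or consurf tools, and the function names and the choice to ignore gap characters are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters ("-") are ignored; this is an assumption of the sketch.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy for every column of an equal-length multiple sequence alignment."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [
        column_entropy("".join(seq[i] for seq in aligned_seqs))
        for i in range(length)
    ]
```

a fully conserved column has entropy zero, and entropy rises as more residues appear at comparable frequencies, so low values correspond to the highly conserved categories described above.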
this concept is based upon vaccines developed against sars-cov and mers-cov , , and the strongly neutralizing monoclonal antibodies that have been identified as spike specific [ ] [ ] [ ] [ ] [ ] [ ] . to address specificity and immunodominance, we analyzed serum samples from laboratory confirmed sars-cov- infection cases as determined by qrt-pcr, with of the subjects being admitted to the hospital. the sars-cov- positive serum samples analyzed were obtained from patients in saint louis, mo, cleveland, oh, and san juan, pr, with no subjects succumbing to infection. the median age of the individuals is . years, with males and females. specific demographics regarding the subjects are listed in table . the cohort controls were collected prior to the emergence of sars-cov- ( - ) from previous studies conducted at saint louis, mo, cleveland, oh, and san juan, pr, and had a similar age and sex distribution. to investigate and quantify the igg response to sars-cov- , we performed elisa assays using serum from sars-cov- + subjects and sars-cov- negative control subjects. we serially diluted sera from : to : , as four-fold dilutions and evaluated binding to recombinant s , s , and rbd (figure a -c). polyclonal sera from all sars-cov- subjects showed igg binding to each spike subunit by elisa. igg reactivity to the s subunit, which contains the rbd, ranged in optical density (od) from . to . , while the control subjects had an od range from . to . at the highest concentration tested. four subjects were responsible for the majority of the elisa binding to s within the control group ( figure a ). the overlap of these four negative subject samples with the sars-cov- + subjects suggests that there is antibody cross-reactivity to the sars-cov- s protein, most likely due to prior human coronavirus (hcov) infections (nl , hku , oc , e). 
however, the focus of our study is to functionally define the key targets of the neutralizing antibody response to sars-cov- , and further studies would need to be completed to define the nature of the cross-reactive response. we next interrogated the antibody response to the s subunit, where at a : dilution the sars-cov- positive subjects had an od range of . to . , while control subjects had an od range of . to . (figure b). only one control subject had antibody binding that overlapped with the lower range of the sars-cov- subjects. we used an identical approach to evaluate the antibody response to rbd (figure c). here we saw a robust antibody response with a range in od from . to . at a serum dilution of : , and we observed no antibody binding above background in the control group. multiple groups have observed similar responses to the rbd subunit, indicating the specificity of the rbd antibody response , , which could in part be due to the low level of conservation of the rbd amino acid sequences between sars-cov- and the hcovs, which cause the common cold , . to further quantify differences in binding to the individual spike subunits we calculated the area under the curve (auc) of the s , s , and rbd elisa assays for each subject (figure d-f, table ). quantification of the auc measures the antibody binding at multiple antibody concentrations, capturing a combination of avidity and specificity of the sera for each subject. when assessing binding to s , the mean auc of sars-cov- subjects was . +/- . and the mean auc of controls was . +/- . (p< . ; figure d). upon comparing the auc for antibody binding to s , we observed a mean auc of . +/- . and . +/- . (p< . ) for sars-cov- + patients and controls, respectively (figure e). interestingly, when we assessed binding to rbd, we observed a mean auc of . +/- . and . +/- . (p< . ), showing minimal cross-reactivity from the negative subjects (figure f).
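to make the auc summary above concrete, the sketch below integrates an elisa dilution curve with the trapezoidal rule. it is an illustration under stated assumptions rather than the calculation performed in graphpad prism: the function name, the use of log10 of the reciprocal dilution as the x axis, and the optional `baseline` subtraction are all hypothetical choices.

```python
import math


def elisa_auc(dilution_factors, od_values, baseline=0.0):
    """Trapezoidal area under an OD versus log10(reciprocal dilution) curve.

    dilution_factors: reciprocal serum dilutions (e.g. 100 for a 1:100 dilution).
    baseline: hypothetical background OD subtracted before integration.
    """
    if len(dilution_factors) != len(od_values):
        raise ValueError("need one OD value per dilution")
    points = sorted(
        (math.log10(d), max(od - baseline, 0.0))
        for d, od in zip(dilution_factors, od_values)
    )
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

because the whole curve contributes, a serum that binds strongly only at the highest concentration scores lower than one that keeps binding across the dilution series, which is why auc reflects more than a single od reading.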
the rbd binding data match recently described results , , which show that the antibody response to rbd is specific to sars-cov- infection, with no known cross-reactivity from antibodies derived from endemic hcov infection. overall, we show that s , s , and rbd from spike are targeted by the human polyclonal response in all individuals from our cohort. additionally, we observe potential cross-reactivity within the control group to the s domain outside of the rbd. this cross-reactivity is important to note for serological and vaccine evaluation, as using rbd as a target antigen may provide the most specific and sensitive test, one that yields fewer false positives. interestingly, this also highlights the potential for cross-reactive s antibodies to play a role in either protection or exacerbation of sars-cov- disease. it has recently been demonstrated that human mabs generated after sars-cov infection cross-react with and neutralize sars-cov- ( ), while sars-cov infection generates a polyclonal antibody response that is able to bind spike from sars-cov- while not able to neutralize the virus . furthermore, mechanisms aside from neutralization that are dependent on the fc region of the antibody are capable of limiting viral infection , , . as there was a broad dynamic range of antibody binding to spike subunits from our sars-cov- + subjects, we stratified the samples based upon days post qrt-pcr positive test, because days post symptoms were unavailable, to evaluate the potential role of time post infection on antibody binding and specificity. samples were stratified into three groups: prior to day , day - , and day - post qrt-pcr + test. we compared the auc values from the elisa binding curves to s , s , and rbd over this time period and did not observe a role for time post infection on antibody binding and specificity with our limited sample set (supplemental figure a-c).
the heterogeneity of antibody binding has been observed in other patient cohorts , , . additionally, we quantified the relative binding of the polyclonal antibody response between different spike subunits to determine if the subunits were equally targeted by the antibody response. to this end, we evaluated the correlation of auc between the s , s and rbd subunits (figure g-i). when we compared the auc of s and s we observed significant correlation (p= . , r= . ) (figure g). as expected, based on the location of rbd within s , s auc significantly correlated with rbd auc (p= . , r= . ), which may suggest epitopes for binding within s as well as rbd (figure h). based on the location of rbd within s we would anticipate correlation between their auc values (figure i), and indeed there is a significant correlation (p= . , r= . ) that would suggest that either the majority of binding to s occurs within rbd, or that there are antibody epitopes throughout s that drive a robust antibody response. the elisa antibody binding results indicate that all sars-cov- + patients within our cohort had antibodies which bound to each subdomain of the spike protein. antibody neutralization is one mechanism of protection from severe viral disease. the mechanism of action of neutralizing antibodies often includes the targeting of viral proteins that interact with the host receptor for entry or viral proteins required for fusion with host cell membranes (reviewed in ). for sars-cov- , the multifunctional spike protein is required for entry and fusion. specifically, the s domain contains rbd, which is responsible for binding the human ace protein mediating entry , , while s contains the fusion peptide . it has been shown by other groups that monoclonal antibodies targeting spike can block infection with sars-cov- and that natural infection of humans often produces neutralizing antibodies - , , , , which are thought to prevent subsequent covid- .
however, the specificity of human polyclonal neutralizing antibodies against infectious sars-cov- is only now beginning to be understood. to begin to understand the human polyclonal neutralizing antibody response we utilized a focus reduction neutralization test (frnt) (figure a) based upon the assay we had developed for multiple emerging infectious diseases [ ] [ ] [ ] and for sars-cov- (figure a). there are multiple advantages to the frnt assay over pseudotype-virus assays and plaque assays, including the use of infectious virus that may better reflect heterogeneity in the conformational structure of the virion, quantitative measurement of the reduction of viral replication and spread as each focus diameter measured represents multiple cells, and finally the use of well plates allowing for titers to be quantified using multiple technical and biological replicates. overall, this assay allows for a rigorous and quantitative determination of antibody neutralization potential. using the frnt assay, we determined the concentration of patient sera required to neutralize sars-cov- infection. based upon the antibody neutralization curve (figure b), the serum dilution necessary to neutralize % of the virus (frnt ) ranged from / to / with a mean of / (figure c). the serum dilution necessary to neutralize % of the virus (frnt ) ranged from / - / with a mean of / (figure c). notably, the sera from sars-cov- patients in our cohort were capable of neutralizing infectious virus independent of day post positive test (figure b-c, table ), while sera from the majority of control subjects had no demonstrated antibody neutralization. one control subject, whose serum was cross-reactive in the s /s elisa binding assay, demonstrated % sars-cov- neutralization potential at a / dilution, but further investigation of cross-neutralization is beyond the scope of this current study.
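the frnt titers reported above are the serum dilutions at which the neutralization curve crosses a chosen percentage. the sketch below shows one simple way such a titer could be read off a dilution series, by linear interpolation on the log10 dilution scale. this is an assumption made for illustration: the study may have used a fitted dose-response curve instead, and the function names and the `threshold` parameter are hypothetical.

```python
import math


def percent_neutralization(foci_with_serum, foci_virus_only):
    """Percent reduction in foci relative to a virus-only control well."""
    if foci_virus_only <= 0:
        raise ValueError("virus-only control must contain foci")
    return 100.0 * (1.0 - foci_with_serum / foci_virus_only)


def frnt_titer(dilution_factors, neutralization_pct, threshold=50.0):
    """Reciprocal dilution at which neutralization falls through `threshold`.

    Interpolates linearly in log10(dilution) between the two bracketing points
    and returns None when the curve never crosses the threshold.
    """
    pairs = sorted(zip(dilution_factors, neutralization_pct))
    for (d0, n0), (d1, n1) in zip(pairs, pairs[1:]):
        if n0 >= threshold > n1:
            frac = (n0 - threshold) / (n0 - n1)
            log_titer = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_titer
    return None
```

calling `frnt_titer` with `threshold=90.0` gives the more stringent titer in the same way; a return value of `None` corresponds to a serum that never reaches the threshold within the tested range, as for most control subjects.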
based upon the ability of the sars-cov- subjects to neutralize at least % of the virus, we show that the polyclonal antibody response has the breadth and specificity to completely neutralize sars-cov- infection. this would suggest that natural infection would be capable of controlling viral infection and limiting the potential of disease and transmission at the timepoints we assessed (figure d). animal model studies in hamsters have demonstrated that immune sera can protect from challenge , although currently the mechanisms of that protection are unknown. to functionally determine which of the spike subunits are the main target of neutralizing antibodies, we performed a functional assay developed by the de silva lab for use in flaviviruses . in this approach individual spike subdomains are linked to beads and are used to deplete sera in an antigen-specific manner. in our studies his-tagged proteins are conjugated to cobalt coated magnetic beads and serum from sars-cov- subjects is incubated with the conjugated proteins. this allows a complex of antibody:antigen:bead to form and be pulled down by a magnet, leaving the serum depleted of that particular antibody specificity (figure a). to understand the contribution of antibodies specific to each individual subunit, antibodies specific to each spike subunit, s , s , and rbd, were depleted from human polyclonal sera, and the antibody binding and neutralization potential of polyclonal sera after depletion was determined by elisa and frnt, respectively. using the bead-based approach, sera from patients were depleted for s , s , and rbd individually by sequentially incubating serum two times with protein coated beads. to quantify the effects of the antigen-specific antibody depletions, the auc from elisa binding curves pre and post depletion (supplemental figure , table ) were quantified, and the values were paired per subject (figure b).
after antigen specific depletions we observed significant reduction in spike subunit antibodies represented by a . (p= . ), . (p< . ) and . (p< . ) fold reduction in auc binding to s , s , and rbd respectively ( figure b ). moreover, to confirm that depletion protocol did not impact sars-cov- neutralization we performed depletions with an irrelevant protein, vacv a r ( figure b ). the subunit depletion protocol significantly reduced the level of subunit specific antibody, which allowed us to evaluate the contribution of each individual subunit to the neutralizing antibody response. to measure the functional effect of s , s , and rbd antibody depletion on virus specific neutralization we evaluated post-depletion neutralization activity by frnt (supplemental figure ). to confirm that the depletion protocol itself had no off-target effects on sars-cov- neutralization, a control depletion with vacv a r was completed and neutralization pre and post depletion was measured. the control depletion had a minimal effect on the ability of the polyclonal sera to neutralize sars-cov- ( figure c : frnt : . fold decrease; frnt : . fold decrease). we then measured the antibody neutralization curves after depleting serum with s , rbd or s , and determined the serum dilution required to reduce infection by % (frnt ) and % (frnt ) ( table ). to take into account the effects of the antibody depletion protocol, we compared the frnt of the control depleted serum with the subunit depleted serum, and observed a . , . , and . fold reduction after s , rbd, and s depletion, respectively ( figure c ). based upon the frnt and frnt values, the depletion of s and rbd significantly reduced virus neutralization (p= . and p= . ). this suggests that polyclonal antibody binding to the rbd domain of the spike protein represents the key target of neutralizing antibody to sars-cov- after natural infection. 
since we observed a similar fold reduction after s and rbd depletion, it is likely that the majority of the neutralizing response is found within the rbd domain of s . however, this is the average neutralizing antibody response, which is applicable to our cohort. when we evaluate changes in individuals, there are two patients who have a strong rbd neutralizing response, but also have a s specific neutralizing antibody response with . , and . fold change after s depletion. overall, these data demonstrate that natural sars-cov- infection generates a robust anti-rbd polyclonal neutralizing antibody response, with some individuals mounting a neutralizing antibody response to s . we conclude that the polyclonal neutralizing antibody response to sars-cov- primarily targets receptor interactions (s /rbd) in the majority of individuals. to compare the relative neutralizing differences between spike domains, we normalized the data based upon frnt values and represented the data as % subunit specific neutralizing antibodies. this allows us to calculate the percentage of neutralizing antibodies that bind to s , s , or rbd, while taking into account the impact of the depletion protocol, based on our control and subunit specific depletions (figure d). further confirming the paired frnt data, the highest percentages of neutralizing antibodies, % +/- and % +/- %, indeed bind to rbd and s , suggesting that preventing virus interaction with its receptor may be the dominant mechanism for antibody neutralization of sars-cov- after natural infection. additionally, s has the lowest percentage, % +/- %, of s binding antibodies capable of neutralization, suggesting that viral fusion with host membranes is not a dominant target of the neutralizing antibody response to sars-cov- after natural infection within our cohort of patients (figure d). these data have been further represented as % binding neutralizing antibodies based on the pre depletion frnt values (supplemental figure d).
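the fold-reduction and percentage comparisons described above can be sketched as follows. the exact normalization used in the study is not spelled out in the text, so the formula here is an assumption: the share of neutralizing activity attributed to a subunit is taken as the fraction of the control-depleted frnt titer that is lost after subunit-specific depletion. the function names are hypothetical.

```python
def fold_reduction(control_titer, depleted_titer):
    """Fold drop in FRNT titer relative to the control (irrelevant protein) depletion."""
    if control_titer <= 0 or depleted_titer <= 0:
        raise ValueError("titers must be positive")
    return control_titer / depleted_titer


def percent_subunit_specific(control_titer, depleted_titer):
    """Assumed normalization: percent of neutralizing activity lost after depletion.

    Values are floored at zero so that assay noise (a depleted titer slightly above
    the control titer) is not reported as a negative percentage.
    """
    remaining_fraction = 1.0 / fold_reduction(control_titer, depleted_titer)
    return max(0.0, 100.0 * (1.0 - remaining_fraction))
```

using the control-depleted serum as the reference, rather than untreated serum, is what accounts for any non-specific loss caused by the bead protocol itself.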
overall, these data further confirm that a majority of neutralizing antibodies are targeted against the rbd within s . in this study we examined the antigenic targets of the sars-cov- igg neutralizing antibody response that develops during natural infection. we quantified the immunodominance of anti-spike subdomain antibodies for binding by elisa and neutralization activity by antigen specific depletion followed by a sars-cov- neutralization assay. to define the specificity of the antibody response during natural infection, we needed to understand the amino acid variation present in the currently circulating sars-cov- human isolates. human sars-cov- isolates have a low frequency of amino acid variation within the spike protein, with the exception of the d g mutation, allowing us to estimate that the majority of known isolates permit effective polyclonal antibody binding and neutralization. the human polyclonal antibody response recognizes three subdomains (s , s , and rbd) of the spike protein as evidenced by elisa. interestingly, we identified cross-reactive sera from sars-cov- naïve subjects to s , suggesting that conserved sequences in the s subunit of spike may impact non-neutralizing responses to sars-cov- as well as serological tests for sars-cov- . most importantly, our antigen-specific antibody depletion approach demonstrated that the rbd domain of the spike protein is responsible for % +/- . % of the human polyclonal neutralizing antibody activity to spike after natural sars-cov- infection. although our study shows that the dominant target of the igg neutralizing antibody response after natural sars-cov- infection is the rbd domain of the spike protein, we have evaluated a limited number (n= ) of patients by antigen-specific antibody depletion. there is the potential that immunodominance of the neutralizing antibody response may vary based upon a number of variables including viral load, co-morbidities including age and obesity, as well as genetic background.
additionally, we have only focused on the igg response and it has been recently determined that the iga antibody response can neutralize sars-cov- virus and the antigen specificity of that response could be different than the igg response . importantly, it has also been recently described that more than % of individuals who seroconvert generate detectable neutralizing antibody responses and that these igg responses are indeed sustained for up to three months , , which has the potential to protect against re-infection. further studies are needed to evaluate the correlates of protection beyond antibody neutralization and to investigate additional antibody mechanisms such as antibody dependent cellular cytotoxicity. as we detected antibodies targeted against s that are non-neutralizing, these could provide a different mechanism of protection that may be valuable when considering vaccine design. there is also a strong t cell response established during natural infection [ ] [ ] [ ] [ ] , as well as a cross-reactive t cell response from potentially prior hcov infection , . currently the role of the human t cell response to sars-cov- has only begun to be dissected. overall our study describes the polyclonal igg response to sars-cov- from sera obtained from patients in a range of - days post positive qrt-pcr test. we focused on the relationship between antibody binding to the subdomains of spike and the neutralization capacity against infectious virus. we demonstrate that infection with sars-cov- produces an antibody response with a similar amount of igg targeting spike subunits s , s , and rbd regardless of time post infection (supplemental figure ). furthermore, we show that this response results in a neutralizing antibody response by days post positive qrt-pcr, as determined by frnt (figure b).
finally, using a bead-based immune depletion approach, we show that the highest percentage of neutralizing antibodies against sars-cov- bind to the receptor binding domain (rbd) (figure d) that directly interacts with human ace . these findings are important for the further development and prioritization of therapeutics and vaccines.
plates were coated with ul of a ug/ml mixture of recombinant protein in carbonate buffer ( . m na co . m nahco ph . ) overnight at °c. the next day plates were blocked with blocking buffer (pbs + %bsa + . % tween) for hours at room temperature and washed x with wash buffer prior to plating of serially diluted polyclonal sera. sera were incubated for hour at room temperature in the elisa plate, washed x with wash buffer, followed by addition of goat-anti-human igg hrp (sigma) conjugated secondary ( : ) for hour at room temperature. the plate was washed again x with wash buffer and the elisa was visualized with ul of tmb enhanced substrate (neogen diagnostics) and placed in a dark space for minutes. the reaction was quenched with n hcl and the plate was read for optical density at nanometers on a biotek epoch plate reader. total peak area under the curve (auc) was calculated using graphpad prism .
antigen specific antibody depletions. antigen specific antibodies were depleted in a bead-based approach using ni-nta magnetic beads (thermo scientific) as described ( ). sars-cov his tagged proteins or vacv his tagged protein (control depletion) were conjugated to the his-specific magnetic beads as suggested by the manufacturer's protocol. briefly, mg of beads were washed with equilibration buffer followed by addition of ug of protein diluted in equilibration buffer. after addition of protein, the tube was rotated end over end for hour at °c. the beads were collected on a magnetic stand and washed twice with wash buffer followed by separation into two tubes of µl each.
next, the human sera were diluted in tissue culture sterile pbs and placed into the first tube of beads and incubated end-over end at °c for hour. once again, the beads were collected with a magnetic stand, the supernatant was removed and transferred into the second tube for another end-over-end incubation at °c for hour. after incubation the beads were collected, and the supernatant was removed and placed at °c for subsequent elisas and (a)schematic of the full-length sars-cov- spike protein with the s and s highlighted. s is divided into the n-terminal domain, and the c-terminal domain which contains the receptor binding domain (rbd) subunit in dark green with the receptor binding motif displayed using black hashed lines. the separation between s and s is represented by a slash line. s contains the fusion peptide (fp) and heptad repeat one and two (hr and hr ) . the spike protein is where the surface reconstruction is colored according to discrete groups, with a score of being highly conserved ( - mutation per position) to being highly diverse with a score of (> mutations per position). the color coded bar describes the corresponding color for each range of mutations. next to each aa variation coded structure is the cryoem trimer structure with the individual trimers color coded to allow orientation. the forward facing trimer for the rbd up is color coded by subdomain, with rbd up being dark cyan, s as cyan, and s as pale green. the rbd down trimer is color coded with rbd down as brown, s as gold and s as pale yellow. we observed naturally occurring aa variations are less within the rbd as noted by the high level of purple colored ca residues, and greatest aa variations within the s -ntd as indicated by the white and green color residues. 
the epidemiology and clinical information about covid- clinical characteristics of coronavirus disease in china clinical features of patients infected with novel coronavirus in wuhan the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak clinical findings in a group of patients infected with the novel coronavirus (sars-cov- ) outside of wuhan, china: retrospective case series epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study coronavirus disease (covid- ): epidemiology, pathogenesis, diagnosis, and therapeutics structural basis of receptor recognition by sars-cov- structure, function, and antigenicity of the sars-cov- spike glycoprotein cryo-em structure of the -ncov spike in the prefusion conformation characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace structures of human antibodies bound to sars-cov- spike reveal common epitopes and recurrent features of antibodies humoral immunogenicity and efficacy of a single dose of chadox mers vaccine candidate in dromedary camels neutralizing antibody and protective immunity to sars coronavirus infection of mice induced by a soluble recombinant polypeptide containing an n-terminal segment of the spike glycoprotein a dna vaccine induces sars coronavirus neutralization and protective immunity in mice vaccine efficacy in senescent mice challenged with recombinant sars-cov bearing epidemic and zoonotic spike variants receptor-binding domain of severe acute respiratory syndrome coronavirus spike protein contains multiple conformation-dependent epitopes that induce highly potent neutralizing antibodies structural definition of a neutralization epitope on the n-terminal domain 
of mers-cov spike glycoprotein vipr: an open bioinformatics database and analysis resource for virology research consurf : an improved methodology to estimate and visualize evolutionary conservation in macromolecules cryo-em structure of the -ncov spike in the prefusion conformation deep mutational scanning of sars-cov- receptor binding domain reveals constraints on folding and ace binding. biorxiv immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis epidemiology and outcomes of acute decompensated heart failure in children genomic and serologic characterization of enterovirus a brainstem encephalitis potent neutralizing monoclonal antibodies directed to multiple epitopes on the sars-cov- spike. biorxiv convergent antibody responses to sars-cov- in convalescent individuals isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model potently neutralizing human antibodies that block sars-cov- receptor binding and protect animals. biorxiv a serological assay to detect sars-cov- seroconversion in humans the receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in sars-cov- patients cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody cross-reactive antibody response between sars-cov- and sars-cov infections. biorxiv non-neutralizing antibodies from a marburg infection survivor mediate protection by fc-effector functions and by enhancing efficacy of other antibodies antibody-dependent cellular phagocytosis in antiviral immune responses sars-cov- infection induces robust, neutralizing antibody responses that are stable for at least three months. 
medrxiv antibody responses to viral infections: a structural perspective across three different enveloped viruses sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov- spike glycoprotein inhibition of sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion rapid isolation and profiling of a diverse panel of human monoclonal antibodies targeting the sars-cov- spike protein. biorxiv a potently neutralizing antibody protects mice against sars-cov- infection propagation, quantification, detection, and storage of west nile virus. current protocols in microbiology isolation and quantification of zika virus from multiple organs in a mouse heterotypic immunity against vaccinia virus in an hla-b* : transgenic mousepox infection model characterization of cells susceptible to sars-cov- and methods for detection of neutralizing antibody by focus forming assay. biorxiv syrian hamsters as a small animal model for sars-cov- infection and countermeasure development dissecting the human serum antibody response to secondary dengue virus infections the rbd of the spike protein of sars-group coronaviruses is a highly specific target of sars-cov- antibodies but not other pathogenic human and potent neutralizing antibodies from covid- patients define multiple targets of vulnerability iga dominates the early neutralizing antibody response to sars-cov- . medrxiv evidence for sustained mucosal and systemic antibody responses to sars-cov- antigens in covid- patients. 
medrxiv sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls phenotype and kinetics of sars-cov- -specific t cells in covid- patients with acute respiratory distress syndrome targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals clinical characteristics and outcomes for , patients with sars-cov- infection. medrxiv selective and cross-reactive sars-cov- t cell epitopes in unexposed humans sars-cov- -reactive t cells in healthy donors and patients with covid- oligomeric state of the zikv e protein defines protective immune responses key: cord- - d b g authors: liu, yuan; yan, changhui title: the selection of reference genome and the search for the origin of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d b g the pandemic caused by sars-cov- has had a great impact on the whole world. in one theory of the origin of sars-cov- , pangolins are considered a potential intermediate host. to assemble the coronavirus found in pangolins, sars-cov- was used as the reference genome in most studies, on the assumption that pangolin cov and sars-cov- are the closest neighbors in evolution. however, this assumption may not be true. we investigated how the selection of the reference genome affects the resulting cov genome assembly. we explored various representative covs as reference genomes, and found significant differences in the resulting assemblies. the assembly obtained using ratg as reference showed better statistics in total length and n than the assembly guided by sars-cov- , indicating that ratg may be a better reference for assembling cov in pangolin or other potential intermediate hosts. recently, the outbreak of sars-cov- (covid- ) has caused an ongoing global pandemic. as of july , , the pandemic had resulted in a total of , , clinical cases and , deaths all over the world (www.who.int).
through metagenomic rna deep sequencing, the genome of a sars-cov- isolate, wuhan-hu- , was published in [ ] . the genome showed high nucleotide similarity ( . %) to a group of sars-like coronaviruses that were identified in bats in china, which indicated the possibility of an animal origin. to identify a potential direct or intermediate host of sars-cov- , coronaviruses in several animals were studied and compared with sars-cov- . bat and pangolin were the two most investigated species. a cov in bat, ratg , showed % full-genome similarity with sars-cov- [ ] . other virus isolates from bat, zxc and zc , also shared % similarity with sars-cov- . these results led to the hypothesis that the progenitor of sars-cov- originated in bats and spilled over to humans using another animal as intermediate host. many researchers believed that pangolins are a potential intermediate host and attempted to characterize coronavirus in pangolin. liu, chen, and chen [ ] constructed coronavirus contigs using a de novo assembly method from organ samples of dead malayan pangolins rescued at the guangdong wildlife rescue center. of the collected pangolins, coronavirus was detected in two. zhang, wu, and zhang [ ] re-analyzed the rna-seq reads from two pangolins carrying coronavirus using a reference-guided de novo assembly method, with wuhan-hu- as the reference genome. the resulting draft genome shared . and . % whole genome similarity with wuhan-hu- and ratg , respectively. xiao et al. [ ] obtained samples from pangolins rescued by guangdong customs and conducted reference-guided genome assembly, with wuhan-hu- as the reference genome. the derived viral genome showed and % whole genome sequence identity to sars-cov- and ratg , respectively. lam et al. [ ] collected samples from pangolins from guangxi medical university, china, and six samples contained coronavirus sequences. the viral genomes of the six samples were de novo assembled.
they also performed rna sequencing on five archived pangolin samples from guangdong, and assembled the genomes using wiv , another sars-cov- genome from human, as the reference genome. the resultant draft genomes have . % to . % identity to the sars-cov- genome. they suggested that two sub-lineages of coronavirus existed in pangolins. in terms of all coding sites, the coronavirus identified in pangolins from guangdong is more closely related to wiv than to bat-cov ratg , whereas the coronavirus genome of pangolins in guangxi showed lower genome similarity to wiv than to bat-cov ratg . liu et al. [ ] took samples from three coronavirus-positive pangolins rescued in guangdong and performed deep sequencing. using a de novo assembly method, they obtained a viral genome that showed . % and . % whole genome identity to wuhan-hu- and bat-cov ratg , respectively. all of the pangolins involved in published coronavirus analyses were from either the guangdong collection or the guangxi collection. pangolins from the guangdong collection were investigated in most studies [ , , , ] . the resulting viral genomes were derived from de novo assembly with or without a guiding reference and further curated using blast annotation or pcr amplicon sequencing. all studies have shown that cov in pangolin is highly related to bat-cov and sars-cov- . in all of the studies that used reference-guided de novo assembly, a sars-cov- genome (wuhan-hu- or wiv ) was chosen as reference [ , , ] . this choice was based on the assumption that sars-cov- was the closest neighbor of the pangolin cov in the phylogenetic tree. however, this assumption may not necessarily be true. therefore, choosing the sars-cov- genome as reference could inadvertently introduce bias into the genome assembly, leading to inaccurate or incomplete results. in this study, we assembled the pangolin cov genome using several different genomes as reference.
we investigated how the reference genome impacts the resulting genome and its phylogenetic relationship with others. the results from this study will provide guidance for future studies on how to accurately construct cov genomes from pangolin or other potential intermediate hosts. two rna-seq samples, lung and lung , were downloaded from ncbi sra under bioproject sra: prjna . the two samples were originally published in [ ] for viral metagenomics analysis. adaptor trimming and quality control were performed on the raw sequence reads using the trimmomatic program (version . ) [ ] . to eliminate host contamination, the remaining reads were aligned to the manis javanica genome (sra: prjna ) using bwa aln (version . . ) [ ] and reads mapped to the host genome were discarded. reads unmapped to the host reference genome were used to construct the genome in the subsequent de novo assembly. cleaned reads were used to assemble the genome using reference-guided de novo assembly. to investigate how the results were influenced by the choice of reference genome, we explored a few representative virus genomes on the phylogenetic tree (table ) as the reference genome. these genomes were selected based on previous studies [ , , , ] . once a reference genome was picked, the cleaned reads were aligned to the reference genome using bwa-mem [ ] , and the mapped reads were assembled de novo using megahit (version . . ) with meta-sensitive mode [ ] . the resulting contigs were concatenated into an assembly by aligning them to the reference genome. phylogenetic distance analysis was performed using mega x (version . . ) [ ] . the whole genome was used in phylogenetic and distance analysis, and phylogenetic trees were constructed in the best-fit dna/amino acid substitution mode with bootstrap replications. the whole genome nucleotide identity analysis was performed in simplot . . [ ] . a total of eight viral genomes were tested as the reference genome in reference-guided de novo assembly.
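the mapping and assembly steps described above can be sketched as a short list of shell commands built in python. this is only an illustration under stated assumptions, not the authors' actual pipeline: the use of samtools to keep mapped reads, the single-end read layout, the thread count, and all file names are hypothetical. the final step of ordering contigs against the reference is not shown.

```python
import shlex
from typing import List


def guided_assembly_commands(reference_fa: str, cleaned_fq: str,
                             out_dir: str, threads: int = 4) -> List[str]:
    """Build shell commands for one reference-guided de novo assembly run.

    Assumptions (not stated in the text): single-end reads, samtools used
    to extract mapped reads, and hypothetical intermediate file names.
    """
    ref = shlex.quote(reference_fa)
    reads = shlex.quote(cleaned_fq)
    out = shlex.quote(out_dir)
    bam = shlex.quote(out_dir + ".mapped.bam")
    mapped_fq = shlex.quote(out_dir + ".mapped.fq")
    return [
        # index the candidate reference genome
        f"bwa index {ref}",
        # align cleaned reads; -F 4 drops reads that did not map
        f"bwa mem -t {threads} {ref} {reads} | samtools view -b -F 4 -o {bam} -",
        # convert the mapped reads back to fastq
        f"samtools fastq {bam} > {mapped_fq}",
        # assemble only the mapped reads de novo in meta-sensitive mode
        f"megahit -r {mapped_fq} --presets meta-sensitive -o {out}",
    ]


if __name__ == "__main__":
    for cmd in guided_assembly_commands("ratg.fa", "clean.fq", "asm_ratg"):
        print(cmd)
```

running the function once per candidate reference would give one command list per genome in the reference panel, which makes it easy to keep every run identical except for the reference.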
numbers of mapped reads ranged from to , , and the length of the resulting draft assemblies varied from , to , bp (table ) . two reference genomes, merscov and bat hp-betacov zhejiang , failed at read assembly due to the limited number of remaining reads. fewer than , reads were mapped to three reference genomes, bj , bm _ , and longquan , resulting in shorter assemblies. a total of , reads were mapped to zc and subsequently assembled into a , -bp assembly with . % coverage. ratg , which is a bat cov, had , reads mapped to it, and the resulting assembly has a total length of , and n of , . a total of , reads were mapped to wuhan-hu- , a human sars-cov- strain, which was the highest number among all genomes we surveyed. the assemblies guided by ratg and wuhan-hu- showed similar coverage at about eighty percent. however, the wuhan-hu- -guided assembly had a shorter total length ( , ) and n ( , ) than the assembly that used ratg as reference. to understand this seeming contradiction, we investigated the read coverage and depth on ratg and wuhan-hu- . the results (figure ) show that there was an excessive number of reads mapped to distal regions of wuhan-hu- , which could indicate artifacts or contamination during sequencing. after removing these tail regions (with > x depth), more unique reads were mapped to wuhan-hu- . figure shows the overlap between the unique reads mapped to ratg and wuhan-hu- . unique reads were used to construct venn diagrams. all reads mapped to the italy strain were also mapped to the wuhan-hu- genome (figure ). the additional unique reads were mapped to wuhan-hu- . most reads were mapped to both ratg and the italy strain, but a total of reads were aligned to either one. the bat-covs zc and ratg shared reads in alignment, and the ratg genome guided over two hundred more reads into assembly. the two assemblies that used ratg and wuhan-hu- as reference genomes, respectively, were aligned with the eight reference viral genomes in table in a multiple alignment.
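the venn-diagram comparison of unique mapped reads described above amounts to set arithmetic on read identifiers. the following is a minimal sketch of that idea; the read ids and the way they would be extracted from the alignments are hypothetical and not taken from the study.

```python
from typing import Dict, Set


def overlap_summary(reads_a: Set[str], reads_b: Set[str]) -> Dict[str, int]:
    """Count reads shared by two references and reads unique to each."""
    return {
        "shared": len(reads_a & reads_b),   # mapped to both references
        "only_a": len(reads_a - reads_b),   # mapped to the first reference only
        "only_b": len(reads_b - reads_a),   # mapped to the second reference only
    }


if __name__ == "__main__":
    # toy read ids standing in for unique reads mapped to two references
    mapped_to_first = {"r1", "r2", "r3", "r4"}
    mapped_to_second = {"r3", "r4", "r5"}
    print(overlap_summary(mapped_to_first, mapped_to_second))
```

a case such as "all reads mapped to one genome were also mapped to the other" corresponds to one of the two "only" counts being zero.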
the multiple alignment result was used for similarity analysis and phylogenetic analysis. in terms of the whole genome nucleotide sequence, the ratg -guided assembly showed . % and . % identity to ratg and wuhan-hu- , respectively, and the wuhan-hu- -guided assembly showed . and . % identity to ratg and wuhan-hu- , respectively. the difference between the wuhan-hu- -guided assembly and the ratg -guided assembly was not always consistent with the difference between ratg and wuhan-hu- (table s ). the differences between the ratg -guided and wuhan-hu- -guided assemblies in the regions of , - , bp were probably due to differences between the reference genomes (table s ). however, the differences in some regions, including , - , bp and , - , bp, were not due to differences in the reference genomes. we further investigated the phylogenetic relationship between the two assemblies and the eight reference viral genomes using mega x with bootstrap tests. the two assemblies were positioned between ratg and zc with strong statistical evidence (figure ) . the sars-cov- genome, wuhan-hu- , and the bat-cov, ratg , clustered closely together. the two assemblies from this study were consistently positioned between this cluster and the other covs (figure ). among the other covs, zc is the closest neighbor to the assemblies. currently, very few samples of coronavirus-positive pangolins have been sequenced. pangolin-covs obtained from the guangdong collection and the guangxi collection represent two lineages of coronavirus [ ] . after the outbreak of sars-cov, thousands of bat samples were collected and sequenced to identify coronaviruses that bats may carry. we can expect that in the near future more pangolin samples will be collected and studied for a better understanding of coronavirus in pangolin. in reference-guided assembly, the resulting assembly may show bias towards the reference genome [ ] . successfully decoding a complete viral genome usually requires deep sequencing and further manual curation to fix the gaps.
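whole-genome nucleotide identity between two rows of a multiple alignment, and the same measure in sliding windows, can be sketched as follows. this is a simplified illustration rather than the method implemented in simplot or mega x: the rule of skipping any column that contains a gap, and the window and step sizes, are assumptions.

```python
from typing import List, Optional, Tuple


def percent_identity(aln_a: str, aln_b: str) -> Optional[float]:
    """Percent identity over aligned columns where neither row has a gap.

    Returns None when no column can be compared.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return None
    return 100.0 * matches / compared


def windowed_identity(aln_a: str, aln_b: str,
                      window: int, step: int) -> List[Tuple[int, float]]:
    """Identity in sliding windows, reported as (window start, percent)."""
    out = []
    for start in range(0, len(aln_a) - window + 1, step):
        value = percent_identity(aln_a[start:start + window],
                                 aln_b[start:start + window])
        if value is not None:
            out.append((start, value))
    return out


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACTTAA"))
    print(windowed_identity("ACGTACGT", "ACGTACGA", window=4, step=4))
```

windowed values make it easier to see whether two assemblies differ uniformly or only in particular regions, which is the kind of comparison reported per region above.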
inaccuracies in assembling the sequencing reads could mislead the subsequent curation step. the whole genome identity between bat-cov ratg and sars-cov wuhan-hu- is about %, which corresponds to a total difference of about , nucleotides. our results have shown that observable differences could be found in the resulting assemblies when these genomes were used as reference separately. in particular, when ratg was used as reference, the resulting assembly had a longer total length and higher n value than when wuhan-hu- was used. this points to the possibility that pangolin-cov is more closely related to ratg than to wuhan-hu- . therefore, in order to decode the coronavirus sequence accurately, ratg , and possibly other sars-cov- isolates, should also be considered as reference in future studies of coronavirus in pangolin or other potential intermediate hosts. in addition to using one reference genome to guide the assembly, we also attempted to assemble all reads that mapped to either the ratg or the wuhan-hu- genome. the resulting two-genome-guided assembly has a total length of , bp, which is slightly longer than that of the ratg -guided assembly. however, the n , , bp, is slightly shorter. the authors have no conflict of interest. table s . the whole genome nucleotide similarity from ratg , wuhan-hu- , and resulting assemblies.
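the assemblies above are compared by total length and a contiguity statistic whose number was lost in the text. assuming that statistic is the conventional n50 (the length of the shortest contig in the smallest set of longest contigs covering at least half of the assembly), both values can be computed from contig lengths as in this sketch. the contig lengths shown are toy values, not results from the study.

```python
from typing import Iterable, Tuple


def assembly_stats(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (total length, N50) for a collection of contig lengths.

    N50 is assumed here to be the statistic meant in the text.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # first contig at which the cumulative length reaches half the total
        if 2 * running >= total:
            return total, length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # toy contig lengths in bp
    print(assembly_stats([2, 3, 4, 5, 6]))
```

a longer total length with a shorter n50, as reported for the two-genome-guided assembly, would mean more sequence was recovered but in more fragmented contigs.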
trimmomatic: a flexible trimmer for illumina sequence data mega x: molecular evolutionary genetics analysis across computing platforms identifying sars-cov- -related coronaviruses in malayan pangolins megahit: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de bruijn graph aligning sequence reads, clone sequences and assembly contigs with bwa-mem fast and accurate short read alignment with burrows-wheeler transform reference-guided de novo assembly approach improves genome reconstruction for related species viral metagenomics revealed sendai virus and coronavirus infection of malayan pangolins (manis javanica) are pangolins the intermediate host of the novel coronavirus (sars-cov- )? full-length human immunodeficiency virus type genomes from subtype c-infected seroconverters in india, with evidence of intersubtype recombination a new coronavirus associated with human respiratory disease in china isolation of sars-cov- -related coronavirus from malayan pangolins probable pangolin origin of sars-cov- associated with the covid- outbreak a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord- -qeo dfxg authors: feng, ye; qiu, min; liu, liang; zou, shengmei; li, yun; luo, kai; guo, qianpeng; han, ning; sun, yingqiang; wang, kui; zhuang, xinlei; zhang, shanshan; chen, shuqing; mo, fan title: multi-epitope vaccine design using an immunoinformatics approach for novel coronavirus (sars-cov- ) date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qeo dfxg a new coronavirus sars-cov- has caused over . million infection cases and deaths worldwide. due to the rapid dissemination and the unavailability of specific therapy, there is a desperate need for vaccines to combat the epidemic of sars-cov- . an in silico approach based on the available virus genome was applied to identify high immunogenic b-cell epitopes and human-leukocyte-antigen (hla) restricted t-cell epitopes. 
thirty multi-epitope peptide vaccines were designed by ineo suite, and manufactured by solid-phase synthesis. docking analysis showed stable hydrogen bonds of epitopes with their corresponding hla alleles. when four vaccine peptide candidates from the spike protein of sars-cov- were selected to immunize mice, a significantly larger amount of igg in serum as well as an increase of cd + cells in ilns was observed in peptide-immunized mice compared to the control mice. the ratio of ifn-γ-secreting lymphocytes in cd + or cd + cells in the peptides-immunized mice were higher than that in the control mice. there were also a larger number of ifn-γ-secreting t cells in spleen in the peptides-immunized mice. this study screened antigenic b-cell and t-cell epitopes in all encoded proteins of sars-cov- , and further designed multi-epitope based peptide vaccine against viral structural proteins. the obtained vaccine peptides successfully elicited specific humoral and cellular immune responses in mice. primate experiments and clinical trial are urgently required to validate the efficacy and safety of these vaccine peptides. importance so far, a new coronavirus sars-cov- has caused over . million infection cases and deaths worldwide. due to the rapid dissemination and the unavailability of specific therapy, there is a desperate need for vaccines to combat the epidemic of sars-cov- . different from the development approaches for traditional vaccines, the development of our peptide vaccine is faster and simpler. in this study, we performed an in silico approach to identify the antigenic b-cell epitopes and human-leukocyte-antigen (hla) restricted t-cell epitopes, and designed a panel of multi-epitope peptide vaccines. the resulting sars-cov- multi-epitope peptide vaccine could elicit specific humoral and cellular immune responses in mice efficiently, displaying its great potential in our fight of covid- . ( ). 
all predicted structures or models were decorated and displayed by the open source version of pymol program (https://github.com/schrodinger/pymol-open-source). immuno-stimulation of b lymphocytes the selected peptides were synthesized with the solid phase synthesis method by genscript biotech company (nanjing, china), and were mixed at an equal concentration of mg/ml. the immunization experiment constituted three groups, each consisting of on the th day after the st immunization, randomly selected mice were euthanized. the inguinal lymph nodes (ilns) were harvested and processed into single cell the design of immunization experiment was similar to that for the b cells, but the suspensions. splenocytes ( × /well) and iln lymphocyte ( × /well) were cultured overnight with the peptide mixture ( μg/ml) or in rpmi- alone (negative control). percp/cyanine . -conjugated anti-mouse cd a antibody (biolegend, san diego, us), cells were resuspended in l pbs for flow cytometry analysis. the ifn-γ-secreting t lymphocytes were also quantified on randomly selected containing t-cell epitopes only. based on both the epitope counts and hla score, we eventually selected t-cell epitopes-only vaccine peptides. taken together, a total of peptide vaccine candidates were designed (table ) . of them were from the spike protein, two from the membrane protein and two from the envelope protein. five peptides were located in the rbd region, indicating they were likely to induce the production of neutralizing antibody. the vaccine peptides covered all structural proteins that may induce immune response against sars-cov- in theory; and the multi-peptide strategy we applied would better fit the genetic variability of the human immune system and reduce the risk of pathogen's escape through mutation ( ). 
interaction of predicted peptides with hla alleles to further inspect the binding stability of t-cell epitopes against hla alleles, the t-cell epitopes involved in the above designed vaccine peptides were selected to conduct an interaction analysis. figure illustrated the docking results against the most popular hla types for the two epitopes from vaccines peptide and (table ; table ), which showed relatively higher hla score. the mdockpep scores were between - ~ - , indicating that the predicted crystal structures were stable. all epitopes were docked (table ). taken together, the epitopes included in our vaccine peptides can interact with the given hla alleles by in silico prediction. p , p and p , were chosen as the candidates for the downstream validation experiments because of their relatively higher counts of b-cell and t-cell epitopes and the higher frequencies of their epitopes' corresponding hla alleles (table ) . we immunized mice by subcutaneous injection of the mixture of these synthesized peptides plus to evaluate whether these peptides induce b cells to produce specific antibody against the s protein, an elisa assay was conducted to detect igg in the sera of mice. fig. . interaction between the predicted peptides (by yellow sticks) and different hla alleles (by green cartoons). amino acids were labeled adjacent to the contact sites. table displays the detailed docking information. surveillance case definitions for human infection with novel coronavirus (ncov): interim guidance v the authors declare that they have no competing interests. key: cord- -ycuiso g authors: li, wei; drelich, aleksandra; martinez, david r.; gralinski, lisa; chen, chuan; sun, zehua; schäfer, alexandra; leist, sarah r.; liu, xianglei; zhelev, doncho; zhang, liyong; peterson, eric c.; conard, alex; mellors, john w.; tseng, chien-te; baric, ralph s.; dimitrov, dimiter s. 
title: rapid selection of a human monoclonal antibody that potently neutralizes sars-cov- in two animal models date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ycuiso g effective therapies are urgently needed for the sars-cov- /covid pandemic. we identified panels of fully human monoclonal antibodies (mabs) from eight large phage-displayed fab, scfv and vh libraries by panning against the receptor binding domain (rbd) of the sars-cov- spike (s) glycoprotein. one high affinity mab, igg ab , specifically neutralized replication competent sars-cov- with exceptional potency as measured by two different assays. there was no enhancement of pseudovirus infection in cells expressing fcγ receptors at any concentration. it competed with human angiotensin-converting enzyme (hace ) for binding to rbd suggesting a competitive mechanism of virus neutralization. igg ab potently neutralized mouse ace adapted sars-cov- in wild type balb/c mice and native virus in hace expressing transgenic mice. the ab sequence has relatively low number of somatic mutations indicating that ab -like antibodies could be quickly elicited during natural sars-cov- infection or by rbd-based vaccines. igg ab does not have developability liabilities, and thus has potential for therapy and prophylaxis of sars-cov- infections. the rapid identification (within days) of potent mabs shows the value of large antibody libraries for response to public health threats from emerging microbes. the severe acute respiratory distress coronavirus (sars-cov- ) ( ) has spread worldwide thus requiring safe and effective prevention and therapy. inactivated serum from convalescent patients inhibited sars-cov- replication and decreased symptom severity of newly infected patients ( , ) suggesting that monoclonal antibodies (mabs) could be even more effective. human mabs are typically highly target-specific and relatively non-toxic. 
by using phage display we have previously identified a number of potent fully human mabs (m , m , m . ) against emerging viruses including severe acute respiratory syndrome coronavirus (sars-cov) ( ) , middle east respiratory syndrome coronavirus (mers-cov) ( ) and henipaviruses ( , ) , respectively, which are also highly effective in animal models of infection ( ) ( ) ( ) ( ) ; one of them was administered on a compassionate basis to humans exposed to henipaviruses and successfully evaluated in a clinical trial ( ) . the size and diversity of phage-displayed libraries are critical for rapid selection of high affinity antibodies without the need for additional affinity maturation. our exceptionally potent antibody against mers-cov, m , was directly selected from a very large (size ~ clones) library from individuals ( ) . however, another potent antibody, m . , against henipaviruses was additionally affinity matured from its predecessor, which was selected from a smaller library (size ~ clones) from individuals ( , ) . thus, to generate high affinity and safe mabs we used eight very large (size ~ clones each) naive human antibody libraries in fab, scfv or vh format using pbmcs from individuals total obtained before the sars-cov- outbreak. four of the libraries were based on single human vh domains where cdrs (except cdr which was mutagenized or grafted) from our other libraries were grafted as previously described ( ) . another important factor to consider when selecting effective mabs is the appropriate antigen. similar to sars-cov, sars-cov- uses the spike glycoprotein (s) to enter host cells. the s receptor binding domain (rbd) binds to its receptor, the human angiotensin-converting enzyme (hace ), thus initiating a series of events leading to virus entry into cells ( , ) . we have previously characterized the function of the sars-cov s glycoprotein and identified its rbd, which is stable in isolation ( ) .
the rbd was then used as an antigen to pan phage displayed antibody libraries; we identified potent antibodies ( , ) more rapidly and the antibodies were more potent than when we used whole s protein or s (unpublished). in addition, the sars-cov rbd based immunogens are highly immunogenic and elicit neutralizing antibodies which protect against sars-cov infections ( ). thus, to identify sars-cov- mabs, we generated two variants of the sars-cov- rbd (aa - ) (fig. s ) and used them as antigens for panning of our eight libraries. panels of high-affinity binders to rbd in fab, scfv and vh domain formats were identified. there was no preferential use of any antibody vh gene (an example for a panel of binders selected from the scfv library is shown in fig. s a ) and the number of somatic mutations was relatively low (fig. s b , for the same panel of binders as in fig. s a ). for nine of the highest affinity mabs a provisional patent application was filed on march , by the university of pittsburgh. those high affinity mabs can be divided into two groups in terms of their competition with hace . two representatives of each group are fab ab and vh ab . to further increase their binding through avidity effects and extend their half-life in vivo they were converted to igg and vh-fc fusion formats, respectively. ab was characterized in more detail because of its potential for prophylaxis and therapy of sars-cov- infection. the fab and igg ab bound strongly to the rbd (fig. a ) and the whole sars-cov- s protein (fig. b) as measured by elisa. the fab ab equilibrium dissociation constant, kd, as measured by the biolayer interferometry technology (blitz), was . nm (fig. c) . the igg ab bound with high (kd = pm) avidity to recombinant rbd (fig. d) . igg ab bound cell surface associated native s glycoprotein, suggesting that the conformation of its epitope on the rbd in isolation is close to that in the native s protein (fig. , s ).
the binding of igg ab was of higher avidity than that of hace -fc (fig. b) . binding of ab was specific for the sars-cov- rbd; it did not bind to the sars-cov s (fig. a ) nor to cells that do not express sars-cov- s glycoprotein ( fig. a ). ab competed with hace for binding to the rbd ( fig. b and c) indicating possible neutralization of the virus by preventing binding to its receptor. it did not compete with the cr (fig. d and e) , which also binds to sars-cov ( ) and with ab ( fig. f ). igg ab potently neutralized sars-cov- pseudovirus with an ic of ng/ml ( fig a) . it did not enhance pseudovirus infection of fcγria overexpressing t-hace cells at any concentration ( fig b) . it also did not mediate pseudovirus infection of fcγrii expressing k cells ( fig s b) . importantly, igg ab exhibited potent neutralizing activity against authentic sars-cov- in two independent assays -a microneutralization-based assay ( % neutralization at < ng/ml) (fig. c ) and a luciferase reporter gene assay (ic = ng/ml) ( fig. d ). in agreement with the specificity of binding to the sars-cov- s and not to the sars-cov s the igg ab did not neutralize live sars-cov (fig. c ). the igg m ( ) control which is a potent neutralizer of mers-cov, did not exhibit any neutralizing activity against sars-cov- (fig. c ). the vh ab and vh-fc ab bound the rbd with high affinity and avidity (fig. s a .b) but did not compete with hace ( fig. s c ) or neutralize sars-cov- ( fig. d) , indicating that not all antibodies targeting epitopes on the rbd affect virus replication. to evaluate the efficacy of igg ab in vivo we used two animal models. the first one is based on the recently developed mouse ace adapted sars-cov- which has two mutations q t/p y at the ace binding interface on rbd ( ). igg ab protected mice from high titer intranasal sars-cov- challenge ( pfu) of balb/c mice in a dose dependent manner ( fig a) . there was complete neutralization of infectious virus at the highest dose of . 
mg, and statistically significant reduction by -fold at . mg; there was a trend for reduction at . mg dose but did not reach statistical significance. the igg m which potently neutralizes the mers-cov in vivo was used as an isotype control because it did not have any activity in vitro. these results also suggest that the rbd double mutations q t/p y do not affect igg ab binding. the second model we used is the transgenic mice expressing human ace (hace ) ( ). mice were administered ug of igg ab prior to wild type sars-cov- challenge followed by detection of infectious virus in lung tissue days later. replication competent virus was not detected in four of the five mice which were treated with igg ab ( fig b) . all six control mice and one of the treated mice had more than pfu per lung. these results show clear evidence of a potent preventive effect of igg ab in vivo. the reason for absence of virus neutralization in one of the mice is unclear but may be due to individual variation in antibody transfer from the peritoneal cavity where it was administered to the upper and lower respiratory tract. our previous experiments with transgenic mice expressing human dpp and treated with two different doses of m ( . and mg per mouse) showed similar lack of protection of one (out of four) mice at the lower dose but at the higher dose all four mice were protected ( ) similarly to the results obtained with the mouse adapted sars-cov- . the in vivo protection also indicates that igg ab can achieve neutralizing concentrations in the respiratory tract. this is the first report of in vivo activity of a human monoclonal antibody against sars-cov- by using two different mouse models. the results also show some similarity between the two models in terms of evaluation of antibody efficacy. in both models about the same dose of antibody ( . - . mg) reduced about -fold the infectious virus in the lungs. 
this result now suggests that testing of antibody efficacy could be performed at a larger scale than testing with the hace transgenic mice due to the availability of wild type mice. it also shows robust neutralizing activity of igg ab in two different models of infection. interestingly, fab ab had only several somatic mutations compared to the closest germline predecessor genes. this implies that ab -like antibodies could be elicited relatively quickly by using rbd-based immunogens especially in some individuals with naïve mature b cells expressing the germline predecessors of ab . this is in contrast to the highly mutated broadly neutralizing hiv- antibodies that require long maturation times, are difficult to elicit and their germline predecessors cannot bind native hiv- envelope glycoproteins ( , ). the rbd of the mers-cov s protein was previously shown to elicit neutralizing antibodies ( , ). for sars-cov- only a few somatic mutations would be sufficient to generate potent neutralizing antibodies against the sars-cov- rbd which is a major difference from the elicitation of broadly neutralizing antibodies against hiv- which requires complex maturation pathways ( , - ). the germline-like nature of the newly identified mab ab also suggests that it has excellent developability properties that could accelerate its development for prophylaxis and therapy of sars-cov- infection ( ). to further assess the developability (drugability) of ab its sequence was analyzed online (opig.stats.ox.ac.uk/webapps/sabdab-sabpred/tap.php); no obvious liabilities were found. in addition, we used dynamic light scattering (dls) and size exclusion chromatography to evaluate its propensity for aggregation. igg ab at a concentration of mg/ml did not aggregate for six days incubation at °c as measured by dls (fig. a) ; there were no high molecular weight species in freshly prepared igg ab also as measured by size exclusion chromatography (sec) (fig. b ). 
igg ab also did not bind to the human cell line t (fig. a) even at very high concentration ( μm) which is about -fold higher than its kd, indicating absence of nonspecific binding to many membrane-associated human proteins. the igg ab also did not bind to , human membrane-associated proteins as measured by a membrane proteome array (fig. c). the high affinity/avidity and specificity of igg ab along with potent neutralization of virus and good developability properties suggest its potential use for prophylaxis and therapy of sars-cov- infection. because it strongly competes with hace , indicating a certain degree of mimicry, one can speculate that mutations in the rbd may also lead to inefficient entry into cells and infection. in the unlikely case of mutations that decrease the ab binding to rbd but do not affect binding to ace it can be used in combination with other mabs including those we identified or in bi(multi)specific formats to prevent infection of such sars-cov- isolates. ab could also be used to select appropriate epitopes for vaccine immunogens and for diagnosis of cov-specific igg m antibody was expressed in human mammalian cells as described previously ( ). the ace gene was ordered from origene (rockville, md). the rbd domain (residues - ), s domain (residues - ) and ace (residues - ) genes were cloned into a plasmid which carries a cmv promoter with an intron, a human igg fc region and a woodchuck posttranscriptional regulatory element (wpre) to generate the rbd-fc, s -fc and ace -fc expression plasmids. the rbd-avi-his protein with an avi tag followed by a ×his tag at the c-terminus was subcloned similarly. these proteins were expressed with the expi expression system (thermo fisher scientific) and purified with protein a resin (genscript) and by ni-nta resin (thermo fisher scientific). the fab cr antibody gene with a his tag was cloned into the pcat plasmid (developed in house) for expression in hb bacteria and purified with ni-nta resin.
protein purity was estimated as > % by sds-page and protein concentration was measured spectrophotometrically (nanovue, ge healthcare).
blitz. antibody affinities and avidities were analyzed by the biolayer interferometry blitz
binding ec was obtained by using the non-linear mode in graphpad prism . igg ab showed higher binding avidity to t-s cells than hace -fc ( . nm vs. . nm for igg ab and hace -fc to achieve % binding, respectively).
followed by pbst washing. for detection, an hrp conjugated anti mouse fc antibody was used.
competition of ab with hace tested by blitz. nm hace -fc was monitored to bind ab saturated sensors (red line), which is compared to its independent binding signal to rbd sensor in the absence of ab (green line
were defined as the sample concentration at which a % reduction in rlu was observed relative to the average of the virus control wells.
a pneumonia outbreak associated with a new coronavirus of probable bat origin
convalescent plasma as a potential therapy for covid- . the lancet infectious diseases
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
structure of severe acute respiratory syndrome coronavirus receptor-binding domain complexed with neutralizing antibody
exceptionally potent neutralization of middle east respiratory syndrome coronavirus by human monoclonal antibodies
potent neutralization of hendra and nipah viruses by human monoclonal antibodies
exceptionally potent cross-reactive neutralization of nipah and hendra viruses by a human monoclonal antibody
potent cross-reactive neutralization of sars coronavirus isolates by human monoclonal antibodies
passive transfer of a germline-like neutralizing human monoclonal antibody protects transgenic mice against lethal middle east respiratory syndrome coronavirus infection
a neutralizing human monoclonal antibody protects against lethal disease in a new ferret model of acute nipah virus infection
a neutralizing human monoclonal antibody protects african green monkeys from hendra virus challenge
safety, tolerability, pharmacokinetics, and immunogenicity of a human monoclonal antibody targeting the g glycoprotein of henipaviruses in healthy adults: a first-in-human, randomised, controlled, phase study
construction of a large naive human phage-displayed fab library through one-step cloning
construction of a large phage-displayed human antibody domain library with a scaffold based on a newly identified highly soluble, stable heavy chain variable domain
structural basis for the recognition of sars-cov- by full-length human ace
an emerging coronavirus causing pneumonia outbreak in wuhan, china: calling for developing therapeutic and prophylactic strategies
the sars-cov s glycoprotein: expression and functional characterization
biotinylated hace -fc ( nm) was incubated with rbd-fc in the presence of different concentrations of vh ab .
after washing, bound hace -fc was detected by using hrp conjugated streptavidin.
after washing, bound cr was detected by using hrp conjugated anti human fc antibody. ab showed weak competition with cr for binding to sars-cov- rbd. all the elisa experiments were performed in duplicate and the error bars denote ± sd.
key: cord- -t ykkekl authors: stone, e. taylor; geerling, elizabeth; steffen, tara l.; hassert, mariah; dickson, alexandria; spencer, jacqueline f.; toth, karoly; dipaolo, richard j.; brien, james d.; pinto, amelia k. title: characterization of cells susceptible to sars-cov- and methods for detection of neutralizing antibody by focus forming assay date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t ykkekl
the sars-cov- outbreak and subsequent covid- pandemic have highlighted the urgent need to determine what cells are susceptible to infection and for assays to detect and quantify sars-cov- . furthermore, the ongoing efforts for vaccine development have necessitated the development of rapid, high-throughput methods of quantifying infectious sars-cov- , as well as the ability to screen human polyclonal sera samples for neutralizing antibodies against sars-cov- . to this end, our lab has adapted focus forming assays for sars-cov- using vero ccl- cells, referred to in this text as vero who. using the focus forming assay as the basis for screening cell susceptibility and for developing a focus reduction neutralization test, we have shown that this assay is a sensitive tool for determining sars-cov- neutralizing antibody titer in human, non-human primate, and mouse polyclonal sera following sars-cov- exposure. additionally, we describe the viral growth kinetics of sars-cov- in a variety of different immortalized cell lines and demonstrate via human ace and viral spike protein expression that these cell lines can support viral entry and replication.
which human cell types may be more permissive to sars-cov- infection, with particular uncertainty including if higher ace expression coincides with increased risk for heightened infection. furthermore, it is unclear which cell types may be suitable for the successful evaluation of antiviral and therapeutic drugs for in vitro screening before in vivo evaluation. further characterization of permissible cell types is necessary to improve our understanding of sars- cov- pathogenesis and to develop therapeutic strategies for the treatment of it has also been noted that following infection with sars-cov- , people typically seroconvert days following symptom onset, and there is increasing evidence suggesting that development of a neutralizing antibody response is a correlate of protection in patients recovered from . the covid- pandemic has highlighted the necessity for testing for neutralizing antibodies. as sars-cov- infection can have a range of manifestations from asymptomatic to fatal multiple organ failure [ ] [ ] [ ] [ ] [ ] , antibody testing and serological surveys are a critical tool for determining prior infection status and seroprevalence in a population . it is also the goal of many candidate sars-cov- vaccines to induce neutralizing antibodies targeting the viral spike protein, the major antigenic determinant of sars-cov- [ ] . for understanding the antibody response, assays that measure neutralizing antibody titer are considered the gold standard. one such tool for evaluating neutralizing antibody response is a plaque/focus neutralization reduction test (prnt/frnt), which evaluates the ability of polyclonal sera samples to prevent or reduce infection of a cell monolayer in vitro. previously, for sars- cov- , only prnt assays-which rely on the ability of virus to lyse infected cells and thus can take - hours to develop-have been used in the assessment of the neutralizing antibody response. 
it is not well known whether an frnt, which uses an immunostaining protocol to detect virus and does not depend on cell lysis, and thus is often more rapid, is amenable to detection of sars-cov- neutralizing antibody. in this study, we describe the growth kinetics of sars-cov- in multiple cell types and the methods our laboratory has used to optimize a sars-cov- focus forming assay (ffa) to improve sensitivity and specific detection. by characterizing the growth kinetics of sars-cov- on a variety of immortalized and primary cell lines, we have demonstrated which of these cell lines are susceptible to infection by sars-cov- . we also demonstrate that the ffa can be adapted to measure the neutralization capacity of polyclonal sera in an frnt. this high throughput frnt assay can be applied to sera from both animal models of sars-cov- infection and human sars-cov- infected patients, and can serve as a useful assay for describing the kinetics of the neutralizing antibody response to sars-cov- . additionally, we have compared the expression of ace and sars-cov- spike on these cell lines to determine how spike expression correlates with susceptibility. the tools developed in this study have practical applications in both the basic science and translational approaches that will be critical in the ongoing effort to slow the covid- pandemic. based on our previous work optimizing the ffa for wnv [ ], we proceeded to adapt the ffa for the detection of infectious sars-cov- . along with spot number, the spot size and border definition provide valuable information on possible differences in viral strains. as we have observed that the foci morphology, as well as spot number, can vary dramatically under different growth conditions, we sought to test different growth conditions and cell lines to determine the optimal conditions for sars-cov- viral titration.
this goal was guided by previous studies that have suggested that the use of vero cells from varying origins can impact viral titer [ , ] . using both vero ccl- (atcc® ccl- ™, referred to in this text as vero who) and vero e (vero , atcc® crl- ™) cell lines, we determined if differences in foci number or size occurred to decide if one cell line was superior for titration by ffa (figure a- c) . although many laboratories utilize vero e cells for viral titer measurements of sars-cov- [ , ] , in our laboratory, vero e cells typically resulted in about two-fold lower foci formation relative to vero who cells (figure a-c) . figure a is a representative image of an ffa showing the viral titration on both the vero e and whos for both ~ ffu and ~ ffu when identical numbers of cells are seeded per well. we noted that at identical higher dilutions of sars-cov- virus stocks, vero who cells develop ~ individual foci per well, whereas vero e cells develop ~ foci per well ( figure a) . the same pattern was observed at lower dilutions of virus, with ~ foci formation on vero who cells yielding only ~ foci on vero e cells ( figure a ) the quantification of this difference in the vero who and vero e cell type is shown in figure b. interestingly, when we compared this observation to the genome copy number by we noted that vero e cells tended to produce significantly more virus across all , , and - hour timepoints (p = . ) ( figure c) . it is possible that the discrepancy between the ffa and qrt-pcr data could be due to vero who cells producing fewer genome copies yet more infectious virus than e cells, while e cells sustain more viral replication but yield less infectious virus. while we did not see any differences in foci morphology between the two vero cell lines we used, the significant difference in the number of foci observed between vero e and vero who (p < . ) suggests that vero who cells record higher viral titers than the vero e cells for sars-cov- ( figure b) . 
in order for sars-cov- to form distinct foci, it is critical to plate the optimal cell density. we examined the impact of cell density on foci formation for both vero who and vero e cells by plating identical dilutions of sars-cov- virus stocks on -well plates seeded with differing numbers of who or e cells ( × , . × or × cells/well) one day prior to infection of the cell monolayer. at these concentrations, on the day of infection, × , . × and × cells/well resulted in monolayers that were , and percent confluent respectively for both cell lines tested. figure d is a representative image of the focus forming assay showing the foci formations arising from different cell concentrations plated for both the vero e and whos. the quantification of the spot counts for vero who cells is shown in figure e . the quantification of the spot counts for vero e cells is shown in figure f . for both the vero whos and e cells, we observed a significant increase in the number of foci formed when either . × (e p = . , who p = . ) or × (e p = . , who p = . ) were seeded per well compared to × . however, there was no significant increase in foci formation when increasing the cell density from . × cells/well to × cells/well. thus, while the vero e cells fostered lower foci numbers at each confluency as compared to the who cells, viral titers were not significantly different at . × cells/well or × cells/well irrespective of whether the vero who (figure e ) or vero e ( figure f ) cell line was seeded for the assay. we have previously tested higher cell densities for ffas and have noted that cell concentrations higher than × cells/well results in an overly confluent monolayer with more cells than can adhere to the wells, leading to highly variable titer information (data not shown). the results of these studies suggest that vero who cells plated at either . × or × cells/well was optimal for the viral ffa. 
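The spot counts discussed above are usually converted to a stock titer by back-calculating through the dilution and the inoculum volume. The sketch below illustrates that arithmetic only; the text does not state the formula the authors used, and the function names, the example dilution, and the example volume are all hypothetical.

```python
def ffu_per_ml(foci_count, dilution_factor, inoculum_volume_ml):
    """Back-calculate a titer in focus forming units (FFU) per mL.

    foci_count         -- foci counted in one well
    dilution_factor    -- fold dilution of the stock plated in that well
                          (e.g. 1000 for a 1:1000 dilution)
    inoculum_volume_ml -- volume of diluted virus added to the well, in mL
    """
    if dilution_factor <= 0 or inoculum_volume_ml <= 0:
        raise ValueError("dilution factor and inoculum volume must be positive")
    return foci_count * dilution_factor / inoculum_volume_ml


def mean_titer(wells, inoculum_volume_ml):
    """Average the titer over several (foci_count, dilution_factor) wells."""
    titers = [ffu_per_ml(count, dilution, inoculum_volume_ml)
              for count, dilution in wells]
    return sum(titers) / len(titers)
```

For instance, `ffu_per_ml(50, 1000, 0.1)` treats 50 foci in a well that received 0.1 mL of a 1:1000 dilution; these inputs are illustrative and are not values from the study.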
like plaque assays, the incubation time for the ffa is highly dependent on the viral replication cycle within the cells and the time required for infectious progeny to be released and spread to neighboring cells. while, unlike traditional plaque assays, the ffa is not dependent on viral lysis of infected cells, the development of visible spots is dependent on the time it takes for viral protein production to occur and for infectious virus to spread to neighboring susceptible cells. to determine the optimal time frame for infection of sars-cov- on a vero who cell monolayer to form individual foci, we tested a variety of incubation times. in order to optimize these conditions, identical dilutions of sars-cov- virus stocks were added to infect vero who cells seeded at a density of . × cells/well, and incubated for , , , or hours post infection. figure g shows representative images of identical dilutions of sars-cov- ffas developed after , , or hour incubation times, respectively. the mean spot size of foci at each timepoint is quantified in figure h. the mean spot count per well at each time point is quantified in figure i. of all the parameters tested, altering the incubation time had the most dramatic impact on mean foci size. while there were no significant differences in the size of foci formed between hours post infection (hpi) and hpi (p = . ), we did observe the formation of significantly larger foci between the and hpi timepoints (p = . ), as well as the and hpi timepoints (p = . ) (figure h). interestingly, at the same virus dilution, we found that there were fewer spots at hpi (mean spot number of ) relative to the hpi time point (p = . , mean spot number of . ) but not the hpi time point (p = . , mean spot value . ). similarly, we found that there were no significant differences in the number of foci formed between and hpi (p = . ) (figure i).
however, we noted that this difference is within one standard deviation of the mean number of spots across all wells and was insignificant when determining titers. from this result we concluded that incubation times greater than hours resulted in a slight but significant increase in spot number, while assays incubated for up to did not alter the spot number but did increase the spot size. this increase in spot size but not number between and hours is highly useful for the testing of anti-viral compounds which may require longer incubation times. in addition, larger spot size makes this assay more universally useful since laboratories without an automated machine can manually count spots, where the large size will improve readability of the assay. the ffa relies on an immunostaining protocol of an infected cell monolayer in order to quantify infectious virus titer and is therefore dependent upon sars-cov- -specific antibody binding. for this purpose, polyclonal guinea pig sera (bei: nr- ) raised against sars-cov- produces reproducible staining with minimal background. however, we have also used human as efforts to understand replication and transmission of sars-cov- are underway and vaccine development moves forward, more information regarding permissive cell types for sars-cov- infection and replication is needed. to determine the permissivity of several different cell types to infection with sars-cov- , we generated multistep growth curves for human, non-human primate, murine, hamster, and gastric adenocarcinoma cell lines. each of these cell lines was infected at a moi = . and cells or cell supernatants were collected aseptically, and total rna was isolated. cellular rna was normalized to an internal rnasep control for human and non-human primate cells and gapdh for murine cells. genome equivalents were determined using the applied biosystems taqman gene expression assay protocol for sars-cov- previously described [ ].
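Genome equivalents from a probe-based qRT-PCR run are commonly read off a standard curve relating Ct to the log of a known copy number. The sketch below shows that general calculation under stated assumptions: it is not the cited protocol, the function names are hypothetical, and the housekeeping-gene normalization (rnasep or gapdh) mentioned above is not modeled.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    copies    -- known copy numbers of the standards
    ct_values -- measured Ct for each standard, in the same order
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0
```

A slope near -3.32 corresponds to roughly 100% amplification efficiency, which gives a quick sanity check on a standard series before sample Ct values are converted to copies.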
to identify susceptible cell lines, we first assessed genome copy number in non-human primate african green monkey kidney epithelial cells (vero who, vero e ) as well as human hepatocytes (huh . , huh ) and lung epithelial cells (a , calu- ). figure a shows the sars-cov- genome copy number for whole cells for human and non-human primate cell lines. figure b shows the sars-cov- genome copy number for cell culture supernatants for human and non-human primate cell lines. in each cell line aside from vero who and huh , genome equivalents within the cell peaked at hpi and remained relatively constant for the duration of the experiment. sars-cov- genome equivalents in vero who and huh cells peaked at hpi and remained relatively constant for the duration of the experiment. the vero e cell line reached the highest titer at . × copies/µl at hpi, with the huh . and calu- cell lines reaching . × copies/µl and . × copies/µl, at the same time point, respectively. vero who and huh cell lines reached the highest titer, . × copies/µl and . × copies/µl, respectively, at hpi. the a cells, although susceptible to infection, appeared to support little sars-cov- replication, reaching only copies/µl at hpi. we also measured genome copy number in the cell culture supernatant for all of the cell lines described ( figure b ) for each cell line, viral rna in the supernatant peaked at later timepoints, either hpi (vero who, . × copies/µl; vero e , . × copies/µl; calu- , . × copies/µl) or hpi (a , . copies/µl; huh , copies/µl; huh . , . × copies/µl). of these cell lines, the vero who cells had the highest titers in the supernatant, while vero e , huh . and calu- cells were comparable in terms of titer. as expected, a cells that did not contain high titers of cell-associated virus also did not contain high titers in the supernatant. interestingly, while huh cells support relatively high titers of cell-associated virus, they do not appear to yield high titers in the supernatant. 
these results suggest that vero e cells are the most permissive for sars-cov- replication among all tested cell types and would be the ideal choice for propagation. vero who, huh . , and calu- cells are also permissive cell types for sars-cov- infection and replication; however, a cells do not appear to be suitable for high levels of sars-cov- replication. our results also suggest that huh cells appear to be permissive for sars-cov- infection and replication, but do not appear to be suitable for egress into the cell culture supernatant. due to ongoing sars-cov- vaccine development efforts, there is an urgent need to develop and evaluate the susceptibility of small animal models to sars-cov- infection and covid- . recent studies have suggested that rodents may be used for these purposes, as well as to study the adaptive immune response to sars-cov- infection [ ] [ ] [ ] [ ]. to this end, we next sought to determine permissivity of rodent cell lines to sars-cov- infection, namely t and shhc cell lines. figure c shows the sars-cov- genome copy number for whole cells for murine and hamster cell lines, with vero who cells included for comparison. figure d shows the sars-cov- genome copy numbers for cell culture supernatants derived from murine and hamster cell lines, with vero who supernatant included for comparison. from total cellular rna, we detected only copies/µl in t cells at hpi, the time point at which the titer peaked. similarly, with shhc cells, we detected only copies/µl at hpi, the time point at which the titer peaked. in addition to total cellular rna, we also examined supernatants from cell culture and predictably found peak titers of only copies/µl in t cells and just copies/µl in shhc cells, both at hpi.
these results suggest that neither t nor shhc cell lines are suitable for supporting sars-cov- replication or egress without further experimental finally, because it is known that hace is highly expressed by intestinal epithelial cells, we sought to examine the permissivity of human gastric adenocarcinoma cell lines to sars-cov- infection [ ] . for this purpose, we used ags and mkn cell lines, and examined viral genome copies associated with both total cellular rna as well as the cell supernatant. figure e shows the sars-cov- genome copy numbers for whole cells from gastric adenocarcinoma lines, with vero who cells included for comparison. figure f shows the sars-cov- genome copy numbers for cell culture supernatants from gastric adenocarcinoma lines, with vero who supernatant included for comparison. we found that mkn cells yielded relatively high titers, with cell-associated virus peaking at . × copies/µl at hpi. virus in the supernatant peaked at . × copies/µl at hpi. ags cells yielded lower titers, with cell-associated virus peaking at . × copies/µl at hpi and virus in the supernatant peaking at . × copies/µl hpi. interestingly, however, the titer in ags cells appeared more variable compared to other time points, increasing at and hpi and dropping at and hpi. these results suggest that these gastric adenocarcinoma cell lines can support infection, replication and egress of sars-cov- as well as, or in some cases better than, vero cell lines. given that our studies conducted to quantify sars-cov- viral genome copies in susceptible cell lines yielded results that highlighted highly permissive cell lines, like vero who, while also distinguishing less permissive cell lines, like a , we sought to analyze spike and hace protein co-expression to determine if higher hace expression correlated with higher susceptibility to sars-cov- infection. 
one facet of our understanding of the current sars-cov- outbreak that is rapidly evolving is sars-cov- seroprevalence in the general population. at the same time, forming a better understanding of and ability to assess the kinetics of the neutralizing antibody response to sars-cov- could be essential in further vaccine and anti-viral development efforts. to this end, we adapted the sars-cov- ffa for the quantification of neutralizing antibody (nab) titers in the form of an frnt. this was accomplished by incubating serially diluted convalescent serum from sars-cov- infected individuals with a known quantity of infectious sars-cov- (~ ffu) and measuring foci formation. infection was normalized to a pbs control to reflect the percent neutralization of sera. first, we sought to determine whether the frnt could be used to detect a range of nab concentrations in human samples. figure a shows the neutralization curves for human sera samples, showing a decrease in virus neutralization as the serum is diluted out. these samples were collected from human subjects at the university of puerto rico following a positive qpcr test for sars-cov- . all subjects were in the convalescence period at the time of sample collection. these assays were performed using deidentified residual sera samples. using the frnt approach, we quantified the neutralizing antibody titer in the form of the reciprocal serum dilution required to neutralize % of virus, or the frnt value. this assay can also be used to determine the frnt value (i.e. required for % neutralization). the frnt and frnt values are reported in figure b . the reciprocal serum dilutions required for % neutralization for the human samples (hs_a, hs_c and hs_d) are . , . , and . , respectively. the reciprocal serum dilutions required for % neutralization for hs_a, hs_c and hs_d are . , . , and . , respectively. 
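The titer definition above (normalize foci to the pbs control, then report the reciprocal serum dilution at which a set fraction of virus is neutralized) can be sketched as follows. This uses simple log-linear interpolation between the two dilutions that bracket the threshold, which is an assumption made for illustration; the text does not say which curve-fitting method the authors used, and the function names are hypothetical.

```python
import math


def percent_neutralization(foci, control_foci):
    """Percent reduction in foci relative to the no-serum (PBS) control."""
    return 100.0 * (1.0 - foci / control_foci)


def frnt_titer(reciprocal_dilutions, foci_counts, control_foci, threshold=50.0):
    """Reciprocal serum dilution at which neutralization falls to `threshold` %.

    Returns None when the least dilute sample is already below the threshold
    (titer below the range tested) or when the curve never drops below the
    threshold within the dilutions tested.
    """
    points = sorted(zip(reciprocal_dilutions, foci_counts))
    neut = [(d, percent_neutralization(f, control_foci)) for d, f in points]

    if neut[0][1] < threshold:
        return None  # below the lowest dilution tested

    # Neutralization decreases as serum is diluted; find the bracketing pair.
    for (d_lo, n_lo), (d_hi, n_hi) in zip(neut, neut[1:]):
        if n_lo >= threshold > n_hi:
            frac = (n_lo - threshold) / (n_lo - n_hi)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    return None  # threshold not crossed within the tested range
```

Calling the same function with `threshold=90.0` gives the more stringent titer; a `None` result corresponds to a sample that cannot be quantified within the dilution series, analogous to a value outside the assay's quantitation limits.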
for hs_b, the reciprocal serum dilution required to neutralize both % and % of the virus was below the lower limit of quantitation (lloq) for the assay. in order to increase confidence that these nabs were the result of recent sars-cov- infection rather than cross-reactivity with the four circulating human common cold coronaviruses, we performed an elisa to examine binding of these sera samples to sars-cov- receptor- binding domain (rbd). figure c shows the absorbance at nm (a ) values indicating that sera from these subjects contain antibodies that can bind specifically to the receptor binding domain (rbd) of sars-cov- . having confirmed that our assay can detect nab to sars-cov- in human sera, we next sought to demonstrate that this assay is applicable for numerous sera sources including non-human primates and mice. to this end, we performed an frnt with non-human primate (nhp) sera, which consisted of pooled sera samples from a group of rhesus macaques in the convalescent phase following sars-cov- infection by multiple routes (bei: nr- ). figure e shows the neutralization curve for this pool of nhp sera. low but detectable nab titers were present in this sample with an frnt value of . and frnt of . , as depicted in figure f . having shown that our assay can be utilized to quantify nabs in nhp samples, we next sought to demonstrate that this assay could also be used for quantifying nabs in small animal models such as mice. this also afforded us the opportunity to examine the nab response both at having demonstrated that our frnt assay can be used to quantify nab titers for human samples, non-human primates, and mice both in acute infection and memory responses, we next sought to determine whether this assay could be used to quantify nab resulting from a subunit or dna vaccine. to this end, we immunized c bl/ mice intramuscularly (i.m) with µg of dna encoding the sars-cov- spike (ms_c). 
a subset of these mice was boosted days later (ms_b, ms_d) with µg of dna intramuscularly and sera collected days following the boost. figure g shows the neutralization curves for these mice immunized with dna encoding the immunization. from these data we were able to define the frnt values for ms_b, ms_c, and ms_d, which are shown in figure h , and are . , . , . , respectively. however, the frnt values were below the lloq for the assay, as well as the frnt value for ms_a. these results suggest that the frnt can be used to detect nabs in sera resulting from immunizations, in addition to nabs in human, nhp, and mouse sera resulting from sars-cov- infections. to address the need for high-throughput, rapid quantification of infectious sars-cov- , our group has developed a focus forming assay (ffa) for sars-cov- using vero who cells. the strength of the ffa is the rapid visualization of individual foci forming from a single infectious unit or focus forming unit (ffu). the ffa for sars-cov- can be developed in as little as hours, shorter relative to traditional plaque assays for human coronaviruses which can take - days [ , , ] . the focus forming assay is also amenable to a -well plate format, allowing for assays to be scaled up or automated to handle large volumes of samples quickly relative to assays requiring plates with wells or fewer. automating the quantification of foci using equipment such as a ctl machine can also streamline the process of screening large numbers of samples. one potential disadvantage of the focus forming assay is the requirement of a sars-cov- specific antibody as the primary antibody for foci immunostaining. however, for our assays we have found that polyclonal guinea pig serum provides reproducible staining with minimal background when used at the appropriate concentrations, and numerous human monoclonal antibodies are now commercially available and suitable for this purpose [ ] [ ] [ ] . 
in regard to the focus forming assay development, we initially hypothesized that the absence of in vero e cells would make them more susceptible to sars-cov- infection and therefore a more sensitive choice of cell line for the focus forming assay. surprisingly, we found that vero who cells were more suited to foci formation. it is worth noting that other labs have shown that a higher titer and larger, clearer plaques result when vero e cells are used in place of vero who cells when performing plaque-assay based titrations with sars-cov- [ ]. this may reflect differences between the wuhan clinical isolate used in this study as opposed to the usa- wa / isolate or this may be an artifact of the focus forming assay. because we find that by qpcr, genome copy number is typically highest in vero e cells, we hypothesize that more defective or non-infectious virus results from replication in vero e cells. additionally, high levels of genome replication in vero e may not correlate with ability to spread laterally in cell culture and form foci. the discrepancy in sars-cov- replication in these two cell lines warrants further study. our understanding of the impact of cell type on sars-cov- entry, replication, assembly, and egress is in its infancy. these gaps in our knowledge were recently made evident by the use of chloroquine and hydroxychloroquine-widely used anti-malarial drugs that create suboptimal to advance our understanding of the sars-cov- life cycle in susceptible cell types, we generated multi-step growth curves for a variety of human, simian, and rodent cell types. in most cases, viral replication peaked at hpi in susceptible cell lines and this cell-associated virus was maintained for the duration of the experiment. in many cases, however, the presence of virus in the supernatant did not peak until - hpi. 
in the context of the viral replication cycle, our data suggest that genome replication in vitro peaks after just hours; however, assembly and egress from infected cells may take as long as - hpi. while there are conflicting reports concerning the suitability of huh cells for sars-cov- studies [ , ], we observed a striking discrepancy between cell-associated virus within the total rna and the virus detected within the cell supernatant. as much as . × copies/µl of cell-associated virus was detectable in rna isolated from huh cells, but virtually no detectable virus was found in the cell supernatant. this suggested to us that viral entry (and hence the production of cell-associated virus within the total rna fraction) was independent of successful viral egress. this trend did not hold for rig-i-deficient huh . cells [ ] , suggesting that viral egress is interferon (ifn) sensitive. this observation is in accordance with previous studies describing sars-cov- sensitivity to type i ifns [ , ] . further studies are warranted to determine what factors are necessary and sufficient for viral egress and could therefore serve as potential therapeutic targets. cov- infection, pathogenesis, and possibly transmission [ , [ ] [ ] [ ] , which may reflect differing susceptibility of hamster cell types based on anatomical location of the isolated cells. we did not observe a strong correlation between ace and viral spike protein levels, nor did we see a strong relationship between viral genome copy and ace mrna level. our results suggest that host cell susceptibility to sars-cov- infection is more complicated than ace expression alone, thus warranting further investigation. we have shown by elisa that convalescent sera from humans, non-human primates, and mice infected with sars-cov- can bind to the rbd of sars-cov- .
while the s subunit of the sars-cov- spike protein is highly conserved among betacoronaviruses, previous studies have shown that the rbd within the spike protein of sars-cov- is unique [ , ] . in serological studies of sars-cov- , the presence of antibodies binding sars-cov- rbd is considered the most sensitive and specific indicator of previous sars-cov- exposure. these results increase our confidence that these polyclonal sera samples contain neutralizing antibodies that are specific to sars-cov- , rather than antibodies that cross-react owing to prior coronavirus exposure, as has been called into question by some [ , , ] . as other labs have noted [ ], we observed that binding to sars-cov- rbd appears to correlate with neutralization capacity, as human samples with a high auc for rbd binding by elisa also had lower ec values, indicating that low concentrations of sera from these patients were sufficient to neutralize % of a standardized amount of virus. we have shown that each of these samples can effectively neutralize sars-cov- in vitro and that neutralization can be with µl per well of diluted samples. this plate containing sample dilution on the cell monolayer was placed in an incubator at °c, % co for hour. a solution of % methylcellulose (sigma-m - g) in × pbs was made in advance of the assay and stored at °c until ready to use. on the day of the assay and during the one-hour infection period, % methylcellulose was diluted : in % dmem and placed on a rocker to mix. the % methylcellulose-media mixture (hereafter referred to as overlay media) was stored at room temperature until ready to use. after the one-hour infection period, the well plate containing sample dilution and cell monolayer was removed from the incubator. overlay was added to the plate by adding µl of overlay media to each well. this step limits the uncontrolled spread of virus throughout the monolayer, which would otherwise make it difficult to distinguish individual foci.
after the addition of overlay media, the plate was returned to an incubator at °c, % co for hours. the plate was removed from the incubator and the media containing the overlay and sample was aspirated off. one wash with µl of × pbs per well was performed, taking care not to disrupt the cell monolayer. µl per well of % pfa in pbs was added for the fixing step. with the % pfa still in the plate, the plate was submerged in a bath of % formalin buffered phosphate (fisher: sf - ) in × pbs for minutes at room temperature. after minutes, the plate was removed from the formaldehyde bath and the % pfa was removed from the monolayer. one wash with µl of × pbs (tissue culture grade) per well was performed. the plate was submerged in a bath of × pbs to rinse and removed from bsl- containment. foci were visualized by an immunostaining protocol. the -well plate was first washed twice with µl per well of ffa wash buffer ( × pbs, . % triton x- ). the primary antibody consisted of polyclonal anti-sars-cov- guinea pig sera (bei: nr- ) and was diluted : , with ffa staining buffer ( × pbs, mg/ml saponin (sigma: )). then, µl per well of primary antibody was allowed to incubate for hours at room temperature or at °c overnight. the -well plate was then washed three times with µl per well of ffa wash buffer. the secondary antibody consisted of goat anti-mouse conjugated horseradish peroxidase (sigma: a- ) diluted : , in ffa staining buffer. similarly, µl per well of secondary antibody was allowed to incubate for hours at room temperature or at °c overnight. the plate was washed three times with µl per well of ffa wash buffer. finally, µl per well of kpl trueblue hrp substrate was added to each well and allowed to develop in the dark for - minutes, or until blue foci were visible. the reaction was then quenched by two washes with millipore water, and the plate was imaged immediately thereafter with a ctl machine to quantify foci.
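Once foci have been counted, the titer of the original sample is back-calculated from the dilution and the inoculum volume. The sketch below shows that arithmetic only; the function names and the countable-range bounds (`min_foci`, `max_foci`) are hypothetical, and the volumes must be taken from the protocol because they are not legible in the text above.

```python
def ffu_per_ml(foci_count, dilution_factor, inoculum_volume_ul):
    """Titer of the undiluted sample in focus forming units per ml.

    dilution_factor is the reciprocal dilution of the counted well
    (1000 for a 1:1000 dilution); inoculum_volume_ul is the volume of
    diluted sample placed on the monolayer, in microliters.
    """
    if dilution_factor <= 0 or inoculum_volume_ul <= 0:
        raise ValueError("dilution factor and inoculum volume must be positive")
    return foci_count * dilution_factor / (inoculum_volume_ul / 1000.0)


def mean_titer(wells, inoculum_volume_ul, min_foci, max_foci):
    """Average titer over wells whose foci count lies in the countable range.

    wells is a list of (foci_count, dilution_factor) pairs. Wells with too
    few or too many foci are skipped; None is returned if no well qualifies.
    """
    titers = [
        ffu_per_ml(count, dilution, inoculum_volume_ul)
        for count, dilution in wells
        if min_foci <= count <= max_foci
    ]
    return sum(titers) / len(titers) if titers else None
```

Restricting the average to a countable range is a common convention for plaque and focus assays, since crowded wells undercount merged foci and sparse wells are dominated by sampling noise; the bounds themselves are an assumption to be set per plate format.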
briefly, sera samples were diluted : in % dmem and added to the topmost row of a round-bottom -well plate. sera samples were then serially diluted : down the remainder of the plate in % dmem. an equal volume of sars-cov- diluted to ~ ffu/ml (~ ffu/ µl) was then added to the serially diluted sample, mixed thoroughly, and allowed to incubate at °c for hour. then, µl of the sars-cov- and sera mixture was transferred to a vero who cell monolayer (as described in the focus forming assay). from this point, the assay was as described in the focus forming assay section. binding of human polyclonal sera to recombinant sars-cov proteins was determined by elisa. µl per well of a µg/ml solution of recombinant protein in carbonate buffer ( . m na co . m nahco ph . ) was used to coat maxisorp (thermofisher) -well plates overnight at °c. plates were blocked with blocking buffer (pbs + % bsa + . % tween) for hours at room temperature the following day and washed four times with wash buffer. polyclonal sera were serially diluted in blocking buffer prior to plating. sera were allowed to incubate for hour at room temperature, and plates were then washed four times with wash buffer. following the one-hour incubation, hrp-conjugated goat anti-human igg secondary antibody (sigma; : ) was added and allowed to incubate for hour at room temperature. the plate was washed again four times with wash buffer and the elisa was visualized with µl per well of tmb enhanced substrate (neogen diagnostics), which was allowed to develop in the dark for minutes. a solution of n hcl was used to quench the reaction and the plate was read for optical density at nm on a biotek epoch plate reader. the total peak area under the curve (auc) was calculated using graphpad prism . isolation of rna from cell culture and culture supernatants rna was isolated from cell culture and supernatant using an invitrogen purelink rna mini kit according to the manufacturer's instructions.
rt-qpcr hace expression was measured by qrt-pcr using taqman primer and probe sets from idt (assay id hs.pt. . ). sars-cov- viral burden was measured by qrt-pcr using taqman primer and probe sets from idt with the following sequences: were allowed to infect cell monolayer for hour at °c, % co before overlay with % methylcellulose in % dmem. following a hour incubation at °c, % co , cells were fixed in a solution of % paraformaldehyde diluted in × pbs. foci were visualized via immunostaining with polyclonal guinea pig anti-sars-cov- sera and goat anti-guinea pig conjugated hrp. kpl neutralization capacity was measured by mixing serially diluted sera with a standardized amount of virus (~ ffu) and allowed to incubate for hour at °c, % co before allowed to infect a cell monolayer as described in figure non-human primate, and mouse sera samples previously described. the incubation period of coronavirus disease publicly reported confirmed cases: estimation and application. annals of internal medicine early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia identification of a novel coronavirus causing severe pneumonia in human: a descriptive study a new coronavirus associated with human respiratory disease in china. 
nature a pneumonia outbreak associated with a new coronavirus of probable bat origin the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- the proximal origin of sars-cov- the molecular biology of coronaviruses the m, e, and n structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles structural basis for the recognition of sars-cov- by full-length human ace sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor angiotensin-converting enzyme is a functional receptor for the sars coronavirus sars-cov- reverse genetics reveals a variable infection gradient in the respiratory tract sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues specific ace expression in small intestinal enterocytes may cause gastrointestinal symptoms and injury after -ncov infection structure, function, and antigenicity of the sars-cov- spike glycoprotein tmprss and tmprss promote sars-cov- infection of human small intestinal enterocytes syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response neutralizing antibodies correlate with protection from sars-cov- in humans during a fishery vessel outbreak with high attack rate. medrxiv neutralizing antibody responses to sars-cov- in a covid- recovered patient cohort and their implications. medrxiv, : p. . . quantifying antibody kinetics and rna shedding during early-phase sars-cov- infection antibody responses to sars-cov- in patients with covid- the epidemiology and clinical information about covid- . 
european journal of clinical microbiology & infectious diseases : official publication of the european society of clinical microbiology clinical characteristics of coronavirus disease in china the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak sars-cov- and viral sepsis: observations and hypotheses. the lancet clinical findings in a group of patients infected with the novel coronavirus (sars-cov- ) outside of wuhan, china: retrospective case series the receptor-binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in sars-cov- patients propagation, quantification, detection, and storage of west nile virus development and optimization of a direct plaque assay for human and avian metapneumoviruses enhanced isolation of sars-cov- by tmprss -expressing cells isolation and characterization of sars-cov- from the first us covid- patient sars-coronavirus- replication in vero e cells: replication kinetics, rapid adaptation and cytopathology mrna induced expression of human angiotensin-converting enzyme in mice for the study of the adaptive immune response to severe acute respiratory syndrome coronavirus . biorxiv syrian hamsters as a small animal model for sars-cov- infection and countermeasure development mouse model of sars-cov- reveals inflammatory role of type i interferon signaling. biorxiv a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies plaque assay for human coronavirus nl using human colon carcinoma cells two detailed plaque assay protocols for the quantification of infectious sars-cov- . current protocols in microbiology potently neutralizing and protective human antibodies against sars-cov- potently neutralizing human antibodies that block sars-cov- receptor binding and protect animals. biorxiv, : p. . . . 
rapid isolation and profiling of a diverse panel of human monoclonal antibodies targeting the sars-cov- spike protein hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov- infection in vitro remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro in vitro antiviral activity and projection of optimized dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus (sars-cov- ) a randomized trial of hydroxychloroquine as postexposure prophylaxis for covid- chloroquine does not inhibit infection of human lung cells with sars- cov- proteolytic processing of middle east respiratory syndrome coronavirus spikes expands virus tropism regulating intracellular antiviral defense and permissiveness to hepatitis c virus rna replication through a cellular rna helicase, rig-i sars-cov- is sensitive to type i interferon pretreatment. biorxiv : the preprint server for biology antiviral activities of type i interferons to sars-cov- infection. antiviral research pathogenesis and transmission of sars-cov- in golden hamsters massive transient damage of the olfactory epithelium associated with infection of sustentacular cells by sars-cov- in golden syrian hamsters. brain, behavior, and immunity surgical mask partition reduces the risk of non-contact transmission in a golden syrian hamster model for coronavirus disease (covid- ) a serological assay to detect sars-cov- seroconversion in humans cross-reactivity towards sars-cov- : the potential role of low-pathogenic human coronaviruses. 
the lancet microbe human neutralizing antibodies elicited by sars-cov- infection key: cord- -wtmjt hf authors: zha, lisha; zhao, hongxin; mohsen, mona o.; hong, liang; zhou, yuhang; li, zehua; yao, chuankai; guo, lijie; chen, hongquan; liu, xuelan; chang, xinyue; zhang, jie; li, dong; wu, ke; vogel, monique; bachmann, martin f; wang, junfeng title: development of a covid- vaccine based on the receptor binding domain displayed on virus-like particles date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wtmjt hf the recently emerging disease covid- is caused by the new sars-cov- virus first detected in the city of wuhan, china. from there it has been rapidly spreading inside and outside china. with initial death rates around %, covid- patients at longer distances from wuhan showed reduced mortality as was previously observed for the sars coronavirus. however, the new coronavirus spreads more strongly, as it sheds long before onset of symptoms or may be transmitted by people without symptoms. rapid development of a protective vaccine against covid- is therefore of paramount importance. here we demonstrate that recombinantly expressed receptor binding domain (rbd) of the spike protein homologous to sars binds to ace , the viral receptor. highly repetitive display of rbd on immunologically optimized virus-like particles derived from cucumber mosaic virus resulted in a vaccine candidate (rbd-cumvtt) that induced high levels of specific antibodies in mice which were able to block binding of spike protein to ace and potently neutralized the sars-cov- virus in vitro.
covid- is caused by a novel coronavirus closely related to viruses causing sars and mers. like the diseases caused by the other two viruses, covid- mainly manifests symptoms in the lung and causes cough and fever . the disease covid- is less severe than sars and mers, which is beneficial per se but leads to easier and wider spread of the virus, in particular due to infected individuals with very few symptoms ("spreaders") and a long incubation time (up to weeks) combined with viral shedding long before disease onset . a vaccine with rapid onset of protection is therefore in high demand for the control of the pandemic that is currently taking its course. the spike protein of covid- is highly homologous to the spike protein of sars and both viruses share the same receptor, which is angiotensin converting enzyme (ace ) , . the receptor binding domain (rbd) of the sars spike protein binds to ace and is an important target for neutralizing antibodies [ ] [ ] [ ] . by analogy, the rbd of covid- spike protein may be expected to similarly be the target of neutralizing antibodies, blocking the interaction of the virus with its receptor. we have previously shown that antigens displayed on virus-like particles (vlp) induce high levels of antibodies in all species tested, including humans . more recently, we have developed an immunologically optimized vlp platform based on cucumber mosaic virus. these cumvtt vlps (hereafter cumvtt) incorporate a universal t cell epitope derived from tetanus toxin providing pre-existing t cell help.
in addition, during the production process these vlps package bacterial rna, which is a ligand for toll-like receptor / and serves as a potent adjuvant . using antigens displayed on these vlps, it was possible to induce high levels of specific antibodies in mice, rats, cats, dogs and horses and treat diseases such as atopic dermatitis in dogs or insect bite hypersensitivity in horses [ ] [ ] [ ] . to generate a covid- vaccine candidate, we therefore attempted to display the rbd on cumvtt (fig. a) . to this end we gene-synthesized the covid- rbd and fused it to an fc molecule for better expression. as expected, the protein bound efficiently to the viral receptor ace as determined by sandwich elisa (fig. b) . in the next step, the protein was chemically coupled to the surface of cumvtt using the well-established chemical cross-linkers sata and smph (ref. ). sds-page and western blotting confirmed efficient coupling of the rbd-fusion molecule to cumvtt, resulting in the vaccine candidate rbd-cumvtt (fig. c,d) . to test immunogenicity of the vaccine candidate, mice were immunized three times (weekly schedule) with the rbd-fusion molecule alone or conjugated to the surface of cumvtt formulated in montanide adjuvants. as shown in fig. a -c, coupling to vlps dramatically increased the immunogenicity of the rbd. as shown by elisa on recombinant rbd, rbd-cumvtt showed strongly increased immunogenicity at all time-points tested (one week after the vaccine injection time-points). to assess the potential for anti-viral activity, we tested whether the induced antibodies were able to block binding of the rbd protein to the viral receptor ace . as shown in fig. , immune sera obtained after two boosts (day ) were able to strongly inhibit rbd binding to ace . the best correlate of protection is viral neutralization. to assess neutralization, we generated pseudotyped retroviruses expressing the sars-cov- spike protein and luciferase for quantification of infection (fig. a ).
using these viruses, the neutralizing capacity of the sera from immunized mice was assessed on ace -transfected cells (fig. b) , directly demonstrating high anti-viral neutralizing activity of the induced antibodies. hence, the rbd-cumvtt vaccine candidate is able to induce high levels of sars-cov- neutralizing antibodies. furthermore, the cumvtt-based vaccine relies on highly efficient expression systems and chemical conjugation technologies, rendering it an attractive candidate for large scale production under cgmp. previous studies with a similar vlp-based conjugate vaccine have demonstrated that high levels of specific antibodies can be mounted within a week , (see also fig. a) , offering the additional possibility of rapidly immunizing individuals who have been exposed to infected humans or those who are kept in quarantine. thus, vaccines based on the sars-cov- rbd displayed on vlps may have the potential to critically interfere with global spread of the virus. the sars-cov- receptor-binding domain (rbd) and the n-terminal peptidase domain of human ace were expressed using f cells (invitrogen). the sars-cov- rbd (residues arg -phe ) with an n-terminal il- signal peptide for secretion and a c-terminal fc tag for purification was inserted into pfuse-migg -fc vector (invitrogen). the construct was transformed into bacterial dh α competent cells, and the extracted plasmid was then transfected into f cells at a density of × cells/ml using pei (invitrogen). the cell culture supernatant containing the secreted rbd was harvested h after transfection, concentrated and buffer-exchanged to hbs ( mm hepes, ph . , mm nacl). rbd was captured by protein a resin (ge healthcare) and eluted with gly-hcl buffer ph . . fractions containing rbd were collected and neutralized to ph . with m tris. for elisa coating, ace was cleaved from the fc part using thrombin as described in the manufacturer's manual.
the human ace (residues ser -ser ) with an n-terminal il- signal peptide for secretion and a c-terminal ×his tag for purification was inserted into pfuse-vector (invitrogen). the human ace was expressed by essentially the same protocol used for the sars-cov- rbd. ace was captured by ni-nta resin (ge healthcare) and eluted with mm imidazole in hbs buffer. ace was then purified by gel filtration chromatography using the superdex column (ge healthcare) pre-equilibrated with hbs buffer. fractions containing ace were collected. the antibody competitive binding activities of the serum were assayed by elisa. ace ( µg/ml) was incubated in -well plate overnight at °c. after incubation, the plate was blocked with % bsa for h at °c and then washed five times with pbs containing . % tween . bsa was used as a negative control. a mixture of -fold diluted serum and rbd-mfc ( . µg/ml) was then added, followed by incubation for min with gentle shaking at °c. plates were washed five times with pbs containing . % tween (pbt), followed by the addition of µl of horseradish peroxidase/anti-mfc antibody conjugate (diluted : in pbt buffer) and incubation for min with gentle shaking. plates were washed five times with pbt buffer and developed with µl of freshly prepared , ', , '-tetramethylbenzidine (tmb) substrate. the reaction was stopped with µl of . m h po and read spectrophotometrically at nm in a microtiter plate reader. the production of cumvtt was described in detail in zeltins et al. briefly, e coli c cells the rbd was conjugated to cumvtt using the cross-linker succinimidyl -(beta-maleimidopropionamido) hexanoate (smph) (thermo fisher scientific, -molar excess, minutes, °c). the coupling reactions were performed with . x molar excess of rbd, . x rbd, or equal molar amount of rbd regarding the cumvtt (shaking at °c for hours at rpm on dsg titertek; flow laboratories, irvine, united kingdom). unreacted smph and rbd proteins were removed using amicon-ultra . , k (merck-millipore, burlington, mass).
vlp samples were centrifuged for minutes at , rpm for measurement on nd- . coupling efficiency was calculated by densitometry (as previously described for il a-cumvtt vaccine ), with a result of approximately % to %. pseudovirus expressing the sars-cov- spike protein was produced by lentivirus second- the t-ace cells which stably express ace receptors on the cell membrane were prepared by transfection of ace gene into t cells using lentivirus system. pseudoviruses prepared above were added to the t-ace cells ( × cells/well) with μl polybrene ( μg/ml). after h, the infection was monitored using the luciferase assay system (promega). titer was calculated based on serial dilutions of pseudovirus. the mouse serum samples ( μl) were diluted to : , : , : , : and : respectively, and then mixed with an equal volume of pseudovirus stock. after incubation at °c for h, the mixture was inoculated on the t-ace cells ( × cells/well). at the same time, pseudovirus+dmem medium was set as a positive control and dmem medium only was set as a negative control. after the cells were incubated for hours, serum neutralization was measured by luciferase activity of infected pseudovirus. a cut-off of > % was used to determine neutralizing titer.
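The luciferase read-out with its two controls can be turned into a neutralizing titer roughly as follows. This is a sketch under stated assumptions rather than the authors' calculation: the function names are hypothetical, the cut-off is a parameter because its value is not legible above, and the titer is taken as the highest reciprocal dilution whose inhibition exceeds that cut-off.

```python
def percent_inhibition(rlu_sample, rlu_virus_only, rlu_cells_only):
    """Percent reduction in luciferase signal caused by the serum.

    rlu_virus_only is the pseudovirus + medium well (positive control, 0 %
    inhibition); rlu_cells_only is the medium-only well (negative control,
    100 % inhibition).
    """
    signal_window = rlu_virus_only - rlu_cells_only
    if signal_window <= 0:
        raise ValueError("positive control must exceed the background signal")
    return 100.0 * (1.0 - (rlu_sample - rlu_cells_only) / signal_window)


def neutralizing_titer(dilution_series, cutoff_percent):
    """Highest reciprocal dilution whose inhibition exceeds the cut-off.

    dilution_series is a list of (reciprocal_dilution, percent_inhibition)
    pairs. Returns None if no dilution passes the cut-off.
    """
    passing = [dilution for dilution, inhibition in dilution_series
               if inhibition > cutoff_percent]
    return max(passing) if passing else None
```

Subtracting the medium-only background before normalizing keeps the inhibition scale anchored at both ends, so a serum that fully blocks entry reads as 100 % even when the reader reports a nonzero baseline.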
clinical characteristics of coronavirus disease in china characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a cases from the chinese center for disease control and prevention angiotensin-converting enzyme is a functional receptor for the sars coronavirus composition and divergence of coronavirus spike proteins and host ace receptors predict potential intermediate hosts of sars-cov- a -amino acid fragment of the sars coronavirus s protein efficiently binds angiotensin-converting enzyme an efficient method to make human monoclonal antibodies from memory b cells: potent neutralization of sars coronavirus receptor-binding domain of sars-cov spike protein induces highly potent neutralizing antibodies: implication for developing subunit vaccine therapeutic vaccines for chronic diseases: successes and technical challenges incorporation of tetanus-epitope into virus-like particles achieves vaccine responses even in older recipients in models of psoriasis, alzheimer's and cat allergy vaccination against il- for the treatment of atopic dermatitis in dogs treating insect-bite hypersensitivity in horses with active vaccination against il- pseudotyped lentiviral vectors: one vector, many guises. hum interaction of viral capsidderived virus-like particles (vlps) with the innate immune system vaccine against peanut allergy based on engineered virus-like particles displaying single major peanut allergens generation of high-titer pseudotyped lentiviral vectors key: cord- -zl g wqo authors: sato, taku; ueha, rumi; goto, takao; yamauchi, akihito; kondo, kenji; yamasoba, tatsuya title: expression of ace and tmprss proteins in the upper and lower aerodigestive tracts of rats date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: zl g wqo objective patients with coronavirus disease (covid- ), caused by severe acute respiratory syndrome coronavirus (sars-cov- ), exhibit not only respiratory symptoms but also symptoms of chemo-sensitive disorders and kidney failure. cellular entry of sars-cov- depends on the binding of its spike protein to a cellular receptor named angiotensin-converting enzyme (ace ), and the subsequent spike protein-priming by host cell proteases, including transmembrane protease serine (tmprss ). thus, high expression of ace and tmprss is considered to enhance the invading capacity of sars-cov- . methods to elucidate the underlying histological mechanisms of the aerodigestive disorders caused by sars-cov- , we investigated the expression of ace and tmprss proteins in the aerodigestive tracts of the tongue, hard palate with partial nasal tissue, larynx with hypopharynx, trachea, esophagus, lung, and kidney of rats through immunohistochemistry. results strong co-expression of ace and tmprss proteins was observed in the nasal respiratory epithelium, trachea, bronchioles, alveoli, kidney, and taste buds of the tongue. remarkably, tmprss expression was much stronger in the peripheral alveoli than in the central alveoli. these results coincide with the reported clinical symptoms of covid- , such as the loss of taste, loss of olfaction, respiratory dysfunction, and acute nephropathy. conclusions a wide range of organs have been speculated to be affected by sars-cov- depending on the expression levels of ace and tmprss . the differential distribution of tmprss in the lung indicates that covid- symptoms may be exacerbated by tmprss expression. this study might provide potential clues for further investigation of the pathogenesis of covid- . level of evidence na the coronavirus disease , driven by the novel severe acute respiratory syndrome coronavirus (sars-cov- ), has been spreading explosively worldwide since it was first reported in december .
viral loads of sars-cov- have been found to be high in the upper respiratory tract, especially in the nasopharynx , whereas that of sars-cov had been reported to be high in the lower respiratory tract . the symptoms of covid- in patients are mostly similar to those of other respiratory infections, including that of sars: fever ( - %), cough ( - %), fatigue ( - %), sore throat ( - %), and sputum ( - %) , . moreover, dysfunction of senses, such as loss or change of taste (ageusia or dysgeusia) and loss of smell (anosmia), has been consistently reported as a unique clinical feature of this disease ( - %) , . angiotensin-converting enzyme (ace ) is known to be the receptor responsible for the cellular entry of coronaviruses, including sars-cov- , . once the viral spike (s) protein binds to ace , it is primed by the transmembrane protease serine (tmprss ) of host cells, thereby facilitating viral entry . thus, high expression of ace and tmprss may be considered to enhance the invasion of sars-cov- . previous genetic studies had demonstrated high expression of ace on the human oral mucosa, including tongue epithelium , and co-expression of ace and tmprss in both human nasal and bronchial epithelia . however, histological studies have been limited. based on the aforementioned aspects, this study aimed to elucidate the underlying histological mechanisms of the upper and lower aerodigestive disorders by studying the expression of ace and tmprss on the aerodigestive tracts of rats. tissue samples were obtained from rats examined in previous studies , . five eight-week-old male sprague dawley rats were included in the control group. the following paraffin-embedded tissues were collected: the tongue, hard palate with partial nasal tissue, larynx with hypopharynx, trachea, esophagus, lung, and kidney. all animal experiments were conducted in accordance with institutional guidelines and with the approval of the animal care and use committee of the university of tokyo (no. 
p - , p - ). to detect the expression of ace and tmprss in the upper and lower airway, histological analysis was performed by immunostaining. four-micron-thick serial paraffin sections were deparaffinized in xylene and dehydrated in ethanol before immunostaining. prior to immunostaining, deparaffinized sections were treated with % hydrogen peroxide to block endogenous peroxidase activity, and then incubated in blocking one (nacalai tesque, kyoto, japan) to block any non-specific binding to immunoglobulin. after antibody activation, primary antibodies against ace ( : dilution; rabbit monoclonal, abcam, ab ; cambridge, uk) and tmprss ( : dilution; rabbit monoclonal, abcam, ab ; cambridge, uk) were detected with appropriate peroxidase-conjugated secondary antibodies and a diaminobenzidine substrate. images of all sections were captured using a digital microscope (keyence bz-x ) with × and × objective lenses. in the tongue, both ace and tmprss proteins were strongly co-expressed in the taste buds of foliate papillae (fig. a , b). in the nasal cavity, the olfactory neuroepithelium showed sparse ace -positive cells, and remarkable tmprss immunostaining in the cytoplasm. in the villous brush border of the respiratory columnar epithelium, ace and tmprss were highly co-expressed (fig. c, d) . in the palate, ace and tmprss were weakly and diffusely co-expressed in the squamous epithelium (fig. e ). in the epithelium of the larynx, tmprss expression was explicit, whereas ace staining was weak and sparse. this tendency was similarly observed in the whole larynx from epiglottis to the subglottis (fig. ). in the trachea, the cilia of primary tracheal epithelial cells showed high expression of ace and tmprss . in addition, tmprss was abundantly present in the cytoplasm of ciliated columnar epithelium (fig. a, b) . 
in the lung, consistent with findings in the trachea, ace and tmprss were strongly expressed in the epithelium of the bronchioles, in alveolar epithelial cells, and in capillary endothelium. remarkably, tmprss -positive cells were more abundant in the peripheral alveoli than in the central part of the lung (fig. a, b). in the digestive tract, ace was strongly expressed in the superficial layer of the epithelium of the hypopharynx and esophagus, whereas the cytoplasm of the squamous epithelium was only weakly positive for tmprss (fig. a, b, c). in the kidney, ace staining was strong in the brush border of proximal tubular cells, weak in the cytoplasm of proximal and distal tubular cells, and negative in the glomeruli. expression of tmprss was strong in the cytoplasm of proximal tubular cells, weak in the distal tubular cells, and negative in the glomeruli (fig. c, d). in this study, we reported the immunolocalization of ace and tmprss , which are considered to play a pivotal role in the manifestation of covid- . co-expression of ace and tmprss may induce and enhance the invasion of sars-cov- into the organs. notably, strong co-expression of ace and tmprss proteins was observed in the taste buds of the tongue, nasal respiratory epithelium, trachea, bronchioles, alveoli, and the kidney. remarkably, tmprss expression was much stronger in the peripheral alveoli than in the central alveoli. in the upper and lower respiratory tracts, the whole epithelium showed expression of ace to a certain extent, whereas tmprss expression varied across sites. these results coincide with the reported clinical symptoms of covid- , such as loss of taste, loss of olfaction, respiratory dysfunction, and acute nephropathy. the strong co-expression of ace and tmprss in the taste buds may explain the high incidence of ageusia or dysgeusia in patients with sars-cov- infection.
although gene expression of ace in the tongue had been examined previously , the present study is the first to reveal co-expression of ace and tmprss proteins in the taste buds. olfaction is also impaired by sars-cov- . the current study revealed the presence of both ace and tmprss in the olfactory neuroepithelium and respiratory epithelium, thus providing the first step in understanding the pathogenesis of olfactory impairment due to sars-cov- infection. future studies to elucidate the mechanisms of sensorineural olfactory loss in covid- might provide novel insights into sars-cov- -related olfactory dysfunction. the lung is an organ highly susceptible to sars-cov- infection. while ace is moderately expressed in the bronchial epithelium and in type pneumocytes, tmprss is strongly expressed in the cytoplasm of bronchioles and alveolar epithelial cells . although ace was found to exist on alveolar epithelial cells at approximately similar level as in the whole lung, the expression level of tmprss protein was considerably different between the peripheral and central parts of the lung. considering the peripheral parts of the lung to strongly express tmprss , along with ace , sars-cov- may be considered to damage the peripheral area at the very beginning of infection. thus, the present study could histologically resolve why chest ct reveals consolidation and ground glass opacities in the bilateral peripheral lobes in confirmed covid- cases . the tracheal epithelium, as well as nasal respiratory epithelium, was found to express both ace and tmprss . unlike that in the nasal respiratory epithelium, tmprss was more strongly expressed in the cytoplasm of tracheal epithelium, hence suggesting the trachea to be more prone to developing clinical symptoms after sars-cov- infection than the nasal tissue. the palate, pharynx, and esophagus are similarly lined by squamous epithelium. 
however, ace and tmprss expression was found to be quite different across the organs in the present study. the palate displayed weak cytoplasmic staining of ace /tmprss in its epithelium, whereas the hypopharynx and esophagus had high expression of ace and weak expression of tmprss in the epithelium. weak-to-moderate co-expression of ace /tmprss in the squamous mucosa of the oropharynx and esophagus may explain why oral and throat symptoms are infrequent in patients with covid- and rarely become severe. regarding ace expression in the vocal cords, previous studies had provided conflicting results; while one study asserted lack of ace expression in the vocal cords, the other demonstrated ace expression on the surface epithelium; the present study supported the latter. as for limitations of the present study, the evaluation was performed on young rats, which may not correctly reflect the backgrounds of covid- in humans. future clinical case studies and autopsy studies might strengthen this study. this study demonstrated the co-expression of ace and tmprss by cells in the tongue and nasal mucosa, as well as by those in the epithelium of the larynx, trachea, and bronchus. a wide range of tissues and organs are speculated to be affected by sars-cov- , depending on the expression levels of ace and tmprss . the differential distribution of tmprss in the lung suggests that covid- symptoms may be exacerbated by tmprss expression. b: in the trachea, cilia of the epithelial cells were weakly stained for ace while the cells were strongly stained for tmprss . c: in the esophagus, the surface of squamous epithelium was distinctly stained for ace . positive staining for tmprss was weakly present in the cytoplasm of the epithelium and clearly so in the esophageal muscles.
first two months of the coronavirus disease (covid- ) epidemic in china: real-time surveillance and evaluation with a second derivative model sars-cov- viral load in upper respiratory specimens of infected patients virological assessment of hospitalized patients with covid- clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china clinical features of patients infected with novel coronavirus in wuhan olfactory and gustatory dysfunctions as a clinical presentation of mild-to-moderate forms of the coronavirus disease (covid- ): a multicenter european study. european archives of oto-rhino-laryngology: official journal of the european federation of oto-rhino-laryngological societies (eufos) : affiliated with the german society for oto-rhino-laryngology -head and neck surgery objective evaluation of anosmia and ageusia in covid- patients: single-center experience on cases angiotensin-converting enzyme is a functional receptor for the sars coronavirus a new coronavirus associated with human respiratory disease in china citation classics: most-cited articles from archives of pm&r. archives of physical medicine and rehabilitation tissue distribution of ace protein, the functional receptor for sars coronavirus. a first step in understanding sars pathogenesis high expression of ace receptor of -ncov on the epithelial cells of oral mucosa laryngeal mucus hypersecretion is exacerbated after smoking cessation and ameliorated by glucocorticoid administration acute inflammatory response to contrast agent aspiration and its mechanisms in the rat lung enhanced isolation of sars-cov- by tmprss -expressing cells association of chemosensory dysfunction and covid- in patients presenting with influenza-like symptoms. 
international forum of allergy renal involvement and early prognosis in patients with covid- pneumonia influenza and sars-coronavirus activating proteases tmprss and hat are expressed at multiple sites in human respiratory and gastrointestinal tracts frequency and distribution of chest radiographic findings in covid- positive patients analysis of ace in polarized epithelial cells: surface expression and function as receptor for severe acute respiratory syndrome-associated coronavirus key: cord- -cogk kig authors: zhu, yuanmei; yu, danwei; yan, hongxia; chong, huihui; he, yuxian title: design of potent membrane fusion inhibitors against sars-cov- , an emerging coronavirus with high fusogenic activity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cogk kig the coronavirus disease covid- , caused by emerging sars-cov- , has posed serious threats to global public health and to economic and social stability, calling for the prompt development of therapeutics and prophylactics. in this study, we first verified that sars-cov- uses human ace as a cell receptor and that its spike (s) protein mediates high membrane fusion activity. compared with that of sars-cov, the heptad repeat (hr ) sequence in the s fusion protein of sars-cov- possesses markedly increased α-helicity and thermostability, as well as a higher binding affinity with its corresponding heptad repeat (hr ) site. then, we designed an hr sequence-based lipopeptide fusion inhibitor, termed ipb , which showed highly potent activity in inhibiting the sars-cov- s protein-mediated cell-cell fusion and pseudovirus infection. ipb also inhibited the sars-cov pseudovirus efficiently. moreover, the structure-activity relationship (sar) of ipb was characterized with a panel of truncated lipopeptides, revealing the amino acid motifs critical for its binding and antiviral capacities.
therefore, the results presented here provide important information for understanding the entry pathway of sars-cov- and the design of antivirals that target the membrane fusion step. in late december of , a new infectious respiratory disease emerged in wuhan, china. the pathogen was soon identified as a novel coronavirus (cov) ( - ), which was initially termed covs, a large group of enveloped viruses with a single positive-stranded rna genome, are genetically classified into four genera: α-, β-, γ-, and δ-covs ( , ). the previously known six covs that cause human disease include two α-covs (nl ; e) and four β-covs comprises of s and s subunits and exists in a metastable prefusion conformation. the s subunit, which contains a receptor-binding domain (rbd) capable of independent functional folding, is responsible for virus binding to the cell surface receptor. a recent study suggested that the ace -binding affinity of the rbd of sars-cov- is up to -fold higher than that of sars-cov, which may contribute to the significantly increased infectivity and transmissibility ( ). receptor binding is deemed to trigger large conformational changes in the s complex, which destabilize the prefusion trimer, resulting in shedding of the s subunit, and activate the fusogenic activity of the s subunit ( - ). as illustrated in fig. , the sequence structure of s contains an n-terminal fusion peptide (fp), heptad repeat (hr ), heptad repeat (hr ), transmembrane region (tm), and cytoplasmic tail (ct). during the fusion process, the fp is exposed and inserts into the target cell membrane, leaving s in a prehairpin intermediate that bridges the viral and cell membranes; then, three hr segments self-assemble into a trimeric coiled-coil and three hr segments fold into the grooves on the surface of the hr inner core, thereby resulting in a six-helical bundle ( -hb) structure that drives the two membranes into close apposition for fusion.
at an early time point, we sought to experimentally verify whether sars-cov- uses human ace as a receptor for cell entry, and thus generated its s protein-pseudotyped lentiviral particles. the sars-cov and vesicular stomatitis virus (vsv-g) pseudoviruses were also prepared for comparison. as shown in fig. a, all three pseudoviruses efficiently infected t cells that stably overexpress ace ( t/ace ); however, the infectivity of sars-cov- and sars-cov dramatically decreased in t cells, which express a low level of endogenous ace . as a virus control, vsv-g pseudovirus entered t cells even more efficiently relative to its infectivity in t/ace cells. we further compared the fusion activity of viral s protein in t and t/ace cells. in both the t and t/ace target cells, we observed that the s protein of sars-cov- had a significantly higher fusion activity than the s protein of sars-cov. therefore, we further compared the fusion activities of viral s proteins at different time points. as shown in fig. c and d, the sars-cov s protein had no appreciable fusion activity until the effector cells and target cells were cocultured for five or six hours; in sharp contrast, the sars-cov- s protein mediated a rapid and robust cell fusion reaction, as indicated by its fusion kinetic curves, especially in t/ace cells. compared with sars-cov, sars-cov- has a hr sequence with nine amino acid substitutions, eight of which are located within the hr core site, whereas the two viruses share a fully identical hr sequence (fig. a). in order to explore the mechanism underlying the high fusion activity of the sars-cov- s protein, we synthesized two peptides corresponding to the hr sequence and their secondary structures were determined by circular dichroism (cd). furthermore, we synthesized a peptide containing the hr sequence, termed ipb , and its interactions with the two hr peptides were analyzed by cd spectroscopy. as shown in fig.
d and e, both the sars np and sars np interacted with ipb to form complexes with typical α-helical structures, with t m values of and °c, respectively. in comparison, the complex formed by sars np and ipb was much more stable than the complex between the sars np and ipb peptides. taken together, these results suggested that sars-cov- might have evolved an increased interaction between the hr and hr domains in the s fusion protein, thus critically determining its high fusogenic activity. we next sought to determine the antiviral functions of the ipb and ipb peptides. first, their inhibitory activities on s protein-mediated cell-cell fusion were examined by the dsp-based cell fusion assay as described above. as shown in fig. a and table (table ) . as expected, ipb and ipb had no inhibitory activity against a control virus (vsv-g), indicating their antiviral specificities. therefore, we conclude that ipb is a highly potent fusion inhibitor of sars-cov- and sars-cov. in contrast, ipb was a c-terminally truncated inhibitor with ipb as a template, but its antiviral function was markedly impaired, underscoring the roles of c-terminal residues in ipb . on the basis of the results above, it was expected that ipb with two terminal truncations was antivirally inactive. indeed, the cd data suggested that both the n- and c-terminal sequences contributed critically to the binding of the inhibitors (table ) where its s protein is cleaved by endosomal cysteine proteases cathepsin b and l (catb/l) and activated ( ). however, sars-cov also employs the cellular serine protease tmprss for s protein priming, and especially, tmprss but not catb/l is essential for viral entry into primary target cells and for viral spread in the infected host ( , - ). it was also found that introducing a furin-recognition site between the s and s subunits could significantly suggesting that it mainly utilizes a plasma membrane fusion pathway for cell entry.
sequence analyses revealed that sars-cov- harbors the s /s cleavage site in its s protein, although its roles in s protein-mediated membrane fusion and viral life-cycle need to be characterized. one can speculate that furin-mediated precleavage at the s /s site in infected cells might *the antiviral assays were repeated three times, and data are expressed as means ± standard deviations. the cd experiment was repeated two times and representative data are shown. nd means "not done" owing to the solubility problem of the peptides in pbs. a new coronavirus associated with human respiratory disease in china a pneumonia outbreak associated with a new coronavirus of probable bat origin china novel coronavirus i, research t. . a novel coronavirus from patients with pneumonia in china coronaviruses post-sars: update on replication and pathogenesis structure, function, and evolution of coronavirus spike proteins cryo-em structure of the -ncov spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion interaction between heptad repeat and regions in spike protein of sars-associated coronavirus: implications for virus fusogenic mechanism and identification of fusion inhibitors the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex synthesized peptide inhibitors of hiv- gp -dependent membrane fusion structure-based discovery of middle east respiratory syndrome coronavirus fusion inhibitor hydrocarbon-stapled short alpha-helical peptides as promising middle east respiratory syndrome mers-cov) fusion inhibitors severe acute respiratory syndrome coronavirus (sars-cov) infection inhibition using spike protein heptad repeat-derived 
peptides heptad repeat-derived peptides block protease-mediated direct entry from the cell surface of severe acute respiratory syndrome coronavirus but not entry via the endosomal pathway identification of a minimal peptide derived from heptad repeat (hr) of spike protein of sars-cov and combination of hr -derived peptides as fusion inhibitors influence of hydrophobic and electrostatic residues on sars-coronavirus s protein stability: insights into mechanisms of general viral fusion and inhibitor design a pan-coronavirus fusion inhibitor targeting the hr domain of human coronavirus spike design and characterization of cholesterylated peptide hiv- / fusion inhibitors with extremely potent and long-lasting antiviral activity lipopeptide hiv fusion inhibitor maintains long-term viral suppression in rhesus macaques exceptional potency and structural basis of a t -derived lipopeptide fusion inhibitor against hiv- , hiv- , and simian immunodeficiency virus structural and functional characterization of membrane fusion inhibitors with extremely potent activity against hiv- , hiv- , and simian immunodeficiency virus inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry contributes to virus spread and immunopathology in the airways of murine models after coronavirus infection simultaneous treatment of human bronchial epithelial cells with serine and cysteine protease inhibitors prevents severe acute respiratory syndrome coronavirus entry protease inhibitors targeting coronavirus and filovirus entry furin cleavage of the sars coronavirus spike glycoprotein enhances cell-cell fusion but does not affect virion entry functional analysis of potential cleavage sites in the mers-coronavirus spike protein proteolytic processing of middle east respiratory syndrome coronavirus spikes expands virus tropism sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor clinical isolates of human coronavirus e 
bypass the endosome for cell entry wild-type human coronaviruses prefer cell-surface tmprss to endosomal cathepsins for cell entry teicoplanin: an alternative drug for the treatment of coronavirus covid- ? teicoplanin potently blocks the cell entry of -ncov remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro compounds with therapeutic potential against novel respiratory coronavirus hydroxychloroquine and azithromycin as a treatment of covid- : results of an open-label non-randomized clinical trial the tryptophan-rich motif of hiv- gp can interact with the n-terminal structural and functional characterization of hiv- cell fusion inhibitor t key: cord- - e j pn authors: hao, wei; ma, bo; li, ziheng; wang, xiaoyu; gao, xiaopan; li, yaohao; qin, bo; shang, shiying; cui, sheng; tan, zhongping title: binding of the sars-cov- spike protein to glycans date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: e j pn the pandemic of sars-cov- has caused a high number of deaths in the world. to combat it, it is necessary to develop a better understanding of how the virus infects host cells. infection normally starts with the attachment of the virus to cell-surface glycans like heparan sulfate (hs) and sialic acid-containing oligosaccharides. in this study, we examined and compared the binding of the subunits and spike (s) proteins of sars-cov- and sars-cov, mers-cov to these glycans. our results revealed that the s proteins and subunits can bind to hs in a sulfation-dependent manner, the length of hs is not a critical factor for the binding, and no binding with sialic acid residues was detected. overall, this work suggests that hs binding may be a general mechanism for the attachment of these coronaviruses to host cells, and supports the potential importance of hs in infection and in the development of antiviral agents against these viruses. the novel coronavirus (cov) is the seventh human coronavirus. 
it is a deadly virus that is affecting the whole world in an unprecedented way. the global impact of the coronavirus disease (covid- ) pandemic is far beyond that of two other major coronavirus outbreaks in the past years, the severe acute respiratory syndrome (sars) in , and the middle east respiratory syndrome (mers) in . given that all three highly pathogenic covs originated in bats and a large number of closely related covs are present in bats, future outbreaks of this type of zoonotic virus remain possible. in order to avoid facing a similar pandemic in the future, it is necessary to develop a better understanding of these covs, especially with regard to effective ways that can help control the current covid- pandemic and prevent the second wave of outbreaks. studies showed that the genome of the sars-cov- has about % nucleotide identity with that of sars-cov. the major differences are found in the regions encoding the structural proteins (envelope e, membrane m, nucleocapsid n, and spike s) and accessory proteins (orf a/ b, , a/ b, and ), not the nonstructural proteins (nsp to nsp ). based on this genetic similarity, the novel cov was named by the international committee on taxonomy of viruses (ictv) as the severe acute respiratory syndrome coronavirus (sars-cov- ). sars-cov- is more genetically distant from mers-cov and shares only about % genome homology with mers-cov. these similarities and differences are reflected in the binding of the viruses to cell surface receptors. sars-cov- and sars-cov use the same receptor, angiotensin-converting enzyme (ace ), for entry into target cells, whereas mers-cov uses dipeptidyl peptidase (dpp , also known as cd ) as the primary receptor. however, there are many important findings that cannot be explained by their genetic relations. for example, it was found that sars-cov- has a relatively lower mortality rate ( %) than sars-cov ( %) and mers-cov ( %), but is more transmissible among humans.
these findings suggest that investigations beyond genetic analysis need to be carried out to improve the understanding of sars-cov- infection. in order to efficiently infect host cells, sars-cov- must bind with cell surface molecules in the lungs and other organs to mediate viral attachment and entry into host cells. previous studies of many other viruses suggested that sars-cov- s protein may use other molecules on host cell surface as attachment factors to facilitate binding to the high-affinity receptor ace . examples of such molecules include glycosaminoglycans (gags) and sialic acid-containing oligosaccharides. gags are primarily localized at the outer surface of cells. such a location makes them particularly suitable for acting as attachment factors to recruit viruses to cell surfaces. hs is one of the most prevalent types of gags in mammals. it is a linear and sulfated polysaccharide that is abundantly expressed on the surface of almost all cell types and in the extracellular matrix. the hs chains are mostly covalently linked as side chains to core proteins to form hs proteoglycans (hspgs) (fig. ). a possible mechanism for sars-cov- entry and infection. at the early stage of the infection process, sars-cov- may first interact with the hspgs on the surface of susceptible cells using the s protein protruding from the virus particle. this initial attachment may promote the subsequent binding of the virus to the high-affinity entry receptor ace . the serine protease tmprss on host cell surface and other host cell proteases may assist in viral entry by cleaving the s protein at the s /s and/or at the s ' sites. hs is synthesized in the golgi apparatus by many different enzymes. during and after its assembly, hs undergoes an extensive series of modifications including sulfation, acetylation and epimerization, which leads to glycan structures with high heterogeneity in length, sulfation, and glucuronate/iduronate ratio.
considerable variation in the sulfation pattern and degree of hs was noted in different species, organs, tissues, and even at different ages and disease stages. the sequence and sulfation pattern of hs has been shown to be able to regulate the binding of many viruses to host cells during infection. a similar trend was also observed for sialylation patterns of cell surface oligosaccharides. these findings imply that a relationship may exist between the distribution of different types of hs/sialylated glycans and the viral tropism. a better understanding of their relationship can potentially contribute to the design of new antiviral strategies. although currently the data on the viral tropism of sars-cov- are limited, results from recent studies suggested that its tropism may not be correlated with the ace expression. some other factors, such as proteases and glycans, may be the determinant of cellular susceptibility to the infection with this virus. a recent study suggested that hs may bind to the receptor binding domain (rbd, the c-terminal region of the s subunit, fig. ) of the sars-cov- spike protein and change its conformation. the intriguing possibility that variation in hs and sialic acid characteristics could impact the tropism of viruses prompted us to investigate the binding of sars-cov- toward a series of hs and sialic acid containing oligosaccharides. in this study, we systematically examined and compared the binding of the sars-cov- s protein subunits, full-length molecule and its trimer to different hs using microarray experiments (fig. ). our results suggested that all the tested protein molecules were capable of binding hs oligomers and had similar binding preferences, with higher affinity toward hs forms with higher degrees of sulfation. the binding was also shown to be positively related to the level of -o-sulfation. the hs chain length did not seem to be critical.
a long heparin molecule (a highly sulfated hs) and a synthetic heparin pentasaccharide (fondaparinux sodium) were demonstrated to have similar binding affinity to these protein molecules. moreover, our study suggested that the sars-cov- s protein might not be able to bind sialic acid residues, or the binding might be too weak to be detected by the microarray technology, and the s proteins of sars-cov and mers-cov have similar hs and sialic acid binding properties as those of the sars-cov- s protein. overall, our study laid a foundation for future studies to explore whether the binding specificity to hs can serve as an important contributor to the viral tropism of sars-cov- and to explore the possibility of exploiting hs for therapeutic strategies. binding of the subunits and s proteins of sars-cov- , sars-cov, and mers-cov to a hs microarray. an early study has suggested that the rbd of sars-cov- s protein might bind heparin. in order to determine if there is any preference of the sars-cov- s protein for particular hs structures, we first investigated the binding of the rbd (here termed as sars-cov- -rbd) to a hs microarray containing synthetic heparan sulfate oligosaccharides. these oligosaccharides have systematic differences in their length, monosaccharide composition, and sulfation pattern (fig. ) . the microarray experiment was performed using a previously established standard protocol. , , briefly, the proteins were labeled with cy fluorescent dye and incubated with the microarray at different concentrations. after washing away the unbound hs molecules, a highly sensitive fluorescence method was used to detect the binding of the sars-cov- -rbd to hs. under the experimental conditions, the binding can be detected at concentrations higher than . g/ml. increasing the concentration of the protein was not found to produce noticeable changes in binding (supporting information, fig. s ). 
quantification of fluorescence revealed that the sars-cov- -rbd binds to almost half of the molecules on the microarray and, not surprisingly, that the binding is strongly affected by the sulfation level, a trend previously noted for many hs-binding proteins. [ ] [ ] [ ] as shown in figure a , the hs oligosaccharides with higher sulfation degree, hs -hs (the number of sulfate groups per monosaccharide unit > . ), exhibit higher fluorescence intensity. the highest fluorescence intensity is observed for hs ( . sulfate groups per monosaccharide), which is followed by those of hs and hs ( . sulfate groups per monosaccharide) and those of hs and hs ( . sulfate groups per monosaccharide). binding of sars-cov- -rbd to hs does not appear to be affected by the monosaccharide composition (compare hs with hs , and hs with hs ). the oligosaccharides with relatively lower sulfation levels (hs -hs , the number of sulfate groups per monosaccharide unit < . ) have lower or almost no fluorescence signals. an analysis of the effect of the variation in sulfation revealed that the position of sulfation is another factor that strongly influences the binding. as shown in figure , removal of the -o-sulfate group from the glucosamine units significantly reduced the binding (compare hs with hs , hs with hs , and hs with hs ), suggesting that the -o-sulfate in hs plays a crucial role in determining its interaction with sars-cov- -rbd. the importance of the -o-sulfate for binding is further supported by comparing the binding of hs , hs , hs , hs , and hs , which shows that the stepwise addition of sulfate to the -o-position of the glucosamine residues gradually increased the binding of hs to sars-cov- -rbd. the microarray study also indicates that the binding is not positively related to the length of the hs chains. short hs oligosaccharides could have comparable or even better binding properties (compare hs -hs , hs -hs , and hs -hs ).
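the sulfation degree used above to group the oligosaccharides is simply the number of sulfate groups divided by the number of monosaccharide units. a minimal python sketch of that arithmetic follows; the compound labels, the counts and the cutoff are hypothetical placeholders chosen for illustration and are not the array entries or thresholds of this study.

```python
def sulfation_degree(n_sulfates: int, n_monosaccharides: int) -> float:
    """Average number of sulfate groups per monosaccharide unit."""
    if n_monosaccharides <= 0:
        raise ValueError("an oligosaccharide needs at least one monosaccharide")
    if n_sulfates < 0:
        raise ValueError("sulfate count cannot be negative")
    return n_sulfates / n_monosaccharides


def split_by_sulfation(compounds: dict[str, tuple[int, int]], cutoff: float):
    """Partition compounds into (higher, lower) sulfation groups.

    `compounds` maps a label to (n_sulfates, n_monosaccharides).
    `cutoff` is a hypothetical threshold, not the value used in the study.
    """
    higher, lower = [], []
    for name, (n_sulfates, n_units) in compounds.items():
        degree = sulfation_degree(n_sulfates, n_units)
        (higher if degree > cutoff else lower).append(name)
    return higher, lower


# Hypothetical oligosaccharides, not entries from the microarray.
example = {
    "oligo_a": (8, 5),  # 8 sulfates on a pentasaccharide
    "oligo_b": (6, 6),  # 6 sulfates on a hexasaccharide
    "oligo_c": (2, 8),  # 2 sulfates on an octasaccharide
}
higher, lower = split_by_sulfation(example, cutoff=1.0)
print("higher sulfation:", higher)
print("lower sulfation:", lower)
```

a strict comparison is used, so a compound sitting exactly on the cutoff falls into the lower group; whether that matches the grouping in the figure is an assumption.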
the binding results of the sars-cov- -s follow a similar trend to those of the sars-cov- -rbd, with only minor differences for a few hs molecules such as hs , hs , hs , hs and hs (fig. b). this is consistent with the assumption that the rbd is the major determinant of viral s protein binding to hs. examination of the sequence of the sars-cov- s protein revealed the presence of a potential cleavage site for furin proteases (rrars) at the s /s boundary. this is similar to the s protein of mers-cov, which also contains a furin consensus sequence rxxr (fig. ). furin cleavage can occur in the secretory pathway of infected cells and breaks the covalent linkage between the s and s subunits. the s subunit functions to fuse the virus to the host cell. it may interact with hs on the cell surface to facilitate the fusion process. therefore, to determine the role of hs in sars-cov- infection, it is also important to investigate the binding of the s subunit to hs. interestingly, our results indicate that the sars-cov- -s can also bind to hs, with binding preferences similar to those of the sars-cov- -rbd and the sars-cov- -s (fig. c). this suggests that hs may also play an important role during viral membrane fusion after the s subunit is removed from the s protein. we further investigated the binding of the full-length s protein and its trimer to hs using the same microarray. our data showed that, although the binding to hs -hs remains the highest, the full-length s protein has increased binding to hs with slightly less sulfation, particularly the molecules in groups hs -hs and hs -hs (fig. d, e). additionally, we found that the binding preferences of the rbds, s subunits and full-length s proteins of sars-cov and mers-cov are similar to those of the subunits and s protein of sars-cov- , although small differences are observed (fig. f-k). because the mers-cov s protein also contains a furin cleavage site, we studied the binding of its s subunit to the hs microarray.
once again, the results are similar to those of the s subunit of sars-cov- . overall, these data support the involvement of hs in the binding with the subunits and s proteins of sars-cov- , sars-cov and mers-cov and support the importance of hs for infection by these coronaviruses. to determine the relative binding strength of the subunits and s proteins of sars-cov- , sars-cov, and mers-cov, we measured and compared the dissociation constants (kd) of the protein molecules studied in the microarray experiment using a real-time spr-based binding assay. a commercially available porcine heparin from sigma-aldrich was used as the first binding partner for the measurement. it is a mixture of highly sulfated hs, with most chains in the molecular weight range of to kilodalton. this long heparin molecule is more similar to the hs molecules on the host cell surface than those on the microarray. in the spr assay, protein molecules were covalently linked to the surface of the cm (carboxymethylated dextran) sensor chips by amine coupling. heparin at various concentrations was then flowed over the immobilized proteins. the changes in refractive index caused by molecular interactions at the sensor surface were monitored and the dissociation constants were obtained by fitting the results using the software available in the spr instrument (fig. ). the comparison of the dissociation constants revealed that all the tested protein molecules can bind to the heparin, but their binding affinities are relatively low (kds are at the micromolar level). this agrees well with previous observations that hs binds viral s proteins only weakly. the results also showed that the rbd has the lowest binding affinity among the tested protein molecules of sars-cov- (fig. a). the s protein trimer and s subunit have relatively lower binding affinity in comparison with the full-length s protein and s subunit, respectively (fig. b-e).
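the text states only that the dissociation constants were obtained with the instrument software, so the fitting model is not specified. as an illustration of how a kd can be extracted from equilibrium spr responses, the sketch below assumes a 1:1 steady-state binding model, R = Rmax * C / (Kd + C), and uses the hanes-woolf linearisation (C/R plotted against C) with ordinary least squares. the function name, the concentrations and the responses are all hypothetical; this is not the procedure used by the instrument software.

```python
def fit_steady_state_kd(concentrations, responses):
    """Estimate (Kd, Rmax) for a 1:1 steady-state model R = Rmax*C/(Kd + C).

    Uses the Hanes-Woolf linearisation:  C/R = C/Rmax + Kd/Rmax,
    so a straight-line fit of C/R against C has
    slope = 1/Rmax and intercept = Kd/Rmax.
    """
    if len(concentrations) != len(responses) or len(concentrations) < 2:
        raise ValueError("need at least two matching (concentration, response) pairs")
    if any(c <= 0 for c in concentrations) or any(r <= 0 for r in responses):
        raise ValueError("concentrations and responses must be positive")

    xs = list(concentrations)
    ys = [c / r for c, r in zip(concentrations, responses)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rmax = 1.0 / slope
    kd = intercept / slope  # (Kd/Rmax) / (1/Rmax)
    return kd, rmax


# Synthetic, noise-free data generated from hypothetical parameters.
true_kd, true_rmax = 5e-6, 100.0           # molar, response units
concs = [1e-6, 2e-6, 5e-6, 1e-5, 2e-5]     # analyte concentrations (molar)
resps = [true_rmax * c / (true_kd + c) for c in concs]

kd, rmax = fit_steady_state_kd(concs, resps)
print(f"fitted Kd = {kd:.3e} M, fitted Rmax = {rmax:.1f}")
```

the linearised fit is convenient for a sketch but weights noisy low-response points unevenly; a non-linear fit of the same model, or a kinetic fit of the association and dissociation phases, would normally be preferred for real sensorgrams.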
very similar trends were also observed for differences in the binding affinities of the tested protein molecules of sars-cov and mers-cov (fig. f-l). in addition to the long heparin molecule, we also measured the binding of the protein molecules to fondaparinux (arixtra®), which is an ultralow-molecular-weight heparin (ulmwh) containing five monosaccharide units (fig. ). it has a well-defined chemical structure and is currently the only ulmwh that has been clinically approved as an anticoagulant. fondaparinux is very similar to hs in size, monosaccharide composition, and sulfation position and degree. the results showed that the kd values of fondaparinux binding to the tested protein molecules are in the same range as those for the long heparin. this agrees well with the observation in the microarray study, suggesting that the length of the hs chains is not a determining factor for binding. despite the similarity, there is a subtle but noticeable difference, which is that the binding affinities of the s subunit and rbd to fondaparinux are quite similar. can interact with sialic acid-containing glycans present on the cell surface. such an interaction is normally mediated by the n-terminal domain of the s subunit. in order to find out if sars-cov- can bind to sialic acid residues, we carried out microarray analyses of its s protein and subunits. the first microarray used contains different n-glycans that may be found on the surface of cells. of them are terminated with α , -and α , -linked sialic acid, also known as n-acetylneuraminic acid (neu ac), with α , -and α , -linked n-glycolylneuraminic acid (neu gc), and the rest with other glycan residues (supporting information, table s ). the experiment was performed in a similar way to that described for the hs microarray study.
the microarray results showed that both the sars-cov- -rbd and sars-cov- -s gave no binding signal, suggesting that they may not be able to interact with sialylated n-glycans or the binding signal is too low to be detected. in order to confirm this finding, we also investigated the binding of the full-length s proteins of sars-cov- , sars-cov and mers-cov to more sialylated glycans, including sialylated n-and o-linked glycans and glycolipid glycans (supporting information, tables s - ), but again no specific binding was detected. for a virus like sars-cov- to establish infection, it must first attach itself to the surface of target cells in different organs and tissues. the s protein plays an essential role in this attachment process. recently, the structure of sars-cov- s protein in the prefusion conformation was determined by the cryo-em technique. it shows that the overall structure of the sars-cov- s protein is very similar to that of the closely related sars-cov s protein, which is organized as a homotrimer. each monomer can be divided into an n-terminal receptor-binding s subunit and a c-terminal fusionmediating s subunit. the s subunits are located at the apex of the spike, making them more accessible for binding to the proteinaceous receptor, ace . although similar, there are some notable differences between the sars-cov- and sars-cov s proteins. , for example, the key amino acid residues involved in the binding of the sars-cov- s protein to ace are largely different from those of sars-cov. , , these differences may be related to the observed higher binding affinity of the sars-cov- s protein to ace . another important difference between the sars-cov- and sars-cov s proteins is that the former protein contains a multibasic protease recognition motif (rrars) at the junction of s and s . a multibasic cleavage site (rsvrs) was also identified in the mers-cov s protein (fig. ) . the sars-cov s protein only has a monobasic amino acid (sslrs) at the same site. 
the multibasic site can be processed by furin or related proprotein convertases, which are widely expressed in different tissues, before the virus is released from the host cell. by contrast, the monobasic site can be cleaved by tmprss or other cell-surface proteases (whose expression is confined to certain tissues) only after the virus is released from the host cell. it was reported that the cleavage at the junction of s and s activates the s protein for virus-cell fusion. thus, the presence of the multibasic cleavage site may partially account for the enhanced infectivity and tropism of sars-cov- relative to sars-cov for human cells. however, what remains unclear is how the virus attaches to the host cells after losing the s subunit, which is responsible for the binding to the proteinaceous receptor. in addition to binding protein-based receptors, many viruses can interact with cell surface glycans, including gags and sialic acid-containing oligosaccharides. depending on the virus, the glycan molecules can act as attachment factors, coreceptors or primary receptors. viruses typically bind gags through non-specific charge-based interactions. as one of the most abundant gags, hs appears to be the preferred binding partner for many viruses. [ ] [ ] [ ] [ ] sialic acids are normally terminal monosaccharide residues linked to glycans decorating cell surface glycoproteins, glycolipids, or other glycoconjugates. in general, the interactions of viruses with hs or sialic acids are responsible for the first contact with host cells. such contact may serve to concentrate viruses on the surface of target cells, facilitate their binding to more specific high-affinity protein receptors and/or promote their entry into host cells. it has been demonstrated that virus binding and infection can be reduced by enzymatic removal of hs or sialic acid from the cell surface, or by treating the virus with soluble hs or multivalent sialic acid conjugates.
[ ] [ ] [ ] therefore, in order to better understand and treat covid- , it is necessary to investigate the possible interactions between sars-cov- and hs and sialic acid-containing glycans in the forms of separate subunits and full-length proteins, and to assess if such interactions could represent a target for therapeutic intervention. similar to studies that have been successfully conducted for many other viruses, we used the microarray and spr technology to study the binding of the rbds, s /s subunits and full-length s proteins of sars-cov- , sars-cov and mers-cov, and a trimer of the sars-cov- s protein to hs and sialic acid. the microarray results showed that all the tested protein molecules can bind to about half of the oligosaccharides on the hs microarray. in contrast, only background levels of fluorescence were detected on various sialylated glycan microarrays. this observation suggests that all the tested protein molecules are able to bind to hs, and either do not bind to sialylated glycans or bind them with very low affinity. this agrees well with previous studies showing that sars-cov can bind to hs and that the binding of the mers-cov s protein to sialic acid-containing glycans can only be detected after the s subunit was attached to a nanoparticle to enhance its avidity via multivalent interactions. our results also suggested that the binding of the tested proteins to hs is related to hs sulfation position and degree. it seems that more -o-sulfate groups and a higher sulfation degree normally lead to better binding. because hspgs exhibit different sulfation patterns in different tissues, such a binding specificity may contribute to the tropism of sars-cov- for human cells. the length of hs appears not to be a critical factor for the binding. short hs chains could have comparable binding specificity and signals. the comparison of the spr kd values obtained for the long heparin molecule and fondaparinux further supports this finding.
it implies that it may be possible to reduce the attachment of sars-cov- to the surface of host cells using low-molecular-weight heparin (lmwh). this is in agreement with a recent study showing that lmwh treatment may be associated with better prognosis in some severe covid- patients. while these initial findings are encouraging, further research is required to determine if the binding to hs could affect the tropism and pathogenesis of sars-cov- and to determine if hs could be used to inhibit infection by this virus. our spr data of the tested protein molecules also showed that the s subunits could have similar or better binding affinity for hs as compared to those of the s subunits. this finding suggests that the cleaved s proteins of sars-cov- and mers-cov may depend on hs for interaction with the host cells during viral membrane fusion. in parallel with our study, the linhardt and boons research teams also conducted studies to investigate the binding of s proteins to hs. the absolute values of the dissociation constants determined in our experiments differ considerably from those presented by these two teams. this can be seen from the binding of the full-length s protein of sars-cov- , which was studied by all three teams (supporting information, table s ). our kd is one order of magnitude higher than that reported by the boons team, which in turn is three orders of magnitude higher than that reported by the linhardt team. the differences in results may be due to the method of analysis and/or the experimental materials used in the studies. in our study, the tested proteins were immobilized on the surface of cm sensor chips, while in their experiments, biotinylated heparin molecules were immobilized on the chips. in addition, because the s glycoproteins and the heparin molecules were obtained from different sources, their composition may differ between studies.
in conclusion, through our study, we provided experimental evidence for whether or not the s protein of sars-cov- can bind to two types of cell-surface glycans, hs and sialic acid-containing glycans, which are commonly utilized by human viruses for attachment to target cells. our data revealed that the sars-cov- s protein can weakly bind to hs in a sulfation-dependent manner. no binding with sialic acid residues was detected using the microarray assay. the results suggest that hs may act as an attachment factor to concentrate the virus at the cell surface and affect its tropism. through comparison, we found that the s proteins of sars-cov and mers-cov have similar binding properties to hs as that of the sars-cov- s protein, indicating that hs binding may be a conserved feature for these three types of coronaviruses. our data also revealed that the s subunits could bind equally well as the s subunits to hs. this binding may be an important element for viral attachment to the host cell surface after the removal of the n-terminal receptor-binding domains by protease cleavage. overall, our findings support the potential importance of hs in sars-cov- infection and in the development of antiviral agents. expression and purification of sars-cov- -rbd. dna containing the coding sequence for an n-terminal hemo signal peptide, the receptor binding domain (rbd, residues - ) of sars-cov- s protein and a c-terminal polyhistidine tag was amplified and inserted into a pfasebac vector for expression in high- insect cells using the bac-to-bac expression system (invitrogen). the resulting recombinant protein, termed sars-cov- -rbd-his, was secreted into cell culture medium, and subsequently purified on a nickel-nitrilotriacetic acid (ni-nta) affinity column, followed by a superdex gel filtration column (ge healthcare). the final buffer for the protein contains mm hepes (ph= . ) and mm nacl. the purified sars-cov- -rbd-his was concentrated to . 
mg/ml and flash frozen in liquid n and stored at - degrees celsius. binding of recombinant proteins to glycan microarrays. the tested protein molecules were first labeled with cy fluorescent dye ( mg/ml in dmso). after dialysis, they were incubated at different concentrations with microarrays for hour in the dark at room temperature. after incubation, the microarray slides were gently washed using washing buffer ( mm tris-cl containing . % tween , ph . ) to remove unbound proteins. finally, the slides were scanned with a microarray scanner luxscan- k/a at an excitation wavelength of nm and evaluated by the microarray image analyzer software. the spr measurements were performed using biacore t and s (ge healthcare). first, the carboxymethyl dextran matrix on the cm sensor chip (ge healthcare) was activated by injection of a : mixture of -ethyl- -( dimethylaminopropyl) carbodiimide (edc) and n-hydroxysuccinimide (nhs). recombinant proteins in mm acetate buffer (ph . , ge healthcare) were then injected over the chip surface at a flow rate of μl/min to couple the amino groups of the recombinant proteins to the carboxymethyl dextran matrix. after the coupling reaction, the remaining activated ester groups were deactivated by ethanolamine. the binding study was carried out at °c in pbs-p buffer (ge healthcare). the heparin molecule or fondaparinux at different concentrations was flowed over the immobilized recombinant proteins at a flow rate of μl/min with a contact time of s and a dissociation time of s. the surface was regenerated by injection of mm glycine-hcl (ph . ) at a flow rate of μl/min for s. data were collected and analyzed by bia evaluation software (ge healthcare). all authors have given approval to the final version of the manuscript.
a novel coronavirus from patients with pneumonia in china identification of a novel coronavirus in patients with severe acute respiratory syndrome a novel coronavirus associated with severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster coronaviridae study group of the international committee on taxonomy of viruses. the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- the genetic sequence, origin, and diagnosis of sars-cov- structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structure, function, and antigenicity of the sars-cov- spike glycoprotein cryo-em structure of the -ncov spike in the prefusion conformation dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc differences and similarities between severe acute respiratory syndrome (sars)-coronavirus (cov) and sars-cov- . would a rose by another name smell as sweet? 
sars case fatality ratio, incubation period middle east respiratory syndrome coronavirus (mers-cov) ebola virus entry requires the cholesterol transporter niemann-pick c cell surface α , -linked sialic acid facilitates zika virus internalization identification of sialic acid-binding function for the middle east respiratory syndrome coronavirus spike glycoprotein interaction between respiratory syncytial virus and glycosaminoglycans, including heparan sulfate interaction of zika virus envelope protein with glycosaminoglycans structural basis for human coronavirus attachment to sialic acid receptors heparan sulfate proteoglycans are required for cellular binding of the hepatitis e virus orf capsid protein and for viral infection herpes simplex virus types and differ in their interaction with heparan sulfate distribution of heparin and other sulfated glycosaminoglycans in vertebrates tissue specific distribution of sulfated mucopolysaccharides in mammals bio-specific sequences and domains in heparan sulphate and the regulation of cell growth and adhesion heparan sulfates and heparins: similar compounds performing the same functions in vertebrates and invertebrates? 
braz n-and -o-sulfated heparan sulfates mediate internalization of coxsackievirus b variant pd into cho-k cells the role of heparan sulfate proteoglycans as an attachment factor for rabies virus entry and infection human parainfluenza viruses hpiv and hpiv bind oligosaccharides with alpha - -linked sialic acids that are distinct from those bound by h avian influenza virus hemagglutinin inhibition of viral adhesion and infection by sialic-acidconjugated dendritic polymers herpesviruses and heparan sulfate: an intimate relationship in aid of viral entry heparan sulfate proteoglycans initiate dengue virus infection of hepatocytes heparan sulfate is an important mediator of ebola virus infection in polarized epithelial cells the origin, transmission and clinical therapies on coronavirus disease (covid- ) outbreak -an update on the status sars-cov- and sars-cov differ in their cell tropism and drug sensitivity profiles sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor the coronavirus (sars-cov- ) surface protein (spike) s receptor binding domain undergoes conformational change upon heparin binding binding of avian coronavirus spike proteins to host factors reflects virus tropism and pathogenicity lectin microarray profiling and relative quantification of glycome associated with proteins of neonatal wt and rd mice retinae the sweet spot: defining virus-sialic acid interactions n-terminal syndecan- domain selectively enhances -o heparan sulfate chains sulfation and promotes vegfa( )-dependent neovascularization heparan sulfate and heparin interactions with proteins heparan sulfate microarray reveals that heparan sulfate-protein binding exhibits different ligand requirements the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade sars-cov- spike protein binds heparan sulfate in a length-and sequence-dependent manner receptor recognition mechanisms of 
coronaviruses: a decade of structural studies structure of sars coronavirus spike receptor-binding domain complexed with receptor isolation and characterization of a bat sars-like coronavirus that uses the ace receptor functional analysis of potential cleavage sites in the mers-coronavirus spike protein synthetic analogues of the snail toxin -bromo- -mercaptotryptamine dimer (brmt) reveal that lipid bilayer perturbation does not underlie its modulation of voltage-gated potassium channels glycan-protein interactions in viral pathogenesis human metapneumovirus (hmpv) binding and infection are mediated by interactions between the hmpv fusion protein and heparan sulfate adaptation of tick-borne encephalitis virus to bhk- cells results in the formation of multiple heparan sulfate binding sites in the envelope protein and attenuation in vivo role of heparan sulfate in entry and exit of ross river virus glycoprotein-pseudotyped retroviral vectors role of heparan sulfate in the zika virus entry, replication, and cell death glycosaminoglycans and infection glycosaminoglycans in infectious disease sulfated glycosaminoglycans as viral decoy receptors for human adenovirus type respiratory syncytial virus with the fusion protein as its only viral glycoprotein is less dependent on cellular glycosaminoglycans for attachment than complete virus sialic acid is a cellular receptor for coxsackievirus a variant, an emerging virus with pandemic potential inhibition of sars pseudovirus cell entry by lactoferrin binding to heparan sulfate proteoglycans organ-specific sulfation patterns of heparan sulfate generated by extracellular sulfatases sulf and sulf in mice anticoagulant treatment is associated with decreased mortality in severe coronavirus disease patients with coagulopathy glycosaminoglycan binding motif at s /s proteolytic cleavage site on spike glycoprotein may facilitate novel coronavirus (sars-cov- ) host cell entry we would like to thank the national natural science the 
authors declare no competing interests. key: cord- -e gtx z authors: jegouic, sophie m.; loureiro, silvia; thom, michelle; paliwal, deepa; jones, ian m. title: recombinant sars-cov- spike proteins for sero-surveillance and epitope mapping date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: e gtx z the newly emergent sars-cov- coronavirus is closely related to sars-cov which emerged in . studies on coronaviruses in general, and sars in particular, have identified the virus spike protein (s) as being central to virus tropism, to the generation of a protective antibody response and to the unambiguous detection of past infections. as a result of this centrality sars-cov- s protein has a role in many aspects of research from vaccines to diagnostic tests. we describe a number of recombinant forms of sars-cov- s expressed in commonly available expression systems and their preliminary use in diagnostics and epitope mapping. these sources may find use in the current and future analysis of the virus and the covid- disease it causes. the appearance of covid- in late and its subsequent development to a pandemic have been widely reported [ ] . bioinformatics shows that the causative virus, sars-cov- , is closely related to sars, a beta coronavirus that caused an epidemic in / and probably emerged by zoonotic transfer from an animal species [ , ] . the basis of immunity for many coronaviruses is the spike protein (s), a kda type i membrane glycoprotein found on the surface of the virus. the sars and sars-cov- s proteins are > % homologous at the amino acid level and have many features in common, not least the use of the same receptor, angiotensin converting enzyme (ace ), for virus entry. recently determined structures of s in complex with ace have confirmed the same folded receptor binding domain (rbd) in both s proteins, albeit slightly offset when compared with each other [ , ] . 
the sars-cov- s has a polybasic site upstream of the s fusion peptide and preliminary experiments show that proteolytic processing of s is required for cell entry [ , ] . s is the principal neutralizing determinant of the virus and is composed of two domains, s and s . the s domain encompasses the rbd while the s domain encodes the fusion peptide, heptad repeats and transmembrane domain. many previous studies on sars s have shown that while the full-length molecule is sufficient for protection [ ] , s or the rbd alone offer similar protection without the risk of enhancing antibodies, which have been reported in some cases [ , ] . thus, s, or fragments of s encompassing the rbd, constitute candidate vaccines for sars-cov- as well as being useful for the detection and mapping of anti-s serum responses in convalescent or vaccinated individuals [ , ] . we describe a range of sars-cov- s protein fragments produced in expression systems available in most laboratories which may find application for these purposes and for others investigating s structure-function relationships. a full-length sequence verified sars-cov- s construct acted as template for a number of smaller fragments made by amplification of the requisite sequence by high fidelity pcr. these fragments were designed to encompass the known domains of s or to span the entire s coding region as ~ amino acids overlapping by amino acids ( figure ). the original full-length construct and all smaller fragments were cloned into the multi-phylum expression vector ptriex . such that the encoded s sequence was appended at the c-terminus with a vector resident sequence encoding a polyhistidine tag for detection and purification, as used elsewhere [ ] . to provide expression resilience and to facilitate different uses, two expression systems were used to generate recombinant s proteins and fragments thereof.
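the overlapping fragment design described above can be sketched as a simple tiling of the coding region. the following python example is illustrative only: the total length, fragment length and overlap passed in are hypothetical values, since the actual figures are not reproduced here, and the function name is an invention for this sketch.

```python
def tile_fragments(total_length: int, fragment_length: int, overlap: int):
    """Return 1-based inclusive (start, end) residue ranges that tile a protein.

    Consecutive fragments share `overlap` residues; the final fragment is
    truncated at the end of the sequence if it would otherwise overrun.
    """
    if total_length < 1 or fragment_length < 1:
        raise ValueError("lengths must be positive")
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be smaller than the fragment length")

    step = fragment_length - overlap
    fragments = []
    start = 1
    while True:
        end = min(start + fragment_length - 1, total_length)
        fragments.append((start, end))
        if end == total_length:
            break
        start += step
    return fragments


# Hypothetical numbers chosen for readability, not the values used in the study.
for label, (start, end) in zip("abcdefgh", tile_fragments(100, 40, 10)):
    print(f"fragment {label}: residues {start}-{end}")
```

in practice the fragment boundaries would also be adjusted to respect known domain boundaries, as the text notes for the domain-based constructs.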
constructs encoding the full-length s and the s domain were transfected into spodoptera frugiperda (sf ) with linearized baculovirus dna for the generation of recombinant baculoviruses [ ] . in addition, all clones were transformed into bl related strains for t polymerase driven expression in e.coli. recombinant baculovirus stocks were used to infect small scale cultures ( x cells) for confirmation of s protein expression. analysis of infected insect cells and culture supernatant at days post infection by western blot with an anti-his antibody showed expression of s and s in cell extracts and, as expected, s in the supernatant ( figure a ). there was no evidence of an s band (~ kda) in the s expressing cells suggesting that the s protein is not processed in insect cells, consistent with the requirement to engage the receptor [ ] or incompatibility with insect cell furin [ ] . expression of s protein was also clear for full length s by immunofluorescence (permeabilized cells) or on the surface of cells as assessed by flow cytometry, both using an s reactive human monoclonal antibody (figure c & d). larger scale cultures of s related proteins were expressed in t.nao cells [ ] and were purified following detergent lysis (s) or, in the case of s , clarification of the infected cell supernatant. single passage imac for the full length s protein enriched the non-glycosylated and glycosylated forms of the protein which were confirmed by western blot but also had a range of other insect cells proteins present ( figure b ). however, direct imac of the s containing supernatant in the presence of . mm nickel sulphate resulted in s that was ~ % pure ( figure b ) with a yield of ~ mg/l ( cells). similar western blot analysis of total protein extracts following induction of logarithmic phase e.coli cultures with iptg confirmed the expression of his-tagged s antigen of the predicted molecular weight in all cases ( figure , upper panel). 
in general, the shorter overlapping set of s fragments, including those that spanned the rbd, were produced at higher levels than larger fragments following a hr induction. further analysis of the s-related proteins showed all well-expressed fragments to be produced as inclusion bodies, which remained the case following low temperature induction ( o c). retransformation of solubl (ams biotechnology) and lobstr [ ] strains did not result in soluble protein expression despite titration of the iptg concentration used for induction, and further work is required on rescue strategies commonly in use to solubilize inclusion bodies (e.g. [ ] ). nevertheless, we found that s protein fragments prepared for gel electrophoresis using non-reducing loading buffer could be used successfully for epitope mapping of s reactive monoclonal antibodies, g , an unpublished mouse mab generated to sars s, and cr , a human mab also isolated originally to sars [ ] but shown to cross-react with sars-cov- . a structure of the latter humab in complex with the rbd has recently been solved [ ] . by blot, g bound to overlapping fragments c and d spanning residues - , overlap - ( figure , middle panel) while cr bound only to fragment c (figure , lower panel). cr was previously mapped to residues - which, in the modelled s trimer, are only exposed in certain conformations [ ] . our mapping suggests core binding by cr can occur to a smaller region than that suggested by the structural footprint, within residues - , in keeping with its reported non-sensitivity to mutation p l [ ] . the lower two panels are cropped but showed no reaction to rbd or s despite their presence at detectable levels (upper panel). the principal purpose in generating sars-cov- s protein or protein fragments was for studies of antibody binding or antibody generation. accordingly, we used s expressed and purified from insect cells as an antigen for serum binding.
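the fragment-overlap reasoning used for the two monoclonal antibodies above can be expressed as an interval intersection: an antibody that reacts with two neighbouring fragments is most simply explained by an epitope in the residues they share. the python sketch below assumes a linear epitope and uses hypothetical residue ranges. it deliberately does not subtract non-reactive fragments, because a fragment may fail to react on a blot for reasons of folding rather than sequence.

```python
def shared_region(bound_fragments):
    """Intersect 1-based inclusive (start, end) ranges of reactive fragments.

    Returns the (start, end) range common to every fragment, or None if the
    fragments have no residues in common.
    """
    if not bound_fragments:
        raise ValueError("at least one reactive fragment is required")
    start = max(s for s, _ in bound_fragments)
    end = min(e for _, e in bound_fragments)
    return (start, end) if start <= end else None


# Hypothetical fragment coordinates, not those of the constructs in the study.
fragment_c = (61, 100)
fragment_d = (91, 130)

# Antibody reacting with both fragments: candidate epitope is the overlap.
print(shared_region([fragment_c, fragment_d]))

# Antibody reacting with one fragment only: no narrowing beyond that fragment.
print(shared_region([fragment_c]))
```

narrowing an epitope further than the reactive fragment itself, as the text does for one of the antibodies, relies on additional evidence such as the published structural footprint and is outside what this intersection alone can show.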
donated human sera from individuals, including some who had experienced covid- -like symptoms, were assessed for s reactivity. preliminary titration experiments using an antibody to the his tag determined that s coating at μg per ml saturated the plastic surface. in addition, assays using cr spiked into normal human plasma determined that, of several blocking buffers assessed, blocking the plate with steelhead salmon serum (sea block, thermo scientific), provided the lowest backgrounds. standard elisa of the sera with these conditions led to the identification of sera ( %) as positive and sera ( %) as negative and these were discriminated clearly at either the : or : dilution points (figure , inset) . three sera gave intermediate binding curves and could not be unambiguously scored. the titration curves for the positives were similar with anti s titres of ~ in all cases (figure ). figure . screen for s reactivity. sera were screened by elisa on purified s protein starting at a dilution of : . inset: scatterplot of all sera (n= ) at : and : dilution points with positives identified. to provide an additional level of validation and to add epitope specificity to the data, of the sera scoring positive by s elisa were used as probes on western blots using full length s expressed in insect cells (cf. figure ) and also on the overlapping set of s fragments expressed in e.coli (cf. figure ). both sera reacted with full length s antigen ( figure a ) and showed binding to overlapping s fragments c and d, encompassing the rbd ( figure b ). these data suggest that a second tier positivity test based on western blot could be used as confirmation of past infection and that, at least for antibodies able to bind to gel resolved antigen, antibodies to the rbd are present in convalescent individuals. 
the appearance of sars-cov- and its pandemic spread has led to the reported expression of the virus encoded proteins, notably the s protein, for structural study [ , ] , for immunisation [ ] and for diagnostics [ ] . we have described recombinant sources of several s related polypeptides from two common expression systems, recombinant baculoviruses and e.coli and used the proteins expressed for the analysis of seroconversion and for epitope mapping. the particular properties of the insect cell system, yield, robustness and the ability to perform at scale are discussed elsewhere [ ] , similarly the use of the t system in e.coli [ ] . using a set of overlapping s fragments we demonstrated epitope mapping of monoclonal antibodies g and cr to residues - of s. interestingly, neither antibody reacted with the rbd itself in the western blot format used despite it encompassing the residues recognised. this suggests that each fragment of s may adopt a variable level of refolding on the membrane following transfer and emphasizes the value of the overlapping fragment approach to enable epitope identification. of residues in the rbd shown to interact with cr , lie within residues - [ ] and this core region alone is evidently sufficient to allow binding, as shown by interaction with s fragment c here. the overlapping fragment set allowed mapping of two human sera which also showed binding to the same region. it remains to be determined how widespread this serum response is in exposed individuals and if there is any correlation between the epitope specificity of a serum and the titre or level of neutralization. the globular s domain of several coronaviruses is the preferred antigen for sero-surveillance [ , , , ] and purified sars-cov- s was used similarly here. full length s protein has been used elsewhere [ ] but the s domain which it includes can lead to cross reaction as a result of previous coronavirus infections [ ] . 
although we used purified s , we have shown previously that glycosylated antigen from infected insect cells captured to the plate by a mannose specific lectin (gna) can also be an effective antigen, avoiding the need for protein purification in resource limited situations [ ] . we found evidence for ~ % seroconversion in a set of random samples, some of which were confirmed by western blot. the titre of all these sera was similar and as no other information on the samples was available no correlation with symptoms was possible although none of the samples were from hospitalised individuals. the level of seroconversion in the uk population is currently unknown although unpublished data suggest a range of - % [ ] . oxfordshire has a demonstrated positivity rate of per , (office for national statistics) which, using a -fold factor for non-tested but exposed individuals suggested from other studies [ ] , would suggest an infection rate of ~ %, very close to our observed positivity. several studies of seroconversion have been reported, in hospitalised individuals [ ] and in the population generally [ ] but the general relationships among disease severity, antibody titre, neutralisation, epitope profile and longevity of response remain to be determined. the set of s resources described here may contribute to studies in these areas. the sequence of sars-cov- s (accession no. nc_ ) was codon optimised for spodoptera frugiperda cells and the resident signal peptide exchanged for that of honeybee melittin [ ] before being ordered as two overlapping fragments (idt europe) flanked by bps at the ' and ' ends homologous to the intended expression vector, ptriex . (emd millipore). the ' flanking nucleotides were also designed to fuse the s open reading frame in-frame to the vector resident polyhistidine encoding sequence ( his residues). 
the gene was assembled at the same time as recombination into the vector using infusion technology (neb) and the assembly reaction used to transform e.coli (neb beta). colonies positive by pcr screening using primers that flank the cloning site were sequenced across the entire s coding region and a single positive isolate adopted for all further manipulations. spodoptera frugiperda (sf ) and t.niao cells were maintained in ex-cell medium (sigma) supplemented with % fetal bovine serum, % penicillin/streptomycin solution, at ˚c with shaking. virus growth used exclusively sf s while protein expression used predominantly t.niao . linearized baculovirus dna was used to produce recombinant baculoviruses [ ] . small-scale protein expression was performed by infection of a -well plate seeded with x sf cells per well using μl of a high titre stock of the recombinant baculovirus, typically passage , and incubated for days at °c. after incubation, cells were harvested and used for antigen detection. constructs were transformed into e.coli t express lysy (neb) and isolated by ampicillin resistance. cultures were grown at o c to an od = . and induced by the addition of iptg to . mm. growth was continued at o c for hr and cells harvested and disrupted for gel or purification as required. sds-page analysis used the equivalent of μl of culture per lane. solubility was gauged after lysis in bugbuster (emd) and gel analysis of the soluble and pellet fractions. alternate hosts used were solubl (ams biotechnology) and lobstr [ ] . forty-eight hours after infection, cells were dislodged by pipette and washed twice with cold pbs for minutes, then incubated in fix and permeabilization buffers for hr at room temperature (ebioscience™). fixed and permeabilized cells were incubated with cr at μg per ml for hr at room temperature. they were washed in pbs % bsa and then incubated with anti-human alexafluor conjugate for a further hr. 
the cells were washed twice with tbs for min each at room temperature mounted with a drop of slowfade™ gold reagent before being imaged using an evos-fl digital fluorescence microscope (thermo fisher scientific). cells for flow cytometry were processed similarly, but without permeabilization, and data acquired using a bd accuri c plus and analysed by fc express v (de novo software). infected insect cells were disrupted with cytobuster™ protein extraction reagent (merck) and clarified by centrifugation at , x g for m before column loading. for s , the supernatant of infected cells was clarified by centrifugation as above followed by passage through a . micro filter. the clear supernatant was adjusted to . mm nickel sulphate to avoid stripping the imac column before loading. in both cases imac chromatography was done using a pre-prepared ml imac column (ge) operating at a flow rate of . ml.min - with a gradient elution of . - . m imidazole over minutes. proteins were disrupted in nupage loading buffer (thermo) and separated by sds-page using - % precast tris-glycine sds polyacrylamide gels (invitrogen) for min at v. after electrophoresis, gels were either stained with coomassie blue r or transferred to pvdf membranes for western blot analysis. for epitope mapping using overlapping s fragments, inclusion bodies were disrupted in % sds loading buffer without reducing agent. following semi-dry transfer to pvdf membranes, membranes were incubated in tbst blocking buffer consisting of ( % of skimmed milk powder, . % tween- , x tbs) for hr. membranes were incubated with the primary antibody at : , in x tbst buffer for hr followed by x washes for minutes each in tbst buffer and if necessary with a secondary horseradish-peroxidase (hrp) antibody conjugate (sigma) diluted : , in x tbst for hr followed by x washes for minutes each with tbst buffer. the membrane was finally washed with tbs and hrp activity detected using chemiluminescence imagery. 
microtitre plates (nunc maxisorb) were coated with s antigen at μg per ml in mm sodium carbonate-bicarbonate buffer (ph . ) for a minimum of hour at room temperature. excess antigen was removed by washing three times with super block (thermo scientific) and unbound sites blocked using non-diluted sea block. samples were added at / dilution and diluted in a -fold series thereafter, followed by hr at room temperature. the plates were washed five times with tbs containing . % v/v tween- and polyclonal anti-human antibody conjugated to hrp (sigma) added for one hour at room temperature. following washing the chromogenic substrate tmb (europa bioproducts) was added and colour development stopped by the addition of a % well volume of . m sulphuric acid. absorbance was read at nm against a reference read at nm. finger prick samples from volunteers contacted by word of mouth were collected into % sodium citrate using self-retracting lancets. the samples were collected in the last week of april in central oxfordshire. no other information on the donated sample was sought. imj designed research, smj, sl, mt, dp and imj performed research and analysed data; and imj and smj wrote the paper. the authors declare no conflict of interest. reagent requests should be made to imj. 
genome composition and divergence of the novel coronavirus ( -ncov) originating in china the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- a new coronavirus associated with human respiratory disease in china structure, function, and antigenicity of the sarscov- spike glycoprotein structure of the sars-cov- spike receptor-binding domain bound to the ace receptor sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor nafamostat mesylate blocks activation of sars-cov- : new treatment option for covid- the spike protein of sars-cov -a target for vaccine and therapeutic development optimization of the production process and characterization of the yeast-expressed sars-cov recombinant receptor-binding domain ( rbd -n ), a sars vaccine candidate s protein receptor-binding domain stably expressed in cho cells antibody testing for covid- : a report from the national covid scientific advisory panel characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov interaction of severe acute respiratory syndromecoronavirus and nl coronavirus spike proteins with angiotensin converting enzyme- improving baculovirus recombination identification and characterization of spodoptera frugiperda furin: a thermostable subtilisin-like endopeptidase correction: bti-tnao , a new cell line derived from trichoplusia ni, is permissive for acmnpv infection and produces high levels of recombinant proteins optimized e. 
coli expression strain lobstr eliminates common contaminants from his-tag purification purifying natively folded proteins from inclusion bodies using sarkosyl, triton x- , and chaps human monoclonal antibody combination against sars coronavirus: synergy and coverage of escape mutants a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov perspective sars-cov- vaccines: status report virological assessment of hospitalized patients with covid- recent developments in the use of baculovirus expression vectors t expression systems for inducible production of proteins from serologic responses of mers-coronavirus-infected patients according to the disease severity middle east respiratory syndrome coronavirus neutralising serum antibodies in dromedary camels: a comparative serological study pre-existing and de novo humoral immunity to sars-cov- in humans adjuvant-free immunization with hemagglutinin-fc fusion proteins as an approach to influenza vaccines covid- antibody seroprevalence enhanced secretion from insect cells of a foreign protein fused to the honeybee melittin signal peptide we acknowledge the williams family and their anonymous donors for their help in enabling the serology. phil lowry kindly read the manuscript. key: cord- - yfvspaq authors: ruetalo, natalia; businger, ramona; schindler, michael title: rapid and efficient inactivation of surface dried sars-cov- by uv-c irradiation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yfvspaq the sars-cov- pandemic urges for cheap, reliable and rapid technologies for disinfection and decontamination. we here evaluated the efficiency of uv-c irradiation to inactivate surface dried sars-cov- . drying for two hours did not have a major impact on the infectivity of sars-cov- , indicating that exhaled virus in droplets or aerosols stays infectious on surfaces at least for a certain amount of time. 
strikingly, short exposure of high titer surface dried virus ( * ^ iu/ml) to uv-c light ( mj/cm ) resulted in a total reduction of sars-cov- infectivity. together, our results demonstrate that sars-cov- is rapidly inactivated by relatively low doses of uv-c irradiation. hence, uv-c treatment is an effective non-chemical option for decontaminating surfaces from high-titer infectious sars-cov- . introduction sars-cov- has spread globally and there is an urgent need for rapid, highly efficient, environmentally friendly, and non-chemical disinfection procedures. application of uv-c light is an established technology for decontamination of surfaces and aerosols ( - ). this procedure has proven effective in inactivating sars-cov- ( - ), several other enveloped and non-enveloped viruses as well as bacteria ( ). recently, it has also been shown that sars-cov- is sensitive to inactivation by uv-c irradiation ( - ). however, doses and exposure times necessary for total inactivation of sars-cov- were in a range precluding efficient application of uv-based methods for large-scale decontamination of surfaces and aerosols ( ). we hence conducted a "real-life" application approach simulating the inactivation of dried surface residing infectious sars-cov- by a mobile handheld uv-c emitting device and a uv-c box designed to decontaminate medium-size objects. our data show that surface dried sars-cov- retains infectivity for at least two hours. short exposure of high-titer surface dried sars-cov- to uv-c light led to a total reduction of infectivity. uv-c light inactivation treatment. ul of virus stock, corresponding to ~ * infectious units (iu) of icsars-cov- -mng, was spotted (in triplicate) in -well plates and dried for two hours at rt. 
-well plates spotted with dried virus were treated with uv-c light using the soluva® pro uv disinfection chamber (heraeus) for seconds or the soluva® pro uv disinfection handheld (heraeus) for seconds in a fixed regime at and cm plate distance. in addition, a moving regime using a slow ( . cm/s) and fast ( cm/s) speed at cm distance was tested. as control, -well plates were spotted with the virus and dried, but not uv-treated. after uv-treatment, the spotted virus was reconstituted using ml of infection media (culture media with % fcs). as control, ul of the original virus stock were diluted to ml with infection media and used as virus stock infection control. we set up an experimental approach to evaluate the effect of uv-c treatment on the stability of sars-cov- . simulating the situation in which exhaled droplets or aerosols from infected individuals contaminate surfaces, we produced a high-titer sars-cov- infectious stock and dried µl of this stock corresponding to ~ * ^ iu/ml in each well of a -well plate. the plates were then either left untreated or exposed to five uv-c regimens (fig. a) . these include inactivation for s in a box designed to disinfect medium-size objects, s exposure at cm or cm distance with a handheld uv-c disinfection device and finally an approach simulating decontamination of surfaces via the handheld uv-c device. for this, we performed slow and fast movement of the device at a distance of ~ cm, with "slow" corresponding to a speed of ~ . cm/s (supplemental movie ) and "fast" to ~ cm/s (supplemental movie ). uv-c irradiance ( nm) in the box with an exposure time of seconds corresponds to an irradiation dose of mj/cm²; for the handheld (hh) at cm the uv-c dose at two second irradiation time is mj/cm² and at cm is mj/cm². from the speed of the "slow" and "fast" moving regimens we calculate a uv-c dose of . mj/cm² (slow) and . mj/cm² (fast), assuming a focused intensity beam. 
however, taking into consideration the uv-c light distribution underneath the handheld device, the integrated uv-c dose accumulates to mj/cm² for the fast regimen. subsequently, dried virus was reconstituted with ml infection media and used to inoculate naïve caco- cells at serial dilutions to calculate viral titers. taking advantage of an infectious sars-cov- strain expressing the chromophore mneongreen ( ), we quantified infected (mng+) and total (hoechst+) cells by single-cell counting with an imaging multiplate reader. of note, even short uv-c treatment of the dried virus in the context of the moving "fast" regimen completely inactivated sars-cov- , as no infected cells were detected based on fluorescence protein expression (fig. b) . titration of two-fold serial dilutions of the uv-treated and non-treated control samples, as well as the freshly thawed strain as reference, revealed that (i) drying for two hours does not have a major impact on the infectivity of sars-cov- and (ii) all five uv-c treatment regimens effectively inactivate sars-cov- (fig. c) . calculation of viral titers based on the titration of the reconstituted virus stocks revealed a loss of titer due to drying from ~ * ^ to ~ * ^ iu/ml and an effective -log titer reduction of sars-cov- by all employed uv-c treatment regimens (fig. d) . altogether, our data demonstrate that uv-c regimens that expose high-titer sars-cov- to doses down to mj/cm² are sufficient to achieve complete inactivation of the virus. discussion disinfection of surfaces and aerosols by uv-c irradiation is an established, safe and non-chemical procedure used for the environmental control of pathogens ( - , ) . uv-c treatment has proven effective against several viruses including sars-cov- ( - ) and other coronaviruses, e.g. canine coronaviruses ( ). hence, as recently demonstrated by others ( - ) and now confirmed by our study, sars-cov- was expected to be susceptible to inactivation by uv-c treatment. 
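The dose figures reported above follow from the standard relation dose = irradiance × exposure time, with the moving regimens treated as a brief exposure whose duration is set by the sweep speed. The Python sketch below illustrates that arithmetic only. The function names are hypothetical, and the irradiance, beam width and speed values are illustrative placeholders rather than the measured values for the devices used in the study. It assumes a uniform beam of fixed width, corresponding to the "focused intensity beam" simplification; it does not model the light distribution underneath the handheld device, which the authors note yields a higher integrated dose.

```python
def uvc_dose_mj_per_cm2(irradiance_mw_per_cm2: float, exposure_s: float) -> float:
    """Dose (mJ/cm^2) = irradiance (mW/cm^2) x exposure time (s)."""
    if irradiance_mw_per_cm2 < 0 or exposure_s < 0:
        raise ValueError("irradiance and exposure time must be non-negative")
    return irradiance_mw_per_cm2 * exposure_s


def dwell_time_s(beam_width_cm: float, speed_cm_per_s: float) -> float:
    """Time a surface point spends under a beam swept past it at constant speed."""
    if beam_width_cm < 0 or speed_cm_per_s <= 0:
        raise ValueError("beam width must be >= 0 and speed must be > 0")
    return beam_width_cm / speed_cm_per_s


def moving_dose_mj_per_cm2(irradiance_mw_per_cm2: float,
                           beam_width_cm: float,
                           speed_cm_per_s: float) -> float:
    """Dose for a moving device, assuming a uniform beam of the given width."""
    return uvc_dose_mj_per_cm2(irradiance_mw_per_cm2,
                               dwell_time_s(beam_width_cm, speed_cm_per_s))


# Placeholder values for illustration only (not taken from the study).
fixed_dose = uvc_dose_mj_per_cm2(irradiance_mw_per_cm2=5.0, exposure_s=2.0)
swept_dose = moving_dose_mj_per_cm2(irradiance_mw_per_cm2=5.0,
                                    beam_width_cm=2.0,
                                    speed_cm_per_s=4.0)
print(f"fixed exposure: {fixed_dose} mJ/cm^2, moving exposure: {swept_dose} mJ/cm^2")
```

Because the dwell time scales inversely with speed, doubling the sweep speed halves the dose under this simplification, which is why the "fast" regimen is the most demanding test of the device.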
one critical question is the suitability of this technology in a "real-life" setting in which the exposure time of surfaces or aerosols should be kept as short as possible to allow for a realistic application, for example in rooms that need to be used frequently, such as operating rooms or lecture halls. furthermore, in such a setting, we assume that the virus is exhaled from an infected person in droplets and aerosols, dries on surfaces and hence represents a threat to non-infected individuals. we simulated such a situation and first evaluated whether surface dried sars-cov- is infectious. drying for two hours, in agreement with previous work ( ), did not result in a significant reduction of viral infectivity, indicating that smear infections could indeed play a role in the transmission of sars-cov- (fig. ) . on the other hand, our virus preparations are dried in ph-buffered cell culture medium containing fcs, which might stabilize viral particles. hence, even though this is beyond the scope of the current study, it will be interesting to evaluate whether longer drying or virus preparations in pbs affect the environmental stability of sars-cov- . irrespective of the latter, uv-c exposure of dried high-titer sars-cov- preparations containing ~ * ^ iu/ml resulted in a complete reduction of viral infectivity (fig. ) . in this context, it is noteworthy that we achieved a -log virus-titer reduction in a setting simulating surface disinfection with a moving handheld device. 
effect of ultraviolet germicidal irradiation on viral aerosols role of ultraviolet (uv) disinfection in infection control and environmental cleaning role of ultraviolet disinfection in the prevention of surgical site infections stability of sars coronavirus in human specimens and environment and its sensitivity to heating and uv irradiation large-scale preparation of uv-inactivated sars coronavirus virions for vaccine antigen inactivation of the coronavirus that induces severe acute respiratory syndrome, sars-cov uv dose) achieve incremental log inactivation of bacteria, protozoa, viruses and algae rapid inactivation of sars-cov- with deep-uv led irradiation effectiveness of -nm ultraviolet light on disinfecting sars-cov- surface contamination susceptibility of sars-cov- to uv irradiation an infectious cdna clone of sars-cov- no touch' technologies for environmental decontamination: focus on ultraviolet devices and hydrogen peroxide systems williamson bn, et al. aerosol and surface stability of sars-cov-
with the "fast"-moving protocol (see supplemental video ), we exposed surfaces at a distance of cm at a speed of . cm/s, resulting in a calculated integrated uv-c dose of mj/cm² at nm. this is substantially less than the previously reported mj/cm² necessary to achieve a -log reduction in virus titers when exposing aqueous sars-cov- to uv-c ( ). in another study, using a nm uv-led source, mj/cm² led to a . -log ( . %) reduction of infectious sars-cov- when irradiating for s; however, inactivation did not increase with extended irradiation regimens up to s ( ). in addition, s deep-ultraviolet treatment at nm corresponding to a dose of mj/cm² reduced sars-cov- titer by up to -logs ( ). comparing these values to other pathogens, sars-cov- seems particularly sensitive towards uv-c light. to achieve a -log titer reduction, - mj/cm² are necessary for adenovirus, - mj/cm² for poliovirus, and bacteria such as bacillus subtilis require - mj/cm² ( ). 
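The comparisons above mix log reductions and percentage reductions, which are two expressions of the same quantity: an n-log reduction leaves a fraction 10^(-n) of the starting titer. The Python sketch below makes that conversion explicit. The function names and the example titers are hypothetical and chosen for illustration; they are not values from the study. One stated assumption: when no infected cells are detected, the post-treatment titer is replaced by the assay's detection limit, so the result is a lower bound on the true reduction.

```python
import math


def log10_reduction(titer_before: float, titer_after: float,
                    detection_limit: float) -> float:
    """log10 titer reduction; titers below the detection limit are floored to it,
    so the returned value is then a lower bound on the true reduction."""
    if titer_before <= 0 or detection_limit <= 0:
        raise ValueError("starting titer and detection limit must be positive")
    if titer_after < 0:
        raise ValueError("titer after treatment must be non-negative")
    floored_after = max(titer_after, detection_limit)
    return math.log10(titer_before / floored_after)


def percent_inactivated(log_reduction: float) -> float:
    """Convert an n-log reduction into the percentage of infectivity removed."""
    return 100.0 * (1.0 - 10.0 ** (-log_reduction))


# Illustrative placeholder titers in IU/ml (not taken from the study).
reduction = log10_reduction(titer_before=1e6, titer_after=0.0, detection_limit=10.0)
print(f"at least {reduction:.1f}-log reduction, "
      f"i.e. at least {percent_inactivated(reduction):.3f}% inactivated")
```

Reporting the detection limit alongside a "complete inactivation" result keeps the claim bounded by what the titration could actually resolve.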
this is in line with the susceptibility of sars-cov to uv-c in aerosols at . mj/cm², whereas adenovirus or ms -bacteriophages were resistant to such a treatment ( ). altogether, we establish the effectiveness of uv-c treatment against sars-cov- in a setting designed to simulate realistic conditions of decontamination. the ease, speed, chemical-free nature, and high efficacy of uv-c treatment in inactivating sars-cov- demonstrate the applicability of this technology in a broad range of possible settings.
key: cord- -uuwgpuhw authors: roy, sylvie; ghani, karim; de campos-lima, pedro o.; caruso, manuel title: efficient production of moloney murine leukemia virus-like particles pseudotyped with the severe acute respiratory syndrome coronavirus- (sars-cov- ) spike protein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: uuwgpuhw the severe acute respiratory syndrome coronavirus (sars-cov- ) outbreak that started in china at the end of has rapidly spread to become pandemic. several investigational vaccines that have already been tested in animals and humans were able to induce neutralizing antibodies against the sars-cov- spike (s) protein; however, protection and long-term efficacy in humans remain to be demonstrated. we have investigated if a virus-like particle (vlp) derived from moloney murine leukemia virus (mlv) could be engineered to become a candidate sars-cov- vaccine amenable to mass production. first, we showed that a codon optimized version of the s protein could migrate efficiently to the cell membrane. however, efficient production of infectious viral particles was only achieved with stable expression of a shorter version of s in its c-terminal domain (Δs) in cells that express mlv gag-pol ( gp). the incorporation of Δs was -times more efficient into vlps as compared to the full-length version, and that was not due to steric interference between the s cytoplasmic tail and the mlv capsid. 
indeed, a similar result was also observed with extracellular vesicles released from parental and gp cells. the amount of Δs incorporated into vlps released from producer cells was robust, with an estimated . μg/ml s equivalent (s is comprised of s and s ). thus, a scalable platform that has the potential for production of pan-coronavirus vlp vaccines has been established. the resulting nanoparticles could potentially be used alone or as a boost for other immunization strategies for covid- . importance several candidate covid- vaccines have already been tested in humans, but their protective effect and long-term efficacy are uncertain. therefore, it is necessary to continue developing new vaccine strategies that could be more potent and/or that would be easier to manufacture in large-scale. virus-like particle (vlp) vaccines are considered highly immunogenic and have been successfully developed for human papilloma virus as well as hepatitis and influenza viruses. in this study, we report the generation of a robust moloney murine leukemia virus platform that produces vlps containing the spike of sars-cov- . this vaccine platform that is compatible with lyophilization could simplify storage and distribution logistics immensely. facs analysis on -ace cells, a cell line generated by stable transfection that is % positive for ace (fig. ) . titers of . x infectious units (iu)/ml and . x iu/ml were obtained for vsv-g-and galv-pseudotyped viruses, although titers of s and ds-pseudotyped viruses were below the detection limit of iu/ml (fig. a ). only few gfp cells could be observed by fluorescence microscopy after infection with the ds-pseudotyped virus and there were none when the s- pseudotyped vector was used (fig. b) . thus, these results indicated that the transient production was extremely inefficient for generating vlp-s, even with ds. ds-pseudotyped mlv recombinant viral particles are efficiently released from stable producer cells. 
we have shown that stable retrovirus packaging cell lines can generate pseudotyped vectors with at least -fold higher titers as compared to transient transfection productions ( ). we then hypothesized that s or ds stably expressed in gp cells ( cells that express mlv gag-pol) could be a better system to produce vlp-s. stable populations of gp cells expressing s and ds were then generated by transfection. in these cells, s and ds were able to localize at the cell surface at even higher levels than what we found in transient transfection (fig. a). a gfp retroviral vector was then introduced in these cells by infection (fig b) , and titers of gfp viruses released by these new producers were measured after infecting -ace cells. only few gfp positive cells could be detected by fluorescence microscopy after infection of -ace cells with the s-pseudotyped vector, but a very high percentage of fluorescent cells was observed after infection with the ds virus. a high number of gfp positive cells was seen with the galv virus diluted -times as compared to the two other vectors (fig. a) . titers of . x iu/ml and iu/ml were measured for the galv and ds-pseudotyped viruses, respectively, and the s-pseudotyped vector titer was below the detection limit of iu/ml, as expected ( fig b) . we could conclude that the production of recombinant viral particles was robust from stable producers expressing ds and inefficient with the full-length version of sars cov- s. the deletion of the amino acid cytoplasmic tail of s does not enhance its fusogenicity. as producer cells express the same amount of s and ds at the cell surface, one possible explanation for the high transduction efficiency of ds-pseudotyped vectors could be increased fusogenicity. the fusion capacity of s and ds was then assessed in a syncytia formation assay by mixing gp cells expressing s or ds with -ace cells. 
the number and the size of syncytia evaluated one day after mixing were very similar between s and ds mixtures, and there were none with the control cells (fig. ) . thus, the deletion of the amino acids in the s cytoplasmic tail does not have a significant effect on its fusogenicity. high amounts of sars-cov- ds protein are incorporated into mlv vlps released from stable producer cells. a vlp-derived sars cov- vaccine will be a viable option if sufficient amounts of s protein are incorporated at the surface of the released particles. western blots were performed with an anti-s antibody to evaluate the quantity of s protein into vlps produced in transient transfections and from stable producers. two bands were detected around kda that are most likely two glycosylated forms of s . the uncleaved s protein migrated around kda, and two other bands above kda were also detected in the ds samples that had more intense signals. these bands could be dimeric and trimeric forms of s as it has been suggested ( ). the amount of s detected at the surface of vlps produced in transient transfections or released from stable producers was much higher with the truncated version of s than with the full-length molecule (fig. a). mlv viral particles produced in transient transfection or from stable producers were detected with an antibody against p . a -and a -fold difference was found with the transient and the stable production systems, respectively (fig. b ), although there was less than a . -fold difference between s and ds in cellular extracts (fig. c ). more ds was also released as compared to the full- length protein in the supernatants of stably transfected cells, however the amount of ds detected was -to- times lower than the one released from the gp-ds. the amount of s equivalent present in the supernatant of gp-ds cells was high and evaluated at . µg/ml using the igg-s standard (fig. a ). 
our results indicated that the incorporation of s into mlv vlps is very efficient in stable producers but only with the truncated version of s. immunization will be the best preventive strategy to address the current covid- pandemic, although therapeutic alternatives cannot be neglected as an efficient vaccine is not a certainty ( , , ). yet preliminary results from preclinical and clinical studies are encouraging as several types of vaccines are able to trigger the production of nabs against sars-cov- s ( , - , - , , ) . how efficient and how long these nabs will be present in vaccinated people remains an open question that will only be answered with time ( ). also, antibody-dependent enhancement will have to be carefully monitored in these trials as it is a side effect that cannot be underestimated with coronaviruses ( , , ). one other major challenge ahead will be the capacity to mass produce covid- vaccines. in this study, we have established and characterized a new mlv-derived vlp platform that could be used for the production of a covid- vaccine. the efficient pseudotyping of mlv particles with s is a prerequisite to establish a robust vlp cov- s at the cell surface was not improved after disrupting the er retention signal by missense mutations ( ). in this study, we showed that s could be detected at the cell surface at a similar level to that achieved by ds in transiently transfected cells as well as in stable producers ( fig. and fig. a), a finding that has also been reported for sars-cov s expressed in transient transfections ( , ) . these results indicate that s can bypass its natural localization and efficiently migrates to the cell surface when it is overexpressed. despite similar amounts of s and ds at the cellular membrane, the truncated version was more efficiently incorporated into mlv viral particles. four-and -fold differences were obtained with vlps produced in transient transfection experiments and from stable producers, respectively (fig. b) . 
the hypothesis that has been proposed for sars-cov and sars-cov- is that the amino-acid deletion in the s cytoplasmic tail facilitates the pseudotyping by decreasing the steric interference with the retroviral matrix proteins ( , , ). our results invalidate this hypothesis, as more ds was also found in the supernatant of transfected cells that did not express mlv gag-pol (fig. a). parental and gp cells release evs that can incorporate ds more efficiently than s (fig. a and fig. ). vlps and evs are very similar in composition, and it has been postulated that they use similar pathways for vesicle trafficking ( , ). so, unlike s, ds was efficiently incorporated into both vlps and evs, as are, for example, tetraspanins or endosomal markers that are found equally in both particle types ( , ). evs released from gp-ds contain less than % of the total ds protein, and they would not need to be removed from vaccine preparations as they could be as good immunogens as vlps. it was even reported that evs containing the s protein of sars-cov could induce high levels of nabs ( ). titers of recombinant gfp retroviruses released from stable producers were at least -fold higher with ds than with s despite a -fold difference in the amount of the two proteins incorporated at the surface of vlps (fig. a and fig. b). as we did not find major differences in fusogenicity between s and ds in a syncytia formation assay (fig. ), our results suggest that recombinant viruses become fully infectious when a certain threshold of s protein is incorporated at their surface. recombinant gfp or luciferase pseudotyped retroviruses are commonly used to measure the activity of nabs present in serum of infected or vaccinated people ( - ). these reagents are convenient because, unlike sars-cov- , they can be manipulated in a bsl- laboratory. the robust production system with the gp-ds cell line could be highly valuable to evaluate the presence of nabs in large cohorts.
mass production will be a major challenge with all types of sars-cov- vaccine that are being developed, as the entire worldwide population will have to be vaccinated. based on the results of a nanoparticle vaccine containing s, whose and µg doses triggered a high level of nabs in people ( ), we assume that a vaccine derived from the vlp platform described in this study could be efficient with similar or lower amounts of s per dose. the yield of vlps produced from the gp-ds cells could be increased if a high producer clone is selected instead of a bulk population, and if cells are cultured in bioreactors in fed-batch or perfusion modes. the average titer of gene therapy vectors produced with a derivative of the gp cell line was increased by . -fold in bioreactor versus a -layer cell factory, and the total vector yield was increased by . -fold ( ). mutations of the furin cleavage site located between s and s and the d g variant that is now more prevalent in the infected population could increase the amount of s incorporated into vlps ( , ). a very concise review that compared the first results of different covid- vaccines concluded that the most immunogenic ones were made with recombinant proteins ( ). these results emphasize the importance of the platform developed in this study because vlps present the antigen in a protein format that seems more potent for vaccination than the protein alone. indeed, mlv vlps displaying the human cytomegalovirus glycoprotein b antigen could trigger -times more nabs in mice than the protein alone using the same amount of antigen ( ). finally, vlp-s could be used as a boost for other types of vaccine like measles virus- and adenovirus-based recombinant vectors. these combinations were highly potent for triggering nabs against hepatitis c proteins in mice and macaques ( ).
in conclusion, we have developed and characterized a new mlv vlp platform that can efficiently incorporate the s protein from sars-cov- , and that has the potential to produce a pan- coronavirus vaccine. the next logical step is to validate this vaccine in experimental animals and in humans thereafter. plasmids. the expression plasmid pmd ace ipuro r containing the human angiotensin- converting enzyme (ace ) cdna used to generate ace positive cells was constructed as follows: the ace pmei cdna fragment obtained from the plasmid hace (addgene; # ) was cloned in pmd ipuro r opened in ecorv. the sars-cov- s gene from the wuhan-hu- isolate (genbank: mn . ) was codon optimized (genscript, township, nj) and cloned in pmd ipuro r in ecori/xhoi. a shorter version with a -codon deletion in c-terminal (ds) was also constructed in a similar way. the pmd .galvipuro r and pmd .g plasmids that encode the galv and vsv-g envelopes, and the retroviral vector plasmid containing the gfp gene under the control of the ' long terminal repeat sequence have been described elsewhere ( ). human chimeric anti-s antibody (genscript; : dilution) followed by an alexa -conjugated goat anti-human igg (jackson laboratories; : ) were successively incubated with cells for labelling. the fixable viability stain (bd biosciences, san jose, ca, usa) was used to exclude dead cells. the presence of s was then analyzed by flow cytometry with a bd facsaria ii (bd biosciences). cells transfected with a galv expression plasmid were used as control. the presence of stably expressed s at the cell surface of gp-s and gp-ds was similarly analyzed by flow cytometry. the presence of ace at the surface of -ace cells was also checked by facs. detached cells were labelled with a mouse anti-ace antibody (r&d systems, minneapolis, mn / ) followed by an alexa goat anti-mouse ( : , ; invitrogen, carlsbad, ca). the presence of s released in the supernatant of transiently transfected gp cells was analyzed by western blot. 
subconfluent cells plated in mm were transfected for h with µg of envelope expression plasmids and µg of the gfp retroviral plasmid. one day later, the medium was replaced with . ml of sfm that was then harvested the following day. supernatants were concentrated -fold with a kda amicon centrifugal unit (millipore sigma, oakville, canada) and were stored at - °c until use. the gfp fluorescence evaluated under a microscope at the time of harvest was very similar among the different transfected plates. supernatants from confluent gp-s, gp-ds, -s and -ds cells were also harvested and concentrated from -mm dishes. cell pellets of x cells were resuspended in µl ripa lysis buffer containing a protease inhibitor cocktail (roche). samples were centrifuged for min to remove cell debris and stored at - °c until use for western blot analysis. samples of µl were incubated min at °c in loading buffer containing % sds and . % β-mercaptoethanol, and run on a % sds-polyacrylamide gel ( % stacking), followed by transfer onto nitrocellulose membranes (ge healthcare). immunoblotting was performed with a rabbit polyclonal antibody anti-s ( : dilution, sinobiological, beijing, china) and a rat monoclonal antibody anti-mlv p produced from the hybridoma r ( : , dilution; american type culture collection, manassas, va). blots were then incubated with secondary antibodies irdylight goat anti-rat igg ( : , ; invitrogen) and irdye cw anti-rabbit igg ( : , ; li-cor biosciences, lincoln, ne), and analyzed with the odyssey infrared imaging system (li-cor biosciences). serial dilutions of known amounts of c-terminally fc-tagged s (biovendor, brno, czech republic) were used for quantification. china novel coronavirus i, research t. .
a novel coronavirus from patients with pneumonia in china a pneumonia outbreak associated with a new coronavirus of probable bat origin the socio-economic implications of the coronavirus pandemic (covid- ): a review human coronavirus: host-pathogen interaction origin and evolution of pathogenic coronaviruses a new threat from an old enemy: reemergence of coronavirus (review) possible bat origin of severe acute respiratory syndrome coronavirus identifying sars-cov- related coronaviruses in malayan pangolins cross host transmission in the emergence of mers coronavirus bats are natural reservoirs of sars-like coronaviruses rooting the phylogenetic tree of middle east respiratory syndrome coronavirus by characterization of a conspecific virus from an african bat severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc. functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses probable pangolin origin of sars-cov- associated with the covid- outbreak sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor angiotensin-converting enzyme is a functional receptor for the sars coronavirus structure, function, and antigenicity of the sars-cov- spike glycoprotein characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov structural basis for potent neutralization of betacoronaviruses by single- monoclonal antibodies for prevention and treatment of covid- fruitful neutralizing antibody pipeline brings hope to defeat sars-cov- sars-cov- vaccines: status report development of sars-cov- vaccines: should we focus on mucosal immunity? 
sars-cov- vaccines: 'warp speed' needs mind melds not warped minds major findings and recent advances in virus-like particle (vlp)-based vaccines safety, tolerability, and immunogenicity of a recombinant adenovirus type- vectored covid- vaccine: a dose-escalation, open-label, non-randomised, first-in-human trial first-in-human trial of a sars cov recombinant spike protein nanoparticle vaccine an mrna vaccine against sars-cov- -preliminary report an alphavirus-derived replicon rna vaccine induces sars-cov- neutralizing antibody and t cell responses in mice and nonhuman primates oxford cvtg. . safety and immunogenicity of the chadox ncov- vaccine against sars-cov- : a preliminary report of a phase / , single-blind, randomised controlled trial primary exposure to sars-cov- protects against reinfection in rhesus macaques immunogenicity of a dna vaccine candidate for covid- cov- spike s -fc fusion protein induced high levels of neutralizing responses in nonhuman primates development of an inactivated vaccine candidate, bbibp-corv, with potent protection against sars-cov- development of an inactivated vaccine candidate for sars-cov- phase / study of covid- rna vaccine bnt b in adults immunogenicity and safety of a recombinant adenovirus type- -vectored covid- vaccine in healthy adults aged years or older: a randomised, double-blind, placebo-controlled, phase trial intradermal- delivered dna vaccine provides anamnestic protection in a rhesus macaque sars-cov- challenge model retroviral vectors pseudotyped with severe acute respiratory syndrome coronavirus s protein the contribution of the cytoplasmic retrieval signal of severe acute respiratory syndrome coronavirus to intracellular accumulation of s proteins and incorporation of s protein into virus-like particles intracellular targeting signals contribute to localization of coronavirus spike proteins near the virus assembly site cytoplasmic tail of coronavirus spike protein has intracellular targeting signals genetic 
analysis of the sars-coronavirus spike glycoprotein functional domains involved in cell-surface expression and cell-to-cell fusion aromatic amino acids in the juxtamembrane domain of severe acute respiratory syndrome coronavirus spike glycoprotein are important for receptor-dependent virus entry and cell- cell fusion vesicular stomatitis virus pseudotyped with severe acute respiratory syndrome coronavirus spike protein efficient human hematopoietic cell transduction using rd -and galv-pseudotyped retroviral vectors produced in suspension and serum-free media discrimination between exosomes and hiv- : purification of both vesicles from cell-free supernatants highly purified human immunodeficiency virus type reveals a virtual absence of vif in virions the improbability of the rapid development of a vaccine for sars-cov- dna vaccine protection against sars-cov- in rhesus macaques single-shot ad vaccine protects against sars-cov- in rhesus macaques human immunodeficiency viral vector pseudotyped with the spike envelope of severe acute respiratory syndrome coronavirus transduces human airway epithelial cells and dendritic cells retroviruses pseudotyped with the severe acute respiratory syndrome coronavirus spike protein efficiently infect cells expressing angiotensin-converting enzyme measuring sars-cov- neutralizing antibody activity using pseudotyped and chimeric viruses protocol and reagents for pseudotyping lentiviral particles with sars-cov- spike protein for neutralization assays optimized pseudotyping conditions for the sars cov- spike glycoprotein extracellular vesicles and viruses: are they close relatives? 
the trojan exosome hypothesis exosomal vaccines containing the s protein of the sars coronavirus induce high levels of neutralizing antibodies large-scale clinical-grade retroviral vector production in a fixed-bed bioreactor the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity enveloped virus-like particle expression of human cytomegalovirus glycoprotein b antigen induces antibodies with potent and broad neutralizing activity a prime-boost strategy using virus-like particles pseudotyped for hcv proteins triggers broadly neutralizing antibodies in macaques generation of a high-titer packaging cell line for the production of retroviral vectors in suspension and serum-free media fiji: an open-source platform for biological-image analysis key: cord- -o zyt vb authors: motayo, babatunde olarenwaju; oluwasemowo, olukunle oluwapamilerin; akinduti, paul akiniyi; olusola, babatunde adebiyi; aerege, olumide t; faneye, adedayo omotayo title: evolution and genetic diversity of sarscov- in africa using whole genome sequences date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: o zyt vb the ongoing sarscov- pandemic was introduced into africa on th february and has rapidly spread across the continent causing severe public health crisis and mortality. we investigated the genetic diversity and evolution of this virus during the early outbreak months using whole genome sequences. we performed; recombination analysis against closely related cov, bayesian time scaled phylogeny and investigated spike protein amino acid mutations. results from our analysis showed recombination signals between the afrsarscov- sequences and reference sequences within the n and s genes. the evolutionary rate of the afrsarscov- was . × − high posterior density hpd ( . × − to . × − ) substitutions/site/year. the time to most recent common ancestor tmrca of the african strains was december th . 
the afrsarscov- sequences diversified into two lineages a and b, with b being more diverse and containing multiple sub-lineages, as confirmed by both the maximum clade credibility (mcc) tree and pangolin software. there was a high prevalence of the d -g spike protein amino acid mutation ( . %) among the african strains. our study has revealed a rapidly diversifying viral population with the g spike protein variant dominating. we advocate for scaling up ngs platforms across africa to enhance surveillance and aid control efforts against sarscov- in africa. towards the end of december , chinese authorities, through the world health organization office in china, reported a new pathogen responsible for a series of pneumonia associated infections in wuhan, hubei province (who ). the pathogen was later identified to be a novel coronavirus closely related to the severe acute respiratory syndrome virus (sars), with a possible bat origin (zhou et al, ). the world health organization named the disease covid- (chan et al, ), and later declared it a pandemic on th march , prompting concerted efforts towards prevention and control worldwide (who ). on february th the international committee on the taxonomy of viruses (ictv) adopted the name sars-cov- following the report of their coronavirus working group (csg, ). the virus has been placed in the subgenus sarbecovirus, genus betacoronavirus, subfamily coronavirinae, family coronaviridae (de groot et al, ; gorbalenya et al, ). coronaviruses are enveloped viruses containing a single-stranded positive sense rna genome with a size of between kb and kb (masters and perlman ). they are responsible for a host of human and animal infections. the betacoronaviruses include the most medically important species, among them several human coronaviruses such as hucovoc and hucovhku .
the severe acute respiratory syndrome coronavirus (sarscov) and the middle east respiratory syndrome coronavirus (mers) are also members of this group; both are pathogens of high consequence that caused large-scale epidemics and have been shown to be of zoonotic origin (lau et al, ; zaki et al, ). genomic and structural analyses have revealed that sarscov- contains four structural proteins and several non-structural proteins (chen et al, ; lu et al, ). the spike protein is the major antigenic protein responsible for initiating infection, via attachment of its receptor binding domain (rbd) to the sarscov/sarscov- receptor angiotensin converting enzyme ace (donelli et al, ; monteil et al, ). globally, there have been , , confirmed sarscov- cases, with , deaths as at st of may (ecdc ). the coronavirus pandemic in africa began in egypt on the th february with an italian who returned to the country (who b). as at st of may there have been , cases in africa, with , deaths and , recoveries, covering fifty four countries in africa, with south africa having the highest number of cases of , (who c). several reports have traced the evolutionary origins of sarscov- to sarsrcov from bats (zhou et al, ) and pangolins (lam et al ). phylogenetic analysis has shown that the virus has diversified through the duration of the pandemic into two major lineages a and b with several sub-lineage diversifications (rambaut et al ). the majority of the reports were generated using genome sequences of sarscov- from america, europe and asia (rambaut et al, ). there has been a paucity of data on the genetic evolution of sarscov- sequences from africa, despite the increasing number of genome sequence submissions into the gisaid database from africa; there were whole genome sequences available in the gisaid database as at th april . this gap in knowledge prompted the conceptualization of this study.
this study was designed to determine the genetic diversity and evolutionary history of genome sequences of sarscov- isolated in africa. full genome sequences with high coverage were downloaded from the global initiative for sharing of avian influenza data (gisaid) database. as at th april there were full genome sequences from africa available in the gisaid database; we downloaded all of them, excluding genomes with low coverage. a total of high coverage genomes were eventually selected from the african sequences. along with these, high quality full genome sequences were also downloaded from three continents: america (usa), asia (china and south korea) and europe (england, italy and germany). three different datasets were then generated from these sequences. the first dataset consisted of high coverage full genome sequences from africa, along with the sarscov reference genome sequence from wuhan, china, bat and pangolin sars related reference sequences and sarscov reference sequence. the second dataset consisted of complete genome sequences from africa, america, asia and europe, while the third dataset consisted of complete spike protein (s) gene sequences from africa and bat and pangolin sars related reference s gene sequences. whole genome sequences downloaded from the gisaid database were aligned using mafft v . (fft-ns- algorithm) following default settings (katoh et al, ). maximum likelihood phylogenetic analysis was performed using the general time reversible nucleotide substitution model with gamma distributed rate variation gtr-Γ (yang et al, ) with bootstrap replicates using iq-tree software (nguyen et al ). lineage assignments for the sarscov- sequences were conducted using the phylogenetic assignment of named global outbreak lineages tool (pangolin), available at http://github.com/hcov- /pangolin (o'toole and mccrone ). we analyzed potential recombination events using the recombination detection program (rdp) software (martin et al, ).
the analysis was conducted on whole genome sequences of identified lineages among the african isolates, using the rdp, bootscan, geneconv, chimaera, siscan, seq, and maximum chi-square methods. a putative recombination event was accepted only if at least three of the above mentioned methods gave a positive recombination signal (liu et al, ). temporal clock signal was analyzed among the aligned sequences using tempest version . (rambaut et al, ). the root-to-tip divergence and sampling dates supported the use of molecular clock analysis in this study. phylogenetic trees were generated by bayesian inference through markov chain monte carlo (mcmc), implemented in beast version . . (suchard et al, ). we partitioned the coding genes into first+second and third codon positions and applied a separate hasegawa-kishino-yano (hky+g) substitution model with gamma-distributed rate heterogeneity among sites to each partition (hasegawa et al, ). the relaxed clock with a gaussian markov random field (gmrf) skyride coalescent prior was selected for the final analysis, after running different models and comparing them using bayes factors with marginal likelihood estimated using the path sampling and stepping stone methods implemented in beast version . . (suchard et al, ). the mcmc chain was run for one hundred million steps with % burn-in. results were then visualized with tracer version . . (http://tree.bio.ed.ac.uk/software/tracer/); all effective sample size (ess) values were > , indicating sufficient sampling. bayesian skyride analysis was carried out to visualize the epidemic evolutionary history using tracer v . . the complete s protein gene sequence of afrsarscov- was aligned along with ratg btcov and pangolin sarsrcov sequences using mafft (katoh et al, ). the alignment was then edited and visualized using bioedit software. the current global sarscov- pandemic, otherwise known as covid- , began on the african continent from a european returnee in egypt on february th (who ).
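the acceptance rule for putative recombination events described in the methods above (several detection methods are run, and an event is kept only when enough of them agree) can be sketched as follows. this is a minimal illustration, not the rdp software's own logic: the method labels, the function name `accept_event` and the example signals are all hypothetical, and the threshold of three is read here as "at least three", which is an assumption.

```python
from typing import Mapping

# Illustrative labels for the seven detection methods listed in the text.
# They are plain dictionary keys chosen for this sketch, not RDP identifiers.
METHODS = ("rdp", "bootscan", "geneconv", "chimaera", "siscan", "seq", "maxchi")

# Assumption: "three of the methods" is interpreted as "at least three".
MIN_SUPPORT = 3


def accept_event(signals: Mapping[str, bool], min_support: int = MIN_SUPPORT) -> bool:
    """Return True when enough methods report a positive recombination signal.

    `signals` maps a method label to True (signal detected) or False.
    Methods missing from the mapping are counted as negative.
    """
    unknown = set(signals) - set(METHODS)
    if unknown:
        raise ValueError(f"unknown method label(s): {sorted(unknown)}")
    support = sum(1 for method in METHODS if signals.get(method, False))
    return support >= min_support


# Hypothetical event supported by three of the seven methods.
example = {"rdp": True, "bootscan": True, "siscan": True, "maxchi": False}
print(accept_event(example))
```

requiring agreement between independent methods is a simple way to reduce false positives from any single detection algorithm; raising `min_support` makes the filter stricter.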
it has since spread to virtually all the countries within the african region. this study was based on sequences generated during the early phase of the pandemic in africa, precisely between february and april . sixty nine high coverage full genome sequences from six african countries, namely algeria ( ), senegal ( ), democratic republic of congo drc ( ), nigeria ( ), ghana ( ) and south africa ( ) were analyzed. phylogenetic analysis of the african sequences showed clustering within the sarbecovirus sub-genus, forming a sub-cluster with sarsr cov and pcov (figure ) as previously reported by several workers (zhao et al, ; lam et al, ). the root to tip regression analysis showed a weak temporal signal, with a correlation coefficient of . and r = . (supplementary figure ). in the recombination analysis of the african sarscov- (afrsarscov- ) sequences against reference whole genome sequences of sars, recombination signals were observed between the african sarscov- sequences and reference sequences (major recombinant hcov- pangolin/guangu p l/ ; minor parent hcov- b batyunan/ratg ) between the rdrp and s gene regions (figure ). this result is consistent with a previous report from saudi arabia which investigated the recombination between sarscov- and closely related viruses such as sarscov and mers (nour et al, ). the evolutionary rate for the afrsarscov- isolates during the period under study was . × - substitutions/site/year, highest posterior density (hpd) interval ( . × - to . × - ). this is slightly higher than that of an earlier report on early outbreak strains from china with a rate of . × - (li et al, ); it is, however, lower than the calculated global sarscov- evolutionary rate estimated to be . × - reported by nextstrain (www.nextstrain.org/ncov/global ). the mcc tree of the african sarscov- sequences shows that they have evolved into two major lineages a and b with lineage b being more diverse.
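the root to tip regression mentioned above is, in essence, an ordinary least-squares fit of each sequence's genetic distance from the tree root against its sampling date. the sketch below shows that calculation in plain python under stated assumptions: the function name and the toy dates and distances are hypothetical, and tempest's own rooting and optimisation steps are not reproduced.

```python
import math
from typing import Sequence


def root_to_tip_regression(dates: Sequence[float], distances: Sequence[float]) -> dict:
    """Least-squares fit of root-to-tip distance against sampling date.

    dates      : sampling dates as decimal years
    distances  : root-to-tip distances in substitutions/site
    Returns the slope (a rough rate estimate), intercept, correlation
    coefficient r, r squared, and the x-intercept (a rough root date).
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three paired observations")
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r * r,
        "x_intercept": -intercept / slope,
    }


# Hypothetical toy data: four tips sampled over a few months.
toy_dates = [2020.0, 2020.1, 2020.2, 2020.3]
toy_distances = [0.001 * (d - 2019.9) for d in toy_dates]
fit = root_to_tip_regression(toy_dates, toy_distances)
print(fit["slope"], fit["r_squared"], fit["x_intercept"])
```

a low correlation in such a fit indicates a weak temporal signal, which is one reason a relaxed clock and a full bayesian analysis are generally preferred over the regression slope as the rate estimate.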
the majority of the african sarscov- sequences clustered within lineage b, while three ghanaian, three congolese, and four senegalese strains clustered along with the reference chinese and south korean strains within lineage a (figure ). the mcc tree for the dataset containing global reference sequences showed a similar topology to that of the african tree: the tree was distributed into two major lineages a and b, with lineage b further diversifying into about four sub-lineages, while lineage a seemed to evolve into only two sub-lineages (figure ). the afrsarscov- strains were intermixed with the global sequences within both lineages; lineage b consisted mainly of strains from germany, england, italy and usa, intermixed with african strains, while lineage a consisted mainly of strains from south korea and china with a few african strains from senegal, ghana and drc. the result of the genotype analysis using the genotyping tool pangolin was largely in conformity with the observed phylogenetic analysis. figure shows a summary of the lineage distribution of the isolates by country of origin using the pangolin genotyping tool. the complete distribution of the strains according to lineage and country is shown in supplementary table . from the analysis with pangolin, lineage b. was the most commonly encountered and the most widely distributed, consisting of sequences from seven countries, followed by lineage b. and genotype b. lineage a had positive sequences from six countries. the majority of the sequences recorded high bootstrap values, with over % of the sequences recording a bootstrap value of above %. this shows that pangolin is a reliable tool with a broad scope of functions, including a user-friendly, interactive graphical representation of the phylogenetic clustering of the identified lineages and sub-lineages, with trees generated using virtually all sarscov- sequences available on the gisaid platform as reference.
the genotyping tool was recently introduced, and several reports have utilized it to predict lineage assignments accurately (xavier et al, ). the time to most recent common ancestor (tmrca) of the african sarscov- strains was december th (november th -december th ), while the tmrca of all the sequences under analysis was th october (july th -december th ). our tmrca was lower than that of a similar study which reported a tmrca of th october among global isolates including chinese isolates (li et al, ), but was slightly higher than that of another recent study investigating the evolutionary dynamics of the ongoing sarscov- epidemic in brazil, which reported a tmrca of th february (xavier et al, ). the epidemic history of the ongoing outbreak was investigated using the bayesian skyline plot (bsp). the bsp showed a steady increase in viral population as the outbreak progressed during the study period (figure ). this observation is expected, as the viral sequence population should increase as the infection spreads. a major limitation was the rather small number of sequences analyzed and the very short study duration; therefore our results may not reflect the exact viral population dynamics of the outbreak in africa. the afrsarscov- sequences were analyzed for the d -g mutation within the s subunit of the spike protein, which has been reported to contribute to increased transmissibility of sarscov- (korber et al, ). figure shows a representative amino acid alignment of selected afrsarscov- sequences along with reference sequences of btcov ratg and pcov. our results revealed a high prevalence of the d -g mutation among afrsarscov- , with / ( . %). the mutation was recorded in isolates from all african countries analyzed in this study (supplementary figure ). prior to this report the d -g spike mutation was found predominantly in europe, accompanied by a high number of cases and a significant mortality rate (pachetti et al, ; korber et al, b).
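the prevalence figure reported above amounts to counting, in an amino acid alignment of the spike protein, how many sequences carry the variant residue at one alignment column. a minimal sketch of that count is given below. the function name, the toy alignment and the column number used in the example are hypothetical and chosen only for illustration; they are not the study's data or the actual spike position.

```python
from typing import Mapping, Tuple

# Characters treated as "no call" at a position (gap, unknown residue).
UNINFORMATIVE = {"-", "X", "?"}


def residue_prevalence(
    alignment: Mapping[str, str], position: int, residue: str
) -> Tuple[int, int, float]:
    """Count sequences carrying `residue` at a 1-based alignment column.

    Returns (hits, informative sequences, percentage). Sequences with a
    gap or unknown residue at the column are left out of the denominator.
    """
    if position < 1:
        raise ValueError("position is 1-based")
    hits = 0
    informative = 0
    for name, seq in alignment.items():
        if position > len(seq):
            raise ValueError(f"{name} is shorter than column {position}")
        aa = seq[position - 1].upper()
        if aa in UNINFORMATIVE:
            continue
        informative += 1
        if aa == residue.upper():
            hits += 1
    percent = 100.0 * hits / informative if informative else 0.0
    return hits, informative, percent


# Hypothetical toy alignment; column 3 stands in for the site of interest.
toy_alignment = {
    "reference": "MKDL",
    "afr_1": "MKGL",
    "afr_2": "MKGL",
    "afr_3": "MKDL",
    "afr_4": "MK-L",
}
print(residue_prevalence(toy_alignment, 3, "G"))
```

excluding gapped or ambiguous calls from the denominator is a design choice of this sketch; counting them as non-variant would give a lower percentage.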
the introduction of this strain in africa is quite worrisome, considering the population densities of most african cities and the poor state of public health infrastructure to support medical intervention of symptomatic sarscov- cases. although more evidence is still required to determine the extent of the effect of the d -g mutation on the virulence properties of the virus, current evidence from in vitro studies seems to support the hypothesis of increased transmissibility of this variant of the virus (korber et al, ; hu et al, ). in conclusion, we have reported the genetic diversity and evolutionary history of sarscov- isolated in africa during the early outbreak period. our findings have identified diverse sub-lineages of sarscov- currently circulating among africans. we also identified a high prevalence of the d -g spike protein variant of the virus, capable of rapid transmission, in all countries sampled. a major limitation was the relatively low number of sequence submissions available in the gisaid database compared with those of other regions such as europe and asia. we advocate for scaling up next generation sequencing (ngs) capacity for whole genome sequencing of sarscov- samples across the african continent to support surveillance and control efforts in africa. figure . amino acid alignment of the partial s gene sequences covering amino acid positions to , of selected afrsarscov isolates along with reference sequences of closely related pcov and bat ratg . the red shaded region represents the receptor binding domain; the blue shaded box represents the d -g motif, while the empty red box represents the polybasic cleavage site bordering the s /s sub-unit.
novel coronavirus ( -ncov ) situation report - , who africa/second case of ncov confirmed in africa coronavirus disease (covid- ) a pneumonia outbreak associated with a new coronavirus of probable bat origin covid- situation world wide as at th may a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster commentary: middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group isolation of a novel coronavirus from a man with pneumonia in saudi arabia severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace epidemiological and genetic analysis of severe acute respiratory syndrome chapter , coronaviridae emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- a dynamic nomenclature proposal for sars-cov- to assist genomic epidemiology the d g mutation of sars-cov- spike protein enhances viral infectivity and decreases neutralization sensitivity to individual convalescent sera tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus probable pangolin origin of sars-cov- associated with the covid- identifying sars-cov- related coronaviruses in malayan pangolins rdp : detection and analysis of recombination patterns in virus genomes.
virus evolution exploring the temporal structure of heterochronous sequences using tempest (formerly path-o-gen) bayesian phylogenetic and phylodynamic data integration using beast . . virus evolution dating of the human-ape splitting by a molecular clock of mitochondrial dna codon usage bias and recombination events for neuraminidase and hemagglutinin genes in chinese isolates of influenza a virus subtype h n . archives of virology evolutionary history, potential intermediate animal host, and cross species analysis of sarscov the species severe acute respiratory syndrome-related virus: classifying -ncov and naming it sarscov- the ongoing covid- epidemic in minas gerais, brazil: insights from epidemiological data and sars-cov- whole genome sequencing maximum likelihood phylogenetic estimation from dna sequences with variable rates over sites: approximate methods insights into evolution and recombination of pandemic sarscov- using saudi arabian sequences. biorxiv preprint figure . boot scan plot of complete genome sequences of afrsarscov- sequences analysed with the rdp recombination software. the legend shows the identity of the sequences scanned within the plot; the light blue bars indicate the portions of the genome with recombinant signals in reference to the major and minor recombinant parent sequences. key: cord- -fujejfwb authors: gaurav, shubham; pandey, shambhavi; puvar, apurvasinh; shah, tejas; joshi, madhvi; joshi, chaitanya; kumar, sachin title: identification of unique mutations in sars-cov- strains isolated from india suggests its attenuated pathotype date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fujejfwb severe acute respiratory syndrome coronavirus- (sars-cov- ), which was first reported in wuhan, china in november has developed into a pandemic since march , causing substantial human casualties and economic losses. studies on sars-cov- are being carried out at an unprecedented rate to tackle this threat. 
genomics studies, in particular, are indispensable for elucidating the dynamic nature of the rna genome of sars-cov- . rna viruses are marked by their ability to mutate at high rates, much more frequently than their hosts, which diversifies their capabilities and enables them to evade the host immune response and develop drug resistance. in this study, we sequenced and analyzed the genomes of sars-cov- isolates from two infected indian patients and explored the possible implications of point mutations for the biology of the virus. in addition to multiple point mutations, we found a remarkable similarity between relatively common mutations of -nucleotide deletion in orf of sars-cov- . our results corroborate the earlier reported -nucleotide deletion in sars, which was frequent during the early stage of human-to-human transmission. the results will be useful for understanding the biology of sars-cov- and its attenuation for vaccine development. severe acute respiratory syndrome coronavirus- (sars-cov- ) is the cause of the novel human coronavirus disease covid- , first reported on november th , in wuhan, china [ ]. it has spread rapidly, creating a global public health emergency in the last few months, and the world health organization (who) declared it a pandemic on march th , . to date, sars-cov- has infected close to five million people and caused more than , deaths globally. although the outbreak started late in india, the country has recorded more than , reports of sars-cov- infection, including more than , deaths, as of may th , . compared with the other recent coronavirus epidemics, sars and mers (middle east respiratory syndrome), which had mortality rates of % [ ] and % [ ], respectively, sars-cov- has a current mortality rate of about . % [ ]. however, the transmissibility of sars-cov- has been far greater than that of the previous coronavirus epidemics. sars-cov- has a positive-sense . kbp rna genome, encoding open reading frames (orfs) [ ].
it codes for four structural proteins, namely spike glycoprotein (s), envelope glycoprotein (e), membrane glycoprotein (m), and nucleocapsid protein (n) [ ] . s protein is a large membrane protein which helps in endocytosis of the viral particle into the host cell by binding to the angiotensin-converting enzyme (ace ) receptor [ ] . e protein is a small viral transmembrane protein which helps in the assembly of the virions by forming ion channels inside the host cell [ ] . m protein is the most abundant protein present on the viral membrane, important for morphogenesis and viral assembly [ ] . n protein is responsible for the packaging of the viral genome and plays an important role in viral assembly [ ] . the genome also codes for overlapping polyproteins, namely polyprotein a (pp a) and polyprotein ab (pp ab); pp a being a truncated version of pp ab. these code for non-structural proteins (nsps) including proteases ( c-like protease and papain-like protease) and rna-processing enzymes like rnadependent rna polymerase, helicase, '- ' exonuclease, endoribonuclease, guanine n methyltransferase, 'o-ribose methyltransferase and adp ribose phosphatase [ ] . in addition to structural and nsps, sars-cov- genome also codes for at least two other viroporin candidates (other than the e protein), namely orf a and orf [ ] . the function and the possible contribution of other orfs in the viral infection are largely unknown. there is a global race to sequence the sars-cov- to understand its variability among the population and identify the unique mutation. gujarat biotechnology research centre (gbrc), gujarat india, actively involved in the complete genome sequencing of sar-cov- strains from india. more than complete genome sequences of sars-cov- strains from the state have been completed. the samples were collected from the covid- testing center following the guidelines of indian council of medical research. 
the complete genome sequencing of the samples was carried out with gene-specific primer sets on the ion s ™ next-generation sequencing system (thermo fisher scientific, usa) following the manufacturer's protocol. in this study, we report the unique mutations in sars-cov- genome isolates from india. the two samples, from a male and a female aged (husband and wife), were procured from the covid- testing facility in gujarat, india. the husband and wife contracted the infection from their son, who acquired it through local/community transmission. both recovered without severe symptoms of covid- . complete genome sequence analysis of sars-cov- isolated from both patients showed a size of bp each (accession numbers mt and mt ), in contrast with the bp size of the wuhan strain. however, the single nucleotide deletions in the genomes of the virus isolates from these two patients were at different positions. furthermore, the sequences revealed a total of ten mutations each across the genome compared with the sars-cov- strain from wuhan. the details of the mutations in the indian sars-cov- isolates are provided in table . the genome sequences of sars-cov- from the virus isolates of two patients in gujarat, india, were analyzed and compared with the isolate from wuhan, china. a single nucleotide change in the viral genome can change the encoded amino acid, which might result in a conformational change in a viral structural protein [ ]. the genomes of the virus isolates from both patients showed a point mutation, c t (numbered by genome position), that lies in the ' untranslated region ( ' utr). it is unlikely that this mutation has any substantial effect on viral replication. the nucleotide change c t in the protein orf ab results in the substitution of threonine, a polar uncharged amino acid, with the hydrophobic amino acid isoleucine. threonine residues are well known to form hydrogen-bond interactions with surrounding polar residues in the interior of a protein structure.
the amino acid residues around this threonine at position corresponding to the polyproteinorf ab correspond to k-f-d-t-f-n-g-e-c-p suggesting that threonine might form hydrogen bonds with surrounding polar residues. perhaps it might also assist in the formation of protein bend caused due to glycine and proline having the position of i, i+ with respect to each other. isoleucine being hydrophobic might disturb any conformational protein bend due to its inability to form a hydrogen bond. this mutation might also result in increased hydrophobic interactions with surrounding phenylalanine residues. since very little is known about nsp or any of its homologous structure, no assertion at the structural level could be made. in the native form, nsp is bound with the cofactors nsp and nsp to form the rna dependent rna polymerase (rdrp) complex. the mutation of c t at nsp in indian isolates of sars-cov- lead to the substitution of proline with threonine. the proline residue at position corresponding to the polyprotein results in an unusual turn in the alpha-helix in which it resides [ ] . interestingly, the α -helix is one of the sites for the interaction of nsp with the cofactor, nsp . however, this mutation of proline causes significant changes into the structure due to the disturbance of that turn in that α -helix (fig. a) . this would probably change the entire structure of that interacting site, leading to the decreased affinity of rdrp to nsp which likely disturbs its functioning. mutation study in the -d structure of nsp ( mutation of a g causes the substitution of aspartic acid with glycine. this mutation in the s protein lies in the small loop turn between two anti-parallel beta-strands. these betastrands participate actively in the trimerization of spike protein which is vital for binding of s protein to its receptor ace [ ] . replacing the aspartic acid residue with the much more flexible glycine might end up in disturbing the β -strands. 
prediction of effect of the mutation on the secondary structure (fig. a) suggests that this mutation is likely to disrupt one of the four beta-strands in that region, thus possibly inhibiting the trimerization of spike protein on the surface of the virus. this could significantly decrease the affinity of s protein to ace receptors as well. as a result of the mutation of g t, glutamine gets substituted into histidine in the (fig. b) suggests that the c-terminal of this protein ( - amino acids) in the wild-type orf protein was found to have four uniformly-spaced beta-strands. however, this mutation was found to essentially chop off the fourth beta-strand. the highly organized secondary structure of the c-terminal of the protein is likely to have some function in the viral replication or infectivity. this mutation truncating the fourth beta-strand probably disturbs its function. thus, this mutation might be severely affecting the functionality of the orf protein. we also found this same mutation in the virus isolate from an unrelated individual suggesting that this mutation might be relatively common. this result has a striking similarity with that of an in-vivo experiment reported earlier on sars cov- where a -nucleotide deletion in the orf gene was the most obvious genetic change in the early stage of human-to-human transmission of sars cov- [ ] . moreover, the -nucleotide deleted sars cov- strain had a -fold less viral replication as compared to its wild type, suggesting that this mutation effectively attenuated the virus. the conspicuous similarity with our result might shed light into the attenuation of the indian strain of sars-cov- in humans. it would be interesting to further explore its possibility for vaccine development. interestingly, the mutation in orf was found when a at th position was found to be deleted. however, another stop codon was formed nucleotides farther due to this can compromise the glycosylation process. 
presence of glycine and proline residues around this region is expected to give the structure kink and expose these residues for a potential o-linked glycosylation process. hence, this mutation of serine into isoleucine is expected to heavily inhibit and compromise this modification in this region. the authors declare no conflict of interest. zhang h, penninger jm, li y, zhong n, slutsky as ( ) angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target. intensive care med : - comparison between the predicted secondary structure of wild-type orf protein (upper) and e stop mutated orf protein (lower). the e stop mutation essentially truncates off the last beta-strand ( th to th amino acid) in the secondary structure of the protein. the other column includes, protein pertaining to the mutation site (protein annotation), reference severe acute respiratory syndrome coronavirus (sars-cov- ): an overview of viral structure and host response role of severe acute respiratory syndrome coronavirus viroporins e, a, and a in replication and pathogenesis sars: prognosis, outcome and sequelae in-silico approaches to detect inhibitors of the human severe acute respiratory syndrome coronavirus envelope protein ion channel the architecture of sars-cov- transcriptome severe acute respiratory syndrome coronavirus (sars-cov- ) and coronavirus disease- (covid- ): the epidemic and the challenges comparative and kinetic analysis of viral shedding and immunological responses in mers patients representing a broad spectrum of disease severity attenuation of replication by a nucleotide deletion in sars-coronavirus acquired during the early stages of human-to-human transmission formation and transfer of disulphide bonds in living cells a new integrated symmetrical table for genetic codes a new coronavirus associated with human respiratory disease in china evidence for gastrointestinal infection of sars-cov- structural basis for the 
recognition of sars-cov- by full-length human ace we would like to thank the department of science and technology, government of gujarat for financial support. nucleotide at the same position in wuhan sars-cov- sequence, the mutated nucleotide (allele nucleotide), and the corresponding amino acid change due to the point mutation (amino acid mutation) in the full length genome sequence of the indian virus isolates. protein annotation key: cord- -vm k hfx authors: rothan, hussin a.; stone, shannon; natekar, janhavi; kumari, pratima; arora, komal; kumar, mukesh title: the fda- approved gold drug auranofin inhibits novel coronavirus (sars-cov- ) replication and attenuates inflammation in human cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vm k hfx sars-cov- has recently emerged as a new public health threat. herein, we report that the fda-approved gold drug, auranofin, inhibits sars-cov- replication in human cells at low micro molar concentration. treatment of cells with auranofin resulted in a % reduction in the viral rna at hours after infection. auranofin treatment dramatically reduced the expression of sars-cov- -induced cytokines in human cells.
these data indicate that auranofin could be a useful drug to limit sars-cov- infection and associated lung injury due to its anti-viral, antiinflammatory and anti-ros properties. auranofin has a well-known toxicity profile and is considered safe for human use. gold-based compounds have shown promising activity against a wide range of clinical conditions and microorganism infections. auranofin, a gold-containing triethyl phosphine, is an fda-approved drug for the treatment of rheumatoid arthritis since ( ). it has been investigated for potential therapeutic application in a number of other diseases including cancer, neurodegenerative disorders, hiv/aids, parasitic infections and bacterial infections ( , ) . recently, auranofin was approved by fda for phase ii clinical trials for cancer therapy. the mechanism of action of auranofin involves the inhibition of redox enzymes such as thioredoxin reductase, induction of endoplasmic reticulum (er) stress and subsequent activation of the unfolded protein response (upr) ( ) ( ) ( ) ( ) . inhibition of these redox enzymes leads to cellular oxidative stress and intrinsic apoptosis ( , ) . in addition, auranofin is an anti-inflammatory drug that reduces cytokines production and stimulate cell-mediated immunity ( ) . the dual inhibition of inflammatory pathways and thiol redox enzymes by auranofin makes it an attractive candidate for cancer therapy and treating microbial infections. coronaviruses are a family of enveloped viruses with positive sense, single-stranded rna genomes ( ) . sars-cov- , the causative agent of covid- , is closely related to severe acute respiratory syndrome coronavirus (sars-cov- ) ( , ). it is known that er stress and upr activation contribute significantly to the viral replication and pathogenesis during a coronavirus infection ( ) . infection with sars-cov- increases the expression of the er protein folding chaperons grp , grp and other er stress related genes to maintain protein folding ( ) . 
cells overexpressing the sars-cov spike protein and other viral proteins exhibit high levels of upr activation ( , ) . thus, inhibition of redox enzymes such as thioredoxin reductase and induction of er stress by auranofin could significantly affect sars-cov- protein synthesis ( ) . in addition, sars-cov- infection causes acute inflammation and neutrophilia that leads to a cytokine storm with over expression of tnf-alpha, monocyte chemoattractant protein (mcp- ) and reactive oxygen species (ros) ( ). the severe covid- illness represents a devastating inflammatory lung disorder due to cytokines storm that is associated with multiple organ dysfunction leading to high mortality ( , ) . taken together, these studies suggest that auranofin could mitigate sars-cov- infection and associated lung damage due to its anti-viral, anti-inflammatory and anti-ros properties. auranofin has a well-known toxicity profile and is considered safe for human use. we investigated the anti-viral activity of auranofin against sars-cov- and its effect on virus-induced inflammation in human cells. we infected huh cells with sars-cov- (usa-wa / ) at a multiplicity of infection (moi) of for hours, followed by the addition of µm of auranofin ( , ) . dmso ( . %) was used as control (the solvent was used to prepare drug stock). cell culture supernatants and cell lysates were collected at and hours after infection. virus rna copies were measured by rt-pcr using two separate primers specific for the viral n gene and n gene ( , ) . as depicted in figure taken together these results demonstrate that auranofin inhibits replication of sars-cov- in human cells at low micro molar concentration. we also demonstrate that auranofin treatment resulted in significant reduction in virus-induced inflammation. these data indicate that auranofin could be a useful drug to limit sars-cov- infection and associated lung injury. 
further animal studies are warranted to evaluate the efficacy of auranofin for the management of sars-cov- associated disease. in this study, we used a novel sars-cov- (usa-wa / ) isolated from an oropharyngeal swab from a patient in washington, usa (bei nr- ). virus strain was amplified once in vero e cells and had titers of x plaque-forming units (pfu)/ml. huh cells (human liver cell line) were grown in dmem (gibco) supplemented with % heat-inactivated fetal bovine serum. cells were infected with sars-cov- or pbs (mock) at a multiplicity of infection (moi) of for hours ( , , , ) . cell were washed twice with pbs and media containing different concentrations of auranofin (sigma) or dmso (sigma) was added to cells. supernatants and cell lysates were harvested at and hours after infection. virus rna levels were analyzed in the supernatant and cell lysates by quantitative reverse transcription-polymerase chain reaction (qrt-pcr). rna from cell culture supernatants was extracted using a viral rna mini kit (qiagen) and rna from cell lysates was extracted using a rneasy mini kit (qiagen) as described previously ( , , ) . qrt-pcr was used to measure viral rna levels using previously published primers and probes specific for the sars-cov- . forward ( ′-gaccccaaaatc agcgaaat- ′), reverse ( ′-tctggttactgccagttgaatctg- ′) for mrna analysis of il- , il- β, tnfα and nf-kb, cdna was prepared from rna isolated from the cell lysates using a iscript™ cdna synthesis kit (bio-rad, hercules, ca, usa), and qrt-pcr was conducted as described previously ( , , ) . the primer sequences used for qrt-pcr are listed in table . 
auranofin: repurposing an old drug for a golden new age auranofin exerts broad-spectrum bactericidal activities by targeting thiol-redox homeostasis repurposing auranofin, ebselen, and px- as antimicrobial agents targeting the thioredoxin system repurposing auranofin as an antifungal: in vitro activity against a variety of medically important fungi repurposing auranofin for the treatment of cutaneous staphylococcal infections the combination of alcohol and cigarette smoke induces endoplasmic reticulum stress and cell death in pancreatic acinar cells the unfolded protein response: controlling cell fate decisions under er stress and beyond biologic actions and pharmacokinetic studies of auranofin. the american journal of medicine the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak hlh across speciality collaboration uk. covid- : consider cytokine storm syndromes and immunosuppression coronavirus infection, er stress, apoptosis and innate immunity comparative host gene transcription by microarray analysis early after infection of the huh cell line by severe acute respiratory syndrome coronavirus and human coronavirus e comparative analysis of the activation of unfolded protein response by spike proteins of severe acute respiratory syndrome coronavirus and human coronavirus hku the ab protein of sars-cov is a luminal er membrane-associated protein and induces the activation of atf role of endoplasmic reticulum-associated proteins in flavivirus replication and assembly complexes covid- , cytokines and immunosuppression: what can we learn from severe acute respiratory syndrome? 
clinical and experimental rheumatology integrated microrna and mrna profiling in zika virus-infected neurons favipiravir and ribavirin inhibit replication of asian and african strains of zika virus in different cell models z-dna-binding protein is critical for controlling virus replication and survival in west nile virus encephalitis a guinea pig model of zika virus infection cellular microrna- regulates virus-induced inflammatory response and protects against lethal west nile virus infection deletion of pregnancy zone protein and murinoglobulin- restricts the pathogenesis of west nile virus infection in mice identification of host genes leading to west nile virus encephalitis in mice brain using rna-seq analysis integrated analysis of micrornas and their disease related targets in the brain of mice infected with west nile virus national institute of neurological disorders and stroke, grant (r od ) from the office of the director, national institutes of health, and institutional funds. key: cord- -lkdkdque authors: rayko, mikhail; komissarov, aleksey title: quality control of low-frequency variants in sars-cov- genomes date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lkdkdque during the current outbreak of covid- , research labs around the globe submit sequences of the local sars-cov- genomes to the gisaid database to provide a comprehensive analysis of the variability and spread of the virus during the outbreak. we explored the variations in the submitted genomes and found a significant number of variants that can be seen only in one submission (singletons). while it is not completely clear whether these variants are erroneous or not, these variants show lower transition/transversion ratio. these singleton variants may influence the estimations of the viral mutation rate and tree topology. we suggest that genomes with multiple singletons even marked as high-covered should be considered with caution. 
we also provide a simple script for checking variant frequency against the database before submission. sequencing of viral genomes has allowed researchers to track the worldwide distribution of viruses and to assess the rate of viral evolution. this task is especially important during active outbreaks, when newly arisen mutations may affect test systems and vaccines under development. during the current pandemic of sars-cov- , the primary resource for consolidating genomic data is the gisaid database (shu et al., ). as a result of the collaborative efforts of researchers worldwide, on april , it contained over , sars-cov- genomes from different countries, sequenced and assembled using various technologies and approaches. unfortunately, these sequences are not error-free. different sequencing technologies are characterized by different types and frequencies of errors (ma et al., ). often these sequencing errors are not random and are typical for certain sequence contexts such as homopolymers. according to a search of the gisaid metadata for the keywords "nanopore" or "minion", at least % of the genomes submitted by april , were sequenced using oxford nanopore technology (ont). ont is error-prone, and ont data require careful polishing. another source of systematic errors can be the use of a fixed set of primers, which leads to the enrichment of some regions over others. while many assemblers assume more or less uniform coverage, primer sets (such as those described at https://github.com/cdcgov/sars-cov- _sequencing ) are commonly used to enrich the sequences. the use of pcr enrichment may result in low coverage of individual genomic regions (even when average coverage is high), which can be a serious source of errors in downstream analysis.
in addition to sequencing errors, the variety of genome assembly methods makes it difficult to compare data, and the lack of access to raw data makes it impossible to reassemble data using a standardized approach. prompt access to original raw sequencing data is needed to perform accurate and reproducible analysis. gisaid database curators do a tremendous job of filtering submitted sequences, but sometimes it is difficult to distinguish real variants from errors, especially in the absence of information about coverage. here we compared variants across the submissions and developed a pipeline to separate real variants from potential errors based on their frequency across all genomes in the database. we hypothesize that variations observed in a single genome from the dataset (hereinafter referred to as singletons) may be erroneous, and that one should proceed with caution, or perhaps even exclude singleton-containing genomes from downstream applications until additional evidence is available from other samples. dataset , full-length (> , bp) sequences of sars-cov- were downloaded from the gisaid database ( www.epicov.org ) on april , , including , genomes marked as "high coverage". sequences were aligned to the reference genome (ncbi refseq nc_ . ) using minimap (li, ). the resulting vcf files were merged using the mergevcfs tool from the picard toolkit ( http://broadinstitute.github.io/picard/ ). snvs were annotated using snpeff . t (cingolani et al., ). ts/tv was calculated as a direct transition/transversion ratio on a filtered set of snvs (not considering their multiplicity, and excluding indels and ns). scripts for data analysis and visualization are available at https://github.com/ablab/covid _variation_analysis . we used the following keywords to search the gisaid database for information about sequencing methods: "illumina", "nanopore", "ion torrent", "sanger", "dnbseq".
we used the following keywords to search the gisaid database for information about assembly methods: "artic", "phe", "spades", "dnbseq", "megahit", "clc", "ivar", and "seattle". to identify assembly methods based on mapping of raw reads, we used the following keywords: "mpileup", "bwa", "bowtie", or "mapping". after filtering out all variants containing ns, there are variants in , positions (out of , bp). , of them were identified as singletons. figure illustrates the quantity and distribution over the genome of snvs of different multiplicity. we explored the transition/transversion (ts/tv) ratio for variants observed at different frequencies. for singletons this ratio is lower than for more frequent variants (table ). a lower ts/tv ratio is associated with false positive calls (e.g. wang et al. , guo et al. ) and may indicate errors introduced during sequencing or assembly. currently, genomes with more than . % singleton mutations (i.e. more than snps) are automatically excluded from the "high-covered" category in the gisaid database. when comparing the fraction of genomes marked as "high-covered" among genomes that contain singletons, we see no significant difference for genomes with or fewer singletons. however, the fraction of high-covered genomes among those with or more singletons is significantly lower (p< . , χ test) than among those that do not contain any singletons (see table ). all variants in coding regions were annotated with snpeff. singletons showed a slightly higher proportion of frameshift indels and stop-gained mutations, which are most probably erroneous (see table ). we also see a significant difference in the percentage of synonymous variants between singleton and non-singleton snps. however, it is not clear whether this difference reflects errors or not. genomes marked as "high covered" in the gisaid database must not contain indels unless verified by the submitter. thus, it is important to compare the indels obtained with those already present in the database.
out of indels from samples collected and submitted to gisaid before april st, , we observed only non-singletons (see supplemental table ). of these non-singleton indels are observed in at least one high-coverage (hc) genome, and some of them were already described and checked (bal et al., ; su et al., ). we extracted information about sequencing technology by searching for keywords in the metadata, and estimated the number of genomes with singletons. we expected to see an elevated number of singleton-containing genomes in the oxford nanopore results. however, it turned out that the proportion of singleton-containing genomes for illumina and ont data is almost the same (see table ). because there is only a small amount of sanger and dnbseq data in the database at the moment, these results may change over time. we then compared different assembly methods (see table ). we found the lowest number of singletons in genomes assembled by specialized virus-tailored pipelines, such as artic network ( https://artic.network/ncov- ), phe (public health england), ivar (grubaugh et al., ) and the seattle flu assembly pipeline ( https://github.com/seattleflu/assembly ). de novo assembly with megahit (li et al., ) shows a substantial number of singletons, so such results should probably be interpreted with caution. the full data are shown in supplementary table . percentage of singleton-containing genomes depending on assembly method. *"other" category includes all custom pipelines, rarely used tools and samples with incomplete or absent information about assembly methods. for each day from january to april we computed the total number of known variants and the number of variants shared by more than one genome up to that date ( figure ). we found that the data on new variants have a two-week delay. this time delay should also be taken into account when analyzing the data, especially if one links it to other more rapidly updated data, such as infection statistics. 
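the per-day counts of known and shared variants can be sketched as follows. this is an illustrative sketch under assumptions: the text does not state which date field was used, so each genome is given a single hypothetical date, and the function name and toy records are not taken from the authors' scripts.

```python
from collections import Counter
from datetime import date


def cumulative_variant_counts(records, days):
    """For each day, return (total known variants, variants seen in >1 genome).

    records: iterable of (genome_date, variants) pairs, one per genome.
    days:    iterable of dates at which the cumulative counts are reported.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    counts = Counter()
    result = {}
    i = 0
    for day in sorted(days):
        # add every genome dated on or before this day
        while i < len(ordered) and ordered[i][0] <= day:
            counts.update(set(ordered[i][1]))
            i += 1
        total = len(counts)
        shared = sum(1 for n in counts.values() if n > 1)
        result[day] = (total, shared)
    return result


# Toy records (hypothetical dates and variant labels).
records = [
    (date(2020, 1, 10), {"v1"}),
    (date(2020, 1, 20), {"v1", "v2"}),
    (date(2020, 2, 1), {"v2", "v3"}),
]
days = [date(2020, 1, 15), date(2020, 1, 25), date(2020, 2, 5)]
timeline = cumulative_variant_counts(records, days)
for day, (total, shared) in timeline.items():
    print(day.isoformat(), total, shared)
```

the gap between the two curves at a given date is the number of variants that are still singletons, which is where a reporting delay of the kind described above would show up.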
based on our results, we suggest that singletons can serve as indicators of potentially erroneous assembly. we provide a script that quickly reports the frequencies of a sample's snvs in the current sars-cov- sample database; this result can be used as a sanity check before submission. a recently sequenced sample may well contain a real snv that is not yet present in the database. however, if you see many of them, we recommend checking these locations (e.g. in genome visualization software such as tablet (milne et al., )). the described artifacts may even influence estimates of the viral mutation rate and phylogenetic tree topology. we suggest that genomes with multiple singletons, even those marked as high-covered, should be used with caution. although the described methods can reveal some potential errors in the database, reliable quality control for individual samples is not possible without access to reads. we hope to extend this work when more raw sequencing data connected to gisaid genomes becomes available in public databases. 
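a pre-submission sanity check of this kind could look roughly like the sketch below. it is not the script the authors provide: the function names, the tuple representation of snvs and the `max_unseen` threshold are hypothetical choices made for illustration.

```python
def snv_frequencies(sample_snvs, db_counts, n_genomes):
    """Fraction of database genomes carrying each SNV of the sample."""
    return {snv: db_counts.get(snv, 0) / n_genomes for snv in sample_snvs}


def sanity_check(sample_snvs, db_counts, n_genomes, max_unseen=2):
    """Flag a sample for manual review if it has many SNVs unseen in the database.

    max_unseen is an arbitrary illustrative threshold, not a recommended value.
    """
    freqs = snv_frequencies(sample_snvs, db_counts, n_genomes)
    unseen = sorted(snv for snv, freq in freqs.items() if freq == 0)
    return {
        "frequencies": freqs,
        "unseen": unseen,
        "needs_review": len(unseen) > max_unseen,
    }


# Toy database summary (hypothetical): number of genomes carrying each SNV.
db_counts = {(241, "C", "T"): 80, (3037, "C", "T"): 60}
sample = [(241, "C", "T"), (9999, "A", "G")]

report = sanity_check(sample, db_counts, n_genomes=100)
print(report["unseen"], report["needs_review"])
```

an unseen snv is not necessarily an error, so the flag is best read as a prompt to inspect the read alignment at those positions rather than as a filter.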
molecular characterization of sars-cov- in the first covid- cluster in france reveals an amino-acid
a program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain w ; iso- ; iso-
an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar
the effect of strand bias in illumina short-read sequencing data
analysis of error profiles in deep next-generation sequencing data
megahit: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de bruijn graph
minimap : pairwise alignment for nucleotide sequences
using tablet for visual exploration of second-generation sequencing data
incorporating indel information into phylogeny estimation for rapidly emerging pathogens
gisaid: global initiative on sharing all influenza data - from vision to reality
discovery of a -nt deletion during the early evolution of sars-cov-
genome measures used for quality control are dependent on gene function and ancestry
table . full table with percentage of singleton-containing genomes depending on sequencing and assembly method. "other" category includes all custom pipelines, rarely used tools and samples with incomplete or absent information about assembly methods. "mapping" category includes all assembly methods with words "mpileup", "bwa", "bowtie", or "mapping" in the description.
key: cord- -jyp gjh authors: grant, rogan a.; morales-nebreda, luisa; markov, nikolay s.; swaminathan, suchitra; guzman, estefany r.; abbott, darryl a.; donnelly, helen k.; donayre, alvaro; goldberg, isaac a.; klug, zasu m.; borkowski, nicole; lu, ziyan; kihshen, hermon; politanska, yuliya; sichizya, lango; kang, mengjia; shilatifard, ali; qi, chao; argento, a. 
christine; kruser, jacqueline m.; malsin, elizabeth s.; pickens, chiagozie o.; smith, sean; walter, james m.; pawlowski, anna e.; schneider, daniel; nannapaneni, prasanth; abdala-valencia, hiam; bharat, ankit; gottardi, cara j.; budinger, gr scott; misharin, alexander v.; singer, benjamin d.; wunderink, richard g. title: alveolitis in severe sars-cov- pneumonia is driven by self-sustaining circuits between infected alveolar macrophages and t cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jyp gjh some patients infected with severe acute respiratory syndrome coronavirus- (sars-cov- ) develop severe pneumonia and the acute respiratory distress syndrome (ards) [ ]. distinct clinical features in these patients have led to speculation that the immune response to virus in the sars-cov- -infected alveolus differs from other types of pneumonia [ ]. we collected bronchoalveolar lavage fluid samples from patients with sars-cov- -induced respiratory failure and patients with known or suspected pneumonia from other pathogens and subjected them to flow cytometry and bulk transcriptomic profiling. we performed single cell rna-seq in bronchoalveolar lavage fluid samples collected from patients with severe covid- within hours of intubation. in the majority of patients with sars-cov- infection at the onset of mechanical ventilation, the alveolar space is persistently enriched in alveolar macrophages and t cells without neutrophilia. bulk and single cell transcriptomic profiling suggest sars-cov- infects alveolar macrophages that respond by recruiting t cells. these t cells release interferon-gamma to induce inflammatory cytokine release from alveolar macrophages and further promote t cell recruitment. our results suggest sars-cov- causes a slowly unfolding, spatially-limited alveolitis in which alveolar macrophages harboring sars-cov- transcripts and t cells form a positive feedback loop that drives progressive alveolar inflammation. 
this manuscript is accompanied by an online resource: https://www.nupulmonary.org/covid- / one sentence summary sars-cov- -infected alveolar macrophages form positive feedback loops with t cells in patients with severe covid- . secondary to other pathogens or non-pneumonia controls (intubated for reasons other than pneumonia) according to a standardized adjudication procedure (see methods). patients were designated as viral pneumonia if a multiplex pcr for respiratory viruses obtained by nasopharyngeal swab or bal was positive in the absence of an alternative diagnosis based on quantitative culture of bal fluid or multiplex pcr. a complete list of viral and bacterial pathogens identified as the cause of pneumonia is in table . for some patients without covid- the diagnosis of pneumonia was made based on clinical suspicion, radiographic findings and response to antimicrobial therapy in the absence of an identified pathogen. for all patients with covid- , samples were collected from regions of greatest chest radiograph abnormality by a critical care physician using a disposable bronchoscope. the majority of samples prior to the pandemic were collected by respiratory therapists using non-bronchoscopic bronchoalveolar lavage (nbbal) with the catheter directed to the most radiographically affected lung [ ] . compared with patients with known or suspected pneumonia requiring mechanical ventilation, patients with severe sars-cov- pneumonia were similar in age, race, and sex, but had a significantly higher body mass index and a lower proportion of patients were current smokers, compared with historic control populations diagnosed with other viral pneumonias, non-viral pneumonia, or non-pneumonia control patients (figure b-g) . patients with severe sars-cov- pneumonia required longer periods of ventilation and had longer lengths of stay in the icu compared with all pneumonia and non-pneumonia controls (figure h-i) . 
severity of illness estimated using the sequential organ failure assessment (sofa) score and the acute physiology score (aps) was similar in patients with sars-cov- pneumonia compared with other pneumonia and was comparable to that observed in a recent study of ards [ ] (supplemental figure s c and table ). nevertheless, mortality was significantly lower (q = . , chi-square tests of proportions) in patients with sars-cov- pneumonia, compared with the entire cohort ( % vs %), but was not significantly different as compared with any individual comparison group (figure j) . figure . demographics of the script cohort. we compared bal fluid obtained sequentially from patients with severe sars-cov- pneumonia requiring mechanical ventilation with patients with confirmed pneumonia secondary to other respiratory viruses (other viral pneumonia) and patients with non-viral pneumonia (other pneumonia) and mechanically ventilated patients without pneumonia undergoing bal (non-pneumonia controls). a. timing of hospital admission (square), bal collection (diamonds) and length of mechanical ventilation (bold red line) and hospital stay (thin grey line) in patients with severe covid- grouped by outcomes (crossed open red circles -death; green circles -discharged home; light blue circles -discharged to inpatient facility). day is defined as the day of the first intubation. an average of . (range - ) bal samples were collected per patient. b. distribution of patient ages. differences not significant by pairwise t-test with fdr correction. c. sex. proportions of females (red) and males (blue) were similar between groups (pairwise chi-square tests of proportions with continuity and fdr correction). d. self-reported race. proportions were similar between groups (pairwise chi-square tests of proportions with continuity and fdr correction). e. self-reported ethnicity. 
a significant increase in the proportion of individuals identifying as hispanic or latino was observed in the covid- cohort, relative to all controls (q < . , pairwise chi-square tests of proportions with continuity and fdr correction). f. self-reported smoking status. significantly fewer active smokers were observed in the covid- cohort, as compared with all control groups (q < . , pairwise chi-square tests of proportions with continuity and fdr correction). g. body mass index (bmi). bmi was elevated in the covid- cohort relative to patients with viral pneumonia, and other pneumonia, but not to non-pneumonia controls (q < . , t-test with fdr correction). h. icu length of stay. icu length of stay was longer in the covid- cohort, relative to all control cohorts (pairwise t-tests with fdr correction). i. duration of mechanical ventilation. the duration of mechanical ventilation was significantly increased in the covid- cohort relative only to non-pneumonia controls (pairwise t-tests with fdr correction). j. mortality in patients with covid- was similar to patients with other respiratory viruses, non-viral pneumonia and non-pneumonia controls. differences are not significant (q > . ) unless otherwise noted. figure s : overview of the study and biomarkers. a. sankey diagram illustrating relationship between number of participants with covid- , other viral pneumonia, non-viral pneumonia (other pneumonia) and non-pneumonia controls ) enrolled in the script study ( patients), ) analyzed via flow cytometry ( patients), ) bulk rna-seq on flow-sorted alveolar macrophages ( patients) and ) single-cell rna-seq ( patients). some samples were cryopreserved and sorted post-cryorecovery. since cryopreservation affects the number of granulocytes, these samples were not included in flow cytometric analysis, but were used for bulk rna-seq profiling of flow-sorted alveolar macrophages. skipped analyses are represented by alluvia flowing over the grouping bars. b. 
sankey diagram illustrating relationship between number of bal samples from participants with covid- , other viral pneumonia, non-viral pneumonia (other pneumonia) and non-pneumonia controls ) enrolled in the script study ( samples), ) analyzed via flow cytometry ( samples), ) bulk rna-seq on flow-sorted alveolar macrophages ( samples) and ) single-cell rna-seq ( samples). c. the sequential organ failure assessment (sofa) score, the acute physiology score (aps) and inflammation biomarkers: creatine phosphokinase (cpk), c-reactive protein (crp), ddimer, ferritin, lactate dehydrogenase (ldh), procalcitonin (q = . x - , pairwise wilcoxon rank-sum tests with fdr correction). green-shaded area indicates normal range. to define the immune cell profile over the course of severe sars-cov- -induced pneumonia, we analyzed samples from patients with confirmed covid- in our cohort. we compared those samples with samples from patients with severe pneumonia secondary to other causes. we used multicolor flow cytometry with a panel of markers sufficient to identify neutrophils, cd + and cd + t cell subsets, macrophages and monocytes in the bal fluid (figure s a ,b) [ , ] . we first focused analysis on patients who had a sample collected within hours of intubation, comparing samples from patients with confirmed covid- to samples from patients with severe pneumonia secondary to other causes. we found that, despite a diagnosis of severe ards requiring mechanical ventilation, only . % of patients with severe covid- exhibited neutrophilia in bal fluid within hours of intubation, defined as a neutrophil frequency over % (figure a-c) . instead, we found that in patients with severe sars-cov- pneumonia, the alveolar space was significantly enriched for both cd + and cd + t cells and monocytes (figure a,b) . this observation significantly differs from patients with pneumonia secondary to other respiratory viruses, bacteria or mechanically ventilated non-pneumonia controls [ ] . 
this pattern was not due to the differences in co-infections between different types of pneumonia (figure d) . collectively, these findings suggest that unusual factors related to sars-cov- pneumonia result in the recruitment of t cells and monocytes, rather than neutrophils, to the infected alveolus. pneumonia are characterized by neutrophilia while bal fluid from most patients with sars-cov- pneumonia is enriched for cd + and cd + t cells and monocytes. samples were clustered by euclidean distance using ward's method. b. proportions of cells detected within hours of intubation. proportion of cd + and cd + t cells is increased in the covid- cohort (q < . , pairwise wilcoxon rank-sum tests with fdr correction) and proportion of neutrophils is reduced in these patients, relative to non-viral pneumonia controls (q < . , pairwise wilcoxon ranksum tests with fdr correction). comparisons are not significant unless otherwise noted. c. averaged cell-type compositions in the first hours of intubation, binned by diagnosis. d. comparison of rates of co-infection. coinfection was defined by detection in a single sample of any bacterial or fungal pathogen listed in table by culture or multiplex pcr analysis. no significant differences were observed between the covid- cohort and pneumonia controls (q ≥ . , pairwise chi-square tests of proportions with continuity and fdr correction). transcriptomic analysis of flow-sorted alveolar macrophages identifies an interferon response signature as a unique feature of sars-cov- pneumonia. we performed rna-seq on flow-sorted alveolar macrophages from patients with severe sars-cov- pneumonia ( samples from patients) and compared them with patients with pneumonia secondary to other causes ( samples from patients with any other viral pneumonia, samples from patients with non-viral pneumonia), non-pneumonia controls ( samples from patients) and healthy volunteers ( samples from volunteers). 
k-means clustering of the , most variable genes across diagnoses using a likelihood-ratio test (lrt) identified clusters ( . interestingly, il , the protein product of which has been implicated in cytokine storm in severe covid- and serves as a predictor of morbidity and mortality in severe covid- [ ] , was not among differentially expressed genes and its overall expression was not different between the groups with the exception of healthy controls, where il transcripts were never detected (supplemental figure b) . we then asked whether we can detect sars-cov- transcripts in flow-sorted alveolar macrophages and whether the presence of viral transcripts correlates with specific changes in alveolar macrophage transcriptomes. to allow detection of viral rna, we aligned sequences to a hybrid genome including human, sars-cov- and influenza a/california/ / reference genomes. an additional negative strand transcript, which is transiently formed during viral replication, was added to the sars-cov- transcriptome to detect replicating virus [ ] . we queried macrophage transcriptomes from all patients for the presence of positive-and negative-strand transcripts encoding sars-cov- . we detected sars-cov- transcripts in macrophage transcriptomes from % of patients with pcr-confirmed sars-cov- infection. in % of these patients, we detected both positive-and negative-strand sars-cov- transcripts (figure b, supplemental figure c ). as would be expected during recovery, we also observed a significant negative correlation between abundance of sars-cov- transcripts in patients with confirmed covid- and days of ventilation (ρ = - . , spearman correlation; supplemental figure d ). in order to identify more specific macrophage gene modules, particularly those that distinguish pneumonia type and outcome, we performed weighted gene coexpression network analysis (wgcna) (figure c , supplemental data file ). 
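the spearman correlation used above (viral transcript abundance versus days of ventilation) is a pearson correlation computed on ranks. the sketch below shows that calculation in plain python; the function names and the toy values are hypothetical and are not data from this study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (hypothetical): days on ventilator vs. viral transcript counts.
days_ventilated = [1, 3, 5, 8, 12]
viral_counts = [900, 400, 450, 50, 10]
rho = spearman(days_ventilated, viral_counts)
print(round(rho, 3))
```

because only ranks enter the calculation, the coefficient is insensitive to the heavy right skew that is typical of transcript counts.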
as predicted, we identified some modules related to composition of the bulk samples, correlating strongly with flow cytometry results. module exhibited a strong association with the percentage of cd hi alveolar macrophages, and was characterized by genes expressed in mature tissueresident alveolar macrophages, including fabp and pparg (r = . ) [ ] . module exhibited a strong negative correlation with percentage of cd hi macrophages, and was identified by expression of genes found in immature monocyte-derived alveolar macrophages including spp (r = - . ). strikingly, module correlated with the detection of sars-cov- transcripts, levels of c-reactive protein, cd + t cell abundance, and covid- diagnosis (r = . , . , . , . , respectively). notably, all sars-cov- genes included in this analysis were assigned to this module, further underscoring disease relevance. in confirmation of the k-means clustering results, this module was highly enriched for type i and type ii interferon response genes (go: type i interferon signaling pathway; go: interferongamma-mediated signaling pathway). this association was further confirmed by umap projection of all bulk rna-seq samples, which separated largely by diagnosis with module as a major driver of clustering (figure d ). significant negative correlations of average module expression, type i interferon signaling (go: ), and type ii interferon signaling (go: ) were observed over the time-course of mechanical ventilation, suggesting reductions during the recovery phase (figure e -g). (figure b) . all macrophage clusters were represented by cells from - patients (supplemental figure s a) . as our analysis of transcriptomic data from alveolar macrophages suggested that sars-cov- pneumonia is uniquely associated with the activation of pathways induced by interferons, we looked for the expression of type i interferons in our single cell dataset. we did not detect type i interferon expression in any of the sequenced cells (data not shown). 
similarly, we did not detect type i interferon transcripts in other publicly available single cell rna-seq datasets obtained from bal fluid [ , ]. in contrast, expression of the type ii interferon, ifng, was detected in t cells from all patients (figure c, supplemental figure b). these results suggest the interferon response gene signature we observed in flow-sorted alveolar macrophages is induced by ifnγ released from activated t cells. we next looked for sars-cov- positive- and negative-strand transcripts within our data. as expected, both were detected in epithelial cells and migratory ccr + dendritic cells. interestingly, they were also detected in moam and tram subclusters (figure d, supplemental figure c ). coronaviruses generate large numbers of positive strand transcripts from a single negative strand transcript [ , ]. consistent with this known biology, we detected more transcripts for positive compared with negative strands in both our single cell and bulk rna-seq data (supplemental figure d). these data suggest that alveolar macrophages harbor virus and may support viral replication, as has been reported for sars-cov and mers-cov [ ] [ ] [ ]. interestingly, tram harboring sars-cov- showed distinct clustering compared with uninfected tram (clustering was performed without sars-cov- transcripts). importantly, these tram also clustered differently when compared with tram from a patient with bacterial pneumonia (supplemental figure e , f, supplemental data file ). marker genes distinguishing infected tram from non-infected tram included several chemokines and cytokines important for t cell recruitment. these include ccl , which attracts dendritic cells and t cells via ccr and only weakly attracts neutrophils (figure b, e), cxcl and cxcl , which attract t cells and plasmacytoid dendritic cells via cxcr , and ccl , which attracts monocytes, nk and t cells via ccr (figure b, f). 
these cells also expressed genes important in the innate immune response to virus. these include il b (figure b, g) , tnfsf , a member of tnf ligand family, and the antimicrobial peptide defensin b (defb ) (figure b, supplemental data file ) . finally, these cells were marked by increased expression of interferon response genes (isg , ifit , ifit , ifit , mx and others). consistent with this observation, we found higher average levels of expression of a curated list of ifn-response genes in infected (tram ) compared with non-infected (tram ) alveolar macrophages (figure b,h; supplemental figure g) . these results indicate a positive feedback loop in which sars-cov- -infected alveolar macrophages release chemokines that recruit t cells, which release ifnγ to further activate the infected alveolar macrophages. high serum levels of il- and its transcriptional target c-reactive protein have been observed in patients with severe covid- . il- induces the transcription of clotting factors in the liver and tissue factor in the endothelium that promote thrombosis [ ] . these mechanisms are thought to contribute to the microvascular thrombosis in the lung and other tissues observed in autopsy studies conducted on patients with covid- , which in turn is thought to contribute to multiple organ dysfunction [ , ] . as increased transcripts encoding il- have not been observed in single cell rna-seq studies of the peripheral blood [ ] , some have suggested il- is generated by inflammatory cells in the alveolus [ ] . the overall expression of il was low, and was not restricted to a specific cell type in our single cell rna-seq dataset. despite this, the levels of c-reactive protein at the time of intubation were increased in patients with severe covid- , relative to the upper limit of the normal range of mg/l (p = . x - , one-sample wilcoxon rank-sum test; supplemental figure s c ). 
these results suggest that the resident and recruited immune cells in the infected alveoli do not directly contribute to systemic increase of il- (supplemental figure s h ). severe sars-cov- pneumonia unfolds slowly over time. our studies of bal fluid collected from patients with covid- within hours of intubation revealed an already established feedback loop between activated t cells and infected alveolar macrophages. this stable, slowly amplifying feedback loop is consistent with the long duration between symptom onset and the development of respiratory failure in patients with covid- ( - days) and the prolonged course of mechanical ventilation in many of these patients (figure a , h, i) [ ] . a prediction from this model is that these circuits will persist over time until the virus is cleared. hence, we examined serial bal fluid samples collected from patients with severe sars-cov- pneumonia over time (figure a, b) . we used clinical measures performed on the bal fluid (quantitative culture or multiplex pcr for respiratory pathogens) to identify bacterial superinfections, defined as detection of a new bacterial pathogen acquired during the hospitalization. in the absence of a detected bacterial superinfection, bal fluid remained non-neutrophilic in % of samples from patients admitted with sars-cov- pneumonia. bal neutrophilia developed in % of patients with covid- who developed bacterial superinfection. hierarchical clustering of the composition of bal samples from all time points demonstrated that in comparison to samples from other types of pneumonia samples from patients with covid- were enriched for t cells irrespective of the time of bal collection (figure a, b) or presence or absence of superinfection (figure c ). 
our data suggest that composition of immune cells in sars-cov- pneumonia is relatively stable and continues to unfold slowly over time, resulting in a loss of tissue-resident alveolar macrophages and continuous recruitment of monocyte-derived alveolar macrophages. to test this hypothesis, we performed cell type deconvolution to estimate the proportion of individual cell types in bulk rna-seq data from flowsorted alveolar macrophages using cell type-specific signatures from single cell rna-seq. this approach confirmed that only a small portion of the samples from patients with pneumonia contained tissue-resident alveolar macrophages. instead, the majority of alveolar macrophages were mature monocyte-derived alveolar macrophages (moam ) (figure d, supplemental figure i ). interestingly, samples from patients with covid- contained more immature moam alveolar macrophages than samples from patients with other types of pneumonia, supporting our hypothesis that sars-cov- pneumonia is characterized by the ongoing recruitment of monocyte-derived alveolar macrophages. (figure a ) was performed using signatures from integrated single-cell rna-seq (figure a) . asterisks indicate statistical significance, wilcoxon rank-sum tests with fdr correction: * p < . , ** p < . , *** p < . . e. schematic illustrating interpretation of the main findings. . normal alveolus contains ace -expressing alveolar type and type cells (at and at , correspondingly) and tissue-resident alveolar macrophages (tram). . sars-cov- infects at cells and tram. infected tram are activated and express il b and t cell chemokines. . cross-reactive or de novo generated t cells recognize sars-cov- antigens presented by tram and produce ifnγ, further activating tram to produce cytokines and chemokines. . activated t cells proliferate and continue to produce ifnγ, eventually leading to death of infected tram and recruitment of monocytes, which rapidly differentiate into monocyte-derived alveolar macrophages (moam). 
moam in turn produce chemokines that further promote t cell and monocyte recruitment, thus sustaining pneumonia. . recruited moam either become mature and repopulate alveoli or become infected with sars-cov- , continuing to present antigens to t cells and maintain the feedback loop. our data represent the earliest sampling of the alveolar space of patients with covid- pneumonia reported to date [ , ] . it is possible to use these data to speculate about early events that lead to pneumonia in patients with sars-cov- infection (figure e) . strong evidence suggests sars-cov- initially infects and replicates in epithelial cells in the nasopharynx, which express relatively high levels of ace in comparison with epithelial cells in airways or the distal lung [ , ] . there the virus replicates and escapes clearance by suppressing type i interferon responses [ ] [ ] [ ] and broadly disrupting protein translation [ ] . whether by progressive movement distally in the tracheobronchial tree or via aspiration of nasopharyngeal contents, some virus gains access to the distal alveolar space. in the alveolar space, we confirm that sars-cov- infects alveolar epithelial cells and alveolar macrophages [ ] . alveolar macrophages might be infected with sars-cov- by ) phagocytosis of infected epithelial cells, ) via direct infection, as was shown for sars-cov and mers-cov [ , ] or ) through other mechanisms, such as antibody mediated enhancement as was shown for sars-cov [ , ] . indeed, it is estimated that ~ % of adults have antibodies against one of four major seasonal coronaviruses [ ] , and antibodies cross-reacting with sars-cov- might facilitate viral entry into alveolar macrophages. 
while tissue-resident alveolar macrophages are poor antigen-presenting cells and do not migrate to lymph nodes to participate in conversion of naive t cells into effector t cells [ ], a low level of antigen presentation by alveolar macrophages in the alveolar space might be sufficient to drive activation of pre-existing memory t cells that target endemic seasonal coronaviruses, but cross-react with sars-cov- . the existence of such cross-reactive memory t cells has been reported for sars-cov [ ] and, recently, sars-cov- [ ] [ ] [ ] [ ]. this mechanism might explain the epidemiology of sars-cov- , which disproportionately affects elderly individuals, while children and young adults often have only mild symptoms, despite having a viral load in the upper airways comparable to that of adults [ , ]. children and young individuals have had fewer encounters with seasonal coronaviruses during their lifetime than elderly individuals and therefore would have fewer cross-reactive antibodies or memory t cells. we did not detect substantial expression of type i interferons in our datasets. this could be related to the timing of sampling, undersampling of cell types producing type i interferons, or inhibition of type i interferon production by sars-cov- proteins [ , ]. however, we readily detected ifnγ production by t cells. this supports a model where infection and activation of tissue-resident alveolar macrophages with sars-cov- , followed by antigen presentation to cross-reactive or sars-cov- -specific t cells, leads to a delayed interferon response. this in turn amplifies the inflammatory response by recruiting proinflammatory monocyte-derived macrophages, as has been suggested from animal models of sars-cov infection [ ]. 
interestingly, in agreement with our observation that the inflammatory environment in the lung during severe covid- is characterized by the paucity of neutrophils, we found that alveolar macrophages produced chemokines preferentially driving recruitment of t cells and monocytes, but not neutrophils. our results also explain the slow progression of sars-cov- pneumonia relative to other respiratory viruses, most notably influenza a. specifically, the time from the onset of symptoms to respiratory failure in patients with sars-cov- infection is - days, compared with - days or even less in patients with influenza a infection [ ] . consistent with a slower course, the duration of illness is longer in patients with severe sars-cov- infection. while both viruses might gain access to the distal lung by similar mechanisms, the sialic acid residues that serve as receptors for influenza a are abundantly expressed in alveolar type cells [ ] . in contrast, single cell rna-seq atlases of the human lung and sm-fish studies of the normal lung show that only a small number of alveolar epithelial cells express ace [ , ] . thus, while influenza a would infect large numbers of cells causing widespread injury, rapid viral replication and robust antiviral responses, infection by sars-cov- is likely to lead to more localized areas of infection. these more localized areas of infection and injury are consistent with radiographic patterns of early covid- , and the presence of discrete radiographic abnormalities in asymptomatic covid- patients [ ] . because tissue-resident alveolar macrophages have limited motility [ ] , the uptake and persistence of sars-cov- -infected alveolar macrophages would not be predicted to spread the virus to adjacent lung regions. however, recruited monocyte-derived alveolar macrophages, which we found can also harbor replicating sars-cov- , are typically more numerous, and could conceivably spread the virus to adjacent alveoli. 
in each new area of infection, positive feedback loops between alveolar macrophages harboring the virus and activated t cells could promote ongoing injury and systemic inflammation. this model would also explain the heterogeneity we observed in alveolar macrophages recovered from the same individual. different stages of sars-cov- infection might be spatially localized to subsegmental regions such that the bal procedure samples alveoli in various stages of infection. further studies using spatial techniques and a time-course analysis of patients with early disease will address these questions. human subjects: all human subjects research was approved by the northwestern university institutional review board. samples from patients with covid- , viral pneumonia, other pneumonia and nonpneumonia controls were collected from participants enrolled in successful clinical response in pneumonia therapy (script) study stu . alveolar macrophages from healthy volunteers were obtained under study stu . all subjects or their surrogates provided informed consent. patients ≥ years of age with suspicion of pneumonia based on clinical criteria (including but not limited to fever, radiographic infiltrate, and respiratory secretions) were screened for enrollment into the script study. inability to safely obtain bal or nbbal were considered exclusion criteria. in our center, patients with respiratory failure are intubated based on the judgement of bedside clinicians for worsening hypoxemia, hypercapnia, or work of breathing refractory to high-flow oxygen or non-invasive ventilation modes. extubation occurs based on the judgement of bedside clinicians following a trial of spontaneous breathing in patients demonstrating physiologic improvement in their cardiorespiratory status during their period of mechanical ventilation. 
management of patients with covid- was guided by protocols published and updated on the northwestern medicine website as new information became available over the pandemic. clinical laboratory testing, including studies ordered on bronchoalveolar lavage fluid, was at the discretion of the care team; however, quantitative cultures, multiplex pcr (biofire filmarray respiratory panel), and automated cell count and differential were recommended by local icu protocols. most patients also underwent urinary antigen testing for streptococcus pneumoniae and legionella pneumophila on admission. clinicians were encouraged to manage all patients, including those with covid- , according to ardsnet guidelines, including the use of a higher peep/lower fio strategy for those with severe hypoxemia. prone positioning ( hours per day) was performed in all patients with a pao /fio < who did not have contraindications. in those who had a response to prone positioning, evident by improved oxygenation, prone positioning was repeated. esophageal balloon catheters (cooper surgical) were placed at the discretion of the care team to estimate transpulmonary pressure and optimize peep, particularly in patients with a higher than normal bmi. pneumonia category adjudication was performed by five critical care physicians (rgw, jmk, bds, cop, jmw) using a standardized reporting tool (supplemental data file ). clinical laboratory data were obtained from the northwestern medicine enterprise data warehouse using structured query language (sql). aps and sofa scores were generated from the electronic health record using previously validated programming. nbbal and bal procedures: consent was obtained from patients or legal decision makers for the bronchoscopic procedures. bronchoscopic bal was performed in intubated icu patients with flexible, single-use ambu ascope (ambu) devices. patients were given sedation and topical anesthetic at the physician proceduralist's discretion.
vital signs were monitored continuously throughout the procedure. the bronchoscope was wedged in the segment of interest based on available chest imaging or intraprocedure observations; aliquots of cc of normal saline at a time, generally - cc total, were instilled and aspirated back. the fluid returned following the first aliquot was routinely discarded. samples were split (if sufficient return volume was available) and sent for clinical studies, and an aliquot was reserved for research. a similar procedure was applied to non-bronchoscopic bal (nbbal); however, nbbal was performed with directional but not visual guidance, and as usual procedural care by a respiratory therapist rather than a pulmonologist [ ] . for the bronchoscopies performed in the covid- patients, additional precautions were taken to minimize the risk to healthcare workers, including only having essential providers present in the room, clamping of the endotracheal tube, transient disconnection of the inspiratory limb from the ventilator, and preloading of the bronchoscope through the adapter; sedation and neuromuscular blockade to prevent cough were administered for these procedures at the physician's discretion. in most cases the early bronchoscopy was performed immediately after intubation. flow cytometry and cell sorting: nbbal and bal samples were filtered through a µm cell strainer, pelleted by centrifugation at rcf for min at c, followed by hypotonic lysis of red blood cells with ml of bd pharmlyse reagent for min. lysis was stopped by adding ml of macs buffer. cells were pelleted again and resuspended in µl of fc-block (human trustain fcx, biolegend), and a µl aliquot was taken for counting using k cellometer (nexcelom) with ao/pi reagent. the volume of fc-block was adjusted so the concentration of cells was always less than x cells/ml and the fluorophore-conjugated antibody cocktail was added in a : ratio ( table ) .
after incubation at c for min cells were washed with ml of macs buffer, pelleted by centrifugation and resuspended in ul of macs buffer + ul of sytox green viability dye (thermofisher). cells were sorted on facs aria ii sorp instrument using um nozzle and psi pressure. cells were sorted into ul of macs buffer for bulk rna-seq or ul of % bsa in pbs for single cell rna-seq. sample processing was performed in bsl- facility using bsl- practices. bulk rna-seq of flow-sorted alveolar macrophages: immediately after sorting, cells were pelleted by centrifugation and lysed in ul of rlt plus lysis buffer (qiagen) supplemented with -mercaptoethanol. lysed cells were stored at - c until rna isolation using allprep dna/rna micro kit according to manufacturer's protocol (qiagen). rna quality and quantity were assessed using tapestation high sensitivity rna tapes (agilent) and rna-seq libraries were prepared from pg of total rna using smarter stranded total rna-seq kit v (takara bio). after qc using tapestation high sensitivity dna tapes (agilent) dual indexed libraries were pooled and sequenced on a nextseq instrument (illumina), cycles, single-end, to an average sequencing depth of . m reads. fastq files were generated using bcl fastq (illumina). to enable detection of viral rna, a custom hybrid genome was prepared by joining fasta, gff, and gtf files for grch . , sars-cov- (nc_ . ) -virus causing covid- , and influenza a/california/ / (gcf_ . ), which was the dominant strain of influenza throughout bal fluid collection at our hospital [ ] . an additional negative strand transcript spanning the entirety of the sars-cov- genome was then added to the gtf and gff files to enable detection of sars-cov- replication. normalized counts tables later revealed extremely high enrichment of sars-cov- transcripts in diagnosed covid- patients, and strong enrichment of iav genes in patients marked as other viral pneumonia. 
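The extra negative-strand transcript described above can be illustrated with a minimal sketch. It assumes a nine-column GTF layout and writes one gene, one transcript and one exon record that each span the whole viral genome on the minus strand. The function name, the `custom` source label, the identifiers and the toy genome length are hypothetical and are not taken from the authors' pipeline.

```python
def negative_strand_gtf_records(seqname, genome_length,
                                gene_id="viral_negative_strand"):
    """Return GTF lines for one antisense transcript covering a whole genome.

    seqname and genome_length must match the FASTA record that is added to
    the hybrid genome; gene_id is an illustrative (hypothetical) label.
    """
    attributes = f'gene_id "{gene_id}"; transcript_id "{gene_id}_t1";'
    records = []
    for feature in ("gene", "transcript", "exon"):
        fields = [
            seqname,             # 1. sequence name
            "custom",            # 2. source (hypothetical label)
            feature,             # 3. feature type
            "1",                 # 4. start (1-based)
            str(genome_length),  # 5. end (inclusive)
            ".",                 # 6. score
            "-",                 # 7. strand: negative
            ".",                 # 8. frame
            attributes,          # 9. attributes
        ]
        records.append("\t".join(fields))
    return records


# toy example: a 1000-nt placeholder sequence, not the real viral genome
for line in negative_strand_gtf_records("toy_virus", 1000):
    print(line)
```

Reads that align in the antisense orientation can then be assigned to this feature by a strand-aware counting step, which is what makes the record usable as a marker of replication.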
to facilitate reproducible analysis, samples were processed using the publicly available nf-core/rna-seq pipeline version . . implemented in nextflow . . using singularity . . - with the minimal command nextflow run nf-core/rnaseq -r . . -singleend -profile singularity -unstranded --three_prime_clip_r [ ] [ ] [ ] . briefly, lane-level reads were trimmed using trimgalore! . . and aligned to the hybrid genome described above using star . . d [ ] . gene-level assignment was then performed using featurecounts . . [ ] . bulk differential expression analysis (dea). all analysis was performed using custom scripts in r version . . using the deseq version . . framework [ ] . correspondence between lanes was first confirmed by principal component analysis before merging counts using the command collapsereplicates(). one outlier sample with a low rin score, extreme deviation on pca, and poor alignment and assignment metrics was excluded from downstream analysis. for differential expression analysis (dea), both moam content from flow cytometry data and diagnosis were used as explanatory factors. a "local" model of gene dispersion was employed as this better fit dispersion trends without obvious overfitting; otherwise default settings were used. k-means clustering of bulk samples. for k-means clustering, a custom-built function was used (available at https://github.com/nupulmonary/utils/blob/master/r/k_means_figure.r). briefly, variable genes were identified using a likelihood-ratio test (lrt) with local estimates of gene dispersion in deseq , with diagnosis as the "full" model and a reduced model corresponding to the intercept alone. genes with q ≥ . were discarded. the remaining genes were then clustered using the hartigan-wong method with random sets and a maximum of iterations using the kmeans function in r stats . . . samples were then clustered using ward's method and plotted using pheatmap version . . .
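The lane-merging step mentioned above (collapsereplicates()) amounts to summing raw counts across the technical replicates of each sample. The following sketch shows that arithmetic in plain Python; the dictionary layout and all sample, lane and gene names are hypothetical, and it is an illustration rather than the authors' r code.

```python
from collections import defaultdict


def collapse_replicates(counts, lane_to_sample):
    """Sum gene counts over the lanes that belong to the same sample.

    counts: {lane_id: {gene_id: raw_count}}
    lane_to_sample: {lane_id: sample_id}
    """
    merged = defaultdict(lambda: defaultdict(int))
    for lane, gene_counts in counts.items():
        sample = lane_to_sample[lane]
        for gene, n in gene_counts.items():
            merged[sample][gene] += n
    # convert nested defaultdicts to plain dicts for a clean return value
    return {sample: dict(genes) for sample, genes in merged.items()}


# toy data: sample s1 was sequenced on two lanes, sample s2 on one
lane_counts = {
    "s1_L001": {"geneA": 10, "geneB": 0},
    "s1_L002": {"geneA": 12, "geneB": 3},
    "s2_L001": {"geneA": 5, "geneB": 7},
}
lanes = {"s1_L001": "s1", "s1_L002": "s1", "s2_L001": "s2"}
print(collapse_replicates(lane_counts, lanes))
```

Summing is only appropriate for technical replicates of the same library, which is why agreement between lanes is checked by pca before merging.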
go term enrichment was then determined using fisher's exact test in topgo version . . , with org.hs.eg.db version . . as a reference. weighted gene coexpression network analysis (wgcna). wgcna was performed manually using wgcna version . with default settings unless otherwise noted [ ] . genes with counts > and detection in at least % of samples were included in analysis. to best capture patterns of co-regulation, a signed network was used. using the picksoftthreshold function, we empirically determined a soft threshold of to best fit the network structure. a minimum module size of was chosen to isolate relatively large gene modules. module eigengenes were then related back to patient and sample metadata using biweight midcorrelation. module go enrichment was then determined as above using fisher's exact test in topgo version . . , with org.hs.eg.db version . . as a reference. umap plotting was performed using uwot version . . using the first principal components of the same genes used in wgcna analysis after zscaling and centering, with a minimum distance of . [ ] . default parameters were otherwise used. single cell rna-seq of flow-sorted bal cells: cells were sorted into % bsa in dpbs, pelleted by centrifugation at rcf for min at c, resuspended in . % bsa in dpbs to ~ cells/ul concentration. concentration was confirmed using k cellometer (nexcelom) with ao/pi reagent and ~ , - , cells were loaded on x genomics chip a with chromium single cell ' gel beads and reagents ( x genomics). libraries were prepared according to manufacturer protocol ( x genomics, cg _revm). after quality check single cell rna-seq libraries were pooled and sequenced on novaseq instrument. data was processed using cell ranger . . pipeline ( x genomics). to enable detection of viral rna, reads were aligned to a custom hybrid genome containing grch . and sars-cov- (nc_ . ). 
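The signed network used in the wgcna step above can be sketched as follows. In a signed network the adjacency between two genes is ((1 + cor) / 2) raised to the soft-threshold power, so that strongly anti-correlated genes receive an adjacency near zero. The sketch uses pearson correlation for simplicity, and the power `beta` is a placeholder because the value chosen by the authors is not recoverable from the text; gene names and expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    r = cov / (sx * sy)
    # clamp to [-1, 1] to guard against floating-point overshoot
    return max(-1.0, min(1.0, r))


def signed_adjacency(expr, beta):
    """Signed adjacency: a_ij = ((1 + cor_ij) / 2) ** beta.

    expr: {gene_id: [expression per sample]}
    beta: soft-threshold power (placeholder value, an assumption)
    """
    genes = list(expr)
    return {
        (g, h): ((1.0 + pearson(expr[g], expr[h])) / 2.0) ** beta
        for g in genes
        for h in genes
    }


toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with geneA
}
adj = signed_adjacency(toy, beta=6)
print(round(adj[("geneA", "geneB")], 3), round(adj[("geneA", "geneC")], 3))
```

Raising beta suppresses weak correlations more strongly, which is the trade-off that a soft-threshold scan is used to tune.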
an additional negative strand transcript spanning the entirety of the sars-cov- genome was then added to the gtf and gff files to enable detection of sars-cov- replication. data was processed using scanpy v . . [ ] , doublets were detected with scrublet v . . [ ] and removed, ribosomal genes were removed and multisample integration was performed with bbknn v . . [ ] . gene set enrichment analysis was performed with signatures retrieved from gsea-msigdb.org website [ ] using following terms: hallmark_interferon_gamma_response m , hallmark_interferon_alpha_response m . computations were automated with snakemake v . . [ ] . deconvolution of bulk rna-seq alveolar macrophage signatures was performed using autogenes v . . [ ] and signatures derived from integrated single cell rna-seq object. for details of the functions used and their parameters see the code https://github.com/nupulmonary/ _grant. statistical analysis: all statistical analysis was performed using r version . . . for all comparisons, normality was first assured using a shapiro-wilk test and manual examination of distributions. for parameters exhibiting a clear lack of normality, nonparametric tests were employed. in cases of multiple testing, p-values were corrected using false-discovery rate (fdr) correction. adjusted p-values < . were considered significant. visualization: all plotting was performed using ggplot version . . , with the exception of heatmaps, which were generated using pheatmap version . . . sankey/alluvial plots were generated using ggalluvial version . . [ ] . figure layouts were generated using patchwork version . and edited in adobe illustrator . bulk rnaseq: counts tables and metadata are included as supplemental data files and . single-cell rna-seq: counts tables and integrated objects are available through geo with accession number gse . raw data is in the process of being deposited to sra/dbgap. code: https://github.com/nupulmonary/ _grant. 
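The multiple-testing step in the statistical analysis above can be illustrated with a short sketch. The text states only that false-discovery rate correction was used; the sketch assumes the benjamini-hochberg procedure, which is a common choice but is not named in the source, and the p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```

Each adjusted value is then compared with the chosen significance threshold in the same way as an unadjusted p-value would be.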
tables: table : demographics of the script cohort, grouped by diagnosis. table : pneumonia-causing pathogens detected in the script cohort. table : flow cytometry panels, reagents and instruments configuration. supplementary data file : data related to k-means clustering and differential expression analysis (dea) of bulk rna-seq samples: gene cluster assignments, cluster go cluster is part of quest, northwestern university's high performance computing facility, with the purpose to advance research in genomics and world peace rogan grant was supported by nih grant luisa morales-nebreda was supported by t hl and f hl gr scott budinger was supported by nih grants u ai , p ag , r hl and veterans affairs grant i cx misharin was supported by nih grants u ai , p ag , r hl , r hl and nucats covid- rapid response grant singer was supported by nih awards k hl , u ai , r hl , and p ag wudnerink was supported by nih grant u ai and a glaxosmithkline distinguished scholar in respiratory health grant from the chest foundation diagnosis: covid- (n = ) diagnosis: non-pneumonia control (n = ) diagnosis: other pneumonia (n = ) diagnosis: other viral pneumonia clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study covid- does not lead to a "typical" acute respiratory distress syndrome. american journal of respiratory and critical care medicine real estimates of mortality following covid- infection clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study effect of dexamethasone in hospitalized patients with covid- : preliminary report. infectious diseases (except hiv/aids). 
medrxiv multidimensional assessment of alveolar t cells in critically ill patients early neuromuscular blockade in the acute respiratory distress syndrome multidimensional assessment of the host response in mechanically ventilated patients with suspected pneumonia elevated levels of il- and crp predict the need for mechanical ventilation in covid- the architecture of sars-cov- transcriptome single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis single-cell landscape of bronchoalveolar immune cells in patients with covid- cross-talk between the airway epithelium and activated immune cells defines severity in covid- . infectious diseases (except hiv/aids). medrxiv continuous and discontinuous rna synthesis in coronaviruses active replication of middle east respiratory syndrome coronavirus and aberrant induction of inflammatory cytokines and chemokines in human macrophages: implications for pathogenesis cytokine responses in severe acute respiratory syndrome coronavirus-infected macrophages in vitro: possible relevance to pathogenesis antibody-dependent infection of human macrophages by severe acute respiratory syndrome coronavirus metformin targets mitochondrial electron transport to reduce air-pollution-induced thrombosis thrombosis and covid- pneumonia: the clot thickens! 
the european respiratory journal: official journal of the european society for clinical respiratory physiology immunology of covid- : current state of the science a single-cell atlas of the peripheral immune response in patients with severe covid- covid- severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes sars-cov- reverse genetics reveals a variable infection gradient in the respiratory tract imbalanced host response to sars-cov- drives development of covid- sars-cov- orf b is a potent interferon antagonist whose activity is further increased by a naturally occurring elongation variant comparative replication and immune activation profiles of sars-cov- and sars-cov in human lungs: an ex vivo study with implications for the pathogenesis of covid- structural basis for translational shutdown and immune evasion by the nsp protein of sars-cov- anti-severe acute respiratory syndrome coronavirus spike antibodies trigger infection of human immune cells via a ph-and cysteine protease-independent fcγr pathway prevalence of antibodies to four human coronaviruses is lower in nasal secretions than in serum division of labor between lung dendritic cells and macrophages in the defense against pulmonary infections t-cell response profiling to biological threat agents including the sars coronavirus presence of sars-cov- reactive t cells in covid- patients and healthy donors targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls selective and cross-reactive sars-cov- t cell epitopes in unexposed humans epidemiology of covid- among children in china an analysis of sars-cov- viral load by patient age. infectious diseases (except hiv/aids). 
medrxiv impaired type i interferon activity and inflammatory responses in severe covid- patients dysregulated type i interferon and inflammatory monocyte-macrophage responses cause lethal pneumonia in sars-cov-infected mice prevalence of asymptomatic sars-cov- infection: a narrative review immune surveillance of the lung by migrating tissue monocytes fluview and flunet: tools for influenza activity and surveillance nextflow enables reproducible computational workflows the nf-core framework for community-curated bioinformatics pipelines singularity: scientific containers for mobility of compute star: ultrafast universal rna-seq aligner featurecounts: an efficient general purpose program for assigning sequence reads to genomic features moderated estimation of fold change and dispersion for rna-seq data with deseq wgcna: an r package for weighted correlation network analysis umap: uniform manifold approximation and projection for dimension reduction scanpy: large-scale single-cell gene expression data analysis scrublet: computational identification of cell doublets in single-cell transcriptomic data bbknn: fast batch alignment of single cell transcriptomes gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles snakemake--a scalable bioinformatics workflow engine automatic gene selection using multi-objective optimization for rnaseq deconvolution layered grammar for alluvial plots northwestern university flow cytometry core facility is supported by nci cancer center support grant p ca awarded to the robert h. lurie comprehensive cancer center. 
cell sorting was performed on bd facsaria sorp cell sorter purchased through the support of nih s od - . this research was supported in part through the computational resources and staff contributions provided by the genomics compute cluster which is jointly supported by the feinberg school of medicine, the center for genetic medicine, and feinberg's department of biochemistry and molecular genetics, the office of the provost, the office for research, and northwestern information technology. the genomics compute
key: cord- -g cpjp authors: brunaugh, ashlee d.; seo, hyojong; warnken, zachary; ding, li; seo, sang heui; smyth, hugh d.c. title: broad-spectrum, patient-adaptable inhaled niclosamide-lysozyme particles are efficacious against coronaviruses in lethal murine infection models date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: g cpjp niclosamide (nic) has demonstrated promising in vitro antiviral efficacy against sars-cov- , the causative agent of the covid- pandemic. though nic is already fda-approved, the oral formulation produces systemic drug levels that are too low to inhibit sars-cov- . as an alternative, direct delivery of nic to the respiratory tract as an aerosol could target the primary site of sars-cov- acquisition and spread. we have developed a niclosamide powder suitable for delivery via dry powder inhaler, nebulizer, and nasal spray through the incorporation of human lysozyme (hlys) as a carrier molecule. this novel formulation exhibits potent in vitro and in vivo activity against mers-cov and sars-cov- and may protect against methicillin-resistant staphylococcus aureus pneumonia and inflammatory lung damage occurring secondary to cov infections.
the suitability of the formulation for all stages of the disease and low-cost development approach will ensure wide-spread utilization. in terms of pharmaceutical development, a major limitation of nic as an antiviral therapy stems from its poor solubility in water, which is reported at . mg/l ( ) . this makes systemic absorption of the drug by the oral route of administration at therapeutically relevant concentrations difficult. administration of the existing fda-approved chewable tablet formulation was found to result in inadequate systemic concentrations for inhibiting sars-cov- replication ( , ) . as an alternative, direct delivery of niclosamide to the lung could overcome the limitations of the oral nic formulation by generating high drug concentrations at the site of infection. for use against p. aeruginosa, costabile et al previously developed an inhalable dry powder consisting of nic nanocrystals embedded in mannitol particles ( ) . however, these particles required a high amount of polysorbate to ensure production of a stable suspension ( % w/w to nic), which is beyond what is used in currently fda-approved inhaled products ( ). furthermore, the utilization of mannitol as the carrier system may induce bronchospasm and cough ( ) ( ) ( ) , which may contribute to increased risk of spread of sars-cov- through generation of aerosolized respiratory droplets from infected individuals. an additional considerable challenge in developing therapies for covid- is the variable presentation of illness. patients may act as asymptomatic carriers of the virus or develop severe pneumonias and acute respiratory disease ( ) , which can result in the requirement for mechanical ventilation. in the case of ventilated patients, a nebulizer is often used to deliver aerosolized drug to the lungs.
however, for treatment of asymptomatic carriers or in regions of developing economies with reduced access to clean water sources, nebulizer-based aqueous products may present an undue burden and reduce therapy compliance. for these populations, dry powder inhaler (dpi) or nasal spray systems, or a combination of both, would be the preferred option based upon the rapid administration time and ease of use, and could also potentially be utilized as a prophylactic therapy in high risk populations such as healthcare workers and first responders. utilizing repurposed nic, and with the goal of developing a therapeutically effective, rapidly scalable and globally distributable antiviral therapy to reduce the spread of sars-cov- , we describe an inhalable nic formulation that can be administered using three major models of respiratory tract delivery systems: dpi, nasal spray and nebulizer. to achieve these aims, we utilized human lysozyme (hlys), an endogenous protein in the upper and lower respiratory tracts, as a therapeutically active matrix material for the delivery of nic to the airways, based upon its known anti-inflammatory ( ) , antiviral ( ) ( ) ( ) ( ) , and anti-bacterial activity ( ) ( ) ( ) as well as its surface active properties. the antiviral, antibacterial, and anti-inflammatory efficacy of the nic-hlys powders was evaluated in vitro and in vivo in mers-cov and sars-cov- infection mouse models. the composition of the nic-hlys formulation was optimized using a constrained mixtures design of experiments (doe) to achieve particles with a size appropriate for inhalation both in the dry powder state and when reconstituted in aqueous media. physicochemical characterization of the optimized powder was performed and nasal, dpi, and nebulizer systems were developed and tested.
to determine the utility of hlys as a carrier molecule for the nasal and pulmonary delivery of nic, we assessed the in vitro antiviral activity of our novel formulation against a lysozyme-free nic suspension (nic-m). nic-hlys particles ( . % w/w nic) were administered at varying doses (based upon nic content) to vero e cells infected with mers-cov or sars-cov- , and the ec was calculated based upon observed cytopathic effect (cpe). the addition of hlys to the nic formulation resulted in improved antiviral activity based upon reductions in the ec dose for mers-cov ( . µg/ml nic to . µg/ml nic) and sars-cov- ( . µg/ml to . µg/ml) (fig a and b) . we assessed the antiviral dose-response of nic-hlys particles in a separate assay utilizing qpcr quantification of viral rna collected from infected cells. at the highest dose tested ( . µg/ml nic), vero cells with an established mers-cov infection exhibited an . % ± . % decrease in viral load compared to untreated controls after hours of exposure to nic-hlys particles ( fig d) . this effect was sustained at hours post-drug exposure ( . % ± . %; mean ± sd). a similar dose-response profile was achieved in vero cells with an established sars-cov- infection ( fig e) . a dose of . µg/ml nic resulted in a -hour inhibition of . % ± . % relative to untreated controls. however, in contrast to the mers-cov response, inhibition dropped to . % ± . % at hours post-exposure in the sars-cov- infected cells. a separately conducted trypan blue viability assay in uninfected vero e cells determined that the highest dose of nic-hlys utilized ( . µg/ml) had no effect on cell viability versus untreated controls ( . % viability in treated cells). interestingly, hlys alone appeared to exhibit activity against sars-cov- (fig f) , which has not been previously reported. though the inhibitory activity was not as potent as that of nic, it may contribute to the observed increase in potency of nic-hlys in the cpe-based ec assay.
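The qpcr-based inhibition values above are reported as a percentage decrease in viral load relative to untreated controls, summarized as mean ± sd. A minimal sketch of that arithmetic follows. It assumes each treated replicate is normalized to the mean of the untreated controls, which the text does not state explicitly, and all numbers are invented.

```python
from statistics import mean, stdev


def percent_inhibition(treated_loads, control_loads):
    """Percent decrease in viral load versus the mean untreated control.

    Returns (mean, sample standard deviation) across treated replicates.
    Normalizing each replicate to the control mean is an assumption.
    """
    control_mean = mean(control_loads)
    values = [100.0 * (1.0 - t / control_mean) for t in treated_loads]
    return mean(values), stdev(values)


# toy viral loads in arbitrary units, not data from the study
avg, sd = percent_inhibition([10.0, 20.0, 30.0], [100.0, 100.0])
print(f"{avg:.1f}% ± {sd:.1f}%")
```

A negative value under this definition would mean that the treated wells carried more viral rna than the controls.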
nic-hlys, nic-m and nic dissolved in dmso (nic-dmso) were compared for their inhibitory activity against sars-cov- at a nic dose of . µg/ml. a two-way anova with tukey's multiple comparisons test revealed a statistically significant increase in viral inhibition of the nic-hlys particles versus nic-dmso at hours (p < . ) and hours (p = . ), though no significant differences were noted in the viral inhibition of nic-hlys and nic-m formulations at the tested dose ( fig c) . thus, an improvement in solubility does not appear to be the mechanism for the increased activity of nic-hlys. , where an initial drop in activity is preceded by a sharp rise in activity. this profile was also noted in inhibitory assays for s. aureus. hlys alone exhibits some antiviral activity which was more pronounced at hours post dosing than hours (f). data presented as mean + sem (n = ). ***p < . , ****p < . using two-way anova with tukey's multiple comparisons test. the in vivo efficacy of nic-hlys particles was assessed in lethal infection models for both mers-cov and sars-cov- . hddp transgenic mice were inoculated intranasally with × pfu mers-cov and rested for hours, after which once daily treatment with varying doses of intranasal nic-hlys (dosed based on the nic component) was initiated (fig a) . in this initial efficacy study, animals were sacrificed at day to determine viral titres in brain and lung tissue compared to untreated controls. a notable decrease in brain viral titres was observed at the µg/kg dose, although this difference was not statistically significant (two-way anova with dunnett's multiple comparisons test, p = . ) (fig c) . in a follow-up survival study utilizing a lethal inoculum ( × pfu), mers-cov-infected hddp mice were dosed at µg/kg nic-hlys daily by the intranasal route. by day (study endpoint), % of treated mice had survived compared to % of untreated controls (fig b) . surviving mice were left untreated for an additional days, at which point they were sacrificed.
during this period of no treatment, the survival percentage remained at %. a statistically significant decrease in lung viral titres was noted in these surviving mice compared to the day untreated controls from the earlier efficacy study (two-way anova with dunnett's multiple comparisons test, p = . ). while brain viral titres did not exhibit further reduction from levels noted in the preliminary efficacy study, the inoculation of vero e cells with viral particles obtained from lung and brain homogenates of surviving animals resulted in no observation of cpe at any of the inoculum concentrations tested, which indicates that remaining viral particles were not active. thus, in the % of surviving animals it appears that the lethal mers-cov infection was essentially cured. serological assays revealed that surviving animals expressed anti-mers cov igg antibodies. though notable in a lethal infection model, survival probability differences between treated and untreated mice were not statistically significant (mantel-cox test, p = . ). in a similar study, the efficacy of nic-hlys particles was assessed in hace transgenic mice infected intranasally with a lethal dose of sars-cov- ( × pfu) (fig. a) . after a -hour rest period, daily intranasal treatment with . % sodium chloride (untreated control) or nic-hlys ( µg/kg nic) was initiated. a portion of mice from each group (n = ) were sacrificed at day p.i. to determine viral loads in brain, kidney and lung tissues, while remaining animals were utilized in a -day survival study. non-significant changes in day viral titers were observed (two-way anova with dunnett's multiple comparisons test) (fig. c) . by the day study endpoint, % of treated mice had survived, compared to % in the untreated arm ( fig b) . similar to the mers-cov study, these surviving mice were left untreated for an additional days, at which point they were sacrificed (day p.i.).
the surviving mice exhibited a statistically significant decrease in viral loads in lung tissue compared to day p.i. untreated controls (two-way anova with dunnett's multiple comparisons test, p = . ), and no virus particles were detected in brain and kidney tissue using qpcr (fig. c ). inoculation of vero e cells with -fold dilutions of tissue homogenates resulted in no observed cpe, and surviving animals were seropositive for anti-sars-cov- antibodies. similar to the mers-cov model, differences in survival probability between treated and untreated mice were not significant (mantel-cox test, p = . ). in both mers-cov and sars-cov- infection models, the lung tissue of infected and nic-hlys-treated mice ( fig. f and f) showed lower levels of interstitial pneumonia than that of infected and non-treated mice ( fig. e and e) on day p.i. inflammation was further reduced in the treated/infected groups on day p.i. (fig. f and f ) and was more comparable to the mock-infected mice ( fig. d and d ), which showed no signs of interstitial pneumonia. . the viral particles obtained from lung and brain homogenates of surviving animals did not produce cpe when administered to vero e cells, indicating that they were no longer active. compared to lung tissue of uninfected mice (d), infection with sars-cov- resulted in the development of interstitial pneumonia without treatment (e). by day of treatment with µg/kg nic, interstitial pneumonia was notably reduced (f) and further resolved by day (f) . data are presented as mean + sem (n = ). *p < . , using two-way anova with dunnett's multiple comparisons test. covid- patients may be at risk for secondary bacterial pneumonias and severe inflammatory lung damage. the efficacy of nic-hlys in the treatment of these important covid- sequelae was therefore assessed. a resazurin-based microbroth dilution assay was performed to determine the inhibitory activity of several nic formulations (nic-hlys, nic-bsa, nic-m, and nic-dmso).
compared to the other nic formulations, nic-hlys reached an mic at lower levels of nic ( . µg/ml), and % inhibition was noted at a concentration of . µg/ml (fig a-c). no inhibitory activity was observed for hlys alone. interestingly, all nic formulations exhibited a similar dose-response profile in which a sharp dip in activity preceded the concentrations at which % inhibition was achieved. this same pattern was also noted in the dose-response profiles for anti-mers-cov and anti-sars-cov- activity (fig d and e). plating of the wells in which % inhibition was noted resulted in the growth of colonies at all drug concentrations tested, which indicates that the antimicrobial activity of nic may be bacteriostatic rather than bactericidal. a feared consequence of sars-cov- infection is the development of ards, which is a major contributor to morbidity and mortality and dramatically increases the burden on healthcare systems. ards is caused by the massive release of inflammatory cytokines in the lungs, which occurs in some patients in response to pathogenic infiltration. both nic and hlys are known to exhibit anti-inflammatory activity. the anti-inflammatory activity of two nic formulations, nic-bsa and nic-hlys, was assessed using an acute macrophage inflammation model. nic-hlys and nic-bsa exhibited similar suppression of the inflammatory cytokines il- and tnf-a (fig d and e). the similarity in the levels of these two cytokines across the two formulations tested indicates that the suppression may be related to the activity of nic rather than hlys. suppression of il- b was not observed for either formulation, and the highest concentration of nic-hlys tested ( µg/ml total powder) resulted in a statistically significant increase in il- b compared to both the untreated, lps-stimulated control and an equivalent dose of the nic-bsa formulation (fig f). of note, this is an equivalent powder concentration to the dose exhibiting the highest efficacy in the in vitro antiviral assays.
data presented as mean + sem (n = ), **p < . ,****p < . , using two-way anova with dunnet's multiple comparisons test. nic-hlys significantly reduced production of the inflammatory cytokines il- (d) and tnf-a (g) in thp- macrophages stimulated with ng/ml lipopolysaccharide (lps), though a significant increase in il- b production was noted at the highest dose tested compared to both the untreated control and nic-bsa formulation, which may point towards the role of hlys in inducing production of this cytokine. data presented as mean + sem (n = ), *p < . , ** p < . , ***p < . , ****p < . using one-way anova with sidak's multiple comparisons test. targeted delivery of antivirals to the respiratory tract carries substantive benefits for the treatment of covid- , particularly for compounds with limited oral bioavailability like nic. however, the wide range of symptoms and disease severity associated with covid- makes development of a broadly applicable therapy difficult. the clinical applicability of our novel nic-hlys formulation was assessed by determining the delivery efficiency of the drug using three commercially available respiratory drug delivery platforms: a disposable dpi (twincaps®), a vibrating mesh nebulizer (aerogen solo®) and a nasal spray (vp aptar®). the composition of spray dried nic-hlys particles was optimized using a constrained-mixtures design of experiments (doe) to achieve respirable dry particles (geometric median diameter ≤ µm) that could be easily reconstituted as suspension suitable for nasal spray or nebulizer-based administration. the doe generated several promising powder formulations (supplementary table ), of which formulation (f ) was selected for further characterization. for ambulatory patients, dpis provide a convenient treatment option for lung-targeted delivery. the rapid administration time for the device as well as the compact size improves patient acceptability and compliance. 
a disposable dpi is likely to be preferred in the treatment of covid- , given the currently unknown risks regarding re-infectability. a commercially available disposable dpi, the twincaps® (hovione), was therefore selected as an exemplary delivery platform. nic-hlys powder was successfully delivered by inhalation using the twincaps dpi, with a . ± . µg nic fine particle dose (i.e., recovered drug mass with less than µm aerodynamic diameter) achieved per mg total powder actuation ( . % nic content) when using inhalation flow rate conditions reflective of a healthy patient (fig h). given the potential for shortness of breath and reduced inspiratory abilities that may occur in covid- , the delivery efficiency was also examined using inhalation flow-rate conditions reflective of a patient with reduced lung capacity due to illness or age. similar fine particle doses ( . ± . µg) were achieved in these reduced inhalation flow rate conditions, which indicates a minimal dependence on inspiratory flow rate to achieve successful delivery of the drug to the peripheral lung regions. covid- may result in the need for mechanical ventilation for continued patient survival. the delivery efficiency of reconstituted nic-hlys particles was assessed using an aerogen solo vibrating mesh nebulizer, which can be utilized for aerosol drug delivery in-line with a ventilator circuit. nic-hlys powder reconstituted in . % sodium chloride to a mg/ml concentration (equivalent to µg/ml nic) resulted in the delivery of a fine particle dose of . ± . µg nic after a -minute run time (fig h). furthermore, a range of concentrations ( to mg/ml) could be successfully emitted using the aerogen solo device (fig g). by altering the reconstitution concentration, the dose of nic-hlys could therefore be adjusted if required for pediatric patients, or those with hepatic or renal insufficiencies. the zeta potential of the reconstituted nic-hlys powder was determined to be + .
, in contrast to the poorly performing nic-bsa particles, which exhibited a zeta potential of - . when reconstituted in water. epithelial cells of the upper respiratory tract (i.e., nasal passages) exhibit significantly higher expression of ace receptors than those of the lower respiratory tract, which indicates these tissues may be more prone to infection with sars-cov- ( ). as such, the feasibility of administration of the nic-hlys formulation using a nasal spray was assessed using spray pattern and plume geometry analysis. nic-hlys powders were reconstituted in . % sodium chloride at concentrations ranging from to mg/ml and actuated using a vp aptar® nasal spray device. suitable plume angles and uniform spray patterns for nasal administration were achieved for all tested concentrations (fig i) . an increase in plume angle was noted at increasing powder concentrations which was inversely related to changes in suspension viscosity (supp table ) and may be reflective of the decreased surface tension resulting from the presence of hlys and other surface-active stabilizers. the utilization of a slightly hypotonic reconstitution medium (supp table ) was selected and used in the in vivo studies to improve absorption and potentially assist nose-to-brain penetration of nic-hlys. the decreases in brain and kidney viral titers noted in vivo when nic-hlys was administered intranasally may reflect an achievement of therapeutic drug concentrations outside the respiratory passageways, though this will be evaluated further in a future pharmacokinetic study. nic is a poorly water-soluble drug, which renders the commercially available oral formulation ineffective against respiratory diseases due to the limited absorption of the drug from the gastrointestinal tract. direct delivery to the airways represents a promising alternative to oral delivery, as it would enable achievement of high levels of drug at the site of disease. 
however, limited solubility and delayed dissolution of niclosamide particles in the upper respiratory tract could result in rapid clearance of the particles by the mucociliary escalator or through alveolar macrophage uptake. hlys exhibits surface-active properties, which could enhance the dissolution rate of poorly water-soluble nic particles. to test this hypothesis, the dissolution rate of nic-hlys particles exhibiting an aerodynamic diameter of ~ µm was compared against hlys-free nic particles in simulated lung fluid medium. though the deposited particles exhibited equivalent sizes and surface areas, the inclusion of hlys as a formulation component resulted in a slightly faster rate of dissolution, with . % of the deposited dose of nic dissolved by hours (fig f). this novel system was developed as an alternative to traditional lactose-based carrier systems (c) and enabled the targeted respiratory delivery of nic as a powder via dpi or a reconstituted suspension via nebulizer or nasal spray. the optimized formulation exhibited a size distribution that was appropriate for inhalation (i.e., geometric median diameter < µm) in both the dry powder state as well as when reconstituted using water or . % nacl (d). similar effects could not be achieved when a negatively charged protein, bovine serum albumin, was substituted in the formulation for the positively charged hlys (e). though hlys is surface active, it appeared to only slightly enhance the dissolution of nic compared to nic particles blended with lactose (f). a respirable droplet size distribution could be achieved with multiple different reconstituted concentrations when nebulized using the aerogen solo (g). these concentrations resulted in no aggregation of the lysozyme component. efficient aerosol delivery was achieved with both the nebulizer and disposable dpi, with ~ % of the emitted dose being of an appropriate size for lung deposition.
this was significantly improved compared to a traditional lactose carrier particle system (h). reproducible plume geometry could be achieved using a variety of reconstituted concentrations when actuated using the aptar device (i). data are presented as mean + sem (n = ). *p < . , using two-way anova with tukey's multiple comparisons test (comparisons of dpis presented only). protein aggregation may result in a loss of therapeutic activity or an immunogenic response ( ) ( ) ( ). we have previously demonstrated that hen egg white lysozyme is robust to process-induced aggregation using typical particle engineering techniques ( ), which provided part of the rationale for its selection as a therapeutically active carrier for the aerosol delivery of nic particles. using size exclusion chromatography (sec), we investigated the formation of higher molecular weight hlys aggregates before and after spray drying. no further increases in higher molecular weight aggregates were noted after spray drying (supp fig b) or after nebulization at varying reconstituted concentrations (supp fig c- table ). interestingly, a decrease in the percentage of solubilized aggregates was noted in the spray dried hlys formulations compared to the initial unprocessed hlys product, which may be explained by the shift in secondary structure towards a higher percentage of parallel b-sheet structure upon spray drying, whereas the unprocessed hlys had a higher percentage of anti-parallel b-sheet structure (supp table ). the glass transition temperature (tg) measured for the nic-hlys powder was . °c, which makes it suitable for storage in ambient conditions without risk of further protein degradation. the water content of the spray dried powder was determined to be .
% based upon karl fischer coulometric titration, which is similar to literature reported values for the water content of lysozyme.( ) the co-formulation of the cationic, endogenous protein hlys with micronized nic produced a fold increase in potency against covs and a -fold increase in potency against mrsa compared to nic particles alone. though the inclusion of hlys did slightly increase the dissolution rate of micronized nic particles, this alone cannot explain the increased potency, as solubilized nic exhibited lower antiviral activity at the . µg/ml dose than both nic-m and nic-hlys. hlys plays an important role in the innate immune system, and it is found in abundance at the mucosal surfaces of the respiratory tract as well as in the lysosomal granules of neutrophils and macrophages. ( ) the antibacterial efficacy of hlys stems from both its enzymatic activity, which results in the hydrolysis of the glycosidic bonds linking peptidoglycan monomers in bacterial cell walls, and its cationic activity, which enables insertion of the protein and formation of pores in negatively charged bacterial membranes.( ) to our knowledge, the activity of lysozyme against coronaviruses has not been previously reported. a peptide (hl ) located within the helix-loop-helix motif of hlys has been previously demonstrated to block hiv- viral infection and replication, whereas mutants of hl did not. ( ) the location of this peptide was deemed to be separate from the hydrolytic site. it is possible that this peptide sequence, or another, is responsible for the anti-coronavirus activity of hlys. cationic peptides have been hypothesized to exhibit antiviral activity due to a disruption of the viral particle membrane ( ). hlys may also disrupt various signaling pathways (tgfb, p , nfkb, protein kinase c, and hedgehog signaling) which affect host cell susceptibility to viral infection.
( ) an immunomodulatory response may explain the notable delay in anti-sars-cov- activity at lower doses when hlys was used alone. additionally, a unique up-regulation in il- b activity was noted for the nic-hlys formulation in a macrophage model of inflammatory stimulation, while expression of other inflammatory cytokines (il- , tnf-a) was decreased. il- family cytokines have been previously associated with the induction of antiviral transcriptional responses in fibroblasts and epithelial cells. ( ) in sars-cov infected african green monkeys, significantly lower levels of il- b were noted in the lungs of aged monkeys compared to juvenile monkeys.( ) similarly, elderly mice infected with influenza exhibited lower levels of il- b, and administration of an il- b-augmenting compound improved morbidity and mortality.( ) thus, it is possible that a similar immune-mediated mechanism is contributing to the increased activity of nic-hlys compared to other nic formulations, though this requires further investigation. at physiological ph, hlys exhibits a positive charge while nic exhibits a negative charge. this attraction may contribute to the improved dispersibility of nic suspensions when hlys is utilized as the carrier protein, as demonstrated by the notable aggregation observed when a negatively charged protein, bsa, is used in the formulation. this charge-based interaction may have additional benefits when nic is in the solubilized state. nic has two substituted aromatic rings, the electronegativity of which has been deemed critical for its activity against other viruses ( ). non-covalent interactions between aromatic compounds are a known phenomenon, which occur as a result of the overlap of the π-orbital electron clouds of the aromatic rings ( , ), and π-π stacking of nic is observed in the crystalline form.
( ) though not investigated in this study, it is possible that self-association of nic molecules may disrupt the availability of the strongly electronegative groups necessary for antiviral activity. this may explain why a decrease in antiviral activity was noted with nic solubilized in dmso compared to nic particles. interactions between nic and hlys in the microenvironment may serve as a mechanism to disrupt this self-association and ensure availability of the electronegative functional groups, as lysozyme has been previously demonstrated to complex with negatively charged molecules ( ) and a hydrophobic drug ( ). given the physicochemical properties of nic, both charge-based and hydrophobic interactions with hlys may be important. future studies will examine the interactions and mechanisms of complexation between nic and hlys to further elucidate how the molecular interactions may contribute to improved antiviral activity. at present, there are limited treatment options for covid- . in august , the fda approved remdesivir for the treatment of all hospitalized adult and pediatric patients with covid- , irrespective of the severity of disease.( ) this approval is based upon a statistically significant reduction in median time to recovery for patients treated with remdesivir in a recent clinical trial.( ) however, remdesivir is currently not recommended for use in patients with acute or chronic kidney disease (gfr < ml/min), which may limit its utility in severe covid- infection. this contraindication stems from the incorporation of sulfobutylether-β-cyclodextrin (sbecd), which is utilized as a vehicle for the poorly water-soluble remdesivir in the intravenous product, and which can accumulate in cases of renal failure. clinical trials examining inhalation-based delivery of remdesivir are currently under way ( , ), though details of this aerosol formulation and whether or not it contains sbecd are not provided.
a pre-print of a powder aerosol formulation of remdesivir has also been recently published ( ). these studies overall indicate the clinical interest in the development of an inhalation-based treatment for covid- . based upon our data, nic may be a promising alternative or adjunct therapy to remdesivir for the treatment of covid- . in a previous study examining the efficacy of remdesivir in a lethal mers-cov infection ( × pfu), treatment with remdesivir initiated hours p.i. resulted in a ~ . - log reduction in viral particles in the lung tissue ( ), which is similar to the reduction noted in our study using intranasal nic-hlys. at the day p.i. endpoint of the remdesivir mers-cov study, equivalent survival ( %) was noted in the treated and untreated groups. in comparison, at day p.i., we noted . % survival in treated animals versus % in untreated controls, and at day p.i. % of treated animals had survived while % of untreated animals had survived. notably, these animals continued to survive after treatment was ceased, which was reproduced in the lethal sars-cov- infection model. while it is difficult to make comparisons between the two drugs on survival improvement given differences in study length and inoculum concentration, it appears survival with nic-hlys treatment is comparable if not improved compared to remdesivir treatment in lethal mers-cov infections. expansion of the dosing range or frequency in future studies, as well as the incorporation of prophylactic dosing, may enable a significant improvement in survival to be achieved, similar to what was noted with remdesivir when used prophylactically in vivo. furthermore, we have demonstrated in vitro efficacy of nic-hlys in reducing complications related to covid- , most notably secondary bacterial pneumonia caused by mrsa, and in dampening (but not completely suppressing) tnf-a and il- , which are known markers for covid- disease severity ( ).
this broad-spectrum activity may provide a unique advantage for nic-hlys compared to other leading covid- candidates, as an early report on covid- found that % of patients who died had a secondary infection ( ). though nic appears to be primarily renally cleared ( ), limited data are available regarding the effects that hepatic or kidney failure may have on the toxicity profile of the drug. future studies will evaluate the pharmacokinetic distribution of nic-hlys delivered via the intranasal or pulmonary routes. we have successfully demonstrated nic-hlys is effective in vitro and in vivo for the treatment of covid- , and importantly, scalable inhaled drug delivery systems can be developed based upon the formulation which will enable rapid availability to a global patient population. historically, it has been observed that variations in patient inspiratory force can have profound effects on the magnitude and region of dose delivery to the airways from dpis.( ) one of the most common symptoms of covid- is shortness of breath, reportedly occurring in . % of patients.( ) moreover, patients with severe infection were significantly more likely to have shortness of breath than patients with non-severe infection.( ) nic-hlys powder exhibited efficient aerosol performance using a disposable dpi device even at reduced inspiratory efforts, which will ensure reproducible dose delivery throughout disease progression, as well as in pediatric patients. the powder can be reconstituted at the point of care to form a stable suspension appropriate for nebulization, thus enabling treatment of both ambulatory and severely ill or mechanically ventilated patients. lastly, nic-hlys suspension could be reproducibly actuated using a commercially available nasal spray device and has demonstrated promising antiviral activity in vivo when administered to infected mice via the intranasal route.
the utilization of a slightly hypotonic carrier may improve systemic penetration from the nasal route and promote distribution in the brain ( , ) . sars-cov- has been found to exhibit neuroinvasive properties, and severely affected covid- patients appear to be more likely to develop neurological symptoms compared to those with mild disease ( , ) . we observed that the brain tissue of surviving, nic-hlys treated animals with sars-cov- infection exhibited no detectable viral particles. nic-hlys delivered by the intranasal route may therefore be effective in preventing viral invasion of brain tissue or reducing viral replication within brain tissue. we acknowledge limitations within our study, namely the lack of pharmacokinetic data which may explain the mechanism of viral load reduction in brain, kidney, and lung tissue, as well as the utilization of only one dosing level of nic-hlys in the sars-cov- model. in our study, we utilized a "worst-case scenario" of efficacy evaluation by administering lethal inoculums of covs and initiating treatment hours after the infection was established ( ) . the effect of the timing of treatment initiation, i.e., prophylactic versus or hours p.i., was not examined. incorporation of prophylactic dosing may result in the increased efficacy of nic-hlys, as the antiviral mechanism of action of nic may be related to the prevention of viral entry into host cells.( ) future studies will examine these variables, based upon the notable activity of the novel nic-hlys formulation in these initial proof-of-concept efficacy studies. in conclusion, a novel formulation of nic-hlys optimized for delivery to the upper and lower respiratory tracts as a powder or stable suspension was developed. in vitro, the incorporation of hlys into the formulation was noted to improve potency against mers-cov and sars-cov- , as well as mrsa, which is an important causative agent for secondary bacterial pneumonias associated with covid- . 
the nic-hlys particles exhibited suppression of the inflammatory cytokines tnf-a and il- , which have been implicated in the development of more severe covid- , while elevating the production of the inflammatory cytokine il- b, which may contribute to enhanced antiviral activity. intranasal administration of nic-hlys particles improved survival and reduced viral tissue loads in vivo in two lethal cov infections at a level that was comparable to the leading covid- treatment candidate, remdesivir. thus, nic-hlys is likely to not only have efficacy in the treatment of the current sars-cov- pandemic but could be utilized as a treatment in future cov pandemics. niclosamide (nic) was obtained from shenzhen neconn pharmtechs ltd. (shenzhen, china) and micronized in-house using an model jet-o-mizer air jet mill (fluid energy processing and equipment co, telford, pa, usa) using a grind pressure of psi and a feed pressure of psi for a total of three milling cycles to achieve an x diameter of . µm and an x diameter of . µm. to generate a powder formulation of nic suitable for dpi-based delivery as well as suspension-based nebulizer and nasal spray delivery, micronized nic particles were embedded in a matrix of recombinant human lysozyme (hlys) (invitria, junction city, ks, usa), sucrose (sigma-aldrich, darmstadt, germany), polysorbate (sigma-aldrich) and histidine (sigma-aldrich) using spray drying. our group has previously identified that histidine (buffering agent), sucrose (lyoprotectant agent), and polysorbate (surface active agent) can be used to generate stable and dispersible formulations of lysozyme for delivery via dpi ( , ) . preliminary screening experiments indicated that spray drying with a feed solid content greater than % w/v resulted in a dry particle size distribution (psd) that was greater than the respirable size (typically less than µm particle diameter). 
a constrained mixtures doe was therefore utilized to determine the optimal ratio of micronized niclosamide, human lysozyme, sucrose, and polysorbate in the % w/v feed to generate powder with both a dry and reconstituted psd suitable for oral or nasal inhalation (supplementary table ). the upper constraint of polysorbate was selected to match the maximum fda approved level of the excipient for the inhalation route.( ) the lower level of human lysozyme was set at % w/w, based upon previously published results indicating that a respirable powder cannot be generated below this level ( ). sucrose and polysorbate lower level constraints were set at conservative values, as their inclusion is necessary to ensure stability of lysozyme during the drying process, but the lowest level needed for stability has not been defined. the micronized niclosamide lower and upper constraints were selected to be around the point at which the average number of particles per droplet is expected to be one, based upon equation ( ):

$$\bar{n} = \phi_v \left(\frac{d_d}{d_p}\right)^{3}$$

where $\bar{n}$ is the average number of particles per droplet, $\phi_v$ is the volume fraction of particles of diameter $d_p$, $d_p$ is the diameter of the particles (assumed to be equivalent to the x diameter of the micronized niclosamide particles) and $d_d$ is the diameter of the droplets (assumed to be equivalent to the x diameter of the atomized droplets at conditions used in the experiment). a constrained mixtures doe was generated using the rsm package ( ) in the open-source software r ( ). of these mixtures, a -run d-optimal subset (supplementary table ) was generated and prepared using spray drying to enable fitting of the results to the scheffé quadratic model. the dry components of the formulations were mixed using a process of geometric dilution and wetted and suspended using polysorbate followed by incremental additions of . mg/ml histidine buffer.
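the particles-per-droplet estimate used to set the niclosamide constraints can be sketched in a few lines of python. this is a minimal illustration, assuming the geometric relation in which the particle volume inside one droplet is divided by the volume of a single particle; the function names and the example diameters are hypothetical and are not taken from the study.

```python
def particles_per_droplet(volume_fraction, particle_diameter_um, droplet_diameter_um):
    """Average number of particles per droplet.

    Assumes spherical particles and droplets, so that
    n = phi * (d_droplet / d_particle) ** 3.
    """
    if not 0.0 <= volume_fraction <= 1.0:
        raise ValueError("volume_fraction must lie between 0 and 1")
    if particle_diameter_um <= 0 or droplet_diameter_um <= 0:
        raise ValueError("diameters must be positive")
    return volume_fraction * (droplet_diameter_um / particle_diameter_um) ** 3


def volume_fraction_for_one_particle(particle_diameter_um, droplet_diameter_um):
    """Volume fraction at which the expected count is exactly one particle per droplet."""
    if particle_diameter_um <= 0 or droplet_diameter_um <= 0:
        raise ValueError("diameters must be positive")
    return (particle_diameter_um / droplet_diameter_um) ** 3


# Illustrative (hypothetical) diameters in micrometres, not the study's values.
example_phi = volume_fraction_for_one_particle(2.0, 10.0)
example_n = particles_per_droplet(example_phi, 2.0, 10.0)
```

bracketing the niclosamide level around the value returned by `volume_fraction_for_one_particle` mirrors the idea of choosing lower and upper constraints near one particle per droplet.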
all suspensions were spray dried with a buchi b- mini-spray dryer (buchi corporation, new castle, de, usa) coupled to a syringe pump (kd scientific inc, holliston, ma, usa) set at a feed rate of ml/min. a -fluid pneumatic atomizer nozzle ( . mm with . mm cap) was used to atomize the suspension, and house air was used as the atomization gas. the cleaning needle of the nozzle was removed to prevent disruptions to the feed flow rate.( ) the spray dryer was set at an inlet temperature of °c, which corresponded to an outlet temperature of ~ °c. for all runs, no settling of the feed suspensions was noted during processing. formulations were evaluated on the basis of dry powder psd and reconstituted suspension psd, and the composition exhibiting the most promising characteristics was selected for further evaluation. comparative powders were generated for the purposes of evaluation of physicochemical characteristics, aerosol performance, and efficacy. a nic-free hlys spray dried powder was generated using the optimized formulation composition identified in the doe, minus the addition of micronized nic. to compare the novel nic-hlys powders against a traditional lactose-based carrier system, micronized nic was blended with crystalline lactose particles (lactohale ; dfe pharma) using geometric dilution followed by mixing in a turbula powder blender. the concentration nic in the niclosamide-lactose blend (nic-lac) was set to match that in the nic-hlys powder ( . %). lastly, a powder was generated in which bovine serum albumin (bsa) was substituted for the hlys in the optimized nic formulation, in order to assess the effect of the protein on formulation characteristics. particle size distribution (psd) of nic-hlys powders was measured using a rodos disperser coupled to a sympatec laser diffractor unit (sympatec gmbh, clausthal-zellerfeld, germany). dispersion pressure was set at . bar and feed table rotation was set at %. 
time slices of the plume exhibiting an optical concentration between - % were averaged to generate the psd. the psd of the reconstituted powders was determined in both ½ normal saline (ns) and di water using the cuvette attachment for the laser diffraction. a spin bar was set to rotate at rpm, and the dry powders were added directly to the solvent in the cuvette until an optical concentration exceeding % was reached. three measurements were taken and averaged. zeta potential of nic-hlys suspensions before and after spray drying was determined using a zetasizer nanozs (malvern panalytical ltd, malvern, uk) and compared against a nic-bsa suspension. the morphology of nic-hlys powders was observed using scanning electron microscopy (sem). samples were mounted onto aluminum stubs using double-side carbon tape and sputter coated with nm of platinum/palladium (pt/pd) under argon using a cressington sputter coater hr (cressington scientific instruments ltd, watford, uk). images were obtained using a zeiss supra vp sem (carl zeiss microscopy gmbh, jena, germany). glass transition temperature (tg) and crystallinity of nic-hlys powder was determined using modulated dsc. powder samples were loaded into tzero pans with hermetically sealed lids, and a hole was pierced to prevent pan deformation. a scan was performed on a q dsc (ta instruments, new castle, de, usa) by ramping °c/min to - °c, followed by a °c/min ramp from - °c to °c with a modulation cycle of ± . °c every seconds. data was processed using ta universal analysis. the dissolution profile of the aerosolized nic-hlys and nic-lac powders was evaluated based upon previously published methods ( , ) using a composition of simulated lung fluid (slf) adapted from hassoun et al ( ) . briefly, whatman gf/c glass microfiber filters (diameter = mm) were placed in a stage of a next generation impactor, which corresponded to an aerodynamic size cut off of . µm at the l/min flow rate used in the experiment. 
powders were dispersed using a disposable twincaps (hovione) dpi. the filters were transferred to a modified transwell system (membrane removed) to enable contact of the bottom of the filter with a basal compartment containing . ml slf. the apical side of the filters was wetted with . ml of slf. the dissolution results are presented in table . the transwell system was placed in a °c isothermal chamber and . ml samples were removed from the basal compartment at various timepoints and replaced with fresh slf. the effects of processing and nebulization on the aggregation of hlys were assessed using size exclusion chromatography (sec) based upon a previously published method ( ). the effect of processing on the secondary structure of hlys was determined using a nicolet is fourier transform infrared spectrophotometer with attenuated total reflectance (ftir-atr) (thermo scientific). spectra were acquired using omnic software from a wavenumber of cm⁻¹ to cm⁻¹ with acquisitions in total. an atmospheric background scan was collected and subtracted from all powder spectra. secondary structure analysis was performed in originpro (originlab corporation) using the second derivative of the amide i band region ( - cm⁻¹) and the peak analyzer function. the region of interest was first baseline-corrected, and the second derivative of the spectra was smoothed using the savitzky-golay method with a polynomial order of and points in the smoothing window. peaks identified from the second derivative minimums were iteratively removed to assess the effect on the model fit. the aerosol performance of the spray-dried composite nic-hlys powder was assessed using a disposable twincaps dpi. performance was assessed at both a kpa and kpa pressure drop through the device to determine the effects of inspiratory flow rate on emitted and fine particle dose. for comparative purposes, the performance of a traditional lactose carrier-based dry powder, nic-lac, was also assessed.
a kpa pressure drop was generated through the twincaps dpi using an inspiratory flow rate of l/min. a . second actuation time was used to pull l of air through the ngi. a kpa pressure drop was generated using an inspiratory flow rate of . l/min, and an second actuation time was used. for all experiments, mg of powder was loaded into the device. after actuation of nic-hlys powders, niclosamide was collected by dissolving the deposited powder using a - water:acetonitrile mix. an aliquot was taken, and an additional volume of acetonitrile was added to bring the final ratio to : water:acetonitrile. to induce phase separation, a m solution of ammonium acetate was added to this mixture at a volume that was % of the water:acetonitrile mix. niclosamide was assayed from the upper organic layer by measuring absorbance at nm using a plate reader. for the niclosamide-lactose blend, the deposited powder was collected by dissolving it in : water:acetonitrile, centrifuging, and then measuring absorbance at nm. delivery of the reconstituted nic-hlys suspension was assessed using the disposable aerogen® solo vibrating mesh nebulizer (aerogen). preliminary screening experiments indicated that reconstituting mg/ml of nic-hlys powder in . % w/v sodium chloride ( / ns) reduced the changes in nebulizer concentration during therapy compared to higher concentrations; therefore, this concentration was utilized for further analysis. the inspiratory flow rate was set to l/min and the apparatus was chilled to °c as specified by the united states pharmacopeia (usp) ( ) . nebulization was performed for two minutes to ensure sufficient deposition of drug in the stages for analysis.
after drug collection, aerosol performance was evaluated on the basis of emitted fraction or dose (niclosamide mass emitted from the device as a percentage of the total recovered powder) and fine particle fraction (niclosamide mass with a size cut-off of less than µm aerodynamic diameter or µm aerodynamic diameter, as a percentage of the emitted dose). ngi stage cut-offs were determined for the flow rate utilized based upon eq. , while the moc cut-off diameter was determined using eq. . the stage cut-offs scale with flow rate as $D_{50,Q} = D_{50,Q_n}(Q_n/Q)^x$, where $D_{50,Q}$ is the cut-off diameter at the flow rate $Q$, $D_{50,Q_n}$ is the cut-off diameter at the archival reference value of $Q_n$ = l/min, and the values for the exponent, $x$, are those obtained from the archival ngi stage cut size-flow rate calculations determined by marple et al ( ) . aerosol performance was evaluated on the basis of emitted fraction (ef), which is defined as the cumulative mass emitted from the device as a fraction of the recovered mass, fine particle fraction less than µm (fpf< µm), defined as the mass less than µm aerodynamic diameter as a fraction of the emitted dose, and the fine particle fraction less than µm (fpf< µm). the fpf< µm and fpf< µm values were interpolated from a graph plotting the cumulative percentage of nic deposited in a stage against the cut-off values of the stage. to evaluate the utility of the optimized nic-hlys powders for nasal administration, suspensions of varying concentrations ( mg/ml, mg/ml, and mg/ml) were prepared in / ns and placed in a vp pump aptar® pump meter spray device. spray patterns and plume geometries were evaluated using laser-assisted high speed imaging based on methods previously reported by warnken et al. ( ). briefly, the loaded spray devices were actuated using a mightyrunt automated actuator (innovasystems, inc) set at parameters that mimic those of an average adult user ( ) .
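the flow-rate adjustment of the ngi stage cut-offs and the interpolation of the fine particle fractions described above can be sketched as follows. this is a minimal illustration, assuming the power-law form d50(q) = d50(qn) × (qn/q)^x for the stage cut-offs and simple linear interpolation between adjacent stages; the function names and every number in the usage note are hypothetical and are not values from this study.

```python
def cutoff_at_flow(d50_ref_um, q_lpm, q_ref_lpm, exponent):
    """Scale an archival stage cut-off (um) from the reference flow to flow q.

    Assumes the power-law form D50(Q) = D50(Qn) * (Qn / Q) ** x.
    """
    if q_lpm <= 0 or q_ref_lpm <= 0:
        raise ValueError("flow rates must be positive")
    return d50_ref_um * (q_ref_lpm / q_lpm) ** exponent


def interpolate_fraction(cutoffs_um, cumulative_pct, limit_um):
    """Linearly interpolate the cumulative % of emitted dose below limit_um.

    cutoffs_um and cumulative_pct are paired per-stage values; linear
    interpolation is an assumption of this sketch.
    """
    pairs = sorted(zip(cutoffs_um, cumulative_pct))
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 <= limit_um <= x1:
            return y0 + (y1 - y0) * (limit_um - x0) / (x1 - x0)
    raise ValueError("limit_um lies outside the measured cut-off range")
```

with made-up inputs, `cutoff_at_flow(4.0, 15.0, 60.0, 0.5)` returns 8.0, showing that a lower flow rate raises the stage cut-off; the archival cut-offs and exponents for a real calculation would come from the calibration by marple et al.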
a laser sheet was oriented either parallel or perpendicular to the actuated spray at distances of and cm from the nozzle tip in order to assess the plume geometry and spray pattern, respectively. the actuation was conducted in a light-free environment in order to isolate the portions of the spray photographed by the high-speed camera (thorlabs, inc.) from those illuminated in the plane of the laser. image analysis of the plume geometry and spray pattern was performed in fiji ( ). for the plume geometry, the outline of the observed plume was traced and the slope of each side of the plume was determined. this was used to calculate the angle formed at the intersection of the two lines. the spray pattern characteristics including maximum and minimum diameters were determined using the software's measurement function to determine the feret diameters. the ability of the optimized nic-hlys formulation to dampen inflammatory response was evaluated using lipopolysaccharide (lps) stimulation in a thp- macrophage model. thp- monocytes were seeded in -well plates at a concentration of x cells/ml ( ml total) in rpmi media supplemented with % fbs, % penicillin/streptomycin, and ng/ml phorbol -myristate (pma) to induce differentiation into mature macrophages. the cells were incubated in the presence of pma for hours, after which the media was replaced with pma-free media and cells were rested for hours. nic-hlys and nic-bsa powders were suspended in rpmi media at varying concentrations ( to µg/ml, based on total powder content) and added to the cells simultaneously with ng/ml lps. the cells were then incubated for hours to achieve peak cytokine expression ( ) . following incubation, supernatants were collected, and cytokine concentrations were quantified using elisa (duoset, r&d systems) and compared against untreated controls. all antiviral efficacy experiments were performed using vero-e cells obtained from american type culture collection (manassas, virginia, usa).
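as an aside on the plume geometry analysis described earlier in this section, the angle formed by the two traced plume edges can be computed from their slopes. the sketch below is illustrative only: it assumes the slopes are measured in image coordinates with the spray axis lying along the x-axis, so that the returned angle is the one that contains the spray axis; the function name is hypothetical.

```python
import math


def plume_angle_deg(slope_a, slope_b):
    """Angle (degrees) between two plume edges given their slopes.

    Assumes the spray axis lies along the x-axis, so each edge makes an
    angle of atan(slope) with the axis and the plume angle is the
    difference between the two edge angles.
    """
    return math.degrees(abs(math.atan(slope_a) - math.atan(slope_b)))
```

for a symmetric plume whose edges have slopes of +1 and -1 relative to the spray axis, this gives an angle of about 90 degrees.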
vero-e cells were maintained in minimal essential medium (mem) supplemented with % fetal bovine serum (fbs) and x antibiotic-antimycotic solution (sigma, st. louis, usa) (i.e., mem complete). cells were infected with either mers-cov (emc strain) or sars-cov- (sars-cov- /human/korea/cnuhv / strain). all experimental procedures involving potential contact with mers-cov or sars-cov- were conducted in a biosafety level laboratory of chungnam national university, which was certified by the korean government. vero-e cells ( x /ml) were seeded in the wells of -well tissue culture plates. after a -day incubation period, cells were washed with warm pbs (ph . ) twice and were infected with sars-cov- ( . x pfu) or mers-cov ( x pfu) diluted in mem with % fbs, which was followed by a -hour rest period. the media was then replaced with mem complete containing various concentrations of the investigational formulations (prepared as suspensions). for the assessment of solubilized nic, stock solutions were prepared by dissolving the drug in dmso and then diluting in mem complete, with the resulting media not containing more than % dmso. drug concentrations and formulations were assessed in triplicate. for each -well plate, well was utilized as untreated control. cells were incubated for or hours, at which point viral rna from samples was isolated using rneasy mini kit (qiagen, hilden, germany). viral rna was quantified with taqman real time fluorescent pcr (rtqpcr) using a topreal tm one-step rt qpcr kit (enzynomics, daejeon, korea) and sars-cov- and mers-cov primers and probe (supplementary table ). real-time amplification was performed using a rotor-gene (qiagen, hilden, germany). an initial incubation was performed at °c for minutes and at °c for minutes, after which cycles of a second hold at °c and a second hold at °c were performed.
cycle threshold (ct) values were converted to plaque forming units (pfu) using a standard curve generated from data using stock viruses with known pfu titers by plaque assay. vero-e cells grown in tissue culture flasks were detached by treatment with trypsin-edta and were seeded in -well tissue culture plates. when confluent, cells were washed with warm pbs (ph . ) and infected with mers-cov or sars-cov- . the half maximal effective concentration (ec ) of the formulations was assessed by dosing infected vero e cells plated in -wells with nic-hlys suspensions with nic content ranging from . µg/ml to . µg/ml once daily over the course of hours. cell viability was determined on day by observing cytopathic effects (cpe) under a microscope. the ec was calculated as the concentration of nic resulting in no observable cpe in % of the wells. for comparative purposes, the ec of micronized nic without the inclusion of hlys was also evaluated. the inhibitory activity of various nic formulations (nic-hlys, nic-m, nic-bsa, and nic-dmso) against methicillin-resistant s. aureus strain mu was assessed using a resazurin-based -well plate microdilution assay ( ) . varying concentrations of the nic formulations ( , , , . , . , . , . , . , . , . µg/ml, dosed based on nic content) were plated with a x cfu/ml inoculum of s. aureus (n = per dose). one column of the plate was used as growth control, i.e., no antibiotics were added, while another column was used as a sterile control, i.e., no bacteria added. the plates were incubated for hours at °c with rpm shaking, after which point µl of a . % resazurin sodium solution was added. the plates were incubated for an additional hours to allow color change to occur, and the fluorescence of the wells was read at nm excitation/ nm emission. the fluorescence of the sterile wells was subtracted from the fluorescence of all treated wells, and a decrease in fluorescence of the treated wells versus the growth control was noted as inhibitory activity.
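the background subtraction and comparison against the growth control described above can be sketched as a percent-inhibition calculation. this is a minimal illustration rather than the authors' analysis: it assumes the sterile-control mean is also subtracted from the growth control, and the function name and all readings in the usage note are hypothetical.

```python
def percent_inhibition(treated, growth_control, sterile_control):
    """Percent inhibition per treated well from raw fluorescence readings.

    The mean sterile-control signal is treated as background and removed
    from both the treated wells and the growth control (an assumption of
    this sketch); 0 % means growth equal to the control, 100 % means no
    signal above background.
    """
    background = sum(sterile_control) / len(sterile_control)
    growth = sum(growth_control) / len(growth_control) - background
    if growth <= 0:
        raise ValueError("growth control must exceed the sterile background")
    return [100.0 * (1.0 - (f - background) / growth) for f in treated]
```

with invented readings, `percent_inhibition([100.0, 550.0, 1000.0], [1000.0, 1000.0], [100.0, 100.0])` returns `[100.0, 50.0, 0.0]`.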
the contents of wells exhibiting % inhibition were plated on tryptic soy agar plates and incubated overnight to determine the minimum bactericidal concentration (mbc). hdpp- transgenic mice were kindly provided by dr. paul b. mccray jr (university of iowa). mers-cov infection was initiated in anaesthetized mice by intranasal (i.n.) administration of µl ( × pfu) of mers-cov (emc strain), which was kindly provided by drs bart haagmans and ron fouchier (erasmus medical center). efficacy was initially established using a dose-finding study, in which treatments were initiated -day post-infection (p.i.) and continued daily for days, at which point animals (n = from each group) were sacrificed. nic-hlys powder was reconstituted in . % sodium chloride to achieve a dose of or µg/kg nic (n = per group). the suspensions were administered i.n. in a volume of µl, and µl of . % sodium chloride was administered as a control (n = ). though a second timepoint was intended for day , death due to illness or as a result of treatment administration prevented collection of these data. a separate study was conducted to compare the survival of mers-cov-infected mice treated with µg/kg nic-hlys (n = ) and placebo (n = ). in this study, mice were dosed intranasally for days, at which point treatment was terminated. surviving mice were rested without treatment for an additional days, and sacrifice was performed day p.i. to obtain tissues (lung and brain) for viral titres and tissue pathology. the weight of mice was recorded daily. tissues ( . g per sample) were homogenised using a beadblaster homogeniser (benchmark scientific, edison, new jersey, usa) in ml of pbs (ph . ) to measure virus titres by rt-qpcr. the remaining portions of tissues were used for histopathology. mice were lightly anaesthetized with isoflurane usp (gujarat, india) prior to all viral inoculation and dosing procedures.
hace- transgenic mice (k -hace mice) (the jackson laboratory, usa) were lightly anaesthetized with isoflurane usp (gujarat, india) and inoculated intranasally (i.n.) with µl ( × pfu) of sars-cov- /human/korea/cnuhv / . animals were rested for -hours, after which daily treatment was initiated with i.n. nic-hlys reconstituted in . % sodium chloride ( µg/kg nic) (n = ) or . % sodium chloride as a placebo (n = ). all treatments were performed on anaesthetized mice. on day post-infection, mice per group were euthanized, and lung, brain and kidney tissues were collected for viral titres and tissue pathology. treatment was performed until days p.i., at which point surviving animals were left untreated for days, and then sacrificed on day p.i. to obtain tissues for viral titres and pathology. tissues ( . g per sample) were homogenised using a beadblaster homogeniser (benchmark scientific, edison, new jersey, usa) in ml of pbs (ph . ) to measure virus titres by rt-qpcr. the remaining portions of tissues were used for histopathology. mouse tissues were fixed in % neutral buffered formalin ( %) and then embedded in paraffin. the lung tissue was cut into μm sections, which were stained with haematoxylin (h) solution for min. the stained tissue sections were washed with tap water for min and then stained with eosin (e) solution. the stained sections were visualised under an olympus dp microscope and photographed (olympus corporation, tokyo, japan). to determine whether the viral particles measured in lung and brain tissue were infectious, the log tcid /ml was determined. vero-e cells grown in tissue culture flasks were detached by treatment with trypsin-edta and were seeded in -well tissue culture plates with mem containing % fbs and × antibiotic-antimycotic solution. when confluent, the cells were washed with warm pbs (ph . ) and infected with virus samples, which were -fold diluted in mem with % fbs.
the cells in four wells were infected with the diluted virus samples for days in a humidified incubator at °c. the cells were observed for cpe under a microscope. the presence of igg antibody specific for mers-cov or sars-cov- in the sera of infected and treated animals was determined using enzyme-linked immunosorbent assays (elisa). the purified and inactivated mers-cov or sars-cov- antigen was diluted to a final concentration of μg/ml in coating buffer (carbonate-bicarbonate buffer, ph . ). the diluted antigen ( μl) was coated onto the wells of nunc-immuno™ microwell™ well solid plates (sigma-aldrich, mo, usa) and was incubated overnight at °c. after removing the coating buffer, the plate was washed twice by filling the wells with μl of washing buffer ( . % tween pbs (ph . )). the buffer was removed, and sera ( μl diluted : in pbs) collected from treated mice on days post treatment were added to the plate and incubated for hr at room temperature. the plate was washed times with washing buffer. goat anti-mouse igg cross-adsorbed secondary antibody hrp (invitrogen, ma, usa) was diluted ( : ) in blocking buffer, and µl was added to each well and incubated for hr at room temperature. after washing the plate times with the washing buffer, µl of the tmb elisa substrate (mabtech, nacka strand, sweden) was dispensed into the wells and incubated for min at ℃. abts® peroxidase stop solution - lethal infection models were evaluated for statistical significance using the mantel-cox test. statistical analysis on tissue homogenate viral titres was performed using a two-way anova with dunnett's multiple comparisons test to evaluate post-hoc differences between groups. alpha was set at . . statistical analysis was performed using prism (graphpad software). all the studies were approved and were conducted in accordance with the relevant legal guidelines and regulations prescribed by cnu, republic of korea. references .
who director-general's opening remarks at the mission briefing on estimated demand for us hospital inpatient and intensive care unit beds for patients with covid- based on comparisons with wuhan and guangzhou, china disease and healthcare burden of covid- in the united states structure, function, and antigenicity of the sars-cov- spike glycoprotein structural basis for human coronavirus attachment to sialic acid receptors projecting the transmission dynamics of sars-cov- through the postpandemic period identification of antiviral drug candidates against sars-cov- from fda-approved drugs skp attenuates autophagy through beclin -ubiquitination and its inhibition reduces mers-coronavirus infection niclosamide is a proton carrier and targets acidic endosomes with broad antiviral effects broad spectrum antiviral agent niclosamide and its therapeutic potential multi-targeted therapy of cancer by niclosamide: a new application for an old drug beyond an antihelminthic drug host-targeted niclosamide inhibits c. 
difficile virulence and prevents disease in mice without disrupting the gut microbiota structure-activity analysis of niclosamide reveals potential role for cytoplasmic ph in control of mammalian target of rapamycin complex (mtorc ) signaling a novel high content imaging-based screen identifies the anti-helminthic niclosamide as an inhibitor of lysosome anterograde trafficking and prostate cancer cell invasion extracellular ph modulates neuroendocrine prostate cancer cell metabolism and susceptibility to the mitochondrial inhibitor niclosamide structure-activity studies of wnt/beta-catenin inhibition in the niclosamide chemotype: identification of derivatives with improved drug exposure repurposing the anthelmintic drug niclosamide to combat helicobacter pylori repurposing salicylanilide anthelmintic drugs to combat drug resistant staphylococcus aureus repurposing niclosamide as a versatile antimicrobial surface coating against device-associated, hospitalacquired bacterial infections screening a commercial library of pharmacologically active small molecules against staphylococcus aureus biofilms the anthelmintic drug niclosamide synergizes with colistin and reverses colistin resistance in gram-negative bacilli toward repositioning niclosamide for antivirulence therapy of pseudomonas aeruginosa lung infections: development of inhalable formulations through nanosuspension technology new life for an old drug: the anthelmintic drug niclosamide inhibits pseudomonas aeruginosa quorum sensing synergistic activity of niclosamide in combination with colistin against colistin-susceptible and colistin-resistant acinetobacter baumannii and klebsiella pneumoniae antituberculosis activity of certain antifungal and antihelmintic drugs activities of drug combinations against mycobacterium tuberculosis grown in aerobic and hypoxic acidic conditions niclosamide as a promising antibiofilm agent niclosamide repurposed for the treatment of inflammatory airway disease identification of 
antiviral drug candidates against sars-cov- from fda-approved drugs a phase i study of niclosamide in combination with enzalutamide in men with castration-resistant prostate cancer toward repositioning niclosamide for antivirulence therapy of pseudomonas aeruginosa lung infections: development of inhalable formulations through nanosuspension technology nebulized mannitol, particle distribution, and cough in idiopathic pulmonary fibrosis dissociation in the effect of nedocromil on mannitol-induced cough or bronchoconstriction in asthmatic subjects* post-inhalation cough with therapeutic aerosols: formulation considerations. adv drug deliv rev asymptomatic carrier state, acute respiratory disease, and pneumonia due to severe acute respiratory syndrome coronavirus (sars-cov- ): facts and myths anti-inflammatory effects of lysozyme against hmgb in human endothelial cells and in mice antiviral activity of lysozyme antiviral effects of nisin, lysozyme, lactoferrin and their mixtures against bovine viral diarrhoea virus study on antimicrobial and antiviral activities of lysozyme from marine strain s- - in vitro structural and functional modeling of human lysozyme reveals a unique nonapeptide, hl , with anti-hiv activity human lysozyme possesses novel antimicrobial peptides within its n-terminal domain that target bacterial respiration activity of abundant antimicrobials of the human airway. 
american journal of respiratory cell and molecular biology from bacterial killing to immune modulation: recent insights into the functions of lysozyme reverse genetics reveals a variable infection gradient in the respiratory tract immunogenicity of protein aggregates--concerns and realities immunogenicity of therapeutic proteins: influence of aggregation immunogenicity of therapeutic protein aggregates effect of particle formation process on characteristics and aerosol performance of respirable protein powders integrity of crystalline lysozyme exceeds that of a spray-dried form lysozymes in the animal kingdom antibacterial peptides: basic facts and emerging concepts severe acute respiratory syndrome-coronavirus infection in aged nonhuman primates is associated with modulated pulmonary and systemic immune responses rethinking the term "pi-stacking an in vitro study of aromatic stacking of drug molecules pharmaceutical cocrystals of niclosamide complexation of lysozyme with sodium poly(styrenesulfonate) via the two-state and non-two-state unfoldings of lysozyme covid- update: fda broadens emergency use authorization for veklury (remdesivir) to include all hospitalized patients for treatment of covid- study to evaluate the safety and antiviral activity of remdesivir (gs- ™) in participants with moderate coronavirus disease (covid- ) compared to standard of care treatment development of remdesivir as a dry powder for inhalation by thin film freezing comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against mers-cov an inflammatory cytokine signature predicts covid- severity and survival clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study. 
the lancet the biology and toxicology of molluscicides, bayluscide the impact of inspiratory flow rate on drug delivery to the lungs with dry powder inhalers dysregulation of immune response in patients with covid- in wuhan, china brain delivery of vasoactive intestinal peptide (vip) following nasal administration to rats ciclesonide, a hypotonic intranasal corticosteroid neurological manifestations of hospitalized patients with covid- in wuhan, china: a retrospective case series study nervous system involvement after infection with covid- and other coronaviruses comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against mers-cov excipient-free pulmonary delivery and macrophage targeting of clofazimine via air jet micronization influence of formulation factors on the aerosol performance and stability of lysozyme powders: a systematic approach aerosol technology: properties, behavior, and measurement of airborne particles package 'rsm' r: a language and environment for statistical computing in vitro aqueous fluid-capacity-limited dissolution testing of respirable aerosol drug particles generated from inhaler products evaluation of the transwell system for characterization of dissolution behavior of inhalation drugs: effects of membrane and surfactant design and development of a biorelevant simulated human lung fluid overcoming sink limitations in dissolution testing: a review of traditional methods and the potential utility of biphasic systems next generation pharmaceutical impactor (a new impactor for pharmaceutical inhaler testing). 
part ii: archival calibration personalized medicine in nasal delivery: the use of patient-specific administration parameters to improve nasal drug targeting using d printed nasal replica casts automated actuation of nasal spray products: determination and comparison of adult and pediatric settings fiji: an opensource platform for biological-image analysis transcription profiles of lpsstimulated thp- monocytes and macrophages: a tool to study inflammation modulating effects of foodderived compounds resazurin-based -well plate microdilution method for the determination of minimum inhibitory concentration of biosurfactants the authors wish to acknowledge the valuable contributions of miguel orlando jara gonzalez (university of texas at austin, college of pharmacy) in the obtainment and review of background literature to support the premise of this work. key: cord- -ggcpsjk authors: radhakrishnan, chandni; divakar, mohit kumar; jain, abhinav; viswanathan, prasanth; bhoyar, rahul c.; jolly, bani; imran, mohamed; sharma, disha; rophina, mercy; ranjan, gyan; jose, beena philomina; raman, rajendran vadukkoot; kesavan, thulaseedharan nallaveettil; george, kalpana; mathew, sheela; poovullathil, jayesh kumar; govindan, sajeeth kumar keeriyatt; nair, priyanka raveendranadhan; vadekkandiyil, shameer; gladson, vineeth; mohan, midhun; parambath, fairoz cheriyalingal; mangla, mohit; shamnath, afra; sivasubbu, sridhar; scaria, vinod title: initial insights into the genetic epidemiology of sars-cov- isolates from kerala suggest local spread from limited introductions date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ggcpsjk coronavirus disease (covid- ) rapidly spread from a city in china to almost every country in the world, affecting millions of individuals. genomic approaches have been extensively used to understand the evolution and epidemiology of sars-cov- across the world. 
kerala is a unique state in india well connected with the rest of the world through a large number of expatriates, trade, and tourism. the first case of covid- in india was reported in kerala in january , during the initial days of the pandemic. the rapid increase in the covid- cases in the state of kerala has necessitated the understanding of the genetic epidemiology of circulating virus, evolution, and mutations in sars-cov- . we sequenced a total of samples from patients at a tertiary hospital in kerala using the covidseq protocol at a mean coverage of , x. the analysis identified unique high-quality variants encompassing novel variants and new variants identified for the first time in sars-cov- samples isolated from india. phylogenetic and haplotype analysis revealed that the circulating population of the virus was dominated ( . % of genomes) by three distinct introductions followed by local spread, apart from identifying polytomies suggesting recent outbreaks. the genomes formed a monophyletic distribution exclusively mapping to the a a clade. further analysis of the functional variants revealed two variants in the s gene of the virus reportedly associated with increased infectivity and variants that mapped to five primer/probe binding sites that could potentially compromise the efficacy of rt-pcr detection. to the best of our knowledge, this is the first and most comprehensive report of genetic epidemiology and evolution of sars-cov- isolates from kerala. the covid- pandemic has seen a widespread application of genomic approaches to understand the epidemiology and evolution of sars-cov- . efforts to sequence genomes of clinical isolates of sars-cov- from across the world picked up pace following the initial genome sequencing of the virus from a patient in wuhan, the epicenter of the pandemic [ ] . as the virus evolves through the accumulation of mutations, it has split into major lineages with strong geographical affinities [ ] .
the availability of the genome sequences in the public domain has provided a unique view of the introduction, evolution, and dynamics of sars-cov- in different parts of the world [ ] . a number of approaches have emerged for rapid and scalable sequencing of sars-cov- from clinical isolates. this includes direct shotgun approaches as well as targeted amplicon-based and targeted capture-based approaches [ ] [ ] [ ] . sequencing based approaches provide a unique opportunity for high fidelity of detection and for understanding the genetic epidemiology of sars-cov- [ ] . additionally, the genetic variants could offer insights into the mutational spectrum, evolution, infectivity, and attenuation of the virus [ , ] . additional analyses on genomic variants have also provided useful insights into the efficacy of primer/probe-based diagnostic assays as well as immune epitopes and resistance to antisera [ , ] . an approach for high-throughput multiplex amplicon sequencing of sars-cov- has been previously reported from our group [ ] . kerala is a unique state in india with a population of million people and extensively connected with the global populations through over . million expatriates, apart from being a traditional trade post and a global tourist destination. the state is therefore in a distinct position, affected by local as well as global epidemics. in fact, the first identified case of covid- in india was from kerala, early in the epidemic. the patient had traveled from wuhan, china [ , ] . the initial genomic identity of the virus was also established which mapped to the b superclade of sars-cov- [ ] . further introductions into the state during the later days of the pandemic through international and regional travel could have contributed to the spread of the epidemic in the state and the pool of circulating genetic lineages or clades. 
while a number of studies on the genetic epidemiology of sars-cov- from different states in india have emerged [ ] [ ] [ ] [ ] , there has been a paucity of information on the genetic architecture and epidemiology of sars-cov- isolates in the state of kerala. we intended to fill the gap in knowledge on the identity of the circulating genetic lineages/clades contributing to the epidemic in the state of kerala. to this end, we employed a high-throughput sequencing-based approach for the genetic epidemiology of sars-cov- . to the best of our knowledge, this is the first comprehensive overview of the genetic architecture of sars-cov- isolates from the state of kerala. the institutional human ethics committee approved the project (gmc kkd/rp /iec ). rna samples were isolated from nasopharyngeal/oropharyngeal swabs of patients presenting to government medical college, kozhikode, a major tertiary care center in kerala. rna extraction was done using the magmax viral/pathogen nucleic acid isolation kit in a thermo scientific kingfisher flex automated extraction system according to the manufacturer's instructions. all the rna samples were transferred within hours of collection at a cold temperature ( - °c) and were stored at - °c until further processing. sequencing was performed using the covidseq protocol as reported previously [ ] . briefly, this protocol involved multiplex amplicon sequencing on the illumina novaseq platform. the base calls generated in the binary base call (bcl) format were demultiplexed to fastq reads using bcl fastq (v . ). for reference-based assembly, we followed a previously defined protocol from poojary et al. [ ] . as per the protocol, the quality control of fastq reads was performed using trimmomatic (v . ) at a phred score of q [ ] with adapter trimming. these reads were further aligned to the severe acute respiratory syndrome coronavirus (sars-cov- ) wuhan-hu- reference genome (nc_ . ) using hisat - . [ ] . the human reads were removed using samtools (v .
) [ ] . the samples with coverage > % and < % unassigned nucleotides underwent variant calling and consensus sequence generation using varscan (v . . ) [ ] and samtools (v . ) [ ] , bcftools (v . . ), and seqtk (v . -r ) [ ] respectively. variants were annotated using annovar [ ] employing a range of custom annotation datasets and tables. all the variants identified were systematically compared with a compendium of other indian and global variants. a total of , complete sars-cov- genomes deposited in the global initiative on sharing all influenza data (gisaid) database until september , were used for comparative analysis. a summary of the sample details along with their originating and submitting laboratories is provided in supplementary table . viral genomes with a pairwise alignment ≥ % and gaps < % with the reference genome (nc_ . ) were considered for further variant calling using snp-sites [ ] . genetic variants compiled from a total of , high-quality genomes from india and , global genomes were considered for analysis. phylogenetic analysis was performed according to the pipeline provided by nextstrain [ ] . the dataset of , complete sars-cov- genomes deposited in the gisaid database from india was used for the analysis (supplementary table ), along with genomes from the current study which have % coverage and at least % pairwise alignment with the reference genome (nc_ . ). genomes having more than % ns or missing dates of sample collection were excluded from the analysis. the phylogenetic tree was constructed and refined to a molecular clock phylogeny using the augur framework provided by nextstrain and was visualized using auspice. the phylogenetic assignment of named global outbreak lineages (pangolin) package was used to assign lineages to the genomes from this study [ ] . the lineages were visualized and annotated on the phylogenetic tree using itol [ ] . for haplotype analysis, the genomes were aligned to the wuhan-hu- (nc_ .
) reference genome using mafft [ ] and problematic genomic loci (low coverage, high sequencing error rate, hypermutable and homoplasic sites) were masked from the alignment [ ] . the aligned sequences were imported into the dna sequence polymorphism tool (dnasp v . . ) [ ] to generate haplotypes. a tcs haplotype network [ ] for the genomes was constructed using the population analysis with reticulate trees software (popart v . ) [ ] . times to the most recent common ancestor (tmrca) for the haplogroups were computed following the bayesian markov chain monte carlo (mcmc) method using beast v . . [ ] . the analysis was performed using a coalescent growth rate model along with a strict molecular clock and the hky+Γ substitution model with gamma-distributed rate variation (gamma categories= ). mcmc was run for million steps. the output was analyzed in tracer v . . [ ] and burn-in was adjusted to attain an appropriate effective sample size (ess). further, we have evaluated the sars-cov- variants based on their functional relevance. we curated a comprehensive compendium of sars-cov- variants of functional relevance as well as variants that are associated with increased infectivity and attenuation of sars-cov- from literature and preprint servers. the variants were systematically annotated and mapped to the reference genome coordinates and their respective amino acid changes. this variant compendium encompassed about variants curated from publications. the variants in this study were compared with the genomic variants generated using bespoke scripts. we were also interested to evaluate the effect of sars-cov- variants on the efficacy of rt-pcr detection. we took a compiled list of primer/probe sequences widely used in the molecular detection of sars-cov- around the globe [ ] . 
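The masking and haplotype-generation steps described above can be illustrated with a minimal sketch. It is not a reimplementation of dnasp: it assumes that sequences are already aligned to equal length, that masked columns are given as 0-based indices, and that two genomes share a haplotype only when they are identical after masking. The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def mask_sites(seq, masked_positions, mask_char="N"):
    """Replace the given 0-based alignment columns with mask_char."""
    chars = list(seq)
    for pos in masked_positions:
        if 0 <= pos < len(chars):
            chars[pos] = mask_char
    return "".join(chars)


def collapse_haplotypes(aligned, masked_positions=()):
    """Group sequence names whose aligned sequences are identical after masking.

    `aligned` maps a genome name to its aligned sequence.
    Returns a list of name lists, largest group first.
    """
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[mask_sites(seq.upper(), masked_positions)].append(name)
    return sorted(groups.values(), key=lambda members: (-len(members), members[0]))


def variable_sites(aligned, masked_positions=()):
    """Return unmasked 0-based columns at which more than one base is observed."""
    masked = set(masked_positions)
    seqs = [s.upper() for s in aligned.values()]
    return [
        i
        for i in range(len(seqs[0]))
        if i not in masked and len({s[i] for s in seqs}) > 1
    ]


# Toy alignment (hypothetical data); column 0 is treated as a problematic site.
toy = {"a": "ACGT", "b": "ACGT", "c": "ACTT", "d": "TCGT"}
haplotypes = collapse_haplotypes(toy, masked_positions=[0])
sites = variable_sites(toy, masked_positions=[0])
```

Real tools additionally handle gaps and ambiguity codes, which this sketch deliberately ignores.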
in our analysis, we mapped the sars-cov- genetic variants obtained from kerala genomes to the primer or probes sequence and calculated the melting temperature (tm) of the mutant with the wild type sequence. the length of primers in the curated list is greater than nucleotides. the formula applied for calculating melting temperature is tm= . + *(yg+zc- . )/(wa+xt+yg+zc) where w, x, y, and z are the number of a, t, g, and c nucleotides respectively [ ] . figure summarises the schematic for the overall data analysis. a total of isolates of sars-cov- from kerala were processed for genome sequencing. the genomes were sequenced using covidseq protocol [ ] and generated approximately . million raw reads per sample. the reads were subjected to quality control and resulted in approximately . million reads per sample, of which around . million reads per sample aligned to the sars-cov- reference genome (nc_ . ). the reads had a mapping percentage of . % and mean , x coverage. the data has been summarized in supplementary table and the mean coverage of the sample across the amplicons has been represented in figure . of the isolates of sars-cov- sequenced, a total of samples had > % coverage and < % unassigned nucleotides across the genome. these samples were further processed for variant calling and consensus generation. our analysis identified a total of unique variants, with a median variant count of per sample. variant quality has been ensured with the average variation percentage across genomes ≥ . of the total unique variants, were categorized as high-quality variants. the detailed information on variant quality is provided in supplementary table . the distribution of variants across the sars-cov- genomes used in the study was analyzed. also, the proportional distribution of variants for every bps across the genome was calculated and compared among various datasets. 
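The melting-temperature comparison given in the methods can be sketched as follows. The numeric constants of the formula did not survive in the text above, so the sketch assumes the widely used values for oligonucleotides longer than 13 nucleotides, tm = 64.9 + 41*(yg+zc-16.4)/(wa+xt+yg+zc); whether these are exactly the values the authors applied is an assumption. The function names are hypothetical, and the mutant sequence is an invented single-base change used only for illustration.

```python
def melting_temperature(seq):
    """Basic Tm estimate (deg C) from base counts; constants are assumed, see lead-in."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C bases")
    return 64.9 + 41.0 * (g + c - 16.4) / total


def tm_shift(wild_type, mutant):
    """Difference in estimated Tm (mutant minus wild type)."""
    return melting_temperature(mutant) - melting_temperature(wild_type)


# Primer sequence quoted in the text; the mutant below is hypothetical (first g -> a).
wild_type = "ggggaacttctcctgctagaat"
mutant = "agggaacttctcctgctagaat"
tm_wt = melting_temperature(wild_type)
delta = tm_shift(wild_type, mutant)
```

Because the formula depends only on base composition, a substitution changes the estimate only when it alters the g+c count.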
variant distribution across genomes and comparison of variant proportions across genome datasets are represented in figure . out of the high-quality unique variants, variants were found to be reported for the first time in the global compilation of variants table . we have also added new variants ( . %) to the indian repertoire of genetic variants. details of these variants are systematically compiled in supplementary table . the overlap in the variants between the present study of kerala, other indian, and global datasets is summarized in supplementary figure . out of the novel variants, variant in the s gene, g>a, was a personal variant and was not shared by any other isolate. the remaining three novel variants were shared variants and were present in different genes (orf b, orf a and s). of the total high-quality unique variants, variants were located in the protein-coding regions while variants mapped to either downstream or upstream regions. of the total variants in protein-coding regions, variants were non-synonymous, were synonymous, and variants resulted in stopgain mutation. these two stopgain variants were found in orf a ( :g>t) and orf ( :g>a) genes and were present in one individual each. the annotation of the variants based on the location and consequence is represented in figure . the phylogenetic tree was constructed using the genome wuhan/wh (epi_isl_ ) as root and genomes from india which met the inclusion criteria (ns < %, no missing/ambiguous date of sample collection) including genomes sequenced in this study. all genomes from this study were found to cluster under the globally predominant clade a a (gisaid clade g and gh). in contrast, one of the previous genomes available from kerala (epi_isl_ , submitted by national institute of virology, pune, india), which is also one of the first sars-cov- genomes sequenced in india, belongs to the clade b [ ] . 
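The inclusion criteria used for the phylogenetic dataset (a ceiling on the fraction of ns and a complete collection date) can be expressed as a small filter. This is a sketch under stated assumptions: the threshold value is not recoverable from the text, so it is a required argument rather than a default, and the `Genome` record, the function names, and the `YYYY-MM-DD` date convention are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Genome:
    name: str
    sequence: str
    collection_date: Optional[str]  # assumed "YYYY-MM-DD"; None or partial if unknown


def n_fraction(sequence):
    """Fraction of positions called as N; an empty sequence counts as all N."""
    if not sequence:
        return 1.0
    return sequence.upper().count("N") / len(sequence)


def has_complete_date(date):
    """True only for a fully specified year-month-day date."""
    if not date:
        return False
    parts = date.split("-")
    return len(parts) == 3 and all(p.isdigit() for p in parts)


def passes_inclusion(genome, max_n_fraction):
    """Apply both criteria: N fraction below the threshold and a complete date."""
    return (
        n_fraction(genome.sequence) < max_n_fraction
        and has_complete_date(genome.collection_date)
    )


# Hypothetical records used only to exercise the filter.
g1 = Genome("g1", "ACGTNACGTA", "2020-07-15")
g2 = Genome("g2", "ACGTAACGTA", "2020-07")
g3 = Genome("g3", "ACGTAACGTA", "2020-XX-XX")
```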
haplotype analysis was done using a dataset of sars-cov- genomes from india (including genomes from kerala) that fell under clade a a in the phylogenetic tree and clustered close to the genomes from kerala. among the genomes, there were variable sites and unique haplotypes. supplementary figure summarizes the haplotype network of the a a clade genomes. may) for the three major haplogroups k , k , and k respectively. taken together, the analysis suggests that the majority of the sars-cov- isolates are outcomes of limited introductions early in the epidemic followed by local circulation. annotating the variants for their functional consequences using custom annotation datasets revealed a total of genetic variants that were predicted as deleterious by sift [ ] . the filtered variants were found to span unique protein domains as per uniprot [ ] annotations. we found and genetic variants that mapped back to potential b and t cell epitopes from the immune epitope database (iedb) [ ] respectively. in addition, variants were found to span predicted error-prone sites including sequencing error sites, homoplasic positions, and hypermutable sites. functional annotation details of all the filtered variants are summarised in supplementary table . we also explored whether the variants mapped to the rt-pcr primer and probe sites. on mapping the genetic variants to the curated primers and probes, we found unique variants at unique primer or probe binding sites. a total of four unique variants had allele frequency > % at unique primer binding sites. summaries of the novel variants and of the variants spanning diagnostic primers/probes are compiled in table and table respectively. details on the read count and depth of coverage of these variants are systematically documented in supplementary table .a and . b . 
with a view to identifying potentially functionally relevant variants, we intersected the variants obtained from the present study with a manually curated compilation of functionally relevant sars-cov- variants. our analysis identified variants in the s gene which were reported to be associated with increased infectivity. l f, a variation co-occurring with d g, was earlier demonstrated to possess increased infectivity [ , ] using cell line studies. in our study, a>g (d g) and c>t (l f) mutations were observed at frequencies of . % and . % respectively in the genomes. the combination of these variations was found to occur at a higher frequency in genomes from kerala. [ ] and therefore leaves the mutational fingerprint which is widely used for tracing the spread of the virus [ ] . the availability of high-throughput sequencing approaches has enabled researchers to sequence genomes as the pandemic progressed in their respective countries. a number of methods have been adopted for rapid high-throughput sequencing of sars-cov- including shotgun sequencing [ ] , pcr amplicon, and hybridization/capture-based enrichment and sequencing [ ] [ ] [ ] . genome sequencing of sars-cov- in various countries [ ] has led to insights into the temporal and geographical spread of the virus [ ] , introductions and spread of the virus through travelers [ , ] , local transmission and dynamics [ ] , and the origin of outbreaks [ ] , to name a few. by virtue of its connectivity to major cities through its expatriate population, trade, and tourism, kerala is uniquely poised in this pandemic. it is not surprising therefore that the first case of covid- in india, early in the pandemic, was reported from the state [ , ] . the genome of the isolate suggested it originated from china [ ] . 
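The mapping of variants to primer and probe binding sites described above amounts to an interval lookup on reference coordinates. The sketch below assumes 1-based inclusive coordinates and an optional allele-frequency cut-off; the `Primer` and `Variant` records, the function name, and all coordinates in the example are hypothetical and are not taken from the curated list used in the study.

```python
from typing import NamedTuple


class Primer(NamedTuple):
    name: str
    start: int  # 1-based, inclusive reference coordinate
    end: int    # 1-based, inclusive reference coordinate


class Variant(NamedTuple):
    position: int  # 1-based reference coordinate
    ref: str
    alt: str
    allele_frequency: float  # fraction between 0 and 1


def variants_in_primers(variants, primers, min_af=0.0):
    """Return (primer name, variant) pairs where the variant lies inside the
    primer binding site and its allele frequency exceeds min_af."""
    hits = []
    for primer in primers:
        for variant in variants:
            inside = primer.start <= variant.position <= primer.end
            if inside and variant.allele_frequency > min_af:
                hits.append((primer.name, variant))
    return hits


# Hypothetical coordinates chosen only to exercise the lookup.
primers = [Primer("hypothetical_fwd", 100, 121)]
variants = [
    Variant(110, "C", "T", 0.40),
    Variant(130, "G", "A", 0.90),
    Variant(121, "A", "G", 0.02),
]
all_hits = variants_in_primers(variants, primers)
frequent_hits = variants_in_primers(variants, primers, min_af=0.05)
```

For a few hundred primers and variants the nested loop is adequate; an interval tree would only matter at much larger scale.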
the following months have seen the number of cases increase to over thousand in the state with a paucity of information on the origin, spread, and dynamics of the virus [ https://dashboard.kerala.gov.in/ ]. in this present study, we performed sequencing and analysis of sars-cov- isolates from kerala which revealed unique patterns of the transmission. these genomes are clustered into a monophyletic group mapping to the a a clade. the a a clade is also marked by the d g variant, which is suggested to confer higher infectivity to the virus in experimental in vitro settings [ , ] and is therefore thought to have emerged globally as the predominant clade [ ] , though the cause-effect relationship still remains speculative. haplotype analysis suggests that three major haplogroups with distinct ancestry groups encompass the majority of the isolates. the haplotype analysis in the context of other genomes from india suggests the introductions were from inter-state travel. the prevalent haplotypes were not found in any of the global genomes, supporting this observation. this also suggests that focussed testing, tracing and quarantine of expatriates and international travelers implemented during the epidemic would have been effective in curbing the spread from international travelers. the genome clusters also suggested polytomies, suggesting a recent outbreak [ ] . close follow-up of the cluster members confirmed the potential source of the outbreak, suggesting genetic epidemiology could be used in conjunction with case follow-ups to uncover potential outbreaks and possibly connect outbreaks which are apparently not related. this study uncovered a total of novel genetic variants and variants which were identified only in kerala and not in the rest of india. the genome sequences could also uncover insights into the variants of functional relevance. one of the variants of significance is a stopgain variant ( :g>a) in the orf gene. 
variants including deletions in orf have been suggested to attenuate the virus [ , ] . similar variants have also been identified in other related viruses like the sars-cov and mers-cov [ , ] . a variant c>t (l f) in the s gene associated with increased infectivity of the virus [ ] was present in . % of the genomes sequenced. following recent reports which suggest variants in the primer/probe binding sites could impact the efficiency of rt-pcr assays [ , , ] , we explored whether any of the variants in the present study mapped to the primer/probe binding sites. we identified unique variants in unique binding sites. the maximum number of variants were in the primer set published by won et al. [ ] spanning multiple genes, apart from the -ncov-nfp ggggaacttctcctgctagaat binding sites in the n gene [ https://www.who.int/docs/default-source/coronaviruse/whoinhouseassays.pdf?sfvrsn=de a aa_ ]. the latter is part of the china centers for disease control and prevention (cdc) protocol, with variants in . % of genomes from kerala. we have earlier reported variants in this primer site in . % of the genomes from india [ ] and . % [ ] of global genomes. this information would be potentially valuable for laboratories in selecting reagents for screening and diagnosis. the study has two caveats. first, the samples were collected from a single major tertiary care center in north kerala. however, the center caters to a large population and region and is close to an international airport. secondly, the sampling was limited to a short period of time, thus enabling only a cross-sectional view of the epidemic and precluding an accurate and temporal view of the dynamics of the epidemic in the state. nevertheless, this provides a unique opportunity to create a snapshot of the epidemic in time and space. notwithstanding the limitations, this is the first and most comprehensive overview of the genetic epidemiology of sars-cov- in the state of kerala. 
while providing insights into the epidemiology of the epidemic, the study also enabled tracing outbreaks thereby highlighting the utility of genome sequencing as an adjunct to high-throughput screening and testing. it has not escaped our mind that scalable technologies that can combine both the approaches [ ] could potentially find a place in understanding epidemics better. a new coronavirus associated with human respiratory disease in china evolutionary history, potential intermediate animal host, and cross-species analyses of sars-cov- global initiative on sharing all influenza data -from vision to reality rapid implementation of sars-cov- sequencing to investigate cases of health-care associated covid- : a prospective genomic surveillance study multiple approaches for massively parallel sequencing of sars-cov- genomes directly from clinical samples hidra-seq: high-throughput sars-cov- detection by rna barcoding and amplicon sequencing high throughput detection and genetic epidemiology of sars-cov- using covidseq next generation sequencing biorxiv tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus attenuation of replication by a nucleotide deletion in sars-coronavirus acquired during the early stages of human-to-human transmission a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- analysis of the potential impact of genomic variants in sars-cov- genomes from india on molecular diagnostic assays biorxiv first isolation of sars-cov- from clinical samples in india full-genome sequences of the first two sars-cov- viruses from india genomics of indian sars-cov- : implications in genetic diversity, possible origin and spread of virus. 
medrxiv genome-wide analysis of indian sars-cov- genomes for the identification of genetic mutation and snp phylogenomic analysis of sars-cov- genomes from western india reveals unique linked mutations biorxiv a distinct phylogenetic cluster of indian sars-cov- isolates biorxiv computational protocol for assembly and analysis of sars-ncov- genomes trimmomatic: a flexible trimmer for illumina sequence data hisat: a fast spliced aligner with low memory requirements the sequence alignment/map format and samtools varscan: variant detection in massively parallel sequencing of individual and pooled samples seqkit: a cross-platform and ultrafast toolkit for fasta/q file manipulation annovar: functional annotation of genetic variants from high-throughput sequencing data rapid efficient extraction of snps from multi-fasta alignments nextstrain: real-time tracking of pathogen evolution a dynamic nomenclature proposal for sars-cov- lineages to assist genomic epidemiology interactive tree of life (itol) v : recent updates and new developments recent developments in the mafft multiple sequence alignment program issues with sars-cov- sequencing data dnasp : dna sequence polymorphism analysis of large data sets tcs: a computer program to estimate gene genealogies popart : full-feature software for haplotype network construction bayesian phylogenetic and phylodynamic data integration using beast . posterior summarization in bayesian phylogenetics using tracer . hybridization of synthetic oligodeoxyribonucleotides to phi chi dna: the effect of single base pair mismatch sift: predicting amino acid changes that affect protein function uniprot: the universal protein knowledgebase the immune epitope database (iedb): update the impact of mutations in sars-cov- spike on viral infectivity and antigenicity genome-wide analysis of sars-cov- virus strains circulating worldwide implicates heterogeneity. sci rep cog-uk) consortiumcontact@cogconsortium.uk. 
an integrated national scale sars-cov- genomic surveillance network geographical and temporal distribution of sars-cov- clades in the who european region rapid sars-cov- whole-genome sequencing and analysis for informed public health decision-making in the netherlands whole genome and phylogenetic analysis of two sars-cov- strains isolated in italy in genomic epidemiology of sars-cov- in guangdong province clinical features of patients infected with novel coronavirus in wuhan the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity d g mutation of sars-cov- spike protein enhances viral infectivity. microbiology. biorxiv; identification of unique mutations in sars-cov- strains isolated from india suggests its attenuated pathotype biorxiv effects of a major deletion in the sars-cov- genome on the severity of infection and the inflammatory response: an observational cohort study deletion variants of middle east respiratory syndrome coronavirus from humans newly emerging mutations in the matrix genes of the human influenza a(h n )pdm and a(h n ) viruses reduce the detection sensitivity of real-time reverse transcription-pcr the impact of primer and probe-template mismatches on the sensitivity of pandemic influenza a/h n / virus detection by real-time rt-pcr development of a laboratory-safe and low-cost detection protocol for sars-cov- of the coronavirus disease (covid- ) presence of mismatches between diagnostic pcr assays and coronavirus sars-cov- genome authors thank anjali bajaj for editorial assistance. aj, md, bj acknowledge research fellowships from csir. ds acknowledges a research fellowship from intel. authors acknowledge funding for the work from the council of scientific and industrial research (csir), india through grants codest and mlp . the funders had no role in the design of experiment, analysis, or decision to publish . 
supplementary table : gisaid acknowledgment table for global genomes used in the study supplementary table : gisaid acknowledgment table for genomes from india considered for phylogenetic analysis supplementary table : data summary of the samples sequenced by covidseq protocol and processed using a custom pipeline. table . summary of quality details of the variants identified in the study key: cord- -gp u kh authors: song, xiang; hu, wei; yu, haibo; zhao, laura; zhao, yeqian; zhao, yong title: high expression of angiotensin-converting enzyme- (ace ) on tissue macrophages that may be targeted by virus sars-cov- in covid- patients date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gp u kh angiotensin-converting enzyme- (ace ) has been recognized as the binding receptor for the severe acute respiratory syndrome coronavirus (sars-cov- ) that infects host cells, causing the development of the new coronavirus infectious disease (covid- ). to better understand the pathogenesis of covid- and build up the host anti-viral immunity, we examined the levels of ace expression on different types of immune cells including tissue macrophages. flow cytometry demonstrated that there was little to no expression of ace on most of the human peripheral blood-derived immune cells including cd + t, cd + t, activated cd + t, activated cd + t, cd +cd +cd low/− regulatory t cells (tregs), th cells, nkt cells, b cells, nk cells, monocytes, dendritic cells (dcs), and granulocytes. additionally, there was no ace expression (< %) found on platelets. compared with interleukin- -treated type macrophages (m ), the ace expression was markedly increased on the activated type macrophages (m ) after the stimulation with lipopolysaccharide (lps). immunohistochemistry demonstrated that high expressions of ace were colocalized with tissue macrophages, such as alveolar macrophages found within the lungs and kupffer cells within livers of mice. 
flow cytometry confirmed the very low level of ace expression on human primary pulmonary alveolar epithelial cells. these data indicate that alveolar macrophages, as the frontline immune cells, may be directly targeted by the sars-cov- infection and therefore need to be considered for the prevention and treatment of covid- . the epidemic of a new coronavirus infectious disease is wreaking havoc worldwide, caused by the severe acute respiratory syndrome coronavirus (sars-cov- ). currently, this virus has been globally spreading for over months, with over million confirmed cases and , deaths. due to the lack of effective antiviral drugs, most patients may be treated only by addressing their symptoms, including reducing fevers and coughs. preliminary results from the double-blind, randomized, placebo-controlled trial of intravenous remdesivir showed the reduced median recovery time ( days) for hospitalized covid- patients [ ] . based on this evidence, the united states food and drug administration (fda) has approved remdesivir under an emergency-use authorization for the treatment of adults and children with severe covid- . despite this approved treatment, high mortality rates among patients have persisted. as remdesivir is an antiviral drug, the treatment is not sufficient to control covid- [ ] . to date, no pharmacological treatments have been shown effective for the treatment of covid- [ , ] . consequently, understanding the pathogenesis and finding an alternative treatment to improve clinical outcomes is extremely urgent as a global top priority. sars-cov- is an rna virus that displays high similarities, in both genomic and proteomic profiling, with sars-cov that first emerged in humans in after transmitting from animals in open-air markets in china [ ] . the betacoronaviruses are divided into four lineages (i.e. a-d). both sars-cov and sars-cov- belong to lineage b, with single strand rna ( , and , bp respectively) [ , ] . 
most viruses enter cells through pattern recognition receptors (prr) mediated endocytosis. angiotensin-converting enzyme (ace ), with a multiplicity of physiological roles such as a negative regulator of the renin-angiotensin system [ ] , has been recognized as the prr for sars-cov- infecting host cells [ ] , which is similar to the sars-cov [ ] . the ace expression has been mainly distributed in microvilli of the intestine and renal proximal tubules, gallbladder epithelium, testicular sertoli cells and leydig cells, glandular cells of seminal vesicle and cardiomyocytes [ ] . the human respiratory system is primarily affected by the sars-cov- infection. using the polyclonal anti-serum-based immunohistochemistry, the expression of ace was reported on type ii alveolar epithelial cells [ ] . however, a single cell-rna profiling analysis showed that only . % of lung type ii alveolar epithelial cells expressed ace at rna level [ ] , with no expression of ace protein [ ] . thus, there are fundamental knowledge gaps underlying the pathogenesis of covid- that need to be clarified. to understand the immunopathology and advance the strategies for the prevention and treatment of covid- , we examined the levels of ace expression on different types of immune cells. our data demonstrated that the activated macrophages and alveolar tissue macrophages, among others, displayed high levels of ace , while most of the immune cells were negative or displayed very low expressions of ace . this data highlights the importance of macrophages in the pathogenesis and treatment of covid- . to determine the expression of ace on different types of immune cells, human buffy coat blood units (n = ; mean age of ± . ; age range from to years old; males and females) were purchased from the new york blood center (new york, ny, usa). human buffy coats were initially added to ml chemical-defined serum-free culture x-vivo tm mediums (lonza, walkersville, md, usa) and mixed thoroughly with ml pipette. 
next, they were used for isolation of peripheral blood-derived mononuclear cells (pbmcs). pbmcs were harvested as previously described [ ] . briefly, mononuclear cells were isolated from buffy coats blood using ficoll-paque tm plus (g= . , ge healthcare, chicago, il, usa). next, the red blood cells were removed using ack lysing buffer (lonza, walkersville, md, usa). after three washes with saline, the whole pbmc were utilized for flow cytometry. to isolate monocytes, monocytes were purified from pbmc by using cd + microbeads (miltenyi biotec, bergisch gladbach, germany) according to the manufacturer's instruction, with purity of cd + cells > %. to generate th cells, cd + t cells were initially isolated from pbmc using cd + microbeads (miltenyi biotec, bergisch gladbach, germany) according to the manufacturer's instruction, with purity of cd + t cells > %. the purified cd + t cells were seeded at ´ cells/well in the anti-cd monoclonal antibody (mab) ( µg/ml, bd pharmingen, franklin lakes, nj, usa) precoated -well tissue culture-treated plate, in the presence of soluble anti-cd mab ( µg/ml, bd pharmingen, franklin lakes, nj, usa), interleukin (il)- ( ng/ml, biolegend, san diego, ca, usa), il- β ( ng/ml, biolegend, san diego, ca, usa), transforming growth factor (tgf)-β ( ng/ml, biolegend, san diego, ca, usa), il- ( ng/ml, biolegend, san diego, ca, usa), penicillin-streptomycin ( µg/ml, sigma, saint louis, mo, usa), the neutralizing antibodies anti-il- mab ( µg/ml, bd pharmingen, franklin lakes, nj, usa) and anti-ifn-γ mab ( µg/ml, bd pharmingen, franklin lakes, nj, usa), in x-vivo serum-free medium ( µl per well), at °c, % co conditions. after culturing for days, the cells were stimulated with the cell activation cocktail of . mm monensin sodium salt, . mm phorbol -myristate -acetate and . mm ionomycin calcium salt (bio-techne corporation, minneapolis, mn, usa) at µl. stock solution was diluted with ml of the culture medium for hours at °c, % co conditions. 
finally, the th cells were harvested for flow cytometry analysis. adult human platelet units (n = ; mean age of . ± . ; age range from to years old; males and female) were purchased from the new york blood center (new york, ny, http://nybloodcenter.org/). the mitochondria were isolated from platelets using the mitochondria isolation kit (thermo scientific, rockford, usa) according to the manufacturer's recommended protocol [ ] . the concentration of mitochondria was determined by the measurement of protein concentration using a nanodrop spectrophotometer (thermofisher scientific, waltham, ma, usa). for mitochondrial staining with fluorescent dyes, mitochondria were labeled with mitotracker deep red fm ( nm) (thermo fisher scientific, waltham, ma) at °c for minutes according to the manufacturer's recommended protocol, followed by two washes with pbs at rpm × minutes [ ] . finally, the mitochondria were harvested for flow cytometry analysis. to generate m macrophages and determine ace expression, monocytes were purified from human pbmc by using cd microbeads (miltenyi biotec, bergisch gladbach, germany) according to the manufacturer's instruction, with the purity of cd + cells > %. the isolated monocytes were seeded in the six-well tissue culture-treated plate at × cells/well in the attached monocytes were washed twice with pbs to remove floating cells and cellular debris, followed by treatment with ng/ml m-csf (sigma, saint louis, mo, usa) in x-vivo serum-free medium at °c, % co . after culturing for days, the m macrophages were treated in duplicate with µg/ml lipopolysaccharides (lps) (sigma, saint louis, mo, usa) or ng/ml il- (biolegend, san diego, ca, usa) for hours in x-vivo serum-free medium. subsequently, cells from the different groups were collected for evaluation. 
phenotypic characterization of pbmc, monocytes, macrophages and th cells were to determine an expression of ace protein on different types of pbmc, macrophages, th cells, platelets or platelet-derived mitochondria, the indirectly-labeled immunostaining with mouse anti-human ace mab (novus biologicals, catalogue# nbp - - µg, clone #ac f, littleton, co, usa) was utilized in combination with above lineage-specific fluorescence-labeled mabs. briefly, samples were first pre-incubated with human bd fc block to block non-specific binding (bd pharmingen, franklin lakes, nj, usa) for minutes at room temperature, before being directly aliquoted for different antibody stainings. cells were initially incubated with mouse anti-human ace mab at : dilution, at room temperature for minutes. next, cells were washed with pbs at × g minutes and then stained with fitc-conjugated affinipure donkey or cy -conjugated affinipure donkey anti-mouse nd abs (jackson immunoresearch laboratories, west grove, pa, usa) at : dilution for minutes at room temperature. cells only with nd ab staining served as control. after finishing the nd ab staining, cells were washed with ml pbs to remove residual nd ab. consequently, cells were immunostained with above lineage-specific fluorescence-labeled mabs, as previously described [ ] [ ] [ ] female nod/ltj mice (aged weeks) and nod-scid mice (aged - weeks) were purchased from jackson laboratories (bar harbor, me, usa) and maintained under pathogen-free conditions, according to a protocol approved by the institutional animal care committee (acc). tissue samples (e.g., small intestine, lung, liver and kidney) were fixed in % formaldehyde, processed, and embedded in paraffin. tissue sections were cut at mm thickness. tissue sections from nod/ltj mice ( weeks old) were utilized for immunohistochemistry including lung tissue sections (n = ), small intestine tissue sections (n = ), liver tissue sections (n = ), and kidney tissue sections (n = ). 
immunostaining was performed as previously described with minor modifications [ ] . to block non-specific staining, sections were incubated in a buffer containing . % horse serum (vector laboratories, burlingame, ca, usa) and mouse fc block (bd pharmingen, franklin lakes, nj, usa) for minutes at room temperature. tissue sections were initially immunostained with ace rabbit polyclonal antibody (abcam, cambridge, ma, usa) at : dilution for hours at room temperature. next, tissue sections were stained with cy conjugated affinipure donkey anti-rabbit nd ab (jackson immunoresearch laboratories, west grove, pa, usa) at : dilution and combined with fitc-conjugated rat anti-mouse f / mab (ebioscience, san diego, ca, usa) at : dilution at room temperature for hour. for every experiment, only tissue sections with nd ab staining served as negative control. finally, the slides were covered by using mounting medium with dapi (vector laboratories, burlingame, ca, usa) and photographed with a nikon a r confocal microscope on a nikon eclipse ti inverted base, using nis elements version . software. statistical analyses were performed with graphpad prism (version . . ) software. the normality test of samples was performed by the shapiro-wilk test. statistical analyses of data were performed by the two-tailed paired student's t-test to determine statistical significance between untreated and treated groups. the mann-whitney u test was utilized for non-parametric data. values were given as mean ± sd (standard deviation). statistical significance was defined as p < . , with two sided. to explore the direct action of sars-cov- on immune cells, we examined the ace expressions on different types of immune cells from human peripheral blood (n = ). 
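The summary statistics named in the statistical analysis above (mean ± sd and the paired student's t-test between untreated and treated groups) can be sketched with the standard library alone. The sketch stops at the t statistic and its degrees of freedom: turning these into a two-sided p-value, running the shapiro-wilk test, or running the mann-whitney u test needs a statistics package and is not shown. The function names and the example measurements are hypothetical, and the authors used graphpad prism rather than code of this kind.

```python
import math
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) for a list of measurements."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(untreated, treated):
    """Paired t statistic and degrees of freedom for matched measurements.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the treated-minus-untreated
    differences and n is the number of pairs.
    """
    if len(untreated) != len(treated):
        raise ValueError("paired samples must have the same length")
    diffs = [t - u for u, t in zip(untreated, treated)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical paired measurements, for illustration only.
untreated = [1.0, 2.0, 3.0, 4.0]
treated = [2.0, 4.0, 5.0, 5.0]
t_value, dof = paired_t_statistic(untreated, treated)
mean_treated, sd_treated = mean_sd(treated)
```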
the cells were characterized and gated with cell type-specific surface markers [ ] : cd for t cells, cd + cd + for cd + t cells, cd + cd + for cd + t cells, cd c + cd for myeloid dendritic cells (mdc), cd -cd + for plasmacytoid dc (pdc), cd for monocytes, cd for b cells, cd + cd + cd low/for regulatory t cells (tregs), cd + cd + for nkt cells, cd -cd + for nk cells, and cd -cd b + for granulocytes (figure a-c). flow cytometry demonstrated that most types of immune cells expressed no ace, or only a background level (< %) (figure d). the percentages of ace + cells for nk and nkt cells were only . % ± . % and . % ± . % respectively (n = ) (figure d). the activated cd + hla-dr + t cells displayed only . % ± . % of ace + cells (n = ). for the activated cd + hla-dr + t cells, the percentage was only . % ± . % (n = ) (figure d). the data suggest that the sars-cov- virus may not directly attack blood immune cells lacking ace expression. t-helper type (th ) cells are important pathogenic mediators for several autoimmune diseases, potentially contributing to the pathogenesis of covid- . rorγt (retinoic acid receptor-related orphan nuclear receptor gamma t) belongs to the nuclear hormone receptors (nhrs) and acts as a crucial transcription factor for the differentiation and function of th cells in both humans and mice [ ] . using rorγt, interleukin- a (il- a), il- f, and ccr as specific th markers [ ] , the purity of il a + rorgt + th cells was . % ± . % (figure a). the percentage of il a + il f + th cells was . % ± . % (figure a). the purity of il a + ccr + th cells was . % ± . % (figure a). the gated th cells failed to express ace (figure b, n = ). these data imply no direct interaction between th cells and sars-cov- . increasing clinical evidence has demonstrated coagulation abnormalities in covid- subjects, including disseminated intravascular coagulopathy (dic) and low platelet counts [ , ] . 
to determine whether platelets were directly targeted by sars-cov- or triggered by viral inflammatory reactions, we examined the ace expression on the highly-purified cd b + cd a + platelets from human peripheral blood (figure a). our previous work established that platelets could release mitochondria contributing to the immune modulation and islet b-cell regeneration [ ] . to explore the mitochondrial function in viral infection, we used flow cytometry, which indicated that while the purified platelet-derived mitochondria did not express ace (figure d), they strongly displayed the mitochondrial antiviral-signaling protein (mavs), with the percentage of mitotracker red + hsp + mavs + mitochondria at . % ± . % (figure e, n = ) [ ] . these data suggest that platelets may have the potential to improve antiviral immunity through releasing mitochondria. macrophages have been classified as type macrophages (m , inflammatory) and type macrophages (m , anti-inflammatory), according to their phenotypic differences such as spindle-like morphology and high expression of cd and cd on m macrophages [ ] . initially, flow cytometry established that cd + monocytes from human peripheral blood failed to express ace ( . % ± . %, n = ) (figure a). m macrophages were then generated in the presence of ng/ml macrophage colony-stimulating factor (m-csf), with the percentage of spindle-like cells at . % ± . % (n = ). to evaluate the ace expression on macrophages, m macrophages were activated by treatment with lipopolysaccharide (lps) [ ] and interleukin- (il- ) respectively [ ] . phase contrast images showed significant differences in morphology between the two groups (figure b). lps-treated m macrophages exhibited pseudopod-like protrusions compared to the spindle form of il- -treated m macrophages (figure b, left). flow cytometry demonstrated that the level of ace expression was higher on the lps-activated m macrophages than on the il- -treated m macrophages (figure c-e). 
this finding was further confirmed by the confocal microscopy and image analysis ( figure f ). therefore, the data suggests the upregulation of ace expression on the activated m macrophages. to determine the expression of ace on tissue macrophages, we initially performed doublestaining with mouse macrophage marker f / through the immunohistochemistry in the small intestine, lung, liver, and kidney tissues of -week non-obese diabetic (nod) mice. using an expression of ace in the small intestine as a positive control ( figure a) , the data revealed that most ace expressions were co-localized with the f / + tissue macrophages in the lung and liver ( figure b and c) , which are known as dust cells (alveolar macrophages, figure b ) and kupffer cells ( figure c ) respectively. unexpectedly, there was little to no expression of ace on the alveolar epithelial cells ( figure b ). additionally, we observed the strong expressions of ace on the proximal tubules, with some scattered f / + macrophages double-positive with ace staining ( figure d ). to further determine the distribution of ace expression on alveolar macrophages, tissue sections were examined by utilizing non-inflammatory lung tissue from nod-scid mice ( figure e ). immunohistochemistry confirmed the marked co-localization of ace expression on f / + alveolar macrophages ( figure f ), with a low expression of ace on alveolar epithelial cells. next, we utilized the primary human pulmonary alveolar epithelial cells (hpaepic) to define their level of ace expression. flow cytometry proved the low level ( %) of ace expression on human pulmonary alveolar epithelial cells ( figure g ). therefore, these data indicate the high expression of ace on tissue macrophages. the human pulmonary system is primarily targeted by sars-cov- . ace and proteases such as tmprss (transmembrane protease, serine ) or cathepsin b/l were utilized for host cellular entry of sars-cov- [ ] . 
ace may act as a limiting factor for viral infection at the initial stage [ ] . sungnak et al. reported the high expression of ace in nasal epithelial cells [ ] . our current studies demonstrated little to no expression of ace on both primary human pulmonary alveolar epithelial cells and mouse lung alveolar epithelial cells, which is consistent with previous reports [ , ] . notably, we found that high expression of ace was colocalized with tissue macrophages of the lung (alveolar macrophages) and liver (kupffer cells), and up-regulated on the activated m macrophages. however, most immune cells in human peripheral blood were negative for ace expression, including the freshly-isolated monocytes. therefore, these data highlight the importance of alveolar macrophages during the pathogenesis of lung damage in covid- subjects. based on this evidence, we propose that lung macrophages may be directly clinical autopsies from sars-cov-infected patients demonstrated that there were major pathological changes in the lungs, immune organs, and small systemic blood vessels with vasculitis. the detection of sars-cov was primarily found in the lung and trachea/bronchus, but was undetectable in the spleen, lymph nodes, bone marrow, heart and aorta [ ] . this evidence highlights the overreaction of immune responses induced by viral infection which resulted in significant harm, as evidenced by pathogenesis of the lungs, immune organs, and small systemic blood vessels. the pathological study in covid- patients revealed that the majority of infiltrated immune cells in alveoli were macrophages and monocytes, with minimal lymphocytes, eosinophils and neutrophils [ ] . there were abundant proinflammatory macrophages in the bronchoalveolar lavage fluid of severe covid- patients [ ] . 
thus, the virus-infected alveolar macrophages play a critical role in the pathogenesis of covid- and sars [ ] [ ] [ ] and may recruit the lung infiltration of additional immune cells through predominantly releasing cytokines and chemokines [ , ] , resulting in pulmonary edema and hypoxemia: the hallmark of acute respiratory distress syndrome (ards) ( figure ). consequently, the viral inflammation disrupts the homeostasis and causes multiple dysfunctions including lymphopenia [ ] , coagulopathy [ , ] , diarrhea [ ] , liver injury [ ] , manifestations of cardiovascular [ , ] and central nervous system [ ] , and renal failures [ ] . additionally, the percentage of inflammation-associated th cells was markedly increased in the peripheral blood of covid- patients [ ] . th cells act as important pathogenic mediators for several autoimmune diseases including type diabetes (t d), multiple sclerosis, rheumatoid arthritis, alopecia areata (aa), psoriasis, and even the insulin resistance in type diabetes (t d) [ ] [ ] [ ] . thus, the viral inflammation caused by sars-cov- is very similar to the majority of autoimmune diseases in humans caused by overactive immune systems. therefore, immune modulation strategy may be potentially beneficial to enhance antiviral immunity and efficiently reduce the viral load, improve clinical outcomes, expedite patient recovery, and decline the rate of mortality in patients after being infected with sars-cov- . macrophages are widely distributed in human tissues and organs with pleiotropic functions in maintaining homeostasis. therefore, dysfunctions of macrophages may increase vulnerability to the viral infection. for example, the phenotype of intestinal macrophages may be changed during the chronic intestinal inflammation and infection [ ] . 
due to high expression of ace in the epithelial cells of the intestine and the utilization of immune-suppression regimens, patients with chronic inflammatory bowel disease (ibd) might have an increased risk of the sars-cov- infection [ ] . additionally, increasing clinical evidence demonstrates that the chronic metabolic stress-induced inflammation causes multiple dysfunctions of macrophages, leading to the insulin resistance in type diabetes [ ] . clinical studies have shown that type diabetes is one of the major risk factors for covid- [ , [ ] [ ] [ ] [ ] , while an interstitial subset of cd + lung-resident macrophages, primarily located around the airways in close proximity to the sympathetic nerves of the bronchovascular bundle, exhibit the characteristics of type macrophages and antiinflammatory effects [ ] . therefore, these two types of macrophages play an essential role in the immune surveillance and maintenance of homeostasis of the pulmonary system. considering all current approaches for the prevention and treatment of covid- , there are no therapies, either being tested or at the beginning of the pipeline, that directly focus on the modulation of macrophages. to address the overreaction of immune responses caused by viral infection, immune suppression regimens (e.g., chloroquine, hydroxychloroquine, jak inhibitors, anti-cytokine therapy, and anti-il- r antibody) are being tested in clinical trials, which may be not sufficient to treat covid- as they make subjects more vulnerable to the sars-cov- infection, in addition to their associated side effects. rather, the immune modulation strategy like that of traditional chinese herbs lian-hua-qing-wen granule [ ] may potentially enhance anti-viral immunity and improve clinical outcomes in patients after being infected with sars-cov- . additionally, over the last years, dr. 
yong zhao at tianhe stem cell biotechnologies has developed the stem cell educator (sce) therapy by utilizing the immune modulation of human cord blood-derived multipotent stem cells (cb-sc) for the treatment of type diabetes and other inflammation-associated diseases. clinical safety and efficacy have been demonstrated for sce therapy through multicenter clinical trials [ ] [ ] [ ] [ ] . recently, mechanistic studies established that cb-sc-released exosomes could differentiate the purified cd + monocytes into m macrophages [ ] , suggesting that sce therapy may have the potential to treat covid- (clinicaltrials.gov: nct ). targeting alveolar and other tissue macrophages through immune modulation may be potentially beneficial to correct the viral inflammation, effectively improve anti-viral immunity, efficiently reduce the viral load, improve clinical outcomes, expedite patient recovery, and reduce the mortality rate in patients infected with sars-cov- . the authors are grateful to mr. poddar and mr. ludwig for generous funding support via hackensack umc foundation. the authors have no financial interests that may be relevant to the submitted work. ace protein was primarily displayed on alveolar macrophages of the lung, with no or low expression on alveolar type (at-i) and type (at-ii) epithelial cells. when the virus enters the pulmonary alveoli, healthy alveolar macrophages may directly kill it, resulting in an asymptomatic course or mild clinical symptoms. at this earlier stage, the infected alveolar macrophages may alternatively recruit other immune cells to build up the antiviral immunity through releasing cytokines (e.g., il- , il- , il- , and tnfa) and chemokines (e.g., cxcl and cxcl to recruit granulocytes, cxcl to recruit t cells, nk cells, and dcs). for instance, the recruited cd + t cells may secrete interferon (ifn)-g to strengthen the antiviral immunity of alveolar macrophages and minimize the viral load. 
however, if this first line of defense is broken, the more cytokines and chemokines are released from the dead cells or dead-cell engulfed macrophages (stages and ), the more immune cells are infiltrated into pulmonary systems, leading to patients experiencing a rapid deterioration and the development of ards with high fatality in the clinic. remdesivir for the treatment of covid- -preliminary report pharmacologic treatments for coronavirus disease (covid- ): a review should chloroquine and hydroxychloroquine be used to treat covid- ? a rapid review systematic comparison of two animal-to-human transmitted human coronaviruses: sars-cov- and sars-cov functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses sars-cov- receptor and regulator of the renin-angiotensin system: celebrating the th anniversary of the discovery of ace cryo-em structure of the -ncov spike in the prefusion conformation angiotensin-converting enzyme is a functional receptor for the sars coronavirus the protein expression profile of ace in human tissues tissue distribution of ace protein, the functional receptor for sars coronavirus. 
a first step in understanding sars pathogenesis
single-cell rna expression profiling of ace , the receptor of sars-cov-
a human peripheral blood monocyte-derived subset acts as pluripotent stem cells
platelet-derived mitochondria display embryonic stem cell markers and improve pancreatic islet beta-cell function in humans
generation of multipotent stem cells from adult human peripheral blood following the treatment with platelet-derived mitochondria
released exosomes contribute to the immune modulation of cord blood-derived stem cells (cb-sc)
human cord blood stem cell-modulated regulatory t lymphocytes reverse the autoimmune-caused type diabetes in nonobese diabetic (nod) mice
standardizing immunophenotyping for the human immunology project
small molecule inhibitors of rorgammat: targeting th cells and other applications
covid- and its implications for thrombosis and anticoagulation
thrombocytopenia is associated with severe coronavirus disease (covid- ) infections: a meta-analysis
il- induces m macrophages to produce sustained analgesia via opioids
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes
organ distribution of severe acute respiratory syndrome (sars) associated coronavirus (sars-cov) in sars patients: implications for pathogenesis and virus transmission pathways
single-cell landscape of bronchoalveolar immune cells in patients with covid-
the lung macrophage in sars-cov- infection: a friend or a foe? frontiers in immunology
pathological inflammation in patients with covid- : a key role for monocytes and macrophages
pathogenesis of macrophage activation syndrome and potential for cytokine-directed therapies
cytokine release syndrome in severe covid-
covid- : consider cytokine storm syndromes and immunosuppression
characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention
autopsy findings and venous thromboembolism in patients with covid-
liver injury during highly pathogenic human coronavirus infections
covid- and the heart
cardiac involvement in a patient with coronavirus disease (covid- )
central nervous system manifestations of covid- : a systematic review
pathological findings of covid- associated with acute respiratory distress syndrome
the pathogenicity of th cells in autoimmune diseases
th cells in immunity and autoimmunity
the potential pathogenic role of il- /th cells in both type and type diabetes mellitus
origin, differentiation, and function of intestinal macrophages
viral screening before initiation of biologics in patients with inflammatory bowel disease during the covid- outbreak
macrophages, inflammation, and insulin resistance
case-fatality rate and characteristics of patients dying in relation to covid- in italy
diabetes is a risk factor for the progression and prognosis of covid-
clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study
identification of a nerve-associated, lung-resident interstitial macrophage subset with distinct localization and immunoregulatory properties
efficacy and safety of lian-hua qing-wen granule for covid- : a protocol for systematic review and meta-analysis
reversal of type diabetes via islet beta cell regeneration following immune modulation by cord blood-derived multipotent stem cells
hair regrowth in alopecia areata 
patients following stem cell educator therapy targeting insulin resistance in type diabetes via immune modulation of cord blood-derived multipotent stem cells (cb-scs) in stem cell educator therapy: phase i/ii clinical trial differentiation capacity of peripheral naive t cells in rheumatoid and psoriatic arthritis key: cord- - wfb gt authors: ghorbani, mahdi; brooks, bernard r.; klauda, jeffery b. title: critical sequence hot-spots for binding of ncov- to ace as evaluated by molecular simulations date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wfb gt the novel coronavirus (ncov- ) outbreak has put the world on edge, causing millions of cases and hundreds of thousands of deaths all around the world, as of june , let alone the societal and economic impacts of the crisis. the spike protein of ncov- resides on the virion’s surface mediating coronavirus entry into host cells by binding its receptor binding domain (rbd) to the host cell surface receptor protein, angiotensin converter enzyme (ace ). our goal is to provide a detailed structural mechanism of how ncov- recognizes and establishes contacts with ace and its difference with an earlier coronavirus sars-cov in via extensive molecular dynamics (md) simulations. numerous mutations have been identified in the rbd of ncov- strains isolated from humans in different parts of the world. in this study, we investigated the effect of these mutations as well as other ala-scanning mutations on the stability of rbd/ace complex. it is found that most of the naturally-occurring mutations to the rbd either strengthen or have the same binding affinity to ace as the wild-type ncov- . this may have implications for high human-to-human transmission of coronavirus in regions where these mutations have been found as well as any vaccine design endeavors since these mutations could act as antibody escape mutants. 
furthermore, in-silico ala-scanning and long-timescale md simulations highlight the crucial role of the residues at the interface of rbd and ace that may be used as potential pharmacophores for any drug development endeavors. from an evolutionary perspective, this study also identifies how the virus has evolved from its predecessor sars-cov and how it could further evolve to become more infectious. the novel coronavirus (ncov- ) outbreak emerging from china has become a global pandemic and a major threat to human public health. according to the world health organization (who), as of june th , there have been about million confirmed cases and approaching , deaths due to coronavirus in the world. [ ] [ ] much of the human population, including the united states of america, was under lockdown or official stay-at-home orders to minimize the continued spread of the virus. coronaviruses are a family of single-stranded enveloped rna viruses. phylogenetic analysis of the coronavirus genome has shown that ncov- belongs to the beta-coronavirus family, which also includes mers-cov, sars-cov and bat-sars-related coronaviruses. [ ] [ ] it is worth mentioning that sars-cov, which was widespread in , caused more than , cases and about deaths, and that mers-cov (middle east respiratory syndrome coronavirus) in also spread to more than countries, causing about , cases and more than deaths (www.who.int/health-topics/coronavirus). in all coronaviruses, a homotrimeric spike glycoprotein on the virion's envelope mediates coronavirus entry into host cells through a mechanism of receptor binding followed by fusion of viral and host membranes. , the coronavirus spike protein contains two functional subunits, s and s . the s subunit is responsible for binding to the host cell receptor, and the s subunit is responsible for fusion of viral and host cell membranes. 
, the spike protein in ncov- exists in a meta-stable pre-fusion conformation that undergoes a substantial conformational rearrangement to fuse the viral membrane with the host cell membrane. , ncov- is closely related to bat coronavirus ratg with about . % sequence similarity in the spike protein gene. the sequence similarity of ncov- and sars-cov is less than % in the spike genome. the s subunit in the spike protein includes a receptor binding domain (rbd) that recognizes and binds to the host cell receptor. the rbd of ncov- shares . % sequence identity with the sars-cov rbd, and the root mean squared deviation (rmsd) between the structures of the two proteins is . , which shows the high structural similarity. , , experimental binding affinity measurements using surface plasmon resonance (spr) have shown that ncov- has fold higher affinity than sars-cov in binding to ace . based on the sequence similarity between the rbd of ncov- and sars-cov, and also the tight binding between the rbd of ncov- and ace , it is most probable that ncov- uses this receptor on human cells to gain entry into the body. , , , the spike protein, and specifically the rbd in coronaviruses, has been a major target for therapeutic antibodies. however, no monoclonal antibodies targeted to the rbd have been able to bind efficiently and neutralize ncov- . , the core of ncov- rbd is a - the sequence alignment between sars-cov in human, sars civet, bat ratg coronavirus and ncov- in the rbm is shown in figure . there is a % sequence similarity between the rbm of ncov- and sars-cov. rbm mutations played an important role in the sars epidemic in . , two mutations in the rbm of sars- from sars-civet were observed from strains of these viruses. these two mutations were k n and s t. these two residues are close to the virus binding hotspots in ace including hotspot- and hotspot- . hotspot- centers on the salt-bridge between k -e and hotspot- is centered on the salt-bridge between k -e on ace . 
residues k and s in sars-civet are in close proximity with these hotspots and mutations at these residues caused sars to bind ace with significantly higher affinity than sars-civet and played a major role in civet-to-human and human-to-human transmission of sars coronavirus in . , [ ] [ ] [ ] numerous mutations in the interface of sars-cov rbd and ace from different strains of sars isolated from humans in have been identified and the effect of these mutations on binding ace have been investigated by surface plasmon resonance. , two identified rbd mutations (y f and l f) increased the binding affinity of sars-cov to ace and two mutations (n k, t s) decreased the binding affinity. it was demonstrated that these mutations were viral adaptations to either human or civet ace . , a pseudotyped viral infection assay of the interaction between different spike proteins and ace confirmed the correlation between high affinity mutants and their high infection. further investigation of rbd residues in binding of sars-cov and ace was performed through ala-scanning mutagenesis, which resulted in identification of residues that reduce binding affinity to ace upon mutation to alanine. these residues are k , r , d , d , i , n , f , q , y and r . rbd mutations have also been identified in mers-cov, which affected their affinity to receptor (dpp ) on human cells. it is not known whether these mutations are linked to the severity of coronavirus in these regions. the focus of this article is to elucidate the differences between the interface of sars-cov and ncov- with ace to understand with atomic resolution the interaction mechanism and hotspot residues at the rbd/ace interface using long-timescale molecular dynamics (md) simulation. an alanine-scanning mutagenesis in the rbm of ncov- helped to identify the key residues in the interaction, which could be used as potential pharmacophores for future drug development. 
furthermore, we performed molecular simulations on the seven most common mutations found from surveillance of rbd mutations: n k, t i, v a, g s, s p, v f and a v. from an evolutionary perspective, this study shows the residues in which the virus might further evolve to be even more dangerous to human health. ncov- shares % sequence similarity with sars- spike protein, % sequence identity for rbd and % for the rbm. bat coronavirus ratg seems to be the closest relative of ncov- , sharing about % sequence identity in the spike protein. the mutations selected are listed in table s along with their location in rbd. the crystal structure of ncov- in complex with hace (pdb id: m j) as well as the sars-cov complex with human ace (pdb id: acj) were obtained from rcsb (www.rcsb.org). the energy minimization steps were performed using the steepest descent algorithm. in all steps the lincs algorithm was used to constrain all bonds containing hydrogen atoms, and a time step of fs was used as the integration time step. equilibration of all systems was performed in three steps. in the first step, , steps of simulation were performed using a velocity-rescaling thermostat to maintain the temperature at k with a . ps coupling constant in the nvt ensemble under periodic boundary conditions, with harmonic restraints on the backbone and sidechain atoms of the complex. the velocity-rescaling thermostat was used in all other steps of simulation. in the next step, we performed , steps in the isothermal-isobaric npt ensemble at a temperature of k and a pressure of bar using a berendsen barostat. this was done by decreasing the force constant of the restraint on backbone and side chain atoms of the complex from to and finally to . the berendsen barostat was used only for the equilibration step because of its usefulness in rapidly correcting density. in the next step the restraints were removed, and the systems were subjected to , , steps of md simulation under the npt ensemble. 
in the production run, harmonic restraints were removed and all the systems were simulated using an npt ensemble where the pressure was maintained at bar using the parrinello-rahman barostat with a compressibility of and a coupling constant of . ps. it is important to note that the berendsen barostat was used only for the equilibration step, as it was shown that this barostat can cause unrealistic temperature gradients. the production run lasted for ns for the sars-cov and ncov- complexes and ns for all the mutants, using a fs timestep and the particle-mesh ewald (pme) method for long-range electrostatic interactions, with the gromacs . package. all mutant systems were constructed as described before and all complexes ran for ns of production run. the principal components were used to calculate and plot the approximate free energy landscape (afel). we refer to the free energy landscape produced by this approach as approximate in that the ensemble with respect to the first few pcs (lowest-frequency quasiharmonic modes) is not close to convergence, but the analysis can still provide valuable information and insight. the g_sham, g_covar and g_anaeig functions in gromacs were used to obtain the principal components and afel. in each afel the deep valleys represent the most stable conformations separated by some intermediate states. the dynamic cross-correlation maps (dccm) were obtained using the md_task package to identify the correlated motions of rbd residues. in dccm the cross-correlation matrix by setting a value of and for solvent and solute dielectric constants. the non-polar free energy is simply estimated from the solvent accessible surface area (sasa) of the solute from equation . to compute the rmsd of the systems, the rotational and translational movements were removed by first fitting the c α atoms of the rbd to the crystal structure and then computing the rmsd with respect to the c α atoms of the rbd in each system. in most of the variants, the rmsd is stable during the ns simulation. 
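The dynamic cross-correlation between two atoms i and j is commonly defined as C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), where Δr_i is the displacement of atom i from its mean position and the angle brackets denote an average over frames. The following is a minimal sketch of that definition in plain Python, not the md_task implementation. It assumes the trajectory has already been fitted to a reference structure (as described for the rmsd calculation) and is given as a list of frames, each a list of (x, y, z) tuples; the function name and the toy coordinates are hypothetical.

```python
import math


def dccm(trajectory):
    """Return the N x N dynamic cross-correlation matrix.

    trajectory[t][i] is the (x, y, z) position of atom i in frame t.
    Assumes overall rotation and translation were removed beforehand.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])

    # Mean position of every atom over the trajectory.
    mean = [
        [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        for i in range(n_atoms)
    ]

    # Displacement of every atom from its mean position in every frame.
    disp = [
        [[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
        for frame in trajectory
    ]

    def covariance(i, j):
        # Time average of the dot product of the two displacement vectors.
        total = sum(
            sum(d[i][k] * d[j][k] for k in range(3)) for d in disp
        )
        return total / n_frames

    variance = [covariance(i, i) for i in range(n_atoms)]
    # Note: an atom that never moves has zero variance and would need
    # special handling; this sketch does not cover that case.
    return [
        [covariance(i, j) / math.sqrt(variance[i] * variance[j])
         for j in range(n_atoms)]
        for i in range(n_atoms)
    ]


# Toy trajectory: atoms 0 and 1 move together, atom 2 moves the opposite way.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (7.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
matrix = dccm(toy_trajectory)
for row in matrix:
    print(" ".join(f"{value:+.2f}" for value in row))
```

Values near +1 indicate atoms that move in the same direction, values near -1 indicate anti-correlated motion, and values near 0 indicate uncorrelated motion. In practice the calculation is usually restricted to one atom per residue, such as the c α atoms.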
a few mutations, however, show some variance in rmsd. in mutation y a, the rmsd increases from the first two eigenvectors were used to calculate and plot the afel as a function of the first two principal components using the last ns of the simulation for mutant systems. the afel for a few mutants is shown in figure and the rest are shown in figure s . the binding energetics between ace and the rbd of sars-cov, ncov- and all its mutant complexes were investigated by the mmpbsa method. for ncov- . the binding free energy for ncov- and sars-cov was decomposed into per-residue contributions to the binding affinity to find the residues that contribute strongly to the binding and complex formation (figure ). most of the investigated residues in the rbm of ncov- had a favorable contribution to the total binding energy. binding free energy decomposition to its individual components for all mutants is represented in contributes the most to this low binding energies for these mutants. the contributions of rbm residues to binding with ace for ncov- were mapped to the rbd structure and are shown in figure b . natural mutants exhibited similar or higher binding affinities compared to wild-type ncov- . importantly, mutation t i which is one of the most frequently occurring mutations in england based on gisaid database table contribution of interface residues to structure in rbd of ncov- . the rbd domain is purple and the ace is yellow. the rbd in contact with ace is rendered in a surface format, with more red being a favorable contribution to binding (more negative) and blue an unfavorable contribution (positive). in this work, we performed md simulations to unveil the detailed molecular mechanism . d in sars-cov is located in a region of high negative charge from residues e , e and d on ace . electrostatic repulsion between d on sars-cov and the acidic residues on ace is the reason for the highly negative contribution of this residue to binding of sars-cov to ace . 
mutation to s in this location removes this highly negative contribution. to our knowledge this is the first detailed molecular simulation study on the effect of mutations on binding of ncov- to ace . previous computational studies have found that ncov- binds to ace with a total binding affinity which was about stronger than sars-cov and is in fair agreement with the results here. the critical role of interface residues is computationally investigated here and in other articles, and the results of all the studies indicate the importance of these residues for the stability of the complex and for finding hotspot residues for the interaction with receptor ace . , [ ] [ ] [ ] it is interesting to note the role of shown there is a correlation between higher binding affinity to receptor and higher infection rate by coronavirus. [ ] [ ] [ ] [ ] high binding affinity for some mutants such as t i could be the reason for the higher human-to-human transmission rate in regions where these mutations are found. it is also concerning that mutations at other residues such as g a, g a, y a and y a increase the binding affinity considerably; these positions should be monitored. mutations of ncov- rbd that do not change the binding affinity and complex stability could have implications for antibody design, since they could act as antibody escape mutants. escape from monoclonal antibodies has been observed for mutations of sars-cov in , and such escape mutations should be considered in any antibody design endeavors. in conclusion, this study unraveled key molecular traits underlying the higher affinity of ncov- for ace compared to sars-cov and unveiled critical residues for the interaction higher affinity than wild-type. other occurring mutations n k and e a are found to increase the electrostatic interaction of rbd with ace . 
it is also concerning that some of the alanine substitutions at residues g , g and y substantially increased the binding affinity, which may lead to stronger rbd attachment to ace and influence infection virulence. on the other hand, most mutations are found not to impact the binding affinity of rbd with ace in ncov- , which could have implications for vaccine design endeavors as these mutations could act as antibody escape mutants. receptor recognition is the first line of attack for coronavirus, and this study gives novel insights into key structural features of interface residues for the advancement of effective therapeutic strategies to stop the coronavirus pandemic. the authors would like to dedicate this article to the doctors and nurses who sacrificed their time, health and even their lives to fight covid- , particularly those in iran and the united states. jbk would also like to dedicate this work to family friend joe kaplan (silver spring, md) who passed away due to covid- on april , . a novel coronavirus from patients with pneumonia in china a pneumonia outbreak associated with a new coronavirus of probable bat origin structure, function, and evolution of coronavirus spike proteins structural and functional basis of sars-cov- entry by using human ace receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus cryo-em structure of the -ncov spike in the prefusion conformation. 
science the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex structural insights into the middle east respiratory syndrome coronavirus a protein and its dsrna binding mechanism structure, function, and antigenicity of the sars-cov- spike glycoprotein potent binding of novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody receptor recognition mechanisms of coronaviruses: a decade of structural studies receptor and viral determinants of sars-coronavirus adaptation to human ace structural analysis of major species barriers between humans and palm civets for severe acute respiratory syndrome coronavirus infections receptor recognition and cross-species infections of sars coronavirus mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus structural analysis of major species barriers between humans and palm civets for severe acute respiratory syndrome coronavirus infections the sars coronavirus s glycoprotein receptor binding domain: fine mapping and functional characterization development and characterization of a severe acute respiratory syndrome-associated coronavirus-neutralizing human monoclonal antibody that provides effective immunoprophylaxis in mice neutralizing human monoclonal antibodies to severe acute respiratory syndrome coronavirus: target, mechanism of action, and therapeutic potential human monoclonal antibody combination against sars coronavirus: synergy and coverage of escape mutants vaccine efficacy in senescent mice challenged with recombinant sars-cov bearing epidemic and zoonotic spike variants structural basis for potent cross-neutralizing human monoclonal antibody protection against lethal human and zoonotic severe acute respiratory syndrome coronavirus challenge escape from human monoclonal antibody neutralization affects in vitro and in vivo fitness of severe acute respiratory syndrome coronavirus 
disease and diplomacy: gisaid's innovative contribution to global health structural basis for receptor recognition by the novel coronavirus from wuhan mcsm-ppi : predicting the effects of mutations on protein the sars coronavirus s glycoprotein receptor binding domain: fine mapping and functional characterization structure of the sars-cov- spike receptor-binding domain bound to the ace receptor the protein data bank gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers improved side-chain torsion potentials for the amber ff sb protein force field optimization of the additive charmm all-atom protein force field targeting chem. phys journal of computational chemistry polymorphic transitions in single crystals: a new molecular dynamics method constant pressure molecular dynamics simulation: the langevin piston method articles you may be interested in harmonic analysis of large systems. i. methodology principal component analysis: a method for determining the essential dynamics of proteins gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers a software suite for analyzing molecular dynamics trajectories assessing the performance of the g_mmpbsa tools to simulate the inhibition of oseltamivir to influenza virus neuraminidase by molecular mechanics poisson-boltzmann surface area methods recent developments and applications of the mmpbsa method virtual screening using molecular simulations insight into the interactive residues between two domains of human somatic angiotensin-converting enzyme and angiotensin ii by mm-pbsa calculation and steered molecular dynamics simulation enhanced receptor binding of sars-cov- through networks of hydrogen-bonding and hydrophobic interactions is the rigidity of sars-cov- spike receptor-binding motif the hallmark for its enhanced infectivity? 
insights from all-atom simulations characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine structural and functional basis of sars-cov- entry by using human ace phylogenetic network analysis of sars-cov- genomes is the rigidity of sars-cov- spike receptor-binding motif the hallmark for its enhanced infectivity? an answer from all-atoms simulations the sars-cov- exerts a distinctive strategy for interacting with the ace human receptor computational simulations reveal the binding dynamics between human ace and the receptor binding domain of sars-cov- spike protein receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus receptor and viral determinants of sars-coronavirus adaptation to human ace efficient replication of severe acute respiratory syndrome coronavirus in mouse cells is limited by murine angiotensin-converting enzyme sars-cov- spike glycoprotein receptor binding domain is subject to negative selection with predicted positive selection mutations key: cord- -jkr p authors: gasser, romain; cloutier, marc; prévost, jérémie; fink, corby; ducas, Éric; ding, shilei; dussault, nathalie; landry, patricia; tremblay, tony; laforce-lavoie, audrey; lewin, antoine; beaudoin-bussières, guillaume; laumaea, annemarie; medjahed, halima; larochelle, catherine; richard, jonathan; dekaban, gregory a.; dikeakos, jimmy d.; bazin, renée; finzi, andrés title: major role of igm in the neutralizing activity of convalescent plasma against sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jkr p characterization of the humoral response to sars-cov- , the etiological agent of covid- , is essential to help control the infection. 
in this regard, we and others recently reported that the neutralization activity of plasma from covid- patients decreases rapidly during the first weeks after recovery. however, the specific role of each immunoglobulin isotype in the overall neutralizing capacity is still not well understood. in this study, we selected plasma from a cohort of covid- convalescent patients and selectively depleted immunoglobulin a, m or g before testing the remaining neutralizing capacity of the depleted plasma. we found that depletion of immunoglobulin m was associated with the most substantial loss of virus neutralization, followed by immunoglobulin g. this observation may help design efficient antibody-based covid- therapies and may also explain the increased susceptibility to sars-cov- of autoimmune patients receiving therapies that impair the production of igm. should be performed early after disease recovery. second, our results suggest that caution should be taken when using therapeutics that impair the production of igm. anti-cd antibodies (b cell-depleting agents) are used to treat several inflammatory disorders. their use is associated with igm deficiency in a substantial number of patients, while their impact on igg and iga levels is more limited (kridin and ahmed, ) . 
in line with our data, recent studies reported that anti-cd therapy could be associated with a higher susceptibility to contract an in vitro microneutralization assay for sars-cov- serology and drug screening decline of humoral responses against sars-cov- spike in convalescent individuals potent neutralizing antibodies from covid- patients define multiple targets of vulnerability rituximab for granulomatosis with polyangiitis in the pandemic of covid- : lessons from a case with severe pneumonia depends on ace and tmprss and is blocked by a clinically proven protease inhibitor covid- in persons with multiple sclerosis treated with ocrelizumab -a pharmacovigilance case series post-rituximab immunoglobulin m (igm) hypogammaglobulinemia intracytoplasmic tyrosine residue of hiv- envelope glycoprotein is critical for basolateral targeting of viral budding in mdck cells key: cord- -eqwxkwqa authors: kumar, roshan; verma, helianthous; singhvi, nirjara; sood, utkarsh; gupta, vipin; singh, mona; kumari, rashmi; hira, princy; nagar, shekhar; talwar, chandni; nayyar, namita; anand, shailly; rawat, charu dogra; verma, mansi; negi, ram krishan; singh, yogendra; lal, rup title: comparative genomic analysis of rapidly evolving sars-cov- viruses reveal mosaic pattern of phylogeographical distribution date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: eqwxkwqa the coronavirus disease- (covid- ) that started in wuhan, china in december has spread worldwide emerging as a global pandemic. the severe respiratory pneumonia caused by the novel sars-cov- has so far claimed more than , lives and has impacted human lives worldwide. however, as the novel sars-cov- displays high transmission rates, their underlying genomic severity is required to be fully understood. we studied the complete genomes of sars-cov- strains from different geographical regions worldwide to uncover the pattern of the spread of the virus. 
we show that there is no direct transmission pattern of the virus among neighboring countries suggesting that the outbreak is a result of travel of infected humans to different countries. we revealed unique single nucleotide polymorphisms (snps) in nsp - (orf b polyprotein) and s-protein within viral isolates from the usa. these viral proteins are involved in rna replication, indicating highly evolved viral strains circulating in the population of usa than other countries. furthermore, we found an amino acid addition in nsp (mrna cap- methyltransferase) of the usa isolate (mt ) leading to shift in amino acid frame from position onwards. through the construction of sars-cov- -human interactome, we further revealed that multiple host proteins (phb, ppp ca, tgf-β, socs , stat , jak / , smad , bcl , cav & specc ) are manipulated by the viral proteins (nsp , pl-pro, n-protein, orf a, m-s-orf a complex, nsp -nsp -nsp -rdrp complex) for mediating host immune evasion. thus, the replicative machinery of sars-cov- is fast evolving to evade host challenges which need to be considered for developing effective treatment strategies. downloaded on march , from ncbi database and based on quality assessment two genomes with multiple ns were removed from the study. further the genomes were annotated using prokka [ ] . a manually annotated reference database was generated using the genbank file of severe acute respiratory syndrome coronavirus isolate-sars-cov- /sh /human/ /chn (accession number: mt ) and open reading frames (orfs) were predicted against the formatted database using prokka (-gcode ) [ ] . further the gc content information was generated using quast standalone tool [ ] . to determine the evolutionary pressure on viral proteins, dn/ds values were calculated for orfs of all strains. the orthologous gene clusters were aligned using muscle v . team, ) . to infer the phylogeny, the core gene alignment was generated using mafft [ ] present within the roary package [ ] . 
further, the phylogeny was inferred using the maximum likelihood method based on the tamura-nei model [ ] in megax [ ] and visualized in interactive tree of life (itol) [ ] and grapetree [ ] . to determine the single nucleotide polymorphisms (snps), whole-genome alignments were made using the libmuscle aligner. for this, we used the annotated genbank file of sars-cov- /sh /human/ /chn (accession no. mt ) as the reference in the parsnp tool of the harvest suite [ ] . as only genomes within a specified mumi distance threshold are recruited, we used option -c to force inclusion of all the strains. as output, it produced a core-genome alignment, variant calls and a phylogeny based on single nucleotide polymorphisms. the snps were further visualized in gingr, a dynamic visual platform [ ] . further, the tree was visualized in interactive tree of life (itol) [ ] . sars-cov- protein annotation and host-pathogenic interactions updated in any protein database, we first annotated the genes using the blastp tool [ ] . the similarity searches were performed against sars-cov isolate tor having accession no. ay selected from ncbi at default parameters. the annotated sars-cov- proteins were mapped against virusite [ ] and interaction databases such as virus.string v . [ ] and intact [ ] for predicting their interactions with host proteins. these proteins were either the direct targets of hcov proteins or were involved in critical pathways of hcov infection identified by multiple experimental sources. 
to build a comprehensive list of human ppis, we assembled data from a total of bioinformatics and systems biology databases with five types of experimental evidence: (i) binary ppis tested by high-throughput yeast two-hybrid (y h) systems; (ii) binary, physical ppis from protein d structures; (iii) kinase-substrate interactions by literature-derived low-throughput or high-throughput experiments; (iv) signaling network by literature-derived low-throughput experiments; and (v) literature-curated ppis identified by affinity purification followed by mass spectrometry (ap-ms), y h, or by literature-derived low [ , ] . filtered proteins (confidence value: . ) were mapped to their entrez id [ ] based on the ncbi database used for interactome analysis. hpi were stimulated using cytoscape v. . . [ ] . next, functional studies were performed using the kyoto encyclopedia of genes and genomes (kegg) [ , ] and gene ontology (go) enrichment analyses using uniprot database [ ] to evaluate the biological relevance and functional pathways of the hcov-associated proteins. all functional analyses were performed using string enrichment and stringify, plugin of cytoscape v. . . [ ] . network analysis was performed by tool networkanalyzer, plugin of cytoscape with the orthogonal layout. general genomic attributes of sars-cov- in this study, we analyzed a total of sars-cov- strains (available on march , ) isolated between december -march from different countries namely usa (n= ), china (n= ), japan (n= ), india (n= ), taiwan (n= ) and one each from australia, brazil, oronasopharynges or lungs, while two of them were isolated from faeces suggesting both respiratory and gastrointestinal connection of sars-cov- (table ) . no information of the source of isolation of the remaining isolates is available. the average genome size and gc content were found to be ± . bp and . ± . %, respectively. 
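the genome size and gc content summary reported above (mean ± standard deviation across isolates) can be illustrated with a minimal python sketch. the study reports using the quast standalone tool for these values; the code below is not quast, and the function names (`gc_content`, `summarize`) and the toy sequences are hypothetical, introduced only for illustration. it assumes that ambiguous bases such as n are excluded from the gc percentage but still counted in the genome length.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Assumption: ambiguous bases (e.g. N) are ignored in the percentage.
    """
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence has no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def summarize(genomes: Dict[str, str]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) of size and GC% over genomes."""
    # Genome size counts every character, including ambiguous bases.
    sizes = [len(seq) for seq in genomes.values()]
    gcs = [gc_content(seq) for seq in genomes.values()]
    return {
        "size_bp": (mean(sizes), stdev(sizes)),
        "gc_percent": (mean(gcs), stdev(gcs)),
    }


# Hypothetical toy sequences, not real isolates.
toy_genomes = {
    "isolate_a": "ATGCGCATTA",
    "isolate_b": "ATGCGNATTAAT",
    "isolate_c": "GGCCATAT",
}
summary = summarize(toy_genomes)
print(summary)
```

in practice the sequences would be read from the downloaded genome files; the toy dictionary only keeps the sketch self-contained.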
all these isolates were found to harbor open reading frames coding for orf a ( bp) and orf b ( bp) polyproteins, surface glycoprotein or s-protein ( bp), orf a protein ( bp our analysis revealed that strains of human-infecting sars-cov- are novel and highly identical (> . %). a recent study established the closest neighbor of sars-cov- as sarsr-cov-ratg , a bat coronavirus [ ] . as covid transited from epidemic to pandemic due to the extremely contagious nature of sars-cov- , it was of interest to relate the strains to their geographical locations. in this study, we employed two methods to delineate the phylogenomic relatedness of the isolates: core genome (figure a & c) and single nucleotide polymorphisms (snps) (figure b). the phylogenies obtained were annotated with the country of isolation of each strain (figure a & b). the phylogenetic clustering was found to be largely concordant between the core-genome (figure a) and snp-based methods (figure b). the strains formed a monophyletic clade, in which mt . (south korea) and mt . (sweden) were the most divergent. focusing on the edge-connections between neighboring countries, between which transmission is more likely to occur, we noted a strain from taiwan (mt ) closely clustered with another from china (mt . ). with the exception of these two strains, we did not find any connection between strains of neighboring countries. thus, most strains belonging to the same country clustered distantly from each other and showed relatedness with strains isolated from distant geographical locations (figure a & b). for instance, a sars-cov- strain isolated from nepal (mt ) clustered with a strain from usa (mt ). also, strains from wuhan (lr and lr ), where the virus originated, showed highest identity with usa as well as china strains; strains from india, mt and mt , clustered closely with china and usa strains, respectively (figure a & b). 
similarly, the australian strain (mt ) showed close clustering with a usa strain (figure a & b) and one strain from taiwan (mt ) clustered closely with chinese isolates (figure b). isolates from italy (mt ) and brazil (mt ) clustered with different usa strains (figure a & b). notably, isolates from the same country or geographical location formed a mosaic pattern of phylogenetic placements. for viral transmission, contact between individuals is also an important factor, because of which the spread of identical strains across the borders of neighboring countries is presumably more likely. but we obtained a pattern where indian strains showed highest similarity with usa and china strains, australian strains with usa strains, and italian and brazilian strains with strains isolated from usa, among others. this depicts the viral spread across different communities. however, as genomes of sars-cov- were available mostly from usa and china, sampling bias is evident in the analyzed dataset as available on ncbi. thus, it is plausible for strains from other countries to show most similarity with strains from these two countries. in the near future, as more genome sequences become available from different geographical locations, more accurate patterns of their relatedness across the globe will become available. snps in all predicted orfs in each genome were analyzed using sars-cov- /sh /human/ /chn as a reference. snps were determined using maximum unique matches between the genomes of coronavirus, we observed that the strains isolated from usa although the primary mode of infection is human to human transmission through close contact, which occurs via spraying of nasal droplets from the infected person, yet the primary site of infection and pathogenesis of sars-cov- is still not clear and under investigation. to explore the role of sars-cov- proteins in host immune evasion, the sars-cov- proteins were mapped over the host proteome database (figure b & table ). 
we identified a total of proteins from host proteome forming close association with viral proteins present in orfs of sars-cov- ( figure c ). the network was trimmed in cytoscape v . . where only interacting proteins were selected. only viral proteins were found to interact with host proteins ( figure a ). detailed analysis of interactome highlighted host proteins in direct association with viral proteins. further, the network was analyzed for identification of regulatory hubs based on degree analysis. we identified mitogen activated protein kinase similarly, the viral protein papain-like proteinase (pl-pro) which has deubiquitinase and deisgylating activity is responsible for cleaving viral polyprotein into mature proteins which are essential for viral replication [ ] . our study showed that pl-pro directly interacts with ppp ca which is a protein phosphatase that associates with over regulatory host proteins to form highly specific holoenzymes. pl-pro is also found to interact with tgfβ which is a beta transforming growth factor and promotes t-helper cells (th ) agreement with our prediction where we found il as an interacting partner. our study also showed jak / as an interacting partner which is known for ifnγ signaling. it is well known that tgfβ along with il and stat promotes th differentiation by inhibiting socs [ ]. th is a source of il , which is commonly found in serum samples of covid- patients [ , ] . hence, our interactome is supported from these findings where we found socs , stat , jak / as an interacting partner [ ] . the results suggested that proinflammatory cytokine storm is one of the reasons for sars-cov- mediated immunopathogenesis. in the next cycle of physical events the viral protein nc (nucleoprotein), which is a major structural part of sarv family associates with the genomic rna to form a flexible, helical nucleocapsid. 
interaction of this protein with smad leads to inhibition of apoptosis of sars-cov infected lung cells [ ] , which is a successful strategy of immune evasion by the virus. membranes. cav has also been reported to be involved in viral replication and persistence, and to have a potential role in pathogenesis, in hiv infection [ ] . thus, orf a interactions will upregulate viral replication, playing a crucial role in pathogenesis. multiple structural proteins were observed to target the specc proteins, which are linked with cytokinesis and spindle formation during division. thus, major viral assembly also targets the proteins linked with immunity and cell division. taken together, we estimated that sars-cov- manipulates multiple host proteins for its survival, while their interaction is also a reason for immunopathogenesis. as covid- continues to impact virtually all human lives worldwide due to its extremely contagious nature, it has piqued the interest of the scientific community all over the world in better understanding the pathogenesis of the novel sars-cov- . in this study, the analysis was performed on the genomes of the novel sars-cov- isolates recently reported from different countries to understand viral pathogenesis. with the limited data available so far, we observed no direct transmission pattern of the novel sars-cov- among neighboring countries through our analyses of the phylogenomic relatedness of geographical isolates. isolates from the same locations were phylogenetically distant, for instance, isolates from the usa and china. thus, there appears to be a mosaic pattern of transmission, indicative of infected human travel across different countries. as covid- transited from epidemic to pandemic within a short time, this is not surprising given the genome structures of the viral isolates. 
the genomes of six isolates, specifically from the usa, were found to harbor unique amino acid snps and showed amino acid substitutions in orf b protein and s-protein, while one of them also harbored an amino-acid addition. this is suggestive of the severity of the mutating viral genomes within the population of the usa. these proteins are directly involved in the formation of viral replication-transcription complexes (rtc). therefore, we argue that the novel sars-cov- has fast-evolving replicative machinery and that it is urgent to consider these mutants when developing strategies for covid- treatment. the orf ab polyprotein and s-protein were also found to have dn/ds values approaching and thus might confer a selective advantage in evading host response mechanisms. the construction of the sars-cov- -human interactome revealed that its pathogenicity is mediated by a surge in pro-inflammatory cytokines. it is predicted that the major immune-pathogenicity mechanisms of sars-cov- include alteration of the host cell environment by disintegration of signal transduction pathways and immunity evasion by several protection mechanisms. the mode of entry of this virus by s-proteins inside the host cell is still unclear but it might be similar to sars cov- like viruses. lastly, we believe that as more data accumulate for covid- the evolutionary pattern will become much clearer. isolates. highly similar genomes of coronaviruses were taken as input by parsnp. whole-genome alignments were made using the libmuscle aligner with the annotated genome of the mt strain as reference. parsnp identifies the maximal unique matches (mums) among the query genomes provided in a single directory. as only genomes within a specified mumi distance threshold are recruited, option -c was used to force inclusion of all the strains. the output phylogeny based on single nucleotide polymorphisms was obtained following variant calling on the core-genome alignment. 
c) the minimum spanning tree generated using maximum likelihood method and tamura-nei model showing the genetic relationships of sars-cov- isolates with their geographical distribution. table : general genomic attributes of sars-cov- strains. structural and functional basis of sars-cov- entry by using human ace structure, function, and antigenicity of the sars-cov- spike glycoprotein genome composition and divergence of the novel coronavirus ( -ncov) originating in china sars a/orf a sars b/orf b sars a/orf sars b/orf key: cord- -b bc vwd authors: bansal, kanika; patil, prabhu b. title: codon pattern reveals sars-cov- to be a monomorphic strain that emerged through recombination of replicase and envelope alleles of bat and pangolin origin date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: b bc vwd viruses are dependent on the host trna pool, and an optimum codon usage pattern (cup) is a driving force in their evolution. systematic analysis of cup of replicase (rdrp), spike, envelope (e), membrane glycoprotein (m), and nucleocapsid (n) encoding genes of sars-cov- from reported diverse lineages suggests a one-time host jump of a sars-cov- isolate into the human host. in contrast to human isolates, a high degree of variation in cup of these genes suggests that bats, pangolins, and dogs are natural reservoirs of diverse strains. at the same time, our analysis suggests that dogs are not a source of sars-cov- . interestingly, cup of rdrp displays conservation with two bat sars isolates, ratg and rmyn . cup of the sars-cov- e gene is also conserved with bat and pangolin isolates, with variations for a few amino acids. this suggests a role for allele replacement in these two genes involving sars strains of at least two hosts. at the same time, a relatively conserved cup pattern in replicase and envelope across hosts suggests them to be ideal targets in antiviral development for sars-cov- . 
the origin and success of the novel sars coronavirus (sars-cov- ) (betacoronavirus) causing the covid- disease pandemic has been a topic of intense discussion. in the past two decades since the first outbreak of sars in , several sars-related coronaviruses were reported from bats, which were speculated to be significant reservoirs for possible future outbreaks (cui et al., ; ge et al., ; hu et al., ; li et al., ) . bats are the only flying mammals, representing % of known mammalian species, and are critical natural reservoirs of many zoonotic viruses like nipah virus, hendra virus, rabies virus, ebola virus, etc. (halpin et al., ; leroy et al., ; mackenzie et al., ) . besides bats, a considerable number of wild animals have played a pivotal role in zoonotic transfers (bengis et al., ) . according to reports before the sars-cov- pandemic, there was a high-risk assessment of sars coronavirus infection from wild animals like bats, civets, pangolins, snakes, tigers, and primates in china due to human interventions (bell et al., ; gottlieb, ; tang et al., ) . the human-wildlife interface as a part of culture or globalization poses risks for zoonotic transfers followed by disease outbreaks like the coronavirus outbreaks: sars ( ), mers ( ), and sars-cov- ( ). animal reservoirs for such outbreaks are estimated from genome similarities with already reported sars viruses from diverse animals. for instance, the sars outbreak virus had . % genome similarity with palm civets, indicating them to be a direct source. just . % divergence from the animal reservoir suggests its recent transfer into the masked palm civet population (shi and hu, ) . despite genetic divergence from bat sars-cov, bats were ultimately found to be the source of the pandemic, owing to the absence of pathogen prevalence in the wild civet population and the manifestation of clinical symptoms in civets, unlike bats (li et al., ) . 
however, in the current pandemic, there are several theories of the origin of sars-cov- , either from bat, pangolin, dog or some intermediate host (paraskevis et al., ; zhang et al., ) . the closest match to sars-cov- is ratg ( % identity) isolated from the rhinolophus affinis bat (zhou et al., ) , followed by pangolin sars viruses with % identity (zhou et al., ) . as the closest match is just %, it has opened a heated debate in the scientific community about its origin, and no direct animal source can be detected. according to genome similarities, sars-cov- differs from its closest sars coronavirus by %, followed by % with its next closest relative, from pangolin. this indicates that the virus evolved before infecting humans, and that there is a missing link between bat/pangolin and humans, which further fuels the argument on the animal source. nevertheless, another study based on cpg island deficiency in sars-cov- and canine coronavirus (alphacoronavirus) suggested that dogs may have provided a cellular environment for sars-cov- evolution into a cpg-deficient virus (xia, ) . hence, they claim dog to be a direct source of the current pandemic, raising a constant debate (pollock et al., ) (https://www.linkedin.com/pulse/where-dog-laymans-version-my-mbe-paper-xuhua-xia/). nevertheless, most other rna viruses like pestivirus, in addition to bat or pangolin sars-cov, are also depleted in cpg and were not included in the study. further, the spike receptor-binding domain of sars-cov- was more similar to pangolin sars strains than to bat sars. cpg island deficiency is not a unique feature of dog sars-cov, and pollock et al. have concluded that there is no direct evidence for the role of dogs as intermediate hosts. hence, we need to address this issue with other fundamental evidence. for host jump events, viral codon optimization based on the host trna pool is critical (khandia et al., ; tian et al., ; van weringh et al., ) . 
in the present study, we have focused on the codon usage pattern (cup) of sars coronaviruses from the different hosts under debate (bat, pangolin, and dog) as a probable origin for sars-cov- . usage patterns of synonymous codons are a critical feature in the adaptation of organisms, as viruses are dependent on the host trna pool for replication and disease manifestations. for instance, codon adaptation indices were studied for retroviruses infecting humans, including the hiv- virus (roychoudhury and mukherjee, ). once the viral genome is in the host translational machinery, genes having codons optimized for the host translate faster, resulting in higher fitness of the virus (carbone, ) . hence, an optimum cup is vital in viral evolution and probable host jumps. this also results in synonymous changes in the viral genome, which are not revealed by protein mutational studies. a codon usage study of sars-cov- found that high au content influences its codon usage and suggested better adaptation to humans (dilucca et al., ) . however, another study comparing the codon usage pattern of sars-cov- with other betacoronaviruses suggested that the current pandemic coronavirus is subjected to different evolutionary pressures (gu et al., ) . systematic insights into cup are therefore required to understand the origin and phenomenal success of emergent viruses like sars-cov- . the genome of sars-cov- is a single positive-stranded rna of approximately , nucleotides. the major structural proteins are spike protein (s), envelope protein (e), membrane glycoprotein (m), and nucleocapsid protein (n); rna dependent rna polymerase (rdrp) is non-structural. as these five viral proteins are common amongst betacoronaviruses (woo et al., ) , they will be useful in studying diverse coronaviruses from diverse hosts, i.e., humans, bat, pangolin, and dog. 
presently, we have analyzed cup of these five proteins in sars coronavirus, which are considered important targets for vaccine and antiviral development for sars-cov- (aftab et al., ; du et al., ; huang et al., ; wu et al., ). cup calculations should be based on highly redundant amino acids rather than amino acids having one or two codons. here, we have calculated the percentage of gc-biased synonymous codons (those ending in g or c) for amino acids having at least four synonymous codons (glycine, valine, threonine, leucine, arginine, serine, proline and alanine) (patil and sonti, ). we have compared cup for all five genes amongst sars-cov- genomes representing all the sub-lineages from the two major lineages a and b (supplementary table ). genes having ambiguous nucleotides were omitted from the analysis. the cup of each gene was uniform across all sars-cov- isolates reported from the human population worldwide, irrespective of ancestral or most recent lineages (supplementary figures , , , , and ), indicating a single event of zoonotic transfer of a thriving strain of sars. however, each of the five genes has a distinct cup, indicating distinct selection pressures (figure ). since horseshoe bat sars strains ratg and rmyn are known to be the closest strains to sars-cov- , we have compared cup of these two strains with sars-cov- (figure ). here, the rdrp and e patterns overlapped for all three strains, with slight variations in e for serine and valine. however, all three strains have distinct patterns for the s, m, and n genes. this reaffirms previous studies showing that ratg and rmyn are close to sars-cov- (zhou et al., ), while also indicating that sars-cov- is distinct from them. the unique s, m, and n patterns indicate that sars-cov- has evolved to adapt to humans. to look for the possibility of intermediate hosts, we analyzed other known sars viruses from hypothesized hosts.
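the cup calculation described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' pipeline (the source reports using a web-based program): the function names `gc_ending_percent` and `cup_pattern` are hypothetical, the input is assumed to be an in-frame coding sequence, the standard genetic code is assumed, and the use of the sample standard deviation is an assumption because the source only says "sd".

```python
from collections import Counter
from statistics import mean, stdev

# Codon families (standard genetic code, DNA alphabet) for the eight amino
# acids that have at least four synonymous codons.
CODON_FAMILIES = {
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
    "Thr": ["ACT", "ACC", "ACA", "ACG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}


def gc_ending_percent(cds):
    """Percentage of G/C-ending codons per amino acid for one in-frame CDS.

    Returns None for an amino acid that does not occur in the sequence.
    Codons containing ambiguous bases simply match no family and are ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(seq) - len(seq) % 3     # drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for aa, family in CODON_FAMILIES.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            result[aa] = None
            continue
        gc_ending = sum(counts[c] for c in family if c[-1] in "GC")
        result[aa] = 100.0 * gc_ending / total
    return result


def cup_pattern(cds_list):
    """Mean and sample SD of the G/C-ending percentage across a group of genes."""
    per_gene = [gc_ending_percent(cds) for cds in cds_list]
    pattern = {}
    for aa in CODON_FAMILIES:
        values = [p[aa] for p in per_gene if p[aa] is not None]
        if not values:
            pattern[aa] = None
        else:
            sd = stdev(values) if len(values) > 1 else 0.0
            pattern[aa] = (mean(values), sd)
    return pattern
```

calling `cup_pattern` on the same gene from several isolates gives one mean ± sd value per amino acid, which is the quantity plotted when patterns from different strains or hosts are compared.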
unlike in sars-cov- isolates from humans, cup is variable in isolates from non-human hosts such as bat, pangolin, and dog (supplementary table ), reflecting ongoing adaptation and evolution of sars in these hosts (figure ). interestingly, variations in cup of the s, m, and n genes are more marked across non-human hosts than those of the rdrp and e genes. for instance, rdrp cup for pangolin and dog does not have that degree of variation and is host-specific compared to bat rdrp. bat cup for rdrp is variable in all the strains under study, consistent with the fact that bat is a reservoir of sars coronaviruses. in contrast, cup of e is conserved in bat and pangolin. interestingly, rdrp cup of sars-cov- shows a slight overlap in pattern with all pangolin sars strains, but the overlap is not as prominent as with the two bat strains (figure ). among bat sars strains, the pattern matches only the two previously suggested strains, i.e., ratg and rmyn . in contrast, sars-cov- cup of e has an overlapping pattern with all bat isolates except for jtmc and except for the serine and valine cup. interestingly, precisely the same pattern is observed in all pangolin strains, overlapping with sars-cov- except for valine. unlike the rdrp and e cup of sars-cov- , the s, m, and n cup does not show any similarity with the non-human hosts studied so far. further, the absence of similarity in cup of all five genes with dog strains rules out a role of dog in sars-cov- evolution. the above analysis reveals a single pattern for each of the five genes across different lineages of sars-cov- , affirming a single event of host jump of a codon-optimized sars strain from its animal reservoir. however, amongst the five sars-cov- genes, distinct cup patterns are observed, pointing towards different evolutionary forces acting on the structural and non-structural genes. when comparing cup of the current pandemic strain with all the hypothesized hosts, the sources of the rdrp and e genes were found to be bat ratg and rmyn , whereas no source could be assigned for the s, m, and n genes.
this contradicts the previous study correlating rdrp, s, and n proteins with the ratg isolate (gu et al., ). though ratg is the closest match to the current pandemic strain, some genes of sars-cov- do not share cup with it. this indicates codon optimization in sars-cov- according to the new host (human) as compared to its animal reservoir. further, no gene has any overlap with dog coronavirus strains, ruling out the possibility of dog as an animal reservoir. nevertheless, the s, m, and n genes, whose source remains unknown, play pivotal roles in host interactions and disease manifestations. gene adaptations in s, m, and n have been pivotal in the emergence of the sars viral pool in the hosts and of the current pandemic. this suggests that s, m, and n have been under drastic variation both within and across the hosts. either hyper-recombination events or selection pressure can explain this disparity in cup conservation between rdrp and e on the one hand and s, m, and n on the other. hence, mutation is not the predominant force behind the high variability across hosts. we can conclude that evolution in host interaction steps (s, m, and n) has more significance than at the level of replicability (rdrp) and packaging (e). in addition to this, out of the five genes in bat, only e shows a unified cup, overlapping with pangolin and sars-cov- . this indicates that rdrp and e are under purifying selection and could be better targets for antiviral development. high selection pressure in the s, m, and n genes indicates that antiviral development attempts targeting these may be futile. for centuries, bats have been exploited as medicine and food in southern china and asia (mickleburgh et al., ). hence, wildlife-human interactions have been critical for the evolution of human sars coronavirus outbreaks. the high diversity of cup in bats compared to humans and pangolins mirrors the high genetic diversity of sars in bats (li et al., ). further, some of the sars-related coronaviruses are found to have broad species tropism (ge et al., ).
this implies that bats have provided a playground for sars to mutate or recombine, resulting in a strain that might be more successful in other hosts (in this case, humans) than in bats. the resulting strain (sars-cov- ) might be an accidental strain that originated from recombination in bat but was unfit in bat or in its unknown animal reservoir, and therefore left no evolutionary footprint there. a strain highly similar to the present pandemic strain could not be found, which warrants further studies to explore the diversity of sars-like coronaviruses.
for each gene, the frequency of codon usage for different amino acids was calculated using a web-based program (www.bioinformatics.org). further, eight amino acids, i.e., glycine, valine, threonine, leucine, arginine, serine, proline and alanine, that have at least four synonymous codons were selected, and the percentage of synonymous codons that end with g or c was calculated for each amino acid and gene. the pattern was calculated for a group of genes by plotting mean values ± sd corresponding to a particular amino acid.
zhang, t., wu, q., and zhang, z.j.c.b. ( ). probable pangolin origin of sars-cov- associated with the covid- outbreak.
zhou, p., yang, x.-l., wang, x.-g., hu, b., zhang, l., zhang, w., si, h.-r., zhu, y., li, b., and huang, c.-l. ( ). a pneumonia outbreak associated with a new coronavirus of probable bat origin. nature , - .
analysis of sars-cov- rna-dependent rna polymerase as a potential therapeutic drug target using a computational approach
animal origins of sars coronavirus: possible links with the international trade in small carnivores
the role of wildlife in emerging and re-emerging zoonoses. revue scientifique et technique-office international des epizooties
evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic.
biorxiv
codon bias is a major factor explaining phage evolution in translationally biased hosts
origin and evolution of pathogenic coronaviruses
codon usage and phenotypic divergences of sars-cov- genes
the spike protein of sars-cov-a target for vaccine and therapeutic development
isolation and characterization of a bat sars-like coronavirus that uses the ace receptor
chinese scientists must test wild animals to find the host of sars
multivariate analyses of codon usage of sars-cov- and other betacoronaviruses
isolation of hendra virus from pteropid bats: a natural reservoir of hendra virus
discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus
pharmacological therapeutics targeting rna-dependent rna polymerase, proteinase and spike protein: from mechanistic studies to clinical trials for covid-
analysis of nipah virus codon usage and adaptation to hosts
fruit bats as reservoirs of ebola virus
bats are natural reservoirs of sars-like coronaviruses
emerging viral diseases of southeast asia and the western pacific
a review of the global conservation status of bats
full-genome evolutionary analysis of the novel corona virus ( -ncov) rejects the hypothesis of emergence as a result of a recent recombination event
variation suggestive of horizontal gene transfer at a lipopolysaccharide (lps) biosynthetic locus in xanthomonas oryzae pv. oryzae, the bacterial leaf blight pathogen of rice
viral cpg deficiency provides no evidence that dogs were intermediate hosts for sars-cov-
a dynamic nomenclature proposal for sars-cov- lineages to assist genomic epidemiology
complex codon usage pattern and compositional features of retroviruses.
computational and mathematical methods in medicine
a review of studies on animal reservoirs of the sars coronavirus
the adaptation of codon usage of+ ssrna viruses to their hosts
hiv- modulates the trna pool to improve translation efficiency
coronavirus genomics and bioinformatics analysis
vaccines and therapies in development for sars-cov- infections
extreme genomic cpg deficiency in sars-cov- and evasion of host antiviral defense
key: cord- -zw usmh authors: walter, justin d.; hutter, cedric a.j.; garaeva, alisa a.; scherer, melanie; zimmermann, iwan; wyss, marianne; rheinberger, jan; ruedin, yelena; earp, jennifer c.; egloff, pascal; sorgenfrei, michèle; hürlimann, lea m.; gonda, imre; meier, gianmarco; remm, sille; thavarasah, sujani; zimmer, gert; slotboom, dirk j.; paulino, cristina; plattet, philippe; seeger, markus a. title: highly potent bispecific sybodies neutralize sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zw usmh
the covid- pandemic has resulted in a global crisis. here, we report the generation of synthetic nanobodies, known as sybodies, against the receptor-binding domain (rbd) of sars-cov- spike protein. we identified a sybody pair (sb# and sb# ) that can bind simultaneously to the rbd and block ace binding, thereby neutralizing pseudotyped and live sars-cov- viruses. cryo-em analyses of the spike protein in complex with both sybodies revealed symmetrical and asymmetrical conformational states. in the symmetric complex, each of the three rbds was bound by both sybodies and adopted the up conformation. the asymmetric conformation, with three sb# and two sb# bound, contained one down rbd, one up-out rbd and one up rbd. bispecific fusions of the sybodies increased the neutralization potency -fold, as compared to the single binders. our work demonstrates that linking two binders that recognize spatially discrete binding sites results in highly potent sars-cov- inhibitors for potential therapeutic applications.
the ongoing pandemic arising from the emergence of severe acute respiratory syndrome coronavirus (sars-cov- ) in , demands urgent development of effective antiviral therapeutics. several factors contribute to the adverse nature of sars-cov- from a global health perspective, including the absence of herd immunity [ ] , high transmissibility [ , ] , the prospect of asymptomatic carriers [ ] , and a high rate of clinically severe outcomes [ ] . despite intense development efforts, a vaccine against sars-cov- remains unavailable [ , ] , making alternative intervention strategies paramount. in addition to offering relief for patients suffering from the resulting covid- disease, therapeutics may also reduce the viral transmission rate by being administered to asymptomatic individuals subsequent to probable exposure [ ] . finally, given that sars-cov- represents the third global coronavirus outbreak in the past years [ , ] , development of rapid therapeutic strategies during the current crisis could offer greater preparedness for future pandemics. akin to all coronaviruses, the viral envelope of sars-cov- harbors protruding, club-like, multidomain, homotrimeric spike proteins that provide the machinery enabling entry into human cells [ ] [ ] [ ] . the spike ectodomain is segregated into two regions, termed s and s . the outer s subunit of sars-cov- is responsible for host recognition via interaction between its c-terminal receptor-binding domain (rbd) and human angiotensin converting enzyme (ace ), present on the exterior surface of airway cells [ , ] . while there is no known host-recognition role for the s n-terminal domain (ntd) of sars-cov- , it is notable that s ntds of other coronaviruses have been shown to bind host surface glycans [ , ] . in contrast to the spike subunit s , the s subunit contains the membrane fusion apparatus, and also mediates trimerization of the ectodomain [ ] [ ] [ ] . 
prior to host recognition, spike proteins exist in a metastable pre-fusion state, wherein the s subunits lie atop the s region and their rbds oscillate between up and down conformations that are, respectively, capable and incapable of receptor binding [ , , ]. upon processing at the s /s and s ' cleavage sites by host proteases as well as engagement of the receptor, the s subunit undergoes dramatic conformational changes from the pre-fusion to the post-fusion state. such structural rearrangements are associated with fusion of the viral envelope with host membranes, thereby allowing release of the rna genome into the cytoplasm of the host cell [ , ]. coronavirus spike proteins are highly immunogenic [ ], and several experimental approaches have sought to target this molecule for the purpose of virus neutralization [ ]. the high specificity, potency, and modular nature of antibody-based antiviral therapeutics have shown exceptional promise [ ] [ ] [ ], and the isolated, purified rbd has been a popular target for the development of antibodies directed against the spike proteins of pathogenic coronaviruses [ ] [ ] [ ] [ ]. however, binders of the isolated rbd may not effectively engage the aforementioned pre-fusion conformation of the spike protein, which could account for the poor neutralization ability of recently described single-domain antibodies that were raised against the rbd of sars-cov- spike protein [ ]. therefore, to more easily identify molecules with qualities befitting a drug-like candidate, it would be advantageous to validate rbd-specific binders in the context of the full, stabilized, pre-fusion spike assembly [ , ]. single-domain antibodies based on the variable vhh domain of heavy-chain-only antibodies of camelids, generally known as nanobodies, have demonstrated great potential in several studies [ ].
nanobodies are small, stable, and inexpensive to produce in large amounts in bacteria and yeast [ ], yet they bind targets in a similar affinity range as conventional antibodies. due to their minimal size, they are particularly suited to reach hidden epitopes such as crevices of target proteins [ ]. we recently designed three libraries of synthetic nanobodies, termed sybodies, based on elucidated structures of nanobody-target complexes (fig. a) [ , ]. sybodies can be selected against any target protein within twelve working days, which is considerably faster than the generation of natural nanobodies, which requires repeated immunization over a period of two months prior to binder selection by phage display [ ]. a considerable advantage of our platform is that the selection of sybodies is carried out under defined conditions; in the case of coronavirus spike proteins, this offers the opportunity to generate binders recognizing the metastable pre-fusion conformation [ , ]. finally, due to the feasibility of inhaled therapeutic nanobody formulations [ ], virus-neutralizing sybodies could offer a convenient, fast and direct means of prophylaxis. here, we identified a series of sybodies that bind to two non-overlapping epitopes on the rbd of sars-cov- . when fused to generate a bispecific binder format, the sybodies potently neutralize viral entry of both pseudotyped and live viruses. cryo-em analyses confirmed simultaneous binding of two sybodies and revealed a novel asymmetric spike conformation with one up rbd, one up-out rbd and one down rbd.
sybodies were selected using two rbd constructs fused to additional domains (fc of mouse igg and vyfp, respectively). our "target swap" selection approach (fig. s ) resulted in two enriched pools for each of the three sybody libraries (concave, loop and convex, fig. a ).
an off-rate selection step was performed using the pre-enriched purified sybody pool after phage display round as competitor (see materials and methods). after two rounds of phage display, strong enrichment by factors ranging from to were determined by qpcr (table s ). elisa screening was performed using rbd-vyfp (rbd), commercially acquired spike ectodomain containing wild-type s and s (ecd), and maltose binding protein (mbp) as negative control. elisa analysis revealed very high hit rates for the rbd and the ecd, ranging from % to % and % to %, respectively (fig. s , table s ). at a later stage, we also performed elisas using engineered pre-fusion-stabilized spike ectodomain, containing two stabilizing proline mutations (s- p) [ ] (fig. s ). while most elisa signals for the ecd and s- p were highly similar, we found around sybodies with stronger binding to ecd than to s- p, which can be explained by the fact that the s- p forms a stable trimer, whereas the ecd lacked stabilizing proline mutations as well as the c-terminal foldon trimerization motif and therefore may be predominantly dissociated into monomers with increased internal epitope accessibility. in addition, the ecd might partially or completely adopt a post-fusion state, whereas s- p is expected to be stabilized in the trimeric pre-fusion state [ , ] . elisa-positive sybodies were sequenced ( for each of the selection reactions numbered from sb# - , see also fig. s ). sequencing results of out of sybody clones were unambiguous. out of these clones, were found to be unique and belonged to the concave ( ), loop ( ) and convex ( ) sybody libraries (fig. s , fig. s , table s ). there were no duplicate binders identified in both selection variants, indicating that the two separate selection streams gave rise to completely different sybody populations. two other research groups also used our sybody libraries to generate binders against the sars-cov- rbd [ , ] . 
interestingly, there is no sequence overlap amongst binder hits in these three independent sybody generation campaigns. this demonstrates that the sybody libraries are highly diverse and suggests that identical binders must be the result of over-enrichment, likely occurring towards the end of the binder selection process (i.e., during phage display). although the high sybody sequence diversity was not unexpected due to the very large size of the sybody libraries, this unique and autonomous multi-institute sybody selection campaign clearly demonstrates that it is possible to get access to an enormous variety of binders via independent selection experiments. the selected unique sybodies were individually expressed in e. coli and purified via ni-nta affinity chromatography and size exclusion chromatography. ultimately, sybodies revealed appropriate biochemical features with respect to solubility, yield, and monodispersity, in order to proceed with further characterization. for an in vitro kinetic analysis of sybody interactions with the viral spike, we employed grating-coupled interferometry (gci) [ ] to probe sybody binding to immobilized rbd-vyfp. first, the purified sybodies were subjected to an off-rate screen, which revealed six sybodies (sb# , sb# , sb# , sb# , sb# , and sb# ) with strong binding signals and comparatively slow off-rates. binding constants were then determined by measuring on-and off-rates over a range of sybody concentrations, revealing affinities for rbd within a range of - nm using a langmuir : model for data fitting (fig. s a ). next, we evaluated the ability of the purified sybodies to compete with ace binding by elisa. to this end, binding of purified rbd to immobilized hace was measured in the presence or absence of an excess of each purified sybody ( fig. a) . nearly all sybodies were found to inhibit rbd-hace interaction. 
the signal decrease relative to unchallenged rbd was modest for most sybodies, with an average signal reduction of about %. however, five sybodies (sb# , sb# , sb# , sb# , and sb# ) reduced rbd-attributable elisa signal to near-background levels, implying that these binders were able to almost entirely abolish the interaction between rbd and hace . notably, these five hace -inhibiting sybodies were among the six aforementioned highest affinity rbd binders. we sought to determine if our set of sybodies recognized separate epitopes on the rbd surface. elisa experiments demonstrated that incubation of sb# with s- p only slightly diminished the ability of the spike from binding to immobilized sb# , whereas pre-incubation with sb# , sb# , sb# , sb# , or sb# almost completely prevented the interaction of the spike protein with immobilized sb# (fig. s ). this suggested that sb# and sb# can bind simultaneously to the spike. therefore, we characterized sb# and sb# in more detail and performed gci measurements with the rbd (as a repetition of the initial experiments), as well as s- p and an even further stabilized version of the spike protein containing six prolines (hexapro [ ] ), termed here s- p (fig. b, fig. s b ). in contrast to the data generated using rbd, for which the langmuir : model was used to fit the data, the experimental data for s- p and s- p could only be fitted adequately using a heterogenous ligand model, which accounts for a high and a low affinity binding site. as our cryo-em analysis revealed binding of three sb# molecules and two sb# molecules to a highly asymmetric spike trimer (see below), the heterogenous ligand model could be justified. in the case of sb# , the higher binding affinities (kd ) for s- p and s- p ( nm and nm, respectively) were found to be similar to the one determined for the rbd ( nm). in contrast, kd of sb# was more than -fold stronger for s- p and s- p ( nm and nm, respectively) than for rbd ( nm) (fig. b, fig. s b ). 
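the two fitting models mentioned above can be illustrated with a short python sketch. this is not the instrument software's fitting routine; it only writes out the textbook association-phase equations, assuming a one-to-one langmuir interaction and, for the heterogeneous ligand case, two independent site populations whose signals add. the function names and all rate constants in the usage note are hypothetical.

```python
import math


def equilibrium_kd(ka, kd):
    """Equilibrium dissociation constant K_D = kd / ka (in M).

    ka: association rate constant (1/(M*s)); kd: dissociation rate constant (1/s).
    """
    return kd / ka


def langmuir_association(t, conc, ka, kd, rmax):
    """Signal at time t (s) during analyte injection for a one-to-one interaction.

    Solves dR/dt = ka*conc*(rmax - R) - kd*R with R(0) = 0.
    """
    k_obs = ka * conc + kd             # observed rate constant
    r_eq = rmax * ka * conc / k_obs    # plateau at this analyte concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))


def heterogeneous_ligand_association(t, conc, sites):
    """Sum of independent one-to-one components, one per site population.

    sites: iterable of (ka, kd, rmax) tuples, e.g. one high-affinity and one
    low-affinity population on the immobilized ligand.
    """
    return sum(langmuir_association(t, conc, ka, kd, rmax)
               for ka, kd, rmax in sites)
```

with illustrative values such as `ka = 1e5` and `kd = 1e-3`, `equilibrium_kd` gives about 1e-8 m, and injecting analyte at that concentration saturates half of `rmax` at long times. a curve that needs two `(ka, kd, rmax)` tuples to be described is the situation in which a single-site fit is inadequate.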
to investigate if both sybodies can also bind simultaneously in the context of the trimeric full-length spike protein, we used gci to monitor binding events of the sybodies injected either alone or in combination (fig. c) . when we analyzed the sybodies against coated rbd, the maximal binding signals for sb# ( pg/mm ) and sb# ( pg/mm ) were approximately additive when both sybodies were co-injected ( pg/mm ), clearly showing that both sybodies can bind simultaneously. interestingly, when the same analysis was performed using s- p and s- p, the binding signals of the co-injections ( pg/mm for s- p and pg/mm for s- p) were clearly greater than the sum of the binding signals of sb# and sb# when injected individually ( pg/mm and pg/mm for s- p and pg/mm and pg/mm for s- p). this suggests cooperative binding of the two sybodies to the full-length spike protein, but not of the isolated rbd. to investigate interference of sb# and sb# with ace binding in detail, we performed an ace competition experiment using gci. to this end, s- p was coated on a gci chip and sb# ( nm), sb# ( nm) and the non-randomized convex sybody control (sb# , nm) were injected alone or together with ace ( nm) to monitor binding (fig. b) . indeed, sb# did not bind when injected alone and consequently did not disturb ace binding when co-injected. conversely, both sb# and sb# were found to dominate over ace in the association phase during co-injection, and the resulting curves are highly similar to what was observed when these two sybodies were injected alone. this experiment unequivocally demonstrates a strong competition of ace binding by the two sybodies using s- p as target. ace competition by sb# to this extent was surprising in view of the initial ace elisa competition experiment ( fig. a) . 
however, the seeming discrepancy can be explained by our observation that the affinity of sb# for s- p (used in the gci experiment) is more than times stronger than for the isolated rbd (used in the elisa experiment). to determine the inhibitory activity of the identified sybodies, we conducted in vitro neutralization experiments. towards this aim, we employed engineered vesicular stomatitis viruses (vsv) that were pseudotyped with sars-cov- spikes [ ] . interestingly, only the high affinity sybodies (sb# and sb# ), which also efficiently blocked receptor binding, exhibited potent neutralizing activity with ic values of . µg/ml ( nm) and . µg/ml ( nm), respectively (fig. a , table ). in contrast, sb# and sb# inhibited pseudotyped vsvs only to a limited extent. in agreement with the high affinity of sb# for soluble spike and its ability to compete with ace in the context of s- p as determined by gci, the ic values were similar to those observed for sb# ( . µg/ml, nm). since sb# and sb# can bind simultaneously to the rbd and the full-length spike protein, we mixed sb# and sb# together to investigate potential additive or synergistic neutralizing activity of these two independent sybodies. indeed, consistent with the binding assays, the simultaneous presence of both sybodies resulted in improved neutralization profiles with ic values reaching . µg/ml ( nm) (fig. a , table ). note that no neutralization of the pseudotype virus was observed in a control experiment using a nanobody directed to mcherry at the highest concentration ( µg/ml), thus validating the specificity of the identified sybodies. in addition to the individual sybodies, we also explored potential avidity effects of sybodies genetically fused to human igg fc domains. the respective sybody-fc constructs (sb# -fc, sb# -fc, sb# -fc, sb# -fc and sb# -fc) exhibited vsv pseudotype ic values in the range of . to . 
µg/ml ( nm to nm) and were therefore clearly improved over the respective values of the sybodies alone, which ranged from . to µg/ml ( nm to nm) ( table ). this suggests that the bivalent arrangement of the fc fusion constructs resulted in a discernible avidity effect. it is interesting to note that for some sybodies the gain of neutralization potency was much higher (e.g. for sb# , the ic values for single sybody versus fc-fused sybodies were nm versus nm), whereas for others it was only modest (e.g. for sb# , the respective values were nm versus nm). this indicates that the avidity effect strongly depends on the binding epitope. next, the neutralizing activity of the various sybodies was assessed with live sars-cov- (strain münchen- . / / ) [ ] employing a % neutralization dose (nd ) assay (table ) . sybodies which exhibited the least potent neutralization activities in the pseudotyped vsv assays (sb# , sb# and sb# ), did not block sars-cov- infection. in sharp contrast, sb# and sb# successfully inhibited sars-cov- cell entry, with nd values of . and . µg/ml, respectively. with the exception of sb# , the overall neutralization data obtained with live sars-cov- virus corroborated the findings obtained with the pseudotyped vsv system, although the sybodies were less potent against live sars-cov- . the binding and neutralization data, as well as the structural data presented below, highlighted that sb# and sb# are (i) the most potent neutralizing sybodies; (ii) bind to non-overlapping epitopes on the rbd surface; and (iii) exhibit synergistic virus neutralizing effects. these findings provided the basis to investigate whether fusing both sybodies would further improve the neutralization potency. towards this aim, we engineered three constructs consisting of sb# and sb# fused via a flexible linker (ggggs) of various length (repetitions of x, x or x) (fig. a ). the resulting bi-specific sybodies were accordingly designated gs , gs and gs , respectively. 
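potencies in this section are quoted both as mass concentrations (µg/ml) and as molar concentrations (nm). the conversion depends on the molecular weight of each construct, which is why a fused or fc-linked construct can look quite different on the two scales. the sketch below shows the arithmetic only; the function names are hypothetical, and the molecular weight in the usage note is an illustrative assumption (a single nanobody domain is on the order of 15 kda), not a value taken from this study.

```python
def ug_per_ml_to_nM(conc_ug_per_ml, mw_da):
    """Convert a mass concentration (ug/mL) to a molar concentration (nM).

    1 ug/mL equals 1e-3 g/L; dividing by the molecular weight (g/mol) gives
    mol/L, and multiplying by 1e9 gives nM, hence the overall factor of 1e6.
    """
    return conc_ug_per_ml / mw_da * 1e6


def fold_improvement(ic50_reference_nM, ic50_new_nM):
    """How many times lower the new IC50 is than the reference, in molar terms."""
    return ic50_reference_nM / ic50_new_nM
```

for example, `ug_per_ml_to_nM(1.5, 15000)` is about 100 nm for a hypothetical 15 kda binder. comparing constructs on the molar scale avoids penalizing larger fusion proteins simply for their higher mass.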
the binding kinetics of these three bispecific sybodies were then analyzed by gci using coated s- p (fig. b) , and binding affinities were found to range between pm to pm (using a langmuir : fitting model). this pronounced improvement of the affinity of the bispecific sybodies over the individual binders indicated that the two sybodies of the fused construct bind simultaneously to the spike protein, thereby resulting in a strong avidity effect. in agreement with the improved affinity, all three engineered bispecific constructs displayed highly potent neutralizing activities against both pseudotyped virus and live sars-cov- (ic values of gs : . µg/ml ( nm), gs : . µg/ml ( . nm) and gs : . µg/ml ( . nm) (fig. c , table ). for live sars-cov- virus, nd values of gs : . µg/ml ( nm), gs : . µg/ml ( nm) and gs : . µg/ml ( nm) were determined (table ) . collectively, these data show that fusing sb# and sb# via flexible linkers results in bispecific sybodies with dramatically improved neutralization activity (by a factor of about times compared to the single binders). to gain structural insights into how sb# and sb# recognize the rbd, we performed single particle cryo-em analysis of the spike protein in complex with the sybodies. to generate complexes, sybodies (alone or in combination) were mixed with spike protein at a molar ratio of . : (sybody:spike monomer), prior to a final purification step using size-exclusion chromatography. in total, three cryo-em datasets were collected, allowing a glimpse of the spike protein either simultaneously bound to both sybodies, or associated to sb# or sb# alone ( fig. s - , table s ). the highest resolution was obtained for the spike protein in complex with both sybodies (fig. s ). in contrast, the structures with the individual sybodies were determined based on fewer particles and mainly served to unambiguously assign the binding epitopes of sb# (fig. s ) and sb# (fig. s ). 
although the global resolution of the spike protein in complex with both sybodies is around Å, the local resolution of the rbds with bound sybodies was only in the range of - Å, presumably due to conformational flexibility (fig. s ) . therefore, we did not build full models of the sybodies and provide details only on their interaction surface with the rbds. however, the cryo-em density is good enough to describe the general epitope location and the distinct conformations adopted by the rbds. for better assessment and visualization, we fitted homology models of the respective sybodies into the densities ( fig. s -s ). the sybody homology models were based on pdb: k k [ ] in case of the concave sb# and pdb: m [ ] for the convex sb# . analysis of the spike/sb# /sb# particles after d classification revealed that the spike protein adopts two distinct conformations (fig. s ). the first conformation ( % of particles) has a three-fold symmetry, with three rbds in the up conformation ( up) and two sybodies bound to each of the rbds, confirming that sb# and sb# bind simultaneously (fig. a, fig. s c , f and s a). according to the spike structure obtained with sb# alone (detailed analysis below, fig. s and s ), sb# binds to the top of the rbd. its binding epitope consists of two regions (residues - and - ) and thereby strongly overlaps with the ace binding site (fig. b ). in contrast, sb# binds to the side of the rbd ( fig. s and s d-e) and recognizes a conserved epitope [ ] clearly distinct from the ace interaction site, which includes residues - and - and is buried if the rbd is in its down conformation. although the binding epitope of sb# is clearly distinct from the one of ace , there would be a steric clash between the sb# backside loops and ace , if ace docks to the rbd (fig. b ). this accounts for sb# 's ability to compete with ace as evident from gci analyses (fig. b ). 
the second resolved conformation ( % of particles) of the spike/sb# /sb# complex is asymmetric with the rbds in three distinct states, and was obtained at a global resolution of . Å (fig. c, fig. s c, g and s b). in this case, three sb# and two sb# were bound. the first rbd was in the up conformation, having sb# and sb# bound in a fashion analogous to the symmetric up structure. the second rbd adopted a down state with only sb# bound. this conformation of sb# bound rbd appears to act as a wedge, pushing the third rbd outward and away from the three-fold symmetry axis (fig. d). the third rbd was in an up-out conformation with sb# and sb# bound. however, the density for sb# was very weak, indicating either a very high flexibility or a substoichiometric occupancy. we refer to this novel asymmetric spike conformation as a up/ up-out/ down state (fig. c). virtually the same asymmetric up/ up-out/ down spike conformation was observed for the spike/sb# complex, reinforcing our interpretation that wedging by sb# is responsible for the outward movement of the second up-rbd (fig. s ). however, according to our analysis, comprising only a limited number of images (fig. s d), sb# alone was unable to induce the up conformation, suggesting that adoption of the up state requires the synergistic action of both sybodies to populate this symmetric conformation. finally, analysis of the spike/sb# complex dataset revealed two distinct populations (figure s and s ). the most abundant class showed an up down conformation without sybody bound, which is identical to the one obtained for the spike protein alone [ , ]. the second structure featured two rbds in an up conformation with bound sb# . density for the third rbd was very weak, presumably due to high intrinsic flexibility, hindering the interpretation of its exact position and conformation. we therefore refer to this conformation as an up/ flexible state.
structural comparisons revealed that sb# cannot access its epitope in the context of the up down conformation, due to steric clashes with the neighboring rbd (fig. s b ). in order to bind, at least two rbds need to be in the up conformation. in summary, both sybodies stabilized the up conformation of the rbds. notably, without sybodies, s- p predominantly assumes an equilibrium between the down and the up down conformation [ , ] . upon addition of sb# , the conformational equilibrium was shifted towards an asymmetric up/ up-out/ down state, whereas addition of sb# favored an asymmetric state with rbds adopting a up/ flexible conformation. when added together, the sybodies appear to synergistically act to stabilize two states: a predominant up state, as well as the asymmetric up/ up-out/ down state. in this work, we have demonstrated the ability of our rapid in vitro selection platform to generate sybodies directed to the sars-cov- rbd. the biochemical characterization of these sybodies led to the identification of a high-affinity subset of binders, which were further analyzed in depth using structural, biochemical and functional methods. thereby, we found a pair of sybodies, sb# and sb# , which bind simultaneously to the rbd. both sybodies were found to compete with ace binding, albeit likely through different mechanisms. while the binding epitope of sb# directly overlaps with the one of ace , this is not the case for sb# , which interferes with ace through a steric clash at the sybody backside (fig. b ). in agreement with their similar affinities for the s- p spike protein, sb# and sb# exhibited similar neutralization efficiencies in the range of . - . µg/ml ( nm). we noted a moderate synergistic effect in the virus neutralization test when both individual sybodies were mixed together, resulting in an improved ic of . µg/ml ( nm). 
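the paired mass and molar concentrations quoted for the neutralization values are related through the molecular weight of each construct. a minimal sketch of that conversion is given here; the function names and the molecular weight used in the usage note are illustrative assumptions, not values taken from this study.

```python
def ug_per_ml_to_nM(conc_ug_per_ml: float, mw_da: float) -> float:
    """Convert a mass concentration (µg/ml) to a molar concentration (nM)."""
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    # 1 µg/ml equals 1e-3 g/l; dividing by g/mol gives mol/l, scaled to nM.
    return conc_ug_per_ml * 1e-3 / mw_da * 1e9


def nM_to_ug_per_ml(conc_nM: float, mw_da: float) -> float:
    """Convert a molar concentration (nM) back to a mass concentration (µg/ml)."""
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    # nM -> mol/l, times g/mol gives g/l, scaled to µg/ml (1 g/l = 1e3 µg/ml).
    return conc_nM * 1e-9 * mw_da * 1e3
```

with an assumed molecular weight of 15,000 da, `ug_per_ml_to_nM(1.0, 15_000)` gives roughly 66.7 nM. because a fused bispecific construct is heavier than a single binder, the same mass concentration corresponds to a lower molar concentration.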
this synergy can be explained by the concerted action of the sybodies to compete with ace docking via epitope blockage and steric clashing. cryo-em analyses revealed distinct binding epitopes for the two sybodies sb# and sb# . the s- p spike protein we used for cryo-em was shown to predominantly adopt the down and up/ down conformations [ , ], whereas the s- p/sb# /sb# complex adopts either a novel up/ up-out/ down or a up conformation. the structures further revealed that sb# can only bind to the up-rbd. the inability of sb# (and to some degree also sb# ) to bind to the down-rbd resulted in conformational selection of spike protein with at least two up rbds, thereby shifting the conformational equilibrium of the spike. it is interesting to note that the binding epitope of sb# is highly conserved between sars-cov- and sars-cov- , because it constitutes an interaction interface that, upon binder engagement, stabilizes the rbd in the down conformation. the same conserved epitope is also recognized by the human antibodies cr (isolated from a sars-cov- infected patient and showing cross-specificity against sars-cov- ) and ey a (vice versa) [ , ] (fig. ). hence, the binding epitope of sb# is less likely to be remodelled due to drug-induced selection pressures, thereby limiting the evolution of sars-cov- escape mutants if sb# were to be used as a therapeutic antiviral drug. despite sharing a similar epitope on the rbd, cr and ey a do not display an obvious direct steric clash with ace and, in contrast to sb# , do not compete directly with ace binding (fig. ). since cr and ey a have strong neutralizing capacity, inhibition mechanisms in addition to ace blockage could exist, which may also apply to sb# and sb# . however, for the ey a antibody it has been proposed that surface glycans on ace may interact with ey a and at least partially account for its neutralizing effect [ ].
like the cr and ey a antibodies, our sybodies stabilize spike conformations with -or -up rbds. thereby, the spike protein may be destabilized, resulting in premature and unproductive transitions to the irreversible post-fusion state. this mechanism was dubbed "receptor mimicry" in a study on a neutralizing antibody s , which only bound to up-rbds and thereby triggered fusogenic conformational changes of sars-cov- spike [ ]. however, since we obtained well-resolved cryo-em structures with sb# and sb# bound to the spike after incubating the complex for more than hours, we consider the mechanism of receptor mimicry less plausible in our case. yet, it is important to note that recent investigations of non-engineered sars-cov- spike protein extracted from membranes by detergents revealed unique structural features not found in the stabilized pre-fusion spike, including a stronger compaction of the spike trimer and the predominance of the -down rbd conformation [ ]. further, the study highlighted a high propensity of the native sars-cov- spike to spontaneously transition to the post-fusion state without interacting with ace . therefore, it is still possible that the sybodies (and in particular sb# ) accelerate this spontaneous spike inactivation process in the context of live sars-cov- virus, without affecting the pre-fusion stabilized soluble spike protein used for cryo-em analyses. recent months have brought about a large number of publications on neutralizing antibodies [ ] [ ] [ ] [ ], nanobodies [ , , , ] and other binder scaffolds [ ]. for the smaller scaffolds, in particular nanobodies, fusion of binders via flexible linkers has emerged as a promising strategy to improve neutralization efficiencies by exploiting avidity effects in the context of the trimeric spike protein. however, strategies to exploit genetically fused nanobodies have so far included only identical binders recognizing the same epitope on the rbds [ ].
a crucial issue regarding development of reliable therapeutics against enveloped rna viruses such as sars-cov- is their ability to rapidly develop resistance mutations. recently, the emergence of resistance against monoclonal antibodies targeting the sars-cov- spike-rbd was investigated in vitro [ ] . while drug-resistant viruses indeed emerged rapidly when such antibodies with overlapping epitopes were administered either individually or in combination, escape mutants were not generated when treated with cocktails of non-competing antibodies. because the neutralizing sybody pair (sb# /sb# ) identified in this study was found to simultaneously bind to two spatially-distinct epitopes on the spike-rbd (of which one is highly conserved among sarbecoviruses [ ] ), we anticipate that our rationally engineered single-format bispecific constructs, which displayed highly potent neutralization profiles, may also exhibit high resistance barriers. although monoclonal antibodies (mabs) hold great promise in modern medicine, their manufacture remains tedious, time-consuming and expensive. in addition, the administration of mabs must be performed by medical professionals at hospitals, which further hampers their fast and global availability. conversely, single domain antibodies and their derivative multi-component formats can be produced easily, quickly, and inexpensively in bacteria, yeast, or mammalian cell culture. furthermore, the biophysical properties of single domain antibodies make them feasible for development in an inhalable formulation, thereby not only enabling direct delivery to nasal and lung tissues (two key sites of sars-cov- replication), but also offering the potential of self-administration. overall, we present a robust platform to generate highly potent multi-specific biomolecules against coronaviruses. 
in particular, the rapid selection of sybodies and their swift biophysical, structural and functional characterization, provide a foundation for the accelerated reaction to potential future pandemics. finally, our recently described flycode technology can be utilized for deeper interrogation of sybody selection pools, in order to facilitate discovery of exceptional sybodies possessing very slow off-rates or recognizing rare epitopes [ ] . a gene encoding sars-cov- residues pro -gly (rbd, genbank accession qhd . ), downstream from a modified n-terminal human serum albumin secretion signal [ ] , was chemically synthesized (geneuniversal). this gene was subcloned using fx technology [ ] into a custom mammalian expression vector [ ] , appending a c-terminal c protease cleavage site, myc tag, venus yfp [ ] , and streptavidin-binding peptide [ ] onto the open reading frame (rbd-vyfp). - ml of suspension-adapted expi cells (thermo) were transiently transfected using expifectamine according to the manufacturer protocol (thermo), and expression was continued for - days in a humidified environment at °c, % co . cells were pelleted ( g, min), and culture supernatant was filtered ( . µm mesh size) before being passed three times over a gravity column containing nhsagarose beads covalently coupled to the anti-gfp nanobody k k [ ] , at a resin:culture ratio of ml resin per ml expression culture. resin was washed with column-volumes of rbd buffer (phosphate-buffered saline, ph . , supplemented with additional . m nacl), and rbd-vyfp was eluted with . m glycine, ph . , via sequential . ml fractions, without prolonged incubation of resin with the acidic elution buffer. fractionation tubes were pre-filled with / vol m tris, ph . ( µl), such that elution fractions were immediately ph-neutralized. fractions containing rbd-vyfp were pooled, concentrated, and stored at °c. purity was estimated to be > %, based on sds-page (not shown). 
yield of rbd-vyfp was approximately - μg per ml expression culture. a second purified rbd construct, consisting of sars-cov- residues arg -phe fused to a murine igg fc domain (rbd-fc) expressed in hek cells, was purchased from sino biological (catalogue number: -v h, µg were ordered). purified full-length spike ectodomain (ecd) comprising s and s (residues val -pro ) with a c-terminal his-tag and expressed in baculovirus-insect cells was purchased from sino biological (catalogue number: -v b , µg were ordered). the prefusion ectodomain of the sars-cov spike protein containing two stabilizing proline mutations (s- p) (residues - ) [ ] , was transiently transfected into x suspension-adapted expicho cells (thermo fisher) using mg plasmid dna and mg of pei max (polysciences) per l procho medium (lonza) in a l erlenmeyer flask (corning) in an incubator shaker (kühner). one hour post-transfection, dimethyl sulfoxide (dmso; applichem) was added to % (v/v). incubation with agitation was continued at °c for days. l of filtered ( . um) cell culture supernatant was clarified. then, a ml gravity flow strep-tactin®xt superflow® column (iba lifescience) was rinsed with ml buffer w ( mm tris, ph . , mm nacl, mm edta) using gravity flow. the supernatant was added to the column, which was then rinsed with ml of buffer w (all with gravity flow). finally, six elution steps were performed by adding each time . ml of buffer bxt ( mm biotin in buffer w) to the resin. all purification steps were performed at °c. to remove amines, all proteins were first extensively dialyzed against rbd buffer. proteins were concentrated to µm using amicon ultra concentrator units with a molecular weight cutoff of - kda. subsequently, the proteins were chemically biotinylated for min at °c using nhs-biotin (thermo fisher, # ) added at a -fold molar excess over target protein. immediately after, the three samples were dialyzed against tbs ph . . 
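the biotinylation step above fixes the amount of nhs-biotin as a fold molar excess over the target protein. the arithmetic can be sketched as follows; the function name, the stock concentration and all numbers in the usage note are hypothetical and are not the values used in this work.

```python
def biotin_stock_volume_ul(protein_uM: float, protein_volume_ul: float,
                           fold_excess: float, stock_mM: float) -> float:
    """Volume of NHS-biotin stock (µl) giving the requested molar excess."""
    if min(protein_uM, protein_volume_ul, fold_excess, stock_mM) <= 0:
        raise ValueError("all inputs must be positive")
    # µM x µl = pmol of protein in the reaction.
    protein_pmol = protein_uM * protein_volume_ul
    biotin_pmol = fold_excess * protein_pmol
    # A 1 mM stock holds 1000 pmol per µl.
    return biotin_pmol / (stock_mM * 1000.0)
```

for instance, 200 µl of protein at an assumed 25 µM with a 10-fold excess drawn from an assumed 10 mM stock requires `biotin_stock_volume_ul(25, 200, 10, 10)`, that is 5 µl.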
during these processes (first dialysis/concentration/biotinylation/second dialysis), %, %, % and % of the rbd-vyfp, rbd-fc, ecd and s- p respectively were lost due to adsorption to the concentrator filter or due to aggregation. biotinylated rbd-vyfp, rbd-fc and ecd were diluted to µm in tbs ph . , % glycerol and stored in small aliquots at - °c. biotinylated s- p was stored at °c in tbs ph . . sybody selections with the three sybody libraries concave, loop and convex were carried out as previously detailed [ ]. in short, one round of ribosome display was followed by two rounds of phage display. binders were selected against two different constructs of the sars-cov- rbd: an rbd-vyfp fusion and an rbd-fc fusion. mbp was used as background control to determine the enrichment score by qpcr [ ]. in order to avoid enrichment of binders against the fusion proteins (yfp and fc), we switched the two targets after ribosome display (fig. s ). for the off-rate selections we did not use non-biotinylated target proteins as described [ ] because we did not have the required amounts of purified target protein. instead, we employed a pool competition approach. after the first round of phage display, all three libraries of selected sybodies, for both target-swap selection schemes, were subcloned into the psb_init vector (giving approximately clones) and expressed in e. coli mc cells. the resulting three expressed pools were subsequently combined, giving one sybody pool for each selection scheme. these two final pools were purified by ni-nta affinity chromatography, followed by buffer exchange of the main peak fractions using a desalting pd column in tbs ph . to remove imidazole. the pools were eluted with . ml of tbs ph . . these two purified pools were used for the off-rate selection in the second round of phage display at concentrations of approximately µm for selection variant (competing for binding to rbd-fc) and µm for selection variant (competing for binding to rbd-vyfp).
the volume used for off-rate selection was µl, with . % bsa and . % tween- added to pools immediately prior to the competition experiment. off-rate selections were performed for minutes. elisas were performed as described in detail [ ]. single clones were analyzed for each library of each selection scheme. since the rbd-fc construct was incompatible with our elisa format due to the inclusion of protein a to capture an α-myc antibody, elisa was performed only for the rbd-vyfp ( nm) and the ecd ( nm) and later on with the s- p ( nm). of note, the three targets were analyzed in three separate elisas. as negative control to assess background binding of sybodies, we used biotinylated mbp ( nm). positive elisa hits were sequenced (microsynth, switzerland). the unique sybodies were expressed and purified as described [ ]. in short, all sybodies were expressed overnight in e. coli mc cells in ml cultures. the next day the sybodies were extracted from the periplasm and purified by ni-nta affinity chromatography (batch binding) followed by size-exclusion chromatography (sec) using a sepax srt- c sec column equilibrated in tbs, ph . , containing . % (v/v) tween- (detergent was added for subsequent kinetic measurements). six out of the binders (sb# , sb# , sb# , sb# , sb# , sb# ) were excluded from further analysis due to suboptimal behavior during sec analysis (i.e. aggregation or excessive column matrix interaction). to generate the bispecific sybodies (sb# -sb# fusion with variable glycine/serine linkers), sb# was amplified from psb-init_sb# (addgene # ) using the forward primer atatatgctcttcaagtcaggttc and the reverse primer tatatagctcttcaagaaccgccaccgccgctaccgccaccacctgcgctcacagtcac, encoding x a ggggs motif, followed by a sapi cloning site.
sb# was amplified from psb-init_sb# (addgene # ) using forward primers (atatatgctcttcttctcaagtccagctggtgg), (atatatgctcttcttctggtggtggcggtagcggcggtggcggtagtcaagtccagctggtgg) or (atatatgctcttcttctggtggtggcggtagcggcggtggcggttctggtggtggcggtagcggcggtggcggtagtcaagtccagctggtgg), each combined with the reverse primer tatatagctcttcctgcagaaac. the forward primers start with a sapi site (compatible overhang to sb# reverse primer), followed by none, x or x the ggggs motif. the pcr product of sb# was cloned in frame with each of the three pcr products of sb# into psb-init using fx-cloning [ ], thereby resulting in three fusion constructs with linkers containing x, x or x ggggs motifs as flexible linkers between the sybodies (called gs , gs and gs , respectively). the three bispecific fusion constructs gs , gs and gs were expressed and purified in the same way as single sybodies [ ]. the high affinity sybodies were cloned and produced as human igg fc-fusions by absolute antibody, where they are commercially available. purified recombinant hace protein (mybiosource, cat# mbs ) was diluted to nm in phosphate-buffered saline (pbs), ph . , and μl aliquots were incubated overnight on nunc maxisorp -well elisa plates (thermofisher # - - ) at °c. elisa plates were washed three times with μl tbs containing . % (v/v) tween- (tbst). plates were blocked with μl of . % (w/v) bsa in tbs for h at room temperature. μl samples of biotinylated rbd-vyfp ( nm) mixed with individual purified sybodies ( nm) were prepared in tbs containing . % (w/v) bsa and . % (v/v) tween- (tbs-bsa-t) and incubated for . h at room temperature. these μl rbd-sybody mixtures were transferred to the plate and incubated for minutes at room temperature. μl of streptavidin-peroxidase (merck, cat#s ) diluted : in tbs-bsa-t was incubated on the plate for h.
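returning briefly to the linker primers listed for the bispecific constructs: the number of ggggs repeats each primer encodes can be read directly from its sequence. the sketch below translates the primers quoted in the cloning description in all three reading frames and reports the longest run of the motif. the helper names are illustrative, the standard genetic code is assumed, and the reverse primer is reverse-complemented first so that it is read in the coding direction.

```python
import re

# Standard genetic code, built from the conventional T/C/A/G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer sequences as quoted in the text (hypothetical variable names).
LINKER_REVERSE_PRIMER = "tatatagctcttcaagaaccgccaccgccgctaccgccaccacctgcgctcacagtcac"
FORWARD_PRIMERS = [
    "atatatgctcttcttctcaagtccagctggtgg",
    "atatatgctcttcttctggtggtggcggtagcggcggtggcggtagtcaagtccagctggtgg",
    "atatatgctcttcttctggtggtggcggtagcggcggtggcggttctggtggtggcggtagc"
    "ggcggtggcggtagtcaagtccagctggtgg",
]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence in the given frame, ignoring trailing bases."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def max_linker_repeats(seq: str, motif: str = "GGGGS") -> int:
    """Longest run of consecutive motif repeats found in any forward frame."""
    best = 0
    for frame in range(3):
        for run in re.finditer(f"(?:{motif})+", translate(seq, frame)):
            best = max(best, len(run.group()) // len(motif))
    return best


if __name__ == "__main__":
    print("reverse primer:", max_linker_repeats(reverse_complement(LINKER_REVERSE_PRIMER)))
    for primer in FORWARD_PRIMERS:
        print("forward primer:", max_linker_repeats(primer))
```

the repeat closest to the sapi site shares its final serine codon with the cloning overhang, so per-primer counts are a check on the primer design rather than a statement about the final construct.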
finally, to detect bound biotinylated rbd-vyfp, μl of development reagent containing , ′, , ′-tetramethylbenzidine (tmb), prepared as previously described [ ] , was added, color development was quenched after - min via addition of μl . m sulfuric acid, and absorbance at nm was measured. background-subtracted absorbance values were normalized to the signal corresponding to rbd-vyfp in the absence of added sybodies. purified sybodies carrying a c-terminal myc-his tag (sb_init expression vector) were diluted to nm in µl pbs ph . and directly coated on nunc maxisorp -well plates (thermofisher # - - ) at °c overnight. the plates were washed once with µl tbs ph . per well followed by blocking with µl tbs ph . containing . % (w/v) bsa per well. in parallel, chemically biotinylated prefusion spike protein (s- p) at a concentration of nm was incubated with nm sybodies for h at room temperature in tbs-bsa-t. the plates were washed three times with µl tbs-t per well. then, µl of the s- p-sybody mixtures were added to the corresponding wells and incubated for min, followed by washing three times with µl tbs-t per well. µl streptavidin-peroxidase polymer (merck, cat#s ) diluted : in tbs-bsa-t was added to each well and incubated for min, followed by washing three times with µl tbs-t per well. finally, to detect s- p bound to the immobilized sybodies, µl elisa developing buffer (prepared as described previously [ ] ) was added to each well, incubated for h (due to low signal) and absorbance was measured at nm. as a negative control, tbs-bsa-t devoid of protein was added to the corresponding wells instead of a s- p-sybody mixture. kinetic characterization of sybodies binding onto sars-cov- spike proteins was performed using gci on the wavesystem (creoptix ag, switzerland), a label-free biosensor. for the off-rate screening, biotinylated rbd-vyfp and ecd were captured onto a streptavidin pcp-sta wavechip (polycarboxylate quasi-planar surface; creoptix ag) to a density of - pg/mm . 
sybodies were first analyzed by an off-rate screen performed at a concentration of nm (data not shown) to identify binders with sufficiently high affinities. the six sybodies sb# , sb# , sb# , sb# , sb# , and sb# were then injected at increasing concentrations ranging from . nm to μm (three-fold serial dilution, concentrations) in mm tris ph . , mm nacl supplemented with . % tween- (tbs-t buffer). sybodies were injected for s at a flow rate of μl/min per channel and dissociation was set to s to allow the return to baseline. in order to determine the binding kinetics of sb# and sb# against intact spike proteins, the ligands rbd-vyfp, s- p and s- p were captured onto a pcp-sta wavechip (creoptix ag) to a density of pg/mm , pg/mm and pg/mm respectively. sb# and sb# were injected at concentrations ranging from . nm to nm or . nm to nm, respectively ( -fold serial dilution, concentrations) in tbs-t buffer. sybodies were injected for s at a flow rate of μl/min and dissociation was set to s. in order to investigate if sb# and sb# bind simultaneously to the rbd, s- p and s- p, both binders were either injected alone at a concentration of nm or mixed together at the same individual concentrations at a flow rate of μl/min for s in tbs-t buffer. to measure binding kinetics of the three bispecific fusion constructs, gs , gs and gs , s- p was captured as described above to a density of pg/mm and increasing concentrations of the bispecific fusion constructs ranging from nm to nm ( -fold serial dilution, concentrations) in tbs-t buffer at a flow rate of μl/min. because of the slow off-rates, we performed a regeneration protocol by injecting mm glycine ph for s after every binder injection. for ace competition experiments, s- p was captured as described above. then sb# , sb# or sb# (non-randomized convex sybody control) were either injected individually or premixed with ace in tbs-t buffer. sybody concentrations were at nm and ace concentration was at nm. 
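the injection series and the simple one-site binding model referred to for these measurements can be sketched as follows. this is a textbook langmuir model with a closed-form solution, not the fitting routine of the instrument software; the function names and every rate constant or concentration in the usage note are illustrative assumptions.

```python
import math


def serial_dilution(top_conc: float, factor: float, n_points: int) -> list:
    """Concentrations of a serial dilution, highest first."""
    if factor <= 1 or n_points < 1:
        raise ValueError("factor must exceed 1 and n_points must be at least 1")
    return [top_conc / factor ** i for i in range(n_points)]


def langmuir_response(t: float, conc: float, ka: float, kd: float,
                      rmax: float, t_assoc: float) -> float:
    """Response of a one-site binding model at time t (s).

    conc is in M, ka in 1/(M*s), kd in 1/s; the analyte is injected from
    t = 0 to t = t_assoc, after which only dissociation occurs.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # plateau at this concentration
    if t <= t_assoc:
        return r_eq * (1.0 - math.exp(-k_obs * t))
    r_end = r_eq * (1.0 - math.exp(-k_obs * t_assoc))
    return r_end * math.exp(-kd * (t - t_assoc))
```

with assumed constants ka = 1e5 and kd = 1e-3 (so that the equilibrium dissociation constant kd/ka is 10 nM), an injection at 10 nM approaches half of `rmax` at equilibrium, and the signal halves every ln(2)/kd seconds once the injection ends.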
all sensorgrams were recorded at °c and the data analyzed on the wavecontrol (creoptix ag). data were double-referenced by subtracting the signals from blank injections and from the reference channel. a langmuir : model was used for data fitting with the exception of the sb# and sb# binding kinetics for the s- p and the s- p spike, which were fitted with a heterogeneous ligand model as mentioned in the main text. pseudovirus neutralization assays have been previously described [ , , ]. briefly, propagation-defective, spike protein-pseudotyped vesicular stomatitis virus (vsv) was produced by transfecting hek- t cells with sars-cov- sdel (sars- s carrying an aa cytoplasmic tail truncation) as described previously [ ]. the cells were further inoculated with glycoprotein g trans-complemented vsv vector (vsv*g(luc)) encoding enhanced green fluorescent protein (egfp) and firefly luciferase reporter genes but lacking the glycoprotein g gene [ ]. after h incubation at °c, the inoculum was removed and the cells were washed once with medium and subsequently incubated for h in medium containing : of an anti-vsv-g mab i (atcc, crl- tm ). pseudotyped particles were then harvested and cleared by centrifugation. for the sars-cov- pseudotype neutralization experiments, pseudovirus was incubated for min at °c with different dilutions of purified sybodies, sybody fusions or sybody-fc fusions. subsequently, s protein-pseudotyped vsv*g(luc) was added to vero e cells grown in -well plates ( ' cells/well). at h post infection, luminescence (firefly luciferase activity) was measured using the one-glo luciferase assay system (promega) and a cytation cell imaging multi-mode reader (biotek). the serial dilutions of control sera and samples were prepared in quadruplicates in -well cell culture plates using dmem cell culture medium ( µl/well). to each well, µl of dmem containing tissue culture infectious dose % (tcid ) of sars-cov- (sars-cov- /münchen- .
/ / ) were added and incubated for min at °c. subsequently, µl of vero e cell suspension ( , cells/ml in dmem with % fbs) were added to each well and incubated for h at °c. the cells were fixed for h at room temperature with % buffered formalin solution containing % crystal violet (merck, darmstadt, germany). finally, the microtiter plates were rinsed with deionized water and immune serum-mediated protection from cytopathic effect was visually assessed. neutralization dose % (nd ) values were calculated according to the spearman and kärber method. freshly purified s- p was incubated with a . -fold molar excess of sb# alone or with sb# and sb# and subjected to size exclusion chromatography to remove excess sybody. the sample of s- p with sb# was prepared in an analogous way. the protein complexes were concentrated to . - mg ml - using an amicon ultra- . ml concentrating device (merck) with a kda filter cut-off. . μl of the sample was applied onto holey-carbon cryo-em grids (au r . / . , mesh, quantifoil), which had been glow discharged beforehand at - ma for s, blotted for - s and plunge frozen into a liquid ethane/propane mixture with a vitrobot mark iv (thermo fisher) at °c and % humidity. samples were stored in liquid nitrogen until further use. screening of the grid for areas with the best ice properties was done with the help of a home-written script to calculate the ice thickness (manuscript in preparation). cryo-em data in selected grid regions were collected in-house on a -kev talos arctica microscope (thermo fisher scientific) with a post-column energy filter (gatan) in zero-loss mode, with a -ev slit and a μm objective aperture. images were acquired in an automatic manner with serialem on a k summit detector (gatan) in counting mode at × , magnification ( . Å pixel size) and a defocus range from − . to − . μm. during an exposure time of s, frames were recorded with a total exposure of about electrons/Å . on-the-fly data quality was monitored using focus [ ].
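as an aside on the live-virus assay described above, the spearman and kärber calculation of a 50 % neutralization dose can be sketched as follows. this is the textbook untrimmed estimator applied to log10 concentrations and is offered under that assumption only; it is not necessarily the exact variant used for the reported values, and the function name and example numbers are hypothetical.

```python
import math


def spearman_karber_nd50(concentrations, protected_fractions):
    """Estimate the 50 % neutralization dose from a titration series.

    concentrations: binder concentrations (any consistent unit, all > 0).
    protected_fractions: fraction of replicate wells protected at each
    concentration; the series must span 0 (none) to 1 (all protected).
    """
    if len(concentrations) != len(protected_fractions) or len(concentrations) < 2:
        raise ValueError("need two equally long series with at least two points")
    pairs = sorted(zip(concentrations, protected_fractions))
    log_doses = [math.log10(c) for c, _ in pairs]
    fractions = [p for _, p in pairs]
    if fractions[0] != 0.0 or fractions[-1] != 1.0:
        raise ValueError("series must span 0 % to 100 % protection")
    # Weight each interval midpoint by the increase in protection across it.
    mean_log = 0.0
    for i in range(len(pairs) - 1):
        midpoint = (log_doses[i] + log_doses[i + 1]) / 2.0
        mean_log += (fractions[i + 1] - fractions[i]) * midpoint
    return 10.0 ** mean_log
```

with quadruplicate wells, the protected fractions take the values 0, 0.25, 0.5, 0.75 or 1. for a hypothetical ten-fold series at 0.01, 0.1, 1 and 10 (arbitrary units) with fractions 0, 0.25, 0.75 and 1, the estimate is 10^-0.5, about 0.32 in the same units.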
for the s- p/sb# /sb# complex dataset, in total , micrographs were recorded. beam-induced motion was corrected with motioncor _ . . [ ] and the ctf parameters estimated with ctffind . . [ ]. recorded micrographs were manually checked in focus ( . . ), and micrographs which were out of defocus range (< . and > μm), contaminated with ice or aggregates, or had a low-resolution estimation of the ctf fit (> Å) were discarded. , particles were picked from the remaining , micrographs by cryolo . . [ ], and imported in cryosparc v . . [ ] for d classification with a box size of pixels. after d classification, , particles were imported into relion- . . [ ] and subjected to a d classification without imposed symmetry, where an ab-initio generated map from cryosparc low-pass filtered to Å was used as reference. two classes resembling spike protein revealed two distinct conformations. one class shows a symmetrical state with all rbds in an up conformation ( up) and both sybodies bound to each rbd ( , particles, %). in the asymmetrical class ( , particles, %) the rbds adopt one up, one up-out and one down conformation ( up/ up-out/ down), where both sybodies are bound to the rbds in the up and up-out states, while only sb# is bound to the down rbd. the up class was further refined with c symmetry imposed. the final refinement, where a mask was included in the last iteration, provided a map at . Å resolution. six rounds of per-particle ctf refinement with beamtilt estimation and re-extraction of particles with a box size of pixels improved resolution further to . Å. the particles were then imported into cryosparc, where non-uniform refinement improved the resolution to Å. the asymmetrical up/ up-out/ down class was refined in an analogous manner with no symmetry imposed, resulting in a map at . Å resolution. six rounds of per-particle ctf refinement with beamtilt estimation improved resolution to . Å. a final round of non-uniform refinement in cryosparc yielded a map at . Å resolution.
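resolution values of this kind are conventionally read off a fourier shell correlation (fsc) curve between two half-maps, as the point where the curve first drops below a fixed cut-off. a minimal sketch of that read-out is shown here. it assumes the curve has already been computed by the refinement software; the function name, the linear interpolation between shells and the example numbers are illustrative assumptions, and the cut-off is left as a caller-supplied parameter.

```python
def resolution_at_threshold(freqs, fsc, threshold):
    """Resolution (Å) at which an FSC curve first falls below threshold.

    freqs: spatial frequencies of the shells in 1/Å, ascending and positive.
    fsc: correlation value for each shell.
    Returns None if the curve never crosses the threshold.
    """
    if len(freqs) != len(fsc):
        raise ValueError("freqs and fsc must have the same length")
    if any(f <= 0 for f in freqs):
        raise ValueError("spatial frequencies must be positive")
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two shells around the crossing.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None
```

for a made-up curve with shells at 0.1, 0.2, 0.3 and 0.4 per Å, correlations 1.0, 0.9, 0.3 and 0.05, and a cut-off of 0.6, the crossing lies midway between the second and third shell, giving 4 Å.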
local resolution estimations were determined in cryosparc. all resolutions were estimated using the . cut-off criterion [ ] with gold-standard fourier shell correlation (fsc) between two independently refined half-maps [ ] . the directional resolution anisotropy of density maps was quantitatively evaluated using the dfsc web interface (https:// dfsc.salk.edu) [ ] . a similar approach was performed for the image processing of the s- p/sb# complex. in short, , micrographs were recorded, and , used for image processing after selection. , particles were autopicked via cryolo and subjected to d classification in cryosparc. , selected particles were used for subsequent d classification in relion- . . , where the symmetrical up map, described above, was used as initial reference. the best class comprising , particles ( %) represented an asymmetrical up/ up-out/ down conformation with sb# bound to each rbd. several rounds of refinement and ctf refinement yielded a map of . Å resolution. for the dataset of the s- p/sb# complex, in total , images were recorded, with , used for further image processing. , particles were autopicked via cryolo and subjected to d classification in cryosparc. , selected particles were imported into relion- . . and used for subsequent d classification, where the symmetrical up map, described above, was used as initial reference. two distinct classes of spike protein were found. one class ( , particles, %) revealed a state in which two rbds adopt an up conformation with sb# bound, whereby the density for the third rbd was poorly resolved representing an undefined state. several rounds of refinement and ctf refinement yielded a map of . Å resolution. two other classes, comprising , particles ( %) and , particles ( %), were identical. they show a up/ down configuration without sb# bound to any of the rbds. both classes were processed separately, whereby the class with over k particles yielded the best resolution of . 
Å and was used for further interpretation. a final non-uniform refinement in cryosparc further improved resolution down to . Å.
[table residue: cryo-em data collection parameters, including defocus range (μm) and pixel size (Å) for five datasets; values not recoverable]
the plasmids encoding the six highest affinity binders are available through addgene (addgene # -# ). purified sb-fc constructs can be commercially obtained from absolute antibody. the three-dimensional cryo-em density maps are available for the reviewers upon request. all cryo-em data will be deposited in the electron microscopy data bank and include the cryo-em maps, both half-maps, the unmasked and unsharpened refined maps and the mask used for final fsc calculation. raw cryo-em data will be deposited in the electron microscopy public image archive (empiar).
inhibition of the rbd-ace interaction by sybodies. (a) elisa inhibition screen. individual purified sybodies ( nm, sybody number shown on x-axis) were incubated with biotinylated rbd-vyfp ( nm) and the mixtures were exposed to immobilized ace . bound rbd-vyfp was detected with streptavidin-peroxidase/tmb. each column indicates background-subtracted absorbance at nm, normalized to the signal corresponding to rbd-vyfp in the absence of sybody (dashed red line). (b) competition of sybodies and ace for spike binding investigated by gci. s- p was immobilized on the gci chip and sb# ( nm), sb# ( nm) and non-randomized control sybody sb# ( nm) were injected alone or premixed with ace ( nm).
neutralization of viral entry using pseudotyped vsvs. (a) relative infectivity in response to increasing sybody concentrations was determined. the black curve shows data when a mixture of sb# and sb# was added. (b) same assay as in (a) with sybodies fused to human fc to generate bivalency. error bars represent standard deviations of three biological replicates.
sybody selection strategy against sars-cov- rbds.
a total of six independent selection reactions were carried out, including a target swap between ribosome display and phage display rounds. enriched sybodies of phage display round of all three libraries were expressed and purified as a pool and used to perform an off-rate selection in phage display round . for each of the six independent selection reactions, clones were picked at random and analyzed by elisa. microtiter plate wells were coated with individual sybodies, incubated with biotinylated constructs (receptor-binding domain, rbd; spike ectodomain, ecd; pre-fusion spike, s- p; maltose-binding protein, mbp), and then detected with streptavidin-peroxidase/tmb. a non-randomized sybody was used as negative control (wells h and h , respectively). sybodies that were sequenced are marked with the respective sybody name (sb_# - ). please note that identical sybodies that were found - times are marked with the same sybody name (e.g. sb_# ). loop .
phylogenetic tree of rbd sybodies. a radial tree was generated in clc . . .
figure s kinetic characterization of sybodies by gci. (a) rbd-vyfp and ecd were immobilized as indicated and the six top sybodies were injected at increasing concentrations ranging from . nm to μm. data were fitted using a langmuir : model. (b) in-depth affinity characterization of sb# and sb# . rbd-vyfp and s- p were immobilized as indicated and sb# and sb# were injected at concentrations ranging from . nm to nm for sb# and . nm to nm for sb# . for rbd, data were fitted using a langmuir : model. for s- p, the data were fitted with the heterogeneous ligand model, because the : model was clearly not appropriate to describe the experimental data. corresponding data for s- p is shown in main fig. c .
simultaneous binding of sb# and sb# . competition elisa experiment in which sb# was coated on the elisa plate and rbd binding was assessed in the absence or presence of tag-less sybodies as indicated on the x-axis.
to determine the background signal, buffer devoid of protein was added.
we thank rony nehmé and andré heuer (creoptix ag, wädeswil, switzerland) for the acquisition, fitting and interpretation of a first set of gci measurements using the wavesystem. we thank florence projer, david hacker and kelvin lau (protein production and structure core facility, epfl, switzerland) for the production of the pre-fusion spike protein. we are grateful to jason mclellan (the university of texas at austin, u.s.) for having provided the pre-fusion-stabilized soluble spike expression vectors for s- p and s- p. we thank michael fiebig (absolute antibody) for providing us with purified sb-fc. we thank raimund dutzler and marta sawicka (university of zurich) for freezing cryo-em grids. michiel punter (university of groningen) is acknowledged for it help.
key: cord- -ll frwc authors: sun, shihui; gu, hongjing; cao, lei; chen, qi; yang, guan; li, rui-ting; fan, hang; ye, qing; deng, yong-qiang; song, xiaopeng; qi, yini; li, min; lan, jun; feng, rui; guo, yan; qin, si; wang, lei; zhang, yi-fei; zhou, chao; zhao, lingna; chen, yuehong; shen, meng; cui, yujun; yang, xiao; wang, xinquan; wang, hui; wang, xiangxi; qin, cheng-feng title: characterization and structural basis of a lethal mouse-adapted sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ll frwc the ongoing sars-cov- pandemic has brought an urgent need for animal models to study the pathogenicity of the virus. herein, we generated and characterized a novel mouse-adapted sars-cov- strain named mascp that causes acute respiratory symptoms and mortality in standard laboratory mice. notably, this model exhibits an age- and gender-related skewed distribution of mortality akin to severe covid- , and the % lethal dose (ld ) of mascp was ∼ pfu in aged, male balb/c mice. deep sequencing identified three amino acid mutations, n y, q h, and k n, that emerged successively in the receptor binding domain (rbd) of mascp and significantly enhanced the binding affinity to its endogenous receptor, mouse ace (mace ). cryo-electron microscopy (cryo-em) analysis of mace in complex with the rbd of mascp at . -angstrom resolution elucidates the molecular basis for the receptor-binding switch driven by amino acid substitutions. our study not only provides a robust platform for studying the pathogenesis of severe covid- and for rapid evaluation of countermeasures against sars-cov- , but also unveils the molecular mechanism for the rapid adaptation and evolution of sars-cov- in mice. one sentence summary a mouse-adapted sars-cov- strain that harbored three amino acid substitutions in the rbd of the s protein showed % mortality in aged, male balb/c mice.
coronavirus disease (covid- ), caused by severe acute respiratory syndrome coronavirus (sars-cov- ), has resulted in a public health crisis ( ). the symptoms of covid- are similar to those of sars-cov and mers-cov infections, ranging from fever, fatigue, dry cough, dyspnea and mild pneumonia to acute lung injury (ali) and acute respiratory distress syndrome (ards) in severe cases. in fatal cases, multi-organ failure accompanied by a dysregulated immune response has been observed ( ) ( ) ( ). numerous studies have highlighted age- and gender-related discrepancies in the distribution of covid- cases: the elderly and men tend to have a higher case-fatality ratio than the young and women, suggesting that aged men are more likely to succumb to covid- ( , ). sars-cov- belongs to the coronaviridae family and is an enveloped, single-stranded, positive-sense rna virus. human angiotensin-converting enzyme (ace ) has been demonstrated to be the functional receptor for sars-cov- ( , ). sars-cov- cannot infect wild-type laboratory mice due to inefficient interactions between its s protein and the mouse ace receptor (mace ) ( ). therefore, several hace -expressing mouse models, such as hace transgenic mice ( ), aav-hace transduced mice ( ) and ad -hace transduced mice ( ), have been developed. furthermore, mouse-adapted strains of sars-cov- have also been developed via either in vivo passaging or reverse genetics ( ) ( ) ( ). however, all these models cause only mild to moderate lung damage in mice. a small animal model capable of recapitulating the severe respiratory symptoms and high case-fatality ratio of covid- remains to be established. in this study, we have developed and characterized a new lethal strain of sars-cov- named mascp by using the in vivo passaging method. previously, we reported the development of mascp , a mouse-adapted sars-cov- strain, using a similar strategy ( ). remarkably, intranasal injection of mascp caused % fatality in aged balb/c mice.
prior to death, all infected animals developed severe malfunctions of the respiratory system, including acute respiratory distress syndrome.
in our previous study, we generated a mouse-adapted strain of sars-cov- (mascp ) by serial passage of sars-cov- in the lungs of aged balb/c mice; this strain caused moderate lung damage and no fatality in mice. herein, we further serially passaged the virus for additional times to generate a more virulent sars-cov- , and the resulting virus at passage (named mascp ) was used for stock preparation and titration. to characterize the pathogenicity of mascp , groups of balb/c mice were subjected to intranasal injection of varying doses of mascp . strikingly, survival curve analysis showed that high doses of mascp caused almost % mortality within ten days in all aged mice, while young mice were resistant to mascp challenge and no animals in this group developed disease or died (fig. a-d). aged mice challenged with a high dose of mascp developed typical respiratory symptoms and exhibited features such as ruffled fur, hunched back, and reduced activity. of particular note, tachypnea was common in all moribund animals (supplemental video). among the aged mice, males were more susceptible to mascp than females, and the ld was calculated to be pfu and pfu, respectively (fig. a, b). additionally, this gender-dependent mortality was also recorded in aged c bl/ mice challenged with mascp (fig. s ), where % of the males challenged with the virus died from respiratory symptoms. thus, serial passaging of mascp generated a more virulent mascp , which was sufficient to cause % mortality in aged balb/c mice at a low dose (~ pfu). we further characterized the in vivo replication dynamics of mascp in both young and aged mice, and the results from qrt-pcr showed that high levels of sars-cov- subgenomic rnas persisted in the lung and tracheas until day post infection (dpi) in aged mice (fig. e).
marginal viral rnas were also detected in the intestine, heart, liver, spleen, brain, and kidney. the young mice had a similar tissue distribution to the aged ones upon mascp challenge, and the lung and tracheas represented the major tissues supporting viral replication (fig. f ). multiplex immunostaining showed that mascp predominantly targeted the airway cc + club cells and spc+ at cells, while foxj + ciliated cells and pdpn+ alveolar type (at ) cells showed few positive signals (fig. g ). more sars-cov- -infected cells were detected in the lungs of aged mice than in those of young animals at dpi (fig. h), in agreement with the high ratio of spc+ at cells that co-express ace in aged mice (fig. s ). consequently, a striking loss of spc+ at cells, accompanied by apoptosis, was observed in the lungs of aged mice (fig. s ). collectively, aged male mice, which developed severe respiratory symptoms and % fatality, serve as the most suitable animal host for mascp and were chosen for subsequent analysis. to further characterize the pathological outcome in mascp -infected aged male mice, the lung as well as other organs were collected at dpi and subjected to histopathological and immunostaining analysis. on gross examination, lung tissue samples from mascp -infected animals showed visible lung injury characterized by a bilateral cardinal red appearance. furthermore, abundant sticky secretion was seen on the lung surfaces compared with the control animals (fig. a ). according to the metrics of acute lung injury (ali) laid out by the american thoracic society (ats), mascp infection induced necrotising pneumonia, extensive diffuse alveolar damage (dad) and even ards on day ( ).
microscopic observation showed large quantities of desquamative epithelial cells in bronchiole tubes (yellow arrow), a large area of necrotic alveolar epithelial cells, fused alveolar walls with infiltration of inflammatory cells, especially neutrophils, in the alveolar septae or alveolar airspace, serious edema around vessels (cyan arrow) and scattered hemorrhage (blue arrow) (fig. a ). in addition, foamy cells, polykaryocytes, fibrin cluster deposition, and hyaline membrane formation were common in the mascp -infected animals (fig. b), indicative of acute respiratory distress syndrome (ards), which is well characterized in severe covid- patients ( ). interestingly, typical viral inclusion bodies were also observed occasionally in the lungs of infected mice (fig. b). by comparison, lung damage in young mice infected with mascp was much milder than in aged mice; less visible lung damage was seen at gross necropsy, and only thickened alveolar septa and activated inflammatory cell infiltration were observed ( , ). the pathological damage caused by mascp in mice recapitulated most of the spectrum seen in seriously ill covid- patients. additionally, immunostaining of lung sections showed a significant infiltration of cd + macrophages and ly- g + neutrophils in mascp -infected mice (fig. s ). interestingly, more cd + macrophages and ly- g + neutrophils were detected in young mice than in aged mice at dpi, and this pattern reversed at dpi, which indicated a rapid and short-lived immune response that limits viral replication in young mice. to test the utility of the mascp -infected mouse model for evaluation of antiviral candidates, h , a known human monoclonal antibody targeting the rbd of sars-cov- ( ), was examined for its ability to confer benefit during the infection. to deduce the genetic basis for the lethal phenotype of mascp , deep sequencing was performed to identify the mutations that emerged during the in vivo passaging history. and fig. s b ).
to clarify the potential role of these mutations, the rbds of these different adaptive strains were expressed to assay their binding affinities to mace (fig. s ). as expected, the wt rbd presented no detectable binding, but rbds from mouse-adapted strains (rbdmascp , rbdmascp and rbdmascp ) gained gradually enhanced binding to mace , with affinities ranging from ~ μm to μm (fig. b ). the increased affinity between mace and the rbd of mouse-adapted strains probably contributes to the enhanced virulence in mice. to further elucidate the molecular basis for the gradual change in specificity of mascp , structural investigations of mace in complex with rbdmascp or rbdmascp were carried out. two non-competing fab fragments that recognize the rbd beyond the mace binding sites were used to increase the molecular weight of this complex in pursuit of an atomic-resolution cryo-em reconstruction (fig. s -s ). interestingly, cryo-em characterization of mace in complex with rbdmascp revealed that the complex adopts three distinct conformational states, corresponding to tight binding (state ), loose binding (state ) and no binding (state ) modes (fig. s ), indicative of rapid association and rapid dissociation between mace and rbdmascp . however, only the tight binding conformation was observed in the mace -rbdmascp complex structure, reflecting a more stable/mature binding mode of rbdmascp to mace , akin to that of rbdwt and hace . we determined asymmetric cryo-em reconstructions of the mace -rbdmascp complex at . Å and three states of the mace -rbdmascp complex at . to . Å (figs. s -s and table s ). the map around the mace -rbdmascp interface was of sufficient quality for a reliable analysis of the interactions (fig. c and fig. s ). the overall structure of the mace -rbdmascp complex resembles that of the rbdwt-hace complex with a root mean square deviation of . Å (fig. c ).
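the root mean square deviation quoted above summarizes how far paired atoms lie from each other in two structures. the following is a minimal sketch of that calculation, assuming the two structures have already been superimposed and their atoms paired one-to-one; the function name and the toy coordinates are hypothetical and do not reproduce the authors' procedure or data.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input, e.g. angstrom).

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples for
    paired atoms. Assumption: the structures are already superimposed;
    no fitting or alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example (hypothetical coordinates): every atom is shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(reference, shifted)
```

in practice the value depends on which atoms are paired (for example cα atoms only) and on how the superposition was done, so both choices should be stated alongside any reported rmsd.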
the rbdmascp recognizes the helices (α and α ) located at the apical region of mace via its receptor binding motif (rbm) (fig c- e ). the interaction area on mace can be primarily divided into three patches (pi, pii and piii), involving extensive hydrophilic and hydrophobic interactions with three regions separately clustered around three adaptation-mediated mutated residues (k n, corresponding to clus ; q h, corresponding to clus ; and n y, corresponding to clus ) in the rbm (fig c- e ). coincidentally, a number of amino acid substitutions in the rbm, such as q k, q y and p t, identified in other reported mouse-adapted sars-cov- isolates ( , , ) fall within either clus or clus , underlining the putative determinants for cross-transmission (fig f- g ). an extra clus further accumulated in mascp to gain utmost binding activity and infection efficacy (fig f- g ). the extensive hydrophobic interactions in clus constructed by y (or y or h in other mouse-adapted sars-cov- isolates) and y in the rbdmascp and y and h in mace ; the hydrogen bonds in clus formed by h (k in another mouse-adapted strain) in the rbdmascp and n and e in mace ; and the hydrophilic contacts constituted by n in the rbdmascp and n and q in mace all contribute to the tight binding of mascp to mace . in contrast, structural superimposition of the rbdwt over the mace -rbdmascp complex reveals the loss of these interactions, leading to the inability of the rbdwt to bind mace (fig g). these analyses pinpoint key structure-infectivity correlates, unveiling the molecular basis for adaptation-mediated evolution and cross-transmission of sars-cov- . clinically, severe covid- may result in death due to massive alveolar damage and progressive respiratory failure ( ) ( ) ( ).
distinct from all currently reported animal models, which mimic the mild to moderate clinical symptoms of covid- , the mascp -infected mouse model manifests many of the severe clinical syndromes associated with covid- , such as pulmonary oedema, fibrin plugs in alveoli, hyaline membrane, and scattered hemorrhage ( , ). ace , the functional receptor of sars-cov and sars-cov- , is highly expressed on vascular endothelial cells and smooth muscle cells in multiple organs, which probably leads to the observed viral tropism contributing to cellular injury. in this model, massive injury of type ii pneumocytes (at cells) was observed in aged mice, closely correlated with the higher ratio of ace -positive cells among type ii pneumocytes in aged mice compared with young mice. therefore, the age-related ace expression pattern in lungs might contribute to the severe phenotype observed in aged mice. although sars-cov- viral antigen has been detected in the kidney of postmortem specimens ( ), no viral antigen or viral rna was detected in our model. thus, in this mascp -infected mouse model, the kidney injury may arise from secondary endothelial injury leading to proteinuria. in addition, although sars-cov- has also been suggested to have neurotropic potential in covid- ( ), we did not find typical characteristics of viral encephalitis in this model. importantly, the imbalanced immune response with high levels of proinflammatory cytokines, increased neutrophils and decreased lymphocytes, which is in line with sars and mers infections ( ) and plays a major role in the pathogenesis of covid- ( ), was also observed in this model. the skewed age distribution of covid- was reproduced in the mascp -infected mouse model, where more severe symptoms were observed in aged mice than in young mice.
different from the h n pandemic ( ), covid- appears to have a mild effect on populations under years, and the elderly are more likely to progress to severe disease and to be admitted to the intensive care unit (icu) worldwide ( ). ace , the functional receptor of sars-cov- , is increasingly expressed in the lungs with age, which might explain the higher disease severity observed in older patients with covid- ( ). more importantly, the host immune response may determine the outcome of the disease. our immune system is composed of innate immunity and adaptive immunity. innate immunity constitutes the first line of defense against pathogens and is acute as well as short lived. however, aging is linked with insufficient, prolonged and chronic activation of innate immunity associated with low-grade and systemic increases in inflammation (inflamm-aging), which can be detrimental for the body ( ). the delicate co-operation and balance are interrupted by the chronic activation of innate immunity and the decline of adaptive immune responses with increasing age in covid- ( ). in the mascp -infected mouse model, the young mice presented an acute inflammatory response with more innate immune cell infiltration on day , while the aged mice showed a lagged and sustained immune response. this difference in immune response may be vital in limiting virus replication at early times and may contribute to the different outcomes on day in young and aged mice. in addition to the age-related skewed distribution of covid- , gender-related differences in the distribution of covid- are also recapitulated in this mascp -infected mouse model, with increased susceptibility and enhanced pathogenicity observed in male mice compared with their female counterparts. biological sex is an important determinant of covid- disease severity ( ). in china, the death rate among confirmed cases is . % for women and . % for men ( ).
in italy, half of the confirmed covid- cases are men, who account for % of all deaths ( ). this pattern is generally consistent around the world. the skewed distribution of covid- suggests that physiological differences between males and females may cause a differential response to infection. one hypothesis is that females display reduced susceptibility to viral infections because they mount stronger immune responses than males ( ). it has been reported that androgens may lower and estrogens may enhance several aspects of host immunity. in addition, androgens facilitate and estrogens suppress lymphocyte apoptosis. furthermore, genes on the x chromosome are important for regulating immune functions, and androgens may suppress the expression of disease resistance genes such as the immunoglobulin superfamily ( ). in the mascp -infected mouse model, we found higher mortality in males than in females infected with the same dose of virus, indicating the successful recapitulation of covid- and the model's potential application in the study of the pathogenesis of the disease. studies of sars-cov ma , which has increased virulence in mice, indicate that multiple gene products may contribute independently to virulence. unlike in sars-cov ma , the three successively emerged mutations (n y, k n, and q h) in mascp are located in the rbd, breaking a barrier for cross-species transmission of sars-cov- and enabling gradually adapted recognition of mace by sars-cov- during the in vivo passages in mice. the serially increased affinities between mace and the rbd of mouse-adapted strains confer enhanced infection in mice. cryo-em structures of mace in complex with rbdmascp and rbdmascp define precisely the atomic determinants of the receptor-binding switch, providing novel insights into adaptation-mediated evolution and cross-transmission of sars-cov- . in addition, there are also amino acid substitutions outside the s protein of mascp (fig. s ).
at present, we cannot rule out the contribution of these mutations, and further validation with reverse genetic tools will help understand the biological function of each single mutation ( ).
figs. s to s
center, beijing institute of microbiology and epidemiology (approval number: iacuc-dwzx- - ).
the mouse-adapted strain of sars-cov- (mascp ) was developed in our previous study ( ). additional serial passage of times was performed as previously described ( ).
multiplex immunofluorescent assay. the multiplex immunofluorescence assay was conducted as previously described ( ). briefly, the retrieved sections were incubated with primary antibody for h followed by detection using the hrp-conjugated secondary antibody and tsa-dendron-fluorophores (neon -color allround discovery kit for ffpe, histova biotechnology, nefp ). afterwards, the primary and secondary antibodies were thoroughly eliminated by heating the slides in retrieval/elution buffer (abcracker®, histova biotechnology, abcfr l) for sec at °c.
lung homogenates from mascp -infected mice or mock-treated mice were processed as previously described ( ) and subjected to rna-seq. total rna from lung tissue was extracted using trizol (invitrogen, usa) and treated with dnase i (neb, usa). sequencing libraries were generated using the nebnext® ultra™ rna library prep kit for illumina® (neb, usa) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. the clustering of the index-coded samples was performed on a cbot cluster generation system using hiseq pe cluster kit v -cbot-hs (illumina) according to the manufacturer's instructions. after cluster generation, the libraries were sequenced on an illumina novaseq platform and bp paired-end reads were generated. after sequencing, a perl script was used to filter the original data (raw data) into clean reads by removing adapter-contaminated reads and low-quality reads.
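the read-filtering step described above can be illustrated with a short sketch. the authors used a perl script whose details are not given, so the python function below is only an assumed stand-in: the adapter sequence, the phred+33 quality encoding and the mean-quality cutoff are illustrative choices, not values from this study, and all names are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records, adapter, min_mean_quality):
    """Keep reads that contain no adapter sequence and pass a quality cutoff.

    records: iterable of (name, sequence, quality) tuples.
    adapter: adapter sequence to screen for (illustrative, hypothetical input).
    min_mean_quality: minimum mean Phred score (illustrative cutoff).
    """
    kept = []
    for name, sequence, quality in records:
        if not sequence or len(sequence) != len(quality):
            continue  # skip malformed records
        if adapter and adapter in sequence:
            continue  # adapter-contaminated read
        if mean_phred(quality) < min_mean_quality:
            continue  # low-quality read
        kept.append((name, sequence, quality))
    return kept


# Toy records: 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
example_reads = [
    ("read1", "ACGTACGT", "IIIIIIII"),
    ("read2", "ACGTAGATCGGAAGAGC", "I" * 17),  # contains the example adapter
    ("read3", "ACGTACGT", "!!!!!!!!"),          # very low quality
]
clean_reads = filter_reads(example_reads, adapter="AGATCGGAAGAGC", min_mean_quality=20)
```

dedicated trimming tools usually clip adapters and low-quality tails rather than discarding whole reads; discarding is used here only because the text speaks of removing reads.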
clean reads were aligned to the mouse genome (mus_musculus grcm . ) using hisat v . . . the number of reads mapped to each gene in each sample was counted by htseq v . . and tpm (transcripts per kilobase of exon model per million mapped reads) was then calculated to estimate the expression level of genes in each sample. deseq v . . was used for differential gene expression analysis. genes with padj≤ . and |log fc| > were identified as differentially expressed genes (degs). degs were used as query to search for enriched biological processes (gene ontology bp) using metascape. heatmaps of gene expression levels were constructed using the pheatmap package in r (https://cran.rstudio.com/web/packages/pheatmap/index.html). dot plots and volcano plots were constructed using the ggplot (https://ggplot .tidyverse.org/) package in r.
production of fab fragment. the b and d fab fragments ( ) were generated using a pierce fab preparation kit (thermo scientific). briefly, the antibody was mixed with immobilized papain and then digested at ˚c for - h. the fab was separated from the fc fragment and undigested iggs by a protein a affinity column and then concentrated for analysis.
surface plasmon resonance. mace was immobilized onto a cm sensor chip surface using the nhs/edc method to a level of ~ response units (rus) using biacore® (ge healthcare) and pbs as running buffer (supplemented with . % tween- ). purified and diluted wtrbd, rbdmascp and rbdmascp were injected at concentrations from high to low. the binding responses were measured, and chip surfaces were regenerated with mm glycine, ph . (ge healthcare). the apparent binding affinity (kd) for individual antibody was calculated using biacore® evaluation software (ge healthcare). for the competitive binding assays, the first sample flowed over the chip at a rate of μl/min for s, after which the second sample was injected at the same rate for another s.
all antibodies were evaluated at a saturation concentration of nm, while mace was applied at nm concentration. all chip surfaces were regenerated with mm glycine, ph . (ge healthcare). the response units were recorded at room temperature and analyzed using the same software as mentioned above. for cryo-em sample preparation, the quaternary complex (rbdmascp /rbdmascp -fabb -fabd -mace ) was diluted to . mg/ml. holey-carbon gold grids (quantifoil r . / . mesh ) were freshly glow-discharged with a solarus plasma cleaner (gatan) for s. a μl aliquot of the complex mixture was transferred onto the grids, blotted with filter paper at ℃ and % humidity, and plunged into ethane using a vitrobot mark iv (fei). for the rbdmascp /rbdmascp -fabb -fabd -mace complex, micrographs were collected at kv using a titan krios microscope (thermo fisher) equipped with a k detector (gatan, pleasanton, ca), using serialem automated data collection software ( ). movies ( frames, each . s, total dose e − Å − ) were recorded at a final pixel size of . Å with a defocus of between - . and - . μm. image processing. for the rbdmascp -fabb -fabd -mace complex, a total of , micrographs were recorded. for the rbdmascp -fabb -fabd -mace complex, a total of , micrographs were recorded. both data sets were processed in the same way. first, the raw movies were processed with motioncor , which aligned and averaged the frames into motion-corrected summed images. then, the defocus value for each micrograph was determined using gctf. next, particles were picked and extracted for two-dimensional alignment. a subset of well-defined particles was selected for initial model reconstruction in relion. the initial model was used as a reference for three-dimensional classification. after refinement and postprocessing, the overall resolution of the rbdmascp -fabb -fabd -mace complex reached . Å, on the basis of the gold-standard fourier shell correlation (threshold = . ) ( ) . 
for rbdmascp -fabb -fabd -mace complex, the class i complex the resolution achieved was . Å, classii complex had a resolution of . Å, while class iii complex was reconstructed to a resolution of . Å. the quality of the local resolution was evaluated by resmap ( ) . model building and refinement. the wtrbd-hace (pdb id: m j) structures were manually docked into the refined maps of rbdmascp -fabb -fabd -mace complex using ucsf chimera ( ) and further corrected manually by real-space refinement in coot ( ) . the atomic models were further refined by positional and b-factor refinement in real space using phenix ( ) . validation of the final model was performed with molprobity ( ) . the data sets and refinement statistics are shown in table s . statistical analyses were carried out using prism software (graphpad). all data are presented as means ± standard error of the means (sem). statistical significance among different groups was calculated using the student's t test, fisher's exact test, two-way anova or mann-whitney test *, **, and *** indicate p < . , p < . , and p < . , respectively. dying with sars-cov- infection-an autopsy study of the first consecutive autopsy in suspected covid- cases the pathological autopsy of coronavirus disease (covid- ) in china: a review. pathogens and disease considering how biological sex impacts immune responses and covid- outcomes clinical features of covid- in elderly patients: a comparison with young and middle-aged patients structure of the sars-cov- spike receptor-binding domain bound to the ace receptor characteristics of sars-cov- and covid- comparative analysis reveals the species-specific genetic determinants of ace required for sars-cov- entry the pathogenicity of sars-cov- in hace transgenic mice mouse model of sars-cov- reveals inflammatory role of type i interferon signaling. 
biorxiv a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies a mouse-adapted model of sars-cov- to test covid- countermeasures adaptation of sars-cov- in balb/c mice for testing vaccine efficacy a mouse-adapted sars-cov- induces acute lung injury and mortality in standard laboratory mice adaptation of sars-cov- in balb/c mice for testing vaccine efficacy an official american thoracic society workshop report: features and measurements of experimental acute lung injury in animals covid- makes b cells forget, but t cells remember the novel severe acute respiratory syndrome coronavirus (sars-cov- ) directly decimates human spleens and lymph nodes. medrxiv human kidney is a target for novel severe acute respiratory syndrome coronavirus (sars-cov- ) infection. medrxiv structural basis for neutralization of sars-cov- and sars-cov by a potent therapeutic antibody mouse-adapted sars-cov- replicates efficiently in the upper and lower respiratory tract of balb/c and c bl/ j mice a mouse-adapted model of sars-cov- to test covid- countermeasures histopathology and ultrastructural findings of fatal covid- infections in washington state: a case series pathological study of the novel coronavirus disease (covid- ) through postmortem core biopsies. modern pathology : an official journal of the united states and canadian academy of pathology clinical features of covid- and factors associated with severe clinical course: a systematic review and meta-analysis. ssrn pathological findings of covid- associated with acute respiratory distress syndrome. 
the lancet identification of a potential mechanism of acute kidney injury during the covid- outbreak: a study based on singlecell transcriptome analysis neurological associations of covid- lung pathology of fatal severe acute respiratory syndrome a new coronavirus associated with human respiratory disease in china pandemic preparedness and response--lessons from the h n influenza of report of the who-china joint mission on coronavirus disease (covid- ) sars-cov- : virus dynamics and host response chronic inflammation in the etiology of disease across the life span disparities in age-specific morbidity and mortality from sars-cov- in china and the republic of korea characteristics of covid- patients dying in italy report based on available data on the lethal sex gap: covid- sex differences in immune responses spike mutation d g alters sars-cov- fitness acknowledgements: this work was in memory of prof this work was supported by the national key plan for scientific research and development of china quantifying the local resolution of cryo-em density maps visualization system for exploratory research and analysis tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions towards automated crystallographic structure refinement with phenix.refine molprobity: all-atom structure validation for macromolecular crystallography relion: implementation of a bayesian approach to cryo-em structure determination automated electron microscope tomography using robust prediction of specimen movements chadox ncov- vaccine prevents sars-cov- pneumonia in rhesus macaques data and materials availability: all requests for resources and reagents should be directed to c.-f.q. (qincf@bmi.ac.cn and qinlab @ .com) and will be fulfilled after completion of a materials transfer agreement. 
key: cord- -c rd a authors: popa, alexandra; genger, jakob-wendelin; nicholson, michael; penz, thomas; schmid, daniela; aberle, stephan w.; agerer, benedikt; lercher, alexander; endler, lukas; colaço, henrique; smyth, mark; schuster, michael; grau, miguel; martinez, francisco; pich, oriol; borena, wegene; pawelka, erich; keszei, zsofia; senekowitsch, martin; laine, jan; aberle, judith h.; redlberger-fritz, monika; karolyi, mario; zoufaly, alexander; maritschnik, sabine; borkovec, martin; hufnagl, peter; nairz, manfred; weiss, günter; wolfinger, michael t.; von laer, dorothee; superti-furga, giulio; lopez-bigas, nuria; puchhammer-stöckl, elisabeth; allerberger, franz; michor, franziska; bock, christoph; bergthaler, andreas title: mutational dynamics and transmission properties of sars-cov- superspreading events in austria date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: c rd a superspreading events shape the covid- pandemic. here we provide a national-scale analysis of sars-cov- outbreaks in austria, a country that played a major role for virus transmission across europe and beyond. capitalizing on a national epidemiological surveillance system, we performed deep whole-genome sequencing of virus isolates from samples to cover major austrian sars-cov- clusters. our data chart a map of early viral spreading in europe, including the path from low-frequency mutations to fixation. detailed epidemiological surveys enabled us to calculate the effective sars-cov- population bottlenecks during transmission and unveil time-resolved intra-patient viral quasispecies dynamics. this study demonstrates the power of integrating deep viral genome sequencing and epidemiological data to better understand how sars-cov- spreads through populations. graphical abstract the sars-cov- pandemic has already infected more than million people in countries, causing , global deaths as of july th and extraordinary disruptions to daily life and economies ( , ). 
the international research community rapidly started to establish diagnostic tools, assess immunological and pathological responses and define risk factors for covid- ( ) ( ) ( ) ( ) . clustered outbreaks and superspreading events of sars-cov- pose a particular challenge to pandemic control ( ) ( ) ( ) ( ) . however, we still know comparatively little about the fundamental underlying properties of sars-cov- genome evolution and transmission dynamics within the human population. during its sweep across the globe, the . kb-long sars-cov- genome has accumulated mutations at a rate - fold lower than those for the sars, mers and influenza a viruses ( ) . acquired fixed mutations enable phylogenetic analyses and have already led to important insights into the origins and routes of sars-cov- spread ( ) ( ) ( ) ( ) . conversely, low frequency mutations and their changes over time within individual patients offer deep insights into the dynamics of intra-host evolution. the resulting viral quasispecies represent diverse groups of variants with different frequencies that form structured populations and maintain high genetic diversity, contributing to fundamental properties of infection and pathogenesis ( , ) . with a population of . million people, austria is located in the center of europe and operates a highly developed health care system that utilizes a national epidemiological surveillance program. as of june th , contact tracing was performed for all , reported cases that tested positive for sars-cov- , and , cases could be linked to epidemiological clusters ( ) (methods) . due to its prominent role in international winter tourism, austria has been implicated as a superspreading transmission hub across the european continent. tourism-associated spread of sars-cov- from austria may be responsible for up to half of all imported cases in denmark, norway and a considerable share of cases in many other countries including iceland and germany ( , , ) . 
in this study we phylogenetically and epidemiologically reconstructed major sars-cov- infection clusters in austria and analyzed their role in transcontinental virus spread. moreover, we combined our deep viral genome sequencing data with epidemiologically identified chains of transmissions and family clusters together with biomathematical analyses to study genetic bottlenecks and the dynamics of genome evolution of sars-cov- . our results provide fully integrated genetic and epidemiological evidence for continental spread of sars-cov- from austria and establish fundamental transmission properties of the sars-cov- in the human population. phylogenetic-epidemiological reconstruction of sars-cov- infection clusters in austria we sequenced sars-cov- rna samples from cases originating from different geographical locations across austria. the main focus was on the austrian provinces of tyrol and vienna (fig. a) , given that these two regions led to numerous positive cases (fig. s a) ( ) . the samples presented in this study capture the first phase of the outbreak (mid-february to mid-march ) as well as the peak of infections in austria (fig. a) , and cover a balanced sampling across multiple epidemiological and clinical parameters including age, sex and viral load ( fig. s b-c) . samples from both swabs (nasal, oropharyngeal) and secretions (tracheal, bronchial) were included, in order to investigate not only the evolutionary dynamics within the population, but also within individuals (fig. s d) . we assembled sars-cov- genome sequences, constructed phylogenies and identified low frequency mutations based on high-quality sequencing results with > million reads per sample and > % of mapped viral reads (fig. s a-b) . our pipeline was validated by experimental controls involving sample titration and technical sample replicates ( fig. 
s c- to investigate the link between local outbreaks in austria and the global pandemic, we performed phylogenetic analysis of sars-cov- genomes from the austrian cases (> % genome coverage, > % aligned viral reads) and , global genomes from the gisaid database (fig. b, table s ). our analysis revealed six distinct phylogenetic clusters defined by fixed mutation profiles that were mainly present in the tyrol region (tyrol- , tyrol- , tyrol- ) and in vienna (vienna- , vienna- , vienna- ) (fig. b) . these clusters are related to the global clades a, a, b and c (fig. s a) . our largest phylogenetic cluster, "tyrol- " (fig. s b) , whose cases are closely linked to the ski resort ischgl, was assigned to clade c (fig. b) . this clade is predominantly populated by strains from north america (fig. s a) . integrating the phylogenetic analysis of austrian sars-cov- sequences with epidemiological data from contact tracing showed strong overlap between these two lines of evidence (fig. c, table s ). all sequenced samples from the epidemiological cluster a mapped to the relatively homogeneous phylogenetic cluster vienna- (fig. c) , whose index patient reported a travel history to italy. of note, both clusters tyrol- and vienna- originated from crowded indoor events (i.e. an après-ski bar and a spinning class), settings that are by now known origins of superspreading events. the epidemiological cluster s, which includes resident and travel-associated cases linked to the ski resort ischgl or the related valley paznaun, largely mapped to the phylogenetic cluster tyrol- (fig. c) . while different sars-cov- strains circulated in the region of tyrol, these data suggest that the epidemiological cluster s originated from a single strain with a characteristic mutation profile, leading to a large outbreak in this region (fig. d) . to elucidate the possible origin of the sars-cov- strain giving rise to this cluster, we searched for strains matching its mutation profile among global sars-cov- sequences ( fig. d-e, fig. 
s c ). we found that all strains in clade c matched the mutation profile of the tyrol- cluster in our phylogenetic analysis (fig. e, fig. s c ). to reveal possible transmission lines between european countries at that time, we performed phylogenetic analysis using all , high-quality european sars-cov- sequences sampled before march that were available in the database gisaid. using this approach, we identified several samples matching the tyrol- cluster mutation profile from a local outbreak in the region hauts-de-france in the last week of february (fig. e, fig. s c ) ( ) . introduction events of this sars-cov- strain to iceland by tourists with a travel history to austria were reported as early as march (fig. e, s c ) ( ), indicating that this strain was already present in ischgl in the last week of february. these findings suggest that the emergence of the cluster tyrol- coincided with the local outbreak in france (fig. e) and with the early stages of the severe outbreak in northern italy ( ) . the viral genomes observed in the tyrol- cluster were closely related to those observed among the icelandic cases with a travel history to austria (fig. s d-e) ( ). vice versa, many of the icelandic strains with a "tyrol- " mutation profile had reported an austrian or icelandic exposure history (fig. s f) . together, these observations and epidemiological evidence support the notion that the sars-cov- outbreak in austria propagated to iceland. moreover, the emergence of these strains coincided with the emergence of the global clade c (fig. s c) . one week after the occurrence of sars-cov- strains with this mutation profile in france and ischgl, an increasing number of related strains based on the same mutation profile could be found across continents (fig. e) where they established new local outbreaks, for example in new york city ( ). 
as a popular international destination attracting thousands of tourists from european countries and overseas, ischgl may have played a critical role as transmission hub for the spread of clade c strains to europe and north america (fig. s g-h) . dynamics of low frequency and fixed mutations in clusters next, we sought to gain insights into the fundamental processes of sars-cov- infection by integrative analysis of viral genomes. mutational analysis of sars-cov- genomes from austrian cases revealed that more than half of the observed substitutions were non- synonymous, with the most frequent non-synonymous mutations occurring in nsp , orf a and orf (fig. s a-b ). an analysis of the mutational signatures in the , global strains and the austrian subset of sars-cov- isolates showed a non-homogeneous mutational pattern dominated by c>u, g>u and g>a substitutions (fig. s c) . moreover, we observed increased mutation rates in the ' and ' utrs ( fig. s d) . we found that % of the positions in the genome ( , total positions) harbored variants among the sequenced strains from austria and we identified mutational hotspots for both high and low-frequency mutations ( fig. a, fig. s a alleles being fixed in more than three samples and exhibiting a frequency < . in at least two other samples ( fig. a) . we confirmed the non-homogeneous mutational pattern of fixed mutations in our low-frequency data, suggesting that the same biological and evolutionary forces are at work for both types of mutations ( fig. s c-e) . based on our phylogenetic analysis, we found a sub-cluster inside the phylogenetic tyrol- cluster, defined by a fixed non-synonymous g>u mutation at position , (fig. b) . this mutation is absent from most of the other austrian cases but was detected at low and intermediary frequencies in other cases of the tyrol- cluster. interestingly, around the emergence of this mutation, sequences sharing the same mutational profile (i.e. 
tyrol- haplotype and g>u at position , ) appeared in other european countries including denmark and germany (fig. c) . similarly, a synonymous fixed c>u mutation at position , defines a subcluster inside the phylogenetic vienna- cluster (fig. d) . potential functional effects of this mutation are unknown although it is predicted to slightly alter the locally stable rna region with a destabilized central fold ( fig. s c-d) . the cases from this subcluster intersect with members of two families (families and ) (fig. e , see also conversely, four members of family , who tested positive for sars-cov- between march th and nd , were epidemiologically assigned to cluster al and also show a fixed u nucleotide at position , ( fig. d-e) . this data indicates the emergence of a fixed mutation within a family, and provides phylogenetic evidence to link previously disconnected epidemiological clusters. together, these results from two superspreading events (tyrol- , vienna- ) demonstrate the power of deep viral genome sequencing in combination with detailed epidemiological data for observing viral mutation on their way from emergence at low frequency to fixation. impact of transmission bottlenecks and intra-host evolution on sars-cov- mutational dynamics the emergence and potential fixation of mutations in the viral quasispecies depend on interhost bottlenecks and intra-host evolutionary dynamics ( , ) . to test the individual contributions of these forces, we first investigated the transmission dynamics between known pairs of donors and recipients by inferring the number of virions initiating the infection, also known as the genetic bottleneck size ( , ) . our set of sars-cov- positive-tested cases comprised twenty-three epidemiologically confirmed donor-recipient pairs (fig. a, fig. s a ). using a beta-binomial method to quantify the bottleneck size ( ), we found a median bottleneck size of virions (ranging from to greater than ) (fig. b) . 
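the beta-binomial estimate mentioned above can be illustrated with a minimal sketch. it assumes the approximate form of the method, in which the number of variant-carrying founding virions is binomially distributed given the donor frequency and the recipient frequency is beta-distributed given that number. the function names, the search range `max_nb` and the restriction to variants detected in both samples are assumptions made for illustration; variant-calling thresholds and sequencing noise, which a full analysis would model, are left out.

```python
import math


def log_binom_pmf(k, n, p):
    """Log probability of k successes in n trials with success rate p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def log_beta_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1 and a, b > 0."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x))


def log_likelihood(nb, pairs):
    """Log-likelihood of bottleneck size nb (nb >= 2).

    pairs holds (donor_freq, recipient_freq) tuples, one per shared
    variant, with both frequencies strictly between 0 and 1.
    """
    if nb < 2:
        raise ValueError("nb must be at least 2 in this sketch")
    total = 0.0
    for donor, recipient in pairs:
        # Sum over k, the number of founding virions carrying the variant.
        # k = 0 and k = nb are skipped: they cannot produce a recipient
        # frequency strictly between 0 and 1.
        terms = [log_binom_pmf(k, nb, donor) + log_beta_pdf(recipient, k, nb - k)
                 for k in range(1, nb)]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def estimate_bottleneck(pairs, max_nb=200):
    """Bottleneck size in 2..max_nb with the highest log-likelihood."""
    return max(range(2, max_nb + 1), key=lambda nb: log_likelihood(nb, pairs))
```

in this sketch, variant frequencies that stay close to the donor values after transmission push the estimate towards the upper end of the search range, whereas strongly shifted frequencies favor a small bottleneck.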
the observed relatively large bottleneck size of sars-cov- is driven by the stability of mutation frequencies across transmissions (fig. s a ). next, we investigated the dynamics of intra-host evolution by using time-resolved viral sequences from nine longitudinally sampled patients. these patients were subject to different medical treatments and four of them died of covid- related complications (table s ) . we observed diverse mutation patterns across individual patients and time (fig. c, fig. s b ). samples from patients , , and showed a small number of stable low-frequency mutations (≥ . and ≤ . ), while patients , , , , and exhibited higher variability including the fixation of individual mutations (fig. c, fig. s b ). the patient-specific dynamics of viral mutation frequencies may reflect both host-intrinsic factors, such as immune responses and overall health status, and extrinsic factors, such as different treatment protocols. finally, we examined the genetic distance between samples obtained across donor-recipient pairs and serially acquired patient samples. this analysis revealed that genetic divergence was greater between consecutive samples within individual patients than during inter-host transmission (fig. d) , suggesting that viral intra-host evolution has a potentially strong impact on the landscape of fixed mutations. unprecedented global research efforts are underway to match the rapid pace of the covid- pandemic around the globe and its pervasive impact on health and socioeconomics. these efforts include the genetic characterization of sars-cov- to track viral spread and to dissect the viral genome as it undergoes changes in the human population. here, we leveraged deep viral genome sequencing in combination with national-scale epidemiological workup to reconstruct austrian sars-cov- clusters that played a substantial role in the international spread of the virus. 
notably, our time-resolved study shows how emerging lowfrequency mutations of sars-cov- become fixed in local clusters with subsequent spread across countries, thus connecting viral mutational dynamics within individuals and across populations. exploiting epidemiologically well-defined clusters and families, we were able to determine the inter-human genetic bottleneck for sars-cov- , i.e. the number of virions that start the infection and produce progeny in the viral population, at around - . our quantified bottlenecks are based on a substantial number of defined donor-recipient pairs and in agreement with recent studies implying larger bottleneck sizes for sars-cov- compared to estimates for influenza a virus ( , ( ) ( ) ( ) ( ) . these bottleneck sizes correlate inversely with higher mutation rates of influenza virus as compared to sars-cov- . of note, estimates of viral bottleneck size are likely influenced by many parameters including virus-specific differences and stochastic evolutionary processes ( ) . successful viral transmission also depends on other factors including the rate of decay of viral particles, availability of susceptible cells, the host immune response and co-morbidities ( , ) . to better understand the mechanisms at work, future investigations will need to probe these factors in the context of viral intra-host diversity across body compartments and time ( ) ( ) and assess how the underlying viral population structures act together and influence genome evolution of sars-cov- ( , ) . this study underscores the value of tightly integrated epidemiological and molecular sequencing approaches to provide the high spatiotemporal resolution necessary for public health experts to track pathogen spread effectively. this enables the retrospective identification of sars-cov- chains of transmissions and international hotspots such as the phylogenetic cluster tyrol- ( , ( ) ( ) ( ) . 
our data also show that all but cluster tyrol- carry the prevalent d g mutation in the s protein ( ), supporting the notion of multiple introduction events to tyrol. moreover, our phylogenetic analysis of cluster vienna- allowed us to uncover previously unknown links between epidemiological clusters. this observation was subsequently confirmed by extended contact tracing, demonstrating deep viral genome sequencing directly contributes to public health efforts. the time has come to implement the technical capacities and interdisciplinary framework for prospective near real-time tracking of sars-cov- infection clusters by integrating approaches which combine viral phylogenetic and epidemiological evidence as well as possible complementary data such as serological testing ( ) . such inter-disciplinary platforms will be particularly relevant for the prevention of superspreading events and the assessment of the effectiveness of pandemic containment strategies in order to improve the preparedness for anticipated recurrent outbreaks and resurgences of sars-cov- as well as future pandemics ( ). sequencing data processing and analysis following demultiplexing, fastq files containing the raw reads were inspected for quality criteria (base quality, n and gc content, sequence duplication, over-represented sequences) using the fastqc (v. . . ) ( ). trimming of adapter sequences was performed with bbduk from the bbtools suite (http://jgi.doe.gov/data-and-tools/bbtools). overlapping read sequences in a pair were corrected for with bbmerge function from the bbtools. read pairs were mapped on the combined hg and sars-cov- genome (genbank: mn . ) using the bwa-mem software package (v . . ) ( ). only reads mapping uniquely to the sars-cov- viral genome were extracted. primer sequences were removed after mapping by masking with ivar ( ) . from the viral reads bam file, the consensus fasta file was generated using the samtools (v . 
) ( ) # , twist bioscience) were titrated in increasing ratios ( . %, %, %, %, %, %) and subjected to cdna synthesis and pcr amplification as described. these controls are important for assessing the limit of low-frequency detection across samples. rna was extracted and processed independently from sample cemm to serve as a technical control for pcr processing. amplicons from the cemm sample were sequenced twice in order to assess the potential biases introduced by the sequencing step. phylogenetic analysis was conducted using the augur package (version . . ) ( ). we compiled a randomly subsampled dataset of full-length viral genomes with high coverage (< % ns) that were available from gisaid (https://www.gisaid.org/, nd of june) and sequences obtained in this publication. sequences from gisaid were filtered for entries from human hosts with complete sampling dates. metadata information on patient age and sex was excluded from the analysis. multiple sequence alignments (msa) were performed using mafft ( ). a masking scheme for homoplasic and highly ambiguous sites was applied to avoid bias in the subsequent phylogenetic analysis, as discussed elsewhere [n. de maio, c. walker, r. borges, l. weilguny, g. slodkowicz, and n. goldman, "issues with sars-cov- sequencing data," virological.org.]. we reconstructed the phylogeny with the augur pipeline using iq-tree ( ) and further processed the resulting trees with treetime to infer ancestral traits of the nodes ( ) . phylogenetic trees were rooted with the genome of "wuhan-hu- / ". the same workflow was repeated for phylogenetic reconstruction of all high-quality european strains before march st available in the gisaid database by june th ( ). clade annotations for global trees were adapted from nextstrain.org (https://github.com/nextstrain/ncov/blob/master/config/clades.tsv). clusters of austrian strains were identified based on shared mutation profiles and patient location from epidemiological data. 
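the sequence-level filters described above (a bounded fraction of ns, human host, complete sampling date) can be sketched as follows. the `Record` fields, the date format and the `max_n_fraction` parameter are hypothetical; the threshold used in the study is not recoverable from the text and has to be supplied by the caller.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    """One genome with the metadata fields needed for filtering (hypothetical layout)."""
    name: str
    host: str
    date: str       # sampling date as given in the metadata
    sequence: str


# A complete sampling date is assumed to look like YYYY-MM-DD.
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def n_fraction(sequence):
    """Fraction of ambiguous 'N' bases; an empty sequence counts as all N."""
    if not sequence:
        return 1.0
    return sequence.upper().count("N") / len(sequence)


def keep(record, max_n_fraction):
    """True if the record is from a human host, fully dated and low in Ns."""
    return (record.host.strip().lower() == "human"
            and FULL_DATE.fullmatch(record.date) is not None
            and n_fraction(record.sequence) < max_n_fraction)
```

a caller would apply it as `[r for r in records if keep(r, max_n_fraction)]` before random subsampling and alignment.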
inter-host mutations were reconstructed using the augur pipeline to infer nucleotide changes at the internal nodes ( ) . positions reported as highly homoplasic were masked, including the first and the last nucleotides [n. de maio, c. walker, r. borges, l. weilguny, g. slodkowicz, and n. goldman, "issues with sars-cov- sequencing data," virological.org.]. the consequence type of the mutations was annotated using a customized implementation of the ensembl variant effect predictor (vep version ) using the first sars-cov- sequenced genome (ncbi id: nc_ v ) as a reference. the mutational profile was obtained as the normalized count of the number of mutations in each of the trinucleotide changes. to account for the genomic composition of the sars-cov- virus we also divided each triplet probability by the total number of available triplets in the sars-cov- reference genome. for the intra-host analysis, the process to obtain the mutational spectra panels was the same as for the inter-host analysis but used the low-frequency variant calling output ( mutations across austrian samples with allele frequencies between . and . ). the mutational profile was computed following the same rationale as for the inter-host variants. we aimed to assess whether the variation in the rate of single nucleotide substitution along the sars-cov- genome can be solely explained by its tri-nucleotide composition. we devised a statistical test that estimates, for a particular region of the genome, how far the observed number of mutations deviates from the number expected based on the tri-nucleotide composition of that region. we first counted the total number of non-protein-affecting mutations (i.e., synonymous variants and upstream/downstream gene variants) that have been observed across sequenced viral genomes of infected individuals. 
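the trinucleotide-normalized mutational profile defined earlier in this paragraph, on which the test also builds, can be sketched as follows. the sketch interprets "available triplets" as the number of times the mutated triplet occurs in the reference genome, and it rescales the result to sum to one; both choices, like the function names and the 0-based coordinates, are assumptions made for illustration.

```python
from collections import Counter


def trinucleotide_counts(genome):
    """Count every overlapping trinucleotide in the reference genome."""
    genome = genome.upper()
    return Counter(genome[i - 1:i + 2] for i in range(1, len(genome) - 1))


def mutational_profile(genome, mutations):
    """Composition-normalized profile of trinucleotide changes.

    mutations is an iterable of (position, alt_base) with 0-based positions.
    Returns {(reference_triplet, alt_base): weight}; the weights sum to 1.
    """
    genome = genome.upper()
    available = trinucleotide_counts(genome)
    observed = Counter()
    for position, alt in mutations:
        if position < 1 or position > len(genome) - 2:
            continue  # first and last base have no full trinucleotide context
        context = genome[position - 1:position + 2]
        observed[(context, alt.upper())] += 1
    total = sum(observed.values())
    if total == 0:
        return {}
    # Probability of each change divided by how often its triplet is available.
    weighted = {change: (count / total) / available[change[0]]
                for change, count in observed.items()}
    scale = sum(weighted.values())
    return {change: weight / scale for change, weight in weighted.items()}
```

dividing by triplet availability keeps abundant contexts from dominating the profile merely because they offer more sites at which a mutation can occur.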
restricting the analysis to non-protein-affecting mutations aims to lessen the bias that positive selection on protein-coding changes could introduce. we did not consider mutations in masked sites (see filtering of mutations for further information about masked sites). we next assigned to each reference nucleotide a probability of mutation to each of the three alternatives based on its tri-nucleotide context ( ' and ' nucleotides) and the relative probability of mutation derived from the , samples from gisaid. then we performed n (n= ) randomizations of the same number of observed mutations, distributing them along the sars-cov- genome according to their mutational probability. protein-affecting mutations were not randomized, and masked sites were not available to the randomization. we then divided the , bp of the viral genome into windows of kb (except the last window with bps). analogously, in the zoom-in analysis, we divided the first and last kb window of the viral genome into windows of bp. for each window we estimated the mean and standard deviation of the number of simulated mutations within the window. finally, for each window we estimated the deviation from the expectation using a log-likelihood ratio test (i.e., g-test of goodness of fit), in which we compared the observed number of mutations in the window with the mean simulated number. to address the question of whether mutations observed in the austrian sars-cov- samples influence the rna structure of the virus, we performed computational predictions at the secondary structure level with the viennarna package ( ) . we started by characterizing locally stable rna structures in the sars-cov- reference genome nc . with rnalfold. we required that the underlying sequences were not longer than nt and we targeted thermodynamic stability by selecting only regions whose free energy z score was at least - among dinucleotide shuffled sequences of the same sequence composition. 
this approach yielded locally stable rna secondary structures spread throughout the sars-cov- genome, which were then intersected with unique fixed mutations found in austrian samples, i.e. positions , , , , , , , , , , , and - . this approach resulted in seven hits which were subsequently analyzed in detail. we performed single sequence minimum free energy (mfe) structure predictions for both the reference and the mutation variants. in addition, we assessed for each region the level of structural conservation within a set of phylogenetically related viruses. here we were particularly interested in finding evidence for covariation in stacked helices. typical covariation patterns are compensatory mutations, i.e. cases where a mutation in one nucleotide is compensated by a second mutation of its pairing partner, such as a gc base-pair being replaced by an au pair. likewise, consistent mutations comprise cases where only one nucleotide is exchanged, thereby maintaining the base-pair, e.g. gc to gu. we characterized orthologous regions in selected sarbecovirus species with infernal ( ), produced structural multiple sequence nucleotide alignments with locarna ( ) and computed consensus structures with rnaalifold ( ) . in addition, each block was analyzed for structural conservation by rnaz ( ) . bottleneck estimation. we first set out to estimate the transmission bottleneck sizes for each donor-recipient pair. our analysis is based on the beta-binomial method presented in leonard et al ( ) . for shared mutations with a defined donor to recipient transmission (fig. b) , we determined those mutations present in both samples and calculated their absolute difference in frequency. similarly, we made the same computations between time consecutive pairs for serially sampled patients. if multiple samples were obtained on the same day, the sample with lowest ct value was considered. note that the time-consecutive pairs had differing number of days between samples. 
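as a minimal sketch of the pairwise distance used for donor-recipient and time-consecutive pairs: shared variants contribute the absolute difference of their frequencies, and a variant detected in only one sample of the pair is treated as having frequency zero in the other, so the total is an l1-norm over variant frequencies. the function name and the variant labels are hypothetical, and the frequencies are toy values.

```python
def l1_distance(freqs_a, freqs_b):
    """L1-norm between two samples' variant frequencies.

    Each argument maps a variant label to its allele frequency; a variant
    missing from one sample counts as frequency 0.0 in that sample.
    """
    variants = set(freqs_a) | set(freqs_b)
    return sum(abs(freqs_a.get(v, 0.0) - freqs_b.get(v, 0.0)) for v in variants)


# hypothetical variant labels and toy frequencies
donor = {"var_1": 0.5, "var_2": 0.2}
recipient = {"var_1": 0.25, "var_3": 0.125}
distance = l1_distance(donor, recipient)  # 0.25 + 0.2 + 0.125
```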
to these genetic distances obtained from the shared variants we added the sum of the frequencies of the variants detected in only one sample of the pair; i.e. we calculated the l1-norm of the variant frequencies. the statistical difference between the genetic distances from transmission pairs versus consecutive pairs from serially sampled patients was determined by a wilcoxon (one-sided) rank-sum test. hörmann for providing samples and tobias pahlke for support with computing cluster. we thank the tourism office paznaun-ischgl for statistical data. competing interests: authors declare no competing interests. data and materials availability: virus sequences are deposited in the gisaid database (see table s ). all phylogenetic trees used in this study are available for visualization under the following urls: global build: https://nextstrain.org/community/bergthalerlab/sars-cov- /nextstrainaustria ; european build with european strains before march: https://nextstrain.org/community/bergthalerlab/sars-cov- /earlyeurope ; build with austrian strains used for phylogenetic analysis: https://nextstrain.org/community/bergthalerlab/sars-cov- /onlyaustrian . (transmission chains, n= ) and intra-patient consecutive time points (n= ) (fig. c, fig. s b). only variants seen in two samples are considered. s : acknowledgements for sars-cov- genome sequences derived from gisaid. table s : epidemiological clusters referred to in this study. table s : clinical information of covid- patients relating to figure and figure s .
a pneumonia outbreak associated with a new coronavirus of probable bat origin immunology of covid- : current state of the science deep immune profiling of covid- patients reveals patient heterogeneity and distinct immunotypes with implications for therapeutic interventions viral and host factors related to the clinical outcome of covid- genomewide association study of severe covid- with respiratory failure superspreading and the effect of individual variation on disease emergence epidemiology of covid- in a long-term care facility in king county novel coronavirus-infected pneumonia in wuhan, china high sars-cov- attack rate following exposure at a choir practice -skagit county emergence of genomic diversity and recurrent mutations in sars-cov- spread of sars-cov- in the icelandic population introductions and early spread of sars-cov- in the new york city area investigation of three clusters of covid- in singapore: implications for surveillance and response measures genomic surveillance reveals multiple introductions of sars-cov- into northern california quasispecies diversity determines pathogenesis through cooperative interactions in a viral population viral quasispecies. virology. 
- emergence of coronavirus disease (covid- ) in austria sars-cov- transmission chains from genetic data: a danish case study superspreading and exportation of covid- cases from a ski area in austria estimating the burden of sars estimation of covid- outbreak size in italy matters of size: genetic bottlenecks in virus infection and their potential impact on evolution pathogen population bottlenecks and adaptive landscapes: overcoming the barriers to disease emergence transmission bottleneck size estimation from pathogen deep-sequencing data, with an application to human influenza a virus shared sars-cov- diversity suggests localised transmission of minority variants quantifying influenza virus diversity and transmission in humans stochastic processes constrain the within and between host evolution of influenza virus temporal dynamics in viral shedding and transmissibility of covid- virological assessment of hospitalized patients with covid- sociovirology: conflict, cooperation, and communication among viruses evolutionary dynamics in structured populations coast-to-coast spread of sars-cov- during the early epidemic in the united investigation of a covid- outbreak in germany resulting from a single travel-associated primary case: a case series a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity connecting clusters of covid- : an epidemiological and serological investigation projecting the transmission dynamics of sars-cov- through the postpandemic period a proposal of alternative primers for the artic network's multiplex pcr to improve coverage of sars-cov- genome sequencing fastqc a quality control tool for high throughput sequence data fast and accurate short read alignment with burrows-wheeler transform an amplicon-based sequencing framework for accurately measuring intrahost virus 
diversity using primalseq and ivar the sequence alignment/map format and samtools the sequence alignment/map format and samtools lofreq: a sequence-quality aware, ultra- sensitive variant caller for uncovering cell-population heterogeneity from highthroughput sequencing datasets a statistical framework for snp calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data a program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain w using drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program nextstrain: real-time tracking of pathogen evolution effective stochastic algorithm for estimating maximum-likelihood phylogenies treetime: maximum-likelihood phylodynamic analysis viennarna package . infernal . : -fold faster rna homology searches inferring noncoding rna families and classes by means of genome-scale structure-based clustering rnaalifold: improved consensus structure prediction for rna alignments from the cover: fast and reliable prediction of noncoding rnas the two technical replicates of this titration are depicted with different symbols. (d) comparison of variant detection for two independent full processing (pcr amplification, library preparation, sequencing) of the same patient sample, cemm . (e) comparison of variant detection for two independent sequencing runs of the same patient sample cemm . geographic distribution of clades bases on information for , strains in the global phylogenetic analysis in this study. (b) distribution of sars-cov- from austrian covid- sequences over the six phylogenetic clusters identified in this publication. (c) clade c of time-resolved phylogenetic trees reconstructed from , randomly subsampled global strains and austrian strains (left) or all european high-quality sequences dated before st of march. 
(d) statistics of foreign exposure history of icelandic covid- cases as reported in gisaid. (e) icelandic strains with austrian exposure history matching austrian cluster profiles. (f) exposure history of all sars-cov- sequences from icelandic covid- cases available on gisaid that match the mutation profile of the phylogenetic cluster tyrol- . (g) international tourists visiting ischgl between mutational analysis of fixed mutations in sars-cov- sequences. (a) ratio of non-synonymous to synonymous mutations in unique mutations identified in austrian sars-cov- sequences. (b) frequencies of synonymous and non-synonymous mutations per gene or genomic region normalized to length of the respective gene, genomic region or gene product (nsp - ). (c) mutational spectra panel. mutational profile of interhost mutations. relative probability of each trinucleotide change for mutations across sars-cov- sequences in , global sequences obtained from gisaid samples plus austrian samples (top) or sars sars-cov- sequences from gisaid compared to the expected distribution (based on randomizations) according to their tri-nucleotide context. the grey line indicates the mean number of simulated mutations in the window, the colored background represents the distribution of expected mutations (+/-standard deviation with respect to the mean) and the red dots indicate a significant difference (g-test goodness of fit shows a zoom analysis of low-frequency mutations. (a) number of variants detected across the different sample types. (b) number of variants per variant class. (c) mutational profile (relative probability of each trinucleotide) of , intra-host mutations across austrian mutational profile (relative probability of each trinucleotide) of , , intra-host mutations across austrian samples (alleles frequencies less than . ) (lower panel). 
(d) analysis of the mutation rate (analogous to the interhost mutation rate panel) across the sars-cov- genome using rna secondary structure prediction of the upstream nt of the sars-cov- reference genome (nc . ), comprising the complete 'untranslated region (utr) and parts of the nsp protein nucleotide sequence. the canonical aug start codon is located in a stacked region of sl (highlighted in gray). mutational hotspots observed in the austrian sars-cov- samples are highlighted in color. two fixed mutations at positions and , respectively, are marked in red. low-frequency variants with an abundance between % and % in individual samples are shown in orange s : viral intra-host diversity in individual patients. (a) two examples of low key: cord- -wbyqmvhs authors: xiao, minfeng; liu, xiaoqing; ji, jingkai; li, min; li, jiandong; yang, lin; sun, wanying; ren, peidi; yang, guifang; zhao, jincun; liang, tianzhu; ren, huahui; chen, tian; zhong, huanzi; song, wenchen; wang, yanqun; deng, ziqing; zhao, yanping; ou, zhihua; wang, daxi; cai, jielun; cheng, xinyi; feng, taiqing; wu, honglong; gong, yanping; yang, huanming; wang, jian; xu, xun; zhu, shida; chen, fang; zhang, yanyan; chen, weijun; li, yimin; li, junhua title: multiple approaches for massively parallel sequencing of hcov- (sars-cov- ) genomes directly from clinical samples date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wbyqmvhs covid- has caused a major epidemic worldwide, however, much is yet to be known about the epidemiology and evolution of the virus. one reason is that the challenges underneath sequencing hcov- directly from clinical samples have not been completely tackled. 
here we illustrate the application of amplicon and hybrid capture (capture)-based sequencing, as well as ultra-high-throughput metatranscriptomic (meta) sequencing, in retrieving complete genomes and inter-individual and intra-individual variations of hcov- from clinical samples covering a range of sample types and viral loads. we also examine and compare the bias, sensitivity, accuracy, and other characteristics of these approaches in a comprehensive manner. this is, to date, the first work that systematically implements amplicon and capture approaches in sequencing hcov- , as well as the first comparative study across methods. our work offers practical solutions for genome sequencing and analyses of hcov- and other emerging viruses. as of march , human coronavirus (hcov- ) has surpassed severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov) in every aspect, infecting over , people in more than countries, with a mortality of over , , . so far, coronaviruses have caused three major epidemics in the past two decades, posing a great challenge to global health and economy. single nucleotide variations (isnvs) and their allele frequencies have also contributed to antiviral therapy and drug resistance, e.g., to reveal highly conserved genes during the outbreak that potentially serve as ideal therapeutic targets , . however, it is a challenge to accurately detect isnvs from clinical samples, especially when the samples are subjected to extra steps of enrichment and amplification. therefore, we aim to comprehensively compare the sensitivity, inter-individual (variant) and intra-individual (isnv) accuracy, and other general features of different approaches by systematically utilizing ultra-high-throughput metatranscriptomic, hybrid capture-based, and amplicon-based sequencing approaches to obtain genomic information of hcov- from serial dilutions of a cultured isolate and directly from clinical samples. we present a reasonable
sequencing strategy that fits into different scenarios, and estimate the minimal amount of sequencing data necessary for downstream hcov- genome analyses. our study, together with our tailor-made experimental workflows and bioinformatics pipelines, offers very the situation, but this requires high-standard laboratory settings and expertise apart from being time-consuming. also, unwanted mutations that are not concordant with original clinical samples may be introduced during the culturing process. to enrich adequate viral content for whole-genome sequencing in a convenient manner, we pursued two other methods: multiplex pcr amplification (amplicon) and hybrid capture (capture) (fig. ). we designed a systematic study to comprehensively validate the bias, sensitivity, inter-individual (variant) and intra-individual (isnv) accuracy of multiple approaches by sequencing serial dilutions of a cultured isolate (unpublished), as well as the eight clinical samples (fig. ). we performed qrt-pcr of -fold serial dilutions (d -d ) of the cultured isolate, and the ct was . , . , . for, . , . , , and . , respectively, indicating the undiluted rna (d ) of the cultured isolate contained ~ e+ genome copies per ml. for amplicon sequencing, we utilized two kits comprising two sets of primers generating pcr products of - bp and - bp, respectively. the ~ bp amplicon-based sequencing was implemented in all samples and analyzed throughout the study, while the ~ bp amplicon-based sequencing was only applied in the cultured isolate for coverage analysis. to inadequate depth in amplicon sequencing (fig. a). another pitfall is that amplification across the genome can hardly be unbiased, causing difficulties in complete genome assembly.
indeed, amplicon sequencing exhibited a higher level of bias compared with metatranscriptomic sequencing, in terms of coverage across the viral genomes from the cultured isolate and the clinical samples tested in our study (fig. b, d, supplementary fig. ). to our surprise, however, capture sequencing was almost as unbiased as meta sequencing, demonstrating better performance than the previous capture method used to enrich zikv even though the hcov- genome is ~ fold larger than zikv (fig. b, c). two reasons, amongst others, likely account for this improvement: 1) we utilized pieces of ssdna probes covering x of the hcov- genome to capture the libraries, and 2) we employed the dnbseq sequencing technology that features pcr-free rolling circle replication (rcr) of dna nanoballs (dnbs) , . the sequencing results of amplicon and capture approaches revealed dramatic increases in the ratio of hcov- reads out of the total reads compared with meta sequencing, suggesting the enrichment was highly efficient: - -fold in capture method and -fold in amplicon method for each sample on average (supplementary table - ). to further compare the sensitivity of different methods, we plotted the number of hcov- reads per million (hcov- -rpm) of total sequencing reads against the viral concentration for each sample. the productivity was similar between the two methods when the input rna of the cultured isolate contained e+ genome copies per ml and above (fig. e). however, amplicon sequencing produced - fold more hcov- reads than capture sequencing when the input rna concentration of the cultured isolate was e+ genome copies per ml and lower, suggesting amplicon-based enrichment was more efficient than capture for more challenging samples (conc. ≤ e+ genome copies per ml, or ct ≥ . ) (fig. e).
meta sequencing, as expected, produced dramatically lower hcov- -rpm than the other two methods among clinical samples tested with a wide range of ct values, whereas amplicon and capture were generally comparable to each other (fig. e). considering that the costs for sequencing, storage, and analysis increase substantially with larger datasets, we tried to estimate how much sequencing data must be produced for each approach in order to achieve x depth across % of the hcov- genome, and the results can be found in supplementary table . as practical, cost-effective guidance for future sequencing, we also assessed the minimum amount of data required to pass the stringent filters (≥ % coverage and method-specific depth, see methods) in our pipelines corresponding to differ- e+ copies of hcov- genome per ml, while capture sequencing requires , to mb for the same situation (fig. g, supplementary table ). to determine the accuracy of different approaches in discovering inter-individual genetic diversity, we tested each method in calling the single nucleotide variations (snvs) and verified some of the snvs with sanger sequencing (supplementary fig. ). two to five snvs were identified within each clinical sample, and in all the seven samples, snvs identified by the three methods were concordant except that capture missed one snv at position in gzmu (fig. a). we then investigated the allele frequencies of these sites across methods, and found that alleles identified by capture sequencing displayed lower frequencies than the other two methods, especially for gzmu , gzmu , and gzmu where the viral load was lower (ct ≥ ), which explained why capture sequencing missed an snv in our pipeline when the cutoff of snv calling was set as % allele frequency (fig. b). these data indicate that amplicon sequencing is more accurate than capture sequencing in identifying snvs, especially for challenging samples.
sequencing has also allowed us to investigate rna expression patterns of the overall microbiome and host content and is thus suitable for discovering new viruses, distinguishing co-infections, and dissecting virus-host interactions. to explore the microbiota, we performed further metatranscriptomics analysis of the clinical samples. we were able to identify host nucleic acids in all of the samples, and over % of total reads were from the host in sputum, nasal, and throat samples (supplementary fig. a). virus contributed less than % of reads in anal swab and throat swab but more than % of reads in nasal swab (supplementary fig. b). these results suggest nasal swab could be the most suitable sample type for viral detection among the four sample types, which agrees with recent clinical evidence . among the viral reads, over % were coronaviridae, which is consistent with clinical diagnostics (supplementary fig. c). reads from other viruses were also identified, indicating further measurements could be taken to confirm if co-infection exists (supplementary fig. ). bacterial composition was also shown, providing support for scientific research, as well bioinformatics pipelines tailored in this study, e.g., ) the bias of amplicon sequencing can be improved by reducing the number of cycles in the st pcr or optimizing the molar ratios of primers (fig. a), ) the amplicon sequencing is particularly convenient compared with previous counterparts since the fragmentation and library construction steps are omitted cons using single-end nt reads (fig. a), ) using anything less than pieces of ssdna probes in hybrid capture may attenuate the sequencing coverage while increasing the bias, ) metatranscriptomic sequencing was conducted with an ultra-high-throughput sequencing platform so that the success rate was substantially higher than usual.
) the minimal amount of data necessary for analyzing the hcov- genome from clinical samples across methods is higher than that predicted by data from the cultured isolate was probably due to the high nucleic acids background from the host and other microbes (supplementary table - , supplementary fig. ) . also, we do not take into account the time spent in se- nt strategy using the same protocol described above, generating gb sequencing data for each sample on average. total rna was reverse transcribed to synthesize the first-strand cdna with random hexamers and superscript ii reverse transcriptase kit (invitrogen, carlsbad, usa). table . µl of first-strand instructions, including lambda phage gdna unless specified. ng of human gdna was added to each pcr reaction of the cultured isolate. the pcr was performed as follows: min at °c, min at °c, cycles of ( s at °c, min at °c, min at °c to s reaction were unique dual indexed. after the nd pcr, products were purified following the same procedures as the st pcr and quantified using the qubit dsdna high sensitivity assay on qubit . (life technologies). pcr products of samples yielding sufficient material (> ng/µl) were pooled at equimolar to a total dna amount of ng before converting to single-stranded circular dna. dnbs-based libraries were generated from μl of single- stranded circular dna pools and sequenced on the mgiseq- platform with single-end nt strategy, generating . gb sequencing data for each sample on average. for metatranscriptomic and hybrid capture sequencing data, total reads were first processed by kraken v . . (default parameters) with a self-build database of coronaviridae ge- nomes (including sars, mers and hcov- genome sequences downloaded from gisaid, ncbi and cngb) to efficiently identify candidate viral reads with a loose manner. these candidate reads were further qualified with fastp v . . (parameters: -q -u - n -l ) and soapnuke v . . (parameters: -l -q . -e -n . 
- -q -g -d) to remove low-quality reads, duplications and adaptor contaminations. low-complexity reads were next filtered by prinseq v . . (parameters: -lc_method dust -lc_threshold ). after the above process, hcov- -like reads generated from metatranscriptomics and hybrid capture method were obtained. for amplicon sequencing data, se reads were first processed with fastp v . . (parameters: -q -u -n -l ) to remove low-quality reads and adaptor sequences. primer amplicon sequencing data were processed as described above, except that duplications were not removed. a heatmap was generated to visualize the viral genome coverage for all samples sequenced by the amplicon method with the pheatmap package in r (v . . ). the depth at each nucleotide position was binarized, and was shown in pink if the depth was x and above. hcov- reads of metatranscriptomic and hybrid capture sequencing data were identified by aligning the hcov- -like reads to the hcov- reference genome (gisaid accession: epi_isl_ ) with bwa in a strict manner of coverage ≥ % and identity ≥ %. for comparisons of the coverage and depth of the viral genome across samples and methods, we normalized the viral reads to total sequencing reads with hcov- reads per million (hcov- -rpm). hcov- -rpm for amplicon sequencing data was calculated by the same pipelines we applied for metatranscriptomic and hybrid capture sequencing data. to estimate the minimum data requirements for genome assembly and intra-individual variation analysis, we applied gradient-based sampling to the hcov- genome alignments (i.e., the bam files) of each dataset using samtools (v . ). the effective genome coverage was set as % for all three mps methods. considering the distinct technologies used in different methods, we set method-dependent thresholds of effective depth as follows: 1) ≥ x for metatranscriptomic sequencing; 2) ≥ x for hybrid capture sequencing; and 3) ≥ x for amplicon sequencing.
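the reads-per-million normalization and the coverage/depth thresholds described above can be sketched as follows. this is an illustrative sketch only: the function names are hypothetical, and the threshold values in the example are placeholders, since the actual cutoffs are not recoverable from this text.

```python
def reads_per_million(viral_reads: int, total_reads: int) -> float:
    """Viral reads normalized to one million total sequencing reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


def breadth_of_coverage(depths, min_depth: int) -> float:
    """Fraction of genome positions covered at or above `min_depth`."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= min_depth for d in depths) / len(depths)


def meets_thresholds(depths, min_depth: int, min_breadth: float) -> bool:
    """True if enough of the genome reaches the method-specific depth."""
    return breadth_of_coverage(depths, min_depth) >= min_breadth


rpm = reads_per_million(viral_reads=250, total_reads=1_000_000)
toy_depths = [0, 5, 10, 20]  # per-position depth of a toy 4-bp genome
breadth = breadth_of_coverage(toy_depths, min_depth=10)          # placeholder cutoff
ok = meets_thresholds(toy_depths, min_depth=10, min_breadth=0.95)  # placeholder cutoffs
```

for the minimum-data estimate, a check like `meets_thresholds` would be applied to the per-position depths of each subsampled alignment file, keeping the smallest subsample that still passes.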
we next calculated the coverage and depth within each subsampled bam file per sample to determine the minimal bam file that could meet the above thresholds of both coverage and sequencing depth. the method-dependent minimum amount of sequencing data of each sample was estimated accordingly. we assessed the correlations between the hcov- genome copies per ml in diluted samples of cultured isolates and the minimum amount of sequencing data for amplicon- and capture-based methods using pearson correlation coefficient (r) with the function scatter from the r package ggpubr (v . . ). except for amplicon sequencing samples, variants calling in metatranscriptomic and hybrid for amplicon sequencing data, only base positions covered by ≥ x reads were used for isnv calling. for metatranscriptomic and hybrid capture-based sequencing data, the thresholds of depth were set as x and x, respectively. the candidate isnvs were further filtered for quality as follows: 1) frequency filtering, only minor alleles (frequency ≥ % and < %) and major alleles (frequency ≥ % and ≤ %) were retained; 2) depth filtering, isnvs with fewer than five forward or reverse reads were removed; and 3) strand bias filtering (not applicable to single-end reads of amplicon sequencing), isnvs were removed if there was more than a -fold strand bias or a -fold difference between the strand bias of the variant call and that of the reference call. for metatranscriptomic sequencing of clinical samples, raw sequencing data of a single sequence lane (approximately - gb per sample) was used to simultaneously assess the rna expression patterns of human, bacteria and viruses in clinical samples from covid- patients. we first used software fastp (v . . ) to filter low-quality reads and remove adapters with parameters: - - -q -c -l . after qc, we mapped high-quality reads to hg and removed human ribosomal rna (rrna) reads by soap v . 
(parameters: - m -x -s -l -v -r ), and the remaining rna reads were then aligned to hg by hisat with default settings to identify non-rrna human transcripts as previously de- scribed. next, we applied kraken (version . . -beta, parameters: --threads --confi- dence ) to assign microbial taxonomic ranks to non-human rna reads against the large a pneumonia outbreak associated with a new coronavirus of probable bat origin rna based mngs approach identifies a novel human coronavirus from two individual pneumonia cases in wuhan outbreak a new coronavirus associated with human respiratory disease in china evaluation of advanced reverse transcription-pcr assays and an alternative pcr target region for detection of severe acute respiratory syndrome- associated coronavirus coronaviruses and the human airway: a universal system for virus-host interaction studies a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster first case of novel coronavirus in the united states stochastic processes constrain the within and between host evolution of influenza virus intra-host dynamics of ebola virus during ebola virus epidemiology, transmission, and evolution during seven viral quasispecies evolution zika virus evolution and spread in the americas reliable multiplex sequencing with rare index mis-assignment on dnb-based ngs platform advanced whole genome sequencing using a complete r: a language and environment for statistical computing ggplot " based publication ready plots. 
r package version haplotype-based variant detection from short-read sequencing snippy: fast bacterial variant calling from ngs reads a program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain soap : an improved ultrafast tool for short read alignment graph-based genome alignment and genotyping with hisat and hisat-genotype improved metagenomic analysis with kraken bracken: estimating species abundance in metagenomics data double indexing overcomes inaccuracies in multiplex sequencing coverage and depth of the cultured isolate and eight clinical samples. a, amplicon sequencing coverage by sample (row) across the hcov- three independent experiments were performed for amplicon sequencing. pink, ~ bp amplicon-based sequencing including human and lambda phage nucleic acids background; red, ~ bp amplicon-based sequencing; orange, ~ bp amplicon-based sequencing excluding human and lambda phage nucleic acids background (nab); light blue, capture sequencing. f, hcov- -rpm (reads per million) sequenced plotted against qrt-pcr ct value for the clinical samples alleles with ≥ % frequencies were called. *, snvs verified by sanger sequencing. b, allele frequencies of the identified snvs. pink, amplicon; light blue, capture; blue, meta. minor allele frequencies detected in serial dilutions of the cultured isolate (c) and clinical samples (d) across methods. pink, amplicon vs meta; light blue, capture vs meta. minor alleles are defined with ≥ % and < % frequencies. besides general quality filter and human nucleic acids. pe , paired-end -nt reads; se , single-end -nt reads. key: cord- -ee xw r authors: moustafa, ahmed m.; planet, paul j. title: rapid whole genome sequence typing reveals multiple waves of sars-cov- spread date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ee xw r as the pandemic sars-cov- virus has spread globally its genome has diversified to an extent that distinct clones can now be recognized, tracked, and traced. identifying clonal groups allows for assessment of geographic spread, transmission events, and identification of new or emerging strains that may be more virulent or more transmissible. here we present a rapid, whole genome, allele-based method (gnuvid) for assigning sequence types to sequenced isolates of sars-cov- sequences. this sequence typing scheme can be updated with new genomic information extremely rapidly, making our technique continually adaptable as databases grow. we show that our method is consistent with phylogeny and recovers waves of expansion and replacement of sequence types/clonal complexes in different geographical locations. gnuvid is available as a command line application (https://github.com/ahmedmagds/gnuvid). rapid sequencing of the sars-cov- pandemic virus has presented an unprecedented opportunity to track the evolution of the virus and to understand the emergence of a new pathogen in near-real time. during its explosive radiation and global spread, the virus has accumulated enough genomic diversity that we are now able to identify distinct lineages and track their spread in distinct geographic locations and over time [ ] [ ] [ ] [ ] [ ] [ ] . phylogenetic analyses in combination with rapidly growing databases [ , ] have been instrumental in identifying distinct clades and tracing how they have spread across the globe, as well as estimating calendar dates for the emergence of certain clades [ ] [ ] [ ] [ ] . this information is extremely useful in assessing the impact of early measures to combat spread as well as identifying missed opportunities [ ]. going forward whole genome sequences will be useful for identifying emerging clones or hotspots of reemergence. 
in all of these efforts, identification of specific clones, clades, or lineages is a critical first step, and there are few systems available to do this [ ] . as of june st there are already , and , complete genomes (> , bp) available at gisaid [ ] and genbank [ ] , respectively. to address the problem of identifying sequence types in sars-cov- and leverage these huge datasets, we took inspiration from an approach used widely in bacterial nomenclature, multilocus sequence typing (mlst) [ ] . our panallelome approach to developing a whole genome (wgmlst) scheme for sars-cov- uses a modified version of our recently developed tool, whatsgnu [ ], to rapidly assign an allele number to each gene nucleotide sequence in the virus's genome, creating a sequence type (st). the st is codified as the sequence of allele numbers for each of the genes in the viral genome. here we show that this approach allows us to link sts into clearly defined clonal complexes (cc) that are consistent with phylogeny. we show that assessment of sts and ccs agrees with multiple introductions of the virus in certain geographical locations. in addition, we use temporal assessment of sts/ccs to uncover waves of expansion and decline, and the apparent replacement of certain sts with emerging lineages in specific geographical locations. we developed the gnu-based virus identification (gnuvid) system as a tool that automatically assigns a number to each unique allele of the open reading frames (orfs) of sars-cov- ( figure a) information. the majority of these alleles ( %) are for orf ab which represents % of the genome length ( figure a) . strikingly, the most abundant alleles of each orf (except orf ab) were present in at least % of the , isolates, and for orfs (orf a- ) the allele that was observed in the earliest genomes was also the most prevalent, suggesting strong nucleotide level conservation over time.
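The allele-numbering and sequence-typing idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GNUVID's actual implementation: the `AlleleDatabase` class, its method names, the ORF labels, and the toy sequences are all hypothetical, and the real tool works against a curated database of complete genomes.

```python
from typing import Dict, List, Tuple


class AlleleDatabase:
    """Toy allele/ST registry (hypothetical; not GNUVID's real data model)."""

    def __init__(self, orf_order: List[str]) -> None:
        # Fixed ORF order so every allelic profile has the same layout.
        self.orf_order = list(orf_order)
        # Per-ORF table: nucleotide sequence -> allele number.
        self.alleles: Dict[str, Dict[str, int]] = {orf: {} for orf in orf_order}
        # Allelic profile -> sequence type (ST) number.
        self.sequence_types: Dict[Tuple[int, ...], int] = {}

    def allele_number(self, orf: str, sequence: str) -> int:
        """Return the allele number for a sequence, registering it if new."""
        table = self.alleles[orf]
        if sequence not in table:
            table[sequence] = len(table) + 1
        return table[sequence]

    def sequence_type(self, genome: Dict[str, str]) -> Tuple[int, Tuple[int, ...]]:
        """Return (ST number, allelic profile) for a genome given as ORF -> sequence."""
        profile = tuple(self.allele_number(orf, genome[orf]) for orf in self.orf_order)
        if profile not in self.sequence_types:
            self.sequence_types[profile] = len(self.sequence_types) + 1
        return self.sequence_types[profile], profile


if __name__ == "__main__":
    # Made-up ORF names and sequences, for illustration only.
    db = AlleleDatabase(["orf1ab", "S", "N"])
    isolate_a = {"orf1ab": "ATGGAG", "S": "ATGTTT", "N": "ATGTCT"}
    isolate_b = {"orf1ab": "ATGGAG", "S": "ATGTTC", "N": "ATGTCT"}  # one S change
    print(db.sequence_type(isolate_a))
    print(db.sequence_type(isolate_b))
```

Because any change in any ORF yields a new allele number, a single nucleotide difference is enough to produce a new ST in this sketch, which mirrors the behavior the text attributes to the scheme.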
some widespread alleles corresponded to mutations that have been hypothesized to be important to the evolution or pathogenesis of the virus. for instance, for the s gene, the gene for the spike protein, % ( / ) of unique alleles have the a g (d g) mutation ( figure b ) that has been associated with the emergence of increased transmission whether through increased transmissibility [ ] isolates. the earliest sequenced, and most common, orf a allele that carries this mutation (allele ) was isolated in france on february st in a virus that also carries the a g mutation in the spike gene. to create an st for each isolate, gnuvid automatically assigned unique st numbers based on their allelic profile (supplementary table ). we then used a minimum spanning tree (mst) to group sts into larger taxonomic units, clonal complexes (ccs), which we define here as clusters of > sts that are single or double allele variants away from a "founder". using the goeburst algorithm [ , ] to build the mst and identify founders, we found ccs representing % ( / ) of all unique sts ( figure b) . when the global region of origin for each genome sequence was mapped to each cc there was a strong association of some ccs with certain geographical locations. for instance, genomes from ccs , , , , , , , , , , , , are predominantly from europe while genomes from ccs , and are mainly from asia ( figure b) . interestingly, genomes originating from the us appear to be associated with very divergent ccs, potentially reflecting two major introductions. the first, cc , is associated with locations on the west coast, specifically washington state. the first two isolates belonging to cc are from china followed by the first isolate from washington ( / / ). the second predominant us cc, cc , is closely related to other ccs found predominantly in europe ( figure b and c). isolates of cc were initially found and sequenced in europe, followed by the us east coast, and later in other us locations ( figure b) .
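The clonal-complex definition above (STs that are single or double allele variants of a founder) can be illustrated with a simplified sketch. This is not the goeburst algorithm, which builds a minimum spanning tree with tie-break rules; here the founder is simply assumed to be the ST with the most single-locus variants, and the function names and toy profiles are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def allele_distance(p: Profile, q: Profile) -> int:
    """Number of loci at which two allelic profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q))


def pick_founder(profiles: Dict[int, Profile]) -> int:
    """Simplified founder choice: the ST with the most single-locus variants.

    Ties are broken by the lowest ST number (an assumption of this sketch).
    """
    def slv_count(st: int) -> int:
        return sum(
            1
            for other, prof in profiles.items()
            if other != st and allele_distance(profiles[st], prof) == 1
        )

    return min(profiles, key=lambda st: (-slv_count(st), st))


def clonal_complex(profiles: Dict[int, Profile], founder: int, max_diff: int = 2) -> List[int]:
    """STs within `max_diff` allele differences of the founder (founder included)."""
    founder_profile = profiles[founder]
    return sorted(
        st for st, prof in profiles.items()
        if allele_distance(founder_profile, prof) <= max_diff
    )


if __name__ == "__main__":
    # Toy profiles over four loci; ST numbers and alleles are made up.
    toy = {
        1: (1, 1, 1, 1),
        2: (2, 1, 1, 1),
        3: (1, 2, 1, 1),
        4: (1, 1, 2, 1),
        5: (2, 2, 1, 1),
        6: (3, 3, 3, 3),
    }
    founder = pick_founder(toy)
    print(founder, clonal_complex(toy, founder))
```

A minimum cluster size, as used in the definition above, could be applied afterwards by discarding groups with too few members; the threshold is left out of the sketch.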
interestingly, almost all isolates ( %) from cc and its descendants ccs , , and ( figure b) carry the g t mutation in orf a, representing % of all isolates that carry this mutation; the other % are from sts that were not assigned ccs. cc is interesting for its geographic predominance in the middle east ( % from saudi arabia and turkey) and its close relationship to st and st , which are mostly found in the us. this may signal a transmission event from the us to the middle east. to show that ccs are mostly consistent with whole genome phylogenetic trees we produced a maximum likelihood tree and mapped the cc designations onto the tree. table ) . because we included collection dates for each genomic sequence, we can use sts and ccs to better understand the emergence and replacement of certain lineages in certain geographical regions over time. figure a shows temporal plots of the most common ccs around the world. this makes clear the emergence of new ccs over time such as cc , cc and cc . cc , the earliest cc, started by representing % of sequenced genomes in mid-january, but had dropped to only % by mid-march. of course, relative proportions of sts or ccs isolated and sequenced may be a highly biased statistic that is contingent upon where the isolate comes from, the decision to sequence its genome, and the local capacity to sequence a whole genome. interestingly, europe showed a general cc diversity over time resembling that of the worldwide temporal plot, and then showed expansion of the local cc and cc after mid-february (figure d) . the us plot (figure e ) reflects the two possible introductions on the west and east coasts from asia and europe, respectively, with the current dominance (more than %) of cc . focusing on washington, it is interesting to note the possible replacement of cc by cc perhaps by introduction from the east coast or europe ( figure f ) [ , ] . in new york, a different pattern is seen with cc being persistently dominant ( figure g ). 
however, a more granular view of sts in new york, not ccs, shows a shifting epidemiology with st declining and the rise of closely related slvs and dlvs of st ( figure h) . because a change in any allele creates a new st, our method may accumulate and count "unnecessary" sts that have been seen only once or may be due to a sequencing error. this is partially ameliorated by the use of the cc definition that allows some variability amongst the members of a group. a large number of sts also may allow more granular approaches to tracking new lineages. our method is also limited by the quality and extent of the database. for this implementation we limited the database to genomes that do not have any ambiguity or degenerate bases. however, these genomes could be queried through our tool to be assigned to the closest st/cc. another limitation is the stability of the classification system: some virus genomes may be reassigned to new ccs as clones expand epidemiologically, but this may also reflect a dynamic strength as circulating viruses emerge and replace older lineages. the genomic epidemiology of the , sars-cov- isolates studied here shows six predominant ccs circulated/circulating globally. our tool (gnuvid) allows for fast sequence typing and clustering of whole genome sequences in a rapidly changing pandemic. as illustrated above, this can be used to temporally track emerging clones or table the compressed genomes from our quality controlled dataset are available from the corresponding author and available online for download. the compressed database will be updated weekly on https://github.com/ahmedmagds/gnuvid. source code for gnuvid can be found in its most up-to-date version here, https://github.com/ahmedmagds/gnuvid, under the gnu general public license. the authors declare that they have no competing interests. we would like to thank lidiya denu and michael silverman for helpful comments and discussion.
we would like to thank the global initiative on sharing all influenza data (gisaid) and thousands of contributing laboratories for making the genomes publicly available. a full acknowledgements table is available as supplemental information. a dynamic nomenclature proposal for sars-cov- to assist genomic epidemiology a genomic survey of sars-cov- reveals multiple introductions into northern california without a predominant lineage the emergence of sars-cov- in europe and the us cryptic transmission of sars-cov- in washington state comprehensive genome analysis of , usa sars-cov- isolates reveals haplotype signatures and localized transmission patterns by state and by country global genetic diversity patterns and transmissions of sars-cov- gisaid: global initiative on sharing all influenza data -from vision to reality multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms whatsgnu: a tool for identifying proteomic novelty spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- severe acute respiratory syndrome coronavirus orf a protein activates the nlrp inflammasome by promoting traf -dependent ubiquitination of asc global optimal eburst analysis of multilocus typing data using a graphic matroid approach evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data a new coronavirus associated with human respiratory disease in china basic local alignment search tool phyloviz . 
: providing scalable data integration and visualization for multiple phylogenetic inference methods mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform iq-tree : new models and efficient methods for phylogenetic inference in the genomic era dating of the human-ape splitting by a molecular clock of mitochondrial dna ufboot : improving the interactive tree of life (itol) v : recent updates and new developments key: cord- -dw dlx authors: wohlers, inken; calonga-solís, verónica; jobst, jan-niklas; busch, hauke title: covid- risk haplogroups differ between populations, deviate from neanderthal haplotypes and compromise risk assessment in non-europeans date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dw dlx recent genome wide association studies (gwas) have identified genetic risk factors for developing severe covid- symptoms. the studies reported a bp insertion rs on chromosome and furthermore two single nucleotide variants (snvs) rs and rs , all three correlated with each other. zeberg and pääbo subsequently traced them back to neanderthal origin. they found that a . kb genomic region including the risk allele of rs is inherited from neanderthals of vindija in croatia. here we add a differently focused evaluation of this major genetic risk factor to these recent analyses. we show that (i) covid- -related genetic factors of neanderthals deviate from those of modern humans and that (ii) they differ among world-wide human populations, which compromises risk prediction in non-europeans. currently, caution is thus advised in the genetic risk assessment of non-europeans during this world-wide covid- pandemic. in general, gwas relate genotypes to phenotypes such as disease susceptibility and severity. however, association does not imply causality. to pinpoint causal variant(s) underlying a gwas association signal which typically comprises many correlated variants, a so-called fine-mapping is performed in a first step. 
and ultimately, fine-mapping must be followed by experimental validation to eventually identify causal variant(s) and mechanisms. while gwas are based on cohort data, a personal risk can be assessed nonetheless, via associated variants as proxies for causal variants. for this, the cohort's genetic linkage patterns need to be representative of the neanderthal study, with an overall maximum probability to include a causal variant of %. however, as two positions carry protective alleles the risk probability of the previously assessed vindija neanderthal haplotype is only %. our results presented here complement the haplotype-based assessment of zeberg and pääbo. we use the same genomes data as in the original study, but with three important differences: (i) we investigate haplotypes within a larger genomic frequency of % in the whole dataset and variable frequencies from to % in different continental populations (fig. a) . eight haplogroups, h -h , have counts higher than and the most common is the protective haplotype h . risk haplotypes h -h tend to differ between continental populations (fig. a) . for them, covid- genetic risk probability varies substantially between and %. the high risk haplogroups h , h and h differ by one or two alleles, and differ from the low risk haplogroups h , h and h , all of which are similar to the protective haplogroup h (fig. b) . however, individuals carrying a risk haplogroup very dissimilar from neanderthal haplotypes may still carry a causal variant (fig. c) ; this holds particularly true for africans with haplogroups h or h ( % or % probability) and for asians with haplogroup h ( % probability). haplogroup h has the highest risk probability and is the most common risk haplogroup in europeans and americans (fig. b) . all human risk haplogroups differ from neanderthal haplotypes (fig.
c) if this variant was causal ( % probability) using lead variants such as rs or rs would incorrectly classify individuals carrying these haplogroups to be at risk. this applies to few europeans, but mostly to non-europeans. in conclusion we find that classification into high and low covid- risk is extremely error-prone in non-european populations, if this assessment is based on currently known european risk variants and probabilities. the risk haplogroup diversity observed across populations thus compromises risk assessment in non-europeans. this situation is currently improved by performing ancestry-matched gwas in non-using complementary, e.g. experimental approaches will help in the process. these diverse systems genetics efforts will eventually converge into genetic causes and corresponding molecular mechanisms that explain non-environmental variation in covid- severity. severe covid- gwas group et al. genomewide association study of severe covid- with respiratory failure covid- host genetics initiative. the covid- host genetics initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the sars-cov- virus pandemic the major genetic risk factor for severe covid- is inherited from neanderthals finemap: efficient variable selection using summary data from genome-wide association studies genomes project consortium et al. a global reference for human genetic variation we thank the covid- host genetics initiative for publicly releasing gwas summary statistics. iw and hb acknowledge funding by the deutsche key: cord- - i efdp authors: sidhom, john-william; baras, alexander s. title: analysis of sars-cov- specific t-cell receptors in immunecode reveals cross-reactivity to immunodominant influenza m epitope date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: i efdp adaptive biotechnologies and microsoft have recently partnered to release immunecode, a database containing sars-cov- specific t-cell receptors derived through mira, a t-cell receptor (tcr) sequencing-based approach to identify antigen-specific tcrs. herein, we query the extent of cross reactivity between these derived sars-cov- specific tcrs and other known antigens present in mcpas-tcr, a manually curated catalogue of pathology-associated tcrs. we reveal cross reactivity between sars-cov- specific tcrs and the immunodominant influenza gilgfvftl m epitope, suggesting the importance of further work in characterizing the implications of prior influenza exposure or co-exposure to the pathology of sars-cov- illness. we first examined the distribution of tcrs within the mcpas database over the types of pathogens present in the database and cross-referenced the sars-cov- specific tcrs into the mcpas database ( figure a) , and we noted that there was a statistically significant enrichment (from . % to . %) of sars-cov- specific tcrs that had known specificity to the immunodominant m gilgfvftl epitope. we then examined the distribution of tcrs within the immunecode database across the various open reading frames (orfs) and mapped the m specific tcrs within this database ( figure b) . we further noted that there was a statistically significant enrichment (from . % to . %) of cross-reactive sars-cov- specific tcrs within the surface glycoprotein orf. to better characterize the sars-cov- epitopes these tcrs were responsive to, we visualized the number of unique tcrs per epitope in the immunecode database and noted a high level of specificity to a stretch of overlapping epitopes within the surface glycoprotein orf ( figure c ).
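The enrichment comparisons above are reported with Fisher's exact test in the figure legend. As a rough illustration of what such a test computes, the sketch below implements a one-sided (enrichment) Fisher's exact test with only the standard library. The function name and the 2x2 counts are hypothetical and are not taken from the immunecode or mcpas-tcr data, and the sidedness used in the original analysis is not stated here, so the one-sided form is an assumption.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment.

    Table layout (counts):
                      epitope-specific   other
        query TCRs           a             b
        background TCRs      c             d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # size of the query set
    col1 = a + c          # total epitope-specific TCRs
    total = a + b + c + d
    denom = comb(total, col1)
    tail = 0
    for x in range(a, min(row1, col1) + 1):
        # Ways to place x epitope-specific TCRs in the query set
        # and the remaining col1 - x in the background set.
        tail += comb(row1, x) * comb(total - row1, col1 - x)
    return tail / denom


if __name__ == "__main__":
    # Made-up counts for illustration only.
    p = fisher_exact_greater(12, 88, 30, 870)
    print(f"one-sided p-value: {p:.3g}")
```

When many epitopes or orfs are tested, the resulting p-values would still need a multiple-testing correction, as the legend notes.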
we observed that these highly homologous tcrs (visualized by seqlogo) mapped to a set of overlapping epitopes containing an snvt motif, suggesting that this repertoire of t-cells is capable of recognizing both the m epitope and this sars-cov- -derived epitope. finally, we found that these cross-reactive tcrs were present in both individuals who were and were not exposed to sars-cov- ( figure d ). in this brief analysis, we have identified a set of tcrs with known specificities to both a sars-cov- and influenza antigen. the implications of these findings are relevant in understanding the nature of the immune response to sars-cov- as the clinical course is highly variable with patients presenting anywhere from being asymptomatic to requiring critical care (goyal et al. ) . further insights into the evolution of the immune response could help guide early triage of individuals who are more likely to require more intensive support. however, there are several limitations of this study. the number of individuals in this study was fairly small with exposed and non-exposed individuals. given the lack of power in this analysis, it was difficult to assess for any differences in the extent of cross reactivity in exposed vs non-exposed individuals. differences in cross reactivity may shed light on the role of influenza specific antigen-experienced t-cells in the control of sars-cov- infection. furthermore, given these results, future studies may want to consider comparing the clinical course in influenza vaccinated vs unvaccinated individuals. another significant limitation is assessing the certainty of antigen-specificity of a given tcr. often, specificity of a tcr is determined in a high throughput fashion through methods such as tetramer sorting or t-cell expansion based assays, which can be inherently noisy and non-specific sources of data (sidhom et al., n.d.) .
this problem is highlighted in figure a when we compare the distribution of unique tcr sequences in the mcpas-tcr database vs the sampled distribution of sars-cov- specific tcrs. while there is clear enrichment in the m epitope, the np epitope shows a large number of seemingly cross-reactive tcrs, but its enrichment over the background distribution within mcpas-tcr is not significant, suggesting that this may be nonspecific signal within the mira assay. in light of these limitations, t-cell recognition assays such as elispot would be needed moving forward to validate these results and confirm that influenza specific t-cells are truly capable of also recognizing sars-cov- specific epitopes. finally, while these sars-cov- epitopes may elicit in-vitro responses, if they are not processed and presented in-vivo, influenza specific t-cells would likely not have any ability to recognize sars-cov- infected cells. thus, further work is required not only to confirm that these influenza specific t-cells are cross-reactive to these sars-cov- epitopes but also to confirm their in-vivo processing and presentation. in conclusion, while these results are preliminary in a small cohort of individuals, we have identified a set of tcrs known to recognize both an immunodominant epitope derived from influenza and epitopes derived from sars-cov- , suggesting that immune control of one infection may play a role in the control of the other. these results justify further study into evolution of the immune response to sars-cov- in the setting of prior or coexisting influenza infection. all data and code used to conduct this analysis are available at https://github.com/sidhomj/covid with included jupyter notebook to reproduce all analyses and figures present in this manuscript. figure .
a) t-cell receptor sequences for pathogenic epitopes were collected from mcpas-tcr and their distribution by pathogen was plotted (left) along with the distribution of sars-cov- specific tcrs (middle) that had exact sequence match to tcr sequences in the mcpas-tcr database. both counts and proportions for each pathogen is shown. the magnitude of enrichment is visualized (right) for each mcpas-tcr epitope (fisher's exact test: * : p-val < . after multiple hypothesis testing correction) b) distribution of sars-cov- tcrs from immunecode (left) and m specific (middle) tcrs were visualized across all open reading frames (orfs). the magnitude of enrichment is visualized (right) for each orf (fisher's exact test: *** : p-val < . after multiple hypothesis testing correction). c) distribution of unique tcrs specific for m gilgfvftl across sars-cov- epitopes (top). select overlapping epitopes were highlighted showing a shared snvt motif and corresponding specific tcrs were represented with a seqlogo (bottom). d) distribution of unique tcrs specific for m gilgfvftl across individuals stratified by whether the individual was considered exposed or non-exposed to sars-cov- . clinical characteristics of covid- in new york city multiplex identification of antigen-specific t cell receptors using a combination of immune assays and immune receptor sequencing molecular immune pathogenesis and diagnosis of covid- clinical infectious diseases: an official publication of the infectious diseases society of america covid- infection: the perspectives on immune responses deeptcr: a deep learning framework for understanding t-cell receptor sequence signatures within complex t-cell repertoires mcpas-tcr: a manually curated catalogue of pathology-associated t cell receptor sequences functional exhaustion of antiviral lymphocytes in covid- patients key: cord- - dfiwo authors: paris, kristina a.; santiago, ulises; camacho, carlos j. 
title: loss of ph switch unique to sars-cov supports unfamiliar virus pathology date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dfiwo cell surface receptor engagement is a critical aspect of viral infection. this paper compares the dynamics of virus-receptor interactions for sars-cov (cov ) and cov . at low (endosomal) ph, the binding free energy landscape of cov and cov interactions with the angiotensin-converting enzyme (ace ) receptor is almost the same. however, at neutral ph the landscape is different due to the loss of a ph-switch (his lys) in the receptor binding domain (rbd) of cov relative to cov . namely, cov stabilizes a transition state above the bound state. in situations where small external strains are applied by, say, shear flow in the respiratory system, the off rate of the viral particle is enhanced. as a result, cov virions are expected to detach from cell surfaces in time scales that are much faster than the time needed for other receptors to reach out and stabilize virus attachment. on the other hand, the loss of this ph-switch, which sequence alignments show is unique to cov , eliminates the transition state and allows the virus to stay bound to the ace receptor for time scales compatible with the recruitment of additional ace receptors diffusing in the cell membrane. this has important implications for viral infection and its pathology. cov does not trigger high infectivity in the nasal area because it either rapidly drifts down the respiratory tract or is exhaled. by contrast, this novel mutation in cov should not only retain the infection in the nasal cavity until ace -rich cells are sufficiently depleted, but also require fewer particles for infection. this mechanism explains observed longer incubation times, extended period of viral shedding, and higher rate of transmission. 
these considerations governing viral entry suggest that number of ace -rich cells in human nasal mucosa, which should be significantly smaller for children (and females relative to males), should also correlate with onset of viral load that could be a determinant of higher virus susceptibility. critical implications for the development of new vaccines to combat current and future pandemics that, like sars-cov , export evolutionarily successful strains via higher transmission rates by viral retention in nasal epithelium are also discussed. although accurate assessments are still evolving, reports from the world health organization indicate that infection with sars-cov- (cov ) is significantly different relative to infection with previous respiratory viruses. the biggest distinctions are the longer incubation period and increased viral shedding (both likely responsible for higher rates of transmission), strong correlation of infected-fatality rates (ifr) with age and comorbidities, higher ifr for males relative to females, and minimal impact in children. as much as % of deaths from cov relate to cardiovascular complications (akhmerov and marban, ) , while cellular and animal models have also revealed inappropriate inflammatory responses along with high chemokine production (blanco-melo et al., ) . using thermodynamic, kinetic and molecular modeling, we explored potential reasons for these cov -unique aspects and aimed to identify determinants of its complex pathology. while it stands to reason that some of the answers may be found in the complex genomic changes triggered by the virus in cells, tissues, and organs, longer incubation and infectivity time scales also suggest that differences could have a biophysical origin. 
work on sars-cov (cov ) has already determined that the virus enters cells via receptor-mediated endocytosis in a ph-dependent manner (wang et al., ) that is characterized by co-translocation of the viral spike glycoprotein and its specific functional receptor, the angiotensin-converting enzyme (ace ), from the cell surface to early endosomes. key steps that control the fate of the virus in the early and late endosome are driven in part by lowering the ph from . -to- . and from . -to- . (bui et al., ) , respectively; exposure to low ph triggers a spike catalyzed fusion between the viral and endosomal membranes followed by viral genome release. the cell machinery is then hijacked to replicate and assemble new virus particles that eventually exit the cell, primarily through budding. broadly speaking, this is the same ph-dependent endocytic path followed by the influenza virus (qin et al., ; yamauchi, ) , and likely all other coronaviruses (including mers-cov). while infections by cov and cov are mediated by the ace receptor, mers-cov (mers) gains entry to cells through dpp (raj et al., ) . figure shows structures for the receptor-binding domains (rbd) in complex with their receptors for all three viruses. surprisingly and likely significantly, while the rbd of cov (li et al., ) and mers (wang et al., ) have one histidine on opposite ends of their binding interface, cov (wang et al., ) does not have his residues in this domain. as far as we have been able to determine to date, all known strains of cov have mutated away their last his residue that is still present in the rbd of cov /mers and related zoonotic viruses (see below). because cell surface receptor engagement is a critical aspect of viral infection and life cycle, and sensing ph is relevant for both viral replication and regulation of histidine protonation, we set out to decipher the mechanistic role of the remaining his residue that distinguishes the rbds of cov and cov . results ph-switch.
at low (endosomal) ph, the binding free energy landscape of cov and cov interactions with their ace receptor is almost the same. this is important because low ph is critical for the activation of the spike fusion in late endosomes (martin and helenius, ) . in particular, his in ace (fig. ) , located at the core of the rbd binding interface, should play a key role in this process. indeed, we predict based on the co-crystal structures that his + should readily form a hydrogen bond network that stabilizes the rbd/ace complex in both cov and cov (fig. ) . thus, loss of the ph-switch in cov has no impact on the low ph bound conformation with ace . on the other hand, at neutral ph the landscape is different due to the loss of the ph-switch (his lys) in the rbd of the spike protein of cov relative to cov . we studied this loss by performing three independent unconstrained molecular dynamics simulations (mds) of cov pdb dd (prabakaran et al., ) and cov from the receptor (apo) in pdb lzg at both physiological and low ph ~ . conditions (see methods in supplementary information). in practice, lowering the ph protonates the his residue from neutral to positively charged (his + ) by the addition of an extra hydrogen. of note, although not studied here, the ph-switch in the rbd of mers has previously been observed in ph-dependent crystal structures (zhang et al., ) . mds revealed two distinct regions in the binding interface of cov : (a) a loop motif (f spdgkpctppalncy ) that for his and his + shows mostly not-bound-and bound-like conformations, respectively; and, (b) the rest of the binding interface that consistently adopts bound-like conformations that are independent of ph ( fig. a-c) . figure a -b shows the dominant clusters observed in the his and his + simulation of cov , as well as the ph-independent simulation of cov . 
figure c -d shows detailed analyses of the corresponding root-mean-square deviations (rmsds) of these two regions relative to the co-crystals as a function of time (additional mds are shown in fig. s ). the plots include the equilibration time (between - ns) in order to emphasize that the distinct trajectories were not biased by different initial conformations. remarkably, deletion of the ph switch in cov that replaces the last histidine, his , in cov with lys in cov generates a motif, which includes t eiyqag , that yields almost exclusively bound-like conformations (fig. d); i.e., cov is always ready to bind ace . it is interesting to note that the effect of both lys in cov and his + in cov ( fig. c -d) is to stabilize the bound-like state. implications of these findings in the binding free energy landscape are sketched in fig. . namely, at low (late endosomal) ph, the landscapes of cov and cov interactions with the ace receptor are very similar. however, at neutral ph the landscape is different due to the loss of the ph switch in the rbd of cov relative to cov . specifically, the not-bound-like conformations of the ph-dependent loop in cov stabilize a higher free energy transition state, whereas the persistent bound-like behavior of cov yields a much tighter bond. dynamics of virus-receptor interactions for cov and cov . what are the implications of this transition state found in cov but not in cov at neutral ph? to answer this question, we need to consider that viral particles in the respiratory system are under small external strains from, e.g., shear flow in the respiratory airways, when engaging cell surface receptors. the cell mechanics of these interactions can be described by applying the reaction-limited kinetics of membrane-to-surface adhesion and detachment first envisioned by dembo et al. (dembo et al., ) .
specifically, one can write the free energy of the bound-state (bs) ∆ #$ under a tensile force as a function of the cell-cell gap width l, a constant binding free energy ∆ #$ & and a "spring" energy such that where #$ is the equilibrium length for the bonded state. a similar equation holds for the free energy of the transition state (ts) ∆ .$ at the same cell-cell gap, this treatment assumes equilibrium between bonded and de-bonded states, so there must be a very slow "ramp" rate for the force of pulling or pushing. if these conditions apply, the equilibrium constant for bond formation can be written as where # is the boltzmann factor. note that ( ) = ∆ / : = cd / cee . according to arrhenius theory, the de-bonding rate constant at a given gap-width can then be written as the mechanical or structural basis of the rbd/receptor interactions in fig. s can be characterized as a "door-knob" type junction across the gap, as opposed to a gripping or fish-hook bound state. the knob interactions with the receptor entailed two characteristic patches, a large bound-like domain and the smaller switching loop (fig. ) , which one could model with spring constants k #i and k iccj , respectively (see red and yellow surfaces in fig. s ). then, the elastic constant of the bound-and transition-state can be written as k #$ ≈ k #i + k iccj and k .$ ≈ k #i , respectively, with the equilibrium rest lengths being essentially the same, i.e., l #$ = .$ = l . thus, virus detachment to the transition state corresponds to an ideal case of the theory, where the only allowed change between the bonded and transition state is in the spring constants, such that it is clear that the spring approximation should only apply for small deformations. in general, the springs across the gap undergo a "twisting" motion around its long-axis to reach the transition state from the bonded state. the motion will increase "tightness" of the spring if (k #$ − k .$ ) < , which defines a catch-bond. 
here, however, cov always loosens tightness, $(k_{bs} - k_{ts}) = k_{loop} > 0$, corresponding to a typical slip-bond whose lifetime is shortened by tensile forces acting in the bond. free energy landscapes and estimates of bond detachment for cov /cov and ace . we use the fastcontact server (champ and camacho, ) to compute the electrostatic and desolvation binding free energies of the bound and transition states using co-crystal structures and chimeras that incorporate changes triggered by low ph structures. entropies coupled to the unbound state could be somewhat higher for cov relative to cov due to the larger conformational entropy associated with the switching loop in fig. . other error bars are correlated since interactions are very similar such that ∆∆ ′ have an error bar of ± / . absolute free energies need to account for size-dependent configurational and vibrational entropy changes upon binding, which for high affinity protein-protein complexes have been estimated to be anywhere between -to- kcal/mol. however, for flat and rather superficial complexes such as those here (fig. s ) , the entropy loss could be much lower. finally, the pre-exponential factor $k_{off}$ (eq. ) in the absence of a transition state and at equilibrium is exactly $k_{on}$ at m concentration, which for diffusion-limited association can be approximated by $k_{on}$ ~ \ s - (camacho et al., ) . (fig. a-c) . experimental equilibrium binding free energy is from (walls et al., ) . free energy estimates of bound complexes are fully consistent with experimental data (walls et al., ) ; alternative measurements have suggested a fold weaker binding (shang et al., ) . these differences will only re-scale $k_{off}^{0}$ by a factor of but will not be significant to our conclusions. the key observation is that cov stabilizes a transition state by about . / above the bound state. 
as a result, small external strains applied by, say, shear flow in the respiratory system enhance the off rate of the viral particle as shown in eq. . thus, cov virions are expected to detach from cell surfaces in faster time scales. these binding free energy estimates are depicted in the landscapes in fig. . only at physiological ph should the landscape of cov display a ph-dependent transition state. other bonds are expected to break in an all-or-none type of transition. optimal dwelling times and endocytosis. in principle, the ph-switch in cov could provide a natural mechanism to optimize virus internalization. namely, cov is expected to "bounce around" cell surfaces many times before cell entry. if the density of receptors is high enough, a "stick-and-slip" approach could be an efficient mechanism to find clusters of receptors randomly distributed on the cell surface. on the other hand, if only a small number of cell surface receptors are available, then receptor diffusion will be the limiting step to accrue the critical number of receptors needed for endocytosis, and longer rbd/ace dwelling times will be required. of note, tighter binding to ace would also make it easier for a smaller number of cov particles to establish an initial foothold in the respiratory system compared to weak binding where particles could be exhaled out. broad estimates of "high" concentration, e.g., in the range of -to- , receptors, yield an average separation between receptors ~ . − . µ (see fig. b ) that is larger than the diameter of the virus ~ . µ . thus, after attachment of the first spike to its receptor, recruitment of a second receptor to stabilized virus attachment will be limited by other receptors circulating in the cell membrane. lateral protein diffusion in cell membranes is length-scale dependent, varying between ~ . (kusumi et al., ) for - nm and > nm, respectively. 
thus, diffusion time scales to bring two receptors into close proximity for the above length scales are ~ − . it is noteworthy that the number of surface receptors in cells have an upper limit of about , , which in the respiratory airways could limit infectivity to dwelling times of about cee &~ v (or ~ \ v ) based on ~ . µ (fig. ) . viral infection and its pathology. the lifetime of cov rbd/ace bonds at physiological ph (~ s) is marginally short-lived for efficiently triggering endocytosis, even at high ace receptor concentrations. as a result, cov virions are expected to detach from cell surfaces in time scales that are much faster than the time needed for other receptors to reach out and stabilize virus attachment. and for human nasal goblet cells, it will be significantly worse since, after each bounce, particles will be biased by gravity to either diffuse down the respiratory tract or be exhaled, where they will not find significant amounts of ace receptors until reaching lung alveolar epithelial cells (hamming et al., ) . on the other hand, deletion of the ph switch allows cov to have rbd/ace bonds with dwelling times of about ~ s, commensurate with the diffusion time scales needed to recruit enough ace receptors to trigger endocytosis. this mechanism implies that, for the most part, cov will not co-localize in the nasal cavity. this prediction is consistent with cov being mainly a lower respiratory tract disease, causing complications that include acute respiratory distress (ding et al., ; hamming et al., ) . viral replication in human mucous gland cells will release viruses back into the same area where they can infect new cells until the supply of ace receptors is depleted below the critical threshold needed for binding and internalization. this process will trap viral particles in the upper respiratory tract, naturally leading to longer incubation times. 
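the comparison made above between bond dwelling time and the time needed to recruit a second receptor can be sketched with two back-of-the-envelope estimates: the mean spacing of receptors spread uniformly over the cell surface, and the two-dimensional diffusion time over that spacing. the uniform-density assumption, the function names and the example numbers are all illustrative and are not values from this study.

```python
import math


def mean_receptor_spacing(surface_area_um2, n_receptors):
    """Average centre-to-centre spacing (um) for receptors spread
    uniformly over a surface of the given area (um^2)."""
    return math.sqrt(surface_area_um2 / n_receptors)


def diffusion_time_2d(distance_um, diffusion_coeff_um2_per_s):
    """Time (s) for 2D diffusion to cover a distance, from <r^2> = 4 D t."""
    return distance_um ** 2 / (4.0 * diffusion_coeff_um2_per_s)


def bond_outlives_recruitment(dwell_time_s, recruitment_time_s):
    """True if the first bond is expected to survive long enough for a
    second receptor to diffuse into contact."""
    return dwell_time_s >= recruitment_time_s


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    spacing = mean_receptor_spacing(surface_area_um2=1000.0, n_receptors=10000)
    t_recruit = diffusion_time_2d(spacing, diffusion_coeff_um2_per_s=0.01)
    print(spacing, t_recruit, bond_outlives_recruitment(1.0, t_recruit))
```

because the recruitment time scales with the square of the spacing, halving the receptor density doubles the time a bond must survive, which is why dwelling time matters most at low receptor counts.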
similarly, accumulation of viral particles in the nasal mucosa will lead to extended periods of viral shedding. of note, since viral transit to the lower respiratory tract will be significantly slower for cov relative to cov , this period of higher infectivity rates could be for the most part mediated by asymptomatic individuals. based on our findings, incubation times should correlate with the number of ace -rich cells in the nasal area. it is important to note that children do not have well-developed sinuses until adolescence (henson et al., ) . thus, large areas for viral replication will not be available in children, resulting in shorter incubation times due to the faster diffusion down to the lower respiratory tract. something similar could apply to females, who have smaller nasal cavities relative to males (samolinski et al., ) . shorter times in the nasal cavity would lead to a lower viral load in the upper airways and could explain the lower transmission and milder symptoms that are observed in children, as well as the lower ifr in adult women relative to men. our proposed mechanism is also consistent with the reported loss of sense of smell (anosmia) that may occur by day of a cov infection (speth et al., ) , as cells in the nasal cavities support olfactory mucous membranes needed for the perception of smell. proximity to the brain also suggests that cov infections could impact the brain in ways that other sars viruses cannot. moreover, cardiovascular and immunological complications triggered by cov could also be explained on the basis of long-term insult of endothelial cells by viral sequestration of the ace receptor (gurley and coffman, ) . ph-switch across species. further support for the observation that cov is unique among coronaviruses is shown in table , which compares sequence alignments of ph domains in the rbds of cov , cov and mers, as well as other closely related zoonotic viruses. 
cov , and related coronaviruses in one species of pangolin and some bats do not share the ph-switch present in cov ; instead, they share the lys stabilization motif. however, these zoonotic viruses still have ph-switches that co-localize next to the ph-switch in the rbd of the mers structure (fig. ) . interestingly, different bat-infecting strains show putative ph-switches that are closer in both sequence and structure to either mers or pangolin-associated coronavirus. while we have not yet found the species or strain where the loss of the ph-switch first occurred, these relationships point to the possible zoonotic origin of cov as well as evolutionary pressures to preserve the ph-switch. it is noteworthy that dpp , the receptor of the mers rbd, is not found in nasal epithelial cells (meyerholz et al., ) . outlook. this newly discovered difference in protein sequence in the receptor binding domain of the spike glycoprotein and its impact on receptor binding reveals a mechanism that allows sars-cov internalization to take advantage of the high expression of ace in the nasal epithelium, resulting in increased retention times in the upper respiratory tract and augmented infectivity. this mechanism reconciles observed epidemiological traits and pathologies specific to sars-cov and explains differences with those associated with sars-cov, which due to its stick-and-slip ph-switch is unable to efficiently undergo endocytosis in the nasal cavity. sars-cov also has a higher infected-fatality rate than sars-cov . while the evolutionary advantage of higher infectivity by sars-cov in the nasal area is clear, this property comes at the expense of an important regulatory mechanism that would have allowed this virus to more readily move in other organs and tissues. in fact, the life-cycle of sars-cov is significantly slower than that of sars-cov because cov is essentially immobilized at its initial cell receptor contact. 
thus, it seems unlikely that the diffusion-limited recruitment of ace receptors affecting the virus in the respiratory airways would also be the limiting step in tissues. after internalization, the virus is encapsulated in a vesicle supported by rbd/ace complexes. the actual final number of complexes in each vesicle should vary above a given threshold, though not much is known about the details of this process. contrary to sars-cov, cov complexes would be expected to have greater difficulty slipping and breaking. it is not difficult to imagine that for vesicles compressed by an excess of receptors the fusion with the early endosome might be hindered, hosting a population of viruses that could stay latent or activate at much later times. this simple mechanism could underlie the still anecdotal evidence for infection recurrence (chen et al., ) , as well as extremely long-term viral shedding. collectively, our studies provide insight pertinent to the molecular basis of viral infectivity and, at the same time, validate this form of thermodynamic and molecular modeling as an approach to probe the evolution of the next sars-mediated pandemic. from a therapeutic perspective, our findings linking viral pathology with long-term viral infection/retention in nasal epithelium of the upper respiratory tract suggest that vaccine development should not just concentrate on fighting systemic infection through induction of igg responses, but should instead aim to elicit high titers of secretory iga antibodies capable of neutralizing the virus in the nasal mucosa. therefore, intranasal delivery of a vaccine with strong iga producing potential is a logical approach to consider as the next step in countering the current and future pandemics that, like sars-cov , export evolutionarily successful strains via higher transmission rates. surface representation of the co-crystals reveals two characteristic lobes with flat and mostly superficial contacts. 
yellow surface corresponds to the ph-switch loop and red surface indicates the remainder of the binding interface.

references.
covid- and the heart
imbalanced host response to sars-cov- drives development of covid-
effect of m protein and low ph on nuclear transport of influenza virus ribonucleoproteins
kinetics of desolvation-mediated protein-protein binding
fastcontact: a free energy scoring tool for protein-protein complex structures
recurrence of positive sars-cov- rna in covid- : a case report
the reaction-limited kinetics of membrane-to-surface adhesion and detachment
the clinical pathology of severe acute respiratory syndrome (sars): a report from china
angiotensin-converting enzyme gene targeting studies in mice: mixed messages
tissue distribution of ace protein, the functional receptor for sars coronavirus. a first step in understanding sars pathogenesis
anatomy, head and neck, nose sinuses
paradigm shift of the plasma membrane concept from the two-dimensional continuum fluid to the partitioned fluid: high-speed single-molecule tracking of membrane molecules
structure of sars coronavirus spike receptor-binding domain complexed with receptor
transport of incoming influenza virus nucleocapsids into the nucleus
dipeptidyl peptidase distribution in the human respiratory tract: implications for the middle east respiratory syndrome
structure of severe acute respiratory syndrome coronavirus receptor-binding domain complexed with neutralizing antibody
real-time dissection of dynamic uncoating of individual influenza viruses
dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc
changes in nasal cavity dimensions in children and adults by gender and age
structural basis of receptor recognition by sars-cov-
single-particle tracking of immunoglobulin e receptors (fcepsilonri) in micron-sized clusters and receptor patches
otolaryngol head neck surg
function, and antigenicity of the sars-cov- spike glycoprotein
sars coronavirus entry into host cells through a novel clathrin- and caveolae-independent endocytic pathway
structure of mers-cov spike receptor-binding domain complexed with human receptor dpp
structural and functional basis of sars-cov- entry by using human ace
quantum dots crack the influenza uncoating puzzle
structural definition of a unique neutralization epitope on the receptor-binding domain of mers-cov spike glycoprotein
the protein data bank
the pymol molecular graphics system, version . r pre, schrödinger, llc
routine microsecond molecular dynamics simulations with amber on gpus. . generalized born
routine microsecond molecular dynamics simulations with amber on gpus. . explicit solvent particle mesh ewald
simmerling. ff sb: improving the accuracy of protein side chain and backbone parameters from ff sb
development and testing of a general amber force field
software for processing and analysis of molecular dynamics trajectory data
vmd - visual molecular dynamics

acknowledgements. this work was supported by nih gm , ns and

protonated state of h + is predicted to form stable h-bond network with d in ace , and y and y in (a) cov and (b) cov , respectively. h-bond network is based on rotamers already observed in co-crystals of the unprotonated forms: (c) pdb ajf for cov , and, (d) pdb lzg for cov (see also pdb m j). unprotonated co-crystal structures of cov assigned h with a ndh making a bond with backbone oxygen of d , which is already making a bond in the α-helix. our mds indicate that even in the unprotonated form the rotamer should be rotated o having nd and ne more readily interacting with d and y (as shown in panel d).

figure . role of ph switch in cov relative to cov . (a) and (b) show the same co-crystals as in fig. superimposed with centroid of largest . Å rmsd cluster of conformations of ph-switching loop from mds shown in (c) for cov and (d) for cov . also indicated is the size of the corresponding cluster relative to simulation time. 
(c) root-mean-squared-deviation (rmsd) of amino acid loop as a function of time that switches between not-bound-like (~ . Å) to bound-like (~ . Å) relative to co-crystal (pdb ajf), for his and his + , respectively; (d) same analysis for cov homologous loop shows most conformations under Å rmsd relative to pdb lzg. binding interface, not including loop, stays in a bound-like conformation for % of the simulation time for both (e) cov and (f) cov . atomic coordinates for starting structures were acquired from the protein data bank [ ] : dd was used for cov rbd and lzg was used for the cov rbd. the rbd from dd (bound to neutralizing antibody) was chosen instead of that in pdb id ajf (bound to ace ) as a starting structure as it includes otherwise missing portions of the domain. modification of his to different tautomeric or protonation states was done with pymol's mutagenesis wizard [ ] . molecular dynamics simulations (mds) were carried out with pmemd.cuda from amber [ ] [ ] [ ] using amber ff sb force field [ ] and generalized amber force field (gaff) [ ] . we used tleap binary (part of amber ) for solvating the structures in a cubed tip p water box with a Å distance from structure surface to the box edges, and closeness parameter of . Å. the system was neutralized and solvated. simulations were carried out after minimizing the system, gradually heating the system from k to k over ps, and equilibrating the system for ns at npt. ns of production was then carried out using npt at k with the langevin thermostat, a non-bonded interaction cut off of Å, time step of fs, and the shake algorithm to constrain all bonds involving hydrogens. clustering was completed using cpptraj [ ] and h-bond and rmsd calculations were done with vmd [ ] . all figures were drawn using pymol [ ] and gnuplot. 
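the loop rmsd traces described in the figure legend, and the fraction of simulation time spent in a bound-like state, can be illustrated with a minimal sketch. it assumes each frame has already been superposed on the reference structure (the study used vmd for its rmsd calculations; this stand-alone version is only an illustration), and the cutoff separating bound-like from not-bound-like frames is a hypothetical parameter.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) atom positions. No superposition is performed here."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum((x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
                  for (x, y, z), (rx, ry, rz) in zip(coords, reference))
    return math.sqrt(squared / len(reference))


def bound_like_fraction(trajectory, reference, cutoff):
    """Fraction of frames whose RMSD to the reference is within the cutoff
    (cutoff is a hypothetical threshold in the same units as the coords)."""
    hits = sum(1 for frame in trajectory if rmsd(frame, reference) <= cutoff)
    return hits / len(trajectory)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    displaced = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
    print(rmsd(displaced, reference))
    print(bound_like_fraction([reference, displaced], reference, cutoff=2.0))
```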
key: cord- -vniokol authors: pontes, camila; ruiz-serra, victoria; lepore, rosalba; valencia, alfonso title: unraveling the molecular basis of host cell receptor usage in sars-cov- and other human pathogenic β-covs date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vniokol the recent emergence of the novel sars-cov- in china and its rapid spread in the human population has led to a public health crisis worldwide. like in sars-cov, horseshoe bats currently represent the most likely candidate animal source for sars-cov- . yet, the specific mechanisms of cross-species transmission and adaptation to the human host remain unknown. here we show that the unsupervised analysis of conservation patterns across the β-cov spike protein family, using sequence information alone, can provide rich information on the molecular basis of the specificity of β-covs to different host cell receptors. more precisely, our results indicate that host cell receptor usage is encoded in the amino acid sequences of different cov spike proteins in the form of a set of specificity determining positions (sdps). furthermore, by integrating structural data, in silico mutagenesis and coevolution analysis we could elucidate the role of sdps in mediating ace binding across the sarbecovirus lineage, either by engaging the receptor through direct intermolecular interactions or by affecting the local environment of the receptor binding motif. finally, by the analysis of coevolving mutations across a paired msa we were able to identify key intermolecular contacts occurring at the spike-ace interface. these results show that effective mining of the evolutionary records held in the sequence of the spike protein family can help tracing the molecular mechanisms behind the evolution and host-receptors adaptation of circulating and future novel β-covs. 
significance unraveling the molecular basis for host cell receptor usage among β-covs is crucial to our understanding of cross-species transmission, adaptation and for molecular-guided epidemiological monitoring of potential outbreaks. in the present study, we survey the sequence conservation patterns of the β-cov spike protein family to identify the evolutionary constraints shaping the functional specificity of the protein across the β-cov lineage. we show that the unsupervised analysis of statistical patterns in a msa of the spike protein family can help tracing the amino acid space encoding the specificity of β-covs to their cognate host cell receptors. we argue that the results obtained in this work can provide a framework for monitoring the evolution of sars-cov- specificity to the hace receptor, as the virus continues spreading in the human population and differential virulence starts to arise. spike protein sequences which were manually reviewed to annotate information on viral strain, host organism and cell receptors using information extracted from the ncbi protein database, uniprotkb and the literature. detection of sdps and spike protein subfamilies. the s det method ( ) uses multiple correspondence analysis (mca) to identify differentially conserved positions and sequence subfamilies within a given msa. msa positions that follow the subfamily segregation are defined as specificity determining positions (sdps) of the family. here, the unsupervised mode of s det was used in a two-level decomposition analysis to identify sdps linked to the spike protein family segregation between and within β-cov subgroups. phylogenetic analysis was performed both on the full β-cov msa and for individual subfamilies using the phyml method ( ) with default parameters. coevolution analysis. coevolving msa positions were identified by computing the mi-apc ( ) . 
mi-apc is a mutual information (mi)-based score corrected by the average product correction (apc) of the background noise and phylogenetic signal. to ensure robust statistics, msa columns were filtered according to percentage of gaps (<= %) and shannon entropy (>= avg -std), computed as follows: where x is the number of sdps observed in a given domain of interest, k is the number of sdps in the test set, m is the length of the domain of interest and n + m is the size of the protein. the analyses were performed both at the subunit and domain level. annotations on subunits and domain boundaries of the sars-cov- spike glycoprotein were retrieved from the literature ( ) and mapped to the rest of human β-covs spike sequences used in this analysis (sars-cov, mers, oc and hku ) based on the msa. evolutionary rate analysis. per-site evolutionary rates (ev) were computed by rate site ( ) using as input the full msa of the β-cov spike family and phylogenetic tree. sars-cov- sequence was set as the reference sequence. raw rate values were used to compute relative rates by normalizing the values to the mean of ( ) . structural dataset. all the structures employed in this study were retrieved from the pdb (https://www.rcsb.org) using the following pdb codes: vsb ( ) , vxx ( ) , lzg ( ) , x b ( ) , x ( ) , ajf ( ) , ohw ( ) , nzk ( ) , x f ( ) , l ( ) and i ( ) . a phylogenetic analysis was performed based on an msa of the full-length spike sequences from sars-cov- and other representative β-covs. the resulting tree, shown in figure a , reflects the taxonomic classification of β-covs into five subgenera, namely sarbecovirus, hibecovirus, nobecovirus, embecovirus and merbecovirus ( ) . a first s det analysis was performed on individual protein subfamilies from the sarbecovirus, embecovirus and merbecovirus subgroups. in the case of sarbecovirus, we identified three different clusters, two of which correspond to known sarbecovirus clades ( ) . 
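the mi-apc score described in the coevolution analysis can be sketched as follows. this is a minimal illustration of mutual information with the average product correction, computed from raw column frequencies with no pseudocounts, sequence weighting or gap handling; it is not the implementation used in the study, and the function names are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def mutual_information(col_i, col_j):
    """Mutual information between two alignment columns."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    joint = Counter(zip(col_i, col_j))
    return sum((c / n) * math.log(c * n / (count_i[a] * count_j[b]))
               for (a, b), c in joint.items())


def mi_apc(msa):
    """Return {(i, j): MI - APC} for every column pair of an MSA given as
    a list of equal-length strings (needs at least two columns)."""
    columns = list(zip(*msa))
    length = len(columns)
    mi = [[0.0] * length for _ in range(length)]
    for i, j in combinations(range(length), 2):
        mi[i][j] = mi[j][i] = mutual_information(columns[i], columns[j])
    # Mean MI of each column with all others, and the overall mean.
    col_mean = [sum(mi[i][j] for j in range(length) if j != i) / (length - 1)
                for i in range(length)]
    overall = sum(col_mean) / length
    scores = {}
    for i, j in combinations(range(length), 2):
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[(i, j)] = mi[i][j] - apc
    return scores


if __name__ == "__main__":
    toy_msa = ["AAK", "AAR", "CCK", "CCR"]  # columns 0 and 1 covary
    print(mi_apc(toy_msa))
```

in the toy alignment the first two columns always change together, so their corrected score is the largest, while pairs involving the independent third column score zero.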
notably, in contrast to what is observed based on phylogenetic analysis, both sars-cov- and ratg are clustered together with members of sarbecovirus clade , which includes human sars-cov and bat sars-like sequences ( figure b-c) . this result is confirmed by the analysis of similarity scoring matrices (supplementary figure s ) where sars-cov- and sars-cov sequences cluster together and are closer to the green cluster based on the similarity of sdps, but not when the full-length sequence is considered. within the embecovirus group, we identified three clusters: a first cluster corresponding to murine covs (mhv), a second containing rat (rtcovs) and human covs (cov-hku ), and a third cluster containing the human cov hcov-oc and other mammalian covs (supplementary figure s -a,c). within the merbecovirus group, we identified four clusters: a first cluster containing mers and mers-like bat covs, a second cluster containing hedgehog and bat covs, a third cluster containing hku bat covs, and a fourth cluster containing one single bat cov isolate (kw e-f ) from the nycteria species (supplementary figure s -b,d) . the sdps associated with the subfamily segregation within these three subgenera display a strong domain enrichment within the spike s subunit ( figure , supplementary table s ). specifically, the sdps (n = ) of sarbecovirus subfamilies, containing both sars-cov and sars-cov- spike sequences, are enriched in the rbd whereas in hcov-oc and hcov-hku , the human-pathogenic species belonging to embecovirus, sdps (n = ) fall mainly in an upstream region of the spike, with a significant enrichment in the ntd. also in mers-cov we observe a significant enrichment within the s subunit. however, as shown in figure , the sdps (n = ) are almost evenly distributed across the ntd and rbd regions, with a significant enrichment in the latter (supplementary table s ) . notably, the distribution of the sdps shows a clear relationship with the cell receptor usage observed among β-covs. 
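the domain enrichment reported above can be illustrated with a one-sided hypergeometric test, using the variables defined in the methods (x sdps observed in the domain, k sdps in the test set, a domain of length m, and a protein of size n + m). that the enrichment was assessed with exactly this test is an assumption based on those variable definitions; the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(x, k, m, n):
    """Probability of observing at least x SDPs inside a domain of length m
    when k SDPs are placed at random in a protein of n + m positions
    (upper tail of the hypergeometric distribution)."""
    total = comb(n + m, k)
    upper = min(k, m)
    tail = sum(comb(m, i) * comb(n, k - i) for i in range(x, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 8 of 12 SDPs fall in a 200-residue domain
    # of a 1200-residue protein.
    print(enrichment_p_value(x=8, k=12, m=200, n=1000))
```

a small p-value means the sdps are concentrated in the domain more than expected from its length alone.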
in particular, hcov-hku and hcov-oc are known to bind sialic acid receptors on the host cells via the spike ntd ( ) , sars-cov and sars-cov- recognize the ace receptors via the rbd ( , ) , while mers-cov has been reported to use both dpp ( ) and sialic acid receptors ( ) via the rbd and ntd domains, respectively. in summary, the sdps found within these β-cov subgenera define a specific region of the receptor binding domains: they are part of, or in direct contact with, the ace interacting surface ( a second s det analysis was performed on the full β-cov msa. as shown in figure a -d, the identified sequence subfamilies are consistent with the phylogenetic classification of betacoronavirus into five subgenera ( ) . in this case, the sdps linked to the full β-cov spike family segregation are mainly located in the s protein subunit, with a statistically figure s ) . collectively, these results indicate that sdps capture the functional diversification observed within the individual protein subfamilies, whereby host-cell receptor specificity arises in the context of a structural framework that is specific to each β-cov phylogenetic group. structural and mutagenesis studies have shown that the spike rbd of sarbecovirus contains all the necessary information for host receptor binding and that a few amino acid substitutions in this region can lead to efficient cross-species transmission ( , ) . binding to ace is clade-specific and occurs at the carboxy-terminal region of the rbd, by an extended concave loop subdomain which forms the interaction interface with the ace n-terminal helix. notably, both the ace -contacting residues and the surrounding amino acids, collectively referred to as the receptor binding motif (rbm), are required to impart human receptor usage within the sarbecovirus lineage ( ) . 
consistently, we observe that several sdps associated with the sarbecovirus subfamily fall within the rbm, i.e., y /y , l /l , t /n , c /c , p /p , g /g , g /g , p /p , y /y on sars-cov- /sars-cov sequences, respectively ( figure ). of these, y has been previously reported as critical for ace binding ( ) . two sdps, namely g and g , fall within the receptor interface forming two hydrogen bonds with the ace k . other sdps, such as l , t , c , p and g make direct contact with ace contacting residues. hence, these sdps are likely to play an important role in ace binding by affecting the local orientation of ace contacting residues. this hypothesis is further supported by results from in silico mutagenesis and coevolution analysis (details in next section). specifically, we tested the effect of amino acid mutations across the rbm by mutating the sars-cov- sequence to the consensus spike sequence from clade , i.e. a clade known to be incompatible with hace usage ( ) . notably, while most substitutions are predicted to have a destabilizing effect on the spike-hace complex, mutations of sdp residues are predicted to have a significantly larger impact compared to non-sdp residues (figure ) . computational methods exploiting coevolution signals in msas of protein families are widely used to infer features such as molecular interactions and functional sites ( , ( ) ( ) ( ) . such signals arise from the specific adaptation between correlated amino acid sites, where changes in one site are potentially compensated by changes in the other. in the case at hand, coevolution signatures are used as markers for the study of the physical interactions occurring between different sites of the spike as well as between the spike and their cognate host cell receptor ace . 
as it can be observed in supplementary figure s , the strongest intramolecular coevolution signal, considering the top- predictions, is observed over the rbd region of the spike (the overall precision of the method is reported in supplementary figure s ). figure shows in detail the rbm region, which presents above average precision and recall values of % and . %, respectively. among the sdps (highlighted in green), the precision is even higher, around %, and the recall is . %. notably, % of sdps found within the sarbecovirus subgenus show a coevolution signal with ace contacting residues, namely y (coevolving with c , p ), q (coevolving with l , t , c , p ) and q (coevolving with g ). these three positions are hubs on the interface with ace , making direct contact with positions t , f , k and y , positions k , h and e , and positions d , y , q and l of ace , respectively ( ) . particularly, position q /n in sars-cov- /sars-cov has been described to be critical for high affinity binding of both sars-cov and sars-cov- to ace ( , ) . furthermore, it is interesting to notice that the c /c sdp in sars-cov- /sars-cov is an important position for the stability of the rbm as a whole and a complete loss of hace binding in vitro has been described when this position is mutated to alanine in sars-cov ( ) . we next performed a coarse-grained coevolutionary analysis on a concatenated msa containing eight spike proteins and their cognate ace receptors. contact predictions were obtained by computing the mi-apc score for every inter-protein pair of alignment positions, considering the rbd region of the spike protein and the whole ace . interestingly, three rbm positions were found coupled to ace positions among the top- mi-apc scores (supplementary figure s ) . specifically, the ace residue h was coupled to l , s and q on the spike protein. additionally, the spike positions r and l were coupled to ace residues q , n , q , m and to e , t and h , respectively. 
among these predictions, t -l , h -l and h -s correspond to true contacts within a distance, while the couplings between r and ace positions could be related to long-range effects. notably, positions e and h have been described as crucial to sars-cov binding to ace ( ) . also, l was described as important for the stabilisation of a binding hotspot between sars-cov- and ace ( ) . interestingly, a recent study reported that ace variants e k and t a are more susceptible to sars-cov- , while variant h r decreases sars-cov- affinity ( ) . these results, despite the limited number of sequences in the concatenated alignment, point to at least three specific rbm viral positions (l , s and q in sars-cov- ) likely adapted to their species-specific counterparts. since the beginning of the covid- outbreak and the isolation of the sars-cov- virus, laboratories around the world are continuously isolating viral genomic sequences with unprecedented speed, enabling nearly real-time data sharing of more than , genomic sequences so far ( ) . after discarding partial sequences, a multiple sequence alignment was built based on a total of , sars-cov- spike sequences isolated from human samples in countries. our analysis of missense amino acid variations confirmed earlier reports ( ) that most mutations occur within the s subunit, with a dominant variant observed at position , where more than % of samples carry the d g mutation, followed by mutations l f and r i appearing in and samples, respectively. within the rbd, the top frequent mutations are s n, t i and p s, found in , , and samples, respectively. notably, these positions fall within the rbm, forming a surface-exposed loop that is proximal to the ace binding surface, and that is absent in sarbecovirus clades and ( ) . although previous experiments indicate that this loop is per se not sufficient to impart ace receptor usage, deletions of this region are associated with reduced spike expression and loss of cell entry ( ) . 
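the tally of missense variants across aligned spike sequences can be sketched as a simple count of substitutions relative to a reference sequence. the sketch assumes the reference row contains no gaps, so that alignment columns map directly to residue numbers, and it skips gaps and the ambiguity code x; the sequences in the example are made up.

```python
from collections import Counter


def count_substitutions(reference, sequences, gap="-", unknown="X"):
    """Count amino acid substitutions relative to an ungapped reference.

    Returns a Counter keyed by labels such as 'D2G' (reference residue,
    1-based position, observed residue) with the number of sequences
    carrying each substitution.
    """
    counts = Counter()
    for seq in sequences:
        for pos, (ref_aa, aa) in enumerate(zip(reference, seq), start=1):
            if aa in (gap, unknown) or ref_aa == gap:
                continue  # ignore gaps and ambiguous residues
            if aa != ref_aa:
                counts[f"{ref_aa}{pos}{aa}"] += 1
    return counts


if __name__ == "__main__":
    reference = "MDLK"                          # toy reference
    samples = ["MGLK", "MGLK", "MDLR", "M-LK"]  # toy aligned samples
    for label, n in count_substitutions(reference, samples).most_common():
        print(label, n)
```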
hence, mutations in this region are expected to impact the stability of the protein, rather than its affinity to the receptor. finally, perhaps not surprisingly, the frequency of variants at sdps is very low, with an overall variability that is comparable to that observed in ace -contacting residues (supplementary figure s ) . mutations are observed in out of sdps. with the exception of t a and y , which are observed in and samples, respectively, all other mutations are present in only one sample. most of these variants are predicted to be neutral or to destabilize binding to the receptor (supplementary figure s b) . in summary, the low frequency of mutations in sdps is consistent with the requirement to preserve the functional role of those positions, providing additional evidence of their involvement in the interaction with the host cell receptor. the relationship between protein family segregation and functional organisation has been extensively investigated for decades, and a variety of computational methods have been developed to infer their evolutionary link at the residue level ( ) ( ) ( ) . it is therefore relatively straightforward to identify the amino acid positions that modulate the functional specificity of a given enzyme towards a substrate or cofactor ( ) or the binding specificity of a protein-ligand or protein-protein interaction ( , ) by analysing the differential conservation patterns within the msa of a protein family. here we apply this concept to the analysis of the sars-cov- spike in the context of an msa of homologous sequences belonging to the β-cov genus. the sdp analysis is based on a vectorial representation of protein sequences and amino acid positions in a multidimensional space to simultaneously identify the family segregation and the residue positions that best explain the sources of variation of the family ( ) .
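the idea of differential conservation can be sketched in a few lines. note that this is not the vectorial method described above: it is a much simpler stand-in that assumes the subfamily labels are already known, and it flags positions that are invariant inside every subfamily but differ between subfamilies. the function name, the toy sequences and the labels are all hypothetical, and gap columns are not treated specially.

```python
def candidate_sdps(msa, labels):
    """Return 0-based alignment positions that are fully conserved within
    each subfamily but carry different residues in different subfamilies."""
    groups = {}
    for seq, label in zip(msa, labels):
        groups.setdefault(label, []).append(seq)

    hits = []
    for pos in range(len(msa[0])):
        residues = []
        for seqs in groups.values():
            column = {s[pos] for s in seqs}
            if len(column) != 1:
                break  # variable inside a subfamily: not a candidate
            residues.append(column.pop())
        else:
            # Conserved inside every subfamily; keep it only if the
            # subfamilies disagree on the residue.
            if len(set(residues)) > 1:
                hits.append(pos)
    return hits


# Toy example (made up): two subfamilies, "x" and "y".
toy_msa = ["MKLA", "MKLT", "MRIA", "MRIS"]
toy_labels = ["x", "x", "y", "y"]
toy_hits = candidate_sdps(toy_msa, toy_labels)
```

in this toy case position 0 is conserved across the whole family and position 3 varies inside both subfamilies, so only the two middle positions separate the groups. a real analysis would tolerate some within-group variation rather than demand strict invariance.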
on the one hand, the analysis performed at the β-cov family level led to the identification of five sequence subfamilies, reflecting the known phylogenetic classification of betacoronavirus into five subgenera ( ) ( figure a ). on the other hand, the analysis performed on individual β-cov subgenera, i.e. sarbecovirus, merbecovirus and embecovirus subgroups, allowed a fine-grained classification into subfamily clusters that clearly reflect the functional diversification of the spike protein family, that is, the specificity to different host-cell receptors (figure - ) . indeed, both the clustering and domain enrichment results of the associated sdps consistently reproduce the known cell receptor specificities observed across the different β-cov lineages ( , ( ) ( ) ( ) . at the level of the sarbecovirus group, for example, both sars-cov- and ratg are clustered together with sars-cov and other sars-like sequences from bats ( figure c) , reflecting the ability of these members of the sarbecovirus group to bind the ace cell receptor ( ) . notably, the proximity of sars-cov- , ratg and other sars-cov sequences based on key sdp positions differs from what is seen in full-sequence phylogenetic analysis, where they form distinct clades. this proximity is driven by their shared sdps and is interpreted here in terms of their shared ability to bind the human ace receptor. as is often the case, functional constraints arise from the requirement of maintaining the interaction of proteins with other macromolecules or ligands. such constraints translate into specific roles and properties of individual amino acids, or protein sites.
in the case at hand, the analysis of the physicochemical, structural and conservation properties of the sdps of the different subfamilies highlights a pattern that is typical of protein functional sites, as they show high conservation across the protein family, are solvent exposed, and are enriched in the receptor binding domains (supplementary figures s - and supplementary table s ) . hence, in order to assess the role of the sdps in mediating ace receptor usage across the sarbecovirus group, we set up an in silico mutagenesis study and analysed the effect of amino acid mutations across the rbm and their impact on ace binding. notably, while our results are in line with previous observations ( ) , they point to specific positions across the rbm that might exert a critical role, either by engaging the receptor through direct intermolecular contacts or by affecting the local orientation of ace contacting residues and the stability of the rbm as a whole. collectively, these results point to a key role of sdps in mediating host cell receptor specificity across β-covs and provide, at the same time, a framework for monitoring the evolution of the sars-cov- specificity to hace , as well as the emergence of novel potential cross-species transmission events. in this respect, it is important to note that, in the analysis of amino acid variations across the circulating sars-cov- virus, sdps tend to mutate with a very low frequency, similar to what is seen at ace contacting sites (supplementary figure s ) ( ) . this is of relevance, as our results suggest that mutations in sdps can significantly impact the receptor-binding ability of the spike. furthermore, experience in other scenarios has shown that mutating sdps is, in general, sufficient to switch the properties between two groups of proteins of the same family, i.e. the interchange of the residues occupying the sdps between two families implies a change in the associated biological properties ( ) ( ) ( ) ( ) ( ) .
notable examples include the production of switch-of-function mutants of small gtpases with changed selectivity, or the change of transport specificity between mip channel proteins by a few amino acid substitutions ( , ) . in line with this reasoning, it can be argued that other members of the sarbecovirus group might have the potential to acquire ace binding ability, as they share substantial similarity in terms of sdps. this is especially the case for members of the sarbecovirus green cluster, which, despite being phylogenetically distant from sars-cov- , display identical residues in out of sdp positions within the rbd, making them potential candidates for new human infections. in conclusion, the results presented here show that the identification of evolutionary patterns based on the analysis of sequence information alone can provide meaningful insights into the molecular basis of host-pathogen interactions and adaptation. we believe that both the methodology and results presented in this work can provide the basis for follow-up studies analysing the potential routes of mutations that could lead to new adaptation to human hosts and ultimately contribute to better understanding and monitoring of events that are critical to public health concerns worldwide. the sars epidemic in hong kong: what lessons have we learned?
outbreak of middle east respiratory syndrome coronavirus in saudi arabia: a retrospective study an interactive web-based dashboard to track covid- in real time genetic recombination, and pathogenesis of coronaviruses fusion mechanism of -ncov and fusion inhibitors targeting hr domain in spike protein evidence for a common evolutionary origin of coronavirus spike protein receptor-binding subunits cryo-em structure of the -ncov spike in the prefusion conformation structure of the sars-cov- spike receptor-binding domain bound to the ace receptor the proximal origin of sars-cov- structure, function, and evolution of coronavirus spike proteins evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic basic local alignment search tool cd-hit: accelerated for clustering the next-generation sequencing data mafft online service: multiple sequence alignment, interactive sequence choice and visualization protein interactions and ligand binding: from protein subfamilies to functional specificity new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml . 
mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction inhibition of sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion rate site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues measuring evolutionary rates of proteins in a structural context structure, function, and antigenicity of the sars-cov- spike glycoprotein structural and functional basis of sars-cov- entry by using human ace cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains structure of sars coronavirus spike receptor-binding domain complexed with receptor structural basis for human coronavirus attachment to sialic acid receptors structure of mers-cov spike receptor-binding domain complexed with human receptor dpp pre-fusion structure of a human coronavirus spike protein estimating maximum likelihood phylogenies with phyml genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses human coronaviruses oc and hku bind to --acetylated sialic acids via a conserved receptor-binding site in spike protein domain a a pneumonia outbreak associated with a new coronavirus of probable bat origin angiotensin-converting enzyme is a functional receptor for the sars coronavirus identification of residues on human receptor dpp critical for mers-cov binding and entry identification of sialic acid-binding function for the middle east respiratory syndrome coronavirus spike glycoprotein slow protein evolutionary rates are dictated by surface-core association a sars-like cluster of circulating bat coronaviruses shows potential for 
human emergence the sars coronavirus s glycoprotein receptor binding domain: fine mapping and functional characterization sequence co-evolution gives d contacts and structures of protein complexes emerging methods in protein co-evolution conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone receptor and viral determinants of sars-coronavirus adaptation to human ace structural basis of receptor recognition by sars-cov- a -amino acid fragment of the sars coronavirus s protein efficiently binds angiotensin-converting enzyme identification of critical determinants on ace for sars-cov entry and development of a potent entry inhibitor human ace receptor polymorphisms predict sars-cov- susceptibility https buckland-merrett, data, disease and diplomacy: gisaid's innovative contribution to global health characterizations of sars-cov- mutational profile, spike protein stability and viral transmission evolution of function in protein superfamilies, from a structural perspective a method to predict functional residues in proteins protein function prediction using local d templates an evolutionary trace method defines binding surfaces common to protein families automatic methods for predicting functionally important residues molecular basis of binding between novel human coronavirus mers-cov and its receptor cd bat origins of mers-cov supported by bat coronavirus hku usage of human receptor cd receptor usage and cell entry of bat coronavirus hku provide insight into bat-to-human transmission of mers coronavirus mutations from bat ace orthologs markedly enhance ace -fc neutralization of sars-cov- https predicting functionally important residues from sequence conservation evolution of protein kinase substrate recognition at the active site switch from an aquaporin to a glycerol channel by two amino acids substitution effector recognition by the small gtp-binding proteins ras and ral switch-of-function mutants based on 
morphology classification of ras superfamily small gtpases evolution-guided discovery and recoding of allosteric pathway specificity determinants in psychoactive bioamine receptors the authors are grateful to all other members of the computational biology group for their valuable feedbacks and useful discussions. this work has received funding from the exscalate cov project, from the european union's horizon research and innovation programme, under grant agreement n. . the authors have declared no competing interest. key: cord- -d n b authors: bacus, michael g.; dayap, stephen adrian h.; tampon, nikki vanesa t.; udarbe, marielle m.; puentespina, roberto p.; villanueva, sharon yvette angelina m.; de cadiz, aleyla e.; achondo, marion john michael m.; murao, lyre anni e. title: global genetic patterns reveal host tropism versus cross-taxon transmission of bat betacoronaviruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d n b emerging infectious diseases due to coronavirus (cov) infections have received significant global attention in the past decade and have been linked to bats as the original source. the diversity, distribution, and host associations of bat covs were investigated to assess their potential for zoonotic transmission. phylogenetic, network, and principal coordinate analysis confirmed the classification of betacoronaviruses (betacovs) into five groups ( a to e) and a potentially novel group, with further division of d into five subgroups. the genetic co-clustering of betacovs among closely related bats reflects host taxon-specificity with each bat family as the host for a specific betacov group, potentially a natural barrier against random transmission. the divergent pathway of betacov and host evolution suggests that the viruses were introduced just prior to bat dispersal and speciation. 
as such, deviant patterns were observed, such as for d-iv, wherein cross-taxon transmission due to overlap in bat habitats and geographic range among genetically divergent african bat hosts could have played a strong role in their shared cov lineages. in fact, a few bat taxa, especially the subfamily pteropodinae, were shown to host diverse groups of betacovs. therefore, ecological imbalances that disturb bat distribution may lead to loss of host specificity through cross-taxon transmission and multi-cov infection. hence, initiatives that minimize the destruction of wildlife habitats and limit wildlife-livestock-human interfaces are encouraged to help maintain the natural state of bat betacovs in the wild. importance: bat betacoronaviruses (betacovs) pose a significant threat to global public health and have been implicated in several epidemics such as the recent pandemic caused by severe acute respiratory syndrome coronavirus . here, we show that bat betacovs are predominantly host-specific, which could be a natural barrier against infection of other host types. however, a strong overlap in bat habitat and geographic range may facilitate viral transmission to unrelated hosts, and a few bat families have already been shown to host multi-cov variants. we predict that continued disturbances of the ecological balance may eventually lead to loss of host specificity. when combined with enhanced wildlife-livestock-human interfaces, spillover to humans may be further facilitated. we should therefore start to define the ecological mechanisms surrounding zoonotic events. global surveillance should be expanded and strengthened to assess the complete picture of bat coronavirus diversity and distribution and their potential to cause spillover infections. emerging and re-emerging infectious diseases greatly affect public health and global economies ( ).
these diseases involve pathogenic strains that recently evolved, pathogens that infect human populations for the first time, and pathogens that re-occur at higher frequency ( ). the majority of these emerging infectious diseases are caused by microorganisms from non-human sources or zoonotic pathogens from wild animals ( ). in particular, emerging infectious diseases due to coronavirus (cov) however, some deviations were also noted. for example, bat hosts that belong to genetically unrelated taxa were mixed in some betacov groups. the molossidae bat eumops glaucinus was found in world leaf-nosed bat triaenops persicus in d-iv of epomophorinae bats. looking at the host, certain bat families were observed to harbor betacovs that belong to various lineages. the rousettinae bats were found to carry both d-i/ d-iii and d-iv betacovs, and the old world fruit bats b, d-iv, and e. the divergences of the bats and their betacovs were compared to evaluate common evolutionary pathways (fig. b) . the vesper microbats diverged as a separate group from the rest of the bats. in contrast, our findings support a previously proposed hypothesis that covs limit cross-species although bat betacovs are host taxon-specific, their evolutionary pathways are different from evolution with its host. instead, this indicates that the currently circulating viruses may have been introduced relatively recently, i.e. to the most recent common ancestors of each bat taxon but prior to global dispersion and speciation, during which the virus acquired adaptation to its host. the recent introduction of betacovs in bats implies that other factors may have had the opportunity to influence virus-host dynamics. in the succeeding discussions, we will present two deviant phenomena that exemplify this: cross-taxon transmission of covs and bat hosts with multi-cov lineages.
we provide genetic evidence for cross-taxon transmission as indicated by genetically unrelated microbial threats to health: taxonomy-history?taxnode_id= a new virus isolated from the human respiratory tract identification of a new human coronavirus molecular pathology of emerging coronavirus infections dobsonia moluccensis. the iucn red list of threatened species notes on the dry season roosting and foraging behaviour food availability and annual migration of the replication and shedding of mers-cov in jamaican fruit bats (artibeus jamaicensis) further evidence for bats as the evolutionary source of middle east respiratory syndrome coronavirus coronaviruses: important emerging human pathogens urbanization and disease emergence: dynamics at the wildlife-livestock-human interface mechanisms of zoonotic severe acute respiratory syndrome coronavirus host range expansion in human airway epithelium wuhan municipal commission of health and health on detection and characterization of a novel bat-borne coronavirus in singapore using multiple molecular approaches preliminary assessment of activity pattern and diet of the lesser dog faced fruit bat cynopterus brachyotis in a dipterocarp forest development of a one-step rt- pcr assay for detection of pancoronaviruses (α-, β-, γ-, and δ-coronaviruses) using newly designed degenerate primers for porcine and avianfecal samples jmodeltest: phylogenetic model averaging. molecular biology and evolution bayesian phylogenetic and phylodynamic data integration using beast . . virus evolution posterior summarization in bayesian phylogenetics using tracer . . systematic biology past: paleontological statistics software package for education and data analysis key: cord- -un urdb authors: baray, juwel chandra; khan, md. maksudur rahman; mahmud, asif; islam, md. jikrul; myti, sanat; ali, md. rostum; sarker, md. enamul haq; kumar, samir; chowdhury, md. 
mobarak hossain; roy, rony; islam, faqrul; barman, uttam; khan, habiba; chakraborty, sourav; hossain, md. manik; chowdhury, md. mashfiqur rahman; ghosh, polash; mohiuddin, mohammad; sultana, naznin; nag, kakon title: bancovid, the first d g variant mrna-based vaccine candidate against sars-cov- elicits neutralizing antibody and balanced cellular immune response date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: un urdb an effective vaccine against sars-cov- is of the utmost importance in the current world. more than million deaths are attributed to the associated pandemic disease, covid- . recent data showed that the d g genotype of the virus is highly infectious and responsible for almost all infections in the nd wave. despite multiple vaccine development initiatives, there is currently no report that has addressed this critical variant, d g, as a vaccine candidate. here we report the development of an mrna-lnp vaccine considering the d g variant and the characterization of the vaccine in a preclinical trial. the surface plasmon resonance (spr) data with spike protein as probe and competitive neutralization with rbd and s domain revealed that immunization generated specific antibody pools against the whole extracellular domain (rbd and s ) of the spike protein. the anti-sera and purified iggs from immunized mice on day and neutralized sars-cov- pseudovirus in ace -expressing hek cells in a dose-dependent manner. importantly, immunization protected mouse lungs from pseudovirus entry and cytopathy. the immunologic response was characterized by a balanced and stable population of cd + cells with a th bias. the igg a to igg and (igg a+igg b) to (igg +igg ) ratios were found to be ± . and . ± . , respectively. these values are higher than the corresponding values for other published sars-cov- vaccines in development, suggesting a higher viral clearance capacity for our vaccine. the data suggested great promise for immediate translation of the technology to the clinic.
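subclass ratios of the kind reported above, given as mean ± deviation, can be computed as in the following minimal sketch. it assumes one titer value per animal for the numerator and for the denominator, takes the ratio animal by animal, and summarises with the mean and the sample standard deviation. whether the authors averaged per-animal ratios in this way is an assumption, and the function name and input values are made up for illustration.

```python
from statistics import mean, stdev


def ratio_summary(numerator, denominator):
    """Mean and sample standard deviation of per-animal ratios.

    numerator, denominator: equal-length sequences of positive titers,
    one entry per animal (for a summed ratio, pass the per-animal sums).
    """
    ratios = [n / d for n, d in zip(numerator, denominator)]
    return mean(ratios), stdev(ratios)


# Hypothetical titers for three animals (not data from the study).
example_mean, example_sd = ratio_summary([2.0, 4.0, 6.0], [1.0, 2.0, 2.0])
```

a ratio of group means would be an alternative summary and generally gives a different number, so the choice should be stated alongside the result.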
as dispersant. the formulation was concentrated using ultra centrifugal filters (merck, germany), filtered through a . micron filter, and stored at ± °c. the formulation was passed through quality control for particle size, encapsulation efficiency, endotoxin limit and sterility. a total of balb/c swiss albino mice (male and female), - weeks old, were selected randomly and isolated days before immunization. after careful observation and conditioning, mice ( males and females) were taken to the experiment room for immunization and subsequent safety and efficacy analysis. male mice were also separated for local tolerance testing. the temperature in the experimental animal room was °c (± °c) and the relative humidity was ± %. the room was an hvac-controlled iso class room with % fresh air intake and full exhaust. the mice were individually housed in polypropylene cages with individual water bottles, provided with g of in-house mouse feed daily and kept under hours of day-night cycle. mice were separated into different groups consisting of mice ( males and females) in each group. there were different treatment groups: treatment groups , , and , a placebo group and a control group. each mouse of treatment groups , , and was immunized with sterile . µg/ µl, . µg/ µl and . µg/ µl of bancovid, respectively. each mouse of the placebo group was injected with the vehicle only, and the control group mice were not injected with anything. intramuscular (im) injection in the left quadriceps was used for immunization. the flow of the experimental design is the immunogenicity of bancovid was evaluated in balb/c mice, post administration to the quadriceps muscle. approximately µl blood was collected from the facial vein and centrifuged at x g for serum isolation ( minutes at cc). all sera were aliquoted, frozen immediately and stored at - °c until analysis.
the reactivity of the sera from each group of mice immunized with bancovid was measured against sars-cov- s antigen (sinobiologicals, china). analysis revealed igg binding against sars-cov- s protein antigens in the sera of the immunized mice. the serum igg binding endpoint titers (epts) were measured in mice immunized with bancovid. epts were observed in the sera of mice at day and day after immunization with a single dose of the vaccine candidate. toxicity pre-immune whole blood (approximately µl) from each mouse was collected for complete blood count (cbc) in % edta at days before immunization. similarly, whole blood was also collected after immunization at day for cbc analysis using an auto hematology analyzer. then co-transfection was performed using lipofectamine (thermofisher, usa) reagent according to the manufacturer's protocol. the next day, . % low melting agarose in dmem media was spread on the well and incubated until plaques were formed. after formation of plaques, multiple plaques were collected in dmem media and titers were measured for plaque selection. the selected plaque was then added to fresh pre-seeded viral production cells. after a few days, for the retro-based neutralization assay, qpcr was used to analyze the copy number of the s gene that integrated into the cells. the copy number of the s gene indicated the entry of pseudovirus into the cells. magmax dna multi-sample ultra-kit (thermofisher, usa). this genomic dna was used for determination of s gene copy number by qpcr. in-vivo neutralization a total of albino male mice of - weeks were selected and isolated for the analysis. ultimate (thermofisher, usa) system using mm disodium hydrogen phosphate (wako, japan), mm sodium dihydrogen phosphate (wako, japan), mm sodium chloride (merck, germany), ph . as mobile phase. a biobasic sec- ( x . mm, particle size; µm, thermofisher, usa) column was used with a . ml/minute flow rate, nm wavelength, and µl sample injection volume for minutes.
for peptide identification, data-dependent mass spectrometry was performed where the full-ms scan range was m/z to m/z, resolution was , , agc target was e , maximum it was milliseconds (ms), and the data-dependent mass spectrometry resolution was , , agc target was e , maximum it was ms. after obtaining raw data from the mass spectrometry system, data analysis was performed in biopharma finder (thermofisher, usa) using variable parameters to get confident data, and then data were combined in one map to visualize complete ivt process was tuned to obtain the desired mrna with high yield and quality (figure: c) . the mrna was encapsulated in lipid nanoparticles (lnp) ranging from - nm with a final ph of . . we did a pilot study with limited numbers of mice to identify the suitable mrna-lnp size for our formulation. mrna-lnp either smaller than nm or larger than nm did not generate a considerable immunological response even with a dose of ng/mice (data not shown). to obtain the best process control for dose production, we therefore set our mrna-lnp size range at ± nm. we used mrna-lnp of this range throughout the rest whether the immunization has generated an antibody pool spanning the whole antigen or any specific domain (s or s ), we have chosen a surface plasmon resonance (spr) experiment. the s protein chip recognized high-affinity antibody from the anti-sera (figure: d) . the response was attenuated significantly for s-protein(s) (s, s and s ) pretreated sera ( figure d). s and s pretreatment showed similar and strong inhibitory response while s the ph ( . ) of our formulation buffer for mrna-lnp is also lower than the other relevant references ( . ~ . ). , , , lower ph helps quick release of the cargo from the endosomal compartment and protects mrna from acid hydrolysis and lysosomal digestion in the intracellular milieu. together, a number of minute changes in the design context likely acted in concert to produce a quick, balanced, stable th -igg -biased antibody response.
'bancovid' immunization did not produce any noticeable local or systemic toxicity, as primarily evident from the absence of the four cardinal signs of inflammation: redness (latin rubor), heat (calor), swelling (tumor), and pain (dolor). there was also no erythema or erythredema at any injection site. the cbc and blood chemistry data did not show significant changes in relevant profiles, suggesting that the vaccine behaves safely in animals. a balanced response between th and th is desired to achieve safe and effective humoral immunity. 'bancovid' produced a well-balanced igg and igg response by the th day postimmunization, which remained similar in the th day postimmunization sera, suggesting a stable antibody response during the sampling period. along with its opsonizing characteristics, igg has higher affinity for its receptors and has superior complement system activation potential over igg . , accordingly, the 'bancovid'-mediated higher ratio of igg to igg suggests a higher capacity of the antibody pool to clear antigen from the system. the ratio of igg a and igg , and the cytokine-stained cd + and cd + t cell populations, showed a th -biased response. since mouse igg is equivalent to human igg , it is plausible that 'bancovid' will elicit an effective cellular and humoral response against sars-cov- in humans. the early vaccine development initiatives were taken before the g variant became the roles of g mutation on constitutive infection have been attributed to its conformational change. it has been proposed that the -cooh group of d forms a hydrogen bond with the - oh group of t across the s /s interface, which cannot form in g . on the contrary, structural modeling studies revealed that "the d g substitution creates a sticky packing defect in subunit s , promoting its association with subunit s as a means to stabilize the structure of s within the s /s complex.
in other words, the d g mutation in fact promotes the s /s association and stabilizes the spike. the finding is in accordance with the observation that g has a greater stability originating from less s domain shedding and greater accumulation of the intact s protein into the pseudovirion. it has also been reported a in the uncomplexed s and inhibits the s /s association. g diminishes the salt bridge formation and s /s association resulting interaction with the rbd to facilitate higher infection. therefore, blocking of g with a specific antibody would inhibit such acquired fitness of sars-cov- . 'bancovid' immunization has produced a pool of antibody that covers the whole length of the spike protein suggesting that highly likely relevant antibody- for facility and information management system. flow cytometric analysis of total t cell (cd + ) populations producing tnf alpha on mouse splenocytes upon sars-cov- s protein stimulation. cells were gated sequentially: singlets, followed by lymphocytes, cd + , cd + cd + and cd + cd + tnfalpha + . (a, b, c) control panels where . %, . % and . % cd + cd + tnfalpha + cells were identified, respectively; (d, e, f) treatment panels where . %, . % and . % cd + cd + tnfalpha + cells were identified, respectively. flow cytometric analysis of total t cell (cd + ) populations producing il- on mouse splenocytes upon sars-cov- s protein stimulation. cells were gated sequentially: singlets, followed by lymphocytes, cd + , cd + cd + and cd + cd + il + . (a, b, c) control panels where . %, . % and . % cd + cd + il + cells were identified, respectively; (d, e, f) treatment panels where . %, . % and . % cd + cd + il + cells were identified, respectively.
a single dose of self-transcribing and replicating rna based sars-cov- vaccine produces protective adaptive immunity in mice self-amplifying rna sars-cov- lipid nanoparticle vaccine candidate induces high neutralizing antibody titers in mice alnylam launches era of rnai drugs capping, and proofreading mechanisms of sars-coronavirus exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics molecular architecture of early dissemination and massive second wave of the sars-cov- virus in a major metropolitan area tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- d g spike variant does not alter igg, igm, or iga spike seroassay performance sars-cov- viral spike g mutation exhibits higher case fatality rate the spike d g mutation increases sars-cov- infection of multiple the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity structural impact of mutation d g in sars-cov- spike protein: enhanced infectivity and therapeutic opportunity spike mutation d g alters sars-cov- fitness and neutralization the effect of size and charge of lipid nanoparticles prepared by microfluidic mixing on their lymph node transitivity and distribution lipid-based nanoparticles in the systemic delivery of sirna virus-like particles induces a strong antiviral-like immune response in mice type /type immunity in infectious diseases igg subclass co-expression brings harmony to the quartet model of flow cytometric analysis of total t cell (cd + ) populations producing il- on mouse splenocyte upon sars-cov- s protein stimulation. cells were gated in an orderly manner, like singlets were gated cd + cd + il + cells were identified respectively after dissolving, add mm dtt (thermofisher scientific, usa) to the solution to a final concentration of mm ( : dilution) and mix briefly; incubate at °c for hour. 
for alkylation to digest, add trypsin (thermofisher scientific, usa) solution to the sample solution to a final trypsin to protein ratio of : (w/w). incubate the sample tube at °c for - hours. after incubation, to stop digestion supplementary µg of p rdna was restriction digested with sfoi (thermofisher, usa) for hours, visualized using . % agarose gel electrophoresis, gel excised and dna extracted from gel using genejet gel extraction and dna cleanup micro kit, re-purification of dna by phenol:chloroform:isoamyl alcohol, followed by phenol removal using chloroform (twice). purified lyophilized dna was reconstituted using nuclease-free water, quantified and store at - °c for future use. optimization step : synthesis time factor ng linear purified dna was used for all optimization reactions. each reaction was performed in a µl total volume. for every reaction, a dnase treatment reaction was also performed using µl turbo dnase ( u/µl) at °c for minutes. for visualization, % agarose gel electrophoresis was performed after every step of reaction ( figure c) .in optimization step , where synthesis time dependency was observed, for that following components were mixed together apart from water and template, and reaction were run for , , , , and hours. control reactions were also performed ( µg control template ptri-xef for each reaction) for , and hours at °c. optimization step : rnase inhibitor and pyrophosphatase effectin rd step of optimization, murine rnase inhibitor and yeast pyrophosphatase effects were observed at a constant synthesis time ( hours) and constant rntps at °c. for that following components were mixed together apart from water and template. a higher concentration of rntps reaction was setup. as this point, last optimized condition was run as positive control. 
optimization step : temperature dependency
in th step of optimization, temperature dependency was observed at a constant synthesis time ( hours), constant rntps, and constant rnase inhibitor and pyrophosphatase, at , , , , and °c. for this, the following components were mixed together apart from water and template. a reaction with a higher concentration of rntps was also set up. at this point, the last optimized condition was run as a positive control.

tap the c spin column (thermofisher scientific, usa) to settle the resin. place the column into a receiver tube. to activate the column, add µl % acetonitrile (wako pure chemicals industries ltd., japan) to wet the resin. centrifuge the column at × g for minute. repeat the step. to equilibrate, add µl . % formic acid (wako pure chemicals industries ltd., japan) in % acetonitrile (wako pure chemicals industries ltd., japan). centrifuge the column at × g for minute. repeat the step. load the sample on top of the resin bed. place the column into a receiver tube. centrifuge the column at × g for minute. to ensure complete binding, recover the flowthrough and repeat the step - times. to wash the column, place it into a receiver tube. add µl . % formic acid in % acetonitrile to the column. centrifuge the column at × g for minute. repeat the step. to recover the sample, place the column in a new receiver tube. add µl % acetonitrile to the top of the resin bed. centrifuge at × g for minute. repeat the step in the same receiver tube.

rpmi complete media (rpmi + l-glutamine + penicillin streptomycin + mouse sera) was prepared first. then a mm petri dish was taken, ml complete media was added, and the harvested spleen was placed in the dish. using microscope glass slides, the spleen was smashed into pieces within the petri dish. cells were washed off the slides using a micropipette. a ml pipette was used to draw the solution up and down, each time closing the end of the pipette against the bottom of the petri dish to forcefully expel the contents and break up the pieces.
cell solution was passed through a sterile μm mesh strainer. centrifugation was performed for minutes at xg, at ºc. supernatant discarded and cells were re-suspended in rbc( x) lysing buffer ( x rbc lysis buffer: nh cl - . gm, nahco - . gm, edta - . gm, ph adjusted to . using naoh, volume adjusted to ml with water. filter sterilize and store at °c for six months.) and incubated at room temp for - mins. vigorous shaking was performed at minute intervals. again centrifugation was performed for minutes at xg, at ºc. supernatant discarded and cells were resuspended in pbs, following centrifugation and supernatant discard. pbs washing step was repeated again. finally, re-suspension of cell pellet in ml rpmi complete media, plating in a -well culture plate and incubate at °c, % co as needed. key: cord- -fruzsn authors: finch, courtney l.; crozier, ian; lee, ji hyun; byrum, russ; cooper, timothy k.; liang, janie; sharer, kaleb; solomon, jeffrey; sayre, philip j.; kocher, gregory; bartos, christopher; aiosa, nina m.; castro, marcelo; larson, peter a.; adams, ricky; beitzel, brett; di paola, nicholas; kugelman, jeffrey r.; kurtz, jonathan r.; burdette, tracey; nason, martha c.; feuerstein, irwin m.; palacios, gustavo; claire, marisa c. st.; lackemeyer, matthew g.; johnson, reed f.; braun, katarina m.; ramuta, mitchell d.; wada, jiro; schmaljohn, connie s.; friedrich, thomas c.; o’connor, david h.; kuhn, jens h. title: characteristic and quantifiable covid- -like abnormalities in ct- and pet/ct-imaged lungs of sars-cov- -infected crab-eating macaques (macaca fascicularis) date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fruzsn severe acute respiratory syndrome coronavirus (sars-cov- ) is causing an exponentially increasing number of coronavirus disease (covid- ) cases globally. 
prioritization of medical countermeasures for evaluation in randomized clinical trials is critically hindered by the lack of covid- animal models that enable accurate, quantifiable, and reproducible measurement of covid- pulmonary disease free from observer bias. we first used serial computed tomography (ct) to demonstrate that bilateral intrabronchial instillation of sars-cov- into crab-eating macaques (macaca fascicularis) results in mild-to-moderate lung abnormalities qualitatively characteristic of subclinical or mild-to-moderate covid- (e.g., ground-glass opacities with or without reticulation, paving, or alveolar consolidation, peri-bronchial thickening, linear opacities) at typical locations (peripheral>central, posterior and dependent, bilateral, multi-lobar). we then used positron emission tomography (pet) analysis to demonstrate increased fdg uptake in the ct-defined lung abnormalities and regional lymph nodes. pet/ct imaging findings appeared in all macaques as early as days post-exposure, variably progressed, and subsequently resolved by – days post-exposure. finally, we applied operator-independent, semi-automatic quantification of the volume and radiodensity of ct abnormalities as a possible primary endpoint for immediate and objective efficacy testing of candidate medical countermeasures. finch et al. ). however, in both human disease and animal models, the temporal and mechanistic relationship between viral replication, subsequent immunopathology ( , ) , and clinical disease remains uncertain. furthermore, in the available nhp models, all of which are sublethal, markers of clinical disease (cage-side scoring, chest x-ray) have been of limited sensitivity. more concerningly, both metrics are subject to observer bias ( - ). reliable animal models needed for rapid development and evaluation of candidate medical countermeasures (mcms) require an unbiased reproducible and quantifiable metric of disease that mirrors key aspects of covid- . 
based on the rather limited x- ray findings in the lungs of reported nhp models of sars-cov- infection with either mild or no clinical signs ( , , - ), we turned to high-resolution chest ct and pet/ct to characterize lung abnormalities in infected nhps toward longitudinal quantitative comparison. we used direct bilateral primary intrabronchial instillation in a -day-apart staggered design to expose two groups of three crab-eating macaques (macaca fascicularis) to medium (mock group macaques m - ) or medium including . x pfu/macaque of sars-cov- (virus group macaques v - ) (supplementary table ) . all macaques were observed daily for days prior to exposure (day [d] ) and for days post-exposure. physical examination scores and blood, conjunctival, nasopharyngeal, oropharyngeal, rectal, fecal, and urine specimens were collected at identical timepoints. virus-exposed macaques were indistinguishable from mock group macaques during the pre-exposure time period. two pre-exposure chest ct and whole- previously published results ( , ( ) ( ) ( ) ( ) ( ) , none of the macaques developed any major clinical abnormalities (including by cage-side assessment and clinical scoring or physical examination) throughout the study and clinical laboratory results were not significantly different between the mock-exposed and virus-exposed groups (supplementary tables - ) . sars-cov- rna could not be detected by rt-qpcr in any sample from mock- exposed macaques but was variably present during the early days post-exposure in conjunctival, fecal, nasopharyngeal, oral, and rectal swabs, but never in plasma or urine (figure a) . anti-sars-cov- igg antibodies were not detectable by elisa in mock- exposed macaques but were detectable at d post-exposure and continued to rise in all virus-exposed macaques to at least d (figure b) . 
consistent with elisa results, fluorescent neutralization titers generated from sera were undetected until d and were detected only in virus-exposed macaques (figure c) . longitudinal measurement of selected peripheral cytokines revealed between-and within-group differences with marked abnormalities noted in macaque v , which also had the highest igg antibody titers (supplementary figure ) . with the exception of minor and transient abnormalities on baseline imaging, ct scans of all mock-exposed macaques appeared generally normal over the entire study period (supplementary figure ) . however, all virus group macaques developed lung abnormalities clearly visible by chest ct as early as d . qualitatively, the distribution morphology, and duration of abnormalities described a spectrum similar to mild- moderately ill humans with covid- . characteristic ct findings in all virus group macaques included bilateral peripheral ggos variably in association with intra-or interlobular septal prominence (so-called "crazy paving"), reticular or reticulonodular sars-cov- pulmonary abnormalities in macaques finch et al. opacities, peri-bronchial thickening, subpleural nodules, and, in one macaque, dense alveolar consolidation with air bronchograms (figures - a, videos - increased fdg uptake detected by pet (figure , supplementary figures - ) corresponded well to the structural changes in the lungs observed by ct, and regional lymph node uptake was seen in all virus group macaques at d . in macaque v , fdg uptake decreased in the lungs at d but increased in mediastinal lymph nodes, and new fdg uptake was identified in the spleen. the two macaques (v , v ) with persistent or progressive structural abnormalities on chest ct had variable changes in fdg uptake associated with the structural abnormalities in the lungs (some markedly increased, some improved) with an accompanying marked increase in fdg uptake in regional lymph nodes and spleens on d . 
pet scan on d revealed normalization of previous areas of increased fdg uptake in the lung parenchyma in all three virus-exposed macaques, and persistent increased fdg uptake in regional lymph nodes and spleen. mock-exposed macaques did not have similar increased fdg uptake with the exception of transient increased uptake in regional lymph nodes after mock-exposure in a single macaque (m ). quantification of the suvmax in selected regions of interest (roi) in the lung, specific regional lymph nodes, and spleen mapped well to the qualitative findings in both mock-exposed and virus-exposed macaques (figure ).

ct images can be used for quantification of lung abnormalities using measures of volume or radiodensity, i.e., total lung volume (lv); average radiodensity in the total lung volume (ld); hyperdense volume (hv), a volume of lung in which radiodensity (hounsfield units, hu) is above a pre-defined threshold; and average radiodensity in the hyperdense volume (hyperdensity, hd). normalized changes from a pre-exposure baseline can be longitudinally measured as the percent change in the volume of lung hyperdensity (pclh). toward standardization across lung volumes, pclh can also be expressed as a percent of total lung volume (pclh/lv). increases in pclh or pclh/lv were not seen in the mock-exposed macaques over the entire study (figure a).

a key advantage of a quantifiable chest ct imaging readout over serial euthanasia studies, in addition to potentially reduced experimental animal numbers, is the ability not only to evaluate between-group differences, but also to compare severity and duration of disease at higher resolution in single animals and even in isolated parenchymal areas sequentially. this approach can reduce the error inherent in cross-sectional sampling of individual animals at single timepoints. imaging does, however, introduce its own experimental complexities and limitations.
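The CT readouts defined above (lv, hv, pclh, pclh/lv) can be illustrated with a minimal sketch. The text does not spell out the exact formulas, so the versions below are assumptions: pclh is taken as the change in hyperdense volume relative to the baseline hyperdense volume, and pclh/lv as the same change divided by the total lung volume at follow-up. The function names, the toy voxel values, and the threshold of -500 HU are hypothetical and are not the study's values.

```python
def lung_volume(lung_hu, voxel_volume_ml):
    """Total lung volume (mL): number of segmented lung voxels times voxel volume."""
    return len(lung_hu) * voxel_volume_ml


def hyperdense_volume(lung_hu, voxel_volume_ml, threshold_hu):
    """Volume (mL) of lung voxels whose radiodensity exceeds threshold_hu.

    lung_hu is a list of Hounsfield-unit values for voxels inside the lung
    segmentation; producing that segmentation is outside this sketch.
    """
    return sum(1 for hu in lung_hu if hu > threshold_hu) * voxel_volume_ml


def pclh(hv_baseline_ml, hv_followup_ml):
    """Percent change in hyperdense volume relative to baseline (assumed formula)."""
    if hv_baseline_ml <= 0:
        raise ValueError("baseline hyperdense volume must be positive")
    return 100.0 * (hv_followup_ml - hv_baseline_ml) / hv_baseline_ml


def pclh_per_lv(hv_baseline_ml, hv_followup_ml, lv_ml):
    """Change in hyperdense volume as a percent of total lung volume (assumed formula)."""
    if lv_ml <= 0:
        raise ValueError("lung volume must be positive")
    return 100.0 * (hv_followup_ml - hv_baseline_ml) / lv_ml


# Toy example: four lung voxels of 0.5 mL each, hypothetical threshold of -500 HU.
THRESHOLD_HU = -500  # illustrative only; the study's threshold is not given here
baseline = [-800, -750, -300, -820]
followup = [-800, -200, -300, -100]
hv0 = hyperdense_volume(baseline, 0.5, THRESHOLD_HU)
hv1 = hyperdense_volume(followup, 0.5, THRESHOLD_HU)
lv1 = lung_volume(followup, 0.5)
```

Dividing by the follow-up lung volume rather than the baseline lung volume is a design choice made here for illustration; either could be meant by the text.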
as we aimed to evaluate whether pclh (or other ct imaging readouts presented in figure ) could be a useful quantitative readout for radiographic progression in the sars-cov- infected lung, we chose not to include irradiated inactivated sars-cov- in the mock inoculum to avoid antigen-induced inflammation and related radiographic changes. for similar reasons, and to avoid artificial dissemination of sars-cov- , we specifically did not perform bronchoalveolar lavage (bal) to obtain lung samples for downstream cellular, molecular, and virologic analysis ( , ) and did not perform lung biopsies. the frequency of anesthesia and instrument availability pragmatically limit imaging to carefully chosen timepoints during a study. in particular, the extended time required to perform pet imaging limited the number of macaques that could be included in the study. finally, with complete resolution of radiographic abnormalities by the end of the study period, we opted not to euthanize these macaques so that a re-exposure study can be performed in the future. thus, we cannot correlate radiographic with histopathologic findings. future studies should extend our initial findings in several directions. first, follow-up confirmation of these pilot results in this model of mild-moderate covid- is needed to further establish quantifiable lung ct as a reliable disease readout and to forge imaging-pathologic correlates in macaques euthanized at peak radiographic abnormality. confirmation should enable proof-of-concept evaluation of whether a candidate mcm will indeed significantly decrease peak or auc of pclh or pclh/lv compared to untreated infected control macaques. data from additional macaques will be used to confirm the sensitivity and relevance of the auc - and auc - for pclh or pclh/lv as robust measures of lung changes from ct evaluation.
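The peak and auc summaries mentioned above can be sketched as follows. The text does not say how the area under the curve is computed, so the trapezoidal rule used here is an assumption, and the function names and the toy time course are hypothetical.

```python
def auc_trapezoid(days, values):
    """Area under a longitudinal readout by the trapezoidal rule (assumed method).

    days and values are equal-length sequences; points are sorted by day first.
    """
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two (day, value) pairs of equal length")
    points = sorted(zip(days, values))
    area = 0.0
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        area += 0.5 * (v0 + v1) * (d1 - d0)
    return area


def peak(days, values):
    """Return (day, value) of the largest readout."""
    return max(zip(days, values), key=lambda pair: pair[1])


# Toy pclh time course for one animal (illustrative numbers, not study data).
toy_days = [0, 2, 6, 12]
toy_pclh = [0.0, 10.0, 30.0, 0.0]
toy_auc = auc_trapezoid(toy_days, toy_pclh)
toy_peak = peak(toy_days, toy_pclh)
```

Restricting the inputs to a window of days gives a windowed auc; a treated group could then be compared with untreated controls on either summary.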
in parallel, disease severity could possibly be increased in the crab-eating macaque model by optimizing delivery of sars-cov- to the most vulnerable lung (via aerosol or more distal bronchoscopic delivery), with the ultimate goal of using the ct-quantifiable volume or radiodensity readouts to model the sick hospitalized human. other groups are already evaluating nhps of diverse species as possible covid- models. in these models, serial chest ct imaging after intrabronchial instillation of sars-cov- could be used to establish a meaningful and quantifiable covid- -like disease readout that will enable objective evaluation of medical countermeasures and also a comparison of sars-cov- -induced lung abnormalities in different nhp models.

the macaques were split into groups of animals each (supplementary table ). mock group (m) macaques received ml of dmem + % heat-inactivated fbs into each bronchus by direct bilateral primary post-carinal intrabronchial instillation, followed by a -ml normal saline flush and then ml air. virus group (v) macaques were exposed the same way with each -ml instillate containing . x pfu/ml (i.e., a total exposure dose of . x pfu) of sars-cov- followed by a -ml saline flush and then ml air. all macaques were sedated prior to instillation. prior to administering anesthesia, glycopyrrolate ( . mg/kg) was delivered intramuscularly to reduce saliva secretions. next, each macaque received mg/kg ketamine and then μg/kg dexmedetomidine intramuscularly. to reverse anesthesia, . mg/kg atipamezole was administered intravenously. all macaques were evaluated daily for health and were periodically table ). cage-side and physical exam scoring criteria were developed in collaboration with national primate research centers (nprcs) to standardize disease assessment and compare disease outcomes between nhp models.
heart rate was not incorporated into the physical exam scores until d because heart rate score was determined as beats per minute over baseline. baseline heart rate was determined as an average over three timepoints, d- , d- /d- , and d (except for normalized with baseline density/radiodensity difference with baseline a new coronavirus associated with human respiratory disease in a pneumonia outbreak associated with a new coronavirus of probable bat origin a novel coronavirus from patients with pneumonia in china world health organization, coronavirus (covid- clinical course and outcomes of critically ill patients with sars- cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study clinical characteristics of coronavirus disease in china clinical findings in a group of patients infected with the novel coronavirus (sars-cov- ) outside of wuhan, china: retrospective case series covid- in critically ill patients in the seattle region -case series clinical features of patients infected with novel coronavirus in wuhan sars-cov- pulmonary abnormalities in macaques finch et al. the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application age-related rhesus macaque models of covid- covid- ): a critical review of the literature to date epidemiological and clinical characteristics of asymptomatic sars-cov- carriers incidental findings suggestive of covid- in asymptomatic patients undergoing nuclear medicine procedures in a high prevalence region histopathologic changes and sars-cov- immunostaining in the lung of a patient with covid- pathological findings of covid- associated with acute respiratory distress syndrome multiscale -dimensional pathology findings of covid- diseased lung using high-resolution cleared tissue microscopy. biorxiv pulmonary and cardiac pathology in covid- : the first autopsy series from new orleans. 
medrxiv autopsy findings and venous thromboembolism in patients with covid- : a prospective cohort study sars-cov- pulmonary abnormalities in macaques finch et al. the pathogenicity of sars-cov- in hace transgenic mice susceptibility of ferrets, cats, dogs, and other domesticated animals to infection and rapid transmission of sars-cov- in ferrets simulation of the clinical and pathological manifestations of coronavirus disease (covid- ) in golden syrian hamster model: implications for disease pathogenesis and transmissibility susceptibility of tree shrew to sars-cov- infection. biorxiv ocular conjunctival inoculation of sars-cov- can cause mild covid- in rhesus macaques. biorxiv comparative pathogenesis of covid- , mers, and sars in a nonhuman primate model infection with novel coronavirus (sars-cov- ) causes pneumonia in the rhesus macaques comparison of sars-cov- infections among species of non- human primates. biorxiv respiratory disease in rhesus macaques inoculated with sars-cov- sars-cov- pulmonary abnormalities in macaques finch et dysregulation of immune response in patients with covid- in covid- : immunopathology and its implications for therapy frequency and distribution of chest radiographic findings in covid- positive patients interobserver reliability of the chest radiograph in community-acquired pneumonia portable chest x-ray in coronavirus disease- (covid- ): a pictorial review chest x-ray findings in ambulatory patients with covid- presenting to an urgent care center: a normal chest x-ray is no guarantee bias in radiology: the how and why of misses and misinterpretations error and discrepancy in radiology: inevitable or avoidable? insights imaging observer bias in lung nodule detection with spiral ct interpretive error in radiology sars-cov- pulmonary abnormalities in macaques finch et al. 
cytokine release syndrome in severe covid- on the alert for cytokine storm: immunopathology in covid- clinical and immunological features of severe and moderate coronavirus disease clinical predictors of mortality due to covid- based on an analysis of data of patients from wuhan risk factors for severity and mortality in adult covid- inpatients in wuhan bronchoalveolar lavage affects computed tomographic and radiographic characteristics of the lungs in healthy dogs inactivation and safety testing of middle east respiratory syndrome coronavirus panmicrobial oligonucleotide array for diagnosis of infectious diseases sars-cov- pulmonary abnormalities in macaques finch et influenza a and methicillin-resistant staphylococcus aureus co-infection in rhesus macaques -a model of severe pneumonia the nonhuman primate in nonclinical drug development and safety assessment time course of lung changes on chest ct during recovery from novel coronavirus (covid- ) pneumonia mass preserving image registration for lung ct d slicer as an image computing platform for the quantitative imaging network a rhesus macaque model of asian-lineage zika virus infection anti-sars-cov- s subunit igg elisa key: cord- -umcbulcw authors: martínez-murcia, antonio; bru, gema; navarro, aaron; ros-tárraga, patricia; garcía-sirera, adrián; pérez, laura title: in silico design and validation of commercial kit gps™ covid- dtec-rt-qpcr test under criteria of une/en iso : and iso/iec : date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: umcbulcw background the corona virus disease (covid- ), caused by severe acute respiratory syndrome coronavirus (sars-cov- ), has become a serious infectious disease affecting human health worldwide and rapidly declared a pandemic by who. early, several rt-qpcr were designed by using only the first sars-cov- genome sequence. 
objectives a few days later, when additional sars-cov- genomes were retrieved, the kit gps™ covid- dtec-rt-qpcr test was designed to provide a highly specific detection method and was made commercially available worldwide. the kit was validated following criteria recommended by the une/en iso : and iso/iec : .
methods the present study assessed the in silico specificity of the gps™ covid- dtec-rt-qpcr test and of the rt-qpcr designs currently published. the empirical validation parameters specificity (inclusivity/exclusivity), quantitative phase analysis ( - copies), reliability (repeatability/reproducibility) and sensitivity (detection/quantification limits) were evaluated for a minimum of - assays. diagnostic validation was achieved by two independent reference laboratories, the instituto de salud carlos iii (isciii; madrid, spain) and public health england (phe; colindale, london, uk).
results the gps™ rt-qpcr primers and probe showed the highest number of mismatches with the closest related non-sars-cov- coronavirus, including some indels. the kits passed all parameters of validation with strict acceptance criteria. results from reference laboratories % correlated with those obtained by using reference methods and received an evaluation with % of diagnostic sensitivity and specificity.
conclusions the gps™ covid- dtec-rt-qpcr test, available with full analytical and diagnostic validation, represents a case of efficient transfer of technology being successfully used since the pandemic was declared.

the analysis suggested the gps™ covid- dtec-rt-qpcr test is the most exclusive by far. in order to illustrate the extent of mismatching of the gps™ kit, an alignment of primers/probe sequences to selected sars-cov sequences is shown in table .
standard calibration curves of the qpcr were performed from ten-fold dilution series (figure a only three months ago, an outbreak of severe pneumonia caused by the novel coronavirus sars- cov- started in wuhan (china) and rapidly expanded to almost all areas worldwide. due to the need of urgent detection tools, several laboratories developed rt-qpcr methods by designing primers and probes from the alignment of a single-first provided sars-cov- genome sequence to known sars-cov, and some of these protocols were published at the who website [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . as the number of genomes available rapidly expanded during last january, the gps™ covid- dtec-rt-qpcr test was based on a more specific target for sars-cov- detection, being this company one of the pioneers marketing a pcr-kit for the covid- worldwide. mismatches which may affect to its binding, particularly considering its short primary structure. in some cases, single nucleotide mismatching was observed in some primers, but none of them were located close to primer '-end. considering all updated alignments, only the australia/vic / sequence showed a unique mismatch to the gps™ probe. therefore, a full calibration was run using synthetic rna-genomes from australia/vic / isolate and the resulting ct values the in silico analysis for exclusivity was more complex, showing a wide range of discriminative power for the methods subjected to analysis (table ) . for instance, the two rt-qpcr designs ip and ip developed by institut pasteur seems to discriminate well between sars-cov- and other respiratory virus as confirmed for a panel of specimens [ ] . the cdc from atlanta (usa) the n primer/probe, but a few weeks ago, this set was removed from the panel cov- and sars-cov, p probe was considered specific for sars-cov- . although our in silico results confirmed that purpose for p (table ) , the rdrp_p assay may also react with some other coronaviruses. 
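A standard calibration curve from a ten-fold dilution series, as mentioned above, is conventionally fitted as a straight line of Ct against log10 of the template copy number, with amplification efficiency derived from the slope as 10^(-1/slope) - 1. The sketch below shows that conventional calculation; it is not taken from the kit's documentation, and the function name and the synthetic Ct values are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared, efficiency), where efficiency is
    10 ** (-1 / slope) - 1 (1.0 corresponds to perfect doubling per cycle).
    """
    if len(copies) != len(ct_values) or len(copies) < 3:
        raise ValueError("need at least three (copies, Ct) pairs of equal length")
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r_squared, efficiency


# Synthetic ten-fold dilution series (illustrative values, not kit data).
toy_copies = [10 ** k for k in range(1, 7)]
toy_ct = [40.0 - 3.32 * k for k in range(1, 7)]
slope, intercept, r_squared, efficiency = fit_standard_curve(toy_copies, toy_ct)
```

A slope near -3.32 corresponds to an efficiency near 100%; acceptance limits for slope, efficiency, and R² depend on the validation standard being applied.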
the cdc in china developed two rt-qpcr assays for orf ab and n genes (table ). the analysis standard curve was repeated a minimum of ten times and average value for all parameters were optimum according to standard limits. for reliability, the coefficient of variation (cv) obtained in all cases for both, repeatability and reproducibility, was always much lower than %. the lod was tested with the usual protocol for copies repeated times with a positive result in all cases ( %). loq assays were performed in two sets of tests for both copies of standard templates. the loq measurement in both cases was validated with a t-student test with a confidence interval of %. the kit received diagnostic validation at two different reference laboratories (isciii, madrid; and phe, london). the results shown in table indicated % of diagnostic sensitivity and % of diagnostic specificity was assigned. currently, the kit is being used in several spanish hospitals and diagnostic laboratories. representative calibration curve with stats. inclusivity of the gps™ covid- dtec-rt-qpcr test using six ranges of decimal dilution from · copies to · copies, and negative control. genomic variance of the -ncov coronavirus the first disease x is caused by a highly transmissible acute respiratory syndrome coronavirus . virol sin severe acute respiratory syndrome coronavirus (sars-cov- ) and coronavirus disease- (covid- ): the epidemic and the challenges evolutionary history, potential intermediate animal host, and cross-species analyses of sars-cov- genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a pneumonia outbreak associated with a new coronavirus of probable bat origin real-time pcr in virology identification of a novel coronavirus in patients with nat-inst-health-t.pdf? 
methods for novel coronavirus (n-cov- ) in japan the neighbor-joining method: a new method for reconstructing phylogenetic trees evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony conformity assessment. general requirements for the competence of testing and calibration laboratories medical laboratories -requirements for quality and competence full-genome evolutionary analysis of the novel corona virus ( -ncov) rejects the hypothesis of emergence as a result of a recent recombination event systematic comparison of two animal-to-human transmitted human coronaviruses: sars-cov- and sars-cov probable pangolin origin of sars-cov- associated with the covid- outbreak sars-cov- : an emerging coronavirus that causes a global threat genetic evolution analysis of novel coronavirus and coronavirus from other species comparative analysis of primer-probe sets for the laboratory confirmation of sars-cov- comparative performance of sars-cov- detection assays using seven different primer/probe sets and one assay kit key: cord- -av r it authors: liu, zhuoming; vanblargan, laura a.; rothlauf, paul w.; bloyet, louis-marie; chen, rita e.; stumpf, spencer; zhao, haiyan; errico, john m.; theel, elitza s.; ellebedy, ali h.; fremont, daved h.; diamond, michael s.; whelan, sean p. j. title: landscape analysis of escape variants identifies sars-cov- spike mutations that attenuate monoclonal and serum antibody neutralization date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: av r it although neutralizing antibodies against the sars-cov- spike (s) protein are a goal of most covid- vaccines and being developed as therapeutics, escape mutations could compromise such countermeasures. to define the immune-mediated mutational landscape in s protein, we used a vsv-egfp-sars-cov- -s chimeric virus and neutralizing monoclonal antibodies (mabs) against the receptor binding domain (rbd) to generate escape mutants. 
these variants were mapped onto the rbd structure and evaluated for cross-resistance by convalescent human plasma. although each mab had unique resistance profiles, many shared residues within an epitope, as several variants were resistant to multiple mabs. remarkably, we identified mutants that escaped neutralization by convalescent human sera, suggesting that some humans induce a narrow repertoire of neutralizing antibodies. by comparing the antibody-mediated mutational landscape in s protein with sequence variation in circulating sars-cov- strains, we identified single amino acid substitutions that could attenuate neutralizing immune responses in some humans. control of the sars-cov- pandemic likely will require the deployment of multiple this data and functional approach may be useful for monitoring and evaluating the emergence of escape from antibody-based therapeutic and vaccine countermeasures as they are deployed. to select for sars-cov- s variants that escape neutralization, we used vsv-sarsas we isolated this mutation alone, and acquisition of the l r substitution appeared to increase viral fitness as judged by plaque morphology (fig s ) . for sars - , s n was isolated as a single variant suggesting that this substitution arose first, however acquisition of the s f did not alter plaque morphology (fig s ) . as the l r or s f substitutions were not identified alone, it remains unclear whether they cause resistance to h or sars - respectively. collectively, these results show that escape mutational profiling can identify key epitopes and dominant antigenic sites. escape mutants confer cross-resistance to multiple mabs. we next evaluated whether individual mutants could escape neutralization by the other inhibitory mabs in the panel. we tested the identified escape mutants for neutralization by ten different mabs. 
we defined the degree of resistance as a percentage by expressing the number of plaques formed by each mutant in the presence of antibody versus its absence. we plotted the degree of resistance to neutralization as a heat map and arbitrarily set % as the than one mab, with substitutions at s and e exhibiting broad resistance (fig ) (fig a and s ). as observed with chimeric viruses expressing the wild-type s protein, all of the escape mutants were inhibited by hace -fc but not mace -fc. however, the extent of neutralization by hace -fc varied (fig a) , with some mutants more sensitive to receptor inhibition and others exhibiting relative resistance. substitutions at residues r , a , n , s , s and p were more sensitive to inhibition by soluble hace than the wild-type s as evidenced by reduced ic values (fig a) and leftward shifts of the inhibition curves (fig s ) . this effect was substitution- dependent as n k was -fold more sensitive to hace than n y (p < . ). several inhibition at the highest concentration ( µg/ml) of hace -fc tested (fig a and b) . residue initial dilution) of sera tested. all four of the substitutions at residue e were resistant to each of the four sera, suggesting that this is a dominant neutralizing epitope. indeed, change at e substitutions (k e, g v, l r and f s) resulted in resistance to neutralization of sera , and (fig a and s ). substitutions n d and n y but not n k were resistant to sera and . sera and also did not efficiently neutralize s g, l r, and t i. all four sera neutralized the single substitution s n as well as wild-type virus (fig a-b) . substitution s n was sensitive to neutralization by sera and except in the presence of a second s f substitution (fig a and s ) . additional amino acid substitutions that conferred resistance to serum include t s and g d. substitution f s, which altered sensitivity to soluble ace , escaped neutralization by serum but not , or . 
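the degree of resistance defined above (plaques formed in the presence of antibody, expressed as a percentage of plaques formed in its absence) can be sketched as follows. this is a minimal illustration, not the authors' analysis code: the function names, the input layout, and the `cutoff` argument are assumptions, and the cutoff value is left to the caller because it is elided in the text.

```python
def percent_resistance(plaques_with_antibody, plaques_without_antibody):
    """Plaques formed with antibody as a percentage of plaques formed without it."""
    if plaques_without_antibody <= 0:
        raise ValueError("the no-antibody control must contain at least one plaque")
    return 100.0 * plaques_with_antibody / plaques_without_antibody


def resistance_table(counts):
    """Build a mutant-by-antibody table of percent resistance.

    counts (hypothetical layout): {mutant: {mab: (with_antibody, without_antibody)}}
    """
    return {
        mutant: {mab: percent_resistance(w, wo) for mab, (w, wo) in per_mab.items()}
        for mutant, per_mab in counts.items()
    }


def is_resistant(percent, cutoff):
    """Flag a value as resistant; the cutoff is supplied by the caller (assumption)."""
    return percent >= cutoff
```

the resulting nested dictionary has the same shape as a heat map: one row per mutant and one column per antibody.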
thus, individual escape mutants can exhibit resistance to neutralization by polyclonal human convalescent sera. this observation suggests that the repertoire of antigenic sites on rbd that bind high titer neutralizing antibodies is limited in some humans. comparison of escape mutants with s sequence variants isolated in humans. to broaden our analysis, we performed a second campaign of escape mutant selection compiled all publicly available genome sequences of sars-cov- . using , genomes from gisaid, we calculated the substitution frequencies throughout the rbd protein (fig a) and mapped the identified residues onto the rbd structure (fig b). of the escape variants we selected, are present in circulating human isolates of sars-cov- (fig a). the most frequent s sequence variant seen in clinical isolates is d g which is present in % of sequenced isolates. the second most frequent substitution is s n, which is present in . % the mutations we selected also inform the mechanism by which the different antibodies structural studies on the mechanism by which b and h neutralize sars-cov- support inhibition by directly competing with ace binding and an indirect mechanism, respectively. direct competition with ace binding is consistent with the escape mutants we selected with b , and an indirect mechanism fits with the escape mutants we identified to h . 
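the substitution-frequency calculation mentioned above can be illustrated with a short sketch. it assumes the protein sequences are already aligned to a reference of equal length; the function name, the treatment of gaps and ambiguous residues, and the toy sequences are assumptions rather than the authors' pipeline.

```python
from collections import Counter


def substitution_frequencies(reference, sequences):
    """Return {(position, ref_residue, alt_residue): fraction of sequences}.

    Positions are 1-based. Gap ('-') and ambiguous ('X') residues are ignored,
    which is an assumption made for this sketch.
    """
    counts = Counter()
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference length")
        for pos, (ref_res, alt_res) in enumerate(zip(reference, seq), start=1):
            if alt_res != ref_res and alt_res not in "-X":
                counts[(pos, ref_res, alt_res)] += 1
    total = len(sequences)
    return {key: n / total for key, n in counts.items()}
```

for example, with the toy reference `"NLSE"` and four aligned sequences of which two carry `N` at position 3, the entry `(3, "S", "N")` has frequency 0.5.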
the viral mutation rates structural basis of receptor recognition by sars-cov- coronaviruses lacking exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics coronaviruses as dna wannabes: a new model for the regulation of rna virus replication fidelity escape from neutralizing antibodies by sars-cov- spike protein variants cryo-em structure of the -ncov spike in the prefusion conformation a noncompeting pair of human neutralizing antibodies block covid- virus binding to its receptor ace potently neutralizing and protective human antibodies against sars-cov- this study was supported by nih contracts and grants ( n c and r key: cord- - edsbfc authors: cox, brian j. title: integration of viral transcriptome sequencing with structure and sequence motifs predicts novel regulatory elements in sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: edsbfc in the last twenty years, three separate coronaviruses have left their typical animal hosts and become human pathogens. an area of research interest is coronavirus transcription regulation that uses an rna-rna mediated template-switching mechanism. it is not known how different transcriptional stoichiometries of each viral gene are generated. analysis of sars-cov- rna sequencing data from whole rna transcriptomes identified trs dependent and independent transcripts. integration of transcripts and ’-utr sequence motifs identified that the pentaloop and the stem-loop were also located upstream of spliced genes. trs independent transcripts were detected as likely non-polyadenylated. additionally, a novel conserved sequence motif was discovered at either end of the trs independent splice junctions. while both sars viruses generated similar trs independent transcripts, they were more abundant in sars-cov- . trs independent gene regulation requires investigation to determine its relationship to viral pathogenicity. 
coronavirus, sars, covid , sequence motifs, rna folding, transcription, rna-sequencing, splicing, template switching while coronaviruses (cov) are endemic to much of the world, many are relatively harmless, contributing along with other viral species to the common cold in humans (coleman and frieman, ). however, three recent events of coronaviruses jumping hosts have caused outbreaks of lethal pulmonary infections. the sars subtype infected thousands of people and killed or nearly % of all infected individuals. mers was more harmful but spread poorly between individuals (coleman and frieman, ). the most recent event covid- (formerly -ncov) began late and is currently ongoing with millions infected and s of thousands of deaths. also, the preventative measures to reduce the spread of the virus are causing global economic dysfunction. intense global efforts are ongoing to develop vaccines and drug treatments. an assessment of the regulation and expression of viral genes is needed to assist in the development of diagnostics, vaccines and pharmaceutical interventions. the coronavirus is a family of positive single-stranded rna viruses that use only rna to generate if all sgmrna are polyadenylated. some rnas, especially non-coding rnas, generally lack a poly-a tail, which could explain poor detection of trs independent sgmrnas. i aligned published sequencing data of ribosome depleted total rna from sars-cov- infected cells and animals (blanco-melo et al., ) against the viral genome (figure ). using the aligned reads, i generated transcript models using stringtie, which identified multiple spliced species that aligned with the trs-l templated events (figure ). trs-l transcripts aligned to the genes s, orf a, orf a and n. the s transcript also presented with a short deletion of bases. also, trs independent spliced transcripts were identified. non-canonical transcripts spliced from orf ab onto orf ab and from orf ab onto n at base . 
the dominant orf -n fusion product (bases , to , ) is predicted to have an open reading frame for part of orf ab that includes the leader protein (nsp ), nsp and approximately % of nsp . for nsp , this would create the near full-length protein, including the nucleic acid-binding domain (nar), but would remove the transmembrane domain. the fused n gene is out-of-frame and would not create a protein product. this same orf ab-n fusion was previously reported by one other group as one of the non-canonical sgmrna species in sars-cov- (kim et al., ). the stringtie quantification of the predicted spliced products showed that the relative transcript abundance of trs dependent sgmrnas was . % while that of trs independent sgmrnas was . %. of the trs independent sgmrnas, the orf ab-n spliced product represented . % of the predicted spliced rna, which was the fourth most abundant of the predicted sgmrna species, more than several of the trs-dependent sgmrnas. while this finding represents an approximately four times higher detection of trs independent sgmrnas than previously reported, the methods of quantification in these two studies are not directly comparable, which complicates the determination of the relative abundance. another caveat is that stringtie uses mammalian splicing rules, which would not apply to the splicing of viral rna that uses a template switch process. to better identify possible sgmrna that may be independent of poly-a, split reads aligned to the viral genome were extracted and quantified (table ). using this metric, trs dependent sgmrnas that spliced the 'trs-l onto trs-b sequences upstream of viral genes represented only % of the reads and matched the genes s, orf a, e, m, orf a, orf and n, identifying multiple trs dependent spliced products missed by stringtie. moreover, trs independent reads were identified with multiple sgmrnas mapping from six different regions of orf ab into two regions of the n gene (table ). 
to explore if trs independent transcripts are common in other human cov pathogens, i assessed sars-cov (josset et al., ; xiong et al., ) and mers (data: prjna ; publication not known) using previously published polya independent rna sequencing data sets that also used ribosome depletion of total rna. for sars-cov, i observed that trs dependent split junctions represented % of all mapped split reads (figure and table ). this is similar to my observations of sars-cov- , indicating that trs independent splicing is abundant and likely not polyadenylated. significantly, the orf ab-n fusion products were also observed in sars-cov but at lower levels (table ). in the mers virus, the trs independent split read events represented . % of all split reads (data not shown), indicating a higher prevalence of trs dependent sgmrnas compared to both of the sars-cov. as the sl conserved loop sequence is also known to be essential for viral sgmrna, i scanned for this sequence (figure c, d, f and g; blue box). for sgmrna dependent genes, the sl pentaloop sequence (ucuugu) was discovered only near the ' region often near the trs-b sequence. the pentaloop sequence was also identified in multiple locations within the orf ab gene locus. the trs is located in the stem of sl , but the loop sequence is also highly conserved in cov species. for these reasons, i next scanned the sars-cov- genome for matches to the sl (cuaaac) loop sequence (figure c, d, f and g; green box). a high correlation was noted in the presence of sl sequences near the trs sequence in the ' region of sgmrna dependent genes. as well, multiple sl motifs were observed within the orf ab open reading frames. there was no statistical relationship between the presence of sl or sl sequences and the level of sgmrna expression (data not shown). to determine if the trs, sl and sl are conserved in other cov, i repeated the search for these sequence motifs within the sars-cov genome (figure h-k). 
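the motif scans described above amount to locating every occurrence of a short sequence in the genome. the methods state that this was done in r with the biostrings package; the following python stand-in is only a sketch of the idea, and the function names and the toy sequence are assumptions.

```python
def to_rna(sequence):
    """Upper-case a nucleotide string and convert DNA 'T' to RNA 'U'."""
    return sequence.upper().replace("T", "U")


def find_motif(sequence, motif):
    """Return 1-based start positions of every (possibly overlapping) exact match."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, pattern = to_rna(sequence), to_rna(motif)
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start + 1)
        # advance by one base so overlapping matches are also reported
        start = seq.find(pattern, start + 1)
    return positions


def scan_motifs(sequence, motifs):
    """Map each named motif to its list of match positions."""
    return {name: find_motif(sequence, motif) for name, motif in motifs.items()}
```

a call such as `scan_motifs(genome, {"loop_a": "ucuugu", "loop_b": "cuaaac"})` returns the positions of both loop sequences quoted in the text; the dictionary keys are placeholders, and the scan covers only the given strand (the reverse complement would need a separate pass).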
while the trs and sl sequence frequencies and locations were similar between the two sars viruses (figure c while the presence of rna sequence motifs was not sufficient to explain sgmrna abundance, regulatory motifs may need to be located on folded structures such as sl to be active. next, i assessed the folded structures within the trs-b loci near sgmrna of trs dependent genes to determine if structure and sequence were related to expression levels (figure ). while a statistical inference is not possible, there was an interesting observation that the trs-b within the n gene locus folded to place the trs, sl and sl sequences on two adjacent stems. n was the only gene with this structure of the trs-b/sl /sl . only s and e were predicted to have the trs-b on a stem, but the sl and sl were unfolded. the sl sequence upstream of orf a and m was predicted to be on a folded stem, but the associated trs-b was not. given that the n gene is the most robustly expressed sgmrna ( - fold higher), the folded structures containing all three sequence elements may be a driver of high template switching. a high level of unexpected splicing within the large orf ab gene was noted in multiple locations, and all of these formed fusions within the n gene (figure orf ab locus or the n gene near the splice locations. however, within the orf ab locus, multiple sl and sl sequences were identified, but these were not observed within the n gene. an inspection of the orf ab splice regions with the n splice regions did identify a novel sequence element (attggc). searching the sars-cov- genome for this element revealed it to be present at / orf ab splice starts, and both n splice ends, including the abundant orf ab (base , ) to n gene (base , ) fusion sgmrna (figure f and g; black boxes). the novel sequence was also found near sl and sl motifs in the orf ab. significantly, the novel motif was not observed in the 'utr or near trs-b sequences of genes. 
i called this motif the trs independent regulatory (tir) motif. next, i searched for the tir motif associated with trs-independent sgmrnas within the sars-cov genome (figure l-n). i observed that the tir sequence was identified in sars-cov at locations similar to those observed in sars-cov- (figure f, g, m and n). the tir sequence was found near the splice locations of the orf ab-n fusion in sars-cov (figure m). this suggests a level of conservation for the tir motif. mers showed poor conservation of the sl motif and the tir motif (data not shown). the trs mechanism of sgmrna generation is characterized in cov species other than the two sars viruses. i show that the folded structure of the '-utr of sars-cov- is predicted to be similar to sars-cov and even mers. the high conservation indicates that the trs mechanism is likely used in sars-cov- . a recent report highlighted the presence of trs independent sgmrnas and the role that methylation signalling may play in marking these transcripts. in this current work, i draw attention to the possible difference in abundance of the trs independent sgmrnas when using total versus poly-a enrichment for library construction. many rnas do not contain poly-a sequences, specifically those that function as non-coding rnas (zhang et al., ). i identified trs-independent transcripts as more abundant in total rna compared to poly-a rna. the lack of poly-a in trs independent sgmrnas suggests that they may function as non-coding rnas. potential functions of the non-coding rnas are to inhibit cellular antiviral responses or to amplify the viral genome and sgmrna by priming rna-rna interactions and the viral polymerase. methods such as stringtie performed poorly at identifying sgmrnas; using split reads instead gave better performance. using the depth of reads mapped to the junctions provided quantification of the abundance of trs dependent and independent reads. 
quantification of junctions revealed that the sars family of cov had a higher abundance of the trs independent sgmrnas compared to mers. viral sgmrna exist in different proportions that are relative to the protein's proportion needed for viral particle assembly (kim et al., ; sola et al., ) . these different stoichiometries may indicate that there is a higher preference for some template switch regions leading to the increased production of these sgmrnas over others. to regulate the abundance of sgmrna transcripts, the trs likely does not work alone in priming the template switch. to investigate possible combinations of other regulatory motifs, i used the sl pentaloop and the sl loop upstream of the trs-l that are part of the '-utr regulatory regions. i noted the presence of the sl loop motif upstream of genes that generate sgmrnas. i also found evidence of an expanded trs signal that includes the loop of stem-loop . while i found no evidence for a correlation of motifs and expression levels, there did appear to be a relationship between the predicted folded structures of the ' regions of the viral gene and the regulatory sequences. this relationship was most apparent for the n gene, where the trs, sl and sl were all located on two different stems, which was not an arrangement observed on the other viral genes. this folded arrangement may increase the n gene's affinity for template switching to the '-utr through higher affinity rna-rna interactions. other regulatory rna elements may be present that explain sgmrna abundance variation. alternatively, variation in transcript stability may cause apparent differential expression. in a comparison of sars-cov and sars-cov- , the sl and sl motifs are preferentially located near trs-b sequences and not interior to the open reading frames. the exception is orf ab, that while lacking trs motifs, had a high number of sl motifs in both species. 
interestingly, the sl motif frequency was markedly reduced (> %) within the orf ab region of sars-cov compared to sars-cov- . split reads also identified trs-independent sgmrnas. in support of my observations on sars-cov- , i also assessed transcriptional data from sars-cov and mers, two other human pathogenic cov. significantly, all three of these viruses produce trs independent transcripts, indicating that this is a broader phenomenon of cov and not species-specific. i also identified the tir motif, a novel sequence element (attggc) flanking the spliced regions of trs independent sgmrnas in both sars-cov- and sars-cov. the novel sequence element is absent from the 'utr and from regions near trs-b sequences. i observed that while sars-cov- and sars-cov both expressed orf ab-n fusions, sars-cov- had a higher level of expression relative to trs dependent sgmrnas. considering that fewer sl motifs were present in sars-cov relative to sars-cov- , this suggests that the sars-cov- virus may be evolving other regulation of sgmrnas. while sars-cov and sars-cov- are closely related, sars-cov- has infected more people in more countries. the increased expression of the trs-independent sgmrnas may provide a selective advantage to sars-cov- in human disease relative to sars-cov. mutagenic studies are needed to validate the tir motif's role in trs independent sgmrna generation. in conclusion, the novel regulatory sequence and confirmation of trs independent sgmrnas will lead to an improved mechanistic understanding of sgmrna production. the sgmrna mechanism could be drugged to inhibit viral transcription beyond targeting the polymerase. additionally, in all species, the poly-a region (blue) is predicted to be looped back to prime regions thought to be important in self-priming the viral rna directed rna polymerase. for comparison (a), the 'utr folded structure is shown with the location of the trs, sl and sl loops. 
the local predicted folded rna structures and the regulatory sequences are displayed with the sequence alignment depth of the split reads that map to each gene. of note is that only the n gene, with . to times more expression than all other transcripts, has all regulatory elements predicted to fold into stem-loop structures analogous to the ' utr. all viral genome sequences were obtained from ncbi, sars-cov- nc_ . , sars-cov nc_ . and mers nc_ . . all sequences were handled and processed in r using the biostrings package, including conversion of dna into rna, reverse complementation, extraction of subsequences and motif mapping. related sequences were found using the ncbi blast tool (johnson et al., ). data were visualized using ggplot (wickham, ). rna folding experiments were performed using the rna structure web server and stand-alone program (mathews, ) using default conditions. data sets were obtained from the short read archive (sars-cov- gse (blanco-melo et al., ); sars-cov prjna (josset et al., ; xiong et al., ); mers prjna (not linked to a pubmed article)). these included all cell lines, human samples and animal models infected with the virus. fastq files were deposited as processed files with adaptors trimmed and poor-quality sequences removed. the htseq (anders et al., ) program was used to generate the viral reference genome and for alignments of the fastq files. reads that did not align with the viral genome were discarded. the setting for alignment included -data to prepare for transcript assembly and reducing the non-canonical splice penalty to . aligned reads in sam format were converted into bam files using samtools. bam files were sorted and indexed using samtools. reads that split and represented splicing were extracted by using the cigar codes by an awk command to select specific reads that were split. aligned reads were visualized in the integrative genomics viewer (igv). 
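the split-read extraction described above was done with an awk command that is not given in the text. as a stand-in, the following python sketch selects sam records whose cigar string contains a skipped-region (`N`) operation and reports the reference coordinates of each skipped span. treating `N` as the marker of a split read is an assumption, as are the function names.

```python
import re

# one CIGAR operation: a length followed by an operation code from the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_junctions(sam_line):
    """Return (first_skipped_base, last_skipped_base) pairs, 1-based, for one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    ref_pos = int(fields[3])   # column 4: 1-based leftmost mapping position
    cigar = fields[5]          # column 6: CIGAR string
    junctions = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in "MDN=X":      # operations that consume reference bases
            ref_pos += length
    return junctions


def extract_split_reads(sam_lines):
    """Yield (read_name, junctions) for every aligned record that contains a skipped region."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue           # skip header and blank lines
        found = split_junctions(line)
        if found:
            yield line.split("\t")[0], found
```

counting the yielded junction pairs per coordinate gives a read depth for each junction, which is the kind of tally the split-read quantification relies on.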
bam files were converted into fastq files using the bam fastq function. converted fastq files were assembled into transcripts using stringtie. the gtf output files from stringtie were visualized in igv and extracted as a bed file with a column format. the bed file was used to extract the stranded fasta sequence from the reference genome. the ncbi orf finder tool was used to identify open reading frames in novel splice products. all code is available upon request. htseq-a python framework to work with high ggplot : elegant graphics for data analysis genomic profiling of collaborative cross founder mice infected with respiratory viruses reveals novel transcripts and infectionrelated strain-specific gene and isoform expression the structure and functions of coronavirus genomic ' and ' ends coronavirus leader rna regulates and initiates subgenomic mrna transcription both in trans and in cis gene expression profiling of nonpolyadenylated rna-seq across species key: cord- - loucvc authors: pipes, lenore; wang, hongru; huelsenbeck, john p.; nielsen, rasmus title: assessing uncertainty in the rooting of the sars-cov- phylogeny date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: loucvc the rooting of the sars-cov- phylogeny is important for understanding the origin and early spread of the virus. previously published phylogenies have used different rootings that do not always provide consistent results. we investigate several different strategies for rooting the sars-cov- tree and provide measures of statistical uncertainty for all methods. we show that methods based on the molecular clock tend to place the root in the b clade, while methods based on outgroup rooting tend to place the root in the a clade. the results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. 
we also show that none of the methods provide strong statistical support for the placement of the root in any particular edge of the tree. our results suggest that inferences on the origin and early spread of sars-cov- based on rooted trees should be interpreted with caution. sars-cov- , the virus causing covid- or 'severe acute respiratory syndrome,' has a single-stranded rna genome , nucleotides in length zhou et al., b) . the exact origin of the virus causing the human pandemic is unknown, but two coronaviruses isolated from bats -ratg isolated from rhinolophus affinis (zhou et al., a) and rmyn isolated from rhinolophus malayanus (zhou et al., b) , both from the yunnan province of china -appear to be closely related. after accounting for recombination, the divergence time between these bat viruses and sars-cov- is estimated to be approximately years [ % c.i. ( , ) ] and years [ % c.i. ( , ) ] , for ratg and rmyn respectively, using a strict clock, only the most closely related sequences, and only synonymous mutations, or years [ % hpd credible interval ( , )] for ratg (boni et al., ) tang et al., ; yu et al., ; zhang et al., ) , analyses that used midpoint rooting reached another placement (li et al., c, d; nie et al., ) , and yet other analyses using a molecular clock have reached a different placement of the root giovanetti et al., ; lemey et al., ; li et al., a) . in particular, there is considerable discrepancy between rootings based on rooting with the two closest outgroup sequences (fig a) , which has a rooting in clade a, and rooting based on a molecular clock (fig b) , which has a rooting in clade b, using clade designations by rambaut et al. ( ) . clade b contains the earliest sequences from wuhan, and a rooting in this clade would be compatible with the epidemiological evidence of an origin of sars-cov- in or near wuhan. 
however, if an outgroup rooting is assumed ( fig a) the inferred origin is in clade a which consists of many individuals from both inside and outside east asia. such a rooting would be compatible with origins of sars-cov- outside of wuhan. the rooting of the sars-cov- pandemic is, therefore, critical for our understanding of the origin and early spread of the virus. however, it is not clear how best to root the tree and how much confidence can be placed in any particular rooting of the tree. there are many different methods for inferring the root of a phylogenetic tree, but they largely depend on three possible sources of information: outgroups, the molecular clock, and non-reversibility. the latter source of information can be used if the underlying mutational process is non-reversible, that is, for some pair of nucleotides (i,j), the number of mutations from i to j differs from the number of mutations from j to i, in expectation at stationarity. however, this source of information is rarely used to root trees because it relies on strong assumptions regarding the mutational process, and it has been shown to perform poorly on real data (huelsenbeck et al., ) . most studies use methods based on either outgroup rooting, molecular clock rooting, or a combination of both. outgroup rooting is perhaps the conceptually easiest method to understand, and arguably the most commonly used method. in outgroup rooting, the position in which one or more outgroups connects to the ingroup tree is the root position. outgroup rooting can be challenged by long-branch attraction if distant outgroups are being used (e.g. felsenstein, ; graham et al., ; hendy and penny, ; maddison et al., ) . in such cases, the outgroup will have a tendency to be placed on the longest branches of the ingroup tree. 
in viruses, in particular, because of their high mutation rate, it can be challenging to identify an outgroup sequence that is sufficiently closely related to the ingroup sequences to allow reliable rooting. an alternative to outgroup rooting is molecular clock rooting, which is based on the assumption that mutations occur at an approximately constant rate, or at a rate that can be modeled and predicted using statistical models (e.g., using a relaxed molecular clock such as drummond et al. ( ) ; yoder and yang ( ) ). the rooting is then preferred that makes the data most compatible with the clock assumption by some criterion. early methods for rooting using molecular clocks were often labeled midpoint rooting as some original methods were based on placing the root halfway between the most distant leaf nodes in the tree (e.g. swofford et al., ) . more modern methods use more of the phylogenetic information, for example, by finding the rooting that minimizes the variance among leaf nodes in their distance to the root (e.g. mai et al., ) or produces the best linear regression of root-to-tip distances against sampling times when analyzing heterochronous data (rambaut et al., ) . methods for inferring phylogenetic trees that assume an ultrametric tree (i.e. a tree that perfectly follows a molecular clock), such as unweighted pair group method with arithmetic mean (upgma; sokal and michener, ) , directly infers a rooted tree. similarly, bayesian phylogenetic methods using birth-death process priors (kendall, ; thompson, ) or coalescent priors (kingman, a, b, c) also implicitly infers the root. but even with uninformative priors on the tree the placement of the root can be estimated in bayesian phylogenetics using molecular clock assumptions. 
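one of the clock-based criteria mentioned above, the regression of root-to-tip distances against sampling times, can be sketched in a few lines. this is an illustration under simplifying assumptions and not the implementation used by any of the cited tools: candidate roots are given as precomputed lists of root-to-tip distances, and the rooting with a positive slope and the highest r-squared is preferred. all names are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least squares of root-to-tip distance on sampling time.

    Returns (slope, intercept, r_squared); the slope estimates the clock rate.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two tips with matching dates and distances")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling times are identical")
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


def best_rooting(candidates, dates):
    """Pick the candidate root whose distances give the best positive-slope fit.

    candidates: {root_label: [root-to-tip distance per tip, same order as dates]}
    Returns None when no candidate has a positive slope.
    """
    best_label, best_r2 = None, -1.0
    for label, distances in candidates.items():
        slope, _, r2 = root_to_tip_regression(dates, distances)
        if slope > 0 and r2 > best_r2:
            best_label, best_r2 = label, r2
    return best_label
```

in practice the distances for each candidate root would be computed from the tree itself; here they are taken as given to keep the sketch self-contained.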
an advantage of such methods, over methods that first infer the branch lengths of the tree and then identify the root most compatible with a molecular clock, is that they explicitly incorporate uncertainty in the branch length estimation when identifying the root and they simultaneously provide measures of statistical uncertainty in the rooting of the tree. boni et al., ; wang et al., ) . recombination in the outgroups is at odds with the assumption of a single phylogenetic tree shared by all sites assumed by phylogenetic models when using outgroup rooting, particularly if more than one outgroup is included in the analysis. to investigate the possible rootings of the sars-cov- phylogeny we used six different methods and quantified the uncertainty in the placement of the root for each method on the inferred maximum likelihood topology. we note that the question of placement of a root, is a question idiosyncratic to a specific phylogeny, and to define this question we fixed the tree topology, with the exception of the root placement, in all analyses. in all cases, we applied the method to the alignment of sars-cov- sequences and two putative outgroup sequences, ratg and rmyn , (see table s ) that was constrained such that the proteincoding portions of the sars-cov- genome were in frame, and is described in detail in wang et al. ( ) . to ensure that we could accurately capture the rooting from available sequences, the sequences used for the analysis are chosen to be representative of the basal branches of the phylogeny and/or were early sequenced strains. there are two orders of magnitude more strains available in public databases, however these sequences are more terminally located and would provide little additional information about the placement of the root but have the potential to add a significant amount of additional noise. we are therefore focusing our efforts on the limited data set of early sequences. 
however, we note that future inclusion of more sequences with a basal the topology of the tree is shown in figure . the outgroup sequences were pruned from the tree using nw prune from newick utilities v . (junier and zdobnov, ) . bootstrapping was performed using the raxml-ng --bootstrap option. for the ratg +rmyn analysis, only bootstrapped trees that formed a monophyletic group for ratg and rmyn were kept. the clades of the tree were assigned according to nomenclature proposed by rambaut et al. ( ) where the a and b clades are defined by the mutations and and based on whether or not they share those sites with ratg . the six different methods for identifying the root of the sars-cov- phylogeny were: ( ) outgroup rooting using ratg . we constrained the tree topology to be equal to the unrooted sars-cov- phylogeny, i.e. the only topological parameter estimated was the placement of the ratg sequence on the unrooted sars-cov- phylogeny. we masked the potential recombination segment (nc v positions to ) in ratg identified in wang et al. ( ) from the alignment. to quantify uncertainty we obtained , bootstrap samples. we note that while interpretation of bootstrap proportions in phylogenetics can be problematic (see efron et al., ) we performed , parametric simulations using pyvolve (spielman and wilke, ) using maximum likelihood estimates, from the original data set, of the model of molecular evolution and the phylogenetic tree, including branch lengths (see table s ). for each simulation, we then estimated the tree using the same procedure as used for the real data for both the simulated data, and for bootstrap replicates. we then constructed confidence sets by ( ) outgroup rooting using rmyn . we used the same methods as in ( ) rambaut ( rambaut ( , applied to the maximum likelihood tree. this method uses the molecular clock to root the tree. we again quantified uncertainty using , bootstrap samples. 
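the basic bookkeeping behind bootstrap support for a root placement can be sketched as below: tally which edge of the fixed unrooted topology receives the root in each bootstrap replicate and convert the tallies to proportions. the second function, which collects the highest-ranked edges until a chosen level is covered, is only an illustrative assumption; the calibrated confidence-set construction used in the paper is truncated in the text and is not reproduced here.

```python
from collections import Counter


def root_proportions(root_edges):
    """Fraction of bootstrap replicates that place the root on each edge.

    root_edges: one edge label per replicate (the labels are hypothetical).
    """
    if not root_edges:
        raise ValueError("need at least one bootstrap replicate")
    counts = Counter(root_edges)
    total = len(root_edges)
    return {edge: n / total for edge, n in counts.items()}


def smallest_covering_set(proportions, level):
    """Greedily add edges in order of decreasing proportion until `level` is reached.

    Illustrative only: this is not the calibrated procedure described in the text.
    """
    ranked = sorted(proportions.items(), key=lambda item: (-item[1], item[0]))
    chosen, covered = [], 0.0
    for edge, proportion in ranked:
        if covered >= level:
            break
        chosen.append(edge)
        covered += proportion
    return chosen
```

a covering set that contains many edges is one way to see that no single root placement is strongly supported.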
to investigate differences in signs of temporal signal for the outgroup rooting and the molecular clock rooting, we calculated root-to-tip distances using tempest v . . (rambaut et al., ) for the ml tree using the outgroup rooting (fig s ) and a re-rooting of the ml tree using the molecular clock rooting (fig s ). re-rooting was performed using nw_reroot from newick utilities v . (junier and zdobnov, ). we estimated the maximum clade credibility tree using a time-measured bayesian phylogenetic reconstruction implemented in beast (suchard et al., ) v . . . we used a gtr+Γ substitution model and the uncorrelated relaxed clock with a lognormal distribution, and specified flexible skygrid coalescent tree priors. treeannotator was used to annotate the maximum clade credibility tree.
[figure: tree tip labels omitted]
figure . : the unrooted maximum likelihood topology for the sars-cov- phylogeny of genomes with probabilities for six rooting methods. only branches with probability greater than . for at least one method are shown. branch lengths are not to scale.
the software package pangolin (https://github.com/hcov- /pangolin, updated on - - ) was used for lineage assignment based on lineages updated on - - . after running the software for assignment, only lineages called for sequences where both alrt and ufbootstrap values, which quantify the branch support in phylogeny construction, are > .
the global spread of -ncov: a molecular evolutionary analysis.
pathogens and global health relaxed phylogenetics and dating with confidence bayesian evaluation of temporal signal in measurably evolving populations temporal signal and the phylodynamic threshold of sars-cov- . biorxiv bootstrap confidence levels for phylogenetic trees cases in which parsimony or compatibility methods will be positively misleading the ucsc sars-cov- genome browser the first two cases of -ncov in italy: where they come from mapping genome variation of sars-cov- worldwide highlights the impact of covid- super-spreaders rooting phylogenetic trees with distant outgroups: a case study from the commelinoid monocots a framework for the quantitative study of evolutionary trees inferring the root of a phylogenetic tree the newick utilities: high-throughput phylogenetic tree processing in the unix shell on the generalized "birth-death" process the coalescent. stochastic processes and their applications exchangeability and the evolution of large populations outgroup analysis and parsimony minimum variance rooting of phylogenetic trees and implications for species tree reconstruction phylogenetic and phylodynamic analyses of sars-cov- ape . : an environment for modern phylogenetics and evolutionary analyses in r recombination and lineage-specific mutations led to the emergence of sars-cov- . biorxiv estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies path-o-gen: temporal signal investigation tool exploring the temporal structure of heterochronous sequences using tempest (formerly patho-gen) a dynamic nomenclature proposal for sars-cov- to assist genomic epidemiology usa sars-cov- isolates reveals haplotype signatures and localized transmission patterns by state and by country. 
medrxiv a statistical method for evaluating systematic relationships pyvolve: a flexible python module for simulating sequences along phylogenies bayesian phylogenetic and phylodynamic data integration using beast . . virus evolution molecular systematics, chapter phylogenetic inference on the origin and continuing evolution of sars-cov- human evolutionary trees emergence of genomic diversity and recurrent mutations in sars-cov- . infection synonymous mutations and the molecular evolution of sars-cov- origins world health organization a new coronavirus associated with human respiratory disease in china estimation of primate speciation dates using local molecular clocks decoding the evolution and transmissions of the novel pneumonia coronavirus (sars-cov- /hcov- ) using whole genomic data origin and evolution of the novel coronavirus a novel bat coronavirus reveals natural insertions at the s /s cleavage site of the spike protein and a possible recombinant origin of hcov- a pneumonia outbreak associated with a new coronavirus of probable bat origin we thank dr. adi stern for discussion. the research was funded by koret-uc berkeley- biology and bioinformatics to rn. key: cord- -xv k authors: chow, ryan d.; chen, sidi title: the aging transcriptome and cellular landscape of the human lung in relation to sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xv k since the emergence of sars-cov- in december , coronavirus disease- (covid- ) has rapidly spread across the globe. epidemiologic studies have demonstrated that age is one of the strongest risk factors influencing the morbidity and mortality of covid- . here, we interrogate the transcriptional features and cellular landscapes of the aging human lung through integrative analysis of bulk and single-cell transcriptomics. 
by intersecting these age-associated changes with experimental data on host interactions between sars-cov- or its relative sars-cov, we identify several age-associated factors that may contribute to the heightened severity of covid- in older populations. we observed that age-associated gene expression and cell populations are significantly linked to the heightened severity of covid- in older populations. the aging lung is characterized by increased vascular smooth muscle contraction, reduced mitochondrial activity, and decreased lipid metabolism. lung epithelial cells, macrophages, and th cells decrease in abundance with age, whereas fibroblasts, pericytes and cd + tcm cells increase in abundance with age. several age-associated genes have functional effects on sars-cov replication, and directly interact with the sars-cov- proteome. interestingly, age-associated genes are heavily enriched among those induced or suppressed by sars-cov- infection. these analyses illuminate potential avenues for further studies on the relationship between the aging lung and covid- pathogenesis, which may inform strategies to more effectively treat this disease. these data indicate that differential expression of sars-cov- host entry factors alone is unlikely to explain the relationship between age and severity of covid- illness. to discern the host cell types involved in covid- entry, we turned to a single cell rna-seq (scrna-seq) dataset of , human lung cells from the tissue stability cell atlas . in agreement with prior reports, analysis of the single cell lung transcriptomes revealed that alveolar type (at ) cells were comparatively enriched in ace and tmprss -expressing cells , ( supplementary figure a-b) . however, ace -expressing cells represented only . % of all at cells, while . % of at cells expressed tmprss . alveolar type (at ) cells also showed detectable expression of ace and tmprss , but at lower frequencies ( . % and . %). 
ctsl expression could be broadly detected in many different cell types including at cells, but its expression was particularly pronounced in macrophages (supplementary figure c) . since the expression of host entry factors ace , tmprss and ctsl did not increase with age, we next sought to identify all age-associated genes expressed in the human lung (methods). using a likelihood-ratio test , we pinpointed the genes for which age significantly impacts their expression. with a stringent cutoff of adjusted p < . , we identified two clusters of genes in which their expression progressively changes with age (figure d) . cluster is composed of genes that increase in expression with age, while cluster contains genes that decrease in expression with age. gene ontology and pathway analysis of cluster genes (increasing with age) revealed significant enrichment for cell adhesion, vascular smooth muscle contraction, oxytocin signaling, and platelet activation, in addition to several other pathways (figure e) . these findings are consistent with known physiologic changes of aging, including decreased pulmonary compliance , and heightened risk for thrombotic diseases . of note, deregulation of the reninangiotensin system has been implicated in the pathogenesis of acute lung injury induced by sars- china and italy have found that patients with hypertension were more likely to develop ards , require icu admission , and die from the disease , though we note that correlative epidemiologic studies do not necessarily demonstrate causality. cluster genes (decreasing with age) were significantly enriched for mitochondrion, mitochondrial translation, metabolic pathways, and mitosis, among other pathways (figure f) , which is consistent with prior observations of progressive mitochondrial dysfunction with aging [ ] [ ] [ ] . of note, cluster was also enriched for genes involved in lipid metabolism, fatty acid metabolism, peroxisome, and lysosomal membranes. 
age-associated alterations in lipid metabolism could impact sars-cov- infection, as sars-cov can enter cells through cholesterol-rich lipid rafts [ ] [ ] [ ] [ ] . similarly, age-associated alterations in lysosomes could influence late endocytic viral entry, as the protease cathepsin l cleaves sars-cov spike proteins from within lysosomes , . having compiled a high-confidence set of age-associated genes, we sought to identify the lung cell types that normally express these genes, using the human lung single cell transcriptomics dataset from the tissue stability cell atlas . by examining the scaled percentage of expressing cells within each cell subset, we identified age-associated genes predominantly enriched in different cell types. cell types with highly enriched expression for certain cluster genes (increasing with age) included fibroblasts, muscle cells, and lymph vessels (figure a) . in contrast, cell types with highly enriched expression for certain cluster genes (decreasing with age) included macrophages, dividing dendritic cells (dcs)/monocytes, and at cells (figure b) . similar results were found using an independent human lung scrna-seq dataset from the human lung cell atlas (supplementary figure a-b) . examining the muscle-enriched genes that increased in expression with age, gene ontology analysis revealed enrichment for vascular smooth muscle contraction, cgmp-pkg signaling, z-disc, and actin cytoskeleton, among other pathways ( figure c ). as for the at -enriched genes that decreased in expression with age, gene ontology analysis revealed enrichment for metabolic pathways, biosynthesis of antibiotics, lipid metabolism, extracellular exosome, and mitochondrial matrix (figure d) . a subset of these enriched gene ontologies had also been identified by the bulk rna-seq analysis (figure e-f) . 
thus, integrative analysis of bulk and single-cell transcriptomes revealed that many of the age-associated transcriptional changes in human lung can be mapped to specific cell subpopulations, suggesting that the overall abundance of these cell types, their transcriptional status, or both, may be altered with aging. as the pathophysiology of viral-induced ards involves an intricate interplay of diverse cell types, most notably the immune system , , aging-associated shifts in the lung cellular milieu could contribute an important dimension to the relationship between age and risk of ards in patients with covid- . to investigate the cellular landscape of the aging lung, we applied a gene signature-based approach to infer the enrichment of different cell types from the bulk rna-seq profiles. since bulk rna-seq measures the average expression of genes within a cell population, such datasets will reflect the relative proportions of the cell types that comprised the input population, though with the caveat that cell types can have overlapping expression profiles and such profiles may be altered in response to stimuli. using this approach, we identified ageassociated alterations in the enrichment scores of several cell types (figure a) . whereas epithelial cells decreased with age, fibroblasts increased with age (figure b ). this finding is consistent with the progressive loss of lung parenchyma due to reduced regenerative capacity of the aging lung , as well as the increased risk for diseases such as chronic obstructive pulmonary disease and pulmonary fibrosis . in addition, these results are concordant with the findings from analysis of human lung single-cell transcriptomes (figure a-b) . among the innate immune cell populations, the enrichment scores of total macrophages were inversely associated with age (figure c) . macrophages are major drivers of innate immune responses in the lung, acting as first-responders against diverse respiratory infections . 
thus, the age-associated decrease in macrophage abundance may be a possible factor related to the greater severity of lung pathology in patients with covid- . although macrophage accumulation is often associated with the pathologic inflammation of viral ards , , pulmonary macrophages can act to limit the duration and severity of infection by efficiently phagocytosing dead infected cells and released virions [ ] [ ] [ ] . notably, macrophages infected with sars-cov have been found to abort the replication cycle of the virus , , further supporting their role in antiviral responses. however, macrophages may suppress antiviral adaptive immune responses , , inhibiting viral clearance in mouse models of sars-cov infection . in aggregate, these prior reports suggest that the precise role of lung macrophages in sars-cov- pathophysiology is likely contextdependent. it is also plausible that the increased numbers of macrophages are not the primary distinction between young and old patients, but rather the functional status of the macrophages. in line with this, we observed that the age-associated changes in macrophages were specifically attributed to the pro-inflammatory m macrophage subset but not the immunoregulatory m subset (figure c) , though this binary classification scheme represents an oversimplification of macrophage function. nevertheless, elucidating the consequences of age-associated changes in lung macrophages may reveal insights into the differential outcomes of older patients with covid- . further studies are needed to investigate whether macrophages or other innate immune cells respond to sars-cov- infection, and how their numbers or function may change with aging. among the adaptive immune cell populations, we observed that th cells and cd + tcm cells trended in opposite directions with aging (figure d) . 
while the lungs of younger donors were enriched for th cells, they were comparatively depleted for cd + tcm cells; the inverse was true in the lungs of older donors. of note, mouse models of sars-cov infection have indicated important roles for cd + t cell responses in viral clearance , . additionally, th cells are responsive to sars-cov vaccines and promote macrophage activation against viruses . it is therefore possible that age-associated shifts in cd + t cell subtypes within the lung may influence the subsequent host immune response to coronavirus infection. however, future studies will be needed to determine the role of th cells and other adaptive immune cells in the response to sars-cov- , and how these dynamics may change with aging. we next explored the roles of lung age-associated genes in host responses to viral infection. since functional screening data with sars-cov- has not yet been described (as of march , ), we instead searched for data on sars-cov. while these two viruses belong to the same genus (betacoronavirus) and are conserved to some extent , they are nevertheless two distinct viruses with different epidemiological features, indicating unique virology and host biology. therefore, data from experiments performed with sars-cov must be interpreted with caution. we reassessed the results from a prior in vitro sirna screen of host factors involved in sars-cov infection . in this kinase-focused screen, factors were determined to have a significant effect on sars-cov replication. notably, of the factors exhibited age-associated gene expression patterns (figure a) , with genes in cluster (increasing with age) and genes in cluster (decreasing with age). the genes in cluster were all associated with increased sars infectivity upon sirna knockdown; these genes included clk , akap , alpk , and itk. paradoxically, while knockdown of clk was associated with increased sars infectivity, cell viability was also found to be increased (figure b) . 
of the genes in cluster , were associated with increased sars infectivity and reduced cell viability upon sirna knockdown (aurkb, cdkl , pdik l, cdkn , mst r, and adk). age-related downregulation of these factors could be related to the increased severity of illness in older patients. however, we emphasize that until rigorous follow-up experiments are performed with sars-cov- , the therapeutic potential of targeting these factors in patients with covid- is unknown. using the human lung scrna-seq data, we then determined which cell types predominantly express these host factors. of the genes in cluster that had a significant impact on sars-cov replication, clk was universally expressed, while alpk expression was rarely detected (figure c, supplementary figure a) . itk was preferentially expressed in lymphocytic populations, and akap was most frequently expressed in ciliated cells and muscle cells. of the overlapping genes in cluster , aurkb and cdkn were predominantly expressed in proliferating immune cell populations, such as macrophages, dcs/monocytes, t cells, and nk cells (figure d, supplementary figure b) . mst r, pdik l, and pskh were infrequently expressed, though their expression was detected in a portion of at cells ( . %, . %, and . %, respectively). finally, adk and cdkl exhibited preferential enrichment in at cells ( . % and . %). in aggregate, these analyses showed that the age-associated genes with functional roles in sars-cov are expressed in specific cell types of the human lung. we then investigated whether age-associated genes in the human lung interact with proteins encoded by sars-cov- . a recent study interrogated the human host factors that interact with different sars-cov- proteins , revealing the sars-cov- : human protein interactome in cell lines expressing recombinant sars-cov- proteins. by cross-referencing the interacting host factors with the set of age-associated genes, we identified factors at the intersection ( figure a ). 
of these genes showed an increase in expression with age (i.e. cluster genes), while decreased in expression with age (cluster genes). mapping these factors to their interacting sars-cov- proteins, we noted that the age-associated host factors which interact with m, nsp , nsp , nsp , nsp , orf a, orf , orf c, and orf proteins generally decrease in expression with aging (figure b) . however, a notable exception was nsp , as the age-associated hostfactors that interact with nsp both showed increased expression with aging (crtc and mycbp ) (figure c ). nsp encodes for the primary rna-dependent rna polymerase (rdrp) of sars-cov- , and is a prime target for developing therapies against covid- . the observation that crtc and mycbp increase in expression with aging is intriguing, as these genes may be related to the activity of nsp /rdrp in host cells. of note, mycbp is a known repressor of camp signaling , , and camp signaling potently inhibits contraction of airway smooth muscle cells . thus, age-associated increases in mycbp could promote smooth muscle contraction, which is concordant with our analyses on age-associated gene signatures in the lung (figure e) . mycbp might possibly contribute to covid- pathology by not only interacting with sars-cov- rdrp, but also through its normal physiologic role in promoting smooth muscle contraction. to assess the cell type-specific expression patterns of these various factors, we further analyzed the lung scrna-seq data. of the sars-cov- interacting genes that increase in expression with age, mycbp was frequently expressed across several populations, particularly proliferating immune populations (dc/monocyte, t cells, macrophages), muscle cells, fibroblasts, and lymph vessels (figure d) . mycbp was also expressed in . % of at cells. cep was preferentially expressed in lymph vessels, while akap l and crtc showed relatively uniform expression frequencies across cell types, including a fraction of at cells ( . % and . 
% expressing cells, respectively). of the sars-cov- interacting genes that decrease in expression with age, npc and ndufb were broadly expressed in many cell types, including at cells ( . % [...]). together, these analyses highlight specific age-associated factors that interact with the sars-cov- proteome, in the context of the lung cell types in which these factors are normally expressed. finally, we assessed whether sars-cov- infection directly alters the expression of lung age-associated genes. a recent study profiled the in vitro transcriptional changes associated with sars-cov- infection in different human lung cell lines . we specifically focused on the data from a lung cancer cells, a cells transduced with an ace expression vector (a -ace ), and calu- lung cancer cells. several age-associated genes were found to be differentially expressed upon sars-cov- infection (figure a-c) . of note, the overlap between lung age-associated genes and sars-cov- regulated genes was statistically significant across all cell lines (figure d-f) , suggesting a degree of similarity between the transcriptional changes associated with aging and with sars-cov- infection. among the age-associated genes that were induced by sars-cov- infection, the majority increase in expression with age (cluster ) (figure g-i) . conversely, among the age-associated genes that were repressed by sars-cov- infection, most decrease in expression with age (cluster ). of note, the directionality of sars-cov- regulation (induced or repressed) and the directionality of age-association (increase or decrease with age) were significantly associated across all cell lines (figure g-i) . to identify a consensus set of age-associated genes that are regulated by sars-cov- infection, we integrated the analyses from all cell lines. genes were consistently induced by sars-cov- infection (figure a ). of these, genes are in cluster (increase with age) and genes are in cluster (decrease with age). 
the induced genes in cluster include several factors involved in ras signaling (rab b, rasa , and rasgrp ) as well as clk , which was shown to be involved in host responses to sars-cov infection (figure a-b) . on the other hand, genes were concordantly repressed by sars-cov- infection (figure b here we systematically analyzed the transcriptome of the aging human lung and its relationship to sars-cov- . we found that the aging lung is characterized by a wide array of changes that could contribute to the worse outcomes of older patients with covid- . on the transcriptional level, we first identified , genes that exhibit age-associated expression patterns. we subsequently demonstrated that the aging lung is characterized by several gene signatures, including increased vascular smooth muscle contraction, reduced mitochondrial activity, and decreased lipid metabolism. by integrating these data with single cell transcriptomes of human lung tissue, we further pinpointed the specific cell types that normally express the age-associated genes. we showed that lung epithelial cells, macrophages, and th cells decrease in abundance with age, whereas fibroblasts and pericytes increase in abundance with age. these systematic changes in tissue composition and cell interactions can potentially propagate positive feedback loops that predispose the airways to pathological contraction . we find that some of the age-associated genes have been previously identified as host factors with a functional role in sars-cov replication , and a fraction of the age-associated factors have been shown to directly interact with the sars-cov- proteome . furthermore, age-associated genes are significantly enriched among genes directly regulated by sars-cov- infection , suggesting transcriptional parallels between the aging lung and sars-cov- infection. moreover, it is intriguing that the genes induced by sars-cov- infection tend to increase in expression with aging, and vice versa. 
whether any of these age-associated changes causally contribute to the heightened susceptibility of covid- in older populations remains to be experimentally tested. it is also important to note that the datasets analyzed here were not from patients with covid- . given the limited data that is currently publicly available, we emphasize that the analyses presented here at this stage should not be used to guide clinical practice. these analyses resulted in a number of previously unnoted observations and phenomena that illuminate new directions for subsequent research efforts on sars-cov- , generating genetically-tractable hypotheses for why advanced age is one of the strongest risk factors for covid- morbidity and mortality. ultimately, we hope such knowledge can help the field to sooner develop rational therapies for covid- that are rooted in concrete biological mechanisms. we thank akiko iwasaki, craig wilen, hongyu zhao, wei liu, wenxuan deng, andre levchenko, katie zhu, ruth montgomery, bram gerriten, steven kleinstein and a number of other colleagues for their critical comments and suggestions, which were incorporated into the analyses and manuscript. we thank antonio giraldez, andre levchenko, chris incarvito, mike crair, and scott strobel for their support on covid- research. we thank our colleagues in the chen lab, the genetics department, the systems biology institute and various yale entities. we also want to thank all of the healthcare workers who are risking their health on the frontlines to treat patients with this disease. rc and sc conceived and designed the study. rc developed the analysis approach, performed all data analyses, and created the figures. rc and sc prepared the manuscript. sc supervised the work. no competing interests related to this study. the authors have no competing interests as defined by nature research, or other interests that might be perceived to influence the interpretation of the article. 
the authors are committed to freely share all covid- related data, knowledge and resources to the community to facilitate the development of new treatment or prevention approaches against sars-cov- / covid- as soon as possible. as a note for full disclosure, sc is a co-founder, funding recipient and scientific advisor of evolveimmune therapeutics, which is not related to this study. c. david gene ontology and pathway analysis of cluster age-associated genes that exhibit enriched expression in muscle cells. d. david gene ontology and pathway analysis of cluster age-associated genes that exhibit enriched expression in alveolar type (at ) cells. a. venn diagram of the intersection between age-associated genes in human lung and the sars-cov- : human protein interactome (gordon et al., ) . of the age-associated genes that were found to also interact with sars-cov- , of them increased in expression with age, while decreased with age. b. age-associated genes in human lung and their interaction with sars-cov- proteins, where each block contains a sars-cov- protein (underlined) and its interacting age-associated factors. blocks are colored by the dominant directionality of the age association (orange, decreasing with age; blue, increasing with age). gene targets with already approved drugs, investigational new drugs, or preclinical molecules are additionally denoted with an asterisk. the genotype-tissue expression (gtex) project was supported by the common fund of the office of the director of the national institutes of health, and by nci, nhgri, nhlbi, nida, nimh, and ninds , . rna-seq raw counts and normalized tpm matrices were downloaded from the gtex portal (https://gtexportal.org/home/index.html) on march , , release v . all accessed data used in this study are publicly available on the web portal and have been deidentified, except for patient age range and gender. 
case-fatality rates in china and italy were from the chinese cdc and italian iss, for visualization of rna-seq expression data, the tpm values were log transformed and plotted in r (v . . ). all boxplots are tukey boxplots, with interquartile range (iqr) boxes and . × iqr whiskers. pairwise statistical comparisons in the plots were assessed by two-tailed mann-whitney test, while statistical comparisons across all age groups were performed by kruskal-wallis test. to identify age-associated genes, the raw counts values were analyzed by deseq (v . . ) , using the likelihood ratio test (lrt). age-associated genes were determined at a significance threshold of adjusted p < . . genes passing the significance threshold were then scaled to z-scores and clustered using the degpatterns function from the r package degreport (v . . ). gene clusters with progressive and consistent trends with age were retained for downstream analysis. gene ontology and pathway enrichment analysis was performed using david (v . ) (https://david.ncifcrf.gov/), separating the age-associated genes into the two clusters (increasing or decreasing with age), as described above. scrna-seq data were analyzed in r (v . . ) using seurat , and custom scripts. of the age-associated genes identified from gtex bulk transcriptomes, genes were matched in the tissue stability cell atlas dataset and genes were matched in the human lung cell atlas dataset. to determine the percentage of cells expressing a given gene, the expression matrices were converted to binary matrices by setting a threshold of expression > . cell typespecific expression frequencies for each gene were then calculated using the provided cell type annotations. to identify genes preferentially expressed in a specific cell type, we further scaled the expression frequencies in r to obtain z-scores. data were visualized in r using the nmf package . where applicable, gene ontology analysis was performed with david (v . 
) , using genes with z-score > in the cell type of interest for analysis. to infer the cellular composition of each lung sample, we analyzed the tpm expression matrices using the xcell algorithm . the resultant cell type enrichment tables were analyzed in r. for data visualization, cell type enrichment scores were scaled to z-scores, and the median zscore for each age group was expressed as a heatmap, using the superheat package . ageassociation was assessed across all age groups by kruskal-wallis test. to assess whether any age-associated genes affect host responses to sars-cov (a coronavirus related to sars-cov- ), we analyzed the data from a published sirna screen of host factors influencing sars-cov (data set s in the publication; accessed on march , ). for data visualization, each point corresponding to a target gene was size-scaled and color-coded according to the age-association statistical analyses described above. to assess whether any lung age-associated genes encode proteins that interact with the sars-cov- proteome, we compiled the data from a preprint manuscript detailing the human host factors that interact with different proteins in the sars-cov- proteome (accessed on march , ). to assess whether the expression of lung age-associated genes is influenced by sars-cov- infection, we utilized the data from a preprint manuscript detailing the transcriptional response to sars-cov- infection , from the gene expression omnibus (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse ) (accessed on april , ). differentially expressed genes were determined using the wald test in deseq (v . . ) comparing sars-cov- infected cells to batch-matched mock controls, with a significance threshold of adjusted p < . . of the age-associated genes, genes were matched to the rna-seq dataset. statistical significance of overlaps between the gene sets was assessed by hypergeometric test, assuming , total genes as annotated in the rna-seq dataset and age-associated genes. 
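the hypergeometric overlap test described above can be sketched in python as follows. this is a minimal illustration, not the authors' code (their analyses were run in r); the function name and the toy numbers are hypothetical, and the upper-tail probability p(x >= overlap) is assumed to be the quantity of interest.

```python
from math import comb


def hypergeom_overlap_pvalue(total_genes, set_a, set_b, overlap):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    total_genes: size of the gene universe (e.g. all annotated genes)
    set_a:       number of genes in the first set (e.g. age-associated)
    set_b:       number of genes in the second set (e.g. infection-regulated)
    overlap:     observed number of genes shared by both sets
    Returns P(X >= overlap) when set_b genes are drawn without replacement.
    """
    upper = min(set_a, set_b)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and min(set_a, set_b)")
    if max(set_a, set_b) > total_genes:
        raise ValueError("gene sets cannot exceed the universe size")
    denom = comb(total_genes, set_b)
    # Sum the probability mass for every overlap at least as large as observed.
    numer = sum(
        comb(set_a, k) * comb(total_genes - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Toy universe of 10 genes, two sets of 5 genes sharing 4 genes.
    print(hypergeom_overlap_pvalue(10, 5, 5, 4))
```

exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms; for routine use, a statistics library's hypergeometric survival function would serve the same purpose.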
statistical significance of the association between the directionality of sars-cov- regulation and the directionality of age-association was assessed by two-tailed fisher's exact test. gene ontology and pathway enrichment analysis was performed using david (v . ) (https://david.ncifcrf.gov/).

comprehensive information on the statistical analyses used is included in various places, including the figures, figure legends and results, where the methods, significance, p-values and/or tails are described. all error bars have been defined in the figure legends or methods.

code used for data analysis or generation of the figures related to this study is available upon request to the corresponding author and will be deposited to github upon publication for free public access.

all relevant processed data generated during this study are included in this article and its supplementary information files. raw data are from various sources as described above. all data and resources related to this study are freely available upon request to the corresponding author.

tables
table s : demographics of donors for gtex lung samples.
(gordon et al., ) with age-association statistics from this study.
table s : differential expression analysis in a cells, infected with sars-cov- vs mock control, with age-association annotations.
table s : differential expression analysis in a -ace cells, infected with sars-cov- vs mock control, with age-association annotations.
table s : differential expression analysis in calu- cells, infected with sars-cov- vs mock control, with age-association annotations.
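the two-tailed fisher's exact test used in the statistical analysis to compare the directionality of sars-cov- regulation with the directionality of age-association can be sketched as follows. this is an illustrative python version under stated assumptions, not the authors' code: the function name `fisher_exact_two_tailed` is hypothetical, and the 2x2 layout (rows for genes increasing or decreasing with age, columns for genes induced or repressed by infection) is one assumed way to arrange the counts.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability of a table with x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    # Sum the tables that are no more probable than the observed table;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
```

with made-up counts, `fisher_exact_two_tailed(3, 1, 1, 3)` evaluates a table in which 3 of 4 age-increasing genes are induced and 3 of 4 age-decreasing genes are repressed.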
[figure panel residue: gene labels plotted against -log (adj. p-value)]

estimating clinical severity of covid- from the transmission dynamics in wuhan, china adjusted age-specific case fatality ratio during the covid- epidemic in hubei, china case-fatality rate and characteristics of patients dying in relation to covid- in italy coronavirus disease (covid- ) in italy severe outcomes among patients with coronavirus disease (covid- ) -united states a novel coronavirus from patients with pneumonia in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a pneumonia outbreak associated with a new coronavirus of probable bat origin epidemiological characteristics of pediatric patients with coronavirus disease in china the genotype-tissue expression (gtex) project a novel approach to high-quality postmortem tissue procurement: the gtex project cryo-em structure of the -ncov spike in the prefusion conformation receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus 
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov- spike glycoprotein covid- and italy: what next? the lancet risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) -china scrna-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation single-cell rna expression profiling of ace , the putative receptor of wuhan single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses moderated estimation of fold change and dispersion for rna-seq data with deseq effect of aging on respiratory system physiology and immunology alterations in platelet functions during aging: clinical correlations with thrombo-inflammatory disease in older adults a crucial role of angiotensin converting enzyme (ace ) in sars coronavirus-induced lung injury angiotensin-converting enzyme protects from severe acute lung failure identification of a novel coronavirus in patients with severe acute respiratory syndrome a novel angiotensin-converting enzyme-related carboxypeptidase (ace ) converts angiotensin i to angiotensin - a human homolog of angiotensin-converting enzyme cloning and functional expression as a captopril-insensitive angiotensin-converting enzyme is an essential regulator of heart function clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china baseline characteristics and outcomes of patients infected with sars-cov- admitted to icus of the lombardy region the mitochondrial basis of aging and age-related disorders the mitochondrial basis of aging mitochondrial dysfunction in the elderly: possible role in insulin resistance lipid rafts are involved in sars-cov entry into vero e cells sars coronavirus entry 
into host cells through a novel clathrin-and caveolaeindependent endocytic pathway lipid rafts play an important role in the early stage of severe acute respiratory syndrome-coronavirus life cycle lipid rafts: heterogeneity on the high seas sars coronavirus, but not human coronavirus nl , utilizes cathepsin l to infect ace -expressing cells inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry a molecular cell atlas of the human lung from single cell rna sequencing molecular pathology of emerging coronavirus infections clinical progression and viral load in a community outbreak of coronavirus-associated sars pneumonia: a prospective study xcell: digitally portraying the tissue cellular heterogeneity landscape regeneration of the aging lung: a mini-review the aging lung pulmonary macrophages: key players in the innate defence of the airways the clinical pathology of severe acute respiratory syndrome (sars): a report from evolution of pulmonary pathology in severe acute respiratory syndrome regulating the adaptive immune response to respiratory virus infection regulation of immunological homeostasis in the respiratory tract alveolar macrophages in the resolution of inflammation, tissue repair, and tolerance to infection antibody-dependent infection of human macrophages by severe acute respiratory syndrome coronavirus cytokine responses in severe acute respiratory syndrome coronavirus-infected macrophages in vitro: possible relevance to pathogenesis alveolar macrophage elimination in vivo is associated with an increase in pulmonary immune response in mice evasion by stealth: inefficient immune activation underlies poor t cell response and severe disease in sars-cov macrophage plasticity, polarization, and function in health and disease t cell responses are required for protection from clinical disease and for virus clearance in severe acute respiratory syndrome coronavirus-infected mice cellular immune responses to severe acute 
respiratory syndrome coronavirus (sars-cov) infection in senescent balb/c mice: cd + t cells are important in control of sars-cov infection induction of th type response by dna vaccinations with n, m, and e genes against sars-cov in mice expanding roles for cd + t cells in immunity to viruses a kinome-wide small interfering rna screen identifies proviral and antiviral host factors in severe acute respiratory syndrome coronavirus replication, including double-stranded rna-activated protein kinase and early secretory pathway proteins a sars-cov- -human protein-protein interaction map reveals drug targets and potential drug-repurposing protein associated with myc (pam) is a potent inhibitor of adenylyl cyclases pam mediates sustained inhibition of camp signaling by sphingosine- -phosphate camp regulation of airway smooth muscle function sars-cov- launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems a microphysiological model of the bronchial airways reveals the interplay of mechanical and biochemical signals in bronchospasm systematic and integrative analysis of large gene lists using david bioinformatics resources integrating single-cell transcriptomic data across different conditions, technologies, and species comprehensive integration of single-cell data a flexible r package for nonnegative matrix factorization superheat: an r package for creating beautiful and extendable heatmaps for visualizing complex data key: cord- -shvtfxog authors: fukumoto, tatsuya; iwasaki, sumio; fujisawa, shinichi; hayasaka, kasumi; sato, kaori; oguri, satoshi; taki, keisuke; nakakubo, sho; kamada, keisuke; yamashita, yu; konno, satoshi; nishida, mutsumi; sugita, junichi; teshima, takanori title: efficacy of a novel sars-cov- detection kit without rna extraction and purification date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: shvtfxog rapid detection of sars-cov- is critical for the diagnosis of coronavirus disease (covid- ) and preventing the spread of the virus. a novel “ novel coronavirus detection kit (ncov-dk)” halves detection time by eliminating the steps of rna extraction and purification. we evaluated concordance between the ncov-dk and direct pcr. the virus was detected in / fresh samples by the direct method and / corresponding frozen samples by the ncov-dk. the overall concordance rate of the virus detection between the two methods was . % ( % ci, . - . ). concordance rates were . % ( % ci, . - . ), . % ( % ci, . - . ), . % ( % ci, . - . ) in nasopharyngeal swab, saliva, and sputum samples, respectively. these results indicate that the ncov-dk effectively detects sars-cov- in all types of the samples including saliva, while reducing time required for detection, labor, and risk of human error. nasopharyngeal swab, sputum and saliva samples were collected from patients who were admitted to our hospital after a diagnosis of covid- . a total of samples were selected for this study according to the availability of the frozen stock samples. this study was approved by the institutional ethics board and informed consent was obtained from all patients. nasopharyngeal samples were obtained using floqswabs (copan, murrieta, ca, usa). sputum and saliva samples were self-collected in a sterilized pp screw cup (asiakizai co., ltd., tokyo, japan). μl sputum or saliva were added to μl pbs, mixed vigorously, then centrifuged at , x g for minutes at o c, and and the hilden, germany) from fresh samples. 
one-step rt-qpcr was performed using one-step correlation of chest ct and rt-pcr testing report of cases sars-cov- viral load in upper respiratory specimens of infected patients saliva as a non-invasive specimen for detection of sars-cov- comparison of sars-cov- detection in nasopharyngeal swab and saliva detection of noroviruses in fecal specimens by direct rt-pcr without rna purification high-yield rna-extraction method for saliva key: cord- -g px b authors: takagi, akira; matsui, masanori title: an immunodominance hierarchy exists in cd + t cell responses to hla-a* : -restricted epitopes identified from the non-structural polyprotein a of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: g px b covid- vaccines are being rapidly developed and human trials are underway. almost all of these vaccines have been designed to induce antibodies targeting spike protein of sars-cov- in expectation of neutralizing activities. however, non-neutralizing antibodies are at risk of causing antibody-dependent enhancement. further, the longevity of sars-cov- -specific antibodies is very short. therefore, in addition to antibody-induced vaccines, novel vaccines on the basis of sars-cov- -specific cytotoxic t lymphocytes (ctls) should be considered in the vaccine development. here, we attempted to identify hla-a* : -restricted ctl epitopes derived from the non-structural polyprotein a of sars-cov- . eighty-two peptides were firstly predicted as epitope candidates on bioinformatics. fifty-four in peptides showed high or medium binding affinities to hla-a* : . hla-a* : transgenic mice were then immunized with each of the peptides encapsulated into liposomes. the intracellular cytokine staining assay revealed that out of peptides were ctl epitopes because of the induction of ifn-γ-producing cd + t cells. in the peptides, peptides were chosen for the following analyses because of their high responses. 
to identify dominant ctl epitopes, mice were immunized with liposomes containing the mixture of the peptides. some peptides were shown to be statistically predominant over the other peptides. surprisingly, not all mice immunized with the liposomal peptide mixture showed the same reaction pattern to the peptides. there were three pattern types that varied sequentially, suggesting the existence of an immunodominance hierarchy, which may provide more variation in epitope selection for designing ctl-based covid- vaccines.

importance
for the development of vaccines based on sars-cov- -specific cytotoxic t lymphocytes (ctls), we attempted to identify hla-a* : -restricted ctl epitopes derived from the non-structural polyprotein a of sars-cov- . out of peptides predicted by bioinformatics, peptides showed good binding affinities to hla-a* : . using hla-a* : transgenic mice, in peptides were found to be ctl epitopes in the intracellular cytokine staining assay. out of peptides, peptides were chosen for the following analyses because of their high responses. to identify dominant epitopes, mice were immunized with liposomes containing the mixture of the peptides. some peptides were shown to be statistically predominant. surprisingly, not all immunized mice showed the same reaction pattern to the peptides. there were three pattern types that varied sequentially, suggesting the existence of an immunodominance hierarchy, which may provide more variation in epitope selection for designing ctl-based covid- vaccines.

in december , the coronavirus disease caused by the severe acute respiratory syndrome coronavirus (sars-cov- ) was first identified in wuhan, hubei province, china. since then, its global spread has continued to gain momentum. as of september th, , the covid- pandemic has infected more than . million people around the world and caused more than , deaths. 
although clinical symptoms vary from asymptomatic or mild self-limited infection to severe life-threatening respiratory disease, the mechanism underlying disease outcome remains unclear. many nations are struggling to find appropriate preventive and control strategies. however, there are no vaccines or antiviral drugs available for the treatment of this infectious disease. there are seven coronaviruses that infect humans. in addition to sars-cov- , sars-cov and middle-east respiratory syndrome coronavirus (mers-cov) cause severe pneumonia, whereas the other four human coronaviruses including hcov- e, -nl , -oc and -hku cause the common cold ( ). like other coronaviruses, sars-cov- possesses a large single-stranded positive-sense rna genome ( ). as shown in fig. , the '-terminal two-thirds of the genome are composed of orf a and orf b. orf a encodes the polyprotein a (pp a) containing non-structural proteins [...] with peptide-encapsulated liposomes. each of the selected peptides was encapsulated into liposomes as described in the materials and methods. hla-a* : transgenic (hhd) mice ( ) were then subcutaneously (s.c.) immunized twice at a one-week interval with each of the peptide-encapsulated liposomes together with cpg adjuvant. one week later, spleen cells of immunized mice were prepared, stimulated in vitro with a relevant peptide for hours, and stained for their expression of cell-surface cd and intracellular interferon-gamma (ifn-γ). as shown in fig. , the intracellular cytokine staining (ics) assay showed that significant numbers of ifn-γ-producing cd + t cells were elicited in mice immunized with liposomal peptides including pp a- , - , - , - , - , - , - , - , - , - , - , - , - , - , - , - , - , and - , revealing that these peptides are hla-a* : -restricted ctl epitopes derived from sars-cov- pp a. as indicated in table , multiple epitopes are located in small proteins such as nsp ( aa) and nsp ( aa), whereas only one epitope is seen in the large nsp composed of amino acids. 
on the other hand, the remaining peptides out of peptides in liposomes were not able to stimulate peptide-specific ctls in mice (data not shown), demonstrating the necessity to generate data through wet-lab experiments. interestingly, four epitopes including pp a- , - , - , and - are located in the amino acid sequence of sars-cov pp a as well (table ). pp a- was previously reported by us in the identification of sars-cov-derived ctl epitopes ( ). however, none of the epitopes was found in the amino acid sequence of either mers-cov or the four common cold human coronaviruses including hcov-oc , [...]

among the positive peptides, peptides including pp a- , - , - , - , - , - , - , - , - , and - were selected for the following analyses because of the high ratios of ifn-γ + cells in cd + t cells (fig. ). to confirm that the peptides are effective epitopes for peptide-specific ctl responses, we examined whether peptide-specific killing activities were elicited in mice immunized with each of the peptides in liposomes. hhd mice were immunized s.c. twice with each of the peptide-encapsulated liposomes and cpg adjuvant. one week later, equal numbers of peptide-pulsed and -unpulsed target cells were transferred into immunized mice via i.v. injection, and peptide-specific lysis was analyzed by flow cytometry. in support of the ics data (fig. ), peptide-specific killing was observed in mice immunized with any of the liposomal peptides (fig. a). we next attempted to identify dominant ctl epitopes among the ctl epitopes. the same amounts of the peptide solutions at an equal concentration were mixed together and encapsulated into liposomes. seventeen mice were immunized once with the liposomes containing the mixture of peptides. one week later, spleen cells were incubated with each of the peptides for hours, and the ics assay was performed. as shown in fig. b & c, pp a- and - were statistically predominant over almost all other peptides in the induction of peptide-specific ifn-γ + cd + t cells. 
furthermore, pp a- and pp a- were significantly superior to pp a- /- and pp a- /- /- in the stimulation of ifn-γ-producing cd + t cells, respectively (fig. b). we also examined the peptide-specific induction of cd a + cd + t cells and cd + cd + t cells. cd a and cd are markers of degranulation and early activation on cd + t lymphocytes, respectively. nine mice were immunized once with the liposomes encapsulating the mixture of the peptides. after one week, spleen cells were stimulated with each peptide, and stained for their expression of cd a or cd on cd + t cells. at first glance, the graphs of cd a ( [...]

taken together, peptides differed significantly in their ability to induce sars-cov- pp a-specific ctls when mice were immunized with the mixture of peptides in liposomes. thus, some peptides were found to be dominant ctl epitopes although each of the peptides alone is capable of efficiently activating peptide-specific ctls (figs. & a).

existence of an immunodominance hierarchy.
the data in fig. indicate reactivity to the peptides in each of the mice immunized with the mixture of the peptides in liposomes. each graph represents reactivity of an individual mouse (fig. ). unexpectedly, not all mice showed the same reaction pattern against the peptides. there appeared to be roughly three types that varied sequentially in terms of the reaction pattern to the peptides. type a is a group of mice in which pp a- , - , - -specific ifn-γ + cd + t cells were predominantly elicited, whereas the remaining peptides were not able to activate peptide-specific ifn-γ + cd + t cells. in the case of type b, pp a- stimulated peptide-specific ifn-γ + cd + t cells as well as pp a- , and - . in addition to these three peptides, several other peptides also induced ifn-γ + cd + t cells in type c. these data suggest an immunodominance hierarchy composed of three stages in cd + t cell responses to the peptides. 
the immunodominance hierarchy may provide more variation for designing ctl-based covid- vaccines.

since the epidemics of sars and mers, scientists have not yet succeeded in developing preventive or therapeutic vaccines that would be available should these diseases re-emerge. in sars and mers vaccine development, the full-length s protein or its s subunit has frequently been used as an antigen to produce anti-rbd neutralizing antibodies. however, these vaccine candidates provided partial protection against virus challenge in animal models, accompanied by safety concerns such as ade ( ). furthermore, antibody responses to coronaviruses rapidly wane following infection or immunization ( , , ). considering the above, ctl-based vaccines against sars-cov- should be considered to provide robust long-lived t cell memory, although neutralizing antibody responses are a primary vaccine target. in the current study, we aimed to find hla-a* : -restricted dominant ctl epitopes derived from sars-cov- . dominant epitopes induce a strong immune response that eliminates a given pathogen quickly and effectively, and also contribute to the memory t cell pool. we focused on pp a of sars-cov- to find ctl epitopes because pp a is the largest, conserved polyprotein among the constituent proteins. further, pp a is produced earlier than structural proteins, suggesting that pp a-specific ctls can eliminate infected cells before the formation of mature virions. to predict ctl epitopes, we utilized bioinformatics to select peptides with high scores in four kinds of computer-based programs (table ). in the evaluation of peptide binding, peptides showed high or medium binding affinities to hla-a* : molecules, whereas the remaining peptides displayed low binding affinities or no binding (table ). of these, only eighteen peptides were found to be ctl epitopes. 
hence, we have to keep in mind that currently available algorithms have a limited ability to accurately predict ctl epitopes, although the bioinformatics approach is very useful for quickly predicting a number of epitopes in a large protein ( , ). among the epitopes identified in the current study, epitopes including pp a- , - , - , and - are present in the amino acid sequence of sars-cov pp a ( % identity) (table ). therefore, ctls induced by these four epitopes could also contribute to the clearance of sars-cov. in support of these data, le bert et al. reported that long-lasting memory t cells in sars-recovered individuals cross-reacted with the n protein of sars-cov- ( ). recently, several studies found that sars-cov- -reactive cd + and cd + t cells were detected in a substantial proportion of healthy donors who have never been infected with sars-cov- or sars-cov ( - ). it is most likely that these individuals were previously infected with one of the four human coronaviruses (hcov- e, -nl , -oc and -hku ) that cause the seasonal common cold. nelde et al. provided evidence that the amino acid sequences of several sars-cov- t cell epitopes recognized by unexposed individuals are similar to some amino acid sequences in the four seasonal common cold human coronaviruses with identities ranging from % to % (not % identity) ( ). however, no one has shown evidence that people with this cross-reactivity are less susceptible to [...] it may also be possible to assume that pre-existing t cell immunity might be detrimental through mechanisms such as original antigenic sin or ade ( ). in the current data, none of the epitopes was found in the amino acid sequences of the four human coronaviruses, suggesting that effective common ctl epitopes derived from pp a, if any, might be very few. here, we focused on ctl epitopes restricted by hla-a* : , which is the most common hla class i allele in the world, and used highly reactive hla-a* : transgenic mice, termed hhd mice ( ).
although we can use lymphocytes of sars-cov- -infected individuals to identify ctl epitopes, there are two main advantages to using hhd mice. first, a large amount of patient blood is required to examine many ctl epitope candidates, but any number of mice can be prepared for this purpose. second, when using patients' lymphocytes, we are only testing whether the peptide candidates are recognized by memory ctls. when using naïve mice, however, we can determine whether the epitope candidates are able to prime peptide-specific ctls, which may be a better criterion to judge them as vaccine antigens. an epitope that is efficiently recognized by ctls is presumably not always efficient at ctl priming. however, we should take into account that the immunogenic variation in hla class i transgenic mice may not be identical to that in humans because antigen processing and presentation differ between them.

in previous studies, we used peptide-linked liposomes as an immunogen ( ). the surface-linked liposomal peptides were effective for peptide-specific ctl induction in mice. however, attaching peptides to the surface of liposomes and then purifying them on a column is a fairly complicated and time-consuming process. in the current study, peptide-encapsulated liposomes were used as an immunogen. in contrast to the peptide-linked liposomes, the peptide-encapsulated liposomes are prepared simply by mixing liposomes and the peptide. in addition, the peptide-encapsulated liposomes are able to prime peptide-specific ctls in mice as efficiently as the peptide-linked liposomes. the liposome itself, consisting of lipid bilayers, is a very safe material for humans. therefore, the peptide-encapsulated liposomes are considered to be promising as a ctl-based vaccine candidate.

understanding the mechanism of immunodominance is obviously important for the development of effective vaccines. 
when mice were immunized with liposomes containing the mixture of peptides, it was found that some peptides induced peptide-specific ctls more strongly than other peptides (figs. & ). as shown in figs [...] table i , the peptide affinity of pp a- to hla-a* : is very high (bl = . m), while pp a- is a medium binder (bl = . m). interestingly, the peptide affinity of pp a- is lowest among the peptides selected (table i). these data suggest that the affinity of tcr to the peptide-mhc-i complex is critical for ctl immunodominance. in the selection of antigenic epitopes for the ctl-based vaccine against sars-cov- , dominant epitopes such as pp a- and - should be chosen because they produce a strong ctl response to eliminate virus-infected cells effectively. however, the immunological pressure exerted by dominant epitopes may lead to mutation of the epitope sequences of sars-cov- , and therefore, a vaccine containing multiple antigenic epitopes is recommended for a successful covid- vaccine.

it was surprising that not all of the genetically identical mice showed the same reaction pattern against the peptides when they were immunized with liposomes containing the mixture of peptides (fig. ). there were roughly three pattern types, a-c, that varied sequentially, suggesting the existence of an immunodominance hierarchy composed of three stages in cd + t cell responses to the peptides (fig. ). the differences among the three types might be explained by the timing of ctl expansion. in type a, dominant peptides, pp a- , - , and - presumably activated t cells more efficiently than the other peptides, and hence, dominant peptide-specific ctls proliferated faster and curtailed the expansion of ctls specific for the other peptides. in type b, it is considered that the expansion of dominant ctls specific for pp a- , and - was delayed for some reason compared to that in type a, and thereby subdominant ctls specific for pp a- could afford to expand. 
it is also thought that even non-dominant ctls proliferated because the expansion of both dominant ctls and subdominant ctls in the type c was later than that in the type b. in vivo ctl assay was carried out as described ( ). in brief, spleen cells from naive hhd mice were equally split into two populations. one population was pulsed with m of a relevant peptide and labeled with a high concentration ( . m) of carboxyfluorescein diacetate succinimidyl ester (cfse) (molecular probes, eugene, or). the other population was unpulsed and labeled with a lower concentration ( . m) of cfse. an equal number ( × ) of cells from each population was mixed together and adoptively transferred i.v. into mice that had been immunized once with a liposomal peptide. sixteen hours later, spleen cells were prepared and analyzed by flow cytometry. to calculate specific lysis, the following formula was used: % specific lysis = immunized twice with either each of the liposomal peptides (pp a- , - , - , - , - , - , - , - , , and - ) or liposomes alone together with cpg. one week later, in vivo peptide-specific killing activities were measured. three to five mice per group were used, and the data of % specific lysis are shown as the mean ± lessons for covid- immunity from other coronavirus infections structural genomics of sars-cov- indicates evolutionary conserved functional regions of viral proteins structure, function, and antigenicity of the sars-cov- spike glycoprotein sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor breadth of concomitant immune responses prior to patient recovery: a case report of non-severe covid- development of an inactivated vaccine candidate for sars-cov- nvx-cov vaccine protects cynomolgus macaque upper and lower airways against sars-cov- challenge vaccine bnt b selected for a pivotal efficacy study exhaustion of t cells in patients with coronavirus disease (covid- ). 
front immunol : longitudinal analyses reveal immunological misfiring in severe covid- origin and evolution of pathogenic coronaviruses efficient induction of cytotoxic t lymphocytes specific for severe acute respiratory syndrome (sars)-associated coronavirus by immunization with surface-linked liposomal peptides derived from a non-structural polyprotein a syfpeithi: database for mhc ligands and peptide motifs a hybrid approach for predicting promiscuous mhc class i restricted t cell epitopes propred : prediction of promiscuous mhc class-i binding sites the immune epitope database (iedb) cytotoxic t lymphocytes recognize a fragment of influenza virus matrix protein in association with hla-a an optimal viral peptide recognized by cd + t cells binds very tightly to the restricting class i major histocompatibility complex protein on intact cells but not to the purified class i protein -restricted education and cytolytic activity of cd + t lymphocytes from beta microglobulin (beta m) hla-a . monochain transgenic h- d b beta m double knockout mice a sequence homology and bioinformatics approach can predict candidate targets for immune responses to sars-cov- covid- coronavirus vaccine design using reverse vaccinology and machine learning sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls selective and cross-reactive sars-cov- t cell epitopes in unexposed humans sette a. . 
targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals sars-cov- -reactive t cells in healthy donors and patients with covid- sars-cov- -epitopes define heterologous and covid- -induced t-cell recognition pre-existing immunity to sars-cov- : the knowns and unknowns introduction of a point mutation into an hla class i single-chain trimer induces enhancement of ctl priming and antitumor immunity partial purification and some properties of a cytotoxic monoclonal antibody with specificity for hla-a and a variant of hla-a b & c) comparison of the peptides in the induction of ifn- + cd + t cells seventeen hhd mice were immunized once with the mixture of peptides involving after one week, spleen cells were stimulated with or without each of the cd + t cells was stained. (c) y-axis indicates the relative percentages of ifn- + cells in cd + t cells which were calculated by subtracting the % of ifn- + cells in cd + t cells without a peptide from the % of ifn- + cells in each blue circle represents an individual mouse data are shown as the mean (horizontal bars) ± sd. (b) statistical comparisons of the relative % values of ifn- + cd + t cells among the peptides in fig. c were made by one-way anova followed by post-hoc tests. *, p < comparison of the peptides in the induction of cd a + cd + t cells (a) and after one week, spleen cells were stimulated with or without each of the peptides t cells was analyzed. data indicates the relative percentages of cd a + (a) and cd + (b) cells in cd + t cells which were obtained by subtracting the % of cd a + and cd + cells in cd + t cells without a peptide from the % of cd a + and cd + cells each red (a) or green (b) circle represents an individual mouse. data are shown as the mean (horizontal bars) ± sd. statistical analyses of the data among the peptides in fig. a and fig. 
b were performed by one-way anova followed by post-hoc tests in fig three types of reactivity in mice immunized with the mixture of the peptides fifteen mice were immunized once with the mixture of peptides including pp a- after one week, spleen cells were stimulated with or without each of the peptides, and intracellular ifn- in cd + t cells was stained peptides, fifteen mice were divided into three types, a-c. each graph represents reactivity of an individual mouse. y-axis indicates the relative percentage of ifn- + cells cd + t cells which was calculated by subtracting the % of ifn- + cells in cd + t cells without a peptide from the % of ifn- + cells in cd + t cells with a relevant peptide statistical analyses of the relative % values to peptides in each type were performed by one-way anova followed by post-hoc tests key: cord- - erodkv authors: hassan, sk. sarif; attrish, diksha; ghosh, shinjini; choudhury, pabitra pal; uversky, vladimir n.; uhal, bruce d.; lundstrom, kenneth; rezaei, nima; aljabali, alaa a. a.; seyran, murat; pizzol, damiano; adadi, parise; abd el-aziz, tarek mohamed; soares, antonio; kandimalla, ramesh; tambuwala, murtaza; lal, amos; azad, gajendra kumar; sherchan, samendra p.; baetas-da-cruz, wagner; palù, giorgio; brufsky, adam m. title: notable sequence homology of the orf protein introspects the architecture of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: erodkv the global public health is endangered due to covid- pandemic, which is caused by severe acute respiratory syndrome coronavirus- (sars-cov- ). despite having similar pathology to mers and sars-cov, the infection fatality rate of sars-cov- is likely lower than %. sars-cov- has been reported to be uniquely characterized by the accessory protein orf , which contains eleven cytotoxic t lymphocyte (ctl) epitopes of nine amino acids length each, across various human leukocyte antigen (hla) subtypes. 
in this study, all missense mutations found in sequence databases were examined across twenty-two unique sars-cov- orf variants that could possibly alter viral pathogenicity. some of these mutations decrease the stability of orf ; for example, i l and v i were found in the morf region of orf and may also contribute to intrinsic protein disorder. furthermore, a physicochemical and structural comparative analysis was carried out on sars-cov- and pangolin-cov orf proteins, which share . % amino acid homology. the high degree of physicochemical and structural similarity of the orf proteins of sars-cov- and pangolin-cov raises questions about the architecture of sars-cov- , because these two orf proteins disagree in their sub-structure (loop/coil region), solubility, antigenicity and in the change from strand to coil at amino acid position , where tyrosine is present. altogether, sars-cov- orf is a promising pharmaceutical target and a protein that should be monitored for changes that correlate with altered pathogenesis and clinical course of covid- infection. severe acute respiratory syndrome coronavirus- (sars-cov- ), responsible for the global pandemic, has brought the whole world to a stand-still , . the contagious nature of this virus is concerning, as it has infected more than million people worldwide, claiming , deaths so far [ ] [ ] [ ] . in addition to low-pathogenicity and endemic coronaviruses, the highly pathogenic severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov) in and , respectively, caused severe human illnesses, e.g. pneumonia and renal failure, but without any pandemic-grade transmission capacity . sars-cov had a . % infection fatality rate and mers a % infection fatality rate, but sars-cov- has an infection fatality rate lower than % . therefore, it is vital to monitor critical mutations in proteins such as orf (open reading frame ) that could possibly change viral pathogenicity.
sars-cov- is a baltimore class iv positive-sense, single-stranded rna virus with four structural proteins, sixteen non-structural proteins, and six accessory proteins . the smallest accessory protein in sars-cov- , the -residue peptide orf , and distinguishes the infection more rapidly than pcr based strategies . the protein sars-cov- orf has the highest number of immunogenic epitopes of all putative orf proteins, making it a potential target for vaccine development . due to its short length, orf has been suggested to be an insertion mutation. however, this is unlikely as the orf gene is present at the terminal its sgrna sequence. it has been hypothesised that orf is a transposon, but this is also unlikely, as transposons are larger . orf contains a molecular recognition feature (morf) region from amino acid residue to , which is a molecular recognition site for interaction with other proteins . morfs are one of the critical properties of intrinsically disordered proteins: they allow a protein to adopt an ensemble of conformations when bound to different partners, and this permits interaction with multiple proteins . high-throughput analysis revealed that orf can interact with a large number of host proteins despite its small size; this can likely be attributed to the morf region . through bioinformatics, it was previously reported that the sars-cov- orf interacts with multiple members of the cullin-ubiquitin-ligase complex and controls the host-ubiquitin machinery for viral pathogenesis [ ] [ ] [ ] [ ] . humans may not have been able to utilize any memory b and t cells elicited against other microorganisms to target orf and fight sars-cov- , contributing to its contagious nature . it was further reported that no sequence homology was found with any protein in the ncbi protein depository. recently, sars-cov- orf was found to have . % nucleotide similarity to that of pangolin-cov- , .
the present study examines mutations discovered in sars-cov- orf variants, which along with their physicochemical and immunological properties suggest the significance of these mutations in altering pathogenesis and possibly identify some potential vaccine candidates. an inclusive parity and disparity analysis between the two orf proteins of sars-cov- and pangolin-cov was also conducted. each unique orf sequence was aligned using the national center for biotechnology information (ncbi) protein p-blast and omega blast suites to determine the mismatches, and thereby the missense mutations (amino acid changes) were identified , ( figure (a) ). a mutation from one amino acid a1 to another amino acid a2 at position p is denoted by a1pa2 or a1(p)a2. based on the mutations, conserved and non-conserved residues in orf proteins are identified and marked in different colors in (figure there are altogether distinct missense mutations which were examined across unique orf variants of sars-cov- . these missense mutations are found throughout the orf sequence, from amino acid position to . the amino acids arginine (r), valine (v), and leucine (l) are substituted by more than one amino acid at fixed positions (marked magenta in figure (b)). the largest conserved region across all the orf variants is "slllc" at positions - . note that each unique variant ( table . twenty-two orf proteins (sars-cov- ) and their corresponding mutations and predicted effects with changes in chemical properties. * provean score: if the provean score is equal to or below a predefined threshold (e.g., - . ), the protein variant is predicted to have a "deleterious" effect. if the provean score is above the threshold, the variant is predicted to have a "neutral" effect. ri: reliability index ranges from to .
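the mutation notation and the provean decision rule described above can be sketched in a few lines of python. this is a minimal illustration, not the authors' pipeline: it assumes the two sequences are already aligned and of equal length, the function names are hypothetical, the toy sequences are not orf sequences, and the threshold is left as a caller-supplied parameter because its value is not legible in this text.

```python
from typing import List


def call_missense_mutations(reference: str, variant: str) -> List[str]:
    """List substitutions between two pre-aligned, equal-length protein sequences.

    Each substitution is written as <reference residue><1-based position><variant residue>.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for position, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1):
        if ref_aa != var_aa:
            mutations.append(f"{ref_aa}{position}{var_aa}")
    return mutations


def classify_provean(score: float, threshold: float) -> str:
    """Apply the rule from the table note: at or below the threshold is deleterious."""
    return "deleterious" if score <= threshold else "neutral"


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not ORF sequences from the study.
    print(call_missense_mutations("MKTAYIAK", "MKTAHIAK"))
    # The threshold below is a placeholder, not the value used in the table.
    print(classify_provean(score=-4.2, threshold=-1.0))
```

because each variant in the study carries a single substitution, the returned list would hold one entry per variant; longer lists would flag sequences that need a closer look.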
from table , it was established that the majority of the diversified mutations are deleterious and cause the stability of the protein to decrease, thus indicating the amplification of intricate virulence of sars-cov- . it was reported that sars-cov- orf is not homologous with other proteins in the ncbi depository . the sars-cov- orf was blasted in the ncbi depository and no significant homology was detected for orf sars-cov as well as bat-cov orf . surprisingly, sars-cov- orf showed . % homology to pangolin-cov orf (qig . (release date: - - ; collection date: - - ; geo-location: china; host: sunda pangolin (manis javanica))) ( figure only the serine (s) has been mutated to asparagine (n) at amino acid position in sars-cov- orf from the pangolin-cov orf and the mutation is deleterious (provean score - ). due to this mutation, the stability of the protein structure is predicted to be decreased and consequently the intricate virulence of sars-cov- will escalate. analysis of the per-residue intrinsic disorder predispositions of the orf of sars-cov- and orf proteins from sars-cov and pangolin-cov provide further evidence of their differences. figure a represents the results of this analysis and shows that while orf proteins from sars-cov- and pangolin-cov show very similar disorder profiles, the per-residue disorder propensity of the orf protein from sars-cov is remarkably different, especially within the c-terminal half of this protein. this is in agreement with the results of other analyses conducted in this study. (b) analysis of the intrinsic disorder predisposition of the unique variants of sars-cov orf in comparison with the reference orf protein from sars-cov- (yp_ ) from the nc_ sars-cov genome (china, wuhan) (bold black curve). analysis is conducted using pondr-vsl algorithm , which is one of the more accurate standalone disorder predictors [ ] [ ] [ ] . a disorder threshold is indicated as a thin line (at score = . ). 
residues/regions with the disorder scores > . are considered as disordered, whereas residues with disorder scores between . and . are considered highly flexible, and residues with disorder scores between . and . are taken as moderately flexible. flattened. interestingly, comparison of the figure a and b shows that the variability in the disorder predisposition between many variants of the orf protein from various sars-cov- isolates is noticeably greater than that between the reference orf from sars-cov- and orf from pangolin-cov. on the other hand, none of the sars-cov- orf variants (with the exception for the truncated qjr . variant) has as disordered c-terminal half as the orf protein from sars-cov does. considering the highest amount of sequence homology of orf proteins of sars-cov- and pangolin-cov, we intended to discover the parity and disparity between the orf proteins of sars-cov- and pangolin-cov. we, therefore, performed a multi-dimensional analysis of both orf proteins from structural, physicochemical, biophysical and immunological aspects to understand the origin of sars-cov- from the orf perspective. exploration of similarities between sars-cov- and pangolin orf sequences ( figure a ) revealed that neither of them had disulfide linkages. however, many differences were detected. the sars-cov- orf protein was classified as an alpha-helical transmembrane protein (with probability . ) owing to the server abtmpro as well as the presence of a majority of hydrophobic amino acids, whereas the pangolin-cov orf sequence was predicted to be a non-transmembrane protein (with probability . ). also, it was discerned that the predicted probability of antigenicity of sars-cov- orf was slightly higher than that of pangolin-cov orf . it was predicted that both proteins are located in the capsid region of the virus as both of them have a positive distance score, with a higher score for pangolin-cov ( . ) than for sars-cov- ( . ). 
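the three score bands used in the disorder analysis above (disordered, highly flexible, moderately flexible) amount to a simple lookup on the per-residue score. the sketch below is only an illustration: the cutoffs are passed in by the caller because their values are not legible in this text, the handling of scores that fall exactly on a cutoff is an assumption, and the label "ordered" for scores below the lowest band is also an assumption.

```python
from typing import List


def classify_disorder(score: float, disordered_above: float,
                      flexible_above: float, moderate_above: float) -> str:
    """Map one per-residue disorder score to a qualitative band.

    Cutoffs must satisfy moderate_above < flexible_above < disordered_above.
    Scores exactly on a cutoff fall into the lower band (an assumption).
    """
    if not (moderate_above < flexible_above < disordered_above):
        raise ValueError("cutoffs must be strictly increasing")
    if score > disordered_above:
        return "disordered"
    if score > flexible_above:
        return "highly flexible"
    if score > moderate_above:
        return "moderately flexible"
    return "ordered"  # label for the lowest band is an assumption


def classify_profile(scores: List[float], disordered_above: float,
                     flexible_above: float, moderate_above: float) -> List[str]:
    """Classify every residue in a per-residue disorder profile."""
    return [classify_disorder(s, disordered_above, flexible_above, moderate_above)
            for s in scores]


if __name__ == "__main__":
    # Placeholder cutoffs and scores, chosen only to exercise each branch.
    print(classify_profile([0.9, 0.4, 0.2, 0.05], 0.6, 0.3, 0.1))
```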
to achieve deeper insights into the orf proteins of sars-cov- and pangolin-cov, we characterized their secondary structure ( figure b ) and found them to be very similar except for a significant difference at position , tyr (y), which is in the coil region for sars-cov- orf but in the strand region for pangolin-cov orf . most of the residues, in sars-cov- orf and in pangolin-cov orf , are buried and consequently, the solubility of sars-cov- orf is slightly higher than that of pangolin-cov. after the structural and fundamental property studies, a thorough analysis of the physicochemical properties of the two orf proteins of sars-cov- and pangolin-cov was performed, which revealed high similarity in extinction coefficient, isoelectric point and net charge ( figure a ). however, the molecular weight ( . g/mol) of the sars-cov- orf was higher than that of pangolin-cov orf ( . g/mol), due to the substitution of s (low molecular weight) of pangolin-cov by n (high molecular weight) in sars-cov- . the enzyme cleavage sites for the sars-cov- and pangolin-cov orf were also indistinguishable for all proteases ( figure b ). protein intrinsic disorder analysis revealed the presence of hotloops in both sequences within the same span of amino acids ( ). however, the presence of loops/coils ( ) was a distinct characteristic of sars-cov- orf and no such structures were observed for pangolin orf (figure ). eleven distinct epitopes in the sars-cov- orf were identified and analysed for binding affinity using pickpocket across hla subtypes. the iedb score was predicted using the iedb immunogenicity tool. eleven epitopes (marked in orange) from the wuhan sars-cov- orf sequence. scores in red/blue show an increase/decrease concerning the score associated with nine epitopes. green marked scores convey the immunogenicity value remaining unchanged.
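the molecular-weight difference attributed above to the s-to-n substitution can be checked with a short calculation. the sketch below sums standard average residue masses and adds one water molecule for the free termini; it is an illustration under those assumptions, the function name is hypothetical, and the toy peptides are not the orf sequences from the study.

```python
# Average residue masses in g/mol (amino acid minus one water), standard values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Average molecular weight of an unmodified linear peptide, in g/mol."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None


if __name__ == "__main__":
    # Toy peptides that differ only by serine versus asparagine at one position.
    with_serine = "MGYISAF"
    with_asparagine = "MGYINAF"
    difference = peptide_mass(with_asparagine) - peptide_mass(with_serine)
    print(f"mass gained by S to N: {difference:.4f} g/mol")
```

whatever the rest of the sequence is, a single serine-to-asparagine change shifts the average mass by the difference between the two residue masses, which is why the two proteins differ only slightly in molecular weight.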
to shed light on the immunogenic properties of orf , we carried out immunoinformatics analysis and identified ( figure ) nine amino acid long epitopes for cytotoxic t-lymphocytes (ctls) from the sars-cov- orf sequence across all hla subtypes. their scores were recorded, and corresponding epitope-bearing mutations were analysed. the scores were compared with those of the original epitopes to predict the increase or decrease in binding affinity for class i mhc molecules due to mutations. these eleven epitopes and the mutation-bearing epitopes were analysed using the iedb tool to account for their immunogenicity. a detailed study of the orf protein was carried out to evaluate its potential to yield variants that could possibly alter viral pathogenicity. it was observed that each sars-cov- orf sequence possesses one distinct mutation, and that the mutation in each of the twenty-two sars-cov- orf variants is at a different position. none of these mutations in the sars-cov- orf , however, contributes to the determination of clades of sars-cov- . of all variants, a total of variants were identified to possess mutations at amino acid positions - and in a region predicted to contain overlapping loops/coils and hot-loop regions of the orf protein. all mutations were predicted to be deleterious and to decrease protein structure stability, except s f, which increased stability, suggesting that these mutations play an active role in enhancing intrinsic propensity disorder (ipd) and allowing the protein to undergo more favorable interactions with other proteins. two other mutations, i l and v i, were found in the morf region of orf and may also contribute to the ipd. the mutations at positions and were also significant due to their sensitivity to trypsin activity. four orf variants (qnc . , qmt . , qmu . and qla . ) possess four mutations at these two positions.
among them, three variants harboring the mutations r i, r l and r c provide trypsin resistance, while the fourth variant (qla . ) with the r k mutation is susceptible to protease degradation. an amino acid homology of . % was observed between sars-cov- orf and pangolin-cov orf . although most physicochemical and peptide properties are similar, the probability of antigenicity is greater for sars-cov- orf than for pangolin-cov orf , and consequently a stronger immune response is predicted for sars-cov- orf . a change from strand (pangolin-cov orf ) to coil (sars-cov- orf ) at position (tyrosine (y)) is predicted, indicating a more disordered state of the protein. a sequence with the y h mutation was also detected in sars-cov- orf , in which a hydrophobic amino acid was replaced by a hydrophilic amino acid, thus increasing the probability of more ionic interactions. the analysis identified orf mutations predicted to alter binding affinity to the respective hla alleles and possibly to change the immunogenicity of sars-cov- orf correspondingly. eight orf variants (each containing one of the following mutations: g d, i l, i m, y c, y h, f s, l s and l p (table ) ) accounted for % of total mutations and demonstrated decreased affinity for mhc class i, % of the variants (carrying mutations r k, r i, r c, r l and d y) were predicted to have increased affinity, and % of the variants (carrying mutations v i, a v, p s, s f, a v and v a) contain both high and low binding affinity epitopes. this may indicate that mutations in orf predominantly decrease the affinity of epitopes to escape the host-immune system, while in the mixed cases the effect of increased affinity by mutations is nullified by the presence of mutations contributing to decreased affinity. for mutations showing only increased binding affinity epitopes, it is hypothesized that acquiring more than one mutation in a single sequence in the future will nullify them as well.
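the trypsin argument above rests on a simple rule: trypsin cleaves on the c-terminal side of arginine (r) or lysine (k). replacing r with i, l or c removes the site, while replacing r with k keeps it. the sketch below illustrates that rule only; it assumes the common simplification that a following proline blocks cleavage, the function name is hypothetical, and the toy peptides are not the orf sequences from the study.

```python
from typing import List


def trypsin_sites(sequence: str) -> List[int]:
    """Return 1-based positions of residues after which trypsin is expected to cut.

    Simplified rule: cut after K or R unless the next residue is P.
    The C-terminal residue is skipped because nothing follows it.
    """
    seq = sequence.upper()
    sites = []
    for index in range(len(seq) - 1):
        if seq[index] in "KR" and seq[index + 1] != "P":
            sites.append(index + 1)
    return sites


if __name__ == "__main__":
    # Toy peptides: the same backbone with different residues at position 4.
    for label, peptide in [("R (reference)", "MGYRNAF"),
                           ("R to K", "MGYKNAF"),
                           ("R to I", "MGYINAF"),
                           ("R to L", "MGYLNAF"),
                           ("R to C", "MGYCNAF")]:
        print(label, trypsin_sites(peptide))
```

in this toy case only the reference and the r-to-k peptide report a cleavage site, which mirrors the distinction drawn above between the trypsin-resistant and the protease-susceptible variants.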
in addition, the immunogenicity score prediction revealed that a large number of mutations decreased or did not change the immunogenicity score and very few of them increased it, which may be a possible strategy adopted by sars-cov- to evade the host-immune response. six mutation-bearing sequences (qlj . , qmt . , qly . , qnc . , qmt . , and qlg . ) were found to contain epitopes showing both high affinity binding for mhc class and high immunogenicity, indicating that these epitopes can mount a significant immune response and might serve as potential targets for vaccine candidates. further critical study of orf sars-cov is necessary to monitor high frequency mutations that could change viral pathogenesis. the orf proteins of sars-cov- and pangolin-cov are similar. however, notable differences are predicted between these two orf proteins in terms of loop/coil structure, antigenicity, solubility, and in the mutational diversification of sars-cov- . these significant disagreements in various physicochemical, structural and immunological properties, despite an amino acid homology ( . %) between the orf proteins of sars-cov- and pangolin-cov, are quite surprising and deserve further study. there were , complete genomes of sars-cov- available on the ncbi (national center for biotechnology information) database, as of th august . each genome contains the orf accessory protein and among them only sequences were found to be unique. among these unique orf protein sequences, sequences possess exactly one missense mutation each and the remaining sequences possess ambiguous mutations. notably, only one orf sequence (qjr . ) was truncated due to a nonsense mutation at amino acid position . the present study focused on these orf proteins ( the miscellany of orf variants of sars-cov- is clearly observed in the sequence-based homology (figure (a) ) and phylogeny (figure (b) ).
each orf variant of sars-cov- differs from the wuhan sars-cov- orf sequence by a single amino acid change at a distinct position. notably, these positions ( ) vary widely, from position to , across the sars-cov- orf variants. the various properties of the orf proteins were predicted using several webservers, which are briefly described as follows.
• the provean webserver was used to predict the effect of the identified mutations, and another webserver, i-mutant, was used to predict the structural effects of the mutations [ ] [ ] [ ] . the quark webserver was used for the prediction of the secondary structure of orf proteins [ ] [ ] [ ] .
• given an amino acid sequence, the abtmpro webserver predicts whether the given sequence is a transmembrane protein. if it is, the server further predicts the probabilities of the protein being an alpha-helix transmembrane protein or a beta barrel transmembrane protein. in addition, the innovagen webserver was used for various peptide property findings .
• dipro predicts whether a given protein sequence contains a cysteine disulfide bond, based on d recurrent neural network, support vector machine, graph matching and regression algorithms .
• protein antigenicity was predicted using the webserver antigenpro, which is a sequence-based, alignment-free and pathogen-independent predictor. a two-stage architecture produces the predicted probability based on multiple representations of the primary sequence and five machine learning algorithms . the intrinsic disorder prediction of a given protein sequence was made using the server disembl .
• epitopes of a given amino acid sequence were identified and analyzed for binding affinity using pickpocket across hla (human leukocyte antigen) subtypes . the iedb (immune epitope database) score was predicted using the iedb immunogenicity tool , .
per-residue disorder distribution within orf protein sequences was evaluated by pondr −v sl , which is one of the more accurate standalone disorder predictors [ ] [ ] [ ] . the per-residue disorder predisposition scores are on a scale from to , where values of indicate fully ordered residues, and values of indicate fully disordered residues. values above the threshold of . are considered disordered residues, whereas residues with disorder scores between . and . are considered highly flexible, and residues with disorder scores between . and . are taken as moderately flexible. sars-cov- detection in patients with influenza-like illness human neutralizing antibodies elicited by sars-cov- infection case report: death due to covid- in three brothers. the am pathological findings of covid- associated with acute respiratory distress syndrome. the lancet respiratory medicine coronavirus infections and immune responses rooting the phylogenetic tree of middle east respiratory syndrome coronavirus by characterization of a conspecific virus from an african bat the continuing -ncov epidemic threat of novel coronaviruses to global health-the latest novel coronavirus outbreak in wuhan, china the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- variant analysis of sars-cov- genomes bioinformatic prediction of potential t cell epitopes for sars-cov- understanding covid- via comparative analysis of dark proteomes of sars-cov- , human sars and bat sars-like coronaviruses multitude of binding modes attainable by intrinsically disordered proteins: a portrait gallery of disorderbased complexes a sars-cov- protein interaction map reveals targets for drug repurposing sars-cov- molecular network structure virus-host interactome and proteomic survey of pmbcs from covid- patients reveal potential virulence factors influencing sars-cov- pathogenesis coding potential and sequence conservation of sars-cov- and related animal viruses 
sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls the architecture of sars-cov- transcriptome are pangolins the intermediate host of the novel coronavirus (sars-cov- )? ncbi blast: a better web interface the embl-ebi search and sequence analysis tools apis in morfchibi system: software tools for the identification of morfs in protein sequences exploiting heterogeneous sequence properties improves prediction of protein disorder comprehensive review of methods for prediction of intrinsic disorder and its molecular functions comprehensive comparative assessment of in-silico predictors of disordered regions accurate prediction of disorder in protein chains with a comprehensive and empirically designed consensus co-infection with sars-cov- and influenza a virus in patient with pneumonia, china predicting the functional effect of amino acid substitutions and indels a fast computation of pairwise sequence alignment scores between a protein and a set of single-locus variants of another protein provean web server: a tool to predict the functional effect of amino acid substitutions and indels ab initio protein structure assembly using continuous structure fragments and optimized knowledgebased force field toward optimal fragment generations for ab initio protein structure assembly molecular conservation and differential mutation on orf a gene in indian sars-cov genomes scratch: a protein structure and structural feature prediction server large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching high-throughput prediction of protein antigenicity using protein microarray data protein disorder prediction: implications for structural proteomics the pickpocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to mhc-peptide binding the immune epitope database (iedb): update authors acknowledge the ncbi 
sequence (sars-cov- and pangolin-cov- ) depositors. ssh conceived the problem and experiment(s). da, sg, ssh, vnu examined the mutations. ssh, ppc, da, sg and vnu analysed the results. ssh wrote the primary draft of the article. all authors reviewed, edited and approved the final manuscript. the authors have no conflicts of interest to declare. key: cord- -klb oe q authors: chen, serena h.; young, m. todd; gounley, john; stanley, christopher; bhowmik, debsindhu title: distinct structural flexibility within sars-cov- spike protein reveals potential therapeutic targets date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: klb oe q the emergence and rapid worldwide spread of the novel coronavirus disease, covid- , has prompted concerted efforts to find successful treatments. the causative virus, severe acute respiratory syndrome coronavirus (sars-cov- ), uses its spike (s) protein to gain entry into host cells. therefore, the s protein presents a viable target to develop a directed therapy. here, we deployed an integrated artificial intelligence with molecular dynamics simulation approach to provide new details of the s protein structure. based on a comprehensive structural analysis of s proteins from sars-cov- and previous human coronaviruses, we found that the protomer state of s proteins is structurally flexible. without the presence of a stabilizing beta sheet from another protomer chain, two regions in the s domain and the hinge connecting the s and s subunits lose their secondary structures. interestingly, the region in the s domain was previously identified as an immunodominant site in the sars-cov- s protein. we anticipate that the molecular details elucidated here will assist in effective therapeutic development for covid- . a new coronavirus, severe acute respiratory syndrome coronavirus (sars-cov- ), causes respiratory illness and has now reached a pandemic scale. 
denoted as coronavirus disease (covid- ) , its global spread is currently ongoing with symptoms that can range from mild flu-like to severe pneumonia, leading to death in certain cases. early investigations in china showed that sars-cov- has a high genomic sequence similarity to the previous sars-cov- , along with a bat coronavirus [ ] , [ ] . similar to sars-cov- , sars-cov- is a positive-sense, single-stranded rna virus of the betacoronavirus genus. given the health crisis caused by the mounting number of covid- cases worldwide, there is an urgent need to develop effective therapeutics and eventual vaccines. a critical step in developing targeted treatments is obtaining a detailed understanding of the molecular pathways of sars-cov- and constituent structures. the structural biology community has made rapid progress towards this goal by experimentally determining several sars-cov- proteins, including the spike (s) [ ] - [ ] , nucleocapsid (n) [ ] , and main protease (m pro ) [ ] . the s protein is particularly important since it resides on the viral envelope and is responsible for host cell entry by engaging angiotensin-converting enzyme (ace ) receptors [ ] , [ ] , [ ] .
notice: this manuscript has been authored by ut-battelle, llc under contract no. de-ac - or with the u.s. department of energy. the united states government retains and the publisher, by accepting the article for publication, acknowledges that the united states government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for united states government purposes. the department of energy will provide public access to these results of federally sponsored research in accordance with the doe public access plan (http://energy.gov/downloads/doe-public-accessplan).
recent experimental structures of the sars-cov- s protein receptor binding domain (rbd) in complex with ace provide detailed interface information [ ] , [ ] ; targeting this interface represents an active area of research for therapeutic development [ ] . however, there may exist potential targets on the s protein besides the rbd domain. further motivating our work is the need to understand sars-cov- structure in the context of structures from previous coronaviruses. prior to the emergence of sars-cov- , comparisons of the trimeric s protein from different viruses showed they possess overall similar structures but with some local differences [ ] , [ ] . with the appearance of sars-cov- , it is necessary to investigate the s protein structure [ ] , and compare it to previous human coronaviruses. here, to obtain deeper insights into s protein structure for biological understanding and therapeutic targeting, we employ a combined molecular dynamics (md) simulation and artificial intelligence (ai) methodology on a series of coronavirus s proteins. specifically, we investigate the s proteins from the current sars-cov- , sars-cov- , middle east respiratory syndrome coronavirus (mers-cov), and human coronavirus hku (hcov-hku ). from structural analysis of extensive md simulations, we find substantial flexibility between the subunits of the s proteins. we find that reduced distance matrix representations, interpreted by unsupervised deep learning (dl), reveal important regions for s protein trimerization. these regions present potential targets for therapeutic development. human coronavirus spike (s) proteins are molecular complexes each formed by three protein chains [ ] , [ ] - [ ] . 
to better understand s protein structure [ ] , we started by studying the protomer, the structural unit of the trimeric complex, and compared the protomer structure of the current sars-cov- with protomers of previously identified human coronaviruses, including sars-cov- [ ] , mers-cov [ ] , and hcov-hku [ ] . we modeled each protomer system from the corresponding cryo-em s protein structure (see methods for more details). there are three major structural domains which constitute the protomer: the amino-terminal domain (ntd), receptor binding domain (rbd), and the s domain. the ntd and rbd are within the s subunit and are responsible for binding to host receptors, while the s domain is in the s subunit which mediates fusion of the viral and host membranes [ ] , [ ] . fig. a-d shows the initial protomer structures of the four s proteins, each highlighted by the three domains. to determine the dynamic change of the domain organization in the protomer structures, we measured the distribution of the interdomain distances from , structures of each system taken from independent ns md simulations (fig. e-g) . given the limited timescale, we made no attempt to reproduce the thermodynamics but focused on computing the structural features of the systems [ ] . the overall distributions of the three interdomain distances are broad, ranging from Å to Å between s and ntd/rbd domains and Å to Å between ntd and rbd domains. these wide distributions indicate enhanced structural flexibility in the domain arrangement of the four protomer systems compared to the cryo-em structures. to gain further insight into the , protomer structures, we applied convolutional variational autoencoders (cvaes) to encode the high dimensional protein structures from the md simulations into -d latent spaces for visualization. 
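The interdomain distance measurement described above can be sketched in a few lines. This is a minimal sketch, not the authors' analysis code: it assumes each domain is summarized by the geometric center of its cα atoms (the text does not state how domain positions were defined), and the function name, domain labels, and toy coordinates are hypothetical.

```python
import numpy as np

def interdomain_distances(frames, domains):
    """Center-to-center distance for every domain pair in every frame.

    frames  : array of shape (n_frames, n_atoms, 3), one C-alpha per residue
    domains : dict mapping a domain label to an array of atom indices
    Assumption: a domain's position is the geometric center of its atoms.
    """
    names = sorted(domains)
    centers = {n: frames[:, domains[n], :].mean(axis=1) for n in names}
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            result[(a, b)] = np.linalg.norm(centers[a] - centers[b], axis=1)
    return result

# Toy data (hypothetical): 2 frames, 6 atoms, two 3-atom domains.
frames = np.zeros((2, 6, 3))
frames[0, 3:] = [3.0, 4.0, 0.0]   # second domain sits 5 units away in frame 0
frames[1, 3:] = [6.0, 8.0, 0.0]   # and 10 units away in frame 1
domains = {"ntd": np.arange(0, 3), "rbd": np.arange(3, 6)}

dists = interdomain_distances(frames, domains)
# A histogram of the per-frame values gives the kind of distribution plotted
# for each domain pair.
hist, edges = np.histogram(dists[("ntd", "rbd")], bins=5)
```

With real trajectories, `frames` would hold the sampled md structures and `domains` the residue ranges of the ntd, rbd, and s domain.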
compared to commonly applied structural analysis methods, which are mostly limited to specific regions of the protein of interest, our dl approach allows us to evaluate a protein structure as a whole yet still capture detailed local structural features. we found individual clusters corresponding to each of the four protomer systems, suggesting that the structural features of the four systems embedded in the latent spaces are distinct from each other (fig. a-c). representative structures selected from the clusters also show high structural flexibility in their domain arrangement (fig. d-f). the hinge region connecting the s and s subunits opens up, producing the broad distributions of the interdomain distances observed in fig. e-g. the exposed surface in the protomer may provide new targets for therapeutic action. therefore, we further explored the structural flexibility of the sars-cov- s protein. we compared the protomer and trimer structures to investigate whether the large structural arrangement of the domains in the protomer is affected by the trimeric state. we found that domain organization is stabilized by trimerization. unlike in the protomer, the three domains are closely arranged in the trimer (figs. a-b and s ). in addition to their interdomain distances, the differences in structural flexibility between the protomer and trimer are captured in solvent-accessible surface area (sasa) values (fig. c). the sasa values of the protomer and one chain of the trimer are ± nm and ± nm , respectively. the difference in these values largely comes from their s domains, which have values of ± nm and ± nm , respectively. again, we applied a cvae to encode the , protomer and trimer structures into -d latent spaces. we found two individual clusters, one corresponding to the protomer and the other to the trimer, suggesting that the structural features of the two oligomeric states embedded in the latent spaces are different (fig. d-g).
to locate the structural features that differ between the protomer and trimer in their -d structures, we computed the difference between the distance matrices of the representative protomer and trimer structures selected from the clusters in the latent dimensions (fig. a). most differences between the distance matrices result from interdomain arrangement. however, there are some disparities within the s domain (fig. b). one region of difference corresponds to residue numbers to in the pdb. in the -d structure, this region forms a beta sheet in the trimer while the structure is lost in the protomer (fig. c). further investigation into this region reveals that the beta sheet in the trimer is stabilized by another beta sheet formed by residues to of another chain (fig. d). without the presence of other chains, this stability is lost; not only do residues to in the s domain lose the beta sheet structure, but residues to in the hinge region connecting the two subunits become an extended loop (fig. e). oligomeric proteins are often stabilized by oligomerization and hold highly flexible regions in the protomer state [ ] . the transition between the structured and unstructured forms of these flexible regions is sometimes reversible, but due to the size of the s protein protomer and the timescale applied in this study, we did not observe reversible behavior of the two loop regions in the protomer state. interestingly, a fragment between residues and was identified previously as an immunodominant site in sars-cov- s protein [ ] . complementary antibodies acting on this site provided the dominant immune response for patients who recovered from the sars-cov infection. our results offer a structural explanation for this earlier experimental observation and suggest that targeting these residues might not only interfere with s protein trimerization and the subsequent viral activity but also aid the antibody immune response.
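The comparison of distance matrices used above to localize protomer/trimer differences can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' procedure: the per-residue score (the row mean of the absolute difference) and the four-residue toy coordinates are hypothetical choices made only to show the idea.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances between rows of an (n_residues, 3) array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def distance_matrix_difference(coords_a, coords_b):
    """Absolute difference of two distance matrices and a per-residue score.

    Assumption: both structures have the same residues in the same order.
    The score for residue i is the mean of row i of the difference matrix.
    """
    delta = np.abs(distance_matrix(coords_a) - distance_matrix(coords_b))
    return delta, delta.mean(axis=1)

# Toy structures (hypothetical): a straight four-residue chain, and the same
# chain with only the last residue displaced.
state_a = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
state_b = state_a.copy()
state_b[3] = [2.0, 1.0, 0.0]

delta, score = distance_matrix_difference(state_a, state_b)
most_changed = int(np.argmax(score))  # index of the residue that moved most
```

Large off-diagonal blocks in `delta` correspond to changes in interdomain arrangement, while localized rows with high scores point to regions such as the one discussed above.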
our findings provide insights into therapeutic design targeting the s protein beyond the oft-targeted rbd domain. here, we used unsupervised deep learning routines based on cvaes to systematically compare s protein ensembles from md simulations across lower-dimensional, latent spaces. by first comparing the s protein protomer structure of sars-cov- to those from previous human coronaviruses, we identified distinct clusters for each virus in the -d latent space, where representative structures from these clusters highlight their differences in domain flexibility. next, we compared the sars-cov- s protein protomer and trimer structures, which also displayed a clear separation of clusters in the latent space. while the main distinctions between these two states arise from the general gain in structural stability as the protomer self-assembles into the trimer state, we pinpointed structural transitions in specific flexible regions of the protomer that warrant consideration as potential therapeutic targets. these regions are promising as natural targets of immune recognition, but more importantly, they are involved in s protein oligomerization, suggesting they are susceptible to therapeutic action for protein destabilization. overall, our study provides a more complete molecular view of the sars-cov- s protein that may assist in accelerating both vaccine and drug design efforts. to generate initial systems for the structural study of human coronavirus s proteins, we built protomer structures from chain a of the cryo-em s protein structures of sars-cov- (pdb vsb [ ] ), sars-cov- (pdb crz [ ] ), mers-cov (pdb q [ ] ), and hcov-hku (pdb i [ ] ). the trimeric state of the sars-cov- s protein consists of all three chains in pdb vsb. each structure was solvated in the center of a water box with a minimum distance of Å from the edge of the box to the nearest protein atom, neutralized with counter ions and ionized with mm nacl. 
table summarizes the details of the five molecular systems studied. following a similar protocol to our previous studies [ ] , [ ] , the resulting systems were each subjected to , steps of energy minimization, followed by ns equilibration with harmonic restraints placed on the heavy atoms of the protein. the force constant was kcal/mol/Å [ ] . after equilibration, the restraints were released, and a ns trajectory was generated in a production run. for each system, independent ns trajectories were performed. structures were taken every ns for analysis, yielding a total of , structures for each system. all md simulations were performed with namd [ ] in npt ensemble at atm and k with a time step of fs. the charmm m force field [ ] and tip p water model [ ] were used. the nonbonding interactions were calculated with a typical cutoff distance of Å, while the long-range electrostatic interactions were enumerated with the particle mesh ewald algorithm [ ] . to further understand the molecular structures of different human coronavirus s proteins and the oligomeric state of sars-cov- s protein, we deployed a custom-built deep learning architecture, a convolutional variational autoencoder (cvae), to encode the high dimensional protein structures from the md simulations into lower dimensional latent spaces. the goal of our ai method is to reduce the high dimensionality of the molecular system while preserving the inherent characteristics of the system and learning novel behavior in a latent space that is normally distributed. the direct comparison between the decoded and original input data ensures the accuracy of the latent space representation. this customized cvae approach has been successfully applied to study the folding pathways of small proteins and structural clustering of biomolecules [ ] - [ ] . ) data preparation: we used translation and rotation invariant input data for the dl networks. 
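The goal stated above for the cvae, a latent space that is normally distributed together with a direct comparison of decoded and original input, corresponds to the usual variational autoencoder objective. The sketch below shows that objective with plain numpy. It is illustrative only: the mean squared error used for the reconstruction term is an assumption (the text does not name the reconstruction loss), and all function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) per sample, summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

def reconstruction_error(x, x_decoded):
    """Per-sample mean squared error (assumed choice of reconstruction term)."""
    axes = tuple(range(1, x.ndim))
    return np.mean((x - x_decoded) ** 2, axis=axes)

def vae_loss(x, x_decoded, mu, log_var):
    """Reconstruction term plus the KL regularizer on the latent space."""
    return reconstruction_error(x, x_decoded) + kl_to_standard_normal(mu, log_var)

# Toy batch (hypothetical): two samples with a 3-d latent space.
rng = np.random.default_rng(0)
mu = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
log_var = np.zeros_like(mu)
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)

x = np.ones((2, 4, 4))               # stand-in for two input matrices
loss = vae_loss(x, x.copy(), mu, log_var)  # perfect reconstruction
```

The KL term vanishes only when the encoded distribution already matches a standard normal, which is what pulls the latent space toward a normal distribution during training.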
we represented each md structure by a distance matrix using the cα atoms of the protein and generated two input datasets for the cvaes. input included the matrices of the protomer structures of sars-cov- , sars-cov- , mers-cov, and hcov-hku . input included the matrices of the protomer structure and the chain a of the trimer structure of sars-cov- . for input , as the proteins are different in length, we first performed a multiple sequence alignment of the protomer structures of sars-cov- , sars-cov- , mers-cov, and hcov-hku by clustal omega [ ] . based on the alignment, we inserted gaps into the corresponding aligned residue location in the distance matrix and set the distance between gaps to be Å. the size of each distance matrix after the alignment was , × , . to reduce the matrix size, we applied a convolution layer with padding of and a × filter with strides of size in both the x and y directions. the size of each resulting matrix became × . finally, after alignment and size reduction we merged a total of , distance matrices of the four s proteins. an example of a distance matrix following the alignment and size reduction is represented in fig. s . for input , as the proteins are of the same length, no alignment was required. the size of each matrix was × . we again applied a convolution layer with padding of and a × filter with strides of size in both the x and y directions to reduce the matrix size. the size of each resulting matrix was also × , and we merged a total of , distance matrices of the protomer and trimer of sars-cov- s protein. ) cvae implementation: for each of the two input datasets, we randomly split the aligned matrices into training and validation datasets using the / ratio and applied a cvae to capture the important structural features and projected them in the three-dimensional ( -d) latent space for visualization. the encoder network of each cvae consisted of three convolutional layers and a fully connected layer.
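The data preparation steps described above (a cα distance matrix, gap insertion according to a sequence alignment, and size reduction by a strided convolution) can be sketched as follows. This is a minimal sketch with stated assumptions: the gap fill value, kernel size, stride, and padding are not recoverable from the text, so they are exposed as parameters and the values used in the toy example are hypothetical, as are the function names and the aligned string.

```python
import numpy as np

GAP = "-"

def ca_distance_matrix(coords):
    """Pairwise distances between C-alpha coordinates, shape (n, n)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def pad_to_alignment(dmat, aligned_seq, gap_value):
    """Expand a distance matrix to alignment length.

    Rows and columns at gap positions of `aligned_seq` are filled with
    `gap_value` (hypothetical parameter; the value used in the paper is
    not recoverable from the text).
    """
    idx = np.flatnonzero(np.array([c != GAP for c in aligned_seq]))
    if idx.size != dmat.shape[0]:
        raise ValueError("alignment and matrix disagree on residue count")
    n = len(aligned_seq)
    out = np.full((n, n), gap_value, dtype=float)
    out[np.ix_(idx, idx)] = dmat
    return out

def conv_output_size(n, kernel, stride, padding):
    """Standard output length of a strided convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1

# Toy example (hypothetical): three residues aligned to four columns.
coords = np.array([[0.0, 0, 0], [1, 0, 0], [1, 2, 0]])
dmat = ca_distance_matrix(coords)

# Rotating and translating the structure leaves the matrix unchanged.
rot = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
moved = coords @ rot.T + np.array([5.0, -3.0, 2.0])
invariant = np.allclose(ca_distance_matrix(moved), dmat)

padded = pad_to_alignment(dmat, "ac-d", gap_value=0.0)
reduced = conv_output_size(padded.shape[0], kernel=2, stride=2, padding=0)
```

Because the matrix depends only on interatomic distances, no structural superposition is needed before the structures are passed to the network.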
we used a × convolution kernel and a stride of , , and at the three convolutional layers, respectively. we trained each cvae until the training and validation loss converged. fig. s shows the loss curves along the epochs. the difference between decoded and original images is minimal, suggesting the models were trained successfully (fig. s ). we then selected representative structures from the clusters in the latent space and visualized them using vmd [ ] . all md simulations and dl analysis were performed on the summit supercomputer at the oak ridge leadership computing facility.
the distances calculated from the cryo-em structure (pdb vsb) are marked by a pentagon.
a new coronavirus associated with human respiratory disease in china a pneumonia outbreak associated with a new coronavirus of probable bat origin cryo-em structure of the -ncov spike in the prefusion conformation structural basis for the recognition of sars-cov- by full-length human ace structure, function, and antigenicity of the sars-cov- spike glycoprotein structure of the sars-cov- spike receptor-binding domain bound to the ace receptor crystal structure of nsp endoribonuclease nendou from sars-cov- crystal structure of sars-cov- main protease provides a basis for design of improved -ketoamide inhibitors functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target structure, function, and evolution of coronavirus spike proteins cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains pre-fusion structure of a human coronavirus spike protein stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis structures of mers-cov spike
glycoprotein in complex with sialoside attachment receptors the energy landscape of a protein switch in silico design and validation of high-affinity rna aptamers targeting epithelial cellular adhesion molecule dimers b-cell responses in patients who have recovered from severe acute respiratory syndrome target a dominant site in the s domain of the surface spike glycoprotein graphene-extracted membrane lipids facilitate the activation of integrin αv stability of ligands on nanoparticles regulating the integrity of biological membranes at the nano-lipid interface scalable molecular dynamics with namd charmm m: an improved force field for folded and intrinsically disordered proteins comparison of simple potential functions for simulating liquid water particle mesh ewald: an n log (n) method for ewald sums in large systems deep clustering of protein folding simulations mechanism of glucocerebrosidase activation and dysfunction in gaucher disease unraveled by molecular dynamics and deep learning visual analytics for deep embeddings of large scale molecular dynamics simulations the embl-ebi search and sequence analysis tools apis in vmd: visual molecular dynamics
key: cord- - cb a r authors: mcnamara, ryan p.; caro-vegas, carolina; landis, justin t.; moorad, razia; pluta, linda j.; eason, anthony b.; thompson, cecilia; bailey, aubrey; villamor, femi cleola s.; lange, philip t.; wong, jason p.; seltzer, tischan; seltzer, jedediah; zhou, yijun; vahrson, wolfgang; juarez, angelica; meyo, james o.; calabre, tiphaine; broussard, grant; rivera-soto, ricardo; chappell, danielle l.; baric, ralph s.; damania, blossom; miller, melissa b.; dittmer, dirk p. title: high-density amplicon sequencing identifies community spread and ongoing evolution of sars-cov- in the southern united states date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cb a r sars-cov- is constantly evolving.
prior studies have focused on high case-density locations, such as the northern and western metropolitan areas in the u.s. this study demonstrates continued sars-cov- evolution in a suburban southern u.s. region by high-density amplicon sequencing of symptomatic cases. % of strains carried the spike d g variant. the presence of d g was associated with a higher genome copy number and its prevalence expanded with time. four strains carried a deletion in a predicted stem loop of the ’ untranslated region. the data are consistent with community spread within the local population and the larger continental u.s. no strain had mutations in the target sites used in common diagnostic assays. the data instill confidence in the sensitivity of current tests and validate “testing by sequencing” as a new option to uncover cases, particularly those not conforming to the standard clinical presentation of covid- . this study contributes to the understanding of covid- by providing an extensive set of genomes from a non-urban setting and further informs vaccine design by defining d g as a dominant and emergent sars-cov- isolate in the u.s. the current covid- pandemic is an urgent public health emergency with over , deaths in the united states (u.s.) alone. covid- is caused by infection with the severe acute respiratory syndrome coronavirus- (sars-cov- ). the typical symptoms for covid- may include the following: fever, cough, shortness of breath, fatigue, myalgias, headache, sore throat, abdominal to provide finer granularity about biological changes during sars-cov- transmission, we employed next generation sequencing (ngs) as an independent screening modality. this allowed us to reconstruct the mutational landscape of cases seen at a tertiary clinical care center in the southeastern u.s. from the start of the u.s. epidemic on march , , until past the peak of the first distribution of x coverage for all samples is presented in figure a . 
as expected, more mapped reads yielded higher coverage. of the negative controls, none had > total reads aligned. of the positive samples, greater than * total mapped reads were needed to obtain x coverage of the whole genome; a minimum of . x reads were needed to obtain > % coverage at x. the number of aligned reads varied with the viral load of the samples, as determined by real-time qpcr using cdc primer n , but not with total rna, as determined using rnase p (figure b). in this assay, any cp < for sars-cov- qpcr yielded reliable coverage, which increased linearly with viral load. at a cp ≥ , most positive samples still yielded reads that mapped to the target genome and thus allowed detection of sars-cov- sequences; however, the results were less consistent, and coverage was more variable. as expected, total rna (measured by rnase p) was not associated with sequencing coverage and varied considerably across samples, even though each sample used the same amount of virus transport medium (vtm). the coverage level distribution is shown in figure c. independently derived consensus genomes from the sars-cov- /human/usa-wa / isolates showed evidence of divergence between the original isolate, the seed stock, and the commercially distributed standard (figure b). similar culture-associated changes were recently reported for a second, culture-amplified reference isolate: hong kong/vm / . this is not surprising, given that any large-scale virus amplification in culture is accompanied by virus evolution, but it raises concerns about the utility of using a natural isolate, rather than a molecular clone (graham et al., ; thao et al., ), as a standard for sequencing. the phylogeny based on whole genome nucleotide sequences revealed several interesting facets. predictably, all unc isolates of sars-cov- were significantly different from sars-cov and rattg (figure b, purple color). rattg was used as an outgroup for clustering.
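The coverage statistics discussed earlier in this passage (how much of the genome is covered at or above a given depth for a given set of mapped reads) can be computed from read alignment intervals. The sketch below is illustrative only and is not the pipeline used in the study: it assumes alignments are available as 0-based, half-open (start, end) intervals on a single reference, and the genome length, intervals, and thresholds are hypothetical toy values.

```python
import numpy as np

def per_base_depth(genome_length, alignments):
    """Depth at every reference position.

    alignments : iterable of (start, end) pairs, 0-based and half-open,
                 each assumed to lie within [0, genome_length].
    Uses a difference array so the cost is linear in reads plus positions.
    """
    diff = np.zeros(genome_length + 1, dtype=np.int64)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    return np.cumsum(diff[:-1])

def breadth_at(depth, min_depth):
    """Fraction of positions covered by at least `min_depth` reads."""
    return float(np.mean(depth >= min_depth))

# Toy example (hypothetical): a 10-base reference and three reads.
depth = per_base_depth(10, [(0, 5), (3, 10), (3, 5)])
full_breadth = breadth_at(depth, 1)    # fraction covered at >= 1x
deep_breadth = breadth_at(depth, 3)    # fraction covered at >= 3x
mean_depth = float(depth.mean())
```

Plotting `breadth_at` against the number of mapped reads per sample would reproduce the kind of relationship described above, where more mapped reads give higher coverage.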
the first nc case (nc_ , (figure b , arrow labeled "wa")) was a person returning from washington (wa) and
one large deletion was identified in four independent samples: nucleotides were deleted beginning at position (indicated in figure c by a delta symbol). this region is within the previously recognized "coronavirus ' stem-loop ii-like motif (s m)". this was confirmed in multiple isolates, supported by multiple, independent junction-spanning reads (figure a, b). junctions were mapped at single-nucleotide resolution directly from individual reads. the variant ' end does not destroy overall folding but introduces a shorter stable hairpin (figure c, d). how this mutation affects viral fitness remains to be established. in sum, this study generated exhaustive snv information representing the introduction and spread of sars-cov- across a suburban low-density area in the southern u.s. all samples were from symptomatic cases and the majority of genomes clustered with variants that predominate in the outbreak in the u.s., rather than europe or china. this supports the notion that the majority of u.s.
there seems to be partial overlap between the bulged stem-loop and the pseudoknot, suggesting that these two structures are mutually exclusive and may serve as a switch to regulate the ratio of full length rna and defective rna (goebel et al., ). these two structures are also present in sars-cov. these isolates represent full-length genomes from symptomatic patients rather than disjointed rna fragments recovered after clinical disease had subsided, thus we speculate that these deletion
about half of the specimens not clinically tested for sars-cov- tested positive by sequencing. this was not surprising, as to this day testing capabilities are limited, and probable cases are triaged based on clinical and public health indications. these unknown cases were not asymptomatic but represent patients with a clinically indicated need for upper respiratory sampling.
finding additional sars-cov- cases in this population suggests that case counts based on nat represent a lower estimate of sars-cov- prevalence. it may also suggest that the current triage criteria for sars- cov- testing are too limited to understand spread of this virus. in sum, this study underscores the sensitivity and accuracy of current nat assays and demonstrates the utility of testing by sequencing. it contributes to the worldwide effort to understand and combat the covid- pandemic by providing the first set of full-length sars-cov- genomes from a non-urban setting. coronavirus susceptibility to the antiviral remdesivir (gs- ) is mediated by the viral polymerase and the proofreading exoribonuclease the proximal origin of sars-cov- presymptomatic sars-cov- infections and transmission in a skilled nursing facility sars-cov- viral spike g mutation exhibits higher case fatality rate covid- in critically ill patients in the seattle region -case series genomic variance of the -ncov coronavirus epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in china: a descriptive study molecular evolution of the sars coronavirus during the course of the sars epidemic in china detection of novel coronavirus ( -ncov) by real- time rt-pcr the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars- could the d g substitution in the sars-cov- spike (s) protein be associated with higher covid- mortality? 
coast-to-coast spread of sars-cov- during the early epidemic in the united states phylogenetic network analysis of sars- cov- genomes genomic epidemiology of hcov- characterization of the rna components of a putative molecular switch in the ' untranslated region of the murine coronavirus genome a live, impaired-fidelity coronavirus vaccine protects in an aged, immunocompromised mouse model of lethal disease evaluation of a recombination-resistant coronavirus as a broadly applicable clinical characteristics of coronavirus disease in china nextstrain: real-time tracking of pathogen evolution temporal dynamics in viral shedding and transmissibility of covid- sars-cov- transmission from presymptomatic meeting attendee faster quantitative real-time pcr protocols may lose sensitivity and show increased variability sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor mafft multiple sequence alignment software version : improvements in performance and usability infection and rapid transmission of sars-cov- in ferrets functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia efficiency clustering for low-density microarrays and its application to qpcr antibody responses to sars-cov- in patients with covid- genomic epidemiology of sars-cov- in guangdong province genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding us cdc real-time reverse transcription pcr panel for detection of severe acute respiratory syndrome coronavirus the neighbor-joining method: a new method for reconstructing phylogenetic trees burden of respiratory viral infection in persons with human immunodeficiency virus. 
influenza other respir viruses structural basis of receptor recognition by sars-cov- gisaid: global initiative on sharing all influenza data -from vision to reality prospects for inferring very large phylogenies by using the neighbor-joining method coronavirus disease in children -united states rapid reconstruction of sars-cov- using a synthetic genomics platform aerosol and surface stability of sars-cov- as compared with sars-cov- emergence of genomic diversity and recurrent mutations in sars- an outbreak of severe kawasaki-like disease at the italian epicentre of the sars- cov- epidemic: an observational cohort study antigenicity of the sars-cov- spike glycoprotein receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus. a phylogenetically conserved hairpin-type ' untranslated region pseudoknot functions in coronavirus rna replication virological assessment of hospitalized patients with covid- cryo-em structure of the -ncov spike in the prefusion conformation a new coronavirus associated with human respiratory disease in china factors associated with prolonged viral rna shedding in patients with covid- characteristics of pediatric sars-cov- infection and potential evidence for persistent fecal viral shedding quantitative detection and viral load analysis of sars-cov- in infected patients viral and host factors related to the clinical outcome of covid- clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study a pneumonia outbreak associated with a new coronavirus of probable bat origin a novel coronavirus from patients with pneumonia in china sars-cov- viral load in upper respiratory specimens of infected patients genetic interactions between an essential ' cis-acting rna pseudoknot, replicase gene 
products, and the extreme ' end of the mouse coronavirus genome this work was funded by public health service grants ca , ca , and ca key: cord- -pn j authors: sahakijpijarn, sawittree; moon, chaeho; koleng, john j.; christensen, dale j.; williams, robert o. title: development of remdesivir as a dry powder for inhalation by thin film freezing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pn j remdesivir exhibits in vitro activity against sars-cov- and was granted approval for emergency use. to maximize delivery to the lungs, we formulated remdesivir as a dry powder for inhalation using thin film freezing (tff). tff produces brittle matrix nanostructured aggregates that are sheared into respirable low-density microparticles upon aerosolization from a passive dry powder inhaler. in vitro aerodynamic testing demonstrated that drug loading and excipient type affected the aerosol performance of remdesivir. remdesivir combined with optimal excipients exhibited desirable aerosol performance (up to . % fpf; . μm mmad). remdesivir was amorphous after the tff process, which benefitted drug dissolution in simulated lung fluid. tff remdesivir formulations are stable after one-month storage at °c/ %rh. in vivo pharmacokinetic evaluation showed that tff-remdesivir-leucine was poorly absorbed into systemic circulation while tff-remdesivir-captisol® demonstrated increased systemic uptake compared to leucine. remdesivir was hydrolyzed to the nucleoside analog gs- in lung, and levels of gs- were greater in lung with the leucine formulation compared to captisol®. in conclusion, tff technology produces high potency remdesivir dry powder formulations for inhalation suitable to treat patients with covid- on an outpatient basis and earlier in the disease course where effective antiviral therapy can reduce related morbidity and mortality. the coronavirus disease (covid- ) is an ongoing worldwide pandemic. 
as of september , laboratory-confirmed cases have been reported in countries and territories with more than million reported cases and close to million reported deaths [ ] . although this disease
of antiviral activity in the lungs, and limit the potential for systemic side effects [ ] . in addition, the cost of the drug can be reduced and the supply of the drug can be maximized, allowing more patients to be treated, because inhalation requires a lower dose than injectable forms. treatment cost can also decrease when the drug is administered by inhalation, since patients may not need to visit a hospital, as is required for the iv injectable dose. therefore, more affordable and early stage treatment can be provided to patients with inhaled remdesivir. nebulization of the current iv formulation in a diluted form is a potential method of pulmonary administration; however, the drug is prone to degrade by hydrolysis in aqueous solution to form the nucleoside monophosphate, which has difficulty penetrating cell membranes, thereby minimizing the antiviral activity in the lung cells [ ] . another concern is the use of sbecd as an excipient in
although several techniques have been used to prepare inhalable powders, including mechanical milling and spray drying, the advantages of thin film freezing (tff) over other techniques rely on the ability to produce aerosolizable particles composed of brittle matrix, nanostructured aggregates. these are high surface area powders that are ideally suited for dry powder inhalation. tff employs ultra-rapid freezing (on the order of - , k/sec) such that precipitation (either as a crystalline nanoaggregate or amorphous solid dispersion) and particle growth of the dissolved solute can be prevented [ ] .
subsequently, nanostructured aggregates are formed as a low-density brittle matrix powder [ ] , which is efficiently sheared into respirable low-density microparticles by a
the ef was calculated as the total amount of remdesivir emitted from the device as a percentage of the
pan with a t-zero hermetic lid were crimped, and a hole was drilled in the lid before placing the
the dispersion was adjusted to ml by adding m hbss). the dispersion was frozen at - °c for
intratracheal administration was carried out using the dry powder insufflator (dp- m model, penn-century inc., philadelphia, pa) connected to the air pump (ap- model, penn-century inc., philadelphia, pa
the aerodynamic particle size distribution of tff remdesivir formulations was evaluated using a plastiape® rs high resistance dpi and ngi apparatus. figure and
h-nmr was performed to identify interactions between remdesivir and excipients. figure b shows an expansion of the h-nmr spectra for selected tff remdesivir powder formulations and unprocessed remdesivir powder. while the peak at . ppm is sharp and does not differ among the presented samples, the peaks at . and . ppm of f were broader and shifted slightly downfield. figure c shows
the rs high resistance monodose dpi is a capsule-based dpi device that is available for commercial product development, and it functions to disperse the powder based on impaction force. a previous study confirmed that this impact-based dpi can disperse low-density brittle matrix powders made by the tff process into respirable particles better than a shear-based dpi (e.g., handihaler®) [ ] . another study also evaluated the effect of different models of the monodose dpi (rs and rs ) on the aerosol performance of brittle matrix powders containing voriconazole nanoaggregates prepared by tff [ ] .
it was shown that the rs device exhibited better powder shearing and deaggregation through smaller holes of the capsule created by the piercing system of the rs device [ ]. therefore, the rs high resistance plastiape® dpi was selected for this study. we found that excipient type and drug loading affect the aerosol performance of tff remdesivir powder compositions. overall, the aerosol performance of tff remdesivir powders increased as the drug loading was increased, a highly desirable feature. this trend is obvious for the captisol®-, lactose-, and mannitol-based formulations when the drug loading was increased from % to %. furthermore, high potency tff remdesivir powder without excipients (f and f ) also exhibited high fpf and small mmad, which indicates remdesivir itself has a good dispersing ability without the need of a dispersing excipient when prepared using the tff process. this shows that the tff technology can be used to minimize the need of excipient(s) in the formulation, thus maximizing the amount of remdesivir being delivered to the lungs by dry powder inhalation. the aerosol performance of the leucine-based formulations did not significantly change when the drug loading was increased from % to %, and these formulations exhibited superior aerosol performance compared to the other excipient-based formulations studied in this paper. this is likely one potential concern related to the amorphous drug is physical instability due to its high energy state. according to criteria described by wyttenbach in addition to the physical stability, remdesivir, as a prodrug, is prone to degrade by hydrolysis in aqueous solution. since an organic/water co-solvent system is required to dissolve the drug and excipients in the tff process, chemical stability is another concern during preparation. nmr spectra demonstrated that remdesivir did not chemically degrade as a result of the tff process. 
even though remdesivir was exposed to binary co-solvent systems consisting of water during the process, the entire tff process used to produce remdesivir dry powder inhalation formulations did not induce chemical degradation of the parent prodrug. furthermore, remdesivir was chemically stable by hplc (data not shown) and nmr after one-month storage at c/ %rh. since remdesivir is a poorly water-soluble drug, its dissolution may be a critical factor of drug release in the lung fluid, especially in high drug load formulations. undissolved particles can be cleared by mucociliary clearance or macrophage uptake, causing lower drug concentration, lung irritation, and inflammatory response [ ] . therefore, a dissolution test was evaluated in this study. the dissolution profile demonstrated that physical form of drug appears to have a significant effect generally, even though leucine is a hydrophobic amino acid, it is a water-soluble excipient that has higher solubility than poorly water-soluble drugs like remdesivir. therefore, the interactions between drug and leucine at the molecular level can increase the dissolution rate of poorly water-soluble drug [ , ] , like that observed in our study . significantly higher plasma levels of gs- than remdesivir were observed in our rat pharmacokinetic study. the half-life of remdesivir is reportedly much shorter than that of gs- . while half-life of gs- is approximately . hours, for remdesivir it is only about hour in humans following multiple once-daily iv administrations [ ] . in rats, the half-life of remdesivir in plasma is reported to be less than . 
minute, which is much shorter than that in humans, due to we have demonstrated that low density, highly porous brittle matrix particles of remdesivir covid- dashboard by the center for systems science and engineering (csse) at johns a review of coronavirus disease- (covid- ) summary on compassionate use: remdesivir gilead arguments in favour of remdesivir for treating sars-cov- infections current knowledge about the antivirals remdesivir (gs- ) and gs- as therapeutic options for coronaviruses therapeutic efficacy of the small molecule gs- against ebola virus in rhesus monkeys the antiviral compound remdesivir potently inhibits rna-dependent rna polymerase from middle east respiratory syndrome coronavirus clinical benefit of remdesivir in rhesus macaques infected with sars-cov- gs- and its parent nucleoside analog inhibit filo-, pneumo- , and paramyxoviruses coronavirus susceptibility to the antiviral remdesivir mediated by the viral polymerase and the proofreading exoribonuclease compassionate use of remdesivir for patients with severe remdesivir in covid- mild/moderate -ncov remdesivir rct -full text view -clinicaltrials severe -ncov remdesivir rct -full text view -clinicaltrials sulfobutylether-beta-cyclodextrin physicochemical characterization, molecular docking, and in vitro dissolution of glimepiride- the use of captisol (sbe -beta-cd) in oral solubility-enabling formulations: comparison to hpbetacd and the solubility-permeability interplay pulmonary drug delivery. part i: physiological factors affecting therapeutic effectiveness of aerosolized medications remdesivir for treatment of covid- : combination of pulmonary and iv administration may offer aditional benefit inactive ingredient search for approved drug products. u.s. 
food and drug administration characterization and pharmacokinetic analysis of aerosolized aqueous voriconazole solution dose tolerability of chronically inhaled voriconazole solution in rodents dry powder insufflation of crystalline and amorphous voriconazole formulations produced by thin film freezing to mice a guide to aerosol delivery devices for respiratory therapists use of thin film freezing to enable drug delivery: a review effect of processing parameters on the physicochemical and aerodynamic properties of respirable brittle matrix powders respirable low-density microparticles formed in situ from aerosolized brittle matrices in vitro and in vivo performance of dry powder inhalation formulations: comparison of particles prepared by thin film freezing and micronization small airway absorption and microdosimetry of inhaled corticosteroid particles after deposition novel ultra-rapid freezing particle engineering process for enhancement of dissolution rates of poorly water-soluble drugs physicochemical characterization of d-mannitol polymorphs: the challenging surface energy determination by inverse gas chromatography in the infinite dilution region agglomerated novel spray- dried lactose-leucine tailored as a carrier to enhance the aerosolization performance of salbutamol sulfate from dpi formulations formulation strategy and use of excipients in pulmonary drug delivery. 
int j pharm (d) of the securities exchange act of for the fiscal year ended december , form -k savara inc: united states securities and exchange commission a study of aerovanc for the treatment of mrsa infection in cf patients using thin film freezing to minimize excipients in inhalable tacrolimus dry powder formulations processing design space is critical for voriconazole nanoaggregates for dry powder inhalation produced by thin film freezing direct evidence on reduced adhesion of salbutamol sulphate particles due to l-leucine coating correlations between surface composition and aerosolization of jet-milled dry powder inhaler formulations with pharmaceutical lubricants viscosity measurements of methanol-water and acetonitrile-water mixtures at pressures up to bar using a novel capillary time-of-flight viscometer density, dynamic viscosity, and derived properties of binary mixtures of , dioxane with water at t= . k effect of process variables on morphology and aerodynamic properties of voriconazole formulations produced by thin film freezing glass-forming ability of compounds in marketed amorphous drug products glass forming ability on the physical stability of supersaturated amorphous solid dispersions myth or truth: the glass forming ability class iii drugs will always form single-phase homogenous amorphous solid dispersion formulations interpretation and prediction of inhaled drug particle accumulation in the lung and its associated toxicity nebulization of nanoparticulate amorphous or crystalline tacrolimus--single-dose pharmacokinetics study in mice in vitro and in vivo evaluation of porous lactose/mannitol carriers for solubility enhancement of poorly water-soluble drugs the role of functional excipients in solid oral dosage forms to overcome poor drug dissolution and bioavailability nmr methods to characterize protein-ligand interactions hydrogen bonding and chemical reactivity nmr techniques to study hydrogen bonding in aqueous solution. 
monatshefte fur chemie chemical properties, aerosolization and dissolution of co-spray dried azithromycin particles with l- amino acids as co-amorphous stabilizers for poorly water soluble drugs improved aqueous solubility of crystalline astaxanthin ( , '-dihydroxy-beta, beta-carotene- , '-dione) by captisol (sulfobutyl ether beta- cyclodextrin) what do we know about remdesivir drug interactions? pulmonary dissolution of poorly soluble compounds studied in an ex vivo rat lung model remdesivir inhibits sars-cov- in human lung cells and chimeric sars-cov expressing the sars-cov- rna polymerase in mice advantages of the parent nucleoside gs- over remdesivir for covid- treatment key: cord- - mnclfeg authors: suzuki, yuichiro j.; nikolaienko, sofia i.; dibrova, vyacheslav a.; dibrova, yulia v.; vasylyk, volodymyr m.; novikov, mykhailo y.; shults, nataliia v.; gychka, sergiy g. title: sars-cov- spike protein-mediated cell signaling in lung vascular cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mnclfeg currently, the world is suffering from the pandemic of coronavirus disease (covid- ), caused by severe acute respiratory syndrome coronavirus (sars-cov- ) that uses angiotensin-converting enzyme (ace ) as a receptor to enter the host cells. so far, million people have been infected with sars-cov- , and nearly million people have died because of covid- worldwide, causing serious health, economical, and sociological problems. however, the mechanism of the effect of sars-cov- on human host cells has not been defined. the present study reports that the sars-cov- spike protein alone without the rest of the viral components is sufficient to elicit cell signaling in lung vascular cells. the treatment of human pulmonary artery smooth muscle cells or human pulmonary artery endothelial cells with recombinant sars-cov- spike protein s subunit (val – gln ) at ng/ml ( . nm) caused an activation of mek phosphorylation. 
the activation kinetics was transient with a peak at min. the recombinant protein that contains only the ace receptor-binding domain of sars-cov- spike protein s subunit (arg – phe ), on the other hand, did not cause this activation. consistent with the activation of cell growth signaling in lung vascular cells by sars-cov- spike protein, pulmonary vascular walls were found to be thickened in covid- patients. thus, sars-cov- spike protein-mediated cell growth signaling may participate in adverse cardiovascular/pulmonary outcomes, and this mechanism may provide new therapeutic targets to combat covid- . coronaviruses are positive sense single stranded rna viruses that often cause the common cold [ , ] . some coronaviruses can, however, be lethal. currently, the world is suffering from the pandemic of coronavirus disease (covid- ) caused by severe acute respiratory syndrome coronavirus (sars-cov- ) [ , ] . so far, million people have been infected with sars-cov- worldwide, causing serious health, economical, and sociological problems. sars-cov- uses angiotensin converting enzyme (ace ) as a receptor to enter the host cells [ , ] . lung cells are the primary targets of sars-cov- , resulting in severe pneumonia and acute respiratory distress syndrome (ards) [ , ] . so far, nearly million people have died because of covid- . [ , , ] . thus, managing the pulmonary and cardiovascular aspects of covid- is considered to be the key for reducing the severity of covid- and the associated mortality. however, it is unclear exactly how sars-cov- affects humans. thus, understanding the mechanism of sars-cov- actions should help develop therapeutic strategies to reduce the mortality and morbidity associated with covid- . the sars-cov- spike protein is critical to initiate the interactions between the virus and the host cell surface receptor and facilitates the viral entry into the host cell by assisting in the fusion of the viral and host cell membranes [ , ] . 
this protein consists of two subunits: subunit (s ) that contains the ace receptor-binding domain (rbd) and subunit (s ) that is responsible for fusion. the current dogma of the sars-cov- mechanism is that the binding of viral spike protein to the host ace receptor results in the entry of virus into the host cells and the cellular response is a result of the viral infection [ , , , ] . however, we herein show that sars-cov- spike protein alone without the rest of the viral components is sufficient to elicit cell signaling in human host cells, suggesting a novel biological mechanism of sars-cov- actions. human pulmonary artery smooth muscle cells and human pulmonary artery endothelial cells were purchased from sciencell research laboratories (carlsbad, ca, usa), and rat pulmonary artery smooth muscle cells were purchased from cell applications (san diego, ca, usa). cells were cultured in accordance with the manufacturers' instructions in % co at °c. cells at passages - were maintained in low fetal bovine serum ( . %)-containing medium overnight before the treatment as routinely performed in experiments on cell signaling and protein phosphorylation [ ] . cells were treated with the recombinant sars-cov- spike protein full length s subunit that contains most of the s subunit (val -gln ) with a molecular weight of kda (raybiotech, peachtree corners, ga, usa) or the recombinant sars-cov- spike protein rbd (raybiotech) that only contains the rbd region (arg -phe ) with a molecular weight of kda. some cells were pretreated for h with the rabbit anti-ace antibody (catalog # ; cell signaling technology, danvers, ma). to prepare cell lysates, cells were washed in phosphate buffered saline and solubilized with lysis buffer containing mm hepes (ph . ), % (v/v) triton x- , mm edta, mm sodium fluoride, . mm sodium orthovanadate, mm tetrasodium pyrophosphate, mm pmsf, µg/ml leupeptin, and µg/ml aprotinin. 
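the treatment concentrations in this study are quoted both as a mass concentration (ng/ml) and as a molar concentration (nm), which depends on the molecular weight of the recombinant protein. a minimal sketch of that conversion follows; the function name and the example numbers are hypothetical and are not values taken from the paper.

```python
def ng_per_ml_to_nanomolar(conc_ng_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration (ng/mL) to a molar concentration (nM).

    ng/mL is the same as ug/L, and 1 kDa corresponds to 1000 g/mol,
    so the conversion reduces to: nM = (ng/mL) / (molecular weight in kDa).
    """
    if mw_kda <= 0:
        raise ValueError("molecular weight must be positive")
    if conc_ng_per_ml < 0:
        raise ValueError("concentration cannot be negative")
    return conc_ng_per_ml / mw_kda


# Hypothetical example: 100 ng/mL of a 50 kDa protein.
example_nm = ng_per_ml_to_nanomolar(100.0, 50.0)
print(f"{example_nm:.2f} nM")
```

because the two recombinant proteins differ in molecular weight, equal ng/ml doses correspond to different molar doses, which is worth keeping in mind when comparing the full-length subunit with the rbd-only protein.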
samples were then centrifuged at , g for min at ˚c, the supernatants were collected, and protein concentrations were determined [ ] . for western blotting data, means and standard errors of the mean (sem) were computed. two groups were compared by a two-tailed student's t test, and differences among more than two groups were determined by analysis of variance (anova). p < . was considered statistically significant. for the morphometric analysis, ibm spss statistics software version . was used for the statistical calculations. the mann-whitney u test was used to define statistical significance at p < . . to test the hypothesis that sars-cov- spike protein alone can elicit cell signaling, human pulmonary artery smooth muscle cells were treated with the full-length s subunit (val -gln ) of the sars-cov- spike protein for , , and min. as shown in fig. a , the spike protein at a concentration of ng/ml ( . nm) strongly activated the phosphorylation of kda mek at ser and ser residues. the kinetics of mek phosphorylation promoted by the full-length s subunit (val -gln ) of the sars-cov- spike protein were consistently found to be transient with a peak at min. this fast activation suggests that this may be a receptor-mediated cell-signaling event. similarly, the full-length s subunit of sars-cov- spike protein promoted the phosphorylation of mek in human pulmonary artery endothelial cells (fig. b ). sars-cov- spike protein, however, did not activate other signaling pathways such as akt (fig. a) and stat (fig. b). when we performed experiments to determine the effects of the full-length s subunit of sars-cov- spike protein on rat pulmonary artery smooth muscle cells, we were surprised to find that sars-cov- spike protein not only failed to promote mek phosphorylation but actually decreased it. as shown in fig. 
, the treatment of full length sars-cov- spike protein s resulted in the dephosphorylation of mek as early as min after the treatment and this dephosphorylation event was maintained for at least min. thus, sars-cov- spike protein promotes the mek phosphorylation in human cells, but not in rat cells. to confirm that the action of sars-cov- spike protein is through its well-known receptor ace , , human pulmonary artery smooth muscle cells were pretreated with the ace antibody for hour before treating with the full-length s subunit of sars-cov- spike protein. the ace antibody alone caused the activation of mek, and sars-cov- spike protein did not further increase this mek phosphorylation signal (fig. ) . to test whether the rbd binding to ace is sufficient to stimulate cell signaling for mek phosphorylation, human cells were treated with the recombinant s rbd of sars-cov- spike protein that only contains the rbd (arg -phe ). in contrast to the full-length s subunit (val -gln ) that strongly phosphorylated mek, s rbd (arg -phe ) did not activate the mek phosphorylation in human pulmonary artery smooth muscle cells (fig. a) or in human pulmonary artery endothelial cells (fig. b) . thus, other regions of the spike protein in addition to rbd may be required for eliciting cell signaling for mek phosphorylation. our results showing that sars-cov- spike protein is capable of activating the mek/erk pathway in pulmonary artery smooth muscle and endothelial cells suggest that cell growth signaling may be triggered in the pulmonary vascular walls in response to sars-cov- . to test this, we examined the lung histology results of patients who died of covid- . (fig. b, right) . the major finding of this study is that the sars-cov- spike protein without the rest of the virus can elicit cell signaling, specifically the activation of the mek/erk pathway, in human host lung vascular smooth muscle and endothelial cells. 
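the western blotting comparisons above rest on the summary statistics named in the methods: means, standard errors of the mean, and a two-tailed student's t test between two groups. a minimal sketch of those calculations using only the python standard library follows. the function names and the densitometry values are hypothetical, and the sketch stops at the t statistic and degrees of freedom; turning these into a p value needs a t-distribution cdf, which is not in the standard library.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic (equal-variance form) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0:
        raise ValueError("both groups have zero variance")
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


# Hypothetical band-intensity ratios (phospho-MEK / total MEK).
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
print(mean_and_sem(control))
print(pooled_t_statistic(control, treated))
```

for more than two groups the methods call for anova, and for the morphometric data a mann-whitney u test; both are outside the scope of this sketch.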
mek is a mitogen-activated protein kinase kinase (mapkk) that phosphorylates and activates extracellular-regulated kinase (erk), one type of mitogen-activated protein kinases (mapk). in this mek/erk pathway of cell signaling, mek is activated by the phosphorylation by raf kinase [ ] . the mek/erk pathway is a well-known cell growth mechanism [ ] and has also been shown to facilitate the viral replication cycles [ ] . by contrast to the full-length s subunit of sars-cov- spike protein (val -gln ) that strongly activated mek, rbd only containing protein (arg -phe ) did not activate mek in human pulmonary artery smooth muscle or in endothelial cells. these results suggest that the protein regions other than the rbd (i.e. val -arg and/or phe -gln ) are required for eliciting cell signaling for the mek phosphorylation. thus, we propose that sars-cov- spike protein-mediated cell signaling is not merely cells responding to anything binding to the membrane surface protein, but is a growth factor/hormone-like specific cell signal transduction event that is well coordinated by the rbd as well as other protein regions that facilitate cell signaling. our ace antibody experiments as described in fig. support the involvement of ace . however, the binding of the ace antibody to ace seems sufficient to cause the activation of mek. while we interpreted that the ace antibody blocked and interfered with the binding of the rbd to ace , it is possible that ace antibody-mediated cell signaling may have desensitized the subsequent spike protein-mediated effects to activate mek. thus, sars-cov- spike protein-mediated cell signaling may occur through a receptor other than ace . in contrast to the effects of sars-cov- spike protein on human cells to activate the mek phosphorylation, this protein promoted the dephosphorylation of mek in rat cells. this species difference in the sars-cov- spike protein actions may be important to understand how this virus severely affects humans. 
we previously reported a novel ligand-mediated dephosphorylation mechanism of mek induced by neurotensin and neuromedin n [ ] . the sars-cov- spike protein signaling in rat pulmonary artery smooth muscle cells represents another example of the mek dephosphorylation mechanism. these results also suggest that the species-specific actions of sars-cov- may depend on how spike protein-mediated cell signaling mechanisms differ among species. it has been noted that elderly patients with systemic hypertension and other cardiovascular diseases are particularly susceptible to developing severe and possibly fatal conditions of covid- [ , , ] . thus, the pathology of covid- does not seem to be patients. these results also indicate the possibility that patients who recovered from covid- may be predisposed to developing pulmonary arterial hypertension and right-sided heart failure. we also noticed that published lung histology images of patients who died of ards during the - sars outbreak due to the infection with sars-cov- did not exhibit the signs of thickened pulmonary vascular walls [ , ] . thus, pulmonary vascular wall thickening is a unique feature of the sars-cov- infection and covid- . our results in human pulmonary artery smooth muscle and endothelial cells revealed that the sars-cov- spike protein s subunit is sufficient to trigger biological responses in the human host cells in the absence of the participation of the rest of the sars-cov- viral particle. we also found that the pulmonary vascular walls are thickened in covid- patients. we propose that sars-cov- spike protein-mediated cell growth signaling participates in adverse cardiovascular/pulmonary outcomes seen in covid- . this mechanism may provide new therapeutic targets to combat the sars-cov- infection and covid- . none. 
epidemiology, genetic recombination, and pathogenesis of coronaviruses the molecular biology of sars coronavirus a new coronavirus associated with human respiratory disease in china clinical features of patients infected with novel coronavirus in wuhan structural basis for the recognition of sars-cov- by full-length human ace characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine pathological findings of covid- associated with acute respiratory distress syndrome risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease prevalence and impact of cardiovascular metabolic diseases on covid- in china prevalence of comorbidities and its effects in patients infected with sars-cov- : a systematic review and meta-analysis structure, function, and antigenicity of the sars-cov- spike glycoprotein cell entry mechanisms of sars-cov- protein carbonylation as a novel mechanism in redox signaling the erk cascade. distinct functions within various subcellular organelles mapk signal pathways in the regulation of cell proliferation in mammalian cells viral exploitation of the mek/erk pathway -a tale of vaccinia virus and other viruses ligand-mediated dephosphorylation signaling for map kinase the clinical pathology of severe acute respiratory syndrome (sars): a report from china pulmonary pathology of severe acute respiratory syndrome in toronto key: cord- -jr n uo authors: mcauley, julie l.; deerain, joshua m.; hammersla, william; aktepe, turgut e.; purcell, damian j.f.; mackenzie, jason m. title: liquid chalk is an antiseptic against sars-cov- and influenza a respiratory viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jr n uo the covid- pandemic has impacted and enforced significant restrictions within our societies, including the attendance of the public and professional athletes in gyms. 
liquid chalk is a commonly used accessory in gyms and is composed of magnesium carbonate and an alcohol that evaporates quickly on the hands, leaving a layer of dry chalk. we investigated whether liquid chalk is an antiseptic against highly pathogenic human viruses, including sars-cov- , influenza virus and noroviruses. to mimic use in the gym, chalk was applied before or after the virus inoculum, and recovery of infectious virus was determined. we observed that addition of chalk before or after virus contact led to a significant reduction in recovery of infectious sars-cov- and influenza but had little impact on norovirus. these observations suggest that the use and application of liquid chalk can be an effective and suitable antiseptic for major sporting events, such as the olympic games. was aseptically smeared onto - discrete areas covering approximately cm round surface (approx. ul) on a sterile tissue culture dish and allowed to dry. ul virus inoculum was then applied. where the inoculum did not absorb into the dry chalk, a slurry of chalk:virus mixture was created by mixing with a sterile tip. min later, ul infection media (identical to culture media but without sera) was added, mixed with the chalk:virus sample and collected. excess chalk was pelleted at g for min, then a % tissue culture infectious dose (tcid ) assay was performed on the supernatant as previously described [ , ] . for the virus first assay, ul virus inoculum was added to - discrete areas on a sterile tissue culture dish, then chalk was added and spread to cover an approximately cm round surface on the dish. after min incubation at room temperature, ul infection media whereas chalk # also had a significant impact, but some residual virus could be recovered. to further our study, we also tested the antiviral effect of liquid chalk against another highly infectious and pathogenic respiratory viral pathogen, iav.
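the methods above quantify recovered virus with a tcid assay performed "as previously described", without naming the endpoint calculation. as an illustration only, the sketch below uses the spearman-karber estimator, which is one common choice and is an assumption here, not the authors' stated method. the function names, plate layout and inoculum volume are hypothetical.

```python
import math


def spearman_karber_log10_endpoint(log10_first_dilution, log10_step,
                                   positive_wells, wells_per_dilution):
    """Spearman-Karber estimate of the log10 50% endpoint dilution.

    log10_first_dilution: log10 of the reciprocal of the first dilution
        (1.0 for a 1:10 first dilution).
    log10_step: log10 of the dilution factor (1.0 for ten-fold steps).
    positive_wells: infected-well counts, from most to least concentrated.
    """
    if wells_per_dilution <= 0:
        raise ValueError("wells_per_dilution must be positive")
    proportions = [count / wells_per_dilution for count in positive_wells]
    # The estimator assumes the series spans 100% to 0% infected wells.
    if not proportions or proportions[0] != 1.0 or proportions[-1] != 0.0:
        raise ValueError("series must start fully positive and end fully negative")
    return log10_first_dilution - log10_step / 2.0 + log10_step * sum(proportions)


def log10_tcid50_per_ml(log10_endpoint, inoculum_volume_ml):
    """Scale the per-well endpoint to a titre per mL of sample."""
    if inoculum_volume_ml <= 0:
        raise ValueError("inoculum volume must be positive")
    return log10_endpoint + math.log10(1.0 / inoculum_volume_ml)


# Hypothetical plate: ten-fold dilutions from 1:10, four wells per dilution.
endpoint = spearman_karber_log10_endpoint(1.0, 1.0, [4, 4, 4, 2, 0], 4)
print(endpoint, log10_tcid50_per_ml(endpoint, 0.1))
```

in this hypothetical plate half of the wells are infected at the fourth dilution, so the estimated endpoint falls at that dilution, which is a quick sanity check on the formula.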
the experiments were performed exactly as described above and the virus tcid /ml was titrated on mdck cells. as can be observed in figure , all four liquid chalk products were effective in restricting the recovery of iav compared to sars-cov- . however, for iav the effect was greater when transmission of sars-cov- (the causative agent of the covid- pandemic), influenza a virus (h n ) (iav) and norovirus, using the surrogate model of mouse norovirus vero cells (american type culture collection [atcc]) were maintained in minimal essential media (mem) supplemented with % heat-inactivated foetal bovine serum (fbs), μm hepes, mm glutamine and antibiotics madin-darby canine kidney (mdck) cells were grown in roswell park memorial institute (rpmi) media supplemented with % fbs, mm glutamine and antibiotics. raw . cells were maintained in dulbecco's modified eagle's medium (dmem) with % fbs and % glutamax. cell cultures were maintained at o c in a % co incubator. sars-cov- isolate hcov- /australia/vic cytotoxicity assay ® non-radioactive cytotoxicity assay ul of various liquid chalk samples were aseptically air-dried, resuspended in ul of cell culture media and centrifuged at xg for mins to remove excess chalk particles (this sample is depicted as "neat"). neat supernatant was -fold serially diluted in respective tissue culture media, added to the -well plate containing cells and incubated at ᴼc for hours. following the incubation period, ul of x lysis solution was added for mins to the control wells to applied first and dried, the recovery of infectious influenza a virus was significantly reduced compared to the chalk control, but markedly more infectious virus remained in the samples treated with chalk and chalk compared to the virus first samples that underwent the same treatment (data not significant) as a comparator, we also investigated the ability of liquid chalk to inactivate another highly infectious viral pathogen, norovirus. 
as human norovirus is difficult to cultivate under laboratory conditions, we used the widely adopted surrogate murine norovirus (mnv) for our studies. again, the experiments were identical to those described above, except that the viral tcid /ml assay was performed on raw . (murine macrophage) cells. we observed that chalks # - had very little impact on virus recovery, with a . log reduction the best that we observed. however, in contrast to sars-cov- , we observed an approximate log reduction upon application of chalk # . this is interesting because the major difference between mnv and both sars-cov- and iav is that mnv is a non-enveloped virus, whereas the other two contain a host-derived lipid membrane as their outermost layer. thus, norovirus remains infectious when exposed to liquid chalk. norovirus was not rendered non-infectious when treated with gym chalk, regardless of whether the virus was added to dry chalk or chalk was added to the virus inoculum. as different alcohols are the major constituents of liquid chalk, we additionally evaluated the impact of alcohol alone on the recovery of infectious virus. as can be observed in table , exposure of sars-cov- and iav to both ethanol and isopropanol is detrimental to the infectiousness of these viruses up to a dilution of % v/v for iav and % v/v for sars-cov . given proprietary information regarding the alcohol content of chalks there have recently been two press releases, one using seasonal cov and not sars neither public release investigated the efficiency against other highly infectious viral pathogens. to our knowledge, this study is the first to demonstrate antiviral activity of a range of commercially available and commonly used liquid chalks.
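the results above are reported as log reductions in recovered infectious virus relative to a control. a minimal sketch of that arithmetic follows; the function names and titres are hypothetical and are not data from the study.

```python
import math


def log10_reduction(control_titer, treated_titer):
    """log10 reduction of a treated sample relative to its control.

    Both titres must be in the same units (for example TCID50/mL).
    """
    if control_titer <= 0 or treated_titer <= 0:
        raise ValueError("titres must be positive")
    return math.log10(control_titer) - math.log10(treated_titer)


def percent_reduction(log_reduction):
    """Express a log10 reduction as a percentage of virus inactivated."""
    return 100.0 * (1.0 - 10.0 ** (-log_reduction))


# Hypothetical titres: control at 1e6 and treated at 1e3 TCID50/mL.
lr = log10_reduction(1e6, 1e3)
print(f"log10 reduction: {lr:.1f}, percent reduction: {percent_reduction(lr):.2f}%")
```

when the treated sample falls below the assay's detection limit, the usual practice is to substitute the limit for the treated titre and report the result as "at least" that many logs; that handling is left out of this sketch.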
given the uncertainty around re-opening gyms due to contact transmission from potentially contaminated equipment, our findings that liquid chalks have antiviral activity against sars-cov- may aid decision making for re-opening gyms in the future. victorian infectious diseases reference laboratory at the doherty institute, in providing our laboratory with isolated sars-cov- material. author contributions: tea and jmd performed the experiments; wh provided the liquid chalk; collated and analysed the data and wrote the manuscript. funding: the university of melbourne acknowledges the support of a grant administered by the state government of victoria to jmm. conflicts of interest: the authors declare no conflict of interest. the funders had no role in the design of the study. hand sanitizers: a review of ingredients, mechanisms of action, modes of delivery, and efficacy against coronaviruses efficacy of ethanol against viruses in hand disinfection inactivation of severe acute respiratory syndrome coronavirus by who-recommended hand rub formulations and alcohols. 
emerging infectious disease journal inactivation of sars-cov- by commercially available alcohol-based hand sanitizers isolation and rapid sharing of the novel coronavirus (sars-cov- ) from the first patient diagnosed with covid- in australia validation of a single-step, single-tube reverse transcription loop-mediated isothermal amplification assay for rapid detection of sars-cov- rna expression of the influenza a virus pb -f enhances the pathogenesis of viral and secondary bacterial pneumonia stat -dependent innate immunity to a norwalk-like virus mouse norovirus replication is associated with virus-induced vesicle clusters originating from membranes derived from the secretory pathway murine norovirus: a model system to study norovirus biology and pathogenesis liquid chalk proven in cu labs to kill coronavirus, potentially helping collated and analysed the data and wrote the manuscript competing interests: all authors declare they have no competing interests with this study data sharing: all results gained during this study will be made available via journal access after acceptance and publication of the article on open access university website key: cord- -idji l a authors: xu, huanzhou; chitre, siddhi a.; akinyemi, ibukun a.; loeb, julia c.; lednicky, john a.; mcintosh, michael t.; bhaduri-mcintosh, sumita title: sars-cov- viroporin triggers the nlrp inflammatory pathway date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: idji l a cytokine storm resulting from a heightened inflammatory response is a prominent feature of severe covid- disease. this inflammatory response results from assembly/activation of a cell-intrinsic defense platform known as the inflammasome. we report that the sars-cov- viroporin encoded by orf a activates the nlrp inflammasome, the most promiscuous of known inflammasomes. orf a triggers il- β expression via nfκb, thus priming the inflammasome while also activating it via asc-dependent and -independent modes. 
orf a-mediated inflammasome activation requires efflux of potassium ions and oligomerization between nek and nlrp . with the selective nlrp inhibitor mcc able to block orf a-mediated inflammasome activation and key orf a residues needed for virus release and inflammasome activation conserved in sars-cov- isolates across continents, orf a and nlrp present prime targets for intervention. summary development of anti-sars-cov- therapies is aimed predominantly at blocking infection or halting virus replication. yet, the inflammatory response is a significant contributor towards disease, especially in those severely affected. in a pared-down system, we investigate the influence of orf a, an essential sars-cov- protein, on the inflammatory machinery and find that it activates nlrp , the most prominent inflammasome by causing potassium loss across the cell membrane. we also define key amino acid residues on orf a needed to activate the inflammatory response, and likely to facilitate virus release, and find that they are conserved in virus isolates across continents. these findings reveal orf a and nlrp to be attractive targets for therapy. worldwide reports of covid- indicate that effective management of severely ill individuals will require both antiviral and anti-inflammatory strategies. indeed, during the second week of cells. importantly, we find that although the cov- orf a protein has diverged somewhat from its homologs in other covs, some of these newly divergent residues are essential for activating the nlrp inflammasome and are perfectly conserved in virus isolates across continents. with lung as the predominant site of pathology along with established tropism for kidney and other organs , we introduced orf a into lung origin a cells and for comparison, kidney origin hek- t cells, cell types that readily support sars-cov- infection , and found induction of pro-il- in both cell types, consistent with priming of the inflammasome. 
compared to empty vector-exposed cells, orf a also increased the levels of the cleaved, i.e. active, form of the pro-inflammatory caspase, caspase , as well as the cleaved form of the caspase substrate, pro-il- , indicating activation of the inflammasome, again in both cell types (fig. a). priming by orf a resulted from nfκb-mediated expression of il- β message (fig. b) as indicated by increased iκb phosphorylation and enrichment of nfκb p at the il- β promoter in orf a-exposed cells (figs. c-e). orf a also caused cleavage/activation of gasdermin d, the pyroptosis-inducing caspase -substrate, indicated by an increase in the n-terminal fragment of gasdermin d (fig. f). this was accompanied by orf a-mediated increased cleavage/activation of caspase and cell death, likely secondary to both pyroptosis and apoptosis (figs. g and h). thus, orf a primes the inflammasome by triggering nfκb-mediated expression of pro-il- β while also activating the inflammasome to cleave pro-caspase- , pro-il- , and the pore-forming gasdermin d, inducing cell death. orf a activates the nek -nlrp inflammasome via asc-dependent and -independent modes. in probing the mechanism of orf a-mediated activation of the inflammasome, we found that it nima-related kinase nek recently linked to nlrp activation , we also depleted nek and found that orf a was impaired in its ability to cause cleavage of caspase , i.e. unable to activate the inflammasome (fig. d). the nlrp inflammasome is activated by a variety of cell-extrinsic and -intrinsic stimuli that trigger the assembly of the inflammasome machinery wherein nlrp oligomerizes with the adaptor protein asc (apoptosis-associated speck-like protein containing a card) leading to recruitment of pro-caspase which is then activated by proximity-induced intermolecular cleavage. given orf a-mediated inflammasome activation in hek- t cells that lack asc (fig. e), we asked if orf a activated the inflammasome solely in an asc-independent manner. 
we found that orf a's ability to activate pro-caspase was substantially impaired upon depletion of asc in a cells (fig. f) , supporting the idea that orf a activates the inflammasome in both asc-dependent and -independent ways. to assess if orf a also mediates activation of other prominent inflammasomes including nlrp and nlrc , we depleted each of these molecules but were unable to block cleavage of pro-caspase (fig. g) , indicating that orf a predominantly activates the nlrp inflammasome. orf a triggers nlrp inflammasome assembly via k + efflux. with nek a key mediator of nlrp activation downstream of potassium efflux, and efflux of potassium ions a central mechanism of nlrp activation, particularly by ion channel-inducing viroporins , , , we investigated the effect of blocking potassium efflux by raising the extracellular concentration of k + and found that orf a-mediated caspase cleavage was abrogated (fig. a) . to identify the type of k + channel formed by orf a, we employed known pharmacologic inhibitors including quinine, barium, iberiotoxin, and tetraethylammonium to block two-pore domain k + channels, inward-rectifier k + channels, large conductance calcium-activated k + channels, and voltage gated k + channels, respectively . mimicking the ability of barium to dampened response by the inflammasome (fig. b) . thus, sars-cov- orf a has retained some of the key residues needed for virus release and inflammasome activation but it has acquired additional changes that support a functionally consequential divergence from earlier covs. nonetheless, this domain bearing the abovementioned residues that is essential for forming ion channels for virus release has remained remarkably well conserved throughout the pandemic, thereby maintaining its ability to activate the inflammasome. 
in summary, an essential viroporin required for release of sars-cov- from infected cells is also able to prime and activate the nlrp inflammasome, the machinery responsible for much of the regulates lytic activation of kaposi's sarcoma-associated herpesvirus severe acute respiratory syndrome coronavirus orf a protein activates the nlrp inflammasome by promoting traf -dependent ubiquitination of asc a promiscuous inflammasome sparks replication of a common tumor virus novel replisome-associated proteins at cellular replication forks in ebv-transformed b lymphocytes krab-zfp repressors enforce quiescence of oncogenic human characterization of a functional nf-kappa b site in the human interleukin beta promoter: evidence for a positive autoregulatory loop barr virus replication through phosphorylation of kap /trim in burkitt lymphoma cells cov- reference strain wuhan-hu- (genbank nc_ . ) and % identity with co-immunoprecipitation (co-ip) co-ip was performed as described previously . cells were lysed in ice-cold ip lysis key: cord- -rzda x authors: wells, stephen a. title: rigidity, normal modes and flexible motion of a sars-cov- (covid- ) protease structure date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rzda x the rigidity and flexibility of two recently reported crystal structures (pdb entries y e and lu ) of a protease from the sars-cov- virus, the infectious agent of the covid- respiratory disease, has been investigated using pebble-game rigidity analysis, elastic network model normal mode analysis, and all-atom geometric simulations. this computational investigation of the viral protease follows protocols that have been effective in studying other homodimeric enzymes. the protease is predicted to display flexible motions in vivo which directly affect the geometry of a known inhibitor binding site and which open new potential binding sites elsewhere in the structure. 
a database of generated pdb files representing natural flexible variations on the crystal structures has been produced and made available for download from an institutional data archive. this information may inform structure-based drug design and fragment screening efforts aimed at identifying specific antiviral therapies for the treatment of covid- . as of late , there has been global concern over a novel respiratory disease (designated covid- by the world health organisation (who)) which originated in the wuhan region of china and has subsequently spread to multiple other countries worldwide. the virus causing covid- is a coronavirus with the taxonomic identifier sars-cov- [ ] . thanks to the diligent efforts of structural biologists, several crystal structures of proteins from sars-cov- have already been obtained and made available through the protein data bank (pdb) even in advance of journal publication. crystal structures provide us with detailed structural knowledge of the arrangement of atoms making up a protein or protein complex. a crystal structure is a static snapshot of the protein "frozen" into one of its many possible conformations. this static snapshot is rich in implicit information on the natural dynamics which the protein will explore in vivo. these natural dynamics, described by kern [ ] as a protein's "dynamic personality", can be critical to a protein's function, and can directly affect the geometry of, for example, an enzyme active site. a judicious combination of simplified methodologies (rigidity analysis, elastic network modelling, and geometric simulation of flexible motion) can work together to extract valuable information on the large-amplitude, low-frequency motions of a protein structure, at a fraction of the computational cost (in resources and time) of molecular dynamics (md) approaches [ ] . 
these simplified methods are strongly complementary to both computational md investigations and experimental biophysical/biochemical studies of protein behaviour. enzyme active site geometry [ ] [ ] [ ] [ ] ; and, crucially, that a structure generated from geometric simulation of a large amplitude motion can be used as input for further md investigation [ ] , as the geometric simulation retains the local bonding geometry and constraint network of the input crystal structure and forbids major bonding distortions and steric clashes. the information obtained here regarding the flexibility of the enzyme should be of use and interest in structure-based drug design and computational screening of drugs and drug-like molecules capable of binding to and inhibiting the action of the viral protease, such as the pioneering work of walsh et al. reported online within the past few days [ ] . the simulations reported here suggest that the protease structure is capable of substantial flexible motions which alter domain orientations, open and close clefts, and affect the geometry of an inhibitor-binding site. the entirety of the computational work described here was carried out in less than one working day on a laptop computer. the inputs and outputs of the simulations, principally consisting of pdb files representing flexible variations on the protease crystal structures, are available for download from the university of bath research data archive at https://researchdata.bath.ac.uk/ / [ ] with doi https://doi.org/ . /bath- methods, software and data input data: crystal structures of a viral protease were downloaded from pdb entries y e (free protease) and lu (inhibitor bound). visualisations: visualisation and some minor structural editing, viz. generation of symmetry copies, removal of water and other heterogroups, removal of alternate sidechain conformations, and renumbering of entries after addition of hydrogens, were carried out with the pymol viewer, version . [ ] . 
all structures were aligned to the y e crystal structure for consistency of view. addition of hydrogens, sidechain flipping: the molprobity [ ] web server hosted at duke university provides online tools carrying out these functions. rigidity analysis: carried out using the pebble-game constraint counting algorithm implemented in the software first [ ] . first should be available from arizona state university via flexweb.asu.edu; academic readers should contact the present author if there is difficulty in obtaining the software, as the site appears currently to be down. recent research has shown that the assignment of energies to polar interactions in first suffers from a bug that leads to some salt-bridge interactions with short donor-acceptor distances being treated as weak rather than strong. therefore, interactions were identified and energies assigned using a recently developed corrected function, sbfirst [ ] . a copy of the sbfirst software for constraint identification is included with the deposited dataset for this study. normal mode analysis: carried out using the elnemo elastic network modelling software [ ] . this method builds an elastic network model of the protein with one site per residue (using the alpha carbon _ca_ atom coordinates as the node positions), placing springs of uniform strength between each pair of residues lying less than Å apart, and diagonalising the resulting hessian matrix to identify eigenvectors. geometric simulations of flexible motion: carried out using the froda geometric simulation engine [ ] incorporated into first. the froda engine iteratively projects an all-atom model of the protein structure along a normal mode direction [ ] while constraining the local geometry to match that of the original protein structure. frames along each such trajectory are written out as numbered pdb files. workflow and protocol: for each protein structure, the complete workflow for this study is as follows. 
download structure in pdb or mmcif format. in this case, the downloaded structure contains coordinates for a single protein chain, and symmetry operations to generate copies making up the crystal structure. visualise structure in pymol. generate symmetry mates. select the appropriate symmetry mate to make up the homodimer indicated in the pdb entry (under global stoichiometry) as the biological entity of interest. delete other symmetry copies. alter chain id(s) in the symmetry copy to give each chain a unique chain id. process structure through the molprobity website, adding hydrogens at electron-cloud positions, accepting all recommendations for side chain flips. download the hydrogenated structure (in which all hydrogens have a serial number of ). visualise structure in pymol for final cleaning and preparation. remove alternate conformations (i.e. those with an altloc entry which is not a blank or an a). remove water molecules and extraneous heterogroups such as glycerol, acetyl etc. save the cleaned structure; in this step, pymol renumbers the atoms sequentially, so hydrogens now have appropriate serial numbers. identify covalent and noncovalent interactions from the atomic geometry of the protein structure using sbfirst. this generates lists of covalent bonds, tagged as being either rotatable or non-rotatable (such as the c-n bond in the peptide main chain); hydrophobic tether interactions, wherever two nonpolar sidechains are closely adjacent; and polar interactions, including salt bridges and hydrogen bonds. polar interactions are assigned energies between and - kcal/mol based on the donor-hydrogen-acceptor geometry. analyse rigidity with first as a function of hydrogen-bond energy cutoff. this analysis groups atoms into rigid clusters by matching degrees of freedom against constraints. first can carry out a dilution, gradually lowering the cutoff to exclude polar interactions one by one from weakest to strongest. 
it is also possible to simply carry out "snapshots" of the rigidity at a series of cutoff values, which in this study were - . , - . , - . … - . , - . kcal/mol. review constraints based on rigidity information. at this stage, a small number of noncovalent constraints (one in y e, three in lu ) were identifiable as artefacts of crystal packing, and were excluded from the network when carrying out the geometric simulations. based on the rigidity analysis, a cutoff of - . kcal/mol appeared appropriate for the geometric simulations, in line with previous studies [ , ] . normal mode analysis. extract only the alpha carbon positions and run elnemo pdbmat (generation of matrix) and diagstd (diagonalise matrix) functions. the result is a series of normal modes of motion, reported as eigenvector/eigenvalue pairs. when these are sorted by eigenvalue from lowest to highest, the first six modes are trivial rigid-body motions with near-zero eigenvalues (frequencies), while the low-frequency modes from upwards represent flexible motions of the protein itself. this study examines modes to , that is, the ten lowest-frequency nontrivial modes, with modes to being shown in the figures. geometric simulation of flexible motion. for each normal mode of interest ( to ), run first with a chosen energy cutoff (- . kcal/mol) and the previously identified lists of covalent and noncovalent bonding constraints, providing the normal mode eigenvector and a direction of motion (parallel or antiparallel to the vector) along with a directed step size (in this case . Å), and invoking the froda engine implemented in first. froda carries out a series of steps. in each step, all atom positions are moved along the normal mode direction. then, the local atomic geometry defined by the bonding constraints and steric exclusion is restored by a few cycles of iterative relaxation. after a user-defined number of steps (in this case ), a numbered "frame" is written out as a pdb file. 
a series of such frames describes the trajectory of flexible motion. froda continues each run until a user-defined maximum number of steps (in this case ) or until "jamming" occurs where the constraints can no longer be satisfied, typically when the motion has led to severe steric clashes through the collision of two domains. output data: the inputs, scripts and outputs of the rigidity analysis, normal mode analysis, and geometric simulations for each protein have been deposited in the university of bath's data repository and are accessible from https://researchdata.bath.ac.uk/ / [ ] results and discussion the y e and lu crystal structures each contain explicit coordinates for one chain (a) of the protease, and in the case of lu , a chain (c) representing a bound inhibitor. the input structures for analysis and simulation consist of chain a and a symmetry copy (b), making up the biological homodimer. chain c is likewise duplicated to chain d. the x-ray structure does not contain explicit hydrogens, which are added using molprobity [ ] . with hydrogens added, it is then possible to detect the covalent bonds, polar noncovalent interactions (hydrogen bonds and salt bridges) and hydrophobic tether interactions using sbfirst [ ] (see methods). the polar interactions are rated with an effective energy between and - kcal/mol based on the donor-hydrogen-acceptor geometry. pebble-game rigidity analysis [ , ] divides the structure into rigid clusters and flexible regions based on the distribution of degrees of freedom and constraints. this rigid cluster decomposition (rcd) depends on the set of polar interactions that are included, based on an energy cutoff that excludes weaker interactions. information on the relative stability of different portions of the structure is typically obtained at cutoffs around a "room temperature" value of - kcal/mol. rcds are shown in figure . 
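the role of the energy cutoff in selecting polar constraints can be illustrated with a short python sketch. it is an illustration only and is not the first or sbfirst implementation: the `PolarInteraction` record, the atom labels, the energies and the cutoff value are all hypothetical. the sketch keeps the interactions that are at least as strong as a chosen cutoff (a more negative energy meaning a stronger interaction) and lists the order in which a dilution would remove them, weakest first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolarInteraction:
    """Hypothetical record for one hydrogen bond or salt bridge."""
    donor: str      # illustrative atom label, not a real file format
    acceptor: str
    energy: float   # kcal/mol; more negative means stronger


def constraints_at_cutoff(interactions, cutoff):
    """Snapshot: keep interactions whose energy is at or below the cutoff."""
    return [i for i in interactions if i.energy <= cutoff]


def dilution_order(interactions):
    """Dilution: order of removal, weakest (closest to zero) first."""
    return sorted(interactions, key=lambda i: i.energy, reverse=True)


# Made-up example values, chosen only to exercise the two functions.
example = [
    PolarInteraction("A:N10", "A:O6", -4.2),
    PolarInteraction("A:NZ5", "B:OE1", -7.9),
    PolarInteraction("B:OG1", "A:O20", -0.6),
    PolarInteraction("B:N33", "B:O29", -2.1),
]

kept = constraints_at_cutoff(example, cutoff=-2.0)  # hypothetical cutoff
removal = dilution_order(example)
```

the retained list would then be passed, together with the covalent and hydrophobic constraints, to the constraint-counting step that identifies rigid clusters; that step is not reproduced here.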
with the inclusion of weaker polar constraints, a single large rigid cluster extends across both chains of the protease dimer. as weaker constraints are eliminated, the large cluster extends across the n-terminal domains (roughly residues - ) of both chains, which consist largely of beta-sheet structure, while the largely alpha-helical c-terminal domain breaks up into several smaller rigid clusters, each one a single helix. with further elimination of constraints, the beta-sheet regions become fully flexible. simulation of flexible motion of a protein structure is best carried out with a rigidity cutoff such that the structure is fully flexible. the mobility studies that follow are carried out with a cutoff of - . kcal/mol, similar to previous studies on enzymes [ , ] . it is important to appreciate that even at this lower cutoff the structure is still thoroughly constrained by noncovalent interactions, as illustrated in figure . the core of each folded domain is rich in hydrophobic interactions, as is the interface region between the two beta-sheet domains as the two chains form the homodimer, and arrays of strong polar interactions constrain secondary structure (for example, the backbone hydrogen bonds within alpha helices and beta sheets). noncovalent constraints in the lu structure, shown as green (hydrophobic) and red (polar) lines. the protein mainchain is shown in cartoon representation. the circled area highlights a small set of interdomain hydrophobic constraints, one between the ala residues of the two chains, and one each between residue thr of one chain and leu of the other. in the y e structure only the ala interaction is present. these interactions may be considered an artefact of crystal packing. also highlighted in figure are a small set of hydrophobic interactions assigned between the n-terminal domains of chains a and b. in the y e structure, a single tether connects the sidechains of the ala residues of both chains. 
in the lu structure illustrated, there are two additional tethers nearby, each connecting the sidechains of the thr residue of one chain and the leu of the other. since the two n-terminal domains are otherwise not connected by noncovalent interactions, and these tethers are not involved in forming a rigid cluster, it seems reasonable to consider these tethers an artefact of crystal packing, and eliminate them from further consideration. their deletion does not affect the rigid cluster decompositions of the structures. if these tethers were retained, they would somewhat limit (but not entirely prevent) the central cleft-opening motion discussed below. normal modes, representing directions of motion intrinsic to the structure, are easily obtained from an elastic network model [ ] (see methods) in which the protein is represented as one site per residue and springs of uniform strength are placed between every pair of sites with a separation less than Å. in the lu structure, the amino acid residues of the bound inhibitor are included in the elastic network. of the n normal modes of a structure with n residues, six modes of near-zero frequency represent the trivial rigid body motions of the structure, while the low-frequency modes from upwards represent modes of flexible motion. when the structure moves along a low-frequency mode direction, the global geometry changes with minimal change in the local geometry, making these directions of "easy" motion for the protein. the sign of a normal mode eigenvector is arbitrary; motions parallel and antiparallel to each bias direction must be investigated equally. linear projection of a protein structure along a normal mode direction would rapidly introduce unphysical distortions into the structure. 
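the elastic network construction described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions and is not the elnemo code: the five site coordinates, the spring constant and the cutoff value are hypothetical, and the functions `build_hessian` and `apply_hessian` are names introduced here. rather than diagonalising the matrix, the sketch checks the property that underlies the six trivial modes: a rigid translation or rotation stretches no spring, so the hessian maps it to zero, whereas displacing a single site costs energy.

```python
def build_hessian(coords, cutoff, k=1.0):
    """Elastic network Hessian (3n x 3n): uniform springs of strength k
    between every pair of sites separated by less than `cutoff`."""
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][a] - coords[i][a] for a in range(3)]
            dist2 = sum(c * c for c in d)
            if dist2 >= cutoff * cutoff:
                continue  # no spring between distant sites
            for a in range(3):
                for b in range(3):
                    v = k * d[a] * d[b] / dist2
                    hess[3 * i + a][3 * j + b] -= v  # off-diagonal blocks
                    hess[3 * j + a][3 * i + b] -= v
                    hess[3 * i + a][3 * i + b] += v  # diagonal blocks
                    hess[3 * j + a][3 * j + b] += v
    return hess


def apply_hessian(hess, x):
    """Matrix-vector product H x for a flattened displacement vector x."""
    return [sum(h * v for h, v in zip(row, x)) for row in hess]


# Hypothetical one-site-per-residue coordinates and cutoff (arbitrary units).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
          (0.0, 3.8, 0.0), (1.9, 1.9, 3.0)]
hess = build_hessian(coords, cutoff=6.0)

translation = [1.0, 0.0, 0.0] * len(coords)                      # rigid shift along x
rotation_z = [c for (x, y, z) in coords for c in (-y, x, 0.0)]   # rigid rotation about z
stretch = [1.0, 0.0, 0.0] + [0.0] * (3 * len(coords) - 3)        # move site 0 only

residual_translation = max(abs(v) for v in apply_hessian(hess, translation))
residual_rotation = max(abs(v) for v in apply_hessian(hess, rotation_z))
stretch_energy = 0.5 * sum(s * f for s, f in zip(stretch, apply_hessian(hess, stretch)))
```

in a full analysis the same matrix would be passed to a symmetric eigensolver, and the eigenvectors following the six near-zero modes would serve as bias directions.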
the geometric simulation approach implemented in froda [ , ] instead applies a bias, moving the structure along a normal mode direction, while maintaining steric exclusion and the bonding geometry and noncovalent constraints of the input structure. this approach rapidly generates physically realistic flexible variations on the input structure which retain all-atom steric and bonding detail. geometric simulations of the lowest-frequency nontrivial modes of motion ( upwards) show that the protease structure is capable of substantial amplitudes of easy motion covering several Ångstroms distance in global root-mean-square displacement (rmsd). in particular, the orientations of the alpha-helical domains relative to the beta-sheet domains can change through flexing and rotation about the interdomain "hinges", and the beta-sheet domains themselves are large enough to display bending and twisting motions within themselves. these variations are illustrated for the y e structure in figure and for lu in figure . figure : flexible variations (outer circle) of the y e structure (centre), shown in cartoon representation and coloured from blue to red along each chain. each flexible variation is the th frame of a geometric simulation, lying at an all-atom rmsd of to Ångstroms from the starting crystal structure. note the substantial variations of domain orientation achieved. each variation is labelled with the normal mode used to bias the geometric simulation, from to , and with the direction of bias, parallel (+) or antiparallel (-) to the mode eigenvector, whose sign is arbitrary. figure : flexible variations (outer circle) of the lu structure (centre), shown in cartoon representation and coloured from blue to red along each chain. the bound inhibitor is shown as spheres. each flexible variation is the th frame of a geometric simulation, lying at an all-atom rmsd of to Ångstroms from the starting crystal structure. note the substantial variations of domain orientation achieved. 
each variation is labelled with the normal mode used to bias the geometric simulation, from to , and with the direction of bias, parallel (+) or antiparallel (-) to the mode eigenvector, whose sign is arbitrary. flexible variations on a protein crystal structure are potentially relevant to structure-based drug design in at least two ways. firstly, flexible motion can open or expose clefts and potential binding sites not directly visible in the static crystal structure. since the protein in vivo will be exploring its flexible motion thanks to the brownian-motion driving force of its solvent (e.g. cytosol), such latent sites may constitute valid target areas for inhibitors. secondly, the global low-frequency motion couples to variations in the binding site/active site geometry [ , ] . knowledge of the range of flexible variation here is potentially useful for structure-based drug design and/or fragment screening, since attention can be focussed on candidate molecules that interact robustly with the binding site and tolerate its flexibility, in preference to molecules that interact well only with the crystal structure and not with its flexible variations. figure : top row: y e crystal structure (a) and cleft opening motion along modes -(b) and +(c). bottom row: lu crystal structure (d) and cleft opening motion along modes + (e) and + (f). protein structure is shown as spheres and coloured by amino acid type. green, aliphatic or aromatic residues; yellow, cysteine or methionine; red, hydroxyl (serine and threonine); magenta, acidic; purple, amidic; blue, basic. a more detailed view of this opening cleft is shown in figure . the residues are coloured to bring out the chemical character of the amino acid side chains. the portions of the alpha-helical domain surfaces that move aside to open the cleft display largely hydrophobic residues (aliphatic or aromatic side chains). 
intriguingly, the exposed surface area within the cleft is richly lined with a series of basic residues (lysine and arginine), and is flanked by acidic (aspartic and glutamic acids) and polar (threonine) residues. in the crystal structures the basic and acidic residues appear to be involved in a network of inter-and intra-domain salt bridge interactions which stabilise the core of the homodimer. an antagonist capable of targeting this zone of strongly polar surface geometry exposed by flexible motion could potentially disrupt the dynamics of the enzyme and interfere with its function. detail view of interdomain cleft opening (structure lu , mode +, th frame of geometric simulation). protein structure is shown as spheres and coloured by amino acid type. green, aliphatic or aromatic residues; yellow, cysteine or methionine; red, hydroxyl (serine and threonine); magenta, acidic; purple, amidic; blue, basic. one end of the bound inhibitor is visible as grey spheres in the lower left. the exposed cleft is rich in basic, acidic and polar residues. in the lu structure, a binding site on the flank of each beta-sheet domain is occupied by the inhibitor n-(( -methylisoxazol- -yl)carbonyl)alanyl-l-valyl-n~ ~-(( r, z)- -(benzyloxy)- -oxo- -((( r)- oxopyrrolidin- -yl)methyl)but- -enyl)-l-leucinamide (chain c of the structure), hereinafter "n ". this inhibitor is listed in the pdb as prd_ ; it is a peptide-like inhibitor, with a central core of amino acids modified at either end with heterogroups, and was previously known to inhibit a protease of a feline coronavirus [ ] (see pdb entry eu ). in the y e structure the same binding cleft is empty, which means that its geometry is likely to change in the course of flexible motion of the beta-sheet domain. figure shows the n binding site of the lu structure and the corresponding region of the y e structure. as noted, all structures have been aligned to the y e crystal structure for ease of viewing and comparison. 
figure also shows flexible variations of the y e site in the course of flexible motion. a set of residues forming a rough square around the binding site (glu , gln , asn and thr ) are highlighted in white as a guide to the eye. it is immediately visible that openings, closings and flexible distortions of the site are an intrinsic feature of the flexible motion of the protease. note in particular the effect of motion along the lowest-frequency nontrivial mode direction, mode . motion biased antiparallel to the mode eigenvector ( -) includes an opening of the cleft, while motion biased parallel to it ( +) closes it. since the n inhibitor is quite deeply embedded in the binding site of the lu structure, opening and closing of this cleft in the course of the protein's natural motion may be necessary for molecules to access the cleft. on visual inspection, the site in the y e structure is clearly slightly more closed than in lu , consistent with the proposition that such opening/closing motions are intrinsic to the protein. figure : top row: the n inhibitor binding site of the lu structure, and the corresponding empty site of the y e structure. rows below: flexible variations of the y e site from geometric simulations along normal modes as indicated. residues - of a single chain of the protease are shown as spheres and coloured by amino acid type. green, aliphatic or aromatic residues; yellow, cysteine or methionine; red, hydroxyl (serine and threonine); magenta, acidic; purple, amidic; blue, basic. to highlight the binding site, residues glu , gln , asn and thr are coloured white. substantial opening and closing of the binding site cleft occurs during the flexible motion. 
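the flexible variations discussed in this section come from alternating a bias step with a relaxation step. the following python sketch illustrates that loop on a toy system and should not be read as the froda implementation: the four-site chain, bond lengths, bias vector, step size and step counts are hypothetical, the only constraints enforced are bond lengths, and steric exclusion, noncovalent tethers and the jamming check are omitted. the rmsd is computed without superposition.

```python
import math


def relax(coords, bonds, cycles=50, tol=1e-6):
    """Iteratively restore each bonded distance to its target length."""
    for _ in range(cycles):
        worst = 0.0
        for i, j, length in bonds:
            d = [coords[j][a] - coords[i][a] for a in range(3)]
            dist = math.sqrt(sum(c * c for c in d))
            err = dist - length
            worst = max(worst, abs(err))
            shift = 0.5 * err / dist  # split the correction between both sites
            for a in range(3):
                coords[i][a] += shift * d[a]
                coords[j][a] -= shift * d[a]
        if worst < tol:
            break


def simulate(start, bonds, bias, step, n_steps, frame_every):
    """Bias all sites along `bias`, relax, and store every n-th frame."""
    coords = [list(p) for p in start]
    frames = []
    for s in range(1, n_steps + 1):
        for p, b in zip(coords, bias):
            for a in range(3):
                p[a] += step * b[a]
        relax(coords, bonds)
        if s % frame_every == 0:
            frames.append([tuple(p) for p in coords])
    return frames


def rmsd(a, b):
    """Root-mean-square displacement between two frames (no superposition)."""
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(total / len(a))


# Hypothetical toy chain with a bending-like bias direction.
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
bonds = [(0, 1, 1.5), (1, 2, 1.5), (2, 3, 1.5)]
bias = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0), (0.0, -1.0, 0.0), (0.0, 1.0, 0.0)]

frames = simulate(start, bonds, bias, step=0.01, n_steps=100, frame_every=10)
final_rmsd = rmsd(frames[-1], start)
```

in this toy case the chain bends substantially while every bond length is preserved, which is the same qualitative behaviour as the large-amplitude, locally undistorted motions reported for the protease.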
the rigidity and flexibility of two recently reported crystal structures (pdb entries y e and lu ) of a protease from the sars-cov- virus, the infectious agent of the covid- respiratory disease, has been investigated using pebble-game rigidity analysis, elastic network model normal mode analysis, and all-atom geometric simulations. this computational investigation of the viral protease follows protocols that have been effective in studying other homodimeric enzymes. the protease is predicted to display flexible motions in vivo which directly affect the geometry of a known inhibitor binding site, e.g. through an opening/closing motion, and which open new potential binding sites elsewhere in the structure. a database of generated pdb files representing natural flexible variations on the crystal structures has been produced and made available for download from an institutional data archive. this information may inform structure-based drug design and fragment screening efforts aimed at identifying specific antiviral therapies for the treatment of covid- . naming the coronavirus disease (covid- ) and the virus that causes it dynamic personalities of proteins rapid simulation of protein motion: merging flexibility, rigidity and normal mode analyses structure and function in homodimeric enzymes: simulations of cooperative and independent functional motions a complete thermodynamic analysis of enzyme turnover links the free energy landscape to enzyme catalysis exposing the interplay between enzyme turnover, protein dynamics, and the membrane environment in monoamine oxidase b. 
biochemistry protein flexibility is key to cisplatin crosslinking in calmodulin structure and function of l-threonine- -dehydrogenase from the parasitic protozoan trypanosoma brucei revealed by x-ray crystallography and geometric simulations the flexibility and dynamics of protein disulfide isomerase main protease structure and xchem fragment screen dataset for "rigidity, normal modes and flexible motion of a sars-cov- (covid ) protease structure the pymol molecular graphics system molprobity: more and better reference data for improved all-atom structure validation protein flexibility predictions using graph theory. proteins-structure function and genetics salt bridge impact on global rigidity and thermostability in thermophilic citrate synthase elnemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement constrained geometric simulation of diffusive motion in proteins comparative analysis of rigidity across protein families crystal structure of feline infectious peritonitis virus main protease in complex with synergetic dual inhibitors key: cord- -diqfmitr authors: luo, lei; liu, dan; zhang, hao; li, zhihao; zhen, ruonan; zhang, xiru; xie, huaping; song, weiqi; liu, jie; huang, qingmei; liu, jingwen; yang, xingfen; chen, zongqiu; mao, chen title: air and surface contamination in non-health care settings among environmental specimens of covid- cases date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: diqfmitr background little is known about the sars-cov- contamination of environmental surfaces and air in non-health care settings among covid- cases. methods and findings we explored the sars-cov- contamination of environmental surfaces and air by collecting air and swabbing environmental surfaces among covid- cases in guangzhou, china. the specimens were tested by rt-pcr testing. 
the information collected for covid- cases included basic demographic, clinical severity, onset of symptoms, radiological testing, laboratory testing and hospital admission. a total of environmental surfaces and air specimens were collected among covid- cases before disinfection. among them, specimens ( / , . %) were tested positive from covid- cases ( / , . %), with ( / , . %) positive specimens from asymptomatic cases, ( / , . %) from mild cases, and ( / , . %) from moderate cases. all positive specimens were collected within days after diagnosis, and ( / , . %) were found in toilet ( on toilet bowl, on sink/faucet/shower, on floor drain), ( / , . %) in anteroom ( on water dispenser/cup/bottle, on chair/table, on tv remote), ( / , . %) in kitchen ( on dining-table), ( / , . %) in bedroom ( on bed/sheet pillow/bedside table), ( / , . %) in car ( on steering wheel/seat/handlebar) and ( / , . %) on door knobs. air specimens in room ( / , . %) and car ( / , . %) were all negative. conclusions sars-cov- was found on environmental surfaces especially in toilet, and could survive for several days. we provided evidence of potential for sars-cov- transmission through contamination of environmental surfaces. the coronavirus disease pandemic has precipitated a global crisis, and it has resulted in confirmed cases including with deaths globally as of may , [ ] . reported transmission modes of severe acute respiratory syndrome coronavirus (sars-cov- ) among humans mainly through respiratory droplets produced when an infected case sneezes or coughs [ ] . people may be infected by inhalation of virus laden liquid droplets, and infection is more likely when someone are in close contact with covid- cases [ ] [ ] [ ] . however, the importance of indirect contact transmission, such as environmental contamination, is uncertain [ ] [ ] [ ] . 
evidence suggests that environmental contamination with sars-cov- is likely to be high, a view supported by recent research on environmental contamination from covid- cases in hospitals [ ] [ ] [ ] [ ] . hospitals already have rigorous disinfection measures and are less likely than communities and households to give rise to super-spreaders [ , [ ] [ ] [ ] . however, the role of air and surface contamination in non-health care settings still needs to be explored. it is therefore vital to understand the environmental contamination caused by sars-cov- infected cases in non-health care settings, which is an important aspect of controlling the spread of the epidemic. to address this question, we sampled a total of environmental surface and air specimens among cases in guangzhou, china, to explore the contamination of the surrounding environmental surfaces and air by sars-cov- in non-health care settings. study design and setting based on covid- case reports from jan to apr , , environmental surface and air specimens were collected by guangzhou cdc (gzcdc) from feb to apr , . environmental surface specimens were sampled from homes, hotels, public areas, restaurants, marketplaces, cars and pets associated with the cases' movements before hospitalization. air specimens were also sampled in the cases' rooms (home or hotel). because sampling followed each case's reported activity track, the number of specimens collected per covid- case varied. all specimens were collected before disinfection. handlebar) were also mixed form. air specimens were sampled using an md microbiological sampler (sartorius, germany) and sterile gelatin filters ( µm pores and mm diameter, sartorius, germany) [ , ] . 
air specimens was sampled for minutes in toilet, anteroom and the asymptomatic persons infected with sara-cov- (asymptomatic cases in short) refers to those who have no relevant clinical manifestations including clinically detectable signs or self-perceived symptoms such as fever, cough, or sore throat, but (table ) . distribution of environmental specimens among covid- cases. a total of environmental surfaces and air specimens were collected among covid- cases, and specimens ( / , . %) were positive by rt-pcr testing from covid- cases ( / , . %), with ( / , . %) positive specimens from asymptomatic cases, ( / , . %) from mild cases, and ( / , . %) from moderate cases ( of sars-cov- (table ) . all the positive specimens were collected within days (≤ days) from diagnosis to sampling (figure ), and ( / , . %) positive environmental surfaces specimens were collected from home, ( / , . %) from hotel and ( / , . %) from car that had driven. while, specimens in restaurant that had eaten ( / , . %), marketplace that had visited ( / , . %), pet that had lived with ( / , . %) and public area that had stayed ( / , . %) were all negative by rt-pcr testing (table ). although, the rt-pcr testing among specimens in restaurant and marketplace was negative, the huge number of people exposed to there also existed great risk. previous suggesting sharing indoor space was a major risk of sars-cov- infection. all environmental specimens were collected before they were diagnosed in our study, which indicated that a person who exposed to environmental contamination from covid- cases had a high infected risk unknowingly. in addition, sars-cov- had a great opportunity of surviving for a while on surfaces such as toilet, anteroom, and kitchen in current study. therefore, we suggested that home quarantine for suspected covid- cases might be not a good control strategy. 
it was difficult to ensure that cluster infection did not occur within families during a quarantine period of at least fourteen days, because family members shared areas such as toilets during quarantine. a previous study also suggested that home quarantine requires personal protective equipment and professional training; for ordinary people and families, especially those living together in a confined space, good infection control is hard to implement, which can lead to infection of other family members [ , , ] , and centralized quarantine was recommended in this situation [ ] . a previous study showed that covid- cases with severe disease had significantly higher viral loads in respiratory specimens than those with mild disease [ ] . we therefore explored whether more severe covid- was associated with greater contamination of environmental surfaces. in this study, environmental surface specimens were collected in a marketplace from one severe covid- case, and rt-pcr testing of these specimens was negative, probably because the sampling site was one with which the case had only occasional contact. in other cases, several environmental surface specimens tested positive, with ( / , . %) for asymptomatic cases, ( / , . %) for mild cases and ( / , . %) for moderate cases, suggesting that cases of any severity can contaminate environmental surfaces. in addition, covid- cases without symptoms such as fever, dry cough, expectoration, fatigue, myalgia or diarrhea were more likely to have specimens positive for sars-cov- . this might be because people with symptoms were generally quarantined more quickly, while people without symptoms continued to contaminate environmental surfaces and air before they were admitted to hospital. we highly recommend that persons no matter covid- cases or note: environmental specimens were tested by rt-pcr testing; positive environmental specimens of asymptomatic covid- cases were excluded in bars of symptoms onset to sampling. 
note: (+) represents positive environmental surfaces specimens, and (−) represents negative environmental surfaces and air specimens; the number represents the count of negative/positive environmental specimens; white blank represents without specimens. extensive viable middle east respiratory syndrome (mers) coronavirus contamination in air and surrounding environment in clinical infectious diseases : an official publication of the infectious diseases society of asymptomatic persons infected with covid- virus. china cdc weekly . . national health commission of the people's republic of china. chinese clinical guidance for covid- pneumonia diagnosis and treatment initial investigation of transmission of covid- among crew members during quarantine of a cruise ship -yokohama changes in contact patterns shape the dynamics of the covid- outbreak in china household transmission of sars-cov- clinical infectious diseases : an official publication of the infectious diseases society of america severity of covid- at the time of diagnosis ly%=lymphocyte percentage; ne= neutrophilic granulocyte; ne%= neutrophilic granulocyte percentage b: data on may ;c: data at hospital admission; number of participants with missing values: c= , d= public area included elevator, elevator button, stair armrest. b: asymptomatic covid- cases with environmental specimens were excluded. 
c: a total of specimens were covid- cases who with at least one positive environmental specimen key: cord- - lr xara authors: resende, paola cristina; delatorre, edson; gräf, tiago; mir, daiana; motta, fernando do couto; appolinario, luciana reis; da paixão, anna carolina dias; ogrzewalska, maria; caetano, braulia; dos santos, mirleide cordeiro; de almeida ferreira, jessylene; santos junior, edivaldo costa; da silva, sandro patroca; fernandes, sandra bianchini; vianna, lucas a; da costa souza, larissa; ferro, jean f g; nardy, vanessa b; croda, júlio; oliveira, wanderson k; abreu, andré; bello, gonzalo; siqueira, marilda m title: genomic surveillance of sars-cov- reveals community transmission of a major lineage during the early pandemic phase in brazil date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lr xara despite all efforts to control the covid- spread, the sars-cov- reached south america within three months after its first detection in china, and brazil became one of the hotspots of covid- in the world. several sars-cov- lineages have been identified and some local clusters have been described in this early pandemic phase in western countries. here we investigated the genetic diversity of sars-cov- during the early phase (late february to late april) of the epidemic in brazil. phylogenetic analyses revealed multiple introductions of sars-cov- in brazil and the community transmission of a major b. . lineage defined by two amino acid substitutions in the nucleocapsid and orf . this sars-cov- brazilian lineage was probably established during february and rapidly spread through the country, reaching different brazilian regions by the middle of march . our study also supports occasional exportations of this brazilian b. . lineage to neighboring south american countries and to more distant countries before the implementation of international air travels restrictions in brazil. 
introduction covid- , the disease caused by severe acute respiratory syndrome coronavirus- (sars-cov- ), is leading to high rates of acute respiratory syndrome, hospitalization, and death genomes (> % of ambiguous positions), we obtained a final dataset of , sequences. because most sequences recovered ( %) were from the united kingdom (uk), we generate a "non- redundant" global balanced dataset by removing very closely related sequences (genetic similarity > . %) from the uk. to achieve this aim, sequences from the uk were grouped by similarity with the cd-hit program and one sequence per cluster was selected. with this sampling procedure, we obtained a balanced global reference b. . dataset containing , sequences that were aligned with the new b. . brazilian sequences generated in this study using mafft v . and then subjected to maximum-likelihood (ml) phylogenetic analyses. the ml phylogenetic tree was inferred using iqtree v . . (mc) and compared to a null hypothesis generated by tip randomization. results were considered significant for p < . . the age of the most recent common ancestor (tmrca) and the spatial diffusion pattern of the brazilian sars-cov- sequences here obtained were classified as clade b. ( %, n = ), and particularly within the sub-clade b. . ( %, n = ) (fig. b) . the prevalence of the sub-clade b. . in our sample ( %) was much higher than that observed in other brazilian sequences available in gisaid ( %) (fig. c) phylogenetic tree, consistent with the hypothesis of multiple independent introductions (fig. ) (fig. ) . we also detected two other well-supported (sh-alrt > %) monophyletic clades of small size (n = - ) mostly composed by brazilian sequences ( supplementary fig. ) . in addition to sharing the three nucleotide mutations (g a, g a, g c) characteristic of the clade b. fig. ). 
despite the low genetic diversity, analyses of geographic structure rejected the null hypothesis of a panmixed population (supplementary january - th february) and its dissemination to brazil at th february ( % hpd: th february - th february) (fig. a) from western europe into brazil before nd february and that synapomorphic mutations t c and t c were fixed at sequential steps during subsequent virus local spread (fig. b) the authors wish to thank all the health care workers and scientists, who have worked hard to deal with this pandemic threat, the gisaid team and all the submitters of the database. gisaid a novel coronavirus from patients with pneumonia in china the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- an interactive web-based dashboard to track covid- in real time brazilian ministry of health. brasil confirma primeiro caso da doença -covid- brazilian ministry of health nextstrain: real-time tracking of pathogen evolution a dynamic nomenclature proposal for sars-cov- to assist genomic epidemiology. biorxiv global initiative on sharing all influenza data -from vision to reality revealing covid- transmission by sars-cov- genome sequencing and agent based modelling. biorxiv tracking the covid- pandemic in australia using genomics a phylodynamic workflow to rapidly gain insights into the dispersal history and dynamics of sars-cov- lineages. biorxiv sars-cov- transmission chains from genetic data: a danish case study. biorxiv introductions and early spread of sars-cov- in france. biorxiv spread of sars-cov- in the icelandic population full genome viral sequences inform patterns of sars-cov- spread into and within israel. medrxiv rapid sars-cov- whole genome sequencing for informed public health decision making in the netherlands. biorxiv phylodynamics of sars-cov- transmission in spain. biorxiv the emergence of sars-cov- in europe and the us. 
biorxiv introductions and early spread of sars-cov- in the new genomic surveillance reveals multiple introductions of sars-cov- into northern california the ongoing covid- epidemic in minas gerais, brazil: insights from epidemiological data and sars-cov- whole genome sequencing. medrxiv importation and early local transmission of covid- in brazil genomic and phylogenetic characterization of an imported case of sars-cov- in detection of novel coronavirus ( -ncov) by real-time rt- pcr sars-cov- genomes recovered by long amplicon tiling multiplex approach using nanopore sequencing and applicable to other sequencing platforms seqinr . - : a contributed package to the r project for statistical computing devoted to biological sequences retrieval and analysis. structural approaches to sequence evolution cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences mafft multiple sequence alignment software version : improvements in performance and usability iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies modelfinder: fast model selection for accurate phylogenetic estimates exploring the temporal structure of heterochronous sequences using tempest (formerly path-o-gen) correlating viral phenotypes with phylogeny: accounting for phylogenetic uncertainty bayesian phylogenetic and phylodynamic data integration using beast . improved performance, scaling, and usability performance computing library for statistical phylogenetics bayesian coalescent inference of past population dynamics from molecular sequences bayesian phylogeography finds its roots key: cord- - jmo jc authors: ismail, saba; ahmad, sajjad; azam, syed sikander title: immuno-informatics characterization sars-cov- spike glycoprotein for prioritization of epitope based multivalent peptide vaccine date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: jmo jc the covid- pandemic caused by sars-cov- is a public-health emergency of international concern and thus calling for the development of safe and effective therapeutics and prophylactics particularly a vaccine to protect against the infection. sars-cov- spike glycoprotein is an attractive candidate for vaccine, antibodies and inhibitor development because of many roles it plays in attachment, fusion and entry into the host cell. in this study, we characterized the sars-cov- spike glycoprotein by immune-informatics techniques to put forward potential b and t cell epitopes, followed by the use of epitopes in construction of a multi-epitope peptide vaccine construct (mepvc). the mepvc revealed robust host immune system simulation with high production of immunoglobulins, cytokines and interleukins. stable conformation of the mepvc with a representative innate immune tlr receptor was observed involving strong hydrophobic and hydrophilic chemical interactions, along with enhanced contribution from salt-bridges towards inter-molecular stability. molecular dynamics simulation in solution aided further in interpreting strong affinity of the mepvc for tlr . this stability is the attribute of several vital residues from both tlr and mepvc as shown by radial distribution function (rdf) and a novel analytical tool axial frequency distribution (afd). comprehensive binding free energies estimation was provided at the end that concluded major domination by electrostatic and minor from van der waals. summing all, the designed mepvc has tremendous potential of providing protective immunity against covid- and thus has the potential to be considered in experimental studies. in december , a new strain of coronavirus emerged in wuhan city of hubei province in china and has since spread globally. the virus belongs to clade b of family coronaviridae in the order nidovirales, and genera betacoronavirus and caused pulmonary disease outbreak [ , ] . 
it is an enveloped, non-segmented, positive-sense rna virus and was named sars-cov- because it shares % sequence identity with sars coronavirus (sars-cov) [ , ] . sars-cov- causes coronavirus disease- , and evidence suggests a zoonotic origin of this disease [ ] . although the zoonotic transmission is not completely understood, available evidence indicates that the virus spread from the huanan seafood market in wuhan and that human-to-human transmission resulted in an exponential increase in the number of cases [ , ] . as of march , , cases had been reported worldwide, with , deaths and , recovered patients. among the active cases, , are currently infected, , ( %) are in mild condition and , ( %) are seriously ill. among the , closed cases, , ( %) recovered whereas , ( %) died. on march , the world health organization (who) declared covid- a pandemic (https://www.worldometers.info/coronavirus/). sars-cov- uses a highly glycosylated, homotrimeric class i viral fusion spike protein to enter host cells [ ] . this protein exists in a metastable pre-fusion state and undergoes a structural rearrangement that facilitates fusion of the viral membrane with the host cell [ ] [ ] [ ] . binding of the s subunit to host angiotensin-converting enzyme initiates this process and disrupts the pre-fusion trimeric structure, resulting in dispersion of the s subunit and stabilization of the s subunit in a post-fusion conformation [ ] . the receptor-binding domain (rbd) of s undergoes a hinge-like conformational change that temporarily hides or exposes the determinants of receptor binding in order to engage a host-cell receptor [ ] . two conformational states, down and up, are recognized: the former corresponds to the receptor-inaccessible state, and the latter to the receptor-accessible state, which is considered less stable [ ] [ ] [ ] [ ] . 
this critical role of the spike protein makes it an important target for antibody-mediated neutralization, and detailed study of the pre-fusion s structure would provide information at atomic-level helping in the design and development of a vaccine [ ] [ ] [ ] [ ] [ ] . current data indicates that sars-cov- spike and sars-cov spike both share the same functional receptor (host cell) -angiotensin-converting enzyme (ace ) [ , ] . interestingly, ace binds to sars-cov- spike ectodomain with ∼ nm affinity, about - folds higher than ace binding to sars-cov spike [ ] . one possible reason for sars-cov- capability of spreading infection from human-to-human is sars-cov- spike's high affinity for human ace [ ] . series of cellular immune and humoral responses can be triggered by sars-cov- infections [ ] . immunoglobulin g (igg) and igm were noticeable after the weeks of onset of infection which are specific antibodies to sars-cov- . high titers of neutralizing antibodies and sars-cov- specific cytotoxic t lymphocyte responses have been identified in the patients who have improved from sars-cov- . this phenomenon clearly suggest that both cellular and humoral immune reactions are vital to clearing the sars-cov- infection [ ] [ ] [ ] [ ] [ ] . the study presented, herein, is an attempt to get insights about antigenic determinants of sars-cov- spike glycoprotein and highlight all antigenic epitopes [ ] of the spike that can be used specifically for the design of a multi-epitope peptide vaccine construct (mepvc) [ ] to counter covid- infections. the epitopes predicted by immunoinformatics techniques were fused together as well as to β-defensin adjuvant [ , ] to boost the antibody production and long- the mepvc affinity for an appropriate immune receptor as an agonist was checked in the step of molecular docking [ ] . tlr available under pdb id of ziw was retrieved and used as a receptor molecule. 
tlr also named as cd is a transmembrane protein belongs to the family of pattern recognition receptor [ ] . it detects viral infection-associated dsrna and evoke the activation of interferon regulatory transcription factor (irf ) and (nuclear factor kappa-lightchain-enhancer) nf-kb [ ] . unlike other tlrs, tlr uses tir-domain-containing adapterinducing interferon-β (trif) as a primary adapter [ ] . irf eventually induces the development of type i interferons leading to the activation of innate immune system and eventually to long lasting adaptive immunity [ ] . the tlr receptor and mepvc were used in a blind docking approach through an online patchdock server interface [ ] . the interacting molecules were docked according to the shape complementarity principle. the clustering rmsd is allowed to default . Å. the output docked solutions were immediately refined with fast interaction refinement in molecular docking (firedock) server [ ] which provides an efficient framework for refining patchdock complexes. the refined complexes were examined and one with lowest global energy was considered as top ranked. the opted complex was subjected to in-depth mepvc conformation with respect to the tlr using ucsf chimera . . [ ] . md simulation was applied on the selected top complex for -ns to understand complex dynamics and stability for practical applications. this assay was categorized into three phases: (i) parameters file preparation (ii) pre-processing, and (iii) simulation production [ ] . in first phase using an antechamber module of amber [ ] , complex libraries and set of parameters for tlr and mepvc were generated. the complex system was solvated into Å tip p solvation box achieved through leap module of amber. the intermolecular and intramolecular interactions of the system were determined by ff sb force [ ] field. counter ions in the form of na + were added to the system for charge neutralization. 
in the system pre-processing stage, the energy of the complex was minimized in several rounds: (i) minimization of the complete set of hydrogen atoms for steps, (ii) minimization of the solvation box for steps with a restraint of kcal/mol -Å on the remaining system, (iii) minimization of the complete set of system atoms for steps with a restraint of kcal/mol -Å applied to the carbon alpha atoms, and (iv) steps of minimization on the non-heavy atoms with a restraint of kcal/mol -Å on the other system components. the complex was then heated gradually to k in the nvt ensemble, with temperature maintained through langevin dynamics [ ] and the shake algorithm [ ] used to constrain covalent bonds involving hydrogen atoms. the complex was equilibrated for -ps. pressure on the system was maintained using the npt ensemble with a restraint of kcal/mol -Å on the cα atoms. in the production phase, trajectories of -ns were generated with a time step of -fs. non-bonded interactions were treated with a cut-off distance of . Å. the cpptraj module [ ] was then used for statistical computation of different structural parameters to probe the stability of the complex. the md simulation trajectories were visualized and analyzed in visual molecular dynamics (vmd) . . [ ] . axial frequency distribution, or simply afd [ ] , is a novel analytical technique run on simulation trajectories to assess the ligand d conformation with respect to a reference receptor atom. such local structural movements are not captured by any other available technique. afd can be expressed mathematically by eq. i, where i and j are the ligand atom coordinates on the x and y axes with cut-off values k and l, respectively, and m i,j sums the frequency of interactions that fall at the coordinate (i, j). the interaction energy and solvation free energy for the tlr receptor, the mepvc and the tlr -mepvc complex were calculated using the mmpbsa.py module [ ] of amber . an average of these values was then estimated as the net binding free energy of the system.
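the arithmetic behind a net binding free energy of this kind can be sketched as follows. in end-point methods such as mm-pbsa and mm-gbsa the binding free energy is conventionally taken as the free energy of the complex minus the sum of the free energies of the receptor and the ligand, evaluated per trajectory frame and then averaged. the function name and the per-frame energies below are hypothetical; this is not the mmpbsa.py interface, only the bookkeeping it is assumed to perform.

```python
from statistics import mean


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Return (mean dG_bind, per-frame dG_bind list).

    Each argument is a list of per-frame free energies (kcal/mol) for the
    complex, the receptor alone and the ligand alone, in the same frame order.
    dG_bind(frame) = G_complex - (G_receptor + G_ligand)
    """
    if not (len(g_complex) == len(g_receptor) == len(g_ligand)):
        raise ValueError("all three energy series must have the same length")
    if not g_complex:
        raise ValueError("at least one frame is required")
    per_frame = [c - (r + l) for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    return mean(per_frame), per_frame


# Hypothetical per-frame energies for two frames (illustrative numbers only).
avg_dg, frames_dg = binding_free_energy(
    g_complex=[-1500.0, -1510.0],
    g_receptor=[-1000.0, -1005.0],
    g_ligand=[-450.0, -455.0],
)
```

a negative average indicates that the bound state is favoured over the separated receptor and ligand under the chosen energy model.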
the binding free energy was computed with the mm-pbsa method and its counterpart mm-gbsa in amber , with the objective of deriving the difference between the bound and unbound states of solvated conformations of the same molecule [ ] . . computational approach adopted for the design of a sars-cov- spike protein based mepvc. the sars-cov- spike protein was targeted for mepvc design because it passed several filters required of a potential vaccine candidate. first, it does not share any significant homology with the human host, so the chances of autoimmune responses are negligible [ ] . second, the protein has no sequence identity to the mouse proteome, so accurate immunological findings can be obtained from in vivo mouse experiments [ ] . third, the spike protein harbors only one transmembrane helix, which makes wet-lab cloning and expression for antigen analysis easier [ ] . antigenicity is another factor that makes this candidate highly suitable for vaccine design, as it allows efficient binding to the products of the host immune system [ ] . further, the protein is strongly adhesive, which makes it an excellent target for an adhesion-based vaccine [ ] . lastly, the sequences of the sars-cov- spike protein are highly conserved, so a vaccine based on this sequence is likely to have broad-spectrum immunological implications [ ] . prioritization of potential epitopes for the sars-cov- spike protein began with the mapping of b-cell epitopes, which predicted a total of epitopes of varying length ranging from one to (s table ). the average score predicted for these b-cell epitopes is . , with a maximum (max) of . and a minimum (min) of . (fig. ) . each b-cell epitope was then analyzed for predicted binding to mhc-i alleles [ ] . the predicted epitopes were then screened, and a stringent criterion of lowest percentile score was used to select the best binders. 
afterward, the b-cell epitopes were also analyzed for binding to mhc-ii alleles [ ] . as for mhc-i, a reference set of mhc-ii alleles was used: hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , hla-drb * : , and hla-drb * : . the predicted mhc-ii epitopes were also filtered on the basis of percentile score and then cross-checked against the selected mhc-i epitopes; only those common to both classes, in number, were considered further. the shortlisted common mhc-i and mhc-ii epitopes were then subjected to an antigenicity check, which evaluates the ability of the filtered b-cell derived t-cell epitopes to evoke and bind to products of adaptive immunity. this yielded epitopes, all of which have a strong ability to bind to the most prevalent drb* allele, with an average ic score of . , max of and min of . . the antigenic epitopes then underwent an allergenicity check to discard peptides that may cause allergic reactions [ ] . this resulted in epitopes. non-toxic epitopes were whereas were ifn-gamma producers (fig. ) . the sets of epitopes obtained at the different stages of the epitope mapping phase are tabulated in table . these epitopes appear to provide coverage to % of the world population (fig. ) and green peaks are those predicted as epitopes and non-epitopes, respectively. the prioritized t-cell epitopes derived from b-cell epitopes were fused together in tandem by aay linkers to make a multi-epitope peptide (mep). the aay linker avoids the formation of junctional epitopes and thereby enhances epitope presentation [ ] . an eaaak linker was added to the n-terminus of the mep to attach β-defensin as an adjuvant, giving the final mepvc design. the mepvc is shown schematically in fig. a . a mepvc offers many advantages over a separate antigenic peptide. such vaccines induce both cd + and cd + responses and antigen optimization is optimal. 
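the assembly scheme described above (adjuvant at the n-terminus, an eaaak linker, then epitopes joined in tandem by aay linkers) can be sketched in a few lines. this is a minimal illustration only: the function name is hypothetical and the adjuvant and epitope strings are placeholders, not the sequences selected in the study.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def build_construct(adjuvant, epitopes, adjuvant_linker="EAAAK", epitope_linker="AAY"):
    """Return adjuvant + adjuvant_linker + epitopes joined by epitope_linker."""
    if not epitopes:
        raise ValueError("at least one epitope is required")
    for seq in [adjuvant, *epitopes]:
        if not seq or set(seq.upper()) - VALID_RESIDUES:
            raise ValueError(f"not a valid protein sequence: {seq!r}")
    multi_epitope_peptide = epitope_linker.join(e.upper() for e in epitopes)
    return adjuvant.upper() + adjuvant_linker + multi_epitope_peptide


# Placeholder sequences (hypothetical, for illustration only).
construct = build_construct("MKT", ["GGGG", "SSSS", "TTTT"])
```

keeping the linkers as parameters makes it easy to compare alternative spacer choices without touching the epitope list.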
eaaak is a rigid spacer that separates the attached domains and promotes efficient immune processing of the epitopes [ ] . β-defensins are potent immune adjuvants, as they can significantly enhance the production of lymphokines, resulting in antigen-specific ig production and t cell-dependent cellular immunity. the sequence of the mepvc is: giintlckyycrvrggrccvcsccpkeeqigkcstrgrkccrrkkecaakyawnrkcisacyiapgqtgkiccyprrarsvcsacytvydpcqpcaayvydplcpelcaycknhtscdv. the physicochemical properties of the mepvc were evaluated to help experimentalists set up in vitro and in vivo experiments accordingly. the mepvc spans amino acids and has a molecular weight of . kda. a vaccine construct with a weight below kda is generally believed to be an effective vaccine target because it is easier to purify. the theoretical pi of the mepvc is . , which aids in locating the mepvc on a d gel. the aliphatic index of the mepvc is . , suggesting that the vaccine is thermostable at different temperatures. the total numbers of negatively charged and positively charged residues are and , respectively. the grand average of hydropathicity (gravy) score computed for the mepvc is - . , indicating that the protein is hydrophilic and likely to interact with water molecules. the estimated half-life of the mepvc is hours in mammalian reticulocytes in vitro, greater than hours in yeast in vivo, and greater than hours in escherichia coli in vivo. the antigenicity of the mepvc was cross-checked, and the construct was predicted to be highly antigenic with a value of . . the total entropy of the protein is . , which is considered ideal, and the vaccine has no transmembrane helices (alpha helical transmembrane protein, . and beta barrel transmembrane protein, . ), so no difficulties are anticipated in cloning and expression analysis. the predicted solubility of the mepvc upon overexpression is . , reflecting high solubility. 
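two of the properties reported above, the gravy score and the counts of charged residues, are simple enough to recompute by hand. the sketch below assumes the conventional definitions: gravy as the mean kyte-doolittle hydropathy over all residues, negatively charged residues as asp + glu, and positively charged residues as arg + lys. the source does not state which tool or scale was used, so these definitions and the function names are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[res] for res in seq) / len(seq)


def charged_counts(sequence):
    """Return (negatively charged, positively charged) residue counts.

    Negative = Asp (D) + Glu (E); positive = Arg (R) + Lys (K).
    """
    seq = sequence.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive
```

a negative gravy value indicates a hydrophilic protein overall, which is how the score is interpreted in the text; passing the construct sequence as a lowercase string works because both functions upper-case their input.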
the d model of the mepvc was constructed using the ab initio dpro predictor, as no appropriate template was available for homology modeling or threading. the d structure of the mepvc is shown in fig. b . the structure placed . % of residues in the ramachandran favored region and . %, . % and % of residues in the additionally allowed, generously allowed and disallowed regions, respectively. because the predicted mepvc contains a number of loop regions, these needed to be modeled properly before moving forward. in total, five sets of residues were loop modeled: ala -lys , ile -gly , cys -arg , thr -pro , and asn -val . loop modeling increased the percentage of ramachandran favored residues to . %, reduced residues in the allowed region to . % and in the generously allowed region to . %, and left disallowed residues at %. the structure was then subjected to perturbation and relaxation to obtain a refined model. among the generated structures (s- table ), the first model was selected, as it has an improved rama favored score, the lowest galaxy energy of . , an improved clash score of . and a good molprobity value. the refined structure also lacks poor rotamers, in contrast to the original structure. the ramachandran statistics for the refined d structure are as follows: favored region ( . %), additionally allowed region ( . %), generously allowed region ( . %) and disallowed region ( %). the z-score of the refined mepvc is - . , within the range of scores for proteins of similar size in structural databases (s- fig. ). fig. . a. schematic depiction of the mepvc. b. the original predicted and refined d mepvc structures, along with their respective ramachandran plots. aay linkers are shown in red, epitopes in coal, and the eaaak linker in yellow. cyan represents the β-defensin adjuvant. in the ramachandran plots, the torsion angles are shown as black squares dispersed across the core secondary structure regions (colored red). 
the allowed regions are shown in yellow, generously allowed regions in pale yellow and disallowed regions in white. the top right, top left, bottom left and bottom right quadrants correspond to left-handed alpha helices, beta sheets, right-handed alpha helices, and no regular elements, respectively. further, disulfide engineering of the mepvc was performed to optimize molecular interactions and confer stability by attaining a precise geometric conformation [ , ] . eight pairs of residues were selected for replacement with cysteine. these pairs are: gln -ala (χ angle,+ , energy value, . kcal/mol), cys -leu (χ angle,+ . , energy value, . kcal/mol), lys -ala (χ angle,+ . , energy value, . kcal/mol), arg -ala (χ angle,+ . , energy value, . kcal/mol), ala -ala (χ angle,- . , energy value, . kcal/mol), ala -ala (χ angle, , energy value, kcal/mol), leu -glu (χ angle, - . , energy value, . kcal/mol), and phe -pro (χ angle,- . , energy value, . kcal/mol). these residue pairs, which have either a high energy (> kcal/mol) or a χ angle out of range (< − or > + ), were selected deliberately so that the mutation would stabilize them. the original and disulfide mutant mepvc structures are shown in fig. . the primary purpose of in silico cloning of the mepvc was to guide molecular biologists and genetic engineers on possible cloning sites and the predicted level of expression in a specific expression system; in this study we used the e. coli k system. prior to cloning, the mepvc sequence was reverse translated with codon usage optimized for e. coli k to maximize expression. the cai value of the improved mepvc sequence is , indicating ideal expression of the vaccine [ ] . the gc content is . %, close to that of e. coli k and within the optimum range of % to %. the cloned mepvc is shown in fig. . 
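the gc-content check on the codon-optimized sequence can be reproduced with a few lines of code. the sketch below is illustrative only: the function names are hypothetical, the acceptable range is passed in by the caller rather than assumed, and the cai calculation (which needs a host codon usage table) is not covered.

```python
def gc_content(dna: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def gc_within_range(dna: str, lower_pct: float, upper_pct: float) -> bool:
    """True if the GC percentage lies in [lower_pct, upper_pct].

    The bounds are supplied by the caller; no default range is assumed here.
    """
    return lower_pct <= gc_content(dna) <= upper_pct
```

in practice the reverse-translated mepvc dna sequence would be passed in together with the optimum range accepted for the chosen expression host.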
both primary and secondary immune responses appear to contribute significantly against the pathogen and may be comparable to the actual immune response. the in silico host immune system response to the antigen is shown in fig. . a high concentration of igg + igg and igm was observed in the primary response, followed by igm, igg + igg and igg at both the primary and secondary stages, with a concomitant reduction in antigen. additionally, robust interleukin and cytokine responses were observed. together, these suggest an efficient immune response and clearance of the pathogen upon subsequent encounters. the elevated b cell population, including memory cells and different isotypes, in response to the antigen points to long-lasting memory formation and isotype switching. the t helper and cytotoxic t cell populations, and their respective memory development, are in agreement with a strong response to the antigen. molecular docking of the designed mepvc to one representative innate immune receptor, tlr , was carried out to assess the potential of the mepvc to bind innate immune receptors. this was important to understand, as tlr is significant in the recognition of virus-associated molecular patterns and in the activation of type i interferons and nf-kappa b. the docking assessment predicted top complexes, sorted mainly on scoring functions along with the area size of the interacting molecules, the desolvation energy, and the actual rigid transformation of the complexes. the complexes were then submitted to the firedock web server for refinement. this helps to discard flexibility errors of the docking procedure and provides a deeper refinement of the predictions, thus limiting the chances of false positive docking calculations. according to the global energy, solution was considered the best complex, with a net global energy of - . kj/mol ( table ). this energy comprises - . kj/mol attractive van der waals (vdw), . 
kj/mol repulsive vdw, . kj/mol atomic contact energy (ace), and - . kj/mol hydrogen bond energy. the docked conformation and the chemically interacting residues of the mepvc with tlr are shown in fig. . visual inspection of the complex shows deep binding of the mepvc at the center of tlr , favored by extensive hydrogen bonding and weak van der waals interactions with various residues of tlr . within Å, the mepvc was observed to form interactions with his , val , asn , asp , phe , val , asn , gln , his , thr , glu , ser , his , thr , his , gln , glu , lys , lys , glu , asn , ser , asp , ser , tyr , tyr , phe , tyr , lys , tyr , tyr , asn , his , asn , and glu . the stability of the mepvc with tlr was further investigated through md simulations. the md trajectories were used in statistical analyses of backbone stability and residual flexibility. root mean square deviation (rmsd) [ , ] was computed first; it measures the average distance between backbone carbon alpha atoms of superimposed frames (fig. a) . the average rmsd for the system is . Å, with a max of . Å at -ns. an initial sudden change in rmsd can be seen up to . -ns, which may be due to adjustments made by the complex when exposed to dynamic forces and the milieu. a second, minor rmsd shift can be seen between -ns and -ns. afterward, the system is quite stable, with no global or local conformational changes evident. next, root mean square fluctuation (rmsf) [ ] was computed on the system trajectories (fig. b) . rmsf is the average mobility of each residue of the complex about its mean position. the mean rmsf for the mepvc-tlr complex is . Å, with a max of . Å at the n-terminus of the mepvc. most of the mepvc residues that interact with tlr undergo only minor fluctuations, consistent with the high stability of the complex. 
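the rmsd and rmsf definitions above can be written out explicitly. the following is a minimal sketch in plain python, assuming the frames have already been superimposed on a common reference and that each frame is a list of (x, y, z) coordinates for the same atoms in the same order; the function names are illustrative and this is not the trajectory analysis tool used in the study.

```python
import math


def rmsd(frame, reference):
    """Root mean square deviation between two superimposed coordinate sets."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsf(frames):
    """Per-atom root mean square fluctuation about the mean position."""
    if not frames:
        raise ValueError("at least one frame is required")
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean_pos = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that mean position.
        msd = sum(
            sum((f[i][k] - mean_pos[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

rmsd gives one value per frame (a time series when applied to every frame against the first), whereas rmsf gives one value per atom or residue over the whole trajectory.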
thermal residual deviation was assessed next by the beta-factor (β-factor) [ ] , the outcome of which is strongly correlated with the rmsf and hence further affirms system stability (fig. c) . the average β-factor of the system is . Ų, with a max of . Ų. lastly, we evaluated the compactness of the system by means of radius of gyration (rg) [ ] analysis (fig. d) . a low rg indicates a tightly packed, compact system, whereas a high rg indicates looser packing. it also tells us whether the system of interest is ordered or not. a highly compact system is an indication of system stability, and vice versa. the mean rg for our system is . Å, with a max of . , reflecting the highly ordered and compact nature of the system. hydrogen bonds are dipole-dipole attractive forces formed when a hydrogen atom bonded to a highly electronegative atom such as f, n, or o is attracted by another electronegative atom [ ] . the strength of a hydrogen bond varies from kj to kj per mole. hydrogen bonds are deemed vital in molecular recognition and provide rigidity in achieving a stable conformation [ ] . the number of hydrogen bonds in each frame of the md simulation trajectories can be visualized in fig. a . these hydrogen bonds were extracted by means of the vmd hbonds plugin and are in number, as tabulated in table . the cut-off distance was set to . Å and the cut-off angle to degrees. each residue pair may form one, two or more hydrogen bonds, each of which is counted separately. the min, mean and max numbers of hydrogen bonds between the mepvc and tlr are , , and , respectively. the distribution and bonding pattern of intermolecular interactions of the mepvc residue atoms with respect to tlr were studied through the radial distribution function (rdf) (abbasi et al., ; donohue, ; kouetcha et al., ) . the rdf describes how the density of one entity varies with the distance 'r' from another and is represented by g(r). the distance 'r' is extracted from simulation trajectories and ranges from 0 to ∞ [ ] . 
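to make the idea behind g(r) concrete, the sketch below bins the per-frame distances between one atom pair and divides each bin by the volume of its spherical shell. this is a simplified, hypothetical illustration: a full rdf is additionally normalized by the bulk density of the second species, which is omitted here, so the returned values are only proportional to g(r). the function name and parameters are assumptions, not part of any tool cited in the text.

```python
import math


def radial_profile(distances, r_max, n_bins):
    """Histogram pair distances and divide each bin by its shell volume.

    Returns a list of (bin_centre, density) tuples. Bulk-density
    normalization is deliberately omitted, so values are proportional
    to g(r) rather than equal to it.
    """
    if not distances or r_max <= 0 or n_bins <= 0:
        raise ValueError("need distances, a positive r_max and n_bins")
    dr = r_max / n_bins
    counts = [0] * n_bins
    for d in distances:
        if 0 <= d < r_max:
            # Guard against floating point rounding at the upper edge.
            counts[min(int(d / dr), n_bins - 1)] += 1
    profile = []
    for i, count in enumerate(counts):
        r_in, r_out = i * dr, (i + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (r_out ** 3 - r_in ** 3)
        profile.append((r_in + dr / 2, count / (len(distances) * shell_volume)))
    return profile
```

a sharp, tall peak at short distance corresponds to an interaction that stays tightly confined across the trajectory, while a broad, low profile corresponds to a loosely maintained contact.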
the hydrogen bonds predicted by vmd were used in the rdf analysis, which showed only those interactions between the mepvc and tlr with good affinity for each other. in these interactions, the tlr residues (atoms) asp :od , gln :he , glu :oe , asp :od , lys :hz , asp :od , glu :oe , and glu :oe were found to have strong radial distributions with respect to their counterpart mepvc residues (atoms) arg :hh , arg :o, lys :hz , arg :hh , glu :oe , asn :hd , arg :hh , and tyr :hh, respectively. the rdf plots for these interactions are illustrated in fig. b . the interaction between asp -od and arg -hh has a refined distribution pattern and the highest density distribution of all. the max g(r) value for this interaction is . , observed at a distance of Å. this is followed by glu -oe -tyr -hh, with a max g(r) value of . , mostly interacting at a distance of . Å. the glu -oe -arg -hh interaction is also well refined, with a g(r) value of . , and mostly interacts within a distance of . Å. the density distributions of the remaining interactions are not confined and vary considerably, but they are still important from the point of view of the mepvc-tlr interaction. salt bridges are non-covalent interactions between two ionized groups [ ] . they comprise two parts: an electrostatic interaction and a hydrogen bond. in salt bridges, lysine or arginine typically acts as the base and glutamate or aspartate as the acid, and the bridge is created when the carboxylic acid group donates a proton to the guanidine group of arginine or the amine group of lysine. salt bridges are among the strongest non-covalent interactions and contribute substantially to biomolecular stability [ ] [ ] [ ] . in total, salt bridges were identified between tlr and the mepvc within the cut-off distance of . Å, as depicted in fig. c . the highest number of salt bridges was recorded for tlr -glu and mepvc-lys . the mean number of salt bridges for this interaction is , with a max of and a min of . 
the counts for the other salt bridges from tlr to the mepvc are as follows: asp -arg (mean, , max, and min, ), glu -lys (mean, , max, and min, ), asp -arg (mean, , max, and min, ), glu -arg (mean, max, and min, ), glu -lys (mean, max, and min, ), and glu -lys . the key hydrogen bond interactions between the tlr receptor and the mepvc shortlisted by vmd were subjected to a novel axial frequency distribution (afd) analysis to elucidate the d movements of mepvc atoms with respect to a reference tlr residue atom over the simulation time. to this end, the interactions mentioned in the rdf analysis were used in afd. preliminary investigation suggested that only three interactions, tlr -asp -mepvc-arg , tlr -glu -mepvc-tyr , and tlr -glu -mepvc-arg , occur frequently and are found in most of the simulation frames. tlr -asp -mepvc-arg is found in frames, tlr -glu -mepvc-tyr in , and tlr -glu -mepvc-arg in , making these interactions ideal for interpreting the density distribution of the interactions on the xyz planes and for understanding conformational changes of the interacting atoms with respect to each other. as local structural movements and rotations are responsible for functional shifts, it is important to understand them in our system. for tlr -asp -mepvc-arg (fig. ) , the density distribution is not uniform; it is dispersed and shows flexible affinity on all three axes for the receptor atom. in parallel, the strength of the interaction is affected by these minor structural movements of the mepvc residue atom. nevertheless, this interaction shows that the mepvc remains within the vicinity of the tlr reference residue, which retains flexible contact with the mepvc residue during the simulation. the tlr -glu -mepvc-tyr interaction (fig. ) has a smaller distribution area and much higher intensity, illustrating the strong affinity of the interacting atoms for each other. 
it also indicates smaller movements of the atoms with respect to each other, an indication of a correct system conformation. the distribution area of tlr -glu -mepvc-arg is much more dispersed, though high intensity of the interaction can be seen in close vicinity (fig. ) . the net free energy of binding (Δtotal) in both the gb and pb models reveals a favorable mepvc-tlr complex in pure water. the net gb and pb energies for the mepvc-tlr complex are - . kcal/mol and - . kcal/mol, respectively. a high contribution to this energy comes from the gas phase energy (Δg gas), compared with a far smaller contribution from the solvation energy (Δg solv). in the gb model, the Δg gas energy for the system is - . the net free energy of the simulated system was subjected to per-residue and pairwise residue decomposition to identify the residues that contribute most to system stabilization and lower energy. molecular docking studies identified residues from the tlr receptor that are in direct contact with the mepvc, but the per-residue decomposition showed that among these residues only hie , phe , glu , hie , met , ile , thr , asp , glu , glu , glu , ser , phe , asn , ser , met , asp , tyr , tyr , phe , tyr , tyr , glu , hie , and tyr are hotspots, as they contribute strongly to binding with the mepvc at the docked site. the side chains of hie , thr , asp , glu , ser , tyr , and glu contribute significantly to chemical interactions, with energy values of - . kcal/mol, - . kcal/mol, - . kcal/mol, - . kcal/mol, - . kcal/mol, - . kcal/mol, and - . kcal/mol, respectively. the mepvc residues interacting with these tlr hotspot residues also have quite low energies, illustrating high affinity for the receptor residues. the binding free energies of the tlr -mepvc complex, the tlr receptor and the mepvc, and the net system energy, were further decomposed per frame extracted from the simulation trajectories (s- fig. ). 
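the per-frame decomposition can be illustrated with the standard end-state relation, in which the binding energy of a frame is the complex energy minus the receptor and ligand (here, construct) energies. the sketch below assumes three equally long lists of per-frame energies in kcal/mol; the function names are hypothetical and this is not the mmpbsa.py workflow itself.

```python
from statistics import mean


def binding_energies(complex_e, receptor_e, ligand_e):
    """Per-frame binding energy: E(complex) - E(receptor) - E(ligand)."""
    if not complex_e or not (len(complex_e) == len(receptor_e) == len(ligand_e)):
        raise ValueError("need three non-empty lists of equal length")
    return [c - r - l for c, r, l in zip(complex_e, receptor_e, ligand_e)]


def summarize(values):
    """Minimum, maximum and mean of a list of per-frame energies."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def most_favorable_frame(values):
    """Index of the frame with the lowest (most favorable) energy."""
    return min(range(len(values)), key=lambda i: values[i])
```

the frame returned by most_favorable_frame marks the point in the simulation with the highest predicted intermolecular affinity, and the summary corresponds to the min, max and average values reported for each energy term.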
this information is vital in identifying the simulation time at which the highest intermolecular affinity was observed and can point to the most suitable docked conformation. in general, the complex, receptor and construct energies are higher in pb than in gb, but for the total energy the pb energies are considerably lower across frames than those of gb. for the complex, the min, max and average binding energies in gb are - . kcal/mol, - . kcal/mol, and - . kcal/mol, respectively. the pb max frame energy is - . kcal/mol, the min is - . kcal/mol and the average is - . kcal/mol. the gb receptor max is - . kcal/mol whereas the min is - . pairwise energy decomposition of the net system energy was performed to understand the role of residue pairs from both tlr and the mepvc in complex stability. we found that thr and asp (- . kcal/mol in gb and - . kcal/mol in pb), glu and ser (- . kcal/mol in gb and - . kcal/mol in pb), and glu and hie (- . kcal/mol in gb and - . kcal/mol in pb) of the tlr receptor make a high combined contribution to the net energy. in the case of the mepvc, the main contributing pairs are asn and thr (- . kcal/mol in both gb and pb), val and arg (- . kcal/mol in gb and - . kcal/mol in pb) and arg and arg (- . kcal/mol and . kcal/mol). taken together, we characterized the sars-cov- spike glycoprotein for antigenic peptides and proposed a mepvc by means of several computational immunological methods and biophysical calculations. the outcomes of this study could save the time and cost that go into experimental epitope target studies. the mepvc is predicted to activate all components of the host immune system and has suitable structural and physicochemical properties. it also appears to have very stable dynamics with the tlr innate immune receptor and thus has a higher chance of presentation to the host immune system. however, additional in vivo and in vitro experiments are needed to establish its potential in the fight against covid- . no conflict of interest was reported by the authors. 
the authors are grateful to the pakistan-united states science and technology cooperation program (grant no. pak-us/ / ) for financial assistance. fig. . prosa-z energy plot for the mepvc. fig. . residue-wise decomposition of net binding energy into tlr receptor and mepvc interacting residues. top (gb) and bottom (pb). fig. . binding energy decomposition per frame for the tlr -mepvc complex (a), tlr receptor (b), mepvc (c) and net energy (d). s- table . b cell epitopes predicted for the sars-cov- spike glycoprotein. s- table . top refined model of the mepvc. the input mepvc is also provided. covid- , an emerging coronavirus infection: advances and prospects in designing and developing vaccines, immunotherapeutics, and therapeutics a review of the novel coronavirus (covid- ) based on current evidence emerging coronaviruses: genome structure, replication, and pathogenesis the deadly coronaviruses: the sars pandemic and the novel coronavirus epidemic in china zoonotic origins of human coronaviruses estimating clinical severity of covid- from the transmission dynamics in wuhan, china novel, others, the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) in china, zhonghua liu xing bing xue za zhi= zhonghua liuxingbingxue zazhi structure, function, and antigenicity of the sars-cov- spike glycoprotein structure, function, and evolution of coronavirus spike proteins the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex cryo-em structure of the -ncov spike in the prefusion conformation tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion cryo-electron microscopy 
structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding shi, others, immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen lanzavecchia, others, unexpected receptor functional mimicry elucidates activation of coronavirus fusion cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains others, characterization of novel monoclonal antibodies against merscoronavirus spike protein wu, others, identification of the immunodominant neutralizing regions in the spike glycoprotein of porcine deltacoronavirus development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach therapeutic options for the novel coronavirus ( -ncov) receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus greenough, others, angiotensin-converting enzyme is a functional receptor for the sars coronavirus the novel coronavirus ( -ncov) uses the sars-coronavirus receptor ace and the cellular protease tmprss for entry into target cells stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis others, a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster perspectives on therapeutic neutralizing antibodies against the novel coronavirus sars-cov- longitudinal profile of immunoglobulin g (igg), igm, and iga antibodies against the severe acute respiratory syndrome (sars) coronavirus nucleocapsid protein in patients with pneumonia due to the sars coronavirus disappearance of antibodies to sars-associated coronavirus after recovery an efficient method to make human monoclonal antibodies from memory b cells: potent neutralization of sars coronavirus anti--spike igg causes severe acute lung injury by skewing 
macrophage responses during acute sars-cov infection peptide vaccine: progress and challenges multi-epitope vaccines: a promising strategy against tumors and viral infections avian antimicrobial peptides: the defense role of $β$-defensins exploring the zika genome to design a potential multiepitope vaccine using an immunoinformatics approach exploring ns / a, ns a and ns b proteins to design conserved subunit multi-epitope vaccine against hcv utilizing immunoinformatics approaches molecular dynamics simulations of biomolecules the mm/pbsa and mm/gbsa methods to estimate ligandbinding affinities the immune epitope database (iedb): update bepipred- . : improving sequence-based b-cell epitope prediction using conformational epitopes gapped sequence alignment using artificial neural networks: application to the mhc class i system peptide binding predictions for hla dr, dp and dq molecules mhcpred: a server for quantitative prediction of peptide--mhc binding pangenome and immuno-proteomics analysis of acinetobacter baumannii strains revealed the core peptide vaccine targets virulentpred: a svm based prediction method for virulent proteins in bacterial pathogens vaxijen: a server for prediction of protective antigens, tumour antigens and subunit vaccines allertop-a server for in silico prediction of allergens peptide toxicity prediction designing of interferon-gamma inducing mhc class-ii binders development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines exploring dengue genome to construct a multi-epitope based subunit vaccine by utilizing immunoinformatics approach to battle against dengue infection scratch: a protein structure and structural feature prediction server others, galaxy: a platform for interactive large-scale genome analysis galaxyrefine: protein structure refinement driven by sidechain repacking disulfide by design . 
: a web-based tool for disulfide engineering in proteins codon usage: nature's roadmap to expression and folding of proteins jcat: a novel tool to adapt codon usage of a target gene to its potential expression host innate immune pattern recognition: a cell biological perspective molecular docking, in: mol. model. proteins a family of human receptors structurally related to drosophila toll signaling to nf-$κ$b by toll-like receptors medical microbiology e-book: a guide to microbial infections: pathogenesis, immunity, laboratory diagnosis and control. with student consult online access recognition of doublestranded rna and activation of nf-$κ$b by toll-like receptor patchdock and symmdock: servers for rigid and symmetric docking firedock: fast interaction refinement in molecular docking ucsf chimera-a visualization system for exploratory research and analysis combating tigecycline resistant acinetobacter baumannii: a leap forward towards multi-epitope based vaccine discovery the ff sb force field langevin stabilization of molecular dynamics a fast shake algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations cpptraj: software for processing and analysis of molecular dynamics trajectory data vmd: visual molecular dynamics afd: an application for bi-molecular interaction using axial frequency distribution mmpbsa.py: an efficient program for end-state free energy calculations assessing the performance of the mm_pbsa and mm_gbsa methods. . 
the accuracy.pdf identification of putative vaccine candidates against helicobacter pylori exploiting exoproteome and secretome: a reverse vaccinology based approach vaxign: the first web-based vaccine design program for reverse vaccinology and applications for vaccine development panrv: pangenome-reverse vaccinology approach for identifications of potential vaccine candidates in microbial pangenome prioritization of potential vaccine targets using comparative proteomics and designing of the chimeric multi-epitope vaccine against pseudomonas aeruginosa adhesins as targets for vaccine development exoproteome and secretome derived broad spectrum novel drug and vaccine candidates in vibrio cholerae targeted by piper betel derived compounds the mhc class i antigen presentation pathway: strategies for viral immune evasion mhc class ii proteins and disease: a structural perspective vaccine allergies predefined spacers between epitopes on a recombinant epitope-peptide impacted epitope-specific antibody response in silico design of multimeric hn-f antigen as a highly immunogenic peptide vaccine against newcastle disease virus disulphide bonds and protein stability protein disulfide engineering novel immunoinformatics approaches to design multi-epitope subunit vaccine for malaria by investigating anopheles salivary protein molecular dynamics simulation studies of novel $β$-lactamase inhibitor significance of root-mean-square deviation in comparing three-dimensional structures of globular proteins interaction mechanisms of a melatonergic inhibitor in the melatonin synthesis pathway binding mode analysis, dynamic simulation and binding free energy calculations of the murf ligase from acinetobacter baumannii radius of gyration as an indicator of protein structure compactness the hydrogen bond in molecular recognition hydrogen bonds in proteins: role and strength ultrafast scalable parallel algorithm for the radial distribution function histogramming using mpi maps radial 
distribution functions of some structures of the polypeptide chain defining the role of salt bridges in protein stability do salt bridges stabilize proteins? a continuum electrostatic analysis protein stabilization by salt bridges: concepts, experimental approaches and clarification of some misunderstandings contribution of surface salt bridges to protein stability: guidelines for protein engineering key: cord- -wqpr v p authors: yuan, xianlin; li, liangping title: the influence of major s protein mutations of sars-cov- on the potential b cell epitopes date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wqpr v p sars-cov- has spread rapidly worldwide, resulting in the covid- pandemic. the spike glycoprotein on the viral surface is a key factor in viral transmission, and many variants have appeared through gene mutations, which may influence viral antigenicity and vaccine efficacy. here, we used bioinformatic tools to analyze the b-cell epitopes of the prototype s protein and its common variants. potential linear and discontinuous b-cell epitopes were predicted from the s protein prototype. importantly, by comparing epitope alterations between the prototype and the variants, we demonstrate that the b-cell epitopes and antigenicity of the variants are altered to markedly different degrees. the dominant d g variant affects the potential epitopes least, with only moderately elevated antigenicity, while the epitopes and antigenicity of some mutants (v a, v f, etc.) with low incidence in the population change greatly. these results suggest that the currently developed vaccines should be valid for the majority of people infected with sars-cov- . this study provides a scientific basis for large-scale application of sars-cov- vaccines and for taking precautions against the possible emergence of antigen escape induced by genetic variation after vaccination. author summary the global pandemic of sars-cov- has lasted for more than half a year and has not yet been contained. 
until now there has been no effective treatment for the disease caused by sars-cov- (covid- ). successful vaccine development seems to be the only hope. however, this novel coronavirus is an rna virus with a high mutation rate in its genome, and these mutations often fall in the spike protein, which the virus uses to grip and enter cells. vaccination induces the generation of antibodies that block the spike protein. however, spike protein variants may alter antibody recognition and binding and render the vaccine ineffective. in this study, we predicted the neutralizing antibody recognition sites (b cell epitopes) of the prototype s protein of sars-cov , along with those of several common variants, using bioinformatics tools. we found variability in antigenicity among the mutants; for instance, in the more widespread d g variant the epitopes were least affected, with only a slight increase in antigenicity. however, the antigenic epitopes of some mutants change greatly. these results could be of potential importance for future vaccine design and application against sars-cov variants. on january , , the who declared the covid- outbreak a public health emergency of international concern. as of july , cases of covid- and deaths had been reported globally according to covid- situation report- (who website at https://www.who.int/emergencies/diseases/ novel-coronavirus- ). the sars-cov- virion is spherical, enveloped, and - nm in diameter, with spikes of about - nm on its surface. 
the coronaviral genome encodes proteins, four of which are major structural proteins: the spike (s), membrane (m), envelope (e) and nucleocapsid (n) proteins [ ] . each of these proteins is responsible for different [...] in the sars-cov- genome were discovered from public genome data assemblies as of may , [ ] , among which missense mutations of the s protein were detected. among these spike mutations, the d g mutation, in which aspartic acid (d) is replaced with glycine (g) at amino acid position , is a major mutation of great concern [ , ]. sars-cov- carrying the d g mutation may have triggered fatal infections in many european countries, such as spain, italy and france [ ] . these mutations will undoubtedly cause changes in the structure of the s protein. however, an important open question is whether these mutations affect the antigenicity of the s protein and its ability to bind neutralizing antibodies. if the b-cell epitopes on the s protein change and can no longer bind neutralizing antibodies, vaccines developed on the basis of the prototype s protein would lose efficacy. many immuno-bioinformatic tools have been developed for comprehensive, in-depth analysis of viral antigens, including both linear and discontinuous b-cell epitopes as well as their immunogenicity. to explore these questions, we used immuno-bioinformatic tools from the iedb and related resources to predict the b cell epitopes of the s protein from the prototype and mutated strains of sars-cov- and compared the changes in likely epitope sites caused by dominant and rare s protein mutations. we found that distinct mutations of the s protein affect its potential effective epitopes to different degrees. the extracellular domain is divided into the s and s subunits; s contains the n-terminal domain (ntd) and receptor binding domain (rbd) [ ] . 
based on the start and end positions of the different domains, the major mutations are shown in the schematic diagram of the s protein (figure a). we found that most s protein mutations ( %) are located in s and the neighboring region of the s protein. the most frequent mutation, d g, lies near the rbd. to predict the potential linear b-cell epitopes, we first used the bepipred- . prediction tool on the iedb server to screen the prototype s protein sequence and identified a total of linear b-cell epitopes (table s ), whose distribution is shown in table . among them, epitopes are in the s subunit ( in the ntd region, in the rbd domain) and in the s subunit of the s protein. based on this analysis, we found that three epitopes in the rbd domain ( ptklndl , devrqiapgqtgki , and ncyfpl ) have more significant antigenicity and accessibility. we further predicted the discontinuous epitopes with the discotope . online server, using the d structure of the s protein (pdb id: vyb, chain id: a). the default threshold was − . , with % sensitivity and % specificity. the predicted discontinuous epitopes were mainly located across the whole rbd region at aa~ aa of the s protein, as shown in figure a. all of the predicted epitopes distributed on the surface of the s protein are shown on a d structure in figure b, rendered with the jsmol viewer. according to their distribution in different domains, these epitopes (table s ) could be divided into four groups (table ), and the highest propensity scores (p-score) and discotope scores (d-score) were concentrated at ~ aa of the rbd region, as shown by arrows in figure b. finally, these epitopes were validated with the pepitope tool (http://pepitope.tau.ac.il/); the three major antigen clusters were consistent with the linear b-cell epitopes mentioned above (table ). by assessing antigenicity and surface accessibility, we found that four epitopes changed (table s ).
in brief, after the h y mutation, the s protein had effective epitopes, two of which have better antigenicity than the original epitopes at sites ~ and ~ , two of which were newly generated at sites ~ and ~ , and the remaining epitopes were the same as those without the mutation. y h mutation. the y h mutation occurred in countries, but its frequency now appears to be decreasing [ ]. using the screening methods above, we found that five altered sites have distinct influences on the likely epitopes (table s ). with the y h mutation, the s protein had effective epitopes, two of which were newly produced from originally unlikely epitopes at sites ~ , three of which have better antigenicity than the original epitopes at sites ~ , ~ and ~ , and the remaining epitopes were conserved. the antigenicity of the epitope at site ~ was slightly reduced. therefore, effective epitopes were predicted after the g s mutation, which changed the largest number of epitopes and decreased the overall antigenicity (table s ). the background of the v f mutation is consistent with that of v i. using the prediction tool described above, we found that alterations directly affected b cell epitopes, in clear contrast to the v i mutation (table s ). to investigate the influence of the above common s protein mutations on b cell epitopes, we compared the predicted epitopes of the reference and mutant s proteins, analyzed the association of epitope changes among mutations, and determined the influence of each mutation on b cell epitopes. the detailed changes for each mutation are listed in table s . we found that some mutations changed b-cell epitopes slightly or not at all, while others strongly affected the number and location of b-cell epitopes. all the major changes from the b-cell epitope comparison are summarized in table . the most important finding is that the commonest mutation, d g, changes the b-cell epitopes of the s protein only slightly, moderately increasing the accessibility and antigenicity of epitope - .
there are potential epitopes in the d g mutant, nearly identical to those without the mutation. in the d g and v i mutations, the number of effective epitopes was also , in which only epitope at the same site table , and of them are concentrated on and near the rbd domain. in particular, the most frequently occurring mutation, d g, is located at the junction of s and s , near the furin cleavage site at the s /s boundary. walls reported that deletion of this cleavage region could influence sars-cov- s-mediated entry into host cells [ ]. hence, korber [ ] and zhang [ ] proposed that the d g mutation contributes to the spread of sars-cov- , which allowed the g strain to swiftly become the dominant mutant. mutations in the s protein may affect the b-cell epitopes and lead to vaccine failure. therefore, to explore the impact of mutations on the antigenicity of the s protein, we applied immuno-informatics tools to predict potential b-cell epitopes of the prototype and variant s proteins. (table ) and discontinuous epitopes such as no. (table ) could be vaccine candidate targets. importantly, it is worth exploring whether mutations in the s protein lead to epitope changes. therefore, we used a group of b-cell epitope prediction tools to analyze the prototype and variant s proteins. the primary sequence of the sars-cov- s protein was retrieved from the ncbi genbank database using accession number qho . ; this sequence has been used as the prototype or reference sequence for vaccine development in many projects [ ]. its complete genome number is nc_ . the major variant sequences were available from the global initiative for sharing all influenza data (gisaid) [ ] and genbank. we used the sequence from early onset sars-cov- as the wildtype or prototype and the recent variant viruses as mutant strains to predict the b-cell epitopes of the s protein. the signal peptide (sp), tm and cytoplasmic regions were excluded from the s protein sequence, and only the ectodomain of the s protein was used for analysis.
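the sequence preparation described above (a prototype sequence, single-residue variants, and restriction to the ectodomain) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' pipeline: the toy sequence, the positions, and the helper names `apply_substitution` and `ectodomain` are all hypothetical, and real coordinates would come from the genbank record and domain annotation.

```python
def apply_substitution(seq: str, ref: str, pos: int, alt: str) -> str:
    """Return seq with residue `ref` at 1-based position `pos` replaced by `alt`.

    Raises ValueError if the position is out of range or the reference residue
    does not match, which guards against off-by-one numbering errors.
    """
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def ectodomain(seq: str, start: int, end: int) -> str:
    """Keep residues start..end (1-based, inclusive); everything else is dropped."""
    return seq[start - 1:end]


# Toy prototype sequence (hypothetical; NOT the real spike protein).
prototype = "MKTDAYQLNGSVPE"

# Hypothetical D -> G substitution at toy position 4.
variant = apply_substitution(prototype, "D", 4, "G")

# Hypothetical ectodomain boundaries (residues 3..12 of the toy sequence).
proto_ecto = ectodomain(prototype, 3, 12)
variant_ecto = ectodomain(variant, 3, 12)
```

checking the reference residue before substituting is a deliberate choice: variant names encode both the original residue and its position, so a mismatch usually signals that the sequence numbering differs from the one the variant name assumes.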
the linear and non-linear (discontinuous) b cell epitopes were predicted with different tools. the linear epitopes were predicted with the bepipred- . server of the iedb online database [ , ]. the threshold was set to . , corresponding to a sensitivity of % and a specificity of %. the results are shown in a figure in which residues with scores above the threshold, predicted to be part of an epitope, are colored yellow. effective b-cell epitopes rely on stronger antigenicity and accessibility of surface
a pneumonia outbreak associated with a new coronavirus of probable bat origin identification of a novel coronavirus in patients with severe acute respiratory syndrome genome composition and divergence of the novel coronavirus ( -ncov) originating in china. cell host & microbe a novel coronavirus associated with severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia properties of coronavirus and sars-cov- structure, function, and antigenicity of the sars-cov- spike glycoprotein variant analysis of sars-cov- genomes. bulletin of the world health organization sars-cov- viral spike g mutation exhibits higher case fatality rate spike (s) protein be associated with higher covid- mortality? spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- structure of the sars-cov- spike receptor-binding domain bound to the ace receptor rna based mngs approach identifies a novel human coronavirus from two individual pneumonia cases in wuhan outbreak. emerging microbes & infections the early landscape of covid- vaccine development in the uk and rest of the world preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies covid- , an emerging coronavirus infection: advances and prospects in designing and developing vaccines, immunotherapeutics, and therapeutics the sars-cov- vaccine pipeline: an overview.
current tropical medicine reports the d g mutation in the epitope-based peptide vaccine design and target site depiction against middle east respiratory syndrome coronavirus: an immune-informatics study cov spike protein: a key target for antivirals. expert opinion on therapeutic targets sars-cov- identified by high-throughput single-cell sequencing of convalescent patients' b cells potent cross-reactive neutralization of sars coronavirus isolates by human monoclonal antibodies. proceedings of the national academy of sciences a -amino acid fragment of the sars coronavirus s protein efficiently binds angiotensin-converting enzyme spike mutation increases sars cov- susceptibility to neutralization global initiative on sharing all influenza data-from vision to reality clustal omega for making accurate alignments of many protein sequences msaviewer: interactive javascript visualization of multiple sequence alignments cdd/sparcle: the conserved domain database in identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties identification and validation of specific b-cell epitopes of hantaviruses associated to hemorrhagic fever and renal syndrome the immune epitope database and analysis resource: from vision to blueprint bepipred- . : improving sequence-based b-cell epitope prediction using conformational epitopes influence of protein flexibility and peptide conformation on reactivity of monoclonal anti-peptide antibodies with a protein alpha-helix conformational b-cell epitope prediction on antigen protein structures: a review of current algorithms and comparison with common binding site prediction methods bioinformatics resources and tools for conformational b-cell epitope prediction. computational and mathematical methods in medicine reliable b cell epitope predictions: impacts of method development and improved benchmarking
we are grateful to zijun shu for his management of the manuscript references.
key: cord- - inxq t authors: cuccarese, michael f.; earnshaw, berton a.; heiser, katie; fogelson, ben; davis, chadwick t.; mclean, peter f.; gordon, hannah b.; skelly, kathleen-rose; weathersby, fiona l.; rodic, vlad; quigley, ian k.; pastuzyn, elissa d.; mendivil, brandon m.; lazar, nathan h.; brooks, carl a.; carpenter, joseph; probst, brandon l.; jacobson, pamela; glazier, seth w.; ford, jes; jensen, james d.; campbell, nicholas d.; statnick, michael a.; low, adeline s.; thomas, kirk r.; carpenter, anne e.; hegde, sharath s.; alfa, ronald w.; victors, mason l.; haque, imran s.; chong, yolanda t.; gibson, christopher c. title: functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and covid- drug discovery date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: inxq t development of accurate disease models and discovery of immune-modulating drugs is challenged by the immune system’s highly interconnected and context-dependent nature. here we apply deep-learning-driven analysis of cellular morphology to develop a scalable “phenomics” platform and demonstrate its ability to identify dose-dependent, high-dimensional relationships among and between immunomodulators, toxins, pathogens, genetic perturbations, and small and large molecules at scale. high-throughput screening on this platform demonstrates rapid identification and triage of hits for tgf-β- and tnf-α-driven phenotypes. we deploy the platform to develop phenotypic models of active sars-cov- infection and of covid- -associated cytokine storm, surfacing compounds with demonstrated clinical benefit and identifying several new candidates for drug repurposing. the presented library of images, deep learning features, and compound screening data from immune profiling and covid- screens serves as a deep resource for immune biology and cellular-model drug discovery with immediate impact on the covid- pandemic. 
acting through autocrine, paracrine, and endocrine mechanisms, endogenous immune stimuli maintain homeostasis and signal response to invasion, injury, or malignancy. immune dysregulation underlies a broad set of human diseases including inflammation , autoimmune disease , neuroinflammation , neurodegenerative disease , secondary effects of traumatic brain injury , cancer , , infection [ ] [ ] [ ] , and cytokine storm , . improvements in the understanding of how immune stimuli amplify or suppress the immune system, trigger new cell fate differentiation, and remodel tissue have resulted in the discovery of a wide range of successful therapeutics , as demonstrated by the anti-tnf antibody adalimumab (humira), noted both for its discovery and its application in rheumatic disease . however, the immune system is vastly complex and dependent on cell type and context; reliably intervening in such a highly interdependent process is a formidable drug discovery challenge. with few exceptions , intercellular immune signaling has been explored by studying specific factors in isolation. although cellular response to individual immune stimuli can be effectively profiled by high-dimensional molecular methods such as rnaseq , proteomics , and chipseq , these technologies lack the speed and cost-effectiveness for systems-level analysis of large panels of immune stimuli and screening libraries across myriad inflammatory models. by contrast, image-based methods have demonstrated utility at many stages of drug discovery , including compound profiling , prediction of assay performance , clustering by mechanism of action , , and toxicology predictions . here we present phenomics, the analysis of fluorescence microscopy images as a scalable approach to examine cellular response to a wide range of perturbations. 
deep-learning algorithms extract high-dimensional and dose-dependent fingerprints of cellular morphological changes, or 'phenoprints', from images to support a variety of downstream applications. these phenoprints detect subtle morphological changes far beyond human ability, and a standardized assay pipeline allows the phenoprints of millions of cellular samples to be related across time and experimental conditions. in this work, we first demonstrate the ability to use images alone to accurately quantify and relate hundreds of immune stimuli. we then show how profiles resulting from these perturbations can be employed in drug screening, particularly highlighting the utility of a relatable dataset to accurately predict the mechanism of action for unknown compounds. we used these capabilities to rapidly develop high-throughput-ready disease models for both sars-cov- viral infection and the resulting cytokine storm, and immediately launched large-scale drug screens that recapitulated known effective and ineffective therapies and, more importantly, identified several new potential treatments for both sars-cov- infection and covid- -associated cytokine storm. in order to discern the breadth of immune response that might be captured in a single profiling assay for various applications (fig. ) , we treated endothelial cells, fibroblasts, peripheral blood mononuclear cells (pbmc), and macrophages with diverse immune stimuli (table s ), crispr gene editing reagents, antibodies, and small molecules. we used a single fully automated experimental workflow in which cells are plated, treated, labeled with a panel of fluorescent stains, and imaged with high-throughput fluorescence microscopy . 
vector representations of more than one million multi-channel fluorescence microscopy images analyzed in this manuscript were generated using a proprietary analytics workflow based on an extension of a densenet- neural network.
various cell types (top left) are treated with a range of biological perturbants and treatments (bottom left), including recombinant proteins, antibodies, crispr-based genetic modifications, and small molecules. high-throughput fluorescence microscopy (middle-top) and deep learning-enabled image featurization generate high-dimensional phenoprints that are used for interrogating a range of experimental questions (middle-top and middle-bottom). this approach suits a suite of applications (right) with a single workflow and on a single platform. specific applications demonstrated in this paper: systems biology exploration of immune relationships, adaptation across diverse disease models, compound screening, and mechanism prediction.
phenoprints are high-dimensional vector representations (embeddings) of cellular morphology derived from fluorescent images resulting from treatment with a biological perturbation. to provide landmark phenoprints across a diverse range of immune function, stimuli were added to four different primary human cell types at a range of biologically relevant concentrations and compared to untreated cells and adjacent concentrations. for example, cells treated with tumor necrosis factor alpha (tnf-ɑ), the anti-pd- antibody nivolumab, transforming growth factor beta (tgf-β), or interferon alpha (ifn-ɑ) each displayed concentration-dependent phenotype strength, measured as intra-replicate consistency, as well as increasing convergence of the phenotype (fig. a). in total, of stimuli produced highly consistent, dose-dependent phenoprints in at least one cell type (fig. b), with representation from cytokines, growth factors, chemokines, antibodies, microbial toxins, and others (fig. s ).
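the two quantities used above, intra-replicate consistency and convergence of the phenotype toward the highest dose, could be computed from replicate embeddings along the following lines. this is a minimal sketch, assuming cosine similarity as the comparison metric and plain python lists as embeddings; the function names and the toy two-dimensional vectors are hypothetical, and the actual featurization workflow is proprietary and not reproduced here.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def replicate_consistency(replicates):
    """Mean pairwise cosine similarity among the replicate wells of one dose."""
    pairs = list(combinations(replicates, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def convergence_to_top_dose(dose_replicates, top_replicates):
    """Mean cosine similarity between each replicate of a dose and each
    replicate of the highest dose."""
    sims = [cosine(u, v) for u in dose_replicates for v in top_replicates]
    return sum(sims) / len(sims)


# Toy embeddings (hypothetical): a noisy low dose and a consistent high dose.
low_dose = [[1.0, 0.0], [0.0, 1.0]]
top_dose = [[1.0, 1.0], [2.0, 2.0]]

low_consistency = replicate_consistency(low_dose)
top_consistency = replicate_consistency(top_dose)
low_convergence = convergence_to_top_dose(low_dose, top_dose)
```

in this toy case the low dose has replicates pointing in unrelated directions, so its consistency is low, while the high-dose replicates are parallel; a dose-dependent phenoprint would show both values rising with concentration.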
a subset of factors, such as innate stimulants like lps, interferons, and microbial toxins (e.g. enterotoxins) produced strong dose-dependent phenoprints in all cell types tested. by contrast, other stimuli produced phenoprints only in the expected cell type: vascular endothelial growth factor (vegf) in endothelial cells, fibroblast growth factor (fgf) family in fibroblasts, innate stimulants and interferons in macrophages and pbmcs, and tgf-β family in both macrophages and fibroblasts , . example phenoprints for indicated immune stimuli and cells. the intra-dose cosine similarity among well replicates (n= ) for each perturbation dose, demonstrating perturbation consistency(red), and average pairwise cosine similarity between each perturbation dose (blue) and the highest dose, demonstrating phenotypic convergence. b. statistically significant phenoprints induced by immune stimuli (rectangles, colored by class as in fig. c ) are connected by edges to the cell type (circles) in which the phenoprint was observed. thicker edges reflect stronger interactions. c. classes of all immune stimuli in immune perturbant library. to test whether phenoprints could capture known functional relationships, we applied hierarchical clustering and confirmed appropriate grouping for factors with similar function and/or structural homology, often in a cell-type-specific manner (fig. a , table s ). for example, related factors il- and il- , and all type-i ifns clustered in unique groups in any cell type in which a phenoprint was observed. chemokines (such as ccl- and mcp- ) and innate stimulants (such as lps and flagellin) are also readily grouped with other stimulants, but only in their appropriate cell types: macrophages and pbmcs (fig. a , green/purple edges). 
growth factors cluster in fibroblasts, consistent with their sensitivity to environmental cues for matrix remodeling and organ function, and ability to differentiate into other states such as myofibroblasts under inflammatory pressure (fig. a , blue edges) , . phenoprints also captured associations between initial stimuli and overlapping secondary effects, as in the case of clusters for activators of nfkb (tnf-ɑ, il β), interferons induced by irf , and tlr ligands (poly (i:c) variants) in endothelial cells (fig. s c ). double-stranded rna motifs such as poly (i:c) are recognized by tlr , which signals through both nfkb and irf pathways . fine-grained distinctions were visible in clustered phenoprints. for example, the tgf-β superfamily (tgf-β proteins, growth differentiation factors, activins, and müllerian inhibiting substance) formed a distinct cluster from other growth factor families, which included the epidermal growth factor (egf) family (egfs, tgf-ɑ, betacellulin, heparin-binding egf), platelet-derived growth factor (pdgf) family, fgf family and the insulin-like growth factor (igf) family (fig. s d ). within these larger groupings, nearly structurally-identical tgf-β isoforms tgf-β , tgf-β , and tgf-β clustered more tightly than do other members of the tgf-β superfamily . pathogen-derived toxins also revealed expected relationships. c. difficile toxins a and b produced cell-type-specific similarities with members of the human immune stimulant panel (fig. s e ). these toxin phenoprints were similar to those of interferon proteins in macrophages, as well as to il- superfamily members and nfkb-mediated stimulants in huvec. this aligns with the known dual pathology of c. diff. infection, involving both inflammation through macrophage activation and direct gut permeabilization effects . high-dimensional morphological relationships in four cell contexts. 
immune stimuli are arranged based on hierarchical clustering of all factors in all cell types; only factors with a statistically strong phenoprint in at least one cell type are shown. lines between nodes are filtered for similarity and colored based on the cell type in which the association was observed. b. hierarchical clustering of phenoprints resulting from huvec treated with an annotated compound library. highlighted clusters pertain to small molecule inhibitors of agc family kinases: akt (blue), rock (orange), mek (orange)/erk (blue), and mtor inhibitors: rapalogues (blue), pi k inhibitors (orange) and mtorc / (green). c. process for reduction of high-dimensional data into two dimensions. d. projections of compound response in the context of on-and off-perturbation vectors and logistical regression for on-perturbation values by drug concentration for tnf-ɑ, il- (receptor chimera) ligands, and socs knockout in huvec (mean, n= ). contour lines depict perturbation state (orange) and target state (blue) replicates at the , , and th percentiles, respectively. confidence in phenotypes resulting not only from the immune perturbation but those resulting from compounds in a screening library is critical for evaluation of high-throughput screens. to this end, we generated individual compound phenoprints from an annotated bioactive library, which indeed clustered by mechanism of action ( fig. b ). for example, inhibitors of agc kinases akt and rock clustered within a super-group as did inhibitors of mek and erk, which is expected based on protein homology and sequential signaling, respectively. more granular sub-clusters were also observed; for example, a diverse group of inhibitors of mtor can be sub-classified into rapalogues, pi k inhibitors and mtorc / inhibitors. we next tested whether phenomics could uncover compounds that rescue the complex high-dimensional effect of various immune perturbants. 
we applied several disease-related phenoprints identified above and defined a corresponding perturbation vector in high-dimensional embedding space. compounds that rescued on-perturbation morphology with minimal deviation in off-perturbation morphology were selected as screen hits, as they offer a potential combination of efficacy and specificity (fig. s a, b) . we first validated the strategy using approved anti-tnf antibodies in the context of a tnf-ɑ phenoprint (fig. d ). all tested antibodies rescued the phenoprint induced by tnf-ɑ (perturbed state) back towards the unperturbed phenoprint (target state), but infliximab was less potent relative to adalimumab and golimumab. it is known that infliximab, adalimumab, and golimumab all have similar affinities for tnfɑ , but only adalimumab and golimumab are effective in the absence of concomitant immunosuppression . although many factors can affect antibody performance, this finding suggests phenomics can differentiate subtleties of antibody efficacy beyond affinity for the ligand. next, we selected a set of clinical-stage and approved jak inhibitors to test the ability of phenomics to model compound rescue of il- signaling when cells are activated by the ligand or when disrupted by genetic modification. tofacitinib, baricitinib, ruxolitinib, and oclacitinib reverse the phenoprint of the il- receptor chimera in huvec in a dose-dependent manner. in addition to identifying compounds that block receptor-mediated signaling, we demonstrated compound-induced rescue of a phenotype resulting from knockout of an intracellular mediator. here, knockout of socs using crispr/cas leads to hyperactivation of jak signalling , resulting in a phenoprint that is rescued by the same set of compounds ( fig. d ; comparison to inactives: fig. s c ). 
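the hit-selection idea described above, rescue along a perturbation vector with little movement away from it, can be illustrated with a small sketch. everything here is an assumption made for illustration rather than the platform's actual scoring: the perturbation vector is taken as the difference between the mean perturbed and mean target embeddings, the on-perturbation score is the fraction of that vector still present in a treated sample (1 for no rescue, 0 for full rescue), and the off-perturbation score is the length of the orthogonal residual relative to the perturbation vector. the names `mean_vector` and `on_off_scores` are hypothetical.

```python
from math import sqrt


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def on_off_scores(sample, target_mean, perturbed_mean):
    """Split a sample's displacement from the target state into a component
    along the perturbation vector (on) and an orthogonal component (off).

    Both scores are expressed in units of the perturbation vector's length.
    """
    axis = [p - t for p, t in zip(perturbed_mean, target_mean)]
    axis_len_sq = sum(a * a for a in axis)
    if axis_len_sq == 0:
        raise ValueError("perturbed and target states coincide")
    delta = [s - t for s, t in zip(sample, target_mean)]
    on = sum(d * a for d, a in zip(delta, axis)) / axis_len_sq
    residual = [d - on * a for d, a in zip(delta, axis)]
    off = sqrt(sum(r * r for r in residual)) / sqrt(axis_len_sq)
    return on, off


# Toy replicate embeddings (hypothetical, three dimensions).
target_mean = mean_vector([[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]])
perturbed_mean = mean_vector([[1.5, 0.0, 0.0], [2.5, 0.0, 0.0]])

# A compound that rescues cleanly, and one that rescues partly but drifts off-axis.
clean_on, clean_off = on_off_scores([0.5, 0.0, 0.0], target_mean, perturbed_mean)
drift_on, drift_off = on_off_scores([1.0, 1.0, 0.0], target_mean, perturbed_mean)
```

under these assumptions a screen hit would combine a low on-perturbation score with a low off-perturbation score; any cutoffs would need to be set from control wells and are not specified in the text.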
well-designed phenotypic screening efforts benefit from being a more proximal model of the disease and, since they are not limited to pre-defined targets, offer the ability to uncover novel therapeutic pathways. however, this carries the risk of investing time and resources into compounds that are later discovered to interact with known or disadvantageous pathways. to address this challenge, we leveraged the relatability of our nce screening and annotated compound datasets to identify novel therapeutic opportunities while rapidly deprioritizing high-risk mechanistic space, as demonstrated in a screen against the tgf-β-induced phenoprint. tgf-β, signaling through the receptor alk , is recognized as a primary driver of fibrosis in debilitating diseases such as idiopathic pulmonary fibrosis and renal fibrosis, as well as a significant contributor to immune exclusion in the tumor microenvironment. the diversity and consistency of phenoprints recapitulating known biology suggested phenomics might discover new chemical entities (nce) and predict mechanism of action. we therefore screened , diverse chemical starting points (fig. s a) against the tgf-β phenoprint (fig. a). a novel compound of interest (rec- ) completely reversed the phenoprint at low micromolar concentrations. further, this same compound rescued an orthogonal functional validation assay, mitigating tgf-β-induced collagen deposition with an ec of . µm (fig. b). we then compared the rec- phenoprint to reference phenoprints derived from a set of , diverse, well-annotated small molecules; several known alk inhibitors were highly similar (fig. c). we experimentally validated the accuracy of these predictions in gold standard assays of alk activity: rec- inhibited cellular p-smad activity and cell-free biochemical alk activity at . µm and . µm, respectively (fig. s d, fig. d).
because the advancement of tgf-β receptor inhibitors has been hampered by cardiac toxicity , and because our research goals are to identify compounds acting against novel pathways, we rapidly deprioritized this compound in favor of others based on the primary screening data. overactive tnf-ɑ signaling is a major driver of inflammation in inflammatory and autoimmune diseases including alzheimer's disease , multiple sclerosis , and traumatic brain injury , and significant benefit has been achieved with monoclonal antibody intervention . we therefore sought to rescue the tnf-ɑ phenoprint with an intervention that not only targets a novel mechanism of action, but also benefits from advantages of small molecules over existing antibodies, such as oral availability and increased central nervous system penetration. we found , compounds that statistically altered the tnf-ɑ phenoprint at a single dose in a , -compound primary screen. we tested these for dose response and selected a subset for validation. although suppression of tnf-ɑ signaling through nfkb blockade is a plausible anti-inflammatory strategy, reduction of tnf-ɑ signaling via global inhibition of nfkb leads to a challenging safety profile . two molecules (rec- and rec- ) rescued the tnf-ɑ-induced phenoprint (fig. e ) and prevented secretion of il- , a marker of tnf-ɑ stimulation (fig. f ) while preserving nfkb activation (fig. s e) . by comparing the phenoprint of rec- to phenomics data from annotated compounds in prior experiments, we found strong similarity between the phenoprints of rec- and rho kinase inhibitors (fig. g ). this finding was corroborated by kinase profiling, revealing rock and rock inhibition at . and . µm, respectively (fig. s f ). during intellectual property exploration, the target was further confirmed; the scaffold was previously evaluated as a rock inhibitor . kinases that were inhibited to a lesser degree were cdk and dyrk b (fig. s f ). 
the effect of rock inhibition on tnf-ɑ-induced inflammation is documented, and similar compounds are in active development for autoimmune disease. therefore, we deprioritized the compound using high-dimensional primary screening data in favor of another compound, rec- . the alternative scaffold also reduced tnf-ɑ-induced il- release (fig. f) but did not phenotypically cluster with any of the mechanistic classes annotated in our libraries, thus enabling informed prioritization of the compound for further study. in a > kinase biochemical screen, the compound showed no significant inhibition of any kinase at µm. to investigate the potential benefit of rec- in neuroinflammation, we explored the ability of the molecule to suppress microglial activation in vitro. using an immunofluorescence stain for the microglial activation marker iba- , we confirmed that rec- reduced the activation of bv- mouse microglia (fig. h; example images fig. s g). given the potential of rec- to operate via a novel mechanism of action (moa), we initiated a hit optimization effort initially focused on enhancing the potency of the series (rec- ec > µm). we used phenomics to assess our efforts to optimize potency while maintaining or improving efficacy against an unknown target; we succeeded in improving potency by more than ten-fold (rec- ec = nm) and improving the on-perturbation score to . (fig. i). the ongoing covid- pandemic presents an urgent need for quick and adaptable drug discovery in the context of a complex and poorly understood disease. we leveraged our phenomics platform to screen for approved and reference (e.g., development stage antiviral) compounds that could address two key components of covid- disease progression: direct effects of viral infection and the damaging effects of an unresolved inflammatory response, or cytokine storm.
, and , compounds were applied to cells in the infection and cytokine storm modes, respectively, and compound rescue was evaluated with the same processing pipeline described above. many features of terminal covid- are the result of inflammatory pressure on endothelial cells, manifesting as barrier disruption, lymphocyte recruitment, induction of blood coagulation, and acute respiratory distress syndrome (ards) . we modeled the cytokine storm associated with late-stage covid- in endothelial cells by applying cocktails of circulating proteins that mirror those from severe covid- patients (perturbed state) as well as healthy control patients (target state) (table s , fig. s a ). we hypothesized that rescue of the perturbed state toward the target state would reveal anti-inflammatory compounds specifically relevant to the covid- -associated cytokine storm. following identification of hit compounds, electric cell-substrate impedance sensing (ecis) was employed to confirm the activity of the aforementioned compounds in an orthogonal functional model of vascular integrity challenged with the same cytokine cocktails. presently, jak inhibitors have shown benefit in one nonrandomized trial and represent one of the most common mechanisms being evaluated among hundreds of clinical trials active for covid- . in our screen against the cytokine storm phenotype, jak inhibitors were capable of potent rescue of the severe cytokine storm phenoprint, confirming strong potential for this mechanism's efficacy in the context of a complex immune cascade (fig. a) . we also identified rescue by compounds in three classes of inhibitors outside of the jak/stat pathway that have been less deeply explored in the context of covid- , including inhibitors of syk, pi k and c-met. compounds were then applied to cells with the same inflammatory cocktail and evaluated with ecis ( fig. s a ) to inform on benefit to vascular integrity. 
rescue in this orthogonal functional assay was observed for each mechanism identified in the high-dimensional assay (fig. g, fig. s b -c). we next developed a model of sars-cov- infection to screen for repurposable compounds acting directly against viral targets or on host pathways. to define the model, we evaluated the effect of sars-cov- infection in multiple cell types, of which three resulted in robust phenoprints as compared to either mock infected or inactivated virus control populations: calu (a lung adenocarcinoma line), vero (an immortalized interferon-deficient african green monkey kidney line ), and primary human renal cortical epithelium (hrce) (fig. c, fig. s d ). we confirmed active infection with sars-cov- nucleocapsid antibody staining and quantification of productive viral replication ( fig. s a-c) . we reasoned that a primary human cell type would be most directly translatable to human pathology, especially from tissues demonstrated to be directly infected by sars-cov- , and thus conducted a screen of compounds against the hrce phenotype, while testing a limited subset of those compounds in vero and calu cells. the majority of compounds currently under evaluation in human clinical trials for covid- showed no or weak efficacy in the hrce model . however, in these screens remdesivir and its metabolite, gs- , demonstrated strong efficacy and aligned with potency described in the literature (ec of nm and μm, respectively) ( fig. d) , . remdesivir is a nucleoside analog that directly interferes with the viral rna-dependent rna polymerase to inhibit viral replication and, importantly, successfully reduced recovery time for treated patients in clinical trials announced after our data and analysis were publicly released. further illustrating the predictive capacity of the model, two other antivirals, lopinavir and ritonavir, were not found to be efficacious and were later discontinued in clinical testing for covid- .
additionally, aloxistatin (e d), an irreversible cysteine protease inhibitor initially developed for muscular dystrophy , also demonstrated suppression of the viral phenoprint in hrces (ec of nm). recent studies have confirmed that cathepsin l, a cysteine protease, is required for sars-cov- entry in some cell types, and aloxistatin treatment significantly reduced entry of sars-cov- pseudovirions , . we then tested a subset of these antiviral compounds in additional cell types, vero and calu , and found that aloxistatin did not rescue in these models. however, another protease inhibitor, camostat mesilate, was efficacious in the calu model (ec of nm), but not the vero or hrce models (fig. d-f, fig. s e ). camostat inhibits tmprss , which was recently shown to be required for sars-cov- entry in human airway cells . similar to findings in recent clinical trials , , we found chloroquine and hydroxychloroquine to have no benefit in the hrce or calu models; however, they showed modest benefit in vero cells (ec of nm and . µm, respectively) with very high off-perturbation activity (fig. d-f) . overall, compound efficacy in human cell types was poorly recapitulated in vero cells (fig. d-f, fig. s f ). taken together, these findings suggest that sars-cov- entry protease inhibitor activity varies across cell type and species; however, remdesivir and gs- show strong rescue of the viral phenoprint in all cell types tested. we identified jak inhibitors ruxolitinib and baricitinib as efficacious in both viral and cytokine storm models (fig. a, i, fig. s g ). however, we found that high concentrations of these compounds led to increased infection in hrce cells (fig. s h) . suppression of interferon production is a known component of sars-cov- infection at a low multiplicity of infection .
it is unclear, however, what effect additional interferon suppression would have in vivo, especially at higher viral loads, warranting investigation into alternative mechanisms of cytokine storm suppression, such as pi k or c-met inhibition. notably, bortezomib exhibited poor performance in both assay modes, is reported to impair endothelial cells in inflammatory contexts , and also enhances susceptibility to viral infection , particularly coronaviruses .

biology is massively complex and highly networked, but the tools to explore and discover novel biology and develop medicines have until recently relied on simple, univariate measurement. the genomic revolution yielded a taste of what is possible if high-dimensional biology can be scaled, by massively increasing the rate of understanding of the role of thousands of genes in human biology and disease. following this advancement, several new high-dimensional approaches have been developed to add clarity to complex functional relationships and discover new therapeutics, but these are hindered from high-throughput screening application by engineering and cost bottlenecks. we present here early data from our experience using phenomics (fig. ) as one strategy to accelerate drug discovery. we first established that diverse immune biology and pharmacology can be detected and discriminated using phenomics (fig. ) . these data also reveal that phenomics is not simply a classification technology: deep quantification of rich, multi-parametric signal and assessment of dose response is achievable, enabling comparisons and clusterings of diverse biological perturbants alone, or in combination, across diverse cell types (fig. ) . focusing on just two of the immune phenoprints we uncovered that are suitable for drug screening, we explored , new chemical entity starting points in the context of tgf-β- and tnf-ɑ-induced phenoprints (fig. ) .
among hits in each context, prediction of alk and rock inhibition allowed us to rapidly shift resources to higher priority hits. in particular, we focused our efforts on a suppressor of the tnf-ɑ phenoprint with high potential to be active against a novel, but as yet unknown, target. further, we drove medicinal chemistry work against this unknown target(s) using phenomics, demonstrating a -fold increase in potency while also increasing the magnitude of rescue. the application of phenomics can be extended to more complex disease-causing perturbations as well: the platform was rapidly adapted for the characterization and exploration of actionable therapies in the context of a novel and poorly understood disease, covid- . within days of initiating the project, we identified hits through high-throughput chemical screens against covid- cytokine storm and sars-cov- infection in the relevant tissue types, without the need to develop cytokine- or virus-specific reagents and assays. we demonstrated that a handful of drugs currently in clinical trials strongly modulate the infection model (e.g. remdesivir), the cytokine storm model (pi k inhibitors) or both (jak inhibitors), prior to their clinical trial results becoming available. conventional antiviral research relies heavily on univariate assays that measure attributes like cell death or expression level of one protein. using a single platform, we found not only conventional antivirals, but also compounds with unconventional effects on disease-associated host pathways such as inflammation. in the sars-cov- model, remdesivir and its analog, gs- , demonstrated efficacy in all cell models tested. unfortunately, remdesivir is dosed via an intravenous route, typically in an inpatient setting and at a time at which cytokine storm may be primarily responsible for the pathology (wherein remdesivir had no unexpected efficacy in our cytokine storm model).
sars-cov- is able to use a variety of receptors to facilitate cell entry, with receptor specificity by cell type apparent in our data: aloxistatin (e d), inhibiting the cathepsin-mediated entry pathway, and camostat, inhibiting the tmprss -mediated pathway, each demonstrated strong response, in hrce and calu cells respectively. nevertheless, pseudovirus entry assays have shown that even in cells with both pathways active, modulating a single pathway still quantitatively reduces viral infection load. further study of the proportional activity of each pathway in relevant human tissues may be warranted. as aloxistatin is orally bioavailable, simply and inexpensively synthesized, and has a relatively strong safety profile based on chronic treatment of muscular dystrophy patients in a phase trial, it deserves further study for covid- , with the expectation that early treatment in the course of infection may be most efficacious. in our model of more advanced covid- symptoms driven by cytokine storm, jak inhibitors were noteworthy rescuers of the inflammatory phenoprint and moderate rescuers of the viral phenoprint at low concentrations. due to their oral bioavailability and the safety profile of acute treatment, they are excellent candidates for repurposing. however, we observed that jak inhibitors enhanced cellular infection of sars-cov- at higher concentrations, suggesting an effect on interferon signaling, a possible clinical liability that should be closely monitored during trials. supporting this finding and underlining the importance of identifying diverse options to address cytokine storm, jak inhibitors are known to increase the prevalence or severity of other viral infections including herpes zoster, jc virus, and hepatitis b [ ] [ ] [ ] . this study also identified alternative mechanisms of action which have been much less deeply considered in the context of covid- , such as certain syk inhibitors, c-met inhibitors and pi k inhibitors.
such molecules could be critical additions to remdesivir therapy in severe patients. this work and the recent success of dexamethasone in clinical trials for covid- also identified a key limitation of our current phenomics approach: when studying a cell type in isolation, phenomics surfaces compounds that act via cell-autonomous mechanisms . compounds that intervene in multicellular processes might be revealed by development of coculture models. taken together, our results demonstrate that systems-level modeling and drug discovery are achievable using a single phenomics platform. first, this approach simplifies and extends the ability to work across many disease models rapidly because assay development work for any new model is minimized. second, this work partially overcomes a historical limitation of phenotypic screening, predicting mechanism of action, by relating the high-dimensional phenoprint of hit compounds to those of reference molecules. finally, we show the potential of this platform in optimizing nce compounds through medicinal chemistry in a high-dimensional, target-agnostic manner. unlike other high-dimensional approaches, the relatively inexpensive nature of these image-based assays allows them to be scaled to levels of throughput comparable to more traditional low-dimensional screening modalities. in the hopes that it will be valuable to others, we have made images and embeddings from huvec treated with the immune perturbant library, and from our covid- primary screens (both infection and cytokine storm), available online (including raw image data, metadata, and deep learning embeddings from images) at rxrx.ai/rxrx and rxrx.ai/rxrx , respectively. beyond the applications described here, the modular nature of this phenomics platform enables rapid adaptation to different libraries of immune stimulants, antibodies, or other large molecules, and incorporation of additional cellular contexts like co-culture models.
in future work, comparisons of hits to phenoprints associated with knockout of each gene in the genome (achieved by arrayed whole-genome crispr knockout) may further expand our ability to predict mechanisms beyond those represented within our annotated small molecule library, bridging a key gap in phenotypic screening. critically, these data can be related over time and across disparate research programs, supporting the creation of large biological image datasets for deep-learning applications , that will accelerate drug discovery and yield functional maps of human cellular biology.

human umbilical vein endothelial cells (lonza, c a) were cultured according to manufacturer's recommendations in egm (lonza, cc- ).

nhlf: normal human lung fibroblasts (lonza, cc- ) were cultured according to manufacturer's recommendations in fgm (lonza: cc- , ) and used at passage four for all assays.

pbmc: peripheral blood mononuclear cells (pbmcs), from healthy donors, were prepared from fresh (< hours old) leukopaks (stemcell technologies inc., catalog # ). following rcf washes (brake off) for platelet removal, the samples were processed with easysep™ rbc depletion reagent in accordance with the manufacturer's instructions (stemcell technologies inc., catalog # ). following isolation, pbmcs were pelleted ( rcf) and resuspended in cryopreservation medium (cryostor® cs freeze media, biolifesolutions inc., part #: ) for long term storage.

macrophage: macrophages were derived from either cryopreserved or freshly isolated pbmcs. monocytes were enriched by plastic adherence, first seeding pbmcs in serum free rpmi- medium, followed by . hours of incubation and washing x with pbs. cells were incubated for d in complete medium (rpmi+ % heat inactivated fbs, ng/ml m-csf, ng/ml il- ). media was replenished after d with one third of the conditioned medium and ⅔ fresh complete medium.
monocyte derived macrophages (mdms) were harvested from each vessel after an additional d using accutase following manufacturer's instructions (thermo -a ). mdms were pelleted ( rcf) and resuspended in cryo-preservation medium.

hrce: primary human renal cortical epithelial cells (lonza, cc- ) were propagated at °c with % co in epicm (sciencell # ) supplemented with epithelial cell growth supplement (epicgs, sciencell # ).

vero: an immortalized african green monkey kidney line (atcc ccl- ) was propagated at °c with % co in eagle's minimum essential medium (emem) supplemented with % fbs.

calu : a human lung adenocarcinoma line (atcc htb- ) was propagated at °c with % co in emem supplemented with % fbs.

bv- : murine microglial cells (iclc atl , ospedale policlinico san martino) were propagated at °c with % co in rpmi media + % heat inactivated fbs.

immune stimuli (table s ) were solubilized in sterile phosphate buffered saline (pbs) containing . % bsa (sigma cat.# a - ml) to make stock solutions of . mg/ml in echo-qualified w low-dead volume source plates. source plates were stored at - °c until use. cells were seeded into -well microplates (greiner, ) via multidrop (thermo fisher) and incubated at c in % co for the duration of the experiment. immune stimuli or virus were added hours post-seeding (huvec, macrophage, fibroblast) or h (pbmc). treatments were randomized across treatment plates with a -log range of immune stimuli (typically . - ng/ml) at replicates each with acoustic transfer (echo , labcyte) and incubated at °c for or (complete immune stimuli panel) or h (for pbmc with pembrolizumab or nivolumab). active sars-cov- was added via multidrop hours post seeding of the specified cell type. plates were stained using a modified cell painting protocol . cells were treated with mitotracker deep red (thermo, m ) for m, fixed in - % paraformaldehyde, permeabilized with .
% triton x , and stained with hoechst (thermo), alexa fluor phalloidin (thermo), alexa fluor wheat germ agglutinin (thermo), alexa fluor concanavalin a (thermo), and syto (thermo) for minutes at room temperature and then washed and stored in hbss+ . % sodium azide. live-virus experiments omitted the mitochondrial stain due to operational constraints of the biosafety level environment. one hour prior to addition of the immune stimulant or hours prior to the addition of virus, cells were treated with compound via acoustic transfer (echo , labcyte). primary screening of new chemical entity libraries was performed at or µm with concentration-response confirmation spanning nm to µm in half-log steps. sars-cov- screening was completed in dose response in half-log steps between nm to µm. after h incubation (or hours post viral infection), plates were imaged using image express micro confocal high-content imaging system (molecular devices) microscopes in widefield mode with x objectives. four sites per well were acquired with channels per site. the following bandpass filters were used to visualize the channels: ff / / / , ff / / , ff - / / / - , ff - / / , and ff - / / . kinase profiling was performed using a kinomescan tm panel of or > kinases at eurofins-discoverx (san diego, ca). targets exhibiting > % inhibition were followed by kdelect Ⓡ analyses for dyrk b, cdk , rock and rock to determine ic 's. data presented as mean and standard deviation. alt-r crispr-cas reagents were purchased from integrated dna technologies, inc. (idt) and prepared following the manufacturer's guidelines and protocols (alt-r crispr-cas crrna, alt-r crispr-cas tracrrna cat # , alt-r s.p. cas nuclease v , cat # , and alt-r cas electroporation enhancer, cat # ). alt-r crispr-cas crrna was duplexed to alt-r crispr-cas tracrrna and then combined with alt-r s.p. cas nuclease v , following idt guidelines, to form a functional crispr-rnp complex. 
this crispr-rnp complex was transfected into cells using the lonza d nucleofection system and standard protocols with proprietary modifications, or with a proprietary lipofection-based process for high-throughput application. alt-r cas electroporation enhancer was included in the nucleofection reactions to enhance transfection efficiency following standard guidelines from idt.

all images were uploaded to cloud storage and featurized by embedding them with a trained neural network using google cloud platform. this network is based on the convolutional neural network densenet- . we adapt this network in the following ways. first, we change the first convolutional layer to accept image input of size x x . like densenet- , we use global average pooling to contract the final feature maps, which in our case are tensors of dimensions x x , , to a vector of length , . however, instead of following immediately with a classification layer, we add two fully-connected layers of dimension , and , respectively, and use the -dimensional layer as the embedding of the image. the weights of this network were learned by adding two separate classification layers to the embedding layer, one using softmax activation and the other using arcface activation, which were simultaneously optimized by training the network to recognize perturbations in the public dataset rxrx and in a proprietary dataset of immune stimuli in various cell types. due to operational constraints of the bsl- assay conditions, a modified assay protocol lacking one image channel was used for the live-virus experiments. to accommodate this change, we trained a separate network of the same basic architecture that used only five input channels and one fully-connected final layer of dimension , .
immune stimuli phenoprints were assessed by calculating the mean embedding of all but one biological replicate, finding the angle between that average and the held-out replicate well, and repeating this process for every replicate to find the average cross-validated angle for that perturbation. statistical significance of these phenotypes was determined by comparing their similarity at high dose against a distribution of similarities between embeddings of images of untreated cells. we used the benjamini-hochberg multiple-testing correction with a % false discovery rate and considered phenotypes acceptable if they had a corrected p-value < . in two independent experimental batches. the similarity between a pair of immune stimuli was determined by calculating the cosine similarity between all pairs of embeddings of one immune stimulant at high concentration with the embeddings of another immune stimulant at high concentration, and testing whether the mean of this pairwise-similarity distribution was significantly different from zero using a one-sample t-test and employing the benjamini-hochberg multiple-testing correction with a % false discovery rate. only significant pairs are used in this paper, and the means of their pairwise-similarity distributions are the values reported in the figures.

for small molecule screens, post-processing of the embedded images included normalization to remove inter-plate variance, pca to reduce the feature space, and anomaly detection to remove outliers from the control populations. the vector pointing between the barycenters of the untreated and perturbed conditions was computed, and the embedded image vectors were decomposed into the signed scalar projection (the on-perturbation score) and the scalar rejection (the off-perturbation score) with respect to this vector. these scores were normalized so that the mean on-perturbation score was for the untreated condition and for the perturbed condition.
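the decomposition described above can be illustrated with a minimal sketch. it is not the authors' pipeline, which is proprietary: the function names, the choice of the untreated barycenter as origin, the untreated-to-perturbed direction of the axis, and the rescaling targets (0 for untreated, 1 for perturbed) are all assumptions made for illustration, since the source elides the normalization values.

```python
import math


def barycenter(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def on_off_scores(embedding, untreated_center, perturbed_center):
    """Split an embedding into on- and off-perturbation components.

    Assumption: the axis points from the untreated barycenter to the
    perturbed barycenter, and the untreated barycenter is the origin.
    """
    axis = [p - u for p, u in zip(perturbed_center, untreated_center)]
    axis_norm = math.sqrt(sum(a * a for a in axis))
    if axis_norm == 0:
        raise ValueError("untreated and perturbed barycenters coincide")
    unit = [a / axis_norm for a in axis]
    relative = [e - u for e, u in zip(embedding, untreated_center)]
    # Signed scalar projection onto the perturbation axis.
    on_score = sum(r * a for r, a in zip(relative, unit))
    # Scalar rejection: length of the component orthogonal to the axis.
    residual = [r - on_score * a for r, a in zip(relative, unit)]
    off_score = math.sqrt(sum(x * x for x in residual))
    return on_score, off_score


def rescale_on_score(on_score, untreated_mean, perturbed_mean,
                     untreated_target=0.0, perturbed_target=1.0):
    """Affine rescaling so the control means land on chosen targets.

    The default targets are hypothetical placeholders.
    """
    span = perturbed_mean - untreated_mean
    fraction = (on_score - untreated_mean) / span
    return untreated_target + (perturbed_target - untreated_target) * fraction


# Toy two-dimensional example (real embeddings have many more dimensions).
untreated = barycenter([[0.0, 0.0], [0.0, 2.0]])
perturbed = barycenter([[4.0, 1.0], [4.0, 1.0]])
on, off = on_off_scores([2.0, 4.0], untreated, perturbed)
```

under these assumptions, a compound that fully rescues the phenotype would score near the untreated target on the on-perturbation axis while keeping a small off-perturbation score; a large off-perturbation score would indicate effects unrelated to the modeled perturbation.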
separation of the untreated and perturbed conditions along the on-perturbation axis was assessed by z-factor. for compound moa inference, cosine similarities were computed between the embeddings of an nce compound and the set of embeddings of a compound library annotated for moa, and significantly large similarities (relative to the distribution of similarities of pairings of annotated compounds with the nce compound) were reported.

for the cocktail representing severely affected patients, the top concentration of the most abundant protein, cxcl- , was selected to be ng/ml based on a practical screen concentration and previously identified phenotypes for this factor. all other proteins were prepared at appropriate concentrations relative to cxcl- . cocktails representing healthy patients and those with moderate disease severity were prepared with each concentration relative to the severe cocktail.

iba immunofluorescence assay

bv- microglia were thawed from liquid nitrogen and plated at cells per well in -well pdl/collagen-coated plates (greiner # ). the next day, the cells were treated with compound first, followed an hour later by stimulant (recombinant murine tnf-ɑ+ifn-γ (peprotech), μg/ml in . % low endotoxin bsa/pbs). twenty-four hours after treatment, bv- microglia were fixed and stained for the microglial activation marker iba . briefly, cells were fixed with % pfa for min at room temperature. primary antibody solution was added to a : final dilution (iba antibody, abcam cat #ab ) and incubated overnight at °c. after overnight incubation with primary antibody, cells were washed with pbs, and secondary antibody solution was added (alexafluor donkey anti-goat igg, : final dilution, invitrogen cat# a ). cells were then incubated for hr at room temperature, protected from light. following secondary antibody incubation, cells were washed and the plate was sealed for imaging, which was performed on an image express micro confocal high-content imaging system (molecular devices).
data and error presented as mean and standard deviation.

quantitative measurement of cytokines in supernatants obtained from cultured cells was performed by using a homogeneous time-resolved fluorescence assay (htrf, cisbio). il- htrf assays were performed in accordance with the manufacturer's protocol. briefly, cells were seeded and after h treated with compound over concentrations and replicates each. after an additional h, supernatant was collected from the assay plate, and appropriate sample dilutions and standards were made and dispensed into barcoded labeled -perkinelmer proxiplates (cat# ). after the recommended incubation time, the plate was read using an envision Ⓡ microplate reader (perkinelmer). data and error presented as mean and standard deviation.

nfkb reporter cells (tr a- , system biosciences) were seeded at cells per well in -well imaging plates ( , greiner). compounds were added at at least replicates per concentration followed by ng/ml tnf-ɑ after h. plates were imaged (gfp channel) once every h via incucyte (sartorius). data for integrated intensity over at the h time point is presented as mean and standard deviation, significance analyzed with -way anova.

normal human lung fibroblasts (nhlf, lonza) were plated in -well plates (greiner) at . x ^ cells/ml in fgm- (lonza). after hours, the media was replaced with fbm media (lonza) and incubated for hours. cells were treated with compounds of interest in a -point dose response curve at replicates per concentration using an acoustic liquid handler, and incubated for hour at °c, % co . cells were then treated with ul of ng/ml tgf-β (r&d systems), for a final concentration of ng/ml. cells were incubated for minutes at °c and % co . cells were fixed with % pfa, blocked for hour with % bsa/ . % triton x- /pbs, and then stained for psmad (cell signaling, : ).
after an overnight incubation at °c, cells were stained with alexafluor (thermo, : ) and hoechst (thermo, : ) for two hours at °c, washed with pbs twice, and imaged on an image express micro confocal high-content imaging system (molecular devices). images were analyzed with cellprofiler to observe nuclear translocation of psmad . data and error presented as mean and standard deviation.

normal human lung fibroblasts (nhlf, lonza) were plated in -well plates (greiner) at . x ^ cells/ml in fgm- (lonza). cells were treated with compounds of interest in a point dose response curve at replicates per concentration using acoustic transfer (echo , labcyte), and incubated for hour at °c, % co . cells were then treated with ul of ng/ml tgf-β (r&d systems), for a final concentration of ng/ml using multidrop (thermo fisher). cells were incubated for hours at °c and % co . cells were fixed with % pfa, blocked for hour with % bsa/ . % triton x- /pbs, and then stained for collagen i (cell signaling, : ). after an overnight incubation at °c, cells were stained with alexafluor (thermo, : ), cellmask orange (thermo, : ) and hoechst (thermo, : ) for two hours at °c, washed with pbs twice, and imaged on an image express micro confocal high-content imaging system (molecular devices). images were analyzed with cellprofiler. data and error presented as mean and standard deviation.

electric cell-substrate impedance sensing (ecis)

prior to use, -well ecis plates (applied biophysics, w idf pet) were pre-treated with mm l-cysteine (sigma-aldrich, c - g) and then coated with fibronectin (gibco, phe ). human umbilical venous endothelial cells were plated in the fibronectin coated -well ecis plates at , cells/well in ebm- (lonza cc- ) +egm- (lonza, cc - ). cells were allowed to settle at room temperature for hour and then incubated for hours at °c, % co . following incubation, the plates were placed on the ecis readers for hour to establish baseline resistance.
cells were then treated in a point dose response curve at replicates per concentration using acoustic transfer (echo , labcyte) and returned to the incubator for hour. following this, cells were treated with the cytokine storm cocktail (table s ) . resistance was then measured for hours. the assay window was defined as the time range with the greatest observable difference in membrane resistance between empty and disease control (approximately hours following cocktail addition). resistance was normalized to each plate and graphed as a dose-response curve where and correspond to health and disease controls, respectively.

after staining and imaging to establish high dimensional phenotypes, plates were rinsed once with wash buffer ( xhbss + . % sodium azide) before incubating with primary antibody raised against sars-cov- nucleocapsid protein for mins at rt (sino biological catno. -t , : dilution). media was evacuated from wells by inverted centrifugation, and secondary antibody was added and incubated another minutes at rt (thermo scientific catno. a , : dilution). primary and secondary antibodies were diluted in stain base media ( xhbss, % bsa, . % triton-x ). plates were washed one final time using inverted centrifugation and wash buffer before imaging as described above. data presented as mean value.

cell-level image segmentations and per-cell log-mean nucleocapsid staining intensities were calculated using standard image segmentation techniques (cellprofiler ). these intensities were normalized by plate with respect to log-mean-intensity in the mock control cells. to adjust for optical effects that changed the background fluorescence level in infected wells, a gaussian mixture model was used to align the lowest peak in log-mean-intensity across well conditions. cells were estimated to be infected if they exhibited an adjusted log-mean-intensity above the th percentile of intensity for the control cell population.
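the thresholding step above can be sketched as follows. this is an illustrative example only and not the authors' code: the function names are hypothetical, the percentile `q` is left as a parameter because the source elides its value, and the intensities are assumed to be already plate-normalized and background-aligned.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no control intensities supplied")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def count_infected(cell_intensities, control_intensities, q):
    """Count cells whose adjusted log-mean intensity exceeds the q-th
    percentile of the control (mock-infected) cell population."""
    threshold = percentile(control_intensities, q)
    return sum(1 for value in cell_intensities if value > threshold)


# Toy data; q = 75 is a placeholder, not the value used in the study.
control = [0.0, 1.0, 2.0, 3.0, 4.0]
cells = [1.0, 3.5, 5.0, 2.0]
n_infected = count_infected(cells, control, q=75)
```

because the threshold is a percentile of the control population, a known share of uninfected cells will always exceed it, which is why a raw count of this kind needs the correction described next.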
this estimate of the number of infected cells was used to compute a fraction of cells infected, which was adjusted to account for the % of uninfected cells expected to be above the % threshold.

the usa-wa / strain of sars-cov- was propagated in vero cells. cells were grown in standard tissue culture flasks ( % confluence) and were infected at a multiplicity of infection (moi) of . , in emem + % fbs and g/ml gentamicin, incubated at °c with % co for days. supernatants containing virus were removed from these cultures, spun down to remove cellular debris and stored at - °c until use. viral titers were determined through standard tissue culture infectious dose % (tcid ) methods, where cytopathic effect (cpe) on vero cells was measured by visual observation under a light microscope. to create a suitable control with inactivated virus, sars-cov- was irradiated with a uv lamp for or minutes. viral inactivation in this sample was verified using visual cpe on vero cells, where no detectable active virus was observed. an additional "mock" control was created using conditioned media preparations generated from uninfected vero cells grown in % fbs in emem for five days. cellular debris was removed through centrifugation and the supernatants were frozen at - °c until use. all experiments using sars-cov- were performed using biosafety level (bsl- ) containment procedures at partner facilities including one at utah state university. data and error presented as mean and standard deviation.

a -well plate was pre-treated with compounds, at concentrations (ranging from to ng/ml) with at least replicates of each concentration, and a reaction mix containing ng alk (thermofisher), using poly : as substrate, was added to the plate. the reaction was started by the addition of µm atp, the plate was mixed on a plate shaker ( rpm for min) and the reaction allowed to incubate at room temp for min.
the reaction was terminated by the addition of adp-glo reagent (promega v ) ( min incubation) and kinase detection reagent ( min incubation). luminescence was captured using an envision xcite plate reader. data presented as mean and standard deviation.

all assessments of compound diversity and similarity were performed in datawarrior (openmolecules.org). neighbors were determined to be at % or greater similarity as determined with the skelspheres descriptor.

tables containing metrics for immune stimulant phenotypes in each cell type are provided in the supplementary materials. underlying images, metadata, and deep learning embeddings for soluble factor perturbations in huvec and all primary screens in primary cell types for covid- virus and cytokine storm screening have been made available at rxrx.ai. an interactive server containing drug response projections, hit scores, and structures for covid- screening data has been made available for custom search at covid .rxrx.ai. the code underlying this report leverages proprietary algorithms for image processing, data standardization, outlier detection, and compound efficacy scoring. as such, the code underlying this report will not be made available. instead, much of the output of these algorithms is provided in the supplemental tables.

virus graphic in fig. schematic for interpreting projection of drug response in -dimensional plot. contours show the %, %, and % distributions of on- and off-perturbation scores for the perturbation (orange) and target (blue) states. ideal rescues are compounds that rescue along the on-perturbation x-axis toward the target state with minimal increase to the off-perturbation score on the y-axis. b. potential hits are prioritized by the proximity of any dose to the target state, illustrated here with dashed lines and increasing red highlight intensity for higher-ranked dose-curve trajectories. c.
projections of treatments along the on-perturbation vector: rescue of the tnf-ɑ phenoprint with clinically approved monoclonal antibodies, reversal il- -il- r receptor chimera, and reversal of socs crispr gene knockout. ec for high-dimensional compound rescue are indicated in parentheses representative images of hrce, calu and vero cells immunostained with sars-cov- nucleocapsid protein (pink) and modified cell paint dyes c. infection rates of each tested cell type as analyzed by nucleocapsid immunostaining. of note, hrce donors displayed significant variation in infectability and only a minority of donors exhibited infection rates high enough for screening. antibody stains were performed after the principal analysis concluded and are therefore not represented in the primary dataset used for phenoprint evaluation and compound screening. d. infection of hrce yielded a phenoprint against the mock-infected target population with an assay z-factor of .
we declare competing interests. all authors were employees of or advisors to recursion during the course of this work. all authors have real or potential ownership interest in recursion.
key: cord- -iuez zj authors: achdout, hagit; aimon, anthony; bar-david, elad; barr, haim; ben-shmuel, amir; bennett, james; bobby, melissa l; brun, juliane; sarma, bvnbs; calmiano, mark; carbery, anna; cattermole, emma; chodera, john d.; clyde, austin; coffland, joseph e.; cohen, galit; cole, jason; contini, alessandro; cox, lisa; cvitkovic, milan; dias, alex; douangamath, alice; duberstein, shirly; dudgeon, tim; dunnett, louise; eastman, peter k.; erez, noam; fairhead, michael; fearon, daren; fedorov, oleg; ferla, matteo; foster, holly; foster, richard; gabizon, ronen; gehrtz, paul; gileadi, carina; giroud, charline; glass, william g.; glen, robert; glinert, itai; gorichko, marian; gorrie-stone, tyler; griffen, edward j; heer, jag; hill, michelle; horrell, sam; hurley, matthew f.d.; israely, tomer; jajack, andrew; jnoff, eric; john, tobias; kantsadi, anastassia l.; kenny, peter w.; kiappes, john l.; koekemoer, lizbe; kovar, boris; krojer, tobias; lee, alpha albert; lefker, bruce a.; levy, haim; london, nir; lukacik, petra; macdonald, hannah bruce; maclean, beth; malla, tika r.; matviiuk, tatiana; mccorkindale, willam; melamed, sharon; michurin, oleg; mikolajek, halina; morris, aaron; morris, garrett m.; morwitzer, melody jane; moustakas, demetri; neto, jose brandao; oleinikovas, vladas; overheul, gijs j.; owen, david; pai, ruby; pan, jin; paran, nir; perry, benjamin; pingle, maneesh; pinjari, jakir; politi, boaz; powell, ailsa; psenak, vladimir; puni, reut; rangel, victor l.; reddi, rambabu n.; reid, st patrick; resnick, efrat; robinson, matthew c.; robinson, ralph p.; rufa, dominic; schofield, christopher; shaikh, aarif; shi, jiye; shurrush, khriesto; sittner, assa; skyner, rachael; smalley, adam; smilova, mihaela d.; spencer, john; strain-damerell,
claire; swamy, vishwanath; tamir, hadas; tennant, rachael; thompson, andrew; thompson, warren; tomasio, susana; tumber, anthony; vakonakis, ioannis; van rij, ronald p.; varghese, finny s.; vaschetto, mariana; vitner, einat b.; voelz, vincent; von delft, annette; von delft, frank; walsh, martin; ward, walter; weatherall, charlie; weiss, shay; wild, conor francis; wittmann, matthew; wright, nathan; yahalom-ronen, yfat; zaidmann, daniel; zidane, hadeer; zitzmann, nicole title: covid moonshot: open science discovery of sars-cov- main protease inhibitors by combining crowdsourcing, high-throughput experiments, computational simulations, and machine learning date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: iuez zj herein we provide a living summary of the data generated during the covid moonshot project focused on the development of sars-cov- main protease (mpro) inhibitors. our approach uniquely combines crowdsourced medicinal chemistry insights with high throughput crystallography, exascale computational chemistry infrastructure for simulations, and machine learning in triaging designs and predicting synthetic routes. this manuscript describes our methodologies leading to both covalent and non-covalent inhibitors displaying protease ic values under nm and viral inhibition under um in multiple different viral replication assays. furthermore, we provide over crystal structures of fragment-like and lead-like molecules in complex with the main protease. over synthesized and ordered compounds are also reported with the corresponding activity in mpro enzymatic assays using two different experimental setups. the data referenced in this document will be continually updated to reflect the current experimental progress of the covid moonshot project, and serves as a citable reference for ensuing publications. all of the generated data is open to other researchers who may find it of use. 
the project is an international effort involving over groups (see author list), many of whom have provided work pro-bono or at-cost. the project is ip-free and has relied on philanthropic donations up to this point. the covid moonshot project has focused on progressing early fragment-screening results into potent compounds with activity against both the main protease and the virus. design of compounds was done through a novel crowd sourcing mechanism, harnessing the expertise of drug-designers across the world. building on their individual sets of expertise, these designers have entered over submissions for a total of over , individual molecule designs to date. the rationales for each design include docking-based approaches, by-eye structure-based designs, machine learning approaches, crawling of the past literature on sars and mers compounds, and other general medicinal-chemistry insights that can be visualised at https://postera.ai/covid. in the background, a high-throughput screen of , compounds has been carried out to further aid with lead generation. the results of this screen will be reported in a future manuscript. out of these myriad designs, over compounds have been either ordered or synthesized and tested for activity against mpro in assays at two different sites. the enzymatic activity has been measured using two complementary biochemical assays assessing inhibition of the sars-cov- main protease: a fluorescence based assay (weizmann institute) and rapidfire mass spectrometry assay (university of oxford). furthermore, structural characterization of selected compounds has resulted in over crystal structures, when combined with the fragment screening data. 
particularly promising compounds have been sent through a further admet assay cascade, and sufficiently suitable compounds have been assessed in cellular antiviral screening assays performed by collaborators at the university of oxford, the university of nebraska medical centre, the israel institute for biological research and the radboud university medical centre. all design and experimental data can be found on the covid moonshot website at https://postera.ai/covid, which also links to many crystal structures found at https://fragalysis.diamond.ac.uk/viewer/react/preview/target/mpro. the complete raw data in csv format, with a detailed description, is located at the moonshot github repository, https://github.com/postera-ai/covid_moonshot_submissions. the data is frequently updated and we kindly ask that all questions regarding the data be directed to the active covid moonshot forum https://discuss.postera.ai/c/covid/general/ . this is an ongoing project, and this manuscript is very much a "living document". this document is published to give readers and ensuing publications a central point of reference for data generated from covid moonshot. figures and exemplify our progress, from crowdsourced fragment merges (try-uni- a b- ) to antiviral activity that is comparable to that of remdesivir. our approach combines the collective creativity of the scientific community with algorithms that aid with synthesis planning and free energy calculations for prioritizing crowdsourced designs. specifically, as we started with a dense set of fragment hits with over protein-fragment structures, a crowdsourcing approach allows us to sample different creative merges and expansions of the different fragments. those crowdsourced designs are then triaged based on a lax docking-based threshold. docking was performed by several groups pursuing several different approaches, covering both covalent and non-covalent compounds.
consensus docking results were used to filter out designs which demonstrably do not fit the binding site, a difficult task to do by eye given the sheer number of designs. we then triage compounds based on synthetic accessibility to ensure short cycle times. as the total number of commercially available virtual compounds exceeds billion, keeping track of synthetic accessibility is a non-trivial task. complex-looking molecules might be synthesizable in a single step, while simple functionalizations could require a complex multi-step synthesis. during the project, consortium member postera built a new tool (https://postera.ai/manifold) in order to accelerate this difficult task. additional custom synthesis tools from postera allow for rapid prioritization of compounds and generation of synthetic routes. furthermore, the algorithms are tailored to the building blocks available at each of our partner cros, such that we can algorithmically allocate load. the resulting compounds are synthesized at partner cros, and all compounds are subjected to: (1) mpro rapidfire mass spectrometry assay, (2) mpro fluorescence assay, (3) high throughput x-ray crystallography. non-covalent compounds are also analysed by nmr. early on, the use of two complementary protease assays was deemed important due to the rapid development of procedures and the need for verification. agreement between the label-free ms-based approach and the fluorophore-based approach has been reassuring, while discrepancies between assays have led to the discovery of false positives. high throughput crystallography has allowed for rapid generation of design ideas that merge early binding fragments with more recent actives generated during the project. as soon as the data is cleaned and standardized, the results of all assays are released on the moonshot platform (http://postera.ai/covid and https://discuss.postera.ai/c/covid/general/ ), ready for the crowd to submit follow-up compounds.
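the two-stage triage described above (a lax docking filter followed by prioritisation on synthetic accessibility) can be sketched as a simple funnel. this is only an illustration under stated assumptions, not the consortium's actual tooling: the `Design` record, its field names, the score convention (lower docking score means a better predicted fit) and the cutoff values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Design:
    """One crowdsourced design. All field names are hypothetical."""
    design_id: str
    docking_score: float  # assumed convention: lower = better predicted fit
    route_steps: int      # assumed: predicted number of synthetic steps


def triage(designs: List[Design], docking_cutoff: float, max_steps: int) -> List[Design]:
    """Apply a lax docking filter, then rank survivors by synthetic accessibility."""
    # Stage 1: drop designs that clearly do not fit the binding site.
    fits = [d for d in designs if d.docking_score <= docking_cutoff]
    # Stage 2: keep only designs with a short predicted synthetic route.
    makeable = [d for d in fits if d.route_steps <= max_steps]
    # Shortest routes first; ties are broken by the better docking score.
    return sorted(makeable, key=lambda d: (d.route_steps, d.docking_score))
```

keeping the docking cutoff lax and ranking on route length afterwards mirrors the stated aim of short cycle times: docking only removes clear misfits, while route length decides what is made first.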
we received over , diverse designs from more than contributors. our data-driven approach enabled the synthesis and testing of over , compounds in less than months. some members of the community informally organised themselves into a medicinal chemistry design team, with weekly meetings to go through the data and suggest new designs. moreover, exascale computing resources donated by folding@home enable moonshot to use free energy perturbation to prioritise compounds.
compounds were seeded into assay-ready plates (greiner low volume ) using an echo acoustic dispenser, and dmso was back-filled for a uniform concentration in assay plates (maximum %). screening assays were performed in duplicate at µm and µm. hits of greater than % inhibition at µm were confirmed by dose response assays.
reagents for mpro assay. reagents were dispensed into the assay plate in µl volumes for a final volume of µl. final reaction concentrations were mm hepes ph . , mm tcep, mm nacl, . % tween- , % glycerol, nm mpro, nm fluorogenic peptide substrate ([ -fam]-avlqsgfr-[lys(dabcyl)]-k-amide). mpro was pre-incubated for minutes at room temperature with compound before addition of substrate. the protease reaction was measured continuously in a bmg pherastar fs with a / ex/em filter set. data analysis was performed with collaborative drug discovery (cdd).
one of these compounds, ada-ucb- c cb - , has also been tested in an antiviral assay and showed promising activity. antiviral activity was measured at the israel institute for biological research (iibr) with remdesivir serving as the positive control. the more potent analogs of ada-ucb- c cb - have not yet been tested for antiviral activity. the development of these series will be further detailed in a future publication.
mpro inhibitor with antiviral activity. lon-wei- e a e - has also been tested in the iibr antiviral assay and showed activity. the development of these series will be further detailed in a future publication.
these interconnected steps take place among internationally distributed teams of numerous researchers who have never met in person. coordination of these synergistic efforts has taken a significant amount of logistical planning amid lockdowns and other complications resulting from the covid pandemic.
inhibitor compounds at mm in dmso are dispensed into -well plates (greiner pp ) using an echo t dispenser (dmso concentration < %, final volume = nl). a µm enzyme stock solution is prepared in mm hepes, ph . and mm nacl, and subsequently diluted to a working solution of nm mpro in assay buffer ( mm hepes, ph . and mm nacl) before the addition of µl to each well using a multidrop combi (thermo scientific). after a quick centrifugation step ( rpm, s) the plate is incubated for min at room temperature. the reaction is initiated with the addition of µl of µm substrate (tsavlqsgfrk-nh , initially custom synthesized by the schofield group, glbiochem) dissolved in assay buffer. after centrifugation ( rpm, s) the reaction is incubated for min at room temperature before quenching with % formic acid. the reactions are analysed with ms using a rapidfire (rf) high-throughput sampling robot (agilent) connected to an ifunnel agilent accurate mass quadrupole time-of-flight (q-tof) mass spectrometer using electrospray. all compounds are triaged by testing the % inhibition at and µm final concentration. dose response curves use an -point range of - . µm inhibitor concentrations. rapidfire integrator software (agilent) was used to extract the m/z (+ ) charge states of both the substrate ( . da) and cleaved n-terminal product tsavlq ( . da) from the total ion chromatogram data, followed by peak integration. percentage conversion ((product peak integral / (product peak integral + substrate peak integral)) × 100) and percentage inhibitions were calculated and normalised against dmso control with deduction of any background signal in microsoft excel.
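the conversion and inhibition arithmetic described above can be written out as a short sketch. the percentage conversion follows the ratio given in the text; the exact normalisation used for percentage inhibition is not spelled out in the source, so the form below (background-subtracted conversion relative to the dmso control) is an assumption, and the function and parameter names are hypothetical.

```python
def percent_conversion(product_area: float, substrate_area: float) -> float:
    """Percentage of substrate converted, from integrated ms peak areas."""
    total = product_area + substrate_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * product_area / total


def percent_inhibition(sample_conv: float, dmso_conv: float,
                       background_conv: float = 0.0) -> float:
    """Inhibition relative to the uninhibited dmso control.

    Assumed normalisation: background is deducted from both the sample and
    the dmso control before taking the ratio.
    """
    window = dmso_conv - background_conv
    if window <= 0:
        raise ValueError("dmso control must exceed background")
    return 100.0 * (1.0 - (sample_conv - background_conv) / window)
```

under this convention, a well that converts as much substrate as the dmso control scores 0 % inhibition, and a well at background level scores 100 %.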
ic50 values were calculated by using the levenberg-marquardt algorithm to fit a restrained hill equation to the dose-response data, with both graphpad prism and cdd.
method described in kantsadi and vakonakis [ ].
method described in douangamath et al. [ ].
, compounds from the g-incpm screening collection, as well as a , mpro targeted compound library from enamine, were screened against mpro using a fluorogenic protease assay. a screening cascade was established as follows: 1) hits, defined as showing greater than % inhibition, were selected and filtered for obvious structural alerts and promiscuity in unrelated assays; 2) hits were re-tested in three independent mpro assay runs to confirm activity; 3) confirmed hits were tested at concentrations in two independent mpro assay runs to determine preliminary potency as ic50. promising compounds were re-sourced, and are awaiting validation. hts was performed in both -well ( µl) and -well ( µm, greiner ) formats. compounds were pre-plated into barcoded assay-ready plates for a final concentration of µm ( . % dmso). the assay was performed using thermo combi dispensers with µl tubing integrated with plate storage, a bmg pherastar fs plate reader and a spinnaker plate mover driven by momentum software (thermo). for -well screening, plates were unloaded and spun for min at rpm prior to plate reading. data was loaded to screener software (genedata) for mapping, normalization and annotation. curated data was then loaded to cdd vault for merging with compound annotations. the results of the hts screen can be viewed publicly at https://github.com/postera-ai/covid_moonshot_submissions
a nephelometry-based solubility assay was used to define threshold compound solubility at concentrations of µm and µm in pbs solution. the compound solubilities are normalized to deoxyfluorouridine ( % soluble) and ondansetron ( % soluble).
to put the numbers in perspective, the expected relative solubility values are: to perform the screening, l aliquots of pbs buffer were added to each well of -well microplates with clear flat bottom. plates with the buffer were subjected to optical integrity inspection using nephelostar. total scanning time for one -well plate was min. the pass criterion was set as the background signal in any of the scanned wells below rnu, thus making optical quality of the plates satisfactory for the assay. in case of an excessively high background signal in any wells of the test-plates, those plates/wells were excluded from the study (data not shown). compounds were obtained from enamine repository as solids formatted in polypropylene, round bottom blank tubes, in latch boxes (matrix ). compounds were dissolved in dmso at mm, incubated at - °c for hours, shaken for hour at rpm using highspeed microplate shaker illumina, then incubated at - °c for hours. intermediate dmso solutions of the tested compounds were prepared to mm (for the tested concentration µm) and mm (for the tested concentration µm). ondansetron and dofu were added to the final plates in mm concentration to get mm and mm final concentrations in the assay plates. to prepare the test plates, . µl aliquots of dmso solutions of the tested compounds, reference compounds or pure dmso were transferred from the polypropylene tubes to the corresponding wells of -well plates with pbs buffer using plate mate, according to the plate map ( figure ) . thus, final volumes in the test plates were brought to . µl, resulting in concentrations of the compounds in test wells of µm and % dmso, correspondingly. then, turbidity of the solutions was immediately scanned for each well. we employ an approach based on the molecular transformer technology [ , ] . our algorithm uses natural language processing to predict the outcomes of chemical reactions and design retrosynthetic routes starting from commercially available building blocks. 
this proprietary platform is provided free of charge by postera inc (http://postera.ai). additionally, manifold (https://postera.ai/manifold) was built by postera inc. during the project to search the entire space of purchasable molecules, and automatically find the optimal building blocks. a variety of antiviral replication assays were performed in collaborating laboratories, including cytopathic effect (cpe) inhibition assays at the iibr, israel, viral qpcr at radboud, netherlands, immunofluorescence assays at university of nebraska medical centre, usa, and plaque assays and focus forming unit assays at university of oxford, uk. were cultured in dulbecco's modified eagle medium (dmem) with . g/l glucose and l-glutamine (gibco), supplemented with % fetal calf serum (fcs, sigma aldrich), µg/ml streptomycin and u/ml penicillin (gibco). cells were maintained at c with % co . virus. sars-cov- (isolate betacov/munich/bavpat / ) was kindly provided by prof. c. drosten (charité-universitätsmedizin berlin institute of virology, berlin, germany) and was initially cultured in vero e cells up to three passages in the laboratory of prof. bart haagmans (viroscience department, erasmus medical center, rotterdam, the netherlands). vero fm cells were infected with passage stock at an moi of . in infection medium (dmem containing l-glutamine, % fcs, mm hepes buffer, µg/ml streptomycin and u/ml penicillin). cell culture supernatant containing virus was harvested at hours post-infection (hpi), centrifuged to remove cellular debris, filtered using a . m syringe filter (whatman), and stored in l aliquots at - c. compounds. mm compound stocks dissolved in dmso were provided by enamine (ukraine). virus stock titration. vero e cells were seeded in -well plates at a density of , cells/well. cell culture medium was discarded at h post-seeding, cells were washed twice with pbs and infected with -fold dilutions of the virus stock. 
at hpi, cells were washed with pbs and replaced with overlay medium containing minimum essential medium (mem), % fcs, mm hepes buffer, g/ml streptomycin, u/ml penicillin and . % carboxymethyl cellulose (sigma aldrich). at hpi, the overlay medium was discarded, cells were washed with pbs and stained with . % crystal violet solution containing % formaldehyde for minutes. thereafter, staining solution was discarded, plates washed with double-distilled water, dried and plaques were counted. antiviral assay. vero e cells were seeded onto -well plates at a density of , cells/well. at h post-seeding, cell culture medium was discarded, cells were washed twice with pbs and infected with sars-cov- at an moi of . in the presence of six concentrations of the inhibitors ( m - . m). at hpi, the inoculum was discarded, cells were washed with pbs, and infection medium containing the same concentration of the inhibitors was added to the wells. sars-cov- infection in the presence of . % dmso was used as a negative control. at hpi, l of the cell culture supernatant was added to rna-solv reagent (omega bio-tek) and rna was isolated and precipitated in the presence of glycogen according to manufacturer's instructions. qrt-pcr. taqman reverse transcription reagent and random hexamers (applied biosystems) were used for cdna synthesis. semi-quantitative real-time pcr was performed using gotaq qpcr (promega) bryt green dye-based kit using primers targeting the sars-cov- e protein gene (forward primer, -acaggtacgttaatagttaatagcgt- '; reverse primer, -acaggtacgttaatagttaatagcgt- '). a standard curve of a plasmid containing the e gene qpcr amplicon was used to convert ct values relative genome copy numbers. cell viability. vero e cells were seeded in -well white-bottom culture plates (perkin elmer) at a density of , cells per well. at h post-seeding, cells were treated with the same concentrations of compounds as used for the antiviral assay. cells treated with . 
% dmso were used as a negative control. at h post-treatment, cell viability was assessed using the cell titer glo . kit (promega), with the victor multilabel plate reader (perkin elmer) used to measure the luminescence signal.
virus propagation. sars-cov- england/ / was provided at passage from public health england, colindale. passage submaster and passage working stocks were produced by infecting vero e cells at a multiplicity of infection of . in virus propagation medium (dmem with glutamax supplemented with % fcs) and incubating until cytopathic effect was visible. the cell supernatant was then centrifuged at g for minutes, aliquoted and stored at - °c. the titre of viral stocks was determined by plaque assay. all subsequent assays were performed using a passage stock.
cell viability. cell viability was measured using the celltiter r aqueous one solution cell proliferation mts ( -( , -dimethylthiazol- -yl)- -( -carboxymethoxyphenyl)- -( -sulfophenyl)- h - tetrazolium, inner salt) assay (promega) according to the manufacturer's instructions after treatment with compound. briefly, calu cells were treated with compounds in quadruplicate for days. wells with µl growth medium with and without cells were included as controls in quadruplicate. after the incubation, µl of growth medium was removed and µl of mts reagent was added to the remaining medium in each well. after a further one to two hour incubation, the absorbance at nm was measured on a molecular devices spectramax m microplate reader.
antiviral assays. calu cells were seeded into well plates (vwr) at a cell density of × cells/well. after hours, the wells were inoculated for minutes with µl sars-cov- england / at a m.o.i. of . . the inoculum was replaced with µl growth medium containing the appropriate compound dilutions or matched controls and incubated for days. the supernatant was harvested and stored at - °c prior to analysis by plaque or focus forming unit (ffu) assay.
for plaque assays, four -fold serial dilutions of each supernatant to be analysed were prepared in virus propagation medium. µl of each dilution was inoculated in triplicate into wells of a -well plate followed by . ml of vero e cells at × cells/ml in virus propagation medium. the plates were incubated for hours before the addition of . ml of . % cmc overlay ( : mix of . % carboxymethylcellulose in h2o and virus propagation medium). the plates were incubated for a further days before staining with naphthol blue black. plaques were counted by eye.
for focus forming unit assays, a sars-cov- microneutralization assay from the w james lab (dunn school of pathology, university of oxford) was adapted for use as an ffu assay. briefly, half log dilutions of each supernatant to be analyzed were prepared in virus propagation medium. µl of each dilution was inoculated into wells of a -well plate in quadruplicate followed by µl vero e cells at . × cells/ml in virus propagation medium. the plates were incubated for hours prior to the addition of µl of . % cmc overlay. the plates were incubated for a further hours. after hours the overlay was carefully removed and the cells washed once with pbs before fixing with µl of % paraformaldehyde. after minutes the paraformaldehyde was removed and replaced with µl of % ethanolamine in pbs. the cells were permeabilized by replacing the ethanolamine with % triton x in pbs and incubating at °c for minutes. the plates were then washed times with wash buffer ( . % tween in pbs), inverted and gently tapped onto tissue to dry before the addition of µl of ey a anti-n human mab (arthur huang (taiwan)/alain townsend (weatherall institute of molecular medicine, university of oxford)) at pmol in wash buffer. the plates were rocked at room temperature for hour, washed and incubated with µl of secondary antibody, anti-human igg (fc-specific)-peroxidase conjugate produced in goat, diluted : , at room temperature for a further hour.
µl of trueblue peroxidase substrate was added to the wells and incubated at rt for min on the rocker, after minutes the substrate was removed and the plates washed with ddh o for minutes. the h o was removed and the plates allowed to air dry. the foci were then counted using an elispot classic reader system (aid gmbh).
structure of m pro from sars-cov- and discovery of its inhibitors crystallographic and electrophilic fragment screening of the sars-cov- main protease crowdsourcing drug discovery for pandemics discovery, synthesis, and structure-based optimization of a series of n-(tert-butyl)- -(n-arylamido)- -(pyridin- -yl) acetamides (ml ) as potent noncovalent small molecule inhibitors of the severe acute respiratory syndrome coronavirus (sars-cov) cl protease rapid assessment of ligand binding to the sars-cov- main protease by saturation transfer difference nmr spectroscopy. biorxiv molecular transformer unifies reaction prediction and retrosynthesis across pharma chemical space molecular transformer: a model for uncertainty-calibrated chemical reaction prediction
key: cord- -xanewi authors: sun, jiya; ye, fei; wu, aiping; yang, ren; pan, mei; sheng, jie; zhu, wenjie; mao, longfei; wang, ming; huang, baoying; tan, wenjie; jiang, taijiao title: comparative transcriptome analysis reveals the intensive early-stage responses of host cells to sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xanewi severe acute respiratory syndrome coronavirus (sars-cov- ) has caused a widespread outbreak of highly pathogenic covid- . it is therefore important and timely to characterize interactions between the virus and host cell at the molecular level to understand its disease pathogenesis. to gain insights, we performed high-throughput sequencing that generated time-series data simultaneously for bioinformatics analysis of virus genomes and host transcriptomes implicated in sars-cov- infection.
our analysis showed that the rapid growth of the virus was accompanied by an early intensive response of host genes. we also systematically compared the molecular footprints of the host cells in response to sars-cov- , sars-cov and mers-cov. upon infection, sars-cov- induced hundreds of up-regulated host genes hallmarked by a significant cytokine production followed by virus-specific host antiviral responses. while the cytokine and antiviral responses triggered by sars-cov and mers-cov were only observed during the late stage of infection, the host antiviral responses during the sars-cov- infection were gradually enhanced, lagging behind the production of cytokines. the early rapid host responses were potentially attributed to the high efficiency of sars-cov- entry into host cells, underscored by evidence of a remarkably up-regulated gene expression of tmprss soon after infection. taken together, our findings provide novel molecular insights into the mechanisms underlying the infectivity and pathogenicity of sars-cov- . coronavirus disease (covid- ) triggered by the severe acute respiratory syndrome coronavirus (sars-cov- ) is currently affecting global health. sars-cov- is the third highly pathogenic coronavirus, following sars-cov and mers-cov, to cause severe acute respiratory symptoms in humans. since december , this virus has caused more than thousand covid- cases in china. the number of infections in countries outside china is now growing rapidly. the most remarkable feature of sars-cov- incidence and epidemiology is its great capacity for human-to-human transmission [ ] . clinically, the majority of covid- patients have mild and moderate symptoms, and the elderly appear to have severe symptoms [ ] . based on the analysis of china data, the covid- case-fatality rate was estimated as around . % ( , deaths over , confirmed cases of sars-cov- infection) [ ] , lower than those of sars and mers [ ] .
however, due to the large-scale infected population, sars-cov- has already caused more than ninety thousand deaths as of april th , sowing great social panic around the world. while recent efforts have been focused on transcriptome analysis of host responses to sars-cov- infection at a certain time point in certain cell lines [ , ] , the transcriptional dynamics of host responses to the virus infection have remained largely unexplored. generally, once the virus enters the cell, the host innate immune responses, such as the interferon-mediated antiviral responses and cytokine production, have a pivotal role in suppressing the virus replication, which, if inadequate, might contribute to the viral pathogenesis. this hypothesis has been supported by our previous study, which has shown that the high pathogenicity of avian influenza virus is associated with abnormal coordination between interferon-mediated antiviral responses and cytokine production in host cells [ ] . similar to both sars-cov and mers-cov, which induce the overactivation of cytokines [ ] , increased cytokine levels are also observed in patients hospitalized with covid- [ ] . transcriptome analysis of in vitro host cells shows that sars-cov and mers-cov elicit distinct responses in the expression of the host genes [ ] . until now, the time-series gene-expression profile of the host response to sars-cov- has remained unknown, and it is urgently needed to uncover the pathogenesis of the virus. in this study, we used the sars-cov- strain isolated from patients [ ] to infect in vitro calu- cells, and performed rna sequencing to determine the time-series transcriptome profiling data of the host. we established the host response patterns for sars-cov- by comprehensive analysis of the transcriptomic profiles from sars-cov- , sars-cov and mers-cov.
these results provide profound new insights into the pathogenesis and progression of the covid- disease caused by sars-cov- , illuminating new strategies for the prevention and control of sars-cov- transmission and eventually leading to a cure of the covid- disease. cells and virus. calu- human airway epithelial cells(atcc, htb- ) were cultured in minimum essential media (mem, hyclone) supplemented with % fetal bovine serum(fbs), % mem neaa, and u/ml penicillin-streptomycin solution (gibco, grand island, usa) at °c in a humidified atmosphere of % (v/v) co . vero cells (atcc, ccl- ) were cultured at °c in dulbecco's modified eagle's medium (dmem, gibco) supplemented with % fbs (gibco) in the atmosphere with % co . sars-cov- strain betacov/wuhan/ivdc-hb- / (c-tan-hb , gisaid accession no. epi_isl_ ) was isolated from a human patient [ ] . viruses were harvested and viral titrations were performed in vero cells using plaque assay. [ ] . the cleaned data were mapped to the human grch reference genome using star aligner (v . . a) [ ] . the htseq-count command was used to count reads mapped to each gene [ ] . the r package deseq was applied to further identify differentially expressed genes (degs) (fdr< . , |log fc|>= ) [ ] . the unmapped reads against the entire human genome were further aligned to the reference genome of sars-cov- (epi_isl_ ). virus genome annotation was based on our previous work [ ] . go enrichment analysis was performed by fisher's exact test with the human protein-coding genes as a background in r. for analysis of microarray data of sars-cov and mers-cov, normalization and identification of degs (fdr< . , |log fc|>= ) were conducted using the r package limma [ ] . 
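the deg thresholds and the go enrichment test described above can be illustrated with a short standard-library sketch. it is not the authors' r code: the cutoff values, gene names and the choice of a one-sided (over-representation) test are assumptions, and `call_degs` and `enrichment_p` are hypothetical helper names.

```python
from math import comb


def call_degs(results, fdr_cutoff, lfc_cutoff):
    """Split genes into up- and down-regulated sets.

    results maps gene -> (log2 fold change, FDR). A gene is called when
    FDR < fdr_cutoff and |log2 fold change| >= lfc_cutoff.
    """
    up = {g for g, (lfc, fdr) in results.items()
          if fdr < fdr_cutoff and lfc >= lfc_cutoff}
    down = {g for g, (lfc, fdr) in results.items()
            if fdr < fdr_cutoff and lfc <= -lfc_cutoff}
    return up, down


def enrichment_p(degs, term_genes, background):
    """One-sided Fisher's exact test p-value for over-representation.

    Computed as the upper tail of the hypergeometric distribution, with
    both gene sets restricted to the background.
    """
    degs = degs & background
    term = term_genes & background
    n_bg, n_term, n_deg = len(background), len(term), len(degs)
    overlap = len(degs & term)
    tail = sum(comb(n_term, i) * comb(n_bg - n_term, n_deg - i)
               for i in range(overlap, min(n_term, n_deg) + 1))
    return tail / comb(n_bg, n_deg)


# Hypothetical toy data: ten background genes, cutoffs chosen for illustration.
background = {f"g{i}" for i in range(10)}
results = {"g0": (2.0, 0.001), "g1": (1.5, 0.01), "g3": (-3.0, 0.0001),
           "g4": (0.2, 0.5), "g5": (2.5, 0.2)}
up, down = call_degs(results, fdr_cutoff=0.05, lfc_cutoff=1.0)
p_value = enrichment_p(up | down, {"g0", "g1", "g2"}, background)
```

in a real analysis the background would be the set of human protein-coding genes, as stated in the text, and the p-values would be corrected for the number of go terms tested.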
we carried out time-course experiments to identify dynamic changes in transcripts in response to sars-cov- based on the infected and mock-infected groups across four time points ( , , and hpi), in which three biologically independent replicates for each treatment group were used for constructing cdna libraries. the calu- human airway epithelial cell line, a model of human respiratory disease [ ] , was used as the host cell of sars-cov- , with the same moi and host cell as used in the previous analyses of sars-cov and mers-cov infections. after the total rna isolation and sequencing, we obtained the host transcriptomes, as well as the genomes and transcripts of viruses. the high-throughput sequencing resulted in an average of million paired-end reads per sample, and the sequencing quality was high with a mean mapping rate of unique reads at approximately % among mock samples (supplementary table ). the quality of all samples was assessed by pca based on normalized counts from deseq , which indicated that high quality was achieved given that the majority of samples were well clustered, except for one sample from the infection group at hpi that was removed before further analysis ( figure a ). to evaluate the growth rate of sars-cov- , we calculated the rna level of the virus represented by unique reads mapping rates at different time points. our results showed that in general the virus reads increased sharply from . % to . % while reads mapped to the host genome dropped rapidly from . % to . % ( figure b) , suggesting a rapid replication of the virus within hours. from the results, at the earliest time point ( hpi), the virus produced high levels of viral genome rna as evidenced by relatively even coverage depth across the whole genome ( figure c ).
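the unique-read mapping rates used above as a proxy for viral rna level amount to simple fractions of the sequenced reads. a minimal sketch follows, assuming per-sample counts of uniquely mapped host and virus reads are already available; the function name `mapping_rates` and the counts are hypothetical and not taken from the study.

```python
def mapping_rates(host_unique, virus_unique, total_reads):
    """Percentage of reads uniquely mapped to the host and to the virus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        "host": 100.0 * host_unique / total_reads,
        "virus": 100.0 * virus_unique / total_reads,
    }


# Hypothetical counts for a single infected sample.
rates = mapping_rates(host_unique=8_000_000,
                      virus_unique=1_500_000,
                      total_reads=10_000_000)
print(rates)
```

repeating the calculation for each sample and time point gives the kind of host and virus curves summarised in the text.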
interestingly, we found that there was a significantly active transcription of the ' end of sars-cov- at hpi, especially for the m, , a, b, b and n genes ( figure c) which could play important roles in the antagonism with host immune response [ , ] . after that, the relatively even depth distribution of reads along the viral genome was again observed at panels of and hpi. these time-dependent patterns of virus replication and transcription were most likely to play critical roles in the pathology of sars-cov- . to elucidate the global changes of host gene expression along with virus growth, we identified the overall up- and down-regulated degs during sars-cov- infection ( figure d and supplementary table ). as shown in figure d , during the early stage of infection before hpi, there were many more up-regulated genes than down- to investigate specific host responses during sars-cov- infection, we performed a comparative transcriptome analysis by integrating two public host transcriptomes of sars-cov (gse ) [ ] and mers-cov (gse ) [ ] figure b ). in addition, a list of the virus-specific antiviral-related genes was identified, including parp [ ] and cmpk [ ] for sars-cov- , bst [ ] , ifitm and usp for sars, and parp [ ] for mers-cov ( figure b ). secondly, we further used a set of human cytokines to quantify host inflammation responses among the three viruses. the cytokines from the cytoreg database were often cited by various publications and play a primary role in the immune system [ ] . our results showed that, for sars-cov- , the level of cytokine production was highly induced at hpi, decreased at hpi, and then slowly recovered thereafter ( figure c ). and mers-cov.
our analysis also provided evidence that sars-cov- had more cytokines in common with sars-cov than with mers-cov ( figure d ). next, to gain possible explanations for the distinct patterns in host antiviral capacity and cytokine production during sars-cov- infection, dynamic expression of four types of key genes was evaluated, including virus receptors for cell entry, pathogen recognition receptors (prrs) for an innate immune startup, regulator genes for induction of antiviral-related genes and interferon production (figure ) . for the three cell entry related genes (ace as the receptor of sars-cov and sars-cov- [ , ] , dpp as the receptor of mers-cov [ ] and protease tmprss for s protein priming of sars-cov- [ ] ), we observed dramatic changes in tmprss expression with very early induction during sars-cov- infection, and slightly down-regulated expression of ace in cells infected with sars-cov- and sars-cov, whereas dpp was more up-regulated in mers-cov (figure ). for the two prrs, ddx is a canonical rig-i-like receptor for rna virus recognition [ ] , and tlr is a toll-like receptor playing important roles in initiating a protective innate immune response to highly pathogenic coronavirus infections [ ] . we observed that all three viruses had a notably up-regulated expression of ddx while only mers-cov had a suppressed tlr at the hpi (figure ) , which is consistent with the fact that decreased expression of tlr contributes to the pathology of highly pathogenic coronavirus infections [ ] . among the four regulator genes, irf is responsible for the expression of most ifn-α subtypes and the type i ifn amplification loop [ ] , and irf , stat and stat form the isgf complex that binds to interferon-stimulated response elements and thereby induces the expression of interferon-stimulated genes [ ] . as expected, gradual up-regulation of the four primary regulator genes was observed for all three viruses (figure ).
finally, we found a significant difference in the expression of ifnb between sars-cov- , sars-cov and mers-cov, indicating that ifnb likely accounted for the observed variations of the host antiviral capacities among the three viruses (figure ) . taken together, early induction of tmprss and gradually increased expression level of ifnb were likely responsible for the distinct host immune response patterns of sars-cov- infection. using time-series profiling of the virus genome and host transcriptome at the same time during sars-cov- infection coupled with comparative transcriptome analysis, we found that, compared to sars-cov and mers-cov, sars-cov- induces strong host cell responses at the very early stage of infection that not only favor its high infectivity to host cells but also restrict its pathogenesis. here we sequenced the transcriptomes of sars-cov- and virus-infected host cells simultaneously during the early stages of infection, providing a robust reference dataset to speculate on the antagonistic pattern between pathogen and host cells. to summarize, our findings showed that sars-cov- induced significantly high expression of the cellular serine protease tmprss at hpi to help the entry of viral particles into cells [ ] (figure ). at the same time, the host cell initiated an immediate response to the invasion of sars-cov- ( figure d ). then, the virus successfully suppressed the acute response of host cells for fast proliferation by increasing the transcripts of its ' genome end, including the m, , a, b, and n genes, which is consistent with their reported regulation of the host immune response [ , ] . as a response from host cells, a number of antiviral pathways and cytokine productions were up-regulated to resist the virus infection (figures and ). in particular, several metabolism-associated pathways were down-regulated at hpi and hpi (figure ).
after the antagonistic cycle, a dramatic proliferation of viral particles was detected in the early infection of host cells ( figure b) , which could possibly be an explanation for the fast spread of sars-cov- in humans. as sars-cov- was reported to carry a relatively low risk of mortality [ ] compared to the other two serious human coronaviruses, sars-cov and mers-cov, we compared and contrasted the host transcriptomes in response to the viral infections. we found that some cytokines in sars-cov- -infected cells were markedly up-regulated at a very early stage, which was not observed for sars-cov and mers-cov and even less frequently observed for other viruses. the unusually high expression of cytokines at hpi possibly explains why patients with severe clinical symptoms rapidly deteriorated. although the number of infected cases was very high, the majority of infections displayed mild symptoms, which are partly explained by a gradual increase in host antiviral capability from to hpi. in contrast to sars-cov- , both sars-cov and mers-cov were able to inhibit the antiviral capability of the host significantly, which could explain their observed relatively high mortalities. mers was associated with a higher mortality than sars, which could be in part attributed to the higher expression of cytokines suppressing the antiviral responses. recently, blanco-melo et al. [ ] have published transcriptome data of host responses to sars-cov- from in vitro cell lines including a (moi of . ) and nhbe (moi of ) at hpi. these previously published data are complemented by our study, which was designed to investigate the early response phase of cell lines infected with sars-cov- . while the previous work did not observe elevated levels of ifnb , ifnl and ifnl , our findings show that not only ifnb but also ifnl and ifnl expression is up-regulated between and hpi ( figure d ).
also, they did not detect gene expression of ace and tmprss at hpi, while we observed that ace is down-regulated at hpi and tmprss is only up-regulated at hpi before returning to normal levels ( figure ) . our time-series sampling revealed distinct early-response features of sars-cov- , which provided a possible explanation for some clinical observations. for example, a recent clinical study [ ] found that sars-cov- could replicate effectively in upper respiratory tract tissues, and that the viral loads appeared earlier (before day ) and were substantially higher than expected. findings from the present study have confirmed that, at hpi, the ' end of the sars-cov- genome starts to be densely expressed, reducing the effectiveness of host immune surveillance, which possibly enables the rapid replication of sars-cov- in upper respiratory tract tissues. although several studies have already demonstrated a consistent correlation between gene expression measured by rna-seq and by microarray [ ] [ ] [ ] , we still need to exclude the possibility of bias resulting from different methodologies. first, because rna-seq can potentially detect more genes than microarrays, we only
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia clinical characteristics of coronavirus disease in china estimates of the severity of coronavirus disease : a model-based analysis from sars to mers, thrusting coronaviruses into the spotlight.
viruses sars-cov- launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid- patients regulation of early host immune responses shapes the pathogenicity of avian influenza a virus clinical features of patients infected with novel coronavirus in wuhan cell host response to infection with novel human coronavirus emc predicts potential antivirals and important differences with sars coronavirus. mbio a novel coronavirus from patients with pneumonia in china trimmomatic: a flexible trimmer for illumina sequence data star: ultrafast universal rna-seq aligner htseq--a python framework to work with highthroughput sequencing data moderated estimation of fold change and dispersion for rna-seq data with deseq genome composition and divergence of the novel coronavirus ( -ncov) originating in china limma powers differential expression analyses for rna-sequencing and microarray studies human coronaviruses: a review of virus-host interactions. 
diseases human coronavirus: host-pathogen interaction release of severe acute respiratory syndrome coronavirus nuclear import block enhances host transcription in human lung cells the interaction between the parp protein and the ns protein of h n aiv and its effect on virus replication cmpk and bcl-g are associated with type interferon-induced hiv restriction in humans tetherin inhibits hiv- release by directly tethering virions to cells rapid evolution of parp genes suggests a broad role for adpribosylation in host-virus conflicts global landscape of mouse and human cytokine transcriptional regulation pathological findings of covid- associated with acute respiratory distress syndrome dysregulation of immune response in patients with covid- in wuhan, china sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor angiotensin-converting enzyme is a functional receptor for the sars coronavirus dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc rig-i in rna virus recognition toll-like receptor signaling via trif contributes to a protective innate immune response to severe acute respiratory syndrome coronavirus infection. mbio irf- , irf- , and irf- coordinately regulate the type i ifn response in myeloid dendritic cells downstream of mavs signaling the molecular basis for functional plasticity in type i interferon signaling virological assessment of hospitalized patients with covid- a comprehensive comparison of rna-seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in saccharomyces cerevisiae correlation between rna-seq and microarrays results using tcga data large scale comparison of gene expression levels by microarrays and rnaseq using tcga data the sequencing data from this study have been submitted to the national genomics data center (https://bigd.big.ac.cn/) with the accession number prjca . 
jiang, tan and huang devised the experiment and wrote the paper; sun, wu, sheng and zhu conducted bioinformatics analysis; ye, huang and yang prepared samples; wang provided calu- cells; pan ran rna sequencing; and all authors revised the manuscript. the authors declare that they have no conflict of interest.
key: cord- -yabt jf authors: tuttle, kathryn d.; minter, ross; waugh, katherine a.; araya, paula; ludwig, michael; sempeck, colin; smith, keith; andrysik, zdenek; burchill, matthew a.; tamburini, beth a.j.; orlicky, david j.; sullivan, kelly d.; espinosa, joaquin m. title: jak inhibition blocks lethal sterile immune responses: implications for covid- therapy date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yabt jf cytokine storms are drivers of pathology and mortality in myriad viral infections affecting the human population. in sars-cov- -infected patients, the strength of the cytokine storm has been associated with increased risk of acute respiratory distress syndrome, myocardial damage, and death. however, the therapeutic value of attenuating the cytokine storm in covid- remains to be defined. here, we report results obtained using a novel mouse model of lethal sterile anti-viral immune responses. using a mouse model of down syndrome (ds) with a segmental duplication of a genomic region encoding four of the six interferon receptor genes (ifnrs), we demonstrate that these animals overexpress ifnrs and are hypersensitive to ifn stimulation. when challenged with viral mimetics that activate toll-like receptor signaling and ifn anti-viral responses, these animals overproduce key cytokines, show exacerbated liver pathology, rapidly lose weight, and die. importantly, the lethal immune hypersensitivity, accompanying cytokine storm, and liver hyperinflammation are blocked by treatment with a jak -specific inhibitor.
therefore, these results point to jak inhibition as a potential strategy for attenuating the cytokine storm and consequent organ failure during overdrive immune responses. additionally, these results indicate that people with ds, who carry an extra copy of the ifnr gene cluster encoded on chromosome , should be considered at high risk during the covid- pandemic. one sentence summary: inhibition of the jak kinase prevents pathology and mortality caused by a rampant innate immune response in mice. cytokine release syndrome (crs) or hypercytokinemia, often referred to as a 'cytokine storm', has been postulated to drive mortality during severe respiratory viral infections, such as influenza ( ) -including the spanish flu epidemic ( ) and the h n bird flu ( ) , as well as during the severe acute respiratory syndrome coronavirus epidemic of (sars-cov- ) ( ). increasing evidence supports the notion that mortality during infections with sars-cov- , which causes coronavirus disease of , is driven by the exacerbated immune response to the virus, leading to a cascade of events involving a cytokine storm and acute respiratory distress syndrome (ards), often accompanied by myocardial damage and multiple organ failure ( , ) . this pathological cascade resembles what is observed in other lethal viral lung infections, where the immune response to the virus triggers a primary wave of cytokines, including type i and type iii interferons (ifns), followed by infiltration and activation of diverse immune cells and production of secondary cytokines and chemokines, including type ii ifn (ifn-g), accelerated immune activation, and progressive decline of respiratory function ( ) . importantly, many of the cytokines and chemokines involved in the cytokine storm employ janus protein kinases (jaks) for signal transduction. cytokine analyses of covid- patients indicate an important role for the magnitude of the cytokine storm in prognosis and clinical outcome.
levels of c-reactive protein (crp) and il- at the time of hospitalization were reported to be significantly higher in patients who eventually died versus those who survived ( ) . in a different study, patients admitted to intensive care units (icus) showed significantly higher levels of il- , il- , il- , ip- , mcp- , mip- α, g-csf, and tnf-α relative to non-icu patients ( ) . using artificial intelligence methods, high levels of circulating alanine aminotransferase (alt), a marker of liver inflammation, were associated with progression toward ards ( ) . altogether, these findings support the rationale for combining antiviral treatment and targeted immunosuppression as a therapeutic approach in covid- ( ) . accordingly, several clinical trials are currently testing the safety and efficacy of inhibitors of il- signaling and jak/stat signaling. furthermore, hydroxychloroquine, a molecule shown to attenuate toll-like receptor signaling and cytokine production that is approved for the treatment of rheumatoid arthritis and systemic lupus erythematosus ( , ) , is also being tested in many clinical trials for covid- . within this context, we report here relevant results obtained during studies of the hyperactive immune response in a mouse model of down syndrome (ds). we recently discovered that trisomy (t ), the genetic cause of ds, causes consistent activation of the ifn transcriptional response in multiple cell types ( ) ( ) ( ) ( ) , which is explained by the fact that four of the six ifn receptors are encoded on chromosome (ifnar , ifnar , ifngr , il rb).
additional investigations of the proteome ( ), metabolome ( ) , and immune cell lineages ( , ) of people with ds demonstrated that t causes: ) changes in the circulating proteome indicative of chronic autoinflammation with clear dysregulation of ifn-inducible cytokines ( ) , ) activation of the ifn-inducible kynurenine pathway, leading to elevated production of neurotoxic tryptophan catabolites ( ) , and ) widespread hypersensitivity to ifn stimulation across the human immune system ( , ) . altogether, these results support the notion that ds can be understood in good measure as an interferonopathy, whereby increased ifn signaling could account for many of the developmental and clinical impacts of t . now, we report results generated during our investigation of immune responses in the dp ( )/yey mouse strain (hereafter referred to as dp ) ( ) , a widely used mouse model of ds. the dp strain carries a segmental duplication of murine chromosome that is syntenic to human chromosome , encoding ~ genes, including the four ifnrs. we found that dp mice overexpress ifnrs in all immune cell types tested, display hypersensitivity to type i and type ii ifn stimulation, overproduce key cytokines, and display increased liver inflammation. remarkably, dp mice display lethal immune responses upon challenge with viral mimetic molecules, such as the tlr agonist poly-i:c [p(i:c)] or the tlr / agonist imiquimod, a phenotype that is not observed in their wild type (wt) littermates or mouse strains carrying triplications of other genomic regions syntenic to chromosome . strikingly, both the lethal immune phenotype and associated cytokine storm are blocked by treatment with a small molecule inhibitor of the jak kinase, which also decreases liver pathology.
overall, these results have potential far-reaching significance for the treatment of covid- , justifying a deeper study of jak inhibitors as a therapeutic strategy to attenuate the cytokine storm and downstream organ failure in this condition, while also indicating that individuals with ds should be considered a high-risk population during the covid- pandemic. in order to model the impact of t on chronic innate immune responses in mice, we employed the dp mouse strain, which harbors triplication of a region syntenic to human chromosome including four ifnrs ( figure a ) ( ) . to define the relationship between gene copy number and ifnr protein expression, we evaluated surface expression of ifnar in peripheral immune cells from dp mice by flow cytometry. dp( ) yey/+ and dp( )yey/+ mice (hereafter referred to as dp and dp ), which have three copies of genes that are syntenic to chromosome , but are disomic for ifnrs ( ) , were included as controls ( figure a) . expression of ifnar on the surface of cd + white blood cells was significantly higher in dp mice, but not dp or dp animals, relative to wt littermates ( figure b) . when different immune cell types were analyzed by flow cytometry, ifnar expression was significantly higher in all cell types tested ( figure c) . these results are consistent with previous reports of mrna overexpression for all ifnrs in this gene cluster in various tissues of dp mice ( , ) . to test whether upregulation of ifnrs conferred stronger responses to ifn ligands, we stimulated wt and dp blood samples with ifn-a and evaluated downstream phosphorylation of stat (pstat ) by flow cytometry. dp cells responded more strongly to ifn-a as defined by significantly higher levels of pstat relative to wt cells, in all immune lineages examined ( figure d ). this widespread heightened response to ifn-a was not observed in dp and dp cells; however, t cell subsets from the dp strain exhibited more robust pstat ( figure e-f) . 
although dp mice have only two copies of the ifnr genes, this result suggests that some aspect of triplication of the region of chromosome syntenic to human chromosome leads to differential regulation of type i ifn signaling in t cells. given that one of the type ii ifn receptor subunits (ifngr ) is also encoded on murine chromosome and triplicated in dp mice, we next investigated the response to ifn-g stimulation. indeed, dp cells responded more strongly to ifn-g stimulation in most cell types examined ( figure g) . overall, these results suggest that elevated expression of the ifnrs in the dp mouse strain confers increased sensitivity to type i and type ii ifns. since dp cells mount enhanced responses to ifn ligands, we next investigated the response of dp mice to innate immune stimuli known to trigger an ifn response. chronic systemic induction of ifn signaling was elicited by repeated administration of the tlr agonist polyinosinic:polycytidylic acid [p(i:c)], which is known to produce an acute spike of type i ifn production ( - days), followed by low but persistent expression of type i ifn ligands and a robust cytokine response in the chronic phase ( - days) ( ) . wt and dp mice were given intraperitoneal injections of mg/kg of p(i:c) or an equivalent volume of vehicle (sham) every other day, for up to days, with the experiment completed at day . remarkably, dp mice were profoundly sensitive to the treatment, which was largely tolerated by wt mice (figure a ). body weight measurements during the course of the experiment revealed rapid weight loss in dp mice upon treatment ( figure b) . eventually, all but one of the twelve dp mice had to be removed from the study for excessive weight loss (> %). in contrast, wt animals did not lose as much weight, and all but three out of nine survived to the end of the experiment (figure a ,b). dp and dp mice also tolerated chronic immune stimulation with p(i:c) (figure c-f). 
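the removal criterion described above, where animals leave the study once weight loss exceeds a fixed percentage of baseline, can be expressed as a small check over a series of weights. the sketch below is illustrative only: the threshold of 15 percent, the weights and the function names are hypothetical, because the actual cutoff is not legible in the text.

```python
def percent_weight_loss(baseline_g, current_g):
    """Weight lost relative to baseline, as a percentage."""
    return 100.0 * (baseline_g - current_g) / baseline_g


def first_endpoint_day(weights_by_day, threshold_pct):
    """Return the first day on which weight loss exceeds threshold_pct.

    weights_by_day is a list of (day, weight in grams); the first entry
    is taken as the baseline. Returns None if the endpoint is never reached.
    """
    baseline = weights_by_day[0][1]
    for day, weight in weights_by_day[1:]:
        if percent_weight_loss(baseline, weight) > threshold_pct:
            return day
    return None


# Hypothetical weight series; the 15 % threshold is an assumed placeholder.
sensitive_mouse = [(0, 25.0), (2, 24.0), (4, 22.0), (6, 20.5)]
tolerant_mouse = [(0, 25.0), (2, 24.5), (4, 24.0), (6, 23.5)]
sensitive_day = first_endpoint_day(sensitive_mouse, threshold_pct=15.0)
tolerant_day = first_endpoint_day(tolerant_mouse, threshold_pct=15.0)
```

applying the same check to every animal yields the event times that a survival comparison between genotypes would be built on.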
although p(i:c) treatment caused some weight loss in both dp and dp mice, the percentage lost was comparable to wt levels. this p(i:c)-induced weight loss was clearly dose-dependent, as mice receiving half the lethal dose ( mg/kg) displayed only minor weight loss, with no observable differences between dp and wt mice ( figure s ). in order to determine whether hypersensitivity was restricted to tlr agonists or more broadly observed across activation of other pattern recognition receptors, we treated the mice with the tlr agonist imiquimod. topical administration of imiquimod causes ifn production, acute skin inflammation, systemic inflammation, and dehydration, and is commonly employed as a model of psoriasis ( ) . imiquimod treatment caused significant weight loss in dp mice despite daily administration of supportive fluids, which led to rapid termination of the study in observation of the humane endpoint ( figure g,h) . in contrast, wt animals receiving identical treatment maintained their weight (figure g,h) . altogether, these results indicate that the hypersensitivity to ifn stimulation observed at the cellular level in dp mice is associated with increased sensitivity and lethality to ifn-inducing viral mimetic agents at the organismal level. we next compared the immune response in dp mice relative to their wild-type littermates during the acute and chronic responses to p(i:c). first, we measured levels of circulating ifn-a hours after the first injection of p(i:c) (or sham). ifn-a levels were induced by p(i:c), with a stronger response in dp mice ( figure a) . elevated ifn ligand production in dp mice is consistent with a predicted fast-forward loop driven by increased ifnr expression in ds ( ) . we then measured mrna expression for several ifn-stimulated genes (isgs) in the spleen of wt and dp mice at hours after a single injection of p(i:c) (or sham). 
this experiment revealed strong induction of isg expression upon p(i:c) treatment, with several isgs being more significantly elevated in the dp mice than in wt animals ( figure b ). next, we measured the levels of several cytokines and chemokines in the bloodstream hours after a single p(i:c) injection. as expected, p(i:c)-treated animals showed increased levels of many pro-inflammatory factors, regardless of genotype ( figure c and figure s a ). in this acute phase, the most significantly elevated inflammatory markers were ip- (cxcl ), mcp- , rantes (ccl ), and kc/gro (cxcl ) (figure c ). on average, dp mice did not show significant differences in expression of cytokines relative to their wt littermates. then, we measured circulating cytokines and chemokines during the chronic response. we compared serum levels in dp mice, taken at the time of harvest upon reaching the humane endpoint of % weight loss, with the levels in the wt littermates that survived to the end of the -day experiment. at these later stages of chronic inflammation, several of these inflammatory markers were significantly elevated in the dp animals, such as mcp- and mip- a, but not in the wt littermates ( figure d) . altogether, these results indicate that activation of tlr signaling leads to strong induction of key cytokines and chemokines in our experimental paradigm, with several inflammatory markers being more elevated in the dp mice. although no single cytokine or chemokine stands out as the sole driver of phenotypes in the dp mice, the mild over-production of several cytokines could potentially contribute to their exacerbated immune sensitivity. next, we investigated markers of liver inflammation and injury. 
the serum levels of the enzymes alanine transaminase (alt) and aspartate transaminase (ast), two commonly used markers of hepatocyte injury, were significantly elevated upon p(i:c) treatment in the dp mice, reaching concentrations nearly an order of magnitude higher than those observed in their wt littermates ( figure a) . prompted by these results, we investigated liver pathology. liver tissue sections were stained with hematoxylin and eosin (h&e) and evaluated by a trained histologist, blinded to treatment group and genotype. scoring of liver pathology was done with procedures adapted for mice ( ) using a validated histological scoring system ( ) that included parameters such as cell injury, inflammation, and reactive changes, leading to an overall liver pathology score. this analysis revealed that dp mice have increased liver pathology even before p(i:c) treatment, as demonstrated by significantly higher levels of cell injury, inflammation, and reactive changes relative to their wt littermates (figure b,c) . this basal level of liver pathology was not observed in the dp and dp mice (figure d,e) . importantly, upon treatment with p(i:c), the livers of wt animals developed pathology scores comparable to or greater than those observed at baseline in dp mice, including signs of strong recruitment and/or expansion of inflammatory cells. when p(i:c) was administered to dp mice, all metrics of liver pathology increased, leading to significantly higher overall pathology scores relative to both dp sham treatment and wt upon p(i:c) treatment (figure b-c) . livers from dp and dp mice that received p(i:c) also experienced increases in liver pathology, but these changes did not differ from wt mice (figure d,e) . overall, these results indicate that dp mice display increased liver inflammation and pathology, both at baseline and upon immune activation, which could potentially contribute to their heightened sensitivity to ifn-inducing agents. 
all three types of ifn signaling employ the jak kinase for signal transduction, in combination with either jak or tyk (figure a) . therefore, we sought to determine whether inhibiting jak specifically would block the lethality and pathology caused by p(i:c) treatment in dp mice. to this end, we used the jak -specific inhibitor incb . animals received a dose of mg/kg of incb (or an equivalent volume of methylcellulose used as the vehicle) twice daily via oral gavage, beginning hours prior to the first p(i:c) injection, and every day during the course of the experiment. remarkably, dp mice that received the jak inhibitor all survived the p(i:c) treatment, while dp animals that received the vehicle and p(i:c) experienced weight loss and had to be euthanized ( figure a) . we then analyzed the impact of jak inhibition on cytokine production by comparing the dp mice treated with p(i:c) with and without co-treatment with the jak inhibitor. importantly, cytokine production was strongly abrogated by jak inhibition (figure b and figure s ) . lastly, we evaluated the impact of jak inhibition on markers of liver inflammation and injury. indeed, the jak inhibitor led to a reduction of circulating levels of alt and ast, as well as a decrease in overall liver pathology scores ( figure c) . altogether, these results indicate that blocking the catalytic activity of the jak kinase prevents the lethality associated with a rampant innate immune response in the dp mouse model. the efficacy of this treatment likely relies on a combination of inter-related effects, such as reduced cytokine production and improved liver function. as the covid- pandemic rapidly progresses worldwide, there is a dire need to identify new prophylactic and therapeutic strategies. 
based on recent data from analyses of inflammatory markers in covid- ( , ) and prior knowledge about inflammatory responses in other lethal viral lung infections, targeted immune-suppression is now appreciated as a potential strategy to prevent ards, fulminant myocarditis, organ failure, and mortality at advanced stages of the condition ( ). accordingly, clinical trials for diverse immune-suppressants have been started around the world, including inhibitors of il- and jak/stat signaling. here, we demonstrate a key role for the jak kinase in driving cytokine storms and accompanying mortality in a mouse model of lethal anti-viral immune responses. jak is one of several jak kinases required for mediating inflammatory signaling downstream of a large number of cytokines, including all three types of ifn signaling and il- ( ). we show that eliciting the innate anti-viral immune response with tlr agonists is lethal in a sensitized mouse model carrying triplication of four of the six ifn receptors. within days, the dp mice rapidly lose weight and die if unattended, concurrent with induction of many prominent cytokines linked to the severity of covid- -driven pathology, such as il- , tnf-α, il- , ip- , mcp- , and mip- α ( ) . furthermore, dp mice display exacerbated liver inflammation and pathology, which, together with stronger production of some cytokines, may converge into a pathological cascade leading to wasting and death. importantly, these phenotypes are blocked by jak inhibition. based on these results, we propose here that jak inhibitors are a valid therapeutic approach for attenuating the cytokine storm in covid- . 
four jak inhibitors are currently fda-approved for the treatment of diverse medical conditions: jakafi/ruxolitinib, a jak / inhibitor approved for myelofibrosis, polycythemia vera, and graft-versus-host-disease; xeljanz/tofacitinib, a jak / inhibitor approved for rheumatoid arthritis, psoriatic arthritis and ulcerative colitis; olumiant/baricitinib, a jak / inhibitor approved for rheumatoid arthritis; and rinvoq/upadacitinib, a jak inhibitor also approved for rheumatoid arthritis. many jak specific inhibitors are currently being developed and tested in clinical trials for diverse inflammatory and autoimmune conditions, including the molecule used in this study. we hypothesize that jak inhibitors could be superior agents in terms of attenuating the cytokine storm caused by covid- relative to existing anti-il agents, which consist of injectable monoclonal antibodies that inhibit the interaction between il- and its receptor il- r ( ) . in contrast, jak inhibitors are available as drugs administered orally, with very well characterized pharmacodynamics and pharmacokinetics, and may provide a more appropriate strategy to transiently tone down the cytokine storm to prevent ards and fulminant myocarditis. furthermore, jak inhibitors not only block il- signaling, but also other inflammatory pathways involved in the cytokine storm ( ) . depending on the degree of inhibition of various jak kinases, it is predicted that available jak inhibitors will block various aspects of the cytokine storm, with potential for varying degrees of therapeutic benefit. with regard to its relevance to therapeutic strategies for covid- and other conditions associated with cytokine storms, this study has several limitations. first, our mouse model does not fully reproduce the human conditions associated with cytokine storms. although pandemic. our studies of the dp mice were originally driven by our interest in studying immune dysregulation and ifn hyperactivity in ds. 
of note, our results are consistent with studies of sars-cov- demonstrating that type i ifn signaling contributes to sars-driven pathology and mortality in mice ( ) . we predict that, as has been observed during rsv ( ) and h n ( ) infections, people with ds will develop more severe complications upon sars-cov- infection, including higher rates of hospitalization and mortality. we hope these results will encourage special attention to individuals with ds, including closer monitoring of hyperinflammation during covid- , and inclusion in clinical trials for targeted immune-suppressants. table s . q-rt-pcr. q-rt-pcr was performed using the applied biosystems viia ™ -well block real time pcr system. q-rt-pcr master mix was prepared with applied biosystems sybr select master mix for cfx. standard curves were run for every primer pair in each q-rt-pcr experiment to ensure efficient amplification of target transcripts within all experimental tissues. all samples were run in triplicate, averaged and normalized to s rrna. primer sequences are provided in table s . cytokine measurements. blood cytokine levels were measured using mesoscale discovery assays and/or biolegend legend plex assays per manufacturer's instructions. all samples were analyzed in duplicate and the average used for statistical analysis. missing values were set to the lower limit of detection. liver histopathology. formalin-fixed paraffin-embedded pieces of liver were sectioned at microns and stained with hematoxylin and eosin (h&e). scoring of liver pathology used procedures adapted for mice as described ( ) . table s . primers for q-rt-pcr. primers used for q-rt-pcr analysis of interferon-stimulated genes and s rrna, including gene, refseq id, forward primer sequence, reverse primer sequence. 
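the handling of cytokine readings described in the methods above (duplicate wells averaged, missing values set to the lower limit of detection) can be sketched as follows. this is only an illustration: the function name, the use of `None` for a missing well, and the choice to impute before averaging are assumptions, not details given in the text.

```python
from statistics import mean


def summarize_duplicates(readings, llod):
    """Average replicate readings for one sample and one analyte.

    readings: list of floats, with None marking a missing value
              (hypothetical convention, not from the source).
    llod:     lower limit of detection for the analyte.
    Assumption: missing wells are replaced by the LLOD *before* averaging.
    """
    filled = [llod if r is None else r for r in readings]
    return mean(filled)


if __name__ == "__main__":
    # One well missing: it is replaced by the LLOD (1.0) and then averaged.
    print(summarize_duplicates([4.0, None], llod=1.0))
```

imputing before averaging keeps every sample in the analysis; averaging only the detected wells would be an equally plausible reading of the methods.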
the cytokine storm of severe influenza and development of immunomodulatory therapy preparing for the next pandemic confronting potential influenza a (h n ) pandemic with better vaccines an interferon-gamma-related cytokine storm in sars patients clinical predictors of mortality due to covid- based on an analysis of data of patients from wuhan, china clinical features of patients infected with novel coronavirus in wuhan into the eye of the cytokine storm towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity covid- : consider cytokine storm syndromes and immunosuppression mode of action of hydroxychloroquine in ra-evidence of an inhibitory effect on toll-like receptor signaling chloroquine and hydroxychloroquine equally affect tumor necrosis factor-alpha, interleukin , and interferon-gamma production by peripheral blood mononuclear cells trisomy consistently activates the interferon response trisomy dysregulates t cell lineages toward an autoimmunity-prone state associated with interferon hyperactivity trisomy activates the kynurenine pathway via increased dosage of interferon receptors mass cytometry reveals global immune remodeling with multilineage hypersensitivity to type i interferon in down syndrome trisomy causes changes in the circulating proteome indicative of chronic autoinflammation duplication of the entire . 
mb human chromosome syntenic region on mouse chromosome causes cardiovascular and gastrointestinal abnormalities rodent models in down syndrome research: impact and future opportunities lifespan analysis of brain development, gene expression and behavioral phenotypes in the ts cje, ts dn and dp( ) /yey mouse models of down syndrome re-entry into quiescence protects hematopoietic stem cells from the killing effect of chronic exposure to type i interferons imiquimod-induced psoriasis-like skin inflammation in mice is mediated via the il- /il- axis signaling a link between interferon and the traits of down syndrome ketohexokinase c blockade ameliorates fructose-induced metabolic dysfunction in fructose-sensitive mice design and validation of a histological scoring system for nonalcoholic fatty liver disease jak inhibition as a therapeutic strategy for immune and inflammatory diseases targeting interleukin- signaling in clinic interaction of sars and mers coronaviruses with the the establishment of reference sequence for sars-cov- and variation analysis dysregulated type i interferon and inflammatory monocyte-macrophage responses cause lethal pneumonia in sars-cov-infected mice down syndrome and the risk of severe rsv infection: a meta-analysis pandemic (h n ) virus and down syndrome patients we are grateful to all staff at the linda crnic institute for down syndrome, specially hannah dougherty, jessica baxter, dayna tracy, and ross granrath. we also thank staff at the university of colorado cancer center flow cytometry shared resource, the human immune monitoring shared resource (himsr), and the barbara davis center flow cytometry core who assisted with various aspects of this work. we are grateful to incyte corporation for providing incb for these studies and for expedited review of these results prior to publication. 
key: cord- - l d authors: wang, hai-long title: the emergence of inter-clade hybrid sars-cov- lineages revealed by d nucleotide variation mapping date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: l d i performed whole-genome sequencing on sars-cov- collected from covid- samples at mayo clinic rochester in mid-april, , generated consensus genome sequences and compared them to other genome sequences collected worldwide. i proposed a novel illustrating method using a d map to display populations of co-occurring nucleotide variants for intra- and inter-viral clades. this method is highly advantageous for the new era of “big-data” when high-throughput sequencing is becoming readily available. using this method, i revealed the emergence of inter-clade hybrid sars-cov- lineages that are potentially caused by homologous genetic recombination. the covid- pandemic caused unprecedented disruption to populations worldwide. the causative agent is the sars-cov- virus, a single-stranded positive-sense rna virus from the beta-coronavirus genus with ~ kb in genome length , . although general testing for this novel virus is rapidly expanding, targeted sequencing approaches have attracted more attention for the following reasons: a) the sars-cov- virus has evolved into mutant strains with different clinical complications. monitoring the spread of specific strains during an outbreak can help inform intervention methods and their impacts. b) vaccine efficacy can be monitored by tracking the prevalence and distribution of key genes involved in vaccine development and deployment, and by determining the viral mutation rates to inform a vaccine's longevity and its interactions with the human immune system [ ] [ ] [ ] . today, there are over , genome deposits on gisaid (the global initiative on sharing all influenza data) and over , on ncbi (national center for biotechnology information), reflecting a worldwide effort to fight against the virulent pathogen. 
the wealth of sars-cov- genome sequences provides a rare opportunity for both clinical and basic scientists to investigate its evolutionary mechanisms. on the other hand, outsized collections of genome sequences pose a computational challenge to bioinformatics analysis. currently, the most widely adopted method for illustrating species evolution, phylogenetic analysis, is inadequate for discerning fast-evolving mutations and recombination. new methods for mapping mutations within the same genome are needed for the impending "big-data" era. i propose a two-dimensional nucleotide variant mapping that illustrates populations of co-occurring mutations within the same genome. this plotting method is not limited by the increasing number of examined sequences. thus, it allows a direct comparison between genome collections from different geographic regions, or from the same region at different times. furthermore, it reveals the emergence of inter-clade hybrid lineages. as part of mayo clinic's efforts to fight against covid- , i performed viral genome sequencing using the third-generation sequencing techniques developed by oxford nanopore technology (ont). i received positive covid- samples of rna extracts collected by the department of laboratory medicine & pathology at mayo clinic rochester from april , through april , . following the artic network's amplicon sequencing protocol (see method for details), i succeeded in collecting genome sequences. a phylogenetic tree, presented in figure , compares these genome sequences to a reference genome (nc_ . ). according to the latest sars-cov- classification , there are six major viral clades (named l, s, v, g, gh and gr, respectively) represented by their unique sets of single nucleotide polymorphisms (snps). clade "l" is the reference genome (nc_ . 
) found in china; clade "s" is named after the mutation in orf :l s (with a transition mutation of t c and co-occurring with another transition mutation of c t); clade "v" is named after the mutation in orf a:g v (with a transversion mutation of g t and co-occurring with another transversion mutation of g t); clade "g" is named after the d g mutation in the spike protein (with a transition mutation of a g and co-occurring with other three transition mutations of c t, c t, c t). two derivative clades from the g clade are the "gh" clade characterized by a mutation orf a:q h (g t), and the "gr" clade characterized by the trinucleotide mutations in the nucleocapsid gene, inducing g a, g a and g c. from these genome sequences, belong to the s clade (in magenta), to the v clade (in red), to the g clade (in green), again to the gh clade (in cyan), and to the gr clade (in orange). the same coloring scheme for distinct clades will be used throughout the paper for convenience. profiling snps is another way to show frequencies of mutations within a clade. shown in figure are profiles of two dominant clades, s (fig a) and g (fig b) . high-profile snps are those with top heights representing high frequency at special genome loci. counts of less than two were omitted to reduce the background noise caused by low-profile de novo mutations. being unique to distinct clades, high-profile snps are considered as clade-defining snps. the phylogenetic analysis, as commonly used to display evolutionary relationships among species, employs algorithms to calculate the evolutionary difference between two species and uses estimated distances to draw branching patterns that display how the species evolved from a common ancestor. although a phylogenetic tree is appealing, it traditionally ignores the recombination event that is another major driving force in evolution and causes closely related species to have different phylogenetic histories. 
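the snp profile described above (the frequency of each variant locus within a clade, with counts of less than two omitted) can be sketched as follows. the function name and the representation of each genome as a set of snp positions are assumptions made for illustration; the positions in the example are arbitrary placeholders, not the loci from the paper.

```python
from collections import Counter


def snp_profile(genomes, min_count=2):
    """Count how many genomes carry a SNP at each position.

    genomes:   iterable of sets of SNP positions, one set per genome
               (hypothetical representation).
    min_count: positions seen in fewer genomes than this are dropped,
               mirroring the "counts of less than two were omitted" rule.
    Returns a dict {position: count}, ordered by position.
    """
    counts = Counter(pos for snps in genomes for pos in snps)
    return {pos: c for pos, c in sorted(counts.items()) if c >= min_count}


if __name__ == "__main__":
    # Placeholder positions: 100 and 200 recur, 300 is a singleton.
    clade = [{100, 200}, {100, 300}, {100, 200}]
    print(snp_profile(clade))
```

positions that survive the threshold with the highest counts correspond to what the text calls high-profile snps.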
sars-cov- is an rna virus with limited genome length, but its high mutation rate and homologous genetic recombination have nonetheless given rise to an exponentially increasing number of variants. classical studies based on consensus sequences become insufficient to interpret its adaptive potential. besides, drawing a phylogenetic tree for many species becomes impractical due to constraints in the drawing space (e.g. it was difficult to lay out merely sequences in figure ). an alternative method is needed for the impending era of "big-data", when the full wealth of sequencing collections is becoming available. this new method should also demonstrate the links between nucleotide variations on the same genome. here, i propose a d mapping for co-occurring snp pairs. in brief, a matrix m (n x n) is created in which "n" is the genome length. from each genome sequence, any detected co-occurring snp pair will be used to increase the value of element m_ij by , where "i" and "j" are the two positions (loci) of nucleotides from that snp pair. the final matrix is then normalized by dividing each element by the total number of genome sequences. directly plotting this matrix as a d scaled map will generate a diagonally symmetric pattern that displays the distribution of population percentages for every co-occurring snp pair in the whole genome set. however, a full-scale image including all ~ , nucleotides is too dense for the human eye to process. a simpler format including only a selected set of loci is a better way to demonstrate the concept. i plotted a bubble map with those clade-defining snps (including , , , , , , , , , , , and in the order of their nucleotide position in the genome and colored according to their representing clades). the radius of each bubble is proportional to the corresponding population percentage (see figure ). 
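the construction of the co-occurrence map described above can be sketched in a few lines. this is a minimal illustration under stated assumptions, not the author's implementation: each genome is represented as a set of snp positions, the matrix is stored sparsely as a dictionary rather than as a dense n x n array, each pair adds one count per genome, and the function name and the optional `loci` filter (for restricting the map to clade-defining snps) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_map(genomes, loci=None):
    """Fraction of genomes carrying SNPs at both positions i and j.

    genomes: list of sets of SNP positions, one set per genome.
    loci:    optional collection of positions to keep (e.g. clade-defining
             SNPs); None keeps every position.
    Returns a symmetric sparse map {(i, j): fraction}; absent pairs are 0.
    """
    n_genomes = len(genomes)
    if n_genomes == 0:
        raise ValueError("at least one genome is required")
    keep = None if loci is None else set(loci)
    counts = Counter()
    for snps in genomes:
        selected = sorted(snps if keep is None else snps & keep)
        for i, j in combinations(selected, 2):
            counts[(i, j)] += 1
            counts[(j, i)] += 1  # mirror entry keeps the map symmetric
    # Normalize by the total number of genome sequences.
    return {pair: c / n_genomes for pair, c in counts.items()}


if __name__ == "__main__":
    # Placeholder positions, not real SARS-CoV-2 loci.
    demo = [{10, 20, 30}, {10, 20}, {30}, {10, 40}]
    for pair, frac in sorted(cooccurrence_map(demo).items()):
        print(pair, frac)
```

a sparse dictionary is used because a dense matrix over the full genome length would be almost entirely zeros; the values can be passed directly to a bubble plot, with bubble radius scaled by the fraction.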
in this kind of d map, a colored bubble at an intra-clade intersection represents a population of the representing clade; while regions at inter-clade intersections are normally empty, indicating no crossover between distinct clades. in figure , however, there are two (or four, if accounting for the diagonal symmetry) regions belonging to two pairs of nucleotides at ( , ) and ( , ) that are not empty but have small black dots. the locus of belongs to the g clade and both loci of and belong to the s clade. these inter-clade co-occurring snp pairs are found in two lineages (mc and mc ) that were previously highlighted in the phylogenetic tree of figure . there are two possibilities for inter-clade crossover events: ) a de novo mutation; ) homologous genetic recombination. these two lineages both have a total of eight snps (for mc : , , , , , , , ; for mc : , , , , , , , , colored according to their representative clades). according to the snp profile of the s clade shown in figure a , the lineage mc has six high-profile snps characteristic of the s clade, while mc has five, which might be the reason why the phylogenetic tree includes them both in the s clade. other than the high-profile snp at representing the g clade, there is only one low-profile mutation in mc and two in mc . the small number of low-profile snps found in both lineages suggests that the possibility of a de novo mutation causing the high-profile snp at to occur in an s-clade lineage is very low. homologous genetic recombination, on the other hand, is a major evolutionary driving force found in all kingdoms of living organisms. viral homologous recombination, discovered in several other rna viruses, has the potential to create a novel strain with enhanced virulence [ ] [ ] [ ] [ ] . for the sars-cov- virus, several accounts of recombination events have been reported - . 
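the search for inter-clade co-occurring snp pairs discussed above can be sketched as a simple per-genome check. everything named here is an assumption made for illustration: the function name, the locus-to-clade dictionary, and the placeholder positions are hypothetical and are not the paper's actual loci. because gh and gr are derivative clades of g, pairs between g and its derivatives are treated as expected rather than as crossovers; that rule is also an assumption of this sketch.

```python
from itertools import combinations


def inter_clade_pairs(snps, locus_clade, compatible=frozenset()):
    """List pairs of clade-defining loci in one genome whose clades differ.

    snps:        set of SNP positions found in the genome.
    locus_clade: dict mapping clade-defining position -> clade name.
    compatible:  set of frozensets of clade names that are expected to
                 co-occur (e.g. a parent clade and its derivative).
    """
    present = sorted(p for p in snps if p in locus_clade)
    pairs = []
    for i, j in combinations(present, 2):
        a, b = locus_clade[i], locus_clade[j]
        if a != b and frozenset((a, b)) not in compatible:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Placeholder positions standing in for clade-defining SNPs.
    locus_clade = {100: "S", 200: "S", 300: "G", 400: "G", 500: "GH"}
    compatible = {frozenset(("G", "GH")), frozenset(("G", "GR"))}
    hybrid = {100, 200, 300, 999}   # S-clade loci plus one G-clade locus
    print(inter_clade_pairs(hybrid, locus_clade, compatible))
```

a genome that returns a non-empty list is only a candidate hybrid; as the text notes, a de novo mutation at a clade-defining locus would produce the same signal, so the number of low-profile snps still has to be weighed.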
there are several tools, including simplot , rdp , and phipack , that are specifically designed for detecting homologous recombination. however, none of these tools could detect any possible recombination events among genome sequences, possibly because such events are infrequent and involve few snps. for example, although phipack was designed to work with low-divergence genomes, it still requires ~ % of variants to perform statistical analysis. because the covid- pandemic is still relatively recent, most viral lineages have fewer than snps, a variant percentage of only . %. this is why all previously reported recombination events of sars-cov- have relied on clade-defining snps. the d co-occurring variant mapping is a simple way to display inter-clade hybrid lineages, and it can be used to directly compare intra- and inter-clade population distributions from different geographic locations, or from the same location at different time points. therefore, it is worth examining genome sequences from around the world that have been deposited into public domains. the big picture of homologous recombination. i downloaded ~ , sars-cov- genome sequences from ncbi (on september nd ) and used the same criterion as before to search for inter-clade co-occurring snp pairs (see method for details). i first looked at domestic regions in the us and focused on four states, including new york, minnesota, california, and washington, that represent different geographic regions (see figure ). the minnesota genome collection had sequences at the time of this study. its d plot shows a pattern similar to that in figure (unsurprisingly, given that mayo clinic rochester is located in minnesota), which is dominated by the s and the g clade, followed by a smaller population of the gh clade and an even less-populous gr clade, with the v clade as the least populous. there are also two inter-clade co-occurring snp pairs at ( , ) and ( , ) between the s and the g clade. 
both co-occurring snp pairs were identified in the same lineage (ncbi access id: mt ). the new york genome collection had sequences, and its d plot shows two equally dominant g and gh clades, followed by v, s, and gr clades. inter-clade co-occurring snp pairs are found between the locus at of the v clade and all other representative loci of the g and gh clades. these co-occurring snp pairs were identified in two lineages (ncbi access id: mt , mt ). the california genome collection had sequences, and its d plot shows increased differences when compared with that of new york. in california, the largest cluster still belongs to the g clade. yet, both the s and gr clades are significantly bigger than those found in new york. the v clade is the least-populous. again, inter-clade co-occurring snp pairs are found between the of the v clade and all other representative loci of the g clade and gh clade that correspond to six lineages (ncbi access id: mt , mt , mt , mt , mt , and mt ). the washington genome collection had , sequences, and its d plot shows significantly increased populations in the s and the gr clades. three groups of inter-clade co-occurring snp pairs are revealed. the first one is between the locus at from the v clade and all other major clades; the second one is found between the s and the g clades, involving both representative loci ( and ) of the s clade; the third one is at ( , / / ) between the gh and the gr clades. the long list of corresponding lineages can be found in supplemental table s . here, i focused on the second group, in which two representative loci simultaneously cross over into a distinct clade, which is stronger evidence for homologous genetic recombination. three candidate lineages were identified (also labeled with an underline in table s . ncbi access id: mt , mt , mt ). 
the lineage mt has loci ( , , , , , , , , , and ), whereas the other lineage mt has loci ( , , , , , , , and ), all colored according to distinct clades. in the case of mt , it is expected that the genome region between and of the g-clade strain was swapped with the same region in an s-clade strain. in the case of mt , however, it is expected that the region between and of the s-clade strain was swapped with the same region in a gr-clade strain. also, the absence of the locus at in mt , a signature snp of all g-(gh and gr) clade lineages, is another strong piece of evidence for homologous genetic recombination. the third lineage of mt is similar to mt but has additional low-profile snps. so far, i have demonstrated the usage of d co-occurring snp mapping for detecting inter-clade hybrid lineages of the sars-cov- virus. next, i also examined genome collections from international regions and focused on the following four countries: china, japan, italy, and spain, where early covid- outbreaks were detected (see figure ). d plots of these four regions are very different. the genome collection from china had sequences with the s clade being the most dominant clade. it is followed by the v, g, and gr clades based on the population size. no gh clade was detected. two inter-clade co-occurring snp pairs were found at ( , ) and ( , ), both are between the s and the v clades, corresponding to two lineages (ncbi access id: mt and mt ). the genome collection from japan had sequences with the gr clade as the largest cluster, followed by a smaller gh clade. the s-clade is the least-populous. no viral lineage of the v clade was detected. two inter-clade co-occurring snp pairs were found, one at ( , ) between the v and the g clades (ncbi access id: lc ), and the other one at ( , / / ) between the gh and the gr clades (ncbi access id: lc ). the genome collection from spain had sequences. two dominant clusters are the s and g clades, followed by the gr and the v clades. 
no gh clade was detected. the genome collection from italy had sequences with the g clade as the largest cluster, followed by the gr and v clades. no gh or s clade was detected. no inter-clade co-occurring snp pairs were detected in genome collections from these two countries deposited to ncbi on september nd . the covid- pandemic caused unprecedented disruption to populations worldwide. fighting the causative agent is the indisputable duty of everyone who has the relevant knowledge and skills. sequencing the viral genome can help characterize the evolutionary changes of the virus and help public health authorities understand its identity, whether it is changing, and how it is being transmitted. i procured a minion device in january originally for another sequencing project. the lessons i learned during this project are worth further discussion. i will mainly focus on the following four topics: the use of nanopore sequencing technology, pros and cons of the phylogenetic analysis, homologous genetic recombination, and the quasispecies model of rna viruses. oxford nanopore technology gained attention for its long-read sequencing technology, low cost, and portability. many researchers are using the minion device (costing $ , ) as basic laboratory equipment for high-throughput sequencing needs. the system has been previously used in other outbreak situations, such as ebov and zikv . early in march, the artic network opened a specific sequencing protocol for covid- , including a set of primers, laboratory materials, and related tools for bioinformatics analyses. the protocol relies on direct amplification of the virus using tiled and multiplexed primers, and it has been widely adopted by many groups around the world , . the cost of two flowcells (>$ /each) and other reagents is about $ , . a rough estimate of the material cost per sample is about $ ~$ . the bottleneck for the study is preparing the library before sequencing. 
it took me a whole week to prepare samples, which were all processed manually. i made some mistakes due to a lack of experience, so only consensus sequences were finally achieved. the library preparation process can be easily scaled up with a liquid handling robot. with ~ hours of data acquisition, the minion device usually generates one million reads, enough to provide results for collecting final consensus sequences. the feature of live-basecalling is necessary when using the rampart tool to monitor the on-going progress during sampling and to check whether all amplicons have been sampled. the final stage for bioinformatics analysis was handled by the nextflow covid- workflow that includes read-quality score filtering, reference alignment, and consensus sequence generation, which only takes about twenty minutes. this low cost, decentralized sequencing can be set up quickly for flexible, portable, and scalable applications. a phylogenetic tree is a diagram used to illustrate phylogeny, the history of evolution, in reference to lines of descent and relationships among broad groups of organisms. the branching patterns in a phylogenetic tree reflect how species or other groups evolved from a common ancestor. the methodology has been used increasingly in diverse areas of biological science, especially for tracing viral evolution at the early stages of the outbreak in an epidemic like covid- , , , . however, there are inherent limitations to a phylogenetic tree. first, traditional methods of phylogeny estimation, such as maximum parsimony, minimum evolution, or maximum likelihood, all assume that a single evolutionary history underlies the sequences. second, the viral network presented in the tree diagram is merely a snapshot of the early stage of the viral spreading before the phylogeny becomes obscured by subsequent migration and mutation. when the number of species in the group increases, drawing a tree diagram becomes impractical. 
third, commonly used methods in phylogeny, such as the pairwise distance method, do not capture the effects of homologous recombination, which implies that different parts of the sequence can have separate phylogenetic histories and are not related by a single phylogenetic tree . though the ability to detect recombination is limited, ignoring recombination in tree-based analysis could lead to artifacts , . in the course of evolution, viruses are continuously changing as a result of genetic selection through two principal mechanisms: ) de novo mutation that occurs when an error is incorporated in the genome and causes subtle genetic changes; ) recombination that happens when co-infecting viruses swap genetic information and cause major genetic changes . there are two mechanisms of viral recombination: ) independent assortment, where viruses with multipartite genomes trade unlinked segments that assort randomly during replication; ) incomplete linkage, where viruses trade linked genes residing on the same piece of nucleic acid. either mechanism can produce new viral serotypes with altered virulence . recombination has been shown to occur in several positive-sense single-stranded rna virus groups, such as retroviruses, picornaviruses, and coronaviruses. it is currently believed that recombination in coronaviruses occurs by a copy-choice mechanism , in which the viral rna polymerase binds to only a few bases of the template rna. the weak interaction permits the polymerase to dissociate from the original template and then associate with a new template rna strand. the efficiency of this mechanism of recombination is relatively low. viral recombination generates novel progeny viruses that express new antigenic characteristics, with new surface proteins to infect previously resistant individuals, or with novel combinations of proteins to infect new cells in the original host. on the other hand, viral recombination has also been used to create new vaccines.
vaccinia virus strains of low virulence that carry genes coding for viral antigens have been produced to stimulate specific antibody production by the host, resulting in the protection of the host from the immunogen. therefore, characterizing viral recombination will not only help health management teams effectively contain emerging viruses with enhanced virulence, but also support virologists in creating new vaccines. due to their limited genome length and higher mutation rate, many rna viruses have genetically diverse populations known as quasispecies , meaning a highly diverse replicating population that reaches an equilibrium level of diversity. in traditional population genetics, the frequency of a given variant in a population is approximated by its fitness. however, the quasispecies model suggests that variants are "coupled" in sequence space. a low fitness variant can thereby be maintained at a higher than expected frequency because it is coupled to a well-represented, higher fitness genotype in sequence space . instead of focusing on ancestor-descendant relationships, the quasispecies model puts more weight on co-occurring variants within the same genome and their corresponding populations, which is exactly the purpose of this d mapping for co-occurring variants. the emergence of inter-clade hybrid sars-cov- lineages indicates that the causative agent of covid- has entered a phase of quasispecies. while tracing the path of viral infection is still important, estimating the level of viral diversity is also urgently needed. in order to survive, a virus must be diverse enough to rapidly adapt to changing environments without losing fitness during the passage from host to host. the level of diversity varies depending on the host environment and is specific to the virus-host combination. the differences in viral clade populations from separate regions are clearly displayed in figures and .
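the idea of mapping co-occurring variants can be illustrated with a minimal sketch. it assumes each genome has already been reduced to the set of reference positions at which it differs from the reference; the function name, the genome ids, and the positions are hypothetical and are not taken from the author's scripts.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(genome_variants):
    """Count how many genomes carry each pair of variant positions together.

    genome_variants maps a genome id to the set of reference positions
    at which that genome differs from the reference (assumed input format).
    """
    counts = Counter()
    for positions in genome_variants.values():
        # every unordered pair of positions found in the same genome co-occurs once
        for a, b in combinations(sorted(positions), 2):
            counts[(a, b)] += 1
    return counts


# hypothetical toy data: ids and positions are made up for illustration only
toy = {
    "genome_a": {100, 250, 900},
    "genome_b": {100, 250},
    "genome_c": {250, 900},
}
pair_counts = cooccurring_pairs(toy)
```

each key of `pair_counts` is an (x, y) coordinate pair and each value is the number of genomes sharing both variants, so the result could be drawn as a scatter plot with marker size proportional to the count.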
predictions from the quasispecies theory have profound implications for the understanding of the viral disease. the d plotting method for co-occurring variants is a useful tool for a longitudinal study to monitor the level of diversity from a region with certain environments, including the ethnogenesis population, the policies enforced by local health authorities, and viral immunity in the area. in summary, i performed whole-genome sequencing of the sars-cov- virus and compared those genome sequences with other samples collected worldwide. i proposed a d plotting method to illustrate co-occurring variants, as it has several advantages: ) it directly shows the population of coupled mutations without any statistical manipulation; ) it is suitable for displaying big data with no limit on the size of the genome collection (actually the more the better); ) it is flexible with regard to the plotting technique: even though the full picture with the whole range of genome length contains an overwhelming amount of information, a simplified snapshot can be generated from a subset of the sequence space; ) it helps identify the emergence of inter-clade hybrid viral strains that have both positive and negative potential impacts on all of us. specimen collection and testing: individuals with suspected covid- at mayo clinic rochester were screened for sars-cov- infections as per the cdc protocol. samples of covid- positive rna extracts collected between april , and april , were provided by the department of laboratory medicine & pathology under mayo clinic's approved protocol for covid- study. detailed whole genome sequencing and analysis methodologies came from the artic network . briefly, cdna synthesis was performed with a superscript iv first strand synthesis kit (thermo) according to manufacturer specifications. direct amplification of the viral genome cdna was performed using multiplexed primer pools per protocols provided by the artic network (v ).
the sequencing library preparation protocol was adapted from the artic network's protocol for ont "pcr tiling of covid- virus". adapter ligation and cleanup were performed using ont's ligation sequencing kit (sqk-lsk ) according to manufacturer specifications. libraries were sequenced on ont's minion device using type r . . flow cells (flo-min d). raw sequencing data were acquired using ont's minknow software ( . . ) running on a custom-built dell precision t desktop computer equipped with an nvidia rtx ti graphics processing unit. fast live-basecalling was enabled for barcode detection. high-accuracy basecalling of nanopore reads was later performed using guppy v . . on an hp prodesk computer equipped with an nvidia rtx gpu. automatic genome analyses were performed with the artic covid- nextflow software package for read-quality score filtering, reference alignment, and consensus sequence generation. the evolutionary history of sars-cov- was inferred using a bayesian approach, implemented through the markov chain monte carlo framework available in beast . . , utilizing the beagle library v to increase computational performance. for poorly sampled locations, the bayesian inference of unsampled genomes was associated with an epidemiologically informed sampling time and location, but not with observed sequence data. the phylogenetic tree was then plotted using figtree (v . . ), which accompanies the beast package. the consensus genome sequences have not been deposited to any public database, pending institutional approval. about , sars-cov- genome sequences were downloaded from ncbi on september nd , from which incomplete sequences with less than two-thirds of the usual ~ , bp genome length were discarded. each of the remaining , genome sequences was individually aligned to the reference sars-cov- genome (nc_ . ) using the needle tool (needleman-wunsch global alignment of two sequences) from the emboss package v . . . .
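the length filter described above can be sketched as follows. this is only an illustration under stated assumptions: the helper names are hypothetical, the reference length is passed in as a parameter because its value is not given here, and the author's actual perl, python, and r scripts are not reproduced.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def drop_incomplete(records, reference_length):
    """Keep sequences that cover at least two-thirds of the reference length.

    Integer arithmetic (3 * len >= 2 * reference) avoids floating-point
    rounding exactly at the two-thirds boundary.
    """
    return {h: s for h, s in records.items()
            if 3 * len(s) >= 2 * reference_length}


# hypothetical toy input; a real run would read a downloaded FASTA file
toy_fasta = [
    ">full", "ACGTACGTACGT",
    ">borderline", "ACGTACGT",
    ">short", "ACGTACG",
]
kept = drop_incomplete(read_fasta(toy_fasta), reference_length=12)
```

with a toy reference length of 12, the 8-base sequence sits exactly on the two-thirds boundary and is kept, while the 7-base sequence is discarded.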
all nucleotide positions were based on the reference genome. variants of snps were named using the snp-sites software . custom scripts in perl, python, and r were used for various calculations and graphics.
acknowledgements i thank dr. matthew j. binnicker, whose lab provided rna extracts of covid- samples. i also thank drs. gregory a. worrell and thomas p. burghardt for their valuable comments. i especially thank my son michael and my daughter michelle who provided conceptual inputs to this work and proofreading on the initial draft of this manuscript. no special funding was received for this work, although the author is supported by nih brain initiative r grant (ns from ninds and nimh) and the minnesota partnership for biotechnology and medical genomics (mnp # . ). these funding sources had no role in the design of the study, in the collection/analysis/interpretation of the data, in the writing of the report, or in the decision to publish. the corresponding author had access to all of the data in the study and had final responsibility for the decision to submit this work.
key: cord- -bwe f xf authors: bojadzic, damir; alcazar, oscar; chen, jinshui; buchwald, peter title: small-molecule in vitro inhibitors of the coronavirus spike – ace protein-protein interaction as blockers of viral attachment and entry for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bwe f xf inhibitors of the protein-protein interaction (ppi) between the sars-cov- spike protein and ace , which acts as a ligand-receptor pair that initiates the viral attachment and cellular entry of this coronavirus causing the ongoing covid- pandemic, are of considerable interest as potential antiviral agents. while blockade of such ppis with small molecules is more challenging than with antibodies, small-molecule inhibitors (smis) might offer alternatives that are less strain- and mutation-sensitive, suitable for oral or inhaled administration, and more controllable / less immunogenic. here, we report the identification of smis of this ppi by screening our compound-library that is focused on the chemical space of organic dyes. among promising candidates identified, several dyes (congo red, direct violet , evans blue) and novel drug-like compounds (dri-c , dri-c ) inhibited the interaction of hace with the spike proteins of sars-cov- as well as sars-cov with low micromolar activity in our cell-free elisa-type assays (ic s of . - . μm); whereas, control compounds, such as sunset yellow fcf, chloroquine, and suramin, showed no activity. protein thermal shift assays indicated that the smis identified here bind sars-cov- -s and not ace . selected promising compounds inhibited the entry of a sars-cov- -s expressing pseudovirus into ace -expressing cells in concentration-dependent manner with low micromolar ic s ( - μm). 
this provides proof-of-principle evidence for the feasibility of small-molecule inhibition of ppis critical for coronavirus attachment/entry and serves as a first guide in the search for smi-based alternative antiviral therapies for the prevention and treatment of diseases caused by coronaviruses in general and covid- in particular. covid- , which reached pandemic levels in early (who; march , ) , is caused by the severe acute respiratory syndrome-coronavirus (sars-cov- ) ( - ). sars-cov- is the most infectious agent in a century ( ) , having already caused more than forty million infections and a million deaths worldwide. this coronavirus (cov) is an enveloped, positive-sense rna virus with a large rna genome of roughly . kilobases and a diameter of up to about nm, characterized by club-like spikes emerging from its surface ( , ) . it is the most recently emerged among the seven covs known to infect humans. they include four covs that are responsible for about a third of the common cold cases (hcov e, oc , nl , and hku ) and three that caused epidemics in the last two decades associated with considerable mortality: sars-cov- ( ) ( ) , ~ % mortality), mers-cov (middle east respiratory syndrome coronavirus; , ~ % mortality), and now sars-cov- ( - ), which seems to be less lethal but more transmissible ( , ) . while the sars-cov- situation is still evolving, current estimates indicate that about % of infected individuals need hospitalization and the average infection fatality ratio (ifr, percentage of those infected that do not survive) is around . % but in a strongly agedependent manner, i.e., increasing from . % in < years old to . % in those > years old ( ) (to be compared with an ifr of < . % for influenza). this created unprecedented health and economic damage and a correspondingly significant therapeutic need for possible preventive and/or curative treatments. 
as future covs that are highly contagious and/or lethal are also likely to emerge, novel therapies that could neutralize multiple strains are of particular interest especially as the large who solidarity trial suggests that repurposed antiviral drugs including hydroxychloroquine, remdesivir, lopinavir, and interferon-β , appear to have little or no effect on hospitalized covid- patients, as indicated by overall mortality, initiation of ventilation, and duration of hospital stay ( ) . viral attachment and entry are of particular interest among possible therapeutic targets in the life cycle of viruses ( ) because they represent the first steps in the replication cycle and take place at a relatively accessible extracellular site; they have indeed been explored for different viruses ( ) . covs use the receptor-binding domain (rbd) of their glycosylated s protein to bind to cell specific surface receptors and initiate membrane fusion and virus entry. for both sars-cov and sars-cov- , this involves binding to angiotensin converting enzyme (ace ) followed by proteolytic activation by human proteases ( , , , ) . hence, blockade of the rbd-hace protein-protein interaction (ppi) can disrupt infection efficiency, and most vaccines and neutralizing antibodies (nabs) aim to abrogate this interaction ( , ) . cov nabs, including those identified so far for sars-cov- , primarily target the trimeric s glycoproteins, and their majority recognizes epitopes within the rbd that binds the ace receptor ( ) ( ) ( ) ( ) ( ) . it would be important to have broadly cross-reactive nabs that can neutralize a wide range of viruses that share similar pathogenic outcomes ( ). the s proteins of sars-cov, mers-cov, and sars-cov- have similar structures with - amino acids and rbds spanning about residues and consisting of core and external subdomains, with the rbd cores being responsible for the formation of s trimers -similarities that allow the possibility of broad neutralization ( , ). 
sars-cov and sars-cov- share ~ % amino acid identity in their s proteins ( , ) ; nevertheless, most current evidence indicates that sars-cov antibodies are not cross-reactive for sars-cov- ( ). for example, one study found that none of the rbd-specific monoclonal antibodies derived from single b cells of eight sars-cov- infected individuals cross-reacted with sars-cov or mers-cov rbds ( ). antibody-like monobodies designed to bind to the sars-cov- s protein also did not bind that of sars-cov ( ). as a further complicating factor, rna viruses accumulate mutations over time, which yields antibody resistance and requires the use of antibody cocktails to avoid mutational escape ( ). in addition to being too highly target-specific, antibodies, like all protein therapies, are hindered by problems related to their solubility, unsuitability for oral or inhaled administration, and immunogenicity. being foreign proteins, they can themselves act as antigens and elicit a strong immune response in certain patients ( ) ( ) ( ) , and this is only further exacerbated by their long elimination half-lives ( ) . even among fda approved therapeutics, there were more postmarket safety issues with biologics than with small-molecule drugs ( ) . hence, peptides or small molecules can offer alternative approaches. some peptide disruptors of this ppi have also been reported, but so far none have been very effective ( , [ ] [ ] [ ] . more importantly, because of poor bioavailability, metabolic instability (short half-life), lack of membrane permeability, and other issues, developing peptides into clinically approved drugs is difficult and rarely pursued ( , ) . small molecules traditionally were not considered for ppi modulation because they were deemed unlikely to be successful due to the lack of well-defined binding pockets on the protein surface that would allow their adequate binding.
during the last decade, however, it has become increasingly clear that smis can be effective against certain ppis. there are now > ppis targeted by smis that are in preclinical development ( ) ( ) ( ) ( ) ( ) ( ) ( ) , and two of them (venetoclax ( ), lifitegrast ( )) were recently approved by the fda for clinical use ( , ). notably, the success of two small-molecule drugs that target hiv- entry and are now approved for clinical use, enfuvirtide and maraviroc, validates this strategy of antiviral drug discovery. maraviroc targets the c-c motif chemokine receptor (ccr ), a host protein used as a co-receptor during hiv- entry, and it is a noncompetitive allosteric inhibitor that stabilizes a conformation no longer recognized by the viral envelope ( , ) . hence, it is an allosteric smi of a ppi, highlighting the feasibility of such an approach to prevent viral entry. interestingly, maraviroc has been claimed recently to inhibit the sars-cov- s-protein mediated cell fusion in cell culture ( ). therefore, smis could yield antiviral therapies that are more broadly active (i.e., less strain-and mutation-sensitive), more patient friendly (i.e., suitable for oral or inhaled administration), less immunogenic, and more controllable (shorter half-life / better biodistribution) than antibodies ( ). oral bioavailability offers a major advantage for access, wide-spread usage, and compliance ( ) a requisite for longterm and broadly acceptable preventive use ( - ). for covid- , the possibility of direct delivery into the respiratory system via inhaled or intranasal administration is also important and unlikely to be achievable for antibodies. broadly specific activity could make possible multi-strain or even pan-cov inhibition, and while it is unlikely with antibodies ( , ), it is possible for smis. for example, we have shown that while the corresponding antibodies did not cross-react for the human vs mouse cd -cd l ppi, our smis did so and had about similar potencies ( , ). 
since previously we found that starting from organic dyes one can identify smis for cosignaling ppis as potential immunomodulatory agents ( , - ), we initiated a screen of such compounds for their ability to inhibit the sars-cov- -s-ace ppi. this led to the identification of several organic dyes ( - , figure ) that show promising inhibitory activity of this ppi in vitro, including methylene blue ( ), a phenothiazine dye approved by the fda for the treatment of methemoglobinemia, which we have described separately ( ). more importantly, it also led to the identification of new and more potent smis ( - ) that are more drug-like and no longer contain color-causing chromophores as summarized below. as part of our work to identify smis for co-signaling ppis that are essential for the activation and control of immune cells, we discovered that the chemical space of organic dyes, which is particularly rich in strong protein binders, offers a useful starting point. accordingly, it seemed logical to explore it for possible inhibitors of the sars-cov- s protein -ace ppi that is an essential first step for the viral entry of this novel, highly infectious coronavirus. we were able to set up a cell-free elisa-type assay to quantify the binding of sars-cov- s protein (as well as its sars-cov analog) to their cognate receptors (human ace ) and used this to screen our existing in-house compound library containing a large variety of organic dyes and a set of colorless analogs prepared as potential smis for costimulatory ppis. these maintain the main molecular framework of dyes but lack the aromatic azo chromophores responsible for the color as they are replaced with amide linkers ( , ). all new compounds used here were synthesized as described before as part of our effort to identify novel smis for the cd -cd l costimulatory ppi ( , ). 
synthesis involved one or two amide couplings (using a modified version of the procedure from ( )) and a hydrogenation (using a modified version of the procedure from ( )). these steps were used with different linkers and naphthyl moieties as needed for each structure; all corresponding details are summarized in the supplementary material (supplementary methods and supplementary schemes s -s ). all structures tested here ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) are shown in figure . as a first step, we explored the feasibility of setting up screening assays using a cell-free elisa-type format similar to those used in our previous works with fc-conjugated receptors coated on the plate and flag-or his-tagged ligands in the solution ( , - ). concentrationresponse assessments of binding to ace indicated that both the s and rbd portions of sars-cov- -s bind strongly and follow classic sigmoid patterns corresponding to the law of mass action ( ) with a slightly stronger binding for rbd than s ( figure ). fitting of data gave median effective concentrations (ec s) and hence binding affinity constant (kd) estimates of . and . nm, respectively ( and ng/ml) -in good agreement with the specifications of the manufacturer (sinobiological; wayne, pa, usa) and published values indicating a low nanomolar range ( - nm) typically based on surface plasmon resonance (spr) studies ( ) . because we are interested in possible broad-spectrum inhibitors, we also performed concentration-response assessments of the binding of sars-cov and hcov-nl s proteins (using their s &s and s domains, respectively) as they also use ace as their cognate receptor. sars-cov bound with about similar potency as sars-cov- ( . nm; ng/ml), whereas hcov-nl had significantly lower affinity ( . nm, ng/ml) ( figure ). based on this, we first used this assay to screen for inhibitors of sars-cov- rbd binding, which showed the strongest affinity to hace . 
in fact, this assay setup is very similar to one recently shown to work as a specific and sensitive sars-cov- surrogate virus neutralization test based on antibody-mediated blockage of this same ppi (cov-s-ace ) ( ) . we screened our in-house library of organic dyes plus existing analogs together with a few additional compounds that are or have been considered of possible interest in inhibiting sars-cov- by different mechanisms of action, e.g., chloroquine, clemastine, and suramin ( , [ ] [ ] [ ] [ ] . screening at µm indicated that most have no activity and, hence, are unlikely to interfere with the s-protein -ace binding needed for viral attachment. nevertheless, some showed activity (supplementary material, figure s ). compounds showing the strongest activity, i.e., rose bengal, erythrosine b (erb), and phloxine b, are known promiscuous smis of ppis ( ). being nonspecific, they are of no value here; they were included as positive controls. this screening also identified methylene blue (meblu, ), a phenothiazine dye approved by the fda for the treatment of methemoglobinemia and also used for several other therapeutic applications in the developed world ( ) ( ) ( ) and with additional potential for certain developing world applications such as malaria ( ) , as showing promising inhibitory activity for the sars-cov- -s-hace ppi, likely contributing to its anti-cov activity ( ) ; this has been discussed separately ( ). next, detailed concentration-response assessments were performed to establish inhibitory activity (ic ) per standard experimental guidelines in pharmacology and experimental biology ( , ) . these confirmed that indeed several organic dyes as well as non-dye dri compounds inhibited this ppi in a concentration-dependent manner with low micromolar ic s ( figure ) . for example, among tested dyes, congo red (cgrd, ), direct violet (dv , ), evans blue (evbl, ), chlorazol black (chbk, ), and calcomine scarlet b (csc b, ) had ic s of . , . , . , .
, and . µm, respectively. further, we also found several dri compounds of low micromolar activity including some, such as dri-c ( ) and dri-c ( ), with even better submicromolar ic s ( and nm, respectively). for the compounds tested here, concentration dependencies were adequately described by a standard log inhibitor vs response model (i.e., a classical sigmoid binding function with a hill slope of ( )). sunset yellow fcf (fd&c yellow # ), an azo dye and an fda approved food colorant included as a possible negative control, showed no inhibitory activity. notably, neither chloroquine nor suramin showed inhibitory activity in this assay. we tested chloroquine, an anti-parasitic and immunosuppressive drug primarily used to prevent and treat malaria, because it was the subject of considerable controversy regarding its potential antiviral activity against sars-cov- ( ) . we also tested suramin, an antiparasitic drug approved for the prophylactic treatment of african sleeping sickness (trypanosomiasis) and river blindness (onchocerciasis), because it was claimed to inhibit sars-cov- infection in cell culture by preventing binding or entry of the virus ( ) and because it was one of the first compounds we found to inhibit the cd -cd l ppi ( ). on the other hand, erythrosine b (erb, fd&c red # ), an fda approved food colorant that we found earlier to be a promiscuous ppi inhibitor and have been using as positive control in such assays, inhibited with an ic of . µm, similar to its activity found for other ppis tested before ( - µm) ( ). for a few representative compounds, we also tested their ability to inhibit not just the binding of sars-cov- -rbd but also that of sars-cov- -s to hace . we obtained similar potencies; e.g., dri-c had an ic of . µm ( % ci of . - . µm) for s (supplementary material, figure s ) vs . µm ( % ci of . - . µm) for rbd ( figure ). this confirms that these are indeed real inhibitory activities relevant for the s protein -hace ppi of interest. 
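the log inhibitor vs response model mentioned above can be sketched in a few lines. this is a minimal illustration, not the fitting procedure used in the study: the hill slope is assumed to be 1 because its value is not given in the text, the function names are hypothetical, the data points are synthetic, and a coarse grid search stands in for proper nonlinear regression.

```python
def percent_binding(conc, ic50, top=100.0, bottom=0.0, hill=1.0):
    """Sigmoid inhibition curve: response falls from top to bottom as conc rises.

    The Hill slope of 1 is an assumption made for this sketch.
    """
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def fit_ic50(concs, responses, candidates):
    """Return the candidate IC50 with the smallest sum of squared errors."""
    def sse(ic50):
        return sum((percent_binding(c, ic50) - r) ** 2
                   for c, r in zip(concs, responses))
    return min(candidates, key=sse)


# synthetic, noise-free data generated from a hypothetical IC50 of 2.0 (same units as concs)
concs = [0.1, 1.0, 10.0, 100.0]
responses = [percent_binding(c, 2.0) for c in concs]
best = fit_ic50(concs, responses, candidates=[0.5, 1.0, 2.0, 4.0, 8.0])
```

by construction the response is halfway between top and bottom when the concentration equals the ic50, which is what makes the ic50 a convenient single-number summary of potency.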
more importantly, we also assessed the ability of selected promising compounds to inhibit the binding of sars-cov-s to ace using a similar setup. as shown in figure , several of the same compounds including organic dyes (cgrd, dv , and others) as well as dri compounds showed similar activity against sars-cov as against sars-cov- . for compounds tested in this assay such as cgrd, cvn, evbl, csc b, dri-c , and dri-c the ic s were . , . , . , . , . , and . µm (figure ) , respectively -values that are similar to those obtained for sars-cov- inhibition (figure ), raising the possibility of broad-spectrum anti-cov activity. besides activity, it is also important to achieve adequate selectivity, specificity, and safety. to become promising lead candidates, small-molecule compounds are usually expected to show > -fold selectivity over other possible pharmacological targets of interest ( , ). as a counterassay, here we assessed inhibitory activity against the tnf-r -tnf-α interaction, as we have done before ( , ). most of the dyes found here to inhibit the sars-cov- -ace ppi ( figure ) seem to be relatively promiscuous as they also inhibited the tnf-r -tnf-α ppi ( figure ) showing only some limited selectivity, e.g., -fold for cgrd ( . vs . µm) as one of the best and only . -fold for dv ( . vs. . µm). on the other hand, several dri-c compounds showed good, more than -fold selectivity, e.g., > -fold for dri-c ( . vs µm). the symmetric dri-c seems an exception that was the most potent in all assays but showed no selectivity ( . vs . µm). as these dri-c compounds were designed to target cd -cd l, they all inhibit that ppi with high nanomolar -low micromolar potency (see discussion). to establish whether these smis bind to cov-s or ace , we used a protein thermal shift assay (differential scanning fluorimetry or thermofluor assay) ( , ) as we did before for cd l ( ). 
this assay quantifies the shift in protein stability caused by binding of a ligand via use of a dye whose fluorescence increases when exposed to hydrophobic surfaces, which happens as the protein starts to unfold as it is heated and exposes its normally buried hydrophobic core residues. it allows rapid and inexpensive evaluations of the temperature-dependence of protein stability using real-time pcr instruments and only small amounts of protein. it is sensitive enough to assess small-molecule ppi interference and can be used even as a screening assay ( ). as shown in figure , the presence of cgrd or dri-c caused clear left-shifts in the melting temperature (tm) of the protein for sars-cov- -rbd, but not ace (purple vs. blue lines) indicating the former as the binding partner. this is encouraging, as smis targeting the s-protein are much more likely to ( ) not cause undesirable side effects than ace -targeting ones, which could interfere with ace signaling, and ( ) be more broadly specific due to the structural similarity of the different cov s glycoproteins. binding of a ligand usually results in an increase (right-shift) of the melting temperature due to stabilization of the protein; however, cases with a decrease (hence, destabilization) have also been reported ( ) including for the ebola virus glycoprotein ( ). for a set of selected active compounds, we were able to confirm that they also inhibit viral entry. this has been done with pseudoviruses bearing the sars-cov- s spike protein (plus fluorescent reporters) and generated using bacmam-based tools. these allow quantification of viral entry, as they express bright green fluorescent protein that is targeted to the nucleus of ace -(and red fluorescence reporter) expressing host cells (here, hek t), but can be handled using biosafety level containment, as they do not replicate in human cells. a day after entry, host cells express green fluorescence in the nucleus, indicating pseudovirus entry. 
if entry is blocked, the cell nucleus remains dark. in this assay, several of the smis we tested, for example cgrd, dv , and dri-c , showed good concentration-dependent inhibition as illustrated by the corresponding images and bar graphs in figure . fitting with regular concentration response curves indicated a very encouraging ic of . µm for dri-c . cgrd and dv also inhibited, but with higher ic s ( and µm, respectively), which is not unexpected for such azo dyes as they tend to lose activity in cell-based assays due to nonspecific binding ( figure c ). as a first safety assessment, in parallel with the cell assays, we also evaluated cytotoxicity for several compounds in the same cells and at the same concentrations using a standard mts assay to ensure that effects are present at non-toxic concentration levels. notably, chloroquine already showed noticeable cytotoxicity at µm concentrations in this assay with hek t cells, so its effect on pseudovirus entry could not be reliably evaluated there and hydroxychloroquine was used instead. we have shown before that compounds such as dri-c ( ) or dri-c ( )
results obtained here confirm again that the chemical space of organic dyes can serve as a useful starting platform for the identification of smi scaffolds for ppi inhibition. organic dyes need to be good protein binders; hence, they contain privileged structures for protein binding ( - ) and can provide a better starting point toward the identification of smis of ppis than most drug-like screening libraries, whose chemical space has been shown to not correspond well with that of promising ppi inhibitors ( ) ( ) ( ) . using this strategy, we have identified promising smis for the cd -cd l costimulatory interaction ( , , ) and even some promiscuous smis of ppis ( ).
of course, because most dyes are unsuitable for therapeutic applications due to their strong color and, in the case of azo dyes, their quick metabolic degradation ( , ) , structural modifications are needed to optimize their clinical potential ( , ). here, we explored the potential of this approach to identify smis for the ppi between ace and cov spike proteins as potential antivirals inhibiting attachment. since sars-cov- uses its s protein via its rbd to bind ace as the first step of its entry ( , , , ) , targeting these proteins is a viable therapeutic strategy, and work with prior zoonotic cov has demonstrated proof-of-concept validity for such approaches. by screening our compound library spanning the chemical space of organic dyes, we identified several promising smis including dyes, such as congo red and direct violet , as well as novel drug-like compounds, such as dri-c , that ( ) inhibited the sars-cov- -s-hace ppi with low micromolar activity ( following the emergence of sars-cov in the early s, a limited number of groups performed high-throughput screening (hts) assays to identify inhibitory drug candidates for targeting various early steps in its cell invasion. identified candidates included some putative smis of viral entry, for example, ssaa e ( ) and ve ( ) . inhibitory candidates acting by other mechanism identified included, for example, ssaa e , ssaa e ( ); mp , he ( ) ; arb - , arb - ( ); ke ( ) ; and others (reviewed in ( , , ) ). most of these showed activity only in the low micromolar range, e.g., . , . , and . µm for ssaa e , k , and ve , respectively ( ). even if these compounds showed some evidence of inhibiting cov infection, no approved preventive or curative therapy is currently available for human cov diseases. in addition to the relatively low (micromolar) potency, a main reason for this is that these compounds were not suitable for clinical translatability. 
they could not pass the pre-clinical development stage and enter clinical trials due to their poor bioavailability, safety, and pharmacokinetics ( ). note that by starting from a different chemical space and not from that of drug-like molecules typically used for hts, our best smis identified here are already well within this low micromolar range for sars-cov- . there also was a recent attempt at identifying possible disruptors of the sars-cov- -s-rbd-ace binding using alphalisa assay based hts of , small-molecule drugs and pre-clinical compounds suitable for repurposing that identified possible hits ( ) . however, these were also of relatively low potency (micromolar ic s). none of them shows resemblance with the scaffold(s) identified here -highlighting again the known lack of overlap between the chemical space of existing drugs / drug-like structures and that of ppi inhibitors. the s protein is a homotrimer with each of its monomer units being about kda, and it contains two subunits, s and s , mediating cell attachment and fusion of the viral and cellular membrane, respectively ( , ). the rbd of the s protein is located within the s domain and is known to switch between a standing-up position for receptor binding and a lying-down position for immune evasion ( , ) . covs can utilize different receptors for binding, but several covs, even from different genera, can also utilize the same receptor. sars-cov- is actually the third human cov utilizing ace as its cell entry receptor, the other two being sars-cov and the αcoronavirus hcov nl ( ). mers-cov recognizes dipeptidyl peptidase (dpp ) ( ) ( ) ( ) , while hcov e recognizes cd ( ) . some β-coronaviruses (e.g., hcov oc ) bind to sialic acid receptors ( ) . 
having access to broadly cross-reactive agents that can neutralize a wide range of antigenically disparate viruses that share similar pathogenic outcomes would be highly valuable from a therapeutic perspective ( ), and smis are less specific and could yield therapies that are more broadly active (i.e., less strain-and mutation-sensitive) than antibodies, which tend to be highly specific. we have shown before that while the corresponding antibodies are species specific for the cd -cd l ppi, our smis could inhibit both the human and mouse system with similar potencies ( , ). hence, it is feasible that smi structures can be identified that in addition to inhibiting sars-cov- , also inhibit other covs, including the high lethality sars-cov and mers-cov as well as the common cold causing hcovs. along these lines, it is very encouraging that smis identified here target the cov-s protein and not ace ( while the smis identified here are not very small structures (mw in the to da range), they are still relatively small compared to typical smis of ppis. these tend to have larger structures to achieve sufficient activity, and they often severely violate the widely used "rule-offive" criteria, which, among others, requires mw < ( ) . in the last two decades, this "rule" has been used as a guide to ensure oral bioavailability and an adequate pharmacokinetic profile. nevertheless, an increasing number of new drugs have been launched recently (including the two small-molecule ppi inhibitors discussed earlier) that significantly violate these empirical rules proving that oral bioavailability can be achieved even in the "beyond rule-of-five" chemical space ( ) . hence, our results provide further proof for the feasibility of smi for cov attachment and provide a first map of the chemical space needed to achieve this. 
finally, these dri-c structures ( ) ( ) ( ) ( ) ( ) ( ) were originally intended to modulate co-signaling interactions, specifically to inhibit the cd -cd l costimulatory interaction, and they do so with low micromolar potency in cell assays (≈ µm) ( , ). while some show good selectivity vs tnf (e.g., dri-c , dri-c ), others seem more promiscuous (e.g., dri-c ). tnf-inhibitory activities here were somewhat stronger than those we obtained before, e.g., ic s of . vs ( ) for erb or vs > ( ) for dri-c , possibly due to the use of a different blocking buffer. we hope that these ppi inhibitory activities can ultimately be separated, but even if they cannot and the compounds retain some activity in modulating co-signaling interactions, this might not necessarily be counterproductive. it could provide a unique opportunity to pursue dual-function molecules that, on one hand, have antiviral activity by inhibiting the interaction needed for cov attachment (e.g., sars-cov- -s-ace ) and, on the other, possess immunomodulatory activity to rein in overt inflammation (inhibiting cd -cd l) or to unleash t cell cytotoxicity against virus-infected cells (inhibiting pd- -pd-l ). targeting of the pd- co-signaling pathway could be particularly valuable for its potential in restoring t cell homeostasis and function from an exhausted state ( , ) , which is of interest to improve viral clearance and rein in the inflammatory immune response and the associated cytokine storm during anti-viral responses such as those likely implicated in the serious side effects seen in many covid- patients ( , ( ) ( ) ( ) . notably, the overexuberant immune response seen in covid- has raised the possibility that the lethality related to infection with sars-cov- may be related to an uncontrolled autoimmune response induced by the virus ( ) , and the presence of auto-antibodies against type i ifns in patients with life-threatening covid- has now been confirmed ( ) .
in conclusion, screening of our library of organic dyes and related novel drug-like schemes s -s ). sars-cov- s and rbd (cat. no. -v h and -v h), sars-cov s +s (cat. no. with . μg/ml sars-cov- rbd, . μg/ml ace with . μg/ml sars-cov- s , and . μg/ml ace with μg/ml sars-cov s s . . these values were selected following preliminary testing to optimize response (i.e., to produce a high-enough signal at conditions close to half-maximal response, ec ). binding assessments for tnf-r -tnf-α were performed as previously described using tnf-r at . μg/ml and tnf-α at . μg/ml ( ), with the exception of using superblock as the blocking buffer here. stock solutions of compounds at mm in dmso were used. this assay was used as described before ( ) and following standard protocols from literature ( , ) to establish which protein binds our compounds. sypro orange h. formazan levels were measured using a plate reader at nm. all binding inhibition assays were performed as at least duplicates per plates, and all results shown are average of at least two independent experiments. as before ( - ), binding data were converted to percent inhibition and fitted with standard log inhibitor vs. normalized response models ( ) using nonlinear regression in graphpad prism (graphpad, la jolla, ca, usa) to establish half-maximal effective or inhibitory concentrations (ec , ic ). ace , angiotensin converting enzyme ; cov, coronavirus; ppi, protein-protein interaction; sars, severe acute respiratory syndrome; smi, small-molecule inhibitor. binding curves and corresponding ec s are shown for sars-cov- (rbd and s ), sars-cov (s &s ), and hcov-nl (s ). they were obtained using fc-conjugated hace coated on the plate and his-tagged s , s s , or rbd added in increasing amounts as shown with the amount bound detected using an anti-his-hrp conjugate (mean ± sd for two experiments in duplicates). concentration-response curves obtained for the inhibition of the ppi between sars-cov- -rbd (his-tagged, . 
µg/ml) and hace (fc-conjugated, µg/ml) in cell-free elisa-type assay with dye (a) and non-dye (b) compounds tested. the promiscuous ppi inhibitor erythrosine b (erb) and the food colorant fd&c yellow no. (sunset yellow, sy) were included as a positive and negative controls, respectively. data are mean ± sd from two experiments in duplicates and were fitted with standard sigmoid curves for ic determination. estimated ic s are shown in the legend indicating that while suramin and chloroquine were completely inactive (ic > µm), several of our in-house compounds including organic dyes (cgrd, dv , and others) as well as proprietary dri-c compounds (e.g., dri-c , dri-cc ) showed promising activity, some even at sub-micromolar levels (ic < µm). figure . concentration-dependent inhibition of sars-cov-s s binding to ace by representative compounds of the present study. concentration-response curves obtained for the inhibition of the ppi between sars-cov-s s (his-tagged, µg/ml) and hace (fc-conjugated, µg/ml) in cell-free elisatype assay by selected representative dye and non-dye compounds. data and fit as before ( figure ). most compounds including several dri-c compounds show similar activity against sars-cov (i.e., sars-cov- ) as against sars-cov- raising the possibility of broad-spectrum activity. concentration-response curves obtained for the inhibition of this important tnf superfamily ppi in similar cell-free elisa-type assay as used for the cov-s-ace ppis to assess selectivity. data and fit as before ( figure ). as the ic values indicate, dri-c compounds showed more than -fold selectivity in inhibiting the cov-s ppi vs the tnf ppi. log c (µm) schemes s to s figures s to s for coupling ′-nitro[ , ′-biphenyl]- -carboxylic acid ( ) was synthesized by two steps as described in the literature ( ) . as described before ( ), for the synthesis of and as a general procedure of coupling a modified version of the coupling reaction from reference ( ) hz, h), . (t, j = . 
hz, h), . (q, j = . hz, h), . (t, j = . hz, h); promising smi hits to investigate (> % inhibition) … non-inhibitors figure s . concentration-dependent inhibition of sars-cov- -s binding to ace by compounds of the present study. concentration-response curves obtained for the inhibition of the ppi between sars-cov- -s (his-tagged, . µg/ml) and hace (fc-conjugated, µg/ml) in cell-free elisa-type assay with compounds tested. data are mean ± sd from two experiments in duplicates and were fitted with standard sigmoid curves for ic determination as in figure . the science underlying covid- : implications for the cardiovascular system cytokine release syndrome in severe covid- how does sars-cov- cause covid- ? estimating the burden of sars-cov- in france repurposed antiviral drugs for covid- ; interim who solidarity trial results. medrxiv inhibitors of viral entry cell entry mechanisms of sars-cov- structure of the sars-cov- spike receptor-binding domain bound to the ace receptor characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine cytokine storm in a phase trial of the anti-cd monoclonal antibody tgn london's disastrous drug trial has serious side effects for research small molecules drive big improvements in immunooncology therapies postmarket safety events among novel therapeutics approved by the us food and drug administration between and covid- : drug targets and potential treatments the spike protein of sars-cov -a target for vaccine and therapeutic development investigation of ace nterminal fragments binding to sars-cov- spike rbd. biorxiv current challenges in peptide-based drug discovery the current state of peptide drug discovery: back to the future? 
small-molecule inhibitors of protein-protein interactions: progressing towards the dream reaching for high-hanging fruit in drug discovery at proteinprotein interfaces synthesis and evaluation of non-basic inhibitors of urokinase-type plasminogen activator (upa) structure-activity relationships of analogues of nf confirm nf as the most potent and selective known p x receptor antagonist a single unified model for fitting simple to complex receptor response data a sars-cov- surrogate virus neutralization test based on antibody-mediated blockage of ace -spike proteinprotein interaction chloroquine for the novel coronavirus sars-cov- a sars-cov- protein interaction map reveals targets for drug repurposing candidate drugs against sars-cov- and covid- suramin inhibits sars-cov- infection in cell culture by interfering with early steps of the replication cycle methylene blue lest we forget you--methylene blue efficacy and safety of primaquine and methylene blue for prevention of plasmodium falciparum transmission in mali: a phase , single-blind, randomised controlled trial methylene blue has a potent antiviral activity against sars-cov- in the absence of uv-activation in vitro. 
biorxiv experimental design and analysis and their reporting ii: updated and simplified guidance for authors and peer reviewers scaffolds for blocking protein-protein interactions designing focused chemical libraries enriched in protein-protein interaction inhibitors using machine-learning methods prediction of protein-protein interaction inhibitors by chemoinformatics and machine learning methods rationalizing the chemical space of protein-protein interaction inhibitors metabolism of azo dyes: implication for detoxication and activation toxicological significance of azo dye metabolism by human intestinal microbiota novel inhibitors of severe acute respiratory syndrome coronavirus entry that act by three distinct mechanisms identification of novel small-molecule inhibitors of severe acute respiratory syndrome-associated coronavirus by chemical genetics development and validation of a high-throughput screen for inhibitors of sars cov and its application in screening of a , -compound library targeting membrane-bound viral rna synthesis reveals potent inhibition of diverse coronaviruses including the middle east respiratory syndrome virus targeting ace -rbd interaction as a platform for covid therapeutics: development and drug repurposing screen of an alphalisa proximity assay. 
biorxiv characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov highly conserved regions within the spike proteins of human coronaviruses e and nl determine recognition of their respective cellular receptors structural basis for human coronavirus attachment to sialic acid receptors experimental and computational approaches to estimate solubility and permeability in drug discovery and development setting beyond the rule of : lessons learned from abbvie's drugs and compound collection the many faces of the anti-covid immune response restoring function in exhausted cd t cells during chronic viral infection the pathogenesis and treatment of the 'cytokine storm' in covid- immune checkpoint inhibitors: a physiology-driven approach to the treatment of coronavirus disease cytokine release syndrome in covid- : innate immune, vascular, and platelet pathogenic factors differ in severity of disease and sex covid- : infection or autoimmunity rapid chromatographic technique for preparative separations with moderate resolution nih image to imagej: years of image analysis photoactivatable hno-releasing compounds using the retro-diels-alder reaction dri-c ) yellow solid ( . g, %) ( % pure by hplc analysis (uv spectra at nm)). h-nmr ( mhz, dmso-d ): δ . (s, h) hz, h), . - . (m, h), . (d, j = . hz, h), . (d, j = . hz, h), . - . (m, h), . (t, j = . hz, h), . (t, j = . hz, h), . (q, j = . hz, h), . (t, j = . hz, h); c-nmr ( mhz, dmso-d nitrobenzamido)biphenyl- -ylcarboxamido)naphthalene- -sulfonic acid ( , dri-c ) preparation of compound followed the synthetic scheme of scheme s . the general procedure for the coupling reaction as described earlier was followed with '-( -nitrobenzamido)biphenyl- -carboxylic acid ( mg, . mmol) and -aminonaphthalene- -sulfonic acid ( mg, . mmol) to give the triethylamine salt of the title compound as a yellowish solid δ . (s, h), . (s, h), . (br, h), . (d, j = . hz, h), . - . (m, h), . - . 
(m, h), . (d, j = . hz, h), . (d, j = . hz, h), . (d, j = . hz, h), . (t, j = . hz, h), . (q, j = . hz, h) dri-c ) preparation of compound followed the synthetic scheme of scheme s . the general procedure for the coupling reaction as described earlier was followed with '-( -nitrobenzamido)biphenyl- -carboxylic acid ( mg, . mmol) and -aminonaphthalene- -sulfonic acid ( mg, . mmol) to give the triethylamine salt of the title compound as a yellowish solid ( mg, %) (> % pure by hplc analysis δ . (s, h), . (s, h), . (d, j = . hz, h) hz, h), . (q, j = . hz, h), . (t, j = . hz, h); c nmr ( mhz dri-c ) preparation of compound followed the synthetic scheme of scheme s . the general procedure for the coupling reaction as described earlier was followed with -nitrobenzoic acid ( mg, . mmol) and the triethyl amine salt of -( -( -aminopyridin- -yl)benzamido)naphthalene- -sulfonic acid ( mg, . mmol) to give the triethylamine salt of the title compound as a yellowish solid ( mg, %) (> % pure by hplc analysis (uv spectra at nm)). h nmr ( mhz, dmso-d ): δ . (s, h), . (s, h) -nitrobenzamido)thiophen- -yl)benzamido)naphthalene- -sulfonic acid ( , dri-c ) preparation of compound followed the synthetic scheme of scheme s . the general red solid ( mg, %) (> % pure by hplc analysis dmso-d ): δ . (s, h) hz, h) hz, h), . (d, j = . hz, h), . (t, j = . hz, h) dmso-d ): δ dri-c ) preparation of compound followed the synthetic scheme of scheme s . the general white solid ( mg, %) (> % pure by hplc analysis (uv spectra at nm)) mhz, dmso-d ): δ . (s, h), . (s, h), . (s, h), . (d, j = . hz, h), . (d, j = . hz, h), . (d, j = . hz, h), . (d, j = . hz, h), . (d, j = . hz, h), . (d, j = . hz, h), . - . (m, h), . (d, j = . hz, h), . (t, j = . hz, h) dri-c ) preparation of compound followed the synthetic scheme of scheme s . , '-solid ( mg, %) (> % pure by hplc analysis (uv spectra at nm)). h nmr ( mhz, dmso-d ): δ . 
(s, h)
research and education center at the university of florida (supported by funding from nih s od - a ) for their prompt and professional service.
key: cord- - g v authors: teng, xufei; li, qianpeng; li, zhao; zhang, yuansheng; niu, guangyi; xiao, jingfa; yu, jun; zhang, zhang; song, shuhui title: compositional variability and mutation spectra of monophyletic sars-cov- clades date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: g v covid- and its causative pathogen sars-cov- have rushed the world into a staggering pandemic in a few months and a global fight against both is still going on. here, we describe an analysis procedure where genome composition and its variables are related, through the genetic code, to molecular mechanisms based on understanding of rna replication and its feedback loop from mutation to viral proteome sequence fraternity including effective sites on replicase-transcriptase complex. our analysis starts with primary sequence information and identity-based phylogeny based on , sars-cov- genome sequences and evaluation of sequence variation patterns as mutation spectrum and its permutations among organized clades tailored to two key mechanisms: strand-biased and function-associated mutations. our findings include: ( ) the most dominant mutation is c-to-u permutation whose abundant second-codon-position counts alter amino acid composition toward higher molecular weight and lower hydrophobicity albeit assumed most slightly deleterious. ( ) the second abundance group includes: three negative-strand mutations u-to-c, a-to-g, g-to-a and a positive-strand mutation g-to-u generated through an identical mechanism as c-to-u. ( ) a clade-associated and biased mutation trend is found attributable to elevated level of the negative-sense strand synthesis. ( ) within-clade permutation variation is very informative for associating non-synonymous mutations and viral proteome changes.
these findings demand a bioinformatics platform where emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes to provide biological and clinical information after logical convergence for effective pharmaceutical and diagnostic applications. such thoughts and actions are desperately needed, especially in the middle of the war against covid- .
the recent threats from sars-cov- , sars-cov, and mers-cov are different from those of earlier human coronaviruses (covs), including alphacoronaviruses, such as hsa-cov- e, hsa-cov-nl , hsa-cov-oc and hsa-cov-hku [ - ], in at least two aspects. first, the recent groups of betacoronaviruses appear to have emerged more frequently in the past two decades as compared to the early comers, where new members may be discovered as technology becomes more efficient and accurate [ ] . the current sars-cov- is also different from both intermediates that allow mismatch repair despite the existence of short and extremely rare double-stranded rna fragments involved in interference-based immunity [ , ] . third, the existence of wobble basepairing for secondary structures is essential for operational functions of all rna molecules in addition to genetic information inheritance [ ] . that said, we can now look at how the rna genome of sars-cov- and related covs take advantage lower g+c contents. gc -associated mutations often reflect directional mutation patterns as observed strongly in certain lineages of plants and warm-blooded vertebrates as negative gradients from the transcription starts, and such trends are attributable to a special dna repair mechanism, transcription-coupled dna repair [ ] [ ] [ ] .
the notion here is to remind ourselves that transcription-centric mutations may account for some of the mutation events in rna ratg and mja-betacov-p l, which are considered to be distantly related but most closely are considered here, half of the codons are not sensitive to cp changes, and most of them are smaller amino acids ( figure s b ; [ - ]) . second, at the cp , g and c contents are both pulled apart toward extremity but not a or u, while the two pyrimidines and two purines appear stretched in separate directions; these trends suggest strong selective pressure at the first codon position over the entire genome. indeed, cp codons shoulder the most mutation pressure since they fall into all negative-sense strand permutations (known as r -derived permutations, c-to-u, g-to-a, u-to-c and a-to-g). third, the cp contents are most row-flipping changes referenced to the genetic code organization [ ] . these alterations are very useful for alternating chemical characteristics between related amino acids, and in terms of flexibility, cp codons are less stringent than cp but more flexible than cp . the balancing power becomes more obvious when orfs or proteins are examined individually for their composition dynamics ( figure c ). finally, we conclude that, in principle, the more similar the covs are in composition dynamic parameters, the closer they are genetically and phylogenetically. however, primary parameters, such as g+c and purine contents, are necessary but may not be sufficient. for instance, there has been a cov genome isolate from a wild vole captured in northeastern china, whose g+c and purine contents overlap with sars-cov- completely (rodent coronavirus isolate rtmruf-cov- /jl ; . , . ; [ ]) but its genome sequence is different (sharing . % identity with sars-cov- ). therefore, we have yet to find a within-population immediate animal host of sars-cov- despite the close similarity of composition dynamics seen among them.
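the codon-position composition parameters discussed above (g+c and purine contents at each codon position) can be computed along these lines. this is a minimal sketch under stated assumptions: the function name and the returned dictionary layout are hypothetical, the input is assumed to be a single in-frame orf, and it is not the authors' pipeline.

```python
def codon_position_contents(orf):
    """Return G+C and purine (A+G) fractions at codon positions 1-3.

    Assumes `orf` is an in-frame coding sequence; DNA input is accepted
    and converted to RNA letters. Trailing bases that do not complete a
    codon are ignored.
    """
    seq = orf.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    contents = {}
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        n = len(bases)
        contents[f"cp{pos + 1}"] = {
            "gc": sum(b in "GC" for b in bases) / n,
            "purine": sum(b in "AG" for b in bases) / n,
        }
    return contents


# Toy example with three codons (AUG GCU UAA); not real viral sequence.
toy = codon_position_contents("AUGGCUUAA")
```

comparing such per-position values across genomes is one way to see why overlapping g+c and purine contents alone do not imply sequence identity, as the vole isolate example shows.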
our subsequent study is focused on composition dynamics within cov genomes. it is interesting to see uniformity among all codon position contents of all cov genomes: an increase in g+c content from antigenomes to subgenomes. however, this trend is an illusion; the real trend is the lower g+c content of antigenomes but higher g+c contents of subgenomes due to stronger selection over structural proteins. this observation becomes clearer when all orfs and proteins are scrutinized one by one ( figure c and figures s ). sars-cov- has an exceptionally short subgenome (sg ) which only contains orf , but we have no evidence that it is either functional or non-functional. these results collectively remind us that sars-cov- and its two most-closely-related covs, unlike in the case of many other known by stratifying the data into structured and non-structured clades; the former can be analyzed first and the rest await further ideas. the next is even more troublesome. assuming that we have or more genome sequences per cov isolate and variations identified among them are still a minuscule fraction of the total virions produced in a patient's body (medians and means of variations per cov isolate among c to c , see table s ), since the viral load per patient sample, such as sputum [ , ] , is equivalent to a -person or more sampling of the entire human population on earth, out of . even so, we have still occasionally been able to find shared variations among patient samples, and we were even luckier to find some clade structures, by and large due to the relatedness of the patients in the transmission network. finally, we have to admit that many assumptions have to be made about these samples and their genome sequences beyond sequencing and assembly errors for phylogeny and genetic studies. nevertheless, we have constructed a somewhat stable phylogenetic tree-and-branch structure for further analysis ( figure a ).
it is composed of monophyletic clades and non-monophyletic clade based on both the order of sample collection dates and highly-shared mutations. among the clades, c shares two landmark mutations, c u in orf ab and u c in orf , and has an earlier collection date ( / / ). c shares three more mutations (c u, a g, and c u in orf ab) than c has, and has a late collection date ( / / ). clades c , c , and c are also distinguishable by some major mutations, as are c , c , and c ; the latter clades are clustered together based on four shared and other clade-associated mutations. the large number of leftover isolates that lack all landmark mutations are grouped into c , which has the earliest collection date, / / . according to the literature and our discussion, we have further grouped the clades into three clusters, s (c and c ), g (c , c and c ), and l (all the rest) since phylogeny shows clear divergence among them. between-clade analysis of high major allele frequency (maf) variations reveals that some clade-associated signature mutations are also shared among clades. for instance, c u in orf ab and a g in cluster s have recurred in other clades of different clusters, which are excellent landmarks for subclade definition. another observation is that higher maf within-clade mutations (such as maf> . ) are mostly non-synonymous mutations, indicating selection at work ( figure s ). our neighbor-joining tree based on distances from clades suggests that sars-cov- appears to have originated from multiple zoonotic reservoirs instead of a single direct ancestor ( figure s ). in addition, our classification rationales are largely in unrooted phylogenetic tree is shown in figure b . to look for clade-associated compositional and functional features, we have first built a consensus sequence for each clade and subsequently calculated frequencies for each within-clade permutation (table s ; figure a and b).
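the consensus-then-count step described above can be illustrated with a small sketch. the function names are hypothetical, the input is assumed to be a set of equal-length, already aligned genomes from one clade, and counting every genome-versus-consensus difference once is an assumption; the paper may instead count each variant site once.

```python
from collections import Counter


def consensus(seqs):
    """Majority base at each alignment column (equal-length sequences)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def permutation_spectrum(seqs):
    """Fraction of each consensus-to-variant change (e.g. 'C-to-U') in a clade.

    Only unambiguous A/C/G/U differences are counted; gaps and ambiguity
    codes are skipped.
    """
    cons = consensus(seqs)
    counts = Counter()
    for seq in seqs:
        for ref, alt in zip(cons, seq):
            if ref != alt and ref in "ACGU" and alt in "ACGU":
                counts[f"{ref}-to-{alt}"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


# Toy clade of four aligned 4-base "genomes"; not real data.
clade = ["ACGU", "ACGU", "AUGU", "ACGA"]
cons = consensus(clade)
spectrum = permutation_spectrum(clade)
```

comparing such spectra between clades is the kind of within-clade permutation analysis the text refers to, here reduced to its simplest form.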
a key assumption behind this is that certain functional mutations may have clade-specific effects on mutation spectrum, to close a loop where sequence mutations through genetic coding principles alter the viral proteome function. our observations are of importance in establishing logics about compositional dynamics between nucleic acids and proteins. first, permutations among clades are indeed variable according to their proportions calculated from genome variants, and aside from high- proportion permutations, r and r permutations, two other r and one r permutations appear also joining in, which are u-to-g and a-to-c, as well as a-to-u, respectively. second, the variable permutations, where some may represent effect of mutation pressure and others may exaggerate selection pressure, are unique to clades and clade clusters. for instance, clade cluster s has the lowest g-to-u fraction as compared to those of l and g; in addition, among the s clades, c has the lowest value of g-to-u. similarly, c , c , c , c , and c have relatively higher g-to-u permutations. third, based on the disparity of permutations or simply mutation spectra, we have taken a rather radical step to assume rtc statuses in favor of either tight or loose statuses for binding to purines and pyrimidines (see figure s ). since purines are larger than pyrimidines in size, the purine-or r-tight must be different from pyrimidine-or y-tight. the results are strikingly predictable in that the r-tight status suggests have discrete definitions for these so-called tight statuses but the less trendy r-loose and y- loose statuses also support a similar idea. we have further examined the compositional subtleties among the clades and clusters with a focus on g+c and purine content variability as both contents appear drifting toward optima in sars-cov- and its relatives ( figure c and d). 
different clades exhibit distinct compositional features and such dynamics are very indicative for the existence of feedback loops connecting rna variables to protein variables. two directions have to be advised for understanding these features albeit in absence of between-clade statistics. the first direction is driven by strong mutations, perhaps coupled to tight-loose switches in the catalytic pocket of rdrps in rtcs. it is clear that except c , the g or c -c -c cluster has the lowest g+c ( . , based on a c cov sampled in australia) and the lowest purine contents ( . , based on a c cov collected in bangladesh and a c cov collected in england). both lower g+c and purine contents are indicative of mutation pressure and signal this fast-evolving cluster of covs. since this cluster has the largest collection of covs, it is also not surprising to see a more complex median diversification within clades ( figure c and d). the second direction is the drive from selection or both selection and mutation in balance or imbalance, as well as in modes of fine-tuning or quick-escaping. some results from our analyses are shown here for briefing purposes ( figure e to g). for instance, g+c and purine contents at cp are informative for mutation drives and other measures are less clear cut ( figure f ), given the evidence that even mafs among clades are not stably distributed among clades as lower maf variations are rather sporadic and hard to analyze even binned into groups (data not shown). based on our clade and clade cluster analysis, it is tempting for us to speculate that there are plenty of rooms for further investigations into mutation spectra among large clades and even smaller clades or closely related individual cov genomes for several reasons. first, all high-frequency mafs should be identified and classified and these variations are candidates for highly selected mutations. 
second, all within-clade minor but not rare alleles (less than / , ), such as those with mafs in the range of . % to %, should also be identified; they provide the basis for within-clade sequence analysis. third, all non-structured cov genomes must also be classified based on shared variations, as they are valuable not only for within-clade but also for clade-cluster analyses, since there is a large background of genome variations not yet brought into the databases. within-clade compositional dynamics can also be very informative, especially for uncovering and predicting future functional changes, such as identifying mutated and diversified forms of covs for drug and vaccine design. they are also essential for nucleic acid-based diagnostics, such as clade-specific identification. we are in the process of developing an interactive database and mutation-function prediction algorithms based on our results to interpret novel sequence variations in real time. within-population variations are identified against the clade consensus sequence after alignment and extracted from datasets that contain hundreds to thousands of genome sequences. the analysis of within-population variations relies on structured phylogeny and proportional changes of permutations. the changes, based on functional relevance, can be classified as copy number-related, rtc-specificity-related, or sometimes both. we have taken two steps to extract information in order to distinguish the underlying mechanisms (figure ). in the first step, we identify key mutations based on their maf, with consideration of a relatively even distribution among subclades, and name the subclades in sequential order based on the absence of a subset ( figure a ). in the second step, we plot permutations to track changes among subclades ( figure b ). for instance, clade c can be divided into subclades and its variable permutation fractions are clearly recognizable. 
an immediate discovery is the trends of descending c-to-u, ascending a-to-g, and wavy g-to-u that initially goes up with a-to-g but rides down with c-to-u afterward. taking the two smaller clades, such as c and c , as examples ( figure d and e), we first find that their trends of permutation variables show opposite directions, where the increasing c-to-u accompanies with the decreasing g-to-u. a closer examination reveals that the increasing c-to-u in c is also accompanied by descending u-to-c. the only permutation showing an increasing trend in c is g-to-a. the take-home message from these trends is that rna synthesis of this subclade is biased toward producing more negative-sense strands or its mutation spectrum exhibits increasing mutations generated during the negative-sense table s . cov-related (cdr-betacov-b ), sars-cov- related (raf-betacov-ratg ), and nl - related (taf-alphacov-nl ; both species and cov genera were labelled for clarity). third, we also added more informative cov genome sequences to enrich lineage-associated information, which are a pangolin coronavirus genome (mja-betacov-p l) reported to be closed to sars-cov- and non-beta-coronaviruses that infect animals (e.g., ave-gamacov from gamma-coronavirus genus and smu-alphacov-ws from alpha-coronavirus genus). ' end of the genome, which is rather a result of, in terms of mechanism, the increased u content and c-to-u permutation. the negative gradient of u is also obvious from the ' end to the ' end. we use a -bp sliding window with a -bp step to show dynamic changes of genome g+c, purine, gc to gc , and gc skew (g-c/g+c) contents. the complete genome sequences and data sources are listed in table s . (colored and uncolored backgrounds) and cp and cp relative to their permutations sensitivity and changes are indicated with half parentheses with color-coding: c-to-u|u-to-c and a-to- g|g-to-a, red; g-to-u|u-to-g and a-to-c and c-to-u, blue; and a-to-u|u-to-a and g-to-c|c- to-g, green. 
note that cp and cp are sensitive to column and row codon swaps, respectively. cp is in a unique position where only half of the codons are sensitive to its changes, and the other half is so organized that some codons are more permissive than others. coronaviruses lacking exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics rampant c-->u hypermutation in the genomes of sars-cov- and other coronaviruses: causes and consequences for their short-and long-term evolutionary evidence for host- dependent rna editing in the transcriptome of sars-cov- a scenario on the stepwise evolution of the genetic code a content-centric organization of the genetic code on the organizational dynamics of the genetic code the pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids an accessory to the 'trinity': sr-as are essential pathogen sensors of extracellular dsrna, mediating entry and leading to subsequent type i ifn responses sars coronavirus pathogenesis: host innate immune responses and viral antagonism of interferon codon--anticodon pairing: the wobble hypothesis the transcript-centric mutations in human genomes distinct contributions of replication and transcription to mutation rate variation of human genomes compositional gradients in gramineae genes asymmetric substitution patterns in the two dna strands of bacteria comparative analysis of rodent and small mammal viromes to better understand the wildlife origin of emerging infectious diseases viral load of sars-cov- in clinical samples detection of sars-cov- in different types of clinical specimens a dynamic nomenclature proposal for sars-cov- lineages to assist genomic epidemiology on the origin and continuing evolution of sars-cov- disease and diplomacy: gisaid's innovative contribution to global health a novel bat coronavirus closely related to sars-cov- contains natural insertions at the s /s cleavage site of the 
spike a pneumonia outbreak associated with a new coronavirus of probable bat origin sars-cov- viral spike g mutation exhibits higher case fatality rate the d g mutation in sars-cov- spike increases transduction of multiple human cell types spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity nanopore target sequencing for accurate and comprehensive detection of sars-cov- and other respiratory viruses transmission and evolution of the middle east respiratory syndrome coronavirus in saudi arabia: a descriptive genomic study genomic surveillance elucidates ebola virus origin and transmission during the outbreak reconstructing the initial global spread of a human influenza pandemic: a bayesian spatial-temporal model for the global spread of h n pdm origins and evolutionary genomics of the swine-origin h n influenza a epidemic decoding the evolution and transmissions of the novel pneumonia coronavirus (sars-cov- / hcov- ) using whole genomic data comparative analysis of coronavirus genomic rna structure reveals conservation in sars- like coronaviruses the china national genebank horizontal line owned by all, completed by all and shared by all world data centre for microorganisms: an information infrastructure to explore and utilize preserved microbial strains worldwide muscle: multiple sequence alignment with high accuracy and high fasttree --approximately maximum-likelihood trees for large alignments interactive tree of life (itol) v : recent updates and new developments phangorn: phylogenetic analysis in r using ggtree to visualize data on tree-like structures full names of the human covs are listed in the legend of figure key: cord- -uihof k authors: beddingfield, brandon; iwanaga, naoki; zheng, wenshu; roy, chad j.; hu, tony y.; kolls, jay; bix, gregory title: the integrin binding peptide, atn- , as a novel therapy for sars-cov- infection 
date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: uihof k many efforts to design and screen therapeutics for the current severe acute respiratory syndrome coronavirus (sars-cov- ) pandemic have focused on inhibiting viral host cell entry by disrupting ace binding with the sars-cov- spike protein. this work focuses on the potential to inhibit sars-cov- entry through a hypothesized α β integrin-based mechanism, and indicates that inhibiting α β integrin interaction with ace and the spike protein using a novel molecule, atn- , represents a promising approach to treat covid- . sars-cov- spike protein interaction with integrins, and specifically α β . we performed similar assays to investigate ace binding to α β , using a mixture of atn- and human ace protein (hace ). inhibition of ace /α β binding by atn- was clear and dose-dependent (fig. c) . application of atn- did not reduce binding of trimeric spike protein to hace , but did reduce binding of monomeric spike ( figure s ). the in vitro assessment of atn- and its therapeutic potential was performed using a once-passaged vero (e ) african green monkey (chlorocebus aethiops) kidney cell line and replication-competent sars-cov- . atn- was effective at reducing viral loads after infection (fig. a) , with an estimated ic of . µm. the ec value for atn- approximates the value for remdesivir . measuring cellular viability and underlying cytotoxicity is another metric for antiviral therapeutic potential that we explored with atn- . after hours of infection at an moi of . , cells were lysed with celltiterglo and luminescence values were taken to measure atp production in each treatment. pretreatment with atn- increased atp production in infected cells, indicating increased viability, and was consistent with viral pcr data at concentrations as low as µm atn- (fig. b) . addition of µm atn- resulted in a decreased cytopathic effect (i.e. 
fewer apparent rounded, bright cells) when cells were visualized by phase contrast microscopy (fig. c) . in summary, we show that sars-cov- spike protein binds to both α β and α β /hace , and that this binding can be effectively inhibited by atn- , which also disrupts sars-cov- infection in vitro. prophylactic treatment with atn- increased cell viability in the presence of sars-cov- and decreased the cytopathic effects associated with viral infection. taken together, and in light of atn- 's previously demonstrated in vivo therapeutic efficacy against a closely related beta-coronavirus (porcine hemagglutinating encephalomyelitis virus ) and its successful use in human cancer clinical trials , these results support in vivo studies to assess the potential efficacy of atn- as an experimental therapeutic agent for covid- . veroe cells (atcc# crl- ) were cultured in complete dmem containing % fetal bovine serum (fbs). sars-cov- stock from viral seed (sars-cov- ; -ncov/usa-wa / ; bei# nr- ) was obtained by infecting nearly confluent monolayers of veroe cells for one hour with a minimal amount of liquid in serum-free dmem. once adsorption was complete, complete dmem containing % fbs was added to the cells and the virus was allowed to propagate at ℃ in % co . once cpe was present in the majority of the monolayer, the virus was harvested by clearing the supernatant at , xg for minutes, aliquoting, and freezing at - ℃. sequencing confirmed that the consensus sequence was unchanged from the original isolate. enzyme-linked immunosorbent assay (elisa) was used to determine the ability of atn- to disrupt binding events essential to entry of sars-cov- into a host cell. to determine inhibition of ace / α β integrin binding by atn- , α β was coated on -well plates at µg/ml for hours at room temperature and blocked overnight with . % bsa. addition of . 
µg/ml of hace -fc (sino biological, cat# -h h) in differing concentrations of atn- followed, incubating for hour at ℃. incubation with an hrp labeled goat anti-human fc secondary antibody at : for minutes at ℃ was followed by detection by tmb substrate. in order to assess disruption of binding of α β to sars-cov- spike protein, -well plates were coated as before, but incubation with atn- was performed in conjunction with µg/ml spike (produced under hhsn c and obtained through bei resources, niaid, nih: spike glycoprotein receptor binding domain (rbd) from sars-related coronavirus , wuhan-hu- , recombinant from hek cells, nr- ) in the presence of mm mncl , followed by detection with an anti-spike antibody. the rest of the procedure was consistent with the previous. in order to determine the ability of atn- to reduce the infection capability of sars-cov- in vitro, a cell-based assay was utilized. veroe cells were plated at a density of . x cells/well in a -well plate and incubated overnight at ℃ in % co . the next day, cells were treated with dilutions of atn- in complete dmem with % fbs for one hour at ℃ in % co , followed by viral infection at an moi of . . after hours, virus and cells were lysed via trizol ls and rna was extracted using a zymo direct-zol rna kit (#r ) according to manufacturer's instructions. experiments were performed under biosafety level conditions in accordance with institutional guidelines. viral load was quantified using a reverse transcriptase qpcr targeting the sars-cov- nucleocapsid gene. rna isolated from cell cultures was plated in duplicate and analyzed in an applied biosystems using taqpath supermix with the following program: i) ℃ for min., ii) ℃ for min. and iii) cycles of ℃ for s and ℃ for s. inhibition of sars-cov- spike protein binding to human ace by atn- . 
plates were precoated with a, trimeric or b, monomeric spike protein and incubated with a mixture of hace and various atn- concentrations, followed by detection of bound hace via hrpconjugated anti-ace antibody. data was normalized to a no-atn vehicle control. data represent mean ± sd, n= , * p< . , ** p< . . covid- ) pandemic a pneumonia outbreak associated with a new coronavirus of probable bat origin the proximal origin of sars-cov- genomic surveillance elucidates ebola virus origin and transmission during the outbreak. science ( -. ) a sars-like cluster of circulating bat coronaviruses shows potential for human emergence structure of the sars-cov- spike receptor-binding domain bound to the ace receptor remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro the fda-approved drug ivermectin inhibits the replication of sars-cov- in vitro inhibition of sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion a human monoclonal antibody blocking sars-cov- infection new strategy for covid- : an evolutionary role for rgd motif in sars-cov- and potential inhibitors for virus infection sars-cov- and infectivity: possible increase in infectivity associated to integrin motif expression covid- in a patient with multiple sclerosis treated with natalizumab: may the blockade of integrins have a protective role? 
a potential role for integrins in host cell entry by sars-cov- enhanced receptor binding of sars-cov- through networks of hydrogen-bonding and hydrophobic interactions invasion of human respiratory epithelial cells by bordetella pertussis: possible role for a filamentous hemagglutinin arg-gly-asp sequence and α β integrin integrins: a flexible platform for endothelial vascular tyrosine kinase receptors angiotensin converting enzyme (ace) and ace bind integrins and ace regulates integrin signalling interaction of ace and integrin β in failing human heart pharmacology of the novel antiangiogenic peptide atn- (ac-phscn-nh ): observation of a u-shaped dose-response curve in several preclinical models of angiogenesis and tumor growth integrin α β -fak signaling pathway a non-rgd-based integrin binding peptide (atn- ) blocks breast cancer growth and metastasis in vivo the short amino acid sequence pro-his-ser-arg-asn in human fibronectin enhances cell-adhesive function sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor phase trial of the antiangiogenic peptide atn- (ac-phscn-nh ), a beta integrin antagonist, in patients with solid tumours we thank r garry, tulane university school of medicine, for use of pcr reagents, n maness at tulane national primate research center for viral stock, a mazar, monopar therapeutics, for helpful discussions on the preparation and handling of atn- for in vitro studies, and i rutkai and a narayanappa for collection of technical information regarding antibodies used in elisa studies. we would also like to thank k andersen at scripps research institute for sequencing of viral stock. g.b. is supported by tulane university startup funds. jkk was supported by the following nih grant r hl for this work. this research was supported in part by grant od to cjr from the national center for research resources and the office of research infrastructure programs (orip), nih. 
tyh was supported by department of defense grant w ixwh and nih grants r eb , r ai , r ai , r hd and r ai . g.b. is the inventor on a filed provisional patent with the uspto related to this work. the remaining authors declare no competing interests. key: cord- -dty esg authors: zhang, rongxin; ke, xiao; gu, yu; liu, hongde; sun, xiao title: whole genome identification of potential g-quadruplexes and analysis of the g-quadruplex binding domain for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dty esg the coronavirus disease (covid- ) pandemic caused by sars-cov- (severe acute respiratory syndrome coronavirus ) quickly became a global public health emergency. the g-quadruplex, a non-canonical nucleic acid secondary structure, has shown potential antiviral value. however, little is known about g-quadruplexes in the emerging sars-cov- . herein, we characterized the potential g-quadruplexes in both the positive- and negative-sense viral strands. the identified potential g-quadruplexes exhibit features similar to those of the g-quadruplexes detected in the human transcriptome. within some bat- and pangolin-related beta coronaviruses, the g-quartets rather than the loops are under heightened selective constraints. we also found that the sud-like sequence is retained in the sars-cov- genome, while it is absent from some other coronaviruses that can infect humans. further analysis revealed that the sars-cov- sud-like sequence is almost fully conserved among , sars-cov- samples, and the sars-cov- sudcore-like dimer displayed an electrostatic potential pattern similar to that of the sud dimer. considering the potential value of g-quadruplexes as targets in antiviral strategies, we hope our fundamental research can provide new insights for sars-cov- drug discovery. to obtain the potential g-quadruplexes in the sars-cov- genome, we took the following strategy ( fig. a) : (i) predicting the pg s with three software tools independently. 
(ii) merging the prediction results of the pg s and evaluating the g-quadruplex folding capabilities by the cg/cc scores. (iii) the pg s with cg/cc scores higher than the threshold were selected as candidates for further analysis. here, the threshold for determining whether pg s can fold was set to . , as described in the study of jean-denis beaudoin et al. [ ] in total, we obtained pg s (table. ) in the positive- or negative-sense strands for further analysis. to annotate the pg s, the reference annotation data (in gff format) of sars-cov- were downloaded from the ncbi database with the accession number nc_ . firstly, we focused on the pg s on the positive-sense strand. fifteen of the pg s ( . %) were located on the positive-sense strand; the vast majority of them were harbored in non-structural proteins including nsp , nsp , nsp , nsp , nsp and nsp , with the remaining ones located in the spike protein, orf a, and the membrane protein. secondly, we examined the pg s on the negative-sense strand, which is an intermediate product of replication. nine pg s were scattered on the negative-sense strand. to further characterize the potential canonical secondary structures that compete with g-quadruplexes, the landscape of thermodynamic stability of the sars-cov- genome was depicted by using the Δg° z-score [ ] . in general, a positive Δg° z-score implies that the secondary structure of the region tends to be less stable than that of a randomly shuffled sequence with identical nucleotide composition, while a negative Δg° z-score signifies higher stability than the randomly shuffled sequence. for each nucleotide in the sars-cov- genome, the Δg° z-score was calculated for all the nt windows covering the nucleotide, and an average Δg° z-score was then derived. several pg s are located in positions with locally higher average Δg° z-scores (fig. 
b ) which implied the relative instability of a canonical secondary structure and a lower possibility of adopting such a competitive structure against the g-quadruplex, which may ultimately favor the formation of the g-quadruplex.
strand  sequence  location
-  ggttggtttgttacctgggaagg  -
+  ggctttggagactccgtggaggagg  nsp
+  ggagactccgtggaggagg  nsp
+  ggtaataaaggagctggtgg  nsp
-  ggggcttttagaggcatgagtagg  -
+  ggaggaggtgttgcagg  nsp
+  gggtttaaatggttacactgtagaggagg  nsp
+  ggtttaaatggttacactgtagaggagg  nsp
-  ggtggaatgtggtagg  -
-  ggatatggttggtttgg  -
+  ggatacaaggctattgatggtgg  nsp
-  ggtttgtggtggttgg  -
-  ggtgatagaggtttgtggtggttgg  -
-  ggtgatagaggtttgtggtgg  -
+  ggtacaggctggtaatgttcaactcagg  nsp
+  ggctggtaatgttcaactcagggttattgg  nsp
+  ggtatgtggaaaggttatgg  nsp
-  ggatctgggtaaggaagg  -
+  ggattggcttcgatgtcgagggg  nsp
in , chun kit kwok and co-workers profiled the rna g-quadruplexes in the hela transcriptome by using the rna g-quadruplex sequencing (rg -seq) technology, and quantified the diversity of these rna g-quadruplexes [ ] . we set out to address whether the potential g-quadruplexes in sars-cov- show features analogous to those of the g-quadruplexes found in the human transcriptome, and whether these pg s have the ability to form g-quadruplex structures. we noticed that the pg s in sars-cov- are all of the two-quartet type. therefore, we retrieved the two-quartet rna g-quadruplex sequence data generated in the rg -seq experiment under the condition of k + and pyridostatin (pds). however, for some rts (reverse transcriptase stalling) sites labeled as two-quartet, there may exist overlapping g-quadruplexes with different loops (e.g., ggcacagcaggcatcggaggtgaggcgggg), and it is difficult to determine which one was formed in the experiment. to eliminate this ambiguity, only the rts sites containing a nonoverlapping two-quartet g-quadruplex (e.g., gtcattttttgtgtttggtttggtggtggc) were considered. 
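the overlap filter described above can be illustrated with a short regular-expression sketch. it is only a sketch under stated assumptions: the loop bounds `MIN_LOOP` and `MAX_LOOP` are hypothetical (the loop limits used in the study are not given in this text), and `find_two_quartet` and `is_unambiguous` are illustrative names, not part of any published tool. a site is treated as unambiguous when exactly one start position yields a two-quartet motif.

```python
import re

# Hypothetical loop bounds; the limits used in the study are not stated here.
MIN_LOOP, MAX_LOOP = 1, 12

# Four GG tracts separated by three loops. The lookahead lets the scan report
# candidates that overlap one another instead of skipping past them.
PATTERN = re.compile(
    r"(?=(g{2}(?:[acgtu]{%d,%d}?g{2}){3}))" % (MIN_LOOP, MAX_LOOP)
)


def find_two_quartet(seq):
    """Return (start, motif) for every start position that yields a candidate."""
    s = seq.lower()
    return [(m.start(1), m.group(1)) for m in PATTERN.finditer(s)]


def is_unambiguous(seq):
    """True when the site holds exactly one two-quartet candidate."""
    return len(find_two_quartet(seq)) == 1


if __name__ == "__main__":
    for site in ("ggcacagcaggcatcggaggtgaggcgggg",
                 "gtcattttttgtgtttggtttggtggtggc"):
        print(site, is_unambiguous(site), find_two_quartet(site))
```

with these assumed bounds, the first example sequence from the text yields several overlapping candidates and the second yields a single one, which matches the distinction the text draws.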
firstly, we investigated the loop length distribution pattern of the two-quartet pg s in both sars-cov- and the human transcriptome (fig. a) . as a whole, the two-quartet pg s in sars-cov- and the human transcriptome displayed similar loop length distribution patterns, and the loop lengths of the pg s in sars-cov- fall within the range of those from the human transcriptome. the distributions of loop length did not differ significantly between the sars-cov- pg s and the human two-quartet g-quadruplexes (fig. s , wilcoxon test, p-value = . ). considering that the presence of multiple-cytosine tracks may hinder the formation of g-quadruplexes [ , ] , we examined the cytosine ratio in g-quadruplex loops (fig. b ). no significant difference in loop cytosine ratios was observed between the sars-cov- pg s and the human two-quartet g-quadruplexes (wilcoxon test, p-value = . ), which suggested that the loop cytosine ratios of the two types of g-quadruplex were similar. taken together, our results suggested that the pg s in sars-cov- displayed features similar to those of the rg s in the human transcriptome. recent research revealed that the g-quadruplexes in human utrs (untranslated regions) are under selective pressure [ ] , and some coronaviruses in bats and pangolins are closely related to sars-cov- . we therefore analyzed whether the potential g-quadruplexes in the sars-cov- genome are conserved under selective constraints. we collected beta coronavirus genomic sequences of bats and pangolins from several public databases and used the nj (neighbor-joining) method to construct the phylogenetic tree with , bootstrap replications (fig. s ). the rs (rejected substitutions) score for each site in the sars-cov- reference genome was evaluated by using the gerp++ software. we checked the rs score difference between the g-tract (continuous runs of g) nucleotides and other nucleotides. 
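the partition behind this comparison can be sketched as below. this is an illustrative sketch only: it assumes one rs score per genome position is already available as a list, treats any run of at least `min_run` guanines as a g-tract (the value 2 is an assumption), and uses the hypothetical function name `split_scores_by_g_tract`. the two returned lists are what a rank-based test such as the wilcoxon test would then compare.

```python
import re


def split_scores_by_g_tract(genome, scores, min_run=2):
    """Split per-site scores into (g_tract_scores, other_scores).

    Assumption: a g-tract is any run of at least `min_run` consecutive g's.
    """
    if len(genome) != len(scores):
        raise ValueError("one score per nucleotide is required")
    in_tract = [False] * len(genome)
    for match in re.finditer(r"g{%d,}" % min_run, genome.lower()):
        for i in range(match.start(), match.end()):
            in_tract[i] = True
    tract = [s for s, flag in zip(scores, in_tract) if flag]
    other = [s for s, flag in zip(scores, in_tract) if not flag]
    return tract, other


if __name__ == "__main__":
    toy_genome = "aggcgtggga"            # toy sequence, not real data
    toy_scores = list(range(len(toy_genome)))
    print(split_scores_by_g_tract(toy_genome, toy_scores))
```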
a significant discrepancy was observed, which means that the g-tract nucleotides are under stronger selective constraints than other nucleotides in the sars-cov- genome (fig. a , wilcoxon test, p-value = . × - ). considering that the g-tracts are composed of guanines, the conservation of guanines inside and outside the g-tracts in the sars-cov- genome was also compared. we found that the guanines in g-tracts are under heightened selective constraints (fig. b , wilcoxon test, p-value = . × - ). the nucleotides within g-tracts are more relevant to the structural maintenance of g-quadruplexes than those in loops. we then compared the g-tract and loop rs scores. as expected, the g-tract rs scores were significantly higher than the loop rs scores (fig. c , wilcoxon test, p-value = . × - ), which suggests that the g-tracts experienced stronger selective constraints. we also checked whether the heightened selective constraints on the pg s are related to their inherent properties or potential functions rather than to their sequence contexts. a random test was performed to check whether the fragments containing pg s manifested different average rs scores compared with random fragments in the sars-cov- genome. the fragments containing pg s were defined as the sequence nt upstream and downstream of the pg centers. we conducted , rounds of tests. in each test, we randomly selected fragments from the sars-cov- genome with a length of nt and carried out the wilcoxon test to assess the difference in average rs score between the randomly selected fragments and the fragments containing pg s. the p-value for each round was retained. as a result, no evident difference was observed, as few p-values ( / ) were less than . (fig. d) , suggesting that the heightened selective constraints on the pg s are more likely to be related to their inherent properties or potential functions than to their sequence contexts. 
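the random test described above can be sketched with the standard library alone. this is a hedged illustration, not the authors' code: `flank`, `rounds`, and `alpha` stand in for the fragment half-length, number of rounds, and p-value threshold, whose values are not recoverable from this text; the rank-sum p-value uses a normal approximation without tie correction, which is a simplification of the wilcoxon test a statistics package would provide; all function names are hypothetical.

```python
import math
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def random_fragment_test(scores, pg4_centers, flank, rounds, alpha, seed=0):
    """Fraction of rounds in which random fragments differ from PG4 fragments."""
    rng = random.Random(seed)
    n = len(scores)

    def fragment_mean(center):
        lo, hi = max(0, center - flank), min(n, center + flank + 1)
        return sum(scores[lo:hi]) / (hi - lo)

    observed = [fragment_mean(c) for c in pg4_centers]
    significant = 0
    for _ in range(rounds):
        centers = [rng.randrange(flank, n - flank) for _ in pg4_centers]
        background = [fragment_mean(c) for c in centers]
        if rank_sum_p(observed, background) < alpha:
            significant += 1
    return significant / rounds
```

a small returned fraction corresponds to the outcome reported in the text, where few rounds gave a p-value below the threshold.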
both sars-cov and sars-cov- can cause acute disease symptoms, and the two coronaviruses share similar nucleic acid sequence compositions. the sars-cov genome encodes a sud (sars-unique domain) that can bind to g-quadruplex structures, and it is unclear whether the sars-cov- genome possesses a similar structure. thus, we set out to explore whether the sars-cov- genome contains a protein-coding sequence similar to sud and whether sars-cov- retains the ability to bind rna g-quadruplexes. we collected the orf ab amino acid sequences of several coronaviruses, including the seven known coronaviruses that can infect humans and other coronaviruses belonging to different genera. surprisingly, the sud protein sequence is absent in some coronaviruses, especially in alpha, gamma, and delta coronaviruses (fig. s ). in contrast, the sud protein sequence is retained in several beta coronaviruses, particularly in bat- and pangolin-associated beta coronaviruses. moreover, among the seven coronaviruses that can infect humans, only sars-cov and sars-cov- keep the sud sequence, while the sud sequence is absent in mers-cov, hcov- e, hcov-nl , hcov-oc and hcov-hku . next, we examined eight key amino acid residues in sud that were previously reported to be associated with g-quadruplex binding affinity (fig. a) . almost all the key amino acid residues are conserved in sars-cov- , except for one conservative replacement of k (lysine) > r (arginine). we hypothesized that if the g-quadruplex binding ability is essential for sars-cov- , the above amino acid residues should be conserved. we then investigated the conservation of the eight amino acid residues within sars-cov- samples. we retrieved the sequence alignment file of , sars-cov- samples from the gisaid database and calculated the mutation frequency for each nucleotide. we then examined the frequency of nucleotide mutations in the above eight codons. as a result, a limited mutation frequency was found as compared to the whole-genome average mutation frequency (fig. 
b, frequency = . ). although eight mutations were detected in glutamate ( e), seven of them were synonymous mutations. next, we checked the electrostatic potential pattern of the sars-cov- sudcore-like dimer structure, defined as the dimer structure formed by the amino acid residues in sars-cov- corresponding to the sud of sars. we found that the sudcore-like dimer of sars-cov- and the sudcore of sars present similar electrostatic potential patterns. positively charged patches were observed in the core of the sudcore-like dimer, surrounded by negatively charged patches (fig. c ). when the dimer is rotated °, a slightly inclined narrow cleft with negative potential, accompanied by positively charged patches, becomes visible (fig. d ). the same patterns also appear in the sud dimer. in previous reports, several positively charged patches located in the center and back of the dimer were presumed to bind g-quadruplex structures. by comparison with the electrostatic potential of the sars sudcore dimer, we identified positively charged patches located in the center and back of the sars-cov- sudcore-like dimer that can potentially bind g-quadruplexes (fig. c-d) . the covid- pandemic has caused huge losses worldwide and has drawn more attention to public health. a large number of scientists all over the world have been engaged in the fight against the outbreak. the sars-cov- coronavirus is responsible for the outbreak, and no specific inhibitor drugs have been developed yet. g-quadruplexes have shown tremendous potential for the development of anticancer [ ] [ ] [ ] [ ] and antiviral drugs [ , , ] , as g-quadruplexes can interfere with many biological processes that are critical to cancer cells and viruses.
therefore, it is necessary to quantify and characterize the pg s in the sars-cov- genome to provide a possible novel approach for the treatment of covid- . in this study, besides three popular g-quadruplex prediction tools, the cg/cc scoring system, which is specially designed for the identification of rna g-quadruplexes, was adopted to determine the pg s. notably, we did not find g-quadruplexes with three or more g-quartets, which are generally considered to be more stable than two-quartet g-quadruplexes. one controversial issue concerns the stability of two-quartet g-quadruplexes, especially their folding capability in vivo. however, it is well acknowledged that rna g-quadruplexes are more stable than their dna counterparts [ , ] , and sars-cov- is a single-stranded rna virus, which may be conducive to structure formation. several emerging studies have demonstrated the formation of two-quartet g-quadruplexes in viral sequences, which could serve as antiviral elements in the presence of g-quadruplex ligands [ , , ] . moreover, k + (potassium ion), one of the primary positive ions inside human cells, can strongly support the formation of g-quadruplexes. nevertheless, whether the sars-cov- g-quadruplexes can form in vivo still requires substantial experimental evidence. most of the pg s we detected were located in the positive-sense strand. the g-quadruplex forming sequences in the sars-cov genome were presumed to function as chaperones of sud, and their interaction was essential for sars-cov genome replication [ ] . orf ab, which encodes the replicase proteins, is required for viral replication and transcription. some pg s were found in orf ab, and whether these pg s are related to the replication of the viral genome and interact with sud-like structures, as in sars-cov, is worthy of further investigation.
in addition to orf ab, there exist several pg s in the structural and accessory protein-coding sequences as well as in the sgrnas that contain these protein-coding sequences. some studies have characterized the impact of g-quadruplex structures on the translation of human transcripts, and an apparent inhibitory effect was observed [ , , ] . the translation of sars-cov- proteins requires the involvement of human ribosomes; thus, it may be possible to repress the translation of sars-cov- proteins by stabilizing the g-quadruplex structures. in fact, this inhibitory effect has been reported in studies of some other viruses [ , ] . the negative-sense strand serves as the template for the synthesis of the positive-sense strand and the sub-genomic rnas. the identified potential g-quadruplexes were broadly distributed in the negative-sense strand. notably, we observed one pg located at the ' end of the negative-sense strand. a previous study confirmed that stable g-quadruplex structures located at the ' end of the negative-sense strand could inhibit rna synthesis by reducing the activity of the rdrp (rna-dependent rna polymerase) [ ] . therefore, it is necessary to further investigate whether the pg at the ' end of the negative-sense strand of sars-cov- could inhibit rna synthesis. in addition, recent research revealed that high-frequency trinucleotide mutations (g a, g a and g c) were detected in the sars-cov- genome [ , ] . g a and g a always co-occur within the same codon, which suggests positive selection on the encoded amino acid [ ] . we noticed that the trinucleotide mutations were in the g-rich sequence from nt to nt ( ' ggggaacttctcctgctagaatggctggcaatggcgg ').
the potential g-quadruplex downstream of the trinucleotide mutations was filtered out by the cg/cc scoring system, because the cytosine tracts within and flanking the potential g-quadruplex reduce the cg/cc score; however, this potential g-quadruplex showed a relatively low mfe (minimum free energy) among all the potential g-quadruplexes we detected. the consequence of the trinucleotide mutations is still elusive, and whether the mutations are causally related to the g-rich sequence still needs to be elucidated. the sud in sars, which is thought to be related to its high pathogenicity, has displayed a binding preference for g-quadruplexes in human transcripts [ ] . our analysis revealed that the novel coronavirus sars-cov- contains a domain similar to sud as well. furthermore, several amino acid residues previously reported to be indispensable for the g-quadruplex binding capability are retained in sars-cov- . further exploration indicated that the eight key amino acid residues are conserved in numerous sars-cov- samples from countries all over the world, suggesting that these residues are essential. it is supposed that the binding of sud to g-quadruplexes could affect transcript stability and translation, hence impairing the immune response of host cells. the expression of host genes in sars-cov- infected cells is strongly inhibited [ ] ; therefore, we speculate that sars-cov- may possess a mechanism similar to that of sars-cov, which can inhibit the expression of some important immune-related genes to escape immune defense. herein, we briefly depict the possible role of g-quadruplexes in the antiviral mechanism and pathogenicity; the development of g-quadruplex-specific ligands might be a promising antiviral strategy (fig. ) . we call for more researchers to shed light on the relationship between g-quadruplexes and coronaviruses.
only with a deeper understanding of coronaviruses can we better cope with possible novel coronavirus pandemics in the future. fig. possible role of g-quadruplexes in the antiviral mechanism and pathogenicity. left part, g-quadruplexes can function as inhibitory elements in the sars-cov- life cycle. both replication and translation could be affected by the g-quadruplex structures. stable g-quadruplexes at the ' end of the negative-sense strand may interfere with the activity of rdrp; hence, the replication of the negative-sense strands to the positive-sense strands is repressed, so that the sars-cov- genomes cannot be produced in large quantities. the g-quadruplex structures can suppress the translation process by impairing ribosome elongation, which can hinder the production of proteins required by the virus. the g-quadruplex structures could be stabilized by specific ligands to enhance the inhibitory effects, which is a promising antiviral strategy. right part, a possible mechanism for sars-cov- to impede the expression of human genes. g-quadruplex structures, particularly those with longer g-stretches, are potential binding targets for sud-like proteins, and the interaction of the sud-like proteins with g-quadruplex structures possibly leads to instability of host transcripts or reduces translation efficiency. we obtained a total of full-length bat-associated beta coronaviruses from the dbatvir (http://www.mgc.ac.cn/dbatvir/) database [ ] . we also downloaded the bat coronavirus ratg genome from the ncbi virus database (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/), which has shown a high sequence similarity to the sars-cov- reference genome in previous reports. we acquired the sars-cov- reference genome from the ncbi virus database under the accession number nc_ . in addition to those sequences, nine pangolin coronaviruses were obtained from the gisaid (https://www.gisaid.org/) database [ ] .
the emboss needle software, which is based on the needleman-wunsch algorithm and is part of the embl-ebi web tools [ ] , was employed for the pairwise sequence alignment. clustal omega [ , ] is a reliable and accurate multiple sequence alignment (msa) tool that can be run on large data sets. we chose this msa tool for the alignment of viral genomes and the alignment of protein sequences under the default parameters. ugene [ ] is a powerful and user-friendly bioinformatics software package, and we chose ugene to visualize the pairwise and multiple sequence alignment results. we used the mega x software [ ] to construct the neighbor-joining phylogenetic tree with , bootstrap replications. to depict the conservation state of each nucleotide site, the gerp++ software [ ] was applied to calculate the "rejected substitutions" score column by column, which reflects the strength of the constraints on each nucleotide site. several open-source g-quadruplex detection tools were used to search for pg s in both the positive-sense and negative-sense strands of sars-cov- . g catchall [ ] , pqsfinder [ ] , and qgrs mapper [ ] were employed to predict the putative g-quadruplexes; please see ref [ ] for more information about the comparison of these tools. the minimum g-tract length was set to two in all three tools, while the maximum length of the predicted g-quadruplexes was limited to . specifically, the minimum score of the predicted g-quadruplex was set to when using pqsfinder. we utilized bedtools [ ] to sort the pg s according to their coordinates. apart from this, we adopted the cg/cc scoring system [ ] proposed by jean-pierre perreault et al. to delineate the influence of the sequence context on pg s. the pg s along with nt of upstream and downstream sequence context were used to calculate the cg/cc score, and . was taken as the threshold for the preliminary inference of the g-quadruplex folding capability [ ] .
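As an illustration of how a cG/cC-style score can be computed, here is a minimal sketch. It is not the authors' script. It assumes the scoring rule usually described for this system: every stretch of k consecutive G (or C) contributes 10 × k, overlapping stretches are all counted, and a cC score of zero is replaced by 1 in the ratio. The function names and the 0-based, end-exclusive coordinates are hypothetical, and the flank size and decision threshold are left to the caller because their values are not recoverable from the text.

```python
import re


def run_score(seq, base):
    """Sum 10 * k over every stretch of k consecutive `base` in seq (overlaps counted)."""
    total = 0
    for run in re.finditer(base + "+", seq):
        n = len(run.group())
        # A run of n bases contains (n - k + 1) stretches of length k.
        total += sum(10 * k * (n - k + 1) for k in range(1, n + 1))
    return total


def cg_cc_score(seq):
    """Ratio of the G-run score to the C-run score (denominator 1 when there is no C)."""
    seq = seq.upper()
    cg = run_score(seq, "G")
    cc = run_score(seq, "C")
    return cg / cc if cc else float(cg)


def with_flanks(genome, start, end, flank):
    """Return genome[start:end] extended by `flank` nt on each side, clipped to the ends."""
    return genome[max(0, start - flank):min(len(genome), end + flank)]
```

For a candidate at `genome[start:end]`, the score would be taken as `cg_cc_score(with_flanks(genome, start, end, flank))` and compared against the chosen threshold. Cytosine runs in the flanks lower the ratio, which is the sequence-context effect the scoring system is meant to capture.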
using a customized python script, we implemented the cg/cc scoring system. the sars-cov- sud core-like homo-dimer structure was modeled based on the template of the sars-cov sud structure (pdb id: w g) through homology modeling. all the modeling was performed on the swiss-model [ ] website (https://swissmodel.expasy.org/). the electrostatic potential was calculated and visualized in the pymol software using the apbs (adaptive poisson-boltzmann solver) plugin. the Δg° z-score for the sars-cov- genome was retrieved from rnastructuromedb (https://structurome.bb.iastate.edu/sars-cov- ). the Δg° z-score is defined as follows. $$\Delta G^{\circ}\,z\text{-score} = \frac{MFE_{native} - \overline{MFE_{random}}}{\sigma}$$ where $MFE_{native}$ is the mfe (minimum free energy) Δg° value predicted by the rnafold software with a window of nt and a step of one nt, $\overline{MFE_{random}}$ represents the mfe Δg° value generated from randomly shuffled sequences with the identical nucleotide composition, and $\sigma$ is the standard deviation across all the mfe values. to depict the Δg° z-score for each nucleotide in the sars-cov- genome, we utilized the following formula. $$Z_i = \frac{1}{n_i}\sum_{m=1}^{n_i} \Delta G^{\circ}\,z\text{-score}_m$$ where $Z_i$ is the average Δg° z-score for nucleotide $i$, $n_i$ denotes the total number of sliding windows covering nucleotide $i$, and $\Delta G^{\circ}\,z\text{-score}_m$ indicates the Δg° z-score for the mth window. for example, when considering the nucleotide under the setting of nt window length and one nt step, there are sliding windows covering the nucleotide . so z , the average Δg° z-score for nucleotide , is calculated as the sum of the Δg° z-scores of the sliding windows divided by the total number of those sliding windows. this work was supported by the national natural science foundation of china ( ).
severe acute respiratory syndrome coronavirus (sars-cov- ) and coronavirus disease- (covid- ): the epidemic and the challenges origin and evolution of pathogenic coronaviruses epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china severe acute respiratory syndrome coronaviruses -drug discovery and therapeutic options coronaviridae study group of the international committee on taxonomy of, the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- clinical characteristics of coronavirus disease in china the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak coronavirus disease (covid- ): a perspective from china virology, epidemiology, pathogenesis, and control of covid- crispr-cas -based detection of sars-cov- sars-cov- : an emerging coronavirus that causes a global threat the origin, transmission and clinical therapies on coronavirus disease (covid- ) outbreak -an update on the status the architecture of sars-cov emerging coronaviruses: genome structure, replication, and pathogenesis sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov- spike glycoprotein covid- infection: origin, transmission, and characteristics of human coronaviruses probable pangolin origin of sars-cov- associated with the covid- outbreak identifying sars-cov- related coronaviruses in malayan pangolins the proximal origin of sars-cov- a pneumonia outbreak associated with a new coronavirus of probable bat origin isolation of sars-cov- -related coronavirus from malayan pangolins dna secondary structures: stability and function of gquadruplex structures g-quadruplexes: prediction, characterization, and biological application the regulation and functions of dna and rna g-quadruplexes the structure and function of dna g-quadruplexes involvement of g-quadruplex regions in 
mammalian replication origin activity g-quadruplexes in dna replication: a problem or a necessity? human origin recognition complex binds preferentially to g-quadruplexpreferable rna and single-stranded dna g motifs affect origin positioning and efficiency in two vertebrate replicators telomere dna g-quadruplex folding within actively extending human telomerase g-quadruplex formation at the ' end of telomere dna inhibits its extension by telomerase, polymerase and unwinding by helicase telomeric g-quadruplexes are a substrate and site of localization for human telomerase regulation of telomere length by g-quadruplex telomere dna-and terra-binding protein tls/fus g-quadruplex preferentially forms at the very ′ end of vertebrate telomeric dna an rna g-quadruplex in the ′ utr of the nras proto-oncogene modulates translation utr of the bag- mrna affects both its cap-dependent and cap-independent translation through global secondary structure maintenance a gquadruplex structure within the ′-utr of trf mrna represses translation in human cells rna gquadruplexes at upstream open reading frames cause dhx -and dhx -dependent translation of human mrnas more than just a kink in microbial genomes g-quadruplexes in viruses: function and potential therapeutic applications g-quadruplexes and g-quadruplex ligands: targets and tools in antiviral therapy the sars-unique domain (sud) of sars coronavirus contains two macrodomains that bind g-quadruplexes ltr reveals a ( + ) folding topology containing a stem-loop u region in the hiv- genome adopts a g-quadruplex structure in its rna and dna sequence hiv- nucleocapsid protein unfolds stable rna g-quadruplexes in the viral genome and is inhibited by g-quadruplex ligands anti-hiv- activity of the g-quadruplex ligand braco- zika virus genomic rna possesses conserved g-quadruplexes characteristic of the flaviviridae family the effect of single nucleotide polymorphisms in g-rich regions of high-risk human papillomaviruses on structural 
diversity of dna chemical targeting of a g-quadruplex rna in the ebola virus l gene new scoring system to identify rna g-quadruplex folding an days after vaccination) , , . therefore, virus-specific antibodies in serum and secretory mucus (e.g., saliva and sputum) can be used as diagnostic markers for viral infection and for the evaluation of a patient's adaptive immune responses (either from virus infection or from vaccination). as such, sars-cov- -specific antibodies are currently listed among the diagnostic markers in the "covid- diagnosis and treatment plan (provisional th edition)" published in china. in addition to sars-cov- specific antibodies, the viral antigen (such as the spike protein or s protein) in circulating blood can be used for the prognosis of covid- -related viremia , . as the s protein is one of the most commonly used targets for vaccine development, its serum concentration is also a potential marker for early-stage vaccine responses, especially for sub-unit vaccines . unfortunately, the existing antibody detection methods are still far from adequate. the gold nanoparticle based lateral flow assay (e.g., paper-based test strips) is very popular for rapid detection of igg/igm antibodies (especially for point-of-care diagnostics) - . although fast ( - minutes), it provides only yes/no information and has very limited sensitivity, which makes it prone to false positives and false negatives and unable to track a patient's immune response to infection, treatment, and vaccination. conventional elisa (enzyme-linked immunosorbent assay), on the other hand, can provide quantitative, accurate, and sensitive results, but it involves complicated and expensive instruments and a long assay time ( - hours) , . in addition, samples need to be sent to centralized labs, which significantly increases the turn-around time.
in this work, we present a microfluidic elisa technology for rapid ( - minutes), quantitative, and sensitive detection of sars-cov- biomarkers using sars-cov- specific igg and viral antigen -s protein, both of which are spiked in serum, as a model system. we also characterized various humanized monoclonal igg and identified one with a high binding affinity towards sars-cov- s protein that can serve as the calibration standard of anti-sars-cov- s igg in serological analyses. furthermore, we demonstrated that our microfluidic elisa platform can be used for rapid affinity evaluation of monoclonal anti-s antibodies. the microfluidic elisa device is highly portable and requires only l of samples for each channel, which can be easily collected from a drop of fingertip blood. therefore, our technology will greatly facilitate rapid and quantitative analysis of covid- patients and vaccine recipients at point-ofcare. a detailed description of the automated elisa system and corresponding capillary sensor arrays can be found in our previous publications , . a photo of the capillary sensor array can be found in figure (a). it is made of polystyrene using the injection molding method. the sensor array has channels, each of which has an inner diameter of . mm and approximately l volume, and acts as an elisa reactor. the chemiluminescent substrate (supersignal™ elisa femto substrate, ), the ultrapure™ dnase/rnase-free distilled water ( ), and the superblock™ (pbs) blocking buffer ( ) were purchased from thermo fisher. the elisa coating buffer ( × pbs, dy ), concentrated wash buffer (wa ), and concentrated reagent diluent ( % bsa in × pbs, dy ) were purchased from r&d systems. the human serum was purchased from millipore sigma (h - ml). human-cell-expressed sars-cov- spike s -his recombinant protein was provided by sino biological ( -v h). the human-cell-expressed sars-cov spike s -his recombinant protein was purchased from creative biolabs (vang-wyb ). 
the recombinant cr therapeutic antibody was purchased from creative biolabs (mro- lc). the humanized chimeric antibodies d , d , and d were developed and provided by sino biological (catalog number: -d , -d , and -d ). the anti-polyhistidine antibody that was used in polyhistidine-mediated s protein immobilization (see figure s ) was purchased from thermo fisher (ma - ). the horseradish peroxidase (hrp) conjugated secondary antibody used in the igg detection experiment was the detection antibody from thermo fisher's human total igg elisa kit ( - - ). the hrp conjugation of the cr , d , d , and d antibodies was carried out with abcam's hrp conjugation kit (ab ) with a molar ratio of antibody : hrp = : . the illustrations for the reactor preparation protocol and the elisa protocols can be found in figure s . for all steps, the working solutions of the wash buffer and reagent diluent were prepared by diluting the concentrates with ultrapure™ dnase/rnase-free distilled water to achieve × working concentration. in the first step of the reactor preparation process ( figure s (a)), the working solution of the capture antibody (i.e., d in s detection experiments) or the anti-his antibody (in igg detection and antibody affinity experiments) was prepared by diluting the stock solution with the elisa coating buffer ( × pbs, ph = . ) to achieve a final concentration of μg/ml. note that for the anti-s igg detection and antibody affinity experiments, the second incubation step was used for blocking plus s protein immobilization (with μg/ml of s -his protein dissolved in % bsa). for the s protein detection, this step was used for blocking only (with % bsa in × pbs). for the anti-s igg detection experiments (see figure s (b) for the detailed protocol), various concentrations of monoclonal antibodies were prepared by diluting the stock solutions with times diluted human serum (the serum was diluted with × reagent diluent, which corresponds to % bsa).
the working solution of the detection antibody (in this case, the hrp-conjugated secondary antibody) was prepared by diluting the stock solution times in × reagent diluent. for the s detection experiments (see figure s (c) for the detailed protocol), various concentrations of the s proteins were prepared by diluting the stock solution with times diluted human serum. the working solution of the detection antibody (i.e., hrp-conjugated d ) was prepared by diluting the stock solution times in × reagent diluent. the final concentration of the detection antibody was µg/ml. for the antibody affinity experiments, various concentrations of hrp-conjugated monoclonal antibodies were prepared by diluting the stock solution in × reagent diluent. the signal intensities of the microfluidic elisa were measured with the chemiluminescent imaging method, using a cmos camera (see figure (b) for an example of the chemiluminescent signals). to extend the dynamic range of the elisa, multiple exposures with adjustable exposure times were applied. all chemiluminescent intensities (cl intensity) were normalized to three seconds of exposure time. detailed explanations of the chemiluminescent imaging and the multiple exposure approach can be found in our previous publications . the receptor binding domain (rbd) in the spike protein (located in the s subunit) of a coronavirus is responsible for binding to the membrane receptors on the target cell. it plays a critical role in the coronavirus cell entry process. when a patient is infected by a coronavirus, the patient's immune system develops antibodies that bind and block the rbd of that specific coronavirus. of all types of neutralizing antibodies, igg has the longest lifetime in a person's circulating blood. it is used as a biomarker for the evaluation of a patient's adaptive immune responses and degree of recovery.
in addition, as the major active ingredient, the concentration of sars-cov- s specific igg can be used as the indicator for the strength of the convalescent plasma [ ] [ ] [ ] [ ] [ ] . for these reasons, igg that binds specifically to sars-cov- s protein (especially the rbd) is the first biomarker that we aim to detect. the mechanism of igg detection is illustrated in figure (a). first, s protein is immobilized on the capillary inner surface through a poly-histidine mediation approach (see figures s and s (a) for details). then, s specific igg in the sample (such as serum) is attracted to the surface through immunosorbent reaction. finally, the hrpconjugated detection antibody is used to visualize the binding of the immobilized igg. to ensure detection specificity, a monoclonal detection antibody that binds specifically to the fc domain on human igg was used. according to the protocol in figure s (b), the entire assay was completed in minutes. to validate the feasibility of our assay, we selected three recombinant and humanized monoclonal anti-sars-cov- s antibodies as the positive control antibodies. the first antibody, cr , is a therapeutic human igg originally developed against the s protein of sars-cov. it was recently reported to have cross-reactivity towards the s protein of sars-cov- [ ] [ ] [ ] . the second and the third antibodies, d and d , are humanized chimeric iggs (the precursors of d and d were originally raised in mouse and rabbit, respectively) that are specific to the s protein of sars-cov. based on the preliminary results conducted at sino biological internally, they were also believed to have high binding affinities towards the s protein of sars-cov- . in order to mimic the actual clinical situations, we decided to use times diluted human serum as the solvent of the igg antibodies ( - are the typical dilution factors of serum in actual serological analyses). 
to evaluate the differences in antibody's affinity towards sars-cov- s and sars-cov s , we performed a side-by-side study with these two types of s proteins for all three clones of antibodies. the corresponding results are shown in fig. (b)-(d) . in general, the chemiluminescent intensities are proportional to the concentration of the spiked-in monoclonal antibodies. all these three antibodies are still detectable at . ng/ml with both types of s proteins. for d and d , the signal for both types of s proteins is very similar, indicating that the antibodies' binding affinity towards sars-cov- s and sars-cov s should be very similar (d may have a slightly higher affinity towards sars-cov- s than sars-cov s ). however, for cr , the signal for sars-cov- s is systematically lower than that for sars-cov s , indicative of a weaker affinity of cr towards sars-cov- s than sars-cov s , which agrees with recently published findings about cr 's binding ability , . the entire dynamic ranges of these three antibodies against sars-cov- s can be found in fig. s . the linear dynamic range in the log-log scale for cr , d , and d are - ng/ml, - ng/ml, and - ng/ml. the corresponding slope in the linear range is . , . , and . , respectively. as a negative control, a human igg isotype does not generate any detectable signal within the entire range of detection ( . - ng/ml). due to the narrow linear dynamic range and relatively low binding affinity, cr should not be used as the calibration standard of anti-sars-cov- s igg. the remaining two antibodies, d and d , are nearly the same in terms of the dynamic range and affinity towards sars-cov- s . however, d has a better specificity for sars-cov- s compared to sars-cov s . therefore, d may be a better candidate as a calibration antibody. in addition to anti-s igg, s protein itself may also be a marker in the prognosis of covid- . 
it may appear in the blood of patients who develop coronavirus viremia and of people who receive certain types of coronavirus vaccines (especially subunit vaccines). recent evidence indicates that the viral proteins may also exist in mucus samples such as saliva. to detect the s protein, we employed a standard sandwich elisa format, as illustrated in fig. (a) , in which a monoclonal sars-cov- /sars-cov s rbd-specific antibody (d ) was used as the capture antibody and another sars-cov- /sars-cov s rbd-specific antibody (d ) was used as the detection antibody. to reduce the number of steps as well as the total assay time, we directly conjugated hrp molecules to the detection antibody. according to the protocol in figure s (c), the entire assay was completed in minutes. as in the igg detection experiment, we performed a side-by-side study with the s proteins from sars-cov- and sars-cov. to mimic actual clinical settings, we used times diluted human serum as the solvent of the s protein, as we do not expect to see a high concentration of viral s protein in serum (or saliva). the entire dynamic range of the s detection assay is presented in fig. (b) . the linear dynamic range for sars-cov- s and sars-cov s is . - ng/ml and . - ng/ml with a slope of . and . (in the log-log scale), respectively. according to figures (b) and (c), the detection of sars-cov s appears to have a higher sensitivity than that of sars-cov- s . this is because both antibodies (d and d ) used in this assay were originally raised against the rbd of sars-cov s . a higher sensitivity in detecting sars-cov- s protein may be achieved in the future with antibodies specifically developed against the s protein (or s rbd) of sars-cov- . based on the studies in the previous sections, it is clear that a good calibration standard (monoclonal human or humanized igg towards sars-cov- s protein) is essential for performing quantitative evaluation of the patient's immune response.
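The dynamic ranges above are reported as a linear range and a slope on the log-log scale, which is also how a calibration standard would be used to convert a measured signal back into a concentration. A minimal sketch of that calculation, assuming an ordinary least-squares line fitted to log10(signal) against log10(concentration) within the linear range, is given below. The function names are hypothetical and the text does not describe the fitting procedure the authors used.

```python
import math


def fit_loglog(concentrations, signals):
    """Least-squares fit of log10(signal) = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(signal, slope, intercept):
    """Invert the calibration line; only meaningful inside the fitted linear range."""
    return 10 ** ((math.log10(signal) - intercept) / slope)
```

In use, `fit_loglog` would be called on the calibration antibody's dilution series (concentrations and exposure-normalized chemiluminescent intensities), and `estimate_concentration` would then map a sample's intensity onto that curve. Signals outside the linear range should not be extrapolated with this line.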
high affinity antibodies are also essential for building a sensitive sars-cov- s elisa kit. however, because sars-cov- emerged so recently, there is no "gold standard" antibody yet that can be used in igg calibration or s antigen detection. consequently, it is urgent to find humanized antibodies with high binding affinities. unfortunately, traditional antibody evaluation approaches, such as plate-based elisa, bio-layer interferometry (bli), and surface plasmon resonance (spr), suffer from long assay time ( - hours for plate-based elisa), small dynamic range ( orders of magnitude for plate-based elisa), and large sample concentration and consumption ( - µg/ml for bli or spr) [ ] [ ] [ ] . to address these problems, here we present a simple and rapid approach for the affinity assessment of monoclonal antibodies. the assay mechanism and the corresponding protocol are illustrated in figures (a) and s (d), respectively. our assay follows a single-step elisa format. as in the igg detection experiment, recombinant s proteins were first immobilized on the supporting surface ( µg/ml), followed by the binding of igg. to reduce potential sources of error, we directly conjugated hrp molecules to purified igg molecules (molar ratio igg : hrp = : , with the hrp conjugation kit from abcam). the antibodies were then diluted to six different concentrations ( - ng/ml with × serial dilution) and applied to the s protein-coated elisa reactors. the immobilized igg can be quantified after a short incubation of minutes and rinses. although the antibodies may not reach equilibrium by the end of the incubation, the quantity of the immobilized antibody (and hence the signal intensity) should still correlate positively with the affinity of the antibody. to compare the antibodies' affinities towards the s protein of sars-cov- and sars-cov, we performed a side-by-side experiment for all antibodies under test . the antibodies were cr , d , d , and d .
the first three antibodies were used as calibration standards in the igg detection experiments and d was used as the detection antibody in the s detection experiments. thanks to the simple protocol, our assay exhibits excellent intra-assay consistency (see fig. s as an example), thus ensuring highly reliable measurements. as shown in figure (b), these four antibodies have very different affinities towards the s protein of sars-cov- . note that the data for ng/ml are not presented because the signal is not detectable for any of the antibodies except d (the signal for d is very weak, see figure s ). for the points within the linear dynamic ranges, the signal intensities with the strongest antibody (d ) are - times higher than those with the weakest antibody (cr ). for example, the signal intensities for these two antibodies at ng/ml are and . , respectively. on the other hand, the lowest detectable concentration for d is ng/ml and the lowest detectable concentration for cr is ng/ml. this is further evidence of the difference in the antibodies' affinities. these observations agree with several recently published experimental results and our own preliminary results [ ] [ ] [ ] , and with our measurements at sino biological (the equilibrium dissociation constant, kd, for cr and d is - nm and < nm, respectively, based on bli measurements). in contrast, as shown in figure (c), the antibodies' affinity towards sars-cov s can be very different from their affinity towards sars-cov- s . for example, cr 's binding affinity towards sars-cov s is stronger than its affinity towards sars-cov- s . conversely, d 's binding affinity towards sars-cov s is weaker than its affinity towards sars-cov- s . in addition, the pattern of calibration curves with sars-cov- s is significantly different from that with sars-cov s . while the affinities of d , d , and cr towards sars-cov- s vary significantly, they appear to be very similar to each other towards sars-cov s .
although our current method allows us to rapidly evaluate the relative affinities of all antibodies, it is unable to extract the exact value of kd. this problem can be resolved by introducing multiple calibration antibodies with known affinities. we have demonstrated a portable chemiluminescent microfluidic elisa system that is able to conduct sensitive detection and quantification of sars-cov- -related biomarkers in only - minutes using microliter-sized sample volumes. the llod of ng/ml for igg in serum was achieved using the humanized chimeric antibodies as the model system. we also successfully characterized different antibodies and identified an antibody candidate, d , which can be used as the calibration antibody for quantitative evaluation of anti-sars-cov- s igg. this approach can also be extended to the evaluation of therapeutic convalescent plasma. furthermore, we demonstrated sensitive detection of sars-cov- s protein (antigen) with an llod of . ng/ml. finally, we showed that our technology can be used as an alternative approach for rapid ( . minutes) screening and validation of monoclonal anti-s antibodies. our method requires only tens of nanograms of antibody, which is several orders of magnitude less than is used in traditional label-free methods (such as bli and spr), and will be useful in the screening and selection of high affinity therapeutic neutralizing antibodies and research-use antibodies , [ ] [ ] [ ] . we will continue to optimize our assays in multiple aspects. for igg detection, we will investigate more humanized or patient-derived sars-cov- s antibodies to identify optimal calibrators with a high binding affinity and a large linear dynamic range. the differences in the antibodies' affinity against monomeric s (most commonly used in antibody and vaccine development) and trimeric s (the natural conformation of s on sars-cov- virus) can also be explored .
for s protein detection, we will improve the detection sensitivity, since the abundance of sars-cov- s may be very low in actual patient samples. based on our previous publications , , the llod can be greatly enhanced when we replace the hrp-conjugated detection antibody with a biotinylated detection antibody once it becomes available. for antibody affinity evaluation, we aim to further optimize the assay's protocol (such as adjusting the incubation time and rinsing time) and employ it to evaluate more therapeutic and research-use antibodies. our approach also opens a door for other covid- related clinical or laboratory research. for example, the diagnostic value of the covid- related biomarkers in serum and saliva (especially s specific iga, igm, and viral antigens such as the s and nucleocapsid (n) proteins) is currently under intensive evaluation , , . the igg detection method described in this work can be easily adapted to detect and quantify other types of antibodies such as igm and iga , , . the concept for sars-cov- s protein detection can also be adapted to detect other types of viral antigens (such as the sars-cov- n protein), as described in figure s . direct detection of viral antigens in patient samples such as serum and saliva may facilitate the rapid and cost-effective diagnosis of covid- , . finally, the microfluidic elisa platform can be used to study the neutralization efficacy of therapeutic antibodies (see figure s ), as well as for the recognition, evaluation, and phenotyping of natural and recombinant (fake) viral particles .
and sars-cov s protein (black circles) in times diluted human serum. the averaged background is subtracted from all data points. the solid lines are the linear fit of the data in the log-log scale. the grey shaded area marks ×standard deviation of the background. the lower limit of detection (llod) for sars-cov- s protein and sars-cov s is . ng/ml and . ng/ml, respectively. (c) calibration curves for s proteins between .
and ng/ml. the error bars are generated from duplicate measurements. different monoclonal humanized s specific igg against the s protein from sars-cov- (b) and sars-cov (c). the solid lines are the linear fit of the data in the log-log scale. the sample-to-answer time of this assay is minutes. (b)-(d) detection of s specific igg in times diluted serum, against the s protein from sars-cov- (red squares) and sars-cov (black circles). the calibration curves are generated with three different monoclonal humanized antibodies (cr in (b) error bars are generated from duplicate measurements. see also figure s for the entire dynamic range of cr , d , and d , and their respective lower limits of detection the authors thank the financial support from the department of biomedical engineering. the authors declare the following competing financial interest): m. k. k. o. and x. f. are cofounders of and have an equity interest in optofluidic bioassay, llc. key: cord- -wf hlhim authors: larsen, mads delbo; de graaf, erik l.; sonneveld, myrthe e.; plomp, h. rosina; linty, federica; visser, remco; brinkhaus, maximilian; Šuštić, tonći; de taeye, steven w.; bentlage, arthur e.h.; nouta, jan; natunen, suvi; koeleman, carolien a. m.; sainio, susanna; kootstra, neeltje a.; brouwer, philip j.m.; sanders, rogier w.; van gils, marit j.; de bruin, sanne; vlaar, alexander p.j.; zaaijer, hans l.; wuhrer, manfred; van der schoot, c. ellen; vidarsson, gestur title: afucosylated immunoglobulin g responses are a hallmark of enveloped virus infections and show an exacerbated phenotype in covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wf hlhim igg antibodies are crucial for protection against invading pathogens. a highly conserved n-linked glycan within the igg-fc-tail, essential for igg function, shows variable composition in humans. 
afucosylated igg variants are already used in anti-cancer therapeutic antibodies for their elevated binding and killing activity through fc receptors (fcγriiia). here, we report that afucosylated igg which are of minor abundance in humans (∼ % of total igg) are specifically formed against surface epitopes of enveloped viruses after natural infections or immunization with attenuated viruses, while protein subunit immunization does not elicit this low fucose response. this can give beneficial strong responses, but can also go awry, resulting in a cytokine-storm and immune-mediated pathologies. in the case of covid- , the critically ill show aggravated afucosylated-igg responses against the viral spike protein. in contrast, those clearing the infection unaided show higher fucosylation levels of the anti-spike protein igg. our findings indicate antibody glycosylation as a potential factor in inflammation and protection in enveloped virus infections including covid- . generally, changes in the fc glycans are associated with age, sex and autoimmune diseases ( ). serum igg are highly fucosylated at birth and slightly decrease to ~ % fucosylation at adulthood ( ). until now, no strong clues on how igg core fucosylation is controlled have come forward. we have previously observed that alloantibodies against red blood cells (rbc) and platelets show remarkably low igg-fc-fucosylation in most patients, even down to % in several cases ( - ), whereas the overall serum igg fc-fucosylation show consistently normal high levels. moreover, we have reported the lowered igg-fc fucosylation to be one of the factors determining disease severity in pregnancy associated alloimmunizations, resulting in excessive thrombocytopenia's and blood cell destruction when targeted by afucosylated antibodies ( - ). 
in addition to the specific afucosylated-igg response against platelets and rbc antigens, this response has also been identified against hiv and dengue virus ( , inspired by the similarities between the unique afucosylated igg responses in various alloimmune responses ( - , ), hiv ( ) and dengue( ) -all being directed against surface exposed and membrane embedded proteins -we analyzed igg glycosylation in anti- to investigate the fc-glycosylation of total-and antigen-specific antibodies, first igg from > human serum samples was affinity-purified using protein g affinity beads and immobilized antigens, respectively. thereafter, isolated igg was digested with trypsin and resulting igg -fc-glycopeptides were analyzed with liquid chromatography-mass spectrometry (lc-ms) ( fig. a) ( , , ). subsequently, intensities were extracted and igg- glycosylation profiles were calculated ( fig. b-c) . t-test for a,b, and c and as a one-way anova sidak's multiple comparisons test comparing total igg to antigen-specific igg within groups, and same specificity igg between groups, for d and e. only statistically significant differences are shown. *= p< . , **= p< . , ***= p< . , ****= p< . . analogous to the platelet and red blood cell alloantigens ( - , ), the response to these enveloped viruses also showed significant afucosylation of the antigen-specific igg (fig. b) , while the afucosylation was absent against the non-enveloped virus parvo b (fig. c ). of note, total igg showed high fucosylation levels throughout ( fig. a-c) , underlining that the majority of human igg responses consists of fucosylated igg responses ( , , ) . the extent of the response to the enveloped viruses was highly variable, both between individuals and between the types of antigen, which is in agreement with the variable tendency of different rbc-alloantigens to induce an afucosylated response ( ). afucosylation was particularly strong for cmv and to a lesser degree for hiv (fig. b) . 
the to their natural infection counterpart (fig. e, fig. s ). both showed reduction, with a more prominent difference for the mumps response (fig. e, fig. s ) . other glycan traits for anti- measles and anti-mumps are shown in fig. s . we then tested if this type of response also plays a role in patients with sars-cov- (covid- or non-enveloped viruses (fig. ) . the netherlands were used as controls. washing and eluting specific antibodies was done as described above for cmv-specific antibodies. (table s ) was at least higher than background plus times its standard deviation, otherwise the data was excluded ( ). the total level of glycan traits was calculated as described in table s . no correlation was found between the level of igg fc-fucosylation made during alloimmunization against hpa a in pregnancy (y-axis) and cmv (x-axis) in the same individual. b. also no correlation was found between the level of igg fc-fucosylation made against hiv (y-axis) and cmv (x-axis) in the same individual. statistical analysis was performed using pearson correlation.
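the signal cutoff and the glycan trait calculation mentioned above can be illustrated with a short python sketch. it is not the authors' pipeline: the exact trait definitions are given in a supplementary table that is not reproduced here, so the sketch assumes the common convention that fucosylation is the summed intensity of core-fucosylated glycoforms divided by the summed intensity of all glycoforms. the glycoform labels, the example intensities, and the multiplier k = 3 for the background cutoff are hypothetical.

```python
import statistics

# Assumed shorthand labels for core-fucosylated glycoforms (hypothetical naming).
FUCOSYLATED = {"G0F", "G1F", "G2F"}


def passes_qc(signal, background_values, k=3.0):
    """True if signal exceeds mean(background) + k * SD(background).

    k = 3 is an assumed value; the multiplier in the source text was lost.
    """
    mean_bg = statistics.fmean(background_values)
    sd_bg = statistics.stdev(background_values)
    return signal > mean_bg + k * sd_bg


def percent_fucosylation(intensities):
    """Percent of total glycopeptide intensity carried by fucosylated glycoforms.

    intensities maps a glycoform label to its extracted LC-MS intensity.
    """
    total = sum(intensities.values())
    fucosylated = sum(v for name, v in intensities.items() if name in FUCOSYLATED)
    return 100.0 * fucosylated / total


# Hypothetical intensities for one antigen-specific igg sample.
sample = {"G0F": 30.0, "G1F": 40.0, "G2F": 20.0, "G0": 4.0, "G1": 4.0, "G2": 2.0}
background = [10.0, 12.0, 11.0, 9.0, 13.0]
fucosylation = percent_fucosylation(sample)
keep_strong = passes_qc(20.0, background)
keep_weak = passes_qc(15.0, background)
```

applying the cutoff before computing the trait keeps low-intensity spectra, where a few noisy peaks could dominate the ratio, out of the comparison between total and antigen-specific igg.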
sweet but dangerous -the role of immunoglobulin g glycosylation in autoimmunity and inflammation glycosylation as a strategy to improve antibody-based therapeutics igg subclasses and allotypes: from structure to effector functions novel concepts of altered immunoglobulin g galactosylation in autoimmune diseases lack of fucose on human igg n-linked oligosaccharide improves binding to human fcgamma riii and antibody-dependent cellular toxicity unique carbohydrate-carbohydrate interactions are required for high affinity binding between fcgammariii and antibodies lacking core fucose antibodies to watch in high-throughput igg fc n-glycosylation profiling by mass spectrometry of glycopeptides human igg fc-glycosylation after birth and during early childhood regulated glycosylation patterns of igg during alloimmune responses against human platelet antigens a prominent lack of igg -fc fucosylation of platelet alloantibodies in pregnancy low anti-rhd igg-fc-fucosylation in pregnancy: a new variable predicting severity in haemolytic disease of the fetus and newborn glycosylation pattern of anti-platelet igg is stable during pregnancy and predicts clinical outcome in alloimmune thrombocytopenia a pilot study showing differences in glycosylation patterns of igg subclasses induced by pneumococcal, meningococcal, and two types of influenza vaccines antigen specificity determines anti-red blood cell igg-fc alloantibody glycosylation and thereby severity of haemolytic disease of the fetus and newborn glycans are a novel biomarker of chronological and biological ages severe sars-cov- infections: practical considerations and management strategy for intensivists. 
intensive care med clinical features of patients infected with anti-ha glycoforms drive b cell affinity selection and determine influenza vaccine efficacy decoding the human immunoglobulin g-glycan repertoire reveals a spectrum of fc-receptor-and complement-mediated-effector activities modified vaccinia virus ankara: history, value in basic research, and current perspectives for vaccine development anti-spike igg causes severe acute lung injury by skewing macrophage responses during acute sars-cov infection the pathogenesis and treatment of the 'cytokine storm'' in covid- comparison of the fc glycosylation of fetal and maternal immunoglobulin g fig. s : fucosylation of anti-measles. compared to total igg fucosylation, the antigen-specific igg fucosylation, of anti-measle antibodies was only significantly lowered in the younger vaccinated cohort (mean age . ). this is likely to be masked by the natural tendency of lowered total igg-fucosylation with increasing age ( ), as the naturally infected cohort (before introduction of the mev/muv vaccination program in s in the netherlands) is older than the vaccine cohort (average . vs . years, respectively). in line with this, the total igg fucosylation of the older cohort showed significantly lowered total-igg fucosylation compared to the younger vaccinated cohort / statistical analysis was performed as paired t-test for a,b, and c and as a one-way anova sidak's multiple comparisons test comparing total igg to antigenspecific igg within groups, and same specificity igg between groups, for d and e. only statistically significant differences are shown. (*= p< . 
) key: cord- - ozn il authors: de almeida, paula rodrigues; demoliner, meriane; antunes eisen, ana karolina; heldt, fágner henrique; hansen, alana witt; schallenberger, karoline; fleck, juliane deise; spilki, fernando rosado title: sars-cov quantification using rt-dpcr: a faster and safer alternative to assist viral genomic copies assessment using rt-qpcr date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ozn il in this study, serial dilutions of sars-cov rna extract were tested by rt-dpcr using three different primer-probe assays targeting the sars-cov nucleocapsid coding region. narrower confidence intervals, indicating high quantification precision, were obtained in and -fold serial dilution, and rt-dpcr results were equivalent between different assays in the same dilution. the high accuracy of this test allowed conclusions regarding the ability of this technique to evaluate precisely the amount of genomic copies present in a sample. we believe that this fast and safe method can assist other researchers in titration of sars-cov controls used in rt-qpcr without the need for virus isolation. human coronaviruses (cov) were often associated with mild respiratory diseases such as winter colds. in , an outbreak of a severe acute respiratory disease (sars) caused by cov changed the point of view of mankind about this virus (holmes, ) . in january of , a new cov (sars-cov ) emerged in hubei province, china, causing severe respiratory disease and affecting over thousand people in that region. the disease is now known as covid- (cov disease ). the infection varies in severity; it is usually severe and seldom fatal in individuals with other diseases and in the elderly, with a case-fatality rate between % and % in this age group in china. high transmissibility is the main difference between sars-cov and other cov genotypes affecting humans (wu & mcgoogan, ; velavan & meyer, ) .
the high rate of transmissibility has impaired the local control of sars-cov in china, allowing it to spread to europe and the americas. in italy and spain, it has been especially severe, reaching higher case-fatality and causing a collapse of official health services (lazzerini & putoto, ) . sars-cov infections are now in an acceleration phase in the americas, with over . cases in the us, and over million cases around the world. real-time information is among the most important factors helping to control local spread of covid- . additional information about viral titration methods in different samples is necessary to estimate viral pathogenicity, to assess treatment success, and to estimate the degree of viral transmissibility and viral survival on different surfaces and in different situations. titrating sars-cov via classical viral culture methods requires a biosafety level (bsl ) facility; moreover, these methods are time consuming and impose a higher risk of infection on the people involved. here, we present a fast, accurate and simple method of viral titration using quantstudio d® microchip-based rt-dpcr to titrate sars-cov genomic copies from controls to be used in rt-qpcr assays for diagnosis and research purposes. rna was extracted with the magmax™ rna isolation kit and cdna was synthesized with promega goscript™ according to the manufacturer's instructions. next, cdna was diluted in a -fold series from ⁻¹ to ⁻⁶ to approach ideal target copy numbers in the final reaction. optimal target concentration is essential for accurate quantification through rt-dpcr (majumdar et al., ) . a dilution that results in approximately to target copies in the final reaction usually presents better precision values. to achieve this titration of target copies, a series of rt-dpcr reactions was run on six serial -fold dilutions.
rt-dpcr was performed in a quantstudio d™ system (applied biosystems™), using quantstudio™ d digital pcr master mix v and quantstudio™ d k v chips. primers and probes targeting sars-cov nucleocapsid , and , as described by the fda (fda), were used in rt-dpcr reactions. cycling conditions were as follows: min at ºc, followed by cycles of sec at ºc and min at ºc, and a final step of ºc for min followed by maintenance at ºc. v chips were read in the quantstudio d instrument and the results were interpreted in the dpcr analysissuite™ app in the thermo fisher connect™ dashboard. results with precision values below % were selected to estimate the quantity of sars-cov genomic copies based on rt-dpcr. ideal precision values were achieved in all three assays at dilutions of : ⁻² and : ⁻³, as highlighted in table .
table . rt-dpcr results of -fold serial dilution of sars-cov control using assays for three targets in the nucleocapsid gene. selected samples are highlighted in bold.
target average (copies/µl)
the samples highlighted in table were selected to estimate genomic copies of sars-cov in the positive control stock to be used in future rt-qpcr assays. these samples presented a low precision value. the precision value is calculated from the confidence interval estimated for the sample. target titration through digital pcr relies on the poisson distribution: chips with microscopic partitions are used, an individual pcr reaction with a probe-based assay occurs in each partition, and quantification is based on the ratio of negative to positive partitions. chips used in this experiment have . partitions; in general, % of negative partitions results in good precision values. the precision of rt-dpcr quantification indicates how reliable the result achieved in a reaction is. it is directly related to the confidence interval, with a narrower confidence interval resulting in desirable precision values and consistent target quantification results.
precision is dependent on target concentration, and ideal precision can be achieved by serial dilutions. ideal concentrations are those with lower precision values and narrower confidence intervals (majumdar et al., ) . in this experiment three primer-probe assays were applied to titrate viral genomic copies per microliter. quantification and precision results were very close for the three probes at the same concentration. the distribution of reactions in the chips was also very similar, as shown in figure . these results indicate that these primer-probe assays are suitable for sars-cov quantification through rt-dpcr. all three assays target the same gene, but at different locations, and the similarity of the results across two different concentrations and different probes reinforces the reliability of this technique for titration of genomic copies without the need for excessive direct viral manipulation and time-consuming techniques. the limit of detection reported for these assays was , copies/µl (fda). control samples diluted : ⁻ presented an estimated value of approximately copies per µl; however, here cdna was diluted in water, and these samples were above the cutoff value established for interpretation of reliable quantification assays. results obtained by this method can be applied in rt-qpcr reactions, using the dilution that presented better precision values and a narrower confidence interval in the quantitative curve of the rt-qpcr, together with additional -fold dilutions above and below these values. this method is an alternative to the classical method of viral titration used in rt-qpcr curves. it is faster and safer than classical methods, it can be applied in bsl laboratories, and only cdna is needed, so controls that are shared between multiple laboratories can be quantified in a single laboratory. all these situations are very common during this pandemic crisis; hence, this technique might be helpful at present.
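the poisson reasoning behind digital pcr quantification can be made concrete with a short python sketch. it is an illustration under stated assumptions, not the algorithm implemented in the instrument software: the partition count, the partition volume, and the dilution factor in the example are hypothetical, the confidence interval uses a simple normal approximation, and "precision" is taken here to mean the relative half-width of that interval in percent. if a fraction p of partitions is negative, the mean number of copies per partition is estimated as λ = −ln(p).

```python
import math


def dpcr_quantify(n_negative, n_total, partition_volume_ul,
                  dilution_factor=1.0, z=1.96):
    """Estimate target concentration from digital PCR partition counts.

    Assumes targets are Poisson-distributed across partitions, so the mean
    copies per partition is lambda = -ln(fraction of negative partitions).
    Returns copies/ul in the undiluted stock with an approximate 95% interval.
    """
    if not 0 < n_negative < n_total:
        raise ValueError("need at least one negative and one positive partition")
    p_neg = n_negative / n_total
    lam = -math.log(p_neg)
    # Delta-method standard error of lambda from the binomial variance of p_neg.
    se = math.sqrt((1.0 - p_neg) / (n_total * p_neg))
    scale = dilution_factor / partition_volume_ul
    return {
        "copies_per_ul": lam * scale,
        "ci_low": (lam - z * se) * scale,
        "ci_high": (lam + z * se) * scale,
        # Relative half-width of the interval, in percent (assumed definition).
        "precision_pct": 100.0 * z * se / lam,
    }


# Hypothetical run: 20,000 partitions of 0.001 ul each, half of them negative,
# measured on a sample diluted 100-fold.
result = dpcr_quantify(n_negative=10_000, n_total=20_000,
                       partition_volume_ul=0.001, dilution_factor=100.0)
```

the sketch also shows why dilution matters: when almost all partitions are positive or almost all are negative, the standard error of λ grows relative to λ itself, so the interval widens and the precision value rises.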
sars coronavirus: a new challenge for prevention and therapy characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china the covid- epidemic covid- in italy: momentous decisions and many uncertainties. the lancet digital pcr modeling for maximal sensitivity, dynamic range and measurement precision accelerated emergency use authorization (eua) summary declaration of competing interest: the authors declare no conflict of interest. funding this study was funded by rede-virus mctic; capes key: cord- -fn gcxbb authors: ziv, omer; price, jonathan; shalamova, lyudmila; kamenova, tsveta; goodfellow, ian; weber, friedemann; miska, eric a. title: the short- and long-range rna-rna interactome of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: fn gcxbb the coronaviridae is a family of positive-strand rna viruses that includes sars-cov- , the etiologic agent of the covid- pandemic. bearing the largest single-stranded rna genomes in nature, coronaviruses are critically dependent on long-distance rna-rna interactions to regulate the viral transcription and replication pathways. here we experimentally mapped the in vivo rna-rna interactome of the full-length sars-cov- genome and subgenomic mrnas. we uncovered a network of rna-rna interactions spanning tens of thousands of nucleotides. these interactions reveal that the viral genome and subgenomes adopt alternative topologies inside cells, and engage in different interactions with host rnas. notably, we discovered a long-range rna-rna interaction - the fse-arch - that encircles the programmed ribosomal frameshifting element. the fse-arch is conserved in the related mers-cov and is under purifying selection. our findings illuminate rna structure based mechanisms governing replication, discontinuous transcription, and translation of coronaviruses, and will aid future efforts to develop antiviral strategies. 
rna viruses comprise the dominant component of the eukaryotic virome (dolja and koonin, ) . their error-prone genome replication mode allows them to rapidly evolve new variants and to jump from animals to humans (woolhouse and gaunt, ) , thus presenting a high epidemic and pandemic threat. several members of the betacoronavirus genus (family coronaviridae), namely the severe acute respiratory syndrome coronavirus (sars-cov), the middle east respiratory syndrome coronavirus (mers-cov), as well as the severe acute respiratory syndrome coronavirus (sars-cov- ) are of special concern. sars-cov- , the causative agent of coronavirus disease , has spread to date to nearly every country in the world, resulting in millions of infections, over a million of deaths, and a massive global economic impact (mckibbin and fernando, ) . even though worldwide efforts and resources are redirected to overcome the covid- pandemic, at present, there are no approved vaccines or antiviral medicines. this illustrates the urgent need for deciphering in-depth the molecular biology of coronaviruses, especially sars-cov- . coronaviruses have evolved the largest known single-stranded rna genome in nature. regulation of their mrna transcription and translation is facilitated by cis-acting structures that interact with each other, with viral proteins, and with host machineries (madhugiri et al., ) . mrna transcription in coronaviruses involves a process whereby so-called subgenomic mrnas (sgmrnas) are produced through discontinuous genomic rna (grna) template utilization, which is in contrast to replication of the full-length genome (sawicki et al., ) . this discontinuous transcription is mediated by the transcription regulating sequence-leader (trs-l) at the ′ end of the genome, and the transcription regulating sequence-body (trs-b) at the ′ ends of each orf. 
template switching between these rna sequence elements results in a set of ′ and ′ co-terminal, "nested" sgmrnas of different sizes on which the ′ proximal orfs are translated into nonstructural or structural viral proteins (moreno et al., ; sola et al., ) . the mechanisms underlying discontinuous transcription and genome replication have not been fully worked out, however long-distance rna-rna interactions along the viral genome have been proposed as key regulators (mateos-gómez et al., ; mateos-gomez et al., ; moreno et al., ; sola et al., ) . on the full-length grna itself, two partially overlapping open reading frames (orf a and orf b) are translated from the same start codon at the ′ end, resulting in the polyproteins pp a and pp ab. translation of the longer product pp ab is made possible by a hairpin-type pseudoknot rna structure known as the frameshifting element (fse) which regulates a programmed - ribosomal frameshifting that overrides with about % efficiency the stop codon of orf a (kelly et al., ; namy et al., ) . previous studies applied rna structure probing techniques using selective ′-hydroxyl acylation analyzed by primer extension (shape) and dms, as well as nuclear magnetic resonance (nmr) to effectively identify conserved cis-acting rna structures regulating the life cycle of coronaviruses. however, when it comes to identifying long-distance base-pairing between distal nucleotides, these methods fall short. therefore, the long-range rna-rna interactome of coronaviruses has never been mapped in full. deciphering how the various structural elements along the coronavirus grna and sgmrna are folded and brought together in time and space is vital for understanding, dissecting and manipulating viral replication, discontinuous transcription, and translation regulation. we recently developed crosslinking of matched rnas and deep sequencing (comrades) for in-depth rna conformation capture in living cells (ziv et al., ) . 
comrades is derived from a class of methods that combine psoralen crosslinking of base paired rna and deep sequencing (aw et al., ; lu et al., ; sharma et al., ) . comrades utilizes a clickable psoralen derivative to specifically crosslink paired nucleotides, and high throughput sequencing to retrieve their positions ( figure ). following in vivo crosslinking, the viral rna is selectively captured, fragmented and subjected to a click-chemistry reaction to add a biotin tag to crosslinked fragments. crosslinked rna duplexes are then selectively captured using streptavidin affinity purification. half of the resulting rna is proximity ligated, following reversal of the crosslink to create chimeric rna templates for high throughput sequencing. the other half is used as a control, in which reversal of the crosslink precedes the proximity ligation, and accurately represents the background level of non-specific ligation. the coupling of two biotin-streptavidin mediated enrichment steps, first of viral rna, and second of crosslinked rna duplexes provides high structural depth for identification of both long-and short-lived conformations. comrades can therefore measure (i) the structural diversity of alternative rna conformations that co-exist inside cells; (ii) short-distance, as well as long-distance (over tens of thousands of nucleotides) base-pairing within the same rna molecule; and (iii) base-pairing between different rna molecules, such as those of host and viral origin (kudla et al., ; ziv et al., ) . here we apply comrades to study the structural diversity of the sars-cov- grna and sgmrna inside cells. we discover networks of short-and long-range rna-rna interactions spanning the entirety of sars-cov- grna and sgmrna. we reveal site-specific interactions with the host transcriptome. finally, we uncover a conserved long-range structure encompassing the programmed ribosomal frameshifting element. 
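to make the notion of an interaction map concrete, the following python sketch bins chimeric reads into a contact table and measures how many of them span a long distance. it is a simplified illustration and not the comrades analysis pipeline: the input format (one pair of genome coordinates per chimeric read, one coordinate for each arm), the bin size, and the distance cutoff are hypothetical choices made for the example.

```python
from collections import Counter


def contact_counts(chimeras, bin_size):
    """Count chimeric reads per pair of genome bins.

    chimeras is an iterable of (left_pos, right_pos) coordinates, one per
    chimeric read. Bin pairs are stored with the smaller index first so that
    (i, j) and (j, i) are counted as the same contact.
    """
    counts = Counter()
    for left, right in chimeras:
        i, j = sorted((left // bin_size, right // bin_size))
        counts[(i, j)] += 1
    return counts


def long_range_fraction(chimeras, min_span):
    """Fraction of chimeric reads whose two arms lie at least min_span apart."""
    spans = [abs(left - right) for left, right in chimeras]
    return sum(span >= min_span for span in spans) / len(spans)


# Hypothetical chimeric reads on a ~30 kb genome.
reads = [(100, 250), (120, 29_000), (29_010, 130), (5_000, 5_100)]
table = contact_counts(reads, bin_size=1_000)
fraction = long_range_fraction(reads, min_span=1_000)
```

running the same counting on the control library, in which crosslink reversal precedes ligation, would give a per-bin estimate of non-specific ligation against which the sample counts can be compared.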
the sars-cov- genome and sgmrna adopt alternative co-existing topologies that involve long-distance base-pairing inside the host, the grna of sars-cov- is transcribed into sgmrna ( figure a ). to compare the structure of both types of rna, we applied the comrades method and set up a dual enrichment strategy to analyse the positive sense grna and positive sense sgmrna separately ( figure b ). briefly, we selectively pulled down the full-length positive sense sars-cov- genome from in vivo crosslinked, sars-cov- inoculated vero e /tmprss cells (matsuyama et al., ) , using a tiling array of antisense probes for orf a/b, which resulted in a highly enriched grna fraction ( figure c ). the full-length positive sense sgmrna was subsequently enriched from the grna-depleted supernatant of the first pulldown, using a second tiling array of antisense probes to the region downstream of orf a/b ( figure b ). this dual enrichment strategy resulted in a high degree of separation between the grna and the sgmrna ( figure c ). comrades provided > million nonredundant chimeric reads, which was sufficient to generate high-resolution maps for both the grna and sgmrna with a high signal to noise ratio ( figure s ), and high reproducibility between independent biological replicates (r = . , p value < . e- , figure d ,e). our structural data covered > . % of the coronavirus grna and the sgmrna ( figure c ), and represents the base-pairing nature of sars-cov- grna and sgmrna inside cells. available models for the rna structure of sars-cov- and related viruses are largely confined to short-distance base-pairing which result in local folding of important cis-acting elements (andrews et al., ; huston et al., ; kelly et al., ; lan et al., ; manfredonia et al., ; ryder, ; sanders et al., ; sun et al., ) . 
however, long-distance base-pairing between distal rna elements is equally essential for many rna viruses (huber et al., ), including coronaviruses (mateos-gómez et al., ; mateos-gomez et al., ; moreno et al., ; sola et al., ). the ability of comrades to capture rna base-pairing regardless of the distance between the interacting bases enabled us to confirm in vivo the structure of nearly all previously characterised cis-acting elements (with one exception, discussed below) and to discover long-distance rna-rna interactions as they occur inside cells. indeed, we observed a high prevalence of long-range rna base-pairing along the sars-cov- genome, with orf a demonstrating more long-range connectivity than any other orf ( figure f ). most of the base-pairing is confined to a single orf; however, some interactions cross orf boundaries. for example, orf a base-pairs with orf b, as well as with the ′ and ′ untranslated regions (utrs) ( figure f ). we additionally discovered long-distance interactions unique to the sgmrna ( figure g ). previous models of sars-cov- and related viruses mainly analysed structural population averages, i.e. they assumed that all copies of the genome and sgmrna have a single static conformation. yet the complex life cycle of viral rna genomes, in particular their engagement with multiple cellular and viral machineries such as the ones for replication, transcription, and translation, suggests a dynamic rna structure, as we and others have reported for zika virus (huber et al., ; li et al., ; ziv et al., ) and for hiv- (tomezsko et al., ). our structural analysis of sars-cov- reveals a high level of structural dynamics whereby alternative high-order conformations, some of which involve long-distance base-pairing, co-exist in vivo (figures and s a , table s ). for example, nucleotides , in orf a interact with three alternative distal regions: . kb upstream, . 
kb downstream, and kb upstream ( figure , arches , and respectively), and the ′ utr interacts with orf a as well as with the ′ utr ( figure , arches and respectively). in contrast, we find that orf n sgmrna is held in a single dominant conformation where the leader sequence interacts exclusively with a region . kb downstream ( figure s b ). in summary, we discover the co-existence of alternative sars-cov- grna and sgmrna topologies, held by long-range base-pairing between regions tens of thousands of nucleotides apart. each topology brings in physical proximity previously characterised and new elements involved in viral replication and discontinuous transcription, therefore offering a model for facilitating distinct patterns of template switching to produce the complete sars-cov- transcriptome. the infectious life cycle of coronaviruses takes place mainly in the host cell's cytoplasm, where many cellular rnas reside (sola et al., ) . host-virus rna-rna interactions regulate the replication of some rna viruses, e.g. the interaction between hepatitis c virus and human microrna mir- (jopling et al., ) , the interaction between zika virus and human mir- (ziv et al., ) , and the priming of hiv- replication by human trnas (mak and kleiman, ) . however, to the best of our knowledge, whether the sars-cov- grna or sgmrna interact with cellular rna is unknown. our comrades method provides an opportunity to undertake an unbiased analysis of the host-virus rna-rna interactome (ziv et al., ) . we discovered site-specific interactions between the sars-cov- rna and various cellular rnas, especially small nuclear rnas (snrnas) (figures a, b and s a, b) . apart from their canonical role in splicing, snrnas mature in the cytoplasm and may have additional biological roles (matera et al., ) . along the viral grna, cellular snrna interactions are mostly confined to orf a and orf b, and include site specific binding of u , u and u snrnas. 
the grna coding region for the sgmrna orfs and the utrs are largely devoid of snrna binding. in contrast, along the viral sgmrna, both the n orf and the ′ utr show high occupancy of u and u snrna binding. in order to explore the conservation of these snrna interactions in a related coronavirus, we performed comrades on mers-cov-inoculated huh- cells. similarly to sars-cov- , we identified a site-specific interaction of u snrna within the mers-cov orf a ( figures c and s c ), illustrating the evolutionary conservation of the u snrna base-pairing with orf a of betacoronaviruses. in addition to cellular small rnas, we also detected long cellular rnas interacting with sars-cov- rna, although to a lesser extent. of specific interest, the rnase mrp rna was found to base-pair with an extended ′ region of the sgmrna, but not the grna, of sars-cov- ( figure s d ). the rnase mrp rna has a conserved secondary structure similar to that of the rna component of the bacterial rnase p ribonucleoprotein (rnp) complex (dávila lópez et al., ; welting et al., ). the rnase mrp rna has a role in human pre-ribosomal rna processing (goldfarb and cech, ), leads to a spectrum of human disease when mutated (ridanpää et al., ), and has been implicated in viral rna degradation (jaag et al., ). targeting host-virus rna-rna interactions provides an attractive platform for developing new antiviral therapies, as resistance would require the virus to acquire considerable mutational changes to become independent of the host rna. however, whereas multiple tools and efforts are dedicated to identifying host-virus protein-protein interactions, the crosstalk between host and virus rna remains largely unexplored. coupled with the recent advancement in techniques to target rna in vivo, the capacity of comrades to map the host-virus rna-rna interactome opens up new opportunities to control emerging rna viruses. the data we present here could be valuable for the development of new targets for antiviral drugs. 
the ′ utr of coronaviruses contains five evolutionarily conserved stem-loop structures (denoted sl -sl ) that are essential for genome replication and discontinuous transcription (madhugiri et al., ). the ′ utr contains structural elements important for replication: an evolutionarily conserved bulged stem-loop (bsl) (hsue and masters, ), a partially overlapping hairpin-type pseudoknot (goebel et al., ; williams et al., ), and a ′ terminal multiple stem-loop structure containing a hyper-variable region (hvr), which folds back to create a triple helix junction (liu et al., ). our analysis identified seven of these eight cis-acting elements within the utrs ( figures a and s a ). however, our data did not support the folding of the stem-loop pseudoknot at the ′ utr. of note, two recent studies using shape methods to map the structure of sars-cov- inside cells similarly failed to identify this pseudoknot (huston et al., ; sun et al., ), and a previous study demonstrated the instability of this pseudoknot in the related mouse hepatitis virus (mhv) (stammler et al., ). in addition to the canonical utr structures, we provide here direct in vivo evidence for genome cyclization in sars-cov- , mediated by long-range base-pairing between the ′ and ′ utrs ( figures b and s b ). this base-pairing spans a distance of . kilobases and is among the longest-distance rna-rna interactions ever reported. genome cyclization was previously hypothesised from mutational analyses of murine coronavirus (mhv) and was suggested to facilitate discontinuous transcription (li et al., ). however, while mhv genome cyclization involves the ′ sl structure, we find that in sars-cov- , this process is mediated by the ′ sl instead, and results in complete opening of sl and disruption of the triple helix junction in the ′ utr ( figure b ). in agreement with this observation, sl of related betacoronaviruses was suggested to be weakly folded or unfolded (chen and olsthoorn, ; li et al., ) . 
genome cyclization plays an essential role in the replication of a number of rna viruses, including flaviviruses (hahn et al., ; ziv et al., ). the evolutionary selection of such a mechanism might stem from in-cell competition between intact and defective viral genomes, as it ensures that only genomes bearing two intact utrs engage with the replication machinery. the sars-cov- genome cyclization we report here results in a complete opening of the ′ sl where the transcription regulating sequence-leader (trs-l) resides, raising the possibility that genome cyclization regulates sars-cov- discontinuous transcription, as was previously suggested for mhv (li et al., ). it remains to be seen whether this base-pairing can be targeted to inhibit viral replication in vivo. in addition to genome cyclization, we identified two alternative conformations involving long-distance rna-rna interactions between each utr and orf a. these long-distance conformations result in unfolding of sl and sl in the ′ utr ( figures c and s c), and unfolding of the terminal stem-loop in the ′ utr ( figures d and s d ). of note, unlike the grna, the leader sequence within the ′ utr of orf n sgmrna is held in a single long-range conformation through base-pairing with a region . kb downstream ( figures e, s b and s e). all of the long-range interactions described above are strongly supported through chimeric reads ( figures f and s a ). overall, our data demonstrate the existence of alternative, mutually exclusive utr conformations inside cells, involving interactions between functional utr elements and distal regions within the orfs. we further show that the n orf sgmrna folds differently than the viral genome. 
the long-distance rna structure map for sars-cov- provides a practical starting point to dissect the regulation of discontinuous transcription, as it identifies cis-acting elements that interact with each other to create genome topologies that favour the synthesis of the ensemble of sgmrnas. rna viruses evolve sophisticated mechanisms to enhance the functional capacity of their size-restricted genomes and to regulate the expression levels of their replicase components. in coronaviruses, one such mechanism is programmed - ribosomal frameshifting to facilitate translation of orf b which contains the viral rdrp activity, and to set a defined ratio of orf a and orf b products (plant et al., ) . this is mediated by a ~ nucleotide long cis-acting frameshifting element (fse) composed of a stem-loop attenuator, and a slippery sequence followed by a single-stranded spacer and an rna pseudoknot (kelly et al., ) . it has been suggested that pausing the progression of the ribosome upstream of the pseudoknot facilitates a tandem-slippage of the peptidyl-trna and aminoacyl-trna to the − reading frame, thus allowing continuous translation through the stop codon at the end of orf a (brierley et al., ) . altering the frameshifting mechanism had a deleterious effect on sars-cov replication (plant et al., ) , making the fse an attractive target for antiviral therapy. understanding the surrounding rna structure and function is therefore of great importance as it might aid the design of drugs targeting the fse. unexpectedly, we find that the fse of sars-cov- is embedded within a much larger, ~ . kb long higher-order structure that bridges the ′ end of orf a with the ′ region of orf b, which we termed the fse-arch ( figure a ,b). 
to the best of our knowledge, this is the first time such a long-range structural bridge has been reported for any coronavirus, and importantly this structure is supported by the largest number of chimeric reads in our data (more than tens of thousands of non-redundant chimeric reads) ( figures s a,b), reflecting its high folding stability in vivo. the fse-arch results in a stem-loop structure encompassing , nucleotides, and bearing the fse within it ( figures b and s c ). we hypothesized that if an rna-rna interaction is functionally important, there should be purifying selection and hence a reduced nucleotide evolution rate in this region. we therefore used a recent dataset (firth, ) to explore the nucleotide conservation of the fse-arch ( figure c ). strikingly, the fse-arch is under strong purifying selection and is among the most conserved regions within the sars-cov- genome. consistent with this, analysing the phylogeny of the sars-related coronavirus subgenus (taxid: ) revealed two positions of covariance that support the conservation of the fse-arch ( figure b ). to further explore this structure experimentally, we analysed its existence in mers-cov. mers-cov shares only ~ % sequence identity with sars-cov- (lu et al., ), yet even so, performing comrades on mers-cov-inoculated huh- cells revealed strong evidence for a homologous fse-arch surrounding the mers-cov fse, bridging orf a with orf b ( figure d,e). while the mechanism governing the fse-arch formation will require further investigation, similar long-distance interactions around the frameshifting elements of several plant rna viruses were previously demonstrated to regulate frameshifting, possibly by assisting in back-stepping of ribosomes at the slippery sequence, and by stabilising the fse, allowing it to refold after the passage of each ribosome (barry and miller, ; cimino et al., ; gao and simon, ; tajima et al., ) . 
in addition to their coding capacity, nucleic acids have evolved structural capabilities to sense metabolites (mandal and breaker, ), catalyse reactions (pyle, ), and interact with other cellular components. when brought into physical proximity, different combinations of cis-acting sequences can lead to new biological activities. for example, interactions between promoters and enhancers dictate the rate of transcription along the eukaryotic genome (rowley and corces, ). great effort is being made to reveal the structural landscapes of the sars-cov- genome (andrews et al., ; huston et al., ; kelly et al., ; lan et al., ; manfredonia et al., ; ryder, ; sanders et al., ; sun et al., ). however, without deciphering the long-range connectivity, our understanding is far from complete. here we reveal how cis-acting elements along the coronavirus genome are

the authors declare no competing interests. e.a.m. is a founder and director of storm therapeutics. storm therapeutics had no role in the design, performance, analysis, interpretation, and writing of the study. o.z. is a consultant for evotec int. evotec int. had no role in the design, performance, analysis, interpretation, and writing of the study.

interactions that span at least nt are shown. colours as in (f). see figure s and table s for numbers of chimeric reads and significance of the arches. numbers within loops in (b-e) represent the loop sizes. grey arches adjacent to nucleotide sequences in (b-e) mark unpaired bases. full sequences are available in figure s . see figure s for statistical significance and the full sequence of the fse-arch.

irradiating the rna on ice with . kj/m nm uvc using a cl- crosslinker (uvp).

sequencing library preparation. library preparation was done as described in (ziv et al., ), using n unique molecular identifiers to eliminate pcr biases. pre-adenylated adapters were used and all ligation reactions were carried out without atp to reduce ligation artefacts. all libraries and controls went through pcr cycles using kapa hifi hotstart ready mix (kapa biosystems). pcr products were size-selected on a . % agarose gel before loading on a novaseq (illumina) for paired-end bp runs. a total of ~ . billion sequences was obtained for this study.

data pre-processing. data pre-processing was performed according to (ziv et al., ). in brief, raw paired-end reads were trimmed for adaptors and checked for quality using cutadapt (martin, ). trimmed paired-end reads were assembled into single reads using the program pear (https://cme.h-its.org/exelixis/web/software/pear/doc.html). pcr duplicates were removed using unique molecular identifiers via collapse.py (https://gitlab.com/tdido/tstk). chimeric reads were identified and annotated to the respective genome using hyb (travis et al., ). sars-cov samples were processed using the chlorocebus sabaeus reference genome (chlsab . ) with the addition of the sars-cov- sequence (nc_ . ). mers samples were processed using the homo sapiens reference genome (grch ) with the addition of mers (nc_ . ).

clustering of chimeras into chimeric groups. due to crosslinking and fragmentation, the comrades data can provide redundant structural information whereby the same in vivo structure produces sequencing reads differing by a few nucleotides. this increases the computational load of folding each chimeric read separately. to overcome this issue, and to gain better structure predictions, the reads were clustered into chimeric groups. each chimeric read is composed of a left side (l) and a right side (r), each originating from a different position along the grna or sgmrna. each chimeric read can therefore be described by its gap (g): the genomic distance between l and r. chimeric reads that originated from the same structure will have a similar g and can be clustered based on their g values. 
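The description of a chimeric read by its two sides and the gap between them can be made concrete with a small sketch. This is an illustration only, not the authors' code: the class `ChimericRead`, the helper `normalise`, and the choice to measure g from the end of l to the start of r are all assumptions, since the text does not state the exact endpoints.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChimericRead:
    """Hypothetical container for one chimeric read.

    Coordinates are 1-based, inclusive positions on the grna or sgmrna.
    """
    l_start: int
    l_end: int
    r_start: int
    r_end: int

    def gap(self) -> Tuple[int, int]:
        """Interval separating the two sides: (end of l, start of r)."""
        return (self.l_end, self.r_start)

    def g(self) -> int:
        """Genomic distance between l and r (assumed: r_start - l_end)."""
        return self.r_start - self.l_end


def normalise(a_start: int, a_end: int, b_start: int, b_end: int) -> ChimericRead:
    """Order the two arms of a chimera so that l is the upstream one."""
    if (a_start, a_end) <= (b_start, b_end):
        return ChimericRead(a_start, a_end, b_start, b_end)
    return ChimericRead(b_start, b_end, a_start, a_end)
```

Two reads produced by the same structure would then yield nearly identical `gap()` intervals, which is the property a grouping step can exploit.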
clustering of chimeric reads that originated from the same structure was performed using a network-based approach whereby an adjacency matrix is created for all chimeric reads based on the nucleotide difference between their g values (deltagap). this results in deltagap= for identically overlapping gaps, and increasing deltagap values for chimeric reads that share fewer overlapping sites. the clustering was performed twice per sample, once for the chimeric reads that represent short stem structures (g <= nt) and once for chimeric reads that represent long-distance interactions (g > nt). short-range interaction weights were calculated as: e(gi,gj) = -deltagap(gi, gj). this allows exactly overlapping gaps to have the highest weight, and gaps with no overlaps to have a weight of . for long-range interactions, weights were calculated as: e(gi,gj) = -deltagap(gi, gj). long-range interactions with weights lower than were set to , meaning that gaps that differ by more than nucleotides could not be considered as part of the same chimeric group. the weighted graphs created for long- and short-range interactions consisted of g as vertices and weights as edges: g = (v,e) using igraph (http://www.interjournal.org/manuscript_abstract.php? ). to identify densely connected subgraphs (communities) with chimeric groups containing chimeric reads that originated from the same structure, we clustered the graph using random walks with the cluster_walktrap function (steps = ) from the igraph package. chimeric groups containing fewer than chimeric reads were discarded. chimeric groups often contained a small number of longer l or r sequences due to the random fragmentation in the comrades protocol. to avoid introducing biases in the folding results, clusters were trimmed to the region from l to r for which evidence in the cluster is higher than the mean evidence - standard deviations.

folding. folding of the chimeric groups was performed using the vienna package (lorenz et al., ) . 
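The graph-based grouping of chimeric reads described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published pipeline: the weight appears to be a constant minus deltagap, but the constants are not preserved in the text, so `max_weight` and `min_reads` are hypothetical parameters; deltagap is assumed here to be the summed shift of the two gap endpoints; and plain connected components are swapped in for the walktrap community detection that the authors ran in igraph.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Gap = Tuple[int, int]  # (end of l, start of r) for one chimeric read


def delta_gap(a: Gap, b: Gap) -> int:
    """Assumed definition: total shift of both gap endpoints, in nucleotides."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def edge_weight(a: Gap, b: Gap, max_weight: int) -> int:
    """Constant minus deltagap, floored at zero (zero means no edge)."""
    return max(0, max_weight - delta_gap(a, b))


def cluster_gaps(gaps: List[Gap], max_weight: int, min_reads: int) -> List[List[int]]:
    """Group read indices whose gaps are linked by positive-weight edges.

    Connected components stand in for walktrap communities here.
    The pairwise loop is O(n^2), so this is only suitable for small inputs.
    """
    neighbours: Dict[int, Set[int]] = defaultdict(set)
    for i in range(len(gaps)):
        for j in range(i + 1, len(gaps)):
            if edge_weight(gaps[i], gaps[j], max_weight) > 0:
                neighbours[i].add(j)
                neighbours[j].add(i)

    seen: Set[int] = set()
    groups: List[List[int]] = []
    for start in range(len(gaps)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first walk over one component
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(component) >= min_reads:  # discard sparsely supported groups
            groups.append(sorted(component))
    return groups
```

Connected components will merge groups that walktrap might split when a chain of slightly shifted reads links two structures, so the sketch is best read as showing the weighting and filtering logic rather than the community detection itself.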
for short-range chimeric groups rnafold was used (with default parameters) and for long-range chimeric groups rnaduplex was used (with default parameters).

(iii) msa- -sarsrel- seq: pairwise distances between the sequences in (i) were calculated using the mbed function in the kmer package (blackshields et al., ). the "seeds" attribute was used to extract the sequence indices of all seed sequences. these seed sequences were included in this multiple sequence alignment. this resulted in a smaller sequence set with variation representative of (i). muscle (default parameters).

(iv) msa- -sarsrel- seq: the unaligned sequences used to generate (ii) were divided into seven smaller sequence sets (six -sequence sets and one -sequence set). the seed sequences ("seeds" attribute of the mbed function in the kmer package (https://cran.r-project.org/package=kmer)) for all seven sets were combined in a new sequence set, which was aligned to make up multiple sequence alignment (iv). this resulted in a sequence set with fewer sequences than (i), but more than (iii), and with variation representative of (i). muscle (default parameters).

in all cases the ncbi reference genome for sars-cov- (nc_ . ) was used as reference. for each of the four sequence alignments, the following steps were taken:

(i) sars-cov- comrades cluster coordinates. for long-range clusters (defined above) the segments defined by the coordinates of the left and right side of the respective cluster were extracted from the msa, and fused together to form a smaller msa containing only the aligned left and right side sequences. for short-range clusters (defined above) the whole region defined by the start position of the left side and the end position of the right side was extracted. those segments of the full msa will be referred to as "cluster alignments". in both cases, any sequence starting with more than empty positions was removed from that cluster alignment. 
(ii) the cluster alignments were analyzed with r-scape (rivas et al., ) using default parameters.

(iii) the r-scape output for each candidate co-varying pair includes an e-value (the expected number of false positives for the respective position pair in the cluster alignment). the default significance level of . was kept, so only position pairs with e-values smaller than . were considered in the subsequent analysis.

covariation analysis of sgmrna chimera clusters. the alignments described above were shortened to include the leader sequence fused to the full length of mrna-s, and were subsequently used here. a modified version of the code used for the full-genome chimera clusters was applied.

sequence conservation analysis of the extended fse structure. genome conservation data analyzed with synplot (firth, ) was taken from (firth, ). these data were aligned with our structural data and displayed in figure c .

ridanpää, m., van eenennaam, h., pelin, k., chadwick, r., johnson, c., yuan, b., van venrooij, w., pruijn, g., salmela, r., rockas, s., et al. 
( )
an in silico map of the sars-cov- rna structurome
in vivo mapping of eukaryotic rna interactomes reveals principles of higher-order organization and regulation
a - ribosomal frameshift element that requires base pairing across four kilobases suggests a mechanism of regulating ribosome and replicase traffic on a viral rna
sequence embedding for fast construction of guide trees for multiple sequence alignment
characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an rna pseudoknot
group-specific structural features of the '-proximal sequences of coronavirus genomic rnas
emerging coronaviruses: genome structure, replication, and pathogenesis
multifaceted regulation of translational readthrough by rna replication elements in a tombusvirus
conserved and variable domains of rnase mrp rna
metagenomics reshapes the concepts of rna virus evolution by revealing extensive horizontal virus transfer
muscle: a multiple sequence alignment method with reduced time and space complexity
mapping overlapping functional elements embedded within the protein-coding regions of rna viruses
a putative new sars-cov protein, c, encoded in an orf overlapping orf a
multiple cis-acting elements modulate programmed - ribosomal frameshifting in pea enation mosaic virus
characterization of the rna components of a putative molecular switch in the ′ untranslated region of the murine coronavirus genome
targeted crispr disruption reveals a role for rnase mrp rna in human preribosomal rna processing
conserved elements in the ' untranslated region of flavivirus rnas and potential cyclization sequences
a bulged stem-loop structure in the ' untranslated region of the genome of the coronavirus mouse hepatitis virus is essential for replication
structure mapping of dengue and zika viruses reveals functional long-range interactions
comprehensive in-vivo secondary structure of the sars-cov- genome reveals novel regulatory motifs and mechanisms
role of rnase mrp in viral rna degradation and rna recombination
modulation of hepatitis c virus rna abundance by a liver-specific microrna
structural and functional conservation of the programmed − ribosomal frameshift signal of sars coronavirus (sars-cov- )
rna conformation capture by proximity ligation
structure of the full sars-cov- rna genome in infected cells
structural lability in stem-loop drives a ′ utr- ′ utr interaction in coronavirus replication
integrative analysis of zika virus genome rna structure reveals critical determinants of viral infectivity
functional analysis of the stem loop s and s structures in the coronavirus ′utr
viennarna package .
genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding
coronavirus cis-acting rna elements
primer trnas for reverse transcription
gene regulation by riboswitches
genome-wide mapping of therapeutically-relevant sars-cov- rna structures
cutadapt removes adapter sequences from high-throughput sequencing reads
gene n proximal and distal rna motifs regulate coronavirus nucleocapsid mrna transcription
long-distance rna-rna interactions in the coronavirus genome form high-order structures promoting discontinuous rna synthesis during transcription
a day in the life of the spliceosome
enhanced isolation of sars-cov- by tmprss -expressing cells
the global macroeconomic impacts of covid- : seven scenarios. centre for applied macroeconomic analysis
shows lack of evidence for structure in lncrnas
organizational principles of d genome architecture
analysis of rapidly emerging variants in structured regions of the sars-cov- genome
comparative analysis of coronavirus genomic rna structure reveals conservation in sars-like coronaviruses
a contemporary view of coronavirus transcription
global mapping of human rna-rna interactions
continuous and discontinuous rna synthesis in coronaviruses
a conserved rna pseudoknot in a putative molecular switch domain of the '-untranslated region of coronaviruses is only marginally stable
in vivo structural characterization of the whole sars-cov- rna genome identifies host cell target proteins vulnerable to re-purposed drugs
a long-distance rna-rna interaction plays an important role in programmed - ribosomal frameshifting in the translation of p replicase protein of red clover necrotic mosaic virus
determination of rna structural diversity and its role in hiv- rna splicing
hyb: a bioinformatics pipeline for the analysis of clash (crosslinking, ligation and sequencing of hybrids) data
differential association of protein subunits with the human rnase mrp and rnase p complexes
a phylogenetically conserved hairpin-type ′ untranslated region pseudoknot functions in coronavirus rna replication
ecological origins of novel human pathogens
comrades determines in vivo rna structures and interactions

the authors thank christian drosten and john ziebuhr for providing the sars-cov- and mers-cov strains used in this study. we thank b. luisi, g. 
evan, and the department of

key: cord- - y o y authors: andreano, emanuele; nicastri, emanuele; paciello, ida; pileri, piero; manganaro, noemi; piccini, giulia; manenti, alessandro; pantano, elisa; kabanova, anna; troisi, marco; vacca, fabiola; cardamone, dario; de santi, concetta; agrati, chiara; capobianchi, maria rosaria; castilletti, concetta; emiliozzi, arianna; fabbiani, massimiliano; montagnani, francesca; montomoli, emanuele; sala, claudia; ippolito, giuseppe; rappuoli, rino title: identification of neutralizing human monoclonal antibodies from italian covid- convalescent patients date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: y o y

in the absence of approved drugs or vaccines, there is a pressing need to develop tools for therapy and prevention of covid- . human monoclonal antibodies have a very good probability of being safe and effective tools for therapy and prevention of sars-cov- infection and disease. here we describe the screening of pbmcs from seven people who survived covid- infection to isolate human monoclonal antibodies against sars-cov- . over , memory b cells were single-cell sorted using the stabilized prefusion form of the spike protein and incubated for two weeks to allow natural production of antibodies. supernatants from each cell were tested by elisa for spike protein binding, and positive antibodies were further tested for neutralization of spike binding to receptor(s) on vero e cells and for virus neutralization in vitro. from the , memory b cells specific for sars-cov- , we recovered b lymphocytes expressing human monoclonals recognizing the spike protein, and of these were able to inhibit the binding of the spike protein to the receptor. finally, mabs were able to neutralize the virus when assessed for neutralization in vitro. lead candidates to progress into the drug development pipeline will be selected from the panel of neutralizing antibodies identified with the procedure described in this study. 
one sentence summary: neutralizing human monoclonal antibodies isolated from covid- convalescent patients for therapeutic and prophylactic interventions.

the impact of the sars-cov- pandemic, with more than . million cases, , deaths and more than million people unemployed in the united states alone, is unprecedented. this first wave of infection is likely to be followed by additional waves in the next few years, until herd immunity, acquired by vaccination or by infection, constrains the circulation of the virus. it is therefore imperative to develop therapeutic and preventive tools to face the next waves of sars-cov- infections as soon as possible. among the many therapeutic options available, human monoclonal antibodies (mabs) are the ones that can be developed in the shortest period of time. in fact, the extensive clinical experience with more than commercial mabs approved to treat cancer, inflammatory and autoimmune disorders provides high confidence in their safety. these advantages, combined with the urgency of the sars-cov- pandemic, support and justify an accelerated regulatory pathway. in addition, the long industrial experience in developing and manufacturing mabs decreases the risks usually associated with the technical development of investigational products. finally, the incredible technical progress in the field makes it possible to shorten the conventional timelines and go from discovery to proof of concept trials in - months ( ). indeed, in the case of ebola, mabs were the first therapeutic intervention recommended by the world health organization (who) and they were developed faster than vaccines or other drugs ( ). the sars-cov- spike glycoprotein (s-protein) has a pivotal role in viral pathogenesis and it is considered the main target to elicit potent neutralizing antibodies and the focus for the development of therapeutic and prophylactic tools against this virus ( , ) . 
indeed, sars-cov- entry into host cells is mediated by the interaction between the s-protein and the human angiotensin converting enzyme (ace ) ( , ). the s-protein is a trimeric class i viral fusion protein which exists in a metastable prefusion conformation and in a stable postfusion state. each s-protein monomer is composed of two distinct regions, the s and s subunits. structural rearrangement occurs when the receptor binding domain (rbd) present in the s subunit binds to the host cell membrane. this interaction destabilizes the prefusion state of the s-protein, triggering the transition into the postfusion conformation, which in turn results in the entry of the virus particle into the host cell ( ). single-cell rna-seq analyses to evaluate the expression levels of ace in different human organs have shown that sars-cov- , through the s-protein, can invade human cells in different major physiological systems including the respiratory, cardiovascular, digestive and urinary systems, thus enhancing the possibility of spreading and infection ( ). to identify potent mabs against sars-cov- we isolated over , s-protein-specific memory b cells derived from seven covid- convalescent donors. as the s-protein rbd is mainly exposed when this glycoprotein is in its prefusion state ( ), we screened naturally produced mabs against either the s /s subunits or the s-protein trimer stabilized in its prefusion conformation ( ). this strategy allows us to identify mabs able to recognize linear epitopes as well as highly neutralizing trimer-specific regions on the s-protein surface. the potent neutralizing effect of trimer-specific mabs has already been shown for other pathogens including the respiratory syncytial virus (rsv) ( , ). in this paper we report the identification of a panel of mabs from which we plan to select lead candidates for clinical development. 
patients recovered from sars-cov- infection were enrolled in two ongoing clinical studies based in rome, italy (national institute for infectious diseases, irccs, lazzaro spallanzani) and siena, italy (azienda ospedaliera universitaria senese). we first examined whether these patients had anti-sars-cov- s-protein antibodies. plasma samples were evaluated by enzyme linked immunosorbent assay (elisa) to assess the polyclonal response to the s-protein trimer, for their ability to neutralize the binding of the spike protein to vero e cells (neutralization of binding or nob assay) and for their potency in neutralizing the cytopathic effect caused by sars-cov- infection in vitro. results shown in table and figure show that, among the seven donors included in this study, six were able to produce high titers of sars-cov- s-protein specific antibodies and in particular donors r- , r- and r- showed the highest virus neutralizing titers. only one patient (r- ) mounted a low anti-spike polyclonal response ( fig. a-b ). interestingly, although pearson correlation analysis found no statistically significant correlation, a trend toward correlation between s-protein binding, nob titer and neutralization titer was apparent, suggesting that an abundant response against the s-protein trimer in its prefusion conformation may be indicative of immunity against sars-cov- ( fig. c-d) . a larger dataset may be needed to support this observation. mbcs binding to the sars-cov- s-protein trimer in its prefusion conformation. a total of , s-protein-binding mbcs were successfully retrieved with frequencies ranging from , % to , % (table ) . following the sorting procedure, s-protein + mbcs were incubated over a layer of t -cd l feeder cells in the presence of il- and il- stimuli for two weeks to allow natural production of immunoglobulins ( ). 
subsequently, mbc supernatants containing igg or iga were tested for their ability to bind either the sars-cov- s-protein s + s subunits ( showing a broad array of neutralization potency ranging from % to over % (fig. b ). in the viral neutralization assay, supernatants containing naturally produced igg or iga were tested for their ability to protect the layer of vero e cells from the cytopathic effect triggered by sars- (table ) . the first drugs to be tested in the ebola outbreak and showed remarkable efficacy in preventing mortality ( ). given the remarkable efficacy of this intervention, potent mabs became the first, and so far the only, drug to be recommended for ebola by the who. in the case of sars-cov- , where so far we do not have any effective therapeutic or prophylactic interventions, mabs have the possibility to become one of the first drugs that can be used for immediate therapy of any patient testing positive for the virus, and even to provide immediate protection from infection in high risk populations. preliminary evidence showed that plasma from infected people improves the outcome of patients with severe disease, therefore it is highly possible that a therapeutic and/or prophylactic mab-based intervention can be highly effective ( ). furthermore, vaccination strategies inducing neutralizing antibodies have been shown to protect non-human primates from infection ( ). these results further stress the importance of mabs as a measure to counteract sars-cov- infection and to constrain its circulation. in this work we addressed the question of whether mabs recognizing sars-cov- can be recovered from memory b cells of people who survived covid- and whether some of them are able to neutralize the virus. our data show that sars-cov- specific mabs can be successfully isolated from most convalescent donors even if the frequency of s-protein specific memory b cells is highly variable among them. 
in addition, approximately % of isolated mabs were able to inhibit the binding of the s-protein to the receptor(s) on vero e cells. finally a fraction of isolated mabs (n= ) were able to effectively neutralize sars-cov- with high potency when tested in vitro. these data suggest that the method we implemented allows us to successfully retrieve mabs with potent neutralizing activity against sars-cov- and we plan to select the most promising candidate(s) for drug development. lead candidates will be further tested for the ability to generate resistant viruses and for the ability to induce antibody-dependent disease enhancement in appropriate models. elisa assay with s and s subunits of sars-cov- s-protein the presence of s -and s -binding antibodies in culture supernatants of monoclonal s-protein- to study the binding of the covid- spike protein to cell-surface receptor(s) we developed an assay to assess recombinant spike protein specific binding to target cells and neutralization thereof. to this aim the stabilized spike protein was coupled to streptavidin-pe (ebioscience # - - , the infected cultures were incubated at °c, % co and monitored daily until approximately - % of the cells exhibited cytopathic effect (cpe). culture supernatants were then collected, centrifuged at °c at , rpm for minutes to allow removal of cell debris, aliquoted and stored at - °c as the harvested viral stock. viral titers were determined in confluent monolayers of vero e cells seeded in -well plates using a % tissue culture infectious dose assay (tcid ). cells were infected with serial : dilutions (from - to - ) of the virus and incubated at °c, in a humidified atmosphere with % co . plates were monitored daily for the presence of sars-cov- induced cpe for days using an inverted optical microscope. the virus titer was estimated according to spearman-karber formula ( ) and defined as the reciprocal of the highest viral dilution leading to at least % cpe in inoculated wells. 
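the endpoint calculation named above can be illustrated with a minimal sketch of the textbook spearman-karber estimator. the source does not state which variant of the formula was used, and the function name, the well counts and the dilution series below are hypothetical values chosen only for illustration. the sketch assumes equally spaced log dilutions, a most concentrated dilution in which every well shows cpe, and a most dilute one in which none does.

```python
def spearman_karber_log10_titer(log10_reciprocal_dilutions, positive_wells, wells_per_dilution):
    """Return log10 of the 50% endpoint titer (per inoculum volume).

    Textbook Spearman-Karber form: log10(titer) = x0 - d/2 + d * sum(p_i),
    where x0 is the log10 reciprocal of the most concentrated dilution
    (which must be 100% positive), d is the log10 dilution step, and p_i
    is the proportion of CPE-positive wells at each dilution.
    """
    x = list(log10_reciprocal_dilutions)
    if len(x) != len(positive_wells) or len(x) < 2:
        raise ValueError("need matching dilution and well-count lists of length >= 2")

    # The estimator assumes a constant, increasing log10 dilution step.
    steps = [b - a for a, b in zip(x, x[1:])]
    d = steps[0]
    if d <= 0 or any(abs(s - d) > 1e-9 for s in steps):
        raise ValueError("dilutions must be equally spaced and increasing")

    proportions = [p / wells_per_dilution for p in positive_wells]
    # The series must bracket the endpoint: all wells positive at the
    # start, none positive at the end.
    if proportions[0] != 1.0 or proportions[-1] != 0.0:
        raise ValueError("series must run from 100% to 0% positive wells")

    return x[0] - d / 2 + d * sum(proportions)


# Hypothetical plate: ten-fold dilutions 10^-1 .. 10^-6, 8 wells each.
log10_titer = spearman_karber_log10_titer([1, 2, 3, 4, 5, 6], [8, 8, 6, 3, 1, 0], 8)
print(f"log10 TCID50 per inoculum volume: {log10_titer:.2f}")
```

the returned value is on a log10 scale and refers to the inoculum volume per well; converting to a titer per ml would require the inoculum volume, which is not recoverable from the text above.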
rr is an employee of gsk group of companies. we would like to thank the whole gsk vaccines pre-clinical evidence generation and assay -immunolgy function led by dr oretta finco for their availability and support as well as mrs we would also like to thank dr. mariagrazia pizza and dr. simone pecetta for initial insightful advice jason mclellan and his team for generously providing the sars-cov- s-protein stabilized in its prefusion conformation used in this study. furthermore, we would like to thank dr. daniel wrapp and dr we gratefully acknowledge the collaborators members of inmi covid- study group: maria alessandra abbonizio we would like to thank all the other members of the covid- unit and of the covid- team giacomo zanelli, and the covid- nursing staff developing therapeutic monoclonal antibodies at pandemic pace successful ebola treatments promise to tame outbreak structure, function, and antigenicity of the sars-cov- spike glycoprotein the trinity of covid- : immunity, inflammation and intervention structural and functional basis of sars-cov- entry by using human ace cryo-em structure of the -ncov spike in the prefusion conformation single-cell rna-seq data analysis on the receptor ace expression reveals the potential risk of different human organs vulnerable to -ncov infection. frontiers of medicine structural, antigenic and immunogenic features of respiratory syncytial virus glycoproteins relevant for vaccine development immunological goals for respiratory syncytial virus vaccine development. 
current opinion in immunology isolation of human monoclonal antibodies from peripheral blood b cells human monoclonal antibodies for discovery, therapy, and vaccine acceleration effectiveness of convalescent plasma therapy in severe covid- patients rapid development of an inactivated vaccine for sars-cov- one-hit models for virus inactivation studies this publication was supported by funds from the "centro regionale medicina di precisione" and by all the people who answered table . sars-cov- convalescent donors s-protein specific mbcs analyses. the table reports the number of s-protein-specific mbcs that were sorted and screened (for binding by elisa and for functionality by nob and viral neutralization) for each subject enrolled in this study; nd = not done. key: cord- -y zd oui authors: olagnier, david; farahani, ensieh; thyrsted, jacob; cadanet, julia b.; herengt, angela; idorn, manja; hait, alon; hernaez, bruno; knudsen, alice; iversen, marie beck; schilling, mirjam; jørgensen, sofie e.; thomsen, michelle; reinert, line; lappe, michael; hoang, huy-dung; gilchrist, victoria h.; hansen, anne louise; ottosen, rasmus; gunderstofte, camilla; møller, charlotte; van der horst, demi; peri, suraj; balachandran, siddarth; huang, jinrong; jakobsen, martin; svenningsen, esben b.; poulsen, thomas b.; bartsch, lydia; thielke, anne l.; luo, yonglun; alain, tommy; rehwinkel, jan; alcamí, antonio; hiscott, john; mogensen, trine; paludan, søren r.; holm, christian k. title: identification of sars-cov -mediated suppression of nrf signaling reveals a potent antiviral and anti-inflammatory activity of -octyl-itaconate and dimethyl fumarate date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: y zd oui antiviral strategies to inhibit severe acute respiratory syndrome coronavirus (sars-cov ) and the pathogenic consequences of covid- are urgently required. 
here we demonstrate that the nrf anti-oxidant gene expression pathway is suppressed in biopsies obtained from covid- patients. further, we uncover that nrf agonists -octyl-itaconate ( -oi) and the clinically approved dimethyl fumarate (dmf) induce a cellular anti-viral program, which potently inhibits replication of sars-cov across cell lines. the anti-viral program extended to inhibit the replication of several other pathogenic viruses including herpes simplex virus- and- , vaccinia virus, and zika virus through a type i interferon (ifn)-independent mechanism. in addition, induction of nrf by -oi and dmf limited host inflammatory responses to sars-cov infection associated with airway covid- pathology. in conclusion, nrf agonists -oi and dmf induce a distinct ifn-independent antiviral program that is broadly effective in limiting virus replication and suppressing the pro-inflammatory responses of human pathogenic viruses, including sars-cov . one sentence summary nrf agonists -octyl-itaconate ( -oi) and dimethyl fumarate inhibited sars-cov replication and virus-induced inflammatory responses, as well as replication of other human pathogenic viruses. the sars-cov pandemic emphasizes the urgent need to identify cellular factors and pathways that can be targeted by new broad-spectrum anti-viral therapies. viral infections usually cause disease in humans through both direct cytopathogenic effects and through excessive inflammatory responses of the infected host. this also seems to be the case with sars-cov as covid- patients develop cytokine storms that are very likely to contribute to, if not drive, immunopathology and severe disease( , ). for these reasons, anti-viral therapies must aim to not only inhibit viral replication but also to limit inflammatory responses of the host. 
nuclear factor (erythroid-derived ) -like (nrf ) functions as a cap´n´collar basic leucine zipper family of transcription factors characterized structurally by the presence of nrf -ech homology domains ( ) . at homeostasis, nrf is kept inactive in the cytosol by its inhibitor protein keap (kelch-like ech-associated protein ), which targets nrf for proteasomal degradation( ). in response to oxidative stress, keap is inactivated and nrf is released to induce nrf -responsive genes. in general, the genes under the control of nrf protect against stress-induced cell death and nrf has thus been suggested as the master regulator of tissue damage during infection ( ) . importantly, nrf is now demonstrated as an important regulator of the inflammatory response ( , ) and functions as a transcriptional repressor of inflammatory genes, most notably interleukin (il-) b, in murine macrophages ( ) . recent reports have now demonstrated that nrf is induced by several cell derived metabolites including itaconate and fumarate, to limit inflammatory responses to stimulation of tlr signaling with lipopolysaccharide stimulation ( ) . the chemically synthesized and cell-permeable derivative of itaconate, -octyl-itaconate ( -oi) was then demonstrated to be a very potent nrf inducer ( ) . of special interest is the derivative of fumarate, dimethyl fumarate (dmf), a us food and drug administration (fda) approved drug, which is used as an anti-inflammatory therapeutic in multiple sclerosis (ms) and demonstrated, at least in animal models, a potent capacity to suppress pathogenic inflammation through a nrf -dependent mechanism ( , ) . besides limiting the inflammatory response to lps, induction of nrf by -oi also inhibits the stimulator of interferon genes (sting) antiviral pathway along with interferon (ifn) stimulated gene expression ( ) . 
in opposition to this anti-viral effect of nrf on the ifn- response a recent single-cell rna-seq analysis has demonstrated that nrf gene expression signatures correlated negatively with susceptibility to hsv infection ( ) . whether nrf agonists can be used to limit viral replication of sars-cov or other pathogenic viruses is, however, not known. here we demonstrate that expression of nrf -dependent genes is suppressed in biopsies from covid- patients and that treatment of cells with nrf agonists -oi and dmf induces a strong anti-viral program that limits sars-cov replication. the anti-viral effect of activating nrf extended to other pathogenic viruses including herpes simplex virus- and- (hsv- and hsv- ), vaccinia virus (vacv), and zika virus (zikv). further, -oi and dmf limited the release of pro-inflammatory cytokines in response to sars-cov infection and to virus-derived ligands through a mechanism that limits irf dimerization. in summary, we demonstrate that nrf agonists are plausible broad-spectrum anti-viral and anti-inflammatory agents and we suggest a repurposing of the already clinically approved dmf for the treatment of sars-cov . to identify host factors or pathways that are important for controlling sars-cov infection, publicly available transcriptome data sets including transcriptome analysis of lung biopsies from covid- patients were analyzed using differential expression analysis ( ) . here, genes linked with inflammatory and anti-viral pathways, including rig-i receptor and toll-like receptor signaling, were highly enriched in covid- lung patient samples, whereas genes associated with the nrf dependent anti-oxidant response were highly suppressed ( fig. a-c) . that nrf -induced genes are repressed during sars-cov infections was supported by re-analysis of another data set building on transcriptome analysis of lung autopsies obtained from five individual covid- patients (desai et al., ) (fig. d) . 
further, that the nrf -pathway is repressed during infection with sars-cov was also supported by in vitro experiments where the expression of nrf -inducible proteins heme oxygenase (ho- ) and nad(p)h quinone oxidoreductase (nqo ) was repressed in sars-cov infected vero htmprss cells while the protein levels of canonical anti-viral transcription factors such as stat and irf seemed unaffected (fig. s ) . these data indicate that sars-cov targets the anti-oxidant nrf pathway and thus suggest that the nrf pathway restricts sars-cov replication. nrf agonist -oi (fig. h) . the effect of -oi was also retained in the lung cancer cell line calu- , where sars-cov rna levels were reduced by > -logs (fig. i) , while release of progeny virus was reduced by > -logs based on tcid analysis of cell supernatants (fig. low compared to what we could observe in calu and vero cells but -oi treatment still reduced sars-cov rna levels and release of progeny virus (fig. l+m) . we further tested the anti-viral effect towards sars-cov in primary human airway epithelial (hae) cultures (fig. n) . here, -oi treatment also significantly reduced viral rna levels (fig. o) . interestingly, when treating calu cells with dmf, another known nrf inducer and a clinically approved drug in the first-line treatment of multiple sclerosis, we could also observe an anti-viral effect toward sars-cov replication similar in magnitude to what we had observed with -oi (fig p-q) as well as a reduced but significant effect when using vero cells (fig. r) . to further evaluate whether the nrf /keap axis controls an anti-viral pathway effective in inhibiting sars-cov replication, we used genetic activation of nrf by silencing of keap . this approach supported the -oi data as silencing of keap led to suppressed replication of sars-cov in calu cells by qpcr analysis, immunoblotting, and tcid analysis (fig. s-u) . 
finally, the anti-viral effect toward sars-cov was not restricted to this particular isolate as the effect of -oi was reproduced using a different sars-cov isolate obtained in japan( ) (fig. v+x) . these data demonstrate that nrf inducers -oi and dmf induce potent anti-viral responses that efficiently inhibit sars-cov replication across multiple cellular systems. virus (hsv) type and we could observe that treatment with -oi reduced the release of progeny virus, the cellular content of virus rna determined by rna sequence analysis, and viral protein as determined by both immunoblotting and flow cytometry (fig. a-e and fig. s ) . by contrast, the expression of nrf -inducible ho- , nqo , and sequestosome (sqstm ) was highly increased in response to -oi treatment (fig. c and fig. s ) . the anti-viral effect of -oi was at least partially dependent on nrf as silencing hereof by sirna clearly reduced the suppression of hsv infection by -oi (fig. f-g) . vaccinia virus (vacv) belongs to the family of human pathogenic poxviruses. we used hacat cells, but also bone marrow derived dendritic cells (bmdcs), to test if the anti-viral effect of -oi extended to these viruses. here we could observe that both hacat cells and bmdcs became highly resistant to infection with vacv when these were pre-treated with -oi as measured by plaque assay and flow cytometry (fig. h-n) . this seemed also to be the case for another poxvirus, ectromelia virus (ectv), as assessed by confocal imaging (fig. k-l) . for both hsv and vacv the anti-viral effect of -oi extended to other cell types including murine cancer cell line t and human renal carcinoma -o cells (fig. s ) . interestingly, the anti-viral effect of -oi did not extend to infection with vesicular stomatitis virus (vsvd m), emphasizing that the anti-viral program induced by -oi effectively inhibits replication of many, but not all, viruses (fig. s ) . 
the anti-viral effect of -oi relied on intracellular restriction of replication, since viral entry was not affected by -oi treatment; if anything, it seemed to be slightly increased (fig. s ) . to determine if the anti-viral effect of -oi extended to an in vivo model of viral pathogenesis, female c bl j mice were treated with -oi prior to vaginal inoculation with hsv; pre-treatment with -oi decreased disease progression (fig. s ) , an effect that was enhanced in mice deficient in sting (tmem -/-) most likely due to the pro-viral effect -oi has on the sting signaling pathway ( , ) , which is eliminated in these mice. finally, we tested the efficacy of -oi on zika virus, an important human pathogenic virus causing mild symptoms in the competent adult but severe disease when transmitted in utero ( ) . here, we could demonstrate that the anti-viral program induced by -oi reduced replication of zika virus in the human lung cancer cell line a and in the human liver cell line huh- (fig. o-p) . given that vero cells are deficient in type i ifn ( ) , this suggested that the inhibitory effect of -oi was actually independent of type i ifn signaling. to address this possibility, we used either hacat cells deficient in ifn alpha receptor (ifnar ), signal transducer and activator of transcription (stat ), both of which are necessary for type i ifn-signaling( ); or deficient in sting, which is central to type i ifn-response to dna viruses. here, cells were treated with -oi, followed by infection with hsv and vacv. replication of both viruses was inhibited by -oi in stat ko cells and for hsv also in sting ko cells as measured by plaque assay and expression of viral proteins by immunoblotting and flow cytometry (fig. s ) . in conclusion, -oi induces an anti-viral program that operates independently of ifn signaling (fig. s ) . 
to examine what general pathways are affected by -oi treatment, either alone or during infection, that could predict the antiviral mode of action of itaconate, we treated hacat cells with -oi before infection with hsv- . rna was then collected and analysed by rna sequencing. pathway analysis was then used to compare untreated cells to -oi treated cells either with or without infection with hsv- . this analysis identified several pathways that were either induced or repressed by -oi treatment while re-confirming that ifn-signaling pathways were repressed by -oi (fig. s ) . amongst the top up-regulated genes induced by -oi was the heme oxygenase- , an enzyme canonically involved in stress detoxification, also reported to have antiviral activity against amongst others zika, dengue and ebola viruses ( ) ( ) ( ) . to assess whether ho- had any antiviral activity in our cellular system, vero htmprss and calu- cells were either transfected with an overexpression plasmid encoding ho- or genetically silenced for keap and ho- by sirna, respectively before infection with sars-cov- . none of the treatments (ho- overexpression or silencing) appreciably altered sars-cov- infection/replication, suggesting an ho- -independent antiviral program induced by nrf (fig. s ) . in an attempt to pin-point the anti-viral mode of action of -oi, we also used microscope-based analysis of morphology by cell-paint technology (fig. s ) . with this analysis we are able to compare morphological changes in cells treated with -oi to cells treated with compounds that have known cellular targets and with cells treated with other compounds with reported anti-viral activity towards sars-cov including remdesivir and hydroxychloroquine ( , ) . in this analysis, -oi was determined to have a low but significant morphological activity without loss of cell viability. 
interestingly, the activity of -oi did not seem to overlap with other compounds with known perturbations of cell morphology including rapamycin, bafilomycin, tunicamycin, cycloheximide, emetine, mitomycin, or doxorubicin. there was also no observable overlap with the activity profile of remdesivir or hydroxychloroquine, indicating that the anti-viral mode of action of -oi is distinct from known anti-viral mechanisms. student's t-test to determine statistical significance where **p< . , ***p< . , and ****p< . . in covid- , an uncontrolled pro-inflammatory cytokine storm contributes to disease pathogenesis and lung damage ( ) . for this reason, we investigated if -oi and dmf could inhibit expression of pro-inflammatory cytokines induced by sars-cov . in calu- cells, infection with sars-cov increased the expression of ifnb , c-x-c motif chemokine (cxcl ), tumor necrosis factor alpha (tnfa), il- b and c-c chemokine ligand (ccl ). interestingly, this was abolished by pretreatment with -oi thus severely reducing the pro-inflammatory response to sars-cov (fig. a-b) . by contrast, expression of the nrf inducible gene hmox was highly increased in response to -oi treatment (fig. c) . the potential anti-inflammatory effect of -oi in this context was supported when using hae cultures. here, treatment with -oi also reduced the expression of ifnb , cxcl , tnfa, and ccl in the context of sars-cov infection (fig. d-e) , while increasing the expression of the nrf inducible gene hmox (fig. f) . a similar pattern was seen in experiments where calu cells were treated with dmf before sars-cov infection. here, ifnb , cxcl and ccl mrna levels were highly reduced in dmf treated cells while tnfa mrna levels seemed unaffected (fig. g+h) . by contrast, treatment with dmf increased the mrna expression levels of nrf inducible gene hmox (fig. i) . 
as inflammatory responses often stem from immune cells we also tested the effect of -oi on peripheral blood mononuclear cells (pbmcs) harvested from healthy donors. although stimulation of pbmcs with sars-cov yielded a very weak induction of cxcl compared to sendai virus (sev) infection, and no detectable induction of other cytokines, -oi treatment also reduced cxcl mrna levels in this context (fig. j) . further, when using pbmcs harvested from four individual patients with severe covid- and admitted to hospital intensive care units (icus), we could conclude that in three out of four patients, expression levels of cxcl were increased when compared to healthy controls; and that in all four patients, these levels were strongly reduced to or below normal when treating the pbmcs with -oi (fig. k) indicating that -oi is able to relieve this rig-i agonist m (fig l-m) , through an effect linked to the inhibition of interferon regulatory factor (irf ) dimerization but not of upstream phosphorylation of tank binding kinase (tbk ) or of irf expression itself (fig n) . importantly, nrf expression itself was closely associated with the inhibition of irf dimerization and host antiviral gene expression, since nrf silencing by sirna was sufficient to restore irf dimerization and limit the inhibitory effect of -oi (fig. n-o) . when using the constitutively active form of irf , irf ( d) ( ), - oi was still able to block irf dimerization, and again, this effect was eliminated when nrf expression was suppressed by sirna (fig p-q) . these data indicate that an nrf inducible and dependent mechanism targets the induction of ifn by inhibition of irf dimerization. this phenomenon is likely to add to the inhibition of sars-cov induced cytokine release we could observe when using nrf agonists. we have previously reported that -oi inhibits the expression of sting, which is important for the induction of the ifn-response in cells stimulated with cytosolic dna ( ) . 
in line, -oi inhibited the ifn-response to hsv infection and to stimulation with sting agonists dsdna and cgamp (fig. r-t) . and hacat(q) cells were transfected with indicated plasmids before treatment with -oi at µm. in (q), hacat cells were lipofected with sirnas for h before plasmid transfection. cells were then collected for analysis by qpcr (p) and immunoblotting (q). (r-t) hacat cells were treated with -oi at µm before infection with hsv at moi . or transfection with dsdna ( µg.ml - ). cell pellets were collected for qpcr and immunoblotting at and hours respectively. in (n-t), data display data from one experiment representative of at least to independent experiments. all statistical analysis were performed using a two-tailed student's t-test to determine statistical significance where **p< . , ***p< . , and ****p< . . altogether, this study demonstrated that the expression of nrf dependent anti-oxidant genes was significantly inhibited in covid- patients, and that the nrf agonists -oi and dmf inhibited both sars-cov replication, as well as the expression of associated inflammatory markers. the ability of these nrf inducers to also reduce potentially pathogenic ifn-and inflammatory responses while retaining their anti-viral properties is unique to these compounds and promotes their applicability to prevent virus-induced pathology. that nrf might be a natural regulator of ifn-responses in the airway epithelium is supported by a recent report demonstrating that nrf activity is high while ifn activity is low in the bronchial epithelium ( ) . as dmf is currently used as an anti-inflammatory drug in relapsing-remitting ms, this drug could be easily repurposed and tested in clinical trials to test its ability to limit sars-cov replication and inflammation- induced pathology in covid- patients. 
our observation that -oi strongly inhibits the ifn- response to both cytosolic dna and d rna, which are canonical anti-viral pathways, but still retain its ability to block viral replication also suggests a spectrum of unidentified cellular programs that are inducible through nrf and efficiently restrict viral replication independently of ifns. this is supported by already mentioned negative correlation between expression of covid- : consider cytokine storm syndromes and immunosuppression clinical features of patients infected with novel coronavirus in wuhan stress-activated cap'n'collar transcription factors in aging and human disease the nrf regulatory network provides an interface between redox and intermediary metabolism nrf as a master regulator of tissue damage control and disease tolerance to infection nrf is a critical regulator of the innate immune response and survival during experimental sepsis nrf -dependent protection from lps induced inflammatory response and mortality by cddo-imidazolide nrf suppresses macrophage inflammatory response by blocking proinflammatory cytokine transcription itaconate is an anti-inflammatory metabolite that activates nrf via alkylation of keap fumaric acid and its esters: an emerging treatment for multiple sclerosis with antioxidative mechanism of action fumaric acid esters exert neuroprotective effects in neuroinflammation via activation of the nrf antioxidant pathway nrf negatively regulates sting indicating a link between antiviral sensing and metabolic reprogramming single-cell rna-sequencing of herpes simplex virus -infected cells connects nrf activation to an antiviral program imbalanced host response to sars-cov- drives development of covid- parp- cleavage fragments: signatures of cell-death proteases in neurodegeneration enhanced isolation of sars-cov- by tmprss -expressing cells nrf negatively regulates type i interferon responses and increases susceptibility to herpes genital infection in mice emergence and 
spreading potential of zika virus defectiveness of interferon production and of rubella virus interference in a line of african green monkey kidney cells (vero) regulation of type i interferon responses nrf -dependent induction of innate host defense via heme oxygenase- inhibits zika virus replication the cytoprotective enzyme heme oxygenase- suppresses ebola virus replication human heme oxygenase is a potential host cell factor against dengue virus replication remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro remdesivir in adults with severe covid- : a randomised, double-blind, placebo-controlled, multicentre trial pathogenic potential of interferon alphabeta in acute influenza infection an optimized retinoic acid-inducible gene i agonist m induces immunogenic cell death markers in human cancer cells and dendritic cell activation structural and functional analysis of interferon regulatory factor : localization of the transactivation and autoinhibitory domains regional differences in airway epithelial cells reveal tradeoff between defense against oxidative stress and defense against rhinovirus key: cord- - gdawhfn authors: kirkland, p.d.; frost, m.j. title: the impact of viral transport media on pcr assay results for the detection of nucleic acid from sars-cov- and other viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gdawhfn during the sars-cov- pandemic, there has been an acute shortage of viral transport medium. many different products have been used to meet the demands of large-scale diagnostic and surveillance testing. the stability of sars-cov- rna was assessed in several commercially produced transport media and an in-house solution. coronavirus rna was rapidly destroyed in the commercial transport media though the deleterious effects on intact virus were limited. similar results were obtained for a type a influenza virus. 
there was reduced detection of both virus and nucleic acid when a herpesvirus sample and purified dna were tested. collectively these data showed that the commercial viral transport media contained nucleases or similar substances and may seriously compromise diagnostic and epidemiological investigations. recommendations to include foetal bovine serum as a source of protein to enhance the stabilising properties of viral transport media are contraindicated. almost all commercial batches of foetal bovine serum contain pestiviruses and at times other bovine viruses. in addition to the potential for there to be nucleases in the transport medium, the presence of these viruses and other extraneous nucleic acid in samples may compromise the interpretation of sequence data. the inclusion of foetal bovine serum presents a biosecurity risk for the movement of animal pathogens and renders these transport media unsuitable for animal disease diagnostic applications. while these transport media may be suitable for virus culture purposes, there could be misleading results if used for nucleic acid-based tests. therefore, these products should be evaluated to ensure fitness for purpose. for several decades a variety of medium solutions have been recommended to stabilise specimens for the detection of bacteria and viruses, particularly during diagnostic investigations. these have usually been based on balanced salt or saline solutions with a buffering capacity to maintain a "near-neutral" ph. to enhance the stability of viruses a spectrum of protein supplements has and continues to be recommended ( , , , ) . while some laboratories have prepared viral transport medium (vtm) "in house", commercial preparations are used extensively and are often supplied as part of a sample collection kit with sterile swabs. 
testing of samples by cultural methods meant that the emphasis of studies for the evaluation of these products originally focussed on the capacity of a preparation to maintain the infectivity of viruses at different temperatures while being held prior to and during transport and while being stored at the laboratory ( , ) . with the widespread introduction of molecular based diagnostic assays, especially real time pcr (qpcr), studies have been undertaken to evaluate the stability of viruses in vtms, particularly in commercially prepared products, while being held at a range of temperatures ( , , ) . however, while thermal stability has been considered, generally little attention has been given to other characteristics of the vtm or the potential impact of endogenous components. one commercially available product is specifically designed to inactivate viruses and bacteria and contains components to inhibit the activity of nucleases that may be present in the sample ( , ) . there are some other products that are recommended for use in molecular detection assays but the manufacturers provide no comment that these products are unlikely to be suitable for samples where virus culture will be attempted. during large-scale disease epidemics there can be pressure placed on the capacity of manufacturers to supply transport media and, during a pandemic, supply-chain and manufacturing pressures can become prohibitive. during the current ( ) sars-cov- pandemic, there has been an acute shortage of vtm in australia because of a combination of both local and international demand, the lack of a local manufacturer and partly because of reduced international airline flights to australia. consequently, many different vtms and similar solutions have been used to meet the demand for transport media generated by large scale diagnostic and surveillance testing. 
after becoming aware of concerns about variable results in some assays, we initiated a preliminary study to compare the stability of sars-cov- rna in several commercially available vtms and an in-house product. when disturbing results were found in this preliminary sars-cov- study, the investigation was extended to examine whether there was also an adverse impact on qpcr results for other viruses. also of concern are recommendations ( , ) to include foetal bovine serum (fbs) as a source of protein to enhance the stabilising properties of vtms. this report documents observations of the adverse impact of certain vtms on real time reverse transcription pcr (qrt-pcr) assays for the detection of sars-cov- virus as well as on a type a influenza virus and a herpesvirus, and discusses the broader implications of the inclusion of foetal bovine serum as a protein supplement to vtms. during the initial investigation, purified rna from an australian isolate (wmd dc ) of sars-cov- was supplied to the elizabeth macarthur agricultural institute (emai) by the institute of clinical pathology and medical research (icpmr), westmead, new south wales (nsw). a / dilution of this rna was made in trna (sigma, ng/ul) to use as a reference preparation (designated r ) and stored frozen at approximately - °c. subsequently, to include a sample which was expected to contain a high titre of intact, likely infectious virus, a sample from a recently diagnosed patient was selected (p ). this sample had also been stored frozen at approximately - °c soon after it had been tested a few days earlier. dilutions of these samples were tested in parallel (see below). to extend the study beyond sars-cov- virus, a type a influenza virus (l , an isolate of h n influenza ( )) was used. rna was purified from l using standard methods (see below) and was diluted / in the extraction elution buffer to prepare a working stock.
dilutions of this purified nucleic acid preparation were also tested in parallel with the infectious reference virus l as described for the sars-cov- virus. finally, a preparation of bovine herpesvirus (bhv- , m ) was similarly prepared as both purified dna and infectious virus to determine the influence of transport media on a dna virus. the study included different vtms. vtm- was an "in-house" product based on phosphate buffered saline (pbs ph . ) supplemented with . % gelatin (pbgs), antibiotics and . % phenol red as a ph indicator. the other three were commercially manufactured vtms -vtm- and vtm- are widely used, both in australia as well as many countries in the northern hemisphere; vtm- is a new australian manufactured product that is undergoing evaluation. vtm- and vtm- are believed to be supplemented with bovine serum albumen and gelatin and vtm- with foetal bovine serum. sterile phosphate buffered saline (pbs), ph . was used as a control solution. for each virus under study, the concentrated reference viruses and the respective purified nucleic acids were diluted in either phosphate buffered saline (pbs) or one of the vtm solutions and were then subjected to nucleic acid extraction and testing by qpcr. specifically, dilutions of the purified sars-cov- rna (r ) were prepared by adding either ul of r to ul for a / dilution of the vtm under study or, for / dilutions, ul of diluted r to an equal volume of the respective vtm. in the pilot experiment, only vtms - were included. within minutes of preparation of the dilutions of rna in each vtm, the diluted samples were extracted and tested by qrt-pcr as described below. subsequently, this pilot experiment was repeated with the inclusion of pbs and vtm- , as well as a series of dilutions of patient sample p . these dilutions of rna and virus were then extracted within minutes of preparation and tested in several sars-cov- qrt-pcr assays. 
later, less dilute preparations of r were made so that the impact of the vtms on higher concentrations of sars-cov- rna could also be assessed. to obtain sufficient vtm as a diluent for this study, the contents ( . - ml) of vials of vtm were pooled and mixed. to investigate whether the results obtained from the experiments involving sars-cov- were applicable to other viruses, the same experimental design was applied to a type a influenza virus and a herpesvirus and the respective purified viral nucleic acid extracts. the starting concentrations of nucleic acid and whole virus were adjusted so that the levels of nucleic acid (based on cycle-threshold (ct) values) were very close to those used in the sars-cov- experiments. in each instance each diluted sample (nucleic acid or whole virus) was extracted and tested within minutes of preparation of the dilutions. as well as testing the whole virus and purified nucleic acid combinations, the impact of the different vtms on the exogenous rna internal control (xipc) was also assessed by adding , or , copies of the xipc to ul of each vtm and testing immediately or after the dilutions had been held at °c for hours. finally, to confirm that vtm- can provide a high level of stability for clinical samples, swabs from sars-cov- infected patients were removed from the transport medium in which they had been collected, placed in new vials containing ml of vtm- and then held at °c for - days before testing. nucleic acid extraction and pcr assays: total nucleic acid was extracted from ul of each sample (including dilutions of the already purified rna or dna) with a magnetic bead-based viral rna extraction kit (magmax viral rna -ambion) run on a kingfisher- magnetic particle handling system (thermofisher). after purification the nucleic acid was eluted in ul of kit elution buffer and ul was run in the appropriate qrt-pcr.
the sars-cov- primers and probes were those of an assay directed at the e gene ( ) and of the ip assay targeting the rdrp gene ( ), as developed by the institut pasteur (ip), paris, france and recommended by the world health organisation. the sars-cov- primers and probe were used in a triplex assay format with the inclusion of the exogenous rna internal control (xipc) assay ( ) . the probes for the coronavirus assays (e/ip ) were both labelled with a fam reporter and black hole quencher (biosearch technologies) and the xipc probe with a vic reporter and tamra quencher (life technologies) as these combinations had been shown to provide high analytical sensitivity. a commercial reverse transcriptase mastermix (agpath-id one-step rt-pcr kit -life technologies) was used and run, with slight modifications, under the standard cycling conditions recommended by the manufacturer, using an abi or quantstudio (thermofisher scientific) thermocycler. the combined annealing and extension temperature was adjusted to °c and the assay was run for a total of cycles. the baseline was manually set between to cycles and the threshold at . . the results were expressed as cycle-threshold (ct) values, being the number of pcr cycles at which the amplification plot crossed the threshold. for comparative purposes a selection of samples was also tested using the cdc-designed assay ( -ncov_n ) targeting the n gene, which was also run under the published conditions ( ) with the same commercial mastermix. published primer and probe sets were used for the pan-reactive type a influenza qrt-pcr ( ), the bhv- qpcr ( ) and the pan-pestivirus qrt-pcr ( ) assays, each in a duplex format with the xipc assay. each assay utilised the agpath-id one-step rt-pcr mastermix and was run for cycles under the standard cycling conditions recommended by the manufacturer, with the baseline set automatically and the threshold at . .
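the definition of a ct value given above (the cycle at which the amplification plot crosses the threshold, after a baseline has been set) can be illustrated with a minimal python sketch. this is a simplified illustration, not the algorithm used by the thermocycler software: the function name, the linear interpolation between cycles and the example readings are all assumptions made for this sketch, and the baseline window and threshold are left as parameters because the values used in the study are not reproduced here.

```python
from typing import Optional, Sequence


def cycle_threshold(
    fluorescence: Sequence[float],
    threshold: float,
    baseline_start: int,
    baseline_end: int,
) -> Optional[float]:
    """Estimate a Ct value from one amplification curve (hypothetical helper).

    fluorescence[i] is the reading at cycle i + 1.
    baseline_start and baseline_end are 1-based, inclusive cycle numbers.
    Returns the fractional cycle of the first upward threshold crossing,
    or None if the baseline-corrected curve never crosses the threshold.
    """
    baseline = fluorescence[baseline_start - 1:baseline_end]
    if not baseline:
        raise ValueError("baseline window is empty")

    # Subtract the mean baseline signal so the threshold is applied
    # to the rise above background.
    offset = sum(baseline) / len(baseline)
    corrected = [reading - offset for reading in fluorescence]

    for i in range(1, len(corrected)):
        below, above = corrected[i - 1], corrected[i]
        if below < threshold <= above:
            # corrected[i - 1] belongs to cycle i; interpolate linearly
            # between cycle i and cycle i + 1.
            return i + (threshold - below) / (above - below)
    return None


# Illustrative readings only, not data from the study.
curve = [0.0, 0.0, 0.0, 0.1, 0.3, 0.7, 1.5]
ct = cycle_threshold(curve, threshold=0.2, baseline_start=1, baseline_end=3)
```

a sample in which no amplification occurs never crosses the threshold, so the sketch returns `None`, which corresponds to a "not detected" result.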
all assays included at least positive controls, one negative control (trna) and a 'no template' control (ntc; nuclease-free water). the xipc rna (approximately copies/ul) was included in the sample lysis buffer prior to the extraction of nucleic acid from all test and control samples except for the ntc wells. the results of these investigations are documented in tables - . the preliminary investigation (table ) showed that no sars-cov- rna was detected after dilution in vtm- or vtm- . in contrast, the results for vtm- were at the expected levels and those for the xipc were highly reproducible for each sample, indicating that there had been efficient nucleic acid extraction and that there were no inhibitors of the qrt-pcr in the samples.
table - preliminary comparison of sars-cov- qrt-pcr ct values when testing sars-cov- purified rna diluted in different viral transport media.
to confirm the initial results, the experiment was repeated (table , samples - ), with the inclusion of both rna and presumptively whole virus, as well as pbs and additional vtm solutions. vtm- and pbs gave very similar results for the rna samples. again, no sars-cov- rna was detected in the dilutions prepared in vtms - . consequently, a further set of dilutions (table , samples - ) was prepared in each solution to test whether there was any adverse effect on higher concentrations of rna. sars-cov- rna was not detected in any dilution prepared in vtm- and very weak reactivity (approaching the assay's limit of detection) was observed for vtm- and vtm- in the sample with the highest concentration of rna. in contrast, samples diluted in pbs and vtm- gave almost identical results, with a ct value of approximately for the highest rna concentration. this difference between the pbs/vtm- result and the results for the other vtms represents a reduction in analytical sensitivity of approximately log for the detection of free sars-cov- rna.
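a difference in ct values can be converted to an approximate fold or log reduction in detectable template, which is the reasoning behind the sensitivity estimate above. the sketch below assumes the usual model in which the amount of product is multiplied by (1 + efficiency) each cycle; the function names, the default efficiency of 1.0 (perfect doubling) and the example ct values are assumptions for illustration and are not taken from the study.

```python
import math


def fold_reduction(ct_reference: float, ct_test: float, efficiency: float = 1.0) -> float:
    """Fold reduction in template implied by a Ct shift (hypothetical helper).

    efficiency is the per-cycle amplification efficiency in (0, 1];
    1.0 means the product doubles every cycle.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (ct_test - ct_reference)


def log10_reduction(ct_reference: float, ct_test: float, efficiency: float = 1.0) -> float:
    """The same reduction expressed in log10 units."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (ct_test - ct_reference) * math.log10(1.0 + efficiency)


# Illustrative values only: a sample detected 10 cycles later than the
# reference contains roughly a thousand-fold less detectable template.
example_fold = fold_reduction(ct_reference=20.0, ct_test=30.0)
example_log = log10_reduction(ct_reference=20.0, ct_test=30.0)
```

under perfect doubling, each tenfold loss of template delays detection by about 3.3 cycles, so a sample that becomes undetectable has lost at least the number of logs separating its expected ct from the end of the run.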
the results for the xipc were again highly reproducible and similar for each dilution in each solution, confirming the high efficiency of rna extraction and no apparent impact of pcr inhibitors. testing was then undertaken to determine whether there might be a reduction in sensitivity when testing a sample that presumptively contains high quality intact virions. in this instance, the results for the dilutions of virus (table ) were similar for each vtm and the pbs, as were the results for the xipc. similar results for both sars-cov- purified rna and the patient sample were obtained in the cdc assay ( -ncov_n ) targeting the n gene (results not included). testing of the samples of vtm- to which swabs from sars-cov- positive patients had been added and stored at o c for - days gave results that were very similar to the original results. with the exception of a sample that had given a result (ct= . ) close to the limit of detection, the mean variation for the other samples was very similar ( . - . ) ct lower than the original result. when the influenza virus and its homologous purified rna were tested (table ) , similar trends were noticed for the rna results as had been observed for sars-cov- . the results for vtm- and pbs were comparable but no rna was detected in the dilutions in vtms - . the results for all solutions were similar at each dilution when whole virus was tested (data not included). finally, to establish whether these effects were limited to testing of rna viruses, a sample of bovine herpesvirus- was tested in the same model and range of dilutions. the results for each of the vtms and pbs containing the highest concentrations of herpesvirus dna were similar (table ) although there was a distinct trend towards higher ct values for vtm- and vtm- . the preparations of vtm- and vtm- with the lowest concentrations of dna gave negative results. 
to confirm that the differences observed were not due to random variation around the limit of detection of the assay, eight replicates for the two lowest dna concentrations (highest two dilutions of dna) were then tested (table ). these results confirmed that there had been some impact of vtm- , vtm- and vtm- on the stability of the purified dna extracts. as had been noted for previous experiments, the xipc gave highly reproducible results for all samples tested. when the dilutions of bovine herpesvirus were tested, the adverse effects that had been observed with the purified dna were again noted but perhaps in a more profound manner (table ). reduced sensitivity was observed with samples diluted in vtm- , vtm- and vtm- . in order to further confirm that the differences observed between the different dilutions of virus were not due to random variation around the limit of detection of the assay, seven replicates of each sample from the last dilutions that were positive in pbs were tested again. the results ( the experiments undertaken have shown that some of the vtm solutions examined had a significant and deleterious impact on purified viral rna and variable effects on dna. the synthetic xipc rna that was used throughout this study was not affected because it was prepared in a trna solution and included in the sample lysis buffer, which contains inhibitors of nuclease activity, and because each sample was extracted immediately after addition of the buffer. however, to determine whether the xipc construct could be affected by components in the vtms under study, two concentrations of xipc were prepared in pbs and each of the vtm solutions and tested after being held at room temperature for or hours. the adverse effect of each commercial vtm (table ) was apparent within the first hour and no rna was detected after hours.
table - qrt-pcr ct values for the exogenous synthetic rna when diluted in different viral transport media and tested immediately and after holding at room temperature for days.
the results of these studies clearly indicate that the commercially prepared vtm solutions have had an adverse impact on the ability to detect both sars-cov- and influenza rna. the results for the rna preparations diluted in the commercial vtms suggest that there are components of these vtms that have prevented the detection of the rna in these samples. as the detection of the xipc was not affected, we propose that these results provide unequivocal evidence of the presence of a nuclease(s) or similar substance in these vtms. the impact on these samples was rapid, as all samples were extracted within one hour of preparation yet no rna was detected in a sample that was estimated to contain more than copies of viral rna in a ul sample. the same result was obtained with assays that were directed at different regions of the sars-cov- genome. the results obtained for the virus samples that were diluted to similar concentrations as the rna samples and held for the same time period also support this hypothesis because it is expected that these samples contained a high proportion of intact nucleocapsids that offered protection to the viral rna. the same trends were observed with the influenza a virus rna and presumptively intact virions. although the effects on the herpesvirus samples were less pronounced, there was some impact on both purified dna and whole virus. it is also important to recognise that these observations reflect the outcome of contact between viral nucleic acid and vtm for less than hour in each instance. the levels of nucleic acid that were destroyed after almost immediate addition to the vtms were not insignificant.
with a concentration of rna that gave a ct value of approximately in both pbs and vtm- , this represents a reduction of approximately ten thousand-fold and cannot be ignored. while it might be argued that the adverse impact on whole virus appeared to be slight, free nucleic acid and perhaps whole virus was destroyed at levels that could be of diagnostic relevance ( ) . while the impact on rna viral genomes is likely to be markedly greater than on dna sequences, the outcome cannot be predicted as secondary structure may also have an influence ( ) and the speed and severity of the impact may vary depending on the nucleic acid target, as shown by the differences between the results for the two rna viruses and the xipc. further, it cannot be assumed that the target nucleic acid will always be protected by nucleoprotein. degradation will occur during the course of an infection and also under conditions where sample collection, transport and storage are sub-optimal. additionally, the adverse effects observed in this study could potentially be exacerbated with alternative nucleic acid purification technologies that take longer than the minutes required for this magnetic-bead based method. while undertaking surveillance and epidemiological tracing during a pandemic, failure to detect a moderate level of rna in a person who is asymptomatic could result in a critical source of infection remaining undetected. however, with the selection of an appropriate transport medium, sample degradation, even at room temperature, can be minimal. this is clearly shown by the performance of the 'in house' medium (vtm- ) where there is little evidence of deterioration of the rna samples after holding at room temperature for days and no significant deterioration of virus when held at o c for more than weeks. 
the world health organisation, the us centers for disease control and prevention (cdc) and the uk government have each provided recommendations for the formulation of vtm solutions to be used for the collection of specimens for sars-cov- testing and comment that a supplement of protein or glycerol should be added to enhance the stability of viruses ( , , ) . we believe that, while this is an essential feature of a high quality vtm, it is in achieving this requirement that the current problem with the commercially available vtms may have arisen. our 'in house' vtm includes gelatin which is extracted from animal tissues by treatment at very low or high ph, and prolonged boiling at high temperatures before sterilisation and drying. these steps inactivate both enzymes and infectious agents that are present as well as destroying residual nucleic acid. further, during the preparation of vtm- , the pbs solution to which the gelatin has already been added is also sterilised by heat treatment. in contrast, products that include bovine serum albumen or other serum-derived components, such as vtms , and , cannot be sterilised by autoclaving without coagulation of the protein supplement. therefore, for these vtms, the raw materials must each be free of nucleases and proteinases prior to sterilisation by methods that do not include heating. the choice of protein supplement is also an important consideration from the perspective of inadvertent addition of adventitious agents to the vtm and perhaps genomic dna from animals from which the protein supplement has been derived. foetal bovine serum (fbs) is a recommended supplement ( , ) but almost all commercial batches of fbs contain pestiviruses and at times other bovine viruses. a moderate concentration of bvdv viral rna (ct= . , data not included) was detected in vtm- . 
as whole genome nucleic acid sequencing is now often undertaken on many original, uncultured patient samples, the presence of these viruses and other extraneous nucleic acid in samples may reduce the sensitivity of sequencing protocols and complicate the interpretation of sequence data. the nucleases present may also have an impact on the quantity of rna available for sequencing. furthermore, the inclusion of fbs presents a biosecurity risk for the movement of animal pathogens between countries and renders such vtm solutions unsuitable for many animal disease diagnostic or research applications. as indicated in the uk government guideline ( ) there is a clear need for the "use (of) alternative swabs and transport medium in accordance with a locally validated laboratory strategy" to demonstrate fitness for purpose. the specifications for products that might be used for both nucleic acid detection methods and virus culture are likely to be more rigorous than for those vtms that are only used for one laboratory method. the special requirements for products that are suitable for nucleic acid testing have been recognised by some manufacturers who have developed specific transport media to inactivate the viruses of interest and to minimise the degradation of nucleic acid ( , ) . some of the major manufacturers of vtm solutions also offer products with additives to reduce nuclease activity but most of these also preclude opportunities to undertake virus culture. however, these limitations are often not apparent to purchasing departments, especially during a pandemic, when any vtm may be mistakenly thought to be "fit for purpose". 
in conclusion, the results of this study provide examples of how the composition of a vtm could have an impact on the outcome of nucleic acid based testing and, in particular, situations where either there is a need to detect rna that is not packaged into a nucleocapsid or where rna constructs may be diluted in a vtm for use as a positive control in an assay or perhaps for proficiency testing. finally, and particularly in the face of a pandemic, users should be reminded that products fit for one purpose may not be suitable for an alternative use. a product that may be eminently suitable for virus culture purposes could result in misleading results if used for nucleic acid-based tests. and other staff of the virology laboratory at emai for their assistance during the preparation of samples for the initial experiment and the longer-term storage of swabs. who manual on animal influenza surveillance and diagnosis collecting, preserving and shipping specimens for the diagnosis of avian influenza a(h n ) virus infection centers for disease control and prevention, . preparation of viral transport medium. 
standard operating procedure sop# covid- : guidance for sampling and for diagnostic laboratories maintenance of viability and comparison of identification methods for influenza and other respiratory viruses of humans comparison of various transport media for viability maintenance of herpes simplex virus, respiratory syncytial virus and adenovirus evaluation of swabs, transport media and specimen transport conditions for optimal detection of viruses by pcr comparison of nasopharyngeal nylon flocked swabs with universal transport medium and rayonbud swabs with a sponge reservoir of viral transport medium in the diagnosis of paediatric influenza comparison of copan eswab and floqswab for covid- pcr diagnosis: working around a supply shortage a clinical specimen collection and transport medium for molecular diagnostic and genomic applications comparison of a new transport medium with universal transport medium at a tropical field site influenza virus a (h n ) in chickens and poultry abattoir workers protocol: real-time rt-pcr assays for the detection of sars-cov- institut pasteur development and performance evaluation of calf diarrhea pathogen nucleic acid purification and detection workflow centers for disease control and prevention, . 
-novel coronavirus ( -ncov) real-time rrt-pcr panel primers and probes rapid detection of highly pathogenic avian influenza h n virus by taqman reverse transcriptase-polymerase chain reaction validation of a real-time pcr assay for the detection of bovine herpesvirus in bovine semen a universal heterologous internal control system for duplex real-time rt-pcr assays used in a detection system for pestiviruses impact of rna degradation on viral diagnosis: an understated but essential step for the successful establishment of a diagnosis network the authors are indebted to mr ian carter, institute of clinical pathology and medical research, westmead hospital, westmead nsw for the generous supply of the purified sars-cov- rna and for referral of the patient sample. we also appreciate the productive discussions with drs catherine pitman and dominic dwyer, nsw health pathology regarding the need for rigorous evaluation of viral transport media. we also thank dr deb finlaison for helpful comments on the draft manuscript and shannon mollica, rodney davis key: cord- -vp xd p authors: parisi, ortensia ilaria; dattilo, marco; patitucci, francesco; malivindi, rocco; pezzi, vincenzo; perrotta, ida; ruffo, mariarosa; amone, fabio; puoci, francesco title: “monoclonal-type” plastic antibodies for sars-cov- based on molecularly imprinted polymers date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vp xd p our idea is focused on the development of “monoclonal-type” plastic antibodies based on molecularly imprinted polymers (mips) able to selectively bind a portion of the novel coronavirus sars-cov- spike protein to block its function and, thus, the infection process. molecular imprinting, indeed, represents a very promising and attractive technology for the synthesis of mips characterized by specific recognition abilities for a target molecule. given these characteristics, mips can be considered tailor-made synthetic antibodies obtained by a templating process. 
in the present study, the developed imprinted polymeric nanoparticles were characterized in terms of particle size and distribution by dynamic light scattering (dls) and the imprinting effect and selectivity were investigated by performing binding experiments using the receptor-binding domain (rbd) of the novel coronavirus and the rbd of sars-cov spike protein, respectively. finally, the hemocompatibility of the prepared mip-based plastic antibodies was also evaluated. the genome analysis of the novel coronavirus revealed that it belongs to the subgenus sarbecovirus of the genus betacoronavirus and that it is closely related ( % identity) to two bat-sars-like coronaviruses (bat-sl-covzc and bat-sl-covzxc ) with which it forms a distinct lineage (lu et al., ; pradhan et al., ) . on the other hand, sars-cov- is divergent from sars-cov (about % similarity) and mers-cov (about % similarity) (lu et al., ) . the genome of sars-cov- encodes different proteins such as the spike protein, the envelope protein, the membrane protein, and the nucleocapsid protein. the coronavirus spike protein is a surface protein that mediates host recognition and attachment. it consists of two functional subunits: the s subunit, which contains a receptor-binding domain (rbd) responsible for host cell receptor recognition and binding, and the s subunit, which is involved in the fusion of the viral and host membranes. these two processes, which represent the initial steps in the coronavirus infection cycle, are crucial in determining host specificity, tissue tropism and transmission capacity (lu et al., ; wang et al., ) . although the sars-cov- genome is closer to bat-sl-covzc and bat-sl-covzxc , its rbd structure presents a high homology to that of sars-cov, which uses angiotensin-converting enzyme (ace ) as its host cell receptor (li et al., ) .
both sars-cov and sars-cov- belong to the β-genus and a recent study (wan et al., ) reported that the overall sequence similarities between -ncov and sars-cov spike proteins are around %- % for the whole protein and around %- % for the rbd. moreover, xu et al. (xu et al., ) have found that the novel coronavirus spike protein has a relevant binding affinity to human ace and, thus, this virus can interact with this entry receptor, causing the infection of human respiratory epithelial cells. therefore, the spike protein, which is involved in viral recognition and binding to human ace (lei et al., ; yan et al., ) and also plays a key role in human-to-human transmission of this novel coronavirus, represents the common and primary target for the development of antibodies, vaccines and therapeutic agents. in this context, our idea is to develop "monoclonal-type" plastic antibodies based on molecularly imprinted polymers (mips) for the selective recognition and binding of the rbd of the novel coronavirus sars-cov- with the aim of blocking the function of its spike protein (figure ). this kind of polymeric matrix is synthesized by polymerizing functional and crosslinking monomers around the template (parisi et al., ; parisi et al., a) . the selective recognition abilities of the imprinted polymers are due to the formation of a complex between the target analyte and the selected functional monomers during the pre-polymerization step. therefore, the choice of the monomers represents a crucial point in the preparation of effective mips and it is based on their ability to establish interactions with the functional groups of the template molecule in a covalent or non-covalent way. three main approaches, indeed, can be used to synthesize this kind of polymer depending on the nature of the interactions occurring between the template molecule and the chosen functional monomers during both pre-polymerization and binding steps (parisi et al., b; puoci et al., ) .
in the covalent one, template and functional monomers are covalently bound during the pre-polymerization phase and, after the polymerization reaction, the analyte is extracted from the polymeric matrix by chemical cleavage of the covalent bonds. then, the same covalent interactions are re-formed during the rebinding. the semi-covalent approach involves the formation of covalent interactions during the polymerization process and non-covalent interactions in the rebinding process. finally, the non-covalent approach is based on the formation of non-covalent interactions, including hydrogen bonds and electrostatic, π-π and hydrophobic interactions, between template and monomers during the polymerization process and the subsequent recognition phase. this method is widely employed due to several advantages such as the simple experimental procedure and the large variety of appropriate functional monomers. in the present study, the last imprinting approach was chosen for the mips-based antibodies preparation due to the nature of the template-monomers interactions, which are similar to those found in biological systems. once the polymerization reaction has taken place, the template is extracted leading to a porous crosslinked polymeric matrix containing binding holes fitting size, shape and functionalities of the target compound. plastic antibodies made from tailor-made polymeric imprinted nanoparticles represent an alternative to the traditional antibodies, which require an expensive production procedure and are often unreliable due to their restricted stability (refaat et al., ; xu et al., ) . being synthetic materials, mips are robust, physically and chemically stable in a wide range of conditions, including temperature and ph, and more easily available due to a low-cost, reproducible and relatively fast and easy preparation compared to the biological counterpart (piloto et al., ; wubulikasimu et al., ) . 
therefore, this kind of material combines the robustness of polymers with the selectivity of natural receptors (capriotti et al., ) and could find applications in several fields, including separation, catalysis and drug delivery, and as sensors, synthetic biomimetic receptors and recognition elements in bioanalytical assays (pan et al., ). moreover, imprinted polymers are highly versatile: they can be designed and engineered according to their specific application, for example as polymers with magnetic and/or fluorescent properties. synthetic polymeric antibodies against sars-cov- were produced according to the molecular imprinting technology (mit), and a non-imprinted polymer (nip) was also prepared following the same experimental procedure adopted for the imprinted nanoparticles, but in the absence of the novel coronavirus rbd. for this purpose, the non-covalent imprinting approach was adopted and biocompatible functional and crosslinking monomers were chosen. [table: values shown as means ± s.d.] the same experimental conditions were adopted to perform selectivity studies, which involved a standard solution of a molecule structurally similar to the -ncov rbd such as the rbd of the sars-cov spike protein. the obtained results showed no significant differences between mip and nip nanoparticles (table ). the developed mip-based plastic antibodies were designed for intravenous administration and therefore have to provide suitable hemocompatibility, a key requirement for any system that comes into contact with the bloodstream. to evaluate the hemocompatibility of the prepared imprinted nanoparticles, hemolysis tests were performed using an isotonic phosphate buffer solution and pure water as negative and positive controls, respectively. a polymeric material is considered non-hemolytic if the hemolysis percentage is below % (contreras-garcía et al., ; panikkar et al., ).
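the hemolysis percentage used in this criterion is commonly computed from the absorbance of the sample supernatant relative to the negative and positive controls. the python sketch below assumes that convention. the function names and the absorbance readings are hypothetical and are not taken from the study, and the cutoff is left as a caller-supplied argument because its value is not reproduced here.

```python
def hemolysis_percent(a_sample, a_negative, a_positive):
    """Percent hemolysis of a sample relative to the two controls.

    a_negative: absorbance of the negative control (isotonic buffer).
    a_positive: absorbance of the positive control (pure water, full lysis).
    """
    span = a_positive - a_negative
    if span <= 0:
        raise ValueError("positive control must absorb more than negative control")
    return 100.0 * (a_sample - a_negative) / span


def is_non_hemolytic(percent, threshold_percent):
    """True if the measured hemolysis is below the caller-supplied cutoff."""
    return percent < threshold_percent


# Hypothetical absorbance readings, for illustration only.
pct = hemolysis_percent(a_sample=0.12, a_negative=0.10, a_positive=1.10)
print(f"hemolysis: {pct:.2f} %")
```

normalizing against both controls makes the result independent of the baseline absorbance of the buffer, which is why both controls are run alongside the nanoparticle sample.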
the hemolysis assay revealed that the synthesized imprinted nanoparticles induced . % hemolysis; the observed hemolytic potential is thus within satisfactory limits, indicating good hemocompatibility without erythrocyte damage. in conclusion, molecular imprinting technology was adopted as a synthetic strategy to prepare molecularly imprinted nanoparticles able to selectively recognize and bind the spike protein receptor-binding domain of the novel coronavirus sars-cov- . the reported preliminary results suggest the potential use of this biocompatible polymeric material as mip-based "monoclonal-type" plastic antibodies designed to block the function of the virus spike protein. given these characteristics, the developed nanoparticles could potentially be used as drug-free therapeutics in the treatment of -ncov infection. moreover, when loaded with antiviral agents, these nanoparticles could act as a powerful multimodal system combining their ability to block the virus spike protein with the targeted delivery of the loaded drug. in addition, the same nanoparticles can be further engineered to become an immunoprotective vaccine or a mip-based sensor for diagnostic purposes. does the protein corona take over the selectivity of molecularly imprinted nanoparticles?
the biological challenges to recognition genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan surface functionalization of polypropylene devices with hemocompatible dmaam and nipaam grafts for norfloxacin sustained release coronaviruses: an overview of their replication and pathogenesis neutralization of sars-cov- spike pseudotyped virus by recombinant ace -ig structure of sars coronavirus spike receptor-binding domain complexed with receptor bat-to-human: spike features determining 'host jump'of coronaviruses sars-cov, mers-cov, and beyond genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding molecularly imprinted polymers as receptor mimics for selective cell recognition modified polyacrylamide microspheres as immunosorbent: trivandrum- full-genome evolutionary analysis of the novel corona virus ( -ncov) rejects the hypothesis of emergence as a result of a recent recombination event magnetic molecularly imprinted polymers (mmips) for carbazole derivative release in targeted cancer therapy molecularly imprinted polymers (mips) as theranostic systems for sunitinib controlled release and self-monitoring in cancer therapy molecularly imprinted polymers for selective recognition in regenerative medicine plastic antibodies tailored on quantum dots for an optical detection of myoglobin down to the femtomolar range uncanny similarity of unique inserts in the -ncov spike protein to hiv- gp and gag. 
biorxiv molecularly imprinted polymers (pims) in biomedical applications strategies for molecular imprinting and the evolution of mip nanoparticles as plastic antibodies-synthesis and applications ultrastructure and origin of membrane vesicles associated with the severe acute respiratory syndrome coronavirus replication complex epidemiology, genetic recombination, and pathogenesis of coronaviruses receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus mers-cov spike protein: targets for vaccines and therapeutics synthesis of fluorescent drug molecules for competitive binding assay based on molecularly imprinted polymers molecularly imprinted polymer nanoparticles as potential synthetic antibodies for immunoprotection against hiv evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission structural basis for the recognition of sars-cov- by full-length human ace a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord- -ug duzw authors: ni, dongchun; lau, kelvin; lehmann, frank; fränkl, andri; hacker, david; pojer, florence; stahlberg, henning title: structural investigation of ace dependent disassembly of the trimeric sars-cov- spike glycoprotein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ug duzw the human membrane protein angiotensin-converting enzyme (hace ) acts as the main receptor for host cells invasion of the new coronavirus sars-cov- . the viral surface glycoprotein spike binds to hace , which triggers virus entry into cells. as of today, the role of hace for virus fusion is not well understood. blocking the transition of spike from its prefusion to post-fusion state might be a strategy to prevent or treat covid- . here we report a single particle cryo-electron microscopy analysis of sars-cov- trimeric spike in presence of the human ace ectodomain. 
the binding of purified hace ectodomain to spike induces the disassembly of the trimeric form of spike and a structural rearrangement of its s domain to form a stable, monomeric complex with hace . this observed hace dependent dissociation of the spike trimer suggests a mechanism for the therapeutic role of recombinant soluble hace for treatment of covid- . coronavirus is a family of single-stranded rna viruses, many of which can infect animals and humans (maclachlan and dubovi, ; monto, ) . the symptoms of coronavirus-related diseases can be mild and mainly occur in respiratory tract. for example, roughly %- % cases of the common cold are caused by human coronaviruses (mesel-lemoine et al., ) . a coronavirus infection sometimes can develop serious illnesses, such as sars (severe acute respiratory syndrome), mers (middle east respiratory syndrome) and also the current pandemic covid- (coronavirus disease ) (tang et al., ) . sars-cov is a beta- coronavirus that caused a pandemic in . sars-cov- is a novel coronavirus that is genetically similar to the previous sars-cov. sars-cov- causes the ongoing pandemic covid- and has been spreading globally since the first quarter of this year (ciotti et al., ) . the symptoms of covid- vary from person to person. in some cases, the illness is very serious, in particular for the elderly (pascarella et al., ) . as of today, no specific antiviral drugs were approved for use against covid- and vaccine development is still at the phase of clinical testing. cryogenic electron microscopy (cryo-em) is a technique for structure determination of biomacromolecules, which has been particularly successful for studying high molecular-weight proteins. cryo-em does not require crystallization of the target protein. covid- related protein structures have been widely investigated since the spread of sars-cov- in this year, using cryo-em single particle analysis (spa). 
with a monomeric molecular weight of kda, the protein nicknamed spike is the largest viral surface protein of sars-cov- . it consists of two domains, s and s , that are connected by a short linker. spike forms stable trimers on the virus surface that are attached to the virus membrane. this spike trimer is the key molecule for host cell receptor binding and invasion of the host cells. the cryo-em structure of the entire spike homotrimer was determined recently, showing a mushroom-shaped overall architecture (walls et al., ; wrapp et al., a). as for sars-cov, the viral fusion bridge from sars-cov- to the host cell is formed by spike and the ectodomain of the human angiotensin-converting enzyme (hace ), which is the virus receptor on the host cell that triggers virus entry. in vitro studies have shown that the spike receptor binding domains (rbds) from sars-cov and sars-cov- both bind to the ectodomain of hace with comparable binding affinities in the low nanomolar range (lan et al., a). however, the new sars-cov- exhibits a more potent capacity for host cell adhesion, as well as a higher virus-entry efficiency, than other beta-coronaviruses (shang et al., ). the membrane-attached hace is known to be the key molecule for infection by several viruses, including sars-cov, human coronavirus nl (hcov-nl ) and sars-cov- . the infection process primarily involves virus adhesion and fusion (bao et al., ; fehr and perlman, ; kuba et al., ; sia et al., ). interestingly, hace may not only serve as a drug target to prevent sars-cov- infection, but hace itself may also be considered a potential therapeutic drug candidate for use against covid- or other beta-coronavirus related diseases. the clinical-grade soluble form of hace has been reported to be a potential novel therapeutic approach for reducing the infection of sars-cov- (monteil et al., ) by preventing the viral spike from interacting with other hace present on human cells.
recently, researchers have also characterized the entire architecture of the inactivated authentic virions from sars-cov- using cryo-electron tomography, observing that post-fusion s trimers are distributed on the surface of sars-cov- virions (ke et al., ; turonova et al., ) . the exact role of hace so far is not yet fully understood in terms of its interaction with full-length spike protein. in this study, we present a cryo-electron microscopy (cryo-em) study of the sars-cov- spike protein in complex with hace . our analysis reveals a monomeric complex of spike s domain with hace , requiring a large structural rearrangement in s compared to its isolated structure. our data show that hace binding induces a conformational change in spike, leading to spike trimer dissociation. spike and hace production and its complex assembly. the prefusion spike p ectodomain was expressed in expicho cells and affinity purified via its twin strep- tag. sds-page analysis showed the presence of pure full-length spike protein, consisting of both, the s and s domains at the expected molecular weight of kda for the spike monomer. (suppl. fig. s a ). the purified spike sample in pbs buffer was imaged as negatively stained preparations by transmission electron microscopy (tem). this revealed the expected trimeric shape, and d class averages of selected particles in negative stain tem images showed the typical, mushroom-shaped particles (suppl. fig. s a .c), in accordance with the expected structure of the sars-cov- spike in the pre-fusion state. human ace ectodomain was expressed in hek cells and purified via a poly-histidine immobilized metal affinity chromatography (imac) with a fastback ni + column, followed by another anion exchange column (suppl. fig. s b ). for details, see methods. purified spike protein was mixed with excess hace (molar ratio spike:hace of : ) and incubated for hours at º c. samples were prepared by negative staining and imaged by tem. 
unexpectedly, the observed particle features were largely different from the typical spike trimers in shape. a d analysis of ' picked negatively stained particles revealed in class averages that the majority of the particles were smaller in size and asymmetrical, compared to the non-incubated spike trimeric samples (suppl. fig. s b,d) . this suggests that the prolonged incubation with hace led to spike trimer dissociation. a similar observation for sars- cov-ace complex was recently reported by song et al (song et al., ) . due to particle heterogeneity, we decided to further purify the complex by size exclusion chromatography (sec) and indeed the sec profile showed three distinct peaks, called peak , peak and peak (suppl. fig. s c,d) . by analyzing the peaks by sds gel and negative stain em, we could clearly differentiate non-structured aggregates of full-length spike and hace in peak , that did not allow further structural analysis (data not shown), to the excess of unbound hace in peak (suppl. fig. s c,d) . the homogenous peak that contained full-length spike in complex with hace , was further analyzed by cryo-em. hace binding can induce disassembly of spike homotrimer peak (spike:hace at molar ratio : after overnight incubation at °c) was vitrified and frozen grids were loaded into a thermo fisher scientific (tfs) titan krios cryo-em instrument, operated at kv acceleration voltage, and equipped with a gatan quantum-ls energy filter equipped with k direct electron detector (suppl. fig. s ). ' dose-fractionated images (i.e., movies) were recorded (suppl. fig. s ), from which ~ . million particles were extracted and subjected to image processing and d reconstruction. the final d reconstruction from ' particles at . Å overall resolution showed a density map corresponding to a single, monomeric spike protein in complex with hace (fig. a) . 
the map allowed docking with available structures for s and hace taken from the previously reported structures (spike pdb id vyb and spike rbd-ace m j), revealing a structural rearrangement of the c-terminal domain (ctd) and n-terminal domain (ntd) of s compared to a monomer from that spike structure in the rbd up conformation. the interaction between the s rbd and hace is in agreement with several other reported structures of the rbd-ace complex (pdb id m j, vw or lzg). no additional density for s or a fragment of s was detected in the reconstruction. we tested a shorter incubation time by mixing spike:hace (molar ratio of : ) and letting it incubate for hours at ºc, instead of overnight. no further sec purification was performed. subsequently, cryo-em grids of this sample were prepared and subjected to cryo-em analysis (suppl. fig. s ). from ' recorded movies, ' particles were extracted and subjected to classification and d analysis. this revealed a small sub-set of ' particles corresponding to the prefusion spike trimer, which allowed a d reconstruction at . Å overall resolution (no symmetry was applied), while some regions of the d map showed lower resolution, presumably due to increased flexibility of these areas (suppl. fig. s ). a resolution-limited map at Å resolution (fig. b) allowed clear docking of the models of spike s and s and hace , which showed that the complex is composed of spike and hace in a molar ratio of : (spike:hace ). three hace molecules were observed to attach to the rbds of spike. all three rbds were in the rbd up conformation and slightly shifted away from the central trimer axis (fig. b). a similar arrangement was also recently observed by s -hace model to that of the trimeric spike-hace ( : ) showed that a ~ º rotation of the c-terminal and n-terminal sub-domains of spike s was required to bring the spike s protein into the monomeric arrangement with hace .
after such re-arrangement, the domains of hace and the rbds of the s protein are in good agreement with a reported crystal structure (lan et al.) (fig. b). when comparing the docked model of the monomeric s -hace complex with that of the trimeric spike-hace complex (fig. e), a considerable number of steric clashes at the interface between spike s (ctd) and its neighboring region from the s polypeptide chain were obvious. the docked monomeric s -hace complex is therefore structurally incompatible with the observed trimeric arrangement. the stoichiometric ratio of the spike:hace complex on the host cell upon virus entry is not well established. nevertheless, one hace molecule per spike trimer is likely sufficient for binding and initializing virus fusion with the host cell (song et al., ). even though the structure of a post-fusion s trimer has recently been determined (cai et al., ), it is not clear how membrane fusion during virus entry is coordinated upon release of the s -hace caps (fig. ). we here report the cryo-em structure of a stable monomeric s spike-hace ( : ) complex. even though it would have been large enough to detect, our particle classification did not reveal any particle class corresponding to an isolated s fragment in addition to the observed s spike-hace ( : ) particles (suppl. fig. s c, s ). knowing how the s fragment might be detached from the hace receptors after virus entry would be relevant for understanding the mechanism of infection and pathogenicity. the fact that we did not observe any free s fragments suggests that the s -hace complex is rather stable, at least under our in vitro conditions. secondly, the s domain was not detected in the obtained structure of the spike-hace monomeric complex ( : ), even though the sds-page analysis showed that s was present as full length in the sample (suppl. fig. s d).
the s domain is expected to be connected to the s domain via a short loop between s and s , where a furin protease cleavage site is expected (belouzard et al., ; haan et al., ; hoffmann et al., ). however, in the absence of stable trimers, the loop between s and s is likely very flexible, possibly making the s domain undetectable by cryo-em maps. our cryo-em analysis that didn't show the s domain in the averaged d reconstruction therefore likely failed to align the s domains either due to their flexibility, or due to a denaturation of s during sample preparation. an early study presented a potential dose-dependent inhibition of sars-cov- infection by a recombinant soluble form of hace (monteil et al., ) . the mechanism, how the soluble forms of hace would be able to neutralize the virus, is not known. one possible mechanism could be a direct competition between the soluble hace and the host cell hace receptor, so that spike proteins saturated with soluble hace domains render them unable to interact with host cell hace . here, however, we report that the soluble forms of hace induce the opening and disassembly of the trimeric spike structure to create the stable spike s -hace complex (fig. ) . we propose a mechanism by which the formation of the spike-hace ( : ) complex induces a high structural flexibility in the spike trimer, allowing a conformational re-arrangement of the s c-and n- terminal domains when interacting with hace . in consequence, the new s -hace complex is incompatible with a trimeric arrangement, causing the dissociation of the trimeric complex (fig. ) . this hypothesis is supported by the recent manuscript deposited in biorxiv.org, which describes a similar effect triggered by engineered darpin molecules (walser et al., ) . therefore, we suppose that the soluble forms of hace may not only block the infection and replication of sars-cov- , but also destroy the trimeric spike adaptors that are responsible for viral host membrane fusion. 
this mechanism suggests a novel therapeutic strategy for the treatment of covid- , by adding soluble hace to dissociate the spike trimer of approaching viruses. protein production and purification. two different preparations of cryo-em grids were made. purified spike and hace proteins were mixed at the molar ratio of : and incubated for hours at ºc. after that, the sample was subjected to size exclusion chromatography (sec) with a superose increase ( / ) column, and the fractions from peak were pooled and concentrated in kda centrifugal concentrators (millipore). alternatively, purified spike and hace were mixed at the molar ratio of : (spike:hace ) and incubated for hours at ºc, without further purification via sec. for both samples, the concentration was adjusted to . mg/ml. cryo-em grids were prepared with a vitrobot mark iv (thermo fisher), using a temperature of °c and % humidity. μl of sample was applied onto for both samples, dose-fractionated images (i.e., movies) were recorded with a titan krios (thermo fisher), operated at kv, and equipped with a gatan quantum-ls energy filter ( ev zero-loss energy filtration) followed by a gatan k summit direct electron detector. the data collection statistics are presented in table . for the trimeric spike-hace complex, image processing was performed similarly. the final reconstruction produced a d map at . Å overall resolution. protein models were generated from reported structures (spike: pdb id vyb; ace -rbd: pdb id m j) (lan et al., b; walls et al., ). for the s -hace structure, the model was manually docked into the em density with the program chimera (pettersen et al., ) and further refined using rigid-body fitting in coot (emsley et al., ). for the spike-hace trimer, the density corresponding to hace was relatively weak, so low-pass filtering to Å resolution was applied to the map before proceeding with the docking of hace as described above.
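the low-pass filtering step mentioned above can be illustrated in one dimension. the python sketch below is an assumption-laden illustration, not the procedure used in the study: it applies a hard fourier cutoff to a 1d density profile, whereas cryo-em packages filter 3d maps and typically use a soft edge. the function names, the voxel size and the cutoff resolution are hypothetical.

```python
import cmath


def cutoff_shell(box_size, voxel_size_a, resolution_a):
    """Highest Fourier index kept by the filter.

    Index k corresponds to spatial frequency k / (box_size * voxel_size_a),
    so the cutoff 1 / resolution_a maps to k = box_size * voxel_size_a / resolution_a.
    """
    return int(box_size * voxel_size_a / resolution_a)


def lowpass_1d(signal, voxel_size_a, resolution_a):
    """Hard-edged low-pass filter of a 1D profile via a naive DFT."""
    n = len(signal)
    k_max = cutoff_shell(n, voxel_size_a, resolution_a)
    # Forward DFT.
    spectrum = [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]
    # Zero every component whose frequency index (folded about Nyquist) exceeds k_max.
    for k in range(n):
        if min(k, n - k) > k_max:
            spectrum[k] = 0
    # Inverse DFT; the input is real, so only the real part is kept.
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n).real
        for x in range(n)
    ]


# Hypothetical profile: a constant density plus a component at the Nyquist frequency.
profile = [1 + (-1) ** x for x in range(16)]
filtered = lowpass_1d(profile, voxel_size_a=1.0, resolution_a=4.0)
```

removing the high-frequency terms suppresses noise that dominates weak density, at the cost of fine detail, which is the usual trade-off when a weakly resolved component has to be docked as a rigid body.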
figure cryo-em maps of sars-cov- spike-hace complexes and fitted models. a. the d reconstruction of sars-cov- spike and human ace (mixed at a molar ratio : ) incubated for hrs and further purified by sec shows a structure corresponding to monomeric spike s protein in complex with hace . no density for s is observed. the n-and c-terminal domains of s had to be re-arranged to fit into the map. left: side-view, center: º rotated view. the structure is colored as follows: hace ectodomain green, spike s -rbd yellow, spike s -ctd slate blue, spike s -ntd salmon. right: the bottom image shows a representative d class average of the s -hace complex. the upper cartoon is its interpretation. b. d reconstruction of spike and hace (ratio : ) incubated for hours shows a trimeric map allowing docking of three hace molecules and three s and three s molecules, all forming a trimeric complex. superimposed. b. structural comparison of s -hace regions. the movement is presented as two colors: sars-cov trimeric spike-hace complex (green) and s -hace (blue) c,d. the close-up views show the proposed local structural rearrangements. e. superposition of s -hace ( : ) with the trimeric form of spike-hace complex. the rbds have been superimposed. the rearranged structure of spike-hace is no longer compatible with formation of trimers so that it dissociates. the right panel is a predicted structure of the dissociated spike-hace monomer. spike. in this model, one hace molecule binds to one spike s monomer and induces the conformational changes in the trimeric spike. subsequently, a post-fusion s trimer is formed. the lower row shows a novel proposed pathway leading to spike trimer disassembly by hace . in presence of a high concentration of hace molecules, a spike-hace ( : ) complex is formed. structural clashes between the three spike-hace elements lead to their dissociation. this induces the formation of monomeric spike-hace complexes. average classes. d. 
model fitted into the s trimeric core (the map was low-pass filtered). e. distribution of particle orientations. f. local resolution level at the best resolved regions of the trimeric form of spike-hace complex (bottom view, monores). g. the low-pass filtered em map at Å resolution (for model generation). scale bar in a is nm and in c is nm. figure s processing workflows for the trimeric spike-hace complex. using prediction of specimen position a human coronavirus responsible for the common cold massively kills dendritic cells but not monocytes inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace coronaviruses. viral infections of humans: epidemiology and control covid- diagnosis and management: a comprehensive review chimera--a visualization system for exploratory research and analysis cryosparc: algorithms for rapid unsupervised cryo-em structure determination structural basis of receptor recognition by sars-cov- pathogenesis and transmission of sars-cov- in golden hamsters cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace the hallmarks of covid- disease efficient d-ctf correction for cryo-electron tomography using novactf improves subtomogram averaging resolution to . a structure, function, and antigenicity of the sars-cov- spike glycoprotein key: cord- - nnjex y authors: ramachandran, ashwin; huyke, diego a.; sharma, eesha; sahoo, malaya k.; banaei, niaz; pinsky, benjamin a.; santiago, juan g. title: electric-field-driven microfluidics for rapid crispr-based diagnostics and its application to detection of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nnjex y the rapid spread of covid- across the world has revealed major gaps in our ability to respond to new virulent pathogens. rapid, accurate, and easily configurable molecular diagnostic tests are imperative to prevent global spread of new diseases. 
crispr-based diagnostic approaches are proving to be useful as field-deployable solutions. in a basic form of this assay, the crispr-cas enzyme complexes with a synthetic guide rna (grna). this complex is activated when it binds with high specificity to target dna, and the activated complex non-specifically cleaves single-stranded dna reporter probes labeled with a fluorophore-quencher pair. we recently discovered that electric field gradients can be used to control and accelerate this crispr assay by co-focusing cas -grna, reporters, and target. we achieve an appropriate electric field gradient using a selective ionic focusing technique known as isotachophoresis (itp) implemented on a microfluidic chip. unlike previous crispr diagnostic assays, we also use itp for automated purification of target rna from raw nasopharyngeal swab samples. we here combine this itp purification with loop-mediated isothermal amplification and the itp-enhanced crispr assay to achieve detection of sars-cov- rna (from raw sample to result) in min for both contrived and clinical nasopharyngeal swab samples. this electric field control enables a new modality for a suite of microfluidic crispr-based diagnostic assays. significance statement. rapid, early-stage screening is especially crucial during pandemics for early identification of infected patients and control of disease spread. crispr biology offers new methods for rapid and accurate pathogen detection. despite their versatility and specificity, existing crispr-diagnostic methods suffer from the requirements of up-front nucleic acid extraction, large reagent volumes, and several manual steps, factors which prolong the process and impede use in low-resource settings. we here combine on-chip electric-field control with crispr biology to directly address these limitations of current crispr-diagnostic methods. we apply our method to the rapid detection of sars-cov- rna in clinical samples.
our method takes min from raw sample to result, a significant improvement over existing diagnostic methods for covid- . we next developed a novel protocol for the detection of rt-lamp-amplified cdna of sars-cov- viral rna using itp-mediated crispr-cas dna detection. upon the lod of the itp-crispr method was found to be copies per microliter reaction, which is the same as the very recent crispr-based assay ( ). further, in the case of positive detection, a fluorescence signal above the threshold value was observed in < min (fig. s ) . these results are in contrast to the copy per microliter reaction lod for the -hour qpcr method (fig. e) ( ). lastly, we verified that microfluidic itp-crispr detection and the typical crispr-based ( ) approaches gave the same positive/negative result when tested with the same lamp pre-amplified dna (fig. s ) . itp enables rapid extraction of total nucleic acids from raw nasopharyngeal swab samples we also demonstrated on-chip itp extraction of total nucleic acids from raw clinical positive and negative nasopharyngeal swab samples (figs. b, a and d ). to validate our extraction method, we performed qpcr for the e gene and rnase p control (fig. e) . we observed that itp-extracted nucleic acids showed e gene amplification on positive qpcr-based detection approaches is provided in table . in summary, we developed an electrokinetic microfluidic method broadly applicable to crispr-based diagnostics. our method involves itp-based nucleic acid extraction from raw sample, isothermal reverse transcription and amplification, and then a novel crispr assay enhanced by itp with a total assay time of min (from raw sample to result). we table s . lamp primers (elim biosciences) were reconstituted in nuclease free water and grnas (idt) were reconstituted in rna reconstitution buffer. for itp co-focusing visualization experiments, the mtb target dna sequence was used (table s ). 
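the positive/negative call described earlier in this section, a fluorescence signal rising above a threshold value during the run, can be sketched as follows. this is an illustrative python sketch only: the function name, the sampling times, the intensities and the threshold are hypothetical, since the study's actual threshold definition is not given here.

```python
def time_to_threshold(times_s, intensities, threshold):
    """Return the first time at which the signal exceeds the threshold.

    Returns None if the signal never exceeds it (a negative call).
    """
    if len(times_s) != len(intensities):
        raise ValueError("times_s and intensities must have the same length")
    for t, f in zip(times_s, intensities):
        if f > threshold:
            return t
    return None


# Hypothetical ITP peak intensities sampled at fixed intervals (arbitrary units).
times = [0, 10, 20, 30, 40]
positive_run = [1.0, 1.2, 2.5, 4.0, 6.1]
negative_run = [1.0, 1.1, 1.0, 1.2, 1.1]

print(time_to_threshold(times, positive_run, threshold=2.0))
print(time_to_threshold(times, negative_run, threshold=2.0))
```

reporting the crossing time rather than only a yes/no result keeps the information needed to compare how quickly different samples turn positive.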
μm stock solution of mtb dsdna was prepared by pre-hybridizing complementary ssdna templates (elim biosciences) in a buffer containing mm tris- hcl, mm mgcl , and mm edta at o c. we designed a cy -labeled grna (idt, table s ) to target the mtb dsdna sequence. main channel length between the positive/negative electrodes is mm (fig. s ) . to avoid cross-contamination, ensure run-to-run repeatability, and provide uniform surface properties, the channels were rinsed in the following order before each itp experiment: % bleach for min, di water for min, % triton-x for min, di water for min, m naoh for min, and di water for min. between each rinse step, the channel was completely dried using vacuum. the buffer loading procedure and buffer placement in the channel sections are detailed in fig. s . x composition of lysis buffer included . % triton x, mg/ml of proteinase k, . mg/ml of carrier rna (thermo fisher). following incubation, μl of mm hepes buffer was added, and μl of this mixture was dispensed in the trailing electrolyte (te) reservoir on-chip (fig. s ) . the leading electrolyte (le) buffer in the main channel consisted of mm tris-hcl (ph . ), u/μl rnasin plus, . % triton x, % of . mda polyvinylpyrrolidone (pvp) and x sybr green i. sybr green i was used to visualize the itp peak which contained nucleic acids (fig. d) a x cas -grna complex mixture was prepared by pre-incubating μm of lbcas a (neb) with . μm grna in x nebuffer . at °c for min. lbcas -grna complexes were prepared independently for n, e, and rnase p genes. for itp co-focusing visualization experiment in fig. c, a x cas -grna complex was prepared using μm of lbcas a (neb) and . μm of cy -labeled grna. here, a molar excess of lbcas a was used to minimize free, unbound grna. lbcas -grna complex was combined with μl of pre-prepared mtb dsdna template and μl le buffer. the on-chip buffer loading procedure is described in fig. s . 
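the complex preparation above amounts to diluting enzyme and grna stocks to target concentrations in a fixed volume, with the enzyme in molar excess over the grna. a minimal python sketch of that arithmetic (c1·v1 = c2·v2) follows. all names, concentrations and volumes in it are hypothetical placeholders, not the values used in the study.

```python
def stock_volume_ul(stock_um, final_um, final_volume_ul):
    """Volume of stock (in µL) that gives final_um in final_volume_ul (C1*V1 = C2*V2)."""
    if stock_um <= 0 or final_volume_ul <= 0 or final_um < 0:
        raise ValueError("concentrations and volumes must be positive")
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_um * final_volume_ul / stock_um


def plan_complex(enzyme_stock_um, grna_stock_um, enzyme_final_um, grna_final_um, final_volume_ul):
    """Pipetting plan for an enzyme-gRNA mix, plus the enzyme:gRNA molar ratio."""
    if grna_final_um <= 0:
        raise ValueError("gRNA final concentration must be positive")
    v_enzyme = stock_volume_ul(enzyme_stock_um, enzyme_final_um, final_volume_ul)
    v_grna = stock_volume_ul(grna_stock_um, grna_final_um, final_volume_ul)
    v_buffer = final_volume_ul - v_enzyme - v_grna
    if v_buffer < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    return {
        "enzyme_ul": v_enzyme,
        "grna_ul": v_grna,
        "buffer_ul": v_buffer,
        "enzyme_to_grna": enzyme_final_um / grna_final_um,
    }


# Hypothetical stocks and targets; a ratio above 1 means the enzyme is in molar excess.
plan = plan_complex(
    enzyme_stock_um=100.0, grna_stock_um=10.0,
    enzyme_final_um=1.0, grna_final_um=0.5,
    final_volume_ul=20.0,
)
print(plan)
```

checking the enzyme:grna ratio explicitly is a simple way to confirm the molar excess that keeps free, unbound grna to a minimum.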
the itp-crispr detection experiments were performed at constant current of μa supplied by a keithley sourcemeter (fig. s ) . fluorescence images of the moving itp peak were acquired in s intervals using a cmos camera (hamamatsu orca- flash . ) mounted on an inverted epifluorescence microscope (nikon eclipse te ). for widefield images of itp peak in fig. b, the rt-qpcr assay was performed using the abi fast dx (applied biosystems) instrument. we performed assays for the e and rnase p genes separately in µl reaction volumes using the luna universal probe one-step rt-qpcr kit (new england deviation. covid -p was below the lod of our assay as confirmed by qpcr (fig. e) . is loaded in reservoir , μl of le is loaded in reservoir , and μl te combined with ssdna reporters is loaded in reservoir . vacuum is applied at reservoir briefly till the channels are filled as depicted in the schematic. then, reservoirs , , and are emptied and loaded with μl of le, and reservoir is emptied and loaded with μl te. a constant current of μa is applied for min and fluorescence intensity of the itp peak is recorded using a cmos camera every s. ( ) for an integration of at least one itp assay into a portable device. we propose here the concept that such a system can integrate itp-based nucleic acid extraction, multiplexed isothermal amplification of target cdna of n and e genes of sars-cov- and rnase p control, followed by itp-crispr-based cdna detection in three separate channels using photodiodes. table s . list of grnas, lamp primers, rt-qpcr primers, template and reporter sequences. mtb sequences were used for itp co-focusing experiments of figure c detection of novel coronavirus ( -ncov) by real- time rt-pcr crispr-cas -based detection of sars-cov- . nat point-of-care testing for covid- using sherlock diagnostics. 
medrxiv isotachophoresis applied to biomolecular reactions massively multiplexed nucleic acid detection using cas rapid detection of urinary tract infections using isotachophoresis and molecular beacons purification of nucleic acids from whole purification of nucleic acids using isotachophoresis rapid hybridization of nucleic acids using isotachophoresis crispr-cas a target binding unleashes indiscriminate single- stranded dnase activity open source simulation tool for electrophoretic stacking, focusing, and separation the crystal structure of cpf in complex with crispr rna fluorescent carrier ampholytes assay for portable, label-free detection of chemical toxins in tap water key: cord- -sgm q i authors: walter, justin d.; hutter, cedric a.j.; zimmermann, iwan; wyss, marianne; earp, jennifer; egloff, pascal; sorgenfrei, michèle; hürlimann, lea m.; gonda, imre; meier, gianmarco; remm, sille; thavarasah, sujani; plattet, philippe; seeger, markus a. title: sybodies targeting the sars-cov- receptor-binding domain date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sgm q i the covid- pandemic, caused by the novel coronavirus sars-cov- , has resulted in a global health and economic crisis of unprecedented scale. the high transmissibility of sars-cov- , combined with a lack of population immunity and prevalence of severe clinical outcomes, urges the rapid development of effective therapeutic countermeasures. here, we report the generation of synthetic nanobodies, known as sybodies, against the receptor-binding domain (rbd) of sars-cov- . in an expeditious process taking only twelve working days, sybodies were selected entirely in vitro from three large combinatorial libraries, using ribosome and phage display. we obtained six strongly enriched sybody pools against the isolated rbd and identified unique anti-rbd sybodies which also interact in the context of the full-length sars-cov- spike ectodomain. 
among the selected sybodies, six were found to bind to the viral spike with double-digit nanomolar affinity, and five of these also showed substantial inhibition of rbd interaction with human angiotensin-converting enzyme (ace ). additionally, we identified a pair of anti-rbd sybodies that can simultaneously bind to the rbd. it is anticipated that compact binders such as these sybodies could feasibly be developed into an inhalable drug that can be used as a convenient prophylaxis against covid- . moreover, generation of polyvalent antivirals, via fusion of anti-rbd sybodies to additional small binders recognizing secondary epitopes, could enhance the therapeutic potential and guard against escape mutants. we present full sequence information and detailed protocols for the identified sybodies, as a freely accessible resource. the ongoing pandemic arising from the emergence of the novel coronavirus, sars-cov- , demands urgent development of effective antiviral therapeutics. several factors contribute to the adverse nature of sars-cov- from a global health perspective, including the absence of herd immunity [ ] , high transmissibility [ , ] , the prospect of asymptomatic carriers [ ] , and a high rate of clinically severe outcomes [ ] . moreover, a vaccine against sars-cov- is unlikely to be available for at least - months [ ] , despite earnest development efforts [ , ] , making alternative intervention strategies paramount. in addition to offering relief for patients suffering from the resulting covid- disease, therapeutics may also reduce the viral transmission rate by being administered to asymptomatic individuals subsequent to probable exposure [ ] . finally, given that sars-cov- represents the third global coronavirus outbreak in the past years [ , ] , development of rapid therapeutic strategies during the current crises could offer greater preparedness for future pandemics. 
akin to all coronaviruses, the viral envelope of sars-cov- harbors protruding, club-like, multidomain spike proteins that provide the machinery enabling entry into human cells [ ] [ ] [ ] . the spike ectodomain is segregated into two regions, termed s and s . the outer s subunit of sars-cov- is responsible for host recognition via interaction between its c-terminal receptor-binding domain (rbd) and human angiotensin converting enzyme (ace ), present on the exterior surface of airway cells [ , ] . while there is no known host-recognition role for the s n-terminal domain (ntd) of sars-cov- , it is notable that s ntds of other coronaviruses have been shown to bind host surface glycans [ , ] . in contrast to spike region s , the s subunit contains the membrane fusion apparatus, and also mediates trimerization of the ectodomain [ ] [ ] [ ] . prior to host recognition, spike proteins exist in a metastable pre-fusion state wherein the s subunits lay atop the s region and the rbd oscillates between "up" and "down" conformations that are, respectively, accessible and inaccessible to receptor binding [ , , ] . upon processing at the s /s and s ' cleavage sites by host proteases as well as engagement to the receptor, the s subunit undergoes dramatic conformational changes from the pre-fusion to the post-fusion state. such structural rearrangements are associated with the merging of the viral envelope with host membranes, thereby allowing injection of the genetic information into the cytoplasm of the host cell [ , ] . coronavirus spike proteins are highly immunogenic [ ] , and several experimental approaches have sought to target this molecular feature for the purpose of viral neutralization [ ] . 
the high specificity, potency, and modular nature of antibody-based antiviral therapeutics have shown exceptional promise [ ] [ ] [ ] , and the isolated, purified rbd has been a popular target for the development of anti-spike antibodies against pathogenic coronaviruses [ ] [ ] [ ] [ ] . however, binders against the isolated rbd may not effectively engage the aforementioned pre-fusion conformation of the full spike, which could account for the poor neutralization ability of recently described single-domain antibodies that were raised against the rbd of sars-cov- [ ] . therefore, to better identify molecules with qualities befitting a drug-like candidate, it would be advantageous to validate rbd-specific binders in the context of the full, stabilized, pre-fusion spike assembly [ , ] . single domain antibodies based on the variable vhh domain of heavy-chain-only antibodies of camelids, generally known as nanobodies, have emerged as a broadly utilized and highly successful antibody fragment format [ ] . nanobodies are small ( ) ( ) ( ) ( ) , stable, and inexpensive to produce in bacteria and yeast [ ] , yet they bind targets in an affinity range similar to that of conventional antibodies. due to their minimal size, they are particularly suited to reach hidden epitopes such as crevices of target proteins [ ] . we recently designed three libraries of synthetic nanobodies, termed sybodies, based on elucidated structures of nanobody-target complexes (fig. a) [ , ] . sybodies can be selected against any target protein within twelve working days, which is considerably faster than the generation of natural nanobodies, which requires repeated immunization over a period of two months prior to binder selection by phage display [ ] (fig. c) . a considerable advantage of our platform is that sybody selections are carried out under defined conditions; in the case of coronavirus spike proteins, this offers the opportunity to generate binders recognizing the metastable pre-fusion conformation [ , ] .
finally, due to the feasibility of inhaled therapeutic nanobody formulations [ ], virus-neutralizing sybodies could offer a convenient and direct means of prophylaxis. here, we report the in vitro selection and characterization of sybodies against the rbd of sars-cov- spike protein. two independently prepared rbd constructs were used for in vitro sybody selections, and resulting single clones that could bind the full spike ectodomain were sequenced, expressed, and purified. six unique sybodies show favorable binding affinity to the sars-cov- spike, and five of these were also found to substantially attenuate the interaction between the viral rbd and human ace . moreover, pairs of sybodies were identified that can simultaneously bind to the rbd. we present all sequences for these clones, along with detailed protocols to enable the community to freely produce and further characterize these sars-cov- binders. based on sequence alignments with isolated rbd variants from sars-cov- that were amenable to purification and crystallization [ , ] , a sars-cov- rbd construct was designed, consisting of residues pro -gly fused to venus yfp (rbd-vyfp). this construct was expressed and secreted from expi cells, and rbd-vyfp was extracted directly from culture medium supernatant using an immobilized anti-gfp nanobody [ ], affording a highly purified product with negligible background contamination. initial efforts to cleave the c-terminal vyfp fusion partner with c protease resulted in unstable rbd, so experiments were continued with the full rbd-vyfp fusion protein. to account for the presence of the vyfp fusion partner, a second rbd construct, consisting of a fusion to a murine igg fc domain (rbd-fc), was commercially acquired. to remove any trace amines, buffers were exchanged to pbs via extensive dialysis. proteins were chemically biotinylated, and the degree of biotinylation was assessed by a streptavidin gel-shift assay and found to be greater than % of the target proteins [ ] .
we note that while both rbd fusion proteins were well-behaved, a commercially acquired purified full-length sars-cov- spike ectodomain construct (ecd) was found to be aggregation-prone. very recently, we also produced an engineered spike protein ectodomain containing two point mutations known to stabilize the pre-fusion state, an inactivated furin cleavage site, and a c-terminal trimerization motif [ , , ] . while this purified pre-fusion spike (pfs) was not yet available for binder selections and characterization by grating-coupled interferometry, it was used to conduct elisas in order to identify selected sybodies which recognize the rbd in the pre-fusion context (see below). since both our rbd constructs bear additional fusion domains (fc of mouse igg and vyfp, respectively), sybody selections were carried out with a "target swap" approach (fig. b) . hence, selections with the three sybody libraries (concave, loop and convex) were started with the rbd-vyfp construct using ribosome display, and the rbd-fc construct was then used for the two phage display rounds (selection variant : rbd-vyfp/rbd-fc/rbd-fc) and vice versa (selection variant : rbd-fc/rbd-vyfp/rbd-vyfp). accordingly, there were a total of six selection reactions (table , fig. b) . to increase the average affinity of the isolated sybodies, we included an off-rate selection step using the pre-enriched purified sybody pool after phage display round as competitor. to this end, sybody pools of all three libraries of the same selection variant were sub-cloned from the phage display vectors into the sybody expression vector psb_init. subsequently, the two separate pools (all sybodies of selection variants and , respectively) were expressed and purified. the purified pools were then added to the panning reactions of the respective selection variant in the second phage display round. thereby, rebinding of sybody-phage complexes with fast off-rates was suppressed.
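the "target swap" layout described above can be sketched as a simple enumeration. this is only an illustration: the library names and the two rbd constructs come from the text, whereas the variant labels "a" and "b" are placeholders (the original variant numbers are not legible in this copy) and the function name is hypothetical.

```python
from itertools import product

# library names and rbd constructs are taken from the text;
# the variant labels "a"/"b" are placeholders, not the authors' numbering.
LIBRARIES = ("concave", "loop", "convex")
VARIANTS = {
    # label: (ribosome display target, phage display target)
    "a": ("rbd-vyfp", "rbd-fc"),
    "b": ("rbd-fc", "rbd-vyfp"),
}


def selection_reactions():
    """Enumerate every library x variant combination with its per-round targets."""
    reactions = []
    for library, (label, (first, second)) in product(LIBRARIES, VARIANTS.items()):
        reactions.append({
            "library": library,
            "variant": label,
            # one ribosome display round, then two phage display rounds
            "targets": (first, second, second),
        })
    return reactions


if __name__ == "__main__":
    for reaction in selection_reactions():
        print(reaction["library"], reaction["variant"], "/".join(reaction["targets"]))
```

switching the target after the first round means that a binder directed against either fusion partner (vyfp or fc) meets its epitope in only one display format, so it is not expected to be carried through all three rounds.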
enrichment of sybodies against the rbd was monitored by qpcr. already in the first phage display round, the concave and loop sybodies of selection variant showed enrichment factors of and , respectively (table ) . after the second phage display round (which included the off-rate selection step), strong enrichment factors in the range of - were determined. after sub-cloning the pools from the phage display vector pdx_init into the sybody expression vector psb_init, clones of each of the selection reactions (table , fig. b) were picked at random and expressed in small scale. our standard elisa was initially performed using rbd-vyfp (rbd), spike ectodomain containing s and s (ecd), and maltose binding protein (mbp) as unrelated dummy protein. as outlined in the materials and methods section, elisa analysis revealed very high hit rates for the rbd and the ecd ranging from % to % and % to %, respectively (fig. , table ). the majority of the sybodies giving an elisa signal to the rbd also gave a clear signal for the full-length spike protein (fig. ). however, there was a total of hits that only gave an elisa signal for rbd-vyfp, but not for the ecd. this could be due to the presence of cryptic rbd epitopes that are not accessible in the context of the full-length spike protein, or the respective sybodies may recognize the vyfp portion of the rbd-vyfp construct, though the selection procedure clearly disfavors the latter explanation. importantly, background binding to the dummy protein mbp was not observed for any of the analyzed sybodies, clearly showing that the binders are highly specific. we then sequenced sybodies that were elisa-positive against rbd-vyfp as well as the full-length spike ( for each of the selection reactions numbered from sb# - , see also fig. b ). subsequent to sybody sequencing, we also performed the elisa using engineered pre-fusion-stabilized spike ectodomain (pfs) (fig. ) , which was not available at the onset of the project.
overall, the elisa signals for the ecd and pfs are highly similar. however, there are around sybodies that bind to the ecd clearly stronger than to the pfs (yet the opposite scenario was never observed). this could be explained by the fact that the pfs forms a trimer, while the oligomeric state of the ecd is not clear. in addition, the ecd might adopt partially or completely a post-fusion state, whereas pfs is expected to predominantly adopt the pre-fusion state. trimer formation as well as pre-fusion stabilization might shield certain binding epitopes on the rbd in the context of the pfs, which might become accessible as the spike falls apart into monomers and/or transits to the post-fusion state. in light of our elisa data, the pfs construct will be a crucial element in any future sybody selection campaigns. sequencing results of out of sybody clones were unambiguous. out of these clones, were found to be unique and the respective clone names are indicated in the elisa figure (fig. , table ). of note, there were no duplicate binders identified in both selection variants, indicating that the two separate selection streams gave rise to completely different arrays of sybodies. as an additional note, one sybody identified from the supposed convex library turned out to belong to the concave library; spill-over of sybodies across libraries is occasionally observed. hence, there was a total of concave, loop and convex sybodies, which were then aligned according to their library origin . as a final analysis, all sybody sequences were aligned to generate a phylogenetic tree, which shows a clear segregation across the three libraries and indicates a large sequence variability of the identified sybodies ( fig. ). the unique sybodies were individually expressed in e. coli and purified via ni-nta affinity chromatography and gel filtration. 
ultimately, sybodies were sufficiently well-behaved, with respect to solubility, yield, and monodispersity, to proceed with further characterization. for a kinetic analysis of sybody interactions with the viral spike, we employed grating-coupled interferometry (gci) to probe sybody binding to immobilized rbd-vyfp or ecd. first, the purified sybodies were subjected to an off-rate screen, which revealed six sybodies (sb# , sb# , sb# , sb# , sb# , and sb# ) with strong binding signals and comparatively slow off-rates. binding constants were then determined by measuring on- and off-rates over a range of sybody concentrations, revealing affinities within a range of - nm to the sars-cov- spike (fig. , table ). of note, binding affinities were consistently equal or higher for the ecd as compared to the rbd-vyfp, in particular in the case of sb# for which the off-rate differs by more than two-fold. this might indicate a binding avidity effect arising from binding epitopes clustering in the context of the spike trimer or differences with regard to the glycan structures (rbd-vyfp was produced in hek cells, whereas the ecd was produced in insect cells). to our surprise, the majority of purified and elisa-positive sybodies ( out of ) displayed binding affinities worse than nm. this may be attributed to the presence of complex heterogeneous asn-linked glycans within the rbd, which could hinder the isolation of specific high-affinity binders. alternatively, given that the final elisa step of the selection process resulted in a substantial number of positive clones, insufficiently stringent conditions may have favored the high positive hit rate of low-affinity binders. since virulence of sars-cov- is dependent on the ability of the viral rbd to bind to human ace (hace ), we sought to determine which of the selected sybodies that were well-behaved upon purification could inhibit interaction between the isolated rbd and purified hace .
for this assessment, elisa plates were coated with purified hace , and the binding of purified rbd to the immobilized hace was measured in the presence or absence of an excess of each purified sybody (fig. ). while the absence of any added sybody resulted in a strong elisa signal corresponding to rbd association with hace , the pre-incubation of nearly all sybodies with the rbd resulted in an attenuated signal, implying that these binders inhibit rbd-hace association. this signal decrease relative to unchallenged rbd was modest for most sybodies, with an average signal reduction of about %, but five sybodies demonstrated exceptionally high apparent inhibition of rbd-hace interaction (sb# , sb# , sb# , sb# , and sb# ), showing ≥ % signal reduction. notably, the aforementioned kinetic analysis had shown that these sybodies were also among the strongest rbd binders. taken together, this data suggests that sb# , sb# , sb# , sb# , and sb# recognize a surface region on the rbd that overlaps with the hace binding site. while kinetic analysis had revealed sb# to be among the stronger binders to the sars-cov- ectodomain (kd ≈ nm, fig. , table ), the hace competition elisa revealed that sb# does not inhibit hace -rbd interaction to the same extent as other sybodies with comparable affinities ( % inhibition for sb# , compared to > % for sb# , sb# , sb# , and sb# ). therefore, it was hypothesized that sb# may interact with a non-or partially-overlapping surface on the rbd, relative to the more strongly-inhibiting sybodies. using sb# as a representative of the hace -inhibiting sybodies, we analyzed the ability of sb# and sb# to simultaneously associate with the rbd. first, elisa experiments demonstrate that incubation of sb# with the pre-fusion spike only slightly prevents the spike from binding to immobilized sb# , whereas pre-incubation with sb# , sb# , sb# , sb# , or sb# completely prevents spike interaction with immobilized sb# (fig. ). 
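the signal reduction discussed above follows from normalizing each background-subtracted absorbance to the rbd-only signal. a minimal sketch of that arithmetic is given below; the function name and all absorbance values are hypothetical and serve only to illustrate the calculation, they are not data from this study.

```python
def percent_inhibition(a_sybody, a_rbd_only, a_background):
    """Signal reduction relative to unchallenged rbd, in percent.

    a_sybody     absorbance of the well containing rbd plus sybody
    a_rbd_only   absorbance of the well containing rbd without sybody
    a_background absorbance of a well without rbd (blank)
    """
    reference = a_rbd_only - a_background
    if reference <= 0:
        raise ValueError("rbd-only signal must exceed the background")
    # background-subtracted signal, normalized to the rbd-only signal
    normalized = (a_sybody - a_background) / reference
    return 100.0 * (1.0 - normalized)


if __name__ == "__main__":
    # hypothetical absorbance readings, for illustration only
    print(round(percent_inhibition(0.25, 1.05, 0.05), 1))
```

a value near zero means the sybody leaves rbd binding to the immobilized receptor unchanged, while a value near 100 means the rbd signal has dropped to background.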
in agreement with the elisa data, gci experiments revealed that co-injection of sb# and sb# results in a clear (but not fully additive) increase of the response signal, relative to sb# or sb# injected alone, implying simultaneous binding of sb# and sb# (fig. ). the control gci experiment involving the co-injection of sb# and sb# did not result in a similar signal increase ( fig. ). in sum, this data plausibly suggests that sb# and sb# can simultaneously bind to the rbd. for the design of therapeutics against sars-cov- , the fusion of such a pair of non-overlapping binders could provide benefits via increased overall avidity to the spike protein. we have demonstrated the ability of our rapid in vitro selection platform to generate sybodies against the sars-cov- rbd, within a two-week timeframe. characterization of these sybodies has identified a high-affinity subset of binders that also inhibit the rbd-ace interaction. we anticipate that the presented panel of anti-rbd sybodies could be of use in the design of urgently required therapeutics to mitigate the covid- pandemic, particularly in the development of inhalable prophylactic formulations [ ] . furthermore, our identification of a pair of sybodies that can simultaneously associate with the rbd may offer an attractive foundation for the construction of a polyvalent sybodybased therapeutic. we have attempted to provide a complete account of the generation of these molecules, including full sequences and detailed methods, such that other researchers may contribute to their ongoing analysis. future work may include virus neutralization assays using the identified sybodies, as well as further selection campaigns targeting additional spike epitopes. finally, our recently described flycode technology could be utilized for deeper interrogation of selection pools, in order to facilitate discovery of exceptional sybodies that possess very slow off-rates or recognize rare epitopes [ ] . 
a gene encoding sars-cov- residues pro -gly (rbd, genbank accession qhd . ), downstream from a modified n-terminal human serum albumin secretion signal [ ] , was chemically synthesized (geneuniversal). this gene was subcloned using fx technology [ ] into a custom mammalian expression vector [ ] , appending a c-terminal c protease cleavage site, myc tag, venus yfp [ ] , and streptavidin-binding peptide [ ] onto the open reading frame (rbd-vyfp). - ml of suspension-adapted expi cells (thermo) were transiently transfected using expifectamine according to the manufacturer's protocol (thermo), and expression was continued for - days in a humidified environment at °c, % co . cells were pelleted ( g, min), and culture supernatant was filtered ( . µm mesh size) before being passed three times over a gravity column containing nhs-agarose beads covalently coupled to the anti-gfp nanobody k k [ ], at a resin:culture ratio of ml resin per ml expression culture. resin was washed with column-volumes of rbd buffer (phosphate-buffered saline, ph . , supplemented with additional . m nacl), and rbd-vyfp was eluted with . m glycine, ph . , via sequential . ml fractions, without prolonged incubation of resin with the acidic elution buffer. fractionation tubes were pre-filled with / vol m tris, ph . ( µl), such that elution fractions were immediately ph-neutralized. fractions containing rbd-vyfp were pooled, concentrated, and stored at °c. purity was estimated to be > %, based on sds-page (not shown). yield of rbd-vyfp was approximately - μg per ml expression culture. a second purified rbd construct, consisting of sars-cov- residues arg -phe fused to a murine igg fc domain (rbd-fc) expressed in hek cells, was purchased from sino biological (catalogue number: -v h, µg were ordered).
purified full-length spike ectodomain (ecd) comprising s and s (residues val -pro ) with a c-terminal his-tag and expressed in baculovirus-insect cells was purchased from sino biological (catalogue number: -v b , µg were ordered). the prefusion ectodomain of the sars-cov spike protein (residues - ) [ ] , was transiently transfected into x suspension-adapted expicho cells (thermo fisher) using mg plasmid dna and mg of pei max (polysciences) per l procho medium (lonza) in a l erlenmeyer flask (corning) in an incubator shaker (kühner). one hour post-transfection, dimethyl sulfoxide (dmso; applichem) was added to % (v/v). incubation with agitation was continued at °c for days. l of filtered ( . um) cell culture supernatant was clarified. then, a ml gravity flow strep-tactin®xt superflow® column (iba lifescience) was rinsed with ml buffer w ( mm tris, ph . , mm nacl, mm edta) using gravity flow. the supernatant was added to the column, which was then rinsed with ml of buffer w (all with gravity flow). finally, six elution steps were performed by adding each time . ml of buffer bxt ( mm biotin in buffer w) to the resin. all purification steps were performed at °c. to remove amines, all proteins were first extensively dialyzed against rbd buffer. proteins were concentrated to µm using amicon ultra concentrator units with a molecular weight cutoff of - kda. subsequently, the proteins were chemically biotinylated for min at °c using nhs-biotin (thermo fisher, # ) added at a -fold molar excess over target protein. immediately after, the three samples were dialyzed against tbs ph . . during these processes (first dialysis/ concentrating/ biotinylation/ second dialysis), %, %, % and % of the rbd-vyfp, rbd-fc, ecd and pfs respectively were lost due to sticking to the concentrator filter or due to aggregation. biotinylated rbd-vyfp, rbd-fc and ecd were diluted to µm in tbs ph . , % glycerol and stored in small aliquots at - °c. biotinylated pfs was stored at °c in tbs ph . . 
sybody selections with the three sybody libraries concave, loop and convex were carried out as described in detail before [ ] . in short, one round of ribosome display followed by two rounds of phage display were carried out. binders were selected against two different constructs of the sars-cov- rbd: an rbd-vyfp fusion and an rbd-fc fusion. mbp was used as background control to determine the enrichment score by qpcr [ ] . in order to avoid enrichment of binders against the fusion proteins (yfp and fc), we switched the two targets after ribosome display (fig. b) . for the off-rate selections we did not use non-biotinylated target proteins as described in the standard protocol, because we did not have enough purified protein at hand to do so. instead we sub-cloned all three libraries for both selections after the first round of phage display into the psb_init vector ( clones) and expressed the six pools in e. coli mc cells. then the pools corresponding to the same selection were pooled for purification. the two final pools were purified by ni-nta resin using gravity flow columns, followed by buffer exchange of the main peak fraction using a desalting pd column in tbs ph . to remove imidazole. the pools were eluted with . ml instead of . ml tbs ph . in order to ensure complete buffer exchange. these two purified pools were used for the off-rate selection in the second round of phage display at concentrations of approximately µm for selection variant (rbd-fc) and µm for selection variant (rbd-vyfp). the volume used for off-rate selection was µl. just before the pools were used for the off-rate selection, . % bsa and . % tween- were added to each sample. off-rate selections were performed for minutes. elisas were performed as described in detail before [ ] . single clones were analyzed for each library of each selection.
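as a rough illustration of how an enrichment score can be derived from qpcr, the sketch below compares the amount of phage recovered from the target with the amount recovered from the mbp background control. it assumes that the score is the ratio of the two quantities and that each qpcr cycle multiplies the template by a fixed efficiency (2 for ideal doubling); the exact calculation is given in the cited protocol, not in this text, and the function name and ct values are hypothetical.

```python
def enrichment_factor(ct_target, ct_background, efficiency=2.0):
    """Fold enrichment of phage eluted from the target over the mbp control.

    ct_target      threshold cycle of the eluate from the target selection
    ct_background  threshold cycle of the eluate from the mbp selection
    efficiency     assumed fold amplification per cycle (2.0 = ideal doubling)
    """
    if efficiency <= 1.0:
        raise ValueError("amplification efficiency must be greater than 1")
    # a lower ct means more template, so the exponent is background minus target
    return efficiency ** (ct_background - ct_target)


if __name__ == "__main__":
    # hypothetical ct values, for illustration only
    print(enrichment_factor(ct_target=20.0, ct_background=25.0))
```

under these assumptions, a factor of 1 indicates no enrichment over background, and every additional cycle of separation between the two eluates doubles the factor.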
since the rbd-fc construct was incompatible with our elisa format due to the inclusion of protein a to capture an α-myc antibody, elisa was performed only for the rbd-vyfp ( nm) and the ecd ( nm) and later on with the pfs ( nm). of note, the three targets were analyzed in three separate elisas. as negative control to assess background binding of sybodies, we used biotinylated mbp ( nm). positive elisa hits were sequenced (microsynth, switzerland). the unique sybodies were expressed and purified as described [ ] . in short, all sybodies were expressed overnight in e. coli mc cells in ml cultures. the next day the sybodies were extracted from the periplasm and purified by ni-nta affinity chromatography (batch binding) followed by size-exclusion chromatography (sec) using a sepax srt- c sec column equilibrated in tbs, ph . , containing . % (v/v) tween- (detergent was added for subsequent kinetic measurements). six out of the binders (sb# , sb# , sb# , sb# , sb# , sb# ) were excluded from further analysis due to suboptimal behavior during sec analysis (i.e. aggregation or excessive column matrix interaction). kinetic characterization of sybodies binding onto sars-cov- spike proteins was performed using gci on the wavesystem (creoptix ag, switzerland), a label-free biosensor. biotinylated rbd-vyfp and ecd were captured onto a streptavidin pcp-sta wavechip (polycarboxylate quasi-planar surface; creoptix ag) to a density of - pg/mm . sybodies were first analyzed by an off-rate screen performed at a concentration of nm (data not shown) to identify binders with sufficiently high affinities. the six sybodies sb# , sb# , sb# , sb# , sb# , and sb# were then injected at increasing concentrations ranging from . nm to μm (three-fold serial dilution, concentrations) in tbs buffer supplemented with . % tween- . sybodies were injected for s at a flow rate of μl/min per channel and dissociation was set to s to allow the return to baseline.
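for a three-fold dilution series injected with an association phase followed by a dissociation phase, a 1:1 langmuir binding model predicts the responses sketched below. this is a generic textbook formulation and not the fitting routine of the instrument software; the function names, rate constants, and concentrations are hypothetical, since the actual values are not legible in this copy.

```python
import math


def serial_dilution(top, factor, n):
    """Return n concentrations, starting at `top` and divided by `factor` each step."""
    return [top / factor ** i for i in range(n)]


def dissociation_constant(kon, koff):
    """Equilibrium dissociation constant kd = koff / kon (in M if kon is in 1/(M*s))."""
    return koff / kon


def association(t, conc, kon, koff, rmax):
    """Response at time t (s) after injection start for a 1:1 langmuir interaction."""
    kobs = kon * conc + koff                  # observed rate constant
    req = rmax * conc / (conc + koff / kon)   # steady-state response
    return req * (1.0 - math.exp(-kobs * t))


def dissociation(t, r0, koff):
    """Response at time t (s) after injection end, starting from response r0."""
    return r0 * math.exp(-koff * t)


if __name__ == "__main__":
    # hypothetical parameters, for illustration only
    kon, koff, rmax = 1e5, 1e-3, 100.0
    print("kd (M):", dissociation_constant(kon, koff))
    for conc in serial_dilution(9e-7, 3, 3):
        r_end = association(120.0, conc, kon, koff, rmax)
        print(conc, round(r_end, 2), round(dissociation(600.0, r_end, koff), 2))
```

in this model the dissociation phase depends only on koff, which is why a single-concentration off-rate screen can rank binders before a full concentration series is measured.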
sensorgrams were recorded at °c and the data analyzed on the wavecontrol (creoptix ag). data were double-referenced by subtracting the signals from blank injections and from the reference channel. a langmuir : model was used for data fitting. purified recombinant hace protein (mybiosource, cat# mbs ) was diluted to nm in phosphate-buffered saline (pbs), ph . , and μl aliquots were incubated overnight on nunc maxisorp -well elisa plates (thermofisher # - - ) at °c. elisa plates were washed three times with μl tbs containing . % (v/v) tween- (tbst). plates were blocked with μl of . % (w/v) bsa in tbs for h at room temperature. μl samples of biotinylated rbd-vyfp ( nm) mixed with individual purified sybodies ( nm) were prepared in tbs containing . % (w/v) bsa and . % (v/v) tween- (tbs-bsa-t) and incubated for . h at room temperature. these μl rbd-sybody mixtures were transferred to the plate and incubated for minutes at room temperature. μl of streptavidin-peroxidase (merck, cat#s ) diluted : in tbs-bsa-t was incubated on the plate for h. finally, to detect bound biotinylated rbd-vyfp, μl of development reagent containing , ′, , ′-tetramethylbenzidine (tmb), prepared as previously described [ ] , was added, color development was quenched after - min via addition of μl . m sulfuric acid, and absorbance at nm was measured. background-subtracted absorbance values were normalized to the signal corresponding to rbd-vyfp in the absence of added sybodies. purified sybodies carrying a c-terminal myc-his tag (sb_init expression vector) were diluted to nm in µl pbs ph . and directly coated on nunc maxisorp -well plates (thermofisher # - - ) at °c overnight. the plates were washed once with µl tbs ph . per well followed by blocking with µl tbs ph . containing . % (w/v) bsa per well. in parallel, chemically biotinylated prefusion spike protein (pfs) at a concentration of nm was incubated with nm sybodies for h at room temperature in tbs-bsa-t. 
the plates were washed three times with µl tbs-t per well. then, µl of the pfs-sybody mixtures were added to the corresponding wells and incubated for min, followed by washing three times with µl tbs-t per well. µl streptavidin-peroxidase polymer (merck, cat#s ) diluted : in tbs-bsa-t was added to each well and incubated for min, followed by washing three times with µl tbs-t per well. finally, to detect pfs bound to the immobilized sybodies, µl elisa developing buffer (prepared as described previously [ ] ) was added to each well, incubated for h (due to low signal) and absorbance was measured at nm. as a negative control, tbs-bsa-t devoid of protein was added to the corresponding wells instead of a pfssybody mixture. ( ) ( ) ) sb# belongs to the concave library (spill-over). ) two sequencing reactions failed. sb# qvqlvesggglvqaggslrlscaasgfpvrkanmhwyrqapgkerewvaaimskgeqtvyadsve grftisrdnakntvylqmnslkpedtavyycrvfvgwhyfgqgtqvtvs sb# qvqlvesggglvqaggslrlscatsgfpvyqanmhwyrqapgkerewvaaiqsygdgthyadsvk grftisrdnakntvylqmnslkpedtavyycravyvgmhyfgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvnyktmwwyrqapgkerewvaaiwsyghtthyadsvk grftisrdnakntvylqmnslkpedtavyycvvwvghnyegqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyaqnmhwyrqapgkerewvaaiyshgywtlyadsvk grftisrdnakntvylqmnslkpedtavyycevqvgawytgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvfsghmhwyrqapgkerewvaailsngdsthyadsvk grftisrdnakntvylqmnslkpedtavyycrvhvgahyfgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpveqgrmywyrqapgkerewvaaiishgtvtvyadsvk grftisrdnakntvylqmnslkpedtavyycyvyvgaqywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvlftymhwyrqapgkerewvaaiwssgnstwyadsvk grftisrdnakntvylqmnslkpedtavyycfvkvgnwyagqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvnagnmhwyrqapgkerewvaaiqsygrttyyadsvk grftisrdnakntvylqmnslkpedtavyycrvfvgmhyfgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvssstmtwyrqapgkerewvaainsygwethyadsvk grftisrdnakntvylqmnslkpedtavyycyvyvggsyigqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvqshymrwyrqapgkerewvaaiestghhtayadsvk 
grftisrdnakntvylqmnslkpedtavyyctvyvgyeyhgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvetenmhwyrqapgkerewvaaiyshgmwtayadsvk grftisrdntkntvylqmnslkpedtavyycevevgkwyfgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvkasrmywyrqapgkerewvaaiqsfgevtwyadsvk grftisrdnakntvylqmnslkpedtavyycyvwvgqeywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyasnmhwyrqapgkerewvaaiesqgymtayadsvk grftisrdnakntvylqmnslkpedtavyycwvivgeyyvgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvqaremewyrqapgkerewvaaikstgtytayaysvk grftisrdnakntvylqmnslkpedtavyycyvyvgssyigqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvknfemewyrkapgkerewvaaiqsggvetyyadsvk grftisrdnakntvylqmnslkpedtavyycfvyvgrsyigqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvayktmwwyrqapgkerewvaaiesygikwtryadsv kgrftisrdnakntvylqmnslkpedtavyycivwvgaqyhgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvagrnmwwyrqapgkerewvaaiyssgtyteyadsvk grftisrdnakntvylqmnslkpedtavyychvwvgslykgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvkharmwwyrqapgkerewvaaidshgdttwyadsvk grftisrdnakntvylqmnslkpedtavyycyvyvgasywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvnshemtwyrqapgkerewvaaiqstgtvteyadsvk grftisrdnakntvylqmnslkpedtavyycyvyvgssylgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpveqremewyrqapgkerewvaaidsngnytfyadsvk grftisrdnakntvylqmnslkpedtavyycyvyvgksyigqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvkhhwmfwyrqapgkerewvaaiksygygteyadsvk grftisrdnakntvylqmnslkpedtavyycfvgvgthyagqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyaaemewyrqapgkerewvaaissqgtityyadsvk grftisrdnakntvylqmnslkpedtavyycfvyvgksyigqgtqvsvs sb# qvqlvesggglvqaggslrlscaasgfpvhawemawyrqapgkerewvaairsfgssthyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdfgthhyaydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvntwwmhwyrqapgkerewvaaitswgfrtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdkgmavqwydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyntwmewyrqapgkerewvaaitshgyktyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdegdmftaydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyhstmfwyrqapgkerewvaaiyssgqhtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdsgqwrqeydywgqgtqvtvs sb# 
qvqlvesggglvqaggslrlscaasgfpvehemawyrqapgkerewvaairsmgrktlyadsvkg rftisrdnakntvylqmnslkpedtavyycnvkdfgytwheydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvtmawmwwyrqapgkerewvaairsegvrtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdygqahayydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvnshfmewyrqapgkerewvaaiqhssgfhtyyadsv kgrftisrdnakntvylqmnslkpedtavyycnvkdtgttedydywgqgtqvtvs sb# qvqldesggglvqaggslrlscaasgfpvyhawmewyrqapgkerewvaaitssgrhtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdagrvynsydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvahawmewyrqapgkerewvaaitsygyktyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdtgtyrfyydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvwnqtmvwyrqapgkerewvaaiwsmghtyyadsvkg rftisrdnakntvylqmnslkpedtavyycnvkdagvynryydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvehywmewyrqapgkerewvaaitsfgyrtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdwgfashaydywgqgiqvtvs sb# qvqlvesggglvqaggslrlscaasgfpeiawemawyrqapgkerewvaairsfgertlyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdfgwqhqeydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyhaymewyrqapgkerewvaaiysngehtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdsgsfnqaydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvewshmhwyrqapgkerewvaaivskggytlyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdygvhfkrydywgqgtqvtvi sb# qvqlvesggglvqaggslrlscaasgfpvfhvwmewyrqapgkerewvaaidsagwhtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdagnttsaydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyynwmewyrqapgkerewvaaihsngdetfyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdidaeayaydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyhvwmewyrqapgkerewvaaitssgshtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdsgqwrvqydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvywhhmhwyrqapgkerewvaaiiswgwyttyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdhgaqnqmydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyrdrmawyrqapgkerewvaaiysagqqtryadsvk grftisrdnakntvylqmnslkpedtavyycnvkdvghhyeyydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvdngymhwyrqapgkerewvaaidsygwhtiyadsvk 
grftisrdnakntvylqmnslkpedtavyycnvkdkgqmraaydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvswhsmywyrqapgkerewvaaifsegdwtyyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdygssyykydywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvsqsvmawyrqapgkerewvaaiyskgqythyadsvk grftisrdnakntvylqmnslkpedtavyycnvkdagssywdydywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgsigqieylgwfrqapgkeregvaalntwtgrtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaarwgrtkplntyyysywgqgtpvtvs sb# qvqlvesgggsvqaggslrlscaasgyidkivylgwfrqapgkeregvaalytlsghtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaateghahalyrlhyywgqgtqvtvs sb# qvqlvesggglvqaggslrlscaasgfpvyqgemhwyrqapgkerewvaairstgvqtwyadsvk grftisrdnakntvylqmnslkpedtavyycrvwvgthyfgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgniqriyylgwfrqapgkeregvaalmtytghtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaayvgaenplpysmygywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgqishikylgwfrqapgkeregvaalitrwgqtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaadygasdplwfihylywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgkiwtikylgwfrqapgkeregvaalmtrwgytyyadsvk grftvsldnakntvylqmnslkpedtalyycaaanygsnfplaeedywywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgnisqihylgwfrqapgkeregvaalntdygytyyadsvk grftvsldnakntvylqmnslkpedtalyycaaayyfgddiplwweaysywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgnistieylgwfrqapgkeregvaalytwhgqtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaarwgrhmplsateysywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgniesiyylgwfrqapgkeregvaalwtgdgetyyadsvk grftvsldnakntvylqmnslkpedtalyycaaaawgnsaplttyryyywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgfiygitylgwfrqapgkeregvaalvtwngqtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaadwgydwplwdewywywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgtiadikylgwfrqapgkeregvaalmtrwgstyyadsvk grftvsldnakntvylqmnslkpedtalyycaaanyganyplysqqysywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgsissikylgwfrqapgkeregvaalmtrwgmtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaanyganeplqythynywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgeiesifylgwfrqapgkeregvaalytyvgqtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaasygaahplsimryyywgqgtqvtvs sb# 
qvqlvesgggsvqaggslrlscaasgtiahikylgwfrqapgkeregvaalmtkwgqtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaasyganfplkasdysywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgsiqaitylgwfrqapgkeregvaalvtwngqtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaadwgydwplwdewywywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgsissitylgwfrqapgkeregvaalvtysgntyyadsvk grftvsldnakntvylqmnslkpedtalyycaaatwghswplyndeywywgqgsqvtvs sb# qvqlvesgggsvqaggslrlscaasgsissitylgwfrqapgkeregvaalitvnghtyyadsvk grftvsldnakntvylqmnslkpedtalyycaaaawgyawplhqddywywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgsissitylgwfrqapgkeregvaalntfngttyyadsvk grftvsldnakntvylqmnslkpedtalyycaaatwgyswpliaeynwywgqgtqvtvs sb# qvqlvesgggsvqaggslrlscaasgsissitylgwfrqapgkeregvaalktqagftyyadsvk grftvsldnakntvylqmnslkpedtalyycaaanwgyswplyeaddwywgqgtqvtvs
the plasmids encoding for the six highest affinity binders will very soon be available through addgene (addgene # -# ).
for each of the six independent selection reactions, clones were picked at random and analyzed by elisa. a non-randomized sybody was used as negative control (wells h and h , respectively). sybodies that were sequenced are marked with the respective sybody name (sb# - ). please note that identical sybodies that were found - times are marked with the same sybody name (e.g. sb# ). elisa analyses shown in these graphs were performed on three different days: ( ) rbd and mbp, ( ) ecd, ( ) .
concave adsvkgrftisrdnakntvylqmnslkpedtavyycx-vxvgxxyxgqgtqvtvs
phylogenetic tree of rbd sybodies. a radial tree was generated in clc . . .
sybodies inhibit rbd binding to ace . the effect of sybodies on rbd association with human ace was assessed with an elisa. individual sybodies ( nm, sybody number shown on x-axis) were incubated with biotinylated rbd-vyfp ( nm) and the mixtures were exposed to immobilized ace . bound rbd-vyfp was detected with streptavidin-peroxidase/tmb.
each column indicates background-subtracted absorbance at nm, normalized to the signal corresponding to rbd-vyfp in the absence of sybody (dashed red line).
simultaneous binding of sb# and sb# . (a) simultaneous binding of sybodies was analyzed using grating-coupled interferometry on the wave system (creoptix ag, switzerland). biotinylated ecd was immobilized and the binders were injected alone and simultaneously at saturating concentrations (sb# : nm, sb# : nm, sb# : nm). superimposed sensorgrams are shown. (b) competition elisa. the title of each graph indicates the sybody which was directly coated on the plate at a concentration of nm. the labels on the x-axes depict the sybody used for competition. to determine the background signal, buffer devoid of protein was added.
herd immunity - estimating the level required to halt the covid- epidemics in affected countries
the reproductive number of covid- is higher compared to sars coronavirus
estimation of the reproductive number of novel coronavirus (covid- ) and the probable outbreak size on the diamond princess cruise ship: a data-driven analysis
presumed asymptomatic carrier transmission of covid-
estimating clinical severity of covid- from the transmission dynamics in wuhan, china
predicting the future trajectory of covid-
preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies.
viruses
the sars-cov- vaccine pipeline: an overview
use of antiviral drugs to reduce covid- transmission
a novel coronavirus outbreak of global health concern
a sars-like cluster of circulating bat coronaviruses shows potential for human emergence
structure, function, and evolution of coronavirus spike proteins
cryo-em structure of the -ncov spike in the prefusion conformation
structure, function, and antigenicity of the sars-cov- spike glycoprotein
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
we thank rony nehmé and andré heuer (creoptix ag, wädenswil, switzerland) for the acquisition, fitting and interpretation of gci measurements using the wavesystem. we thank florence projer, david hacker and kelvin lau (protein production and structure core facility, epfl, switzerland) for the production of the pre-fusion spike protein. we are grateful to jason mclellan (the university of texas at austin, u.s.) for having provided the pre-fusion-stabilized soluble spike expression vector.
kinetic characterization of the top six sybodies. (a) binding kinetics were measured by grating-coupled interferometry on the wave system (creoptix ag, switzerland). rbd-vyfp and ecd were immobilized and the sybodies were injected at increasing concentrations ranging from . nm to μm. data were fitted using a langmuir : model.
key: cord- -gjocoa authors: tsai, shang-jui; guo, chenxu; atai, nadia a.; gould, stephen j. title: exosome-mediated mrna delivery for sars-cov- vaccination date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gjocoa
background: in less than a year from its zoonotic entry into the human population, sars-cov- has infected more than million people, caused . million deaths, and induced widespread societal disruption. leading sars-cov- vaccine candidates immunize with the viral spike protein delivered on viral vectors, encoded by injected mrnas, or as purified protein.
here we describe a different approach to sars-cov- vaccine development that uses exosomes to deliver mrnas that encode antigens from multiple sars-cov- structural proteins.
approach: exosomes were purified and loaded with mrnas designed to express (i) an artificial fusion protein, lsnme, that contains portions of the viral spike, nucleocapsid, membrane, and envelope proteins, and (ii) a functional form of spike. the resulting combinatorial vaccine, lsnme/sw , was injected into thirteen-week-old male c bl/ j mice, followed by interrogation of humoral and cellular immune responses to the sars-cov- nucleocapsid and spike proteins, as well as hematological and histological analysis to interrogate animals for possible adverse effects.
results: immunized mice developed cd + and cd + t-cell reactivities that responded to both the sars-cov- nucleocapsid protein and the sars-cov- spike protein. these responses were apparent nearly two months after the conclusion of vaccination, as expected for a durable response to vaccination. in addition, the spike-reactive cd + t-cell response was associated with elevated expression of interferon gamma, indicative of a th response, and a lesser induction of interleukin , a th -associated cytokine. vaccinated mice showed no sign of altered growth, injection-site hypersensitivity, change in white blood cell profiles, or alterations in organ morphology. consistent with these results, we also detected moderate but sustained anti-nucleocapsid and anti-spike antibodies in the plasma of vaccinated animals.
conclusion: taken together, these results validate the use of exosomes for delivering functional mrnas into target cells in vitro and in vivo, and more specifically, establish that the lsnme/sw vaccine induced broad immunity to multiple sars-cov- proteins.
covid- is caused by severe acute respiratory syndrome coronavirus (sars-cov- ) (coronaviridae study group of the international committee on taxonomy of, ; zhou et al., b) .
covid- typically presents with symptoms common to many respiratory infections such as fever and cough (gandhi et al., ) but can also progress to acute respiratory distress, disseminated disease, and death (force et al., ; guo et al., ; richardson et al., ; zhou et al., a) . humans have long been host to several mildly pathogenic beta-coronaviruses (oc , hku , etc. (corman et al., ) ) but sars-cov- entered the human population in late as the result of a zoonotic leap. sars-cov- is closely related to a pair of prior bat-to-human zoonoses that were responsible for the outbreaks of severe acute respiratory syndrome (sars-cov) in (graham and baric, ) and middle east respiratory syndrome (mers-cov) in (memish et al., ) . while sars-cov- infection is associated with lower mortality than sars-cov or mers-cov, sars-cov- displays a higher rate of transmission and has become a major cause of morbidity and mortality worldwide (https://www.cdc.gov/coronavirus/ -ncov/hcp/clinical-guidance-managementpatients.html)(coronavirus.jhu.edu) (korber et al., ) . infection of a cell by sars-cov- results in the translation of its viral genomic rna (grna) into large polyproteins, open reading frame (orf a) and orf ab, which are processed to release nonstructural proteins (nsp - ) (v'kovski et al., ) . these early proteins prime the host cell for virus replication and mediate the synthesis of subgenomic viral rnas. these encode additional proteins, including the sars-cov- structural proteins nucleocapsid (n), spike (s), membrane (m), and envelope (e). the coronavirus integral membrane proteins s, m, and e are co-translationally translocated into the endoplasmic reticulum (er) and trafficked by the secretory pathway to golgi and golgi-related compartments (ruch and machamer, ; ujike and taguchi, ) , and perhaps other compartments of the cell as well (ghosh et al., ) . 
during their intracellular trafficking, the s, m, and e proteins work together to recruit n protein-grna complexes into nascent virions and to drive the budding of infectious vesicles from the host cell membrane. the resulting sars-cov- virions are small, membrane-bound vesicles of ~ nm diameter, with large, spike-like trimers of s that protrude from the vesicle surface . the s protein interacts with a variety of cell surface proteins including its canonical receptor, angiotensin-converting enzyme ii (ace ) (hoffmann et al., ; matheson and lehner, ; zhou et al., b) , and neuropilin- (cantuti-castelvetri et al., ; daly et al., ) . sars-cov- receptors and other infection mediators (e.g. tmprss (hoffmann et al., ) ) are expressed within the respiratory tract, consistent with its respiratory mode of transmission (mason, ) . however, the surface proteins that facilitate sars-cov- binding and entry are also expressed in many other cell types, allowing sars-cov- to spread within the body and impact multiple organ systems, including the brain, heart, gastrointestinal tract, circulatory system, and immune system (cantuti-castelvetri et al., ; daly et al., ; li et al., ; nicin et al., ; singh et al., ; ziegler et al., ) . studies show that covid- patients generate potent cellular and humoral immune responses to the virus (poland et al., ; st john and rathore, ; zost et al., ) . moreover, animal studies provide clear evidence that sars-cov- infection elicits immune responses that reverse the course of disease, clear the virus, and confer resistance to reinfection (bosco-lauth et al., ; chandrashekar et al., ; deng et al., ; shan et al., ) . taken together, these observations augur well for control of sars-cov- transmission and disease through vaccination. although to date there are no approved vaccines for any human coronavirus, disease-preventing vaccines have been progressively developed for multiple animal coronaviruses (tizard, ) . 
most of these coronavirus vaccines are based on attenuated viruses, which elicit immune responses to all viral proteins, or inactivated virus particle vaccines, which induce immunity to the structural proteins of the virus (i.e. s, n, m, and e). of the sars-cov- vaccines selected for rapid development, all are based on immunization with just a single viral protein, the large, spike-like s protein (slaoui and hepburn, ) (samrat et al., ) . although s-based sars-cov- vaccines all target the same protein, they vary significantly in antigen structure and mode of antigen delivery. forms of s in vaccine trials range from s protein fragments (walsh et al., ) to full-length forms of s (bos et al., ; graham et al., ; hassan et al., ; jackson et al., ; keech et al., ; walsh et al., ; zhu et al., ) though none delivers the kind of full-length, functional form of s encoded by sars-cov- . as for the modes of s antigen delivery, most enlist host cells to express the s antigen component of their vaccine, from either injected mrnas (jackson et al., ; walsh et al., ) or infectious viral vectors (bos et al., ; graham et al., ; hassan et al., ; zhu et al., ) while some involve direct injection of purified, recombinant s protein (keech et al., ) .
here we describe an alternative approach to sars-cov- vaccination that combines the features of exosome-based mrna delivery with the expression of viral antigens in forms designed for antigen presentation by major histocompatibility (mhc) class i and class ii pathways (imai et al., ) . exosomes are small extracellular vesicles (sevs) of ~ - nm in diameter that are made by all cells, abundant in all biofluids, and mediate intercellular transmission of signals and macromolecules, including genetic information such as rnas (pegtel and gould, ) .
here we describe the results of a trial immunization study in which exosomes were used to deliver multiple mrnas designed to express fragments of the s, n, m, and e proteins targeted to mhc class i and class ii antigen processing compartments, as well as a full-length, functional form of s.
to develop a system for exosome-mediated mrna delivery we first established a protocol for exosome purification from cultured human cells. towards this end, f cells were grown in suspension in chemically-defined media, free of animal products and antibiotic supplements. cells and cell debris were removed by centrifugation and filtration to generate a clarified tissue culture supernatant (ctcs), followed by purification of the ctcs by filtration and chromatography. this process yielded a population of small evs that have the expected ultrastructure and size distribution profile of human exosomes and contain the exosomal marker proteins cd and cd (fig. ) . this process concentrated exosomes ~ -fold, to ~ x exosomes/ml, with an average recovery of %.
to determine whether the treatment of cells with exosome-mrna formulations could induce cells to express these mrnas, we synthesized an mrna containing a codon-optimized open reading frame (orf) for antares , a reporter protein comprised of the luciferase teluc fused to two copies of the fluorescent protein cyofp (cyofp -teluc-cyofp ) (yeh et al., ) . antares expression can be measured via its luciferase activity (diphenylterazine-induced, bioluminescence resonance energy transfer (bret)-mediated bioluminescence) or via its cyofp -mediated fluorescence (excitation range of nm- nm; emission range of nm- nm). this mrna was loaded into exosomes and then incubated with cultures of hek cells overnight. the cells were then processed for antares luciferase activity and fluorescence microscopy (fig. ) .
treated cells displayed high levels of antares luciferase activity that was dependent on the specific order of component addition during the exosome formulation process. when interrogated by fluorescence microscopy, the cells displayed the expected expression of antares -mediated fluorescence.
to test whether exosome-mrna formulations can elicit immune responses to proteins of sars-cov- , we first synthesized a pair of mrnas designed to express sars-cov- antigens. the first of these mrnas encoded a membrane protein (lsnme) comprised of the receptor binding domain (rbd) of s, the entire n protein, and soluble portions of the m and e proteins, all expressed within the extracellular domain of the human lamp protein. this protein is predicted to be degraded into peptides for antigen presentation by the mhc class i system, and if expressed in antigen-presenting cells (apcs), to be degraded into peptides for antigen presentation by mhc class ii molecules (gupta et al., ) (imai et al., ) . expression of such a protein in a non-apc cell type such as hek is expected to result in its accumulation in the er, and consistent with this hypothesis, staining hek cells transfected with this mrna with a covid- patient plasma identified an er-localized protein (fig. a, b) . the second of these in vitro synthesized mrnas was designed to express the full-length, functional form of s from the original wuhan- isolate of sars-cov- (s w ) (zhou et al., b) . transfection of this mrna into hek cells led to expression of a distinct protein that was also recognized by antibodies present in a covid- patient plasma (fig. c, d) . taken together, these results demonstrate that these mrnas encode proteins that are reactive with covid- patient plasmas and have the expected subcellular distributions.
a single exosome-mrna formulation containing both the lsnme and s w mrnas (hereafter referred to as the lsnme/s w vaccine) was injected (intramuscular) into weeks-old male c bl/ j mice (fig. ) .
the vaccine was dosed at µg or . µg equivalents of each mrna and injections were performed on day (primary immunization), day ( st boost), and day ( nd boost). blood ( . ml) was collected on days , , , and . on day the animals were sacrificed to obtain tissue samples for histological analysis and splenocytes for blood cell studies. using elisa kits adapted for the detection of mouse antibodies, we observed that vaccinated animals displayed a dose-dependent antibody response to both the sars-cov- n protein and s protein. these antibody reactions were not particularly robust but they were long-lasting, persisting to weeks after the final boost with little evidence of decline. it should be noted that the modest antibody production was expected in the case of the n protein, as the lsnme mrna is designed to stimulate cellular immune responses rather than the production of anti-n antibodies.
vaccinated and control animals were also interrogated for the presence of antigen-reactive cd + and cd + t-cells. this was carried out by collecting splenocytes at the completion of the trial (day ) and subjecting them to a cfse proliferation assay in the presence or absence of recombinant n and s proteins. these experiments revealed that vaccination had induced a significant increase in the percentages of cd + t-cells and cd + t-cells that proliferated in response to addition of either recombinant n protein or recombinant s protein to the culture media (fig. a-d) . these vaccine-specific, antigen-induced proliferative responses demonstrate that the lsnme/s w vaccine achieved its primary goal, which was to prime the cellular arm of the immune system to generate n-reactive cd + and cd + t-cells, and also s-reactive cd + and cd + t-cells. in additional experiments, we stained antigen-induced t-cells for the expression of interferon gamma (ifng) and interleukin (il ).
these experiments revealed that the s-reactive cd + t-cell population displayed elevated expression of the th -associated cytokine ifng, and to a lesser extent, the th -associated cytokine il (fig. ) . in contrast, n-reactive t-cells failed to display an n-induced expression of either ifng or il .
control and vaccinated animals were examined regularly for overall appearance, general behavior, and injection site inflammation (redness, swelling). no vaccine-related differences were observed in any of these variables, and animals from all groups displayed similar age-related increases in body mass (supplemental figure ). vaccination also had no discernible effect on blood cell counts (supplemental figure ) . histological analyses were performed on all animals at the conclusion of the study by an independent histology service, which reported that vaccinated animals showed no difference in overall appearance of any of the tissues that were examined. representative images are presented for brain, lung, heart, liver, spleen, kidney, and injection-site skeletal muscle in an animal from each of the trial groups (fig. ) .
exosomes represent a novel drug delivery vehicle capable of protecting labile cargoes from degradation and delivering them into the cytoplasm of target cells (kamerkar et al., ; li et al., ; o'brien et al., ) . this is particularly relevant for the development of rna-based vaccines and therapeutics, as unprotected rna-based drugs are subject to rapid turnover, poor targeting, and in some cases unwanted side effects arising from naked nucleic acid injection. encapsulating rnas in liposomes and other types of lipid nanoparticles (lnps) is one approach to solving these problems (witzigmann et al., ; yu et al., ) , but lnps are known to pose risks of their own (peer, ) , and in some cases lnp-rna drugs have been associated with severe adverse effects (hong et al., ) .
in contrast, exosomes are continually released by all cells, are abundant components of human blood and all other biofluids (coumans et al., ; pegtel and gould, ) , and are therefore well-tolerated drug-delivery vehicles in humans (kamerkar et al., ; li et al., ; o'brien et al., ) . in addition, exosomes play critical roles in the intercellular delivery of signals and macromolecules, including the functional delivery of mrnas and other rnas, making rna-loaded exosomes an attractive candidate for clinical applications of rna therapeutics (o'brien et al., ; ratajczak et al., ; skog et al., ) .
in the present report, we established that formulations of purified exosomes, in vitro-synthesized mrnas, and polycationic lipids can mediate mrna transport into human cells, and functional expression of mrna-encoded protein products. this was established first for antares , a bioluminescent and fluorescent protein that served as a reporter for interrogating the formulation variables that affect exosome-mediated mrna delivery. it was then extended to the functional delivery of mrnas encoding membrane proteins, including the multi-antigen carrier protein lsnme and s w , a functional spike protein. taken together, these results indicate that mrnas delivered via exosome-mrna formulations can support cargo protein synthesis, regardless of whether the protein is predicted to be synthesized on free cytosolic ribosomes (e.g. antares ) or on membrane-bound ribosomes that mediate cotranslational translocation of the protein into the endoplasmic reticulum (e.g. lsnme and s w ).
we also explored the ability of an exosome-rna formulation to drive functional mrna expression in vivo by injecting an exosome complex containing the lsnme and s w mrnas (lsnme/s w ) into mice and monitoring immune responses to the sars-cov- n and s proteins. this vaccine was administered at relatively low doses of µg mrna equivalents and .
µg mrna equivalents in the absence of adjuvant. injections were spaced at three-week intervals, and blood samples were collected over the course of weeks. the animals were then sacrificed and tissues were harvested for analysis of cellular immune responses and organ histology. consistent with the goal of vaccine-induced development of balanced t-cell responses, vaccinated animals displayed antigen-induced proliferation of cd + and cd + t-cells in response to both the n and s proteins. these antigen-responsive cd + and cd + populations were present nearly two months after the final boost injection, indicating that lsnme/s w vaccination had elicited a sustained cellular immune response to both of these sars-cov- structural proteins. furthermore, when these cell populations were interrogated for antigen-induced expression of the cytokines ifng and il , we detected elevated expression of ifng in cd + t-cells exposed to exogenous s protein, as well as a more modest s-induced expression of il . these results raise the possibility that the lsnme/s w vaccine induces the kind of th -skewed cellular response desired for vaccine-induced immunity. these results are consistent with the design of the lsnme open reading frame, which is engineered to drive antigen processing by the mhc class i and class ii pathways (gupta et al., ; imai et al., ; wu et al., ) .
vaccinated animals also developed durable antibody responses to the n and the s proteins. while the titers of these antibody responses were modest, they were sustained at relatively constant levels over the weeks following the final boost injection. the relative strength of these immune responses is likely a consequence of the low mrna dose of the lsnme/s w vaccine, and is likely to be amplified significantly by the > -fold increase in dose projected in large animal models and human trials.
in conclusion, the results presented in this study validate the use of exosome-mrna formulations for functional delivery of mrnas both in cultured cells and in live animals. the successful use of exosomes to deliver antares mrna opens the door to follow-on studies aimed at optimizing exosome-rna formulation conditions, as well as characterizing the time-dependence of antares expression, biodistribution of exosome-mediated rna expression, injection site effects, and exosome-mediated tissue. as for the future development of the lsnme/s w vaccine, we anticipate that follow-on studies in larger animal models at doses comparable to other mrna vaccines will demonstrate a desirable combination of safety, balanced immune responses, and when challenged, protection against sars-cov- infection and/or disease. f cells (gibco, cat.# - ) were tested for pathogens and found to be free of viral exosome and cell lysates were separated by sds-page using pre-cast, - % gradient gels (bio-rad ) and transferred to pvdf membranes (thermofisher, # ). membranes were probed using antibodies directed against cd , cd (system biosciences exoab-cd a- and exoab-cd a- , respectively), and actin (sigma a ), with hrp-conjugated goat anti-rabbit secondary antibody used for detection (cell signaling, # ). target proteins were visualized by chemiluminescence, and images were captured using a chemidoc imager (bio-rad). exosomes were fixed by addition of formaldehyde to a final concentration of %. carbon-coated grids were placed on top of a drop of the exosome suspension. next, grids were placed directly on top of a drop of % uranyl acetate. the resulting samples were examined with a tecnai- g spirit biotwin transmission electron microscope (johns hopkins university, usa). mrnas were purified using rneasy columns (qiagen) and resuspended in dnase-free, rnase-free water using nuclease-free tips and tubes. 
rnas were then combined with different combinations and amounts of polycationic lipids and exosomes, as well as in different orders of addition. rna loading of exosomes for vaccine formulation involved pre-mixing of mrnas with polycationic lipids followed by addition of exosomes. hek cells were incubated with exosome-mrna formulations overnight under standard culture conditions. antares luciferase activity was measured by live-cell bioluminescence, collected after incubating with the substrate diphenylterazine (mce, hy- ) at a final concentration of µm for minutes. readings were collected using a spectramax i x (molecular devices). fluorescence micrographs of antares expression in transfected hek cells were captured as png files using an evos m microscope equipped with an olympus uplansapo x/ . objective. immunization rna-loaded exosome formulations were generated and then stored for hours at °c prior to injection of mice. injection doses were at either µg equivalents of each mrna, or . µg equivalents of each mrna. immunizations were initiated on thirteen-week-old male c bl/ j mice (jackson laboratory) housed under pathogen-free conditions at the cedars-sinai medical center animal facility. all animal experimentation was performed following institutional guidelines for animal care and was approved by the cedars-sinai medical center iacuc (# ). all injections were at a volume of µl. blood (~ . ml) was collected periodically from the orbital vein. at day , mice were deeply anesthetized using isoflurane, euthanized by cervical dislocation, and processed using standard surgical procedures to obtain spleen, lung, brain, heart, liver, kidney, muscle, and other tissues. spleens were processed for splenocyte analysis, and all tissues were processed for histological analysis by fixation in % neutral buffered formalin. histological analysis was performed by the service arm of the hic/comparative pathology program of the university of washington. 
mouse igg antibody production against sars-cov- antigens was measured by enzyme-linked immunosorbent assays (elisa). for antigens s (rbd) and n, precoated elisa plates from raybiotech were utilized (ieq-cov s rbd-igg; ieq-covn-igg), and the experiments were performed according to the manufacturer's instructions, with modification. briefly, mouse plasmas at dilutions of : were added to antigen-precoated wells in duplicate and incubated at room temperature (rt) for hours on a shaker ( rpm). the plates were washed times with wash buffer followed by blocking for hours at rt with % bsa in pbs. mouse antibodies bound to the antigens coated on the elisa plates were detected using hrp-conjugated goat anti-mouse secondary antibodies (jackson immuno research inc.). plates were washed times with washing buffer and developed using tmb substrate (raybiotech). a microplate reader was used to measure the absorbance at nm (spectramaxid , molecular devices, with softmax pro software). after terminal blood collection, mice were euthanized and a portion of each fresh spleen was harvested. single-cell splenocyte preparations were obtained by mechanical passage through a µm nylon cell strainer (bd falcon, # ). erythrocytes were depleted using ammonium-chloride-potassium (ack) lysis buffer (gibco, #a - ), and splenocytes were washed using r media by centrifuging at x g for minutes at rt. splenocytes ( x cells/mouse) were resuspended in µl of % fbs in x pbs and incubated with fluorochrome-conjugated antibodies for surface staining of cd (invitrogen, # - - ), cd (biolegend, # ), cd (biolegend, # ), b (bd, # ), cd c (invitrogen, # - - ), f / (invitrogen, #mf ), ly g (invitrogen, # - - ) and ly c (bd, # ) for minutes at °c in the dark. following incubation, samples were washed twice with µl % fbs in x pbs and centrifuged at x g for minutes at rt to remove unbound antibodies. next, the cells were fixed with µl ics fixation buffer (invitrogen, # - - ). 
samples were analyzed on a facs canto ii (bd biosciences) with , - , recorded lymphocytes. the data analysis was performed using flowjo software (flowjo, llc) and presented as a percentage change in the immune cell population compared to the vehicle-treated group. splenocytes were resuspended at cells/ml in % fbs in x pbs and stained with carboxyfluorescein succinimidyl ester (cfse) (invitrogen, #c ) by rapidly mixing an equal volume of cell suspension with µm cfse in % fbs in x pbs for minutes at °c. the labeled cells were washed three times with r complete medium. the cells were incubated for hours in the presence of µg/ml sars-cov- antigens n or s. the stained cells were analyzed on a bd facs canto ii with , - , recorded lymphocytes. the data analysis was performed using flowjo software. six individual control mice, (orange bars and black squares) six mice immunized with . µg equivalents of each mrna, and (rust bars and black triangles) six mice immunized with µg equivalents of each mrna. height of bars represents the mean, error bars represent +/- one standard error of the mean, and the statistical significance of differences between different groups is reflected in student's t-test values of * for < . and ** for < . . figure . lsnme/s w vaccination leads to s-induced expression of ifng and il by cd + t-cells. splenocytes were interrogated by flow cytometry following incubation in the absence or presence of (a, b) purified, recombinant n protein or (c, d) purified, recombinant s protein, and labeling with antibodies specific for cd or cd , and for ifng or il . differences in labeling for ifng or il in cd + cd + cell populations were plotted for (grey bars and black circles) six individual control mice, (orange bars and black squares) six mice immunized with . µg equivalents of each mrna, and (rust bars and black triangles) six mice immunized with µg equivalents of each mrna. 
height of bars represents the mean, error bars represent +/-one standard error of the mean, and the statistical significance of differences between different groups is reflected in student's ttest values of * for < . . figure . absence of tissue pathology upon lsnme/s w vaccination. representative micrographs from histological analysis (hematoxylin and eosin stain) of lung, brain, heart, liver, kidney, spleen, and muscle (side of injection) of animals from (upper row) control mice, (middle row) mice immunized with the lower dose of the lsnme/s w vaccine, and (lower row) mice immunized with the higher dose of the lsnme/s w vaccine. cd + cells were further differentiated by staining for (e) cd and (f) cd . no statistically significant differences were detected in these subpopulations of white blood cells. ad vector-based covid- vaccine encoding a prefusion-stabilized sars-cov- spike immunogen induces potent humoral and cellular immune responses experimental infection of domestic dogs and cats with sars-cov- : pathogenesis, transmission, and response to reexposure in cats neuropilin- facilitates sars-cov- cell entry and infectivity sars-cov- infection protects against rechallenge in rhesus macaques hosts and sources of endemic human coronaviruses coronaviridae study group of the international committee on taxonomy of the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- methodological guidelines to study extracellular vesicles neuropilin- is a host factor for sars-cov- infection primary exposure to sars-cov- protects against reinfection in rhesus macaques acute respiratory distress syndrome: the berlin definition mild or moderate covid- recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission evaluation of the immunogenicity of prime-boost vaccination with the replication-deficient viral vectored covid- vaccine candidate chadox ncov- cardiovascular implications of 
fatal outcomes of patients with coronavirus disease (covid- ) sars coronavirus nucleocapsid immunodominant t-cell epitope cluster is common to both exogenous recombinant and endogenous dna-encoded immunogens a single-dose intranasal chad vaccine protects upper and lower respiratory tracts against sars-cov- sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor phase study of mrx , a liposomal mir- a mimic, in patients with advanced solid tumours distinct subcellular compartments of dendritic cells used for cross-presentation an mrna vaccine against sars-cov- -preliminary report exosomes facilitate therapeutic targeting of oncogenic kras in pancreatic cancer phase - trial of a sars-cov- recombinant spike protein nanoparticle vaccine tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus extracellular vesicles carry microrna- to intrahepatic cholangiocarcinoma and improve survival in a rat model expression of the sars-cov- cell receptor gene ace in a wide variety of human tissues thoughts on the alveolar phase of covid- how does sars-cov- cause covid- ? 
cell type-specific expression of the putative sars-cov- receptor ace in human hearts rna delivery by extracellular vesicles in mammalian cells and its applications immunotoxicity derived from manipulating leukocytes with lipid-based nanoparticles sars-cov- immunity: review and applications to phase vaccine candidates embryonic stem cell-derived microvesicles reprogram hematopoietic progenitors: evidence for horizontal transfer of mrna and protein delivery presenting characteristics, comorbidities, and outcomes among patients hospitalized with covid- in the new york city area the coronavirus e protein: assembly and beyond prospect of sars-cov- spike protein: potential role in vaccine and therapeutic development infection with novel coronavirus (sars-cov- ) causes pneumonia in rhesus macaques a single-cell rna expression map of human coronavirus entry factors glioblastoma microvesicles transport rna and proteins that promote tumour growth and provide diagnostic biomarkers developing safe and effective covid vaccines -operation warp speed's strategy and approach early insights into immune responses during covid- vaccination against coronaviruses in domestic animals incorporation of spike and membrane glycoproteins into coronavirus virions coronavirus biology and replication: implications for sars-cov- safety and immunogenicity of two rna-based covid- vaccine candidates clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china lipid nanoparticle technology for therapeutic gene regulation in the liver engineering an intracellular pathway for major histocompatibility complex class ii presentation of antigens molecular architecture of the sars-cov- virus red-shifted luciferase-luciferin pairs for enhanced bioluminescence imaging rna drugs and rna targets for small molecules: principles, progress, and challenges clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective 
cohort study a pneumonia outbreak associated with a new coronavirus of probable bat origin immunogenicity and safety of a recombinant adenovirus type- -vectored covid- vaccine in healthy adults aged years or older: a randomised, double-blind sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues potently neutralizing and protective human antibodies against sars-cov- 
statistical analysis was performed using graphpad prism software for windows/mac (graphpad software, la jolla california usa). results were reported as mean ± standard deviation or mean ± standard error; differences were analyzed using student's t-test and one-way analysis of variance. sources of support: this study was conducted with support from capricor and from johns hopkins university. disclosures: s.j.g. is a paid consultant for capricor, holds equity in capricor, and is co-inventor of intellectual property licensed by capricor. s.j.t. is co-inventor of intellectual property licensed by capricor. c.g. is co-inventor of intellectual property licensed by capricor. n.a. is an employee of capricor. 
key: cord- -nvphu fm authors: thomson, emma c.; rosen, laura e.; shepherd, james g.; spreafico, roberto; da silva filipe, ana; wojcechowskyj, jason a.; davis, chris; piccoli, luca; pascall, david j.; dillen, josh; lytras, spyros; czudnochowski, nadine; shah, rajiv; meury, marcel; jesudason, natasha; de marco, anna; li, kathy; bassi, jessica; o’toole, aine; pinto, dora; colquhoun, rachel m.; culap, katja; jackson, ben; zatta, fabrizia; rambaut, andrew; jaconi, stefano; sreenu, vattipally b.; nix, jay; jarrett, ruth f.; beltramello, martina; nomikou, kyriaki; pizzuto, matteo; tong, lily; cameroni, elisabetta; johnson, natasha; wickenhagen, arthur; ceschi, alessandro; mair, daniel; ferrari, paolo; smollett, katherine; sallusto, federica; carmichael, stephen; garzoni, christian; nichols, jenna; galli, massimo; hughes, joseph; riva, agostino; ho, antonia; semple, malcolm g.; openshaw, peter j.m.; baillie, j. kenneth; rihn, suzannah j.; lycett, samantha j.; virgin, herbert w.; telenti, amalio; corti, davide; robertson, david l.; snell, gyorgy title: the circulating sars-cov- spike variant n k maintains fitness while evading antibody-mediated immunity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nvphu fm sars-cov- can mutate to evade immunity, with consequences for the efficacy of emerging vaccines and antibody therapeutics. herein we demonstrate that the immunodominant sars-cov- spike (s) receptor binding motif (rbm) is the most divergent region of s, and provide epidemiological, clinical, and molecular characterization of a prevalent rbm variant, n k. we demonstrate that n k s protein has enhanced binding affinity to the hace receptor, and that n k virus has similar clinical outcomes and in vitro replication fitness as compared to wild- type. 
we observed that the n k mutation resulted in immune escape from a panel of neutralizing monoclonal antibodies, including one in clinical trials, as well as from polyclonal sera from a sizeable fraction of persons recovered from infection. immune evasion mutations that maintain virulence and fitness such as n k can emerge within sars-cov- s, highlighting the need for ongoing molecular surveillance to guide development and usage of vaccines and therapeutics. sars-cov- , the cause of covid- , emerged in late and expanded globally, resulting in over million confirmed cases as of october . molecular epidemiology studies across the world have generated over , viral genomic sequences, which have been shared with unprecedented speed via the gisaid initiative (https://www.gisaid.org/). these data are essential for monitoring virus spread (meredith et al., ) and evolution. of particular interest is the evolution of the sars-cov- surface protein, spike (s), which is responsible for viral entry via its interaction with the human angiotensin-converting enzyme (hace ) receptor on host cells. the s protein is the target of neutralizing antibodies generated by infection or vaccination (folegatti et al., ; jackson et al., ; keech et al., ) as well as monoclonal antibody (mab) drugs currently in clinical trials (hansen et al., ; jones et al., ; pinto et al., ). a sars-cov- s variant, d g, is now dominant in most places around the globe (callaway, ). studies in vitro indicate that this variant may have greater infectivity, while molecular epidemiology indicates that it spreads efficiently and likely maintains virulence (hu et al., ; korber et al., ; volz et al., ). amino acid is outside the receptor binding domain (rbd) of s, the domain targeted by % of neutralizing antibody activity in serum of sars-cov- survivors (piccoli et al., ). 
initial studies suggest that d g actually exhibits increased sensitivity to neutralizing antibodies, likely due to its effects on the molecular dynamics of the spike protein (hou et al., ; yurkovetskiy et al., ) . therefore, this dominant variant is unlikely to escape antibody-mediated immunity. the low numbers of novel mutations reaching high frequency in sequenced sars-cov- isolates may relate to the moderate intrinsic error rate of the replication machinery of sars-cov- (li et al., c; robson et al., ) and to this new human coronavirus requiring no significant adaption to humans (maclean et al., ) . nevertheless, the increasing number of infected individuals and the large reservoir of individuals susceptible to infection increases the likelihood that novel variants that impact vaccine and therapeutic development will emerge and spread. moreover, the full impact of immune selection, which can drive variant selection, likely has not yet had a dominant influence on the pandemic, since herd immunity has not yet been attained. as population immunity increases and vaccines are deployed at scale this might change. the potential for circulating viral variants to derail promising vaccine or antibody-based prophylactics or treatments, even in the absence of selective pressure from the drug or vaccine, is demonstrated by the failure of a phase iii clinical trial of a mab targeting the respiratory syncytial virus (simoes et al., ) , and the need for new influenza vaccines on a yearly basis. it is therefore critical to understand whether and how sars-cov- may evolve to evade antibody-dependent immunity. here, we examined the immunodominant sars-cov- receptor binding motif (rbm), the primary target of the neutralizing ab response within the rbd (piccoli et al., ) and found it to be less conserved than the rbd or the entire spike protein in circulating viruses. 
to understand the implications of this structural plasticity for immune evasion, we defined the clinical and epidemiological impact, the molecular features, and the immune response to an rbm variant, n k. this variant has arisen independently twice, in both cases forming lineages of more than sequences. as of october , it has been observed in countries and is the second most commonly observed rbd variant worldwide. we find that the n k mutation is associated with a similar clinical spectrum of disease and slightly higher viral loads in vivo compared with isolates with the wild-type n residue, and that it results in immune escape from polyclonal sera from a proportion of recovered individuals and a panel of neutralizing mabs. n k provides a sentinel example of immune escape, indicating that rbm variants must be evaluated when considering vaccines and the therapeutic or prophylactic use of mabs. long term control of the pandemic will require systematic monitoring of immune escape variants and selection of strategies that address the variants circulating in targeted populations. competing pressures influence the evolution of the spike rbm. first, the rbm mediates viral entry (shang et al., ; walls et al., ; wrapp et al., b) and therefore it must maintain sufficient affinity to engage the entry receptor hace . second, it is a major target of neutralizing antibodies (robbiani et al., ; rogers et al., ; wec et al., ) and could be a primary location for the emergence of immune escape mutations. we set out to understand these competing pressures by evaluating the landscape of rbm sequence divergence observed in circulating sars-cov- variants and in other viruses of the sarbecovirus lineage. we used published x-ray structures of sars-cov and sars-cov- rbd:hace complexes (lan et al., ; li et al., ) to define the rbm residues using a Å distance cutoff (figures a-c) . 
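the distance-cutoff definition of interface residues mentioned above can be illustrated with a minimal python sketch. it is not the authors' pipeline: the function name, the toy coordinates, and the cutoff value used in the example are all assumptions made for illustration, and the cutoff actually used in the study is not reproduced here.

```python
import math

def interface_residues(rbd_atoms, ace2_atoms, cutoff):
    """Return sorted RBD residue numbers with any atom within `cutoff`
    angstroms of any receptor atom.

    Each atom is a tuple (residue_number, (x, y, z)). This input layout is a
    hypothetical simplification of what a structure parser would provide.
    """
    contacts = set()
    for res_id, xyz in rbd_atoms:
        if res_id in contacts:
            continue  # residue already known to be at the interface
        for _, other in ace2_atoms:
            if math.dist(xyz, other) <= cutoff:
                contacts.add(res_id)
                break
    return sorted(contacts)

# toy coordinates; the cutoff of 4.0 is an arbitrary illustrative value
rbd = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0))]
ace2 = [(100, (0.0, 0.0, 3.0))]
print(interface_residues(rbd, ace2, cutoff=4.0))  # [1]
```

a real analysis would read atom coordinates from the deposited x-ray structures; the brute-force pairwise loop is adequate for a single binding interface.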
we evaluated ~ , sars-cov- genomic sequences deposited in gisaid as of october , and observed a high number of variants occurring in the rbm (figure a) . to understand how the divergence of the rbm compares to the divergence of the entire rbd and the whole spike protein, we divided the spike protein into three non-overlapping regions: the rbm, the rbd outside of the rbm, and the full s protein outside of the rbd. we counted individual variants occurring at least ten times, and quantified substitutions of different amino acids at the same position as separate variants. we found that the rbm is the least conserved region of s ( figure b) . to understand this result further, we evaluated a published deep mutational scanning (dms) data set of the rbd and compared it to sequences of circulating viruses. the dms data defines the effect of each possible single amino acid change on both expression of the rbd and its capacity to bind hace . for each position in the rbm, we compared the dms results for all amino acid substitutions at that position versus only substitutions that have been observed in circulating sars-cov- isolates ( figure c) . a subset of residues shows the largest loss of hace binding upon mutation (top ~ / of rbm residues in figure c ) and, as would be expected, few natural variants of these residues have been observed to be circulating to date. surprisingly, these conserved residues each contribute weakly to the rbd:hace total interaction energy (the sum of pair-wise interaction energies for all residues at the binding interface in the x-ray structure; "binding energy" in figure c ). for the majority of the rbm (bottom ~ / of rbm residues in figure c ), variation in circulating virus sequences confirms the tolerance to mutation predicted by the dms data. notably, several rbm residues forming the strongest interactions with the receptor, e.g. k and e , are not highly conserved despite their predicted importance. 
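the per-region variant tally described at the start of this passage (variants seen at least ten times, with different amino acids at the same position counted separately) can be sketched as follows. this is an illustrative sketch only: the function name, the input format, and the toy region boundaries are assumptions and do not correspond to real spike coordinates.

```python
from collections import Counter

def count_variants_by_region(substitutions, regions, min_count=10):
    """Count distinct variants per region.

    substitutions: iterable of (position, alt_amino_acid), one entry per
        observation in a sequenced genome.
    regions: dict mapping region name -> collection of positions; regions are
        assumed to be non-overlapping.
    A variant is a (position, amino acid) pair, so two different substitutions
    at the same position count as two variants. Only variants observed at
    least `min_count` times are tallied.
    """
    tally = Counter(substitutions)
    result = {name: 0 for name in regions}
    for (pos, _aa), n in tally.items():
        if n < min_count:
            continue
        for name, positions in regions.items():
            if pos in positions:
                result[name] += 1
                break
    return result

# toy data with hypothetical region boundaries
observed = [(5, "K")] * 10 + [(5, "R")] * 10 + [(6, "A")] * 9 + [(20, "G")] * 12
regions = {"rbm": range(1, 10), "rbd_outside_rbm": range(10, 30)}
print(count_variants_by_region(observed, regions))
# {'rbm': 2, 'rbd_outside_rbm': 1}
```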
these results suggest that the rbm has a degree of structural plasticity whereby it is able to accommodate mutations without disrupting hace binding. evolutionary analysis of sarbecoviruses provides further support for rbm plasticity li et al., b; rambaut et al., ) . the sars-cov rbm is highly divergent from the sars-cov- rbm (figure s a-b) while maintaining hace binding affinity. additionally, there are many sequence changes in the rbm across a panel of related coronaviruses from animal isolates (figure s a-b, table s ). to determine the ability of members of the sarbecovirus lineage to bind hace , we produced nine recombinant rbd proteins corresponding to seven animal isolates, sars-cov- , and sars-cov and evaluated their binding to recombinant hace ( figure s c ). we found that three of the rbds from animal isolates showed strong affinity for hace : gd pangolin, which has a highly similar rbm to sars-cov- , and gx pangolin and bat cov wiv , which have highly divergent rbms (figure s a-b) . this further supports the conclusion that the rbm is structurally plastic, while retaining binding with hace as a receptor. given this plasticity, we next considered whether an rbm variant can lead to immune evasion while retaining virulence. the two most commonly observed circulating rbd variants as of october contain mutations in the rbm (s n and n k). we first identified the n k variant in march , circulating in scotland from lineage b. on the background of d g (da silva filipe et al., ) . using phylogenetic analysis, we determined this variant represented a single lineage (figure a ) that increased in frequency to sequences by june , (~ % of the available scottish viral genome sequences for this time period). numbers of n k and all other isolates decreased in scotland concurrent with control of the pandemic by initiation of stringent public health measures and this lineage has not been detected in scotland after june. 
however, the n k variant has been observed in > sequences in a second lineage in europe, first sampled in romania on may , , then norway on june , and is now circulating in countries, as well as arising independently in the u.s. (figure a-c). as of oct , , all n k variants arose from a c-to-a transversion in the third codon position, though these counts are heavily influenced by sampling frequency which varies widely between countries. as scotland has a high sampling frequency for its population size (~ . m), it is possible to calculate a growth rate (volz and frost, ) based on a comparison of the scottish lineages. we find that the growth rate is similar to what has already been shown for the d g background with no evidence for a faster rate of growth than n lineages ( figure s a ). in addition to its frequency and spread, the n k variant stood out from other circulating rbm variants as having a plausible mechanism for maintenance of viral fitness. the equivalent position to n k in the sars-cov rbm is also a positively-charged amino acid (r ), which forms a salt bridge with hace (li et al., ). we therefore hypothesized that the n k sars-cov- variant may form this additional salt bridge at the rbd-hace interface (rbd n k:hace e ). structural modeling supported that this salt bridge could form without disrupting the binding interface, including the two original salt bridges (rbd k :hace d and rbd e :hace k ) (figure a-c). a salt bridge is the strongest type of non-covalent bond and the n k mutation could plausibly increase the number of salt bridges at the binding interface from two to three, suggesting that the n k variant may have enhanced binding for hace . to test this hypothesis, we used surface plasmon resonance (spr) to evaluate binding of recombinant n k s or rbd protein to recombinant hace . we also evaluated the n r and k v variants, each of which is found in sars-cov at these positions. 
across multiple assay formats, we found that the n k and n r variants exhibited a ~ -fold enhanced binding affinity for hace as compared to the original n variant (termed herein wt) ( figure d ). the magnitude of this enhancement was paralleled by a ~ -fold loss of binding affinity for the k v variant relative to wt. lastly, we also tested the effect of the n k/r and k v mutations in combination. these double variants form the same number of salt bridges at the hace binding interface as compared to wt, but one is at rbd position rather than ; we found they had an hace affinity similar to the wt ( figure d ). these data indicate that acquisition of the n k mutation enhances binding affinity, which could have implications in vivo in the context of natural infection. also, the enhanced affinity could plausibly compensate for other mutations that would otherwise be detrimental (e.g. k v), further highlighting the plasticity of the rbm. the enhanced hace affinity of the n k variant, its geographical emergence as independent lineages as well as its prevalence among circulating viral isolates is consistent with maintained viral fitness. we set out to directly examine fitness by evaluating clinical data and outcomes of virus carrying the n k mutation versus wt n , as well as by direct in vitro viral growth and competition. we used qpcr to evaluate viral load (as measured by cycle threshold, ct) in , scottish patients whose viral isolates had been sequenced (figures a-b ). viral isolates were either n k/d g (n= ), n /d g (n= ) or ancestral (n /d ) (n= ). our analysis found strong evidence that the n k/d g genotype was associated with marginally lower cycle threshold (ct) than the n /d g genotype (mean ct value difference between n k/d g and n /d g: - . , % ci: - . , - . ) ( figure b ). 
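to give a sense of scale for the ct difference reported above, a ct difference can be converted to an approximate fold change in template abundance. the sketch below assumes ideal pcr efficiency (a doubling per cycle), which is an assumption of this illustration and not a statement about the assays used in the study; the function name and the example value are hypothetical.

```python
def fold_change_from_delta_ct(delta_ct, amplification_factor=2.0):
    """Approximate fold change in template for a difference in cycle threshold.

    delta_ct: Ct(group A) minus Ct(group B). A negative value means group A
        crossed the threshold earlier, i.e. had more starting template.
    amplification_factor: per-cycle amplification; 2.0 corresponds to the
        idealised assumption of perfect doubling.
    """
    return amplification_factor ** (-delta_ct)

# hypothetical example: a one-cycle lower Ct under perfect doubling
print(fold_change_from_delta_ct(-1.0))  # 2.0
```

because real amplification efficiency is usually below the ideal, calibration against rna standards is the more reliable route to absolute viral load.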
as ct measurements were carried out in multiple sites, a sub-analysis of viral load using rna standards was carried out with available samples and showed a near-complete correlation with ct ( figure b ). d g has previously been associated with higher viral loads/lower ct values than d (korber et al., ) but we did not detect this difference in this statistical analysis due to the intercept of the model being imprecisely estimated (table s ) . clinical outcomes were also obtained for a subset of these patients (n= , ), who were scored for severity of disease based on oxygen requirement: . no respiratory support, : supplemental oxygen, : invasive or non-invasive ventilation or high flow nasal cannulae, : death (figures c and s b ). genotype counts for this analysis were n k/d g (n= ), d g (with n ) (n= ) or ancestral (n /d ) (n= ). analysis based on our ordinal scale indicated that the n k/d g viral genotype was associated with similar clinical outcomes compared to d g or ancestral genotypes (posterior mean: . , % ci: - . , . ) ( table s ). all other results from the severity analysis were qualitatively similar to a previous analysis of the d g mutation (volz et al., ) . these clinical data indicate that the n k virus is not attenuated. we next tested growth of two representative sars-cov- isolates, gla (wt n ) and gla (n k), both with the d g background (table s ) . culture was carried out for hours in vero e -ace cells either with or without tmprss expression. there was no significant difference between the growth of these strains after inoculation at multiplicities of infection (mois) of . and . . the n k strain replicated slightly faster early after inoculation ( figure d ). these data indicate that the n k mutation does not exhibit dominant negative effects on viral growth, and most likely supports normal replication. 
to further assess fitness for replication in cultured cells, we carried out a cross-competition assay using inoculation of cells at a matched moi followed by quantitation of n and n k by metagenomic ngs over time (figure e ). the n k strain demonstrated similar fitness as the wt n strain, with a possible fitness advantage for n k in cells expressing tmprss . taken together with the clinical outcomes, these results indicate that the n k mutation results in viral fitness that is similar or possibly slightly improved compared to the wild-type n . having established that virus carrying the n k mutation is fit, we sought to understand whether this mutation evades antibody-mediated immunity by evaluating recognition of the n k variant by monoclonal antibodies and by polyclonal immune serum from recovered individuals, including donors who were infected by the sars-cov- n k variant. . % of the tested sera showed a greater than -fold reduction in binding to n k rbd as compared to wt rbd (figures a-b and s ) . in some individuals the rbd response was diminished to low titers of < : by the n k mutation. thus, the response to the rbd is significantly influenced by the n k mutant within the immunodominant rbm domain (piccoli et al., ) in a significant portion of persons potentially immune to wt sars-cov- . the majority of sera demonstrating loss of binding were those that had overall lower responses to wt rbd, indicating lower ab titers. the sera from the six individuals known to have recovered from infection with sars-cov- n k virus showed no change in binding levels to wt rbd as compared to n k rbd (figures a-b and s ) . this may reflect a true variant-specific response or that differential binding could not be measured due to the limited number of samples analyzed. 
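the fold-reduction criterion used for the sera above can be sketched as a simple paired comparison of binding to wild-type versus variant rbd. this is a minimal illustration, not the authors' analysis: the function name, the toy titers, and the threshold in the example are assumptions, and the threshold used in the study is not reproduced here.

```python
def fraction_with_reduced_binding(wt_titers, variant_titers, fold_threshold):
    """Fraction of samples whose binding drops by more than `fold_threshold`.

    wt_titers, variant_titers: paired per-sample binding titers against
        wild-type and variant antigen (same order, same length).
    A sample with a positive wild-type titer and no detectable variant
    binding (titer <= 0) is counted as reduced.
    """
    if len(wt_titers) != len(variant_titers):
        raise ValueError("titer lists must be paired")
    if not wt_titers:
        raise ValueError("no samples provided")
    reduced = 0
    for wt, var in zip(wt_titers, variant_titers):
        if var <= 0:
            reduced += wt > 0
        elif wt / var > fold_threshold:
            reduced += 1
    return reduced / len(wt_titers)

# toy titers; the threshold of 2 is an arbitrary illustrative value
wt = [100, 100, 100, 100]
variant = [10, 50, 100, 1]
print(fraction_with_reduced_binding(wt, variant, fold_threshold=2))  # 0.5
```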
to understand our results at the level of individual antibodies, we evaluated a panel of mabs isolated from individuals recovered from sars-cov- infection early in the pandemic (likely with n wt virus) (piccoli et al., ; tortorici et al., ), as well as clinical-stage mabs regn , regn , ly-cov , and s (the parent of vir- ) (hansen et al., ; chen et al., ; pinto et al., ). . % of these mabs demonstrated a > -fold reduction of rbd binding in response to the n k mutation (figures c-d and s ). for comparison, we also evaluated the k v mutation, which eliminates one salt bridge at the rbm:hace interface, and the n k/k v double mutation. a similar percentage ( . % for k v vs . % for n k) of mabs lost > -fold binding to these variants, including several ( . %) which were not sensitive to either single mutant but were sensitive to the double mutant (figures c-d). the reduced binding of mabs to these rbd mutants was also confirmed by bio-layer interferometry (bli) analysis (figures e and s a). to define the potential biological importance of these mutations for evasion of antibody-mediated neutralization, we tested mabs against pseudoviruses expressing s variants n k, k v or n k/k v (figures f-h and s b). neutralization of pseudoviruses containing these mutations was significantly diminished for certain mabs, including some that are in clinical development. as predicted by its non-rbm epitope, s was capable of neutralizing each of these variants. sensitivity of some neutralizing mabs to mutations at these positions has also been reported in other studies (greaney et al., ; li et al., a; weisblum et al., ), but combinations of mutations have not typically been evaluated. overall, our results demonstrate that mutations compatible with viral fitness can result in immune evasion from both monoclonal and polyclonal antibody responses. the evolution of the sars-cov- rbm, a critical epitope for vaccine response and therapeutic mabs, will depend on the fitness of rbm variants.
the findings herein describe an example of a naturally-occurring rbm variant which can evade antibody-mediated immunity while maintaining fitness. fitness of this variant, n k, was demonstrated by repeated emergence by convergent evolution, spread to multiple countries and significant representation in the sars-cov- sequence databases, the fact that the n k rbd retains a high affinity interaction with the hace receptor, efficient viral replication in cultured cells, and no disease attenuation in a large cohort of infected individuals. the fitness of n k is consistent with our findings that the rbm is the most divergent region of s. this divergence indicates an ability of sars-cov- to accommodate mutations at the rbm while retaining the functional requirement of hace binding, and is likely to be linked to immune pressure from neutralizing ab responses. there is precedent for the most immunogenic region of a viral surface protein to be the fastest mutating despite harboring the receptor binding site; for example, the immunogenic globular head domain of the influenza virus hemagglutinin surface protein, which contains the sialic acid receptor binding site, evolves faster than the stalk region (doud et al., ; kirkpatrick et al., ) . the ability to accommodate mutations in the rbm indicates a high likelihood that immune-evading sars-cov- variants compatible with fitness will continue to emerge, with implications for reinfection, vaccines, and both monoclonal and polyclonal antibody therapeutics. in our profile of immune escape from the n k variant, we observed resistance to a mab currently being evaluated in clinical trials as part of a two-mab cocktail. the promise of using cocktails of mabs is that they should significantly lower the likelihood of drug-induced selection of resistant viruses . however, if circulating viral strains already carry resistant mutations to one antibody in the cocktail, this could reduce the cocktail to a monotherapy. 
additionally, considering the high level of plasticity of the rbm demonstrated in the present study, there could be many combinations of rbm mutations compatible with viral fitness while leading to immune escape. this is supported by our result that n k can compensate for a mutation (k v) that otherwise decreases receptor binding affinity ( figure d ). this particular combination of mutations is plausibly compatible with fitness as it parallels sars-cov rbm:hace interactions (salt bridge at sars-cov rbd position r and no salt bridge at v , figure a) . notably, several mabs which were not sensitive to these mutations individually were sensitive to them in combination, including the two-mab cocktail ( figure c-h) . we propose two approaches that will be critical for minimizing the impact of mab escape mutations. one is to develop mabs with epitopes that are highly resistant to viral escape. this may include epitopes outside of the rbm and/or epitopes that are crossreactive across sars-cov and sars-cov- , indicating conserved epitopes with a low tolerance for mutation wec et al., ; wrapp et al., a) . a comparison of epitopes of rbm-targeting mabs with the most conserved regions of the rbm ( figure c ) may also identify rbm mabs with a higher barrier to escape. the second approach is to screen patients, likely at the population level, for the presence of potential resistance variants prior to drug administration. the availability of multiple different mab therapeutics in the clinic could provide the opportunity to tailor the choice of therapeutic to local circulating variants. in general, given that access to therapeutic monoclonal antibodies via clinical trials and emergency use authorization is expanding, and as more people develop immune responses to the wildtype virus, monitoring the evolution of sars-cov- will be increasingly critical. 
although sars-cov- is evolving slowly and at present should be controllable by a single vaccine (dearlove et al., ), variation accumulating in the rbm could put this at risk, especially for individuals with a moderate ab response to vaccination or infection. while we only report on evasion of antibody-mediated immunity here, it would be surprising to us if similar changes are not observed to evade t cell immunity and innate immunity.
wec, a.z., wrapp, d., herbert, a.s., maurer, d.p., haslwanter, d., sakharkar, m., jangra, r.k., dieterle, m.e., lilov, a., huang, d., et al. ( ). broad neutralization of sars-related viruses by human monoclonal antibodies. science , - .
weisblum, y., schmidt, f., zhang, f., dasilva, j., poston, d., lorenzi, j.c.c., muecksch, f., rutkowska, m., hoffmann, h.-h., michailidis, e., et al. ( ). escape from neutralizing antibodies by sars-cov- spike protein variants.
( ). a new coronavirus associated with human respiratory disease in china. nature , - .
yurkovetskiy, l., wang, x., pascal, k.e., tomkins-tinch, c., nyalile, t.p., wang, y., baum, a., diehl, w.e., dauphin, a., carbone, c., et al. ( ). structural and functional analysis of the d g sars-cov- spike protein variant. cell.
zhang, l., jackson, c.b., mou, h., ojha, a., rangarajan, e.s., izard, t., farzan, m., and choe, h. ( ). the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity. https://www.biorxiv.org/content/ / v .
samples from sars-cov- infected individuals were obtained from the ticino healthcare workers cohort (switzerland), described previously (piccoli et al., ), and under study protocols approved by the local institutional review board (canton ticino ethics committee, switzerland). all donors provided written informed consent for the use of blood and blood components (such as pbmcs, sera or plasma). in the ticino region of switzerland and during the time period of collection (february-march ) no n k sars-cov- isolates were reported.
samples from six n k variant infected individuals were obtained from the isaric c consortium (https://isaric c.net/). ethical approval was given by the south central-oxford c research ethics committee in england (reference /sc/ ), and by the scotland a research ethics committee (reference /ss/ ). the study was registered at https://www.isrctn.com/isrctn . residual nucleic acid extracts derived from the nose-throat swabs of sars-cov- positive individuals whose diagnostic samples were submitted to the west of scotland specialist virology centre between rd march and th june were sequenced as part of the cog-uk consortium under study protocols approved by the relevant national biorepositories ( /ws/ nhs and /s / ) (consortiumcontact@cogconsortium.uk, ) . rbm residues were determined based on the rbd:ace complex crystal structures ajf for sars-cov (li et al., ) and m j for sars-cov- (lan et al., ) . the ajf structure was obtained from the pdb-redo server (pdb-redo.eu) and was subsequently prepared in the molecular modeling software moe (v . , https://www.chemcomp.com) using the structure preparation, protonation and energy minimization steps with default settings. rbd residues within . a distance of any ace atoms (determined using moe) were determined for each of the two copies of the complex in the asymmetric unit, and then were combined to obtain the rbm. m j was obtained from the coronavirus structural task force server (https://github.com/thornlab/coronavirus_structural_task_force) and was further refined (using refmac v . . ), manually fitted (using coot v . ) and prepared (using moe, as described above) in multiple iterative cycles. the final structure was analyzed for rbd-ace contact residues with a . a cutoff to obtain the rbm (using moe). the final list of rbm residues (figure c ) was arrived at by combining the sars-cov and sars-cov- results. 
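the distance-based definition of the rbm described above can be sketched as follows. this is a minimal illustration, not the moe workflow used in the study: the function names, the toy coordinates and the cutoff values are all hypothetical, and it assumes that residue numbers from different structures have already been mapped to a common numbering before they are combined.

```python
import math


def contact_residues(rbd_atoms, ace2_atoms, cutoff):
    """Return sorted RBD residue numbers that have any atom within `cutoff`
    (same length unit as the coordinates) of any ACE2 atom.

    rbd_atoms : iterable of (residue_number, (x, y, z))
    ace2_atoms: iterable of (x, y, z)
    """
    ace2_atoms = list(ace2_atoms)
    hits = set()
    for resnum, xyz in rbd_atoms:
        if resnum in hits:
            continue  # residue already counted via another of its atoms
        if any(math.dist(xyz, other) <= cutoff for other in ace2_atoms):
            hits.add(resnum)
    return sorted(hits)


def combine_rbm(*residue_lists):
    """Union of contact lists, e.g. from two copies of a complex or two structures."""
    return sorted(set().union(*residue_lists))


# Toy coordinates (hypothetical, not taken from any PDB entry).
rbd = [(1, (0.0, 0.0, 0.0)), (2, (20.0, 0.0, 0.0)), (3, (3.0, 0.0, 0.0))]
ace2 = [(0.0, 0.0, 4.0)]
contacts_copy_a = contact_residues(rbd, ace2, cutoff=6.0)
rbm = combine_rbm(contacts_copy_a, [3, 7])  # second list is a made-up example
```

a brute-force all-against-all distance check is adequate for a single interface; a spatial index would only matter for much larger systems.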
using moe, the pairwise binding energy between each residue in sars-cov- rbd and each residue in ace , and the total binding energy for all interactions, was determined at cutoff distances . a, . a, . a, . a, . a, . a, . a, . a and . a. the percentage of the total binding energy for each interacting rbd residue was calculated for each distance cutoff and was then averaged over all cutoffs. the resulting values are shown in green in figure c . differential accumulation of amino acid variants in the rbm, rbd or spike protein was computed taking into account only the presence or absence of a variant at any residue. each variant called present counts one. a variant is called present if there are at least x number of supporting sequences deposited in gisaid, where x varies from to . the number of variants is then normalized to the size of the domain (number of residues). dms data was retrieved from . variant-level dms scores were aggregated to residue-level by taking the minimum (most disruptive variant) or the average score across all variants of a residue, except for the reference residue and the stop codon. alternatively, minimum and average scores are computed only across variants that have been observed as naturally occurring. data were represented as a heatmap annotated with: frequency of non-reference amino acids in deposited gisaid sequences (n ≈ , , at least sequences were required to call a variant as present), in log scale; number of countries in which a variant was observed; and percentage of total binding energy computed from an x-ray crystal structure (cf. structural analysis methods section). prefusion-stabilized sars-cov- spike protein variants (residues - , containing the p and furin cleavage site mutations with a muphosphatase signal sequence and a c-terminal avi- xhis-epea-tag in a pd -v vector (atum bio) were expressed in expi f cells at °c and % co according to manufacturer's instructions (thermo fisher scientific). 
cell culture supernatant was collected after four days and purified over a ml c-tag affinity matrix (thermo fisher scientific). elution fractions were concentrated and injected on a superose increase / gl column with x pbs ph . as running buffer. sars-cov- rbd variants (residues - with a c-terminal thrombin-cleavage site-twinstrep- xhis-tag, and n-terminal signal sequence) were expressed in expi f cells at °c and % co in a humidified incubator. transfection was performed using expifectamine reagent (thermo fisher scientific). cell culture supernatant was collected three days after transfection and supplemented with x pbs to a final concentration of . x pbs ( . mm nacl, . mm kcl and . mm phosphates), or . x for rbd n r. sars-cov- rbds were purified using or ml histalon superflow cartridges (takara bio) and subsequently buffer exchanged into cytiva x hbs-n buffer or pbs. rbds from other sarbecoviruses were expressed in expi f cells at °c and % co . cells were transfected using pei max. cell culture supernatant was collected seven days after transfection. proteins were purified using a ml strep-tactin xt superflow high capacity cartridge followed by buffer exchange to pbs using hiprep / desalting columns. for s binding measurements, recombinant ace (residues - from uniprot q byf with a c-terminal thrombin cleavage site-twinstrep- xhis-ggg-tag, and nterminal signal sequence) was expressed in expi cells at °c and % co in a humified incubator. transfection was performed using expifectamine reagent (thermo fisher scientific). cell culture supernatant was collected seven days after transfection, supplemented with buffer to a final concentration of mm tris-hcl ph . , mm nacl, and then incubated with biolock solution for one hour. after filtration through a . µm filter, ace was purified using a ml streptrap hp column (cytiva) followed by isolation of the monomeric ace by size exclusion chromatography using a superdex increase / gl column pre-equilibrated in pbs (gibco - ). 
for binding measurements with surface-captured rbd, recombinant ace (residues - from uniprot q byf with a c-terminal avitag- xhis-ggg-tag, and nterminal signal sequence) was expressed in hek .sus using standard methods (atum bio). protein was purified via ni sepharose resin followed by isolation of the monomeric ace by size exclusion chromatography using a superdex increase / gl column pre-equilibrated with pbs. for binding measurements with surface-captured ace , recombinant ace (residues - with a c-terminal gs-igg a-mm-fc tag, and n-terminal signal sequence) was stably transfected in cho-k gs knock-down cell line (atum bio). protein was purified via protein a and buffer exchanged into pbs. spr binding measurements were performed using a biacore t instrument. s protein was surface captured via anti-avitag pab covalently immobilized on a cm chip, rbd protein was surface captured via streptactin xt covalently immobilized on a cm chip, and ace -mfc was surface captured via covalent immobilization of the cytiva mouse antibody capture kit on a c chip. running buffer was cytiva hbs-ep+ (ph . ) and all measurements were performed at °c. all experiments were performed as singlecycle kinetics, with a -fold dilution series of monomeric ace starting from nm, each concentration injected for sec, or a -fold dilution series of rbd starting from nm, each concentration injected for sec. all data were double reference-subtracted and fit to a binding model using biacore evaluation software. for one representative replicate, capture levels were normalized to wt for visualization. binding data with ace as analyte were fit to a : binding model. 
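as an illustration of what a one-site binding fit estimates, the sketch below writes out the standard langmuir relations between the rate constants, the dissociation constant and the sensor response, together with a helper that builds a dilution series. it is a generic textbook model under stated assumptions, not the biacore evaluation software; all function names and the numerical values used in the example are hypothetical.

```python
import math


def dilution_series(top_conc, fold, n_points):
    """Concentrations obtained by repeated `fold`-fold dilution of `top_conc`."""
    return [top_conc / fold**i for i in range(n_points)]


def kd_from_rates(k_on, k_off):
    """Equilibrium dissociation constant K_D = k_off / k_on."""
    return k_off / k_on


def equilibrium_response(conc, k_d, r_max):
    """Steady-state response for one-site binding: R_eq = R_max * C / (K_D + C)."""
    return r_max * conc / (k_d + conc)


def association_response(t, conc, k_on, k_off, r_max):
    """Response during analyte injection, starting from an empty surface:
    R(t) = R_eq * (1 - exp(-(k_on * C + k_off) * t))."""
    r_eq = equilibrium_response(conc, kd_from_rates(k_on, k_off), r_max)
    return r_eq * (1.0 - math.exp(-(k_on * conc + k_off) * t))


# Hypothetical example: analyte at K_D gives half-maximal response at equilibrium.
concs = dilution_series(100.0, 2, 4)
half_max = equilibrium_response(10.0, 10.0, 200.0)
```

in single-cycle kinetics the successive injections are fit jointly, so a real fit also has to carry the bound level over from one injection to the next; the sketch leaves that out.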
binding data with rbd as analyte were fit to a heterogeneous ligand binding model, due to an artifactual kinetic phase with very slow dissociation that arises when rbd is an analyte; the lower-affinity kd of the two returned by the fit is reported as the kd of the rbd-ace interaction (the two kds are separated by at least two orders of magnitude for all fits). the measured kd for ace binding to s is likely influenced by conformational dynamics of the rbds in the context of the prefusion s trimer. reported kds are an average of - replicates measured on at least two separate days, with error given as sem. a national sequencing collaboration formed at the start of the epidemic in the uk, the cog-uk consortium (consortiumcontact@cogconsortium.uk, ), has facilitated the tracking of sars-cov- sequences across scotland since the start of the outbreak in february ( , sequences by oct , ) and real-time monitoring of genetic changes in the spike gene that might be associated with changes in virulence or transmissibility. sequencing was carried out using an amplicon-based protocol in real-time at a rate of up to genomes per week. % of samples were selected as surveillance samples, representing scottish health boards proportionately based on population size, while % were selected to allow intervention in local issues such as nosocomial infection in hospitals and nursing homes. a gradual increase in the prevalence of the n k polymorphism was noted during april . it was particularly common in the greater glasgow & clyde nhs health board region but also spread to adjacent scottish health boards. sequencing libraries were prepared according to the artic ncov- protocol, described in detail at https://artic.network/ncov- . briefly, pcr amplicons were generated using the ncov- primalseq sequencing primers with - cycles of amplification. generated amplicons were used to prepare either oxford nanopore or illumina sequencing libraries.
oxford nanopore libraries were prepared as described in the link above and sequenced in a flow cell r . . (oxford nanopore technologies, part number flo-min d), using minknow version . . . raw fast files were basecalled using guppy version . . in high accuracy mode with a minimum quality score of . reads were size filtered, demultiplexed and trimmed with porechop (https://github.com/rrwick/porechop), and mapped against reference strain wuhan-hu- (mn ). variants were called using nanopolish . . and accepted if they had a log-likelihood score greater than and minimum read coverage of . for illumina sequencing, amplicons were used to prepare libraries using the kapa hyperprep kit (kapa biosystems, part number kk ) and further processed as described in the competition assay sequencing method. sequencing was carried out on illumina's miseq system (illumina, part number sy- - ) using a miseq reagent v cycle kit (illumina, part number ms- - ). reads were trimmed with trim_galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and mapped with bwa (li and durbin, ) to the wuhan-hu- (mn ) reference sequence, followed by primer trimming and consensus calling with ivar (grubaugh et al., ) and a minimum read coverage of . uk sequences were obtained from the cog-uk consortium, https://www.cogconsortium.uk. global sequences were obtained from the gisaid initiative, https://www.gisaid.org on oct . the sequences were mapped using minimap and padded against the wuhan/wh / reference. the sequences were downsampled with weights that normalise sequence count per epiweek, maximise the number of countries and lineages represented, and enrich for sequences with the n k mutation. a maximum-likelihood phylogenetic tree was constructed using iq-tree with the following parameters: -czb -blmin . -m hky --runs and all other parameters set to default. the tree was visualised with custom python code using the baltic library, https://github.com/evogytis/baltic.
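the weighted downsampling mentioned above is not specified in detail, so the following is only a sketch of one way such weights could be built. it covers two of the stated goals (normalising counts per epiweek and enriching for the mutation of interest) and ignores the country and lineage criteria; the record fields, the function name and the enrichment factor are hypothetical.

```python
from collections import Counter


def sampling_weights(records, enrich_factor=5.0):
    """Return one sampling weight per record.

    Each record is a dict with keys 'epiweek' and 'has_mutation'.
    The base weight is 1 / (number of sequences in the same epiweek), so every
    epiweek carries the same total base weight; records carrying the mutation
    are up-weighted by `enrich_factor` (an arbitrary illustrative value).
    """
    per_week = Counter(r["epiweek"] for r in records)
    weights = []
    for r in records:
        w = 1.0 / per_week[r["epiweek"]]
        if r["has_mutation"]:
            w *= enrich_factor
        weights.append(w)
    return weights


# Hypothetical records: two sequences in epiweek 1, one in epiweek 2.
records = [
    {"epiweek": 1, "has_mutation": False},
    {"epiweek": 1, "has_mutation": True},
    {"epiweek": 2, "has_mutation": False},
]
weights = sampling_weights(records)
```

the weights could then drive any weighted sampler; note that `random.choices` in the python standard library samples with replacement, so a without-replacement scheme would need a different routine.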
for the phylodynamic analysis, scottish "introduction" lineages were identified (lycett et al., , in prep and see http://sars .cvr.gla.ac.uk/risefallscotcovid), and the skygrowth package in r was used to estimate the effective population size over time, and the growth rate of the lineage within scotland (volz and frost, ) . clinical samples submitted to the west of scotland specialist virology centre for sars-cov- diagnostic rt-pcr testing were selected for sequencing as part of the covid- uk genomics uk consortium (cog-uk) project, resulting in whole genome sequences originating from the nhs greater glasgow and clyde health board region. sequences were linked to electronic patient records and basic metadata including sample date, age, sex, admission to hospital and mortality at days post diagnosis extracted. the electronic patient records of a subset of patients underwent full casenote review and clinical severity was recorded based on a -level ordinal scale: . no requirement for respiratory support, . treatment with supplemental oxygen via facemask or low-flow nasal cannulae, . intubation and ventilation, non-invasive ventilation or oxygen delivery by high flow nasal cannulae devices, . death within the days following diagnosis. we modified the who ordinal scale to these points as described previously (volz et al., ) to avoid using hospitalisation as a criterion of severity because ) many patients in nursing homes had severe infection but were not admitted to hospital, and ) early in the outbreak, all cases were hospitalised irrespective of the severity of their infection. these data had previously been analysed to test for an effect of the d g mutation on the severity of disease (volz et al., ) ; we extend that analysis here using the same methodology to test for an effect of the n k mutation. 
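the ordinal severity scale described above can be written as a small scoring function. this is a sketch only: the integer codes 0 to 3, the field names and the rule that death takes precedence over the level of respiratory support are assumptions made for illustration, since the text lists the categories in order but the extraction does not preserve their numeric labels.

```python
# Hypothetical integer codes for the four ordered categories.
NO_SUPPORT = 0
SUPPLEMENTAL_OXYGEN = 1
VENTILATION_OR_HIGH_FLOW = 2
DEATH = 3

_SUPPORT_LEVELS = {
    "none": NO_SUPPORT,
    "supplemental_oxygen": SUPPLEMENTAL_OXYGEN,   # facemask or low-flow nasal cannulae
    "ventilation_or_high_flow": VENTILATION_OR_HIGH_FLOW,  # invasive, non-invasive or high flow
}


def severity_level(respiratory_support, died_within_window):
    """Map two casenote fields to a single ordinal severity level.

    respiratory_support: one of the keys of _SUPPORT_LEVELS (maximum support received).
    died_within_window : True if the patient died within the follow-up window.
    """
    if died_within_window:
        return DEATH  # assumed to override the respiratory support category
    try:
        return _SUPPORT_LEVELS[respiratory_support]
    except KeyError:
        raise ValueError(f"unknown respiratory support category: {respiratory_support!r}")
```

because the scale does not use hospital admission, a nursing home resident who received supplemental oxygen and survived is scored the same way as a hospitalised patient with the same support.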
additionally, we perform a new analysis using a model with the same structure to test for an effect of both the d g mutation and the d g/n k mutation combination on the viral load of infected patients, as measured by cycle threshold value. in both cases we cannot estimate the marginal effect of the n k mutation, as we only have the mutation on the g genetic background, so the individual effect of n k cannot be separated from any potential epistatic interactions between the mutations. briefly, the structure of the model used previously (volz et al., ) and in the present study is a phylogenetic generalised additive model with mutation being the primary predictor of interest. the model controls for biological sex, age and the number of days since the first reported case in the dataset, with the latter two being included as penalised splines with a maximum of knots. if the patient was part of a cluster of cases, this was included as a random effect, with individuals not part of clusters being assigned their own levels. correlations driven by the rest of the genome are controlled for by a phylogenetic random effect using a correlation matrix generated under a brownian motion assumption from a phylogeny estimated in iq-tree v. . . (minh et al., ) using a hky + Γ model, masking the positions recommended by de maio et al. as of / / (https://virological.org/t/issues-with-sars-cov- -sequencing-data/ / ), rooted on the first sequenced sars-cov- genome (wu et al., ) . the priors for the severity model were those used in the previous analysis of this data. the priors for the model of the viral load were a student-t (mean = , scale = , degrees of freedom = ) prior on the model intercept, a gaussian (mean = , standard deviation = ) prior over the fixed effects, and an exponential (lambda = . ) prior over the random effect, penalised spline and residual standard deviations. 
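the phylogenetic random effect described above relies on a correlation matrix derived from the tree under brownian motion. under that assumption the covariance between two tips is proportional to the branch length they share between the root and their most recent common ancestor. the sketch below illustrates this on a toy tree; the path-based tree encoding, the edge names and the function name are hypothetical and are not how the study's code represents the phylogeny.

```python
import math


def bm_correlation(paths):
    """Brownian-motion correlation matrix between tips of a rooted tree.

    paths: dict mapping tip name -> list of (edge_id, branch_length) on the
           path from the root to that tip. Every tip must have a non-zero
           root-to-tip distance.
    Returns (tips, matrix) with rows and columns ordered as in `tips`.
    """
    tips = sorted(paths)

    def shared_length(a, b):
        edges_b = dict(paths[b])
        return sum(length for edge, length in paths[a] if edge in edges_b)

    cov = {(a, b): shared_length(a, b) for a in tips for b in tips}
    matrix = [
        [cov[a, b] / math.sqrt(cov[a, a] * cov[b, b]) for b in tips]
        for a in tips
    ]
    return tips, matrix


# Toy tree ((A:1,B:1):1,C:2) with made-up edge identifiers.
toy_paths = {
    "A": [("root_AB", 1.0), ("A", 1.0)],
    "B": [("root_AB", 1.0), ("B", 1.0)],
    "C": [("C", 2.0)],
}
tips, corr = bm_correlation(toy_paths)
```

in the toy tree, a and b share half of their root-to-tip path and so have correlation 0.5, while c shares no branch with either and is uncorrelated with both.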
there are two key structural differences between the model used previously (volz et al., ) and the model used here. firstly, mutation is a three level rather than two level factor (d /n , d g/n and d g/n k) with the ancestral d /n being the reference level. secondly, as we are now interested in two mutations, we estimated the phylogeny used to control for the effect of the rest of the genome excluding both the nucleotide position underlying the d g mutation and the nucleotide position underlying the n k mutation (in addition to the sites from de maio et al mentioned above). the severity model used a cumulative error structure while the model on the ct values used a gaussian error structure. in both cases, the models were estimated in brms v. . . (bürkner, ) . the presented models had no divergent transitions, rhat values less than . , and appropriate bulk and tail effective sample sizes for all parameters. shortest probability intervals were calculated using the r package spin v. . (liu et al., ) . analysis code is available at https://github.com/dpascall/sars-cov- -mutationanalysis. all samples were tested in duplicate using the -ncov_n assay rt-qpcr assay (https://www.fda.gov/media/ /download). ready-mixed primers and probe were obtained from idt (leuven, belgium). pcr was carried out using neb luna universal probe one-step reaction mix and enzyme mix (new england biolabs, herts, uk), primers and probe at nm and . nm, respectively, and µl of rna sample in a final volume of µl. no template negative controls were included after every seventh sample. six ten-fold dilutions of sars-cov- rna standards were tested in duplicate in each assay; standards were calibrated using a plasmid containing the n sequence that had been quantified using droplet digital pcr. thermal cycling was performed on an applied biosystems™ fast pcr instrument running sds software v . 
(thermofisher scientific) under the following conditions: °c for minutes and °c for minute followed by cycles of °c for s and °c for minute. assays were repeated if the reaction efficiency was < % or the r value of the standard curve was ≤ . . where possible, testing of samples was repeated if the %cv of the duplicates was < %. veroe -ace cells (veroe cells induced to overexpress ace ) either with or without tmprss overexpression (rhin et al., under review) were seeded in a well plate and inoculated at an moi of . with either the gla (n /d g) or gla (n k/d g) virus isolates for hr before washing the cells three times in pbs and replacing with % dmem. ul of media was removed at each timepoint, rna was extracted, and the presence of sars-cov- was determined using -ncov-n assays (idt) with an neb luna universal probe one-step rt-qpcr kit. a standard curve was used to determine the copy number present per ml of cell culture media. ul of the fresh media was also tested for the presence of virus, which was undetectable in all wells. three t flasks were seeded with veroe -ace or veroe -ace -tmprss and inoculated with either single viruses or both gla and gla virus strains at an moi of . for hr. the flasks were washed three times with pbs, with ul of the final wash being retained to determine the presence of free virus, before adding ml of fresh % dmem. at , , and hrs, ul of media was removed, which was replaced with ul fresh media. ul was used for rna extraction and ngs analysis of the variant frequencies at the specific positions within the spike protein. the single virus inoculations showed no alterations in the frequency of the amino acid positions, and the final wash showed no free virus in the supernatant. we used an unbiased metagenomic ngs pipeline to quantify variation across the whole viral genome on the illumina nextseq platform.
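the competition assay readout is the relative frequency of the two alleles at a spike position at each time point. a minimal sketch of that calculation from per-allele read counts is given below; the function names, the allele labels and the counts are hypothetical, and the sketch assumes reads have already been mapped and tallied at the position of interest.

```python
def variant_frequency(counts, variant):
    """Fraction of reads supporting `variant` at one position.

    counts: dict mapping allele label -> read count at that position.
    Returns None when there is no coverage.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return counts.get(variant, 0) / total


def frequency_over_time(counts_by_timepoint, variant):
    """Apply variant_frequency to each time point (keys are time labels)."""
    return {t: variant_frequency(c, variant) for t, c in counts_by_timepoint.items()}


# Made-up read counts for a mixed infection sampled at two time points.
toy = {"t1": {"N": 600, "K": 400}, "t2": {"N": 450, "K": 550}}
k_freq = frequency_over_time(toy, "K")
```

a rising frequency of one allele across time points in the mixed infection would indicate a replication advantage under the culture conditions used.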
briefly, extracted nucleic acid was incubated with dnasei (thermo fisher, part number am ) followed by cdna synthesis using superscript iii (thermo scientific, part number ) and nebnext ultra ii non-directional rna second strand synthesis module (new england biolabs, part number e l). samples were further processed using the kapa ltp library preparation kit for illumina platforms (kapa biosystems, part number kk ) and indexed with the nebnext multiplex oligos for illumina unique dual index primer pairs (new england biolabs, part number e s). libraries were sequenced on illumina's nextseq system (illumina, part number sy- - ), generating million pairs of reads per sample. human mabs were isolated from plasma cells or memory b cells of sars-cov or sars-cov- immune donors, as previously described (corti et al., ; pinto et al., ; tortorici et al., ). ly-cov mab was obtained from eli lilly and company. regn and regn mabs were produced recombinantly based on published sequences (hansen et al., ). a total of human monoclonal antibodies or human sera were tested for binding to rbd wt and mutants. spectraplate- plates with high protein binding treatment (custom made from perkin elmer) were coated overnight at °c with . µg/ml (for mabs) or µg/ml (for sera) sars-cov- rbd wt, n k, k v or n k/k v in phosphate-buffered saline (pbs), ph . . plates were subsequently blocked with blocker casein % supplemented with . % tween (sigma-aldrich) for h at room temperature. the coated plates were incubated with serial dilutions of the monoclonal antibodies or of the sera for h at room temperature. the plates were then washed with pbs containing . % tween- (pbs-t), and alkaline phosphatase-goat anti-human igg (southern biotech) was added and incubated for h at room temperature. after washing steps with pbs-t, p-nitrophenyl phosphate (pnpp, sigma-aldrich) substrate was added and incubated for min at room temperature. the absorbance at nm was measured with a microplate reader (biotek).
fitting was performed using a -parameter logistic ( pl) model, yielding dose-response curves from which the area under the curve (auc) between and ng/ml was computed. the auc captures, in a single metric, shifts of interest in two parameters of the pl model: ec and upper asymptote. bli binding measurement was performed on a selection of human monoclonal antibodies tested by elisa. antibodies were diluted to . µg/ml in kinetic buffer (pbs supplemented with . % bsa) and immobilized on protein a biosensors of an octet red system (fortébio). antibody-coated biosensors were incubated for min with a solution containing µg/ml of sars-cov- rbd wt, n k, k v or n k/k v in kinetic buffer. a dissociation step was then performed by incubating the biosensors for min in kinetic buffer. changes in the number of molecules bound to the biosensors caused a shift in the interference pattern that was recorded in real time and plotted using graphpad prism software. replication-defective vsv pseudoviruses (takada et al., ) expressing sars-cov- spike protein were generated as previously described (riblett et al., ) with some modifications. plasmids encoding sars-cov- spike variants were generated by site-directed mutagenesis of the wild-type plasmid, pcdna . (+)-spike-d (giroglou et al., ). lenti-x™ t cells (takara, ) were seeded in -cm dishes at a density of e cells/cm and the following day transfected with µg of spike expression plasmid with transit-lenti (mirus, ) according to the manufacturer's instructions. one day post-transfection, cells were infected with vsv-luc (vsv-g) (kerafast, eh -pm) for h, rinsed three times with pbs, then incubated for an additional h in complete media at °c. the cell supernatant was clarified by centrifugation, filtered ( . µm), aliquoted, and frozen at - °c. vero e cells (atcc crl- ) were seeded into clear bottom white well plates (costar, ) at a density of e cells per well.
the next day, mabs were serially diluted in pre-warmed complete media, mixed at a : ratio with pseudovirus and incubated for h at °c in round bottom polypropylene plates. media from cells was aspirated and µl of virus-mab complexes were added to cells and then incubated for h at °c. an additional µl of pre-warmed complete media was then added on top of complexes and cells were incubated for an additional - h. conditions were tested in duplicate wells on each plate and at least six wells per plate contained uninfected, untreated cells (mock) and infected, untreated cells ('no mab control'). virus-mab-containing media was then aspirated from cells and ml of a : dilution of bio-glo (promega, g ) in pbs was added to cells. plates were incubated for mins at room temperature and then were analyzed on the envision plate reader (perkinelmer). the average rlu value of the mock wells was subtracted from the relative light units (rlus) of infected wells (background subtraction), and the result was normalized to the average of the background-subtracted 'no mab control' rlu values within each plate. percent neutralization was calculated by subtracting from the normalized mab infection condition. data were analyzed and visualized with prism (version . . ). ic curves were calculated from the interpolated value from the log(inhibitor) vs. response -variable slope (four parameters) nonlinear regression with an upper constraint of < . each neutralization experiment was conducted on three independent days. . dms score is the binding or expression fold change over wt on a log scale. aggregated dms data are shown for each residue by taking the minimum (most disruptive variant) or the average score across all possible variants of a residue, except for the reference residue and the stop codon ('mutagenesis' columns). alternatively, minimum and average scores are computed only across variants that have naturally occurred ('observed variants' columns).
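the residue-level aggregation of dms scores described here can be sketched as below. this is an illustrative reading of the text rather than the authors' code: the function name, the use of '*' for the stop codon and the example scores are assumptions.

```python
def aggregate_dms(scores, reference, observed=None):
    """Aggregate variant-level DMS scores for one residue.

    scores   : dict mapping amino acid (or '*' for stop) -> score.
    reference: the reference (wild-type) amino acid at this residue.
    observed : optional set of amino acids seen in natural sequences; when
               given, only those variants are aggregated.
    Returns (minimum, mean), or None if no variant qualifies.
    """
    kept = [
        score
        for aa, score in scores.items()
        if aa != reference and aa != "*" and (observed is None or aa in observed)
    ]
    if not kept:
        return None  # no qualifying variant, e.g. none observed in nature
    return min(kept), sum(kept) / len(kept)


# Made-up scores for a single residue with reference amino acid N.
toy_scores = {"N": 0.0, "K": -0.5, "Y": -1.5, "*": -4.0}
all_variants = aggregate_dms(toy_scores, reference="N")
natural_only = aggregate_dms(toy_scores, reference="N", observed={"K"})
```

taking the minimum highlights the single most disruptive substitution at a residue, whereas the mean reflects overall tolerance to mutation, which is why both are reported side by side.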
when no natural variants have been observed, cells are grey. the heatmap is annotated with frequency of non-reference amino acids in deposited sequences (at least sequences were required to call a variant), in log scale; number of countries in which a variant was observed; and percentage of total binding energy between rbd and hace computed from an x-ray crystal structure. data were sorted on the leftmost dms column. legend on next page (h) correlation of elisa-binding fold change and neutralization fold change for each variant relative to wt (where a smaller elisa auc and therefore a smaller ratio represents loss of binding, and a larger ic and therefore a larger ratio represents loss of neutralization) a rbm rbd table s . details of the sarbecovirus sequences used for figure s . the top sequences shaded in gray were used for the similarity plot and all sequences were used for the entropy plot. parameter estimates on the link scale from the model estimating the impact of the n k mutation on the ct value of patients infected with sars-cov- in scotland. credible intervals represent % the shortest posterior density intervals. the difference between d g/n and d g/n k was estimated by direct subtraction of the hamiltonian monte carlo samples of the d g/n k estimate from the d g/n estimate. ct value did not appear strongly correlated with biological sex or age after controlling for the other factors. patients infected with related viral genomes had correlated ct values at testing potentially implying that there are other undescribed mutations in the genome that are affecting the viral load. parameter estimates on the link scale from the model estimating the impact of the n k mutation on the severity of infection of patients infected with sars-cov- in scotland. credible intervals represent % the shortest posterior density intervals. thresholds correspond to the positions of the boundaries between the different severity classes. 
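the two posterior summaries mentioned in these table legends, a draw-by-draw difference between two estimates and a shortest posterior density interval, can be sketched as below. this is a plain-python illustration under stated assumptions, not the brms workflow used in the study: the function names, the example draws, and the `mass` value are hypothetical, and the credible mass is left as a parameter because the percentage is not recoverable from the text.

```python
import math


def shortest_interval(samples, mass):
    """Return the narrowest interval containing a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # slide a window of k consecutive sorted samples and keep the narrowest one
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    start = widths.index(min(widths))
    return xs[start], xs[start + k - 1]


def posterior_difference(draws_a, draws_b):
    """Draw-by-draw difference a - b between two posterior sample vectors."""
    if len(draws_a) != len(draws_b):
        raise ValueError("both estimates need the same number of draws")
    return [a - b for a, b in zip(draws_a, draws_b)]


# hypothetical posterior draws for two lineage effects (not values from the study)
draws_parent = [0.9, 1.1, 1.0, 1.3, 0.7]
draws_variant = [0.4, 0.5, 0.7, 0.6, 0.2]

difference = posterior_difference(draws_parent, draws_variant)
interval = shortest_interval(difference, mass=0.9)  # mass chosen for illustration
```

subtracting the draws pairwise keeps the joint uncertainty of the two estimates, which would be lost if only their point estimates were subtracted.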
amino acid change gene mutation gla c t nsp p l c t s d g a g e v a a t t c gla c t nsp p l c t nsp v a t c s n k c a s d g a g orf v f g t table s nucleotide differences between gla and gla . snps determined by cov-glue on consensus sequences relative to wuhan-hu- (nc_ . ). antibody cocktail to sars-cov- spike protein prevents rapid mutational escape seen with individual antibodies evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic advanced bayesian multilevel modeling with the r package brms an integrated national scale sars-cov- genomic surveillance network sars-cov- neutralizing antibody ly-cov in outpatients with covid- a neutralizing antibody selected from plasma cells that binds to group and group influenza a hemagglutinins genomic epidemiology of sars-cov- spread in scotland highlights the role of european travel in covid- emergence a sars-cov- vaccine candidate would likely match all currently circulating variants how single mutations affect viral escape from broad and narrow antibodies to h influenza hemagglutinin safety and immunogenicity of the chadox ncov- vaccine against sars-cov- : a preliminary report of a phase / , single-blind, randomised controlled trial retroviral vectors pseudotyped with severe acute respiratory syndrome coronavirus s protein complete mapping of mutations to the sars-cov- spike receptor-binding domain that escape antibody recognition an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar sars-cov- d g variant exhibits enhanced replication ex vivo and earlier transmission in vivo d g mutation of sars-cov- spike protein enhances viral infectivity an mrna vaccine against sars-cov- -preliminary report neutralizing antibodies against sars-cov- and other human coronaviruses ly-cov , a rapidly isolated potent neutralizing antibody, provides protection in a non-human primate model of sars-cov- infection phase - trial of a sars-cov- 
recombinant spike protein nanoparticle vaccine the influenza virus hemagglutinin head evolves faster than the stalk domain tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structure of sars coronavirus spike receptorbinding domain complexed with receptor fast and accurate short read alignment with burrows-wheeler transform the impact of mutations in sars-cov- spike on viral infectivity and antigenicity emergence of sars-cov- through recombination and strong purifying selection transmission dynamics and evolutionary history of -ncov simulation-efficient shortest probability intervals natural selection in the evolution of sars-cov- in bats, not humans rapid implementation of sars-cov- sequencing to investigate cases of health-care associated covid- : a prospective genomic surveillance study iq-tree : new models and efficient methods for phylogenetic inference in the genomic era refmac for the refinement of macromolecular crystal structures mapping neutralizing and immunodominant sites on the sars-cov- spike receptor-binding domain by structure-guided high-resolution serology cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody a dynamic nomenclature proposal for sars-cov- lineages to assist genomic epidemiology a haploid genetic screen identifies heparan sulfate proteoglycans supporting rift valley fever virus infection convergent antibody responses to sars-cov- in convalescent individuals coronavirus rna proofreading: molecular basis and therapeutic targeting isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model structural basis of receptor recognition by sars-cov- suptavumab for the prevention of medically attended respiratory syncytial virus infection in preterm infants deep mutational scanning of sars-cov- receptor binding domain reveals constraints on folding 
and ace binding a system for functional analysis of ebola virus glycoprotein ultrapotent human antibodies protect against sars-cov- challenge via multiple mechanisms scalable relaxed clock phylogenetic dating evaluating the effects of sars-cov- spike mutation d g on transmissibility structure, function, and antigenicity of the sars-cov- spike glycoprotein
top - pairwise similarity to sars-cov- (sliding window size of amino acids) for seven related sarbecoviruses (see figure key) across the rbd region of the spike protein. bottom - site-specific entropy plot across the rbd protein alignment of sars-cov- and related viruses (data s ). entropy for each position l (h(l)) was calculated using shannon's entropy formula with a natural log as $H(l) = -\sum_{i} f(i,l) \ln f(i,l)$, where $f(i,l)$ is the frequency of amino acid $i$ at position $l$. sites constituting the rbm are annotated in blue. the x-axis refers to absolute positions in the sars-cov- spike protein sequence. right - box plot of site-specific entropy values for the rbm sites (blue) and remaining non-rbm rbd sites (gray). sequence alignment (left) and identity for rbm and rbd (right) to sars-cov- of the rbd sequences showing binding to hace . rbm residues indicated by blue boxes. (c) binding of hace to human, pangolin and bat sarbecovirus rbds by bli. bat cov ratg
we thank all scottish nhs virology laboratories who provided samples for sequencing and scott arkison for hpc maintenance. we thank chiara silacci-fregni from humabs biomed, sandra jovic, blanca fernandez rodriguez and federico mele from the institute for research in biomedicine in bellinzona, and tatiana terrot from ente ospedaliero cantonale in lugano for help in collecting sera samples. we thank cindy ng for help with protein production. we thank julia di iulio for help with analyzing gisaid sequences.
we gratefully acknowledge the authors, originating and submitting laboratories of the sequences from gisaid, https://www.gisaid.org, on which much of this research is based.the isaric who ccp-uk study protocol is available at https://isaric c.net/protocols; study registry https://www.isrctn.com/isrctn . this work uses data provided by patients and collected by the nhs as part of their care and support #datasaveslives. we are grateful to the frontline nhs clinical and research staff and volunteer medical students who collected the data in challenging circumstances; and the generosity of the participants and their families for their individual contributions in these difficult times. we also acknowledge the support of jeremy j farrar, nahoko shindo, devika dixit, nipunie rajapakse, lyndsey key: cord- -geb esu authors: bukreyeva, natalya; mantlo, emily k.; sattler, rachel a.; huang, cheng; paessler, slobodan; zeldis, jerry title: the impdh inhibitor merimepodib suppresses sars-cov- replication in vitro date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: geb esu the ongoing covid- pandemic continues to pose a major public health burden around the world. the novel coronavirus, severe acute respiratory syndrome coronavirus (sars-cov- ), has infected over one million people worldwide as of april, , and has led to the deaths of nearly , people. no approved vaccines or treatments in the usa currently exist for covid- , so there is an urgent need to develop effective countermeasures. the impdh inhibitor merimepodib (mmpd) is an investigational antiviral drug that acts as a noncompetitive inhibitor of impdh. it has been demonstrated to suppress replication of a variety of emerging rna viruses. we report here that mmpd suppresses sars-cov- replication in vitro. after overnight pretreatment of vero cells with μm of mmpd, viral titers were reduced by logs of magnitude, while pretreatment for hours resulted in a -log drop. 
the effect is dose-dependent, and concentrations as low as . μm significantly reduced viral titers when the cells were pretreated prior to infection. the results of this study provide evidence that mmpd may be a viable treatment option for covid- . drugs with a history of being tested in human patients or used for treatment of other conditions offer the most expedient option, and several such drugs are currently being tested for efficacy against sars-cov- , including many broad-spectrum antivirals. one such antiviral, merimepodib (mmpd), has already been tested against hepatitis c in patients as well as against many emerging rna viruses in cell culture, including zika, ebola, lassa, junin, and chikungunya viruses ( ) . mmpd noncompetitively inhibits inosine- '-monophosphate dehydrogenase (impdh), an enzyme responsible for de novo synthesis of guanosine nucleotides ( , ) .
in vitro, inhibition can be reversed by addition of exogenous guanosine ( ). early work suggests that impdh may directly interact with sars-cov- nsp , perhaps indicating that drugs targeting impdh such as mmpd might have an impact on viral replication ( ) . this drug, while still investigational, is considered to be safe in humans. more than patients have received the drug in phase i and phase ii clinical trials. many were treated for six months. in this study, we aimed to examine whether mmpd may be active in reducing sars-cov- replication. vero cells (ccl- , atcc) were maintained in dulbecco's modified eagle medium (hyclone) supplemented with % fbs (gibco) and % penicillin and streptomycin solution (hyclone). we also tested t- , another broad-spectrum antiviral that acts as a nucleoside analog. both overnight and -hour pretreatment with high doses ( - μm) of t- failed to inhibit sars-cov- replication. overall, our results indicate that therapeutic concentrations of mmpd, but not t- , are effective in reducing sars-cov- titers. our results show that mmpd can inhibit sars-cov- replication at low concentrations. this is likely due to its inhibition of impdh, leading to a depletion of guanosine for use by the viral polymerase during replication. by contrast, t- has been reported to weakly inhibit impdh, instead acting as a nucleotide analogue and interacting more specifically with certain viral polymerases ( ) . further work is needed to characterize the full mechanism behind mmpd inhibition of sars-cov- as well as its efficacy in animal models of coronavirus infection. in covid- infection it is essential to minimize the spread of the viral infection to the lower respiratory tract. we chose to pretreat uninfected cells with mmpd in order to see if viral spread can be limited. this approach potentially allows us to model the use of this drug for prophylaxis.
this is important since a large proportion of exposed patients can experience rapid expansion of their viral burden while being asymptomatic. a drug that thwarts this viral expansion will allow the immune system to eliminate the nascent infection. since mmpd is a host directed therapy and not a direct acting antiviral, the likelihood of emergence of resistant variants of the virus to mmpd is low. potentially mmpd might be used in combination with direct antivirals or immunomodulators. the concentrations tested in this study are easily clinically-achievable in human patients. mg mmpd administered orally results in plasma concentrations of around ng/ml ( μm) shortly after administration ( ) . mmpd may therefore be a viable treatment option for covid- that can be quickly tested and deployed. merimepodib, an impdh inhibitor, suppresses replication of zika virus and other emerging viral pathogens imp dehydrogenase: structure, mechanism, and inhibition the structure of inosine ′-monophosphate dehydrogenase and the design of novel inhibitors broad-spectrum antiviral activity of the imp dehydrogenase inhibitor vx- : a comparison with ribavirin and demonstration of antiviral additivity with alpha interferon a sars-cov- -human protein-protein interaction map reveals drug targets and potential drug-repurposing potent antiviral activities of type i interferons to sars-cov- infection mechanism of action of t- against influenza virus merimepodib, pegylated interferon, and ribavirin in genotype chronic hepatitis c pegylated interferon and ribavirin nonresponders key: cord- -wgtatr v authors: joshi, madhvi; puvar, apurvasinh; kumar, dinesh; ansari, afzal; pandya, maharshi; raval, janvi; patel, zarna; trivedi, pinal; gandhi, monika; pandya, labdhi; patel, komal; savaliya, nitin; bagatharia, snehal; kumar, sachin; joshi, chaitanya title: genomic variations in sars-cov- genomes from gujarat: underlying role of variants in disease epidemiology date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: wgtatr v humanity has seen numerous pandemics during its course of evolution. the list includes many such as measles, ebola, sars, mers, etc. latest edition to this pandemic list is covid- , caused by the novel coronavirus, sars-cov- . as of th july , covid- has affected over million people from + countries, and , , deaths. genomic technologies have enabled us to understand the genomic constitution of the pathogens, their virulence, evolution, rate of mutations, etc. to date, more than , virus genomes have been deposited in the public depositories like gisaid and ncbi. while we are writing this, india is the rd most-affected country with covid- with . million cases, and > deaths. gujarat is the fourth highest affected state with . percent death rate compared to national average of . percent. here, sars-cov- genomes from across gujarat have been sequenced and analyzed in order to understand its phylogenetic distribution and variants against global and national sequences. further, variants were analyzed from diseased and recovered patients from gujarat and the world to understand its role in pathogenesis. from missense mutations, found from gujarat sars-cov- genomes, c t, deleterious mutation in nucleocapsid (n) gene was found to be significantly associated with mortality in patients. the other significant deleterious variant found in diseased patients from gujarat and the world is g t, which is located in orf a and has a potential role in viral pathogenesis. sars-cov- genomes from gujarat are forming distinct cluster under gh clade of gisaid. detailed mutation frequency profile is provided as supplemental table s . with reference to indian genomes, g t, c t, c a, c t and c t were found to be occurring at more than % frequencies (p-value < . ). from these mutations, g t, c t and c a were found to be missense mutations. g t and c a lie in the region of orf a encoding nsp . 
further deceased versus recovered patient mutation profile analysis of the known patient's status dataset from gujarat and global is represented in figure . similarly, comparison of missense mutation profile of deceased verses recovered patients with genome count, frequency > %, and p-value for global dataset is represented in improve the adaptive behaviour of pathogenic species, thus, making them highly contagious. further, laboratory and experimental studies need to be carried out to validate the exact role of this particular mutation in respect to the molecular pathways and interactions in the biological systems despite being a strong possible mutation candidate found in the gujarat region. the genomics approach has been a useful resource to identify and characterize the virulence, fastqc version . . . a quality control tool for high throughput genomes positive selection of orf a and orf genes drives the evolution of sars-cov- during the covid- pandemic genome composition and divergence of the novel coronavirus ( -ncov) originating in china beware of the second wave of covid- full-genome sequences of the first two sars-cov- viruses from india genotyping coronavirus sars-cov- : methods and implications phylosuite: an integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies the authors are grateful to the secretory, department of science and key: cord- - nccarjn authors: luzes, rafael; muzi-filho, humberto; pereira-acácio, amaury; crisóstomo, thuany; vieyra, adalberto title: angiotensin-( – ) modulates angiotensin converting enzyme (ace ) downregulation in proximal tubule cells due to overweight and undernutrition: implications regarding the severity of renal lesions in covid- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nccarjn the renal lesions – including severe acute kidney injury – are severe outcomes in sars-cov- infections. 
there are no reports regarding the influence of the nutritional status on the severity and progress of these lesions. ageing is also an important risk factor. in the present communication we compare the influence of overweight and undernutrition in the levels of renal angiotensin converting enzymes and . since the renin-angiotensin-aldosterone system (raas) has been implicated in the progress of kidney failure during covid- , we also investigated the influence of angiotensin-( – ) (ang-( – )) the shortest angiotensin-derived peptide, which is considered the physiological antagonist of several angiotensin ii effects. we found that both overweight and undernutrition downregulate the levels of angiotensin converting enzyme (ace ) without influence on the levels of ace in kidney rats. administration of ang-( – ) recovers the control levels of ace in overweight but not in undernourished rats. we conclude that chronic and opposite nutritional conditions play a central role in the pathophysiology of renal covid- lesions, and that the role of raas is also different in overweight and undernutrition. with high levels of inflammatory factors [ ] that can stimulate the renin-angiotensin- aldosterone system (raas) [ , ] . the experiments were conducted during the development of projects in our laboratory during the last year, and we believe that the resultsif presented together to allow comparisonscould shed some light on the mechanisms underlying the progression and the severity of renal lesions due to covid- male wistar rats were fed: (i) with the rbd from weaning ( figure a ,b) represents, per se, an increase in the risk of kidney proximal tubules damageand therefore of akiduring sars-cov- infection in normonourished rats, and this could also be applicable to human kidneys, as in the lungs [ ] . 
figure a ,c shows that ace levels remained unmodified over that relatively long period of life and, therefore, this could contribute to the ace/ace imbalance that would worsen kidney injuries, as is the case with lung and heart [ ] , especially in the presence of comorbidities or modifications in the renal local raas. in the next figures our results show that modifications in the nutritional status profoundly modified ace abundance in the renal cortex corticis, which is also modulated in a different manner by antagonizing ras in cases of overweight or undernutrition. figure a ,b shows that the abundance of ace decreased % in the hl (overweight) rats whose body mass was ~ % higher than that of the ctr group. product of a pathway that involves progressively shorter ang ii-derived peptides, as we demonstrated a decade ago [ ] . the mechanism underlying the effect of the peptide could be an action as "allosteric enhancer" that induces a second binding site in at r with a very high affinity for ang ii [ ], possibly involved in the modulation of ace formation. another possibility would be a beneficial formation of heterodimers involving these modified at r and mas receptors, and that are able to act on the ace , as in the case of blood pressure [ ] . that ang-( - )-mediated upregulation of renal ace occurred in overweight rats, but not in the ctr group, is indicative that a "pro- hypertensive tissular microenvironment" (high ang ii) develops in animals fed with a diet rich in lipids, causing downregulation of ace (figure a the influence of chronic undernutrition on renal ace levels present some similarity, but also a huge difference, compared with overweight rats (figure ) . the similarity is in the downregulation of the enzyme level by rbd alone (~ %) (figure a,b), i.e. a value that did not differ from the overweight-induced downregulation seen in figure a ,b (~ %). 
again, it may be that activation of the ang ii/at r axis now in a "pro-hypertensive tissular microenvironment" induced by the multideficient diet contributes for the notable decrease in ace , and, since there was no modification in ace ( figure a,c) , to the important ace/ace imbalance in proximal tubules that might be crucial in the kidney, as seems to be the case in the lung [ ] . on this account, we have strong evidence for the existence of a "pro-hypertensive tissular at r acts in parallel with at r in promoting tissular damage in kidney tissue (recently reviewed in [ ] ). there is a possibility of an adjuvant and independent at r- associated pathway that, being altered by an amino acid imbalance [ ] due to undernutrition, downregulates ace synthesis and levels. a second possibility is the formation of heterodimers between at r and abnormal at r, which could be modulated by ang ii and ang-( - ), as in the case of spontaneously hypertensive rats [ ]. as noted above, these rats are of a juvenile age. therefore, undernutrition could favor severe kidney lesions in young people affected by covid- , an issue that has not been explored until the present. world health organization. novel coronavirus -china european centre for disease prevention and control. covid- situation update worldwide severe covid- : a review of recent progress with a look toward the future possible cause of inflammatory storm and septic shock in patients diagnosed with (covid- ) multiple organ dysfunction in sars- cov- : mods-cov- . expert rev respir med ( ) online ahead of print should covid- concern nephrologists? why and to what extent? 
the emerging impasse of angiotensin blockade covid- : risk factors for severe disease and death sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov- spike glycoprotein glomerular localization and expression of angiotensin-converting enzyme and angiotensin-converting enzyme: implications for albuminuria in diabetes the contralateral kidney presents with impaired mitochondrial functions and disrupted redox homeostasis after days of unilateral ureteral obstruction in mice the pivotal link between ace deficiency and sars-cov- infection age-and gender-related difference of ace expression in rat lung age-associated changes in the vascular renin-angiotensin system in mice covid- and endocrine diseases. a statement from the two modes of na extrusion in cells from guinea pig kidney cortex slices a regional basic diet from northeast brazil as a dietary model of experimental malnutrition components of the ain- diets as improvements in the ain- a diet perivascular adipose tissue as a relevant fat depot for cardiovascular risk in obesity obesity-hypertension: an ongoing pandemic obesity-induced hypertension: interaction of neurohumoral and renal mechanisms i-converting enzyme inhibitory peptides in an alkaline protease hydrolyzate derived from sardine muscle antihypertensive effects of peptide in sake and its by-products on spontaneously hypertensive rats the shortest angiotensin ii-derived peptide, opening new vistas on the renin- angiotensin system? comparison between calcium transport and adenosine triphosphatase activity in membrane vesicles derived from rabbit kidney proximal tubules comparing rat's to human's age: how old is my rat in people years? 
a hypothesis for pathobiology and treatment of covid- : the centrality of ace /ace imbalance is an endogenous ligand for the g protein-coupled receptor mas a scrutiny of the biochemical pathways from ang ii to ang-( - ) in renal basolateral membranes angiotensin-( - ) counteracts the angiotensin ii inhibitory action on renal ca + -atpase through a camp/pka pathway dimerization of at and mas receptors in control of blood altered signaling pathways linked to angiotensin ii underpin the upregulation of renal na + - atpase in chronically undernourished rats effects of ingestion of disproportionate amounts of amino acids inhibits renal na + -atpase in hypertensive rats through a mechanism that involves dissociation of ang ii receptors, heterodimers, and pka administration as indicated on the abscissae. statistical differences were estimated using one-way anova followed by bonferroni's test for the selected pairs indicated within the panels. significant differences were set at p< . . in a, the representative immunoblottings were derived from the same gel, and were cut to remove information irrelevant to the work described in this letter. loading controls were run on the same blot. key: cord- - dw xby authors: kathwate, gunderao h title: in silico design and characterization of multi-epitopes vaccine for sars-cov from its spike proteins date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dw xby covid is disease caused by novel corona virus, sars-cov originated in china most probably of bat origin. till date, no specific vaccine or drug has been discovered to tackle the infections caused by sars-cov . in response to this pandemic, we utilized bioinformatics knowledge to develop efficient vaccine candidate against sars-cov . designed vaccine was rich in effective bcr and tcr epitopes screened from the sequence of s-protein of sars-cov . predicted bcr and tcr epitopes were antigenic in nature non-toxic and probably non-allergen. 
modelled and refined tertiary structure was predicted as valid for further use. protein-protein interaction prediction of tlr / and designed vaccine indicates promising binding. designed multiepitope vaccine has induced cell mediated and humoral immunity along with increased interferon gamma response. macrophages and dendritic cells were also found increased over the vaccine exposure. in silico codon optimization and cloning in expression vector indicates that vaccine can be efficiently expressed in e. coli. in conclusion, predicted vaccine is a good antigen, probable no allergen and has potential to induce cellular and humoral immunity. in december group of patients from wuhan city of china was found to have pneumonia like symptoms and diagnosed with the infection of beta coronavirus ( ) . later it spread across the china and now spread all over the globe. these infections were named as covid disease and the virus as sars-cov by who ( ) . on january due to its spread more than countries declared covid pandemic as a public health emergency of international concern. novelty of sars-cov is its rapid spread may be due asymptomatic patients ( , ) and highly sophisticated, time consuming diagnostic methods( - ). as of may , million confirmed cases with more than , deaths worldwide ( ) . in india due to lockdown positive cases are in control and spread is also marginal, but still after approximately four months' case reports are increases. as of may , a total of confirmed cases with deaths are reported by ministry of health and family welfare, govt. of india. till date, no antiviral drugs available to combat the sars-cov infections. there are few drug candidates in clinical trials but the process is time consuming ( ) . in south korea, lopinavir/ritonavir combination was found significantly effective in lowering of viral load to no detectible or little sar-cov titer( ). but in another group study same combination was totally ineffective beyond standard care ( ) . 
another drug pair, hydroxychloroquine (an antimalarial drug) and azithromycin, was reported to be associated with a reduction of viral load ( , ) . however, the combination can prolong the qt interval, which may cause life-threatening arrhythmia ( ), and in a large-scale study, treatment with both drugs or with either drug alone, compared with neither drug, was not associated with differences in mortality ( , ) . in a drug repurposing study, remdesivir, lopinavir, emetine, and homoharringtonine significantly inhibited replication of sars-cov ( ) . this was an in vitro study, however, and clinical trial reports are still awaited. similar determined efforts are under way across the globe. to break the chain of infection, prevention is a much better option than treatment. several lines of evidence support the importance of a vaccine for eliminating the covid epidemic ( ) ( ) ( ) . various epidemiological surveys provide evidence that naturally acquired immunity can eliminate the sars-cov titer ( , ) . more than % of patients develop mild and % develop severe symptoms of covid with zero case fatality rate ( ) . presently there is no vaccine available against covid disease. efforts to develop an effective vaccine are being made all over the globe ( ) ( ) ( ) ( ) . all these vaccines are under development ( ) and may require at least to . year to reach the market ( ) . a bioinformatics approach can accelerate the process by predicting potential candidate peptides. vaccines against mers, ebola, and human papilloma virus were designed and successfully developed by using bioinformatics approaches ( ) . sars-cov is a betacoronavirus of the coronaviridae family, which includes four endemic viruses (hcov-hku , hcov-oc , hcov- e and hcov-nl ) and two epidemic viruses, middle east respiratory syndrome virus (mers) and sars ( ) . it is an enveloped, non-segmented, positive-strand rna virus. its proteins are categorized into structural, non-structural and accessory proteins. structural proteins are involved in protecting the virus and in binding to the host.
several proteins of sars-cov are important for virulence and pathogenesis. for example, the nucleocapsid (n) protein is essential for rna binding and for rna replication and transcription ( ) . the envelope (e) and membrane (m) proteins are instrumental in virus assembly and virulence promotion ( , ) . the e and m proteins are also effective in inducing an immune response ( ) . spike (s), a structural protein, is responsible for binding to the receptor on the host cell, angiotensin converting enzyme ( ) . subsequent proteolytic cleavage by tmprss allows entry into the cell through endocytosis ( ) . the s protein has also been found to be involved in activation of the t cell response ( , ) . the entire s protein expressed in a chimpanzee adeno (chad) vector showed protection against sars-cov in mice and rhesus macaques ( ) . a single dose of the chadox ncov- vaccine is sufficient to elicit humoral and cell-mediated immune responses in both animals. the viral load was reduced compared to control animals and symptoms of pneumonia were absent. here we designed a multi-epitope vaccine from the s protein. this vaccine is predicted to have the ideal properties: stability at room temperature, immunogenicity, antigenicity and non-allergenicity. all the epitopes are predicted to stimulate humoral as well as cell-mediated immunity. the complete spike protein sequence (p dtc ) of sars-cov was downloaded in fasta format from the uniprot protein database. this sequence was used for further analysis and for obtaining potential epitopes for b and t cell receptors. various tools are used to predict the epitopes that could be presented to t cell receptors, and potential epitopes of various bacteria and viruses have been predicted with such online tools. the following tools were used for prediction of tcr (t cell receptor) epitopes: iedb mhc-i processing predictions, mhc-np, netctlpan . , rankpep, and netmhcpan . . all these tools, which use various prediction methods, were used to screen potential peptides as epitopes for tcr.
iedb (immune epitope database) is a web-based epitope analysis resource that includes tools for t cell epitope prediction, b cell epitope prediction and other analyses such as epitope conservancy. this resource uses methods based on artificial neural networks (ann), the stabilized matrix method (smm), and combinatorial peptide libraries (comblib) to predict peptides that are naturally processed and presented by mhc i ( ) . immunogenicity, toxicity, allergenicity, conservancy, and antigenicity were analyzed for the predicted tcr epitopes. the iedb mhc class i immunogenicity server and conservancy tool were used for determination of immunogenicity and conservancy. immunogenicity of the peptide-mhc complex (http://tools.immuneepitope.org/immunogenicity/ ) was assessed keeping all parameters at default. conservancy across protein sequence variants ( ) was assessed with sequence identity set to % and other parameters at default. the toxinpred online tool (http://www.imtech.res.in/raghava/toxinpred/index.html ) predicts toxicity from the physicochemical properties of the selected peptides ( ). the online server allergenfp v. . (http://ddgpharmfac.net/allergenfp/ ) was used to predict whether peptides are allergens ( ) . antigenicity of epitopes was analyzed with the online server vaxijen v . (http://www.ddg-harmfac.net/vaxijen/vaxijen/vaxijen.html ), with the threshold value set to . ( ) . this is an alignment-independent predictor based on auto-cross covariance (acc) transformation of epitope sequences into uniform vectors of principal amino acid properties. the accuracy of this server varies from to % depending on the target organism. six different methods were used for the prediction of b cell receptor (bcr) epitopes. all these methods generate fragments of the protein. for the prediction of b cell epitopes, it is necessary to find the linear sequence of the b cell epitope in the protein sequence. the bepipred linear epitope prediction server (http://www.cbs.dtu.dk/services/bepipred/ ) uses a hidden markov model and a propensity scale method.
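the auto-cross covariance (acc) transformation mentioned above for vaxijen can be illustrated with a minimal python sketch. it is not the vaxijen implementation: the two-component descriptor table `TOY_DESCRIPTORS`, the function name `acc_transform` and the lag limit are hypothetical choices made here for illustration. for each lag, the sketch averages the product of descriptor j at position i and descriptor k at position i + lag, so that peptides of different lengths map to vectors of the same length.

```python
from typing import Dict, List

# Hypothetical two-component descriptor table, for illustration only.
# Real predictors use published multi-component amino acid descriptors.
TOY_DESCRIPTORS: Dict[str, List[float]] = {
    "A": [0.1, -1.0],
    "G": [-0.5, 0.3],
    "K": [1.2, 0.8],
    "L": [-1.1, -0.4],
    "S": [0.6, 0.2],
    "Y": [-0.3, 1.5],
}


def acc_transform(seq: str, descriptors: Dict[str, List[float]], max_lag: int) -> List[float]:
    """Auto- and cross-covariance transform of a peptide.

    For each lag and each descriptor pair (j, k) this computes
    sum_i z[i][j] * z[i + lag][k] / (n - lag), where n is the peptide length.
    The output length is max_lag * n_desc**2, independent of n.
    """
    n = len(seq)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    z = [descriptors[aa] for aa in seq]  # per-residue descriptor rows
    n_desc = len(z[0])
    vector: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                vector.append(total / (n - lag))
    return vector


if __name__ == "__main__":
    short = acc_transform("AGKLSY", TOY_DESCRIPTORS, max_lag=2)
    longer = acc_transform("AGKLSYAG", TOY_DESCRIPTORS, max_lag=2)
    print(len(short), len(longer))  # both vectors have the same length
```

the fixed output length is the point of the transformation: a downstream classifier can then be trained on peptides or proteins of unequal length without sequence alignment.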
similarly, other properties are also considered in predicting good b cell epitopes ( ) . those properties are calculated by different methods at the iedb server (http://tools.iedb.org/bcell/ ): the kolaskar-tongaonkar antigenicity scale, which is based on physicochemical properties of the amino acid residues ( ), and the emini surface accessibility score for the accessible surface of the epitope ( ). the secondary structure of epitopes also has a role in antigenicity, so the karplus-schulz flexibility score ( ) and the chou-fasman β turn method ( ) were used for flexibility and β turn prediction, respectively. the parker hydrophilicity prediction method ( ) was used for in silico determination of the hydrophilicity of peptides. high-scoring peptides commonly predicted by various tools were selected for deriving the sequence of the potential vaccine candidate. the different epitopes for t cell and b cell receptors were linked together by gpgpg and aay linkers. to enhance immunogenicity, the ompa protein (genbank: afs . ) was chosen as adjuvant and was linked through an eaaak linker at the n terminal site of the vaccine. antigenicity of the chimeric vaccine was predicted by vaxijen v . (http://www.ddgpharmfac.net/vaxijen/vaxijen/vaxijen.html) and antigenpro. vaxijen . is an alignment-free antigenicity prediction tool that uses auto cross covariance transformation of protein sequences into uniform vectors of principal amino acid properties ( , ) . antigenpro (http://scratch.proteomics.ics.uci.edu/) is also an online tool; it uses a protein microarray dataset for the prediction of antigenicity. based on cross-validation experiments, the accuracy of this server is estimated to be around % using the combined dataset. allertop v . and allergenfp were the two online tools used for the prediction of allergenicity of the chimeric protein. amino acid e-descriptors, auto- and cross-covariance transformation, and the k nearest neighbours (knn) machine learning method are the basis of allertop v . (http://www.ddg-pharmfac.net/allertop) for allergenicity prediction ( ) .
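the linker scheme described above (gpgpg and aay between epitopes, eaaak joining the ompa adjuvant at the n terminus) can be sketched in a few lines of python. the text does not state which linker joins which pair of epitopes, so the sketch assumes aay between t cell epitopes and gpgpg between b cell epitopes and between the two blocks; the function name `assemble_construct` and the placeholder sequences in the usage lines are hypothetical and are not the epitopes selected in this study.

```python
from typing import Sequence


def assemble_construct(
    adjuvant: str,
    tcell_epitopes: Sequence[str],
    bcell_epitopes: Sequence[str],
    adjuvant_linker: str = "EAAAK",
    tcell_linker: str = "AAY",    # assumed: AAY between T cell epitopes
    bcell_linker: str = "GPGPG",  # assumed: GPGPG between B cell epitopes
) -> str:
    """Join adjuvant, linkers and epitopes into one polypeptide string."""
    t_block = tcell_linker.join(tcell_epitopes)
    b_block = bcell_linker.join(bcell_epitopes)
    # Assumed: the two epitope blocks are themselves joined by GPGPG.
    epitope_block = bcell_linker.join(block for block in (t_block, b_block) if block)
    return adjuvant + adjuvant_linker + epitope_block


if __name__ == "__main__":
    # Placeholder sequences only; not real epitopes or the real adjuvant.
    construct = assemble_construct("MK", ["CTLA", "CTLC", "CTLD"], ["BCELLA", "BCELLC"])
    print(construct, len(construct))
```

keeping the linkers as parameters makes it easy to regenerate the construct with a different epitope order or linker choice before re-running the antigenicity and allergenicity predictions.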
allergenfp is a descriptor-based fingerprint, alignment-free tool for allergenicity prediction. the tool uses a four-step algorithm. in the first step, proteins are described based on properties such as hydrophobicity, size, secondary structure formation and relative abundance. in the next step, the generated strings are converted into vectors of equal length by acc. the vectors are then converted into binary fingerprints and compared in terms of the tanimoto coefficient. applied to known allergens and non-allergens, this approach correctly identifies % of allergens/non-allergens with a matthews correlation coefficient of . ( ) . for solubility prediction of the multi-epitope chimeric vaccine, the proso ii server (http://mbiljj .bio.med.uni-muenchen.de: /prosoii/prosoii.seam) was used ( ) . the proso ii server is a classifier that exploits the differences between soluble proteins from targetdb and pdb and insoluble proteins from targetdb. the accuracy of this server is % at the default threshold . . protparam, an online web server, was used for evaluation of physicochemical properties. properties such as amino acid composition, theoretical pi, instability index, in vitro and in vivo half-life, aliphatic index, molecular weight, and grand average of hydropathicity (gravy) were evaluated. the psipred and raptorx property online servers were used to determine the secondary structure of the predicted vaccine. psipred is a publicly available web server that includes two feed-forward neural networks working on output obtained from psi-blast. psipred . attains an average q score of . %, obtained using very stringent cross-validation strategies to assess its performance. raptorx property is another web-based server that predicts the secondary structure of a protein without a template ( ) . this server uses deepcnf (deep convolutional neural fields), a machine learning model that predicts secondary structure, solvent accessibility and disordered regions simultaneously ( ) .
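the fingerprint comparison step described above for allergenfp relies on the tanimoto coefficient, which for two binary fingerprints is the number of bits set in both divided by the number of bits set in either. a minimal python sketch follows; the function name `tanimoto` and the example fingerprints are illustrative assumptions and do not reproduce the server's own fingerprints.

```python
from typing import Sequence


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)     # bits set in both
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)    # bits set in at least one
    # Assumed convention: two all-zero fingerprints score 0.0.
    return both / either if either else 0.0


if __name__ == "__main__":
    query = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(tanimoto(query, reference))  # one shared bit out of three set bits
```

a query protein would then be assigned the class (allergen or non-allergen) of the reference protein with the highest coefficient, which is why the tool reports a similarity index alongside its prediction.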
it achieves a q score of approximately for state secondary structure and approx. % for state solvent accessibility. the tertiary structure of the multi-epitope chimeric vaccine candidate was built using the i-tasser online server (https://zhanglab.ccmb.med.umich.edu/i-tasser/). the i-tasser (iterative threading assembly refinement) web server uses a sequence-to-structure-to-function paradigm to build protein structures ( ) . it is a top-ranked d protein structure web server in the community-wide casp experiments ( ) . two web-based servers were used to refine the d structure of the multi-epitope chimeric protein: first the modrefiner server (https://zhanglab.ccmb.med.umich.edu/modrefiner/) and then the galaxyrefine server (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=refine). modrefiner is an algorithm for atomic-level structure refinement that starts from a c alpha trace, main chain or full atomic model. the output structure is refined in terms of accurate positioning of side chains, the hydrogen bond network and fewer atomic overlaps ( ) . the galaxyrefine server, on the other hand, rebuilds the d structure, performs side-chain repacking, and uses molecular dynamics simulation to achieve overall relaxation of the protein structure. structures refined by the galaxy server are of the best quality according to the community-wide casp experiments ( ) . the prosa-web (https://prosa.services.came.sbg.ac.at/prosa.php), errat (http://services.mbi.ucla.edu/errat/) and rampage (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) web servers were used to validate the d structure obtained after galaxy server refinement ( ) . the prosa web server calculates a quality score as a z score that should fall within a characteristic range ( ) . the z score obtained for a specific input protein is interpreted in the context of the protein structures available in the public domain. the errat server analyses non-bonded atom-atom interactions in the refined d structure of the protein and compares them with highly resolved crystallographic protein structures ( ) .
the ramachandran plot displays energetically allowed and disallowed dihedral psi and phi angles of amino acids. the plot is calculated based on the van der waals radii of the side chains. the rampage server generates a ramachandran plot for a protein that includes the percentage of residues in allowed and disallowed regions ( ) . more than % of b cell epitopes are discontinuous, that is, they are present in small segments of the linear protein that are brought into proximity by protein folding. discontinuous b cell epitopes for the designed vaccine were predicted by the ellipro online tool (http://tools.immuneepitope.org/tools/ellipro) at iedb. this tool implements three algorithms to determine discontinuous b cell epitopes: it approximates the d structure of the input protein as a number of ellipsoid shapes, calculates the protrusion index (pi) and clusters neighboring residues. ellipro defines the pi score of each residue based on the center of mass of the residue residing outside the largest possible ellipsoid. in comparison with other epitope prediction tools, ellipro gave an auc value of . , the best of all ( , ) . the molecular docking server haddock (https://bianca.science.uu.nl/haddock . /) was used to examine the interaction of the designed vaccine with tlr . haddock (high ambiguity driven protein-protein docking) is an information-driven flexible docking server ( ) . the galaxyrefine d model of the multi-epitope chimeric protein, the adjuvant, tlr (pdb id: nig) and tlr (pdb id: g a) were uploaded for docking at the haddock server, keeping all parameters at default. finally, the top five models were downloaded and the prodigy (protein binding energy prediction) web server (https://nestor.science.uu.nl/prodigy/) was used for prediction of binding affinities ( ) . the immune response to the vaccine construct was simulated with the c-immsim server (http:// . . . /c-immsim/index.php).
c-immsim is a freely available web-based server that uses position specific scoring matrices (pssm) for the prediction of immune epitopes and machine learning techniques for the prediction of immune interactions ( ) . the jcat (java codon adaptation tool) server was used for reverse translation and codon optimization. codon optimization was carried out in order to express the construct in an e. coli host. the jcat (http://www.prodoric.de/jcat) output displays the nucleotide sequence along with other important properties of the sequence, including the codon adaptation index (cai) and percent gc content, which are essential for assessing protein expression in the host ( ) . finally, the vaccine construct was cloned in the pet a (+) plasmid vector by adding xhoi and xbai restriction sites at the c and n terminus, respectively. the snapgene tool was used to clone the construct in silico to ensure cloning and expression ( ) . the iedb recommended, mhc-np, netctlpan . , rankpep and netmhcpan . servers predicted potential candidate epitopes from the spike glycoprotein sequence of sars-cov . epitopes commonly predicted by at least four prediction tools were selected for further analysis. there were four such epitopes with high scores, each predicted by four different tools (table a) . all four epitopes were immunogenic and antigenic in nature, conserved %, and predicted to be non-allergens (table b) . immunogenicity scores for the predicted epitopes range from . to . . all the epitopes had positive scores and were hence considered for further analysis. all the peptides were predicted to be nontoxic by the toxinpred online toxicity prediction tool. when the spike protein sequence was submitted to the bepipred server, it showed an average score of . , with . maximum and . minimum scores (fig ). other properties such as physicochemical properties of amino acid residues, accessible surface, hydrophilicity, flexibility and β turns were also determined, including by the parker hydrophilicity method (table ) ( - ).
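the selection rule stated above, keeping epitopes predicted by at least four of the prediction tools, amounts to counting how many tools report each peptide. a minimal python sketch under that reading follows; the function name `consensus_epitopes`, the tool labels and the peptide strings are hypothetical placeholders rather than the actual server outputs.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consensus_epitopes(predictions: Dict[str, Iterable[str]], min_tools: int) -> List[str]:
    """Return peptides reported by at least `min_tools` different tools."""
    counts: Counter = Counter()
    for peptides in predictions.values():
        counts.update(set(peptides))  # count each tool at most once per peptide
    return sorted(pep for pep, n in counts.items() if n >= min_tools)


if __name__ == "__main__":
    # Placeholder tool outputs; the peptide strings are not real predictions.
    predictions = {
        "tool_1": ["PEPA", "PEPC", "PEPD"],
        "tool_2": ["PEPA", "PEPC"],
        "tool_3": ["PEPA", "PEPD", "PEPD"],
        "tool_4": ["PEPA", "PEPC"],
        "tool_5": ["PEPC", "PEPE"],
    }
    print(consensus_epitopes(predictions, min_tools=4))
```

in practice the per-tool scores would also be carried along so that the high-score criterion can be applied after the consensus filter.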
the three high-scoring peptides predicted for t cell receptors and the two peptides for b cell receptors were joined together by gpgpg or aay linkers. additionally, ompa (genbank: afs . ) was linked at the n terminal end of the designed polypeptide by an eaaak linker. for purification purposes, six histidine residues were added. the final designed vaccine comprised amino acid residues when all five peptides, linkers and adjuvant were joined (fig ) . the antigenicity of the designed multi-epitope vaccine predicted by the vaxijen v . server was . , with a probable antigen annotation for virus as target organism at the . threshold. the antigenicity predicted for the same vaccine sequence by the antigenpro server was . , with . predicted probability of solubility upon overexpression. the antigenicity of the peptide without the adjuvant sequence was . , with . predicted probability of solubility upon overexpression. these results indicate that the predicted vaccine sequence is potentially antigenic both with and without adjuvant. the allergenfp and allertop servers predicted that the vaccine both with and without adjuvant is a probable non-allergen, with . and . tanimoto similarity indexes, respectively. the computed molecular weight of the designed vaccine was . kda and the theoretical pi is . . the predicted pi indicates slight alkalinity of the vaccine ( table ). the estimated half-lives were hrs in mammalian reticulocytes (in vitro), > hours in yeast (in vivo) and > hours in escherichia coli (in vivo). the vaccine is predicted to be highly stable, with an instability index of . that classifies the protein as stable. the estimated aliphatic index of . indicates thermostability of the protein ( ) . the predicted gravy score is - . (table ) . a negative gravy score indicates that the protein is hydrophilic in nature and will interact with water molecules ( ) . secondary structure prediction: the secondary structure predicted by the online tool raptorx property includes % alpha helix region, % beta sheet region and % coil region.
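two of the physicochemical values reported above, the gravy score and the aliphatic index, follow simple formulas and can be checked by hand. the python sketch below is an independent illustration, not the protparam code: gravy is taken as the mean kyte-doolittle hydropathy over all residues, and the aliphatic index as 100 × (x_ala + 2.9 × x_val + 3.9 × (x_ile + x_leu)), where x is the mole fraction of each residue. the function names and the short test sequences are assumptions made for the example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole fractions of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    x_ala = seq.count("A") / n
    x_val = seq.count("V") / n
    x_ile = seq.count("I") / n
    x_leu = seq.count("L") / n
    return 100.0 * (x_ala + 2.9 * x_val + 3.9 * (x_ile + x_leu))


if __name__ == "__main__":
    # Short toy sequences, not the vaccine construct.
    print(gravy("AR"))              # mean of 1.8 and -4.5
    print(aliphatic_index("AVIL"))  # each residue has mole fraction 0.25
```

a negative gravy value from this calculation corresponds to the hydrophilic interpretation given above, and a larger aliphatic index to a larger relative volume of aliphatic side chains.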
furthermore, the solvent accessibility of amino acid residues was predicted to be % exposed, % medium exposed and % buried. a total of residues ( %) were predicted to lie in disordered regions. a pictorial representation of the secondary structure of the final protein predicted by psipred is shown in figure . i-tasser predicted the top five d structure models of the designed vaccine using threading templates. the z score for these templates ranged from . to . , indicating good alignment with the submitted sequence. the c score is a quantitative measure of the quality of the built model. c scores range between - and , with higher values signifying better model quality. the c scores of the top five models ranged from - . to - . . the model with the highest c score, i.e. - . , estimated tm score . ± . and estimated rmsd . ± . , was chosen for further analysis ( fig a) ( ) . for refinement, the tertiary structure predicted by i-tasser was submitted first to modrefiner and then to galaxyrefine. among the top five refined models, model was found to be the best based on parameters such as gdt-ha . , rmsd . , molprobity . and rama favored . (fig b) . the quality of the refined model was validated by a ramachandran plot at the rampage web server: . % of residues of the modelled vaccine are in the favored region, . % in the allowed region and . % in the outlier region (fig d) . errors in the model were analyzed by the prosa-web and errat web servers. the z score is - . ( fig e) and the errat score is . (fig c) . the ramachandran plot, z score and errat score showed that the refined model is of good quality and can be used further. four discontinuous epitopes comprising residues were predicted. the scores of the predicted epitopes range from . to . . the shortest and longest discontinuous b cell epitopes are and residues long, respectively. the haddock web-based server was used to dock the designed vaccine to tlr and tlr . the ability of tlr / to elicit an immune response toward the designed vaccine was analyzed by conformational change with reference to the adjuvant-tlr / complexes.
the top models from each group with the lowest haddock score were selected. haddock scores for tlr /adjuvant and tlr /vaccine were - . and - . kcal/mol, respectively. in addition, relative binding energies were - . (fig a, b) . similarly, the tlr /adjuvant complex had charged-charged , charged-polar , charged-apolar , polar-polar , polar-apolar and apolar-apolar interface contacts, while the tlr /vaccine complex had charged-charged , charged-polar , charged-apolar , polar-polar , polar-apolar and apolar-apolar interface contacts ( figure c, d) . to enhance expression in escherichia coli, the construct was reverse translated and codon optimized with the jcat online web tool. the total number of nucleotides in the construct was . after codon optimization the cai was . and the average gc content of the optimized nucleotide construct was . , indicating good probable expression of the vaccine in e. coli k . to clone the gene, xhoi and xbai restriction sites were added at the n and c terminus of the construct and the gene was cloned in the pet a+ plasmid using snapgene software (fig ) . the c-immsim immune simulator web server was used to determine the ability of the vaccine to induce immunity. the results obtained are consistent with experimental results published elsewhere ( ) . within the first week of vaccine administration, a primary response was observed as a high level of igm. secondary and tertiary immune responses were clearly seen as increases in b cell number and in levels of igm, igg +igg , and igm+igg , with decreasing concentration of antigen. the populations of different b cell isotypes were also found to increase, indicating isotype switching and memory formation (fig a) . b cell activity was also found to be high, especially for b isotypes igm and igg , with prominent memory cell formation (fig b) . similarly, the cell populations of th and tc cells were found to be high, along with memory development (fig c, d) . in addition, macrophage activity was found to increase consistently after each antigen shot and to decline upon antigen clearance (fig e) .
dendritic cells, another cell type of cell-mediated immunity, were also found to increase (fig f) . ifnγ and il expression was found to be high with a low simpson index, indicating sufficient immunoglobulin production and suggesting a good humoral immune response ( fig g) . the simpson index (d) is a measure of diversity. an increase in d over time indicates emergence of different epitope-specific dominant clones of t-cells. the smaller the d value, the lower the diversity. to date there is no cure available for covid , which points to an urgent need for a drug or vaccine to control the spread of sars-cov infections. with advances in bioinformatics tools, the process of vaccine development can be facilitated by identifying potential epitopes for t and b cells that will lead to a vaccine for prevention of covid . several reports and animal trials suggest that the s protein can be a potential target for vaccine development ( , ( ) ( ) ( ) ( ) . several reports also suggest the importance of the spike protein in activation of immune cells and in vaccine development ( ) ( ) ( ) . hla-a restricted epitopes from the s protein of sars-cov elicit a t cell specific immune response in sars-cov recovered patients ( ) . similarly, serum samples of sars-cov recovered patients were sufficient to neutralize an epitope-rich region on the spike s protein from sars-cov ( ) . use of a live attenuated form of the pathogen is a commonly used strategy for vaccine development. despite the efficacy of such vaccines, reversion to virulence remains a challenge, and they are not used in immunocompromised patients ( , ) . in the case of sars-cov , owing to its high community transmission, vaccine research requires highly skilled workers, sophisticated instrumentation and a biosafety level iii facility. this is likely to slow down the process of vaccine development and increase its cost, restricting the availability of the vaccine to the mass population.
advances in bioinformatics and molecular biology techniques have provided the opportunity to develop vaccines with high efficacy, shorter development time and low production cost, which may lead to a low-cost vaccine delivered in time for a large population. we designed a multi-epitope vaccine construct from the s protein of sars-cov . several online tools were used for designing the epitopes and thereby the vaccine. from the various epitopes predicted by the online servers, three tcr and two bcr epitopes were selected as part of the covid vaccine on the basis of common sequence and high score. all five tcr and bcr epitopes were linked by gpgpg and aay linker peptides (fig ) . to enhance the antigenicity of the linked tcr/bcr epitope peptide, the adjuvant ompa was linked by the eaaak linker sequence to give the complete vaccine design, which was found to restore antigenicity and non-allergen status (table ). for a successful vaccine candidate, secondary and tertiary structural characteristics play an essential role. the secondary structure of the designed vaccine contains % alpha helices, % beta sheets and % coils. upon refinement, the tertiary structure of the designed vaccine improved to a desirable level, as indicated by the ramachandran plot. the designed vaccine showed a high antigenicity score and is a probable non-allergen. tlrs are essential receptor proteins in activation of the innate immune response; they recognize pamps (pathogen associated molecular patterns) and respond by inducing immune reactions. various tlrs have been shown to activate the immune response to viruses through their interaction with nucleic acids and envelope proteins. tlr and tlr / are involved in nucleic acid detection, while tlr and tlr recognize envelope glycoproteins ( ) . interaction of the s protein with tlr increases the production of il in human monocyte macrophages ( ) . tlr is also involved in eliciting an innate immune response upon recognition of several other virus components ( , ) .
in a comparison of the binding efficacy of s proteins to tlrs, tlr showed the strongest protein-protein interaction, with hydrogen bonds and hydrophobic interactions on the extracellular domain ( ) . in molecular docking analysis, tlr and tlr showed stable protein-protein interaction with the designed vaccine compared to the adjuvant protein. tlr showed more stable interaction than tlr , which is consistent with a previous report ( ) . stable interaction of the designed vaccine indicates the efficiency of the vaccine for activation of tlrs, which are involved in dendritic cell activation and thereby in subsequent antigen processing and presentation on the surface to t cells ( ) . immune simulation showed a typical natural immune response pattern upon multiple exposures to the same antigen. the server predicted elevated levels of b cells and t cells for a longer time on repeated exposure to antigen. increased levels of the antiviral cytokines ifnγ and il indicate the potential for subsequent activation of t-helper cells and thereby a high level of ig production, supporting a humoral immune response ( ) . to validate the designed vaccine by screening immune reactivity in serological tests, it is necessary to express the construct in a preferred host for recombinant proteins, such as e. coli k . codon optimization was performed to achieve high expression of the designed vaccine in e. coli . the codon adaptation index (cai= . ) and gc content ( . %) of the vaccine construct were optimal for high expression of the recombinant protein in the host. the immediate next step is to express the protein in the preferred host and validate the results obtained in this report by performing several immunological assays. given the inadequate number of drugs and the time-consuming process of drug development, the chain of sars-cov infection can be controlled only by a vaccine. we used the tools and techniques of immunoinformatics to design a potential vaccine candidate. this vaccine encodes epitopes from the s protein of the sars-cov virus for t and b cell receptors.
this shows once more that a bioinformatics approach can be used effectively to develop vaccines in a short time and at lower cost. although the in silico results point to the effectiveness of the vaccine, its efficacy needs to be analyzed by performing laboratory experiments and animal model studies. properties of the peptides. in the case of immunogenic properties, peptide number has a negative score for antigenicity and was hence neglected for further analysis. the origin, transmission and clinical therapies on coronavirus disease (covid- ) outbreak - an update on the status world health organization declares global emergency: a review of the novel coronavirus (covid- ) asymptomatic transmission, the achilles' heel of current strategies to control covid- presumed asymptomatic carrier transmission of covid- . jamanetwork improved molecular diagnosis of covid- by the novel, highly sensitive and specific covid- -rdrp/hel real-time reverse transcription-pcr assay validated in
evaluation of covid- rt-qpcr test in multi-sample pools role of chest ct in diagnosis and management covid- )?: situation report the efficacy of lopinavir plus ritonavir and arbidol against novel coronavirus infection -full text view -clinicaltrials.gov a trial of lopinavir-ritonavir in adults hospitalized with severe covid- hydroxychloroquine and azithromycin as a treatment of covid- : results of an open-label non-randomized clinical trial covid- patients with at least a six-day follow up: a pilot qt interval prolongation and torsade de pointes in patients with covid- treated with association of treatment with hydroxychloroquine or azithromycin with in-hospital mortality in patients with covid- in new york state journal pre-proof no evidence of rapid antiviral clearance or clinical benefit with the combination of hydroxychloroquine and azithromycin in patients with severe covid- infection remdesivir, lopinavir, emetine, and homoharringtonine inhibit sars-cov- replication in vitro clinical characteristics of deceased patients with coronavirus disease : retrospective study epidemiology and transmission of covid- in cases and of their close contacts in shenzhen, china: a retrospective cohort study the effect of control strategies to reduce social mixing on outcomes of the covid- epidemic in wuhan, china: a modelling study covid- infection: the perspectives on immune responses allergy and immunology immune responses in covid- and potential vaccines: lessons learned from sars and mers epidemic. 
apjai-journal.org estimating case fatality rates of covid- preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies microneedle array delivered recombinant coronavirus vaccines: immunogenicity and rapid translational development characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor identification of an hla-a* -restricted cd t-cell epitope ssp- of sars-cov spike protein draft landscape of covid- candidate vaccines sars-cov- vaccines: status report insilico design of a multi-epitope vaccine candidate against onchocerciasis and related filarial diseases covid- : living through another pandemic the nonstructural proteins directing coronavirus rna synthesis and processing the coronavirus e protein: assembly and beyond membrane binding proteins of coronaviruses sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response coronavirus infections and immune responses targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals chadox ncov- vaccination prevents sars-cov- pneumonia in rhesus macaques properties of mhc class i presented peptides that enhance immunogenicity. ncbi.nlm.nih.gov in silico approach for predicting toxicity of peptides and proteins. ncbi.nlm.nih.gov allergenfp: allergenicity prediction by descriptor fingerprints. academic.oup vaxijen: a server for prediction of protective antigens, tumour antigens and subunit vaccines bepipred- . : improving sequence-based b-cell epitope prediction using conformational epitopes. 
key: cord- -xzu faol authors: andreano, emanuele; nicastri, emanuele; paciello, ida; pileri, piero; manganaro, noemi; piccini, giulia; manenti, alessandro; pantano, elisa; kabanova, anna; troisi, marco; vacca, fabiola; cardamone, dario; de santi, concetta; benincasa, linda; agrati, chiara; capobianchi, maria rosaria; castilletti, concetta; emiliozzi, arianna; fabbiani, massimiliano; montagnani, francesca; depau, lorenzo; brunetti, jlenia; bracci, luisa; montomoli, emanuele; sala, claudia; ippolito, giuseppe; rappuoli, rino title: extremely potent human monoclonal antibodies from convalescent covid- patients date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: xzu faol

human monoclonal antibodies are safe preventive and therapeutic tools that can be rapidly developed to help reverse the massive health and economic disruption caused by the covid- pandemic. by single cell sorting sars-cov- spike protein specific memory b cells from covid- survivors, neutralizing antibodies were identified and of them were expressed as igg. up to , % of monoclonals neutralized the wild type virus at a concentration of > ng/ml, , % neutralized the virus in the range of - ng/ml and , % had a neutralization potency in the range of - ng/ml. only , % neutralized the authentic virus with a potency of - ng/ml. we found that the most potent neutralizing antibodies are extremely rare and recognize the rbd, followed in potency by antibodies that recognize the s domain, the s-protein trimeric structure and the s subunit. the three most potent monoclonal antibodies identified were able to neutralize the wild type and d g mutant viruses at less than ng/ml and are good candidates for the development of prophylactic and therapeutic tools against sars-cov- .

one sentence summary: extremely potent neutralizing human monoclonal antibodies isolated from covid- convalescent patients for prophylactic and therapeutic interventions.

the impact of the sars-cov- pandemic, with more than million cases, over million deaths, trillion impact on the gross domestic product (gdp) and million people filing for unemployment in the united states alone, is unprecedented (aratani, ). in the absence of drugs or vaccines, non-pharmaceutical interventions such as social distancing and quarantine have been the only way to contain the spread of the virus. these interventions proved effective when properly implemented, but not all countries were able to implement them, which shows the limits of these strategies. the urgency to develop vaccines and therapies is extremely high. 
the effort to develop vaccines is unprecedented and, fortunately, in october we already have several vaccines in advanced phase iii efficacy trials and many others in earlier phases of development. despite this effort, the current wave of infection is expected to continue spreading globally and is likely to be followed by additional waves in the next few years, until herd immunity, acquired by vaccination or by natural infection, constrains the circulation of the virus. it is therefore imperative to develop both vaccines and therapeutic tools in parallel, as soon as possible, to face the next waves of sars-cov- infections. among the many therapeutic options available, human monoclonal antibodies (mabs) are the ones that can be developed in the shortest period of time. in fact, the extensive clinical experience with more than commercial mabs approved to treat cancer, inflammatory and autoimmune disorders provides high confidence in their safety. these advantages, combined with the urgency of the sars-cov- pandemic, support and justify an accelerated regulatory pathway. in addition, the long industrial experience in developing and manufacturing mabs decreases the risks usually associated with the technical development of investigational products. finally, the technical progress in the field makes it possible to shorten the conventional timelines and go from discovery to proof of concept trials in - months (kelley, ). indeed, in the case of ebola, mabs were the first therapeutic intervention recommended by the world health organization (who) and they were developed faster than vaccines or other drugs (kupferschmidt, ). the sars-cov- spike glycoprotein (s-protein) has a pivotal role in viral pathogenesis and is considered the main target for eliciting potent neutralizing antibodies and the focus for the development of therapeutic and prophylactic tools against this virus , tay et al., . 
indeed, sars-cov- entry into host cells is mediated by the interaction between the s-protein and the human angiotensin converting enzyme (ace ) (wang et al., b). the s-protein is a trimeric class i viral fusion protein which exists in a metastable prefusion conformation and in a stable postfusion state. each s-protein monomer is composed of two distinct regions, the s and s subunits. structural rearrangement occurs when the receptor binding domain (rbd) present in the s subunit binds to the host cell membrane. this interaction destabilizes the prefusion state of the s-protein, triggering the transition into the postfusion conformation, which in turn results in the entry of the virus particle into the host cell (wrapp et al., ). single-cell rna-sequencing analysis revealed that ace expression was ubiquitous in different human organs, suggesting that sars-cov- , through the s-protein, can invade human cells in different major physiological systems including the respiratory, cardiovascular, digestive and urinary systems, thus enhancing the possibility of spreading and infection (zou et al., ). during the first few months of this pandemic, several groups have been active in isolating and characterizing human monoclonal antibodies from covid- convalescent patients or from humanized mice, and some of these antibodies have progressed quickly to clinical trials for the prevention and cure of sars-cov- infection (shi et al., , hansen et al., , wang et al., a , pinto et al., , zost et al., , rogers et al., , andreano et al., ). so far, most of the work on human monoclonal antibodies against sars-cov- started from one or a few patients and led to the successful isolation and characterization of the first interesting antibodies, which had moderate neutralization potency. in many cases, hundreds of nanograms to micrograms of antibody were required to neutralize the virus in vitro; grams of antibody would therefore be needed per patient. 
in this scenario intravenous delivery is the only possible administration route for their therapeutic and prophylactic use. only recently have very potent human antibodies been isolated, and these can be considered for intramuscular or subcutaneous administration. a striking example is a monoclonal antibody against rsv which, delivered intramuscularly to premature babies, has shown very promising results in clinical settings (griffin et al., ). furthermore, given the initial rush to isolate potential antibody candidates for clinical development, the whole picture of the different types of neutralizing antibodies generated after infection is not yet clear. to identify and characterize potent mabs against sars-cov- , we isolated over , s-protein-specific memory b cells (mbcs) derived from covid- convalescent patients. from the screening of thousands of b cells, three extremely potent monoclonal antibodies were identified and are excellent candidates for further development.

isolation and characterization of s-protein specific antibodies from sars-cov- convalescent patients

to retrieve mabs specific for the sars-cov- s-protein, peripheral blood mononuclear cells (pbmcs) from fourteen convalescent patients enrolled in this study were collected and stained with fluorescently labelled s-protein trimer to identify antigen specific memory b cells (mbcs). fig. summarizes the overall experimental strategy. the gating strategy described in fig. s was used to single cell sort into -well plates igg + and iga + mbcs binding to the sars-cov- s-protein trimer in its prefusion conformation. the sorting strategy specifically targeted class-switched mbcs (cd + cd + igd -igm -) so as to retrieve only memory b lymphocytes that had gone through maturation processes. a total of , s-protein-binding mbcs were successfully retrieved with frequencies ranging from , % to , % (table ). 
following the sorting procedure, s-protein + mbcs were incubated over a layer of t -cd l feeder cells in the presence of il- and il- stimuli for two weeks to allow natural production of immunoglobulins ( ). subsequently, mbc supernatants containing igg or iga were tested by enzyme linked immunosorbent assay (elisa) for their ability to bind either the sars-cov- s-protein trimer in its prefusion conformation or the s-protein s + s subunits (fig. a -b). a panel of , mabs specific for the sars-cov- s-protein was identified, showing a broad range of signal intensities (table ).

identification of s-protein specific mabs able to neutralize sars-cov- 

the , supernatants containing s-protein specific mabs were screened in vitro for their ability to block the binding of the s-protein to vero e cell receptors and for their ability to neutralize authentic sars-cov- virus in an in vitro microneutralization assay. in the neutralization of binding (nob) assay, of the , tested ( , %) s-protein specific mabs were able to neutralize the antigen/receptor binding, showing a broad array of neutralization potency ranging from % to over % (table and fig. c ). as for the authentic virus neutralization assay, supernatants containing naturally produced igg or iga were tested for their ability to protect the layer of vero e cells from the cytopathic effect triggered by sars-cov- infection (fig. s ). to increase the throughput of our approach, supernatants were tested at a single point dilution, and to increase the sensitivity of our first screening a viral titer of tcid was used. for this first screening, mabs were classified as neutralizing, partially neutralizing or not-neutralizing according to whether they completely prevented, partially prevented or failed to prevent the cytopathic effect in vero e cells. out of , mabs tested in this study, a panel of ( , %) mabs neutralized the live virus and prevented infection of vero e cells (table ). 
the percentage of partially neutralizing mabs and neutralizing mabs (nabs) identified in each donor was extremely variable, ranging from , - , % and , - , % respectively (fig. a and table s ). the majority of nabs were able to specifically recognize the s-protein s domain ( , %; n= ), while , % (n= ) of nabs were specific for the s domain and , % (n= ) did not recognize single domains but only the s-protein in its trimeric conformation (fig. b ; table s ). from the panel of nabs, we recovered the heavy and light chain amplicons of nabs, which were expressed as full length igg using the transcriptionally active pcr (tap) approach to characterize their neutralization potency. the vast majority of nabs identified ( , %; n= ) had a low neutralizing potency and required more than ng/ml to achieve an ic . a smaller fraction of the antibodies had an intermediate neutralizing potency ( , %; n= ), requiring between and ng/ml to achieve the ic , while , % (n= ) required between and ng/ml. finally, only , % (n= ) of the expressed nabs were classified as extremely potent nabs, showing an ic lower than ng/ml (fig. c -d; table s ). based on the first round of screening, nabs were selected for further characterization. all nabs were able to bind the sars-cov- s-protein in its trimeric conformation (fig. a ). the mabs named j , i , f , g , c , b , i , j and d were also able to specifically bind the s domain (fig. b ). the nabs named h , i , f and f were not able to bind single s or s domains but only the s-protein in its trimeric state, while the nab l bound only the s subunit (fig. b -c). among the group of s specific nabs, only j , i , f , g , c and b were able to bind the s receptor binding domain (rbd) and to strongly inhibit the interaction between the s-protein and vero e receptors (table ). on the other hand, i , j and d , despite showing s binding specificity, did not show any binding to the rbd or any nob activity (fig s and table ). 
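the potency binning of the expressed nabs described above (low, intermediate, a further more potent range, and extremely potent) can be sketched as a small helper. this is only an illustration: the numeric cut-offs are not recoverable from this text, so the values used in the usage note are hypothetical placeholders, and the label "high" for the unnamed third range is our own assumption.

```python
from collections import Counter

# Class labels, ordered from most to least potent. "high" is an assumed
# name for the range the text describes but does not label.
LABELS = ("extremely potent", "high", "intermediate", "low")


def classify_potency(ic_ng_ml, cutoffs):
    """Assign one antibody to a potency class.

    ic_ng_ml: lowest concentration (ng/ml) at which neutralization is reached.
    cutoffs:  (extreme_max, high_max, intermediate_max) in ng/ml, ascending.
              These are caller-supplied placeholders, not the study's values.
    """
    extreme_max, high_max, intermediate_max = cutoffs
    if not extreme_max < high_max < intermediate_max:
        raise ValueError("cutoffs must be strictly ascending")
    if ic_ng_ml < extreme_max:
        return "extremely potent"
    if ic_ng_ml <= high_max:
        return "high"
    if ic_ng_ml <= intermediate_max:
        return "intermediate"
    return "low"


def potency_distribution(ic_values, cutoffs):
    """Return {label: (count, percent)} over a panel of antibodies."""
    if not ic_values:
        raise ValueError("ic_values must not be empty")
    counts = Counter(classify_potency(v, cutoffs) for v in ic_values)
    total = len(ic_values)
    return {label: (counts[label], 100.0 * counts[label] / total)
            for label in LABELS}
```

for example, with placeholder cut-offs of (10, 100, 500) ng/ml, `potency_distribution([5, 50, 250, 1000, 2000], (10, 100, 500))` assigns two of the five antibodies to the low class and one to each of the others.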
based on this description, four different groups of nabs against sars-cov- were identified. the first group (group i) is composed of s -rbd specific nabs (j , i , f , g and c ) and showed extremely high neutralization potency against both the wt and d g live viruses, ranging from , to , ng/ml (fig. d -f; table ). the second group (group ii) is composed of s nabs that did not bind the rbd (b , i , j and d ). these antibodies also showed good neutralization potency, ranging from , to ng/ml (fig. d -f; table ), but inferior to that of s -rbd directed nabs. antibodies belonging to groups i and ii showed picomolar affinity to the s-protein with a kd ranging from . to . e - m (fig. s ). the third group (group iii) is composed of antibodies able to bind the s-protein only in its whole trimeric conformation (h , i , f and f ). antibodies belonging to this group showed lower affinity to the s-protein (kd . e - m - . e - m) compared to group i and ii nabs and medium neutralization potencies ranging from , to , ng/ml (table ).

this work describes a systematic screening of memory b cells from convalescent people to identify extremely potent human monoclonal antibodies against the spike protein of the sars-cov- virus, to be used for prevention and therapy of covid- . we found that approximately % of the total b cells against the spike protein produce neutralizing antibodies and that these can be divided into different groups recognizing the s -rbd, s domain, s -domain and the s-protein trimer. we found that the most potent neutralizing antibodies are extremely rare and that they recognize the rbd, followed in potency by the antibodies recognizing the s domain, the trimeric structure and the s subunit. the l antibody against the s subunit, which had the lowest neutralizing potency, is representative of several other s specific antibodies identified in the preliminary screening. 
from these data we conclude that in convalescent patients most of the observed neutralization titers are mediated by the antibodies with medium-high neutralizing potency. indeed, the extremely potent antibodies and the antibodies against the s subunit are unlikely to contribute to the overall neutralizing titers because they are, respectively, too rare and too weakly neutralizing to make a difference. the observed antibody repertoire of convalescent patients may be a consequence of the loss of bcl- -expressing follicular helper t cells and the loss of germinal centers in covid- patients, which may limit and constrain b cell affinity maturation (kaneko et al., ). it is therefore important to perform similar studies following vaccination, as the repertoire of neutralizing antibodies induced by vaccination is likely to differ from the one described here. out of neutralizing antibodies that were tested and characterized, three showed extremely high neutralization potency against both the initial sars-cov- strain isolated in wuhan and the d g variant currently spread worldwide. during the last few months several groups reported the isolation, structure and passive protection in animal models of neutralizing antibodies against sars-cov- . most of these studies, with few exceptions, report antibodies which require from to several hundred ng/ml to neutralize % of the virus in vitro. these antibodies are potentially good for therapy. however, they will require a high dosage, which will result in elevated cost of goods, low capacity to produce large numbers of doses, and intravenous infusion. the extremely potent candidates described in our study will allow small quantities of antibody to reach the prophylactic and therapeutic dosage and, as a consequence, decrease the cost of goods and support sustainable development and manufacturability. 
this solution may increase the number of doses produced annually and therefore increase antibody availability in high income countries as well as low- and middle-income countries (lmics). our work combined with institutions such as the

elisa assay with s and s subunits of sars-cov- s-protein

the presence of s - and s -binding antibodies in culture supernatants of monoclonal s-

vector digestions were carried out with the respective restriction enzymes agei, sali and xho as previously described (tiller et al., , wardemann and busse, ). briefly, ng of igh, igλ and igκ purified pcrii products were ligated by using the gibson assembly neb into ng of the respective human igγ , igκ and igλ expression vectors. the reaction was performed in a total volume of μl. the ligation product was diluted -fold in nuclease-free water (depc) and used as template for the transcriptionally active pcr (tap) reaction, which allows the direct use of linear dna fragments for in vitro expression. the entire process consists of one pcr amplification step, using primers to attach functional promoter (human cmv) and terminator sequences (sv ) onto the fragment pcrii products. the tap reaction was performed in a total volume of μl using μl of q polymerase (neb), μl of gc enhancer (neb), μl of x buffer, mm dntps, , µl of forward/reverse primers and μl of ligation product. the tap reaction was performed by using the following cycles: °/ ', cycles °/ '', °/ '', °/ ' and °/ ' as final extension step. tap products were purified under the same pcrii conditions, quantified by expi f cell line following the manufacturer's instructions.

the sars-cov- virus was propagated in vero e cells cultured in dmem high glucose supplemented with % fbs, u/ml penicillin, µg/ml streptomycin. cells were seeded at a density of x cells/ml in t flasks and incubated at °c, % co for - hours. the sub-confluent cell monolayer was then washed twice with sterile dulbecco's phosphate buffered saline (dpbs). 
cells were inoculated with , ml of the virus properly diluted in dmem % fbs at a multiplicity of infection (moi) of . , and incubated for h at °c in a humidified environment with % co . at the end of the incubation, ml of dmem % fbs were added to the flasks. the infected cultures were incubated at °c, % co and monitored daily until approximately - % of the cells exhibited cytopathic effect (cpe). culture supernatants were then collected, centrifuged at °c at , rpm for minutes to allow removal of cell debris, aliquoted and stored at - °c as the harvested viral stock. viral titers were determined in confluent monolayers of vero e cells seeded in -well plates using a % tissue culture infectious dose assay (tcid ). cells were infected with serial : dilutions (from - to - ) of the virus and incubated at °c, in a humidified atmosphere with % co . plates were monitored daily for the presence of sars-cov- induced cpe for days using an inverted optical microscope. the virus titer was estimated according to spearman-karber formula (kundi, ) and defined as the reciprocal of the highest viral dilution leading to at least % cpe in inoculated wells. the neutralization activity of culture supernatants from monoclonal was evaluated using a cpe-based assay as previously described negative for sars-cov- in elisa and neutralization assays). following expression as full-length igg recombinant antibodies were quantitatively tested for their neutralization potency against both the wild type and d g strains. the assay was performed as previously described but using a viral titer of tcid . antibodies were prepared at a starting concentration of µg/ml and diluted step : . technical triplicates were performed for each experiment. characterization of sars-cov- rbd-antibodies binding by flow cytometry containing the captured mab for sec at a flow rate of µl/min. dissociation was followed for sec, regeneration was achieved with a pulse ( sec) of glycine ph . . 
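as an aside on the viral titration described earlier in this section, the spearman-karber estimate of the 50% endpoint can be sketched as below. this is a minimal illustration under stated assumptions, not the authors' own calculation: the function name, the input layout and the example well counts are hypothetical, the dilution series is assumed to be equally spaced on a log10 scale, and the series is assumed to run from 100% cpe-positive wells to 0% cpe-positive wells.

```python
def spearman_karber_log10_endpoint(log10_dilutions, positive_wells, wells_per_dilution):
    """Spearman-Karber estimate of the log10 50% endpoint dilution.

    log10_dilutions:    positive log10 of each dilution, ascending and equally
                        spaced (1 means 10^-1, 2 means 10^-2, and so on).
    positive_wells:     number of wells with CPE at each dilution.
    wells_per_dilution: number of wells inoculated at each dilution.

    Returns m such that a 10^-m dilution is expected to infect 50% of wells;
    the titer is then 10**m TCID50 per inoculated volume.
    """
    if len(log10_dilutions) != len(positive_wells) or len(log10_dilutions) < 2:
        raise ValueError("need matching inputs with at least two dilutions")
    step = log10_dilutions[1] - log10_dilutions[0]
    for a, b in zip(log10_dilutions, log10_dilutions[1:]):
        if abs((b - a) - step) > 1e-9:
            raise ValueError("dilutions must be equally spaced on a log10 scale")
    proportions = [p / wells_per_dilution for p in positive_wells]
    if proportions[0] != 1.0 or proportions[-1] != 0.0:
        raise ValueError("series must span 100% to 0% CPE-positive wells")
    # m = x_first - d/2 + d * sum(p_i), with d the log10 dilution step
    return log10_dilutions[0] - step / 2 + step * sum(proportions)
```

with hypothetical counts of 8, 8, 8, 4, 0 and 0 positive wells out of 8 at dilutions 10^-1 to 10^-6, the function returns 4.0, that is an endpoint at the 10^-4 dilution.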
kinetic rates and affinity constant of spike protein binding to each mab were calculated applying a : binding as fitting model using the bia t evaluation software . . ng/ml). in all graphs selected antibodies are shown in dark red, pink, gray and light blue based on their ability to recognize the sars-cov- s -rbd, s -domain, s-protein trimer only and s -domain respectively. table s . identification of neutralizing antibodies. the table shows numbers and percentages of neutralizing, partially neutralizing and not-neutralizing antibodies identified for each donor assessed in this study. table s . s-protein binding distribution of neutralizing antibodies. the table summarizes the binding distribution of neutralizing antibodies against the s -domain, s -domain and s-protein trimer. table s . potency distribution of sars-cov- s-protein specific nabs. the table reports the number of recombinant nabs expressed per each subject and their distribution based on the neutralization potency. jobless america: the coronavirus unemployment crisis in figures. the guardian single-dose nirsevimab for prevention of rsv in preterm infants studies in humanized mice and convalescent humans yield a sars-cov- antibody cocktail. 
science isolation of human monoclonal antibodies from peripheral blood b cells developing therapeutic monoclonal antibodies at pandemic pace one-hit models for virus inactivation studies successful ebola treatments promise to tame outbreak evaluation of sars-cov- neutralizing antibodies using a cpe-based colorimetric live virus micro-neutralization assay in human serum samples cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model a human neutralizing antibody targets the receptor-binding site of sars-cov- the trinity of covid- : immunity, inflammation and intervention efficient generation of monoclonal antibodies from single human b cells by single cell rt-pcr and expression vector cloning structure, function, and antigenicity of the sars-cov- spike glycoprotein a human monoclonal antibody blocking sars-cov- infection structural and functional basis of sars-cov- entry by using human ace . cell expression cloning of antibodies from single human b cells cryo-em structure of the -ncov spike in the prefusion conformation single-cell rna-seq data analysis on the receptor ace expression reveals the potential risk of different human organs vulnerable to -ncov infection key: cord- -bwcpd im authors: yee, min; cohen, e. david; haak, jeannie; dylag, andrew m.; o’reilly, michael a. title: neonatal hyperoxia enhances age-dependent expression of sars-cov- receptors in mice date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bwcpd im the severity of covid- lung disease is higher in the elderly and people with pre-existing co-morbidities. people who were born preterm may be at greater risk for covid- because their early exposure to oxygen at birth increases their risk of being hospitalized when infected with rsv and other respiratory viruses. 
our prior studies in mice showed how high levels of oxygen (hyperoxia) between postnatal days – increase the severity of influenza a virus infections by reducing the number of alveolar epithelial type (at ) cells. because at cells express the sars-cov- receptors angiotensin converting enzyme (ace ) and transmembrane protease/serine subfamily member (tmprss ), we expected their expression would decline as at cells were depleted by hyperoxia. instead, we made the surprising discovery that expression of ace and tmprss mrna increases as mice age and is accelerated by exposing mice to neonatal hyperoxia. ace is primarily expressed at birth by airway club cells and becomes detectable in at cells by one year of life. neonatal hyperoxia increases ace expression in club cells and makes it detectable in -month-old at cells. this early and increased expression of sars-cov- receptors was not seen in adult mice that had been administered the mitochondrial superoxide scavenger mitotempo during hyperoxia. our finding that early life insults such as hyperoxia enhance the age-dependent expression of sars-cov- receptors in the respiratory epithelium helps explain why covid- lung disease is greater in the elderly and people with pre-existing co-morbidities.

of ace was examined in the lungs of mice between pnd and years of age by immunohistochemistry so as to better understand the temporal and spatial pattern of its expression. ace was primarily detected in airway epithelial cells, with minimal staining seen in the alveolar space (figure a). the intensity of ace staining increased steadily in the airway epithelium throughout the life of the mouse. rare ace -positive alveolar cells (arrows) were first observed on pnd and then steadily increased in number between and months of age. western blotting for ace confirmed that ace protein became progressively more abundant in the whole lungs of - and -month-old mice relative to those of mice harvested at months of age (figure b). 
ace mrna levels were similarly higher in the whole lungs of -month-old mice than in those of mice harvested at months of age (figure c).

showed extensive co-localization along the airways at both and months of age (figure a), but the intensity of ace staining was significantly higher at months of age than at months of age (figure b). co-staining for ace and the at cell marker prosp-c revealed that the vast majority of ace + cells in the alveoli were at cells (figure c ). approximately % of prosp-c+ at cells expressed ace at months while % of prosp-c+ at cells expressed it at months (figure d). these findings reveal that ace is primarily expressed by the airway club cells of young adult mice but becomes increasingly expressed by at cells as mice age.

% oxygen was not sufficient to induce ace mrna, the levels of ace expression were significantly higher in mice exposed to % and % oxygen relative to controls. exposing mice to a low chronic dose of oxygen ( % for days) that does not alter alveolar development also failed to increase

cumulative dose of oxygen than % for days, these findings suggest that oxygen alone may not be stimulating ace expression. immunohistochemistry was used to further understand how hyperoxia affected ace expression in the adult lung. while neonatal hyperoxia increased the intensity of ace staining in the airway, it most obviously increased the number of alveolar cells with detectable ace (figure a ). when quantified, neonatal hyperoxia increased the number of alveolar cells expressing ace by approximately % at , and months of age (figure b). the increased alveolar expression seen at months of age was primarily attributed to increased expression by prosp-c+ at cells; however, this difference resolved at and months of age as more at cells in control lungs began to express ace (figure c ).

anti-oxidants block oxygen-dependent changes in ace expression. 
prior studies by us and other investigators showed that administering the mitochondrial superoxide scavenger mitotempo to mice during exposure to hyperoxia (figure a ) prevents the alveolar simplification and cardiovascular disease observed when these mice reach adulthood - . qrt-pcr revealed administering staining in control mice, it reduced the numbers of alveolar ace + cells in the lungs of hyperoxia- exposed mice lower than controls. neonatal hyperoxia stimulates age-dependent changes in tmprss . tmprss is an endoprotease expressed by respiratory epithelial cells that facilitates viral entry of coronaviruses into epithelial cells . the levels of tmprss mrna and protein were examined in the lungs of -, -and -month-old mice that were exposed to neonatal hyperoxia and room air from pnd - by qrt-pcr and western blotting. tmprss mrna was readily detected in the lungs of -month-old mice, and increased ~ -fold at months and ~ -fold at months (figure a) . neonatal hyperoxia further showed that the levels of tmprss protein were higher in the whole lung lysates of mice exposed to neonatal hyperoxia than in those of control mice (figure b ). as observed for ace expression, exposure to ≥ % oxygen from pnd - was required to significantly increase the levels of tmprss mrna in the lungs of mice at months of age (figure c ). exposure to % oxygen from pnd - also failed to change tmprss expression in adult mice (data not shown) while the administration of mitotempo to mice during exposure blunted the effects of neonatal hyperoxia on tmprss mrna (figure d) . together, these findings suggest age and neonatal hyperoxia have similar effects on increasing tmprss as they do for ace . . cc-by-nd . international license was not certified by peer review) is the author/funder. it is made available under a the copyright holder for this preprint (which this version posted july , . . https://doi.org/ . / . . . 
doi: biorxiv preprint

expanded rapidly to become one of the worst pandemics to ever challenge the modern world. while cancer . those with multiple co-morbidities have a higher rate of mortality. people born preterm may also be at great risk for covid- because they often suffer from multiple co-morbidities due, in part, to their lungs being exposed to oxygen too soon or to the super-physiological concentrations used to maintain appropriate blood oxygen saturations. it is unclear whether co-morbidities increase disease by changing the spatial and temporal expression of sars-cov- receptors or the immune response that leads to a lethal cytokine storm . in this study, we present evidence that expression of the sars-cov- co-receptors ace and tmprss increases in the respiratory epithelium of mice as they age

we found that ace was primarily expressed by airway club cells during early postnatal life. the intensity of ace staining increased in the airways of mice with age and became detectable in the alveoli of young adult mice. co-localization with prosp-c revealed that most, but not all alveolar

expression continues to increase as mice age. we also found that tmprss mrna expression increases as mice age and that this expression was similarly enhanced by neonatal hyperoxia. while at cells have previously been shown to express tmprss , we were not able to detect it in the mouse lung using commercially available antibodies. however, we did find that the abundance of tmprss mrna and protein increased with age and neonatal hyperoxia, and was reduced by mitotempo, similar to that of ace . the higher expression of these genes as mice age is in agreement with a recent review that discussed two unpublished studies deposited in biorxiv showing how expression of ace and tmprss mrna increases with age in human respiratory epithelium . 
receptors may be responsible for increasing the severity of covid- lung disease in elderly people. it is important to recognize the normal functions of ace and tmprss because that may help explain why their expression steadily increases with age . ace is perhaps best known for its role in controlling blood pressure in the renin-angiotensin system . ace converts the -amino acid angiotensin i to an -amino acid vasoconstrictive peptide called angiotensin ii. ace accumulates in people with pulmonary hypertension and hydrolyzes angiotensin ii to ang( - ), which has vasodilatory properties. over-expressing ace also protects against right ventricular hypertrophy . hence, higher levels of ace seen as the lung ages may reflect an adaptive response designed to [...] tmprss is a serine protease that is localized to the apical surface of secretory cells such as club and at cells of the lung . its expression is highly regulated by androgens in the prostate gland and may be similarly responsive to androgens in the lung, suggesting it may play a role in sex-dependent differences in the lung. our study also found that neonatal hyperoxia increased or accelerated expression of ace mrna, ace protein, and tmprss mrna as mice age. significant changes were seen with % or more fio at weeks ( months) of age and persisted as mice aged. the evidence on how hyperoxia regulates expression of these proteins is conflicting, and the mechanism remains to be better understood. one study using human fetal imr- fibroblasts found that hyperoxia does not change expression of ace . however, ace was depleted when cells returned to room air, presumably because it was being proteolyzed and shed into the media.
in contrast, another study found higher levels of ace in newborn rats exposed to % oxygen for the first week of life and then recovered in % oxygen for the next two weeks . in our hands, changes in ace or tmprss mrna were first detected in - week-old mice exposed to hyperoxia between pnd - . we did not detect changes at the end of oxygen exposure (pnd ). in fact, we recently deposited an rna-seq analysis of at cells isolated from pnd mice exposed to room air versus hyperoxia that shows hyperoxia modestly inhibits ace % for days but not % for or days), we speculate that they occur as an adaptive response to the alveolar simplification and cardiovascular disease as mice exposed to neonatal oxygen age. the genetic studies in mice suggest that mutant forms of sp-c that activate the upr are not sufficient by themselves to cause fibrotic lung disease. however, they can predispose the lung to fibrotic disease following viral infections . familial forms of ipf that activate the upr in at cells may therefore accelerate the age-dependent susceptibility of at cells to sars-cov- infections. in summary, we found that neonatal hyperoxia increases or accelerates the age-dependent relative to control mice exposed to room air. (c) qrt-pcr was used to measure tmprss mrna in total lung homogenates of month mice exposed to room air, %, %, or % oxygen between pnd - . (d) qrt-pcr was used to measure tmprss mrna in control and -month-old mice exposed to room air or hyperoxia and vehicle or mitotempo between pnd - . n= - mice per group. statistical significance was assessed by comparing all pairs using the tukey-kramer hsd test, with *p≤ . ; **p≤ . .
key: cord- -ob euspo authors: durdagi, serdar; dag, cagdas; dogan, berna; yigin, merve; avsar, timucin; buyukdag, cengizhan; erol, ismail; ertem, betul; calis, seyma; yildirim, gunseli; orhan, muge d.; guven, omur; aksoydan, busecan; destan, ebru; sahin, kader; besler, sabri o.; oktay, lalehan; shafiei, alaleh; tolu, ilayda; ayan, esra; yuksel, busra; peksen, ayse b.; gocenler, oktay; yucel, ali d.; can, ozgur; ozabrahamyan, serena; olkan, alpsu; erdemoglu, ece; aksit, fulya; tanisali, gokhan; yefanov, oleksandr m.; barty, anton; tolstikova, alexandra; ketawala, gihan k.; botha, sabine; dao, e. han; hayes, brandon; liang, mengning; seaberg, matthew h.; hunter, mark s.; batyuk, alex; mariani, valerio; su, zhen; poitevin, frederic; yoon, chun hong; kupitz, christopher; sierra, raymond g.; snell, edward; demirci, hasan title: near-physiological-temperature serial femtosecond x-ray crystallography reveals novel conformations of sars-cov- main protease active site for improved drug repurposing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ob euspo the covid pandemic has resulted in + million reported infections and nearly . deaths. research to identify effective therapies for covid includes: i) designing a vaccine as future protection; ii) structure-based drug design; and iii) identifying existing drugs to repurpose them as effective and immediate treatments. to assist in drug repurposing and design, we determined two apo structures of severe acute respiratory syndrome coronavirus- main protease at ambient-temperature by serial femtosecond x-ray crystallography. 
we employed detailed molecular simulations of selected known main protease inhibitors with the structures and compared binding modes and energies. the combined structural biology and molecular modeling studies not only reveal the dynamics of small molecules targeting main protease but will also provide invaluable opportunities for drug repurposing and structure-based drug design studies against sars-cov- . one sentence summary radiation-damage-free high-resolution sars-cov- main protease sfx structures obtained at near-physiological-temperature offer invaluable information for immediate drug-repurposing studies for the treatment of covid . in late , after the first patient was diagnosed with pneumonia of unknown etiology reported to the world health organization (who) from china, millions of cases followed in a short span of four months (who). on th of march, ; who declared covid outbreak as a pandemic, which originated from severe acute respiratory syndrome corona virus- (sars-cov- ) infection. sars-cov- has a high spread rate (rₒ) value, deeming the pandemic difficult to control (petersen et al., ) . moreover, the absence of a vaccine to provide immunity and the lack of effective treatments to control the infection in high comorbidity groups make this pandemic a major threat to global health (ahn et al., ) . the first human coronavirus to cause a variety of human diseases, such as common cold, gastroenteritis, and respiratory tract diseases was identified in the s (tyrrell et al., ) . in , a deadly version of coronavirus responsible for sars-cov was identified in china (heymann et al., ) . sars-cov- , the most recent member of the coronavirus family to be encountered, is a close relative of sars-cov and causes many systemic diseases (andersen et al., ; dutta and sengupta., ; braun et al., ) . 
covid patients exhibit (i) high c-reactive protein (crp) and pro-inflammatory cytokine levels; (ii) macrophage and monocyte infiltration to the lung tissue; (iii) atrophy of spleen and lymph nodes which weakens the immune system; (iv) lymphopenia; and (v) vasculitis . release of a large amount of cytokines results in acute respiratory distress syndrome (ards) aggravation and widespread tissue injury leading to multi-organ failure and death. therefore, mortality in many severe cases of covid patients has been linked to the presence of the cytokine storm evoked by the virus (ragab et al., ) . the sars-cov- genome encodes structural proteins including surface/spike glycoprotein (s), envelope (e), membrane (m), and nucleocapsid (n) proteins; and the main reading frames named orf a and orf b that contain non-structural proteins (nsp) (gordon et al., ; chen, liu & guo., ) . among these, orf a/b encodes papain-like protease (plpro), main protease (mpro), a chymotrypsin-like cysteine protease, along with polyproteins named polyprotein a (pp a) and polyprotein b (pp b) (astel et al., ) . encoded polyproteins are then proteolyzed to nsps by precise mpro and plpro cleavages of the internal scissile bonds. nsps are vital for viral replication, such as rna-dependent rna polymerase (rdrp) and nsp , which are used for the expression of structural proteins of the virus thiel et al., ; ullrich & nitsche., ; ziebuhr et al., ) . sars-cov- mpro has no homologous human protease that recognizes the same cleavage site (pillaiyar et al., ) . therefore, drugs that target its active site are predicted to be less toxic and harmful to humans (l. . high sequence conservation of mpro provides minimized mutation-caused drug resistance . given its essential role in the viral life cycle, the sars-cov- mpro presents a major drug target requiring a detailed structural study. 
drug repurposing is a rapid method of identifying potential therapies that could be effective against covid compared to continuous investigative efforts (e.g., identification of new drugs and development of preventive vaccine therapies). the well-studied properties of food and drug administration (fda) approved drugs mean such molecules are better understood compared to their counterparts designed de novo. a putative drug candidate identified by drug-repurposing studies could make use of existing pharmaceutical supply chains for formulation and distribution, an advantage over developing new therapies (pushpakom et al., ; jarada et al., ) . in typical drug repurposing studies, approved drug libraries are screened against the active site or an allosteric site of target protein structures obtained by methods that have limitations in revealing the enzyme structure, such as cryogenic temperature or radiation damage, and investigations that only involve screening of drugs currently on the market (zhou et al., ; choudhary et al., ; beck et al., ; wang, ) . current structural biology-oriented studies that take a repurposing approach to sars-cov- research have focused on target-driven drug design, virtual screening or a wide spectrum of inhibitor recommendations (rathnayake et al., ; chauhan and kalra, ; muralidharan et al., ; khan et al., ; kumar et al., ; joshi et al., ) . because it is critical to intervene in the ongoing pandemic, drug repurposing studies with existing drugs should be prioritized in the short term over time- and resource-consuming drug design studies. the main purpose of our study is to reveal the conformational dynamics of mpro, which plays a central role in the viral life-cycle of sars-cov- . the linac coherent light source (lcls), with its ultrafast and ultrabright pulses, enables outrunning secondary radiation damage.
we used the updated high-throughput macromolecular femtosecond crystallography (mfx) instrument of lcls, which is equipped with the new autoranging epix k detector that provides a dynamic range of eleven thousand kev photons in fixed low gain mode, to collect secondary radiation-damage-free structural data from mpro microcrystals remotely at ambient-temperature (sierra et al., ; blaj et al., ; . here we present two sfx structures of sars-cov- mpro, which provide structural dynamics information on its active site, together with their in-depth in silico analysis. our results emphasize the importance of structure-based drug design and drug repurposing by using the xfel structures. we obtained more relevant results of flexible areas over two different structures. the radiation-damage-free sfx method, which enables obtaining novel high-resolution ambient-temperature structures of the binding pocket of mpro, provides an unprecedented opportunity for identification of highly effective inhibitors for drug repurposing by using a hybrid approach that combines structural and in silico methods. in addition, structure-based drug design studies will be more accurate when based on these novel atomic details of the enzyme's active site. conformations of the drug-binding pocket. we determined two radiation-damage-free sfx crystal structures of sars-cov- mpro in two crystal forms at . Å and . Å resolutions with the following pdb ids: cwb and cwc, respectively (fig. a, b) (supplementary table & the diffraction data collected remotely at the mfx instrument of the lcls at slac national laboratory, menlo park, ca (sierra et al., analysis for sfx studies at lcls). we used an mpro structure determined at ambient-temperature using a rotating anode home x-ray source (pdb id: wqf; as our initial molecular replacement search model for structure determination. the two high-resolution sfx structures obtained in different space groups were superposed with an overall rmsd of . Å (fig. s ).
they reveal novel active site residue conformations and dynamics at atomic level, revealing several differences compared to the prior ambient-temperature structure of sars-cov- mpro that was obtained at a home x-ray source (fig. a, b ). mpro has a unique n-terminal sequence that affects enzyme's catalytic activity (chang, ; l. zhang et al., ) . besides the native monoclinic form of mpro at . Å (fig. a , pdb id: cwb) we also determined the structure of the mpro with additional extra n-terminal amino acids (generated by thrombin specific n-terminal cleavage) at . Å resolution (fig. b , pdb id: cwc). the structure obtained from this modified version of mpro reveals how the minor changes introduced at the n-terminus affects both the three-dimensional structure of the mpro and promotes the formation of a new orthorhombic crystal form. biologically relevant dimeric structure of native monomeric mpro can be generated by adding the symmetry-related chain b ( fig. a) . each protomer of sars-cov- mpro is formed by three major domains s ). domain i starts from the n-terminal of protein and includes anti-parallel beta-sheet structure. this beta-sheet forms a beta-barrel fold which ends at residue . the domain ii of mpro resides between residues through and mostly consists of anti-parallel beta-sheets. the third domain of the mpro is located between residues through and consists of mostly alpha helices and has a more . previous biochemical studies of mpro suggested there is a competition for dimerization surface between domains i and iii. in the absence of domain i, mpro undergoes a new type of dimerization through the domain iii (zhong et al., ) . during our purifications, we repeatedly observed a combination of monomeric and dimeric forms of mpro on the size exclusion chromatography steps which may be caused by this dynamic compositional and conformational equilibrium. 
the two sars-cov- mpro sfx crystal structures reveal a non-flexible core active site and the catalytic amino acid cys fig. s . temperature factor analysis revealed that the active site is surrounded by mobile regions (fig. s ). the presence of these mobile regions was observed in both sfx crystal structures, suggesting an intrinsic plasticity rather than an artifactual finding that could have arisen based on the crystal lattice contacts ( fig. s ; indicated with red circles). further, this plasticity suggests that molecules that interact via non-covalent bonds to the mpro binding pocket would do so weakly. our investigation of pdb structures with available electron densities identified that the majority of the mpro inhibitors formed covalent bonds with the active site residue cys . additionally, few non-covalent inhibitors were identified and exhibited weak electron densities (fig. we compared our radiation-damage-free ambient-temperature sfx structure (pdb id: cwb) with ambient-temperature x-ray structure (pdb id: wqf). structures were similar with an rmsd value of . Å (fig. a) . however, we observed significant conformational differences especially in the side chains of thr , ser , glu , leu , asn , cys , met , and gln residues (fig. b & fig. s ). the calculated bias-free composite omit map that covers the active region has been shown ( fig. b & fig. s ). this structure offers previously unobserved new insights on the active site of mpro in addition to wqf structure which is important for the future in silico modeling studies. all three domains contribute to the formation of the active site of the protein. the intersection part of the domain i residue his , domain ii residues cys and his interact via a coordinated water molecule with asp located at the n-terminal loop region of domain iii to form the active site, that is an important drug target area (fig. c & fig. s ). 
the n-terminal loop of domain iii is suggested to be involved in enzyme activity (ma et al., ) . the distance between cys s and his nε is . Å, very similar to the ambient-temperature structure ( wqf) . oδ -oδ atoms of asp and nh -nε atoms of arg contribute to a salt bridge between these two residues and stabilize the positions of each other. the w water molecule (indicated with a red sphere at the figure) in the active site plays crucial roles for catalysis. w forms triple h-bonds with his , his , and asp side chains with the distances of . Å, . Å, and . Å, respectively (fig. c & fig. s ). when compared to the room temperature structure of mpro ( wqf) our structure displays additional active site residue dynamics while it has an overall high similarity (fig. d) . canonical chymotrypsin-like proteases contain a catalytic triad composed of ser(cys)-his-asp(glu) in their catalytic region, however, sars-cov- mpro possesses a cys and his catalytic dyad which distinguishes the sars-cov- mpro from canonical chymotrypsin-like enzymes (gorbalenya & snijder., ; . during the catalysis, the thiol group of cys is deprotonated by the imidazole of his and the resulting anionic sulfur nucleophilically attacks the carbonyl carbon of the substrate. after this initial attack, an n-terminal peptide product is released by abstracting the proton from the his , resulting in the his to become deprotonated again and a thioester is formed as a result. in the final step, the thioester is hydrolyzed which results in a release of a carboxylic acid and the free enzyme; therefore, restoring the catalytic dyad (pillaiyar et al., ; ullrich & nitsche., ) . catalytic residue conformations of sfx structures are consistent with wqf and support the proposed mpro catalytic mechanism (fig. c, d) . the crystal contact of the symmetry-related molecule with the n-terminal region of the mpro is essential for the formation of the crystal lattice in the c space group ( cwb) (fig. a& fig. s ). 
& s for cwb and cwc, respectively) to determine the flexible regions. in simulations for both cwb and cwc, the protomers exhibit non-identical behavior as also observed by others in apo form of mpro (suarez and diaz., ; amamuddy et al., ) though as expected, higher rmsf values were observed for the loop regions of protomers. additionally, the protomers of cwc display higher fluctuations compared to cwb around the loops covering the active site (such as loops containing residues - and - ) which could affect the accessibility of the active site by inhibitor compounds. on the drug-binding pocket dynamics. the trajectory frames obtained from md simulations were used to perform pca to determine the variations of conformers of protein structures, i.e. to observe the slowest motions during md simulations. as pca and pca-based methods are useful to reveal intrinsically accessible movements such as domain motions (bahar et al., ) , we have performed pca for backbone atoms of dimeric units for the structures belonging to different space groups. we focused on the first three pcs, that show around % of the total variance in md trajectories to determine the regions of protein structures that display the highest variation (fig. s & s for cwb and cwc, respectively). the first three pcs were projected onto the protein structures to determine the contributions of each residue to specified pcs which displays the motions of specific regions with blue regions with higher thickness representing to more mobile structural parts of the protein along the specified pcs. it can be seen that both in c and p space group structures, again protomers a and b display asymmetric behavior and domain movements along all considered pcs ( fig. a-c for cwb and fig. d -f for cwc) which was also observed in residue dynamics for the same domain between alternate chains (amamuddy et al., ) . 
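the per-residue flexibility measure referred to above, the root-mean-square fluctuation (rmsf), can be illustrated with a minimal sketch. it assumes that the trajectory frames have already been superposed onto a common reference and are available as plain coordinate lists; the function name `rmsf` and the toy trajectory are hypothetical and are not taken from the md workflow used in this study.

```python
import math

def rmsf(frames):
    """Return the RMSF of each atom, in the same length unit as the input.

    frames: list of frames, each a list of (x, y, z) tuples, one per atom.
    Assumption: all frames are already superposed on a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy trajectory (hypothetical data): atom 0 is rigid, atom 1 oscillates along x.
toy_frames = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
toy_rmsf = rmsf(toy_frames)
```

in this toy case the rigid atom has an rmsf of zero while the oscillating atom has a non-zero value, which is the pattern expected when comparing a buried core residue with a loop residue.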
when the motions displayed by the two dimeric forms are compared to each other (supplementary movies m - for cwb and m -m for cwc), we observe that domain iii is more mobile in cwb (the crystals in the c space group) compared to cwc (the crystals in the p space group). however, the loop region containing residues - around the catalytic site is more mobile for both protomers in cwc while it is only mobile for the chain b of cwb. interestingly, protomer a of cwc is more mobile compared to protomer a of cwb, which could reflect the differences in the dimeric interface due to the additional n-terminal amino acids and missing interactions with chain b n-terminus residues in the dimer interface. however, the loop regions surrounding the binding pockets with residues - and - display higher flexibility in chain a of cwb while their mobility is somewhat restricted in cwc. we have also performed cross correlation analysis of residues along the pc space to understand the correlation between motions, especially for different domains. the correlations between the motions along the first three pcs were plotted as dynamical cross correlation maps displayed in in which motions of some residues are along the same directions, others are in opposite directions within both protomers a and b of cwb while domain ii has limited mobility as being more buried than other domains (suarez and diaz., ) except for the β-strand segment of residues - of protomer a. surprisingly, this segment also has some mobility, albeit limited, in protomer b of cwc instead of protomer a (supplementary movies m , fig. h ). glu of this segment is actually an important residue that plays a role in stabilizing the substrate binding site s by interacting with ser of the alternate protomer along with phe (ghahremanpour et al., ; and we observed that along pc , for protomer a of cwb and protomer b of cwc, the motions of these residues are correlated.
this could have implications about information transfer from one protomer to the other though other mpro structures of sars-cov- needs to be studied in atomistic details. in addition, we observe that correlations between domains of protomers a and b for cwb and cwc are dissimilar. for instance, domains i of cwb have defined anti-correlated motions as can be seen from the purple areas in . space groups of these structures are c , p , p and i respectively. the completeness of the structures was also evaluated, wqf, and y e crystallized in full-length sequence ( - ), however, w only lacks gln at both c-terminal ends of its chains. among compared structures, c y, which has the same space group as cwc has, and lacks amino acids (chain a, - , - , - , and ; chain b, - , - , - , and - ) . in our structures cwb has a full-length sequence, however, cwc lacks its last amino acid residues from c-terminal in chain a, and starts with phe (lacks ser and gly ), and ends at ser , lacks the last residues. when six structures ( w , y e, wqf, c y, cwb, and cwc) were compared based on hydrogen bonding interactions at the dimerization interfaces, more similarities were observed for the latter three ( c y, cwb, and cwc). the only difference between our structures and the other four is the hydrogen bond between ser (chain a) and gln (chain b) (fig. s ). although this hydrogen bond is not observed in other structures, the corresponding residues are close to each other, however they are not within hydrogen bonding distance we also monitored all interface interactions throughout the md simulations and compared cwb and cwc. interestingly, the hydrogen bond between ser (chain a) and gln (chain b) was lost and the interaction was turned into a van der waals interaction in cwb, and conserved as by % of the simulation time. in cwc structure, ser (chain a) was not in the vicinity of gln (chain b) (fig. s ). 
more interestingly, the same interaction but this time between gln (chain a) and ser (chain b), that is present at the cwb crystal structure, was only observed % of the simulation time ( fig. s ). in the cwc crystal structure, there was no hydrogen bonding interaction between gln (chain a) and ser (chain b). however during simulation, this bond was formed and retained in % of the simulation time (fig. s ). when static structures were compared based on hydrogen bond formation analysis, c y, cwb, and cwc clustered together, having the same hydrogen bonding network at the dimerization interface. the explanation for the hydrogen bond differences observed at the interface is the lack of amino acids at the n-terminal and c-terminal ends of the c y and cwc structures, compared to the cwb. simulation trajectories of cwb and cwc were compared based on hydrogen bond occupancies. the major differences were lys (chain a) and arg (chain b), asn (chain a) and ser (chain b), and arg (chain a) and tyr (chain b). the first interaction was not observed in cwc, however the latter two interactions were not presented in cwb. lys (chain a) and arg (chain b) was conserved % of the simulation time; asn (chain a) and ser (chain b); and arg (chain a) and tyr (chain b) retained % and % of the obtained trajectory frames, respectively (figs. s and s ). we also observed that during md simulations, in the case of cwc, gln (chain a) and phe (chain b) come closer to each other, and this van der waals contact was conserved % of the simulation time and not observed in the case of cwb (fig. s ). mpro. we used three well-known sars-cov- mpro inhibitors (i.e., ebselen, tideglusib, and carmofur) at the investigation of ligand-target interactions. these compounds were docked to the cwb and cwc structures and all-atom md simulations were performed for the top-docking poses. these three compounds were also docked to structures with the following pdb ids: w and y e, for comparison. 
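the interface hydrogen-bond occupancies quoted above are fractions of simulation frames in which a given contact is present. a minimal sketch of such a calculation is given below. it is an illustration only: the text does not state the geometric criteria that were applied, so the distance-only criterion, the 3.5 Å donor-acceptor cutoff, the function name `hbond_occupancy`, and the toy coordinates are all assumptions.

```python
import math

def hbond_occupancy(donor_xyz, acceptor_xyz, cutoff=3.5):
    """Fraction of frames in which the donor-acceptor distance is <= cutoff.

    donor_xyz, acceptor_xyz: per-frame (x, y, z) positions in angstroms.
    Assumption: a distance-only criterion; no donor-hydrogen-acceptor angle
    check is applied, so this overestimates occupancy relative to stricter
    definitions.
    """
    if len(donor_xyz) != len(acceptor_xyz):
        raise ValueError("donor and acceptor must cover the same frames")
    if not donor_xyz:
        raise ValueError("at least one frame is required")
    hits = sum(
        1 for d, a in zip(donor_xyz, acceptor_xyz) if math.dist(d, a) <= cutoff
    )
    return hits / len(donor_xyz)

# Toy data (hypothetical): the acceptor drifts away from a fixed donor.
donor = [(0.0, 0.0, 0.0)] * 4
acceptor = [(2.9, 0.0, 0.0), (3.2, 0.0, 0.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
occupancy = hbond_occupancy(donor, acceptor)
```

a contact that forms only during the simulation, as described for the gln (chain a) and ser (chain b) pair in cwc, would show zero occupancy in the starting frame but a non-zero fraction over the full trajectory.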
md simulations were performed using the same md protocol. results showed that, especially for apo form dimer targets, ebselen and carmofur are quite flexible at the binding pocket throughout the simulations and, in most of the cases, they do not form a stable complex structure ( fig. s ). tideglusib has a more stable structure at the binding pocket of mpro; however, its binding modes are different (fig. ) . in the following section, details of the md simulation results of tideglusib at the binding site of mpro will be discussed. comparison of binding pocket volumes of cwc and ye shows that the latter has a bigger average binding pocket volume. average binding pocket volumes were and Å , for the tideglusib bound ye and cwc structures, respectively. corresponding solvent accessible surface area (sasa) values throughout the md simulations also support this result. the sasa, which is the surface area of a molecule accessible by a water molecule, of cwc is smaller than that of y e ( fig. s ). tideglusib. tideglusib was initially docked to the binding pockets of the y e and cwc structures using the induced-fit docking (ifd) approach. top-docking poses were then used in all-atom md simulations using the same md protocols. while tideglusib was structurally very stable at the binding pocket of cwc during the simulations, it was not so stable at the y e binding site (fig. a ). representative trajectory frames (i.e., the frame that has the lowest rmsd to the average structures) were used in the comparison of binding modes (fig. b) . results showed that while the binding mode of tideglusib forms hydrogen bonds and pi-pi stacking interactions with glu , gln , and his , respectively, at cwc, its corresponding binding mode at y e only forms van der waals type interactions with hydrophobic moieties. a timeline of protein-ligand contacts was visualized throughout the simulations (fig. c ). 
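the selection of a representative trajectory frame described above (the frame with the lowest rmsd to the average structure) can be sketched as follows. this is a minimal illustration under stated assumptions: frames are taken to be already superposed, so no fitting step is performed, and the function names and toy coordinates are hypothetical rather than part of the md software used in this work.

```python
import math

def average_structure(frames):
    """Per-atom mean coordinates over all frames."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    return [
        tuple(sum(frame[i][k] for frame in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]

def rmsd(a, b):
    """RMSD between two equally sized coordinate sets, without superposition."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))

def representative_frame(frames):
    """Index of the frame closest (lowest RMSD) to the average structure."""
    avg = average_structure(frames)
    return min(range(len(frames)), key=lambda i: rmsd(frames[i], avg))

# Toy trajectory (hypothetical): atom 0 moves along x, atom 1 stays fixed.
toy_frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
best = representative_frame(toy_frames)
```

the average structure itself is generally not a physically realized conformation, which is why an actual frame near it is chosen for the binding-mode comparison.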
results showed that thr , leu , met , glu , and gln form stable interactions with the ligand. d ligand atom interactions with protein residues are also represented (fig. d ). interactions that occur more than % of the simulation time are shown. however, corresponding interactions of the ligand at the y e were not stable ( fig. s ). forms of mpro, we applied ifd protocol to predict the binding mode of tideglusib at the binding pocket of wqf and cwb. carbonyl oxygens of the thiadiazolidine ring of the tideglusib formed hydrogen bonds between asn , gly , and glu from their backbone atoms. the naphthalene ring of the ligand formed a pi-pi stacking interaction with the his in the cwb structure. a similar binding mode of tideglusib was observed when wqf was used. however, in this binding mode, his and gly are found to be important. a backbone hydrogen bond was observed between one of the carbonyl oxygen of the thiadiazolidine ring and gly . his formed two pi-pi interactions between the benzyl and the naphthalene rings of the tideglusib (fig. s a, b) . these two binding modes predicted by ifd were used in ns classical all-atom md simulations. each mpro-tideglusib system is evaluated based on rmsd changes from the average structure and we obtained representative structures from corresponding trajectories. in the representative structure of cwb-tideglusib complex, we observed van der waals interactions between surrounding residues, thr , his , cys , thr , ser , met , asn , cys , his , his , met , glu , and gln . in the representative structure of wqf-tideglusib complex, in addition to the similar van der waals sfx utilizes micro-focused, ultrabright and ultrafast x-ray pulses to probe small crystals in a serial fashion. structural information is obtained from individual snapshots; capturing bragg diffraction of single crystals in random orientations (martin-garcia et al., ) . 
the main advantage of sfx over its counterparts is the capability of working with micron- to nanometer-sized crystals, which removes the need for lengthy and laborious optimization steps and enables working with multiple crystal forms and space groups. it enables obtaining high-resolution structures at physiologically meaningful temperature and confirms the dynamic regions of the active site without secondary radiation damage. sfx offers great potential and provides much-needed critical information for future high-throughput structural drug screening and computational modeling studies: it is sensitive to dynamics and insensitive to potential radiation-induced structural artifacts, resulting in the production of detailed structural information. sars-cov- mpro catalyzes the precise cleavage events responsible for activation of viral replication and structural protein expression . it has been the focus of several structural and biochemical studies, many of which have been performed at cryogenic temperatures, aiming to provide better understanding of the active site dynamics and reveal an inhibitor that affects the enzyme based on structural information. two crystal forms of mpro, native and modified, were determined at ambient-temperature with resolutions of . Å and . Å respectively. the two forms produced are optimal for co-crystallization and soaking respectively. co-crystallization experiments provide efficient interaction in the binding pocket as both drug and protein are stabilized before the formation of crystals. due to the close crystal lattice contacts, co-crystallization seems to be the preferred method for the native form of mpro (c ). the n-terminus of mpro plays a critical role in crystal packing. elimination of the h-bonding network produced orthorhombic crystals in the p space group, yielding a wider binding pocket and increasing the probability of capturing an expanded number of protein-drug complexes by soaking.
an added advantage of having two different crystal forms was the elimination, in the dynamics analysis, of artifacts introduced by the specific lattice packing restraints of each crystal form. the high-resolution mpro sfx structures presented here in two different crystal forms collectively revealed the intrinsic plasticity and dynamics around the enzyme's active site. due to the anionic nature of cys , it seems challenging to design molecules that interact with the active site only through non-covalent bonds, as it has a very flexible environment. these findings provide a structural basis for, and are consistent with, studies claiming that the majority of inhibitors form covalent bonds with the active site of mpro, mainly through cys (figs. ). in particular, unlike in some studies, ebselen does not form a stable complex structure (sies et al., ; menendez et al., ; zmudzinski et al., ; węglarz-tomczak et al., ). in addition to the importance of the cys residue, our two structures contain a coordinated w molecule that regulates the catalytic reaction via triple hydrogen-bonding interactions with his and with the asp that stabilizes the positive charge of the his residue. these active site residue conformations in the sfx structures are consistent with the previous ambient-temperature structure ( wqf) (fig. c, d). gln , which contributes to the stability of the ligands, along with asn and ser , are the active site residues forming the flank of the cavity (fig. ). in a recent study, asn and gln were indicated to interact with the a and b inhibitors, which have proven in vitro effectiveness. along with these amino acids, thr , which makes a van der waals interaction with the n inhibitor, ser , which undergoes conformational changes in the presence of this inhibitor (pdb id: lu ), and leu , which binds n , showed different side-chain conformations in the sfx structures relative to the wqf structure (fig. b).
there were differences in the crystallization conditions between the sfx and wqf studies, the former making use of charge effects and the latter of molecular crowding. the sfx study is also free of secondary radiation damage, eliminating potential artifacts when examining crucial amino acids at the atomic level, especially in the catalytic and inhibitor binding sites (figs. s -s ). these two high-resolution sfx structures in different space groups reveal new active site residue conformations and intra- and inter-domain networks and their dynamics at the atomic level, which helps us to better understand any related structural allosteric transitions of mpro interacting with inhibitors (fig. & figs. s - ). given the sensitivity required in drug design and in studies of natural compounds, these active site residue conformations underline the importance of our study. many in silico docking studies have been performed based on cryogenic mpro protein structures of sars-cov- ton et al., ; pillaiyar et al., ; durdagi et al., ). although potent antiviral drug candidates have been identified and vaccine research is still ongoing, these efforts have not yet yielded a desirable final treatment or cure. in pursuit of effective drugs against covid , the key role of mpro in the viral replication of sars-cov- , its highly conserved structure, and the low toxicity of antiviral molecules targeting this protein, owing to the absence of a homolog of this protease in humans, made mpro the target of our study. having access to alternative ambient-temperature structures of mpro and the observed conformational changes of active site residues will be a significant boon for the development of therapeutics and will provide a better understanding of ligand and inhibitor binding. at this point, our work has two original aspects.
firstly, we used a comprehensive platform, sfx, which helps to deeply understand the complexity of sars-cov- , to gain access to the high-resolution, radiation damage-free structure and the structural dynamics of the target protein mpro at an unprecedented level at near-physiological temperatures. secondly, we determined two high-resolution sfx structures of sars-cov- mpro in two different space groups thanks to the new high-throughput data collection setup offered by the mfx instrument of the lcls-ii.

drug repurposing has been the preferred area of research when time and resources are insufficient, as in emergencies such as a novel pandemic. the most important advantage of repurposing research is that it bypasses several lengthy early stages of drug development. also, knowing the beneficial and detrimental effects of the targeted drugs, along with well-established precautions, helps with time limitations. this procedure has emerged as a fundamental and very strategic approach not only for prospective cohort design but also for many types of clinical trials, particularly cross-sectional studies: because the molecules considered in repurposing studies have passed through several stages and have well-defined profiles, they do not require prolonged pre-clinical studies, and hence they are important candidates to consider in case of disease emergencies or outbreaks (su et al., ; cavalla et al., ; choo et al., ; aguila et al., ). therefore, with its enormous contribution to clinical studies, repurposing is a rapid step towards the conclusion of not only randomized controlled trials but also critical structural biology investigations (bumb et al., ; pihan et al., ; choudhary et al., ). considering all this, adopting the drug repurposing approach and using the known inhibitors ebselen, tideglusib, and carmofur to carry out mpro-based in silico molecular docking, md simulations and post-md analyses makes our combined study more specific.
a grave issue in drug research is the variability of the target proteins, as mutations and modifications may render the drugs found ineffective (dinesh et al., ). for this reason, it is only sensible to work on a protein whose biochemical properties are conserved over time and among different strains. evolutionarily, viruses try to hide by mimicking proteins involved in the functioning of the host organism. for example, cytomegalovirus (hcmv) can mimic a common host protein to hijack normal cell growth machinery, and human immunodeficiency virus (hiv) can mimic a high percentage of human t cell receptors (bernstein, ; robertson, ). sars-cov- contains the plpro enzyme, which is highly similar to the deubiquitinating enzymes (usp , usp ) in human metabolism (ratia et al., ); targeting this enzyme therefore carries a high risk. in addition, the spike protein, another targeted protein, has a high mutation rate. the spike protein also includes a restriction site similar to that of the epithelial channel protein (anand p et al., ). moreover, inhibitors that target the spike protein may act only preventively (i.e., before infection); if the virus has already infected the host cell, targeting this region may not be useful. because there is no mpro homolog in the human genome, targeting this protease is safer for humans, with reduced cross-reactivity and side effects, making it an ideal candidate for drug therapy.

contributions of residues to the first three pcs for a-c) cwb and d-f) cwc with chain a displayed on the right and chain b on the left side. the aligned trajectory frames were generated to interpolate between the most dissimilar structures in the distribution along the specified pcs. the color scale from red to blue represents low to high displacements along the specified pcs, and broadening of the tubes depicts the trajectory movements.
the cα atoms of catalytic residues his and cys are displayed as spheres with cyan and lime colors, respectively. g-h) dynamic cross correlation matrix generated from the motions observed in pc space with values ranging from - (complete anticorrelation) to + (complete correlation) g) for cwb and h) for cwc. the boundary between chain a and b is denoted by dashed lines. bacterial vector by using ndei and bamhi restriction cleavage sites at ' and ' ends respectively. nterminal canonical sars-cov- mpro autocleavage cut site is indicated by green and purple which generates the native n-terminus. c-terminus has the prescision tm restriction site shown in red which is used to generate the native c-terminus after ni-nta hexa-histidine affinity purification chromatography. in-frame hexa-histidine tag and stop codon is shown in blue color. standard chromatography purification methods were applied to both constructs with slight modifications as described below. soluble mpro proteins were purified by first dissolving the bacterial cells in the lysis buffer containing mm tris ph . , mm nacl, % v/v glycerol supplemented with . % triton x- followed by sonication (branson w sonifier, usa). after sonication step, cell lysate was centrifuged by using the beckman optima™ l- xp ultracentrifuge at rpm for minutes at °c by using ti rotor (beckman, usa). after ultracentrifugation the pellet which contains membranes and insoluble debris was discarded and clear supernatant applied to nickel affinity chromatography by using a ni-nta agarose resin (qiagen, usa). to purify the mpro protein, first the chromatography column was equilibrated by flowing column volume of the loading buffer containing mm tris-hac ph . , mm imidazole, mm nacl. after equilibration, the supernatant containing the overexpressed mpro protein was loaded into the ni-nta agarose column at ml/minute flow rate. 
unbound proteins were removed by washing with column volumes of the loading buffer to clear the non-specific binding. after washing, hexa-histidine tagged mpro proteins were eluted from the column with the elution buffer containing mm tris-hac ph . , mm nacl, mm imidazole in ml of total volume. after elution, the purified protein was placed in a kda cut-off dialysis membrane and dialyzed overnight against buffer containing mm tris-hac ph . , mm nacl to remove excess imidazole. after the dialysis step we applied c protease (prescission protease, genscript, usa) at a : stoichiometric molar ratio to cleave the c-

for initial crystallization screening, we employed the sitting-drop microbatch under oil screening method using well terasaki crystallization plates (greiner-bio, germany). purified mpro protein at mg/ml was mixed at a : volumetric ratio with ~ commercially available sparse matrix crystallization screening conditions. the sitting drop solutions were then covered with μl of % paraffin oil (tekkim kimya, turkey). all crystallization experiments were performed at ambient temperature. for our native construct- we obtained multiple hit conditions, and the best crystals were obtained in pact premier tm crystallization screen condition # from molecular dimensions, uk. the best crystallization condition contained mm mmt buffer ph . and % w/v peg [mmt buffer; dl-malic acid, -morpholine ethane sulfonic acid (mes) monohydrate, -amino- -(hydroxymethyl)- , -propanediol (tris)-hcl]. for the modified construct only one crystallization condition yielded macrocrystals. after multiple rounds of optimization of the seeding protocol using crystals obtained by microbatch under oil, we scaled up the batch crystallization volume to a total of ml for native construct- and a total of ml for modified construct- .
microcrystals - × - × - μm in size were passed through micron plastic mesh filters (millipore, usa) in the same mother liquor composition to eliminate large single crystals and other impurities before data collection. crystal concentration was approximated to be - particles per ml based on light microscopy. due to covid travel restrictions, none of the initial crystals or the batched crystalline slurry could be pretested for diffraction quality before the scheduled xfel beamtime. . ml total volume of crystal slurry was transferred to a ml screw top cryovial (wuxi nest biotechnology, china cat# ). to absorb mechanical shocks during transport from istanbul to menlo park, ca, these vials were wrapped loosely in kimwipes (kimberly-clark, usa), placed in ml screw top glass vials and tightly closed to provide insulation during transport by air. the vials were wrapped with excess amounts of cotton (ipek, turkey) and placed in a ziploc tm bag (sc johnson, usa) to provide both an added layer of insulation and mechanical shock absorption. the ziploc tm bags were placed in a styrofoam box that was padded with ~ kg of cotton to provide more insulation and mechanical shock absorption during transport. the styrofoam box was sealed, wrapped with an additional cm thick layer of loose cotton, and duct-taped all around to further insulate the delicate mpro crystals during ambient-temperature transport. with these packing materials and techniques, the crystals diffracted to . Å - . Å resolution as described below.

the . ml sample reservoir was loaded with mpro crystal slurry in its unaltered mother liquor as described above. we used a standard microfluidic electrokinetic sample holder (mesh) injector (sierra et al., ; sierra et al., ) for sample injection. the sample capillary was a μm id × μm od × . m long fused silica capillary. the applied voltage on the sample liquid was typically - v, and the counter electrode was grounded.
the sample ran typically between . and μl/min. the sfx experiments with native mpro microcrystals were carried out at the lcls (beamtime id: mfx ) at the slac national accelerator laboratory (menlo park, ca). the lcls x-ray beam, with a vertically polarized pulse of fs duration, was focused using compound refractive beryllium lenses to a beam size of ~ × μm full width at half maximum (fwhm) at a pulse energy of . mj, a photon energy of . kev ( . Å) and a repetition rate of hz. om monitor and psocake (damiani et al., ; thayer et al., ) were used to monitor crystal hit rates, analyze the gain switching modes and determine the initial diffraction geometry of the new epix k m detector. a total of , , detector frames were collected in h m s continuously with the new epix k m pixel array detector from native (construct- ) mpro microcrystals. a total of , detector frames were collected in h m s continuously with the new epix k m pixel array detector from modified (construct- ) mpro microcrystals. the total beamtimes needed for the native (construct- ) and modified (construct- ) datasets were h m s and h m s, respectively, which shows the efficiency of the mfx beamline equipped with the new epix k m detector and a robust injector system: because there were no blockages, no dead time was accumulated. individual diffraction pattern hits were defined as frames containing more than bragg peaks with a minimum signal-to-noise ratio larger than . , giving a total of , and , images for native and modified, respectively. the detector distance was set to mm, with an achievable resolution of . Å at the edge of the detector ( . Å in the corner). an example diffraction pattern is shown in fig. s . the diffraction patterns were collected at the mfx instrument at the lcls using the epix k m detector. the raw data images were subjected to detector corrections and to hit finding based on bragg reflections with cheetah (barty et al., ).
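as an illustration of the hit definition given above (a frame counts as a hit when it contains more than a given number of bragg peaks whose signal-to-noise ratio exceeds a threshold), a minimal python sketch follows. it is not the cheetah implementation; the function names `is_hit` and `hit_rate` and the threshold values in the example are hypothetical, since the actual values are not recoverable from the text.

```python
from typing import Sequence


def is_hit(peak_snrs: Sequence[float], min_snr: float, min_peaks: int) -> bool:
    """Return True if more than `min_peaks` peaks have an SNR larger than `min_snr`."""
    strong = [s for s in peak_snrs if s > min_snr]
    return len(strong) > min_peaks


def hit_rate(frames: Sequence[Sequence[float]], min_snr: float, min_peaks: int) -> float:
    """Fraction of frames classified as hits (0.0 for an empty run)."""
    if not frames:
        return 0.0
    hits = sum(is_hit(f, min_snr, min_peaks) for f in frames)
    return hits / len(frames)


# Hypothetical example: three frames, each a list of per-peak SNR values.
frames = [
    [9.1, 7.4, 6.2, 8.8],  # four strong peaks
    [2.0, 1.5],            # weak peaks only
    [],                    # empty frame
]
# Thresholds below are placeholders, not the values used in the study.
rate = hit_rate(frames, min_snr=5.0, min_peaks=3)
```

the hit rate computed this way is simply the number of hit frames divided by the number of collected frames, which is the quantity monitored during data collection.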
the hit-finding parameters used to classify a hit were the same for all datasets (using peakfinder ): a minimum pixel count of above an adc-threshold of with a minimum signal-to-noise ratio of was considered a peak, and an image containing at least peaks was classified as a crystal hit. the crystal hits were then indexed using the software package crystfel version . (white et al., ) using the peaks found by cheetah. indexing was attempted using the indexing algorithms xgandalf (gevorkov et al., ), dirax (duisenberg et al., ), mosflm (powell et al., ) and xds (kabsch et al., ), in this order. after an approximate cell was found, the data were indexed using cell axis tolerances of Å and angle tolerances of º (--tolerance option in crystfel). the integration radii were set to , , and the "multi" option was switched on to enable indexing of multiple crystal lattices in a single image. the indexed reflections were subsequently integrated and merged using partialator, applying the unity model over iterations with max-adu set to . the complete reflection intensity list from crystfel was then scaled and cut using the truncate program from the ccp suite (winn et al., ) prior to further processing. for the native mpro protein crystals the final set of indexed patterns, containing , frames ( . % indexing rate), was merged into a final dataset (overall cc* = . ; . Å cutoff) for further analysis (c , unit cell: a = . Å, b = . Å, c = . Å; α = °, β = °, γ = °). the final resolution cutoff was estimated to be . Å using a combination of cc* (karplus & diederichs, ) and other refinement parameters. the final dataset had overall rsplit = . %, and cc* = . in the highest resolution shell. for the n-terminally modified mpro protein crystals the final set of indexed patterns, containing , frames ( . % indexing rate), was merged into a final dataset (overall cc* = . ; . Å cutoff) for further analysis (p , unit cell: a = . Å, b = . Å, c = . Å; α = β = γ = °).
the final resolution cutoff was estimated to be . Å using a combination of cc* and other refinement parameters. the final dataset had overall rsplit = . %, and cc* = . in the highest resolution shell.

we determined two ambient-temperature mpro structures from two crystal forms, in space groups c and p , using the automated molecular replacement program phaser (mccoy et al., ) implemented in phenix (adams et al., ) with the previously published ambient-temperature structure as a search model (pdb id: wqf). this choice of starting search model minimized experimental temperature variations between the two structures. coordinates of wqf were used for initial rigid body refinement with the phenix software package. after simulated-annealing refinement, individual coordinates and tls parameters were refined. we also performed composite omit map refinement implemented in phenix to identify potential positions of altered side chains and water molecules; these were checked in the program coot (emsley & cowtan, ), and positions with strong difference density were retained. water molecules located outside of significant electron density were manually removed. the ramachandran statistics for the native monoclinic mpro structure (pdb id: cwb) (most favored / additionally allowed / disallowed) are . / . / . %, respectively. ramachandran statistics for the orthorhombic mpro structure (pdb id: cwc) (most favored / additionally allowed / disallowed) are . / . / . %, respectively. the structure refinement statistics are summarized in supplementary table s . structure alignments were performed using the alignment algorithm of pymol (www.schrodinger.com/pymol) with the default σ rejection criterion and five iterative alignment cycles. all x-ray crystal structure figures were generated with pymol.
the two ambient-temperature mpro structures in space groups c and p were used to generate ellipsoid representations based on b-factors with pymol, and these two structures were compared with the mpro structure at k (pdb id: xkh) to provide a better understanding of the flexibility of atoms, side chains and domains. all ellipsoid structures were colored with the rainbow selection in pymol.

we used different crystal structures of mpro available in the literature as well as the crystal structures obtained in this study (pdb ids: cwb and cwc) as target structures for molecular docking and md simulations. the biologically relevant dimeric form of cwb was generated by application of a symmetry operator. as the crystal structures in this study were obtained at ambient temperature, another ambient-temperature structure of mpro (pdb id: wqf) in apo form was also selected as a target structure for comparison. another apo form structure of mpro (pdb id: y e) in dimeric form was also chosen for comparison. additionally, an mpro structure bound to a non-covalent inhibitor (pdb id: w ) in both monomeric and dimeric forms was used as a target structure. for ligands, we considered three compounds that showed promising inhibitory activity in the high-throughput screening of over , compounds by jin et al., namely ebselen (ic = . ± . μm), tideglusib (ic = . ± . μm) and carmofur (ic = . ± . μm). all the target structures considered in this study were first prepared using the protein preparation module of the maestro modeling program, in which missing atoms were added, water molecules not in the vicinity of co-crystallized ligands were removed and bond orders were assigned (madhavi sastry et al., ). the protonation states of amino acids at physiological ph were adjusted using propka (bas et al., ) to optimize the hydrogen bonding and charge interactions. as a final step of preparation, a restrained minimization was performed with opls e force field parameters (harder et al., ).
the structures of the three compounds were taken from pubchem: ebselen (pubchem id: ), tideglusib (pubchem id: ) and carmofur (pubchem id: ). the compounds also needed preparation; hence, the ligprep module (schrödinger release - , ) of the maestro modeling program was employed with opls e force field parameters (harder et al., ). the ionization states of the molecules were predicted by the epik module (shelley et al., ) at physiological ph of . . the prepared target protein and ligand structures were used for molecular docking studies. we employed a grid-based docking method, the induced fit docking (ifd) protocol of maestro (sherman et al., a,b), which uses glide (halgren et al., ; friesner et al., ) and prime (jacobson et al., )

the selected docking poses at each considered target structure of mpro with the three compounds were subjected to md studies. the apo form structures obtained in this study were also subjected to md simulations. for comparison, md simulations were also performed for the holo form structure (pdb id: w ) with its co-crystallized ligand, x . the target protein-ligand complexes were placed in orthorhombic simulation boxes whose sizes were calculated from a buffer distance of . Å along all three dimensions and solvated with explicit water molecules of the spc model (berendsen et al., ). the simulation systems were neutralized by the addition of counter ions (na+ or cl− depending on the charge of the systems), and . m nacl was added to adjust the salt concentration of the solvent. the all-atom md simulation package desmond (bowers et al., ) was employed.
preceding the production md simulations, the systems were equilibrated using the relaxation protocols of the desmond package, which consist of a series of minimizations and short md simulations. in the initial stages these are performed with small time-steps at lower temperature and with restraints on the non-hydrogen solute atoms; the time-steps and simulation temperature are then slowly increased and the restraints on solute atoms are released. the production simulations were performed under constant pressure and temperature conditions, i.e. the npt ensemble. the temperature was set to k and controlled by the nose-hoover thermostat (nosé, ; hoover, ). the pressure was set to atmospheric pressure of . bar with isotropic pressure coupling and controlled by the martyna-tobias-klein barostat (martyna et al., ). the smooth particle mesh ewald method (essmann et al., ) was used to calculate long-range electrostatic interactions with periodic boundary conditions (pbc). for short-range electrostatics and lennard-jones interactions, the cut-off distance was set to . Å. the multi-step integrator respa was employed, in which the time steps varied by interaction type as follows (in fs): bonded, . ; near, . ; and far, . .

principal component analysis (pca), a statistical data processing method, was performed to reduce the high-dimensional data by extracting large-amplitude motions onto collective sets. a covariance matrix was generated from the md trajectory data for the backbone atoms of the protein structures as follows:

$$C_{ij} = \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle$$

here, i and j represent the backbone atom numbers, i.e. residue numbers of the proteins, while n is the number of backbone atoms considered in the analysis. the cartesian coordinates of the ith and jth atoms are denoted by $x_i$ and $x_j$, respectively, with $\langle x_i \rangle$ and $\langle x_j \rangle$ representing the time-averaged values over the md simulations. by diagonalization of the covariance matrix, a collection of eigenvectors and corresponding eigenvalues was obtained.
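the covariance construction and diagonalization described above can be sketched in a few lines of numpy. this is only an illustration under stated assumptions, not the bio d workflow used in the study: the function name `pca_from_trajectory` is hypothetical, the trajectory is assumed to be already aligned to a reference frame, the covariance is taken over the 3n cartesian coordinates of the selected atoms, and the example data are synthetic.

```python
import numpy as np


def pca_from_trajectory(coords: np.ndarray):
    """PCA of an aligned trajectory.

    coords: array of shape (n_frames, n_atoms, 3), assumed pre-aligned.
    Returns eigenvalues (descending), eigenvectors (columns) and the
    projection of every frame onto the eigenvectors.
    """
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)       # flatten to (n_frames, 3N)
    dx = x - x.mean(axis=0)                # subtract time-averaged coordinates
    cov = dx.T @ dx / n_frames             # C_ij = <(x_i - <x_i>)(x_j - <x_j>)>
    evals, evecs = np.linalg.eigh(cov)     # symmetric matrix, ascending order
    order = np.argsort(evals)[::-1]        # sort by decreasing variance
    evals, evecs = evals[order], evecs[:, order]
    proj = dx @ evecs                      # frame coordinates in PC space
    return evals, evecs, proj


# Synthetic example: 200 frames, 4 atoms, one coordinate with large amplitude.
rng = np.random.default_rng(0)
traj = rng.normal(size=(200, 4, 3))
traj[:, 0, 0] *= 10.0
evals, evecs, proj = pca_from_trajectory(traj)
explained = evals / evals.sum()            # fraction of variance per PC
```

in this sketch the variance of the frames projected onto a given pc equals the corresponding eigenvalue, which is why the eigenvalue spectrum can be read as the distribution of variance over pcs.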
the eigenvectors of the diagonalized matrix are referred to as principal components (pcs) and constitute a linear basis set that matches the distribution of observed structures. the corresponding eigenvalues give the variance of the distribution along each pc. in this study, we used the bio d package (grant et al., ; yao et al., ), a platform-independent r package, to perform pca for the considered simulation systems. the trajectories obtained from independent md simulations were concatenated, and frames were aligned with respect to the initial (reference) frame before pca.

the correlation of atomic displacements was evaluated by cross-correlation analysis to assess the coupling of motions. the magnitudes of all pairwise cross-correlation coefficients were investigated to assess the extent of atomic displacement correlations for each simulation system in principal component space. the normalized covariance matrix of atomic fluctuations was calculated as

$$C_{ij} = \frac{\langle \Delta r_i \cdot \Delta r_j \rangle}{\sqrt{\langle \Delta r_i^2 \rangle \langle \Delta r_j^2 \rangle}}$$

where $\Delta r_i$ and $\Delta r_j$ are the displacements of residues i and j from their mean positions, and $\langle \Delta r_i^2 \rangle$ and $\langle \Delta r_j^2 \rangle$ are the mean square atomic fluctuations. the values of $C_{ij}$ vary between -1 and 1, with 1 representing completely correlated motions (same period and same phase), -1 representing completely anticorrelated motions (same period and opposite phase), and 0 indicating that the motions are uncorrelated (ichiye et al., ; mccammon et al., ). the bio d package (grant et al., ; yao et al., ) in the r environment was employed to generate atom-wise cross-correlations of motions observed in pcs to , and the dynamical cross-correlation map (dccm) was generated and displayed as a graphical representation of the cross-correlation coefficients.

interface analysis of crystal structures and md trajectories was carried out with the getcontacts python scripts (https://getcontacts.github.io/).
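returning to the cross-correlation analysis described earlier in this section, the normalized covariance of atomic displacements can be sketched as below. this is a minimal numpy illustration, not the bio d routine: the function name `dccm` is hypothetical, the correlations are computed directly from cartesian displacements rather than from motions restricted to selected pcs as in the study, the trajectory is assumed to be pre-aligned, and every atom is assumed to have non-zero fluctuation.

```python
import numpy as np


def dccm(coords: np.ndarray) -> np.ndarray:
    """Dynamical cross-correlation matrix of an aligned trajectory.

    coords: array of shape (n_frames, n_atoms, 3).
    Returns an (n_atoms, n_atoms) matrix with entries in [-1, 1].
    """
    dr = coords - coords.mean(axis=0)                 # displacement from mean position
    # <dr_i . dr_j>: dot product over x, y, z, averaged over frames
    cov = np.einsum("tic,tjc->ij", dr, dr) / coords.shape[0]
    msf = np.diag(cov)                                # mean square fluctuations <dr_i^2>
    return cov / np.sqrt(np.outer(msf, msf))


# Synthetic example: atoms 0 and 1 move together, atom 2 moves oppositely,
# atom 3 moves independently.
rng = np.random.default_rng(1)
d = rng.normal(size=(500, 3))
traj = np.zeros((500, 4, 3))
traj[:, 0] = d
traj[:, 1] = d + np.array([3.0, 0.0, 0.0])            # same motion, shifted position
traj[:, 2] = -d
traj[:, 3] = rng.normal(size=(500, 3))
corr = dccm(traj)
```

plotting `corr` as a heat map, with rows and columns ordered by residue and chain, gives the kind of map referred to as a dccm.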
two different approaches were followed: in the first, only hydrogen bonds at the dimerization interface were taken into account, and in the second, all possible interactions, namely salt bridges, pi-cation, pi-pi stacking, t-stacking, van der waals (vdw), and hydrogen bonds, were calculated. a hydrogen bond is defined between atom groups if the distance between the acceptor and donor atoms is < . Å and the angle is < °. salt bridges were defined between atoms of negatively charged [asp (od , od ) and glu (oe , oe )] and positively charged [lys (nz) and arg (nh , nh )] residues where distances were < . Å. t-stacking, pi-cation, and pi-stacking distance criteria were . , . , and . Å, respectively. hydrophobic and vdw interactions were calculated based on atomic radii (r): two atoms are in contact if the distance between them is less than the sum of r of atom a, r of atom b, and . Å. both single-frame and trajectory-based calculations were performed. single-frame calculations were applied to the crystal structures, where means the corresponding residues are in contact and means no interaction. in trajectory-based calculations, we used all available frames from our md trajectories. for the interaction frequencies, we applied a . threshold to take into account only the contacts that occurred in at least % of the simulation.

fig. s . superposition of two crystal forms. native mpro in space group c is colored in darksalmon and its symmetry mate in green. modified mpro in space group p is colored in palecyan and light blue. the two crystal structures align with an overall rmsd of . Å.

domain i is colored in light blue, domain ii is colored in pale cyan and domain iii is colored in dark salmon. the active site, formed by residues from all three domains, is shown with the critical cys as spheres and his , his and asp as sticks. the red sphere labeled w represents the water molecule in the binding pocket. s .
projection of md trajectory frames onto subspaces defined by the first three largest pcs as well as distribution of variance observed for eigenvalues for cwc. instantaneous conformations (i.e. trajectory frames) are colored from blue to red in order of trajectory time ( ns). the black triangle represents the initial crystal structure conformation projected onto the specified pcs.

domain i residues are depicted in light blue, and domain ii and domain iii in pale cyan and dark salmon, respectively. blue lines show hydrogen bonding; red lines lack hydrogen bonding.

tables

supplementary table s . data collection and refinement statistics for x-ray crystallography. datasets were collected by serial crystallography. the highest resolution shell is shown in parentheses.

identification of potent covid- main protease (mpro) inhibitors from flavonoids cultivation of a novel type of common-cold virus in organ cultures genomic characterization of a novel sars-cov- order -nidovirales" in virus taxonomy middle east respiratory syndrome coronavirus (mers-cov) origin and animal reservoir susanna lau predicting commercially available antiviral drugs that may act on the novel coronavirus (sars-cov- ) through a drug-target interaction deep learning model molecular characterization of ebselen binding activity to sars-cov- main protease gc- , and calpain inhibitors ii, xii inhibit sars-cov- viral replication by targeting the viral main protease genome organization and structural aspects of the sars-related virus cytokine release syndrome in severe covid- : interleukin- receptor antagonist tocilizumab may be the key to reduce mortality antiviral drug targets of single-stranded rna viruses causing chronic human diseases predictive methods in drug repurposing: gold mine or just a bigger haystack?
hd acknowledges support from national science foundation (nsf) science and technology centers grant nsf- (biology with x-ray lasers, bioxfel) and the scientific and technological research council of turkey (tubitak) grant ( c ). hd would like to thank michelle young, ritu khurana, lori anne love and tracy chou for their invaluable support and discussions. use of key: cord- -mg dziuw authors: carneiro, joão; gomes, catarina; couto, cátia; pereira, filipe title: cov id: detection and therapeutics oligo database for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mg dziuw the ability to detect the sars-cov- in a widespread epidemic is crucial for screening of carriers and for the success of quarantine efforts. 
methods based on real-time reverse transcription polymerase chain reaction (rt-qpcr) and sequencing are being used for virus detection and characterization. however, rna viruses are known for their high genetic diversity, which poses a challenge for the design of efficient nucleic acid-based assays. the first sars-cov- genomic sequences already showed novel mutations, which may affect the efficiency of available screening tests, leading to false-negative diagnoses or inefficient therapeutics. here we describe cov id (http://covid.portugene.com/), a free database built to facilitate the evaluation of molecular methods for detection of sars-cov- and treatment of covid- . the database evaluates the available oligonucleotide sequences (pcr primers, rt-qpcr probes, etc.) considering the genetic diversity of the virus. updated sequence alignments are used to constantly verify the theoretical efficiency of available testing methods. detailed information on available detection protocols is also provided to help laboratories implementing sars-cov- testing. the sars-cov- genome consists of a single, positive-stranded rna with approximately nucleotides. thousands of genomic sequences are now available in public databases as the epidemic progresses. the great adaptability and infection capacity of rna viruses depend in part on their high mutation rates. as expected, available sars-cov- genomic sequences already show a large number of polymorphisms. many techniques use molecules that interact with the virus rna genome or the reverse transcribed dna, either for clinical testing, diagnosis or determination of viral loads. for example, pcr primers and rt-qpcr probes are being widely used to detect sars-cov- . it is likely that oligonucleotides complementary to the virus rna will be tested as possible antiviral agents. 
however, polymorphisms can be a challenge for the efficiency of available assays since they may lead to false-negative results in detection tests or inefficient therapeutics. the cov id database (http://covid.portugene.com/) uses java graphics and dynamic tables and works with major web browsers (e.g. internet explorer, mozilla firefox, chrome). the database provides descriptive webpages for each oligonucleotide and a search engine to access dynamic tables with numeric data and multiple sequence alignments. a local sqlite database is used for data storage and runs on an apache web server. the dynamic html pages were implemented using cgi-perl and javascript and the dataset tables using the jquery plugin datatables v . . (http://datatables.net/). in-house python and perl algorithms were written and used to perform identity and pairwise calculations. the oligonucleotides were retrieved from peer-reviewed publications [ ] and protocols provided by the world health organization (who). each oligonucleotide has a specific database code (for example, cov id ). the cov id database ranks the oligonucleotides: the 'cov id ranking score' considers the mean value of the three different measures (pis, 'pis and ppi), as previously described. the sars-cov- 'isolate wuhan-hu- ' (nc_ . ) was used as reference. were obtained from the ncbi genbank (https://www.ncbi.nlm.nih.gov/genbank/sars-cov- -seqs/) and the gisaid initiative the chan_rdrp_gene_r) has a cov id score of . %, but was designed to target all sars-related coronaviruses, which explains its high conservation. in general, available sars-cov- oligonucleotides diverge from other human coronaviruses at several positions, and are therefore unlikely to cause false positives. 
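the ranking score described above (the mean of three identity measures) can be illustrated with a minimal python sketch. this is not the database's in-house code: the function names are hypothetical, and the definitions used here are assumptions made for illustration, namely that pis is the percentage of oligonucleotide positions conserved in every aligned target, that the 3'-end measure is the same quantity over the last five bases, and that ppi is the mean pairwise identity between the oligonucleotide and each target. targets are assumed to be pre-aligned, gap-free, in the same orientation and of the same length as the oligonucleotide.

```python
from statistics import mean


def pis(oligo, targets):
    """Percentage of oligo positions at which every target carries the same base."""
    conserved = sum(
        all(target[i] == base for target in targets)
        for i, base in enumerate(oligo)
    )
    return 100.0 * conserved / len(oligo)


def pis_3prime(oligo, targets, window=5):
    """PIS restricted to the last `window` bases (window size is an assumption)."""
    return pis(oligo[-window:], [target[-window:] for target in targets])


def ppi(oligo, targets):
    """Mean percentage identity between the oligo and each target, taken pairwise."""
    per_target = [
        100.0 * sum(a == b for a, b in zip(oligo, target)) / len(oligo)
        for target in targets
    ]
    return mean(per_target)


def ranking_score(oligo, targets):
    """Mean of the three measures, mirroring the description in the text."""
    return mean([pis(oligo, targets), pis_3prime(oligo, targets), ppi(oligo, targets)])


if __name__ == "__main__":
    # Toy data: one primer and the matching region from three hypothetical genomes.
    primer = "ACGTACGTAC"
    regions = ["ACGTACGTAC", "ACGTACGTAT", "ACCTACGTAC"]
    print(round(ranking_score(primer, regions), 2))
```

under these assumptions a single mismatch near the 3' end lowers the score more than a mismatch elsewhere, because it is counted in both the whole-oligonucleotide and the 3'-end measures.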
if the aim is to choose an oligonucleotide located in a conserved genomic viral evolution and the emergence of sars coronavirus covid- diagnostics in context positive rate of rt-pcr detection of sars-cov- infection in cases from one hospital in evaluation of a quantitative rt-pcr assay for the detection of the emerging coronavirus sars-cov- using a high throughput system rna therapeutics: beyond rna interference and antisense oligonucleotides oligonucleotide antiviral therapeutics: antisense and rna interference for highly pathogenic rna viruses development of a novel reverse transcription loop-mediated isothermal amplification method for rapid detection of sars-cov- rt-lamp for rapid diagnosis of coronavirus sars-cov- . microbial biotechnology rapid and visual detection of novel coronavirus (sars-cov- ) by a reverse transcription loop-mediated isothermal amplification assay development of a novel, genome subtraction-derived, sars-cov- -specific covid- -nsp real-time rt-pcr assay and its evaluation using clinical specimens detection of novel coronavirus ( -ncov) by real-time rt-pcr crispr-cas -based detection of sars-cov- comparative performance of sars-cov- detection assays using seven different primer-probe sets and one assay kit a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster ebolaid: an online database of informative genomic regions for ebola identification and treatment the hiv oligonucleotide database (hivoligodb) mafft online service: multiple sequence alignment, interactive sequence choice and visualization analytical sensitivity and efficiency comparisons of sars-cov- qrt-pcr assays the establishment of reference sequence for sars-cov- and variation analysis this research was supported by national funds through fct -foundation for science and technology within the scope of uidb/ / and uidp/ / . j.c. 
also acknowledges the fct funding for his research contract at ciimar, established under the transitional rule of decree law / , amended by law / . the authors declare that there are no conflicts of interest. key: cord- - u lh jx authors: sharma, a.; preece, b.; swann, h; fan, x.; mckenney, r.j.; ori-mckenney, k.m.; saffarian, s.; vershinin, m.d. title: structural stability of sars-cov- degrades with temperature date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: u lh jx sars-cov- is a novel coronavirus which has caused the covid- pandemic. other known coronaviruses show a strong pattern of seasonality, with infections in humans being more prominent in winter. although several plausible origins of such seasonal variability have been proposed, its mechanism is unclear. sars-cov- is transmitted via airborne droplets ejected from the upper respiratory tract of infected individuals. it has been reported that sars-cov- can remain infectious for hours on surfaces. as such, the stability of viral particles, both in liquid droplets and dried on surfaces, is essential for infectivity. here we have used atomic force microscopy to examine the structural stability of individual sars-cov- virus-like particles at different temperatures. we demonstrate that even a mild temperature increase, commensurate with what is common for summer warming, leads to dramatic disruption of viral structural stability, especially when the heat is applied in the dry state. this is consistent with other existing non-mechanistic studies of viral infectivity, provides a single-particle perspective on viral seasonality, and strengthens the case for a resurgence of covid- in winter. statement of scientific significance: the economic and public health impacts of the covid- pandemic are very significant. however, the scientific information needed to underpin policy decisions is limited, partly due to the novelty of the sars-cov- pathogen. 
there is therefore an urgent need for mechanistic studies of both covid- disease and the sars-cov- virus. we show that individual virus particles suffer structural destabilization at relatively mild but elevated temperatures. our nanoscale results are consistent with recent observations at larger scales. our work strengthens the case for covid- resurgence in winter. sars-cov- is a virus of zoonotic origin which was first identified in humans in late [ ]. similar to other coronaviridae [ ], the viral particles are enveloped, polymorphic, and decorated by a variable number of s protein spikes on their membrane [ ]. one of the most confusing and yet urgently pressing questions at the time of this writing is whether the covid- pandemic caused by sars-cov- will show seasonal character. climate and seasonal dependence was expected early in the pandemic [ ] due to similarity with other human coronavirus diseases [ ]; however, the rates of infection failed to decline strongly in the summer of , leading to widespread doubts about covid- seasonality. at the same time, a mounting body of evidence, from theoretical studies [ ] to experimental research on viral populations and their infectivity [ , ], suggests that seasonality is indeed to be expected. however, our understanding of how sars-cov- survives different environmental conditions is still incomplete, and the mechanisms of virus particle degradation are poorly mapped out. this creates uncertainty for public health policy and its forward projection. a key challenge in studying sars-cov- is the extreme level of threat associated with the live virus and the resultant need for high safety standards for such work. aside from the envelope and s proteins, sars-cov- also packages the positive-sense rna genome, encapsidated with thousands of copies of the nucleocapsid (n) protein. 
sars-cov- also packages thousands of copies of matrix protein (m) which consists of three membrane spanning helixes with small intraluminal and extra luminal domains. in addition, an unknown number of envelope (e) proteins, which contain a single membrane spanning helix, are also packaged in each virion. we have previously shown that similar to sars-cov [ ] , the expression of sars-cov- m, e, and s proteins in transfected human cells is sufficient for the formation and release of virus like particles (vlps) through the same biological pathway as used by the fully infectious virus [ ] . these vlps faithfully mimic the external structure of the sars-cov- virus. the vlps however, possess no genome and thus present no infectious threat which enables rapid studies with reduced safety requirements. the ability to produce non-infectious vlps further enabled us to devise and rapidly validate novel strategies for manipulation of these particles, most notably via the addition of protein tags to the s and m proteins (these findings are detailed in a separate manuscript). here, we report studies of vlps subjected to variable temperature conditions before or after being immobilized and dried out on a functionalized glass surface. we show that exposure of vlps to a mildly elevated temperature ( c) for as little as minutes is sufficient to induce structural degradation. the effect is weaker for particles exposed to elevated temperatures in solution and stronger for exposure in the dry state. overall, these results provide insight into the viral seasonality of sars-cov- . during initial refinement of vlp purification strategies and associated vlp characterization [ ] , we have found that such particles remain stable for at least a week if stored in liquid buffer at near °c conditions (on ice). we therefore examined whether they would remain stable at room temperature under dry conditions on a model surface (fig. ). 
spytag-s vlps were adhered to microtubules (mts) are harder to image stably due to high amount of background noise which obscures poly-l-lysine inhomogeneity and mt washout sites (likely due to loose debris on the surface -plausibly a byproduct of particle degradations). features consistent with intact vlps are prominent relative to noise levels and hence easy to resolve (fig. ), but they are so exceedingly rare at °c that they are considered outliers (fig. ). they were seen in large area surveys in which each particle is mechanically probed only a few times. such particles do not survive intact during even a single detailed zoomed-in scan (fig. ) . it takes - minutes to install the sample into the afm, approach the surface and validate tip condition. therefore mildly elevated temperature has a rapid effect on vlp integrity. vlps incubated at °c in solution and imaged at room temperature ( fig. c ) survive better than particles at °c under dry conditions but still often appear disrupted or aggregated in afm imaging. a common transmission route for sars-cov- is through bioaerosols created during sharp exhalation events such as sneezing or coughing. the bioaerosol droplets tend to dry out quickly due to high surface area and small volume [ , ] . therefore virus particles may be exposed to both wet and dry conditions before coming into contact with and infecting the next host. it is widely recognized that virus particles often spread after their deposition on various surfaces [ ] (although direct contact of the next host with contaminated bioaerosol may also be a viable route) and it is therefore also appreciated that virus particles can survive on various surfaces for an extended length of time [ ] . 
the ability to make vlps based on the sars-cov- genome, combined with abundant available structural information allowing for high precision design strategies for the vlps, opens a unique opportunity for fast progress and allowed us to overcome the safety concerns associated with experiments on the full virus. here we used this technology to study the stability of the viral envelope and associated proteins (m, e, and s) under different environmental conditions. as might be expected a priori, the vlps do indeed degrade when exposed to elevated temperatures. our afm imaging revealed that negligibly few particles retain their shape and even those exceptional particles degraded nearly instantly during scanning and hence are likely already structurally impaired. the mt washout sites are readily identifiable for scans at room temperature, but are difficult to see with an identical colormap presentation due to elevated background noise. however, artificially extending the z range of the data helps reveal the presence of mt washout sites (right panel, washout site highlighted with red oval). faint modulation in the rightmost image is most likely due to electronic noise in the imaging system although a mechanical vibration noise contribution cannot be excluded. 
a new coronavirus associated with human respiratory disease in china the molecular biology of coronaviruses structures and distributions of sars-cov- spike proteins on intact virions climatic-niche evolution of sars cov- temperature, humidity, and latitude analysis to estimate potential spread and seasonality of coronavirus disease (covid- ) extended lifetime of respiratory droplets in a turbulent vapour puff and its implications on airborne disease transmission environmental stability of sars-cov- on different types of surfaces under indoor and seasonal climate conditions stability of sars-cov- in different environmental conditions purification and electron cryomicroscopy of coronavirus particles, sars-and other coronaviruses minimal system for assembly of sars-cov- virus like particles toward understanding the risk of secondary airborne infection: emission of respirable pathogens droplet fate in indoor environments, or can we prevent the spread of infection? stability of sars-cov- and other coronaviruses in the environment and on common touch surfaces and the influence of climatic conditions: a review, transboundary and emerging diseases aerosol and surface stability of sars-cov- as compared with sars-cov- the effects of temperature and relative humidity on the viability of the sars coronavirus key: cord- - dgngwmw authors: he, zhesheng; zhao, wencong; niu, wenchao; gao, xuejiao; gao, xingfa; gong, yong; gao, xueyun title: molecules inhibit the enzyme activity of -chymotrypsin-like cysteine protease of sars-cov- virus: the experimental and theory studies date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dgngwmw sars-cov- has emerged as a global public health threat. herein, we report that the clinically approved drug auranofin can effectively inhibit the activity of the -chymotrypsin-like cysteine protease (mpro or clpro) of sars-cov- . gold cluster can also significantly inhibit clpro of sars-cov- . 
phenyl isothiocyanate and vitamin k can also suppress the activity of clpro. for mpro inhibition, the ic values of auranofin, vitamin k , phenyl isothiocyanate and gold cluster are about . μm, . μm, . μm and . μm, respectively. these compounds may have potential for inhibiting sars-cov- virus replication. in particular, the fda-approved auranofin is an anti-inflammatory drug in clinical use, so it may have strong potential both to inhibit virus replication and to suppress inflammatory damage in covid- patients. gold cluster has a better safety index and good anti-inflammatory activity in vitro and in vivo, and therefore has potential to inhibit virus replication and suppress the inflammatory damage caused by the covid- virus. as the au(i) ion is the active metabolic species derived from gold compounds or gold clusters in vivo, further computational studies revealed that the au ion can tightly bind the thiol group of the cys residue of clpro and thus inhibit enzyme activity. phenyl isothiocyanate and vitamin k may also interact with the thiol group of cys via a michael addition reaction; molecular dynamics (md) studies were applied to confirm that these small molecules are stable in the pocket and inhibit mpro activity. a new coronavirus named covid- virus (also called sars-cov- virus) is responsible for the worldwide pandemic. the viral -chymotrypsin-like cysteine protease ( clpro, also called mpro) controls replication of the covid- virus and is essential for its life cycle. therefore, clpro is a drug target for sars-cov- . our biochemical studies revealed that gold compounds such as auranofin, isothiocyanate compounds such as phenyl isothiocyanate, vitamin k compounds such as vitamin k , and gold clusters such as glutathione-coated gold cluster can inhibit the activity of clpro. further dft and md computational studies also verified that these small molecules can interact with cys and other amino acid residues of mpro, and thus suppress the activity of mpro. 
these compounds could serve as potential anti-sars-cov- lead molecules for further drug studies to combat covid- . recombinant covid- virus clpro with native n and c termini was expressed in escherichia coli and subsequently purified following recently reported work ( ). the full-length gene encoding covid- virus mpro was synthesized for escherichia coli expression. briefly, the expression plasmid was transformed into escherichia coli cells and cultured in luria broth medium containing μg/ml ampicillin at °c. when the cells were grown to an od of . - . , . mm iptg was added to the cell culture to induce expression at °c. after h, the cells were harvested by centrifugation at , g. the cell pellets were resuspended in lysis buffer ( mm tris-hcl ph . , mm nacl), lysed by high-pressure homogenization, and then centrifuged at , g for in order to characterize clpro enzymatic activity, we used a for mpro inhibition, the ic of phenyl isothiocyanate is about . µm; data are shown in the following figure . phenethyl isothiocyanate is a constituent of cruciferous vegetables that has cancer-preventive activity against lung, prostate, and breast cancer, among others ( ). the potential of this compound to inhibit the covid- virus is an interesting topic, as phenethyl isothiocyanate is a good candidate for a dietary supplement. it is widely accepted that gold compounds are metabolized in vivo and that the au(i) produced interacts with specific thiol groups of proteins to interfere with their normal functions ( ). the interactions between the gold atom and the binding pockets of proteins were studied by density functional theory (dft) calculations. for the calculations, only the main amino acids of the protein binding pockets were taken into account. all geometries were fully optimized using the b lyp method in conjunction with the sdd basis set for au and the - g(d,p) basis set for the nonmetals ( , , ). the sdd pseudopotential was also applied for au. 
during optimization, the smd solvation model was utilized to model the water environment. all the calculations were carried out using the gaussian package ( ). the binding energy ($E_b$) between au and the pocket ligands was calculated using the following equation:
$$E_b = E_{\mathrm{Au^+}} + E_{\mathrm{ligands}} - E_{\mathrm{Au-ligands}}$$
where $E_{\mathrm{Au^+}}$, $E_{\mathrm{ligands}}$, and $E_{\mathrm{Au-ligands}}$ are the total energies of the au+ ion, the ligands, and the au-ligand complex structure, respectively. $E_{\mathrm{ligands}}$ is the single-point energy of the au-ligand complex with the au atom removed from the system. density-functional thermochemistry. iii. the role of exact exchange relativistic effects in gold chemistry. i. diatomic gold compounds a complete basis set model chemistry. ii. open-shell systems and the total energies of the first-row atoms universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions scalable molecular dynamics with namd openmm simulations using the charmm additive force field key: cord- -a oejky authors: sasaki, michihito; uemura, kentaro; sato, akihiko; toba, shinsuke; sanaki, takao; maenaka, katsumi; hall, william w.; orba, yasuko; sawa, hirofumi title: sars-cov- variants with mutations at the s /s cleavage site are generated in vitro during propagation in tmprss -deficient cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: a oejky the spike (s) protein of severe acute respiratory syndrome-coronavirus- (sars-cov- ) binds to a host cell receptor which facilitates viral entry. a polybasic motif detected at the cleavage site of the s protein has been shown to broaden the cell tropism and transmissibility of the virus. here we examine the properties of sars-cov- variants with mutations at the s protein cleavage site that undergo inefficient proteolytic cleavage. 
virus variants with s gene mutations generated smaller plaques and exhibited a more limited range of cell tropism compared to the wild-type strain. these alterations were shown to result from their inability to utilize the entry pathway involving direct fusion mediated by the host type ii transmembrane serine protease, tmprss . notably, viruses with s gene mutations emerged rapidly and became the dominant sars-cov- variants in tmprss -deficient cells including vero cells. our study demonstrated that the s protein polybasic cleavage motif is a critical factor underlying sars-cov- entry and cell tropism. as such, researchers should be alert to the possibility of de novo s gene mutations emerging in tissue-culture propagated virus strains. was the dominant pathway employed by sars-cov- in tmprss -expressing cells [ ] . in contrast, camostat had no impact on entry of the s gene mutant, del , into vero-tmprss cells but e- d treatment resulted in a dose-dependent decrease in del entry (fig. a) . these results suggested that s gene mutant, del , can enter vero-tmprss cells via cathepsin-dependent endocytosis but not the tmprss -mediated fusion pathway. parental vero cells that do not express tmprss were inoculated with s gene mutant viruses in the presence of camostat and/or e- d. addition of e- d inhibited the entry of all s gene mutants into both vero-tmprss and parent vero cells; by contrast, camostat had no impact on s gene mutant entry into these target cells (fig. b ). these results suggested that, in contrast to wt virus, s gene mutants enter into cells via cathepsin-dependent endocytosis only, regardless of the presence or absence of tmprss . 
because wt virus and s gene mutants showed different sensitivities to treatment with camostat, an agent currently under exploration as a candidate antiviral for clinical use [ ], we also examined the impact of other antiviral agents including nafamostat (a tmprss inhibitor) [ , ] and remdesivir (a nucleotide analog) [ , ]. antiviral effects in vero-tmprss cells were estimated by a cell viability assay based on the generation of cytopathic effects [ ]. consistent with previous studies [ , ], nafamostat showed higher antiviral efficacy against wt virus than camostat; however, nafamostat had no antiviral activity against the s gene mutants (table ). in contrast, remdesivir inhibited infection by both wt and s gene mutants with similar ec values (table ). these results indicated that s gene mutants are resistant to treatment with tmprss inhibitors, but are sensitive to antivirals that target post-entry processes. in an effort to understand the selection mechanisms underlying the generation of these mutant variants, we estimated the frequency of s gene mutants in virus populations of sars-cov- that had undergone serial passage in cultured cells. sars-cov- from an original virus stock underwent passage once (p ) to four times (p ) in vero or up to eight times (p ) in vero-tmprss . nucleotide sequence heterogeneity at the s /s cleavage site was determined by deep sequencing and variant call analysis. more than one million sequence reads from each passaged sample were mapped onto the s /s cleavage site and analyzed for sequence variation. no sequence variants were observed in virus populations until p in vero-tmprss (fig. a). in contrast, nucleotide sequence deletions around the s /s cleavage site corresponding to del and del mutants were observed in all three biological replicates of sars-cov- populations passaged in vero cells (fig. a). 
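the frequency estimate described above amounts to counting, for each passaged sample, how many reads covering the cleavage site carry each sequence variant. a minimal python sketch of that bookkeeping is given below. it is not the authors' variant-calling pipeline: the function names and the variant labels ("wt", "del_a", "del_b") are hypothetical, and the sketch assumes that each read has already been mapped and assigned a label upstream.

```python
from collections import Counter


def variant_frequencies(read_labels):
    """Return the fraction of reads assigned to each variant label at one site."""
    if not read_labels:
        raise ValueError("no reads cover the site")
    counts = Counter(read_labels)
    total = len(read_labels)
    return {label: n / total for label, n in counts.items()}


def dominant_variant(read_labels):
    """Return the label carried by the largest share of reads."""
    freqs = variant_frequencies(read_labels)
    return max(freqs, key=freqs.get)


if __name__ == "__main__":
    # Toy sample: ten reads from one hypothetical passage.
    reads = ["wt"] * 2 + ["del_a"] * 6 + ["del_b"] * 2
    print(variant_frequencies(reads))
    print(dominant_variant(reads))
```

running the same count on each passage of each replicate gives the kind of per-passage frequency profile that the text summarises, with the wild-type fraction falling as the deletion variants take over.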
notably, wt nucleotide sequences were detected in fewer than % of the isolates evaluated at p and the wt was completely replaced with s gene mutants at p in vero cells. these results indicated that sars-cov- propagation in vero cells results in a profound selection favoring the s gene mutants. s gene mutants del and r h were not identified in the virus populations from p to p . an additional variant del with a deletion of amino acids at a point immediately upstream of the rrar motif (figs. s a and s b), was detected as a minor variant in sample # at p . these results suggest that these specific mutations occur only at low frequency. we then determined the frequency of s gene mutants in virus populations passaged in calu- and caco- , which are cells that endogenously express tmprss [ , ] , and also in t-ace that do not express tmprss [ ] . no s gene mutants were identified in sars-cov- passaged in calu- and caco- until p ; by contrast, s gene mutants emerged at p in t-ace cells (fig. b) . we also identified an additional variant r p carrying a single amino acid substitution at the rrar motif (figs. s a and s b) at p and p in t-ace cells (fig. b) . taken together, these results suggest a strong association between tmprss deficiency and the emergence of s gene mutants. trypsin is a serine protease that is typically added to culture medium to induce cleavage and activation of viral proteins, including the hemagglutinin (ha) protein of influenza virus and the fusion (f) protein of paramyxovirus to promote growth in tmprss -deficient cells [ ] . recent studies report that trypsin treatment activates sars-cov- s protein and induces syncytia formation in cells that transiently express the virus s protein [ , ] . as such, we examined whether exogenously added trypsin could compensate for tmprss deficiency and thus inhibit the emergence of s gene mutants during sars-cov- propagation in vero cells. 
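the variant frequencies discussed here come down to counting, for each passaged sample, how many reads spanning the cleavage site carry each sequence. the following is a minimal python sketch of that bookkeeping, not the pipeline used in the study. it assumes reads have already been mapped and assigned a variant label upstream; the labels (`wt`, `del_a`, `del_b`), the read counts, and the 0.5 dominance threshold are hypothetical placeholders chosen for illustration.

```python
from collections import Counter


def variant_frequencies(read_calls):
    """Return the fraction of reads assigned to each variant label."""
    counts = Counter(read_calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {label: n / total for label, n in counts.items()}


def dominant_variant(freqs, threshold=0.5):
    """Return the label whose frequency exceeds `threshold`, or None."""
    label, freq = max(freqs.items(), key=lambda item: item[1])
    return label if freq > threshold else None


# hypothetical per-read calls for one passaged sample (illustrative only)
calls = ["wt"] * 2 + ["del_a"] * 7 + ["del_b"] * 1
freqs = variant_frequencies(calls)
print(freqs)
print(dominant_variant(freqs))
```

in practice the same calculation would be repeated per passage and per biological replicate, so that the replacement of wt by a mutant can be followed over time.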
deep sequencing analysis revealed that s gene mutants did emerge and accounted for the majority of the virus population after p in vero cells cultured in serum-free medium with added trypsin ( in this study, we isolated s gene mutants from sars-cov- wk- , a strain isolated difficult to impossible to maintain sars-cov- with the s /s cleavage site in its intact form. the present study characterized s gene mutants as sars-cov- variants that generate small plaques and that have a narrow range of cell tropism. the phenotypic alterations of s gene mutants might be explained by noting that the s gene mutants were unable to enter target cells via direct fusion mediated by tmprss . indeed, a previous study demonstrated that the polybasic cleavage motif at the s /s cleavage site was indispensable for the entry of vsv-pseudotyped viruses into calu- cells that expressed tmprss [ ] . further studies using infectious s gene mutants will provide new insights into the role of the polybasic amino acid motif at the s /s cleavage site with respect to both sars-cov- infection and its pathogenicity. at this time, many studies are conducted using sars-cov- propagated in vero cells. considering the very real possibility that these virus stocks will accumulate s gene mutations, researchers must pay careful attention to the passage history of any working stocks of sars-cov- . moreover, we must be very objective when interpreting the results from studies using vero-passaged virus, especially those focused on s protein cleavage, cells were infected with either wt or s mutants of sars-cov- at an moi of . after h, cells were fixed with . % buffered formaldehyde, permeabilized with ice-cold methanol, and incubated with anti-sars-cov- s antibody (gtx , genetex). cells were inoculated with either wt or s mutants of sars-cov- at an moi of . . after h of incubation, cells were washed twice with phosphate-buffered saline (pbs) and cultured in fresh medium with % fbs. 
the culture supernatants were harvested at , , and h after inoculation. virus titers were evaluated by plaque assay. vero-tmprss were infected with either wt or s mutants of sars-cov- at - tcid and added to the plates. plates were incubated at for days, and cpe was determined for calculation of % endpoints using mtt assay. the concentration achieving % inhibition of cell viability (effective concentration; ec ) was calculated. the original stock of sars-cov- strain wk- was serially passaged in vero, vero-tmprss , calu- , caco- , and t-ace cells in complete culture medium or (for vero) in serum free dmem supplemented with . μ g/ml trypsin (gibco); three full-length s protein is cleaved into s and s proteins at the s /s cleavage site. functional domains (rbd, receptor binding domain; rbm, receptor binding motif) are highlighted. (b) multiple amino acid sequence alignments focused on the s /s cleavage site of wild type (wt) and isolated mutant viruses (del , del , del and r h). amino acid substitutions and deletions are shown as gray boxes, and the polybasic cleavage motif (rarr) at the s /s cleavage site is highlighted in red. a red arrowhead indicates s /s cleavage site. (a) vero-tmprss cells were infected with sars-cov- wt or del mutant in the presence of varying concentrations of the tmprss inhibitor, camostat, or the cathepsin b/l inhibitor, e- d, for h. at h post-inoculation, the relative levels of viral n protein rna were evaluated quantitatively by qrt-pcr. (b) vero-tmprss and vero cells were infected with sars-cov- wt or s gene mutants in the presence of μm camostat and/or μm e- d for h. at h post-inoculation, the relative levels of viral n protein rna were quantified by qrt-pcr. cellular β-actin mrna levels were used as reference controls. the values shown are mean sd of triplicate samples. 
one-way analysis of variance with dunnett's test was used to determine the statistical significance between the responses to treatment with inhibitors and the no-treatment controls; *p < . , **p < . , ***p < . . multiple (a) nucleotide and (b) amino acid sequence alignments were constructed based on the sequence of wt and sars-cov- variants identified by deep-sequencing (related to fig. ) . infectious viruses of del and r p were not isolated in this study. nucleotide substitutions and deletions are shown as gray boxes. sequence encoding the polybasic cleavage motif (rarr) at the s /s cleavage site is highlighted in red. sars-cov- was serially passaged in vero cells in serum free dmem containing trypsin with three biological replicates. nucleotide sequence diversity at viral s /s cleavage site was determined by deepsequencing. sars-cov- (r h) atcagactcagactaattctcctcggcgggcacatagtgtagctagtcaatccatcattgc multiple nucleotide sequence alignment of s /s cleavage site of wild type and isolated sars-cov- mutants nucleotide substitutions and deletions are shown as gray boxes. 
sequence encoding the polybasic cleavage motif (rarr) at the s /s cleavage site is highlighted in red. sars-cov- (r p) atcagactcagactaattctcctccgcgggcacgtagtgtagctagtcaatccatcattgc sars-cov- (r p) ecdipigagicasyqtqtnspprarsvasqsiiaytmslgaensvaysnns ecdipigagicasyqtqt----------sqsiiaytmslgaensvaysnns sars-cov- (del ) ecdipigagicasyqtqtnspr-------qsiiaytmslgaensvaysnns sars-cov- (del ) ecdipigagicasyqtqtnsprrarsva---iiaytmslgaensvaysnns sars-cov- (r h) ecdipigagicasyqtqtnsprrahsvasqsiiaytmslgaensvaysnns key: cord- - b sycgv authors: guo, qirui; zhao, yingchi; li, junhong; liu, jiangning; guo, xuefei; zhang, zeming; cao, lili; luo, yujie; bao, linlin; wang, xiao; wei, xuemei; deng, wei; chen, luoying; zhu, hua; gao, ran; qin, chuan; wang, xiangxi; you, fuping title: small molecules inhibit sars-cov- induced aberrant inflammation and viral replication in mice by targeting s a /a -tlr axis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: b sycgv the sars-cov- pandemic poses an unprecedented public health crisis. accumulating evidence suggests that sars-cov- infection causes dysregulation of the immune system. however, the unique signature of early immune responses remains elusive. we characterized the transcriptome of rhesus macaques and mice infected with sars-cov- . alarmin s a was robustly induced by sars-cov- in animal models as well as in covid- patients. paquinimod, a specific inhibitor of s a /a , could reduce inflammatory response and rescue the pneumonia with substantial reduction of viral titers in sars-cov- infected animals. remarkably, paquinimod treatment resulted in % survival of mice in a lethal model of mouse coronavirus (mhv) infection. a novel group of neutrophils that contributed to the uncontrolled inflammation and onset of covid- were dramatically induced by coronavirus infections. 
paquinimod treatment could reduce these neutrophils and restore antiviral responses, unveiling key roles of s a /a and noncanonical neutrophils in the pathogenesis of covid- , highlighting new opportunities for therapeutic intervention. the ongoing corona virus disease caused by severe acute respiratory syndrome coronavirus- (sars-cov- ) has resulted in unprecedented public health crises, requiring a deep understanding of the pathogenesis and development of effective covid- therapeutics (wu et al., b; zhu et al., ) . innate immunity is an important arm of the mammalian immune system, which serves as the first line of host defense against pathogens. most of the cells of the body harbor the protective machinery of innate immunity and can recognize foreign invading viruses (akira et al., ) . the innate immune system recognizes microorganisms via pattern-recognition receptors (prrs); upon detecting invading pathogens, prrs activate downstream signaling pathways leading to the expression of various cytokines and immune-related genes for clearing the pathogens including bacteria, viruses and others (akira et al., ) . with regard to sars-cov- infection, an overaggressive immune response has been noted which causes immunopathology (zhang et al., ) . in addition, t cell exhaustion or dysfunction has also been observed (diao et al., ; zheng et al., a; zheng et al., b) . besides, some studies suggest that there may be a unique immune response evoked by coronaviruses (blanco-melo et al., ) . however, the nature of these responses elicited by the virus remains poorly understood. accumulating evidence suggests that the neutrophil count is significantly increased in covid- patients with severe symptoms (kuri-cervantes et al., ; liao et al., ; tan et al., ; wu et al., a) . 
it is believed that neutrophils migrate from the circulating blood to infected tissues in response to inflammatory stimuli, where they protect the host by phagocytosing, killing and digesting bacterial and fungal pathogens (nauseef and borregaard, ; nicolas-avila et al., ) . the role of such a response in host defense against viral infection has not been clearly characterized. a recent study observed a new subpopulation of neutrophils in covid- patients, which have been named developing neutrophils because they lack canonical neutrophil markers like cxcr and fcgr b (wilk et al., ) . however, it is still not clear how this type of neutrophil is activated. moreover, the precise function of these cells is also unknown. alarmins are endogenous, chemotactic and immune activating proteins/peptides which are released as a result of cell injury or death, degranulation, or in response to infection. they mediate the relay of intercellular defense signals by interacting with chemotactic and pattern-recognition receptors (prrs) to activate immune cells in host defense (oppenheim and yang, ; yang et al., ) . currently, the major categories of alarmins include defensins, high-mobility group (hmg) proteins, interleukins (ils), heat shock proteins (hsps), s proteins, uric acid, hepatoma derived growth factor (hdgf), eosinophil-derived neurotoxin (edn), and cathelin-related antimicrobial peptide (cramp) (giri et al., ; yang et al., ) . in response to microbial infection, alarmins are released to initiate and amplify innate/inflammatory immune responses, which involve the activation of resident leukocytes (e.g. macrophages, dendritic cells, mast cells, etc.), production of inflammatory mediators (cytokines, chemokines, and lipid metabolites), recruitment of neutrophils and monocytes/macrophages for the purpose of eliminating invading microorganisms and clearing injured tissues (bianchi, ; chen and nunez, ; nathan, ; oppenheim and yang, ; yang et al., ) . 
however, uncontrolled production of alarmins is harmful or even fatal to the host in some cases. hmgb protein acts as a late mediator of lethal systemic inflammation in sepsis (wang et al., ) . therefore, anti-hmgb therapeutics have shown to be beneficial in experimental models of sepsis. s a and s a make up approximately % of the cytoplasmic proteins present in neutrophils. both these proteins are members of the s group of proteins. s a and s a are also referred to as mrp and mrp , respectively. under physiological conditions, massive levels of s a and s a are stored in neutrophils and myeloid-derived dendritic cells, while low levels of s a and s a are expressed constitutively in monocytes (foell et al., ; wang et al., ) . s a and s a often form heterodimers (s a /a ) (ometto et al., ) . the major functions of s a /a reported so far include the regulation of leukocyte migration and trafficking, the remodeling of cytoskeleton, amplification of inflammation and exertion of anti-microbial activity (ometto et al., ; wang et al., ) . after being infected with bacteria, neutrophils, macrophages, and monocytes intensely induce the expression and secretion of s a /a to modulate inflammatory processes through the induction of inflammatory cytokines. s a / is an endogenous ligand of toll-like receptor (tlr ) and can trigger multiple inflammatory pathways mediated by tlr (vogl et al., ) . s a and s a also have antibacterial potential via their ability to bind zn + (foell et al., ; wang et al., ) . not much is known about the roles of s a /a in host defense responses against viruses. in the present study, we characterized the nature of the early innate immune responses evoked in rhesus macaques and mice during sars-cov- infection. s a was dramatically upregulated by sars-cov- and a mouse coronavirus (mouse hepatitis virus, mhv), but not by other viruses. a group of non-canonical neutrophils were also activated during sars-cov- infection. 
the abnormal immune responses were mediated by the s a /a -tlr pathway. the s a /a -specific inhibitor paquinimod significantly reduced the number of neutrophils activated by the coronavirus, inhibited viral replication and rescued the lung damage resulting from sars-cov- infection. these results highlight the potential of therapeutically targeting s a /a for suppressing the uncontrolled inflammation associated with severe cases of covid- and provide new information on an alarmin-mediated pathway for regulating neutrophil function. to characterize the early immune responses against coronavirus infection, we infected rhesus macaques with sars-cov- and analyzed the transcriptome of infected and non-infected animals ( figure a ). genome-wide rna-seq analysis of the lungs from infected rhesus macaques showed that a number of transcripts were induced or inhibited at day and day after sars-cov- infection (supplementary figure a) . the induction of primary antiviral factors, such as type i ifns, was completely blocked during sars-cov- infection (supplementary figure b) . gene ontology (go) analysis showed that a small group of genes involved in defense responses against viruses was induced ( figure b) . however, interestingly, a greater number of genes involved in regulating cellular responses to lipopolysaccharide (lps) were induced by sars-cov- at day post infection. an analysis of the kegg pathways also showed that sars-cov- induced genes were enriched more strongly in anti-bacterial pathways than in anti-viral pathways (supplementary figure c ). in line with this, neutrophil chemotaxis associated genes were also upregulated as a result of sars-cov- infection. this upregulation was less significant at day , but more significant at day when compared to the levels of genes related to antiviral responses that were induced as a result of the viral infection ( figure b ). 
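go and kegg enrichment of a set of induced genes is commonly scored with a one-sided hypergeometric test, which asks how surprising it is to see at least k term-annotated genes among n induced genes. the source does not state which tool or test was used, so the following python sketch is only an illustration under that assumption; the function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """One-sided p-value P(X >= overlap) for a hypergeometric draw.

    population: total number of genes considered
    annotated:  genes in the population that carry the term
    drawn:      number of induced genes
    overlap:    induced genes that carry the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# toy numbers for illustration only: 10 genes, 4 annotated, 3 induced, 2 overlap
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(round(p_value, 4))
```

when many terms are tested at once, the resulting p-values would normally be corrected for multiple testing before any term is called enriched.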
it is worthy to note that the number of neutrophils increases in covid- patients with severe symptoms (kuri-cervantes et al., ; liao et al., ; tan et al., ; wu et al., a) . our results of the preliminary characterization of the early innate response to a sars-cov- challenge indicate that neutrophils are significantly activated at the very beginning of sars-cov- infection. to confirm this, we examined the expression of neutrophil markers in the lungs from infected rhesus macaques. all the signature genes, from primary granules to tertiary granules, were significantly induced as a result of sars-cov- infection ( figure c ). the markers for monocytes and natural killer cells were slightly upregulated. t cells were unchanged, while interestingly, b cells were significantly down-regulated in the lungs of infected animals ( figure d and supplementary figure d ). taken together, the above data suggested that the sars-cov- infection provoked a non-canonical antiviral response, or an antibacterial response accompanied by increased neutrophils in the lung at the early stage. to explore how an infection by sars-cov- triggers the activation of antibacterial responses, we examined the differential expression of genes before and after coronavirus infection. the expression of alarmin protein s a was robustly upregulated with , -and , -fold induction at day and day post sars-cov- infection, respectively ( figure e ). s a acts as an alarmin through formation of heterodimers with s a , then these heterodimers function as danger associated molecular pattern (damp) molecules and activate innate immune responses via binding to pattern recognition receptors such as toll-like receptor (tlr ). in our studies, s a was also induced during the sars-cov- infection ( figure e ). the basal expression of s a was much higher, but its induction was not as significant as that of s a . 
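the contrast drawn here between basal expression and induction can be made concrete with a log2 fold change, one common way of expressing induction relative to a control. the following python sketch assumes normalized expression values and a pseudocount of 1 to avoid division by zero; neither choice is taken from the source, and the gene names and values are hypothetical.

```python
import math


def log2_fold_change(infected, control, pseudocount=1.0):
    """log2 ratio of infected to control expression, with a pseudocount."""
    return math.log2((infected + pseudocount) / (control + pseudocount))


# hypothetical normalized expression values (illustrative only)
low_basal_gene = {"control": 1.0, "infected": 255.0}
high_basal_gene = {"control": 1023.0, "infected": 4095.0}

lfc_low = log2_fold_change(low_basal_gene["infected"], low_basal_gene["control"])
lfc_high = log2_fold_change(high_basal_gene["infected"], high_basal_gene["control"])

print(lfc_low, lfc_high)
```

in this toy case the gene with high basal expression gains far more in absolute terms yet shows the smaller fold change, which is the pattern described for the two alarmins.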
in fact, among all the known alarmins, s a was the most significantly induced gene at both day and day post infection ( figure f ). qrt-pcr analysis showed that the level of s a / surged along with an increase in the viral load in the lung of infected animals ( figure g ). next, we examined the expression of alarmins in the blood samples taken from infected rhesus macaques. s a / as well as the neutrophil marker genes were also induced by the viral infection ( figure h ). based on these results, we investigated if s a /a were upregulated in covid- patients. remarkably, both s a and s a were upregulated in post-mortem lung samples from covid- -positive patients when compared with biopsied healthy lung tissue from uninfected individuals ( figure i ). the expression of neutrophil marker genes was also significantly increased in covid- -positive patients ( figure i ). concomitantly, the mrna level of s a was significantly higher in peripheral blood from covid- -positive patients when compared to healthy subjects (supplementary figure e) . a group of alarmins were induced in different types of blood cells of covid- -positive patients. for instance, s a was significantly induced in red blood cells, cd + monocytes, cd + monocytes, neutrophils and developing neutrophils ( figure j ). s a /a is the ligand of tlr , which is the primary prr that recognizes invading gram-negative bacterium and lps. we thus observed a significant induction of genes that were predominantly involved in responses to lps and anti-bacterial pathways. s a /a are also able to induce neutrophil chemotaxis and adhesion, which probably contributed to the increased infiltration of neutrophils. to further investigate the nature of immune responses triggered and the roles of s a in these responses, we challenged hace transgenic mice with sars-cov- and c bl/ mice with influenza a virus (iav), which also infects the respiratory tract. 
we performed whole genome wide rna-seq analysis to characterize the defense responses during viral infection ( figure d) . interestingly, sars-cov- was only able to induce ifnκ but not any other type i ifns (supplementary figure d ). both sars-cov- and iav were able to induce ifnγ expression. iav, but not sars-cov- , induced type iii ifns production (supplementary figure d) . qrt-pcr further confirmed that ifnβ and isg induction was impaired during sars-cov- infection ( figure c ). the induction of most isgs was attenuated during sars-cov- infection compared with iav infection (supplementary figure e) . however, sars-cov- but not iav induced s a and neutrophil marker genes ly g expression ( figure c ). consistently, other neutrophil marker genes were also induced by sars -cov- but not by iav ( figure d ). similar to the data from rhesus macaque experiments, compared to other alarmins, s a was robustly induced by sars-cov- but not by iav infection in mice ( figure e ). thus, sars-cov- infection specifically induces the transcription of alarmin s a . to further confirm this, we infected c bl/ mice with other rna-or dna-viruses including encephalomyocarditis virus (emcv), herpes simplex virus (hsv- ) and vesicular stomatitis virus (vsv) and measured the expression of s a in the lung of infected animals. none of these viruses were able to induce the expression of s a ( figure f ). we then investigated if other coronaviruses were able to induce the transcription of s a and activate neutrophils. we infected c bl/ mice with mouse hepatitis virus (mhv-a ) intranasally ( figure a) . surprisingly, we did not observe any obvious symptoms in infected mice. we then infected irf /irf double knockout mice and ifnar deficient mice with mhv. similar to the wild type mice, irf /irf double knockout mice were able to eliminate the virus rapidly and did not develop severe pneumonia. 
interestingly, ifnar deficient mice showed obvious damage in pulmonary interstitium and alveoli ( figure g ). all the mice died within days ( figure h ). we performed whole genome rna-seq analysis to determine if mhv induced similar immune responses as those induced by sars-cov- infection. the lungs of mhv infected mice were collected at day post infection and subjected to rna analysis. similarly, mhv induced genes were enriched in neutrophil chemotaxis and antibacterial pathways ( figure i ). s a was also significantly induced by mhv infection ( figure j ). when compared to iav, type i ifns induction was impaired, but neutrophil marker genes were significantly induced by mhv infection ( figure k and l). histological staining of samples taken from lungs showed more s a positive cells infiltration after mhv infection ( figure m ). since the expression of neutrophil marker genes was elevated in the lung after coronavirus infection, we examined if more neutrophils were infiltrated into lung during infection. surprisingly, the number of ly- g (high) and cd b (high) neutrophils decreased in the lung after mhv infection ( figure n ). interestingly, a group of cd b (high) ly- g (mid) cells showed up after mhv infection. further analysis revealed that these cells were cd + granulocytes. a recent single cell analysis study had reported that a group of developing neutrophils was detected in the peripheral blood of covid- patients with decreased expression of canonical neutrophil markers cxcr and fcgr b (wilk et al., ) . we purified these cells by using flow cytometry sorting. qrt-pcr analysis showed decreased expression of cxcr and fcgr ( figure o ). to explore if other types of viruses or stimuli are able to induce these cells, we challenged c bl/ mice with iav, emcv, vsv, hsv- and lps, hace transgenic mice with sars-cov- through nasal inoculation. interestingly, only sars-cov- and mhv induced this group of neutrophils ( figure p ). 
taken together, these results suggest that coronavirus infection induces s a and a group of noncanonical neutrophils specifically. s a /a functions as a tlr ligand that amplifies inflammation by triggering the activation of innate immune signaling, including neutrophil chemotaxis. we thus reasoned that the abnormal upregulation of s a /a probably resulted in activation of coronavirus specific neutrophils and dysregulation of immune responses. quinoline- -carboxamide paquinimod (also known as abr- ), is an inhibitor of s a which can prevent the binding of s a to tlr- (bjork et al., ; schelbergen et al., ) , suggesting that it can be used to block the function of s a /a and mitigate the antibacterial immune responses elicited during sars-cov- infection. to test this, we treated the sars-cov- infected mice with s a /a inhibitor paquinimod ( figure a ). expectedly, paquinimod was able to rescue the weight loss of infected mice and inhibited sars-cov- viral replication significantly ( figure b and c). consistently, the expressions of neutrophil markers (ly g, mmp , ltf), il- and s a were significantly reduced after paquinimod treatment ( figure d ). more strikingly, paquinimod completely rescued the death of mhv infected, but not iav infected ifnar deficient mice and significantly inhibited mhv replication ( figure e , f and supplementary figure a ). the expression of neutrophil marker genes in lung and blood from infected mice was also inhibited by paquinimod ( figure g , h and supplementary figure b ). the damages to pulmonary interstitium and alveoli were alleviated, and infiltration of s a + cells was also reduced by paquinimod ( figure i and j). to gain further insights into the modulation of immune responses after paquinimod treatment, we performed genome wide rna-seq analysis using the rna from mhv infected ifnar deficient mice. 
as expected, pathways related to neutrophil chemotaxis and antibacterial responses were primarily downregulated by paquinimod figure d) . the expression of these b cell related genes was rescued or induced by paquinimod during mhv infection, which was confirmed by qrt-pcr analysis ( figure m ). in line with this, sars-cov- infection, but not iav infection, suppressed the expression of these b cell related genes ( figure n and supplementary figure d ), which was also rescued by paquinimod ( figure o ). we demonstrated that paquinimod inhibited the expression of neutrophil marker genes in lung and blood. to determine if paquinimod could reduce the number of these abnormal cells, we performed flow cytometry to analyze the cell population during coronavirus infection. coronavirus-specific neutrophils were significantly reduced in blood and lung after treatment with paquinimod ( figure a ). it is believed that s a /a can activate tlr or receptor for advanced glycation end products (rage) pathways. interestingly, the tlr inhibitor (resatorvid), but not the rage inhibitor (azeliragon), was able to reduce the number of these neutrophils, suggesting that s a /a activated these neutrophils via the tlr pathway ( figure a ). moreover, mhv infection also induced these neutrophils in liver and bone marrow, which could be suppressed by the tlr inhibitor ( figure b ). consistently, resatorvid was able to inhibit viral replication in lung and blood of the infected mice ( figure c ). moreover, both paquinimod and resatorvid suppressed the activation of coronavirus related neutrophils in lung during sars-cov- infection ( figure d ). to confirm the role of tlr in activating s a related signaling, we examined s a induction upon lps treatment. lps induced s a expression through adaptor protein myd (supplementary figure a) . we then treated mouse macrophages with the serum from sars-cov- infected mice with or without paquinimod treatment. 
the serum from infected mice was able to induce the expression of s a and cxcl but not il- β or il- in a myd dependent manner ( figure e and supplementary figure b ). s a further activated tlr signaling to form a positive feedback loop. we thus observed that paquinimod was able to inhibit the induction of s a ( figure d and g-h). cxcl is the primary chemokine that recruits neutrophils. consistent with the in vivo results ( figure a ), azeliragon could not inhibit induction of s a or cxcl by the serum ( figure e ). s a therefore exerted its pathogenic function through activating tlr signaling during coronavirus infections. a recent single-cell sequencing study clarified the heterogeneity of neutrophils and identified subpopulations (xie et al., ) . we collected the cd + and ly- g + cells from bone marrow ( figure b ) and examined the expression of each marker gene during mhv infection with or without the tlr inhibitor. compared to noninfected mice, mhv induced expression of all g marker genes and marker genes from g to g , but inhibited expression of other marker genes ( figure f ). interestingly, resatorvid was able to reverse the expression of marker genes back to the normal levels. the expression of canonical neutrophil markers cxcr , ly g and fcgr was decreased after mhv infection ( figure g ). we further analyzed the rna-seq data of lung samples taken from the mice infected with sars-cov- and iav. the g b population, the mature neutrophils, was activated during iav infection ( figure h and supplementary figure c ). by contrast, sars-cov- , similar to the results from bone marrow experiments ( figure f ), activated the g subpopulation significantly. in addition, sars-cov- also induced other premature neutrophil markers from g to g subpopulation in lung ( figure h and supplementary figure c ). interestingly, paquinimod was able to turn off the abnormal upregulation of these genes (supplementary figure d) . 
together, these results suggested that coronavirus infection expanded the population of premature neutrophils in s a / and tlr dependent manner, and caused dysregulation of the immune system. innate immunity, the first line of host defense, employs pattern-recognition receptors (prrs) for recognizing pathogen-associated molecular patterns (pamps) of invading pathogens and galvanizing the host defense machinery. the endogenous danger-associated molecular patterns (damps) are also able to trigger the activation of innate immune signaling. alarmins are a panel of proteins or peptides that can function as damps to activate various immune pathways (bianchi, ; yang et al., ) . the fine tuning of the transcription of alarmins is critical for maintaining immune homoeostasis. over or sustained expression of alarmins could result in uncontrolled inflammation (chan et al., ; cher et al., ; kang et al., ; patel, ) . such imbalanced immune responses and cytokine storm contribute to the development of severe acute respiratory distress syndrome (ards). here we demonstrated that sars-cov- and mhv, but not any other tested viruses, induced a robust and sustained transcription of the alarmin s a in animal models (figure d-g and figure ). in line with this, the substantial upregulation of s a was also observed in the lung and peripheral blood of infected macaques and covid- patients ( figure h and i) . although we did observe that the sars-cov- and the other types of viruses could induce s a expression slightly in blood at day post infection ( figure c) , however, the induction of s a by viruses other than sars-cov- was either unchanged or reduced at day post infection. s a expression increased dramatically by folds at day post infection ( figure c ). similar phenotypes were observed during mhv infection ( figure f ). thus, uncontrolled or sustained induction of s a could be a common feature of coronavirus infections. 
however, the mechanism by which coronaviruses provoke the induction of s a is unknown. in addition to studying the innate immune responses elicited by sars-cov- infected hace transgenic mice and rhesus macaques, we also infected ifnar deficient mice with mouse coronavirus and characterized the resulting immune responses. the mhv infected mice exhibited a characteristic immune signature comprising s a induction, initiation of antibacterial like responses and activation of coronavirus related neutrophils. these features of the immune response were similar to those observed in sars-cov- infected mice and rhesus macaques ( figure and figure a and b). untreated mhv infected mice developed more severe symptoms and died within days. mice given the s a inhibitor paquinimod were rescued from lung damage and death caused by mhv infection. since these results were similar to those obtained during studies with sars-cov- , a shared mechanism appears to direct the pathogenesis of pneumonia during sars-cov- and mhv infections in animal models. thus, mhv infection of ifnar deficient mice, which evokes immune responses similar to those evoked by sars-cov- , could serve as a useful model for investigating ards associated with sars-cov- infection. although we inoculated wild type, mavs knockout, sting knockout, elf knockout and irf /irf double knockout mice with mhv intranasally, none of these mice developed obvious ards related symptoms. irf and irf are key transcription factors of type i ifns. type i ifns activate interferon stimulated genes by binding ifnar. since the irf /irf double knockout mice were not susceptible to mhv infection, ifnar appears to exert its function in a type i ifn independent way. previous studies (hadjadj et al., ; zhou et al., ) and our above data showed that induction of type i ifns is completely blocked during sars-cov- and mhv infections. 
this further suggests that ifnar is involved in defense against coronavirus through an unknown mechanism.

there is consensus that the neutrophil number is significantly increased in covid- patients with severe symptoms (kuri-cervantes et al., ; liao et al., ; tan et al., ; wu et al., a). moreover, a recent study identified a group of premature neutrophils in covid- patients that do not express canonical neutrophil markers like cxcr and fcgr b. bacterial coinfection is common in viral pneumonia, especially in critically ill patients (bost et al., ). neutrophils can be activated during bacterial infection and are critical for killing invading bacteria (deng et al., ; li et al., ). the increased neutrophil number and the developing neutrophils in patients could therefore be attributed to bacterial coinfection. however, our results showed that the premature neutrophils were induced by coronavirus at the early stage of infection in animal models. we compared these cells with the neutrophil subpopulations proposed in a recent study. coronavirus induced neutrophils in bone marrow and lung belong predominantly to the g population (figure e and supplementary figure b). furthermore, the populations of neutrophils in lungs are more diverse than those in bone marrow. compared to iav induced neutrophils, coronavirus preferentially induces the expression of premature marker genes (supplementary figure b). however, the exact mechanism by which coronavirus induces these abnormal neutrophils, and the key transcription factors that direct the development of these cells, remain to be investigated.

in summary, we have demonstrated that alarmin s a was specifically upregulated by sars-cov- infection. s a /a amplified inflammatory responses by activating the tlr signaling pathway. a group of coronavirus-specific premature neutrophils was activated during infection. inhibitors of the s a /a -tlr axis were able to mitigate abnormal inflammation and inhibit viral replication.
these results uncover the role of alarmins and neutrophils in the pathogenesis of sars-cov- infection and provide new therapeutic targets for the treatment of covid- .

cd , mmp , cd , and cd in the lung of sars-cov- -infected rhesus macaques at dpi, dpi, and dpi. n = . (e) rna-seq analysis of rna from lung of rhesus macaques infected intranasally with sars-cov- at dpi and dpi. differentially expressed genes represented by s a and s a were analyzed (compared with mock). (f) analysis of all known alarmins showing that s a is the most significantly induced one (e). n = . (g) qrt-pcr analysis for viral load and the expression level of s a and s a in the lung of sars-cov- -infected rhesus macaque at dpi, dpi, and dpi.

vehicle treated group and paquinimod treated group at dpi, dpi, dpi, dpi, and dpi. n = . (*p < . ; **p < . ; ***p < . ; ****p < . ).

before being inoculated intranasally with reagents and viruses, hace mice were anaesthetized intraperitoneally with . % avertin at . ml/g body weight; wt c bl/ j mice, interferon-α receptor gene knockout mice (ifnar -/-), elf -/- mice, ifnar -/- mice and mavs -/- mice were anaesthetized with isoflurane; rhesus macaques were anaesthetized with mg/kg ketamine hydrochloride. the health status and body weight of all mice were observed and recorded daily. mice were euthanized at , , , and dpi to collect different tissues and examine virus replication and histopathological changes.

vsv (vesicular stomatitis virus, indiana strain) was a gift from j. rose (yale university), iav (influenza a virus, pr ) was a gift from feng qiang (fudan university), and hsv- (herpes simplex virus ) was from a. iwasaki (yale university). emcv (encephalomyocarditis virus, vr- b) was purchased from american type culture collection (atcc). mhv-a (mouse hepatitis virus a- ) has been described previously (yang et al., ). the sars-cov- strain used in this paper is named hb- .
the complete genome of this sars-cov- strain has been deposited in gisaid (betacov/wuhan/ivdc-hb- / |epi_isl_ ) and has been described previously.

vsv, emcv and hsv- were propagated in vero cells followed by cycles of freezing and thawing; the large debris was then spun down and the supernatants were used as a stock solution. the titer of the viruses was determined by plaque assay in vero cells. for the mouse infection assay, wt mice were inoculated intranasally with vsv ( pfu), emcv ( pfu) or hsv- ( pfu) after anesthesia. pfu: plaque forming unit.

mhv-a was propagated in cl- cells followed by cycles of freezing and thawing; the large debris was spun down and the supernatants were used as a stock solution. the titer of the virus was determined by plaque assay in cl- cells. for the mouse infection assay, ifnar -/- mice were inoculated intranasally with mhv-a ( pfu) after anesthesia, or ifnar -/- mice were inoculated intraperitoneally with mhv-a ( pfu).

iav was propagated in -day-old specific-pathogen-free embryonic chicken eggs. the allantoic fluid was collected and titrated to determine the % tissue culture infectious dose (tcid ) in a cells and the median lethal dose (ld ) in mice following the reed-muench method. for the mouse infection assay, wt mice were inoculated intranasally with iav ( tcid ) after anesthesia.

for the paquinimod rescue assay, wt mice, hace mice, and ifnar -/- mice were inoculated intranasally with iav, sars-cov- , and mhv-a respectively after anesthesia. the mice were given . μg/day of paquinimod (targetmol; catalog no. t ) intranasally starting on dpi. the control group mice were given μl of pbs intranasally. stock solutions of mg/ml paquinimod were prepared with dmso in advance. the health status and body weight of all mice were observed and recorded daily. mice were euthanized at , , , and dpi to collect different tissues and examine virus replication and histopathological changes.
for the serum co-culture assay, μl of peripheral blood was collected from each group of ifnar -/- mice on day of the paquinimod rescue assay. after blood coagulation at room temperature, samples were spun down and the supernatants were used as stock serum. raw . cells were seeded in -well plates at cells/ml. after cell adherence, lps ( ng/ml) and the mouse serum collected above ( μl/ml) were added. after hours of co-culture, cells were harvested and lysed with trnzol reagent for rna extraction.

whole rna from cells with specific treatments was purified using the rneasy mini kit (qiagen no. ). the transcriptome library for sequencing was generated using the vahts™ mrna-seq v library prep kit for illumina® (vazyme biotech co.,ltd, nanjing, china) following the manufacturer's recommendations. after clustering, the libraries were sequenced on the illumina hiseq x ten platform using the ( × bp) paired-end module. the raw images were transformed into raw reads by base calling using casava (http://www.illumina.com/support/documentation.ilmn). raw reads in fastq format were then processed using in-house perl scripts. clean reads were obtained by removing reads with adapters, reads in which unknown bases exceeded %, and low quality reads (reads in which the percentage of low quality bases was over %, where a low quality base is one whose sequencing quality was no more than ). at the same time, the q , q and gc content of the clean data were calculated (vazyme biotech co.,ltd, nanjing, china). the original rna-seq data were uploaded to the geo datasets.

total rna was isolated from the tissues with trnzol reagent (dp , beijing tiangen biotech, china). then, cdna was prepared using the hiscript iii st strand cdna synthesis kit (r - , nanjing vazyme biotech, china). qrt-pcr was performed using the applied biosystems real-time pcr systems (thermo fisher scientific, usa) with sybr qpcr master mix (q - , nanjing vazyme biotech, china).
the data of qrt-pcr were analyzed by the livak method ( −ΔΔct ). ribosomal protein l (rpl ) was used as a reference gene for mice, and gapdh for macaques. qrt-pcr primers are displayed in supplementary materials table s . the lungs were quickly placed in cold saline solution and rinsed after they were collected. then, lungs were fixed in % pfa, dehydrated and embedded in paraffin prior to sectioning at μm and sections were stained with hematoxylin and eosin (h&e). for immunohistochemical staining, the lung paraffin sections were dewaxed and rehydrated through xylene and an alcohol gradient. antigen retrieval was performed by heating the sections to °c for min in . m citrate buffer (ph . ) and repeated times. the operations were performed according to the instructions of the two-step detection kit (pv- , beijing zsgb biotechnology, china). the samples were treated by endogenous peroxidase blockers for min at room temperature followed by incubation with primary antibodies s a ( : , t, cell signaling technology) at °c for h, then after washed with pbs, sections were incubated with reaction enhancer for min at room temperature and secondary antibodies at °c for min, and finally sections were visualized by , -diaminobenzidine tetrahydrochloride (dab) and counterstained with haematoxylin. the lung tissues, peripheral blood and bone marrow were collected from the mice. the lungs were first ground with mesh copper sieve, and then transferred to dmem containing % fbs, . mg/ml collagenase d ( , roche, switzerland) and . mg/ml dnase i ( , stemcell technologies, canada) for a min digestion at ℃ to obtain single-cell suspensions. livers were harvested and homogenized into single-cell suspensions using mesh copper sieve. bone marrow were flushed out of the femurs using a -gauge needle in pbs containing mm edta and % fetal bovine serum (fbs) and dispersed into single cells through a pipette. 
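The qRT-PCR quantification described earlier in these methods relies on the Livak method, in which the target gene is normalized first to a reference gene and then to a control condition. The sketch below illustrates that arithmetic only. The function name and the Ct values are hypothetical and are not taken from the paper, and the calculation assumes roughly 100% amplification efficiency for both primer pairs, which is the standard assumption behind the method.

```python
def livak_fold_change(ct_target_treated, ct_ref_treated,
                      ct_target_control, ct_ref_control):
    """Relative expression by the Livak 2^(-ddCt) method.

    Assumes both primer pairs amplify with ~100% efficiency.
    All arguments are threshold cycle (Ct) values.
    """
    # Normalize the target gene to the reference gene in each condition.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Normalize the treated condition to the control condition.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (illustration only, not data from the study):
# target gene at Ct 22 vs 26, reference gene at Ct 18 in both conditions.
fold = livak_fold_change(22.0, 18.0, 26.0, 18.0)
print(f"fold change vs control: {fold}")  # 16.0
```

A lower Ct means more template, so a target that crosses threshold four cycles earlier than in the control, with an unchanged reference gene, corresponds to a 16-fold increase.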
single-cell samples were treated with red blood cell lysis buffer (r , beijing solarbio science & technology, china) for min at room temperature and passed through a -μm nylon mesh sieve before staining. peripheral blood was treated with red blood cell lysis buffer to remove red blood cells. after blocking non-specific fc-mediated interactions with cd /cd antibodies ( - - , ebioscience, usa), single-cell suspensions were stained with fluorophore-conjugated anti-mouse antibodies at ℃ for min. after washing the samples, flow cytometry acquisition was performed on a bd lsrfortessa. sorting was performed using a bd ariaiii (bd). all antibodies were purchased from ebioscience: cd -pe ( - - ), ly- g-apc ( - - ), cd b-fitc ( - - ).

all analyses were repeated at least three times, and a representative experimental result is presented. a two-tailed unpaired student's t test was used for statistical analysis to determine significant differences when a pair of conditions was compared. asterisks denote statistical significance (*p < . ; **p < . ; *** p < . ). the data are reported as the mean ± s.d.

figure s (a)-(d) rna-seq analysis of rna from lungs of rhesus macaques infected intranasally with sars-cov- at dpi and dpi. heatmap depicting the differentially expressed genes (a). the expression of ifns was analyzed (b). kegg analysis was performed with the differentially expressed genes compared with mock (c) (fc > or < . , p value < . ). the expression of indicated marker genes was analyzed (d). n = . (e) analysis of s a expression in peripheral blood from healthy controls and covid- patients. fold change to healthy control (log ). data from the peripheral blood of covid- patients and healthy controls correspond to geo: gse . (*p < . ; **p < . ; ***p < . ; ****p < . ).

continued figure s (a) qrt-pcr analysis for viral load in the lung of hace mice infected intranasally with sars-cov- and c bl/ mice infected intranasally with iav at dpi, dpi, dpi, dpi, and dpi. n = .
(b)-(e) rna-seq analysis of rna from lung of c bl/ mice infected with iav and hace mice infected with sars-cov- at dpi, dpi, dpi, and dpi. go and kegg analyses were performed with the differentially expressed genes compared with mock (b). go and kegg analyses were performed with the differentially expressed genes between iav induced genes and sars-cov- induced genes (c) (fc > or < . , p value < . ). the expression of ifns (d) and isgs was analyzed (e).

rna-seq analysis of rna from lung of ifnar -/- mice infected intranasally with mhv between vehicle treated group and paquinimod treated group at dpi. analysis of the expression level of indicated genes (b). kegg analysis was performed with the differentially expressed genes (c) (fc > or < . , p value < . ). (d) qrt-pcr analysis for the expression level of cd in the lungs of hace mice infected intranasally with sars-cov- at dpi, dpi, dpi, dpi, and dpi. n = . (****p < . ).

analysis of the change of several related genes of neutrophil groups from the lungs of c bl/ mice infected with iav and hace mice infected with sars-cov- at dpi, dpi, dpi, dpi, and dpi. (d) rna-seq analysis of the change of several related genes of neutrophil groups from the lungs of ifnar -/- mice infected intranasally with mhv between vehicle treated group and paquinimod treated group at dpi. (*p < . ; **p < . ).
pathogen recognition and innate immunity
damps, pamps and alarmins: all we need to know about danger
identification of human s a as a novel target for treatment of autoimmune disease via binding to quinoline- -carboxamides
imbalanced host response to sars-cov- drives development of covid-
host-viral infection maps reveal signatures of severe covid- patients
alarmins: awaiting a clinical response
sterile inflammation: sensing and reacting to damage
alarmins in frozen shoulder: a molecular association between inflammation and pain
localized bacterial infection induces systemic activation of neutrophils through cxcr signaling in zebrafish
reduction and functional exhaustion of t cells in patients with coronavirus disease (covid- ). front immunol
phagocyte-specific calcium-binding s proteins as clinical laboratory markers of inflammation
hepatoma derived growth factor (hdgf) dynamics in ovarian cancer cells
impaired type i interferon activity and inflammatory responses in severe covid- patients
clinical features of patients infected with novel coronavirus in wuhan
hmgb in health and disease
comprehensive mapping of immune perturbations associated with severe covid-
a critical concentration of neutrophils is required for effective bacterial killing in suspension
single-cell landscape of bronchoalveolar immune cells in patients with covid-
points of control in inflammation
neutrophils at work
neutrophils in homeostasis
calprotectin in rheumatic diseases
alarmins: chemotactic activators of immune responses
danger-associated molecular patterns (damps): the derivatives and triggers of inflammation
prophylactic treatment with s a inhibitor paquinimod reduces pathology in experimental collagenase-induced osteoarthritis
lymphopenia predicts disease severity of covid- : a descriptive and predictive study
mrp and mrp are endogenous activators of toll-like receptor , promoting lethal, endotoxin-induced shock
cholinergic agonists inhibit hmgb release and improve survival in experimental sepsis
s a /a in inflammation. front immunol
a single-cell atlas of the peripheral immune response in patients with severe covid-
risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease
a new coronavirus associated with human respiratory disease in china
single-cell transcriptome profiling reveals neutrophil heterogeneity in homeostasis and infection
alarmins and immunity
coronavirus mhv-a infects the lung and causes severe pneumonia in c bl/
age-related rhesus macaque models of covid-
clinical characteristics of cases of death from covid-
elevated exhaustion levels and reduced functional diversity of t cells in peripheral blood may predict severe
functional exhaustion of antiviral lymphocytes in covid- patients
bacterial and fungal infections in covid- patients: a matter of concern
a novel coronavirus from patients with pneumonia in china

this work was supported by the national natural science foundation of china

f.y., x.w., c.q. and q.g. conceived the study and analyzed the data. q.g., y.z. and j.l. performed most experiments and analyzed the data. f.y. and x.g. analyzed the rna-seq data. z.z., l.c. and y.l. helped with the mice experiments. y.l., x.w, x.wei, l. chen. provided support on literature search. f.y. wrote the paper. f.y., x.w. and c.q. revised the paper.

the authors have no conflicts of interest to declare.
key: cord- - hxan rw authors: li, chenyu; debruyne, david n.; spencer, julia; kapoor, vidushi; liu, lily y.; zhou, bo; pandey, utsav; bootwalla, moiz; ostrow, dejerianne; maglinte, dennis t; ruble, david; ryutov, alex; shen, lishuang; lee, lucie; feigelman, rounak; burdon, grayson; liu, jeffrey; oliva, alejandra; borcherding, adam; tan, hongdong; urban, alexander e.; gai, xiaowu; bard, jennifer dien; liu, guoying; liu, zhitong title: highly sensitive and full-genome interrogation of sars-cov- using multiplexed pcr enrichment followed by next-generation sequencing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hxan rw many detection methods have been used or reported for the diagnosis and/or surveillance of covid- . among them, reverse transcription polymerase chain reaction (rt-pcr) is the most commonly used because of its high sensitivity, typically claiming detection of about copies of viruses. however, it has been reported that only - % of the positive cases were identified by some rt-pcr methods, probably due to low viral load, timing of sampling, degradation of virus rna in the sampling process, or possible mutations spanning the primer binding sites. therefore, alternative and highly sensitive methods are imperative. with the goal of improving sensitivity and accommodating various application settings, we developed a multiplex-pcr-based method comprised of pairs of specific primers, and demonstrated its efficiency to detect sars-cov- at low copy numbers. the assay produced clean characteristic target peaks of defined sizes, which allowed for direct identification of positives by electrophoresis. we further amplified the entire sars-cov- genome from to half a million viral copies purified from covid- positive specimens, and detected mutations through next generation sequencing. 
finally, we developed a multiplex-pcr-based metagenomic method in parallel, that required modest sequencing depth for uncovering sars-cov- mutational diversity and potentially novel or emerging isolates. a variety of methods for detecting sars-cov- have been reported and discussed , , including rt-pcr, serological testing and reverse transcription-loop-mediated isothermal amplification , . currently, rt-pcr is considered the gold standard for diagnosing sars-cov- infections because of its ease of use and high sensitivity. rt-pcr has been reported to detect sars-cov- in saliva , pharyngeal swab, blood, rectal swab , urine, stool , and sputum . in laboratory conditions, the rt-pcr methodology has been shown to be capable of detecting - copies of virus or lower, through amplification of targets in the orf ab, e and n viral genes, at % confidence intervals [ ] [ ] [ ] . however, only about - % of the true positive cases were identified by rt-pcr, and % of rt-pcr negative results were actually later found to be positive with other assays, hence mandating repeated testing , [ ] [ ] [ ] . in addition, there is evidence suggesting that heat inactivation of clinical samples causes loss of viral particles, thereby hindering the efficiency of downstream diagnosis . therefore, it is necessary to develop robust, sensitive, specific and highly quantitative methods for reliable diagnostics , . the urgency to develop an effective surveillance method that can be easily used in a variety of laboratory settings is underlined by the wide and rapid spreading of sars-cov- [ ] [ ] [ ] . in addition, such method should also distinguish sars-cov- from other respiratory pathogens such as influenza virus, parainfluenza virus, adenovirus, respiratory syncytial virus, rhinovirus, human metapneumovirus, sars-cov, etc., as well as mycoplasma pneumoniae, chlamydia pneumoniae and other causes of bacterial pneumonia [ ] [ ] [ ] [ ] . 
furthermore, obtaining full-length viral genome sequences through next generation sequencing (ngs) has proven essential for the surveillance of sars-cov- 's evolution and for the containment of community spread. indeed, sars-cov- phylogenetic studies through genome sequence analysis have provided a better understanding of the transmission origin, time and routes, which has guided policy-making and management procedures.

here, we describe the development of a highly sensitive and robust detection assay that uses multiplex pcr technology to identify sars-cov- infections. theoretically, the multiplex pcr strategy, by simultaneously targeting and amplifying hundreds of targets, has significantly higher sensitivity than rt-pcr and may even detect nucleotide fragments resulting from degraded viral genomes. multiplex pcr has been shown to be an efficient and low-cost method to detect hantaan orthohantavirus and plasmodium falciparum infections, with high coverage (median %), specificity ( . %) and sensitivity. moreover, this solution can be tailored to simultaneously address multiple questions of interest within various epidemiological settings. similar to a recently described metagenomic approach for sars-cov- identification, we also establish a user-friendly multiplex-pcr-based metagenomic method that is capable of detecting sars-cov- , and could also be applied to the identification of novel pathogens with a moderate sequencing depth of approximately million reads.

several rt-pcr methods for detecting sars-cov- have been reported to date. among them, two groups reported the detection of - copies of the virus. to investigate the opportunity for further improvement upon the sensitivity of rt-pcr, we built a mathematical model to estimate the limit of detection (lod) for sars-cov- . the reported rt-pcr amplicon lengths are around - bp, and the sars-cov- genome is , bp (nc_ . ).
thus, we chose a bp amplicon length and a kb sars-cov- genome size for mathematical modeling. with the assumption of % rt-pcr efficiency, we found that rt-pcr assays could only detect . copies of sars-cov- at % probability (fig. a), which is consistent with the experimental results previously reported. in this model, the predicted probability that an rt-pcr assay detects one copy of sars-cov- is only % (supplemental fig. ). this finding may explain, at least in part, the reported - % detection rates of sars-cov- with known positive samples by rt-pcr. we further discovered that the lod appears to be independent of the viral genome size. for genomes of to kb, the detection limit remains . copies at % probability.

one way to elevate the sensitivity is to simultaneously target and amplify multiple regions of the viral genome in a multiplex pcr reaction, thereby increasing the frequency of occurrence in the mathematical model. amplifying multiple targets has the advantage of potentially detecting fragments of degraded viral genomes while tolerating genomic variations, thus allowing for the detection of new and ever-evolving viral strains. the amplification efficiency of multiplex pcr is critical for the lod. we estimated that the efficiency of our multiplex pcr technology is about % when using unique molecular identifier (umi)-labeled primers to count the amplified products after ngs sequencing (supplemental fig. and supplemental table ). however, the amplification efficiency could be lower, and amplicons would not be equally amplified, if the template used is a single strand of cdna. thus, more amplicons are potentially required for multiplex pcr to detect limited copies of the virus.

we designed a panel of pairs of multiplex pcr primers in order to increase the sensitivity of detecting sars-cov- (fig. b). the average amplicon length is bp. the amplicons span the entire sars-cov- genome with an average bp gap ( ± bp) between adjacent amplicons.
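The reasoning above, that targeting more regions per genome raises the chance of detecting a small number of viral copies, can be illustrated with a simple probability sketch. This is not the authors' model, whose exact form is not given in this text. It assumes that the number of copies drawn into a reaction is Poisson distributed, that each target on each copy amplifies independently with the same efficiency, and that a sample is called positive when at least one target amplifies. The function names and every numeric value are hypothetical.

```python
import math


def detection_probability(mean_copies, efficiency, n_targets=1):
    """P(at least one target amplifies) under the stated assumptions.

    mean_copies: expected viral copies in the reaction (Poisson mean)
    efficiency:  per-target amplification probability, between 0 and 1
    n_targets:   amplicons targeted per genome copy (1 = single-target assay)
    """
    if mean_copies < 0 or not 0.0 <= efficiency <= 1.0 or n_targets < 1:
        raise ValueError("invalid model parameters")
    # Probability that a single copy yields at least one amplified target.
    p_copy_detected = 1.0 - (1.0 - efficiency) ** n_targets
    # Detectable copies are a thinned Poisson with mean mean_copies * p.
    return 1.0 - math.exp(-mean_copies * p_copy_detected)


def copies_for_probability(target_prob, efficiency, n_targets=1):
    """Mean copy number needed to reach target_prob (closed-form inverse)."""
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    p_copy_detected = 1.0 - (1.0 - efficiency) ** n_targets
    if p_copy_detected <= 0.0:
        raise ValueError("efficiency must be greater than 0")
    return -math.log(1.0 - target_prob) / p_copy_detected


# Hypothetical comparison: one target versus a 100-target panel,
# both at an assumed 50% per-target efficiency.
for n in (1, 100):
    lod = copies_for_probability(0.95, efficiency=0.5, n_targets=n)
    print(f"{n:>3} target(s): ~{lod:.2f} mean copies for 95% detection")
```

Genome length does not appear anywhere in this simplified model, which is at least in keeping with the statement that the lod appears to be independent of genome size. In this sketch, once the panel is large the remaining limit comes almost entirely from whether any copy is sampled into the reaction at all.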
since the observed efficiency of multiplex pcr is about % in amplifying the four dna strands of a pair of human chromosomes, we assumed an efficiency of % in amplifying the single-strand of a cdna molecule. in addition, it has already been reported that % of variants are recovered when directly amplifying amplicons from a single cell using our technology . therefore, we assumed that % of targeted regions would be amplified successfully. using the same mathematical model described above, we estimated that our sars-cov- panel can detect . copies of the virus at % probability (fig. c) . again, the lod is independent of virus genome size. we also designed a second pool of multiplex pcr primer pairs. the target regions of these primer pairs overlap with the gaps between target regions of the previous pool of primer pairs (fig. b) . together, these two overlapping pools of primers provide full coverage of the entire viral genome. most importantly, using both pools in detection would lead to a calculated detection limit of . copies at % probability. the workflow was designed so that the multiplex pcr products are further amplified in a secondary pcr, during which sample indexes and ngs sequencing primers are added (fig. d ). the pcr products were first analyzed by electrophoresis to visualize potential positives. since dozens of target regions could be amplified from a single copy of sars-cov- , electrophoresis peaks with a defined distribution of peak sizes were expected. multiplex pcr could potentially amplify not only sars-cov- , but also other coronaviruses, due to shared sequence similarities despite the fact that we designed primers specific to the sars-cov- genome to avoid cross amplification. in that context, electrophoresis analysis provides a fast and sensitive indication of infection from at least that family of viruses. for specificity, the generated ngs sequencing library can be interrogated for definitive identification of the specific virus. 
two plasmids, containing the full sequence of the s and n genes of sars-cov- , respectively, were used to validate our multiplex pcr method. a total of targets are expected to be amplified within our -amplicon panel. to simulate the use of real clinical samples, these two plasmids were spiked into cdna generated from human total rna. the copy number of each plasmid was precisely determined by droplet-based digital pcr (ddpcr) with a qx system from bio-rad . the two plasmids were diluted from approximately , copies to below one copy, and were amplified in multiplex pcr reactions. the library peaks of expected sizes were obtained from , to . copies of plasmids ( fig. a) . quantification of peaks demonstrated a wide dynamic range from to about , copies of plasmids (fig. b ). the yield of the libraries started to saturate when the copy number reached , . it is possible that the saturation point could be even lower when all of the amplicons are amplified from positive clinical covid- samples, and the library peaks could be observed with even fewer viral copies. in contrast, the detected quantities of a single target on s gene by rt-pcr rapidly dropped when using . copies ( fig. b and supplemental fig. ). estimated from the mathematical model described above, employing amplicons provides a % chance to detect one single copy of the virus. we tested this predicted probability using one copy of plasmid in multiplex pcr reactions. the theoretical calculation gives a % probability to sample . copies, and a % chance to detect them based on a multiplex pcr efficiency of %. in practice, we experimentally observed a significantly higher % probability to detect . copies (fig. c ). these results suggest that the efficiency of multiplex pcr is actually higher than the previously estimated % when a single-stranded cdna molecule is amplified. when the amplified products were sequenced, we found that the recovered reads were within a range of about -fold relative depth with about . 
to . plasmids, and were uniformly distributed across the gc range (fig. d ). when detecting down to . copies of plasmids, only the reads from one amplicon were about -fold lower than the average. approximately % of the amplicons were recovered with copies of plasmids, % with . copies, and % with . copies (fig. e ). the above two pools of primers were used to amplify the full sars-cov- genomes from a total of nasopharyngeal swab specimens. these specimens were previously diagnosed to be sars-cov- positive by using rt-pcr. the viral load was found to be from to , rna copies/ul (supplemental table ). these viral genomes were successfully amplified and subsequently sequenced on the illumina miseq. we found a correlation between genome coverage and viral copy number, as expected. while about % of the genome was covered at x for copies of virus, - % of the genomes were covered at x for to , copies of virus at an average sequencing depth of , reads per amplicon (fig. a ). one genome, from the sample with , estimated viral copies, was covered at % for x. this coverage was lower than expected, and might have been caused by the poor sample quality resulting from processing or handling the viral rna or library. cleanplex libraries are usually sequenced at about , paired-end reads while still generating sufficient data for detecting mutations. to confirm sars-cov- libraries could be sequenced at about , reads per amplicon, we sub-sampled the data of genomes to , and , total reads per library, which were equivalent to and reads per amplicon, respectively. at least % of the genome was covered at x with , total reads, and % of the genome was covered at x with , total reads. even at , total reads per library, at least % of the genome was covered at x (fig. b) . the high coverage was also manifested by the superior uniformity of the number of amplicons amplified in the multiplex pcr reaction and recovered in the sequencing (fig. 
c) , and by the log distance of the number of each amplicon to the mean number of all amplicons in the library (fig. d ). the mutations in the sars-cov- genome from the specimens were detected by two independently developed algorithms. only those mutations that were detected by both methods were reported. assuming all viral particles from a single patient contained identical mutations, mutations with frequencies (%af) > % were considered to be empirically true. the majority of the mutations identified in these sars-cov- genomes clustered around loci, probably reflecting the collection of these specimens in close communities or the transmission of the virus (fig. e) . according to the similarity of these mutations, samples were categorized into three groups. the majority of the mutations in the first group showed > % mutation frequency. groups and shared a considerable number of identical mutations, while group was significantly different, suggesting that the origin of this isolate might be traced back to a distinct lineage (supplemental table ). of note, all strains contained at least one mutation that has been reported to be associated with sars-cov- virulence . the d g mutation (a g-to-a base change at position , in the reference strain nc_ . ), which began spreading in europe in early february, was then transmitted to new geographic regions and became a dominant form , was found in of these specimens. we considered mutations with %af ≤ % as a likely reflection of intra-host heterogeneity . to eliminate noise from pcr amplification and sequencing, only mutations with %af ≥ % were considered. the % cutoff was selected based on the mutation profile from sequencing the synthetic sars-cov- rna controls from twist bioscience. some true intra-host mutations with %af < % might be missed. 
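The filtering rules described above (keep only mutations reported by both algorithms, treat high-frequency mutations as true consensus changes, treat lower-frequency ones as candidate intra-host variants, and discard anything below a noise cutoff) can be sketched as follows. The data layout, the function name and the two cutoff values are hypothetical, since the actual thresholds are not legible in this text. Taking the lower of the two reported frequencies is also an assumption made here for illustration.

```python
from typing import Dict, Tuple

# A variant is identified by (genome position, reference base, alternate base).
Variant = Tuple[int, str, str]


def classify_variants(caller_a: Dict[Variant, float],
                      caller_b: Dict[Variant, float],
                      fixed_cutoff: float,
                      noise_cutoff: float) -> Dict[Variant, str]:
    """Label variants found by both callers according to allele frequency (%).

    fixed_cutoff: %AF above which a mutation is treated as a consensus change
    noise_cutoff: minimum %AF for a mutation to be kept at all
    """
    if not 0.0 <= noise_cutoff <= fixed_cutoff <= 100.0:
        raise ValueError("expected 0 <= noise_cutoff <= fixed_cutoff <= 100")
    labels: Dict[Variant, str] = {}
    # Only mutations detected by both methods are reported.
    for variant in caller_a.keys() & caller_b.keys():
        # Assumption: use the more conservative of the two frequencies.
        af = min(caller_a[variant], caller_b[variant])
        if af > fixed_cutoff:
            labels[variant] = "consensus"
        elif af >= noise_cutoff:
            labels[variant] = "intra-host"
        # Variants below noise_cutoff are dropped as likely PCR/sequencing noise.
    return labels


# Hypothetical calls and cutoffs, for illustration only.
calls_a = {(100, "A", "G"): 99.0, (200, "C", "T"): 12.0,
           (300, "G", "A"): 2.0, (400, "T", "C"): 80.0}
calls_b = {(100, "A", "G"): 98.0, (200, "C", "T"): 10.0,
           (300, "G", "A"): 3.0}
print(classify_variants(calls_a, calls_b, fixed_cutoff=60.0, noise_cutoff=5.0))
```

In this example the variant at position 400 is dropped because only one caller reports it, and the variant at position 300 is dropped because it falls below the noise cutoff.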
since no umi was used in the multiplex pcr amplification, intra-host mutations might still be contaminated with noise originating from pcr amplification, even though a % cutoff was applied. the occurrence of such noise may be exacerbated by low viral copy inputs in the multiplex pcr, or low read depth per amplicon in sequencing. in groups and , where both viral copy numbers and read depth per amplicon were high, only one intra-host mutation per genome was found in some of the specimens. in contrast, group had low copy numbers, and low read depth per amplicon, and the numbers of apparent intra-host mutations were considerably higher ( fig. e and supplemental table ). some of these intrahost mutations occurred at the aforementioned loci, or were found at other loci and were identical among the specimens in group , while the remaining ones appeared random. our findings suggested that these recurring mutations might not be true intra-host mutations. indeed, it is possible that the low copy number inputs, as well as sequencing depth, caused a reduction in the %af, and additionally introduced false intra-host mutations. therefore, copy number and sequencing depth should be cautiously considered when a mutation is found to have low %af. in order to characterize highly mutated viruses that would otherwise not be amplified by the predesigned primer pairs, and to discover unknown pathogens, we subsequently developed a user-friendly multiplex-pcr-based metagenomic method. in this method, random hexamer-adapters were used to amplify dna or cdna targets in a multiplex pcr reaction. the large amounts of non-specific amplification products were removed by using paragon genomics' background removal reagent, thus resolving a library suitable for sequencing. for rna samples, paragon genomics' reverse transcription reagents were used to convert rna into cdna, resulting in significantly reduced amount of human ribosomal rna species. 
we sequenced a library made with , copies of n and s gene-containing plasmids spiked into ng of human gdna, which roughly represents , haploid genomes. even though the molar ratios of viral targets and human haploid genomes were comparable, the n and s genes, which encompass about kb of targets, were a negligible fraction of the billion base pairs of a human genome. if every region of the human genome were amplified and sequenced at . million reads per sample, only one read of viral target would be recovered. in fact, our results showed that % of the recovered bases, or % of the recovered reads, were within the viral n and s genes ( fig. a and supplemental table ). % of sars-cov- and % of mitochondrial targets were covered (fig. b) , and their base coverage was significantly higher than for human targets (fig. c) . in contrast, only . % of human chromosomal regions were amplified. furthermore, the human exonic regions were preferentially amplified (fig. d ). this suggested that the random hexamers deselected a large portion of the human genome, while favorably amplifying regions that were more "random" in base composition. indeed, long gaps and lack of coverage in very large repetitive regions were observed in human chromosomes (fig. e) . on the contrary, the gaps in sars-cov- and mitochondrial regions were significantly shorter (fig. f) . we further optimized this method so that % of the sars-cov- genome was recovered (fig. g) . the depth of the recovered bases was within a -fold range on average. this -fold difference in coverage has been routinely observed with our multiplex pcr technology (supplemental fig. and supplemental table ). therefore, increasing sequencing depth alone might not improve the coverage of the targeted regions further. this study provides a highly sensitive and robust multiplex pcr method for the detection of sars-cov- . 
by amplifying hundreds of targets simultaneously, our multiplex pcr method is more sensitive than rt-pcr, and tolerates the presence of mutations in sars-cov- . for the purpose of diagnosis, only one of two primer pools could be used, or the two pools could be alternatively used in adjacent samples to prevent cross contamination. while the amplification products from positive samples are mainly viral amplicons, low quantities of primer dimers are produced in the negative samples. therefore, a simple measurement of the dsdna concentration by fluorometry or spectrophotometry would not be sufficient. high-resolution electrophoresis is required to resolve the length of the amplification products in order to differentiate the target amplicons from the primer-dimers. alternatively, a low-depth sequencing in the range of k reads per sample would provide definitive diagnostic results. when both primer pools are used, the entire genome of sars-cov- can be enriched, sequenced and interrogated for the presence of any mutations. we demonstrated that mutations were detected from samples with viral loads ranging from to half a million copies. for accurate sequencing and phylogenetic studies, a high-depth sequencing in the range of k reads per sample, along with an input of high viral load (> copies), are deemed necessary. we caution the interpretation of intra-host mutations obtained with a low input number of viral particles and in low sequencing depth data. the current sars-cov- panel was demonstrated to specifically amplify the entire sars-cov- genome, and sequencing data obtained from the covid- rt-pcr positive samples clearly differentiate sars-cov- from other human coronaviruses, such as mers-cov, cov e, cov oc , cov nl , cov hku . % of the obtained sequencing reads from each of the samples were aligned to the genome sequence of any of the above viral species. this is in contrast to what was reported for a similar sars-cov- panel . 
such high specificity argues strongly that this panel could be further expanded to include simultaneous detection of other respiratory viruses including influenza virus. the metagenomic method is a powerful technology that can theoretically detect any sequence in the specimens. however, metagenomic methods usually require very high sequencing depth in order to find the target sequences, and hence are economically prohibitive as a diagnostic assay. to overcome this constraint, we developed a multiplex-pcr-based metagenomic method that achieved > % coverage of the s and n genes of sars-cov- in the context of human gdna, while requiring only ~ . m total reads per library. this coverage was superior given the recommended % threshold of coverage for drafting a genome . the results were obtained with no additional means of host depletion to remove human gdna and rrna. the viral bases were % of the total recovered bases in the sequencing. yet it is still necessary to verify and validate the detection of sars-cov- and the other coronaviruses and respiratory viruses by this metagenomic method. clinical sars-cov- samples were collected at children's hospital of los angeles. samples and ancillary clinical and epidemiological data were de-identified before analysis, and are thus considered exempt from human subject regulations, with a waiver of informed consent according to cfr . (b) of the us department of health and human services. analysis of the nasopharyngeal swab samples from patients with covid- disease was approved by the ministry of health in the us. patients in the covid- outbreak from january to august provided oral consent for study enrolment and the collection and analysis of their nasopharyngeal swab. consent was obtained at the homes of patients or in hospital isolation wards by a team that included staff members of children's hospital of los angeles. the universal human reference rna was from agilent technologies, inc. (cat# ).
the plasmids containing either s or n gene of sars-cov- (puc-s and puc-n, respectively) were purchased from sangon biotech, shanghai, china. the pcr primers used in ddpcr and rt-pcr reactions for s gene are '-tgtacttggacaatcaaaaagagttgat and '-aggagcagttgtgaagttcttttc; for n gene are '-ggggaacttctcctgctagaat and '-cagacattttgctctcaagctg, respectively. panel design is based on the sars-cov- sequence nc_ . (https://www.ncbi.nlm.nih.gov/ nuccore/nc_ . /). in total, primer pairs, distributed into two separate pools, were selected by a proprietary panel design pipeline to cover the whole viral genome except for bases at its ends. primers were optimized to preferentially amplify the sars-cov- cdna versus background human cdna or genomic dna. they were also optimized to amplify the covered genome uniformly. ng of universal human reference rna was converted into cdna using random primers and superscript iv reverse transcriptase, following the supplier recommended method (thermo fisher scientific, cat# ). after reverse transcription, cdna was purified with . x volume of magnetic beads, and washed twice with % ethanol. finally, the purified cdna was dissolved in x te buffer and used per multiplex pcr reaction. plasmids puc-s and puc-n, in combination with human cdna, were used in each reaction. paragon genomics' cleanplex secondary pcr mix was used with nm of each pcr primers in ul reactions. the pcr thermal cycling protocol used was °c for min, then °c for sec, °c for sec for cycles. ddpcr ddpcr was performed on qx from bio-rad. plasmids puc-s and puc-n at the estimated copy numbers ( repeats), ( repeats), and ( repeats) were tested. in each reaction, the ddpcr thermal cycling protocol used was °c for min, then °c for sec, °c for min with cycles, °c for min and °c for min, °c hold. the resulting data were analyzed by following the supplier recommended method. paragon genomics' cleanplex multiplex pcr reagents and protocol were used. 
briefly, a µl multiplex pcr reaction was made by combining x mpcr mix, x pool of the panel, water and viral template cdna. the reaction was run in a thermal cycler ( °c for min, then °c for sec, °c for min for cycles), then terminated by the addition of µl of stop buffer. the reaction was then purified by µl of magnetic beads, followed by a secondary pcr with a pair of primers for cycles. the secondary pcr added sample indexes and sequencing adapters, allowing for sequencing of the resulting products by high throughput sequencing. a final bead purification was performed after the secondary pcr, followed by library interrogation using a bioanalyzer instrument with agilent high sensitivity dna kit (agilent technologies, inc. part# - ). a cumulative poisson probability was used to build the mathematical model. in microsoft excel, the following function was used: % of targets were assumed to be successfully amplified in multiplex pcr. for rt-pcr, = f × n × m. q = number of detected virus genomes. q is used to plot the graph reported in the paper, e.g., the copies of virus against the matching probabilities. for multiplex pcr, q = n/a, a = the number of amplicons used in the multiplex pcr. for rt-pcr, q = n × f. paragon genomics' cleanplex metagenomic reagents and protocol were used. briefly, a µl multiplex pcr reaction was made by combining x mpcr mix, x random hexamer-adapters, water and the viral template cdna. the pcr thermal cycling protocol used was °c for min, then °c for sec, °c for min, °c for min for cycles. the reaction was then terminated by the addition of µl of stop buffer, and purified by µl of magnetic beads. the resulting solution was treated with µl of cleanplex reagent at °c for min to remove non-specific amplification products. after a magnetic bead purification, the product was further amplified in a secondary pcr with a pair of primers for cycles to produce the metagenomic library. 
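The sampling step of the mathematical model described above rests on a cumulative poisson probability. The following python sketch shows that calculation under stated assumptions: the function names are hypothetical, and folding the amplification efficiency into the poisson mean (poisson thinning) is a simplification chosen for this example, not necessarily the formula the authors entered in excel.

```python
import math


def poisson_cdf(k, mean):
    """Cumulative Poisson probability P(X <= k) for X ~ Poisson(mean)."""
    if mean < 0:
        raise ValueError("mean must be non-negative")
    if k < 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def prob_at_least(k, mean):
    """Probability that an aliquot contains at least k template copies."""
    return 1.0 - poisson_cdf(k - 1, mean)


def detection_probability(mean_copies, efficiency):
    """Chance that at least one copy is both sampled and amplified.

    Assumption of this sketch: each sampled copy is amplified independently
    with probability `efficiency`, so the amplifiable copies are again
    Poisson distributed with mean mean_copies * efficiency.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return prob_at_least(1, mean_copies * efficiency)
```

With a mean of one copy per reaction, `prob_at_least(1, 1.0)` equals 1 - e^-1, so roughly a third of reactions contain no template at all, which is why detection near single-copy input is inherently probabilistic.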
this metagenomic library was further purified by magnetic beads before sequencing. high throughput ngs sequencing was performed using illumina iseq , miseq and mgi sequencers (dnbseq-g and its research-grade coolmps sequencing kits). detailed information for the samples sequenced and used in this manuscript can be found in supplemental table . raw sequencing data were trimmed for adaptors using cutadapt version . . the sequences obtained were mapped to the sars-cov- genome (nc_ . ) with bwa-mem using sentieon version . . duplicate read marking was skipped. base-quality recalibration, re-alignment of indels and calculation of quality metrics were performed with sentieon. the resulting bam files were then used to calculate depth and coverage metrics using samtools version . . . algorithms developed independently at children's hospital of los angeles and paragon genomics were used to detect the mutations in the genome of sars-cov- . all sequencing data used in this publication are available for downloading at ncbi's sequence read archive (https://www.ncbi.nlm.nih.gov/traces/study/?acc=prjna ). the workflow of the multiplex pcr method. the prepared libraries can be detected using high-resolution electrophoresis, and sequenced together with other samples using high-throughput sequencing. (a) two plasmids, containing sars-cov- s and n genes, respectively, were diluted in human cdna and amplified in multiplex pcr with pool ( pairs of primers). the number of plasmid copies per reaction, determined by ddpcr, were from , to . . the resulting products obtained after multiplex pcr were resolved by electrophoresis. the specific amplification products (the library) can still be seen with . copies of plasmids. (b) the library yields can be detected down to . copies of plasmids (n= ) by multiplex pcr (black line), while only down to . copies by rt-pcr (> . -fold difference) (red line).
(c) poisson process was used to estimate the chance of sampling around copy of viral particles, and the mathematical model was used to estimate the chance of detecting them (red line). there is % of probability to detect . copies with a multiplex pcr efficiency of %. in reality, we observed a significantly higher % probability for . copies and % probability for . copies. (d) after sequencing . and . copies of plasmids, the reads of all amplicons spanning both n and s genes were clustered within a fold range of coverage (n= ). with . copies, only the reads of one amplicon were about -fold lower than the average (n= ). (e) about % of amplicons were recovered with copies of plasmids, % with . copies, and % with . copies (n= ). . the entire genomes of sars-cov- were amplified and sequenced from a cohort of covid- positive patients. the copies of virus that were used in the multiplex pcr reaction range from to , . . - . % of the genomes were covered at x with an average of , reads per amplicon from to , copies. the coverage slightly decreased to % with copies of virus. (b) sub-sampling of samples to , total reads slightly reduced the x coverage to - %. (c) the amplification performance of the amplicons for each sample was measured by the uniformity of . x mean of the reads. each circle represents one sample. (d) in one sample, the performance of each amplicon was evaluated by their log distance to the mean reads. each circle represents one amplicon. (e). the mutations in each sars-cov- genome were detected and the sars-cov- samples were segregated into groups based on similarities. the majority of the mutations were shared by groups and . group contained low viral copy numbers in multiplex pcr, low reads per amplicon in sequencing and a considerable amount of apparent intra-host mutations. mutations in group were different from the other groups. the mutations that associate with virulence are in bold. a g, which associates with high transmission , is in red. 
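The coverage and uniformity figures quoted in these panels are simple ratios. The python sketch below shows one way to compute them, assuming per-base depths (for example, parsed from a depth table) and per-amplicon read counts are already available as lists. The function names are hypothetical, and the depth threshold and the fraction of the mean are caller-supplied because the exact values are not recoverable from this text.

```python
def coverage_fraction(depths, min_depth):
    """Fraction of reference positions covered at or above min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def uniformity(read_counts, fraction_of_mean):
    """Fraction of amplicons with reads >= fraction_of_mean * mean reads."""
    if not read_counts:
        raise ValueError("read_counts must not be empty")
    mean_reads = sum(read_counts) / len(read_counts)
    floor = fraction_of_mean * mean_reads
    passing = sum(1 for c in read_counts if c >= floor)
    return passing / len(read_counts)
```

Because both metrics are normalized, they can be compared before and after sub-sampling a library to a fixed number of total reads.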
the reference genome used was nc_ . . black vertical lines represent point mutations, green vertical lines represent intra-host mutations, red vertical lines represent deletions. (a) random hexamer-adapters were used in multiplex pcr to amplify , copies of plasmids in the background of , haploid human gdna molecules. the resulting libraries were sequenced at an average of . million total reads. of the total bases recovered, % were mapped to sars-cov- s and n genes. (b) % of s and n genes, and % of the human mitochondrial chromosome were amplified with >= x coverage, while only . % of the human chromosomes were. (c) on average, s and n genes were covered at , x, mitochondrial and human chromosomes at x and x, respectively. (d) human exons were relatively over-amplified, at about -fold higher compared to their actual ratio within the genome. (e) gaps and long regions of absence of amplification were observed for human chromosomes. an example shown here is chromosome y. small gaps were additionally found in the enlarged cluster of amplification. the long absent region (red double arrow) overlapped with the repetitive regions on y chromosome. (f) representation of the recovered regions in s and n genes and the human mitochondrial chromosome. the coverage was from , -to , -fold for s and n genes, and -to -fold for the mitochondrial chromosome. (g) this metagenomic method was subsequently improved, resulting in . % of the sars-cov- s gene covered at . x and . % of the n gene covered at . x, with a sequencing depth of . m total reads. measures for diagnosing and treating infections by a novel coronavirus responsible for a pneumonia outbreak originating in wuhan recent advances in the detection of respiratory virus infection in humans molecular and serological investigation of -ncov infected patients: implication of multiple shedding routes rapid detection of novel coronavirus (covid- ) by reverse transcription-loop-mediated isothermal amplification. 
medrxiv rapid colorimetric detection of covid- coronavirus using a reverse transcriptional loop-mediated isothermal amplification (rt-lamp) diagnostic platform: ilaco. medrxiv consistent detection of novel coronavirus in saliva detectable -ncov viral rna in blood is a strong indicator for the further clinical severity comparison of different samples for novel coronavirus detection by nucleic acid amplification tests comparison of throat swabs and sputum specimens for viral nucleic acid detection in cases of novel coronavirus (sars-cov- ) infected pneumonia (covid- ). medrxiv detection of novel coronavirus ( -ncov) by real-time rt-pcr molecular diagnosis of a novel coronavirus ( -ncov) causing an outbreak of pneumonia development of genetic diagnostic methods for novel coronavirus (ncov- ) in japan correlation of chest ct and rt-pcr testing in coronavirus disease (covid- ) in china: a report of cases chest ct for typical -ncov pneumonia: relationship to negative rt-pcr testing comparison of the clinical characteristics between rna positive and negative patients clinically diagnosed with racing towards the development of diagnostics for a novel coronavirus ( -ncov) clinical and biochemical indexes from -ncov infected patients linked to viral loads and lung injury first cases of coronavirus disease (covid- ) in france: surveillance, investigations and control measures novel coronavirus outbreak in wuhan, china, : intense surveillance is vital for preventing sustained transmission in new locations laboratory readiness and response for novel coronavirus ( -ncov) in expert laboratories in eu/eea countries interpretation of "guidelines for the diagnosis and treatment of novel coronavirus ( -ncov) infection by the national health commission (trial version diagnosis and clinical management of novel coronavirus infection: an operational recommendation of peking union medical college hospital genome detective coronavirus typing tool for rapid identification and 
characterization of novel coronavirus genomes diagnosis, treatment, and prevention of novel coronavirus infection in children: experts' consensus statement emerging novel coronavirus ( -ncov)-current scenario, evolutionary perspective based on genome analysis and recent developments potential of large "first generation" human-to-human transmission of -ncov transmission dynamics and evolutionary history of -ncov cross-species transmission of the newly identified coronavirus -ncov genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding the -new coronavirus epidemic: evidence for virus evolution the global spread of -ncov: a molecular evolutionary analysis the first two cases of -ncov in italy: where they come from comparison of targeted next-generation sequencing for whole-genome sequencing of hantaan orthohantavirus in apodemus agrarius lung tissues sensitive, highly multiplexed sequencing of microhaplotypes from the plasmodium falciparum heterozygome. biorxiv rna based mngs approach identifies a novel human coronavirus from two individual pneumonia cases in wuhan outbreak abstract : targeted single cell dna sequencing without prior whole genome amplification for mutational analysis of circulating tumor cells comparison of four digital pcr platforms for accurate quantification of dna copy number of a certified plasmid dna reference material patient-derived mutations impact pathogenicity of sars-cov- . medrxiv spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- . biorxiv characterization of intra-host sars-cov- variants improves phylogenomic reconstruction and may reveal functionally convergent mutations. biorxiv a rapid, low cost, and highly sensitive sars-cov- diagnostic based on whole genome sequencing. 
biorxiv supplemental information highly sensitive and full-genome interrogation of sars-cov- using multiplexed pcr enrichment followed by next-generation sequencing utsav pandey , moiz bootwalla , dejerianne ostrow , dennis maglinte ca usa department of psychiatry and behavioral sciences, department of genetics children's hospital los angeles the authors would like to thank jian xu of mgi, bgi-shenzhen for technical support in mgi sequencing, dr. jin billy li of stanford university for the coordination of academic support, dr. alexander e. urban of stanford university for suggestions in preparing the manuscript, and dr. zihuai he of stanford university for discussions on a potential mathematical model of the metagenomic method. the authors declare no competing interests. key: cord- -oak okj authors: leng, ling; ma, jie; zhang, leike; li, wei; zhao, lei; zhu, yunping; wu, zhihong; cao, ruiyuan; zhong, wu title: potential microenvironment of sars-cov- infection in airway epithelial cells revealed by human protein atlas database analysis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: oak okj the outbreak of covid- has caused serious epidemic events in china and other countries. with the rapid spread of covid- , it is urgent to explore the pathogenesis of this novel coronavirus. however, the foundational research of covid- is very weak. although angiotensin converting enzyme (ace ) is the reported receptor of sars-cov- , information about sars-cov- invading airway epithelial cells is very limited. based on the analysis of the human protein atlas database, we compared the virus-related receptors of epithelial-derived cells from different organs and found potential key molecules in the local microenvironment for sars-cov- entering airway epithelial cells. 
in addition, we found that these proteins were associated with virus reactive proteins in host airway epithelial cells, which may promote the activation of the immune system and the release of inflammatory factors. our findings provide a new research direction for understanding the potential microenvironment required by sars-cov- infection in airway epithelial cells, which may assist in the discovery of potential drug targets against sars-cov- infection. the outbreak of covid- pneumonia was declared a "public health emergency of international concern" by the world health organization (who). on january , , chinese scientists isolated a new type of coronavirus named sars-cov- from an infected patient. sars-cov- causes serious respiratory diseases and is transmitted by human-to-human contact [ ] . however, the pathogenesis of this novel pathogen and any potential drug targets remain to be elucidated. the clinical symptoms of sars-cov- resemble those of sars-cov [ ] , and the genome of sars-cov- shows good homology with that of sars-cov [ ] , with . % sequence consistency between the two coronaviruses [ ] . sars-cov mainly replicates in the lower respiratory tract, resulting in diffuse alveolar damage [ ] . the tissue tropism of sars-cov- is also the lower respiratory tract [ , ] . in addition, it was reported that cytopathic effects were observed h after inoculation of sars-cov- on the surface layers of human airway epithelial cells (epcs), and a lack of cilium beating was observed after virus infection [ ] . however, it was found that huh epithelial-like cells from the liver that are similar to airway epcs produced no specific cytopathic effects after six days of virus infection [ ] . in addition, a recent study has shown that an epithelial cell line from cervical cancer (hela) was also insensitive to sars-cov- infection [ ] . these studies suggest that cells from different epithelial sources have different susceptibility to sars-cov- . 
coronaviruses are enveloped and require a series of interactions with their host cells during virus entry, depending on tissue and cell-type specificities (e.g., receptor and microenvironment) in addition to strain and species [ ] . although some microorganisms may display a commensal relationship with epcs [ ] , different microorganisms will specifically attack the epcs of different organs. for example, sars-cov mainly binds to cells expressing angiotensin-converting enzyme (ace ), which is one of the major receptors of sars-cov when entering cells [ ] . to date, more studies have confirmed that ace is also the receptor of sars-cov- infection [ ] ; however, ace displays high expression in the kidney, bile duct, and testis with low expression in the airways and lungs [ ] [ ] [ ] . lindskog, et al also emphasized that the expression of ace in human respiratory system is limited, and questioned the role of ace for infection in human lungs [ ] . therefore, we wonder whether there are some key local microenvironment proteins specifically expressed on the surface of airway epc that makes the virus prefer airway epcs. in some cases, additional cell surface molecules or co-receptors are required for sufficient viral entry into host cells. it remains unclear whether these local environments with specific components are related to the susceptibility of different cell types to sars-cov- . to study the specificity of airway epcs for sars-cov- infection, we compared host cell receptors and virus infection-related host proteins expressed in epithelial or epithelial-derived cells from different organs and found a series of specific proteins and key receptors of airway epcs related to virus infection. these proteins may be beneficial to the design of potential drug targets against sars-cov- infection. generally, the expression profiles of human genes on the protein level were derived from the human protein atlas (hpa, https://www.proteinatlas.org/) [ ] . 
the virus-related biological and functional gene ontology (go) annotation terms were derived from the gene ontology knowledgebase (http://geneontology.org/, released on - - , containing , go terms) [ ] . all human protein annotations were downloaded from the swiss-prot database of uniprot (https://www.uniprot.org/) [ ] . the overall analysis pipeline is shown in figure a . first, all go terms related to virus-host functions were collected from the go-obo file. then, after manual checking, only those terms related to host-virus receptors and key biological processes were kept (table s ) , and the human proteins involved in these particular go terms were retrieved from the uniprot database. after that, the protein expression levels in epithelial and epithelial-derived cells from different organs were derived from the hpa and mapped to these proteins. heat map views of protein expression levels were displayed using conditional formatting of microsoft excel spreadsheets. all protein-protein interactions were retrieved from the string database [ ] , and the protein-protein interaction network was built using cytoscape software (version . . ) [ ] . first, ace was found highly expressed in intestine, kidney, gallbladder, adrenal gland, testis, and seminal vesicle among organs in the hpa, at both the protein and rna levels ( figure s ), consistent with previous reports [ , ] . single-cell sequencing also showed that ace was expressed in only amounts in airway tissue and type ii alveolar cells (at ). in addition, mrna data of all cell lines in the hpa show low expression of ace in human small-cell lung cancer cells (sclc- h, figure s ). however, ace was not detected at the protein level in airway or lung tissues, indicating that the correlation between protein and transcript levels varied widely [ ] . thus, limitations arise when predicting protein expression based only on mrna expression levels. 
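The mapping step described in the methods (take the proteins annotated with virus-related go terms, attach their hpa expression levels per cell type, and look for airway-specific patterns) can be sketched in python as follows. This is an illustration under stated assumptions rather than the authors' workflow, which used excel: the function name `airway_specific`, the input layout, and the default cutoffs are hypothetical, while the four ordered level labels follow the categories used by the hpa for protein expression.

```python
# Ordered HPA-style protein expression categories.
LEVELS = {"Not detected": 0, "Low": 1, "Medium": 2, "High": 3}


def airway_specific(expression, airway_cells, min_airway="Medium", max_other="Low"):
    """Return proteins expressed in airway cell types but not elsewhere.

    expression: {protein: {cell_type: level_label}} (hypothetical layout)
    airway_cells: set of cell-type names treated as airway epithelium
    min_airway / max_other: assumed cutoffs, not taken from the paper
    """
    hits = []
    for protein, by_cell in expression.items():
        airway = [LEVELS[lvl] for cell, lvl in by_cell.items() if cell in airway_cells]
        other = [LEVELS[lvl] for cell, lvl in by_cell.items() if cell not in airway_cells]
        if not airway:
            continue  # no airway measurement, nothing to conclude
        if min(airway) >= LEVELS[min_airway] and all(o <= LEVELS[max_other] for o in other):
            hits.append(protein)
    return sorted(hits)
```

Ranking the labels numerically reproduces what the conditional-formatting heat map shows visually, and makes the specificity criterion explicit and adjustable.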
in a clinical study of patients, it was speculated that sars-cov- mainly infected the lower respiratory tract [ ] . a chinese research group also isolated this virus from airway epcs [ , ] . these results indicate that airway epcs are possibly a susceptible and primary target for sars-cov- . furthermore, initial viral entrance may cause cytopathological changes to airway epcs, resulting in coughing in % of patients [ ] and further indicating that the lung is also susceptible to virus infection. thus, the susceptibility of different organs to sars-cov- is not consistent with corresponding protein expression levels of ace on these organs. we therefore asked (i) why sars-cov- infects the respiratory tract rather than other organs as its first target organ, (ii) what is the difference between airway epcs and epithelial-derived cells of other organs, and (iii) whether specific local microenvironment components were required for sars-cov- infection besides ace that are sufficient to compensate for the low expression of ace in airway epcs. to explore these issues, we first outlined the expression profile of virus receptors or receptor-related membrane proteins (collectively named virus microenvironment components) of epcs located in different organs. we used the human protein atlas (hpa) database [ ] to extract the protein expression level of receptors involved in "virus receptor activity" (go: ) of epcs and epithelial or epithelial-derived cells from organs ( cell types) ( figure a and supplementary materials and methods). by comparing the data of different organs, we found that the expression pattern of these virus microenvironment components varies among the epithelial-derived cells from different organs. we focused on airway epcs including virus microenvironment components from the nasopharynx and bronchi because these two sites are most likely to be preferred by sars-cov- . 
we found serpin family b member (serpinb ) and cadherin-related family member (cdhr ) were specifically expressed in airway epcs, which attracted our attention ( figure b ) for their potential as important components of the viral microenvironment in the airway. in particular, cdhr shows little or no expression within the epithelial-derived cells of other organs ( figure c ). serpinb is in the serine protease family, and it was reported that the s protein of coronavirus can be activated by transmembrane serine protease family member (ttsp) to complete its membrane fusion process [ , ] . thus, the role of serpinb on airway epcs during sars-cov- infection merits further study. other potential receptors such as xenotropic and polytropic retrovirus receptor (xpr ), in addition to being highly expressed on airway epcs, are also expressed on the surface of other epithelial-derived cells. xpr was also highly expressed on colon gland cells and tonsil squamous epcs; moderately expressed on epcs of the oral mucosa, lung, kidney and muscle; and poorly expressed on liver and bile duct cells ( figure b, figure s ). clinical and epidemiological studies also revealed that diarrhea has been observed in a subset of patients infected with sars-cov- in addition to the pulmonary manifestations [ ] [ ] [ ] , indicating that sars-cov- can also infect the intestinal tract as well as the lung tissue through specific receptors and microenvironment components. this putative intestinal infection requires special attention, because it suggests that sars-cov- may also be transmitted through the fecal-oral route. stool samples from patients infected with sars-cov- in the united states have also been positive for the presence of sars-cov- , strongly suggesting fecal transmission [ ] . nucleic acid of the similar coronavirus sars-cov was also detected in patients' stool samples [ ] [ ] [ ] . 
among the proteins with low expression on airway epcs, we focused on dystroglycan (dag ) and found it is only specifically expressed in airway and lung pneumocytes epcs ( figure b ). together these data suggest that the expression levels of putative virus microenvironment components in different types of epithelial-derived cells are organ-specific. we next investigated whether the proteins that respond to sars-cov- infection display organ-specific expression. we used the same scheme to extract virus-host interaction-related entries (table s ). in the search category of viral-reactive proteins that modulate host cell morphology or physiology, few were expressed in the liver. notably, bcl -associated agonist of cell death (bad) and zinc finger c h-type containing a (zc h a) were highly expressed exclusively in airway epcs; neurotrophic receptor tyrosine kinase (ntrk ) was moderately expressed exclusively in airway epcs. another protein, eukaryotic translation initiation factor alpha kinase (eif ak ), which was highly expressed in airway epcs, also was found moderately expressed in the lung, kidney, and muscle ( figure s ). in terms of viral defense, we found that most of the proteins reactive to viruses are highly expressed in different organs ( figure s ). we found more specific highly expressed proteins to study whether the virus induces a biological response after infection through were both putative rna helicases. in the interaction network, we focused on the protein interaction of virus microenvironment components with high expression in airway epcs, which probably perform a specific function in airway epcs after virus infection. rnasel, the interacting protein of xpr , matched these parameters and was specifically highly expressed in airway and tonsil squamous epcs (figure a) . 
rnasel probably mediates its antiviral effects through a combination of direct cleavage of single-stranded viral rnas, inhibition of protein synthesis through the degradation of rrna, induction of apoptosis, and induction of other antiviral genes. another receptor protein, dag , which was seldom expressed in airway epcs, interacted with interleukin (il)- (figure a) . il- binds to and signals through the il- receptor-like (il rl )/st receptor, which in turn activates nuclear factor (nf)-κb and mitogen-activated protein kinase (mapk) signaling pathways in target cells. in addition, we found most of the interacting proteins of virus microenvironment components on airway epcs are related to the immune response, including toll-like receptors (tlr), nf-κb, rig-i-like receptor, nucleotide-binding oligomerization domain-containing protein (nod)-like receptor and tnf signaling pathways ( figure b) . upon detection of a pathogen, epithelia respond by upregulating expression of defensins, which confer immediate protection to the host, and alarmins, which convey danger signals to neighboring epcs. epithelia are equipped with tlr surface receptors, which recognize conserved molecular patterns associated with pathogens or tissue damage. tlr signals activate nf-κb through the adaptor protein myd [ ] . we found that the most enriched signaling pathways in the interaction network between airway epc receptors and host-reactive protein are the tlr signaling pathway followed by the nf-κb signaling pathway ( figure b ). in addition, we found putative activation of the rig-i-like receptor signaling pathway, which begins a downstream signal pathway to secrete type i interferon in the interaction network. many proteins related to the production and regulation of type i interferon were also observed in the interaction network ( figure s ). 
the most enriched item was il- β, consistent with reports that serum from patients infected with sars-cov- contains higher levels of the pro-inflammatory factor il- β than serum from uninfected people [ ] . in addition, the nod-like receptor and tnf signaling pathways mediate the expression of a variety of inflammatory factors including monocyte chemoattractant protein- (mcp- ), interferon-induced protein (ip- ), and tnf-α, which is consistent with the covid- clinical characteristics [ , ] . these results indicate that the virus microenvironment components expressed in airway epcs, especially proteins that interact with specifically and highly expressed microenvironment components, may be involved in inhibition of host cell biological processes and important immune response pathways related to the antiviral response. in this manuscript, we propose that airway-specific virus microenvironment components may assist the interaction between ace and sars-cov- even though the expression of ace is very low in the tracheal epithelium, which provides other potential directions for studying sars-cov- entry into airway epcs. our study also suggests that the table s , and the staining data and pictures were downloaded from "the tissue atlas" of hpa (https://www.proteinatlas.org/ ensg -cdhr /tissue). 
lamp rpsa itga xpr scarb slc a hspa b serpinb cdhr cd hspa a nectin npc efnb f r slc a tyro cxadr tnfrsf cldn itgav ide cd nectin dag tfrc pvr havcr slc a itgb scarb itgb cd no protein expression data in hpa early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission a pneumonia outbreak associated with a new coronavirus of probable bat origin the clinical pathology of severe acute respiratory syndrome (sars): a report from china epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study clinical features of patients infected with novel coronavirus in a novel coronavirus from patients with pneumonia in china physiological and molecular triggers for sars-cov membrane fusion and entry into host cells epithelial cells: liaisons of immunity angiotensin-converting enzyme is a functional receptor for the sars coronavirus single-cell rna-seq data analysis on the receptor ace expression reveals the potential risk of different human organs vulnerable to -ncov infection, frontiers of medicine single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses single-cell analysis of ace expression in human kidneys and bladders reveals a potential route of -ncov infection, biorxiv the protein expression profile of ace in human tissues, biorxiv tissue-based map of the human proteome the gene ontology resource: years and still going strong uniprot: the universal protein knowledgebase the string database in : quality-controlled protein-protein association networks, made broadly accessible cytoscape: a software environment for integrated models of biomolecular interaction networks quantitative 
proteomics of the cancer cell line encyclopedia tmprss activates the human coronavirus e for cathepsin-independent host cell entry and is expressed in viral target cells in the respiratory epithelium the spike protein of the emerging betacoronavirus emc uses a novel coronavirus receptor for entry, can be activated by tmprss , and is targeted by neutralizing antibodies first case of novel coronavirus in the united states enteric involvement of severe acute respiratory syndrome-associated coronavirus infection clinical progression and viral load in a community outbreak of coronavirus-associated sars pneumonia: a prospective study coronavirus as a possible cause of severe acute respiratory syndrome a map of toll-like receptor expression in the intestinal epithelium reveals distinct spatial the authors declare no conflict of interest. key: cord- -uxn olw authors: lu, maolin; uchil, pradeep d.; li, wenwei; zheng, desheng; terry, daniel s.; gorman, jason; shi, wei; zhang, baoshan; zhou, tongqing; ding, shilei; gasser, romain; prévost, jérémie; beaudoin-bussières, guillaume; anand, sai priya; laumaea, annemarie; grover, jonathan r.; liu, lihong; ho, david d.; mascola, john r.; finzi, andrés; kwong, peter d.; blanchard, scott c.; mothes, walther title: real-time conformational dynamics of sars-cov- spikes on virus particles date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: uxn olw sars-cov- spike (s) mediates entry into cells and is critical for vaccine development against covid- . structural studies have revealed distinct conformations of s, but real-time information that connects these structures, is lacking. here we apply single-molecule förster resonance energy transfer (smfret) imaging to observe conformational dynamics of s on virus particles. virus-associated s dynamically samples at least four distinct conformational states. in response to hace , s opens sequentially into the hace -bound s conformation through at least one on-path intermediate. 
conformational preferences of convalescent plasma and antibodies suggest mechanisms of neutralization involving either competition with hace for binding to rbd or allosteric interference with conformational changes required for entry. our findings inform on mechanisms of s recognition and conformations for immunogen design. against the virus( - ). s is synthesized as a precursor, processed into s and s by furin proteases, and activated for fusion when human angiotensin-converting enzyme (hace ) engages the receptor-binding domain (rbd) and when the n-terminus of s is proteolytically processed ( ) ( ) ( ) . structures of soluble ectodomains and native virus particles have revealed distinct conformations of s, including a closed trimer with all rbd oriented downward, trimers with one or two rbds up, and hace -stabilized conformations with up to three rbd oriented up ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ). real-time information that connects these structures, however, has been lacking. smfret is well suited to inform on conformational dynamics of proteins reporting domain movements in the millisecond to second range, and has previously been applied to study hiv- , influenza a, and ebola spike glycoproteins, via measurements of the distance-dependent energy transfer from an excited donor to a nearby acceptor fluorophores in real-time( - ). to probe dynamics of sars-cov- spikes, we used available high-resolution structures of the sars-cov- s trimer to identify sites of fluorophore pair labeling that have the potential to inform on distance changes expected to accompany conformational changes between the rbddown and receptor hace -induced rbd-up trimer structures( , ) (fig. s ). accordingly, we engineered a and q labeling peptides before and after the receptor-binding motif (rbm) to allow site-specific introduction of donor and acceptor fluorophores at these positions ( fig. , a, b, and fig. s ). 
we optimized retroviral and lentiviral pseudoviral particles carrying the sars-cov- s protein (fig. s ) to test the impact of these peptides on infectivity, and found that they were well tolerated, both individually and in combination (fig. s d ). to measure conformational dynamics of the sars-cov- s trimer on the surface of virus particles, we established two types of particles: lentiviral pseudoparticles carrying s, and coronavirus-like particles generated by expression of s, membrane (m), envelope (e) and nucleocapsid (n) protein (s-men)( , ) (fig. , a and b ). s-men particles co-express coronavirus surface proteins m and e. particle quality and the presence of the corona-like s proteins on both particle surfaces were confirmed by cryo-electron microscopy ( fig. , c and d) . for smfret, lentivirus particles and s-men particles were generated (see materials and methods) by transfecting hek t cells with an excess of plasmid encoding wild-type s, doped with trace amounts of plasmid expressing labeling peptide-carrying s to ensure the production of virus particles that contain, on average, only a single engineered s protein. as for analogous investigations of hiv- envelope protein( , ), donor (cy b( s)) and acceptor (ld ) fluorophores were enzymatically conjugated to the engineered s proteins presented on the virus particle surface in situ (see materials and methods). a biotinylated lipid was then incorporated into the virus particle membrane to allow particle immobilization within passivated microfluidic devices coated with streptavidin to enable imaging by total internal reflection microscopy (fig. a) . donor fluorophores on single, immobilized virus particles were excited by a single-frequency nm laser and fluorescence emission from both donor and acceptor fluorophores was recorded at hz ( fig. a) . 
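as a minimal sketch of how recorded donor and acceptor intensities are commonly reduced to a fret value, the block below computes the apparent fret efficiency per frame, e = i_a / (i_d + i_a), and uses a pearson coefficient to check for anti-correlated intensities. this uncorrected ratio is an assumption: the source does not state which corrections (background, crosstalk, detection efficiency) the authors applied, and the function names and intensity values are hypothetical.

```python
import math


def fret_efficiency(donor, acceptor):
    """Apparent (uncorrected) FRET efficiency per frame: E = I_A / (I_D + I_A)."""
    if len(donor) != len(acceptor):
        raise ValueError("donor and acceptor traces must have equal length")
    trace = []
    for i_d, i_a in zip(donor, acceptor):
        total = i_d + i_a
        # Frames with no signal (e.g. after photobleaching) have no defined efficiency.
        trace.append(i_a / total if total > 0 else float("nan"))
    return trace


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical intensities (arbitrary units), invented for illustration only.
DONOR = [800.0, 780.0, 210.0, 190.0, 790.0]
ACCEPTOR = [200.0, 220.0, 790.0, 810.0, 210.0]

if __name__ == "__main__":
    print("FRET trace:", fret_efficiency(DONOR, ACCEPTOR))
    print("donor/acceptor correlation:", pearson(DONOR, ACCEPTOR))
```

a strongly negative correlation between the two channels is the signature of a genuine distance change, whereas intensity fluctuations that move both channels in the same direction would not qualify.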
from the recorded movies, we computationally extracted hundreds of smfret traces exhibiting anti-correlated donor and acceptor fluorescence intensities, the telltale signature of conformational changes within the s protein on individual virus particles. analyses of smfret data from ligand-free s protein on lentiviral particles revealed that the sars-cov- s protein is dynamic, sampling at least four distinct conformational states. to identify the receptor-bound conformation of the sars-cov- s protein by smfret, we measured the conformational consequences of soluble, monomeric hace binding. addition of the monomeric hace receptor to surface-immobilized virus particles led to increased occupancy of the low-(~ . ) fret s protein conformation (fig. e) , which was observed at both the single-molecule and population level (fig. f ). similar hace receptor impacts on the sars-cov- s protein were observed in both lentiviral particle and s-men coronavirus-like particle contexts (fig. , e to g). dimeric hace , a more potent ligand (fig. s a )( ), stabilized the low-(~ . ) fret s protein conformation more efficiently (fig. s , b and c), suggesting that the observed low-fret state likely represents the receptor-bound state in which all three rbd domains are oriented upwards (rbd-up conformation). a unique strength of single-molecule imaging is its capacity to reveal directly both the structural and kinetic features that define biological function ( , ) . to extract such information for the sars-cov- s protein, we employed hidden markov modeling (hmm)( ) to idealize individual smfret traces. these data allowed quantitative analyses of the thousands of discrete hace -binding -could be achieved spontaneously. as expected, the binding of the hace receptor shifted the dynamic s protein conformational landscape towards the rbd-up conformation (~ . fret), rendering it the most populated ( fig. , b, c, f, g). 
this change resulted from an increase in the transition rate from the rbd-down conformation (~ . fret) towards the intermediate-(~ . ) fret state and subsequently the rbd-up (~ . fret) conformation, which was also modestly stabilized. the energy barriers for reverse transitions towards the rbd-down conformation (~ . fret), were also elevated, explaining receptor-bound conformation accumulation over time (figs. s ). these analyses lead to a qualitative model for hace activation of the sars-cov- s protein from the ground state to the receptor-bound state through at least one intermediate conformation (fig. h ). the summary of relative state occupancies, transition rates among conformations and errors are listed in tables s and s , respectively. in most cell types, the serine protease tmprss is required for ph-independent sars-cov- entry ( , , ) . in vitro, the effect of tmprss is mimicked by the serine protease trypsin, which has similar cleavage specificity ( , ) . smfret analysis of trypsin-treated s protein on lentiviral particles in the absence of receptor revealed a clear shift towards activation (fig. , a, b). after trypsin treatment, the addition of hace receptor was more effective at stabilizing the s protein in the rbd-up (~ . fret) conformation (fig. , c and d, fig. s ). to further validate this finding, we measured the effects of trypsin pre-treatment in virus-cell and cell-cell fusion assays using split nanoluc system consisting of lgbit and hibit (fig. , e and f, fig. s ). here, membrane fusion restored luciferase activity between lentiviral particles carrying the s protein as well as a vpr-hibit fusion protein with cells expressing the lgbit counterpart fused to a ph domain. this assay revealed fusion to be strictly dependent on the hace receptor and to be stimulated by trypsin treatment (fig. , e and f). nearly identical results were observed for cell-cell fusion between donor cells expressing s and target cells expressing hace ( fig. 
s ) , confirming the activating role of trypsin treatment. we next explored the suitability of the smfret assay to characterize the conformational consequences of antibody binding to the sars-cov- s protein. multiple studies on antibodies generated from covid- patients have shown that one type of antibody often dominates immune responses( - ). this prompted us to screen plasma from convalescent patients with neutralizing activity that can bind to the s protein on lentiviral particles( ) using a modified virus-capture assay (vca)( ). cross-reactive cr ( ), one of the very first reported antibodies from sars-cov- that also binds to the sars-cov- spike rbd, served as a good indicator of rbd binding (fig. a) . we identified two plasma samples (s and s ) able to specifically bind the rbd, recognize s expressed at the cell surface, and neutralize viral particles (fig. , a to c, and fig. s ). smfret analysis of antibody-bound s revealed that both cr and plasma from patient s stabilized s in the rbd-up (~ . fret) conformation, in a fashion similar to the receptor hace (fig. , d and e). these data point to the presence of rbd-directed antibodies in patient s . by contrast, smfret indicated that plasma from patient s contained an activity that stabilized the rbd-down (~ . fret) conformation (fig. f ). plasma s antagonized hace binding, but rbd competition did not affect its recognition of s, suggesting that its neutralization activity does not solely rely on blocking the receptor interface. we then assessed the conformational preference of four rbd-directed antibodies: the potently neutralizing antibodies h , - and - , and the neutralizing nanobody vhh , each of which binds rbd in a different way( - ). antibody h and nanobody vhh stabilized the s protein in an rbd-up (~ . fret) conformation similar to hace , cr , and s , whereas antibody - shifted the conformational landscape towards the rbd-down (~ . fret) conformation, similar to s (fig. , g to j). 
the very potent neutralizing antibody - ( ), meanwhile, showed a partial shift to the rbd-up (~ . fret) conformation (fig. s ) . the absence or presence of hace did not appear to affect the rbd-up stabilization evidenced for antibodies cr , s , vhh , or h (fig. s ) . however, plasma s , and to a lesser extent antibody - , reduced the hace -dependent stabilization of the rbd-up (~ . fret) conformation, suggesting that they may interfere with hace receptor binding via an allosteric mechanism. these findings indicate that sars-cov- neutralization can be achieved in two ways: ) by antibodies that conformationally mimic hace and directly compete with hace receptor binding, or ) by antibodies that allosterically stabilize the s protein in its rbd-down conformation. the strength of the presented smfret approach lies in its capacity to examine the dynamic properties of the s protein in real time, including: ) the distinct conformational states that it spontaneously samples under physiological conditions; ) the impact of sequence alterations on s protein dynamics; and ) the responses of the s protein to cognate hace receptor and antibody recognition. the present analyses of dynamic s protein molecules provide three lines of evidence that the intermediate-(~ . ) fret state observed represents the rbd-down, ground state conformation of the s protein, in which all three rbd domains are oriented towards the viral particle membrane. first, in line with previous electron microscopy (em) investigations ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ), the rbd-down state is the most populated. in further agreement with recent em studies, both the disulfide bridge (s c, d c)( , ) and antibody - stabilized the s protein in a conformation with all three rbd oriented down . 
while our smfret observations highlight considerable conformational flexibility in these contexts compared to em of soluble trimers, we attribute these distinctions to a tendency of our analysis approach to over-emphasize dynamic features, while em may over-emphasize static conformations rigidified by cryogenic temperatures that may be more readily identified and classified( ). multiple lines of evidence also facilitated assignment of the rbd-up (~ . fret) conformation to the s protein state in which all three rbd domains are oriented away from the virus particle membrane. for instance, this conformation was stabilized by soluble monomeric hace receptor, and even further stabilized in the presence of soluble, dimeric hace receptor as well as rbd-targeting antibodies, such as cr , that are known to access their epitopes when the s protein is in an activated, rbd-up conformation( - ). the structure of the on-path (~ . fret) intermediate observed during s opening is likely similar to the all-down ground state; cryo-em structures of soluble sars-cov- s trimers( ) that engage one or two hace receptor molecules( ) reveal that the distance between the two labeling sites increases in the ligand-free protomers adjacent to a protomer engaged to hace (fig. s a ). the structure of the additional, highly compacted s conformation (~ . fret), which is also depopulated by activating ligands, remains unknown. these smfret analyses are in global agreement with the conformational states observed at the single particle em and cryoet level ( , , , - , - , , ) . the observed fret changes are also in good agreement with the expected increase in the distance between the labeling peptide insertion sites that carry the fluorophores in the rbd-down and rbd-up conformations of the s trimer. the capacity to examine the conformational preferences of rbd-directed antibodies to the s protein enabled us to identify conformational signatures of antibodies in patient plasma. 
this approach identified patients with antibody activities that either mimicked ace (indicating anti-rbd activity) or stabilized the ground state of s, thereby interfering with data and materials availability: all data is available in the main text or the supplementary materials. the data that support the findings of this study are available from the corresponding authors upon reasonable request. the full source code of spartan, which was used for analysis of smfret data, is publicly available. (http://www.scottcblanchardlab.com/software). some small customized matlab scripts are available upon reasonable requests. a full-length wild-type pcmv -sars-cov- spike (s +s )-long (termed as pcmv-s, codon-optimized, sino biological, cat # vg -ut) plasmid was used as a template to generate tagged pcmv-s. the translated amino acid sequence of pcmv-s is identical to each pair of inserted tags did not compromise s-dependent lentivirus infectivity. infectivity measurements the infectivity of lentivirus particles carrying sars-cov- spike proteins was cryo-electron tomography nm gold tracer was added to the concentrated s-decorated hiv- lentivirus and s-men particles viruses at : ratio, and µl of the mixture was placed onto freshly glow discharged holey carbon grids for min. grids were blotted with filter paper, and rapidly frozen in liquid ethane using a homemade gravity-driven plunger apparatus. cryo-grids were imaged on a cryo-transmission electron microscope (titan krios, thermo fisher scientific) that was operated at kv, using a gatan k direct electron detector in counting mode with a ev energy slit. tomographic tilt series between − ° and + ° were collected by using serialem( ) in a dose-symmetric scheme( ) with increments of °. the nominal magnification was , x, giving a pixel size of . Å on the specimen. the raw images were collected from single-axis tilt series with accumulative dose of ~ e− per Å . the defocus was - µm and frames were saved for each tilt angle. 
frames were motion-corrected using motioncorr ( ) to generate drift-corrected stack files, which were aligned using gold fiducial markers by imod/etomo( ). tomograms were reconstructed by weighted back projection and tomographic slices were visualized with imod. virus particles were labeled through site-specific enzymatic labeling, as previously the sars-cov- rbd elisa assay used was recently described ( , ) introduction of fluorophores (cy b, green; ld , red) was guided by conformational changes in s induced by binding of the cellular receptor human angiotensin-converting enzyme (hace ) from the "rbd-down" to the "rbd-up" conformation (fig. s ) . rbd, receptor-binding domain; ntd, n-terminal domain. structures were adapted from rcsb protein data bank accessions vsb ('down' s /s protomer: s , light cyan; s , dark blue) and vyb/ m j ('up' protomer s /s engaged with hace : hace , magenta). table s . (h) relative free energy model of conformational landscapes of sars-cov- spikes in response to hace binding. the differences in free energies between states were roughly scaled based upon the relative occupancy of each state. (fig. b) . fret histograms represent mean ± s.e.m., determined from three randomly assigned populations of fret traces. for state occupancies see table s . fret histograms represent mean ± s.e.m., determined from three randomly-assigned populations of all fret traces. for evaluated state occupancies see table s . table s . relative state-occupancy and fitting parameters in each of four fret-defined conformational states of sars-cov- spike protein on the surface of virus particles. the fret efficiency histograms were fitted to the sum of four gaussian distributions (µ, the mean or expectation of the gaussian distribution; σ, s.d. of the gaussian distribution) for each conformational state. parameters were based upon the observation of original fret efficiency data and were further determined using hidden markov modelling. 
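the four-gaussian description above can be sketched as follows. this is an illustration under stated assumptions, not the authors' fitting code: the state parameters are hypothetical placeholders (the fitted values are not recoverable from this text), occupancy is taken as each component's share of the total area, and the free-energy helper assumes a simple boltzmann relation, Δg = -kT ln(p_i / p_ref), as one plausible reading of "roughly scaled based upon relative state occupancies".

```python
import math


def gaussian(x, amplitude, mu, sigma):
    """Single Gaussian component evaluated at FRET efficiency x."""
    return amplitude * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def mixture(x, states):
    """Sum of Gaussian components; states is a list of (amplitude, mu, sigma)."""
    return sum(gaussian(x, a, mu, s) for a, mu, s in states)


def occupancies(states):
    """Relative occupancy of each state as its share of the total area."""
    areas = [a * s * math.sqrt(2.0 * math.pi) for a, _, s in states]
    total = sum(areas)
    return [area / total for area in areas]


def relative_free_energy(occ, ref_index):
    """Free energy of each state relative to a reference state, in units of kT.

    Assumes a Boltzmann relation between occupancy and free energy.
    """
    p_ref = occ[ref_index]
    return [-math.log(p / p_ref) for p in occ]


# Hypothetical (amplitude, mu, sigma) for four states; NOT values from the paper.
STATES = [(1.0, 0.1, 0.08), (2.0, 0.3, 0.08), (5.0, 0.5, 0.08), (2.0, 0.8, 0.08)]

if __name__ == "__main__":
    occ = occupancies(STATES)
    print("occupancies:", occ)
    print("free energies (kT):", relative_free_energy(occ, ref_index=2))
    print("model at E=0.5:", mixture(0.5, STATES))
```

in practice the amplitudes, means and widths would come from a least-squares fit to the measured histogram; the sketch only shows how occupancies and relative free energies follow once those parameters are known.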
relative conformational state occupancies of sars-cov- spike protein on viral particles are presented as mean ± s.e.m., determined from three independent measurements. r-squared values were evaluated to indicate the goodness of fit. ligand-free . % +/- % % +/- % % +/- % % +/- % + hace . % +/- % % +/- % % +/- % % +/- % table s . transition rates between observed conformational states of sars-cov- spike on virus particles. the survival probability plots (figs. s and s ) were derived from distributions of dwell times for each state-to-state transition determined through hidden markov modeling (hmm). the plots were then fitted with double-exponential distributions: $y(t) = A_1 \exp(-k_1 t) + A_2 \exp(-k_2 t)$, where y(t) is the probability and t is the dwell time. the presented rates were the weighted average of the two rates derived from the double-exponential decays. rates were finally presented in the table as (weighted average +/- % confidence intervals). development of an inactivated vaccine candidate for sars-cov- an mrna vaccine against sars-cov- -preliminary report dna vaccine protection against sars-cov- in rhesus macaques single-shot ad vaccine protects against sars-cov- in rhesus macaques sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov sars-cov- and bat ratg spike glycoprotein structures inform on virus evolution and furin-cleavage effects distinct conformational states of sars-cov- spike protein controlling the sars-cov- spike glycoprotein conformation structure-based design of prefusion-stabilized sars-cov- spikes closing coronavirus spike glycoproteins by structure-guided design. biorxiv structural basis of receptor recognition by sars-cov- structural and functional basis of sars-cov- entry by using human ace structure of the sars-cov- spike receptor-binding domain bound to the ace receptor molecular architecture of the sars-cov- virus. 
biorxiv structures and distributions of sars-cov- spike proteins on intact virions key: cord- -qj coqfr authors: wei, yulong; silke, jordan r.; aris, parisa; xia, xuhua title: coronavirus genomes carry the signatures of their habitats date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qj coqfr coronaviruses such as sars-cov- regularly infect host tissues that express antiviral proteins (avps) in abundance. understanding how they evolve to adapt to or evade host immune responses is important in the effort to control the spread of covid- . two avps that may shape viral genomes are the zinc finger antiviral protein (zap) and the apolipoprotein b mrna-editing enzyme-catalytic polypeptide-like protein (apobec ). the former binds to cpg dinucleotides to facilitate the degradation of viral transcripts while the latter deaminates c into u residues, leading to dysfunctional transcripts. we tested the hypothesis that both apobec and zap may act as primary selective pressures that shape the genome of an infecting coronavirus by considering a comprehensive set of publicly available genomes for seven coronaviruses (sars-cov- , sars-cov, mers, bovine cov, murine mhv, porcine hev, and canine cov). we show that coronaviruses that regularly infect tissues with abundant avps have cpg-deficient and u-rich genomes, whereas viruses that do not infect tissues with abundant avps do not share these sequence hallmarks. in sars-cov- , cpg is most deficient in the s protein region, possibly to evade zap-mediated antiviral defense during cell entry. furthermore, over four months of sars-cov- evolutionary history, we observed a marked increase in c to u substitutions in the ’ utr and orf ab regions. this suggests that the two regions could be under constant c to u deamination by apobec . the evolutionary pressures exerted by host immune systems onto viral genomes may motivate novel strategies for sars-cov- vaccine development. 
sequence region of apobec and zc hav isoforms were extracted in fasta format along with their ensembl accession ids. to compare gene expressions of apobec and zc hav l among tissues, we retrieved publicly available rna sequencing and microarray studies that each sampled at least mammalian tissues. the five mammalian species that have extensive tissue-specific mrna expressions are homo sapiens, bos taurus, canis lupus familiaris, mus musculus, and sus scrofa. given that the data extracted were from multiple independent sources, thus not directly comparable, the relative mrna expression level designations (high or low) for apobec and zap isoforms in a given tissue were derived from comparisons among avp expressions in all tissues in each independent source. specifically, we calculated the proportion of mrna , we calculated the averaged pme value by considering all tissue- specific pme values in each independent source. finally, for each avp, tissue-specific pmes were designated as high if they are greater than the averaged pme value and low if they are less than the averaged pme. in addition, each column in supplemental figures s and s with column title designations "apobec " or "zc hav " contains the tissue-specific avp expressions from an individual source, where darkest blue represents the tissue with the highest mrna expression and darkest red represents the lowest mrna expression. retrieving and processing the genomes and regular habitats of coronaviruses infecting five mammalian species the genome, accession id, and sample we computed the nucleotide and di-nucleotide frequencies in each viral genome. among ( ) the index is expected to be with no cpg deficiency or excess, smaller than if cpg is deficient and greater than if cpg is in excess. next, among high sequence quality and complete sars-cov- genomes from cncb, we records of tissue infection are located in supplemental file s ). 
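the formula for the cpg index is not recoverable from the text above, so the sketch below assumes the conventional observed-over-expected ratio, i_cpg = p_cg / (p_c × p_g), which may differ in detail from the authors' definition. under this assumed definition the index equals one when cpg occurs as often as expected from the c and g frequencies, falls below one when cpg is deficient, and exceeds one when cpg is in excess. the function name is hypothetical.

```python
def cpg_index(sequence):
    """Observed/expected CpG ratio: p_CG / (p_C * p_G).

    Assumed conventional definition; accepts DNA or RNA letters in any case.
    """
    seq = sequence.upper().replace("U", "T")
    length = len(seq)
    if length < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        raise ValueError("index is undefined without both C and G")
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    n_cg = seq.count("CG")
    p_cg = n_cg / (length - 1)  # there are length - 1 dinucleotide positions
    p_c = n_c / length
    p_g = n_g / length
    return p_cg / (p_c * p_g)


if __name__ == "__main__":
    # Toy sequences for illustration only; real input would be a viral genome.
    for toy in ("GGCC", "CGCG", "ACGU"):
        print(toy, cpg_index(toy))
```

ambiguous bases are counted toward the sequence length here; a stricter version would first filter the sequence to unambiguous nucleotides.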
we determined which human tissues are commonly infected by coronaviruses and whether these tissues express avps in abundance. figure expressions, such as porcine hev infecting pig liver (fig. a), canine cov infecting dog intestine and lung (fig. b), and bovine cov infecting cattle intestine (fig. c). none of these three coronaviruses avoids tissues with high avp expression, nor do they display a compelling preference for tissues with low avp expression. lastly, murine mhv regularly infects mouse brain and liver but rarely infects the lung; however, mouse brain and liver express low levels of
only coronaviruses targeting tissues with high avp expressions exhibit decreased cpg and increased u content
in the previous section, we demonstrated that many surveyed host-specific coronaviruses commonly infect tissues that exhibit high levels of avps (fig. , ; supplemental fig. s , s ), but mhv does not conform to this observation (fig. d). here we compared the cpg and u content of these coronaviruses and found that viruses that regularly infect avp-rich tissues tend to based on global sequence comparison, figure a shows that most snps are c->u substitutions. more specifically, local mutation patterns (fig. b) show that among sequence samples, increases over time, but only at the ' utr region (fig. a) and orf ab region (fig. b) and not in other regions (fig. c, d, supplemental fig. s ). it is noteworthy that in the s region, tissues expressing both avps in abundance (fig. a, b, and c). unsurprisingly, these global trends were absent from murine mhv genomes (fig. ) as this virus does not regularly infect tissues that highly express avps (fig. d)
key: cord- -yi yu l authors: zhang, g.; pomplun, s.; loftis, a. r.; tan, x.; loas, a.; pentelute, b. l. title: investigation of ace n-terminal fragments binding to sars-cov- spike rbd date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yi yu l
coronavirus disease (covid- ) is an emerging global health crisis. with over million confirmed cases to date, this pandemic continues to expand, spurring research to discover vaccines and therapies. sars-cov- is the novel coronavirus responsible for this disease. it initiates entry into human cells by binding to angiotensin-converting enzyme (ace ) via the receptor binding domain (rbd) of its spike protein (s). disrupting the sars-cov- -rbd binding to ace with designer drugs has the potential to inhibit the virus from entering human cells, presenting a new modality for therapeutic intervention.
peptide-based binders are an attractive solution to inhibit the rbd-ace interaction by adequately covering the extended protein contact interface. using molecular dynamics simulations based on the recently solved cryo-em structure of ace in complex with sars-cov- -rbd, we observed that the ace peptidase domain (pd) α helix is important for binding sars-cov- -rbd. using automated fast-flow peptide synthesis, we chemically synthesized a -mer peptide fragment of the ace pd α helix (sbp ) composed entirely of proteinogenic amino acids. chemical synthesis of sbp was complete in . hours, and after workup and isolation > milligrams of pure material was obtained. bio-layer interferometry (bli) revealed that sbp associates with micromolar affinity to insect-derived sars-cov- -rbd protein obtained from sino biological. sbp did not associate to an appreciable extent with hek cell-expressed sars-cov- -rbd proteins or with insect-derived variants acquired from other vendors. moreover, competitive bli assays showed that sbp does not outcompete ace binding to sino biological insect-derived sars-cov- -rbd. further investigations are ongoing to gain insight into the molecular and structural determinants of the variable binding behavior to different sars-cov- -rbd protein variants.
a novel coronavirus (sars-cov- ) from wuhan, china, has caused over million confirmed cases and over , deaths globally, according to the covid- situation report from who on june , (https://www.who.int/emergencies/diseases/novel-coronavirus- /situation-reports/), and the number is continually growing. similar to the sars-cov outbreak in , sars-cov- causes severe respiratory problems. coughing, fever, difficulty breathing and/or shortness of breath are the common symptoms. older patients with pre-existing medical conditions are at greatest risk, with a mortality rate of ~ . % or even higher in some regions. moreover, human-to-human transmission can occur rapidly by close contact.
to slow this pandemic and treat infected patients, rapid development of specific antiviral drugs is of the highest the closely-related sars-cov coronavirus invades host cells by binding the angiotensin- converting enzyme (ace ) receptor on human cell surface through its viral spike protein (s) [ - ] . it was recently established that sars-cov- uses the same receptor for host cell entry and binds ace with an affinity comparable with the corresponding spike protein of sars-cov [ , ] . recent cryo-electron microscopy (cryo-em) structural studies of the sars-cov- spike protein receptor binding domain (rbd) in complex with full-length human ace receptor revealed key amino acid residues at the contact interface between the two proteins and estimated the binding affinity at ~ nm [ , ] . these studies provide valuable information that can be leveraged for the development of disruptors specific for the sars-cov- /ace protein-protein interaction (ppi). small-molecule inhibitors are often less effective at disrupting extended protein binding interfaces [ ] . peptides, on the other hand, offer a synthetically accessible solution to disrupt ppis by binding at interface regions containing multiple contact "hot spots" [ ] . we hypothesized that disruption of the viral sars-cov- -rbd/host ace interaction with peptide-based binders would prevent virus entry into human cells, offering a novel opportunity for rbd was simulated under tip p explicit water conditions. analyzing the simulation trajectory after ns, we found that sbp remains on the spike rbd protein surface in a stable conformation ( fig. b ) with overall residue fluctuations smaller than . nm compared with their starting coordinates ( fig. a) . per-residue analysis along the ns trajectory showed that the middle residues of sbp , a -mer sequence we termed sbp , have significantly reduced fluctuations (fig. c, d) , indicating key interactions. 
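the per-residue analysis described above can be illustrated with a short sketch. it assumes the trajectory has already been aligned and reduced to one representative coordinate per residue (for example the cα atom), and it measures each residue's root-mean-square displacement from its starting coordinates, as the text describes. the function name `per_residue_fluctuation` and the input layout are hypothetical; the authors' analysis was carried out in vmd and may differ in detail.

```python
import math


def per_residue_fluctuation(frames):
    """Root-mean-square displacement of each residue from frame 0.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue, in consistent units (e.g. nm). Frames are assumed
    to be pre-aligned to remove global rotation and translation.
    """
    if not frames:
        raise ValueError("trajectory is empty")
    reference = frames[0]
    squared_sums = [0.0] * len(reference)
    for frame in frames:
        for i, (pos, ref) in enumerate(zip(frame, reference)):
            # squared Euclidean distance of residue i from its start position
            squared_sums[i] += sum((a - b) ** 2 for a, b in zip(pos, ref))
    return [math.sqrt(total / len(frames)) for total in squared_sums]
```

residues with the smallest values would correspond to the stable middle segment of the peptide noted in the text.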
the results of this md simulation suggest that sbp and sbp peptides derived from the ace-pd α helix may alone potentially bind the sars-cov- spike rbd protein with sufficient affinity to disrupt the associated ppi. automated fast-flow peptide synthesis yields > % pure compound the two n-terminal biotinylated peptides, sbp and sbp , derived from the α helix were prepared by automated fast-flow peptide synthesis [ , ] with a total synthesis time of . h over coupling cycles. after cleavage from resin, global deprotection, and subsequent c solid- phase extraction, the purity of the crude peptides was estimated to be > % for both biotinylated sbp and sbp based on lc-ms tic chromatograms (supplemental fig. ). we assessed this purity as acceptable for direct downstream biological characterization. sbp peptide binds sino biological insect-derived sars-cov- -rbd with micromolar affinity, but does not associate with other commercial sources of sars-cov- -rbd bio-layer interferometry (bli) was used to measure the binding affinity of the synthesized peptide sbp to glycosylated sino biological insect-derived sars-cov- -rbd, sino biological acrobiosystems hek-expressed sars-cov- -rbd. in all of these assays, biotinylated sbp was immobilized onto streptavidin (sa) biosensors. after fitting the association and dissociation curves from serial dilutions of the protein, the dissociation constant (kd) of sbp to glycosylated sino biological insect-derived sars-cov- -rbd was determined to be ~ nm using the global fitting algorithm and : binding model (fig. e ). however, sbp did not associate with the other three sars-cov- -rbd proteins studied (fig. e) . surprisingly, a scrambled sequence of sbp exhibited binding to the sino biological insect-derived sars-cov- -rbd with comparable association response to sbp at nm concentration (fig. e ). sbp had no observable binding to a negative control human protein menin (fig. f) . 
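the 1:1 binding model used for fitting bli curves can be sketched as below. this is a generic textbook form of the model, not the octet software's implementation: the dissociation constant is the ratio of the off-rate to the on-rate, the association phase rises exponentially toward an equilibrium response, and the dissociation phase decays exponentially. all function names and parameter values are hypothetical.

```python
import math


def kd_from_rates(k_on, k_off):
    """Dissociation constant KD = k_off / k_on (M), for k_on in 1/(M*s) and k_off in 1/s."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Sensor response during association for a 1:1 interaction.

    R(t) = R_eq * (1 - exp(-k_obs * t)), with k_obs = k_on * conc + k_off
    and R_eq = r_max * conc / (conc + KD).
    """
    k_obs = k_on * conc + k_off
    r_eq = r_max * conc / (conc + kd_from_rates(k_on, k_off))
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, k_off):
    """Sensor response during dissociation: R(t) = r0 * exp(-k_off * t)."""
    return r0 * math.exp(-k_off * t)
```

in a global fit, a single pair of rate constants is shared across all analyte concentrations in the dilution series, which is why serial dilutions of the protein are measured.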
likewise, no association was observed between the biotinylated -mer sbp and sino biological insect-derived sars-cov- -rbd (fig. f) . sbp does not compete with biotinylated ace binding to sino biological insect-derived using a competition-format bli assay, we confirmed that soluble human ace protein could compete with immobilized biotinylated ace (acrobiosystems) for binding sino biological insect-derived sars-cov- -rbd, and that a -fold excess of soluble ace (relative to immobilized biotinylated ace ) abolished nearly all of the initial ace /rbd binding interaction ( fig. b, d ). however, competition was not observed when using non-biotinylated sbp pre- mixed in solution with sino biological insect-derived sars-cov- -rbd, even with a -fold excess of the peptide (fig. c, e ). these data suggest that sbp potentially binds sars-cov- -rbd at a different site than ace , binds sars-cov- -rbd too weakly, or for other unknown reasons cannot disrupt the native ace /rbd interaction. recently published cryo-em structures of the rbd of sars-cov- in complex with human ace have identified this ppi as a key step for the entry of sars-cov- into human cells [ , ] . blocking this binding interface represents a highly promising therapeutic strategy, as it could potentially hinder cellular uptake of sars-cov- and intracellular replication. drugging ppis is a longstanding challenge in traditional drug discovery and peptide-based approaches might help to solve this problem. small molecule compounds are unlikely to bind large protein surfaces that do not have distinct binding pockets. peptides, on the other hand, display a larger surface area and chemical functionalities that can mimic and disrupt the native ppi, as is the case for the clinically approved hiv peptide drug fuzeon [ , ] . the identification of a suitable starting point for drug discovery campaigns can be time- intensive. during a pandemic such as this one, therapeutic interventions are urgently needed. 
peptide-based strategies were developed to target both the spike protein rbd and s subunit of the first sars-cov virus [ , ] . translating these approaches to sars-cov- , inhibitors of the spike protein fusion with the cell membrane and engineered mini-proteins that bind sars-cov- rbd were developed [ , ] . we aimed to determine the minimum length required of the ace n-terminal peptide fragment in order to maintain binding affinity to sars-cov- -rbd and thus potentially deliver a synthetically accessible therapeutic candidate. to rapidly identify potential short peptide binders to the sars-cov- spike protein, we used molecular dynamics ( and several important interactions with the spike protein were observed consistently with multiple lines of published data [ , ] . we used this peptide (sbp ) as an experimental starting point for the development of a sars-cov- spike protein binder. our rapid automated flow peptide synthesizer enabled the synthesis of tens of milligrams of sbp peptide within . h. the crude purity was determined to be > % and therefore sufficient for binding validation by bli. the interaction between n-terminal biotinylated sbp and the rbd of glycosylated sars- cov- spike protein was investigated in detail. we performed serial dilutions of the soluble protein to reliably determine the binding affinity of sbp to sino biological insect-derived sars-cov- - rbd. using a global fitting algorithm, we found that n-terminal biotinylated sbp binds sino biological insect-derived sars-cov- -rbd with micromolar affinity (kd = . µm), a value almost -times higher than the estimated binding affinity of the native ace receptor (kd ~ nm [ ]) (fig. e ). the decreased binding affinity relative to ace may partially explain why sbp was unable to significantly disrupt ace binding to sino biological insect-derived sars-cov- -rbd even at -fold excess in a bli competition assay (fig. c,e) . in addition, the comparable affinity observed for a scrambled sequence of sbp (fig. 
e) are in progress to gain additional understanding of these molecular processes. in conclusion, a biotinylated peptide sequence derived from human ace was found to bind sino biological insect-derived sars-cov- spike protein rbd with micromolar affinity, but did not associate with sars-cov- -rbd variants obtained from other commercial sources. in spite of this association, competitive bli data indicate that sbp , even at -fold excess, did not compete with ace for binding to sars-cov- -rbd. our preliminary studies highlight the unexpected challenges researchers may encounter while developing peptide-based approaches to disrupt the specific interactions of sars-cov- with its mammalian cell membrane receptors.
the cryo-em structure of the ternary complex of sars-cov- -rbd with ace -b at (pdb: m ) was chosen as the initial structure, which was explicitly solvated in an Å box, to perform a ns molecular dynamics (md) simulation using namd on mit's supercomputing clusters (gpu node). the amber force field was used to model the protein and peptide. the md simulation system was equilibrated at k for ns. periodic boundary conditions were used and long-range electrostatic interactions were calculated with the particle mesh ewald method, with the non-bonded cutoff set to . Å. the shake algorithm was used to constrain bonds involving hydrogen atoms. the time step was fs and the trajectories were recorded every ps. after simulation production runs, trajectory files were loaded into the vmd software for further analysis.
after peptide synthesis, the peptidyl resin was rinsed briefly with dichloromethane and then dried in a vacuum chamber overnight. the next day, approximately ml of cleavage solution ( % trifluoroacetic acid (tfa), % tips, . % edt, . % water) was added into the syringe containing the resin. the syringe was kept at room temperature for h before injecting the cleavage solution into a ml conical tube.
dry-ice cold diethyl ether (~ ml) was added to the cleavage mixture and the precipitate was collected by centrifugation and triturated twice with cold diethyl ether ( ml). the supernatant was discarded. residual ether was allowed to evaporate and the peptide was dissolved in water with . % tfa for solid-phase extraction. after peptide cleavage, peptide precipitates were dissolved in water with . % tfa. agilent mega tfa, and then equilibrated with ml of water with . % tfa. peptides were loaded onto the column for binding, followed by washing with ml of water with . % tfa, and finally, eluted with ml of / water/acetonitrile (v/v) with . % tfa. liquid chromatography-mass spectrometry (lc-ms) peptides were dissolved in water with . % tfa followed by lc-ms analysis on an agilent ifunnel esi-q-tof instrument using an agilent jupiter c reverse-phase column ( . mm × mm, μm particle size). mobile phases were . % formic acid in water (solvent a) and . % formic acid in acetonitrile (solvent b). linear gradients of to % solvent b over minutes (flow rate: . ml/min) were used to acquire lc-ms chromatograms. a fortebio octet® red bio-layer interferometry system (octet red , fortebio, ca) was used to characterize the in vitro peptide-protein binding affinity at °c and rpm. sino biological hek-expressed sars-cov- -rbd (solution in phosphate-buffered saline) and (b) glycosylated sino biological insect-derived sars-cov- -rbd with associated deconvoluted mass spectra obtained by integration over the protein peak at ~ min. the broad bands in the tic chromatograms of (b) are due to additives present in the vendor-formulated solid powder ( % glycerol, % trehalose, % mannitol and . % tween- ). 
structure of sars coronavirus spike receptor-binding domain complexed with receptor receptor and viral determinants of sars-coronavirus adaptation to human ace lethal infection of k -hace mice infected with severe acute respiratory syndrome coronavirus retroviruses pseudotyped with the severe acute respiratory syndrome coronavirus spike protein efficiently infect cells expressing angiotensin-converting enzyme sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov- spike glycoprotein cryo-em structure of the -ncov spike in the prefusion conformation structural basis for the recognition of the sars-cov- by full-length human ace features of protein-protein interactions that translate into potent inhibitors: topology, surface area and affinity mrna display: from basic principles to macrocycle drug discovery identification of critical determinants on ace for sars-cov entry and development of a potent entry inhibitor a pan-coronavirus fusion inhibitor targeting the hr domain of human coronavirus spike inhibition of sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion an engineered stable mini-protein to plug sars-cov- spikes synthesis of proteins by automated flow chemistry a fully automated flow-based approach for accelerated peptide synthesis peptide-based inhibitors of protein-protein interactions enfuvirtide, an hiv- fusion inhibitor receptor recognition by novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars site-specific glycan analysis of the sars-cov- spike in-solution enrichment identifies peptide inhibitors of protein-protein interactions key: cord- -s fp z q authors: chan, kui k.; tan, timothy j.c.; narayanan, krishna k.; procko, erik title: an engineered decoy receptor for sars-cov- broadly binds 
protein s sequence variants date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: s fp z q the spike s of sars-cov- recognizes ace on the host cell membrane to initiate entry. soluble decoy receptors, in which the ace ectodomain is engineered to block s with high affinity, potently neutralize infection and, due to close similarity with the natural receptor, hold out the promise of being broadly active against virus variants without opportunity for escape. here, we directly test this hypothesis. we find an engineered decoy receptor, sace .v . , tightly binds s of sars-associated viruses from humans and bats, despite the ace -binding surface being a region of high diversity. saturation mutagenesis of the receptor-binding domain followed by in vitro selection, with wild type ace and the engineered decoy competing for binding sites, failed to find s mutants that discriminate in favor of the wild type receptor. we conclude that resistance to engineered decoys will be rare and that decoys may be active against future outbreaks of sars-associated betacoronaviruses. zoonotic coronaviruses have crossed over from animal reservoirs multiple times in the past two decades, and it is almost certain that wild animals will continue to be a source of devastating outbreaks. unlike ubiquitous human coronaviruses responsible for common respiratory illnesses, these zoonotic coronaviruses with pandemic potential cause serious and complex diseases, in part due to their tissue tropisms driven by receptor usage. severe acute respiratory syndrome coronaviruses (sars-cov- ) and (sars-cov- ) engage angiotensin-converting enzyme (ace ) for cell attachment and entry ( - ). ace is a protease responsible for regulating blood volume and pressure that is expressed on the surface of cells in the lung, heart and gastrointestinal tract, among other tissues ( , ) . 
the ongoing spread of sars-cov- and the disease it causes, covid- , has taken a crippling toll on global healthcare systems and economies, and effective treatments and vaccines are urgently needed. as sars-cov- becomes endemic in the human population, it has the potential to mutate and undergo genetic drift and recombination. to what extent this will occur as increasing numbers of people are infected and mount counter immune responses is unknown, but already a variant in the viral spike protein s (d g) has rapidly emerged from multiple independent events and affects s protein stability and dynamics ( , ). another s variant (d y) became prevalent in portugal, possibly due to a founder effect ( ). coronaviruses have moderate to high mutation rates. for example, − substitutions per year per site occur in hcov-nl ( ), an alphacoronavirus that also binds ace , albeit via a smaller interface that is only partially shared with the rbds of sars-associated betacoronaviruses ( ). additionally, large changes in coronavirus genomes have frequently occurred in nature from recombination events, especially in bats where co-infection levels can be high ( , ). recombination of mers-covs has also been documented in camels ( ). all of this will have profound implications for the current pandemic's trajectory, the potential for future coronavirus pandemics, and whether drug resistance in sars-cov- becomes prevalent. the viral spike is a vulnerable target for neutralizing monoclonal antibodies that are progressing through clinical trials, yet in tissue culture escape mutations in the spike rapidly emerge against all antibodies tested ( ). deep mutagenesis of the isolated receptor-binding domain (rbd) by yeast surface display has easily identified mutations in s that retain high expression and ace affinity, yet are no longer bound by monoclonal antibodies and confer resistance ( ).
this has motivated the development of cocktails of non-competing monoclonals ( , ) , inspired by lessons learned from the treatment of hiv- and ebola, to limit the possibilities for the virus to escape. notably, drug maker eli lilly has a monoclonal monotherapy (ly-cov ) in advanced trials (nct ) where the emergence of resistant virus variants has occurred; the trial has been updated to include an arm with a second monoclonal (ly- cov ). however, even the use of monoclonal cocktails does not address future coronavirus spill overs from wild animals that may be antigenically distinct. indeed, large screening efforts were required to find antibodies from recovered sars-cov- patients that cross-react with sars-cov- ( ), indicating antibodies have confined capacity for interacting with variable epitopes on the spike surface, and are unlikely to be broad and pan-specific for all sars-related viruses. an alternative protein-based antiviral to monoclonal antibodies is to use soluble ace (sace ) as a decoy to compete for receptor-binding sites on the viral spike ( , ( ) ( ) ( ) ( ) of diverse sars-associated betacoronaviruses that use ace for entry. we further fail to find mutations within the rbd, which directly contacts ace and is where possible escape mutations will most likely reside, that redirect specificity towards the wild type receptor. we conclude that resistance to an engineered decoy receptor will be rare, and sace .v . targets common attributes for affinity to s in sars-associated viruses. the affinities of the decoy receptor sace .v . were determined for purified rbds from the s proteins of five coronaviruses from rhinolophus bat species (isolates lyra , rs , rs , rs and rsshc ) and two human coronaviruses, sars-cov- and sars-cov- . these viruses fall within a common clade of betacoronaviruses that use ace as an entry receptor ( ). 
they share close sequence identity within the rbd core while variation is highest within the functional ace binding site (figures and s ) , possibly due to a co-evolutionary 'arms race' with polymorphic ace sequences in ecologically diverse bat species ( ). affinity was measured by biolayer interferometry (bli), with sace (a.a. s -g ) fused at the c-terminus with the fc moiety of human igg immobilized to the sensor surface and monomeric his-tagged rbd ( figure s ) used as the soluble analyte. this arrangement excludes avidity effects, which otherwise cause artificially tight (picomolar) apparent affinities whenever dimeric sace in solution is bound to immobilized rbd decorating an interaction surface. wild type sace bound all the rbds with affinities ranging from nm for sars-cov- to nm for lyra , with median affinity nm ( sars-cov- to . nm for isolate rs , with median affinity less than nm ( table ). the approximate -fold affinity increase of the engineered decoy applies universally to coronaviruses in the test panel and the molecular basis for affinity enhancement must therefore be grounded in common attributes of rbd/ace recognition. the rbd of sars-cov- (pdb m ) is colored by diversity between sars-associated cov strains (blue, conserved; red, variable). a deep mutational scan of the rbd in the context of full-length s reveals residues in the ace binding site are mutationally tolerant to explore potential sequence diversity in s of sars-cov- that may act as a 'reservoir' for drug resistance, the mutational tolerance of the rbd was evaluated by deep mutagenesis ( ). saturation mutagenesis was focused to the rbd (a.a. c -l ) of full-length s tagged at the extracellular n- terminus with a c-myc epitope for detection of surface expression. the spike library, encompassing , single amino acid substitutions, was transfected in human expi f cells under conditions where cells typically acquire no more than a single sequence variant ( , ). 
the culture was incubated with wild type, his-tagged, dimeric sace at a sub-saturating concentration ( . nm). bound sace - h and surface-expressed s were stained with fluorescent antibodies for flow cytometry analysis ( figure a ). compared to cells expressing wild type s, the library was poorly expressed, indicating many mutations are deleterious for folding and expression. a cell population was clearly discernable expressing s variants that bind ace with decreased affinity ( figure b ). after gating for c-myc-positive cells expressing s, cells with high and low levels of bound sace were collected by fluorescence-activated cell sorting (facs), called the ace -high and ace -low populations, respectively ( figure c ). both the expression and sace binding signals decreased over minutes to hours during sorting, possibly due to shedding of the s subunit. cells were therefore collected and pooled from three separate facs experiments for a combined hours sort time. averaging the log enrichment ratios for each of the possible amino acids at a residue position. by adding conservation scores for both the ace -high and ace -low sorts we derive a score for surface expression, which shows that the hydrophobic rbd core is tightly conserved for folding and trafficking of the viral spike ( figure a ). by comparison, residues on the exposed rbd surface are mutationally permissive for s surface expression. this matches the mutational tolerance of proteins generally. for tight ace binding (i.e. s variants in the ace -high population), conservation increases for rbd residues at the ace interface, yet mutational tolerance remains high ( figure c ). the sequence diversity observed among natural betacoronaviruses, which display high diversity at the ace binding site, is therefore replicated in the deep mutational scan, which predicts the sars-cov- spike tolerates substantial genetic diversity at the receptor-binding site for function. 
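the enrichment and conservation calculations described above can be sketched as follows. this is a minimal illustration under stated assumptions: the enrichment ratio of a variant is taken as its frequency in the sorted population divided by its frequency in the naive library, the logarithm is taken in base 2, and a residue's conservation score is the mean of the log enrichment ratios over the substitutions observed at that position. the function names, the pseudocount, and the `(position, amino_acid)` keys are hypothetical and not drawn from the authors' software.

```python
import math
from collections import defaultdict


def log2_enrichment(sorted_counts, naive_counts, pseudocount=1.0):
    """Log2 enrichment ratio per variant.

    Both arguments map (position, amino_acid) -> read count. The pseudocount
    (an assumption of this sketch) avoids taking the log of zero.
    """
    total_sorted = sum(sorted_counts.values())
    total_naive = sum(naive_counts.values())
    ratios = {}
    for variant, n_naive in naive_counts.items():
        freq_sorted = (sorted_counts.get(variant, 0) + pseudocount) / total_sorted
        freq_naive = (n_naive + pseudocount) / total_naive
        ratios[variant] = math.log2(freq_sorted / freq_naive)
    return ratios


def conservation_scores(ratios):
    """Mean log2 enrichment ratio across substitutions at each position."""
    by_position = defaultdict(list)
    for (position, _amino_acid), value in ratios.items():
        by_position[position].append(value)
    return {pos: sum(vals) / len(vals) for pos, vals in by_position.items()}
```

strongly negative scores mark positions where most substitutions are depleted, which is the pattern the text reports for the hydrophobic rbd core.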
from this accessible sequence diversity sars-cov- might feasibly mutate to acquire resistance to monoclonal antibodies or engineered decoy receptors targeting the ace -binding site. binding site (e.g. v , y and c ) is free to mutate for yeast surface display, but its sequence is constrained in our experiments; this region of the rbd is buried by connecting structural elements to the global fold of an s subunit in the closed-down conformation (this is the dominant conformation for s subunits and is inaccessible to receptor binding) ( , , , ). we used targeted mutagenesis to individually test alanine substitutions to all the cysteines in the rbd ( figure s ). we found that all cysteine-to-alanine mutations severely diminish s surface expression in expi f cells, including c a and c a on the rbd 'backside' that were neutral in the yeast display scan ( ). these differences demonstrate that there are tighter sequence constraints on the rbd in the context of a full spike expressed at a human cell membrane, yet overall we consider the two data sets to closely agree. for binding to dimeric sace , we note that interface residues were more tightly conserved in the starr et al. data set (figure d ), possibly a consequence of three differences between the deep mutagenesis experiments. first, our selections for ace binding of s variants at the plasma membrane appear to primarily reflect mutational effects on surface expression, which is almost certainly more stringent in human cells. yeast permit many poorly folded proteins to leak to the cell surface ( ). second, the yeast selections were conducted at multiple sace concentrations from which apparent k d changes were computed ( ); the starr et al. data are very comprehensive in this regard. due to the long sort times required for our human cell libraries, where only a small fraction of cells express spike, we sorted at a single sace concentration that cannot quantitatively capture a range of different binding affinities.
third, dimeric sace may geometrically complement trimeric s densely packed on a human cell membrane, such that avidity masks the effects of affinity-reducing mutations. nonetheless, there is overall agreement that ace binding often persists following mutations to the rbd surface, and our data simply suggests mutational tolerance may be even greater than that already observed by starr et al. having shown that the ace -binding site of sars-cov- protein s tolerates many mutations, we asked whether mutations might therefore be found that confer resistance to the engineered decoy sace .v . . resistance mutations are anticipated to lose affinity for sace .v . while maintaining binding to the wild type receptor, and are most likely to reside in the rbd where physical contacts are made. similar reasoning formed the foundation of a deep mutagenesis-based selection of the isolated rbd by yeast surface display to find escape mutations to monoclonal antibodies, and the results were predictive of escape mutations in pseudovirus growth selections ( ). to address whether escape mutations from the engineered decoy might be found in the rbd, we repurposed the s protein library for a specificity selection. cells expressing the library, encoding all possible substitutions in the rbd, were co-incubated with wild type sace fused to the fc region of igg and his-tagged sace .v . at concentrations where both proteins bind competitively ( ). it was immediately apparent from flow cytometry of the expi f culture expressing the s library that there were cells expressing s variants shifted towards preferential binding to sace .v . , but no significant population with preferential binding to the wild type receptor (figures a and b ). cells expressing s variants that might preferentially bind sace (wt)-igg or sace .v . were gated and collected by facs (figure c ), followed by deep sequencing of s transcripts to determine enrichment ratios. 
there was close agreement between two independent replicate experiments ( figures d- g ). most rbd mutations were depleted following sorting, consistent with deleterious effects on s folding and expression. soluble ace .v . has three mutations from wild type ace : t y buried within the rbd interface, and l t and n y at the interface periphery ( figure a) . a substantial number of mutations in the rbd of s were selectively enriched for preferential binding to sace .v . ( figure b , upper-left quadrant). while sace .v . -specificity mutations could be found immediately adjacent to the sites of engineered mutations in ace (in particular mutations to s-f adjacent to ace -l and s-t adjacent to ace -n ), major hot spots for sace .v . -specificity mutations were also mapped to rbd loop - , contacting the region where the ace -α helix packs against a β-hairpin motif ( figure a ). by comparison, there were no hot spots in the rbd for sace (wt)-specificity mutations. indeed, only a small number of mutations were selectively enriched for preferential binding to wild type receptor ( figure b ), and the abundance of these putative wild type-specific mutations barely rose above the expected level of noise in the deep mutagenesis data. in this competition assay, s binding to wild type sace is therefore more sensitive to rbd mutations than s binding to engineered sace .v . . to determine whether the potential wild type ace -specific mutations found by deep mutagenesis are real as opposed to false predictions due to data noise, we tested mutants of s selectively enriched in the wild type-specific gate by targeted mutagenesis (blue data points in figure b ). only minor shifts towards binding wild type sace were observed ( figure s ). two s mutants were investigated further in sace titration experiments, n w and n y, which both retained high receptor binding and displayed small shifts towards wild type sace in the competition experiment. 
n of s is located in the - loop and its substitution to large aromatic side chains might alter the loop conformation to cause steric strain with nearby ace mutation n y in sace .v . . after titrating the concentrations of his-tagged sace (wt) and sace .v . and measuring bound protein to s-expressing cells by flow cytometry, it was found s-n w and s-n y do show enhanced specificity for wild type sace , but the effect is small and sace .v . remains the stronger binder ( figure c ); these mutations therefore will not confer resistance in the virus to the engineered decoy. by comparison, multiple independent escape mutations are readily found in s of sars-cov- that diminish the efficacy of monoclonal antibodies by many orders of magnitude ( , ) . finally, representative mutations to s predicted from the deep mutational scan to increase specificity towards sace .v . (purple data points in figure b ) were cloned and were found to have large shifts towards preferential sace .v . binding in the competition assay ( figure s ). these s mutations were y k/q/s, l g/r/y and g k. none of the mutated sites is in direct contact with an engineered residue on sace .v . and the molecular bases for specificity changes are therefore ambiguous, but we speculate may involve local conformational perturbations. validation by targeted mutagenesis therefore confirms that the selection can successfully find mutations in s with altered specificity. the inability to find mutations in the rbd that impart high specificity for the wild type receptor means such mutations are rare or may not even exist, at least within the receptor-binding domain where direct physical contacts with receptors occur. we cannot exclude mutations elsewhere having long-range conformational effects. engineered, soluble decoy receptors therefore live up to their promise as broad therapeutic candidates against which a virus cannot easily escape. 
the allure of soluble decoy receptors is that the virus cannot easily mutate to escape neutralization. mutations that reduce affinity for the soluble decoy will likely also decrease affinity for the wild type receptor on host cells, thereby coming at the cost of diminished infectivity and virulence. however, this hypothesis has not been rigorously tested, and since engineered decoy receptors differ from their wild type counterparts, even if by just a small number of mutations, it is possible that a virus may evolve to discriminate between the two. here, we show that an engineered decoy receptor for sars-cov- broadly binds with low nanomolar k d to the spikes of sars-associated betacoronaviruses that use ace for entry, despite high sequence diversity within the ace -binding site. mutations in s of sars-cov- that confer high specificity for wild type ace were not found in a comprehensive screen of all substitutions within the rbd. the engineered decoy receptor is therefore broadly active against zoonotic ace -utilizing coronaviruses that may spill over from animal reservoirs in the future and against variants of sars-cov- that may arise as the current covid- pandemic continues. we argue it is unlikely that decoy receptors will need to be combined in cocktail formulations, as is required for monoclonal antibodies or designed miniprotein binders to prevent the rapid emergence of resistance ( , ), which should facilitate manufacture and distribution. our findings give insight into how a potential therapeutic can achieve breadth with a low chance of virus resistance for a family of highly infectious and deadly viruses. physiology to exert unacceptable toxicity. for example, the entry receptor for human cytomegalovirus is a growth factor receptor, and growth factor interactions had to be knocked out to make a virus-specific decoy suitable for in vivo administration ( ).
however, ace in this regard is different and its endogenous activity -the catalytic conversion of vasoconstrictive and inflammatory peptides of the renin- angiotensin system -may be of direct benefit for addressing covid- symptoms. during infection, ace activity is dysregulated and the renin-angiotensin system becomes imbalanced, possibly driving aspects of acute-respiratory distress syndrome (ards) ( - ). administration of recombinant sace converts angiotensin (ang) i and ii to the protective peptides ang-( - ) and ang-( - ), respectively, with potential benefits for the pulmonary and cardiovascular systems that include decreased lung elastance, increased blood oxygenation, reduced hypertension and diminished inflammation ( , , ( ) ( ) ( ) ( ) provide no more than a single coding variant per cell ( , ). expi f cells at × / ml were transfected with a mixture of ng coding plasmid (i.e. library dna) with . µg pcep -Δcmv carrier plasmid (described in ( ) fitc fluorescence for bound sace (wt)- h were collected ( figure c ). collection tubes were coated overnight with fetal bovine serum prior to sorting and contained expi expression medium. collected cell pellets were frozen at - °c and were pooled across separate sort experiments prior to extraction of total rna. the competition selection was performed similarly, with the exception that cells expressing the s library were incubated for minutes in a mixture of nm sace .v . - h and nm sace (wt)-igg . after washing twice, bound proteins were stained for minutes with anti-human igg-apc (clone hp , / dilution; biolegend) and anti-his-fitc (chicken polyclonal, / dilution; immunology consultants laboratory). cells were washed twice and sorted. after gating for the main population of viable cells as described above, the % of cells with the highest fitc-relative-to-apc and highest apc-relative-to-fitc signals were collected ( figure c ). 
total rna was extracted from the collected cells using a genejet rna purification kit (thermo scientific). first strand cdna was synthesized with accuscript (agilent) primed with a gene-specific oligonucleotide. the region of s scanned by saturation mutagenesis was pcr amplified as overlapping fragments that together span the full rbd sequence. following a second round of pcr, primers added adapters for annealing to the illumina flow cell and sequencing primers, together with barcodes for experiment identification. the pcr products were sequenced on an illumina novaseq using a × nt paired end protocol. data were analyzed using enrich ( ), where the frequencies of s variants in the transcripts of the sorted populations were compared to their frequencies in the naive plasmid library. log enrichment ratios for all the individual mutations were calculated and normalized by subtracting the log enrichment ratio for the wild type sequence across the same pcr-amplified fragment.
cytometer (bd biosciences) and data were processed with fcs express (de novo software). quantification of myc-s surface expression is detailed in figure s . part supported by nih award r ai to e.p. the university of illinois has filed a provisional patent for engineered decoy receptors and e.p. and k.k.c. are co-founders of orthogonal biologics, inc.
key: cord- - ecsp y authors: griesemer, sara b; van slyke, greta; st. george, kirsten title: assessment of sample pooling for clinical sars-cov- testing date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ecsp y accommodating large increases in sample workloads has presented one of the biggest challenges to clinical laboratories during the covid- pandemic. despite the implementation of new automated detection systems, and previous efficiencies such as barcoding, electronic data transfer and extensive robotics, throughput capacities have struggled to meet the demand. sample pooling has been suggested as an additional strategy to further address this need. the greatest concern with this approach in a clinical setting is the potential for reduced sensitivity, particularly the risk of false negative results when weak positive samples are pooled. to investigate this possibility, detection rates in pooled samples were evaluated, with extensive assessment of pools containing weak positive specimens. additionally, the frequency of occurrence of weak positive samples across ten weeks of the pandemic were reviewed.
weak positive specimens were detected in all five-sample pools but failed to be detected in four of the nine-sample pools tested. weak positive samples comprised an average . % of the positive specimens tested during the pandemic thus far, slightly increasing in frequency during later weeks. other aspects of the testing process should be considered, however, such as accessioning and reporting, which are not streamlined and may be complicated by pooling procedures. therefore, the impact on the entire laboratory process needs to be carefully assessed prior to implementing such a strategy. the covid- pandemic has presented numerous challenges to the health care industry in general and laboratory testing specifically. not least among the latter has been a dramatic increase in sample testing load ( ) . efforts to meet the demand have included increased use of automated instrumentation, multiplexing of molecular detection assays, streamlined testing protocols, as well as increasingly varied acceptable sample types, collection devices and transport media ( ) ( ) ( ) . accommodating the testing workload and reagent shortage during the symptomatic pandemic wave was a significant undertaking ( ) . however, the more recent task, to test returning healthcare workers as well as patients returning for services such as nonessential surgery and other clinical procedures, has generated an even greater challenge. one proposed solution has been sample pooling ( ) ( ) ( ) ( ) , a method previously used for numerous other situations where large scale testing was needed ( ) ( ) ( ) ( ) . when used as an epidemiological surveillance tool, acceptable parameters and limits may be quite different to those when the same system is applied in a clinical testing setting ( , ( ) ( ) ( ) ( ) . in the latter, the issue of detection sensitivity for every individual specimen becomes critical. 
methods for pooling vary widely and testing large numbers of pooled samples is not possible without an inherent loss of sensitivity. however, there is the potential for more limited pooling, without loss of sensitivity. we sought to investigate to what extent this was possible, while maintaining the detection of weak positive samples. the cdc ncov real-time rt-pcr diagnostic panel ( ) ( ) ( ) was used throughout, with extraction on the biomerieux emag ® (biomerieux inc, durham, nc). for individual specimens, µl of viral transport media (vtm, regeneron, rensselaer) or molecular transport media (mtm, primestore, longhorn, san antonio, tx) from upper respiratory swab, was added to ml nuclisens lysis buffer (biomerieux) and extracted into µl of eluate. the biomerieux emag extraction system will accommodate a maximum volume of ml. therefore, a maximum of nine samples ( µl per sample) could be added to a ml lysis buffer tube without exceeding the maximum volume, while maintaining the same input volume per specimen. if the same extraction efficiency is maintained when nine samples are loaded in the tube as when one is loaded, and the eluate is still µl, theoretically the same total nucleic acid from each sample should be extracted. provided there is no increase in the pcr inhibition of the pooled eluate, the detection sensitivity should therefore also still be the same. pool sizes of five and nine samples were tested, with each pool containing a single positive specimen. for specimen pooling, µl each of one positive, and either four (five-sample pool) or eight (nine-sample pool) negative specimens were added, at random, to ml of lysis buffer, in triplicate, extracted and eluted into µl of elution buffer. in an initial experiment, strong and moderate positive samples in vtm were tested in each size pool. in a second experiment, four weak positive specimens in vtm and four weak positive specimens in mtm, were tested in both five and nine sample pools, in triplicate. 
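the volume constraint that limits the pool size can be written as a short calculation. the sketch below is illustrative only: the function name is hypothetical, volumes are assumed to be whole microliters, and the example volumes in the usage note are placeholders rather than the volumes used in this study.

```python
def max_pool_size(max_total_ul, lysis_buffer_ul, sample_ul):
    """Largest number of samples that fit in one extraction tube.

    Each sample contributes the same input volume it would have if tested
    individually, so the pool size is limited only by the capacity that
    remains after the lysis buffer is accounted for.
    All arguments are integer volumes in microliters (an assumption).
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    free_ul = max_total_ul - lysis_buffer_ul
    if free_ul < 0:
        raise ValueError("lysis buffer alone exceeds the tube capacity")
    # Integer division: a partial sample volume does not count.
    return free_ul // sample_ul
```

with hypothetical values of 1000 µl capacity, 400 µl of lysis buffer and 100 µl per sample, `max_pool_size(1000, 400, 100)` returns 6.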
additionally, the percentage of positive samples at different viral loads, as assessed by ct value, was reviewed across nine weeks of the pandemic, to determine whether these weak positive specimens comprise a significant component of total tested specimens and whether the proportion has changed over the course of the pandemic. pooling of samples with lower ct values did not cause a loss of detection of any of the individual positive samples in either the five- or nine-sample pools (table ), although in one of the nine-sample pools the detection became inconclusive rather than positive (specimen d). the ct values of positive samples increased by . to . when pooled with four negative samples. in contrast, pooling with eight negative samples caused ct increases ranging from . to . . when weak positive samples were pooled with four or eight negative samples (table ), the positive samples were still all detected in five-sample pools, whether they were in vtm or mtm. when combined in nine-sample pools, detection was more adversely affected for samples in vtm than for those in mtm. for samples in mtm, one of three replicates for one sample failed to be detected in a pool of nine samples. in contrast, for weak positive specimens in vtm, nine-sample pools caused multiple replicates to return negative results. we then sought to assess what proportion of the total specimens tested is made up of these weak positive specimens, to evaluate how much of an impact pooling might have overall on testing sensitivity across positive patient detection in the pandemic. we further sought to assess the positivity rate during the months since the onset of the pandemic in new york, since pooling strategies are not efficient unless sample positivity is low.
despite the large range in number of specimens tested per day from late february through mid-may (figure ), the percentages of specimens in each viral load category, from very strong (ct < ) through strong (ct - ), moderate (ct - ), weak (ct - ) and very weak (ct - ), remained remarkably constant, with the exception of those in the very weak range, which increased slightly during the last five weeks. overall, this weak positive sample type constituted an average . % of positive specimens, a substantial proportion of the positive specimens received for testing. positivity rates among samples received at this facility rose to almost % during march and have dropped continually since early april, remaining below % since th may and below . % since may . as the pandemic evolves, despite case counts, hospitalization rates and fatalities decreasing in some areas, laboratories continue to face new challenges. workloads have increased with, for example, requirements to perform repeated surveillance screening of asymptomatic health care workers and testing of patients undergoing elective procedures where there is a risk of aerosol production. these policies have pushed test numbers beyond those encountered even at the height of the pandemic wave. suggestions to help manage the load have included pooling of samples to enhance throughput capacity. when pooling is considered for application in a clinical testing environment, the greatest concern is the potential for this strategy to increase the false negative rate, and that risk is highest for weak positive samples. to investigate the potential limits of pooling in this situation, this study focused on pooling weak positive samples in relatively small sample pools. had these been successful, larger pools would have been attempted. there are multiple methods for pooling samples, some of which carry an inherent risk of reducing test sensitivity and some of which do not.
the method described in this paper, where the same volume of each sample is added to lysis buffer as would normally be added if the sample were being tested individually, should not in theory reduce sensitivity, as long as extraction efficiency is maintained and the level of pcr inhibition is not increased by the additional load through the extraction device. when the data were analyzed, despite some minor shifts in ct values, no loss of detection was observed in any of the five-sample pool experiments, for samples that had been collected in either vtm or mtm. however, when the same weak positive samples were pooled with eight negative samples rather than four, to create pools of nine, detection failures were observed. in three of the nine-sample vtm pools and one of the nine-sample mtm pools, this larger pool size resulted in a complete loss of detection. whether the less frequent occurrence in mtm samples compared to vtm samples is significant is difficult to say based on this limited data. the use of pooling as a throughput enhancement strategy is only efficient if the positivity rate in the samples being tested is low enough that a minimal number of pools will test positive; otherwise, multiple pools will have to be deconvoluted for retesting of individual samples. the maximum positivity level at which pooling remains efficient depends on the pool size being used. for pool sizes as small as samples, this maximum positivity level is considerably higher than that for the very large pool sizes that are sometimes suggested for large scale epidemiological screening studies. for example, at a positivity rate of % and a pool size of , on average, only in every pools will be positive and need to be deconvoluted for retesting. therefore, for every samples, testing could be achieved with a total of tests ( pools and one deconvoluted pool).
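the dependence of pooling efficiency on positivity rate and pool size can be made concrete with a minimal sketch of two-stage pooling, in which every positive pool is fully deconvoluted. it assumes that positive samples are distributed independently and that the pooled test has no loss of sensitivity; the function names are hypothetical, and any example values passed to them are illustrative rather than the figures from this study.

```python
def expected_tests_per_sample(positivity, pool_size):
    """Expected number of tests per sample under two-stage pooling.

    One test is run per pool; if the pool is positive, every sample in it
    is retested individually. Assumes independent samples and a pooled
    test with unchanged sensitivity (both are simplifying assumptions).
    """
    if not 0.0 <= positivity <= 1.0:
        raise ValueError("positivity must be between 0 and 1")
    if pool_size < 2:
        raise ValueError("pool size must be at least 2")
    # Probability that a pool contains at least one positive sample.
    p_pool_positive = 1.0 - (1.0 - positivity) ** pool_size
    return 1.0 / pool_size + p_pool_positive


def max_efficient_positivity(pool_size):
    """Positivity rate above which pooling needs more tests than
    individual testing, i.e. where expected tests per sample reaches 1."""
    if pool_size < 2:
        raise ValueError("pool size must be at least 2")
    return 1.0 - pool_size ** (-1.0 / pool_size)
```

a value of `expected_tests_per_sample` below 1 indicates a saving over individual testing. under these assumptions the break-even positivity is higher for a pool of five than for a pool of nine, and lower still for very large pools, which matches the qualitative point made in the text.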
we noted that the positivity rate in our own lab is now approaching % and therefore such a strategy may be efficient for extraction and detection. moreover, as the pandemic has progressed, there has been an increasing proportion of samples in the weak positive range, and therefore major consideration must be given to the issue of detection sensitivity for these weak samples. it must be noted however, that extraction and detection are not the only components of the laboratory operation and a pooling strategy does not enhance other aspects and may in fact create complications. processes for specimen receiving and accessioning, as well as those for result data management and reporting, are not reduced by pooling strategies. these procedures may be complicated by sample pooling, especially for electronic data transfer programs and laboratory information systems. a large increase in test load facilitated by a pooling strategy, may create serious workload bottlenecks for these and other areas of the operation. therefore, the global implications of a pooling strategy need to be carefully assessed, especially in the clinical testing environment, before implementation. coi sbg and gvs declare no conflict of interest. ksg receives research support from thermofisher for the evaluation of new assays for the diagnosis and characterization of viruses. she also has a royalty generating collaborative agreement with zeptometrix. 
the authors thank kamran zamani for assistance with data management and the provision of figure . they also thank jennifer laplante, rene hull and steven zink for specimen selection and retrieval.
we thank the wadsworth covid response team for months of dedicated, skilled testing and careful specimen archiving that provided the well-characterized samples accessed for this work. key: cord- -n zb am authors: postlethwait, john h.; farnsworth, dylan r.; miller, adam c. title: an intestinal cell type in zebrafish is the nexus for the sars-cov- receptor and the renin-angiotensin-aldosterone system that contributes to covid- comorbidities date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n zb am people with underlying conditions, including hypertension, obesity, and diabetes, are especially susceptible to negative outcomes after infection with the coronavirus sars-cov- . these covid- comorbidities are exacerbated by the renin-angiotensin-aldosterone system (raas), which normally protects from rapidly dropping blood pressure or dehydration via the peptide angiotensin ii (ang ii) produced by the enzyme ace. the ace paralog ace degrades ang ii, thus counteracting its chronic effects. ace is also the sars-cov- receptor. ace, the coronavirus, and covid- comorbidities all regulate ace , but we don’t yet understand how. to exploit zebrafish (danio rerio) as a disease model to understand mechanisms regulating the raas and its relationship to covid- comorbidities, we must first identify zebrafish orthologs and co-orthologs of human raas genes, and second, understand where and when these genes are expressed in specific cells in zebrafish development. to achieve these goals, we conducted genomic analyses and investigated single cell transcriptomes. results showed that most human raas genes have an ortholog in zebrafish and some have two or more co-orthologs. results further identified a specific intestinal cell type in zebrafish larvae as the site of expression for key raas components, including ace, ace , the coronavirus co-receptor slc a , and the angiotensin-related peptide cleaving enzymes anpep and enpep. 
results also identified specific vascular cell subtypes as expressing ang ii receptors, apelin, and apelin receptor genes. these results identify specific genes and cell types to exploit zebrafish as a disease model for understanding the mechanisms leading to covid- comorbidities. summary statement genomic analyses identify zebrafish orthologs of the renin-angiotensin-aldosterone system that contribute to covid- comorbidities and single-cell transcriptomics show that they act in a specialized intestinal cell type. orthologs and previously unrecognized co-orthologs of important components; and second, that a specific under-characterized cell type expresses many raas components and is hence a focal cell type for the raas that merits further exploration. an apparently similar cell type in humans allows sars-cov- infection, the production of infectious virus, and likely some covid- pathologies (stanifer et al., ) . these studies support zebrafish as a model for investigating the relationship of the raas to covid- pathologies. to identify raas components in the zebrafish genome and to detect potential covid- -related cellular systems, we combined our ongoing efforts in defining the relationships of zebrafish and human genomes (braasch et ang i sequences vary more among ray-finned than lobe-finned vertebrates (fig. ) . because coelacanth and several basally diverging ray-finned vertebrates (sturgeon, elephant fish) have the sequence nrvyvhpfnl, this is likely the ancestral ang i sequence for all bony vertebrates. several ray-finned vertebrates have isoleucine at position , representing independent mutations that happen to match placental mammals. position is highly variable in ray fins as in lobe fins. zebrafish ang i is nrvyvhpfnl, differing from the human form at positions , , and . 
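the position-by-position comparison of ang i peptides can be sketched with a small helper. the zebrafish sequence is the one given in the text; the human ang i sequence (drvyihpfhl) is the standard sequence supplied here as an outside assumption, and the function name is hypothetical.

```python
def differing_positions(seq_a, seq_b):
    """Return 1-based positions at which two equal-length peptides differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("peptides must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
            if a != b]


# Human Ang I: standard sequence, an assumption not stated in the text.
HUMAN_ANG_I = "DRVYIHPFHL"
# Zebrafish Ang I as given in the text.
ZEBRAFISH_ANG_I = "NRVYVHPFNL"

if __name__ == "__main__":
    print(differing_positions(HUMAN_ANG_I, ZEBRAFISH_ANG_I))
```

the same helper can be applied to any pair of aligned ang i or ang ii peptides of equal length, for example to compare the proposed ancestral sequence with a given species.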
the angiotensin system was likely already active in stem vertebrates because chondrichthys (cartilaginous fish) and even agnathans (jawless vertebrates) possess angiotensinogen genes (takei et al., ) . at position , some cartilaginous fish have arginine like most ray-finned vertebrates, but others have asparagine or tryptophan, and position is variable. the ancestral ang i sequence in jawed vertebrates was likely the same as in ancestral bony fishes (nrvyvhpfnl). angiotensinogen genes in jawless vertebrates appear to encode an angiotensin that shares the amino- terminal four residues with mammals but varies in the carboxy-terminal six residues (wong and takei, ). at position , lampreys have either aspartic acid or glutamic acid, but the conserved isoleucine or leucine at position is replaced with methionine followed by glutamine replacing the otherwise invariant histidine. lamprey ang ii alters cardiovascular dynamics in live lampreys but teleost ang ii (nrvyvhpf) did not (wong and takei, ) angiotensin ii (ang ii) forms when ace cleaves two c-terminal amino acid residues from ang i (fig. s . ). ang ii contributes to hypertension, an important covid- comorbidity, and likely promotes the inflammation that leads to poor disease outcomes. human and zebrafish ang ii differ only at the first and fifth residues (drvtihpf vs. nrvtvhpf, fig. ) . importantly, the fish and human peptides act about equally in zebrafish has a single ortholog of human agtr with conserved syntenies (fig. g) conserved syntenies suggest a mechanism for this situation. in human, mas lies in the gene sequence: . importantly, recent analyses showed that slc a is at the peak of a genome wide association study for poor outcomes from covid- (ellinghaus et al., ). these findings raise the novel hypothesis that slc a variants contribute to variation in covid- outcomes due to differences in expression or protein function related to interactions with ace , the sars- cov- receptor. adam (fig. s . 
) is a metalloendopeptidase that cleaves the membrane isoform of ace to make the soluble protein sace (lambert et al., ) . zebrafish has two copies of adam in double conserved synteny with human adam (fig. a) , showing that they are co-orthologs of human adam from the tgd. the zebrafish atlas showed expression of adam a (ensdarg ) stronger in embryonic intestinal epithelium than in larval ace -expressing c cells (fig. b ). in addition, several cells in the vascular endothelium expressed adam a (fig. c) . the tgd duplicate adam b (ensdarg ) was expressed in the atlas in a few widely dispersed individual cells. because expression of adam paralogs was not detected in c , zebrafish may not have soluble ace ; alternatively, a different enzyme in zebrafish might cleave ace to make the soluble form or expression levels may be too low for detection. analysis of the zebrafish aplnra and aplnrb genes in the same tree (ensgt ) showed that most teleost clades have orthologs of each gene and that the bonytongues, which branch deep in the teleosts, root each clade; furthermore, spotted gar and reedfish serve as pre-tgd outgroups, as expected from historical species relationships (fig. b) . the tree showed that the sister group of the ray-finned aplnra+aplnrb clade is a lobe-finned vertebrate clade rooted on coelacanth and amphibia as expected and includes 'reptiles' and birds (fig. b ). this clade, which we tentatively here call the aplnrl clade, contains only monotremes and marsupials among mammals, indicating that this gene was lost in eutherian mammals (zhang et al., ) . conserved syntenies showed that chicken (gallus gallus) chromosome gga , which contains aplnrl (ensgalg ), has orthologs on both dre and dre , the sites of aplnra and aplnrb, respectively, as well as dre (fig. d) .
these conserved syntenies independently verify that: ) aplnrl was present in the last common ancestor of human and zebrafish; ) aplnrl was lost from eutherians after they diverged from marsupials; and ) the tgd produced aplnra and aplnrb paralogs. these analyses support the hypothesis that the last common ancestor of zebrafish and human had at least two aplnr-related genes; one became aplnr in human and 'aplnr ' in teleosts and the other was lost in eutherians but retained in other tetrapods, and subsequently duplicated in the tgd, becoming 'aplnra' and 'aplnrb' in zebrafish. renaming aplnr (ensdarg ) to aplnr, aplnra (ensdarg ) to aplnrla, and aplnrb (ensdarg ) to aplnrlb would better connect zebrafish to human biology. atlas cells expressing aplnr (ensdarg ) occupied midline fates, including prominently the floorplate (c ), hypochord (c ), and scleroderm (c ), as well as an embryonic intestinal epithelium cell type (c ). expression of aplnr was detected in the spleen and heart of zebrafish adults by qpcr (zhang et conserves genes encoding ang ii receptors agtr and agtr . zebrafish also has orthologs encoding slc a , which binds ace , and adam , the enzyme that creates the soluble form of ace . furthermore, zebrafish has an ortholog encoding tmprss , which activates coronavirus spike protein for binding to ace and bringing the virus into human cells. zebrafish also has the ligand and receptors for the apelin signaling system. zebrafish has a single ortholog of most raas genes but has duplicates of some. many zebrafish duplicates of human raas genes derive from the teleost genome duplication event, including ) agtr a and agtr b, ) anpepa and anpepb, ) slc a a and slc a b, and ) aplnrla and aplnrlb. other duplicated raas genes appeared by tandem duplication, including ) anpep and anpepl, ) anpepla. and anpepla. ; and ) slc a a. and slc a a. .
the significance of these discoveries is that the actions of both zebrafish co- orthologs must be considered when translating zebrafish science to human biology. the raas/apelin system also provides an example of 'ohnologs gone missing', in which one ohnolog zebrafish express covid- -related raas genes in tissues similar to human. results showed that, like humans and other mammals, zebrafish liver cells express angiotensinogen. in the atlas, dpf zebrafish larvae had three types of hepatocytes, two of which expressed agt, adding cellular precision to expression studies in adult zebrafish (cheng et al., ) . induction of agt expression in mammals relies on cortisol and inflammation (brasier and li, ace, which cleaves ang i to ang ii, was surprisingly shown by our scrna-seq analysis to be co-expressed significantly, slc a is at the peak of the strongest of two genome wide association study loci for undesirable covid- outcomes (ellinghaus et al., ) . we suggest the hypothesis that genetic variants near slc a affect the severity of covid- symptoms due to variations in interactions with ace . angiotensin peptides are related to covid- comorbidities because they bind to agtr and agtr receptors on vascular cells to help regulate vasoconstriction, and on the adrenal to stimulate secretion of aldosterone, leading to salt and water retention related to the comorbidity of obesity-related kidney damage (fyhrquist and saijonmaa, ). as in humans, zebrafish scrna-seq analysis showed that agtr b is expressed in endothelial cells and confirmed that agtr is also expressed in endothelial cells (wong et al., ). the conserved expression of zebrafish raas-related genes supports the contention that raas regulation the zebrafish raas is pharmacologically similar to that of mammals. the ace inhibitor lisinopril blocks the effects of ang i on sodium uptake in zebrafish, as predicted if ace were required to convert ang i to ang ii (kumai et al., ) . 
zebrafish cultured from hpf to dpf in water containing the ace inhibitor captopril do not differ in survival from controls and neither do fish in water with % of the normal salt concentration (rider (e), anpepa (f), and dpp (g). all were expressed in c , the same intestinal epithelium cell type that expressed ace, except anpeplb, which was expressed in blood vessels at dpf and a few neural crest cells, but not in intestinal epithelium. dipeptidyl peptidase- inhibitors can inhibit angiotensin converting enzyme zebrafish hox clusters and vertebrate genome evolution further evidence for bats as the evolutionary source of middle east respiratory syndrome coronavirus angiotensin ii stimulation of dpp activity regulates megalin in the proximal tubules mas and its related g protein-coupled receptors presumed asymptomatic carrier transmission of covid- zebrafish models of cardiovascular disease the zebrafish model of tuberculosis -no lungs needed the spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons a new model army: emerging fish models to study the genomics of vertebrate evo-devo mechanisms for inducible control of angiotensinogen gene transcription evolution of hormone-receptor complexity by molecular exploitation analyses of the cyp b gene family in the guinea pig suggest the existence of a primordial cyp b gene with aldosterone synthase activity the renin angiotensin aldosterone system in obesity and hypertension: roles in the cardiorenal metabolic syndrome automated identification of conserved synteny after whole-genome duplication hnf factors form a network to regulate liver-enriched genes in zebrafish kidney disease is associated with in-hospital death of patients with covid- ace orthologues in non-mammalian vertebrates (danio, gallus, fugu, tetraodon and xenopus) dichotomous role of the macrophage in early mycobacterium marinum infection of the zebrafish full-genome deep sequencing and phylogenetic analysis of novel 
human betacoronavirus small molecule screen for compounds that affect vascular development in the zebrafish retina slc neurotransmitter transporters: structure, function, and regulation a crucial role of angiotensin converting enzyme (ace ) in sars coronavirus-induced lung injury angiotensin-ii promotes na+ uptake in larval zebrafish, danio rerio, in acidic and ion-poor water in vivo modulation of endothelial polarization by apelin receptor signalling neuroendocrine control of ionic balance in zebrafish developing zebrafish disease models for in vivo small molecule screens the sh -domain-containing inositol -phosphatase (ship) limits the motility of neutrophils and their recruitment to wounds in zebrafish tumor necrosis factor-alpha convertase (adam ) mediates regulated ectodomain shedding of the severe-acute respiratory syndrome-coronavirus (sars-cov) receptor, angiotensin-converting enzyme- (ace ) possible bat origin of severe acute respiratory syndrome coronavirus notch signaling is required for arterial-venous differentiation during embryonic vascular development genomic characterization and expression analysis of the first nonmammalian renin genes from zebrafish and pufferfish structure and functions of angiotensinogen genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding translational power of the zebrafish: an interspecies analysis of responses to cardiovascular drugs. 
front pharmacol epigenetic control of intestinal barrier function and inflammation in zebrafish zebrafish angiotensin ii receptor-like a (agtrl a) is expressed in migrating hypoblast, vasculature, and in multiple embryonic epithelia inferring the mammal tree: species-level sets of phylogenies for questions in ecology, evolution, and conservation renin-angiotensin-aldosterone system inhibitors in patients with covid- angiotensin ii stimulation of the basolateral located na+/h+ antiporter in eel (anguilla anguilla) enterocytes human intestine luminal ace and amino acid transporter expression increased by ace-inhibitors antigenicity of the sars-cov- spike glycoprotein does comorbidity increase the risk of patients with covid- : evidence from meta-analysis molecular distinction and angiogenic interaction between embryonic arteries and veins revealed by ephrin-b and its receptor eph-b immunolocalization of subtype angiotensin ii (at ) receptor protein in rat heart members of the undiagnosed diseases facilitate rare disease diagnosis and therapeutic research the zebrafish book. 
a guide for the laboratory use of summary of probably sars cases with onset of illness from host gut motility promotes competitive exclusion within a model intestinal microbiota identification of vasculature-specific genes by microarray analysis of etsrp/etv overexpressing zebrafish embryos characterization of a native angiotensin from an anciently diverged serine protease inhibitor in lamprey angiotensin at receptor activates the cyclic-amp signaling pathway in eel high expression of ace receptor of -ncov on the epithelial cells of oral mucosa structural basis for the recognition of sars-cov- by full-length human ace apelin- ( - ) is a biologically active ace metabolite of the endogenous cardiovascular peptide the pathogenesis and treatment of the `cytokine storm' in covid- deletion of the angiotensin type receptor (at r) reduces adipose cell size and protects from diet-induced obesity and insulin resistance apelin and its receptor control heart field formation during zebrafish gastrulation the cytokine release syndrome (crs) of severe covid- and interleukin- receptor (il- r) antagonist tocilizumab may be the key to reduce the mortality characterization of the apelin/elabela receptors (aplnr) in chickens, turtles, and zebrafish: identification of a novel apelin-specific receptor in teleosts. front endocrinol (lausanne) uxt potentiates angiogenesis by attenuating notch signaling an affective disorder in zebrafish with mutation of the glucocorticoid receptor key: cord- -dxuabscn authors: zhao, xuesen; zheng, shuangli; chen, danying; zheng, mei; li, xinglin; li, guoli; lin, hanxin; chang, jinhong; zeng, hui; guo, ju-tao title: ly e restricts the entry of human coronaviruses, including the currently pandemic sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dxuabscn c a is a sub-clone of human hepatoblastoma hepg cell line with the strong contact inhibition of growth. 
we fortuitously found that c a was more susceptible to human coronavirus hcov-oc infection than hepg , which was attributed to the increased efficiency of virus entry into c a cells. in an effort to search for the host cellular protein(s) mediating the differential susceptibility of the two cell lines to hcov-oc infection, we found that adap , gilt and ly e, three cellular proteins with known activity of interfering virus entry, expressed at significantly higher levels in hepg cells. functional analyses revealed that ectopic expression of ly e, but not gilt or adap , in hek cells inhibited the entry of hcov-oc . while overexpression of ly e in c a and a cells efficiently inhibited the infection of hcov-oc , knockdown of ly e expression in hepg significantly increased its susceptibility to hcov-oc infection. moreover, we found that ly e also efficiently restricted the entry mediated by the envelope spike proteins of other human coronaviruses, including the currently pandemic sars-cov- . interestingly, overexpression of serine protease tmprss or amphotericin treatment significantly neutralized the ifitm restriction of human coronavirus entry, but did not compromise the effect of ly e on the entry of human coronaviruses. the work reported herein thus demonstrates that ly e is a critical antiviral immune effector that controls cov infection and pathogenesis via a distinct mechanism. importance virus entry into host cells is one of the key determinants of host range and cell tropism and is subjected to the control by host innate and adaptive immune responses. in the last decade, several interferon inducible cellular proteins, including ifitms, gilt, adap , ch and ly e, had been identified to modulate the infectious entry of a variety of viruses. particularly, ly e was recently identified as host factors to facilitate the entry of several human pathogenic viruses, including human immunodeficiency virus, influenza a virus and yellow fever virus. 
identification of ly e as a potent restriction factor of coronaviruses expands the biological function of ly e and sheds new light on the immunopathogenesis of human coronavirus infection. ( , ) , with the mortality rate of %, % and to %, respectively ( , ). no vaccine or antiviral drug is currently available to prevent cov infection or treat the infected individuals.
the cross-species transmission of zoonotic covs presents a continuous threat to global human health ( , ) . therefore, understanding the mechanism of cov infection and pathogenesis is important for the development of vaccines and antiviral agents to control the current covid- pandemics and prevent future zoonotic cov threats. cov entry into host cells, a process to deliver viral nucleocapsids cross the plasma membrane barrier into the cytoplasm, is the key determinant of virus host range and plays a critical role in zoonotic cov cross-species transmission ( , ). the entry process begins by the binding of viruses to their specific receptor on the plasma membrane, which triggers endocytosis to internalize the viruses into the endocytic vesicles. the cleavage of viral envelope spike proteins by endocytic proteases and/or endosomal acidification triggers the conformation change of spike protein to induce the fusion of viral envelope with endocytic membrane and release nucleocapsids into the cytoplasm to initiate viral protein synthesis and rna replication. while angiotensin-converting enzyme (ace ) is the bona fide receptor for sars-cov, sars-cov- and cd (also known as aminopeptidase n) as their receptor, respectively ( , ). however, hcov-oc and hcov-hku bind to -oacetylated sialic acids via a conserved receptor- binding site in spike protein domain a to initiate the infection of target cells ( ) . as the key determinant of cell tropism, host range, and pathogenesis, cov entry is primarily controlled by interactions between the spike envelope glycoprotein and host cell receptor as well as the susceptibility of spike glycoprotein to protease cleavage and/or acid-induced activation of membrane fusion ( , ) . for instance, sars-cov can use ace orthologs of different animal species as receptors ( - ) and the efficiency of these ace orthologs to mediate sars-cov cell entry is consistent with the susceptibility of these animals to . 
in addition, expression of endosomal cathepsins, cell surface transmembrane proteases (tmprss), furin, and trypsin differentially modulates the entry of different human covs ( - demonstrated that ifitm , ifitm -ex and ifitm -ex modulated hcov-oc envelope protein-mediated entry to a similar extent in the two cell lines (fig. e) . accordingly, we concluded that ifitm proteins were not responsible for the observed differential susceptibility of the two hepatoma cell lines to hcov-oc infection. in this study, we further demonstrated that amphob treatment also efficiently attenuated the restriction of ifitm on the infection of sars-covpp, mers-covpp, hcov-nl pp, hcov- epp and iavpp, but not lasvpp (fig. a) . however, amphob treatment altered neither the restriction activity of ly e on the infection of human cov spike protein-pseudotyped lentiviruses nor the enhancement of ly e on iavpp infection (fig. b) . these results strongly imply that ly e modulates virus entry via a distinct mechanism. discussion ly e was initially identified as a cell surface marker to discriminate immature from mature thymocyte subsets ( ). the primary function of ly e has been associated with immune regulation, specifically in modulating t cell activation, proliferation, and development ( ). in addition to lymphocytes, ly e mrna can also be detected in liver, spleen, uterus, ovary, lung, and brain and its expression can be induced by type i ifn in a cell-type specific manner second, in addition to the gpi anchor, the evolutionarily conserved amino acid residue l is also required for both the enhancement and restriction of virus entry into target cells by ly e (fig. ) ( ) . it can be speculated that this specific residue may mediate an interaction with other cellular membrane proteins to modulate viral entry.
the fact that ly e enhances viral infectivity in a cell type-specific manner, with the strongest phenotype in cells of fibroblast and monocytic lineages ( ), does indicate the involvement of other host cellular factors. variations in the abundance of expression, as well as the localization of ly e and its associated proteins or lipids, may explain the differential effects of ly e on the infection of different viruses in different cell types ( fig. and ) . however, ly e enhancement of rna virus infection appears to be independent of type i interferon response and other isg expression ( ) cov- pp and hcov-nl pp, that share the ace receptor, does not support such a hypothesis (fig. ) . finally, the findings that ly e inhibition of human cov entry can be neither evaded by ectopic expression of the membrane-associated serine protease tmprss nor compromised by amphob treatment strongly indicate that ly e modulates virus entry via a mechanism distinct from that of ifitm proteins (figs. and ) . specifically, inhibition of tmprss -enhanced cov entry implies that ly e most likely blocks virus entry at the plasma membrane or in early endosomes. moreover, ifitms impede viral fusion by decreasing membrane fluidity and curvature ( ). amphob can bind cholesterol in cell membranes to increase membrane fluidity and planarity and consequently rescue ifitm inhibition of virus entry ( ). interestingly, amphob only neutralizes the antiviral effects of ifitm and ifitm , but has little effect on ifitm restriction of virus entry ( ). while ifitm is predominantly located in the plasma membrane or early endosomes, ifitm and are mainly localized in the late endosomes and lysosomes. due to their differential subcellular localization, ifitm mainly restricts the viruses that enter the cells at the cell surface or in the early endosomes, such as parainfluenza viruses and hepatitis c virus ( , ), whereas ifitm and primarily restrict the infection of viruses that enter the cells at late endosomes and/or lysosomes ( , , ).
because amphob is endocytosed quite rapidly leading to its concentration in the late endosomes and lysosomes, it more efficiently alleviates the effect of ifitm and , but not ifitm , on virus entry ( ). similarly, the failure of amphob to attenuate the antiviral effects of ly e against human covs is most likely due to its predominant cell surface localization and inhibition of an early step of cov entry. in summary, while it is very interesting to know that ly e is capable of modulating the entry of many rna viruses, we only begin to uncover the mechanism of this fascinating host factor and define its pathobiological role in virus infection ( , ). further understanding the role and mechanism of ly e in viral infections will establish a scientific basis for development of therapeutics to harness its function for the treatment of viral diseases. real-time rt-pcr. hcov-oc rna was quantified by a qrt-pcr assay described previously ( ). to determine the level of isg mrna, total cellular rna was extracted using trizol reagent (invitrogen) and the same amount of total cellular rna was reverse-transcribed with superscript iii kit ((invitrogen). quantitative rt-pcr was performed using itaq universal sybr green supermix (bio-rad) with the following primers: ly e, ´- gtactgcctgaagccgaccatc- ´ and ´-agattcccaatgccggcactag- ´; adap , ´-agctgtcatcagcattaag- ´ and ´-actatctccttcccactttc- ´; gilt, ´-aatgtgaccctctactatgaag- ´ and ´- acgctggtgccctacggaaacg- ´; gapdh, ´-gaaggtgaaggtcggagtcaac- ´ and ´-cagagttaaaagcagccctggt- ´. gene expression was calculated using the -△ △ ct method, normalized to gapdh as described previously ( , ) . shrna targeting ly e mrna. the level of intracellular ly e expression was determined by western blot using a rabbit polyclonal antibody against ly e. β-actin served as a loading control. (b) hepg cells stably expressing the scramble shrna or ly e specific shrna were infected with hcov-oc at an moi of . . 
cells were harvested at hpi and intracellular viral rna was quantified by qrt-pcr assay and presented as copies per ng total rna. error bars indicate standard deviations (n = ). (c to f) c a or a cells were stably transduced with an empty retroviral vector (pqcxip) or retroviral vector expressing ly e and infected with hcov-oc at the indicated moi. the expression of ly e in the cell lines was confirmed by a western blot assay. β-actin served as a loading control (c and e). the cells were fixed at hpi. the infected cells were visualized by if staining of hcov-oc n protein (red). cell nuclei were visualized by dapi staining (d and f). coronavirus host range expansion and middle east respiratory syndrome coronavirus emergence: biochemical mechanisms and evolutionary perspectives a novel coronavirus associated with severe acute respiratory syndrome characterization of a novel coronavirus associated with severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans a new coronavirus associated with human respiratory disease in china a novel coronavirus from patients with pneumonia in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding receptor recognition by novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars mechanisms of zoonotic severe acute respiratory syndrome coronavirus host range expansion in human airway epithelium angiotensin- converting enzyme is a functional receptor for the sars coronavirus human coronavirus nl employs the severe acute respiratory syndrome coronavirus receptor for cellular entry sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc 
involvement of aminopeptidase n (cd ) in infection of human neural cells by human coronavirus e human coronaviruses oc and hku bind to -o-acetylated sialic acids via a conserved receptor-binding site in spike protein domain a host cell proteases: critical determinants of coronavirus tropism and pathogenesis rhesus angiotensin converting enzyme supports entry of severe acute respiratory syndrome coronavirus in chinese macaques expression of feline angiotensin converting enzyme and its interaction with sars-cov s protein mustela vison ace functions as a receptor for sars-coronavirus gilt restricts the cellular entry mediated by the envelope glycoproteins of sars-cov, ebola virus and lassa fever virus interferon-inducible cholesterol- -hydroxylase broadly inhibits viral entry by production of - hydroxycholesterol adap is an interferon stimulated gene that restricts rna virus entry emerging role of ly e in virus-host interactions. viruses interferon induction of ifitm proteins promotes infection by human coronavirus oc selecting cells for bioartificial liver devices and the importance of a d culture environment: a functional comparison between the heparg and c a cell lines cd -dependent modulation of hiv- entry by ly e. j virol interferon-inducible ly e protein promotes hiv- infection inhibition of human immunodeficiency virus type assembly and release by the cholesterol-binding compound amphotericin b methyl ester: evidence for vpu dependence virus entry, assembly, budding, and membrane rafts amphotericin b increases influenza a virus infection by preventing ifitm -mediated restriction thymic shared antigen- . 
a novel thymocyte marker discriminating immature from mature thymocyte subsets modulation of tcr-mediated signaling pathway by thymic shared antigen- (tsa- )/stem cell antigen- (sca- ) a cell fusion-based screening method identifies glycosylphosphatidylinositol-anchored protein ly e as the receptor for mouse endogenous retroviral envelope syncytin-a deletion of the syncytin a receptor ly e impairs syncytiotrophoblast fusion and placental morphogenesis causing embryonic lethality in mice ifitm proteins inhibit placental syncytiotrophoblast formation and promote fetal demise interferon-induced transmembrane proteins inhibit cell fusion mediated by trophoblast syncytins stimulated gene proteins that inhibit human parainfluenza virus type ifitm is a tight junction protein that inhibits hepatitis c virus entry distinct patterns of ifitm-mediated restriction of filoviruses, sars coronavirus, and influenza a virus ifitm inhibits influenza a virus infection by preventing cytosolic entry relating gpi-anchored ly proteins upar and cd to viral infection. 
viruses interferon- induced cell membrane proteins, ifitm and tetherin, inhibit vesicular stomatitis virus infection via distinct mechanisms identification of five interferon-induced cellular proteins that inhibit west nile virus and dengue virus infections small molecule inhibitors of er alpha-glucosidases are active against multiple hemorrhagic fever viruses characterization of the spike protein of human coronavirus nl in receptor binding and pseudotype virus entry inhibition of endoplasmic reticulum-resident glucosidases impairs severe acute respiratory syndrome coronavirus and human coronavirus nl spike protein-mediated entry by altering the glycan processing of angiotensin i-converting enzyme vpr is required for efficient replication of human immunodeficiency virus type- in mononuclear phagocytes an interferon-beta promoter reporter assay for high throughput identification of compounds against multiple rna viruses c a cells support more efficient entry of lentiviral particles pseudotyped with hcov-oc envelope proteins than hepg cells. hepg and c a cells were infected with luciferase activities were determined at hpi. relative infection represents the luciferase activity from c a normalized to that of hepg cells. error bars indicate standard deviations (n = ). ** indicates p < cell nuclei were visualized by dapi staining. (b) hcov-oc np and ifitm were determined by western blot assays. β-actin served as a loading control error bars indicate standard deviations (n = ). (e) hepg and c a stably transduced with a control retroviral vector (pqcxip) or a retroviral vector expressing ifitm , ifitm -ex or ifitm -ex were infected with hcov-oc pp hepg cells transduced with empty vector (pqcxip) the amino acid sequence alignment of ly e from multiple vertebrate species is conducted and "three finger-fold" structure is highlighted with black box. 
the conserved l as well as gpi anchor and n glycosylation sites are indicated (b) flp-in t-rex -derived cell lines expressing a control protein cat, wild-type or mutant ly e were cultured in the absence or presence of tet for h. intracellular ly e expression were detected by a western blot assay. β-actin served as a loading control. (c) flp-in t-rex -derived cell lines expressing the wild-type or mutant ly e were cultured in the absence or presence of tet for h. the cells were then infected with the indicated pseudotyped lentivirus flp-in t-rex -derived cell line expressing ifitm (a) or ly e (b) were cultured in the absence or presence of tet for h. the cells were then infected with the indicated pseudotyped lentivirus in the presence or absence of μm amphob. luciferase activity was measured at hr post- infection , compared to mock treatment key: cord- -nbd arhx authors: fox, charles w. title: the representation of women as authors of submissions to ecology journals during the covid- pandemic date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nbd arhx observations made from papers submitted to preprint servers, and the speculation of editors on social media platforms, suggest that women are submitting fewer papers to scholarly journals than are men during the covid- pandemic. here i examine whether submissions by men and women to six ecology journals (all published by the british ecological society) have changed since the start of covid disruptions. at these six ecology journals there is no evidence of a decline in the proportion of submissions that are authored by women (as either first or submitting author) since the start of the covid- disruptions; the proportion of papers authored by women in the post-covid period of has increased relative to the same period in , and is higher than in the period pre-covid in . there is also no evidence of a change in the geographic pattern of submissions from across the globe. 
many studies have shown that female academics take on a disproportionately high share of family (mason et al. ; ecklund & lincoln ; sallee et al. ) and academic service (guarino & borden ) responsibilities, compared to their male professional colleagues, and that family responsibilities negatively affect the careers of women more than those of men (mason et al. ). one consequence of the covid- pandemic has been the closing of schools, child care facilities, and other public and private institutions that help manage children. given the demonstrated disparities between men and women in their contributions to child care, it's a sensible prediction that women will be more substantially affected by these disruptions caused by covid- (alon et al. ) and, in particular, that women will suffer more substantial drops in their scholarly productivity than will men during the pandemic. indeed, a few preliminary analyses have suggested this is the case. an analysis of submissions of economics papers to preprint servers (shurchkov ) found a substantial ( - %) decline in proportion of papers submitted by women in march and april , compared to before the covid- disruptions. 
a similar analysis of submissions to science preprint archives found that, overall, the rates of submissions of papers authored by women declined in march and april , compared to both the preceding two months in and to the same two months (march and april) of , though whether and to what extent there was a decline depended on the specific preprint server (viglione ; vincent-lamarre et al. ) . some journal editors have also speculated on social media that they are observing fewer submissions authored by women post-covid (discussed in flaherty and kitchener ), though the amount of data on which this speculation is based is limited. these observations made from papers submitted to preprint servers, and the speculation of editors on social media platforms, suggest that women are submitting fewer papers to scholarly journals than are men during the covid- pandemic. details on how data are extracted, genderized and analyzed are likewise presented in these previous publications by charles fox and colleagues. the representation of women as authors -women were first authors on . % of all submissions to these six ecology journals between january and may (averaged across journals), though this percentage varied among journals from a high of . % to a low of . % ( figure a ). women were similarly represented among submitting authors (the person who submitted the manuscript to scholarone manuscripts); . % of submitting authors were women (averaged across journals), though this varied among journals from a high of . % to a low of . % ( figure b ). to test whether the representation of women as authors of ecology papers has changed since the start of the pandemic disruptions and stay-at-home orders, i had to establish a date that we can consider to be the start of the disruptions. any chosen date will necessarily be imprecise since every country, and even regions within countries, instituted stay-at-home orders and closed their schools and universities on different dates. 
i chose march as the start of the post-covid disruption period for these analyses because this is roughly the middle of the - week period during which the majority of united states universities closed to students and researchers. countries, and even regions or universities within countries, vary substantially in the degree to which they exclude researchers from their offices and labs, and this variance is not captured in these analyses. the above analysis is instructive but is possibly biased by the observation that the proportion of papers authored by women has been steadily increasing year-to-year at these six ecology journals (fox et al. ). i thus also compared submissions in the days pre- march with submissions in the days from march to may . as with the previous analysis, i found no evidence that the proportion of first or submitting authors that are women has declined; . % of papers had female first authors pre-covid, compared to . % post-covid ( Χ = . , p = . ), and . % of submitting authors were women pre-covid, compared to . % post-covid ( Χ = . , p = . ). in previous studies of submissions to ecology journals (fox et al. , ), we have observed that the first author is less likely to be the person that submits their manuscript (and serves as corresponding author) if they are female than if they are male. in the current dataset ( january to may ), % of papers were submitted by the first author. female first authors were slightly less likely than male first authors to be the submitting author (logistic regression with journal as a random effect; Χ = . , p = . ), though the effect is small (only %), much smaller than the ~ % difference reported in the much larger dataset of fox et al. ( ). there was no evidence that the likelihood that a female first author served as submitting author differed pre- vs post-covid, either comparing similar timeframes between years ( Χ = . , p = . ) or comparing pre- vs post-covid in ( Χ = . , p = . 
), though the dataset is quite small and thus has very low power for detecting interactions in a logistic regression. the conclusion of these analyses is that, at these six ecology journals, i don't see evidence that the submission of manuscripts by women have been more greatly affected than submissions by men by covid- disruptions. it's important to be clear, though, that this analysis only captures submissions during a short window of time, many of which may have been written before the covid disruptions. we need to extend this analysis later this year before we can convincingly conclude that submissions by women are being impacted similarly to those by men. represented in submissions to the bes journals both before and after the covid disruptions. but that analysis does not consider whether total submissions to these six ecology journals (irrespective of author gender) have declined, remained robust, or even increased (since many researchers are barred from their labs and/or field sites and may have little else to do than final thoughts -the closing of universities and other businesses, and the need to socially isolate to reduce transmission of covid- , have profoundly disrupted nearly every aspect of scholarly research. it's inevitable that these disruptions will reduce the amount of scholarly research done until the pandemic slows and it's a reasonable prediction that women will be more greatly impacted than men if they bear a greater share of the increased caretaker workload caused by the closing of our schools and child care facilities. however, contrary to some other preliminary analyses of submissions to preprint servers (shurchkov ; vincent-lamarre et al. ) and the speculation of editors on social media (flaherty ), we do not yet see evidence that there has been a decline in the number of submissions by women to the ecology journals published by the bes. 
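the pre- versus post-covid comparisons of author proportions reported above are summarised by a Χ statistic and a p value. as a rough illustration of that kind of comparison, the sketch below runs a pearson chi-square test on a 2 x 2 table (period by author gender). this is an assumption about the general form of the test, not the author's actual analysis, which may have used a different model; the function name and the counts are hypothetical and are not data from this study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = pre/post period, columns = female/male."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must contain at least one count")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only (not from the paper).
pre_female, pre_male = 350, 650
post_female, post_male = 370, 630
chi2, p_value = chi_square_2x2(pre_female, pre_male, post_female, post_male)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

a large p value from such a table would be consistent with no detectable change in the proportion of submissions authored by women, which is the pattern the text describes.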
we also do not see evidence of a change in the geographic pattern of submissions from across the globe, despite some areas being particularly heavily impacted by early pandemic disruptions. of course, papers being submitted today are the result of work done months or years prior to submission, and so it may be too early to begin seeing the pandemic's impacts; effects of covid-caused disruptions may not become evident for months or years to come. we thus need to revisit the question later this year. although we do not yet see a differential in the effect of covid- on submissions by men and women, the widespread closing of primary schools and child care facilities will have disproportionately large effects on some members of our community. it's thus critical that academic institutions and scholarly societies develop the infrastructure -both procedures and policies -for addressing the various inequalities that have been and will continue to be created by this pandemic. most importantly, these institutions need to support early career researchersour students and recent graduates whose careers are most vulnerable to extended disruptions to their research and the resulting reduced scholarly productivity. vincent-lamarre and colleagues ( ) provide a list of suggestions for how best to support our vulnerable colleagues, many of which, we are pleased to see, have been adopted by leading universities. the impact of covid- on gender equality. nber working papers failing families, failing science: work-family conflict in academic science no room of one's own. 
inside higher ed gender differences in patterns of authorship do not affect peer review outcomes at an ecology journal gender diversity of editorial boards and gender differences in the peer review process at six journals of ecology and evolution patterns of authorship in ecology and evolution: first, last, and corresponding authorship vary with gender and geography gender differences in peer review outcomes and manuscript impact at six journals of ecology and evolution faculty service loads and gender: are women taking care of the academic family women academics seem to be submitting fewer papers during coronavirus. the lily do babies matter?: gender and family in the ivory tower is covid- turning back the clock on gender equality in academia? medium can anyone have it all? gendered views on parenting and academic careers the decline of women's research production during the coronavirus pandemic are women publishing less during the pandemic? here's what the data say gender prediction methods based on first names with genderizer acknowledgements -thanks very much to the assistant editors of the six bes journals (simon hoggart, jennifer meyer, alice plane, rhiannon robins, kirsty scandrett and india stephenson) for extracting these data from scholarone manuscripts. thanks also to josiah ritchey for running genderizer, and to ruth bryan, emilie aimé and catherine hill for helpful comments on earlier versions of this paper. key: cord- -e fjo tl authors: xiao, xia; wang, conghui; chang, de; wang, ying; dong, xiaojing; jiao, tao; zhao, zhendong; ren, lili; dela cruz, charles s; sharma, lokesh; lei, xiaobo; wang, jianwei title: identification of potent and safe antiviral therapeutic candidates against sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: e fjo tl covid- pandemic has infected millions of people with mortality exceeding , . 
there is an urgent need to find therapeutic agents that can help clear the virus to prevent the severe disease and death. identifying effective and safer drugs can provide more options to treat the covid- infections either alone or in combination. here we performed a high throughput screen of approximately us fda approved compounds to identify novel therapeutic agents that can effectively inhibit replication of coronaviruses including sars-cov- . our two-step screen first used a human coronavirus strain oc to identify compounds with anti-coronaviral activities. the effective compounds were then screened for their effectiveness in inhibiting sars-cov- . these screens have identified anti-sars-cov- drugs including previously reported compounds such as hydroxychloroquine, amlodipine, arbidol hydrochloride, tilorone hcl, dronedarone hydrochloride, and mefloquine hydrochloride. five of the newly identified drugs had a safety index (cytotoxic/effective concentration) of > , indicating a wide therapeutic window compared to hydroxychloroquine which had safety index of in similar experiments. mechanistically, five of the effective compounds were found to block sars-cov- s protein-mediated cell fusion. these fda approved compounds can provide much needed therapeutic options that we urgently need in the midst of the pandemic. of the screened compounds. the si for the hydroxychloroquine was in our study, indicating increased safety of newly identified drugs compared to the hydroxychloroquine. agent of covid- . positive compounds from the initial screen were tested for their antiviral efficacy against sars-cov- in vero cells. sars-cov- replicates within the vero cells and causes cytopathic effects in these cells in the absence of any antiviral treatment. we generated the dose response inhibition curves along with the cytotoxicity curves for these compounds in the presence of sars-cov- (fig. ). 
our data show that of these compounds show significant efficacy in inhibiting sars-cov- replication with sub micromolar ic for many of these drugs such as nilotinib, clofazimine and raloxifene. the effects were also confirmed by immunofluorescence assay (data not shown). these compounds also belong to a wide variety of classes including cardiac glycosides, anti-malarial drug hydroxychloroquine, cyclooxygenase- inhibitors and ion channel blockers, among others. the ic , cc and si of these compounds are shown in table . five candidate drugs inhibit cell fusion. finally, in order to clarify the mechanism by which the current pandemic of covid- is the third major outbreak in this century and the largest needed. in this study, we have identified many fda approved therapies that are highly effective against coronaviruses, including of the agents that are effective against sars-cov- . this screening confirms previous reports demonstrating anti-sars-cov- activity of hydroxychloroquine, amlodipine besylate, arbidol hydrochloride, tilorone hcl, dronedarone hydrochloride, tetrandrine, mefloquine, and thioridazine hydrochloride - , , while identifying additional drugs. the underlying mechanisms of viral replication inhibition by these drugs are not clear. it is highly unlikely that these compounds will have similar antiviral mechanisms given the vast structural and pharmacological diversities of the effective antiviral compounds in our drugs can affect various steps in the viral life cycle including attachment, entry, replication, assembly and budding of viral progeny. five drugs may inhibit s-mediated cell fusion as indicated by our data (fig. ). further studies are required to understand the precise mechanisms of each of the effective compounds found in this study. toxicity is one of the limiting factors in the therapeutic application of many drugs despite their known antiviral activities. many of these drugs had si of > , showing promise of their usefulness at safe doses. 
for comparison, the si of hydroxychloroquine was found to be in our study while si of amlodipine besylate was found to be ~ , demonstrating a much lower safety profile of this drug. similarly, other drugs that are known to have a low selective index for their approved use, such as digoxin, also show lower si in our screen. five of the drugs with si of > include tyrosine kinase inhibitor nilotinib, antibiotics such as clofazimine and actidione, selective estrogen receptor modulator such as raloxifene and non-steroidal anti-inflammatory drug celecoxib. betacoronaviruses have raised great public health threats to human beings, as most known hcovs including all the three virulent hcovs (sars-cov, mers-cov and sars-cov- ) and two seasonal hcovs (oc and hku ) belong to this species - , , . it is of great value to identify antivirals against a broad spectrum of hcovs, particularly the betacoronaviruses, to tackle such threats by pharmaceutical interventions. to this end, we first screened the compounds which showed apparent activity of anti-oc , the most prevalent hcov circulating worldwide . we then narrowed down the candidates by the screening on sars-cov- , resulting in the provides a foundation for subsequent anti-hcovs drug screening of broad spectrum. however, further tests are warranted to verify their efficacies. in summary, our screen identified previously unknown fda approved compounds that are effective in inhibiting sars-cov- besides confirming the antiviral properties of ten previously reported compounds, validating our approach. this screen identified five new compounds that are highly efficacious in inhibiting the viral replication of sars-cov- with si > . further studies are needed to confirm the in vivo efficacy of these drugs in humans and covid- relevant mouse models such as those with human ace transgene . 
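the safety index used throughout this discussion is defined in the abstract as the ratio of cytotoxic to effective concentration. a minimal sketch of that calculation follows, assuming the two quantities are the half-maximal cytotoxic and inhibitory concentrations expressed in the same unit; the function names and the example concentrations are hypothetical and are not values from the table.

```python
def safety_index(cc50_um: float, ic50_um: float) -> float:
    """Return SI = cytotoxic concentration / effective concentration.
    Both inputs must use the same unit (micromolar assumed here)."""
    if cc50_um <= 0 or ic50_um <= 0:
        raise ValueError("concentrations must be positive")
    return cc50_um / ic50_um


def rank_by_safety(compounds: dict) -> list:
    """Return (name, SI) pairs sorted from widest to narrowest window."""
    scored = [
        (name, safety_index(cc50, ic50))
        for name, (cc50, ic50) in compounds.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical (cytotoxic, effective) concentrations for illustration only.
example = {"compound_a": (100.0, 0.5), "compound_b": (30.0, 3.0)}
ranked = rank_by_safety(example)
for name, si in ranked:
    print(f"{name}: SI = {si:.1f}")
```

a higher ratio means the concentration needed to inhibit the virus sits further below the concentration that harms the cells, which is why the text treats a larger si as a wider therapeutic window.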
the positively identified drugs from this screen were used to perform dose response curves against oc on llc-mk and against sars-cov- using vero cells as described below. immunofluorescence: cells were fixed with % paraformaldehyde for min at room temperature, and permeabilized with . % triton x- for min. cells were then blocked with % bsa and stained with primary antibodies, followed by staining with an alexa fluor secondary antibodies. nuclei were counterstained with dapi. gene. the reference standard was tenfold diluted from × copies to × copies. pcr amplification procedure was ℃, min, ℃, min; ℃, s, ℃, s+plate read, cycles. the amplification process, fluorescence signal detection, data storage and analysis were all completed by fluorescence quantitative pcr and its own software (bio-rad cfx manager). the copies of virus were calculated according to the standard curve. the inhibition ratio was obtained by dividing the number of copies of the virus in the vehicle control group. the data were nonlinear fitting by graphpad . software to calculate ic of each drug. cell-cell fusion assay: cell-cell fusion assays were performed as described previously ( ). briefly, hek- t cells were co-transfected with sars-cov- -s glycoprotein and egfp. at h post transfection, cells were digested with trypsin ( . %) and overlaid on a % confluent monolayer of t-ace cells at a ratio of : which were treated with candidate drugs for h. after h incubation, images of syncytia were captured with operetta (perkinelmer, massachusetts, usa). could chloroquine /hydroxychloroquine be harmful in coronavirus disease (covid- ) treatment? 
clinical infectious key: cord- - q gjwt authors: arora, kajal; rastogi, ruchir; arora, nupur mehrotra; parashar, deepak; paliwal, jeny; naqvi, aelia; srivastava, apoorva; singh, sudhir kumar; kalyanaraman, sriganesh; potdar, swaroop; kumar, devanand; arya, vidya bhushan; bansal, sarthi; rautray, satabdi; singh, indrajeet; fengade, pankaj surendra; kumar, bibekanand; kundu, prabuddha kumar title: multi-antigenic virus-like particle of sars cov- produced in saccharomyces cerevisiae as a vaccine candidate date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q gjwt spike, envelope and membrane proteins from the sars cov- virus surface coat are important vaccine targets. we hereby report recombinant co-expression of the three proteins (spike, envelope and membrane) in an engineered saccharomyces cerevisiae platform (d-crypt™) and their self-assembly as a virus-like particle (vlp). this design as a multi-antigenic vlp for sars cov- has the potential to be a scalable vaccine candidate. the vlp is confirmed by transmission electron microscopy (tem) images of the sars cov- , along with supportive hplc, dynamic light scattering (dls) and allied analytical data. the images clearly outline the presence of a “corona” like morphology, and uniform size distribution. sars-cov- , , has become a global pandemic in the last five months and is posing a grave challenge on many fronts, including health, economy, travel and social norms of living( ). the incidence of mortality and morbidity observed with this pandemic, along with the wide spread of disease, has created an immediate and urgent need for focused steps to develop a vaccine as an earliest measure that can be utilized for masses ( ). 
many candidates are being developed for testing as a vaccine for sars-cov- and research data along with prior knowledge of technology behind each candidate becomes key resource to understand the mechanism of demonstrating immunogenicity and efficacy. amongst different technologies that are being adapted for developing the vaccine candidate, vlps are known to be a potent technology for increased immunogenicity and there are number of examples of commercial vaccines and candidates under trial as vlps ( , ) . most of the vlp vaccines utilize a major surface protein to form the vlp ( ) . a strong b cell response is shown by vlps even in the absence of adjuvants due to efficient crosslinking with specific receptors on b cells ( , ) . in developing vlps as vaccine candidates, one of the significant challenges has been to incorporate multiple surface antigens, express them in a single host and allow self-assembly into a vlp ( ) . in this study, we report the first successful self-assembly of the sars cov- vlp by co-expression of the s, e and m proteins in saccharomyces cerevisiae expression platform. the vlps were studied by multiple analytical methods, and imaged using transmission electron microscopy (tem). the tem images confirmed the morphology and their self-assembly into a vlp. the coronavirus family is known to comprise of four major structural proteins in its genome, namely the spike (s) protein, membrane (m) protein, envelope (e) protein and the nucleocapsid (n) protein. all of these proteins are encoded within the ' end of the genome ( ) . the s protein is known to mediate attachment of the virus to the host cell surface receptors resulting in fusion and subsequent viral entry into the host cell ( ) . the m protein is the most abundant protein and defines the shape of the viral envelope ( ) . the e protein is the smallest protein of all the structural proteins in the viral capsid and its function is described in viral assembly and budding ( ) . 
the n protein binds to the rna genome and also functions along with m protein in viral assembly and budding. as a class of subunit vaccines, the virus-like particles (vlps) are differentiated from soluble recombinant antigens by stronger protective immunogenicity, associated with their structure, and hence make a much more relevant vaccine candidate ( ). the episomal constructs for the three proteins, s, e and m with c-terminus his tag were transformed into saccharomyces cerevisiae expression platform independently and analyzed for expression in the th h post induction. his tag was used to enable detection of the expressed proteins as there are no commercially available antibodies against the proteins s, e and m. immunoblot analysis confirmed the expression of the three proteins using anti-his antibody with specific band at ~ kda and ~ . kda for m and e protein respectively ( figure ) which was also confirmed by mass spectrometric analysis. however, a band at ~ kda was observed for s protein (figure ) which is due to detection of the s domain of s protein. the s protein is known to be cleaved by a host cell protease into polypeptides s and s in yeast ( ). while the band for the cleaved s protein fragment is expected at ~ kda, we obtained a higher molecular weight band, which may be due to the fact that the protein is known to be glycosylated. the expected molecular weight of the s protein is ∼ kda. interestingly, peptide mapping data showed presence of the complete s protein sequence, matching to full length kda protein, suggesting cleavage of s protein by the cellular protease into its two fragments, s and the s , which were found to be coexisting together on the microsomal membrane, and would get incorporated into the vlp as it assembles at the er-golgi interface. in order to co-express the s, e and m proteins into s. 
cerevisiae, recombinant construct for the expression of s and e protein, pyre _csp_cep_his and m protein, pyri _cmp_his construct were cotransformed into the host and the positive clones were selected on ynb-ura-leu plates. selected clones were taken forward for expression analysis. expression of the three proteins, s, e, and m were detected using anti-his antibody under both reducing and non-reducing conditions. results in figure show the expression of s, e and m proteins. we do observe differential expression of the three proteins. further, a band was observed at ~ kda which may be due to oligomerization of e and m protein. interestingly, the band at ~ kda disappears under non reducing conditions suggestive of formation of high molecular wt. complexes or vlps. it has been previously demonstrated that co-expression of s, e and m has been shown to form vlps ( , ) for sars-cov. importantly, the yeast expression platform along with its gras status provides the high scalability, robustness and cost-effective production for billions of dosages which would be the required to fight the pandemic globally. in order to confirm the presence and determine the morphology and size of the vlps formed on over figure c ). this data also indicates that the vlp particles are large with poly dispersity index less than %. interestingly, tem analysis showed that vlps were spherical in shape ( figure a cloning of "s", "e" and "m" proteins: in order to express the s, e and m proteins from sars cov- proteins were cloned into our proprietary expression vectors, pyre and pyri ( ) . the m protein was cloned in pyri integration vector (pyri_cmp_his) and integrated into pypd genome while s and e (pyre _csp_cep_his) protein were cloned into pyre episomal single expression vector as two separate expression cassettes. the synthetic genes sequence biased and optimised for expression in s. cerevisiae were obtained from geneart (regensburg, germany). 
further, all the genes were synthesized so as to have a his-tag at the c-terminus of the protein for the ease of detection as no antibodies were available. all the genes were also synthesised without an his tag. as an initial proof of concept, all the three proteins with a c-terminal his tag were cloned into pyre vector and transformed for expression analysis. episomal constructs of s, e and m with his tag, pyre _csp_his, pyre _cmp_his and pyre _cep_his plasmid and host vector pyre plasmid were transformed into proprietary protease deficient pypd yeast expression strain using lithium acetate/ss-dna/peg mediated protocol and transformants were selected over selective ynb glucose minus ura plates. also integrating construct with m protein was transformed in pypd host and selected on ynb minus leu plates. for co expression of three proteins, episomal construct with e and s gene were co-transformed into s. cerevisiae (pypd) containing integrated m protein and transformants selected on ynb glucose (without ura and leu) at ᵒc for - days. in order to study the expression, three isolated healthy transformed colonies were inoculated in appropriate ynb glucose media maintaining auxotrophic selection at °c for h. a colony of host strain pypd transformed with pyre served as host-vector control. at this stage, cultures were harvested and the cell pellet was induced with galactose at a final concentration of % in ynb minimal medium without respective markers (~od/ml at this time for all the cultures was ). all the cultures were harvested th h post induction and processed for western blot analysis. expression analysis: briefly, the cells were pelleted and treated with m lithium acetate on ice for min. subsequently, the cells were centrifuged at rpm for min. the cell pellet was then treated with . m naoh for min. the cells were finally pelleted and re-suspended in reducing dye and boiled for minutes. samples corresponding to . 
od of cells were resolved over sds-western blot was developed using anti-his specific antibody (cat no. sigma:h ) at a dilution of : and hrp conjugated anti-mouse secondary antibody (anti mouse igg-a ) at dilution of : . induction studies to enhance the biomass and vlp production: in order to increase cell biomass and the overall specific yields of the proteins, fed batch culture was carried out. the pre-seed and seed media were prepared in selective media "ynb -ura -leu" and finally the culture was transferred to l ypd media (hiveg peptone gm/l; yeast extract gm/l and dextrose gm/l). the key fermentation parameters of temperature ( ˚c), ph ( . ), d.o ( %) were maintained automatically. the culture was induced at hr with ml of % galactose along with x yt (yeast extract gm/l and tryptone gm/l). the culture was fed with % galactose and x yt every h. in order to study the expression of the overexpressed s, e and m protein, immuno-blot blot analysis was carried out using the cell lysate prepared after disruption of cells using glass beads. sec hplc fractionation: µl of both cell lysate samples at different hours of fermentation were injected onto tskgel supersw ( . mm id × cm l (tosoh bioscience gmbh)) column for analytical sec analysis and analysed with agilent technologies hplc instrument ( infinity ii). the estimated exclusion limit of this column for proteins is , da for globular proteins. mm sodium phosphate, ph . with mm sodium chloride was used as a standard sec buffer for the elution which provides necessary ionic strength to the vlps. uv signals were traced at nm and nm. . ml/min was the flow rate used for the analysis. corresponding vlp peak was separated by injecting µl into the chromatograph and peaks were collected from the detector end and further used for dls or tem analysis. 
dynamic light scattering analysis: dynamic light scattering analysis was done as a first proof of concept to see the vlp formation from the shake flask culture using litesizer (anton paar, austria) instrument. the size of the molecule was measured as the function of maximum intensity observed. the laser used to measure the hydrodynamic diameter had the wavelength of nm. during the measurement, for the light scattering angle, automatic mode was selected which automatically selects the angle based on the particle concentration. in this case, side scattering ( º) was used by the instrument to measure the hydrodynamic diameter. electron microscopy: in order to characterise and determine the size and structure of vlps, the hplc purified fractions from the cell lysate were subjected to electron microscopic analysis as described previously ( ). briefly, the fractions were adsorbed onto carbon-formvar copper grids and subsequently negatively stained with . % uranyl acetate aqueous solution for s. subsequently the grids were washed and examined on talos arctica transmission electron microscope. 
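the dynamic light scattering step above reports a hydrodynamic diameter. instruments of this kind typically derive that value from a measured translational diffusion coefficient through the stokes-einstein relation, d_h = k_b t / (3 π η d). the sketch below illustrates that conversion only; it is not a description of the instrument's own software, and the temperature, viscosity and diffusion coefficient are assumed values chosen for illustration, not measurements from this study.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def hydrodynamic_diameter_m(diffusion_m2_s, temperature_k, viscosity_pa_s):
    """Stokes-Einstein relation: d_H = k_B * T / (3 * pi * eta * D)."""
    if diffusion_m2_s <= 0 or temperature_k <= 0 or viscosity_pa_s <= 0:
        raise ValueError("all inputs must be positive")
    return (BOLTZMANN_J_PER_K * temperature_k) / (
        3.0 * math.pi * viscosity_pa_s * diffusion_m2_s
    )


# Assumed inputs: water-like buffer at 25 degrees C and a hypothetical
# diffusion coefficient; none of these numbers come from the paper.
diameter_m = hydrodynamic_diameter_m(
    diffusion_m2_s=4.0e-12, temperature_k=298.15, viscosity_pa_s=0.00089
)
print(f"hydrodynamic diameter = {diameter_m * 1e9:.1f} nm")
```

because the diameter is inversely proportional to the diffusion coefficient, slower-diffusing particles such as assembled vlps report a larger hydrodynamic size than free protein.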
epidemiological data from the covid- outbreak, real-time case information covid- vaccines: breaking record times to first-inhuman trials engineering virus-like particles as vaccine platforms virus-like particles as immunogens recent advancements in combination subunit vaccine development regulation of igg antibody responses by epitope density and cd -mediated costimulation determinants of autoantibody induction by conjugated papillomavirus virus-like particles mechanisms of coronavirus cell entry mediated by the viral spike protein virus-like particles as a highly efficient vaccine platform: diversity of targets and production systems and advances in clinical development the severe acute respiratory syndrome coronavirus a is a novel structural protein efficient assembly and release of sars coronavirus-like particles by a heterologous expression system characterization of the size distribution and aggregation of virus-like nanoparticles used as active ingredients of the hebernasvac therapeutic vaccine against chronic hepatitis b coronavirus particle assembly: primary structure requirements of the membrane protein co-expression of recombinant human cyp c with human cytochrome p reductase in protease deficient s. 
cerevisiae strain at a higher scale yields an enzyme of higher specific activity virus-like particles produced in saccharomyces cerevisiae elicit protective immunity against coxsackievirus a in mice key: cord- -welf eb authors: zhou, daming; duyvesteyn, helen me; chen, cheng-pin; huang, chung-guei; chen, ting-hua; shih, shin-ru; lin, yi-chun; cheng, chien-yu; cheng, shu-hsing; huang, yhu-chering; lin, tzou-yien; ma, che; huo, jiandong; carrique, loic; malinauskas, tomas; ruza, reinis r; shah, pranav nm; tan, tiong kit; rijal, pramila; donat, robert f.; godwin, kerry; buttigieg, karen; tree, julia; radecke, julika; paterson, neil g; supasa, piyasa; mongkolsapaya, juthathip; screaton, gavin r; carroll, miles w.; jaramillo, javier g.; knight, michael; james, william; owens, raymond j; naismith, james h.; townsend, alain; fry, elizabeth e; zhao, yuguang; ren, jingshan; stuart, david i; huang, kuan-ying a. title: structural basis for the neutralization of sars-cov- by an antibody from a convalescent patient date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: welf eb the covid- pandemic has had unprecedented health and economic impact, but currently there are no approved therapies. we have isolated an antibody, ey a, from a late-stage covid- patient and show it neutralises sars-cov- and cross-reacts with sars-cov- . ey a fab binds tightly (kd of nm) the receptor binding domain (rbd) of the viral spike glycoprotein and a . Å crystal structure of an rbd/ey a fab complex identifies the highly conserved epitope, away from the ace receptor binding site. residues of this epitope are key to stabilising the pre-fusion spike. cryo-em analyses of the pre-fusion spike incubated with ey a fab reveal a complex of the intact trimer with three fabs bound and two further multimeric forms comprising destabilized spike attached to fab. ey a binds what is probably a major neutralising epitope, making it a candidate therapeutic for covid- . 
conversion to the post-fusion form where the s subunit engages the host membrane whilst dispensing with s , . neutralising human monoclonal antibodies that recognise the ace receptor binding site for sars-cov- and sars-cov- are generally not cross-reactive between the two viruses and are susceptible to escape mutation - (indeed a natural mutation y n has already been identified at this site (gisaid : accession id: epi_isl_ wienecke-baldacchino et al.)). in contrast cr (derived from a sars-cov- patient) cross-reacts strongly with sars-cov- (methods, fig. ) and has been shown to recognise a cryptic, conserved epitope on the rbd distinct from the binding epitope of ace , [ ] [ ] [ ] . that this is not uncommon for sars-cov- antibodies is suggested by similar observations for d . to isolate sars-cov- spike-reactive monoclonal antibodies, we cloned antibody genes from blood-derived plasmablasts of a covid- patient in the convalescent phase. one of these, ey a was shown by elisa to bind s of sars-cov- and cross react with sars-cov- (fig. ) . binding of ey a to sars-cov- -infected cells was detected by immunofluorescence ( fig. ). surface plasmon resonance (spr) measurements for ey a fab showed high affinity binding to immobilised sars-cov- rbd (kd = nm, although the value for immobilised ey a igg was somewhat higher) as derived from the kinetic data (methods, extended data fig. , extended data table ). spr studies showed that there was some interdependence of ey a and cr binding, which varied depending on which component was immobilised on to the sensor chip; ace blocking assays confirmed a somewhat asymmetric blocking effect (extended data fig. ). with rbd stably expressed on mdck-siat cells (mdck-rbd), ey a did not block binding of ace to the rbd, whereas with ace stably expressed on mdck-siat cells (mdck-ace ) ey a blocked the interaction of rbd with ace . 
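the spr affinity quoted above is described as derived from kinetic data. a minimal sketch of that derivation, assuming a simple 1:1 binding model, is given below: the equilibrium dissociation constant is the ratio of the dissociation and association rate constants. the function names and the rate constants in the usage lines are hypothetical and are not the measured values for this antibody.

```python
import math


def dissociation_constant_nM(k_on_per_M_per_s, k_off_per_s):
    """Kd = k_off / k_on for a 1:1 interaction, returned in nanomolar."""
    if k_on_per_M_per_s <= 0 or k_off_per_s < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return (k_off_per_s / k_on_per_M_per_s) * 1e9


def complex_half_life_s(k_off_per_s):
    """Half-life of the bound complex, ln(2) / k_off, in seconds."""
    if k_off_per_s <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2.0) / k_off_per_s


if __name__ == "__main__":
    # Hypothetical rate constants, for illustration only.
    print(dissociation_constant_nM(1.0e5, 1.0e-4))  # Kd in nM
    print(complex_half_life_s(1.0e-4))              # seconds
```

in this model a lower kd can come from faster association, slower dissociation, or both, so two binders with the same kd can still behave differently in a time-limited neutralisation assay.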
in this assay, ey a exhibits around times stronger ace blocking than cr (ey a, ic = nm; cr , ic = nm) and has equivalent ace inhibition compared to ace -fc (ic = nm) and vhh -fc (ic = nm) . these observations are suggestive of an indirect effect by ey a once bound to the rbd, consistent with an allosteric or weak direct interaction. this was supported by an spr competition assay with immobilised cr , which binds distant from the ace binding site (extended data fig. ) . this showed complete competition with ey a for rbd binding suggesting they recognise the same or overlapping epitopes, and indicated that ey a binds the sars-cov- rbd more tightly. two independent neutralisation tests, both using live wild type sars-cov- showed strong neutralisation. a neutralisation test for ey a based on quantitative pcr detection of virus in the supernatant bathing infected vero e cells after days of culture, showed a ~ -fold reduction in virus signal (methods, extended data fig. ) indicating that it is highly neutralising. this was further corroborated by a plaque reduction neutralisation test (prnt) at phe porton down (methods and extended data table ) using sars-cov- virus and ey a which gave an nd of ~ . µg/ml ( nm) (calculated according to grist ) . a separate prnt implementation at oxford gave a slightly higher nd of ~ µg/ml, consistent with a shorter incubation time of antibody with virus at lower temperature (extended data fig. ). to elucidate the epitope of ey a, we determined the crystal structures of the deglycosylated sars-cov- rbd in complex with ey a fab alone and in a ternary complex incorporating a nanobody (nb) which has been shown to compete with ace (for data on a closely related nb see huo , submitted), as a crystallisation chaperone. the crystals of the binary complex diffracted to . Å resolution (methods, extended data table ) and those of the ternary complex to . Å. the interaction between ey a and the rbd was identical in both complexes (extended data fig. 
). the higher resolution ternary complex, which showed that there was no interaction between ey a and the nb, permitted a full interpretation of the detailed interactions (figs. and ) and has been refined to give an rwork/r-free of . / . and good stereochemistry (methods, extended data table ). residues - of the rbd, - and - of the heavy chain and - of the light chain of ey a and - of the nb are well defined fig. a ,b. the nb recognises an epitope adjacent to and slightly overlapping the ace receptor binding site and binds the rbd orthogonally to ey a (fig. b,c) . ey a binds essentially the same epitope as cr , but with a different pose corresponding to a rotation of ° about an axis perpendicular to the rbd α helix (central to both epitopes) (fig. d,e) . the fab complex interface buries and Å of surface area for the cdrs of the heavy and light chains respectively. the ey a interaction is mediated by the cdr loops h , h , h , l and l contacting predominantly α but also α and the β -α , α -α and α -β loops of the rbd (fig. and extended data fig. ). a total of residues from the heavy chain and from the light chain participate in the interface together with residues from the rbd. for the heavy chain these form potentially hydrogen bonds and a single salt bridge between d (of h ) and k of the rbd and the light chain interface residues contribute an additional hydrogen bonds. hydrophobic interactions further increase the binding affinity (fig. ) . of the residues involved in the interaction are conserved between the cr and ey a epitopes ( fig. and extended data fig. ). conformational changes are introduced into the rbd by binding to ey a at the α (residues - ) and α (residues - ) helices (extended data fig. ), similar to those seen for the cr complex . comparison of the epitope residues for ey a, cr and vhh- shows that there is a very substantial overlap (extended data fig. 
), although the bulk of the molecules extend in different directions, such that vhh- directly blocks ace- binding . in the first pre-fusion spike structures (pdb ids: vsb , vxx, vyb ), where residues and in the linker between two helices in s were mutated to a pro-pro sequence to prevent the conversion to the post-fusion helical conformation, the rbds were found in either one 'up' two 'down' or all three 'down' configuration, and in both cases the epitope is inaccessible. in the 'down' position it is packed against another rbd of the trimer and the nterminal domain (ntd) of the neighbouring protomer. a recent publication for the wild type spike identifies a more closed form where the s portion of the spike is tightened up. the structure is not yet deposited however, and so we have looked at the role of the epitope in the down rather than fully closed form, which will be broadly similar. here the ey a epitope packs down tightly against the s 'knuckle' bearing the pro-pro mutations, forming a buried protein-protein interface and making the epitope completely inaccessible. we assume that in the closed form this interaction will be even tighter and is probably responsible for maintaining the spike in the pre-fusion state. even when the rbd is in the 'up' configuration, the epitope remains largely inaccessible and substantial further movement of the rbd would be required to permit interaction unless more than one rbd was in the up conformation . to investigate how the fab insinuates itself into the spike, we performed cryo-em analysis. spike ectodomain was mixed with a -fold molar excess of ey a fab and incubated at room temperature ( °c) with an aliquot taken at hours, applied to cryo-em grids and frozen (methods). unbiased d class averages revealed three major particle classes with over onethird comprising a trimeric spike/ey a complex (some of which are self-associated) (methods, extended data table and °. 
in addition, the orientations of the vh domains relative to their associated rbds differ slightly from that of the crystal structure (by °, ° and °, respectively). the quality of the density suggests that these are likely samples selected from a continuous distribution (extended data fig. ). the majority of the remaining particles form either a roughly -fold symmetric structure or a triangular association (methods, extended data table and figs. - ). reconstructions of these particles were anisotropic due to a preferential orientation of the particles on the grid, which was somewhat mitigated by collecting data with ° tilt to yield reconstructions at . Å and . Å, respectively, in the plane of the grid but significantly worse resolution perpendicular to the grid (extended data fig. ). the reconstructions were sufficiently clear to allow the unambiguous fitting of ey a-rbd complexes (extended data fig. ); however, the density for what we assume are the n-terminal domains is poor in both reconstructions and we did not attempt to fit a model. these structures likely represent a residual well-structured fragment from the unfolding of the pre-fusion state of the spike (sds page analysis shows that the spike polypeptide remains largely uncleaved, extended data fig. ). the 'dimeric' and 'trimeric' structures are formed by different lateral associations and these also differ from those seen for similarly structurally degraded spike-cr complexes (extended data fig. ). convalescent serum has shown promise in patients severely ill with covid- , , thus immunotherapeutics have potential for treating covid- even at a relatively late stage in the disease. to this end, it is desirable to find a combination of antibodies that neutralise the virus by different mechanisms to mitigate against immune evasion and antibody-dependent enhancement. one neutralisation mechanism is blocking receptor attachment. we propose that attachment at the ey a epitope is a further major neutralisation mechanism. 
in support of this, the epitope recognised by ey a has been reported for several antibodies , , , and nanobodies , raised against sars-cov- , sars-cov- and mers. for sars-cov- , cr has also been shown to neutralise synergistically with ace blocking antibodies . despite the spatial separation of the ey a and ace epitopes we find some cross-talk between the two binding events. the ey a epitope is extremely unusual, since it is completely inaccessible in the pre-fusion spike trimer. this raises the question of what the mechanism of neutralisation might be. in the pre-fusion state the ey a/cr epitope rests down upon the upper end of the helix-turn-helix between heptad repeat (hr ) and the central helix (ch) of s , essentially putting a lid on the spring-loaded extension of the helix which occurs on conversion to the post-fusion state in the vicinity of the mutations designed to prevent conversion between the pre- and post-fusion conformation (fig. ). the residues of the epitope are crucial to these protein-protein interactions, and therefore highly conserved, explaining why it has, to date, proved impossible to generate mutations that escape binding of the antibody , . ey a binding to the isolated rbd is tight (at ~ nm it is roughly an order of magnitude tighter than cr ) and, remarkably, the binding pose on top of the spike allows three fabs to bind simultaneously around the central axis (whereas cr fab cannot be accommodated). in line with this, a major portion of spike molecules incubated for h with ey a are still in the intact pre-fusion state, with only about / being converted. simple modelling suggests that a similar packing could occur for intact antibodies (extended data fig. ). in general, we would expect binders at this epitope to neutralise by displacing the 'lid' on the hr /ch turn, reducing the stability of the pre-fusion state and therefore reducing the barrier to conversion to the more stable post-fusion trimer. 
this conversion is hindered in the construct we have used by the presence of the proline mutations at the turn between the helices. premature conversion would prevent later attachment to the cell and block infectivity. the kinetics of this process will determine the effectiveness of the antibody in neutralisation and ultimately protection. since the rbd is a relatively small domain there might also be an interplay between separate epitopes, thus we saw allosteric effects between ey a and ace binding and similarly vhh- , which binds an overlapping epitope to ey a, strongly inhibits ace- binding by virtue of its different angle of attack . the reason for the cross-talk between this study was designed to isolate sars-cov- antigen-specific human mabs from peripheral plasmablasts in humans with natural sars-cov- infection, to characterize the antigenic specificity and phenotypic activity of sars-cov- spike-reactive mab, and to determine the structure of antibody in complex with viral antigen. fresh peripheral blood mononuclear cells (pbmcs) were separated from whole blood by density gradient centrifugation and cryopreserved pbmcs were thawed. pbmcs were stained with a mix of fluorescent-labelled antibodies to cellular surface markers, including anti-cd (bd biosciences, usa), anti-cd (bd biosciences, usa), anti-cd (bd biosciences, usa), anti-cd (bd biosciences, usa), anti-cd (bd biosciences, usa), anti-igg (bd biosciences, usa) and anti-igm (bd biosciences, usa). plasmablasts were selected by gating on cd -cd -cd +cd hicd hiigg+igm-events and were isolated in chamber as single cell as previously described . sorted single cells were used to produce human igg monoclonal antibodies as previously described . expression vectors that carry variable domains of heavy and light chains were transfected into the t cell line for expression of recombinant full-length human igg monoclonal antibodies in serum-free transfection medium. 
to determine the individual gene segments employed by vdj and vj rearrangements and the number of nucleotide mutations and amino acid replacements, the variable domain sequences were aligned with germline gene segments using the international immunogenetics (imgt) alignment tool (http://www.imgt.org/imgt_vquest/input). ey a igg used for neutralisation and making fab: antibody was expressed using nanobody: this was derived from a naïve library followed by affinity maturation as described deglycosylation of rbd: µl of endoglycosidase f (~ mg/ml) was added to rbd (~ mg/ml, ml) and incubated at room temperature for two hours. rbd was then loaded onto a superdex hiload / gel filtration column (ge healthcare) for further purification using buffer mm hepes ph . , mm nacl. purified rbd was concentrated using a kda ultrafiltration tube (amicon) to mg/ml. the neutralization activity of monoclonal antibody-containing supernatant was measured using a sars-cov- (strain cdc- ) infection of vero e cells . briefly, vero e cells were pre-seeded in a well plate at a concentration of x cells per well. on the following day, monoclonal antibody-containing supernatant was mixed with an equal volume of tcid virus preparation and incubated at °c for hour. the mixture was added to the seeded vero e cells and incubated at °c for days. the cell control, virus control, and virus back-titration were set up for each experiment. at day , the culture supernatant was harvested from each well and the viral rna was extracted by the automatic labturbo system (taigen, taiwan), following the manufacturer's instructions for the most part, except that the specimen was pretreated with proteinase k prior to rna extraction. real-time reverse transcription polymerase chain reaction was performed in a -µl reaction containing µl of rna sars-cov- (australia/vic / ) plaque reduction neutralization tests were performed using passage of sars-cov- victoria/ / . 
virus suspension at appropriate concentrations in dulbecco's modification of eagle's medium containing % fbs (d ; µl) was mixed with antibody ( µl) diluted in d at a final concentration of µg/ml, µg/ml, . µg/ml or . µg/ml, in triplicate, in wells of a well tissue culture plate, and incubated at room temperature for minutes. thereafter, . ml of a single cell suspension of vero e cells in d at x /ml was added, and incubated for h at °c before being overlain with . ml of d supplemented with carboxymethyl cellulose ( . %). cultures were incubated for a further days at °c before plaques were revealed by staining the cell monolayers with amido black in acetic acid/methanol. purified and deglycosylated rbd and ey a fab were combined in an approximate molar ratio of : at a concentration of . mg/ml. nb was also combined with ey a- his fab and rbd in a : : molar ratio with a final concentration of . mg/ml. these two complexes were separately incubated at room temperature for one hour. initial screening of crystals was performed in crystalquick -well x plates (greiner bio-one) with a cartesian robot using the nanolitre sitting-drop vapour diffusion method as previously described , . crystals were soaked in a solution containing % glycerol and % reservoir solution for a few seconds and then mounted in loops and frozen in liquid nitrogen prior to data collection. diffraction data were collected at k at beamline i of diamond light source, uk. diffraction images of . ° rotation were recorded on an eiger xe m detector with an exposure time of . s per frame, beam size × µm and % beam transmission. data were indexed, integrated and scaled with the automated data processing program xia -dials , . the data set for the binary complex of ° was collected from a single frozen crystal to . Å resolution with -fold redundancy. the crystal belongs to space group p with unit cell dimensions a = b = . Å and c = . Å. 
the structure was determined by molecular replacement with phaser using search models of antibody cr fab and the rbd of the rbd/cr fab complex (pdb id yla; ). there are three rbd/ey a complexes in the crystal asymmetric unit, resulting in a crystal solvent content of ~ %. for the ternary complex, a data set of ° rotation with data extending to . Å was collected on beamline i of diamond with exposure time . s per . ° frame, beam size × µm and % beam transmission). the crystal also belongs to space group r but with unit cell dimensions (a = b = . Å and c = . Å). there is one rbd/ey a/nb complex in the asymmetric unit and a solvent content of ~ %. one cycle of refmac was used to refine atomic coordinates after manual correction in coot to the protein sequence from the search model. for both the binary and ternary complexes the final refinement used phenix resulting in rwork = . and rfree = . for all data to . Å resolution for the binary complex and to rwork = . and rfree = . for all data to . Å resolution for the ternary complex. there is well ordered density for a single glycan at the glycosylation site n in the rbd. data collection and structure refinement statistics are given in extended data table . structural comparisons used shp , residues forming the rbd/fab interface were identified with pisa , figures were prepared with pymol (the pymol molecular graphics system, version . r pre, schrödinger, llc). spike protein, following sec purification, was buffer exchanged into mm tris ph . , mm nacl, . % nan buffer using a desalting column (zeba, thermo fisher). a final concentration of . mg/ml was incubated with ey a fab (in the same buffer) in a : molar ratio (fab to trimeric spike) at room temperature for hrs. control grids of spike alone after incubation at room temperature for hrs were also prepared. 
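the rwork and rfree values quoted for the crystallographic refinement above are both instances of the conventional crystallographic r-factor, evaluated on different subsets of reflections. the sketch below shows that definition under simplifying assumptions: the observed and calculated amplitudes are taken to be already on a common scale, and the split into a working set and a held-out free set is supplied by the caller. the function name and the amplitudes in the usage lines are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes both lists hold structure-factor amplitudes for the same
    reflections, in the same order, and already on the same scale.
    """
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical amplitudes; the "free" reflections are excluded from refinement.
    work_obs, work_calc = [100.0, 200.0, 300.0], [90.0, 210.0, 330.0]
    free_obs, free_calc = [150.0, 250.0], [120.0, 280.0]
    print("R-work:", r_factor(work_obs, work_calc))
    print("R-free:", r_factor(free_obs, free_calc))
```

because the free reflections never enter refinement, an rfree that is much higher than rwork is a common warning sign of overfitting the model to the working set.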
each grid was prepared using µl sample applied to a freshly glow-discharged on high for grids were screened on a titan krios microscope using serialem operating at kv (thermo fisher). movies were collected on a k detector on a titan krios operating at kv in super resolution mode, with a calibrated super resolution pixel size of . a/pix at both ° and ° tilt. to compensate for the poorer contrast with tilted data, it was necessary to use a higher dose rate for the latter dataset. alignment and motion correction was performed using relion . 's implementation of motion correction , with a -by- patch-based alignment. all frames were binned by two, resulting in a final calibrated pixel size of . Å/pixel. contrast-transfer-function (ctf) of full-dose and non-weighted micrographs was estimated within a cryosparc wrapper for gctf-v . . images were then manually inspected and those with poor ctf-fits were discarded. particles were then picked by unbiased blob picking in cryosparc v. . . and subjected to rounds of d classification. for the spike-ey a dataset (structure a), , , spike-like particles were used to make a template to pick particles from the untilted dataset, which were then filtered by d classification to , particles and then further refined by d classification with an ab initio model set. for the ° dataset, , particles were used as a template, and filtered by d classification to a set of , particles and then, as before, further refined by unbiased d classification. the two particle sets were then refined together, with a final set of , particles. for b and c (triangular ring and 'dimeric' form), particles from both the zero and ° datasets were combined in a similar manner to the spike-ey a dataset using the 'exposure group utilities' module in cryosparc. both particle sets (b, particles and c, , particles) were then reclassified and the best class refined with non-uniform refinement. 
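the relationship between the super-resolution pixel size and the pixel size after binning by two, as described above, is simple arithmetic, and it also sets the finest resolution the binned images can represent. a small sketch follows; the function names and the pixel size in the usage lines are hypothetical and are not the calibrated values of this data set.

```python
def binned_pixel_size(pixel_size_angstrom, bin_factor):
    """Pixel size after binning: each output pixel spans bin_factor input pixels."""
    if pixel_size_angstrom <= 0 or bin_factor < 1:
        raise ValueError("pixel size must be positive and bin factor at least 1")
    return pixel_size_angstrom * bin_factor


def nyquist_resolution(pixel_size_angstrom):
    """Finest resolvable spacing (Nyquist limit): two pixels per period."""
    if pixel_size_angstrom <= 0:
        raise ValueError("pixel size must be positive")
    return 2.0 * pixel_size_angstrom


if __name__ == "__main__":
    super_res_pixel = 0.5  # hypothetical super-resolution pixel size, in angstrom
    binned = binned_pixel_size(super_res_pixel, 2)
    print(binned, nyquist_resolution(binned))
```

binning trades the highest spatial frequencies, which usually carry little signal in super-resolution mode, for smaller images and faster processing.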
for b, c symmetry was imposed at this final refinement stage, resulting in an appreciable improvement in resolution, as indicated by inspection and gold-standard fsc = . ( . versus . Å, see extended data table ). the em density of spike/ey a was fitted with the structure of a closed form of spike (pdb id vxx) apart from the rbds and ey a fab which were fitted with rbd/ey a of the ternary crystal structure using coot . due to the lower resolution, rbd and ey a are only fitted to the 'dimeric' and 'trimeric' em density. the spike/ey a structure was refined with phenix real space refinement, first as a rigid body and then by positional and bfactor refinements. only rigid body refinement was applied to the 'dimeric' and the 'trimeric' complexes. the statistics of em data collection and structure refinement are shown in extended data table these authors contributed equally: d.z., hmed, c.-p.c. ey a binds the s subunit of sars-cov- and cross react with s of sars-cov- . b, antibody cr similarly binds the s subunit of sars-cov- and cross react with sars-cov- s , but with lower affinity. c, convalescent serum from a covid- patient was also included as a control and in this case binding to mers and oc spike proteins also investigated. d, binding of ey a on the sars-cov- infected cells in immunofluorescence assay. anti-influenza h mab bs a was included as a control. sars-cov- spike sars-cov- s sars-cov- s mers spike oc spike performed cryo-em sample preparation, screening and processing and j.raedecke performed cryo-em data collection, and j.ren refined the cryo-em structures helped prepare materials, perform experiments and analysed data. all authors read and approved the manuscript real estimates of mortality following covid- infection cryo-em structure of the -ncov spike in the prefusion conformation. science ( -. 
) cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace structure, function, and antigenicity of the sars-cov- spike glycoprotein dynamical asymmetry exposes -ncov prefusion spike human monoclonal antibodies block the binding of sars-cov- spike protein to angiotensin converting enzyme receptor human monoclonal antibody combination against sars coronavirus: synergy and coverage of escape mutants potent neutralization of severe acute respiratory syndrome (sars) coronavirus by a human mab to s protein that blocks receptor association potent cross-reactive neutralization of sars coronavirus isolates by human monoclonal antibodies human neutralizing antibodies elicited by sars-cov- infection global initiative on sharing all influenza data -from vision to reality neutralization of sars-cov- by destruction of the prefusion spike a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov. science ( -. ) potent binding of novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody a human monoclonal antibody blocking sars-cov- infection diagnostic methods in clinical virology. 
x article structural basis for potent neutralization of betacoronaviruses by single-domain camelid antibodies distinct conformational states of sars-cov- spike protein deployment of convalescent plasma for the prevention and treatment of covid- treatment of critically ill patients with covid- with convalescent plasma structural basis for neutralization of sars-cov- and sars-cov by a potent therapeutic antibody early release-severe acute respiratory syndrome coronavirus −specific antibody responses in coronavirus disease identification of human single-domain antibodies against sars-cov- immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen the production of glycoproteins by transient expression in mammalian cells epitope-associated and specificity-focused features of ev -neutralizing antibody repertoires from plasmablasts of infected children sequence variation among sars-cov- isolates in taiwan detection of novel coronavirus ( -ncov) by real-time rt-pcr isolation and rapid sharing of the novel coronavirus (sar-cov- ) from the first patient diagnosed with covid- in australia beitrag zur kollektiven behandlung pharmakologischer reihenversuche a procedure for setting up high-throughput nanolitre crystallization experiments. i. protocol design and validation a procedure for setting up high-throughput nanolitre crystallization experiments. crystallization workflow for initial screening, automated storage, imaging and optimization xia : an expert system for macromolecular crystallography data reduction dials: implementation and evaluation of a new integration package phaser crystallographic software refmac for the refinement of macromolecular crystal structures coot: model-building tools for molecular graphics macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix crystal structure of cat muscle pyruvate kinase at a resolution of . 
Å inference of macromolecular assemblies from crystalline state new tools for automated high-resolution cryo-em structure determination in relion- real-time ctf determination and correction algorithms for rapid unsupervised cryo-em structure determination multiple ligand-protein interaction diagrams for drug discovery we acknowledge the bd facsaria™ cell sorter service provided by the core instrument the authors declare no competing interests. correspondence to david i. stuart or kuan-ying a. huang. the coordinates and structure factors of the sars-cov- rbd/ey a crystallographic complexes are available from the pdb with accession codes xxx and vvv respectively. em maps and structure models are deposited in emdb and pdb with accession codes xxx and yyy for the pre-fusion spike, and xxxxx and yyyy for the dimeric complex respectively. the data that support the findings of this study are available from the corresponding authors on request. key: cord- -hrdanbn authors: ghahremanpour, mohammad m.; tirado-rives, julian; deshmukh, maya; ippolito, joseph a.; zhang, chun-hui; de vaca, israel cabeza; liosi, maria-elena; anderson, karen s.; jorgensen, william l. title: identification of known drugs as inhibitors of the main protease of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hrdanbn a consensus virtual screening protocol has been applied to ca. approved drugs to seek inhibitors of the main protease (mpro) of sars-cov- , the virus responsible for covid- . drugs emerged as top candidates, and after visual analyses of the predicted structures of their complexes with mpro, were chosen for evaluation in a kinetic assay for mpro inhibition. remarkably of the compounds at -μm concentration were found to reduce the enzymatic activity and provided ic values below μm: manidipine ( . μm), boceprevir ( . μm), lercanidipine ( . μm), bedaquiline ( . μm), and efonidipine ( . μm). 
structural analyses reveal a common cloverleaf pattern for the binding of the active compounds to the p , p ’, and p pockets of mpro. further study of the most active compounds in the context of covid- therapy is warranted, while all of the active compounds may provide a foundation for lead optimization to deliver valuable chemotherapeutics to combat the pandemic. sars-cov- , the cause of the covid- pandemic, is a coronavirus (cov) from the coronaviridae family. its rna genome is ~ % identical to that of sars-cov, which was responsible for the severe acute respiratory syndrome (sars) pandemic in . sars-cov- encodes two cysteine proteases: the chymotrypsin-like cysteine or main protease, known as cl pro or m pro , and the papain-like cysteine protease, pl pro . they catalyze the proteolysis of polyproteins translated from the viral genome into non-structural proteins essential for packaging the nascent virion and viral replication. therefore, inhibiting the activity of these proteases would impede the replication of the virus. m pro processes the polyprotein ab at multiple cleavage sites. it hydrolyzes the gln-ser peptide bond in the leu-gln-ser-ala-gly recognition sequence. this cleavage site in the substrate is distinct from the peptide sequence recognized by other human cysteine proteases known to date. thus, m pro is viewed as a promising target for anti-sars-cov- drug design; it has been the focus of several studies since the pandemic emerged. , [ ] [ ] [ ] [ ] an x-ray crystal structure of m pro reveals that it forms a homodimer with a fold crystallographic symmetry axis. , each protomer, with a length of residues, is made of three domains (i-iii). domains ii and i fold into a six-stranded β-barrel that harbors the active site. , , domain iii forms a cluster of five antiparallel α-helices that regulates the dimerization of the protease. a flexible loop connects domain ii to domain iii. 
the mpro active site contains a cys-his catalytic dyad and canonical binding pockets that are denoted p , p , p , p , and p . the amino acid sequence of the active site is highly conserved among coronaviruses. the catalytic dyad residues are his and cys ; further residues play key roles in ligand binding (figure ). these residues have been found to interact with the ligands co-crystallized with mpro in different studies. crystallographic data also suggested that ser of one protomer interacts with phe and glu of the other as the result of dimerization. these interactions stabilize the p binding pocket; thus, dimerization of the main protease is likely required for its catalytic activity.
drug repurposing is an important strategy for immediate response to the covid- pandemic. in this approach, the main goal of computational and experimental studies has been to find existing drugs that might be effective against sars-cov- . for instance, a molecular docking study suggested remdesivir as a potential therapeutic that could be used against sars-cov- , which was supported experimentally by an ec value of μm in an infected-cell assay. however, a clinical trial showed no statistically significant clinical benefits of remdesivir on adult patients hospitalized for severe covid- . nonetheless, patients who were administered remdesivir in the same trial showed a faster time to clinical improvement in comparison to the placebo-control group. in another clinical trial, only patients on mechanical ventilation benefitted from remdesivir. an ec value of μm was also reported for lopinavir, suggesting it may have beneficial activity against sars-cov- . however, neither lopinavir nor the lopinavir/ritonavir combination has thus far shown any significant benefits against covid- in clinical trials. chloroquine, hydroxychloroquine, and favipiravir have also been explored for repurposing against covid- ; however, clinical studies with them have been controversial.
these studies reflect the urgent need for systematic drug discovery efforts for therapies effective against sars-cov- . thus, we decided to pursue discovery of small-molecule inhibitors of mpro. the aim of this initial work was two-fold: to identify known drugs that may show some activity, and to identify structurally promising, synthetically accessible substructures suitable for subsequent lead optimization. our expectation was that existing drugs may show activity but not at the low-nanomolar levels that are typical of effective therapies. this report provides results for the first goal. the work began by designing and executing a consensus molecular docking protocol to virtual screen [...]
[...] were predicted using the propka and the h++ servers. accordingly, lysines and arginines were positively charged, aspartic and glutamic acids were negatively charged, and all histidines were neutral. all histidines were built with the proton on nε except for his , which was protonated at nδ. the resulting mpro structure has a net charge of - e. extensive visual inspection was carried out using ucsf chimera.
consensus molecular docking. most docking programs apply methods to generate an initial set of conformations, and tautomeric and protonation states for each ligand. this is followed by application of search algorithms and scoring functions to generate and score the poses of the ligand in the binding site of a protein. scoring functions have been trained to reproduce a finite set of experimental ligand-binding affinities that are generally a mix of activity data converted to a free-energy scale. therefore, the accuracy of the scores is dependent on multiple factors including the compounds that were part of the training set. to mitigate the biases, we performed four independent runs of protein-ligand docking with a library of ca. approved, oral drugs using glide, autodock vina, and two protocols with autodock . .
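the consensus idea behind running several independent docking protocols can be sketched in a few lines of python. this is a minimal illustration, not the authors' code: the function name `consensus_hits`, the toy scores, and the cutoffs `top_fraction` and `min_runs` are all hypothetical, and the sketch assumes that lower (more negative) docking scores indicate better predicted binding.

```python
from collections import Counter
import math


def consensus_hits(runs, top_fraction, min_runs):
    """Return compounds ranked in the top `top_fraction` of at least `min_runs` runs.

    runs: dict mapping run name -> dict of compound -> docking score
          (lower score = better predicted binding; an assumption of this sketch).
    """
    counts = Counter()
    for scores in runs.values():
        # Number of compounds that count as "top" in this run (at least one).
        n_top = max(1, math.ceil(top_fraction * len(scores)))
        ranked = sorted(scores, key=scores.get)  # best (lowest) score first
        counts.update(ranked[:n_top])
    return sorted(c for c, k in counts.items() if k >= min_runs)


# Toy scores for four hypothetical runs (values are made up for illustration).
runs = {
    "glide":      {"A": -9.1, "B": -7.2, "C": -8.5, "D": -5.0},
    "vina":       {"A": -8.7, "B": -8.9, "C": -6.1, "D": -5.5},
    "autodock_1": {"A": -10.2, "B": -6.0, "C": -9.9, "D": -7.1},
    "autodock_2": {"A": -9.5, "B": -7.7, "C": -9.0, "D": -6.2},
}
hits = consensus_hits(runs, top_fraction=0.5, min_runs=3)
print(hits)  # ['A', 'C']
```

requiring agreement across runs trades recall for robustness: a compound favored by only one scoring function (b in the toy data) is dropped, which is the point of a consensus filter.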
the results were compiled and further consideration focused on those compounds that ranked among the top % in at least out of the runs.
glide. schrödinger's protein prep wizard utility was used for preparing the protein. a -Å grid was then generated and centered on the co-crystallized ligand, which was subsequently removed. the drug library members were neutralized and/or ionized via schrödinger's ligprep. the epik program was used for estimating the pka values of each compound. plausible tautomers and stereoisomers within the ph range of ± were generated for each compound using the opls force field. these conditions resulted in a total of structures, which were then docked into mpro using schrödinger's standard-precision (sp) glide.
autodock. the autodocktools (adt) software was used for creating pdbqt files from sdf and pdb files of compounds and the protein, respectively. non-polar hydrogen atoms were removed and gasteiger-marsili charges were assigned for both the protein and the ligands using adt. the autogrid [...]
simulations. the protonated mpro dimer, with a net charge of - e, was represented by the opls-aa/m force field. tip p water was used as the solvent. sodium counterions were added to neutralize the net charge of each system. the selected ligand candidates were represented by the opls/cm a force field, as assigned by the boss software (version . ) and the ligpargen python code. the parameters were converted to gromacs format using ligpargen. for neutral ligands, the cm a partial atomic charges were scaled by a factor of . . each mpro-ligand complex was put at the center of a triclinic simulation box with -Å padding. an energy minimization was then performed until the steepest descent algorithm converged to a maximum force smaller than . kcal mol⁻¹ Å⁻¹. a cutoff radius of Å was used to explicitly calculate non-bonded interactions. long-range electrostatic interactions were treated using the particle mesh ewald (pme) algorithm.
the pme was used with an interpolation order of , a fourier spacing of . Å, and a relative tolerance of - . the van der waals forces were smoothly switched to zero between and Å. analytical corrections to the long-range effect of dispersion interactions were applied to both energy and pressure. all covalent bonds to hydrogen atoms were constrained at their equilibrium lengths using the lincs algorithm with the order of in the expansion of the constraint-coupling matrix. each system was subsequently simulated for ns in the canonical ensemble (nvt) in order for the solvent to relax and the temperature of the system to equilibrate. initial velocities were sampled from a maxwell-boltzmann distribution at k. the v-rescale thermostat with a stochastic term was used for keeping the temperature at k. the stochastic term ensured that the sampled ensemble was canonical. the coupling constant of the thermostat was set to . ps.
[...] table , and the structures of some of the ones that turned out to be most interesting are shown in figure . the primary indications include bacterial and viral infections, hypertension, psychosis, inflammation, and cancer. their mechanisms of action are also broad, ranging from kinase and protease inhibitors to dopamine receptor agonists/antagonists and calcium channel blockers. it is not surprising that peptidic protease inhibitors are well-represented in view of the peptide substrate and prior discovery of peptidic inhibitors for mpro and its sars-cov relative. in almost all cases the predicted poses for the compounds from the different docking programs agreed well. the poses from glide were then subjected to extensive visual scrutiny to check for unsatisfied hydrogen-bonding sites, electrostatic mismatches, and unlikely conformations of the ligand.
about half of the compounds were ruled out for further study due to the occurrence of such liabilities and the presence of multiple ester groups (e.g., methoserpidine and nicomol) or overall size and complexity (e.g., bromocriptine and benzquercin). a repeated motif was apparent with high-scoring ligands having a cloverleaf pattern with occupancy of the p , p ', and p pockets, as illustrated in figure for the complex of azelastine. other common elements are an edge-to-face aryl-aryl interaction with his and placement of a positively-charged group in the p pocket in proximity to glu , e.g., the methylazepanium group of azelastine, the protonated trialkylamino group of bedaquiline, and the protonated piperazine of periciazine. however, glu forms a salt bridge with the terminal ammonium group of ser b (figure ). the electrostatic balance seems unclear in this region, so our final selections included a mix of neutral and positively-charged groups for the p site. the analysis of the high-scoring compounds also considered structural variety and potential synthesis of analogs. in the end, we settled on compounds, which are highlighted in table , for purchase and assaying. sixteen were commercially available, mostly from sigma-aldrich. the seventeenth, cinnoxicam, was not available, but it was readily prepared in a one-step synthesis from the commercially-available ester components. it may be noted that three calcium channel blockers, efonidipine, lercanidipine, and manidipine, were purchased (figure ). this was not done owing to the characteristic dihydropyridine substructure, since this end of the molecule protrudes out of the p ' site in the docked poses; rather, it was for the variety in the left-hand sides of the molecules in figure , which form the cloverleaf that binds in the p , p ', and p pockets, as illustrated in figure for manidipine. the steric fit in this region appears good, though the only potential hydrogen bond is between the nitro group and the catalytic cys .
remarkably, fourteen of the drugs at μm decreased m pro activity ( nm), as shown in figure and table . five drugs decreased m pro activity to below %. the top five hits from the kinetic assay were manidipine, boceprevir, efonidipine, lercanidipine, and bedaquiline. dose-response curves were obtained to determine ic values, when possible, as shown in figure for the five most potent inhibitors, with the raw data as a function of time and concentration given in figure s . the computed structures for the complexes of boceprevir and bedaquiline are illustrated in figure . for boceprevir, the dimethylcyclopropyl subunit is predicted to μm. further study of these compounds in the context of covid- therapy is warranted, while all of the active compounds reported here may provide a foundation for lead optimization to deliver valuable chemotherapeutics to combat the pandemic. the supporting information is available free of charge on the acs publications website. an excel file with the names and docking scores for the full drug library, a the authors have no competing interests. gratitude is expressed for support to the u. s. national institutes of health (gm ) and to the yale university school of medicine for a corect pilot grant. the m pro plasmid was kindly provided by the hilgenfeld lab. 
key: cord- -bqeffvzl authors: limonta, daniel; dyna-dagman, lovely; branton, william; makio, tadashi; wozniak, richard w.; power, christopher; hobman, tom c. title: nodosome inhibition as a novel broad-spectrum antiviral strategy against arboviruses and sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bqeffvzl in the present report, we describe two small molecules with broad-spectrum antiviral activity. these drugs block formation of the nodosome. the studies were prompted by the observation that infection of human fetal brain cells with zika virus (zikv) induces expression of nucleotide-binding oligomerization domain-containing protein (nod ), a host factor that was found to promote zikv replication and spread. a drug that targets nod was shown to have potent broad-spectrum antiviral activity against other flaviviruses, alphaviruses and sars-cov- , the causative agent of covid- . another drug that inhibits the receptor-interacting serine/threonine-protein kinase (ripk ) which functions downstream of nod , also decreased replication of these pathogenic rna viruses. the broad-spectrum action of nodosome targeting drugs is mediated, at least in part, by enhancement of the interferon response. together, these results suggest that further preclinical investigation of nodosome inhibitors as potential broad-spectrum antivirals is warranted.
[...] diagnosis and lack of effective vaccines against most arboviruses, development of broad-spectrum antivirals against these pathogens should be a high priority. the ongoing pandemic caused by sars-cov- poses a different set of challenges. despite concerted efforts to repurpose and find new antiviral drugs ( - ), so far only remdesivir has shown modest efficacy in the acute stages of covid- ( ). while more than sars-cov- vaccine candidates are in accelerated development at preclinical and clinical stages ( ), it will likely take another year or more before they are broadly available to the general population, as safety and efficacy still need to be evaluated ( , ).
in this study, we characterized the broad-spectrum antiviral activities of nodosome inhibitors gsk and gsk . these small molecules display robust antiviral action against multiple rna viruses and may hold promise as pan-flavivirus inhibitors. first, we showed that nod expression promotes zikv multiplication in hfas, which are the main target of this flavivirus in the fetal brain ( , ). next, we demonstrated that the nod inhibitor gsk blocks infection by and spread of zikv in human fetal brain and cell lines. nod inhibition also reduced replication of the related denv, the alphavirus mayv and the pandemic coronavirus sars-cov- . blocking the nod downstream signaling kinase ripk with gsk (which does not affect its catalytic activity) significantly inhibited replication of these viral pathogens. gefitinib is an fda-approved drug for treatment of lung, breast and other cancers. it works by reducing the activity of the epidermal growth factor receptor (egfr) tyrosine kinase domains. of note, this drug also inhibits the tyrosine kinase activity of ripk ( ) and has been shown to inhibit replication of denv and release of pro-inflammatory cytokines from infected human primary monocytes ( ). the authors suggested a role for egfr/ripk in denv pathogenesis and that gefitinib may be beneficial in the treatment of dengue patients.
similarly, the work here, which demonstrated the antiviral activity of nod and ripk inhibitors using tissue explants, primary cells and cell lines, supports the potential clinical use of these compounds in mono- or co-infections by arboviruses as well as coronavirus infections at early and/or advanced stages. as gsk and gsk were developed primarily for immune-mediated inflammatory conditions, their anti-inflammatory effects may have the added benefit of reducing the hyperinflammatory state associated with flavi-, alpha- and coronavirus diseases ( , , , ). finally, our findings raise potential concerns regarding adjuvants in viral vaccines that augment nod as an immune strategy ( , ), since this immune signaling protein is not a restriction factor but rather an enhancement factor for multiple pathogenic rna viruses. the current study illustrates how the identification of a drug target through transcriptomic analyses of virus-infected cells can lead to novel broad-acting host-directed antiviral strategies with a high barrier to resistance. increased nod expression may be a novel mechanism that viruses use to evade the innate immune response. conversely, drugs that block nodosome formation appear to have broad-spectrum antiviral activity by enhancing the interferon response. collectively, our results warrant consideration of these and related compounds as broad-spectrum antiviral drug candidates for further preclinical development.
[...] at hours post-transfection, total rna was extracted and transcript levels for ifn-stimulated genes (isgs) were quantified by qrt-pcr.
human recombinant ifn-α assay. hfas in -well plates (greiner) were treated with or without - u/ml of human recombinant ifn-α (sigma-aldrich) for - hours, after which total rna was isolated and subjected to qrt-pcr in order to measure expression of isgs.
we declare no competing financial interests.
[...] the next day, cells were rinsed once with pbs and viruses at an moi of .
to were added to the wells. cells were then incubated for (sars-cov- ), (arboviruses) or (arboviral coinfections) hours at c using fresh media supplemented with % fetal bovine serum. next, the inoculum was removed, and the cells were washed twice with pbs. complete culture medium was added to each well, and cells were incubated at c and % co . mock-infected cells were incubated with the culture supernatant from uninfected cells. fetal brain explant cultures and infections. human brain tissue from two - -week aborted fetuses was obtained (after written consent) under protocol of the university of alberta human research ethics board. after delivery to the laboratory in ice-cold pbs, the fresh tissue was placed in a mm petri dish and then dissected with sterile scalpel and forceps into approximately mm x cm x mm blocks ( ) . the small tissue blocks were immediately immersed in -well plates containing the same media used to culture hfas (see above). brain explant tissue was infected overnight with pfu/ml of prvabc- zikv strain. the next day, explants were washed once with pbs and then fresh media was added. the explant cultures were maintained in a humidified c incubator containing % co for up to days under different experimental treatments. tissue and cellular rna purification, cdna synthesis, and qrt-pcr. total rna was extracted from cultured cells or fetal brain tissue using nucleospin rna (macherey-nagel gmbh & co) kits. samples were then treated with rnase-free dnase (macherey-nagel gmbh & co) before a portion ( . - µg total rna) was subjected to reverse transcription using improm-ii reverse transcriptase (promega). cellular transcripts and viral rna were quantitated by qrt- pcr using perfecta sybr green supermix (quanta biosciences) in a cfx touch real-time pcr detection system instrument (bio-rad) under the following cycling conditions: cycles of c for s, c for s, and c for s. 
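relative expression from qrt-pcr data of this kind is commonly computed with the 2^(-ΔΔct) method. the sketch below is illustrative only: the function name `fold_change` and the ct values are hypothetical, and it assumes roughly 100% amplification efficiency (a doubling per cycle) for both the target gene and the reference gene.

```python
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    Each Ct is first normalised to the reference gene within its own sample
    (dCt); the treated dCt is then compared with the control dCt (ddCt).
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the treated sample, while the reference gene is unchanged.
fc = fold_change(ct_target_treated=24.0, ct_ref_treated=18.0,
                 ct_target_control=26.0, ct_ref_control=18.0)
print(fc)  # 4.0
```

a two-cycle shift corresponds to a four-fold change only under the doubling-per-cycle assumption; with measured primer efficiencies the base 2 would be replaced accordingly.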
gene expression (fold change) was calculated using the 2^(-ΔΔct) method with human β-actin mrna transcript as the internal control. the following forward and reverse primer pairs were used for pcr: [...]
[...] shown. values are expressed as the mean of three independent experiments. error bars represent standard errors of the mean. *p < . , **p < . , and ***p < . , by the student's t test.
calu- and huh cells infected with sars-cov- (moi= . ) were also treated with gsk for hours before collecting the cell supernatants for viral titer determination. as a positive control of the flaviviral inhibition in infected a cells (moi= . - ), we used the anti-flavivirus nucleoside analog [...] vero e cells (moi= . ) was used as positive control at . - µm or dmso as vehicle [...]
cells were washed three times in pbs and then permeabilized/blocked in blocking buffer with . % triton-x and % bsa in pbs for hour at room temperature followed by washing with pbs containing . % bsa. incubations with primary antibodies diluted : (mouse anti-flavivirus group antigen g , millipore), : (mouse monoclonal anti-chikungunya capsid kindly donated by dr [...] blocking buffer were carried out at room temperature for . hour followed by three washes in washing buffer ( . % triton-x with . % bsa and pbs). samples were then incubated with secondary antibodies ( : ) in blocking buffer containing [...] secondary antibodies (invitrogen) were alexa fluor anti-mouse. confocal images of cells on coverslips were acquired using an olympus x spinning disk confocal microscope (tokyo, japan) and images were analyzed using volocity . . software.
a set of confocal imaging was ( x) of ace -sk-n-sh infected with sars-cov- (moi= ) and treated with gsk or dmso for hours. after fixation, coronavirus-infected cells were stained using mouse monoclonal antibody to sars spike protein as primary antibody and secondary antibodies were [...] confocal images were acquired using a spinning disk confocal microscope with volocity . . software.
(d) cells were counted in different fields for quantitation of infected cells after gsk or dmso treatment. values are expressed as the mean of three independent experiments. error bars represent standard errors of the mean.
gsk suppresses sars-cov- release at subtoxic concentrations. calu- (a) and huh (b) cells were infected with sars-cov- at the moi of . and treated with gsk or dmso as control for hours, after which cell supernatants were collected for plaque assay. viral titers are shown as relative fold. cellular atp measurements using the celltiter-glo assay kit are shown at the -hour time point after gsk or dmso treatment in uninfected ace -sk-n-sh (c), calu- (d), and huh (e) cells. values are expressed as the mean of three independent experiments. error bars represent standard errors of the mean.
key: cord- -jd fvaop authors: bosco-lauth, angela m.; hartwig, airn e.; porter, stephanie m.; gordy, paul w.; nehring, mary; byas, alex d.; vandewoude, sue; ragan, izabela k.; maison, rachel m.; bowen, richard a. title: pathogenesis, transmission and response to re-exposure of sars-cov- in domestic cats date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jd fvaop the pandemic caused by sars-cov- has reached nearly every country in the world with extraordinary person-to-person transmission. the most likely original source of the virus was spillover from an animal reservoir and subsequent adaptation to humans sometime during the winter of in wuhan, china. because of its genetic similarity to sars-cov- , it is likely that this novel virus has a similar host range and receptor specificity. due to concern for human-pet transmission, we investigated the susceptibility of domestic cats and dogs to infection and potential for infected cats to transmit to naïve cats. we report that cats are highly susceptible to subclinical infection, with a prolonged period of oral and nasal viral shedding that is not accompanied by clinical signs, and are capable of direct contact transmission to other cats. these studies confirm that cats are susceptible to productive sars-cov- infection, but are unlikely to develop clinical disease. further, we document that cats develop a robust neutralizing antibody response that prevented re-infection upon a second viral challenge. conversely, we found that dogs do not shed virus following infection, but do mount an anti-viral neutralizing antibody response.
there is currently no evidence that cats or dogs play a significant role in human exposure; however, reverse zoonosis is possible if infected owners expose their domestic pets during acute infection. resistance to re-exposure holds promise that a vaccine strategy may protect cats, and by extension humans, against disease.
the first case of reverse zoonosis, or transmission from human to animal, was reported from hong kong, where a covid patient's dog tested pcr positive for sars multiple times (sit et al. ). in the following weeks, other reports of domestic pets becoming infected following [...]
cat cohort (n= ). two of the four cats were lightly anesthetized and challenged with sars as for cohort . forty-eight hours post-infection, two naïve cats were introduced into the room with the infected cats and sampled on the same schedule as before. the two directly challenged cats were euthanized on dpi and the following tissues collected for virus isolation and histopathology: nasal turbinates, trachea, esophagus, mediastinal lymph node, lung, liver, spleen, kidney, small intestine, uterus, and olfactory bulb. tissues were collected into ba- , frozen at - c and homogenized prior to plaque assay. additional tissues collected for histopathology included heart, colon, pancreas, hemi-lung lobe, and mesenteric lymph nodes. thoracic radiographs were also obtained for these two cats pre-challenge and just prior to euthanasia. the remaining two cats were euthanized at dpi and necropsied; these cats will be referred to as contact cats.
[...] step qrt-pcr system (invitrogen), with the following modification: the initial reverse [...]
tissues from cats were fixed in % buffered formalin for days and transferred to % [...]
[...] with their pets, there is minimal risk of a potentially exposed cat infecting another human.
infected pet cats should not be allowed outdoors to prevent potential risk of spreading infection pneumonia of unknown aetiology in wuhan, china: potential for international spread via commercial air travel a pneumonia outbreak associated with a new coronavirus of probable bat origin of novel coronavirus-infected pneumonia receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars infection of dogs with sars-cov- coronavirus: belgian woman infected her cat the brussels times. serological survey of sars-cov- for experimental, domestic, companion and wild animals excludes intermediate hosts of different species of animals absence of sars-cov- infection in cats and dogs in close contact with a cluster of covid- patients in a veterinary campus susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus transmission of sars-cov- in domestic cats sars-cov- neutralizing serum antibodies in cats: a serological investigation host and viral traits predict zoonotic spillover from mammals wildlife and emerging zoonotic diseases: the biology circumstances and consequences of cross-species transmission of current topics in microbiology and immunology a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster sars virus infection of cats and ferrets cov- and covid- in older adults: what we may expect regarding pathogenesis, immune responses, and outcomes the incubation period of coronavirus disease publicly reported confirmed cases: estimation and application. 
annals of internal medicine asymptomatic infection and atypical manifestations of covid- : comparison of viral shedding duration infection and rapid transmission of sars-cov- in ferrets respiratory disease in rhesus macaques inoculated with sars-cov- covid- ) in golden syrian hamster model: implications for disease pathogenesis and transmissibility molecular biology serological assays for severe acute respiratory syndrome coronavirus (sars-cov- ) pathogen exposure varies widely among sympatric populations of wild and domestic felids across the united states prior puma lentivirus infection modifies early immune responses and attenuates feline immunodeficiency virus infection in cats detection of novel coronavirus ( -ncov) by real-time rt-pcr key: cord- -kl wh zg authors: xu, h; zhang, xy; wang, ww; wang, js title: hydroxychloroquine increased psychiatric-like behaviors and disrupted the expression of related genes in the mouse brain date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kl wh zg hydroxychloroquine (hcq), which has been proposed as a therapeutic or prophylactic drug for sars-cov- , has been administered to thousands of individuals with varying efficacy; however, our understanding of its adverse effects is insufficient. it was reported that hcq induced psychiatric symptoms in a few patients with autoimmune diseases, but it is still uncertain whether hcq poses a risk to mental health. therefore, in this study, we treated healthy mice with two different doses of hcq that are comparable to clinically administered doses for days. psychiatric-like behaviors and the expression of related molecules in the brain were evaluated at two time points, i.e., h and days after drug administration. we found that hcq increased anxiety behavior at both h and days and enhanced depressive behavior at h. 
furthermore, hcq decreased the mrna expression of interleukin- beta and corticotropin-releasing hormone (crh) in the hippocampus and decreased the mrna expression of brain-derived neurotrophic factor (bdnf) in both the hippocampus and amygdala. most of these behavioral and molecular changes were sustained beyond days after drug administration, and some of them were dose-dependent. although this animal study does not prove that hcq has a similar effect in humans, it indicates that hcq poses a significant risk to mental health and suggests that further clinical investigation is essential. according to our data, we recommend that hcq be carefully used as a prophylactic drug in people who are susceptible to mental disorders. at present, tens of millions of people worldwide have been infected with coronavirus disease (sars-cov- ). among the many potential therapeutic drugs for sars-cov- , society has perhaps paid the most attention to hydroxychloroquine (hcq). although emerging evidence does not suggest that it is efficacious in treating sars-cov- ( - ), it has already been administered to thousands of individuals. furthermore, more than hundred clinical trials of hcq involving tens of thousands of potential participants have been registered. hcq is not only being used to treat patients with sars-cov- but is also being used as a pre-exposure or postexposure prophylactic drug for sars-cov- in healthy populations who are at high risk of exposure to sars-cov- ( , ) . although hcq has been used clinically for decades to treat malaria and autoimmune diseases, such as systemic lupus erythematosus (sle), sjögren's syndrome (ss) and rheumatoid arthritis (ra) ( ), our understanding of its potential adverse effects, especially in patients with sars-cov- and uninfected individuals, is insufficient. 
there have been reports of serious adverse effects related to hcq, including retinopathy, qt interval prolongation, hypoglycemia, cardiomyopathy and neuropsychiatric effects ( , ); qt interval prolongation is the most closely monitored adverse effect in patients with sars-cov- . less attention has been paid to the effects of hcq on mental health, although multiple studies have reported severe psychiatric symptoms, including agitation, hallucinations, anxiety, depression, and suicidal ideation, in patients treated with hcq or chloroquine ( ) ( ) ( ) ( ) . however, whether these neuropsychiatric symptoms are caused by hcq alone or by the interaction between hcq and the disease is not well studied, possibly due to the limited number of reported cases. therefore, whether hcq poses a serious risk to mental health is still not clear. in addition to acting as an antimalaria drug, hcq is an immunomodulator that exerts broad anti-inflammatory effects, including inhibition of antigen presentation to dendritic cells, reduced signaling of both b and t cells and reduced cytokine production in macrophages ( , ). microglia, as macrophages that reside in the brain and play critical roles in neuronal functions, are also potential targets of hcq ( ). although the ability of hcq to cross the blood-brain barrier (bbb) is quite limited ( , ), hcq has a very long half-life in vivo ( ), so its direct effect on microglia should not be underestimated. in addition, hcq may affect the cns via its inhibitory effect on peripheral cytokines, such as il- β or tnf-α, which can bind to the vascular endothelium of the brain or directly enter the cns ( ), resulting in a reduction in the production of corresponding cerebral cytokines and induction of a neuroinflammatory deficit in the brain. 
although hypo-neuroinflammation is not as concerning as hyperneuroinflammation, and the latter has been indicated as a potential pathology of multiple psychiatric disorders ( ), emerging evidence suggests that maintaining the balance of the immune system is more important than preventing inflammation in the cns ( ) and that low levels of inflammation in the brain can also have negative effects on cognition and behaviors ( , ). therefore, the immunosuppressive effect of hcq may have negative effects on the cns, especially in healthy individuals who do not suffer from hyperinflammation induced by infection. accordingly, in this study, the potential effects of hcq on mental health were investigated in healthy mice. the animals were administered hcq for days, and psychiatric-like behaviors related to anxiety, depression, and cognitive impairment were evaluated. furthermore, the expression of genes that are closely related to mental health, including brain-derived neurotrophic factor (bdnf), corticotropin-releasing hormone (crh) and interleukin- beta (il- β), was also measured in two brain areas, namely, the hippocampus and amygdala, which play critical roles in mental health and in which cytokines are notably produced ( ). in this study, we calculated appropriate doses of hcq for mice based on clinically administered doses of hcq according to the average body surface area of humans and mice. the clinical dosages used for the calculations were and mg on the first day followed by and mg daily for the next days. we evaluated behavior and gene expression at two time points, i.e., h and days after drug administration, to evaluate the immediate and lasting effects of the drug. the male c bl/ j mice used in this study were purchased from vital river laboratory animal center (beijing, china) at the age of weeks. after a week of adaptation, the mice were randomly divided into two groups: the immediate group and the lasting group. 
the mice in each group were further randomly divided into three subgroups, i.e., the high-dose, low-dose and vehicle groups; there were ~ animals per subgroup. all animals were maintained under controlled environmental conditions ( h light/ h dark cycle, lights on at am; ambient temperature - °c; humidity ± %) with free access to food and water. all procedures were approved by the institutional review board of the institute of psychology, chinese academy of sciences and performed in compliance with the national institutes of health guide for the care and use of laboratory animals. hydroxychloroquine sulfate (ark pharm, chicago, usa) was dissolved in sterile phosphate-buffered saline (pbs), ph= . . the drug was intraperitoneally (i.p.) injected twice per day (every hours) for consecutive days. on the first day, the high dose of hcq was mg/kg/injection, and the low dose was mg/kg/injection; hcq was given at half of these doses for the next days. the body weight of each mouse was recorded every two days during administration and every five days after administration. to test the immediate/lasting effects of hcq on behavior and the expression of molecules in the brain, behavioral tests were performed h and days after the last injection. the interval between each test was h. for each test, the apparatus was cleaned with % ethanol to eliminate the influence of excrement or odor on subsequent animals. the tests were performed in the following order. the nor test was used to evaluate the locomotor activity and working memory of the mice. the test consisted of three phases, including the adaptation phase (phase , min), the training phase (phase , min) and the test phase (phase , min). in phase , the mice were placed in the apparatus (l*w*h: cm* cm* cm, with dim light), allowed to explore freely for min and then returned to their home cages. 
during the exploration period in phase , the trajectory of each mouse was captured with a camera and automatically tracked and recorded with software (anilab, suzhou, china). the distance traveled every min throughout the -min period was recorded as a measure of the locomotor activity of the mice. in phase , two identical objects (lego bricks, cm* cm* cm) were placed in opposite corners of the apparatus cm from either side of the wall. the mice were introduced from the corner not containing an object, allowed to explore for min and then returned to their home cages for h. in phase , one of the two objects was replaced with a novel object of similar size but of a different color and shape. the mice were then placed in the apparatus again and allowed to explore for an additional min. the exploration of each mouse was recorded with an overhead camera and then hand-scored by a pretrained independent investigator blinded to the experimental groups. the total sniff time and the discrimination index ((sniff time for novel object - sniff time for familiar object)/(total sniff time)) ( ) were the main parameters measured in this test. discrimination index data for a few animals that showed a lack of exploratory activity (< s for one object) were excluded. the day after the locomotor and novel object recognition tests were performed, the mice were subjected to the epm test in a quiet dim room to evaluate anxiety-like behaviors. the mice were placed in the center of an elevated maze facing the open arms at the beginning of the test. during the test, the mice were allowed to freely explore the maze for min, and the movement trajectory and time spent in each area were automatically recorded by software. the duration of time spent in the open arms and in the closed arms was reported. the fst was performed h after the epm test to measure depression-like behavior. 
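the discrimination index defined for the novel object recognition test above can be sketched in a few lines of python. this is only an illustration of the stated formula, not the authors' scoring software; the function name, the argument names and the example sniff times are hypothetical, and the exclusion threshold for low exploration is left as a parameter because its value is not given in the text.

```python
def discrimination_index(novel_s, familiar_s, min_sniff_s=0.0):
    """Return (novel - familiar) / (novel + familiar), or None if excluded.

    novel_s, familiar_s: hand-scored sniff times in seconds (hypothetical inputs).
    min_sniff_s: hypothetical exclusion threshold; an animal that sniffed
    either object for less than this is treated as non-exploring.
    """
    if novel_s < 0 or familiar_s < 0:
        raise ValueError("sniff times must be non-negative")
    if novel_s < min_sniff_s or familiar_s < min_sniff_s:
        return None  # excluded for lack of exploratory activity
    total = novel_s + familiar_s
    if total == 0:
        raise ValueError("total sniff time is zero; index is undefined")
    return (novel_s - familiar_s) / total


# Illustrative values only: 30 s on the novel object, 10 s on the familiar one.
example_index = discrimination_index(30.0, 10.0)
```

an index near zero means the animal explored both objects equally, and positive values mean a preference for the novel object.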
the mice were forced to swim individually for min in a glass cylinder with a diameter of cm filled cm deep with water at a temperature of ± °c. swimming behavior was recorded using a horizontal video camera, and immobility time (during which the mice exhibited no movement except for that of the whiskers) was hand-scored by two independent observers blinded to the experimental groups. the immobility time of each mouse was calculated as the average of the times measured by the two observers. the mice were sacrificed h after all the behavioral tests were completed. the hippocampus and amygdala were quickly extracted from each brain by using a mouse brain slicer. the tissues were immediately frozen on dry ice and then stored at - °c. rna was extracted from the tissues by using the rnaprep pure tissue kit (tiangen, beijing, china) according to the standard protocol. the rna concentration was quantified using the nanodrop system (thermo fisher scientific, ma, usa), and the quality of the rna was evaluated by gel electrophoresis. one microgram of total rna from each sample was reverse transcribed into first-strand cdna by using the m-mlv kit (promega corporation, madison, wi, usa) following the manufacturer's protocol. mrna expression levels were determined by real-time pcr using sybr™ green pcr master mix (life technologies, california, usa) on an abi system, and dissociation curve analysis was performed at the end of each run. each sample was individually amplified and duplicated, and then the average value was used for subsequent analyses. relative mrna expression was calculated by the standard −ΔΔct method using glyceraldehyde -phosphate dehydrogenase (gapdh) as a housekeeping gene. 
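the relative quantification step described above can be illustrated with a minimal python sketch, assuming the commonly used 2^(-ΔΔct) form of the method with gapdh as the reference gene and the pbs group as the calibrator. the function name and all ct values below are hypothetical and are not data from this study; duplicate wells are averaged first, as described in the text.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_target / ct_ref: mean Ct of the target and reference gene in the
    treated sample; *_ctrl: the same quantities in the control sample.
    """
    delta_ct_sample = ct_target - ct_ref              # normalise to reference gene
    delta_ct_control = ct_target_ctrl - ct_ref_ctrl   # same for the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical duplicate Ct readings (two wells per sample, averaged).
bdnf_hcq = mean([26.0, 26.5])    # target gene, treated animal
gapdh_hcq = mean([18.0, 18.5])   # reference gene, treated animal
bdnf_pbs = mean([25.0, 25.5])    # target gene, control animal
gapdh_pbs = mean([18.0, 18.5])   # reference gene, control animal

fold_change = relative_expression(bdnf_hcq, gapdh_hcq, bdnf_pbs, gapdh_pbs)
```

with these made-up readings the treated sample has a ΔΔct of 1 cycle, which corresponds to half the control expression level.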
the following forward (f) and reverse (r) primers (bgi tech, beijing, china) were used: il- β: forward: tgtctgaagcagctatggcaac; reverse: ctgcctgaagctcttgttgatg; bdnf: forward: tggctgacacttttgagcac; reverse: aagtgtacaagtccgcgtcc; crh: forward: gatctcaccttccaccttctg; reverse: cgcaacatttcatttcccga; gapdh: forward: caagctcatttcctggtatgac; reverse: ctgggatggaaattgtgagg. the data are presented as the mean ± sem and were analyzed by graphpad . . comparisons between the three groups were performed using one-way or two-way anova, and lsd post hoc multiple comparisons were carried out. statistical significance for all analyses was set at p≤ . . as shown in figure a , two-way anova indicated that there was a significant effect of time (f ( . , . ) = . , p< . ) and a significant time × drug interaction effect (f ( , ) = . , p= . ) but no significant effect of drug treatment (f ( , ) = . , p= . ). post hoc tests indicated that, h after injection, the rate of increase in body weight was significantly lower in the low-dose group (p= . ) and high-dose group (p= . ) than in the pbs group, but the difference between the low-dose group and high-dose group was not significant (p= . ). the differences in the rate of weight gain over subsequent days were not significant. ± . , p= . , respectively) anxiety-like behavior seemed to become more obvious over time. for the novel object recognition test ( figure d, e) , one-way anova indicated that there was no significant effect of hcq treatment on the discrimination index at days after the final injection (f ( , ) = . , p= . ). however, there was a significant difference in total sniff time (f ( , ) = . , p= . ). post hoc analysis revealed that mice in the high-dose group spent less time sniffing ( . ± . vs. . ± . , p= . ) than mice in the pbs group. 
for the fst (figure f), one-way anova indicated that there was no significant effect of hcq treatment on immobility time at days after the final injection (f ( , ) = . , p= . ). as shown in figure , hcq rapidly altered the mrna expression levels of bdnf, il- β, and crh in the hippocampus and amygdala. one-way anova indicated that there was a significant effect of drug treatment on the expression of il- β in the hippocampus (f ( , ) = . , p= . ) but not in the amygdala (f ( , ) = . , p= . ). post hoc analysis revealed that the mrna expression of il -β in the high-dose group was significantly lower than that in the low-dose group (p= . ) and pbs group (p= . ). there was a significant effect of drug treatment on the expression of crh in the hippocampus (f ( , ) = . , p= . ) and amygdala (f ( , ) = . , p= . ). post hoc analysis revealed that the mrna expression of crh in the high-dose group was significantly lower than that in the low-dose group in the hippocampus and amygdala (p= . , p= . , respectively) and that the mrna expression of crh in the high-dose group was significantly lower than that in the pbs group (p= . ) in the hippocampus. there was a significant effect of drug treatment on the expression of bdnf in the hippocampus (f ( , ) = . , p= . ) and amygdala (f ( , ) = . , p= . ). post hoc analysis revealed that in the hippocampus, the mrna expression level of bdnf was significantly lower in the low-dose (p= . ) and high-dose (p= . ) groups than in the pbs group. in the amygdala, the mrna expression level of bdnf was significantly lower in the high-dose (p= . ) group than in the pbs group ( . ). as shown in figure , hcq persistently altered the mrna expression levels of bdnf, il- β, and crh in the hippocampus and amygdala. one-way anova indicated that there was a significant effect of drug treatment on the expression of il- β in the hippocampus (f ( , ) = . , p< . ) but not in the amygdala (f ( , ) = . , p= . ). 
post hoc analysis revealed that the mrna expression of il- β in the high-dose group and low-dose group was significantly lower than that in the pbs group (p< . for both). there was a significant effect of drug treatment on the expression of crh in the hippocampus (f ( , ) = . , p= . ) but not in the amygdala (f ( , ) = . , p= . ). post hoc analysis revealed that the mrna expression of crh in the high-dose group and low-dose group was significantly lower than that in the pbs group (p= . and p= . , respectively). there was a borderline significant effect of drug treatment on the expression of bdnf in the hippocampus (f ( , ) = . , p= . ) and a significant effect of drug treatment on bdnf expression in the amygdala (f ( , ) = . , p= . ). post hoc analysis revealed that in the hippocampus, the mrna expression level of bdnf was lower in the low-dose (p= . ) and high-dose (p= . ) groups than in the pbs group. in the amygdala, the mrna expression level of bdnf was significantly lower in the high-dose (p= . ) group than in the pbs group. the present study found that short-term repeated administration of hcq led to a significant and persistent increase in anxiety-like behaviors in healthy mice and had an effect on depression-like behavior in the short term. hcq also caused a sustained decrease in the mrna expression of il- beta, crh and bdnf in the hippocampus; however, in the amygdala, hcq temporarily affected the expression of crh and induced a lasting significant decrease in the expression of bdnf. some of these behaviors and molecular changes, including the changes in anxiety-like behavior, the expression of crh in the hippocampus and the expression of bdnf in the amygdala, were dose-dependent, as the changes in the high-dose group were greater than those in the low-dose group. 
to the best of our knowledge, this study is the first to test the psychiatric effect of hcq in an animal model, and our results suggest that, in healthy mice, hcq administration can induce persistent behavioral changes and disrupt gene expression in the brain. the hcq treatment regimen used in this study mimics the current short-term regimen used for the treatment of sars-cov- . according to current clinical studies, most patients take hcq for to days at a dose of mg to mg on the first day followed by half of the original dose on the remaining days; in most studies, doses from mg to mg are administered on the first day. in this study, mice were given hcq at one of two doses for days. hcq was administered at a dose of mg/kg or mg/kg on the first day and at a dose of mg/kg or mg/kg once daily for the next days; these doses corresponded to mg or mg on the first day followed by mg or mg per day on the subsequent days in humans. the drug dose for mice was calculated based on the average body surface area ratio of mice to humans (mouse/human= . ) ( ). it should be noted that for this calculation, the dose per kilogram for humans was calculated based on the global average body weight for adults, which is kg; however, the body weight of many people who take the drug may be higher. therefore, the calculated dose administered to the animals may be higher than the corresponding dose administered to humans. because the purpose of this study was to determine the potential risk of this drug, we thought it was important to choose a slightly higher dose and longer period of administration. in this study, hcq was administered by intraperitoneal (i.p.) injection because approximately %~ % of hcq can be absorbed upon oral administration ( ); thus, there would not be a large difference in drug absorption when the drug is administered orally and by i.p. injection . however, the speed of drug entry into tissues may be faster following i.p. injection than following oral administration. 
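the body-surface-area conversion described above can be sketched as follows. this is a hedged illustration rather than the authors' calculation: the km factors (37 for an adult human, 3 for a mouse) are the commonly tabulated values for this kind of conversion and are an assumption here, and the 600 mg dose and 60 kg body weight are placeholders, since the actual figures are not legible in the text.

```python
KM_HUMAN = 37.0  # assumed body-surface-area factor for an adult human
KM_MOUSE = 3.0   # assumed body-surface-area factor for a mouse


def human_to_mouse_dose(human_dose_mg, human_weight_kg,
                        km_human=KM_HUMAN, km_mouse=KM_MOUSE):
    """Convert a human dose (mg) to a mouse-equivalent dose (mg/kg)."""
    if human_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    human_mg_per_kg = human_dose_mg / human_weight_kg
    # Scale by the ratio of Km factors (human / mouse).
    return human_mg_per_kg * (km_human / km_mouse)


# Placeholder inputs: 600 mg per day in a 60 kg adult.
mouse_daily_mg_per_kg = human_to_mouse_dose(600.0, 60.0)

# The protocol splits the daily amount into two i.p. injections.
mouse_mg_per_kg_per_injection = mouse_daily_mg_per_kg / 2
```

because the per-kilogram human dose falls as the assumed body weight rises, using a lower average body weight yields a higher animal dose, which is the caveat noted in the text.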
a previous study also indicated that the tolerated dose of i.p. injected hcq is lower than that of orally administered hcq ( ); thus, we chose to administer the drug in two injections per day (one injection every h). the body weight data showed that drug tolerance was basically good; except for a dose-dependent decrease in body weight on the first day, the rate of increase in body weight was not significantly different between the hcq groups and the vehicle group. additionally, evaluation of the locomotor activity of the animals did not show a significant difference between groups except for a hcq-induced decrease in the locomotor activity of mice in the lasting group during the first min of the test, which may have been a consequence of the anxiety-induced decrease in exploration in these animals. two time points were examined in this study, i.e., h and days after administration, and animal tissues were extracted h after the completion of all the behavioral tests; thus tissues were collected and days after administration. previous studies have shown that hcq persists for a long time in vivo, with a half-life of - days in humans ( ) ; however in rats, % of hcq is eliminated by the th day, and % is cleared after days ( ) . the metabolic rate of mice is approximately twice that of rats and times that of humans ( ) , indicating that the direct effect of hcq in the lasting group ( days after administration) was likely quite small. therefore, these two time points can effectively distinguish the immediate and lasting effects of hcq. in the current study, anxiety-like behavior was measured by using the epm test, depression-like behavior was measured by using the fst, and working memory was measured by using the nor test. the results showed that hcq had a significant and sustained effect on anxiety, as indicated by decreased exploration time in the open arms. 
depression-like behavior was also affected in the immediate group, as indicated by an immobility time that was significantly longer in the high-dose group than in the low-dose group, although not significantly longer than in the vehicle group; these results suggest that hcq has an effect on depression-like behavior. this increase in depression-like behavior is consistent with the changes in the expression of crh in the amygdala. our results did not reveal a significant effect of hcq treatment on working memory, but total exploration time was decreased after hcq administration, which also suggests that the drug increased anxiety. previous human studies have reported that hcq can induce anxiety and depression symptoms ( , ), but no reports have shown that it impairs cognitive function. however, there have been only a few reports in humans, and there is a close relationship between cognitive impairment and psychiatric disorders ( ). furthermore, while we only measured working memory, the proteins in the brain that were affected by hcq, such as bdnf, have been indicated to be closely related to cognitive functions ( ); therefore, our results cannot rule out the possibility that hcq affects cognitive functions. the mrna expression of three genes, namely, il- β, crh and bdnf, was measured in this study; these three genes are related to neuroinflammation, the hpa axis and neuroplasticity, respectively, and all of them are closely associated with the pathogenic mechanisms of mental illness ( , , ). our results showed that hcq has a significant effect on all three molecules. at both time points, hcq significantly decreased the expression of il- β in the hippocampus but had little effect on il- β in the amygdala, possibly because the hippocampus is more susceptible to changes in peripheral cytokine levels, as it is closer to the choroid plexus than the amygdala ( ). moreover, hcq significantly decreased the expression of crh in the hippocampus at both time points in a dose-dependent manner. 
furthermore, hcq had little effect on the expression of crh in the amygdala; a significant effect was only observed in the immediate group, but the difference was mainly between the high-dose group and the low-dose group. in addition, hcq significantly decreased the expression of bdnf in both brain regions at both time points. it has been reported that these genes interact with one another; for example, it was reported that il- β can regulate the expression of crh and bdnf ( , ) . however, the widespread influence of hcq on these genes is still beyond our initial expectation. considering the importance and extensive influence of these genes on the cns, the influence of hcq on the cns may be quite serious. additionally, hcq has a much longer half-life in humans than in rodents, and our results suggest that hcq has significant effects on anxiety and gene expression more than days after administration. moreover, based on the expression pattern of il- β, the effect of hcq on il- β increases after drug withdrawal. this implies that the influence of hcq on human emotions and cns functions may last for months. previous case reports have also indicated that stopping drug administration cannot resolve its psychiatric effect quickly ( ) . it should be noted that, as a preclinical study, the present study cannot prove that the clinical use of hcq induces anxiety or depression in humans. the results of this study also do not clarify the mechanisms underlying the effect of hcq on behavior or brain function. based on these preliminary results, we speculate that the effect of hcq on the cns is mainly due to its peripheral immunosuppressive effects, which induce hypo-neuroinflammation in the brain, impairing the functions of immune cells in the brain, such as microglia. 
if our hypothesis is true, healthy individuals and patients with mild disease who do not suffer from infection-induced hyperinflammation would be at a higher risk of psychiatric symptoms than patients with severe sars-cov- . however, much more work is needed to test this hypothesis. our results indicate that hcq poses a serious risk to mental health. given the current pandemic, we hope that based on the results reported in this study, scientists and clinical doctors will pay more attention to this potential adverse effect of hcq and start to monitor the mental health status of patients in clinical trials. furthermore, we suggest that individuals who are susceptible to psychiatric disorders should not take hcq, especially as a prophylactic drug for sars-cov- .
figure : effects of hcq on body weight and locomotor activity. (a) the rate of increase in body weight of mice in the lasting group. (b) the locomotor activity of mice in the immediate group. (c) the locomotor activity of mice in the lasting group (n= - for each subgroup; two-way anova, *p< . compared with the pbs group).
il- β, (e) crh, and (f) bdnf in the amygdala (n= - for each subgroup; one-way anova; *p< . , ** p< . , ***
hydroxychloroquine with or without azithromycin in mild-to-moderate covid- hydroxychloroquine use against sars-cov- high doses of hydroxychloroquine do not affect viral clearance in patients with sars-cov- infection hydroxychloroquine in the covid- pandemic era: in pursuit of a rational use for prophylaxis of sars-cov- infection emerging prophylaxis strategies against covid- . monaldi arch chest dis hydroxychloroquine in rheumatic autoimmune disorders and beyond rethinking the role of hydroxychloroquine in the treatment of covid- safety considerations with chloroquine, hydroxychloroquine and azithromycin in the management of sars-cov- infection chloroquine psychosis: a chemical psychosis? 
psychomotor agitation following treatment with hydroxychloroquine anxiety, depression and suicidal ideation in patients with rheumatoid arthritis in use of methotrexate, hydroxychloroquine, leflunomide and biological drugs lethality and behavioral side effects of chloroquine pharmacokinetics and pharmacological properties of chloroquine and hydroxychloroquine in the context of covid- infection effect of hydroxychloroquine on progression of dementia in early alzheimer's disease: an -month randomised, double-blind, placebo-controlled study cns penetration of potential anti-covid- drugs animal toxicity and pharmacokinetics of hydroxychloroquine sulfate the role of the innate immune system in psychiatric disorders inflammation in psychiatric disorders: what comes first? alzheimer's disease and cytokine il- gene polymorphisms: is there an association? deficiencies of microglia and tnfalpha in the mpfcmediated cognitive inflexibility induced by social stress during adolescence changing face of microglia the novel object recognition memory: neurobiology, test procedure, and its modifications a simple practice guide for dose conversion between animals and human cognitive dysfunction in psychiatric disorders: characteristics, causes and the quest for improved therapy bdnf and synaptic plasticity, cognitive function, and dysfunction behavioral studies and genetic alterations in corticotropin-releasing hormone (crh) neurocircuitry: insights into human psychiatric disorders bdnf-trkb receptor regulation of distributed adult neural plasticity, memory formation, and psychiatric disorders choroid plexus blood-csf barrier: major player in brain disease modeling and neuromedicine il- beta immunoreactive neurons in the human hypothalamus: reduced numbers in multiple sclerosis immune dysregulation and cognitive vulnerability in the aging brain: interactions of microglia, il- beta, bdnf and synaptic plasticity prolonged neuropsychiatric effects following management of chloroquine 
intoxication with psychotropic polypharmacy key: cord- - mo u authors: miersch, shane; li, zhijie; saberianfar, reza; ustav, mart; blazer, levi; chen, chao; ye, wei; pavlenco, alia; subramania, suryasree; singh, serena; ploder, lynda; ganaie, safder; leung, daisy; chen, rita e.; case, james brett; novelli, guiseppe; matusali, giulia; colavita, francesca; copabianchi, maria r.; jain, suresh; gupta, j.b.; amarasinghe, gaya; diamond, michael; rini, james; sidhu, sachdev s. title: tetravalent sars-cov- neutralizing antibodies show enhanced potency and resistance to escape mutations date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mo u recombinant neutralizing antibodies (nabs) derived from recovered patients have proven to be effective therapeutics for covid- . here, we describe the use of advanced protein engineering and modular design principles to develop tetravalent synthetic nabs that mimic the multi-valency exhibited by iga molecules, which are especially effective natural inhibitors of viral disease. at the same time, these nabs display high affinity and modularity typical of igg molecules, which are the preferred format for drugs. we show that highly specific tetravalent nabs can be produced at large scale and possess stability and specificity comparable to approved antibody drugs. moreover, structural studies reveal that the best nab targets the host receptor binding site of the virus spike protein, and thus, its tetravalent version can block virus infection with a potency that exceeds that of the bivalent igg by an order of magnitude. design principles defined here can be readily applied to any antibody drug, including iggs that are showing efficacy in clinical trials. thus, our results present a general framework to develop potent antiviral therapies against covid- , and the strategy can be readily deployed in response to future pathogenic threats. 
to date, all clinically advanced candidate nabs against sars-cov- infection have been derived by cloning from b-cells of recovered covid- patients or from other natural sources , , - . here, we applied an alternative strategy using in vitro selections with phage-displayed libraries of synthetic abs built on a single human framework derived from the highly validated drug trastuzumab. this approach enabled the rapid production of high affinity nabs with properties optimized for drug development. moreover, the use of a highly stable framework enabled facile and modular design of ultra-high affinity nabs in tetravalent formats that retained favorable drug-like properties and exhibited neutralization potencies that greatly exceeded those of the bivalent igg format. these methods provide a general means to rapidly improve the potency of virtually any nab targeting sars-cov- and its relatives, and thus, our strategy can be applied to improve covid- therapies and can be adapted in response to future pathogenic threats.
results
using a phage-displayed human antigen-binding fragment (fab) library similar to the highly validated library f , we performed four rounds of selection for binding to the biotinylated rbd of sars-cov- immobilized on streptavidin-coated plates. screening of clones for binding to cov- rbd revealed fab-phage clones that bound to the rbd but not to streptavidin. fab-phage were screened by elisa and those that exhibited > % loss in binding to rbd in the presence of nm ace were sequenced, revealing unique clones (fig. a), which were deemed to be potential nabs and were converted into the full-length human igg format for purification and functional characterization. to estimate affinities, elisas were performed with serial dilutions of igg protein binding to biotinylated s protein trimer captured with immobilized streptavidin, and these assays showed that three iggs bound with ec values in the sub-nanomolar range (fig. b,c and table ). 
elisas also confirmed that each igg could partially block the binding of biotinylated ace to immobilized s protein (fig. d) . moreover, similar to the highly specific igg trastuzumab, elisas showed that the three iggs did not bind to seven immobilized proteins that are known to exhibit high non- specific binding to some iggs, and lack of binding to these proteins has been shown to be predictive of good pharmacokinetics in vivo (fig. e) , . we also used biolayer interferometry (bli) to measure binding kinetics and determine avidities more accurately, and all three antibodies exhibited sub-nanomolar dissociation constants (table , fig. s ), in close accord with the estimates determined by elisa. igg exhibited the highest avidity, which was mainly due to a two-or seven-fold faster on-rate than igg or , respectively, and thus, we focused further efforts on this ab. we took advantage of the precision design of our synthetic ab library to rapidly improve the affinity of ab . the synthetic library was designed with tailored diversification of key positions in all three heavy chain complementarity-determining regions (cdrs) and the third cdr of the light chain (cdr-l ). consequently, we reasoned that the already high affinity of ab could be further improved by recombining the heavy chain with a library of light chains with naïve diversity in cdr-l . following selection for binding to the rbd, the light chain library yielded numerous variants, of which were purified in the igg format and analyzed by bli (fig. s ) . several of the variant light chains resulted in iggs with improved binding compared with igg , and in particular, igg - (fig. b) exhibited significantly improved avidity (kd = or pm, respectively) due to an off-rate that was an order of magnitude slower (table , fig. s ) . to understand the molecular basis for antagonism of ace binding, we solved the x-ray crystal structures of the sars-cov- rbd in complex with fab or - at . or . Å resolution, respectively ( fig. 
a) . as expected, backbone superposition showed that the two complexes were essentially identical (rmsd = . Å). however, there were differences in side chain interactions due to sequence differences in the cdr-l loop, which explained the enhanced affinity of fab - compared with fab (fig. b) . although the side chains of tyr l in fab and his l in fab - both make hydrogen bonds with the side chain of tyr in the rbd, the bond mediated by his l is shorter, and thus, likely to be stronger. moreover, in fab - , the side chain of his l also makes an intramolecular hydrogen bond with the side chain of thr l , which tyr l and arg l are incapable of making in fab , and this interaction may stabilize the cdr-l loop of fab - in a conformation that is favorable for antigen recognition. thus, the crystal structures show that the two substitutions in the cdr-l loop of fab - relative to fab act in a cooperative manner to mediate favorable intermolecular contacts with the rbd, and also, intramolecular interactions that stabilize the loop in a conformation that may be better positioned to interact with the rbd. we next analyzed the structures to understand how the abs could function as antagonists of rbd binding to ace . binding of fab - to the rbd involves an extensive interface, with and Å of surface area buried on the epitope or paratope, respectively, and % or % of the structural paratope is formed by the light or heavy chain, respectively (fig. c) . comparison of the fab and ace epitopes on the rbd revealed extensive overlap, with % or % of the fab or ace epitope occluded by the other ligand (fig. c) . thus, direct steric hinderance explains the blockade of ace binding by fabs and - (fig. d) . we also used cryogenic electron microscopy to visualize fab in complex with the s protein trimer (fig. s a) . 
this analysis revealed that all three rbds in a single trimer were positioned in an "up" conformation, which was similar to the conformation bound to ace , and the three rbds were bound to three fab molecules. notably, the c-termini of the three fabs were positioned close to each other and pointed away from the s protein, suggesting that a single igg may be able to present two fabs in a manner that would enable simultaneous engagement of two rbds on a single s protein. indeed, this was confirmed in single particle negative stain electron micrographs of igg and the s protein, which revealed that the two fabs of a single igg bound two rbds on a single s protein trimer with a pincer-like grip (fig. s b) . taken together, the x-ray crystallography and electron microscopy showed that fabs and - block ace binding to rbd by direct steric hinderance, and simultaneous binding of fabs to multiple rbds on the s protein trimer enables the iggs to inhibit ace binding with enhanced potency due to avidity. next, we explored whether we could further enhance the avidity of nabs by taking advantage of modular design strategies to engineer tetravalent formats. each sars-cov- particle displays multiple s protein trimers, suggesting that multivalent fab binding could enhance avidity, especially since a single igg molecule can utilize both fab arms to bind a single s protein trimer. we reasoned that additional fab arms added to an igg may further enhance avidity by interacting with rbds on s protein trimers close to the trimer engaged by the core igg. thus, we designed tetravalent versions of and - by fusing additional fabs to either the n-or c-terminus of the igg heavy chain to construct molecules termed fab-igg or igg-fab, respectively (fig. a) . 
consistent with our hypothesis, the tetravalent molecules exhibited higher avidity, and consequently, greatly reduced off-rates compared with their bivalent counterparts, and dissociation constants were in the low single-digit picomolar range ( fig. b, table ). our ultimate aim was to produce therapeutic abs that could be used to treat covid- in patients. aside from high affinity and specificity, effective ab drugs must also possess favorable biophysical properties including high yields from recombinant expression in mammalian cells, high thermodynamic stability, and lack of aggregation and excessive hydrophobic surface area. all iggs and tetravalent molecules were produced in high yields by transient expression in expi f cells ( - mg/l, table ). all proteins were highly thermostable with melting temperatures of the ch /fab domain ranging from - °c, which exceeded the melting temperature of the trastuzumab fab ( . °c, table ). size exclusion chromatography revealed that all iggs eluted as a predominant monodisperse single peak with elution volumes nearly identical to that of trastuzumab ( fig. c and table ) , and the monomeric fraction was calculated to be to > % ( table ). to explore neutralization of potential escape mutants, we generated hiv-gag-based lentivirus-like particles (vlps) pseudotyped with the sars-cov- s protein. we confirmed ace -dependent uptake of the pseudotyped vlps by hek- cells stably over-expressing exogenous ace , and we showed that uptake was inhibited by either fc-tagged rbd (rbd-fc) or igg . within this system, we generated a panel of pseudotyped vlp variants, each containing a single alanine substitution at an rbd position within or close to the ace -binding site. twenty of these vlp variants exhibited a > -fold reduction in internalization compared with the wild-type (wt) vlp, suggesting that these wt side chains contributed favorably to the interaction between the rbd and ace .
the remaining vlp variants were internalized with high efficiency, and these represent good mimics of escape mutants, which maintain strong ace -mediated infectivity but may potentially reduce binding to nabs that compete directly with ace . with the panel of vlp variants that mimicked potential escape mutants, we surveyed the effects on cellular uptake after treatment with various nabs (fig. b) . we defined as escape mutants those vlp variants for which cellular uptake in the presence of nm nab was > % of the uptake in the absence of the nab. based on this definition, we found that of the mutations enabled escape from igg , whereas only three mutations enabled escape from igg - . presenting the paratope in tetravalent formats resulted in nabs that could neutralize more variants than igg , and most importantly, tetravalent nabs containing the - paratope strongly neutralized all variants except one. as expected, these results showed that enhancing the avidity of the igg paratope for the s protein enhanced both potency and resistance to escape mutations. moreover, similar enhancements were also achieved by the presentation of paratopes in tetravalent rather than bivalent formats, and the most effective nabs were those that presented the optimized paratope in the tetravalent format. discussion sars-cov- has wreaked havoc on global health and economics, and along with its relatives sars-cov and mers, has shown that viral outbreaks and pandemics will continue to plague the world in the future. consequently, it is essential for the scientific community to adapt the most advanced drug development technologies to combat not only covid- , but also, pathogenic disease in general. in this context, we have deployed advanced synthetic antibody engineering to rapidly develop human nabs, which are potent therapeutic candidates in the natural igg format, and are even better neutralizing agents in the synthetic tetravalent formats that our modular design strategies enable. 
most importantly, the enhanced affinities and potencies afforded by tetravalent nabs are achieved without compromising any of the favorable characteristics that make igg molecules ideal drugs. moreover, tetravalent nabs resist potential escape mutants, which further augments the power of these molecules as drugs to combat not only sars-cov- , but also, its relatives that may emerge in the future. covid- has also exposed the need for drug development to respond to viral outbreaks
ab for the rbd, residues in the ace -binding site are also shown as colored surfaces, and the following color scheme was used: red, contacts with both fab - and ace ; blue, contacts with fab - only; yellow, contacts with ace only. fab - residues that contact the rbd are colored magenta or cyan if they reside in the light or heavy chain, respectively. the cdr-l residues that differ between - and are shown as red spheres.
the vlps were treated with nm of the indicated nab (x-axis) and uptake by ace -expressing hek- cells was measured in triplicate and results are representative of n= independent experiments. the heat map shows uptake normalized to uptake in the absence of nab. boxed cells indicate vlps that represented escape mutants for a given nab, as defined by > % uptake with nab treatment compared with untreated control (the percent uptake is shown in each cell).
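The escape-mutant definition used for the heat map (uptake with nab treatment, normalized to the untreated control, exceeding a fixed percentage) can be sketched as a small classification step. This is an illustrative sketch, not the authors' analysis code: the function name, the variant labels, the uptake values, and the default threshold of 0.5 are all placeholders, since the actual cutoff is not recoverable from the text above.

```python
def find_escape_mutants(uptake_with_nab, uptake_without_nab, threshold=0.5):
    """Return {variant: normalized uptake} for variants that escape a nab.

    uptake_with_nab / uptake_without_nab map a variant label to measured
    uptake with and without nab treatment. A variant counts as an escape
    mutant when treated / untreated is strictly greater than `threshold`
    (a placeholder value, expressed as a fraction rather than a percent).
    """
    escapes = {}
    for variant, treated in uptake_with_nab.items():
        untreated = uptake_without_nab[variant]
        if untreated <= 0:
            continue  # uptake cannot be normalized for this variant
        normalized = treated / untreated
        if normalized > threshold:
            escapes[variant] = normalized
    return escapes

# Hypothetical measurements for three alanine-substituted vlp variants.
treated = {"variant_a": 80.0, "variant_b": 10.0, "variant_c": 55.0}
untreated = {"variant_a": 100.0, "variant_b": 100.0, "variant_c": 100.0}
escape_set = find_escape_mutants(treated, untreated, threshold=0.5)
```

Running the same function once per nab and counting the returned variants gives the kind of per-antibody escape tally that the comparison between bivalent and tetravalent formats relies on.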
remdesivir for the treatment of covid- -preliminary a randomized trial of hydroxychloroquine as postexposure prophylaxis for covid- convergent antibody responses to sars-cov- in convalescent individuals clinical and immunological assessment of asymptomatic sars-cov- infections convalescent plasma therapy on time to clinical improvement in patients with severe and life-threatening covid- : a randomized clinical trial treatment of critically ill patients with covid- with convalescent plasma effectiveness of convalescent plasma therapy in severe covid- patients a potently neutralizing antibody protects mice against sars-cov- a human neutralizing antibody targets the receptor-binding site of sars-cov- potent neutralization of severe acute respiratory syndrome (sars) coronavirus by a human mab to s protein that blocks receptor association potent cross-reactive neutralization of sars coronavirus isolates by human monoclonal antibodies human monoclonal antibody as prophylaxis for sars coronavirus infection in ferrets prophylactic and postexposure efficacy of a potent human monoclonal antibody against mers coronavirus sars-cov- structure and replication characterized by in situ cryo-electron tomography structures and distributions of sars-cov- spike proteins on intact virions entry depends on ace and tmprss and is blocked by a clinically proven protease studies in humanized mice and convalescent humans yield a sars-cov- antibody cocktail cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody neutralizing antibodies against sars-cov- identified by high-throughput single-cell sequencing of convalescent patients' b cells isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model igg-neutralizing monoclonal antibodies block the sars-cov- infection a panel of human neutralizing mabs targeting sars-cov- spike at multiple epitopes broad neutralization of sars-related viruses by human monoclonal antibodies diversity 
is not required for antigen recognition by synthetic antibodies polyreactivity increases the apparent affinity of anti-hiv antibodies by heteroligation biophysical properties of the clinical-stage antibody landscape title: ly-cov , a rapidly isolated potent neutralizing antibody, provides protection in a non-human primate model of sars a strategy to prevent future epidemics similar to the -ncov outbreak a review of studies on animal reservoirs of the sars coronavirus a high through-put platform for recombinant antibodies to folded proteins simple piggybac transposon-based mammalian cell expression system for inducible protein production structure of bacteriophage t fibritin: a segmented coiled coil and the role of the c-terminal domain site-specific biotinylation of purified proteins using bira immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen diversity synthetic antibody libraries yield novel anti-egfr antagonists high-throughput screening of formulations to optimize the thermal stability of a therapeutic monoclonal antibody neutralizing antibody and soluble ace inhibition of a replication-competent vsv- sars-cov- and a clinical isolate of sars-cov- a highly conserved cryptic epitope in the receptor binding domains of sars-cov- and sars-cov molecular characterization of sars-cov- from the first case of covid- in italy a simple method of estimating fifty percent endpoints -scienceopen crystal violet assay for determining viability of cultured cells primary structure of the streptomyces enzyme endo-beta-n-acetylglucosaminidase h - phaser crystallographic software features and development of coot towards automated crystallographic structure refinement with phenix.refine imgt unique numbering for immunoglobulin and t cell receptor variable domains and ig superfamily v-like domains key: cord- -jpkxjn e authors: brielle, esther s.; schneidman-duhovny, dina; linial, michal title: the sars-cov- exerts a distinctive strategy for 
interacting with the ace human receptor date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jpkxjn e the covid- disease has plagued over countries and has resulted in over , deaths within weeks. we compare the interaction between the human ace receptor and the sars-cov- spike protein with that of other pathogenic coronaviruses using molecular dynamics simulations. sars-cov, sars-cov- , and hcov-nl recognize ace as the natural receptor but present a distinct binding interface to ace and a different network of residue-residue contacts. sars-cov and sars-cov- have comparable binding affinities achieved by balancing energetics and dynamics. the sars-cov- –ace complex contains a higher number of contacts, a larger interface area, and decreased interface residue fluctuations relative to sars-cov. these findings expose an exceptional evolutionary exploration exerted by coronaviruses toward host recognition. we postulate that the versatility of cell receptor binding strategies has immediate implications for therapeutic strategies. one sentence summary: molecular dynamics simulations reveal a temporal dimension of coronavirus interactions with the host receptor. to gain access to host cells, coronaviruses rely on spike proteins, which are membrane-anchored trimers containing a receptor-binding s segment and a membrane-fusion s segment ( ) . the s segment contains a receptor binding domain (rbd) that recognizes and binds to a host cell receptor. the angiotensin-converting enzyme (ace ) was identified as the critical receptor for mediating sars- entry into host cells ( , ) . binding of the spike protein to the receptor is a critical phase in which the levels of ace expressed on the cell membrane correlate with viral infectivity and govern clinical outcomes ( ) . consistent with the clinical pulmonary manifestation, ace is widely expressed in almost all tissues, with the highest expression levels in the epithelium of the lung ( ) .
similar to the sars- virus, the covid- virus enters the host cell by rbd binding to the host cell ace receptor ( , , ) . host receptor recognition for cell entry is, however, not specified by the cov genus classification. mers-cov is a member of the βcov genus but does not recognize the ace receptor. in contrast, hcov-nl is a member of the αcov genus and does recognize the ace receptor ( ) . herein, we analyze the binding of several cov rbds to ace with molecular dynamics (md) simulations and compare the stability, relative interaction strength, and dynamics of the interaction between the viral spike protein and the human ace receptor. the covid- rbd (residues - ) shares a . % sequence identity and high structural similarity with the sars- rbd ( table ). in contrast, the rbd of hcov-nl is only . % identical to that of covid- and there are no significant structural similarities between them (fig. s ) . remarkably, the rbd of mers-cov, which is structurally similar to that of covid- ( . % sequence identity, % structure similarity), recognizes a different host receptor (dpp ) for its cell entry and does not bind ace ( ) . we ran ns molecular dynamics (md) simulations of ace in complex with the rbds of the covid- , sars- , and hcov-nl viruses to quantify the energetics and the dynamics of the different rbd-ace interactions. the simulation trajectory snapshots at ps intervals ( , frames) were analyzed by a statistical potential to assess the probability of the rbd-ace interaction (soap score, ( )), with lower values corresponding to higher probabilities and thus higher affinities. the interaction scores for covid- rbd-ace were comparable to those of sars- , with medians of - . and - . , respectively (fig. a) . the rbd-ace interaction scores of hcov-nl are higher than those of both sars-covs (median of - . ). mers, which is structurally similar to covid- ( table ), does not bind ace .
mers virus which binds dipeptidyl peptidase- (dpp , also known as cd ( )), has rbd-ace interaction scores that indicate extremely weak affinity (median of - . ), as expected from a non-cognate receptor interaction. covid- has the largest buried surface area at the interface ( Å ), followed by the interface area for sars- ( Å ) and hcov-nl ( Å ). the number of ace contacting residues maintains the same order, with , , and for covid- , sars- , and hcov-nl , respectively (fig. c) . the three rbds exploit specific binding sites on ace based on the analysis of the md trajectories ( fig. , c and d; movie s ). there is a significant overlap of ace interacting residues between covid- and sars- (at least %), while hcov-nl shares only % and % of contacts with sars- and covid- , respectively. these findings suggest that the coronaviruses exert different interaction strategies with their cognate receptors to achieve the affinity that is required for effective cell entry. an ace residue is considered as part of the interface if one of its atoms is within Å from any rbd atom in at least % of the , md simulation frames. (d) overlay of snapshots for each of the three rbds. the ace is in surface representation (gray). the frames were aligned using the n-terminal fragment of ace that contains the two helices participating in the rbds binding. while the sequence identity between the rbds of covid- and sars- is % (table ) , we observe a significantly higher residue substitution rate at the interaction interface with the ace receptor. out of rbd interface residues, only residues ( %) in covid- are conserved with respect to sars- ( fig. a , table s , fig. s ). similarly, only residues ( %) in sars- are conserved with respect to covid- . to investigate these interface residues, we construct and overlay the contact maps for the rbd-ace interfaces for covid- and sars- (fig. b) . 
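The interface definition quoted above (an ace residue belongs to the interface if one of its atoms lies within a distance cutoff of any rbd atom in at least a given fraction of the md frames) can be sketched as follows. This is a minimal sketch under stated assumptions: the per-frame minimum distances are taken as already computed, and the function name, residue labels, cutoff, and persistence fraction are hypothetical because the actual values are not recoverable from the text.

```python
def interface_residues(min_distance_per_frame, cutoff, min_fraction):
    """Select receptor residues that are persistently close to the rbd.

    min_distance_per_frame: one dict per md frame, mapping a receptor
        residue label to the smallest distance (in Å) between any of its
        atoms and any rbd atom in that frame.
    cutoff: distance threshold in Å (assumed value supplied by the caller).
    min_fraction: minimum fraction of frames in which the residue must be
        within the cutoff to count as part of the interface.
    """
    n_frames = len(min_distance_per_frame)
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    counts = {}
    for frame in min_distance_per_frame:
        for residue, distance in frame.items():
            if distance <= cutoff:
                counts[residue] = counts.get(residue, 0) + 1
    return sorted(r for r, c in counts.items() if c / n_frames >= min_fraction)

# Hypothetical four-frame trajectory with three receptor residues.
frames = [
    {"res_a": 3.1, "res_b": 7.5, "res_c": 3.9},
    {"res_a": 3.4, "res_b": 3.8, "res_c": 6.2},
    {"res_a": 2.9, "res_b": 8.0, "res_c": 3.7},
    {"res_a": 3.6, "res_b": 9.1, "res_c": 3.5},
]
persistent = interface_residues(frames, cutoff=4.0, min_fraction=0.75)
```

Applying the same selection to each rbd-ace complex and intersecting the resulting residue sets gives the kind of overlap comparison reported between the three viruses.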
we define a residue-residue contact frequency (cf) as the fraction of md trajectory frames in which the contact appears. remarkably, only out of the total residue-residue interface contacts have comparable (< % difference) contact frequencies between the covid- -ace and sars- -ace interfaces ( fig. b , colored gray). furthermore, we find two interaction patches unique to covid- ( fig. b , patches and ) and another patch unique to sars- ( fig. b , patch ). covid- has a significant and unique contact site between residues - of the rbd and residues - of ace (fig. , b and c). covid- also creates a new interaction patch with the middle of the n-terminal ace helix (fig. , b and c), while sars- has a unique interaction patch with the end of the same helix (fig. , b and c) . the rest of the changes in the interface contact frequencies are due to the different interface loop conformations (covid- residue numbers - , sars- residue numbers - ) (fig. , a and b , table s ). covid- has a significantly higher number of well-defined contact pairs compared to sars- : vs. contacts (with and unique pairs, excluding the ones with similar cfs) were found for rbd-ace of the covid- and sars- , respectively (fig. b) . results from fig. expose the accelerated evolution among the key anchoring residues of the rbd-ace interface. this comparison raises the following question: how does sars- rbd reach an ace binding affinity that is comparable to that of covid- but with fewer contact pairs and a smaller interface area? the distribution of soap scores throughout the simulation trajectory has a larger fluctuation range for sars- , relative to covid- ( fig. , a and b; fig. s a ) suggesting that sars- -ace interaction is fluctuating between several structural states. moreover, analysis of contact frequencies along the entire trajectory reveals that none of the sars- contacts are maintained over % of the frames while covid- still maintains about half of its contacts at % of the trajectory (fig. 
s b) . to investigate the dynamics of covid- binding compared to sars- , we calculate the root-mean-square fluctuation (rmsf) of each residue with respect to the lowest energy snapshot from the respective ns md simulation trajectory. the interface region in the rbd contains two loops (loop : residues - , loop : residues - ; using covid- numbering, fig. d ) that bind to the ace n-terminal helix on both of its ends. these two loops are highly flexible in the sars- rbd (fig. , a and d) . while loop is also fluctuating in the covid- rbd, albeit much less, loop remains relatively rigid in the covid- rbd. in addition, we find that in the covid- -rbd, a region centered around k leads to further stability relative to the corresponding region in sars- . we attribute this difference to the unique interaction of covid- at position k with the middle of the n-terminal ace helix, thus serving as an anchor site to the receptor ( fig. c and fig. a) . the contribution of k to ace binding is observed in a recent cryo-em structure of the covid- spike protein bound to ace ( ) . overall, covid- is more rigid than sars- (fig. , a and d) . we investigate the dynamics of a designed sars (sars-des) variant ( ) table s ). the l f mutation is of special interest for the covid- rbd as well because it has this same substitution. our md simulation analysis reveals that sars-des has substantially lower interaction scores with ace (median of - . , fig. s ) , as expected for an optimized human ace -binding rbd design. we observed that these two mutations not only enhance the binding affinity to ace , but also lead to a substantial stabilization of the interaction interface. the fluctuation signatures along the rbd of sars-des are surprisingly similar to those recorded for covid- (fig. , b and c) . thus, the switch from a flexible binding mode (for sars- ) to a stable one (covid- and sars-des, fig.
b ) highlights the remarkable capacity of the rbd to adopt alternative receptor binding strategies driven by a minimal number of amino acid substitutions. this analysis reveals the critical role of l f (sars-des residue f ) for stabilizing the covid- -ace interface and a reduction in the number of states of the covid- spike protein bound to an ace receptor. experimental affinity measurements (e.g. surface plasmon resonance, spr) confirm the high affinity of sars- rbd-ace binding, with an equilibrium dissociation constant (kd) of ~ mm ( ) ( ) ( ) ( ) , similar to the binding affinity of ace and the covid- rbd ( , ) . our md based calculation is consistent with sars- displaying a similar but slightly higher affinity relative to covid- (fig. a, fig. s and table s ). binding affinity is achieved through a combination of interface contact optimization and protein stability (fig. e) . while the rbd-ace complex can be resolved at high-resolution by cryo-em ( , ) , md simulations provide orthogonal information about the interaction dynamics on a nanosecond timescale. in the case of covs, md simulations reveal an exceptional versatility of viral receptor binding strategies (fig. e) . covid- adopted a different strategy for achieving comparable affinity to sars- : the interface of covid- is significantly larger than that of sars- ( Å vs. Å) with a remarkable number of interacting residues (ace : vs. , fig. c) . in contrast, sars- is more flexible in its interaction with ace , interacting through fewer contacts that serve as "hot spots". therefore, we predict that sars- rbd neutralizing antibodies will not be effective for covid- . the failure of several of these antibodies to neutralize the binding of covid- rbd to its receptor is consistent with our findings ( , ) . the fluctuation from high-to low-affinity conformations in sars- leads to an increased efficacy for inhibiting peptides ( ) and high-affinity antibodies ( ) compared to covid- . 
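The flexibility comparison above rests on the per-residue root-mean-square fluctuation (rmsf) measured against a reference snapshot rather than against the trajectory mean. The sketch below shows that calculation in its simplest form; it assumes the frames have already been superimposed on the reference and that each residue is represented by a single position (for example its cα atom). The function name and the toy coordinates are hypothetical.

```python
import math

def rmsf_per_residue(frames, reference):
    """Per-residue rmsf relative to a fixed reference structure.

    frames: list of frames; each frame is a list of (x, y, z) positions,
        one per residue, already aligned to the reference.
    reference: list of (x, y, z) positions for the reference snapshot
        (e.g. the lowest-energy frame), in the same residue order.
    Returns one value per residue: sqrt(mean over frames of |r - r_ref|^2).
    """
    n_frames = len(frames)
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    squared_sums = [0.0] * len(reference)
    for frame in frames:
        if len(frame) != len(reference):
            raise ValueError("frame and reference differ in residue count")
        for i, (position, ref) in enumerate(zip(frame, reference)):
            squared_sums[i] += sum((p - r) ** 2 for p, r in zip(position, ref))
    return [math.sqrt(total / n_frames) for total in squared_sums]

# Toy example: residue 0 never moves, residue 1 oscillates by 1 Å along x.
reference_snapshot = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
toy_frames = [
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
toy_rmsf = rmsf_per_residue(toy_frames, reference_snapshot)
```

Plotting such values along the rbd sequence for each complex is what makes a rigid loop distinguishable from a flexible one.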
this implies that the enhanced rigidity of the covid- rbd relative to that of sars- poses a therapeutic challenge. the geometric and physicochemical properties of rbd-ace interfaces resemble those of antibody-antigen interactions. in both cases the interface benefits from long loop plasticity, bulky aromatic side chains as anchoring sites, and the stabilization of the complex by distributed electrostatic interactions ( ) . both covid- and sars- interfaces contain long flexible loops and nine aromatic residues (tyr, trp, phe) in the interface with ace ( fig. a) . moreover, in the sars designed variant (sars-des ( )), the addition of an aromatic residue (l f substitution) significantly improved the interaction scores and interface stability (fig. , b and d) . our findings shed light on the accelerated evolution of spike protein binding to the ace receptor, similar to the rapid evolution along the antibody-antigen affinity maturation process.
structural modeling
the structural model of the covid- spike protein receptor binding domain (rbd) in complex with ace was generated by comparative modeling using modeller . ( ) with the covid- sequence (refseq: yp_ . ). we relied on the crystal structure of the spike protein receptor-binding domain from a sars coronavirus designed human strain complexed with the human receptor ace (pdb sci, resolution . Å) as a template for comparative modeling. the sars- and hcov-nl spike protein rbds in complex with ace were taken from pdb ajf (resolution . Å) and kbh (resolution . Å), respectively. missing residues were added in modeller. the mers rbd structure was taken from the complex with the neutralizing antibody cdc -c (pdb c z, resolution . Å) and structurally aligned onto the sars- rbd in complex with the ace receptor. the designed variant is from pdb sci. the md simulations were performed with gromacs software ( ) using the charmm m force field ( ) .
each of the complexes was solvated in transferable intermolecular potential with points (tip p) water molecules, and ions were added to neutralize the total system charge. the steepest descent algorithm was used for initial energy minimization until the system converged at fmax < , kj/(mol · nm). then water and ions were allowed to equilibrate around the protein in a two-step equilibration process. the first part of equilibration was at a constant number of particles, volume, and temperature (nvt). the second part of equilibration was at a constant number of particles, pressure, and temperature (npt). for both md equilibration parts, positional restraints of k = , kj/(mol · nm ) were applied to heavy atoms of the protein, and the system was allowed to equilibrate at a reference temperature of k, or reference pressure of bar for ps at a time step of fs. following equilibration, the production simulation duration was nanoseconds with fs time intervals. altogether , frames were saved for the analysis at intervals of ps. we superimposed several md snapshots on the x-ray structure of the covid- -ace complex recently submitted to the pdb ( vw , resolution . Å). the average rmsd over the interface cα atoms is ~ Å. interaction scores between the virus spike rbd and ace were calculated for each frame of the trajectory using the soap statistical potential ( ) . in the interface contact analysis, a residue-residue contact was defined based on the inter-atomic distance, with a cutoff of Å.
table s . rbd-ace interface evaluated by several methods for analysis of protein-protein interactions
movie s . overlay of random snapshots from the md trajectories of covid- -ace , sars- -ace , and hcov-nl -ace complexes. for clarity only one copy of ace is shown (gray); covid- , sars- , and hcov-nl are colored blue, red, and green, respectively.
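The contact analysis described in the methods (a residue-residue contact defined by an inter-atomic distance cutoff, and a contact frequency defined as the fraction of trajectory frames in which the contact appears) can be sketched directly from atom coordinates. This is an illustrative sketch, not the authors' pipeline: the data layout, function names, residue labels, and the cutoff passed in the example are assumptions.

```python
import math

def residues_in_contact(atoms_a, atoms_b, cutoff):
    """True if any atom of one residue is within `cutoff` Å of any atom of the other."""
    return any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)

def contact_frequencies(frames, cutoff):
    """Fraction of frames in which each rbd-receptor residue pair is in contact.

    frames: list of (rbd, receptor) tuples, one per md frame; each element
        is a dict mapping a residue label to a list of (x, y, z) atom
        coordinates in Å.
    Returns {(rbd_residue, receptor_residue): frequency in (0, 1]}; pairs
    that are never in contact are omitted.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    counts = {}
    for rbd, receptor in frames:
        for res_a, atoms_a in rbd.items():
            for res_b, atoms_b in receptor.items():
                if residues_in_contact(atoms_a, atoms_b, cutoff):
                    key = (res_a, res_b)
                    counts[key] = counts.get(key, 0) + 1
    return {pair: n / len(frames) for pair, n in counts.items()}

# Hypothetical two-frame trajectory: one rbd residue, two receptor residues.
toy_trajectory = [
    ({"rbd_1": [(0.0, 0.0, 0.0)]},
     {"rec_1": [(3.0, 0.0, 0.0)], "rec_2": [(9.0, 0.0, 0.0)]}),
    ({"rbd_1": [(0.0, 0.0, 0.0)]},
     {"rec_1": [(3.5, 0.0, 0.0)], "rec_2": [(4.0, 0.0, 0.0)]}),
]
toy_cf = contact_frequencies(toy_trajectory, cutoff=5.0)
```

Computing these frequencies for two complexes over the same residue numbering and comparing them pair by pair is one way to produce the overlaid contact maps discussed in the results. The all-pairs loop is quadratic in residue count, so real trajectories would usually restrict it to residues near the interface first.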
characterization of a novel coronavirus associated with severe acute respiratory syndrome return of the coronavirus: -ncov coronavirus pathogenesis and the emerging pathogen severe acute respiratory syndrome coronavirus epidemiology and clinical characteristics of human coronaviruses oc , e, nl , and hku : a study of hospitalized children with acute respiratory tract infection in guangzhou, china a new coronavirus associated with human respiratory disease in china structural basis for human coronavirus attachment to sialic acid receptors a crucial role of angiotensin converting enzyme (ace ) in sars coronavirusinduced lung injury susceptibility to sars coronavirus s protein-driven infection correlates with expression of angiotensin converting enzyme and infection can be blocked by soluble receptor exogenous ace expression allows refractory cell lines to support severe acute respiratory syndrome coronavirus replication tissue distribution of ace protein, the functional receptor for sars coronavirus. a first step in understanding sars pathogenesis receptor recognition by novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars human coronavirus nl employs the severe acute respiratory syndrome coronavirus receptor for cellular entry the s proteins of human coronavirus nl and severe acute respiratory syndrome coronavirus bind overlapping regions of ace structure of mers-cov spike receptor-binding domain complexed with human receptor dpp tm-align: a protein structure alignment algorithm based on the tm-score optimized atomic statistical potentials: assessment of protein interfaces and loops structural basis for the recognition of the -ncov by human ace . 
biorxiv computational characterization and design of sars coronavirus receptor recognition and antibody neutralization receptor and viral determinants of sars-coronavirus adaptation to human ace potent binding of novel coronavirus spike protein by a sars coronavirusspecific human monoclonal antibody crystal structure of nl respiratory coronavirus receptor-binding domain complexed with its human receptor structure, function and antigenicity of the sars-cov- spike glycoprotein. biorxiv cryo-em structure of the -ncov spike in the prefusion conformation a hexapeptide of the receptorbinding domain of sars corona virus spike protein blocks viral entry into host cells via the human receptor ace the spike protein of sars-cov--a target for vaccine and therapeutic development the indistinguishability of epitopes from protein surface is explained by the distinct binding preferences of each of the six antigen-binding loops comparative protein structure modeling using modeller gromacs . : a high-throughput and highly parallel open source molecular simulation toolkit charmm m: an improved force field for folded and intrinsically disordered proteins acknowledgments. the authors gratefully acknowledge barak raveh for useful suggestions. tables s -s movie s references ( ) ( ) ( ) residue key: cord- -rcv sh authors: simmonds, p. title: rampant c->u hypermutation in the genomes of sars-cov- and other coronaviruses – causes and consequences for their short and long evolutionary trajectories date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rcv sh the pandemic of sars coronavirus (sars-cov- ) has motivated an intensive analysis of its molecular epidemiology following its worldwide spread. to understand the early evolutionary events following its emergence, a dataset of complete sars-cov- sequences was assembled. variants showed a mean . - . nucleotide differences from each other, commensurate with a mid-range coronavirus substitution rate of × − substitutions/site/year. 
almost half of sequence changes were c->u transitions with an -fold base frequency normalised directional asymmetry between c->u and u->c substitutions. elevated ratios were observed in other recently emerged coronaviruses (sars-cov and mers-cov) and to a decreasing degree in other human coronaviruses (hcov-nl , -oc , - e and -hku ) proportionate to their increasing divergence. c->u transitions underpinned almost half of the amino acid differences between sars-cov- variants, and occurred preferentially in both ’u/a and ’u/a flanking sequence contexts comparable to favoured motifs of human apobec proteins. marked base asymmetries observed in non-pandemic human coronaviruses (u>>a>g>>c) and low g+c contents may represent long term effects of prolonged c->u hypermutation in their hosts. importance the evidence that much of sequence change in sars-cov- and other coronaviruses may be driven by a host apobec-like editing process has profound implications for understanding their short and long term evolution. repeated cycles of mutation and reversion in favoured mutational hotspots and the widespread occurrence of amino acid changes with no adaptive value for the virus represents a quite different paradigm of virus sequence change from neutral and darwinian evolutionary frameworks that are typically used in molecular epidemiology investigations. sars-coronavirus- (sars-cov- ) emerged late in the hubei province, china as a cause of respiratory disease occasionally leading to acute respiratory distress syndrome and death (covid- ) ( - ). since the first reports in december, , infections with sars-cov- have been reported from a rapidly increasing number of countries worldwide, and led to its declaration as a pandemic by the world health organisation in march, . in order to understand the origins and transmission dynamics of sars-cov- , sequencing of sars-cov- directly from samples of infected individuals worldwide has been performed on an unprecedented scale. 
these efforts have generated many thousands of high quality consensus sequences spanning the length of the genome and have defined a series of geographically defined clusters that recapitulate the early routes of international spread. however, as commented elsewhere (https://doi.org/ . / . . . ) , there is remarkably little virus diversity at this early stage of the pandemic and analyses of its evolutionary dynamics remain at an early stage. the relative infrequency of substitutions is the consequence of a much lower error rate on genome copying by the viral rna polymerase of the larger nidovirales, including coronaviruses. this is achieved through the development of a proofreading capability through mismatch detection and excision by a virally encoded exonuclease, nsp -exon ( - ). consequently, coronaviruses show a low substitution rate over time, typically in the range of . - x - substitutions per site per year (ssy) ( - ) . applying a mid-range estimate to the - month timescale of the sars-cov- pandemic indicates that epidemiologically unrelated strains might show around - nucleotide differences from each other over the , base length of their genomes. in the current study, we have analysed the nature of the sequence diversity generated within the sars-cov- virus populations revealed by current and ongoing virus sequencing studies. we obtained evidence for a preponderance of driven mutational events within the short evolutionary period following the zoonotic transmission of sars-cov- into humans. sequence substitutions were characterised by a preponderance of cytidine to uridine (c->u) transitions. the possibility that the initial diversity within a viral population was largely host-induced would have major implications for evolutionary reconstruction of sars-cov- variants in the current pandemic, as well as in our understanding both of host antiviral pathways against coronaviruses and the longer term shaping effects on their genome composition. 
sequence changes in sars-cov- . four separate datasets of full-length, (near-)complete sars-cov- genome sequences, collected from the start of the pandemic to those most recently deposited on the th april, were aligned and analysed. each dataset showed minimal levels of sequence divergence, with mean pairwise distances ranging from . - . nucleotide differences between sequences. however, several aspects of the frequencies and sequence contexts of the observed changes were unexpected. firstly, the ratio of non-synonymous (amino acid changing) to synonymous substitutions was high, in the range of . - . amongst the different sars-cov- datasets. this contrasts with a much lower ratio (consistently below . ) in sequence datasets assembled for the other human coronaviruses (table ) . including a range of coronaviruses in the analysis, there was a consistent association of dn/ds with degree of sequence divergence ( fig. ). we next estimated frequencies of individual transitions and transversions occurring during the short-term evolution of sars-cov- . sequence differences between each sars-cov- full genome sequence and a majority rule consensus sequence generated for each of the four sars-cov- datasets were calculated. the directionality of sequence change underlying the observed substitutions was inferred through restricting analysis to polymorphic sites with a minimal number of variable bases (typically singletons). in practice, because of the scarcity of substitutions, variability thresholds of %, %, % and % yielded similar numbers and relative frequencies of each transition and transversion. equivalent evidence for directionality was obtained through comparison of each sequence in the dataset with the first outbreak sequence (mn ; wuhan-hu- ), approximately ancestral to current circulating sars-cov- strains (data not shown). for the purposes of the analysis presented here, a consensus-based % threshold was used. 
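the consensus-based inference of directionality described above can be illustrated with a minimal python sketch. it is not the authors' implementation: the function names, the toy alignment and the threshold value are all hypothetical, the input is assumed to be an equal-length rna alignment, and the sketch counts polymorphic sites (one count per distinct variant base at a site) rather than individual sequences.

```python
from collections import Counter

BASES = "ACGU"  # assumption: aligned RNA sequences, upper case


def majority_consensus(seqs):
    """Majority-rule consensus of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def directional_substitution_sites(seqs, max_variant_fraction=0.05):
    """Count sites by inferred direction of change (consensus -> variant).

    Only sites where the fraction of non-consensus bases is at most
    `max_variant_fraction` are used, so that the consensus base can be
    treated as the ancestral state. The default value is a placeholder,
    not the threshold used in the paper.
    """
    consensus = majority_consensus(seqs)
    n = len(seqs)
    counts = Counter()
    for ref, col in zip(consensus, zip(*seqs)):
        if ref not in BASES:
            continue  # skip gaps and ambiguity codes in the consensus
        variants = [b for b in col if b != ref and b in BASES]
        if not variants or len(variants) / n > max_variant_fraction:
            continue  # invariant site, or too variable to polarise
        for alt in set(variants):
            counts[ref + "->" + alt] += 1
    return counts


# toy alignment: nine identical sequences and one with a C->U change
toy = ["ACGUC"] * 9 + ["AUGUC"]
print(directional_substitution_sites(toy, max_variant_fraction=0.2))
```

lowering `max_variant_fraction` restricts the count to rarer variants, which is the role the variability thresholds play in the text.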
a listing of the sequence changes revealed a striking (approximately four-fold) excess of sites where c->u substitutions occurred in sars-cov- sequences compared to the other three transitions ( fig. a ). this excess was all the more remarkable given there was an almost two-fold greater number of u bases in the sars-cov- genome compared to c's ( . % compared to . %). to formally analyse the excess of c->u transitions we calculated an index of asymmetry (f[c->u] / f[u->c]) x (fu/fc) and compared this with degrees of sequence divergence and dn/ds ratio in sars-cov- and other coronavirus datasets (fig. b, c ). this comparison showed that the excess of c->u substitutions was most marked among very recently diverged sequences associated with the sars-cov- and sars-cov outbreaks and was reduced significantly in sequence datasets of the more divergent human coronaviruses (nl , oc , e and oc ) as sequences accumulated substitutions. c->u substitutions were scattered throughout the sars-cov- genome (fig. ) . long bars representing more polymorphic sites were frequently shared between replicate datasets but unique substitutions (occurring once in the dataset; short bars) showed largely separate distributions. substitutions were not focussed towards any particular gene or intergenic region although all three datasets showed marginally higher frequencies of substitutions in the n gene. the grouping of a selection of sequences showing c->u changes in different genome regions was plotted in a phylogenetic tree containing sequences from the sars-cov- dataset (fig. ) . within the resolution possible in a tree generated from such a minimally divergent dataset, many sequences with shared c->u changes were not monophyletic (e.g. those with substitutions at positions , , , and ). this lack of grouping is consistent with multiple de novo occurrences of the same mutation in different sars-cov- lineages. the abnormally high dn/ds ratios of . - . 
in sars-cov- sequences (table ; fig. ) indicated that around % of nucleotide substitutions would produce amino acid changes (if approximately % of nucleotide changes are non-synonymous). on analysis of amino acid sequence changes, a remarkable % of non-synonymous substitutions in the sars-cov- sequence dataset were the consequence of c->u transitions (fig. ). the underlying mechanism that leads to c->u hypermutation therefore also drives much of the amino acid sequence diversity observed in sars-cov- . the context of cytidines within a sequence strongly influenced the likelihood of it mutating to a u (fig. ). the greatest numbers of mutations were observed if the upstream ( ') base was an a or u. there was also a similar approximately -fold increase in transitions if these bases were located on the downstream ( ') side. the effects of the ' and ' contexts were additive; c residues surrounded by an a or u at both ' and ' sides were -fold more likely to mutate than those flanked by c or g residues (mean . transitions compared to . ). splitting the data down into the combinations of ' and ' contexts, a 'u far more potently restricted non-c->u substitutions than a 'a ( fig. s ; suppl. data), while 'g or 'c almost eliminated substitutions irrespective of the ' context. no context created any substantial asymmetry in g->a compared to a->g transitions. the g+c content of coronaviruses varied substantially between species, with highest frequencies in the recently emerged zoonotic coronaviruses (mers-cov: %, sars-cov: % and sars-cov- : %) and lowest in hku ( %). collectively, there was a significant relationship between c depletion and u enrichment with g+c content (fig. ) . the difference in g+c content was indeed almost entirely attributable to changes in the frequencies of c and u bases; the % difference in g+c content between mers-cov and hku arose primarily from the % -> % reduction in frequencies of c. there was a comparable % increase in the frequency of u. 
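the index of asymmetry defined earlier, (f[c->u] / f[u->c]) x (fu/fc), and the grouping of cytidines by their upstream and downstream neighbours can be sketched as follows. this is an illustrative sketch only: the function names and the toy sequence are hypothetical, and the numbers in the usage lines are invented for demonstration, not taken from the datasets.

```python
from collections import Counter


def asymmetry_index(n_c_to_u, n_u_to_c, freq_c, freq_u):
    """Base-frequency-normalised C->U / U->C asymmetry.

    Implements (f[C->U] / f[U->C]) x (fU / fC): the raw ratio of the two
    transition counts is multiplied by fU / fC to correct for the unequal
    numbers of C and U bases available to mutate.
    """
    if n_u_to_c == 0 or freq_c == 0:
        raise ValueError("asymmetry index is undefined for zero U->C count or zero C frequency")
    return (n_c_to_u / n_u_to_c) * (freq_u / freq_c)


def cytidine_contexts(genome):
    """Tally each internal C by whether its upstream and downstream
    neighbours are A/U or C/G (assumes an upper-case RNA string)."""
    counts = Counter()
    for i in range(1, len(genome) - 1):
        if genome[i] == "C":
            upstream = "A/U" if genome[i - 1] in "AU" else "C/G"
            downstream = "A/U" if genome[i + 1] in "AU" else "C/G"
            counts[(upstream, downstream)] += 1
    return counts


# invented illustrative values: 40 C->U and 10 U->C changes in a genome
# with 25% C and 50% U
print(asymmetry_index(40, 10, freq_c=0.25, freq_u=0.5))
print(cytidine_contexts("ACUGCGUCA"))
```

dividing the number of observed c->u changes in each context class by the corresponding tally from `cytidine_contexts` would give a per-context mutation frequency of the kind compared in the text.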
their combined effects left frequencies of g and a relatively unchanged. it appears that the accumulated effect of the c->u / u->c asymmetry led to marked compositional abnormalities in coronaviruses. the wealth of sequence data generated since the outset of the sars-cov- pandemic, the accuracy of the sequences obtained by a range of ngs technologies and the long genomes and very low substitution rate of coronaviruses provided a unique opportunity to investigate sequence diversification at very high resolution. the findings additionally provide insights into the mutational mechanisms and contexts where sequence changes occur. thirdly, they inform us about the longer term evolution of viruses and potential effects of the cell in moulding virus composition. the mechanism of sars-cov- hypermutation. the most striking finding that emerged from the analysis of more than sars-cov- genomes was the preponderance of c->u transitions compared to other substitutions in the initial - months of its evolution. these accounted for %- % of all changes in the four sars-cov- datasets. in seeking alternative, non-biological explanations for this observation, they cannot have arisen through misincorporation errors in the next generation sequence methods used to produce the dataset because the analysis in the current study was restricted to consensus sequences. these are generally assembled from libraries that typically possess reasonable coverage and read depth; error frequencies of < - per site ( ) would therefore be unlikely to create a consensus change in a sequence library. there was, furthermore, no comparable increase in g->a mutations (fig. ) and the sequence context in which sequencing errors occur (a ' or ' c or g; ( )) did not match the favoured context for mutation observed in our dataset (fig. ). coronavirus rna dependent rna polymerase during virus replication. 
by definition, a coronavirus rna genome descends from any other through an equal number of copyings of the positive and negative strands; any tendency to misincorporate a u instead of a c would be reflected in a parallel number of g->a mutations where this error occurred on the minus strand (or vice versa). as demonstrated, however, g->a mutations occurred at a much lower frequency than c->u mutations and were similar to those of a->g (figs. a, ) . while deamination of cytidines in single-stranded dna sequences is a hallmark of apobec function, apobecs show binding affinities for single stranded rna templates that may mediate antiviral functions. a b and a f have been shown to block retrotransposition of line- transposon mrna through a non-deamination pathway ( ) , potentially through binding to single-stranded rna. direct editing of hiv- rna by the rat a apobec and the accumulation of c->u hypermutation verified that rna could also be used as a substrate for deamination ( ) . this suggested to the authors at that time that apobec-mediated rna editing was a potential antiviral mechanism against rna viruses as well as retroviruses. since then, evidence supporting this conjecture has been difficult to obtain; the virus inhibitory effects of apobecs against enterovirus a , measles, mumps and respiratory syncytial viruses were not shown to be associated with the development of virus mutations ( , ). similarly, a c, a f or a h, but not a a, a d and a g were shown to inhibit the replication of the human coronavirus, hcov-nl , but their expression did not lead to de novo c->u (or g->a) mutations on virus passaging ( ) . on the other hand, it has been demonstrated that a a and a g possess potent rna editing capability on mrna expressed in hypoxic macrophages ( ) , natural killer cells ( ) and transfected a g-overexpressing hek t cells ( ) . 
these latter findings verify that apobecs do possess rna editing capabilities but do not provide any mechanistic context for the potential inhibition of rna virus replication by this mechanism. nevertheless, the pronounced asymmetry in c->u transitions in sars-cov- and the preferential substitution of c's flanked by u and a bases on both ' and ' sides ( fig. ), which broadly matches what is known about the favoured contexts of a a, a f and a h ( ), provide strong circumstantial grounds for suspecting a role of one or more apobec proteins in coronavirus mutagenesis. their exceptionally long genomes (≈ , bases), the exposure of genomic rna in the cytoplasm before initiation of replication complex formation and the potentially lethal effects of just one mutation introduced into the genomic sequence make apobec-mediated anti-coronaviral activity plausible in virological terms. it is perhaps the otherwise low mutation rate of coronaviruses and the extensive dataset of accurate, minimally divergent sars-cov- sequences assembled post-pandemic that have enabled this mutational signature to be so clearly observed. the key finding in the study was the combined evidence for an apobec-like editing process driving initial sequence changes in sars-cov- , and evidence that the observed substitutions have not arisen through the typical pattern of random mutation and fixation that is assumed in evolutionary models. a specific problem for evolutionary reconstructions would be the existence of highly uneven substitution rates at different sites; apobec-mediated editing (and indeed the pattern of c->u transitions in sars-cov- sequences) is strongly dependent on sequence context and, for at least two apobecs, additionally influenced by their proximity to rna secondary structure elements in the target sequence ( , ) . 
sequence changes in sars-cov- and other coronavirus genomes may therefore be partially or largely restricted to a number of mutational hotspots that may promote convergent changes between otherwise genetically unlinked strains. as demonstrated in fig. , these can conflict with relationships reconstructed from phylogenetically informative sites. the other important consequence of c->u hypermutation is that most of the amino acid sequence diversity observed in sars-cov- strains originates directly from forced mutations and therefore cannot be regarded in any way as adaptive for the virus (fig. ). an rna editing mechanism of the type discussed above evidently places a huge mutational load on sars-cov- that may underpin the abnormally high dn/ds ratios recorded in sars-cov- and sars-cov sequence datasets (fig. ) . it is likely that many or most amino acid changes are mildly deleterious and transient; repeated rounds of mutation at favoured editing sites followed by reversion may therefore contribute to the large numbers of scattered substitutions in sars-cov- sequences that conflict with their phylogeny. finally, it is intriguing to speculate on the long-term effects of the c->u/u->c asymmetry and the extent to which this may contribute to the previously described compositional abnormalities of coronaviruses ( , ). as described above in connection with mutation frequencies, the compositional asymmetries cannot directly arise through viral rdrp mutational biases because any resulting base frequency differences would be symmetric (i.e. g≈c, a≈u). instead it appears that the observed imbalances in frequencies of complementary bases reflect the progressive depletion of c residues and accumulation of u's by the apobec-like mutational process on the genomic (+) strand of coronaviruses. 
culminating in the compositionally highly abnormal hku sequences ( ), this appears to have driven down the g+c content of coronaviruses as low as % while remarkably leaving g and a frequencies more or less unaltered (fig. ) . intriguingly, the bat-derived coronaviruses along with the recently zoonotically transferred viruses into humans show the least degree of compositional asymmetry. the expansions in apobec gene numbers, extensive positive selection and the consequent variability in apobec nucleic acid targeting ( ) may indeed create distinct selection pressures on coronaviruses in different hosts. the immediate appearance of c->u hypermutation in sars-cov- and sars-cov genomes in humans may therefore represent the initial effects of replication in a more hostile internal cellular environment than that found in the better co-adapted, virus-tolerised immune system of a bat ( ). zoonotic origins are suspected for other human coronaviruses but at more remote times ( ) ; perhaps they have taken their mutational and adaptive journeys already. coronaviruses and relatives infecting other species (blue circles) and a collection of bat sars-like viruses (grey circle). a power law line of best fit showed a significant correlation between divergence and dn/ds ratio (p = . ). relationship between g+c content and frequencies of individual bases in coronaviruses. the associations between c depletion and u enrichment with g+c content were both significant by linear regression at p = x - and p = x - respectively. no significant associations were observed between g+c content and g (p = . ) or a (p = . ) frequencies. arrows are colour coded as for fig. . 
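the figure legend above refers to a power law line of best fit between divergence and dn/ds ratio. the fitting procedure is not stated in the text; one common approach, assumed here purely for illustration, is ordinary least squares on log-transformed values, since y = a * x^b becomes log y = log a + b log x. the function name and the data points below are hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y).

    Assumes all x and y are strictly positive. Returns (a, b).
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log a
    return a, b


# synthetic points lying exactly on y = 3 * x**-0.5 (not data from the study)
divergence = [1.0, 2.0, 4.0, 8.0]
ratio = [3.0 * d ** -0.5 for d in divergence]
a, b = fit_power_law(divergence, ratio)
print(round(a, 6), round(b, 6))
```

a negative exponent b corresponds to a ratio that falls as divergence increases.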
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia a novel coronavirus from patients with pneumonia in china a new coronavirus associated with human respiratory disease in china coronaviruses lacking exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics infidelity of sars-cov nsp -exonuclease mutant virus replication is revealed by complete genome sequencing high fidelity of murine hepatitis virus replication is decreased in nsp exoribonuclease mutants severe acute respiratory syndrome coronavirus sequence characteristics and evolutionary rate estimate from maximum likelihood analysis genetic evolution and tropism of transmissible gastroenteritis coronaviruses characterization and evolution of porcine deltacoronavirus in the united states spread, circulation, and evolution of the middle east respiratory syndrome coronavirus analysis of error profiles in deep next-generation sequencing data adenosine deaminases acting on rna (adars) are both antiviral and proviral the cytidine deaminase cem induces hypermutation in newly synthesized hiv- dna broad antiretroviral defence by human apobec g through lethal editing of nascent reverse transcripts dna deamination mediates innate immunity to retroviral infection g-->a hypermutation of the human immunodeficiency virus type genome: evidence for dctp pool imbalance during reverse transcription apobecs and virus restriction evolution of the primate apobec a cytidine deaminase gene and identification of related coding regions ancient adaptive evolution of the primate antiviral dna-editing enzyme apobec g apobec b and apobec f inhibit l retrotransposition by a dna deamination-independent mechanism apobec-mediated editing of viral rna the innate antiviral factor apobec g targets replication of measles, mumps and respiratory syncytial viruses apobec g is a restriction factor of ev and mediator of imb-z antiviral activity apobec 
-mediated restriction of rna virus replication apobec a cytidine deaminase induces rna editing in monocytes and macrophages mitochondrial hypoxic stress induces widespread rna editing by apobec g in natural killer cells the double-domain cytidine deaminase apobec g is a cellular site-specific rna editing enzyme deamination hotspots among apobec family members are defined by both target site sequence context and ssdna secondary structure cytosine deamination and selection of cpg suppressed clones are the two major independent biological forces that shape codon usage bias in coronaviruses on the biased nucleotide composition of the human coronavirus rna genome antiviral immune responses of bats: a review hosts and sources of endemic human coronaviruses sse: a nucleotide and amino acid sequence analysis platform mega : molecular evolutionary genetics analysis version . the work was supported by a wellcome investigator award grant wt ma. key: cord- -melz fq authors: sun, weitao title: the discovery of gene mutations making sars-cov- well adapted for humans: host-genome similarity analysis of genomes from china, the usa and europe date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: melz fq severe acute respiratory syndrome coronavirus (sars-cov- ), a positive-sense single-stranded virus approximately kb in length, causes the ongoing novel coronavirus disease- (covid- ). studies confirmed significant genome differences between sars-cov- and sars-cov, suggesting that the distinctions in pathogenicity might be related to genomic diversity. however, the relationship between genomic differences and sars-cov- fitness has not been fully explained, especially for open reading frame (orf)-encoded accessory proteins. rna viruses have a high mutation rate, but how sars-cov- mutations accelerate adaptation is not clear. 
this study shows that the host-genome similarity (hgs) of sars-cov- is significantly higher than that of sars-cov, especially in the orf and orf genes encoding proteins antagonizing innate immunity in vivo. a power law relationship was discovered between the hgs of orf b, orf , and n and the expression of interferon (ifn)-sensitive response element (isre)-containing promoters. this finding implies that high hgs of sars-cov- genome may further inhibit ifn i synthesis and cause delayed host innate immunity. an orf ab mutation, g>t, which occurred in virus populations with high hgs but rarely in low-hgs populations, was identified in genomes with geolocations of china, the usa and europe. the g>t caused the amino acid mutation m f in the transmembrane protein nsp . the results suggest that the orf and orf genes and the mutation m f may play important roles in causing covid- . the findings demonstrate that hgs analysis is a promising way to identify important genes and mutations in adaptive strains, which may help in searching potential targets for pharmaceutical agents. in december , a novel coronavirus sars-cov- was reported as the cause of covid- . sars-cov- has a positive-sense single-stranded rna with a length of approximately kb [ ] . studies have shown that considerable genetic diversity exists between sars-cov- and sars-cov [ , ] . compared with sars-speculated that some genes of the two viruses may also exist in the human genome or that the viruses may accelerate adaptation in humans through increasing hgs of the orf and orf genes and selecting the m f mutation. however, the underlying mechanism by which these genes and mutations make sars-cov- more adapted to humans remains unclear. . ( ) here, the e value represents the expected number of times when two random sequences of length m and n are matched and the score is not lower than s'. parameters k and λ describe the statistical significance of the results [ ] . 
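for reference, the standard karlin-altschul statistics used by blast relate the quantities named above (e, m, n, s', k and λ) as shown below. this is the textbook form, written with the same symbols; it is assumed, not confirmed, to correspond to the relation the text refers to. s denotes the raw alignment score, a symbol not used in the text itself.

```latex
% raw score S -> expected number of chance hits
E = K \, m \, n \, e^{-\lambda S}

% normalised (bit) score
S' = \frac{\lambda S - \ln K}{\ln 2}

% substituting S' into the first relation
E = m \, n \, 2^{-S'}
```

the third line follows from the first two because e^{-\lambda S} = 2^{-S'} / K.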
assuming that the fragment of length matches perfectly in the two random sequences, one has the following formula: . ( ) since the viral genome is quite different from the human genome, matching fragments are usually very short. when is particularly small compared to and , is obtained by combining equation ( ) the sars-cov- (genbank: mn . ) and sars-cov (genbank: ay . ) rna sequences were used as references to establish the genome organization. sars-cov- has '-orfs, while sars-cov has '-orfs. the length of each orf is no less than nt ( table ) . a quantitative definition of hgs was proposed to investigate the similarity between viral coding sequences (cdss) and the human genome (homo sapiens grch .p chromosomes). the cds alignment scores were determined by using ncbi blastn[ ], and hgs was calculated by the formulas described in the methods for each orf in the coronavirus genome. the overall hgs of a full-length virus genome was obtained by the weighted sum of orf hgss. the weighting factor was the ratio of orf length to the full-genome length. the orf lengths of sars-cov and sars-cov- genomes are given in table . [table : orf lengths of sars-cov and sars-cov- genomes] decreased rapidly with increasing hgs (fig ) , which provided evidence that there was a power law of all the gene mutations, the orf ab g>t(ttg>ttt) mutation is the most interesting. this mutation survived in all three regions (fig ) . in addition, this mutation occurred only in the high hgs population rather than in that with a lower hgs ( . this g>t orf ab mutation caused an amino acid mutation, m f, in the nonstructural protein nsp , which is located in a loop between the first and second transmembrane domains on the n- terminal side (fig ) . this finding strongly suggested that the g>t (m f) mutation survived a the discovery of increased hgs of orf and orf provides strong evidence that sars-cov- evolved to be more adaptable to humans than sars-cov. 
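the whole-genome hgs is described above as a weighted sum of the per-orf hgs values, with each weight equal to the orf length divided by the full-genome length. a minimal python sketch of that weighting follows; the function name, orf labels and all numbers are hypothetical placeholders, and the per-orf hgs values are taken as given inputs because their defining formulas are not reproduced here.

```python
def genome_hgs(orf_hgs, orf_lengths, genome_length):
    """Length-weighted sum of per-ORF HGS values.

    orf_hgs:       dict mapping ORF name -> HGS of that ORF
    orf_lengths:   dict mapping ORF name -> ORF length in nucleotides
    genome_length: full-genome length in nucleotides
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(
        orf_hgs[name] * orf_lengths[name] / genome_length
        for name in orf_hgs
    )


# hypothetical two-ORF example (values are illustrative only)
hgs = {"orfA": 0.5, "orfB": 1.0}
lengths = {"orfA": 100, "orfB": 300}
print(genome_hgs(hgs, lengths, genome_length=1000))
```

because the weights are fractions of the full-genome length, they sum to less than one whenever the orfs do not cover the whole genome.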
based on these findings, the following conjecture is proposed: the sars-cov- genes involved in suppressing the host's innate immunity are more powerful. a new coronavirus associated with human respiratory disease in china evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission. science china life sciences temporal dynamics in viral shedding and transmissibility of covid- complete nucleotide sequence of sv dna the nucleotide sequence of repetitive monkey dna found in defective simian virus genome sequence of a human tumorigenic poxvirus: prediction of specific host response-evasion genes preliminary study on genome homology of viruses and human the evolution of large dna viruses: combining genomic information of viruses and their hosts immunity and immunopathology to viruses: what decides the outcome viral evasion of antigen presentation: not just for peptides anymore sars-coronavirus open reading frame- b triggers intracellular stress pathways and activates nlrp inflammasomes syndrome (sars) coronavirus orf protein is acquired from sars-related coronavirus from greater horseshoe bats through recombination expression, post-translational modification and biochemical characterization of proteins encoded by subgenomic mrna of the severe acute respiratory syndrome coronavirus dysregulated type i interferon and inflammatory monocyte-macrophage responses cause lethal pneumonia in sars-cov-infected mice the establishment of reference sequence for sars-cov- and variation analysis emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant the single-cell rna-seq data analysis on the receptor ace expression reveals the potential risk of different human organs vulnerable to wuhan -ncov infection frontiers of specific ace expression in cholangiocytes may cause liver damage after -ncov infection priapism in a patient with coronavirus disease (covid- ): a case report 
clinical characteristics of novel coronavirus infection in china sars coronavirus spike protein-induced innate immune response occurs via activation of the nf-κb pathway in human monocyte macrophages in vitro human coronavirus: host-pathogen interaction. annual review of microbiology innate immune evasion by human respiratory rna viruses a molecular arms race between host innate antiviral response and emerging human coronaviruses immune evasion of porcine enteric coronaviruses and viral modulation of antiviral innate signaling saitoh t, akira s. regulation of innate immune responses by autophagy-related proteins human coronaviruses: a review of virus-host interactions viral innate immune evasion and the pathogenesis of emerging rna evasion of host innate immunity by emerging viruses: antagonizing host rig-i shown with special markers at the top of colored blocks representing orfs. mutation g>t in orf ab (codon ttg>ttt) occurred in populations with high hgs, which results in amino acid m f mutation in transmembrane protein nsp . the mutation rarely mutation profile for sars-cov- genomes (geolocation of europe) with different hgs out of a total of viral genomes, genomes have unique hgs values. a total of mutations were identified in all the genomes. the top conserved mutations with were shown with special markers at the top of colored blocks representing orfs. mutation orf ab (codon ttg>ttt) occurred in populations with high hgs, which results in amino acid m f mutation in transmembrane protein nsp . the mutation rarely occurred in populations with low/moderate hgs highly conserved mutations identified in sars-cov- genomes with geolocations of the three regions have different sets of mutations. the ttt (f phenylalanine) mutation occurred in all three regions. ttt represents the mutation g>t(ttg>ttt) in orf ab. the f in the circle represents the amino acid mutation methionine to phenylalanine) in nonstructural protein nsp . 
the p, h, +, -and s in brackets in the legend represent polar, hydrophobic, positively charged, negatively charged and special residues the topology of transmembrane protein nsp and the identified m f mutation located in a loop between the first and second the accession number and corresponding hgs of sars-cov- genomes with geolocation of china. filename is datasets _china_sars-cov- _nstrain _orfhgs_allinone.xls. the file contains accession id, collection date, location, hgs values for orfs the accession number and corresponding hgs of sars-cov- genomes with geolocation of the usa. filename is datasets _usa_sars-cov- _nstrain _orfhgs_allinone.xls. the file contains accession id, collection date, location, hgs values for orfs the accession number and corresponding hgs of sars-cov- genomes with geolocation of europe. filename is datasets _europe_sars-cov- _nstrain _orfhgs_allinone.xls. the file contains accession id, collection date, location, hgs values for orfs the accession number and corresponding hgs of sars-cov genomes. filename is datasets _sars-cov_nstrain _orfhgs_allinone.xls. the file contains accession id, hgs values for orfs the weighted hgs of the whole genome key: cord- -atisrhas authors: fedorenko, aliza; grinberg, maor; orevi, tomer; kashtan, nadav title: virus survival in evaporated saliva microdroplets deposited on inanimate surfaces date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: atisrhas the novel coronavirus respiratory syndrome (covid- ) has now spread worldwide. the relative contribution of viral transmission via fomites is still unclear. sars-cov- has been shown to survive on inanimate surfaces for several days, yet the factors that determine its survival on surfaces are not well understood. 
here we combine microscopy imaging with virus viability assays to study survival of three bacteriophages suggested as good models for human respiratory pathogens: the enveloped phi (a surrogate for sars-cov- ), and the non-enveloped phix and ms . we measured virus viability in human saliva microdroplets, sm buffer, and water following deposition on glass surfaces at various relative humidities (rh). although saliva microdroplets dried out rapidly at all tested rh levels (unlike sm that remained hydrated at rh ≥ %), survival of all three viruses in dry saliva microdroplets was significantly higher than in water or sm. thus, rh and hydration conditions are not sufficient to explain virus survival, indicating that the suspended medium, and association with saliva components in particular, likely affect physicochemical properties that determine virus survival. the observed high virus survival in dry saliva deposited on surfaces, under a wide range of rh levels, can have profound implications for human public health, specifically the covid- pandemic. the novel human coronavirus that emerged in china in late caused a pandemic. human saliva microdroplets expelled into the air through coughing, talking, and breathing are considered a key source of transmission of the virus , . these microdroplets, ranging in size between few micrometers up to millimeters [ ] [ ] [ ] travel in the air, while some of them -larger ones in particular -settle on surfaces . thus, inanimate surfaces are a potential medium of virus transmission , , , . like many other viruses, sars-cov- has been shown to survive well and remain viable on various surfaces, e.g., metal, glass, and plastic, for up to several days [ ] [ ] [ ] . while not necessarily viable, sars-cov- rna has been detected on surfaces in contaminated sites such as hospitals [ ] [ ] [ ] [ ] . the factors that affect virus survival in droplets that settle on surfaces are complex and not well understood. 
survival is expected to depend upon the physicochemical characteristics and hydration conditions of the immediate microscopic environment of the virion. these in turn are determined by several factors, including the composition of the fluid comprising the droplets, the surface properties, the ambient temperature, and the relative humidity (rh). a preeminent source for expelled droplets is human saliva, a complex solution that contains salts, a variety of proteins, and surfactants [ ] [ ] [ ] [ ] . it has been suggested that micrometer-sized dry deposits of saliva droplets on surfaces protect virions . still, it is not clear how well viruses survive in saliva microdroplets following their deposition on surfaces, and what factors are in play therein. the impact of rh on the stability and viability of sars-cov- and other enveloped viruses has been studied mostly in airborne droplets or aerosols [ ] [ ] [ ] [ ] [ ] [ ] [ ] and less on surfaces [ ] [ ] [ ] . while survival varies between virus species, increased survival at both low (< %) and very high (> %) rh is often observed, with decreased survival at intermediate rh levels [ ] [ ] [ ] , . the underlying mechanism of this u-shaped survival as a function of rh is not clear . only a few studies have attempted to gain mechanistic understanding of how rh affects virus stability in microdroplets. prominent among these are the pioneering studies of the marr group , , , , which point to the role of the solvent composition, evaporation dynamics, and rh on virus survival in aerosols and sessile droplets. a key factor determining the evaporation rate and equilibrium hydration level of drying droplets on surfaces (and in air as well) is the deliquescence property of solutes, or mostly highly hygroscopic salts [ ] [ ] [ ] . 
accordingly, although many surfaces in the indoor environment appear dry, they are in fact covered by thin liquid films and micrometer-sized droplets, invisible to the naked eye, known as microscopic surface wetness, or msw . msw can be considered the "envelope" that accommodates microorganisms on surfaces, and as such has a profound impact on many aspects of microbial life. for example, it can protect microbes from complete desiccation. however, msw is a harsh microenvironment that differs in its properties from bulk liquid. msw inherently arises from drying liquids that evaporate on a surface. this drying process is accompanied by changes in physicochemical properties such as solute concentrations, ph, reactive oxygen species, surface tension, and others. at the microscale, gradients, local densities, and surface irregularity introduce heterogeneity to msw environments , [ ] [ ] [ ] . collectively, msw imposes severe stresses on microbes therein -including viruses -and affects their survival , , , , . the current study, motivated by the urgent call for the scientific community to address sars-cov- spread, aims to explore the link between microscopic surface wetness and virus survival therein. we focus on two variables that directly affect msw -solution composition and rh levels -and seek to determine whether and how they are reflected in virus survival trends. we used bacteriophages that have been suggested as good, safe, and easy-to-work-with model surrogates for studying surface and air survival of pathogenic viruses [ ] [ ] [ ] . we chose to study three phages: the enveloped phi -proposed as a good surrogate for sars coronaviruses -and two other tailless non-enveloped model bacteriophages as a reference. phi is a dsrna phage of the cystoviridae family that has been suggested as a good surrogate for studying enveloped rna viruses , , , [ ] [ ] [ ] ; similar to sars-cov- , it is enveloped by a lipid membrane, has spike proteins, and is of similar size (~ - nm).
the other virus strains we used are the well-studied, non-enveloped ms (ssrna; leviviridae) , - and phix (ssdna; microviridae) , . to better understand how rh and the solution composition of microdroplets affect virus survival on surfaces, we combine microscopy imaging to assess msw state with plaque assays to determine virus infectivity. we compare survival in 'sprayed' microdroplets of three suspensions: human saliva, sm buffer, and 'pure' water under a range of rh ( %- %) relevant to the indoor environment. the link between the microscopic environment of viruses, including hydration conditions, and virus viability is discussed. bacteriophages phi (dsm- ), phix (dsm- ), and ms (dsm- ) were purchased from dsmz. vacuum-dried phages were revitalized according to dsmz instructions. bacteriophage propagation: overnight bacterial host cultures were diluted at : into ml fresh media and grown to od . . a single plaque, suspended in µl sm buffer, was used to inoculate the fresh culture. the first propagation was shake incubated at the appropriate temperature for the host strain for hrs, or until lysis was observed. meanwhile, the overnight cultures were diluted again in ml fresh media and grown to od . . the first propagation was then diluted with the ml fresh host cultures and grown for hrs or until lysis was observed. the second propagation was centrifuged ( phage stock solution ( pfu/ml for phi ; and ms and pfu/ml for phix ) were diluted fold with: ( ) sm buffer ( mm nacl, mm mgso × h o, mm tris-cl ph . , . % w/v gelatin) ( ) filter-sterilized h o (w , sigma) and ( ) natural human saliva (donated by one of the authors). fluorescent beads ( µm, melamine resin labeled by rhodamine b, fluka) were added to the suspension with a x final dilution. solutions were loaded into ml refillable spray bottles (purchased at a local cosmetics store) and a portion of the load was sprayed on a -well glass bottom plate (p - . h-n, cellvis). 
each well was sprayed by pressing the spray nozzle twice, delivering a volume of ~ µl. spray was applied to the well through a ml falcon tube from whose conical end we had cut off . cm. the plates were placed without the cover lid inside a sealed plastic box. a ml volume of saturated salt solution was placed in the bottom of each box to maintain relative humidity of , , , and % (potassium acetate, potassium carbonate, sodium bromide, and ammonium chloride respectively). the boxes were placed in an incubator set at °c for hrs. control: spray bottles with the remaining (unsprayed) suspensions were left in incubation for hrs and sprayed onto the -well plates at the end of the incubation period. all plates were imaged at the end of the -hr incubation period and then resuspended in µl of sm buffer for the drop plaque assay. plates containing the bottom agar layer were poured in advance (tsb or lb with g/l agar, mm mgso , and mm cacl ). on the day of the experiment, overnight bacterial host cultures were diluted at : into ml fresh media and shake incubated until they reached od . . meanwhile, the top agar (tsb or lb with . g/l agar, mm mgso , and mm cacl ) was melted and kept in a °c water bath. the bacterial culture was combined with the top agar at a ratio of : , and poured on top of the bottom layer. the phages were serially diluted in sm buffer, and after the top agar solidified, either µl, µl, or µl were pipetted and spread out onto either one quarter, one half, or a full plate ( x cm petri dish) respectively. plates were left open until dry and incubated overnight at the appropriate temperature for the host strain (as described above). to study virus survival in microdroplets deposited on a smooth inanimate surface, we sprayed phi , ms , and phix viruses suspended in three media -human saliva, water, and sm buffer -on glass-bottom -well plates (fig. , methods) .
fluorescent beads ( µm in diameter), at a concentration equivalent to that of the phi and ms viruses (~ /ml), were added to the suspension for two purposes: (i) to mimic micrometer-sized particulates (e.g., bacterial cells) that are spread on real-world surfaces and have been shown to affect the formation of microscopic surface wetness ; and (ii) to help interpret the virion distribution within droplets. sprayed microdroplet size ranged from tens to hundreds of microns in diameter (fig. a,b) , which falls within the range of respiratory fluid microdroplets that are exhaled while coughing, speaking, and breathing , and that gravitate toward surfaces (i.e., not the < µm aerosols). as this study's emphasis is virus survival in the indoor environment, we chose to work at a temperature of ℃ and a range of rh ( % to %) that spans most indoor environments . the sprayed well plates were placed under constant temperature and rh conditions for hrs; subsequently, microscopy images were taken and virus survival was estimated by the plaque assay using the corresponding bacterial host as a reporter for infectivity ( fig. , methods). to better understand the microscale hydration state that viruses experience, we first examined the surfaces under the microscope hrs post deposition. representative images of the surface at t = hrs are shown in fig . both the saliva and water microdroplets appeared completely dry at all tested rhs. as can be seen in fig. , saliva microdroplets left a thin layer deposit (~ - µm thickness) of dry matter wherein natural bacterial flora could be seen as well as other aggregated substances, some, but not all, of which are clearly salt crystals (somewhat similar to ). beads were dispersed fairly uniformly within these microdroplet deposits, as previously demonstrated for virions .
while the beads are at least an order of magnitude larger than individual virions, their visualization can help us grasp the concentration of virions in droplets, and possibly their spatial distribution (assuming lack of virion aggregation, see discussion). a back-of-the-envelope calculation estimated ~ virions in a -µm droplet, which is consistent with the observed bead distribution (fig. ) . water microdroplets dried out at all rh levels. this observation is not surprising, as the water suspension contained very low concentrations of deliquescent solutes (e.g., salts). at rh = %, some tiny microdroplets (a few µm in diameter) were observed around single beads or small bead clusters (fig. ) . these tiny water microdroplets were likely retained due to strong capillary forces. in contrast, sm buffer microdroplets were hydrated at % rh and above (fig. ) . stable microdroplets of tens of µm in diameter could be clearly seen (fig. ) . at rh = %, some of the microdroplets, but not all, dried out and salt crystals were observed. at rh = %, all microdroplets dried out and only salt crystals were seen (fig. ) . these observations can be explained by the point of efflorescence of nacl, the major solute in sm buffer: ~ - % rh . next, we estimated virus survival under all tested conditions. the deposits on the bottom surface of each well were resuspended, and viral infectivity levels (pfu/ml) in the suspensions were evaluated by the plaque assay ( fig. -see methods). virus viability in sprayed droplets (saliva, water, and sm buffer) was compared with that of suspended virus in bulk medium that was kept in sealed tubes throughout the duration of the experiment as a control. strikingly, the highest survival of all tested viruses was found in evaporated saliva microdroplet deposits with less than . order of magnitude reduction in infectivity. a non-monotonic u-shaped survival as a function of rh was observed for phi and phix with highest survival at % (even somewhat higher than the control).
this is consistent with previous observations on superhydrophobic surfaces . similar to saliva, water microdroplets were dry at all rh levels. this appears to have a large impact on the enveloped virus phi , which showed more than orders of magnitude reduction in infectivity (fig. ) . in contrast, the two non-enveloped viruses showed lower reduction in infectivity ( - orders of magnitude, fig. ). while phix survival in evaporated water droplets increased with rh, hydration conditions did not show any noticeable change (fig. ) . ms survival in water was lower than that in saliva, with a u-shaped rh dependence. in the sm buffer, while microdroplets remained hydrated at % and % rh (fig. ) , survival of all three viruses was lower than in saliva under the corresponding rh levels. the only exception was phix , which exhibited similar survival levels in sm and saliva at % rh. overall, phi , phix , and ms in sm buffer deposits showed - orders of magnitude reduction in infectivity across the tested rh range, exhibiting the familiar u-shaped trend. aiming to obtain some mechanistic insights into factors that determine virus survival in microdroplets deposited on surfaces, we explored a wide range of rh, three types of solutions, and three model viruses (enveloped and non-enveloped). combining surface microscopy imaging and infective virus enumeration, we were able to explore the link between the tested variables as manifested with respect to microscopic surface wetness and virus survival. our results indicate that rh and hydration conditions are not sufficient to explain virus survival in microdroplets deposited on surfaces. for example, the log reduction of phi at % rh was ≈ , ≈ , and ≈ when suspended in saliva, sm buffer, and h o respectively. this is in contrast to high and comparable virus survival in bulk solution (control) of all tested media (fig. ) . 
this implies that the physicochemical changes that characterize a microdroplet's drying process have a pivotal implication for virus survival. thus, the effect of rh on virus survival is context dependent. likewise, sm droplets were hydrated at % rh and completely desiccated at % rh. nonetheless, phi survival under all of these widely differing hydration states was comparable (and all showing ~ - log reduction). we conclude that water availability and hydration status of a droplet cannot explain virus survival on its own. the observation that at a given rh, the microscopic hydration conditions of deposited droplets of various media can differ so widely (see along the rows of fig. ) suggests that rh does not directly affect virus stability and infectivity in drying microdroplets deposited on surfaces, but rather rh indirectly affects survival through its effect on physicochemical conditions at the scale that matters for viruses (~ µm). remarkably, both enveloped and non-enveloped viruses survive well in evaporated human saliva microdroplets deposited on inanimate surfaces at a wide range of rh levels. the observation that saliva droplets were completely dry even at % rh was somewhat surprising, as we expected saliva to effloresce at - % rh . human saliva is a complex fluid that somehow protects viruses in drying droplets and aerosols , . a rough approximation estimates that in our experiments, droplets of ~ microns in diameter contained around virions. thus, the mass of salts, proteins, and surfactants in these droplets are a few orders of magnitude higher than the total virion mass . as suggested by marr et al., association between virion and saliva proteins (or other components) can protect the virions for prolonged periods (even days) in such dry saliva droplets on surfaces. in addition, the lower content of inorganic salts in saliva than that in sm buffer may explain part of the large differences in virion survival between the two solutions. 
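the rough virions-per-droplet approximation mentioned above can be sketched as a short calculation. this is only an illustration under stated assumptions: the droplet is treated as a sphere before deposition, virions are assumed to be uniformly suspended and non-aggregated, and the titer and diameter used in the example are placeholder values, not the values of this study (which are not recoverable from the text here). the function names are hypothetical.

```python
import math


def droplet_volume_ml(diameter_um):
    """Volume of a spherical droplet in mL, given its diameter in micrometers.

    Assumption: the droplet is a sphere (V = pi/6 * d^3).
    Unit conversion: 1 mL = 1 cm^3 = 1e12 um^3.
    """
    volume_um3 = math.pi / 6.0 * diameter_um ** 3
    return volume_um3 * 1e-12


def expected_virions(titer_per_ml, diameter_um):
    """Expected number of virions in one droplet.

    Assumption: virions are uniformly distributed in the suspension,
    so the count is simply concentration times droplet volume.
    """
    return titer_per_ml * droplet_volume_ml(diameter_um)


if __name__ == "__main__":
    # Placeholder inputs for illustration only (not taken from the study).
    example_titer = 1e8       # virions (or PFU) per mL
    example_diameter = 100.0  # micrometers
    n = expected_virions(example_titer, example_diameter)
    print(f"expected virions per droplet: {n:.1f}")
```

because the count scales with the cube of the diameter, a tenfold smaller droplet carries a thousandfold fewer virions at the same titer, which is why the estimate is sensitive to the droplet size distribution.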
we remark that bacteriophage survival in human saliva is puzzling, considering that bacteriophages are not expected to be selected by evolution for high survival in human saliva, unless they infect bacteria of the human microbiome. we hypothesize that the structural stability of both human viruses and bacteriophages is an old, shared trait that is deeply entrenched in the evolution of viruses. a second remarkable result is the very low survival (~ log reduction) of phi in the evaporated water microdroplets, regardless of rh. this is in stark contrast to the non-enveloped viruses, which showed only moderate log reduction (~ ) and moderate survival under high rh ( %). this result may indicate damage associated with the lipid membrane of the enveloped phage under these conditions. we note that in the bulk water, survival of phi was high and comparable to that in sm buffer and saliva. a possible explanation is the ph changes that occur during evaporation. note that virus viability requires both physical stability of the virion and functionality, i.e., the ability to infect host cells and replicate. these in turn likely require stability of the capsid or envelope (depending on virus type), and that the relevant proteins (e.g., spike proteins), which play a role in attachment to host cells, remain functional. while the results of this study could not distinguish between stability and infectivity, the differences between enveloped and non-enveloped viruses suggest that reduction in physical stability plays a role in virus viability. another less-understood issue is viral aggregation , . spontaneous aggregation of virions, for example as a response to changes in salt concentrations and ph , , , may occur in drying microdroplets, and thus may play a role in virus survival. we speculate that with lipid-enveloped viruses suspended in water, aggregation of viruses driven by hydrophobicity or other mechanisms might play a vital role.
if significant viral aggregation occurs, it has implications for the interpretation of pfu/ml survival estimates (true for any virus survival study), and for the distribution of virions in drying droplets. large virion aggregates (> tens of virions) might also affect microdroplets' drying dynamics due to capillary pinning, as we have shown for bacteria . the results of this study provide important insights concerning the covid- pandemic. although the study was performed with a surrogate for sars-cov- , it indicates, along with other studies, that sars-cov- survives well in evaporated saliva droplets on inanimate surfaces. because sars-cov- transmission appears to depend upon viral load, it is likely that in indoor environments where infected individuals stay for long periods, viable viruses persist on fomites for days. thus, unless proven otherwise, indirect transmission through inanimate surfaces -in particular those with prolonged and high contact -is not unlikely, and must be considered. a source of optimism may lie in the injurious impact of evaporating water droplets on the enveloped phage phi , an observation that should be confirmed with sars-cov- . human saliva were applied onto a glass-bottom -well plate by a spraying device, resulting in microdroplets ranging in size from ~ microns to ~ mm in diameter. plates were transferred to sealed containers that maintained specific rh conditions (using saturated salt solutions), and the containers were placed in an incubator to maintain constant temperature ( °c) for hrs. at the end of the incubation period, the surface of the plates was imaged by microscopy, and subsequently the deposits in the wells were resuspended and phage concentration was determined by the plaque assay. typical shape and size of sm buffer, h o, and saliva a few minutes after spray application onto a glass surface. (b) droplet size distribution calculated based on the images shown in panel a.
typical droplet size ranges from a few tens to hundreds of microns in diameter, which is within the range of expelled saliva microdroplets that are expected to gravitate more rapidly toward surfaces (as opposed to aerosols, which remain suspended in the air for longer). incubation at a range of rh levels. sm buffer droplets remained hydrated at % rh and above, partially hydrated at % rh, and completely dry at % rh (crystallization of the medium salts can be seen in dried droplets). h o and saliva droplets appeared to be dry under all tested rh conditions. fluorescent microbeads ( µm) (red) were added at a concentration similar to the virus concentration, and thus may help to visualize the expected distribution of viruses within evaporated microdroplets. and % rh at °c. a - titer of bacteriophage suspended in sm buffer, h o, or human saliva was applied by spraying on a glass surface. a control sample was left suspended in the spraying device for the duration of the incubation period. following the incubation period, phage concentration was determined by plaque assay (mean ± sd of four replicates). right panel: relative reduction of infectivity of phi , phix , and ms in sm buffer (green), h o (red) and saliva (blue) (mean ± sd of four replicates).
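the infectivity measures used in the captions above (plaque-assay titer, and reduction relative to the control reported as mean ± sd of replicates) can be sketched as follows. this is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the formula for titer is the standard plaque count divided by dilution and plated volume, and averaging the per-replicate log reductions is one reasonable convention that the text does not specify. all numbers in the example are placeholders.

```python
import math
from statistics import mean, stdev


def titer_pfu_per_ml(plaques, dilution_factor, plated_volume_ml):
    """Titer in PFU/mL from a plaque count at a single dilution.

    dilution_factor is the fraction of the original suspension in the
    plated sample (e.g. 1e-4 for a 10,000-fold dilution).
    """
    return plaques / (dilution_factor * plated_volume_ml)


def log10_reduction(control_titer, sample_titers):
    """Mean and sample SD of log10(control / sample) across replicates.

    Assumption: every replicate titer is above the detection limit
    (a zero titer would make the logarithm undefined).
    """
    reductions = [math.log10(control_titer / t) for t in sample_titers]
    return mean(reductions), stdev(reductions)


if __name__ == "__main__":
    # Placeholder values for illustration only (not data from the study).
    control = titer_pfu_per_ml(plaques=50, dilution_factor=1e-4, plated_volume_ml=0.01)
    replicates = [1e6, 1e6, 1e5, 1e7]  # four hypothetical replicate titers
    m, s = log10_reduction(control, replicates)
    print(f"control titer: {control:.2e} PFU/mL")
    print(f"log10 reduction: {m:.2f} +/- {s:.2f}")
```

a log reduction near zero means infectivity comparable to the control; each additional unit corresponds to a further tenfold loss of infective virus.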
modes of transmission of virus causing covid- : implications for ipc precaution recommendations: scientific brief reducing transmission of sars-cov- on coughing and airborne droplet transmission to humans the size and the duration of air-carriage of respiratory droplets and droplet-nuclei dynamics of airborne influenza a viruses indoors and dependence on humidity potential role of inanimate surfaces for the spread of coronaviruses and their inactivation with disinfectant agents transmission of sars and mers coronaviruses and influenza virus in healthcare settings: the possible role of dry surface contamination aerosol and surface stability of sars-cov- as compared with sars-cov stability of sars-cov- in different environmental conditions stability of sars-cov- on environmental surfaces and in human excreta. medrxiv viral load of sars-cov- in clinical samples environmental contamination of the sars-cov- in healthcare premises: an urgent call for protection for healthcare workers air, surface environmental, and personal protective equipment contamination by severe acute respiratory syndrome coronavirus (sars-cov- ) from a symptomatic patient detection of air and surface contamination by severe acute respiratory syndrome coronavirus (sars-cov- ) in hospital rooms of infected patients. medrxiv dilution of respiratory solutes in exhaled condensates lipids in human saliva biochemical composition of human saliva in relation to other mucosal fluids physico-chemical characteristics of evaporating respiratory fluid droplets droplet evaporation residue indicating sars-cov- survivability on surfaces. 
arxiv effects of air temperature and relative humidity on coronavirus survival on surfaces stability of middle east respiratory syndrome coronavirus (mers-cov) under different environmental conditions survival of the enveloped virus phi in droplets as a function of relative humidity, absolute humidity, and temperature absolute humidity modulates influenza survival, transmission, and seasonality humidity-dependent decay of viruses, but not bacteria, in aerosols and droplets follows disinfection kinetics influenza virus infectivity is retained in aerosols and droplets independent of relative humidity mechanisms by which ambient humidity may affect viruses in aerosols relationship between humidity and influenza a viability in droplets and implications for influenza's seasonality resistance of the melbourne strain of influenza virus to desiccation survival of influenza virus on banknotes mechanistic insights into the effect of humidity on airborne influenza virus survival, transmission and incidence water-solids interactions: deliquescence. annual review of food science and technology water uptake by nacl particles prior to deliquescence and the phase rule phase transitions of aqueous atmospheric particles bacterial survival in microscopic surface wetness role (s) of adsorbed water in the surface chemistry of environmental interfaces prebiotic condensation through wet-dry cycling regulated by deliquescence variation in ph of model secondary organic aerosol during liquid-liquid phase separation some factors affecting the survival of airborne viruses stability of the covid- virus under wet, dry and acidic conditions. 
medrxiv comparison of five bacteriophages as models for viral aerosol studies inactivation modeling of human enteric virus surrogates, ms , qβ, and Φx , in water using uvc-leds, a novel disinfecting system transmission of viruses via contact in a household setting: experiments using bacteriophage φx as a model virus evaluation of phi persistence and suitability as an enveloped virus surrogate pseudomonas bacteriophage phi as a model for virus emergence the use of bacteriophages of the family cystoviridae as surrogates for h n highly pathogenic avian influenza viruses in persistence and inactivation studies effects of ph on plaque forming unit counts and aggregation of ms bacteriophage evaluation of murine norovirus, feline calicivirus, poliovirus, and ms as surrogates for human norovirus in a model of viral persistence in surface water and groundwater the three-dimensional structure of the bacterial virus ms atomic structure of single-stranded dna bacteriophage φx and its functional implications the effect of environmental parameters on the survival of airborne infectious agents a mathematical model for predicting the viability of airborne viruses artificial saliva and its use in biological experiments viral aggregation: impact on virus behavior in the environment collective viral spread mediated by virion aggregates promotes the evolution of defective interfering particles viral aggregation: effects of salts on the aggregation of poliovirus and reovirus at low ph we thank adi stern for kindly providing the host e. coli strain of ms . n. k. is supported by research grants from the james s. mcdonnell foundation (studying complex systems scholar award, grant # ) and from the israel science foundation (isf # / ). key: cord- - zui sl authors: hillen, hauke s.; kokic, goran; farnung, lucas; dienemann, christian; tegunov, dimitry; cramer, patrick title: structure of replicating sars-cov- polymerase date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: zui sl the coronavirus sars-cov- uses an rna-dependent rna polymerase (rdrp) for the replication of its genome and the transcription of its genes. here we present the cryo-electron microscopic structure of the sars-cov- rdrp in its replicating form. the structure comprises the viral proteins nsp , nsp , and nsp , and over two turns of rna template-product duplex. the active site cleft of nsp binds the first turn of rna and mediates rdrp activity with conserved residues. two copies of nsp bind to opposite sides of the cleft and position the rna duplex as it exits. long helical extensions in nsp protrude along exiting rna, forming positively charged ‘sliding poles’ that may enable processive replication of the long coronavirus genome. our results will allow for a detailed analysis of the inhibitory mechanisms used by antivirals such as remdesivir, which is currently in clinical trials for the treatment of coronavirus disease (covid- ). coronaviruses are positive-strand rna viruses that pose a major health risk . the novel severe acute respiratory syndrome coronavirus- (sars-cov- ) , has caused a pandemic of the disease referred to as coronavirus disease . coronaviruses use an rna-dependent rna polymerase (rdrp) complex for the replication of their genome, and for sub-genomic transcription of their genes , . this rdrp complex is the target for antiviral nucleoside analogue inhibitors, such as remdesivir , . remdesivir shows antiviral activity against coronaviruses in cell culture and animals , inhibits coronavirus rdrp , , and is currently being tested in the clinic as a drug candidate for treating covid- patients [ ] [ ] [ ] . the rdrp of sars-cov- is composed of a catalytic subunit called non-structural protein (nsp) , and two accessory subunits, nsp and nsp , . the structure of the rdrp was recently reported and is highly similar to the rdrp of sars-cov , a zoonotic coronavirus that spread into the human population in .
the nsp subunit contains an n-terminal nidovirus rdrp-associated nucleotidyltransferase (niran) domain, an interface domain, and a c-terminal rdrp domain , . the rdrp domain resembles a right hand and comprises the fingers, palm, and thumb subdomains , that are found in all single-subunit polymerases. subunit nsp binds to the thumb, whereas two copies of nsp bind to the fingers and thumb subdomains , . structural information is also available for isolated nsp -nsp complexes , . to obtain the structure of the sars-cov- rdrp in its active form, we prepared recombinant nsp , nsp and nsp (fig. a, experimental procedures) . when added to a minimal rna substrate (fig. b) , the purified proteins gave rise to rna-dependent rna extension activity, which depended on the presence of nsp and nsp (fig. c) . we assembled and purified a stable rdrp-rna complex and collected single-particle cryo-electron microscopy (cryo-em) data (extended data figure , extended data table ). particle classification yielded a d reconstruction at a nominal resolution of . Å and led to a refined structure of the rdrp-rna complex (extended data figures and ) . the structure shows the rdrp enzyme engaged with over two turns of duplex rna (fig. ) . the structure resembles that of the free enzyme , but also reveals large additional protein regions in nsp that became ordered upon rna binding and interact with rna far outside the core enzyme (extended data figure a ). these observations are unique, as rdrp complexes of hepatitis c virus , poliovirus , and norovirus contain only one turn of rna that is however oriented in a similar way (extended data figure b ). our structure shows details of the rdrp-rna interactions (fig. ) . the nsp subunit binds one turn of rna between its fingers and thumb subdomains (fig. a, b) . the active site is located on the palm subdomain and formed by five conserved nsp elements called motifs a-e (fig. b) . 
motif c binds the rna '-end and contains the residues d and d , of which d is known to be essential for rna synthesis . the additional nsp motifs f and g reside in the fingers subdomain and position the rna template (fig. b) b minimal rna substrate that folds into a hairpin with 'template' and 'product' regions. the rna contains a -nucleotide fluorescently labeled '-overhang. c incubation of the rdrp subunits (a) with rna (b) leads to efficient rna extension. rnas were separated on a denaturing acrylamide gel and visualized with a typhoon fla imager. contacts to the rna product strand may retain a short rna during de novo synthesis . as the rna duplex exits from the rdrp cleft, there are no structural elements that restrict its extension, consistent with the production of a double-stranded rna during replication (fig. c) . the protruding rna duplex is flanked by long α-helical extensions that are formed by the highly conserved n-terminal regions in the two nsp subunits (figs. , ) . these prominent nsp extensions reach up to base pairs away from the active site and use positively charged residues to form multiple rna backbone interactions (fig. ) . the two nsp extensions form different rna interactions, arguing for sequence-independent binding. the nsp extensions get ordered along rna, as they are flexible in nsp -nsp complexes , . the interactions of the nsp extensions with exiting rna may explain the processivity of the rdrp, which is required for the replication of the very long rna genome of coronaviruses and other viruses of the nidovirus family . it is known that nsp and nsp confer processivity to nsp , and that mutation of residue k is lethal for the virus . k is located in the nsp extension, and interacts with exiting rna (fig. c) . thus the nsp extensions may be regarded as 'sliding poles' that slide along exiting rna at the rear of the polymerase to prevent premature dissociation of the rdrp during replication. 
the sliding poles could serve a function similar to the 'sliding clamps' of dna replication machineries . to investigate how the rdrp binds the incoming nucleoside triphosphate (ntp), we superimposed our structure onto the related structure of the norovirus rdrp-nucleic acid complex . this suggested that contacts between nsp and the ntp are conserved (extended data fig. c) . the nsp residue n may recognize the '-oh group of the ntp, thereby rendering the rdrp specific for the synthesis of rna, rather than dna. our modeling is also consistent with binding of the triphosphorylated form of remdesivir to the ntp site, because there is space in the ntp site to accommodate the additional nitrile group that is present at the ' position of the ribose ring in this nucleoside analogue. when our study was about to be completed, a manuscript became available that also describes a structure of a sars-cov- rdrp-rna complex . whereas the core structures appear to be very similar, we additionally observe exiting rna and novel nsp extensions that are implicated in enzyme processivity. the other study suggested that remdesivir functions as an immediate rna chain terminator . however, this contradicts previous biochemical work , which showed that remdesivir causes delayed chain termination after the addition of several more nucleotides. to resolve this, we will study the mechanism of rdrp inhibition by remdesivir with a combination of biochemistry and structural biology in the future. no statistical methods were used to predetermine sample size. the experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment. the sars-cov- nsp gene was codon optimized for expression in insect cells. the sars-cov- nsp and nsp genes were codon optimized for expression in escherichia coli. synthesis of genes was performed by geneart (thermo fisher scientific geneart gmbh, regensburg, germany).
the gene synthesis products of the respective genes were pcr amplified with ligation-independent cloning (lic) compatible primer pairs.

figure caption: a domain structure of rdrp subunits nsp , nsp , and nsp . in nsp , the conserved sequence motifs a-g are depicted. regions included in the structure are indicated with black bars. b three views of the structure, related by -degree rotations. color code for nsp (niran, interface, fingers, palm, thumb), nsp , nsp , rna template (blue) and rna product (red) used throughout. the magenta sphere depicts a modeled

the pcr products for nsp and nsp were individually cloned into the pet derived vector -b (a gift from s. gradia, uc berkeley, addgene: ). the two constructs for nsp and nsp contain an n-terminal xhis tag and a tobacco etch virus protease cleavage site. the pcr product containing codon optimized nsp was cloned into the modified pfastbac vector -c (a gift from s. gradia, uc berkeley, addgene: ) via lic. the nsp construct contained an n-terminal xhis tag, followed by an mbp tag, a xasp sequence, and a tobacco etch virus protease cleavage site. all constructs were verified by sequencing. the sars-cov- nsp plasmid ( ng) was transformed into dh embacy cells using electroporation to generate a bacmid encoding full-length nsp . virus production and expression in insect cells were then performed as described . after hours of expression in hi cells, cells were collected by centrifugation and resuspended in lysis buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm imidazole ph . , mm mgcl , mm β-mercaptoethanol, . µg ml- leupeptin, . µg ml- pepstatin, . mg ml- pmsf, and . mg ml- benzamidine). the sars-cov- nsp and nsp plasmids were overexpressed in e. coli bl (de ) ril cells grown in lb medium. cells were grown to an od of at °c and protein expression was subsequently induced with . mm isopropyl β-d- -thiogalactopyranoside at °c for hours.
cells were collected by centrifugation and resuspended in lysis buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm imidazole ph . , mm β-mercaptoethanol, . µg ml- leupeptin, . µg ml- pepstatin, . mg ml- pmsf, and . mg ml- benzamidine). protein purifications were performed at °c. after harvest and resuspension, cells of the sars-cov- nsp expression were immediately sonicated for cell lysis. lysates were subsequently cleared by centrifugation ( , g, °c, min) and ultracentrifugation ( , g, °c, min). the supernatant containing nsp was filtered using a -µm syringe filter, followed by filtration with a . -µm syringe filter (millipore) and applied onto a histrap hp ml (ge healthcare), preequilibrated in lysis buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm imidazole ph . , mm mgcl , mm β-mercaptoethanol, . µg ml- leupeptin, . µg ml- pepstatin, . mg ml- pmsf, and . mg ml- benzamidine). after application of the sample, the column was washed with cv high salt buffer ( mm nacl, mm na-hepes ph . , mm β-mercaptoethanol, . µg ml- leupeptin, . µg ml- pepstatin, . mg ml- pmsf, and . mg ml- benzamidine), and cv lysis buffer.

figure caption: a schematic of protein-rna interactions. solid and hollow circles show rna nucleotides that were included in the structure or not visible, respectively. rdrp residues in nsp and nsp within Å of the rna are depicted and contacts indicated with lines. color code as in fig. . b interactions of rdrp active site with the first turn of rna. subunit nsp is in grey and conserved motifs a-g are colored as indicated. active site residues d and d are shown as sticks. the active site metal ion was modeled and is shown as a magenta sphere. c interaction of nsp subunits with protruding rna, showing residues in proximity to rna.

the histrap was then attached to an xk column / (ge healthcare), prepacked with amylose resin (new england biolabs), which was pre-equilibrated in lysis buffer.
the protein was eluted from the histrap column directly onto the amylose column using nickel elution buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm imidazole ph . , mm mgcl , mm β-mercaptoethanol). the histrap column was then removed and the amylose column was washed with cv of lysis buffer. protein was eluted from the amylose column using amylose elution buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, . mm maltose, mm imidazole ph . , mm β-mercaptoethanol). peak fractions were assessed via sds-page and staining with coomassie. peak fractions containing nsp were pooled and mixed with mg of his-tagged tev protease. after hours of protease digestion, protein was applied to a histrap column equilibrated in lysis buffer to remove uncleaved nsp , his -mbp, and tev. subsequently, the flow-through containing nsp was applied to a hitrap heparin column ml (ge healthcare). the flow-through containing nsp was collected and concentrated in a mwco , amicon ultra centrifugal filter unit (merck). the concentrated sample was applied to a hiload s / pg equilibrated in size exclusion buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm mgcl , mm tcep). peak fractions were assessed by sds-page and coomassie staining. peak fractions were pooled and concentrated in a mwco , amicon ultra centrifugal filter (merck). the concentrated protein with a final concentration of µm was aliquoted, flash-frozen, and stored at - °c until use. sars-cov-nsp and nsp were purified separately using the same purification procedure, as follows. after cell harvest and resuspension in lysis buffer, the protein of interest was immediately sonicated. lysates were subsequently cleared by centrifugation ( . g, °c, min). the supernatant was applied to a histrap hp ml column (ge healthcare), preequilibrated in lysis buffer. the column was washed with . cv high salt buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm imidazole ph . , mm β-mercaptoethanol, . µg ml- leupeptin, . 
µg ml- pepstatin, . mg ml- pmsf, and . mg ml- benzamidine), and . cv low salt buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm imidazole ph . , mm β-mercaptoethanol). the sample was then eluted using nickel elution buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm imidazole ph . , mm β-mercaptoethanol). the eluted protein was dialyzed in dialysis buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm β-mercaptoethanol) in the presence of mg his-tagged tev protease. after hours, imidazole ph . was added to a final concentration of mm. the dialyzed sample was subsequently applied to a histrap hp ml column (ge healthcare), preequilibrated in dialysis buffer. the flow-through that contained the protein of interest was then applied to a hitrap q ml column (ge healthcare). the q column flow-through containing nsp or nsp was concentrated using a mwco kda amicon ultra centrifugal filter (merck) and applied to a hiload s / pg equilibrated in size exclusion buffer ( mm nacl, mm na-hepes ph . , % (v/v) glycerol, mm tcep). peak fractions were assessed by sds-page and coomassie staining. peak fractions were pooled. nsp with a final concentration of µm was aliquoted, flash-frozen, and stored at - °c until use. nsp with a final concentration of µm was aliquoted, flash-frozen, and stored at - °c until use. all protein identities were confirmed by mass spectrometry. all rna oligos were purchased from integrated dna technologies. the rna sequence used for the transcription assay is / -fam/rururu rurcra rurgrc rurarc rgrcrg rurarg rurur ururc rurarc rgrcrg. we designed a minimal substrate by connecting the template rna to the rna primer by a tetraloop, to protect the blunt ends of the rna duplex and to ensure efficient annealing. rna was annealed in mm nacl and mm na-hepes ph . by heating the solution to °c and gradually cooling to °c. rna extension reactions contained rna ( µm), nsp ( µm), nsp ( µm) and nsp ( µm) in mm nacl, mm na-hepes ph . 
, % (v/v) glycerol, mm mgcl and mm β-mercaptoethanol. reactions were incubated at °c for min and the rna extension was initiated by addition of ntps ( µm utp, gtp and ctp, and µm atp). reactions were stopped by the addition of x stop buffer ( m urea, mm edta ph . , x tbe buffer). samples were digested with proteinase k (new england biolabs) and rna products were separated on % acrylamide gels in x tbe buffer supplemented with m urea. -fam labeled rna products were visualized with a typhoon fla imager (ge healthcare life sciences). an rna scaffold for rdrp-rna complex formation was annealed by mixing equimolar amounts of two rna strands ( '-rururu rurcra rurgrc rurarc rgrcrg rurarg- '; -fam/rcrura rcrgrc rg- ') (idt technologies) in annealing buffer ( mm na-hepes ph . , mm nacl) and heating to °c, followed by step-wise cooling to °c. for complex formation, . nmol of purified nsp was mixed with a . fold molar excess of rna scaffold and a -fold molar excess of each of nsp and nsp . after incubation at room temperature for min, the complex was subjected to size exclusion chromatography on a superdex increase . / equilibrated with complex buffer ( mm na-hepes ph . , mm nacl, mm mgcl , mm tcep). peak fractions corresponding to a nucleic-acid rich high-molecular weight population (as judged by absorbance at nm) were pooled and concentrated in a mwco , vivaspin concentrator (sartorius) to approx. µl. µl of the concentrated rdrp-rna complex were mixed with . µl of octyl β-d-glucopyranoside ( . % final concentration) and applied to freshly glow discharged r / holey carbon grids (quantifoil). prior to flash freezing in liquid ethane, the grid was blotted for seconds with a blot force of using a vitrobot mark iv (thermo fisher scientific) at °c and % humidity. cryo-em data collection was performed with serialem using a titan krios transmission electron microscope (thermo fisher scientific) operated at kev.
images were acquired in eftem mode with a slit width of ev using a gif quantum energy filter and a k direct electron detector (gatan) at a nominal magnification of , x corresponding to a calibrated pixel size of . Å/pixel. exposures were recorded in counting mode for . seconds with a dose rate of e-/px/s resulting in a total dose of e-/Å that was fractionated into movie frames. because initial processing showed that the particles adopted only a limited number of orientations in the vitreous ice layer, a total of movies were collected at ° stage tilt to yield a broader distribution of orientations. untilted data were discarded. motion correction, ctf-estimation, and particle picking and extraction were performed on the fly using warp . cryo-em data processing and analysis . million particles were exported from warp to cryosparc , and the particles were subjected to d classification. % of the particles were selected from classes deemed to represent the polymerase, and refined against a synthetic reference prepared from pdb- m . ab initio refinement was performed using particles from bad d classes to obtain five d classes of 'junk' . these five classes and the first polymerase reconstruction were used as starting references to sort the initial . m particles in supervised d classification rather than d, as the latter tended to exclude less abundant projection directions. k particles ( %) from the resulting polymerase class were subjected to another ab initio refinement to obtain five starting references containing four 'junk' classes and the complex of interest. these classes were used as starting references in another supervised d classification. k particles ( %) from the class representing the complex were exported from cryosparc to relion . . there, all particles were refined in d against the reconstruction previously obtained in cryosparc, using a mask including only the core part of the polymerase and a short segment of upstream rna to obtain a . Å reconstruction.
ctf refinement and another round of d refinement improved the resolution further to . Å. particles were re-extracted at . Å/ px in a bigger box in warp to accommodate distant parts of the rna. unsupervised d classification with local alignment was performed to obtain classes: with nsp b present, and without. k particles with nsp b present were finally subjected to focused d refinement using a mask including the rna, nsp a and nsp b. to build the atomic model of the rdrp-rna complex, we started from the structure of the free sars-cov- rdrp (pdb: m ) that was recently slightly adjusted by tristan croll (cambridge university, uk; https://twitter.com/crolltristan/ status/ ). the structure was rigid-body fit into the cryo-em reconstruction and adjusted manually in coot . after adjustment of the protein subunits, unmodeled density remained for helical segments in the n-terminal regions of both copies of nsp . these nsp extensions were modeled by superimposing the nsp model (pdb: ahm; chain h) from the crystal structure of the nsp -nsp hexadecamer , in which the far n-terminal region of nsp adopts a similar fold. careful inspection of the remaining a-form rna density revealed that in our complex, instead of the originally designed short template-primer duplex (see above), four copies of one of the rna strands were annealed to form a pseudo-continuous, longer rna duplex. annealing was mediated by a bp self-complementary region within this rna strand (extended data figure c) . nucleotides - of four rna strands were modeled, whereas the flapped-out nucleotides - were invisible due to mobility and excluded from the model. the model was real-space refined using phenix.refine against a composite map of the focused refinement and global reconstructions generated in phenix.combine_focused_maps and shows excellent stereochemistry (extended data

figure caption: c angular distribution plots. scale shows the number of particles assigned to a particular angular bin.
blue, a low number of particles; yellow, a high number of particles. d cryo-em map for the rdrp active center region including elements with sequence motifs a-g. the active site is indicated by a magenta sphere. e cryo-em map for the rna duplex and the nsp extensions. the active site is indicated by a magenta sphere. side view a comparison of the free sars-cov- rdrp structure (left) and the replicating rdrp-rna complex (right, this study). color code as in fig. . b similar location and orientation of the rna template-product duplex in rdrp complexes of sars-cov- virus (top left, this study), hepatitis c virus (top right), poliovirus (bottom left), and norovirus (bottom right). structures are shown as ribbon models with rna template and product strands in blue and red, respectively. an active site metal ion is shown as a magenta sphere. side view as defined in fig. . c model of substrate nucleoside triphosphate (ntp) in the rdrp active site. a ctp substrate was placed after superposition of the norovirus rdrp-nucleic acid complex structure . coloring as in fig. b . active site residues d , d and n are shown as sticks, and the modeled active site metal ion is shown as a magenta sphere. when the nucleoside triphosphate form of remdesivir would bind in the ntp site, the nitrile group connected to the ribose c ' position would be accommodated in the space indicated by the dashed circle. 
from sars to mers: years of research on highly pathogenic human coronaviruses a new coronavirus associated with human respiratory disease in china a pneumonia outbreak associated with a new coronavirus of probable bat origin the nonstructural proteins directing coronavirus rna synthesis and processing nidovirus rna polymerases: complex enzymes handling exceptional rna genomes sars-cov orf b-encoded nonstructural proteins - : replicative enzymes as antiviral targets nucleosides for the treatment of respiratory rna virus infections broad-spectrum antiviral gs- inhibits both epidemic and zoonotic coronaviruses coronavirus susceptibility to the antiviral remdesivir (gs- ) is mediated by the viral polymerase and the proofreading exoribonuclease the antiviral compound remdesivir potently inhibits rna-dependent rna polymerase from middle east respiratory syndrome coronavirus race to find covid- treatments accelerates arguments in favour of remdesivir for treating sars-cov- infections remdesivir for severe acute respiratory syndrome coronavirus causing covid- : an evaluation of the evidence biochemical characterization of a recombinant sars coronavirus nsp rna-dependent rna polymerase capable of copying viral rna templates one severe acute respiratory syndrome coronavirus protein complex integrates processive rna polymerase and exonuclease activities structure of the rna-dependent rna polymerase from covid- virus structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors insights into sars-cov transcription and replication from the structure of the nsp -nsp hexadecamer nonstructural proteins and of feline coronavirus form a : heterotrimer that exhibits primer-independent rna polymerase activity viral replication. 
structural basis for rna replication by the hepatitis c virus polymerase structural basis for active site closure by the poliovirus rna-dependent rna polymerase binding of '-amino- '-deoxycytidine- '-triphosphate to norovirus polymerase induces rearrangement of the active site pcna, the maestro of the replication fork structural basis for the inhibition of the rna-dependent rna polymerase from sars-cov- by remdesivir. biorxiv mechanism of inhibition of ebola virus rna-dependent rna polymerase by remdesivir architecture and rna binding of the human negative elongation factor automated electron microscope tomography using robust prediction of specimen movements real-time cryo-electron microscopy data preprocessing with warp cryo-sparc: algorithms for rapid unsupervised cryo-em structure determination a bayesian approach to beam-induced motion correction in cryo-em single-particle analysis features and development of coot real-space refinement in phenix for cryo-em and crystallography enhancing ucsf chimera through web services we thank all members of the department of molecular biology at the max-planck-institute for biophysical chemistry for support. we thank jana schmitzova for helpful discussions and advice. h.s.h. was supported by the deutsche forschungsgemeinschaft (for ). p.c. was supported by the deutsche forschungsgemeinschaft (exc / , sfb , spp ), the erc advanced investigator grant transreg-ulon (grant agreement no ) and the volkswagen foundation. we thank henning urlaub (max planck institute for biophysical chemistry) for protein identification by mass spectrometry. h.s.h., g.k., l.f., c.d. and d.t. designed and carried out experiments and data analysis. p.c. designed and supervised research. all authors interpreted data and wrote the manuscript. the authors declare no competing interests. reconstructions and structure coordinate files will be made available in the electron microscopy database and the protein data bank. 
key: cord- -dnppshnv authors: hognon, cécilia; miclot, tom; iriepa, cristina garcia; francés-monerris, antonio; grandemange, stephanie; terenzi, alessio; marazzi, marco; barone, giampaolo; monari, antonio title: role of rna guanine quadruplexes in favoring the dimerization of sars unique domain in coronaviruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dnppshnv coronaviruses may produce severe acute respiratory syndrome (sars). as a matter of fact, a new sars-type virus, sars-cov- , is responsible for a global pandemic with unprecedented health and economic consequences for most countries. in the present contribution we study, by all-atom equilibrium and enhanced sampling molecular dynamics simulations, the interaction between the sars unique domain and rna guanine quadruplexes, a process involved in eluding the defensive response of the host thus favoring viral infection of human cells. our results reveal two stable binding modes involving an interaction site spanning either the protein dimer interface or only one monomer. the free energy profile unequivocally points to the dimer mode as the thermodynamically favored one. the effect of these binding modes in stabilizing the protein dimer was also assessed, being related to its biological role in assisting sars viruses to bypass the host protective response. this work also constitutes a first step toward the possible rational design of efficient therapeutic agents aiming at perturbing the interaction between sars unique domain and guanine quadruplexes, hence enhancing the host defenses against the virus. among the different domains of nsp , , the precise function of some of which has not been entirely clarified yet, the so-called sars unique domain (sud) deserves special attention, since it is present only in sars-type coronaviruses and hence it has been associated with the increased pathogenicity of this viral family.
the structure of sud (presumably a common domain of different sars viruses) has been resolved experimentally, and it has been proved that the macrodomain is indeed constituted by a dimer of two symmetric monomers. furthermore, both experimental and molecular docking investigations have pointed out a possible favorable interaction of sud with nucleic acids, and in particular with rna in g-quadruplex (g ) conformation. the presence of a high density of lysine residues at the interface between two sud monomers, forming a positively charged pocket, also suggests that rna may be instrumental in favoring sud dimerization through electrostatic attraction with the negatively charged rna backbone. this observation may have a highly important biological implication since the dimerization has also been connected to the sud native function. tan et al. have proposed that the ability of sud to recognize and bind specific viral and/or host g sequences may have implications in regulating viral replication and/or hampering the host response to viral infection, as schematized in figure . the hypothesis is based on the identification of g sequences in key host mrna that encode proteins involved in different signaling pathways such as apoptotic or survival signaling. these proteins could induce a controlled cellular death of infected cells slowing down or stopping the infection, or promote cell survival to delay apoptosis by producing antiviral cytokines. however, the removal of the mrna necessary to produce these signaling factors by viral sud may impair the apoptosis/survival response pathways allowing massive cell infection. , in this letter, we report an extended all-atom molecular dynamics (md) study of the interactions between a dimeric sud domain and a short rna g sequence.
the crystal structures of the protein (pdb w g) and of the oligonucleotide (pdb j g) have been chosen consistently with the experimental work performed by tan et al. even though the chosen sud starting structure belongs to sars-cov, the very high nucleotide affinity and the global conservation of the nsp protein suggest that the rna binding spots should be globally preserved. this is also further justified by the fact that sars-cov- nsp has also been recognized to suppress host gene expression and hence inhibit the immune response. equilibrium md has allowed us to assess some of the hypothesized complexation modes between g and sud, while also highlighting the most important interaction patterns at an atomistic level, and the effects of g in maintaining the dimer stability. furthermore, to better sample the multidimensional conformational space and to quantify the strength of the interactions coming into play, the free-energy surface has been explored using enhanced sampling methods. a two-dimensional ( d) free energy profile has been computed along two coordinates defining the distance between the centers of mass of g and one sud domain (g -sud a ), and the two sud domains (sud a -sud b ), respectively. the corresponding d potential of mean force (pmf) was obtained using a recently developed combination of extended adaptive biasing force (eabf) and metadynamics, hereafter named meta-eabf. , both protein and rna have been described with the amber force field including the bsc corrections, , and the md simulations have been performed in the constant pressure and temperature ensemble (npt) at k and atm. all md simulations have been performed using the namd code and analyzed via vmd; the g structure has also been analyzed with the dna suite. , more details on the simulation protocol can be found in supplementary information (si).
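the two collective variables described above are plain center-of-mass distances, which can be sketched in a few lines. the following python snippet is a minimal illustration and not the authors' namd setup: the function names (`center_of_mass`, `com_distance`, `collective_variables`) and the toy coordinates are hypothetical, and periodic boundary conditions are ignored for simplicity.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms (coords: list of xyz tuples)."""
    total = sum(masses)
    return tuple(
        sum(m * xyz[k] for xyz, m in zip(coords, masses)) / total
        for k in range(3)
    )


def com_distance(group_a, group_b):
    """Distance between the centers of mass of two (coords, masses) groups."""
    return math.dist(center_of_mass(*group_a), center_of_mass(*group_b))


def collective_variables(g4, sud_a, sud_b):
    """Return (G4-SUD_A distance, SUD_A-SUD_B distance) for one frame."""
    return com_distance(g4, sud_a), com_distance(sud_a, sud_b)


# Toy frame with made-up coordinates (angstrom) and masses; not simulation data.
g4 = ([(0, 0, 0), (2, 0, 0)], [1.0, 1.0])
sud_a = ([(1, 3, 0), (1, 5, 0)], [1.0, 1.0])
sud_b = ([(1, 4, 3)], [12.0])

cv1, cv2 = collective_variables(g4, sud_a, sud_b)
print(cv1, cv2)  # prints 4.0 3.0
```

in an actual biased simulation these two distances would be evaluated by the md engine at every step; the sketch only makes explicit what each coordinate of the d free energy profile measures.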
to obtain starting conformations, the rna was manually positioned in two different orientations close to the experimentally suggested sud interaction area. the equilibrium md evolved yielding two distinct interaction modes, as reported in figure . in particular, we can easily distinguish between a first mode of binding in which the g mainly interacts with only one sud monomer, called monomeric binding mode, and a second one in which the nucleic acid is firmly placed at the interface between the two protein monomeric subunits, referred to as dimeric binding mode. note that while for the monomeric mode we easily found a suitable starting point, two independent ns md trajectories were run to characterize the dimeric mode. the corresponding root mean square deviations (rmsd) with respect to the initial structure are also reported and globally show that both the rna and the protein units are stable. as expected, a slightly larger value of the rmsd is observed for the protein, as a consequence of its larger flexibility compared to the rigid g structures (figure d, c) . note also that the slight initial increase of the protein rmsd observed for the dimeric mode is due to the necessity of a slight structural rearrangement to accommodate the g in the interaction pocket. both modes are globally stable all along the md trajectory, and no spontaneous release of the g is observed. at phosphates. this finding is evidenced by the radial distribution function (rdf) between these positively charged lysine side chains and the negatively charged phosphate oxygen atoms of g (depicted in dark blue in figure b ), which shows a very intense and sharp peak at around Å ( figure a ). interestingly, a secondary peak in the rdf is also observed at . Å, probably defining a second layer of interaction patterns that should contribute to the overall stabilization of the binding.
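for readers unfamiliar with the rdf used above, the quantity can be sketched as a normalized histogram of pair distances. the snippet below is a minimal single-frame illustration, not the analysis tool used by the authors: the function name `radial_distribution`, its parameters, and the toy coordinates are hypothetical, no periodic images are considered, and a real analysis would average over many trajectory frames.

```python
import math


def radial_distribution(points_a, points_b, r_max, n_bins, volume):
    """Single-frame g(r) between two atom selections.

    points_a / points_b: lists of xyz tuples (e.g. lysine N and phosphate O atoms).
    volume: volume used for the ideal-gas normalization (same length unit cubed).
    Returns (bin centers, g values).
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for a in points_a:
        for b in points_b:
            d = math.dist(a, b)
            if d < r_max:
                counts[int(d / dr)] += 1

    n_pairs = len(points_a) * len(points_b)
    centers, g = [], []
    for i, c in enumerate(counts):
        # Volume of the spherical shell between i*dr and (i+1)*dr.
        shell = 4.0 / 3.0 * math.pi * ((i + 1) ** 3 - i ** 3) * dr ** 3
        # Pair count expected in this shell for uniformly distributed atoms.
        ideal = n_pairs * shell / volume
        centers.append((i + 0.5) * dr)
        g.append(c / ideal)
    return centers, g


# Toy example: one "lysine" atom and one "phosphate" atom 3 units apart.
centers, g = radial_distribution([(0, 0, 0)], [(3, 0, 0)],
                                 r_max=10.0, n_bins=10, volume=1000.0)
peak_bin = max(range(len(g)), key=g.__getitem__)
print(centers[peak_bin])  # prints 3.5
```

a sharp first peak in such a histogram corresponds to direct contacts, and a weaker peak at larger distance to a second layer of interactions, which is how the two rdf peaks in the text are read.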
conversely, the monomeric mode is driven by interactions mainly involving the terminal uracil moieties and the top guanine leaflet instead of the phosphate backbone of g . as shown in interacting with the peripheral uracil nucleobases, in a mode strongly resembling the top-binding experienced by a number of g drugs. this is nicely confirmed by the analysis of the time series of the distance between the α-carbon of these amino acids and the nearby guanine that readily drops at around Å and stays remarkably stable all along the md. interestingly, the interaction is sufficiently strong to induce a partial deformation of the planarity of the g leaflets. even though from considerations based on chemical intuition those interactions could be referred to as mainly due to dispersion, the inherent parameterization of the amber force field does not allow one to completely disentangle and decompose the polarization, dispersion and electrostatic contributions. the fact that the monomeric binding mode is driven by non-covalent interactions with one of the exposed g leaflets may also help explain the fact that longer g sequences are preferentially recognized by the sud interface region. indeed, in this case, for obvious statistical reasons, the ratio between the interaction with the backbone or with the terminal leaflet clearly favors the former. on the other hand, this mode may also act efficiently in the process of recruitment of rna, either viral or cellular, efficiently anchoring the oligomer that can subsequently be safely displaced through the interface area. apart from the different positioning of the g , other structural evidence can already be surmised from the visual inspection of the md trajectory. in particular, the sud dimers appear more compact and the interface region better conserved when the rna g adopts the dimeric binding mode, as can also be appreciated in figure .
these results clearly indicate that the dimeric mode leads to a greater stability of the g -sud complex. and arg , as reported in figure . indeed, while in the case of the dimeric-like conformation a peaked distribution centered at relatively close distances ( . Å) is observed, indicating a closed and quite rigid disposition, a much broader and bimodal distribution is found for the monomer-like conformation, presenting, most notably, a secondary maximum at about Å, which confirms the partial destabilization of the sud subdomain interface and the greater flexibility induced by this binding mode. to further examine the conformational space spanned by the g /sud complex, and in particular the role of the rna in favoring the dimerization and the structure of the interface, we resorted to enhanced sampling md simulations to obtain the d free energy profile along two relevant collective variables: first, the distance between g and sud, and second, the separation between the two sud subdomains. our choice of collective variables does not allow us to explore the binding between the two surfaces of the sud domain; however, from the results of tan et al. it is clear that the interaction with rna takes place preferentially at the positively charged interface. on the other hand, our methodology is well suited to characterize the influence of the binding of rna on the stabilization of the interface between the two sud monomers since it allows the sliding of the g on the sud surface. the pmf is reported in figure together with representative snapshots along the reaction coordinates. from the analysis of the pmf, one can identify a clear minimum in the free energy profile corresponding to the situation in which the g is interacting through the dimer mode, in which the sud dimer is compact (figure b ). the free-energy stabilization, with respect to the situation in which the g is well separated from the protein, amounts to about kcal/mol.
around the principal minimum a slightly less stable and extended region is also observed having a stabilization free energy of about - kcal/mol and corresponding to the sliding of g in the monomer conformation (figure c ). the rest of the free energy surface appears instead rather flat, and no appreciable barriers are observed along the collective variable. the topology of the free energy surface hence accounts for the possibility to observe important conformational movements, leading to open conformations in which the sud subdomain interface has been basically destroyed (figure d) . however, such conformations are instead hampered by the dimer-like conformation of the rna. the free energy map unambiguously shows that the dimer mode is the preferred one, and also confirms the role of the g binding in maintaining the dimeric sud conformation, since no appreciable free energy barrier for the opening of the sud dimer is observed when the rna is unbound. thus, the dimer mode binding site clearly constitutes a specific target that may help in the development of new efficient antiviral agents against coronaviruses. crystal structure of sars-cov- main protease provides a basis for design of improved α-ketoamide inhibitors potential treatments for covid- ; a predicting commercially available antiviral drugs that may act on the novel coronavirus ( -ncov) use of antiviral drugs to reduce covid- transmission. 
the lancet global health molecular dynamics simulations indicate the covid- mpro is not a viable target for small-molecule inhibitors design changes in nonstructural protein are associated with attenuation in avian coronavirus infectious bronchitis virus sars coronavirus replicase proteins in pathogenesis the sars-unique domain (sud) of sars coronavirus contains two macrodomains that bind g-quadruplexes unique domain: three-domain molecular architecture in solution and rna binding nuclear magnetic resonance structure shows that the severe acute respiratory syndrome coronavirus-unique domain contains a macrodomain fold an overview of their replication and pathogenesis bt -coronaviruses: methods and protocols the rich world of p dna binding targets: the role of dna structure nsp of coronaviruses: structures and functions of a large multi-domain protein binding macrodomain within the "sars-unique domain" is essential for the activity of the sars-coronavirus replication-transcription complex bbc mediates fenretinide-induced cell death in neuroblastoma the structure of human neuronal rab b in the active and inactive form signal transduction in sars-cov-infected cells identification of a human nf-Κb-activating protein analysis of an rna tetraplex (uggggu) with divalent sr + ions at subatomic resolution ( . 
Å) network-based drug repurposing for novel coronavirus -ncov/sars-cov- structural basis for translational shutdown and immune evasion by the nsp protein of sars-cov- the extended generalized adaptive biasing force algorithm for multidimensional free-energy calculations wiley interdisciplinary reviews: computational molecular science zooming across the free-energy landscape: shaving barriers, and flooding valleys taming rugged free energy landscapes using an average force ff sb: improving the accuracy of protein side chain and backbone parameters from ff sb assessing the current state of amber force field modifications for dna refinement of the sugar-phosphate backbone torsion beta for amber force fields improves the description of z-and b-dna scalable molecular dynamics with namd vmd: visual molecular dynamics dna: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures dna: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures targeting g-quadruplexes with organic dyes: chelerythrine-dna binding elucidated by combining molecular modeling and optical spectroscopy nickel(ii), copper(ii) and zinc(ii) metallo-intercalators: structural details of the dna-binding by a combined experimental and computational investigation selective g-quadruplex stabilizers: schiff-base metal complexes with anticancer activity key: cord- -bbn tfq authors: li, quan; cao, zanxia; rahman, proton title: genetic variability of human angiotensin-converting enzyme (hace ) among various ethnic populations date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bbn tfq there appears to be large regional variations for susceptibility, severity and mortality for covid- infections. we set out to examine genetic differences in the human angiotensin-converting enzyme (hace ) gene, as its receptor serves as a cellular entry for sars- cov- . 
by comparing , non-finnish european and , east asians (including , koreans) four missense mutations were noted in the hace gene. molecular dynamic demonstrated that two of these variants (k r and i v) may affect binding characteristics between s protein of the virus and hace receptor. we also examined hace gene expression in eight global populations from the hapmap and noted marginal differences in expression for some populations as compared to the chinese population. however, for both of our studies, the magnitude of the difference was small and the significance is not clear in the absence of further in vitro and functional studies. as of march , , the worldwide tracking for covid- infections reported , , cases including , deaths . among the countries that have reported more than , covid- cases, the total number of cases per population ranged from , per million people in spain to per million cases in china and the total death rate among these countries ranged from per million in spain to per million in china . there are numerous potential factors to explain the wide variability in the number of infections and death among the countries including stringency of social isolation and contact tracing, the importation of the sars-cov- , intensity of testing for the virus, as well as the preparedness and capacity of health care system to cope with the pandemic. differences among the host, particularly the sex of the patient and presence of selected co-morbid diseases and immunosuppressive state have been identified as risk factors from severe covid- , . pre-existing medical conditions have been associated with increased prevalence of death including: cardiovascular disease ( . %); diabetes ( . %); chronic respiratory disease ( . %); hypertension ( . %) and recent diagnosis of cancer ( . %) . specifically, hypertensive patients had a hazard ratio of . for death and . for in-hospital mortality . 
an increased association with the use of angiotensin-converting enzyme inhibitors (aceis) and angiotensin ii receptor blockers (arbs) has been reported for severe covid- cases . a biologically plausible link between sars-cov- infections and ace inhibition has been proposed . angiotensin-converting enzyme (ace ) cleaves peptides within the renin-angiotensin system . ace is mostly expressed in kidneys and the gi tract, and smaller amounts are found in type pneumocytes in the lung and peripheral blood. the ace receptor serves as the host cell entry for sars-cov and more recently it has been shown to be the receptor for cellular entry of sars-cov- , . as recently reported by hoffmann et al , the cellular entry of sars-cov- can be blocked by an inhibitor of the cellular serine protease tmprss , which is employed by sars-cov- for s protein priming. further attention has been drawn to the ace receptor as antimalarials, which can interfere with ace expression, have resulted in increased sars-cov- clearance among covid- patients . to explore the variability in genetic polymorphisms and expression in human ace (hace ), we set out to determine if there were any differences between the asian and caucasian populations for ace polymorphisms and to compare the variability of hace expression in peripheral blood among eight different populations. in order to investigate whether differences in genetic variations exist between caucasians and asians and if these variants can influence the efficiency of cell entry of sars-cov- , we retrieved the variants in the hace gene from gnomad v . exomes . we also showed these four mutations in the hace x-ray structure in figure . the particle mesh ewald (pme) method was used to calculate the long-range electrostatic interactions. the lincs algorithm was used for constraining all covalent bonds. finally, a ns long md simulation was carried out for each system. the stability of the simulation was checked by computing root mean square deviations (rmsds). 
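the rmsd check mentioned above can be sketched as follows. this is a minimal illustration, not the analysis tool used in the study: it assumes every frame has already been superimposed on the reference structure (no fitting is performed), and the function names and the input layout (lists of x, y, z tuples, one per cα atom) are hypothetical.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the two structures are already aligned; no superposition
    is performed here.
    """
    if len(coords) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(coords, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(coords))


def rmsd_series(trajectory, reference):
    """RMSD of every frame in a trajectory against one reference frame."""
    return [rmsd(frame, reference) for frame in trajectory]


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    moved = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]  # both atoms shifted by (3, 4, 0)
    print(rmsd_series([ref, moved], ref))
```

a time series that plateaus, rather than drifting upward, is the usual sign that a simulated complex has stabilized.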
rmsds of cα were calculated for the k r and i v mutants and compared with the wild type, as shown in figure . rmsds for the complexes with k r changed little during the ns md simulations. the i v mutant showed an rmsd increase during - ns, but no obvious rmsd change compared with the wild type after ns. the rmsd values for the k r mutant are slightly higher, with some structural change compared with the wild type. the effect of missense mutations on a protein-ligand complex can be assessed by various experimental technologies , but these are very time-consuming. in silico binding free energy calculations can instead be performed to determine the effects of mutations. here, the binding free energies between s protein and hace (wild type or mutant) were calculated using g_mmpbsa (table ). the two mutations, k r and i v, were both predicted to slightly increase the binding free energy and may therefore slightly decrease the binding affinity. k r was more frequent among caucasians, whereas i v was more frequent among asians. the and yoruba in ibadan, nigeria (yri). raw expression data from sentrix human- expression beadchip version were extracted, background-corrected, log scaled and quantile normalized. expression data from the populations with admixture (gih, lwk, mex, and mkk) were also normalized for population admixture using eigenstrat . the normalized expression data were used to assess expression levels in the different sets of analyses. the hace expression levels of these populations are shown in figure . the hace expression level of the chb population ( females and males) was then compared to other cohorts using anova, after stratification for sex (table ). marginal differences were noted among the populations. hace expression was statistically significantly higher in the lwk, gih and jpt populations for males, and in the lwk and yri populations for females. however, the effect sizes were small and their clinical relevance is not clear. 
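the quantile normalization step mentioned for the expression data can be sketched as below. this is a simplified illustration under stated assumptions, not the pipeline used in the study: each sample is a plain list of probe intensities, ties are broken by position rather than averaged, and the function name is hypothetical.

```python
def quantile_normalize(samples):
    """Force every sample to share the same intensity distribution.

    samples: list of equal-length lists, one list of probe values per sample.
    Each value is replaced by the mean, across samples, of the values that
    hold the same rank. Ties are resolved by original order (a simplification).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")

    # Mean of the r-th smallest value across all samples, for each rank r.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples)
        for r in range(n_probes)
    ]

    normalized = []
    for s in samples:
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

after this step every sample contains exactly the same set of values, so remaining differences between cohorts reflect the ranking of a gene within each sample rather than array-wide intensity shifts.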
this study examined the genetic polymorphisms in the hace gene between caucasians and asians. we noted that there were four missense mutations, of which k r and i v were of most interest based on their location. the frequency of these two variants differed between the populations: k r was more frequent among caucasians, while i v was more frequent among asians. in silico studies have noted that these two mutations may affect the binding characteristics between s protein and hace ; however, in the absence of further functional studies, the significance of these alterations is not clear. we also found that there was some variability in the expression of hace among various populations, but the magnitude of the differences was small, so it is unclear whether these subtle differences have any impact on susceptibility to or severity of covid- among various ethnicities. we also did not notice a differential expression of hace between the sexes, even though it is widely reported that males have a worse prognosis than females for complications of covid- . our findings are generally consistent with a recently published study by chen et al , which reported that ace expression in asians was similar to that of other races. our expression study was conducted in peripheral blood, and site-specific expression (alveolar, gi or renal tissue) may yield different results. in conclusion, our results do reveal some differences in genetic polymorphisms between asians and caucasians which may potentially alter the binding of the virus to the hace receptor, and some subtle variation in the expression of hace among different populations. however, our findings need to be replicated and further in vitro studies should be performed before the significance of these findings can be determined. 
at present the reasons behind the variability in covid- presentation among the various countries need to be further explored, as we were unable to provide convincing evidence to suggest there are differences in allele frequency or expression of hace among caucasians and asians. conflict of interest: none of the authors have any conflicts of interest to disclose regarding this manuscript. data used for this study were obtained from public databases. china-ccdc;. the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) report of the who-china joint mission on coronavirus disease (covid- ) clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease a pneumonia outbreak associated with a new coronavirus of probable bat origin covid- and angiotensin-converting enzyme inhibitors and angiotensin receptor blockers: what is the evidence? jama soluble angiotensin-converting enzyme : a potential approach for coronavirus infection therapy? clinical science the spike protein of sars-cov--a target for vaccine and therapeutic development structure, function, and antigenicity of the sars-cov- spike glycoprotein sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor hydroxychloroquine and azithromycin as a treatment of covid- : results of an open-label non-randomized clinical trial analysis of protein-coding genetic variation in , humans. nature structural basis for the recognition of sars-cov- by full-length human ace . 
science the worldwide protein data bank (wwpdb): ensuring a single, uniform archive of pdb data more bang for your buck: improved use of gpu nodes for gromacs best bang for your buck: gpu nodes for gromacs biomolecular simulations : a high-throughput and highly parallel open source molecular simulation toolkit on the binding affinity of macromolecular interactions: daring to ask why proteins interact transcriptome genetics using second generation sequencing in a caucasian population patterns of cis regulatory variation in diverse human populations principal components analysis corrects for stratification in genome-wide association studies. nature genetics asians and other races express similar levels of and share the same genetic polymorphisms of the sars-cov- cell-entry receptor key: cord- -k jec p authors: zhu, yunkai; feng, fei; hu, gaowei; wang, yuyan; yu, yin; zhu, yuanfei; xu, wei; cai, xia; sun, zhiping; han, wendong; ye, rong; chen, hongjun; ding, qiang; cai, qiliang; qu, di; xie, youhua; yuan, zhenghong; zhang, rong title: the s /s boundary of sars-cov- spike protein modulates cell entry pathways and transmission date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k jec p the global spread of sars-cov- is posing major public health challenges. one unique feature of sars-cov- spike protein is the insertion of multi-basic residues at the s /s subunit cleavage site, the function of which remains uncertain. we found that the virus with intact spike (sfull) preferentially enters cells via fusion at the plasma membrane, whereas a clone (sdel) with deletion disrupting the multi-basic s /s site instead utilizes a less efficient endosomal entry pathway. this idea was supported by the identification of a suite of endosomal entry factors specific to sdel virus by a genome-wide crispr-cas screen. a panel of host factors regulating the surface expression of ace was identified for both viruses. 
using a hamster model, animal-to-animal transmission with the sdel virus was almost completely abrogated, unlike with sfull. these findings highlight the critical role of the s /s boundary of the sars-cov- spike protein in modulating virus entry and transmission. sars-cov- and sars-cov share nearly % nucleotide sequence identity and use the same cellular receptor, angiotensin-converting enzyme (ace ), to enter target cells (hoffmann et al., b; zhou et al., ) . however, the newly emerged cov- spike in cells promotes cell-cell membrane fusion, which is reduced after deletion of the rrar sequence or when expressing sars-cov s protein lacking these residues (hoffmann et al., a; xia et al., ) . pseudovirus or live virus bearing sars-cov- spike deletion at the s /s junction decreased the infection in calu- cells and attenuated infection in hamsters (hoffmann et al., a; lau et al., ) . the sequence at the s /s boundary seems to be unstable, as deletion variants are observed both in cell culture and in patient samples (lau et al., ; liu et al., ; ogando et al., ; wong et al.) . sars-cov- entry is mediated by sequential cleavage at the s /s junction site and additional downstream s ' site of spike protein. the sequence at the s /s boundary contains a cleavage site for the furin protease, which could preactivate the s protein for membrane fusion and potentially reduce the dependence of sars-cov- on plasma membrane proteases, such as transmembrane serine protease (tmprss ), to enable efficient cell entry (shang et al., ) . here, we evaluate how the deletion at the s /s junction impacts virus entry and cell tropism, infection, as measured by n antigen-positive cells, was sensitive to inhibition by e- d but not camostat in vero cells ( figure f ). when tmprss was expressed, both camostat and e- d inhibited the infectivity of sfull, indicating that expression of tmprss could promote the membrane fusion entry pathway. 
remarkably, e- d and camostat had no effect on sfull virus in a -ace cells, suggesting that in this cell sfull may use other tmprss homologs or trypsin-like proteases to activate fusion at the plasma membrane since tmprss expression is absent in a cells (matsuyama et al., ) . we observed a similar phenotype even when cells were treated with a high insertion of multiple basic residues at the s /s cleavage site and thus resembles the sdel virus ( figure a) . indeed, e- , but not camostat, efficiently inhibited sars-cov pseudovirus infection in multiple cell types ( figure s d ). these results demonstrate that the deletion at the s /s junction site propels the virus to enter cells through the endosomal fusion pathway, which is less efficient than the fusion pathway at the plasma membrane in airway epithelial cells as indicated by the reduced infectivity in calu- cells. both sdel and sars-cov may share a similar entry pathway. richardson et al., ; zhang et al., ; zhang et al., ) . a lack of suitable human physiologically relevant cell lines and the s protein-induced syncytia formation in cells have made such a screen for sars-cov- very challenging. we found that sdel the top candidates from the crispr screen were determined according to their mageck score ( figure b ). the top hit was ace , the cellular receptor that confers susceptibility to sars-cov- , which confirmed the validity of the screen. additionally, the gene encoding cathepsin l (ctsl), a target of our earlier assay using e- d that is and actin-related protein / (arp / ) complex, which have significant roles in endosomal cargo sorting (liu et al., ; mcnally and cullen, ) . we also identified to define the stage of viral infection that each of the validated genes acted, one representative sgrna per gene was selected for study in a -ace cells. due to its known antiviral activity, ascc was not targeted. we confirmed that editing of these genes did not affect cell viability ( figure s b) . 
the gene-edited cells were infected with pseudovirus bearing the sdel virus s protein or, as a control, the glycoprotein of suggest that these genes mediate sdel virus entry. notably, pseudovirus bearing the spike protein of sars-cov, which lacks the multiple basic residues at the s /s junction as sdel, exhibited a phenotype similar to sdel pseudovirus and sdel live virus ( figure c and c) . editing of these genes, including those encoding ctsl, cholesterol transporters npc / , wdr / , and tfe , markedly reduced infection, suggesting that sdel and sars-cov may utilize similar entry machinery ( figure c ). intriguingly, these genes edited also significantly inhibited the infection by pseudovirus bearing the spike protein of mers-cov in a -ace -dpp cells ( figure d ). although the furin cleavage site is present at the s /s boundary of mers-cov{millet, # }, it preferentially enters the a cell via endosomal pathway as indicated by its sensitivity to e- d inhibitor ( figure s e ). this is possibly due to the lack of proper protease to representative sgrna per gene was tested ( figure e ). the editing efficiency of some these genes by sgrnas was confirmed by western blotting ( figure s b ). as expected, editing of ctsl did not reduce infection, as the sfull virus enters a -ace cells via an endosomal-independent pathway (demonstrated in figure f ). in general, editing of genes encoding complexes that regulate the retrieval and recycling of cargo significantly reduced infection, albeit to a lesser extent than observed with the sdel live virus. u a, a cationic sterol, binds to the npc protein to inhibit cholesterol export from the lysosome, resulting in impaired endosome trafficking, late endosome/lysosome membrane fusion (cenedella, ; ko et al., ; lu et al., ) . 
u a has been shown to inhibit the s protein-driven entry of sars-cov, middle east respiratory syndrome coronavirus (mers-cov), and the human coronaviruses nl and e, with the most efficient inhibition observed with sars-cov (wrensch et al., ) . the antiviral effect of u a on type i feline coronavirus (fcov) has also been characterized in vitro and in vivo (doki et al., ; takano et al., ) . we found that, and cullen, ). we hypothesized that disruption of these complexes might affect the binding or transit of virions. to this end, we performed binding and internalization assays using sfull virus in a -ace cells. the genes commd , vps , and ccdc , which encode proteins that are comprise ccc, retromer, and wash complexes, respectively, were each edited; effects on expression were confirmed by western blotting (figure s b). notably, binding and internalization of sfull virions to these cells was significantly decreased compared to control sgrna ( figure a ). the entry receptor ace is critical for sars-cov- infection. to determine whether cell surface expression of ace is regulated by these complexes, gene-edited vps and c orf that were identified in our screen, also are shared functionally by the retromer and ccc complexes (norwood et al., ; phillips-krawczak et al., ) . the s /s boundary of spike protein impacts infection and disease in hamsters in cell culture, we demonstrated that the sdel virus resulted in a switch from the plasma membrane to endosomal fusion pathway for entry. in calu- lung cells, which model more physiologically relevant airway epithelial cells, this switch led to a less efficient endosomal entry process. since virus entry is the first step in establishing infection, we hypothesized that deletion at the s /s boundary might reduce virus infectivity and transmissibility in vivo. 
indeed, using the golden syrian hamster model, a previous study showed that a sars-cov- variant with a -nucleotide deletion at the s /s junction caused milder disease and less viral infection in the trachea and lungs compared to a virus lacking the deletion(lau et al., ). we evaluated the tissue tropism of the sfull and sdel virus following intranasal inoculation of golden syrian hamsters. nasal turbinates, trachea, lungs, heart, kidney, spleen, duodenum, brain, serum, and feces were collected. sfull virus replicated robustly and reached peak titer at day post infection, with a mean titer -, -, and -fold higher than sdel in the turbinates, trachea, and lungs, respectively ( figure a ). while sdel virus replication was delayed, no significant differences were observed by day in these three tissues (figure b ). at days and , five pieces of fresh feces were collected from each hamster. although no infectious virus was detected by focus-forming assay (data not shown), viral rna levels were higher in fecal samples for sfull ( and -fold) than sdel at days and , respectively ( figure b ). likely related to this, no infectious virus was detected in the duodenum, and sfull rna was . -fold higher than sdel at day ( figure s a ). in serum, we detected no difference in viremia at day , but sfull rna was -and -fold higher than sdel at days and , respectively ( figure s b). in other extrapulmonary organs, infectious virus was not consistently detected (data not shown). in general, brain tissue had the highest viral rna copy number, and all organs showed higher levels of sfull rna at day or compared to sdel except for the liver and kidneys (figure s c-g) . weight loss was only observed in hamsters inoculated with sfull and decreased as much as ~ % at days and ( figure s h ). 
the s /s boundary of spike protein modulates the transmission to determine the impact of deletion at the s /s junction on transmissibility by direct contact exposure, six hamsters were inoculated intranasally with sfull or sdel virus. at h post inoculation, each donor hamster was transferred to a new cage and co-housed with one naïve hamster for days. for donors (day post-inoculation), tissue samples were processed (figure a and b, and figure s ). for contact hamsters (day post-exposure), nasal turbinate, trachea, and lungs were collected for infectious virus titration and histopathological examination. the peak titers in turbinate, trachea, and lungs from sfull-exposed hamsters reached , . , and . logs, respectively ( . logs, . logs, and . logs on average, respectively) (figure c ). unexpectedly, no infectious virus was detected in these three tissues from sdel-exposed hamsters (figure c). in lung sections from hamsters that were exposed to sfull-infected animals, we observed mononuclear cell infiltrate, protein-rich fluid exudate, hyaline membrane formation, and haemorrhage (figure d ). in contrast, no or minimal histopathological change was observed in the lung sections from hamsters that were exposed to sdel-infected animals (figure d ). to examine viral spread in the lungs, we performed rna in situ hybridization (ish). viral rna was clearly detected in bronchiolar epithelial cells in hamsters exposed to sfull-infected animals (figure e ), whereas it was rarely detected in hamsters exposed to sdel-infected animals. similarly, abundant rna was observed in the nasal turbinate epithelium (figure f ). these results indicated that transmission of sfull from infected hamsters to co-housed naïve hamsters was efficient, whereas the deletion at the s /s boundary in the s protein of sdel markedly reduced transmission. 
the notion that this deletion at the s /s boundary discriminates the entry pathway used by the virus was supported by the large number of endosomal entry host factors uncovered in our genome-wide crispr screen. genes for the endosomal entry- specific enzyme ctsl and for regulating endolysomal trafficking and membrane fusion, such as npc / and wdr / , were required for sdel, but not for sfull virus infection. in parallel, we discovered a panel of entry factors common to both sdel and sfull that after or h, the luciferase activity was determined using nano-glo® luciferase assay kit (promega #n ) according to the manufacturer's instructions. the same volume of assay reagent was added to each well and shake for min, after incubation at room temperature for min, luminescence was recorded by using a flexstation (molecular devices) with an integration time of second per well. celltiter-glo ® reagent was added to each well and allowed to shake for min. after sequence alignment of spike protein encompassing the cleavage site between s and s subunits. the spike proteins of sars-cov- without (sfull strain) and with (sdel with dunnett's test; **p < . ; ***, p < . ; ****p < . ; ns, not significant. and tissues from intestine, brain, heart, liver, spleen, and kidney (day , ) were harvested (n= per day). viral rnas were extracted for rt-qpcr analysis. the viral load in the brain was also titrated by focus-forming assay. the dashed lines represent mechanisms of cholesterol binding to the npc and npc proteins. advances in experimental medicine and biology , - . doench, j.g., fusi, n., sullender, m., hegde, m., vaimberg, e.w., donovan, k.f., smith, i., tothova, z., wilen, c., orchard, r., et al. ( ) . optimized sgrna design to maximize activity and minimize off-target effects of crispr-cas . nature c.a., singla, a., starokadomskyy, p., deng, z., osborne, d.g., li, h., dick, c.j., gomez, t.s., koenecke, m., zhang, j.s., et al. ( ) . 
commd is linked to the wash complex and regulates endosomal trafficking of the copper transporter atp a. molecular biology of the cell , - . qi, j., zhou, y., hua, j., zhang, l., bian, j., liu, b., zhao, z., and jin, s. ( ) . the scrna-seq expression profiling of the receptor ace and the cellular protease tmprss reveals human organs susceptible to covid- infection. biorxiv, . . rapiteanu a unique protease cleavage site predicted in the spike protein of the novel pneumonia -ncov) potentially related to viral transmissibility tfeb regulates lysosomal positioning by modulating tmem b expression and jip recruitment to lysosomes natural transmission of bat-like sars-cov- dprra variants in covid- patients ifitm proteins inhibit entry driven by the mers-coronavirus spike protein: evidence for cholesterol-independent mechanisms the role of furin cleavage site in sars-cov- spike protein-mediated membrane fusion in the presence or absence of trypsin. signal transduction and targeted therapy tmprss and tmprss promote sars-cov- infection of human small intestinal enterocytes mxra is a receptor for multiple arthritogenic alphaviruses a crispr screen defines a signal peptide processing pathway required by flaviviruses a pneumonia outbreak associated with a new coronavirus of probable bat origin protease inhibitors targeting coronavirus and filovirus entry biotechnology , - . doki, t., tarusawa, t., hohdatsu, t., and takano, t. ( ) . in vivo antiviral effects of u a against type i feline infectious peritonitis virus. pathogens . fedoseienko, a., wijers, m., wolters, j.c., dekker, d., smit, m., huijkman, n., kloosterhuis, n., klug, h., schepers, a., willems van dijk, k., et al. 
( ) key: cord- -c n j authors: kratzel, annika; todt, daniel; v’kovski, philip; steiner, silvio; gultom, mitra l.; thao, tran thi nhu; ebert, nadine; holwerda, melle; steinmann, jörg; niemeyer, daniela; dijkman, ronald; kampf, günter; drosten, christian; steinmann, eike; thiel, volker; pfaender, stephanie title: efficient inactivation of sars-cov- by who-recommended hand rub formulations and alcohols date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: c n j the recent emergence of severe acute respiratory syndrome coronavirus (sars-cov- ) causing covid- is a major burden for health care systems worldwide. it is important to address if the current infection control instructions based on active ingredients are sufficient. we therefore determined the virucidal activity of two alcohol-based hand rub solutions for hand disinfection recommended by the world health organization (who), as well as commercially available alcohols. efficient sars-cov- inactivation was demonstrated for all tested alcohol-based disinfectants. these findings show the successful inactivation of sars-cov- for the first time and provide confidence in its use for the control of covid- . importance the current covid- outbreak puts a huge burden on the world’s health care systems. without effective therapeutics or vaccines being available, effective hygiene measure are of utmost importance to prevent viral spreading. it is therefore crucial to evaluate current infection control strategies against sars-cov- . we show the inactivation of the novel coronavirus for the first time and endorse the importance of disinfectant-based hand hygiene to reduce sars-cov- transmission. 
effective hand hygiene is crucial to limit virus spread. therefore, easily available but efficient disinfectants are crucial. the world health organization's 'guidelines for hand hygiene in health care' suggests two alcohol-based formulations for hand sanitization to reduce pathogen infectivity and spreading. these recommendations are based on fast-acting and broad-spectrum of microbicidal activity, as well as easy accessibility and safety . we have previously shown that who formulation i and ii were able to inactivate the closely related sars-cov and mers-cov results sars-cov- was highly susceptible to the who formulations (fig. ) . who formulation i, based on % ethanol, efficiently inactivated the virus with reduction factors (rfs) of  . and concentrations between % - % (fig. a) . subsequent regression analysis revealed similar inactivation profiles compared to sars-cov, pathogenic human covs (fig. a) . who formulation ii, which is based on % isopropanol, demonstrated a better virucidal effect at low concentrations, with complete viral inactivation and rfs of  at a minimal concentration of % (fig. b) . the regression analysis showed an inactivation profile of sars-cov- , which was in between sars-cov, bcov and mers-cov (fig. b) . next, we addressed the susceptibility of sars-cov- against the individual components of the who recommended formulations which are also the main ingredients of commercially available hand disinfections. both alcohols, ethanol (fig. a) and -propanol ( fig. 
b) were able to reduce viral titers in s exposure to background levels with rfs between ≥ . and . after sec. furthermore, we could show that a minimal concentration of % ethanol or -propanol is sufficient for viral inactivation (fig. ) .
identification of a novel coronavirus in patients with severe acute respiratory syndrome
isolation of a novel coronavirus from a man with pneumonia in saudi arabia
who guidelines on hand hygiene in health care
severe acute respiratory syndrome coronavirus (sars-cov- ) and coronavirus disease- (covid- ): the epidemic and the challenges
coronavirus covid- global cases
virucidal activity of world health organization-recommended formulations against enveloped viruses, including zika, ebola, and emerging coronaviruses
persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents
virucidal efficacy of peracetic acid for instrument disinfection
key: cord- -w iyh u authors: bhattacharjee, sayan; bhattacharyya, rajanya; sengupta, jayati title: transmission of allosteric response within the homotrimer of sars-cov- spike upon recognition of ace receptor by the receptor-binding domain date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: w iyh u
the pathogenesis of novel sars-cov- virus initiates through recognition of ace receptor (angiotensin-converting enzyme ) of the host cells by the receptor-binding domain (rbd) located at spikes of the virus. following receptor-recognition, proteolytic cleavage between s and s subunits of the spike protein occurs with subsequent release of fusion peptide. here, we report our study on allosteric communication within rbd that propagates the signal from ace -binding site towards allosteric site for the post-binding activation of proteolytic cleavage. using md simulations, we have demonstrated allosteric crosstalk within rbd in apo- and receptor-bound states where dynamic correlated motions and electrostatic energy perturbations contribute.
while allostery, based on correlated motions, dominates inherent distal communication in apo-rbd, electrostatic energy perturbations determine favorable crosstalk within rbd upon binding to ace . notably, the allosteric path consists of evolutionarily conserved residues, pointing towards their biological relevance. as revealed from recent structures, in the trimeric arrangement of spike, rbd of one copy interacts with s domain of another copy. interestingly, the allosteric site identified is in direct contact (h-bonded) with a region in rbd that corresponds to the interacting region of rbd of one copy with s of another copy in the trimeric arrangement. apparently, inter-monomer allosteric communication orchestrates concerted action of the trimer. based on our results, we propose the allosteric loop of rbd as a potential drug target.
the recent outbreak of a novel coronavirus, the sars-cov- (severe acute respiratory syndrome coronavirus ), a close relative of the previously-known sars-covs, has become an ongoing global public-health threat. following its emergence, it rapidly created a pandemic situation owing to its alarmingly high rate of transmission . cell entry, in particular, is an important early step of cross-species transmission. the spike glycoprotein of covs attaches to the angiotensin-converting enzyme- (ace ) receptor of the host cells and mediates viral entry. the association with the host-cell receptor is regulated by the receptor-binding domain (rbd) of the spike protein, followed by cleavage by a host protease, which releases the spike fusion peptide and thereby promotes viral entry . sars-cov- is a single-stranded rna virus with four major constituent proteins: spike protein (s), membrane protein (m), envelope protein (e), and nucleo-capsid protein (n) . the lung is the key target organ of the virus and, as the name implies, the infection induces pneumonia accompanied by acute respiratory distress.
recent high resolution structures of the trans-membrane spike protein of sars cov- show intertwined arrangement of three monomers. each monomer consists of sub-domains s and s . the receptor-binding domain (rbd) that interacts with ace receptor of the host belongs to the s subunit, while the s subunit anchors on the viral membrane. interestingly, rbd of one copy directly interacts with the s domain of another copy in the homotrimer ( figure ). however, the receptor binding motif (rbm) in the rbd that interacts with the receptor has not been fully resolved in the recent spike trimer structures. a following crystallographic study has resolved the interactions of sars-cov- spike rbd with the ace receptor. clearly, successful viral infection is reliant on the effective complex formation between the host receptor and the protruded spikes on the viral surface, which makes it an obvious drug target. however, in addition to the contact residues, studies have indicated potential role of noncontact residues and allostery in the spike protein-receptor interactions . to better understand the etiology of the infection, we have focused on the rbd of the spike protein and its binding to the ace receptor a, , information on which may provide important clues for designing drugs to combat the virus. our study aims to understand the hidden allosteric cross-talks in rbd upon recognition of the ace receptor and their possible role in the activity of the spike protein trimer. the allosteric crosstalk between the ace -binding residues of rbd with its distal residues (away from the binding site) was evaluated using md simulation followed by extensive analyses of both dynamics and energy terms at atomic level. 
md simulation has long been implicated as a suitable procedure to explore the allosteric paths of information exchange in between amino acid residues of a protein by elucidating intermediate protein conformations that often occur in pico or nano second time scale which are often difficult to capture with other techniques . the conformational dynamics and change in energetic within rbd before and after binding to the host receptor revealed the hidden communication that shapes the crosstalk pathway leading to recognition and subsequent internalization of the virus. our study offers the identified allosteric sites as additional, potential drug targets. insights gained from our results on electrostatic contributions in viral infection suggest that modification or inhibition of the allosteric communication would block proper functioning of the viral spike protein and consequently hinder viral infection. therefore, it may be considered as possible intervention pathway for infection prevention. conformational dynamics of free and receptor-bound rbd. to probe deeper into the dynamicity of free and ace -bound rbd of the covid spike protein, cα atoms based root mean square fluctuations (rmsf) and order parameter (s ) considering the n-h vectors of rbd, were calculated from the corresponding trajectories. the overall bound state rmsf revealed a lower value relative to the free rbd, implicating the higher rigidity of rbd after binding to ace compared to its apo-form. this observation suggests the ace -directed stabilization of the rbd in the bound state by selection of particular conformation from the fast time scale (ps-ns) regime. in this regard, the residues exhibiting remarkably differentiated rmsf values between the two states are clearly regions displaying some motifs having higher plasticity in free rbd (an rmsf cut-off of > . nm was set to define regions of high flexibility): the binding loop (containing residues to ), and a distal loop (consisting residues to ). 
it is apparent that the distal loop can be considered as an allosteric site (away from the ace binding site of rbd). remarkably, this allosteric loop is in direct contact through h-bond to a region of the rbd that coincides with the region of one monomer that interacts with the s domain of another monomeric copy in the spike protein trimer a, ( figure ). significantly, when the dynamicity of the apo-form was compared with respect to the ace bound rbd, the fluctuation of the allosteric loop decreased considerably along with the ace binding loop (figure a respectively. thus, this observation also suggests the decrement in overall rbd backbone entropy factor (s is directly related to conformational entropy ) and hence formation of a stable complex. allosteric involvement in proteins is one of the most important areas in the field of allosteric drug development and it was proposed that allosteric communications were achieved either by correlated motions of the amino acids or by fundamental non-bonded energy contributions , . a number of noticeable differences in correlation were found when dccm (a residue-wise correlation matrix-based approach where the correlated and anti-correlated motions are scored between and - considering strong: |cij| = . - . ; moderate: |cij| = . - . ; and weak: |cij| = . - . ) of both apo-and ace -bound rbd was compared. free rbd consisted of several strongly correlated and moderately anti-correlated motions within the amino acids, which were either missing or became weakly correlated in the ace -bound rbd ( figure a -b). interestingly, the binding loop (residues from to ) and the allosteric loop (residues from to ) showed a moderate anti-correlated motion. in other words, these loops move away from each other in a correlated manner. it may be noted here, both the allosteric loops are in correlation which reestablishes the presence of h-bond between them ( figure b ) as revealed by crystal structure described above. 
this kind of anti-correlated motion, which could be attributed to the structural flexibility of apo-rbd, is much weakened in the ace -bound rbd owing to its conformational rigidity in the bound state.
cross-correlation maps were used to identify the regions that move in or out of phase during the simulations . the elements of the matrix (cij) were obtained from their position vectors (r) as shown in eq. : where i and j correspond to any two atoms, residues, or domains; ri and rj are position vectors of i and j; and the angle brackets denote an ensemble average. inter-atomic cross-correlation fluctuations between any pair of atoms (or residues) can be calculated using this expression and represented graphically by the dccm. the value of cij can vary from - (strongly anti-correlated motion) to + (strongly correlated motion).
backbone n-h vectors were selected to calculate s over the period of the trajectory, which represents the dynamicity of the protein, with a value indicating complete rigidity and a value towards representing enhanced dynamicity. where μ , μ , and μ are the x, y, and z components of the relevant bond vector scaled to unit magnitude, μ, respectively . angular brackets indicate averaging over the snapshots.
the residue-wise non-bonded interaction energy between free rbd and its bound state with ace was described as: the non-bonded interactions (∆ rbd−ace ) include both electrostatic (∆ rbd−ace ) and van der waals (∆ rbd−ace ) interactions, which were modeled using coulomb and lennard-jones (lj) potential functions, respectively.
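the cross-correlation calculation described above can be sketched in a few lines of numpy. this is a minimal illustration rather than the authors' pipeline: it assumes the standard covariance-normalized definition, cij = <Δri·Δrj> / sqrt(<Δri²><Δrj²>), where Δri is the displacement of i from its mean position, and it assumes the trajectory has already been superposed on a reference structure. the function name `dccm` and the toy trajectory are hypothetical.

```python
import numpy as np


def dccm(coords):
    """Dynamic cross-correlation matrix for an aligned trajectory.

    coords: array of shape (n_frames, n_atoms, 3), e.g. C-alpha positions.
    Returns an (n_atoms, n_atoms) matrix with entries in [-1, 1].
    Atoms with zero positional variance would produce NaN entries.
    """
    coords = np.asarray(coords, dtype=float)
    # Displacement of every atom from its time-averaged position.
    disp = coords - coords.mean(axis=0)
    # <dr_i . dr_j>: dot products averaged over all frames.
    cov = np.einsum("fik,fjk->ij", disp, disp) / coords.shape[0]
    # Normalize by sqrt(<dr_i^2> <dr_j^2>).
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


# Toy trajectory (hypothetical data): atoms 0 and 1 move in phase along x,
# atom 2 moves exactly out of phase with them.
t = np.linspace(-1.0, 1.0, 11)
traj = np.zeros((11, 3, 3))
traj[:, 0, 0] = t
traj[:, 1, 0] = t + 5.0
traj[:, 2, 0] = -t
C = dccm(traj)
```

in this toy case the entry for atoms 0 and 1 is close to +1 and the entry for atoms 0 and 2 is close to -1, which mirrors the in-phase and out-of-phase readings of the dccm.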
hierarchical clustering was implemented for each correlation network to produce cumulative nodal clusters, or communities, that are strongly intra-connected yet only weakly inter-connected, using a betweenness clustering algorithm related to that developed by girvan. nevertheless, as is standard for unweighted networks, rather than selecting the partition with the highest modularity score, we selected the partition nearest to the highest modularity score that consisted of the smallest number of overall communities. this excluded the typical circumstance in which several small communities with similarly high modularity values were created.
the density of connections per node, assessed by the node centralities, was calculated as described by the equation: where represents the centrality of node i, is the ijth entry of the adjacency matrix a, is a constant, and g represents all nodes. ≠ if nodes i and j are linked, and it will be equal to − , where wij is the edge weight. for every i (i g) is equivalent to defining the eigenvalues and eigenvectors of matrix a. for a given pair of nodes, optimal (shortest) and suboptimal (close to, but longer than, the optimal) node paths were established employing the previously mentioned algorithm .
path analysis from electrostatic energy contributions. the perturbation in pair-wise electrostatic interactions ∆ , − : the contributions of lj and electrostatic (coulomb) non-bonded interactions to the non-bonded energy were calculated separately, but the lj terms were found to be numerically much smaller than the respective electrostatic ones, so we have focused on the electrostatic interactions. the interaction energy between two residues i and j is the sum of the non-bonded interaction energies already defined in a force-field where: energy networks were built by considering the amino-acid residues as nodes.
a weighted edge was made between any pair of residues i and j by considering the interaction energy as the weight. energy-hubs are defined as nodes that have a higher degree or connectivity in the network. betweenness-centrality was computed using the following equation: where bc(v) is betweenness-centrality of residue ν.n is the number of residues within network, σij(v) is the number of shortest paths between residue i and j that pass through residue ν and σij is the total number of shortest paths from i to j. dijkstra's algorithm was applied to calculate the shortest path between residues i and j. binding free energy. mmpb-sa tool for gromacs was used to calculate the binding free energy between ace and rbd . sb and js conceived the project. sb designed and performed all the computational work. sb, rb and js analyzed the data and wrote the paper. this work was supported by serb, dst (india) sponsored project, and csir-indian institute of chemical biology, kolkata, india. we acknowledge the central instrument facility (cif) of csir-indian institute of chemical biology, and csir- pi for super computer facility for supporting our computational work. sb and rb acknowledge ugc and csir, india, respectively for awarding senior research fellowship. 
pandemic potential of -ncov angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target genome composition and divergence of the novel coronavirus ( -ncov) originating in china structure, function, and antigenicity of the sars-cov- spike glycoprotein structure of the sars-cov- spike receptor-binding domain bound to the ace receptor functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses a pneumonia outbreak associated with a new coronavirus of probable bat origin the role of molecular simulations in the development of inhibitors of amyloid beta-peptide aggregation for the treatment of alzheimer's disease protein side-chain dynamics and residual conformational entropy proteolytic activation of the sars-coronavirus spike protein: cutting enzymes at the cutting edge of antiviral research a study of communication pathways in methionyl-trna synthetase by molecular dynamics simulations and structure network analysis on the relationship between nmrderived amide order parameters and protein backbone entropy changes interaction energy based protein structure networks structure analysis of the receptor binding of -ncov. biochemical and biophysical research structure of the sars-cov- spike receptor-binding domain bound to the ace receptor improved side-chain torsion potentials for the amber ff sb protein force field a modified tip p water potential for simulation with ewald summation combining conformational flexibility and continuum electrostatics for calculating pk(a)s in proteins propka : consistent treatment of internal and surface residues in empirical pka predictions canonical sampling through velocity rescaling. 
the journal of chemical physics polymorphic transitions in single crystals: a new molecular dynamics method a smooth particle mesh ewald method finding the k shortest loopless paths in a network a note on two problems in connexion with graphs g_mmpbsa--a gromacs tool for high-throughput mm-pbsa calculations the authors declare no conflict of interest. key: cord- -f ka bd authors: yuan, meng; wu, nicholas c.; zhu, xueyong; lee, chang-chun d.; so, ray t. y.; lv, huibin; mok, chris k. p.; wilson, ian a. title: a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: f ka bd the outbreak of covid- , which is caused by sars-cov- virus, continues to spread globally, but there is currently very little understanding of the epitopes on the virus. in this study, we have determined the crystal structure of the receptor-binding domain (rbd) of the sars-cov- spike (s) protein in complex with cr , a neutralizing antibody previously isolated from a convalescent sars patient. cr targets a highly conserved epitope that enables cross-reactive binding between sars-cov- and sars-cov. structural modeling further demonstrates that the binding site can only be accessed when at least two rbds on the trimeric s protein are in the “up” conformation. overall, this study provides structural and molecular insight into the antigenicity of sars-cov- . one sentence summary structural study of a cross-reactive sars antibody reveals a conserved epitope on the sars-cov- receptor-binding domain. ( ). such a high degree of sequence similarity raises the possibility that cross-reactive epitopes may exist. a recent study has shown that cr , which is a human neutralizing antibody that targets the receptor-binding domain (rbd) of sars-cov ( ), can bind to the rbd of sars-cov- ( ). this finding provides an opportunity to uncover a cross-reactive epitope. 
cr was previously isolated from a convalescent sars patient and is encoded by germline genes ighv - , ighd - , ighj (heavy chain), and igkv - , igkj (light chain) ( ). based on igblast analysis ( ), the ighv of cr is . % somatically mutated at the nucleotide sequence level, which results in eight amino-acid changes from the germline sequence, whereas igkv of cr is . % somatically mutated resulting in three amino-acid changes from the germline sequence (fig. s ). we therefore determined the crystal structure of cr with the sars-cov- rbd at . Å resolution (table s and implying their likely importance in the affinity maturation process. out of residues in the epitope (defined as residues buried by cr ), ( %) are conserved between sars-cov- and sars-cov (fig. d ). this high sequence conservation explains the cross-reactivity of cr . nonetheless, despite having a high conservation in the epitope residues, cr fab binds to sars-cov rbd (k d = nm) with a much higher affinity than to sars-cov- rbd (k d = nm) (table mass spectrometry analysis has shown that a complex glycan is indeed present at this n-glycosylation site in . an n-glycan at n would fit into a groove formed between heavy and light chains ( fig. s c ), which could increase contact and, hence, binding affinity to cr . we then tested whether cr was able to neutralize sars-cov- and sars-cov in an in vitro microneutralization assay ( ). while cr could neutralize sars-cov, it did not neutralize sars-cov- at the as [ ] [ ] [ ] . interestingly, the epitope of cr does not overlap with the ace -binding site ( fig. a) . structural alignment of cr -sars-cov- rbd complex with the ace -sars-cov- rbd complex ( ) further indicates that binding of cr would not clash with ace ( ). this analysis implies that the neutralization mechanism of cr does not depend on direct blocking of receptor binding, which is consistent with the observation that cr does not compete with ace for binding to the rbd ( ). 
unlike cr , most known sars rbd-targeted antibodies compete with ace for binding to rbd ( , [ ] [ ] [ ] [ ] . the epitopes of these antibodies are very different from that of cr (fig. b ). in fact, it has been shown that cr can synergize with other rbd-targeted antibodies to neutralize sars-cov ( ). although cr itself cannot neutralize sars-cov- in this in vitro assay, whether cr can synergize with other sars-cov- rbd-targeted monoclonal antibodies for neutralization remains to be determined. recently, the cryo-em structure of homotrimeric sars-cov- s protein was determined ( , ) and demonstrated that the rbd, as in other coronaviruses ( , ) adopts two different dispositions in the trimer. the rbd can then undergo a hinge-like movement to transition between "up" or "down" conformations ( fig. a ). ace host receptor can only interact with the rbd when it is in the "up" conformation, whereas the "down" conformation is inaccessible to ace . interestingly, the epitope of cr is also only accessible when the rbd is in the "up" conformation ( fig. , b and c). furthermore, the ability for cr to access the rbd also depends on the relative disposition of the rbd on the adjacent protomer. cr can only access rbd when the targeted rbd on one protomer of the trimer and the rbd on the adjacent protomer are both in the "up" conformation. the variable region of cr would clash with the rbd on the adjacent protomer if the latter adopts a "down" conformation ( fig. d) . as a homotrimer, the s protein could potentially adopt four possible rbd configurations, namely none-"up", single-"up", double-"up", and triple-"up". it appears that cr can only bind to the s protein when it is in double-"up" or triple-"up" configuration. specifically, one molecule of cr can be accommodated in the double-"up" configuration ( fig. e) , whereas three molecules of cr could potentially be accommodated in the triple-"up" configuration (fig. f ). 
previous cryo-em studies have also shown that the recombinant sars-cov s protein is mostly found in the none-"up", single-"up", or double-"up" conformations ( , ), but rarely in the triple-"up" conformation, even with ace receptor bound ( , ) . together with the fact that cr was isolated from a convalescent sars patient ( ) therefore, although cr does not neutralize sars-cov- in vitro despite its reasonable binding affinity, it is possible that this epitope can confer in vivo protection. the potential existence of non-neutralizing protective antibodies to sars-cov- highlights the need for an effective sars-cov- infection mouse model, which has yet to be established. since there is currently great urgency in the efforts to develop a vaccine against sars- cov- , characterizing the epitopes on sars-cov- s protein is extremely valuable. much work is now ongoing in isolating human monoclonal antibodies from sars-cov- patients. we anticipate that these investigations will decipher the antigenic properties potent neutralization of severe acute respiratory syndrome (sars) coronavirus by a human mab to s protein that blocks receptor association molecular and biological characterization of human monoclonal antibodies binding to the spike and nucleocapsid proteins of severe acute respiratory syndrome coronavirus development and characterisation of neutralising monoclonal antibody to the sars-coronavirus structure of severe acute respiratory syndrome coronavirus receptor-binding domain complexed with neutralizing antibody cryo-em structure of the -ncov spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein cryo-em structures of mers-cov and sars-cov spike robyn stanfield for assistance in data collection, and andrew ward for discussion this work was supported by nih k ai yersin scholarship (to h.l.), bill and melinda gates foundation opp (to i national natural science foundation of china (nsfc)/research grants council 
(rgc) cdr loops are labeled. cyan: epitope residues that are conserved between green: epitope residues that are not conserved between d) epitope residues that are important for binding to cr are labeled. epitope residues are defined here as residues in sars rbd with buried surface area > Å after fab cr binding as calculated with pisa ( ). (e) several key interactions between cr and sars-cov- rbd are highlighted. cr heavy chain is colored in orange, cr light chain in yellow, and sars-cov- rbd in cyan the relative binding position of cr with respect to receptor ace and other sars-cov rbd monoclonal antibodies. (a) structures of cr -sars-cov- rbd complex and ace -sars-cov- rbd complex sars-cov- rbd. ace is colored in green, rbd in light grey, and cr in yellow structural superposition of cr -sars-cov- rbd complex r-sars-cov rbd complex (pdb ghw) ( ), and m -sars-cov rbd complex binding of cr depends on the rbd configurations on the s protein rbd in the s proteins of sars-cov- and sars-cov can adopt either an "up" conformation (blue) or a "down" conformation (red). pdb vsb (cryo-em structure ) is shown. (b-c) cr epitope (cyan) on the rbd is exposed in (b) the "up e-f) the double-"up" and triple-"up one cr molecule can be accommodated per s protein in the double-"up" configuration, and (f) three cr molecules could potentially be accommodated per s we thank henry tien for technical support with the crystallization robot, jeanne matteson for contribution to mammalian cell culture, wenli yu to insect cell culture, key: cord- -pwj j authors: shrimp, jonathan h.; kales, stephen c.; sanderson, philip e.; simeonov, anton; shen, min; hall, matthew d. title: an enzymatic tmprss assay for assessment of clinical candidates and discovery of inhibitors as potential treatment of covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pwj j sars-cov- is the viral pathogen causing the covid global pandemic. 
consequently, much research has gone into the development of pre-clinical assays for the discovery of new or repurposing of fda-approved therapies. preventing viral entry into a host cell would be an effective antiviral strategy. one mechanism for sars-cov- entry occurs when the spike protein on the surface of sars-cov- binds to an ace receptor followed by cleavage at two cut sites (“priming”) that causes a conformational change allowing for viral and host membrane fusion. this fusion event is proceeded by release of viral rna within the host cell. tmprss has an extracellular protease domain capable of cleaving the spike protein to initiate membrane fusion. additionally, knock-out studies in mice have demonstrated reduced infection in the absence of tmprss with no detectable physiological impact; thus, tmprss is an attractive target for therapeutic development. a validated inhibitor of tmprss protease activity would be a valuable tool for studying the impact tmprss has in viral entry and potentially be an effective antiviral therapeutic. to enable inhibitor discovery and profiling of fda-approved therapeutics, we describe an assay for the biochemical screening of recombinant tmprss suitable for high throughput application. we demonstrate effectiveness to quantify inhibition down to subnanomolar concentrations by assessing the inhibition of camostat, nafamostat and gabexate, clinically approved agents in japan for pancreatitis due to their inhibition of trypsin-like proteases. nafamostat and camostat are currently in clinical trials against covid . the rank order potency for the three inhibitors is: nafamostat (ic( ) = . nm), camostat (ic( ) = . nm) and gabexate (ic( ) = nm). further profiling of these three inhibitors against a panel of proteases provides insight into selectivity and potency. the sars-cov- pandemic has driven the urgent need to rapidly identify therapeutics for both preventing and treating infected patients. 
given that no approved therapeutics for treating any coronaviruses existed at the time sars-cov- emerged (late ), early attention has focused on drug repurposing opportunities , . drug repurposing is an attractive approach to treating sars-cov- , as active drugs approved for use in humans in the united states or by other regulatory agencies, or unapproved drug candidates shown to be safe in human clinical trials, can be nominated for fast-track to the clinic. for example, remdesivir (gs- , gilead sciences inc.), is an inhibitor of viral rna-dependent rna polymerase that had previously been in clinical trials for treating ebola virus. remdesivir was rapidly shown to be active against sars-cov- in vitro and in clinical trials, which resulted in the fda granting emergency use authorization and full approval in japan . the delineation of targets and cellular processes that mediate sars-cov- infection and replication forms the basis for the development of assays for drug repurposing screening and subsequent full-fledged therapeutic development programs. one therapeutic target receiving significant attention is the human host cell transmembrane protease serine (tmprss , uniprot -o ). tmprss is anchored to the extracellular surface of the cell, where it exerts its enzymatic activity. while its precise physiologic substrate is not clear, tmprss gene fusions are common in prostate cancer, resulting in its overexpression . the sars-cov- virus enters cells via its spike protein first binding to the cell-surface enzyme ace , and . cc-by . international license (which was not certified by peer review) is the author/funder. it is made available under a the copyright holder for this preprint this version posted june , . . https://doi.org/ . / . . . doi: biorxiv preprint evidence suggests that tmprss then proteolytically cleaves a sequence on the spike protein, facilitating a conformation change that 'primes' it for cell entry ( figure a) . 
hcov-nl in cells engineered to overexpress tmprss , and by inhibition with the trypsin-like serine protease inhibitor, camostat . when the mers-cov outbreak occurred, tmprss -overexpressing cells were again shown to facilitate cell infection, tmprss was shown to degrade the mers-cov spike protein, and camostat was shown to limit cell entry . the structurally related trypsin-like serine protease inhibitor nafamostat was shown to similarly inhibit spike protein-mediated cell fusion of mers-cov . given the strong evidence that tmprss mediates coronavirus entry, when sars-cov- emerged it was soon demonstrated through loss-and gain-of-function experiments that tmprss is retained as a mediator of cell infection, and that this can be inhibited by camostat . camostat (also called foy- ) is a trypsin-like serine protease inhibitor approved in japan (as the mesylate salt) for the treatment of pancreatitis and reflux esophagitis . given its status as an approved agent that is orally administered, safe and well tolerated in humans, and can inhibit cellular entry, camostat mesylate received attention as a drug repurposing candidate. at least six clinical trials for treating patients are currently underway . however, while indirect evidence exists that camostat and related compounds inhibit tmprss -mediated cell entry of sars-cov- , no direct biochemical evidence exists in the literature. camostat was developed by ono pharmaceuticals (patented in , ). while a specific report of its development does not appear to be published, it is a highly potent inhibitor of trypsin (ic ≈ nm), and it cross-inhibits other proteases. two other structurally related inhibitors, nafamostat and gabexate (foy- ), are also approved in japan for treating pancreatitis and show potential for activity against sars-cov- , and trials with nafamostat have also been reported. 
while tmprss biochemical assays have been reported for understanding its role in prostate cancer , , none of the three inhibitors has been demonstrated to inhibit tmprss directly, and the rank order of potency of the inhibitors against tmprss is not known. in response to the covid public health emergency, we are developing both protein/biochemical and cell-based assays to interrogate several biological targets and enable identification of potential therapeutic leads. our initial focus is on performing drug repurposing screening for each assay and rapidly sharing the data through the ncats opendata portal for covid . as part of this effort, we sought to develop a biochemical assay for measuring the activity of tmprss to enable both the evaluation of existing drug repurposing candidates and drug repurposing screening. herein we report the development of a tmprss fluorogenic biochemical assay and the testing of clinical repurposing candidates for covid . we first assessed the activity of tmprss constructs and a number of candidate substrates. the best substrate was then used to assess enzyme kinetics and establish a km value for the substrate, to define assay conditions, and to demonstrate suitability of the assay in - and -well plates. the inhibitors camostat, nafamostat and gabexate were then assessed. to understand their relative activities, we also profiled them against a panel of human proteases.
recombinant human tmprss protein (human tmprss residues - , n-terminal x his-tag) (cat # tmprss - h) was acquired from creative biomart.
to a -well black plate (greiner ) was added boc-gln-ala-arg-amc ( . nl) and inhibitor ( . nl) using an echo acoustic dispenser (labcyte). to that was added tmprss ( nl) in assay buffer ( mm tris ph , mm nacl) to give a total reaction volume of µl. following hr incubation at rt, detection was done using the pherastar with excitation: nm and emission: nm.
to a -well black plate (greiner ) was added -amino-methylcoumarin ( . nl) and inhibitor or dmso ( . nl) using an echo acoustic dispenser (labcyte). to that was added assay buffer ( mm tris ph , mm nacl) to give a total reaction volume of µl. detection was done using the pherastar with excitation: nm and emission: nm. fluorescence was normalized relative to a negative control containing dmso-only wells ( % activity, low fluorescence) and a positive control containing amc only ( % activity, high fluorescence). an inhibitor causing fluorescence quenching would be identified by a concentration-dependent decrease in amc fluorescence.
to a -well black plate was added boc-gln-ala-arg-amc substrate ( nl) and inhibitor ( nl) using an echo acoustic dispenser (labcyte). to that was dispensed tmprss ( nl) in assay buffer ( mm tris ph , mm nacl) using a bioraptr (beckman coulter) to give a total reaction volume of µl. following hr incubation at rt, detection was done using the pherastar with excitation: nm and emission: nm.
the tmprss biochemical assay was performed according to the assay protocol shown in table .
to determine compound activity in the assay, the concentration-response data for each sample were plotted and modeled by a four-parameter logistic fit yielding ic and efficacy (maximal response) values. raw plate reads for each titration point were first normalized relative to a positive control containing no enzyme ( % activity, full inhibition) and a negative control containing dmso-only wells ( % activity, basal activity). data normalization, visualization and curve fitting were performed using prism (graphpad, san diego, ca).
to identify inhibitors of tmprss that may be used to validate its role in sars-cov- entry and potentially be expedited to clinical trials, we developed a biochemical assay using active tmprss protease and a fluorogenic peptide substrate ( figure b) . initially, we screened six candidate fluorogenic peptide substrates, boc-gln-ala-arg-amc , cbz-d-arg-gly-arg-amc , cbz-d-arg-pro-arg-amc , boc-leu-gly-arg-amc , , cbz-gly-gly-arg-amc , and ac-val-arg-pro-arg-amc, most of which had been demonstrated in the literature to be cleaved by tmprss . each peptide contains a -amino-methylcoumarin (amc) fluorophore that is liberated following enzymatic cleavage. candidate substrates were tested both to confirm that the tmprss construct was biochemically active (other commercial constructs sourced and tested were not active, data not shown) and to identify the most readily cleaved substrate, indicated by the greatest production of fluorescence from the fluorogenic product amc (figure a ).
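the control normalization and four-parameter logistic (4pl) model named in the data-analysis paragraph can be sketched in python as follows. this is only an illustration under stated assumptions, not the authors' prism workflow: the function names are hypothetical, the scaling (no-enzyme wells = 0 % activity, dmso-only wells = 100 % activity) is assumed because the percentages in the source were lost in extraction, and the curve fitting step itself is not reproduced.

```python
def percent_activity(raw, no_enzyme_mean, dmso_mean):
    """Scale a raw plate read to percent activity.

    Assumption: the no-enzyme control defines 0 % activity (full inhibition)
    and the DMSO-only control defines 100 % activity (basal activity).
    """
    span = dmso_mean - no_enzyme_mean
    if span == 0:
        raise ValueError("controls are identical; cannot normalize")
    return 100.0 * (raw - no_enzyme_mean) / span


def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic model for an inhibition curve.

    Returns the predicted response at inhibitor concentration `conc`
    (same units as `ic50`); the response falls from `top` to `bottom`.
    """
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


if __name__ == "__main__":
    # Hypothetical reads: controls at 100 and 1000 RFU, sample well at 550 RFU.
    print(percent_activity(550.0, 100.0, 1000.0))
    # By construction the model returns the midpoint when conc equals ic50.
    print(four_pl(1.0, bottom=0.0, top=100.0, ic50=1.0, hill=1.0))
```

in practice the four parameters would be estimated by nonlinear least squares over the whole titration series, which is the step the text delegates to prism.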
the peptide boc-gln-ala-arg-amc had a % conversion at min, which was the highest conversion observed, and was used for further assay optimization and inhibitor screening with the recombinant tmprss protein from creative biomart ( figure c ). next, a tmprss titration was performed at constant substrate concentration ( µm) to identify an enzyme concentration that achieves ~ % substrate conversion in min, and this was found to be µm ( figure b ). we then varied assay buffer conditions, such as tris-hcl buffer ph and dmso and tween concentrations, to further optimize enzymatic activity and to determine tolerance to the dmso and tween that are required for inhibitor screening. noting that trypsin activity is optimal at ph . - . , we tested several ph values > and found that ph gave the highest % conversion ( figure c ). however, a ph of , which gave nearly identical % conversion, was chosen for further assay optimization and inhibitor screening. next, enzymatic activity was shown to be tolerant of tween at . % (data not shown) and of dmso up to % (data not shown), well above the dmso concentrations of < % v/v typically applied during the testing of inhibitors. finally, using our optimized assay buffer conditions, we determined the km of our peptide substrate to be µm ( figure d ). the concentration of substrate selected for the biochemical assay was set below km at µm to ensure susceptibility to competitive inhibitors (detailed protocol provided in table figure f ). these data demonstrate appropriate performance for this assay within both plate formats to be useful for hts ( figure f ).
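the reason for setting the substrate concentration below km can be illustrated with the standard cheng-prusoff relation for a competitive inhibitor, ic50 = ki (1 + [s]/km): the larger [s] is relative to km, the more the apparent ic50 is inflated and the harder a competitive inhibitor is to detect. the python sketch below uses hypothetical values only (none are taken from this study), and the function names are illustrative.

```python
def michaelis_menten_rate(s, vmax, km):
    """Initial velocity at substrate concentration s (same units as km)."""
    return vmax * s / (km + s)


def competitive_ic50(ki, s, km):
    """Apparent IC50 of a competitive inhibitor (Cheng-Prusoff relation)."""
    return ki * (1.0 + s / km)


if __name__ == "__main__":
    km = 30.0  # hypothetical Km, in µM
    ki = 1.0   # hypothetical Ki, arbitrary concentration units
    for s in (10.0, 30.0, 300.0):  # substrate below, at, and above Km
        shift = competitive_ic50(ki, s, km) / ki
        print(f"[S] = {s:6.1f} µM -> IC50/Ki = {shift:.2f}")
```

with substrate at one third of km the apparent ic50 is only about 1.33 times ki, whereas at ten times km it is 11 times ki, which is why a sub-km substrate concentration keeps the assay sensitive to competitive inhibitors.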
next, using the established assay in -well format, we tested the inhibition of demonstrated here against numerous proteases tested suggests that neither compound conveys improved target selectivity but rather improved potency, which highlights the need for novel, more specific inhibitors of tmprss .
we developed a fluorogenic biochemical assay for measuring recombinant human tmprss activity for high-throughput screening that can be readily replicated, and used it to demonstrate that nafamostat is a more potent inhibitor than camostat and gabexate. the fluorogenic assay approach taken here has advantages and disadvantages. the assay performs well and was readily scaled to -well format for potential high-throughput robotic screening, can be monitored in real time, and activity is easily detected by the liberation of a fluorophore. the substrate was selected based on the maximal activity of tmprss against it compared with other substrate candidates, and can be considered a tool substrate, rather than one that is physiologically relevant in the context of the action of tmprss against its sars-cov- spike protein cleavage site. a disadvantage common to all fluorescence-based assay readouts is that apparent inhibitors from screening may be false-positive artifacts that quench the fluorescence of the amc product; a simple counter-assay for amc quenchers can be used to identify such false positives. this counter-assay was run on the inhibitors profiled here and showed no dose-dependent quenching of amc fluorescence (overlaid data, figure a -c). two other reports of biochemical assays for tmprss exist, though those studies were unrelated to the role of tmprss in sars-cov- entry. lucas et al.
examined tmprss in the context of prostate cancer. they reported an hts screen at a single concentration, producing several screening hits including the fda-approved bromhexine hydrochloride (bhh), and bhh has also been discussed as a potential drug repurposing tmprss inhibitor candidate . unfortunately, no details of the assay utilized, scale of assay, or its development were described, but the substrate identified in our study as the most amenable for hts (boc-gln-ala-arg-amc) was also used in the lucas study. meyer et al. examined several peptide amc substrates for tmprss and used a biochemical assay to assess modified peptide substrates as tmprss inhibitors, some with observed inhibition constants of approximately nm . they found a high correlation between the inhibition constants of inhibitors for tmprss and matriptase, similar to the cross-inhibition seen for the three inhibitors profiled within our assay. additionally, they report a low active tmprss concentration in the enzyme stock solution. in our study, the enzyme concentration used was µm based on the manufacturer's supplied concentration, but we do not know the proportion of enzyme that is active, and the concentration of active enzyme may be far lower than the total enzyme present.
we report here for the first time the direct biochemical inhibition of tmprss by three clinical agents of interest in covid . given the clinical trial attention on camostat and nafamostat for treating covid , our finding that nafamostat demonstrates greater potency against tmprss supports its evaluation in clinical trials.
protease profiling revealed activity against a range of trypsin-like serine proteases (and greater potency than against tmprss ), but activity was restricted to this protease class and the compounds did not generally inhibit other protease classes such as matrix metalloproteases (mmps), caspases, and ubiquitin-specific proteases (usps). camostat and nafamostat development was reported to focus on trypsin, plasmin, and kallikrein , , because of the role these targets play in pancreatitis, reflux esophagitis and hyperproteolytic conditions, and activity against these enzymes was indeed observed in the protease panel.
beyond the evaluation of current inhibitors, the assay showed acceptable reproducibility and signal-to-background (s:b), indicating its suitability for a drug repurposing screen and for supporting the development of new inhibitors of tmprss . there are several reasons why a new clinical candidate may be valuable. a number of coronaviruses have been shown to rely on tmprss for cellular entry (described in the introduction), so a potent, orally available tmprss inhibitor could be invaluable as a repurposing candidate for treating future emergent coronaviruses. covid is associated with acute respiratory distress syndrome (ards). the lung pathology of the ards shows microvascular thrombosis and hemorrhage, and has been characterized as disseminated intravascular coagulation (dic) with enhanced fibrinolysis, or as diffuse pulmonary intravascular coagulopathy , . this coagulopathy can lead to pulmonary hypertension and cardiac injury. trypsin-like serine proteases are involved in sars-cov- cell entry (tmprss ), in the coagulation cascade (apc) and in the enhanced fibrinolysis (plasmin).
as shown here, camostat, nafamostat and gabexate directly inhibit enzymes involved in all these processes. clinical development of this class of compounds could thus be directed towards treatment of the infection through inhibition of viral entry, towards treatment of the coagulopathy, or conceivably, both. however, the strategies used to treat viral infection and coagulopathies are very different. the former aims to achieve maximum viral suppression, with dose limited by safety and tolerability. the latter seeks to strike a delicate balance between suppressing thrombosis and managing the inevitable increased risk of bleeding. for this reason, the known clinical trials planned or underway for nafamostat and camostat can be divided into three categories: those whose primary endpoint is focused on limiting or preventing infection, those whose primary endpoint is focused on management of ards and advanced disease, and those which may capture treatment benefit through either mechanism. nafamostat is approved and marketed in japan and s. korea and is typically prescribed
gabexate is an iv drug approved for marketing in italy and japan, and was shown to be effective in treating patients with sepsis-associated disseminated intravascular coagulation and in treating acute pancreatitis. it is clearly much less potent than nafamostat and there are no registered covid- clinical studies with it.
in summary, of the three drugs in the class, nafamostat is being studied as the preferred drug in an icu setting, as it can be titrated against coagulation markers as a treatment for coagulopathy, with its antiviral effect considered a bonus. in outpatient, early diagnosis, or prophylactic settings, camostat is being studied predominantly as an antiviral. in these latter settings there is room for new, selective tmprss inhibitors which could achieve higher levels of inhibition without incurring a bleeding risk. the biochemical tmprss assay we disseminate here is a simple, hts-amenable approach to tmprss inhibitor therapeutic development. clinical trials of nafamostat for covid have been reported, and we believe it warrants evaluation given its superior activity over camostat, as demonstrated herein.
compounds were tested against all proteases in dose-response, and activity data were conditionally formatted: dark green = inhibition (ic < µm), light green = (ic > µm), and red = inactive. b) dose-response curves for all three compounds against the eight most sensitive proteases in the panel (full report in supplemental rb report).
rapid repurposing of drugs for covid-
a review of sars-cov- and the ongoing clinical trials
remdesivir: a review of its discovery and development leading to emergency use authorization for treatment of covid-
tmprss : potential biomarker for covid- outcomes
simultaneous treatment of human bronchial epithelial cells with serine and cysteine protease inhibitors prevents severe acute respiratory syndrome coronavirus entry
middle east respiratory syndrome coronavirus infection mediated by the transmembrane serine protease tmprss
identification of nafamostat as a potent inhibitor of middle east respiratory syndrome coronavirus s protein-mediated membrane fusion using the split-protein-based cell-cell fusion assay
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
quest for a covid- cure by repurposing small-molecule drugs: mechanism of action, clinical development, synthesis at scale, and outlook for supply
guanidinobenzoic acid derivatives
new orally active serine protease inhibitors
identification of the first synthetic inhibitors of the type ii transmembrane serine protease tmprss suitable for inhibition of influenza virus activation
the androgen-regulated protease tmprss activates a proteolytic cascade involving components of the tumor microenvironment and promotes prostate cancer metastasis
cleavage of influenza virus hemagglutinin by airway proteases tmprss and hat differs in subcellular localization and susceptibility to protease inhibitors
the membrane-anchored serine protease, tmprss , activates par- in prostate cancer cells
adsorption of trypsin on hydrophilic and hydrophobic surfaces
nafamostat mesylate blocks activation of sars-cov- : new treatment option for covid-
surface loops of trypsin-like serine proteases as determinants of function
matriptase- (tmprss ): a proteolytic regulator of iron homeostasis
regulation of blood coagulation by the protein c anticoagulant pathway. arteriosclerosis, thrombosis, and vascular biology
foy: [ethyl p-( -guanidinohexanoyloxy) benzoate] methanesulfonate as a serine proteinase inhibitor. i. inhibition of thrombin and factor xa in vitro
potential of heparin and nafamostat combination therapy for covid-
immune mechanisms of pulmonary intravascular coagulopathy in covid- pneumonia. the lancet rheumatology
pharmacokinetics studies of nafamostat mesilate (fut), a synthetic protease inhibitor, which has been used for the treatments of dic and acute pancreatitis, and as an anticoagulant in extracorporeal circulation
metabolic fate of c-camostat mesylate in man, rat and dog after intravenous administration
key: cord- - rkrrx a authors: lu, shuai; xie, xi-xiu; zhao, lei; wang, bin; zhu, jie; yang, ting-rui; yang, guang-wen; ji, mei; lv, cui-ping; xue, jian; dai, er-hei; fu, xi-ming; liu, dong-qun; zhang, lun; hou, sheng-jie; yu, xiao-lin; wang, yu-ling; gao, hui-xia; shi, xue-han; ke, chang-wen; ke, bi-xia; jiang, chun-guo; liu, rui-tian title: the immunodominant and neutralization linear epitopes for sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rkrrx a
the coronavirus disease (covid- ) pandemic caused by severe acute respiratory syndrome coronavirus (sars-cov- ) has become a tremendous threat to global health. although vaccines against the virus are under development, the antigen epitopes on the virus and their immunogenicity are poorly understood. here, we simulated the three-dimensional structures of sars-cov- proteins with a high-performance computer, predicted the b cell epitopes on the spike (s), envelope (e), membrane (m), and nucleocapsid (n) proteins of sars-cov- using structure-based approaches, and then validated the epitope immunogenicity by immunizing mice.
almost all predicted epitopes effectively induced antibody production, six of which were immunodominant epitopes in patients identified via the binding of epitopes with the sera from domestic and imported covid- patients, and were conserved within sars-cov- , sars-cov and bat coronavirus ratg . we also found that the immunodominant epitopes of domestic sars-cov- were different from that of the imported, which may be caused by the mutations on s (g d) and n proteins. importantly, we validated that eight epitopes on s protein elicited neutralizing antibodies that blocked the cell entry of both d and g pseudo-virus of sars-cov- , three and nine epitopes induced d or g neutralizing antibodies, respectively. our present study shed light on the immunodominance, neutralization, and conserved epitopes on sars-cov- which are potently used for the diagnosis, virus classification and the vaccine design tackling inefficiency, virus mutation and different species of coronaviruses. the coronavirus disease (covid- ) pandemic caused by the novel severe acute respiratory syndrome coronavirus (sars-cov- ) has caused unprecedented impact on global health. more than million cases were reported by who on june , unsolved, it is possible to model these protein structures based on their reported gene sequence using molecular simulation and then predict their epitopes (lu et al., ) . increasing evidences and s protein, respectively, and only s - (n) induced higher antibodies than that of the non- glycosylated epitope ( fig. f and g) . to investigate the spectrum of antibodies in covid- patients, we detected the binding of the early convalescent sera of imported (europe) cases which infected sars-cov- in early april, and domestic (china) cases in early february, to various epitopes ( table ) k r /g r g /r g /r g s in n protein, respectively (table ) , resulting in different immunodominant epitopes of different virus sub-strains which provide the bases for the differential diagnosis. 
the predicted epitopes induce neutralization antibody production sars-cov- pseudo-virus neutralization assay is a well-accepted method to detect the ability of vaccine to inhibit sars-cov- infection . to assess neutralization antibodies induced by s protein epitopes, we incubated the immunization sera with d or g sars-cov- pseudo-viruses and then the mixture was added to ace - ft cells which stably expressed ace . the results showed that immunized sera of s - , s - , s - and s - epitopes inhibited sars-cov- pseudo-virus infection compared to hbc-s control (p < . ), with inhibition rates around %- % (fig. a) . also, the sera of s - , s - , s - , s - , s - , s - (n) and s - inhibited sars- cov- infection with the inhibition rate from % to % (fig. a) , indicating that these epitopes induced neutralization antibody production. to detect the effect of epitope immunization on the neutralizing responses of g sars-cov- , we incubated the epitope-immunized sera with the g sars-cov- pseudo-viruses. the results showed that sera of epitopes inhibiting d sars-cov- also inhibited g sars-cov- infection, except of s - , s - and s - (fig. b) . however, the immunized sera of epitopes s - , s - , s - , s - , s - , s - , s - (n), and s - only inhibited g sars- cov- pseudo-virus infection. interestingly, compared with its non-glycosylation epitope, s - (n), s - (n) and s - (n) induced less neutralizing antibodies to g pseudovirus while that of s - (n) increased (fig. b ). we then -fold serial diluted the sera with inhibition rate > %, and determined the neutralizing antibody titers induced by these epitopes. s - induced the highest neutralizing effect with antibody titer at : (fig. c ). the structural analysis showed that most of these neutralizing epitopes to d and g sars- cov- were in or near n-terminal domain (ntd), receptor-binding domain (rbd) or s ' cleavage site of s protein and were spatially clustered ( fig. 
d-i), except s - and s - , which are in or near the transmembrane domain and the s /s cleavage site at the interface of the s and s subunits of s protein, respectively.
vaccines are a potent means to control the current covid- pandemic and to prevent future outbreaks, so a full understanding of the immune responses elicited by the virus epitopes is urgently needed. because epitopes are antigenic determinants, identifying and understanding them would facilitate vaccine design and development. since neutralizing antibodies usually recognize the surface of virus proteins, identifying surface epitopes from the d structure of the proteins may make it more efficient to find epitopes that elicit neutralizing antibodies. in this study, we used for the first time a high-performance computer to simulate the three-dimensional structures of the major proteins of sars-cov- and predicted surface epitopes using the modeled protein structures; subsequent mouse immunization and pseudo-virus neutralization assays showed this approach to be efficient and accurate. within the identified epitopes, were conserved with > % homology and shared > % homology among sars-cov- , sars-cov and bat coronavirus ratg (table s ), implying that these epitopes could be used for designing broad-spectrum betacoronavirus vaccines. some surface epitopes of sars-cov- were determined to be immunodominant in the present study by detecting the binding of antibodies in early convalescent sera of covid- patients to the various predicted epitopes. consistent with a previous report, s - was an immunodominant epitope and this epitope was able to elicit neutralizing antibodies ( antibodies targeting the interaction interface between rbd and ace , but also the antibodies binding with the n-terminal domain (ntd) of s protein, such as s - , s - , s - and s - , showed a neutralization effect on the d strain. within the neutralizing ntd epitopes, s - and s - also showed a neutralization effect on the g strain.
antibodies induced by s - but not its glycosylated form inhibited the cell entry of g pseudovirus rather than d pseudovirus, and the epitopes s - (n) and s - (n) induced less neutralizing mutation. s - is at the s /s cleavage site located at the interface of the s and s subunits of s protein, which is important for spike protein-mediated virus-cell membrane fusion. our results suggest that the s - epitope is at a site of vulnerability of sars-cov- and might be an ideal candidate and target site for vaccine development. moreover, our results showed that the neutralizing epitopes are highly spatially clustered, indicating that conformational epitopes in the above regions may be used for designing an effective vaccine. in conclusion, we have successfully predicted sars-cov- epitopes based on the d structures of the s, m, n and e proteins, validated their immunogenicity, characterized the homology of the epitopes among betacoronaviruses, and identified the neutralizing and immunodominant epitopes (table s ). our findings provide a wide spectrum of neutralizing and immunodominant epitopes for the design of an effective, safe vaccine, for differential diagnosis and for virus classification.
serum samples were collected from early convalescent patients with covid- , which was confirmed by sars-cov- real-time reverse transcriptase-polymerase chain reaction (rt-pcr).
golden bridge biotechnology co., beijing, china) and chromogenic substrate tmb (thermofisher, waltham, ma, usa). the cut-off for seropositivity was set as the mean value plus three standard deviations ( sd) in hbc-s control sera. the binding of the epitopes to the sera of covid- patients was detected by elisa using the same procedure as described above: -well plates were coated with . μg peptides and sera were diluted at : . the cut-off lines were based on the mean value + sd in - healthy persons. all elisa studies were performed at least twice.
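the seropositivity rule stated above (control mean plus three standard deviations) can be sketched in python as follows. this is a minimal illustration, not the authors' analysis code: the function names and the optical-density readings are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not say which was used.

```python
import statistics


def seropositivity_cutoff(control_readings, n_sd=3.0):
    """Cut-off = mean of control readings + n_sd * sample standard deviation."""
    if len(control_readings) < 2:
        raise ValueError("need at least two control readings")
    mean = statistics.mean(control_readings)
    sd = statistics.stdev(control_readings)  # assumption: sample SD (n - 1)
    return mean + n_sd * sd


def is_seropositive(reading, cutoff):
    """A sample is called positive when its reading exceeds the cut-off."""
    return reading > cutoff


if __name__ == "__main__":
    controls = [0.1, 0.2, 0.3]  # hypothetical control-sera OD values
    cutoff = seropositivity_cutoff(controls)
    print(round(cutoff, 3), is_seropositive(0.8, cutoff))
```

passing a different `n_sd` covers the second cut-off mentioned for healthy-donor sera, whose multiplier was lost from the source text.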
pooled mouse sera collected at day after the third immunization were diluted in dmem supplemented with % fetal bovine serum, mixed with . × sars-cov- pseudoviruses and incubated at ℃ for h. the mixture was then added to . × ace - t cells and the medium was replaced after h. firefly luciferase activity was measured h post-infection using the bright-glo™ luciferase assay system (promega). all neutralization studies were performed at least twice. three independently mixed replicates were measured for each experiment.
the data presented in this study are expressed as mean ± sem. data were analyzed by one-way analysis of variance (anova), followed by multiple comparisons using dunnett's test, in graphpad prism . software. student's t-test was used to compare non-glycosylated and glycosylated epitopes. p < . was considered significant.
with hbc-s control; ***p < . ; ****p < . ; one-way anova followed by dunnett's test; compared with non-glycosylated epitope; #p < . ; student t-test).
[figure residue: y-axis labels "anti-n antibody titers (log )" and "anti-epitope antibody titers (log )"; x-axis categories are the s, e, m and n epitope peptides and the hbc-s control]
preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies
protection against heterologous human papillomavirus challenge by a synthetic lipopeptide vaccine containing a broadly cross-neutralizing epitope of l
structural characterization of a highly-potent v -glycan broadly neutralizing antibody bound to natively-glycosylated hiv- envelope
structures of human antibodies bound to sars-cov- spike reveal common
epitopes and recurrent features of antibodies immunoinformatics-aided identification of t cell and b cell epitopes in the surface glycoprotein of -ncov sars-cov- viral spike g mutation exhibits higher case fatality rate development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach the sars-cov- vaccine pipeline: an overview emerging coronaviruses: genome structure, replication, and pathogenesis structure- based design of antiviral drug candidates targeting the sars-cov- main protease a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- clinical characteristics of coronavirus disease in china. the new england journal of medicine sars corona virus peptides recognized by antibodies in the sera of convalescent cases bepipred- . : improving sequence-based b-cell epitope prediction using conformational epitopes hepatitis b core vlp-based mis-disordered tau vaccine elicits strong immune response and alleviates cognitive deficits and neuropathology progression in tau alzheimer's disease and frontotemporal dementia structure of m(pro) from sars-cov- and discovery of its inhibitors human neutralizing antibodies elicited by sars-cov- infection bioinformatic prediction of potential t cell epitopes for sars-cov- antibody lineages with vaccine-induced antigen-binding hotspots develop broad hiv neutralization reliable b cell epitope predictions: impacts of method development and improved benchmarking structure of the sars-cov- spike receptor-binding domain bound to the ace receptor transmission dynamics and evolutionary history of -ncov genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding detection of sars-cov- -specific humoral and cellular immunity cov- spike protein. 
biorxiv analysis of a sars-cov- infected individual reveals development of potent neutralizing antibodies to distinct epitopes with limited somatic mutation structural basis of receptor recognition by sars-cov- assessment of synthetic peptides of severe acute respiratory syndrome coronavirus recognized by long-lasting immunity recent progress in broadly neutralizing antibodies to hiv. the covid- vaccine development landscape clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice chadox ncov- vaccination prevents sars-cov- pneumonia in rhesus macaques. biorxiv antigenicity of the sars-cov- spike glycoprotein a human monoclonal antibody blocking sars-cov- infection site-specific glycan analysis of the sars-cov- spike swiss-model: homology modelling of protein structures and complexes cryo-em structure of the -ncov spike in the prefusion conformation epitope-based vaccine design yields fusion peptide-directed antibodies that neutralize diverse strains of hiv- structural basis for the recognition of sars-cov- by full-length human ace design of wide-spectrum inhibitors targeting coronavirus main proteases a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov a highly conserved cryptic epitope in the receptor binding domains of sars-cov- and sars-cov mapping the immunodominance landscape of sars-cov- a pneumonia outbreak associated with a new coronavirus of probable bat origin this work was supported by grants from the national natural science foundation of china **** **** **** key: cord- -nptfd c authors: tengs, torstein; delwiche, charles f.; jonassen, christine monceyron title: a mobile genetic element in the sars-cov- genome is shared with multiple insect species date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: nptfd c unprecedented quantities of sequence data have been generated from the newly emergent severe acute respiratory syndrome coronavirus (sars-cov- ), causative agent of covid- . we document here the presence of s m, a highly conserved, mobile genetic element with unknown function, in both the sars-cov- genome and a large number of insect genomes. although s m is not universally present among coronaviruses and appears to undergo horizontal transfer, the high sequence conservation and universal presence of s m among isolates of sars-cov- indicate that, when present, the element is essential for viral function. single-stranded rna viruses, such as coronaviruses, are known to have genomes with strong secondary structural features, such as stem-loop regions and pseudoknots. we have previously reported the presence of a - nucleotide long hairpin-forming element, referred to as stemloop ii-like motif (s m) (jonassen, et al. ) , in several families of positive-sense singlestranded rna ((+)ssrna virus)) viruses (tengs, et al. ) . the molecular structure has been mapped in great detail for sars-cov (robertson, et al. ) . its sequence and secondary structure are highly conserved, but the phylogenetic distribution among viral genomes is very patchy. these properties indicate that the sequence is mobile, and, to our knowledge, s m represents the only known example of a genetic element with the ability to move between distantly related viruses. the function and evolutionary origin of s m remains unknown, but the high level of conservation seen even in distantly related viruses despite relatively high overall mutation rates suggests that s m is under strong selection. when comparing s m motifs from different virus species, there are conserved residues both in loopand base-pairing regions, indicating that there is selective pressure to maintain both the primary and secondary structure. 
the element is always present near the ' end of the genome, and in all virus families where s m has been reported, there are examples of species carrying two (non-identical) back-to-back copies (quan, et al. ; tengs, et al. ). as noted above, related viruses may lack s m entirely, but when present, its sequence is always highly conserved. sars-cov- (gorbalenya, et al. ) is a member of the sars-related sarbecovirus subgenus (andersen, et al. ) and this group of coronaviruses is known to contain s m (tengs and jonassen ). the presence of s m in the sars-cov- genome (genbank accession mn , position - ) and other members of this group is probably the result of a single horizontal transfer event, predating the divergence of the sars-related viruses (tengs, et al. ; tengs and jonassen ). to characterize the specific genotype of s m found in sars-cov- , blastn (altschul, et al. ) was used to search the entire virus section of genbank using all s m sequence genotypes reported in the literature (n = ) (jonassen, et al. ; robertson, et al. ; tengs, et al. ; tengs and jonassen ) as query sequences. to check for the presence of s m motifs in insects, both the transcriptome shotgun assembly (tsa) and the whole genome shotgun contigs (wgs) databases were mined using the same approach. for the phylogenetic analyses, sequences were aligned using the clustal w algorithm (thompson, et al. ) and maximum likelihood analyses were performed using mega x (kumar, et al. ). the jones-taylor-thornton (jtt) substitution model was used with gamma distribution ( categories) and invariable sites. branch swapping was done using the subtree-pruning-regrafting method with 'very strong' filter. a total of s m-containing accessions were identified, representing at least four virus families (supplementary table ). two of the s m virus accessions identified were from a recently published rna-based invertebrate virosphere project (shi, et al. ).
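the accession count reported above comes from pooling blastn hits across many query genotypes. a minimal sketch of that pooling step is given below, assuming the searches were exported in the standard 12-column blast tabular layout (-outfmt 6). the function name, the e-value and identity cut-offs, and the example rows are hypothetical and are not taken from the study.

```python
import csv
import io

# column order of the standard 12-column blast tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def collect_hit_accessions(tabular_text, max_evalue=1e-5, min_identity=80.0):
    """Return the sorted, de-duplicated subject ids that pass both cut-offs.

    The cut-offs are illustrative assumptions, not values from the paper.
    """
    hits = set()
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # skip blank lines and comment lines
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) <= max_evalue and float(rec["pident"]) >= min_identity:
            hits.add(rec["sseqid"])
    return sorted(hits)


# hypothetical hits: two query genotypes match acc_1, one weak hit on acc_2
example = (
    "query_a\tacc_1\t97.6\t41\t1\t0\t1\t41\t100\t140\t2e-12\t70.1\n"
    "query_b\tacc_1\t95.1\t41\t2\t0\t1\t41\t100\t140\t8e-11\t65.0\n"
    "query_a\tacc_2\t78.0\t41\t9\t0\t1\t41\t300\t340\t3e-02\t30.2\n"
)
print(collect_hit_accessions(example))
```

collecting subject ids in a set means an accession hit by several query genotypes is counted once, which is the behaviour needed when the same motif is searched with many closely related queries.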
these two highly similar sequences, derived from the virome of the spider species tetragnatha maxillosa, encode a hypothetical protein immediately upstream of s m that could not readily be identified using protein sequence similarity searches. protein-protein blast searches against the non-redundant (nr) genbank protein database revealed the two best matching non-viral proteins to be from the insect species winter moth (operophtera brumata) and bagworm moth (eumeta japonica). in the o. a phylogenetic analysis focusing on s m accessions and using the longest and most similar amino sequences obtained from the tsa, wgs and nr databases (supplementary table ) gave a topology that was biased towards lepidopteran species ( figure b ). translation of both genomic and transcriptomic data in some cases gave reading frames containing several internal stop codons, but amino acid sequences that could still be aligned to (near) full length ( figure b, supplementary table ) . as sars-cov- is embedded within the sarbecoviruses, it is likely that the unique g > u mutation has occurred specifically during the evolution of the current pandemic strain of sars-cov ( figure c ). an australian isolate with a base deletion in s m was also discovered (accession mt ). intriguingly, the deletion occurred after passaging isolates in vero cells (caly, et al. ) . this could represent an attenuated sars-cov- strain, as culturing in permissive cell lines may have altered the selective pressure and alleviated the need to maintain a functional version of the motif. in the four virus families where s m has previously been described, the flanking protein is easily recognizable. the protein found in insect species, as well as the two t. maxillosa viruses, does not appear to have homologs in any of the better-characterized virus groups, nor does it appear to contain any recognizable motifs. a protein secondary structure homology search (drozdetskiy, et al. ) indicated that the o. 
brumata protein may contain an integrase catalytic domain, which would be consistent with the mobile nature of s m. we were unable to identify any signature of a retroviral origin of the s m loci in insect genomes, or any other indication of the insect s m contigs being of viral origin. in several of the genomic contigs, the open reading frame (orf) covering the uncharacterized protein also appeared to contain introns, generally considered a hallmark of eukaryotic genes. in addition, pcr and sanger sequencing were performed to confirm the presence of s m and the upstream orf in the o. brumata genome (supplementary figure ), making it very unlikely that the downloaded sequence data stem from (rna) viruses rather than representing bona fide insect sequences. the insect species that contain s m (and the associated protein) are distantly related, indicating either a deep evolutionary origin with multiple losses or that this genetic construct is also a mobile element, perhaps using viruses as a vector. the t. maxillosa virus could represent such a vector, although no s m sequences or proteins similar to the o. brumata protein could be found in any arachnid species using sequence similarity searches. mobility of genetic elements such as transposable elements (tes) has previously been reported in insects (peccoud, et al. ), and an analysis of the genomic s m contigs using the dfam portal (hubley, et al. ) revealed that several of the accessions had regions with a significant degree of similarity with the long interspersed nuclear element (line) l - _ldor_d, previously reported from butterflies (ray, et al. ). the exact evolutionary link between the xenologs of s m and the unknown protein found in insects and viruses cannot be established based on our data. the s m genotypes found in insects and viruses have similar primary sequence profiles, although there appear to be some subtle differences (figure ).
we believe that the most likely mode of transfer for s m in viruses is through nonhomologous recombination between rna molecules. outside the astroviruses, sars-cov and sars-cov- represent the only known examples of s m-carrying viruses that infect humans, but it seems probable that s m is still evolutionarily active and that this element will continue to affect the evolution of (+)ssrna viruses. because the clade of s m-containing coronaviruses that includes sars-cov and sars-cov- also includes viruses isolated from bat and pangolin, it is unlikely that s m has played a direct role in zoonotic transfer, but its high degree of sequence conservation suggests that it may present a target for therapy, and the presence of closely related systems in insect hosts provides an opportunity to study the biology of this apparent mobile element in relatively tractable experimental systems. basic local alignment search tool the proximal origin of sars-cov- isolation and rapid sharing of the novel coronavirus (sars-cov- ) from the first patient diagnosed with covid- in australia weblogo: a sequence logo generator jpred : a protein secondary structure prediction server viruses as vectors of horizontal transfer of genetic material in eukaryotes the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- the dfam database of repetitive dna families a common rna motif in the ' end of the genomes of astroviruses, avian infectious bronchitis virus and an equine rhinovirus mega x: molecular evolutionary genetics analysis across computing platforms massive horizontal transfer of transposable elements in insects identification of a severe acute respiratory syndrome coronavirus-like virus in a leaf-nosed bat in nigeria simultaneous te analysis of heliconiine butterflies yields novel insights into rapid te-based genome diversification and multiple sine births and deaths the structure of a rigorously conserved rna element within the sars virus 
genome redefining the invertebrate rna virosphere distribution and evolutionary history of the mobile genetic element s m in coronaviruses a mobile genetic element with unknown function found in distantly related viruses clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice the authors would like to thank m.sc anbjørg rangberg (Østfold hospital trust) for help with pcr and sequencing and dr. snorre hagen (norwegian institute of bioeconomy research) for providing the operophtera brumata dna. this work was supported by the figure . secondary structure of s m in sars-cov- based on sars-cov model (robertson, et al. ). analysis was performed on orf ab polyprotein sequences from selected coronavirus species. s m-containing accessions have been highlighted and bootstrap values > % indicated ( pseudoreplicates). b) maximum likelihood analysis using data from the s m-associated hypothetical protein (see main text for details). sequences in boldface stem from reading frames with (multiple) internal stop codons. * -genomic data, ** -accessions without s m. c) s m sequences corresponding to operational taxonomic units in the phylogenetic trees. lines above alignment show (non-canonical) base-pairing residues (robertson, et al. ) and the position with the unique g > u mutation in sars-cov- has been indicated. (bottom panel). generated using weblogo (crooks, et al. ). operophtera brumata and related insect species. identical sequences from the same species were removed and a maximum likelihood analysis was performed using mega x (kumar, et al. ) after aligning amino acid sequences using the clustal w algorithm (thompson, et al. ). the jones-taylor-thornton (jtt) substitution model was used with gamma distribution ( categories) and invariable sites. branch swapping was done using the subtree-pruning-regrafting method with 'very strong' filter. 
s m-containing accessions have been indicated (red: insect species, blue: invertebrate viruses) and bootstrap values > % are shown ( pseudoreplicates). supplementary figure . pcr amplification and sanger sequencing of the s m locus in the operophtera brumata genome. pcr was performed using the amplitaq gold pcr master mix (thermo fisher scientific, oslo, norway). primer sequences, orf stop codon (taa) and s m sequence have been underlined. key: cord- -reb vo x authors: miladi, milad; fuchs, jonas; maier, wolfgang; weigang, sebastian; pedrosa, núria díaz i; weiss, lisa; lother, achim; nekrutenko, anton; ruzsics, zsolt; panning, marcus; kochs, georg; gilsbach, ralf; grüning, björn title: the landscape of sars-cov- rna modifications date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: reb vo x in the severe acute respiratory syndrome coronavirus (sars-cov- ) caused the first documented cases of severe lung disease covid- . since then, sars-cov- has been spreading around the globe resulting in a severe pandemic with over . fatalities and large economic and social disruptions in human societies. gaining knowledge on how sars-cov- interacts with its host cells and causes covid- is crucial for the development of novel therapeutic strategies. sars-cov- , like other coronaviruses, is a positive-strand rna virus. the viral rna is modified by rna-modifying enzymes provided by the host cell. direct rna sequencing (drs) using nanopores enables unbiased sensing of canonical and modified rna bases of the viral transcripts. in this work, we used drs to precisely annotate the open reading frames and the landscape of sars-cov- rna modifications. we provide the first drs data of sars-cov- in infected human lung epithelial cells. from sequencing three isolates, we derive a robust identification of sars-cov- modification sites within a physiologically relevant host cell type. 
a comparison of our data with the drs data from a previous sars-cov- isolate, both raised in monkey renal cells, reveals consistent rna modifications across the viral genome. conservation of the rna modification pattern during progression of the current pandemic suggests that this pattern is likely essential for the life cycle of sars-cov- and represents a possible target for drug interventions. severe acute respiratory syndrome coronavirus (sars-cov- ) is an rna virus that causes coronavirus disease . the ongoing covid- pandemic has put an enormous burden on human society in and is expected to have even longer-lasting impacts. despite tremendous ongoing research efforts, we still do not have sufficient antiviral treatment solutions or a vaccine. over the last two decades, the closely related zoonotic betacoronaviruses sars-cov and mers have caused recurring outbreaks in the human population. the ability of coronaviruses (cov) for cross-species transmission, their known reservoirs in multiple species, and their high replication rates keep covs a threat for the human population even beyond the pandemic. understanding the molecular mechanisms behind the replication of sars-cov- is urgently needed. sars-cov- carries an enveloped positive-sense single-stranded rna genome (~ kb) encoding a dense collection of structural and non-structural proteins (nsp), and accessory proteins. like other members of the order nidovirales, the genome encodes two polyproteins followed by a series of orfs that are transcribed into sub-genomic rnas (sgrnas). each transcribed sgrna is thought to be translated into one protein, and its ' untranslated region overlaps with the coding sequence of the shorter downstream sgrnas ( , ) . upon cell entry, orf a and orf b can be translated directly from the viral genome. a − ribosomal frameshifting upstream of the orf a stop codon allows the translation of orf b ( ) . 
the resulting polyproteins, pp a and pp b, are further cleaved by viral proteases and yield and nsps, respectively. the rna-dependent rna polymerase (rdrp) nsp performs the genome replication and the transcription of sgrnas through negative-sense rna template intermediates. to transcribe the sgrnas, the negative rna intermediates undergo discontinuous transcription, in which the rdrp skips the genome region between transcription-regulatory sequences (trs) located at the ' end of the orfs (trs-b sites) and a corresponding trs-leader site at the ' end of the viral genome (for a review please see sola et al. ( )). as a consequence, viral sgrnas share a common ' leader sequence derived from the ' end of the genome up to the trs-l site. like host mrnas, the viral genomic rna and the sgrnas have a methylated ' cap and a polyadenylated ' tail. still, the transcriptomic aspect of covs, including sars-cov- , is not fully understood. transcript-level regulation of gene expression is widely used by the native cellular mechanisms of the host. viruses have adopted and hijacked these mechanisms throughout their evolution ( the direct rna sequencing experiments yielded a total of . , . and . million sequencing reads for the three samples. we mapped the sequences of each dataset to the combination of the human host genome, the yeast enolase gene used as the ont drs spike, and the sars-cov- ncbi reference genome. notably, between - % of the mapped reads were mapped to the virus genome (fig. a) , which is very much comparable to the fraction of viral reads obtained by kim et al. ( ) using the vero host cell line (fig. s a) . in contrast to calu- cells which were used for this study, vero cells are interferon-deficient. thus, our observation seems to indicate that the interferon deficiency of vero cells does not benefit the viral life cycle to an experimentally relevant extent. this is in line with a recent study analyzing the host transcription response to sars-cov- infection ( ). 
the long rna sequencing reads generated for this study cover the entire sars-cov- genomic rna as well as the different orfs (fig b,c, fig. s b ). this allowed us to do an in-depth analysis of the genomic junctions, including the trs-b sites described by kim et al. ( ) . for comparability, we downloaded and we mitigated the variability in the sgrna expression levels and the ont higher coverage bias at the 'end of the transcripts by downsampling the collections of intact sgrna reads. in this way, we get a quasi-uniform distribution of intact reads across all the samples and the sgrnas except for orf b (fig. s b ). for comparison, we also applied the same data processing workflows on datasets from kim et al. and taiaroa et al. ( , ) . we used the intact reads that were identified and down-sampled in the previous step for rna modification detection by drs using the two available in silico methods. for the identification of the modification sites, we used two different approaches for harnessing the sensed electrical signals from sequencing the native rna molecules by nanopores. typically, the electrical signal events aligned to positions, called squiggles, are compared between a condition with unknown putative modifications and a control condition. one strategy to detect the modified genomic positions is to compare the distribution of squiggles of two conditions, both encoding the transcripts of interest. another strategy uses trained statistical models of the control condition to identify modification of the other condition by evaluating disagreement between the observed features and the model expectations. two sets of galaxy workflows based on tombo ( ) and nanocompore ( ) tools were designed to compute the modification scores from the drs data (table s ) . both tombo and nanocompore support the distribution-based strategy while tombo further can perform model-based modification detection. 
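the distribution-based strategy described above can be illustrated with a two-sample kolmogorov-smirnov statistic computed on the signal levels observed at a single position. this is a minimal sketch of the general idea under stated assumptions, not a reimplementation of tombo or nanocompore; the function name and the signal values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest vertical gap between the empirical cumulative
    distribution functions of the two samples (0 = identical, 1 = disjoint).
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # advance both pointers past every value <= x, so ties are handled together
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# hypothetical per-read signal levels at one position (arbitrary units)
control_signal = [100.1, 101.3, 99.8, 100.6]   # unmodified control condition
sample_signal = [104.2, 105.0, 103.7, 100.9]   # condition with putative modification
print(ks_statistic(control_signal, sample_signal))
```

a statistic close to 1 means the two signal distributions barely overlap at that position, which flags it as a candidate modification site. a real analysis would also need a significance test and multiple-testing correction across positions.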
since nanocompore supports biological replicates, we used it as the distribution-based strategy for calling modifications from the three replicates (fig. a). we further used tombo to train models and calculate modification scores for individual samples (fig. b). to this end, we utilized the in vitro transcribed (ivt) data of sars-cov- from kim et al. as the unmodified rna control dataset for both tombo and nanocompore. the distribution of the signals derived from virus rna and unmodified rna is representatively depicted for fr in figure . we identified the positions modified in sars-cov- sgrnas for all the sgrnas and among all the datasets (fig. a,b). we specifically focused on the modification regions of the sub-genomic rnas, i.e., the region downstream of the associated trs-b sites. we excluded the genomic reads due to the moderately low number of intact reads. the modification results for the 'leader were also not considered due to the anomalies observed in the read coverage of the 'leader site (figure b). by comparing the model-based prediction for the presented datasets (fr - ), we identified a high level of correlation between the modification rates of sgrna positions in the three replicates (figure b ). this prompted us to perform a correlation analysis as depicted in figure representatively for sgrna s and n. notably, this analysis revealed a high correlation not only for the modification sites but also for the fractions of modification between biological replicates. we therefore tested the correlation between our data and the previously published data (kr) (fig. ). remarkably, the top-ranked modification sites are consistent and the correlation of the rna modification fractions is also high (fig. ). this observation was confirmed by visual inspection of raw signals ( examples are shown in fig. ). we excluded the data of the australian isolate from this analysis due to the relatively lower read coverage and different ratio of viral reads (fig. 
s a). the large overlap of highly modified sites predicted by two independent algorithms supports the validity of our analysis and findings. however, for sites with a low modification ratio the predicted significance levels sometimes differ, indicating that additional biological replicates are needed to consistently reach a valid significance level. rna modifications are essential modulators of rna stability and function. the recent invention of direct rna sequencing protocols using nanopores enables unbiased detection of rna modification. in general, the analysis of drs raw signals is challenging and not well standardized and thus only possible for experienced bioinformaticians. to enable more researchers to use this technology, we present two highly standardized analysis pipelines for drs data. these pipelines were integrated into the galaxy platform ( ) and are accessible at https://covid .galaxyproject.org/direct-rnaseq together with workflows for mapping reads to the viral genome, for calling genomic variants, and for identifying and extracting sgrna-derived reads. using these pipelines we analyzed the drs data sets generated for this study, serving as the first drs data from europe, and compared them with the data from previous studies. here we generated the sars-cov- drs data sets for the first time for three biological replicates. in contrast to previously published data, viruses were cultured in a disease-relevant human epithelial lung cell line. remarkably, the infection resulted in more than % of poly-a enriched rna reads from sars-cov- . we provide experimental evidence for transcription of a total of sgrnas, two of which are not part of the public sars-cov- reference annotation. the comparative analysis of our three replicates with published data demonstrates a high degree of similarity between isolates from different continents and at both early and recent stages of the epidemic.
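the replicate-to-replicate and isolate-to-isolate agreement of modification fractions reported above can be quantified with a correlation coefficient computed over matched positions. the sketch below assumes pearson's r, since the text does not state which coefficient was used; the function name and the per-position fractions are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# hypothetical fractions of modified reads at the same five positions
replicate_1 = [0.05, 0.40, 0.10, 0.75, 0.20]
replicate_2 = [0.07, 0.38, 0.12, 0.70, 0.25]
print(round(pearson_r(replicate_1, replicate_2), 3))
```

the positions must be paired in the same order in both inputs, so any position missing from one replicate should be dropped from both before calling the function.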
the development of all analysis workflows used for the bioinformatic evaluation of the sequencing data was carried as part of the covid- initiative of the galaxy project ( ) . all galaxy workflows and additional required inputs to them (beyond the sequencing data) are available from the direct rnaseq subpage of the project at https://covid .galaxyproject.org/direct-rnaseq . mapping of the sequence reads to the corresponding genomes, extraction of intact reads and assignment to the sgrnas were performed on the european galaxy server. for mapping, the ont reads of each sample were first mapped to a virtual genome combined of the host (hg ) and the sars-cov- reference (nc_ . ) genomes, as well as host rdna (u . ) and eno spike sequence using minimap ( ) . the subset of reads that mapped to the viral genome was isolated using samtools ( ) and served as input for a second round of mapping to only the viral genome and minimap parameters optimized for the alignment of viral cross-junction sequences similar to kim et al.. the complete mapping steps can be reproduced using our read mapping to viral genome galaxy workflow. for the extraction of intact reads carrying the viral leader sequence and assignment of these reads to viral sgrnas, we used a two-step strategy. first, we used bedtools ( ) and samtools to extract reads, for which the mapping supported a junction between the viral trs-l site and putative landing regions upstream of any potential longer orf beyond orf ab. the list of landing region candidates used at this step includes the regions between each of the predicted structural orfs and the next intervening upstream start codon, but also correspondingly defined regions upstream of potential alternative start codons within the s, a, m, b, n and orfs and enables a relatively unbiased detection of junction sites independent of prior assumptions about trs-b sites. 
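the first classification step described above asks whether a read's alignment supports a junction between the leader site and one of the candidate landing regions. a minimal sketch follows, assuming the spliced aligner reports the skipped genome segment as an n operation in the sam cigar string. the function names and all coordinates are hypothetical placeholders, not positions from the sars-cov- reference, and this is not the galaxy workflow itself.

```python
import re

# one CIGAR element: a length followed by a SAM operation code
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def largest_junction(ref_start, cigar):
    """Return (donor_end, acceptor_start) of the longest N gap, or None.

    Coordinates are 1-based reference positions: the last aligned base
    before the gap and the first aligned base after it.
    """
    pos = ref_start  # reference position of the next aligned base
    best = None
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            gap = (pos - 1, pos + length)
            if best is None or length > best[1] - best[0] - 1:
                best = gap
        if op in "MDN=X":  # operations that consume reference bases
            pos += length
    return best


def classify_read(ref_start, cigar, leader_window, landing_regions):
    """Name the landing region a read jumps into from the leader, or None."""
    junction = largest_junction(ref_start, cigar)
    if junction is None:
        return None
    donor, acceptor = junction
    if not leader_window[0] <= donor <= leader_window[1]:
        return None
    for name, (low, high) in landing_regions.items():
        if low <= acceptor <= high:
            return name
    return None


# hypothetical coordinates for illustration only
leader_window = (60, 80)
landing_regions = {"region_a": (1000, 1050), "region_b": (2000, 2050)}
print(classify_read(10, "60M950N200M", leader_window, landing_regions))
```

using windows rather than exact positions on both sides of the gap tolerates the alignment ambiguity that arises when the same core motif is present at the leader and at the landing site.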
next, we inspected the resulting reads classifications with igv ( ) for evidence of junction events and used this information to build a list of trs-b sites the use of which is supported by the sequencing data. this list was then used in a second round of assignment of reads to viral sgrnas, in which only reads supporting a junction event between the trs-l site and any of the confirmed trs-b sites (with flanking bases on each side to account for alignment ambiguities around the junction sites) were considered. both read classification strategies can be reproduced using galaxy workflows to classify ont reads by candidate junction and to classify ont reads by confirmed junction sites , respectively. we have also the collections of fastq-formatted intact reads with the viral leader sequence were used as input to tombo. first, tombo preprocess and tombo resquiggle commands were invoked on the fastq files and the associated fast collection (option --rna). tombo detect_modification was invoked using the subcommand model_sample_compare (options --fishers-method-context --minimum-test-reads --sample-only-estimates) on the re-squiggled viral reads and the downsampled ivt data from kim et al. the subcommand level_sample_compare was also applied with the same configuration (data not shown). the methylation scores were extracted from the computed statistics using the subcommand text_output browser_files --file-types dampened_fraction. the plots for ionic signals were also generated using tombo. the second workflow for distribution-based comparison of conditions was developed in galaxy using nanocompore and nanopolish ( , ) . to align the raw sequencing event data to the reference genome, nanopolish subcommand eventalign was used (options --samples --scale-events --print-read-names) ( ) . the alignments produced in the previous step in bam format and the associated reads in fastq format were provided to the nanopolish tool. 
in the next step, the tabular output of the event alignment was filtered with awk to remove the rows for events aligned to the first positions of the genome, which cover the leader region. this step was necessary because nanocompore does not natively support spliced alignments. next, the event_align data was processed using nanopolishcomp (https://github.com/a-slide/nanopolishcomp) followed by nanocompore sampcomp (options --sequence_context --logit) to obtain the methylation scores. the p-value score gmm_logit_pvalue_context_ was used to predict methylation. heatmaps of the predicted fraction of modified bases using tombo. the red marks show top- % modified sites per sample that are common in at least two of the three samples. figure : distribution of nanopore measured ionic signals for exemplary regions with high modification scores according to tombo and nanocompore. shown are signals obtained from unmodified rna (black) and one representative sample, fr (red). correlation of the fraction of modified bases in the s (a) and n (b) sgrnas computed using tombo. correlation coefficients are given in red circles. figure : direct rna sequencing raw electrical signals of downsampled reads obtained from unmodified rna (ivt, black), from samples generated for this study and from an isolate from a published korean data set (fr - and kr, red). mapping statistics of data sets and read distributions among the genomes. a: mapping statistics of drs reads for the human genome, ont control eno, and sars-cov- . depicted are results obtained for published data sets from korea (kr) and australia (au). b: top panel, the total number of reads with a 'leader sequence for the different sgrnas and the genome. bottom panel, the sgrna reads downsampled to maximal reads for the downstream modification analysis. supplementary tables table s : genomic variants detected in the three studied isolates. 
to be included in this list the variant site had to show a depth of coverage (dp) > and an alternate allele frequency (af) > . . : counts of reads supporting junctions between trs-l and each of trs-b candidate regions. asterisks mark candidate regions with unconvincingly low number of reads that were not considered for further analysis. ng-and-coverage-analysis continuous and discontinuous rna synthesis in coronaviruses subgenomic messenger rnas: mastering regulation of (+)-strand rna virus life cycle viral and cellular mrna translation in coronavirus-infected cells how viruses hijack cell regulation messenger rna modifications: form, distribution, and function. science rna modifications go viral epitranscriptomic marks: emerging modulators of rna virus gene expression rna modifications modulate gene expression during development. science direct rna sequencing enables m a detection in endogenous transcript isoforms at base-specific resolution author correction: nanopore native rna sequencing of a human poly(a) transcriptome direct rna nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis the architecture of sars-cov- transcriptome direct rna sequencing and early evolution of sars-cov- the genome landscape of the african green monkey kidney-derived vero cell line imbalanced host response to sars-cov- drives development of covid- de novo identification of dna modifications enabled by genome-guided nanopore signal processing rna modifications detection by comparative nanopore direct rna sequencing the galaxy platform for accessible, reproducible and collaborative biomedical analyses: update no more business as usual: agile and effective responses to emerging pathogen threats require open data and open analytics | biorxiv minimap : pairwise alignment for nucleotide sequences the sequence alignment/map format and samtools bedtools: a flexible suite of utilities for comparing genomic 
features integrative genomics viewer (igv): high-performance genomics data visualization and exploration. brief bioinform detecting dna cytosine methylation using nanopore sequencing we are very thankful to the entire galaxy community which is maintaining a vast collection of ngs-tools many of which have been fundamental for realizing this project. we thank the university medical center freiburg for all the support and openness. we thank hervé menager supporting us when experiments were demanding more time and nathan roach and stephan flemming for extending our long-read-tool portfolio. we thank the bundeswehr institute of microbiology, munich, germany, for providing us with sars-cov- isolate muc-imb- / . we are very grateful to oxford nanopore technologies for the awesome support and fast delivery -really impressive. table : trs sites for which evidence has been observed in this study. for each trs we list the following: its position as -based start position of the core motif; its core sequence and the three bases flanking it on each side; the supporting read counts in each of the three samples from this study and in the reanalyzed kr sample (for the trs-l site, these counts are simply the sum of all the trs-b counts since reads were required to support junctions between trs-l and one of the trs-b sites to be considered). 
nanocompore modification results fr - https://usegalaxy.eu/u/milad/h/sars-cov- ont-nanocompore-sampcomp- -replicates- kanalysis historytombo modification results all https://usegalaxy.eu/u/milad/h/sars-cov- -t ombo-re-squiggles-results-data- k key: cord- -yujbcwg authors: al-mulla, fahd; mohammad, anwar; al madhoun, ashraf; haddad, dania; ali, hamad; eaaswarkhanth, muthukrishnan; john, sumi elsa; nizam, rasheeba; channanath, arshad; abu-farha, mohamed; ahmad, rasheed; abubaker, jehad; thanaraj, thangavel alphonse title: a comprehensive germline variant and expression analyses of ace , tmprss and sars-cov- activator furin genes from the middle east: combating sars-cov- with precision medicine date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yujbcwg the severity of the new covid- pandemic caused by the sars-cov- virus is strikingly variable in different global populations. sars-cov- uses ace as a cell receptor, tmprss protease, and furin peptidase to invade human cells. here, we investigated , whole-exome sequences of individuals from the middle eastern populations (kuwait, qatar, and iran) to explore natural variations in the ace , tmprss , and furin genes. we identified two activating variants (k r and n d) in the ace gene that are more common in europeans than in the middle eastern, east asian, and african populations. we postulate that k r can activate ace and facilitate binding to s-protein rbd while n d enhances tmprss cutting and, ultimately, viral entry. we also detected deleterious variants in furin that are frequent in the middle eastern but not in the european populations. this study highlights specific genetic variations in the ace and furin genes that may explain sars-cov- clinical disparity. we showed structural evidence of the functionality of these activating variants that increase the sars-cov- aggressiveness. 
finally, our data illustrate a significant correlation between ace variants identified in people from middle eastern origins that can be further explored to explain the variation in covid- infection and mortality rates globally. the global pandemic covid- caused by sars-cov- virus is life-threatening and has become a significant concern to humanity. notably, the severity of this disease is highly variable in different populations across the world . since the outbreak, several studies have reported specific factors including age, gender, and pre-existing health conditions that could have contributed to the increased severity of the disease [ ] [ ] [ ] . the genetic susceptibility to has also been explored by scrutinizing angiotensin converting enzyme (ace ) genetic variations in different populations. ace is the functional receptor mediating entry of sars-cov- into the host cells , which is facilitated by furin cleavage [ ] [ ] [ ] . transmembrane serine protease (tmprss ) is another candidate gene that has been linked to covid- disease [ ] [ ] [ ] . tmprss expression enhances ace -mediated sars-cov- cell invasion by operating as a co-receptor . the increased cleavage activity of this protease was suggested to diminish viral recognition by neutralizing antibodies and by activating sars spike (s) protein for virus-cell fusion and facilitates the active binding of sars-cov- through ace receptor, which is a risk factor for a more serious covid- presentation [ ] [ ] [ ] . the hallmark of the novel sars-cov- , as compared to other sars viruses, is the presence of a polybasic furin cleavage site. furin has been reported to facilitate the transport of sars-cov- into or from the host cell [ ] [ ] [ ] . notably, a recent study has highlighted the presence of a unique functional polybasic furin cleavage consensus site between the two spike subunits s and s by the insertion of nucleotides encoding prra in the s protein of sars-cov- virus . 
the furin-like cleavage site is cleaved during virus egress, which primes the s-protein and provides a gain-of-function for the efficient spreading of sars-cov- among humans , . it is, therefore, likely that the presence of deleterious ace , tmprss and furin gene variants may modulate viral infectivity among humans, making some people less vulnerable than others. recent studies assessed the genetic variations and eqtl (expression quantitative trait locus) expression profiles in the candidate genes ace , tmprss , and furin to demonstrate the sex and population-wise differences that may influence the pathogenicity of sars-cov- , [ ] [ ] [ ] [ ] [ ] . it is to be noted that these studies focused only on the european and east asian populations. given the extremely high prevalence of obesity ( %), hypertension ( %) and diabetes ( %) in the population of the gulf states [ ] [ ] [ ] , which are considered risk factors for mortality from covid- , , the low infectivity and mortality rates registered in this area of the world are intriguing. even though this could be due to various factors that are not yet well accounted for, such as testing, hot weather or extreme measures taken early on by some countries, it could also be due to ethnic genetic variations in the ace -tmprss -furin genes, which are key regulators of sars-cov- cellular access. as a result, it is crucial to study the variation of these candidate genes ace , tmprss and furin in middle eastern populations to better understand possible natural genetic components that may be responsible for these differences. here, we present a comprehensive comparative assessment of deleterious or gain of function mutations of ace , tmprss , and we first examined ace gene variation frequencies in kuwait, qatar, east asia, and africa, where the impact of sars-cov- has been modest, and compared them to iran, which is moderately affected, and then to europe, the continent with the most deaths per population.
overall, we found human ace gene variations and the probability of loss of function mutations (plof= . , c.i. . - . ) to be low in comparison to ace (plof= . , c.i. . - . ), which is a gene of similar size, indicating that the ace gene is highly intolerant to loss of function mutations. additionally, we identified missense variants in the ace gene from kuwait, qatar and iran (table ). all were rare variants defined by minor allele frequency (maf) of less than %. the genetic variants included four novel variants from kuwait, and six from iran. we identified four deleterious variants (rs , rs , rs , and rs ), causing r w, r q, d v, y c missense amino acid substitutions in the ace gene using risk prediction tools as described in the methods section (table ) . all the ace gene deleterious variants were absent from the african, east asian and qatari data and were very rare in europeans but were present at maf of . - . % in the iranian population (table ) . this suggests a more protective effect and a significant decrease in the disease burden in iran compared to europe (p< . ; table ). the positions of the ace receptor polymorphisms on the linearized ace protein model are shown in supplementary fig. , and d-models for the same are shown in supplementary fig. and supplementary fig. . it is noteworthy that none of the ace polymorphisms identified in this study involved the three ace regions known to directly bind the sars-cov- s-protein receptor binding domain (rbd), namely amino acids - , - , and - ( supplementary fig. ). next, we examined whether natural ace gene variations that increase the affinity of ace to the s-protein or facilitate viral entry/viral load exist more frequently in high-burden compared to low-burden populations. two such genetic variants existed in our data ( table ). the first, rs , is a missense variant that changes a lysine amino acid at position to arginine (k r) ( table ). 
the k , which is just proximal to the first region of the ace receptor involved in s-protein binding, has been shown previously to bind the sterically hindering first mannose in the glycan that is linked to n and thus stabilizes the glycan moiety hindering the binding of s-protein rbd to ace (fig. a) . the missense variant r creates a new hydrogen bond with d , which is then poised to build a salt-bridge with the s-protein rbd k that increases the affinity of sars-cov- to the ace receptor (fig. b) . indeed, the ace k r activating variant was extremely rare in east asian (maf= . %), africans (maf= . %), but the second most common variant in europeans with maf of . % (shown in green fonts in table ). the maf of this variant in the kuwaiti population was nearly half that of europeans (maf= . %), and it was absent from the qatari and iranian exome data (table ). our structural modeling supports the notion that k r is an ace receptor activating variant (fig. a, b) . consistent with these findings, using a synthetic human ace mutant library, a recent study reported that the r variant increased s-protein binding and susceptibility to the virus significantly . we also subjected novel and known ace variants to structural predictions that may impact the binding of sars-cov- to the host cells. these changes (a g, y c, d g, f v and v l) were located proximal to the protein residues that mediate its activity. a previous study indicated that the three amino acid (aa) regions - , especially the residue near lysine and tyrosine , - and - in ace were essential for the binding of s-protein in coronavirus fig. ). the second activating, and by far the most common ace gene variant in europeans (maf= . %) and italians (maf= . %; detected in of exomes) was rs , which replaces the amino acid asparagine at position to aspartic acid (n d) (green font in table ). 
this ace variant was absent from the east asian population ( , exomes) and was significantly rarer in the middle east and africa (table ). this particular variant has been reported before, but its clinical relevance was persistently dismissed because the codon , being far from the ace -spike protein interface, does not appear to be an obvious candidate for ace receptor binding to the s-protein of sars-cov- , . however, we noted that n d is located amino acids proximal to the tmprss cleavage site (aa - ) as shown in supplementary fig. . recent studies demonstrated that tmprss cleavage of the ace receptor increases sars-cov- cellular entry , . it is not unreasonable to suggest that this activating variant may play a similar role in sars-cov- , rendering people who harbor it more prone to severe infection and higher viral load. ace cleavage by tmprss enhances s-protein-driven viral entry . tmprss cleaves ace between residues and , a region spanning the third and fourth helices in the c-terminal collectrin neck domain of the dimer interface of ace . to further dissect the mechanisms underlying the enhanced ace -tmprss accessibility, we performed structural analysis of the recently published ace domain bound to b at (pdb id: m ) and showed that n is located on the same interface of the loop region in close vicinity to the tmprss cleavage site (fig. a) . since loop regions of proteins are disordered and display conformational dynamics, a mutation close to the cleavage site can affect the binding affinity of tmprss . therefore, we used the dynamut web server to predict the effect of the n d mutation on the stability and flexibility of ace , . the predicted stability change (ΔΔg) was - . kcal/mol, which indicated destabilization of the ace receptor after the introduction of d (fig. b) . the n d mutation resulted in an increase in entropy in the loop region (ΔΔsvib: . kcal.mol - .k - ) (fig. b ), depicting a more unstable state, which makes ace more readily cleavable by tmprss .
in addition, we modelled the various non-covalent interactions for both n and mutant d and other amino acids in the loop region ( fig. c, d) . in fig. c , n formed a backbone hydrogen bond with n ; this conformation also resulted in an nh-nh interaction between e and s . with the activating variant d (fig. d ), the backbone hydrogen bond with n is still established; however, the conformational change resulted in a weak polar interaction (through a water-mediated hydrogen bond) between the backbone d coo- group and the backbone nh of l . as such, the d variant altered the conformation of the loop, and the nh-nh bond between e and s cannot be formed. such intermolecular amide interactions are significant for protein stability . such a change in interatomic interactions between amino acids near the cleavage region can decrease the stability and increase the flexibility of the loop , which makes it easier for tmprss to cleave. to gain insights that are pathologically and clinically relevant, we then asked whether a correlation exists between the n d maf data and mortality rates reported in corresponding regions. we identified a significant correlation between n d maf and deaths per million (table ). there were few variants in the kuwaiti and iranian populations that influenced ace expression. in the qatari population the variants modifying ace gene expression were similar in frequency to those in europeans (supplementary table ), with most upregulating ace expression in the brain and tibial nerve tissues (https://gtexportal.org/home/). notably, the tibial nerves are affected in diabetic conditions , which afflict a large number of patients in the middle east. the most downregulating eqtl variant, rs , was present in % and % of the african and qatari populations, respectively. overall, the eqtl data pertaining to ace expression were neither significantly predictive nor informative.
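The correlation between n d maf and deaths per million mentioned above can be sketched as follows. The text does not state which correlation coefficient was used, so a Pearson correlation is assumed here; the function names `pearson_r` and `correlation_t_statistic` are hypothetical, and no values from the study are reproduced.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences,
    e.g. per-population MAF values and deaths per million (assumed inputs)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

In use, one would pass one maf value and one mortality figure per population, in the same order, and compare the resulting t statistic against a t distribution with n - 2 degrees of freedom. With only a handful of populations, such a test has little power, so the result is best read as exploratory.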
five rare deleterious variants and one common deleterious variant were identified in the tmprss gene in the middle eastern population ( supplementary fig. , supplementary table , supplementary table ). overall, no significant conclusion could be drawn from the tmprss genetic variation data. on the expression level, we discerned four tmprss eqtl variants, which were detected only in qataris among the middle eastern populations (supplementary table ). one of these variants, rs , is intronic and downregulates the expression of tmprss in the prostate. two of the eqtl variants, rs and rs , decrease the expression of the tmprss gene in the thyroid and ovary tissues, respectively, while the variant rs upregulates tmprss gene expression in the testis (https://gtexportal.org/home/). it is worth noting that two-thirds of the mortality due to covid- disease affects males , . genetic variation analysis of the furin gene resulted in the identification of known missense variants (table , supplementary fig. ). as with ace , all the identified variants were rare in the middle eastern populations. however, unlike the ace gene, no novel variants were observed in furin in the middle eastern population (table ) . among these, we detected seven deleterious variants suggesting a possible decrease in furin protease function, which can potentially reduce the risk of sars-cov- in the studied populations. in this context, deleterious furin gene variations were observed least in east asians and africans, followed by europeans and then iranians (table ). interestingly, in both qatar and kuwait the deleterious furin genetic variations were more common, suggesting a possible protective effect against sars-cov- (p< . ; table ). for example, the mafs of r c and r c in kuwait and qatar were . %/ . % and . %/ . %, respectively, compared to significantly lower mafs of the corresponding variants in europeans (p< . ; table ).
together, these data support the premise that furin gene variants may play important roles in protection against sars-cov- in the middle east. it is worth noting, however, that in africa furin gene variants may not be a contributing factor in viral protection (table ) . we next sought to determine the extent of furin expression in the middle east. we detected furin eqtl variants in the middle eastern populations, most of which were reported in the qatar population (supplementary table ). we observed a high frequency of the furin upregulating variants, rs ( %) and rs ( %) in the african populations compared to the middle eastern, european and east asian populations (p< . ) (supplementary table ). in the geneatlas phewas database circulating between qatar, kuwait and europe, which was attributed to traveling and repatriation between these countries (manuscript in preparation; gisaid ). for these reasons, we argued that the most likely explanation for the differences observed in mortality rate among countries may be attributed to genetic variation in human genes involved in sars-cov- processing and cellular entry or exit. therefore, we screened the genetic variations and eqtl expression of the sars-cov- candidate genes, ace , tmprss and furin in three middle eastern populations: kuwaiti, iranian, and qatari and compared them to available maf data in the gnomad database . in this study we showed that amongst the nine known ace missense variants, n d (rs ) and k r (rs ) were the most frequent in the global datasets . in agreement with our analysis, structural predictions by stawiski and colleagues revealed that the k r missense variant enhanced the affinity of ace for sars-cov- whereas n d had little involvement in the sars-cov- s-protein interaction . our data suggest that the ace receptor notably, the ace novel variant q p is close to the aa region - , which is important for cleavage by the metalloprotease adam . 
r h and the two deleterious variants r w and r q are located within residues - essential for cleavage by tmprss d and tmprss . in fact, the mutation of arginine (such as r ) and lysine residues within aa residues - markedly reduced ace cleavage by tmprss . however, it should be mentioned that sequentially distant aa residues in the ace receptor can be brought structurally proximal to each other to create active sites for catalysis . we illustrated this in supplementary fig. to show that the novel aa changes (colored green) are proximal to the protein residues that mediate its activity (colored blue and red). further studies are needed to directly assess the functional aspects of the reported missense variants. we urge the international community to assess ace variation differences between people with mild/asymptomatic disease and patients presenting with severe respiratory distress syndrome. interestingly, there was not a single individual in the middle east with ace receptor variations in amino acids known to be crucial for sars-cov- s-protein binding (k , e , d , m , k ) . this may indicate that natural immunity conferred by ace receptor variations is lacking or extremely rare. the rareness is evident from a recent comprehensive study that scrutinized ace gene variations in more than , individuals from different worldwide population groups, in which only eight rare ace gene missense variants (k r, e k, e k, d v, n i, h r, y h, and q l) with reduced binding to the s-protein of sars-cov- were reported . notably, this study has disproved the claim of cao and colleagues on the absence of such protective ace variants in human populations . finally, according to the human protein atlas portal, the mrna and protein expression of the ace gene is predominant in the human testis, cardiovascular tissue and type ii pneumocytes (https://www.proteinatlas.org/ensg -ace /tissue).
further, a recent study that profiled human testes by scrna-seq revealed the presence of this receptor in spermatogonia, leydig and sertoli cells . these findings may indicate that the human testis is a potential target for coronavirus, consistent with the male prevalence in infected cases. notably, in our data, kuwaiti individuals carrying ace missense variants were all males. also, a chinese population study observed a higher hemizygous mutation rate in males than in females . it is also possible that the high hemizygosity seen in males is responsible for the male prevalence in infected cases. this hypothesis is further supported by the recent detection of sars-cov- virus in the semen of infected people , . in summary, we report the differential occurrence of gene variants in the middle eastern the whole-exome sequence data of kuwaitis , iranians and qataris published previously were used in this analysis. the genetic data of non-finnish european, east asian and african american populations were obtained from the gnomad repository , which contains data on a total of , exomes and , genomes (https://gnomad.broadinstitute.org/). the expression data for ace , tmprss , and furin were obtained from the genotype-tissue expression (gtex) database (https://gtexportal.org/home/). the same database portal was used to extract expression quantitative trait loci (eqtls) for the three genes. the missense variants were defined as deleterious when predicted to be damaging, probably damaging, disease causing and deleterious by the five algorithms applied, sift , polyphen- humvar, polyphen- humdiv , mutationtaster and lrt score and/or cadd (combined annotation-dependent depletion) score of more than . we considered only deleterious variants with minor allele frequency (maf) less than % in the burden analysis. the significance of the differences in mafs between different populations was calculated using the chi-square test in the r software (https://www.r-project.org/).
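The chi-square comparison of minor allele frequencies described above can be sketched as follows. The authors report using R; this is a stdlib-only Python illustration. The function name `chi_square_2x2` and its arguments are hypothetical, and the sketch assumes a plain Pearson chi-square on a 2x2 table of alternate versus reference allele counts, without the continuity correction that R's `chisq.test` applies to 2x2 tables by default.

```python
import math


def chi_square_2x2(alt_a, total_a, alt_b, total_b):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    comparing alternate-allele counts between two populations.

    alt_*   : number of alternate alleles observed in each population
    total_* : total number of alleles genotyped in each population
    Returns (chi2, p_value).
    """
    table = [
        [alt_a, total_a - alt_a],
        [alt_b, total_b - alt_b],
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; the test is undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For rare variants, expected counts are often small, in which case an exact test may be more appropriate than the chi-square approximation. As noted in the text, p-values obtained this way are not corrected for multiple testing.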
all the p-values presented in the tables are not corrected for multiple testing. p values ≤ . were considered significant. all the identified ace missense exon variants were mapped, modeled, and analyzed using pymol modeling software (https://pymol.org/ /). dynamut web server was used to predict the effect of genetic variants on the stability and flexibility of ace receptor , . the missense variants were defined as deleterious when predicted to be damaging, probably damaging, disease causing and deleterious by the five algorithms applied (sift, polyphen- humvar, polyphen- humdiv, mutationtaster and lrt score) and/or cadd score of more than .
we considered only deleterious variants with minor allele frequency less than % in the burden analysis. p values are calculated using chi-square test. kwt-kuwaitis; irn-iranians; qar-qataris; eur-europeans (non-finnish); eas-east asians; afr-africans. the mutant r (magenta) forms one h-bond with mannose, a process that could affect the stability of the glycan moiety-ace interaction and increase the affinity of the ace a-helix for the s protein rbd. r functions as a backbone and interacts with d , which in turn is poised to build a salt bridge with the s protein rbd k (yellow stick). an interactive web-based dashboard to track covid- in real time clinical features of patients infected with novel coronavirus in wuhan risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study structural basis for the recognition of sars-cov- by full-length human ace virus strain of a mild covid- patient in hangzhou representing a new trend in sars-cov- evolution related to furin cleavage site the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade sars-cov- receptor ace and tmprss are primarily expressed in bronchial transient secretory cells ace and tmprss variants and expression as candidates to sex and country differences in covid- severity in italy sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor evidence that tmprss activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response the proximal origin of sars-cov- cleavage of a neuroinvasive human respiratory virus spike glycoprotein by proprotein convertases modulates neurovirulence and virus spread within the central nervous system host cell proteases: critical
determinants of coronavirus tropism and pathogenesis comparative genetic analysis of the novel coronavirus ( -ncov/sars-cov- ) receptor ace in different populations asians and other races express similar levels of and share the same genetic polymorphisms of the sars-cov- cell-entry receptor ace variants underlie interindividual variability and susceptibility to covid- in italian population human ace receptor polymorphisms predict sars-cov- susceptibility individual variation of the sars-cov receptor ace gene expression and regulation obesity in gulf countries diabesity in the arabian gulf: challenges and opportunities the prevalence of pre-diabetes and diabetes in the kuwaiti adult population in clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study evidence for ace -utilizing coronaviruses (covs) related to severe acute respiratory syndrome cov in bats the sequence of human ace is suboptimal for binding the s spike protein of sars coronavirus . 
biorxiv receptor and viral determinants of sars-coronavirus adaptation to human ace angiotensin-converting enzyme : the first decade ace x-ray structures reveal a large hinge-bending motion important for inhibitor binding and catalysis ace coding variants: a potential x-linked risk factor for covid- disease tmprss and adam cleave ace differentially and only proteolysis by tmprss augments entry driven by the severe acute respiratory syndrome coronavirus spike protein a transmembrane serine protease is linked to the severe acute respiratory syndrome coronavirus receptor and activates virus entry duet: a server for predicting effects of mutations on protein stability using an integrated computational approach dynamut: predicting the impact of mutations on protein conformation, flexibility and stability amide-amide and amide-water hydrogen bonds: implications for protein folding and stability probing protein stability and proteolytic resistance by loop scanning: a comprehensive mutational analysis elasticity of the tibial nerve assessed by sonoelastography was reduced before the development of neuropathy and further deterioration associated with the severity of neuropathy in patients with type diabetes ace expression in kidney and testis may cause kidney and testis damage after -ncov infection an atlas of genetic associations in uk biobank presenting characteristics, comorbidities, and outcomes among patients hospitalized with covid- in the new york city area global initiative on sharing all influenza data -from vision to reality variation across , human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes exome sequencing identifies potential risk variants for mendelian disorders at high prevalence in qatar androgen-induced tmprss activates matriptase and promotes extracellular matrix degradation, prostate cancer cell invasion, tumor growth, and metastasis receptor recognition by the novel coronavirus from wuhan: an 
analysis based on decade-long structural studies of sars coronavirus scrna-seq profiling of human testes reveals the presence of the ace receptor, a target for sars-cov- infection in spermatogonia sars-cov- and the testis: similarity with other viruses and routes of infection covid- and semen: an unanswered area of research assessment of coding region variants in kuwaiti population: implications for medical genetics and population genomics iranome: a catalog of genomic variations in the iranian population predicting amino acid changes that affect protein function a method and server for predicting damaging missense mutations predicting the functional impact of protein mutations: application to cancer genomics identification of deleterious mutations within three human genomes cadd: predicting the deleteriousness of variants throughout the human genome this work was supported by coronavirus emergency resilience grant from kuwait foundation for the advancement of sciences. all the authors declare no competing interests. correspondence and requests for materials should be addressed to f.a-m. figure . ace protein structure (open form, yellow; and closed form with substrate, gray). the active site residues are coded in red color. the zinc binding residues are coded in blue color. the identified novel amino acid changes in the middle eastern populations are coded in green color. the novel changes are proximal to the protein residues that mediate its activity. key: cord- - xyrmk a authors: chadchan, sangappa b.; maurya, vineet k.; popli, pooja; kommagani, ramakrishna title: the sars-cov- receptor, angiotensin converting enzyme (ace ) is required for human endometrial stromal cell decidualization date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xyrmk a study question is sars-cov- receptor, angiotensin-converting enzyme (ace ) expressed in the human endometrium during the menstrual cycle, and does it participate in endometrial decidualization? 
summary answer ace protein is highly expressed in human endometrial stromal cells during the secretory phase and is essential for human endometrial stromal cell decidualization. what is known already ace is expressed in numerous human tissues including the lungs, heart, intestine, kidneys and placenta. ace is also the receptor by which sars-cov- enters human cells. study design, size, duration proliferative (n = ) and secretory (n = ) phase endometrium biopsies from healthy reproductive-age women and primary human endometrial stromal cells from proliferative phase endometrium were used in the study. participants/materials, setting, methods ace expression and localization were examined by qrt-pcr, western blot, and immunofluorescence in both human endometrial samples and mouse uterine tissue. the effect of ace knockdown on morphological and molecular changes of human endometrial stromal cell decidualization was assessed. ovariectomized mice were treated with estrogen or progesterone to determine the effects of these hormones on ace expression. main results and the role of chance in human tissue, ace protein is expressed in both endometrial epithelial and stromal cells in the proliferative phase of the menstrual cycle, and expression increases in stromal cells in the secretory phase. the ace mrna (p < . ) and protein abundance increased during primary human endometrial stromal cell (hesc) decidualization. hescs transfected with ace -targeting sirna were less able to decidualize than controls, as evidenced by a lack of morphology change and lower expression of the decidualization markers prl and igfbp (p < . ). in mice during pregnancy, ace protein was expressed in uterine epithelial and stromal cells, and its expression increased through day six of pregnancy. finally, progesterone induced expression of ace mrna in mouse uteri more than vehicle or estrogen (p < . ). large scale data n/a.
limitations, reasons for caution experiments assessing the function of ace in human endometrial stromal cell decidualization were in vitro. whether sars-cov- can enter human endometrial stromal cells and affect decidualization has not been assessed. wider implications of the findings expression of ace in the endometrium may allow sars-cov- to enter endometrial epithelial and stromal cells, which could impair in vivo decidualization, embryo implantation, and placentation. if so, women with covid- may be at increased risk of early pregnancy loss. study funding/competing interest(s) this study was supported by national institutes of health / national institute of child health and human development grants r hd and r hd to rk and washington university school of medicine start-up funds to rk. the authors declare that they have no conflicts of interest. although much of the focus during the severe acute respiratory syndrome coronavirus (sars-cov- )/coronavirus disease (covid- ) pandemic has been on respiratory symptoms, some reports suggest that sars-cov- and the related middle east respiratory syndrome coronavirus can cause pregnancy complications such as pre-term birth and miscarriages (favre et al. ). additionally, a few reports noted that pregnant women with covid- had maternal vascular malperfusion and decidual arteriopathy in their placentas (schwartz and dhaliwal ; shanes et al. a), and a recent clinical case study reported a second-trimester miscarriage in a woman with covid- (baud et al. ). however, whether sars-cov- infects the uterus has not been determined. it seems likely that sars-cov- could infect the uterus because its receptor, angiotensin converting enzyme (ace ), is expressed fairly ubiquitously in human tissues such as the lungs, heart, intestine, kidneys, and placenta (hamming et al. ; harmer et al. ; riviere et al. ). moreover, ace functions by cleaving the vasoconstrictor angiotensin ii to the vasodilator angiotensin ( - ).
as a component of the renin-angiotensin system, ace plays an important role in regulating maternal blood pressure during pregnancy. ace is expressed in the rat uterus during mid- and late pregnancy (merrill et al. ; neves et al. ). in addition, ace mrna expression was noted in the uterus of both rats (brosnihan et al. during the secretory phase, the uterine stromal cells prepare for embryo implantation by undergoing a progesterone-mediated differentiation process called decidualization. in this process, the stromal cells divide, change from a fibroblastic to an epithelioid morphology, and change their pattern of gene expression. decidualization is essential for trophoblast invasion and placentation (carson et al. ; norwitz et al. ; wilcox et al. ), and defects in this process may underlie early pregnancy loss in some women. given the important function of the uterine stroma and the possibility that sars-cov- could infect the uterus, our goal here was to determine whether ace is expressed in endometrial stromal cells, is regulated by progesterone, and is required for decidualization. we first sought to determine whether ace is expressed in the endometrium and whether its expression differs according to the phase of the menstrual cycle. thus, we obtained endometrial biopsies from women during the proliferative or secretory phase of the menstrual cycle and performed immunofluorescence with an ace -specific antibody. in the proliferative phase, ace was more highly expressed in epithelial cells than in stromal cells (fig. a). however, in the secretory phase, ace expression was increased in the stromal cells (fig. a). thus, we wondered whether ace expression increased during in vitro decidualization of human endometrial stromal cells (hescs). we isolated primary hescs, exposed them to decidualizing conditions, and confirmed that expression of the decidualization markers prolactin (prl) and insulin-like growth factor-binding protein- (igfbp ) increased over six days.
ace mrna also increased over this time ( fig. a) . consistent with this finding, ace protein abundance increased during decidualization, as shown by both immunoblotting (fig. b) and immunofluorescence (fig. c) . as expected, ace protein predominantly localized in the cytoplasm and cell membrane of decidualized hescs. next, we wondered whether ace was required for primary hesc decidualization. to answer this question, we transfected hescs with control or ace -targeting sirnas and then exposed the cells to decidualization conditions. hescs transfected with control sirna changed from fibroblastic to epithelioid morphology (fig. a) and had increased expression of the decidualization markers prl and igfbp (fig. b) . in contrast, hescs transfected with ace - targeting sirna did not show a morphology change over six days (fig. a) and expressed significantly less prl, igfbp , and ace than control cells ( fig. b-c) . these results demonstrate that ace is essential for endometrial stromal cell decidualization. finally, we examined the expression of ace in the endometrium during early pregnancy in mice. we mated female wild-type mice with males of proven fertility and then stained their uteri with an ace -specific antibody at different days in early pregnancy. in days one through four, ace localized to the cytoplasm and cell surface of epithelial and stromal cells. however, beginning on day three, strong ace staining was seen in the cytoplasm of stromal cells. this staining was evident at least through day six, which is when robust decidualization occurs (fig. ) . given this change in ace abundance during pregnancy, we wondered whether ace expression was regulated by steroid hormones. to test this, we ovariectomized six-week-old mice, waited two weeks, treated the mice with either estrogen or progesterone for six hours, and then collected the uteri (fig. a) . 
uteri from progesterone-treated mice expressed significantly more ace mrna than uteri from vehicle-treated mice, which expressed significantly more ace mrna than uteri from estrogen-treated mice (fig. b). consistent with this, immunofluorescence revealed that uteri from progesterone-treated mice had significantly more ace protein in stromal cells than did uteri from vehicle- or estrogen-treated mice (fig. c). together, our findings suggest that ace expression in the endometrial stroma is promoted by progesterone in both humans and mice. moreover, we show that ace is required for human stromal cell decidualization. given the high ace expression in the human endometrium, sars-cov- may be able to enter endometrial stromal cells and elicit pathological manifestations in women with covid- . if so, women with covid- may be at increased risk of early pregnancy loss. as more data become available, epidemiologists and obstetricians should focus on this important issue and determine whether women who intend to get pregnant should undergo additional health screenings during the covid- pandemic. informed consent was obtained in accordance with a protocol approved by the washington corporation, carlsbad, usa) as described previously (camden et al. protein extracts were prepared from hescs as described previously (oestreich et al. ). with dapi (cat. no. p thermo scientific). immunofluorescence images were captured on a confocal microscope (leica dmi b). hescs were grown on poly-l-lysine coated coverslips in -well plates and allowed to decidualize for six days in epc media as described above. then, cells were fixed with % sexually mature ( -week-old) cd females were mated to fertile wild-type males, and copulation was confirmed by the presence of a vaginal plug on the following morning, designated as day post-coital (dpc). mice were euthanized, and uteri were collected on , , , , , and dpc.
to determine the uterine estrogen or progesterone responses, six-week-old cd mice were bilaterally ovariectomized, rested for two weeks to allow the endogenous ovarian-derived steroid hormones to dissipate, and then subcutaneously injected with μl sesame oil (vehicle control), mg progesterone, or ng estradiol (sigma-aldrich) in μl sesame oil. six hours later, mice were euthanized, uterine tissues were collected and fixed in % paraformaldehyde, and rna was isolated and processed for qrt-pcr (kommagani et al. ) . a two-tailed paired student t-test was used to analyze experiments with two experimental groups, and analysis of variance by non-parametric alternatives was used for multiple comparisons to analyze experiments containing more than two groups. p< . was considered significant. all data are presented as mean ± sem. graphpad prism software was used for all statistical analyses. we thank dr. deborah j. frank (department of obstetrics and gynecology, washington university) for assistance with manuscript editing. rk conceived the project, supervised the work, analyzed the data, and wrote the manuscript. the authors have no conflicts of interest to declare. shown as mean ± sem. the experiment was repeated three times; *p < . , **p < . , and ***p < . . the role of asymptomatic class, quarantine and isolation in the transmission of covid- renin angiotensin system blockage by losartan neutralize hypercholesterolemia-induced inflammatory and oxidative injuries second-trimester miscarriage in a pregnant woman with sars-cov- infection decidualized pseudopregnant rat uterus shows marked reduction in ang ii and ang-( - ) levels growth regulation by estrogen in breast cancer (greb ) is a novel progesterone-responsive gene required for human endometrial stromal decidualization embryo implantation a novel angiotensin-converting enzyme-related carboxypeptidase (ace ) converts angiotensin i to angiotensin - -ncov epidemic: what about pregnancies? 
neovascularization produced by angiotensin ii cyclic decidualization of the human endometrium in reproductive health and failure angiotensin ii-induced apoptosis in rat cardiomyocyte culture: a possible role of at and at receptors tissue distribution of ace protein, the functional receptor for sars coronavirus. a first step in understanding sars pathogenesis quantitative mrna expression profiling of ace , a novel homologue of angiotensin converting enzyme potential influence of covid- /ace on the female reproductive system emergence of a novel coronavirus, severe acute respiratory syndrome coronavirus : biology and therapeutic options the promyelocytic leukemia zinc finger transcription factor is critical for human endometrial stromal cell decidualization acceleration of the glycolytic flux by steroid receptor coactivator- is essential for endometrial decidualization evidence for a novel angiotensin ii receptor involved in angiogenesis in chick embryo chorioallantoic membrane angiotensin-( - ) in normal and preeclamptic pregnancy isolation of human endometrial stromal cells for in vitro decidualization integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of sars-cov- viral entry and highlights inflammatory programs in putative target cells angiotensin ii regulates vascular and endothelial dysfunction: recent topics of angiotensin ii type- receptor signaling in the vasculature ace and ang-( - ) in the rat uterus during early and late gestation implantation and the survival of early pregnancy the autophagy gene atg l is necessary for endometrial decidualization decidualization of the human endometrium regulation of decidualization and angiogenesis in the human endometrium: mini review renin-angiotensin system: biochemistry and mechanisms of action expression and significance of ace -ang-( - )-mas axis in the endometrium of patients with polycystic ovary syndrome 
angiotensin-converting enzyme (ace ) and ace activities display tissue-specific sensitivity to undernutrition-programmed hypertension in the adult rat angiotensin-( - ) is an endogenous ligand for the g protein-coupled receptor postmortem examination of patients with covid- infections in pregnancy with covid- and other respiratory rna virus diseases are rarely, if ever, transmitted to the fetus: experiences with coronaviruses, hpiv, hmpv rsv, and influenza placental pathology in covid- ', medrxiv cell entry mechanisms of sars-cov- ' a human homolog of angiotensin-converting enzyme. cloning and functional expression as a captopril-insensitive carboxypeptidase the vasoactive peptide angiotensin-( - ), its receptor mas and the angiotensin- converting enzyme type are expressed in the human endometrium the pivotal link between ace deficiency and sars-cov- infection hydrolysis of biological peptides by human angiotensin-converting enzyme- related carboxypeptidase roadmap to embryo implantation: clues from mouse models time of implantation of the conceptus and loss of pregnancy cryo-em structure of the -ncov spike in the prefusion conformation angiotensin ii type receptor mediates programmed cell death tridimensional visualization reveals direct communication between the embryo and glands critical for implantation primary decidual zone formation requires scribble for pregnancy success in mice impact of population movement on the spread of -ncov in china a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord- - gq yusk authors: xiang, boyu; hu, xiangyu; li, haoxuan; ma, li; zhou, hao; wei, ling; fan, jue; zheng, ji title: scrna-seq discover cell cluster change under oab: ace expression reveal possible alternation of -ncov infectious pathway date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: gq yusk objective previous study indicated that bladder cells which express ace were a potential infection route of -ncov. 
this study observed differences in bladder cell clusters and their ace expression between oab mice and healthy mice, indicating that the likelihood and pathway of infection may change under overactive bladder (oab) conditions. material and method public datasets were used to obtain ace expression in normal human and mouse bladder (gse ). we established an oab model and studied its impact on cell typing and ace expression. using single-cell rna sequencing (scrna-seq), we measured bladder cell clustering and ace expression in the various cell types. result in the public datasets (healthy human and mouse bladder), ace expression in humans and mice is concentrated in bladder epithelial cells. umbrella cells, a component of the bladder epithelium, had disappeared in our oab model. in the two mouse bladder samples, ace expression of epithelial cells is . %, also the highest of all cell types. conclusion the disappearance of umbrella cells may alter the infection pathway of -ncov and may relate to the onset and progression of oab. in december , a novel coronavirus named -ncov emerged in wuhan, china, and swept around the world in just one month. in addition to marked fever and respiratory disorder, urinary system damage was also detected in some -ncov patients. although -ncov has not been detected in urine so far, the detection of sars-cov in urine samples of sars patients hints at the importance of virus testing in urine samples of -ncov patients and suggests that the urinary system is a potential route of infection. the expression and distribution of receptors determine the pathway of viral infection, which is of great significance for understanding the pathogenesis and designing treatment strategies.
-ncov was reported to share the same receptor with sars-cov, angiotensin-converting enzyme (ace ). a previous study showed that bladder epithelial cells are potential target cells for the novel coronavirus and that the ace concentration follows a specific pattern, decreasing from the outer layer of the bladder epithelium (umbrella cells) to the inner layer (basal cells), with the intermediate cells in between. available data show that older people with underlying diseases have a higher chance of infection and higher mortality if they are infected, so it is urgent to determine how underlying disease changes the -ncov infection pattern. the purpose of our study is to explore cell cluster structure and the ace expression pattern in the bladder, and to use scrna-seq results from mice to infer the effects of oab, a human disease of old age, on umbrella cells and intermediate cells in the bladder epithelium. oab is a common disease of old age that incurs high medical costs and greatly affects patients' quality of life, and it can realistically simulate the effects of age-related disease on the bladder epithelium; we therefore constructed a model using oab mice. bioinformatics scrna-seq has been widely used in -ncov research because it can analyze gene expression of all cell types in multiple tissues at high resolution and without bias. because this potential urinary tract infection pathway may be critical to preventing -ncov, we used two scrna-seq transcriptomes to analyze bladder tissue from normal and oab mice and combined them with human and mouse bladder ace expression data from a public database to analyze the impact of underlying disease on this potential route of infection. gene expression matrices of scrna-seq data from normal humans and mice were downloaded from the gene expression omnibus (gse ).
we replicated the downstream analysis using the code provided by the authors of the original paper. we established an overactive bladder (oab) model and studied its impact on cell typing and ace expression. using scrna-seq, we measured bladder cell clustering and ace expression in the various cell types. bladder tissue samples were collected and immediately stored in the gexscope tissue preservation solution (singleron biotechnologies) at - °c. before dissociating the tissue, the specimens were washed three times with hanks balanced salt solution (hbss) and shredded into - mm pieces. at °c in a ml centrifuge tube, the tissue pieces were dissolved in ml in pbs, the concentration of single-cell suspension was × cells/ml. to retain only high-quality cells, we discarded cells with fewer than genes or more than genes, as well as those whose mitochondrial content was higher than %. after screening, cells were kept for the following analysis. the tsne projection of the data was calculated as described previously. cell clusters with resolution . were obtained using the seurat findclusters function. cell types were assigned to the clusters using canonical markers. in the dataset (gse ), the annotated cell types were essentially the same as in the original research, except for the mouse neurons ( figure a , b, a and b). the fibroblasts in the red box were annotated as neurons by gpm a in the original article, but more credible markers (col a , col a , and dcn) indicated that they were fibroblasts. in human bladder, the expression of ace was mainly concentrated in three subtypes of epithelial cells, with a small amount distributed in fibroblasts and monocytes ( figure c and d). in mouse bladder, ace was also concentrated in epithelial cells, but at a higher density than in human ( figure c and d). as shown in figure , the ace expression in human bladder broadly corresponded to that of mouse bladder.
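The cell-level quality filter described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (the text indicates the analysis was run with Seurat). The `Cell` record, the function names, and the threshold parameters `min_genes`, `max_genes`, and `max_mito_frac` are hypothetical; the actual cutoff values are not recoverable from this text, so they are left as arguments.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str      # hypothetical cell identifier
    n_genes: int      # number of genes detected in the cell
    mito_frac: float  # fraction of counts from mitochondrial genes (0 to 1)


def passes_qc(cell, min_genes, max_genes, max_mito_frac):
    """Keep a cell only if its gene count lies within [min_genes, max_genes]
    and its mitochondrial fraction does not exceed max_mito_frac."""
    gene_count_ok = min_genes <= cell.n_genes <= max_genes
    mito_ok = cell.mito_frac <= max_mito_frac
    return gene_count_ok and mito_ok


def filter_cells(cells, min_genes, max_genes, max_mito_frac):
    """Return the cells that pass all three QC criteria, in input order."""
    return [c for c in cells
            if passes_qc(c, min_genes, max_genes, max_mito_frac)]
```

Cells with too few genes are typically empty droplets or debris, cells with too many are often doublets, and a high mitochondrial fraction usually marks damaged cells, which is why all three criteria are applied together.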
as presented in our independent data (figure ) according to the latest scrna-seq studies, ace proved detectable in the respiratory, digestive and urinary systems, indicating that these systems were potential targets for -ncov dissemination. lin et al. have revealed that in the bladder, the expression level of ace was relatively high in epithelial cells, especially umbrella cells, which indicated that umbrella cells or epithelial cells were more likely to be infected by -ncov. the bladder epithelium is a transitional epithelium with high degeneration and rapid cell regeneration, so bladder diseases like oab may have an impact on it. in our study, we sought to characterize the changes in the scrna-seq transcriptome under oab conditions. however, healthy or non-cancerous human bladder samples are usually unobtainable. when we compared each cell type across species, we found that most human and mouse bladder cells were relatively well correlated. by comparing ace expression profiles, we also found similarities: the highest ace concentration existed in epithelial cells of both species. as an alternative, the mouse model was therefore reliable for evaluating the effects of oab on cell clustering and -ncov infection risk. we identified the cells to construct the cell map, and then further clustered the fibroblasts and epithelial cells. we first discovered that oab umbrella cells are small in quantity but functionally significant. bladder epithelial cells are known to be continuously exposed to mechanical forces. yet for all that promise, our method is far from being the whole answer. the results from mouse scrna-seq were all indirect evidence without a direct relation to the clinical features of -ncov. in addition, we only examined a limited number of mouse bladder samples, rather than hace transgenic mice or human samples. li et al. have reported that, compared to human ace , mouse ace is less efficient in sars-cov binding.
due to differences in molecular structure, such a binding efficiency difference may also appear in -ncov. beyond that, the pathogenicity of -ncov was only clarified in hace mice , so normal mouse model may fail to simulate pathogenesis and therapeutics of novel coronavirus pneumonia. further studies should turn to hace mice or humans and increase the sample size. the extent of transmission of novel coronavirus in wuhan, china, novel coronavirus-infected pneumonia duration of rt-pcr positivity in severe acute respiratory syndrome single-cell rna expression profiling of ace , the putative receptor of wuhan -ncov single-cell analysis of ace expression in human kidneys and bladders reveals a potential route of -ncov infection reduced ca + spark activity contributes to detrusor overactivity of rats with partial bladder outlet obstruction single-cell transcriptomic map of the human and mouse bladders single-cell rna expression profiling of ace , the putative receptor of wuhan -ncov the digestive system is a potential route of -ncov infection: a bioinformatics analysis based on single-cell transcriptomes exocytosis/endocytosis in bladder umbrella cells hypercompliant apical membranes of bladder umbrella cells expansion and contraction of the umbrella cell apical junctional ring in response to bladder filling and voiding acute renal impairment in coronavirus-associated severe acute respiratory syndrome persistent shedding of viable sars-cov in urine and stool of sars patients during the convalescent phase the pathogenicity of sars-cov- in hace transgenic mice key: cord- - y kcwu authors: lan, tammy c. t.; allan, matthew f.; malsick, lauren e.; khandwala, stuti; nyeo, sherry s. y.; bathe, mark; griffiths, anthony; rouskin, silvi title: structure of the full sars-cov- rna genome in infected cells date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: y kcwu sars-cov- is a betacoronavirus with a single-stranded, positive-sense, -kilobase rna genome responsible for the ongoing covid- pandemic. currently, there are no antiviral drugs or vaccines with proven efficacy, and development of these treatments is hampered by our limited understanding of the molecular and structural biology of the virus. like many other rna viruses, rna structures in coronaviruses regulate gene expression and are crucial for viral replication. although genome and transcriptome data were recently reported, there is to date little experimental data on predicted rna structures in sars-cov- and most putative regulatory sequences are uncharacterized. here we report the secondary structure of the entire sars-cov- genome in infected cells at single nucleotide resolution using dimethyl sulfate mutational profiling with sequencing (dms-mapseq). our results reveal previously undescribed structures within critical regulatory elements such as the genomic transcription-regulating sequences (trss). contrary to previous studies, our in-cell data show that the structure of the frameshift element, which is a major drug target, is drastically different from prevailing in vitro models. the genomic structure detailed here lays the groundwork for coronavirus rna biology and will guide the design of sars-cov- rna-based therapeutics. severe acute respiratory syndrome coronavirus (sars-cov- ) is the causative agent of the current coronavirus disease , recently declared a global pandemic by the world health organization (who). sars-cov- is an enveloped virus belonging to the genus betacoronavirus, which also includes sars-cov, the virus responsible for the sars outbreak, and middle east respiratory syndrome coronavirus (mers-cov), the virus responsible for the mers outbreak. despite the devastating effects these viruses have had on public health and the economy, currently no effective antiviral treatments or vaccines exist.
there is therefore an urgent need to understand their unique rna biology and develop new therapeutics against this class of viruses. coronaviruses (covs) have single-stranded and positive-sense genomes that are the largest of all known rna viruses ( - kb) (masters, ) . previous studies on coronavirus structures have focused on several conserved regions that are important for viral replication. for several of these regions, such as the ' utr, the ' utr, and the frameshift element (fse), structures have been predicted computationally with supportive experimental data from rnase probing and nuclear magnetic resonance (nmr) spectroscopy (plant et al., ; yang and leibowitz, ) . functional studies have revealed the importance of their secondary structures for viral transcription and replication (brierley, digard and inglis, ; liu et al., ; li et al., ; yang and leibowitz, ) . the fse is located near the boundary of orf a and orf b and causes the ribosome to "slip" and shift register by - nt in order to bypass a canonical stop codon at the end of orf a and translate the viral rna-dependent rna polymerase (rdrp) and other proteins in orf b (brierley et al., ; plant et al., ) . studies on multiple viruses have shown that an optimal frameshifting rate is critical, and small differences in percentage of frameshifting lead to dramatic differences in genomic rna production and infection dose (plant et al., ) . therefore, the fse has emerged as a major drug target for binding of small molecules that can influence the rate of ribosome slippage and be used as a treatment against sars-cov- . the structure of the fse of sars-cov (whose sequence differs from the sars-cov- fse by just one nucleotide), was solved by nmr to be a -stem pseudoknot (plant et al., ; rangan, zheludev and das, ) . the prevailing mechanism is that the -stem pseudoknot causes the ribosome to pause at the slippery sequence and backtrack by nt to release mechanical tension (plant and dinman, ) . 
however, the frameshifting rate of sars-cov- during infection is unknown, and none of the rna structure models for the fse has been validated in cells. over the past ten years, the drawbacks of in silico models have been largely overcome by the development of genome-wide strategies to chemically probe rna structure in cells. the most popular chemical probes are dimethyl sulfate (dms) (rouskin et al., ) , shape (siegfried et al., ) , and icshape (spitale et al., ) , all of which provide critical in vivo measurements that are used by computational algorithms to generate accurate models of rna structures in vivo. in this study, we perform dms mutational profiling with sequencing (dms-mapseq) (zubradt et al., ) on infected vero cells to generate the first experimentally determined genome-wide secondary structure of sars-cov- . our results reveal major differences with in silico predictions and highlight the physiological structures of known functional elements. our work provides experimental data on the structural biology of rna viruses and will inform efforts in the development of rna-based diagnostics and therapeutics for sars-cov- . to determine the intracellular genome-wide structure of sars-cov- , we added dimethyl sulfate (dms) to infected vero cells and performed mutational profiling with sequencing (dms-mapseq) (zubradt et al., ) (figure a) . dms rapidly and specifically modifies unpaired adenines and cytosines in vivo at their watson-crick faces. our results were highly reproducible between independent biological replicates ( r = . ; figure b ). combined, a total of ~ . million paired reads mapped to the coronavirus genome ( figure c ), representing ~ % of total cellular rna (post ribosomal rna depletion). this large fraction of coronavirus reads from total intracellular rna is consistent with previous literature using sars-cov- infected vero cells (kim et al., ) . 
dms treated samples had high signal to noise ratio, with adenines and cytosines having a mutation rate ~ -fold higher than the background (guanines and uracils). in contrast, in untreated samples the mutation rate on all four bases ( . %) was slightly lower than previously reported average sequencing error rates of . % (pfeiffer et al., ) ( figure d ). we used the dms-mapseq data as constraints in rnastructure (mathews, ) to fold the entire sars-cov- genomic rna (supplementary figure ) . we first examined the results at the nt ' utr ( figure e ), one of the best studied regions in the coronavirus genome (yang and leibowitz, ; madhugiri et al., ) . the ' utrs of multiple coronaviruses have been characterized extensively in terms of their structures and roles in viral replication. in agreement with previously published structures of this highly conserved region (yang and leibowitz, ; madhugiri et al., ) , we found five stem loops (sl - ) within the sars-cov- ' utr and three stem loops (sl - ) near the beginning of orf a. stem loop (sl ) is vital for viral replication. a previous study on murine hepatitis virus (mhv) found that mutations that destabilize the upper portion (near the loop) of sl or stabilize the lower portion (near the base) prevent viral replication (li et al., ) . in addition, viable viruses with a single-base deletion in the lower portion also had mutations in the ' utr. together, these observations suggested that the lower portion of sl must be able to unfold and likely interacts with the ' utr to assist in sgrna transcription (li et al., ) . in our structure, the lower portion of sl has low dms reactivity, consistent with a pairing between these bases either within the stem or with the ' utr. further work is needed to test a potential alternative structure for sl that would involve long range interactions between the ' and the ' utrs of sars-cov- . 
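The signal-to-noise comparison described above (mutation rate on adenines and cytosines versus guanines and uracils) can be sketched as follows. This is an illustrative calculation, not the authors' analysis code; the function names and the input layout (per-position mismatch and coverage counts) are assumptions made for the example.

```python
def mutation_rates_by_base(sequence, mismatches, coverage):
    """Pool mismatch and coverage counts by base identity and return the
    mutation rate (mismatches / coverage) for each base that has coverage.

    sequence   -- reference RNA sequence, e.g. "ACGU..."
    mismatches -- per-position count of reads carrying a mutation
    coverage   -- per-position count of reads covering the position
    """
    totals = {}
    for base, mm, cov in zip(sequence, mismatches, coverage):
        if cov == 0:
            continue  # positions without coverage carry no information
        prev_mm, prev_cov = totals.get(base, (0, 0))
        totals[base] = (prev_mm + mm, prev_cov + cov)
    return {base: mm / cov for base, (mm, cov) in totals.items()}


def signal_to_noise(rates):
    """Ratio of the mean A/C mutation rate (DMS-reactive bases) to the
    mean G/U mutation rate (background)."""
    signal = (rates["A"] + rates["C"]) / 2
    noise = (rates["G"] + rates["U"]) / 2
    return signal / noise
```

Running the same calculation on an untreated sample gives a baseline: there, all four bases should show similar rates close to the sequencing error rate, so the ratio should be near one.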
the most evolutionarily conserved secondary structure in the ' utr is stem loop (sl ) and the structure of the mhv sl has been solved by nmr (liu et al., ) . the sequences of sars-cov- and mhv sl are identical, and our in-cell model of sl is consistent with the nmr structure, with every base in the stem showing low dms reactivity. although c within the loop of sl has low reactivity, this is explained by the nmr structure, in which c and g (as numbered here) are paired. interestingly, the secondary structure but not the sequence of the stem was shown to be essential for replication: in a previous study, disruption of sl by three mutations on either side of the stem prevented mhv-a replication, but compensatory mutations restored nearly wildtype growth (liu et al., ) . mutants of mhv-a with non-functional sl were able to transcribe minus-strand genomic rna in cells but not minus-strand subgenomic rnas, suggesting that sl plays a role in discontinuous transcription. disruption of sl also reduced the rate of in vitro translation of a downstream luciferase reporter (liu et al., ) . to the best of our knowledge, the precise mechanism through which sl is required for viral replication remains unknown. stem loop (sl ) contains the leader trs (trs-l) involved in discontinuous transcription (yang and leibowitz, ) , the structure and function of which will be discussed in figure . stem loop (sl ) is a bulged stem loop downstream of trs-l. a previous study in mhv found that sl was required for sgrna synthesis and that deleting the entire sl was lethal (yang et al., ) . however, mutation or deletion of either the lower (sl a) or upper (sl b) part of sl merely impairs replication, with disruption of sl a causing greater impairment (yang et al., ) . in addition, replacing the entire sl with a shorter, unrelated sequence that was predicted to fold into a stable stem loop impaired mhv growth but was not lethal (yang et al., ) . based on these results, yang et al. 
proposed that discontinuous transcription requires proper spacing between the stem loops upstream and downstream of the trs, and sl maintains this space (yang et al., ) . sl also contains the start codon of a short open reading frame (nine codons in sars-cov- ) known as the "upstream" orf (uorf) (madhugiri et al., ) . in a study on mhv, disruption of the uorf start codon increased translation of orf a and modestly decreased viral replication (madhugiri et al., ) . stem loop (sl ) contains four branches (sl , sl a, sl b, sl c), and its structure in mhv-a was predicted using chemical probing with shape (yang and leibowitz, ) . disruption of sl c has been shown to prevent replication of viral defective interfering particles (yang and leibowitz, ) but not of whole mhv . sl contains the start codon of orf a, although in different coronaviruses the start codon can be located in sl a, sl b, or the stem ' of sl c (yang and leibowitz, ) . our data support a three-branched sl for sars-cov- , similar to previous models of sars-cov (yang and leibowitz, ) and sars-cov- (rangan, zheludev and das, ) in which the start codon is located in the main sl stem, just downstream of sl c. stem loops , , and (sl - ) lie downstream of the ' utr, within the coding sequence of nsp . the structures of these stem loops vary among betacoronaviruses, with sars-cov predicted to have all three stem loops, mhv only sl and sl , and bovine coronavirus (bcov) a structure in which sl and sl are branches of another stem . as predicted for sars-cov, our model also has all three stem loops, though this contrasts with an in silico model of sars-cov- that features three short stem loops in place of sl (rangan, zheludev and das, ) . 
mutations of sl that disrupt its secondary structure but preserve the amino acid sequence of nsp were not lethal in mhv, suggesting that sl is not essential.

to broadly locate regions of potential interest in the sars-cov- genome, we identified all structured and accessible regions, which we define as stretches of at least consecutive paired bases or consecutive unpaired bases, respectively (figure a). structured regions could be functional elements of the genome that are potential targets of therapeutics. we identify structured regions covering a total of , bases ( . % of the genome). these regions include structures with known functions, such as sl in the ' utr and a bulged stem loop in the ' utr (yang and leibowitz, ), as well as many structures whose functions are unknown. the longest stretch of consecutive paired bases is nt (positions , - , ) and falls within a previously uncharacterized set of two bulged stem loops in orf a. these results provide a starting point for characterizing additional functional structures in the viral life cycle. accessible regions in viral genomes are potential targets of antisense oligonucleotide therapeutics (eckardt, romby and sczakiel, ; ding, ; rangan, zheludev and das, ) through the mechanism of rna interference (wilson and doudna, ). we identify accessible regions covering , bases ( % of the entire genome). as there are such regions within orf-n, which is present in every sgrna, every sgrna may be targetable with antisense oligos. additionally, every open reading frame except orf-e contains at least one of these predicted accessible regions, raising the possibility of targeting individual sgrnas. at a more stringent threshold of consecutive nucleotides predicted to be unpaired, we find accessible regions, one of which is located within orf-n ( nt, positions , - , ) and hence in every genomic and subgenomic rna.
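as an illustration of the definition above, the following minimal python sketch scans a dot-bracket string for runs of consecutive paired or unpaired bases. it is not the code used in this study: the function name `find_runs` and its parameters are hypothetical, any character other than `.` is assumed to denote a paired base, and the length threshold is left to the caller.

```python
from typing import List, Tuple


def find_runs(structure: str, min_len: int, paired: bool) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of every run of at
    least `min_len` consecutive paired (or unpaired) bases.

    Assumption: `structure` is in dot-bracket notation, where '.' marks an
    unpaired base and any other character marks a paired base.
    """
    # True wherever the base has the state we are looking for.
    flags = [(ch != ".") == paired for ch in structure]
    runs: List[Tuple[int, int]] = []
    start = None  # 0-based index where the current run began, if any
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    # Close a run that extends to the end of the structure.
    if start is not None and len(flags) - start >= min_len:
        runs.append((start + 1, len(flags)))
    return runs
```

calling `find_runs(structure, min_len, paired=True)` would list candidate structured regions and `paired=False` candidate accessible regions; summing the run lengths gives the number of bases covered.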
the two longest stretches of consecutive predicted unpaired bases are both nt long and occupy positions , - , (within orf a) and , - , (within orf-s). these regions may offer multiple binding sites for antisense oligonucleotides. as the transcription-regulating sequences (trss) are necessary for the synthesis of sgrnas, we analyzed our structural models of the leader trs (trs-l) and the nine body trss (trs-b). the leader trs (trs-l) is the central component of the ' utr involved in discontinuous transcription (sola et al., ) . in silico models for several alpha and betacoronaviruses variously place trs-l in stem loop (sl ) or in an unpaired stretch of nucleotides (liu et al., ; yang and leibowitz, ) . the trs-l of sars-cov and of sars-cov- was predicted to lie in the ' side of the stem of sl , which is consistent with our in-cell model (liu et al., ; yang and leibowitz, ; rangan, zheludev and das, ) . in our data, the stem of sl contains two bases with medium reactivity ( figure b ), which suggests that sl transitions between folded and unfolded states, as is hypothesized for the alphacoronavirus transmissible gastroenteritis virus (tgev) (madhugiri et al., ) . of the nine body trss, we find that seven (all but the trss of orf a and orf b) lie within a stem loop. of these, all but one trs (n) place the core sequence on the ' side of the stem. four body trss (m, orf , orf , and n) are predicted to lie in stem loops with two or three bulges, with the core sequence spanning one of the internal bulges. the other three structured body trss (s, orf a, and e) lie in stem loops without bulges, with the final paired base in the ' side of the stem contained in the core sequence. strikingly, the entire core sequence is paired in two body trss (s and m), and partially exposed in a loop or bulge in the other five ( figure b ). 
we evaluated the robustness of our in-cell data derived genome-wide model by varying two critical rna folding parameters used by rnastructure: ) the maximum allowed distance for base pairing and ) the threshold for dms signal normalization. a previous in silico approach for folding rna found that limiting base pairs to be to nt apart was optimal to avoid overpredicting structured regions (lange et al., ) . however, some rna viruses contain known essential structures wherein bases over nt apart are paired (e.g. the rev response element in hiv- spans approximately nt (watts et al., ) ). we therefore varied the maximum distance (md) allowed for base pairing from nt to nt. we computed the agreement between the resulting structures using a modified version of the fowlkes-mallows index (fowlkes and mallows, ) that compares base pairing partners as well as unpaired bases (methods). overall, there was high agreement while varying the md from nt to nt, suggesting that long-distance (i.e. > nt) interactions across the sars-cov- rna have a small effect on the identity of local structures. the genome structure folded with an md of nt was . % identical to the structure with an md of nt, and in the latter structure only . % of base pairs spanned > nt. next, we proceeded with the md limit of nt and tested two different dms signal thresholds that normalize reactivity to either the median of the top % or top % of the most reactive bases. we found that the structure models produced with the two normalization approaches were highly similar, with . % identity ( figure a ). thus, within the ranges that we tested, our genome-wide data-derived model was robust to variation in the parameters of rnastructure (mathews, ) . we proceeded with the whole genome structure modelled with a md of nt and a dms signal normalization of % for further analysis (supplementary figure) . 
previous studies that computationally predicted genome-wide sars-cov- rna structures used ) rnaz, a thermodynamic-based model that additionally takes a sequence alignment into account and considers base pairing conservation (gruber et al., ; rangan, zheludev and das, ), and ) contrafold, which predicts rna secondary structures without physics-based models and instead uses learned parameters based on known structures (do, woods and batzoglou, ). these recent studies predicted structures with rnaz with lengths ranging from to nt, and structures with contrafold with lengths ranging from to nt (rangan, zheludev and das, ). for each of these structures, we computed the agreement between the different models (figure b). we report the agreement using the mfmi while either excluding external base pairs or including these pairs (methods). as expected, agreement with the structures from purely computational prediction is higher when excluding external base pairs (average . % for rnaz, . % for contrafold) than when including them (average . % for rnaz, . % for contrafold). since our goal is to compare the overall similarity of two structures, we chose the inclusion of external base pairs as the more accurate metric for comparing the structures. our predictions overall agreed more with those from rnaz (mean . %, median . %) than contrafold (mean . %, median . %). we report the agreement between our structure and the rnaz structures across the entire genome (figure c). most structures are to % identical, with several short regions that disagree substantially.

consistency of our in-cell structure models. agreement is given between our structure models predicted using a maximum distance limit of nt and nt between paired bases at % signal normalization and between our predictions using % and % dms normalization at nt maximum allowed base pair distance. (b) agreement of our structure model with all predicted structures from rnaz and contrafold.
agreement is given for both excluding and including external base pairs. (c) agreement of our structure with a previous model from rnaz across the genome. at positions for which multiple rnaz models exist, the average agreement with all models is given. (d) agreement of our model with rnaz predicted structures with the three highest p-values in regions with previously unannotated structures. (e) agreement of our model with contrafold predicted structures with the five highest maximum expected accuracies in evolutionarily conserved regions. (f) agreement of our trs structure models to rnaz predicted structures. for trss for which multiple rnaz models exist, agreement with each prediction is shown.

in addition, we computed the similarity of our model compared to the structures with the three highest p-values predicted with rnaz that do not overlap known structures in the rfam database (kalvari et al., ; rangan, zheludev and das, ) (figure d). we noted that in all three cases, the structure at the center of the window was nearly identical to ours, and most of the disagreements arose at the edges, presumably due to the effects of the windows from rnaz being limited to nt. of the five structures predicted with contrafold that had the largest maximum expected accuracies, our agreement ranged from . % to . %, well above the genome-wide mean ( . %), suggesting that these structures are indeed more accurate than the average contrafold structure (figure e). finally, we compared the structures at the trs elements to those predicted by rnaz (rangan, zheludev and das, ) (figure f). to remove the effects of external base pairs, we focused only on the complete structural element (e.g. a stem loop) in which the trs was located. rnaz predicted structures for four trss. our model for trs-l was identical to the first predicted window from rnaz but differed significantly ( . 
% agreement) from the next prediction of the same trs-l element within a different folding window, indicating that the choice of folding window can have a large effect on the rnaz structure model. for the other three trs elements for which rnaz predicted at least one structure, our agreement ranges from . % to . %, above the genome-wide average of . %, lending support to both models.

the frameshift element (fse) causes the ribosome to slip and shift register by - nt in order to bypass a canonical stop codon and translate the viral rna-dependent rna polymerase (rdrp) (plant and dinman, ). previous studies on coronaviruses and other viruses have shown that an optimal frameshifting rate is critical and that small differences in the percentage of frameshifting lead to dramatic differences in genomic rna production and infection dose (plant et al., ). therefore, the fse has emerged as a major drug target for small molecule binding that could influence the rate of frameshifting and be used as a treatment against sars-cov- . to date, there is little experimental data on the structure of the sars-cov- fse, and the prevailing model is a -stem pseudoknot forming downstream of the slippery site, which is thought to pause the ribosome and allow frameshifting to occur (plant and dinman, ). to closely examine the fse structure in cells, we used the target-specific dms-mapseq protocol (zubradt et al., ). we designed primers targeting nt surrounding the fse and amplified this region from cells infected with sars-cov- that were treated with dms. our analysis revealed a strikingly different structure than the prevailing model (plant et al., ; rangan, zheludev and das, ) (figure a). our in-cell model does not include the expected pseudoknot formation downstream of the slippery sequence. instead, half of the canonical stem (figure a, purple) finds an alternative pairing partner driven by complementary bases upstream of the slippery site (figure a, pink).
we call this pairing alternative stem (as ). the prevailing model of the sars-cov- fse is based on previous studies of the sars-cov fse, as they only differ in sequence by a single nucleotide located in a putative loop (rangan, zheludev and das, ) . nuclease mapping and nuclear magnetic resonance (nmr) analysis of the sars-cov fse solved the structure of an in vitro refolded, truncated nt region starting at the slippery site (plant et al., ) . this structure did not include the sequence upstream of the slippery site and formed a -stem pseudoknot. interestingly, in silico predictions of the rna structure of the sars-cov- genome using rnaz (rangan, zheludev and das, ) and scanfold (andrews et al., ) do not find the -stem pseudoknot but instead support our in-cell model of alternative stem . in sars-cov- , scanfold not only predicted the as but also found that it was more stable relative to random sequences than any other structure in the entire frameshift element (andrews et al., ) . indeed, three conceptually varied methods (dms-mapseq, rnaz, and scanfold) aimed at identifying functional structures, run independently by different research groups all converge on the alternative stem as a central structure at the fse. in order to directly compare our in-cell findings with the reports of the -stem pseudoknot, we in vitro-transcribed, refolded, and dms-probed the same nt sequence as analyzed by nmr (plant et al., ) . our in vitro-data driven model agrees well with the nmr model ( . % identical) and finds all three canonical stems, including the pseudoknot. the major differences we observed in the structure of the fse in cells vs. in vitro could either be due to ) length of the in vitro refolded viral rna or ) factors in the cellular environment that are absent in vitro. to distinguish between these two possibilities, we re-folded the fse in the context of longer native sequences. 
we found that as we increased the length of the in vitro re-folded construct by including more of its native sequence, from nt to nt to kb, the dms reactivity patterns became progressively more similar to the pattern we observed in cells (figure b). indeed, in the context of the full ~ kb genomic rna, the structure of the fse is nearly identical to the structure in physiological conditions during sars-cov- infection in cells (r = . ). these results indicate that the length of the entire rna molecule is important for correctly folding the fse. strikingly, at a length of nt and above, the main structure forming is alternative stem rather than the -stem pseudoknot. our data indicate that given the full range of pairing possibilities in the genome, as is more favorable and the predominant structure in cells. to determine if other coronaviruses may have a similar alternative structure of the frameshift element, we searched for the sequence that pairs with canonical stem in a set of curated coronaviruses (ceraolo and giorgi, ). this set contains isolates of sars-cov- , other sarbecoviruses (including the sars-cov reference genome), and merbecoviruses. the nt complement (ccgcgaaccc) to a sequence overlapping canonical stem of the fse (ggguuugcgg) was perfectly conserved in all of the sarbecoviruses, six of which were isolated from bats (figure c). however, the nt complement was not present in either merbecovirus. aligning the sequences of all betacoronaviruses with complete genomes in refseq revealed that the nt complement was conserved in all of and only the three sarbecoviruses in refseq: sars-cov, sars-cov- , and btcov bm - (data not shown). these results suggest that as is unique to the sarbecoviruses.

in the past, rna structures based on experimental data have been limited to a population average structure, which assumes that every individual molecule is folded in the same conformation.
however, previous studies have identified the existence of biologically relevant alternative structures in viral genomic rna that play vital roles in the viruses' life cycles. examples include the hiv- rev responsive element (rre) and ′ untranslated region (utr), which regulate viral rna export and packaging. as mentioned above, the rate of frameshifting in coronaviruses is critical for the virus and set at a specific percentage. while the mechanism by which this frameshifting rate is maintained remains unidentified, studies have proposed that alternative rna structures may play a role in promoting frameshifting. to determine whether the sars-cov- fse forms alternative structures, we applied the "detection of rna folding ensembles using expectation-maximization" (dreem) algorithm on our in-cell dms-mapseq data (tomezsko et al., ). in brief, the dreem algorithm groups dms reactivities into distinct clusters by considering the likelihood of the co-occurrence of reactive bases (i.e. if two bases are highly mutated in the population average but never mutated together on a single read, we can assume that at least two conformations are present). dreem directly clusters experimental data without relying on generating a thermodynamic based model, thus enabling the discovery of new rna structures based purely on chemical probing data from cells. we analyzed modified intracellular rna of a nt region surrounding the fse specifically amplified from two biological replicates. we found two distinct patterns of dms reactivities (figure a), suggesting that the rna folds into at least two distinct conformations in this region. in both biological replicates, clusters and (corresponding to structure and ) separate at a reproducible ratio (~ % vs. %) where structure is significantly different from structure (r = . ) but highly similar to the corresponding cluster in biological replicates (r = . ) (figure b).
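the co-occurrence reasoning behind the clustering can be illustrated with a small python sketch. it is not the dreem algorithm, which fits a mixture model by expectation-maximization, but only the intuition stated above: it compares how often two positions are mutated on the same read with how often that would be expected if their mutations were independent. the function name `co_mutation` and the encoding of reads as lists of 0 (match), 1 (mutation) and `None` (no information) are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def co_mutation(reads: Sequence[Sequence[Optional[int]]], i: int, j: int) -> Tuple[float, float]:
    """Return (observed, expected) frequency of reads mutated at both i and j.

    `expected` is the product of the two single-position mutation rates,
    i.e. the joint frequency if mutations at i and j were independent.
    Only reads that are informative at both positions are counted.
    """
    informative: List[Sequence[Optional[int]]] = [
        r for r in reads if r[i] is not None and r[j] is not None
    ]
    n = len(informative)
    if n == 0:
        raise ValueError("no read covers both positions")
    rate_i = sum(r[i] for r in informative) / n
    rate_j = sum(r[j] for r in informative) / n
    observed = sum(1 for r in informative if r[i] == 1 and r[j] == 1) / n
    return observed, rate_i * rate_j
```

an observed joint frequency far below the expected one, at two positions that are each frequently mutated, is the kind of signal that suggests the reads come from more than one conformation.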
both structures have the alternative stem pairing spanning the slippery sequence. however, structure forms a large nt stem immediately downstream of alternative stem whereas structure does not (figure c). further studies measuring frameshifting efficiency are needed to determine whether this nt stem specifically promotes frameshifting by stalling the ribosome right at the slippery sequence.

when genomic rna folds into alternative structure (top), the slippery site resides within a loop in the middle of a long stem-loop. as the ribosome starts to unwind the rna, it may pause at the base of the stem, but this pause is far from the slippery site. by the time the ribosome reaches the slippery site, the structure in front of it has been unwound. as the ribosome continues it will reach the upstream stop codon and terminate translation. in contrast, alternative structure (bottom) forms a nt stem loop right in front of the slippery site. this stem loop can cause the ribosome to pause, frameshift - nt and bypass the upstream stop codon to continue translation.

here we present, to our knowledge, the first secondary structure of the entire rna genome of sars-cov- in infected cells, based on chemical probing with dms-mapseq. importantly, we find that many genomic regions fold differently than in silico-based predictions. we attribute these differences in large part to the sequence context used for folding (how much native sequence is taken upstream and downstream for a given region of interest). our data underscore the contribution of the full-length rna molecule towards structure formation at local regions. of particular note, our in-cell data reveal an alternative conformation for the frameshift element (fse) in which the ' side of stem of the canonical pseudoknot is paired to a complementary nt sequence that lies shortly upstream. we find that this alternative stem (as ) predominates to the extent that we are not able to detect the canonical pseudoknot.
in previous work (tomezsko et al., ), we estimated our limit of detection for minor conformations at %, suggesting two potential models for frameshifting in sars-cov- . in the first model, the pseudoknot structure causes frameshifting and forms at a level below our limit of detection. in that case, the alternative stem (as ) provides a means to downregulate the rate of frameshifting by sequestering the ' side of the canonical stem . in the second model, frameshifting is stimulated by a large nt stem loop immediately downstream of as that blocks the ribosome and causes it to slip. an alternative structure without the nt stem loop does not cause ribosome pausing at the slippery site. these alternative genomic conformations would enable sars-cov- to regulate translation of orf ab (figure ). if the second model is true, then in sarbecoviruses the ability to form a pseudoknot may be vestigial, as the fse sequence is highly conserved among coronaviruses (plant et al., ).

previous functional studies on sars-cov report that the pseudoknot increases frameshifting but do not contradict our model of alternative stem (baranov et al., ; plant et al., ; su et al., ). functional assays use short reporter constructs that are likely to fold into a structure with a pseudoknot, similar to what we observed for the nt construct refolded in vitro. indeed, a large body of literature shows that different types of pseudoknots or stable stems placed shortly after the slippery site increase frameshifting rates in reporter assays (brierley, digard and inglis, ; baranov et al., ). however, much work is needed to understand which structures play a major role during active infection. to our knowledge, no studies have examined the effects of mutations and measured the translation efficiency of orf ab in cells infected with a sarbecovirus. our data reveal additional structures in the genome that may be involved in regulating gene expression.
for example, previous work in mhv and sars-cov suggests that the n protein binds to and unwinds trs structures to regulate subgenomic gene expression (grossoehme et al., ). importantly, there is evidence that the stability of the trs structure can affect its affinity for the n protein. all together, these results indicate that small molecules or antisense oligos that alter the stability of the trs structures could disrupt the expression of subgenomic rnas and serve as therapeutic strategies. so far, we have not identified any structures of minus-strand rnas. of our . million paired reads mapping to sars-cov- , only , ( . %) came from minus-strand rna, insufficient coverage to obtain reliable dms signals. however, as minus-strand rnas must be transcribed to positive-strand genomic and subgenomic rnas, they may contain structures that are important for this process and potentially druggable. higher-throughput genome-wide sequencing or pcr amplification of a targeted region of the minus strand would enable us to discover structures in the minus strands. in this study we obtained structures of rna in infected vero cells. we were limited to studying population average rna structures, with no temporal resolution or measure of heterogeneity. our models of stable structures across the genome represent prominent conformers forming in cells. however, it is possible that some of the open regions we identify, which have evenly distributed dms signal, are in fact composites of alternative structures (tomezsko et al., ). deeper sequencing data is needed to determine the degree of structure heterogeneity across the coronavirus genome. further studies will determine how rna structures change from the time a virion enters a cell to the time when new virions are released. we show that in vitro rna-refolding of the full-length kb genome can recapitulate the structures formed at the fse in cells.
these results provide evidence that some of the structures we find in cells are largely driven by intrinsic rna thermodynamics. we expect such structures will remain the same between different cell types, although more work is needed to establish the extent of structure changes as a function of the abundance of various cellular and viral protein factors. our in-cell data-derived model of sars-cov- presents major rna structures across the entire genome and provides the foundation for further studies. in particular, we propose a new model for the fse in which the predominant structure forms an alternative stem . future work will involve determining by what mechanism and to what extent the alternative structures of the sars-cov- fse regulate translation of orf ab, as well as whether the fse can fold into a pseudoknot in cells. better understanding of the structures and mechanisms of elements of the sars-cov- genome will enable the design of targeted therapeutics.

sample buffer (thermofisher scientific) and run on a % tbe-urea gel (thermofisher scientific) at v for h min for size selection of rna that is ~ nt. to dephosphorylate and repair the ends of randomly fragmented rna, μl x cutsmart buffer (new england biolabs), μl shrimp alkaline phosphatase (new england biolabs), μl rnaseout (thermofisher scientific) and water were added to a final volume of μl and incubated at °c for h. next, μl % peg- (new england biolabs), μl × t rna ligase buffer (new england biolabs), μl t rna ligase, truncated kq (new england biolabs) and μl linker were added to the reaction and incubated for h at °c. the rna was purified with rna clean and concentrator - , following the manufacturer's instructions for recovery of all fragments and eluted in μl water. excess linker was degraded by adding μl × recj buffer (lucigen), μl recj exonuclease (lucigen), μl ′ deadenylase (new england biolabs) and μl rnaseout, then incubating for h at °c.
the rna was purified with rna clean and concentrator - , following the manufacturer's instructions and eluted in μl water. for reverse transcription, . μg of rrna subtracted total rna or μg of in vitro-transcribed rna was added to μl × first strand buffer (thermofisher scientific), μl μm reverse primer, μl dntp, μl . m dtt, μl rnaseout and μl tgirt-iii (ingex). the reverse-transcription reaction was incubated at °c for . h. μl m naoh was then added and incubated at °c for min to degrade the rna. the reverse-transcription product was mixed with an equal volume × novex tbe-urea sample buffer (thermofisher scientific) and run on a % tbe-urea gel (thermofisher scientific) at v for h min for size selection of cdna that is ~ nt. the size-selected and purified cdna was circularized using circligase ssdna ligase kit (lucigen) following the manufacturer's protocol. μl of the circularized product was then used for pcr amplification using phusion high-fidelity dna polymerase (neb) for a maximum of cycles. the pcr product was run on an % tbe gel at v for h and size-selected for products ~ nt. the product was then sequenced with iseq (illumina) to produce either × -nt paired-end reads. fastq files were trimmed using trimgalore (github.com/felixkrueger/trimgalore) to remove illumina adapters. trimmed paired reads were mapped to the genome of sars-cov- isolate sars-cov- /human/usa/usa-wa / (genbank: mn . ) (harcourt et al., ) using bowtie (langmead and salzberg, ) with the following parameters: --local --no-unal --no-discordant --no-mixed -l -x . reads aligning equally well to more than one location were discarded. sam files from bowtie were converted into bam files using picard tools samformatconverter (broadinstitute.github.io/picard). for each pair of aligned reads, a bit vector the length of the reference sequence was generated using dreem (tomezsko et al., ).
bit vectors contained a at every position in the reference sequence where the reference sequence matched the read, a at every base at which there was a mismatch or deletion in the read, and no information for every base that was either not in the read or had a phred score < . we refer to positions in a bit vector with a or as "informative bits" and all other positions as "uninformative bits." for each position in the reference sequence, the number of bit vectors covering the position and the number of reads with mismatches and deletions at the position were counted using dreem. the ratio of mismatches plus deletions to total coverage at each position was calculated to obtain the population average mutation rate for each position. in cases indicated below, bit vectors were discarded if they had two mutations closer than bases apart, had a mutation next to an uninformative bit, or had more than an allowed total number of mutations (greater than % of the length of the bit vector and greater than three standard deviations above the median number of mutations among all bit vectors). the average mutation rate for each position was computed from the filtered bit vectors in the same way as described above. the mutation rates for all of the bases in the rna molecule were sorted in numerical order. the greatest % or % of mutation rates (specified where relevant in the main text) were chosen for normalization. the median among these signals was calculated. all mutation rates were divided by this median to compute the normalized mutation rates. normalized rates greater than . were winsorized by setting them to . (dixon, ) . genome-wide coverage ( figure c ) was computed by counting the number of unfiltered bit vectors from the in-cell library that contained an informative bit ( or ) at each position. signal and noise plots ( figure d ) were generated from the unfiltered population average mutation rate. a total of ( . 
%) positions across the genome were discarded for having a noise mutation rate greater than % in the untreated sample (likely due to endogenous modifications or "hotspot" reverse transcription errors). the signal and noise were computed every nt, starting at nucleotide . for each of these nucleotides, the average mutation rate was computed over the nt window starting bases upstream and ending bases downstream. the "signal" was defined as the average mutation rate of a and c, while the "noise" was defined as the average mutation rate of g and u. the correlation of mutation rates between biological replicates genome-wide (figure b) was computed using the unfiltered bit vectors. the correlation of mutation rates between different conditions of the fse (figure b) was computed using the filtered bit vectors. the correlation of mutation rates between clusters and biological replicates for the fse (figure b) was computed using the filtered bit vectors after clustering into two clusters. for all correlation plots, the pearson correlation coefficient is given. a total of ( . %) outliers with > % mutation rate were removed to prevent inflating the pearson correlation coefficients. the unfiltered population average mutation rate was obtained from the in-cell library reads. the , nt genome of sars-cov- was divided into ten segments, each roughly kb, the boundaries of which were predicted to be open and accessible by rnaz (rangan, zheludev and das, ). for each segment, the population average mutation rate was normalized. the segment was then folded using the fold algorithm from rnastructure (mathews, ) with parameters -m to generate the top three structures, -md to specify a maximum base pair distance, and -dms to use the normalized mutation rates as constraints in folding. all mutation rates on g and u bases were set to - (unavailable constraints). connectivity table files output from fold were converted to dot bracket format using ct2dot from rnastructure (mathews, ).
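the population average mutation rate and its normalization, as described above, can be sketched in python as follows. this is an illustrative reimplementation, not the dreem code: the function names are hypothetical, bit vectors are assumed to be encoded as 0 (match), 1 (mismatch or deletion) and `None` (uninformative), and the fraction of top signals and the winsorization cap are passed in as parameters because their values are not restated here.

```python
from statistics import median
from typing import List, Optional, Sequence


def mutation_rates(bit_vectors: Sequence[Sequence[Optional[int]]]) -> List[Optional[float]]:
    """Per-position ratio of mutations (mismatches plus deletions) to coverage.

    Positions covered by no informative bit get None.
    """
    length = len(bit_vectors[0])
    rates: List[Optional[float]] = []
    for pos in range(length):
        bits = [bv[pos] for bv in bit_vectors if bv[pos] is not None]
        rates.append(sum(bits) / len(bits) if bits else None)
    return rates


def normalize(rates: Sequence[Optional[float]], top_fraction: float, cap: float) -> List[Optional[float]]:
    """Divide every rate by the median of the highest `top_fraction` of rates,
    then winsorize by setting values above `cap` to `cap`.
    """
    known = sorted((r for r in rates if r is not None), reverse=True)
    if not known:
        raise ValueError("no informative positions")
    n_top = max(1, int(len(known) * top_fraction))
    scale = median(known[:n_top])
    if scale == 0:
        raise ValueError("median of the top signals is zero")
    return [None if r is None else min(r / scale, cap) for r in rates]
```

keeping `top_fraction` and `cap` as explicit arguments makes it straightforward to repeat the folding with a different normalization threshold, as was done when testing the robustness of the model.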
the ten dot bracket structures were concatenated into a single genome-wide structure. given two rna structures of the same length ($l$) in dot-bracket notation, all base pairs in each structure were identified. each base pair was represented as a tuple of (position of ' base, position of ' base). the number of base pairs common to both structures ($p_{12}$) as well as the number of base pairs unique to the first structure ($p_1$) and to the second structure ($p_2$) were computed. given these quantities, the fowlkes-mallows index (a measure of similarity between two binary classifiers) is defined as $\mathrm{fmi} = p_{12} / \sqrt{(p_{12} + p_1)(p_{12} + p_2)}$ (fowlkes and mallows, ). in the case that $(p_{12} + p_1)(p_{12} + p_2) = 0$, we let fmi = . as the fowlkes-mallows index does not consider positions at which the structures agree on bases that are unpaired, the index needed to be modified; otherwise regions with few base pairs would tend to score too low. thus, the number of positions at which both structures contained an unpaired base ($u$) was computed. two variations of the modified fowlkes-mallows index (mfmi) were tested that differed in their treatment of externally paired bases, defined as bases paired to another base outside of the region of the structure being compared. the version of mfmi excluding external base pairs counted all externally paired bases as unpaired when computing $u$. the number of positions containing a paired base ($p$) was computed as $p = l - u$. in this case, mfmi was defined as $\mathrm{mfmi} = u/l + (p/l) \times \mathrm{fmi}$, which weights the fowlkes-mallows index by the fraction of paired bases and adds the fraction of unpaired bases ($u/l$), as the structures agree at all unpaired positions. to include external base pairs, any position containing an externally paired base was not counted in $u$. the number of positions at which both structures contained an externally paired base with the same orientation (i.e. both facing in the ' or ' direction) was computed as the number $e$. 
the number of positions at which at least one structure contained a base that was paired, but not externally, was computed as $p$. then, the mfmi was defined as $\mathrm{mfmi} = u/l + e/l + (p/l) \times \mathrm{fmi}$, which weights the fowlkes-mallows index by the fraction of positions containing a paired base and considers positions in which both bases are unpaired as in agreement, but only counts externally paired bases as agreeing if both structures contain an externally paired base at the same position and the base pairs have the same orientation. excel files from the supplemental material of (rangan, zheludev and das, ) were parsed to obtain the coordinates and predicted structures. for each predicted structure, agreement with the region of our structure with the same coordinates was computed using the mfmi, either including or excluding external base pairs (as specified in the text). box plots of the agreement for each window ( figure b ) show the minimum, first quartile, median, third quartile, and maximum; data lying more than . times the interquartile range from the nearest quartile are considered outliers and are plotted as individual points. the numbers of points in each box plot are given in the results section for figure b . reads from rt-pcr of a nt segment of in-cell rna spanning the fse (nucleotides , - , ) were used to generate bit vectors. the bit vectors were filtered as described above, and the filtered average mutation rates were normalized. the rna was folded using the shapeknots algorithm from rnastructure (hajdin et al., ) with parameters -m to generate three structures and -dms to use the normalized mutation rates as constraints in folding. all signals on g and u bases were set to - (unavailable constraints). connectivity table files output from shapeknots were converted to dot bracket format using ct dot from rnastructure (mathews, ) . accession numbers of curated sarbecovirus and merbecovirus genomes were obtained from (ceraolo and giorgi, ) and downloaded from ncbi. 
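the structure comparison defined earlier in this section (the fowlkes-mallows index and its modified form excluding external base pairs) can be illustrated with a short sketch. it assumes two complete, balanced, pseudoknot-free dot-bracket strings with no externally paired bases; the function names are hypothetical, and the value returned when the denominator is zero is not legible in the text, so it is a required argument (`empty_value`) rather than a fixed constant.

```python
from math import sqrt


def base_pairs(structure):
    """Return the set of (i, j) base pairs in a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket structure")
    return pairs


def fmi(pairs_a, pairs_b, *, empty_value):
    """Fowlkes-Mallows index between two sets of base pairs."""
    common = len(pairs_a & pairs_b)   # pairs shared by both structures
    only_a = len(pairs_a - pairs_b)   # pairs unique to the first structure
    only_b = len(pairs_b - pairs_a)   # pairs unique to the second structure
    denominator = (common + only_a) * (common + only_b)
    if denominator == 0:
        # Assumption: the caller decides the value for this degenerate case.
        return empty_value
    return common / sqrt(denominator)


def mfmi(struct_a, struct_b, *, empty_value):
    """Modified index: u/l + (p/l) * fmi, with p = l - u."""
    if len(struct_a) != len(struct_b):
        raise ValueError("structures must have the same length")
    length = len(struct_a)
    # u: positions at which both structures are unpaired.
    unpaired_both = sum(a == "." and b == "." for a, b in zip(struct_a, struct_b))
    paired = length - unpaired_both
    score = fmi(base_pairs(struct_a), base_pairs(struct_b), empty_value=empty_value)
    return unpaired_both / length + (paired / length) * score
```

two identical structures score 1 under this sketch, and two fully unpaired structures also score 1 regardless of `empty_value`, because the paired fraction is zero. handling externally paired bases (the second variant) would require tracking the counts $e$ and $p$ separately.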
the sequences were aligned using the muscle (edgar, ) web service with default parameters. the region of the multiple sequence alignment spanning the two sides of alternative stem was located and the sequence conservation computed using custom python scripts. for the alignment of all betacoronaviruses with genomes in ncbi refseq (o'leary et al., ) , all reference genomes of betacoronaviruses were downloaded from refseq using the query "betacoronavirus[organism] and complete genome" with the refseq source database as a filter. the sequences were aligned using the muscle (edgar, ) web service with default parameters. the subgenus of betacoronavirus to which each virus belonged was obtained from the ncbi taxonomy database (sayers et al., ) . the filtered bit vectors (the same used to fold the frameshift element) were clustered using the expectation maximization algorithm of dreem to allow detection of a maximum of two alternative structures (tomezsko et al., ) . mapped reads from the in-cell library were classified as minus-strand using a custom python script if they had the following sam flags (li et al., ): paired and proper_pair and ({read and mreverse and not reverse} or {read and reverse and not mreverse}) and not (unmap or munmap or secondary or qcfail or dup or supplementary) . rna structures were drawn using varna (darty, denise and ponty, ). the bases were colored using the normalized dms signals. the sars-cov- starting material was provided by the world reference center for emerging viruses and arboviruses (wrceva), with natalie thornburg (nax @cdc.gov) as the cdc principal investigator." we thank t.b. faust for manuscript input and e. smith for illustrator images. this work was supported by the office of naval research award # n - - - and the burroughs wellcome fund. 
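the sam-flag rule used above to classify minus-strand reads can be written as a small bitmask check. this is a sketch, not the authors' script: the bit values are those of the sam format, the function name is hypothetical, and because the mate numbers in the rule are not legible in the text, the mate expected to map in the forward orientation is passed in as `forward_mate` (an assumption the caller must set to match the library protocol).

```python
# SAM FLAG bits as defined in the SAM format specification.
PAIRED, PROPER_PAIR, UNMAP, MUNMAP = 0x1, 0x2, 0x4, 0x8
REVERSE, MREVERSE, READ1, READ2 = 0x10, 0x20, 0x40, 0x80
SECONDARY, QCFAIL, DUP, SUPPLEMENTARY = 0x100, 0x200, 0x400, 0x800

# Any of these bits disqualifies the read.
EXCLUDED = UNMAP | MUNMAP | SECONDARY | QCFAIL | DUP | SUPPLEMENTARY


def is_minus_strand(flag, forward_mate):
    """Apply the paired/proper-pair orientation rule to one SAM FLAG value.

    forward_mate -- 1 or 2: which mate must map forward (with its mate
                    reversed) for the pair to count as minus-strand.
                    Assumption: not recoverable from the text.
    """
    if forward_mate not in (1, 2):
        raise ValueError("forward_mate must be 1 or 2")
    if not ((flag & PAIRED) and (flag & PROPER_PAIR)):
        return False
    if flag & EXCLUDED:
        return False
    fwd_bit, rev_bit = (READ1, READ2) if forward_mate == 1 else (READ2, READ1)
    # {mate and MREVERSE and not REVERSE}
    forward_read = (flag & fwd_bit) and (flag & MREVERSE) and not (flag & REVERSE)
    # {other mate and REVERSE and not MREVERSE}
    reverse_read = (flag & rev_bit) and (flag & REVERSE) and not (flag & MREVERSE)
    return bool(forward_read or reverse_read)
```

in practice the flag would be read from column 2 of each alignment record; swapping `forward_mate` selects the opposite strand for the same library.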
the authors declare no competing interests ) an in silico map of the sars-cov- rna structurome., biorxiv : the preprint server for biology programmed ribosomal frameshifting in decoding the sars-cov genome', virology an efficient ribosomal frame-shifting signal in the polymeraseencoding region of the coronavirus ibv characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an rna pseudoknot genomic variance of the -ncov coronavirus varna: interactive drawing and editing of the rna secondary structure statistical prediction of single-stranded regions in rna secondary structure and application to predicting effective antisense target sites and beyond simplified estimation from censored normal samples contrafold: rna secondary structure prediction without physics-based models implications of rna structure on the annealing of a potent antisense rna directed against the human immunodeficiency virus type muscle: multiple sequence alignment with high accuracy and high throughput coronavirus n protein n-terminal domain (ntd) specifically binds the transcriptional regulatory sequence (trs) and melts trs-ctrs rna duplexes rnaz . : improved noncoding rna detection accurate shape-directed rna secondary structure modeling severe acute respiratory syndrome coronavirus from patient with coronavirus disease, united states', emerging infectious diseases rfam . : shifting to a genome-centric resource for non-coding rna families the architecture of sars-cov- transcriptome global or local? 
predicting secondary structure and accessibility in mrnas fast gapped-read alignment with bowtie ' the sequence alignment/map format and samtools structural lability in stem-loop drives a ′ utr- ′ utr interaction in coronavirus replication a u-turn motif-containing stem-loop in the coronavirus ′ untranslated region plays a functional role in replication mouse hepatitis virus stem-loop adopts a uynmg(u)a-like tetraloop structure that is highly functionally tolerant of base substitutions rna structure analysis of alphacoronavirus terminal genome regions structural and functional conservation of cis-acting rna elements in coronavirus '-terminal genome regions', virology the molecular biology of coronaviruses using an rna secondary structure partition function to determine confidence in base pairs predicted by free energy minimization reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation systematic evaluation of error rates and causes in short samples in next-generation sequencing a three-stemmed mrna pseudoknot in the sars coronavirus frameshift signal achieving a golden mean: mechanisms by which coronaviruses ensure synthesis of the correct stoichiometric ratios of viral proteins the role of programmed- ribosomal frameshifting in coronavirus propagation rna genome conservation and secondary structure in sars-cov- and sars-related viruses: a first look genome-wide probing of rna structure reveals active unfolding of mrna structures in vivo database resources of the national center for biotechnology information rna motif discovery by shape and mutational profiling (shape-map)', nature methods continuous and discontinuous rna synthesis in coronaviruses structural imprints in vivo decode rna regulatory mechanisms' an atypical rna pseudoknot stimulator and an upstream attenuation signal for - ribosomal frameshifting of sars coronavirus determination of rna structural diversity and its role in hiv- rna splicing', 
nature architecture and secondary structure of an entire hiv- rna genome' molecular mechanisms of rna interference mouse hepatitis virus stem-loop functions as a spacer element required to drive subgenomic rna synthesis shape analysis of the rna secondary structure of the mouse hepatitis virus ' untranslated region and n-terminal nsp coding sequences', virology the structure and functions of coronavirus genomic ' and ' ends', virus research reverse genetics with a full-length infectious cdna of severe acute respiratory syndrome coronavirus dms-mapseq for genome-wide or targeted rna structure probing in vivo materials & methods vero cells infection and dms modification sars-cov- total viral rna was extracted from vero cells (atcc ccl- ) cultured in dmem (gibco) supplemented with % fbs (gibco) plated into mm dishes and infected at a moi of . with -ncov/usa-wa / (passage ) or % v/v) was added dropwise to the plated vero cells h post sars-cov- infection and incubated for min at °c. dms was neutralized by adding ml pbs (thermofisher scientific) with % β-mercaptoethanol. the cells were centrifuged at , g for min at °c. the cells were washed twice by resuspending the pellet with ml pbs with % β-mercaptoethanol and centrifugation to pellet then just once with ml pbs. after washes, the pellet was resuspended in ml trizol (thermofisher scientific) and rna was extracted following the manufacturer's specifications dms modification of in vitro-transcribed rna gblocks were obtained from idt for the sars-cov- nt and nt fse which corresponds to nucleotides - and nucleotides on the basis of the dms concentration used in the next step, mm sodium cacodylate buffer (electron microscopy sciences) with mm mgcl + (refolding buffer) was added so that the final volume was μl. (e.g. for . % final dms concentration: add . μl refolding buffer and . μl dms) then, . μl was added and incubated at °c for min while shaking at r.p.m. on a thermomixer. 
the dms was neutralized by adding μl β-mercaptoethanol (millipore-sigma). the rna was purified using rna thermofisher scientific) and rna was extracted following the manufacturer's specifications. the rna was purified using rna clean and concentrator - kit (zymo) and dms modified as described above. of total rna per reaction was used as the input for rrna subtraction. first, μl rrna subtraction mix ( μg/μl) and μl × hybridization buffer (end concentration: mm nacl, mm tris-hcl, ph . ) were added to each reaction, and final volume was then adjusted with water to μl. the samples were denatured at °c for min and then temperature was reduced by . °c/s until the reaction was at °c. next, μl rnase h buffer and μl hybridase thermostable rnase h (lucigen) preheated to ° were added. the samples were incubated at °c for min. the rna was cleaned with rna clean and concentrator - , following the manufacturer's instructions and eluted in μl water. then, μl turbo dnase buffer and μl turbo dnase (thermofisher scientific) were added to each reaction and incubated for the reverse-transcription reaction was incubated at °c for . h. μl m naoh was then added and incubated at °c for min to degrade the rna. the cdna was purified with oligo clean and concentrator - (zymo) following instructions. pcr amplification was done using advantage hf dna polymerase (takara) for cycles according to the manufacturer's specifications. the pcr product was purified by dna clean and concentrator - (zymo) following manufacturer's instructions. rna-seq library for bp insert size was constructed following the manufacturer's instruction (nebnext ultra™ ii dna library prep kit) library generation with dms-modified sars-cov- rna after rrna subtraction (described above), extracted dms-modified rna from sars-cov- infected vero cells was fragmented using the rna fragmentation kit (thermofisher scientific). . μg of rrna subtracted total rna was fragmented at °c for . min. 
the fragmented rna was mixed with an equal key: cord- - zd gai authors: zhang, yali; wang, shaojuan; wu, yangtao; hou, wangheng; yuan, lunzhi; sheng, chenguang; wang, juan; ye, jianghui; zheng, qingbing; ma, jian; xu, jingjing; wei, min; li, zonglin; nian, sheng; xiong, hualong; zhang, liang; shi, yang; fu, baorong; cao, jiali; yang, chuanlai; li, zhiyong; yang, ting; liu, lei; yu, hai; hu, jianda; ge, shengxiang; chen, yixin; zhang, tianying; zhang, jun; cheng, tong; yuan, quan; xia, ningshao title: virus-free and live-cell visualizing sars-cov- cell entry for studies of neutralizing antibodies and compound inhibitors date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zd gai the ongoing covid- pandemic, caused by sars-cov- infection, has resulted in hundreds of thousands of deaths. cellular entry of sars-cov- , which is mediated by the viral spike protein and host ace receptor, is an essential target for the development of vaccines, therapeutic antibodies, and drugs. using a mammalian cell expression system, we generated a recombinant fluorescent protein (gamillus)-fused sars-cov- spike trimer (stg) to probe the viral entry process. in ace -expressing cells, we found that the stg probe has excellent performance in the live-cell visualization of receptor binding, cellular uptake, and intracellular trafficking of sars-cov- under virus-free conditions. the new system allows quantitative analyses of the inhibition potentials and detailed influence of covid- -convalescent human plasmas, neutralizing antibodies and compounds, providing a versatile tool for high-throughput screening and phenotypic characterization of sars-cov- entry inhibitors. this approach may also be adapted to develop a viral entry visualization system for other viruses. ). 
the gamillus (mgam) and mneongreen (mng) were tested as the fused-gfp because mgam is acid-tolerant, which may enable fluorescent tracking when the probe is taken up into acidic cellular organelles , and mng is the brightest gfp to our knowledge . we designated the rbd-based probes as rbg figure d) , respectively, which were similar to previously reported data for unfused proteins , . together, c-terminal fp-fusion does not influence the structure and ace -binding capability of the rbd and s-ectodomain of sars-cov- . establishment of virus-free assays to visualize sars-cov- cell entry we established hace -overexpressing cell lines using the ace hr and ace irb constructs (figure a ). cell-transfection with ace hr allowed hace - overexpression with nucleus visualization (h b-mruby ). ace irb contains an ace -mruby expressing cassette following an ires-ligated h b-irfp - a- puror. transfection with ace irb simultaneously enabled fluorescent visualization of hace (hace -mruby ) and nucleus (h b-irfp ). using these vectors, we developed three stable cell lines, namely, t-ace irb , t-ace hr and h - ace hr. as expected, hace (or ace -mruby ) was expressed at high levels, and the expression of tmprss (another critical factor for viral entry) did not change in these cells ( figure b ). on t-ace irb cells, both the sars-cov -rbg and sars-cov -stg probes showed effective-binding to the cells, as membrane-bound and hace - mruby -colocalized mgam signals were observed after a -min incubation with the cells ( figure c ). cytoplasmic mgam signals were detected in cells after incubation for -min or longer with the probes, particularly for sars-cov -stg, suggesting that the recombinant probes can not only bind to the cell surface but also be taken up into the cells. the internalization of sars-cov -stg was more evident than that of cov -rbg into t-ace irb ( figure c ). 
in live-cell dynamic tracking, more internalized mgam signals were observed for sars-cov -stg than for sars-cov -rbg ( figure s a ). for mgam signals, the internalized fluorescence ratio (ifr, figure s b) and the internalized vesicle numbers (ivns, figure s d ) of stg-treated cells were both significantly higher than those of rbg-treated cells approximately -min after probe-cell incubation. in contrast, no significant difference was noted in hace -mruby internalization in the presence or absence of probes ( figure s c ). using t-ace irb cells, we established a cell-based assay mimicking sars-cov- cell entry based on recombinant probes. it was a one-step wash-free assay ( figure d ). after -hour cell-probe incubation, the cells were directly imaged using a fully automatic high-content screening (hcs) system in confocal mode. for quantitative measurements, h b-irfp was used to identify the nucleus, and ace -mruby was used to determine the cell boundary. based on the detected nucleus and cell outlines, the green fluorescence intensities on the cell membrane and in the cytoplasmic region of each cell could be measured ( figure d ). generally, we used the mean fluorescence intensity (mfi) in the cytoplasmic region (cmfi) as an index of the amounts of the cell-bound and internalized probes. as the spikes of cov and hku -cov do not interact with hace , the signals of mers-rbg and hku -rbg on t-ace irb were nonspecific background ( figure e ). ratg -rbg showed a detectable and dose-dependent cmfi, but the value was significantly lower than those of the probes of sars-cov- and sars-cov- . compared to sars-cov -rbg, sars-cov -rbg showed slightly stronger signals, possibly due to its higher binding affinity. for sars-cov- , the cmfi values of sars-cov -stg, sars-cov -stn and sars-cov -st were significantly higher than those of sars-cov -rbg and sars-cov -rbd , and also stronger than that of sars-cov -smg. 
the dylight -labeled sars-cov -rbd probe presented a weaker signal than sars-cov -rbg, suggesting that the nh -dye modification at some amino acids of the rbd may interfere with its interaction with hace . moreover, mgam-fused probes showed better performance than dylight -labeled or mng-fused probes ( figure f ). at concentrations below nm, the cmfi of sars-cov -stg was approximately - fold higher than that of sars-cov -rbg. based on the visualization system, we developed cell-based hcs assays for analyzing the blocking potencies of sars-cov- entry inhibitors, designated csbt (using sars-cov -stg) and crbt (using sars-cov -rbg), respectively. the proteins hace -fc (race ), sars-cov -rbd and sars-cov -s were employed for inhibition assessments following the procedure described in figure d . as expected, all three proteins exhibited dose-dependent cmfi inhibition in both assays ( figure g ). the z'-factor coefficients of the csbt and crbt were both determined to be over . ( figure s e ), which demonstrated their robustness and reproducibility. detecting entry-blocking antibodies in covid- -convalescent human plasmas by csbt and crbt. recent studies have suggested that convalescent plasma may be beneficial in covid- treatments , . neutralizing antibodies (nabs) in convalescent plasmas may be essential in suppressing viruses . however, a rapid method for determining the neutralizing antibody titer (nat) of human plasma is still absent. we evaluated the feasibility of using the csbt- and crbt-determined entry-blocking antibody titers as nat surrogates in covid- -convalescent human plasmas (table s ). compared with samples from healthy donors (n= ), all covid- -convalescent plasmas showed significant cmfi inhibition in the csbt assay, whereas only samples ( . %) had detectable crbt activity ( figure a ). for quantitative analysis, two-fold serial dilution tests were further performed to determine the csbt and crbt titers ( figure b ). 
moreover, the titers of total antibodies (tab), igg, igm, and lentiviral-pseudotyping- particles (lvpp) based nat (lvppnat) against sars-cov- were also measured for comparisons ( figure c ). among the antibody titers derived from various assays, the csbt titer showed the best correlation with lvppnat ( figure d and table s , r= . , p< . ), and it also well correlated (r= . , p< . , figure d ) with the neutralization activity against authentic sars-cov- virus in representative samples (table s ) . together, the csbt-determined entry-blocking antibody titer is a good nab surrogate of convalescent plasmas. functional phenotyping of mouse anti-spike antibodies by csbt and crbt. serum samples from mice immunized with the sars-cov -rbd, sars-cov - s and sars-cov -s were collected for lvppnat, csbt and crbt measurements. the sars-cov -rbd and sars-cov -s immunizations resulted in potent and comparable serum lvppnat ( figure s a ), whereas sars-cov -s raised little nabs. the csbt ( figure s b ) and crbt ( figure s c ) assays also exhibited similar results to lvppnat measurements. the id correlation coefficient was . (p< . ) between csbt and lvppnat and . (p< . ) between crbt and lvppnat. using rbd-immunized mice, we developed mabs via rbd-elisa screening following cell-based functional evaluations (as illustrated figure a ). these mabs did not display much difference in elisa-binding to sars-cov -rbd, but of them ( h and a ) showed significantly decreased elisa-binding activities to sars-cov -st ( figure s a ). based on epitope-binning assays using a cross-competitive elisa, the mabs could be divided into six groups ( figure s b ). all mabs showed detectable but varied surface plasmon resonance (spr) affinity ( . - nm, figure s ) to sars- cov -rbd. quantitative measurements of csbt, crbt and lvppnat for the mabs were further performed ( figure b to c, figure s a to c). 
half of the mabs exhibited high-to-moderate csbt blocking potencies (ic < nm), whereas the remaining ones showed low-to-no csbt activities ( figure b ). in comparisons of the dose-dependent cmfi inhibitions against sars-cov -stg, sars-cov -st , sars-cov -rbg, and sars-cov -rbg ( figure c ), the profiles of most mabs against sars-cov -stg and sars-cov -st were similar, but the activities of b and b were dramatically decreased with sars-cov -st compared to sars-cov -stg, suggesting that dye conjugation may modify the epitopes of the two mabs and hinder their binding. notably, seven mabs exhibited striking enhancement at some dosage in the crbt assays ( figure c and figure s b ), but no enhancement was noted in either the csbt or the lvppnat tests. spr analyses demonstrated that the fabs of two representative crbt-enhancing mabs, g and h , also showed a dose-dependent promoting effect on rbd-ace binding, whereas the fabs of two crbt-blocking mabs ( h and b ) exhibited a dose-dependent reduction in the rbd-ace interaction ( figure s ). together, the crbt-enhancing effects of these mabs may be caused by antibody-induced rbd conformation changes associated with increases in ace -binding capacity. the functional potencies of mabs determined by various assays are summarized in figure d and table s . the csbt-ic values of the mabs showed good correlation with their lvppnat ic (r= . , p< . , figure e ) or ic (r= . , p< . , figure s ) values, and were also well correlated with their crbt-ic values (r= . , p< . , figure e ). however, a g mab presented csbt activity but no inhibition in the crbt assay, suggesting that its csbt activity is independent of the direct blocking of the rbd-ace interaction ( figure d ). an h mab showed moderate lvppnat activity but neither csbt nor crbt inhibition ( figure d ), suggesting it may act through different mechanisms to achieve neutralization. 
no significant relationship was noted between the elisa-or spr-determined protein- binding activities and neutralization potencies of the mabs ( figure s ). according to the spr ( figure s ) and crbt analyses using sars-cov -rbg ( figure c ), the b , b , f , c , and h mabs showed cross-reactivity to sars-cov- and ratg -cov. however, only the b mab had neutralization activity in sars-cov- lvppnat measurements ( figure s d ). epitope-binning assay ( figure s b) suggested that b , b , f and d possibly share an overlapping-epitope (cluster c ), and c , h , h and g may bind to another similar epitope (mab cluster c b). as b showed comparable lvppnat potencies against both sars- cov- and sars-cov- , it may recognize a cross-neutralization epitope. the h mab, which recognizes a unique epitope that differs from other mabs (mab cluster c , figure s b ), presented the best performance in lvppnat, csbt, and crbt assays but did not show any cross-reactivity with sars-cov- or ratg . both h and the h mab showed sars-cov- neutralization activity against both pseudoparticles (ic = . nm) and the authentic virus (ic = . nm) but had neither csbt nor crbt activity. we speculated that this mab may inhibit the sars- cov- via an intracellular neutralization pathway . to validate this, we prepared dylight -labeled mabs (ab ) of h , g , h , and h and an irrelevant mab (ctrab) for dual-visualizing tracking. among them, h , g and h served as controls that had strong, moderate and weak/no activity for both csbt and neutralization, respectively. in t-ace irb cells simultaneously incubated with stg and ab , we performed time-serial live-cell imaging analyses. to characterize the influence of these mabs on sars-cov -stg internalization, the dynamic changes of the stg-ivns, the stg-ivpmfi (the peak mfi of internalized vesicles), the ab - ivns, the ab -ivpmfi, and the percentage of stg/ab -colocalized internalized vesicles, and the stg-iva were calculated ( figure ). 
as expected, h completely obstructed stg internalization (p< . ), g showed significant but incomplete inhibition (p= . ), and h presented little/no influence on stg internalization (p= . ). compared to ctrab, the h showed no significant stg-ivns reduction (p= . , figure a ) but increased the stg-ivpmfi (p< . , figure b ). on the other hand, the h group exhibited higher ab -ivns (p< . , figure c ) and ab - ivpmfi (p< . , figure d ). the stg/ab colocalization (p< . , figure e ) and the stg-iva (p< . , figure f ) in the h group were also significantly higher and larger, respectively, than those in the other groups. representative images at -hour post stg/ab -cell incubation, as shown in figure g in stg-visualization system, the cmfi measurements following the csbt procedure showed only a slight reduction in cells treated with a high dose of amiloride, dynasore, apilimod, and apy ( figure s a ). notably, confocal-images revealed that the stg, colocalizing with internalized ace -mruby , were trapped on enlarged cytoplasmic vacuoles induced by pikfyve inhibitors (figure c ), and most of these vacuoles were not stained by ph-dependent lysoview dyes, suggesting an abnormal ph-status ( figure s b ). in addition, tetrandrine or baf.a also caused our data provided convincing evidence demonstrating the versatile applicability of the new system. the csbt-determined entry-blocking potency was a better correlate of nat against pseudotyping or the authentic sars-cov- virus than elisa-binding activity in covid -convalescent human plasmas, immunized mouse sera and mabs. the csbt may serve as rapid proxy assessment to identify plasma source with therapeutic potential in clinic, and is a useful tool in evaluating vaccine efficacy and neutralizing mab identification. in this study, of mabs ( h , b , c and h , figure d ) with the strongest csbt blocking activities (ic < nm) showed the most potent lvppnat (ic < nm). 
notably, the h mab presented superior neutralization activity against authentic sars-cov- (ic = . nm), which was comparable with recently described potent neutralizing mabs , , and thereby table s . live-cell images were acquired at -min, -hour, -hour, -hour, -hour, -hour, -hour, -hour, and -hour using a x water immersion objective. to visualize compound-induced influence on viral entry, t-ace irb ( figure c-d, figure s a ) or h -ace hr cells ( figure s b ) were pretreated with serial dilutions of compounds for -hour. then the probes were added to the cell cultures for further incubations in the presence of compounds. cell images shown in figure c and figure s b were acquired on tcs sp sted confocal microscope using a x oil immersion objective. the data of figure d and figure s a were derived from images acquired on opera phenix using x water immersion objective. for pictures of figure c , the cells were gently washed twice with pbs at -hour post stg incubation, following a paraformaldehyde fixation before imaging. for experiments as shown in figure s b , the cells at -hour post stg incubation were stained with lysoview ( . μl per well) for -min, then the cells were gently washed twice with pbs buffer and fixed with paraformaldehyde treatment before imaging. cell images involved in figure d and figure crbt. 
serum samples from mice immunized with recombinant proteins of sars-cov -s (mouse s - , s - , s - ), sars-cov -s (mouse s - , s - , s - ) and sars-

[figure residue; recoverable axis label: stg vesicles/cell]

cell entry mechanisms of sars-cov- rapid development of an inactivated vaccine candidate for sars-cov- potent neutralizing antibodies against sars-cov- identified by high-throughput single-cell sequencing of convalescent patients' b cells inhibition of sars-cov- (previously -ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion neutralization of sars-cov- spike pseudotyped virus by recombinant ace -ig a universal design of betacoronavirus vaccines against covid- , mers, and sars characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov robust neutralization assay based on sars-cov- s-bearing vesicular stomatitis virus (vsv) pseudovirus and ace -overexpressed bhk cells establishment and validation of a pseudovirus neutralization assay for sars-cov- acid-tolerant monomeric gfp from olindias formosa a bright monomeric green fluorescent protein derived from branchiostoma lanceolatum treatment of critically ill patients with covid- with convalescent plasma effectiveness of convalescent plasma therapy in severe covid- patients the convalescent sera option for containing covid- intracellular neutralization of viral infection in polarized epithelial cells by

key: cord- -odtue a authors: comandatore, francesco; chiodi, alice; gabrieli, paolo; biffignandi, gherard batisti; perini, matteo; ricagno, stefano; mascolo, elia; petazzoni, greta; ramazzotti, matteo; rimoldi, sara giordana; gismondo, maria rita; micheli, valeria; sassera, davide; gaiarsa, stefano; bandi, claudio; brilli, matteo title: insurgence and worldwide diffusion of genomic variants in sars-cov- genomes date: - - journal: biorxiv doi: . 
/ . . . sha: doc_id: cord_uid: odtue a the sars-cov- pandemic that we are currently experiencing is exerting a massive toll both in human lives and economic impact. one of the challenges we must face is to try to understand if and how different variants of the virus emerge and change their frequency in time. such information can be extremely valuable as it may indicate shifts in aggressiveness, and it could provide useful information to trace the spread of the virus in the population. in this work we identified and traced over time amino acid variants that are present with high frequency in italy and europe, but that were absent or present at very low frequencies during the first stages of the epidemic in china and the initial reports in europe. the analysis of these variants helps defining phylogenetic clades that are currently spreading throughout the world with changes in frequency that are sometimes very fast and dramatic. in the absence of conclusive data at the time of writing, we discuss whether the spread of the variants may be due to a prominent founder effect or if it indicates an adaptive advantage. the worldwide fast spread of sars-cov- virus during the first months of this year has caused , deaths with more than , , confirmed cases since the first reports of novel pneumonia in wuhan, hubei province, china (zhou et al. ; wu et al. ) up to may (who b . the virus belongs to the beta-coronavirus, and it is the seventh coronavirus known to infect humans, causing severe respiratory and systemic disorders (rothan and byrareddy ) , with a basic r index estimated to range from . to over . the closest known relatives of sars-cov- circulate in animals, specifically bats or pangolins (zhang, wu, and zhang ) , suggesting that an animal virus crossed species boundaries to efficiently infect humans, possibly through multiple passages in intermediate animal hosts, even though the transmission route has not been yet identified to date (andersen et al. ). 
traces of the history of the spread are present in the viral genome, and comparative genomics approaches can therefore be used to understand how viruses adapt to multiple hosts, uncovering key signatures of this adaptation (andersen et al. ; wan et al. ), and to trace the infection routes of the virus. at the same time, genomic studies can help trace viral variants that may be geographically restricted and/or may account for different levels of infectivity and mortality in humans. these variants might arise during the spread of the epidemic, as viruses are known for their high frequency of mutation, particularly single-stranded rna viruses such as sars-cov- (sanjuán and domingo-calap ), which has a single, positive-strand rna genome. randomly generated variants can then spread in the population for stochastic reasons (e.g. founder effect, drift) or as a consequence of positive selection exerted by intrinsic biological features (such as the level of virus infectivity and its transmission rate) or by extrinsic factors such as the use of antivirals, reactions by the immune system or other defence mechanisms put in place by the host (di giorgio et al. ). therefore, the haplotype(s) present at the beginning of the epidemic spread can change in time; sometimes novel variants can overcome ancestral ones, and this can be a consequence of different levels of aggressiveness but also of mechanisms other than selection. if the different variants are identical in terms of their ability to infect and replicate in the host, country-specific switches in haplotype frequency with respect to the most common haplotypes can depend on the very first haplotype(s) arriving in the country (provine ).
instead, when haplotype frequency changes globally, the hypothesis of differential aggressiveness becomes more probable, indicating that the novel variants may be better adapted to infect human hosts; however, even in this case, the complexity of global human mobility or sampling bias may generate patterns that seem causal but are not. here, we present a comprehensive study of the coding sequences from sars-cov- genome sequences isolated since the beginning of the epidemic. italy was the first european country to register non-imported covid- cases requiring hospitalization. the first registered case of covid- was on february , , a young man in the lombardy region in northern italy, diagnosed with atypical pneumonia (livingston and bucher ). in the next hours more cases were detected, and as of april th , italy had registered , cases with , deaths, one of the highest tolls in europe and in the world. who classifies the spread occurring in italy as community transmission, indicating that the country is experiencing large outbreaks of local transmission with no possibility of tracing transmission chains between the cases, and with multiple unrelated clusters of transmission. the high mortality rate observed in italy, especially at the beginning of the epidemic, raised the question of whether italian strain(s) might have increased aggressiveness. the increased mortality observed in italy could be explained by the fact that, at the beginning of the epidemic, swabs were performed only on individuals showing up at a hospital, skewing the sampling toward symptomatic individuals and excluding healthy carriers; nevertheless, we cannot exclude that haplotype frequencies in italy are partially different, at least genetically.
to gain better insight into the history and spread of the covid- pandemic in italy, and thanks to the sequences deposited in the gisaid database, we identified non-synonymous mutations that are differentially frequent in italian sars-cov- strains with respect to strains circulating globally. these mutations are enriched in italy but are present in strains from other countries as well, as shown by tracing their relative frequency in time both globally and in different countries; we therefore traced their distribution worldwide and complemented this with a phylogenetic analysis to understand how the variants are related.
genomes were downloaded from the gisaid.org repository on april, , , and a second time on april, , , and are listed, together with reference to the submitting laboratories, in supplementary table . we extracted coding sequences using a strategy based on tblastn (camacho et al. ) comparisons, using as queries all the proteins from the sars-cov- reference sequence deposited in ncbi (accession mn ). after the comparison, the coordinates of the blast alignments on each genome were used to extract the coding sequence. nucleotide sequences were then aligned and translated, and the alignments were manually checked for the presence of frameshifts and manually edited to remove partial and poorly aligned sequences, resulting in a variable number of sequences per alignment. this manual curation resulted in alignments containing a minimum of sequences for orf a and a maximum of sequences for the nucleocapsid protein, with an average of sequences per alignment. we are aware that by removing sequences with gaps we are likely removing part of the genuine variability present (i.e. indels); however, we observed indels at such a low frequency that we attributed them mostly to sequencing/assembly errors and decided that the clear advantage of a stronger signal outweighs the possible disadvantages deriving from information loss.
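the coordinate-based extraction and the frameshift check described above could be sketched as follows. this is a minimal illustration and not the authors' pipeline: the function names `extract_cds` and `translate` are hypothetical, hits are assumed to lie on the forward strand with 1-based inclusive coordinates, and any sequence containing gaps or ambiguous bases is simply discarded (an automated stand-in for the manual curation).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def extract_cds(genome, start, end):
    """Return the coding sequence at 1-based inclusive coordinates.

    Returns None when the region is unusable: a length that is not a
    multiple of three (possible frameshift or partial hit) or any
    character other than A, C, G, T (gaps, Ns, ambiguity codes).
    Assumption: the hit is on the forward strand.
    """
    cds = genome[start - 1:end].upper().replace("U", "T")
    if len(cds) % 3 != 0:
        return None
    if set(cds) - set(BASES):
        return None
    return cds


def translate(cds):
    """Translate a clean coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))
```

in this sketch, discarding a region returns `None` rather than raising, so that a caller can count how many sequences were removed per protein.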
additional analysis with high quality genome sequences will be necessary to evaluate whether indels represent an important source of variation. regardless, this does not change the results of our analysis, as the targets of this work are point mutations.
sequences from the alignments were used to build amino acid frequency profiles using the r package seqinr (charif and jr ). for each protein encoded in the sars-cov- genome we obtained two profiles, one for italian strains and one for the entire set of sequences. as the positions in the alignments for the two groups under examination are congruent, we can then calculate:
$$S_i = \max_{a} \log \frac{f^{it}_{i,a}}{f^{all}_{i,a}}$$
that is, the maximum log ratio of the frequency (f) of all amino acids a at a certain position i in the alignment in italian ( it ) and total ( all ) sequences. we then identify positions with s> . , which corresponds to the identification of positions where there is a frequency change in one of the residues of at least x. variability in multi-alignments was quantified by calculating the entropy $H_i = -\sum_{a} f_{i,a} \log f_{i,a}$ at each position i of each manually curated protein multi-alignment. mutual information was calculated based on (buck and atchley ) to assess whether variants tend to co-occur frequently.
to build updated time profiles of the identified variants, sequences were downloaded a second time from gisaid.org on april, , ( sequences, only high quality and high coverage). sampling dates were retrieved from the database and can be found in the gisaid sequence acknowledgment table (supplementary table ). sequences for which the geographic origin and/or the date were not correctly defined (supplementary table ) were removed, to obtain a database of about sequences, with some per-site variability due to the variable presence of ns in the sequences.
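the per-position score s and the entropy used in the methods could be computed along the following lines. this is a sketch under stated assumptions, not the authors' r code: the logarithm base (2 here) is not recoverable from the text, and the small number added to each frequency is assumed to be one over the number of sequences in the corresponding group; the function names are hypothetical.

```python
import math
from collections import Counter


def frequencies(column):
    """Relative frequency of each residue in one alignment column."""
    n = len(column)
    return {a: c / n for a, c in Counter(column).items()}


def s_score(col_it, col_all, base=2.0):
    """Maximum log ratio of residue frequencies (italian vs. all).

    Assumption: the pseudo-frequency is 1/n for each group, which avoids
    log(0) and depends on the number of sequences in the two groups.
    """
    f_it, f_all = frequencies(col_it), frequencies(col_all)
    eps_it, eps_all = 1 / len(col_it), 1 / len(col_all)
    residues = set(f_it) | set(f_all)
    return max(
        math.log((f_it.get(a, 0.0) + eps_it) / (f_all.get(a, 0.0) + eps_all), base)
        for a in residues
    )


def entropy(column, base=2.0):
    """Shannon entropy of the residue distribution in one column."""
    return -sum(f * math.log(f, base) for f in frequencies(column).values())
```

with group-specific pseudo-frequencies, two groups with identical residue frequencies but different sizes do not get a score of exactly zero.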
we calculated two different time series, one for the entire set of sequences available at the time of sequence download, and one considering only sequences taken from predefined groups of countries (hereinafter the time series by country). in the latter, we merged nearby countries to increase the precision of the estimate of the relative frequency of variants within predefined time intervals. the total time range since the first sequence available ( days) was split into ( for the analysis by country, as the number of sequences per interval is smaller) non-overlapping intervals of about days (~ days for the time series by country). the interval duration was chosen as a compromise between the average number of sequences available per interval in the different groups and the time resolution of the time series. for each interval we calculated the frequency of the observed variants and the shannon diversity index considering all concatenated coding sequences (not limited to the positions considered in this work). in supplementary table we show the number of sequences used for each country for each interval.
we performed a phylogenetic analysis by concatenating nucleotide sequences aligned on the basis of the manually curated amino acid alignments. we selected the best phylogenetic model for our alignment using modeltest-ng (darriba et al. ) (gtr+i+g model) and then performed maximum likelihood phylogenetic estimation in raxml (stamatakis ). the resulting phylogenetic tree was then visualized and integrated with additional information about variants and geographical origin using figtree (https://github.com/rambaut/figtree/).
our analysis allowed us to identify positions in four proteins that present drastic changes in amino acid frequencies when comparing italian sequences with worldwide sequences available on gisaid.org on april, , ( figure ).
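the interval-based variant frequencies and the shannon diversity index described in the methods could be computed roughly as follows. this is an illustrative sketch only: the function names are hypothetical, the interval length is left as a parameter because the value used by the authors is not recoverable here, and the natural logarithm is an assumption.

```python
import math
from collections import Counter, defaultdict
from datetime import date


def variant_time_series(records, interval_days):
    """Relative residue frequencies in non-overlapping time intervals.

    records: iterable of (sampling_date, residue) pairs for one site.
    Returns {interval_index: {residue: relative_frequency}}, where
    interval 0 starts at the earliest sampling date.
    """
    records = list(records)
    first = min(d for d, _ in records)
    bins = defaultdict(Counter)
    for d, residue in records:
        bins[(d - first).days // interval_days][residue] += 1
    return {
        b: {r: c / sum(cnt.values()) for r, c in cnt.items()}
        for b, cnt in sorted(bins.items())
    }


def shannon_index(haplotypes):
    """Shannon diversity of a collection of (concatenated) sequences."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Toy data (not from the paper): four samples, ten-day intervals.
toy = [
    (date(2020, 1, 1), "D"),
    (date(2020, 1, 5), "D"),
    (date(2020, 1, 12), "G"),
    (date(2020, 1, 13), "D"),
]
series = variant_time_series(toy, interval_days=10)
```

merging nearby countries, as done for the time series by country, would amount to mapping each record to a macro-region before calling `variant_time_series` once per region.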
however, we discovered that the residues found at these positions are not peculiar to italy and, as a matter of fact, these are the most variable genomic positions across available sars-cov- sequences (supplementary figure ). therefore, we decided to proceed with a worldwide analysis. the sites that we identified are: . position in the spike protein, where an aspartate residue is found at high frequency in sequences obtained at the beginning of the pandemic. a variant with a glycine residue at the same position is, however, now very common in europe, and in italy in particular. during the preparation of this manuscript, we realized that some of the variants are also the topic of other papers and preprints predating this work, e.g. (korber et al. ; banerjee et al. ; somasundaram, mondal, and lawarde ; begum et al. ; bai et al. ; yang et al. ; pachetti et al. ). the fact that we identify some of the previously known mutations using a different approach indicates that they represent a genuine signal rather than artefacts; however, this does not necessarily mean that their increase in time (see below) is a consequence of a correspondingly increased transmissibility rate or of aggressiveness in general, for which at the moment there are no conclusive data, as stated in most of the available preprints. in the following, we summarize a thorough bibliographical analysis looking for functional information associated with these positions or the domains they belong to. however, we stress once more that virus spreading is characterized by haplotype shifts that are not necessarily related to the performance of the haplotypes themselves (for instance, they may result from genetic drift or founder effect); therefore, without targeted functional studies on the variants, it is difficult to ascertain whether one or the other explanation is more probable.
figure for a confirmation of the variability of these positions in the entire set of sequences.
note that to cope with zeros when taking the logarithm of the ratio of amino acid frequencies, we add to every frequency a small number that depends on the number of sequences in the two groups (italian and world). as a consequence, positions with identical amino acid frequencies in the two groups do not get an s exactly equal to zero.
d is located in the sd domain of the spike protein. it is positioned in a loop right after a beta-strand and close to a completely solvent-exposed disordered region outside and downstream of the ace interaction domain (wrapp et al. ; walls et al. ). the resolution of the three available structures is not high enough to thoroughly discuss the h-bond network. it is likely that the d side chain does not establish close interactions with neighboring residues. conformationally, d lies in the region of left-handed helices. thus, from the structural point of view, the mutation d g is likely to be neutral (no particularly strong interactions are lost with the missing carboxylate group). on the other hand, the mutation might even be beneficial for overall protein stability: given the increased conformational freedom of glycine, it is indeed a perfect residue in turns following beta structures. other authors suggest a positive effect of d g on the efficiency of the virus in binding the ace receptor. korber and colleagues (unpublished, (korber et al. )) suggest the possibility of structural changes in the protein or even an improved antibody-dependent enhancement effect.
in coronaviruses, orf is translated to yield the polyprotein, which is processed by proteolysis with the production of intermediate and mature nonstructural proteins (nsps). in figure we highlight the position of the variant sites on the polyprotein with respect to known domains, as defined by the conserved domain database. position corresponds to residue of the non-structural protein (nsp ), which is characterized by four transmembrane helices (th) (angeletti et al. ), whose role is still unknown. in more detail, position lies at the n-terminus, which protrudes from the external face of the membrane. interaction analysis of the corresponding protein from sars virus shows that nsp can form dimeric or multimeric complexes that can also involve additional viral proteins (nsp , nsp , nsp , nsp , nsp , nsp , orf a) (von brunn et al. ), suggesting that nsp might be involved in the viral life cycle. sars nsp also interacts with host proteins, such as prohibitin and (cornillez-ty et al. ), that are involved in cell cycle progression, migration, differentiation, apoptosis and mitochondrial biogenesis (fusaro et al. ; merkwirth and langer ; rajalingam et al. ; sun et al. ), suggesting it might be important for manipulating prominent host functions. murine hepatitis virus and sars-cov strains deleted in this portion of the polyprotein (graham et al. ) show strongly reduced viral growth and rna synthesis, but are not affected at the level of protein processing. ectopic expression of the nucleotide sequence coding for nsp in murine cells infected by strains missing nsp made it possible to detect its recruitment into viral complexes. it is therefore possible that nsp is involved in global rna synthesis, interaction with host proteins and pathogenesis (von brunn et al. ). in infected cells, nsp seems to be present in small vesicular foci, but in the absence of additional viral proteins it localizes to cytoplasmic or nuclear membranes, without a specific target (prentice et al. ; von brunn et al. ). unfortunately, mutagenesis experiments are still not available for sars-cov , and therefore we still do not know whether the variant observed at this site has any effect on in vivo virus properties.
position belongs to nsp , a protein that induces the formation of autophagosomes, which is a sign of starvation in uninfected cells (cottam, whelband, and wileman ).
autophagosomes can act as an innate defense against viral infection, but they can be hijacked to support the assembly of coronavirus replicase proteins (orvedahl et al. ; suhy, giddings, and kirkegaard ; wileman ). moreover, together with nsp and nsp , nsp promotes the formation of the double membrane vesicles typically observed in sars disease (angelini et al. ). nsp is a protein with transmembrane domains (oostra et al. ), and position lies in the luminal loop between the first and second hydrophobic transmembrane domains. in the wild type sars-cov- sequence, we find phenylalanine residues just before l . instead, some of the variants in this paper have a stretch of phenylalanine, keeping this region highly hydrophobic.
position belongs instead to the n-terminal domain of the rna-directed rna polymerase (rdrp, also known as nsp ). as shown by its three-dimensional structure, sars-cov- rdrp displays an n-terminal nidovirus-like rdrp-associated nucleotidyltransferase domain (niran) followed by an interface domain and then by the canonical palm and finger domain structure (gao et al. ). p (p according to nsp numbering) is located in the alpha-beta interface domain, which bridges niran with the finger domain. specifically, p sits on a solvent-exposed loop region in the groove formed between niran and the finger domain. the p l mutation does not change the non-polar nature of the side chain and, from the conformational point of view, the substitution from p to l should not result in a specific structural adjustment of the loop. thus, despite the lack of information concerning mutations at the specific sites identified in this work, the three variant sites in the polyprotein belong to important functional regions of the sequence.
orf codes for a secreted accessory protein not directly involved in viral replication (dediego et al. ; yount et al. ; tan et al., ). homologs have been found in some beta-coronaviruses, named orf a (tan et al., ).
orf a of sars codes for a protein with a structure similar to immunoglobulin superfamily proteins, specifically the metazoan ig involved in adhesion (nelson et al. ; hänel et al. ). the function of this protein is not entirely clear, but could be related to the modulation of the immune system. studies have demonstrated that the sars-orf a protein localizes mainly in the perinuclear region of the host cell, where it interacts with bst- (bone marrow stromal antigen ), preventing the glycosylation of bst- needed for its function (taylor et al. ). bst- inhibits the release of the virus, as observed in hiv- (taylor et al. ), likely at the level of the endoplasmic reticulum-golgi, where it binds budding virions. orf has a structure composed of a beta-sandwich fold with seven beta-strands, which closely resembles the structure of orf a, but it lacks the c-terminal transmembrane region and has an additional long insert between strands and , which is supposed to be involved in peptide binding (tan et al., ). orf is a fast-evolving gene; this, together with the absence of the transmembrane region and the presence of the insertion, may suggest some functional divergence with respect to its ancestor (tan et al., ). it has been proposed that orf may have acquired a function similar to the adenoviral cr protein, which interferes with mhc molecules to attenuate antigen presentation and therefore the capability of the host immune system to detect the virus. in this context, we notice that l lies within the orf insert, indicating that it may represent a further adaptation. because direct functional information on this site is lacking at the moment, it is not possible to ascribe a functional adaptation to these variants; however, variability at this site might counteract mhc interference function, as a fast-evolving region in the insert may have been selected positively to facilitate the interaction with a fast-evolving host molecule (tan et al., ).
after identifying the variant sites, we explored the time profiles of the seven variable sites across all sequences (figure ), revealing haplotype frequency changes for all of them between february and march, with five changes being moderate (nucleocapsid r g and g r, orf l s, polyprotein t i and l f) and two more drastic, now representing the most common haplotypes of recent non-chinese sequences (spike d g and polyprotein p l). the spike variant, in particular, was sequenced only a few times in china since the beginning of the epidemic, the first time in zhejiang on january ; however, in this country it never reached a significant frequency. conversely, after its first sequencing in germany on january , this variant started at very low frequency and then became the most common haplotype at this position outside china. a similar situation holds for the haplotype with a leucine in position of the polyprotein, which has rapidly increased since the beginning of march. these patterns seem indicative of a functional role for at least some of the variants that undergo an increase in frequency, but in the absence of any functional test or experimental data, we cannot rule out that the observed frequency changes are a consequence of a founder effect in europe followed by a spreading wave from europe to countries where the epidemic started later. a founder effect, however, implies that the founder arrives first, and this is not always the case, at least for some of the variants and some of the countries. at the same time, some data, both reviewed and unreviewed, are becoming available, suggesting that d g on the spike might provide some advantage to the virus. bai (bai et al. ) and brufsky (brufsky ) observed a correlation of g with increased mortality, while korber and coworkers (korber et al. ) found a correlation between the presence of the mutation and a higher viral load in patients.
in the legend, we indicate the reference residue with an asterisk, and the variant on which we are focusing with an exclamation mark. additional variants were identified in the sequence dataset downloaded on april, , but they have not reached significant percentages, at least at the moment.
position on the spike and position on the polyprotein covary in a significant way (data not shown). the most likely explanation for this is the rapid sequential fixation of both mutations in the same strain, together with the absence of recombination between the two. the two adjacent sites in the nucleocapsid sequences show perfect agreement in their time profiles, strongly suggesting that the mutations occurred together or within a short time. these two mutations respectively remove and add an arginine (passing from rg to kr); by considering the overall frequencies of the possible amino acid pairs at these positions, we suggest that the second arginine may compensate for the loss of the first one. indeed, rr is never observed and kg is extremely rare (relative frequency, r.f.= . ), while most genomes present either the original pair rg (r.f.= . ) or kr (r.f.= . ). we speculate that this could be a consequence of the non-neutrality of configurations with no arginine at either position, although this information gives no hints about the fitness of the fixed variant.
next, we reasoned that grouping all the sequences uploaded from different countries may provide a picture averaged over the variable and more complicated situations that might characterize the evolution of the virus within different countries or geographical areas. we therefore explored the time profiles of the variants in different geographical macro-regions ( figure and figure ), highlighting a highly heterogeneous situation.
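covariation between two alignment positions, and the pair frequencies discussed for the adjacent nucleocapsid sites, could be quantified as in the sketch below. this is plain mutual information between two columns and should be read as an assumption: it does not reproduce any normalization or significance test of the buck and atchley procedure, and the function names are hypothetical.

```python
import math
from collections import Counter


def pair_frequencies(col_x, col_y):
    """Relative frequency of each residue pair across the same genomes."""
    n = len(col_x)
    return {pair: c / n for pair, c in Counter(zip(col_x, col_y)).items()}


def mutual_information(col_x, col_y, base=2.0):
    """Mutual information between two columns of equal length.

    col_x[k] and col_y[k] must come from the same genome k.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]), base)
        for (x, y), c in Counter(zip(col_x, col_y)).items()
    )
```

two perfectly linked biallelic sites at equal frequencies give one bit of mutual information, while independent sites give zero; intermediate values would need a null distribution (for example from column shuffling) to be called significant.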
we also provide a movie illustrating the changes taking place in variant frequencies across overlapping time windows in macro-regions (supplementary movie ), which provides a dynamic view of the time profiles of the variants. figure shows that d g never reaches a significant frequency in china, while it increased quite rapidly in several areas where it arrived and where it often started at very low frequency with respect to the original haplotype. if a functional role for this mutation is demonstrated, this pattern would seem to indicate that different variants might have different fitness when interacting with different host haplotypes, e.g. if asian and european populations have different haplotypes for some of the proteins interacting with the spike, such as furin. to conclude, it is clear that since the first appearance of d g and others of the above variants, their relative frequency has undergone a significant increase in several countries, most of the time overtaking the original variant(s) in prevalence, except in china, south asia, south america and africa. in figure we summarize the most recent situation, using sequences from the interval april to , .
the best phylogenetic model selected according to aic and bic was gtr+i+g, for which the estimated substitution rate matrix contains the following rates, relative to the g<->t rate, taken as unity: a<->c= . , a<->g= . , a<->t= . , c<->t= . the extremely high rate for c to t (or better u, considering we are dealing with an rna virus) is in agreement with the involvement of host apobec-like editing mechanisms, as proposed in recent works (di giorgio et al. ). although not the focus of this paper, these rates provide further evidence that the host's mutagenic systems may play an important role in the evolution of sars-cov- and in its detection by the immune system.
the phylogenetic tree integrated with additional information is reported in figure , with different coloring schemes and together with an evolutionary model summarizing how the different variants configurations are likely related to each other. when considering the residues found at the seven positions on which we are focusing (hereinafter variants configurations, vcs), as in figure a , we find that the phylogenetic clades correspond to the six most frequent vcs (accounting for over . % of the sequences), that is rgtlpdl, rgtlpds, rgtfpdl, krtllgl, rgtllgl, rgillgl, obtained by linking the variant residues in the order: nucleocapsid protein sites , and , polyprotein sites , and , spike site and orf site ; for this reason, hereinafter we will use clades a to f interchangeably with the vcs written above. we are aware that the correlation is not perfect, as can be seen from the presence of additional low-frequency variants within each clade, or the presence of vcs misplaced with respect to their major clade. misplaced sequences can be a consequence of insufficient phylogenetic signal or of convergent evolution. for instance, as anticipated above, one of the chinese sequences carrying spike variant g (vc: rgtlpgs) is contained within clade b (rgtlpds), indicating convergent evolution with the spike d g variant in clades d, e, f, or possibly an artifact. however, the limited number of similar cases supports our simplification and the correspondence between clades in the tree and vcs. our multi-alignment contains one sequence annotated as bat coronavirus (epi_isl_ ), which provides a rooting of the phylogenetic tree that falls in clade b, which indeed contains most of the sequences obtained in china at the beginning of the epidemic ( figure , panel b).
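building a variants configuration by linking the residues at a fixed, ordered list of sites, and ranking configurations by frequency, could look like the sketch below. the data layout (one dictionary of aligned protein sequences per genome), the function names and the site indices in the toy example are hypothetical; the actual positions are those given in the text.

```python
from collections import Counter


def variant_configuration(proteins, sites):
    """Concatenate the residues found at an ordered list of sites.

    proteins: {protein_name: aligned amino acid sequence} for one genome.
    sites: ordered list of (protein_name, 0-based alignment index).
    """
    return "".join(proteins[name][index] for name, index in sites)


def top_configurations(genomes, sites, k=6):
    """Return the k most frequent configurations with relative frequencies."""
    counts = Counter(variant_configuration(g, sites) for g in genomes)
    total = sum(counts.values())
    return [(vc, c / total) for vc, c in counts.most_common(k)]


# Toy example with made-up sequences and site indices.
toy_sites = [("N", 0), ("N", 1), ("S", 2)]
toy_genomes = [
    {"N": "RGA", "S": "MFD"},
    {"N": "KRA", "S": "MFG"},
    {"N": "KRA", "S": "MFG"},
]
toy_top = top_configurations(toy_genomes, toy_sites, k=2)
```

because the configuration is a short string, typing a new genome in this scheme only requires reading the residues at the listed sites, without rebuilding the tree.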
the tree has two main "radiations", one corresponding to variants configurations rgtlpdl, rgtlpds, rgtfpdl and one to krtllgl, rgtllgl, rgillgl, therefore the main partition of the tree corresponds to the identity of residues at position of the spike and of the polyprotein. the two radiations are present at different frequencies in the same countries, with the notable exception of chinese sequences within radiation , except for epi_isl_ , the only sequence with a rgtllgl pattern sequenced in china that is also present in the phylogenetic tree (indicated in figure a ). this suggests that while most of the diversification of radiation took place in china and was then exported outside, the ancestor of radiation travelled outside china early to start a diversification in the countries where it arrived. this hypothesis is in agreement with the fact that epi_isl_ is placed very close to the root of the branch leading to radiation , indicating it may indeed represent a strain closely related to the true radiation ancestor. these observations raise important questions ( ) about the identity and movements of the first individual carrying the ancestral radiation variant outside china ( ) on the timing of the epidemic, but most importantly ( ) the reason why it did not increase in frequency in china but elsewhere. following the topology of the tree we propose an evolutionary model (figure c ) whereby the ancestral bat variants configuration (rgtvpds or any other present in the presently unknown animal host) evolved into rgtlpds and from this to rgtlpdl. we stress that this likely took place through unobserved states/hosts as the root branch length is close to the total length of the tree and therefore we do not consider mutation v l in the polyprotein full sequence as an adaptation to the human host. the latter originated rgtfpdl on one side, and the ancestor of radiation (rgtllgl) that originated both krtllgl (through kg unfit strains?) and later rgillgl. 
given the almost perfect agreement of the tree with the variants configurations, we analysed the geographical distribution and the time profiles of the seven sites at once, similar to what is done for mlst-based classification of pathogens. in figure a we show the time profiles of the relative abundance of sequences belonging to the clades, which clearly show not only the appearance and increase of the novel clades but, most importantly, the gradual disappearance of sequences belonging to clade a (see also supplementary movies and ). when focusing on single clades across all the macro-regions previously defined, we find a heterogeneous situation, with different variants increasing in time in different countries. however, we can see that all vcs with a proline at position of the polyprotein and an aspartate at position of the spike almost disappear in time, remaining abundant only in south asia and oceania; in the rest of the world, only vcs with a leucine and a glycine at those positions remain in the last interval of our time range, after replacing the existing vcs. this is clear from figure c , which shows that the vc of clade a (rgtlpdl) is no longer present in the interval april to , . south asia is peculiar because it is the only area where the vcs of radiation (rgtllgl, krtllgl, rgillgl) are present for some time but then disappear, with a renewed increase of clade c (rgtfpdl). this may suggest that these variants have equal phenotypic characteristics but, as written above, we cannot exclude differential fitness depending on host haplotypes.
by calculating the shannon index for sequences within the same time intervals used to track the changes in frequency of the variants, we were able to trace the evolution of diversity in different places. peaks indicate the emergence (from outside or by evolution of pre-existing strains) of variants and their increase in frequency.
values maintaining a high shannon index correspond to situations where different variants coexist at comparable frequencies, while a decrease after a peak means that, after the appearance of one or more new variants / clades, one of them (novel or old) takes over, reducing the variability. we also indicate the first sampled sequence for each of the six clades. variability increases when a new variant appears, then stays more or less stable while the existing variants maintain their relative frequencies. however, if one of the variants becomes dominant, variability decreases again after a peak. increasing variability in time means that the arrival of novel variants continues steadily, or that the existing ones become more homogeneous in frequency.
in this work, we identified seven positions in coding sequences of the sars-cov- genome that are characterized by a different pattern of amino acids when comparing italian sequences with the global trends around the world. further analysis revealed that these sites are not peculiar to italian strains, and that different combinations of these variants are present at varying relative frequencies in different geographic areas. we found that the combination of these residues identifies six abundant configurations that correspond to phylogenetic clades and that cover over % of all sequences. this suggests that the characterization of these positions can represent a fast and portable method for sars-cov- typing, although novel variants are emerging that might eventually take over the old ones through mutations at additional sites. using this approach we were able to follow the evolution of the virus over time among continents, showing that the different clades evolved at different times and that their frequencies vary among continents. these sites are also the most variable among all available sars-cov- sequences, raising intriguing questions about their functional effects.
variants with a leucine at position of the polyprotein together with a glycine at position of the spike have increased in frequency since the end of january in most countries, overtaking the original haplotypes. mutations that might affect the structure of the spike protein are of primary interest, since many vaccine candidates and serological tests rely on the conformation of this protein (who a). this and other works also explore the hypothesis that the variants may indeed provide a selective advantage to the virus. clade prevalence in different countries could be used to check for mortality rate differences and association with variants, but as the rates depend on many other factors (different screening strategies, different ways to define an infected individual and so on), we feel it is premature to discuss such correlations. once the numbers are standardized across countries, associations of this kind, if present, will emerge clearly. moreover, to really clarify these issues, experimental data are required, for instance in the form of transposon (tn) insertion mutagenesis, as performed on other viruses in the past (fulton et al. ), followed by competition experiments in in vitro cultures or the design of genomes carrying well-defined changes. this would make it possible to understand how the virus tolerates mutations at different sites and might provide information on the importance of different genomic regions for different stages of the infection. 
the proximal origin of sars-cov- covid- : the role of the nsp and nsp in its pathogenesis severe acute respiratory syndrome coronavirus nonstructural proteins , , and induce double-membrane vesicles evolution and molecular characteristics of sars-cov- genome mutational spectra of sars-cov- orf ab polyprotein and signature mutations in the united states of america analyses of spike protein from first deposited sequences of sars-cov from west bengal, india distinct viral clades of sars-cov- : implications for modeling of viral spread analysis of intraviral protein-protein interactions of the sars coronavirus orfeome networks of coevolving sites in structural and functional domains of serpin proteins blast+: architecture and applications seqinr . - : a contributed package to the r project for statistical computing devoted to biological sequences retrieval and analysis severe acute respiratory syndrome coronavirus nonstructural protein interacts with a host protein complex involved in mitochondrial biogenesis and intracellular signaling coronavirus nsp restricts autophagosome expansion modeltest-ng: a new and scalable tool for the selection of dna and protein evolutionary models pathogenicity of severe acute respiratory coronavirus deletion mutants in hace- transgenic mice evidence for host-dependent rna editing in the transcriptome of sars-cov- transposon mutagenesis of the zika virus genome highlights regions essential for rna replication and restricted for immune evasion prohibitin induces the transcriptional activity of p and is exported from the nucleus upon apoptotic signaling structure of the rna-dependent rna polymerase from covid- virus the nsp replicase proteins of murine hepatitis virus and severe acute respiratory syndrome coronavirus are dispensable for viral replication solution structure of the x protein coded by the sars related coronavirus reveals an immunoglobulin like fold and suggests a binding activity to integrin i domains spike mutation pipeline 
reveals the emergence of a more transmissible form of sars-cov- coronavirus disease (covid- ) in italy genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding cdd/sparcle: the conserved domain database in the coronavirus nucleocapsid is a multifunctional protein prohibitin function within mitochondria: essential roles for cell proliferation and cristae morphogenesis structure and intracellular targeting of the sars-coronavirus orf a accessory protein topology and membrane anchoring of the coronavirus replication complex: not all hydrophobic domains of nsp and nsp are membrane spanning hsv- icp . confers neurovirulence by targeting the beclin autophagy protein emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant identification and characterization of severe acute respiratory syndrome coronavirus replicase proteins ernst mayr: genetics and speciation prohibitin is required for ras-induced raf-mek-erk activation and epithelial cell migration the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak mechanisms of viral mutation genomics of indian sars-cov- : implications in genetic diversity, possible origin and spread of virus raxml version : a tool for phylogenetic analysis and post-analysis of large phylogenies remodeling the endoplasmic reticulum by poliovirus infection and by individual viral proteins: an autophagy-like origin for virus-induced vesicles akt binds prohibitin and relieves its repression of myod and muscle differentiation novel immunoglobulin domain proteins provide insights into evolution and pathogenesis mechanisms of sars-related coronaviruses severe acute respiratory syndrome coronavirus orf a inhibits bone marrow stromal antigen virion tethering through a novel mechanism of glycosylation interference structure, function, and antigenicity of the sars-cov- spike glycoprotein receptor recognition by the novel coronavirus from wuhan: 
an analysis based on decade-long structural studies of sars coronavirus who, coronavirus disease (covid- ) situation report aggresomes and autophagy generate sites for virus replication cryo-em structure of the -ncov spike in the prefusion conformation a new coronavirus associated with human respiratory disease in china genomic, geographic and temporal distributions of sars-cov- mutations severe acute respiratory syndrome coronavirus group-specific open reading frames encode nonessential functions for replication in cell cultures and mice probable pangolin origin of sars-cov- associated with the covid- outbreak a pneumonia outbreak associated with a new coronavirus of probable bat origin all authors wish to thank the www.gisaid.org and all the researchers that contributed their sequences to the database for sharing fundamental data for research. we think collaboration is the only approach to counteract the spread of sars-cov- and other similar endeavors. references for all sequences used in this work are in supplementary key: cord- -izkz hz authors: eden, john-sebastian; rockett, rebecca; carter, ian; rahman, hossinur; de ligt, joep; hadfield, james; storey, matthew; ren, xiaoyun; tulloch, rachel; basile, kerri; wells, jessica; byun, roy; gilroy, nicky; o’sullivan, matthew v; sintchenko, vitali; chen, sharon c; maddocks, susan; sorrell, tania c; holmes, edward c; dwyer, dominic e; kok, jen title: an emergent clade of sars-cov- linked to returned travellers from iran date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: izkz hz the sars-cov- epidemic has rapidly spread outside china with major outbreaks occurring in italy, south korea and iran. phylogenetic analyses of whole genome sequencing data identified a distinct sars-cov- clade linked to travellers returning from iran to australia and new zealand. 
this study highlights potential viral diversity driving the epidemic in iran, and underscores the power of rapid genome sequencing and public data sharing to improve the detection and management of emerging infectious diseases. from a public health perspective, the real-time whole genome sequencing (wgs) of emerging viruses enables the informed development and design of molecular diagnostic methods, and tracing patterns of spread across multiple epidemiological scales (i.e. genomic epidemiology). however, wgs capacities and data sharing policies vary in different countries and jurisdictions, leading to potential sampling bias due to delayed or underrepresented sequencing data from some areas with substantial sars-cov- activity. herein, we show that the genomic analyses of sars-cov- strains from australian returned travellers with covid- disease may provide important insights into viral diversity present in regions currently lacking genomic data. in late december , a cluster of cases of pneumonia of unknown aetiology in wuhan city, hubei province, china was reported by health authorities [ ] . a novel betacoronavirus, designated sars-cov- , was identified as the causative agent [ ] of the disease now known as covid- , with substantial human-to-human transmission [ ] . to contain a growing epidemic, chinese authorities implemented strict quarantine measures in wuhan and surrounding areas in hubei province. significant delays in the global spread of the virus were achieved, but despite these measures, cases were exported to other countries. as of march , these numbered more than countries, on all continents except antarctica; the total number of confirmed infections exceeded , and there were nearly , deaths [ ] . although the vast majority of cases have occurred in china, major outbreaks have also been reported in italy, south korea and iran [ ] . 
importantly, there is widespread local transmission in multiple countries outside china following independent importations of infection from visitors and returned travellers. in new south wales (nsw), australia, wgs for sars-cov- was developed based on an existing amplicon-based illumina sequencing approach [ ] . viral extracts were prepared from respiratory tract samples where sars-cov- was detected by rt-pcr using world health organization recommended primers and probes targeting the e and rdrp genes, and then reverse transcribed using ssiv vilo cdna master mix. the viral cdna was used as input for multiple overlapping pcr reactions (~ . kb each) spanning the viral genome using platinum superfi master mix (primers provided in supplementary table s ). amplicons were pooled equally, purified and quantified. nextera xt libraries were prepared and sequencing was performed with multiplexing on an illumina iseq ( cycle flow cell). in new zealand, the artic network protocol was used for wgs [ ] . in short, bp tiling amplicons designed with primal scheme [ ] were used to amplify viral cdna prepared with superscript iii. a sequence library was then constructed using the oxford nanopore ligation sequencing kit and sequenced on a r . . minion flow-cell. near-complete viral genomes were then assembled de novo in geneious prime . . or through reference mapping with rampart v . . [ ] using the artic network ncov- novel coronavirus bioinformatics protocol [ ] . in total, sars-cov- genomes were sequenced from cases in nsw diagnosed between january and march , as well as a single genome from the first patient in auckland, new zealand sampled on february (table ) . australian and new zealand sequences were aligned to global reference strains sourced from gisaid with mafft [ ] and then compared phylogenetically using a maximum likelihood approach [ ] . the australian strains of sars-cov- were dispersed across the global sars-cov- phylogeny ( figure a ). 
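once genomes have been aligned to reference strains as described above, pairwise nucleotide differences (snps) can be counted directly from the alignment. the helper below is a simplified, hypothetical sketch rather than the pipeline used in this study: it assumes two equal-length strings taken from the same alignment, skips gaps and ambiguous bases (n), and uses short toy sequences.

```python
def count_snps(seq_a, seq_b, ignore="-n"):
    """Count aligned positions where two sequences differ.

    Positions where either sequence has a gap or an ambiguous base
    (characters listed in `ignore`) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    snps = 0
    for a, b in zip(seq_a.lower(), seq_b.lower()):
        if a in ignore or b in ignore:
            continue  # missing or ambiguous data is not evidence of a difference
        if a != b:
            snps += 1
    return snps

# Toy aligned sequences: one true substitution and one ambiguous base.
reference = "attaaaggtttataccttcc"
sample    = "attaaaggtttacaccttnc"
n_snps = count_snps(reference, sample)
```

skipping ambiguous positions is a conservative choice for amplicon data, where low-coverage regions are often masked with n and would otherwise inflate the apparent distance from the reference.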
the first four cases of covid- disease in nsw occurred between and january , and these were closely related (with - snps difference) to the prototype strain mn /sars-cov- /wuhan-hu- , which is the dominant variant (supplementary figures s & s ) . technological advancements and the wide-spread adoption of wgs in pathogen genomics have transformed public health and infectious disease outbreak responses [ ] . previously, disease investigations often relied on the targeted sequencing of a small locus to identify genotypes and infer patterns of spread along with epidemiological data. as seen with the recent west african ebola [ ] and zika virus epidemics [ ] , rapid wgs significantly increases resolution of diagnosis and surveillance thereby strengthening links between clinical and epidemiological data [ ] . this advance improves our understanding of pathogen origins and spread that ultimately lead to stronger and more timely intervention and control measures [ ] . following the first release of the sars-cov- genome [ ] , public health and research laboratories worldwide have rapidly shared sequences on public data repositories such as gisaid [ ] (n = genomes as of march ) that have been used to provide near real-time snapshots of global diversity through public analytic and visualization tools [ ] . while all known cases linked to iran are contained in this clade, it is important to note the presence of two chinese strains sampled during mid-january from hubei and shandong provinces. it is expected that further chinese strains would be identified within this clade, and across the entire diversity of sars-cov- as this is where the outbreak started, including for the outbreak in iran itself. 
however, while we cannot completely discount that the cases in australia and new zealand came from other sources including china, our phylogenetic analyses, as well as epidemiological (recent travel to iran) and clinical data (date of symptom onset), provide evidence that this clade of sars-cov- is linked to the iranian epidemic, from where genomic data is currently lacking. importantly, the seemingly multiple importations of very closely related viruses from iran into australia suggests that this diversity reflects the early stages of sars-cov- transmission within iran. none declared. wuhan municipal health and health commission's briefing on the current pneumonia epidemic situation in our city a new coronavirus associated with human respiratory disease in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding an interactive web-based dashboard to track covid- in real time world health organisation coronavirus situation report - th evolution of human respiratory syncytial virus (rsv) over multiple seasons in new south wales, australia. viruses ncov- sequencing protocol, quick j an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform a simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood whole genome sequencing-implications for infection prevention and outbreak investigations. curr infect dis rep virus genomes reveal factors that spread and sustained the ebola epidemic genomic insights into zika virus emergence and spread. 
cell unifying the epidemiological and evolutionary dynamics of pathogens tracking virus outbreaks in the twenty-first century org -novel coronavirus genome global initiative on sharing all influenza data -from vision to reality nextstrain: real-time tracking of pathogen evolution australia -mar- se asia key: cord- -a r k a authors: zhang, shuyuan; qiao, shuyuan; yu, jinfang; zeng, jianwei; shan, sisi; lan, jun; tian, long; zhang, linqi; wang, xinquan title: bat and pangolin coronavirus spike glycoprotein structures provide insights into sars-cov- evolution date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: a r k a in recognizing the host cellular receptor and mediating fusion of virus and cell membranes, the spike (s) glycoprotein of coronaviruses is the most critical viral protein for cross-species transmission and infection. here we determined the cryo-em structures of the spikes from bat (ratg ) and pangolin (pcov_gx) coronaviruses, which are closely related to sars-cov- . all three receptor-binding domains (rbds) of these two spike trimers are in the “down” conformation, indicating they are more prone to adopt this receptor-binding inactive state. however, we found that the pcov_gx, but not the ratg , spike is comparable to the sars-cov- spike in binding the human ace receptor and supporting pseudovirus cell entry. through structure and sequence comparisons, we identified critical residues in the rbd that underlie the different activities of the ratg and pcov_gx/sars-cov- spikes and propose that n-linked glycans serve as conformational control elements of the rbd. these results collectively indicate that strong rbd-ace binding and efficient rbd conformational sampling are required for the evolution of sars-cov- to gain highly efficient infection. studies revealed that the sars-cov- s trimer, similar to that of sars-cov, needs to have at least one rbd in an "up" conformation to bind hace - . 
therefore, a spike trimer with all three rbds "down" is in a receptor-binding inactive state, and the conformational change of at least one rbd from "down" to "up" switches the spike trimer to a receptor-binding active state overall structures of ratg and pcov_gx spikes the overall structures of homotrimeric ratg and pcov_gx spikes resemble the previously reported pre-fusion structures of coronavirus spikes (fig. a ). both spikes have a mushroom-like shape (~ Å in height and~ Å in width), consisting of a cap mainly formed by β-strands and a stalk mainly formed by α-helices (fig. a) . like other coronaviruses, the ratg and pcov_gx spike monomers are composed of the s and s subunits with a protease cleavage site between them (fig. b, c) . the structural components of the spike include the n-terminal domain (ntd), rbd (also called the c-terminal domain, ctd), subdomain (sd ) and subdomain (sd ) in the s subunit; and the upstream helix (uh), fusion peptide (fp), connecting region (cr), heptad repeat (hr ), central helix (ch), β-hairpin (bh), subdomain (sd ) and heptad repeat (hr ) in the s subunit (fig. d, fig. s ). table s . a novel coronavirus associated with severe acute respiratory syndrome isolation of a novel coronavirus from a man with pneumonia in saudi arabia a new coronavirus associated with human respiratory disease in a novel coronavirus from patients with pneumonia in china a pneumonia outbreak associated with a new coronavirus of probable bat origin origin and evolution of pathogenic coronaviruses identifying sars-cov- -related coronaviruses in malayan pangolins isolation of sars-cov- -related coronavirus from malayan pangolins are pangolins the intermediate host of the novel coronavirus (sars-cov- )? 
associated with the covid- outbreak recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structural basis for the recognition of sars-cov- by full-length human ace structural and functional basis of sars-cov- entry by using human ace structural basis of receptor recognition by sars-cov- cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace cryo-em structure of the -ncov spike in the prefusion conformation sars-cov- and bat ratg spike glycoprotein structures inform on virus evolution and furin-cleavage effects a ph-dependent switch mediates conformational masking of sars-cov- spike. 
biorxiv receptor binding and priming of the spike protein of sars-cov- for membrane fusion sars-cov- and three related coronaviruses utilize multiple ace orthologs and are potently blocked by an improved ace -ig structural insights into coronavirus entry adaptation of sars-cov- in balb/c mice for testing vaccine efficacy a mouse-adapted model of sars-cov- to test covid- countermeasures functional and genetic analysis of viral receptor ace cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains glycans on the sars-cov- spike control the receptor immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen eman : an extensible image processing suite for electron microscopy new tools for automated high-resolution cryo-em structure determination in relion- motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy real-time ctf determination and correction quantifying the local resolution of cryo-em density maps swiss-model: homology modelling of protein structures and complexes ucsf chimera--a visualization system for exploratory research and analysis coot: model-building tools for molecular graphics phenix: a comprehensive python-based system for macromolecular structure solution molprobity: all-atom structure validation for macromolecular crystallography emringer: side chain-directed model and map validation for d cryo-electron microscopy vxx) rbd in wheat,and sars-cov- (pdb id: zge) rbd in marine; remaining regions shown in gray. (b) detailed structures of the rbd-glycans interface are shown zge/ vxx) rbds are colored the same as in a. glycans are shown as red sticks and asn-linked glycans are labeled. sequence alignment of the sars-cov- ratg and pcov_gx rbd-interacting glycosylation sites is shown in the bottom panel. 
some sequences between the three sites are omitted and indicated by black dots. amino acid positions of asparagines are indicated above the sequences according to asparagines (n) are colored red and threonines (t) are colored blue. binding affinities and cell entry of the different spikes. (a) binding curves of immobilized hace with the sars-cov- , pcov_gx or ratg spike. data are shown as different colored lines and the best fit of the data to a : binding model is shown in black. (b) the cell entry efficiencies of pseudotyped viruses as measured by luciferase activity. sars-cov- (c) the representative micrographs and d classification results of negative-staining em. both spikes were incubated with -fold molar ratio of hace . the red box shows the complex of the pcov_gx spike with hace . we thank the tsinghua university branch of china national center for protein fig. the residues and glycans interacting with one rbd of the different spikes. (a) the residues and glycans interacting with one rbd are shown as salmon spheres. the ratg rbd is colored in magenta, pcov_gx rbd in green, sars-cov- key: cord- -wfjdxgxh authors: swann, heather; sharma, abhimanyu; preece, benjamin; peterson, abby; eldridge, crystal; belnap, david m.; vershinin, michael; saffarian, saveez title: minimal system for assembly of sars-cov- virus like particles date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wfjdxgxh sars-cov- virus is the causative agent of covid- . here we demonstrate that non-infectious sars-cov- virus like particles (vlps) can be assembled by co-expressing the viral proteins s, m and e in mammalian cells. the assembled sars-cov- vlps possess s protein spikes on particle exterior, making them ideal for vaccine development. the particles range in shape from spherical to elongated with a characteristic size of ± nm. 
we further show that sars-cov- vlps dried in ambient conditions can retain their structural integrity upon repeated scans with atomic force microscopy up to a peak force of nn. covid- is a pandemic disease caused by infection of sars-cov- virus . with more than million cases confirmed and a death toll exceeding several hundred thousand individuals, a search for antiviral therapies as well as vaccine candidates is of utmost urgency. non-infectious virus like particles (vlps) displaying essential viral proteins can be used to study the structural properties of the sars-cov- virions and due to their maximum immunogenicity are also vaccine candidates , . vlps are released from cells with similar mechanisms as fully infectious virions and resemble the shape and composition of fully infectious virions . most coronaviruses are pathogens of zoonotic nature with different viruses infecting avian (ibv), bovine (bcov), porcine (tgev), feline (fipv) and murine(mhv) species. there is evidence that bat-sars-cov is the origin of the sars-cov virus which first appeared in human hosts in . the genome of sars-cov- has ~ % similarity to genome of coronaviruses previously identified in bat populations in china , . expression of e and m proteins of tgev result in release of vlps with similar sizes to wild type tgev . similar results are reported for ibv . similarly, expression of m, e and s proteins are shown to result in release of morphologically identical particles to wild type sars-cov virus , . it is known that tgev vlps can induce an ifnα response in their hosts . more recently human mers-cov vlps were also shown to elicit an immune response and serve as vaccine candidates . given the similarities between sars-cov- and sars-cov viruses , , we set out to create sars-cov- vlps by expressing s, m and e proteins of sars-cov- in mammalian cells. 
we then tested the structural integrity of the sars-cov- vlps attached to dry glass using atomic force microscopy (afm), since sars-cov- virions have been reported to survive on solid surfaces in dry conditions for many hours . plasmid preparation and vlp harvest. sars-cov- m, s and e protein genes were identified from the full genome sequence of the virus , these genes were then humanized and inserted in cmv driven mammalian expression vectors (see supplement for complete plasmid sequences). hours after transfecting a monolayer of t cells with a cocktail of s, m and e plasmids, vlps were harvested from the supernatant (see supplement for further details). the supernatant was filtered using . um filter and vlps were captured in a - % sucrose step gradient and concentrated using amicon spin filters (ufc , emd millipore, burlington, ma). total protein yield for purified vlp stock varied somewhat but was typically above mg/ml. the purified vlps were stable for at least a week when stored at ~ °c. immunogold electron microscopy. vlps were incubated with : dilution of anti-s antibody (clone a , gtx , genetex, irvine, ca) and : dilution of goat anti-mouse igg nanogold conjugates (bbi solutions, crumlin, uk, distributed by tedpella, redding, ca, cat # ). negative stain electron microscopy was performed on vlps by applying . ul of vlps to a glow-discharged formvar-carbon coated em grid (ted pella, redding, ca) followed by two de-ionized water washes and staining with % uranyl acetate. imaging was performed in a joel jem -plus microscope with an accelerating voltage of kev. western blot analysis. after triple washing the cells with pbs, t cells were suspended in ul of ripa buffer (sc- , santa cruz, dallas, texas). µl of vlps or cell extracts were then denatured by laemmli sample buffer (biorad, hercules, ca) with % bme and boiled at c for min. the proteins were separated by sds-page and then functionalized with anti-s antibody as follows. 
glass was first cleaned and functionalized with biotin-peg-silane as previously described . the surfaces were then incubated with neutravidin (thermo scientific pierce protein biology, waltham, ma, usa) followed by a brief dd-h wash to get rid of excess neutravidin and incubation with biotinylated anti-s antibody ( - -b, abeomics, san diego, ca, usa). surfaces thus prepared were generally devoid of debris but sometimes had step-edge character suggesting that antibody coating was less than a full monolayer ( figure s ). the vlps suspended in assay buffer were then incubated with the functionalized surface and finally washed away via brief buffer exchange with dd-h followed by drying under nitrogen gas flow. all incubations lasted minutes at room temperature. assay buffer: pbs, ph . vlp purifications were first assessed biochemically. figure (see also figure s ) shows western blot analysis of cell extracts as well as of purified vlps. the isolated vlps have both m and s proteins as identified in western blots. s protein appears more highly enriched in cells vs vlps suggesting that not all expressed s protein is released on vlps. the isolated vlps were further analyzed by running a - % sucrose gradient and sequential fractionation. vlps were found to be within the density of - % sucrose ( figure s ). s protein bands show a cleavage product which runs at a lower molecular weight from full s protein as shown in figure and more clearly visible in figure s . in addition both s protein as well as m protein form complexes which survive during western blot analysis and run at considerably higher molecular weight; successive dilutions results in breakup of these higher molecular weight fractions to respective monomeric bands for each protein ( figure s ). we further characterized the purified vlps via electron microscopy. figure shows a representative image of the electron micrographs with the immobilized vlps on the em grids. 
vlps can be identified via specific nanogold immunolabeling. notably, nanogold labels decorate not only the vlps but also proximal areas, suggesting that s protein dissociates from vlps during surface deposition. the sars-cov- vlps we have characterized have a size distribution of ± nm, consistent with the general characterization of prior coronaviruses with size distributions of - nanometers . this is also consistent with prior reports for sars-cov vlps, which were characterized with cryotomography and have a size distribution of - nm (shape reportedly varied from nearly spherical to ellipsoidal with a to aspect ratio ). our measurements are also consistent with the limited electron microscopy data available for the fully infectious sars-cov- virions observed in patient samples . the structural integrity of vlps can inform their practical applications as well as serve as an estimate of the stability of fully infectious sars-cov- virions. we therefore further characterized the vlps on functionalized surfaces with atomic force microscopy (afm) (fig. and figure s & s ). vlps originating from multiple purification batches were imaged independently on multiple occasions. the observed surface density was as high as ~ per µm square field of view for high-yield vlp batches. vlps appeared as approximately spherically symmetric particles whose lateral diameter was in excess of nm while heights of the particles varied between nm and nm. these shapes are consistent with vlp dimensions broadened laterally by imaging forces and tip curvature and reduced somewhat in height due to surface adhesion and possibly imaging forces. although detailed characterization is the subject of future work, we found that repeated imaging of vlps with a peak force of nn led to gradual particle deformations: reduced particle heights and non-circular lateral cross-sections, consistent with vlp bursts ( figure c and figure s ). 
this force is within the range of previously expression of just m, s, and e proteins (figure ) was sufficient for release of vlps which were structurally competent for both harvesting and subsequent investigations (fig and ) . these results open the door for many further studies. the vlps can serve as immunogenic agents in place of the full infectious virus. they can also be used to study virus interactions with proteins of interest (e.g. receptors); to date most such studies were only possible at the single protein level or in the context of the infectious virus. in addition, artificial vlps can now be used to study the mechanical properties of the sars-cov- virus as well as their dependence on environmental conditions. finally, vlps can also be used to develop and validate novel testing strategies for sars-cov- . the expression methodology is robust and should require no modifications to create vlps for most s, m, and e mutations found in native conditions . for expediency, genetic material in this work is published in the supplement and is available from the authors upon request. the authors will deposit the genetic material with addgene. this study is supported by the nsf rapid award to mv and ss. hs performed the biochemical characterization; as performed the afm characterization; bp, ap and ce created reagents; db performed the electron microscopy characterization; mv and ss designed the research and wrote the manuscript. the authors declare no financial conflicts of interest. advances in virus research prepare membrane and activate the pvdf membrane by methanol for mins. put the foam and gel-blotting then add secondary antibody : . so, . 
ul into ml of blocking buffer and keep for mins- hr max and then again wash it with pbs-tween ( times mins each) and then with key: cord- - wfcc y authors: li, tingting; cai, hongmin; yao, hebang; zhou, bingjie; zhang, ning; gong, yuhuan; zhao, yapei; shen, quan; qin, wenming; hutter, cedric a.j.; lai, yanling; kuo, shu-ming; bao, juan; lan, jiaming; seeger, markus a.; wong, gary; bi, yuhai; lavillette, dimitri; li, dianfan title: a potent synthetic nanobody targets rbd and protects mice from sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wfcc y sars-cov- , the causative agent of covid- , recognizes host cells by attaching its receptor-binding domain (rbd) to the host receptor ace – . neutralizing antibodies that block rbd-ace interaction have been a major focus for therapeutic development – . llama-derived single-domain antibodies (nanobodies, ∼ kda) offer advantages including ease of production and possibility for direct delivery to the lungs by nebulization , which are attractive features for bio-drugs against the global respiratory disease. here, we generated synthetic nanobodies (sybodies) by in vitro selection using three libraries. the best sybody, mr bound to rbd with high affinity (kd = . nm) and showed high neutralization activity against sars-cov- pseudoviruses (ic = . μg ml− ). structural, biochemical, and biological characterization of sybodies suggest a common neutralizing mechanism, in which the rbd-ace interaction is competitively inhibited by sybodies. various forms of sybodies with improved potency were generated by structure-based design, biparatopic construction, and divalent engineering. among these, a divalent mr conjugated with the albumin-binding domain for prolonged half-life displayed highest potency (ic = ng ml− ) and protected mice from live sars-cov- challenge. 
our results pave the way to the development of therapeutic nanobodies against covid- and present a strategy for rapid responses for future outbreaks. the binding through hydrophobic interactions and h-bonding that involves both side chains and main chains (fig. d ). in addition, tyr , a framework residue, also participated in binding by forming an h-bond with the rbd gly backbone. mr also binds to the rbd at the 'seat' and 'backrest' regions but approaches the rbd from an almost exactly opposite direction to sr (fig. c, e), indicating divergent binding modes for these sybodies. the binding of mr to the rbd occurred on an . Å surface area with noticeable electrostatic complementarity (extended data fig. b). interestingly, this surface was largely shared with the sr binding surface (fig. f). the interactions between mr and the rbd were mainly mediated by h-bonding. apart from the three cdrs, two framework residues, lys and tyr , interacted with the same rbd residue glu , via a salt bridge with its side chain, and an h-bond with its main chain (fig. g). molecular mechanism for neutralization structure alignment of sr -, mr - and ace -rbd showed that both sybodies engage with rbd at the receptor-binding motif (rbm) (fig. a, b). superposing sr and mr onto the s trimer showed both sybodies could bind to the 'up' conformation of rbd with no steric clashes (fig. c, d), and to the 'down' conformation with only minor clashes (extended data fig. ) owing to their minute sizes. consistent with the structural observations, both sr and mr inhibited the binding of ace to rbd, as revealed by bio-layer interferometry (bli) assays (fig. e, f). to probe the epitope for mr without a structure, competitive bli binding assays were carried out. the results showed that mr could block ace (fig. g), and sr and mr (fig.
h, i), suggesting it also binds to at least part of the rbm, although the possibility of allosteric inhibition remains to be investigated. taken together, sr and mr , and probably mr , neutralize sars-cov- by competitively blocking the for biparatopic fusion, we first identified two sybodies, namely lr and lr (fig. a, b), that could bind rbd in addition to mr using the bli assay. as lr showed higher affinity and neutralization activity than lr (fig. a), we fused this non-competing sybody to the n-terminus of mr with various lengths of gs linkers ranging from to amino acids (extended data table s ). interestingly, the linker length had little effect on neutralization activity and these biparatopic lr -mr sybodies were more potent than either sybody alone (fig. a) with an ic of . g ml - (fig. c). lr -mr may be more tolerant to escape mutants owing to its ability to recognize two distinct epitopes. this decreased the ic by fold for fc-mr ( ng ml - ) and fold for fc-mr ( . g ml - ), respectively (fig. d, e). consistently, the fc fusion increased the apparent binding affinity for both sybodies, with a kd of . nm for fc-mr and less than pm for fc-mr (extended data fig. h, i). note, however, that fc-mr did not gain as much in neutralization potency as in apparent binding affinity. table ). the optimal construct for mr m-mr m had the shortest linker ( -gs) (fig. d, e). by contrast, optimal neutralization activity was observed with the longest linker ( -gs) for mr -mr (fig. d, e). again, mr -mr was superior to mr m-mr m, showing a -fold higher neutralization activity with an ic of ng ml - (fig. e). compared to the monovalent mr (ic of . g ml - ), the divalent the most potent divalent sybody (mr -mr ) was chosen to investigate the potential of nanobodies to protect mice from sars-cov- infection. nanobodies have very short serum half-lives of several minutes due to their minute size .
to circumvent this, we fused mr -mr to the n-terminus of an albumin-binding domain (abd) which has been known to extend the circulating half-life of its fusion partners by increase in size and preventing intracellular degradation . conveniently, we expressed mr -mr -abd in pichia pastoris, which is the preferred host to express nanobody therapeutics owing to its robustness and its endotoxin-free production. small-scale expression of mr -mr -abd showed a secretion level of ~ mg l - with an apparent purity of > % without purification (fig. a) . note, this experiment was carried out using a shaker which gave cell density of od of . given its ability to grow to od of without compromising yield, the expression level of mr -mr -abd may reach . g l - in fermenters. the potential for simple and high-yield production is especially attractive for the pandemic at a global scale. the construct for the rbd with an avi-tag for biotinylation was made by fusing dna, from '-to '-end, of the encoding sequence for the honey bee melittin signal neutralization assay results for sars-cov- pseudovirus. (b) neutralization assay results for sars-cov pseudovirus. veroe -hace cells were infected with a premix of pseudotypes and sybodies at two concentrations ( m and nm) a-i) biotinylated rbd immobilized on a streptavidin-coated sensor was titrated with various concentrations (nm) of sybodies as indicated open-book' view of molecular electrical potential surfaces of the interface between the rbd and sr (a) and between the rbd and mr (b). the electrical potential maps were calculated by adaptive poisson-boltzmann solver (apbs) built-in in pymol structure-based design of a mr mutant (mr m) with improved affinity and potency. (a,b) neutralization assay for sars-cov- (a) or sybody concentrations were used at m (green) and nm (magenta) concentrations. data are from three independent experiments electrostatic repel and hydrophobic mismatch would make lys unfavorable at this position. 
according to the original library design, lys was not varied, meaning that lys was not selected and hence leaves an opportunity for optimization. (e) the k y mutation fits the hydrophobic microenvironment well, as revealed by the crystal structure of mr m (extended data table ). (f) kinetics of mr m binding to rbd. bli signals were recorded under ic values (g ml - ) for sars-cov- are indicated in brackets. data for mr are from data are from three independent experiments extended data fig. . evaluation of in vivo stability and toxicity of nanobodies. (a) for the neutralization assay, sera were preincubated with sars-cov- pseudovirus for h before infection at / dilution. the infection rates on veroe -hace were measured by facs days post infection. (b) body weight changes. the body weight data are presented as means ± the sd of mice in each group (n= ). no significant differences were observed. (c) representative histopathology of the lungs, heart, liver, spleen, kidney, and thymus for the different sybodies injected. the images and areas of interest are magnified ×.
bars indicate m
key: cord- -o d aa authors: yu, xi; zhang, liming; tong, liangqin; zhang, nana; wang, han; yang, yun; shi, mingyu; xiao, xiaoping; zhu, yibin; wang, penghua; ding, qiang; zhang, linqi; qin, chengfeng; cheng, gong title: broad-spectrum virucidal activity of bacterial secreted lipases against flaviviruses, sars-cov- and other enveloped viruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: o d aa viruses are the major aetiological agents of acute and chronic severe human diseases that place a tremendous burden on global public health and economy; however, for most viruses, effective prophylactics and therapeutics are lacking, in particular, broad-spectrum antiviral agents. herein, we identified secreted bacterial lipases from a chromobacterium bacterium, named chromobacterium antiviral effector- (cbae- ) and cbae- , with a broad-spectrum virucidal activity against dengue virus (denv), zika virus (zikv), severe acute respiratory syndrome coronavirus (sars-cov- ), human immunodeficiency virus (hiv) and herpes simplex virus (hsv). the cbaes potently blocked viral infection in the extracellular milieu through their lipase activity. mechanistic studies showed that this lipase activity directly disrupted the viral envelope structure, thus inactivating infectivity. a mutation of cbae- in its lipase motif fully abrogated the virucidal ability. furthermore, cbae- presented low toxicity in vivo and in vitro, highlighting its potential as a broad-spectrum antiviral drug. genomic comparison, csp_bj shares either . % identity to chromobacterium haemolyticum ch -bl or . % identity to chromobacterium rhizoryzae jp - strain. oral supplementation of this bacteria in a.
aegypti largely impaired mosquito permissiveness to denv (supplementary figure a and b) and zikv (supplementary figure c and d) , suggesting a close relationship between the identified csp_bj and csp_p strains. we next aimed to understand how csp_bj resists viral infection in mosquitoes. bacteria usually exploit many effectors, such as cellular components, metabolites or secreted proteins, to regulate their host immune or physiological status for effective colonization. we therefore identified the bacterial effector(s) that modulate infection of denv and zikv through differential fragmentation. in this experiment, we cultured csp_bj for hr at °c. the cell-free culture supernatant was collected by centrifugation and filtration through a . µm filter unit, whereas the cell lysates were generated by sonication. either the bacterial cell lysate or the culture supernatant was mixed with plaque-forming units (pfu) of denv or zikv and incubated for hr, and then the infectious viral particles were determined by a plaque forming assay ( figure a ). incubation of the culture supernatant but not the bacterial lysates resulted in significant suppression of denv ( figure b ) and zikv ( figure c ) infectivity in vero cells, indicating that an extracellular effector(s) secreted by csp_bj was responsible for viral inhibition. next, we investigated whether the effector(s) was a secreted protein, small peptide, lipid, polysaccharide or other metabolite. therefore, the culture supernatant was separated using a kda-cutoff filter (wu et al., ) . either the upper retentate (proteins and large peptides) or the lower liquid filtrate (small molecule compounds and short peptides) was mixed with the viruses for incubation in vero cells ( figure d ). intriguingly, inoculation of the retentate rather than the filtrate inhibited the infectivity of denv and zikv ( figure e and f), suggesting that the effector(s) might be a protein(s) secreted by csp_bj. 
subsequently, the protein components in the upper retentate were separated by sds-page and then identified by mass spectrometry (figure a ). the highly abundant proteins with secretable properties were selected, expressed and purified in an escherichia coli expression system ( figure b) . of all the proteins tested, the bacterial protein encoded by gene (accession: mt ) significantly impaired denv and zikv infection in vero cells ( figure c and d). we named this protein chromobacterium antiviral effector- (cbae- ). intriguingly, a cbae- homologue with . % amino acid identity named cbae- (accession: mt ) was further identified from csp_bj based on sequence comparison (supplementary figure a and b) . both effectors encoded a lipase domain with a typical gdsl motif (casas-godoy et al., ) . to further confirm the virucidal activity, both proteins were expressed and purified in e. coli ( figure e) . a serial concentration of recombinant proteins was mixed with pfu of denv or zikv for plaque assay in vero cells. the half-inhibitory concentration (ic ) of cbae- was . µg/ml for denv and . µg/ml for zikv ( figure f and g). however, the ic of cbae- for both flaviviruses was - times higher than that of cbae- , indicating a much more robust virucidal activity of cbae- against flaviviruses ( figure f and g). thus, we identified bacterial effectors with a high virucidal activity from a csp_bj bacterium. we next investigated the mechanisms by which these bacterial effectors resist viral infection. according to sequence analysis, cbaes contain a conserved lipase domain. lipases are a group of enzymes that catalyse the hydrolysis of the ester bond of glycerides into fatty acids and glycerol (casas-godoy et al., ) . we therefore assessed whether cbaes have lipase activity. in a plate degradation assay, both cbaes directly digested egg yolk lipoproteins and formed lytic halos whose diameters correlated with lipase activity ( figure a ). 
the sequence gdsl is the core motif for lipase activity (casas-godoy et al., ). consistently, a s g mutation in this motif of cbae- fully disrupted its lipase activity (figure a ), validating cbaes as secreted lipases of csp_bj. given their lipase activity, we hypothesized that cbaes might use their enzymatic activity to degrade the viral lipid envelope, which may result in exposure of the viral rna (muller et al., ). to address this hypothesis, serial concentrations of cbaes were incubated with × pfu of denv or zikv for hr at °c. the mixture was then treated with rnase a to evaluate the degradation of exposed viral genomic rna. compared to mock treatments in which the viruses were incubated with bsa, a significant reduction in viral rna was recorded by rt-qpcr when the viruses were treated with cbaes (figure b and c), indicating that the lipase activity of cbaes directly disrupted the virion structure, thus resulting in viral genome release. consistent with these results, the s g mutant of cbae- , which had no lipase activity, failed to suppress either denv or zikv infection (figure d and e), further indicating that the virucidal activity of cbaes is lipase-dependent. to validate that cbaes disrupt viral lipidic membranes, we incubated cbae- with purified zikv virions and processed the samples for transmission electron microscopy (figure f ). the zikv particle typically has a diameter of - nm (hasan et al., ) and, consistent with that, we observed intact viral particles in our control sample treated with µg/ml bsa (figure f ). however, upon treatment with µg/ml cbae- in the same buffer as the bsa solution, the integrity of zikv particles was fully disrupted (figure f ). this is in agreement with our previous finding that treatment with cbaes resulted in viral genome release. since cbaes blocked viral infection in the extracellular milieu, we further assessed whether treatment with cbaes prior to viral inoculation could effectively block viral infection.
we pre-treated vero cells with µg/ml of purified cbaes. subsequently, . moi of denv or zikv was used to challenge the cbae-treated cells. pre-incubation with cbae- fully blocked the infectivity of both flaviviruses, whereas treatment with cbae- produced %- % inhibition (figure f and g). since the cbaes showed a direct catalytic action on the viral lipid bilayer, we assessed their virucidal activity against other enveloped viruses. sars-cov- is a newly emerging coronavirus that causes the severe acute respiratory disease covid- . to date, no specific therapeutics are available against sars-cov- infection. we therefore assessed the virucidal effect of the cbaes on sars-cov- . in contrast to the large difference in virucidal activity against flavivirus infection, both cbaes presented a similar antiviral effect against a sars-cov- pseudovirus in hek- t-ace cells (figure a ). consistently, infection with low-passage sars-cov- in vero cells was also significantly suppressed by treatment with the cbaes (figure b ). additionally, both cbaes showed broad-spectrum antiviral effects on hiv- pseudoviruses (figure c, d and e), as well as hsv- (figure f ). notably, compared to an hiv-neutralizing antibody (n ) with a near-pan neutralization breadth (huang et al., ), cbaes were very potent with a much lower ic for hiv pseudovirus strains (figure c, d and e). nonetheless, it is puzzling that the cbaes did not show any effect against infection with influenza a virus (iav) (figure g ). the lipidic envelope of a viral particle is generally derived from the plasma membrane or the endoplasmic reticulum (er) membrane of a host cell. thus, cbaes can also act on host cell membranes and their cytotoxicity to host cells may be a major concern for developing a cbae-based broad-spectrum antiviral drug. generally, cbae- showed much higher toxicity figure d ).
altogether, these data suggest that cbae- is much safer than cbae- to hosts, and thus could be a broad antiviral drug candidate. in this study, we identified two virucidal effectors with lipase activity, cbae- and cbae- , from the chromobacterium sp. csp_bj. both cbaes showed a potent virucidal activity against a variety of enveloped viruses including denv, zikv, sars-cov- , hiv- , and hsv- . notably, neither cbae exhibited any effect against iav. the toxicity assessment showed that cbae- was much safer than cbae- to both human cells and mice. indeed, accumulating evidence indicates that certain lipases present a potent antiviral activity. either lipoprotein lipase or hepatic triglyceride lipase impaired hepatitis c virus (hcv) infection in human huh . cells through degrading virus-associated lipoproteins (shimizu et al., ). a secreted phospholipase a (pla ) isolated from naja mossambica snake venom showed a potent virucidal activity against hcv, denv, and japanese encephalitis virus (jev), whereas the protein did not exhibit significant antiviral activity against sindbis virus (sinv), iav, middle east respiratory syndrome coronavirus (mers-cov) or hsv- (chen et al., ). a secreted human pla has been shown to neutralize hiv- by degrading the viral membrane (kim et al., ) or by blocking viral entry into host cells rather than through a lipase-mediated virucidal effect (fenard et al., ), suggesting diverse virucidal mechanisms by pla . cbaes may inactivate viruses through their lipase activity, which also damages cellular membranes; nonetheless, their specificity and affinity for the viral envelope could be significantly improved, thus further reducing their cytotoxicity and increasing virucidal efficacy. for example, the sars-cov- surface protein spike binds to human ace with high affinity (wang et al., ), and thus soluble ace could be engineered with cbaes to greatly enhance their affinity and specificity for the viral envelope.
given that human recombinant soluble ace can also inhibit sars-cov- infection in organoids (monteil et al., ), it is possible that a soluble ace -cbae construct may have improved efficacy in blocking sars-cov- infections. viruses are the major aetiological agents of acute severe human diseases that impose a tremendous burden on global public health and economy (ghosn et al., ; girard et al., ; wilder-smith et al., ). given that the development of virus-specific vaccines and antiviral drugs is usually lengthy, broad-spectrum antiviral drugs could be crucial for preventing the wide spread of new viral diseases in a timely manner (schein, ). cbae- , with a broad antiviral effect and low toxicity to hosts, could be a potential choice. our study provides a future avenue for the development of broad-spectrum antiviral drugs that might reduce the clinical burden caused by emerging viral diseases. human blood was collected from healthy donors who provided written informed consent. the collection of human blood samples and their use for mosquito feeding were approved by the local ethics committee at tsinghua university. eight-week-old female icr mice purchased from vital river laboratories in china were used for the toxicity assay. the mice were maintained in a specific pathogen-free barrier facility at tsinghua university. the animal protocol used in this study was approved by the institutional animal care and use committee of tsinghua university and performed in accordance with its guidelines. aedes aegypti (the rockefeller strain) was maintained on a sugar solution in a low-temperature, illuminated incubator (model , thermo electron corporation) at °c and % humidity, according to standard rearing procedures (cheng et al., ).
vero cells, hek- t cells and a cells were maintained in dulbecco's modified eagle's medium ( - , gibco) supplemented with % heat-inactivated foetal bovine serum ( - , gibco) and % antibiotic-antimycotic ( - , invitrogen) in a humidified % (v/v) co incubator at °c. the vero, hek- t and a cell lines were purchased from the atcc (ccl- , crl- and ccl- , respectively). denv- (new guinea c strain), zikv (prvabc strain), hsv- , iav (h n pr strain) and sars-cov- were grown in vero cells with vp-sfm medium ( - , gibco). denv, zikv, hsv and sars-cov- were titrated by a standard plaque formation assay on vero cells (bai et al., ) . iav were titrated using a standard % tissue culture infection dose (tcid ) assay (teferedegne et al., ; varada et al., ) . all experiments involving infectious sars-cov- were performed in a biosafety level (bsl ) containment laboratory. chromobacterium sp. beijing was grown in lb broth at °c for hr at rpm. culture supernatants were obtained by centrifugation and filtering the supernatant through a . µm filter unit (slgp rs, millipore). genscript) before the protein concentration was measured using a bradford assay ( - , bio-rad), and the protein purity was checked with sds-page. vero cells were seeded at ~ × cells per well in -well plates and then incubated at °c overnight before reaching - % confluence. denv, zikv, hsv- and sars-cov- virus stocks were diluted to plaque-forming units (pfu) per ml and incubated untreated or with a serial dilution of the cbaes in five-fold steps at °c for hr before being added onto vero cell monolayers for hr of infection. cell monolayers were washed once with pbs and covered with % agarose overlay dmem with % fbs. after - dpi (denv, zikv and hsv- ) or - dpi (sars-cov- ), vero cell monolayers were fixed and stained with . % crystal violet, and the number of pfu per ml was determined. 
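as a minimal sketch of the arithmetic behind the plaque assay described above, the snippet below converts a plaque count into a titre in pfu per ml and expresses a treated sample as percent inhibition relative to the untreated control. the function names, the plaque counts, the dilution and the inoculum volume are illustrative assumptions; none of them are values from this study.

```python
def titre_pfu_per_ml(plaque_count, dilution, inoculum_ml):
    """Return the titre (PFU/ml) of the undiluted sample.

    plaque_count: plaques counted in one well
    dilution:     dilution of the sample plated in that well, e.g. 1e-4
    inoculum_ml:  volume of diluted sample added to the well, in ml
    """
    if plaque_count < 0 or dilution <= 0 or inoculum_ml <= 0:
        raise ValueError("count must be >= 0; dilution and volume must be > 0")
    return plaque_count / (dilution * inoculum_ml)


def percent_inhibition(pfu_treated, pfu_untreated):
    """Percent reduction in titre relative to the untreated control."""
    if pfu_untreated <= 0:
        raise ValueError("untreated control titre must be > 0")
    return 100.0 * (1.0 - pfu_treated / pfu_untreated)


# Hypothetical counts, chosen only to exercise the functions.
untreated = titre_pfu_per_ml(40, 1e-4, 0.2)  # 40 plaques at a 10^-4 dilution
treated = titre_pfu_per_ml(10, 1e-4, 0.2)    # 10 plaques at the same dilution
print(untreated, treated, percent_inhibition(treated, untreated))
```

percent inhibition values computed this way at each protein concentration are the kind of input that a dose-response fit takes.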
the concentration of each protein necessary to inhibit virus infection by % (ic ) was calculated by comparison with the untreated cells using the dose-response-inhibition model in graphpad prism . (graphpad software, usa). for pre-infection treatment, vero cells were seeded in -well plates and allowed to form monolayers. ten micrograms/ml cbae- or cbae- was added to vero cell monolayers at °c for hr, and then denv or zikv ( . moi) was added and incubated for another hour at °c. after infection, cell monolayers were washed once with pbs buffer, fresh vp-sfm medium was added, and the cells were incubated at °c for hr before the supernatant was collected for rt-qpcr quantitation of the viral genome. total rna was isolated either from homogenized mosquitoes or infected cell supernatant using supplementary table . the lipase activity of cbae- , cbae- and cbae- -s g was measured with a plate assay as previously described (liu et al., ) . briefly, µg of cbae- , cbae- or cbae- -s g was spot inoculated onto a % agar plate with % egg yolk and incubated for hr at °c. phospholipase activity was indicated by the diameter of the lytic halo around each well. viral rna exposure assay. cbaes or pbs in a total volume of ml for hr at °c. then, the mixtures were treated with µl of rnase a (ge - , transgen) and incubated for hr at °c. viral rna was extracted, and rna degradation was evaluated by rt-qpcr as mentioned above. zikv viral particles were purified as described previously (tan and lok, ) . briefly, virus stocks were pelleted in % w/v peg at , ×g for hr, then purified by % w/v sucrose cushion for hr at , ×g (beckman sw ti rotor) and separated in potassium tartrate-glycerol gradient - % for hrs at , ×g (beckman sw ti rotor). purified viral particles were suspended in µg/ml bsa solution or µg/ml cbae- solution and incubated for hr at room temperature. the samples were then applied to a carbon grid, washed times with water and negatively stained with % w/v uranyl acetate. 
the images were acquired in a hitachi h- b tem microscope at . kv. the in vitro antiviral efficacy of the cbaes on iav was tested in a cells. briefly, a cells were seeded into a -well plate and incubated at °c for - h. iav ( tcid ) were incubated untreated or with a serial dilution of the cbaes in five-fold steps at °c for hr before being added onto a cell monolayers to allow infection to proceed for hr. then, the virus-protein mixture was removed, and the cells were further cultured with fresh vp-sfm medium. at hr p.i., relative viral rna copy numbers in the infected cells were quantified by rt-qpcr assays with specific primers. the neutralization titre was defined as the concentration of each protein necessary to inhibit the pcr signal by % (i.e., below the threshold of % of the mean value observed in virus control wells). hiv- pseudoviruses were generated by co-transfecting hek- t cells with env expression vectors and the pnl - r-eluciferase viral backbone plasmid, and a neutralization assay was performed as described previously (zhou et al., ) . briefly, pseudovirus titres were measured by luciferase activity in relative light units (rlus) (bright-glo luciferase assay system, promega biosciences, california, usa). neutralization assays were performed by adding tcid (median tissue culture infectious dose) of pseudovirus into serial : dilutions of cbae- or cbae- starting from µg/ml, following incubation at °c for hr and addition of ghostx /r cells. neutralizing activity was measured by the reduction in luciferase activity compared to that in the controls. the fifty percent inhibitory concentration (ic ) was calculated using the dose-response-inhibition model with the -parameter hill slope equation in graphpad prism . (graphpad software, usa). 
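the dose-response fits above were done in graphpad prism. purely as an illustration of what such a fit estimates, the sketch below fits a simplified two-parameter hill curve (top fixed at 100 %, bottom fixed at 0 %) by a least-squares grid search and returns the ic50 and the slope. the simplification, the grid search, the function names and the synthetic five-fold dilution series are all assumptions made for this example; this is neither the prism model nor the data used by the authors.

```python
def hill_response(conc, ic50, slope):
    """Percent of the untreated signal remaining at concentration `conc`.

    Simplified Hill model: top fixed at 100, bottom fixed at 0 (assumption).
    """
    if conc <= 0:
        return 100.0
    return 100.0 / (1.0 + (conc / ic50) ** slope)


def fit_ic50(concs, responses):
    """Least-squares estimate of (ic50, slope) by exhaustive grid search."""
    best_sse, best_ic50, best_slope = float("inf"), None, None
    for i in range(-300, 301):        # log10(ic50) from -3 to 3, step 0.01
        ic50 = 10.0 ** (i / 100.0)
        for j in range(4, 81):        # slope from 0.20 to 4.00, step 0.05
            slope = j * 0.05
            sse = sum((hill_response(c, ic50, slope) - r) ** 2
                      for c, r in zip(concs, responses))
            if sse < best_sse:
                best_sse, best_ic50, best_slope = sse, ic50, slope
    return best_ic50, best_slope


# Synthetic, noise-free five-fold dilution series (hypothetical units).
concs = [50.0 / 5 ** k for k in range(8)]
responses = [hill_response(c, 2.0, 1.0) for c in concs]
ic50, slope = fit_ic50(concs, responses)
print(f"ic50 ~ {ic50:.2f}, slope ~ {slope:.2f}")
```

a grid search is used here only because it needs no third-party libraries; a real analysis would normally use nonlinear regression and would also fit the top and bottom plateaus.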
vesicular stomatitis virus g protein (vsv-g) pseudotyped lentiviruses expressing human ace were produced by transient co-transfection of pmd g (addgene # ) and pspax (addgene # ) plasmids and the transfer vector plvx-ace flag-ires-puro with vigofect dna transfection reagent (vigorous) into hek- t cells to generate the hek- t-ace cells for sars-cov- pseudovirus infection. sars-cov- pseudoviruses were purchased from genscript, and neutralization activity was measured using the hek- t-ace cell line with the same procedures as mentioned above. fresh human blood from healthy donors was placed in heparin-coated tubes ( , bd vacutainer) and centrifuged at , ×g and °c for min to separate plasma from blood cells. the plasma was heat-inactivated at °c for min. the separated blood cells were washed three times with pbs to remove the anticoagulant. the blood cells were then resuspended in heat-inactivated plasma. bacterial suspension ( . od) was mixed with viruses and treated blood for mosquito oral feeding via a hemotek system ( w , hemotek). fully engorged female mosquitoes were transferred into new containers and maintained under standard conditions for an additional days. the mosquitoes were subsequently euthanized for further analysis. the mosquitoes used in this experiment were previously antibiotic treated. briefly, mosquitoes were provided with cotton balls moistened with a % sucrose solution including units of penicillin and mg of streptomycin per ml ( - , thermo fisher scientific) for days to remove gut bacteria. the mosquitoes were starved for hr to allow the antibiotics to be metabolized prior to in vitro membrane blood feeding. removal of gut bacteria was confirmed by a colony-forming unit assay. the cytotoxicity of the cbaes was evaluated in vero cells and a cells. cell viability was measured by the mtt [ -( , -dimethylthiazol- -yl)- , -diphenyl tetrazolium bromide] (m , solarbio) method. 
confluent cell monolayers contained in -well plates were exposed to different concentrations of the cbaes for hr at °c. then, a final concentration of . mg/ml mtt was added to each well. after hr of incubation at °c, the supernatant was removed, and µl of dimethyl sulfoxide (dmso) was added to each well to solubilize the formazan crystals. after shaking for min, absorbance was measured at nm. the concentration of each protein necessary to reduce cell viability by % (cc ) was calculated by comparison with the untreated cells using a sigmoidal nonlinear regression function to fit the dose-response curve in graphpad prism . (graphpad software, usa). icr mice were used to test the in vivo safety of the cbaes. a number of icr -week old female mice were divided into groups (n= ) at random. cbae- or cbae- was dissolved in pbs and administered either intravenously once at doses of , , , and µg/kg or intranasally once at doses of , , and µg/kg. pbs and corresponding concentrations of bsa served as negative controls. the general behaviour, signs of toxicity, body weights and mortality of the mice were recorded after the administration of the cbaes. the half-lethal dose (ld ) was calculated using a sigmoidal nonlinear regression function to fit the dose-response curve in graphpad prism . (graphpad software, usa). animals were randomly allocated into different groups. mosquitoes that died before sample collection were excluded from the analysis. the investigators were not blinded to the allocation during the experiments or to the outcome assessment. no statistical methods were used to predetermine the sample size. descriptive statistics are provided in the figure legends. all analyses were performed using graphpad prism statistical software. (d, e) the s g mutant of cbae- fully lost its ability to suppress denv (d) and zikv (e) infection: inhibition curves of cbae- and cbae- -s g against denv (d) or zikv(e). 
serial concentrations of cbae- or cbae- -s g were mixed with pfu of denv or zikv in vp-sfm medium to perform standard plaque reduction neutralization tests (prnts). (f) representative negative stained transmission electron microscopy images of zikv particles treated with µg/ml bsa (arrow head) and those treated with µg/ml cbae- (empty arrow head); high magnification: , ×, low magnification: , ×. (g, h) rate of denv (g) or zikv (h) replication inhibition following exposure to cbaes before viral infection of vero cell monolayers. the viral genome was quantified by rt-qpcr. (b, c, g, h) significance was determined using unpaired t-tests. data are presented as the mean ± sem. aegypti. a mixture containing human blood ( % v/v), csp_bj bacterial suspension ( % v/v), and supernatant from denv-or zikv-infected vero cells ( % v/v) was used to feed antibiotic-treated a. aegypti rockefeller strain via an in vitro blood feeding system. mosquito infectivity was determined by rt-qpcr at days post blood meal. the final denv or zikv titre was × pfu/ml for oral infection. (a, c) the number of infected mosquitoes relative to total mosquitoes is shown at the top of each column. a nonparametric mann-whitney test was used for the statistical analysis. (b, d) differences in the infectivity ratio were compared using fisher's exact test. (a) conserved domains of cbae- and cbae- protein sequences was analysed using a simple modular architecture research tool (smart) (letunic and bork, ; letunic et al., ) . (b) sequence comparison of cbae- and cbae- was performed using basic local alignment search tool (blast) on ncbi website with the program "needleman-wunsch alignment of two sequences" (altschul et al., ) . gapped blast and psi-blast: a new generation of protein database search programs antiviral peptides targeting the west nile virus envelope protein broad-spectrum agents for flaviviral infections: dengue, zika and beyond a non-live preparation of chromobacterium sp. 
panama (csp_p) is a highly effective larval mosquito biopesticide lipases: an overview broad-spectrum antiviral agents: secreted phospholipase a targets viral envelope lipid bilayers derived from the endoplasmic reticulum membrane a c-type lectin collaborates with a cd phosphatase homolog to facilitate west nile virus infection of mosquitoes the pandemic in mexico: experience and lessons regarding national preparedness policies for seasonal and epidemic influenza secreted phospholipases a( ), a new class of hiv inhibitors that block virus entry into host cells the a (h n ) influenza virus pandemic: a review structural biology of zika virus and other flaviviruses west nile virus spreads in europe identification of a cd -binding-site antibody to hiv that evolved near-pan neutralization breadth lysis of human immunodeficiency virus type by a specific secreted human phospholipase a years of the smart protein domain annotation resource smart: recent updates, new developments and status in pathogenicity of different isolates of vibrio harveyi in tiger prawn, penaeus monodon infections in engineered human tissues using clinical-grade soluble human ace phospholipase a isolated from the venom of crotalus durissus terrificus inactivates dengue virus and other enveloped viruses by disrupting the viral envelope respiratory syncytial virus seasonality: a global overview inter-annual variation in seasonal dengue epidemics driven by multiple interacting factors in guangzhou, china human influenza virus infections chromobacterium csp_p reduces malaria and dengue infection in vector mosquitoes and has entomopathogenic and in vitro anti-pathogen activities the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak chromobacterium spp. 
mediate their anti-plasmodium activity through secretion of the histone deacetylase inhibitor romidepsin repurposing approved drugs on the pathway to novel therapies west nile virus infection lipoprotein lipase and hepatic triglyceride lipase reduce the infectivity of hepatitis c virus (hcv) through their catalytic activities on hcvassociated lipoproteins zika virus: history, epidemiology, transmission, and clinical presentation respiratory syncytial virus hospitalization and mortality: systematic review and meta-analysis dengue virus purification and sample preparation for cryoelectron microscopy development of a neutralization assay for influenza virus using an endpoint assessment based on quantitative reverse-transcription pcr broad-spectrum coronavirus antiviral drug discovery a neutralization assay for respiratory syncytial virus using a quantitative pcr-based endpoint assessment structural and functional basis of sars-cov- entry by using human ace coronavirus disease (covid- ) situation report - epidemic arboviral diseases: priorities for research and public health a gut commensal bacterium promotes mosquito permissiveness to arboviruses sars-cov- : an emerging coronavirus that causes a global threat broadly resistant hiv- against cd -binding site neutralizing antibodies broad-spectrum antiviral agents key: cord- - te nu authors: croll, tristan; diederichs, kay; fischer, florens; fyfe, cameron; gao, yunyun; horrell, sam; joseph, agnel praveen; kandler, luise; kippes, oliver; kirsten, ferdinand; müller, konstantin; nolte, kristoper; payne, alex; reeves, matthew g.; richardson, jane; santoni, gianluca; stäb, sabrina; tronrud, dale; williams, christopher; thorn, andrea title: making the invisible enemy visible date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: te nu during the covid- pandemic, structural biologists have rushed to solve the structures of the proteins encoded by the sars-cov- genome in order to understand the viral life cycle and enable structure-based drug design. in addition to the structures from sars-cov previously solved, structures covering of the viral proteins have been released in the span of only months. these structural models serve as basis for research worldwide to understand how the virus hijacks human cells, for structure-based drug design and to aid in the development of vaccines. however, errors often occur in even the most careful structure determination - and are even more common among these structures, which were solved under immense pressure. from the beginning of the pandemic, the coronavirus structural taskforce has categorized, evaluated and reviewed all of these experimental protein structures in order to help downstream users and original authors. our website also offers improved models for many key structures, which have been used by folding@home, openpandemics, the eu jedi covid- challenge, and others. here, we describe our work for the first time, give an overview of common problems, and describe a few of these structures that have since acquired better versions in the worldwide protein data bank, either from new data or as depositor re-versions using our suggested changes. introduction sars-cov- , the coronavirus responsible for covid- , has a single-stranded rna genome that encodes proteins. these macromolecules fulfil essential roles in the viral life cycle, enabling sars-cov- to infect, replicate, and suppress the immune system of its host. for example, the characteristic spikes that protrude from its envelope and allow it to bind to host cells, are a trimer of the surface glycoprotein ( fig. ) . 
knowing the atomic structures of these macromolecules is vital for understanding the lifecycle of the virus and helping design specific pharmaceutical compounds that bind and inhibit their functions, with the goal of stopping the cycle of infection. since the covid- pandemic hit in the beginning of this year, the structural biology community has swung into action very efficiently and is now strongly engaged to establish the atomic structures of these macromolecules as fast as possible by use of nuclear magnetic resonance (nmr), cryo-electron microscopy (cryo-em) and crystallographic methods ( ) . all of these methods require an interpretation of the measured and processed data with a structural model and cannot be fully automated. the resulting structures are made freely and publicly available in the world wide protein data bank (wwpdb), structural biology's archive of record ( ) . unfortunately, the fit between model and data is never perfect and errors from measurement, post-processing and modelling are a given. structures solved in a hurry to address a pressing medical and societal need are even more prone to mistakes. however, as these structures are used for biological interpretation, small errors can have severe consequences -in particular, in structure-based drug discovery, structural bioinformatics, and computational chemistry; a current focus of sars-cov- research across the world. as of the writing of this publication, macromolecular structures from sars-cov and sars-cov- have been deposited, covering parts of of the proteins. in this time of crisis, it is therefore vital to ensure that the structural data made available to the wider research community are the best they can be in every regard. 
pushing the methods to the limit the wwpdb ( ) is an invaluable tool, but once released, a structure in the databank can only be reversioned by the original depositors, who, after any associated papers are published, may have little or no motivation to correct their structures. third parties may only deposit new models based on others' deposited data under new accession ids when accompanied by peer reviewed publications. in such cases, there is no explicit link from the original entry to the new one. importantly, % of structure downloads from the pdb are not by experimentalists, but scientists from other fields ( ) . as a consequence, errors can lead to a large waste of resources and time by those making use of the data obtained from the pdb and may even be misinterpreted as biologically and pharmaceutically relevant information. we, the authors of this manuscript, develop computational methods for the solution of macromolecular structures. by being expert users of our own software tools, we were well placed to help in this unprecedented situation. this is why, since the first signs of a possible pandemic arose at the beginning of this year, we joined forces to assess and, where necessary, improve upon the published macromolecular structures from sars-cov and sars-cov- . in cases where we believe we have significantly improved the macromolecular models, we offered them back to the original authors and the scientific community. raw data are not deposited in the wwpdb and are not mandatory for publication; as a consequence, they are difficult to obtain. their absence in the public record is detrimental for reanalysis and validation, and the development of new methods. in an effort to make the most out of the experimental results, we invited authors to send us their raw experimental data, which are key to validation of the entire structure determination process from start-to-finish. 
we thus offer to help authors deposit these data in a non-pdb public repository if the authors wished to do so. after we began our validation efforts of macromolecular structures from sars-cov and sars-cov- , we were approached by our colleagues in in-silico drug screening, folding@home ( , ) , openpandemics ( ) , and the eu joint european disruptive initiative (jedi) ( ). these initiatives needed the best structures they could get for studies of the virus and had already lost much computing time and resources to suboptimal structure solutions. all macromolecular structures from sars-cov and sars-cov- in the pdb are downloaded into our repository and assessed automatically in the first hours after release. for crystallographic and cryo-em structures, we check the quality of the deposited merged data, and how well the model fits these data. an automatic evaluation of nmr data is forthcoming. then, all structures are checked according to chemical prior knowledge. evaluation specific to crystallographic data and structure solutions as crystal structures make up . % of our data, these are evaluated most thoroughly. crystal diffraction can, for example, stem from more than one crystal lattice (twinning), be contaminated by ice crystal diffraction (ice rings) or be incomplete due to radiation damage or suboptimal measurement strategy. these issues cannot be resolved after data collection, but treating data accordingly can yield a better structural model. deducing such problems from the deposited structure factors (mandatory in wwpdb) can be difficult; raw data allow a much more complete analysis of the experiment. another source of errors is data processing (integration and scaling), which nowadays is often done automatically. assuming the wrong crystal lattice symmetry or including, for example, diffraction spots obscured by the beam stop, can lead to lower quality or even unsolvable structures. 
if raw data are available, data can be re-processed and these problems can be resolved manually. to analyse crystallographic data for twinning, completeness and overall diffraction quality, we used phenix.xtriage ( ); furthermore we ran auspex ( ) which automatically identifies ice rings and produces plots from which several other pathologies, like a "bad" beam stop mask, can be recognized quickly. the completeness of most datasets is satisfying, with only out of datasets below %. all the datasets have an acceptable strength with intensity/sigma(intensity) above . ice rings were detected in datasets and problems with the beam stop masking in ; crystal structures were indicated as potentially resulting from twinned crystals. a general indication of how well the atomic model fits the measurement data can be obtained by comparing the deposited r-factors to results from pdb-redo ( ) (including whatcheck ( )) to determine the overall density fit as well as many other diagnostics. while the deposited structures are often improved by pdb-redo, they need to be checked and should not be viewed as "more correct" purely on basis of a lower r value. in addition to this, a high r value does not indicate a single type of error and hence should be used with caution. only two structures in the repository present an alarmingly high rfree value above %, although problems can be found in other structures by looking in more detail. pdb-redo improves the rfree for most of the structures, and the only cases where we found a huge degradation pointed to major issues with the pdb entry; this was especially true for older sars-cov structures. evaluation specific to structures from single-particle cryo-em cryo-em structures make up . % of our data. 
as with crystallographic structures, raw data are not available from the wwpdb, but the three-dimensional map reconstructed from the microscopic single particle images is deposited, allowing the calculation of the fit between model and map in the form of a fourier shell correlation (fsc). the model-map fsc is plotted as a curve, which estimates agreement between features resolvable at different resolutions. for a well-fitted model, a model-map fsc of . roughly corresponds to the cryo-em map resolution (which is determined as where the fsc between two half-maps drops below . ). to calculate fscs, we use the ccp-em ( ) model validation task which utilizes refmac ( ) and calculates real-space cross-correlation coefficient (ccc), mutual information (mi) and segment manders' overlap coefficient (smoc) ( ) . while mi is a single value score to evaluate how well model and map agree, the smoc score evaluates the fit of each modelled residue individually and can help to find regions where errors occur in the model in relation to the map. z-scores highlight residues with a low score relative to their neighbours and point to potential misfits. out of structures, structures had an average model-fsc below . and seven have a mi score below . , indicating a bad overall agreement between map and model and potential for further improvement. the smoc score indicates for twelve structures that more than % of the residues fit poorly with the map, while the other % of structures had a relatively good density fit. however, most modelling errors could only be corrected manually (see below). in addition to this validation, we run haruspex ( ), a neural network to annotate reconstruction maps to evaluate which secondary structures can be recognized automatically in the map. molecular geometry is constrained by the nature of its chemical bonds and steric hindrance between the atoms. 
in order to evaluate the model quality with respect to chemical prior knowledge we run molprobity ( , ) , which checks covalent geometry, conformational parameters of protein and rna and steric clashes. however, it is unfortunately possible to use some of these traditional indicators of model quality as additional restraints during refinement, which invalidates them to a certain degree; we therefore also used the molprobity cablam score ( ) , which can pinpoint local errors at - Å resolution even if traditional criteria have been used as restraints. cablam scores higher than % outliers indicated that of the structures have many incorrect backbone conformations. during the crisis the molprobity webservice has been pushed to the limit of its capacity, as many different drug developers have screened the very same coronavirus structures many times. we have developed a bespoke molprobity pipeline to make these results available online and to decrease the workload on the webservice. in addition to this, the sequence of each structure is also aligned and checked against the known genome to highlight misidentified residues. every wednesday, after the new pdb structures have been released, an automatic pipeline runs to organize the new structures according to the genetic information and then to assess the quality of models and the experimental results. these results, along with the original structures, are immediately available from our online repository, which is accessible via our website insidecorona.net. to facilitate access and to get an overview of structures, we supply an sql database of key statistics and quality indicators along with the results. as a community, for decades, we have aspired to automate structural biology as much as possible. however, neither structure solution nor validation has been fully automated due to the complexity of interpreting low-quality maps that have poor fit between experimental data and structural models.
this task requires detailed knowledge of macromolecular/small molecule structure and chemical interactions. even with state-of-the-art automatic methods at hand, experienced human inspection residue-by-residue remains the best way to judge the quality of a structure, highlighting the continuing need for expert structure solvers. given the flood of new sars-cov- structures, resources have not permitted us to check all structures manually. therefore, we have selected representative structures. certain errors were surprisingly common, including peptide bond flips, rotamer outliers and mis-identification of small molecules, such as water as magnesium, chloride as zinc, and a multi-zinc site modelled as polyethylene glycols. zinc plays an important role in many viral infections, and is coordinated by many of the sars-cov- protein structures. we also found a large number of mismodelled cys-zn sites, with the zinc ion missing or pushed out of density, and/or erroneous disulphide bonds between the coordinating cysteine residues. many coronavirus proteins are linked on certain asparagine residues ("n-linked") to carbohydrate chains called glycans. their exact composition depends on the host cell in question, and their main function is to deter the host immune system. in some structures, where the sample was produced in eukaryotic cells, the "stem" sugars of these n-linked glycans were evident in the map. however, in many cases these sugars were flipped approximately degrees from their correct orientation around the n-glycosidic bond. out of the structures we checked manually, we were able to significantly improve which are available from insidecorona.net. in the following, we will give two examples: once sars-cov- infects a cell, the first protein produced is a long polypeptide chain which is cleaved into functional proteins, the non-structural proteins (nsps) ( ) . these are essential for the production of new viruses in the host cell.
nsp is a large protein molecule by any measure, amino acid residues in total. its segments have a variety of functions, among them the papain-like protease domain which cleaves the first five nsps from the polypeptide chain ( ) . without cleavage of the polyprotein, the virus cannot replicate and infection is halted. hence, the papain-like protease domain represents an important potential drug target ( ) . the first sars-cov- structure of this domain was pdb w c (released st april ). it was immediately used as the basis for structure-based drug design around the world. the resolution was . Å and rwork/rfree were . % / . %. however, the overall completeness of the measured data was only . %. why were just over half of all reflections that could have been measured at this resolution recorded? the raw data for this entry are available from proteindiffraction.org ( ) . they revealed that the diffraction data had been measured with a very high x-ray intensity, which led to a swift deterioration in diffracting power due to radiation damage, something which could not have been learned from the data deposited in the pdb, underlining the importance of the availability of raw data. typically, crystallographers aim for a dose of mgy or lower. here, we estimated a diffraction weighted dose of . mgy with raddose- d, with a maximum dose value of mgy, which likely completely destroyed parts of the crystal ( ) . we based this calculation on the assumption of x x μm plates, given that the crystals were described as such. in addition, the measurement covered ° sample rotation followed by an additional ° rotation starting from the same angular position, covering the first ° twice, further increasing the dose while recording little additional information. the angular range per image, . °, was also surprisingly wide for the high-throughput pixel detector used (a dectris pilatus m), and the diffraction was highly anisotropic.
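the effect of repeating part of a rotation range can be made concrete with a small sketch that merges sweeps and reports how many degrees were measured more than once. the sweep values in the usage note are hypothetical and are not those of this data set; the function names are likewise illustrative.

```python
def merge_sweeps(sweeps):
    """Merge (start, end) rotation ranges, in degrees, into non-overlapping ranges.

    Assumes each sweep satisfies start <= end and does not wrap past 360 degrees.
    """
    merged = []
    for start, end in sorted(sweeps):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def rotation_summary(sweeps):
    """Total rotation exposed, unique rotation covered, and repeated rotation."""
    total = sum(end - start for start, end in sweeps)
    unique = sum(end - start for start, end in merge_sweeps(sweeps))
    return {"total_deg": total, "unique_deg": unique, "repeated_deg": total - unique}


if __name__ == "__main__":
    # Hypothetical example: a second sweep restarts at the same angle as the first.
    print(rotation_summary([(0, 100), (0, 40)]))
```

in this made-up case the crystal is exposed for 140 degrees of rotation but only 100 degrees are unique, so the repeated 40 degrees add dose without adding new reflections.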
we re-processed the images using xds ( ) , omitting the final ° of the first sweep and final ° of the second where the radiation damage affected the data too severely. an elliptical resolution cut-off was applied with staraniso ( ) to account for anisotropy. careful manual intervention could improve the resolution to . Å with better data quality overall. the revised ellipsoidal completeness was . %. the structure has -fold non-crystallographic rotational symmetry, with three monomers coordinating a central zinc ion within the asymmetric unit. the second, functionally important, zinc ion is far removed from the three-fold rotation axis and coordinated by four cysteines. this zinc finger domain is essential for activity, in addition to the papain-like cysteine-histidine-aspartate catalytic triad ( ) , but it is poorly resolved and incompletely and differently modelled in each of the three monomers of this structure. this disorder may be the result of radiation damage. only two of the three zinc sites were modelled, and here, the bond lengths between each of the four sulphur atoms and the zinc varied from . Å to . Å, and the cβ-sg-zn angles between ° and °. the third site had no zinc (fig. ) , instead being modelled as a disulphide bond. prior knowledge about coordination chemistry dictates that the bond lengths between cys and zn should all be approximately . Å and the angles about °. adding zincs to all sites and restraining the bond lengths and angles to these expected values, adding non-crystallographic symmetry restraints (requiring the copies to look similar), an overall higher weighting of ideal geometry, and the reassessment of side chains and water molecules improved the electron density maps and lowered the r values to . %/ . % at . Å resolution. this example shows the importance of optimised data collection strategy, data processing and model building, the quality of which is interconnected.
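a simple way to screen a cys-zn site against expected geometry is to measure each sg-zn distance and each cβ-sg-zn angle and compare them with target values. the sketch below does this for plain cartesian coordinates; the function names, the tolerances and all numbers in the usage note are hypothetical, and the targets must be supplied by the user from a trusted restraint library. it is not the refinement procedure used for this structure.

```python
import math


def angle_deg(a, b, c):
    """Angle at vertex b formed by points a-b-c, in degrees."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def check_zinc_site(zn, cysteines, target_bond, target_angle, bond_tol, angle_tol):
    """Compare each coordinating cysteine with user-supplied target geometry.

    zn        : (x, y, z) of the zinc ion, in angstrom
    cysteines : list of (cb_xyz, sg_xyz) pairs
    Returns a list of (index, sg_zn_distance, cb_sg_zn_angle, within_tolerance).
    """
    report = []
    for i, (cb, sg) in enumerate(cysteines):
        bond = math.dist(sg, zn)
        angle = angle_deg(cb, sg, zn)
        ok = (abs(bond - target_bond) <= bond_tol
              and abs(angle - target_angle) <= angle_tol)
        report.append((i, bond, angle, ok))
    return report


if __name__ == "__main__":
    # Toy coordinates and toy targets chosen for easy arithmetic, not real chemistry.
    zn = (0.0, 0.0, 0.0)
    cysteines = [((2.0, 1.5, 0.0), (2.0, 0.0, 0.0)),
                 ((0.0, 3.0, 1.0), (0.0, 3.0, 0.0))]
    for row in check_zinc_site(zn, cysteines, 2.0, 90.0, 0.1, 5.0):
        print(row)
```

a site that fails such a check is only a candidate for inspection; whether the zinc is really present still has to be judged against the density.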
in this case, even though the data were radiation damaged, by adjusting the data processing to take this into account, and by modifying the model refinement to include stronger restraints and to take full advantage of the non-crystallographic symmetry, this structure could be drastically improved. a new structure of the c s mutant of the same protein (pdb code wrh) came out a month later, in which the zinc site was clearly resolved. by this time, however, the structure had already been widely used as a target in in-silico drug design: for example, % of participants in the eu jedi covid- challenge have used this structure to design potential drugs. the availability of a better structure a month earlier would not only have increased their chances of success but also saved much computing and man hours in computer aided drug development. when sars-cov- replicates, its single-stranded rna genome needs to be copied. this is achieved by a macromolecular complex of rna-dependent rna polymerase (nsp , rdrp), nsp and nsp ( ) . coronaviruses, including sars-cov- , have some of the largest genomes among rna viruses (approximately kilobases), suggesting their polymerase complexes possess proof-reading functionality. this sets coronaviruses apart from other rna viruses ( ) . the first structure of sars-cov rna polymerase (pdb entry nus) was solved in by cryo-em ( ) , before the pandemic began. in this structure, a loop close to the c-terminus (residues - ) was not resolved in the reconstruction map and hence not modelled. following this loop, the polymerase has an irregular helix followed by a flexible tail. density for this helix was poorly resolved -coupled with its short length and the lack of any information from the preceding and following loops, this led to difficulty in assigning register (the identity of the amino acid at each site). 
nevertheless, the overall validation statistics (clashscore, ramachandran outliers, sidechain outliers) provided by the wwpdb for this model appeared exceptionally good. we inspected one of the first available structures of the sars-cov- rna polymerase complex ( btf) using isolde ( ) , a program used for interactively visualising and remodelling proteins in their experimental density. the higher resolution at this c-terminal tail of the structure made it clear that the c-terminal helix was severely incorrect, with the assigned sequence being nine residues upstream of the correct residues for this site (see fig. ). this error was present in all the structures of this complex from both sars-cov and sars-cov- , presumably propagated due to each subsequent structure using the previous models as the starting point for their modelling, as is standard practice. for each affected rdrp structure we immediately contacted the original authors. of the sars-cov- rna polymerase complexes in the wwpdb now have the corrected sequence alignment at the c-terminus, and those also include many of our other changes described below. these pdb re-versioned corrections allow modelling efforts for drugs against sars-cov- rna polymerase to start from a much better model. notably, the authors of a later cryo-em structure of the rna polymerase/rna complex (pdb entry yyt) used one of our corrected models as the starting point for their new model ( ) . the structure of sars-cov- rna polymerase in pre-translocation state and bound to template-primer rna and remdesivir (pdb entry bv ) ( ) represents a useful basis to investigate the inhibitory effect of remdesivir and the rational design of other nucleoside triphosphate (ntp) analogues ( ) . however, we found that this structure has some issues, which may provide misleading information to people who are conducting such studies.
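what a register error of this kind means can be illustrated with a toy string comparison: given the residue types that the density appears to support for a segment, slide that segment along the reference sequence and report the shift with the most identities. this is only a sketch; the input of density-supported residue types, the function name and the toy sequence are hypothetical, and the actual correction was made by interactive remodelling in isolde.

```python
def best_register_shift(reference, observed, assigned_start, max_shift=15):
    """Return (shift, matches) for the best placement of `observed` on `reference`.

    reference      : full protein sequence, one-letter codes, 0-based indexing
    observed       : residue types judged from the map for one segment
                     (a hypothetical input for this illustration)
    assigned_start : 0-based index where the current model places the segment
    A positive shift means the segment belongs further downstream than assigned.
    """
    best = (0, -1)
    for shift in range(-max_shift, max_shift + 1):
        start = assigned_start + shift
        if start < 0 or start + len(observed) > len(reference):
            continue
        window = reference[start:start + len(observed)]
        matches = sum(a == b for a, b in zip(window, observed))
        # Prefer more identities; on a tie prefer the smaller shift.
        if matches > best[1] or (matches == best[1] and abs(shift) < abs(best[0])):
            best = (shift, matches)
    return best


if __name__ == "__main__":
    # Toy sequence, not a coronavirus protein.
    reference = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    observed = reference[19:25]   # what the map is taken to support
    print(best_register_shift(reference, observed, assigned_start=10))
```

in the toy case the segment was assigned nine positions too early, so the function reports a shift of nine with every position matching.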
apart from the register shift described above, there are three magnesium ions modelled in the active site, a number that is at odds with what is known about this class of proteins. magnesium ions play an essential role in catalysis in rna polymerase (binding the incoming ntp, positioning ntp for incorporation and stabilizing the leaving group after catalysis) ( , ) . one of the magnesium ions is shown coordinated by a pyrophosphate, which implies that the pyrophosphate ion release in sars-cov- rdrp is relatively slow and may even couple with the translocation ( ) . however, all three magnesium ions as well as the pyrophosphate are poorly supported by the map reconstructed from the experimental data or by local geometry. if these ions were included as fixed components of the binding site, this may have severely impacted in-silico docking and drug design studies. in addition to the above, we corrected the conformations of three rna residues close to the remdesivir site including an adenosine base (t ) modelled "backwards", fixed "backwards" peptides flagged by cablam, added several residues and water molecules with good density and geometry, and corrected two proline residues that had been erroneously flipped from cis to trans. our remodelled structure offers a valuable structural basis for future studies, such as in-silico docking and drug design targeting sars-cov- rdrp ( ), as well as for computational modelling or simulations to investigate the molecular mechanism of viral replication ( , , ) . many specialists in structural biology and in silico design are now tackling sars-cov- research, but may not be familiar with the wider body of coronavirus research. in addition to improved models and evaluation results, we also supply context on insidecorona.net.
this covers literature reviews centred on the structural aspects of the viral life cycle, host interaction partners, illustrations, and evaluation criteria for selecting the best starting models for in silico projects. furthermore, we added entries about the sars-cov- proteins to proteopedia ( ) and molssi ( , ) ; d-bionotes ( ) also deep-links into our database. finally, as sars-cov- has had an unprecedented impact on the world at large, we have also tried to make our, and others', research on the topic accessible to the general public. this has included a number of posts on our homepage aimed at non-scientists and live streaming the reprocessing of data on twitch, as well as the design, production, and public release of an accurate d printed model of sars-cov- based on deposited structures for use as a prop for outreach activities. in the last five months, we have done a weekly automatic post-analysis as well as a manual re-processing and re-modelling of representative structures from each of the structurally known macromolecules of sars-cov or sars-cov- . in this global crisis, where the community aims to get structures out as fast as possible, we aim to ensure that structure interpretations available to downstream users are as solid as possible. we provide these results as a free resource to the community in order to aid the hunt for a vaccine or anti-viral treatment. our results are constantly updated and can be found online at insidecorona.net. new contributors to this effort are very welcome. in the last years, structural biology has become highly automated, and methods have advanced to the point that it is now feasible to solve a new structure from start to finish in a matter of months with little specialist knowledge. the extremely rapid and timely solution of these structures is a remarkable achievement during this crisis and, despite some shortcomings, these structures have enabled downstream work on therapeutics to rapidly progress.
the downside is that errors at all points during a structure determination are not only common, but can also remain undetected, and if they are detected, this is usually seen as individual failure. however, no individual researcher is fully conversant in all the details of structure determination, protein and nucleic acid structure, chemical properties of interacting groups, catalytic mechanisms, and viral life cycle. the result is that the first draft of a molecular model often contains errors like the ones pointed out above. while any molecular model could benefit from an examination by multiple experts, during this time it is important to bring such inspection to coronavirusrelated structures as quickly as possible. we believe that, as a community, we need to change how we all see, address and document errors in structures to achieve the best possible structures from our experiments. we are scientists: in the end, truth should always win. visualizing an unseen enemy; mobilizing structural biology to counter covid- rcsb protein data bank: sustaining a living digital data resource that enables breakthroughs in scientific research and biomedical education announcing the worldwide protein data bank xtriage and fest: automatic assessment of x-ray data and substructure structure factor estimation auspex: a graphical tool for x-ray diffraction data analysis the pdb_redo server for macromolecular structure model optimization errors in protein structures recent developments in the ccp-em software suite vagin, refmac for the refinement of macromolecular crystal structures refinement of atomic models in high resolution em reconstructions using flex-em and local assessment haruspex: a neural network for the automatic identification of oligonucleotides and protein secondary structure in cryo-electron microscopy maps molprobity: all-atom structure validation for macromolecular crystallography molprobity: more and better reference data for improved all-atom structure 
validation new tools in molprobity validation: cablam for cryoem backbone, undowser to rethink "waters," and ngl viewer to recapture online d graphics nsp of coronaviruses: structures and functions of a large multi-domain protein identification of severe acute respiratory syndrome coronavirus replicase products and characterization of papain-like protease activity a public database of macromolecular diffraction experiments estimate your dose: raddose- d. protein science staraniso (global phasing ltd the papain-like protease of severe acute respiratory syndrome coronavirus has deubiquitinating activity implications of altered replication fidelity on the evolution and pathogenesis of coronaviruses. current opinion in virology structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors isolde: a physically realistic environment for model building into low-resolution electron-density maps structure of replicating sars-cov- polymerase structural basis for inhibition of the rnadependent rna polymerase from sars-cov- by remdesivir a mechanism for all polymerases the structural mechanism of translocation and helicase activity in t rna polymerase structural basis of the potential binding mechanism of remdesivir to sars-cov- rna-dependent rna polymerase remdesivir and sars-cov- : structural requirements at both nsp rdrp and nsp exonuclease active-sites perspective: computational chemistry software and its advancement as illustrated through three grand challenge cases for molecular science covid- molecular structure and therapeutics hub key: cord- - k js v authors: ratcliff, jeremy; nguyen, dung; andersson, monique; simmonds, peter title: evaluation of different pcr assay formats for sensitive and specific detection of sars-cov- rna date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k js v accurate identification of individuals infected with sars-cov- is crucial for efforts to control the ongoing covid- pandemic. 
polymerase chain reaction (pcr)-based assays are the gold standard for detecting viral rna in patient samples and are used extensively in clinical settings. most currently used quantitative pcr (rt-qpcrs) rely upon real-time detection of pcr product using specialized laboratory equipment. to enable the application of pcr in resource-poor or non-specialist laboratories, we have developed and evaluated a nested pcr method for sars-cov- rna using simple agarose gel electrophoresis for product detection. using clinical samples tested by conventional qpcr methods and rna transcripts of defined rna copy number, the nested pcr based on the rdrp gene demonstrated high sensitivity and specificity for sars-cov- rna detection in clinical samples, but showed variable and transcript length-dependent sensitivity for rna transcripts. samples and transcripts were further evaluated in an additional n protein real-time quantitative pcr assay. as determined by % endpoint detection, the sensitivities of three rt-qpcrs and nested pcr methods varied substantially depending on the transcript target with no method approaching single copy detection. overall, these findings highlight the need for assay validation and optimization and demonstrate the inability to precisely compare viral quantification from different pcr methodologies without calibration. sars-coronavirus- (sars-cov- ), a human-infective member of the betacoronavirus genus (family coronaviridae), was first identified in the hubei province of china in late as the causative agent behind an increased number of cases of respiratory illness occasionally leading to acute respiratory distress and death. [ ] [ ] [ ] the outbreak was declared a public health emergency of international concern by the world health organization on january th , and the associated disease was named covid- on february th , . 
, the disease has since spread globally and by june rd has infected nearly million individuals in over countries, causing at least , deaths. , the ability to accurately identify and diagnose asymptomatically and symptomatically infected patients is crucial for efforts aimed at limiting person-to-person transmission and controlling the outbreak. [ ] [ ] [ ] the standard method for diagnosing viral infections is through the detection of viral nucleic acid in clinical samples. reverse-transcriptase quantitative polymerase chain reaction (rt-qpcr) is the gold standard used in most diagnostic laboratories. probe-based rt-qpcr relies on the binding and amplification of three oligonucleotides (two primers and one internal fluorescent probe) and the accumulation of fluorescence signal mediated by dna polymerase activity. rt-qpcr is not widely accessible as the method relies upon the use of expensive real- time pcr platforms and the probe component of the assay is typically the most costly reagent. an alternative, more cost-effective diagnostic method for sars-cov- rna is nested pcr. nested pcr is based on the use of two sequential pcr amplifications wherein the secondary set of primers target sequences nested within the amplicon produced by the first round amplification. compared to conventional pcr, which uses a single round of replication, nested pcr has increased sensitivity and decreased risk of amplification of non-specific products. nested pcr methods were developed for sars, but no nested pcr method for sars-cov- has yet been in this study, a nested pcr assay for sars-cov- has been developed and its performance for experiments to determine the sensitivity of the two rt-qpcr methods and nested pcr were completed using serial dilutions of each transcript ( * to - copies/ µl) in a previously described rna storage buffer containing rna storage solution (thermo fisher scientific; mm sodium citrate, ph . 
), herring sperm carrier rna ( µg/ml), and rnasin (new england biolabs uk, u/ml). for nested pcr experiments, detection rate was assessed over five replicates using the methods described above and a positive result was the presence of a pcr product of the expected length. for rt-qpcr experiments, detection rate was assessed over eight replicate experiments using the methods described above and a positive result was an increase in signal that crossed the threshold value calculated by the machine for each experiment. the cdc n method had a cutoff value of ct . the sensitivity of the nested pcr and two rt-qpcr methods was compared by measuring the % endpoints ( ep) of detection for serial dilutions of the four transcripts described above (table , figure ). the rdrp transcript does not contain the target sequences of the cdc n method and thus was not assessed using this method. no method proved to consistently be the most sensitive for all targets, and each method was the most sensitive for at least one target. the between their e, n, and rdrp primer/probe sets, with the rdrp set performing the best with a % detection probability of . copies/reaction for sars-cov- genomic rna. corman et al. also reported a % detection probability of . copies/reaction for the e gene assay -this is in conflict with a % detection probability of copies/reaction reported by institut pasteur using the same primer/probe set. igloi et al. investigated the performance of commercial rt-qpcr kits and found variable performance between the rt-qpcr kits, with several kits having fold differences in sensitivity for different gene targets and one kit having a log difference in sensitivity between their e and rdrp/n preparations. using in vitro transcribed small transcripts of five sars-cov- genes, vogels et al. evaluated nine primer/probe sets, including the charité- rdrp and cdc n sets. vogels et al. 
found that eight of the nine primer/probe sets had lower limits of detection of copies/reaction, and that the charité rdrp set was unable to detect any replicates with copies/reaction, although they did alter the primer and probe concentration this broader experience suggests that the issue may lie within the generalized approach of detecting sars-cov- rna by pcr rather than being specific to individual assays. one possible factor limiting the sensitivity of pcr assays for sars-cov- may originate from the highly structured nature of the genome as measured by standard thermodynamic rna structure prediction methods. sars-cov- , as well as other coronaviruses, has extensive rna secondary structure elements peppered throughout its genome -approximately % - % of bases may be involved in base pairing. it is conceivable that, if a section of the target sequence is embedded within highly energetically favoured rna secondary structure, binding of the pcr oligonucleotides could be competitively inhibited, delaying the initiation of the pcr reaction. this effect could explain the unusual findings that no pcr based assay for sars-cov- , whether in this study or others, have been able to achieve single copy detection sensitivities. the authors report no conflicts of interest. 
a novel coronavirus from patients with pneumonia in china a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china statement on the second meeting of the international health regulations ( ) emergency committee regarding the outbreak of novel coronavirus ( -ncov) who director-general's remarks at the media briefing on -ncov on world health organization (who) an interactive web-based dashboard to track covid- in real time impact of contact tracing on sars-cov- transmission quantifying sars-cov- transmission suggests epidemic control with digital contact tracing substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov- ) real-time pcr in virology principles and technical aspects of pcr amplification development of taqman rt-nested pcr system for clinical sars-cov detection detection of novel coronavirus ( -ncov) by real-time rt-pcr coronavirus disease (covid- ) a european multicentre evaluation of detection and typing methods for human enteroviruses and parechoviruses using rna transcripts a simple method of estimating fifty per cent endpoints r: a language and environment for statistical computing sars-cov- testing: trials and tribulations sars-cov- detection by direct rrt-pcr without rna extraction high-throughput extraction of sars-cov- rna from nasopharyngeal swabs using solid-phase reverse immobilization beads. medrxiv direct rt-qpcr detection of sars-cov- rna from patient nasopharyngeal swabs without an rna extraction step. biorxiv extraction-free covid- (sars-cov- ) diagnosis by rt-pcr to increase capacity for national testing programmes during a pandemic. biorxiv protocol: real-time rt-pcr assays for the detection of sars-cov- comparison of commercial realtime reverse transcription pcr assays for the detection of sars-cov- analytical sensitivity and efficiency comparisons of sars-cov- qrt-pcr primer-probe sets. 
medrxiv detection of sars-cov- rna by multiplex rt-qpcr. biorxiv pervasive rna secondary structure in the genomes of sars-cov- and other coronaviruses -an endeavour to understand its biological purpose. biorxiv to interpret the sars-cov- test, consider the cycle threshold value nested pcr detection of clinical samples assessed at the john radcliffe hospital via rt-qpcr. cov -cov (panels a-c) were positive samples, neg -neg (panel d) were negative samples. panel a has a bp molecular ladder panels b-d have kb molecular ladder comparison of detection of sars-cov- rna in clinical samples by nested pcr versus qpcr readout from the microbiology unit of the john radcliffe hospital. samples negative via qpcr were given a representative ct value of . key: cord- -f dsu p authors: hati, sanchita; bhattacharyya, sudeep title: impact of thiol-disulfide balance on the binding of covid- spike protein with angiotensin converting enzyme receptor date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: f dsu p the novel coronavirus, severe acute respiratory syndrome coronavirus (sars-cov- ), has led to an ongoing pandemic of coronavirus disease (covid- ), which started in . this is a member of coronaviridae family in the genus betacoronavirus, which also includes sars-cov and middle east respiratory syndrome coronavirus (mers-cov). the angiotensin-converting enzyme (ace ) is the functional receptor for sars-cov and sars-cov- to enter the host cells. in particular, the interaction of viral spike proteins with ace is a critical step in the viral replication cycle. the receptor binding domain of the viral spike proteins and ace have several cysteine residues. in this study, the role of thiol-disulfide balance on the interactions between sars-cov/cov- spike proteins and ace was investigated using molecular dynamic simulations. 
the study revealed that the binding affinity was significantly impaired when all the disulfide bonds of both ace and sars-cov/cov- spike proteins were reduced to thiol groups. the impact on the binding affinity was less severe when the disulfide bridges of only one of the binding partners were reduced to thiols. this computational finding provides a molecular basis for the severity of covid- infection due to the oxidative stress. the novel coronavirus known as severe acute respiratory syndrome coronavirus (sars-cov- ) or simply covid- is the seventh member of the coronavirus family. the other two viruses in this family that infect humans are severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov). these are positive-sense, single-strand enveloped rna viruses. the coronavirus particles contain four main structural proteins: the spike, membrane, envelope, and nucleocapsid. [ ] [ ] the spike protein protrudes from the envelope of the virion and consists of two subunits; a receptor-binding domain (rbd) that interacts with the receptor proteins of host cells and a second subunit that facilitates fusion of the viral membrane into the host cell membrane. recent studies showed that rbd of spike proteins of sars-cov and sars-cov- interact with angiotensin-converting enzyme (ace ). ace belongs to the membrane-bound carboxydipeptidase family. it is attached to the outer surfaces of cells and is widely distributed in the human body. in particular, higher expression of ace is observed in organs such as small intestine, colon, kidney, and heart, while ace expression is comparatively lower in liver and lungs. [ ] [ ] the role of oxidative stress on the binding of viral proteins on the host cell surface receptors is a relatively underexplored area of biomedical research. [ ] [ ] [ ] [ ] [ ] [ ] previous studies have indicated that the entry of viral glycoprotein is impacted by thiol-disulfide balance on the cell surface. 
, [ ] [ ] [ ] [ ] [ ] perturbations in the thiol-disulfide equilibrium have also been found to deter the entry of viruses into their target cells. the first step of viral entry involves binding of the viral envelope protein to a cellular receptor. this is followed by endocytosis, after which conformational changes of the viral protein help the induction of the viral protein into the endosomal membrane, finally releasing the viral content into the cell. these conformational changes are mediated by ph changes as well as by the conversion of disulfides to thiol groups in the viral spike protein. several cell surface oxidoreductases regulate the thiol-disulfide exchange responsible for the conformational changes of viral proteins needed for virus entry into host cells. given the significant mortality rate of sars-cov- (hereinafter referred to as cov- ) infection, it is important to know whether the thiol-disulfide balance plays any role in the binding of the spike glycoprotein to the host cell receptor protein ace . a recent study of the spike glycoprotein of sars-cov (hereinafter referred to as cov) reported complete redox insensitivity: despite the reduction of all disulfide bridges of cov to thiols, its binding to ace remained unchanged. however, that study did not probe the redox sensitivity of the ace receptor. thus, in the present study, we computationally investigated the effect of the redox state of both partners (ace and cov/cov- ) on their binding affinities. the structures of cov and cov- complexed with ace are known, and the noncovalent interactions at the protein-protein interface have been reported recently. using these reported structures, molecular dynamics simulations and electrostatic field calculations were performed to explore the impact of thiol-disulfide balance on cov/cov- and ace binding affinities. 
the structural and dynamical changes due to the change in the redox states of cysteines in the interacting proteins were analyzed and their effects on binding free energies were studied. the molecular basis of the binding of spike proteins to ace is known from x-ray crystallographic (sars-cov) and cryo-electron microscopic (sars-cov- ) studies. the sequence alignment of cov and cov- spike proteins showed high sequence identity (> %) indicating that their binding to ace receptors will be similar ( fig. ). in both bound structures, the rbd of cov and cov- is found to be complexed with ace (fig. ) . both ace and cov- possess four disulfide bridges, whereas cov subunit has only two disulfide linkages (table and fig. a and b) . two large helices of ace form a curved surface (fig. , illustrated by the dashed curved line) that interacts with the concave region of cov or cov- subunit. structural change along the trajectory. in all cases, the simulation started with an equilibrated structure, which was obtained after minimizing the neutralized solvated protein complex built from the experimentally determined structures. the evolution of the protein structure along the md trajectory was monitored by calculating the root-mean-square deviation (rmsd) of each structure from the starting structure as a frame of reference following standard procedure. briefly, during the md simulation, the protein coordinates were recorded for every ps interval and a root-mean-square-deviation (rmsd) of each frame was calculated from the average root-mean-square displacement of backbone c α atoms with respect to the initial structure. then, the rmsd values, averaged over conformations stored during ns time, were plotted against the simulated time (fig. ). compared to the starting structure, only a moderate backbone fluctuation was noted in all protein complexes during ns simulations and the maximum of rmsd was in the range of . - . Å ( table ). 
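the rmsd monitoring described above can be illustrated with a minimal python sketch. this is not the authors' vmd workflow: the function names (`rmsd`, `rmsd_trace`, `block_average`) are hypothetical, the block size is a free parameter because the averaging window is not recoverable from the text, and the sketch assumes that every frame has already been superimposed on the starting structure, so no fitting step is included.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)
    coordinates, e.g. backbone C-alpha atoms. Assumes the frames are already
    superimposed (no rotational or translational fitting is done here)."""
    if not frame or len(frame) != len(reference):
        raise ValueError("frames must be non-empty and of equal length")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(frame, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(frame))


def rmsd_trace(trajectory):
    """RMSD of every stored frame with respect to the first (starting) frame."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]


def block_average(values, block_size):
    """Average consecutive, non-overlapping blocks of values.
    A trailing partial block is dropped."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(values) // block_size
    return [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
```

in use, `rmsd_trace` would be applied to the list of stored frames and `block_average` to the resulting trace before plotting against simulated time; the maximum and standard deviation of the trace correspond to the summary values quoted in the text.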
the evolution was smooth and its stability was demonstrated by the standard deviation of the computed rmsds, which was less than . Å. taken together, the results of the simulations showed no unexpected structural deformation of the sars-cov/cov- ⋅⋅⋅ace complex in either the reduced (thiol groups) or the oxidized (disulfide bonds) state (table ) . table ). however, when all disulfides in cov/cov- as well as ace were reduced to thiols, the binding became thermodynamically unfavorable. in both cases, the binding free the study found that the reduction of all disulfides into sulfhydryl groups completely impairs the binding of the sars-cov/cov- spike protein to ace . this is evident from the positive gibbs energy of binding (∆ bind g o ) obtained for both cov ox ⋅⋅⋅ace ox and cov- ox ⋅⋅⋅ace ox complexes. when the disulfides of only ace were reduced to sulfhydryl groups, the binding became weaker, as the the redox environment of cell surface receptors is regulated by the thiol-disulfide equilibrium in the extracellular region. , this is maintained by glutathione transporters, a number of oxidoreductases including protein disulfide isomerase, and several redox switches. under oxidative stress, the extracellular environment becomes oxidation-prone, resulting in more disulfide formation on protein surfaces. therefore, under severe oxidative stress, the cell surface receptor ace and the rbd of the intruding viral spike protein are likely to be present in their oxidized forms, with predominantly disulfide linkages. this computational study shows that under oxidative stress, the lack of a reducing environment would result in significantly more favorable binding of the viral protein to the cell surface ace . in terms of energetics, this computational study demonstrates that the oxidized form of the proteins, with disulfide bridges, would cause a decrease of kcal/mol in the gibbs binding free energy. 
furthermore, ace , which the viral spike proteins latch on to, is known to be a key player in the remediation of oxidative stress. binding of the viral protein will prevent the key catalytic function of ace of converting angiotensin (a strong activator of oxidative stress) to angiotensin - , thereby creating a vicious circle of enhanced viral attack. in summary, the present study demonstrates that absent or reduced oxidative stress would have a significant beneficial effect during the early stage of viral infection by preventing viral protein binding to the host cells. computational setup. setup of the protein systems and all structural manipulations were carried out using visual molecular dynamics (vmd). disulfide groups were modified to thiols during structure setup using standard vmd scripts. molecular optimization and molecular dynamics (md) simulations were carried out using the nanoscale molecular dynamics (namd) package with the charmm force field. [ ] [ ] [ ] [ ] [ ] during md simulations, electrostatic energy calculations were carried out using the particle mesh ewald method. backbone root-mean-square deviation (rmsd) calculations were performed using vmd. protein-protein interactions were studied using the adaptive poisson-boltzmann solver (apbs). electrostatic field calculations were performed using the pdb pqr program suite. molecular dynamics simulations. all simulations were performed using the structures of ace -bound sars-cov (pdb entry: d g) and sars-cov- (pdb entry: m j) . in all simulations, setup of the protein complex systems was carried out following protocols used in previous studies from this lab. , briefly, hydrogens were added using the hbuild module of charmm. ionic amino acid residues were maintained in a protonation state corresponding to ph . the protonation state of histidine residues was determined by computing the pka using binding free energy calculations. 
gibbs free energies of binding between the ace and sars cov or cov- proteins were calculated using apbs using a standardized method of a treecode-accelerated boundary integral poisson-boltzmann equation solver (tabi-pb). in this method, the protein surface is triangulated, and electrostatic surface potentials are computed. the discretization of surface potentials is utilized to compute the net energy due to solvation as well as electrostatic interactions between the two protein subunits, as outlined in thermodynamic scheme, scheme . following scheme , the free energy of binding of the two protein fragments in water, can be expressed as a sum of two components (eq. ) where the ∆ coul g represents the coulombic (electrostatic) interactions between the proteins occurring at the protein-protein interface (scheme ) and ∆ ∆ solv g is the difference of the solvation energies between the complex and the corresponding free proteins: however, the solvation calculation used only part of the entire spike protein as well as the ace , therefore ∆ bind g tabi-pb was calibrated by correcting ∆ ∆ solv g using experimentally known binding free energy of ace ···cov- : where k d is the experimental dissociation constant, which is equal to nm. the corrected free energy of the solvation using the ∆ g corr and eq. , the corrected binding free energy, ∆ bind g o of all protein complexes is expressed by as shown in eq. , the combination of last two terms in eq. is equal to ∆ ∆ solv g corr. therefore, eq. can be simplified as: figure . sequence alignment (generated by clustal omega ) between the receptor-binding domain of sars-cov and sars-cov proteins. the "*" represents the identical residues, ":" represents similar residues, and gap represents dissimilar residues. the cysteine residues are highlighted in yellow. 
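the calibration against the experimental dissociation constant described in the binding free energy section can be sketched in python. the relation ∆g° = rt ln(k d / 1 m) is standard thermodynamics; everything else is an assumption made for illustration: the function names are hypothetical, the temperature of 298.15 k is assumed because none is recoverable from the text, and the additive offset is only one reading of how the solvation term was corrected. no value from the study is reproduced here, since the experimental k d is not legible in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy_from_kd(kd_nanomolar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant
    given in nM, using dG = RT ln(Kd / 1 M). Temperature is an assumption."""
    if kd_nanomolar <= 0:
        raise ValueError("dissociation constant must be positive")
    kd_molar = kd_nanomolar * 1e-9
    return R_KCAL * temperature_k * math.log(kd_molar)


def calibration_offset(dg_experimental, dg_computed_reference):
    """Offset that makes the computed binding free energy of the reference
    complex match experiment (assumed additive form of the correction)."""
    return dg_experimental - dg_computed_reference


def corrected_binding_free_energy(dg_coulomb, ddg_solvation, offset):
    """Corrected binding free energy: Coulombic term plus the solvation
    difference shifted by the calibration offset."""
    return dg_coulomb + ddg_solvation + offset
```

under this reading, the offset is derived once from the reference complex with a known k d and then applied unchanged to every other redox variant, so differences between complexes come only from the computed coulombic and solvation terms.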
rvqptesivrfpnitnlcpfgevfnatrfasvyawnrkrisncvadysvlynsasfstfk cov ------------------pfgevfnatkfpsvyawerkkisncvadysvlynstffstfk *********:* *****:**:**************: ***** cygvsptklndlcftnvyadsfvirgdevrqiapgqtgkiadynyklpddftgcviawns cov cygvsatklndlcfsnvyadsfvvkgddvrqiapgqtgviadynyklpddfmgcvlawnt ***** ********:********::**:********** ************ ***:***: the medium-scale fluctuations are shown in green. coronavirus envelope protein: current knowledge coronaviruses: an overview of their replication and pathogenesis structure analysis of the receptor binding of -ncov high expression of ace receptor of -ncov on the epithelial cells of oral mucosa inhibition of human immunodeficiency virus infection by agents that interfere with thiol-disulfide interchange upon virus-receptor interaction inhibitors of protein-disulfide isomerase prevent cleavage of disulfide bonds in receptor-bound glycoprotein and prevent hiv- entry significant redox insensitivity of the functions of the sars-cov spike glycoprotein: comparison with hiv envelope cell surface protein disulfide isomerase regulates natriuretic peptide generation of cyclic guanosine monophosphate the role of cellular oxidoreductases in viral entry and virus infectionassociated oxidative stress: potential therapeutic applications implications of oxidative stress on viral pathogenesis cell entry by enveloped viruses: redox considerations for hiv and sars-coronavirus thiol-disulfide exchange reactions in the mammalian extracellular environment structural analysis of major species barriers between humans and palm civets for severe acute respiratory syndrome coronavirus infections cryo-em structure of the -ncov spike in the prefusion conformation structural and functional basis of sars-cov- entry by using human ace structural basis for the recognition of sars-cov- by full-length human ace crowder-induced conformational ensemble shift in escherichia coli prolyl-trna synthetase from structure to redox: the diverse functional roles 
of disulfides and implications in disease plasma membrane glutathione transporters and their roles in cell physiology and pathophysiology ace -mediated reduction of oxidative stress in the central nervous system is associated with improvement of autonomic function vmd: visual molecular dynamics charmm additive all-atom force field for glycosidic linkages between hexopyranoses charmm additive all-atom force field for glycosidic linkages in carbohydrates involving furanoses charmm additive all-atom force field for carbohydrate derivatives and its utility in polysaccharide and carbohydrate-protein modeling optimization of the additive charmm all-atom protein force field targeting improved sampling of the backbone ( ) dihedral angles charmm additive all-atom force field for phosphate and sulfate linked to carbohydrates a smooth particle mesh ewald method improvements to the apbs biomolecular solvation software suite pdb pqr: expanding and upgrading automated preparation of biomolecular structures for molecular simulations multiple pathways promote dynamical coupling between catalytic domains in escherichia coli prolyl-trna synthetase computer ""experiments" on classic fluids. i. thermodynamical properties of lennard-jones molecules scalable molecular dynamics with namd a treecode-accelerated boundary integral poisson-boltzmann solver for electrostatics of solvated biomolecules structure of the sars-cov- spike receptor-binding domain bound to the ace receptor the embl-ebi search and sequence analysis tools apis in we acknowledge computational support from the blugold super computing cluster (bgsc) of university of wisconsin-eau claire. figure. . structures of protein complexes of a) sars-cov⋅⋅⋅ace and b) sars-cov- ⋅⋅⋅ace . all the disulfide bridges between cysteine residues are shown in green vdw spheres and thiol groups in cyan licorice. 
key: cord- -d imvo authors: basu, souradip; mukhopadhyay, suparba; das, rajdeep; mukhopadhyay, sarmishta; singh, pankaj kumar; ganguli, sayak title: impact of clade specific mutations on structural fidelity of sars-cov- proteins date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d imvo the sars-cov- is a positive stranded rna virus with a genome size of ~ . kilobase pairs which spans open reading frames. studies have revealed that the genome encodes about non-structural proteins (nsp), four structural proteins, and six or seven accessory proteins. based on prevalent knowledge on sars-cov and other coronaviruses, functions have been assigned for majority of the proteins. while, researchers across the globe are engrossed in identifying a potential pharmacological intervention to control the viral outbreak, none of the work has come up with new antiviral drugs or vaccines yet. one possible approach that has shown some positive results is by treating infected patients with the plasma collected from convalescent covid- patients. several vaccines around the world have entered their final trial phase in humans and we expect that these will in time be available for application to worldwide population to combat the disease. in this work we analyse the effect of prevalent mutations in the major pathogenesis related proteins of sars-cov and attempt to pinpoint the effects of those mutations on the structural stability of the proteins. our observations and analysis direct us to identify that all the major mutations have a negative impact in context of stability of the viral proteins under study and the mutant proteins suffer both structural and functional alterations as a result of the mutations. our binary scoring scheme identifies l s mutation in orf as the most disruptive of the mutations under study. 
we believe that the virus is under the influence of an evolutionary phenomenon similar to muller’s ratchet, where the continuous accumulation of these mutations is making the virus less virulent, which may also explain the reduction in fatality rates worldwide. what started off as an unusual outbreak of pneumonia in wuhan, china, has reached the farthest corners of the world as a global pandemic. to date, over million people have been affected by this novel strain of severe acute respiratory syndrome (sars) virus, which has been named sars-cov- , with the disease named covid- . genomic structures and phylogenomic studies reveal that the causal agent of covid- belongs to the genus betacoronavirus. it is similar to the human beta coronaviruses (sars-cov- , sars-cov and mers-cov), but there are important differences at the genotypic and phenotypic levels which ultimately influence their pathogenesis. sars-cov- contains a single-stranded, positive-sense rna encapsulated in a nucleoprotein matrix. the rna genome of sars-cov- consists of approximately , nucleotides, encoding known proteins, though we find the expression of to be constant with one remaining unexpressed. each of the proteins reported to be encoded performs different functions in the viral pathogenesis cascade, though we are still not fully aware of the plethora of functions that they perform (zeng et al., ) . four proteins make up the viral structure, of which the most studied and important is the s protein, or spike protein, since it regulates viral entry into the host. this protein binds to the angiotensin converting enzyme (ace ) receptor and gains entry into the host cell. a pairwise comparison of the amino acid sequences reveals that the spike proteins of sars-cov- share more than % sequence homology with those of sars-cov. 
conservation is found in residues which interact with the ace receptor; the other residues differ somewhat, with a few insertions as well. reports to date indicate that the interaction of the spike protein of sars-cov- is much stronger than that of the previous coronaviruses. a non-peer-reviewed preprint posted on medrxiv hypothesises that, since ace is overexpressed in the lung tissue of people with chronic respiratory disorders, they are likely to be more affected by this prevalent strain of the virus (pinto et al., ) . however, the infectivity of the virus is not determined solely by its affinity for ace . a closer look at the structure of the s protein shows that it is composed of two segments, s and s , that require a double cleavage to expose the peptide which initiates the fusion with ace . the first cleavage is mediated by furin, an enzyme encoded by the host cell. this ability to co-opt human enzymes also increases the infectivity of the virus. a few non-structural proteins of the virus are involved in the evasion of the host immune system and also function in the assembly and replication of the virus. these proteins are expressed as part of two large polyproteins which are produced from a ribosomal frameshift event and are cleaved by the protease into different proteins, each with specific functions (seah et al., ) . the rest of the proteins are the accessory proteins, which are technically non-essential as they are not required for viral replication in vitro and are the least well understood of the total proteome (kim et al., ) . there have been several reports regarding the variation in severity and lethality of coronaviruses. nl , e, oc , and hku cause mild respiratory problems in humans, and three others (mers-cov, sars-cov and the newly emerged sars-cov- ) have been observed to inflict severe respiratory syndromes (veriti et al., ; who ; who [mers-cov] ). 
sars-cov- infection was first reported from wuhan, china, on th december (veriti et al., ) and in less than three months, on th march , who declared covid- a pandemic (li, ) . by may, , countries had already been affected, with almost . lakh fatal cases. thus it was clear that the infectivity of sars-cov- was much higher than that of sars-cov or mers-cov (folis et al., ), though the fatality rate ( . - . %) is still substantially lower than that of sars-cov ( %) and mers-cov ( %) (tang et al., ) . since this is a new pathogen, researchers worldwide are looking towards both primary and secondary prevention strategies in the form of vaccination as well as an over-the-counter medication which can be afforded by citizens around the world. whatever strategy is envisaged, the threat lies in the appearance of new strains with mutations conferring resistance to vaccines and therapeutic agents. as a large proportion of humanity is already infected and we currently have no strategy in place to identify asymptomatic carriers, owing to the lack of infrastructural and manpower resources around the world, there is a high possibility of successful escape mutations arising in the genome of the virus. very few studies to date have focused on mapping the mutations on the important protein components of sars-cov- . during the preparation of this manuscript we came across two studies which attempt to map the effects of significant mutations on protein structure and stability. benvenuto et al ( ) report, in more recent isolates of sars-cov- , the presence of two mutations affecting the non-structural protein (nsp ) and the open reading frame (orf ) in adjacent regions. stability analysis of the amino acid changes suggests that both mutations could confer lower stability on the protein structures. 
namely, at amino acid position (corresponding to nsp position ), most of the sars-cov- sequences have a leucine residue while some more recent sequences from asian, american, oceanian and european isolates show phenylalanine. at amino acid position (corresponding to orf position to ), most of the sars-cov- sequences have an arginine residue while some sequences from australian and american isolates have a histidine residue. in the second study, by bhattacharya et al ( ), the amino acid position in the s protein was found to be conserved among species (bat, pangolin, civet and human sars-cov and human sars-cov- ), until the a a subtype defining mutation occurred (d g) (bhattacharya et al., ) . the authors report that an additional serine protease (elastase) cleavage site around the s -s junction was introduced by the d g mutation in sars-cov- . the glycine at (aa) is predicted to be the nearest substrate site for elastase to perform proteolytic cleavage in the adjacent residue. this study focuses on mapping the effects of the clade specific mutations identified in the orf , nsp , nsp , nsp , nucleocapsid protein (n) and spike protein (s) on the conformation and stability of the structures. it also predicts the possible drug binding sites, along with the druggability of these protein structures and the conformational epitopes of b cell, t cell, mhc -i and mhc -ii alleles. clade specific mutations were identified by literature mining of the various sars-cov dedicated resources at pubmed and pubmed central, from which we compiled a list of prevalent mutations across the major pathogenesis related proteins of the virus (table ) . 
although other mutations have since been identified by large scale viral genomic sequencing studies undertaken globally (mercatelli and giorgi ), the following dataset can be considered to comprise the initial founder mutations which resulted in the variations in the strain of the virus before the international borders were closed down (callaway, ), following which the selection pressure on the virus became more country and population specific. following the identification of the mutations we utilised the ncbi genome browser to extract the protein sequences using the wuhan strain as the ancestral strain. using a generic sliding window approach we were able to pinpoint the sites of mutations and we created two data sets: the wild type data set, consisting of the protein sequences taken directly from the wuhan strain, and the mutant data set, consisting of the protein sequences carrying the mutations. prediction of physicochemical properties of the protein: to understand the exact conformational behaviour of a protein, physical and chemical parameters of the wild type and mutant protein sequences were analysed using the protparam protein analysis tool of the expasy web server (https://web.expasy.org/protparam/) and the pepstats protein sequence statistical tool of the embl-ebi web server (https://www.ebi.ac.uk/tools/seqstats/emboss_pepstats/), to determine molecular weight, theoretical pi, extinction coefficient (gill & von hippel, ), estimated half-life (bachmair et al., ), instability index (guruprasad et al., ), aliphatic index (ikai, ), grand average of hydropathy (kyte & doolittle, ), and isoelectric point (harrison, ). the secondary structure of the wild type and the mutant proteins along with their degree of disordered residues and accessible surface area was predicted using the primary sequence of the protein. these were submitted as input to the netsurfp - . 
server (http://www.cbs.dtu.dk/services/netsurfp/) (klausen et al., ), which utilizes a neural network based approach trained on established protein structures as test sets. the accuracy of prediction is estimated to be % when compared to estimated structures. since the degree of disordered residues in a protein has an impact on the overall structural and functional stability of the protein, the data obtained from netsurfp was cross- (mészáros et al., ) . the possible epitopes were predicted for the wild type and their corresponding mutant proteins selected for the study. linear b cell epitopes were predicted by using the abcpred (crdd.osdd.net/raghava/abcpred/) and lbtope (webs.iiitd.edu.in/raghava/lbtope/protein.php) servers. abcpred requires an amino acid sequence as input and generates epitopes using an artificial neural network (ann) (janahi et al., ) . a window length of amino acids and a threshold of . were used for generating the results (kumar et al., ) . "lbtope: linear b cell epitope prediction server" was also used to reveal b cell epitopes of viral origin with a threshold probability score of % and above. the t cell epitopes of viral proteins binding to class i and class ii major histocompatibility complex molecules were mined using the netctl . server (http://www.cbs.dtu.dk/services/netctl/) and propred (https://webs.iiitd.edu.in/raghava/propred/). viral peptides capable of binding to cytotoxic t lymphocytes and mhc i molecules were recognized using the netctl . server, which facilitates the prediction by combining the scores for proteasomal cleavage, tap transport efficiency and mhc i binding (larsen et al., ) . the amino acid sequence in single letter code was entered in the user interface, and the c terminal cleavage, weight on tap transport efficiency and threshold for epitope identification were set at values of . , . and . respectively. the viral proteins were screened against most recognized supertypes of hla alleles. 
the results were generated with a specificity of % and sensitivity of %, by integrating the scores based on multiple parameters as weighted sum with a relative weight on mhc i binding. for the prediction of mhc class ii epitopes, propred uses quantitative matrices methods which scan the antigen sequence against a profile of hla-dr binding pockets, each having a unique quantitative representation of the amino acid interactions within the peptide binding cleft (sturniolo et al., ) . amino acid sequence was provided as input at threshold value of , which enabled us to view the top % of best scoring peptides with respect to their probability of binding to different human leukocyte antigen (hla) molecules. epitopes were predicted for all the hla-dr alleles that account for % of the mhc molecules on antigen presenting cells (palanisamy & lennerstrand, ) . the impacts of the mutations on the stability of the proteins were estimated using a sequence based approach at the i-mutant . (capriotti et al., ) web server. this is a program which classifies the effects of a particular mutation using a support vector machine based approach. the training set of the tool was the protherm database (bava et al., ) . the temperature and ph threshold used for this analysis were and respectively. no complete crystal structures of these proteins are currently available in the protein data bank. as a result, ab initio modeling was performed using the i-tasser server (https://zhanglab.ccmb.med.umich.edu/i-tasser/) (roy et al., ) and the resultant structures were subjected to molecular simulation for ns using a gromos a force field in gromacs (bekker et al., ; abraham et al., ) . following the simulation, the structures were analysed for torsional/ conformational stability from the ramachandran plots of the models using the molprobity server (williams et al., ) . the integrity and validity of the d structure of the generated homology models were verified by q mean analysis. 
the variants prioritised by the sequence based analysis were inserted within the wild type models by using the mutate tool of deepview v . . (guex & peitsch, ), following which they were simulated and validated using the protocols followed for the wild type proteins. the degree of evolutionary conservation of an amino acid in a protein is a signature of the balance between its natural tendency to mutate and the overall need for it to be retained in its original form to maintain the structural integrity and function of the macromolecule. in our analysis we used the consurf web server (http://consurf.tau.ac.il) (ashkenazy et al., ) to predict the evolutionary pattern of the amino acids of the macromolecule and so reveal regions that are important for structure and function. the structures were submitted as inputs from which the sequences were extracted by the server and top homologues were identified using three psi-blast iterations against the uniprot database. redundant homologous sequences were removed using the cd-hit clustering method (li & godzik, ; fu et al., ) . the resulting sequences were then aligned using mafft (katoh & standley ) and the generated multiple sequence alignment (msa) was used to generate the phylogenetic tree. the tree and the msa were then considered together and the rate site algorithm (pupko et al., ) was used to calculate the position-specific evolutionary rates using an empirical bayesian methodology (mayrose et al., ) . the raw data obtained was then normalized and grouped into nine conservation grades where represented the rapidly evolving positions, denoted positions with intermediate rates, and was the threshold for the most evolutionarily conserved position. these were then mapped on the d structure on a position specific basis. we performed this analysis using both the wild type and mutant structures. 
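the grading step described above can be illustrated with a minimal sketch. it assumes that the grades run from 1 (fastest evolving) to 9 (most conserved), since the grade numbers are missing from the text as extracted, and it uses plain equal-width bins on z-scored rates, which is a simplification and not necessarily the binning that consurf applies internally. the function name and the example rates are hypothetical.

```python
import statistics

def conservation_grades(rates, n_grades=9):
    """Map per-position evolutionary rates to grades 1..n_grades.

    Assumption: grade 1 = fastest evolving, grade n_grades = most conserved.
    Equal-width binning is a simplification chosen for illustration.
    """
    mean = statistics.mean(rates)
    sd = statistics.pstdev(rates)
    if sd == 0:
        raise ValueError("all rates are identical; grades are undefined")
    z = [(r - mean) / sd for r in rates]  # normalise to mean 0, s.d. 1
    lo, hi = min(z), max(z)
    width = (hi - lo) / n_grades
    grades = []
    for value in z:
        # bin 0 holds the slowest-evolving (most conserved) positions
        bin_index = min(int((value - lo) / width), n_grades - 1)
        grades.append(n_grades - bin_index)
    return grades

# hypothetical rates for five positions (low rate = conserved)
example_rates = [0.1, 0.5, 0.9, 0.1, 0.9]
print(conservation_grades(example_rates))
```

running the same function on the wild type and mutant rate profiles and comparing the two grade lists position by position would mirror the comparison described in the text.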
prediction of binding pockets and druggability: the alterations in potential ligand-binding sites and druggable pockets in the wild type and mutant proteins were analysed using the dogsitescorer (https://proteins.plus/) web server. the following parameters were assessed:
• drug score: the total drug score was obtained by summing up the drug scores for each of the predicted pockets in the wild type and mutant structures. an increase or decrease in the total drug score was taken as an indication of altered pocket conformation and druggability.
• the number of ligand-binding pockets: the number of ligand-binding pockets in the wild type structure was considered as the reference and gain or loss of pockets in the mutant structure was considered for comparison.
post simulation, structures of wild type and mutant pairs of each protein were aligned using the tmalign tool (zhang & skolnick, ) . tmalign performs structural alignment by superimposing structures, producing a pairwise alignment of the individual residues of the proteins. the output is provided in terms of root mean square deviation (rmsd), tmalign score and aligned residues among structures. the rmsd value is critical when assessing structural variation: the higher the rmsd value, the greater the differences among the structures under consideration. in this study, the higher the rmsd between structures, the greater the structural distortion attributed to the mutations. the molecular structures were visualized using chimera. each sequence based and structural parameter analysed as a part of this study was given equal weightage since the dataset was time limiting. a simple binary coding scheme was devised, as follows: an alteration in the value of a parameter from wild type to mutant = - ; when the parametric value was unchanged between wild type and mutant, the value was " ", indicative of a neutral impact of the mutation. 
using this scheme, a matrix was prepared and each mutation was given a cumulative score. the cumulative scores were then arranged in descending order and the mutation with the lowest score (most negative score) was considered to be the most disruptive. determination of protein sequences from genome: using the methods outlined above we identified the wild type and mutant sequences used subsequently (table ). throughout the results and discussion we use the names of the proteins for easier understanding. epitopes harboured in the viral proteins that elicit immune responses in the host can serve as potent vaccine candidates in the near future. linear b cell epitopes were ascertained using two servers, abcpred and lbtope. lbtope organizes the antigenic determinants in descending order of their probability scores. epitopes with greater than % prediction scores were additionally screened for epitopes encircling the alteration site (supplementary table ). table : variation in average score of epitopes for the set of seven proteins using lbtope. for abcpred, the antigenic determinants are ranked with respect to their binding probabilities, from highest to lowest, thus assigning the best binder the first rank and so on. in this work, we analysed the epitopes having any rank within for each protein, enumerated them and computed their average score for wild type and mutant forms (table ) . we observed a change in the number of epitopes and their mean binding score for n protein and nsp proteins. while the number of epitopes remained unaltered, the mean score differed for the orf protein. an epitope holding the alteration site in wild type orf protein had a binding score of . , which was altered to . post mutation. again, in proteins like nsp and n protein, mutation resulted in recognition of new epitopes covering the mutation region, which were not predicted in the wild type proteins. 
no fluctuation in epitope count and score was noted in case of orf a, s protein, nsp and nsp proteins (figure ) table ). finally we summed up all the combined scores of all the epitopes of the individual proteins lying above their respective average score. the data of table y was represented through a histogram ( figure ). propred was used to predict mhc class ii binding t cell epitopes from the set of proteins. the top % of the best scoring epitopes generated from each protein were further shortlisted on the basis of the average score of the epitopes encompassing the mutation region in the wild type protein. the epitopes with probability scores equal to or greater than the average score were screened from the wild type and mutant proteins. the filtered epitopes were then quantitatively and qualitatively compared between normal and mutant forms (supplementary table ). in case of orf a protein, fewer epitopes were found to qualify the average score ( potential changes to stability of the proteins brought about by the mutations were analysed using the i-mutant server. according to the algorithm of the server the range of ddg values for a neutral mutation classification is : - . = . are classified as large increase. thus in our analysis we found that n protein, orf a and s protein could be categorised as having large decrease with orf a having the highest negative value of - . , while nsp had large increase in the value. alterations of nsp and nsp were predicted to be in the neutral change segment (figure ). stable structures were generated using ab initio modeling which were subsequently simulated [ qmeans values were found to differ between the wild type and mutant structures, with significant changes observed in case of s protein, orf a and orf a; in case of nsp the opposite trend was observed, though the difference in qmeans score was not very significant for nsp ( figure ). 
all the fluctuation plots obtained following simulation [ figure to further confirm the variations induced as a result of the mutations we aligned the proteins structurally and distinct differences were observed ( figure ). mutations led to differences in the protein structure of every mutant-wild type pair. the highest rmsd value was observed for s protein ( . ), while the maximum number of mismatches was observed in case of n protein ( out of ). n protein also had a high rmsd of . . the results of structural alignments suggest that mutations altered the protein structures substantially (figure ). the potential drug binding pockets and their druggability were then evaluated, and all the proteins showed alterations not only in the total number of drug binding pockets but also in the average drug scores when wild type and mutant structures were compared. it was interesting to note that, apart from nsp and orf a, all the proteins suffered a decrease in average druggability as a result of the mutations, whereas these two proteins exhibited an increase in potential druggability (figure a and b) . no significant variations were observed in our evolutionary analysis using the consurf server. a simple binary scoring scheme was used for the analysis, in which an alteration in a parameter value was indicated as - (a change from the wild type value) and an unaltered parameter was scored separately (no change from the wild type value). for parameter values which involved decimal numbers, a change was considered only if the first place after the decimal point had been altered (supplementary table ). once the scoring scheme was populated with the corresponding values a heat map was generated to show the contribution of each parameter under study towards the overall cumulative score obtained (figure a) . the ranking of the mutations was then performed using a histogram in which the cumulative scores were plotted (figure b) . 
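the binary scoring scheme above can be sketched in a few lines of python. this is an illustration under stated assumptions, not the authors' implementation: the two score values are taken to be -1 (changed) and 0 (unchanged), the first-decimal rule is implemented by truncation rather than rounding, and the mutation names and parameter values are hypothetical placeholders.

```python
from decimal import Decimal, ROUND_DOWN

def first_decimal(value):
    """Truncate a number to its first decimal place (no rounding)."""
    return Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_DOWN)

def binary_score(wild, mutant):
    """Return -1 if the parameter changed at the first decimal place, else 0."""
    return -1 if first_decimal(wild) != first_decimal(mutant) else 0

def cumulative_scores(table):
    """table maps a mutation name to a list of (wild_value, mutant_value) pairs."""
    return {
        name: sum(binary_score(w, m) for w, m in pairs)
        for name, pairs in table.items()
    }

def rank_most_disruptive(table):
    """Sort mutations so that the most negative cumulative score comes first."""
    return sorted(cumulative_scores(table).items(), key=lambda item: item[1])

# hypothetical parameter values, one (wild type, mutant) pair per parameter
example = {
    "mut_a": [(5.12, 5.18), (0.30, 0.45), (12, 14)],
    "mut_b": [(5.12, 5.13), (0.30, 0.31), (12, 12)],
}
print(rank_most_disruptive(example))
```

because every parameter contributes either -1 or 0, the cumulative score is simply the negated count of parameters that changed, which is why equal weighting of the parameters matters for the final ranking.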
from the histogram we were able to clearly identify the l s mutation in orf a as the most disruptive of the mutations, followed closely by the nsp mutation (orf a v) and the mutations in n protein (s n), orf a (g v) and s protein (d g). the lowest ranked mutations were those of nsp (l f) and nsp (v i). according to kyte et al. ( ), an increased value of hydropathicity indicates the presence of more hydrophobic residues in the protein sequence, and lins et al. ( ) point out that the decrease in accessibility is more pronounced for hydrophobic than hydrophilic residues (kyte & doolittle, ; lins et al., ) . in our analysis s protein and orf a have increased hydropathicity and decreased accessible surface area, which suggests that the mutations have resulted in increased hydrophobicity. salahuddin and khan ( ) have reported that an increase in hydropathicity can be a direct determinant of increased pathogenicity for viral proteins. this is supported by the observations of sur et al. ( ). both of these works involve analysis of mutations and genome variations in flu viral genomes. s protein acts as a cellular receptor to induce the fusion of viral and host membranes (tan et al., ; tan et al., ) . it is important for viral entry into the host cell and for inducing host immune responses. sequence comparison of isolates from different infections shows that both viral s protein and orf a are positively selected during evolution (yin, ), implying that orf a plays an important role in the virus life-cycle and disease development. also, according to linding et al. ( ), protein disorder is important for understanding protein function and protein folding pathways (linding et al., ) . in our study, the orf a protein exhibits decreased disorder in its mutant form, which points to a change in structural conformation and rigidity. 
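the hydropathicity comparison discussed above rests on the grand average of hydropathy (gravy), which is the sum of the kyte and doolittle hydropathy values of all residues divided by the sequence length. the sketch below shows how a single substitution shifts that value. the toy peptide and the helper names are hypothetical; only the hydropathy scale and the d to g substitution type come from the literature and the text.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def substitute(sequence, position, wild, mutant):
    """Apply a point substitution at a 1-based position, checking the wild type residue."""
    if sequence[position - 1] != wild:
        raise ValueError(f"expected {wild} at position {position}")
    return sequence[:position - 1] + mutant + sequence[position:]

wild_type = "ACDLK"                          # hypothetical toy peptide
mutant = substitute(wild_type, 3, "D", "G")  # aspartate to glycine
print(round(gravy(wild_type), 2), round(gravy(mutant), 2))
```

replacing a strongly hydrophilic residue such as aspartate with glycine raises the gravy value, which is the direction of change the text reports for the s protein.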
in our study, the maximum change in accessible surface area was observed in s protein, whereas in orf a the intrinsic disorder was observed to decrease significantly, which may affect the functional interaction of s protein with orf a. ikai ( ) defined the aliphatic index as the relative volume of a protein occupied by the aliphatic side chains (alanine, valine, isoleucine, and leucine) of its amino acids, which correlates with increased thermostability of proteins (ikai, ) . in our analysis nsp , nsp and orf a were observed to show an increase in aliphatic index value from wild type to mutant, which suggests a probable increase in the thermostability of these proteins due to mutation. interestingly, orf a protein has a significant role in the functioning of s protein in the host system to increase infection, which may be affected by the increased thermostability of orf a. momen-roknabadi et al. ( ) predicted in their study that accessible surface area can be used to improve the prediction of the secondary structure of a protein (momen-roknabadi et al., ) . according to lins et al ( ), different secondary structures correspond to different accessibility of residues. in their study, they found that random coils and beta sheets were the more accessible folds, with average % accessibility, while alpha helices have % accessibility. in our analysis, we found a similar phenomenon, where loss of beta sheets and conversion of random coils to alpha helices led to decreased accessible surface area and loss of alpha helices caused an increase in accessible surface area. induction of adaptive immunity is largely dependent on recognition of viral epitopes by host b and t cell receptors. site specific mutations in the mentioned set of seven proteins from sars-cov were extensively analyzed for their effects on epitope recognition and binding efficiencies. 
while gain of function mutations can improve the binding affinities of epitopes to host lymphocyte receptors, loss of function mutations arising through positive selection can be crucial for b and t cell immune escape (ramaiah et al., ) . each of the seven mutant proteins investigated exhibited a change either in free energies of binding or in recognition by specific alleles, thus necessitating further exploration to evaluate their impact on immune escape and viral persistence. in addition, changes in the number of linear b-cell epitopes were found in proteins like nsp and n protein. in our study, we observed an epitope covering the mutation region that was recognized by a supertype in the wild type n protein, but no corresponding epitope was detected by the same supertype in the mutant n protein (loss of function mutation); in contrast, we found a mutation spanning epitope in the mutant nsp protein recognized by a supertype, which was not detected by the same supertype in the wild type nsp protein (gain of function mutation). another significant finding was the decline in the number of mhc ii binding t-cell epitopes in the orf and s protein, where the mutant protein was found to have and fewer immunogenic determinants respectively. each of the seven proteins was assigned a score of either '- ' or ' ' for each of the four computational tools used for epitope prediction, where '- ' corresponds to any change in the number or binding efficacy of antigenic determinants that may have arisen because of mutation and ' ' corresponds to no change between wild type and mutant forms. the rmsf value measures the deviation between the positions of particle i and some reference position. rmsd and rmsf are distinct measures: rmsf is averaged over time, giving a value for each particle i, while for rmsd the average is taken over the particles, giving time specific values. 
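the distinction between the two averages can be made concrete with a short sketch. it assumes a trajectory stored as a list of frames, each frame being a list of (x, y, z) coordinates, and it assumes the frames have already been superimposed on the reference, since no fitting step is performed here. the mean position of each particle is used as the rmsf reference, which is a common choice but an assumption about the analysis. the function names and toy coordinates are hypothetical.

```python
import math

def rmsd_per_frame(traj, ref):
    """RMSD: average over particles, giving one value per frame (time point)."""
    values = []
    for frame in traj:
        squared = sum(
            (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
            for (x, y, z), (rx, ry, rz) in zip(frame, ref)
        )
        values.append(math.sqrt(squared / len(ref)))
    return values

def rmsf_per_particle(traj):
    """RMSF: average over time, giving one value per particle."""
    n_frames = len(traj)
    values = []
    for i in range(len(traj[0])):
        # time-averaged position of particle i serves as its reference
        mean = [sum(frame[i][k] for frame in traj) / n_frames for k in range(3)]
        squared = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in traj
        )
        values.append(math.sqrt(squared / n_frames))
    return values

# toy trajectory: 2 frames, 2 particles; only the second particle moves
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsd_per_frame(trajectory, trajectory[0]))
print(rmsf_per_particle(trajectory))
```

the first list has one entry per frame and the second has one entry per particle, which is the difference between the two measures that the text describes.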
zhao et al ( ) have probed into the effect of mutations in the zinc binding b box domain protein (zhao et al., ) . in their study they looked into the changes in the dynamics of the backbone atoms and also calculated root mean square fluctuation (rmsf) values at each time point of the trajectories of the native and mutant structures. they found that mutations induced higher rmsf, which led them to suggest that the much larger side-chain of valine extends its steric hindrance to other zinc-binding residues. supporting the increased flexibility, they also measured the rmsf values for the second zinc-binding residues (c , c , h , h ), which led to the final conclusion that larger rmsf values indicate increased random motions of residues, leading to disruption of the structural integrity of the proteins. in our analysis we have observed a similar phenomenon, with the increase in rmsf values in mutant structures ultimately causing randomness in the d structures, as evidenced by their increased rmsd and more negative q mean scores. the significance of q mean scores has been extensively reported by de carvalho and de mesquita ( ) in their work on human superoxide dismutase, where they compared wild type and mutant structures and reported the effect of mutations (de carvalho & de mesquita ). using the tm align and q mean methodology, datta et al ( ) have explored the effect of single nucleotide polymorphisms of the ribonuclease l gene (rnasel) in prostate cancer (datta et al., ) . all the snps were mapped on the protein structure and tm align was used to classify the most disrupting mutants. in our analyses we have been able to clearly show that mutations caused significant changes in protein structures, where the highest rmsd value was observed for s protein ( . ), while the maximum number of mismatches was observed in case of n protein ( out of ). n protein also had a high rmsd of . . 
thus from the structural comparisons of wild type and mutant proteins we were able to understand that almost all the mutations destabilised the native conformation. mutations in all the proteins were first matched with the alteration data and it was found that the direct effect of the mutated residue was associated with the changes in the epitope site predictions (table ). they were also associated with alterations in the predicted accessible surface area values, implying that the mutations resulted in changes in both structural and functional aspects of the protein structures. the exposure-predominant type (likely introduced early in the assay) (huskova et al., ) . a score of was added if the mutation was in a known human hotspot, if it was truncating or affected a splice site and so on. in our scoring scheme we prioritised the ability of the mutation to cause deviation in the structure of the mutant from the wild type and assigned scores as described in the materials and methods section. our final scoring revealed that the orf protein mutation l s was the most disruptive of the mutations under study. incidentally, the l s mutation was also the most prevalent of the mutations under study, as it was present in four different clades: b, b , b and b . incidentally bhattacharya et al ( ) ( ), canada ( ), and usa ( ); b has been identified from china ( ) and japan ( ); while b finds its prevalence in australia ( ), china ( ) and usa ( ). these data suggest that this mutation was one of the earliest of the mutations that could have occurred naturally, as it does not exhibit any population specificity and may be attributed to a random event. this finding was also consistent with the predictions made by the i-mutant server, which had assigned the most negative value to l s and classified it as the most disruptive. 
this data justifies our scoring scheme and the parameters considered towards the prediction of the most disruptive mutation and thus can be explored further for future studies. recent works suggest that the lowering of the stability of viral proteins as a result of accumulation of mutations (banerjee et al., ) could be the reason behind the low national fatality rate in india. it may also be a phenomenon similar to muller's ratchet where this accumulation of mutations in fresh strains of the virus and lack of stabilising recombination events may result in the reduction in fitness of the virus as is being indicated by the effect of the mutations on protein stability.
gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers
consurf : an improved methodology to estimate and visualize evolutionary conservation in macromolecules
in vivo half-life of a protein is a function of its amino-terminal residue
spike protein mutational landscape in india: could muller's ratchet be a future game-changer for covid-
protherm, version . : thermodynamic database for proteins and mutants
gromacs - a parallel computer for molecular-dynamics simulations
global spread of sars-cov- subtype with spike protein mutation d g is shaped by human genomic variations that regulate expression of tmprss
the coronavirus is mutating - does it matter?
i-mutant . : predicting stability changes upon mutation from the protein sequence or structure
functional and structural consequences of damaging single nucleotide polymorphisms in human prostate cancer
structural modeling and in silico analysis of human superoxide dismutase
furin cleavage of the sars coronavirus spike glycoprotein enhances cell-cell fusion but does not affect virion entry
cd-hit: accelerated for clustering the next-generation sequencing data
the proteomics protocols handbook
calculation of protein extinction coefficients from amino acid sequence data
swiss-model and the swiss-pdbviewer: an environment for comparative protein modeling
correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence
expression of soluble heterologous proteins via fusion with nusa protein
receptor-binding domain of sars-cov spike protein induces highly potent neutralizing antibodies: implication for developing subunit vaccine
in vitro using barrier bypass-clonal expansion assays and massively parallel sequencing
thermostability and aliphatic index of globular proteins
in silico cd +, cd + t-cell and b-cell immunity associated immunogenic epitope prediction and hla distribution analysis of zika virus
mafft multiple sequence alignment software version : improvements in performance and usability
the architecture of sars-cov- transcriptome
netsurfp- . : improved prediction of protein structural features by integrated deep learning
structural and epitope analysis (t and b-cell epitopes) of hepatitis c virus (hcv) glycoproteins: an in silico approach
a simple method for displaying the hydropathic character of a protein
large-scale validation of methods for cytotoxic t-lymphocyte epitope prediction
structure, function, and evolution of coronavirus spike proteins
cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
protein disorder prediction: implications for structural proteomics
analysis of accessible surface of residues in proteins
interaction between heptad repeat and regions in spike protein of sars-associated coronavirus: implications for virus fusogenic mechanism and identification of fusion inhibitors
comparison of site-specific rate-inference methods for protein sequences: empirical bayesian methods are superior
geographic and genomic distribution of sars-cov- mutations
iupred a: context-dependent prediction of protein disorder as a function of redox state and protein binding
impact of residue accessible surface area on the prediction of protein secondary structures
computational prediction of usutu virus e protein b cell and t cell epitopes for potential vaccine development
ace expression is increased in the lungs of patients with comorbidities associated with severe covid- . medrxiv : the preprint server for health sciences
rate site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues
evidence for highly variable, region-specific patterns of t-cell epitope mutations accumulating in mycobacterium tuberculosis strains
i-tasser: a unified platform for automated protein structure and function prediction
revisiting the dangers of the coronavirus in the ophthalmology practice
generation of tissue-specific and promiscuous hla ligand databases using dna microarrays and virtual hla class ii matrices
in silico analysis of evolution in swine flu viral genomes through re-assortment by promulgation and mutation
characterization of viral proteins encoded by the sars-coronavirus genome
a novel severe acute respiratory syndrome coronavirus protein, u , is transported to the cell surface and undergoes endocytosis
on the origin and continuing evolution of sars-cov-
estimates of the severity of coronavirus disease : a model-based analysis. the lancet. infectious diseases
molprobity: more and better reference data for improved all-atom structure validation
middle east respiratory syndrome coronavirus (mers-cov)
update - sars case fatality ratio
genotyping coronavirus sars-cov- : methods and implications
characterization of the a protein of sars-associated coronavirus in infected vero e cells and sars patients
tm-align: a protein structure alignment algorithm based on the tm-score
molecular dynamics simulation reveals insights into the mechanism of unfolding by the a t/v mutations within the mid zinc-binding bbo -x domain
supplementary table : total residue specific values of asa and disorder
supplementary table : scoring table for impact assessment
supplementary figure : images of ramachandran analysis for wild type and mutant structures
the authors acknowledge the contributions of dr. 
souvik mukherjee, national institute of biomedical genomics, kalyani, west bengal for his insightful discussions regarding the data acquisition techniques as well as overall comments during the preparation of the manuscript. key: cord- -t pu z authors: zuo, j; dowell, a; pearce, h; verma, k; long, hm; begum, j; aiano, f; amin-chowdhury, z; hallis, b; stapley, l; borrow, r; linley, e; ahmad, s; parker, b; horsley, a; amirthalingam, g; brown, k; ramsay, me; ladhani, s; moss, p title: robust sars-cov- -specific t-cell immunity is maintained at months following primary infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t pu z the immune response to sars-cov- is critical in both controlling primary infection and preventing re-infection. however, there is concern that immune responses following natural infection may not be sustained and that this may predispose to recurrent infection. we analysed the magnitude and phenotype of the sars-cov- cellular immune response in donors at six months following primary infection and related this to the profile of antibody level against spike, nucleoprotein and rbd over the previous six months. t-cell immune responses to sars-cov- were present by elispot and/or ics analysis in all donors and are characterised by predominant cd + t cell responses with strong il- cytokine expression. median t-cell responses were % higher in donors who had experienced an initial symptomatic infection indicating that the severity of primary infection establishes a ‘setpoint’ for cellular immunity that lasts for at least months. the t-cell responses to both spike and nucleoprotein/membrane proteins were strongly correlated with the peak antibody level against each protein. the rate of decline in antibody level varied between individuals and higher levels of nucleoprotein-specific t cells were associated with preservation of np-specific antibody level although no such correlation was observed in relation to spike-specific responses. 
in conclusion, our data are reassuring that functional sars-cov- -specific t-cell responses are retained at six months following infection although the magnitude of this response is related to the clinical features of primary infection. the sars-cov- pandemic has led to over million deaths to date and there is an urgent need for an effective vaccine ( ) . there is considerable interest in understanding how adaptive immune responses act to control acute infection and provide protection from reinfection. antibody responses against sars-cov- are characterised by responses against a range of viral proteins, including the spike, nucleocapsid and membrane proteins. a number of studies, however, have shown that the level of this antibody response declines over time and may even lead to loss of detectable virus-specific antibodies in a substantial proportion of individuals ( , ) . information derived from study of immunity to related viruses such as sars-cov- and mers ( ) has shown that cellular immune responses against these viruses are maintained for much longer periods of time compared to antibody responses ( , ) . this has led to the hope that cellular responses to sars-cov- will similarly be of more prolonged duration ( , ) . studies to date have shown that virus-specific cellular responses develop in virtually all patients with confirmed sars-cov- infection ( ) . these responses remain detectable for several weeks following infection but it is currently unknown how they are maintained thereafter ( ) . in this study we characterised sars-cov- -specific t cell immune responses in a cohort of donors at -months post-infection. blood samples were obtained from convalescent donors at months following initial sars-cov- infection in march-april . among the donors, ( %) were female and ( %) were male with a median age of . years ( - years) . none of the donors required hospitalisation at any time during the course of the study. 
( female and male) of the donors who experienced clinical symptoms of respiratory illness were grouped as "symptomatic" and ( female and male) who did not experience any respiratory illness were grouped as "asymptomatic". there was no significant difference between the median age of the symptomatic ( . ( - ) years) and asymptomatic donors ( ( - ) years). interferon gamma (ifn-g) elispot analysis was used to determine the magnitude of the global sars-cov- -specific t cell response, with peptide pools from a range of viral proteins, including spike, nucleoprotein and membrane protein, used to stimulate fresh pbmc. median elispot responses against the spike glycoprotein (spike); nucleoprotein and membrane (n/m); and orf a, orf , nsp , nsp a/b (accessory) peptide pools were measured at in , ( . %), , ( . %) and , ( . %) pbmc respectively ( figure a ). using the pre- healthy donor pbmcs to set the cut-off point, of donors ( %) demonstrated a sars-cov- -specific t-cell response to at least one protein with a median total value of cells per million pbmc ( in ) ( figure a ). eighteen donors did not have a demonstrable cellular response to spike and no response to the n/m pool was seen in individuals. no detectable response to any protein tested was seen in donors by elispot assay although all these donors responded by parallel intracellular cytokine analysis ( figure b ). considerable heterogeneity was observed in relation to the magnitude of this response. the global and peptide-specific responses were then assessed in relation to the clinical features at the time of primary infection. importantly, median aggregate elispot responses were % higher in donors who had initially demonstrated symptomatic disease compared to those with asymptomatic infection (figure a ). 
this profile was consistent against both spike and aggregate n/m proteins where values were % and % higher respectively in donors with initial symptomatic infection ( figure b ). no association was seen between elispot response and donor age. intracellular cytokine analysis was then utilised to assess the specificity and pattern of cytokine production from sars-cov- -specific cd + and cd + t-cells in donors. virus-specific cytokine responses were seen in people, including the individuals that had been negative by elispot analysis ( figure a ). interestingly, cd + virus-specific t cell responses were twice as frequent as cd + responses at this six-month time point ( . % of cd + pool vs . % of cd + pool respectively) ( figure b ). in particular, mean cd + responses against spike and non-spike (n/m/accessory) proteins were measured at . % and . % of the cd + repertoire respectively whilst corresponding values for cd + cells were . % and . % ( figure b ). no differences were observed in the virus-specific cd :cd ratio in relation to demographic factors such as age, symptomatic disease or gender. as expected, the pattern of cytokine production differed between the cd + and cd + subsets ( figure a, b) . it was noteworthy that il- responses were dominant within the cd + subset ( figure b ). of note, the pattern of cytokine production by virus-specific cd + t cells was dependent on antigenic specificity. single ifn-g, single il- and dual positive il- +ifn-g+ t cells comprised . %, . % and . % of the spike-specific cd + t cell response respectively, compared to . %, . % and . % of the non-spike-specific repertoire ( figure c ). the magnitude of cd + t cell responses against spike and non-spike proteins within each individual was strongly correlated ( figure d ). however, this association was less marked for the cd + subset where responses were dominant against non-spike proteins ( figure d ). 
we next assessed how the magnitude, phenotype and cytokine profile of the virus-specific cellular immune response at six months correlated with the prospective profile of antibody production in the six months since infection. antibody levels against both the spike glycoprotein and nucleoprotein were available at serial time points from all donors ( figure a ). these were used to define both the peak value of antibody level against each protein and the rate of decline in antibody level over the subsequent two months. peak antibody levels against both spike and nucleoprotein were observed typically at the second month of sampling ( figure a ). antibody levels fell by approximately % during the two months after peak level but stabilised somewhat thereafter although spike-specific responses continued to decline ( figure a ). interestingly, the magnitude of the t cell elispot response at months against the spike protein was strongly correlated with magnitude of the peak antibody level against both spike protein and the rbd domain ( figure b ). a similar correlation was observed between the cellular response to the n/m pool and the peak level of n-specific antibody ( figure b ). the rate of antibody decline was then assessed in relation to the profile of the cellular immune response at months. relative preservation of the n-specific antibody response was seen in donors with stronger n and m-specific t cell responses at six months suggesting the cellular responses may act to support antibody production ( figure c ). however, no such association was observed in relation to spike-specific responses. finally, we also assessed the profile of cxcr expression on virus-specific t cells and related this to the pattern of stability of the virus-specific antibody response as positive correlations have been observed previously in hiv infection ( ) . 
high numbers of circulating tfh cd + t cells have been seen in severe acute infection ( ) but at months cxcr was expressed on only % of virus-specific cd + t cells and no correlation was observed with the profile of ab level following infection. the magnitude and quality of the immune memory response to sars-cov- will be critical in preventing reinfection. here we undertook, to our knowledge, the first assessment of the sars-cov- -specific t cell immune response at six months following primary infection in a unique cohort of healthy adults with asymptomatic or mild-to-moderate covid- . the major finding was that virus-specific t cells were detectable in all donors at this extended follow-up period. a high prevalence of detectable t-cell immunity has been described in studies performed at earlier time points after infection and our findings indicate that robust memory is maintained for at least months. approximately in pbmc were sars-cov- -specific which is broadly comparable to findings within the first three months after infection. these values are lower than typical responses against persistent herpesviruses ( ) but comparable to those against acute respiratory viruses, including sars-cov ( , ) . the magnitude of the t cell response was heterogeneous and may reflect the reports of remarkable diversity in the profile of the t cell immune response during acute infection ( ) . a striking feature was that the magnitude of cellular immunity was considerably higher in donors who had experienced symptomatic disease. indeed, median elispot responses were % higher within this group and demonstrate that the initial relative 'setpoint' of cellular immunity established following acute infection is maintained for at least months. a similar pattern has been observed within the first few weeks following acute sars-cov- infection in patients recovering from severe versus mild disease ( ) and also in patients after sars infection ( ) . 
this is likely to reflect a response to higher viral loads and inflammatory mediators during acute infection ( , ) although it is also possible that an elevated adaptive immune response during primary infection can itself act as a determinant of the clinical phenotype ( ) . cellular responses have a direct protective effect against severe coronavirus infection ( ) and also support antibody production. indeed, cytokine analysis showed that the cd +il- + subset was most significantly elevated in the symptomatic group. the finding of lower levels of t cell immunity in asymptomatic donors at months after infection might potentially add to concerns that this group may be more susceptible to later re-infection. however, it is also possible that the quality of the t cell response at the time of initial infection was sufficient to provide clinical protection. the relative susceptibility of patients with initial asymptomatic disease to episodes of re-infection, either clinically silent or symptomatic, will therefore need to be assessed over time. it was noteworthy that cd t cell responses against sars-cov- outnumbered cd effector cells by a ratio of to . again, a similar pattern has been demonstrated at earlier time points after sars-cov- infection and may reflect high levels of viral protein uptake by antigen-presenting cells and cross presentation to the cd + t-cell pool or preferential expansion of cd + t cells ( ) . furthermore, cytokine analysis showed that il- was the major cytokine produced by virus-specific cd + cells, indicating a proliferative potential which may augur well for long-term immune memory ( ) . ifn-g responses are broadly equivalent to il- at early time points after infection ( ) but the profile at months suggests that the relative proportion of th effector cells decreases over time or that they revert to a central memory state ( ) . 
polyfunctional t cells are typically associated with superior pathogen control ( ) and studies on sars-cov- infections have revealed decreased cytokine functionality in patients with severe disease ( ) . the majority of cd + t cells at six months expressed only a single cytokine and production of three or four cytokines was observed in < % of cells. these results were consistent with comprehensive analysis of the cytokine profile released by sars-cov- specific t cells in supernatants from the elispot assay which showed that il- was consistently the dominant cytokine. interestingly, low levels of il- were released in response to all the peptide pools and as il- production has been reported by subsets of murine influenza and coronavirus specific t cells ( , ) these represent an interesting population of cells for future investigation. low and variable concentrations of il- and tnfα were also detected. of note, the pattern of cytokine production by cd + t cells varies with protein specificity, as seen in earlier reports ( ) . single il- or ifn-g producing cells were predominant against both spike and structural proteins but the former population was significantly greater in the cd + response against non-spike proteins, indicating that a retained th effector profile is more common within spike-specific t cell pool. these dual-positive populations are associated with elite immunological control of hiv infection and indicate that spike-specific t cells preferentially retain characteristics of both effector function and proliferative potential in vivo. the expression of cxcr on cd + t cells has been correlated with the magnitude and persistence of humoral immunity in the setting of hiv infection ( ) . 
we, however, observed that cxcr was expressed on only % of virus-specific cd + t cells, suggesting that circulating virus-specific follicular helper cells are not sustained after infection, and no clear relationship was noted between cxcr expression and either the magnitude or the rate of decline of sars-cov- -specific antibody level. findings in acute infection have also failed to correlate ctfh frequencies with the plasmablast response and suggest that non-cxcr + cd + t cell help may also operate ( ). one of the valuable features of our cohort was the availability of monthly antibody levels against the spike and nucleoproteins in the first six months after infection. the finding that higher t cell responses at months against n/m proteins correlated with slower decline in n-specific antibody levels indicates that vaccine approaches that elicit strong cellular immune responses against this protein are likely to be valuable for sustaining stable antibody responses. in contrast, t-cell responses against spike were not related to the rate of decline of antibodies against that protein. nevertheless, spike protein-specific cellular responses were present in > % of individuals at months after mild to moderate infection, and spike is also recognised as an immunodominant protein following sars-cov- infection ( ) . spike glycoprotein is the major immunogen used in current vaccine trials and these findings indicate that strong and sustained spike-specific t-cell immunity is likely to be required to sustain immune protection and should be assessed in analysis of optimal vaccine strategies. our finding that t cell responses against m/n proteins are as high as spike responses at months after natural infection suggests that these proteins could also represent valuable components of future vaccine strategies. 
our findings demonstrate that robust cellular immunity against sars-cov- is likely to be present within the great majority of adults at six months following asymptomatic and mild-to-moderate infection. these features are encouraging in relation to the longevity of cellular immunity against this novel virus and are likely to contribute to the relatively low rates of reinfection that have been observed to date ( ) . further studies will be required to assess how these immune responses are maintained over the longer term. mean spot counts for negative control wells were subtracted from the mean of test wells to generate normalised readings, which are presented as spot forming cells per million input pbmc (sfc/ pbmc). freshly isolated pbmc were rested overnight prior to the assay. . x following overnight peptide stimulation in elispot assays ul of supernatant was removed and combined from two duplicate wells and cryopreserved at - °c. supernatants from eleven donors responding in the elispot assay were profiled using a -plex legendplex t helper cytokine panel version (biolegend) following the manufacturer's instructions. cytokine beads were analysed on a bd lsr ii flow cytometer (bd biosciences, san jose, ca, us). data were analysed with legendplex software (biolegend) and the average cytokine level determined from two duplicate samples. statistical analysis was performed with graphpad prism . a mann-whitney -tailed u test was used to compare variables between two groups, a wilcoxon matched-pairs signed rank test was used to compare paired non-parametric data, and a friedman test with dunn's multiple comparisons test was used to compare non-parametric data between more than groups. correlations were performed via spearman's rank correlation coefficient. two-way anova with dunnett multiple comparisons test was used to determine significance of cytokine profile data. statistical significance was determined as *p< . , **p< . , ***p< . and ****p< . . 
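the background subtraction and scaling described in the elispot methods above can be illustrated with a minimal python sketch. it is not the authors' analysis code: the function name, the example well counts and the number of pbmc per well are hypothetical (the seeding density is not recoverable from the text), and flooring negative values at zero is an assumption rather than something the methods state.

```python
from statistics import mean


def normalised_sfc_per_million(test_wells, negative_wells, cells_per_well):
    """Return background-subtracted spots scaled to one million input PBMC.

    test_wells / negative_wells: spot counts from replicate wells.
    cells_per_well: PBMC seeded per well (hypothetical parameter; the
    value used in the study is not given in the text).
    """
    if cells_per_well <= 0:
        raise ValueError("cells_per_well must be positive")
    # Mean of negative-control wells is subtracted from the mean of test wells.
    net_spots = mean(test_wells) - mean(negative_wells)
    # Assumption: readings below background are reported as zero.
    net_spots = max(net_spots, 0.0)
    return net_spots * 1_000_000 / cells_per_well


# Hypothetical duplicate wells, each seeded with a hypothetical 250,000 PBMC.
example = normalised_sfc_per_million([60, 64], [2, 2], 250_000)
below_background = normalised_sfc_per_million([1, 1], [3, 3], 250_000)
```

keeping the cell input as an explicit argument means the same function applies whatever seeding density a given plate used.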
clinical features of patients infected with novel coronavirus in wuhan rapid decay of anti-sars-cov- antibodies in persons with mild covid- . the new england journal of medicine clinical and immunological assessment of asymptomatic sars-cov- infections sars a comparative overview persistence of antibodies against middle east respiratory syndrome coronavirus duration of antibody responses after severe acute respiratory syndrome sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls recovery from the middle east respiratory syndrome is associated with antibody and t-cell responses targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals healthcare workers with mild / asymptomatic sars-cov- infection show t cell responses and neutralising antibodies after the first wave circulating cxcr (+)cxcr (+)pd- (lo) tfh-like cells in hiv- controllers with neutralizing antibody breadth cell transcriptomic analysis of sars-cov- reactive cd (+) t cells. 
ssrn enzyme-linked immunosorbent spot assay but not cmv quantiferon assay is a novel biomarker to determine risk of congenital cmv infection in pregnant women epitope specific t-cell responses against influenza a in a healthy population characterization of sars-covspecific memory t cells from recovered individuals years after infection deep immune profiling of covid- patients reveals distinct immunotypes with therapeutic implications broad and strong memory cd + cd + t cells induced by sars-cov- in uk convalescent individuals following covid- adaptive immunity to sars-cov- viral load dynamics and disease severity in patients infected with sars-cov- in zhejiang province, china robust t cell response toward spike, membrane, and nucleocapsid sars-cov- proteins is not associated with recovery in critical covid- patients virus-specific memory cd t cells provide substantial protection from lethal severe acute respiratory syndrome coronavirus infection phenotype and kinetics of sars-cov- -specific t cells in covid- patients with acute respiratory distress syndrome phenotypic, functional, and kinetic parameters associated with apparent t-cell control of human immunodeficiency virus replication in individuals with and without antiretroviral treatment antigen-specific adaptive immunity to sars-cov- in acute covid- and associations with age and disease severity central memory and effector memory t cell subsets: function, generation, and maintenance t-cell quality in memory and protection: implications for vaccine design effector t cells control lung inflammation during acute influenza virus infection by producing il- highly activated cytotoxic cd t cells express protective il- at the peak of coronavirus-induced encephalitis t cell responses to whole sars coronavirus in humans what reinfections mean for covid- . 
the lancet infectious diseases serological surveillance of sars-cov- : trends and humoral response in a cohort of public health workers spice: exploration and analysis of post-cytometric complex multivariate datasets this work was partly funded by ukri/nihr through the uk coronavirus immunology key: cord- -wgt kg f authors: diego-martin, borja; gonzález, beatriz; vazquez-vilar, marta; selma, sara; mateos-fernández, rubén; gianoglio, silvia; fernández-del-carmen, asun; orzáez, diego title: pilot production of sars-cov- related proteins in plants: a proof of concept for rapid repurposing of indoors farms into biomanufacturing facilities date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wgt kg f the current covid- crisis is revealing the strengths and the weaknesses of the world’s capacity to respond to a global health crisis. a critical weakness has resulted from the excessive centralization of the current biomanufacturing capacities, a matter of great concern, if not a source of nationalistic tensions. on the positive side, scientific data and information have been shared at an unprecedented speed fuelled by the preprint phenomena, and this has considerably strengthened our ability to develop new technology-based solutions. in this work we explore how, in a context of rapid exchange of scientific information, plant biofactories can serve as a rapid and easily adaptable solution for local manufacturing of bioreagents, more specifically recombinant antibodies. for this purpose, we tested our ability to produce, in the framework of an academic lab and in a matter of weeks, milligram amounts of six different recombinant monoclonal antibodies against sars-cov- in nicotiana benthamiana. for the design of the antibodies we took advantage, among other data sources, of the dna sequence information made rapidly available by other groups in preprint publications. 
mabs were all engineered as single-chain fragments fused to a human gamma fc and transiently expressed using a viral vector. in parallel, we also produced the recombinant sars-cov- n protein and its receptor binding domain (rbd) in planta and used them to test the binding specificity of the recombinant mabs. finally, for two of the antibodies we assayed a simple scale-up production protocol based on the extraction of apoplastic fluid. our results indicate that gram amounts of anti-sars-cov- antibodies could be easily produced in little more than weeks in repurposed greenhouses with little infrastructure requirements using n. benthamiana as production platform. similar procedures could be easily deployed to produce diagnostic reagents and, eventually, could be adapted for rapid therapeutic responses. the current pandemic is exposing several weaknesses in our ability to respond to a global crisis, one of which is the insufficient and heavily centralized distribution of the world manufacturing capacity of bioproducts such as antibodies, vaccines and other biological reagents, especially proteins. since it is economically impracticable to ensure readiness by maintaining inactive infrastructures during long periods of normality, the development of dual-use systems has been proposed, which would serve regular production needs in normal times but could be rapidly repurposed to strategic manufacturing requirements in times of crisis. ideally, such adaptable infrastructures should be widespread to serve local demand in case of emergency. recombinant protein production in plants is a technologically mature bioengineering discipline, with most current plant-based bioproduction platforms making use of non-food crops, mainly the nicotiana species tabacum and n. benthamiana as biomanufacturing chassis (moon et al., ; capell et al., ) . n. 
benthamiana is most frequently used in association with agrobacterium-mediated transient expression, also known as agroinfiltration, a technology that dramatically reduces the time required for product development. briefly, agroinfiltration consists of the massive delivery of an agrobacterium suspension culture to the intercellular space of plant leaves, either by pressure, using a syringe (small scale), or by applying vacuum to plants whose aerial parts have been submerged in a diluted agrobacterium culture (large scale). agrobacterium transfers its t-dna to the cell nucleus, thereby massively reprogramming the plant cell machinery towards the synthesis of the t-dna-encoded protein(s)-of-interest. transient expression of the transgene is often assisted by self-replicating deconstructed virus vectors that amplify the transgene dose, thus boosting protein production by several orders of magnitude (gleba et al., ) . other systems, such as the peaq system, rely on viral genetic elements for boosting expression without resorting to viral replication (sainsbury et al., ). transient expression in n. benthamiana has become the standard in plant-based recombinant protein production due to a unique combination of advantages, with speed and high yield as the most obvious ones. maximum production levels in the g/kg fresh weight (fw) range for certain highly stable proteins such as antibodies have been reported (marillonnet et al., ) . regarding speed, the in-planta incubation times required to obtain maximum yield of recombinant protein are no more than two weeks. an important, often insufficiently highlighted feature of n. benthamiana transient expression is its relatively small infrastructure requirements, partially overlapping with those employed in more conventional, medium/high-tech indoor agriculture, such as hydroponics, vertical farming, etc. (buyel, ) . 
in this context, when confronted with the covid- crisis, we decided to partially reorient the activities in our academic lab, which is equipped with a multipurpose glass greenhouse facility, towards the production of sars-cov- antigens and antibodies against the virus. here we describe the recombinant production, purification and analysis of six anti-sars-cov- monoclonal antibodies at laboratory scale, plus a pilot upscaling of two of those six antibodies. next to production scale, a critical parameter to assess was the response time. the process described here started in mid-april with the selection of literature-available antibody variable sequences and concluded nine weeks later with approximately . g of anti-sars-cov- antibody (ab) produced in modular batches of n. benthamiana plants and formulated as one litre of ab-enriched plant apoplastic fluid. based on this experience, we estimate that the same process can be reduced to - weeks with small pre-adaptations, a remarkably short reaction time for a de novo antibody production system. absolutely key for this fast reaction is the immediate availability of scientific data including antibody sequences in pre-print repositories. this is in our opinion one of the most positive lessons that can be extracted from the covid- crisis. we discuss here the possible applications of the rapidly plant-produced antigens and antibodies in diagnostics and therapy and propose repurposing high-tech agricultural facilities as an alternative for decentralized biomanufacturing in times of crisis. both nicotiana benthamiana wild type plants and , -xylosyltransferase/alpha , fucosyltransferase (Δxt/ft) rnai knock down lines (strasser et al., ) were grown in the greenhouse. growing conditions were °c (light)/ °c (darkness) with a -h-light/ -h-dark photoperiod. all sequences were cloned and assembled using the goldenbraid (gb) assembly system (https://gbcloning.upv.es). 
antibody sequences were obtained from the literature (see table ). all antibodies were cloned as single chain antibodies fused to the human igg fc domain. those antibodies derived from synthetic or camelid single domain vhh libraries (sybody , sybody and nanobody ) were designed as direct fusions. the cr , cr and cr human monoclonal antibodies were redesigned as single chain variable fragments (scfv) by connecting the variable light (vl) and heavy (vh) chains with a ggggsggggsggggssgggs peptide linker. antibody sequences were codon optimised for n. benthamiana with the idt optimization tool at http://eu.idtdna.com/codonopt. the sars-cov- antigen sequences used (n protein, yp_ . ; and s-protein rbd domain, yp_ . , aa - ) derive from the wuhan strain nc_ . four different versions of rbd were designed, corresponding to (i) the native sequence with a c-terminal xhis-tag or (ii) an n-terminal xhis-tag and a c-terminal kdel sequence for er retention, and (iii and iv) their corresponding n. benthamiana codon optimised counterparts, using the same tool as above. dna sequences were domesticated as level phytobricks for gb cloning and ordered for synthesis as double-stranded dna fragments (gblocks, integrated dna technologies). gblocks were first cloned into the domestication vector pupd (vazquez-vilar et al., ) in a bsmbi golden gate restriction/ligation reaction ( °c - min, x ( °c - min / °c - min), °c - min, °c - min). the ligation product was transformed into e. coli top electrocompetent cells and positive clones were verified by restriction digestion analysis and sequencing. pupd level phytobricks were then cloned into the expression vectors pgreen sp-higg (antibody sequences), pcambiav (rbd sequences) or pcambiav (n sequences).
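the scfv redesign and the domestication step described above can be illustrated with a minimal python sketch. it is not the authors' pipeline: the function names are hypothetical, the example sequences are short placeholders rather than real antibody sequences, and the vh/vl order is left as a parameter because the text does not fix it. the only facts taken as given are the linker quoted above and the recognition sites of bsai (ggtctc) and bsmbi (cgtctc), which a domesticated part must not contain internally.

```python
# Sketch only: protein-level scFv-Fc assembly plus a scan for internal
# type IIS sites. Function names and inputs are illustrative assumptions.

LINKER = "GGGGSGGGGSGGGGSSGGGS"  # linker quoted in the text, upper-cased

# Recognition sites of the two type IIS enzymes named in the text.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(dna.upper()))


def build_scfv_fc(vh: str, vl: str, fc: str,
                  signal_peptide: str = "", vh_first: bool = True) -> str:
    """Join two variable domains with the linker and fuse them to an Fc.

    The domain order is an assumption exposed as `vh_first`.
    """
    first, second = (vh, vl) if vh_first else (vl, vh)
    return signal_peptide + first + LINKER + second + fc


def find_internal_sites(dna: str) -> dict:
    """Map enzyme name -> sorted 0-based positions of its site on either strand."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in TYPE_IIS_SITES.items():
        positions = []
        for motif in {site, reverse_complement(site)}:
            start = dna.find(motif)
            while start != -1:
                positions.append(start)
                start = dna.find(motif, start + 1)
        if positions:
            hits[enzyme] = sorted(positions)
    return hits
```

a sequence for which `find_internal_sites` returns an empty dictionary has no internal bsai or bsmbi site; any reported position would need a silent mutation before ordering the fragment.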
pgreen sp-higg is a pgreen vector-based adaptation of the magnicon® ' provector pich (icon genetics) that is designed for bsai cloning of gb (b -b ) standard parts as in-frame fusions with the tobacco ( - )-beta-glucanase signal peptide and the human igg fc domain. similarly, pcambiav and pcambiav are pcambia-based adaptations of the magnicon® ' provector pich that are designed for bsai cloning of gb standard parts as in-frame fusions with the tobacco ( - )-beta-glucanase signal peptide (pcambiav , for expression of secreted proteins) or without any subcellular localization signal (pcambiav , for expression of cytoplasmic proteins). assembly reactions were performed as above, and the ligation reactions were transformed into e. coli top electrocompetent cells. positive clones were verified by restriction digestion analysis. all level parts generated in this work are listed in supplementary table .

for transient expression in n. benthamiana, the plasmids were transformed into agrobacterium tumefaciens strain gv c c by electroporation. the same strain carrying the psoup helper plasmid was employed to allow the replication of the pgreen vectors, which encode the antibodies. exponential cultures grown overnight were collected by centrifugation and the bacterial pellets were resuspended in agroinfiltration solution ( mm mes, mm mgcl , µm acetosyringone, ph . ) and incubated for h at rt in a horizontal rolling mixer. for small scale agroinfiltration, culture optical density at nm was adjusted to . with agroinfiltration solution and the bacterial suspensions harbouring the ' antibody or antigen modules, the integrase (pich ), and the ' module (pich ) were mixed in equal volumes. control samples were agroinfiltrated with pich _dsred and integrase module. agroinfiltration of to -week-old n. benthamiana plants was carried out through the abaxial leaf surface using a ml needle-free syringe (becton dickinson s.a.).
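the od adjustment and equal-volume mixing described above are simple dilution arithmetic (c1·v1 = c2·v2). the python sketch below shows that calculation under the assumption that optical density scales linearly with dilution over the working range; the function names are hypothetical and no od value from the protocol is implied.

```python
# Sketch only: dilution arithmetic for adjusting a bacterial suspension
# to a target optical density and splitting a mix into equal volumes.
# Assumes OD scales linearly with dilution (C1 * V1 = C2 * V2).

def dilution_volumes(stock_od: float, target_od: float,
                     final_volume_ml: float) -> tuple:
    """Return (culture_ml, diluent_ml) needed to reach target_od."""
    if stock_od <= 0 or target_od <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if target_od > stock_od:
        raise ValueError("cannot dilute to an OD above the stock OD")
    culture_ml = final_volume_ml * target_od / stock_od
    return culture_ml, final_volume_ml - culture_ml


def equal_volume_mix(components: list, total_volume_ml: float) -> dict:
    """Split total_volume_ml equally among the named suspensions."""
    if not components:
        raise ValueError("at least one component is required")
    share = total_volume_ml / len(components)
    return {name: share for name in components}
```

note that when several od-adjusted suspensions are mixed in equal volumes, the total od of the mix is unchanged but each strain contributes only its share of it.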
for pilot scale production, the bacterial suspensions were prepared as above except that a lower od was used ( . for sybody agroinfiltration and . for nanobody agroinfiltration). additionally, for sybody agroinfiltration, a bacterial suspension of pich _dsred, a magnicon® ´ module encoding the fluorescent protein dsred, was added to the final agrobacterium infiltration solution in a ratio of : : . : . (pich :pich :pgreensp-sybody -higg :pich _dsred). delivery of agrobacterium to the plant cells was carried out by vacuum infiltration in a vacuum degassing chamber (model dp , applied vacuum engineering) fitted with a l infiltration tank. the aerial part of whole plants (seven plants at a time) was immersed in the agrobacterium infiltration solution; vacuum was applied for min at a vacuum pressure of . bar and then slowly released. days post-vacuum agroinfiltration, leaves were excised and then infiltrated with mm phosphate buffer ( . mm nah po , . mm na hpo . h o, ph ), without (sybody ) or with (nanobody ) . mm pmsf (sigma-aldrich, # ), following the same procedure as for the vacuum agroinfiltration. after removing excess buffer with tissue paper, the leaves were placed in zipped mesh bags and then centrifuged using a portable clothes dryer (orbegozo sc ); the apoplastic fluid was collected from the drain tube. the apoplastic fluid was centrifuged ( min, x g, at °c) to remove any cell debris and agrobacterium, the supernatant was collected, and fractions of ml were then concentrated times using kda amicon ultra- k centrifugal filters (millipore) after centrifugation ( min, x g, at °c). protein crude extracts were obtained by homogenizing ground frozen leaf tissue with cold pbs buffer ( mm nah po , mm na hpo . h o, mm nacl, ph . ) in a : (w/v) ratio and were centrifuged at rpm for min at °c. for antibody purification, g of ground agroinfiltrated tissue were extracted in ml of cold mm phosphate buffer.
samples were centrifuged at x g for min and the supernatant was transferred to a clean tube and further clarified by filtration through a . µm membrane filter. the recombinant antibodies were purified by affinity chromatography with protein a agarose resin (abt technology) following a gravity-flow procedure according to the manufacturer's instructions. mm citrate buffer ph was used for elution and m tris-hcl ph was used for neutralization of the eluted sample ( . µl for each µl elution fraction). purified antibodies were quantified using the bio-rad protein assay following the manufacturer's instructions and using bsa for standard curve preparation.

the n. benthamiana leaves infiltrated with the different sars-cov- proteins were collected (rbd) or - (n protein) days post infiltration (dpi). leaves were frozen in liquid nitrogen and stored at - °c until used. protein extraction was performed using - g of ground frozen tissue in volumes of cold extraction buffer. three different buffers were tested as a first approach, in order to optimize the purification yields: buffer a, pbs buffer with mm imidazole, ph ; buffer b, buffer a supplemented with % triton x- ; and buffer c, buffer b supplemented with % glycerol, % sucrose and . % -β-mercaptoethanol. samples were vigorously vortexed and centrifuged at x g for min at °c. the supernatant was carefully transferred to a clean tube and filtered through a . µm syringe filter. protein purification was carried out by ni-nta affinity chromatography as described (fernandez-del-carmen et al., ). purified proteins were quantified using the bio-rad protein assay following the manufacturer's instructions and using bsa for standard curve preparation.

proteins were separated by sds-page on nupage % bis-tris gels (invitrogen) using mes-sds running buffer ( mm mes, mm tris-base, . mm sds, mm edta, ph . ) under reducing conditions. gels were visualized by coomassie blue staining.
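the quantification against a bsa standard curve mentioned above, and the conversion to a yield per gram of fresh weight, can be sketched as follows. this is a generic illustration, not the manufacturer's protocol: it assumes the assay response is approximately linear over the range of standards used, and all function names are hypothetical.

```python
# Sketch only: least-squares standard curve (absorbance vs. BSA
# concentration) and conversion of a reading into a yield per g FW.
# Assumes an approximately linear response over the standards' range.

def fit_line(x: list, y: list) -> tuple:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard concentrations must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance: float, slope: float,
                                  intercept: float,
                                  dilution_factor: float = 1.0) -> float:
    """Invert the standard curve and correct for sample dilution."""
    return (absorbance - intercept) / slope * dilution_factor


def yield_per_gram_fw(conc_ug_per_ml: float, volume_ml: float,
                      fresh_weight_g: float) -> float:
    """Purified protein (ug) per gram of fresh leaf tissue."""
    return conc_ug_per_ml * volume_ml / fresh_weight_g
```

readings that fall outside the range covered by the standards should be re-measured at a different dilution rather than extrapolated.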
for western blot analysis, proteins were transferred to pvdf membranes (amersham hybond™-p, ge healthcare) by semi-wet blotting (xcell ii™ blot module, invitrogen, life technologies) following the manufacturer's instructions. blots were blocked with % ecl prime blocking agent (ge healthcare) in pbs-t (pbs buffer supplemented with . % (v/v) tween- ). for anti-sars-cov- antibody detection, the blots were incubated with : hrp-conjugated rabbit anti-human igg (sigma-aldrich, #a ). for sars-cov- antigen detection the blots were incubated with : anti-his mouse monoclonal primary antibody (qiagen, # ) and then incubated with : peroxidase labelled anti-mouse igg secondary antibody (ge healthcare). blots were developed with ecl prime western blotting detection reagent (ge healthcare) and visualised using a fujifilm las- imager. the overnight coating of costar well eia/ria plates (corning) was carried out at °c with µl of µg/ml rbd (raybiotech, # - ) or bsa (used as control) in coating buffer ( mm na co , mm nahco , ph . ). after washes with µl of pbs, the plate was blocked with µl of a % (w/v) ecl advance blocking reagent (ge healthcare) solution in pbs-t (pbs supplemented with . % (v/v) tween- ) for h at rt. the plate was washed times with pbs, and then, starting at µg of the purified antibody per well ( µl), : serial dilutions in blocking solution were incubated for h min at rt. after washing steps with pbs-t (pbs buffer supplemented with . % tween- ), : hrp-labelled rabbit anti-human igg (sigma-aldrich, #a ) in blocking solution was added. after h, the plate was washed with pbs and the substrate o-phenylenediamine dihydrochloride sigmafast™ opd tablet (sigma-aldrich, #p ) was added (following manufacturer's instructions). reactions were stopped with µl m hcl per well and absorbance was measured at nm. the endpoint titer was determined as the last concentration of each purified antibody showing an absorbance value higher than the value defined as cutoff (mean blank + sd). 
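the endpoint titer rule stated above (last concentration whose absorbance exceeds a cutoff derived from the blank) can be sketched in python. the multiplier applied to the standard deviation is not recoverable from the text, so it is left as an explicit parameter `k`; the use of the sample standard deviation and the function names are assumptions made for illustration.

```python
# Sketch only: endpoint titer from a serial dilution series.
# The SD multiplier `k` is NOT taken from the text; the caller must supply it.
from statistics import mean, stdev


def serial_dilution(start: float, factor: float, steps: int) -> list:
    """Concentrations of a dilution series, highest first."""
    return [start / factor ** i for i in range(steps)]


def cutoff(blank_values: list, k: float) -> float:
    """Cutoff = mean of blank readings + k * sample SD of blank readings."""
    return mean(blank_values) + k * stdev(blank_values)


def endpoint_titer(concentrations: list, absorbances: list,
                   cutoff_value: float):
    """Last concentration (highest-first order) still above the cutoff.

    Returns None if even the first well is at or below the cutoff.
    """
    endpoint = None
    for conc, absorbance in zip(concentrations, absorbances):
        if absorbance > cutoff_value:
            endpoint = conc
        else:
            break  # stop at the first well that falls to background
    return endpoint
```

stopping at the first well at or below the cutoff avoids reporting an isolated noisy well further down the series as the endpoint.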
blank is defined as the values from each elisa test against bsa (zrein et al., ; armbruster and pry, ) . the sandwich elisas were performed as described in the antigen elisa section with a few changes. the plates were coated with µl of µg/ml murine anti-his mab (qiagen, # ), and after blocking, the plates were incubated with µl of the crude extracts of the (rbd/n) antigen expressing leaves serially diluted ( : ) in bsa . %. wt crude extracts were used as negative control. the crude extracts were prepared by adding a volume of pbs buffer corresponding to times the mass of the ground tissue in liquid nitrogen. then the mix was centrifuged ( rpm, °c, min) and the supernatant was subjected to sonication before use. the antigens were sandwiched with µg of the corresponding purified antibody (or µl of the apoplastic fluid in % blocking reagent) per well ( h min incubation, rt). the same procedure as in the antigen elisa was followed for the incubation with the conjugated secondary antibody, colorimetric reaction and measurement. six different antibody sequences were selected for recombinant production in n. benthamiana, following a plant deconstructed viral strategy based on magnifection technology, as described earlier (marillonnet et al., ) (see table ). four of those were directed against the receptor binding domain (rbd) of the sars-cov- spike (s) protein, whereas the remaining two were directed against the n protein. all six antibodies were engineered as single polypeptide chains fused to the human cɣ -cɣ constant immunoglobulin domains. three of them, those derived from single chain camelid or synthetic vhh antibody libraries, were produced as direct fusions. the other three, derived from full-size human monoclonal antibodies, were redesigned as scfvs, using a linker peptide that connects vh and vl regions (see fig a) . 
the nucleotide sequences of the different variable regions were all obtained from the literature, then chemically synthesized with appropriate extensions and cloned into a magnifection-adapted destination vector using a type iis restriction enzyme strategy. the cloning cassette was flanked by a β-endoglucanase signal peptide for apoplastic localization on the n-terminal side and the cɣ -cɣ domains of the human igg on the c-terminal side. the resulting vectors were transferred to agrobacterium cultures and agroinfiltrated in n. benthamiana leaves in combination with a ´ magnicon® module, containing the rna polymerase and movement protein, and with an integrase module (fig b). for antibody production we used wild-type and rnai Δxt/ft glycoengineered n. benthamiana plant lines, the latter lacking plant-specific xylose and fucose glycosylation (strasser et al., ). infiltrated leaves were examined daily, and only minimal damage was observed in the agroinfiltrated tissues during the incubation period. after seven days, leaf samples were collected and ground, and crude extracts were analyzed by sds-page. as can be observed in fig a-b (upper panel), all samples produced coomassie-detectable bands of the expected antibody size. scfv-igg - kda antibodies migrated slightly above the - kda rubisco large subunit, partially masking its detection. vhh-igg antibodies migrated at the expected - kda size. the identity of the coomassie bands was confirmed by western blot using an anti-human igg antibody for detection (fig a-b, lower panel). as shown in fig a-b, under reducing conditions lower molecular weight (mw) bands were also detected, probably as a result of partial proteolytic degradation. small-scale affinity purification was carried out for all six antibodies produced in Δxt/ft plants using protein a affinity chromatography (fig c-d). the resulting purified antibodies were used to estimate the yield of the final product, which ranged from . µg/g fw (cr antibody) to .
µg/g fw (nanobody antibody) (see table ). the in-planta production of sars-cov- rbd and n protein antigens was also assayed in parallel using a strategy similar to that described for antibody production. for this purpose, two versions of the expression vector were designed for rbd, one with the native viral sequence and the other with an n. benthamiana codon-optimized sequence. for the n protein, only the n. benthamiana codon-optimized sequence was employed. for rbd, native and codon-optimized versions were targeted to the apoplast with the tobacco glucan endo- , -beta-glucosidase signal peptide, and versions containing a kdel peptide for er retention were also generated. all nucleotide sequences were chemically synthesized with a small nucleotide extension coding for a histidine tag for detection (fig a). as described for antibody production, magnicon®-derived ´ vector modules encoding rbd and n proteins were agroinfiltrated in combination with an integrase module and a ´-module lacking any additional subcellular localization signal. shorter incubation times were chosen for antigen production than for antibodies because antigen constructs produced different degrees of necrotic lesions in the leaves, ranging from mild symptoms for the n protein to severe necrosis after four days for native rbd. for those constructs producing more severe lesions, incubation time was reduced to five days, and for the rest the incubation period was extended to seven days. rbd was extracted and purified using small-scale affinity chromatography with nickel columns, and the resulting coomassie staining and western blot analyses are shown in fig b. rbd can be detected as a major band of an estimated kda, with the presence of higher mw bands that suggest multimerization. er retention did not improve expression levels of rbd for the native version, nor for the n. benthamiana optimized one (data not shown).
addition of % triton x- to the standard extraction buffer (see materials and methods) did not improve the yield, which was estimated as - µg/g fw (table ). n protein was extracted from agroinfiltrated leaves and affinity purified following the same procedure described for rbd. a major kda band was detected both in the crude extract and upon purification (fig b). small-scale affinity chromatography with nickel columns gave an estimated yield of µg/g fw for n protein (table ).

binding activities of affinity purified anti-rbd antibodies were analysed by antigen elisa as shown in fig a. as expected, all assayed antibodies were active in binding their respective antigen. endpoint dilution titers were calculated for anti-rbd antibodies using a commercial antigen. sybodies and and nanobody showed high dilution titers ( . µm, . µm and . µm, respectively), but the performance of cr was significantly lower ( . µm). in a parallel experiment, we tested the ability of plant-made antibodies to selectively detect our own plant-made antigens, including here also the n protein, using a sandwich elisa approach. for this analysis, elisa plates were coated with a murine anti-his mab, incubated with serial dilutions of crude plant extracts from antigen-producing plants and sandwiched with purified plant-made antibodies. as shown in fig b- c, all antigen-producing plant extracts gave sandwich-elisa signals significantly above the background when assayed using their cognate antibodies, thus evidencing the capacity of both antibodies and antigens to function as potent diagnostic tools. background signals in this experiment are likely due to cross-reaction of the anti-human igg secondary antibody with the murine anti-his mab, and could be easily reduced for more potent diagnostic applications by employing recombinant antibody formats other than igg.
in the design of a pilot upscaling experiment, we favoured modularity and tried to maximize the affordability and adaptability of the process by reducing the requirements for highly specialized lab equipment. we carried out a final agroinfiltration for recombinant antibody production using a total of plants, equivalent to approximately . kilograms of fresh plant material. the plants were divided into two batches of plants each, and used to produce sybody and nanobody respectively, as these antibodies showed the most promising binding activities and yields. to facilitate the upscaling of the agroinfiltration process, plant seedlings were transplanted into growth modules, each module comprising seven pots held together in a double layer of disposable plastic-board hexagons as shown in fig a. each production batch consisted of eight hexagonal modules. when plants were six weeks old, they were agroinfiltrated by submerging each hexagon upside down in a cm diameter cylindrical tank filled with l of an agrobacterium suspension, set inside a cylindrical vacuum degassing chamber (fig a). in this way, seven plants at a time were vacuum-agroinfiltrated by slowly releasing the vacuum while the leaves remained submerged in the solution. next, plants were rinsed, brought back to the growth chamber and incubated for days before harvest. two different concentrations of the agrobacterium suspension were used in this experiment. one of them (sybody ) consisted of an od . final mix containing plasmids pich , pich , pgreensybody -igg , and pich _dsred at a : : . : . ratio, where pich _dsred is a magnicon® ´ module encoding dsred. the fluorescent marker was added to the infiltration mix to monitor the extent of the viral infection foci.
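preparing an infiltration mix at a given ratio for a tank of known volume is a proportional split. the python sketch below illustrates it under two stated assumptions: every input suspension has already been adjusted to the same od, and the ratio parts passed in are supplied by the user (the values in the protocol are not reproduced here). function and component names are hypothetical.

```python
# Sketch only: split a tank volume among suspensions according to ratio
# parts, and estimate each strain's share of the final OD.
# Assumes all input suspensions were adjusted to the same OD beforehand.

def mix_volumes(ratio_parts: dict, total_volume_l: float) -> dict:
    """Volume of each suspension (litres) for the requested ratio."""
    total_parts = sum(ratio_parts.values())
    if total_parts <= 0 or total_volume_l <= 0:
        raise ValueError("ratio parts and total volume must be positive")
    return {name: total_volume_l * parts / total_parts
            for name, parts in ratio_parts.items()}


def per_strain_od(ratio_parts: dict, final_od: float) -> dict:
    """Share of the final OD contributed by each strain."""
    total_parts = sum(ratio_parts.values())
    if total_parts <= 0:
        raise ValueError("ratio parts must sum to a positive value")
    return {name: final_od * parts / total_parts
            for name, parts in ratio_parts.items()}
```

the per-strain od makes explicit how much adding a competing reporter clone reduces the share of the mix available to the antibody-encoding clone.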
as described elsewhere (julve et al., ; julve parreño et al., ), superinfection exclusion among viral clones yields mosaic-like expression patterns of individual clones; therefore the tiles produced by red fluorescent proteins were used as an indication of the extent and distribution of the unlabelled foci producing the recombinant antibody. in parallel, nanobody upscaled production was undertaken by agroinfiltration of an od . agrobacterium culture mix containing pich , pich and pgreennanobody -igg at a : : ratio. after days, the dsred tiles in the sybody experiment, clearly visible to the naked eye, had finished expanding in most agroinfiltrated leaves, an indicator that the expression tiles had covered the whole leaf surface (fig b). at this stage, leaves were harvested and subjected to an apoplastic fluid recovery procedure, in which > . kg batches of detached leaves were vacuum infiltrated with mm phosphate buffer using the same vacuum device as described above. once the excess buffer had been removed, leaves were packed in zipped mesh bags and spun in a portable clothes spin dryer, and the intercellular apoplastic fluid was recovered from the drain tube. with this simple procedure, between and millilitres of apoplastic fluid (sybody and nanobody , respectively) were recovered from . kg of detached leaves. a fraction of the apoplastic fluid of both antibodies was concentrated times in kda centricons, and the rest was kept refrigerated for further analysis. fig c-d show the coomassie staining and western blot analysis of crude extracts as well as apoplastic fluid preparations, and their corresponding purifications. crude extracts in this pilot experiment showed a vhh-igg band similar in intensity to that obtained in small scale experiments (data not shown). interestingly, apoplastic fluid consisted of a very simplified mix of proteins, with the recombinant antibody being among the most predominant ones.
as shown, the different optical density of the agrobacterium culture, together with the presence of a competing dsred clone, clearly influenced the accumulation levels, with the yields of nanobody clearly outperforming those of its sybody counterpart. unfortunately, the antibodies seemed partially degraded, as indicated by the presence of two bands smaller than the expected vhh-cɣ -cɣ size, which could be compatible with degradation fragments. degradation was only partially solved by the addition of the protease inhibitor pmsf to the phosphate recovery buffer, as shown for nanobody production (fig d). despite degradation, from a densitometric analysis we estimate that the recovered apoplastic fluid contains . g per liter of intact full-size mab. finally, we performed sandwich elisa tests of sybody and nanobody (fig e and fig f, respectively) using the total and concentrated apoplastic fluid as detection reagent against serial dilutions of crude plant extracts from rbd-producing plants, showing that this simple antibody preparation can be directly employed in detection procedures without the need for additional purification steps.

several n. benthamiana-dedicated bioproduction facilities are functioning worldwide, such as those of leaf expression systems in the uk (dobon, ), icon genetics (giritch et al., ) and fraunhofer in germany (wirz et al., ) or kentucky bioprocessing in the us (olinger et al., ), among others. notably, medicago recently announced the building of a new sqm facility with capacity for around - million planned doses of flu vaccine per year. such facilities usually involve separate modules for upstream processing, namely a wet-lab module for preparation of the bacterial inoculum, a regular plant growth chamber, an agroinfiltration room, and a post-infiltration growth chamber. in addition, downstream processing facilities are often situated next to the production ones to minimize the handling time of fresh tissues.
whereas the installed capacity of plant-dedicated biofactories is continuously growing, it is clearly insufficient to respond to global or even regional demands in times of crisis. we reasoned that, at least for upstream processes, the infrastructures required for medium scale n. benthamiana-based production are not radically different from those employed in high-tech agriculture practices such as hydroponics, aeroponics or vertical farming, and thus high-tech agriculture facilities could be easily repurposed as biomanufacturing facilities in a matter of days or weeks (mcdonald and holtz, ). as an exercise to test the repurposing requirements in practice, we describe here the partial adaptation of our research laboratory and greenhouse facilities to the production of sars-cov- -related antigens and antibodies using n. benthamiana agroinfiltration as the manufacturing platform. in figure we show a chronogram of the activities undertaken by our team towards the production of sars-cov- antigens and antibodies, from the initial selection of the nucleotide sequences of the genes-of-interest to the production of one litre of plant apoplastic fluid of recombinant sybody and nanobody . in our hands, the whole process took a total of nine weeks with non-exclusive personnel dedication and partially restricted access to our facilities. the process can be divided into three phases. the first phase (design), taking approximately ten days, was dedicated to construct design and gene synthesis. it was pivotal in this phase to have open access to viral and antibody sequences deposited in pre-print repositories. particularly remarkable was the openness of academic labs that immediately released primary sequence information of partially characterized anti-sars-cov- monoclonal antibodies, an exercise that should serve as an example in the future. due to our limited testing capacity, the number of parallel designs per product was kept relatively low, and several design decisions (e.g.
codon optimization, purification tags) were taken based on a best-guess approach. ideally, proper crisis preparedness should involve centralized automated equipment such as a biofoundry (hillson et al., ), with which the design space could be extended dramatically without causing delay. the second phase (build) was dedicated to cloning and construct building and lasted less than three weeks. our lab has adapted plasmids and cloning procedures from previous projects (sarrion-perdigones et al., ; vazquez-vilar et al., ), therefore no significant time lag occurred in this step. importantly, this period also involved seeding a new plant batch at the scale required for pilot production in week seven ( plants distributed in hexagonal modules in this case). in a third phase (test), starting on week , all constructs were infiltrated at a small scale (three replicate leaves each), briefly incubated ( or days) and then tested functionally in parallel analyses. this small-scale assay took two additional weeks, summing to a total of approximately days for the complete process. the synthetic biology-inspired design-build-test (dbt) process described above is conceived as an iterative one, so that new dbt cycles can be run, fuelled by the conclusions of previous cycles, to generate new optimized versions of the product. based on this experience, we estimate that the whole dbt cycle could be shortened to days or less by optimising the pipeline (e.g. introducing centralized, automated design and build phases), and by improving preparation and anticipation in the facilities (fig ). for instance, note that moving from step to step without delay requires that a small batch of plants always be maintained in the facility, as was the case for us to supply our research requirements. this only involves transplanting - seedlings every three weeks, and then disposing of them every other three weeks once they start flowering.
if a continuous plant supply is not maintained, a minimum of three extra weeks needs to be allowed to have plants ready for the first test iteration. whereas the first version of the products shown here lacks iterative optimization, it could nevertheless serve to respond to the most urgent demands. in our case, as the results of the first dbt process emerged, the best performing version (v . ) of two of the products was taken to the production phase. in the exercise shown here, the upscaling was relatively small ( plants, approximately . kg fw). post-agroinfiltration incubation time was extended to days to maximize yields. in the meantime, optimization of the purification/extraction methods was undertaken at small scale, so that the new knowledge acquired could be applied in the batch purification of the pilot experiment. in a crisis scenario, and given the modularity of the proposed scheme, several medium-size production modules can be replicated in a farming facility, and reproduced in several farms, allowing easy scalability. successive iterations with small scale agroinfiltration could be an effective way to maximize yields and reduce development times by comparing different small-scale strategies. it should be mentioned that the basic apoplast-based downstream processing proposed here could only be used, with the necessary adaptations, in a limited number of crisis-related applications, mainly related to detection and diagnosis. other uses, certainly therapeutic ones, would involve additional regulatory considerations including gmp downstream facilities, which are beyond the scope of this exercise. as a result of this experience, several improvements can be envisioned. we employed the magnicon® vector system with few adaptations for all the attempted proteins. although magnicon® produces maximum yields for many products, some proteins, particularly viral antigens, may express better with other (e.g. non-viral) systems.
in our experiments, antigens showed rather low expression levels despite optimization attempts using codon optimization and different localization signals. in adapting to an emergency, it would be advisable to perform initial expression tests using different production platforms, also involving non-replicative methods (sainsbury and lomonossoff, ) or dna viruses (yamamoto et al., ; zhang and mason, ), and to incorporate them into the initial optimization test. as mentioned, this could be done in a centralized manner, later distributing expression clones to several repurposed production facilities. in contrast to antigens, recombinant single-chain antibodies showed in general higher and more uniform expression levels, as could be expected from their more similar structure. we chose to adapt full human iggs to a scfv-igg format to facilitate cloning and expression procedures, since it has been described earlier in plants that single chain formats reproduce the binding activities of the original full-size antibodies from which they derive. this format also facilitates comparisons with vhh antibodies, also produced as igg fusions.

the plant-made sars-cov- products described here have several potential applications in the diagnostics area. both rbd and n proteins can be used as reagents for serological assays (amanat et al., ; liu et al., ), although further yield optimization would be required. for those assays where antigen glycosylation is an important factor, glycoengineered plants (strasser et al., ) can provide a competitive alternative to mammalian cell cultures (o'flaherty et al., ). regarding antibodies, they can serve as internal references for the quantification of serological responses. with small modifications, the same antibodies can be adapted for sandwich elisa and employed in the detection and quantification of viral particles, a better proxy for infectiveness than rna.
we also show here that apoplastic fluid is an inexpensive antibody preparation suitable for certain applications that require low-cost preparations, e.g. the concentration of the virus from environmental samples. as shown here, the protein complexity in the apoplast is greatly reduced; therefore the apoplast could be regarded as a plant equivalent of hybridoma supernatant or ascites fluid, although at much lower cost. unfortunately, apoplastic preparations are prone to partial antibody degradation, probably due to endogenous proteases; however, this can be minimized using extraction buffers with appropriate protease inhibitors, as was shown for nanobody .

the current pandemic crisis has evidenced the power of new antibody selection procedures, either based on single-cell selection from human peripheral blood mononuclear cells, in the case of full-size antibodies, or based on ultra-high throughput selection of synthetic libraries (sybodies) in the case of camelid-derived nanobodies (zimmermann et al., ; walter et al., ). large collections of anti-sars-cov- , potentially neutralizing antibody sequences were made available to the scientific community in a matter of weeks rather than months. it does not go unnoticed that the combination of rapid antibody selection procedures with fast, modular and scalable plant expression also has implications in the therapeutic arena as an ideal system for passive immunization. intravenous polyclonal immunoglobulins (ivig) from recovered patients have been shown to be a very effective covid- treatment in several studies (montelongo-jauregui et al., , and references therein); however, the limited availability of patient sera hampers its application in practice. interestingly, we showed in a recent work that large recombinant polyclonal antibody cocktails (pluribodies), mimicking a mammalian immune response, can be produced in n. benthamiana with high batch-to-batch reproducibility (julve parreño et al., ).
passive immunization with recombinant antibody cocktails resembles a natural response more than a monoclonal therapy, requires shorter development times and is probably more robust against the development of resistance. in conclusion, based on the results of the exercise described here, we propose the repurposing of indoor farms into plant-based biomanufacturing facilities as a viable option to respond to local and global shortages of bioproducts such as diagnostics and therapeutic reagents in times of crisis. the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. all authors designed and performed the experiments, and analyzed the data. d.o. wrote this manuscript. all authors revised and edited the written manuscript. this work was supported by h eu projects newcotiana and pharmafactory. s.s. is recipient of a fpi fellowship bio - -r from the spanish zimmermann, i., egloff, p., hutter, c. a., arnold, f. m., stohler, p., bocquet, n., et al. ( the upper timeline represents the actual timespan of the experiments. note that the time points represent approximately the days required to produce and initially characterize the designated products; notwithstanding, some of the results shown in previous figures correspond to extended analysis obtained at a later stage, during the preparation of this manuscript. the lower timeline describes the estimated minimal timespan that would result by introducing some of the optimizations described in the text.
a serological assay to detect sars-cov- seroconversion in humans limit of blank, limit of detection and limit of quantitation plant molecular farming -integration and exploitation of side streams to potential applications of plant biotechnology against sars-cov- transient gene expression seeds plant-based bioproduction systems: leaf expression systems' hypertrans technology promises low costs and high yields recombinant jacalin-like plant lectins are produced at high levels in nicotiana benthamiana and retain agglutination activity and sugar specificity rapid high-yield expression of full-size igg antibodies in plants coinfected with noncompeting viral vectors engineering viral expression vectors for plants: the 'full virus' and the 'deconstructed virus' strategies building a global alliance of biofoundries a coat-independent superinfection exclusion rapidly imposed in nicotiana benthamiana cells by tobacco mosaic virus is not prevented by depletion of the movement protein a synthetic biology approach for consistent production of plant-made recombinant polyclonal antibodies against snake venom toxins evaluation of nucleocapsid and spike protein-based enzyme-linked immunosorbent assays for detecting antibodies against sars-cov- systemic agrobacterium tumefaciens-mediated transfection of viral replicons for efficient transient expression in plants from farm to finger prick-a perspective on how plants can help in the fight against covid- convalescent serum therapy for covid- : a th century remedy for a st century disease development of systems for the production of plant-derived biopharmaceuticals. 
plants mammalian cell culture for production of recombinant proteins: a review of the critical steps in their biomanufacturing delayed treatment of ebola virus infection with plant-derived monoclonal antibodies provides protection in rhesus macaques extremely high-level and rapid transient protein production in plants without the use of viral replication peaq: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants goldenbraid: an iterative cloning system for standardized assembly of reusable genetic modules goldenbraid . : a comprehensive dna assembly framework for plant synthetic biology generation of glyco-engineered nicotiana benthamiana for the production of monoclonal antibodies with a homogeneous human-like n-glycan structure: xylt and fuct down-regulation in n. benthamiana potent binding of novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody molecular and biological characterization of human monoclonal antibodies binding to the spike and nucleocapsid proteins of severe acute respiratory syndrome coronavirus gb . : a platform for plant bio-design that connects functional dna elements with associated biological data sybodies targeting the sars-cov- receptor-binding domain automated production of plant-based vaccines and pharmaceuticals structural basis for potent neutralization of betacoronaviruses by single-domain camelid antibodies improvement of the transient expression system for production of recombinant proteins in plants bean yellow dwarf virus replicons for high-level transgene expression in transgenic plants and cell cultures the authors want to thank to the staff of the ibmcp and the polytechnic university of valencia who help us to access safely to our laboratory and greenhouses during the covid- lockdown in spain, and specially eugenio grau for his readiness to help us with sanger sequencing during that period. we are grateful to prof. 
steinkellner and strasser for providing glycoengineered plant lines and to prof. gleba for sharing magnifection plasmids. ministry of science and competitiveness and r.m. is recipient of a gva fellowship (acif/ / ). key: cord- -pr g ivu authors: kalfaoglu, bahire; almeida-santos, josé; adele tye, chanidapa; satou, yorifumi; ono, masahiro title: t-cell hyperactivation and paralysis in severe covid- infection revealed by single-cell analysis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pr g ivu severe covid- patients can show respiratory failure, t-cell reduction, and cytokine release syndrome (crs), which can be fatal in both young and aged patients and is a major concern of the pandemic. however, the pathogenetic mechanisms of crs in covid- are poorly understood. here we show single cell-level mechanisms for t-cell dysregulation in severe sars-cov- infection, and thereby demonstrate the mechanisms underlying t-cell hyperactivation and paralysis in severe covid- patients. by in silico sorting cd + t-cells from a single cell rna-seq dataset, we found that cd + t-cells were highly activated and showed unique differentiation pathways in the lung of severe covid- patients. notably, those t-cells in severe covid- patients highly expressed immunoregulatory receptors and cd , whilst repressing the expression of the transcription factor foxp and interestingly, both the differentiation of regulatory t-cells (tregs) and th was inhibited. meanwhile, highly activated cd + t-cells express pd- alongside macrophages that express pd- ligands in severe patients, suggesting that pd- -mediated immunoregulation was partially operating. furthermore, we show that cd + hyperactivated t-cells differentiate into multiple helper t-cell lineages, showing multifaceted effector t-cells with th and th characteristics. 
lastly, we show that cd + t-cells, particularly cd -expressing hyperactivated t-cells, produce the protease furin, which facilitates the viral entry of sars-cov- . collectively, cd + t-cells from severe covid- patients are hyperactivated and foxp -mediated negative feedback mechanisms are impaired in the lung, while activated cd + t-cells continue to promote further viral infection through the production of furin. therefore, our study proposes a new model of t-cell hyperactivation and paralysis that drives pulmonary damage, systemic crs and organ failure in severe covid- patients. negative regulatory mechanisms of t-cell activation control inflammation in cancer, autoimmunity, and infections thus preventing excessive and prolonged inflammation which can induce tissue destruction, or immunopathology. immune checkpoints such as ctla- and pd- are well known examples of t-cell intrinsic regulators. upon recognizing antigens, t-cells are activated and start to express pd- and ctla- , which in turn suppresses the two major signalling pathways for t-cells: t-cell receptor (tcr) signalling and cd costimulation. in addition, the transcription factor foxp can be induced in activated t-cells, especially in humans, and plays a key role as an inducible negative regulator during inflammation. covid- is caused by the coronavirus sars-cov- , which is closely related to the severe acute respiratory syndrome coronavirus (sars-cov). the major symptoms such as cough and diarrhoea in mild to moderate patients can be understood through the type of tissues that can be infected by sars-cov- . sars-cov- binds to angiotensin-converting enzyme (ace ) on the surface of human cells through its spike (s) protein. viral entry is enhanced by the type ii transmembrane serine protease tmprss , which cleaves a part of s protein and thereby exposes the fusion domain of the s-protein. 
, sars-cov- establishes infections through epithelial cells in the upper and lower airways, which express ace and tmprss . in addition, the pro-protein convertase furin activates the s-protein of sars and sars-cov- . , intriguingly, t-cell specific deletion of furin results in the impairment of effector t-cells and regulatory t-cells (tregs) and leads to the development of age-related autoimmunity, which is accompanied by increased serum ifn-g, il- , il- , il- , and ige. in addition, furin is preferentially expressed by th cells and is critical for their ifn-g production. as evidenced in a parasite infection model, furin-deficient cd + t cells are skewed towards a th phenotype. it is poorly understood how sars-cov- induces severe infection in a minority of patients, who develop respiratory distress and multiorgan failure. these severe patients show elevated serum cytokines, respiratory failure, haemophagocytosis, elevated ferritin, ddimer, and soluble cd (il- r a chain, scd ), which are characteristic features of secondary haemophagocytic lymphohistiocytosis (shlh)-like conditions or cytokine release syndrome (crs). in fact, severe covid- patients have elevated levels of prototypic crs cytokines from innate immune cells including il- , tnf-a, and il- . , recently mcgonagle et al. proposed that activated macrophages drive immune reactions that induce diffuse pulmonary intravascular coagulopathy, or immunothrombosis, in severe covid - patients. while this may explain the unique pulmonary pathology of severe covid- patients, the underlying molecular mechanisms are poorly understood. importantly, crs in severe covid- patients may be more systemic and involve a wide range of t-cells. firstly, circulating t-cells are severely reduced in severe sars-cov- infections for unknown reasons , . intriguingly, severe covid- patients show elevated serum il- and soluble cd (il- receptor a chain). 
, since il- is a potent growth factor for cd -expressing activated t-cells , the elevation of both il- and cd indicates that a positive feedback loop for t-cell activation is established and overrunning in severe patients. these collectively highlight the roles of t-cells in the pathogenesis of severe sars-cov- infection, although the pathogenetic mechanisms are largely unknown. in this study, we analysed the transcriptomes of cd + t-cells from a single cell rna-seq dataset from a recent study and thereby investigated the gene regulation dynamics during sars-cov- infection. we show that sars-cov- induces multiple activation and differentiation processes in cd + t-cells in a unique manner. we identify defects in foxp mediated negative regulation, which accelerates t-cell activation and death. in addition, by analysing multiple transcriptome datasets, we propose the possibility that those abnormally activated t-cells enhance viral entry through the production of furin in severe covid- patients. we recently showed that, using scrna-seq analysis, melanoma-infiltrating t-cells are activated in situ and differentiate into either foxp high activated tregs or pd- high t follicular helper (tfh)-like effector t-cells . we hypothesised that those mechanisms for t-cell activation and differentiation in inflammatory sites are altered in covid- patients. to address this issue, we analysed the scrna-seq data from bronchoalveolar lavage (bal) fluids of mild and severe covid- patients. firstly, we performed in silico sorting of cd + t-cells and analysed their transcriptomes (fig. a ). we applied a dimensional reduction to the cd + t-cell data using uniform manifold approximation and projection (umap) and principal component analysis ( supplementary fig. a) . as expected, most t-cells were from either mild or severe covid- patients. notably, clusters , , , , , and did not contain any cells from healthy controls (hc) ( fig. 
b and c), indicating that these cells uniquely differentiated during the infection, regardless of whether it was mild or severe disease. differential gene expression analysis showed that in comparison to mild patients, cd + t-cells from severe covid- patients expressed higher levels of the ap- genes fos, fosb, and jun, the activation marker mki (ki ), th -related genes il r and maf, and chemokines including ccl , ccl , ccl , ccl , ccl , and cxcl (fig. d, supplementary fig. b). these results suggest that cd + t-cells in severe covid- patients are highly activated in the lung (fig. d, supplementary fig. b). these findings were further confirmed by pathway analysis, which identified interleukin, jak-stat, and mapk signalling pathways as significantly enriched pathways (fig. e). on the other hand, cd + t-cells from severe patients showed decreased expression of interferon-induced genes including ifit , ifit , ifit , and ifitm (fig. d). pathway analysis also showed that cd + t-cells from severe patients expressed lower levels of the genes related to interferon downstream pathways (fig. e), suggesting that type-i interferons are suppressed in severe patients. notably, cd + t-cells in severe patients showed lower expression of the tnf superfamily ligands tnfsf (trail) and tnfsf (light) and the surface proteins slamf and klrb , all of which have roles in viral infections [ ] [ ] [ ] [ ] . next, we performed a pseudotime analysis in the umap space, identifying two major trajectories of t-cells, originating in cluster (fig. a, b). pseudotime (t ) involved clusters , , , , , , and , showing a longer trajectory, while pseudotime (t ) involved clusters , , and . interestingly, the cells in the origin showed high expression of il- receptor (il r), which is a marker of naïve-like t-cells in tissues. the expression of il r was gradually downregulated across the two pseudotime trajectories (t and t , fig. b).
intriguingly, t-cells at the ends of both trajectories included mki (ki- ) + t-cells and some cells were nr a + or nr a + (fig. c, supplementary fig s c), which indicated activation and cognate antigen signalling . these analyses support that the trajectories successfully captured two major pathways for t-cells to be activated during the infection. interestingly, well-known immunoregulatory genes including il ra, ctla , tnfrsf , and tnfrsf were more highly expressed in t-cells across pseudotime than pseudotime (fig. d, supplementary fig. b). although these genes are often associated with tregs, foxp was not induced in either of these trajectories, and thus most of the t-cells did not become tregs. t-cells in pseudotime showed a modest increase of ctla and tnfrsf only towards the end of the trajectory (fig. d). furthermore, il ra was significantly upregulated in cd + t-cells from severe covid- patients compared to mildly affected patients (fig. e). since cd (il ra) is a key marker for tregs and activated t-cells , we asked if cd -expressing t-cells in covid- patients were tregs. intriguingly, the percentage of foxp + cells in il ra + cd + t-cells was significantly reduced in severe patients compared to mild patients ( . % vs . %, fig. f ). this indicates that foxp transcription is repressed in cd + t-cells, further suggesting that il ra + t-cells are activated t-cells or 'ex-tregs' (i.e. effector t-cells that used to express foxp but more recently downregulated foxp expression , ), rather than functional tregs. since t-cells in the late phase of pseudotime upregulated il ra and mki , we asked if expanded tcr clones expressed more il ra. expanded tcr clones in severe patients had more il ra + t-cells than those in mild patients ( % and % cells in mild and severe patients expressed il ra, respectively; p < . , fig. g ), confirming that il ra + t-cells are associated with the severe phenotype.
however, no significant difference was observed between expanded and non-expanded tcr clones in severe patients. notably, il transcription was not induced in cd + t-cells in severe patients (fig. h) , suggesting that cd + activated t-cells in severe patients die, at least partly, by cytokine deprivation. given that t-cells from severe patients dominated in the last part of pseudotime (i.e. clusters and ), these findings indicate that t-cells become more activated and vigorously proliferate in severe covid- patients than mild patients. these cd + activated t-cells are likely to be short-lived and do not initiate foxp transcription in severe covid- patients, while they can differentiate into tregs in moderate infections. firstly, we hypothesized that foxp transcription was actively repressed by cytokines in the microenvironments in severe covid- patients. foxp transcription is activated by il- and tgf-b signalling and is repressed by il- and il- signalling. in fact, some macrophages from severe covid- patients expressed tgfb and il (fig. i) , as shown by liao et al. however, cd + t-cells did not increase th -associated genes including rorc, il a, and il f, and the expression of ccr , a marker for th cells, was significantly reduced in severe covid- patients ( fig. j and supplementary fig. d ). this suggests that the differentiation of both tregs and th is inhibited. th differentiating t-cells express both foxp and rorg-t before they mature. in addition, foxp intermediate cd ra -t-cells express rorg-t and th cytokines. together with the scrna-seq analysis results above, these support the model that activated t-cells show differentiation arrest or preferentially die before becoming tregs or th cells in severe covid- patients. importantly, il ra expression was significantly increased in severe covid- patients in comparison to mild patients, whilst very few t-cells expressed il (fig. h) . 
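the frequency comparisons reported above (for example, the fraction of foxp + cells among il ra + cd + t-cells in mild versus severe patients) are comparisons of proportions between two groups. the methods name a chi-square test for the enrichment of cytokine-expressing cells; assuming a comparable 2x2 comparison, the following is a minimal python sketch of a pearson chi-square test. it is not the authors' code: the function name chi_square_2x2 is hypothetical, the counts in the usage line are invented for illustration, and no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on the table

                    marker+   marker-
        group 1        a         b
        group 2        c         d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a marginal total is zero; the test is undefined")
    # Closed form of sum((observed - expected)^2 / expected) for a 2x2 table.
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented counts: 30 of 100 marker+ cells in one group, 10 of 200 in the other.
stat, p = chi_square_2x2(30, 70, 10, 190)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```

with small expected counts in any cell of the table, an exact test would usually be preferred over this approximation.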
pd- is another key immunoregulatory molecule for suppressing immune responses during viral infection. however, pd- may play multiple roles in cd + t-cells, as pd- is a marker for tfh. in fact, pd- high bcl high tfh-like t-cells are a major effector population in melanoma tissues. thus we asked if pd- -expressing t-cells show tfh differentiation and/or if pd- expressing t-cells can succumb to pd- ligand-mediated inhibition in covid- patients. however, in sars-cov- infection, bcl was not induced in the major activation and differentiation pathway, pseudotime , indicating that those activated t-cells did not differentiate into tfh. comparatively, t-cells in pseudotime showed some upregulation, although this was statistically not significant (fig. a) . this suggests that tfh differentiation was suppressed in covid- patients. pdcd was highly upregulated in both pseudotime and , suggesting that these cells are vulnerable to pd- ligand-mediated suppression. interestingly, macrophages from severe covid- patients expressed higher levels of cd (pd-l ) yet the expression of pdcd lg (pd-l ) was not significantly different between mild and severe patients (fig. b) . these results indicate that pd- -mediated t-cell regulation was at least partially operating in severe covid- patients. meanwhile, tbx and gata expression is induced in t-cells across pseudotime , suggesting that these t-cells may differentiate into th and th cells (fig. c) . t-cells in the late phase of pseudotime upregulated the expression of cytokines including ifng and il , which are th and th cytokines, respectively (fig. d) . in addition, il (a th and th cytokine) was upregulated in some cells in pseudotime whereas il- was highly sustained in both of the pseudotime trajectories. these indicate that differentiation processes for t-cell lineages are simultaneously induced in activated t-cells from the lung of covid- patients. 
accordingly, we hypothesized that cd -expressing activated t-cells preferentially differentiate into effector t-cells in severe covid- patients, instead of their most frequent fate as tregs in a normal setting. in order to test this hypothesis and reveal dynamics of each t-cell differentiation pathway, we analysed co-regulated genes across pseudotime , obtaining gene modules by hierarchical clustering (fig. e). heatmap analysis of pseudotime successfully captured the pseudo-temporal order of gene expression: genes in module ii and il r are activated first, followed by genes in module iv (apart from il r), subsequently by genes in module i, and lastly genes in module iii alongside module ii genes again (fig. e). reasonably, module ii contained the ap- transcription factors fos and jun, suggesting that t-cells that highly express these genes have been recently activated. in pseudotime , these fos + jun + t-cells were followed by t-cells with high expression of genes in module iv, which contained cd lg and faslg (fig. e). these cd lg + faslg + t-cells are considered to activate cd -expressing macrophages and dendritic cells as well as inducing apoptosis of fas-expressing cells by providing cd signalling and fas signalling upon contact. cd lg + faslg + t-cells are followed by t-cells that highly expressed genes in module i, which include the th transcription factor tbx , the th transcription factor gata , and foxp . in addition, these t-cells upregulated the immediate early genes egr and nfatc and the activation markers cd and pdcd (fig. e). these collectively suggest that those (gitr), all of which were found to be upregulated (fig. e). on the other hand, in order to further understand why cd -expressing t-cells failed to differentiate into effector tregs, we hypothesized that cd -expressing activated t-cells are more likely to differentiate into multiple effector t-cell subsets in severe covid- patients than in mildly affected individuals.
in fact, il ra + cd + t-cells from severe covid- patients expressed th , th , and il- signature genes more frequently (fig. f) whereas th differentiation was suppressed in il ra + t-cells from both groups. although il- has been classically regarded as a th cytokine, th cells can produce il- . in fact, il , il , il a, and il were not detected in any t-cells analysed in the dataset (data not shown). thus, we asked if th differentiation was diverted into il- producing immunoregulatory t-cells (tr ), which differentiate by il- signalling and produce il- and thereby suppress immune responses particularly in mucosal tissues. surprisingly, however, higher frequencies of il ra + t-cells expressed th and th signature genes, concomitantly expressing il signature genes (fig. g). this strongly supports the view that il- producing t-cells are not immunoregulatory but th -th multifaceted effector t-cells. il ra and il rb were expressed in activated t-cells in both of the trajectories (supplementary fig. c), and the frequency of il ra + cells was significantly higher in il + cells than il -cells ( % vs %, p = . , fig. h). this suggests that a positive feedback loop for il- expression promoted the differentiation of the multifaceted effector t-cells in an autocrine manner , . intriguingly, furin expression was increased in cd + t-cells from severe covid- patients (fig. d). furin was previously associated with treg functions in a knockout study , although the underlying mechanisms were not clear. in addition, it was not known if furin was specifically expressed in tregs amongst cd + t-cells. recently we showed that the majority of treg-type genes are in fact regulated by tcr signalling , , since tregs receive infrequent-yet-regular tcr signals in vivo . we hypothesise that t-cells produce furin upon activation, which can enhance the viral entry of sars-cov- into lung epithelial cells during inflammation.
firstly, we analysed the microarray data of various cd + t-cell populations from mice . in line with our previous observations , all the antigen-experienced and activated t-cell populations including tregs, memory-phenotype t-cells, and tissue-infiltrating effector t-cells showed higher expression of furin than naïve t-cell populations (fig. a). next, we asked if human naïve and memory cd + t cells can express furin upon receiving tcr signals. we addressed this question using the time course rna-seq analysis of cd ra + cd ro − cd + naïve t-cells and cd ra − cd ro + cd + memory t-cells which were obtained from individuals and activated by anti-cd and anti-cd antibodies . sustained over weeks in the culture (fig. b). memory t-cells also showed higher expression of furin over the time course (p = . ), with a significant difference at its peak ( hours, p = . ). thus, furin expression is induced by tcr signals in human and mouse t-cells. in sars-cov- infection, % of cd + t-cells and % of il ra + cd + t-cells from severe patients expressed furin, while in mild patients only % and % of those cells, respectively, expressed furin (fig. c). importantly, furin was significantly induced in cd + t-cells in pseudotime , particularly when t-cells upregulated cd , ctla- , and tnfrsf molecules, but not in pseudotime (fig. d). these collectively support that furin expression is induced in highly activated non-regulatory cd + cd + t-cells in severe covid- patients. our study has shown that cd + t-cells in severe covid- patients have dysregulated activation and differentiation mechanisms. the most remarkable defect was the decoupling of treg-type activation and foxp expression, which normally occur simultaneously to sustain the effector treg population while inflammation is resolved. this treg-type setting is further enhanced upon activation, when tregs begin to show the effector treg phenotype, further upregulating the expression of the immune checkpoint molecules.
importantly, tregs need to sustain foxp transcription in a persistent manner across time, otherwise they can downregulate foxp expression and become effector t-cells. since il- signalling enhances foxp transcription, cd + t-cells are likely to differentiate into foxp + tregs in normal situations. however, in severe covid- patients, those cd + t-cells are considered to be vigorously proliferating, whilst becoming multifaceted effector t-cells or dying, instead of maturing into foxp + tregs. accordingly, we propose to define the unique activation status of cd + foxp -t-cells as hyperactivated t-cells (fig. ). cd expression occurs mostly in cd + t-cells, and therefore, these cd + hyperactivated t-cells are likely to be the source of the elevated serum soluble cd in severe covid- patients. these hyperactivated cd + t-cells produce furin, which can further enhance sars-cov- viral entry into pulmonary epithelial cells. the risk factors for the development of severe covid- include age, obesity, cardiovascular diseases, diabetes, and the use of corticosteroids. these diseases are associated with dysregulated hormonal and metabolic environments that can dysregulate the homeostasis of cd + t-cells and foxp -expressing tregs. thus, it is imperative to investigate if genes and metabolites associated with the disease conditions have any roles in promoting the differentiation of hyperactivated t-cells. previous reports showed that tregs accumulated in atherosclerotic lesions, and foxp expression was reduced in cd + cd + t-cells from patients with prior myocardial infarctions. in addition, t-cells in patients with obesity may show different responses to t-cell activation. intriguingly, leptin, a key hormone produced by adipose tissue, is thought to prevent cd + cd + t-cell proliferation but is relatively deficient in obese patients. furthermore, the function of tregs is impaired in type- diabetes patients.
in addition, foxp transcription is transiently activated in t-cells of severe covid- patients but may be repressed due to their unique metabolic states. t-cell activation is dependent on glycolysis, which converts glucose to pyruvate, and the tricarboxylic acid (tca) cycle, which activates oxidative phosphorylation (oxphos) and generates atp in mitochondria. treg differentiation is more dependent on oxphos and can be inhibited by glycolysis. importantly, our pathway analysis suggested that these metabolic pathways were altered in severe covid- patients, although further studies on metabolism are required. furthermore, the hypoxic environment in the lung of severe covid- patients may activate hif- a, which mediates aerobic glycolysis, and thereby promotes the degradation of foxp proteins. the reduction of foxp proteins may result in the abrogation of the foxp autoregulatory transcriptional loop, thus blocking treg differentiation. cd + hyperactivated t-cells also expressed pd- , and pd-l expression in macrophages was increased in severe covid- patients. this clearly shows that the pd- system is not able to control hyperactivated t-cells. this may be due to the status of macrophages and other antigen-presenting cells because cd on these cells disrupts the pd- -pd-l interaction and thereby abrogates pd- -mediated suppression. in addition, pd-l expression on lung epithelial cells may play a role in regulating pd- -expressing t-cells, as shown in other viruses including influenza virus and respiratory syncytial virus. hyperactivated t-cells differentiated into multifaceted th -th cells with il- expression. while il- may serve as a growth factor for these cells through their il- receptors, other cytokines in the microenvironment may drive the expression of both th and th (fig. ).
in conclusion, our study demonstrates that sars-cov- drives hyperactivation of cd + t-cells and immune paralysis to promote the pathogenesis of disease and thus life-threatening symptoms in severely affected individuals. therefore, therapeutic approaches to inhibit t-cell hyperactivation and paralysis may need to be developed for severe covid- patients. the single-cell-rna-seq data from covid- patients and healthy individuals were obtained from gse . the microarray data of murine t-cell subpopulations were from the immunological genome project (gse ). the rna-seq data in gse were used for time course analysis of naïve and memory cd + t-cells. we used h files of the scrna-seq dataset (gse ) which were aligned to the human genome (grch ) using cell ranger, by importing them into the cran package seurat . . single cells with high mitochondrial gene expression (higher than %) were excluded from further analyses. in silico sorting of cd + t-cells was performed by identifying them as the single cells expressing cd and cd e, because no other methods, including the bioconductor package singler, reliably identified cd + t-cells. the tcr-seq data of gse were used to validate the in silico sorting and also for analysing gene expression in expanded clones. macrophages were similarly identified by itgam expression and lack of pax , cd and cd e expression. pca was applied on the scaled data followed by a k-nearest neighbor clustering in the pca space. umap was performed on clustered data using the first pca axes. differentially the enrichment of biological pathways in the gene lists was tested by the bioconductor package clusterprofiler, using the reactome database through the bioconductor package reactomepa, and pathways with false discovery rate < . and q-value < . were considered significant. trajectories were identified using the bioconductor package slingshot, assuming that the cluster that shows the highest expression of il r and ccr is the origin.
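the quality-control filter and the in silico sorting described above amount to simple rules on per-cell gene counts: discard cells whose mitochondrial fraction exceeds a threshold, then keep cells that express every required marker and none of the excluded ones. the following is a minimal python sketch of that logic, not the authors' seurat workflow. the function names, the toy data, the placeholder marker names (MARKER_A, MARKER_B), the "MT-" gene prefix convention and the 10% cut-off are all assumptions made for illustration; the actual threshold and marker names are not recoverable from the text above.

```python
from typing import Dict, Iterable, List

# Hypothetical threshold: the cut-off used in the study is elided in the text.
MAX_MITO_PERCENT = 10.0


def mito_percent(counts: Dict[str, int]) -> float:
    """Percentage of a cell's counts that come from genes prefixed 'MT-'."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith("MT-"))
    return 100.0 * mito / total


def in_silico_sort(
    cells: Dict[str, Dict[str, int]],
    required: Iterable[str],
    excluded: Iterable[str] = (),
    max_mito_percent: float = MAX_MITO_PERCENT,
) -> List[str]:
    """Return ids of cells that pass QC, express all `required` genes
    (count > 0) and express none of the `excluded` genes."""
    required = list(required)
    excluded = list(excluded)
    selected = []
    for cell_id, counts in cells.items():
        if mito_percent(counts) > max_mito_percent:
            continue  # fails the mitochondrial QC filter
        if not all(counts.get(g, 0) > 0 for g in required):
            continue  # lacks at least one required marker
        if any(counts.get(g, 0) > 0 for g in excluded):
            continue  # expresses a marker of another lineage
        selected.append(cell_id)
    return selected


# Toy data with placeholder markers (not real measurements).
toy_cells = {
    "c1": {"MARKER_A": 2, "MARKER_B": 1, "ACTB": 10},
    "c2": {"MARKER_A": 2, "ACTB": 10},
    "c3": {"MARKER_A": 1, "MARKER_B": 1, "MT-CO1": 50, "ACTB": 2},
    "c4": {"ITGAM": 3, "ACTB": 5},
}
t_like = in_silico_sort(toy_cells, ["MARKER_A", "MARKER_B"])
macrophage_like = in_silico_sort(
    toy_cells, ["ITGAM"], excluded=["MARKER_A", "MARKER_B"]
)
```

treating any non-zero count as expression is the simplest possible rule; given dropout in scrna-seq data, a real analysis might validate the selection against an independent readout, as was done here with the tcr-seq data.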
the cran package ggplot was used to apply a generalised additive model of the cran package gam to each gene's expression data. genes that were differentially expressed across pseudotime were obtained by applying the generalised additive model to the dataset using gam, performing anova for nonparametric effects and thereby testing if the expression of each gene is significantly changed across each pseudotime (p-value < . ). the enrichment of cytokine-expressing single t-cells was tested using a chi-square test. the time course data of furin expression were analysed by one-way anova with tukey's honest significant difference test. (b) in severe covid- patients, hyperactivated macrophages may present antigens to cd + t-cells, which are activated and differentiate into cd + il r+ early activated t-cells which produce il- rather than il- . foxp transcription remains suppressed due to this and other unidentified mechanisms such as metabolism, while cytokines such as il- further enhance the activation of cd + t-cells, resulting in the generation of cd + hyperactivated t-cells that express immune checkpoints, multiple effector t-cell cytokines, and furin. the multifaceted th differentiation may lead to unfocused t-cell responses and thereby paralyse the t-cell system.
in addition, furin can activate the s-protein of sars-cov- .

mo is supported by mrc project grant (mr/s / ) and is currently a visiting associate

key: cord- - ku jc s authors: kraus, aurora; casadei, elisa; huertas, mar; ye, chunyan; bradfute, steven; boudinot, pierre; levraud, jean-pierre; salinas, irene title: a zebrafish model for covid- recapitulates olfactory and cardiovascular pathophysiologies caused by sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ku jc s the covid- pandemic has prompted the search for animal models that recapitulate the pathophysiology observed in humans infected with sars-cov- and allow rapid and high throughput testing of drugs and vaccines. exposure of larvae to sars-cov- spike (s) receptor binding domain (rbd) recombinant protein was sufficient to elevate larval heart rate and treatment with captopril, an ace inhibitor, reverted this effect. intranasal administration of sars-cov- s rbd in adult zebrafish recombinant protein caused severe olfactory and mild renal histopathology.
zebrafish intranasally treated with sars-cov- s rbd became hyposmic within minutes and completely anosmic by day to a broad spectrum of odorants including bile acids and food. single cell rna-seq of the adult zebrafish olfactory organ indicated widespread loss of expression of olfactory receptors as well as inflammatory responses in sustentacular, endothelial, and myeloid cell clusters. exposure of wildtype zebrafish larvae to sars-cov- in water did not support active viral replication but caused a sustained inhibition of ace expression, triggered type cytokine responses and inhibited type cytokine responses. combined, our results establish adult and larval zebrafish as useful models to investigate pathophysiological effects of sars-cov- and perform pre-clinical drug testing and validation in an inexpensive, high throughput vertebrate model.

species that have been reported as naturally susceptible to sars-cov- include rhesus and cynomolgus macaques (munster et al., ; rockx et al., ), ferret (kim et al., ), cat (shi et al., ), and syrian hamster (chan et al., ). mice, by contrast, are not spontaneously permissive to the virus, but mice expressing the human ace receptor provide a useful animal model (bao et al., ; jia et al., ; lutz et al., ). all these mammalian models have unique advantages and disadvantages for the study of immune responses to sars-cov- and other host-pathogen interactions but do not allow rapid, whole organismal, high throughput, and low-cost preclinical testing of drugs and immunotherapies. as model vertebrates, zebrafish are permissive to human viral pathogens including influenza a (gabor et al., ), herpes simplex virus type (burgos et al., ), chikungunya virus (palha et al., ) and human noroviruses gi and gii (van dycke et al., ). zebrafish offer many advantages over other animal models due to their high reproductive ability, rapid development, low maintenance costs, and small transparent bodies.
importantly, zebrafish olfactory, immune, and cardiovascular physiology share a significant degree of conservation with humans (postlethwait et al., ; saraiva et al.). genetically, more than % of disease related genes have a zebrafish orthologue (howe et al., ). the zebrafish innate immune system is already developed in the transparent larval stages and members of all major groups of mammalian cytokines have been identified in the zebrafish genome (gomes and mostowy, ; zou and secombes, ). academic laboratories and the pharmaceutical industry use zebrafish larvae in preclinical studies for assessing efficacy and toxicity of candidate drugs for several diseases (taylor et al., ). zebrafish is a proposed model for covid- and has recently been used in one vaccination study (galindo-villegas, ; ventura-fernandes et al., ). this study aims to elucidate the physiopathology of wildtype zebrafish in response to sars-cov- .

sars-cov- infection causes a wide range of presentations, from asymptomatic infection to mild or severe disease (menni et al., ). apart from respiratory symptoms, multi-organ pathologies are often reported with heterogeneous symptoms such as olfactory and taste loss, cardiac dysfunction, renal pathologies, neurological damage, muscle and joint pain, gastrointestinal symptoms, clotting disorders and others (tabata et mudd et al., ), and elevated type cytokine levels (lucas et al., ). importantly, this cytokine pattern is in sharp contrast to that found in patients experiencing mild or moderate symptoms, who are able to control exacerbated type and type cytokine responses (lucas et al., ).

sars-cov- enters human host cells when the sars-cov- spike (s) protein receptor binding domain (rbd) binds to angiotensin-converting enzyme (ace ) on a permissive host cell. a serine protease, such as tmprss , then cleaves the spike protein s /s site to facilitate fusion of the virion with the host cell membrane (hoffmann et al., a, b).
ace is expressed in many different cell types across many organs in the human body including lung, olfactory sustentacular cells, enterocytes, and endothelial cells (albini et ace , the use of ace inhibitors is being considered as a therapeutic intervention in covid- patients (lopes et al., ). importantly, drugs currently used to treat covid- can be pro-arrhythmic, and therefore there is a need to incorporate cardiovascular damage into the list of targets of therapeutic interventions in covid- and for models that replicate human cardiac physiology (kochi et al., ).

a hallmark of sars-cov- infection is acute loss of smell (cooper et al., ). viral-induced anosmia is not unique to sars-cov- infections since viruses such as rhinoviruses, influenza, parainfluenza and coronaviruses are known to be the main cause of olfactory deficits in humans (suzuki et al., ; imam et al., ). in mice and humans, ace expression is detected in sustentacular cells, olfactory stem cells known as horizontal and globose basal cells in the olfactory epithelium, and vascular cells (pericytes) in the olfactory bulb (brann et al.,

the present study reports for the first time that zebrafish larvae exposed to sars-cov- appear to mount innate immune responses that resemble cytokine responses of mild covid- patients. recombinant sars-cov- s rbd is sufficient to cause olfactory, renal and cardiovascular pathologies in larval and adult zebrafish. we also identify potential mechanisms of sars-cov- induced anosmia by scrna-seq. our findings support the use of zebrafish as a novel vertebrate model to elucidate sars-cov- pathophysiology and to screen drugs and other therapies targeting covid- .

results

phylogenetic analyses of ace molecules in vertebrates

comparative analysis of ace molecules in vertebrates indicated that ace molecules are well conserved in vertebrates with a %- % similarity and . %- % identity between zebrafish ace and human ace , respectively (table s ).
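the difference between the similarity and identity percentages reported above can be illustrated with a short calculation on a pairwise alignment. the sketch below is an assumption-laden example, not the authors' method: the function name, the grouping of amino acids into conservative classes, and the choice to score only ungapped alignment columns are all hypothetical, and alignment tools differ on each of these points.

```python
from typing import Tuple

# Hypothetical conservative-substitution classes; real tools usually
# derive similarity from a substitution matrix instead.
SIMILAR_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"), set("DE"),
    set("STNQ"), set("AG"), set("C"), set("P"),
]


def identity_and_similarity(a: str, b: str, gap: str = "-") -> Tuple[float, float]:
    """Percent identity and percent similarity over ungapped columns
    of two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # columns containing a gap are not scored
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared
```

by construction, similarity is never lower than identity, which matches the ordering of the two percentages in the comparison above.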
examination of ace amino acid motifs in the region involved in binding sars-cov- s protein revealed zebrafish ace has %/ % sequence similarity with the corresponding human ace region compared to %/ % in macaque ace or %/ % in ferret ace (

systemic injection of recombinant sars-cov- protein into adult zebrafish has been shown to induce some toxicity (ventura-fernandes et al., ). in order to determine whether recombinant sars-cov- s rbd protein causes inflammatory responses in zebrafish larvae, we exposed dpf larvae to sars-cov- s rbd recombinant protein for hours (h) and measured cytokine responses by qpcr. as shown in figure a, h immersion in sars-cov- s rbd protein induced a significant downregulation in ifnphi expression and a significant increase in expression of ccl a. , a pro-inflammatory chemokine. no changes in ace , tnfα, il b and il a/f expression were observed (figure a). these results indicate that rapid immune responses occur in zebrafish larvae exposed to sars-cov- s rbd.

we next evaluated the effects of sars-cov- s rbd on zebrafish larval heart function to validate zebrafish larvae as a model for covid- cardiac manifestations. we immersed - and -days post fertilization (dpf) zebrafish larvae in sars-cov- s rbd, or in vehicle, and measured heart rate after h. as shown in figure b, dpf and dpf zebrafish treated with sars-cov- s rbd had significantly higher heart rates compared to vehicle treated controls. as a positive control for the recombinant protein, we used animals treated with the same dose of the recombinant glycoprotein (r-ihnvg) of infectious hematopoietic necrosis virus (ihnv), a rhabdovirus known to cause severe endothelial damage in zebrafish (ludwig et al., ). r-ihnvg caused a severe decrease in larval zebrafish heart rate compared to control treated animals (figure b-c).
to determine if the increased heart rate induced by sars-cov- s rbd was dependent on ace binding, we co-incubated dpf larvae with captopril, which reverted the sars-cov- s rbd-induced heart dysfunction. captopril had no effect on r-ihnvg induced bradycardia (figure b). ventricular trace analyses showed marked differences in rhythm patterns in each treatment group (figure d). importantly, the captopril and sars-cov- s rbd treated animals, despite having similar heart rates to those of the vehicle treated controls, displayed a unique ventricular trace pattern, warranting future studies regarding the potential cardioprotective role of captopril in

combined, these results indicate that zebrafish exposed to sars-cov- s rbd protein experience tachycardia and suggest that zebrafish larvae constitute a valuable pre-clinical model to test the effects of drugs for covid- on cardiac activity in vivo.

anosmia is one of the earliest manifestations of sars-cov- infection in humans (cooper et al., ). we have previously shown that ihnv glycoprotein is sufficient to induce rapid nasal immune responses as well as neuronal activation in teleost fish (sepahi et (figure d). loss of the epithelial mosaic structure characteristic of the teleost olfactory epithelium was observed on days , and post-treatment (figure e). by day , loss of entire apical lamellar areas due to severe necrosis was observed in the olfactory lamellae of all treated animals compared to controls (figure f). significant loss of olfactory cilia was recorded in all animals treated with sars-cov- s rbd at all time points (figure h). these results indicate that sars-cov- s rbd is sufficient to cause inflammation, edema, hemorrhages, ciliary loss, and necrosis in the olfactory organ of zebrafish. hence, olfactory damage can be caused by indirect mechanisms and in the absence of active sars-cov- replication in this tissue.
toxic effects of intranasal sars-cov- s rbd delivery were also evaluated in distant tissues such as the kidney, a target organ of sars-cov- . acute kidney injury (aki) incidence varies from . % to % in covid- patients (su et al., ). renal damage, especially aki, is also common in patients with ras dysfunction such as diabetic patients who suffer from hypertension (ribeiro-oliveira et al., ; advani, ). a recent study in zebrafish injected with the n-terminal part of sars-cov- s protein reported inflammation and damage in several tissues of adult zebrafish including kidney and d post-injection (ventura-fernandes et al., ). in the present study, histological examination of the head-kidney of zebrafish that received sars-cov- s rbd intranasally revealed renal tubule pathology characteristic of aki h post-treatment (figure s ). pathology was not as severe at later time points, but vacuolation of the renal tubule epithelium was still visible days post-treatment (figure s ). we did not observe signs of glomerulopathology in treated animals compared to controls. together, these results indicate that intranasal delivery of sars-cov- s rbd is sufficient to cause nephropathy in adult zebrafish but that pathology is not as severe as when the protein is delivered by injection.

intranasal delivery of sars-cov- s rbd causes anosmia in adult zebrafish

adult zebrafish exposed to sars-cov- s rbd had a significant reduction of olfactory responses to food extracts (~ % of pre-exposure olfaction) within minutes as measured by electro-olfactogram (eog), indicating an immediate effect of the protein on olfactory function (fig. a). the reduction of olfaction was sustained for at least one hour of recording, but the olfactory organ remained partially functional. zebrafish treated with pbs never lost olfaction at any time point.
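expressing eog responses relative to pre-exposure olfaction, as in the comparison above, amounts to normalising each post-treatment response to a baseline. the sketch below assumes that the baseline is the mean of the pre-exposure response amplitudes and that the reduction is reported as 100 minus the mean normalised response; the function names and these conventions are hypothetical and are not taken from the study's recording protocol.

```python
from statistics import mean
from typing import List, Sequence


def percent_of_baseline(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Each post-treatment response amplitude as a percentage of the
    mean pre-exposure amplitude (assumed baseline definition)."""
    baseline = mean(pre)
    if baseline == 0:
        raise ValueError("baseline response must be non-zero")
    return [100.0 * response / baseline for response in post]


def percent_reduction(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean reduction of the olfactory response relative to baseline."""
    return 100.0 - mean(percent_of_baseline(pre, post))
```

the same two functions could be applied per odorant, or to the treated and untreated naris of one animal, with the untreated side supplying the baseline.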
to further quantify the degree of olfactory reduction due to sars-cov- s rbd, we took advantage of the two easily accessible and isolated olfactory chambers present in zebrafish. we exposed one naris to the sars-cov- s rbd protein and the other naris, from the same animal, to pbs and waited h or d before measuring olfaction by eog. at h we observed a - % reduction in food and bile olfactory responses between the treated and untreated naris and a complete loss of olfactory function to both odorants d post-treatment (fig. b). the reduction of olfactory sensitivity for food extract was smaller than that found for bile, probably due to the lower number of osns involved in bile acid detection compared to amino acids found in food (hansen et al., ). our results indicate that sars-cov- s rbd-induced anosmia is not specific for a subset of osns, since both food extract and bile olfactory signals were suppressed in sars-cov- s rbd treated zebrafish. this fact, together with the d time to develop complete anosmia and disrupt olfactory epithelial structure, supports the hypothesis that sars-cov- s rbd damage may occur first on sustentacular cells, with subsequent impacts on osn viability and function.

single-cell analysis of the zebrafish olfactory organ

to understand the impact of sars-cov- s rbd on zebrafish oo, we performed single cell clusters, endothelial cell (ec) clusters and leucocyte (lymphoid and myeloid) clusters (figure a-b). of the ncs, neuron and neuron corresponded to mature osns. neuron expressed markers of ciliated osns (ompa and ompb) in addition to several olfactory receptor (or) genes (buiakova et al., ).
neuron , on the other hand, expressed markers of microvillus osns (trpc b, s z, and gnao) and many vomeronasal receptors (vr) such as v rh as well as or gene (figure s ) (kraemer et al., neuron and neuron expressed cell cycle and early neuronal progenitor markers as well as tmprss b and tmprss a; however, the majority of cluster identifying genes in these two clusters are undescribed (figure s ). scs are supporting cells that exist in the neuroepithelium around osns and, in humans, they express ace (bryche et al., ). scs in the olfactory epithelium can directly arise from horizontal basal cells (hbcs) (yu and wu, ). we found clusters of scs in our datasets. subpopulation closely related to the sustentacular cluster (figure s ). we identified three clusters of ecs that all express tmprss and tmprss . while we did not detect ace expression in any cell clusters, we detected ace mrna in adult zebrafish olfactory organ, and at low levels in the olfactory bulb (figure s ). ace expression levels have previously been shown to be low in neuronal tissues and therefore may be hard to detect by scrna-seq (song et al., ). endothelial and clusters expressed the endothelial markers sox and tmp a (yao et al., ). all three clusters broadly expressed genes associated with tight junctions (tjp , jupa, ppl, cldne, and cgnl ) as well as many keratin genes (figure s ). interestingly, endothelial cluster also expressed the calcium channel trpv and a number of non-annotated genes. there are copious amounts of immune cells in the teleost olfactory organ (

intranasal delivery of sars-cov- s rbd induces inflammatory responses and widespread loss of olfactory receptor expression in adult zebrafish olfactory organ

the cellular landscape of the zebrafish olfactory epithelium was affected by sars-cov- s rbd treatment and time (figure a-d).
this was especially evident in the proportions of neuronal cell types d post-treatment, when the proportion of mature, omp + ciliated osns was much lower compared to controls and the h treated group. in contrast, neuronal progenitors expressing cell cycle markers (aubk, ecrg , and mki ) and neuronal differentiation and plasticity markers (neurod , neurod , gap , sox , and sox ) were expanded d post-treatment (figure b-c). further, we detected a noticeable decrease in the proportion of cells belonging to the lymphocyte cluster, a cluster that expressed markers of treg cells (foxp b), h post intranasal delivery of recombinant sars-cov- s rbd, but this change was not noticeable at day (figure c). at d, we observed a third lymphocyte cluster, not detected at h, highly expressing the tcr subunit zap as well as plac onzin related protein (ponzr ), a molecule that has immunoregulatory roles during th type immune responses in mammals (figure s in olfactory neuronal clusters were enriched in processes such as neuron differentiation, sensory system development, and sensory organ morphogenesis, whereas downregulated genes belonged to sensory perception of smell, detection of chemical stimulus and gpcr signaling pathway (figure f). functional enrichment analyses in metascape showed that the top non-redundant enriched cluster in both upregulated and downregulated genes in zebrafish osns h post-treatment was sensory perception of smell (figure g). the same was true days post-treatment, but processes such as regeneration, neuron development, neuron fate commitment and the p signaling pathway were also enriched within the upregulated genes (figure h). combined, these results suggest that the presence of sars-cov- s rbd in the olfactory organ has harmful effects on osns within hours and that the magnitude of the osn damage increases by days post-treatment.
further, these analyses indicate that neuronal regeneration and differentiation processes were initiated by day in order to begin repair of olfactory damage. our study allowed us to dissect how each cell type in the zebrafish olfactory organ responds to sars-cov- s rbd. our results indicated unique responses by sc clusters and ec clusters to treatment (figures and ) . at h, we detected increased expression of apoeb, of transcription factors foxq a and id b, the transcriptional regulator nfil - , two tumor necrosis factor receptor superfamily members (tnfrsf b and tnfrsf a) as well as tcima (transcriptional and immune response regulator), whose mammalian ortholog pcim regulates immune responses as well as endothelial cell activation and expression of inflammatory genes ( figure a ) (kim et al., ). further, at h we observed downregulation of the pro-inflammatory cytokine il af/ as well as glutathione peroxidase gpx b, the transcription factor notch b, basal cell adhesion molecule bcam, guanine nucleotide-binding protein subunit gamma gng , and the calcium binding s z in sc and ec from sars-cov- s rbd treated olfactory organs relative to vehicle treated. at d post-treatment, we observed significant increased expression of the gene that encodes brain natriuretic peptide (nppc), a vasodilating hormone, the pro-inflammatory chemokine ccl a. , the m macrophage marker arg , the transcription factors foxq a and sox a, tubulin beta tubb , and the epithelial mitogen epgn, among others ( figure b ). downregulated genes at d post-treatment included hsp . , apoeb, the osteoblast specific factor b postb, the desmosomal component periplakin (ppl), the vasoconstricting endothelin (edn ), the heparin binding molecule latexin (ltx) involved in pain and inflammation, and cd b, a part of the mhc-ii complex ( figure b ). 
combined, these data indicated immune regulatory responses in sc and ec clusters early after sars-cov- s rbd treatment, followed by transcriptional changes with potential vasoactive effects by day . go and enrichment set analyses indicated that sc and ec clusters initially undergo transcriptional changes enriched in metabolic responses, response to stress, and cell differentiation (figure c). later on, at day , sc and ec responses were enriched in genes involved not only in the stress response but also in immune responses and responses to wounding (figure d). similar results were identified using metascape, which showed that the inflammatory response to wounding was moderately enriched in the downregulated genes at h, whereas by day , response to wounding became the top enriched set among the upregulated genes (figure e-f).

exposure of wildtype zebrafish larvae with sars-cov- does not support viral replication

zebrafish larvae have been used as models to investigate several human viruses. infecting zebrafish larvae in a bsl- laboratory by immersion in contaminated water is comparable to infecting a cell line. we first checked the stability of sars-cov- in zebrafish water over time, in the absence of any animals. we found that sars-cov- viral loads in the water remained stable throughout the experiment (figure a-b). we exposed wildtype ab zebrafish larvae to live sars-cov- and examined viral mrna abundance over time to determine if zebrafish larvae can support viral replication. we detected no increases in the viral n copy numbers over time and a steady decline in e gene copy numbers in both water from wells containing larvae and virus as well as in the larval tissue (figure c-f). these results indicate that wild-type zebrafish larvae cannot support efficient sars-cov- replication, as suggested by the in silico comparative sequence analyses of the zebrafish ace molecule.
exposure of zebrafish larvae to sars-cov- decreases ace expression and triggers pro-inflammatory cytokine responses

in order to determine whether exposure of zebrafish larvae to live sars-cov- causes changes in ai, we measured ace mrna levels in control and sars-cov- exposed larvae over time. ace expression was significantly downregulated as early as h post-infection. inhibition of ace expression was sustained over the time course of the experiment, with the greatest decrease occurring days post-infection (dpi) (figure a). we next evaluated changes in expression of cytokine and chemokine genes to establish whether zebrafish mount inflammatory responses that resemble the patterns of mild or severe sars-cov- infection. il β expression was significantly upregulated at h, dpi ( - fold) and dpi ( fold) and significantly downregulated at dpi (figure b). we detected a significant increase in tnfa expression in sars-cov- exposed larvae dpi (figure c). ifnphi and ifnphi are the two main type i ifn genes involved in larval zebrafish antiviral responses (levraud et al., ). we detected a significant up-regulation of ifnphi at and dpi, whereas expression was inhibited at dpi. interestingly, ifnphi expression followed a very different pattern compared to ifnphi : it was significantly downregulated dpi but significantly upregulated at dpi (figure d-e). mxa expression was significantly downregulated at all time points (figure f). il af/ expression was significantly elevated , and dpi (figure g). expression levels of il , a member of the il family, were downregulated hpi and dpi (figure h), whereas the type ii cytokine il /il b was downregulated at hpi, dpi and dpi (figure i). further, a significant increase in the expression of the chemokine ccl a. was detected in infected larvae at and dpi compared to controls (figure j). a moderate increase in ccl a. expression was observed at dpi, followed by a strong down-regulation ( fold) at dpi (figure k).
taken together, these data indicate that exposure to sars-cov- induces a significant antiviral and pro-inflammatory immune response in wildtype zebrafish larvae. this response involved type i ifn, tnfa, il b, il and ccl , reminiscent of covid- patients with mild disease.

the current covid- pandemic has propelled the investigation of sars-cov- and the development of animal models that help identify therapeutic interventions and vaccines for covid- . thus far, all animal models reported are mammals, and therefore breeding, genetic manipulation, and animal housing in bsl- laboratories make these models costly and not readily available in large numbers. zebrafish can overcome many of the limitations of mammalian models thanks to their transparent bodies, short life-span, low maintenance costs and production of large numbers of embryos. we therefore performed the simplest infection procedure, where sars-cov- was added to the water of zebrafish larvae. in this manner, bsl- trained personnel with no experience in zebrafish microinjection can readily expose larvae to sars-cov- without the need of animal protocols, in a similar fashion to in vitro cell culture infections. exposure of wildtype zebrafish larvae to sars-cov- in the water did not, however, result in any detectable viral replication. downregulated in response to sars-cov- exposure. combined, these data suggest that sars-cov- induces some type i ifn responses in zebrafish larvae while inhibiting others. future studies are clearly needed to ascertain the role of teleost type i ifn in the anti-sars-cov- immune response.

s protein is a structural protein of sars-cov- and therefore the target of several vaccine trials. therefore, we exposed zebrafish larvae to sars-cov- s rbd protein and investigated transcriptional and physiological responses. rapid changes in gene expression were detected in treated larvae, including up-regulation of the chemokine ccl a. and down-regulation of ifnphi .
the ccl /ccl axis appears to be critical in teleost antiviral innate responses, as previous studies have shown very rapid responses in larvae exposed to the rhabdovirus svcv (sepahi et al., ). this change was also detected in the larvae that were exposed to the live sars-cov- virus in the present study. we further detected a significant down-regulation of the type i ifn gene ifnphi in larvae exposed to sars-cov- s rbd protein.

examination of ace transcriptional changes in zebrafish larvae exposed to sars-cov- revealed a consistent down-regulation in expression throughout the course of infection. interestingly, we did not observe any changes in ace expression after h immersion in sars-cov- s rbd protein. recently, enterocytes were found to be the main cell type expressing ace in dpf-old zebrafish larvae (postlethwait et al., ), and therefore it is possible that the down-regulation in ace expression observed in our experiments was the result of enterocyte responses to sars-cov- . however, an olfactory epithelial cell cluster was not identified in this dataset, probably because these cells constitute too small a fraction of the cells of an entire larva. importantly, we exposed larvae to the virus at dpf, when the olfactory pit is already sampling the surrounding water, while the gut fully opens only at dpf. thus, changes in ace expression levels in the olfactory pit of the zebrafish larvae cannot be ruled out at this point. previous work has shown that ace knockdown in mice protects from sars-cov infection (kuba et al., ). thus, down-regulation of zebrafish ace expression may have protected larvae from sars-cov- infection in our experiments. our data agree with studies in mouse lungs, where suppression of ace gene expression was consistently observed following sars-cov- infection (chen et al., ). interestingly, changes in ace levels can occur in response to viruses that do not require ace for host entry (chen et al., ).
thus, although further studies are warranted, our data suggest that ace is involved in antiviral sars-cov- responses in zebrafish. we took advantage of the zebrafish model, which allows for easy live imaging of heart beats in transparent larvae. we detected in vivo cardiac responses, characterized by tachycardia, in larval zebrafish exposed to sars-cov- s rbd protein. cardiac arrhythmia is a common symptom among covid- patients and current research efforts aim to understand how sars-cov- infection impacts cardiovascular function (libby, ). our findings underscore that sars-cov- s rbd is able to cause tachycardia in the zebrafish larval model and that this model can be used for rapid evaluation of drug treatments for covid- . as a proof of concept, we used captopril, an ace inhibitor currently being evaluated in human clinical trials (nct ). captopril treatment ameliorated tachycardia in zebrafish larvae exposed to sars-cov- s rbd recombinant protein. our studies therefore suggest the beneficial use of captopril in covid- patients undergoing cardiac arrhythmia, but clearly further studies are required to fully translate these findings to the clinic and to determine the duration and timing of captopril treatment in covid- patients. a recent report in zebrafish adults indicated that injection of recombinant sars-cov- s n terminal protein caused histopathology of several tissues including the liver, kidney, brain and ovary (ventura-fernandes et al., ). additionally, some animals succumbed to injection with the recombinant protein. we did not detect any mortalities in either larvae or adults in any of our experiments, perhaps suggesting that mortalities were due to the injection procedure rather than the protein treatment itself. of note, the dose used in the present study was considerably lower than the dose delivered in the ventura-fernandes study, perhaps explaining the differences in toxicity between the two studies.
we observed histological damage following a single intranasal delivery of sars-cov- protein, specifically at the site of delivery, the olfactory organ, whereas more transient and moderate damage was detected in the renal tubules. renal damage may have occurred by direct uptake of sars-cov- s rbd by the kidney once the protein reached the bloodstream following intranasal administration or, alternatively, by activation of ras or inflammatory cascades at the olfactory organ. thus, toxicity of sars-cov- s protein in adult zebrafish may be less severe when delivered intranasally than by injection, and future studies should evaluate whether current vaccine candidates also exert similar effects and whether different administration routes cause the same side-effects or not. this is particularly important as the intranasal route appears promising for some vaccine candidates study did not determine when zebrafish recover olfactory function following sars-cov- s rbd intranasal treatment, but based on our histopathological observations, the olfactory organ was still severely damaged days after treatment, suggesting that recovery of olfactory function may take several weeks in our model. our findings therefore indicate that similar to humans, zebrafish suffer from olfactory pathology and loss of smell in response to sars-cov- s rbd protein. thus, olfactory pathophysiology appears to occur even in the absence of viral replication raising the possibility that nasal vaccines for covid- may also cause transient anosmia in humans. in conclusion, the present study reports that both adult and larval wild-type zebrafish can be useful models to advance our understanding of covid zebrafish larvae in responses to sars-cov- s rbd. animals were exposed to sars-cov- s rbd protein (r-spike, ng/ml) for h at . °c or vehicle. changes in gene expression were measured by rt-qpcr using rps as the house-keeping gene. each data point represents a pool of larvae/well. 
data are expressed as fold-change compared to vehicle controls using the pfaffl method. (b) average heart beats per minute of dpf (n= ) zebrafish larvae after h of incubation with vehicle, ng/ml r-spike, or ng/ml r-ihnvg. (c) average zebrafish heart beats per minute in dpf zebrafish larvae (n= ) after h of incubation with vehicle, ng/ml r-spike, or ng/ml r-ihnvg with and without treatment with mm of captopril. heart beats were recorded for min at under a nikon ti microscope and (c-d) mean viral loads quantified as log of sars-cov- n gene and sars-cov- e gene copy numbers in control supernatants from wells with larvae not exposed to virus, and supernatants from wells with larvae that were exposed to pfu of sars-cov- for h, d, d and d. each sample represents the supernatant of one well containing larvae. (e-f) mean viral loads quantified as log of sars-cov- n gene and sars-cov- e gene copy numbers in control larvae and larvae exposed to pfu of sars-cov- for h, d, d and d. each sample point represents one well containing larvae. larval infections began at dpf after mechanical dechorionation at dpf. * p-value< . ; ** p-value< . ; *** p-value< . . results are representative of two independent experiments. for all experiments, wild type ab zebrafish were obtained from zirc (oregon, usa). for the intranasal delivery of sars-cov- s rbd protein into adult zebrafish, female and male adult zebrafish were obtained from dr. wong's laboratory at the university of nebraska due to lockdown of zirc during the pandemic. all fish were maintained in a filtered aquarium system at ℃ with a h light and h dark cycle at the university of new mexico aquatics animal facility. all experiments with adults utilized a mix of male and female animals, and the sex of larvae is indeterminable. animals were fed gemma complete nutrition (skretting) ad libitum. ab larvae were obtained by batch-crossing ab adults allowing for natural fertilization.
the morning of fertilization, larvae were collected at n= per petri dish and kept in e medium containing . % methylene blue. in the afternoon, larvae were placed in fresh e medium without methylene blue and non-surviving embryos were removed. larvae were maintained at . ℃ in e medium until dpf when they are slowly changed to system water. sars-cov- the sars-cov- isolate, a cdc isolate from a us patient (usa-wa / ), was obtained from bei resources. the strain was grown at a low moi to minimize generation of noninfectious particles and low passaged virus was used in all experiments described. the virus was propagated on vero e cells and viral loads quantitated by rt-qpcr and by plaque forming assays as we have described previously (bradfute et al, ). intranasal delivery of sars-cov- s rbd recombinant protein to adult zebrafish adult zebrafish were anesthetized for min in . mg/ml tricaine-s (syndel) solution and then moved to an absorbent boat where their gills were still covered with anesthetic solution for administration of solutions to nares. using a microloader tip (eppendorf, ), μl of ng/μl sars-cov- s rbd (kindly provided by dr. f. krammer) was directly pipetted into each naris, while μl of sterile pbs was applied in control fish. after inoculation, animals were recovered in a separate tank supplemented with o before returning to their rearing tank until the end of the experiment. euthanasia was performed on ice to ensure rapid death without perturbing the combined supernatants were centrifuged for min at g in supplemented neurobasal medium and cells were counted with a hemocytometer. viability was estimated by trypan blue staining. cells were then strained twice through flowmi μm strainers and loaded onto the chromium controller with a viability of > %. 
cell libraries were generated according to x genomics protocols at the university of new mexico cancer center genomics core facility and sequenced on an illumina novaseq at the university of colorado genomics and microarray core facility. sequencing depth and statistics of the scrna-seq run are shown in figure s . sras for this project can be found on ncbi under bioproject #prjna . fastqs were run through the cell ranger v . pipeline with default settings using the grcz zebrafish genome. output matrices were loaded to r (v . . ) as a seurat object (package seurat v . . ). first, cells with less than or greater than features, and greater than % mitochondrial features were removed. after counts were normalized using the "lognormalize" method and a scale factor of , variable genes were selected using the 'vst" method. data was scaled, and pca dimensional reduction was run. jackstraw analysis determined the vehicle control to have significant principal components (pcs) and the treated samples to have significant pcs which were used for clustering analysis. sars-cov- s rbd treated samples were integrated with the vehicle treated sample and clustered together using significant pcs and a resolution of . . cluster markers were identified with "findallmarkers" in seurat and exported for cluster identification. differential expression analysis was done with seurat "findmarkers" in default settings for each cluster and exported for gene ontology analysis. gene ontology (go) analysis was done with web-based guis metascape and shinygo v . which draw multiple currently maintained databases (ensembl, entrz, kegg among others). biological process webs were created using biological process output from shinygo v . in prism graphpad. biological processes bar graphs were produced by metascape. electro-olfactogram recordings adult ab zebrafish were anesthetized and received µl of recombinant sars-cov- s rbd protein ( ng/µl) in pbs or pbs alone. 
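the quality-control and normalization steps of the scrna-seq analysis described above can be sketched as follows. this is a minimal pure-python illustration, not the authors' seurat code: the function names, the per-cell dictionary layout and the `mito_genes` set are hypothetical, and the thresholds and scale factor are left as parameters because their values are not recoverable from the text. the normalization follows the usual definition of log-normalization (counts divided by the cell total, multiplied by a scale factor, then log1p-transformed).

```python
import math


def passes_qc(cell_counts, mito_genes, min_features, max_features, max_pct_mito):
    """Return True if one cell survives the feature-count and mitochondrial filters.

    cell_counts: dict mapping gene name -> raw count for a single cell.
    mito_genes: set of gene names treated as mitochondrial (hypothetical input).
    Thresholds are parameters; the values used in the study are not assumed here.
    """
    total = sum(cell_counts.values())
    if total == 0:
        return False
    n_features = sum(1 for count in cell_counts.values() if count > 0)
    mito = sum(count for gene, count in cell_counts.items() if gene in mito_genes)
    pct_mito = 100.0 * mito / total
    return min_features <= n_features <= max_features and pct_mito <= max_pct_mito


def log_normalize(cell_counts, scale_factor):
    """Log-normalize one cell: log1p(count / cell total * scale_factor)."""
    total = sum(cell_counts.values())
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in cell_counts.items()
    }
```

in practice these steps would be applied to every cell before variable-gene selection, scaling and pca, as described in the text.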
after h or day, zebrafish were anesthetized ( . g ms /l), placed in a v-shape stand and supplied with aerated water containing ms anesthetic ( . g/l). the nasal flap was removed with sterile fine forceps to expose the olfactory rosettes to a continuous tank water source. olfactory responses to zebrafish food extract or goldfish bile were measured by electrical recordings as detailed in sepahi et al., . the food extract was prepared as a filtered solution of l tap water and . g of dry food pellets. food extracts were separated into ml aliquots and kept frozen until the recording day. a . ml mix of bile fluid from adult goldfish was aliquoted in µl and kept frozen until the day of the recording. before each recording, bile aliquots were diluted : in water from the eog system. there were no significant differences in olfactory responses between males and females, hence responses of both sexes were averaged together. the percentage reduction in olfactory activity was calculated by dividing the amplitude of the olfactory signal at time x by the amplitude of the olfactory signal at time * . the percentage of olfactory signal reduction between control and treated naris was calculated as follows: ((amplitude response to odorant in control naris (mv) - amplitude response to odorant in treated naris (mv)) / amplitude response to odorant in control naris (mv)) * . (sigma) at ℃, stabilized for min at rt and then imaged to record heart-beat activity. zebrafish larvae heart-beat recordings and analysis as the agarose solidified, animals were adjusted to the microscope stage (approx. min), then hearts were recorded using brightfield avi for min at . frames/s at rt. avi images were then opened in imagej with the time series analyzer v plugin. circular rois were drawn in either the atrium or ventricle and average intensity was extracted. the maximum average intensity peaks were identified and counted per s as bpm. data were analyzed by one-way anova with tukey's post hoc test.
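the peak-counting step used to turn an roi intensity trace into beats per minute can be sketched as below. this is a simplified illustration, not the imagej plugin's algorithm: a peak is assumed to be a sample above the trace mean that is higher than its left neighbour and at least as high as its right neighbour, and the function names and the frame rate passed in the usage line are hypothetical.

```python
def count_peaks(trace):
    """Count local maxima that lie above the mean of the intensity trace."""
    if len(trace) < 3:
        return 0
    mean = sum(trace) / len(trace)
    peaks = 0
    for i in range(1, len(trace) - 1):
        # Strictly above the left neighbour so a flat-topped peak counts once.
        if trace[i] > mean and trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]:
            peaks += 1
    return peaks


def beats_per_minute(trace, frames_per_second):
    """Convert the number of intensity peaks in a recording to beats per minute."""
    duration_s = len(trace) / frames_per_second
    return count_peaks(trace) * 60.0 / duration_s
```

for example, `beats_per_minute(roi_mean_intensity, frames_per_second=20.0)` would take the per-frame mean intensity of one roi; real traces are noisier than this sketch assumes and would usually be smoothed first.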
infection of zebrafish larvae with sars-cov- ten animals were placed in each well in -well plates containing ml of tank water and transferred to the bsl facility the day before infection. gene expression analyses by rt-qpcr whole larvae rna was extracted using trizol. for tissue homogenization, bead beater tubes were preloaded with . g . mm dia zirconia beads, . g . mm zirconia beads and µl trizol. samples were loaded into the tubes and bead beaten at rpm for s. tubes were then centrifuged at , rpm for min. the homogenates/lysates were transferred to clean . ml microfuge tubes and spun at , rpm for min to pellet debris. supernatants were then processed to extract the total rna using a standard chloroform/phenol extraction protocol. rna was quantified by nanodrop, samples were normalized, and µg of rna was used to synthesize cdna using the superscript iii first strand system (thermofisher, ). qpcr was performed using ssoadvanced supermix (biorad, ) and primers listed in table s (supplemental methods).
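fold changes from qpcr data in this study are reported with the pfaffl method, which corrects the expression ratio for the amplification efficiency of the target and reference primers. a minimal sketch of that ratio is given below; the function and argument names are hypothetical, the ct values in the usage note are made up for illustration, and an efficiency of 2.0 corresponds to perfect doubling per cycle.

```python
def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Efficiency-corrected relative expression of a target gene.

    ratio = E_target ** (Ct_control - Ct_sample for the target gene)
            / E_ref ** (Ct_control - Ct_sample for the reference gene)
    """
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)
```

with illustrative values, a target gene that crosses threshold two cycles earlier in treated larvae while the house-keeping gene is unchanged gives `pfaffl_ratio(2.0, 25.0, 23.0, 2.0, 18.0, 18.0)`, a four-fold up-regulation relative to vehicle.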
gene expression changes were quantified using the pfaffl method (pfaffl, glycerophospholipid biosynthesis apoptosis microtubule polymerization or depolymerization gpcr signaling, coupled to cyclic nucleotide nd messenger negative regulation of cell differentiation iron uptake and transport circadian regulation of gene expression sensory perception of smell developmental process anatomical structure development response to wounding response to stress immune system response cell chemotaxis multicellular organism development acute kidney injury: a bona fide complication of diabetes the sars-cov- receptor, ace- , is expressed on many different cell types: implications for ace-inhibitor-and angiotensin ii receptor blocker-based cardiovascular therapies myocardial injury and covid- : possible mechanisms savalan the pathogenicity of sars-cov- in hace transgenic mice the establishment of neuronal properties is controlled by sox and sox imbalanced host response to sars-cov- drives development of covid- macrophage-mediated neuroprotection and neurogenesis in the olfactory epithelium cov- entry genes in the olfactory system suggests mechanisms underlying covid- - associated anosmia massive transient damage of the olfactory epithelium associated with infection of sustentacular cells by sars-cov- in golden syrian hamsters olfactory marker protein (omp) gene deletion causes altered physiological activity of olfactory sensory neurons zebrafish as a new model for herpes simplex virus type infection integrating single-cell transcriptomic data across different conditions, technologies, and species simulation of the clinical and pathological manifestations of coronavirus disease (covid- ) in golden syrian hamster model: implications for disease pathogenesis and transmissibility individual variation of the sars-cov- receptor ace gene expression and regulation acute inflammation regulates neuroregeneration through the nf-κb pathway in olfactory epithelium chronic inflammation directs 
an olfactory stem cell functional switch from neuroregeneration to immune defense covid- and the chemical senses: supporting players take center stage keiland broad host range of sars-cov- predicted by comparative and structural analysis of ace in vertebrates smell and taste disorders during covid- outbreak: cross-sectional study on patients an inflammatory cytokine signature predicts covid- severity and survival promoter transgenes direct macrophage-lineage expression in zebrafish expressions and significances of the angiotensin-converting enzyme gene, the receptor of sars-cov- for covid- influenza a virus infection in zebrafish recapitulates mammalian infection and sensitivity to anti-influenza drug treatment covid- : real-time dissemination of scientific information to fight a public health emergency of international concern a mechanistic model and therapeutic interventions for covid- involving a ras-mediated bradykinin storm shinygo: a graphical gene-set enrichment tool for animals and plants epigenetic contribution of high- mobility group a proteins to stem cell properties differential downregulation of ace by the spike proteins of severe acute respiratory syndrome coronavirus and human coronavirus nl the case for modeling human infection in zebrafish covid- and the cardiovascular system: implications for risk assessment, diagnosis, and treatment options impaired type i interferon activity and exacerbated inflammatory responses in severe covid- patients crystal structure of zebrafish interferons i and ii reveals conservation of type i interferon structure in vertebrates correlation between olfactory receptor cell type and function in the channel catfish a multibasic cleavage site in the spike protein of sars-cov- is essential for infection of human lung cells sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor the zebrafish reference genome sequence and its relationship to the human genome 
angiotensin-converting enzyme protects from severe acute lung failure is sars-cov- (covid- ) postviral olfactory dysfunction (pvod) different from other pvod? ace mouse models: a toolbox for cardiovascular and pulmonary research tc (c orf ) is a novel endothelial inflammatory regulator enhancing nf-κb activity infection and rapid transmission of sars-cov- in ferrets cardiac and arrhythmic complications in patients with covid- structural and functional diversification in the teleost s family of calcium-binding proteins intranasal vaccination with a lentiviral vector strongly protects against sars-cov- in mouse and golden hamster preclinical models a crucial role of angiotensin converting enzyme (ace ) in sars coronavirus-induced lung injury arrhythmias and sudden cardiac death in the covid- pandemic sars-cov- productively infects human gut enterocytes genes in zebrafish and humans define an ancient arsenal of antiviral immunity angiopoietin-like increases pulmonary tissue leakiness and damage during influenza pneumonia the heart in covid- : primary target or secondary bystander? 
jacc basic to composition and divergence of coronavirus spike proteins and host ace receptors predict potential intermediate hosts of sars-cov- continuing versus suspending angiotensin-converting enzyme inhibitors and angiotensin receptor blockers : impact on adverse outcomes in hospitalized patients with severe acute respiratory syndrome coronavirus ( sars-cov- ) -the brace corona trial longitudinal analyses reveal immunological misfiring in severe covid- whole-body analysis of a viral infection: vascular endothelium is a primary target of infectious hematopoietic necrosis virus in zebrafish larvae covid- preclinical models: human angiotensin-converting enzyme transgenic mice characterization of the immune barrier in human olfactory mucosa real-time tracking of self- reported symptoms to predict potential covid- targeted immunosuppression distinguishes covid- from influenza in moderate and severe disease respiratory disease in rhesus macaques inoculated with sars-cov- real-time whole-body visualization of chikungunya virus infection and host interferon response in zebrafish cc chemokine receptor expression defines a subset of peripheral blood lymphocytes with mucosal t cell phenotype and th or t- regulatory cytokine profile type i and type iii interferons -induction evasion, and application to combat covid- angiotensin ii induced proteolytic cleavage of myocardial ace is mediated by tace/adam- : a positive feedback mechanism in the activating a reserve neural stem cell population in vitro a new mathematical model for relative quantification in real-time rt- pcr an intestinal cell type in zebrafish is the nexus for the sars-cov- receptor and the renin angiotensin-aldosterone system that contributes to covid- comorbidities the renin-angiotensin system and diabetes: an update. vasc. 
health risk manag comparative pathogenesis of covid- , mers, and sars in a nonhuman primate model innate immune signaling in the olfactory epithelium reduces odorant receptor levels: modeling transient smell loss in covid- patients interplay between sars-cov- and the type i interferon response molecular and neuronal homology between the olfactory systems of zebrafish and mouse. sci. rep tissue microenvironments in the nasal epithelium of rainbow trout ( oncorhynchus mykiss two distinct cd α + cell populations and establish regional immunity olfactory sensory neurons mediate ultrarapid antiviral immune responses in a trka-dependent manner coronavirus placenta-specific limits ifnγ production by cd t cells in vitro and promotes establishment of influenza-specific cd t cells in vivo neuroinvasion of sars-cov- in human and mouse brain alterations in smell or taste in mildly symptomatic outpatients with sars- cov- infection renal histopathological analysis of postmortem findings of patients with covid- in china identification of viruses in patients with postviral olfactory dysfunction clinical characteristics of covid- in people with sars-cov- infection on the diamond princess cruise ship: a retrospective analysis small molecule screening in zebrafish: an in vivo approach to identifying new chemical tools and drug leads proinflammatory cytokines in the olfactory mucosa result in covid- induced anosmia a robust human norovirus replication model in zebrafish larvae zebrafish studies on the vaccine candidate to covid- , the spike protein: production of antibody and adverse reaction a rampage through the body neuronal wiskott-aldrich syndrome protein regulates tgf-β -mediated lung vascular permeability receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus clinical characteristics of hospitalized patients with coronavirus-infected pneumonia in wuhan, china remdesivir and chloroquine effectively 
inhibit the recently emerged novel coronavirus ( -ncov) in vitro the zebrafish activating immune receptor nitr signals via dap sars and mers: recent insights into emerging coronaviruses sox transcription factors in endothelial differentiation and endothelial-mesenchymal transitions spatiotemporal photolabeling of neutrophil trafficking during inflammation in live zebrafish regeneration and rewiring of rodent olfactory sensory neurons severe acute respiratory syndrome coronavirus infects and damages the mature and immature olfactory sensory neurons of hamsters angiomotin-like protein controls endothelial polarity and junction stability during sprouting angiogenesis metascape provides a biologist-oriented resource for the analysis of systems-level datasets clinical characteristics of covid- patients: a meta-analysis sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues the function of fish cytokines single-cell rna-seq data analysis on the receptor ace expression reveals the potential risk of different human organs vulnerable to -ncov infection key: cord- -miujzgtd authors: mishra, akhilesh; pandey, ashutosh kumar; gupta, parul; pradhan, prashant; dhamija, sonam; gomes, james; kundu, bishwajit; vivekanandan, perumal; menon, manoj b. title: mutation landscape of sars-cov- reveals five mutually exclusive clusters of leading and trailing single nucleotide substitutions date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: miujzgtd the covid- pandemic has spread across the globe at an alarming rate. however, unlike any of the previous global outbreaks the availability of a large number of sars-cov- sequences provides us with a unique opportunity to understand viral evolution in real time. we analysed full-length (> nt) sequences available and identified single-nucleotide substitutions occurring in > % of the genomes. majority of the substitutions were c to t or g to a. 
we identify c/gs with an upstream ttt trinucleotide motif as hotspots for mutations in the sars-cov- genome. interestingly, three of the substitutions occur within highly conserved secondary structures in the ’ and ’ regions of the genomic rna that are critical for the virus life cycle. furthermore, clustering analysis revealed a unique geographical distribution of sars-cov- variants defined by their mutation profile. of note, we observed several co-occurring mutations that almost never occur individually. we define five mutually exclusive lineages (a , b , c , d and e ) of sars-cov- which account for about three quarters of the genomes analysed. we identify lineage-defining leading mutations in the sars-cov- genome which precede the occurrence of sub-lineage defining trailing mutations. the identification of mutually exclusive lineage-defining mutations with geographically restricted patterns of distribution has potential implications for diagnosis, pathogenesis and vaccine design. our work provides novel insights into the temporal evolution of sars-cov- . importance the sars-cov- / covid- pandemic has spread far and wide with high infectivity. however, the severity of the infection as well as the mortality rates differ greatly across different geographic areas. here we report high frequency mutations in the sars-cov- genomes which show the presence of lineage-defining, leading and trailing mutations. moreover, we propose, for the first time, five mutually exclusive clusters of sars-cov- which account for % of the genomes analysed. this will have implications for diagnosis, pathogenesis and vaccine design. the covid- pandemic caused by the novel coronavirus sars-cov- (sars-like coronavirus ) is rapidly spreading across the globe with over million cases within a period of about months (https://www.who.int/emergencies/diseases/novel-coronavirus- ).
this is far more alarming than the sars (severe acute respiratory syndrome) epidemic of which affected countries, killed people, but was contained within months (https://www.who.int/csr/sars/country/table _ _ /en/). understanding the unique features of the sars-cov- is key to containing the covid- pandemic ( ). sars-cov and the sars-cov- are closely related rna viruses with plus strand rna genomes. rna viruses usually display high mutation rates that facilitate adaptability and virulence ( ) . the moderate mutation rates seen in sars-cov ( ) have been attributed to the presence of a '- ' exonuclease or proof-reading like activity ( ) . the rapid global spread of sars-cov- in a short period of time and the availability of a large number of fully sequenced genomes provide us with a unique opportunity of understanding the short-term temporal evolution of this virus in humans in a near real-time scale. while several studies have used phylogenetic analysis to reveal mutation hotspots in the viral genome and identify variants ( - ), we have utilized an alternative approach by focusing only on mutations present in > % genomes followed by clustering. by this approach we propose the classification of the sars-cov- virus genomes into mutually exclusive lineages with unique set of co-occurring mutations and geographic distribution. to understand the evolution of sars-cov- over the first three months of the pandemic, we performed a detailed analysis of full-length genomes available from the gisaid database from december , to march , . a total of sars-cov- sequences (> nt) were retrieved from gisaid (as on th march ). a multiple sequence alignment was performed to visualize the variations in sars-cov- genomes. to analyse the single nucleotide substitutions in detail, we decided to focus on those which were observed in > % of the genomes. 
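the frequency filter described above, keeping only substitutions seen in more than a chosen fraction of aligned genomes, can be sketched as follows. this is an illustration under stated assumptions rather than the authors' pipeline: sequences are assumed to be uppercase and already aligned to the reference, the function name is hypothetical, the frequency cut-off is a parameter because its value is not recoverable from the text, and `min_coverage` is an assumed stand-in for a per-position quality requirement (the fraction of genomes with an unambiguous base at that site).

```python
from collections import Counter

VALID_BASES = set("ACGT")


def frequent_substitutions(reference, aligned_genomes, min_fraction, min_coverage):
    """List substitutions whose frequency exceeds min_fraction.

    Returns tuples of (1-based position, reference base, alternate base, fraction),
    where the fraction is computed over genomes with an unambiguous base call.
    """
    hits = []
    n_genomes = len(aligned_genomes)
    for index, ref_base in enumerate(reference):
        if ref_base not in VALID_BASES:
            continue
        # Ignore gaps and ambiguous calls such as N at this column.
        called = [g[index] for g in aligned_genomes if g[index] in VALID_BASES]
        if not called or len(called) / n_genomes < min_coverage:
            continue
        alternates = Counter(base for base in called if base != ref_base)
        for alt_base, count in sorted(alternates.items()):
            fraction = count / len(called)
            if fraction > min_fraction:
                hits.append((index + 1, ref_base, alt_base, fraction))
    return hits
```

the resulting list of positions can then be turned into a per-genome presence/absence profile for clustering.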
as expected, the ' and ' ends of the sequences contained gaps and inconclusive nucleotide positions (ns) attributable at least in part to the technical shortcomings of sequencing. the substitutions were considered only when the quality of the sequences at a given nucleotide position was greater than %. our analysis revealed a total of nucleotide substitutions which occurred at > % in the sars-cov- genomes (table and figure a ). these include synonymous mutations, missense mutations and substitutions in the non-coding regions. the orf /n gene encoding the nucleocapsid phosphoprotein seems to have accumulated the maximum number (n= ) of mutations. this is also evident when substitution frequencies for each gene are normalized for gene length ( figure b ). among the non-structural proteins (nsps), the viral rna polymerase-helicase pair (nsp / ) accumulated nine mutations. these observations are important considering that the widely used diagnostic tests for the detection of sars-cov- target nsp (rdrp, rna-dependent rna polymerase) and n genes. while we found mutations in the nsp region, the substitution frequencies per unit length are rather low as nsp is the longest among the non-structural genes. it is interesting to note that no mutations were observed in the nsp region coding for the main protease (mpro), a widely investigated anti-viral target for sars-cov- ( , ). a quick scan along the kb genome of sars-cov- clearly suggests clustering of mutations at the ' end of the genome (figure ). it would be worth investigating differences, if any, in the fidelity of the virus polymerase or exonuclease pertaining to the ' end of the genome. all subsequent analyses presented in this study pertain to the mutations occurring at > % frequency. the mutations identified in the non-coding regions include one each from the ' utr (c t), ' utr (g a/t) and the intergenic region between orf and orf (g a) ( table ). these mutations may have implications for viral rna folding.
group i and ii coronaviruses harbour three sub-structural hairpins sl a, sl b & sl c with conserved hexameric "uuycgu" loop motifs, which when present in multiple copies are presumed to act as packaging signals for viral encapsidation ( ) . similar structures have been predicted in the 'utr of sars-cov- ( , ) . the c t mutation changes the c-residue in the hexameric loop motif of sars-cov- sl b (figure a & b) . this is one of the earliest and the most prevalent ( . %) of the sars-cov- mutations and may impact viral packaging and titres. interestingly, g in the ' utr is the only position among the mutations which shows two different variants, g t and g a (table ). this residue is part of the stem-loop ii-like motif (s m) highly conserved in sars-like coronaviruses ( figure c ). sars-cov s m sequence assumes a unique rna fold similar to that seen in ribosomal rna and is thought to be of regulatory function ( ) . the g occupies a conserved position critical for the unique kinked structure and the substitutions at this position are expected to alter the s m stem-loop ( figure d ). recent studies have also reported other low frequency mutations in this region ( ) . another structural element relevant for coronavirus replication is the hairpin region designated as pk ( ' utr pseudoknot) ( ) . while the pk stem-loops ( - nt in sars-cov- ) are present in the ' utr of other coronaviruses, sars-cov- harbors the predicted open-reading frame orf overlapping with this region. the c t mutation falls within the pseudoknot sequence and comparison of predicted structures for the wild-type and c t mutant pk display strong changes in the hairpin structure and orientation ( figure e ). it is interesting to note that out of sequences harbouring this mutation have been reported from japan in the month of february . substitutions in the coding region may also have implications in viral rna folding and structure ( ) . 
a recent study identified conserved and highly structured rna elements in the sars-cov- genome ( ) . we mapped eighteen of our substitutions to these highly structured regions (table and supplementary table s ). we also aligned the sars-cov- genome with the genomes of closely related coronaviruses and analysed the positions homologous to these nucleotide substitution sites (supplementary table s ). interestingly, substitutions resulted in nucleotide changes identical to those in one or more of the related coronavirus genomes. transitions are more common than transversions across mammalian genomes as well as viruses ( ) ( ) ( ) . analysis of substitutions in influenza a virus and hiv- suggest that transversions are more deleterious than transitions ( ) . hence, it is not surprising that of the substitutions we report here for sars-cov- are transitions. substitutions at one of the nucleotide positions include both transition (g a) and transversion (g t). thus the transitional mutation bias in sars-cov- is in keeping with that reported for other viruses ( ) . c to t substitutions account for more than % (n= ) followed by g to a substitution which account for about % (n= ) of the substitutions ( figure a & b) . the predominance of c to t (u) substitutions in sars-cov- has been documented recently ( , ) . a similar trend has also been reported for sars-cov ( ) . amongst host rna editing enzymes, apobecs (apolipoprotein b mrna editing complex) and adars (adenosine deaminases acting on rna) have been studied for their ability to edit virus genomes ( , ) . both negative-strand intermediates and genomic rna (positive sense) may be available as ssrna inside coronavirus infected host cells. the negative-strand intermediates serve as templates for the synthesis of the positive strand (genomic rna) as well as the sub-genomic transcripts. in this process the negative-strand intermediates will progressively lose the ssrna conformation and gain dsrna conformation. 
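the transition/transversion split and the tally of substitution types reported above can be reproduced with a few lines of code. the sketch below is illustrative only: the function names are hypothetical, bases are assumed to be uppercase dna letters (so u is written as t), and the input list in the usage note is made up rather than taken from the study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-nucleotide substitution as a transition or transversion."""
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def tally_substitutions(substitutions):
    """Count substitution types such as 'C>T' and their transition/transversion class."""
    by_type = Counter(f"{ref}>{alt}" for ref, alt in substitutions)
    by_class = Counter(substitution_class(ref, alt) for ref, alt in substitutions)
    return by_type, by_class
```

for a made-up input such as `[("C", "T"), ("C", "T"), ("G", "A"), ("G", "T")]`, the tally reports two c>t changes, one g>a change and one g>t change, that is three transitions and one transversion.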
in essence, both ssrna and dsrna forms will be available for rna editing by host enzymes. this is important considering that rna-editing apobecs specifically act on ssrna to induce c>u transitions and adars target dsrna resulting in a>i (inosine; read as g) editing. apobec-mediated editing of the genomic rna (positive strand) will result in c to u changes, while the editing of the negative-strand intermediates (c to u) will result in g to a changes in the genomic rna (positive strand). it is well documented that positive-sense rna is more abundant than negative-strand rna in infected host cells ( ) . our finding of predominance of c to t (c to u) mutations ( . %) over g to a mutations ( . %) is consistent with the abundance of genomic rna (positive sense) over the negative-strand intermediates. similarly, adar-mediated editing of the genomic rna (positive sense) will result in a to g changes and the editing of the negative-strand (a to g) will result in u to c changes in the genomic rna (positive sense). our findings indicate that the substitutions which are consistent with adar editing (a to g and u to c) constitute about . % ( of substitutions) of those observed in sars-cov- genomes analysed. the predominance of apobec-like rna editing (i.e. c to u and g to a) over adar-like rna editing (i.e. a to g and u to c) in sars-cov- may be explained at least in part by the transient nature of virus dsrna available inside infected host cells. editing of sars-cov- rna by apobecs and adars has been recently reported ( ) . in addition, apobec -mediated virus restriction has been documented for the human coronavirus hcov-nl ( ) . we also analysed the dinucleotide context of the mutations described. for this purpose, we considered nucleotides flanking the substitutions described in table . for example, a c>t mutation in the trinucleotide sequence `-acg- ` will result in the `-atg- ` trinucleotide. 
in this context an ac and a cg dinucleotide are lost and an at and a tg dinucleotide are gained. in other words, each substitution will lead to a loss of dinucleotides and a gain of dinucleotides. we analysed the substitutions (g a/t was excluded) to assess the net loss or gain for each of the dinucleotides. interestingly, the substitutions were associated with a loss of gg dinucleotides and a gain of tt dinucleotides ( figure c ). the dinucleotides lost or gained are influenced not only by the nature of the substitutions (e.g. c to t or g to a) but also by the nucleotides flanking the substitution site. furthermore, dinucleotides provide important clues about virus pathogenesis ( , ) . in particular, cpg dinucleotides have been associated with virus replication, immune response and virus pathogenesis ( , ) . a recent paper suggests that sars-cov- is extremely cpg depleted to avoid host defences ( ) . virus evolution has also been linked to variations in other dinucleotides including tpa and gpt ( , ) . previous reports suggest that coronaviruses have an excess of tt dinucleotides and are depleted for gg dinucleotides ( ) ; this is consistent with major dinucleotide gain/loss in sars-cov- genomes ( figure c ). the biological implications of these findings remain to be determined. signatures of apobec editing (c to t and g to a) dominate the substitutions observed in the sars-cov- genomes ( figure a) . a recent study on rna editing of sars-cov- genome shows evidence for apobec editing ( ) . nonetheless, the specific motifs for apobec-mediated editing of virus genomes remain poorly understood. the presence of c to t mutations and g to a mutations among the mutations we report gave us an opportunity to analyse preferences, if any, for specific dinucleotide and trinucleotide motifs flanking these substitutions. 
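the dinucleotide bookkeeping described above (each substitution removes the two dinucleotides that overlap the reference base and creates the two that overlap the new base) can be sketched as follows. this is an illustration under our own assumptions, not the authors' script; the function names and the tuple layout `(upstream, ref, alt, downstream)` are hypothetical.

```python
from collections import Counter


def dinucleotide_changes(upstream, ref, alt, downstream):
    """Return (lost, gained) dinucleotides for one substitution.

    upstream/downstream are the single bases flanking the substituted base.
    """
    lost = [upstream + ref, ref + downstream]
    gained = [upstream + alt, alt + downstream]
    return lost, gained


def net_dinucleotide_change(substitutions):
    """Sum gains (+1) and losses (-1) per dinucleotide over many substitutions.

    substitutions: iterable of (upstream, ref, alt, downstream) tuples.
    Dinucleotides whose gains and losses cancel out are dropped.
    """
    net = Counter()
    for upstream, ref, alt, downstream in substitutions:
        lost, gained = dinucleotide_changes(upstream, ref, alt, downstream)
        net.update(gained)
        net.subtract(lost)
    return {dinuc: n for dinuc, n in net.items() if n != 0}
```

for the c>t example in an acg context, `dinucleotide_changes("A", "C", "T", "G")` reports ac and cg as lost and at and tg as gained, as in the text.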
while we found no such preferential motifs flanking the c to t or g to a substitution sites, we found interesting upstream trinucleotide motifs when all possible c (c>t, c>a and c>g) & g (g>a, g>t and g>c) substitution sites were considered together: we found ttt trinucleotides upstream of . % of all substitutions ( out of ) occurring at c or g residues (c/gs) ( figure d ). we did not find any specific trinucleotide preference downstream of these sites (data not shown). there were no specific di- or trinucleotide preferences upstream of substitutions occurring at a/ts ( figure e ). intrigued by the preference for the ttt trinucleotide upstream of substitutions occurring at c/gs, we analysed the statistical significance of this finding. briefly, we generated random numbers corresponding to nucleotide positions in the sars-cov- genome and retained the positions carrying a c/g. we then analysed the frequency of ttt trinucleotides upstream of these positions. the ttt trinucleotide frequency immediately upstream of substitutions originating at c/gs is significantly higher than that upstream of randomly generated nucleotide positions ( . % vs . %; p< . ; figure f ). this finding indicates that ttt trinucleotides followed by c/g (i.e. tttc or tttg) in the sars-cov- genome represent hotspots for substitutions. to the best of our knowledge, tetranucleotide motifs predisposed to higher substitution rates have not been reported for sars-cov- . furthermore, ttt trinucleotides were not detected upstream of any substitution occurring at as or ts (n= ) ( figure e ). while tttg/tttc have not been reported as hotspots among viruses, tttg has been reported as a hotspot for g to t substitutions in yeast deficient in nucleotide excision repair ( ) . ttc has been identified as a hotspot for apobec editing among gamma herpesviruses, which are dsdna viruses ( ) . the specific mechanism associated with increased substitutions at tttg/tttc motifs in the sars-cov- genome is unclear. 
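the comparison of ttt frequency upstream of substitution sites against randomly chosen c/g positions can be sketched as below. this is a simplified illustration, not the authors' procedure: the function names are hypothetical, positions are 0-based indices into an upper-case genome string, random c/g positions are sampled directly (the paper drew random positions and then kept those with c/g), and the chi-squared statistic is the plain pearson form for a 2x2 table without continuity correction.

```python
import random


def has_ttt_upstream(genome: str, pos: int) -> bool:
    """True if the three bases immediately 5' of 0-based index pos are TTT."""
    return pos >= 3 and genome[pos - 3:pos] == "TTT"


def random_cg_positions(genome: str, n: int, seed: int = 0) -> list:
    """Sample up to n distinct positions that carry a C or a G."""
    rng = random.Random(seed)
    cg_positions = [i for i, base in enumerate(genome) if base in ("C", "G")]
    return rng.sample(cg_positions, min(n, len(cg_positions)))


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Rows: substitution sites vs. random sites.
    Columns: TTT upstream vs. no TTT upstream.
    With 1 degree of freedom, a value above about 3.84 corresponds to p < 0.05.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator
```

to use it, one would count how many observed c/g substitution sites and how many sampled c/g positions satisfy `has_ttt_upstream`, and pass the four counts to `chi_squared_2x2`.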
as negative-strand intermediates of sars-cov- serve as the template for the synthesis of positive-sense genomic rna, we speculate that the virus rdrp (rna-dependent rna polymerase, nsp ) may be more error-prone at c/gs following a homopolymeric stretch of ts (i.e. ttt). we cannot rule out a role for other proteins involved in genomic rna replication including the nsp helicase and the nsp exonuclease in this process. nonetheless, the identification of tttg and tttc as hotspots for substitutions opens up a plethora of opportunities for research on identification of the underlying virus/host factors. the longest homopolymeric stretch in the entire sars-cov- genome is an octamer of ts (i.e. tttttttt) and the g at the end of this octamer (i.e. ttttttttg) is one of the substitution sites (g t; present in . % of sequences analysed) (refer to table ). more interestingly, a homopolymeric stretch of gggg (nt. to in the refseq) is associated with a triple mutant (i.e. gggg to aacg; present in . % of sequences analysed). all the three gs are either mutated together or they remain wild-type in the genomes analysed. in addition, among the substitutions we report, only the g a and the g a represent mutations within a single codon (r k in the nucleocapsid protein) in sars-cov- . multiple substitutions within a single codon represent positive selection of an amino acid ( ) . further, positively selected amino acids are frequently present in virus proteins involved in critical functions such as receptor binding ( ) . several studies have examined real-time sars-cov- evolution and strain diversification by using phylogenetic analyses ( ) ( ) ( ) . in contrast to this approach, we utilized our catalogued set of single nucleotide substitutions to understand the emergence of sars-cov- variants. a clustering analysis was performed to systematically group all full-length genomes into subtypes which are defined by the set of mutations they harbor. 
after removing groups which consisted of < genomes (approximately % of genomes analysed), the analysis revealed distinct clusters ( figure a ). this analysis identified very interesting geographical distribution of the clusters across the globe ( figure b ). for example, cluster- (c t, t c) primarily consisted of sequences from asia (as). interestingly, cluster- consisting of three additional mutations (cluster- mutations + c t, a g and c t) accounted for about % of the genomes analysed and was almost exclusively reported from north america (na). a similar pattern was observed for cluster , which was also restricted to na. the cluster consisting of sequences with mutations (c t, c t, c t, a g, c t, g a, g a, g c) was almost exclusively found in europe (eu). similarly, cluster with a single mutation (t c) was also predominantly reported from europe. our clustering analysis revealed several mutations which almost always co-occur. we define co-occurring mutations as those which occur together in > % of their individual occurrences. for example, if mutation a occurs times and mutation b occurs times in the dataset and mutations a and b are present together in the same genomes (i.e. co-occur) times, they are defined as co-occurring mutations as / is > %. accumulation of mutations is key to the diversification of viral lineages. in addition, the co-occurrence of mutations in virus genomes is often suggestive of compensatory mutations. individual mutations as well as co-occurring mutations can act as lineage defining mutations. we consider a specific mutation or a set of cooccurring mutations as "lineage-defining" for sars-cov- , only when they are present in at least % (n= ) of the sequences analysed. each lineage-defining mutation(s) are mutually exclusive and are not present along with another lineage-defining mutation. our analyses reveal five mutually exclusive lineages for sars-cov- . 
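the co-occurrence rule defined above can be sketched as a small function. this is one reading of the definition, not the authors' implementation: two mutations are called co-occurring when the number of genomes carrying both exceeds the threshold fraction of each mutation's own count. the threshold is left as a parameter, and the mutation labels `m1`, `m2`, `m3` and the value 0.8 in the usage example are illustrative placeholders, not values taken from the paper.

```python
def co_occurring(genomes, mut_a, mut_b, threshold):
    """Decide whether mut_a and mut_b co-occur.

    genomes:   iterable of sets, each holding the mutation labels of one genome
    threshold: fraction between 0 and 1; it must be exceeded for BOTH mutations
    """
    genomes = list(genomes)  # allow generators; we iterate several times
    n_a = sum(mut_a in g for g in genomes)
    n_b = sum(mut_b in g for g in genomes)
    n_both = sum(mut_a in g and mut_b in g for g in genomes)
    if n_a == 0 or n_b == 0:
        return False
    return n_both / n_a > threshold and n_both / n_b > threshold


# Illustrative data: m1 occurs 10 times, m2 occurs 9 times, always with m1.
example = [{"m1", "m2"}] * 9 + [{"m1"}] + [{"m3"}]
print(co_occurring(example, "m1", "m2", 0.8))  # 9/10 and 9/9 both exceed 0.8
```

requiring the threshold for both mutations keeps a rare mutation that merely sits on a common background from being paired with it.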
we refer to these five lineages as a , b , c , d and e in the chronological order of their appearance ( figure and supplementary figure s ). these five lineages account for about % of the sequences. the a lineage (co-occurrence of c t, t c) appeared as early as th january in asia. the presence of three co-occurring mutations (g a, t c, g t) defines our b lineage. this lineage was also first observed in samples collected from asia on th january . a single lineage-defining mutation, g t, first reported from north america on the nd january, represents our c lineage. the co-occurrence of c t, c t, c t, a g defines our lineage d that was first detected on th february in italy, europe. the e lineage appeared later in the netherlands (europe) and is defined by the presence of the t c mutation ( figure ). the mutually exclusive lineages which we report here account for three quarters of all sequences and may facilitate the development of simple alternatives to whole genome sequencing for epidemiological typing. of note, allelic discrimination assays for co-occurring single nucleotide variations have been successfully used for genotyping of sars-cov in the past ( , ) . leading and trailing mutations have been previously described in virus genomes ( , ) . we define leading mutations as mutations that are documented in sars-cov- at an earlier time point and whose presence is a pre-requisite for the origination of trailing mutations. in other words, trailing mutations do not occur in the absence of leading mutations. the lineage-defining mutations a through e represent leading mutations in the evolution of sars-cov- . the chronology of the first appearance of all the substitutions in the sars-cov- genome is summarised in supplementary table s . the a lineage accumulates a trailing mutation set giving rise to a (a + c t, a g, c t) variants of sars-cov- . the a sub-lineage first appeared on th february in the usa and accounts for about % of the genomes. 
even though the a lineage originated in asia, the a sub-lineage (that is defined by additional trailing mutations) appeared weeks later in north america and remained restricted to this geographic region. (supplementary figure s ) . thus these three a -defining trailing mutations can be used to discriminate between asian and north american strains in the a lineage. considering the fact that the a lineage accounts for > % of the sequences analysed here, these observations may have potential implications in investigating possible intercontinental variations in infectivity and disease outcomes. the most diverse pattern of trailing mutations has appeared in the d lineage, which incidentally is among the most recent of the lineage-defining mutations of sars-cov- . moreover, the lineage-defining mutations in the d variant include missense mutations in the nsp /rdrp (c t, p>l) and the s gene (a g, d>g), in addition to the sl b loop mutation (c t, figure s table ). within the d lineage, we could observe interesting bifurcation based on geographical distribution with d and d prevalent in europe and d and d in europe and north america. e (t c, a synonymous mutation in the leader peptide/nsp ), the most recent of the lineage-defining mutations of sars-cov- is predominantly seen in european strains (supplementary figure s ) . the constantly updating list of sars-cov- sequences provides a unique opportunity for us to understand the evolution of this virus in humans in the course of the global outbreak. to understand the dynamics of different mutations over the evolution of the virus, we compared full-length genome datasets from gisaid with submission dates until st january (n= ), th february (n= ) and th march (n= ). the mutations which show significant change in their frequency over the first three months of sars-cov- evolution are shown in figure . the data is filtered for mutations which appeared in > % of the sequences in the first (n= ) or complete (n= ) dataset. 
distinct patterns emerge for different mutations spread over the three datasets ( figure ). interestingly, the lineage-defining a mutations appear very early and, despite a slight reduction in the second dataset, consistently account for about one third of the sequences in each of the datasets. this would mean that the a lineage is consistently contributing to about a third of all newly reported sequences. the d lineage profile follows a logarithmic rise between the nd and rd datasets. the first case of this leading mutation appeared on the th of february and by the th of march it was among the most frequent mutations ( out of ; > %). consistent with the appearance of the trailing mutations later in the timeline, trailing mutations for a (denoted as a '), d (d '), d (d '), d (d ') and d (d ') sharply increase in numbers between the second and third dataset. c t (nsp /rna polymerase synonymous) is another mutation which also shows an upward trend, reaching . % in the complete dataset ( figure a ). the g t variant (nsp l f mutation) displays a profile similar to a between january and february, but shows a downward trend in march. however, this mutation is present in a significant proportion of sequences in the final dataset ( . %) ( figure b) . the c leading mutation shows a marginal downward trend in frequency over time. interestingly, we also identified c t substitution which appear early in the timeline (jan ) in > % of the genomes, but their frequencies rapidly decline to < % in the final dataset ( figure b ). recent studies have shed light on a possible role for one of the d lineage-defining mutations (a g, spike-glycoprotein d g mutation) in the infectivity and fitness of sars-cov- ( ) . the d g mutation has been linked to higher virus loads in patients, which may explain the rapid spread of this variant. we have identified trailing mutations d through d associated with sub-lineages of the d g variant (part of the d lineage). 
these findings provide the necessary groundwork for a plethora of studies with potential implications for diagnosis, pathogenesis and vaccine design. we focused our analyses on single nucleotide substitutions and identified substitutions with > % frequency in the sars-cov- genomes. our analyses show that the nucleotide changes in the identified substitution sites are non-random (predominant c to t/u and g to a) and are partially determined by the upstream sequence motifs. a significant proportion of analysed substitutions are consistent with apobec/adar editing. clustering analysis revealed unique geographic distributions of sars-cov- variants defined by their mutation profile. interestingly, we observed several co-occurring mutations that almost never occur individually. our analyses reveal mutually exclusive lineages of sars-cov- which account for about three quarters of the genomes analysed. lineage-defining leading mutations in the sars-cov- genome precede the occurrence of trailing mutations that further define sub-lineages. the lineage-defining leading mutations together with the subsequently acquired trailing set of mutations provide a novel perspective on the temporal evolution of sars-cov- . the genomic sequences of sars-cov- from clinical samples were downloaded from the gisaid database (https://www.gisaid.org/) ( ) . as of th march , gisaid contained full-length sars-cov- genomic sequences (> nt) from human hosts. the sequences were downloaded at different time periods in the first three months of and fall into three serial datasets with submission dates until / / (n= ), / / (n= ) and / / (n= ). the accession ids of the genome sequences are provided as supplementary table s . the alignment and refinement of the sequences with the sars-cov- reference genome were performed by using muscle multiple sequence alignment software ( ) . the genomic sequences for the bat coronavirus ratg (accession id: mn . 
), pangolin coronavirus (accession id: mt . ) and the sars-cov (accession id: nc_ . ) were aligned with the sars-cov- reference genome to compare the substitution sites with homologous nucleotide positions in these closely related coronavirus genomes. only alignment positions which harbor defined a/t/g/c residues (rather than gaps or `n`) in % of the aligned genomes were considered for identifying nucleotide substitutions. these filtered nucleotide positions were analysed and scored for a/t/g/c occupancy in comparison to the sars-cov- reference genome (accession: nc_ . ). we then calculated the percentage of sequences with single-nucleotide variations for each nucleotide position in the sars-cov- genome. substitutions with > % frequency in the genome dataset were included for further analysis. the nucleotide changes at the positions were analysed and tabulated. similar analyses were done to calculate the gain and loss in frequencies of specific dinucleotide sequences at the substitution sites. we also analysed the trinucleotide context upstream and downstream of the substitution sites (- to + genomic positions). to understand the probability of finding a ttt trinucleotide upstream of a random c/g position, we first mapped random positions (based on random numbers generated in ms excel) on the sars-cov- genome and identified g/c residues. we then calculated the ttt trinucleotide frequency upstream of these sites and compared it to the actual frequencies at the substitution sites using a chi-squared test. clustering was performed on the sars-cov- sequences with a custom script written in python and the data were visualized using the seaborn statistical visualization tool (https://seaborn.pydata.org/). the substitutions ( substitutions + g t and g a) were used for the generation of mutation groups which co-occurred in many genomes. there were also sequences in which none of the selected mutations were present and hence formed the 'no-mutation group'. 
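the grouping of genomes by the set of selected mutations they carry can be sketched as below. the authors' custom python script is not shown in the paper, so this is only an assumed outline: the function name, the input layout (a dict from genome id to a set of mutation labels) and the `min_size` cut-off parameter are all hypothetical.

```python
from collections import defaultdict


def cluster_by_profile(genome_mutations, selected, min_size):
    """Group genomes that carry exactly the same subset of selected mutations.

    genome_mutations: dict mapping genome id -> set of mutation labels
    selected:         the catalogued substitutions used for clustering
    min_size:         clusters with fewer genomes than this are dropped
    """
    selected = frozenset(selected)
    clusters = defaultdict(list)
    for genome_id, mutations in genome_mutations.items():
        # Ignore mutations outside the catalogue; an empty profile
        # corresponds to the 'no-mutation group'.
        profile = frozenset(mutations) & selected
        clusters[profile].append(genome_id)
    return {p: ids for p, ids in clusters.items() if len(ids) >= min_size}
```

each key of the returned dict is a mutation profile and each value lists the genomes sharing it, which is the shape needed for a heat map of clusters against mutations.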
in the second step, sequences harboring the same mutation groups were clustered together. clusters with fewer than genome sequences were excluded to obtain a list of clusters represented as a heat map. the clusters were correlated with their geographical location (obtained from gisaid) and represented. the datasets and clusters were analysed to identify co-occurring, lineage-defining, leading and trailing mutations. we classify mutations into the following categories: (i) co-occurring mutations: we define two (or more) mutations as co-occurring when they occur together in > % of their individual occurrences. (ii) lineage-defining mutations include both co-occurring mutations and singlet mutations that are present in ~ % (n= ) of sequences analysed. lineage-defining mutations are mutually exclusive. to account for sequencing errors, a one sequence tolerance was allowed while deciding mutual exclusivity. rna sequence alignment and structure analysis rna sequences were aligned using multalin software (http://multalin.toulouse.inra.fr/multalin) ( ) . the stem-loop structures were predicted and visualized using the ipknot ( ) (http://rtips.dna.bio.keio.ac.jp/ipknot/) and forna ( ) (http://rna.tbi.univie.ac.at/forna/) web servers respectively. the pdb structure for s m stem-loop motif from sars-cov was downloaded from the protein data bank (http://www.rcsb.org/pdb) (pdb id: xjr) and was visualized using ucsf chimera software (version . ) ( ) . of biological sciences, indian institute of technology delhi for infrastructural support. the authors thank iit delhi hpc facility for computational resources. the trailing mutations of a , d , d , d , d are denoted as a ', d ', d ', d ', d ' respectively and these do not include their leading mutations. for example, a ' consists of a defining mutations alone and does not include its leading mutation set a . a ', d ', d ', d ', d ' as well as the c t mutations are increasing in frequency over time. b. 
c t and the c lineage mutation g t declined with time. the g t substitution shows an initial increase followed by a decrease, but retain significant presence in all three datasets. table . summary of single nucleotide substitutions with > % frequency in sars-cov- genomes *g a and g a occur within the same codon and they always co-occur. amino acid annotation (r > k) is based on the co-occurrence of these mutations. **the mutations were mapped to the 'conservedstructured regions" as described by ( ) figure s . the five lineage-defining mutations a -e are mutually exclusive. a checker board analysis showing the non-overlapping nature of the lineage-defining mutations a through e . *the single exception was accession number epi_isl_ which had both d and e lineage defining mutations and was hence removed from both d and e lineages. genome composition and divergence of the novel coronavirus ( -ncov) originating in china viral mutation rates moderate mutation rate in the sars coronavirus genome and its implications discovery of an rna virus '-> ' exoribonuclease that is critically involved in coronavirus rna synthesis variant analysis of covid- genomes. world health organ preprint phylogenetic network analysis of sars-cov- genomes faster de novo mutation of sars-cov- in shipboardquarantine crystal structure of sars-cov- main protease provides a basis for design of improved alpha-ketoamide inhibitors structure of m(pro) from covid- virus and discovery of its inhibitors group-specific structural features of the '-proximal sequences of coronavirus genomic rnas genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan . 
an in silico map of the sars-cov- rna structurome the structure of a rigorously conserved rna element within the sars virus genome a phylogenetically conserved hairpin-type ' untranslated region pseudoknot functions in coronavirus rna replication an in silico map of the sars-cov- rna structurome rna genome conservation and secondary structure in sars-cov- and sars-related viruses: a first look patterns of transitional mutation biases within and among mammalian genomes patterns of nucleotide substitution in drosophila and mammalian genomes declining transition/transversion ratios through time reveal limitations to the accuracy of nucleotide substitution models evidence for the selective basis of transition-to-transversion substitution bias in two rna viruses the establishment of reference sequence for sars-cov- and variation analysis sars-cov genome polymorphism: a bioinformatics study rna editing by the host adar system affects the molecular evolution of the zika virus epstein-barr virus borf inhibits cellular apobec b to preserve viral genome integrity minus-strand copies of replicating coronavirus mrnas contain antileaders evidence for rna editing in the transcriptome of novel coronavirus apobec -mediated restriction of rna virus replication the cpg dinucleotide content of the hiv- envelope gene may predict disease progression identification of tell-tale patterns in the ' non-coding region of hantaviruses that distinguish hcpscausing hantaviruses from hfrs-causing hantaviruses attenuation of dengue (and other rna viruses) with codon pair recoding can be explained by increased cpg/upa dinucleotide frequencies cpg and upa dinucleotides in both coding and non-coding regions of echovirus inhibit replication initiation postentry extreme genomic cpg deficiency in sars-cov- and evasion of host antiviral defense cpg dinucleotide frequencies reveal the role of host methylation capabilities in parvovirus evolution modelling mutational and selection pressures on 
dinucleotides in eukaryotic phyla -selection against cpg and upa in cytoplasmically expressed rna and in rna viruses cpg usage in rna viruses: data and hypotheses the effect of sequence context on spontaneous polzeta-dependent mutagenesis in saccharomyces cerevisiae evolutionary effects of the aid/apobec family of mutagenic enzymes on human gamma-herpesviruses bursts of nonsynonymous substitutions in hiv- evolution reveal instances of positive selection at conservative protein sites positive selection on the h hemagglutinin gene of human influenza virus a a simple and rapid approach for screening of sars-coronavirus genotypes: an evaluation study coordinated evolution of influenza a surface proteins prevalence of epistasis in the evolution of influenza a surface proteins tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus data, disease and diplomacy: gisaid's innovative contribution to global health muscle: a multiple sequence alignment method with reduced time and space complexity multiple sequence alignment with hierarchical clustering rtips: fast and accurate tools for rna d structure prediction using integer programming forna (force-directed rna): simple and effective online rna secondary structure diagrams ucsf chimera--a visualization system for exploratory research and analysis we gratefully acknowledge the authors from the originating laboratories responsible for obtaining the specimens and the submitting laboratories where genetic sequence data were generated and shared via the gisaid initiative, on which this research is based (supplementary table ). we acknowledge kusuma school table s . accession ids of all sars-cov- sequences downloaded from gisaid including acknowledgement. gisaid accession ids of all sars-cov- genome sequences used in the study with the acknowledgements and information on the participating labs. key: cord- -v syspu authors: long, s. 
wesley; olsen, randall j.; christensen, paul a.; bernard, david w.; davis, james r.; shukla, maulik; nguyen, marcus; ojeda saavedra, matthew; cantu, concepcion c.; yerramilli, prasanti; pruitt, layne; subedi, sishir; hendrickson, heather; eskandari, ghazaleh; kumaraswami, muthiah; mclellan, jason s.; musser, james m. title: molecular architecture of early dissemination and evolution of the sars-cov- virus in metropolitan houston, texas date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v syspu we sequenced the genomes of sars-cov- strains from covid- patients in metropolitan houston, texas, an ethnically diverse region with seven million residents. these genomes were from the viruses causing infections in the earliest recognized phase of the pandemic affecting houston. substantial viral genomic diversity was identified, which we interpret to mean that the virus was introduced into houston many times independently by individuals who had traveled from different parts of the country and the world. the majority of viruses are apparent progeny of strains derived from europe and asia. we found no significant evidence of more virulent viral types, stressing the linkage between severe disease, underlying medical conditions, and perhaps host genetics. we discovered a signal of selection acting on the spike protein, the primary target of massive vaccine efforts worldwide. the data provide a critical resource for assessing virus evolution, the origin of new outbreaks, and the effect of host immune response. significance covid- , the disease caused by the sars-cov- virus, is a global pandemic. to better understand the first phase of virus spread in metropolitan houston, texas, we sequenced the genomes of sars-cov- strains recovered from covid- patients early in the houston viral arc. 
we identified no evidence that a particular strain or its progeny causes more severe disease, underscoring the connection between severe disease, underlying health conditions, and host genetics. some amino acid replacements in the spike protein suggest positive immune selection is at work in shaping variation in this protein. our analysis traces the early molecular architecture of sars-cov- in houston, and will help us to understand the origin and trajectory of future infection spikes. [keywords] sars-cov- , covid- , genome sequencing, remdesivir, molecular evolution 
[introduction] pandemic disease caused by the severe acute respiratory syndrome coronavirus (sars-cov- ) virus is now responsible for massive human morbidity and mortality worldwide ( - ). the virus was first documented to cause severe respiratory infections in wuhan, china, beginning in late december ( ) ( ) ( ) . global dissemination occurred extremely rapidly, and has affected major population centers on many continents, especially in asia, europe, and north america ( ) ( ) ( ) . in the united states, seattle and the new york city region have been especially important centers of covid- disease caused by sars-cov- . similarly, in seattle and king county, , positive patients and deaths have been reported as of april , ( ) . the houston metropolitan area is the fourth largest and the most ethnically diverse city in the united states, with a population of approximately million ( ) . the , -bed houston methodist health system has eight hospitals and serves a large multinational and socioeconomically diverse patient population throughout greater houston. although the city is well-known as the energy capital of the world, houston also has a very large port, and the george bush international airport is a major transportation hub for domestic and international flights to asia, europe, and central and south america. many of these flights provide direct city-to-city service to diverse countries, including global population centers. 
the first covid- case in metropolitan houston was reported on march , ( ) . community spread was suspected of occurring one week later ( ). many of the first cases in our region were associated with national or international travel, including areas known to have covid- virus outbreaks ( ). these facts, coupled with the existence of a central molecular diagnostic laboratory that serves all houston methodist hospitals and our very early adoption of a molecular test for the sars-cov- virus, permitted us to rapidly interrogate genomic variation among strains causing infections in the greater houston area. we here report that sars-cov- was introduced to the houston metropolitan region many independent times from diverse geographic regions, including europe, asia, and south america. the virus spread rapidly and caused disease throughout the metropolitan region. we identified clear genomic signals of person-to-person transmission. some events were known or inferred based on conventional public health maneuvers, but many were not. in addition, spatial-temporal mapping found evidence of rapid and widespread community dissemination soon after covid- cases were reported in houston. analysis of the relationship between distinct genomic viral clades and hospitalization did not reveal significant evidence of more virulent genome types, underscoring the need for a better understanding of the relationship between severe disease, underlying medical conditions, gender, and host genetics. description of metropolitan houston. houston, texas, is located in the southwestern united states, miles inland from the gulf of mexico. it is the most ethnically diverse city in the united states ( viral genome sequencing and phylogenetic analysis. we sequenced the genomes of sars-cov- strains dating to the earliest stages of confirmed covid- disease in houston. 
phylogenetic analysis identified the presence of many diverse viral genomes that in the aggregate represent many of the major clades identified to date ( ) (fig. ) . clades a a, b, and b were the three most abundantly represented phylogenetic groups (fig. ). geospatial and time series. we examined the spatial and temporal mapping of the genome data to investigate community spread (fig. ) . the figure illustrates evidence of widespread and rapid community dissemination soon after the initial covid- cases were reported in houston. there is also evidence to suggest there were multiple independent strains introduced into metropolitan houston, followed by local spread throughout all regions of the community. epidemiologically linked patients. we investigated the relationships among some of the genomes that were obtained from patients known to share common epidemiologic associations, such as living in the same household. in all instances, individuals known to be epidemiologically associated had identical or nearly-identical sars-cov- genomes, a finding consistent with person-to-person transmission of the virus. geospatial relationships of genetically similar isolates. we next tested the hypothesis that genetically related viruses were constrained to particular geographic areas of metropolitan houston. although in some instances this was the case, an important observation was that most of the individual related subclades were comprised of strains distributed over broad geographic areas (fig. ) . these findings are consistent with the known propensity of the respiratory virus sars-cov- to spread rapidly from person to person. patients with shared viral genomes likely constitute epidemiologic clusters, as a consequence of direct transmission to one another, shared transmission through an unknown third party, or via a reticulate network. patients, and other metadata. 
it is possible that sars-cov- genetic subtypes may have different clinical characteristics, analogous to what was thought to have occurred with the ebola virus ( ) ( ) ( ) and is known for other infectious agents. as an initial examination of this issue in sars-cov- , we tested the hypothesis that patients with disease severe enough to warrant hospitalization were infected with a non-random subset of virus genotypes. we also tested the hypothesis of non-random association between virus clades and disease severity based on in-patient versus out-patient status, the need for mechanical ventilation, and the number of days on a ventilator. we found no apparent simple relationship between viral clades and disease severity using these indicators of disease severity. similarly, there was no simple relationship between viral clades and other metadata, such as gender, age, or length of stay ( fig. s -s ). machine learning analysis. machine learning models were built to predict mortality, length of stay, in-patient status, or icu admission based on the viral genome sequence. f scores (the evaluation metric used in classification algorithms) for most classifiers ranged between . to . , indicative of classifiers that are performing similarly to random chance. outcome (lived versus died) was only weakly correlated with age in this data set (pcc = . ), and similarly regression models built to predict age and length of stay based on the viral genome sequence also had poor performance, with r values near zero. classifiers were also trained to predict host characteristic, gender and ethnic group. the two largest ethnic groups in the patient population were african american and caucasian, as recorded in the electronic medical record. the binary classification model had an f score of . [ . - . , % ci], likely reflecting social networks in early person-to-person transmission. a table of classifier accuracy scores and performance information is provided as table s . 
analysis of the nsp polymerase gene. the sars-cov- genome encodes an rna-dependent rna polymerase (rdrp) used to replicate this rna virus. two amino acid substitutions (phe leu and val leu) in the nsp rdrp have been reported to confer significant resistance in vitro to remdesivir, an adenosine analog ( ) . remdesivir is inserted into rna chains by rdrp during replication, resulting in premature termination and inhibition of virus. this compound has shown prophylactic and therapeutic benefit against mers-cov experimental infection in rhesus macaques ( ) . a recent publication describing results from a compassionate use protocol reported that remdesivir may have therapeutic benefit in some hospitalized patients ( ) . if efficacy is confirmed by a randomized controlled trial, this drug may be widely used in large numbers of patients worldwide. to acquire baseline data about allelic variation in the gene encoding nsp rdrp, we analyzed the sequenced viral genomes. the analysis identified nonsynonymous single nucleotide polymorphisms (snps) in nsp , resulting in amino acid replacements throughout the protein ( table ) . the most common amino acid change was pro leu, identified in of the houston isolates. this amino acid replacement is present in genomes from the a a clade, and distinguishes the a a clade from other sars-cov- clades. the other amino acid changes in nsp were mainly present in single isolates from individual houston strains, and some have been identified in other strains in the global gisaid collection ( , ) . a prominent exception was the identification of strains with a met ile polymorphism. all strains were phylogenetically very closely related members of the a a clade, and they all also had the p l amino acid replacement characteristic of this clade (fig. s ) . these data indicate that the met ile change is the derived state among strains with the pro leu replacement. 
importantly, none of the observed amino acid polymorphisms in nsp were located precisely at the two sites associated with remdesivir in vitro resistance ( ) , and of the polymorphic amino acids are located on the surface of this rdrp. however, the ala val replacement occurs in an amino acid that is located directly above the nucleotide substrate binding site that is comprised of lys , arg , and arg , as recently shown by structural studies ( , ) (fig. ) . the ala position is comparable to val (fig. ) , and a val leu mutation in sars-cov was identified to confer resistance to remdesivir ( ) . coronaviruses such as sars-cov and mers gain entry into susceptible host cells using a densely glycosylated viral-surface molecule known as spike (s) protein. the s protein of sars-cov- virus and its close coronavirus relatives binds directly to the host angiotensin-converting enzyme (ace ) to enter host cells. thus, s protein is a major translational research target, including small molecule inhibitors and extensive vaccine efforts globally ( , ) . analysis of the gene encoding the s protein identified snps, including that produce amino acid changes (table , fig. a ). seven of these replacements (a v, f l, s r, t i, m v, k m, and q h) are not represented in the gisaid database as of april , . f makes contact with the human ace receptor, occupying a pocket formed by ace residues thr , asp and lys ( fig. b ) ( ) . the f l substitution is expected to detrimentally affect binding to ace , as the shorter leu side chain would not fill the pocket well. we note that seven of the amino acid replacements map to the periphery of the s subunit n-terminal domain (ntd). four of these replacements (a s, t i, g v, and a v) can be mapped to the recently determined cryo-em structure of the sars-cov- spike ( ), whereas three other amino acid changes (l f, t i, and h y) are located in flexible ntd loops that could not be modeled (fig. a) . 
the clustering of amino acid replacements to a distinct region of the protein, together with the occurrence of two different amino acid replacements occurring at the same residue is a strong signal of positive selection. inasmuch as infected patients make antibodies against the ntd region, we favor the idea that host immune selection is a key force contributing to amino acid variation in this region, resulting in an enhanced fitness phenotype of virus variants. also of note, the d g replacement was observed in % ( of strains) of the strains sequenced in this study. residue d is located in subdomain (sd- ) of the s protein, and it forms a hydrogen bond and electrostatic interaction with two residues in the s subunit of a neighboring protomer (fig. c) . replacement of aspartate with glycine would eliminate both interactions, thereby substantively weakening the contact between the s and s subunits. we speculate that this weakening produces a more fusogenic spike protein, because s must first dissociate from s before the s subunit can refold and mediate fusion of viral and cell membranes. stated another way, virus strains with the d g variant may be better able to enter host cells, potentially resulting in enhanced spread. we discovered substantial genetic diversity of the sars-cov- viruses causing covid- disease in houston, texas. the majority of cases studied are progeny of strains that cause widespread disease in european and asian countries. we interpret these data as demonstrating that the sars-cov- virus was introduced into houston many times independently, likely by individuals who had traveled to different parts of the world. in support of this interpretation, the first cases in metropolitan houston were associated with a travel history to a known covid- region ( ). the data are consistent with the fact that houston is a large international city characterized by a multi-ethnic population. 
the diversity present in our viral genomes contrasts somewhat with data reported recently by gonzalez-reiche et al. ( ) , who studied sars-cov- isolates causing disease in patients in the new york city region. those investigators concluded that the vast majority of disease was caused by progeny of strains imported from europe. similarly, bedford et al. ( ) reported that much of the covid- disease in the seattle, washington area was caused by strains that are progeny of a virus strain recently introduced from china. the viral diversity present in metropolitan houston permitted us to test the hypothesis that distinct viral clades were nonrandomly associated with hospitalized covid- patients or disease severity. although our sample size is relatively small, we found little evidence to support this hypothesis. clearly, this important matter warrants further study with a larger sample size, and this analysis is currently underway. we used machine learning classifiers in an attempt to identify snps that contribute to increased infection severity or otherwise affect virus-host outcome. the models could not be trained to accurately predict these outcomes from the viral genome sequence data, which may be due to the small sample size and class imbalance. as such, no particular snps were identified that are predictive of disease severity or infection outcome. classifiers were also trained to search for predictors of host characteristics such as age by decade, gender, and ethnic group. we found that the african american versus caucasian populations had a predictive signal, and hence potentially significant snps, which may be borne out with increased sampling in these two groups. however, examination of the geographic distribution of the viral isolates classified using this model largely coincided with the demographic distribution of ethnic groups in the houston metropolitan region. 
as such, the underlying snps found by the model may reflect social networks present early in the spread of sars-cov- in the houston metropolitan area, rather than distinct viral or human genetic factors. remdesivir is a nucleoside analog that has been reported to have activity against mers-cov, a coronavirus related to sars-cov- ( ) . recently, grein et al. ( ) reported that this drug shows promise in treating covid- patients in a relatively small compassionate use protocol. because in vitro resistance of sars-cov to remdesivir has been reported to be caused by either of two amino acid replacements in rdrp (phe leu and val leu), we interrogated our data for polymorphisms in the nsp gene. although we identified different inferred amino acid replacements in the genomes analyzed, none of these were located at the two positions associated with resistance in vitro. these findings suggest that if remdesivir proves to be efficacious in covid- patients, and is deployed widely as a treatment, the majority of sars-cov- strains currently circulating should be susceptible to this drug. however, the ala val polymorphism we identified occurs at an amino acid site that intriguingly is located directly above the nucleotide substrate entry channel and nucleotide binding residues lys , arg , and arg ( , ) (fig. ) . one possibility is that substitution of the smaller alanine residue with the bulkier valine may impose structural constraints for the modified nucleotide analog to bind and thereby disfavor remdesivir binding. this in turn leads to reduced incorporation of remdesivir into the nascent rna, increased fidelity of rna synthesis, and thus drug resistance. a similar mechanism has been proposed for a v l change ( ) . we also identified one strain with a lys asn replacement in nsp . 
this substitution is located very close to a phe leu replacement reported to produce partial resistance to remdesivir in vitro in sars-cov patients from , although the amino acid positions are numbered differently in sars-cov and sars-cov- . structural studies have suggested that this amino acid is surface-exposed, and distant from known key functional elements. our observed lys asn change is also located in a conserved motif described as a finger domain of rdrp (fig. ) . one speculative possibility is that lys is involved in binding a yet unidentified cofactor such as nsp or nsp , an interaction that could modify nucleotide binding and/or fidelity at a distance. these data warrant additional study in larger patient cohorts, especially individuals treated with remdesivir. analysis of the gene encoding the spike protein identified many new inferred amino acid replacements not present in available databases. these data, coupled with structural information available for s protein, raise the possibility that many of the amino acid variants have functional consequences. for example, clustering of several amino acid changes to the ntd suggests varying residues in this region bestow a fitness phenotype. regardless, we are now beginning to acquire critical information about the location and extent of amino acid replacements occurring in the s protein in natural populations of sars-cov- . these data permitted generation of many biomedically relevant hypotheses now under study. our work represents analysis of the largest sample of sars-cov- genome sequences to date from patients in the southern united states. this investigation was facilitated by the fact that we had rapidly assessed the suitability and performance of the sars-cov- molecular diagnostic test in january , more than a month before the first covid- patient was diagnosed in houston. our large healthcare system has eight hospitals and many clinics located in geographically diverse areas of the city. 
these facilities serve patients of very diverse ethnicities and socioeconomic status, thus our data likely represent a broad overview of virus diversity causing infections throughout metropolitan houston. we acknowledge that not every "twig" of the sars-cov- evolutionary tree in houston is represented in these data because the samples studied were not comprehensive. the genomes reported here represent the first strains documented to cause covid- disease in the houston area. thus, they are an important resource that will underpin our ongoing and future study of sars-cov- molecular evolution and dissemination in houston. as of april , , there were , reported cases of covid- in metropolitan houston, with the number of cases growing daily. we are now sequencing virus genomes essentially in real time from infected patients, which will permit us to provide important data that can be exploited to facilitate an enhanced public health response to this pandemic. sars-cov- genome sequence analysis. consensus viral genome sequences from the houston isolates were generated using the artic ncov- bioinformatics pipeline. publicly available genomes and metadata were acquired through gisaid on april , ( ) . gisaid sequences containing greater than % n characters, and houston sequences with greater than % n characters were removed from consideration. identical gisaid sequences originating from the same geographic location with the same collection date were also removed from consideration to reduce redundancy. nucleotide sequence alignments for the combined houston and gisaid strains were generated using mafft version . b with default parameters ( ) . sequences were manually curated in jalview ( ) to trim the ends and to remove sequences containing spurious inserts. phylogenetic trees were generated using fasttree with the generalized time-reversible model for nucleotide sequences ( ) . trees were parsed, annotated, and visualized using the r packages treeio and ggtree ( ) ( ) ( ) . 
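The N-content filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the names `n_fraction`, `filter_by_n_content`, and `max_n_fraction` are hypothetical, and because the percentage cutoffs are not recoverable from the text here, the threshold is left as an argument rather than filled in.

```python
def n_fraction(seq: str) -> float:
    """Return the fraction of characters in seq that are the ambiguity code N."""
    if not seq:
        return 1.0  # assumption: an empty sequence is treated as fully ambiguous
    return seq.upper().count("N") / len(seq)


def filter_by_n_content(records: dict[str, str], max_n_fraction: float) -> dict[str, str]:
    """Keep sequences whose N fraction does not exceed max_n_fraction.

    records maps a sequence name to its nucleotide string. Sequences with
    a greater N fraction than the (hypothetical) threshold are removed.
    """
    return {
        name: seq
        for name, seq in records.items()
        if n_fraction(seq) <= max_n_fraction
    }
```

Separate thresholds for the public and local sequence sets could be applied by calling `filter_by_n_content` once per set with a different `max_n_fraction`.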
clc genomics workbench (qiagen) was used to generate the trees in the bi filled map visualization. the home address for patients whose isolates were sequenced was matched to a dictionary of addresses downloaded from openaddresses.io (https://openaddresses.io/) to obtain the latitude and longitude geocoding data. because addresses are transcribed and subject to manual error, the fuzzywuzzyr package was used to match the best address in the dictionary. all address matches were manually reviewed for accuracy. the latitude/longitude coordinates were plotted onto a map using the microsoft power bi desktop map visualization. to examine geographic relatedness among genetically similar isolates, the geospatial maps were filtered to four small independent clades. the home address of the patients was again visualized using the microsoft power bi desktop map visualization. time series. the geospatial data were filtered into three time intervals ( / / - / / , / / - / / , and / / - / / ) to illustrate the spread of confirmed sars-cov- positive patients we identified over time. machine learning analysis. machine learning models were trained to predict patient metadata categories including mortality, length of stay, inpatient versus outpatient status, icu admission, overall outcome, gender, age, and ethnicity from viral sequence data. models were trained with the consensus whole genome fasta files by dividing each viral genome into -mer oligonucleotide k-mers, which were used as features to train xgboost models ( ) as described previously ( , ) . models were assessed by computing f scores for classifiers and r scores for regression models. unless otherwise stated, data for the first folds of a -fold cross validation are shown. pdf?sfvrsn=fcf b_ . 
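The k-mer featurization step in the machine learning methods can be illustrated with a short sketch. It assumes overlapping k-mers counted per genome and arranged into a count matrix; the functions `kmer_counts` and `feature_matrix` are hypothetical names, the value of `k` is left as a parameter because the k-mer length is not recoverable from the text here, and the gradient-boosted model itself is not invoked.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_matrix(genomes: list[str], k: int) -> tuple[list[str], list[list[int]]]:
    """Build a genomes-by-k-mers count matrix.

    Returns the sorted k-mer vocabulary and one row of counts per genome,
    with 0 for k-mers absent from that genome.
    """
    counts = [kmer_counts(g, k) for g in genomes]
    vocab = sorted(set().union(*counts))
    rows = [[c[kmer] for kmer in vocab] for c in counts]
    return vocab, rows
```

Rows of this matrix, paired with a metadata label for each genome, would be the kind of input a tree-boosting classifier or regressor is trained on.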
world health organization coronavirus disease (covid- ) situation report substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov ) a novel coronavirus outbreak of global health concern another decade, another coronavirus clinical features of patients infected with novel coronavirus in wuhan a novel coronavirus from patients with pneumonia in china early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia a new coronavirus associated with human respiratory disease in china european centre for disease control and prevention (ecdc) covid- ) pandemic: increased transmission in the eu/eea and the uk -seventh update severe outcomes among patients with coronavirus disease (covid- ) -united states houston region grows more racially/ethnically diverse gisaid: global initiative on sharing all influenza data -from vision to reality ebola virus glycoprotein with increased infectivity dominated the - epidemic human adaptation of ebola virus during the west african outbreak functional characterization of adaptive mutations during the west african ebola virus outbreak coronavirus susceptibility to the antiviral remdesivir (gs- ) is mediated by the viral polymerase and the proofreading exoribonuclease prophylactic and therapeutic remdesivir (gs- ) treatment in the rhesus macaque model of mers-cov infection compassionate use of remdesivir for patients with severe covid- the establishment of reference sequence for sars-cov- and variation analysis remdesivir and sars-cov- : structural requirements at both nsp rdrp and nsp exonuclease active-sites structural basis for the inhibition of the rna-dependent rna polymerase from sars-cov- by remdesivir structure of the rna-dependent rna polymerase from covid- virus cryo-em structure of the -ncov spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein structural and functional basis of sars-cov- entry by using human 
ace introductions and early spread of sars-cov- in the new york city area cryptic transmission of sars-cov- in washington state mafft multiple sequence alignment software version : improvements in performance and usability jalview version --a multiple sequence alignment editor and analysis workbench fasttree --approximately maximum-likelihood trees for large alignments treeio: an r package for phylogenetic tree input and output with richly annotated and associated data ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data two methods for mapping and visualizing associated data on phylogeny using ggtree xgboost: a scalable tree boosting system developing an in silico minimum inhibitory concentration panel test for klebsiella pneumoniae using machine learning to predict antimicrobial mics and associated genomic features for nontyphoidal salmonella key: cord- -v bfm x authors: gohl, daryl m.; garbe, john; grady, patrick; daniel, jerry; watson, ray h. b.; auch, benjamin; nelson, andrew; yohe, sophia; beckman, kenneth b. title: a rapid, cost-effective tailed amplicon method for sequencing sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v bfm x the global covid- pandemic has led to an urgent need for scalable methods for clinical diagnostics and viral tracking. next generation sequencing technologies have enabled large-scale genomic surveillance of sars-cov- as thousands of isolates are being sequenced around the world and deposited in public data repositories. a number of methods using both short- and long-read technologies are currently being applied for sars-cov- sequencing, including amplicon approaches, metagenomic methods, and sequence capture or enrichment methods. given the small genome size, the ability to sequence sars-cov- at scale is limited by the cost and labor associated with making sequencing libraries. 
here we describe a low-cost, streamlined, all amplicon-based method for sequencing sars-cov- , which bypasses costly and time-consuming library preparation steps. we benchmark this tailed amplicon method against both the artic amplicon protocol and sequence capture approaches and show that an optimized tailed amplicon approach achieves comparable amplicon balance, coverage metrics, and variant calls to the artic v approach and represents a cost-effective and highly scalable method for sars-cov- sequencing. the global covid- pandemic has necessitated a massive public health response which has included implementation of society-wide distancing measures to limit viral transmission, the rapid development of qrt-pcr, antigen, and antibody diagnostic tests, as well as a world-wide research effort of unprecedented scope and speed. next generation sequencing technologies (ngs) have recently enabled large-scale genomic surveillance of infectious diseases. sequencing-based genomic surveillance has been applied to both endemic disease, such as seasonal influenza ( ) , and to emerging disease outbreaks such as zika and ebola ( ) ( ) ( ) . as of may , over , sars-cov- sequences have been deposited in public repositories such as ncbi and gisaid ( , ) . several large-scale consortia in the uk (cog-uk: covid- genomics uk), canada (cancogen: canadian covid genomics network), and the united states (cdc spheres: sars-cov- sequencing for public health emergency response, epidemiology, and surveillance) have begun coordinated efforts to sequence large numbers of sars-cov- genomes. such genomic surveillance has already enabled insights into the origin and spread of sars-cov- ( , ) , including the sequencing efforts by the seattle flu study which provided early evidence of extensive undetected community transmission of sars-cov- in the seattle area ( ) . a number of different approaches have been used to sequence sars-cov- . 
metagenomic (rna) sequencing can be used to sequence and assemble sars-cov- ( ) . this approach has the disadvantage that samples must typically be sequenced very deeply in order to obtain sufficient coverage of the viral genome, and thus the cost of this approach is high relative to more targeted methods. sequence capture methods ( figure a) can be used to enrich for viral sequences in order to lower sequencing costs and are being employed to sequence sars-cov- ( ) . finally, amplicon approaches ( figure b) , in which cdna is made from sars-cov- positive samples and amplified using primers that generate tiled pcr products, are being used to sequence sars-cov- ( ). since primers cannot capture the very ends of the viral genome, amplicon approaches have the drawback of slightly less complete genome coverage, and mutations in primer binding sites have the potential to disrupt the amplification of the associated amplicon. however, the relatively low cost of amplicon methods makes them a good choice for population-scale viral surveillance and such approaches have recently been used successfully to monitor the spread of viruses such as zika and ebola ( ) ( ) ( ) . the artic network (https://artic.network/) has established a method for preparing amplicon pools in order to sequence sars-cov- ( figure b ). the artic primer pools have gone through multiple iterations to improve evenness of coverage ( ) . several variants of the artic protocol exist in which the pooled sars-cov- amplicons from a sample are taken through an ngs library preparation protocol (using either ligation or tagmentation-based approaches) in which sample-specific barcodes are added, and are then sequenced using either short-read (illumina) or long-read (oxford nanopore, pacbio) technologies. the library preparation step currently represents a bottleneck in sequencing sars-cov- amplicons, in terms of both cost and labor. 
here we describe an all-amplicon method for producing sars-cov- sequencing libraries which simplifies the process and lowers the per sample cost for sequencing sars-cov- genomes ( figure c ). this approach incorporates adapter tails in the artic v primer designs, allowing sequencing libraries to be produced in a two-step pcr process, bypassing costly and labor-intensive ligation or tagmentation-based library preparation steps. by reoptimizing the pooling strategy for the tailed primers, we demonstrate that this tailed amplicon approach can achieve similar coverage to the untailed artic v primers at equivalent sequencing depths. we benchmark this approach against both the standard artic v protocol and a sequence capture approach using clinical samples spanning a range of viral loads. we designed a series of experiments in order to test a streamlined tailed amplicon method and to compare amplicon and sequence capture based methods for sars-cov- sequencing ( figure ). we sequenced these samples using illumina's nextera dna flex enrichment protocol using a respiratory virus oligo panel containing probes for sars-cov- , the artic v tiled primers, and a novel tailed amplicon method designed to reduce cost and streamline the preparation of sars-cov- sequencing libraries. we first evaluated the different sars-cov- sequencing workflows in their performance with a previously sequenced sars-cov- isolate strain from washington state ( -ncov/usa-wa / ) provided by bei resources ( ) . as expected, since the amplicon approaches are unable to cover sequences at the ends of the sars-cov- genome, the dna flex enrichment sequence capture method produced the highest genome coverage. at a subsampled read depth of , reads, the nextera dna flex enrichment method achieved . % coverage at a minimum of x and . % coverage at a minimum of x (figure a -b). the artic v method prepared with truseq library preparation achieved . % coverage at a minimum of x and . 
% coverage at a minimum of x (figure a -b). we tested a tailed amplicon method (tailed amplicon v ) in which the tailed version of the artic v primers were pooled into two pools in a similar manner to the artic v protocol. the bei wa isolate strain was amplified for both or pcr cycles, using the same enzymes and pcr conditions used for the artic v data set. the tailed amplicon v method produced lower coverage than the artic v method, with . % coverage at a minimum of x and . % coverage at a minimum of x for the pcr cycle sample and . % coverage at a minimum of x and . % coverage at a minimum of x for the pcr cycle sample (figure a-b) . the poorer performance with respect to coverage metrics with the tailed amplicon v protocol was due to substantially worse balance between the different tiled amplicons than with the artic v (untailed) primers ( figure c -d). the coefficient of variation (cv) of the artic v sample was . and the cvs of the tailed amplicon v samples were . and . for the and pcr cycle samples, respectively. the artic v primers have been through multiple cycles of iteration to achieve relatively even amplicon balance and genome coverage ( ) . we reasoned that reducing the concentration of the primers that were over-represented in the initial round of sequencing may improve balance. while adjusting the primer concentration for over-represented amplicons did lower the cv of the tailed amplicon pool, amplicon balance was still substantially worse than with the untailed artic v primers (data not shown). we next tested whether splitting the tailed sars-cov- primers into pcr reactions based on primer performance in the initial sequencing tests could improve balance with the tailed primer approach. the -pool amplification scheme (tailed amplicon v ) achieved coverage metrics close to the untailed artic v approach at comparable read depths with . % coverage at a minimum of x and . % coverage at a minimum of x (figure a-b) . 
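The amplicon balance metric used above, the coefficient of variation (cv), is the standard deviation of the per-amplicon read counts divided by their mean, so lower values indicate more even amplification. The sketch below is illustrative only: the function name `amplicon_cv` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def amplicon_cv(read_counts: list[float]) -> float:
    """Coefficient of variation of per-amplicon read counts.

    Uses the population standard deviation (an assumption; the sample
    standard deviation would give slightly larger values).
    """
    if not read_counts:
        raise ValueError("at least one amplicon count is required")
    m = mean(read_counts)
    if m == 0:
        raise ValueError("cv is undefined when the mean count is zero")
    return pstdev(read_counts) / m
```

A perfectly balanced pool, in which every amplicon receives the same number of reads, has a cv of zero under this definition.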
the improvement in genome coverage metrics with the tailed amplicon v approach was a function of improved amplicon balance ( figure e ). the cv of the tailed amplicon v sample was . (comparable to the cv of . with the untailed artic v approach). the same three variants were detected by all four methods tested ( figure f ), consistent with prior comparisons of the usa-wa / and the wuhan-hu- reference strain. next, we assessed the performance of the different sars-cov- sequencing approaches on a set of deidentified patient samples. we selected sars-cov- positive patient samples spanning a range of viral loads as assessed by qrt-pcr using the cdc primers targeting the sars-cov- nucleocapsid gene (n and n targets, supplemental figure ). in addition, we included two negative patient samples in these experiments. we carried out initial tests of the nextera dna flex enrichment protocol, the tailed amplicon v approach, and the artic v approach using this sample set. for testing the tailed amplicon v approach, and comparing among all four methods, we used a subset of these patient samples with n and n ct values ranging from ~ - ( figure a ). for the illumina dna flex enrichment protocol, sars-cov- genome coverage was more complete for samples with lower n and n cts (ranging from ~ - ) at comparable read depths and coverage thresholds than with amplicon approaches, similar to the bei wa isolate data ( figure c , supplemental figure s -s ). however, for samples with n and n ct values greater than approximately , the number of sequencing reads was substantially reduced and the proportion of reads mapping to the human genome was substantially increased (supplemental figure s ). the average coverage at a subsampled read depth of , raw reads was . % ( x) and . % ( x) for all six test samples. for samples with n and n ct values of less than , average coverage was . % ( x) and . % ( x) at a subsampled read depth of , raw reads.
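the coverage figures quoted above are percentages of genome positions whose read depth meets a minimum threshold. a minimal sketch of that metric follows, assuming a per-position depth list is already available; the function name, the example depths, and the threshold are hypothetical and are not taken from the study.

```python
def percent_covered(depths, min_depth):
    """Percentage of positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Hypothetical depths for a four-position toy genome.
toy_depths = [0, 5, 10, 20]
print(percent_covered(toy_depths, min_depth=10))
```

applying the same function to depth profiles computed from subsampled reads would give coverage as a function of read depth, which is how the methods are compared in the text.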
for artic v tests, based on the n and n target ct values from clinical testing, we used either , , or pcr cycles for the amplification reactions. sufficient amplification to carry out truseq library prep was seen for samples with cts of around or less. five patient samples with n and n ct values ranging from ~ - and the bei wa isolate sample were selected for truseq library prep and sequencing; one sample (n ct = , n ct = . ) was prepared in triplicate. consistent with previous descriptions of the artic v primers, the balance between the tiled amplicons across these samples was relatively even, with a mean cv of . among the five patient samples tested, and . for samples with a n and n ct of less than ( figure b , supplemental figure s ). for the artic v protocol, the average coverage at a subsampled read depth of , raw reads was . % ( x) and . % ( x) for all five test samples. for samples with n and n ct values of less than , average coverage was . % ( x) and . % ( x) at a subsampled read depth of , raw reads ( figure d , supplemental figure s -s ). we performed initial tests of the tailed amplicon v protocol by amplifying the samples listed in figure a for or pcr cycles using tailed versions of the artic v primers split into two separate pools. as with the bei wa isolate sample, the balance observed with the tailed amplicon v approach was worse than the artic v protocol, with a mean cv of . among the six patient samples tested, and . for samples with a n and n ct of less than ( figure b , supplemental figure s ). this led to decreased coverage at a given read depth for the tailed amplicon v method relative to artic v ( figure e , supplemental figure s ). upon splitting the tailed sars-cov- primers into pcr reactions based on primer performance in the initial sequencing tests, the tailed amplicon v method had much improved amplicon balance. the mean cv of all six patient samples was . (compared to a cv of . with artic v ) and .
for samples with a n and n ct of less than (compared to . with the artic v protocol; figure b , supplemental figure s ). the tailed amplicon v protocol had an average coverage at a subsampled read depth of , raw reads of . % ( x) and . % ( x) for all six test samples. for samples with ct values of less than , average coverage was . % ( x) and . % ( x) at a subsampled read depth of , raw reads ( figure f , supplemental figure s -s ). the slightly lower coverage metrics at a given subsampled read depth for the tailed amplicon v method can likely be explained by primer dimer formation during the two-step amplification process, which is more pronounced for higher n and n ct samples (supplemental figure s ). despite observing negligible amounts of primer dimer products on the bioanalyzer trace, samples with n and n ct values greater than had as much as % primer dimer in the resulting sequencing reads. we have previously reported a substantial size bias on the miseq, which may help explain the preferential clustering and out-sized proportion of primer dimer reads present in the sequencing data for some samples ( ) . while this issue can be overcome by increased sequencing depth, future optimizations aimed at reducing primer dimer contamination, such as more stringent size selection or sequencing on an instrument with less size bias such as the novaseq ( ), could reduce this effect. finally, we examined the variants detected in the patient samples for each of the sars-cov- sequencing methods. there was complete concordance in the variant calls for all samples with n and n ct values below , but less agreement among variant calls between methods for the sample with n and n ct values of approximately ( figure ). here we compare sequence capture and amplicon-based methods for sequencing sars-cov- and describe a streamlined tailed amplicon method for cost-effective and highly scalable sars-cov- sequencing.
in comparing the sequence capture and amplicon-based methods, there is a trade-off between the completeness of genome coverage and sensitivity (being able to analyze samples with higher n and n ct values). consistent with other recent analyses of sars-cov- amplicon sequencing approaches ( ), we observed highly concordant results from samples with n and n ct values of less than . for samples with ct values between and , coverage metrics tended to be less robust at a given read depth, and samples with ct values of greater than did not perform well under any of the conditions tested. based on validation experiments for the university of minnesota qrt-pcr clinical covid- diagnostic assay, we estimate that a ct value of corresponds to roughly sars-cov- genome copies and a ct value of corresponds to roughly sars-cov- genome copies in the µl input used for cdna creation ( ) . we describe a modified workflow for sars-cov- sequencing which builds on the tiled amplicon approach developed by the artic consortium and currently employed by many labs around the world. this tailed amplicon method uses a two-step pcr process similar to workflows previously described by us and others to generate microbiome or other amplicon sequencing data ( ) . through an iterative testing process, we demonstrate that with the tailed amplicon v method, a four-pool amplification scheme produces data with comparable amplicon balance, coverage metrics, and variant calls to the artic v approach. the tailed amplicon approach bypasses costly and labor-intensive library preparation steps and will allow for production of sars-cov- libraries at high scale (similar workflows are run on tens of thousands of samples per year in the university of minnesota genomics center) at low cost (between $ - per sample depending on scale, including labor costs).
we anticipate that this approach will aid in the genomic surveillance of sars-cov- as well as studies on viral diversity and evolution, and the influence of virus genetics on transmissibility, virulence, and clinical outcomes. extracted rna from de-identified clinical biospecimens was obtained subsequent to covid- testing at the university of minnesota for use under the irb approved protocol "detection of covid by molecular methods" (study ). nine samples spanning a range of viral loads as assessed by the ct values of the viral n and n targets by qrt-pcr were selected for these studies. in addition, two sars-cov- negative samples were selected to assess cross-contamination or other sequencing artifacts. the following reagent was deposited by the centers for disease control and prevention and obtained through bei resources, niaid, nih: genomic rna from sars-related coronavirus , isolate usa-wa / , nr- . rna was extracted using one of three kits (qiagen qiaamp viral rna mini kit, macherey-nagel nucleospin virus mini kit, and biomérieux easymag nuclisens system) as described previously ( ) . all extraction methods used µl of viral transport medium as input and eluted in µl of appropriate elution buffer as indicated by manufacturer protocols. the integrity of the extracted rna was analyzed using the agilent high sensitivity rna screentape assay on agilent tapestation following the manufacturer's guidelines (agilent, santa clara, ca). qrt-pcr reactions to identify sars-cov- samples were carried out using a modified version of the centers for disease control and prevention (cdc) sars-cov- qrt-pcr assay, as previously described ( ) . briefly, three separate µl rt-qpcr reactions were set up in a -well barcoded plate (thermo fisher scientific, waltham, ma) for either the n , n , or rp primers and probes. . µl extracted rna was added to . µl qpcr master mix comprised of the following components: .
µl nuclease-free water, µl gotaq ® probe qpcr master mix with dutp ( x) (promega, madison, wi), . µl goscript tm rt mix for -step rt-qpcr (promega, madison, wi), . µl primer/probe sets for either n , n , or rp (idt, coralville, ia). reactions were run on a quantstudio qs (thermo fisher scientific, waltham, ma) using the following cycling conditions: one cycle of °c for minutes, followed by one cycle of °c for minutes, followed by cycles of °c for seconds and °c for minute. a minimum of two no template controls (ntcs) were included on all runs. a Δrn threshold of . was selected and set uniformly for all runs. ct values were exported and analyzed in microsoft excel. the following reaction was set up to create cdna using the artic v protocol: µl template rna, µl nuclease-free water, µl superscript iv vilo master mix (thermo fisher scientific, waltham, ma). cdna synthesis reactions were incubated at: °c for minutes, followed by °c for minutes and °c for minutes. cdna was amplified using each of the two artic v primer pools which tile the sars-cov- genome. the following recipe was used to set up the pcr reactions: . µl template cdna, . µl nuclease-free water, µl x q reaction buffer (new england biolabs, ipswich, ma), . µl mm dntps (kapa biosystems, woburn, ma), . µl q polymerase (new england biolabs, ipswich, ma), µl primer pool or ( µm). cycling conditions were: °c for seconds, followed by or cycles of °c for seconds and °c for minutes. pools and were then combined, cleaned up with : ampurexp beads (beckman coulter, brea, ca), and quantified by qubit fluorometer and broad range dna assay (thermo fisher scientific, waltham, ma) and tapestation capillary electrophoresis (agilent, santa clara, ca). eight samples with > ng/µl concentration of target amplicons were selected for downstream library preparation. library preparation was performed following the standard illumina truseq nano dna protocol for base pair libraries (illumina, san diego, ca).
a total of ng of amplicons from the artic protocol were used as the input for library preparation. input material was not sheared, as the amplicons were already the desired fragment length. a modified non-directional nebnext ultra ii first and second strand (#e and #e , new england biolabs, ipswich, ma) protocol was used to generate long fragments of double-stranded cdna as input material for the nextera dna flex enrichment with respiratory virus panel. the following reaction was set up for non-fragmented priming of rna: µl template rna and µl nebnext random primers were combined and incubated at °c for minutes. non-directional first strand cdna synthesis was performed by combining µl of primed template rna, µl nebnext first strand synthesis buffer, µl nebnext first strand synthesis enzyme mix, and µl nuclease-free water. the first strand synthesis reaction was incubated at to generate cdna upstream of sars-cov- genome amplification, the following reaction was set up: µl template rna, µl nuclease-free water, µl superscript iv vilo master mix (thermo fisher scientific, waltham, ma). cdna synthesis reactions were incubated at: °c for minutes, followed by °c for minutes and °c for minutes. the sars-cov- genome was amplified using a two-step pcr protocol. the primary amplification was carried out in a manner similar to the artic v method described above, using two primer pools which tile the sars-cov- genome. the following recipe was used to set up the pcr reactions: . µl template cdna, . µl nuclease-free water, µl x q reaction buffer (new england biolabs, ipswich, ma), . µl mm dntps (kapa biosystems, woburn, ma), . µl q polymerase (new england biolabs, ipswich, ma), µl primer pool or ( µm) for the tailed v protocol. cycling conditions were: °c for seconds, followed by or cycles of °c for seconds and °c for minutes.
the primers for the primary amplification contained both sars-cov- targeting sequences (derived from the artic v designs), as well as adapter tails for adding indices and illumina flow cell adapters in a secondary amplification. these amplification primers had the following structure (see supplementary information for primer sequences): left primers: tcgtcggcagcgtcagatgtgtataagagacag right primers: gtctcgtgggctcggagatgtgtataagagacag the pcr products from pool and pool for each sample were combined and then diluted : in sterile, nuclease-free water, and a second pcr reaction was set up to add the illumina flow cell adapters and indices. the secondary amplification was done using the following recipe: µl template dna ( : dilution of the first pcr reaction), . µl nuclease-free water, µl x q reaction buffer (new england biolabs, ipswich, ma), . µl mm dntps (kapa biosystems, woburn, ma), . µl q polymerase (new england biolabs, ipswich, ma), . µl forward primer ( µm), . µl reverse primer ( µm). cycling conditions were: °c for seconds, followed by cycles of °c for seconds, °c for seconds, °c for minute, followed by a final extension at °c for minutes. the following indexing primers were used (x indicates the positions of the bp unique dual indices): forward indexing primer: aatgatacggcgaccaccgagatctacacxxxxxxxxxxtcgtcggcagcgtc reverse indexing primer: caagcagaagacggcatacgagatxxxxxxxxxxgtctcgtgggctcgg four-pool tailed amplicon v library generation and sequencing. samples were processed as described above for the two-pool tailed amplicon sequencing workflow, with the exception that in the first round of pcr, four separate reactions were set up using primer pools . , . , . , and . (see supplementary information for primer sequences). the four pcr reactions were combined in a : : : ratio after an initial pcr amplification of cycles and a : dilution of the combined pcrs for each sample was indexed according to the process described above.
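the indexing primer templates above carry a run of x placeholders where the sample-specific index goes. as a minimal sketch of how a concrete primer could be assembled from a template, the helper below substitutes an index sequence for the placeholder run. the function name and the example index are hypothetical, and whether a given index must be reverse-complemented before insertion is not stated in the text, so the sketch inserts it as given.

```python
# Templates copied from the text; "x" marks the index positions.
FORWARD_TEMPLATE = "aatgatacggcgaccaccgagatctacacxxxxxxxxxxtcgtcggcagcgtc"
REVERSE_TEMPLATE = "caagcagaagacggcatacgagatxxxxxxxxxxgtctcgtgggctcgg"


def fill_index(template, index):
    """Replace the contiguous run of 'x' in template with the index sequence."""
    n = template.count("x")
    placeholder = "x" * n
    if n == 0 or placeholder not in template:
        raise ValueError("template needs one contiguous run of 'x'")
    index = index.lower()
    if len(index) != n:
        raise ValueError(f"index must be {n} bases long")
    if set(index) - set("acgt"):
        raise ValueError("index may only contain a, c, g, t")
    # Assumption: the index is inserted as given, with no reverse complement.
    return template.replace(placeholder, index)


# Hypothetical index, sized to fit the placeholder run.
example_index = ("acgt" * 5)[: FORWARD_TEMPLATE.count("x")]
print(fill_index(FORWARD_TEMPLATE, example_index))
```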
the sample pools were diluted to nm based on the qubit measurements and agilent sizing information, and µl of the nm pool was denatured with µl of . n naoh. amplicon libraries (artic v , tailed v , tailed v ) were diluted to pm in illumina's ht buffer, spiked with % phix, and sequenced using a miseq cycle v kit (illumina, san diego, ca). the nextera dna flex enrichment library was diluted to pm in illumina's ht buffer, spiked with % phix, and sequenced using a and a miseq cycle v kit (illumina, san diego, ca). the analysis method for amplicon libraries is as follows: sample quality was assessed with fastqc ( ) . read-pairs were stitched together using pear ( ) . human host dna was filtered by aligning the stitched reads to the human genome (grch ). reads that did not align to the host genome were aligned to the reference wuhan-hu- ( ) sars-cov- genome (mn . ) using bwa ( ) . amplicon read depths were determined by counting the number of aligned reads covering the base at the center of each amplicon region. the ivar software package was used to trim primer sequences from the aligned reads, and ivar and samtools mpileup were used to call variants and generate consensus sequences ( ). variants located outside of the region targeted by the amplicon panel were filtered out (reference genome positions - and - ), and consensus sequence bases corresponding to those regions were trimmed. the nextera dna flex enrichment libraries were analyzed using the same process, except the ivar primer trimming step was omitted, and no filtering of variants or trimming of consensus sequence was performed. sequencing data for this project is available through the ncbi sequence read archive bioproject prjna . genome sequences of the strains sequenced in this study are available in genbank bioproject prjna .
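two steps of the analysis above are simple interval operations: counting aligned reads that cover the center base of each amplicon, and discarding variants that fall outside the region targeted by the amplicon panel. the sketch below illustrates both under stated assumptions; it is not the authors' pipeline. coordinates are taken to be 1-based and inclusive, reads are reduced to (start, end) intervals, and all names and example coordinates are hypothetical.

```python
def depth_at_amplicon_centers(amplicons, alignments):
    """Count reads covering the center base of each amplicon.

    amplicons: dict mapping name -> (start, end), 1-based inclusive.
    alignments: list of (start, end) intervals of aligned reads.
    """
    depths = {}
    for name, (start, end) in amplicons.items():
        center = (start + end) // 2
        depths[name] = sum(1 for s, e in alignments if s <= center <= e)
    return depths


def filter_variants(positions, target_start, target_end):
    """Keep only variant positions inside the targeted region."""
    return [p for p in positions if target_start <= p <= target_end]


# Hypothetical amplicons and read intervals, for illustration only.
amplicons = {"amp_1": (1, 101), "amp_2": (81, 181)}
reads = [(1, 101), (10, 60), (81, 181), (120, 125)]
print(depth_at_amplicon_centers(amplicons, reads))
print(filter_variants([5, 50, 150, 190], target_start=20, target_end=180))
```

in practice the read intervals would come from the bwa alignments, and the target boundaries from the first and last primer binding sites of the panel.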
a) in illumina's nextera dna flex enrichment protocol cdna is tagmented and made into barcoded sequencing libraries, which are then enriched using sequence capture with a respiratory virus panel containing probes against sars-cov- . b) in the artic protocol, first strand cdna is enriched by amplifying with two pools of primers to generate amplicons tiling the sars-cov- genome. these amplicons are then subjected to either illumina or oxford nanopore library preparation, using methods that either directly add adapters to the ends of the amplicons or fragment them to enable sequencing on a wider variety of illumina instruments. c) the tailed amplicon approach, developed here, enriches first strand cdna using artic v primers containing adapter tails. this allows functional sequencing libraries to be created through a second indexing pcr reaction that adds sample-specific barcodes and flow cell adapters. a) percentage of the bei wa isolate genome coverage at x at different subsampled read depths when sequenced with the indicated approach. b) percent of the bei wa isolate genome coverage at x at different subsampled read depths when sequenced with the indicated approach. c) observed read depth for each of the expected amplicons for the bei wa isolate amplified with the artic v protocol at a subsampled read depth of , raw reads. d) observed read depth for each of the expected amplicons for the bei wa isolate amplified with the tailed amplicon v ( pool amplification) protocol at a subsampled read depth of , raw reads. e) observed read depth for each of the expected amplicons for the bei wa isolate amplified with the tailed amplicon v protocol ( pool amplification) at a subsampled read depth of , raw reads. f) positions of variants detected for the bei wa isolate at a read depth of up to , , raw reads (or the maximum read depth for the sample). 
global circulation patterns of seasonal influenza viruses vary with antigenic drift multiplex pcr method for minion and illumina sequencing of zika and other virus genomes directly from clinical samples an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar viral genomics in ebola virus research a new coronavirus associated with human respiratory disease in china nextstrain: real-time tracking of pathogen evolution probable pangolin origin of sars-cov- associated with the covid- outbreak in brief the proximal origin of sars-cov- cryptic transmission of sars-cov- in washington state a pneumonia outbreak associated with a new coronavirus of probable bat origin capturing sequence diversity in metagenomes with comprehensive and scalable probe design a proposal of alternative primers for the artic network's multiplex pcr to improve coverage of sars-cov- genome sequencing systematic improvement of amplicon marker gene methods for increased accuracy in microbiome studies severe acute respiratory syndrome coronavirus from patient with novel coronavirus disease, united states measuring sequencer size bias using recount: a novel method for highly accurate illumina sequencing-based quantification rapid, sensitive, full genome sequencing of severe acute respiratory syndrome virus coronavirus (sars-cov- ) analytical validation of a covid- qrt-pcr detection assay using a -well format and three extraction methods fastqc a quality control tool for high throughput sequence data pear: a fast and accurate illumina paired-end read merger fast and accurate long-read alignment with burrows-wheeler transform we thank the staff of the university of minnesota genomics center for helpful discussions and technical support. we thank brandon vanderbush for conducting qc on the sars-cov- samples and sequencing libraries.
this work was carried out in part using computing resources at the university of minnesota supercomputing institute. we thank sean wang and matt plumb from the minnesota department of health for helpful discussions and for sharing artic v primers. key: cord- -oa jfots authors: taka, e.; yilmaz, s. z.; golcuk, m.; kilinc, c.; aktas, u.; yildiz, a.; gur, m. title: critical interactions between the sars-cov- spike glycoprotein and the human ace receptor date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: oa jfots severe acute respiratory syndrome coronavirus (sars-cov- ) enters human cells upon binding of its spike (s) glycoproteins to ace receptors and causes the coronavirus disease (covid- ). therapeutic approaches to prevent sars-cov- infection are mostly focused on blocking s-ace binding, but critical residues that stabilize this interaction are not well understood. by performing all-atom molecular dynamics (md) simulations, we identified an extended network of salt bridges, hydrophobic and electrostatic interactions, and hydrogen bonding between the receptor-binding domain (rbd) of the s protein and ace . mutagenesis of these residues on the rbd was not sufficient to destabilize binding but reduced the average work to unbind the s protein from ace . in particular, the hydrophobic end of rbd serves as the main anchor site and unbinds last from ace under force. we propose that blocking this site via neutralizing antibody or nanobody could prove an effective strategy to inhibit s-ace interactions. the covid- pandemic is caused by sars-cov- , which is a positive-sense single-stranded rna betacoronavirus. phylogenetic analyses demonstrated that the sars-cov- genome shares ~ % sequence identity with severe acute respiratory syndrome coronavirus (sars-cov), and ~ % with the middle-east respiratory syndrome coronavirus (mers-cov) ( ). despite these similarities, sars-cov- is much more infectious and fatal than sars-cov and mers-cov together ( ) .
sars-cov- consists of a kb single-stranded rna genome that is encapsulated by a lipid bilayer and three distinct structural proteins that are embedded within the lipid membrane: envelope (e), membrane (m), and spike (s). host cell entry is primarily mediated by homotrimeric s glycoproteins located on the viral membrane ( fig. a) ( ) . each s protomer consists of s and s subunits that mediate binding to the host cell receptor and fusion of the viral envelope, respectively ( , ) . the receptor-binding domain (rbd) of s undergoes a large rigid body motion to bind to ace . in the closed state, all rbds of the s trimer are in the down position, and the binding surface is inaccessible to ace . the switching of one of the rbds into a semi-open intermediate state is sufficient to expose the ace binding surface and stabilize the rbd in its up position (fig. b) ( ) . the s protein binds to the human angiotensin-converting enzyme (ace ) receptor, a homodimeric integral membrane protein expressed in the epithelial cells of lungs, heart, kidneys, and intestines ( ) . each ace protomer consists of an n-terminal peptidase domain (pd), which interacts with the rbd of the s protein through an extended surface (fig. a , c) ( ) ( ) ( ) . upon ace binding, proteolytic cleavage of the s protein by the serine protease tmprss separates the s and s subunits ( ) . the s protein exposes fusion peptides that insert into the host membrane and promote fusion with the viral membrane ( ) . to prevent sars-cov- infection, there is a global effort to design neutralizing antibodies ( ) , nanobodies ( ) , peptide inhibitors ( ) , and small molecules ( ) that target the ace binding surface of the s protein. yet, only a limited number of studies were performed to investigate critical interactions that facilitate s-ace binding using md simulations. 
initial studies have constructed a homology model of sars-cov- rbd in complex with ace , based on the sars-cov crystal structure ( , ) and performed conventional md (cmd) simulations totaling ns ( , ) and ns ( , ) in length to estimate binding free energies ( , ) and interaction scores ( ) . more recent studies used the crystal structure of sars-cov- rbd in complex with ace to perform coarse-grained ( ) and all-atom ( - ) md simulations. the effect of the mutations that disrupt close contact residues between sars-cov- rbd and ace on binding free energy was investigated by post-processing of the md trajectories ( , , , ) or by using bioinformatic methods ( ) . the work required to unbind the s protein from ace would provide a more accurate estimate of the binding strength, but this has not been performed under low pulling velocities using the structure of sars-cov- rbd in complex with ace . in addition, systematic analysis of critical residues that stabilize s-ace binding and how mutagenesis of these interaction sites reduces the binding strength and alters the way the s protein detaches from ace under force have not yet been performed. in this study, we performed a comprehensive set of all-atom md simulations totaling . µs in length using the recently-solved structure of the rbd of the sars-cov- s protein in complex with the pd of ace ( ) . simulations were performed in the absence and presence of external force to investigate the binding characteristics and estimate the binding strength. these simulations showed interactions between the rbd and pd domains in addition to those observed in the crystal structure ( ) . an extensive set of alanine substitutions and charge reversal mutations of the rbd amino acids involved in ace binding were performed to quantify how mutagenesis of these residues weakens binding in the presence and absence of force in simulations.
we showed that the hydrophobic end of rbd primarily stabilizes s-ace binding, and targeting this site could potentially serve as an effective strategy to prevent sars-cov- infection. to model the dynamic interactions of the s protein-ace binding interface, we used the co-structure of rbd of the sars-cov- s protein in complex with the pd of human ace ( ) (fig. c) . the structure was solvated in a water box that contains a physiologically relevant salt ( mm nacl) concentration. two sets of cmd simulations, each of ns in length, were performed to determine the formation of salt bridges ( ) and hydrogen bonds, as well as electrostatic and hydrophobic interactions between rbd and pd (table s ). a cutoff distance of Å between the basic nitrogens and acidic oxygens was used to score a salt bridge formation ( ) . for hydrogen bond formation, a maximum . Å distance between hydrogen bond donor and acceptor and a ° angle between the hydrogen atom, the donor heavy atom, and the acceptor heavy atom was used ( ) . interaction pairs that satisfy the distance, but not the angle criteria were analyzed as electrostatic interactions. for hydrophobic interactions, a cutoff distance of Å between the side chain carbon atoms was used ( ) ( ) ( ) . using these criteria, we identified eleven hydrophobic interactions ( fig. a) , eight hydrogen bonds (fig. b) , two salt bridges and six electrostatic interactions (fig. c ) between rbd and pd. observation frequencies were classified as high and moderate for interactions that occur in % and above and between - % of the total trajectory, respectively. f and y of rbd formed hydrophobic interactions with f , l , m , and y of pd, while l , f , y , and a of rbd formed hydrophobic interactions with t of pd at high frequencies (fig. d ).
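the geometric criteria above can be sketched as a per-frame classifier: a donor-acceptor pair within the distance cutoff counts as a hydrogen bond if the angle at the donor (between the hydrogen and the acceptor) is within the angle cutoff, and as an electrostatic interaction if only the distance criterion is met. this is an illustrative sketch, not the authors' analysis code. the cutoff values are not recoverable from this text, so they are passed in as parameters, and the values in the example are placeholders chosen only to exercise the code.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def angle_deg(vertex, p, q):
    """Angle p-vertex-q in degrees."""
    v1, v2 = _sub(p, vertex), _sub(q, vertex)
    cos = sum(x * y for x, y in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def classify_polar_contact(donor, hydrogen, acceptor, max_dist, max_angle):
    """Return 'hydrogen bond', 'electrostatic', or 'none' for one frame."""
    if _norm(_sub(acceptor, donor)) > max_dist:
        return "none"
    if angle_deg(donor, hydrogen, acceptor) <= max_angle:
        return "hydrogen bond"
    return "electrostatic"  # distance criterion met, angle criterion not


def observation_frequency(flags):
    """Percentage of frames in which an interaction was observed."""
    return 100.0 * sum(flags) / len(flags)


# Placeholder cutoffs (angstrom, degrees); NOT the values used in the study.
print(classify_polar_contact((0, 0, 0), (1, 0, 0), (3, 0, 0), 3.5, 30.0))
```

applying the classifier to every saved frame and passing the resulting booleans to `observation_frequency` gives the percentage used to label an interaction as high or moderate frequency.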
salt bridges between k -d (rbd-pd) and e -k , and hydrogen bonds between n -y , t -d , and q -e were observed at high frequencies, whereas hydrogen bonds y -d , q -k , t -y , y -e , and q -e were observed at moderate frequencies (fig. d ). residue pairs y -h , n -q , t -y , n -k , q -k , and y -q exhibited electrostatic interactions throughout the simulations (fig. d ). the interaction network we identified in our md simulations was mostly consistent with reported interactions in the rbd-pd crystal structure ( ) . however, our simulations identified four hydrogen bonds (q -k , t -d , y -e , and q -q ), one hydrophobic interaction (l -t ), and two electrostatic interactions (y -h and n -k ) that are not present in the crystal structure. in turn, we did not detect frequent hydrogen bonding between g -q , g -k , and y -r and an electrostatic interaction between g -k observed in the crystal structure ( ). this discrepancy may be due to radically different thermodynamic conditions between crystallization solutions and cmd simulations ( ) . we divided the rbd-pd interaction surface into three contact regions (cr - , fig a- the core region (cr ) comprised significantly fewer interactions than the ends of the rbd binding surface (cr and cr ). remarkably, out of interactions we detected in cr were hydrophobic, which were proposed to play a central role in anchoring of rbd to pd ( ) . unlike cr , cr formed only a single hydrophobic interaction with pd, whereas cr did not form any hydrophobic interactions. to estimate the binding strength of the s protein to ace , we performed steered md (smd) simulations to pull rbd away from pd at a constant velocity of Å − along the vector pointing away from the binding interface (fig. a) . steering forces were applied to the cα atoms of the rbd residues on the binding interface, whereas cα atoms of pd residues at the binding interface were kept fixed.
because part of the work applied is lost to the irreversible processes as we pull rbd away from pd at a finite velocity, the second law of thermodynamics indicates that unbinding free energy difference between the initial and final states cannot be larger than the average work required for unbinding. therefore, our calculations report relative changes in the binding free energy of wild-type (wt) and mutant rbd under the same velocity and thermodynamic conditions. in smd simulations (each ns, totaling ns in length, table s ), the average work applied to unbind rbd from pd was . ± . kcal/mol (mean ± s.d.), demonstrating that the s protein binds stably to ace (fig. b) . to investigate the contribution of each of the interactions we identified to the overall binding strength, we introduced point mutations on the rbd. salt bridges were eliminated by charge reversals (k e and e k). we also replaced each amino acid with alanine (table s ) to disrupt the pairwise interactions ( ) , with minimal perturbations to the protein backbone ( ) . two sets of cmd simulations (a total of . µs in length) were performed for each point mutant. we first quantified the root mean square fluctuation (rmsf) of the cα atom of the rbd residues located on the pd binding surface (fig. c) . the rigid body motions were eliminated by aligning the rbd interaction surface of pd for each conformer (see methods). out of mutations increased the residue fluctuations compared to wt (fig. s a), suggesting that disrupting the interactions between rbd and pd results in floppier binding. the largest fluctuations were observed for mutations in cr (f a, and n a), mutations in cr (y a and y a) and mutation in cr (l a) (fig. c) . mutation of these residues also increased the fluctuations in their neighboring region. while mutations in cr increased fluctuations in cr significantly, mutations in cr had little to no effect on the fluctuations in cr ( fig. d and fig. s b ).
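the rmsf used above is, for each atom, the square root of the mean squared deviation of its position from its time-averaged position. a minimal sketch follows, assuming the frames have already been aligned to remove rigid body motion (the alignment step itself is not shown). the function name and the toy trajectory are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF from pre-aligned frames.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, with the same atom order in every frame.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean_pos = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that average.
        msd = sum(
            sum((f[i][k] - mean_pos[k]) ** 2 for k in range(3)) for f in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy trajectory: atom 0 is static, atom 1 moves between two positions.
toy = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print(rmsf(toy))
```

comparing the per-residue values of a mutant against wild type, as done in the text, then amounts to subtracting two such lists computed over the same residues.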
we next performed smd simulations modeling the unbinding of rbd of each point mutant from pd ( simulations for each mutant, a total of . µs in length, table s ). f a, y a, y a, n a, and y a mutations substantially decreased the work requirement to unbind rbd-pd by . %, . %, . %, . % and . %, respectively ( fig. e-f, fig. s ). we note that most of these mutations also led to the largest increase in residue fluctuations on the binding surface (fig. c ). of these residues (f , n , and y ) are located in cr , whereas y is located in cr . these results highlight the primary role of hydrophobic interactions in cr to stabilize the s-ace binding. to further characterize critical interactions of the s-ace binding interface, we introduced double mutants to neighboring residues of rbd that form critical interactions with pd. we performed a total of . µs of cmd and . µs of smd simulations for double mutants (table s ). in particular, double mutants in cr resulted in out of the highest increase in rmsf ( fig. a and fig. s ). the f a/n a mutation at cr resulted in the largest increase in fluctuations in both cr and cr (fig. b and fig. s ). in smd simulations, out of double mutations also further decreased the average work to unbind rbd from pd ( fig. c-d, and fig. s ). similar to the rmsf analysis, double mutants in cr (f a/n a, e a/y a, e a/f a, and l a/f a) resulted in out of the largest decreases in average work (fig. d) . a charge reversal of k e in combination with either q a or y a also resulted in a large decrease in work values (fig. d) . we also used jarzynski equality ( , ) to construct the free energy profiles as a function of a reaction coordinate, referred to as the potential of mean force (pmf) ( ) . based on the estimated pmf ( fig. s ), double mutants in cr resulted in the largest decrease in the binding energy by - % compared to wt. 
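The Jarzynski-based free energy estimate mentioned above can be illustrated with a short sketch. It evaluates ΔF = −k_B T ln⟨exp(−W / k_B T)⟩ over a finite set of work values, using a log-sum-exp form for numerical stability. The work values and the temperature of 310 K are made-up inputs for illustration, not values from the study, and the function name is hypothetical.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def jarzynski_free_energy(work_values, temperature):
    """Finite-sample Jarzynski estimate of the free energy difference.

    work_values: non-equilibrium work per pulling run, in kcal/mol.
    temperature: in kelvin.
    """
    kt = KB_KCAL_PER_MOL_K * temperature
    scaled = [-w / kt for w in work_values]
    m = max(scaled)  # factor out the largest exponent to avoid underflow
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -kt * log_mean

# Hypothetical work values (kcal/mol) and an assumed temperature.
works = [10.0, 12.0, 15.0]
delta_f = jarzynski_free_energy(works, temperature=310.0)
mean_work = sum(works) / len(works)
```

The estimate never exceeds the average work, in line with the second-law argument in the text, and it is dominated by the smallest work values, which is why a small number of pulling runs tends to converge poorly.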
collectively, these results show that two salt bridges (e -k and k -d ) and the network of hydrophobic interactions in cr involving f , y , and f residues are the most significant contributors of binding strength between the s protein and ace . to test whether cr anchors rbd to pd ( ), we investigated the order of events that result in detachment of rbd from pd in smd simulations. the unbinding process appears to perform a zipper-like detachment starting from cr and ending at cr in % of the simulations (fig. a) . in only % of the simulations, cr released last from pd (fig. a) . because unbinding simulations can reveal features characteristic for the reverse process of binding ( ) ( ) ( ) ( ) ( ) , these results suggest that cr binding is the first and critical event for the s protein binding to ace . mutagenesis of the critical residues in cr , in general, resulted in a substantial decrease in the percentages of unbinding events that terminate with the release of cr from pd. in alanine replacement of the hydrophobic residues (f a, f , and y ), cr was released last for %, %, and % of the smd simulations, respectively (fig. b) . the probability of cr to release last under force was further reduced in double mutants of e a/f a ( %) and l a/f a ( %) (fig. b ). unlike these mutants, f a and f a/n a mutants in cr increased the probability of cr to release last, but this could be attributed to a large increase in fluctuations in cr upon these mutations ( fig. s b ). these results indicate that single and double mutants of the critical residues in cr substantially reduce the binding free energy of this region to ace . it remains unclear whether higher infectivity of sars-cov- than sars-cov can be attributed to stronger interactions between s and ace in sars-cov- ( , ) . to test this possibility, we performed two sets of md simulations for the rbd of sars-cov s protein bound to the pd of ace (pdb id: ajf ( )), and compared these results to that of sars-cov- . 
similar to sars-cov- , rbd of sars-cov makes an extensive network of interactions with pd. we identified eleven hydrophobic interactions (fig. a) , six hydrogen bonds (fig. b) , and seven electrostatic interactions (fig. c) . out of these interactions, only are conserved in sars-cov and the following mutations have taken place: l /f (sars-cov/sars-cov- ), f /y , p /a , p /e , l /f v /k , n /q , y /q , and t /n . similar to sars-cov- , l and y of sars-cov rbd formed a total of seven hydrophobic interactions at a high frequency with the hydrophobic pocket of ace (fig. d) . unlike sars-cov- , sars-cov rbd did not form any salt bridges with ace . we next modeled the unbinding of rbd of sars-cov from pd by performing smd simulations (totaling ns in length, table s ). the average total unbinding work of sars-cov ( . ± . kcal/mol, mean ± s.d., fig. e ) was identical but more broadly distributed than that of sars-cov- ( . ± . kcal/mol, fig. b ). unlike sars-cov- , cr released last from pd in only % of the unbinding events of rbd of sars-cov, whereas the unbinding of cr was the last event in the remaining % (fig. f) . these results indicate that the s protein binds stably to ace in both sars-cov and sars-cov- and the higher infectivity of sars-cov- cannot be explained by an increase in binding strength. higher variability in unbinding work values and the absence of a clear order in unbinding events of rbd of sars-cov suggest that sars-cov has a more variable binding mechanism to ace than sars-cov- . we performed an extensive set of in silico analysis to identify critical residues that facilitate binding of the rbd of the sars-cov- s protein to the human ace receptor. mutagenesis of these residues and pulling the rbd away from pd at a low velocity enabled us to estimate the free energy of binding and the order of events that result in the unbinding of rbd from pd. our simulations showed that the pd interacting surface of rbd can be divided into three contact regions (cr - ). 
hydrophobic residues of cr strongly interact with the hydrophobic pocket of pd in both sars-cov and sars-cov- . cr of sars-cov- also forms a salt bridge with ace that is not present in sars-cov. based on our smd simulations, we did not observe a major difference in binding strength of the s protein to ace between sars-cov and sars-cov- , indicating that higher infectivity of sars-cov- is not due to tighter binding of s to the ace receptor. these results are consistent with a recent md simulation that applied the generalized born and surface area continuum solvation approach (mm-gbsa) ( ) , coarse-grained simulations ( ) , and biolayer interferometry ( ). our analysis suggests that cr is the main anchor site of the sars-cov- s protein to ace , and blocking the cr residues f , e , f , n , and y could significantly reduce the binding affinity. consistent with this prediction, the llama-based nanobody h -h neutralizes sars-cov- ( ) by interacting with % and % of the critical residues we identified in cr and cr , respectively. similarly, the human antibody ha ( ) and vh-fc ab ( ) neutralize sars-cov- by interacting with f , a , and f residues on cr , which were among the strongest interactions we detected between rbd and pd. experimental studies revealed that antibodies against sars-cov induce limited neutralizing activity against sars-cov- ( , ) . this may be attributed to the low sequence conservation of the cr region between sars-cov and sars-cov- . in particular, the s protein of sars-cov- contains critical phenylalanine (f ) and glutamate (e ) residues not present in sars-cov, which form hydrophobic interactions and a salt bridge with ace , respectively. it remains to be determined whether this difference plays a role in the higher infectivity of sars-cov- than sars-cov. our simulations show that single and double mutants of cr are not sufficient to disrupt the binding of rbd to ace , but reduce the binding free energy of this region.
because rbd makes multiple contacts with ace through an extended surface, small molecules or peptides that target a specific region in the rbd-ace interaction surface may not be sufficient to prevent binding of the s protein to ace . instead, blocking of a larger surface of the cr region with a neutralizing antibody or nanobody is more likely to introduce steric constraints to prevent the s protein-ace interactions. materials and methods md simulations system preparation. for cmd simulations, the crystal structure of sars-cov- s protein rbd bound with ace at . Å resolution (pdb id: m j) ( ) was used as a template. the chloride ion, zinc ion, glycans, and water molecules in the crystal structure were kept in their original positions. single and double point mutants were generated using the mutator plugin in vmd ( ) . each system was solvated in a water box (using the tip p water model) having Å cushion in the positive x-direction and Å cushions in other directions. this puts a Å water cushion between the rbd-pd complex and its periodic image in the xdirection, creating enough space for unbinding simulations. ions were added to neutralize the system and salt concentration was set to mm to construct a physiologically relevant environment. the size of each solvated system was ~ , atoms. all system preparations steps were performed in vmd ( ) . all md simulations were performed in namd . ( ) using the charmm ( ) force field with a time step of fs. md simulations were performed under n, p, t conditions. the temperature was kept at k using langevin dynamics with a damping coefficient of ps - . the pressure was maintained at atm using the langevin nosé-hoover method with an oscillation period of fs and a damping time scale of fs. periodic boundary conditions were applied. Å cutoff distance was used for van der waals interactions. long-range electrostatic interactions were calculated using the particle-mesh ewald method. 
for each system, , steps of minimization followed by ns of equilibration were first performed while keeping the protein fixed. the complete system was minimized for an additional , steps, followed by ns of equilibration with constraints applied on cα atoms. subsequently, these constraints were released and the system was equilibrated for an additional ns before initiating the production runs. the length of the equilibration steps is expected to account for the structural differences due to the radically different thermodynamic conditions of crystallization solutions and md simulations ( ) . md simulations were performed on comet and stampede using ~ million core-hours in total. rmsf calculations. rmsf values were calculated as $\langle \Delta r_i^2 \rangle^{1/2} = \langle (r_i - \langle r_i \rangle)^2 \rangle^{1/2}$, where $\langle r_i \rangle$ is the mean atomic coordinate of the i-th cα atom and $r_i$ is its instantaneous coordinate. smd simulations. smd ( ) simulations were used to explore the unbinding process of rbd from ace on time scales accessible to standard simulation lengths. smd simulations have been applied to explore a wide range of processes, including domain motion ( , ) , molecule unbinding ( ) , and protein unfolding ( ) . in smd simulations, a dummy atom is attached to the center of mass of 'steered' atoms via a virtual spring and pulled at constant velocity along the 'pulling direction', resulting in a force $F$ applied to the smd atoms along the pulling vector ( ), $F = -\nabla U$ with $U = \frac{1}{2} k \left[ v t - (\mathbf{R}(t) - \mathbf{R}_0) \cdot \mathbf{n} \right]^2$, where $U$ is the guiding potential, $k$ is the spring constant, $v$ is the pulling velocity, $t$ is time, $\mathbf{R}(t)$ and $\mathbf{R}_0$ are the coordinates of the center of mass of steered atoms at time $t$ and 0, respectively, and $\mathbf{n}$ is the direction of pulling ( ) . total work ($W$) performed for each simulation was evaluated by integrating $F$ over the displacement $\xi$ along the pulling direction as $W = \int F(\xi)\, d\xi$. in smd simulations of sars-cov- , cα atoms of ace residues s -s , t -p , q -n , g -i , and p -r were kept fixed, whereas cα atoms of rbd residues k -i , g -f , y -a , and n -y were steered (fig. a) .
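The constant-velocity pulling scheme described above combines two small calculations: the spring force from the harmonic guiding potential at each time point, and the work obtained by integrating that force over displacement. The sketch below shows both under stated assumptions: the trapezoidal rule is one common choice for the integral and is not confirmed as the authors' method, and all function names and numeric values are illustrative.

```python
def spring_force(k, v, t, displacement):
    """Force along the pulling direction from a harmonic guiding potential.

    k: spring constant, v: pulling velocity, t: elapsed time,
    displacement: projection of the centre-of-mass shift of the steered
    atoms onto the pulling direction, i.e. (R(t) - R0) . n
    """
    return k * (v * t - displacement)

def work_from_force(xi, force):
    """Trapezoidal integral of force over displacement xi."""
    if len(xi) != len(force) or len(xi) < 2:
        raise ValueError("need at least two matching samples")
    return sum(
        0.5 * (force[i] + force[i + 1]) * (xi[i + 1] - xi[i])
        for i in range(len(xi) - 1)
    )

# Toy force-extension profile in arbitrary consistent units.
xi = [0.0, 1.0, 2.0, 3.0]
force = [0.0, 2.0, 2.0, 0.0]
w = work_from_force(xi, force)

# Dummy atom has moved v*t = 2.0 while the steered atoms moved 1.5.
f = spring_force(k=2.0, v=0.5, t=4.0, displacement=1.5)
```

With a stiff spring the steered atoms track the dummy atom closely, so the recorded force profile mostly reflects the resistance of the interface rather than the spring itself.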
steered atoms were selected as the region comprising the interacting residues. for sars-cov smd simulations the same ace residues were kept fixed. however, two slightly different steered atom selections were applied: i) using the same residue positions as for sars-cov- , which are v -i , t -l , f -s , and n -y , and ii) selecting the region comprising the interacting residues, which are t -l , f -d , and n -y . the total numbers of fixed and steered atoms were identical in all simulations. the pulling direction was selected as the distance between the center of mass of steered and fixed atoms. the pulling direction also serves as the reaction coordinate ξ for free energy calculations. each smd simulation was performed for ns using a Å − pulling velocity. at a spring constant of − Å − , the center of mass of the steered atoms followed the dummy atom closely while the spring was still soft enough to allow small deviations. for each system, conformations were sampled with a ns frequency from their cmd simulations ( conformers from each set of the cmd simulations listed in table s md - a-b). these conformations served as separate starting conformations for each set of smd simulations (table s md - c-d). potential of mean force for unbinding of rbd. work values to unbind rbd from ace at low pulling velocities along the reaction coordinate were analyzed using the jarzynski equality, which provides a relation between equilibrium free energy differences and the work performed through non-equilibrium processes ( ) ( ) ( ) : $e^{-\Delta F / k_B T} = \langle e^{-W / k_B T} \rangle$, where Δf is the helmholtz free energy, kb is the boltzmann constant and t is the temperature. because work values sampled in our smd simulations differ by more than kbt ( fig. s and s ) , the average work calculated in eq. will be dominated by small work values that are only rarely sampled. for a finite (n) number of smd simulations, the term $-\ln\left(\frac{1}{N}\sum_{i=1}^{N} e^{-W_i / k_B T}\right)$ did not converge to the value implied by $\langle e^{-W / k_B T} \rangle$. thus, eq.
provides an upper bound on Δf, which was used as an estimate of the pmf. fig. s . rmsf values of single and double mutants of rbd of sars-cov- . fig. s . distribution of work values obtained from smd simulations for each single point mutant system of rbd of sars-cov- . fig. s . distribution of work values obtained from smd simulations for each double point mutant system of rbd of sars-cov- . fig. s . pmf and Δf values of wt and six mutants of rbd of sars-cov- . table s . starting conformations and durations of the md simulations performed. movie s . cr releasing last when sars-cov- rbd was pulled away from ace pd. movie s . cr releasing last when sars-cov- rbd was pulled away from ace pd. identification of a novel coronavirus causing severe pneumonia in human: a descriptive study structure, function, and antigenicity of the sars-cov- spike glycoprotein mechanisms of coronavirus cell entry mediated by the viral spike protein stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis conformational transition of sars-cov- spike glycoprotein between its closed and open states structural basis for the recognition of the sars-cov- by full-length human ace structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structure of sars coronavirus spike receptorbinding domain complexed with receptor sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor key residues of the receptor binding motif in the spike protein of sars-cov- that interact with ace and neutralizing antibodies neutralizing nanobodies bind sars-cov- spike rbd and block interaction with ace computational design of ace -based peptide inhibitors of sars-cov- repurposing approved drugs as inhibitors of sars-cov- s-protein from molecular modeling and virtual screening cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace molecular 
mechanism of evolution and human infection with sars-cov- sars-cov- , an evolutionary perspective of interaction with human ace reveals undiscovered amino acids necessary for complex stability computational prediction of mutational effects on the sars-cov- binding by relative free energy calculations the sars-cov- exerts a distinctive strategy for interacting with the ace human receptor critical differences between the binding features of the spike proteins of sars-cov- and sars-cov effect of mutation on structure, function and dynamics of receptor binding domain of human sars-cov- with host cell receptor ace : a molecular dynamics simulations study dynamics of the ace -sars-cov- /sars-cov spike protein interface reveal unique mechanisms the mers-cov receptor dpp as a candidate binding target of the sars-cov- spike enhanced receptor binding of sars-cov- through networks of hydrogen-bonding and hydrophobic interactions zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open↔ closed transitions hbonanza: a computer algorithm for moleculardynamics-trajectory hydrogen-bond analysis direct and quantitative afm measurements of the concentration and temperature dependence of the hydrophobic force law at nanoscopic contacts a study of the preferred environment of amino acid residues in globular proteins molecular dynamics simulation of antimicrobial peptide arenicin- : b-hairpin stabilization by noncovalent interactions why protein conformers in molecular dynamics simulations differ from their crystal structures: a thermodynamic insight rapid mapping of protein functional epitopes by combinatorial alanine scanning comparing experimental and computational alanine scanning techniques for probing a prototypical protein-protein interaction equilibrium free-energy differences from nonequilibrium measurements: a master-equation approach nonequilibrium equality for free energy differences free energy calculation from steered molecular dynamics 
simulations using jarzynski's equality molecular dynamics simulations suggest that electrostatic funnel directs binding of tamiflu to influenza n neuraminidases molecular dynamics study of unbinding of the avidin-biotin complex steered molecular dynamics simulations reveal the likelier dissociation pathway of imatinib from its targeting kinases c-kit and abl unbinding of nicotine from the acetylcholine binding protein: steered molecular dynamics simulations computational insights into the mechanism of ligand unbinding and selectivity of estrogen receptors high potency of a bivalent human vh domain in sars-cov- animal models vmd: visual molecular dynamics scalable molecular dynamics with namd optimization of the additive charmm all-atom protein force field targeting improved sampling of the backbone ϕ, ψ and side-chain χ and χ dihedral angles steered molecular dynamics and mechanical functions of proteins steered molecular dynamics simulation of the rieske subunit motion in the cytochrome bc complex computational design of new peptide inhibitors for amyloid beta (aβ) aggregation in alzheimer's disease: application of a novel methodology unfolding of titin immunoglobulin domains by steered molecular dynamics simulation shielding and beyond: the roles of glycans in sars-cov- spike protein. biorxiv data and materials availability: data and the analysis software are available from the corresponding author upon request. the structure of the full-length s protein in complex with ace . the s protein is a homotrimer (green, purple, and grey) and embedded into the viral membrane. ace is a homodimer (blue and orange) and embedded into the host cell membrane. the full length structure of the s protein in complex with ace was modeled using the full length s protein model ( ) and the crystal structure of the s protein rbd in complex with ace (pdb id: m ). both proteins were manually inserted into the membrane by their transmembrane domains. 
(b) the structure of an s protomer in the down and up position of its rbd. s /s and s ' are the cleavage sites of the s protomer upon ace binding. (c) md simulations were performed for rbd of the s protein in complex with the pd of ace . catalytic residues of ace , glycans, and zn + and clions are shown in brown, red, yellow and purple, respectively. hydrophobic interactions (b) hydrogen bonds, and (c) salt bridges and electrostatic interactions, between rbd (green) and pd (blue) are shown on a conformation obtained from md simulations in the left panels. the interaction surface is divided into three distinct regions (cr - ). normalized distributions of the distances between the amino-acid pairs that form hydrophobic interactions (red), hydrogen bonds (purple), salt bridges (orange), and electrostatic interactions (green) are shown in the right panels. lines with colored numbers represent maximum cutoff distances for these interactions. key: cord- -chnibsa authors: hayn, manuel; hirschenberger, maximilian; koepke, lennart; straub, jan h; nchioua, rayhane; christensen, maria h; klute, susanne; bozzo, caterina prelli; aftab, wasim; zech, fabian; conzelmann, carina; müller, janis a; badarinarayan, smitha srinivasachar; stürzel, christina m; forne, ignasi; stenger, steffen; conzelmann, karl-klaus; münch, jan; sauter, daniel; schmidt, florian i; imhof, axel; kirchhoff, frank; sparrer, konstantin mj title: imperfect innate immune antagonism renders sars-cov- vulnerable towards ifn-γ and -λ date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: chnibsa the innate immune system constitutes a powerful barrier against viral infections. however, it may fail because successful emerging pathogens, like sars-cov- , evolved strategies to counteract it. here, we systematically assessed the impact of sars-cov- proteins on viral sensing, type i, ii and iii interferon (ifn) signaling, autophagy and inflammasome formation. 
mechanistic analyses show that autophagy and type i ifn responses are effectively counteracted at different levels. for example, nsp induces loss of the ifn receptor, whereas orf a disturbs autophagy at the golgi/endosome interface. comparative analyses revealed that antagonism of type i ifn and autophagy is largely conserved, except that sars-cov- nsp is more potent in counteracting type i ifn than its sars-cov- ortholog. altogether, however, sars-cov- counteracts type i ifn responses and autophagy much more efficiently than type ii and iii ifn signaling. consequently, the virus is relatively resistant against exogenous ifn-α/β and autophagy modulation but remains highly vulnerable towards ifn-γ and -λ treatment. in combination, ifn-γ and -λ act synergistically, and drastically reduce sars-cov- replication at exceedingly low doses. our results identify ineffective type i and ii antagonism as weakness of sars-cov- that may allow to devise safe and effective anti-viral therapies based on targeted innate immune activation. α) induce genes containing nf-κb sites in the promotor. signaling of type i ifns (ifn-α and ifn-β), type ii ifn (ifn-γ), type iii ifn (ifn-λ ) and pro-inflammatory cytokine signaling (tnfα and il- α) was quantified using quantitative firefly luciferase reporters controlled by the respective promotors (fig. c) . stimulation with ifn-α and ifn-β (fig. c) revealed that activation of the isre promotor is strongly repressed by nsp , nsp , nsp , nsp , orf and orf b. a similar set of viral proteins interfered with type ii ifn-γ and type iii ifn-λ signaling, albeit much weaker (mean inhibition % and %, respectively) compared to type i ifn signaling (mean inhibition % for ifn-α and % for ifn-β). activation of nf-κb signaling by tnfα or il- α was potently inhibited by the sars-cov- nsp , nsp , nsp , orf a, e, m, orf and orf b proteins. 
these analyses revealed that a similar set of proteins (nsp , nsp , nsp , orf a, e, m, orf and orf b) antagonizes pro-inflammatory cytokine signaling. since induction of autophagy does not depend on de novo gene expression , we monitored autophagy levels in sars-cov- protein expressing hek t cells by membrane-association of stably expressed gfp-lc b, a hallmark of autophagy induction (fig. d, supplementary fig. e ) . autophagosome numbers under basal conditions were strongly increased in the presence of orf a, e, m and orf a suggesting either de novo induction of autophagy or blockage of turnover (fig. d ). upon induction of autophagy using rapamycin, a similar pattern was observed. to clarify whether these viral proteins induce autophagy or block turnover, leading to accumulation of gfp-lc b positive vesicles, we treated cells with saturating amounts of bafilomycin a , which inhibits autophagic turnover. the increase of autophagosome numbers by orf a, e, m and orf a was drastically reduced compared to non- blocking conditions (fig. d) , indicating that these proteins block turnover, rather than induce it. blockage of autophagy and co-expression of nsp and nsp induced cell death, which may be responsible for the low number of autophagosomes. unexpectedly, in the presence of nsp autophagosome numbers were consistently reduced, suggesting that it inhibits autophagy (fig. d) . inflammasome responses were analyzed in stable thp- cell lines expressing sars-cov- proteins upon doxycycline induction. to avoid any effects of transcription, assembly of asc specks was needle protein mxih using the anthrax toxin delivery system ( fig. e) . asc speck assembly is typically followed by caspase- activation and release of pro-inflammatory . 
expression of the sars-cov- nsp , nsp and orf c very weakly induced inflammasome activity in the absence of inflammasome activators, although counterselection against cells prone to aberrant inflammasome activation during selection cannot be ruled out. activation of nlrc inflammasomes was not significantly antagonized by any viral protein. taken together, our analysis reveals that sars-cov- encodes multiple proteins that strongly antagonize innate immunity. notably, there are differences in overall inhibition of the pathways with ifn-γ, ifn-λ as well as inflammasome activity signaling being only weakly antagonized. however, type-i ifn induction and signaling and autophagy are strongly repressed. to analyze mechanistically why type-i ifn and autophagy are potently counteracted by sars-cov- , we aimed at identifying the steps that are targeted in these pathways. we focused on the top inhibitors as identified in fig. b -d. nsp was removed from the analysis as it prevents translation in general . to analyze ifn-β signaling, we monitored the levels of the type i ifn receptor, ifnar using western blotting in hek t cells overexpressing nsp , nsp , nsp , orf or orf b. activation of the two major transcription factors of type i ifn signaling, stat and stat (fig. a ) was examined by phosphorylation status. basal stat and stat levels were not significantly affected by any of the proteins tested (fig. b , quantification in supplementary fig. a-c) . (fig. b ). in the presence of nsp , activated stat and to a lesser extent stat accumulate ( fig. b and d , supplementary fig. a ). orf and orf b did not affect ifnar levels or stat expression or activation ( fig. b-d) . this agrees with recent reports , , suggesting that orf instead prevents trafficking of transcription factors. in the presence of nsp and to a lesser extent nsp , endogenous levels of ifnar are prominently reduced (fig. b, c) . consequently, phosphorylation of stat was decreased upon nsp co-expression (fig. b, d) .
lipidated (lc b-ii) to decorate autophagosomal membranes , . upon fusion of autophagosomes with lysosomes, the autophagic receptor p is degraded (autophagy turnover, fig. e ). we analyzed the effect of the top autophagy modulating sars-cov- proteins: nsp , orf a, e, m and orf a (fig. d) on autophagy markers. levels of beclin- and ulk , which are parts of the core machinery of autophagy initiation, remained constant (fig. f, supplementary fig. d and e) . overexpression of nsp leads to a very slight decrease of lc b-ii but accumulation of p , suggesting that nsp blocks induction of autophagy . in line with this, the number of gfp-lc b-puncta (=autophagosomes) per cell in hela-gfp-lc b cells is reduced upon nsp expression to almost ( fig. i, j) . in the presence of orf a, e and orf a, the levels of processed lc b (lc b-ii) were - to -fold increased (fig. g) , and p levels are approximately . -fold increased (fig. h ). this indicates that these three viral proteins block autophagic turnover. consequently, the number of autophagosomes is -fold increased upon orf a, e, m or orf a expression (fig. i, j) . curiously, while accumulation of lc b-ii indicates that m blocks autophagic turnover or induces autophagy, the levels of p are not significantly altered in the presence of m (fig. f, h) . notably, overexpression of m resulted in an accumulation of lc b in the perinuclear space, whereas for all other viral proteins autophagosomes are normally distributed (fig. i, j) . taken together, our data demonstrate that sars-cov- synergistically targets type-i ifn signaling and autophagy. the major type i ifn antagonists nsp , nsp , nsp , orf or orf b block the signaling cascade at different levels. e, orf a and orf a use a similar mechanism to block autophagic turnover, while m may have evolved a different mechanism and nsp inhibits de novo autophagy induction. our data showed that orf a and orf a are the most potent autophagy antagonists of sars-cov- ( fig. d, fig. f-j) .
to determine their molecular mechanism(s), we performed proteome analysis of hek t cells overexpressing sars-cov- orf a and orf a ( supplementary fig. a) . as a control, we used s, nsp and nsp overexpressing cells which show little to no effect on autophagy (fig. d ). in addition, we analyzed the proteome of caco- cells infected with sars-cov- for or table ) . analysis of the data revealed that in the presence of nsp , cellular proteins with a short half-life are markedly reduced (supplementary fig. f) . this supports our previous finding that nsp globally blocks translation and confirms the validity of the proteome analysis. panther-assisted gene ontology analysis of the proteins regulated more than - fold by the overexpression of individual sars-cov- proteins revealed that orf a and orf a target the late endosome pathway (go: ) (fig. c, supplementary table ) . a similar analysis for the sars-cov- samples showed that the late endosome pathway is also affected during the genuine infection. thus, we had a closer look at the subcellular localization of orf a and orf a and their effect on intracellular vesicles. in line with the proteome analysis, orf a and orf a both localized to the late endosomal compartment, co-localizing with the marker rab (fig. d ,e). in contrast, localization to rab a-positive early endosomes was not apparent ( supplementary fig. g ). disturbance of the integrity of the trans-golgi network (tgn) at the interface with the late endosomes , by viral proteins is a well-known strategy to block autophagy . immunofluorescence analysis revealed that the localization of orf a or orf a partially overlaps with a tgn marker (r = . , fig. g) indicating close proximity. orf , which is known to localize to the golgi apparatus, was used as a positive control (r= . ). nsp , which displayed a cytoplasmic localization, was used as a negative control (r= . ).
importantly, analysis of free tgn-marker positive vesicles in sars-cov- orf a or orf a expressing cells revealed that both viral proteins cause significant fragmentation of the tgn (fig. f, h). these data indicate that both orf a and orf a disturb the proteome at the late endosomes eventually causing the tgn to fragment, which leads to a block of autophagic turnover [ ] [ ] [ ] [ ] . to examine the conservation of innate immune antagonism, we functionally compared nsp , nsp , nsp , nsp , m, n, orf a, orf and orf a of sars-cov- , the closest related cov, cov and the previous highly pathogenic sars-cov- . ratg -cov was isolated from the measles virus v protein and trim expression served as positive controls. overall, proteins of sars- cov- and ratg behave similarly to their sars-cov- counterparts, suggesting that many functions are conserved. importantly, however, this is not the case for nsp , nsp and to a lesser extent orf ( fig. a-c) . sars-cov- orf is about -fold less potent in antagonizing type i ifn signaling (fig. b) but induces higher levels of autophagy (fig. c) . however, expression levels of sars-cov- orf were also higher than that of its sars-cov- and ratg counterparts ( supplementary fig. g) , which may explain the differences in activity. differences between sars-cov, ratg and sars- cov- nsp were reanalyzed in a dose-dependent manner, and were only in the range of - -fold, which may also be explained by differential expression (supplementary fig. j) . the most striking, statistically significant difference was observed for nsp . sars-cov- nsp is over -fold more potent in suppression of type i ifn induction and signaling than ratg and sars- cov- nsp (fig. a, b) . notably, expression levels of sars-cov- , ratg and sars-cov- nsp are similar, with sars-cov- nsp even being slightly less expressed ( supplementary fig. c). notably, all nsp variants still inhibit autophagy (fig. c) .
analysis of the dose-dependent effect of sars-cov- nsp , ratg -cov nsp and sars-cov- nsp on type i ifn induction (fig. d) and signaling (fig. e) showed that on average sars-cov nsp performed -fold worse than sars-cov- nsp , and ratg nsp inhibited type i ifn induction . -fold less (fig. d). similarly, sars-cov- nsp outperformed ratg and sars-cov- nsp by - and . -fold, respectively, in inhibition of type i ifn signaling (fig. e). taken together, these data indicate that while most ifn antagonist activities are conserved between sars-cov, ratg and sars-cov- , there is an exception: nsp of sars-cov- is considerably more potent than sars-cov- nsp in counteracting both ifn-β induction and signaling.

inefficient antagonism by sars-cov- proteins is predictive for efficient immune control

immune activation, albeit with different efficiency. the mean inhibition of ifn-γ and ifn-λ signaling was % and %, respectively, compared to type i ifn signaling with a mean inhibition of only % for ifn-α and % for ifn-β. consequently, we assessed whether ifn-α, ifn-β, ifn-γ and ifn-λ have a different impact on sars-cov- (fig. a, supplementary fig. a, b). treatment with the type i ifn-α was the least efficient. in contrast, at the same concentration ifn-γ ( u/ml) reduced viral rna in the supernatant almost -fold more efficiently. all agents caused little if any cytotoxic effects (supplementary fig. c). altogether, we observed a good correlation (r = . ) between the average inhibition of the respective signaling pathway (fig. c) antagonized by the sars-cov- proteins and ifn susceptibility at u/ml (fig. b). in contrast to type ii and iii ifn signaling, autophagic turnover was strongly repressed by at least four sars-cov- proteins (fig. c and fig. ). thus, based on our inhibition data (fig. c) we would expect that modulation of autophagy only weakly affects sars-cov- replication.
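the correlation reported above between pathway inhibition and ifn susceptibility can be illustrated with a minimal sketch of the pearson coefficient. the numeric values below are hypothetical placeholders (the figures in this text are not recoverable), and the choice of "viral rna remaining" as the susceptibility readout is an assumption made only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT taken from the paper:
# mean inhibition of each signaling pathway by the viral proteins (%), and
# viral RNA remaining after treatment with the matching IFN (% of untreated).
pathways = ["ifn-alpha", "ifn-beta", "ifn-gamma", "ifn-lambda"]
mean_inhibition = [70.0, 65.0, 20.0, 30.0]
remaining_viral_rna = [60.0, 40.0, 5.0, 15.0]

r = pearson_r(mean_inhibition, remaining_viral_rna)
print(f"r = {r:.2f}")
```

with only four pathways, such a coefficient is descriptive rather than a basis for statistical inference.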
indeed, treatment with rapamycin, which induces autophagy, reduced viral replication by a maximum of - -fold (supplementary fig. e). bafilomycin a , which blocks autophagy, had little to no effect (supplementary fig. e). both drugs were used at concentrations that only marginally affected cell survival (supplementary fig. f). thus, our results indicate that the overall efficiency of sars-cov- proteins in counteracting a specific signaling pathway is predictive for the overall antiviral potency of that pathway, as illustrated by the different types of ifns. ifn therapy is commonly associated with significant adverse effects due to induction of inflammation. to minimize detrimental pro-inflammatory effects of ifns, the doses required for efficient viral restriction should be reduced. thus, we analyzed the impact of the most potent ifns, ifn-γ and ifn-λ, and their combination on sars-cov- . to mimic prophylactic and therapeutic treatment, we examined pre-treatment for h before infection with sars-cov- and treatment h post-infection. overall, the treatment but consistent (fig. c, d). expression analysis of sars-cov- s and n confirmed the qpcr results, and equal gapdh levels exclude effects on viral replication by cytotoxicity (fig. d). while treatment with a single dose of ifn-γ or ifn-λ alone reduced viral rna production - -fold, the combinatorial treatment at the same concentration potentiated the effect to about -fold reduction in sars-cov- rna (fig. c). to further decrease inflammatory side-effects of ifn treatment, anti-inflammatory pathways like autophagy could be induced. treatment with rapamycin, which induces autophagy, already reduces viral replication ~ - -fold at nm (fig. c). treatment with rapamycin ( nm) in combination with either ifn-γ or ifn-λ was found to be additive (fig. c, d).
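one common way to judge whether a combination is more than the sum of its parts is the bliss independence model; the text does not state which reference model was used, so the following is a sketch under that assumption. all fold reductions are hypothetical, and the tolerance margin is an arbitrary illustrative choice rather than a statistical test.

```python
def bliss_expected_fold(single_folds):
    """Expected fold reduction if the agents act independently (Bliss model).

    A k-fold reduction leaves a fraction 1/k of the viral RNA. Under
    independence the remaining fractions multiply, so the folds multiply too.
    """
    expected = 1.0
    for fold in single_folds:
        if fold <= 0:
            raise ValueError("fold reductions must be positive")
        expected *= fold
    return expected


def classify_combination(single_folds, observed_fold, tolerance=0.2):
    """Label a combination relative to the Bliss expectation.

    tolerance is an arbitrary illustrative margin, not a statistical test.
    """
    ratio = observed_fold / bliss_expected_fold(single_folds)
    if ratio > 1 + tolerance:
        return "synergistic"
    if ratio < 1 - tolerance:
        return "antagonistic"
    return "independent"


# Hypothetical fold reductions, NOT taken from the paper.
single_agent_folds = [2.0, 5.0]   # agent A alone, agent B alone
observed_combination = 30.0       # A and B together

print(bliss_expected_fold(single_agent_folds))
print(classify_combination(single_agent_folds, observed_combination))
```

other reference models (for example loewe additivity) can give a different verdict for the same data, so the label depends on the model chosen.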
triple treatment with ifn-γ, ifn-λ and rapamycin showed the most potent anti-viral effect of all combinations for pre-treatment and post-treatment, reducing viral rna in the supernatant by -fold and -fold, respectively (fig. c). in summary, our data show that the anti-sars-cov- effects of combinatorial treatment with ifn-γ and ifn-λ are synergistic. additional activation of anti-inflammatory autophagy by rapamycin further decreased sars-cov- replication. this suggests that concerted activation of innate immunity may be an effective anti-viral approach, exploiting vulnerabilities of sars-cov- revealed by analysis of its innate immune antagonism. viruses drastically alter our innate immune defenses to establish an infection and propagate to the next host. our data reveal the extent of immune manipulation sars-cov- employs. we determined the major antagonists of type i ifn induction and signaling as well as pro-inflammatory nf-κb activity encoded by sars-cov- (nsp , nsp , nsp , nsp , orf and orf b). type ii and iii ifn signaling is targeted by a similar set of proteins, although much less efficiently. autophagy is mainly targeted by nsp , orf a, e, m and orf a. inflammasome activity is very weakly induced by nsp , nsp and orf c, but none of the sars-cov- proteins block formation of the nlr c inflammasome. subsequent mechanistic studies revealed that sars-cov- proteins synergistically nsp lowers the cellular levels of the ifn receptor, ifnar, thus blocking activation of the crucial transcription factors stat and stat . both orf a and orf a cause fragmentation of the tgn by disturbing the late endosomal pathway. this is a common strategy of viruses to block autophagic turnover. examination of the functional conservation showed that sars-cov- nsp was less efficient in blocking innate immune activation, both type i ifn induction and signaling, than sars-cov- nsp .
this may ultimately cause sars-cov- to be better controlled by the innate immune system than sars-cov- , explaining higher numbers of subclinical infections and thus overall lower mortality rates of the current pandemic cov. overall, the combined analysis of ifn antagonism allowed us to deduce that treatment with type i ifns and regulation of autophagy are only weakly anti-viral. in contrast, treatment with ifn-γ and ifn-λ drastically reduced sars-cov- replication. finally, combinatorial treatment of sars-cov- with these two ifns potentiated the effects of the individual treatments. this may pave the way for future anti-viral therapies against sars-cov- based on rational innate immune activation. why would multiple effective proteins target the same pathway? for example, type i ifn signaling could have been shut down by nsp , nsp , nsp , nsp , orf and orf b alone, each reducing the activation of the innate immune pathways to below %. however, our assays revealed (figs. - ) that the targeting mechanisms are often not redundant and may act synergistically. this could allow the virus to better control the targeted pathway, thus minimizing the effect of the signaling on its replication. in addition, a viral protein mainly targeting one pathway may affect other connected immune pathways at once. for example, disturbance of the activation of the kinase tbk may primarily affect ifn induction and, to a lesser extent, also impact autophagy. proteome analyses revealed the late endosome/golgi network as a target of orf a and orf a. our data suggest that both orf a and orf a of sars-cov- cause fragmentation of the golgi apparatus and thus blockage of autophagy. sars-cov- orf a was previously implicated in golgi fragmentation; thus, our data suggest that sars-cov- orf a uses a similar strategy.
notably, fragmentation of the golgi is for example triggered by hepatitis c virus viruses to block anti-viral autophagic turnover and thus may represent a common studies will see more mechanistic data to explain the molecular details of the impact of sars-cov- proteins on innate immune activation. notably, several proteins including orf , orf a, orf a, m and e accumulate at the golgi network or in perinuclear spaces, alluding to the emerging role of the golgi as a hub for immune manipulation , . our results demonstrate that orf , orf a, orf a and orf b are the strongest innate immune antagonists among the accessory genes of sars-cov- (fig. ) . besides the accessory genes, which classically encode immune antagonists, a surprising number of non-structural proteins manipulate innate immunity. nsp , which targets cellular translation and thus broadly inhibits any response enzymatic functions may impact their activity against innate immunity. except for nsp , as its activity as a de-isglase may inactivate the transcription factor irf and thus reduce ifn induction . according to our analysis the structural proteins e and m strongly manipulated autophagy (fig. d ). this suggests that the incoming virion may already block autophagic turnover to prevent their own degradation by however, while we may pick up most counteraction strategies, our screening approach may miss immune evasion strategies employed by sars-cov- . for example, many non-structural proteins form complexes, that are not formed during single overexpression and may only be functional as a full assembly. evasion mechanisms based on rna structures and sequences are lost due to usage of codon- optimized expression plasmids. finally, the virus itself may employ strategies to hide itself from recognition, thus not activating innate immune defenses in the first place. one example is the capping would be immediately recognized by the cytoplasmic sensor rig-i. 
our analyses further revealed that the human innate immune antagonism is largely conserved in ratg , the bat isolate most closely related to sars-cov- (fig. ). this indicates that the bat virus is capable of counteracting the human immune defenses, which may have facilitated successful zoonotic transmission from bats eventually to humans. currently, the intermediate animal host of sars-cov- is under debate; however, it is likely that the virus isolated from it is even more closely related to sars-cov- than ratg . thus, any immune evasion mechanism conserved between sars-cov- and ratg is likely to be conserved in the direct progenitor virus of sars-cov- . the previous epidemic and related human sars-cov- and the current pandemic sars-cov- differ in susceptibility towards ifns, with sars-cov- being more resistant. furthermore, infection with sars-cov- is often asymptomatic and likely controlled by the host, as lower mortality rates and higher numbers of subclinical infections suggest. paradoxically, this may support the fast spread of the virus. thus, sars-cov- may have found the 'perfect' balance: intermediate immune evasion and thus intermediate pathogenicity that supports spread but does not kill the host. our data show that sars-cov- nsp is strikingly less efficient in ifn evasion than nsp of sars-cov. these data are the first mechanistic evidence for why sars-cov- is less susceptible towards ifn treatment than sars-cov- . it may be tempting to speculate that common cold covs counteract the innate immune system less efficiently than sars-cov- . our analysis indicates that during a sars-cov- infection fewer cytokines than expected are released, autophagic turnover is blocked and general immune activation is perturbed. this is supported by a large amount of data from covid patients. however, an important question remains: why are some innate immune pathways, such as ifn-γ signaling, less antagonized (fig. )?
are the viral immune manipulation strategies ineffective? indeed, ifn-γ is most active against sars-cov- among the ifns (fig. ). one possible explanation would be that there was no need for the virus to antagonize them. indeed, in covid patients and in vitro infections with sars-cov- , ifn-γ levels are surprisingly low. furthermore, despite high ifn-γ levels being a hallmark of cytokine storms, ifn-γ expression in cd + t cells is associated with severe covid. it is tempting to speculate that t cells which confer pre-existing immunity against sars-cov- could, upon activation, release ifn-γ, whose innate immune signaling may also contribute to increased clearance of the infection. strikingly, our work thus shows that analysis of the innate antagonism may be predictive for therapeutic opportunities. severe side effects are prevalent for treatments with ifns. however, the side effects are dose-dependent. thus, minimizing the dose required for treatment is paramount. our data indicate that the effects of treatment with multiple ifns are not merely additive but synergistic, with the ifns potentiating each other (fig. ). thus, a promising anti-viral approach may be a combinatorial treatment with different cytokines, effectively also reducing the burden of side effects. the side effects of ifn therapy are mainly caused by inflammation. combined with anti-inflammatory approaches such as autophagy activation by rapamycin, this approach may be even more successful, as our in vitro data suggest. future studies are highly warranted to study rational, concerted innate immune activation against sars-cov- in vivo. these studies may eventually pave the way for novel therapies, which may work not only against sars-cov- but also against other pathogenic viruses, potentially including future covs. in summary, our results reveal the extent of innate immune manipulation by sars-cov- .
comparison to sars-cov- revealed that mutations in nsp may be responsible for the higher susceptibility of sars-cov- against ifns. finally, our data allowed us to deduce a potent immune activation strategy against sars-cov- : combinatorial application of ifn-γ and ifn-λ. ratg , and sars-cov- were synthesized by twist bioscience and subcloned into the pcg vector using restriction cloning using the restriction enzymes xbai and mlui (new england biolabs). firefly luciferase reporter constructs, harboring binding sites for nf-κb or irf , isre or gas sites, or the genomic promoter of ifna or ifnb in front of the reporter were

key: cord- -sqvfr q authors: messina, francesco; giombini, emanuela; montaldo, chiara; sharma, ashish arunkumar; piacentini, mauro; zoccoli, antonio; sekaly, rafick-pierre; locatelli, franco; zumla, alimuddin; maeurer, markus; capobianchi, maria r.; lauria, francesco nicola; ippolito, giuseppe title: looking for pathways related to covid- phenotypes: confirmation of pathogenic mechanisms by sars-cov- - host interactome date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: sqvfr q in the last months, many studies have clearly described several mechanisms of sars-cov- infection at cell and tissue level. host conditions and comorbidities were identified as risk factors for severe and fatal disease courses, but the mechanisms of interaction between host and sars-cov- determining the grade of covid- severity, are still unknown. we provide a network analysis on protein–protein interactions (ppi) between viral and host proteins to better identify host biological responses, induced by both whole proteome of sars-cov- and specific viral proteins.
a host-virus interactome was inferred from published ppi, using an explorative algorithm (random walk with restart) triggered by all the proteins of sars-cov- , or by each single viral protein one-by-one. the functional analysis for all proteins, linked to many aspects of covid- pathogenesis, allows identification of the subcellular districts where sars-cov- proteins seem to be distributed, while in each interactome built around one single viral protein a different response was described, underlining that orf and orf a modulate cardiovascular diseases and pro-inflammatory pathways, respectively. finally, an explorative network-based approach was applied to the bradykinin storm, highlighting a possible direct action of orf a and ns b in enhancing this condition. this network-based model for sars-cov- infection could be a framework for pathogenic evaluation of specific clinical outcomes. we identified possible host responses induced by specific proteins of sars-cov- , underlining the important role of specific viral accessory proteins in pathogenic phenotypes of severe covid- patients. whilst covid- predominantly affects the respiratory system, it is a multisystem disease, with a wide spectrum of clinical presentations from asymptomatic, mild and moderate, to severe, fulminant disease. host conditions and comorbidities such as advanced age, obesity, diabetes, hypertension, organ damage, inflammation and coagulation dysfunction were identified as risk factors for severe and fatal disease courses, but the mechanisms of interaction between host and sars-cov- that activate pathological pathways and determine severity are still unknown. in recent months, many studies have clearly described several mechanisms of sars-cov- infection at cell and tissue level.
it was observed that the replication of sars-cov- , like that of all +rna viruses, occurs in the cytoplasm of the host cell, inducing a rearrangement of rough endoplasmic reticulum (er) membranes into double-membrane vesicles. moreover, nsp along with nsp and nsp , which yield the rna polymerase activity of nsp , are assembled into the replicase-transcriptase complex, which begins to generate anti-sense (-) genomic rna molecules, templates for the positive-sense genome (+) and mrna transcripts. during virus entry, a well-known process is the cleavage of the s protein by furin on the cell membrane, which splits the s protein into two subunits, s and s , the latter of which can interact with ace . however, the sars-cov- -host interaction is not restricted to local infection, but triggers a systemic reaction, including the activation of the bradykinin storm, as described in many severe covid- patients. indeed, sars-cov- infection causes on the one hand a decrease of ace level in the lung cells and, on the other hand, an increase of ace level, leading to an increased bradykinin level. furthermore, microvascular injuries, due to systemic inflammatory response and endothelial dysfunction, were frequently found in severe covid- patients. increased circulating d-dimer concentrations, reflecting pulmonary vascular bed thrombosis with fibrinolysis, and elevated cardiac enzyme concentrations, linked to emergent ventricular stress induced by pulmonary hypertension, were associated with early features of severe pulmonary intravascular coagulopathy related to covid- . moreover, myocardial injuries were found to be linked to the risk of fatal outcome in covid- patients. myocardial injury was observed in % of severe patients ( % of all covid- patients), and severe patients had a . -fold higher risk of myocardial injury than non-severe patients.
although microvascular injuries and microthrombi formation frequently occurred in severe covid- patients, the role of sars-cov- in this phenotype is not completely explained yet. recently, the results of a meta-analysis of interleukin- concentrations in patients with severe or critical covid- pointed out that the systemic inflammatory profile of covid- is distinct from that of non-covid- ards, sepsis, and car t cell-induced cytokine release syndrome, and that it might be involved in organ dysfunction in covid- , such as endovasculitis, direct viral injury and viral-induced immunosuppression. although many aspects of covid- pathogenesis and mechanisms of sars-cov- have been investigated, only a few papers describe the interactions among sars-cov- proteins by wet-lab experiments. physical associations in human cells between sars-cov- proteins and human proteins were found by affinity-purification mass spectrometry (ap-ms), identifying high-confidence sars-cov- -human protein-protein interactions (ppis) and highlighting an activation of innate immune pathways. then, the influence of sars-cov- on the transcriptome, proteome, ubiquitinome and phosphoproteome of a lung-derived human cell line was described through a multi-omic approach. such a multilevel representation highlighted the regulation of autophagy mechanisms by orf a and nsp , the modulation of innate immunity by m, orf a and ns b, and the perturbation of integrin-tgf-β-egfr-rtk signalling by orf of sars-cov- . computational virus-host interactome approaches have been applied to covid- for drug repurposing, allowing the identification of new drug targets and contributing to explaining clinical manifestations. the structural information on sars-cov- proteins and their interactions with human proteins and other viral proteins allowed a better understanding of the mechanisms of sars-cov- infection, also in comparison with sars-cov.
interactomes based on ppi and gene expression data have also been applied to uncover the molecular origins of phenotypes of other complex diseases, in order to better define the mechanisms of covid- pathogenesis. understanding all mechanisms of sars-cov- infection also requires an overall visualization of the biological reactions and pathways involved in covid- and h-cov infections. this can be carried out on a disease map, such as the covid- disease maps, containing many diagrams about the host molecular response during the infection. in this context, defining the host response induced by specific viral proteins would be of great importance and can guide the identification of functional viral targets, helping to better define the pathologic phenotypes of the infection. here, we carry out a network analysis on ppi to better identify the host biological response induced by sars-cov- . furthermore, the interactome analysis was applied to design a network of sars-cov- -host proteins that could lead to a bradykinin storm. we used three different applications to identify interacting proteins: proteins that interact closely with sars-cov- proteins, protein interactomes around each sars-cov- protein, and proteins associated with kng , the precursor of bradykinin (bk). in all three models, pathways associated with the proteins were identified, including dna damage/repair, tgf-β signalling and complement cascades (figure ). to define the massive effect of sars-cov- infection on the host cell, an interactome between the entire set of sars-cov- proteins and the host was built. observing the distribution of colours, it is possible to distinguish three different areas, corresponding to as many subcellular districts, where sars-cov- proteins seem to be distributed (figure a).
in fact, n, nsp , nsp and nsp are positioned around nuclear proteins, while s, along with nsp , nsp , nsp , nsp , nsp , nsp , orf a and orf b, were among cytosolic and membrane proteins, at the border of the interactome. protein m and many accessory and non-structural proteins (ns b, nsp , nsp , orf a, orf a, orf , orf and orf ) are around mitochondrial and endoplasmic proteins. high values of betweenness centrality and degree were found for the following host proteins: two multi-organelle proteins, amyloid-beta precursor protein (app) and dual-specificity protein phosphatase (pten); the membrane protein sodium/potassium-transporting atpase subunit alpha- (atp a ); two cytoplasmic proteins, - - protein theta (ywhaq) and ubiquitin (ubc); two proteins of the endoplasmic reticulum and golgi apparatus, sarcoplasmic/endoplasmic reticulum calcium atpase (atp a ) and unconventional myosin-vi (myo ); the mitochondrial atp synthase subunit alpha (atp f a); and one nuclear receptor for export of rnas, exportin- (xpo ), suggesting their main role in this infection (s- figure , s- table ). in order to further dissect the interactions with the entire proteome of sars-cov- , enrichment analysis was carried out with the wikipathways and kyoto encyclopaedia of genes and genomes (kegg) databases. wikipathways gene enrichment analysis revealed biological pathways of dna replication, ubiquitination and proteasome, with high significance (fdr < . %). in addition, kegg pathway enrichment analysis revealed dna and rna replication pathways as the most significant pathways (fdr < . %), as well as signalling pathways and viral infection pathways (figure b). to define the effect of specific viral proteins on the host response, interactomes were built by choosing one specific viral protein as seed and restricting the search to the closest proteins.
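the pathway enrichment step reported above (pathways ranked by fdr) can be sketched as an over-representation test. this section does not name the statistical test or the tool that was used, so the hypergeometric test with benjamini-hochberg correction below is an assumption; all gene identifiers, pathway names and the background size are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (upper tail, inclusive)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(interactome, pathways, background_size):
    """Return (pathway, p-value, FDR) tuples sorted by FDR.

    Assumes every gene in the interactome and in the pathways belongs to a
    background of background_size genes.
    """
    genes = set(interactome)
    names = sorted(pathways)
    pvals = []
    for name in names:
        members = set(pathways[name])
        overlap = len(genes & members)
        pvals.append(hypergeom_sf(overlap, background_size,
                                  len(members), len(genes)))
    fdrs = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, fdrs), key=lambda t: t[2])


# Hypothetical toy data, NOT taken from the paper.
toy_interactome = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
toy_pathways = {
    "pathway_A": ["G1", "G2", "G3", "G4", "G50"],
    "pathway_B": ["G8", "G60", "G61", "G62", "G63"],
}
results = enrich(toy_interactome, toy_pathways, background_size=100)
for name, p, fdr in results:
    print(name, f"p={p:.3g}", f"fdr={fdr:.3g}")
```

the choice of background (all annotated genes versus only those present in the ppi network) strongly affects the p-values and would need to match the databases used.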
these analyses produced restricted interactomes, which defined the closest biological interactions associated with the seed proteins, both among human proteins and among other viral proteins. in s- figure , the kegg gene enrichment analysis on the interactomes of ns b, orf a, orf a and orf showed highly significant pathway clusters consistent with possible pathogenic mechanisms, such as the activation of the complement and coagulation cascades and the tgf-β-dominated immune response. in particular, the ns b interactome revealed snare interactions in the vesicular transport pathway, describing the mechanisms of intracellular vesicle trafficking and secretion, as the most significant pathway among all interactomes (fdr < . %). the orf a interactome revealed a cluster of pathways, composed by

as the virus-host interactome has been efficient in describing the response to specific viral proteins, this method was applied to reconstruct the possible involvement of sars-cov- proteins in triggering the bradykinin storm during the infection. in this case, kininogen- (kng ), the precursor of bradykinin produced by proteolytic cleavage, was considered as seed for rwr, imposing closest proteins as the limit to stop the algorithm. the resulting interactome showed ns b, orf a, orf and s as proximal to the seed protein, suggesting their role in the modulation of biological processes around kininogen- . indeed, it would provide a direct influence of

in this study, we built a functional interactome between sars-cov- and the human proteome through the rwr algorithm, to identify biological mechanisms and cell responses during sars-cov- infection and to propose a simulated model of infection contributing to a better understanding of covid- pathogenesis. districts. the close resemblance of our model to real in vivo mechanisms strengthens the representative power of the interactome and of the explorative algorithm to simulate biological processes.
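the seed-based exploration described above can be sketched as a random walk with restart on an undirected ppi graph, keeping the k nodes with the highest steady-state score. this is a minimal illustration rather than the authors' implementation: the restart probability, the convergence tolerance, the value of k and the toy network are all assumptions, and the node names are placeholders.

```python
def build_graph(edges):
    """Undirected adjacency map from a list of (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that, at each step,
    jumps back to the seeds with probability `restart`."""
    nodes = sorted(adjacency)
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            moving = (1.0 - restart) * p[u]
            neighbours = adjacency[u]
            if neighbours:
                share = moving / len(neighbours)
                for v in neighbours:
                    nxt[v] += share
            else:
                # Isolated node: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += moving * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


def closest_proteins(adjacency, seeds, k, **kwargs):
    """The k non-seed nodes with the highest RWR score."""
    scores = random_walk_with_restart(adjacency, seeds, **kwargs)
    ranked = sorted((n for n in scores if n not in seeds),
                    key=lambda n: (-scores[n], n))
    return ranked[:k]


# Hypothetical toy network, NOT a real interactome.
toy_edges = [("SEED", "A"), ("SEED", "B"), ("A", "B"), ("B", "C"), ("C", "D")]
graph = build_graph(toy_edges)
scores = random_walk_with_restart(graph, ["SEED"])
print(closest_proteins(graph, ["SEED"], k=2))
```

passing several seeds at once corresponds to triggering the walk from all viral proteins, while a single seed gives the restricted interactome around one protein; a higher restart probability keeps the walk closer to the seeds.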
moreover, the proximity within the network of specific viral proteins, such as nsp , nsp , nsp , nsp , nsp , nsp , nsp , which are involved in viral rna replication, can be a useful way to investigate their possible function in viral infection. the closeness among s, nsp , nsp , orf b in the network is difficult to interpret, but s, nsp and orf b, along with n, elicited significant igg antibody responses in sera of covid- patients ( ) . in sars-cov infected cell culture, the location of nsp in the cytoplasm and to some extent in the nucleus, and of orf a, orf b, orf and m in the er, seems consistent with their position in the network among nuclear and cytoplasmic proteins and among er proteins, respectively ( ). the proteasome and ubiquitination pathways, along with rna replication, represent the principal pathways activated for assembly and replication of sars-cov- ( ) . in fact, strong increases in rna-modifying proteins were revealed in cell culture after infection with sars-cov- ( ) . the ubiquitin proteasome system degrades viral proteins to control the infection, but the virus can exploit it for its own propagation ( ) . the interactomes built around single sars-cov- proteins allowed us to infer effects on the cell involving specific pathways. the vesicular transport mechanism mediated by snare interactions, identified for ns b, is used by pathogens, and in particular by sars-cov, to penetrate host cells through their membranes ( , ) . to study the pathogenesis of covid- in a systemic context, the bradykinin storm was investigated with the interactome approach. we showed ns b and orf a interacting with ece , which can inactivate bk ( ) . patients and reinforce the role of orf a to enhance the bradykinin dysregulation. these host-virus interactions could improve viral fitness during the infection, as suggested by the interaction already observed in sars-cov between ns b and ece ( ). 
bradykinin storm in covid- : the blocking of ece activity through interaction with ns b and especially orf a could reduce the activation of the vasoconstrictor endothelin- and amplify the vasodilatory effect of bk. the ece gene is expressed at much higher levels than ace and ace in all tissues, especially in the lungs, where ece expression is fold and . fold that of ace and ace, respectively ( ) . like the angiotensin peptides, bk is produced from an inactive pre-protein, kininogen- , through activation by the serine protease kallikrein ( ) . furthermore, an excess of bk can lead to vasodilatation, hypotension and hypokalaemia ( , ) , which is associated with arrhythmia ( , ) . all these clinical conditions have been widely reported in covid- patients ( , , ) . it is also notable that many of the other symptoms reported for covid- (myalgia, fatigue, nausea, vomiting, diarrhoea, anorexia, headaches, decreased cognitive function) are remarkably similar to those of other hyper-bk conditions that lead to vascular hyperpermeabilization ( ) . moreover, the activation of bk is strongly linked to coagulation: active f activates plasma kallikrein (pk), which liberates bradykinin by cleavage ( ) . the direct interactions between there are many experimental platforms for deriving such physical interactions, such as affinity purification mass-spectrometry (ap-ms) and yeast-two-hybrid (y h), which enable the accurate identification of interactions but require a relatively long time. the scenario reported in this study is based on the limited experimental data available in public databases and could differ from the real phenotypes of covid- patients. the pathway analysis did not consider tissue and cell type diversity. finally, the low threshold established for the number of nodes found by rwr ( ) limited the reconstruction of entire pathways. however, this was a software-imposed threshold. 
although this network-based approach showed great potential in identifying mechanisms not yet observed, experimental tests will be necessary to confirm what we have described. we developed a network-based model of sars-cov- infection, which could serve as a framework for the pathogenic evaluation of specific clinical outcomes. here, the ppi interactomes were used to identify the mechanistic processes that the viral molecular machinery relies on during sars-cov- infection for survival and replication within the host. with this knowledge, protein interactions crucial for pathogenesis could be discerned. we identified different host responses induced by specific proteins of sars-cov- , underlining the important role of orf a and orf in the phenotypes of severe covid- patients. the interactome approach, applied to the identification of biological reactions around kininogen- , revealed interactions of ns b and orf a with ece , which might play a role in enhancing the bradykinin storm. this network-based model of sars-cov- -host interaction could guide the development of novel treatments against specific viral proteins, such as monoclonal antibodies. the virus-host interactome was made by merging sars-cov- -host protein-protein interaction (ppi) data from intact ( ) with data from human ppi databases, such as biogrid, innatedb-all, imex, intact, matrixdb, mbinfo, mint, reactome, reactome-fis, uniprot, virhostnet, biodata, obtained with the r packages psicquic and biomart ( , ) . for the experimentally obtained sars-cov- -host interactions, protein interactors and interactions were downloaded from intact in september . the following sars-cov- proteins were used for all the analyses: e, m, n, s, nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , orf a, orf a, orf , orf a, ns b, orf , orf b, orf , orf . 
to obtain functional information about ppi ex vitro, to simulate biological reactions in infected cells and to explore the cell response to viral infection, we employed a random walk with restart (rwr) algorithm, a state-of-the-art guilt-by-association approach ( ) . this algorithm establishes a proximity network around a given protein (the seed) in order to study its functions, based on the premise that nodes related to similar functions tend to lie close to each other in the network. for this study, two types of interactome were built: the whole sars-cov- -host interactome and one interactome per single sars-cov- protein. in the first run, every protein of the sars-cov- proteome was used as a seed for random walk with restart, with a limit of closest host proteins for each sars-cov- protein. this analysis produced the sars-cov- -host interactome, in which the different colours correspond to the localization of the proteins within the cell. for the second kind of interactome, one sars-cov- protein at a time was used as the seed, with the limit lowered to closest proteins in order to define the induced biological response as precisely as possible. for each node, a score was computed as a measure of proximity to the seed protein ( ) . in total, a large ppi interaction database was assembled, including nodes and interactions. graphical representations of the networks were produced with gephi . . ( ) . to identify hub proteins in the sars-cov- -host interactome, the values of betweenness centrality and degree were plotted. the betweenness centrality score measures how often a specific node lies on the shortest paths between other nodes, so that a high-scoring node can be considered a hub, while the degree of a node corresponds to its number of connections. 
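The random walk with restart described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the authors' implementation (the study cites an R-based workflow): the function names, the restart probability of 0.7, the convergence tolerance and the toy network are all assumptions made for the example.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state proximity scores of every node to the seed set.

    adj     : symmetric adjacency matrix of the PPI network (assumed input)
    seeds   : indices of the seed proteins
    restart : probability of jumping back to a seed (0.7 is an assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes keep an all-zero column
    transition = adj / col_sums            # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol: # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p

def closest_nodes(scores, seeds, k):
    """Return the k non-seed nodes with the highest proximity scores."""
    seed_set = set(seeds)
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order if int(i) not in seed_set][:k]

# Toy network: a chain 0 - 1 - 2 - 3, with node 0 as the seed.
toy_adj = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
toy_scores = random_walk_with_restart(toy_adj, seeds=[0])
toy_top = closest_nodes(toy_scores, seeds=[0], k=2)
```

In the toy chain the scores fall off with distance from the seed, which is the guilt-by-association premise: the cut-off `k` plays the role of the limit on the number of closest proteins retained in each interactome.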
pathways of proteins involved in host response were tested by gene enrichment analysis on kyoto encyclopaedia of genes and genomes (kegg) human pathways and wikipathways databases ( ) . to allow gathering of results for every running, the r package enrichr was used, an r interface to web-based tool 'enrichr' for analysing gene sets ( ) . the enrichr analysis was performed using these statistical parameters: p-value (fisher exact test), q-value (adjusted p-value for false discovery rate, fdr). results for kegg and wikipathways were considered significant with a revised p-value < . . to infer pathways involved in single viral interactome, gene enrichment analyses for each viral interactome were collected along with p-values, as reported in enrichr package output. p-values of every single enrichment analyses were transformed by the function x = − log (p-value) and the % of x values were plotted on heatmap by r package pheatmap ( ) . european centre for disease prevention and control ( ) clinical characteristics of covid- risk factors for covid- severity and fatality: a structured literature review double-membrane vesicles as platforms for viral replication a pneumonia outbreak associated with a new coronavirus of probable bat origin structure of replicating sars-cov- polymerase the sars-coronavirus nsp +nsp complex is a unique multimeric rna polymerase capable of both de novo initiation and primer extension a multibasic cleavage site in the spike protein of sars-cov- is essential for infection of human lung cells sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor a mechanistic model and therapeutic interventions for covid- involving a ras-mediated bradykinin storm myocardial and microvascular injury due to coronavirus disease immune mechanisms of pulmonary intravascular coagulopathy in covid- pneumonia association of cardiac injury with mortality in hospitalized patients with covid- in wuhan, china characteristics 
and clinical significance of myocardial injury in patients with severe coronavirus disease cardiovascular implications of fatal outcomes of patients with coronavirus disease (covid- ) incidence of myocardial injury in coronavirus disease (covid- ): a pooled analysis of , patients from studies cytokine elevation in severe and critical covid- : a rapid systematic review, meta-analysis, and comparison with other inflammatory syndromes a sars-cov- protein interaction map reveals targets for drug repurposing multi-level proteomics reveals host-perturbation strategies of sars-cov- and sars-cov matrix metallopeptidase as a host protein target of chloroquine and melatonin for immunoregulation in covid- : a network-based metaanalysis a network medicine approach to investigation and population-based validation of disease manifestations and drug repurposing for covid- . chemrxiv : the preprint server for chemistry identifying human interactors of sars-cov- proteins and drug targets for covid- using network-based label propagation structural genomics of sars-cov- indicates evolutionary conserved functional regions of viral proteins covid- : viral-host interactome analyzed by network based-approach model to study pathogenesis of sars-cov- infection network medicine: a network-based approach to human disease uncovering disease-disease relationships through the incomplete interactome covid- disease map, building a computational repository of sars-cov- virus-host interaction mechanisms systems medicine disease maps: community-driven comprehensive representation of disease mechanisms covid- disease map, a computational knowledge repository of sars-cov- virus-host interaction mechanisms covid- related coagulopathy: what is known up to now in severe covid- , sars-cov- induces a chronic, tgf-β-dominated adaptive immune response sars-cov- proteome microarray for global profiling of covid- specific igg and igm responses analysis of intraviral protein-protein interactions of the sars 
coronavirus orfeome proteomics of sars-cov- -infected host cells reveals therapy targets pleiotropic roles of the ubiquitin-proteasome system during viral propagation snare motif: a common motif used by pathogens to manipulate membrane fusion crystal structure of severe acute respiratory syndrome coronavirus spike protein fusion core discovery and genomic characterization of a -nucleotide deletion in orf b and orf during the early evolution of sars-cov- effects of a major deletion in the sars-cov- genome on the severity of infection and the inflammatory response: an observational cohort study complement associated microvascular injury and thrombosis in the pathogenesis of severe covid- infection: a report of five cases endothelin-converting enzyme- regulates endosomal sorting of calcitonin receptor-like receptor and beta-arrestins the gtex consortium ( ) the gtex consortium atlas of genetic regulatory effects across human tissues the contact activation and kallikrein/kinin systems: pathophysiologic and physiologic activities bradykinin stimulates renal na(+) and k(+) excretion by inhibiting the k(+) channel (kir . ) in the distal convoluted tubule hypokalemia and sudden cardiac death hypokalemia-induced arrhythmias and heart failure: new insights and implications for therapy. 
frontiers in physiology clinical features of patients infected with novel coronavirus in wuhan clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china kallikrein-kinin blockade in patients with covid- to prevent acute respiratory distress syndrome bradykinin: inflammatory product of the coagulation system kinin b receptors as a therapeutic target for inflammation intact: an open source molecular interaction database psicquic and psiscore: accessing and scoring molecular interactions biomart--biological queries made easy random walk with restart on multiplex and heterogeneous biological networks gephi: an open source software for exploring and manipulating networks wikipathways: a multifaceted pathway database bridging metabolomics to other omics research enrichr: a comprehensive gene set enrichment analysis web server update figures and tables figure . workflow to describe sars-cov- host interactome analysis based on ppi data figure . a) ppi interactome, based on human ppi and sars-cov- -host interactions, with top closest proteins identified by rwr, using together proteins of sars-cov- as seeds for only one rwr run. different colours of node and edges represent different locations in the cell wikipathways gene enrichment analyses for proteins identified by rwr algorithm using together proteins of sars-cov- heatmap reporting top % smallest p values of gene enrichment analysis on list of proteins, obtained by every interactome of single viral seed. the values were reported as -log (p value). a) p values of gene enrichment based on kegg . b) p values of gene enrichment based on wikipathways the nodes in red are human proteins, while the nodes in green are virus proteins. b) kegg human pathway and wikipathways gene enrichment analyses for proteins closest to kng- . 
c) ppi interactome, where proteins belonged to human complement system wp (modes in purple) and complement and coagulation cascades (nodes in blue) pathways were highlighted. the proteins belonged to both pathways were marked in light blue : table with human and viral proteins identified by rwr in all sars-cov- proteins interactome are reported, along with statistical parameters and biological information table of gene enrichment analysis obtained by kegg database all interactions selected by rwr in interactome for single seed viral protein. detailed information about interactors, interaction and database were reported plot betweenness centrality vs. degree of human proteins, retained in all proteins of sars-cov- -host interactome e) as seed. the nodes in red are human proteins interactomes based on human ppi and sars-cov- -host interactions, with top closest proteins identified by rwr, using accessory proteins (orf a, orf a, orf , orf a, ns b, orf , orf b, orf , orf ) as seed. the nodes in red are human proteins s-figure . interactomes based on human ppi and sars-cov- -host interactions, with top closest proteins identified by rwr, using accessory proteins key: cord- -hnrmaytm authors: ventura fernandes, bianca h; feitosa, natália martins; barbosa, ana paula; bomfim, camila gasque; garnique, anali m. b.; gomes, francisco i. f.; nakajima, rafael t.; belo, marco a. a.; eto, silas fernandes; fernandes, dayanne carla; malafaia, guilherme; manrique, wilson g.; conde, gabriel; rosales, roberta r. c.; todeschini, iris; rivero, ilo; llontop, edgar; sgro, german g.; oka, gabriel umaji; bueno, natalia f; ferraris, fausto k.; de magalhaes, mariana t. q.; medeiros, renata j.; gomes, juliana m. m; de souza junqueira, mara; conceição, katia; pontes, letícia g.; condino-neto, antonio; perez, andrea c.; barcellos, leonardo j. g.; correa junior, jose dias; dorlass, erick g.; camara, niels o. 
s; durigon, edison luiz; cunha, fernando q.; nóbrega, rafael h.; machado-santelli, glaucia m.; farah, chuck; veras, flávio p; galindo-villegas, jorge; costa-lotufo, leticia; cunha, thiago m.; chammas, roger; guzzo, cristiane r.; carvalho, luciani r; charlie-silva, ives title: zebrafish studies on the vaccine candidate to covid- , the spike protein: production of antibody and adverse reaction date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hnrmaytm establishing new experimental animal models to assess the safety and immune response to the antigen used in the development of covid- vaccine is an imperative issue. based on the advantages of using zebrafish as a model in research, herein we suggest doing this to test the safety of the putative vaccine candidates and to study immune response against the virus. we produced a recombinant n-terminal fraction of the spike sars-cov- protein and injected it into adult female zebrafish. the specimens generated humoral immunity and passed the antibodies to the eggs. however, they presented adverse reactions and inflammatory responses similar to severe cases of human covid- . the analysis of the structure and function of zebrafish and human angiotensin-converting enzyme , the main human receptor for virus infection, presented remarkable sequence similarities. moreover, bioinformatic analysis predicted protein-protein interaction of the spike sars-cov- fragment and the toll-like receptor pathway. it might help in the choice of future therapeutic pharmaceutical drugs to be studied. based on the in vivo and in silico results presented here, we propose the zebrafish as a model for translational research into the safety of the vaccine and the immune response of the vertebrate organism to the sars-cov- virus. 
the world health organization (who) registered, on january th, , that the outbreak of the disease caused by the severe acute respiratory syndrome coronavirus (sars-cov- ) constituted a public health emergency of international importance (the highest level of alert from the organization) . since then, the number of conclusion. however, the case of covid- meets a new pandemic paradigm and the development of the vaccine has been proposed to be reduced to - years . it is worth mentioning that before vaccine clinical tests begin, several safety protocols must be submitted with in vitro and in vivo experiments on animal models. there is a lack of information regarding the immune response of the organism to sars- cov- , including animal models to study it . although zebrafish do not have lungs as humans do, the present study shows similar inflammatory responses observed in severe cases of covid- patients that could be considered when investigating human responses to the virus. in the global task to develop the vaccine and possible therapeutic approaches for covid- , several animal models have been proposed, such as mice , hace transgenic mice , alpaca , golden syrian hamsters, ferrets, dogs, pigs, chickens, and cats , and species of non-human primates . recently, three reports have described the production of equine neutralizing antibodies against sars-cov- , . a study by deng and collaborators analyzed serum samples from animal species for the detection of specific antibodies against sars-cov- . despite this wide search for candidate animal models, so far only two references promote the zebrafish model on this regard confirming the innovative and pioneer characteristics of our study , . here, female zebrafish individuals injected with a n-terminal fraction of sars- two bioassays were carried out to analyze the toxicity of the rspike. although the immunized fish produced antibodies, the first injection of the rspike generated high toxicity to the fish ( figure ) . 
therefore, the assay was repeated by adding different control groups to confirm that the toxicity findings were specific to the rspike ( figure ). in the first bioassay, after fish immunization with rspike, the survival rate was . % during the first seven days (figure ) . this was significantly lower than in the naive control and in fish injected with protein buffer (control ), where the survival rate was % and %, respectively ( figure ) . nonetheless, after a second immunization, the rspike immunized group maintained the plateau survival rate, with no statistical significance between the groups for the relative risk of death ( figure ) . therefore, a second assay was conducted by adding different control groups in order to confirm that the toxicity findings were specific to the rspike, and related to the in order to verify the occurrence of sublethal effects of the rspike on treated zebrafish, histopathological analysis of different organs, including brain, gonads, heart, kidney, liver and spleen, among others, was performed in the female fish used in the immunization protocol described in material and methods. animals that died during the immunization experiment were excluded from the analysis. in general, several morphological alterations compatible with an ongoing inflammatory process were observed in many tissues. markedly, brain obtained from treated fish showed an intense table . the first experiments aimed to analyze the humoral response with antibody production and used, besides the rspike, the appropriate negative controls as the e. coli table . what can we expect from first-generation covid- vaccines? 
lancet passive immunization of farmed making waves in cancer research: new models in the zebrafish the zebrafish reference genome sequence and its relationship to the human genome zebrafish as an alternative animal model in human and animal vaccination research potential of mucoadhesive nanocapsules in drug release and toxicology in zebrafish zebrafish as tools for drug discovery developing covid- vaccines at pandemic speed a comprehensive review of animal models for coronaviruses: sars-cov- , sars-cov, and mers-cov comparison of sars-cov- infections among species of non-human affiliations non-invasive bioluminescence imaging of hcov- oc infection and therapy in the central nervous system of live mice an alpaca nanobody neutralizes sars-cov- by blocking receptor interaction equine hyperimmune globulin raised against the sars-cov- spike glycoprotein has extremely high neutralizing titers development of a hyperimmune equine serum therapy for covid- in argentina serological survey of sars-cov- for experimental, domestic, companion and wild animals excludes intermediate hosts of different species of animals the zebrafish disease and drug screening model: a strong ally against covid- tracking mechanisms of viral dissemination in vivo a contemporary view of coronavirus antibody responses to sars-cov- in patients with covid- maternal transfer and protective role of antibodies in zebrafish danio rerio role of maternally derived immunity in fish breast milk-fed infant of covid-   sars-cov- s and s subunits-and nucleocapsid protein-reactive sigm/igm iga antibodies in human milk acute transverse myelitis after covid- pneumonia pathological findings of covid- associated with acute respiratory distress syndrome patients -an mri-based -month follow-up study cerebral venous thrombosis associated with covid- effects of covid- on the nervous system respiratory syndrome coronavirus infection causes neuronal death in the absence of encephalitis in mice transgenic for human ace 
pathological study of the novel coronavirus disease (covid- ) through postmortem core biopsies covid- infection and neurological complications: present findings and future predictions covid- -associated acute necrotizing myelitis pathogenesis and diagnosis of viral infections of the nervous system neurologic and radiographic findings associated with covid- infection in children the conundrum of interleukin- blockade in covid- early administration of interleukin- inhibitors for patients with severe covid- disease is associated with decreased intubation, reduced mortality, and increased discharge inborn errors of human jaks and immunodeficiency due to heterozygous gain-of-function mutations in stat stat transcription factors in t cell control of stat knockout mice are highly susceptible to pulmonary mycobacterial infection erk/mapk signalling pathway and tumorigenesis (review) hemophagocytic syndromes -an update kidney biopsy findings in patients with covid- identification of a potential mechanism of acute kidney injury during the covid- outbreak: a study based on single-cell transcriptome analysis kidney involvement in covid- and rationale for extracorporeal therapies clinical characteristics of fatal and recovered cases of coronavirus disease in wuhan, china: a retrospective study covid- : abnormal liver function tests melano-macrophage centres and their role in fish pathology covid- and the liver its receptor mas, and the angiotensin-converting enzyme type are expressed in the human ovary the vasoactive peptide its receptor mas and the angiotensin-converting enzyme type are expressed in the human endometrium sars-cov- and the next generations: which impact on reproductive tissues? pathogenesis of sars-cov- in transgenic mice expressing human angiotensin-converting enzyme predicting the angiotensin converting enzyme (ace ) utilizing capability as the receptor of sars-cov- . 
microbes infect ace orthologues in non-mammalian vertebrates (danio, gallus, fugu, tetraodon and xenopus) development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach fishing for mammalian paradigms in the teleost immune system a cold-blooded view of adaptive immunity reconstructing immune phylogeny: new perspectives diversity and repertoire of igw and igm vh families in the newborn nurse shark implications for a distinct b cell receptor in lower vertebrates targets of somatic hypermutation within immunoglobulin light chain genes in zebrafish perspectives on antigen presenting cells in zebrafish breeding zebrafish: a review of different methods and a discussion on standardization neuroproteomics tools in clinical practice a new general topology for cascaded multilevel fixation and decalcification of the development of a universal silico predictor of protein-protein interactions. csermely p profiler-a web server for functional interpretation of gene lists ( update) pathview: an r/bioconductor package for pathway-based data integration and visualization kyoto encyclopedia of genes and genomes pbs buffer for hours and after few pbs rinse the membrane was incubated with monoclonal anti-polyhistidine−peroxidase antibody produced in mouse (sigma- aldrich). key: cord- - klkojpo authors: harilal, divinlal; ramaswamy, sathishkumar; loney, tom; suwaidi, hanan al; khansaheb, hamda; alkhaja, abdulmajeed; varghese, rupa; deesi, zulfa; nowotny, norbert; alsheikh-ali, alawi; tayoun, ahmad abou title: sars-cov- whole genome amplification and sequencing for effective population-based surveillance and control of viral transmission date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: klkojpo background with the gradual reopening of economies and resumption of social life, robust surveillance mechanisms should be implemented to control the ongoing covid- pandemic. 
unlike rt-qpcr, sars-cov- whole genome sequencing (cwgs) has the added advantage of identifying cryptic origins of the virus, and the extent of community-based transmissions versus new viral introductions, which can in turn influence public health policy decisions. however, practical and cost considerations of cwgs should be addressed before it can be widely implemented. methods we performed shotgun transcriptome sequencing using rna extracted from nasopharyngeal swabs of patients with covid- , and compared it to targeted sars-cov- full genome amplification and sequencing with respect to virus detection, scalability, and cost-effectiveness. to track virus origin, we used open-source multiple sequence alignment and phylogenetic tools to compare the assembled sars-cov- genomes to publicly available sequences. results we show a significant improvement in whole genome sequencing data quality and viral detection using amplicon-based target enrichment of sars-cov- . with enrichment, more than % of the sequencing reads mapped to the viral genome compared to an average of . % without enrichment. consequently, a dramatic increase in genome coverage was obtained using significantly less sequencing data, enabling higher scalability and significant cost reductions. we also demonstrate how sars-cov- genome sequences can be used to determine their possible origin through phylogenetic analysis including other viral strains. conclusions sars-cov- whole genome sequencing is a practical, cost-effective, and powerful approach for population-based surveillance and control of viral transmission in the next phase of the covid- pandemic. the covid- pandemic continues to inflict devastating human life losses ( ), and has enforced significant social changes and global economic shut downs ( ) . with the accumulating financial burdens and unemployment rates, several governments are sketching out plans for slowly re-opening the economy and reviving social life and economic activity. 
however, robust population-based surveillance systems are essential to track viral transmission during the re-opening process. while rt-qpcr targeting sars-cov- can be effective in identifying infected individuals for isolation and contact tracing, it cannot determine which viral strains are circulating in the community, whether they are autochthonous or imported, or, for imported strains, where they originated, all of which influences public health policy decisions. in addition, it is vital to identify super-spreader events as they can be influenced by the virus strain ( ) . sars-cov- whole genome sequencing (cwgs), on the other hand, can detect the virus and can delineate its origins through phylogenetic analysis ( , ) in combination with other local and international viral strains, especially given the accumulation of thousands of viral sequences from countries all over the world (www.nextstrain.org) (figure ) . however, practical considerations, such as cost, scalability, and data storage, should first be investigated to assess the feasibility of implementing cwgs as a population-based surveillance tool. here we show that cwgs is cost-effective and highly scalable when using a target enrichment sequencing method, and we also demonstrate its utility in tracking the origin of sars-cov- transmission. all patients had laboratory-confirmed covid- based on a positive rt-pcr assay for sars-cov- in the centralized dubai health authority (dha) virology laboratory. this study was approved by the dubai scientific research ethics committee -dubai health authority (approval number #dsrec- / _ ). viral rna was extracted from nasopharyngeal swabs of patients with covid- using the ez dsp virus kit (qiagen, hilden, germany), optimized for viral and bacterial nucleic acid extraction from human specimens using magnetic bead technology. 
sars-cov- positive results were confirmed using an rt-pcr assay originally designed by the us centres for disease control and prevention (cdc) and currently provided by integrated dna technologies (idt, ia, usa). this assay consists of oligonucleotide primers and dual-labelled hydrolysis (taqman®) probes ( 'fam/ 'black hole quencher) specific for two regions (n and n ) of the virus nucleocapsid (n) gene. an additional primer/probe set is also included to detect the human rnase p gene (rp) as an extraction control. the reverse transcription and amplification steps were performed using the taqpath tm -step rt-qpcr master mix (thermofisher, ma, usa) following the manufacturer's instructions. a sample was considered positive if the cycle threshold (ct) values were less than for each of the sars-cov- targets (n and n ) and the extraction control (rp). to estimate the viral load relative to human rna, we calculated the Δct value for each target as follows: Δct = ct_nn - ct_rp, where nn is either n or n . the average of the n and n target Δct values was then negated to reach a relative estimate of viral load, which is inversely correlated with the ct value. rna libraries from all samples were prepared for shotgun transcriptomic sequencing using the truseq stranded total rna library kit from illumina (san diego, ca, usa), following the manufacturer's instructions. extracted rna was quantified with an rna-specific fluorescent dye using the qubit rna xr assay kit and the qubit fluorometer system (thermofisher, ma, usa). then, µg of input rna from each patient sample was depleted of human ribosomal rna, and the remaining rna underwent fragmentation, reverse transcription (using the superscript ii reverse transcriptase kit from invitrogen, carlsbad, usa), adaptor ligation, and amplification. libraries were then sequenced using the novaseq sp reagent kit ( x cycles) from illumina (san diego, ca, usa). 
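The relative viral load estimate defined earlier in this section (Δct per viral target against the rnase p control, averaged and negated) reduces to a short calculation. The sketch below is illustrative: the function and parameter names are hypothetical, and the ct values in the example are made up rather than taken from the study.

```python
def relative_viral_load(ct_target_a, ct_target_b, ct_rp):
    """Negated mean delta-Ct of the two viral targets against the RP control.

    ct_target_a, ct_target_b : Ct values of the two nucleocapsid targets
    ct_rp                    : Ct value of the human RNase P extraction control
    A higher return value means more viral RNA relative to human RNA.
    """
    delta_a = ct_target_a - ct_rp   # delta-Ct for the first viral target
    delta_b = ct_target_b - ct_rp   # delta-Ct for the second viral target
    return -(delta_a + delta_b) / 2.0

# Illustrative values only: viral targets amplify earlier than the control.
high_load = relative_viral_load(20.0, 22.0, 25.0)
# Illustrative values only: viral targets amplify later than the control.
low_load = relative_viral_load(30.0, 32.0, 25.0)
```

Negating the mean makes the estimate increase with viral abundance, since a lower ct means the target crossed the detection threshold in fewer cycles.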
rna extracted (~ µg) from patient nasopharyngeal swabs was used for double stranded cdna synthesis using the quantitect reverse transcription kit (qiagen, hilden, germany) according to the manufacturer's protocol. this cdna was then evenly distributed into pcr reactions for sars-cov- whole genome amplification using overlapping primer sets covering most of its genome (figure a and supplemental table ). the sars-cov- primer sets used in this study were modified from wu et al ( ) by adding m tails to enable sequencing by sanger, if needed (supplemental table ). pcr amplification was performed using the platinum tm superfi tm pcr master mix (thermofisher, ma, usa) and a thermal protocol consisting of an initial denaturation at °c for seconds, followed by cycles of denaturation ( °c for seconds), annealing ( °c for seconds), and extension ( °c for minute and seconds). a final extension at °c for minutes was applied before retrieving the final pcr products. amplification was confirmed by running µl from each reaction on a % agarose gel. all pcr products were then purified using agencourt ampure xp beads (beckman coulter, ca, usa), quantified by nanodrop (thermofisher, ma, usa), diluted to the same concentration, and then pooled into one tube for subsequent steps. a minimum of - ng of the pooled pcr products in µl was then sheared by ultrasonication (covaris le -plus series, ma, usa) to generate a target fragment size of - bp using the following parameters: % duty factor, peak power of watts, cycles per burst, seconds treatment time, an average power of watts, and °c bath temperature. target fragmentation was confirmed using the tapestation automated electrophoresis system (agilent, ca, usa) (figure a) . subsequently, the fragmented product was purified and then processed to generate sequencing-ready libraries using the sureselectxt library preparation kit (agilent, ca, usa) following the manufacturer's instructions. 
indexed libraries from multiple patients were pooled and sequenced ( x cycles) using the miseq or the novaseq systems (illumina, san diego, ca, usa). a step-by-step sars-cov- target enrichment and sequencing protocol is provided in appendix i. demultiplexed fastq reads, obtained through shotgun or target enrichment sequencing, were generated from raw sequencing base call files using bcl fastq v . . , and then mapped to the reference wuhan genome (genbank accession number: nc_ . ) using the burrows-wheeler aligner (bwa) v . . . alignment statistics, such as coverage and mapped reads, were generated using picard . . . variant calling was performed with gatk v . - - , and was followed by sars-cov- genome assembly using bcftools v. . . ( figure b ). all tools used in this study are freely accessible. for laboratories without bioinformatics support, several publicly accessible, end-to-end bioinformatics pipelines (insaflu: https://insaflu.insa.pt/; genome detective: https://www.genomedetective.com/app/typingtool/virus/) ( , ) , composed of the above tools, can be used to generate viral sequences from raw fastq data. for downstream analysis, a general quality control metric was implemented to ensure assembled sars-cov- genomes have at least x average coverage (sequencing reads >q ) across most nucleotide positions ( - , ). for target enrichment and shotgun sequencing comparisons in table , we used data from samples (uae/p / , l , l , l , l , l , l ) generated using both methods. data from patient l were not included in this analysis because, unlike the above samples, significantly more sequencing data were allocated to the target enriched sample in this patient (supplemental table ), which could overestimate the efficiency of the target enrichment protocol. all new sars-cov- sequences (n= ) generated in this study were submitted to gisaid (global initiative on sharing all influenza data) under accession ids: epi_isl_ and epi_isl_ to epi_isl_ . 
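the coverage-based quality control step mentioned above could be checked along the following lines. this is a minimal sketch, assuming per-position read depths have already been exported from the alignment (for example by a depth-reporting tool); the function names, and all thresholds, are hypothetical parameters because the exact cut-offs are not recoverable from the text.

```python
def coverage_summary(depths, min_depth):
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    mean_coverage = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_coverage, covered / len(depths)


def passes_qc(depths, min_mean_coverage, min_depth, min_fraction):
    """Apply an average-coverage and breadth-of-coverage check.

    All three thresholds are assumptions to be set by the user; they are
    not the values used in the study.
    """
    mean_coverage, fraction = coverage_summary(depths, min_depth)
    return mean_coverage >= min_mean_coverage and fraction >= min_fraction
```

keeping the thresholds as explicit arguments makes it easy to re-run the same check with whatever cut-offs a laboratory adopts.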
we used nextstrain ( ), which includes the augur v . . pipeline for multiple sequence alignment (mafft v . ) ( ) and phylogenetic tree construction (iqtree v . . ) ( ) . tree topology was assessed using the fast bootstrapping function with , replicates. tree visualization and annotations were performed in figtree v . . ( ). shotgun transcriptome sequencing was used to fully sequence sars-cov- rna extracted from patients (n= ) who tested positive for the virus ( ). analysis of the sequencing data showed that this approach required, on average, . gb of data per sample yielding . million total reads, of which approximately . % of the reads (~ , reads) mapped to the sars-cov- genome with an average coverage of ~ x (table ) . this is because most of the shotgun data (~ %) is allocated to the human transcriptome while only a minority of the reads align to the sars-cov- genome (table ). in addition to the cost and storage considerations discussed below, this approach is not highly sensitive for detecting sars-cov- genomes in general and specifically in samples with low viral abundance. in fact, despite high viral abundance relative to human rna, most samples had less than x sequencing coverage across the sars-cov- genome. samples with seemingly very low viral loads failed to yield a full sars-cov- genome sequence using this approach (table and figure a ). to enrich viral sequences and minimize the sequencing cost and data storage issues addressed below, we describe an alternative approach where the entire sars-cov- genome is first amplified using overlapping primer sets each yielding around . kb long inserts (figures and supplemental table ). all inserts were then pooled and fragmented to - bp inserts which were then prepared for short read next generation sequencing (figures a and c) . rna extracted from eight covid- patients, which was first sequenced by shotgun transcriptome (table and supplemental table ), was then sequenced using the enrichment protocol. 
as expected, we observed significant enhancement in virus detection using this protocol where, on average, . % of the reads now mapped to the sars-cov- genome leading to tenfold increase in coverage relative to shotgun transcriptome (avg. x versus . x, respectively) despite generating a two hundred fold less sequencing data (avg. . gb versus . gb, respectively, table and figure d ). on average, x coverage per gb of sequencing data was generated using shotgun sequencing ( table ) compared to ~ , x per gb using target enrichment ( table ) suggesting the latter method is more cost effective and is highly scalable. we calculate the table ) . therefore, target enrichment sequencing is still more cost-effective and scalable than shotgun transcriptome sequencing even at higher sequencing throughputs. another factor impeding scalability of the shotgun approach is data storage. even with higher throughput sequencing (novaseq sp flowcell), shotgun sequencing requires an allocation of tb of data for ~ sequenced samples. on the other hand, with tb of data, a total of around , samples can be sequenced using the enrichment method and the miseq micro flowcell ( table ) . therefore, long term data storage allocations, and cost, are significantly higher, and perhaps formidable, when using the shotgun sequencing approach. to illustrate the utility of sars-cov- whole genome sequencing, we tracked the origin(s) of the virus in seven patients (uae/p / , l , l , l , l , l , l ) by comparing their assembled sequences to virus strains (n= ) identified during the early phase of the pandemic, between january and march , in the uae ( ). all seven patient samples were collected between march and april , and are therefore good candidates to determine whether transmissions were community-based due to the previously documented strains or were independent external introductions. 
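the cost and storage comparison above rests on two simple ratios: coverage obtained per gb of sequencing data, and the number of samples that fit into a fixed storage allocation. a minimal sketch of that arithmetic follows; the function names are hypothetical and any numbers passed in are placeholders rather than values from the study's tables.

```python
def coverage_per_gb(mean_coverage: float, data_gb: float) -> float:
    """Coverage (x) obtained per gigabyte of sequencing data."""
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    return mean_coverage / data_gb


def samples_per_storage(storage_gb: float, data_gb_per_sample: float) -> int:
    """Whole number of samples whose data fit into a storage allocation."""
    if data_gb_per_sample <= 0:
        raise ValueError("data_gb_per_sample must be positive")
    return int(storage_gb // data_gb_per_sample)
```

comparing the two methods then amounts to calling these functions once with the shotgun figures and once with the target enrichment figures.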
multiple sequence alignment and phylogenetic analysis (figures b and ) showed that the new isolates from patients l , l , l , and l clustered with earlier strains of iranian origin (or clade a ), while that from patient l belonged to the early european (or clade a a) cluster (figure ) . this information suggests that transmissions for all those five patients were most likely community-based, which we then confirmed from patient medical records where no recent travel history was reported by any of those individuals. sars-cov- isolates from patients p /uae/ and l were, on the other hand, closer to the earliest asian strains, which are more diverse due to fewer but distinct mutations ( figure ) . hence, with the available sequencing data it is challenging to ascertain whether the p /uae/ and l transmissions were community-based or due to early independent introductions. however, patient l did not have any recent travel history before symptom onset, suggesting community-based transmission related to an asian strain. on the other hand, travel history in patient p /uae/ was not known and the corresponding isolate appeared to match closely to five other strains from the united states and taiwan (figure ) . therefore, transmission in patient p /uae/ was unlikely to be community-based from the early strains ( ), but rather due to an independent travel-related introduction of the virus. genomics-based sars-cov- population-based surveillance is a powerful tool for controlling viral transmission during the next phase of the pandemic. therefore, it is important to devise efficient methods for sars-cov- genome sequencing for downstream phylogenetic analysis and virus origin tracking. towards this goal, we describe a cost-effective, robust, and highly scalable target enrichment sequencing approach, and provide an example to demonstrate its utility in characterizing transmission origin. 
our target enrichment protocol is amplicon-based for which oligonucleotide primers can be easily ordered by any molecular laboratory. next generation sequencing (ngs) has also become largely accessible to most labs, and in our protocol we show that highly affordable, low throughput sequencers, such as the illumina miseq system, can be used efficiently to sequence up to samples at x coverage each at a cost of $ per sample ( table ) . this cost is likely comparable to rt-pcr testing for the virus. one possible limitation is the use of ultra-sonication for fragmentation of pcr products after sars-cov- whole genome amplification. several labs might lack sonication systems due to accessibility and affordability issues. in such situations, our protocol can be easily modified to use enzymatic fragmentation instead provided by commercial kits, such as the agilent https://www.genomedetective.com/app/typingtool/virus/) ( , ) which can take raw sequencing (fastq) files to assemble viral genomes, and to perform multiple sequence alignment and phylogenetic analysis for virus origin tracking. in addition, the described approach does not require significant data storage or computational investment as shown by our cost, data, and scalability calculations ( table ) . our phylogenetic analysis demonstrates how sars-cov- genomic sequencing can be used to track origins of virus transmission. however, data should be carefully interpreted, and should be combined with other epidemiological information (such as travel history) to avoid inaccurate conclusions. the major limitation facing genomic-based sars-cov- surveillance includes the lack of virus sequencing data representing most strains in any country. nonetheless, sars-cov- strains are continuously being sequenced by government, private, and academic entities all over the world, and the sequencing data is being shared publicly. 
this proliferation of sequencing datasets will empower genomic-based surveillance of the virus, and the availability of cost effective sequencing options, like the one described in this study, will contribute to democratizing sars-cov- sequencing and data sharing. in summary, we show that sars-cov- whole genome sequencing is a highly feasible and effective tool for tracking virus transmission. genomic data can be used to determine community-based versus imported transmissions, which can then inform the most appropriate public health decisions to control the pandemic. disclosures. authors do not have any conflicts of interests to disclose. sars-cov- /covid- : viral genomics, epidemiology, vaccines, and therapeutic interventions identifying and interrupting superspreading events -implications for control of severe acute respiratory syndrome coronavirus genomic surveillance and phylogenetic analysis reveal multiple introductions of sars-cov- into a global travel hub in the middle east shotgun transcriptome and isothermal profiling of sars-cov- infection reveals unique host responses, viral diversification, and drug interactions a new coronavirus associated with human respiratory disease in china insaflu: an automated open web-based bioinformatics suite "from reads" for influenza whole-genomesequencing-based surveillance genome detective: an automated system for virus identification from highthroughput sequencing data nextstrain: real-time tracking of pathogen evolution mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform terrace aware data structure for phylogenomic inference from supermatrices acknowledgements. authors would like to thank members of the microbiology laboratory, latifa women and children hospital, dubai health authority and al jalila children's specialty hospital genomics center for supporting sars-cov- diagnostic testing and for arranging samples used in this study. 
key: cord- -qbcb qi authors: agarwal, ajay title: in-silica analysis of sars-cov- viral strain using reverse vaccinology approach: a case study for usa date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qbcb qi the recent pandemic of covid that has struck the world has yet to be met with a cure. countless lives have been claimed by the pandemic and societal normalcy has been damaged permanently. as a result, it becomes crucial for academic researchers in the field of bioinformatics to help combat the pandemic. the study involved collecting the virulent strain sequence of sars-ncov for the country usa against human host through publicly available bioinformatics databases. using in silico analysis and reverse vaccinology, two leader proteins were identified as potential vaccine candidates for development of a multi-epitope drug. the results of this study can give other researchers better direction on developing vaccines and immune responses against covid . this work also aims at promoting the use of existing bioinformatics tools to streamline the pipeline of vaccine development. the situation of covid a new infectious respiratory disease was first observed in the month of december , in wuhan, situated in the hubei province, china. studies have indicated that the cause of this disease was the emergence of a genetically novel coronavirus closely related to sars-cov. this coronavirus, now named ncov- , is the reason behind the spread of this fatal respiratory disease, now named covid- . the initial group of infections is thought to be linked with the huanan seafood market, most likely through animal contact. eventually, human-to-human transmission of the virus occurred. [ ]. since then, ncov- has been rapidly spreading within china and other parts of the world. at the time of writing this article (mid-march ), covid- has spread across countries. 
a total of , cases of covid- have been confirmed, and a total of deaths have occurred. the cumulative case count shows a rising trend and the numbers continue to increase. who has declared covid- to be a “global health emergency”. [ ]. current scenario and objectives currently, research is being conducted on a massive scale to understand the immunology and genetic characteristics of the disease. however, no cure or vaccine for ncov- has been developed at the time of writing this article. although ncov- and sars-cov are genetically very similar, the respiratory syndromes they cause, covid- and sars respectively, are completely different. according to the world health organization, “sars was more deadly but much less infectious than covid- ”. the sars epidemic has not provided many useful insights, because at that time the epidemic was contained only through general prevention measures and treatment of the individual symptoms. as a result, only a limited number of tools are available to test coronaviruses for their ability to infect humans. this has acted as a major limitation for predicting the next zoonotic viral outbreak. [ ] [ ] [ ] . in response to the current medical crisis, the world health organization has activated the r&d blueprint to accelerate the development of diagnostics, therapeutics and vaccines for this novel coronavirus. 
the objective of this study lies in compliance to the guidelines for research activated by who. this study aims to utilize a reverse-vaccinology approach in order to identify potential vaccine candidates for covid for the country usa. we shall be deploying open-access bioinformatics tools for our analysis of the same. the virulent strain of sars-cov- was selected for in-silico analysis. the genome of the viral strain is made available by ncbi -national center for biotechnology information (https://www.ncbi.nlm.nih.gov.in). the reference identification for sars-cov- is given by refseq nc_ . viral protein sequences of the sars-cov- were obtained from the vipr -virus pathogen database and analysis. [ ] . these protein sequences were identified and downloaded in a tabular format for the host -humans and the country -usa. the fasta-file for all the protein sequences was downloaded and loaded in r using "seqinr" and "biostrings" packages. the different physicochemical properties were analysed using the "peptides" package in the r. [ ] [ ] [ ] . vaxijen . is used to predict the antigenicity of the protein based on the fasta-files that contain their respective amino acid sequences. the online software predicts the antigenic score for the same. [ ] [ ] . the t-cell and b-cell epitopes of the selected two leader protein sequences were predicted using the iedb (the immune epitope database). iedb is a freely available resource funded by niaid. it is a catalog that contains the experimental data on antibody and t cell epitopes studied in humans, non-human primates, and other animal species in the context of the infectious disease, allergy, autoimmunity, and transplantation. it also assists in hosting tools for predicting and analysing the epitopes. [ ] . for mhc class-i t-cell epitope prediction for orf ab leader proteins, netmhcpan el . method was used. this method was applied for the hla-a* - allele. 
for mhc class-ii t-cell epitope prediction for orf ab leader proteins, the sturniolo prediction method was used. this method was applied for the hla drb * - allele. for b-cell lymphocyte epitope prediction, the bepipred prediction method was used. [ ] . ten epitopes were selected at random for analysis of their antigenicity, allergenicity, topology and conservancy. allertop v (https://www.ddg-pharmfac.net/allertop/) was used for determining the allergenicity of the selected epitopes. the toxinpred server (https://webs.iiitd.edu.in/raghava/toxinpred/protein.php) was used to determine the toxicity of the selected epitopes. transmembrane helices in the proteins were predicted using the tmhmm server v . (http://www.cbs.dtu.dk/services/tmhmm/). the conservancy of the selected epitopes was analysed using the conservancy analysis tool of the iedb server. for this analysis, the parameter for the sequence identity threshold was adjusted to '>= '. [ ] . in order to generate the d structure of the chosen epitopes, the pep-fold tool was used. pep-fold (http://bioserv.rpbs.univ-paris-diderot.r/services/pep-fold /) uses a de novo approach to predict peptide structures from amino acid sequences. [ ] [ ] [ ] . it utilizes the structural alphabet sa letters to describe the conformations of four consecutive residues. the pre-docking of the selected epitopes was done using ucsf chimera. it was also used to perform pre-docking of the selected alleles hla-a* - (for mhc class-i) and hla drb * - . later, the peptide-protein docking was done using hpepdock. hpepdock is a web server for performing blind peptide-protein docking using a hierarchical algorithm. [ ] . instead of performing lengthy simulations to refine peptide conformations, hpepdock accounts for peptide flexibility through an ensemble of peptide conformations produced by the modpep program. the sars-cov- strain was identified. 
all the protein sequences were analysed on the basis of their physicochemical properties to select the top five candidates for further analysis. for the physicochemical analysis, the number of amino acids, instability index, aliphatic index, and the grand average of hydrophobicity (gravy) scores of all the , proteins were computed using the peptides package in r. this package was used because it allows the identification, selection and analysis of multiple amino acid sequences in the same fasta-file. the physicochemical study unveiled five potential candidates having the lowest gravy scores and an instability index less than (hence, displaying stability). these five candidates were individually analysed for their molar extinction coefficients and antigenicity. the vaxijen . tool was used to analyse the antigenicity of the five potential candidates and the molar extinction coefficient was analysed using expasy's online tool -protparam. out of the five potential candidates, only two were selected for further analysis. this was based on the criteria of having the highest predicted antigenicity scores and the highest extinction coefficients. both the leader proteins were then used for further analysis. the t-cell epitopes of the mhc class-i for both leader proteins were determined using the netmhcpan el . prediction method of the iedb server keeping the sequence length at . this tool generates the epitopes and sorts them on the basis of their percentile scores. ten potential epitopes were selected at random for the antigenicity, allergenicity, toxicity, and conservancy tests. for mhc class-ii, t-cell epitopes (hla drb * - allele) of the proteins were also determined using the iedb analysis tools. the sturniolo prediction method was used for the same. again, ten potential candidates were chosen based on the same criteria as that of mhc class-i. 
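to make the screening steps above more concrete, the sketch below computes a gravy score and enumerates fixed-length candidate peptides from a sequence. it is an illustration only: the study used the peptides package in r and the iedb server, whereas this python version assumes gravy is the mean kyte-doolittle hydropathy value per residue (the conventional definition), and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def candidate_peptides(sequence: str, length: int) -> list:
    """All overlapping peptides of the given length (sliding window)."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]
```

a lower (more negative) gravy score indicates a more hydrophilic sequence, which is the direction the selection above favours; the sliding-window helper only mirrors how fixed-length epitope candidates are enumerated before a predictor ranks them.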
the b-cell epitopes of the proteins were selected using the bepipred linear epitope prediction method of the iedb server. the topology of the chosen epitopes was determined using the tmhmm v . server (http://www.cbs.dtu.dk/services/tmhmm/). the potential t-cell epitopes, whose topology, antigenicity, allergenicity, toxicity and conservancy were analysed, for orf ab leader proteins mt and mt are depicted in the following tables. on analysis of the antigenicity, allergenicity, toxicity, and conservancy of the t-cell epitopes, it was found that most of them were antigenic while also being non-allergenic, non-toxic and highly conserved. from the ten selected mhc class-i and mhc class-ii t-cell epitopes, one from each category was selected from each of the leader proteins. the criteria for selection were a higher antigenic score, non-allergenicity, non-toxicity, and a conservancy value above %. the selected epitopes were: rsdartaph, vqlnnnnnn, lslpvlqvr, and vqlslpvlq. the b-cell epitopes of the orf ab leader proteins are displayed in the following two figures. the following figures depict the pep-fold generated d structures of the selected t-cell epitopes of mhc class-i and class-ii: rsdartaph, vqlnnnnnn, lslpvlqvr, and vqlslpvlq. leader protein mt . vipr: an open bioinformatics database and analysis resource for virology research seqinr . - : a contributed package to the r project for statistical computing devoted to biological sequences retrieval and analysis biostrings: efficient manipulation of biological strings peptides: a package for data mining of antimicrobial peptides vaxijen: a server for prediction of protective antigens, tumour antigens and subunit vaccines identification of novel vaccine candidates against campylobacter through reverse vaccinology the immune epitope database (iedb) . . 
nucleic acids research exploiting the reverse vaccinology approach to design novel subunit vaccine against ebola virus pep-fold: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides improved pep-fold approach for peptide and miniprotein structure prediction pep-fold : faster de novo structure prediction for linear peptides in solution and in complex hpepdock: a web server for blind peptide-protein docking based on a hierarchical algorithm this research didn't receive any specific grant from funding agencies. the author declares that there is no conflict of interest. the study doesn't contain any work/experiment performed with human participants or animals done by the author. the hpepdock server was used to perform the peptide-protein docking. the purpose was to determine which of the two selected t-cell mhc class-i epitopes, rsdartaph and lslpvlqvr, had the lowest global energy. the epitope having the lowest global energy acts as a better vaccine candidate. the docking was performed against the hla-a* - allele whose pdb format was obtained through pre-docking using ucsf chimera. the global energies for the selected class-i epitopes were: - . and - . respectively for rsdartaph and lslpvlqvr. out of the two mhc class-i epitopes selected for leader proteins orf ab mt and mt , the global energy was lowest for lslpvlqvr. for the mhc class-ii t-cell epitope, the docking was performed against the hla drb * - allele. the same procedure was followed as mentioned before. the two selected epitopes after the previous analysis were: vqlnnnnnn and vqlslpvlq. the global energies for the selected epitopes were - . and - . respectively. out of the two t-cell mhc class-ii epitopes selected for the orf ab leader proteins mt and mt , the lowest global energy was that of vqlslpvlq. the scope of this study involved performing an in silico analysis of the sars-cov- viral strain for the country of usa against human host. 
the vipr database was used to obtain all the protein sequences. a total of viral protein sequences were obtained, and extensive physicochemical analysis was done to select a group of two. this analysis was performed using the peptides package in the r software. it was revealed that the two leader proteins orf ab mt and mt had the highest extinction coefficients and the lowest gravy scores. in addition, these leader proteins were highly stable and were also antigenic in nature. the fasta-formatted files of these selected proteins were taken and analysed to obtain the potential t-cell and b-cell epitopes. the t-cell epitopes of mhc class-i and mhc class-ii were analysed on the basis of their scores. ten randomly selected t-cell epitopes from both the classes were taken for further analysis of allergenicity, toxicity, conservancy scores and antigenicity. only one epitope from each class was selected, which possessed a higher conservancy score (more than %) and was non-toxic, non-allergenic and antigenic in nature. the selected epitopes were then docked against their respective alleles to obtain the global energy scores. the epitopes which displayed the lowest global energy score on docking with the alleles were selected and proposed as potential vaccine candidates. funding: key: cord- -cbiw yvb authors: israeli, ofir; beth-din, adi; paran, nir; stein, dana; lazar, shirley; weiss, shay; milrot, elad; atiya-nasagi, yafit; yitzhaki, shmuel; laskar, orly; schuster, ofir title: evaluating the efficacy of rt-qpcr sars-cov- direct approaches in comparison to rna extraction date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: cbiw yvb sars-cov- genetic identification is based on viral rna extraction prior to rt-qpcr assay, however recent studies support the elimination of the extraction step. herein, we assessed the rna extraction necessity, by comparing rt-qpcr efficacy in several direct approaches vs. 
the gold standard rna extraction, in detection of sars-cov- from laboratory samples as well as clinical oro-nasopharyngeal sars-cov- swabs. our findings show advantage for the extraction procedure, however a direct no-buffer approach might be an alternative, since it identified up to % of positive clinical specimens. the covid- pandemic caused by sars-cov- produced significant morbidity and mortality worldwide. at the time of writing, more than six million cases and over , deaths were reported [ ] . the pandemic has created an acute need for rapid, cost effective and reliable diagnostic screening. covid- genetic diagnostics process include rna extraction from oronasopharyngeal swabs followed by reverse transcriptase quantitative pcr (rt-qpcr) targeting viral genes [ ] . however, the global demand for reagents has placed extensive strain on supply chains for rt-qpcr kits and to an even greater extent, on rna isolation reagents. potentially, eliminating rna extraction would greatly simplify the diagnostic procedure, reducing both cost and time to answer, while allowing testing to continue in case of reagent shortages. previous studies demonstrated that several lysis buffers might allow the elimination of rna extraction [ ] [ ] [ ] . very recently, two studies [ ] [ ] used a direct no-buffer rt-qpcr approach which identified > % of the tested clinical samples. in this study, we tested the diagnostic efficiency following thermal inactivation ( °c for min and °c for min) without addition of lysis buffers ("no buffer") or following lysis by three buffers (virotype, quickextract and % triton-x- ) and compared it to diagnosis after standard rna extraction. samples included buffers spiked with sars-cov- , at concentrations . - , pfu/ml and clinical samples, previously diagnosed as positive ( ) and negative ( ). viral rna standards were viable sars-cov- (gisaid accession epi_isl_ ), cultured in vero e cells and diluted in viral transport medium (biological industries). 
virus concentrations are given as plaque forming units/ml. one pfu was determined as virions by digital pcr [data not shown]. oro-nasopharyngeal swab samples for the study were selected after approval by conventional rt-qpcr. positive and negative samples were randomly selected for this study and kept at °c until use. rna was extracted from µl sample using the rnadvance viral kit and the biomek i automated workstation (beckman coulter) and eluted with µl h o. samples were analyzed directly or mixed : with one of the following buffers: quick extract dna extraction solution (lucigen), virotype tissue lysis reagent (indical bioscience gmbh) and % triton-x- (sigma) after inactivation at °c for min or °c for min. rt-qpcr assays were performed using the sensifast probe lo-rox one-step kit (bioline). primers and probe for the sars-cov- e gene were taken from the berlin protocol [ ]. standard samples were analyzed in duplicates and the results shown in figure are averages. the samples were analyzed following two inactivation temperatures: °c for min or °c for min. the maximal standard deviation was < ct's with an average standard deviation of . across all samples. the limit of detection was pfu/ml: at this concentration samples in the no buffer mode and virotype at °c were not detected, while the rna extraction mode averaged the lowest cycle threshold ( ct= . ) followed by quickextract and triton. at pfu/ml only the no buffer mode at °c failed to detect. the rna extraction mode maintained the lowest ct values across all the analyzed concentrations. the minimal average delta ct relative to the rna extraction mode was obtained using quickextract, followed by triton, virotype and the no buffer mode. sars-cov- clinical samples next we tested the feasibility of direct sars-cov- detection in clinical samples. twenty positive and negative samples were analyzed following thermal inactivation. 
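the comparison of each direct approach with the rna extraction mode can be expressed as an average delta ct across the tested concentrations. the following is a minimal sketch of that calculation, assuming one ct value per concentration for each method and using `None` for concentrations where a method failed to detect the virus; the function name and the example values are hypothetical.

```python
def mean_delta_ct(method_cts, extraction_cts):
    """Average Ct difference (method minus extraction) over paired results.

    Each list holds one Ct per concentration, in the same order. A value
    of None marks a failed detection; such pairs are skipped. Returns
    None when no concentration was detected by both approaches.
    """
    diffs = [
        m - e
        for m, e in zip(method_cts, extraction_cts)
        if m is not None and e is not None
    ]
    return sum(diffs) / len(diffs) if diffs else None
```

for instance, `mean_delta_ct([30.0, 27.0, None], [28.0, 24.0, 32.0])` averages the differences 2.0 and 3.0 and ignores the undetected third concentration. a smaller positive value means the direct approach loses less sensitivity relative to extraction.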
all previously defined negative samples remained negative across the different buffers and test conditions. positive samples exhibited major differences in detection capability (table ). the alternative buffers exhibited much lower detection levels: triton (both inactivation protocols) detected a single positive sample ( % detection); quickextract and virotype had - % detection rates (both inactivation protocols). surprisingly, the direct no-buffer approach was superior, with % and % for the °c and °c inactivation protocols, respectively. detection was inversely correlated with the samples' ct, with efficiency dropping from % for ct < to % for samples with higher ct. our results demonstrate that rna extraction significantly improves the sensitivity and comprehensiveness of clinical diagnosis of sars-cov- . we suggest that components of clinical samples (which include a multitude of nucleic acids and proteins) might significantly hamper detection. although previously reported to facilitate viral detection [ - ], the tested buffers severely compromised the limit of detection (to a maximum of %). this is surprising, considering that direct analysis without adding buffers achieved a % detection level. this no-buffer direct approach could potentially be used with some success in times of need to screen for high-titer samples. pre-existing samples were used and de-identified. this work was therefore determined to be "not human subjects' research". sars-cov- was kindly provided by bundeswehr institute of microbiology, munich, germany. the authors would like to thank itai glinert for his fruitful reviewing of this manuscript. the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. sars-cov- detection by direct rrt-pcr without rna extraction.
clin virol an alternative workflow for molecular detection of sars-cov- -escape from the na extraction kit-shortage direct rt-qpcr detection of sars-cov- rna from patient nasopharyngeal swabs without an rna extraction step biorxiv key: cord- -lbtuvdv authors: de souza, dalton garcia borges; alves júnior, francisco tarcísio; soma, nei yoshihiro title: forecasting covid- cases at the amazon region: a comparison of classical and machine learning models date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lbtuvdv background since the first reports of covid- , decision-makers have been using traditional epidemiological models to predict the days to come. however, the enhancement of computational power, the demand for adaptable predictive frameworks, the short history of the disease, and uncertainties related to input data and prediction rules also make other classical and machine learning techniques viable options. objective this study investigates the efficiency of six models in forecasting covid- confirmed cases with days ahead. we compare the models autoregressive integrated moving average (arima), holt-winters, support vector regression (svr), k-nearest neighbors regressor (knn), random trees regressor (rtr), seasonal linear regression with change-points (prophet), and simple logistic regression (slr). material and methods we apply the models to data provided by the health surveillance secretary of amapá, a brazilian state fully carved in the amazon rainforest, which has been experiencing high infection rates. we evaluate the models according to their capacity to forecast in different historical scenarios of the covid- progression, such as exponential increases, sudden decreases, and stability periods of daily cases. to do so, we use a rolling forward splitting approach for out-of-sample validation. we employ the metrics rmse, r-squared, and smape in evaluating the models in different cross-validation sections.
findings all models outperform slr, especially holt-winters, which performs satisfactorily in all scenarios. svr and arima perform better in isolated scenarios. to implement the comparisons, we have created a web application, which is available online. conclusion this work represents an effort to assist the decision-makers of amapá in future decisions, especially under scenarios of sudden variations in the number of confirmed cases in amapá, which could be caused, for instance, by new contamination waves or vaccination. it is also an attempt to highlight alternative models that could be used in future epidemics. all those numbers caught the attention of many researchers, who presented models to address the concerns of the brazilian government and population, such as when the outbreak will peak, how long it will last, and how many will be infected or die [ , ] . many of those forecasting models rely on epidemiological approaches [ , ] or state-of-the-art artificial intelligence (ai) algorithms. generally, researchers address their models to the country as a unit or to highly populated areas, mainly big cities and federation states like sao paulo and rio de janeiro [ , , ] . however, covid- has also impacted other brazilian regions, such as the north, a territory almost entirely covered by the amazon rainforest that accounts for almost half of the brazilian territory. the north has a low population density ( . inh./km ), accounts for only . % of the brazilian population, and is responsible for . % of all confirmed cases of covid- in brazil. this can be expressed as infection rates per population: . % in the north, versus . % in the rest of the country [ ] . figure shows the evolution of the infection rate in all five brazilian regions. carved into the amazon rainforest is amapá, a northern state of brazil. amapá is like an island surrounded by the forest, since it has no land routes to any other brazilian state (see fig. ).
it has only , inhabitants, living in an area bigger than england, which is many times denser. like other parts of the amazon, amapá already experiences excess mortality from infectious diseases, especially among indigenous populations. despite recent political efforts, many people living in the state still suffer from social and health problems such as minimal access to clean water and public sanitation [ ] . these and other reasons make amapá especially susceptible to covid- and other epidemic outbreaks that may occur in the future. by the end of may, macapá, the capital of amapá, saw its health system collapse due to covid- . by the end of august , the state had the second highest infection rate in brazil, according to official data [ ] . by the end of september , the state also had a low fatality rate ( . %) when compared to the whole country ( . %), which may be the result of local attempts to track new cases and avoid under-notification. against this background, in this paper we explore and compare traditional and ai forecasting models to support amapaense decision-makers in future decisions. the variables of interest are the cumulative numbers of confirmed cases and deaths. we compare the models autoregressive integrated moving average (arima), holt-winters, support vector regression (svr), k-nearest neighbors regressor (knn), random trees regressor (rt), seasonal linear regression with change-points (slir) and simple logistic regression (slr), which sets the baseline performance in this study. we compare the models according to the needs of local authorities. thus, we measure each model's effectiveness in forecasting the days ahead and how fast it responds to quick increases and decreases in the number of cases, as well as to periods of stability. these scenarios may repeat in the future as a result of new contamination waves or vaccination, for example.
the forecasts are performed for each amapaense municipality individually and for the state's accumulated data, which we use as our main example. since the municipalities are in different stages of the covid- spread, they may also display very different curve growth behaviors. thus, as a result of this study, we have also created an online application (available at http://www.previsor.covid amapa.com/) that can be used to visualize the data and follow the steps we take, as well as to choose the best model to use on different occasions. in this section, we describe our research framework, which we split into: ( . ) data acquisition, ( . ) data splitting, ( . ) fitting and forecasting, and ( ) model evaluation. the subsections that follow treat each one of those steps. we fit all models to the cumulative confirmed cases of covid- in amapá, from the first official case, on march th , up to august th . we gathered the data from official reports from each of the amapaense municipalities. the collected data is also available through an application programming interface provided by the brasil.io repository [ ] . the measurement periods are different for each municipality and tab. summarizes the dates of the first and last reports. training set and tried to forecast the next q days. then, iteratively, we added one day to the training set, until it comprised n − q observations. thus, for a given municipality, we have n − p − q + different cross-validation splittings. amapaense decision-makers. thus, in the first splitting, the raw data is divided half-and-half between training and testing sets (see algorithm ). each training sample (x) is then standardized (z) by its mean (u) and standard deviation (s), calculated as z = (x − u)/s. we then fit the training datasets to each one of the following models: autoregressive running on a daily basis will convert the predicted values before calculating the metrics and comparing them.
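the rolling forward splitting and the standardization step described above can be sketched as follows. this is a minimal illustration, not the authors' code: the function names are hypothetical, p is read as the size of the first training set and q as the forecast horizon, and the population standard deviation is assumed for s. under this reading the loop yields n − p − q + 1 train/test pairs.

```python
from statistics import mean, pstdev


def rolling_forward_splits(series, p, q):
    """Yield (train, test) pairs for rolling forward validation.

    The first training set holds the first p observations; each later split
    adds one more day, and the test set is always the next q days.
    """
    n = len(series)
    if p < 1 or q < 1 or p + q > n:
        raise ValueError("need p >= 1, q >= 1 and p + q <= len(series)")
    # Training sets end at index p, p + 1, ..., n - q (exclusive end).
    for end in range(p, n - q + 1):
        yield series[:end], series[end:end + q]


def standardize(train):
    """Return z = (x - u) / s for the training sample, plus u and s."""
    u = mean(train)
    s = pstdev(train)  # population standard deviation (an assumption)
    if s == 0:
        raise ValueError("a constant training sample cannot be standardized")
    return [(x - u) / s for x in train], u, s


def destandardize(z_values, u, s):
    """Map standardized values back to the original scale of case counts."""
    return [z * s + u for z in z_values]


if __name__ == "__main__":
    cases = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]  # made-up cumulative counts
    for train, test in rolling_forward_splits(cases, p=5, q=2):
        z, u, s = standardize(train)
        print(len(train), test, round(u, 2), round(s, 2))
```

keeping u and s from the training sample only, and reusing them to convert forecasts back, avoids leaking information from the test days into the fit.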
the models are explained as follows: the arima model stands for the integration (i) of autoregressive (ar) and moving average (ma) models. box and jenkins [ ] first designed this model. arima may also be adjusted to consider seasonality, whose optimal value may be found by conducting a canova-hansen test [ ] . the optimal values of the autoregressive order (p), the degree of differencing (d) and the moving average order (q) may also be found by grid search. usually, we select the parameters that minimize the akaike information criterion (aic). articles such as benvenuto et al. [ ] , ceylan [ ] , and singh et al. [ ] bring examples of arima applications to covid- case forecasting. the general equations for ar and ma models are [ ] : $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$ and $y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}$, where $y_t$, $\varepsilon_t$, $\phi$, and $\theta$ are the observed value at time t, the value of the random shock at time t, and the ar and ma parameters, respectively. thus, an arma model is given by: $y_t = \alpha + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}$, where α is a constant. when dealing with non-stationarity, the data may be differenced, and the arima model is then fitted. cases [ , ] . the equations of the additive model follow. where s is the smoothed observation, l the cycle length, and t a period. the trend factor (b), the seasonal index (i), and the forecast at m steps (f ) are given by: the general logic of an svr is relatively simple. suppose a linear regression, whose objective is to minimize the sum of squared errors, $\min \sum_i (y_i - w_i x_i)^2$, where $y_i$ is the target, $w_i$ the coefficient and $x_i$ the feature. then, the training of svr aims to minimize the following system. random forest is a machine learning algorithm with many decision trees. breiman [ ] proposed a combination of bagging and random subspace methods. prophet is a forecasting approach developed by facebook. it employs a decomposable time series model, with three main model components: trend (g(t)), seasonality (s(t)) and holidays (h(t)). it also assumes an error representing any idiosyncratic changes that are not predicted by the model.
the model is $y(t) = g(t) + s(t) + h(t) + \varepsilon_t$, with $g(t) = (k + a(t)^{\top}\delta)\,t + (m + a(t)^{\top}\gamma)$, where k is the growth rate, δ is the vector of rate adjustments, m is the offset parameter, and $\gamma_j$ is set to $-s_j \delta_j$ to make the function continuous. another important aspect is that the model performs automatic changepoint selection, putting a sparse prior on δ. on the other hand, it relies on fourier series to incorporate daily, weekly, and annual seasonality. in the case of covid- , we are more concerned about weekly seasonality. in the context of covid- , prophet has seen few applications in forecasting the cumulative confirmed cases and deaths [ , ] . . model evaluation we evaluate the performance of each forecasting model in terms of r-squared ($R^2$), root mean square error (rmse), and symmetric mean absolute percentage error (smape). we perform the evaluations for each train/test pair created by the rolling forward splitting. thus, each metric is computed n − p − q + times. where n is the number of observations, and $y_i$ and $\hat{y}_i$ are the i-th observed and predicted values. in general, all machine learning models achieve better results than logistic regression. in fig. we can see how holt-winters performs in comparison to the other five models. note that we measure rolling forward performance according to the r-squared given by each cross-validation set. for each pair of models we can observe how holt-winters performs in comparison to another model over the periods we classify as ( ) exponential increase, ( ) after a sudden daily decrease and ( ) stability of daily new cases. similar evaluations to the prediction of confirmed cases can be extended to death cases. in this case, holt-winters still seems to be the most suitable model, along with arima. similar considerations can also be drawn for the municipalities of amapá. however, for small cities where data is scarce, most models we present here struggle to make predictions. in this case, even naive approaches seem to be a good alternative.
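the three metrics named in the model evaluation section can be sketched as follows for a single train/test pair. this is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and smape is taken in its common percentage form with the denominator (|y| + |ŷ|)/2, since the exact variant used in the study is not recoverable from the text.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over paired observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("r-squared is undefined for constant observations")
    return 1 - ss_res / ss_tot


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant, see lead-in)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        denom = (abs(y) + abs(f)) / 2
        # A pair where both values are zero is a perfect forecast.
        total += 0.0 if denom == 0 else abs(f - y) / denom
    return 100 * total / len(y_true)


if __name__ == "__main__":
    observed = [100, 200]   # made-up test window
    predicted = [110, 190]  # made-up forecasts
    print(rmse(observed, predicted))
    print(r_squared(observed, predicted))
    print(smape(observed, predicted))
```

in a rolling forward evaluation, these functions would be called once per train/test pair, giving one value of each metric per cross-validation split.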
will evolve is critical for local authorities to determine the best responses. thus, in this paper, we compared classical and machine learning models to forecast the evolution of covid- in the state. despite the volume of research papers pointing to machine learning models as those with the best performance for many locations, in the case of amapá, two classical approaches seem to perform better: holt-winters and arima. this may be a consequence of the amapaense data, which has marked seasonality and sudden variations. one advantage of these two models is that they are easier to [...] networks models, which may consider other feature sets in forecasting future numbers of cases. we also intend to propose a framework that indicates the best forecasting model for each municipality and period, saving time for local decision-makers. coronavirus disease (covid- ): situation report - coronavirus pandemic (covid- ). our world in data forecasting covid- cumulative confirmed cases: perspectives for brazil predicting turning point, duration and attack rate of covid- outbreaks in major western countries modeling and forecasting the covid- pandemic in mathematical modeling and epidemic prediction of covid- of the state of são paulo prediction of epidemic trends in covid- with logistic model and machine learning technics brazilian and american covid- cases based on artificial intelligence coupled with climatic exogenous variables covid- : coronavirus newsletters and cases by municipality per day. brasilio voluntary collective isolation as a best response to covid- for indigenous populations? a case study and protocol from the bolivian amazon. the lancet time series analysis: forecasting and control are seasonal patterns constant over time? a test for seasonal stability application of the arima model on the covid- epidemic dataset.
data in brief estimation of covid- prevalence in italy prediction of the covid- pandemic for the top affected countries advanced autoregressive integrated moving average (arima) model. jmir public health and surveillance application of arima and holt-winters forecasting model to predict the spreading of covid- for india and its states forecasting the novel coronavirus covid- . plos one the nature of statistical learning theory support vector regression machines future forecasting using supervised machine learning models covid- spread pattern using support vector regression. piksel: penelitian ilmu komputer sistem embedded and logic an introduction to kernel and nearest-neighbor nonparametric regression random forests. machine learning spatio-temporal estimation of the daily cases of covid- in worldwide using random forest machine learning algorithm outbreak prediction of covid- in most susceptible countries key: cord- -ufyzqgqk authors: aguilar-pineda, jorge alberto; albaghdadi, mazen; jiang, wanlin; lopez, karin j. vera; del-carpio, gonzalo davila; valdez, badhin gómez; lindsay, mark e.; malhotra, rajeev; lino cardenas, christian l. title: structural and functional analysis of female sex hormones against sars-cov cell entry date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ufyzqgqk emerging evidence suggests that males are more susceptible to severe infection by the sars-cov- virus than females. a variety of mechanisms may underlie the observed gender-related disparities including differences in sex hormones. however, the precise mechanisms by which female sex hormones may provide protection against sars-cov- infectivity remains unknown. here we report new insights into the molecular basis of the interactions between the sars-cov- spike (s) protein and the human ace receptor. we further observed that glycosylation of the ace receptor enhances sars-cov- infectivity. 
importantly estrogens can disrupt glycan-glycan interactions and glycan-protein interactions between the human ace and the sars-cov thereby blocking its entry into cells. in a mouse model, estrogens reduced ace glycosylation and thereby alveolar uptake of the sars-cov- spike protein. these results shed light on a putative mechanism whereby female sex hormones may provide protection from developing severe infection and could inform the development of future therapies against covid- . the novel coronavirus disease (covid- ) global pandemic caused by infection with the severe acute respiratory syndrome coronavirus (sars-cov ) virus has infected nearly million people worldwide resulting in nearly , deaths as july , . emerging data suggests that males are more susceptible to covid- infection and are at higher risk of critical illness and death than females [ ] [ ] [ ] . there has been consistent evidence of an increased case fatality rate (cfr) among males in nearly every country with available sex-disaggregated data including peru, france, greece, italy, mexico, pakistan, philippines and spain amounting to a . times higher cfr than females . understanding the mechanisms underlying enhanced covid- susceptibility and disease severity in males is key to developing new therapies and guiding vaccine development. changes in sex hormone concentration over an individual's lifetime and associated risk of comorbid conditions, such as cardiovascular diseases, may also contribute to variability in disease susceptibility and severity . it has been postulated that the male-biased sex divergence in covid- deaths could be, in part, explained by the strict relationship between sex hormones and the expression of the entry receptor for sars-cov , the angiotensin converting enzyme (ace ) receptor , . 
molecular studies have demonstrated that the male hormone testosterone regulates the expression of ace and the transmembrane serine protease (tmprss ) which is an androgen-responsive serine protease that cleaves the sars-cov- spike (s) protein and facilitates viral entry via ace binding [ ] [ ] [ ] . androgen-driven upregulation of ace levels may therefore be associated with increased vulnerability to severe infections in male patients with covid- . paradoxically, ace plays an important role in lung protection during injury which is attenuated by the binding of sars-cov- . the presence of a male-biased dependence in covid- susceptibility may imply the presence of a protective factor against sars-cov- infectivity in women. in addition to the ability of sex hormones to modulate expression of proteins related to entry into host cells, both estrogens and androgens are also able to directly modulate immune cell function via receptor-mediated effects , . additionally, sex chromosomes may mediate more favorable outcomes among women compared to men affected with covid- . x-linked genes associated with immune function tend to be expressed more often in females who generally have two x chromosomes compared to males . additional clues to the possible protective effects of estrogens have been suggested by differences in dietary patterns among countries with different cfrs . interestingly, countries with the lowest cfrs including japan and korea are the largest consumers of isoflavones-based foods, also known as phytoestrogens, that may also mediate favorable effects on ace expression and therefore covid- risk [ ] [ ] [ ] . the observation that females and those individuals consuming higher levels of isoflavones may be protected from covid- infection and adverse consequences indicates a potential protective role of estrogens against sars-cov- . 
here, we examine the role of two estrogen molecules ( β-diol and s-equol) to modulate the ace -dependent membrane fusion protein and reduce cell entry of the sars-cov spike protein into lung cells. to the best of our knowledge, we report new findings regarding the importance of molecular interactions between hace and the viral spike (s) protein. furthermore, we provide insights into the molecular basis for our observations that estrogens impair sars-cov entry and highlight the potential for estrogens as an agent in patients with covid- . glycosylation site-mapping of human ace and sars-cov- spike interactions. recent studies , have shown the ability of the sars-cov virus to utilize a highly glycosylated spike (s) protein to elude the host's immune system and bind to its target membrane receptor, ace , thus enabling entry into human cells. based on the structural complementarity and steric impediments between the s protein and human ace (hace ) protein membranes, we mapped the glycosylation sites of both models [ ] [ ] [ ] [ ] and performed molecular dynamics simulations (mds) by ns to stabilize the glycosylated sars-cov spike (s) and hace complex (suppl. table ., suppl. figure and figure a ). these analyses revealed that glycosylation of the ace protein increases the affinity of the virus s protein to interact with the receptor via glycan-glycan interactions, glycan-protein interactions, hydrogen, and hydrophobic bonds (suppl. table . and figure b) . notably, glycan-glycan interactions occur between the ace glycan at n and n and glycans found on the spike's receptor binding domain (s-rbd) at n and n (figure c , left panel). despite the close interaction between ace and s-rbd glycans, their affinity to anchor with highly negatively charged molecules such as the ace protein remains unalterable (figure c right panel) suggesting that glycan and electrostatic-dependent surface tethering may represent a plausible mechanism for ace-s-rbd binding and cell infection. 
the glycan-protein interactions occur between the ace glycan at n and the residues of the s-rbd at n , s , n , l , v , g , v , q , t and q (figure d ). while ace residues at d , y , w and g form hydrogen bonds with residues of the s-rbd at n , d , s , n and e (figure f ). multiple distinct clusters of hydrophobic residues at the ace surface were also found to interact with the s-rbd protein (suppl. figure ). importantly one key hydrophobic region on the ace surface at t interacts with five residues of the s-rbd (p , g , f , g and y ), (figure g ). given the insights afforded by our in silico mds experiments, we sought to explore the impact of ace glycosylation on s-rbd cell entry using cultured human umbilical vein endothelial cells (huvecs). a variety of saccharide substrates were utilized for their ability to modulate glycosylation profiles in cells. the glycosylation pattern of the endogenous ace was increased in nearly all treated cells ( figure h ). notably co-incubation of huvecs with ug of recombinant s-rbd (rs-rbd) protein revealed that glucose ( mm) pre-treatment was associated with the greatest degree of rs-rbd entry into the cells by ~ fold compared with hypoglycemic media (hbss or optimen) cells ( figure g ). this model indicates that glycosylated residues surrounding the cavity at the top of the ace molecule could increase binding by the s-rbd. given the possibility that occupancy at glycosylated residues or s-rbd binding sites by estrogens could modify the affinity of the sars-cov virus and alter entry into the cell thereby reducing infectivity, we sought to further examine these interactions using a range of complementary experimental approaches (see table s ). estrogens bind to hace and stimulate its stabilization and internalization. 
in an effort to explore the potential protective effects of female sex hormones against sars-cov- infection, we examined the impact of estradiol ( β-diol) and a dietary-derived phytoestrogen (s-equol) on hace structure and protein expression by a combination of in silico modeling, in vitro, and in vivo analysis. specifically, in light of the importance of glycan-glycan interactions that mediate virus-ace interactions, we sought to analyze the effect of estrogens on key molecular viral and receptor binding sites. in agreement with yan et al. , we identified three important regions on the ace surface that are utilized for sars-cov- binding. the environment of these regions is composed of a high density of glycans, including a helix α from residues i to t , a helix α from residues v to m , and one loop from residues k to g (suppl. figure and figure a ). we then homogenously solvated the glycosylated hace structure with . mm of β-diol or . mm of s-equol followed by ns of mds. remarkably we found that the βdiol molecules interact with residues at f , y , q , t , q , m and the s-equol molecules interacts with residues at q , k , t , f , k , e , l , n , d , k , a , f , e , q and l (figure b , supp. table # ). both estrogen molecules energetically stabilized the α and α helices by physical interactions and thereby minimized the fluctuation of the ace chains a and b (figure c , supp #). importantly , our calculation of free-energy landscape (fel), demonstrated that the surface of chain b of ace (s-rbd's preferred interaction region) loses its interaction energy with the s-rbd protein from . kj/mol to . kj/mol ( %) for the β-diol system and to . kj/mol ( %) for the s-equol system ( figure d ). in addition, binding of either estrogen molecules to the surrounding hydrophobic pocket of ace at the residue t promotes a decrease in energy by ~ % which may have a negative impact on the attachment of the s-rbd protein to the receptor (suppl. figure ) . 
we also observed estrogen-glycan interactions particularly at the glycan-protein interactions between the ace (n ) and the s-rbd (n ) ( figure. a) . indeed, glycans are highly polar structures due to their high content of hydroxyl groups which make them suitable for attachment to the ace protein (mostly negatively charged) or the sars-cov- s-protein (polarly charged). the density functional theory (dft) calculation shows an important decrease of the glycan's molecular electrostatic potential (mep) due to the interactions with either estrogen molecules. therefore, estrogen-glycan interactions could decrease the adhesive effect of glycans that enhance s-rbd and ace receptor interactions (suppl. figure and figure. b). these structural analyses suggest that estrogens could act as putative ace ligands due to their ability to bind to highly energetic pockets at the top of the ace surface protein which may increase its conformational equilibria and potentially boost its internalization to the cytoplasm. to support our in-silico analyses, we treated huvecs with β-diol ( nm) or s-equol ( nm) overnight under normal physiologic conditions. immunofluorescent staining demonstrated that estrogen-treated cells have less ace membrane cellular localization ( figure c ). immunoblot analysis revealed that endogenous and dietary estrogens promote ace internalization and degradation through the endocytosis process as assessed by lc b and lamp protein activation in treated cells ( figure. d). to test the hypothesis that lower levels of estrogens are associated with increased levels of ace protein in the respiratory tract, we administrated intratracheally either β-diol ( . μm) or s-equol ( μm) to male mice. 
histologic analysis of lung sections demonstrated that both forms of estrogens decreased ace membrane expression levels in lung alveoli and also reduced the glycosylation of the ace receptor ( figure e & f). estrogens interfere with sars-cov- receptor binding and block entry into the cell. to determine if the decline of conformational gibbs free energy and gain in stabilization of ace due to estrogen binding could affect the ability of the s protein to interact with the ace receptor and thereby its entry into cells, we performed a refinement step of ace -free or ace -estrogen models with ns of mds followed by molecular docking with the sars-cov- s protein. from structures obtained, with top scores were chosen for further analysis (suppl. table ). the ace - β-diol model promoted the shift of s-rbds from the binding surface toward the lateral side of the ace protein, decreasing the number of contact residues. notably, s-rbds completely lose contact with key ace -glycosylated residues at n , n , n and n . we also observed that the contact between the s-rbd and the helix α and α of ace moved toward the n-terminal of the helix and thus affected the ability to bind the receptor. in the same manner, the ace -s-equol model demonstrated that s-equol blocks the contact between the s-rbds and the receptor's surface, notably promoting novel interactions at the c-terminal of the helix α causing nonspecific contacts with the receptor at residues q -i and p -n . interestingly, we found that the β-diol interacts with residues on the surface of the receptor and notably forms a cluster on glycans at n (chain a) and n (chain b). on the other hand, the s-equol molecules tend to interact more widely, accounting for a total of interactions, including on residues on the chain a and residues on the chain b. (for better visualization, only the top scored s-rbd structures are shown in figure a ).
the nonspecific binding by the s-rbds could be explained by the susceptibility of ace to interact with polar molecules and especially to electrophilic attacks. the fact that the β-diol or s-equol contain few polar groups but are deficient in negative charge renders them more susceptible to attack the surface of hace thereby blocking s-rbd from binding correctly. in addition, we computed the binding score of these models using the atomic energy contact function and in agreement with our previous docking results observed that both estrogen molecules significantly reduced the atomic energy contact between virus and receptor. remarkably, the β-diol reduced the atomic contact by % and the s-equol by % indicating that the entry of the virus may be affected by the presence of either estrogen molecules (figure b ). to validate our in-silico prediction, we pre-treated huvecs with either estrogen molecules followed by incubation with μg of rs-rbd protein overnight. importantly either low or high concentration of β-diol (low= nm) & high= nm) or s-equol (low= nm & high= nm) blocked more than % of the rs-rbd protein entry into the cell as assessed by immunofluorescence and colocalization with lamp , a lysosome marker ( figure c ). in addition, immunoblot demonstrated a decrease of rs-rbd levels in the cytoplasm of huvecs for both estrogen-based treatments (figure d ). together these results suggest a potential molecular mechanism by which estrogens may provide protection against severe infection in covid among women and individuals with phytoestrogen intake. next, we sought to test the ability of estrogens to block key interactions between ace and the sars-cov- s protein and thereby infection of the respiratory tract. male wild-type mice were treated with β-diol ( . μm) or s-equol ( μm) via intratracheal instillation for hrs before tissue collection. 
elisa-based binding assay showed a significant decrease of ace affinity to sars-cov- s protein in lungs from mice treated with either estrogen molecule compared with the control group (figure a ). we then evaluated in vivo whether intratracheal estrogen treatment would reduce internalization of the s protein in male mice. we observed that intratracheal instillation of both estrogen molecules hrs before intratracheal instillation of rs-rbd ( μg, overnight treatment) increased the signal for rs-rbd on the surface of lung cells, which likely results from reduced binding to ace in estrogen-treated mice compared to the untreated group. in contrast, control (dmso-treated) lungs showed normal ace membrane localization and cytoplasmic rs-rbd signal, indicating unperturbed intake of the rs-rbd protein. the observed increase in extracellular rs-rbd in alveolar cells from lungs of estrogen-treated mice suggests an estrogen-mediated reduction in internalization of the s protein. indeed, we observed that pretreatment with estrogen resulted in rs-rbd protein accumulating on the surface of the alveolar cells (figure b ) rather than being internalized into the cytoplasm, where it would support viral replication and disease progression. our data show that estrogens may interfere with sars-cov- infection in the respiratory tract through direct interaction with the ace receptor in vivo.
increased susceptibility and risk of adverse clinical outcomes among males affected by covid- has been reported in multiple epidemiological studies [ ] [ ] [ ] [ ] . androgens can effectively upregulate viral target proteins that may increase viral entry and pathogenicity in patients following exposure to the sars-cov virus, and sex-related hormones can modulate the immune response. a detailed understanding of the molecular and cellular mechanisms modulated by estrogen that contribute to viral pathogenicity is therefore critical to the development of new therapies to combat the covid- pandemic.
besides the epidemiologic evidence suggesting that females are protected from severe infection, a recent study has demonstrated that the female reproductive tract expresses very low levels of the ace receptor and almost undetectable tmprss , suggesting that the virus is unlikely to infect the female reproductive tract, where female sex hormones are produced , . in the current study, we used in silico, in vitro, and in vivo approaches to characterize important glycosylation-mediated interactions between the sars-cov virus spike (s) protein and the human ace receptor that can be modulated by endogenous or dietary estrogens in a manner that may be protective against sars-cov entry into human cells. previous studies have highlighted the critical role of glycosylation in viral pathobiology, host immune system evasion, and infectivity in a range of human viral illnesses . in many of these viruses, the viral envelope and secreted proteins are extensively glycosylated, which is necessary for the structural integrity and functionality of these proteins. viral proteins may be glycosylated by the host cell, as viruses are able to hijack cellular glycosylation processes. however, little data exists on how glycosylation of host proteins necessary for viral entry, such as ace , affects viral infectivity. using a novel molecular simulation approach, we demonstrated that ace glycosylation augments binding of the viral s protein by supporting multiple types of interactions, including glycan-glycan and glycan-protein interactions, thereby increasing the stability and affinity of viral binding to the target host receptor. we extend these in silico observations by also demonstrating that entry of the s-rbd can be augmented in vitro by exposure of cultured huvecs to a hyperglycemic environment that increases ace glycosylation. these observations provide insights into the enhanced susceptibility of diabetic patients to severe infections and death in covid- [ ] [ ] [ ] .
based on these findings that ace glycosylation enhances interaction with the viral s protein in silico, we explored whether the predominant endogenous form of estrogen, β-diol, may provide a protective effect as assessed using in silico modeling of viral s protein-ace interaction and in vitro and in vivo models of viral infectivity. in addition, we used an identical approach to understand the potential protective mechanisms of dietary phytoestrogens on sars-cov infectivity observed in populations with low cfrs where consumption of these foods is high. we found that both endogenous and dietary estrogens compete with the s-rbd protein to bind specific sites on hace that are used by the virus to bind the receptor. indeed, estrogens were found to bind at almost all sites including ace glycans causing a reduction of energy on the surface of the receptor rendering the receptor less susceptible to interact with other molecules via reduced cell surface expression including the viral s protein interactions. our findings that estrogens interfere with s protein and ace interactions in silico that is associated with reduced s protein uptake in an in vitro model of sars-cov- infectivity in cultured human endothelial cells are consistent with prior studies demonstrating that estrogens have antiviral properties against hiv, ebola and hepatitis viruses . additionally, recent evidence indicates that decreased levels of estrogens in post-menopausal women are an independent risk factor for disease severity in female covid- patients . the findings of the current study thus represent novel findings in our understanding of the molecular mechanisms underlying reduced susceptibility to sars-cov- among females or individuals with depressed estrogen levels and in countries where dietary estrogens are high. we then examined the ability of estrogen molecules to interfere with s protein uptake into pulmonary epithelial cells using an in vivo model of sars-cov infectivity. 
in agreement with our cellular experiments, lungs from mice treated with dietary or endogenous estrogens demonstrated a dramatic reduction in the uptake of s-rbd. in addition, we observed a remarkable reduction of ace binding, possibly due to the low protein levels of ace in those lungs, which may in turn reflect estrogen-mediated degradation. in conclusion, we provide a molecular basis that helps elucidate the potential protective effect of estrogens in women infected by the sars-cov- virus, which could inform the development of future therapeutic measures to protect against sars-cov- infection, including the design of suitable blocking antibodies, estrogen-related treatments, and vaccine development.
for immunofluorescence, huvec cells were cultured in -well lab-tek ii chamber slides (nunc) and were then treated with either β-diol at nm or s-equol at nm. cells were rinsed twice with ice-cold pbs, fixed with % paraformaldehyde in pbs (pfa, boston bioproducts) for min at rt, and permeabilized with . % triton-x (sigma-aldrich) for min. the slides were blocked with % donkey serum and . m glycine in pbs-tween ( . %) for h at rt. subsequently, the antibodies anti-ace ( : ), s-rbd-his-tag ( : ), anti-lamp ( : ) and anti-lc b ( : ) were added and slides were incubated overnight at °c. the slides were then washed times for min each with pbs-t and were incubated with secondary antibodies at : dilution for hr at room temperature. following immunostaining, slides were mounted with diamond mounting medium containing dapi (thermo fisher). slides were then visualized with the leica tcs sp confocal microscopy station and the pictures were digitized with the leica application suite x software.
huvec cells were rinsed twice with ice-cold pbs and proteins were extracted with m-per for whole cell lysis (thermo fisher). these lysis buffers contained halt protease, phosphatase inhibitors and edta (thermo fisher).
the protein concentration was determined by the colorimetric bicinchoninic acid assay (bca assay, thermo fisher). equal amounts of total protein from cell lysates were separated by sds-page ( μg or μg for ace , lamp , lc b and rs-rbd-his-tag, respectively). proteins from the gel were then electro-transferred onto . μm nitrocellulose and . μm pvdf membranes. the membranes were then blocked for h at room temperature, with either % non-fat powdered milk dissolved in tbs-t or % bovine serum albumin in tbs-t, for the nitrocellulose and pvdf membranes, respectively. following blocking, membranes were incubated overnight at °c with the primary antibodies anti-ace ( : ), anti-lamp ( : ), anti-lc b ( : ) and anti-his-tag ( : ). the odyssey infrared western system was used to detect target proteins. band intensity was quantified using imagej software.
all experiments involving mice were approved by the partners subcommittee on research animal care. personnel from the laboratory carried out all experimental protocols under strict guidelines to ensure careful and consistent handling of the mice.
mouse model of sars-cov- s protein entry. weeks old male c bl/ mice were purchased from the jackson laboratories, usa. to induce the recombinant s-rbd protein. briefly, mice were anesthetized with sevoflurane inhalation (abbott) and placed in dorsal recumbency. transtracheal insertion of a -g animal feeding needle was used to instill estrogen molecules, rs-rbd or vehicle (dmso), in a volume of µl. mice were sacrificed hrs after instillation of rs-rbd and lungs were removed for further analysis.
histology. lungs were then fixed in formalin ( %) for hours before transfer to % ethanol for photography prior to paraffin embedding and sectioning ( μm). slides were produced for tissue staining and quantitative analysis.
saccharide treatment: hypoglycemic media consisted of hbss buffer or opti-mem media. normal media contained complete endothelial cell growth media.
for hyperglycemic media, opti-mem was supplemented with d-glucose at mm, d-galactose at mm, d-ribose at μm, d-mannose at μm, or d-fructose at μm. huvecs at %- % confluence were supplemented with hypoglycemic, normal or hyperglycemic media hours before incubation with μg of recombinant s-rbd-his-tag overnight.
estrogen treatment: huvecs at %- % confluence were supplemented with opti-mem hours before treatment with complete growing media containing β-diol at a concentration of nm or s-equol at a concentration of nm for hrs. fresh media containing rs-rbd ( μg) was supplied the next day. prior to cellular collection, cells were washed with sterile pbs, and protein extraction was performed as described above.
rs-rbd-ace binding assay. μg of total protein extracts from mouse lungs were cleaned up with iga/igg agarose beads for hr at c on a rotator, followed by resuspension in assay diluent at x. then μl of each lysate containing , , , , , or μg of total protein were placed into the corresponding well of a covid s-protein microplate (cat#: cov-sace , ray biotech, inc.) for overnight incubation at c on a rotator. the supernatant was then removed, and wells were washed x followed by incubation with x hrp-conjugated secondary antibody solution for hr at room temperature. then μl of tmb one-step substrate reagent was added to each well for min at room temperature. before reading, μl of stop solution was added and the microplate was read at nm.
results are given as mean ± sd. student's t test ( -tailed) was applied to determine the statistical significance of differences between control and treated groups (*p < . , **p < . and ***p < . ). for all experiments, at least experimental replicates were performed. violin plot graphs show mean ± sd. data were analyzed, and graphs were prepared, with prism . (graphpad software). p values of less than . were considered statistically significant.
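the group comparison described above (a two-tailed student's t test between control and treated groups) can be sketched as follows. this is a minimal illustration using only the python standard library and the classic pooled-variance form of the test; the function names and any input values are hypothetical, and the authors report using prism for the actual analysis.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Return (t, df) for a pooled-variance two-sample Student's t test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance assumes both groups share the same true variance.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson integration of the density over [-|t|, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    s = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * s * h / 3  # integral over [-t, t] by symmetry
    return max(0.0, 1 - central)
```

a call such as `two_tailed_p(*student_t(control, treated))` would then be compared against the chosen significance threshold. for unequal variances, welch's variant would be the more cautious choice.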
the crystal structures used in this work were pdb id: vxx for the spike protein (sp, trimeric form) of sars-cov- and pdb id: m (of rbd/ace -b at complex) for the ace protein (dimeric form), both obtained from the rcsb protein data bank. the missing residues of sp located in the n- and c-terminal domains (m- to p and f to h , respectively) were not considered in the molecular simulations. therefore, each sp chain was made up of residues (a to s ). in our sp model, another disulfide bond was recognized between missing residues c and c and was considered in the md simulations. for ace , residues in the n-terminal domain were excluded (m- to t ) and in the c-terminal domain only extramembrane residues were considered (to i to g ). the ace structure contains two zinc ions in the peptidase domain, which were considered in this work. the remaining missing residues of both proteins were added using the swissmodel server (s-t ). the ace and sp structures are considered glycoproteins and the glycan-linked residues have already been reported [ ] [ ] [ ] [ ] . the oplsaa-based doglycans software was used for building all models (s-t ). for sp, there are n-glycosylation residues on its surface, but the n , n , n and n sites were excluded because they lie outside the residues considered in our sp model. o-glycosylation sites were not included either. in ace , all n-glycosylation sites were considered. the glycosylation process was carried out using the glycan glcnac man model, a glycoside sequence composed of n-acetyl glucosamines and mannoses (s-f b). this glycan type is the most common core sugar sequence on n-glycans , .
estrogen solvated systems. the systems were constructed with the average structure of glycosylated ace , obtained from the last ns of the mds trajectories. β-diol and s-equol structures were quantum optimized and their force fields were constructed using the ligpargen server [ ] [ ] [ ] . the previous simulation box of ace was augmented .
nm in all directions and, with the protein centered in the box, was solvated twice using the gmx solvate module. the first solvation was performed homogeneously with βdiol and s-equol molecules ( . and . mm solutions, respectively). in the second solvation, explicit water molecules were added to fill the simulation box. in the solvation process, we made sure that the estrogen molecules were not close to the protein at the start of the md simulations.
all quantum simulations were performed using density functional theory (dft) at the b lyp/ tzvp level , . the self-consistent reaction field (scrf) theory was used for describing the solvent effects on the molecules in water solutions. the calculations were performed in the electronic structure program gaussian and results were visualized in gaussview v. . the molecular structures of β-diol and s-equol were optimized, and frequency analysis was used to confirm that they were at an energy minimum. these optimized structures were used in the molecular dynamics simulations. in the meps analysis, single point calculations were carried out and total electron densities were mapped on the molecular electrostatic potential surface.
to address the structural interactions, we performed molecular dynamics simulations using gromacs (v. . ) with the opls/aa force field parameters . the protein complexes were solvated with the tip p explicit water model . in addition to na + counterions used to neutralize the total charge in the simulation box, we used a mm nacl concentration to mimic physiological conditions. all molecular systems were built in a triclinic simulation box considering periodic boundary conditions (pbc) in all directions (x, y and z). the minimum distance from the surface atoms of the proteins to the edge of the periodic box was . nm for the ace receptor and sars-cov- spike protein, and . nm for the ace -estrogen solvated systems. the equations of motion were integrated with the leap-frog integrator using a time step of fs.
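to make the integration scheme concrete, the leap-frog update can be sketched on a one-dimensional harmonic oscillator. this is a toy illustration and not the gromacs implementation; the force function, mass, spring constant, step size and the function name `leapfrog` are all assumptions chosen for the example.

```python
def leapfrog(x, v_half, force, mass, dt, n_steps):
    """Leap-frog integration: velocities live at half steps, positions at full steps.

    v(t + dt/2) = v(t - dt/2) + dt * F(x(t)) / m
    x(t + dt)   = x(t) + dt * v(t + dt/2)
    """
    traj = [x]
    for _ in range(n_steps):
        v_half = v_half + dt * force(x) / mass  # kick using the current position
        x = x + dt * v_half                     # drift using the new half-step velocity
        traj.append(x)
    return traj, v_half


# Toy system (assumed): harmonic spring, F = -k * x, released from rest at x = 1.
k, m, dt = 1.0, 1.0, 0.01
x0, v0 = 1.0, 0.0
# Shift the initial velocity back by half a step: v(-dt/2) ~ v0 - (dt/2) * F(x0) / m
v_start = v0 - 0.5 * dt * (-k * x0) / m
traj, _ = leapfrog(x0, v_start, lambda x: -k * x, m, dt, 628)
```

after roughly one oscillation period the position should return close to its starting value, and the amplitude should stay bounded, which is the property that makes this family of integrators a common choice for long md runs.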
temperature in the simulations was maintained at . k using the modified berendsen thermostat (v-rescale algorithm) with = . ps coupling constant, with protein and water-ions coupled separately. pressure was maintained at bar using the parrinello-rahman barostat with a compressibility of . x − bar - and a coupling constant of = . ps. all simulations were carried out with a short-range non-bonded cut-off of . nm, and the particle mesh ewald (pme) method was used for computing long-range electrostatic interactions with a tolerance of x for the contribution in real space. the verlet neighbor searching cut-off scheme was applied with a neighbor-list update frequency of steps ( fs). bonds involving hydrogen atoms were constrained using the linear constraint solver (lincs) algorithm . simulations were first energy minimized using the steepest descent algorithm for a maximum of , steps. the equilibration was conducted in two steps: the first step was ns of dynamics in the nvt (isothermal-isochoric) ensemble, and the second step continued for another ns in the npt (isothermal-isobaric) ensemble. production runs were performed in the npt ensemble for ns for ace and sars-cov- spike protein and ns for ace -estrogen solvated systems.
structure and data analysis. the structural interactions were obtained by carrying out a rigid-body docking analysis using the patchdock server in order to obtain the contact residues between the s-rbd and ace systems. the patchdock algorithm discards all unacceptable complexes, and results are sorted by geometric shape complementarity score. in addition, patchdock calculates the effective atomic contact energies according to zhang et al. the molecular docking was done taking the ace protein as the receptor molecule and the spike protein as the ligand molecule. the clustering rmsd value and complex type were selected according to the recommended parameters for protein-protein interactions ( . Å and default mode).
for ace -estrogen systems, the docking was performed in the presence of the estrogen molecules bound to the ace structure, β-diol molecules and s-equol molecules, respectively. from the total results obtained in molecular docking, those structures that had steric impediments (intermembranal clashes) were discarded. the steric impediments were calculated based on the size of the sars-cov- virus , , whose diameter varies from about to nm and whose spike protein is about to nm in length ( and nm on average, respectively). molecular interactions were analyzed with ligplot software, and the required pdb files were constructed with our own fortran-based programs and the statistical data results. statistical results (rmsd, rmsf, rg, sasa, hydrogen bonds, free energies, matches, structures, movies, b-factor maps) were obtained using gromacs modules and their different tool options. the analysis of structure properties was performed using md trajectories over the last ns of each simulation; visualization of the md simulations was created using visual molecular dynamics (vmd) software and the graphs were plotted with the xmgrace software .
each molecular conformation during an md simulation has an associated energy, and this can be observed using fel maps. these maps are usually represented by two variables related to atomic position and one energetic variable, typically the gibbs free energy. this free energy can be estimated from probability distributions of the system with respect to the chosen variables, which are then converted to a free-energy value by boltzmann inversion of multi-dimensional histograms. when represented in three dimensions, the fel maps show the energy range of all conformations sampled during a simulation. in this work, we considered two substructures of the ace protein for fel map analysis, the alpha - region (i to y ) and loop regions l - and l - (d to r ).
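the boltzmann inversion step described above can be sketched as follows. this is a minimal illustration, not the tool used by the authors: it bins two position-related variables (for example rmsd and radius of gyration) into a two-dimensional histogram and converts bin populations to a free energy via G = -kT ln(P / P_max). the function name, the bin count, the default temperature of 300 k and any input values are assumptions made for the example.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_landscape(xs, ys, bins=10, temperature=300.0):
    """Return {(i, j): G} in kJ/mol for every populated 2D histogram bin.

    G = -kT * ln(P / P_max), so the most populated bin has G = 0.
    """
    def index(value, lo, hi):
        # Map a value to its bin; the upper edge falls into the last bin.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    counts = Counter(
        (index(x, x_lo, x_hi), index(y, y_lo, y_hi)) for x, y in zip(xs, ys)
    )
    p_max = max(counts.values())
    kt = K_B * temperature
    # Ratios of counts equal ratios of probabilities, so no normalization is needed.
    return {cell: -kt * math.log(n / p_max) for cell, n in counts.items()}
```

unpopulated bins are simply absent from the returned dictionary, which corresponds to conformations never visited in the trajectory; a plotting layer would typically render them as the maximum of the energy scale.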
the fel maps were plotted using the gmx sham module, with the rmsd and radius of gyration taken as atomic position variables with respect to the average structure, and figures were constructed using wolfram mathematica . . .
estrogens bind to ace glycans to promote its internalization. (a) glycan-estrogen interactions stabilize ace structure through high-energy contacts involving ace glycan-residues at e , n , k and v (red color). (b) mep maps show the electrostatic impact of estrogen molecules on the surface of ace glycans. energy scale ranging from - . μa (red) to . μa (blue). (c) immunofluorescence staining of human ace (magenta) and the lysosome marker lamp (green) shows loss of ace membrane levels in huvecs treated with β-diol or s-equol compared with control cells (dmso). (d) immunoblot of lysates isolated from huvecs showing decreased levels of total ace protein with estrogen treatment. reduced ace protein levels were associated with increased endocytosis activity as evidenced by immunoblot for lc b and lamp . (e) histologic analysis of mouse lungs after hrs of intratracheal instillation with β-diol or s-equol shows loss of ace expression (red) on the membrane of alveolar cells. estrogen-treated lungs showed greater ace -lamp colocalization (white arrows), indicating internalization of the receptor. (f) immunoblot showing decreased levels of total and glycosylated ace proteins in estrogen-treated lungs from male mice compared to control lungs. quantification of protein levels of three replicate experiments is shown. student's t-test, tails. bar graphs are presented as mean with error bars (±sd).
surface interacting with top scored s-rbds (top -blue, -red, -orange, -purple and top yellow). s-rbds were scored based on shape complementarity principles. (b) heatmap of atomic contact energy between ace and s-rbds, showing spontaneous energy structures from most favorable (green) to less favorable s-rbd structures (red).
energy scale ranging from kcal/mol to - kcal/mol. (c) immunofluorescence analysis of s-rbd entry into huvecs pretreated with β-diol or s-equol followed by treatment with μg/ml of recombinant s-rbd (red) demonstrates that estrogen-treated cells had reduced entry of s-rbd into cells in conjunction with a reduction in ace internalization, as shown by colocalization with lamp (green). (d) immunoblot of isolated proteins from cultured huvecs shows a % reduction of s-rbd entry into cells in estrogen-treated cells. quantification of protein levels of three replicate experiments is shown. student's t-test, tails. bar graphs are presented as mean with error bars (±sd).
(a) elisa-based binding assay using lung protein lysates shows reduced sars-cov- s protein affinity for the ace receptor after treatment with either β-diol or s-equol. (b) immunofluorescence analysis of wild-type mouse lung treated with β-diol ( . μm) or s-equol ( μm) compared with control lung (dmso) demonstrates rs-rbd protein accumulation on the surface of the alveolar cells rather than internalization into the cell, where viral replication may occur. treatment with either estrogen also reduced ace protein expression (quantification in lower right panel). lamp (orange), ace (green) and rs-rbd (red). quantification of three replicate experiments is shown. student's t-test, tails. bar graphs are presented as mean with error bars (±sd).
a pneumonia outbreak associated with a new coronavirus of probable bat origin tripartite combination of candidate pandemic mitigation agents: vitamin d, quercetin, and estradiol manifest properties of medicinal agents for targeted mitigation of the covid- pandemic defined by genomics-guided tracing of sars-cov- targets in human cells circulating plasma concentrations of angiotensin-converting enzyme in men and women with heart failure and effects of renin-angiotensin-aldosterone inhibitors predictors of mortality in hospitalized covid- patients: a systematic review and meta-analysis impact of sex and gender on covid- outcomes in europe are sex discordant outcomes in covid- related to sex hormones? sars-cov- and male infertility: possible multifaceted pathology covid- and androgen-targeted therapy for prostate cancer patients sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structural and functional basis of sars-cov- entry by using human ace ace , much more than just a receptor for sars-cov- . front the x chromosome in immune functions: when a chromosome makes the difference the x-files in immunity: sex-based differences predispose immune responses considering how biological sex impacts immune responses and covid- outcomes correction: sex hormones promote opposite effects on ace and ace activity, hypertrophy and cardiac contractility in spontaneously hypertensive rats cross-country comparison of case fatality rates of covid- /sars-cov- . 
osong public health res whole versus the piecemeal approach to evaluating soy equol: history, chemistry, and formation beyond the cholesterol-lowering effect of soy protein: a review of the effects of dietary soy and its constituents on risk factors for cardiovascular disease developing a fully glycosylated full-length sars-cov- spike protein model in a viral membrane site-specific glycan analysis of the sars-cov- spike structural basis for the recognition of sars-cov- by full-length human ace structure, function, and antigenicity of the sars-cov- spike glycoprotein emerging covid- coronavirus: glycan shield and structure prediction of spike glycoprotein and its interaction with human cd . emerging microbes & infections lc -associated endocytosis facilitates β-amyloid clearance and mitigates neurodegeneration in murine alzheimer's disease gangliosides are essential endosomal receptors for quasi-enveloped and naked hepatitis a virus coronavirus disease- and fertility: viral host entry protein expression in male and female reproductive tissues female reproductive tract has low concentration of sars-cov receptors impaired estrogen signaling underlies regulatory t cell loss-offunction in the chronically inflamed intestine molecular mechanisms of sex bias differences in covid- mortality differential regulation and targeting of estrogen receptor α turnover in invasive lobular breast carcinoma sars-cov- has a sweet tooth glycosylation in health and disease potential influence of menstrual status and sex hormones on female sars-cov- infection: a cross-sectional study from multicentre in wuhan, china swiss-model: homology modelling of protein structures and complexes doglycans-tools for preparing carbohydrate structures for atomistic simulations of glycoproteins, glycolipids, and carbohydrate polymers for gromacs probing the glycosidic linkage: secondary structures in the gas phase potential energy functions for atomic-level simulations of water and organic and biomolecular 
systems . * cm a-lbcc: localized bond-charge corrected cm a charges for condensed-phase simulations ligpargen web server: an automatic opls-aa parameter generator for organic ligands density-functional thermochemistry. iv. a new dynamical correlation functional and implications for exact-exchange mixing fully optimized contracted gaussian basis sets for atoms li to kr gromacs: high performance molecular simulations through multilevel parallelism from laptops to supercomputers development and testing of the opls all-atom force field on conformational energetics and properties of organic liquids temperature and size dependence for monte carlo simulations of tip p water quiet high-resolution computer models of a plasma molecular dynamics with coupling to an external bath canonical sampling through velocity rescaling die berechnung optischer und elektrostatischergitterpotentiale a parallel linear constraint solver for molecular simulation patchdock and symmdock: servers for rigid and symmetric docking determination of atomic desolvation energies from the structures of crystallized proteins a novel coronavirus from patients with pneumonia in china science forum: sars-cov- (covid- ) by the numbers ligplot: a program to generate schematic diagrams of protein-ligand interactions vmd: visual molecular dynamics version . . mathematica, version key: cord- -v gv uf authors: salazar, cecilia; díaz-viraqué, florencia; pereira-gómez, marianoel; ferrés, ignacio; moreno, pilar; moratorio, gonzalo; iraola, gregorio title: multiple introductions, regional spread and local differentiation during the first week of covid- epidemic in montevideo, uruguay date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v gv uf background after its emergence in china in december , the new coronavirus disease (covid- ) caused by sars-cov- , has rapidly spread infecting more than million people worldwide. south america is among the last regions hit by covid- pandemic. 
in uruguay, first cases were detected on march th presumably imported by travelers returning from europe. methods we performed whole-genome sequencing of sars-cov- from patients diagnosed during the first week (march th to th) of covid- outbreak in uruguay. then, we applied genomic epidemiology using a global dataset to reconstruct the local spatio-temporal dynamics of sars-cov- . results our phylogeographic analysis showed three independent introductions of sars-cov- from different continents. also, we evidenced regional circulation of viral strains originally detected in spain. introduction of sars-cov- in uruguay could date back as early as feb th. identification of specific mutations showed rapid local genetic differentiation. conclusions we evidenced early independent introductions of sars-cov- that likely occurred before first cases were detected. our analysis set the bases for future genomic epidemiology studies to understand the dynamics of sars-cov- in uruguay and the latin america and the caribbean region. in december , a new coronavirus disease (covid- ) was detected in wuhan, china. its causative agent, known as sars-cov- , has spread rapidly causing a global pandemic of pneumonia affecting more than m people with more than , deaths to date [ ] . global spread of sars-cov- was primarily driven by international movement of people, since travel-associated cases of covid- were reported outside of china as early as mid-january [ ] . this global health emergency has deployed international efforts to apply genomic epidemiology to track the spread of sars-cov- in real time. the recent development of targeted sequencing protocols by the artic network [ ] , open sharing of genomic data through the gisaid (www.gisaid.org) database and straightforward bioinformatic tools for viral phylogenomics [ ] , provides the opportunity to reconstruct global spatio-temporal dynamics of the covid- pandemic with unprecedented comprehensiveness and resolution. 
south america was one of the last regions in the world to be hit by covid- . indeed, the first cases were confirmed in brazil around months after its emergence in china. then, sars-cov- rapidly emerged in neighboring countries like uruguay, which reported its first cases in montevideo, its capital city, on march th . the population of uruguay is among the smallest in south america (~ . m), and is mainly concentrated in montevideo and its metropolitan area (~ % of the total population). also, the biggest international airport of the country is located in this metropolitan area and concentrates more than % of international arrivals. these characteristics make montevideo a suitable place to apply genomic surveillance to uncover epidemiological patterns during the early phase of the covid- local outbreak in uruguay. we therefore aimed to characterize the spatio-temporal dynamics of sars-cov- by sequencing around % of the cases that occurred during the first week of the outbreak in montevideo, allowing us to identify transmission patterns, geographic origins and genetic variation among local strains.
ethics statement. residual de-identified nasopharyngeal samples were remitted to the institut pasteur montevideo, which has been validated by the ministry of health of uruguay as an approved center providing diagnostic testing for covid- . all samples were de-identified before receipt by the study investigators.
sample collection, processing and sequencing. nasopharyngeal swabs were obtained from patients residing in montevideo. no other location data was available to researchers, to avoid patient identification. swabs were placed in virus transport media (bd universal viral transport medium) immediately after collection. total rna extraction was performed directly from samples ( μl) using trizol reagent (invitrogen life technologies, carlsbad, ca, usa).
an in-house one-step rt-qpcr assay based on taqman probes targeting the n gene and the open reading frame b region was used to test for the presence of sars-cov- rna. we selected a total of positive samples with cycle-threshold (ct) less than for downstream whole-genome sequencing. sequencing of sars-cov- positive samples was performed according to the primalseq [ ] approach on the minion sequencing platform (oxford nanopore technologies, united kingdom) using the v primer pools. sequencing libraries were prepared based on the pcr tiling of covid- virus protocol (artic network) using the ligation sequencing kit (sqk-lsk ) and native barcoding expansion (exp-nbd ) (oxford nanopore technologies, united kingdom), with the following modifications: i) rna was reverse-transcribed using superscript ii reverse transcriptase (invitrogen life technologies, carlsbad, ca, usa) and random hexamer primers following the manufacturer's instructions, ii) phusion hot start ii high-fidelity pcr master mix (thermo scientific) was used for pcr amplification, and iii) blunt/ta ligase (new england biolabs, usa) was used to ligate barcodes to each sample. to detect cross-contamination between samples, a no-template control, taken from a patient who tested negative for sars-cov- , was included from the rna purification step. a total of ng of pooled samples was loaded onto a minion r . . flow cell and sequenced for hours, generating ~ million reads of average quality of . the rampart software from the artic network (https://github.com/artic-network/rampart) was used to monitor the sequencing run in real time, estimate coverage across samples and check barcoding. subsequently, basecalling was performed with guppy software v . . using the high accuracy module. consensus genomes were generated and variants were called using the artic network bioinformatic pipeline (https://artic.network/ncov- /ncov bioinformatic.sop.html). 
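as a rough illustration of how low-coverage positions can be masked when a consensus genome is built, the sketch below replaces every base whose read depth falls under a cutoff with "n" and reports the resulting "n" stretches. this is not the artic pipeline itself: the function names, the per-base depth list and the `min_depth` value used in the example are assumptions made for illustration, and the cutoff actually applied in the study is not restated here.

```python
def mask_low_depth(consensus, depth, min_depth):
    """Replace each base whose depth is below min_depth with 'N'.

    consensus: consensus sequence as a string
    depth:     per-position read depth, same length as consensus
    min_depth: hypothetical cutoff chosen by the caller
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N"
        for base, d in zip(consensus, depth)
    )


def n_stretches(seq):
    """Return half-open (start, end) intervals of consecutive 'N' characters."""
    stretches = []
    start = None
    for i, base in enumerate(seq):
        if base == "N" and start is None:
            start = i                      # a masked run begins here
        elif base != "N" and start is not None:
            stretches.append((start, i))   # the run ended at the previous base
            start = None
    if start is not None:                  # run extends to the end of the sequence
        stretches.append((start, len(seq)))
    return stretches


# Toy example: an 8-base consensus with two poorly covered regions.
masked = mask_low_depth("ACGTACGT", [30, 30, 5, 0, 30, 30, 2, 30], min_depth=20)
gaps = n_stretches(masked)
```

in practice the depth vector would come from the read alignment of each sample, and a whole dropped amplicon would appear as one long masked interval.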
amplicons that were not sequenced or whose depth was less than x were not included in the consensus sequences and these positions were represented by "n" stretches. see supplementary table s . phylogenetic and spatio-temporal analysis. to investigate the origin, dynamics and genetic variation of sars-cov- in uruguay, we added our genomes to a dataset of , complete (> , bp.), highquality genomes available from gisaid (https://www.gisaid.org) on april nd . a list of sequences and acknowledgements to the originating and submitting labs is presented in supplementary table s . we analyzed this dataset using the augur toolkit version . . [ ] . briefly, genomes were aligned using mafft [ ] and a phylogeny was reconstructed with iq-tree [ ] . estimation of ancestral divergence times and geographic locations for internal nodes of the tree, and identification of branch-specific or clade-defining mutations were obtained using augur and treetime [ ] . code for performing these analyses can be found at http://github.com/giraola/covid- -uruguay and results can be visualized at https://nextsrain.org/community/ giraola/covid -uruguay. genomes (hereinafter referred as cluster c -uy) were related each other and were placed within clade b (fig. ) . these clusters were supported by specific, non-homoplasic mutations (table ) . together, these results indicate three independent introductions of sars-cov- into uruguay. early sars-cov- circulation and regional spread. to uncover the spatio-temporal dynamics of sars-cov- in uruguay, we performed a phylogeographic reconstruction. the most probable ancestral location for c -uy was australia, however, very similar viruses have been also sequenced from the united kingdom (fig. ) . the last common ancestor of c -uy in australia dated back to mar th (confidence interval (ci): feb th -mar rd ). 
c -uy was embedded in a cluster of viruses sequenced in canada, the united states, australia and iceland whose emergence dated back to february th (ci: feb th -mar th ) (fig. ) . c -uy likely originated from european viruses circulating in spain since jan th (ci: jan th -feb th ). the last common ancestor of c -uy in uruguay was tracked to mar th (ci: feb rd -mar th ) (fig. ) . two viruses from chile were intermingled between viruses from spain and uruguay, indicating regional circulation of these variants. together, these results indicate that sars-cov- could be circulating in the country previous to its first detection in mar th . local clusters and genetic differentiation. to further characterize uruguayan sars-cov- , we identified genetic differences among genomes. c -uy ancestor likely located in australia presented one nonsynonymous mutation c t which affected the orf ab protein (s l) and genome uy- harbored one additional synonymous mutation a g. c -uy ancestor likely located in canada was characterized by one synonymous mutation a t and the last common ancestor to uy- and uy- presented another synonymous change a g. additionally, genome uy- presented one synonymous mutation c t. c -uy and chilean genomes displayed one synonymous mutations c t which distinguished them from the original spanish lineage. additionally, c -uy differentiated from chilean genomes by one synonymous mutation c t. specifically, among c -uy genome uy- presented a single synonymous mutation c t and genome uy- presented one non-synonymous mutation c a in the orf ab protein (t k). overall, we observed mutations within uruguayan clusters supporting rapid local differentiation (table ) . using portable, low-cost sequencing approaches based on the minion platform (oxford nanopore technologies) we were able to generate whole-genome sequences of sars-cov- just hours after receiving the samples in the lab. 
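the distinction between synonymous and non-synonymous mutations used above can be made concrete with a small sketch. it translates a reference codon and its mutated version with the standard genetic code and compares the two amino acids. the codons in the example are hypothetical and were chosen only so that the amino acid changes resemble those reported (serine to leucine, threonine to lysine); the actual codons and genome positions are not given in this text.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, codon_pos, alt_base):
    """Classify a single-base substitution inside one codon.

    ref_codon: three-letter reference codon (DNA or RNA alphabet)
    codon_pos: 0-based position of the changed base within the codon
    alt_base:  the new base
    Returns (kind, ref_amino_acid, alt_amino_acid).
    """
    ref_codon = ref_codon.upper().replace("U", "T")
    alt_base = alt_base.upper().replace("U", "T")
    alt_codon = ref_codon[:codon_pos] + alt_base + ref_codon[codon_pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# Hypothetical codons, for illustration only.
example_s_to_l = classify_substitution("TCA", 1, "T")   # C>T at the second base
example_silent = classify_substitution("ACC", 2, "T")   # C>T at the third base
example_t_to_k = classify_substitution("ACA", 1, "A")   # C>A at the second base
```

a real annotation step would also need the reading frame of each gene to locate the codon that contains a given genome position.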
this allowed us to determine the most plausible spatio-temporal scenario during the first week since covid- was detected in uruguay. consequently, we show that real-time molecular epidemiology responses can be implemented locally as previously done by other countries [ ] . our results uncovered sars-cov- genomes derived from three independent international origins as early as days after the disease was reported in uruguay. international air transport has likely played a central role in the rapid spread of sars-cov- , contributing to the current covid- pandemic. according to air transport statistics from the world bank (https://data.worldbank.org) and the national government we estimated these introductions could have occurred as early as mid-february, around month before first cases were reported in uruguay in march th . this is in line with observations from other geographic regions, for example, in new york first cases were confirmed in march but a recent study suggested that initial virus introductions could be traced back to february th [ ] . also, the virus was likely introduced in europe by a traveler that arrived from wuhan to france, but the recent identification of a group of ill chinese tourists in france previous to this case supports earlier introductions [ ] . indeed, sars-cov- has been proposed to circulate in humans even before its first detection in december [ ] . together, our results highlight the importance of active genomic surveillance during the ongoing covid- pandemic to guide informed decisions, based on assessing the epidemiological behavior of sars-cov- in real time. the application of rapid genomic epidemiology during local outbreaks allows to identify main routes of virus introduction and estimate spatio-temporal dynamics which can be used to improve contingency measures. 
remarkably, tracking the emergence of local genetic variants like those we observed in montevideo, is important to evaluate the sensibility of molecular diagnosis and potential impacts on future anti-viral strategies. subsequent generation of a more comprehensive genomic dataset from the ongoing outbreak in uruguay will allow us to identify domestic transmission patterns and spatio-temporal characterization of local clusters. additionally, setting up coordinated efforts to generate genomic data in south america will allow us to perform integrative analyses to uncover sars-cov- dynamics at the continental level. an interactive web-based dashboard to track covid- in real time novel coronavirus ( -ncov) situation report - multiplex pcr method for minion and illumina sequencing of zika and other virus genomes directly from clinical samples | nature protocols nextstrain: real-time tracking of pathogen evolution an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar mafft multiple sequence alignment software version : improvements in performance and usability iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies treetime: maximum-likelihood phylodynamic analysis real-time, portable genome sequencing for ebola surveillance introductions and early spread of sars-cov- in the new york city area early release -early introduction of severe acute respiratory syndrome coronavirus into europe a genomic perspective on the origin and emergence of sars-cov- acknowledgements. we thank josh quick and the artic network for providing sars-cov- sequencing primers and support with sequencing protocols. 
we also thank to everyone who openly shared their genomic key: cord- - jgtxg authors: choudhury, abhigyan; chandra das, nabarun; patra, ritwik; bhattacharya, manojit; mukherjee, suprabhat title: in silico analyses on the comparative sensing of sars-cov- mrna by intracellular tlrs of human date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jgtxg the worldwide outbreak of covid- pandemic caused by sars-cov- leads to loss of mankind and global economic stability. the continuous spreading of the disease and its pathogenesis takes millions of lives of peoples and the unavailability of appropriate therapeutic strategy makes it much more severe. toll-like receptors (tlrs) are the crucial mediators and regulators of host immunity. the role of several tlrs in immunomodulation of host by sars-cov- is recently demonstrated. however, the functionality of human intracellular tlrs including tlr , , and is still being untested for sensing of viral rna. this study is hoped to rationalize the comparative binding and sensing of sars-cov- mrna towards the intracellular tlrs, considering the solvent-based force-fields operational in the cytosolic aqueous microenvironment that predominantly drive these reactions. our in-silico study on the binding of all mrnas with the intracellular tlrs shown that the mrna of nsp , s , and e proteins of sars-cov- are potent enough to bind with tlr , tlr , and tlr and trigger downstream cascade reactions, and may be used as an option for validation of therapeutic option and immunomodulation against covid- . the worldwide outbreak of coronavirus disease or covid- pandemic caused by the severe acute respiratory syndrome coronavirus- (sars-cov- ), leads to the infection of about . % of the total world's population causing the death of . million people by the first week of november . 
with the continuous expansion of contagion and pathogenesis, it is taking the shape of another dark age in the history of mankind, not only as a health crisis but also by bankrupting the global socio-economic status. sars-cov- belongs to the β-coronavirus genus of a b group of the coronaviridae family and is considered the third most virulent type, leading to the highest fatality rate in humans followed by sars-cov and mers-cov. it transmits from person to person mainly via close physical contact and by respiratory aerosols produced during coughing, sneezing, and even talking within proximity, although some recent studies indicate transmission through fecal matter and fomite-borne contamination. - the proteome of the virus consists of structural proteins, existing in three forms: the spike 's' protein, the envelope 'e' protein and the membrane 'm' protein. along with these, forms of non-structural proteins (nsps) combine to generate different catalytic models. the genome, however, is much simpler and comprises a , base long, positive-sense single-stranded rna molecule. this (+)ssrna genome makes it readily detectable by the intracellular toll-like receptors (tlrs), which have a high affinity towards nucleic acid-related pathogen-associated molecular patterns (pamps). several empirical studies have presented various propositions relating the intracellular tlr , , , and to the cytokine storm produced by the virus, which largely accounts for its lethality. cytokine storm is the incessant, extreme activation of cytokine production leading to a prolonged and consistent inflammatory response, which becomes almost continuous due to positive feedback loops in the tlr signaling pathways. these studies are quite successful in relating the role of tlrs to the recognition of oligonucleotide pamps and the triggering of the cytokine storm. 
the binding of spike protein with the human ace receptor triggers the pathogenesis of the sars-cov- , leading to the activation of tlrs to activate the proliferation and production of pro-inflammatory cytokines causing cytokine storm, those results in inflammations. , from previous studies, it has been found that the spike protein shows binding efficiency with the extracellular domains of tlrs including tlr , tlr and, tlr , with the strongest affinity with tlr . furthermore, the development of several in-silico multi-epitope-based peptide vaccine candidates against the sars-cov- has shown to be effectively binding with tlr , tlr and, tlr to regulate the tlr signaling pathways activation and proliferation. it has been found that targeting human tlrs, in an order to either block the binding of sars-cov- or by inhibiting the tlr activation and proliferation that induce the production of pro-inflammatory cytokines using certain tlr agonists, might be used as an effective therapeutic strategy against coronavirus disease. additionally, the continuous replication and generation of new virus within the host system indicates that there must be binding efficiency and potency with the intracellular tlrs including tlr , , , and in order to induce the severity and pathogenesis of the virus and further introduction to the new host. tlr is well known for its sensing capabilities of viral pamps and exists as a monomer that is attached to the membrane of endosomes, thereby it detects and binds to certain motifs in the invading viral rna. 
tlr expressed majorly in the cerebral cortex, lung, bronchus, breast, kidney, rectum, and smooth muscle tissues, and functions by adhering to the endosome and thereby binding to the viral rnas with high guanosine and uridine content to initiates a myd signal transduction resulting in activation of nf-κb, mitogen-activated protein kinase (mapk) cascades, as well as irf- and irf- activation via il- receptor- associated kinases (irak)- / / and tnf receptor-associated factor- / . the signaling finally induces the production of pro-inflammatory cytokines including il- β, il- , il- , tnf-α, and ifn-α. tlr is expressed predominantly in the lung and the peripheral blood leucocytes and plays a major role in recognizing gu-rich viral ssrnas including those of sars-cov- . tlr is predominantly expressed in the spleen, lymph node, bone marrow, and peripheral blood leukocytes and it recognizes cpg motifs in viral dnas. however, several studies suggest its role in sensing ssrna fragments generated by the sars-cov- genome. however, a lack of precise knowledge regarding the nature of oligonucleotides and their ligating affinity towards the tlrs still pertains to exist. this in context is abstaining the medical research community from developing certain therapeutic interventions that would have been vitally important in this hour of severity. our study is hoped to rationalize the picture and provide clues regarding the interaction range of the oligonucleotides towards the intracellular tlrs, considering the solvent-based force-fields operational in the cytosolic aqueous microenvironment that predominantly drive these reactions. current literatures suggest the predominant expression of ten proteins of sars-cov- . four of them are structural proteinsthe spike protein subunits s and s , the envelope protein (e), were obtained from the rcsb pdb database. 
however, the structures of tlr and tlr were obtained by performing homology modelling in the swiss-model server from expasy (https://swissmodel.expasy.org/) using protein sequences obtained from genbank with accession nos. aaz and aaz . , respectively. all the crystal surface structures of tlrs and the whole genome of sars-cov- were then prepared for visualization ( figure ). for the further studies, pymol was used to remove water molecules, original heteroatoms and any xenobiotic ligands when present, and to add polar hydrogens and kollman charges to the structures. the retrieved rna sequences were subjected to the imrna tool developed by iiit-delhi (https://webs.iiitd.edu.in/raghava/imrna/). this tool is based on the motif-emerging and with classes-identification (merci) program and scans the rna sequences for motifs likely to have immunomodulatory properties. the hdock server (http://hdock.phys.hust.edu.cn/) implements an algorithm that is a hybrid of template-dependent and template-independent ab initio free docking. moreover, it is one of the advanced programs that support protein docking against dna/rna molecules. oligonucleotides are large and demand heavy computing resources for the rendering of molecular models, which is seldom feasible for a computing server to provide. thus, hdock was the program of choice. the retrieved pdb structures along with the rna sequences were used for the purpose, however, the species- the so retrieved docked structures were then fed into the imod webserver from chaconlab (http://imods.chaconlab.org). this program has a user-friendly gui and is a well-recognized tool for performing normal mode analysis (nma) and simulating various modal trajectories of protein dynamics. nma in dihedral coordinates naturally mimics the combined functional motions of protein molecules modelled as a set of atoms connected by harmonic springs. 
as output the server delivers affine modelled arrow, vector field and a modal animation to signifies the motions. moreover, study also provide more detail profile about mobility using b-factor and deformability plots whereas eigenvalue helps to measure the relative modal stiffness of the in-silico molecular docking between receptor and ligand is not sufficient to conclude the nature in this work, chemistry at harvard macromolecular mechanics (charmm) force field was accessed initially to generate necessary input files for md simulation. - solution builder approach under charmm-gui web tool was first add water box (tip ) and then neutralizing atoms to solvate the system. in order to remove bad contacts and generate more specific outcomes periodic boundary conditions (pbc) were analysed and minimization steps were performed sequentially. server provided outcomes were then applied on gromacs v. . to achieve the equilibrium (nvt-constant volume and temperature) of the system. finally, module gmx_mdrun was accessed to run the md simulation of the system and xmgrace protein with conformational changes from its native form suggests it is in bounded state with any ligand. thus, to analyse the changes in tlrs, firstly we extract the pdbs from trajectories and then visualized using pymol. in order to assesse the quality of modelled structure of tlr and tlr , we access structure assesement tool under swiss-model server and found out both the structures are significant as more than % residues from both structures reside in ramachandran plot favoured region signifying a stable stereochemical structure ( figure s ). our study focuses principally on the interactions of the intracellular tlrs with the mrna fragments, and molecular docking studies play a pivotal role in this study by precisely table s . however, only the four topmost scoring docked complexes were selected for further experimentation and analysis. 
in accord to our docking studies, tlr binds with a much proficient binding pose with the rna fragment encoding nsp , with a high docking score of - . which is higher than any other docking operation involving tlr like that of the rna fragment of papain-like protease which binds with a score of - . or that of the main protease which has a score of - . (table s ). while in search of insights of molecular docking plip study found the involvement of arg, asn, thr residues in hydrogen bonding, tyr, his in π- cation interactions and salt bridges from and his ( figure a , table ) in this study, the docking operations determine a good binding pocket for e-protein figure b and table . herein, the most probable binding pose of nsp rna fragment and tlr is built with a definite score of - . , which is much higher than those poses comprising of the rnas of table . among the docked complexes with tlr possessing the s subunit mrna fragment stood at the highest position with a score of - . . the molecular insights of this ligation are well supported by the plip analysis as finding involves tyr, arg, val, lys residues in hydrogen bonding, arg in π-cation interactions and salt bridges from lys, glu, and lys ( figure b , table ). figure s i , s j, s i and s j, where red colour represents individual and green to cumulative variance. in the covariance matrix, motions are categorized in different modes and visualized in figure s k and s m with red, blue and white colour that signify variations among correlated, anti-correlated and uncorrelated motions respectively. in the end, the last outcome of nma study, elastic network map graph is represented by figure s l and s n where grey dots indicate the stiffness of the motions and springs direct the atomic connections of the complex. 
four most active complexes, those were primarily strained out according to their significant herein, rmsd plot in figure a reflects the structural stability of the backbone of different receptor tlrs of the four complexes and suggests complex tlr -nsp is less stable among others. where figure b and c define the compactness and solvent accessible area of that complexes as rg plot and sasa plot, respectively. at last the graphical plot of hydrogen bond clearly describes the insights of the complexes and supports the postulation reflected on rmsd, rg and sasa plots ( figure d ). apart from rmsd, rg, sasa and hydrogen bond studies, we also analyse conformational figure s e ). further, the molecular docking studies showed a similar trend by reflecting highest docking score to the complexes selected through rpiseq tool and in turn support the prediction of best docked structures (table s ). after getting confirmation on best predicted structure through hdock we forwarded to next step of simulation to analyze the conformational stability and flexibility of that selected four structures viz. tlr -nsp mrna complex ( figure a ), tlr -nsp mrna complex as shown in figure a , tlr -e mrna complex in figure b and tlr -s mrna complex ( figure b) . initially, the nma study reveals all the compounds have better stability as it reflects quite similar and significant eigenvalue scores, those are . x - for tlr -nsp mrna complex as shown in figure s g , . x - tlr - nsp mrna complex ( figure s g) , . x - tlr -e mrna complex as in figure s h , and . x - tlr -s mrna complex ( figure s h ). later, md simulation studies figure out postulation of nma studies is not fully accepted as tlr -nsp mrna complex found as unstable throughout the simulation process in the rmsd plot figure a . 
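the rmsd and radius of gyration (rg) plots discussed above summarize, per trajectory frame, how far the backbone has moved from a reference structure and how compact the molecule is. the sketch below shows the two underlying quantities in plain python. it is not the gromacs implementation: the function names are hypothetical, coordinates are assumed to be already superimposed on the reference (no least-squares fitting is done here), and equal masses are assumed unless masses are supplied.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z).

    Assumes both structures are already superimposed; no fitting is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a list of (x, y, z) coordinates."""
    if not coords:
        raise ValueError("coords must be non-empty")
    if masses is None:
        masses = [1.0] * len(coords)       # equal masses by default
    total_mass = sum(masses)
    # Centre of mass along each axis.
    com = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted mean squared distance from the centre of mass.
    weighted = sum(
        m * sum((c[axis] - com[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)
```

a rising rmsd over time, as described for the least stable complex, means the backbone keeps drifting from its starting conformation, while a stable rg indicates that overall compactness is retained.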
the plip analysis table and hydrogen bond plot figure d we do acknowledge the efforts of all the doctors, health workers, social workers, scientists, and researchers currently working endlessly against coronavirus disease worldwide. we have kept selected studies as reference due to the limitation in space, but we appreciate all the uncited articles dedicated to the research on coronavirus. rp thanks the department of higher π-cation interactions, while dotted yellow lines denote salt bridges. it is to be noted that for the sake of perceptual clarity some residues have been intentionally omitted from the diagram, however the same have been furnished in table . world health organization. covid- pandemic update. situation report website the continuing -ncov epidemic threat of novel coronaviruses to global health-the latest novel coronavirus outbreak in wuhan coughs and sneezes: their role in transmission of respiratory viral including sars-cov- potential fecal transmission of sars-cov- : current evidence and implications for public health transmission of sars and mers coronaviruses and influenza virus in healthcare settings: the possible role of toll-like receptors and covid- : a two-faced story with an exciting ending covid- : towards understanding of pathogenesis targeting human tlrs to combat covid- : a solution? in silico studies on the comparative characterization of the interactions of sars-cov- spike glycoprotein with ace- receptor homologs and human development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach the toll for trafficking: toll-like receptor delivery to the endosome is a dual receptor for guanosine and single-stranded rna small anti-viral compounds activate immune cells via the tlr myd -dependent signaling pathway tlr and tlr mediated host immune responses in major infectious diseases: a review. 
the brazilian journal of infectious diseases recognition of gu-rich polyadenylation regulatory elements by human cstf- protein coronavirus infections and immune responses the proteins of severe acute respiratory syndrome coronavirus- (sars cov- or n-cov ), the cause of covid- identifying discriminative classification-based motifs in biological sequences learning framework for robust and accurate prediction of ncrna-protein interactions using evolutionary information predicting rna-protein interactions using only sequence information hdock: a web server for protein-protein and protein-dna/rna docking based on a hybrid strategy plip: fully automated protein-ligand interaction profiler imods: internal coordinates normal mode analysis server designing of a novel multi-epitope peptide based vaccine against brugia malayi: an in silico approach molecular modeling of nucleic acid structure: setup and analysis protein conformational switches: from nature to design computational prediction of the immunomodulatory potential of rna sequences in silico study the inhibition of angiotensin converting enzyme receptor of covid- by ammoides verticillata components harvested from western algeria molecular dynamics simulation for all key: cord- -qgyzk th authors: edgar, robert c.; taylor, jeff; altman, tomer; barbera, pierre; meleshko, dmitry; lin, victor; lohr, dan; novakovsky, gherman; al-shayeb, basem; banfield, jillian f.; korobeynikov, anton; chikhi, rayan; babaian, artem title: petabase-scale sequence alignment catalyses viral discovery date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qgyzk th public sequence data represents a major opportunity for viral discovery, but its exploration has been inhibited by a lack of efficient methods for searching this corpus, which is currently at the petabase scale and growing exponentially. 
to address the ongoing pandemic caused by severe acute respiratory syndrome coronavirus and expand the known sequence diversity of viruses, we aligned pangenomes for coronaviruses (cov) and other viral families to . petabases of public sequencing data from . million biologically diverse samples. to implement this strategy, we developed a cloud computing architecture, serratus, tailored for ultra-high throughput sequence alignment at the petabase scale. from this search, we identified and assembled thousands of cov and cov-like genomes and genome fragments ranging from known strains to putatively novel genera. we generalise this strategy to other viral families, identifying several novel deltaviruses and huge bacteriophages. to catalyse a new era of viral discovery we made millions of viral alignments and family identifications freely available to the research community. expanding the known diversity and zoonotic reservoirs of cov and other emerging pathogens can accelerate vaccine and therapeutic developments for the current pandemic, and help us anticipate and mitigate future ones. viral zoonotic disease has had a major impact on human health over the past century despite dramatic advances in medical science, notably by the spanish flu, aids, sars, ebola and covid- pandemics. there are an estimated , mammalian viruses [ ] from which emerging infectious diseases in humans may arise [ ] . uncovering this viral biodiversity is a prerequisite for predicting and preventing future epidemics and is therefore the focus of consortia such as usaid predict [ ] and the global virome project [ ] as well as hundreds of government and academic research projects worldwide. these efforts can be aided through re-analysis of petabases of high-throughput sequencing data available in public databases such as the sequence read archive (sra) [ ] . 
this data spans millions of ecologically diverse biological samples, many of which capture viral transcripts that may be incidental to the goals of the original studies [ ] . to expand the known repertoire of viruses and catalyse global virus discovery, in particular for coronaviridae (cov) family, we developed the serratus cloud computing architecture for ultra-high throughput sequence alignment. from a screen of . million libraries comprising . petabases of sequencing reads, we report , assemblies, including sequences from previously uncharacterised or unavailable cov or cov-like operational taxonomic units (otus), defined by clustering amino sequences of the rna dependent rna polymerase (rdrp) gene at % identity. to demonstrate the broader utility of our approach, we also report six novel deltaviruses related to the human pathogen hepatitis δ virus (hdv), and expand the described members of the recently characterised family of huge bacteriophages (phages). viral discovery is a first step in preparing for the next pandemic. sequencing reads for thousands of uncharacterised viruses already exist and require careful curation. to accelerate this process, we established a freely available and explorable resource of all vertebrate viral alignment data generated by serratus at https://serratus.io. this work lays the foundation for years of future research by enabling the exploration of viruses which have been captured by more than a decade of high-throughput sequencing studies. serratus is a freely available, open-source cloud-computing platform designed to enable petabase-scale sequence alignment against a set of references. using serratus, we aligned in excess of one million short-read sequencing datasets per day for under us cent per dataset (extended figure ). this was achieved by leveraging commercially available computing infrastructure to employ up to , virtual cpus simultaneously (see methods). 
we aligned , , public rna-seq, meta-genome, meta-virome and meta-transcriptome datasets (termed a sequencing run [ ] ) against a collection of viral family pangenomes comprising all genbank cov records clustered at % identity plus all non-retroviral refseq records for vertebrate viruses (see methods and extended table ). to uncover more divergent viruses, we re-analysed , runs in a translated nucleotide search against a query comprising panproteomes for cov and other families. we performed de novo assembly on , runs potentially containing cov sequencing reads by combining , sra accessions identified by the serratus search with , identified by an ongoing cataloguing initiative of the sra called stat [ ] . , of the resulting assemblies contained putative cov contigs, of which , aligned to cov rdrp (extended table ). of these, we identified otus from a total of , i.e. not represented by coronaviridae in genbank (figure a and extended figure ). the protein domains of these otu are consistent with a cov or cov-like genome organisation (extended figure ) . three of the novel cov otus fell within the alphacoronavirus (αcov) genus. the first (exemplar run: err ) was from two desmodus rotundus bat metagenomes yielding . and . kb cov contigs respectively in the nyctacovirus subgenus. these cov were noted by the data-collectors, [ ] , but the sequences were not public and thus novel to our analysis. the second otu (srr ) was from a pipistrellus pipistrellus bat metagenome collected in in china. finally, from five libraries (err ) generated for a study on the metagenomic effects of the burying beetle nicrophorus vespilloides on a mouse carcass, we assembled a luchacovirus related to the rodent lucheng rn rat coronavirus ( % genome nucleotide identity to nc . ). from a rodent virome study which identified several novel cov [ ] , a sample from an unknown species contained a βcov embecovirus (srr ), with the closest matching genome matching an unclassified βcov from vietnam ( . 
% to mh ). finally, the δcov otu (srr ) appears to be from a currently unpublished avian virome study in china. we designated the eight remaining otus as group e, noting that all were found in samples from non-mammal aquatic vertebrates falling outside of δcov in the tree (extended figure ) .

figure : expanded characterisation of cov and related otus. a radial cladogram derived from maximum likelihood tree of cov and related otus. inset is a phylogram of the same tree annotated with cov genera (greek letters) and group e cov-like nidoviruses. otus were generated by clustering the rdrp gene at % identity. diversity within each such % otu was characterised by counting the number of % identity otus it contained. an otu ( % or %) was considered to be known if it contained a genbank sequence, otherwise to be a novel otu discovered by serratus. hosts were considered novel if the source organism annotated by the sra belonged to a species not annotated as a host in any genbank record, noting that the annotated source may differ from the viral host (e.g., faecal contamination in a plant sample). hosts are classified as primates, fowl (galliformes), bats (chiroptera), aquatic (amphibia and osteichthyes), or other. b length distribution for assemblies of sra datasets classified as likely cov-positive, showing a peak around the typical cov genome length knt. c triangular matrix showing median rdrp sequence identities between selected nidovirales and group e viruses. d phylogram of group e cov-like nidoviruses.

a sister taxon to coronaviridae was recently proposed [ ] following the characterisation of a corona-like virus, microhyla alphaletovirus (mlev), in the frog microhyla fissipes, and soon after a related pacific salmon nidovirus (psnv) was described in the endangered oncorhynchus tshawytscha [ ] . two of our otus were in these host species and the described viruses proved to be near-perfect matches. 
we expand this recently characterised group with six additional members, five similar to psnv, in takifugu pardalis (fugu fish; tparnv), syngnathus typhle (broad-nosed pipefish; stypnv), hippocampus kuda (seahorse; hkudnv) [ ] , puntigrus tetrazona (tiger barb; ptetnv) and ambystoma mexicanum (axolotl; amexnv), and a more distant member in caretta caretta (loggerhead sea turtle; ccarnv). notably, the ambystoma mexicanum (axolotl) nidovirus (amexnv) was assembled in runs, of which yielded kb contigs. relaxing the requirement for an rdrp match, / ( . %) of the runs from the associated studies were amexnv positive. gene structure of the amexnv and related contigs suggests that there is genomic segmentation within this clade (extended figure ) , with a homologous assembly gap present in the published psnv genome [ ] . these contigs were obtained from experimental animals from two different research groups [ ] [ ] [ ] ; the common factor is the animal stock centre used by these studies, which is therefore likely to be the source of the virus. axolotl are critically endangered in the wild; determining the distribution and pathophysiology of amexnv in these animals can assist with conservation efforts. infectious agents are the leading cause of pyrexia of unknown origin (puo) in children and immunocompromised adults [ ] . in addition to identifying genetic diversity within cov, we cross-referenced cov+ library meta-data to identify possible zoonoses and infer vectors of transmission. discordant libraries, those in which a cov is identified and the expected viral host does not match the sequencing library source taxa, were rare, accounting for only . % of cases (extended table e ). in a virome sequencing study [ ] of children with febrile illness, we identified sequencing runs from two children, one febrile (id: ) and one afebrile (id: ), with reads mapping to the βcov murine hepatitis virus (mhv). we assembled a complete . 
kb mhv genome from each replicate taken from the febrile child and a partial genome from the afebrile child. mhv can infect human cells in vitro [ ] , but may be rare in humans. this highlights that rapid and unbiased meta-genomic sequence analysis can resolve the etiology of a sub-set of puo, and that centralisation of these data (stripped of human-identifying reads) can also serve as a public-health surveillance system for zoonosis. an important consideration for these analyses is that the nucleic acid reads do not prove viral infection has occurred in the nominal host species. for example, we identified four libraries in which a porcine or avian coronavirus was found in plant samples. a more likely explanation than cross-kingdom cov transmission is that cov was present in faeces/fertiliser originating from a mammalian or avian host. coronaviridae is a well-characterised family ( figure and extended figure ), yet our re-analysis of the sra yielded eleven novel or under-reported otus. there are at least , more high-confidence (score ≥ ) and diverged (≤ % identity) virus-containing datasets. in particular, picornaviridae and reoviridae are enriched and numerous within this category ( figure ). serratus exploration of under-characterised viruses can potentially fill these gaps in our knowledge. the global mortality from viral hepatitis exceeds that of hiv/aids, tuberculosis or malaria, due to acute and chronic liver cirrhosis and subsequent hepatocellular carcinoma [ ] . hepatitis delta virus (hdv) is a small ( . knt) rna satellite virus infecting hepatocytes. alone, hdv is unable to produce infectious viral particles, as it requires the envelope protein from its helper virus, hepatitis b (hbv) [ ] . hdv infection aggravates liver cirrhosis caused by hbv and worsens clinical outcomes [ ] . prior to , hdv was the sole known member of its genus; ten members have since been characterised [ - ]. 
we identified an additional six deltaviruses ( figure a ) and assembled complete circular genomes for five (extended figure ). the evolutionary histories of these deltaviruses are explored further in a companion manuscript [ ] . one of these novel deltaviruses, mmondv, was identified in marmota monax (eastern woodchuck), a model organism used over the last three decades for the study of viral-induced hepatitis and hepatocellular carcinoma following woodchuck hepatitis virus (whv) infection, a hepadnavirus similar to hbv [ ]. from a study of woodchucks born in captivity and experimentally infected with whv [ ], liver biopsy rna-seq from four ( . %) animals contained > mmondv-mapping reads in at least one time-point of the week study (figure c ). woodchuck hepatitis virus can support replication of human hdv and is in fact a model for hdv pathogenesis [ , ], so it is probable that whv is also the helper virus for mmondv. inter-animal variation of whv-induced liver cirrhosis can be substantial [ ] and cryptic mmondv infection may underlie some of this variability across the past three decades of research using this model system, which warrants further investigation. to explore the utility of broad-scale read archive searches for microbiome research, we sought to locate phages whose genomes encode proteins related to the terminases and major capsid proteins from recently reported huge phages [ ] . to focus on phages whose genomes are substantially larger than normal (the average size is kbp [ ]), we prioritised assembled sequences of ≥ kbp (figure a ). assembly of high-scoring runs returned terminase-containing long contigs, primarily from cats, dogs, cattle and whales. the phylogenetic analysis of these sequences resolves new groups of phages with large genomes, some of which comprise sequences from only one animal genus. 
however, in a few cases we identified closely related phages in different animal orders, including one case where related phages were found in a human from bangladesh (err ) and groups of cats (prjeb ) and dogs (prjeb ) from england, sampled years apart. this result parallels the finding of kbp lak phage genomes in pigs, baboons and humans [ ] . these newly recovered sequences substantially expand the previously defined clades and reveal members of these clades in new habitats ( figure b ). overall, these findings reinforce that phages with large genomes are prevalent in human and animal microbiomes. since the completion of the initial draft of the human genome, the cost of dna sequencing has outpaced moore's law with a corresponding increase in the sizes of sequence databases [ ] . serratus offers researchers access to over a decade of data collected by the global research community in a rapid and cost-effective manner. while our first priority was viral discovery in the context of an ongoing global health crisis, we believe that serratus and further extensions of petabase scale metagenomics will shape a new era in computational biology, and enable radically new approaches to gene discovery, pathogen surveillance and pangenomic evolutionary analysis, amongst other applications. rapid translation of large datasets, such as those generated by serratus, into meaningful biomedical advances requires concerted collaboration between specialists [ ] and underscores a greater need for prompt, free and unrestricted data sharing in the community, not only of raw data (reads) but also of analyses such as assemblies and annotations. to facilitate such progress, we established a data warehouse of the . terabytes of viral alignments containing known, and yet to be characterized, viral species, each requiring domain expertise for curation. 
these data can be explored via a graphical web interface at https://serratus.io or programmatically through the r package tantalus (https://github.com/serratus-bio/tantalus) which interfaces to a postgresql-server hosting high-level data summaries. computational biology is outpacing the rate at which classical isolation- or culture-based validation can be performed. reverse genetics and synthetic nucleic acids offer a path to biological validation when virions are unavailable, such as those predicted from sequence alone [ , ] . innovative fields such as high-throughput functional viromics [ ] leverage these broad and rapidly growing collections of viral sequences, and can inform evidence-based policies responding to emerging pandemics [ , ] . human population growth and encroachment on animal habitats is bringing more species into proximity, leading to increased zoonosis [ ] and accelerating the anthropocene mass extinction [ , ] . while the availability of computation and data analysis is increasing, the opportunity to capture the rich genetic diversity of endangered species and their associated microorganism biodiversity is not. the need to invest in field studies for the collection and curation of rare and biologically diverse samples has never been as pressing as it is today. if not for the conservation of endangered species, then to conserve our own. figure ). the processing of each sequencing library is split into three modules: dl (download), align, and merge. the dl module acquires compressed data (.sra format) via prefetch, from the aws s mirror of the sra, decompresses to fastq, and splits the data into fq-blocks of million reads or read-pairs into a temporary s cache bucket. to mitigate excessive disk usage caused by a few large datasets, a limit of million reads per dataset was imposed. 
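The block-splitting step of the dl module can be illustrated with a minimal Python sketch. This is not the Serratus implementation: the function names are hypothetical, the block size and per-dataset read cap are left as parameters because their values are not recoverable from the text above, and only single-end FASTQ is handled.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of FASTQ lines into 4-line records."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_into_blocks(
    records: Iterable[FastqRecord],
    block_size: int,   # reads per fq-block (value is an assumption of the caller)
    max_reads: int,    # per-dataset cap on reads (value is an assumption of the caller)
) -> Iterator[List[FastqRecord]]:
    """Yield consecutive blocks of at most `block_size` reads,
    stopping once `max_reads` reads have been emitted."""
    capped = islice(records, max_reads)
    while True:
        block = list(islice(capped, block_size))
        if not block:
            return
        yield block
```

Each yielded block could then be written to its own object in a cache bucket, so that aligners on different instances consume blocks independently.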
the align module reads individual fq-blocks and aligns to an indexed database of user-provided query sequences using either bowtie each component is launched from a separate aws autoscaling group with its own launch template, allowing the user to tailor instance requirements per task. this enabled us to minimise the use of costly block storage during compute-bound tasks such as alignment. we used the following spot instance types: dl: gb ssd block storage, vcpus, gb ram (r .xlarge) instances; align: gb ssd block storage, vcpus, gb ram (c .xlarge) , instances; merge: gb ssd block storage, vcpus, gb ram (c .large) instances. users should note that it may be necessary to submit a service ticket to access more than the default ec instance limit. ec instances have higher network bandwidth (up to . gb/s) than block storage bandwidth ( mb/s). to exploit this, we used s buckets as a data buffering and streaming system and to transfer data between instances following methods developed in a previous cloud architecture (https://github.com/fredhutch/sra-pipeline). this, combined with splitting of fastq files into individual blocks, effectively eliminated file input/output (i/o) as a bottleneck, since the available i/o is multiplied per running instance (conceptually analogous to a raid configuration). using s as a buffer also allowed us to decouple the input and output of each module. s storage is cheap enough that in the event of unexpected issues (e.g., exceeding ec quotas) we could resolve problems and resume processing. for example, we could shut down the align modules to hotfix a genome indexing problem without having to re-run the dl modules. the serratus scheduler node controls the number of desired instances to be created for each component of the workflow, based on the available work queue. we implemented a pull-based work queue. upon boot-up each instance launches a number of worker threads equal to the number of cpus available. 
each worker independently manages itself via a boot script, and queries the scheduler for available tasks. upon completion of the task, the worker reports the result (success or fail) to the scheduler and queries for a new task. under ideal conditions, this allows for a worst-case response time in the hundreds of milliseconds, keeping cluster throughput high. each task typically lasts several minutes. the scheduler itself was implemented using postgres (for persistence and concurrency) and flask (to pool connections and translate rest queries into sql). the flask layer allowed us to scale the cluster past the number of simultaneous sessions manageable by a single postgres instance. the work queue can also be managed manually by the user, to perform operations such as re-attempting the download of an sra accession upon a failure or pausing an operation while debugging. the system is designed to be fully self-scaling. an "autoscaling controller" was implemented which scales-in or scales-out the desired number of instances per task every five minutes based on the work queue. as a backstop, when all workers on an instance fail to receive work instructions from the scheduler, the instance is shut down. finally, a "job cleaner" component checks the active jobs against currently running instances. if an instance has disappeared due to spot termination or manual shutdown, it resets the job, allowing it to be picked up by the next available instance. to monitor cluster performance in real-time, we used prometheus and node exporter to retrieve cpu, disk, memory, and networking statistics from each instance, postgres exporter to expose performance information about the work queue, and python exporter to export information from the flask server. this allowed us to identify and diagnose performance problems within minutes to avoid costly overruns. 
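The scheduling logic described above (pull-based queue, job cleaner, and autoscaling target) can be sketched in Python as follows. This is an in-memory illustration only, not the Postgres and Flask implementation used by Serratus; the class, method and state names are hypothetical, as is the sizing rule in `desired_instances`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Scheduler:
    """In-memory stand-in for a persistent work queue (hypothetical)."""
    state: Dict[str, str] = field(default_factory=dict)  # task -> new/running/done/failed
    owner: Dict[str, str] = field(default_factory=dict)  # running task -> instance id

    def add(self, task: str) -> None:
        self.state[task] = "new"

    def pull(self, instance_id: str) -> Optional[str]:
        """A worker asks for work; return the first unclaimed task, if any."""
        for task, status in self.state.items():
            if status == "new":
                self.state[task] = "running"
                self.owner[task] = instance_id
                return task
        return None

    def report(self, task: str, success: bool) -> None:
        """A worker reports the outcome of its task."""
        self.state[task] = "done" if success else "failed"
        self.owner.pop(task, None)

    def clean(self, live_instances: Set[str]) -> int:
        """Job cleaner: reset tasks held by instances that no longer exist."""
        reset = 0
        for task, instance_id in list(self.owner.items()):
            if instance_id not in live_instances:
                self.state[task] = "new"
                del self.owner[task]
                reset += 1
        return reset


def desired_instances(queued: int, tasks_per_instance: int, max_instances: int) -> int:
    """Autoscaling target: enough instances for the queue, up to a ceiling."""
    needed = -(-queued // tasks_per_instance)  # ceiling division
    return min(needed, max_instances)
```

Because workers pull tasks rather than having tasks pushed to them, an instance lost to spot termination only delays the tasks it held, and `clean` returns those tasks to the queue.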
we define a viral pangenome as the entire collection of reference sequences belonging to a taxonomic viral family, which may contain both full-length genomes and sequence fragments such as those based on rdrp amplicon sequencing. we developed a summarizer module written in python to provide a compact, human- and machine-readable synopsis of the alignments generated for each sra dataset. the method was implemented in serratus summarizer.py for nucleotide alignment and serratus psummarizer.py for amino acid alignments. reports generated by the summarizer are text files with three sections described in detail online (https://github.com/ababaian/serratus/wiki/.summary-reports). in brief, each contains a header section with alignment meta-data and one-line summaries for each virus family pangenome, reference sequence and gene respectively, with gene summaries provided for protein alignments only. for each summary line we include descriptive statistics gathered from the alignment data such as the number of aligned reads, estimated read depth, mean alignment identity, and coverage, i.e. the distribution of reads across each reference sequence or pangenome. coverage is measured by dividing a reference sequence into equal bins and depicted as an ascii text string of symbols, one per bin; for example oaooomouu:owwuuwowamwaauw. each symbol represents log (n + ) where n is the number of reads aligned to a bin in this order: .:uwaomuwaom^. thus, ' ' indicates no reads, '.' exactly one read, ':' two reads, 'u' - reads, 'w' - reads and so on; '^' represents >= , reads in the bin. for a pangenome, alignments to its reference sequences are projected onto a corresponding set of bins. for a complete genome, the projected pangenome bin number , , . . . , is the same as the reference sequence bin number. for a fragment, a bin is projected onto the pangenome bin implied by the alignment of the fragment to a complete genome. 
for example, if the start of a fragment aligns half way into a complete genome, bin of the fragment is projected to bin = of the pangenome. the introduction of pangenome bins was motivated by the observation that bowtie selects an alignment at random when there are two or more top-scoring alignments, which tends to distribute coverage over several reference sequences when a single viral genome is present in the reads. coverage of a single reference genome may therefore be fragmented, and binning to a pangenome better assesses coverage over a putative viral genome in the reads while retaining pangenome sequence diversity for detection. the summarizer implements a binary classifier predicting the presence or absence of each virus family in the query. for a given family f , the classifier reports a score in the range [ , ] with the goal of assigning a high score to a dataset if it contains f and a low score if it does not. setting a threshold on the score divides datasets into disjoint subsets representing predicted positive and negative detections of family f . the choice of threshold implies a trade-off between false positives and false negatives. sorting by decreasing score ranks datasets in decreasing order of confidence that f is present in the reads. naively, a natural measure of the presence of a virus family is the number of alignments to its reference sequences. however, alignments may be induced by non-homologous sequence similarity, for example low-complexity sequence. the score for a family was therefore designed to reflect the overall coverage of a pangenome because coverage across all or most of a pangenome is more likely to reflect true homology, i.e. the presence of a related virus. 
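The binned coverage string described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the summarizer's code: the symbol alphabet below is a reconstruction (the text as extracted has lost letter case and the symbol for an empty bin, so `_` and the upper-case letters are assumptions), the doubling of the read-count range for each successive symbol is inferred from the log scale and from the stated meanings of `.` and `:`, and reads are assigned to bins by alignment start position.

```python
from typing import Iterable

# Assumed alphabet: '_' for an empty bin and an upper-case second run of
# letters are reconstructions, not confirmed by the text.
ASSUMED_ALPHABET = "_.:uwaomUWAOM^"


def symbol(n_reads: int, alphabet: str = ASSUMED_ALPHABET) -> str:
    """Map a read count to a log-scale symbol.

    0 reads -> first symbol, 1 read -> second, 2 reads -> third, and each
    further symbol covers twice the range of the previous one (assumed).
    Counts beyond the alphabet saturate at the last symbol.
    """
    if n_reads < 0:
        raise ValueError("read count must be non-negative")
    if n_reads == 0:
        return alphabet[0]
    index = (n_reads - 1).bit_length() + 1
    return alphabet[min(index, len(alphabet) - 1)]


def coverage_string(
    starts: Iterable[int],   # 0-based alignment start positions
    ref_length: int,
    n_bins: int,
    alphabet: str = ASSUMED_ALPHABET,
) -> str:
    """Divide a reference into equal bins and emit one symbol per bin."""
    counts = [0] * n_bins
    for pos in starts:
        if not 0 <= pos < ref_length:
            raise ValueError("alignment start outside reference")
        counts[pos * n_bins // ref_length] += 1
    return "".join(symbol(n, alphabet) for n in counts)
```

For a pangenome, the same counting would be applied after projecting each reference-sequence bin onto its pangenome bin, so that alignments scattered across near-identical references accumulate in one coverage vector.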
ideally, coverage would be measured individually for each base in the reference sequence, but this could add undesirable overhead in compute time and memory for a process which is executed in the linux alignment pipe (fastq decompression → aligner → summarizer → alignment file compression). coverage was therefore measured by binning as described above, which can be implemented with minimal overhead. a virus that is present in the reads with coverage too low to enable an assembly may have less practical value than an assembled genome. also, genomes with lower identity to previously known sequences will tend to contain more novel biological information than genomes with high identity and will tend to have fewer alignments in highly diverged segments. with these considerations in mind, the classifier was designed to give higher scores when coverage is high, read depth is high, and/or identity is low. this was accomplished as follows. let h be the number of bins with at least alignments to f , and l be the number of bins with from to alignments. let s be the mean alignment percentage identity, and define the identity weight w = ( s ) − , which is designed to give higher weight to lower identities, noting that w is close to one when identity is close to % and increases rapidly at lower identities. the classification score for family f is calculated as z f = max(w( h + l)), ). by construction, z f has a maximum of when coverage is consistently high across a pangenome, and is also high when identity is low and coverage is moderate, which may reflect high read depth but many false negative alignments due to low identity. thus, z f is greater than zero when there is at least one alignment to f and assigns higher scores to sra datasets which are more likely to support successful assembly of a virus belonging to f . )" (date accessed: may th ). retroviruses (n = ) were excluded as preliminary testing yielded excessive numbers of alignments to transcribed endogenous retroviruses. 
each sequence was annotated with its taxonomic family according to its refseq record; those for which no family was assigned by refseq (n = ) were designated as "unknown". the collection of these pangenomes was termed cov m, and was the sequence reference used for this study. the protein search query was composed of the following sequences: (i) cov proteins (method described under to run serratus, a target list of sra run accessions is required. for this work, we designed target lists broadly classified as human, mouse, mammal, vertebrate, invertebrate, bat (including genome sequencing libraries), virome and metagenome (extended table c ). each list contained accessions of rna-seq, meta-genomic, and metatranscriptome runs for these organisms; some run accessions appeared in more than one list. prior to each serratus run, the lists were depleted for accessions already analyzed. re-processing of a failed dataset was attempted at least twice. in total we were able to generate alignments to the query pangenomes for , , / , , ( . %) of the targeted sra accessions. we implemented an on-going, multi-tiered release policy for code and data generated by this study, as follows. all code, electronic notebooks and raw data are immediately available at https://github.com/ababaian/serratus and on the s ://serratus-public/ bucket, respectively. upon completion of a project milestone, a structured data-release is issued containing raw data into our viral data warehouse s ://lovelywater/. for example, at the time of writing the .bam alignment files from . million sra runs are stored in s ://lovelywater/bam/x.bam; .summary files are stored in s ://lovelywater/summary/x.summary, where x is a sra run accession. these structured releases enable downstream and third-party programmatic access to the data. summary files for every searched sra dataset are parsed into a postgresql relational database which can be queried remotely via an aws relational database (rds) server. 
this enables users and programs to perform complex operations such as retrieving summaries and meta-data for all sra runs matching a given reference sequence with a classifier score above a given threshold. for example, one can retrieve all records containing at least aligned reads to hepatitis delta virus (nc . ) and the associated host taxonomy for the corresponding sra datasets. for users unfamiliar with sql queries we developed tantalus (https://github.com/serratus-bio/tantalus), an r programming-language package which directly interfaces with the serratus rds server to retrieve summary information as data-frames. tantalus also offers functions to explore and visualize the data. finally, the serratus data can be explored via a graphical web interface by accession, virus, or viral family at https://serratus.io. the website uses javascript to access the rds server and create a graphical report with an overview of viral families found in each sra accession matching a user query. all four data access interfaces are under ongoing development, receiving community feedback via their respective github issue trackers to facilitate the translation of this data collection into an effective viral discovery resource. documentation for data access methods is available at https://serratus.io/access .

viral assembly and annotation

coronaspades

rna viral genome assembly faces several distinct challenges stemming from technical and biological bias in sequencing data. during library preparation, reverse transcription introduces end coverage bias, and gc-content skew and secondary structures lead to unequal pcr amplification [ ] . technical bias is confounded by biological complexity such as intra-sample sequence variation due to transcript isoforms, as found in cov [ ] and/or to presence of multiple strains. to address the assembly challenges specific to rna viruses, we developed coronaspades, described in detail in a companion manuscript [ ] . 
in brief, rnaviralspades and the more specialized variant, coronaspades, combine algorithms and methods from several previous approaches based on metaspades [ ], rnaspades [ ] and metaviralspades [ ] with a hmmpathextension step. coronaspades constructs an assembly graph from a rna-sequencing dataset (transcriptome, meta-transcriptome, and meta-virome are supported), removing expected sequencing artifacts such as low-complexity (poly-a / poly-t) tips, edges, single-strand chimeric loops or double-strand hairpins [ ] and subspecies-based variation [ ] . to deal with possible misassemblies and high-covered sequencing artifacts, a secondary hmmpathextension step is performed to leverage orthogonal information about the expected viral genome. protein domains are identified on all assembly graphs using a set of viral hidden markov models (hmms), and similar to biosyntheticspades [ ], hmmpathextension attempts to find paths on the assembly graph which pass through significant hmm matches in order. coronaspades is bundled with the pfam sars-cov- set of hmms [ ], although these may be substituted by the user. this latter feature of coronaspades was utilized for hdv assembly, where the hmm model of hdag, the hepatitis delta antigen, was used instead of the pfam sars-cov- set. note that despite the name, these hmms are quite general, modeling domains found in all coronavirus genera in addition to rdrp, which is found in many rna virus families. hits from these hmms cover most bases in most known coronavirus genomes, enabling the recovery of strain mixtures and splice variants. accurate annotation of cov genomes is challenging due to ribosomal frameshifts and polyproteins which are cleaved into maturation proteins [ ] , and thus previously-annotated viral genomes offer a guide to accurate gene-calls and protein functional predictions. 
however, while many of the viral genomes we were likely to recover would be similar to previously-annotated genomes in refseq or genbank, we anticipated that many of the genomes would be taxonomically distant from any available reference. to address these constraints, we developed an annotation pipeline called darth [ ] which leverages both reference-based and ab initio annotation approaches. in brief, darth consists of the following phases: canonicalize the ordering and orientation of assembly contigs using conserved domain alignments, perform reference-based annotation of the contigs, annotate rna secondary structure, perform ab initio gene-calling, generate files for aiding assembly and annotation diagnostics, and generate a master annotation file. it is important to put the contigs in the "expected" orientation and ordering to facilitate comparative analysis of synteny and as a requirement for genome deposition. to perform this canonicalization, darth generates the six-frame translation of the contigs using transeq [ ] and uses hmmer [ ] to search the translations for pfam domain models specific to cov [ ] . darth compares the pfam accessions from the hmmer alignment to the ncbi sars-cov- reference genome (ncbi nucleotide accession nc . ) to determine the correct ordering and orientation, and produces an updated assembly fasta file. darth performs reference-based annotation using vadr [ ] , which provides a set of genome models for all cov refseq genomes [ ] . vadr provides annotations of gene coordinates, polyprotein cleavage sites, and functional annotation of all proteins. darth supplements the vadr annotation by using infernal [ ] to scan the contigs against the sars-cov- rfam release [ ] which provides updated models of cov and untranslated regions (utrs) along with stem-loop structures associated with programmed ribosomal frame-shifts. 
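The canonicalization phase described above can be illustrated with a simplified Python sketch. It is not darth's implementation: it assumes that domain hits have already been computed for each contig (for example from an hmmer search of translated frames), that each hit is reduced to a domain accession and a strand, that orientation is decided by a majority vote over strands, and that contigs are ordered by the earliest reference domain they contain. All names and the domain identifiers used in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str]  # (domain accession, strand: "+" or "-")

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a nucleotide sequence (other symbols pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonicalize(
    contigs: Dict[str, str],          # contig name -> sequence
    hits: Dict[str, List[Hit]],       # contig name -> domain hits
    reference_order: List[str],       # domain order along the reference genome
) -> List[Tuple[str, str]]:
    """Orient and order contigs to follow the reference domain order."""
    rank = {domain: i for i, domain in enumerate(reference_order)}
    keyed = []
    for name, seq in contigs.items():
        known = [(rank[d], strand) for d, strand in hits.get(name, []) if d in rank]
        minus = sum(1 for _, strand in known if strand == "-")
        if known and minus > len(known) - minus:
            # Most hits are on the reverse strand: flip the contig.
            seq = reverse_complement(seq)
        # Contigs without any known domain are placed last.
        first = min((r for r, _ in known), default=len(rank))
        keyed.append((first, name, seq))
    keyed.sort()
    return [(name, seq) for _, name, seq in keyed]
```

For example, with `reference_order = ["D1", "D2", "D3"]`, a contig whose only hit is `("D3", "-")` is reverse-complemented and placed after a contig with a hit to `("D1", "+")`.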
while vadr provides reference-based gene-calling, darth also provides ab initio gene-calling by using fraggenescan [ ] , a frameshift-aware gene caller. darth also generates auxiliary files which are useful for assembly quality and annotation diagnostics, such as indexed bam files created with samtools [ ] representing self-alignment of the trimmed reads to the canonicalized assembly using bowtie [ ], and variant-calls using bcftools from samtools. darth generates these files so that they can be easily loaded into a genome browser such as jbrowse [ ] or igv [ ] . as the final step darth generates a single generic feature format (gff) . file [ ] containing the combined set of annotation information described above, ready for use in a genome browser, or for submitting the annotation and sequence to a genome repository. the serratus searches described above identified , libraries ( , by nucleotide and , by amino acid) as potentially positive for cov (score ≥ and ≥ reads). to supplement this search we also employed a recently developed index of the sra called stat [ ] which identified an additional , sra datasets not in the defined sra search space. the stat bigquery was where tax id= and total count > " accessed on june th . we used aws batch to launch thousands of assemblies of ncbi accessions simultaneously. the workflow consists of four standard parts: a job queue, a job definition, a compute environment, and finally, the jobs themselves. a cloudformation template was created for building all parts of the cloud infrastructure from the command line. the job definition specifies a docker image, and asks for virtual cpus (vcpus, corresponding to threads) and gb of memory per job, corresponding to a reasonable allocation for coronaspades. the compute environment is the most involved component. 
we set it to run jobs on cost-effective spot instances (optimal setting) with an additional cost-optimization strategy (spot capacity optimized setting), and allowed up to , vcpus total. in addition, the compute environment specifies a launch template which, on each instance, i) automatically mounts an exclusive tb ebs volume, allowing sufficient disk space for several concurrent assemblies, and ii) downloads the . gb checkv database, to avoid bloating the docker image. the peak aws usage of our batch infrastructure was , vcpus, performing , assemblies simultaneously. a total of , accessions out of , were assembled in a single day. they were then analysed by two methods to detect putative cov contigs. the first method is checkv, followed by selecting contigs associated with known cov genomes. the second method is a custom script that parses coronaspades bgc candidates and keeps contigs containing cov domain(s). for each accession, we kept the set of contigs obtained by the first method (checkv) if it was non-empty, and otherwise we kept the set of contigs from the second method (bgc). a majority ( %) of the assemblies were discarded for one of the following reasons: i) no cov contigs were found by either filtering method, ii) reads were too short to be assembled, iii) batch job or sra download failed, or iv) coronaspades ran out of memory. a total of , assemblies were considered for further analysis. with rna-seq metagenomic reads, the number of reads per base may be highly variable at different locations in a viral genome. regions of high coverage may be adjacent to regions with low coverage or no reads, causing breaks between contigs. thus, a given base in a contig may have only one or very few reads as evidence, and as a consequence the reliability of base calls may be low in some regions of the assembly which could degrade inference of biological variations between genomes. 
the assemblers used in this work do not provide a per-base quality score. to address this issue we used two complementary approaches: ( ) reporting contig average coverage as a proxy for quality, and ( ) self-aligning reads to the assembly sequence and calling variants to enable facile visual inspection of per-base coverage levels and significant variants in genome browsers (see section . . ).
we developed a module, serratax, to predict taxonomy for cov genomes and assemblies (https://github.com/ababaian/serratus/tree/master/containers/serratax). serratax was designed with the following requirements in mind: provide taxonomy predictions for fragmented and partial assemblies in addition to complete genomes; report best-estimate predictions balancing over-classification and under-classification (too many and too few ranks, respectively); and assign an ncbi taxonomy database [ ] identifier (taxid). to the best of our knowledge, assigning a best-fit taxid was not supported by any previously published taxonomy prediction software; this requires assignment to intermediate ranks such as sub-genus and ranks below species (commonly called strains, but these ranks are not named in the taxonomy database), and to unclassified taxa, e.g. taxid , unclassified buldecovirus, in cases where the genome is predicted to fall inside a named clade but outside all named taxa within that clade.
serratax uses a reference database containing domain sequences with taxids. this database was constructed as follows. records annotated as cov were downloaded from uniprot [ ] , and chain sequences were extracted. each chain name, e.g. helicase, was considered to be a separate domain.
to generate an alternate taxonomic annotation of an assembled genome, we created a pipeline based on phylogenetic placement, serraplace. to perform phylogenetic placement, a reference phylogenetic tree is required. to this end, we collected reference amino acid rdrp sequences, spanning all coronaviridae.
to this set we added an outgroup rdrp sequence from the torovirus family (nc ). we clustered the sequences to % identity using usearch ([ ] , uclust algorithm, v . . ), resulting in centroid sequences. subsequently we performed multiple sequence alignment on the clustered sequences using muscle ( [ ] , v . . ). we then performed maximum likelihood tree inference using raxml-ng ( [ ] , protgtr+fo+g , v . . ), resulting in our reference tree. to apply serraplace to a given genome, we first use hmmer ([ ], v . ) to generate a reference hmm, based on the reference alignment. we then split each contig into orfs using esl-translate, and use hmmsearch (p-value cutoff . ) to identify those query orfs that align with sufficient quality to the previously generated reference hmm. all orfs that pass this test are considered valid input sequences for phylogenetic placement. subsequently, we use epa-ng ( [ ] , v . . ) to place each sequence on the rdrp reference tree. this produces a set of likely placement locations on the tree, with an associated likelihood weight. we then use gappa ( [ ] , v . . ) to assign taxonomic information to each query, using the taxonomic information for the reference sequences. gappa assigns taxonomy by first labelling the interior nodes of the reference tree by a consensus of the taxonomic labels of all descendant leaves of that node. if % of leaves share the same taxonomic label up to some level, then the internal node is assigned that label. then, the likelihood weight associated with each sequence is assigned to the labels of internal nodes of the reference tree, according to where the query was placed. from this result, we select that taxonomic label that accumulated the highest total likelihood weight as the taxonomic label of a sequence. note that multiple orfs of the same genome may result in a taxonomic label, in which case, we select the longest sequence as the source of the taxonomic assignment of the genome. 
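the label-assignment logic described above can be sketched in a few lines of python. this is a simplified illustration and not gappa's actual implementation: the function names, the representation of a taxonomic label as a tuple of rank names, and the threshold argument (the consensus percentage is not given in the text, so it is left as a parameter) are all assumptions made for the example.

```python
from collections import defaultdict


def consensus_label(leaf_labels, threshold):
    """Longest taxonomic prefix shared by at least `threshold` (a fraction
    between 0 and 1) of the descendant leaves of an interior node."""
    consensus = []
    while True:
        depth = len(consensus)
        counts = defaultdict(int)
        for path in leaf_labels:
            # only leaves that agree with the consensus so far can extend it
            if len(path) > depth and list(path[:depth]) == consensus:
                counts[path[depth]] += 1
        if not counts:
            break
        name, n = max(counts.items(), key=lambda kv: kv[1])
        if n / len(leaf_labels) < threshold:
            break
        consensus.append(name)
    return tuple(consensus)


def assign_label(placements, node_labels):
    """Sum the likelihood weights of one query per node label and return
    the (label, total_weight) pair with the highest total.

    placements  -- list of (node_id, likelihood_weight) for the query
    node_labels -- dict mapping node_id to its consensus label (hypothetical)
    """
    totals = defaultdict(float)
    for node_id, weight in placements:
        totals[node_labels[node_id]] += weight
    return max(totals.items(), key=lambda kv: kv[1])


def genome_label(orf_results):
    """Among (orf_length, label) pairs of one genome, keep the label of
    the longest orf, as described in the text."""
    return max(orf_results, key=lambda r: r[0])[1]
```

summing weights per full label is the simplest reading of the description; a fuller treatment would also credit the weight to every higher rank of each label.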
we performed phylogenetic inferences using a custom snakemake pipeline (available at https://github.com/lczech/nidhoggr), using pargenes ( [ ] , v . . ). pargenes is a tree search orchestrator, built on top of modeltest-ng [ ] and raxml-ng, enabling higher levels of parallelisation for a given tree search. to infer the maximum likelihood phylogenetic tree displayed in extended figure , we performed a tree search comprising distinct starting trees ( random, parsimony), as well as bootstrap searches. we used modeltest-ng to automatically select the best evolutionary model, which in this case was lg+iu+g m. the pipeline also automatically produces versions of the best maximum likelihood tree annotated with felsenstein's bootstrap ( [ ] ) support values and transfer bootstrap expectation ([ ]) values, the latter of which were used in extended figure .
archival copies of all code generated for this study are available at https://github.com/serratus-bio. electronic notebooks for experiments are available at https://github.com/ababaian/serratus. all data generated in this study can be accessed at https://serratus.io/access. assembled genome contigs for this study are available at https://serratus.io/access pending deposition into public repositories.
extended table : sra run queries and search nucleotide accessions. queries and accessions from this study. a sra queries to retrieve collections of datasets. b nucleotide accessions compiled into the cov ma reference query and c the sequence masks applied to those sequences.
extended table : assembled coronaviridae in the sra. a run accessions, assembly statistics and select meta-data for the , runs for which coronaviridae, or coronaviridae-like sequences were assembled. b assignment of assembled runs to operational taxonomic units (otus) based on % identity of the rna dependent rna polymerase (rdrp) domain. c assignment of genbank records to rdrp otus. d assignment of expected viral host for genbank records.
e taxonomic source for rdrp containing assemblies. f supporting data for figure . extended figure : overview of the serratus architecture. a schematic and data workflow (b) as described in the methods for aligning to the viral pangenome (c). d a nucleotide alignment completion rate for serratus shows stable and linear performance to complete . million sra accessions in a -hour period. e cost breakdown for this run. compute costs between modules are an approximate comparison of cpu requirements of each step. the total average cost per completed sra accession was $ . us dollars or $ . us dollars per terabase processed. extended figure : distribution of dna and other viral families in the sra the total number of datasets matching each dna or other viral pangenome, binned by the average nucleotide identity and colored by score (see methods). an interactive and queryable version of this plot is available at https://serratus.io/family. figure : deltavirus ribozymes evolutionary history a multiple sequence alignment of the genomic and anti-genomic deltavirus ribozymes based on muscle [ ] and refined manually based on secondary structure. the shortening of the j / loop and presence of the lg loop is specific to and conserved within the genomic ribozyme. consensus secondary structure of the b genomic and c anti-genomic ribozymes. d maximum-likelihood tree based on concatenated ribozyme sequences supports the topology of the δag amino-acid tree (figure ) a strategy to estimate unknown viral diversity in mammals global trends in emerging infectious diseases. eng global shifts in mammalian population trends reveal key predictors of virus spillover risk the global virome project. en the sequence read archive the sensitivity of massively parallel sequencing for detecting candidate infectious agents associated with human tissue. eng demographic and environmental drivers of metagenomic viral diversity in vampire bats. 
en comparative analysis of rodent and small mammal viromes to better understand the wildlife origin of emerging infectious diseases description and initial characterization of metatranscriptomic nidovirus-like genomes from the proposed new family abyssoviridae, and from a sister group to the coronavirinae, the proposed genus alphaletovirus endangered wild salmon infected by newly discovered viruses comparative population genomics in animals uncovers the determinants of genetic diversity. en blastemal progenitors modulate immune signaling during early limb regeneration midkine is a dual regulator of wound epidermis development and inflammation during the initiation of limb regeneration ap- cfos/junb /mir- a regulate the pro-regenerative glial cell response during axolotl spinal cord regeneration. en pyrexia of unknown origin sequence analysis of the human virome in febrile and afebrile children mouse hepatitis virus strain jhm infects a human hepatocellular carcinoma cell line. eng the global burden of viral hepatitis from to : findings from the global burden of disease study infection by hepatitis delta virus. en pfam sars-cov- special update (part ) en. library catalog: xfam.wordpress.com vadr: validation and annotation of virus sequence submissions to genbank coronavirus annotation using vadr en. library catalog: github infernal . : -fold faster rna homology searches rfam coronavirus special release en. library catalog: xfam.wordpress.com fraggenescan: predicting genes in short and error-prone reads. en the sequence alignment/map format and samtools. eng jbrowse: a dynamic web platform for genome visualization and analysis. 
eng publisher: american association for cancer research section: focus on computer resources the sequence ontology: a tool for the unification of genome annotations the ncbi taxonomy database uniprot: a worldwide hub of protein knowledge muscle: multiple sequence alignment with high accuracy and high throughput raxml-ng: a fast, scalable and userfriendly tool for maximum likelihood phylogenetic inference. en epa-ng: massively parallel evolutionary placement of genetic sequences. en genesis and gappa: processing, analyzing and visualizing phylogenetic (placement) data pargenes: a tool for massively parallel model selection and phylogenetic tree inference on thousands of genes. en modeltest-ng: a new and scalable tool for the selection of dna and protein evolutionary models. en tobaniviridae (t), and roniviridae (r). b distribution of pair-wise sequence identities for rdrp sequences within and between distinct taxa at species, subgenus and genus rank, respectively. c distribution of pair-wise rdrp identities for coronaviridae genera hidden markov model (hmm) protein domain matches from the rdrp containing contigs or reference sequences for exemplar operational taxonomic units (otus) grouped by genus extended figure : newly characterised deltavirus genomes genome structure and organisation of the five deltaviruses (pmacdv srr ; mmondv srr ; ovirdv srr ; tgutdv srr ; and ichidv srr ) and one deltavirus-like (bgladvl srr ; for which we could not identify a ribozyme sequence) sequence identified in our study. each circular rna virus shows characteristic rod-like genome folding and low free-energy (δg), similar to a hepatitis delta virus positive control, and two ribozymes and the serratus project is an initiative of the hackseqrna genomics hackathon (https://www.hackseq.com). we would like to thank the many contributors for code snippets and bioinformatic discussion; e. erhan, j. chu, i. birol, k. wellman, c. xu, m. huss, k. ha, e. nawrocki, r. mclaughlin, c. 
morgan-lang, c. blumberg, and the j. brister lab. a. rodrigues, s. mcmillan, v. wu, c. kennet, k. chao, and n. pereyaslavsky for aws support. we would also like to thank the j. joy lab, g. mordecai, j. taylor, s. roux, l. bergner, r. orton, and d. streicker for virology discussions. we are grateful to the entire team managing the ncbi sra. ta is grateful for the advanced research computing resource at the university of british columbia. pb was financially supported by the klaus tschira foundation, rc by anr transipedia and inception grants (pia/anr- -conv- , anr- -ce - ), ak and dm were supported by the russian science foundation (grant - - ) and computation was carried out in part by resource centre "computer centre of spbu". ak and dm are grateful to saint petersburg state university for the overall support of this work (project id: ). project support and computing resources were kindly provided by the university of british columbia community health and wellbeing cloud innovation centre, powered by aws. and special thanks to our patient and understanding partners. ab conceived and led the study. ab and jt designed and implemented the serratus architecture. ab and rce constructed the viral pangenomes and panproteomes. rce developed the serratax and summarizer modules. pb developed the serraplace tree placement and taxonomy prediction code and calculated maximum likelihood trees. ta developed the darth annotation pipeline and submitted the annotated genomes to ena. dm and ak developed the coronaspades assembler. rc implemented the assembly pipeline, and deployed the assembly and annotation pipeline. ab, vl, and dl designed and developed https://serratus.io and the sql server. ab and gn developed the tantalus r package. ab, rce, ta, pb, dm, ak, and rc analysed the coronavirus and deltavirus data. bas and jb designed the phage panproteome, assembled phage genomes, and conducted phylogenetic analyses. 
all authors contributed to data interpretation and writing the manuscript. correspondence should be addressed to ab. does not apply.
key: cord- -qlzhtxs authors: goryachev, a.n.; kalantarov, s.a.; severova, a.g.; goryacheva, a.s. title: potential opportunity of antisense therapy of covid- on an in vitro model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qlzhtxs data are presented on the potential effectiveness and prospects of treating the new coronavirus infection covid- , caused by the sars-cov- virus, with antisense oligonucleotides acting against viral rna in an in vitro model. the ability of antisense oligonucleotides to suppress viral replication in coronavirus diseases is shown using sars and mers as examples. the initial regulatory section of the rna of various coronaviruses was found to be identical within - nucleotides from the ’-end, which allows antisense suppression of this rna fragment. a new viral rna fragment present in all samples of the coronavirus sars-cov- has been identified; its suppression with an antisense oligonucleotide may be effective in the treatment of covid- . the synthesized antisense oligonucleotide ’ - agccgagtgacagccacacag, complementary to the selected viral rna sequence, was studied. the low toxicity of preparations of this group in cell culture and their ability to reduce viral load at high doses according to real-time pcr data are shown. the cytopathogenic dose exceeds mg / ml. at a dosage of mg / ml, viral replication is reduced by - times. conclusions are drawn about the prospects of this direction and the feasibility of administering the drug by inhalation. treatment with antisense oligonucleotides consists of administering to the patient a drug containing single-stranded dna chains that are complementary to a site of single-stranded dna or rna, for example, viral rna.
complementary binding of dna of the preparation and rna of the virus leads to impossibility of transcription and translation of viral rna and cutting of blocked section of rna of the virus by rnase h. this stops synthesis of new viral particles and prevents intracellular propagation of the virus [ ] . several antiviral antisense drugs have been marketed, for example fomivirsen ® (vitraven ® ) -a drug against cytomegalovirus (novartis), miravirsen ® -a drug for the treatment of hepatitis c (santaris pharma). currently, more than drugs for various diseases based on the use of antisense oligonucleotides are undergoing clinical trials in various phases. antisense therapy is a developing strategy for the specific treatment of new and socially significant diseases. the principle of rna inhibition has previously been studied in vitro to inhibit replication of highly pathogenic rna-viruses [ , ] . thus, given the previous experience of antisense oligonucleotide therapy, it can be assumed that this strategy can be applied as an antiviral drug by binding and cleavage of rna sars-cov- . considering that sars-cov- is an rna virus that does not integrate into the host genome, the strategy of using antisense oligonucleotides, according to some authors, can give effective results [ , ] . prerequisites for the use of antisense therapy in the treatment of covid exist. coronavirus infections have previously caused outbreaks of epidemics. in particular, the outbreak of sars (severe acute respiratory syndrome) in china in - was caused by coronavirus sars-cov, the outbreak of middle eastern respiratory syndrome (mers) was also caused by coronavirus [ , ] . after the sars epidemic, a number of authors conducted studies on the effect of antisense therapy on the suppression of coronavirus growth in the tissues under investigation [ , , ] . 
in work ( ), the authors identified, among several sequences, the most effective inhibitors of viral transcription, which suppress the reproduction of coronavirus in cells and prevent infection of other cells. these are antisense blockers complementary to the initial nucleotides of viral rna, the so-called transcription regulatory sequences, trs (in the terminology of the authors), of the coronavirus strain sars-cov-tor (genbank ay ). the nucleotide sequences of the antisense preparations examined are shown in table :
table antisense drugs investigated in work ( )
name of the antisense drug | nucleotide sequence ( ` - `) | numbers of blocked viral rna sites of sars-cov-tor
trs | gttcg tttag agaac agatc | -
trs | taaag ttcgt ttaga gaacag | -
these sequences are complementary to the following viral nucleic acid sequences (table ) :
table genetic target sequences of the coronavirus genome for trs and trs drugs
name | nucleotide sequence of virus rna ( '- ')
site trs | gatctgttctctaaacgaac
site trs | ctgttctctaaacgaacttta
* -the standard is to use the notation "t" for uracil in rna
the authors used morpholine side chain modification to prevent destruction of the antisense oligonucleotide and conjugation with an arginine polypeptide to improve penetration into infected cells. comparison of these nucleotide sequences (table ) with the genome sequence of the sars-cov- virus (covid- ) revealed that these sequences are present in almost all strains of the virus isolated from patients during the pandemic of - (appendix ; the nucleotide sequences are highlighted in red and framed). blast analysis (https://blast.ncbi.nlm.nih.gov/blast.cgi ) showed % identity of the rna sequence in all coronavirus subtypes, which suggests that this region of the coronavirus genome is highly conserved. this makes it possible to use, for the treatment of coronavirus infection, the antisense oligonucleotide sequences trs and trs that were studied in by the authors of [ ].
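the complementarity stated for the two tables above can be checked directly: each antisense sequence should equal the reverse complement of its target site. the sketch below is only an illustration; the dictionary keys `site_a` and `site_b` are hypothetical labels introduced here because the numbering of the two trs entries is not legible in the text.

```python
# dna alphabet is used throughout, following the table note that "t" stands for uracil
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a nucleotide sequence; spaces are ignored."""
    cleaned = seq.replace(" ", "").lower()
    return cleaned.translate(COMPLEMENT)[::-1]


# (target site on the viral rna, antisense sequence) as listed in the tables
PAIRS = {
    "site_a": ("gatctgttctctaaacgaac", "gttcg tttag agaac agatc"),
    "site_b": ("ctgttctctaaacgaacttta", "taaag ttcgt ttaga gaacag"),
}

for name, (target, antisense) in PAIRS.items():
    matches = reverse_complement(target) == antisense.replace(" ", "")
    print(name, "antisense is the reverse complement of the target:", matches)
```

the same function can be applied to any other target and antisense pair to confirm that the two strands are written in opposite orientation and pair base by base.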
however, starting from mid-march , changes have been observed in the virus strains associated with the loss of sites from the ` end. in particular, the mt usa: wa - - virus sample did not have the regions designated as trs and trs . in this regard, the study of an antisense oligonucleotide complementary to the site of the conserved oligonucleotide sequence of the genome of the sars-cov- virus present in all strains, which was the goal of the present study, seems relevant. in accordance with the purpose of the study, the following tasks were set: -to select the nucleotide sequence of the virus that is supposed to be inhibited, -to carry out the synthesis of oligonucleotide, -to determine cytotoxicity and antiviral activity in an in vitro experiment on cell culture. . selection of nucleotide sequence. the nucleotide sequence that was intended to block in the sars-cov- virus was selected next to the trs and trs genes at a distance of four nucleotides from the final region ( '-end) of the trs gene. this choice was due to the fact that this sequence enters the site regulating transcription (transcription regulatory site, trs) and disabling this sequence will also lead to the impossibility of transcription by analogy with inhibition of trs and trs sites in work [ ] . blast analysis on the genbank (ncbi) showed presence of this sequence at genomes of all sequenced samples sars-cov- . this sequence has the form: `-ctg tgt ggc tgt cac tcg gct in the investigated nucleotide sequences of coronaviruses in appendix , this sequence is highlighted in green. the complementary sequence of the antisense oligonucleotide preparation for treatment has the form: `-agc cga gtg aca gcc aca cag the choice of the nucleotide sequence of the virus for blocking by the antisense preparation was also dictated by the minimal ability to form nucleotide hairpins that prevent hybridization of the rna region of the virus and the antisense preparation. 
when calculating this sequence on an oligocalculator [ ] , it was found that over a period of nucleotides from the '-end of the viral rna sequence, nucleotides can theoretically form - hairpins, from which the sequence we study can be involved in hairpins, while the sequences trs and trs can be fragments of and nucleotide hairpins, respectively. thus, according to theoretical calculations, we assumed a more specific nature of the selected nucleotide sequence for blocking transcription and translation of the viral genome than previously studied in the work ( ). . synthesis of the drug. the synthesis of an antisense oligonucleotide with phosphorothioate protection of the side sugar-phosphate chain '-agccgagtgacagccacacag was commissioned by genterra, moscow (https://www.genterra.ru/synth.html ). synthesis (medium-scale dna) was carried out with phosphorothioate protection of the phosphate group between all nucleotides with purification of reverse-phase hplc (certificate for synthetic oligonucleotides no. from / / ). the total amount of -membered oligonucleotide synthesized was . mg ( . μmol). the choice of phosphorothioate protection of an oligonucleotide to prevent nuclease degradation of an antisense oligonucleotide is due to the low toxicity of phosphorothioate modifications, commercial availability, the possibility of synthesizing large quantities in routine automatic synthesis on dna synthesizers. the study of toxicity and antiviral activity was carried out by order at the test center for quality control of immunobiological medicines of "national research center for epidemiology and microbiology named after n.f. gamalea "of the ministry of health of russia (study no. / of / / ). the study included the study of the cytotoxic effect of the antisense drug, the study of the antiviral activity of the drug during the therapeutic regimen of drug administration, the detection of sars-cov- rna by real-time pcr. 
in the experimental work, a continuous cell line derived from the kidney of the african green monkey (chlorocebus aethiops), vero-e , was used. cells were cultivated in a cell growth medium supplemented with fetal bovine serum (fbs) (final concentration %). the studies used a pandemic strain of human coronavirus sars-cov- "gk / " passage , with an infectious activity of tcid /ml (tissue cytopathogenic doses) for vero e cells, from the state collection of viruses of the "national research center for epidemiology and microbiology named after n.f. gamalea". the virus was cultured in a vero e cell culture for hours at °c in a % co atmosphere. infectious activity was determined according to the methods recommended by who.
the cytotoxic action of the preparation in vero e cell culture was determined using -well flat-bottomed culture plates in which vero e cells were seeded at , cells/well in a volume of μl of freshly prepared complete medium. cultivation was carried out for hours at a temperature of °c in an atmosphere of % co . after the cells had been incubated with the preparations for hours at a temperature of °c in an atmosphere of % co , the condition of the cell monolayer was evaluated visually. the culture medium was then removed from the plates, and μl of reaction medium and μl of mts ( -( , - solution were added to the cell culture monolayer in each well. after incubation for hours at °c, the results were read on a biorad automatic reader at a wavelength of nm. the reference filter was - nm. the concentration of the preparation that reduced the optical density at nm by % compared to the cell control was taken as the % cytotoxic dose (cc ). the experiment to assess the viability of cells in the antiviral efficacy test was carried out in the range of drug concentrations that are not toxic to cells (i.e., lower than the detected cc value).
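the cytotoxic dose defined above (the concentration at which optical density falls by a fixed fraction relative to the cell control) can be read off a dose-response series by interpolation. the sketch below is an assumption-laden illustration, not the procedure used by the test center: the function name, the linear interpolation between neighbouring concentrations, and the default reduction of 0.5 are all choices made for the example.

```python
def cytotoxic_concentration(concentrations, optical_densities, control_od, reduction=0.5):
    """Concentration at which optical density drops by `reduction`
    (a fraction of the cell control), by linear interpolation between
    the two neighbouring tested concentrations.

    Returns None when no pair of neighbouring points brackets the target
    optical density, e.g. when the cytotoxic dose lies above the highest
    concentration tested.
    """
    target = control_od * (1.0 - reduction)
    points = sorted(zip(concentrations, optical_densities))
    for (c_lo, od_lo), (c_hi, od_hi) in zip(points, points[1:]):
        # look for the interval in which the signal crosses the target
        if od_lo >= target >= od_hi and od_lo != od_hi:
            fraction = (od_lo - target) / (od_lo - od_hi)
            return c_lo + fraction * (c_hi - c_lo)
    return None
```

a return value of `None` corresponds to the situation in which the cytotoxic dose can only be reported as greater than the highest concentration tested.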
the antiviral activity of the sample was assessed visually under a microscope hours after infection by inhibition of the cytopathic effect of the virus in a vero e cell culture. the result was assessed by Δlg max -the maximum decrease in the value of the infectious viral dose in the experiment in comparison with the control, expressed in decimal logarithms. the study of the antiviral activity of the antisense oligonucleotide substance in the vero e cell culture was carried out with a choice of concentrations based on the results of the cytotoxicity study. working solutions of the test drug were prepared with concentrations of . mg/ml; . mg/ml; . mg/ml and . mg/ml, respectively. the vero e cells used in the study were grown in well culture plates in a volume of μl full for hours at °c in an atmosphere of % co . seed dose - , cells / well. . μl of solutions of the test drug were pipetted from the dilution plates of the test drug to the test plates with cells. each point was tested in parallel wells. the preparations corresponding to the dilution scheme were added to the control wells without virus (to assess the potential cytotoxic effect and further take into account the study results). in the wells of the control cells, a medium was added for staging the reaction. the preparation of dilutions of the viral suspension for the study of antiviral activity was carried out by adding a suspension of sars-cov- , passage , with an infectious activity of tcid / ml for vero e cells to the plates with a monolayer of vero e cell culture: - the suspension was diluted by sequential transfer in test tubes with the required amount of the reaction medium - µl of the reaction medium and µl of the viral suspension. determination of viral production by cytopathic action was carried out on the basis of analysis of cell viability using microscopy, in order to visually determine the boundaries of viral cell damage, as well as to control the toxicity of doses of substances. 
in addition to the cytopathic effect, the antiviral activity of the drug was assessed by the reduction in the infectious titer of the virus in the vero e cell culture according to pcr for sars-cov- rna, determined by the cycle threshold (ct) at various dilutions of the study drug. the pcr study of sars-cov- rna was carried out by taking μl of the supernatant from wells with drug dilutions and isolating rna in parallel with positive and negative controls. the result of the study was a conclusion about the presence or absence of sars-cov- rna in the culture liquid when exposed to the drug: presence of sars-cov- rna (ct value more than ) or absence of sars-cov- rna (ct value is absent).
the cytotoxicity of the drug at various concentrations was evaluated by incubating the drug with vero e cells for hours, using the mts dye and visual assessment of the cell monolayer. based on the data obtained in the study of the cytotoxic effect of the test substance using mts in the culture of vero e cells, an analytical curve was constructed, from which the cc was determined. visual assessment of the state of the vero e cell monolayer under an inverted microscope did not reveal significant changes in cell morphology at a substance concentration of mg/ml and below after hours of incubation of the preparation with cells (table ) .
the antiviral efficacy of the antisense oligonucleotide under the treatment scheme (administration of the drug hours after infection) was assessed by the decrease in the infectious titer of the virus in the culture of vero e cells, measured by the cytopathic effect. according to the results of the study, the drug did not inhibit the replication of the sars-cov- virus in the vero e cell culture at the tested concentrations.
the results of the study of the antiviral activity of the drug by pcr are presented in table . the data show no statistically significant differences between the ct value of the virus control group and the threshold cycle numbers of the samples with added drug. the only difference is in the group that received the drug at a dosage of . mg/ml, where the ct value is . compared to the virus control group ( . - . ).
in the course of the research, the antisense oligonucleotide was found to have low toxicity for the vero e cell culture. the % cytotoxic dose (cc ) was greater than . mg/ml (exact cytotoxic dose values were not determined). the results of visual determination of the cytotoxicity of the preparation (cc ) were comparable to the results of determining cc with the vital dye mts. thus, this preparation has low toxicity and is safe for use.
the study of the effectiveness of the antisense oligonucleotide in in vitro experiments against sars-cov- found no statistically significant antiviral effect in the therapeutic regimen of drug addition, because according to ( ) the minimum effective virus-inhibiting concentration is the concentration of the drug that reduces the virus titer by at least . lg. at the same time, the study of viral rna content at various dosages showed that at a dosage of the preparation of . - . mg/ml, the parameter ct (the number of amplification cycles, i.e. doublings of viral rna, needed to reach the fluorescent signal) was . - . cycles. control ct values in the test with infected cells without the addition of antisense oligonucleotide were . - . . at a dosage of the same preparation of mg/ml, the ct value was . . this means that at a dosage of mg/ml, reaching a fluorescent signal equivalent to that of the control group required an average of . - . additional amplification cycles. this indicates that a dosage of mg/ml did not fully inhibit reproduction of the virus but noticeably reduced viral rna replication. the reduction of virus replication, calculated from the additional amplification cycles [ ] , was in the range of . - . times ( . - . ). that is, at a dosage of mg/ml of the preparation, the viral load of cells can be reduced by . - times. this value cannot be considered statistically significant, but there is a tendency toward reduced viral load, which may also be useful in antiviral therapy.
literature data and the data obtained in the experiment indicate that the search for antiviral drugs among antisense oligonucleotides against the new coronavirus infection covid- is a promising direction. the antiviral effect might be enhanced by using conjugates of oligonucleotides with other ligands or liposomal forms of drug administration to improve drug penetration into cells. for preclinical and clinical studies of antiviral activity, as well as for therapeutic and prophylactic measures, phosphorothioate protection can be used as the simplest and cheapest to synthesize. other protective modifications are also possible.
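the estimate of reduced replication from additional amplification cycles, discussed in the ct comparison above, follows from the fact that each pcr cycle ideally doubles the amount of template. the sketch below illustrates that arithmetic only; the function name and the `efficiency` parameter (1.0 meaning perfect doubling per cycle) are assumptions for the example, and the ct values in the usage note are hypothetical rather than taken from the study.

```python
def fold_reduction(ct_treated, ct_control, efficiency=1.0):
    """Fold reduction in starting template implied by a ct shift.

    Each cycle multiplies the template by (1 + efficiency), so a sample
    that needs delta_ct extra cycles to reach the same fluorescence
    started with (1 + efficiency) ** delta_ct times less template.
    """
    delta_ct = ct_treated - ct_control
    return (1.0 + efficiency) ** delta_ct
```

for example, with hypothetical values `fold_reduction(23.0, 20.0)` corresponds to three extra cycles, i.e. an eightfold lower starting amount of viral rna under the assumption of perfect doubling.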
considering that the virus spreads by airborne dust and airborne droplets, and affects the epithelium of the respiratory tract and lungs, inhalation of a drug solution through a nebulizer can be a promising and convenient method of administration. inhalation of the drug solution through a nebulizer is very simple, does not require sterilization, reaches the epithelium of the respiratory system in a targeted manner, and is possible even in very severely ill patients. when antisense oligonucleotides are administered by inhalation, the penetration into the systemic circulation is less than % [ ]. the estimated human dosage is calculated based on the following considerations: the sars-cov- (covid- ) virus causes a coronavirus infection that affects the epithelium of the respiratory tract and lungs. theoretically, any cell can be infected, in which up to thousand viral particles can multiply, each of which can infect another cell [ ]. the total area of the lungs and respiratory tract is, on average, m ( mm ). the surface density of alveolocytes per mm is approximately thousand cells ( ). thus, the total number of cells in the lungs and respiratory tract is cells. if we assume that all cells are infected ( , viral particles) and drug molecule is needed for each viral rna, the total number of drug molecules per dose per adult is , or nmol ( μg). a review of antisense oligonucleotides showed that the toxicity of this group of drugs is of two types. the first, hybridization-dependent toxicity, is due to the specific sequence of the oligonucleotide and its possible crosslinking with rna that is not a drug target, owing to complete or partial coincidence of the nucleotide sequence with the target, as well as to possible aptamer binding to proteins [ ]. this type of toxicity can be overcome through proper selection of the target rna, using careful bioinformatics analysis to identify a target with a perfect structural match or a small number of mismatched bases. 
this analysis is performed in the preparatory phase by blast analysis of the blocked sequence against sequences of other genes published in genbank (ncbi). the second type of toxicity is hybridization-independent (nonspecific) toxicity, due to the chemical properties of oligonucleotides interacting with proteins and their decay products. this type of toxicity does not depend on the nucleotide sequence, only on the chemical modification of the sugar-phosphate backbone. it follows that if the nucleotide sequence of the antisense preparation is correctly selected for the nucleotide sequence of the virus (coronavirus, influenza virus and other rna viruses) and has no possible coincidence with other genes (according to blast analysis), then toxicity and the associated side effects and contraindications will depend only on the chemical modification of the antisense drug, regardless of the nucleotide sequence and the type of inhibited virus. this opens broad prospects for the creation of antiviral drugs, not only for coronaviruses, but also for other rna viruses, such as influenza. in this case, it will be sufficient to determine in advance the nonspecific toxicity of the oligonucleotide with the selected protection of the sugar-phosphate backbone. once the genome of the virus of interest has been sequenced, it will be possible to use antisense oligonucleotides as antiviral medicinal agents with accelerated toxicity testing, as a kind of off-label drug. this, among other things, will make it possible to respond quickly to the threat of the next epidemic arising from the spontaneous penetration of a virus into the human population, or from the targeted use of an engineered chimeric virus as a weapon or a means of terrorist attack. . a study of literature data has shown the promise of using antisense oligonucleotides in the treatment of viral diseases, in particular those caused by coronaviruses, such as sars-cov and mers. . 
oligonucleotides previously studied for sars ( ) are complementary to the nucleotide sequences of the viral rna of the new coronavirus infection covid- in almost all samples and, theoretically, can be considered as drugs for the treatment of the new coronavirus infection covid- . investigation of an antisense oligonucleotide with phosphorothioate protection `-agccgagtgacagccacacag, complementary to the region of viral rna near the `-end, showed extremely low toxicity. the cc dose in studies on vero e cell culture was more than mg/ml. . the study of the antiviral activity of antisense oligonucleotide on vero cell culture e according to real-time pcr showed that at a dosage of mg/ml, the reduction in viral sars-cov- endothelial infection causes covid- chilblains: histopathological, immunohistochemical and ultrastructural study of seven paediatric cases sars and mers: recent insights into emerging coronaviruses Руководство по проведению доклинических исследований лекарственных средств oligocalc: an online oligonucleotide properties calculator examination of antisense rna and oligodeoxynucleotides as potential inhibitors of avian leukosis virus replication in rp cells the real-time polymerase chain reaction// molecular aspects of medicine development of therapeutics for treatment of ebola virus infection antisense oligonucleotides target a nearly invariant structural element from the sars-cov- genome and drive rna degradation antisense morpholino-oligomers directed against the ` end of the genome inhibit coronavirus proliferation and growth// journal of virology antiviral effects of antisense morpholino oligomers in murine coronavirus infection models// journal of virology sars and other coronaviruses as causes of pneumonia key: cord- - evl wnd authors: neufeldt, christopher j.; cerikan, berati; cortese, mirko; frankish, jamie; lee, ji-young; plociennikowska, agnieszka; heigwer, florian; joecks, sebastian; burkart, sandy s.; zander, david y.; gendarme, mathieu; el 
debs, bachir; halama, niels; merle, uta; boutros, michael; binder, marco; bartenschlager, ralf title: sars-cov- infection induces a pro-inflammatory cytokine response through cgas-sting and nf-κb date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: evl wnd sars-cov- is a novel virus that has rapidly spread, causing a global pandemic. in the majority of infected patients, sars-cov- leads to mild disease; however, in a significant proportion of infections, individuals develop severe symptoms that can lead to permanent lung damage or death. these severe cases are often associated with high levels of pro-inflammatory cytokines and low antiviral responses which can lead to systemic complications. we have evaluated transcriptional and cytokine secretion profiles from infected cell cultures and detected a distinct upregulation of inflammatory cytokines that parallels samples taken from infected patients. building on these observations, we found a specific activation of nf-κb and a block of irf nuclear translocation in sars-cov- infected cells. this nf-κb response is mediated by cgas-sting activation and could be attenuated through sting targeting drugs. our results show that sars-cov- curates a cgas-sting mediated nf-κb driven inflammatory immune response in epithelial cells that likely contributes to inflammatory responses seen in patients and might be a target to suppress severe disease symptoms. in late sars-cov- emerged as a highly infectious coronavirus that causes respiratory disease in humans, termed covid- . since the initial identification, sars-cov- has spread around the world leading the world health organization to declare a pandemic. sars-cov- infection causes respiratory symptoms that range from mild to severe and can result in lasting lung damage or death in a significant number of cases . 
one of the hallmarks of severe covid- disease is low levels of type i interferons (ifns) and overproduction of inflammatory cytokines such as il- and tnf [ ] [ ] [ ] [ ] . this unbalanced immune response fails to limit virus spread and can cause severe systemic symptoms , . therapies aimed at modulating immune activation to attenuate the detrimental inflammatory response or promote an antiviral cytokine response represents an important avenue for treating patients with severe covid- . sars-cov- is a plus-strand rna virus that replicates its genome in the cytosolic compartment of the cells. like all plus-strand rna viruses, this replication process requires the production of a negative-strand rna template in order to amplify the positive sense viral genome. this process and probably also the production of subgenomic rnas of negative and positive polarity, produces double strand (ds)rnas that can be sensed by cytosolic immune receptors (pattern recognition receptors, prrs) that subsequently activate antiviral pathways . in addition to direct viral sensing, cells have also evolved ways to detect the indirect effects of virus infection, such as nuclear or mitochondrial damage caused by the heavy cellular burden of virus replication. cytoplasmic dna sensors including cgas-sting, ifi , or aim , recognize dsdna from dna viruses, but have also been shown to play an important role in rna virus infection, either through directly recognising viral signatures or through sensing of cellular dna released from mitochondria or nuclei due to cellular stress (reviewed in , ) . substrate recognition by either rna or dna sensors leads to signalling cascades that activate two major branches of the innate immune response, the type i/iii ifn response and the inflammatory cytokine response. the type i/iii ifn pathways are directly involved in protecting neighboring cells from virus spread and are vital for the immediate cell-intrinsic antiviral response. 
the inflammatory cytokine response is involved in recruitment and activation of immune cells, which is required to initiate an adaptive immune response. due to the effective nature of innate immune sensing and responses plus-strand rna viruses have evolved numerous ways to limit or block these cellular pathways. for many viruses, the initial line of defence is to hide viral replication intermediates within membrane compartments that block access to cytosolic prrs, such as rig-i or mda . in the case of coronaviruses, this is achieved through the formation of replication organelles composed predominantly of double membrane vesicles, within which viral rna replication occurs , . coronaviruses can also evade recognition by immune receptors through modification of viral rna to resemble host mrna [ ] [ ] [ ] [ ] . in addition to these passive immune evasion strategies, coronaviruses utilize various mechanisms to actively target and block key immune sensors or signalling molecules (reviewed in , ). for sars-cov- , a closely related virus, several viral proteins have been shown to block rig-i/mda sensing, as well as the downstream activation of tbk and irf [ ] [ ] [ ] [ ] [ ] [ ] . sars-cov- also efficiently blocks ifn receptor and jak-stat signalling to stop downstream immune activation , [ ] [ ] [ ] . additionally, the sars-cov- papain-like protease (plp) has been shown to interfere with cgas-sting activation also limiting activation of innate immune pathways . the combination of these actions can lead to an imbalance between proinflammatory and antiviral immune responses. although numerous immune evasion mechanisms have been characterized for other pathogenic coronaviruses, it remains to be determined whether similar processes exist for sars-cov- . given the homology of sars-cov- to sars-cov- , they may have many conserved antagonistic strategies, however, key differences in infection and disease could suggest divergent pathways. 
early reports on sars-cov- demonstrated that infection is highly sensitive to type i/iii ifn treatment [ ] [ ] [ ] [ ] . in combination with the low levels of ifn reported to be secreted in severe cases, this suggests that like sars-cov- , sars-cov- infection actively blocks immune activation. transcriptomic analyses of sars-cov- infected cells generated ambiguous results on the induction of type i/iii ifns and the subsequent expression of ifn stimulated genes (isgs). on the one hand, it was shown that sars-cov- triggers only an attenuated immune response suggesting a block in prr signaling pathways, which would parallel sars-cov- and mers-cov infections . on the other hand, several studies argue for a strong induction of ifn responses in both lung and intestinal infection models , . additionally, proteomics approaches determining sars-cov- protein interactions with host factors in exogenous expression conditions revealed several interactions with key immune regulators including mavs, tbk and several co-factors involved in irf activation , . however, many of these findings are still observational leaving the mechanisms of sars-cov- innate immune response modulation unresolved. here, we report the transcriptomic profiles derived from sars-cov- infected human lung cells showing a specific bias towards an nf-κb mediated inflammatory response and a restriction in the tbk specific irf / activation and subsequent ifn response. consistently, secreted cytokine profiles from both severe covid- patients and sars-cov- infected lung epithelial cells, were enriched for pro-inflammatory cytokines and lacked type i/iii ifns. we also demonstrate that sars-cov- infection leads specifically to nf-κb but not irf nuclear localization and that poly(i:c)-induced pathway activation is attenuated in infected cells. 
finally, we show that the cgas-sting pathway is activated by sars-cov- infection, leading to a specific nf-κb response and that inflammatory cytokine upregulation can be mitigated by sting inhibitory drugs. these results provide insight into how innate immune responses are modulated by sars-cov- in epithelial cells likely contributing to the strong inflammatory responses observed in severe covid- cases. sars-cov- predominantly infects airway and lung tissue in infected individuals. in order to determine the effects of sars-cov- on human lung epithelial cells, calu- and a cells were infected with sars-cov- and virus growth, as well as host transcriptional architecture was determined over a time course of infection. in contrast to calu- , a cells lack endogenous expression of the major sars-cov- entry receptor ace and, hence, are not naturally permissive to sars-cov- infection . we therefore used an engineered a cell line stably expressing ace (a -ace ), which is susceptible to sars-cov- infection , . for both, a -ace and calu- cells, we observed an increase in intracellular viral rna starting at h post infection, which continued to increase up to h post infection ( fig. a-b ). increased extracellular virus rna was observed starting at h post infection which was paralleled by the release of infectious virus (fig. a-c) . the levels of viral rna, production of infectious virus and virus spread were significantly higher in calu- cells compared to a - a ). we did not observe an overall decrease in total mrna quality or large differences in probe intensity, thus showing no indication that sars-cov- infection causes a general transcriptional shutdown. importantly, we observed a high degree of overlap between top significantly upregulated or downregulated genes from both cell lines (extended data fig. b- c). however, in a -ace cells, there was less overall change in transcript levels following infection (fig. b , and extended data fig. 
a and d), which is likely due to the lower levels of infection (~ % vs % at h of a -ace vs calu- , respectively) (fig. e). gene-set enrichment analysis of the transcriptional changes using curated "hallmark" pathways showed were also observed at early time points after infection of calu- cells as well as high levels of the chemokine cxcl /ip- at h post infection (fig. e). in contrast, no consistently detectable upregulation was found for ifnα, ifnβ, ifnγ and ifnλ / , whereas a rather moderate increase of ifnλ expression was found, but only at the late time point of calu- cell infection. together, these results corroborate published data showing that, in severe cases, sars-cov- infection preferentially induces pro-inflammatory cytokine production with little activation of antiviral responses. additionally, these data indicate that infected epithelial cells secrete cytokines that can contribute to the induction of tissue-level inflammation. to confirm that the lack of ifn response in calu- or a -ace cells infected with sars-cov- was not due to defects in the activation of innate immune pathways, we to test if ifns could limit virus replication even after establishment of infection, a -ace cells were treated with high levels of various ifns at the time point of infection or h thereafter. pre-treatment with type i ifns, serving as control, blocked virus infection, whereas co- or post-treatment had significantly less effect (fig. e-f). of note, the -fold decrease in virus replication following post-treatment with type i ifns likely represents a block in virus spread following the first round of infection, as only ~ - % of cells were observed to be infected at the h time point (fig. e). together, these observations suggest that sars-cov- likely suppresses the production of ifns and antiviral isgs, and that it furthermore rapidly and potently blocks ifn signalling in infected cells. 
the high levels of inflammatory gene activation and the lack of ifns and isgs in response to sars-cov- infection led us to investigate which transcription factors of the cell- to determine if sars-cov- can actively block immune stimulation through cytosolic to determine the source of the sars-cov- induced inflammatory response or downstream immune activation, we evaluated the effects of innate immune receptor knockout or overexpression. we first looked at rna receptors that have previously been described to together, these data indicate that recognition of viral rna via cellular rna sensors is not involved in nf-κb activation in sars-cov- infected cells. although cgas is a sensor of cytosolic dna, induction of the cgas-sting signaling axis leading to activation of nf-κb and irf has been reported for several rna virus infections, most likely through cellular stress responses . to determine whether the cgas-sting pathway is triggered in sars-cov- infection, we first evaluated changes in localization of cgas or sting in infected cells. indeed, both cgas and sting were observed to re-localize to perinuclear clusters in infected cells, indicative of activation (fig. a-b). co-staining for cgas and dsdna also showed that dsdna colocalized with cgas in infected cells (fig. c). additionally, we observed that, unlike in poly(i:c)-mediated activation of rlrs, sars-cov- infection did not interfere with activation of the cgas-sting pathway by dsdna transfection (extended data fig. c-d). to confirm that cgas-sting activation is involved in the observed induction of proinflammatory cytokines, we examined the effects of pharmacologically blocking sting in sars-cov- infected cells. one hour post infection, cells were treated with the sting specific inhibitor h- , the tbk inhibitor amlexanox (amx), or dmso. at h post infection, we observed a significant decrease in the levels of tnf mrna in infected cells treated with h- compared to dmso treated cells (fig. 
d) , both, in a -ace and calu- . this decrease was not observed for amx treated cells. infection levels and cell viability were not significantly affected at the effective concentration (extended data fig. e -g). together these results indicate that sars-cov -infection triggers the cgas-sting pathway, leading to nf-κb-mediated induction of pro-inflammatory cytokines, and that this response can be controlled with sting inhibitors. although sting activation is usually associated with both, nf-κb as well as irf activation, several reports have suggested that interfering with proper translocation of sting from the er to golgi compartments can selectively stimulate the nf-κb pathway , . to test whether this is the case in sars-cov- infected cells, we determined the localization of sting relative to golgi markers by microscopy. consistent with previous reports, in cells transfected with dsdna, we observed sting translocation to the golgi compartment (extended data fig. h) . no significant colocalization of golgi markers and sting were observed in either mock or sars-cov- infected cells, suggesting that sting translocation may be impaired (fig. e-f ). moreover, we found that clusters of sting in sars-cov- infected cells colocalized with viral nucleocapsid (n) protein ( fig. g-h; extended data fig. i ). together, these results suggest that sting is activated in sars-cov- infection, but inhibited from translocating to the golgi, leading to a specific nf-κb inflammatory response in infected cells. in this study, we combine transcriptional profiling and cytokine secretion analyses to characterize the pro-inflammatory response induced by sars-cov- infection, and evaluate the virus-induced signalling pathways mediating this response. we report that both virus- further evaluation of the sars-cov- induced pro-inflammatory response showed a specific induction of nf-κb, but not of irf or the subsequent ifn signalling. 
nf-κb can be activated through numerous immune or stress stimuli including the er stress responses or increase in cytosolic reactive oxygen species, as well as through detection of cytosolic dna released from the nucleus or mitochondria (reviewed in , , ). our results indicate that cgas-sting activation is a major contributor to nf-κb activation in sars-cov- infected cells. since cgas is a dsdna sensor that would not be expected to directly recognize sars-cov- rna, it is likely that cellular stress or cytokine responses induced by the infection leads to nuclear or mitochondrial dna release which is sensed by cgas [ ] [ ] [ ] [ ] [ ] . similar activation of cgas-sting has been observed for other positive strand rna viruses including flaviviruses and both sars-cov- and nl coronaviruses (reviewed in ). for the coronaviruses, sting activation is perturbed through the action of the viral plp leading to an inhibition of sting oligomerization and downstream activation of tbk and irf , , . intriguingly, mechanisms for cgas-sting modulation seem to be different in sars-cov- infected cells, highlighting a major immunological difference between these related viruses. of note, the activation of nf-κb through cgas-sting does not exclude other sources of nf-κb activation. indeed, we observed increases in fos/jun and atf mrna levels in infected cells suggesting activation of multiple cell stress pathways . moreover, pharmacological inhibition of sting did not completely block tnf upregulation, further indicating a role for other sources of nf-κb-activation. we speculate that therapeutic inhibition of multiple nf-κb activation pathways could serve to further reduce pro-inflammatory responses in sars-cov- infected cells. the selective activation of nf-κb, rather than a general block in all immune activation pathways, indicates a pro-viral role for nf-κb signalling. in addition to functions in inflammation, nf-κb is also important for cell survival and proliferation . 
these nf-κb cell survival signals could be beneficial for the virus by promoting vitality in cells in order to facilitate efficient and sustained virus replication and spread. mechanisms for nf-κb pathway interference have been reported for numerous dna and rna viruses [ ] [ ] [ ] . selective modulation of the cgas-sting pathways may allow sars-cov- to promote an nf-κb mediated cell survival signal while limiting isg induction. classical cgas-sting induction activates not only nf-κb, but also tbk and irf pathways. we envisage several mechanisms that could contribute to the selective nf-κb activation. first, the virus could actively block tbk activation in infected cells. indeed, protein interaction studies indicate that viral nsp and nsp proteins interact with tbk or its adaptor proteins . additionally, a block in tbk activation has been reported for both sars-cov- and mers-cov infections and early reports demonstrate a lack of tbk phosphorylation in sars-cov- infected cells , . moreover, our results support a model where sars-cov- infection prevents activated sting from translocating from the er to the golgi. activation of sting at the er has been shown to be sufficient for nf-κb activation but not for tbk activation and the subsequent irf phosphorylation , . it may be that fragmentation of the golgi by sars-cov- infection leads to an impairment of sting translocation to the ergic. consistently, our transcriptomic analysis shows impairment in protein secretion pathways, specifically including downregulation of several cop coatomer proteins involved in er to golgi transport. alternatively, sars-cov- proteins could actively block cgas-sting translocation. colocalization between sting and n protein in infected cells suggests a direct role for n protein in limiting sting translocation. a similar mechanism has been suggested for murine cytomegalovirus, where viral m protein associates with sting and limits exit from the er, thereby promoting an nf-κb specific response. 
interestingly, pathway analysis of our microarray data indicate that cytokine transcriptional responses from sars-cov- infected cells resemble signatures from human cytomegalovirus infected cells. further experimentation is required to define the precise mechanisms of sting activation in sars-cov- infected cells and to determine whether viral proteins are directly associating with components of the cgas-sting pathway. the majority of documented sars-cov- infections lead to mild or no symptoms, indicating that even the observed low level of antiviral pathway activation induced by infected cells can be sufficient to limit and resolve the infection. on the other hand, in patients with underlying conditions or attenuated immune responses, these antiviral responses do not limit virus replication and a sustained virus load eventually leads to a long term inflammatory response. in these latter cases, one important avenue of treatment is to modulate the immune response in order to alleviate hyper-inflammation. in addition to other immune modulators that are currently being used or clinically evaluated (eg. il- inhibitors or corticosteroids) - , our results indicate that disease severity might be suppressed at the epithelial cell level through the use of cgas-sting inhibitors or through blocking nf-κb mediated inflammatory responses. in this respect, nf-κb inhibitors analogous to cape or parthenolide, prolonging survival of sars-cov- infected mice , might help to reduce the disease burden imposed by covid- supplemented with glutamax (gibco), % fetal bovine serum, u penicillin/ml, µg streptomycin/ml, mm l-glutamine and nonessential amino acids. sars-cov- stocks were produced using veroe cell line. passage bavpat / (moi: . ) strain was used to generate the seed virus (passage ). after h the supernatant was harvested, cell debris was removed by centrifugation at , xg for min and supernatant filtered with a . mm pore-size filter. 
passage virus stocks were produced by using µl of the seed virus (passage ) to infect e+ veroe cells. the resulting supernatant was harvested, filtered h later as described above and stored in aliquots at - °c. stock virus titers were determined by plaque assay. total rna was isolated from cells or supernatants using the nucleospin rna extraction kit (macherey-nagel) according to the manufacturer's specification. cdna was synthesized from the total rna using the high capacity cdna reverse transcription (rt) kit (thermoscientific) according to the manufacturer's specifications. each cdna sample was diluted : in nuclease free h o prior to qpcr analysis using specific primers and the itaq universal sybr green mastermix (bio-rad). primers for qpcr were designed using primer software and include: gene set enrichment analysis was performed according to subramanian et al. . we use the practical r implementation "fgsea" and the hallmark pathway gene set published by liberzon et al. . the barcode plot implementation was inspired by zhan et al. . primary antibodies and specific dilutions used for western blot or immunofluorescence after infection with sars-cov- cells were fixed with % formaldehyde solution, washed twice with phosphate buffered saline (pbs) and permeabilized with . % triton x- in pbs. next, the triton x- solution was replaced with . % (w/v) milk solution (in pbs) and cells were blocked for h at room temperature. primary antibodies were diluted in . % milk solution and samples were incubated with primary antibodies for h. after washing three times with pbs, samples were incubated with fluorophore-conjugated secondary antibodies, diluted in milk solution, for min. after washing three times with pbs samples were mounted in fluoromount g solution containing dapi (southern biotech) for dna staining. 
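the gene set enrichment analysis named in the methods above was run with the r package "fgsea". the core statistic behind that kind of analysis, a weighted running-sum enrichment score in the style of subramanian et al., can be sketched in a few lines. this is an illustrative python sketch, not the fgsea implementation; the function name, the weighting exponent `p` and all inputs are assumptions made for the example.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: gene names sorted by decreasing ranking metric
    scores:       ranking metric for each gene, in the same order
    gene_set:     names of the genes that belong to the pathway
    p:            weighting exponent (p=0 gives an unweighted statistic)
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("need weighted hits and at least one gene outside the set")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at set members (weighted by the metric), down otherwise.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best
```

a positive score means the set members cluster at the top of the ranking (upregulated after infection in this setting), a negative score that they cluster at the bottom. significance testing, which fgsea handles by permutation, is deliberately left out of this sketch.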
microscopic analyses were conducted with a nikon eclipse ti microscope (nikon, tokyo, japan) or a leica sp confocal microscope (leica) for the subcellular localization analyses. for quantification of the nuclear translocation of nf-κb p /rela or irf , nuclei were first segmented using the dapi signal. second, the segmented nucleus was dilated, and finally the original nucleus mask was subtracted from the dilated nucleus to detect the perinuclear fluorescent signal. to determine sars-cov- infected cells, dsrna intensity was measured within the perinuclear area. the status of nf-κb or irf nuclear signals was determined based on the ratio of perinuclear intensity to nuclear intensity. for pretreatment experiments, cells were treated for h with serial dilution of ifnα for calu- cell stimulation (fig. a,b ) cells were transfected with the indicated amount of poly(i:c) using lipofectamine reagent as per the manufacturer's protocol. h after transfection, total rna was isolated and rt-qpcr was used to determine transcript levels as described above. for transfection in sars-cov- infected cells (fig. and extended data fig. ), cells seeded in -well plates were infected with sars-cov- at moi= for h. cells were then transfected with poly(i:c) or herring dna ( ng/well) using lipofectamine reagent as per the manufacturer's protocol. h after transfection, cells were either fixed with % paraformaldehyde and processed for immunofluorescence, or total rna was isolated for rt-qpcr analysis as described above. infected and mock cells were washed with pbs and lysed with µl of sample buffer ( for cytopathic effect (cpe) assays, calu- or a -ace cells were plated into well plates. cells were infected with sars-cov- for h followed by fixation in % formaldehyde for h. cells were then stained with % crystal violet solution and scanned. patient sera were collected and stored at - °c until cytokine measurement. a -ace cells were infected with sars-cov- for h. 
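the nuclear translocation quantification described in the imaging methods above (segment the nucleus, dilate it, subtract the original mask to obtain a perinuclear ring, then compare ring and nuclear intensities) can be sketched as follows. this is a minimal pure-python illustration on nested lists, not the authors' fiji macro; the function names, the 3x3 structuring element and the `ring_width` parameter are assumptions, and no classification threshold is given because the source does not state one.

```python
def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element."""
    rows, cols = len(mask), len(mask[0])
    for _ in range(iterations):
        grown = [row[:] for row in mask]
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < rows and 0 <= cc < cols:
                                grown[rr][cc] = True
        mask = grown
    return mask


def mean_intensity(image, mask):
    """Mean pixel value of `image` inside `mask`."""
    values = [image[r][c]
              for r in range(len(mask))
              for c in range(len(mask[0]))
              if mask[r][c]]
    if not values:
        raise ValueError("mask is empty")
    return sum(values) / len(values)


def perinuclear_to_nuclear_ratio(image, nucleus_mask, ring_width=1):
    """Ratio of mean perinuclear-ring intensity to mean nuclear intensity."""
    dilated = dilate(nucleus_mask, ring_width)
    # Ring = dilated nucleus minus the original nucleus mask.
    ring = [[d and not n for d, n in zip(d_row, n_row)]
            for d_row, n_row in zip(dilated, nucleus_mask)]
    nuclear = mean_intensity(image, nucleus_mask)
    if nuclear == 0:
        raise ValueError("nuclear intensity is zero")
    return mean_intensity(image, ring) / nuclear
```

a low ratio indicates that the transcription factor signal is concentrated in the nucleus, a high ratio that it remains in the cytoplasm around it. on real images the same steps would typically be done with array libraries (for example `scipy.ndimage.binary_dilation`) rather than nested loops.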
a, cells were fixed and stained with antibodies specific for irf (green), p /rela (red) and dsrna (grey). turquoise arrows point to cells showing p /rela nuclear accumulation. scale bars, µm. b, graph shows the mean nuclear accumulation of irf and p /rela for images from cells treated as in panel (a). c, infected cells were either untreated or transfected with poly(i:c) h post infection. cells were fixed and stained with antibodies against dsrna followed by analysis using widefield microscopy. graph shows the average percent of infected cells for fields of view collected from independent experiments determined by dsrna fluorescence signal (n> cells per field of view). d-e, cells were infected with sars-cov- for the indicated times. cells were lysed and levels of given proteins were determined by western blot using monospecific primary antibodies. e, western blot signals for phospho-p /rela (prela) were quantified and compared to the corresponding total p /rela protein levels. graph shows the mean and sem for prela vs. total p /rela protein levels for independent experiments. f-i, a -ace cells were infected with sars-cov- for h followed by transfection with poly(i:c) and incubated for h. f, total rna was isolated and the mrna levels of ifit and ifit were determined by rt-qpcr. graphs show the mean and sem from independent experiments. g, cells were fixed and stained with antibodies specific for irf (green), p /rela (red) and dsrna (grey). magenta arrows point to cells with nuclear accumulation of both irf and p /rela and turquoise arrows point to cells with only p /rela nuclear signal. scale bars, µm. h-i, quantification of nuclear translocation of fluorescence signals from p /rela or irf from fields of view collected from independent experiments conducted as in panel (g). graphs show the mean number of cells with nuclear signal in uninfected or infected cells in either the mock or sars-cov- treatment conditions. infection was determined by the dsrna signal. 
quantification was done using an in-house fiji macro. cells. a-c, a -ace cells were infected with sars-cov- for h followed by fixation and staining with the indicated antibodies. cells were analyzed by confocal microscopy. scale bars µm (a-b), µm (c). d, cells were infected with sars-cov- . one hour after infection cells were treated with the indicated drugs at the given concentrations. total rna was isolated and the tnf mrna transcript levels were determined by rt-qpcr. graph shows the average fold change and sem for tnf transcript levels compared to dmso treated cells for independent experiments. e, a -ace cells were infected with sars-cov- for h followed by fixing and staining with the indicated antibodies. cells were analyzed by confocal microscopy. scale bars, µm. f, pearson's correlation coefficient for fluorescence signals pertaining to sting and tgn was calculated for fields of view over independent experiments (n> cells). g, a -ace cells were infected with sars-cov- for h followed by fixing and staining with antibodies specific to sting (green) or n protein (red). cells were analyzed by confocal microscopy. scale bars, µm upper panels, µm for inset that is indicated with a rectangle in middle right panel. h, pearson's correlation coefficient for fluorescence signal pertaining to sting and n protein was calculated for fields of view over independent experiments (n> cells).
calu- cells show separation between infection and mock but the a -ace cells have less separation due to lower levels of infection. b, plot comparing enriched genes between the two cell lines. c, venn diagram of significantly (fdr < %) upregulated (top) or downregulated (bottom) genes comparing calu- and a -ace cells. calu- or a -ace cells were treated with increasing concentrations of given ifns; highest concentration corresponds to x the ec value (determined for isg activation by qpcr) for each ifn. six hours post treatment cells were infected with sars-cov- (moi= ) and h thereafter the graphs show the means and sems compared to hprt mrna levels for independent experiments.
c-d, cells were infected with sars-cov- or mock-infected for h, then transfected with poly(i:c) or herring dna and h thereafter, total rna was isolated and the mrna levels of given immune genes were determined by rt-qpcr. c, the graphs show the mean mrna levels of ifit or tnf, corrected for hprt, for independent experiments for mock-infected cells. d, analogous to panel c, but for sars-cov- infected cells. values were corrected for mock of the same treatment for independent experiments. e-g, cells were infected with sars-cov- and, h later, cells were treated with the given drugs or dmso only. e, total rna was isolated and the viral rna levels were determined by rt-qpcr. the graph shows the mean and sem for independent experiments corrected for hprt. f, cells were fixed and stained with antibodies specific for dsrna. graph shows the mean percent of infected cells from different experiments for each condition. g, graph shows the average fold change and sem for the number of cells for each drug treatment compared to the dmso treated cells. h, cells either un-transfected or transfected with herring dna for h were fixed, stained with the indicated antibodies and examined by confocal microscopy. scale bar, µm. i, a -ace cells were infected with sars-cov- for h followed by fixing and staining with the indicated antibodies.

we thank all members of the molecular virology department at heidelberg university for helpful discussions and support during different stages of the covid- related lockdown. we also thank dr. monica boxberger for taking patient samples, sandra wüst for excellent

the authors declare no competing interests.

key: cord- -xh koro authors: dilucca, maddalena; forcelloni, sergio; georgakilas, alexandros g.; giansanti, andrea; pavlopoulou, athanasia title: temporal evolution and adaptation of sars-cov codon usage date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: xh koro the outbreak of severe acute respiratory syndrome-coronavirus- (sars-cov- ) has caused an unprecedented pandemic. since the first sequenced whole-genome of sars-cov- on january , the identification of its genetic variants has become crucial in tracking and evaluating their spread across the globe. in this study, we compared , sars-cov- genomes isolated from countries since the outbreak of this novel coronavirus with the first sequenced genome in wuhan to quantify the evolutionary divergence of sars-cov- . thus, we compared the codon usage patterns, every two weeks, of of sars-cov- genes encoding for the membrane protein (m), envelope (e), spike surface glycoprotein (s), nucleoprotein (n), non-structural c-like proteinase ( clpro), ssrna-binding protein (rbp), ’-o-ribose methyltransferase (omt), endornase (rnase), helicase, rna-dependent rna polymerase (rdrp), nsp , nsp , and exonuclease exon. as a general rule, we find that the sars-cov- genome tends to diverge over time by accumulating mutations in its genome and, specifically, in the coding sequences for proteins n and s. interestingly, different patterns of codon usage were observed among these genes. genes s, nsp , and nsp tend to use a narrower set of synonymous codons that are better optimized to the human host. conversely, genes e and m consistently use a broader set of synonymous codons, which does not vary with respect to the reference genome. we identified key sars-cov- genes (s, n, exon, rnase, rdrp, nsp and nsp ) suggested to be causally implicated in the virus adaptation to the human host. the recent emergence of the novel human pathogen severe acute respiratory syndrome coronavirus (sars-cov- ) in china and its rapid spread poses a global health emergency. on march , who publicly declared the sars-cov- outbreak a pandemic. as of may , , the covid- protein-coding sequences from the , genomes under study, by using exonerate with default parameters [ ] . 
we calculated the effective number of codons (enc) to estimate the extent of the codon usage bias of sars-cov- genes. the values of enc range from (when just one codon is used for each amino acid) to (when all synonymous codons are equally used for each amino acid) [ ] . for each sequence, the computation of enc starts from $F_\alpha$, a quantity defined for each family $\alpha$ of synonymous codons:

$$F_\alpha = \frac{n_\alpha \sum_{k=1}^{m_\alpha} \left( n_{k\alpha} / n_\alpha \right)^2 - 1}{n_\alpha - 1}$$

where $m_\alpha$ is the number of different codons in $\alpha$ (each one appearing $n_{1\alpha}, n_{2\alpha}, \ldots, n_{m_\alpha \alpha}$ times in the sequence) and $n_\alpha = \sum_{k=1}^{m_\alpha} n_{k\alpha}$. finally, the gene-specific enc is defined as:

$$\mathrm{ENC} = N_s + \sum_{m} \frac{K_m \sum_{\alpha=1}^{K_m} n_\alpha}{\sum_{\alpha=1}^{K_m} n_\alpha F_\alpha}$$

where $N_s$ is the number of families with one codon only, $K_m$ is the number of families with degeneracy $m$, the inner sums run over the families of degeneracy $m$, and the outer sum runs over the degeneracy classes with more than one codon (the set of synonymous codons for leu can be split into one family with degeneracy , similar to that of phenylalanine (phe), and one family with degeneracy , similar to that, for example, of proline (pro)). enc was evaluated by using dambe . [ ] . the codon adaptation index (cai) [ , ] was used to quantify the extent of codon usage adaptation of sars-cov- to the human coding sequences. the principle behind cai is that the codon usage in highly expressed genes can reveal the optimal (i.e., most efficient for translation) codons for each amino acid. hence, cai is calculated based on a reference set of highly expressed genes to assess, for each codon $i$, the relative synonymous codon usage ($\mathrm{RSCU}_i$) and the relative codon adaptiveness ($w_i$):

$$\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n_i} \sum_{j=1}^{n_i} X_j}, \qquad w_i = \frac{\mathrm{RSCU}_i}{\max_{j=1,\ldots,n_i} \mathrm{RSCU}_j}$$

in $\mathrm{RSCU}_i$, $X_i$ is the number of occurrences of codon $i$ in the genome, and the sum in the denominator runs over the $n_i$ synonymous codons. rscu measures codon usage bias within a family of synonymous codons. $w_i$ is defined as the usage frequency of codon $i$ compared to that of the optimal codon for the same amino acid encoded by $i$ (i.e., the most used one in a reference set of highly expressed genes). 
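the rscu and relative adaptiveness definitions above can be sketched as follows. this is a minimal illustration, not the dambe implementation used in the study: the function names are hypothetical, and the codon table is deliberately cut down to two families (phe and pro) so that the arithmetic is easy to follow. a real analysis would use the full genetic code and a reference set of highly expressed host genes.

```python
from collections import Counter

# Assumption: a two-family toy table for illustration only (not the full genetic code).
SYNONYMOUS_FAMILIES = {
    "Phe": ["TTT", "TTC"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
}


def count_codons(sequence):
    """Count in-frame codons of a coding sequence (trailing partial codon ignored)."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3
    return Counter(sequence[i:i + 3] for i in range(0, usable, 3))


def rscu(counts, families=SYNONYMOUS_FAMILIES):
    """RSCU_i = X_i divided by the mean count over the n_i synonymous codons."""
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


def relative_adaptiveness(reference_counts, families=SYNONYMOUS_FAMILIES):
    """w_i = usage of codon i relative to the most used synonymous codon."""
    weights = {}
    for codons in families.values():
        best = max(reference_counts.get(c, 0) for c in codons)
        if best == 0:
            continue
        for c in codons:
            weights[c] = reference_counts.get(c, 0) / best
    return weights


# toy sequence: TTT TTT TTC CCT CCC
counts = count_codons("TTTTTTTTCCCTCCC")
rscu_values = rscu(counts)
weights = relative_adaptiveness(counts)
```

in this toy case ttt is used twice and ttc once, so ttt has an rscu of 4/3 and a weight of 1, while ttc has an rscu of 2/3 and a weight of 0.5. because both quantities are normalized within a family, w computed from counts equals w computed from rscu values.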
finally, the cai value for a given gene $g$ is calculated as the geometric mean of the usage frequencies of codons in that gene, normalized to the maximum cai value possible for a gene with the same amino acid composition:

$$\mathrm{CAI}_g = \left( \prod_{i=1}^{l_g} w_i \right)^{1/l_g}$$

where the product runs over the $l_g$ codons belonging to that gene (except the stop codon). this index ranges from to , where the score represents a greater tendency of the gene to use optimal codons in the host organism. the cai analysis was performed using dambe . [ ] . the synonymous codon usage data of human host were retrieved from the codon usage database (http://www.kazusa.or.jp/codon/). the similarity index (sid) was used to provide a measure of similarity in codon usage between sars-cov- and various potential host genomes. formally, sid is defined as follows:

$$R(A,B) = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^2 \, \sum_{i} b_i^2}}, \qquad \mathrm{SiD} = \frac{1 - R(A,B)}{2}$$

where $a_i$ is the rscu value of synonymous codons of the sars-cov- coding sequences; $b_i$ is the rscu value of the identical codons of the potential host. $R(A,B)$ is defined as the cosine value of the angle included between the $A$ and $B$ spatial vectors, and therefore quantifies the degree of similarity between the virus and the host in terms of their codon usage patterns. in our analysis, we considered as hosts the species shown in figure by dilucca et al. [ ] . sid values range from to ; the higher the value of sid, the more adapted the codon usage of sars-cov- to the host [ ] . an enc plot analysis was performed to estimate the relative contributions of mutational bias and natural selection in shaping cub of genes encoding proteins that are crucial for sars-cov- . in this plot, the enc values are plotted against gc values. if codon usage is dominated by the mutational bias, then a clear relationship is expected between enc and gc :

$$\mathrm{ENC} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where $s$ represents the value of gc [ ] . if the mutational bias is the main force affecting cub of the genes, the corresponding points will fall near wright's theoretical curve. 
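the expected relationship under pure mutational bias can be sketched numerically. the expression below is the standard form of wright's theoretical curve from the enc-plot literature, with s taken as the third-position gc fraction; the function names and the vertical-gap helper are assumptions introduced for illustration and are not the authors' code.

```python
def wright_expected_enc(s):
    """Expected ENC when codon usage is shaped by GC composition alone.

    `s` is the GC fraction at synonymous third codon positions, in [0, 1].
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a fraction between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def gap_below_curve(observed_enc, s):
    """Vertical gap between the curve and an observed gene (positive = below the curve).

    Assumption: a simple vertical offset, used here only as an easy-to-read
    proxy for how far a point departs from the mutational-bias expectation.
    """
    return wright_expected_enc(s) - observed_enc


expected_at_half = wright_expected_enc(0.5)
example_gap = gap_below_curve(50.5, 0.5)
```

a gene whose observed enc sits close to the curve has a gap near zero, while a positive gap means the gene uses fewer codons than its gc composition alone would predict.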
conversely, if cub is mainly affected by natural selection, the corresponding points will fall considerably below wright's theoretical curve. to quantify the relative extent of natural selection, for each gene, we calculated the euclidean distance d of its point from the theoretical curve. we then show the average values of the distance over time as a heatmap, drawn with the cimminer software [ ] , which uses euclidean distances and the average linkage algorithm. the muscle program was used to conduct multiple sequence alignments [ , ] . estimates of evolutionary divergence (i.e., the number of base substitutions per site) from the reference sequence (wsm, nc . ) were calculated by using the maximum composite likelihood model, implemented in the software package mega version . [ ] , based on sequences. all ambiguous positions were removed for each sequence pair (pairwise deletion option). distributions of divergence were assessed for each group of viruses divided into different periods. the statistical analysis was performed with the r software to calculate average values and their standard deviations. mann-whitney tests on distributions were used to test for statistical significance. all p-values were calculated from -sided tests using . as cut-off. in this study, we used the high-confidence sars-cov- -human protein-protein interactions (ppi) collected by gordon et al. [ ] who identified the viral proteins that physically associate with human proteins using affinity-purification mass spectrometry (ap-ms). we downloaded these ppi from to detect communities of ppi, we used the app molecular complex detection (mcode) [ ] in cytoscape. in a nutshell, mcode iteratively groups together neighboring nodes with similar values of the core-clustering coefficient, which for each node is defined as the density of the highest k-core of its immediate neighborhood times k. 
mcode detects the densest regions of the network and assigns to each detected community a score that is its internal link density times the number of nodes belonging to it. we also characterized the first ten communities $C$ with the mean value $\bar{x}_C$ and standard deviation $\sigma_C$ of codon bias values within the community, and used them to compute a z-score as $z_C = (\bar{x}_C - \bar{x}_n)/\sqrt{\sigma_C^2 + \sigma_n^2}$ (where $\bar{x}_n$ and $\sigma_n$ are, respectively, the mean value and standard deviation of codon bias values computed for all proteins). in this way, a value of $z_C$ > ($z_C$ < − ) indicates that community $C$ features significantly higher (lower) codon bias than the population mean. cytoscape was used to detect the degree k of a protein. the records of viruses according to geographical location and date of isolation are shown in figure . the evolutionary divergence from the reference sequence was calculated for each of the genes in all genomes under study. as shown in figure , evolutionary divergence, in terms of number of substitutions per site, increases over time. regarding the evolutionary trajectory of each gene (figure ), a statistically significant variation of evolutionary divergence is observed only for genes n and s. for the other genes most of % of viruses present a sequence identical to the reference one. this observation is in line with our previous observation that genes encoding nucleocapsid (n) and spike proteins (s) tend to evolve faster in comparison to the two genes encoding the integral membrane proteins m and e [ ] . to measure the codon usage bias in the sars-cov- genomes, we used the effective number of codons (enc) and the codon adaptation index (cai). for the functionally important genes in each genome, we calculated the average values of cai and enc over time, as compared to the reference sars-cov- sequence (wsm). to visually illustrate the differences among different time periods, the average enc and cai values of the coronavirus were depicted using a heatmap (see figures , ) . 
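the community z-score defined in the methods above, which compares the mean codon bias of a community with that of all proteins, can be sketched as follows. this is an illustrative reading of the formula rather than the authors' code: the function name is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the source does not say which estimator was used.

```python
from math import sqrt
from statistics import mean, pstdev


def community_z_score(community_values, all_values):
    """z_C = (mean_C - mean_n) / sqrt(sigma_C^2 + sigma_n^2).

    `community_values`: codon bias values (e.g. CAI) of the proteins in one community.
    `all_values`: codon bias values of all proteins in the network.
    Assumption: population standard deviations are used for both groups.
    """
    sigma_c = pstdev(community_values)
    sigma_n = pstdev(all_values)
    denominator = sqrt(sigma_c ** 2 + sigma_n ** 2)
    if denominator == 0:
        raise ValueError("both groups have zero variance; z-score undefined")
    return (mean(community_values) - mean(all_values)) / denominator


# toy numbers chosen so the arithmetic is easy to check by hand:
# community mean 3, sigma 1; network mean 2, sigma sqrt(2)
z = community_z_score([2, 4], [0, 2, 2, 4])
```

with these toy values the z-score is 1/sqrt(3), about 0.58. real inputs would be the per-gene cai or enc values of each mcode community and of the whole network.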
the different genes of the coronavirus show different patterns of codon usage. all the genes have enc and cai values that differ significantly from the corresponding values of reference sequences (|z-score| > ). specifically, the enc values associated with e, m and n are significantly higher than the corresponding ones in the reference sequences, indicating that these genes use a broader set of synonymous codons in their coding sequences. this observation implies that these genes maintained high genetic variability in the context of synonymous codon usage, which might give them some advantage in being expressed under diverse cellular and environmental conditions. s, nsp , nsp , exon, rbp, omt and clpro have higher values of enc, compared to the reference sequence, that decrease significantly over time. this suggests a tendency for these genes to optimize their choice of codons for the host over time. helicase has an intermediate value of enc. the enc value of rdrp increases only in the last two weeks, whereas rnase exhibits constantly low enc values. notably, the cai of all genes is markedly higher than that of the reference sequence, underscoring that these genes use codons that are better adapted to the human host. in particular, n, exon and s have the highest values of cai, suggesting that these genes accumulate preferential mutations to adapt better to the host. in line with our previous study [ ] , we calculated the similarity index (sid) of sars-cov- genomes with respect to human and other host species shown in table by woo et al. [ ] . to interpret these results, recall that the higher the value of sid, the more adapted the codon usage of sars-cov- to the host under study [ ] . in our previous analysis, sars-cov- exhibited high sid values for human ( . ), snakes ( . ), as well as pangolins (sid = . ), bats (sid = . ), and rats (sid = . ), which have been suggested to be possible hosts for sars-cov- [ ] . 
in this analysis, we found that the average value of sid is . ± . ( figure ). this value for human is higher than in our previous analysis. in contrast, the values of sid for other hosts are not statistically significant. based on the sid combined with the cai results ( figure ), we suggest that sars-cov- , over time, has preferentially accumulated mutations in its genome which correspond to codons that adapt better to the human host. to further investigate the evolutionary forces that affect the sars-cov- codon usage, an enc-plot analysis was conducted separately for each of the genes considered herein. we then investigated variations over time by performing enc plots for sets of genes binned by time. in these plots, each point represents a single gene retrieved from each genome. to show the temporal differences more clearly, we calculated the average values of the distance d from the wright theoretical curve, and represented them graphically through a heatmap in figure . the conventions are the same as in figure . as a general rule, the average distances from the wright theoretical curve increase over time for all the genes, except for exon, nsp , and omt, which appear to be stable. this means that the vast majority of the genes tend to be under a stronger action of natural selection over time. moreover, we note that genes encoding for helicase, e, and rbp have a shorter distance from the wright theoretical curve as a function of the time of isolation, meaning that the codon usage of these genes tends to be ruled by a mutational bias. looking at the numerical values associated with genes encoding for s, rdrp, and rnase, we observed that these genes are more scattered, on average, below the theoretical curve, indicating that the codon usage of these genes is the most affected by natural selection. 
conversely, the rest of the genes revealed larger deviations from the wright theoretical curve, thus showing a strict control of natural selection on their codon usage. as far as the network of interacting proteins in sars-cov- is concerned, we first investigated codon usage bias in relation with the connectivity patterns of the network. the degree distribution of the network suggests that it is scale-free (figure ), meaning that the network contains a large number of poorly connected proteins and a relatively small number of highly connected proteins or 'hubs'. the corresponding genes of these hub proteins have consistently higher values of codon usage bias when this is measured by cai and lower when this is measured by enc. of note, the two codon bias indices are anticorrelated. moreover, we examined codon bias in relation with the community structure of the ppi, where a community is a group of proteins that are more densely connected with each other than with the rest of the network.

figure . degree distribution of proteins p(k). the degree distribution of the network follows a power law, indicating that the network is scale-free.

figure . correlation between codon bias indices of genes and the degree k of the corresponding proteins in the ppi. cai of a gene consistently increases with the connectivity of the corresponding protein in the ppi, whereas enc decreases.

table shows the features of the first ten communities together with their average degree and their average of the two codon bias indices (cai and enc). we also calculate the internal average value $\bar{x}_C$ and the z-scores, comparing the distribution of bias inside the community with all the proteins. we note that all ten communities pass the z-score test (z> ). the first community includes only proteins ( . % of the whole network) and basically overlaps with the main core of the ppi (i.e., the k-core with the highest possible degree). 
notably, proteins belonging to this community have on average a codon bias index (as measured by cai and enc) that is significantly higher than the average of the rest of the network (the z-score > ). in this study, we performed a comprehensive analysis of the evolutionary divergence and codon usage of sars-cov- over time, considering all genomes available in gisaid on may , . after filtering out incomplete genomes, we retained a total of , complete genomes, with the purpose of investigating the divergence of these viral genomes from the first sequenced sars-cov- genome (nc . ). we focused on sars-cov- genes/proteins that are crucial for its structure, synthesis, transmissibility and virulence. the sars-cov- genomes have a tendency to diverge constantly from the reference genome. this is in accordance with a recent study by pachetti and colleagues ( ), who demonstrated that the number of sars-cov- mutations changes over time [ ] . this trend is more pronounced in the last two weeks of march, where the genome divergence is more rapid. of note, the sequences of the genes encoding the structural nucleocapsid (n) and spike (s) proteins vary significantly from the reference sequences, as do those of the non-structural proteins nsp and nsp . in support of that, the enc analysis revealed that the codon usage patterns of all genes under study differentiate over time from the reference sequences. this means that a percentage of the mutations in the sars-cov- genome are synonymous and, therefore, alter the patterns of codon usage. the cai values of all genes, especially s, n and exon, have increased significantly as compared to the reference. the value of the sid estimated from the average codon usage of the sars-cov- genomes against the codon usage of the human host was higher ( . ± . ) as compared to the reference one ( . ). 
it is suggested that the coronaviral genome has undergone genetic recombinations and beneficial nucleotide changes, which have likely contributed to its enhanced adaptability to the human host. in other words, sars-cov- might use the human translational machinery more effectively than that of other animals. this could explain, at least partially, the global distribution and increasing prevalence of sars-cov- [ , ] . the enc-plot analysis revealed that the codon usage of the viral genes considered here is subject to different balances between mutational bias and natural selection. for instance, the codon preference of the genes s, rdrp, rnase, n, and nsp is mainly determined by natural selection, as opposed to the genes m and e, the codon usage of which is rather affected by mutation bias. these results are similar to those of our previous study [ ] , showing that the codon usage of the genes n, s and rdrp was found to be under stronger selection than genes encoding for proteins m and e. on the basis of our findings, the sars-cov- genes s and n consistently displayed higher genetic diversity, increased host adaptation, and more proneness to natural selection. according to rehman et al. ( ) , the spike protein, which mediates the virus interaction with the human host cells, is more prone to mutations and particularly those occurring in the amino acids implicated in the spike-angiotensin-converting enzyme (ace ) interface [ ] . the n protein, responsible for virus assembly and rna transcription [ ] , is considered the most conserved and stable coronaviral structural protein [ ] . it is tempting to speculate that the n gene accumulates mutations that do not affect its structure and function, but rather enable it to evade the host's immune responses and enhance sars-cov- 's pathogenicity. 
therefore, these key genes, n and s, accumulate beneficial mutations that would increase the evolvability and transmissibility of sars-cov- , and enable it to continuously adapt to different populations, like spilling over from bats, or other candidate natural reservoirs, to humans. rdrp, which catalyzes the transcription and replication of the coronaviral genome, binds to nsp and nsp to form the core rna synthesis machinery of sars-cov- [ ] . accordingly, the genes coding for rdrp and its co-factor nsp appear to be subject to natural selection, suggesting that they have an increased ability to adapt to novel hosts. despite the fact that rdrp is considered less vulnerable to mutations [ ] due to its vital role in maintaining viral genome fidelity, the mutations that occur in rdrp likely promote the virus's adaptive flexibility and enhance its resistance to antiviral drugs [ ] . the exoribonuclease exon, which ensures viral replication fidelity by correcting nucleotides incorrectly incorporated by rdrp [ ] , has a relatively high cai value, indicating greater adaptability to the human host. furthermore, we found that in the human-sars-cov- network, the most highly connected or hub proteins have consistently higher codon usage bias relative to the less connected proteins. this observation leads to the suggestion that evolutionary pressure is exerted upon the genes encoding those proteins, most probably because of their great biological significance for the virus. in other words, if these nodes are removed the entire network will eventually collapse [ ] . 
large scale genomic analysis of sars-cov- genomes reveals a clonal geo-distribution and a rich genetic variations of hotspots mutations
emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant
on the origin and continuing evolution of sars-cov- 
phylogenetic network analysis of sars-cov- genomes
geographic and genomic distribution of sars-cov- mutations
a novel coronavirus from patients with pneumonia in china
a computer program for aligning a cdna sequence with a genomic dna sequence
the 'effective number of codons' used in a gene
dambe : a comprehensive software package for data analysis in molecular biology and evolution
the codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications
an evolutionary perspective on synonymous codon usage in unicellular organisms
codon usage and phenotypic divergences of sars-cov- genes
insights into the genetic and host adaptability of emerging porcine circovirus 
an information-intensive approach to the molecular pharmacology of cancer
muscle: multiple sequence alignment with high accuracy and high throughput
muscle: a multiple sequence alignment method with reduced time and space complexity
molecular evolutionary genetics analysis across computing platforms
analysis of the mutation dynamics of sars-cov- reveals the spread history and emergence of rbd mutant with lower ace binding affinity
a sars-cov- protein interaction map reveals targets for drug repurposing
an automated method for finding molecular complexes in large protein interaction networks
the proximal origin of sars-cov- 
evolutionary trajectory for the emergence of novel coronavirus sars-cov- 
origin and evolution of pathogenic coronaviruses
cloning, sequencing, expression, and purification of sars-associated coronavirus nucleocapsid protein for serodiagnosis of sars
analysis of preferred codon usage in the coronavirus n genes and their implications for genome evolution and vaccine design 
structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors a single mutation in poliovirus rna-dependent rna polymerase confers resistance to mutagenic nucleotide analogs via increased fidelity the curious case of the nidovirus exoribonuclease: its role in rna synthesis and replication fidelity network biology: understanding the cell's functional organization a total of , sars-cov- genomes reported across the world were obtained from gisaid (available at https://www.gisaid.org/epiflu-applications/nexthcov- -app/), on may . then, the sequences were classified according to their isolation dates. only complete genomes ( ) ( ) ( ) were included in the present analysis. thus, a list of , sars-cov- genomes was generated; representing % of total number in gisaid. we used the sars-cov- coding dna sequences (cdss) deposited in january by zhu and coworkers [ ] , formerly called "wuhan seafood market pneumonia virus" (wsm, nc . ). we retrieved these sequences from ncbi public database at https : //www.ncbi.nlm.nih.gov/. the cdss of the reference sars cov- genome (nc . ) were used to retrieve the homologous key: cord- -slouuryl authors: baker, jeremy d.; uhrich, rikki l.; kraemer, gerald c.; love, jason e.; kraemer, brian c. title: a drug repurposing screen identifies hepatitis c antivirals as inhibitors of the sars-cov- main protease date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: slouuryl the sars coronavirus type (sars-cov- ) emerged in late as a zoonotic virus highly transmissible between humans that has caused the covid- pandemic , . this pandemic has the potential to disrupt healthcare globally and has already caused high levels of mortality, especially amongst the elderly. the overall case fatality rate for covid- is estimated to be ∼ . % overall and . % in hospitalized patients age - years . 
therapeutic options for treating the underlying viremia in covid- are presently limited by a lack of effective sars-cov- antiviral drugs, although steroidal anti-inflammatory treatment can be helpful. a variety of potential antiviral targets for sars-cov- have been considered, including the spike protein and replicase. based upon previous successful antiviral drug development for hiv- and hepatitis c, the sars-cov- main protease (mpro) appears an attractive target for drug development. here we show the existing pharmacopeia contains many drugs with potential for therapeutic repurposing as selective and potent inhibitors of sars-cov- mpro. we screened a collection of ∼ , drugs with a previous history of use in humans for compounds that inhibit the activity of mpro in vitro. in our primary screen we found ∼ compounds with activity against mpro (overall hit rate < . %). subsequent dose validation studies demonstrated dose responsive hits with an ic ≤ μm. hits from our screen are enriched with hepatitis c ns / a protease targeting drugs including boceprevir (ic = . μm), ciluprevir ( . μm), narlaprevir (ic = . μm), and telaprevir ( . μm). these results demonstrate that some existing approved drugs can inhibit sars-cov- mpro and that screen saturation of all approved drugs is both feasible and warranted. taken together this work suggests previous large-scale commercial drug development initiatives targeting hepatitis c ns / a viral protease should be revisited because some previous lead compounds may be more potent against sars-cov- mpro than boceprevir and suitable for rapid repurposing.
the sars virus and sars-cov- , the cause of the covid- pandemic, are zoonotic coronaviruses found in bats that can infect humans. initial symptoms of sars-cov- infection include fever, myalgia, cough, and headache. infection usually resolves without active medical intervention, but for a subset of cases infection can progress to viral pneumonia and a variety of complications including acute lung targetable activities for covid- , the coronavirus mpro seems a likely choice for rapid drug to accelerate drug development we employed a drug repurposing strategy, an approach of utilizing previously approved drugs for new indications , .
previous work suggests libraries enriched with known bioactive drug-like compounds provide the best opportunity for finding new lead compounds , . thus we attempted the selective optimization of side activities (sosa) approach as a rapid and cost effective means to identify candidate hits while minimizing the number of compounds screened. the sosa approach proceeds by two steps. first a limited set of carefully chosen, structurally diverse, well- characterized drug molecules are screened; as approved drugs, their bioavailability, toxicity and efficacy in human therapy has already been demonstrated , . to screen as much of the available approved drug space as possible in an easily accessible format we chose to screen the broad institute drug repurposing library ( compounds, see table s ) . this represents about half of the approximately , approved or experimental drugs known to human clinical medicine . there are significant cost and time advantages realized by drug repurposing as it can accelerate the preclinical phase of development and streamline clinical trials to focus on efficacy rather than safety. repositioning existing approved drugs with the capacity to inhibit covid- virus replication and infection would be of profound utility and immediately impact health care in the current pandemic. there are no drugs in clinical use specifically targeting coronavirus replication. the major advantage of the approach taken here is that by screening drugs with a history of previous clinical use, we will be focusing on compounds with known properties in terms of pharmacokinetics (pk), pharmacodynamics (pd) and toxicity. thus, the broad repurposing library we screened consists of compounds suitable for rapid translation to human efficacy trials. 
we began assay development by selecting potentially suitable synthetic mpro substrates and compared catalyzed hydrolysis curves between fluorescently labeled substrates (ac-abu-tle-leu-gln-afc , dabcyl-vklq-edans, ac-vklq-afc, dabcyl-tsavlqsgfrkm-edans , and mca-avlqsgfr-k(dnp)-k-nh ). we chose to use the recently published ac-abu-tle-leu-gln-afc (abu = -aminobutyrate, tle = t-butylglycine) synthetic peptide, which contains non-canonical amino acids, because mpro cleaves this preferred sequence more readily than the native vklq sequence (fig a). substrates dabcyl-tsavlqsgfrkm-edans and mca-avlqsgfr-k(dnp)-k-nh had drastically lower rates of mpro catalyzed hydrolysis and were not considered further in our assay development (fig a). to determine concentration ratios between mpro and substrate, we next performed a two-dimensional titration and chose nm mpro and µm substrate for a balance of relatively modest mpro protein requirement and a robust fluorescence intensity (fig b). before screening the broad library, we piloted our assay conditions against the nih clinical collections library (~ compounds) and calculated our z'-factor for each plate at . and . (fig c and d). z'-factor is a score of suitability of assays for high-throughput screening and is derived from the equation $Z' = 1 - \frac{3(\sigma_p + \sigma_n)}{|\mu_p - \mu_n|}$, where σ = standard deviation, µ = mean, p = positive controls, and n = negative controls. a score greater than . indicates a screenable assay. although no promising compounds were identified from this smaller library, it demonstrated that our assay was sufficiently robust for screening the much larger broad repurposing library. window was considered at z-score ≤ - and was calculated as the z-score of Δrfu at minutes corresponding to the linear portion of the curve. x-axis indicates arbitrary compound number arranged by increasing z-score. (d) z'-factor for the two nih clinical collection -well plates.
pink circles indicate negative control (dmso) and black circles represent positive controls (no protein). z'factor calculated at . and . for plates and respectively. y axis represents change in rfu over minutes. the concept of drug repurposing is to utilize existing therapeutic drugs to treat a new disease indication. this approach is particularly relevant for covid- because of the potential for an accelerated clinical impact as compared to de novo drug development. a systematic approach to facilitate drug repurposing has recently been described ( , http:// www.broadinstitute.org/repurposing) and has made a large collection of drugs with previous history of use in humans available for high throughput screening. we acquired this at -well density using the optimized kinetic mpro assay described in fig . our overall repurposing strategy is described in fig a. we conducted a single point screen at m compound concentration and observed ~ compounds with activity against sars-cov- mpro for an overall hit rate < . %. these compounds were screened in parallel against the natural amino acid substrate (ac-vklq-afc) as well as a kinetically preferred substrate (ac-abu-tle-leu-gln-afc) ( fig b) . individual compounds are shown in table . virtually. any hit from the broad library (z-score ≤ - ) was validated for dose-responsiveness. all suitable compounds passing this filter with satisfactory curve fitting and potency were ordered as powder and re- validated. future efforts will test for selectivity and in orthogonal assays for suitability. although outside the scope of this report, determination of viral anti-replicative properties as well as toxic profile at required dosage will be determined. the goal of this paradigm is to find suitable candidates for development both as tools for probing underlying mechanisms of sars-cov- as well as for translational potential. (b) screen of the broad repurposing library. 
library was screened at a concentration of µm against both ac-vklq-afc (black) and ac-abu-tle-leu-gln-afc (purple). hit window was considered for compounds falling below z-score ≤ - against both substrates and consisted of compounds. compounds ordered by average z-score. we validated the hits from the primary screen by conducting a -point dose-response analysis with a drug concentration range from m down to . nm ( -fold dilution series). from this dose-response analysis, repurposing library. using this approach, we derived a docking score for each compound (see table s for broad repurposing library with docking scores). we observe a poor correlation (pearson r= . ) between mpro docking score and z-score in the protease inhibition assay (fig a) . furthermore, top hits from the screen also exhibit a weak correlation (pearson r=- . ) between compound potency and docking score (fig b) . is to complete a survey of approved drugs to identify therapies that can block covid- viral replication by inhibiting the main viral protease. the advantage of this approach is that any approved drug identified can be advanced rapidly to clinical trials without extensive multi-year preclinical development efforts. this is also particularly germane given the limitations of animal models of covid- infection and a diverse variety of initial hits were identified in our high throughput screen of the broad library. of these, the most potent hits are all known protease inhibitors and there is strong representation from protease inhibitors developed to inhibit hcv protease ns / a (boceprevir, ciluprevir, narlaprevir, and telaprevir). clearly as approved or well-developed clinical candidates, these drugs exhibit pharmacological and pharmacodynamic properties well suited to repurposing as a covid- antiviral therapy. boceprevir and narlaprevir appear the most potent against mpro and may be suitable for repurposing. 
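the two plate statistics used in the screen, the z'-factor for assay quality and the z-score for hit calling, can be sketched in a few lines of python. this is an illustrative sketch rather than the authors' analysis, which was done in biotek gen software, excel and graphpad prism. it assumes sample standard deviations and a z-score taken relative to all wells on a plate; the `cutoff` argument is a placeholder because the threshold value is not legible in the text, and the well values are invented.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' = 1 - 3 * (sd_p + sd_n) / |mean_p - mean_n| for the control wells."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical")
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


def z_scores(values):
    """Standard score of each well relative to every well on the plate
    (an assumption; the reference population is not stated in the text)."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


def hit_indices(values, cutoff):
    """Wells whose z-score is at or below `cutoff`, i.e. strongly reduced signal."""
    return [i for i, z in enumerate(z_scores(values)) if z <= cutoff]


# Invented change-in-RFU readings: positive controls have no protein (low signal),
# negative controls are DMSO only (full signal).
no_protein = [10, 12, 11, 9, 8]
dmso = [100, 104, 96, 98, 102]
plate_quality = z_prime(no_protein, dmso)

compound_wells = [100, 101, 99, 100, 102, 98, 20]
candidate_hits = hit_indices(compound_wells, cutoff=-2)  # placeholder cutoff
```

with well separated controls the z'-factor approaches 1, and only the last compound well, whose signal is far below the plate mean, falls inside the hit window.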
( nm final concentration in reaction buffer detailed above) was added with a multiflo fx liquid dispenser using a µl cassette. compounds were incubated with mpro for minutes at rt after which ul of substrate ( um final concentration of either ac-vklq-afc or ac-abu-tle-leu-gln-afc) was dispensed into the plate and read using a cytation multi-mode reader immediately at / nm excitation and / nm emission wavelengths every minutes for minutes. data was analyzed using biotek gen software, microsoft excel, and graphpad prism . hit compounds were ordered from the broad institute pre-plated in -well format (greiner ) as -point serial dilutions ( -fold) at nl per well. mpro ( nm final concentration) and substrate (ac- abu-tle-leu-gln-afc at µm final concentration) were dispensed in the same manner described above. graphs were generated using graphpad prism . ic calculations were performed using graphpad prism the novel coronavirus originating in wuhan china: challenges for global health governance epidemiologic and clinical characteristics of novel coronavirus infections involving patients outside wuhan cov- and coronavirus disease : what we know so far mortality in older patients with covid- clinical conditions of coronavirus disease (covid- ) pneumonia: a multicenter sars and mers: recent insights into emerging coronaviruses viral and cellular proteins involved in coronavirus replication crystal structure of sars-cov- main protease provides a basis for design of improved alpha-ketoamide inhibitors structure of m(pro) from covid- virus and discovery of its inhibitors atazanavir: a novel hiv- protease inhibitor. expert opinion on investigational drugs simeprevir for the treatment of chronic hepatitis c. expert opinion on pharmacotherapy drug repositioning: identifying and developing new uses for existing drugs a small-molecule screen in c. 
elegans yields a new calcium channel antagonist an antidepressant that extends lifespan in adult caenorhabditis elegans selective optimization of side activities: the sosa approach selective optimization of side activities: another way for drug discovery the drug repurposing hub: a next-generation drug library and information resource drug bank: open data drug & drug target database profiling of substrate specificity of sars-cov cl structure-based design of antiviral drug candidates targeting the sars-cov- main protease challenges in modern drug discovery: a case study of boceprevir, an hcv protease inhibitor for the treatment of hepatitis c virus infection the discovery and development of boceprevir discovery of narlaprevir (sch ): a potent sustained virologic response after therapy with the hcv protease inhibitor narlaprevir in combination with peginterferon and ribavirin is durable through long-term follow-up rapid decline of viral rna in hepatitis c patients treated with vx- : a phase ib, placebo-controlled, randomized study combination therapy with telaprevir and pegylated interferon suppresses both wild-type and resistant hepatitis c virus accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field prospective evaluation of free energy calculations for the prioritization of recent progress in the discovery of inhibitors targeting coronavirus proteases recent developments on coronavirus main protease/ c like protease inhibitors. 
recent patents on anti-infective drug discovery role of lopinavir/ritonavir in the treatment of sars: initial virological and clinical findings prediction of the sars-cov- ( -ncov) c- like protease ( cl (pro)) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates glecaprevir and maraviroc are high-affinity inhibitors of sars-cov- main protease: possible implication in covid- therapy boceprevir for untreated chronic hcv genotype infection antiviral activity of narlaprevir combined with ritonavir and pegylated interferon in chronic hepatitis c patients preclinical characterization of the antiviral activity of sch (narlaprevir), a novel mechanism-based inhibitor of hepatitis c virus ns protease broad spectrum antiviral remdesivir inhibits human endemic and zoonotic deltacoronaviruses with a highly divergent rna dependent rna polymerase remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro glide: a new approach for rapid, accurate docking and scoring method and assessment of docking accuracy glide: a new approach for rapid, accurate docking and scoring. . enrichment factors in database screening key: cord- -iv gs kg authors: kim, youngchang; wower, jacek; maltseva, natalia; chang, changsoo; jedrzejczak, robert; wilamowski, mateusz; kang, soowon; nicolaescu, vlad; randall, glenn; michalska, karolina; joachimiak, andrzej title: tipiracil binds to uridine site and inhibits nsp endoribonuclease nendou from sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: iv gs kg sars-cov- nsp is a uridylate-specific endoribonuclease with c-terminal catalytic domain belonging to the endou family. it degrades the polyuridine extensions in (−) sense strand of viral rna and some non-translated rna on (+) sense strand. this activity seems to be responsible for the interference with the innate immune response and evasion of host pattern recognition. 
nsp is highly conserved in coronaviruses suggesting that its activity is important for virus replication. here we report first structures with bound nucleotides and show that sars-cov- nsp specifically recognizes u in a pattern previously predicted for endou. in the presence of manganese ions, the enzyme cleaves unpaired rnas. inhibitors of nsp have been reported but not actively pursued into therapeutics. the current covid- pandemic brought to attention the repurposing of existing drugs and the rapid identification of new antiviral compounds. tipiracil is an fda approved drug that is used with trifluridine in the treatment of colorectal cancer. here, we combine crystallography, biochemical and whole cell assays, and show that this compound inhibits sars-cov- nsp and interacts with the uridine binding pocket of the enzyme’s active site, providing basis for the uracil scaffold-based drug development. the current pandemic of covid- is caused by severe acute respiratory syndrome coronavirus (sars-cov- ). as a typical member of the coronaviridae family it is spherical, enveloped, non-segmented, (+) sense rna virus with a large ~ kbs genome. there is no existing vaccine or proven drug against sars-cov- . since february , concerted efforts have focused on characterizing its biology and developing various detection and treatment options, ranging from vaccines through antibodies to antivirals. the coronaviruses (+) sense rna genomes are used as mrna for translation of a large replicase made of two large polyproteins, pp a and pp ab, and as a template for replication of its own copy . these polypeptides are processed by two viral proteases: papain-like protease (plpro, encoded within nsp ), and c-like protease ( clpro or mpro, encoded by nsp ). 
for cov- the cleavage yields nonstructural proteins, nsps (nsp codes for just a -residue peptide), that assemble into a large membrane-bound replicase-transcriptase complex (rtc) and exhibit multiple enzymatic and binding activities. in addition, several sub-genomic rnas are generated during virus proliferation from (-) sense rna, resulting in translation of structural proteins and - accessory proteins. nonstructural proteins are potential drug targets for therapies and, because of their sequence and function conservation, developed therapeutics might in principle inhibit all human coronaviruses. nsp is a uridylate-specific endoribonuclease. its catalytic c-terminal domain shares sequence similarity and function with the endou family enzymes, which are involved in rna processing and are broadly distributed in viruses, archaea, bacteria, plants, humans and other animals. the viral endou subfamily has been named nendou. the nsp enzyme is active as a kda hexamer consisting of three dimers. it was reported that nendou cleaves both single- and double-stranded rna at uridine sites producing ', '-cyclic phosphodiester and ´-hydroxyl termini . the ', '-cyclic phosphodiester is then hydrolyzed to '-phosphomonoester. in coronaviruses, toroviruses and arteriviruses, nendou maps to nsp s and nsp s. nsp s are much more conserved in sequence than nsp s. it was proposed that in coronaviruses nsp affects viral replication by interfering with the host's innate immune response , . to evade the host pattern recognition receptor mda , which is responsible for activating the host defenses, nsp cleaves the ′-polyuridine tracts in (-) sense viral rnas, which are the product of polya-templated rna synthesis . these polyu sequences correspond to mda -dependent pathogen-associated molecular patterns (pamps) and nendou activity limits their accumulation.
for sars-cov it was reported that nsp cleaves highly conserved non-translated rna on (+) sense strand showing that both rna sequence and structure are important for cleavage , . recently we determined the first two crystal structures of sars-cov- nsp and showed that it is similar to the sars-cov and mers-cov homologs. we have also shown that purified recombinant sars-cov- nsp can efficiently cleave synthetic oligonucleotide substrate containing single ru, ′- -fam-darudada - -tamra- ′ . studies of nendou subfamily members, nsp and nsp , indicate that the enzymes vary in their requirement for metal ion, with nsp variants showing dependence on mn + . at the same time, the proteins are often compared to metalindependent eukaryotic rnase a, which has a completely different fold, but shares active site similarities and broad function. specifically, it carries two catalytic histidine residues, his and his , that correspond to indispensable his and his in sars-cov nsp . rnase a cleaves its substrates in a two-step reaction consisting of transphosphorylation that generates ', '-cyclic phosphodiester followed by hydrolysis leading to '-phosphoryl terminus. the latter step has only been reported for sars-cov nsp . here, we have expanded nsp research to explore endoribonuclease sequence specificity, metal ion dependence and catalytic mechanism. in addition to biochemical data, we report four new structures of the enzyme in complex with 'ump, 'ump, 'gpu and tipiracil -a uracil derivative. this compound is approved by fda as a combination drug that is used with trifluridine in the treatment of colorectal cancer . tipiracil inhibits the enzyme thymidine phosphorylase which metabolizes trifluridine. we discovered that tipiracil inhibits nsp endoribonuclease activity, albeit not adequately to be an effective cure, but this scaffold can serve as a template for compounds that may block virus proliferation. 
we tested nsp endoribonuclease activity using several assays and different substrates. we show that sars-cov- nsp requires mn + ions and retains only little activity in the presence of mg + ions. the enzyme cleaves efficiently eicosamer 'gaacu¯cau¯ggaccu¯u¯ggcag ' at all four uridine sites (fig. ) , as well as synthetic endou substrate ( ′- -fam-daru¯dada - -tamra- ′ ) in the presence of mn + and the reaction rate increases with metal ion concentration. the cleavage of the eicosamer seems to show no sequence preference as 'cu¯c, 'au¯g, 'cu¯u and 'uu¯g are recognized and cut, especially at higher manganese concentration ( fig. . sars-cov- nsp might represent the other extreme end of the range. differences in the requirement for mn + ions may affect the outcomes of the endoribonuclease activity. for example, metal-independent nsp from eav and some nsp s generate oligonucleotides with ', '-cyclic phosphodiesters at their ' ends . in contrast, metal-dependent sars-cov nsp seems to be able to hydrolyze ', '-cyclic phosphodiesters and produce oligonucleotides with '-phosphoryl groups on their '-ends . by analogy, the sars-cov- homolog should follow a two-step reaction, but this hypothesis is yet to be tested as our assays do not discriminate between the two. the requirement for mn + ions by sars-cov- nsp may by itself enhance pathogen infectivity because mn + is required for the host innate immune responses to viral infections. rna binding of sars-cov nsp was reported to be enhanced in the presence of mn + , whereas rna binding by the cellular xendou homolog from frog was not affected by mn + ions , . the role of metal ion is puzzling as it was proposed that nsp may have similar activity to eukaryotic rnase a. however, the rnase a activity is metal independent, perhaps suggesting that metal ions may help to position rna for binding or cleavage. 
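the uridine-specific cleavage pattern described above can be made concrete with a short python sketch. it lists the uridine positions in an rna and the fragments expected from an idealized complete digest in which the chain is cut on the 3' side of every uridine. real reactions are partial and depend on mn + concentration, so this is an illustration of the site logic only; the function names are hypothetical, and the eicosamer is written without the cleavage-site marks.

```python
def uridine_sites(rna):
    """1-based positions of every U in a 5'->3' RNA string."""
    return [i + 1 for i, base in enumerate(rna.upper()) if base == "U"]


def digest_after_u(rna):
    """Fragments produced by an idealized complete cut 3' of each uridine."""
    fragments, current = [], ""
    for base in rna.upper():
        current += base
        if base == "U":
            fragments.append(current)  # fragment ends with the recognized U
            current = ""
    if current:
        fragments.append(current)      # 3'-terminal piece without a U at its end
    return fragments


EICOSAMER = "GAACUCAUGGACCUUGGCAG"   # 20-mer substrate from the text
HEPTAMER_PCP = "AGGAAGU" + "C"       # heptamer extended by the cytidine from pCp

eicosamer_sites = uridine_sites(EICOSAMER)
eicosamer_fragments = digest_after_u(EICOSAMER)
heptamer_fragments = digest_after_u(HEPTAMER_PCP)
```

for the eicosamer this gives the four uridine sites and five fragments; for the pcp-extended heptamer the single cut separates the heptamer from the added cytidine, matching the products described for the inhibition assay.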
no metal ions or strong metal-binding sites were found in any of the models of endou family members available in the protein data bank (pdb). our efforts to locate mn + ion in the nsp structures also failed, arguing against a direct role of metal ions in catalysis. however, we observe an increase of nsp thermal stability in the presence of such ions ( supplementary fig. ), though the molecular basis of this behavior remains enigmatic. in previous studies, mn + ions were found to change the intrinsic tryptophan fluorescence of sars-cov nsp , indicating conformational changes , . since tipiracil is a uracil derivative and nsp is specific for u, we speculated that the compound may inhibit the enzyme. this hypothesis was tested using a heptamer rna with a single uridine site, aggaagu, which under experimental conditions remains single-stranded and does not form duplexes. when labeled with cytidine ', '-[ '- p] bisphosphate (pcp), '- p-labeled derivatives of the heptamer and unlabeled 'cmp are produced. the transfer of the p from the pcp to the ' terminus of the heptamer is consistent with a transphosphorylation mechanism. in the presence of mm mncl , only partial cleavage is observed. this reaction is decreased by % in the presence of . mm tipiracil (fig. ). at the higher mn + ion concentration, the cleavage reaction proceeds to completion but no inhibition by tipiracil can be measured (data not shown). we also performed an s antigen elisa assay in a cells and a cov- virus replication assay using qrt-pcr. tipiracil did not affect cell viability, but inhibition of the virus was modest in the concentration range - mm (fig. ). these preliminary data suggest that the affinity of the compound may need to be improved in order to serve as an antiviral drug. structural information is essential for this process (see below).
sars-cov- nsp protein was crystallized with 'ump, 'ump, 'gpu and tipiracil using methods described previously and the structures were determined at . Å, . Å, . Å and . Å, respectively. crystals with ligands diffract to higher resolution than the apoprotein ( . Å). of note, we were unable to obtain co-crystals of nsp with 'amp, ', '-cyclic amp, 'gmp, 'tmp and 'cmp using this same set of conditions as tested for 'ump. all structures were solved by molecular replacement using nsp (pdb id: wvv) and refined as described in supplementary materials and methods and supplementary table . the majority of the un-cleaved his-tag is not ordered and not visible but the electron density is excellent for all residues from m to q and for all bound ligands (fig. ) . in all four structures, ligands bind to the c-terminal catalytic domain active site (fig. ) , though the exact positions vary, as described below. in 'gpu and tipiracil complexes, there is also a phosphate ion bound in the catalytic pocket. the compound binding is facilitated by side chains of seven conserved residues his , his , lys , trp , thr , tyr , and ser and main chain of gly , lys , and val , as well as water molecules (fig. ) . one non-conserved residue, gln , is also involved through a water-mediated interaction. the interactions do not trigger any major protein conformational changes either globally or locally. in fact, the catalytic residues (his , his ) and other active site residues discussed below show very similar conformations in all complexes (rmsd . Å over ca atoms for residues his , gly , his , lys , trp , thr , tyr and ser in the pairwise superposition of complexes with the apo-structure (the highest is . Å for b chain of the tipiracil complex and the lowest is . Å for the chain b of the 'ump complex)). position of phosphate ion or '-phosphoryl group of 'ump is also preserved. specific interactions of ligands with protein are described below. 
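the rmsd values quoted above compare matched ca atoms after pairwise superposition. the following is a minimal python sketch of that final step, assuming the two coordinate sets have already been superposed and are listed in the same residue order; the superposition itself is not shown, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of
    (x, y, z) coordinates, in the same units as the input (e.g. angstroms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented CA positions for two active-site residues in an apo and a bound model.
apo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bound = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

shifted = rmsd(apo, bound)     # every atom displaced by one unit along z
identical = rmsd(apo, apo)
```

small values over the active-site residues, as reported for all four complexes, indicate that ligand binding leaves the catalytic pocket essentially unchanged.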
the model of uracil binding by endou was proposed based on the endou and rnase a structures , speculating that ser might be responsible for base recognition. here we describe experimental details of such sequence discrimination. the base of 'ump forms van der waals contacts with tyr and several hydrogen bonds with active site residues, including side chain og and main chain nitrogen atom of ser (fig. a ). these interactions define o and n specificity. o interacts with main chain nitrogen atom of leu defining the uracil specificity. potentially, cytosine and thymine pyrimidines may also bind -an amino group in position of c should be compatible with the recognition pattern and a methyl group in position should not interfere with binding either as uracil position is solvent accessible. in fact, recognition of c has been reported for distantly related, though with similar active site, bacterial endou anticodon trnase . the ribose ring makes several hydrogen bonds with protein residues: (i) 'oh interacts with lys (nz) and with ne of the catalytic his , (ii) 'oh makes water mediated hydrogen bonds with catalytic his (ne ), his (ne ), thr (og ) and gly (main chain nitrogen atom), and (iii) o ' is hydrogenbonded to main chain of val via a water molecule. interestingly, the '-phosphoryl group projects into solvent with no interaction with protein atoms. its only ordered interaction is with 'oh through a water molecule. this '-phosphoryl group location overlaps with that of nsp / 'gpu complex (see below). also worth mentioning is that the ribose in the nsp / 'ump together with the phosphate ion in the nsp /tipiracil mimic ' '-cyclic phosphodiester. the structure with 'ump shows how the enzyme discriminates between uracil and purine bases with ser serving as the key discriminatory residue, as has been hypothesized before , , , . 
in rna sequence containing uracil, such as 'npgpupn ', where n corresponds to any base, the nsp cleavage would produce 'npgpu 'p, if transphosphorylation is followed by hydrolysis of 'npgpu ' 'p. in the crystal structure of nsp / 'gpu, the dinucleoside monophosphate binds to the active site with uracil interacting with tyr and ser (fig. b ), as seen in the nsp / 'ump complex. however, the distance between o and main chain nitrogen of leu is too long to make a good hydrogen bond. this implicates some flexibility at the protein c-terminus and suggests that the amino group in position of cytosine can be accommodated, as we suggested above and reported previously . the guanine ring is stacking against trp and makes two hydrogen bonds with water molecules. the absence of defined base-side chain interactions suggests lack of specificity for this site in the substrate sequence. the nsp / 'gpu complex binds also a phosphate ion in the active site, most likely from the crystallization buffer. the ion interacts with the protein side chains (his , his , thr , and lys ) and uridine ribose 'oh and 'oh groups. it most likely mimics the binding of scissile phosphoryl group of the substrate. the backbone phosphoryl group ( ' of u) faces solvent as in the nsp / 'ump complex and makes a hydrogen bond to a water molecule. this structure and the nsp / 'ump complex illustrate location and specificity determinants of the uridine with a '-phosphoryl group. the binding of guanine in the structure identifies strong base binding site at trp , which, however, may not necessarily be dedicated to 'end of the oligonucleotide (see discussion below). we co-crystallized nsp with 'ump, assuming that the enzyme would dock it in a manner expected for uridine monophosphate nucleotide in the contexts of larger rna substrate, preserving the uracil specific interactions described above. surprisingly, the uracil base is anchored by trp in the guanine site observed in the nsp / 'gpu complex (fig. 
c), confirming that this site can accommodate purine and pyrimidine bases. the '-phosphoryl group occupies the phosphate ion site created by his , his and thr , as observed in nsp / 'gpu. it is surprising that uracil does not go into its dedicated site, but the result demonstrates higher affinity for the base in the trp site than in the uracil-recognition site, potentially governed by the strong stacking interactions with the aromatic side chain that take precedence over the hydrogen bonds observed in the uracil binding of 'ump. the identity of the trp -interacting base is irrelevant, especially given that the enzyme's substrate is most likely a larger rna molecule. the tipiracil molecule binds to the uracil site as observed in 'ump and 'gpu (fig. d and ). the molecule makes several substrate analog-like interactions. the uracil ring stacks against tyr and makes hydrogen bonds with ser (interacting with o and n ), lys (o ) and his . the n atom makes a hydrogen bond with a phosphate ion and through it connects to lys . there are also two water-mediated interactions with ser and the main chain carbonyl oxygen atom of val . the iminopyrrolidine nitrogen atom binds to gln , representing the only interaction unique to the ligand. nsp binds tipiracil in its active site in a manner compatible with competitive inhibition, and the compound and its derivatives may serve as inhibitors of the enzyme. this structure suggests that uracil alone may have similar inhibitory properties and provides a basis for uracil scaffold-based drug development. the differential scanning fluorimetry (dsf) experiments showed that the melting temperature (tm) of nsp in the presence of tipiracil, 'gpu, 'ump, 'ump and 'tmp is approximately . °c under the investigated conditions, in a buffer that contains mm mncl (supplementary fig. a, b). the depicted local minima of the first derivative of the nsp fluorescence signals have the same tm values as the control.
however, the denaturation profile of nsp in the presence of tipiracil is broader and consistently shifted ( . °c) to higher temperatures (supplementary fig. a). the dsf results therefore indicate a small increase in the stability of the nsp /tipiracil complex in comparison to the control sample and to the other tested nsp complexes with 'gpu, 'ump, 'ump and tmp. an additional change in the nsp tm is observed at °c (supplementary fig. b). this is caused by all ligands and may be related to increased stability of the hexamer or of the endou catalytic domain. in the presence of mn + ions the main tm of nsp is increased from . °c to . °c and is mn + concentration dependent. interestingly, at mm and mm concentrations of mn + a new local tm minimum is observed at °c, potentially suggesting increased stability of the endou domain (supplementary fig. c). our structures of complexes with nucleotides can inform the catalytic mechanism of the nsp endoribonuclease. we compared our structures with eukaryotic rnase a, a very well-studied model system . rnase a recognizes pyrimidine nucleotides in rna, preferring c over u, and catalyzes a two-step reaction: the transphosphorylation of rna to form a ', '-cyclic cmp intermediate, followed by its hydrolysis to 'cmp. nsp also recognizes pyrimidines, preferring u over c, and was proposed to catalyze an analogous reaction . in rnase a the base selectivity is achieved by thr , which forms specific hydrogen bonds similar to those created by ser in the nsp / 'ump and nsp / 'gpu complexes. in rnase a the transphosphorylation reaction proceeds via an asynchronous concerted general acid/base mechanism involving his , his and lys . in this mechanism the 'oh proton is transferred to the deprotonated form of his to activate the 'o nucleophile. then, the protonated his donates a proton to the departing 'oh group. the function of lys is to stabilize the negative charge that accumulates on the nonbridging phosphoryl oxygen atoms in the transition state.
in the hydrolysis step the roles of the two histidine residues are inverted. the rnase a active site is well organized and has several distinct pockets for binding rna substrates (e.g. bases b , b , b and phosphoryl groups p , p and p ) (for review see , with dna (fig. ) . therefore, his is very likely to directly activate 'oh. lys seems to play the role of lys in rnase a. the main chain amide of gly may provide the function of gln in binding to the substrate phosphoryl group. if his is the base in nsp , then his must be the proton donor for the departing 'oh group and the equivalent of his of rnase a. however, these residues are ~ Å apart and approach the phosphoryl group from different directions (fig. a). in rnase a, his forms a hydrogen bond with asp , which may provide a proton for the reaction. the structural environment of his is different in nsp . thr forms a hydrogen bond with his , and there is also asp further away that makes a water-mediated hydrogen bond with his . in the hydrolysis step, which converts the ' '-cyclic phosphate back to the 'oh and '-phosphoryl group, the roles of the histidines are reversed: his must now be the base deprotonating a water molecule, and his serves as the proton donor for the 'oh leaving group. a different set of interactions involving the nsp and rnase a catalytic residues may explain why nsp activity is more sensitive to low ph (data not shown), and it is expected that the kinetics of the reactions will be different. besides the similarities in p , the two enzymes share the organization of the b pocket. here, nsp has a very well-defined uracil recognition site made of ser , tyr and leu that are equivalent to thr , phe and ser in rnase a. further extrapolation from the rnase a model of deoxyoligonucleotide binding allows us to hypothesize that the b site, dedicated to the base on the 'end of the scissile bond, is created in nsp by trp . its side chain provides a stacking option for a base, with no base selectivity function.
yet in our nsp / 'gpu structure this site is occupied by the g base located on the 'end of u. we speculate, that for oligonucleotides that are flanked on both sides of u, mimicking the rnase a ligand, ' guanine (or a different base) may adopt position of 'da that locates to the b site owing to the rotation of the p- 'o bond. then, the available trp base binding site can accept base on the ' end of the uridine moiety, somewhat resembling our nsp / 'ump structure. when we combine oligonucleotides from our structures with rnase a/dna complex a plausible model can be built of 'gpup¯u nucleotide bound to the nsp active site (fig. b ). this model underscores importance of conserved trp in anchoring rna in the active site. while ser is the key residue in discriminating base, the hydrophobic interaction with trp may be a significant force for ligand docking. the b site is not easily identifiable in the available nsp structures. in addition, unlike in rnase a, where all p -p subpockets contribute to the backbone binding, in nsp only p site is currently well defined. '-phosphoryl groups in position p of the ligands do not form any direct contacts with the protein, while rnase a p site has lys participating in rna binding. p site of rnase a is created by lys and it appears that his may fulfill such role in nsp . the active site of nsp can accommodate all four ligands, three nucleotides ( 'ump, 'ump, 'gpu) and one synthetic analog (tipiracil) and protein interactions with these molecules seem to stabilize the active site residues and entire catalytic endou domain resulting in a better ordered protein. previous mutational studies demonstrated that two histidine (his and his ) and one serine (ser ) residues are essential for sars-cov nsp activity. we now, for the first time, illustrated how ser participates in uracil (or pyrimidine) recognition and we concur with previous suggestions that his may be key to deprotonate the 'oh to allow nucleophilic attack on the phosphoryl group. 
comparisons of the nsp and rnase a active sites show some similarities (conservation of active site residues) and indicate a common catalytic mechanism, but the organization of the rna binding site is distinct, especially at sites more distant from the elements crucial for chemistry. our structures are consistent with the binding of single-stranded nucleic acid, such as loops or bulges, as was shown for nsp s and other endous. both the uracil and the preceding base must be in an unpaired region in order to bind to nsp , which is consistent with the degradation of polyu tracts reported recently . our structures show that the preceding base can be guanine or uracil, or other bases as well (see below). accommodation of 'gpu and 'gpupu is in agreement with previous reports of a guanine preference in the anterior position with respect to u and of efficient degradation of polyu tracts. however, at high mn + concentrations sars-cov- nsp can cleave rna at uridine sites connected to any base in the anterior position; for example, the synthetic endou substrate used for the nuclease assay has "a". nsp binds nucleotides to the catalytic domain of each monomer independently; it is therefore not clear why the hexamer is required for the endou activity of the enzyme. we show that nsp can bind and hydrolyze , and nucleotide long rna. it is possible that the hexamer is needed to bind longer rna substrates or is involved in interactions with other proteins (for example nsp and nsp ) within the rtc . the role of the metal ion requirement remains a puzzle. although mn + dependence has been reported for some endou members and appears to be a common feature of the nendou subfamily, the metal-binding site has never been located. past studies indicated that single-stranded polyu rna is relatively unstructured under most conditions. it was shown that nsp cleavage is rna sequence and structure dependent. it is possible that the metal is required for maintaining the conformation of the rna substrate during catalysis.
for example, metal ions like mg + , mn + and zn + form complexes with purine nucleotides to affect the outcome of many enzymatic reactions . binding of mn + ions to rna molecules may dramatically transform their structure, as was shown for a riboswitch and the bacillus subtilis m-box aptamer, whose sequence contains a u tract (pdb id: pdr, ). as manganese can exist in several oxidation states, it is also possible that metal redox properties can affect protein and rna interactions and chemistry. tipiracil, a uracil derivative, binds to the nsp uracil site in a manner consistent with competitive inhibition. in vitro it inhibits nsp rna nuclease activity and shows modest inhibition of cov- virus replication in the whole cell assay. while the compound itself is not optimal for therapeutic applications, our work shows that uracil and its derivatives may represent a plausible starting point for nucleotide-like drug development. moreover, the interaction of trp with bases may provide an additional site for building much higher affinity inhibitors. [ '- p]pcp was prepared by phosphorylation of '-cmp as described by england et al. ( ) . rna eicosamers were synthesized by runoff transcription of synthetic double-stranded dna templates as described by sherlin et al. . '- p-labeled rna eicosamers and '- p-labeled rna heptamers were prepared according to zwieb et al. and england et al. , respectively. protein was expressed and purified as described previously. briefly, a l culture of lb lennox higher occupancies of the ligands in the structures, each ligand compound was added to the cryosolution and the co-crystal was soaked for - min before being frozen in liquid nitrogen. typical reaction contained x cpm of '- p-labeled rna eicosamers or '- p-labeled rna the x-ray diffraction experiments were carried out at the structural biology center -id beamline at the advanced photon source, argonne national laboratory.
the diffraction images were recorded at k from all crystal forms on the pilatus x m detector using . ° rotation and . sec exposure for °, °, and ° for nsp / 'ump, nsp / 'gpu and nsp /tipiracil, respectively. the data were integrated and scaled with the hkl suite . intensities were converted to structure factor amplitudes in the ctruncate program , from the ccp package and using the apo-form sars-cov- nsp structure (pdb id: vww) as a search model, the structures were determined using molrep , all implemented in the hkl software package. the initial solutions were refined, both rigid-body refinement and regular restrained refinement by refmac , as a part of hkl . the models including the ligands were manually adjusted using coot and then iteratively refined using coot and phenix . throughout the refinement, the same % the architecture of sars-cov- transcriptome nidovirus ribonucleases: structures and functions in viral replication coronavirus nonstructural protein mediates evasion of dsrna sensors and limits apoptosis in macrophages porcine deltacoronavirus nsp antagonizes interferon-beta production independently of its endoribonuclease activity coronavirus endoribonuclease targets viral polyuridine sequences to evade activating host sensors rna recognition and cleavage by the sars coronavirus endoribonuclease functional characterization of xendou, the endoribonuclease involved in small nucleolar rna biosynthesis crystal structure of nsp endoribonuclease nendou from sars-cov- . 
protein science : a publication of the biochemical characterization of arterivirus nonstructural protein reveals the nidovirus-wide conservation of a replicative endoribonuclease trifluridine/tipiracil (lonsurf) for the treatment of metastatic colorectal cancer turkey coronavirus non-structure protein nsp --an endoribonuclease the severe acute respiratory syndrome coronavirus nsp protein is an endoribonuclease that prefers manganese as a cofactor mutational analysis of the sars virus nsp endoribonuclease: identification of residues affecting hexamer formation functional plasticity of antibacterial endou toxins structural and functional analyses of the severe acute respiratory syndrome coronavirus endoribonuclease nsp major genetic marker of nidoviruses encodes a replicative endoribonuclease bovine pancreatic ribonuclease: fifty years of the first enzymatic reaction mechanism integration of kinetic isotope effect analyses to elucidate ribonuclease mechanism bovine pancreatic ribonuclease a as a model of an enzyme with multiple substrate binding sites crystal structure of ribonuclease a.d(aptpapapg) complex. 
direct evidence for extended substrate recognition situ tagged nsp reveals interactions with coronavirus stabilities and isomeric equilibria in aqueous solution of monomeric metal ion complexes of adenosine '-diphosphate (adp -) in comparison with genetic analysis of riboswitch-mediated transcriptional regulation responding to mn + in salmonella insights into metalloregulation by m-box riboswitch rnas via structural analysis of manganese-bound complexes linking crystallographic model and data quality molprobity: structure validation and all-atom contact analysis for nucleic acids and their complexes chemical and enzymatic synthesis of trnas for high-throughput crystallization three-dimensional folding of the trna-like domain of escherichia coli tmrna specific labeling of ' termini of rna with t rna ligase hkl- : the integration of data reduction and structure solution--from diffraction images to an initial model in minutes treatment of negative intensity observations a statistic for local intensity differences: robustness to anisotropy and pseudo-centering and utility for detecting twinning overview of the ccp suite and current developments molecular replacement with molrep refinement of macromolecular structures by the maximum-likelihood method coot: model-building tools for molecular graphics phenix: a comprehensive python-based system for macromolecular structure solution procheck: a program to check the stereochemical quality of protein structures cell viability cell viability was measured by staining with celltracker™ red cmtpx (thermofisher scientific). cov- infected cell was stained with µm of celltracker™ red cmtpx for min and then detected by tecan infinite m (tecan) at ex /em nm. after reading, cells were fixed by % neutral buffered formalin (nbf) for immunohistochemistry assay in pbs), and then blocked for min with pbs containing % bsa at room temperature. 
after blocking, endogenous peroxidases were quenched by % hydrogen peroxide for min : ) in pbs containing % bsa overnight at °c. primary antibody was washed with pbs and pbs-t and then cells were incubated in secondary antibody (immpress horse anti-mouse igg polymer reagent, peroxidase; vector laboratories) for min at room temperature. after washing with pbs for min thermofisher scientific) for min and detected by tecan infinite m (tecan) at nm after replacing the dab solution with pbs. the result was normalized by cell viability. rna extraction and qrt-pcr. total rna from sars-cov- infected cells was isolated using a nucleospin rna kit following the manufacturer's instructions (macherey-nagel). sars-cov- rna was quantified by qrt-pcr using superscript™ iii platinum™ one-step qrt-pcr kit w/rox (thermofisher scientific) and normalized using eukaryotic s we sincerely thank the members of the sbc at argonne national laboratory, especially darren sherrell and alex lavens, for their help with setting up the beamline and with data collection at beamline -id. of reflections were excluded from the refinement throughout (in both refmac and phenix refinement). the final structures converged to rwork = . and rfree = . for nsp / 'ump, rwork = . and rfree = . for nsp / 'gpu and rwork = . and rfree = . for nsp /tipiracil, with regard to the quality of each data set. the stereochemistry of the structures was checked with procheck and the ramachandran plot and validated with the pdb validation server. the data collection and processing statistics are given in supplemental table . supplementary figure . denaturation curves of nsp in the presence of tipiracil, 'gpu, 'ump, 'ump, tmp (a, b). thermal stability of nsp after addition of manganese ions (c). nsp samples were labeled with sypro orange dye .
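the dsf readout described in the supplementary figure legend above takes tm from the local minimum of the first derivative of the fluorescence signal. a minimal python sketch of that readout follows. it is an illustration under stated assumptions, not the analysis used in the study: the boltzmann-sigmoid melt curve, the temperature grid, the example tm values of 60 and 61 °c and the function names are all hypothetical, and the sign convention (minimum of -dF/dT) is an assumption about how the curves were plotted.

```python
import numpy as np


def melting_temperature(temps, fluorescence):
    """Estimate Tm as the temperature where -dF/dT reaches its minimum.

    Assumes the plotted derivative is the negative first derivative of the
    fluorescence signal, so the unfolding transition appears as a minimum.
    """
    temps = np.asarray(temps, dtype=float)
    fluorescence = np.asarray(fluorescence, dtype=float)
    neg_derivative = -np.gradient(fluorescence, temps)
    return float(temps[np.argmin(neg_derivative)])


def synthetic_melt_curve(temps, tm, slope=1.5):
    """Hypothetical two-state (Boltzmann sigmoid) unfolding curve."""
    return 1.0 / (1.0 + np.exp((tm - temps) / slope))


# Hypothetical example data: a control and a ligand-bound sample.
temps = np.arange(25.0, 95.0, 0.5)
control = synthetic_melt_curve(temps, tm=60.0)
with_ligand = synthetic_melt_curve(temps, tm=61.0)

tm_control = melting_temperature(temps, control)
tm_ligand = melting_temperature(temps, with_ligand)
tm_shift = tm_ligand - tm_control  # a positive shift suggests stabilization
```

with real data the curve would usually be smoothed before differentiation, and secondary minima (such as the additional transition reported above) would need a peak-finding step rather than a single global minimum.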
key: cord- - bbyu qx authors: matsuyama, shutoku; kawase, miyuki; nao, naganori; shirato, kazuya; ujike, makoto; kamitani, wataru; shimojima, masayuki; fukushi, shuetsu title: the inhaled steroid ciclesonide blocks sars-cov- rna replication by targeting viral replication-transcription complex in culture cells date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bbyu qx we screened steroid compounds to obtain a drug expected to block host inflammatory responses and mers-cov replication. ciclesonide, an inhaled corticosteroid, suppressed replication of mers-cov and other coronaviruses, including sars-cov- , the cause of covid- , in cultured cells. the effective concentration (ec ) of ciclesonide for sars-cov- in differentiated human bronchial tracheal epithelial cells was . μm. ciclesonide inhibited formation of double membrane vesicles, which anchor the viral replication-transcription complex in cells. eight consecutive passages of sars-cov- isolates in the presence of ciclesonide generated resistant mutants harboring single amino acid substitutions in non-structural protein (nsp ) or nsp . of note, ciclesonide still suppressed replication of all these mutants by % or more, suggesting that these mutants cannot completely overcome ciclesonide blockade. these observations indicate that the suppressive effect of ciclesonide on viral replication is specific to coronaviruses, highlighting it as a candidate drug for the treatment of covid- patients. importance the outbreak of sars-cov- , the cause of covid- , is ongoing. effective antiviral agents to combat the disease are urgently needed. in the present study, we found that an inhaled corticosteroid, ciclesonide, suppresses replication of coronaviruses, including beta-coronaviruses (mhv- , mers-cov, sars-cov, and sars-cov- ) and an alpha-coronavirus (hcov- e), in cultured cells. inhaled ciclesonide is safe; indeed, it can be administered to infants at high concentrations.
thus, ciclesonide is expected to be a broad-spectrum antiviral drug that is effective against many members of the coronavirus family. it could be prescribed for the treatment of mers and covid- . in the preprint of this study (posted on biorxiv) ( ), we showed that a corticosteroid, ciclesonide, is a potent blocker of sars-cov- replication. based on the data in our preprint study, clinical trials of a retrospective cohort study to treat covid- patients were started in a japanese hospital in march . the treatment regime involves inhalation of μg ciclesonide (two or three times per day) for the steroid compounds chosen from the prestwick chemical library were examined to assess their inhibitory effects on mers-cov-induced cytopathic effects. vero cells treated with steroid compounds were infected with mers-cov at an moi = . and then incubated for days. four steroid compounds, ciclesonide, mometasone furoate, mifepristone, and algestone acetophenide, conferred a > % cell survival rate (fig. ). interestingly, a structural feature of these compounds is a five- or six-membered monocycle attached to the steroid core. (table ). we examined replication of these mutants in the presence of ciclesonide. first, one of these isolates was tested in veroe /tmprss cells. at hpi, the amount of viral rna derived from the parental virus fell by -fold in the presence of ciclesonide; by contrast, the amount of rna derived from the escape mutant increased -fold compared with that of the parent virus (fig. s ). there was no difference between the parental virus and the escape mutant in the presence of other suppressed replication of all escape mutants by % or more, suggesting that these mutants cannot completely overcome ciclesonide blockade. mutations in the ciclesonide escape mutants were identified at three positions in nsp and at one position in nsp (fig. b).
of note, the amino acid substitution n k in nsp was caused by a different base change (t g and t a) ( table ) average of cell viability in the absence of virus was quantified using a wst assay (n= , in panel b and d). a novel coronavirus from patients with pneumonia in china drug treatment options for the -new coronavirus ( -ncov) remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( - ncov) in vitro the inhaled corticosteroid ciclesonide blocks coronavirus rna replication by targeting viral therapeutic potential of ciclesonide inahalation for covid- pneumonia: report of three cases a case of key: cord- -m u ryy authors: boudewijns, robbert; thibaut, hendrik jan; kaptein, suzanne j. f.; li, rong; vergote, valentijn; seldeslachts, laura; de keyzer, carolien; bervoets, lindsey; sharma, sapna; van weyenbergh, johan; liesenborghs, laurens; ma, ji; jansen, sander; van looveren, dominique; vercruysse, thomas; jochmans, dirk; wang, xinyu; martens, erik; roose, kenny; de vlieger, dorien; schepens, bert; van buyten, tina; jacobs, sofie; liu, yanan; martí-carreras, joan; vanmechelen, bert; wawina-bokalanga, tony; delang, leen; rocha-pereira, joana; coelmont, lotte; chiu, winston; leyssen, pieter; heylen, elisabeth; schols, dominique; wang, lanjiao; close, lila; matthijnssens, jelle; van ranst, marc; compernolle, veerle; schramm, georg; van laere, koen; saelens, xavier; callewaert, nico; opdenakker, ghislain; maes, piet; weynand, birgit; cawthorne, christopher; velde, greetje vande; wang, zhongde; neyts, johan; dallmeier, kai title: stat signaling as double-edged sword restricting viral dissemination but driving severe pneumonia in sars-cov- infected hamsters date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: m u ryy since the emergence of sars-cov- causing covid- , the world is being shaken to its core with numerous hospitalizations and hundreds of thousands of deaths. 
in search for key targets of effective therapeutics, robust animal models mimicking covid- in humans are urgently needed. here, we show that productive sars-cov- infection in the lungs of mice is limited and restricted by early type i interferon responses. in contrast, we show that syrian hamsters are highly permissive to sars- cov- and develop bronchopneumonia and a strong inflammatory response in the lungs with neutrophil infiltration and edema. moreover, we identify an exuberant innate immune response as a key player in pathogenesis, in which stat signaling plays a dual role, driving severe lung injury on the one hand, yet restricting systemic virus dissemination on the other. finally, we assess sars-cov- -induced lung pathology in hamsters by micro-ct alike used in clinical practice. our results reveal the importance of stat -dependent interferon responses in the pathogenesis and virus control during sars-cov- infection and may help rationalizing new strategies for the treatment of covid- patients. sars-cov- belongs to the family of coronaviridae, which contains a large group of viruses that are constantly circulating in animals and humans. illness in humans caused by coronaviruses is mostly mild and manifested by respiratory or digestive problems as leading symptoms . however, some coronaviruses, such as sars-cov- , mers-cov and the recent sars-cov- , have been responsible for serious outbreaks of severe and lethal respiratory disease , . unlike the previous outbreaks with sars-cov- and mers-cov, the current sars-cov- outbreak has evolved to the largest global health threat to humanity in this century. the unprecedented scale and rapidity of the current pandemic urges the development of efficient vaccines, antiviral and anti-inflammatory drugs. 
a key step in expediting this process is to have animal models that recapitulate and allow to understand viral pathogenesis, and that can in particular be used to identify new drug targets and preclinically assess preventive and therapeutic countermeasures. acute respiratory disease caused by sars-cov- and mers infections is characterized by a dysregulated inflammatory response in which a delayed type i interferon (ifn) response promotes the accumulation of inflammatory monocyte-macrophages [ ] [ ] [ ] . the severe lung disease in covid- patients seems to result from a similar overshooting inflammatory response . however, because even non-human primates do not fully replicate covid- , little information and no appropriate animal models are currently available to address this hypothesis . to address this knowledge gap, we compared the effect of sars-cov- infection in wild-type (wt) mice of different lineages (balb/c and c bl/ ) and syrian hamsters, as well as a panel of matched transgenic mouse and hamster strains with a knockout (ko) of key components of adaptive and innate immunity. we used an original patient isolate of sars-cov- (betacov/belgium/ghb- / ) that was passaged on huh and vero e cells for these studies ( fig. s and fig. s a ). for full characterization and to exclude possible contaminants, we performed deep sequencing on the inoculum that was used to infect the animals (fig. s a) . no adventitious agents could be detected (data not shown). however, two in-frame deletions in the n-terminal domain and the furin-cleavage site of spike (s) glycoprotein ( aa and aa, respectively) had occurred between cell culture passage p (mixed population of % wt genomes and % ( + aa del) mutant genomes) and p ( % ( + aa del) mutant genomes) [ ] [ ] [ ] , likely as adaptation to growth in vero e cells in vitro (fig. s b ). 
to first examine whether adaptive immunity contributed to the susceptibility to sars-cov- infection, we inoculated wt (immune-competent) and scid mice (lacking functional t and b cells) from the same balb/c background intranasally with a high × tcid viral dose (p virus) (fig. a ). on day p.i., a viral rna peak in the lungs was observed ( fig. b and fig. s ) with no obvious differences in viral loads (fig. b) nor lung pathology ( fig. d and fig. s a and s b ) between wt and scid mice. these data indicate that mice that lack the human ace receptor , can in principle be infected with sars-cov- , although inefficiently and likely transiently, as also observed for sars-cov- , . however, adaptive immunity did not markedly contribute to this low susceptibility. interferons are the prototypic first-line innate immune defense against viral infections. to evaluate interferons, we compared viral rna levels and lung pathology in wt c bl/ mice, and c bl/ mice with a genetic ablation of their type i (ifnar -/-) and iii interferon (ifn) receptors (il r -/-) (fig. a ). ifnar -/mice showed an enhanced replication of sars-cov- in the lung on day p.i. compared to both wt and il r -/mice (fig. c ). similar to balb/c mice, overall viral loads were low. ifnar -/mice that were treated prior to infection with human convalescent sars-cov- patient serum or plasma that contains spike-specific antibodies (fig. e, fig. s ) had a - -fold reduction in viral loads depending on the patient donor. this provides further evidence for active, although inefficient virus replication in ifnar -/mice. wt and knockout (ifnar -/-, il r -/-) mouse strains, all on c bl/ background, presented consistently with only a mild lung pathology. however, ifnar -/mice showed increased levels of intra-alveolar hemorrhage, sometimes accompanied by some peribronchiolar inflammation ( fig. d and fig. s a and s b ). passive transfer of hcs did not result in an obvious improvement in histopathological scores ( [ ] [ ] [ ] . 
likewise, hcs treatment modulated, at least to some extent, the observed gene expression patterns ( fig. f and cgas (mb d , p= . ) mrna levels. in summary, our data are in line with restriction of sars-cov- infection by the interferon system in mice, and also suggest limited inflammatory responses in the lungs of mice, in contrast to covid- in humans . taken together, mice were considered as a poor model to study covid- pathogenesis, or to assess the efficacy of vaccines and treatments. in contrast, syrian hamsters have been reported to be highly susceptible to sars-cov- and sars-cov- and might thus provide a small animal model to study sars-cov-induced pathogenicity and the involvement of the immune response in aggravating lung disease. in contrast to mice, intranasal inoculation of sars-cov- in wt hamsters resulted in high viral rna loads (fig. b, fig. s ) a proxy used for the quantification of viral loads (see fig. s c ), and in actual infectious titers (fig. c) in the lungs, i.e. roughly log higher than in ifnar -/mice (fig. c) . also, a marked lung pathology [median cumulative score (mcs) out of maximal score of ; iqr= . - . (p virus)] characterized by a multifocal necrotizing bronchiolitis, massive leukocyte infiltration and edema was observed in infected hamsters but not in mice ( fig. d and fig. s a -c). this resembles histopathological findings in humans suffering from severe bronchopneumonia . in order to investigate the roles of type i and iii ifn in the pathogenesis of sars-cov- infection, we compared virus replication levels and lung pathology in wt hamsters and hamsters with ablated signal transducer and activator of transcription (stat -/lacking type i and iii ifn signaling) , and il r expression (il r-a -/lacking ifn type iii signaling) ( fig. a) . of note, these receptor knockouts did not affect ace expression in hamster lungs (fig. 
s a ), while interferon-stimulated genes (isg) such as mx- (strongly induced by ifnα/stat signaling) and ip- (induced by both type i and type ii ifns) showed a differential expression pattern when comparing the different genotypes, triggered by sars-cov- infection (fig. s b) . importantly, lower baseline expression of mx- and ip- and failure to respond to sars-cov- infection by mx- upregulation in stat -/hamsters confirmed the functional knockout. as expected, il r-a -/hamsters showed an intermediate phenotype between that of wt and stat -/concerning their antiviral response. for many respiratory viruses, including sars-cov- , type i and iii interferon signaling has been described to play an important role in restricting infection . no marked differences were observed in viral rna levels in the lung of wt, stat -/or il r-a -/hamsters (fig. b) . however, stat -/hamsters had higher titers of infectious virus in the lung (fig. c ), high titer viremia (measured by rt-qpcr and virus titration) and fig. s c ). matrix metalloprotease (mmp)- levels, which may serve as a sensitive marker for the infiltration and activation of neutrophils in inflamed tissues , , were markedly elevated in the lungs of all infected hamsters (fig. g) . however, higher mmp- levels were found in stat -/animals, thereby inversely correlating with the histological findings (fig. d ). in addition, biomarkers elevated in critically ill covid- patients , , such as the cytokines il- , il- and ifn- were not found to be elevated in the serum of infected hamsters (fig. s b) , although mrna levels of ip- (cxl- ) were upregulated in the lungs of sars-cov- infected hamsters as reported for other cytokine/chemokines downstream of ifn- (fig. s b) . nonetheless, infected stat -/and il r-a -/had clearly increased levels of il- and il- in their lungs (fig. s a ). 
such an inverse correlation between biomarkers and pathology in wt versus stat -/- hamsters is in line with findings in mouse models of sars-cov- infection, in which pathology correlated with the induction and dysregulation of alternatively activated "wound-healing" monocytes/macrophages. to assess the utility of hamsters for testing the effect of therapeutic interventions on sars-cov- replication, wt hamsters were treated with human convalescent plasma or a neutralizing sars-cov- and sars-cov- specific single-domain antibody fc fusion construct (vhh- -fc) prior to infection (fig. h). unlike a single dose of convalescent plasma, which did not significantly reduce viral load in the lungs, pre-treatment with vhh- -fc reduced viral loads in the lung ~ -fold compared to untreated control animals, validating hamsters as a preclinical model for testing anti-sars-cov- therapies. the lack of readily accessible serum markers or the absence of overt disease symptoms in hamsters prompted us to establish a non-invasive means to score lung infection and sars-cov- induced lung disease by computed tomography (ct), as used in standard patient care to aid covid- diagnosis with high sensitivity and to monitor progression/recovery. as in humans, semiquantitative lung pathology scores were obtained from high-resolution chest micro-ct scans of free-breathing animals.
the increase in replication of sars-cov- seen in il r-a -/- hamsters, on the one hand, combined with a tempered inflammatory response and lung injury as compared to wt hamsters, on the other hand, is in line with the role that type iii ifn plays during respiratory virus infections, including sars-cov- . this observation also suggests that in humans pegylated ifn-lambda (or similar modulators of innate immunity) may possibly be considered to protect medical staff and other frontline workers from sars-cov- infection or to dampen symptoms in critically ill patients.
in conclusion, hamsters may be preferred over mice as an infection model for the preclinical assessment of antiviral therapies, of convalescent serum transfer and of approaches that aim at tempering the covid- immune pathogenesis in critically ill patients. the latter may be achieved by repurposing anti-inflammatory drugs such as il- receptor antagonists (e.g. tocilizumab) or small molecule jak/stat inhibitors (e.g. ruxolitinib or tofacitinib). given our finding that stat signaling plays a dual role by also limiting viral dissemination, targeting the virus-induced cytokine response and overshooting macrophage activation may need to be complemented by (directly acting) antivirals.
wild-type syrian hamsters (mesocricetus auratus) were purchased from janvier laboratories. all other mouse (c bl/ , ifnar -/-, il r -/-, balb/c and scid) and hamster (stat -/- and il r-a -/-) strains were bred in-house. six- to eight-week-old female mice and wild-type hamsters were used throughout the study. knock-out hamsters were used upon availability: seven- to twelve-week-old female stat -/- hamsters and five- to seven-week-old il r-a -/- hamsters.
vero e (african green monkey kidney, kind gift from peter bredenbeek, lumc, nl) and huh (human hepatoma, jcrb ) cells were maintained in minimal essential medium (gibco) supplemented with % fetal bovine serum (integro), % bicarbonate (gibco), and % l-glutamine (gibco). for maintenance of calu- cells (human airway epithelium, kind gift from lieve naesens, ku leuven, be), the above medium was supplemented with mm hepes (gibco). all assays involving virus growth were performed using % (vero e and huh ) or . % (calu- ) fetal bovine serum instead of %.
sars-cov- strain betacov/belgium/ghb- / (epi isl | - - ), recovered from a nasopharyngeal swab taken from an rt-qpcr-confirmed asymptomatic patient returning from wuhan, china, at the beginning of february, was directly sequenced on a minion platform (oxford nanopore) as described previously.
antibody vhh- -fc was administered i.p. at a dose of mg/kg day prior to infection. vhh- -fc was expressed in expicho cells (thermofisher scientific) and purified from the culture medium as described. briefly, after transfection with pcdna . -vhh- -fc plasmid dna, followed by incubation at c and % co for - days, the vhh- -fc protein in the cleared cell culture medium was captured on a ml mabselect sure column (ge healthcare), eluted with a mcilvaine buffer ph , neutralized using a saturated na po buffer, and buffer exchanged to storage buffer ( mm l-histidine, mm nacl). the antibody's identity was verified by protein- and peptide-level mass spectrometry.
animals were euthanized at different time-points post-infection, organs were removed and lungs were homogenized manually using a pestle and a -fold excess of cell culture medium (dmem/ %fcs). rna was extracted from homogenate of mg of lung tissue with the rneasy mini kit (qiagen), or from µl of serum using the nucleospin kit (macherey-nagel), according to the manufacturer's instructions. other organs were collected in rnalater (qiagen) and homogenized in a bead mill (precellys) prior to extraction. of µl eluate, µl was used as template in rt-qpcr reactions. rt-qpcr was performed on a lightcycler platform (roche) using the itaq universal probes one-step rt-qpcr kit (biorad) with primers and probes (table s ).
infectious virus were used to express the amount of rna as normalized viral genome equivalent (vge) copies per mg tissue, or as tcid equivalents per ml serum, respectively. the mean of housekeeping gene β-actin was used for normalization. the relative fold change was calculated using the -ΔΔct method.
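the fold-change calculation named above can be sketched as follows. this is a minimal illustration of the standard 2^(-ΔΔct) calculation, not the authors' analysis script; the function names and the example ct values are hypothetical, and the choice to average the control group's Δct before comparison is an assumption based on the figure legends, which report fold changes relative to the mean of the infection control group.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """dCt: target gene Ct minus housekeeping (reference) gene Ct."""
    return ct_target - ct_reference


def fold_changes(samples, controls):
    """Relative fold change of each sample versus the mean control dCt.

    samples, controls: lists of (ct_target, ct_reference) pairs.
    Assumes ~100% amplification efficiency (one doubling per cycle).
    """
    control_dct = mean(delta_ct(t, r) for t, r in controls)
    # ddCt = sample dCt - mean control dCt; fold change = 2 ** (-ddCt)
    return [2.0 ** -(delta_ct(t, r) - control_dct) for t, r in samples]


# hypothetical Ct values, for illustration only
controls = [(25.0, 18.0), (27.0, 18.0)]   # mean dCt = 8
samples = [(20.0, 18.0), (26.0, 18.0)]    # dCt = 2 and 8
print(fold_changes(samples, controls))
```

a sample whose Δct equals the control mean returns a fold change of 1; each cycle of earlier amplification relative to the controls doubles the estimate.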
after extensive transcardial perfusion with pbs, lungs were collected, extensively homogenized using manual disruption (precellys ) in minimal essential medium ( % w/v) and centrifuged ( , rpm, min, °c) to pellet the cell debris. infectious sars-cov- particles were quantified by means of endpoint titrations on confluent vero e cell cultures. viral titers were calculated by the spearman-kärber method and expressed as the % tissue culture infectious dose (tcid ) per mg tissue.
to study differential gene expression, rna was extracted from lung tissues using trizol, subjected to cdna synthesis (high capacity cdna reverse transcription kit, thermo fisher scientific), and analyzed by qpcr using a custom taqman qrt-pcr array (thermo fisher scientific) of genes known to be activated in response to virus infection, as well as two housekeeping genes (table s ).
for histological examination, the lungs were fixed overnight in % formaldehyde and embedded in paraffin. tissue sections ( µm) were stained with hematoxylin and eosin to visualize and score for lung damage.
calu- (human airway epithelial) cells were plated at ×
cytokine levels in lung homogenates and serum of hamsters were determined by elisa for ifn- (eha ), il- (eha ) and il- (eha ) following the manufacturer's instructions (wuhan fine biotech co., ltd). the levels of gelatinase b/metalloproteinase (mmp)- present in lung homogenates were analyzed using gelatin zymography, essentially as described previously. for quantification of zymolytic bands, internal control samples were spiked into each sample. equivalent hamster enzyme concentrations were calculated with the use of known amounts of recombinant human pro-mmp- and recombinant human pro-mmp- Δoghem as standards.
hamsters were anaesthetized using isoflurane (iso-vet) ( - % in oxygen) and placed in prone position in the x-cube micro-ct scanner (molecubes) using a dedicated imaging bed. respiration was monitored throughout.
a scout view was acquired and the lung was selected for a non-gated, helical ct acquisition using the high-resolution ct protocol, with the following parameters: kvp, exposures, ms/projection, µa tube current, rotation time s. data were reconstructed using a regularized statistical (iterative) image reconstruction algorithm using nonnegative least squares , using an isotropic µm voxel size and scaled to hounsfield units (hus) after calibration against a standard air/water phantom. the spatial resolution of the reconstruction was estimated at µm by minimizing the mean squared error between the d reconstruction of the densest rod in a micro-ct multiple density rod phantom (smart scientific) summed in the axial direction and a digital phantom consisting of a d disk of . mm radius that was post-smoothed with gaussian kernels using different full width half maxima (fwhm), after aligning the symmetry axis of the rod to the z-axis. visualization and quantification of reconstructed micro-ct data was performed with dataviewer and ctan software (bruker micro-ct). as primary outcome parameter, a semi-quantitative scoring of micro-ct data was performed as previously described , , with minor modifications towards optimization for covid- lung disease in hamsters. in brief, visual observations were scored (from - depending on severity, both for parenchymal and airway disease) on different, predefined transversal tomographic sections throughout the entire lung image for both lung and airway disease by two independent observers (l.s. and g.v.v.) and averaged. scores for the sections were summed up to obtain a score from to reflecting severity of lung and airway abnormalities compared to scans of healthy, wt control hamsters. 
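the semi-quantitative scoring scheme described above can be sketched as follows. this is a hedged illustration only: the number of sections and the score range are not recoverable from the text, so both are left to the input data, and adding the parenchymal and airway scores within a section before averaging over observers is an assumption. the function name and the example scores are hypothetical.

```python
def cumulative_ct_score(observer_scores):
    """Sum of per-section scores, each averaged over observers.

    observer_scores: one list per observer; each list holds one
    (parenchymal, airway) score pair per predefined tomographic section.
    """
    n_sections = len(observer_scores[0])
    if any(len(obs) != n_sections for obs in observer_scores):
        raise ValueError("every observer must score the same sections")

    total = 0.0
    for section in range(n_sections):
        # combined (parenchymal + airway) score given by each observer
        per_observer = [sum(obs[section]) for obs in observer_scores]
        # average the observers, then accumulate over sections
        total += sum(per_observer) / len(per_observer)
    return total


# hypothetical scores for two observers and two sections
observer_a = [(1, 0), (2, 1)]
observer_b = [(1, 1), (2, 2)]
print(cumulative_ct_score([observer_a, observer_b]))
```

averaging before summing keeps the cumulative score on the same scale as a single observer's score, so it can be compared directly with scans of healthy control animals scored the same way.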
as secondary measures, image-derived biomarkers (nonaerated lung volume, aerated lung volume, total lung volume, the respective densities within these volumes and large airways volume) were quantified as in , for a manually delineated voi in the lung, avoiding the heart and main blood vessels. the threshold used to separate the airways and aerated (grey value - ) from non-aerated lung volume (grey value - ) was set manually on an -bit greyscale histogram and kept constant for all data sets. il r -/-(n= ) mice. at the indicated time intervals p.i., viral rna levels were determined by rt-qpcr, normalized against β-actin mrna levels and transformed to estimate viral genome equivalents (vge) content per weight of the lungs ( figure s ). for heat-inactivation, sars-cov- was incubated for min at °c. dotted line indicates lower limit of quantification (lloq). the data shown are means ± sem. (d) histopathological scoring of lungs for all different mouse strains. mice were sacrificed on day p.i. and lungs were stained with h&e and scored for signs of lung damage (inflammation and hemorrhage). scores are calculated as percentage of the total maximal score. "no score" means not contributing to theoretical full cumulative score of %. numbers (n) of animals analyzed per condition are given in the inner circle. (e) viral rna levels in ifnar -/mice after treatment with anti-sars-cov- serum or plasma. mice were either left untreated (ic, infection control), or treated intraperitoneally one day before infection with convalescent serum (patient # ), convalescent plasma (patient # ) or with negative control plasma (patient # nc, negative control) and sacrificed on day p.i. viral rna levels were determined in the lungs, normalized against βactin and fold-changes were calculated using the (-ΔΔcq) method compared to mean of ic. the data shown are means ± sem. 
(f) heatmap showing gene expression profiles of selected marker genes in the lungs of uninfected and infected ifnar -/mice that were either left untreated or treated with convalescent serum from patient # (n= per group). analysis performed on day p.i. the scale represents fold change compared to non-infected animals. statistical significance between groups was calculated by the nonparametric two-tailed mann-whitney u-test (ns = not significant, p > . , * p < . , ** p < . , *** p < . ). wt, stat -/and il r-a -/hamster strains were intranasally inoculated with × tcid of passage or × of passage sars-cov- . outcomes derived from inoculation with passage or passage sars-cov- is designated by circles (p ) or squares (p ). on the indicated days post inoculation (d.p.i.), organs and blood were collected to determine viral rna levels, infectious viral load and score for lung damage. viral loads in the indicated organs were quantified by rt-qpcr (b, e and f) or virus titration (c). (b,f) viral rna levels in the indicated organs were normalized against β-actin mrna levels and transformed to estimate viral genome equivalents (vge) content per weight of the lungs ( figure s ). (c) infectious viral loads in the lung are expressed as the number of infectious virus particles per mg of lung tissue. (e) viral rna levels in the blood were calculated from a standard of infectious virus and expressed as tcid equivalents per ml blood. dotted lines indicate lower limit of quantification (lloq) or lower limit of detection (llod) (d) histopathological scoring of lungs. hamsters were sacrificed on day p.i. with passage sars-cov- and lungs were stained with h&e and scored for signs of lung damage (apoptotic bodies, necrotizing bronchiolitis, edema, pneumonia and inflammation). scores are calculated as percentage of the total maximal score. 
(g) levels of matrix metalloproteinase (mmp)- levels in lung homogenates of sars-cov- infected hamsters, relative to non-infected controls of the same strain. statistical significance was calculated between infected and non-infected animals within each group. values for infected animals (n= each) compiled from two independent experiments using either p (n= , circles) and p (n= , squares) sars-cov- . (h) viral rna levels in hamsters after treatment with convalescent sars-cov- plasma or with a previously described antibody. hamsters were either left untreated (ic, infection control, n= ) or treated with a single-domain antibody (vhh- -fc, n= ), convalescent plasma (patient # , n= ) or negative control plasma (patient # nc, negative control, n= ) and sacrificed on day p.i. viral rna levels were determined in the lungs, normalized against β-actin and fold-changes were calculated using the (-ΔΔcq) method compared to the mean of ic. the data shown are means ± sem. statistical significance between groups was calculated by the nonparametric two-tailed mann-whitney u-test (ns p > . , * p < . , ** p < . , **** p < . ). . c) . lines indicate matched samples. the data shown are means ± sem. statistical significance between groups was calculated by the nonparametric two-tailed mann whitney u-test (ns p > . , * p < . ). d wrote the original draft with input from co-authors developed the stat -/-and il r-a -/-hamster strains are named as inventors on us patent application no. / , , entitled ''coronavirus binders are named as inventors on us patent application no. 
/ , , entitled ''sars-cov- virus binders
we thank kathleen van
key: cord- - hk h authors: malla, tek narsingh; pandey, suraj; poudyal, ishwor; feliz, denisse; noda, moraima; phillips, george; stojkovic, emina; schmidt, marius title: ebselen reacts with sars coronavirus- main protease crystals date: - - journal: biorxiv doi: . / . . .
sha: doc_id: cord_uid: hk h the sars coronavirus main protease clpro tailor-cuts various essential virus proteins out of a long polyprotein translated from the virus rna. if the clpro is inhibited, the functional virus proteins cannot form and the virus cannot replicate and assemble. any compound that inhibits the clpro is therefore a potential drug to end the pandemic. here we show that the diffraction power of clpro crystals is effectively destroyed by ebselen. it appears that ebselen may be a widely available, relatively cost-effective way to eliminate the sars coronavirus .
to the active center was observed. they all constitute a database of potential drugs to target the clpro. apart from the fragments, the most promising compounds are the α-ketoamides (fig. ), which bind tightly to the clpro. as they are complicated to synthesize, they carry a hefty price.
figure . sars cov- clpro in its functional, dimeric form. the two subunits a and b are shown in dark and light blue. an α-ketoamide inhibitor is bound to the active site (red box, enlarged). the his- /cys- catalytic dyad is marked.
a less expensive, but also less known, compound that binds to the clpro is ebselen. ebselen is a selenium compound (scheme ) currently tested for a number of diseases such as bipolar disorder and hearing loss. selenium is an essential trace element, but toxic in higher doses. ebselen has been shown to bind strongly to the cov- clpro, but the structure of the complex is unknown. here we show what happens when ebselen is added to clpro crystals.
methods. expression. the cov- clpro sequence was
after days of soaking, the crystals completely maintained their morphology (fig. b).
fully remote due to restriction of the covid- pandemic.
fig. shows a comparison of diffraction patterns collected from untreated crystals (fig. a) and crystals treated with ebselen (fig. b).
whereas the untreated crystals diffracted beyond Å, the treated crystals did not show any bragg reflections whatsoever, even at the lowest resolution. they are completely amorphous, despite their intact, crystal-like shape. accordingly, only the structure of the untreated clpro can be solved.
rmsd to wqf . +/- . Å *last resolution shell in brackets
key: cord- - wo o pp authors: bangaru, sandhya; ozorowski, gabriel; turner, hannah l.; antanasijevic, aleksandar; huang, deli; wang, xiaoning; torres, jonathan l.; diedrich, jolene k.; tian, jing-hui; portnoff, alyse d.; patel, nita; massare, michael j.; yates, john r.; nemazee, david; paulson, james c.; glenn, greg; smith, gale; ward, andrew b. title: structural analysis of full-length sars-cov- spike protein from an advanced vaccine candidate date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wo o pp vaccine efforts against the severe acute respiratory syndrome coronavirus (sars-cov- ) responsible for the current covid- pandemic are focused on the sars-cov- spike glycoprotein, the primary target for neutralizing antibodies. here, we performed cryo-em and site-specific glycan analysis of one of the leading subunit vaccine candidates from novavax, based on a full-length spike protein formulated in polysorbate (ps ) detergent. our studies reveal a stable prefusion conformation of the spike immunogen with slight differences in the s subunit compared to published spike ectodomain structures.
interestingly, we also observed novel interactions between the spike trimers allowing formation of higher order spike complexes. this study confirms the structural integrity of the full-length spike protein immunogen and provides a basis for interpreting immune responses to this multivalent nanoparticle immunogen.
severe acute respiratory syndrome coronavirus (sars-cov) caused a global outbreak from - , causing severe pneumonia and killing almost people. sars-cov- belongs to the same lineage of the β-cov genus as sars-cov and recently emerged in china, spreading rapidly and infecting more than million people worldwide, with cases continuing to rise each day. given the global increase in population density, urbanization, and mobility, and the uncertain future behavior of the virus, vaccination is a critical tool for the response to this pandemic. the sars-cov- spike (s) trimeric glycoprotein is a focus of coronavirus vaccine development since it is a major component of the virus envelope, essential for receptor binding and virus entry, and a major target of host immune defense. several efforts are currently under way to make spike-based vaccines using different strategies. the cov s protein is synthesized as an inactive precursor (s ) that is proteolytically cleaved into s and s subunits, which remain non-covalently linked to form functional prefusion trimers. like other type fusion proteins, the sars-cov- s prefusion trimer is metastable and undergoes large-scale structural rearrangement from a prefusion to a thermostable postfusion conformation upon s-protein receptor binding and cleavage. rearrangement exposes the hydrophobic fusion peptide (fp), allowing insertion into the host cell membrane and facilitating virus/host cell membrane alignment, fusion, and virus entry.
notably, sars-cov- s has a amino acid insertion (prra) in the s /s cleavage site compared to sars-cov spike, resulting in a polybasic rrar furin-like cleavage motif that enhances infection of lung cells. while the s subunit is relatively more conserved across the β-cov genus, the s subunit comprising the receptor binding domain (rbd) is immunodominant and much less conserved. the fp, two heptad repeats (hr and hr ), transmembrane (tm) domain, and cytoplasmic tail (ct) are located in the s subdomain that encompasses the fusion machinery. the s subunit of sars-cov- s folds into distinct domains: the n-terminal domain (ntd), the c-terminal domain (ctd) containing the rbd, and two subdomains, sd and sd . while some human covs (hcov), including oc , exclusively use ntd-sialic acid interactions as their receptor engagement, others like middle east respiratory syndrome (mers) cov that use the ctd-rbd for primary receptor binding have also been reported to bind sialic acid receptors via their ntd to aid initial attachment to the host cells. although sars-cov- primarily interacts with its receptor ace through the ctd-rbd, there is currently no evidence indicating possible interactions between the ntd and sialoglycans. the structure of the stabilized sars-cov- spike ectodomain has been solved in its prefusion conformation and exhibits a high resemblance to sars-cov spike. in this report, we describe the atomic structure of a leading sars-cov- s vaccine candidate based on a full-length s gene with furin cleavage-resistant mutations in the s /s cleavage site and the presence or absence of -proline amino acid substitutions at the apex of the central helix. our studies reveal an overall shift in conformation of the s subunit compared to the previously published structures.
interestingly, we also observed direct interactions between adjacent spike trimers: the flexible loop between residues - in the sd from each trimer extends and engages a binding pocket on the ntd of the adjacent trimers, resulting in higher order spike multimers. further, site-specific glycan analysis revealed the glycan occupancy as well as varying levels of glycan processing at the n-glycosylation sequons present in the spike monomer. thus, our studies provide in-depth structural analysis of the novavax full-length vaccine candidate, currently being tested in humans, that appropriately recapitulates the prefusion spike.
the sars-cov- - q- p full-length spike vaccine candidate ( q- p-fl) was engineered from the full-length sars-cov- spike gene (residues - ) including the transmembrane domain (tm) and the cytoplasmic tail (ct) (figure a). the construct was modified at the s /s polybasic cleavage site from rrar to qqaq to render it protease resistant, along with proline substitutions at residues k and v in the s fusion machinery core for enhanced stability (figure a). a second full-length construct ( q-fl) containing the cleavage site mutations but without the proline substitutions was also made in parallel for comparative purposes. the fl spikes expressed and purified from insect cells were then formulated in . % (v/v) polysorbate (ps ) detergent. to characterize the structural integrity of the q- p-fl immunogen, we performed negative stain electron microscopy (nsem) of the fl spike constituted in ps in the presence of matrix-m adjuvant, recapitulating the vaccine formulation being tested in humans. micrographs and d classes of the particles revealed trimeric spike proteins present as free trimers or as multi-trimer rosettes with their transmembrane domains enclosed in cylindrical micellar cores of ps detergent (figure b). micrographs show these q- p-fl spike nanoparticles containing as many as trimers.
the matrix-m adjuvant can also be seen in the micrographs and d classes as spherical cages of different sizes with only very limited interaction with spike particles. to further evaluate the structural features of the q- p-fl immunogen, we performed single particle cryo-em on the spike formulated in ps detergent. the raw micrographs from cryo-em revealed free trimers and trimer rosettes similar to those observed on the negative stain micrographs (figure a ). initial d classifications revealed the presence of distinct classes: free spike trimers and dimers of trimers ( figure a ). each class was independently subjected to additional classification and refinement. the three-fold symmetry (c ) reconstruction of the free spike trimer resulted in a map of . Å resolution while the asymmetric reconstruction (c ) was resolved to . Å resolution ( figure b and s a, s b). previously published structures of soluble, stabilized sars-cov- spikes have revealed that rbds exist in either a closed (rbddown) or an open (rbd-up) conformation that can engage in ace binding ( ) ( ) ( ) . in contrast, we observed that all three rbds on the q- p-fl spike trimer were present in the closed conformation in the asymmetric reconstruction; the higher resolution c map was consequently used for model building ( figure b and s c). overall, the map was well resolved in both s and s subunits, particularly in the s ntd and ctd domains that were less resolved in previously published structures, thereby enabling us to model the full extent of these domains. notably, the local resolution map calculated using cryosparc showed much of the spike trimer at substantially higher resolution than . Å ( figure s d ). the atomic model contains residues - with breaks only in the flexible loop ( - ) and the cleavage site ( - ) ( figure c ). 
interestingly, superimposition of the coordinate models of q- p-fl spike with published spike structures (pdb id: vxx and vsb) revealed substantial domain rearrangements in the s subunit of q- p-fl spike compared to the other models, whereas the structure of the s subunit was consistent with the published data ( figure d ). the s ntd differed the most (~ ° rotation counterclockwise relative to published models when viewing down towards the viral membrane) while the ctd and subdomains showed minor local rearrangements ( figure d) . notably, we also observed shifts in the placement of residues flanking the - loop compared to the published models. this region was modeled in one of the published structures (pdb id: x p) as a helix with residues flanking the helix positioned very differently from our model as shown by the corresponding placement of residues t and t (residues flanking the gap in q- p-fl model) ( figure a ). however, upon closer inspection of the cryo-em density (emd- ) corresponding to residues - of the pdb model x p, there is insufficient density to support the helix conformation of this region ( figure s e ). the resulting displacement of residues in the q- p-fl structure enables inter-protomeric interactions by creating a salt-bridge between residues asp and lys ( figure b ). this observation is particularly interesting given the increased prevalence of d g mutation in the emerging sars-cov- strains and its potential role in viral transmission and pathogenesis ( ) . during refinement of an atomic model into the em density, we observed additional densities in the s subunit that did not correspond to any peptide or glycans within the spike ( figure s a ). the first density was buried within a hydrophobic pocket of the ctd created by f , f , y , y , f , f , f , f ( figure c and s b). we had previously observed a non-protein density situated in the structure of porcine epidemic diarrhea virus (pedv) that was identified to be palmitoleic acid ( ) . 
the density in this pocket of the sars-cov- ctd was consistent with the structure of linoleic acid, a polyunsaturated fatty acid; the presence of this ligand was confirmed by mass spectrometry of q- p-fl spike ( figure s b and s c). the main chain carboxyl group of linoleic acid interacts with r and q residues of rbd from the adjacent protomer, thereby making interprotomer contacts ( figure c ). the second unassigned density, present in the ntd, was larger and more surface exposed than the first density, and was surrounded by residues n , y , s , f , r , h , v ( figure d and s d). analysis of the structural features of this density suggested that it may correspond to the ps detergent used to solubilize the membrane-bound trimers and stabilize them in solution. the aliphatic tail of ps fit well into the hydrophobic pocket while the carbonyl and hydroxyl groups were well placed in proximity to residues r and h with potential for multiple hydrogen bonds between them ( figure d and s d). overall, the density is consistent with ps detergent and, given its location, provides a possible explanation for the s shift seen in our fl trimer density compared to the published structures. further classification of multimeric trimer particles yielded two separate classes: a dimer-of-trimers class that reconstructed to a final resolution of . Å with -fold symmetry and a trimer-of-trimers class that was resolved to . Å resolution ( figure a , b and s a). the presence of the trimer-of-trimers class revealed that each spike trimer had the ability to interact with multiple trimers simultaneously. in both reconstructions, the interaction between each pair of trimers involved the sd of one protomer from each trimer engaging with the ntd of the adjacent trimer ( figure c ). consequently, each trimer pair is symmetrical along a -fold axis with trimer axes tilted to . degrees relative to each other. the atomic model of the dimer-of-trimers em density revealed that the interaction was mainly coordinated by the - loop. 
although most of the loop residues were too flexible to resolve in the free trimer density map, the inter-trimer interaction stabilized the loop so that it could be fully resolved ( figure d ). the loop reaches into a pocket on the adjacent ntd, with residues -pvaihadq- in the loop interacting with ntd residues q , h , y , l , v and s ( figure d ). we observed subtle changes in the ntd binding pocket in the loop-bound state compared to the free trimer model that allow better accommodation of the loop in the pocket. the residues y and h in the binding pocket appear to switch positions in the loop-bound state, resulting in a salt bridge interaction between h and d and potential stacking between w and h ( figure e ). we also observed minor displacement of residues - and - surrounding the pocket. in addition to the main loop interaction resulting in higher-order oligomers, we also observed n glycans extending out towards the symmetry related chain in the adjacent trimer ( figure s b were replaced by a gs linker ( figure f ). to investigate whether the absence of stabilizing proline mutations affected spike stability and the formation of higher-order multimers, we performed cryo-em studies of the sars-cov- - q-fl (without p) protein formulated in ps detergent. the raw micrographs and d classes revealed the presence of free trimers as well as trimer-trimer complexes as observed with q- p-fl, indicating that the proline stabilization is not necessary for the formation of these higher-order complexes ( figure s a ). the d refinement of free trimers reached Å resolution with c symmetry imposed, as we observed that the rbds were present in the closed conformation, similar to q- p-fl ( figure s b ). fitting the q- p-fl model into the q-fl map revealed an identical conformation of the spike protein, further supporting that the presence of p in the full-length immunogen does not lead to any structural changes in the spike protein ( figure s c ). 
glycans on viral glycoproteins play wide-ranging roles in protein folding, stability, immune recognition and potentially in immune evasion. site-specific glycosylation of the sars-cov- prefusion spike protein produced in sf insect cells was analyzed using our recently described mass spectrometry proteomics-based method, involving treatment with proteases followed by sequential treatment with the endoglycosidases (endo h and pngase f) to introduce mass signatures in peptides with n-linked sequons (asn-x-thr/ser) to assess the extent of glycosylation and the degree of glycan processing from high mannose/hybrid type to complex type ( ) . although the method was developed to assess the degree of processing of n-linked glycans in mammalian cells, it is also applicable for analyzing glycosylation in sf insect cells. the primary differences in glycan processing of n-linked glycans in sf insect cells are: ) the production of truncated paucimannose glycans, and ) the potential to introduce either one (α , ) or two (α , /α , ) fucose substitutions into the core glcnac attached to asn. although α , fucose substitution is known to prevent cleavage by pngase f ( ), this is not a factor when analyzing glycosylation from sf cells since they contain α , -fucosyltransferase, which is found in mammalian cells, but contain only trace amounts of α- , fucosyltransferase activity, if any ( ) . the paucimannose glycans are highly processed like complex type glycans and are not cleaved by endo h, but are cleaved by pngase f. thus, for sf insect cell-produced glycoproteins, the use of endoglycosidases to introduce mass signatures is analogous to analysis of glycoproteins produced in mammalian cells, with endo h removing high mannose/hybrid glycans leaving a glcnac-asn (+ ), followed by treatment with pngase f in o water, which removes the remaining paucimannose and complex type glycans while converting asn to asp (+ ); the asn of unoccupied sites remains unaltered (+ ). 
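the three mass signatures described above partition the peptides covering each sequon into three classes, which lends itself to a simple per-site tally. the following is a minimal sketch, assuming that each identified peptide has already been assigned to one class and carries a peak area (the methods state that quantification used peak area). the class labels, the function name `site_fractions` and the input layout are hypothetical and are not taken from the authors' pipeline; the numeric mass shifts are deliberately left out because they are not legible in this text.

```python
from collections import defaultdict

# Hypothetical labels for the three mass-signature classes described in the text.
OLIGOMANNOSE = "oligomannose_hybrid"   # Endo H leaves a GlcNAc on Asn
COMPLEX = "complex_paucimannose"       # PNGase F converts Asn to Asp
UNOCCUPIED = "unoccupied"              # Asn left unaltered
CLASSES = (OLIGOMANNOSE, COMPLEX, UNOCCUPIED)


def site_fractions(peptides):
    """Return, per sequon site, the fraction of total peak area in each class.

    `peptides` is an iterable of (site, glycan_class, peak_area) tuples.
    Sites whose summed peak area is zero are skipped.
    """
    totals = defaultdict(lambda: dict.fromkeys(CLASSES, 0.0))
    for site, glycan_class, peak_area in peptides:
        if glycan_class not in CLASSES:
            raise ValueError(f"unknown glycan class: {glycan_class!r}")
        if peak_area < 0:
            raise ValueError("peak area must be non-negative")
        totals[site][glycan_class] += peak_area

    fractions = {}
    for site, areas in totals.items():
        total = sum(areas.values())
        if total > 0:
            fractions[site] = {c: areas[c] / total for c in CLASSES}
    return fractions


# Illustrative, made-up peak areas (not data from the study).
example = [
    ("siteA", OLIGOMANNOSE, 1.0),
    ("siteA", COMPLEX, 3.0),
    ("siteB", UNOCCUPIED, 2.0),
]
result = site_fractions(example)
```

under this sketch, occupancy at a site is one minus its `unoccupied` fraction, and the oligomannose fraction gives the share of under-processed glycans at that site.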
our analysis detected glycosylation at all potential n-linked glycan sequons present on sars-cov- spike ( figure g ). overall, there was high glycan occupancy of over %, with only two sites, and , more than % unoccupied. interestingly, we did not see clear glycan density at either or in the cryo-em reconstruction of the q- p-fl spike. most sites showed extensive glycan processing to complex/paucimannose type glycans, with only four sites exhibiting ≥ % oligomannose. the glycan analysis also confirmed the presence of glycans at sites , and in the membrane-proximal region of the spike, which was not resolved by cryo-em. the extensive site-specific glycan processing of the sars-cov- prefusion spike protein in sf insect cells seen here is similar to that recently reported for the spike protein produced in mammalian hek f cells ( ) . the coronavirus disease (covid- ) caused by sars-cov- poses a serious health threat and was declared a pandemic by the world health organization (who). in rapid response to this evolving situation, several sars-cov- spike-based vaccine candidates are being developed and tested at various stages of clinical trials ( ) ( ) ( ) . in this study, we performed structural analysis of the novavax sars-cov we also observed two non-spike densities within the spike trimer that corresponded with linoleic acid and polysorbate detergent. linoleic acid, an essential free fatty acid, was buried within a hydrophobic pocket in the ctd with its main chain carboxyl group making contacts with the adjacent rbd in the closed conformation. a recent report by toelzer et al. also identified this density and attributed it to the presence of linoleic acid ( ) . the second large density, occupied by ps , is situated in the ntd and is relatively more surface exposed. since ps is unique to the formulation of the novavax q- p-fl immunogen, this observation is specific to this structure. 
however, there is a possibility of other ligands occupying this pocket in the place of ps . the presence of these binding pockets for different ligands in the spike structure provides potential targets for drug design against sars-cov- . the widely used sars-cov- spike ectodomain construct with mutated cleavage site and p substitution has been shown to exist either in an all-rbd 'down' conformation or in a one-rbd 'up' conformation ( , ) . surprisingly, we observed that all the rbds in the q- p-fl spike immunogen were present in a down conformation, which could be a cause for concern for eliciting neutralizing antibodies that compete with ace binding. however, binding analysis of the q- p-fl immunogen to ace by both bio-layer interferometry and elisa clearly shows binding to ace , indicating that the rbd is dynamic and the receptor binding site accessible ( ) . another study on the prefusion structure of a full-length spike protein reported similar findings with rbds clamped down as a consequence of potential clashes between s residues - and sd when rbd is in open conformation ( ) . it is also possible that the interprotomeric contacts made by linoleic acid observed in our structure preferentially lead to the observed rbd positioning. our structural work is consistent with the burgeoning body of structures available of the spike protein, albeit with the important differences described above. hence, this advanced protein subunit vaccine candidate currently being tested in humans appears stable, homogeneous, and locked in the antigenically preferred prefusion conformation. further, the tight clustering of the spikes in the nanoparticle formulation may lead to stronger immune responses than soluble trimers alone, consistent with other viral glycoprotein immunogens (ha and rsv f) ( , ) . 
it appears that the remarkable speed at which this vaccine was designed did not compromise the quality of the immunogen, and that building off the previous success in formulating rsv f and influenza ha nanoparticle immunogens could readily be extended to sars-cov- spike, particularly in the background of the p mutations previously shown to stabilize many other β-cov spikes ( ) . with structural, biophysical, and antigenic characterization now complete, evaluation in humans will provide the true proof-of-principle for this vaccine concept. we thank bill anderson, hannah l. turner and charles a. bowman for their help with electron microscopy, data acquisition and data processing. we thank bill webb and linh truc hoang for their assistance with mass spectrometry and data processing. we thank lauren holden for her assistance with the manuscript. authors would also like to thank figures s to s table s references ( figs. s to s table s supplementary materials - q- p full-length spike formulated in ps and matrix adjuvant were diluted to approximately µg/ml with tbs. the sample was directly deposited onto carbon-coated -mesh copper grids and stained immediately with % (w/v) uranyl formate for seconds. grids were imaged at kev on tecnai t spirit with a k x k eagle ccd camera at , x magnification and - . μm nominal defocus. micrographs were collected using leginon and the images were transferred to appion for processing ( , ) . particle stacks were generated in appion with particles picked using a difference-of-gaussians picker (dog-picker) and d classes generated by msa/mra ( , ) . ( ) . motioncor was used for alignment and dose weighting of the frames ( ) . micrographs were transferred to cryosparc . for further processing ( ) . 
ctf estimations were performed using gctf and micrographs were selected using the curate exposures tool in cryosparc based on their ctf resolution estimates (cutoff Å) for downstream particle picking, extraction and iterative rounds of d classification and selection ( ) . particles selected from d classes were used for d refinement of free trimers for q- p-fl and q-fl datasets in cryosparc. final subsets of clean trimer particles were refined with c symmetry and local resolution for the free trimer was calculated using the local resolution function in cryosparc. particles corresponding to dimers-of-trimers classes in cryosparc were transferred to relion . for iterative rounds of d classification to separate dimers-of-trimers and trimers-of-trimers ( ) . final subsets of clean particles from dimers-of-trimers class were refined with c symmetry and the trimers-of-trimers class with c symmetry. model building and refinement. the . Å c -symmetric free trimer map and the . Å c -symmetric dimers-of-trimers maps were used for model building and refinement. initial model building was performed manually in coot using pdb vxx as a template followed by iterative rounds of rosetta relaxed refinement and coot manual refinement to generate the final models ( , ) . emringer and molprobity were run following each round of rosetta refinement to evaluate and choose the best refined models ( , ) . the coordinates were manually placed and refined into the respective map densities using coot. for rosetta refinement, each ligand was saved in mol format and rosetta parameter files were generated using the molfile_to_params.py function ( ) . final map and model statistics are summarized in table s . figures were generated using ucsf chimera and ucsf chimera x ( , ) . mass spectrometry to identify fatty acids in the sars- q- p-fl protein was performed as described previously ( ) . 
we obtained several candidates in this screen that were narrowed down to candidates based on their intensity and the m/z range of - . a sample of the sars-cov- prefusion spike protein expressed in the sf insect cell line was prepared for ms analysis as previously described with minor modifications ( ) . in brief, the protein ( µg) was denatured and aliquots ( µg the ms data were processed essentially as described previously ( ) . the data were searched against the proteome database and quantified using peak area in integrated proteomics pipeline-ip . since the processing pathway in sf cell line (insect cell line) is similar to mammalian cells for oligomannose and hybrid structures cleaved by endo-h, and then diverges to produce a combination of paucimannose and complex type glycans, peptides with n+ were identified as having oligomannose type glycans, and peptides with n+ are assigned as peptides with complex and paucimannose type glycans. sars: the first pandemic of the st century a novel coronavirus from patients with pneumonia in china mechanisms of coronavirus cell entry mediated by the viral spike protein sars-cov- spike protein: an optimal immunological target for vaccines sars-cov- vaccines: status report developing covid- vaccines at pandemic speed structure, function, and evolution of coronavirus spike proteins tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex the proximal origin of sars-cov- a multibasic cleavage site in the spike protein of sars-cov- is essential for infection of human lung cells the receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in sars-cov- patients human coronaviruses oc and hku bind to -o-acetylated sialic acids via a conserved receptor-binding site in spike protein domain a identification of 
sialic acid-binding function for the middle east respiratory syndrome coronavirus spike glycoprotein structures of mers-cov spike glycoprotein in complex with sialoside attachment receptors structural basis of receptor recognition by sars-cov- structure, function, and antigenicity of the sars-cov- spike glycoprotein cryo-em structure of the -ncov spike in the prefusion conformation characterization of the sars-cov- s protein: biophysical, biochemical, structural, and antigenic analysis. biorxiv the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity. biorxiv structure and immune recognition of the porcine epidemic diarrhea virus spike protein. biorxiv in-silico evidence for two receptors based strategy of sars-cov- . biorxiv n-terminal domain (ntd) of sars-cov- spike-protein structurally resembles mers-cov ntd sialoside-binding pocket differential processing of hiv envelope glycans on the virus and soluble recombinant trimer peptide-n -(n-acetyl-beta-glucosaminyl)asparagine amidase f cannot release glycans with fucose attached alpha ---- to the asparagine-linked n-acetylglucosamine residue distinct n-glycan fucosylation potentials of three lepidopteran cell lines site-specific glycan analysis of the sars-cov- spike antigen receptor nanoclusters: small units with big functions molecular determinants of immunogenicity: the immunon model of immune response polyvalent antigens stabilize b cell antigen receptor surface signaling microdomains sars-cov- spike glycoprotein vaccine candidate nvx-cov elicits immunogenicity in baboons and protection in mice a ph-dependent switch mediates conformational masking of sars-cov- spike. biorxiv structures, conformations and distributions of sars-cov- spike protein trimers on intact virions. biorxiv unexpected free fatty acid binding pocket in the cryo-em structure of sarscov- spike protein. biorxiv distinct conformational states of sars-cov- spike protein. 
biorxiv controlling the sars-cov- spike glycoprotein conformation. biorxiv glycans on the sars-cov- spike control the receptor binding domain conformation. biorxiv respiratory syncytial virus fusion glycoprotein expressed in insect cells form protein nanoparticles that induce protective immunity in cotton rats improved titers against influenza drift variants with a nanoparticle vaccine immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen leginon: a system for fully automated acquisition of electron micrographs a day appion: an integrated, database-driven pipeline to facilitate em image processing dog picker and tiltpicker: software tools to facilitate particle selection in single particle electron microscopy topology representing network enables highly accurate classification of protein images taken by cryo electron-microscope without masking automated molecular microscopy: the new leginon system motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy cryosparc: algorithms for rapid unsupervised cryo-em structure determination gctf: real-time ctf determination and correction new tools for automated high-resolution cryo-em structure determination in relion- structural analysis of glycoproteins: building n-linked glycans with coot automated structure refinement of macromolecular assemblies from cryo-em maps using rosetta emringer: side chain-directed model and map validation for d cryoelectron microscopy molprobity: more and better reference data for improved all-atom structure validation electronic ligand builder and optimization workbench (elbow): a tool for ligand coordinate and restraint generation meeting modern challenges in visualization and analysis ucsf chimera--a visualization system for exploratory research and analysis analysis tool web services from the embl-ebi isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model key: cord- -d 
stzy w authors: zhao, yuan; wang, junbin; kuang, dexuan; xu, jingwen; yang, mengli; ma, chunxia; zhao, siwen; li, jingmei; long, haiting; ding, kaiyun; gao, jiahong; liu, jiansheng; wang, haixuan; li, haiyan; yang, yun; yu, wenhai; yang, jing; zheng, yinqiu; wu, daoju; lu, shuaiyao; liu, hongqi; peng, xiaozhong title: susceptibility of tree shrew to sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d stzy w since sars-cov- became a pandemic event in the world, it has not only caused huge economic losses, but also posed a serious threat to global public health. many scientific questions about sars-cov- and covid- have been raised and urgently need to be answered, including the susceptibility of animals to sars-cov- infection. here we tested whether the tree shrew, an emerging experimental animal domesticated from wild animals, is susceptible to sars-cov- infection. no clinical signs were observed in sars-cov- inoculated tree shrews during this experiment except increased body temperature (above ° c), particularly in female animals, during infection. low levels of virus shedding and replication in tissues occurred in all three age groups, each of which showed its own characteristics. histopathological examination revealed that pulmonary abnormalities, though mild, were the main changes, although slight lesions were also observed in other tissues. in summary, the tree shrew is not susceptible to sars-cov- infection and may not be a suitable animal for covid- related research. the manifestation caused by sars-cov- infection is known as covid- . about five months have passed since the first case of sars-cov- infection was reported in wuhan, china. it has caused a pandemic with more than three million confirmed cases and nearly thousand deaths, in addition to huge economic losses to the world . nevertheless, it is widely considered to be controllable according to the experiences from china and other countries. 
however, there are still some critical aspects that need to be further investigated in patients, such as cytokine storm, immunopathogenic damage, tropism of sars-cov- , and other sources of sars-cov- infection besides bat and pangolin, which are regarded as the origin of sars-cov- [ ] [ ] [ ] . from these points of view, animal studies become essential. in fact, several animal models of covid- have recently been reported in mice , hamsters , ferrets , and non-human primates [ ] [ ] [ ] [ ] , which recapitulate covid- from different aspects. in terms of susceptibility to sars-cov- , in addition to these experimental animals, domestic animals and pets have also been investigated . cats, as a popular pet, could be an important source of sars-cov- infection due to their close relationship with human beings. the tree shrew, also known as tupaia belangeri, has been shown genetically to be close to primates [ ] [ ] [ ] . therefore, it is being developed as an experimental animal that could be an alternative to primates in biomedical research due to its unique characteristics . in fact, the tree shrew has been used for several animal models of virus infection, including hepatitis b , influenza virus [ ] [ ] [ ] , and zika virus . however, a tupaia model of highly pathogenic viruses, including sars-cov- , has not been reported yet. several reports show that sars-cov- may originate from wild animals [ ] [ ] [ ] . replication of sars-cov- in tree shrews is still unknown. in this study, in order to determine the possibility of the tree shrew as a covid- model, we tested the susceptibility of tree shrews to sars-cov- infection. we found that sars-cov- had limited replication and shedding in tree shrews and caused mild histopathology, but none of the typical symptoms reported in covid- patients were observed in infected tree shrews. tree shrews, tupaia belangeri chinensis, were bred in the institute of medical biology, chinese academy of medical sciences. 
after quarantine and before infection, animals were transferred to absl- facility and housed in the isolated ventilation cages, with -hour light and -hour dark. all animal procedures were approved by the institutional animal care and use committee of the institute. procedures in this study are outlined in figure . a total of tree shrews were used in this study and divided into three groups with consideration of gender and age, including young ( months to months), adult ( years to years) and old ( years to years) groups. each group contained half male and half female animals. after baseline data and samples were collected right before virus inoculation, each tree shrew was inoculated intranasally with ml pfu sars-cov- ( ul/each nostril). clinical signs were recorded daily, including behavior, drinking and eating, breathing, feces and so on. body temperature was also monitored every other day post viral inoculation. at the same time points, nasal, throat and anal swabs and blood samples were collected. viral genomic rna in these samples was quantified by rt-qpcr using virus-specific primers and probes. on day post viral inoculation, six animals were anesthetized, bled and necropsied. after gross lesions were recorded, tissue samples were harvested for analysis of viral loads and histopathology. rnase/dnase-free h o was used to elute rna from the column. . µl of each rna was analyzed in each well of -well plate by one-step rt-qpcr using gene-specific primers and probe as described before . tissue samples of heart, liver, spleen, lung, kidney, weasand, stomach, small intestine, rectum, pancreas, brain, spinal cord, uterus, penis, testis, cecum were harvested and fixed in % neutral buffered formalin. paraffin-embedded tissues were cut into µm sections, followed by haematoxylin and eosin (h&e) staining. slides were scanned with dhistech and inspected by an experienced pathologist using the manufacturer-provided software caseviewer. 
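the methods above state that viral genomic rna was quantified by one-step rt-qpcr but do not describe how ct values were converted to copy numbers. the following is a minimal sketch of one common approach, assuming a linear standard curve of ct against log10 copy number fitted to a dilution series. the function names and the example values are hypothetical, and the sketch should not be read as the authors' actual procedure. converting the result to copies per ml would additionally require the elution and input volumes, which are not legible in this text.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(log10_copies) != len(ct_values) or len(log10_copies) < 2:
        raise ValueError("need at least two paired standards")
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def log10_copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate log10 copies per reaction."""
    return (ct - intercept) / slope


# Illustrative, made-up dilution series (not data from the study).
standards_log10 = [2.0, 3.0, 4.0, 5.0]
standards_ct = [35.0, 31.5, 28.0, 24.5]
slope, intercept = fit_standard_curve(standards_log10, standards_ct)
sample_log10 = log10_copies_from_ct(30.0, slope, intercept)
```

a sample with no amplification has no ct value and would be reported as undetectable rather than passed through the curve.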
during the period of this study, we did not observe any clinical sign other than changes in body temperature. after sars-cov- inoculation, the body temperature of each tree shrew was monitored every other day. among the three age groups of virus-inoculated tree shrews, all young tree shrews except ts showed increased body temperature. young tree shrews ( males, females) reached peak body temperature on or days post inoculation (dpi), followed by a gradual decline. the peak body temperatures in two female (ts , ts ) and one male (ts ) young tree shrews were above ( figure a ). in addition, among old tree shrews, one male (ts ) and three females (ts , ts and ts ) had their peak body temperature (> ) on or dpi. however, only one adult female (ts ) showed a peak body temperature ( . ) on dpi ( figure b & c) . these results indicated that young and old tree shrews, and female tree shrews, were more sensitive to sars-cov- infection than adult and male tree shrews. on dpi, we could detect genomic rna of sars-cov- in nasal swabs, throat swabs, anal swabs and one serum sample from young tree shrews, in nasal swabs from adult tree shrews, and in nasal swabs and anal swab from old tree shrews (table ) . notably, all four samples (nasal, throat and anal swabs and blood) collected from young tree shrew ts were positive for viral rna. the highest copy number of viral genomic rna was . /ml in the nasal swab from the young tree shrew ts . on dpi, there were four young, three adult and four old tree shrews with viral rna detected in some samples. the highest level of virus shedding was still from the young tree shrew (ts ). on dpi, the young tree shrews showed decreasing virus shedding and only the animal ts had an rna-positive throat swab. in contrast, an increasing number of old tree shrews had detectable viral rna in nasal, throat, and anal swabs (table ) . from these findings, we deduced that young tree shrews were more susceptible to sars-cov- infection than adult and old tree shrews. 
however, old and adult tree shrews showed a longer duration of virus shedding than young tree shrews. moreover, we noticed that among adult and old tree shrews, more male animals than females showed virus shedding. in order to determine viral load in tissue samples, we necropsied animals from age groups of tree shrews on dpi. sixteen major tissues were collected from each animal. viral genomic rna was quantified as described in methods. in three young tree shrews (ts , ts and ts ), we could detect viral rna only in the lungs of ts and ts , but not in any tissue from ts , although these animals had higher viral genomic copy numbers at the earlier stage of sars-cov- infection. in contrast, in the adult tree shrew ts , six tissues were rna positive, with the highest number, . /ml, in pancreas. the female adult tree shrew ts had a viral rna-positive uterus in addition to lung and pancreas. the other four necropsied tree shrews showed viral rna-negative results in all tissues (table s ) . to determine the host response to sars-cov- infection, we examined sixteen tissues from each of the tree shrews necropsied on dpi and dpi. it is clinically reported that old covid- patients show high morbidity and mortality, which is thought to be associated with the comorbidities . firstly, more young and old tree shrews showed increased body temperature than adult tree shrews ( figure ). secondly, young tree shrews had more severe virus shedding at the early stage of virus infection than the other animals, and old tree shrews had a longer duration of virus shedding than others ( table ) . one young male tree shrew, ts , did not show a significant increase in body temperature, but viral rna was detected in its nasal, throat and anal swabs and serum on dpi. these results indicated asymptomatic infection of sars-cov- in young tree shrews. 
although sars-cov- infection did not cause severe disease in any of the three age groups of tree shrews, viral replication and mild histopathological changes were still observed in this study. in particular, we found a severe histological lung abnormality in one adult tree shrew, which may be caused by immunopathological responses, such as cytokine storm (table ) . these results should be further confirmed via nucleic acid in situ hybridization, immunohistochemistry or immunofluorescent staining. in conclusion, the tree shrew is not as susceptible to sars-cov- infection as the reported animal models of covid- , though limited replication of sars-cov- and mild histopathology were observed in some tissues. in addition, commercial reagents and completely domesticated tree shrews are very limited. therefore, the tree shrew is not a suitable experimental animal for covid- related studies. however, it remains important to investigate whether wild tree shrews are infected by, or are asymptomatic carriers of, sars-cov- . none declared. total tree shrews (tupaia belangeri ) were divided into three groups (young, adult and old) according to age. each group included half male and half female animals. each tree shrew was inoculated with ml pfu sars-cov- nasally ( ul/each) on dpi. body temperature was monitored every other day. at the same time points, nasal, throat, anal and blood samples were collected for analysis of viral loads. on and dpi, animals were euthanized and necropsied. gross lesions were recorded and tissue samples were collected for further analysis. (b) on every other day as indicated in (a), body temperature of tree shrews was monitored and recorded. the software graphpad was utilized for data processing and plotting as a rainbow heat map. x represents no data collected. body temperature beyond was shown in white boxes. young old adult histopathological examination of affected tissues from sars-cov- infected tree shrews. 
on and dpi, tree shrews were euthanized and necropsied. seventeen tissues were collected from each animal for histopathological analysis. from eight old tree shrews, eleven tissues with various degrees of pathological changes are shown as representatives. green arrows indicate the pathological changes described in the text. histopathology of all other tree shrews is summarized in table . table . summary of histopathological examination of tissues from all tree shrews of each age group. sixteen tissues were collected and examined histopathologically. six tissues showed no histopathological change. therefore, eleven tissues are shown in this table. each age group contained tree shrews. *number of animals with histopathological changes out of total animals in each age group. table s . viral load in tissues collected from sars-cov- infected tree shrews. density of green color is proportional to the copy number of viral genomic rna. *animal code #copy number of viral genomic rna was expressed as log /ml. -means undetectable. the authors would like to thank all staff in national kunming high-level biosafety primate research center for providing absl -related services. this study was supported by yfc and yfc .
key: cord- - juhmjaj authors: hou, wei; liu, fei; van der poel, wim h.m.; hulst, marcel m. title: rapid host response to an infection with coronavirus. study of transcriptional responses with porcine epidemic diarrhea virus date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: juhmjaj the transcriptional response in vero cells (atcc® ccl- ) infected with the coronavirus porcine epidemic diarrhea virus (pedv) was measured by rnaseq analysis and hours after infection. differentially expressed genes (degs) in pedv infected cells were compared to degs responding in vero cells infected with mammalian orthoreovirus (mrv). functional analysis of mrv and pedv degs showed that mrv increased the expression level of several cytokines and chemokines (e.g. il , cxcl , il a, cxcl [alias il ]) and antiviral genes (e.g. ifi , ifit , mx , oasl), whereas for pedv no enhanced expression was observed for these “hallmark” antiviral and immune effector genes. pathway and gene ontology “enrichment analysis” revealed that pedv infection did not stimulate expression of genes able to activate an acquired immune response, whereas mrv did so within h. instead, pedv down-regulated the expression of a set of zinc finger proteins with putative antiviral activity and enhanced the expression of the transmembrane serine protease gene tmprss (alias mspl) to support its own infection by virus-cell membrane fusion (shi et al, , viruses, ( ): ). pedv also down-regulated expression of ectodysplasin a, a cytokine of the tnf-family able to activate the canonical nfkb-pathway responsible for transcription of inflammatory genes like il b, tnf, cxcl and ptgs .
the only cytokine genes found up-regulated by pedv were cardiotrophin- , an il -type cytokine with pleiotropic functions in different tissues and cell types, and endothelin , a neuroactive peptide with vasoconstrictive properties. furthermore, by comprehensive datamining in biological and chemical databases and by consulting related literature, we identified sets of pedv-response genes with the potential to influence i) the metabolism of biogenic amines (e.g. histamine), ii) the formation of cilia and “synaptic clefts” between cells, iii) epithelial mucus production, iv) platelet activation, and v) physiological processes in the body regulated by androgenic hormones (like blood pressure, salt/water balance and energy homeostasis). the information in this study describing a “very early” response of epithelial cells to infection with a coronavirus may provide pharmacologists and immunological and medical specialists with additional insights into the mechanisms underlying coronavirus-associated severe clinical symptoms, including those induced by sars-cov- . this may help them to fine-tune therapeutic treatments and apply specific approved drugs to treat covid- patients. the lack of knowledge for treating hospitalized sars-cov- infected patients is one of the pressing problems of the current covid- pandemic. the sars-cov- virus shows a close genetic similarity to the sars virus (sars-cov- ) identified in april and to other sars-related coronaviruses isolated from humans and bats. sars-cov- induces clinical respiratory symptoms similar to those of the virus, mostly in persons with underlying diseases like copd, heart failure, diabetes and obesity ( : wu et al. ) . although the sars-cov- virus has been extensively studied in the last two decades, there are no vaccines available yet, nor are there effective prophylactic and therapeutic treatment regimens with drugs that work equally well for each individual patient with sars-induced respiratory problems.
such treatments might prevent development of severe disease patterns like "acute respiratory distress syndrome" (ards) and other, often fatal complications, and may decrease the case-fatality rate of sars-cov- infections. in our lab we study the alpha-coronavirus pedv. pedv was first detected in pig herds in in europe ( : pensaert and de bouck ) . however, this virus reemerged in the spring of in north america causing a massive outbreak among pig herds, resulting in the death of about % of the suckling piglets due to severe diarrhea and dehydration ( although several studies concluded that these clinical symptoms were caused by mrv itself, in concordance with the co-existence of mrv in pedv infected piglets, also other mrv serotypes were isolated from hospitalized patients with airway problems diagnosed positive for sars-cov- ( : cheng et al. , : duan et al. , : zuo et al. . recently, a cross-family recombinant coronavirus was isolated in china from bat faeces in which an rna sequence originating from the s segment of mrv was inserted in the coronavirus genome between the n and ns a genes, indicating that both viruses were replicating simultaneously in a single cell in bats ( : huang et al. ) . a prevalence study showed that this cross-family recombinant coronavirus circulated in an isolated bat colony in a cave in china ( : obameso et al. ). this cooccurrence of mrv with coronaviruses raised the questions whether a synergistic effect between both viruses exists and if such coexistence plays a role in viral pathogenesis. therefore we studied the host response in cultured cells early ( and hours) after pedv and mrv infection using rnaseq. our original goal was to identify early factors and processes induced by pedv or mrv that could stimulate or influence the replication and pathogenesis of the other virus. the host, tissue and cell tropism of pedv differs from sars-cov- and - . 
however, the genomic organization, replication strategy and function of a part of the viral nonstructural proteins share common features among all coronaviruses ( : brian and baric ) . this applies particularly to interactions in infected cells between nonstructural coronavirus proteins and specific host proteins, i.e. host proteins that are recruited or silenced to support virus replication, assembly and release. in our experiment we used vero cells (cercopithecus aethiops epithelial kidney cell line; atcc® ccl- ) because these cells support efficient infection and replication of both mrv and pedv. vero cells are susceptible to many coronaviruses, including sars-cov- and - ( : chu et al. ). they originate from epithelial tissue, in part resembling nasal and bronchial epithelial cells, the prime target cells infected by sars-cov- in the airways of humans. recent research showed that sars-cov- is also able to replicate in epithelial cells of human small intestinal organoids ( : lamers et al. ) . a disadvantage of vero cells is a deletion in the type i interferon (ifn) gene cluster on chromosome ( : osada et al. ). therefore, these cells lack expression of the type i ifns that are important for activation of antiviral defense mechanisms. however, research has shown that vero cells can bypass this ifn-activation route and mount an antiviral response mediated by interferon regulatory factor ( : chew et al. ). single infections with pedv or mrv alone and simultaneous (double) infections of vero cells with both viruses were performed using a maximum multiplicity of infection (moi) to achieve a synchronized infection of all cells. expression levels of mrna transcripts/genes measured by rnaseq in infected cells were compared to rnaseq profiles measured from similarly treated mock-infected cells harvested at the same time point after infection.
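the comparison of infected cells with time-matched mock-infected cells can be illustrated with a minimal python sketch. it is not the pipeline used in this study (mapping and fold-change calculation were done by the sequencing provider); the pseudocount, the cut-off values and the gene names are hypothetical placeholders, and the expression values are made up for illustration.

```python
# Hypothetical cut-offs: placeholders only, not the thresholds used in the study.
FC_CUTOFF = 2.0
P_CUTOFF = 0.05


def fold_change(infected, mock, pseudocount=1.0):
    """Signed fold change of infected vs. time-matched mock expression.

    Up-regulation returns a value >= 1, down-regulation a value <= -1.
    The pseudocount (an assumption) avoids division by zero.
    """
    ratio = (infected + pseudocount) / (mock + pseudocount)
    return ratio if ratio >= 1.0 else -1.0 / ratio


def call_degs(rows, fc_cutoff=FC_CUTOFF, p_cutoff=P_CUTOFF):
    """Return {gene: fold change} for genes passing both cut-offs."""
    degs = {}
    for gene, infected, mock, p_value in rows:
        fc = fold_change(infected, mock)
        if abs(fc) > fc_cutoff and p_value < p_cutoff:
            degs[gene] = fc
    return degs


# Made-up rows: (gene, infected expression, mock expression, p-value).
example = [
    ("GENE_A", 799.0, 99.0, 0.001),   # strongly up-regulated
    ("GENE_B", 49.0, 399.0, 0.01),    # strongly down-regulated
    ("GENE_C", 120.0, 100.0, 0.001),  # change too small
    ("GENE_D", 900.0, 100.0, 0.2),    # not significant
]

print(call_degs(example))  # {'GENE_A': 8.0, 'GENE_B': -8.0}
```

the signed convention (positive for up, negative for down) mirrors the way fold changes are reported for the degs in this report.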
the detected sets of differentially expressed genes (degs) for pedv and mrv were analyzed by gene set enrichment analysis (gsea) using functional bioinformatic programs to retrieve biological processes (pathways and gene ontology terms [go-terms]) and associations with chemical compounds, including drugs. in addition, we searched the literature for functional information on the pedv-degs to find possible associations with sars-cov- pathogenesis. because of the covid- pandemic we gave priority to publishing the results of this functional bioinformatic analysis and datamining for the vero cells singly infected with pedv, separately from the results of the double infections with mrv . in this report we focused on the "very early" host response of epithelial cells to infection with the coronavirus pedv and paid less attention to the role of specific viral proteins in this host response. in part, our results agreed with those of a previous rnaseq study comparing sars-cov- and influenza host responses ( : blanco-melo et al. ). but we also found associations with biological processes, and pivotal genes/proteins acting in these processes, that had not been recognized before. this information may contribute to the search for novel or alternative preventive or therapeutic drugs and treatment protocols for covid- . a time-dependent infection experiment was performed with cultured vero cells. details are described and visually displayed in supplementary file (material and methods). briefly, overnight cultured vero cells grown in cm wells were mock-infected or infected with mrv strain wbvr ( : hulst et al. ) or pedv strain cv ( : pensaert and de bouck , : rasmussen et al. ) with a multiplicity of infection of ≥ for min at °c. for pedv and the corresponding mock-infected cells, µg/ml of trypsin in serum-free medium was used to facilitate infection of vero cells during the whole experiment.
all virus and mock-infected timepoints were performed in quadruplicate. after incubation for min at °c, virus was discarded and cells were washed twice and supplied with fresh culture medium. cells were incubated for , , , , and h at °c and % co . after incubation for the indicated times, cells were placed on ice before total rna was isolated from three of the quadruplicate wells. the replication of both viruses in vero cells was monitored using virus-specific rt-qpcr tests ( fig. ; methods and primers used for pcr are provided in supplementary file ). in addition, cells in one of the quadruplicate wells incubated for h were fixed and stained with antibodies directed against the s spike protein of pedv and the s attachment protein (α ) of mrv . a decrease in ct-values for pedv was not observed before h post inoculation ( h.p.i.), indicating that replication in pedv infected cells started later than was observed for mrv (at h.p.i.). staining of the cells after h indicated that nearly all vero cells were infected with mrv and more than % with pedv. in addition, more than % of the cells in the h wells appeared as fused cells (syncytia), confirming that more than % of the cells were infected with pedv. quality control of the total rna isolated from infected cells using an agilent bioanalyzer showed that rnas isolated from pedv infected wells at h.p.i. were partially degraded (rin values below ), making them unsuitable for rnaseq analysis. therefore, only the , and h timepoints were analyzed using rnaseq. stained with a monoclonal antibody directed against the s spike protein. mrv and mock infected cells were stained with a polyclonal rabbit serum raised against a peptide sequence of the s -attachment protein of mrv serotype . nuclei were stained blue with the hoechst, ', -diamidino- -phenylindole dye.
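the reasoning that a decrease in ct-value indicates viral rna replication can be made concrete with a short python sketch. it assumes idealized pcr in which the template doubles every cycle (efficiency of 1.0); the ct values in the example are hypothetical and are not measurements from this study.

```python
def fold_increase_from_ct(ct_earlier, ct_later, efficiency=1.0):
    """Fold increase in template implied by a drop in Ct between two timepoints.

    efficiency=1.0 means perfect doubling per cycle, which is an assumption;
    real assays usually have a somewhat lower efficiency.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return (1.0 + efficiency) ** (ct_earlier - ct_later)


# Hypothetical Ct values: a drop of 5 cycles corresponds to 2**5 = 32-fold more template.
print(fold_increase_from_ct(30.0, 25.0))  # 32.0
```

an unchanged ct between two timepoints gives a fold increase of 1.0, i.e. no detectable replication, which is the situation described for pedv at the earliest timepoints.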
equal amounts of total rna isolated from triplicate wells were pooled and subjected to rnaseq analysis by genomescan b.v.(leiden, the netherlands) using next generation sequencing (ngs) (see supplementary file a for details). mapping of ngs reads to the cercopithecus aethiops reference genome and preparation of datafiles with calculated fold change (fc) of expression levels of mapped mrnas, were performed for each comparison at , , and h by genomescan (see supplementary file b). from these datafiles we extracted lists of degs with a fc> and p-value of < . . after accessing the ncbi, panther or kegg databases for human orthologs, not annotated cercopithecus aethiops degs were annotated with an hugo official gene symbols (http://www.genenames.org). in supplementary file sheet "pedv-mrv degs fc> ", lists of all annotated pedv and mrv degs are presented with their fc. in a separate sheet "pedv-degs functional info" all individual degs regulated by pedv at and h.p.i. are presented with their fc, information about their function and the types of human cells in which expression of the gene is relatively high compared to other human cells (retrieved from the "primary cell atlas" dataset of biogps: http://biogps.org/). note that all tables in these excel sheets of supplementary file are sortable using the headers. in all results paragraphs beneath information about the biological function of degs was retrieved by consulting the "genecards" (weizmann institute of science: https://www.genecards.org/) and ncbi gene reports (entrez gene: https://www.ncbi.nlm.nih.gov/gene/), and literature linked to these reports (for references about these biological functions of genes/proteins we refer to publications cited in these reports: "genecards" weblinks are provided in supplementary file ). sets of pedv and mrv degs were analyzed using the gsea program geneanalytics (lifemap sciences, inc.) 
and pathways (for mrv and pedv), go-terms (not for mrv), and associations with compounds/drugs (not for mrv) with a high or medium score (p-value < . ) were retrieved and listed in separate sheets in supplementary file (sheets "mrv-pedv pathways", "pedv g -terms" and "pedv compounds"). similar and related pathways retrieved for both pedv and mrv, and remarkable pedv pathways, go-terms and compound associations are summarized in table . for pedv all degs within these pathways are provided with their regulation, up (green) or down (red). for mrv only degs in common with pedv-degs were listed in table (see sheet "mrv-pedv pathways" in supplementary file for all mrv-degs acting in these pathways). subsets of pedv-degs were selected matching the terms "chemokines-cytokines", "antiviral" , and terms related to the pathogenesis of covid- (explained below) using the genotyping program varelect (lifemap sciences, inc.) and displayed in supplementary file in separate sheets: "chemokines-cytokines", "(anti)-viral", etc. based on these selections we prepared a set of pedv key-degs consisting of genes regulated with a fc of > (up) or <- (down) or playing an important role in biological processes induced by pedv and related to covid- pathology. in beneath results sections we tried to give as much as possible meaningful information about the function of key-degs for which we found an association with sars-cov- infections. we emphasize that further dedicated experimental and in-silico research is necessary to confirm the involvement of the proteins encoded by these genes for pathogenesis of this viral disease. table . enriched pathways, go-terms and compound associations of pedv-degs. *pedv enriched pathways (a), go-terms (b), and associations with compounds and drugs (c) with a high and medium score and with at least matching genes were retrieved from geneanalytics. common pathways for mrv were included in table a . 
a full list of pathways with degs, retrieved for mrv at and h, is provided in supplementary file (sheet mrv-pedv pathways). a possible function or process related to specific degs, pathways, go-term, or compounds/drugs is provided in blue text between brackets. $ official gene-symbols (hugo abbreviations) are listed for degs. down-regulated degs were colored red and up-regulated degs were colored green. in section a the number of degs regulated by mrv in a pathway and the common degs are provided between brackets. degs regulated by both pedv and mrv are underlined. compared to mrv, only a few genes involved in "cytokine/chemokine signaling" were regulated at and h by pedv. in fig. a the regulation of cytokines/chemokines in pedv and mrv infected vero cells are displayed. this indicated that mrv increased the transcription of a broad set of cytokines/chemokines, including interferonmediated cytokines like cxcl and cxcl (alias il ), already at h.p.i., whereas pedv did not, even not when replication of pedv rna was detected by rt-qpcr at h.p.i.. for mrv, this cytokine/chemokine response at h.p.i. was followed by high up-regulation of "hallmark" antiviral genes at h.p.i. (see fig. b : e.g. interferoninduced genes [ifi] and oasl) and chemokines that attract t cells, monocytes, granulocytes, including basophils (e.g. cxcl , cxcl and ccl ). pedv infection up-regulated only a few genes coding for proteins with cytokine activity (ctf and edn ), and also did not elevated gene expression of these "hallmark" antiviral genes. in contrast, pedv down-regulated expression of genes (out of in total) coding for zinc finger proteins (out of in total), all with an antiviral activity towards herpes simplex virus ( fig c) binding of thrombin to f rl reduces inflammation, activates platelets and increases vasodilation and permeability of the vascular wall (see also below in the section "platelets activation"). 
csf r is a receptor for the cytokine colony stimulating factor , a cytokine that regulates differentiation and function of macrophages, and in the cns, the density and distribution of microglia cells. the blnk gene codes for a cytosolic protein that passes on b-cell receptor signals in the signaling cascade that activates b-cell development and function. gene expression of genes coding for essential components of this b-cell signaling, like "spleen associated tyrosine kinase" (syk) and "lyn proto-oncogene"(lyn) were not regulated by pedv, nor by mrv. genes involved in amino acid, protein translation and metabolism of immuno-active compounds. pedv degs coding for enzymes involved in the metabolism of the non-essential amino acids histidine, phenylalanine, tryptophan and proline were found enriched in the pedv dataset (see supplementary file , sheet pedv-compounds). remarkable were the degs coding for amine oxidases involved in the catabolism of the biogenic amines histamine, tryptamine and phenylethylamine, their derivates and related substrates/products of these enzymes (fig. , aoc , maoa, il i ). none of these amino oxidase genes were regulated by mrv. using the genotyping program varelect, pedv-degs with an association with these biogenic amines were retrieved (supplementary file , sheet "biogenic amines"). three enzymes clustered in the "histidine metabolism" pathway (https://www.kegg.jp/keggbin/show_pathway?hsa + ) with histamine and reaction products generated from this biogenic amine (fig. ) . also most association of pedv-degs were found by varelect for histamine. the gene coding for the amine oxidase "interleukin induced " (il i ) was strongly down-regulated ( -fold) h after infection with pedv. besides catalysis of l-phenylaniline into phenylpyruvate (fig. 
) , il i also fulfills an important role in signaling in "synaptic clefts" formed between antigen presenting cells (apc) and t cells (so-called "immune cleft": see also below) ( expression of the prostaglandin-endoperoxide synthase (ptgs ) gene was upregulated by mrv at h p.i. ptgs synthesizes prostaglandin endoperoxide h (pgh ), an compound with a short half-life and the precursor of many biological active prostaglandins: e.g. thromboxane-a (mediates activation of platelets), pgi and pge . in contrast, pedv increased the expression of the gene coding for prostaglandin e synthase (ptges) which converts pgh into pge . pge is a direct vasodilator, but does not inhibit platelet aggregation. pge also suppresses t cell receptor signaling. pedv decreased expression of the gamma-glutamyltransferase gene (ggt ) after h ( -fold), but increased expression of this gene hours later to a -fold level compared to mock infected cells. ggt synthesizes leukotriene d (ltd ) from ltc . ige-activated mast cells may secrete ltd and ltc , together with histamine and platelets activating factor (paf). this vesicle mediated secretion by mast cells (degranulation) results in stimulation of mucus production, and similar to histamine, increases the permeability and smooth muscle contraction of the vascular wall. in persons suffering from asthma this degranulation leads to an immediate allergic response (bronchospasm, airflow obstruction and forming of edema). genes involved in "cilia and synaptic cleft" formation. gsea detected "axon guidance" as the go-term with the highest score for pedv (see table ). in addition, pedv-degs were enriched coding for proteins involved in calcium ion-dependent exocytosis from vesicles into the "synaptic clefts" between two cells (e.g. between axons and dendrites), and degs coding for proteins involved in formation of cilia. cilia protruding from cells are found in many forms. they can have a static (structural) function, e.g. 
in forming of clefts between two cells (see fig. ), or a motile function. motile cilia on the surface of ciliated cells lining up the epithelial layers in the nose, trachea and bronchia sweep out superfluous mucus containing dirt from the airways. pedv degs matching the terms "cilia" and "synaptic cleft" retrieved form the genotyping program varelect were further examined by consulting functional information in the ncbi gene and genecards reports in order to evaluate their association with these processes (see supplementary file , sheet "cilia and synaptic cleft"). based on this analysis we identified genes in the set of pedv degs which can i) negatively regulate cell adhesion (rnd and sema a), ii) inhibit formation of cilia (kinases mak and cdk , highly up-regulated at h), and iii) regulate cytoskeleton rearrangements that facilitate axon growth and growth and stabilization of dendritic spines (f rl , regulation of genes involved in histamine/biogenic amines (see above) and formation of cilia/clefts suggested that gene expression related to this intersynaptic signaling between immune cells could be affected in response to infection with pedv (fig ) . in particular, the highly down-regulated gene il i ( -fold at h p.i.) is of interest (see also above). il i is believed to be secreted from apc's (e.g. dc's) in the immune cleft formed with t cells ( : molinier-frenkel et al. ). the mechanism how il i transmits its signal to t cells is not completely understood. it could bind to a receptor that concentrates this amino oxidase in the cleft, resulting in elevated h o and ammonia production, phenylalanine depletion and phenylpyruvate production in the cleft space. these alteration in the concentration of these chemicals in the cleft are sensed by the t cell. the paralog of il i , amino oxidase maoa (down-regulated -fold by pedv) could also play a similar role in this signaling process. 
remarkable was also the strong down-regulation of genes coding for the olfactory receptor family subfamily a member (or a ; -fold at h) and anoctamin (ano , alias cacc; -fold at h). ano is a calcium-activated chloride channel imbedded in the basal membranes of neurons that harbor apical membrane receptors like or a that sense odorants. by importing chloride ions into the cytosol ano contributes to the depolarization of these neurons (https://www.kegg.jp/kegg-bin/show_pathway?hsa + ). loss of smell and taste is one of the first noticeable symptoms of covid- . genetic defects in the ano gene are associated with von willebrand disease, a bleeding disorder due to defective platelet aggregation ( . also disease incidence in adult males is significantly higher than in females of the same age. to assess whether pedv-degs relate to these pathological symptoms, the genotyping program varelect was used to identify genes matching the terms "ards", "cardiomyopathy", "obesity (diabetic)", and "platelets activation". detailed information about all matching degs is provided in supplementary file in separate sheets for all queried terms. degs matching to more than one query term are displayed in fig. . remarkable associations of degs with these terms are mentioned in sections beneath. . edn and agt are vasoactive peptides and binding of edn and agt to their receptors on granular cells of the juxtaglomerular apparatus in the kidney raises free calcium levels in the cytosol, leading to inhibition of camp-mediated secretion of the aspartylprotease renin (ren), the key regulator of renin-angiotensin-aldosterone system (raas) (https://www.kegg.jp/kegg-bin/show_pathway?hsa + ). ren converts pre-angiotensinogen (agt) to the endocrine peptide-hormone agt (https://www.kegg.jp/kegg-bin/show_pathway?hsa + ). agt is further cleaved to variants with specific endocrine activity by angiotensin i converting enzymes (e.g. ace and ace : ). 
on the surface of bronchial epithelial cells ace was identified as entry receptor for sars-cov- and - ( : hoffmann et al. ). the octamer peptide agt stimulates secretion of the mineralocorticoid hormone aldosterone by the adrenal glands. aldosterone, and the agt and edn peptide hormones regulate an array of physiological processes in the body, e.g. vascular smooth muscle contraction, blood pressure, fluid and electrolyte homeostasis ( : agapitov et al. ) . all processes that are important for proper functioning of the vascular system, heart muscles and kidneys. ctf is directly involved in the pathology of numerous cardiovascular diseases by promoting cardiac myocyte hypertrophy ( : wollert et al. ) , which may lead to the onset of heart-diseases like "hypertrophic cardiomyopathy" or "dilated cardiomyopathy", and eventually, to (lethal) heart failure. pedv induced a strong down-or up-regulation of several other genes directly involved in the function of cardiomyocytes. sodium voltage-gated channel subunit (scn b) was strongly upregulated ( -fold) and mylk (see above), citrate synthase (cs; down-regulation -fold) and ankyrin repeat domain (ankrd ) were strongly down-regulated. down-regulation of cs may reduce oxidative capacity in cardiomyocytes. gene expression of ankrd was down-regulated -fold in response to mrv infection at h.p.i, but reverted to a -fold up-regulation h later. ankrd is a putative transcription factor involved in regulation of gene expression in hypertrophic myocytes (https://www.wikipathways.org/index.php/pathway:wp ) regulation of cytosolic calcium levels in the cytosol of cardiomyocytes, e.g. by binding of col a and col a (up-regulated by pedv, see above) to integrin subunit alpha on the surface of cardiomyocytes or after import of calcium ions mediated by calcium voltage-gated channels (e.g. by cacna h; down-regulated -fold at h by pedv) may also trigger myocyte hypertrophy. 
pedv strongly down-regulated gene expression of a potassium voltage-gated channel (kcnq ; -fold), in contrast to a strong up-regulation of the sodium channel subunit scn b. for the potassium channel kcnq (alias kv . ) it was reported that this channel regulates the membrane potential and ca + permeability of mitochondria located in the vicinity of the sarcoplasmic reticulum in rat cardiomyocytes ( : testai et al. ). all three above mentioned ion channels are also involved in the process of excitation, contraction/relaxation and repolarization of cardiac myocytes. mrv also down-regulated gene expression to -fold for the potassium channel (kcnq ) and the calcium channel cacna h, but did not increase expression of the sodium channel subunit gene scn b or orthologs of this gene. chronic hypertension and heart disease/failure are complications frequently observed in obese/diabetic patients. in accordance with this, out of the pedv degs matching the term "obesity" also matched with the term "cardiomyopathy" (see additional remarkable pedv-degs. highly up- or down-regulated pedv-degs not mentioned in the text, and in our opinion interesting with regard to coronavirus infection, are briefly described in table . among these degs are several genes coding for transcription factors and genes transcribed into antisense rnas that inhibit translation of their coding counterparts. for more functional information about these degs we refer to the weblinks provided in supplementary file , sheet "pedv key degs". table . remarkable pedv-degs not mentioned in the text. in this report we measured the transcriptional response of vero cells shortly after infection with the coronavirus pedv. we studied the function of the responding host genes and the biological processes in which they act in detail to find plausible relations to covid- pathology.
because of the differences in genomic organization and expression of viral proteins between sars-cov- and pedv, we paid less attention to coupling the response of specific host genes to the function of specific coronavirus proteins. we were able to infect the majority of vero cells (> %) with pedv and mrv synchronously. this resulted in a unique set of highly up- and down-regulated degs for pedv. not more than % of the pedv-degs (n= ) were similar to mrv-degs (total of mrv-degs). in contrast to mrv, we observed no typical response of antiviral genes and related cytokine/chemokine genes in vero cells within h.p.i. for pedv. for mrv these processes started already before h.p.i. we note that pedv replication started h later than mrv replication, which could in part be the reason for not detecting transcriptional regulation of specific cytokine, chemokine and antiviral genes for pedv. longer incubation times than h were not planned in the original design of our experiments and would have resulted in a set of pedv-degs dominated by genes involved in syncytia formation and apoptotic/necrotic cell death. nevertheless, at h replication of pedv rna was detected by rt-pcr, indicating that dsrna was present in the cells and could have been sensed by cytosolic pattern recognition receptors of the rig-i-like family to initiate an antiviral and related cytokine/chemokine response. similar to what was observed in another study with pedv and vero cells, and in analogy with sars-cov- and - , we observed a high up-regulation of the transmembrane serine protease gene (tmprss ) that acts as a co-factor in the infection process of cells. we observed a reduced expression of eef a , as part of a transcription factor complex that binds and activates the promoter of ifnγ, and of the cytokine eda and its receptor (edaradd) involved in activation of canonical-nfkb transcription of antiviral cytokine-chemokine genes like cxcl (alias il ) and cxcl .
this reduced expression of eef a and eda and its receptor may play a crucial role in delaying or downgrading an ifn-mediated antiviral and cytokine/chemokine response in our vero cell system. the elevated transcription of many cytokine and chemokine genes in vero cells by mrv suggests that replication of this rna virus in epithelial cells induces secretion of these immune effectors (more information about the genes/processes that responded to mrv infection will be published elsewhere). pedv and the mrv strain we used both replicate in enterocytes lined up in the intestinal mucosal layer. in the intestinal and bronchial epithelial layer, microfold (m) cells are embedded between these lined up epithelial cells. m-cells sense and engulf foreign pathogens/antigens from the lumen to present them to residing immune cells. according to pathway analysis, most t cell related immune genes regulated in response to mrv infection were part of the pathways "t-helper (th ) differentiation/activation" and "il signaling". antigen presentation by m cells to th cells in the submucosal layers stimulates secretion of different types of il cytokines (il a-d, il and il f), resulting in activation of different types of innate immune cells and t cells, including th /th cells. dysregulation of this pathogen-induced il response may disturb the balance between th /th -cell mediated immune responses, resulting in excessive inflammation, damage to the epithelial layer and, in the long term, autoimmune reactions. tf rorc (or specific isoforms of this tf, see above) plays a pivotal role in controlling il expression and secretion by th cells. pedv strongly up-regulated gene expression of rorc in vero cells whereas mrv down-regulated expression of this tf. this difference in regulation of tf rorc suggests that virus-induced activation or suppression of il secretion by th cells in submucosal layers of airway epithelium can be an important mechanism to dysregulate the activation of t cell responses.
therefore, tf rorc might be a potential target for drug treatment/development. drugs affecting expression of rorc, like the fluorinated steroid "dexamethasone" and the synthetic tetracycline derivative "doxycycline" (http://ctdbase.org/basicquery.go?bqcat=gene&bq=rar+related+orphan+receptor +c) are already under investigation in relation to sars-cov- pathogenesis. transcriptional regulation of a set of genes coding for enzymes involved in biogenic amine metabolism was unique for pedv, and not observed for mrv. most associations of these pedv-degs were found with histamine, a compound produced by mast cells and basophils, and released by these cells in response to allergens and pathogen-induced inflammation. the -fold up-regulation of the enzyme aoc suggests that histamine is converted to imidazole-acetaldehyde (see fig. ). however, without data on intra- and extracellular concentrations of the chemicals this remains speculative. recent reports indicated that submucosal mast cells in the lungs were triggered by sars-cov- infection to release pro-inflammatory cytokines (il , il and tnf-alpha). mdscs infiltrate these tumors and inflamed tissues to suppress the local activity of specific immune cells. therefore, the role of infiltrating mdscs at inflammatory sites in the lungs of covid- patients, as part of a sars-cov- immune-evading strategy, and the role of il i in this process, are worth investigating in more detail. expression of genes that promote or inhibit the formation and motility function of cilia was time-dependently regulated by pedv. the -fold up-regulation of fez gene expression at h ( -fold) declined within h to a moderate -fold up-regulation. this decline occurred simultaneously with elevation of gene expression of the kinases mak and cdk , both involved in inhibition of cilia formation.
because pedv efficiently replicates in enterocytes that carry ciliated membrane protrusions (microvilli) on their luminal surface, regulation of these genes could be related to structural changes in the cytoskeleton of cells imposed by virus replication (e.g. syncytia formation in pedv-infected vero cultures). likewise, sars-cov- replication could also impose structural changes in cilia protruding from the surface of upper-airway epithelial cells (nose, trachea) and bronchi. based on our results we cannot pinpoint a specific process in which these cilia-regulating genes act. possible processes include formation of an immune cleft, formation of a virological cleft to promote more effective infection of neighboring cells, or cytoskeleton rearrangements to support virus replication, morphogenesis and budding from cells. interestingly, two recent studies revealed a high level of expression of the sars-cov- entry receptor ace and its co-receptor tmprss in ciliated airway epithelial cells. these processes deserve more attention, and may also be considered as possible target processes for interference with drugs. within our set of pedv-degs, and the biological processes deduced from this set, we found associations with diverse aspects of covid- pathogenesis, i.e. "ards", "cardiomyopathy", "obesity (diabetic)", and "platelets activation". however, it is unknown whether the proteins encoded by these pedv-degs indeed play a role in the biological processes underlying the symptoms and complications observed in hospitalized persons infected with sars-cov- . nevertheless, a part of these genes/processes may be starting points for further dedicated research: research to fine-tune drug treatment protocols that are already applied for covid- patients, or research that provides new insights for treatments with alternative prophylactic and therapeutic approved drugs. in table .
we summarized the pedv-degs that are, in our opinion, of interest for modulation of the biological processes underlying the pathogenesis of covid- . table . possible target genes for covid- therapy. colophon. the overwhelming amount of data published recently made it impossible for us to oversee all (novel) facts about the sars-cov- virus and pathology of the covid- disease. some important genes and related processes embedded in our set of pedv-degs may have been overlooked by us. therefore, we encourage researchers, especially medical, immunological and pharmaceutical specialists, to study this set of degs in detail. the user-friendly supplementary file with functional information about degs and related biological processes can be downloaded from the web. by publishing these pedv data ahead of our complete study, we hope that some of the gene targets and cognate processes we have identified for the coronavirus pedv will contribute to a better understanding of how hospitalized covid- patients can be treated and cured. a more condensed version of this manuscript, focusing on the original goal of our study, will be submitted to a peer-reviewed virological journal soon.
risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease a new coronavirus-like particle associated with diarrhea in swine origin, evolution, and genotyping of emergent porcine epidemic diarrhea virus strains in the united states porcine epidemic diarrhea virus infection: etiology, epidemiology, pathogenesis and immunoprophylaxis a novel pathogenic mammalian orthoreovirus from diarrheic pigs and swine blood meal in the united states identification of a novel reassortant of a mammalian orthoreovirus in faeces of diarrheic pigs in the netherlands: presentation at the th annual meeting of epizone in paris high similarity of novel orthoreovirus detected in a child hospitalized with acute gastroenteritis to mammalian orthoreoviruses found in bats in europe pathogenesis of reovirus infections of the central nervous system identification and characterization of a new orthoreovirus from patients with acute respiratory infections a novel reovirus isolated from a patient with acute respiratory disease reov isolated from sars patients cloning and identification of reovirus isolated from specimens of sars patients a bat-derived putative cross-family recombinant coronavirus with a reovirus gene the persistent prevalence and evolution of crossfamily recombinant coronavirus gccdc among a bat population: a two-year followup coronavirus genome structure and replication comparative tropism, replication kinetics, and cell damage profiling of sars-cov- and sars-cov with implications for clinical manifestations, transmissibility, and laboratory studies of covid- : an observational study sars-cov- productively infects human gut enterocytes the genome landscape of the african green monkey kidney-derived vero cell line characterization of the interferon regulatory factor -mediated antiviral response in a cell line deficient for ifn production imbalanced host response to sars-cov- drives development of covid- full-length genome 
sequences of porcine epidemic diarrhoea virus strain cv ; use of ngs to analyse genomic and sub-genomic rnas tmprss and mspl facilitate trypsin-independent porcine epidemic diarrhea virus replication in vero cells desc and mspl activate influenza a viruses and emerging coronaviruses for host cell entry efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss a novel isoform of the orphan receptor rorγt suppresses il- production in human t cells structure, function and evolution of the hemagglutinin-esterase proteins of corona-and toroviruses functional screen reveals sars coronavirus nonstructural protein nsp as a novel cap n methyltransferase '-o methylation of the viral mrna cap evades host restriction by ifit family members human il i is a secreted lphenylalanine oxidase expressed by mature dendritic cells that inhibits t-lymphocyte proliferation the il i enzyme: a new player in the immunosuppressive tumor microenvironment avoiding the void: cell-to-cell spread of human viruses a common -kb deletion involving vwf and tmem b in german and italian patients with severe von willebrand disease type pathological findings of covid- associated with acute respiratory distress syndrome case-fatality rate and characteristics of patients dying in relation to covid- in italy the tn antigen-structural simplicity and biological complexity post-translational modifications of coronavirus proteins: roles and function emerging wuhan (covid- ) coronavirus: glycan shield and structure prediction of spike glycoprotein and its interaction with human cd therapies for bleomycin induced lung fibrosis through regulation of tgf-beta induced collagen gene expression covid- -induced acute respiratory failure: an exacerbation of organspecific autoimmunity? medrxiv . cardiotrophin- activates a distinct form of cardiac muscle cell hypertrophy. 
assembly of sarcomeric units in series via gp /leukemia inhibitory factor receptor-dependent pathways a g polymorphism of the endothelin- gene and atrial fibrillation in patients with hypertrophic cardiomyopathy sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor role of endothelin in cardiovascular disease expression and function of kv . channels in rat cardiac mitochondria: possible targets for cardioprotection human coronary arteriolar dilation to adrenomedullin: role of nitric oxide and k(+) channels adrenomedullin improves cardiac function and prevents renal damage in streptozotocin-induced diabetic rats adrenomedullin: a possible autocrine or paracrine inhibitor of hypertrophy of cardiomyocytes histones and heart failure in diabetes histone h lysine hypoacetylation is associated with defective dna repair and premature senescence in zmpste -deficient mice studies of the sim gene in relation to human obesity and obesity-related traits neuropeptide b mediates female sexual receptivity in medaka fish, acting in a female-specific but reversible manner porcine epidemic diarrhea virus inhibits dsrna-induced interferon-β production in porcine intestinal epithelial cells by blockade of the rig-imediated pathway mast cells contribute to coronavirus-induced inflammation: new anti-inflammatory strategy mast cell stabilisers, leukotriene antagonists and antihistamines: a rapid review of effectiveness in covid- lnc-c/ebpβ modulates differentiation of mdscs through downregulating il i with c/ebpβ lip and wdr identification of discrete tumor-induced myeloid-derived suppressor cell subpopulations with distinct t cellsuppressive activity sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes sars-cov- receptor ace and tmprss are primarily expressed in bronchial transient secretory cells mechanisms of innate immune evasion in re-emerging rna viruses supplementary file : 
interactive excel file with sortable tables in separate sheets. please first read the sheet "read me" for an explanation and instructions for the use of the tables. excel sheets contain tables with i) pedv and mrv degs extracted from rnaseq data files, ii) functional information about the pedv-degs, iii) gsea extracted pathways (for mrv and pedv), go-terms (only for pedv) and compound associations (only for pedv), and iv) associations of pedv-degs with the terms "cytokines-chemokines", "(anti)-viral", "biogenic amines", "cilia and synaptic clefts", and the disorders "ards", "cardiomyopathy", "obesity", and "platelets activation" (a-c-o-p). the authors declare that they have no competing interests. authors' contributions. key: cord- - q t authors: thacker, vivek v; sharma, kunal; dhar, neeraj; mancini, gian-filippo; sordet-dessimoz, jessica; mckinney, john d title: rapid endothelialitis and vascular inflammation characterise sars-cov- infection in a human lung-on-chip model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q t background severe manifestations of covid- include hypercoagulopathies and systemic endothelialitis. the underlying dynamics of damage to the vasculature, and whether it is a direct consequence of endothelial infection or an indirect consequence of immune cell mediated cytokine storms, is unknown. this is in part because in vitro infection models are typically monocultures of epithelial cells or fail to recapitulate vascular physiology. methods we establish a vascularised lung-on-chip infection model consisting of a co-culture of primary human alveolar epithelial cells (‘epithelial’) and human lung microvascular endothelial cells (‘endothelial’), with the optional addition of cd + macrophages to the epithelial side.
a combination of qrt-pcr, rnascope, immunofluorescence, and elisa measurements is used to study the dynamics of viral replication and host responses to a low dose infection of sars-cov- delivered to the apical surface of the epithelial face maintained at an air-liquid interface. findings sars-cov- inoculation does not lead to a productive amplification of infectious virions. however, both genomic and antisense viral rna can be found in endothelial cells within -day post infection (dpi) and persist up to dpi. this generates an nf-kb inflammatory response typified by il- secretion and a weak antiviral interferon response even in the absence of immune cells. endothelial inflammation leads to a progressive loss of barrier integrity; a subset of cells also shows a transient hyperplasic phenotype. administration of tocilizumab slows the loss of barrier integrity but does not reduce the occurrence of the latter. interpretation endothelial infection can occur through basolateral transmission from infected epithelial cells at the air-liquid interface. sars-cov- mediated inflammation occurs despite the lack of rapid viral replication and the consequences are cell-type dependent. infected endothelial cells might be a key source of circulating il- in covid- patients. vascular damage occurs independently of immune-cell mediated cytokine storms, whose effect would only exacerbate the damage. funding core support from epfl. organ-on-chip technologies recreate key aspects of human physiology in a bottom-up and modular manner. in the context of infectious diseases, this allows for studies of cell dynamics, infection tropism, and the role of physiological factors in disease pathogenesis in more native settings. this is particularly relevant for the study of respiratory infectious diseases, where the vast surface area of the alveoli poses a challenge to direct experimental observation.
covid- , caused by the novel betacoronavirus sars-cov- , first manifests as an infection of the upper airways. severe cases are marked by progression into the lower airways and alveoli. here, it manifests as an atypical form of acute respiratory distress syndrome (ards) characterized by good lung compliance measurements, and elevated levels of coagulation markers such as d-dimers, and pro-inflammatory markers in the blood. autopsy reports show numerous microvascular thrombi in the lungs of deceased patients together with evidence of the intracellular presence of the virus in vascular cells. these suggest that the lung microvasculature plays a key role in covid- pathogenesis, yet most in vitro studies have focused on monocultures of upper airway respiratory cells. in studies with alveolar epithelial cells, sars-cov- has been shown to replicate poorly both in the a lung adenocarcinoma cell line and in primary alveolar epithelial cells ex vivo and has been reported to be unable to infect primary microvascular endothelial cells, findings that are at odds with the reported medical literature. there is therefore an urgent need for a better understanding of the pathogenesis of sars-cov- in alveolar epithelial cells and in a more realistic model of the alveolar space that is vascularized. the lung-on-chip model is well-suited to this purpose because it includes a vascular compartment maintained under flow, and infection can occur at the air-liquid interface, two key physiological features that are lacking in organoid models. we therefore establish a human lung-on-chip model for sars-cov- infections, and probe the viral growth kinetics, cellular localization and responses to a low dose infection using qrt-pcr, elisa, rnascope, immunofluorescence and confocal imaging (fig. j). we established a human lung-on-chip model for sars-cov- pathogenesis (fig.
a) using primary human alveolar epithelial cells ('epithelial') and lung microvascular endothelial cells ('endothelial'), which form confluent monolayers on the apical and vascular sides of the porous membrane in the chip (fig. b, c). the modular nature of the technology allowed us to recreate otherwise identical chips either with ( neuropilin- , an integrin-binding protein, has recently been reported to be an alternative receptor for sars-cov- entry. nrp expression was between one and four orders of magnitude higher than ace expression both in monoculture and on-chip (fig. s a, d), with a -fold higher expression in endothelial cells compared to epithelial cells that was retained on-chip (fig. s b). expression of cell surface receptor proteins used for sars-cov- entry in both cell types differs significantly in coculture at an air-liquid interface, and an alternative entry receptor such as nrp is far more highly expressed than ace . infection of the alveolar space is characterized by a lack of productive infection, cell-to-cell transmission, and slow intracellular replication. we first characterized the progression of infection by measuring the release of viral progeny and intracellular viral rna loads. infected locs were monitored daily for the release of infected viral progeny ( ) apically, on the epithelial layer, and ( ) basolaterally, in the cell culture media flowed through the vascular channel ('apical wash' and 'vascular effluent' in fig. a). a low number of viral genomes were released apically in the epithelial layer, and the number of genomes detected decreased over - dpi (fig. a). genome copy numbers were -fold lower than the starting inoculum ( pfu), suggesting that instances of productive infection were rare. no viral genomes were detected in the vascular effluent (fig. b), and the lack of infectious particles in the effluent was confirmed for two locs with and without macrophages by plaque forming unit assays (data not shown).
nevertheless, total rna extracted from the apical and vascular channels of an infected loc without macrophages at dpi revealed > genomes in both epithelial and endothelial cells (fig. c) and genome copy numbers exceeded those for the cellular housekeeping gene rnasep (fig. d). similar viral genome copy numbers have been reported for infections of alveolar epithelial cell monoculture and are modest in comparison to airway epithelial cells and vero e cells. nevertheless, sars-cov- appears to disseminate rapidly from the epithelial to the endothelial layer. basolateral transmission has not been reported to be the major mode of transmission for either sars-cov- or sars-cov- infections of monocultures of upper airway cells at the air-liquid interface. this suggests that basolateral transmission may either be a unique feature of alveolar epithelial cells or a consequence of cell-to-cell contact between epithelial and endothelial cells in our vascularised model. autopsy reports show features of exudative alveolar damage. the loc model is compatible with microscopy-based assays, which allowed us to assess changes in cellular physiology with high spatial resolution using confocal microscopy. we began by enumerating the number of nuclei per unit area of the membrane surface for locs at and dpi. in the epithelial layer, the density of nuclei declines progressively irrespective of the presence (p= . , p= e- ) or absence (p= . , p= . ) of macrophages compared to uninfected controls (fig. a). in contrast, the nuclear density increases in the endothelial layer at dpi before decreasing back to levels in the control samples at dpi (fig. b). these results appeared counterintuitive and so we examined if infection affected the morphology of the cells. staining for f-actin localisation revealed striking changes to the morphology of cells in the vascular channel.
d views of the endothelial layer from an uninfected control loc also maintained at the air-liquid interface show a confluent layer of cells aligned with the long axis of the channel as expected (fig. c, fig. s a). however, in infected locs there are clear signs of vascular inflammation (fig. d-e). at dpi, areas of cellular hyperplasia characterised by an increased cell density and stronger nucleic acid staining (fig. d, yellow arrows) coexist with areas with normal nucleic acid staining levels but reduced cell-to-cell contact (fig. d, white arrows). by dpi, a significant loss of tight junctions and cell confluency is observed (fig. e, fig. s b-d). vascular cells form clusters, and a much larger proportion of the surface area of the chip has low or no actin staining compared to the uninfected control (fig. s e). in stark contrast, for the same chip, the epithelial layer maintains a high degree of confluency (fig. f, fig. s f). immunostaining for the viral s protein at dpi ( in some cases, s proteins appear to co-localize with the periphery of the nucleus ( interestingly, il- levels were not significantly different between these two categories of locs, which suggested a non-immune cell source for this cytokine (fig. a). elisa assays for il- b and ip did not detect these cytokines in the effluent (data not shown). expression levels of pro-inflammatory genes (tnfa, il , il b) at dpi were an order of magnitude higher than those of interferon genes in both cell types, and little to no interferon stimulatory gene expression was detected in both cell types ( expression at dpi was higher than that at dpi, the epithelial layer at dpi, and the uninfected controls, both on a per-cell and per-field-of-view basis (fig. l, fig. s a; p= . , p= . , p= . respectively). unlike viral rna levels (fig. s b, d), il expression in the endothelial layer does not diminish over - dpi and would therefore appear to be the major contributor to il- secretion in the vascular effluent.
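the per-cell and per-field-of-view comparison of il expression described above can be illustrated with a minimal sketch. it assumes that rnascope spot counts and nuclei counts are already available for every field of view; the function name, the input layout and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def normalize_spot_counts(spots_per_field, cells_per_field):
    """Summarize RNAscope-style spot counts per cell and per field of view.

    spots_per_field: number of detected spots in each field of view
    cells_per_field: number of nuclei counted in the same fields
    Both inputs are hypothetical, pre-computed counts (assumption).
    """
    if len(spots_per_field) != len(cells_per_field):
        raise ValueError("one cell count is needed for every field of view")
    total_cells = sum(cells_per_field)
    if total_cells == 0:
        raise ValueError("no cells counted; per-cell value is undefined")
    return {
        # total number of spots normalized by the total number of cells
        "spots_per_cell": sum(spots_per_field) / total_cells,
        # average number of spots found in one field of view
        "spots_per_field": mean(spots_per_field),
    }


# Made-up counts for illustration only, not data from the study.
infected = normalize_spot_counts([120, 90, 150], [40, 30, 50])
control = normalize_spot_counts([10, 20, 30], [50, 50, 50])
print(infected["spots_per_cell"], control["spots_per_cell"])
```

reporting both normalizations is useful here because a change in cell density alone can shift the per-field value without any change in expression per cell.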
given the pleiotropic nature of il- , we also examined expression of the il r receptor and the metallopeptidase adam , which sheds the tnf-alpha receptor and the il- r receptor, in the endothelial layer at dpi. il r expression was low (fig. m) and was not altered by infection, whereas adam expression was high (fig. m) and was increased -fold over uninfected controls (fig. n). adam has been shown to enhance vascular permeability, and so we reasoned that targeting trans il- signalling using the anti-il- r monoclonal antibody tocilizumab, which is also undergoing clinical trials as a repurposed therapeutic for covid- , might ameliorate the vascular inflammation observed. tocilizumab administration at g/ml via continuous perfusion per se did not abrogate il- secretion (fig. a). an investigation of cell morphologies in the endothelial layer also showed that the perfusion did not prevent the occurrence of hyperplasia (fig. a, e). to quantify this, we compared regions of interest (rois) that excluded areas of cellular hyperplasia across at least six fields of view from the endothelial layer of locs with and without tocilizumab perfusion, and identified areas with low or no f-actin staining as those with reduced confluence, as in fig. s e. a plot of the proportion of pixels with intensities below a defined cut-off threshold (fig. g) showed that the untreated loc had a significantly higher proportion of pixels with intensities lower than % (p= . ) of the maximum intensity (p= . for % and p= . for %). inhibition of il- signalling through tocilizumab is able to ameliorate some but not all of the vascular damage observed. into the dynamics and mechanisms that underlie these observations that are difficult to obtain in other experimental models and that we hope will improve our understanding of the pathogenesis of this multi-organ disease. we gratefully acknowledge assistance from dr muhammet fatih gulen for the elisa assays.
we are also grateful to the members of the bioimaging core facility (biop) for assistance with confocal microscopy. conditions. the next day, the chip was washed and a reduced medium for the air-liquid interface (ali) was flowed through the vascular channel using syringe pumps (aladdin- , world precision instruments) at l/hour as described. the composition of the ali media used was as described in microfluidic organs-on-chips a lung-on-chip infection model reveals an essential role for alveolar epithelial cells in controlling bacterial growth during early tuberculosis human gut-on-a-chip supports polarized infection of coxsackie b virus in vitro d microfluidic liver cultures as a physiological preclinical tool for hepatitis b virus infection bioengineered human organ-on-chip reveals intestinal microenvironment and mechanical forces impacting shigella infection integrating lung physiology, immunology, and tuberculosis covid- does not lead to a "typical" acute respiratory distress syndrome management of covid- respiratory distress d-dimer as a biomarker for disease severity and mortality in covid- patients: a case control study ultra-high-throughput clinical proteomics reveals classifiers of covid- infection pulmonary vascular endothelialitis, thrombosis, and angiogenesis in covid- pulmonary post-mortem findings in a series of covid- cases from northern italy: a two-centre descriptive study covid- : the vasculature unleashed imbalanced host response to sars-cov- drives development of covid- tropism, replication competence, and innate immune responses of the coronavirus sars-cov- in human respiratory tract and conjunctiva: an analysis in ex-vivo and in-vitro cultures sars-cov- reverse genetics reveals a variable infection gradient in the respiratory tract sars-cov- infection of pluripotent stem cell-derived human lung alveolar type cells elicits a rapid epithelial-intrinsic inflammatory response human ipsc-derived alveolar and airway epithelial cells can be cultured at
air-liquid interface and express sars-cov- host factors robust three-dimensional expansion of human adult alveolar stem cells and sars-cov- infection single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses the protein expression profile of ace in human tissues upregulation of angiotensin converting enzyme by shear stress reduced inflammation and proliferation in vascular endothelial cells neuropilin- /gipc signaling regulates α β integrin traffic and function in endothelial cells neuropilin- is a host factor for sars-cov- infection neuropilin- facilitates sars-cov- cell entry and provides a possible pathway into the central nervous system sars-cov- structure and replication characterized by in situ cryo-electron tomography inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace impaired type i interferon activity and inflammatory responses in severe covid- patients structural basis for translational shutdown and immune evasion by the nsp protein of sars-cov- neuropilin functions as an essential cell surface receptor rna-gps predicts sars-cov- rna localization to host mitochondria and nucleolus human organ chip models recapitulate orthotopic lung cancer growth, therapeutic responses, and tumor dormancy in vitro reconstituting organ-level lung functions on a chip open-source live d visualization for light-sheet microscopy representative x m fields of view of the epithelial (a) and endothelial layer. (c) fold-change in the markers in (b) relative to uninfected controls at the same timepoint. the bar represents the mean and the error bars the standard deviation.
(d-g) d views of representative x m fields of view of the epithelial (d, f) and endothelial layer (e, g) from infected locs reconstituted with macrophages at (d, e) and dpi (f, g). il mrna k) zooms corresponding to the regions in (g) highlighted with white (h) and yellow boxes with solid (i) and dashed lines (j, k) respectively. in these panels, orf abantisense rna (pink) and s rna (amber) are also shown. the panels show examples of cells with similar levels of viral infection but with no (h), intermediate (i) and high levels (j, k) of il expression. (l) quantification of il expression in epithelial plots show the total number of spots normalized by the number of cells in - fields of view detected using rnascope assay using identical imaging conditions for all chips. bars represent the mean value, the solid line represents the median, and error bars represent the standard deviation. data from uninfected controls for days at air-liquid interface is indicated by *. (m) expression relative to gapdh and (n) fold-change in expression relative to uninfected controls for il r and adam expression in the endothelial layer. the bar represents the mean and the error bars the standard deviation key: cord- -q d l authors: pach, szymon; nguyen, trung ngoc; trimpert, jakob; kunec, dusan; osterrieder, nikolaus; wolber, gerhard title: ace -variants indicate potential sars-cov- -susceptibility in animals: an extensive molecular dynamics study date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q d l severe acute respiratory syndrome coronavirus (sars-cov- ) emerged in late and since evolved into a global threat with nearly . million infected people and over , confirmed deaths worldwide. sars-cov- is an enveloped virus presenting spike (s) glycoproteins on its outer surface. binding of s to host cell angiotensin converting enzyme (ace ) is thought to be critical for cellular entry. the host range of the virus extends far beyond humans and non-human primates. 
natural and experimental infections have confirmed high susceptibility of cats, ferrets, and hamsters, whereas dogs, mice, rats, pigs, and chickens seem refractory to sars-cov- infection. to investigate the reason for the variable susceptibility observed in different species, we have developed molecular descriptors to efficiently analyze our dynamic simulation models of complexes between sars-cov- s and ace . based on our analyses we predict that: (i) the red squirrel is likely susceptible to sars-cov- infection and (ii) specific mutations in ace of dogs, rats, and mice render them susceptible to sars-cov- infection. severe acute respiratory syndrome coronavirus (sars-cov- ) is responsible for the recent coronavirus disease (covid- ) pandemic. sars-cov- and the related sars-cov (sars-cov- ), which caused the outbreak of severe acute respiratory syndrome (sars) in [ ] [ ] [ ] , are different strains of the species severe acute respiratory syndrome-related coronavirus of the family coronaviridae. coronaviruses (cov) are enveloped viruses that present characteristic spike (s) glycoproteins on the surface of infectious virions that are essential for viral entry. s is a trimeric glycoprotein containing two functional subunits. the first subunit (s ) is responsible for binding to the host cell receptor, angiotensin converting enzyme . the second subunit (s ) is responsible for the fusion of the viral and cellular membranes. the s subunit harbors the receptorbinding domain (rbd), which is responsible for binding to host cells. the sars-cov- rbd forms an antiparallel β-sheet structure, which is connected via short helices and loops. the contact to the host receptor is mainly established via loops known as the receptor-binding motif (rbm, figure a ). 
compared to sars-cov, the rbm of sars-cov- appears to be less restricted in its conformation, because four of five prolines present in a loop structure essential for binding to the ace protein are replaced by more flexible residues such as glycine in the s glycoprotein of sars-cov- . ace is a membrane-bound zinc-metalloprotease expressed on the surface of cells not only in the respiratory tract but also in the heart, arteries, kidneys, and intestines, where it acts as a critical player in the endocrine renin-angiotensin system. independent of its enzymatic function, ace functions as the cellular entry receptor for coronavirus infections. the sars-cov- s shows nanomolar binding affinity to this cellular receptor. ace forms a large substrate-binding cleft between the subdomains i and ii. the sars-cov s binds to ace by exploiting a non-competitive binding site on the outer side of subdomain i (figure a, b). crystal structures of the rbd of the s in complex with human ace have been described for both sars-cov and sars-cov- . the studies have shown that binding interfaces contain three homologous binding regions (pockets) in the ace -rbd interface, hereinafter referred to as binding pocket a, b, and c (bp a, b, c). the rbd of the sars-cov s protein establishes hydrophobic contacts via its t in bp a to a lipophilic pocket formed by k and y of ace . moreover, hydrogen bonds are formed between n and ace -residue h in bp b, while lipophilic contacts are made between m (ace ) and l in bp c (figure b). the sars-cov- s features a binding interface consisting of three contact regions that are homologous to sars-cov. the binding interface is reported to utilize (i) hydrogen bonds in bp a from n (homologous to t in sars-cov) to y of ace , (ii) a hydrogen bond network via q (homologous to n in sars-cov) to ace -residues k , h , e in bp b, and (iii) lipophilic contacts in bp c via f (homologous to l in sars-cov) to ace -l , m , and y (figure a).
recent reports confirm that in addition to primates, cats, ferrets, and hamsters are susceptible to covid- infection, but dogs, pigs, chickens, mice, and rats remain unaffected. previous studies on sars-cov showed that differences in ace sequences are responsible for the variable susceptibility observed in different species. based on these observations, we chose a structural approach to investigate how genetic diversity of ace orthologs may affect the susceptibility of animal species to sars-cov- infection. at the atomistic level, we have developed dynamic computational models for ace -rbd complexes of different species, allowing us to anticipate the effects of amino acid sequence variation of ace on viral entry. sequence comparison of animal ace does not explain differences in covid- susceptibility between investigated species. we focused on animal species based on (i) their importance as natural reservoirs for sars-cov- due to frequent contacts with humans and (ii) availability of studies reporting susceptibility to sars-cov- infection, enabling discrimination between differences in our models. the multiple sequence alignment of canine, feline and human ace orthologs revealed that the residues reported as hotspots (referring to the human sequence positions , , , , , , and ; canine positions are numbered one value lower than other species) show polymorphic mutations h / y in dog and m / t in dog and cat. since the difference in position / is present in dogs only, we searched for sequences harboring the same polymorphism. we found that ace of the common ferret (mustela putorius) also contains a tyrosine at this position. however, ferrets can be infected with sars-cov- , which suggests that the h y polymorphism does not prevent viral entry. moreover, we compared ace sequences from rodents (mouse, rat, hamster, and red squirrel) to sample additional binding pockets in the ace -rbd interface and predict susceptibility to sars-cov- of the red squirrel (sciurus vulgaris).
unfortunately, no protein structures are available for animal ace . to compare three-dimensional binding interfaces of animal ace -rbd complexes, we developed homology models of dog, cat, ferret, hamster, mouse, rat, and red squirrel proteins. we used the previously reported x-ray crystal structure of human ace bound to the sars-cov- rbd (pdb-id: m j ) as a template. all homology models show high quality with no major deviation in dihedral angles of the backbone (max. two ramachandran outliers). both outlier residues are located on flexible loops of ace distal to the s binding site and represent polymorphic mutations from glycine in human crystal structure to serine in homology models. all homology models show comparable flexibility in molecular dynamics (md) simulations (measured as root mean square deviation of backbone heavy atoms) to the human enzyme in the range of - Å. since the homology models of animal ace are based on coordinates of human ace in complex with the s rbd, we were able to superpose our models to the templates, yielding animal ace -sars-cov- rbd complexes. all binding interfaces of animal ace -s models remain in the same coordinate frame as the template crystal structure and thus comparable. we compared ace residues with direct contact to the rbd (distance of max. . Å) in all d complexes and searched for mutations possibly contributing to low sars-cov- susceptibility observed in dogs. however, polymorphisms were present in both non-susceptible and susceptible species, which did not explain the possibly lower affinity of rbd to dog, rat or mouse ace . in the next step, we investigated dynamic properties of ace -rbd complexes. all structures were simulated in five md simulation replicas ( ns each) showing stable binding without dissociation events. for further analysis, we focused on three binding regions in the ace -rbd interface. 
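the contact-residue comparison described above can be sketched in a few lines of plain python. this is only an illustration under stated assumptions, not the authors' workflow: the function name `contact_residues`, the tuple-based input layout, and the `cutoff` argument are hypothetical, and the cutoff value has to be taken from the original methods because it is not recoverable here.

```python
import math


def contact_residues(receptor_atoms, ligand_atoms, cutoff):
    """Return sorted receptor residue ids that have at least one atom
    within `cutoff` of any ligand atom.

    receptor_atoms: iterable of (residue_id, (x, y, z)) tuples (hypothetical layout)
    ligand_atoms:   iterable of (x, y, z) tuples
    cutoff:         distance threshold in the same unit as the coordinates
    """
    ligand_atoms = list(ligand_atoms)
    contacts = set()
    for residue_id, xyz in receptor_atoms:
        if residue_id in contacts:
            continue  # residue already flagged by another of its atoms
        if any(math.dist(xyz, other) <= cutoff for other in ligand_atoms):
            contacts.add(residue_id)
    return sorted(contacts)
```

running the same function on each species' ace -rbd complex would give per-species contact lists that can then be compared position by position.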
we discovered that the rbm shows considerably larger movements for canine and rat ace complexed with the rbd when compared to other species. we focused on the hotspot residue f placed on top of the rbm and occupying the lipophilic pocket (bp c) of ace . we chose the distance between the cz-atom of f (position on phenyl ring) and the cb-atom of bp c central residue (or homologous position in the dog model; figure ) as a surrogate parameter for f contacts. we observed three different states, which can be adopted by the f -side chain: (i) 'perfect' fit into the binding pocket (with a distance of ca. - Å, 'bound state', figure a), (ii) contact with ace outside the binding pocket, preferably with the lipophilic or aromatic side chain of residue / (with a distance of ca. - Å, 'fixed state', figure b), and (iii) contact to lipophilic residues / and / with outwards rotated central y / in a 'deformed state' (with a distance of ca. - Å, figure c). the contribution of the 'fixed' and 'deformed state' to rbd binding is as yet unclear: careful manual analysis of the performed md simulations only revealed loose ace -rbd f contacts in these states, which suggests negligible binding contributions. of all analyzed species, cat, ferret, hamster, human, and mouse showed one or two peaks with a narrow distance distribution around - Å, suggesting frequent occupation of bp c (figure ). although f frequently occupies bp c in mouse ace , we assume that weak interactions in the other two pockets (bp a and b) might be responsible for the low sars-cov- susceptibility of mice. the simulations of rat ace -rbd complexes show a dominant peak at around Å, implying f conformation in the fixed state. canine ace -rbd complex simulations show three peaks, suggesting transitions between bound, fixed, and deformed f states. all these results (with the exception of murine simulations) correlate well with the susceptibility of the species to sars-cov- .
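the assignment of frames to the 'bound', 'fixed' and 'deformed' states can be sketched as a simple binning of the per-frame distance. this is a minimal illustration and not the authors' code: `classify_frames`, `state_fractions` and the `state_ranges` mapping are hypothetical names, and the distance ranges must be supplied by the reader from the distance histograms, since the values are not recoverable from the text.

```python
from collections import Counter


def classify_frames(distances, state_ranges):
    """Label each per-frame distance with the first state whose
    half-open range [low, high) contains it.

    distances:    iterable of per-frame distances (one value per frame)
    state_ranges: dict mapping state name -> (low, high); user-supplied
    """
    labels = []
    for d in distances:
        label = "unassigned"  # frames that fall outside every range
        for state, (low, high) in state_ranges.items():
            if low <= d < high:
                label = state
                break
        labels.append(label)
    return labels


def state_fractions(labels):
    """Fraction of frames spent in each state (empty dict for no frames)."""
    counts = Counter(labels)
    total = len(labels)
    return {state: n / total for state, n in counts.items()}
```

comparing the resulting fractions between species gives a numerical counterpart to the peak patterns described for the distance distributions.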
we surmised that the conformational changes of f in dog and rat simulations might be caused by either flattening or conformational deformation of bp c. to validate this hypothesis, we calculated distances between the cb atom of central residue (or in the dog ace ) as the deepest point of bp c and all side chain atoms of residues at positions , and (dog homologues , and ) flanking this pocket. we plotted the occurrence of the shortest distance per frame (sdpf) for each residue (figure ) . from all three descriptors, sdpf between residues - (or - in dog) correlate well with bp c occupation in rat ace . the residue - sdpf average of rat ( . Å) is the lowest distance in comparison with other species, which indicates a flatter bp c. this would result in a suboptimal fit into the binding pocket and entails unbinding events of residue f . md simulations revealed that the polymorphism i l (rat -all other species) might be responsible for the narrowing of rat bp c. we observed that the methyl group of i restricts the bp c and hinders fitting of f . the canine ace shows a broad sdpf distribution between residue - with a minimal distance of Å, which is comparable to rat simulations. however, this state is less frequent in canine simulations. additional χ angle measurements of epitope / show that this residue rotates out of bp c, which causes a deformation of the binding pocket in canine ace . we chose the distance between cb atoms of residues and (or and in dog) localized in the upper and lower helix, respectively, forming bp c (figure , last panel). we surmised that the higher distance causes a larger shift between both helices and the resulting deformation of bp c. in this state, the central residue y / can only establish weak interaction with f . as expected, only canine ace -rbd simulations show higher distances than Å for the y / -f distance, which altogether suggests weak interactions. 
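the shortest distance per frame (sdpf) descriptor can be written down compactly. the sketch below assumes that coordinates have already been extracted from the trajectory into plain python lists; the function names and the input layout are hypothetical and only illustrate the definition given above (the minimum distance between one reference atom and all side chain atoms of a flanking residue, evaluated frame by frame).

```python
import math
from statistics import fmean


def shortest_distance_per_frame(reference_track, sidechain_track):
    """Compute the SDPF series.

    reference_track: list over frames of one (x, y, z) position,
                     e.g. the CB atom at the deepest point of the pocket
    sidechain_track: list over frames of lists of (x, y, z) positions,
                     e.g. all side chain atoms of one flanking residue
    """
    if len(reference_track) != len(sidechain_track):
        raise ValueError("both tracks must cover the same frames")
    return [
        min(math.dist(ref, atom) for atom in atoms)
        for ref, atoms in zip(reference_track, sidechain_track)
    ]


def sdpf_average(sdpf_values):
    """Mean SDPF over all frames, as used to compare pocket depth between species."""
    return fmean(sdpf_values)
```

a lower average for one species than for the others would correspond to the flatter pocket discussed for rat ace .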
md simulations revealed that the polymorphism v / a (dog -all other species) might be responsible for the outward rotation of canine y . the larger and more rigid v pushes y out of bp c and towards the nterminus. subsequently, we analyzed the differences in distribution of residue - sdpf (or - sdpf in dog). hamster, human, and rat show remarkably denser distance distributions (at approximately . Å) than other species. the sequence comparison shows that the three species express a long and non-branched side chain residue (asparagine in hamster and rat, methionine in human) at position (dog ). these amino acids might stabilize f in the 'bound state' with bp c by steric effects. the most favorable residue at position / for binding f seems to be the methionine present in human ace , which also increases lipophilic contacts at this position. due to contradictory findings in murine bp c models, we surmise that the low sars-cov- susceptibility observed in mice might be caused by unfavorable or less frequent interactions in bp b and/or a. hence, we searched for polymorphisms exclusively occurring in the mouse ace sequence. we found two mutations in murine bp b (e /d n and k n ), which are surrounded by charged amino acids. both mutations replace a charged with a neutral residue, suggesting that rbd stabilizing salt bridges cannot be formed. further analysis of md trajectories of susceptible species led us to the hypothesis that (i) residue e/d can interact with k of rbd and (ii) k interacts with e within the same helix of ace introducing stable amino acid pairs coordinated by viral residue q . our hypothesis is supported by distance (sdpf) measurements between residues (ace ) - (rbd) ( figure ) and (ace ) - (ace ) / (ace ) - (rbd) (figure ). simulations of the murine protein complex show a considerably greater distance between residues n , e and q compared to other species ( versus Å). this indicates disruption of the hydrogen bond network in bp b. 
similarly, the mutation e/d n occurring in mouse and rat breaks a salt bridge to k of rbd and the main distance peak occurs at - Å (in contrast to other species showing close contacts at ca. Å). due to larger distances for mouse and rat residues to rbd, water might invade the binding pocket resulting in lower s-protein rbd affinity to murine ace . in addition, we found that a serine residue occurs at position in rat ace while more lipophilic threonine epitopes are found in other species. since residue is surrounded by lipophilic residues of the rbd (f and y ), we surmise that the t s polymorphism in rat might contribute to less favorable interactions in bp b. to our knowledge, the role of residues involved in interactions within bp a could not be clarified so far. we therefore compared the ace sequence of rat and mouse with other species and identified the mutation k h as relevant difference. inspecting md trajectories of rodent ace -rbd complexes, we observed that a lysine side chain in hamster ace establishes a salt bridge to d . similar to the residue pair k -e in the bp b, which is coordinated by viral residue q , the pair k -d interacts with rbd residue q . this region of bp a is surrounded by a hydrogen bond network comprised of ace residues , , , and rbd residues , , , , . we assume that, according to the oring theory , this hydrogen bond network 'seals' the hotspot k /d (ace ) -q (rbd) and prevents water from invading the protein-protein interface. the histidine at position in mouse and rat decreases the average number of hydrogen bonds established in the bp a ( in hamster versus or in rat or mouse, respectively; figure ). since histidine is less basic than a lysine, the formation of a salt bridge to d is less likely. md simulations show that the histidine residue is too short to establish hydrogen bonds with d (ace ) or q (rbd). 
d can rotate out of the bp a and water can reach the binding site, which results in less frequent interactions between the two proteins. this could explain why the rbd shows lower affinity to mouse and rat ace . red squirrels can be broadly found in urban environments in europe. we therefore strived to predict squirrel ace binding to s from sars-cov- since no information about susceptibility of the red squirrel to sars-cov- has been published yet. we prepared a red squirrel ace -rbd complex similarly to other species and conducted the workflow described above; (i) the depth parameters ( - -sdpf) and based on our descriptors and sequence comparisons, we suggest mutations contributing to sars-cov- susceptibility in unaffected species (table ). we have investigated some of these mutants in-silico with dynamic models. in a first step, we mutated residue of canine ace from valine to alanine. this mutation should prevent the rotation of y and the corresponding deformation of bp c. the v a mutant showed the predicted effect (figure ). in the - -sdpf torsion diagram of mutated ace , the additional peak associated with deformation cannot be observed, which is comparable to feline rbd binding. moreover, we surmise that mutation t m in dogs increases susceptibility to sars-cov- in a fashion that is similar to that conferred by human ace . to test the hypothesis of narrowing the bp c by branched side chains at position , we prepared and simulated an i l mutant of rat ace . this mutation clearly increased bp c occupancy and showed wide opening of bp c (figure ), which supported our hypothesis. we further suggest mutating rat and mouse ace in bp a and b to stabilize rbd binding. in bp b, mutating positions and from neutral amino acids back to charged ones, as is the case in hamster ace , might increase sars-cov- susceptibility of rat and mouse.
similarly, the mutation h k in rat and mouse ace bp a should result in a hydrogen bond network comparable to that of the hamster protein and therefore enhance rbd binding. we present mechanistic, dynamic models at the atomistic level to understand the ace /sars-cov- interaction in different animal species that might serve as natural reservoirs for sars-cov- due to frequent contacts with humans. in addition to previous studies, we present extensive molecular dynamics simulations that rationalize rbd binding to ace . based on known susceptibility of animal species to sars-cov- and the comparison of md trajectories, we were able to develop models for prediction of rbd binding to ace . hence, we propose gain-of-function mutations for non-susceptible species (dog, rat and mouse) as validation for our model and predict that the red squirrel has a high chance of being infected by sars-cov- . homology and loop models were prepared using moe . [chemical computing group, montreal, canada]. the models were constructed using gb/vi scoring with a maximum of ten main chain models. ace models of cat, dog and ferret were prepared using human ace bound to the rbd of sars-cov- (pdb-id: m j ) as template. ace -s complex models were prepared by superposing the receptor homology models on the template crystal structure and removing the human protein. the catalytic center as well as clashing side chains in the binding interface were adjusted using the rotamer tool and the opls-aa force field in moe. all systems (including substrate-and hit-protease complexes) were prepared in moe by protonation (at k and ph of ) and capping protease termini. ace -rbd-complex simulations were prepared with maestro . [schrödinger, llc: new york, usa] and carried out with desmond . . all systems were inspected for atom clashes, optimized for h-bonds, filled in Å large padding box with scp water model , sodium chloride . m and sodium ions to keep isotonic and electrostatic neutral conditions.
the simulations were performed under the default minimization protocol and periodic boundary conditions over ns, resulting in frames per simulation. for visual inspection, all trajectories were wrapped and aligned on backbone cα atoms of ace -rbd-complex and first simulation frame using vmd . . . initially, trajectories were analyzed visually to find possible differences in dynamic complexes, such as backbone movements. further analysis was performed with python . using mdanalysis . . for the extraction of distances, angles, and hydrogen bonds from trajectories after an equilibration period of ns (resulting in complex conformations per replica). data processing and transformation was done with pandas . . . plots were created with seaborn . . . and matplotlib . . .
key: cord- - d n cve authors: hassan, sk. sarif; ghosh, shinjini; attrish, diksha; choudhury, pabitra pal; seyran, murat; pizzol, damiano; adadi, parise; el-aziz, tarek mohamed abd; soares, antonio; kandimalla, ramesh; lundstrom, kenneth; tambuwala, murtaza; aljabali, alaa a. a.; lal, amos; azad, gajendra kumar; uversky, vladimir n.; sherchan, samendra p.; baetas-da-cruz, wagner; uhal, bruce d.; rezaei, nima; brufsky, adam m. title: a unique view of sars-cov- through the lens of orf protein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d n cve immune evasion is one of the unique characteristics of covid- attributed to the orf protein of severe acute respiratory syndrome coronavirus (sars-cov- ). this protein is involved in modulating the host adaptive immunity through downregulating mhc (major histocompatibility complex) molecules and innate immune responses by surpassing the interferon mediated antiviral response of the host. to understand the immune perspective of the host with respect to the orf protein, a comprehensive study of the orf protein as well as mutations possessed by it, is performed.
chemical and structural properties of orf proteins from different hosts, that is human, bat and pangolin, suggest that the orf proteins of sars-cov- and bat ratg -cov are much more closely related to each other than to that of pangolin-cov. eighty-seven mutations across unique variants of orf (sars-cov- ) are grouped into four classes based on their predicted effects. based on geolocations and timescale of collection, a possible flow of mutations was built. furthermore, conclusive flows of amalgamation of mutations were supported by sequence similarity and amino acid conservation phylogenies. therefore, this study seeks to highlight the uniqueness of the rapidly evolving sars-cov- through the orf . severe acute respiratory syndrome-coronavirus- (sars-cov- ) is a novel coronavirus whose first outbreak was reported in december in wuhan, china, where a cluster of pneumonia cases was detected, and on th march, , who declared this outbreak a pandemic [ , , ]. as of nd august , a total of . million confirmed covid- cases have been reported worldwide with , , deaths [ ]. sars-cov- belongs to the family coronaviridae and has % nucleotide similarity and % protein sequence similarity with sars-cov, which caused the previous outbreak of sars in [ , ]. sars-cov is a single-stranded rna virus of positive polarity whose genome is approximately kb in length and encodes for non-structural proteins, four structural and six accessory proteins [ , ]. the sars-cov- has a total of six accessory proteins including a, , a, b, and [ , ]. among these accessory proteins, orf is a significantly exclusive protein as it differs from that of other known sars-cov and is thus associated with high efficiency in pathogenicity transmission [ , ]. the sars-cov- orf displays an array of functions: inhibiting interferon , promoting viral replication, inducing apoptosis and modulating er stress [ , , ].
the sars-cov- orf is a amino acid long protein, which has an n-terminal hydrophobic signal peptide ( - aa) and an orf chain ( - aa) [ , ]. the functional motif (vlvvl) of sars-cov-orf b, which is responsible for induction of cell stress pathways and activation of macrophages, is absent from the sars-cov- orf protein [ ]. in the later stages of the sars-cov epidemic it was found that a nucleotide deletion in the orf protein caused it to split into orf a ( aa) and orf b ( aa), rendering it functionless, while the sars-cov- orf is intact [ ]. also, the sars-cov orf had a function in interspecies transmission and viral replication efficiency, as a reported nucleotide deletion, which included orf ab, resulted in a reduced ability of viral replication in human cells [ ]. however, the sars-cov- orf mainly acts as an immune-modulator by down-regulating mhc class i molecules, therefore protecting the infected cells against cytotoxic t cell killing of target cells. it was also proposed to be a potential inhibitor of the type interferon signalling pathway, which is a key component of the antiviral host immune response [ , ]. the orf also regulates the unfolded protein response (upr) induced by er stress by triggering atf- activation, thus promoting infected cell survivability for its own benefit [ ]. since this protein impacts various host-pathogen processes and has developed various strategies that allow it to escape host immune responses, it becomes important to study the mutations in the orf to develop a better understanding of the viral infectivity and to develop efficient therapeutic drugs against it [ ]. in this present study, we identified the distinct mutations present across unique variants of the sars-cov- orf and classified them according to their predicted effect on the host, i.e., disease or neutral, and the consequences on protein structural stability.
furthermore, we compared the orf protein of sars-cov- with that of bat-ratg -cov and pangolin-cov orf and tried to determine the evolutionary relationships with respect to sequence similarity and discussed the rising concern on its originality. in addition we made a study on polarity and charge of the sars-cov- orf mutations in two of its distinct domains and explored the possible effect on functionality changes. following this, we present the possible flow of mutations considering different geographical locations and chronological time scale simultaneously, validating with sequence-based and amino acid conservation-based phylogeny and thereby predicted the possible route taken through the course of assimilation of mutations. as of th august, , there were , complete genomes of sars-cov- available in the ncbi database and accordingly each genome contains one of the accessory proteins orf and among them only sequences were found to be unique. the amino acid sequences of the orf were exported in fasta format using file operations through matlab [ ] . among these unique orf protein sequences, only sequences possess various mutations and the remaining sequences do not either possess any mutations or possess ambiguous mutations. in this present study, we only concentrate on orf proteins, which are listed in table . in order to find mutations, we hereby consider the reference orf protein as the orf sequence (yp . ) of the sars-cov- genome (nc ) from china: wuhan [ ] . only the mutations l f, t a, k r, f y and v i are embedded in the orf proteins of bat-cov. mutations in proteins are responsible for several genetic orders/disorders. identifying these mutations requires novel detection methods, which have been reported in the literature [ ] . in this study, each unique orf sequence was aligned using ncbi protein p-blast and sometimes using omega blast suites to determine the mismatches and thereby the missense mutations (amino acid changes) are identified [ , ] . 
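the mismatch-based identification of missense mutations can be illustrated with a short sketch. it assumes two pre-aligned sequences of equal length, as they would be exported from a pairwise alignment; the function name and the conventions that '-' marks a gap and 'x' an ambiguous residue are assumptions made for illustration, not details taken from the blast output.

```python
def call_missense_mutations(reference, variant, gap="-", ambiguous="X"):
    """List amino acid changes between two aligned sequences.

    Positions are 1-based and counted along the ungapped reference.
    Columns containing a gap or an ambiguous residue are skipped.
    Returns strings such as 'L4S' (reference residue, position, new residue).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    position = 0
    for ref_aa, var_aa in zip(reference.upper(), variant.upper()):
        if ref_aa != gap:
            position += 1  # advance only on real reference residues
        if gap in (ref_aa, var_aa) or ambiguous in (ref_aa, var_aa):
            continue
        if ref_aa != var_aa:
            mutations.append(f"{ref_aa}{position}{var_aa}")
    return mutations
```

applying the function to every unique variant against the chosen reference sequence yields one mutation list per variant, which can then be passed on to effect predictors.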
the web-server 'meta-snp' was used to predict the effect of the identified mutations, and the web-server 'i-mutant' was used to predict their structural effects [ , ]. the web-server 'quark' was used for prediction of the secondary structure of orf proteins [ , ]. a protein sequence of orf is composed of twenty different amino acids with various frequencies. per-residue disorder distribution in sequences of query proteins was evaluated by pondr-vsl [ ], which is one of the more accurate standalone disorder predictors [ , , ]. the per-residue disorder predisposition scores are on a scale from to , where values of indicate fully ordered residues, and values of indicate fully disordered residues. values above the threshold of . are considered disordered residues, whereas residues with disorder scores between . and . are considered highly flexible, and residues with disorder scores between . and . are taken as moderately flexible. the sars-cov- orf protein (yp ) is a amino acid long protein which has an n-terminal hydrophobic signal peptide ( - aa) and an orf chain ( - aa). a schematic representation of orf (sars-cov- ) is presented in fig. . it was observed that the total number ( ) of hydrophilic residues is more than that ( ) of the hydrophobic residues. however, from the predicted secondary structure (fig. ), it was found that the highest solubility score is four, indicating that although hydrophilic residues are more numerous, they are not highly exposed to the external environment because they are folded inside the protein; this protein is therefore poorly soluble. we further predicted the secondary structure as well as solvent accessibility of orf proteins of sars-cov- , bat- ratg -cov and pangolin-cov using the ab-initio webserver quark (fig. ) and examined the differences.
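the threshold scheme used for the per-residue disorder scores can be expressed as a small classification function. the sketch below is an illustration only: the function name is hypothetical, the three cutoffs are deliberately left as required arguments because their values are not recoverable from the text, and treating a score exactly on a cutoff as belonging to the higher class is an assumption.

```python
def classify_disorder(scores, disordered_cutoff, high_flex_cutoff, moderate_flex_cutoff):
    """Assign each per-residue disorder score to one of four classes.

    The cutoffs are user-supplied and must satisfy
    moderate_flex_cutoff <= high_flex_cutoff <= disordered_cutoff.
    Boundary handling (>=) is an assumption of this sketch.
    """
    if not moderate_flex_cutoff <= high_flex_cutoff <= disordered_cutoff:
        raise ValueError("cutoffs must be given in ascending order of severity")
    labels = []
    for score in scores:
        if score >= disordered_cutoff:
            labels.append("disordered")
        elif score >= high_flex_cutoff:
            labels.append("highly flexible")
        elif score >= moderate_flex_cutoff:
            labels.append("moderately flexible")
        else:
            labels.append("ordered")
    return labels
```

counting the labels per protein gives a quick summary of how much of each orf sequence falls into the flexible or disordered classes.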
comparing the orf secondary structures of sars-cov- , bat-ratg -cov and pangolin-cov, the following changes are found at four different locations as presented in table . table , it is inferred that the secondary structures of orf (sars-cov- ) and that of bat-ratg are much closer related than the orf of pangolin-cov. it was obtained that the largest conserved region in the orf protein of sars-cov- is 'pftincqe' (in d ), eight amino acids long (table ) as seen in fig. . it turned out that the . % region is % evolutionary conserved across all distinct variants of the amino acid long orf protein. most of the conserved regions in sars-cov- orf (table and fig. ) lie around helix-coil and strand-coil junctions signifying the functional importance of these regions. helix regions also have conserved amino acids. it can be hypothesised that these junctions are involved in protein-protein interactions. therefore, these regions are conserved in nature. the orf protein of sars-cov- has only . % nucleotide similarity and % protein identity with that of sars-cov as shown in fig. . although the sars-cov- orf has differentiating genome characteristics but exhibit high functional similarity with that of sars-cov orf ab: • the sars-cov orf ab original protein was found to have an n-terminal hydrophobic signal sequence that directs its transport to the endoplasmic reticulum (er). however, after deletion of nucleotides, which split the orf ab protein into orf a and orf b, it was found that only orf a was able to translocate to the er while orf b remained distributed throughout the cell. likewise, the sars-cov- orf protein also contains an n-terminal hydrophobic signal peptide ( - aa), which is involved in the same function. • the er has an internal oxidative environment akin to other organelles, which is necessary for protein folding and oxidation processes. 
due to this oxidative environment, formation of intra or inter-molecular cysteine bonds between unpaired cysteine residues takes place as the sars-cov orf ab protein is an er resident protein and there are ten cysteine residues present, which exhibit disulphide linkages and form homomultimeric complexes within the er. similarly, the orf of sars-cov- has also seven cysteines, which may be expected to form these types of disulphide linkages. • the sars-cov orf ab is characterized by the presence of an asparagine residue at position with the asn-val-thr motif responsible for n-linked glycosylation, whereas sars-cov- orf has an n-linked glycosylation site at asn ( ) and the motif is asn-tyr-thr. • the sars-cov- orf is found to have both protein-protein and protein-dna interactions while sars-cov orf ab shows only protein-protein interactions. further, it was found that the orf protein of sars-cov- is very much similar ( %) to that of bat-cov ratg on the basis of sequence similarity as well as on the basis of phylogenetic relationships as shown in fig. . as we can see from the sequence alignment, there are only six amino acid differences between the sars-cov- orf and bat-cov ratg . all of these mutations in the orf protein with respect to the reference orf sequence of bat-ratg were found to be neutral type as predicted through webserver meta-snp and all of them had a decreasing effect on stability of the protein as determined using the server (i-mutant). we have also aligned the pangolin-cov orf sequence with that of sars-cov- and found that there is a sequence similarity of %, as depicted in fig. . we observed a difference of amino acid residues between the pangolin-cov and the sars-cov- orf . it was analysed that in both the bat-cov ratg and the pangolin-cov orf protein the mutations l i, v a and s l are occurred. so, it can be hypothesised that sars-cov- may have originated from pangolin-cov or bat-cov ratg orf . 
we also compared the frequencies of the hydrophobic, hydrophilic and charged amino acids present in the four different orf proteins of sars-cov, sars-cov- , bat ratg -cov and pangolin-cov. as seen in table , sars-cov- orf , sars-cov orf ab, bat-cov ratg orf and pangolin-cov orf are all similar in terms of hydrophobicity and hydrophilicity. hydrophobicity and hydrophilicity are known to play an important role in protein folding, which determines the tertiary structure of the protein and thereby affects its function. the orf sequences of sars-cov- , bat-cov ratg and pangolin-cov have almost the same numbers of positively and negatively charged amino acids; therefore, they probably have similar electrostatic and hydrophobic interactions, which also contribute to the functionality of the proteins. likewise, for the sars-cov orf ab, the numbers of positively and negatively charged amino acids were found to be close to those of the sars-cov- orf . although the sars-cov sequence bears less similarity to that of sars-cov- , the two are probably similar in terms of electrostatic and hydrophobic interactions. furthermore, we analysed the three orf sequences and checked their molecular weight, isoelectric point (pi), hydropathy, net charge and extinction coefficient using a peptide property calculator (https://pepcalc.com/) and found that all the properties are almost identical, which shows that the orf proteins of bat-cov ratg , pangolin-cov and sars-cov- are very closely related with respect to the chemical properties of their amino acid residues (fig. ). the analysis also indicated that the orf of the pangolin-cov is more negatively charged than the sars-cov- orf . also, since these three proteins have a similar molecular weight, pi, extinction coefficient and nature of hydropathy plot, it would be difficult to differentiate these three proteins by biophysical techniques on the basis of these properties. 
the orf of sars-cov- is significantly different from the orf of pangolin-cov and it seems that the orf protein (sars-cov- ) is imitating the properties as well the structure of orf of bat ratg -cov as a blueprint. the differences between the orf of sars-cov- and bat-cov ratg and pangolin-cov can be further demonstrated by the analysis of the per-residue intrinsic disorder predispositions of these proteins. results of this analysis are shown in fig. a , which illustrates that the intrinsic disorder propensity of the orf from sars-cov- is closer to that of the orf from bat-cov ratg than to the disorder potential of orf from pangolin-cov. this is in agreement with the results of other analyses conducted in this study. analysis is conducted using pondr-vsl algorithm [ ] , which is one of the more accurate standalone disorder predictors [ , , ] . a disorder threshold is indicated as a thin line (at score = . ). residues/regions with the disorder scores > . are considered as disordered. each of the orf amino acid sequences (fasta formatted) are aligned with respect to the orf protein (yp . ) from china-wuhan using multiple sequence alignment tools (ncbi blastp suite) and found the mutations and their associated positions were detected accordingly [ ] . it is noted that a mutation from an amino acid a to a at a position p is denoted by based on observed mutations, it is noticed that amino acids threonine (t) and tryptophan (w) are found to be most vulnerable to mutate to various amino acids. it is noteworthy that the sars-cov- orf is rapidly undergoing different type of mutations, indicating that it is a highly evolving protein, whereas the bat-cov orf is highly conserved (fig. ) and the pangolin-cov orf is % conserved (fig. ) . a pie chart presenting the frequency distribution of various mutations is shown in fig. . the n-terminal signal peptide of orf (d ) of sars-cov- is hydrophobic in nature. 
we further analysed (table ) the mutations and observed that hydrophobic to hydrophobic mutations are dominating, indicating that hydrophobicity of the domain is maintained and thus we can postulate that there is probably no functional change in the hydrophobic n-terminal signal peptide. further, it was found that there was a change from hydrophilicity to hydrophobicity in two positions, thus enhancing distinct non-synonymous mutations and the associated frequency of mutations, predicted effect (using meta-snp) as well as the predicted change of structural stability (using i-mutant) due to mutation(s) are presented in table . the most frequent mutation in the orf proteins turned out to be l s (hydrophobic (l) to non-charged hydrophilic (s)) which is a clade (s) determining mutation with frequency [ ] . it is observed from based on predicted effects and change of structural stability, mutations are grouped into four classes as shown in table . in table , the list of unique orf protein ids and their associated mutations with domain(s) and the predicted effects and changes of structural stability are presented. further, based on the three different types of mutations viz. neutral, disease and mix of neutral & disease, all the orf proteins are classified into three groups which are adumbrated in table . also the corresponding pie chart is given in fig. . note that, other than these two strains, there are other sixty-four other orf sequences, which do not possess any of these two strain-determining mutations. this clarifies that the orf protein is certainly one of the fundamental proteins which directs the pathogenicity of a variety of strains of sars-cov- . to study the evolution of mutations and to observe the relationships among three orf proteins, we compared sars- cov- orf with that of bat-cov ratg and pangolin-cov orf . the detailed analysis of all mutations is presented in table . 
based on table , it can be suggested that reverse mutations will lead the sars-cov- orf protein to have the same sequence as the bat-cov ratg and pangolin-cov orf proteins in the near future. we can also conclude that sars-cov- shows genetic reverse engineering when compared with bat-cov ratg and pangolin-cov orf . a reversal of mutation also occurred here, and the frequency of the leucine to serine mutation at the th position was found to be quite high. predisposition in the vicinity of residue , whereas the lowest disorder propensity in the vicinity of residue is found in variants qkv . and qkq . . finally, although variant qmu . is almost indistinguishable from variant qjs . within the first residues, its intrinsic disorder predisposition in the vicinity of residue is one of the lowest among all the proteins analysed in this study. interestingly, comparison of fig. (a) and fig. (b) shows that the variability in the disorder predisposition between many variants of the protein orf from sars-cov- isolates is noticeably greater than that between the reference orf from sars-cov- and orf proteins from bat-cov ratg and pangolin-cov. here we present five different possible mutation flows according to the date of collection of the virus sample from patients [ ] . sequence homology- and amino acid composition-based phylogenies have been drawn for the orf proteins associated with each flow. in this flow of mutations (fig. in order to support these mutation flows, we analysed the protein sequence similarity based on phylogeny and amino acid composition. the reference orf sequence yp is found to be much more similar to the variants qmt . and qmi . , which are more similar to each other as depicted in the sequence-based phylogeny (fig. (left) ). this sequence-based similarity of the orf proteins qmt . and qmi . is illustrated in the chronology of mutations as shown in fig. . similarly, the mutation flow of the sequences qmt . and qmu . 
is supported by the respective sequence-based similarity. the network of five orf protein variants from the us is justified based on similar amino acid compositions/conservations across the five sequences as shown in fig. (right) . in this flow of mutations (fig. ), we observed one sequence with first-order mutations, i.e where only one mutation accumulated in the sequence. additionally, four sequences (all are from the us) were identified with second order mutations stating that four sequences were found to have two mutations. to the hydrophobic phenylalanine (f), so it may account for disrupting the ionic interactions. as it is a neutral mutation, the sequence accumulated two neutral mutations. the protein sequence qlh . acquired a second mutation, p s, which was found to be of disease-increasing type and the polarity also changed from hydrophobic to hydrophilic, thus indicating these mutations may have some significant importance. the protein sequence qlh . possesses a second mutation, v l, which was found to be of neutral type with no change in polarity. here, this sequence accumulated two neutral mutations, which may account for some functional changes. by comparing both the sequence-based phylogeny ( fig. (left) ) and amino acid conservation-based phylogeny (fig. (right)), we found that according to sequence-based phylogeny the australian sequence is closely related to the orf wuhan sequence. however, according to the pathway, it should be closely related to both the wuhan sequence and the sequences having second order mutation. this can be attributed to the presence of amino acid residues instead of aa residues. in this case, the sequence has a two amino acid deletion, therefore, it is present at first node. here we analysed the us sequences considering the wuhan sequence (yp . ) as the reference and found one sequence, qkc . , with a single mutation and seven sequences with two mutations each (fig. ). the first sequence, qkc . 
, contained the l s mutation (strain determining mutation), which was of neutral type. however, the polarity changed from hydrophobic to hydrophilic, which may account for some significant change of function. the sequences that accumulated nd mutation along with l s are as follows: • qms . : this protein sequence acquired a second mutation at position , which changed the hydrophilic amino acid threonine (t) to hydrophobic amino acid isoleucine (i) therefore affecting the ionic interactions. this mutation was found to be a diseased-increasing type, so it may affect the structure of the protein. • qmt . : this sequence gained a second mutation, e d, which was predicted to be of disease-increasing type with no change in polarity. the sequence first accumulated a neutral mutation then a disease-increasing mutation, signifying that these mutations may have some functional importance. • qjd . : h q occurred as second mutation in this sequence, which was found to be of disease-increasing type with no change in polarity. consequently, these mutations may contribute to immune evasion property of the virus. • qkv . : this sequence possesses the s f mutation, which was predicted to be of neutral type, which changed the hydrophilic serine (s) to the hydrophobic phenylalanine (f), thus interfering with the ionic interactions that may increase or decrease the affinity of the viral protein for a particular host cell protein. • qkv . : this sequence acquired a second mutation at q h, which was found to be a neutral mutation and no change in polarity was observed. as this sequence accumulated two neutral mutations, it can be assumed that neutral mutations also have a significant importance. • qkv . : the t a mutation occurred as the second mutation in this sequence, which was predicted to be of disease-increasing type and the polarity was changed from hydrophilic to hydrophobic, hence the structure and function of the protein are expected to differ. 
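the polarity bookkeeping used in the list above can be sketched as a small lookup. this is an illustration under stated assumptions: the hydrophobic set below (a, v, l, i, p, f, m, w, g) was chosen so that it reproduces the polarity calls quoted in the text (for example l to s as hydrophobic to hydrophilic, t to i as hydrophilic to hydrophobic, e to d as no change); the paper does not list its hydrophobicity scale, and the treatment of residues such as c and y as hydrophilic is a guess. the function names are hypothetical.

```python
from collections import Counter

# Assumed hydrophobic residues; every other standard residue is treated as hydrophilic.
HYDROPHOBIC = frozenset("AVLIPFMWG")
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def polarity_class(residue):
    """Return 'hydrophobic' or 'hydrophilic' for a one-letter amino acid code."""
    residue = residue.upper()
    if residue not in STANDARD:
        raise ValueError(f"not a standard amino acid: {residue!r}")
    return "hydrophobic" if residue in HYDROPHOBIC else "hydrophilic"


def polarity_change(ref_residue, var_residue):
    """Describe the polarity change caused by a substitution."""
    before = polarity_class(ref_residue)
    after = polarity_class(var_residue)
    return "no change" if before == after else f"{before} to {after}"


def tally_polarity_changes(substitutions):
    """Count polarity-change categories over (ref, variant) residue pairs."""
    return Counter(polarity_change(ref, var) for ref, var in substitutions)


if __name__ == "__main__":
    pairs = [("L", "S"), ("T", "I"), ("E", "D"), ("S", "F")]
    print(tally_polarity_changes(pairs))
```

a tally of this kind is one way to arrive at statements such as "hydrophobic to hydrophobic mutations are dominating" for a given domain, provided the same residue classification is applied throughout.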
from the sequence based phylogeny (fig. (left) ) it was observed that the wuhan sequence was the first to originate. although, qkc . is the first sequence in our flow considering the time, it was found that in the phylogenetic tree it is present at fourth node instead of second node, which is probably due to the presence of ambiguous mutations in this sequence. it was also determined that qkv . is very similar to qmt . and again qmt . is more similar to both of them. all the other sequences having second order mutations are closely related to each other and follow the chronology. from the amino acid based analysis (fig. (right) ) it was found that the wuhan sequence has a high conservation similarity with that of qkv . , thus proving that this sequence was identified chronologically after the wuhan sequence followed by qkc . and qmt . , which again are very similar to each other. in the possible flow of mutations (fig. ) , we have found one sequence with a single mutation, six sequences with two mutations, and another two sequences with three mutations. the us sequence qkc . was identified to have the l s mutation, which is a strain determining mutation and was predicted to be a neutral mutation where polarity was changed from hydrophobic to hydrophilic. the sequences that accumulated second mutations along with l s are as following and it should be noted that the mutational accumulation occurred in a single strain: • qlh . : the a s mutation occurred as a second mutation in this us sequence, which was found to be of neutral type. however, the polarity changed from hydrophobic to hydrophilic, thus potentially influencing the function of the protein. • qmu . : this us sequence possesses the d g mutation, which was predicted to be of neutral type. however, the polarity changed from hydrophilic to hydrophobic so, this sequence accumulated two neutral mutations, which may allow the virus to evolve in terms of virulence. serine (s). 
the mutation was neutral, thus accumulating two neutral and one disease-increasing mutations, being of significant importance for the evolution of the virus. we identified one more sequence qlh . , which possesses a third mutation, f l, which was predicted to be a neutral mutation and no polarity change was observed. this sequence acquired three neutral mutations that may promote virus survival. • qku . : this sequence with the w l mutation was reported in saudi arabia, which was found to be of disease-increasing type with no polarity change. therefore, this sequence also accumulated one neutral and one disease-increasing mutation, which may affect both the structure and function of the protein. • qmt . : the f s mutation was reported in the us sequence, which was found to be of disease-increasing type and the polarity changed from hydrophobic to hydrophilic. altogether, the sequence possesses one neutral and one disease-increasing mutation that may allow the virus to acquire new properties for better survival strategies. sequence-based phylogeny (fig. (left) ) suggested that the wuhan sequence originated first. due to the presence of an ambiguous amino acid sequence, qkc . did not show close similarity to the wuhan sequence. qkg . and qlh . were found to have third order mutations and they are assumed to be closely related by the flow and the same has been supported by amino acid conservation-based phylogeny (fig. (right) ). flow-v qjr . (australia) possesses the mutation l s with reference to the wuhan orf sequence yp . (fig. ) . another sequence qjr . was reported, which possesses a second mutation, v l. this mutation was predicted to be neutral with no change in polarity. however, the hydrophobicity increased. 
this sequence belongs to a particular strain and acquired two neutral mutations, indicating that these mutations may play some important role in the function of orf a. as can be seen from both the sequence-based phylogeny and amino acid conservation-based phylogeny (fig. ) , the wuhan sequence has originated earlier and the sequences qjr . and qjr . are more closely related to each other than to the wuhan sequence as both sequences have one common mutation not present in the wuhan sequence. among sars-cov- proteins, the orf accessory protein is crucial because it plays a vital role in bypassing the host immune surveillance mechanism. this protein is found to have a wide variety of mutations and among them l s ( ) and s l ( ) have highest frequency of occurrence, which bears distinct functional significance as well. it has been reported that l s and s l show antagonistic effects on the protein folding stability of sars-cov- [ ]. l s destabilizes protein folding, therefore up-regulating the host-immune activity and s l favours folding stability positively, thus enhancing the functionality of orf protein. l s is already established as a strain determining mutation and since according to our studies both l s and s l do not occur together in a single sequence of sars-cov- orf protein, it is proposed that virus with the s l mutation is a new strain altogether. we also observed that hydrophobic to hydrophobic mutations are dominant in the d domain. therefore, hydrophobicity is an important property for the n-terminal signal peptide. however, in the d domain, hydrophobic to hydrophilic mutations are observed more frequently, consequently making the ionic interactions more favourable and allowing the protein to evolve in terms of better efficacy in pathogenicity. the orf sequence of sars-cov- shows % similarity with the bat-cov ratg and % similarity with that of the pangolin-cov orf . 
thus, the orf protein of sars-cov- can be considered as a valuable candidate for evolutionary deterministic studies and for the identification of the origin of sars-cov- as a whole. we also analysed a wide variety of mutations in the sars-cov- orf , where we compared it with the orf of bat-cov ratg and pangolin-cov in relation to charge and hydrophobicity perspectives and we found that the bat-cov ratg orf protein exhibits exactly the same properties as that of the sars-cov- orf protein, whereas the properties of the pangolin-cov orf are relatively less similar to the sars-cov- orf . furthermore, to study the evolutionary nature of mutations in the orf , we aligned three bat sequences and found that two of them were exactly the same and there were only six amino acid differences in the third with respect to the other two sequences. so, only two variants were identified for bat-cov. therefore, it shows that the rate of occurrence of mutations is slow in the bat-cov ratg orf . however, for pangolins no differences were observed among four pangolin-cov orf sequences and therefore only a single variant of orf is present. based on sequence alignment, biochemical characteristics and secondary structural analysis, the bat-cov, the pangolin-cov and the sars-cov- orf displayed a high similarity index. additionally, in the orf of sars-cov- , certain mutations were found to exhibit exact reversal with respect to bats and pangolins and therefore pointing towards the genomic origin of sars-cov- . however, unlike bat-cov and pangolin-cov, the mutational distribution of the orf (sars-cov- ) is widespread ranging from the position to , having no defined conserved region. this surprises the scientific community enormously. further this property differentiates the sars-cov- orf from that of bat-cov and pangolin-cov, thus raising the question over the natural trail of evolution of mutations in sars-cov- . 
we further predicted the types and effects of mutations of sequences and grouped them into four domains and found that diseased type mutations with decreasing effect on stability are more prominent. consequently, it is hypothesized that these mutations are promoting the viral survival rate. furthermore, we tracked the possible flow of mutations in accordance to time and geographic locations and validated our proposal with respect to sequence-based and amino acid conservation-based phylogeny and therefore putting forward the order of accumulation of mutations. the authors do not have any conflicts of interest to declare. the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak the explosive epidemic outbreak of novel coronavirus disease (covid- ) and the persistent threat of respiratory tract infectious diseases to global health security the effect of human mobility and control measures on the covid- epidemic in china coronavirus disease (covid- ): situation report editorial-differences and similarities between severe acute respiratory syndrome (sars)-coronavirus (cov) and sars-cov- . would a rose by another name smell as sweet? 
systematic comparison of two animal-tohuman transmitted human coronaviruses: sars-cov- and sars-cov a genomic perspective on the origin and emergence of sars-cov- genomic diversity of sars-cov- in coronavirus disease patients the orf , orf and nucleocapsid proteins of sars-cov- inhibit type i interferon signaling pathway the architecture of sars-cov- transcriptome the orf protein of sars-cov- mediates immune evasion through potently downregulating mhc-i, biorxiv sars-cov- orf and sars-cov orf ab: genomic divergence and functional convergence genomic characterization of a novel sars-cov- novel immunoglobulin domain proteins provide insights into evolution and pathogenesis of sars-cov- -related viruses extended orf gene region is valuable in the epidemiological investigation of sars-similar coronavirus functional pangenome analysis provides insights into the origin, function and pathways to therapy of sars-cov- coronavirus functional pangenome analysis shows key features of e protein are preserved in sars and sars-cov- severe acute respiratory syndrome coronavirus : from gene structure to pathogenic mechanisms and potential therapy sars coronavirus accessory gene expression and function understanding genomic diversity, pan-genome, and evolution of sars-cov- immune evasion via sars-cov- orf protein? host immune response and immunobiology of human sars-cov- infection the ab protein of sars-cov is a luminal er membraneassociated protein and induces the activation of atf mechanisms of severe acute respiratory syndrome pathogenesis and innate immunomodulation nosocomial outbreak of novel coronavirus pneumonia in wuhan, china current methods of mutation detection magic-blast, an accurate rna-seq aligner for long and short reads the embl-ebi search and sequence analysis tools apis in collective judgment predicts disease-associated single nucleotide variants casadio, i-mutant . 
: predicting stability changes upon mutation from the protein sequence or structure ab initio protein structure assembly using continuous structure fragments and optimized knowledgebased force field toward optimal fragment generations for ab initio protein structure assembly use of amino acid sequence data in phylogeny and evaluation of methods using computer simulation phylogeny of the serpin superfamily: implications of patterns of amino acid conservation for structure and function dna barcode analysis: a comparison of phylogenetic and statistical classification methods molecular conservation and differential mutation on orf a gene in indian sars-cov genomes molecular conservation and differential mutation on orf a gene in indian sars-cov genomes molecular conservation and differential mutation on orf a gene in indian sars-cov genomes exploiting heterogeneous sequence properties improves prediction of protein disorder comprehensive review of methods for prediction of intrinsic disorder and its molecular functions comprehensive comparative assessment of in-silico predictors of disordered regions accurate prediction of disorder in protein chains with a comprehensive and empirically designed consensus implications of sars-cov- mutations for genomic rna structure and host microrna targeting pathogenetic perspective of missense mutations of orf a protein of sars-cov authors thank prof. bidyut roy of human genetics unit, indian statistical institute, kolkata, indis for his kind support for structure predictions. key: cord- -laeocs j authors: lima, amorce; healer, vicki; vendrone, elaine; silbert, suzane title: validation and comparison of a modified cdc assay with two commercially available assays for the detection of sars-cov- in respiratory specimen date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: laeocs j severe acute respiratory syndrome coronavirus (sars-cov- ), the virus that causes coronavirus disease (covid- ), has spread rapidly around the globe since it was first identified in december of in wuhan, china. in a race to contain the infection, researchers and healthcare officials have developed several assays to help diagnose individuals with covid- . to help laboratories decide which assay to bring into testing lines, factors such as assay availability, cost, throughput, and turnaround time (tat) should be considered. here we validated a modified version of the cdc assay and used it as a reference to evaluate the performance of the neumodx™ sars-cov- and diasorin simplexa™ covid- direct assays. in silico analysis and clinical sample testing showed that the primers/probes designed by the cdc were specific to sars-cov- , as they accurately detected all reactive samples with an assay lod of copies/ml. the performance of the three assays was analyzed using nasopharyngeal swab specimens tested within hours or days from routine testing. a % agreement was observed between the commercial assays and the modified cdc sars-cov- assay. a deeper look at the ct values showed no significant difference between neumodx and the modified cdc sars-cov- assay, whereas diasorin had lower overall ct values than the modified cdc sars-cov- assay. neumodx and diasorin workflows were much easier to perform. neumodx has the highest throughput and shortest tat, whereas the modified cdc sars-cov- assay, although comparable in throughput to diasorin, has the longest hands-on time and the highest tat. there has been a race against time to develop tests for sars-cov- detection so individuals with covid- could be identified and isolated to slow the spread of the disease. in january , the cdc developed a taqman probe based molecular test, and at the end of to help respond to the testing demand in our hospital. all three assays are based on real-time and rp). 
a no-template water control (ntc) and -ncov_n_ positive control (ncovpc) were used as template for each of the primer/probe sets; hs_rpp _positive control plasmid was used as template for the rp primer/probe where necessary. the real-time reverse transcriptase pcr (rrt-pcr) cycling conditions were set up on the rotor-gene thermocycler (corbett research, australia) as follows: °c for min, °c for min, °c for min, followed by cycles of °c for s and °c for s with fluorescence (fam) detection during the °c incubation step. a sample was positive for sars-cov- if at least one of the two targets (n , n ) was detected regardless of whether rp was amplified, negative if none of the targets was detected and rp was detected, and invalid if rp and the two targets were not detected. samples were processed for the neumodx tm sars-cov- assay and the diasorin analytical sensitivity and specificity of the modified cdc sars-cov- assay. a series of two-fold dilutions of sars-cov- strain usa_wa / rna were spiked in pooled sputum at concentrations of copies/ml to . copy/ml in order to determine the limit of detection (lod) of the assay. all samples were processed and tested in triplicate as described above. the lod was confirmed by further testing in replicates. the analytical specificity was determined by testing samples, which included patient samples, atcc strains, and commercially available nucleic acid controls. clinical evaluation of the modified cdc sars-cov- assay. the performance of the modified cdc sars-cov- assay was established by testing contrived np swabs and sputum specimens and non-reactive specimens. of the contrived specimens, were spiked with sars-cov- strain usa_wa / rna at x- x lod concentrations and were spiked at concentrations spanning the assay's testing range ( , copies/ml to copies/ml). covid- direct assay and the modified cdc sars-cov- assay. 
a total of np swabs were used to compare the clinical performance of two commercially available assays against the modified cdc sars-cov- assay. of those, samples were tested to compare the neumodx sars-cov- assay with the modified cdc sars-cov- assay, and were tested to compare the simplexa covid- direct assay with the modified cdc sars-cov- assay. statistical analysis. the results obtained from each assay were compared with those obtained using the modified cdc sars-cov- assay. ep evaluator was used to calculate positive percent agreement (ppa), negative percent agreement (npa), and cohen's kappa (k) with % confidence intervals. modified cdc sars-cov- assay were designed by the cdc ( ). since the cdc primer/probe panel became available, many sars-cov- isolates have been sequenced. therefore, we conducted an analysis based on the currently available sequences in the national center for biotechnology information (ncbi) database as of march , . primer blast analysis showed that the primer/probe sets were specific to all available sequences for sars- primer/probe set for human rp gene internal control produced a bp amplicon (fig. b). the pcr efficiency was determined for each of the primer/probe sets by testing a series of -fold dilutions of the , copies/µl concentration of the ncovpc plasmid. the data showed that the pcr was linear over orders of magnitude with high pcr efficiency (n = % and n = ) and r of . for each of the primer/probe sets (fig. ). the standard curve generated using the known concentrations was used to accurately determine the concentration of the usa_wa / rna. it was found that the usa_wa / rna obtained from utmb was at a concentration of . x copies/ml. analytical specificity of the assay was determined by testing samples, which included patient samples, atcc strains, and commercially available nucleic acid controls. the results showed that none of the targets was detected (table ). were spiked with copies/ml to , copies/ml.
the results showed that the assay detected sars-cov- in all reactive samples, whereas no amplification was seen in the non-reactive samples (table ). cov- assay and neumodx sars-cov- assay, samples were run within a day and were run within days of first testing. all positive and negative samples tested by the modified cdc sars-cov- assay matched the results obtained with the two commercial assays evaluated, yielding a % ppv and npv for each assay (table a). two samples that were negative on the modified cdc sars-cov- assay were indeterminate on the neumodx for the nsp . neumodx did not yield a ct value for one of the two targets in samples: were negative for nsp gene target and was negative for the n gene target. however, they were still considered positive, as only one detected target was needed for a positive result. although % agreement was observed among the assays, further analysis showed neumodx sars-cov- and the modified cdc sars-cov- assay was - . , and - . between samples run within days. the overall ct value difference for all samples run between the two assays was - . . on the other hand, the average ct value difference between samples run within days between diasorin simplexa covid direct assay and the modified cdc sars-cov- assay was - . , and - . between samples run within days. the overall ct value difference for all samples between the two assays was - . . although the overall infection and death rates of covid- have been declining in some countries, they have increased in others. therefore, the ongoing pandemic still poses great risks for many around the world, and with the easing of certain restrictions, the need for health care facilities to be equipped to test accurately for the virus and limit its spread is as crucial as it will ever be. to that end, laboratories have brought in sars-cov- assays and molecular platforms to respond to the needs of their communities.
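The agreement measures named in the statistical analysis of this comparison (positive percent agreement, negative percent agreement, and Cohen's kappa against the modified CDC assay as reference) can be illustrated with a short Python sketch. This is not the EP Evaluator implementation used in the study; the function name and the counts in the usage note are hypothetical, and confidence intervals are not computed here.

```python
def agreement_stats(both_pos, cand_pos_only, ref_pos_only, both_neg):
    """Agreement of a candidate assay with a reference assay.

    both_pos      -- samples positive on both assays
    cand_pos_only -- positive on the candidate, negative on the reference
    ref_pos_only  -- positive on the reference, negative on the candidate
    both_neg      -- samples negative on both assays
    Returns (ppa, npa, kappa) as fractions.
    """
    ref_pos = both_pos + ref_pos_only
    ref_neg = both_neg + cand_pos_only
    total = ref_pos + ref_neg
    if ref_pos == 0 or ref_neg == 0:
        raise ValueError("reference must contain positive and negative samples")

    ppa = both_pos / ref_pos          # positive percent agreement
    npa = both_neg / ref_neg          # negative percent agreement

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (both_pos + both_neg) / total
    cand_pos = both_pos + cand_pos_only
    cand_neg = both_neg + ref_pos_only
    expected = (cand_pos * ref_pos + cand_neg * ref_neg) / (total * total)
    kappa = 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)
    return ppa, npa, kappa
```

With hypothetical counts, `agreement_stats(20, 5, 10, 15)` describes 50 samples where the two assays disagree on 15; complete concordance, as reported between the commercial assays and the reference here, corresponds to zero in both off-diagonal cells.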
there have been a few publications on head-to-head comparisons of those assays, including a couple published very recently as we were preparing this article, in order to shed light on their performance characteristics and help laboratories make informed decisions on acquiring those assays ( , - ). in this study, we validated a modified cdc sars-cov- assay and compared its performance to two commercial automated sample-to-answer assays for the detection of sars-cov- rna. we confirmed that the primer/probe sets were specific to all sars-cov- clades based on the available genome sequences, including those that were not available at the time when those primer sets were originally designed. not only were those primer/probe sets specific to sars-cov- based on the in silico study, there were also no false positive results in cross-reactivity experiments using a panel of bacterial and closely-related virus targets. we found that the primer/probe sets have high pcr efficiency and an lod of copies/ml. the clinical sensitivity and specificity of the assay were also evaluated in samples with different concentrations of viral rna. the results showed that the assay is specific to sars-cov- , as it was only detected in the reactive samples. the other objective of this study was to compare the clinical performance of two commercially available assays in our laboratory, neumodx sars-cov- assay and simplexa agreement of % between the results obtained on the commercial assays and those on the modified cdc sars-cov- assay. it is worth noting that while we found that the modified cdc inserts for neumodx sars-cov- assay and diasorin simplexa covid direct is copies/ml and copies/ml for assay, respectively.
a closer look at the ct value differences between the modified cdc sars-cov- assay and the commercial assays suggests that there is not a significant difference between the modified cdc sars-cov- assay and neumodx sars-cov- ; however, there seems to be a greater difference in ct values between diasorin simplexa cov- direct assay and the modified cdc sars-cov- assay, with diasorin having lower ct values. the difference is even greater in samples that were run days after the routine testing on the modified cdc sars-cov- assay. this is in line with previously published data that showed ct values on diasorin were much lower than those on the modified cdc sars-cov- assay by an average ct difference of - . ( ). the overall data also suggest that, depending on the viral burden in the samples, np samples can be refrigerated for at least days and still maintain the rna integrity for viral detection by the assays in this study. a limitation of this study was that the same samples were not tested by the three different assays, so a head-to-head comparison of the three assays was not performed. this was due to the limited availability of kits for routine testing in patient care. however, we did a head-to-head comparison of the assays' workflows. the neumodx molecular system is a sample-to-answer and random-access platform that automatically performs nucleic acid extraction, amplification, and signal detection and analysis, requiring very little human interaction for loading and scanning the samples ( ). it has the shortest tat and highest throughput of the three assays. diasorin simplexa cov- direct assay involves a simple operation procedure that does not include an extraction step; it has, however, much lower throughput than neumodx. the modified cdc sars-cov- assay is singleplex; each target must be run in a different tube, as opposed to the two commercial assays, which are multiplex.
it requires separate extraction steps on the easymag, which can extract up to samples at a time, has a longer hands-on time, and has much lower throughput compared to neumodx. it has comparable throughput to diasorin if nucleic acid extraction and pcr reaction tubes for the next sample batch are prepared before the pcr cycles of the previous batch end. therefore, although the three assays have the same accuracy, the overall workflow favors the commercial platforms. in conclusion, diagnostic laboratories around the world have been faced with unprecedented challenges due to the sars-cov- pandemic. thousands of sars-cov- tests are being executed in any given laboratory each day. this testing requirement has not only forced laboratories to bring in new technologists to help with testing, but it has also led to a shortage of testing reagents. consequently, laboratories had to acquire different assays and platforms to meet testing demand. as much as ldt assays, such as the modified cdc sars-cov- assay, were instrumental at the onset of the pandemic for covid- testing, their overall testing capacity is limited. therefore, it is necessary for laboratories to acquire multiple high throughput automated instruments that can test a high number of samples quickly with almost the same number of qualified laboratory professionals. real time pcr determining amplification efficiency of the primer/probe sets. ten-fold serial dilution of , copies/µl of ncovpc plasmid was tested. pcr linearity over orders of magnitude with a limit of detection of copies/µl; n slope of - . with a correlation coefficient r = . ; n slope =- . and r = . .
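The slopes and correlation coefficients in the caption above come from a standard curve of ct against log10 template concentration. The Python sketch below shows one common way to obtain them, assuming the usual relation E = 10^(-1/slope) - 1 between slope and amplification efficiency; the function names are illustrative, and the study's own software and dilution values are not reproduced here.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def pcr_efficiency(slope):
    """Amplification efficiency as a fraction (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Read an unknown concentration off the fitted standard curve."""
    return 10 ** ((ct - intercept) / slope)
```

A slope near -3.32 corresponds to an efficiency near 100%, because each ten-fold dilution then shifts ct by log2(10) cycles. `copies_from_ct` mirrors the way the standard curve was used to estimate the concentration of the rna stock.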
genotyping coronavirus sars-cov- : methods and implications epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china identification of a novel coronavirus in patients with severe acute respiratory syndrome isolation direct assay clinical evaluation of the cobas sars-cov- test and a diagnostic platform switch during hours in the midst of the covid- pandemic a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster detection of sars-cov- from nasopharyngeal and nasal swabs from individuals diagnosed with covid- comparison of commercially available and laboratory developed assays for in vitro detection of sars-cov- in clinical laboratories clinical evaluation of three sample-to-answer platforms for the detection of sars-cov- molecular system control key: cord- -kyr lv authors: saçar demirci, müşerref duygu; adan, aysun title: computational analysis of microrna-mediated interactions in sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kyr lv micrornas (mirnas) are post-transcriptional regulators of gene expression that have been found in more than diverse organisms. although it is still not fully established if rna viruses could generate mirnas that would target their own genes or alter the host gene expression, there are examples of mirnas functioning as an antiviral defense mechanism. in the case of severe acute respiratory syndrome coronavirus (sars-cov- ), there are several mechanisms that would make mirnas impact the virus, like interfering with replication, translation and even modulating the host expression. in this study, we performed a machine learning based mirna prediction analysis for the sars-cov- genome to identify mirna-like hairpins and searched for potential mirna – based interactions between the viral mirnas and human genes and human mirnas and viral genes. 
our panther gene function analysis results indicate that virus-derived mirna candidates could target various human genes involved in crucial cellular processes including transcription. for instance, a transcriptional regulator, stat and transcription machinery might be targeted by virus-derived mirnas. in addition, many known human mirnas appear to be able to target viral genes. considering the fact that mirna-based therapies have been successful before, understanding the modes of action of mirnas and their possible roles during sars-cov- infections could create new opportunities for the development and improvement of new therapeutics. coronaviruses are positive-sense single-stranded rna (+ssrna) viruses with exceptionally large genomes of ~ kb with ′cap structure and ′polya tail. the coronavirus subfamily is divided into four genera: α, β, γ, and δ based on serotype and genome features . the genome of a typical cov codes for at least different open reading frames (orfs), the number of which varies with the cov type . some orfs encode non-structural proteins while others code for structural proteins required for viral replication and pathogenesis. structural proteins include spike (s) glycoprotein, matrix (m) protein, small envelope (e) protein, and nucleocapsid (n) protein with various roles in virus entry and spread. sars-cov- belongs to β cov with - % genetic similarity to sars-cov based on sequence analysis and might share similar viral genomic and transcriptomic complexity , . it has also been revealed that sars-cov- has a very high homology with bat covs, which indicates how it may have been transmitted to humans, although the intermediate carriers are not known . the s protein of sars-cov- interacts strongly with human angiotensin-converting enzyme (ace ) expressed on alveolar epithelial cells, which indicates the route of viral infection in humans .
micrornas (mirnas) are small, noncoding rnas that play a role in the regulation of gene expression in various organisms ranging from viruses to higher eukaryotes. it has been estimated that mirnas might influence around % of mammalian genes and their main effect is on regulatory pathways including cancer, apoptosis, metabolism and development . although the current release of mirbase, the standard mirna depository, lists mirnas of organisms, only of them are viruses . while the first virus-encoded mirnas were discovered for the human epstein-barr virus (ebv) , more than viral mirna precursors have been reported so far. although it has been shown that various dna viruses express mirnas, it is still debatable whether rna viruses could also encode them. the major concerns regarding mirnas of rna viruses are based on (i) the fact that rna viruses that replicate in the cytoplasm do not have access to the nuclear mirna machinery, and (ii) the fact that, since rna is the genetic material, mirna production would interfere with viral replication. on the other hand, many features of mirna-based gene regulation seem to be especially beneficial for viruses. for instance, through targeting specific human genes by viral mirnas, it is possible to form an environment suitable for survival and replication of the virus. furthermore, viral mirnas are likely to escape the host defense system, since the host itself generates mirnas in the same manner. currently, options for the prevention and treatment of covs are very limited due to the complexity. therefore, detailed analysis of cov-host interactions is quite important to understand viral pathogenesis and to determine the outcomes of infection. although there are studies regarding viral replication and the interaction of covs with the host innate immune system, the role of mirna-mediated rna-silencing in sars-cov- infection has not yet been elucidated.
in this study, the sars-cov- genome was searched for mirna-like sequences and potential host-virus interactions based on mirna actions were analyzed. data analysis, pre-mirna prediction, mature mirna detection workflows were generated by using the konstanz information miner (knime) platform data genome data of the virus was obtained from ncbi: severe acute respiratory syndrome coronavirus isolate wuhan-hu- , complete genome genbank: mn . mirna prediction workflow izmir pre-mirna prediction genome sequences of sars-cov- were transcribed (t->u) and divided into nt long fragments with overlaps. then these fragments were folded into their secondary structures by using rnafold with default settings and hairpin structures were extracted, producing hairpins in total. a modified version of izmir (svm classifier is changed to random forest and latest mirbase version was used for learning) was applied to these hairpins with ranging lengths (from to ) ( figure ). based on the mean value of averages of classifiers' prediction scores (decision tree, naive bayes and random forest), hairpins passed the . threshold and were used for further analysis. selected hairpins were further processed into smaller sequences; maximum nt length with nt overlaps. then, these fragments were filtered based on minimum length of and their location on the hairpins (sequences not involving any loop nucleotides were included). target searches for these remaining candidate mature mirnas were performed against human and sars-cov- genes by using the psrnatarget tool with default settings . moreover, human mature mirnas from mirbase were used to search for their targets in sars-cov- genes. the targets of viral mirnas in human genes were further analyzed for their gene ontology (go). to achieve this, the panther classification system (http://www.pantherdb.org) was used . searching the sars-cov- genome for sequences forming hairpin structures resulted in hairpins with varying lengths (supplementary files).
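The first step of the prediction workflow (transcribing the genome and cutting it into overlapping fixed-length fragments before folding) can be sketched in Python as follows. The study performed this in knime, so this is only an illustration; the fragment length and step are left as parameters because the values are not recoverable from the text, the function names are hypothetical, and the rnafold folding step is not shown.

```python
def transcribe(dna):
    """Convert a dna sequence to its rna alphabet (t -> u)."""
    return dna.upper().replace("T", "U")


def sliding_fragments(seq, window, step):
    """Split seq into overlapping fragments of length `window`.

    Consecutive fragments start `step` nucleotides apart, so they overlap
    by `window - step` nucleotides when step < window. A final fragment is
    added when needed so that the 3' end of the sequence is always covered.
    Returns a list of (start_index, fragment) tuples.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        return [(0, seq)] if seq else []

    last_start = len(seq) - window
    fragments = [(start, seq[start:start + window])
                 for start in range(0, last_start + 1, step)]
    if fragments[-1][0] != last_start:
        fragments.append((last_start, seq[last_start:]))
    return fragments
```

Each fragment would then be passed to a secondary-structure predictor, and only those folding into hairpins would be kept as pre-mirna candidates. Keeping the start index with each fragment makes it possible to map a predicted hairpin back to its genomic position.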
in order to use machine learning based mirna prediction approach of izmir workflows, hundreds of features were calculated for all of the pre-mirna candidate sequences. among those, minimum free energy (mfe) values required for the folding of secondary structures of hairpin sequences and hairpin sequence lengths of known human mirnas from mirbase and predicted hairpins of sars-cov- were compared ( figure ). based on the box-plots shown in figure , most of the extracted viral hairpins seem to be smaller than human mirna precursors. since viruses would need to use at least some members of host mirna biogenesis pathway elements, viral mirnas should be similar to host mirnas to a certain degree. therefore, a classification scheme trained with known human mirnas was applied on sars-cov- hairpins. only hairpins out of passed the . prediction score threshold and used for further analysis. from these hairpins, mature mirna candidates were extracted and their possible targets for human and sars-cov- genes were investigated. sars-cov- mirna candidates were further analyzed to test if they were similar to any of the known mature mirnas from organism listed in mirbase. to achieve this, a basic similarity search was performed based on the levenshtein distance calculations in knime. however, there was no significant similarity between hairpin or mature sequences. while predicted mature mirnas of sars-cov- were used to find their targets in sars-cov- and human genes, mature mirnas of human were also applied on sars-cov- genes (table , supplementary files). although mirna based self-regulation of viral gene expression is a hypothetical case, sars-cov- orf ab polyprotein gene might be the only one that could be a target of viral mirnas. in total human genes seem to be targeted by viral mirnas. table lists some of the predicted targets of sars-cov- mirnas in human genes that have roles in transcription. the full list of mirnatarget predictions are available in supplementary files. 
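The similarity search against known mature mirnas relied on levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one sequence into another. A minimal Python sketch of the standard dynamic-programming calculation is given below; it is not the knime node used in the study, and the normalized score is an illustrative assumption, since the text does not say how similarity was judged to be significant.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (unit cost per edit)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Comparing every candidate against every mature mirna in a database is quadratic in sequence length per pair, which is manageable for sequences of mirna size.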
orf ab appeared to be the target of different human mature mirnas ( table ) . as expected, number of targeting events appear to be correlated with the gene length. blocking nuclear import of stat by binding to nuclear imports lastly, in order to understand the main mechanisms that would be affected by the influence of viral mirnas on human genes, panther classification system was applied to targeted human genes. based on the results presented in graph is limited to the pathways that have at least genes. pathways of genes were obtained from panther. y-axis shows the number of genes with respected pathways. legend is sorted from maximum to minimum (top to bottom). the potential roles of mirna-mediated rna interference in infection biology has been defined as an essential regulatory molecular pathway. . although encoding mirnas seems quite problematic for rna viruses due to the nature of mirna biogenesis pathway, it is possible to circumvent these problems through different ways as seen in hiv- . therefore, we analyzed possible human genes targeted by predicted mirna like small rnas ( table ) table , it can be concluded that increases in the level of host mirnas targeting virulent genes such as s, m, n, e and orf ab would block viral entry and replication. moreover, decreasing the levels of host mirnas would make sars-cov- more replicative and visible for the host immune system. however, alterations in host mirna levels would interfere with specific cellular processes which are crucial for the host biology. in our study, we have also identified possible mirna like small rnas from sars-cov- genome which target important human genes. therefore, antagomirs targeting viral mirnas could be also designed even though there are only a few studies for dna viruses . however, all these therapeutic possibilities need further mechanistical evaluations to understand how they regulate virus-host interaction. 
therefore, further in vitro, ex vivo and in vivo studies will be required to validate candidate mirnas for sars-cov- infection. host factors in coronavirus replication coronavirus disease : coronaviruses and blood safety characteristics of and public health responses to the coronavirus disease outbreak in china novel coronavirus: where we are and what we know emerging coronaviruses: genome structure, replication, and pathogenesis a pneumonia outbreak associated with a new coronavirus of probable bat origin evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission an overview of rna virus-encoded micrornas mirbase: tools for microrna genomics identification of virus-encoded micrornas a machine learning approach for microrna precursor prediction in retro-transcribing virus genomes knime: the konstanz information miner on the performance of pre-microrna detection algorithms mirbase: integrating microrna annotation and deepsequencing data vienna rna secondary structure server psrnatarget: a plant small rna target analysis server panther in : modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees coronavirus spike proteins in viral entry and pathogenesis sars coronavirus e protein forms cation-selective ion channels role of severe acute respiratory syndrome coronavirus viroporins e, a, and a in replication and pathogenesis a structural analysis of m protein in coronavirus assembly and morphology the molecular biology of coronaviruses sars coronavirus replicase proteins in pathogenesis severe acute respiratory syndrome (sars) coronavirus orf protein is acquired from sars-related coronavirus from greater horseshoe bats through recombination severe acute respiratory syndrome coronavirus gene products contribute to virus-induced apoptosis induction of apoptosis by the severe acute respiratory syndrome coronavirus a protein is dependent on its 
interaction with the bcl-xl protein full-genome evolutionary analysis of the novel corona virus ( -ncov) rejects the hypothesis of emergence as a result of a recent recombination event severe acute respiratory syndrome coronavirus orf antagonizes stat function by sequestering nuclear import factors on the rough endoplasmic reticulum/golgi membrane cellular versus viral micrornas in host-virus interaction the role of micrornas in the pathogenesis of herpesvirus infection sane, f. & hober, d. mirnas in enterovirus infection mechanisms underlying hepatitis c virus-associated hepatic fibrosis the interplay between viral-derived mirnas and host immunity during infection h n influenza virus-specific mirna-like small rna increases cytokine production and mouse mortality via targeting poly(rc)-binding protein identification of ebola virus micrornas and their putative pathological function hiv- tar mirna protects against apoptosis by altering cellular gene expression up-regulation of microrna- in influenza a virus infection inhibits viral replication by targeting micrornome analysis unravels the molecular basis of sars infection in bronchoalveolar stem cells cellular microrna let- c inhibits m protein expression of the h n influenza a virus in infected human lung epithelial cells phage display technique identifies the interaction of severe acute respiratory syndrome coronavirus open reading frame protein with nuclear pore complex interacting protein npipb in modulating type i interferon antagonism hepatitis c virus-induced upregulation of microrna mir- a- p in hepatocytes promotes viral infection and deregulates metabolic pathways associated with liver disease pathogenesis genetic analysis of determinants for spike glycoprotein assembly into murine coronavirus virions: distinct roles for charge-rich and cysteine-rich regions of the endodomain microrna regulation of rna virus replication and pathogenesis viral micrornas: interfering the interferon signaling a tale of two 
rnas during viral infection: how viruses antagonize mrnas and small non-coding rnas in the host cell role of taf in transcriptional activation by rta of epstein-barr virus emerging roles for rna degradation in viral replication and antiviral defense viral micrornas target a gene network safety, tolerability, and antiviral effect of rg- in patients with chronic hepatitis c: a phase b, double-blind, randomised controlled trial key: cord- -y nzsv authors: rosenke, kyle; leventhal, shanna; moulton, hong m.; hatlevig, susan; hawman, david; feldmann, heinz; stein, david a. title: inhibition of sars-cov- in vero cell cultures by peptide-conjugated morpholino-oligomers date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: y nzsv background sars-cov- is the causative agent of covid- and a pathogen of immense global public health importance. development of innovative direct-acting antiviral agents is sorely needed to address this virus. peptide-conjugated morpholino oligomers (ppmo) are antisense agents composed of a phosphordiamidate morpholino oligomer covalently conjugated to a cell-penetrating peptide. ppmo require no delivery assistance to enter cells and are able to reduce expression of targeted rna through sequence-specific steric blocking. objectives and methods five ppmo designed against sequences of genomic rna in the sars-cov- ’-untranslated region and a negative control ppmo of random sequence were synthesized. each ppmo was evaluated for its effect on the viability of uninfected cells and its inhibitory effect on the replication of sars-cov- in vero-e cell cultures. cell viability was evaluated with an atp-based method and viral growth was measured with quantitative rt-pcr and tcid infectivity assays. 
results ppmo designed to base-pair with sequence in the ’-terminal region or the leader transcription regulatory sequence-region of sars-cov- genomic rna were highly efficacious, reducing viral titers by up to - log in cell cultures at - hours post-infection, in a non-toxic and dose-responsive manner. conclusion the data indicate that ppmo have the ability to potently and specifically suppress sars-cov- growth and are promising candidates for further pre-clinical development. standard operating protocols. ppmo synthesis.
ppmo were synthesized by covalently conjugating the cpp (rxr) (where r is arginine and x is -aminohexanoic acid) to pmo (gene tools llc, philomath, oregon) at the ' end through a noncleavable linker, by methods described previously [ ] . cells and viruses. vero-e cells (atcc) were maintained in dmem supplemented with % fetal calf serum, mm l-glutamine, u/ml penicillin and μg/ml streptomycin (growth medium). all cell culture incubations were carried out at º c and % co . sars-cov- isolate ncov-wa - was kindly provided by the centers for disease control and prevention (atlanta, georgia, usa). preparation and quantification of the virus followed methods previously described [ ] . briefly, the original virus stock was propagated once at rml in vero- e cells in dmem supplemented with % fetal bovine serum containing l-glutamine and antibiotics as above (infection medium). the virus stock used in the experiments was passage and was confirmed by sequencing to be identical to the initial deposited genbank sequence cell viability assay. cell viability was assessed using celltiterglo (promega). vero evaluation of virus quantity by qrt-pcr. supernatants were harvested as described above and viral rna purified and quantified by using one-step quantitative reverse transcription pcr (qrt-pcr) following previous described methods [ ] . briefly, total rna was isolated with the rneasy mini kit (qiagen) and rt-pcr carried out using the one-step rt-pcr kit (qiagen) according to the manufacturer's protocols. copy numbers were calculated using standards produced as previously described [ ] . tcid evaluations. viral supernatants were serially diluted in dmem and each dilution sample was titrated in triplicate. subsequently, ul of each virus dilution was transferred to vero-e cells grown in -well plates containing ul dmem. following a seven day incubation period, wells were scored for cytopathic effect (cpe , ] . 
in this study, five ppmo were designed to target the ' utr and first translation start site-region of sars-cov- positive sense genomic rna ( table ) . two of the ppmo target the '-terminal-region of the genome. both 'end- and 'end- were designed with the intention of interfering with pre-initiation of translation of genomic and subgenomic mrnas. two ppmo were designed to target the genomic 'utr region containing the putative sars- cov- leader-trs core sequence (nt - , '-acgaac- ') and thereby potentially interfere with body-trs to leader-trs base-pairing, and/or with translocation of the s translation preinitiation complex along the 'utr of the genomic and various subgenomic mrnas. coronaviruses produce a set of nested mrnas through the process of discontinuous subgenomic mrna synthesis. the trs is a six-ten nt sequence that is critical in the production of negative strand mrna templates during this process [ , ] . for sars-cov, the leader-trs core sequence consists of nt - of the genomic rna sequence [ ] , and most viral mrnas possess the same nt ' leader sequence. the aug ppmo was designed to target the translation initiation region for orf a/b, which codes for the replicase polyprotein, with the intention to block translation initiation. a negative control ppmo (nc) of random sequence was included (see table ), to control for nonspecific effects of the ppmo chemistry. nc was screened using blast and contains little significant homology to any primate, rodent or viral sequences. evaluation of ppmo cytotoxicity. to evaluate the effect of ppmo treatments on cell viability, cells were treated under similar conditions to the antiviral assays described below, but in the absence of virus. cells were treated in triplicate with increasing doses of ppmo for hours before being assayed using a quantitative cell viability assay. at the concentrations used in the antiviral assays described below, none of the ppmo produced more than % cytotoxic effect ( figure a) . 
ppmo on sars-cov- replication, vero-e cells were treated with each of the six ppmo described in table at , , and µm for hours before infection, then incubated without ppmo after infection. cell supernatants were collected at four time-points post-infection: , , , and hours. virus growth was evaluated by two methods, qrt-pcr and tcid infectivity assay. using an moi of . , virus growth rose steadily and reached peak growth at hrs post- infection ( fig. b and h) aug ppmo was not nearly as effective as the other antiviral ppmo used in this study ( figure c and i). we found that ppmo targeting the 'terminal-region or leader-trs-region were highly effective at inhibiting the growth of sars-cov- , whereas a ppmo targeting the polyprotein a/b aug translation start site region was not effective. it is unknown if the ineffective aug-ppmo was unable to bind to its target, or if duplexing occurred yet was relatively inconsequential. to date, little sequence variation in the ppmo target sites in the 'utr of sars-cov- has been identified. of the whole-genome nucleotide sequences reported in genbank, two genotypes contain a single mismatch with the 'end ppmo and one genome has a single mismatch with the trs ppmo [ ] . previous studies have shown that ppmo having a single base mismatch with their target site retain approximately % of their activity compared to those having perfect agreement [ , ] . this study demonstrates that ppmo targeted against sars-cov- can enter cells readily and inhibit viral replication in a sequence-specific, dose-responsive and non-toxic manner. 
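the survey of target-site variation described above amounts to counting mismatches between a reference target site and the corresponding site in each sequenced genome. a minimal sketch follows, assuming the sites have already been extracted and aligned; the function names, the genome labels, and the toy variant sequences are hypothetical, and only the six-nucleotide trs core "acgaac" is taken from the text.

```python
def count_mismatches(reference_site, variant_site):
    """Number of positions at which two aligned, equal-length sites differ."""
    if len(reference_site) != len(variant_site):
        raise ValueError("sites must be aligned to the same length")
    return sum(a != b for a, b in zip(reference_site, variant_site))


def group_by_mismatches(reference_site, variant_sites):
    """Map each mismatch count to the genome labels showing that count."""
    groups = {}
    for label, site in variant_sites.items():
        groups.setdefault(count_mismatches(reference_site, site), []).append(label)
    return groups


# Toy input: the reference is the trs core from the text; variants are invented.
variants = {"genome_a": "acgaac", "genome_b": "acgatc", "genome_c": "acgaac"}
print(group_by_mismatches("acgaac", variants))
```

genomes falling in the single-mismatch group are the ones for which the reduced-activity figure cited above would be relevant.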
morpholinos and their peptide conjugates: therapeutic promise and challenge for duchenne muscular dystrophy antisense morpholino oligomers and their peptide therapeutic oligonucleotides vectorization of morpholino oligomers by the (r-ahx-r)( ) peptide allows efficient splicing correction in the absence of endosomolytic agents inhibition of rna virus infections with peptide-conjugated morpholino oligomers morpholino oligomers targeting the pb and np genes enhance the survival of mice infected with highly pathogenic influenza a h n virus inhibition of influenza a h n virus infections in mice by morpholino oligomers. archives of virology inhibition of respiratory syncytial virus infections with morpholino oligomers in cell cultures and in mice inhibition of porcine reproductive and respiratory syndrome virus infection in piglets by a peptide-conjugated morpholino oligomer antiviral effects of antisense morpholino oligomers in murine coronavirus infection models antisense morpholino-oligomers directed against the ' end of the genome inhibit coronavirus proliferation and growth inhibition, escape, and attenuated growth of severe acute respiratory syndrome coronavirus treated with antisense morpholino oligomers antiviral activity of morpholino oligomers designed to block various aspects of equine arteritis virus amplification in cell culture suppression of porcine reproductive and respiratory syndrome virus replication by morpholino antisense oligomers a contemporary view of coronavirus transcription nidovirus transcription: how to make sense the structure and functions of coronavirus genomic ' and ' ends genomic characterization of a novel sars- cov- inhibition of multiple subtypes of influenza a virus in cell cultures with morpholino oligomers key: cord- -k gsfng authors: suresh, voddu; parida, deepti; minz, aliva p.; senapati, shantibhusan title: tissue distribution of ace protein in syrian golden hamster (mesocricetus auratus) and its possible implications in 
sars-cov- related studies date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k gsfng recently, the syrian golden hamster (mesocricetus auratus) has been demonstrated as a clinically relevant animal model for sars-cov- infection. however, lack of knowledge about the tissue-specific expression pattern of various proteins in these animals and the unavailability of reagents like antibodies against this species hamper optimal use of these models. the major objective of our current study was to analyze the tissue-specific expression pattern of angiotensin-converting enzyme (ace ), a proven functional receptor for sars-cov- , in different organs of the hamster. we have adapted immunoblot analysis, immunohistochemistry, and immunofluorescence analysis techniques to evaluate the ace expression pattern in different tissues of the syrian golden hamster. we found that kidney, small intestine, esophagus, tongue, brain, and liver express ace . epithelium of proximal tubules of kidney and surface epithelium of ileum express a very high amount of this protein. surprisingly, analysis of stained tissue sections for ace showed no detectable expression of ace in the lung or tracheal epithelial cells. similarly, all parts of the large intestine (caecum, colon, and rectum) were negative for ace expression. together, our findings corroborate some of the earlier reports related to ace expression pattern in human tissues and contradict some others. we believe that the findings of this study will enable the appropriate use of the syrian golden hamster to carry out sars-cov- related studies. the current outbreak of covid- (corona virus ) caused by the sars-cov- virus was declared a pandemic on th march . to date, it has affected around . million people worldwide, and almost . thousand people have lost their lives due to this pandemic (as of th june ). therefore, there is an urgent need to study the transmission, pathogenesis, prevention, and treatment of this viral disease. 
in this regard, the role of clinically relevant experimental animal models is crucial. no single species of animal might be able to exactly recapitulate all the sars-cov- infection-related events in humans. however, the use of different animal models will help to address questions in a more reliable and clinically relevant manner. at the same time, the exploration of multiple species susceptible to this virus might also help to identify the natural reservoir and potential carriers of this pathogen. were observed in brain, heart, liver, and kidney on dpi . although no infectious virus was detected in the kidney, low copies of the viral genome were detected on and dpi . in a similar type of study, chan fj et al., using male and female syrian hamsters - weeks old, identified tissue damage and the presence of viral n protein in different parts of the respiratory tract (nasal turbinate, trachea and lungs) . the viral n protein was abundantly present in bronchial epithelial cells, macrophages, and type i and ii pneumocytes. at dpi, n protein expression was found all over the alveolar wall . the histopathological analysis also showed tissue damage and/or inflammatory lesions in multiple extra-pulmonary organs (intestine, heart, spleen, and bronchial and mesenteric lymph nodes); however, n protein expression was only detected in the intestinal epithelial cells . very recently, the syrian hamster model of sars-cov- infection has been instrumental in establishing that passive transfer of a neutralizing antibody (nab) protects against sars-cov- infection . studies have clearly shown that sars-cov- binds to human angiotensin-converting enzyme (ace ) expressed by its target cells and uses it as a functional receptor to enter cells , . hence, drugs that could inhibit the binding of viral proteins (s-protein) to the ace expressed on the target cells are assumed to be potential therapeutics against covid- . 
a recent study has shown that human recombinant soluble ace (hrsace ) blocks the early stage of sars-cov- infection . we have also proposed bioengineered probiotics expressing human ace as potential therapeutics against sars-cov- infection . the alignment of ace protein of different species has suggested that the s protein may interact more efficiently with cricetidae ace than murine ace . in silico analysis also shows possible interaction between sars-cov- spike proteins and syrian hamster ace . during the ongoing covid- pandemic, in addition to vaccine and antiviral development, attention has been directed to targeting host proteins for therapeutic purposes. as discussed above, the pharmaceutical modulation of ace expression or inhibition of its interaction with sars-cov- spike protein for covid- therapy is a matter of current investigation in different parts of the world . in these efforts, animal models will be instrumental to check the efficacy and safety of potential drug candidates against covid- . although the syrian hamster is a clinically relevant model for multiple infectious diseases, unavailability of reagents like antibodies against hamster proteins and lack of publicly available gene or protein expression data for this species are the major constraints on using these models to their full capacity . before utilizing hamster as a model to understand the role of ace in the pathogenesis of sars-cov- infection and/or to evaluate the efficacy of ace -targeted drugs, knowledge of the basal level of ace expression in different tissues of hamster is essential. in the current study, we have checked the expression pattern of ace in different tissues of normal syrian hamster through immunoblot and immunohistochemical analysis. all the tissue samples used in this study are from archived samples collected during our previous studies , . 
prior approval from the institutional animal ethical committee (institute of life sciences, bhubaneswar, india) was taken for use of these animals. all the methods associated with animal studies were performed according to the committee for the purpose of control and supervision of experiments on animal (cpcsea), india guidelines. using an electric homogenizer, tissues were lysed in ice-cold ripa buffer ( mm tris-hcl ph . , mm nacl, mm na edta, mm egta, % np- , % sodium de-oxy-cholate, . mm sodium pyrophosphate, mm β -glycerophosphate, mm na vo ) supplemented with a protease inhibitor cocktail (mp biomedicals) and soluble proteins were collected. protein concentrations were measured by bradford assay (sigma). µg of protein was loaded for each sample and electrophoresed through % sds-polyacrylamide gels. proteins were transferred to poly-vinylidene difluoride membrane (millipore) and blocked with % bovine serum albumin. membranes were probed with ace (#ma - ; invitrogen; : ) or β -actin (#a ; sigma-aldrich; : ) primary antibody and horseradish peroxidase-conjugated secondary antibody. antibody binding was detected with electrochemiluminescence substrate (# p; cst) and chemiluminescence visualized with chemidoc™mp gel imaging system (biorad). all the tissue samples were processed and sectioned as reported earlier , . paraffin-embedded sections were de-paraffinized using xylene, rehydrated in graded ethanol and deionized water. sections were subjected to antigen retrieval treatment by boiling in acidic ph citrate buffer life technologies) and visualized using tcs sp sted confocal microscope. the ace recombinant rabbit monoclonal antibody (thermo fisher scientific; clone sn ; cat no ma - ) used in this study was generated by using synthetic peptide within human ace aa - as immunogen. 
as per the information provided by different companies, the antibody of this clone (sn ) has reactivity against human, mouse, rat, and hamster (thermo showed clear reactivity with a protein of molecular weight of ~ kd, which matches with ace (figure ) . the absence of any non-specific bands in the western blot also suggests its suitability for use in ihc of different organs of hamsters. the major objective of this study was to check the status of ace expression in different hamster tissues. in human patients, lung-associated pathology is a predominant feature of sars-cov- infection . certain earlier studies have shown expression of ace transcripts or protein by lung epithelial cells [ ] [ ] [ ] [ ] ; hence, just after the reports that ace binds with sars-cov- spike (s) protein, the research and clinical communities assumed that a high level of ace expression in the lung or other parts of the respiratory tract might be a major driving factor in the pathogenesis of this respiratory virus. our initial immunoblot analysis data showed only a trace amount of ace expression in lung tissue lysate (figure ) . to get an idea about the spatial and cell-type distribution of ace expressing cells in lungs and trachea, further ihc analysis was conducted. interestingly, we did not find any visible positive staining in the epithelial cells of trachea, bronchioles, and alveoli (figure a) . endothelial cells and smooth muscle cells associated with the wall of blood vessels were also negative for ace staining (data not shown). as most of the previous reports suggest ace expression in type ii pneumocytes, we performed immunofluorescence staining to get magnified images of alveolar pneumocytes and tracheal epithelial cells. corroborating our ihc data, we did not notice any positive staining when compared with the corresponding tissue sections stained without antibody ( figure b) . 
we believe that the trace amount of lung-associated ace detected in immunoblot analysis might have come from some non-epithelial cells with low abundance and scattered distribution in lung parenchyma, whose presence might not be obvious in stained tissue sections. our findings corroborate a recent report on human ace expression patterns in different organs published in preprint form . together, based on our preliminary findings and available reports we believe that sars-cov- related lung pathology might be independent of or minimally dependent on ace expression status in lungs, which warrants further investigation. recent studies have reported the presence of sars cov- viral protein in respiratory tract epithelial cells and lungs of infected hamsters , hence our findings suggest the possible involvement of some proteins other than ace in entry of sars-cov- virus into respiratory and lung cells , . in our study, hamster kidney tissues showed a very high level of ace expression (figures , b & ) . its expression was mostly at the apical surface of proximal tubules whereas glomeruli were negative (figures b & ) . so far most of the literature and publicly available protein expression databases have clearly shown high expression of ace protein in human kidney tissues , . high expression of ace in the kidney is believed to contribute to sars-cov- virus pathogenesis and disease severity. in our analysis, in addition to kidney and different parts of the gut, brain, liver, and tongue are three other extra-pulmonary organs that showed ace expression (figure , & ) . in brain, mostly the neurons of the cerebral cortex were positive for ace expression. this finding corroborates information available at the human protein atlas (https://www.proteinatlas.org/ensg -ace /tissue). expression of ace in hamster brain neuronal cells might help in investigating the possible neurological tissue damage due to sars-cov- infection . 
in liver tissues, mostly the sinusoidal endothelial cells stained positive for ace , but hepatocytes were negatively stained (figure ). our immunoblot analysis data also showed ace expression in hamster liver tissues (figure ) . the sinusoidal endothelial expression of ace in hamster does not match the ace expression pattern reported for human liver , and warrants further investigation to understand these contradictory findings. in our analysis, we did not notice any positive staining in bile duct and gall bladder epithelial cells (data not shown). the human oral mucosal cavity expresses ace , and specifically it is highly enriched in tongue epithelial cells . the data from our ihc study also show expression of ace in both dorsal and ventral stratified squamous epithelium of hamster tongue. interestingly, the ventral side epithelial cells have a much higher level of ace expression than those of the dorsal side (figure ) . the absence of ace expression in our immunoblot analysis (figure ) could be due to the small contribution of cellular proteins to the total tissue lysate (also a possible reason for the low level of β -actin detection). together, our study has provided a comprehensive picture of ace expression patterns in different tissues of hamster. 
we believe that this information will be instrumental in optimal severe acute respiratory syndrome coronavirus infection of golden syrian hamsters simulation of the clinical and pathological manifestations of coronavirus disease (covid- ) in golden syrian hamster model: implications for disease pathogenesis and transmissibility pathogenesis and transmission of sars-cov- in golden hamsters surgical mask partition reduces the risk of non-contact transmission in a golden syrian hamster model for coronavirus disease (covid- ) isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure, function, and antigenicity of the sars-cov- spike glycoprotein inhibition of sars-cov- infections in engineered human tissues using clinical-grade soluble human ace bioengineered probiotics to control sars-cov- infection spike protein recognition of mammalian ace predicts the host range and an optimized ace for sars-cov- infection interactions of coronaviruses with ace , angiotensin ii, and ras inhibitors-lessons from available evidence and insights into covid- macrophage migration inhibitory factor of syrian golden hamster shares structural and functional similarity with human counterpart and promotes pancreatic cancer characterization and use of hapt -derived homologous tumors as a preclinical model to evaluate therapeutic efficacy of drugs against pancreatic tumor desmoplasia covid- and lung pathology sars-cov- receptor ace and tmprss are primarily expressed in bronchial transient secretory cells tissue distribution of ace protein, the functional receptor for sars coronavirus. 
a first step in understanding sars pathogenesis sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes digestive system is a potential route of covid- : an analysis of single-cell coexpression pattern of key proteins in viral entry process the protein expression profile of ace in human tissues cd l (l-sign) is a receptor for severe acute respiratory syndrome coronavirus exploring the pathogenesis of severe acute respiratory syndrome (sars): the tissue distribution of the coronavirus (sars-cov) and its putative receptor, angiotensin-converting enzyme (ace ) genetic roadmap for kidney involvement of severe acute respiratory syndrome coronavirus (sars-cov- ) infection human kidney is a target for novel severe acute respiratory syndrome coronavirus (sars-cov- ) infection persistence and clearance of viral rna in novel coronavirus disease rehabilitation patients a pilot study to investigate the fecal dissemination of sars-cov- virus genome in covid- patients in odisha potential fecal transmission of sars-cov- : current evidence and implications for public health sars-cov- productively infects human gut enterocytes evidence for gastrointestinal infection of sars-cov- diarrhoea may be underestimated: a missing link in novel coronavirus evidence of the covid- virus targeting the cns: tissue distribution, host-virus interaction, and proposed neurotropic mechanisms high expression of ace receptor of -ncov on the epithelial cells of oral mucosa the authors are thankful to director, ils, bhubaneswar for his support. vs, dp and apm are recipients of council of scientific and industrial research (csir) students' research fellowship, government of india. we sincerely acknowledge the technical supports given by mr. madan mohan mallick and mr. bhabani sahoo, ils. key: cord- -jlzufa authors: lee, sungyul; lee, young-suk; choi, yeon; son, ahyeon; park, youngran; lee, kyung-min; kim, jeesoo; kim, jong-seo; kim, v. 
narry title: the sars-cov- rna interactome date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: jlzufa sars-cov- is an rna virus whose success as a pathogen relies on its ability to repurpose host rna-binding proteins (rbps) to form its own rna interactome. here, we developed and applied a robust ribonucleoprotein capture protocol to uncover the sars-cov- rna interactome. we report host factors that directly bind to sars-cov- rnas including general antiviral factors such as zc hav , trim , and parp . applying rnp capture on another coronavirus hcov-oc revealed evolutionarily conserved interactions between viral rnas and host proteins. network and transcriptome analyses delineated antiviral rbps stimulated by jak-stat signaling and proviral rbps responsible for hijacking multiple steps of the mrna life cycle. by knockdown experiments, we further found that these viral-rna-interacting rbps act against or in favor of sars-cov- . overall, this study provides a comprehensive list of rbps regulating coronaviral replication and opens new avenues for therapeutic interventions. subgenomic rnas (sgrnas) (kim et al., a) . all canonical viral (+) rnas share the common ′ end sequence called the leader sequence and the ′ end sequences. the sgrnas are generated via discontinuous transcription which leads to the fusion between the ′ leader sequence and the "body" parts containing the downstream open reading frames (sola et al., ) that encode structural proteins (s, e, m, and n) and accessory proteins ( a, c, , a, b, , and b) (kim et al., a) . to accomplish this, coronaviruses employ unique strategies to evade, modulate, and utilize the host machinery (fung and liu, ) . for example, the grna molecules must be kept in an intricate balance between translation, transcription, and encapsulation by recruiting the right host rna-binding proteins (rbps) and forming specific ribonucleoprotein (rnp) complexes. 
as host cells counteract by launching rbps such as rig-i, mda , and toll-like receptors (tlrs) to recognize and eliminate viral rnas, the virus needs to evade the immune system using its components to win the arms race between virus and host. how such stealthy devices are genetically coded in this compact rna genome is yet to be explored (snijder et al., ) . thus, the identification of the rbps that bind to viral transcripts (or the sars-cov- rna interactome) is key to uncovering the molecular rewiring of viral gene regulation and the activation of antiviral defense systems. biochemical techniques for studying rna-protein interactions have been developed (ramanathan et al., ) with the advancement in protein-centric methods such as clip-seq (crosslinking immunoprecipitation followed by sequencing) . in clip-seq experiments, rnp complexes are crosslinked by uv irradiation within cells to identify direct rna-protein interactions. the protein of interest is immunoprecipitated to identify the associated rnas (lee and ule, ; van nostrand et al., ) . more recently, rna-centric methods have also been developed to profile the mrna interactome and rnp complexes (roth and diederichs, ) . after uv irradiation, the rna of interest is purified with oligonucleotide probes and the crosslinked proteins are identified by mass spectrometry. for example, rap-ms exhibits compelling evidence of highly confident profiling of proteins that bind to a specific rna owing to a combination of long hybridization probes and harsh denaturing condition (engreitz et al., ; mchugh et al., ) . in this study, we developed a robust rnp capture protocol to define the repertoire of viral and host proteins that associate with the transcripts of coronaviruses, namely sars-cov- and hcov-oc . network and transcriptome analyses combined with knockdown experiments revealed host factors that link the viral rnas to mrna regulators and putative antiviral factors. 
to identify the viral and host proteins that directly interact with the genomic and subgenomic rnas of sars-cov- , we modified the rna antisense purification coupled with mass spectrometry (rap-ms) (mchugh and guttman, ) protocol, which was developed to profile the interacting proteins of a particular rna species ( figure a) . briefly, cells were first detached from culture vessels and then irradiated with nm uv to induce rna-protein crosslinks while preserving rna integrity. crosslinked cells were treated with dnase and lysed with an optimized buffer condition to homogenize and denature the proteins in high concentration. massive pools of biotinylated antisense -nt probes were used to capture the denatured rnp complexes in a sequence-specific manner. after stringent washing and detergent removal, the rnp complexes were released and digested by serial benzonase and on-bead trypsin treatment. these modifications to the rap-ms protocol enabled robust and sensitive identification of proteins directly bound to the rna target of interest (see methods for detailed explanation). we designed two separate pools of densely overlapping -nt antisense probes to achieve an unbiased perspective of the sars-cov- rna interactome ( figure b and table s ). the sars-cov- transcriptome consists of ( ) a genomic rna (grna) encoding nonstructural proteins (nsps) and ( ) multiple subgenomic rnas (sgrnas) that encode structural and accessory proteins (sola et al., ) . the sgrnas are more abundant than the grna (kim et al., a) . the first pool ("probe i") consists of oligos tiled every nucleotides across the orf ab region ( : , nc_ . ) and thus hybridizes specifically with the grna molecules ( figure b) . the second pool of oligos ("probe ii") covers the remaining region ( : , nc_ . ) which is shared by both the grna and sgrnas. 
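the tiling design described above can be sketched in a few lines. this is an illustration under stated assumptions, not the authors' probe-design code: the probe length and tiling step are elided in the text, so `probe_len` and `step` are left as parameters, and the function names and the toy target sequence are hypothetical. sequences use the dna alphabet of a reference record.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Antisense (reverse-complement) strand of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_antisense_probes(target, probe_len, step):
    """Return (start, probe) pairs for antisense probes tiled along target.

    Windows of probe_len nucleotides are taken every step nucleotides; a
    step smaller than probe_len gives overlapping probes. A final window
    that would run past the end of the target is dropped.
    """
    probes = []
    for start in range(0, len(target) - probe_len + 1, step):
        window = target[start:start + probe_len]
        probes.append((start, reverse_complement(window)))
    return probes


# Toy 10-nt target with 4-nt probes every 3 nt (values chosen for display only).
print(tile_antisense_probes("aaacccgggt", probe_len=4, step=3))
```

restricting `target` to the region unique to the grna, or to the region shared with the sgrnas, would correspond to the two pools described in the text.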
to first check whether our method specifically captures the viral rnp complexes, we compared the resulting purification from vero cells infected with sars-cov- (betacov/korea/kcdc / ) at moi . for hours (kim et al., b ) by either probe i or probe ii. as negative controls, we pulled-down without probes ("no probe" control) or with the control probes (for either s or s rrna). protein composition of each rnp sample was distinct as shown by silver staining and western blotting ( figure s a ) with prominent sars-cov- n protein associated with probes i and ii, as expected. enrichment of sars-cov- rnas were confirmed by rt-qpcr ( figure s b ), suggesting that our protocol purfies specific rnp complexes. note that sars-cov- grna was not enriched in the probe ii experiment, hinting at the excess amount of sgrnas over grna in our culture condition. we conducted label-free quantification (lfq) by liquid chromatography with tandem mass spectrometry (lc-ms/ms) and identified host proteins and viral proteins in total ( figure c ). as highly abundant proteins may nonspecifically co-precipitate during the rnp capture experiment, we statistically modelled this protein background as a multinomial distribution and assessed the probability (i.e. p-value) of the quantity of the identified protein in the rnp capture (e.g. probe i) experiment over the protein background of the control (e.g. no-probe) experiment (see methods for details). this unweighted spectral count analysis resulted in and proteins that are overrepresented in the probe i and probe ii sample, respectively (fdr < %, table s ). protein domain enrichment analysis revealed that these proteins indeed harbor rnabinding domains such as rna recognition motif (rrm) domain and k homology (kh) domain ( figure s c ). 
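the background model described above can be illustrated with a small sketch. the authors' exact procedure is in their methods, which are not part of this section, so everything here is an assumption made for illustration: under a multinomial background, each protein's count in the capture sample is treated as binomial with a background proportion estimated from the control, a pseudocount keeps proteins absent from the control at nonzero probability, and benjamini-hochberg adjustment stands in for the fdr step. function names and the toy counts are hypothetical.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation (small n only)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_pvalues(capture_counts, control_counts, pseudocount=1.0):
    """One-sided p-value per protein for excess spectral counts in the capture."""
    proteins = sorted(set(capture_counts) | set(control_counts))
    control_total = sum(control_counts.get(x, 0) + pseudocount for x in proteins)
    capture_total = sum(capture_counts.get(x, 0) for x in proteins)
    pvalues = {}
    for x in proteins:
        # Background proportion of this protein, estimated from the control.
        background = (control_counts.get(x, 0) + pseudocount) / control_total
        pvalues[x] = binomial_sf(capture_counts.get(x, 0), capture_total, background)
    return pvalues


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned as a dict."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    total = len(ranked)
    adjusted, running_min = {}, 1.0
    for rank in range(total, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * total / rank)
        adjusted[name] = running_min
    return adjusted


# Toy spectral counts (invented): one protein dominates the capture sample.
capture = {"protein_x": 50, "background_1": 5, "background_2": 5}
control = {"protein_x": 1, "background_1": 10, "background_2": 10}
print(benjamini_hochberg(enrichment_pvalues(capture, control)))
```

proteins whose adjusted value falls below the chosen fdr threshold would be called overrepresented; direct summation of the binomial tail is only practical for modest total counts.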
of note, unlike the cellular mrna interactome (castello et al., ; gerstberger et al., ) , the rna-binding repertoire of sars-cov- rnas showed a depletion of dead/deah box helicase domains and an enrichment of kh domain. as for viral proteins, the n protein was the most strongly enriched one, as expected ( figure d ). the nsp protein was also statistically enriched in both probe i and probe ii experiments. nsp , s, m, and nsp were detected more with probe i than with probe ii. coronavirus nsp is a single-strand rna binding protein (egloff et al., ; sutton et al., ) essential for viral replication (miknis et al., ) . nsp is one of the major virulence factors that suppresses host translation by binding to the s ribosomal subunit (thoms et al., ) . while nsp is mostly studied in the context of host gene expression (narayanan et al., ) , our result hints at the direct role of nsp on the transcripts of sars-cov- . to delineate the host proteins that are enriched in the sars-cov- rnp complex, we employed an additional negative control experiment with uninfected cells ( figure e ). in effect, this control provides a conservative background of host proteins as shown by silver staining ( figure s d ). distributions of peptide length were consistent across technical replicates ( figure s e ), demonstrating the robustness of the"on-bead" digestion step. spectral count analysis against the "uninfected probe control" resulted in and proteins that are enriched in the infected samples with probe i and probe ii, respectively (fdr < %, figure f -h). in combination, we define these proteins as the "sars-cov- rna interactome." host proteins such as csde (unr), eif h, fubp , g bp , pabpc , zc hav were enriched in both the probe i and probe ii rnp capture experiments on infected cells ( figure f ), thus identifying a robust set of the "core sars-cov- rna interactome." 
gene ontology (go) term enrichment analysis revealed that these host factors are involved in rna stability control, mrna function, and viral process ( figure s f ). to investigate the evolutionary conservation of the rna-protein interactions in coronaviruses, we conducted rnp capture on hcov-oc that belongs to the lineage a of genus betacoronavirus. hcov-oc shows . % nucleotide homology to sars-cov- which belongs to lineage b. we profiled the hcov-oc rnp complexes at multiple time points: , figure s d ), indicating the specificity of the coprecipitated rbps. fourteen viral proteins were detected within the hcov-oc rnp complexes ( figure b , figure s e ). specifically, lfq intensities for viral structural proteins n, m, and s increased over time more evidently in the grna-hybridizing probe i experiment ( figure b ). hcov-oc a, an accessory protein unique to betacoronavirus lineage a, was also detected and increased over time, indicating that this protein of unknown function may act as an rbp. the rdrp nsp and the papain-like protease nsp also increased along with the other nsps identified in this experiment. only a marginal amount of the hcov-oc nsp was detected ( figure s e ), implying the functional divergence of nsp in betacoronavirus lineages a and b. next, we compared the host factors that form the viral rna interactome of hcov-oc and sars-cov- . all proteins from the sars-cov- interactome were also detected in the hcov-oc interactome throughout multiple infection timepoints, except for rbms and ddx y ( figure c ), indicating that many of the same host proteins interact with rnas of both hcov-oc and sars-cov- . to determine the core host factors that are conserved in the coronavirus rna interactomes, we applied our spectral count analysis on the hcov-oc experiment of hpi ( figure s a ) and conducted statistical analysis in comparison to the noprobe control ( figure d , fdr < %) and the uninfected-probe control ( figure e , fdr < %). 
we identified and host proteins for the hcov-oc probe i and probe ii experiments, respectively (table s ) . proteins were statistically enriched in both probe sets. go term enrichment analysis revealed that these host proteins are involved in transcriptional regulation, rna processing, and rna stability control ( figure s f to understand the regulatory significance of the sars-cov- rna interactome, we compiled a list of "neighboring" proteins that are known to physically interact with the factors identified in our study (see methods for details). in particular, we generated a physical interaction network centered (or seeded) by the core sars-cov- interactome ( figure s a ). network analysis revealed several network hubs (e.g. npm and pabpc ) and two highly connected network modules: the ribosomal subunits and the eif complex. go term enrichment analysis resulted in translation-related biological processes ( figure s b ), most likely due to the overrepresentation of ribosomal proteins and subunits of the eif complex, which reflects the active translational status of viral mrnps. to achieve a more indepth functional perspective of the rna interactome, we reconstructed the physical interaction network with the sars-cov- rna interactome but excluding ribosomal proteins and eif proteins ( figure a ). this analysis identified additional hub proteins such as trim , sqstm , and khdrbs . go term enrichment analysis revealed multiple steps of the mrna life cycle such as mrna splicing, mrna export, mrna stability, and stress granule assembly ( figure b ), suggesting these mrna regulators are co-opted to assist the viral life cycle. interestingly, we also found go terms related to viral processes and innate immune response. in terms of intracellular localization, the sars-cov- rna interactome is enriched by proteins localized in the paraspeckle and cytoplasmic rnp granule (e.g. 
stress granule) compared to the cellular mrna interactome ( figure c ) (baltz et al., ; castello et al., ). these observations suggest that the regulatory mechanisms of viral rnas are distinct from those of host mrnas and involve activation of host antiviral machinery and sequestration of viral rnas.

to measure the impact of these host proteins on coronavirus rnas, we conducted knockdown experiments and infected calu- cells with sars-cov- ( figure a and b). calu- cells are human lung epithelial cells and are often used as a model system for coronavirus infection (sims et al., ). strategically, we selected a subset of the sars-cov- rna interactome that covers a broad range of the functional modules identified above: jak-stat signaling, mrna transport, mrna stability, and translation. knockdown of host factors that are stimulated by sars-cov- infection or ifnβ treatment, namely parp , trim , zc hav , celf , and shfl, led to increased viral rnas ( figure c ). this result suggests that these rbps, which directly recognize the viral rnas, are induced by the interferon and jak-stat signaling pathway to suppress coronaviruses. zc hav (zap/parp ) is an isg known to restrict the replication of many rna viruses such as hiv- (retroviridae), sindbis virus (togaviridae), and ebola (filoviridae) (goodier et al., ). zc hav was previously reported to recognize cpg and recruit decay factors to degrade hiv rnas (takata et al., ). zc hav acts as a cofactor for trim , an e ubiquitin ligase that promotes antiviral signaling mainly through rig-i (choudhury et al., ; gack et al., ). our knockdown results indicate that both zc hav and trim may act as antiviral factors against sars-cov- ( figure c ). sars-cov n protein was previously shown to interact with the spry domain of trim , interfering with the activation of rig-i (hu et al., ).
further investigation is needed to understand how zc hav and trim recognize and suppress sars-cov- transcripts and if sars-cov- n protein counteracts trim . parp , a cytoplasmic mono-adp-ribosylation (marylation) enzyme, is also known to have broad antiviral activity against rna viruses such as venezuelan equine encephalitis virus (togaviridae), vesicular stomatitis virus (rhabdoviridae), rift valley fever virus (phenuiviridae), encephalomyocarditis virus (picornaviridae), and zika virus (flaviviridae) by multiple mechanisms including blocking cellular rna translation (atasheva et al., ; welsby et al., ) or triggering proteasome-mediated destabilization of viral proteins (li et al., ) . adp ribosyltransferases are evolutionarily ancient tools used for host-pathogen interactions (fehr et al., ) . of note, coronavirus nsp carries a conserved macrodomain that can remove adpribose to reverse the activity of parp enzymes (fehr et al., ) . knockdown of parp and parp was shown to increase the replication of the macrodomain-deficient mouse hepatitis virus (mhv) which belongs to the lineage a of genus betacoronavirus (grunewald et al., ) which is consistent with our knockdown results ( figure c ). based on our rna interactome data, we hypothesize that the rna-binding activity of parp and its role in viral rna recognition may explain the underlying molecular mechanism of its antiviral activity against sars-cov- transcripts. other interferon-stimulated rbps may also be involved in host defense against sars-cov- . celf (cugbp elav-like protein family ) mediates alternative splicing and controls mrna stability and translation (konieczny et al., ) . celf is required for ifn -mediated suppression of simian immunodeficiency virus (retroviridae) (dudaronek et al., ) , but its involvement in other viral infections is unknown. shfl (shiftless/ryden) was induced prominently upon viral infection and interferon treatment, and suppressed by jak inhibitor (figure ) . 
shfl suppresses the translation of diverse rna viruses, including dengue virus (flaviviridae) and hiv (retroviridae) (balinsky et al., ; suzuki et al., ; wang et al., ) . under our experimental condition, upregulation of viral rna was only modest in celf and shfl-depleted cells, but further examination is needed as the knockdown efficiency of isgs were low in infected cells ( figure s b ), likely because virus-induced interferon response compromised gene silencing. apart from the above rbps, we identified multiple host factors that have not been previously described in the context of viral infection. in particular, larp depletion resulted in a substantial upregulation of viral rnas ( figure c ), suggesting that larp may have an antiviral function. larp stabilizes ′ top mrnas encoding ribosomal proteins and translation factors, which contain the ′ terminal oligopyrimidine ( ′ top) motif in the ′ utr (philippe et al., ) . larp also represses the translation of ′ top mrnas in response to metabolic stress in an mtorc -dependent manner (hong et al., ; lahr et al., ) . while much is unknown regarding the role of larp during viral infection, a recent proteomics study showed that sars-cov- n protein binds to larp (gordon et al., ) . based on our rnp capture and knockdown results, it is conceivable that larp may interfere with viral translation directly via viral rna interaction. the role of n protein in the context of larp -dependent viral suppression warrants further investigation. of note, we cannot exclude the possibility that larp indirectly regulates sars-cov- rnas via the control of ′ top mrnas encoding translation machinery. in fact, the sars-cov- rna interactome includes specific components of the s and s ribosomal subunits and translational initiation factors ( figure f -h). knockdown experiments indicated that ribosomal proteins (rps and rps ) and translation initiation factor eif h may have antiviral activities ( figure c ). 
eif h along with eif b is a cofactor for rna helicase eif a (rogers et al., ) whose depletion results in rna granule formation (tauber et al., ). eif h and eif b were both identified in the core sars-cov- rna interactome ( figure f ). eif h also interacts with sars-cov- nsp (gordon et al., ), so it will be interesting to investigate the functional consequence of the eif h-nsp interaction. together, our observations suggest that sars-cov- infection may be closely intertwined with the regulation of ribosome biogenesis, metabolic rewiring, and global translational control. other translation factors eif a, eif d, and csde exhibited proviral effects ( figure c ). eif a is the rna-binding component of the mammalian eif complex and is evolutionarily conserved along with eif b and eif c (masutani et al., ). eif d is known to interact with the mrna cap and is required for specialized translation initiation. csde (unr) is required for ires-dependent translation in human rhinovirus (picornaviridae) and poliovirus (picornaviridae) (anderson et al., ; boussadia et al., ). in all, our findings suggest that sars-cov- may recruit eif d and csde to regulate cap-dependent and ires-dependent translation initiation, respectively (lee et al., ), of sars-cov- grna and sgrnas.

lastly, the coronaviral rna interactomes are enriched with rbps with kh domains ( figure s d and figure s b ), unlike the mrna interactome. depletion of fubp (marta ) and hdlbp (vigilin) increased the viral rna levels ( figure c ), hinting at a potential antiviral role of proteins containing kh domains. hdlbp is a conserved protein that contains kh domains and has been implicated in viral translation of dengue virus (ooi et al., ). fubp was enriched in all four rnp capture experiments (i.e. sars-cov- probe i/ii and oc probe i/ii) ( figure d ). fubp is a nuclear protein with kh domains that binds to the ′ utr of cellular mrnas, regulating mrna localization (blichenberg et al., ; mukherjee et al., ).
its connection to the life cycle of coronavirus is unknown to our knowledge.

our current study reveals a broad spectrum of antiviral factors such as trim , zc hav , parp , and shfl and also many rbps whose

along with proteins regulating rnas, it would also be interesting to consider the possibility of 'riboregulation' (hentze et al., ) in which rna controls its interacting proteins. dengue virus, for example, uses its subgenomic rna called sfrna to sequester trim (chapman et al., ). the sgrna/grna ratio is a critical determinant of epidemic potential of dengue virus (manokaran et al., ). notably, coronaviruses including sars-cov- produce substantial amounts of noncanonical sgrnas that may serve as noncoding decoys to interact with host rbps to modulate host immune responses (kim et al., a).

genome-wide crispr screen , and off-label drug screening have all provided invaluable insights into the underlying biology of this novel human coronavirus.

the authors declare no competing interests.

calu- cells hours after infection at . moi. data are represented as mean ± s.e.m. (n = independent experiments). s rrna was used for normalization.

by scanning the genomic rnas of sars-cov- (ncbi refseq accession nc_ . ) and hcov-oc (genbank accession ay . ) from head to tail, partially overlapping nt tiles were enumerated. these tiles were designed to have nt spacing, so adjacent tiles share a subsequence of nt. to avoid ambiguous targeting, tiles were aligned to the human transcriptome (version of oct , ) using bowtie (langmead and salzberg, ) and multi-mapped sequences were discarded. to prepare biotinylated antisense oligonucleotides (asos) in bulk, the sequence elements for in vitro transcription (ivt), reverse transcription (rt) and pcr were added to the nt tiles. the t promoter ( ′-taa tac gac tca cta tag gg- ′) and a pad for rt priming ( ′-tgg aat tct cgg gtg cca agg- ′) were added to the head and tail of each tile, respectively.
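the tile enumeration and template assembly described above can be sketched in a few lines of python. this is a minimal illustration and not the authors' pipeline: the function names, the toy genome, and the tile length and spacing passed in the example are placeholders (the actual values are not legible in this text), and the bowtie filtering step is omitted. only the two adapter sequences are taken from the text.

```python
# Adapter sequences as quoted in the text (spaces removed, upper-cased).
T_PROMOTER = "TAATACGACTCACTATAGGG"   # promoter added to the head of each tile
RT_PAD = "TGGAATTCTCGGGTGCCAAGG"      # RT priming pad added to the tail


def enumerate_tiles(genome, tile_len, spacing):
    """Return (start, tile) pairs for full-length tiles taken every `spacing` nt.

    Adjacent tiles overlap by tile_len - spacing nucleotides.
    `tile_len` and `spacing` are caller-supplied (hypothetical values here).
    """
    if tile_len <= 0 or spacing <= 0:
        raise ValueError("tile_len and spacing must be positive")
    last_start = len(genome) - tile_len
    return [(s, genome[s:s + tile_len]) for s in range(0, last_start + 1, spacing)]


def build_template(tile):
    """Add the promoter to the head and the RT pad to the tail of a tile."""
    return T_PROMOTER + tile + RT_PAD


if __name__ == "__main__":
    toy_genome = "ACGTACGTAC"  # made-up sequence, not a viral genome
    for start, tile in enumerate_tiles(toy_genome, tile_len=4, spacing=2):
        print(start, tile, build_template(tile))
```

in a real design, each tile would additionally be aligned against the host transcriptome and dropped if it maps to more than one location, as the text describes.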
we grouped asos into two sets for each viral genome: table s . aso templates were amplified using kapa hifi hotstart readymix (roche) and pcr primers for an aso pool. pcr products were purified by qiaquick pcr purification kit (qiagen). rna intermediates were then transcribed using x megascript t transcription kit (invitrogen), and dna templates were degraded by turbo dnase (invitrogen). to clean up enzymes and other reagents, . × reaction volume of ampure xp (beckman) was applied and polyethylene glycol was added to be final %. the size selection was carried out according to the manufacturer's protocol. biotinylated asos were synthesized by revertaid reverse transcriptase (thermo scientific) and ′ biotin-teg primer. rna intermediates were hydrolyzed at . m naoh and neutralized with acetic acid. finally, aso purification was performed in the same manner as ivt rna selection. the primer sequences used for pcr and reverse transcription are listed in table s . the uniprot reference proteome sets for human (up , canonical, swissprot) and african green monkey (chlorocebus sabaeus; up , canonical, swissprot and trembl) were used to identify host proteins in each mass spectrometry experiment (version / / ) (uniprot consortium, ). the reference proteome set for the severe acute respiratory syndrome coronavirus (sars-cov- ) was manually curated largely based on the ncbi reference sequence (nc_ . ) and related literature of other accessory proteins (e.g. orf b, orf b and orf c). the reference proteome set for the human coronavirus oc (hcov-oc ) was compiled based on the uniprot swiss-prot proteins for hcov-oc (taxonomy: ) except for hcov-oc protein i which was separated into protein ia and protein ib (or n ) (vijgen et al., ) . 
virus experiments were carried out in accordance with the biosafety guideline by the korea

for total rna purification from virus-infected cells, ml trizol ls (invitrogen) was added to media-removed cell monolayers per single well of well plates, followed by on-column dna digestion and purification (zymo research). for rna purification from rnp capture samples, bead-captured rnas were digested with ng proteinase k (pcr grade, roche) and incubated at ˚c for hour, followed by rna isolation by trizol ls with glycoblue (invitrogen). ~ µg rna was reverse-transcribed using revertaid transcriptase (thermo scientific) and random hexamer. qpcr was performed with primer pairs listed in table s and powersybr green (applied biosystems) and analyzed with quantstudio (thermo scientific).

virus-infected cells were detached from culture vessels by trypsin and cell pellets were resuspended with ice-cold pbs. ml cell suspensions were dispersed in mm dishes and irradiated with nm uv at . j/cm using bio-link blx- for sars-cov- or . j/cm using spectrolinker xl- for hcov-oc . uv-crosslinked cells were pelleted by centrifugation

was used for both the peptide and protein level. the match-between-runs option was enabled with default parameters in the identification step. finally, lfq was performed for those with a minimum ratio count of .

to identify host and viral proteins that interact with the particular rna species of interest (e.g. sgrna or grna), we utilized the results from the "bead only" and "probe only" samples as technical backgrounds. specifically, the "bead only" (or no-probe) experiment in infected cells was used to account for non-specific interactors and biotin-containing carboxylases (e.g. pcca, acaca, and acacb) and to determine the set of host and viral proteins that in a broad sense bind to the rna, which we call probe i/ii "binding" proteins. the probe experiment in uninfected cells (i.e.
"probe only") was then used as the technical background against target rnaindependent interactors and determine the set of host proteins that are enriched for the target rna, which we call probe i/ii "enriched" proteins. to accomplish this, we considered the protein spectra count data as a multinomial distribution and applied a statistical test for spectra count enrichment. specifically, let n p be the number of identified spectra counts for protein group p from the case experiment (e.g. probe i experiment in infected cells), and m p be the respective count number from the control experiment (e.g. noprobe experiment in infected cells we conducted enrichment analyses of gene ontology (go) terms (gene ontology consortium, ) by means of summarizing the function of tens of host proteins identified in the rnp capture experiment. in general, fisher's exact test is used to estimate the statistical significance of the association (i.e. contingency) between a particular go term and the gene set of interest. to improve the explanatory power of this analysis, we used the weight algorithm (alexa et al., ) from the topgo r package which accounts for the go graph structure and reduces local dependencies between go terms. detailed information of the gene ontology was from the go.db r package (version . . ), and go gene annotations were from the org.hs.eg.db r package (version . . ). we integrated protein-protein interaction data from the biogrid database (release . . ) (stark et al., ) and retrieved other proteins that do not necessarily bind to the sars-cov- rna but form either transient or stable physical interactions with the host proteins identified from the rnp capture experiments. in detail, we considered only human protein-protein interactions that were ( ) found from at least two different types of experiments and ( ) reported by at least three publication records which resulted in a total of , interactions covering , human proteins. 
physical interactions between sars-cov- proteins and human proteins were identified by affinity capture and mass spectrometry in sars-cov- protein expressing cells (gordon et al., ). the network r package and the ggnet function of the ggally r package were used for graph visualization.

pfam database (version . ) (el-gebali et al., ) was used for protein domain enrichment analysis. taxon (human) and taxon (green monkey) protein domain annotations were used to analyze rnp capture results of hcov-oc and sars-cov- , respectively. one-sided fisher's exact test was applied to estimate the statistical enrichment of a particular protein domain for the specific gene set (e.g. sars-cov- probe i binding proteins). we utilized the set of all proteins identified in the rnp capture experiments and all protein domains annotated to those proteins as the statistical background of the enrichment analysis.

to investigate the subcellular localizations of the sars-cov- interactome, we leveraged the protein subcellular localization information from the human cell map database v (go et al., ). information from the safe algorithm was used primarily but then supplemented by information from the nmf algorithm in case of "no prediction" or "-" localizations. localization terms of the nmf algorithm were matched to terms of the safe algorithm in general, but a few were mapped to the higher term of the safe algorithm. for example, the "cell junction" term of the nmf algorithm was merged into the "cell junction, plasma membrane" term of the safe algorithm.
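the one-sided fisher's exact test used for domain enrichment is equivalent to an upper-tail hypergeometric probability, which can be sketched with the python standard library alone. this is a minimal illustration under stated assumptions, not the authors' code: the function and argument names are hypothetical, and the sketch does not implement the topgo weight algorithm that the go term analysis relied on.

```python
from math import comb


def fisher_one_sided(hits, set_size, annotated, background):
    """Upper-tail p-value for over-representation of an annotation.

    hits:       proteins in the gene set that carry the annotation
    set_size:   size of the gene set of interest
    annotated:  proteins in the background that carry the annotation
    background: total number of proteins in the statistical background
    Returns P(X >= hits) for X hypergeometrically distributed.
    """
    if not (0 <= hits <= set_size <= background and 0 <= annotated <= background):
        raise ValueError("inconsistent counts")
    total = comb(background, set_size)
    upper = min(set_size, annotated)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, set_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    print(fisher_one_sided(hits=4, set_size=5, annotated=5, background=10))
```

a small p-value indicates that the annotation occurs in the gene set more often than expected when drawing the same number of proteins at random from the background.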
relative viral rna level amplicon nsp n figure improved scoring of functional groups from gene expression data by decorrelating go graph structure internal initiation of translation from the human rhinovirus- internal ribosome entry site requires the binding of unr to two distinct sites on the ' untranslated region interferon-stimulated poly(adp-ribose) polymerases are potent inhibitors of cellular translation and virus replication irav (flj ), an interferon-stimulated gene with antiviral activity against dengue virus, interacts with mov the mrna-bound proteome and its global occupancy profile on protein-coding transcripts imbalanced host response to sars-cov- drives development of covid- identification of a cis-acting dendritic targeting element in map mrnas proteomics of sars-cov- -infected host cells reveals therapy targets the global phosphorylation landscape of sars-cov- infection unr is required in vivo for efficient initiation of translation from the internal ribosome entry sites of both rhinovirus and poliovirus in vitro reconstitution of sars-coronavirus mrna cap methylation insights into rna biology from an atlas of mammalian mrna-binding proteins the structural basis of pathogenic subgenomic flavivirus rna (sfrna) production drug repurposing screen for compounds inhibiting the cytopathic effect of sars-cov- trim and its emerging rna-binding roles in antiviral defense systematic discovery of xist rna binding proteins maxquant enables high peptide identification rates, individualized ppb-range mass accuracies and proteome-wide protein quantification andromeda: a peptide search engine integrated into the maxquant environment cugbp is required for ifnbetamediated induction of dominant-negative cebpbeta and suppression of siv replication in macrophages the severe acute respiratory syndrome-coronavirus replicative protein nsp is a single-stranded rna-binding subunit unique in the rna virus world the pfam protein families database in the xist lncrna exploits 
three-dimensional genome architecture to spread across the x chromosome the nsp macrodomain promotes virulence in mice with coronavirus-induced encephalitis the impact of parps and adp-ribosylation on inflammation and host-pathogen interactions the coding capacity of sars-cov- human coronavirus: host-pathogen interaction trim ring-finger e ubiquitin ligase is essential for rig-imediated antiviral activity creating the gene ontology resource: design and implementation a census of human rna-binding proteins a proximity biotinylation map of a human cell the broad-spectrum antiviral protein zap restricts human retrotransposition a sars-cov- protein interaction map reveals targets for drug repurposing commentary: middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group the coronavirus macrodomain is required to prevent parp-mediated inhibition of virus replication and enhancement of ifn expression impaired type i interferon activity and inflammatory responses in severe covid- patients a new virus isolated from the human respiratory tract a brave new world of rnabinding proteins larp functions as a molecular switch for mtorc -mediated translation of an essential class of mrnas the severe acute respiratory syndrome coronavirus nucleocapsid inhibits type i interferon production by interfering with trim -mediated rig-i ubiquitination dynamic rewiring of the human interactome by interferon signaling the architecture of sars-cov- transcriptome identification of coronavirus isolated from a patient in korea with covid- mbnl proteins and their target rnas, interaction and splicing regulation la-related protein (larp ) binds the mrna cap the molecular biology of coronaviruses comparative analysis of rna genomes of mouse hepatitis viruses structure of the full sars-cov- rna genome in infected cells fast gapped-read alignment with bowtie advances in clip technologies for studies of protein-rna interactions eif d is an mrna capbinding 
protein that is required for specialized translation initiation regulation mechanisms of viral ires-driven translation parp suppresses zika virus infection through parp-dependent degradation of ns and ns viral proteins dengue subgenomic rna binds trim to inhibit interferon expression for epidemiological fitness reconstitution reveals the functional core of mammalian eif rap-ms: a method to identify proteins that interact directly with a specific rna molecule in cells the xist lncrna interacts directly with sharp to silence transcription through hdac chromosomes. a comprehensive xist interactome reveals cohesin repulsion and an rna-directed chromosome conformation β -actin mrna interactome mapping by proximity biotinylation severe acute respiratory syndrome coronavirus nsp suppresses host gene expression, including that of type i interferon, in infected cells an rna-centric dissection of host complexes controlling flavivirus infection coronavirus as a possible cause of severe acute respiratory syndrome coronaviruses post-sars: update on replication and pathogenesis la-related protein (larp ) repression of top mrna translation is mediated through its cap-binding domain and controlled by an adjacent regulatory region methods to study rna-protein interactions modulation of the helicase activity of eif a by eif b, eif h, and eif f molecular biology: rap and chirp about x inactivation interplay between sars-cov- and the type i interferon response fundamental properties of the mammalian innate immune system revealed by multispecies comparison of type i interferon responses sars-cov replication and pathogenesis in an in vitro model of the human conducting airway epithelium the nonstructural proteins directing coronavirus rna synthesis and processing continuous and discontinuous rna synthesis in coronaviruses biogrid: a general repository for interaction datasets a sars-cov- bioid-based virus-host membrane protein interactome and virus peptide compendium: new proteomics 
resources for covid- research the nsp replicase protein of sars-coronavirus, structure and functional insights characterization of ryden (c orf ) as an interferon-stimulated cellular inhibitor against dengue virus replication cg dinucleotide suppression enables antiviral defence targeting non-self rna modulation of rna condensation by the dead-box protein eif a structural basis for translational shutdown and immune evasion by the nsp protein of sars-cov- the future of cross-linking and immunoprecipitation (clip) uniprot: a worldwide hub of protein knowledge a large-scale binding and functional map of human rna-binding proteins complete genomic sequence of human coronavirus oc : molecular clock analysis suggests a relatively recent zoonotic coronavirus transmission event regulation of hiv- gag-pol expression by shiftless, an inhibitor of programmed - ribosomal frameshifting genome-wide crispr screen reveals host genes that regulate sars-cov- infection parp , an interferon-stimulated gene involved in the control of protein translation and inflammation discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus key: cord- -dk pdqa authors: kuo, tsun-yung; lin, meei-yun; coffman, robert l; campbell, john d; traquina, paula; lin, yi-jiun; liu, luke tzu-chi; cheng, jinyi; wu, yu-chi; wu, chung-chin; tang, wei-hsuan; huang, chung-guei; tsao, kuo-chien; shih, shin-ru; chen, charles title: development of cpg-adjuvanted stable prefusion sars-cov- spike antigen as a subunit vaccine against covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dk pdqa the covid- pandemic caused by the novel coronavirus sars-cov- is a worldwide health emergency. the immense damage done to public health and economies has prompted a global race for cures and vaccines. 
in developing a covid- vaccine, we applied technology previously used for mers-cov to produce a prefusion-stabilized sars-cov- spike protein by adding two proline substitutions at the top of the central helix (s- p). to enhance immunogenicity and mitigate the potential vaccine-induced immunopathology, cpg , a th -biasing synthetic toll-like receptor (tlr ) agonist was selected as an adjuvant candidate. s- p was combined with various adjuvants, including cpg , and administered to mice to test its effectiveness in eliciting anti-sars-cov- neutralizing antibodies. s- p in combination with cpg and aluminum hydroxide (alum) was found to be the most potent immunogen and induced high titer of spike-specific antibodies in sera of immunized mice. the neutralizing abilities in pseudotyped lentivirus reporter or live wild-type sars-cov- were measured with reciprocal inhibiting dilution (id ) titers of and , respectively. in addition, the antibodies elicited were able to cross-neutralize pseudovirus containing the spike protein of the d g variant, indicating the potential for broad spectrum protection. a marked th- dominant response was noted from cytokines secreted by splenocytes of mice immunized with cpg and alum. no vaccine-related serious adverse effects were found in the dose-ranging study in rats administered single- or two-dose regimens with up to μg of s- p combined with cpg alone or cpg with alum. these data support continued development of cho-derived s- p formulated with cpg /alum as a candidate vaccine to prevent covid- disease. introduction covid- was first identified as a cause of severe pneumonia cases in december in association with a seafood market in wuhan, china [ ] . the viral agent was identified as a novel sars-like coronavirus (sars- cov- ) most closely related to bat coronavirus [ ] . 
in the six months since its first appearance, sars-cov- has become the largest pandemic since the influenza with nearly million infected and over , coronaviruses are among the largest known enveloped rna viruses and cause respiratory illnesses in humans ranging from the common cold to sars, mers, as well as the current covid- pandemic [ ] . similar to sars-cov, the spike (s) protein of sars-cov- is the receptor for attachment and cell entry via the cellular receptor hace [ ] . researchers are also adapting antigen design strategies used for sars-cov subunit vaccines such as the spike protein are often poorly immunogenic by themselves and therefore typically require adjuvants to enhance their ability to produce an immune response [ in this study, we present data from preclinical studies aimed at developing a covid- candidate subunit vaccine using cho cell-expressed sars-cov- s- p antigen combined with various adjuvants. we have shown that s- p, when mixed with cpg and aluminum hydroxide adjuvants, was most effective in inducing antibodies that neutralized pseudovirus and wild-type live virus while minimizing th -biased responses with no vaccine-related adverse effects. the plasmid expressing sars-cov- (strain wuhan-hu- genbank: mn ) s protein ectodomain ltd. animal studies were conducted in the testing facility for biological safety, tfbs bioscience inc., taiwan. all animal work was reviewed and approved by the institutional animal care and use committee (iacuc). the testing facility's iacuc animal study protocol approval numbers are tfbs - and tfbs - . neutralization titers of wild-type virus and pseudovirus and total anti-s igg titers were all found to be highly correlated with spearman's rank correlation coefficients greater than . ( figure ). we have successfully shown robust immunogenicity elicited by adjuvanted sars-cov- s- (figures , , s , and s ). 
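the rank correlation mentioned above can be illustrated with a short python sketch. it computes spearman's coefficient as the pearson correlation of average ranks; the function names and the toy titers are hypothetical and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Made-up titers for illustration only.
    pseudovirus = [10, 20, 40, 80]
    wild_type = [1, 3, 9, 27]
    print(spearman_rho(pseudovirus, wild_type))
```

because only ranks enter the calculation, the coefficient is insensitive to the scale of the titers, which suits dilution-based readouts.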
much stronger neutralizing antibody responses were detected in mice when g or g of s- p protein was adjuvanted with g of cpg and g of aluminum hydroxide than with either adjuvant alone (figure ). s- p in conjunction with cpg and aluminum hydroxide induced potent anti-s antibodies that were effective against wild-type virus (figures and ) . we have shown that high degrees of correlation although moderate il- production was detected in mice receiving g of s- p combined with cpg and aluminum hydroxide, the ifn-γ/il- ratio was -fold higher than those receiving g of s- p adjuvanted with aluminum hydroxide alone. these results suggested that cpg , even in the presence of aluminum hydroxide could steer the immune response away from th to a th response. moreover, these mice produce a limited amount of il- , which is a key mediator in eosinophil activation and major regulator of eosinophil accumulation in tissues [ ] . previous studies showed that the lung-infiltrating eosinophils were a common indication of th -biased immune responses seen in animal models testing sars-cov vaccine candidates [ ] . the finding that il- production was inhibited by the s- p adjuvanted with cpg plus aluminum hydroxide discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin arcgis dashboards. gisanddata.maps.arcgis.com. [retrieved responding to covid- -a once-in-a-century pandemic? funding the development and manufacturing of covid- vaccines: the need for global collective action draft landscape of covid- candidate vaccines - target product profiles for covid- vaccines - severe acute respiratory syndrome-related coronavirus-the species and its viruses, a statement of the coronavirus study group immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen cryo-em structure of the -ncov spike in the prefusion conformation recent advances of vaccine adjuvants for infectious diseases. 
immune network vaccine adjuvants: understanding the structure and mechanism of adjuvanticity. vaccine immunization with sars coronavirus vaccines leads to pulmonary immunopathology on challenge with the sars virus. plos one covid- vaccine design: the janus face of immune enhancement development of the cpg adjuvant : a case study culture-based virus isolation to evaluate potential infectivity of clinical specimens tested for covid- effect of mucosal and systemic immunization with virus-like particles of severe acute respiratory syndrome coronavirus in mice co-administration of a cpg adjuvant (vaximmunetm, cpg ) with cetp vaccines increased immunogenicity in rabbits and mice. human vaccines sars-cov- spike glycoprotein vaccine candidate nvx-cov elicits immunogenicity in baboons and protection in mice. biorxiv mammalian cell culture for production of recombinant proteins: a review of the critical steps in their biomanufacturing tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus murine eosinophil differentiation factor. an eosinophil-specific colony-stimulating factor with activity for human cells immunization with sars coronavirus vaccines leads to pulmonary immunopathology on challenge with the sars virus systems and their potential applications in hepatitis b vaccines key: cord- -djqzgc k authors: hao, siyuan; ning, kang; kuz, cagla aksu; vorhies, kai; yan, ziying; qiu, jianming title: long period modeling sars-cov- infection of in vitro cultured polarized human airway epithelium date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: djqzgc k severe acute respiratory syndrome coronavirus (sars-cov- ) replicates throughout human airways. the polarized human airway epithelium (hae) cultured at an airway-liquid interface (hae-ali) is an in vitro model mimicking the in vivo human mucociliary airway epithelium and supports the replication of sars-cov- . 
however, previous studies only characterized short-period sars-cov- infection in hae. in this study, continuously monitoring the sars-cov- infection in hae-ali cultures for a long period of up to days revealed that sars-cov- infection was long lasting with recurrent replication peaks appearing between an interval of approximately - days, which was consistent in all the tested hae-ali cultures derived from lung bronchi of independent donors. we also identified that sars-cov- does not infect hae from the basolateral side, and the dominant sars-cov- permissive epithelial cells are ciliated cells and goblet cells, whereas virus replication in basal cells and club cells was not detectable. notably, virus infection immediately damaged the hae, which is demonstrated by dispersed zonula occludens- (zo- ) expression without clear tight junctions and partial loss of cilia. importantly, we identified that sars-cov- productive infection of hae requires a high viral load of . × virions per cm of epithelium. thus, our studies highlight the importance of a high viral load and that epithelial renewal initiates and maintains a recurrent infection of hae with sars-cov- . introduction observed a cytoplasmic expression and a weak junction expression of zo- at dpi (fig. a ) and dpi (sfig. a) for infected hae-ali b - and hae-ali b - , respectively. these results demonstrate that a high viral load (at least > pfu (~ . × vgc) to an epithelium of . cm , which contains with ~ × epithelial cells, is necessary to initiate a productive infection. ciliated and goblet cells are permissive to sars-cov- but not the basal and club cells. we next examined sars-cov- infection in which the inoculation of a high moi of was applied to the basolateral side in hae-ali b - . the results showed there were no detectable infectious virions released from both the apical and basolateral sides (fig. a) . the teer of the infected hae displayed no significant changes over the course of days (fig. 
b), which was also evidenced by the well-preserved tight junctions (fig. c), as well as the rich cilia expression (fig. d). importantly, np+ cells were not detected for as long as dpi. positive anti-β-tubulin iv staining. consistent with previous imaging results (fig. ), most of the np+ cells were also positive for anti-β-tubulin iv staining (fig. a), whereas almost all the ckrt + basal cells were negative for anti-np staining (fig. c). probing secretoglobin family a member (scgb a ) expression for club cells and mucin ac (muc ac) expression for goblet cells revealed that the secretory cells were less abundant sub-populations in the infected hae-ali cultures. while we could not locate any club cells that stained positively for np (fig. d), we found some np+/muc ac+ goblet cells (fig. b). importantly, we observed that in the infected hae-ali, a subset of cykt + basal cells expressed ki ; however, in the mock-infected hae-ali, we did not find cells co-expressing both ki and cykt (fig. b). as ki is a marker of cell proliferation, this result suggested that sars-cov- infection activates basal cells towards proliferation. taken together, these lines of evidence confirmed that sars-cov- mainly infects ciliated cells of hae, as well as goblet cells, despite their lower abundance in hae-ali cultures. our study suggests that basal and club cells are not permissive to sars-cov- . our observation that sars-cov- was unable to infect epithelial cells from the basolateral side supports the view that the viral entry receptor ace is expressed in a polarized manner at the apical side , . we believe the finding that ckrt + basal cells are largely not infected by sars-cov- is important for understanding the course of sars-cov- infection in hae. airway basal cells are the epithelial cell type that is not present on the surface of the airway lumen; thus, they are not usually accessible to the virus on the apical side.
however, when the infection commences and epithelial damage occurs, the destructive mucosal lesions (and the death of the infected ciliated and goblet cells) would allow the virus to gain access to the basal cells (fig. c). indeed, the detectable virus shedding to the basolateral chamber (fig. a&c, sfig. a and sfig. a) indicates a possible window in which the basal cells are exposed to sars-cov- . notably, these time points also represent the peaks of the release of virus progeny. however, no ckrt + cells that were also np-positive were found in the cytospins prepared from sars-cov- infected hae at dpi, when the infection was at its lowest level. we hypothesize that the non-permissive nature of basal cells to sars-cov- is due to the negligible expression of ace or tmprss . the epithelial cell lining of the airways provides an efficient barrier against microbes and aggressive molecules through interdependent functions, including mechanical clearance of the mucus executed by movements of the cilia, a cellular barrier function by means of intercellular epithelial junctions formed by a set of tight junction-associated proteins such as zo , and homeostasis of ion transport. at the airway epithelial cellular level, the tight junction-associated proteins, such as zo , occludin, and claudins, play a central part in epithelial cytoprotection by maintaining a physical selective barrier between external and internal environments. the tight junction proteins are highly labile structures whose formation and structure may be very rapidly altered after airway injury, for example, airway inflammation. proinflammatory cytokines have a drastic effect on tight junction expression and barrier functions, which significantly alters the epithelial barrier permeability - . sars-cov- infection distorted the zo- expression, and thereafter caused barrier dysfunction (teer decrease).
the infection not only alters the zo- expression of infected (np + cells) but also uninfected cells (np-cells) (fig. ) . this is also true for the cilia loss. we believe that sars-cov- infection produces inflammatory cytokines as an innate immunity response upon virus infection , which either disturbs zo- and tubulin expression or alters their structures. the innate immunity response also limits virus infection at the front line. in fact, sars-cov- requires a high viral load (> pfu/cm of hae) to initiate a productive infection (fig. ) . of note, the infectious titer (pfu) was determined by plaque assay in vero-e cells, which are interferon-deficient . we determined that pfu of sars-cov- in vero-e cells has a particle (viral genome copy) number of , suggesting that a load of . x particles is required to productively infect cm of the airway epithelium, which is much higher than the small dna virus parvovirus human bocavirus (hbov ) we studied . hbov can infect hae at an moi of as low as . genome copies per cell, which equals . x particles per cm of the airway epithelium. epithelial regeneration involves migration of the basal cells that neighbor the acute injured area (e.g., virus-infected area), active dividing and squamous metaplasia, rapid redifferentiation to preciliated cells, and finally ciliogenesis towards a complete pseudostratified mucociliary epithelium . airway epithelium repair is critical for the maintenance of the barrier function and the limitation of airway hyperreactivity. in a biopsy study of fresh tracheas and lungs from five deceased covid- patients, it was found that the epithelium was severely damaged in some parts of the trachea, and extensive basal cell proliferation was observed in the trachea, where ciliated cells were damaged, as well as in the intrapulmonary airways . these data support our conclusion that basal cells are not permissive to sars-cov- . 
consistent with these previous findings, we observed a subset of proliferating basal cells in the sars-cov- infected hae-ali, but not in the mock-infected hae-ali (fig. b). thus, we hypothesize that sars-cov- infection induces basal cell proliferation and that this accounts for the observed long-lasting infections with recurrent peaks of viral replication; this hypothesis warrants future investigation. overall, we propose a model of sars-cov- infection of hae (fig. c)
and then cytocentrifuged at , rpm for min onto slides using a shandon cytospin cytocentrifuge. after cytocentrifugation, the slides were fixed overnight in % paraformaldehyde at °c. the fixed hae or dissociated cells were permeabilized with . % triton x- for min at room temperature. then, the slide was incubated with primary antibody in pbs with % fbs for h at °c. after washing, the membrane was incubated with fluorescein isothiocyanate- and rhodamine-conjugated secondary antibodies, followed by staining of the nuclei with dapi ( ', -diamidino- -phenylindole). online ahead of print covid- : towards controlling of a pandemic genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding transmission of -ncov infection from an an in vitro model of differentiated human airway epithelia.
methods for establishing primary cultures well-differentiated human airway epithelial cell cultures culturing the unculturable: human coronavirus hku infects, replicates, and produces progeny virions in human ciliated airway epithelial cell cultures systematic assembly of a full-length infectious clone of human coronavirus nl avian influenza virus glycoproteins restrict virus replication and spread through human airway epithelium at temperatures of the proximal airways human coronavirus e infects polarized airway epithelia from the apical surface infection and propagation of human rhinovirus c in human airway epithelial cells ace receptor expression and severe acute respiratory syndrome coronavirus infection depend on differentiation of human airway epithelia sars-cov replication and pathogenesis in an in vitro model of the human conducting airway epithelium characterization and treatment of sars-cov- in nasal and bronchial human airway epithelia an orally bioavailable broad-spectrum antiviral inhibits sars-cov- in human airway epithelial cell cultures and multiple coronaviruses in mice morphogenesis and cytopathic effect of sars-cov- infection in human airway epithelial cells cryo-em structure of the -ncov spike in the prefusion conformation. 
science tropism, replication competence, and innate immune responses of the coronavirus sars-cov- in human respiratory tract and conjunctiva: an analysis in ex-vivo and in-vitro cultures distinct stem/progenitor cells proliferate to regenerate the trachea, intrapulmonary airways and alveoli in covid- patients sars-cov- reverse genetics reveals a variable infection gradient in the respiratory tract cov- infection in long-term human distal lung organoid cultures airway epithelial repair, regeneration, and remodeling after injury in chronic obstructive pulmonary disease novel dynamics of human mucociliary differentiation revealed by single-cell rna sequencing of nasal epithelial cultures the ki- protein: from the known and the unknown sars-cov- entry factors are highly expressed in nasal epithelial cells together with innate immune genes sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor tmprss and tmprss promote sars-cov- infection of human small intestinal enterocytes new rules for club development: new insights into human the role and importance of club cells (clara cells) in the pathogenesis of some respiratory diseases the role of scgb a + clara cells in the long-term maintenance and repair of lung airway, but not alveolar, epithelium cigarette smoke exposure and inflammatory signaling increase the expression of the sars- cov- receptor ace in the respiratory tract role of claudin interactions in airway tight junctional permeability establishment of a reverse genetics system for studying human bocavirus in human airway epithelia parvovirus infection of human airway epithelia induces pyroptotic cell death via inhibiting type i and type iii ifn restrict sars-cov- infection of human defectiveness of interferon production and of rubella virus interference in a line of african green monkey kidney cells (vero) in vitro modeling of human bocavirus infection of polarized primary human airway epithelia & trump,b.f. 
the respiratory epithelium epidermoid metaplasia of hamster tracheal epithelium during regeneration following mechanical injury development of cystic fibrosis and noncystic fibrosis airway cell lines distinct classes of proteasome-modulating agents cooperatively augment recombinant adeno- associated virus type and type -mediated transduction from the apical surfaces of human airway epithelia unique biologic properties of recombinant aav transduction in polarized human airway epithelia human bocavirus infection of well-differentiated human airway epithelium replication of an autonomous human parvovirus in non-dividing human airway epithelium is facilitated through the dna damage and repair pathways real-time analysis of camp-mediated regulation of ciliary motility in single primary human airway epithelial cells establishment of a replicon system for bourbon virus and identification of small molecules that efficiently inhibit virus replication confocal images were taken at a magnification of x on the indicated days post-infection (dpi) nuclei were stained with dapi (blue) three-dimensional (z-stack) imaging of sars-cov- infected primary bronchial - -infected hae-ali b - cultures at dpi were co-stained with or with anti-np and anti-β-tubulin iv antibodies (b), or co- stained anti-np and anti-zo- antibodies (b). a set of confocal images were taken at a magnification of x from the stained pierce of epithelium image as shown in each channel of fluorescence. nuclei were stained with dapi (blue) virus release kinetics and transepithelial electrical resistance (teer) measurement of hae-ali infected with sars-cov- at various viral loads (multiplicities of infection) a&c) virus release kinetics. hae-ali b - cultures were infected with sars-cov- at at the indicated days post-infection (dpi), µl of apical washes by incubation of µl of d-pbs in the apical chamber and µl of the basolateral media were taken for plaque assays transepithelial electrical resistance measurement. 
the teer of mock-and sars-cov- - hae-ali culture were measured using an epithelial volt-ohm meter (millipore) at the indicated dpi. the teer values were normalized to the teer measured on the day of infection, which is set as . . values represent the mean of relative teer +/-standard deviations hae-ali b - cultures were infected with sars-cov- at an moi from . to . . at dpi, both virus and mock infected hae were co-stained with anti-np and anti-zo- antibodies (a), or co-stained with anti-np and anti-β-tubulin iv antibodies (b). confocal images were taken at a magnification of x . nuclei were stained with dapi (blue). key: cord- - mkf os authors: levasseur, anthony; delerce, jeremy; caputo, aurelia; brechard, ludivine; colson, philippe; lagier, jean-christophe; fournier, pierre-edouard; raoult, didier title: genomic diversity and evolution of coronavirus (sars-cov- ) in france from covid- -infected patients date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mkf os the novel coronavirus (sars-cov- ) causes pandemic of viral pneumonia. the evolution and mutational events of the sars-cov- genomes are critical for controlling virulence, transmissibility, infectivity, severity of symptoms and mortality associated to this infectious disease. we collected and investigated sars-cov- genomes from patients infected in france. detailed genome cartography of all mutational events (snps, indels) was reported and correlated to clinical features of patients. a comparative analysis between our sars-cov- genomes from french patients and the reference wuhan coronavirus genome revealed substitution mutations and six deletion events: ten were in ’/ ’ utr, were nonsynonymous, were synonymous and one generated a stop codon. six different deleted areas were also identified in nine viral variants. in particular, substitution mutations ( nonsynonymous) and one deletion (Δ - ) concerned the spike s glycoprotein. an average of . mutational events (+/- . 
sd) and a median of (range, - ) were reported per viral isolate. comparative analyses and clustering of specific mutational signatures in genomes disclose several divisions in groups and subgroups combining their geographical and phylogenetic origin. clinical outcomes of the covid- -infected patients were investigated according to the mutational signatures of viral variants. these findings highlight the genome dynamics of the coronavirus - and shed light on the mutational landscape and evolution of this virus. inclusion of the french cohort enabled us to identify novel mutations never reported in sars-cov- genomes collected worldwide. these results support a global and continuing surveillance of the emerging variants of the coronavirus sars-cov- . the new coronavirus sars-cov- has now spread to every country in the world. sars-cov- is
[figure residue: tip labels of a phylogenetic tree of the ihucovid isolates (some suffixed -pv, -pc or -pc-pv), the wuhan reference and netherlands sequences]
[table residue: list of mutational events by genomic region (utr, nsp, s, orf a, e, m, orf, n), each entry giving a codon change and the resulting amino acid change, with some entries of the form codon > x]
key: cord- - tmg b authors: xiao, yan; li, zhen; wang, xinming; wang, yingying; wang, ying; wang, geng; ren, lili; li, jianguo title: comparison of three taqman real-time reverse transcription-pcr assays in detecting sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tmg b quick and accurate detection of sars-cov- is critical for covid- control. dozens of real-time reverse transcription pcr (qrt-pcr) assays have been developed to meet the urgent need of covid- control. however, methodological comparisons among the developed qrt-pcr assays are limited.
in the present study, we evaluated the sensitivity, specificity, amplification efficiency, and linear detection ranges of three qrt-pcr assays, including the assays developed by our group (ipbcams), and the assays recommended by who and china cdc (ccdc). the three qrt-pcr assays exhibited similar sensitivities, with the limit of detection (lod) at about copies per reaction (except the orf b gene assay in ccdc assays, with a lod at about copies per reaction). no cross-reaction with other respiratory viruses was observed in any of the three qrt-pcr assays. wide linear detection ranges from to copies per reaction and acceptable reproducibility were obtained. with clinical specimens, the n gene assays of the ipbcams assays and ccdc assays performed better (with detection rates of % and %, respectively) than that of the who assays (with a detection rate of %), and the orf b gene assay of the ipbcams assays performed better (with a detection rate of %) than those of the who assays and the ccdc assays (with detection rates of % and %, respectively). in conclusion, the n gene assays of the ccdc assays and ipbcams assays and the orf b gene assay of the ipbcams assays are recommended for qrt-pcr screening of sars-cov- . (http://www.chinacdc.cn/jkzt/crb/zl/szkb_ /jszl_ / /w .pdf) (table ). primers and probes were synthesized by standard phosphoramidite chemistry techniques at qingke biotechnology co. ltd (beijing, china). taqman probes were labeled with the reporter -carboxy-fluorescein (fam) at the ' end, and with the quencher black hole quencher (bhq ) at the ' end. optimal concentrations of the primers and probes were determined by cross-titration of serial two-fold dilutions of each primer/probe against a constant amount of purified rna of sars-cov- . the taqman real-time rt-pcr assays were performed by using taqman respectively. the concentrations of the rna transcripts were determined by using nanodrop (thermo fisher scientific, ca, usa).
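as a rough illustration of how a spectrophotometric concentration can be turned into the copy numbers used for a ten-fold dilution series, the following python sketch applies the usual mass-to-copies conversion. the average nucleotide mass of 340 g/mol, the function names and any input values are assumptions made for illustration; they are not taken from the study.

```python
AVOGADRO = 6.022e23    # molecules per mole
AVG_NT_MASS = 340.0    # assumed average mass of one ssRNA nucleotide (g/mol)


def rna_copies_per_ul(conc_ng_per_ul, length_nt):
    """Convert an RNA concentration (ng/ul) into transcript copies per ul."""
    if conc_ng_per_ul < 0 or length_nt <= 0:
        raise ValueError("concentration must be >= 0 and length must be > 0")
    grams_per_ul = conc_ng_per_ul * 1e-9          # ng -> g
    molar_mass = length_nt * AVG_NT_MASS          # g/mol for the whole transcript
    return grams_per_ul / molar_mass * AVOGADRO   # mol/ul -> copies/ul


def tenfold_series(start_copies, steps):
    """Copy numbers of a ten-fold serial dilution, highest first."""
    return [start_copies / 10 ** i for i in range(steps)]
```

the transcript length has to be supplied by the user; a different per-nucleotide mass would shift every copy number by the same factor, so the dilution steps themselves are unaffected.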
comparison of the sensitivities, reproducibility and linear detection ranges of the three qrt-pcr assays. to determine the sensitivity of the three qrt-pcr assays, we measured the limit of detection (lod) for each assay by using ten-fold dilutions of the rna transcript of the corresponding gene as template (rna transcript alone). a lod of genomic copies per reaction was observed for both the n gene assay and the orf b gene assay of all three qrt-pcr assays, although the ct values for the n gene assay of the who assays and the orf b gene assay of the ccdc assays were higher than cycles (table ). the linear detection ranges of the three qrt-pcr assays were determined by using a ten-fold dilution of the rna transcript as template. the results showed that the ct values increased with the rna transcript from to copies in the reaction in all of the three qrt-pcr assays ( assays, and ccdc assays, respectively. these results suggested that all three qrt-pcr assays exhibited linear detection ranges from to copies per reaction, while the who assays showed a lower coefficient of linear correlation. the reproducibility of the three qrt-pcr assays was assessed by measuring the coefficient of variation (cv) of mean ct values within (intra-assay) and between (inter-assay) runs. for the n gene assay, the cvs of mean ct values from to copies of rna transcript per reaction were . %- . %, . %- . %, . %- . % in intra-assay, and . %- . %, . %- . %, . %- . % in inter-assay of ipbcams assay, who assay, and ccdc assay, respectively. for the orf b gene assay, the cvs of mean ct values were . %- . %, . %- . %, . %- . % in intra-assay, and . %- . %, . - . %, . %- . % in inter-assay of ipbcams assays, who assays, and ccdc assays, respectively.
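the intra- and inter-assay cvs reported above can be computed along the following lines. this is a minimal python sketch that assumes the intra-assay cv is taken over replicate ct values within one run and the inter-assay cv over the mean ct of each run; the source does not spell out the exact procedure, and the function names are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for a sample SD")
    return stdev(values) / mean(values) * 100.0


def intra_and_inter_assay_cv(runs):
    """runs: list of runs, each a list of replicate Ct values measured
    at the same template copy number.

    Returns (list of intra-assay CVs, one per run; inter-assay CV).
    """
    intra = [cv_percent(run) for run in runs]        # spread within each run
    inter = cv_percent([mean(run) for run in runs])  # spread of the run means
    return intra, inter
```

one call per template level (for example, per dilution step) gives the range of cvs that a summary such as the one above would report.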
because co-infections of respiratory viruses are common, we prepared a (v:v= : ) mixture of the rna transcript and a pooled total nucleic acid extract from respiratory specimens (rna transcript + other extract) as template, to evaluate the effect of co-existing viral nucleic acids on the performance of the assays. the co-existing viral nucleic acids had no effect on the lod or the linear detection range, although higher ct values were generated than with rna transcript alone as template in all of the three qrt-pcr assays. however, the co-existing viral nucleic acids had some effect on the efficiencies of the three qrt-pcr assays. comparison of the specificities of the three qrt-pcr assays. to evaluate the potential cross-reactions with other human respiratory viruses, the three qrt-pcr assays were examined by using human respiratory samples as templates, which were positive for human coronaviruses (oc , nl , e, or hku ), or influenza viruses (a or b), or respiratory syncytial virus, or parainfluenza virus ( - ), or human metapneumovirus, or rhinovirus, or adenovirus, or bocavirus. no cross-reaction was observed in any of the three qrt-pcr assays (data not shown), suggesting high specificity of the three qrt-pcr assays in detecting sars-cov- . the three qrt-pcr assays were evaluated with clinical specimens (including throat swabs and sputum) from suspected covid- patients. sars-cov- was detected from % ( / ), % ( / ), % ( / ) by the n gene assay, and from % ( / ), % ( / ), % ( / ) of all enrolled clinical specimens by the orf b gene assay in ipbcams assays, who assays, and ccdc assays, respectively (table ). with respect to the sputum, sars-cov- was detected from % ( / ), % ( / ), % ( / ) of specimens by the n gene assay, and from % ( / ), % ( / ), . % ( / ) of specimens by the orf b gene assay in ipbcams assays, who assays, and ccdc assays, respectively. for the throat swabs, sars-cov- was detected from . % ( / ), .
% ( / ), % ( / ) of specimens by the n gene assay, and from . % ( / ), . % ( / ), % ( / ) of specimens by the orf b gene assay in ipbcams assays, who assays, and ccdc assays, respectively. these results demonstrated that the n gene assay performed better than the corresponding orf b gene assay in all three qrt-pcr assays, and that the n gene assay of the ccdc assays and the orf b gene assay of the ipbcams assays performed better than the other assays. rapid and accurate detection of sars-cov- represents a fast-growing global demand, which could be met by taqman real-time rt-pcr (qrt-pcr). however, the currently available taqman qrt-pcr assays for sars-cov- vary in performance, including sensitivity, specificity, reproducibility, and linear detection range. because of the relatively lower viral load in the upper respiratory tract, reliable qrt-pcr assays for the detection of sars-cov- are required. we thus compared the performance of three currently widely applied qrt-pcr assays in the detection of sars-cov- . sensitivity is the primary demand in the detection of respiratory viruses ( ). all of the three qrt-pcr assays could provide a lod of genomic copies per reaction with a detection range from - genomic copies per reaction. the ct value at genomic copies per reaction in the orf b gene assay of ccdc assays was higher than . these results suggested that most of the three qrt-pcr assays provide high sensitivity and a wide linear detection range in detecting sars-cov- , except for the relatively lower sensitivity observed in the orf b gene assay of ccdc assays. specificity is also essential in the detection of sars-cov- , because of common co-infections with other respiratory viruses and the high host dna background in throat swabs ( - ). we evaluated the specificity of the three qrt-pcr assays with respiratory specimens positive for other common respiratory viruses. no cross-reaction was observed, demonstrating high specificity of the three qrt-pcr assays in the detection of sars-cov- .
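the linear detection range and the coefficient of linear correlation discussed above are conventionally read off a standard curve, in which ct is regressed on log10 of the template copy number; the slope of that curve also gives the amplification efficiency as 10^(-1/slope) - 1. the python sketch below illustrates this general calculation. it is not the authors' analysis code, and the function name and inputs are hypothetical.

```python
import math


def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct against log10(copies).

    Returns (slope, intercept, r_squared, efficiency_percent).
    """
    if len(copies) != len(ct) or len(copies) < 3:
        raise ValueError("need at least three paired dilution points")
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in ct)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct))
    if sxx == 0 or syy == 0:
        raise ValueError("copies and Ct values must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    # A slope of about -3.32 corresponds to perfect doubling per cycle (100 %).
    efficiency_percent = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, intercept, r_squared, efficiency_percent
```

an efficiency far above 100 % in such a fit usually points to a problem with the dilution series or to non-specific signal rather than to genuinely faster amplification.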
we next evaluated the reproducibility of the three qrt-pcr assays by measuring the intra- and inter-assay coefficient of variation (cv) of mean ct values ( ). the n gene assay in ipbcams assays and the orf b gene assay in who assays exhibited relatively better reproducibility, with lower intra- and inter-assay cvs, which were not affected by the co-existing nucleic acids of other respiratory viruses. efficiency is another key parameter of qrt-pcr, reflecting the binding efficiency of primers & probe to template and the amplification efficiency of the pcr system ( ). most of the qrt-pcr assays provided good efficiency, except for an abnormal efficiency of . % observed in the orf b gene assay of who assays. an exceptionally high efficiency indicates an increased risk of false positives ( ). the co-existing nucleic acids of other respiratory viruses increased the efficiency of all three qrt-pcr assays, suggesting a potentially increased risk of cross-reactions between the primers & probe and background nucleic acids. finally, we evaluated the performance of the three qrt-pcr assays with clinical specimens from suspected sars-cov- infected patients ( ). possibly because of the lower viral load in the upper respiratory tract ( ), the detection rate of sars-cov- was lower in throat swabs than in sputum in all three assays. meanwhile, the n gene assay performed better than the corresponding orf b gene assay in all three qrt-pcr assays. for the n gene assay, ipbcams assays and ccdc assays, both of which detected sars-cov- in more than % of the suspected specimens, performed better than who assays. for the orf b gene assay, ipbcams assays performed better than who assays and ccdc assays, with a detection rate of %. in conclusion, we performed methodological evaluations on three widely applied qrt-pcr assays for the detection of sars-cov- . 
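The reproducibility and efficiency measures discussed above can be illustrated with a minimal Python sketch. It assumes the conventional definitions: cv as the sample standard deviation of replicate ct values divided by their mean, and amplification efficiency derived from the slope of a standard curve of ct against log10 template copies, E = 10^(-1/slope) - 1. The text does not state how these values were actually computed, and all function names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(ct_values):
    """Percent CV of replicate Ct values: sample SD / mean * 100."""
    return stdev(ct_values) / mean(ct_values) * 100.0


def standard_curve_slope(log10_copies, ct_values):
    """Least-squares slope of Ct against log10(template copies)."""
    mx, my = mean(log10_copies), mean(ct_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    return sxy / sxx


def amplification_efficiency(slope):
    """Percent efficiency from the standard-curve slope.

    A slope near -3.32 corresponds to perfect doubling per cycle (100%);
    shallower slopes give values above 100%.
    """
    return (10 ** (-1.0 / slope) - 1.0) * 100.0
```

Under these definitions, an efficiency well above 100% corresponds to a standard curve that is shallower than perfect doubling would produce, which is the situation the text flags as a false-positive risk.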
although most of the evaluated assays exhibited good sensitivity, specificity, reproducibility and wide linear detection range, performance test with clinical specimens from suspected covid- patients suggested that the n gene assay in ipbcams assays and ccdc assays, and the orf b gene assays in ipbcams assays were the preferred qrt-pcr assays for accurate detection of sars-cov- . the original data will be available upon request. rna extracted from human respiratory specimens by using trizol. "rna transcript + other viruses" represents a : (v/v) mixture of these two components. an interactive web-based dashboard to track covid- in real time novel coronavirus: from discovery to clinical diagnostics viral load in upper respiratory specimens of infected patients development of two taqman real-time reverse transcription-pcr assays for the detection of severe acute respiratory syndrome coronavirus- . biosafety and health doi:accepted multiplex pcr system for the rapid diagnosis of respiratory virus infection: systematic review and meta-analysis ii rv with the xtag respiratory viral panel and seeplex rv for detection of respiratory viruses co-infection with sars-cov- and influenza a virus in patient with pneumonia co-infection with sars-cov- and human metapneumovirus a multiplex one-tube nested real time rt-pcr assay for simultaneous detection of respiratory syncytial virus, human rhinovirus and human metapneumovirus development of an efficient qrt-pcr assay for quality control and cellular quantification of respiratory samples unaccounted uncertainty from qpcr efficiency estimates entails uncontrolled false positive rates clinical evaluation of a panel of multiplex quantitative real-time reverse transcription polymerase chain reaction assays for the detection of respiratory viruses associated with community-acquired pneumonia the authors declare that there are no conflicts of interest regarding the publication of this paper. 
key: cord- -q cl kjp authors: salguero, francisco j.; white, andrew d.; slack, gillian s.; fotheringham, susan a.; bewley, kevin r.; gooch, karen e.; longet, stephanie; humphries, holly e.; watson, robert j.; hunter, laura; ryan, kathryn a.; hall, yper; sibley, laura; sarfas, charlotte; allen, lauren; aram, marilyn; brunt, emily; brown, phillip; buttigieg, karen r.; cavell, breeze e.; cobb, rebecca; coombes, naomi s.; daykin-pont, owen; elmore, michael j.; gkolfinos, konstantinos; godwin, kerry j.; gouriet, jade; halkerston, rachel; harris, debbie j.; hender, thomas; ho, catherine m.k.; kennard, chelsea l.; knott, daniel; leung, stephanie; lucas, vanessa; mabbutt, adam; morrison, alexandra l.; ngabo, didier; paterson, jemma; penn, elizabeth j.; pullan, steve; taylor, irene; tipton, tom; thomas, stephen; tree, julia a.; turner, carrie; wand, nadina; wiblin, nathan r.; charlton, sue; hallis, bassam; pearson, geoffrey; rayner, emma l.; nicholson, andrew g.; funnell, simon g.; dennis, mike j.; gleeson, fergus v.; sharpe, sally; carroll, miles w. title: comparison of rhesus and cynomolgus macaques as an authentic model for covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q cl kjp a novel coronavirus, sars-cov- , has been identified as the causative agent of the current covid- pandemic. animal models, and in particular non-human primates, are essential to understand the pathogenesis of emerging diseases and to the safety and efficacy of novel vaccines and therapeutics. here, we show that sars-cov- replicates in the upper and lower respiratory tract and causes pulmonary lesions in both rhesus and cynomolgus macaques, resembling the mild clinical cases of covid- in humans. immune responses against sars-cov- were also similar in both species and equivalent to those reported in milder infections and convalescent human patients. 
importantly, we have devised a new method for lung histopathology scoring that will provide a metric to enable clearer decision making for this key endpoint. in contrast to prior publications, in which rhesus are accepted to be the optimal study species, we provide convincing evidence that both macaque species authentically represent mild to moderate forms of covid- observed in the majority of the human population and both species should be used to evaluate the safety and efficacy of novel and repurposed interventions against sars-cov- . accessing cynomolgus macaques will greatly alleviate the pressures on current rhesus stocks. a novel acute respiratory syndrome, now called coronavirus disease- (covid- ) was first reported in wuhan, china in december . the genetic sequence of the causative agent was found to have similarity with two highly pathogenic respiratory beta coronaviruses, sars and mers , and was later called sars-cov- . it has currently infected > million individuals resulting in > , deaths . among the clinical and pathological signs of sars-cov- infection in humans, pneumonia accompanied by respiratory distress seem to be the most clinically relevant , . (figure ). high levels of viral dpc before falling and remaining below the assay's lower limit of detection (llod) from eleven dpc to dpc. throat swabs from cynomolgus macaques contained higher levels of viral rna early in infection (one to three dpc) and remained ≥ . x copies/ml for all animals between four and nine dpc. virus shedding from the gastrointestinal tract was assessed by rt-qpcr performed on rectal swab samples. in rhesus macaques, low levels of viral rna were detected from one dpc to nine dpc. in cynomolgus macaques, viral rna was similarly detected at a low level in rectal swabs from one dpc to nine dpc. however viral rna levels above the lloq were detected at both three dpc and five dpc in cynomolgus macaques in comparison to two dpc and three dpc in rhesus macaques ( figure d ). 
viral rna was detected at only two timepoints after challenge in whole blood samples and remained below the lloq throughout the study ( figure e ). in rhesus macaques, viral rna was detected in one animal at three dpc, whilst in cynomolgus macaques, viral rna was detected in two animals at six dpc. were seen ( figure i ). mononuclear cells, primarily lymphocytes also were noted surrounding and infiltrating the walls of blood vessels and airways ( figure j ). an increased prominence of bronchial-associated lymphoid tissue (balt) was noted. in the lungs of rhesus macaques, changes in the alveoli and balt were similar in appearance and frequency to those described in the cynomolgus macaques, and perivascular lymphocytic cuffing of small vessels, characterised by concentric infiltrates of mononuclear cells, was also seen occasionally ( figure k and l). in summary, using the histopathology scoring system developed here, the scores were higher in both macaque species at / dpc compared to / and / dpc, mostly due to higher scores in the alveolar damage parameters observed at the early time point ( figure a ). in the liver, microvesicular, centrilobular vacuolation, consistent with glycogen, together with, small, random, foci of lymphoplasmacytic cell infiltration were noted rarely (data not shown). this is considered to represent a mild, frequently observed in rhesus macaques, the ifn-γ sfu measured following stimulation with spike protein peptide megapools (mp) - did not differ significantly between animals euthanised at either the day - (early) or the day - (late) post-infection time- point in comparison to sfu frequencies measured in the naïve control animals. 
however, comparison of the summed mp - -specific response indicated that significantly higher sfu frequencies were present in the animals euthanised at the in general, there was a trend for spike protein peptide-specific ifn-γ sfu frequencies measured in pbmc samples collected from cynomolgus macaques to be greater than those detected in rhesus macaques, although these differences did not reach statistical significance. haiming wei pathogenic t- cells and inflammatory monocytes incite inflammatory storms in severe covid- patients targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals airway memory cd (+) t cells mediate protective immunity against emerging respiratory coronaviruses humoral and circulating follicular helper t cell responses in recovered patients with covid- the covid- vaccine development landscape comparison of rhesus and cynomolgus macaques in a streptococcus pyogenes infection model for vaccine evaluation mncs. stacked bars show the group median with % confidence intervals. pbmc: naïve rhesus n= , early rhesus n= , late rhesus n= , naïve cyno = , early cyno n= , late cyno n= . lung: early rhesus n= , late rhesus n= , early cyno n= , late key: cord- -nwwutduy authors: patel, ami; walters, jewell; reuschel, emma l.; schultheis, katherine; parzych, elizabeth; gary, ebony n.; maricic, igor; purwar, mansi; eblimit, zeena; walker, susanne n.; guimet, diana; bhojnagarwala, pratik; doan, arthur; xu, ziyang; elwood, dustin; reeder, sophia m.; pessaint, laurent; kim, kevin y.; cook, anthony; chokkalingam, neethu; finneyfrock, brad; tello-ruiz, edgar; dodson, alan; choi, jihae; generotti, alison; harrison, john; tursi, nicholas j.; andrade, viviane m.; dia, yaya; zaidi, faraz i.; andersen, hanne; lewis, mark g.; muthumani, kar; kim, j joseph; kulp, daniel w.; humeau, laurent m.; ramos, stephanie; smith, trevor r.f.; weiner, david b.; broderick, kate e. 
title: intradermal-delivered dna vaccine provides anamnestic protection in a rhesus macaque sars-cov- challenge model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: nwwutduy coronavirus disease (covid- ), caused by the sars-cov- virus, has had a dramatic global impact on public health, social, and economic infrastructures. here, we assess immunogenicity and anamnestic protective efficacy in rhesus macaques of the intradermal (id)-delivered sars-cov- spike dna vaccine, ino- . ino- is an id-delivered dna vaccine currently being evaluated in clinical trials. vaccination with ino- induced t cell responses and neutralizing antibody responses against both the d and g sars-cov- spike proteins. several months after vaccination, animals were challenged with sars-cov- resulting in rapid recall of anti-sars-cov- spike protein t and b cell responses. these responses were associated with lower viral loads in the lung and with faster nasal clearance of virus. these studies support the immune impact of ino- for inducing both humoral and cellular arms of the adaptive immune system which are likely important for providing durable protection against covid- disease. covid- was declared a global pandemic on march , by the world health organization. there are > . million confirmed cases worldwide with the total number of deaths estimated to be > , (july , , gisaid.org) . covid- presents as a respiratory illness, with mildto-moderate symptoms in many cases (~ %) (chen et al., ; huang et al., ) . these symptoms include headache, cough, fever, fatigue, difficulty breathing, and possible loss of taste and smell. the factors involved with progression to severe covid- disease in % of cases are unclear; yet, severe disease is characterized by development of a hyperinflammatory response, followed by development of acute respiratory distress syndrome (ards), potentially leading to mechanical ventilation, kidney failure, and death xu et al., ) . 
the rapid development of vaccine countermeasures remains a high priority for this infection, with multiple candidates having entered the clinic in record time (thanh le et al., ) . sars-cov- is a positive-sense single stranded rna virus belonging to the family coronaviridae, genus betacoronavirus, the same family as severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov). sars-cov- is homologous to sars-cov, sharing around - % of its genome (lu et al., ) . structural components of sars-cov- include the envelope (e) protein, membrane (m) protein, nucleocapsid (n) protein, and surface spike (s) protein, which is a major immunogenic target for humoral and cellular immune responses. virus entry into host cells is mediated through s protein receptor binding domain (rbd) interaction with the host cell receptor angiotensin converting enzyme (ace ) and priming by the serine protease tmprss (hoffmann et al., ) . recent preprints and published studies also implicate s binding to neuropilin (nlp- ) and have identified neutralizing epitopes outside rbd (chi et al., ) . sars-cov- is the third coronavirus outbreak this century. prior work with the related coronaviruses, sars-cov and mers-cov, delineated that the spike protein of these viruses was an important target for development of neutralizing antibodies, and in animal viral challenges vaccine targeted immunity (reviewed in (du et al., ; roper and rehm, ; thanh le et al., ) (liu et al., ; muthumani et al., ; van doremalen et al., a) . recently studies have described initial data for vaccines developed from inactivated viruses (gao et al., ) , recombinant adenoviral vectors that express the spike antigen (van doremalen et al., b) , or intramuscular (im) delivered dna vaccine candidates in nonhuman primates (nhps). these studies focused on disease protection in acute models with challenge weeks after the final vaccination. 
two in particular showed the ability of vaccination to lower viral load in the lungs (gao et al., ; van doremalen et al., b) , and one reported impact in the lungs and a trend to impact viral loads in the nose . while there are significant questions arising regarding the length of protective immunity from sars-cov infection, as well as the long-term protection of vaccination, there have been no studies yet reported of the ability of vaccine memory to impact challenge outcome. here we report on the impact of an id-delivered sars-cov- dna vaccine in a nonhuman primate model where the animals were challenged months post-vaccination. recent studies support an important role for both cellular and humoral immunity in protection against covid- disease. t cell-mediated immunity has been shown to be important for other beta-coronaviruses, providing both direct protection and help in the generation of effective humoral responses (channappanavar et al., b; gretebeck and subbarao, ) . several sars-cov- studies have now reported that up to % of persons can exhibit mild disease which may be associated with t cell immunity (payne et al., ) or humoral responses (robbiani et al., ) . this echoes older studies which found that neutralizing antibody levels and memory b cell responses have not been sustained in sars survivors, while responsive t cell populations have proved more durable (le bert et al., ; tang et al., ; yang et al., ) . studies have indicated the levels of sars-cov- antibodies wane rapidly in convalescent subjects (long et al., ) . we recently described the immunogenicity of a sars-cov- dna vaccine (ino- ), encoding a synthetic spike immunogen in small animal models . this vaccine candidate induced neutralizing and ace -blocking antibodies, as well as t cell responses in mice and guinea pigs. 
here, we assess vaccine-induced memory t and b cell responses during the acute expansion and memory phases in rhesus macaques, as well as the ability of vaccine-induced memory to impact infection in an nhp viral instillation challenge following vaccination by the minimally invasive (id-ep) route. additionally, we addressed the growing concerns regarding the emergence of a dominant sars-cov variant g , which now comprises more than % of circulating global viral strains, and is reportedly associated with increased infectivity and spread (korber b et al., ). there have not yet been descriptions of vaccine-induced responses driving immunity against g variants. we report vaccine-induced humoral and cellular immunity, including neutralizing antibody responses against both the sars-cov- d virus and the new g strain. upon sars-cov- challenge more than months following the final dose, vaccinated macaques exhibited a rapid recall response against multiple regions of the s protein. this anamnestic response was characterized by expansion of neutralizing antibody responses, including those against the now dominantly circulating g variant, as well as the rapid expansion of t cell responses. the immune responses induced by ino- were associated with disease protection. nhps are a valuable model in the development of covid- vaccines and therapeutics as they can be infected with wild-type sars-cov- , and present with early infection that mimics aspects of human disease gao et al., ; yu et al., ) . rhesus macaques (n= ) received two immunizations of ino- ( mg), at week and week (figure a). naïve control animals (n= ) did not receive vaccine. humoral and cellular immune responses were monitored for weeks (~ months) following prime immunization for memory responses. 
all animals seroconverted following a single ino- immunization, with serum igg titers detected against the full-length s +s extracellular domain (ecd), s , s , and rbd regions of the sars-cov- s protein ( figure b and c) . cross-reactive antibodies were also detected against sars-cov s protein and rbd, but not mers-cov (supplementary figure ) . sars-cov- -reactive igg against the ecd and rbd were detected in bronchoalveolar lavage (bal) washes at week , -weeks following the nd immunization dose (supplementary figure ) . sars-cov- pseudovirus-neutralizing antibodies were detected in the serum of vaccinated animals at week following immunization ( figure d ). these memory titers were comparable to those observed in other reported protection studies in macaques performed at the acute phase of the vaccine-induced immune response (gao et al., ; van doremalen et al., b; yu et al., ) and those reported in the sera of convalescent patients (ni et al., ; robbiani et al., ) . during the course of the covid- pandemic, a d g sars-cov- spike variant has emerged that has potentially greater infectivity, and now accounts for > % of new isolates (korber b et al., ) . there is a concern that vaccines developed prior to this variant's appearance may not neutralize the d g virus. there is also concern that persons infected with earlier strains, may not have immunity against the g variant. therefore, we evaluated neutralization against this new variant using a modified pseudovirus developed to express the g spike protein ( figure e ). similar neutralization id titers were observed against both d and g spikes, supporting induction of functional antibody responses by the vaccine against both sars-cov- virus variants. to investigate specific receptor-blocking neutralizing antibody activity, we assessed the capacity to occlude ace by vaccine-induced sera. 
we recently developed a receptor-blocking assay and demonstrated that it correlates with pseudovirus neutralization titers (walker et al. accepted) . in this assay, anti-spike antibodies are bound to sars-cov- spike protein in an elisa plate followed by addition of a soluble version of the ace receptor. sera containing antibodies which can obstruct the receptor binding site on the spike protein will have lower area under the curve (auc) values in our assay. we observe that sera from % of immunized nhps had reduced auc indicating the presence of antibodies that can block the sars-cov- spike protein from interacting with ace ( figure f ). an independent flow cytometry-based assay was also performed to further study the spike-ace interaction. ace -expressing t cells were co-incubated with spike with or without the presence of sera. spike binding to ace was detected by flow cytometry. in this assay, % of macaques inhibited the spike-ace interaction, with - % inhibition of the spike-ace interaction at a : dilution and - % inhibition at a : dilution ( figure g ). all peptide pools with t cells responses peaking at week , two weeks following the second immunization ( - sfu/million cells, with / nhps showing positive responses) ( figure h ). distinct immunogenic epitope responses were detected against the rbd and s regions ( figure b and c). interestingly, cross-reactive t cells responses were also detected against the sars-cov spike protein (supplementary figure a) . however, we did not observe cross-reactivity to mers-cov spike peptides, which is consistent with the lower sequence homology between figure b) . ino- immunized macaques and unvaccinated controls were challenged with sars-cov- weeks (~ months) post-final immunization (study week , figure a ). nhps received a challenge dose of . x pfu of sars-cov- isolate usa-wa / by intranasal and intratracheal inoculation as previously described yu et al., ) . 
one day after viral challenge, / of ino- vaccinated animals displayed an increase in antibody titers against the sars-cov- full-length ecd. by day , all vaccinated animals had an increase in antibody titers against the full length ecd, s , rbd, and s proteins ( figure b ). seven days post-challenge, robust geometric mean endpoint titers for the ecd ranging from , - , , were observed in immunized animals, compared with the naïve group which displayed seroconversion of only / animals (gmt ) ( figure b ). antibody responses continued to increase against sars-cov- proteins days post-challenge. a significant increase in pseudoneutralization titers (> -fold increase) was observed in all ino- immunized animals against both d and g spike variants by day post-challenge, compared to unimmunized animals which only developed low neutralizing titers ( figure c) . at earlier time points post challenge, viral mrna detection does not discriminate between input challenge inoculum and active infection, while subgenomic (sgmrna) levels are more likely representative of active cellular sars-cov- virion replication yu et al., ) . as such, sars-cov- sgmrna was measured in unvaccinated control and ino- overall, the reduced viral loads following exposure to sars-cov- infection at weeks after immunization show an important durable impact mediated by the vaccine. the covid- pandemic continues to devastate global health creating a destabilizing environment. although therapies are progressing through multiple stages of development, the need for preventative vaccine approaches for sars-cov- remains significant. importantly, multiple vaccines candidates are being advanced to the clinic as potential options for protection against infection and or disease (reviewed in (funk et al., ; thanh le et al., ) ). recently acute challenge studies in nhps receiving different experimental vaccines have been reported (gao et al., ; van doremalen et al., b; yu et al., ) . 
these studies describe reduction in viral loads in the lower airways, and a trend to lower viral loads in the nose by an im dna delivered full length spike ag. however, protective efficacy at time points following contraction of the acute phase immune response and establishment of immune memory have not been previously studied. this is also true looking at prior nhp challenge in the mers or sars models (muthumani et al., ; roberts et al., ; van doremalen et al., a) . accordingly, understanding the impact of vaccine-induced memory on challenge is of importance. in the current study, we evaluated the durability of an id-delivered full-length spike nhps followed by challenge at weeks following the second immunization. they observed decreases in bal viral loads and a trend to lower nasal swab subgenomic viral loads in some groups, including a group containing a full-length spike ag construct that shares similarity to the construct studied here using a x mg dose by id administration. in a recent clinical study, we reported that id delivery of an hiv vaccine prototype was dose-sparing and demonstrated increased tolerability and improved immune potency compared to im dna delivery (de rosa et al., ) . while limited conclusions can be drawn from such comparisons, more study in this context is warranted. in this study, we observed an impact on induction of durable immunity and protective efficacy at a memory time point, weeks post-final immunization. importantly, reduced viral subgenomic rna loads in the lower lung were observed. a trend of lower viral mrna loads was also observed in the nose with more rapid clearance induced in the ino- vaccinated animals. these data support that immunization and memory immunity from such a vaccine in humans might protect from severe disease in the lung as well as limit the duration of viral shedding in the nasal cavity, thus possibly lowering viral transmission. 
the initial viral loads detected in control unvaccinated animals in this study were approximately logs higher ( pfu/swab in / nhps on day post-challenge) than in similar published studies performed under identical conditions (~ pfu/swab) . two of the prior reported nhp studies included intranasal delivery as an inoculation route for challenge (van doremalen et al., b; yu et al., ) . high-dose challenge inoculums are frequently employed to ensure take of infection, however such high dose challenge may artificially reduce the impact of potentially protective vaccines and interventions (durudas et al., ; innis et al., ) . despite these limitations, this study demonstrated significant reduction in peak bal sgmrna and overall viral mrna. wolfel et al reported nasal titers in patients of an average . x copies/swab days - following onset of symptoms . these titers are significantly lower than our nhp challenge dose and support the potential for the vaccine to exhibit great impact in the field against natural sars-cov- challenge. t cell immunity has been shown to be an important mediator of protection against betacoronaviruses, and recent studies have specifically identified a role in protection against covid- disease (channappanavar et al., a; channappanavar et al., b; sekine et al., ; sun et al., ) . sekine et al reported sars-cov- -reactive t cells in asymptomatic and mild covid- convalescent patients who were antibody seronegative (sekine et al., ) . these results from convalescent patient studies collectively suggest that vaccine candidates which could generate balanced humoral and cellular immune responses could be important in providing protection against covid- disease. 
our study and other published reports show that dna vaccination with candidates targeting the full-length sars-cov- spike protein likely increase the availability of t cell immunodominant epitopes leading to a broader and more potent immune response, compared to partial domains and truncated immunogens. in this study, we also observe t cell cross-reactivity to sars-cov- . further evaluation of these shared responses is likely of interest and may inform development of vaccines targeting related betacoronaviruses. long-lived sars-cov specific t cells have been reported (le bert et al., ; yang et al., ) . additional studies of sars-cov- infected patients will provide important insight towards immunity and long-term protection against covid- disease. in addition to t cells, we demonstrated that ino- induced durable antibody responses that rapidly increased following sars-cov- challenge. the vaccine induced robust neutralizing antibody responses against both d and g sars-cov- variants. while the d/g site is outside the rbd, it has been suggested that this shift has the potential to impact vaccine-elicited antibodies (korber b et al., ) and possibly virus infectivity. our data shows induction of comparable neutralization titers between d and g variants and that these responses are similarly recalled following sars-cov- challenge. we additionally observe a positive relationship between vaccine-induced antibody binding to the rbd and neutralizing antibody titers (supplementary figure , r = . ) . overall, the current challenge study provides a snapshot of anamnestic protective efficacy several months post-vaccination with a small cohort of animals. the data support further study of the sars-cov- dna vaccine candidate ino- which is currently in clinical trials. the persistence of sars-cov- immunity following natural infection is unknown and recent studies suggest natural immunity may be short lived (long et al., ) . 
given that many people exhibit asymptomatic or mild disease and recover without developing significant antibody responses, additional study of vaccines that induce immunological memory for t and b cell responses is the highly optimized dna sequence encoding sars-cov- ige-spike was created using inovio's proprietary in silico gene optimization algorithm to enhance expression and immunogenicity . the optimized dna sequence was synthesized, digested with bamhi and xhoi, and cloned into the expression vector pgx under the control of the human cytomegalovirus immediate-early promoter and a bovine growth hormone polyadenylation signal. use committee at bioqual (rockville, maryland), an association for assessment and accreditation of laboratory animal care (aaalac) international accredited facility. blood was collected for blood chemistry, pbmc isolation, and serological analysis. bronchoalveolar lavage (bal) was collected on week to assay lung antibody levels and on days , , , post challenge to assay lung viral loads. ten chinese rhesus macaques (ranging from . kg- . kg) were randomly assigned to be immunized ( males and females) or naïve ( males and females). immunized macaques received two mg injections of sars-cov- dna vaccine, ino- , at week and by the minimally invasive id-ep administration using the cellectra ® adaptive constant current electroporation device with a p array (inovio pharmaceuticals). animals were challenged with sars-related coronavirus , isolate usa-wa / (bei resources nr- . the following reagent was deposited by the centers for disease control and prevention and obtained through bei resources, niaid, nih: sars-related coronavirus , isolate usa-wa / , nr- . blood was collected at indicated time points to analyse blood chemistry and to isolate peripheral blood mononuclear cells (pbmc), and serum. bronchoalveolar lavage was collected at week and during challenge period to assay viral load. bal from naïve animals was run as control. 
at week , all animals were challenged with . x vp ( . x pfu) sars-cov- . virus was administered as ml by the intranasal route ( . ml in each nostril) and ml by the intratracheal route. nasal swabs were collected during the challenge period using copan flocked swabs and placed into ml pbs. blood was collected from each macaque into sodium citrate cell preparation tubes (cpt, bd biosciences). the tubes were centrifuged to separate plasma and lymphocytes, according to the manufacturer's protocol. samples were transported by same-day shipment on cold-packs from bioqual to the wistar institute for pbmc isolation. pbmcs were washed and residual red blood cells were removed using ammonium-chloride-potassium (ack) lysis buffer. cells were counted using a vicell counter (beckman coulter) and resuspended in rpmi (corning), supplemented with % fetal bovine serum (atlas) and % penicillin/streptomycin (gibco). an ifn-γ elispot assay was performed to detect cellular responses. monkey ifn-γ elispotpro (alkaline phosphatase) plates (mabtech, sweden, cat# m- apw- ) were blocked for a minimum of hours with rpmi (corning), supplemented with % fbs and % pen/strep (r ). following pbmc isolation, cells were added to each well in the presence of ) peptide pools ( -mers with -mer overlaps) corresponding to the sars-cov- , sars-cov- , or mers-cov spike proteins ( µg/ml/well final concentration), ) r with dmso (negative control), or ) anti-cd positive control (mabtech, : dilution). all samples were plated in triplicate. plates were incubated overnight at °c, % co . after - hours, the plates were washed in pbs and spots were developed according to the manufacturer's protocol. spots were imaged using a ctl immunospot plate reader, and antigen-specific responses were determined by subtracting the number of spots in the r +dmso negative control well from the number in the wells stimulated with peptide pools. antigen binding elisa. serum and bal collected at each time point were evaluated for binding titers.
ninety-six well immunosorbent plates (nunc) were coated with ug/ml recombinant sars-cov- s +s ecd protein (sino biological -v b ), s protein (sino plates were washed four times with pbs + . % tween (pbs-t) and blocked with % skim milk in pbs-t ( % sm) for minutes at °c. sera or bal from ino- vaccinated and control macaques were serially diluted in % sm, added to the washed elisa plates, and then incubated for hour at °c. following incubation, plates were washed times with pbs-t and an anti-monkey igg conjugated to horseradish peroxidase (southern biotech - ) hour at °c. plates were washed times with pbs-t and one-step tmb solution (sigma) was added to the plates. the reaction was stopped with an equal volume of n sulfuric acid. plates were read at nm and nm within minutes of development using a biotek synergy plate reader. -well half area plates (corning) were coated at room temperature for hours with µg/ml polyrab anti-his antibody (thermofisher, pa - b), followed by overnight blocking with blocking buffer containing x pbs, % sm, % fbs, and . % tween- . the plates were then incubated with µg/ml of his x-tagged sars-cov- , s +s ecd (sinobiological, -v b ) at room temperature for - hours. nhp sera (day or week ) was serially diluted fold with xpbs containing % fbs and . % tween and pre-mixed with huace -igmu at constant concentration of . µg/ml. the pre-mixture was then added to the plate and incubated at room temperature for - hours. the plates were further incubated at room temperature for hour with goat anti-mouse igg h+l hrp (a - p, bethyl laboratories) at : , dilution followed by addition of one-step tmb ultra substrate (thermofisher) and then quenched with m h so . absorbance at nm and nm were recorded with biotek plate reader. expressing ace -gfp were generated using retroviral transduction. following transduction, the cells were flow sorted based on gfp expression to isolate gfp positive cells. 
single cell cloning was done on these cells to generate cell lines with equivalent expression of ace -gfp. to detect inhibition of spike binding to ace , . µg/ml s +s ecd-his tagged (sino biological, catalog # -v b ) was incubated with serum collected from vaccinated animals at indicated time points and dilutions on ice for min. this mixture was then transferred to , t-ace -gfp cells and incubated on ice for min. following this, the cells were washed x with pbs followed by staining for surelight® apc conjugated anti-his antibody (abcam, ab ) for min on ice. as a positive control, spike protein was pre-incubated with recombinant human ace before transferring to t-ace -gfp cells. data was acquired using a bd lsrii and analyzed by flowjo (version ). sars-cov- pseudovirus was produced using hek t cells transfected with genejammer (agilent) using ige-sars-cov- spike plasmid (genscript) and pnl - .luc.r-e-plasmid (nih aids reagent) at a : ratio. forty-eight hours post transfection, supernatant was collected, enriched with fbs to % final volume, steri-filtered (millipore sigma), and aliquoted for storage at - °c. sars-cov- pseudovirus neutralization assay was set up using d media (dmem supplemented with %fbs and x penicillin-streptomycin) in a well format. cho cells stably expressing ace were used as target cells (creative biolabs, catalog no. vcel-wyb ). sars-cov- pseudovirus were titered to yield greater than times the cells only control relative luminescence units (rlu) after h of infection. , cho-ace cells/well were plated in -well plates in ul d media and rested overnight at ˚c and % co for hours. the following day, sera from ino- vaccinated and control groups were heat inactivated and serially diluted. sera were incubated with a fixed amount of sars-cov- pseudovirus for minutes at rt. the sera+virus mix was then added to the plated cho-ace cells and allowed to incubate in a standard incubator ( % humidity, % co ) for h. 
cells were then lysed using britelite plus luminescence reporter gene assay system (perkin elmer catalog no. ) and rlu were measured using the biotek plate reader. neutralization titers (id ) were calculated using graphpad prism and defined as the reciprocal serum dilution at which rlu were reduced by % compared to rlu in virus control wells after subtraction of background rlu in cell control wells. viral rna assay: rt-pcr assays were utilized to monitor viral loads, essentially as previously described (abnink p et al science). briefly, rna was extracted from bronchoalveolar lavage (bal) supernatant and nasal swabs using a qiacube ht (qiagen, germany) and the cador pathogen ht kit. rna was reverse transcribed using superscript vilo (invitrogen) and run in duplicate using the quantstudio and flex real-time pcr system (applied biosystems) according to manufacturer's specifications. viral loads were calculated as viral rna copies per ml or per swab and the assay sensitivity was copies. the target for amplification was the sars-cov n (nucleocapsid) gene. the primers and probes for the targets were: to generate a standard curve, the sars-cov- e gene sgmrna was cloned into a pcdna . expression plasmid; this insert was transcribed using an amplicap-max t high yield message maker kit (cellscript) to obtain rna for standards. prior to rt-pcr, samples collected from challenged animals or standards were reverse-transcribed using superscript iii vilo (invitrogen) according to the manufacturer's instructions.
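the titer definition above (reciprocal serum dilution at which background-subtracted rlu fall by a set fraction relative to virus control wells) can be illustrated with a minimal sketch. the titers in this study were calculated in graphpad prism; the code below instead uses simple linear interpolation on log10 dilution between the two bracketing dilutions, which is an assumption and not the authors' fitting method. the function names, the `reduction` parameter (the cutoff is not recoverable from the text) and all numeric inputs are hypothetical.

```python
import math


def fractional_neutralization(rlu, virus_ctrl, cell_ctrl):
    """Fraction by which RLU is reduced relative to virus-only wells,
    after subtracting background RLU from cell-only wells."""
    signal = rlu - cell_ctrl
    max_signal = virus_ctrl - cell_ctrl
    return 1.0 - signal / max_signal


def neutralization_titer(dilutions, rlus, virus_ctrl, cell_ctrl, reduction):
    """Reciprocal serum dilution at which neutralization crosses `reduction`.

    dilutions: reciprocal dilutions in increasing order (hypothetical values).
    rlus:      RLU measured at each dilution.
    reduction: cutoff as a fraction (assumed parameter, e.g. 0.5).
    Returns None if the curve never crosses the cutoff in the tested range.
    """
    neut = [fractional_neutralization(r, virus_ctrl, cell_ctrl) for r in rlus]
    for i in range(len(dilutions) - 1):
        upper, lower = neut[i], neut[i + 1]
        if upper >= reduction > lower:
            # Interpolate linearly on log10(dilution) between the two points.
            x0 = math.log10(dilutions[i])
            x1 = math.log10(dilutions[i + 1])
            frac = (upper - reduction) / (upper - lower)
            return 10 ** (x0 + frac * (x1 - x0))
    return None


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    titer = neutralization_titer(
        dilutions=[10, 100, 1000],
        rlus=[200, 600, 1000],
        virus_ctrl=1100,
        cell_ctrl=100,
        reduction=0.7,
    )
    print(titer)
```

a four-parameter logistic fit, as commonly used for such curves, would replace the interpolation step; the background subtraction and the definition of the cutoff stay the same.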
a taqman custom gene expression assay sars-cov- infection protects against rechallenge in rhesus macaques virus-specific memory cd t cells provide substantial protection from lethal severe acute respiratory syndrome coronavirus infection t cell-mediated immune response to respiratory coronaviruses epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study robust antibody and cellular responses induced by dna-only vaccination for hiv the spike protein of sars-cov--a target for vaccine and therapeutic development differential innate immune responses to low or high dose oral siv challenge in rhesus macaques a snapshot of the global race for vaccines targeting sars-cov- and the covid- pandemic development of an inactivated vaccine candidate for sars-cov- animal models for sars and mers coronaviruses sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor clinical features of patients infected with novel coronavirus in wuhan convening on the influenza human viral challenge model for universal influenza vaccines tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls a recombinant vsv-vectored mers-cov vaccine induces neutralizing antibody and t cell responses in rhesus monkeys after single dose immunization clinical and immunological assessment of asymptomatic sars-cov- infections genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding a synthetic consensus anti-spike protein dna vaccine induces protective immunity against middle east respiratory syndrome coronavirus in nonhuman primates detection of sars-cov- -specific humoral and cellular immunity in covid- convalescent individuals sars-cov- infections and serologic responses from a sample of convergent antibody responses 
to sars-cov- in convalescent individuals animal models and vaccines for sars-cov infection sars vaccines: where are we? robust t cell immunity in convalescent individuals with asymptomatic or mild covid- . biorxiv immunogenicity of a dna vaccine candidate for covid- generation of a broadly useful model for covid- pathogenesis, vaccination, and treatment lack of peripheral memory b cell responses in recovered patients with severe acute respiratory syndrome: a six-year follow-up study a single dose of chadox mers provides protective immunity in rhesus macaques virological assessment of hospitalized patients with covid- pathological findings of covid- associated with acute respiratory distress syndrome long-lived effector/central memory t-cell responses to severe acute respiratory syndrome coronavirus (sars-cov) s antigen in recovered sars patients dna vaccine protection against sars-cov- in rhesus macaques recall of humoral immune responses after viral challenge. (a) study outline. (b) igg binding elisa. sars-cov- s +s and sars-cov- rbd protein antigen binding of igg in diluted nhp sera collected prior to challenge and post challenge in ino- vaccinated (right panels) and naïve animals (left panels). (c) pseudo-neutralization assay showing the presence of sars-cov- specific neutralizing antibodies against the d and g variants of sars-cov- before and after viral challenge in unvaccinated at week naïve and ino- immunized ( per group) rhesus macaques were challenged by intranasal and intratracheal administration of . x pfu sars-cov- (us-wa isolate). (a) log sgmrna copies/ml in bal in naïve (left panel) and ino- vaccinated animals (right panel). (b) peak sgmrna and (c) viral rna in bal days post-challenge. (d) log sgmrna copies/ml in nasal swabs in naïve (left panel) and ino- vaccinated animals (right panel). 
(e) peak sgmrna and (f) viral rna in nasal swabs days post-challenge key: cord- - s o m authors: stamatakis, george; samiotaki, martina; mpakali, anastasia; panayotou, george; stratikos, efstratios title: generation of sars-cov- s spike glycoprotein putative antigenic epitopes in vitro by intracellular aminopeptidases date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: s o m presentation of antigenic peptides by mhci is central to cellular immune responses against viral pathogens. while adaptive immune responses versus sars-cov- can be of critical importance to both recovery and vaccine efficacy, how protein antigens from this pathogen are processed to generate antigenic peptides is largely unknown. here, we analyzed the proteolytic processing of overlapping precursor peptides spanning the entire sequence of the s spike glycoprotein of sars-cov- , by three key enzymes that generate antigenic peptides, aminopeptidases erap , erap and irap. all enzymes generated shorter peptides with sequences suitable for binding onto hla alleles, but with distinct specificity fingerprints. erap was the most efficient in generating peptides - residues long, the optimal length for hla binding, while irap was the least efficient. the combination of erap with erap greatly limited the variability of peptide sequences produced. less than % of computationally predicted epitopes were found to be produced experimentally, suggesting that aminopeptidase processing may constitute a significant filter to epitope presentation. these experimentally generated putative epitopes could be prioritized for sars-cov- immunogenicity studies and vaccine design. we furthermore propose that this in vitro trimming approach could constitute a general filtering method to enhance the prediction robustness for viral antigenic epitopes. severe acute respiratory syndrome coronavirus (sars-cov- ) is the pathogen responsible for coronavirus disease that is behind a major ongoing pandemic ( ) ( ) ( ) . 
virus entry into host cells is dependent on the s spike glycoprotein that forms homotrimers on the surface of the virion and interacts with the ace receptor in susceptible cells ( ) ( ) ( ) . many studies on clinical characteristics and mortality resulting from sars-cov- infection have highlighted the need for a detailed understanding of immune responses against this pathogen ( ) . while appropriate innate and adaptive immune responses are necessary for recovery from infection, aberrant immune responses can be a major contributing factor to mortality ( , ) . in parallel, understanding immune recognition of sars-cov- is crucial to the ongoing massive global effort to develop an effective vaccine against this pathogen ( ) . although early analyses have focused on the development of neutralizing antibodies, cellular immune responses are emerging as vitally important ( ) both for understanding the normal immune response against this pathogen and for designing and optimizing vaccines ( ) . in particular, t-cell mediated immunity appears to be important both for viral clearance and for long-term immunity ( ) . thus, analysis of antigenic epitopes from sars-cov- should be a priority for the design of vaccines that induce effective and long-lasting cellular immune responses ( ) . cytotoxic t-cell responses against virus-infected cells hinge on the presentation of small peptidic fragments of viral proteins, called antigenic peptides, by specialized proteins on the cell surface that belong to the major histocompatibility class i complex (mhci, also called human leukocyte antigens, hla, in humans). antigenic peptides are derived from viral proteins that are proteolytically degraded by complex proteolytic cascades ( ) . intracellular aminopeptidases, er aminopeptidase (erap ), er aminopeptidase (erap ) and insulin-regulated aminopeptidase (irap) play important roles in producing antigenic peptides, by down-sizing longer peptides to the correct length for binding onto mhci ( ) .
appropriate processing of pathogen antigens by these enzymes can determine the generation of cytotoxic immune responses and aberrant processing can lead to immune evasion ( ) . thus, it is important to understand how these enzymes process sars-cov- antigens, so as to gain insight into the efficacy of antiviral cytotoxic responses and reveal possible avenues to enhance them. in this study, we utilized a novel approach to analyze antigen trimming by intracellular aminopeptidases erap , erap and irap, focusing on the largest antigen of sars-cov- , namely s spike glycoprotein. by using tandem lc-ms/ms analysis, we were able to follow trimming in parallel of a large ensemble of peptides derived from the full length of s protein. this approach was inspired by two established observations: i) that these enzymes are expected to normally encounter a very large number of potential substrates concurrently in the cell and ii) accommodation of peptides inside a large cavity of each enzyme can lead to complex interactions between substrates that have to compete for the same space in the cavity ( ) ( ) ( ) . our analysis provides novel insight into the differences in specificity between the three enzymes and provides a potential filter of traditional bioinformatic approaches that aim to predict antigenic epitopes. finally, we propose a limited list of peptides that are potential ligands for common hla alleles and could be prioritized for further immunological analyses and vaccine design efforts. recombinant erap , erap and irap were expressed and purified as described previously. briefly, erap and erap were expressed by hi insect cells in culture after infection with baculovirus carrying the appropriate gene and purified by affinity chromatography using a c-terminal his tag ( , ) . the enzymatic extracellular domain of irap was expressed by stably transfected hek s gnti (-) cells and purified by affinity chromatography using a c-terminal rhodopsin d tag ( ) . 
enzymes were stored with % glycerol in aliquots at - c until needed. the pepmix sars-cov- peptide mixture was purchased from jbt peptide technologies gmbh. peptide pools were dissolved in dmso. prior to reactions the two peptide collections ( and peptides respectively) were mixed at equimolar concentrations and diluted in buffer containing mm hepes ph , mm nacl to a final concentration of μm. enzymatic reactions were performed in triplicate in a total volume of μl in mm hepes ph , mm nacl. freshly thawed enzyme stocks were added to each reaction to a final concentration of nm. reactions were incubated at c for hours, stopped by the addition of . μl of a % tfa solution, flash frozen in liquid nitrogen and stored at - c until analysis. the sample was preconcentrated on a pepmap lc trapping column ( . x mm) at a rate of μl of buffer a ( . % formic acid in water) in min. the lc gradient used was % buffer b ( . % formic acid in acetonitrile) to % in min, followed by an increase to % in min and a second increase to % in . min, and was then kept constant for min. the column was equilibrated for min prior to the next injection. a full ms was acquired with a q exactive hf-x hybrid quadrupole-orbitrap mass spectrometer, in the scan range of - m/z using k resolving power with an agc of x and max it of ms, followed by ms/ms scans of the most abundant ions, using k resolving power with an agc of x and max it of ms and an nce of . the mass spectrometry proteomics data have been deposited to the proteomexchange consortium via the pride ( ) partner repository with the dataset identifier pxd . we employed the maxquant computational proteomics platform version . . . to search the peak lists against the spike glycoprotein sars fasta file (swissprot accession number p dtc ) and a file containing frequently observed contaminants. n-terminal acetylation ( . da) and methionine oxidation ( . da) were set as variable modifications.
the second peptide identification option in andromeda was enabled. the false discovery rate (fdr) was set to . for peptides. the enzyme specificity was set as unspecific. the minimum peptide length was set to amino acids. the initial allowed mass deviation of the precursor ion was set to . ppm and the maximum fragment mass deviation was set to ppm. to investigate the trimming of antigenic epitope precursors by intracellular aminopeptidases that generate antigenic peptides, we used a mixture of synthetic peptides derived from the sequence of the sars-cov- s spike glycoprotein. all peptides were residues long and spanned the entire sequence of the protein with an residue overlap between adjacent peptides. this mixture allows the systematic sampling of the entire sequence of the protein. the peptide mixture was incubated with either erap , erap , irap or an equimolar mixture of erap and erap at a substrate to enzyme ratio of : and the digestion products were analyzed by lc-ms/ms using a custom search database generated by in silico digestions of the full s protein sequence (uniprot id: p dtc ). all analyses were performed on three biological replicates for each reaction, as well as a control reaction that was performed in the absence of enzyme. an additional technical replicate for the control sample was also analyzed. for statistical robustness, we performed a t-test between the control sample and each reaction and selected for further analysis only the peptides for which the quantification value changed by a statistically significant degree (p-value< . ). the relative abundance of each peptide before and after the reaction was compared by label-free quantification. analysis identified unique mers in the samples out of the total in the mixture. this represents an % coverage of included peptides, with the missing peptides possibly due to poor ionization and detection for some peptide sequences.
as a result, further analysis was limited to the peptides detectable by our experimental setup. on average, incubation with enzyme reduced the relative abundance of the mer peptides, indicating successful digestion ( figure a , b). this reduction was much more evident for erap and erap (and their mixture) than for irap. each enzyme featured a unique digestion fingerprint, suggesting different selectivity, as proposed in previous studies ( ) . since the majority of peptides presented by hla are - residues long, we analyzed the comparative abundance of - mers generated from each reaction ( figure c, d) . of the three enzymes, erap was the most efficient in generating peptides within this length range, consistent with previous reports on the mechanism of action of this enzyme ( , ) . erap and the erap /erap mixture followed, while irap was the least efficient. similar to the trimming of mers, the generation of - mers followed a unique fingerprint for each enzyme. this is consistent with the previous hypotheses that each of these enzymes accommodates peptides in a large internal cavity and that selectivity is driven by interactions with the whole sequence of the peptide ( , , ) . indeed, comparing the peptide sequences generated by each enzyme, out of peptides identified, were common between all three enzymes, between erap and erap and between erap and irap ( figure a) . furthermore, peptides were unique for erap , for erap and for irap. a similar situation was evident for - mer peptides ( figure b) . strikingly, the mixture of erap with erap generated the smallest number of distinct sequences of - mers ( figure c ). this was in contrast to the finding that the erap /erap mixture generated about the same average signal intensity as erap ( figure d ). the difference arose because the erap /erap mixture generated fewer distinct peptide sequences, each of which was nevertheless relatively abundant.
this finding is consistent with the proposed synergism of erap and erap ( ) and suggests that the combination of these two enzymes is more efficient in trimming variable sequences and can thus over-trim peptides to lengths below residues that are not detectable in our experimental setup and should not be able to stably bind onto mhci ( figure c ). as a result, incubation with the erap /erap mixture accumulates only peptides that are resistant to degradation by both enzymes. since epitope length is a key parameter for binding onto mhci (the majority of presented peptides are mers), we analyzed the distribution of lengths of peptides generated by each enzymatic reaction ( figure ). erap was very efficient in trimming the mer substrates and generated primarily mer peptides, consistent with its proposed property as a "molecular ruler" ( ) . neither erap nor irap was able to accumulate mers preferentially, but both still generated significant numbers. the mixture of erap and erap showed a similar fractional distribution of peptide lengths ( figure b ), but produced a much lower number of distinct peptide sequences ( figure a ), presumably due to over-trimming to smaller lengths or even single amino acids. table ). for each peptide we selected the best scoring hla-allele and plotted the calculated percentile rank of the predicted score for each enzymatic reaction ( figure a ). the geometric mean of the predicted affinity was lowest for erap (indicating that the erap generated peptides had the highest average affinity for hla), followed by irap and then erap . only a subset of generated peptides was predicted to bind with sufficient affinity onto at least one hla: % for erap , % for erap , % for erap /erap mixture and % for irap (peptide sequences are listed in table a ). these peptides spanned the whole sequence of the s spike glycoprotein, although each enzyme presented a unique signature onto this sequence ( figure b ).
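the length analysis described above (the fractional distribution of peptide lengths per reaction, and how many distinct sequences fall in the hla-compatible length window) can be sketched as follows. this is an illustrative outline, not the authors' pipeline: the function names are hypothetical, the window bounds are passed as parameters because the exact values are not recoverable from the text, and the example sequences are arbitrary toy strings.

```python
from collections import Counter


def length_distribution(peptides):
    """Fraction of distinct peptide sequences observed at each length."""
    unique = set(peptides)
    counts = Counter(len(p) for p in unique)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def fraction_in_window(peptides, min_len, max_len):
    """Fraction of distinct sequences whose length lies in [min_len, max_len]."""
    unique = set(peptides)
    if not unique:
        return 0.0
    in_window = sum(1 for p in unique if min_len <= len(p) <= max_len)
    return in_window / len(unique)


if __name__ == "__main__":
    # Toy sequences for illustration only; not peptides from the study.
    products = ["AAAAAAAA", "AAAAAAAAA", "CCCCCCCCC", "AAAAAAAAA", "DDDDDDDDDDDD"]
    print(length_distribution(products))
    print(fraction_in_window(products, min_len=8, max_len=10))
```

counting distinct sequences rather than signal intensity mirrors the distinction drawn in the text between the number of different peptides a reaction produces and how abundant each one is.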
showing the predicted affinity of produced peptides for common hla alleles as calculated by hlathena. color region encompasses peptides that are predicted to bind to at least one of the common hla alleles used in the analysis. panel b, schematic representation of relative locations in the s protein sequence where the generated peptides are found. panel c, venn diagrams depicting overlap between peptides of s protein predicted to bind to common hla alleles and peptides produced experimentally by erap , erap , irap or erap /erap mixture. numerals indicate number of peptides in each separate segment. in a recent publication, the authors proposed that different hla alleles can have significant variability in their ability to present sars-cov- epitopes, with hla-b : having the capability to present the fewest and hla-b : being able to present the most ( ) . we thus used hlathena to analyze the propensity of the experimentally produced peptides to bind onto these two alleles (supplemental table ). erap was found to produce potential ligands for hla-b : but only for hla-b : , consistent with the proposed trend. in contrast, erap produced potential ligands for both alleles and irap produced for hla-b : and for hla-b : . strikingly, the mixture of erap with erap produced peptides that could bind onto hla-b : , but no peptides predicted to bind onto hla-b : (table b) . thus, our findings appear to validate the hypothesis that hla-b : is likely to present more sars-cov- epitopes than hla-b : , but only for erap , which however is considered the dominant aminopeptidase activity in the cell for generating antigenic peptides. bioinformatic epitope predictions based on antigen sequence are often used as a tool to study the potential antigenicity of a particular epitope or pathogen. the power of those predictions is constantly evolving and primarily relies on predictions of binding affinity on hla. 
to compare such predictions to our experimentally generated peptides, we used the full sequence of the s spike glycoprotein and the netmhcpan . server ( ) to predict potential epitopes that could be presented by the common hla alleles indicated in the previous paragraph. the server predicted potential epitopes with lengths of - residues (supplemental table ). of those potential epitopes, however, less than % were found to be produced experimentally by one of the enzymes tested, and more specifically by erap , by erap , for the erap / mixture and by irap ( figure c ). while our experimental approach has limitations as discussed below, this finding suggests that intracellular antigen processing by aminopeptidases may constitute a major filter in determining which peptides will be presented by mhci. indeed, it has been recently proposed that the main function of erap is to limit the peptide pool available for mhci ( ) . in this context, this experimental approach could be useful in optimizing bioinformatic predictions. our findings provide new information on both the general biological functions of intracellular aminopeptidases that generate antigenic peptides as well as on specific processing of a key antigen from sars-cov- . specifically, our results highlight that each enzyme bears a unique trimming fingerprint to antigen processing. although this has been suggested before based on differences in specificity towards specific peptide substrates ( , ( ) ( ) ( ) , it has not been observed in the context of peptide ensembles. this is potentially important since competition of different peptides for the cavity in these enzymes could result in complex substrate interactions. at first glance, major differences in trimming fingerprints between each of the three enzymes may appear to impose an unnecessary complication to antigenic peptide generation.
it is conceivable, however, that this trimming variability is desirable for the immune system, since it can expand the breadth of possible antigenic peptides detected in different immunological settings and cell types. our results also highlight a previously proposed property of erap : the specialization in trimming large peptides and producing peptides that have the ideal length for mhci binding -most of the erap products fall well within that range ( ) . in contrast, both erap and irap appear to be less optimized for length selection. however, they are still able to produce many peptides that are potential cargo for mhci, casting some doubt on whether the unique trimming properties of erap are absolutely necessary for this basic function. furthermore, the combination of erap with erap appears to provide significant synergism in trimming, to the point of over-trimming peptides and limiting available sequences. synergism between erap and erap has been demonstrated before in trimming isolated peptides and these two enzymes have been proposed to also form functional dimers ( , ) . according to our observations, their combination is especially efficient in trimming. while the biological repercussions of this are not fully clear yet, it is conceivable that the strong associations between erap activity and predisposition to autoimmunity may be related to this effect ( ) . despite the current importance in understanding immune reactions in covid- , very little is known about the cellular adaptive immune responses against sars-cov- . cellular immune responses are emerging as a central player in clearing the infection and as targets for vaccine efforts ( , ) . furthermore, hla polymorphic variation has been suggested to underlie the large variability in virus clearance that has been observed amongst individuals ( ) . 
our analysis of the largest antigen of sars-cov- , s spike glycoprotein, suggests that aminopeptidase trimming can be a significant filter that helps shape which peptides will be presented by hla. thus, we propose a short list of candidate peptides that could be prioritized in downstream antigenic analysis as well as in vaccine design and efficacy studies. while the functions of erap , erap and irap have been studied in both in vitro and in vivo contexts during the last decade, their relative functional differences have only been compared in processing specific substrates at a time. however, all these enzymes have a broad substrate specificity and can normally encounter thousands of different peptides in the er or endosomal compartments. on the other hand, studies focusing on the presented immunopeptidome have revealed effects attributed to erap and erap trimming, but direct comparisons have been difficult because of the dominant effect of mhci affinity on presentation ( ). our approach stands in-between these two types of studies. it mimics the multiple-substrate situation that is likely normal in vivo but focuses on antigenic peptide precursor trimming. in this context, our approach may have broader application for the quick prediction of potential antigenic epitopes as an additional filter on bioinformatic predictions. indeed, bioinformatic predictions result in many candidate peptides, very few of which will provoke an immune response; adding more filters can increase the usefulness of these rapid approaches. however, our approach also has limitations that need to be taken into account when interpreting results. due to differences in ionization and detection by the lc-ms/ms some peptides may not be detected or may be under-represented compared to other sequences, making comparisons between different peptides less reliable. 
furthermore, it is an in vitro approach that is limited to the peptide pool used and cannot take into account the dynamics of mhci binding that can protect peptides from further aminopeptidase degradation ( ) or peptide proofreading by chaperone components or the peptide loading complex ( ) . due to those limitations, we restricted our analysis to statistical comparisons and avoided drawing conclusions regarding particular peptide sequences. in summary, we analyzed the trimming of a peptide ensemble spanning the sequence of the s spike glycoprotein of sars-cov- , the pathogen responsible for the recent covid- pandemic. our analysis provided novel insight into the function of antigen trimming enzymes and suggested that aminopeptidase trimming may be a significant filter in determining which peptides can be presented by mhci. furthermore, we have identified a limited set of peptides that were experimentally produced by elongated precursors which could be prioritized in future studies aiming to investigate the antigenicity of sars-cov- infected cells and assist in the design of highly effective vaccines that aim to produce adaptive cytotoxic responses. we propose that our experimental approach may also be useful as a general tool for enhancing bioinformatic predictions of antigenic epitopes. the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- a new coronavirus associated with human respiratory disease in china structural insights into coronavirus entry structure, function, and antigenicity of the sars-cov- spike glycoprotein structural basis of receptor recognition by sars-cov- early insights into immune responses during covid cytokine storm induced by sars-cov- sars-cov- infection-induced immune responses: friends or foes? 
avoiding pitfalls in the pursuit of a covid- vaccine t cells found in coronavirus patients 'bode well' for long-term immunity protective adaptive immunity against severe acute respiratory syndrome coronaviruses (sars-cov- ) and implications for vaccines immunoinformatics and structural analysis for identification of immunodominant epitopes in sars-cov- as potential vaccine targets. vaccines (basel) degradation of cell proteins and the generation of mhc class i-presented peptides peptidases trimming mhc class i ligands the final touches make perfect the peptide-mhc class i repertoire mechanism for antigenic peptide selection by endoplasmic reticulum aminopeptidase antigenic peptide trimming by er aminopeptidases--insights from structural studies coding single nucleotide polymorphisms of endoplasmic reticulum aminopeptidase can affect antigenic peptide generation in vitro by influencing basic enzymatic properties of the enzyme critical role of interdomain interactions in the conformational change and catalytic mechanism of endoplasmic reticulum aminopeptidase structural basis for antigenic peptide recognition and processing by endoplasmic reticulum (er) aminopeptidase crystal structure of insulin-regulated aminopeptidase with bound substrate analogue provides insight on antigenic epitope precursor recognition and processing the pride database and related tools and resources in : improving support for quantification data probing the s specificity pocket of the aminopeptidases that generate antigenic peptides the er aminopeptidase, erap , trims precursors to lengths of mhc class i peptides by a "molecular ruler" mechanism concerted in vitro trimming of viral hla-b -restricted ligands by human erap and erap aminopeptidases biovenn -a web application for the comparison and visualization of biological lists using area-proportional venn diagrams a large peptidome dataset improves hla class i epitope prediction across most of the human population netmhcpan- . 
and netmhciipan- . : improved predictions of mhc antigen presentation by concurrent motif deconvolution and integration of ms mhc eluted ligand data cell surface mhc class i expression is limited by the availability of peptide-receptive "empty" molecules rather than by the supply of peptide ligands the specificity of trimming of mhc class i-presented peptides in the endoplasmic reticulum concerted peptide trimming by human erap and erap aminopeptidase complexes in the endoplasmic reticulum placental leucine aminopeptidase efficiently generates mature antigenic peptides in vitro but in patterns distinct from endoplasmic reticulum aminopeptidase erap -erap dimerization increases peptide-trimming efficiency a genome-wide association study identifies a functional erap haplotype associated with birdshot chorioretinopathy targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals a single-cell atlas of the peripheral immune response in patients with severe covid- distribution of hla allele frequencies in chinese individuals with coronavirus disease- (covid- ). hla . . de castro, j. a. l., how erap and erap shape the peptidomes of disease-associated mhc-i proteins cutting edge: h- l(d) class i molecule protects an hiv n-extended epitope from in vitro trimming by endoplasmic reticulum aminopeptidase associated with antigen processing mhc i chaperone complexes shaping immunity key: cord- -od cntc authors: klemm, theresa; ebert, gregor; calleja, dale j.; allison, cody c.; richardson, lachlan w.; bernardini, jonathan p.; lu, bernadine g. 
c.; kuchel, nathan w.; grohmann, christoph; shibata, yuri; gan, zhong yan; cooney, james p.; doerflinger, marcel; au, amanda e.; blackmore, timothy r.; geurink, paul p.; ovaa, huib; newman, janet; riboldi-tunnicliffe, alan; czabotar, peter e.; mitchell, jeffrey p.; feltham, rebecca; lechtenberg, bernhard c.; lowes, kym n.; dewson, grant; pellegrini, marc; lessene, guillaume; komander, david title: mechanism and inhibition of sars-cov- plpro date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: od cntc coronaviruses, including sars-cov- , encode multifunctional proteases that are essential for viral replication and evasion of host innate immune mechanisms. the papain-like protease plpro cleaves the viral polyprotein, and reverses inflammatory ubiquitin and anti-viral ubiquitin-like isg protein modifications , . drugs that target sars-cov- plpro (hereafter, sars plpro) may hence be effective as treatments or prophylaxis for covid- , reducing viral load and reinstating innate immune responses . we here characterise sars plpro in molecular and biochemical detail. sars plpro cleaves lys -linked polyubiquitin and isg modifications with high activity. structures of plpro bound to ubiquitin and isg reveal that the s ubiquitin binding site is responsible for high isg activity, while the s binding site provides lys chain specificity and cleavage efficiency. we further exploit two strategies to target plpro. a repurposing approach, screening unique approved drugs and clinical compounds against sars plpro, identified no compounds that inhibited plpro consistently or that could be validated in counterscreens. more promisingly, non-covalent small molecule sars plpro inhibitors were able to inhibit sars plpro with high potency and excellent antiviral activity in sars-cov- infection models. centred around isg trp and pro /glu , docking isg onto the plpro a helix (extended data fig. c ). these interactions dislodge the ubl-fold from the fingers subdomain (figure a) . 
while the complex resembles interaction modes observed in sars plpro~isg ctd (pdb tl , , rmsd of . Å for plpro, see extended data fig. ) , some interacting residues (especially, tyr on helix a ) are not conserved (extended data fig. , c) , and seem to improve the contact in sars plpro. more variability is seen in mers plpro, which binds to ubiquitin and isg ctd similarly through its ability to 'close' the fingers subdomain (see discussion in extended data fig. ) . fig. , a,b) . consistently, sars plpro f s mutation greatly diminished extended data fig. f) , are more pronounced. mutational analysis hence confirms that the s site is more important for providing plpro with lys -polyubiquitin activity and specificity. taken together, our data illuminate in molecular detail how sars plpro targets ubiquitin and isg . repurposing known drugs to inhibit plpro activity we next focussed our attention on the urgent matter of inhibiting plpro, to confirm its drugability, and to provide new drug candidates with efficacy in treating ideally, an already clinically approved drug shows a pharmacologically relevant effect on plpro with sub-µm inhibitory potential, cell penetrance, oral bioavailability, and extensive safety profiles for the required dosage. such a drug could be expedited for clinical trials. a -well low-volume high-throughput assay previously used to identify inhibitors together, our data suggests that a repurposing strategy using unique known drugs towards sars plpro is unlikely to yield drug candidates, and highlights the importance of a counterscreen in assessing the validity of hits coming from a screen of known drugs before any conclusions on their therapeutic potential can be drawn. the robust screen and orthogonal assays for plpro will be instrumental in drug discovery campaigns. fig. , a, fig e) and its activity was confirmed by proteolytic cleavage of the gfp tag (figure d, extended data fig f) . 
nsp expression depleted lys -linked polyubiquitin, which was inhibited by rac c in a dose-dependent manner (figure d, extended data fig f) . since it had previously been shown that these inhibitors are specific for plpro over human dubs (also see figure b ) and since treatment with rac c did not affect lys -linked polyubiquitin in the absence of nsp expression (extended data fig f) , the effect of rac c on lys -polyubiquitin is likely due to inhibition of nsp /plpro. detailed molecular understanding of how plpro targets ubiquitin and isg , and a robust high-throughput screen, pave the way to structure-guided drug discovery. indeed, while available clinically tested drugs may not be suitable to target plpro (figure ) gel-based cleavage assays were performed as previously described with the for gel-based quantitative analysis, coomassie stained gel images were converted to grayscale and band intensities were quantified using imagelab (bio-rad). background intensities were automatically subtracted using a base line relative to we assessed the activity of , compounds contained in commercially available [sequence alignment of plpro from sars-cov- , sars-cov and mers-cov; residue columns not recoverable] the thumb domain, to adopt a similar orientation and interaction as seen for isg bound to mers. mers plpro ubiquitin complexes have been determined with 'open' and more 'closed' fingers extended data figure . 
separation of function mutations in sars plpro a, ubiquitin and isg binding site analysis based on pisa analysis, indicating interface residues on sars plpro pa (salmon) as bound to sars plpro highlights the different binding modes with a ~ º rotation between the two proteins. c, details of the binding of interacting residues shown as sticks. d, mutations in s and s sites were introduced in plpro, and the enzyme variants were expressed in bacteria, purified, and tested for integrity by assessing the inflection temperature, indicating the transition of folded to unfolded protein. with exception of mutating the catalytic cys to ala, which was severely destabilised and precipitated during purification inflection temperature values were determined in technical duplicate from experiments performed twice. e, f, triubiquitin cleavage to mono-and diubiquitin (left), and proisg cleavage to mature isg (right), were compared side-by-side over a time course, resolved on sds-page gels, and visualised by coomassie staining. experiments were performed in duplicate with nm enzyme and µm substrate reproduced from extended data fig b, c, for comparison. d, s site mutants as indicated extended data figure . the s site in sars plpro a, a previous structure of sars plpro bound to a non-hydrolysable, lys -linked diubiquitin probe (pdb e j, ) explained the noted preference of plpro for longer while the proximal ubiquitin unit occupies the s site in a highly similar fashion in sars~ub and sars ~ub structures (see b, figure a data fig. ), the second, distal, ubiquitin unit binds to the a helix of plpro, through a common binding mode involving the ubiquitin ile patch phe in sars plpro residue is flanked by residues involved in polar contacts structure of the sars plpro~ub complex. the s site consisting of a helix with phe residue, is fully conserved in sars plpro. 
mutation of phe to ser severely impacts triubiquitin and proisg hydrolysis (see extended data fig fluorescence polarisation assay on isg -tamra for plpro wild-type (reproduced from extended data fig. a) and plpro f s. a ~ -fold lower efficiency for f s is similar to cleavage of isg ctd -tamra suggesting that the s site contributes the difference in binding for the n- terminal ubl-fold. experiments for f s were performed in technical triplicate and biological duplicate. d-f triubiquitin (e) and proisg (f) using wild-type plpro (top, gels reproduced from extended data fig. b-d to enable direct comparison) or plpro f s (bottom) mutation of the s site has no marked effect on hydrolysis of proisg ctd (d) and reduces proisg cleavage to the same levels as proisg ctd experiments were performed in duplicate, see supplementary figure extended data figure . sars plpro compounds inhibit sars plpro a, structure of sars plpro bound to the ubiquitin c-terminal tail in the active site, compare with figure a. b, superposition of ubiquitin tail in sars plpro, and compound j in sars plpro (pdb ovz, ) shows an identical binding for compounds in sars plpro and highlights the change in tyr compounds rac j and rac k, racemic versions of j and k from , and their in vitro biochemical ic values determined by the hts assay technical triplicate and in three independent experiments (as for rac c in figure c). e, immunoblot characterisation of the plpro antibody on hek t cells overexpressing plpro from mers, sars or sars . cell lysates were immunoblotted h post transfection. f, immunoblot analysis showing the effect of rac c ( µm for h) on lys -polyubiquitin chain disassembly by nsp , h post transfection in hek t cells values in parentheses are for highest-resolution shell. b, c (Å) . , . , . key: cord- -vm m kb authors: srivastava, sukrit; verma, sonia; kamthania, mohit; agarwal, deepa; saxena, ajay kumar; kolbe, michael; singh, sarman; kotnis, ashwin; rathi, brijesh; nayar, seema. 
a.; shin, ho-joon; vashisht, kapil; pandey, kailash c title: computationally validated sars-cov- ctl and htl multi-patch vaccines designed by reverse epitomics approach, shows potential to cover large ethnically distributed human population worldwide date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vm m kb background the sars-cov- (severe acute respiratory syndrome coronavirus ) is a positive-sense single-stranded rna coronavirus responsible for the ongoing - covid- outbreak. the highly contagious covid- disease has spread to countries in less than six months. though several vaccine candidates have been proposed, an effective vaccine is not yet available. in the present study we have designed and theoretically validated novel multi-patch vaccines against sars-cov- . methodology a novel reverse epitomics approach, the “overlapping-epitope-clusters-to-patches” method, is used to identify multiple antigenic regions from the sars-cov- proteome. these antigenic regions are here termed “ag-patch or ag-patches”, for antigenic patch or patches. the identification of ag-patches is based on clusters of overlapping epitopes arising from a particular region of a sars-cov- protein. further, we have used the identified ag-patches to design multi-patch vaccines (mpvs), proposing a novel methodology for vaccine design and development. the designed mpvs were analyzed for immunologically crucial parameters, physicochemical properties and cdna constructs. results we identified ctl (cytotoxic t-lymphocyte), htl (helper t-lymphocyte) novel ag-patches from the proteome of sars-cov- . the identified ag-patches used to design the mpvs cover ( ctl and htl) overlapping epitopes targeting different hla alleles. such a large epitope coverage is not possible for multi-epitope vaccines. the large number of epitopes covered implies a large number of hla alleles targeted, and hence coverage of a large, ethnically distributed human population. 
the mpvs:toll-like receptor ectodomain complex is stable, with numerous hydrogen bonds and acceptable root mean square deviation and fluctuation. further, the cdna analysis favors high expression of the mpv constructs in a human cell line. conclusion highly immunogenic novel ag-patches were identified from the entire proteome of sars-cov- by a novel reverse epitomics approach. we conclude that the novel multi-patch vaccines could be a promising approach to combat sars-cov- , with greater effectiveness, high specificity and large human population coverage worldwide. abstract figure: a multi-patch vaccine design to combat sars-cov- and a method to prepare thereof. multi-patch vaccine designing to combat sars-cov- infection by reverse epitomics approach, “overlapping-epitope-clusters-to-patches” method. immunogenicity of all the screened ctl epitopes was also obtained by using the "mhc i immunogenicity" tool of iedb with all the parameters set to default, analyzing the st, nd, and c-terminus amino acids of the given epitope (calis et al., (sievers et al., ) . the patches of the sars-cov- protein sequences showing consensus with the clusters of overlapping epitopes were chosen and shortlisted as antigenic patches (ag-patches). this approach of searching for and identifying antigenic patches in the source protein in a reverse epitomics manner, i.e. from epitopes to antigenic patches of the source protein, is here defined as the "overlapping-epitope-clusters-to-patches" method ( figure a, b) . the reverse epitomics approach provided here, the "overlapping-epitope-clusters-to-patches" method to identify antigenic patches (ag-patches) from a pathogen's (here sars-cov- ) proteins, is included in filed patent no: worldwide. in this way, the population coverage by an epitope-mhc pair could be determined (sturniolo et al., ) . deviation) is the deviation between template residues and query residues that are structurally aligned by tm-align. 
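the "overlapping-epitope-clusters-to-patches" idea described above can be illustrated with a minimal sketch. it assumes that each screened epitope is known only by its start and end position in the source protein, and that a patch is simply the merged span of a cluster of overlapping epitopes. the function name, the `min_epitopes` threshold and the example coordinates are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# (start, end): 1-based inclusive residue positions in the source protein.
Epitope = Tuple[int, int]
# (start, end, n_epitopes): merged span and the number of epitopes it contains.
Patch = Tuple[int, int, int]


def cluster_to_patches(epitopes: List[Epitope], min_epitopes: int = 2) -> List[Patch]:
    """Merge overlapping epitope intervals into candidate antigenic patches.

    Illustrative only: a patch is kept when at least `min_epitopes`
    epitopes overlap to form it (threshold is an assumption).
    """
    patches: List[Patch] = []
    for start, end in sorted(epitopes):
        if patches and start <= patches[-1][1]:
            # Epitope overlaps the current patch: extend it and count it.
            prev_start, prev_end, count = patches[-1]
            patches[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            # No overlap: open a new candidate patch.
            patches.append((start, end, 1))
    return [p for p in patches if p[2] >= min_epitopes]


if __name__ == "__main__":
    # Hypothetical epitope coordinates, for demonstration only.
    demo = [(10, 18), (12, 20), (15, 23), (40, 48), (60, 68), (66, 74)]
    for start, end, n in cluster_to_patches(demo):
        print(f"patch {start}-{end} built from {n} overlapping epitopes")
```

in this sketch an isolated epitope (here 40-48) does not form a patch, which mirrors the requirement that patches arise from clusters rather than from single epitopes.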
the refinement of all the generated three ctl and two htl mpv models the rmsd value of the refined model shows the conformational deviation from the initial input protein model. the galaxyrefine tool refines the input tertiary structure by repeated structure perturbation as well as by subsequent structural relaxation and molecular dynamics simulation. the tool galaxyrefine generates reliable core structures from multiple templates and then re-builds the loops or termini by cell epitope from the given tertiary structure based on the location of residues in the proteins d structure. the farthest residue to be considered was limited to Å. the residues lying outside of an ellipsoid covering % of the inner core residues of the protein score highest protrusion index (pi) of . ; and so on. the discontinuous epitopes predicted by the ellipro tool are clustered based on the distance "r" in Å between two residues centers of mass lying outside of the largest possible ellipsoid of the protein tertiary structure. the larger the value of r, greater would be the distance between the residues (residue discontinuity) of toxinpred study of all the screened ctl and htl epitopes revealed that all the screened epitopes were non-toxic (supplementary table s patches from ctl and a total of ag-patches from htl high scoring overlapping epitopes ( ctl and htl epitopes) were identified (table & multi-epitope vaccines. the mevs would also face challenge to give raise to the epitopes in their intact form upon chop down processing by proteasome or lysosome, for presentation by apc ( figure a, b, , & ) highly immunogenic ag-patches were utilized to design two htl (htl-mpv- & htl-mpv- ) multi-patch vaccines. the identified ag-patches were highly conserved in nature and were identified on the basis of significant number of overlapping epitope forming clusters (figure , & ) . 
further analyzed for their amino acid sequence conservancy across the source protein sequence library retrieved from the ncbi protein database, by the "epitope conservancy analysis" tool of iedb. we found that all the identified immunogenic ag-patches, both ctl and htl, are highly conserved, with their amino acid residues largely conserved amongst the protein sequences of sars-cov- retrieved from the ncbi protein database (ctl epitope cluster ag-patches were . % to % conserved (largely above . %) and htl epitope cluster ag-patches were . % to % conserved (table & ) . (table ) . (supplementary table s , s , s , and s ). in the present study, we have reported a novel method to design a vaccine against sars-cov- by utilizing multiple antigenic patches from the viral proteins. the ag-patches used were identified from clusters of overlapping epitopes. these ag-patches were identified by reverse epitomics analysis of the high scoring ctl and htl epitopes screened from all the orf proteins of the sars-cov- virus. the method is here termed the "overlapping-epitope-clusters-to-patches" method. all the screened epitopes were well characterized for their conservancy, immunogenicity, non-toxicity and large population coverage. the clusters of overlapping epitopes led us to identify the ag-patches. these ag-patches from all the orf proteins of the sars-cov- proteome were then used to design multi-patch vaccine (mpv) candidates against sars-cov- infection. the mpvs designed from the antigenic patches of sars-cov- proteins have several advantages over subunit and multi-epitope based vaccines. the ag-patches used were identified and collected from the entire proteome of sars-cov- , which is expected to make the vaccines more effective. 
the mpvs, which consist of the identified ag-patches, have the potential to yield multiple epitopes in clusters upon processing by the proteasome and lysosome in the antigen presenting cell (apc). the identified ag-patches also give the epitopes released by proteasomal and lysosomal processing a greater chance of being presented by the apc and eliciting an immune response. since the ag-patches were identified from large clusters of overlapping epitopes, the designed mpvs could yield a larger number of epitopes upon proteasomal and lysosomal processing; hence a larger number of hla alleles could be targeted, and a larger, ethnically distributed human population could be covered, than with the limited number of epitopes used in multi-epitope vaccines. for instance, the three mpvs designed in the present study used the ag-patches which were identified by two patents have been filed from the report. gln :ser , gly :thr , gly :thr , ser :thr supplementary figure s . 
rampage analysis for all the mpvs (a) ctl-mpv- work flow concept chart from ag-patch (antigenic patch) identification to in vivo trial for the proposed mpvs against sars-cov- design of multi epitope-based peptide vaccine against e protein of human -ncov: an immunoinformatics approach preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies genome based evolutionary study of sars-cov- towards the prediction of epitope based chimeric vaccine quantitative and qualitative analyses of the immune responses induced by a multivalent minigene dna vaccine assembly and export of mhc class i peptide ligands energetics based epitope screening in sars cov- (covid ) spike glycoprotein by immuno-informatic analysis aiming to a suitable vaccine development immunoinformatics-aided identification of t cell and b cell epitopes in the surface glycoprotein of -ncov the molecular structure of the toll-like receptor ligand-binding domain development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach mediators of innate immunity that target immature, but not mature, dendritic cells induce antitumor immunity when genetically fused with nonimmunogenic tumor antigens predicting population coverage of t-cell epitope-based diagnostics and vaccines development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines properties of mhc class i presented peptides that enhance immunogenicity the ff sb force field fusion protein linkers: property, design and functionality innate immunity: structure and function of designing of interferon-gamma inducing mhc class-ii binders lecture notes in computer science rhinovirus increases human β-defensin- and- mrna expression in cultured bronchial epithelial cells preferential expression and function of toll-like receptor in human astrocytes potential t-cell and b-cell epitopes of -ncov 
protein identification and analysis tools on the expasy server six-locus high resolution hla haplotype frequencies derived from mixed-resolution dna typing for the entire us donor registry candidate targets for immune responses to -novel coronavirus (ncov): sequence homology- and bioinformatic-based predictions a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- identification of potential vaccine candidates against sars-cov- , a step forward to fight novel coronavirus in silico approach for predicting toxicity of peptides and proteins immunoinformatics analysis and in silico designing of a novel multi- epitope peptide vaccine against staphylococcus aureus. infection, genetics and evolution an effective ctl peptide vaccine for ebola zaire based on survivors' cd + targeting of a particular nucleocapsid protein epitope with potential implications for covid- vaccine design netmhcpan, a method for mhc class i binding prediction beyond humans the structure of human β-defensin- shows evidence of higher order oligomerization a flexible peptide linker enhances the immunoreactivity of two copies hbsag pres ( - ) fusion protein spike glycoprotein for prioritization of epitope based multivalent peptide vaccine. biorxiv individuality: the barrier to optimal immunosuppression peptide vaccine against the severe acute respiratory syndrome coronavirus- cov- ): a vaccine informatics approach. 
biorxiv galaxyweb server for protein structure prediction and refinement defensins as anti- inflammatory compounds and mucosal adjuvants new ways to boost molecular dynamics simulations reliable b cell epitope predictions: impacts of method development and improved benchmarking in silico identification of vaccine targets for - ncov computationally optimized sars-cov- mhc class i and ii vaccine formulations predicted to target human haplotype distributions structure validation by calpha geometry: phi, psi and cbeta deviation efficient identification of mutated cancer antigens recognized by t cells associated with durable tumor regressions ff sb: improving the accuracy of protein side chain and backbone parameters from ff sb escort service for cross-priming multi-epitope based peptide vaccine design against sars-cov- using its spike protein synonymous codon usage pattern in glycoprotein gene of rabies virus vaccineda: prediction, design and genome-wide screening of oligodeoxynucleotide- based vaccine adjuvants &tracks=[key:sequence_track,name:sequence,display_name:sequence,id:std ,annots:sequence,showlabel:false,colorgaps:false structure-based modeling of sars-cov- peptide/hla-a antigens prediction of mhc class ii binding affinity using smm-align, a novel stabilization matrix alignment method identifying mhc class i epitopes by predicting the tap transport efficiency of epitope precursors ellipro: a new structure-based tool for the prediction of antibody epitopes sequence-based prediction of vaccine targets for inducing t cell responses to sars- cov- utilizing the bioinformatics predictor recon. biorxiv insights into cross-species evolution of novel human coronavirus -ncov and defining immune determinants for vaccine development stereochemical criteria for polypeptide and protein chain conformations: ii. 
allowed conformations for a pair of peptide units i-tasser: a unified platform for automated protein structure and function prediction in silico approach for designing of a multi-epitope based vaccine against novel coronavirus (sars-cov- ) algpred: prediction of allergenic proteins and mapping of ige epitopes patchdock and symmdock: servers for rigid and symmetric docking frontier therapeutics and vaccine strategies for sars-cov- (covid- ): a review prediction of protein structure and interaction by galaxy protein modeling programs quantitative peptide binding motifs for human and mouse mhc class i molecules derived using positional scanning combinatorial peptide libraries fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega molecular systems biology designing a multi-epitope peptide- based vaccine against sars-cov- design of novel multi-epitope vaccines against severe acute respiratory syndrome validated through multistage molecular interaction and dynamics structural basis of development of multi-epitope vaccine against middle east respiratory syndrome using in silico approach. 
infection and drug resistance structural basis for designing multiepitope vaccines against covid- infection novel coronavirus (covid ) infection, the ongoing pandemic emergency: an in silico approach generation of tissue- specific and promiscuous hla ligand databases using dna microarrays and virtual hla class ii matrices modeling the mhc class i pathway by combining predictions of proteasomal cleavage, tap transport and mhc class i binding toll-like receptor signaling via trif contributes to a protective innate immune response to severe acute respiratory syndrome coronavirus infection efficient particle-mesh ewald based approach to fixed and induced dipolar interactions designing of a next generation multiepitope based vaccine (mev) against sars- cov- : immunoinformatics and in silico approaches understanding the b and t cells epitopes of spike protein of severe respiratory syndrome coronavirus- : a peptide binding predictions for hla dr, dp and dq molecules predicting protein contact map using evolutionary and physical constraints by integer programming covid- ) dashboard; th antiviral mechanisms of human defensins computational identification of rare codons of escherichia coli based on codon pairs preference mammalian defensins in immunity: more than just microbicidal immunogenicity and immune silence in human vaccine epitopes predicted to induce long-term population-scale immunity design an efficient multi-epitope peptide vaccine candidate against sars-cov- : an in silico analysis overlapping epitope clusters. 
key: cord- - yn pu authors: rinchai, darawan; kabeer, basirudeen; toufiq, mohammed; calderone, zohreh; deola, sara; brummaier, tobias; garand, mathieu; branco, ricardo; baldwin, nicole; alfaki, mohamed; altman, matthew; ballestrero, alberto; bassetti, matteo; zoppoli, gabriele; de maria, andrea; tang, benjamin; bedognetti, davide; chaussabel, damien title: a modular framework for the development of targeted covid- blood transcript profiling panels date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yn pu covid- morbidity and mortality are associated with a dysregulated immune response. tools are needed to enhance existing immune profiling capabilities in affected patients. here we aimed to develop an approach to support the design of focused blood transcriptome panels for profiling the immune response to sars-cov- infection. we designed a pool of candidates based on a pre-existing and well-characterized repertoire of blood transcriptional modules. available covid- blood transcriptome data was also used to guide this process. further selection steps relied on expert curation. additionally, we developed several custom web applications to support the evaluation of candidates. as a proof of principle, we designed three targeted blood transcript panels, each with a different translational connotation: therapeutic development relevance, sars biology relevance and immunological relevance. altogether the work presented here may contribute to the future expansion of immune profiling capabilities via targeted profiling of blood transcript abundance in covid- patients. covid- is an infectious, respiratory disease caused by a newly discovered coronavirus: sars-cov- . the severity of symptoms and the course of infection vary widely, with most patients presenting mild symptoms. however, about % of patients develop severe disease and require hospitalization ( , ) . 
the interaction between innate and adaptive immunity can lead to the development of neutralizing antibodies against sars-cov- antigens that might be associated with viral clearance and protection ( ) . but immune factors are also believed to play an important role in the rapid clinical deterioration observed in some covid- patients ( ) . there is thus a need to develop new modalities that can improve the delineation of "immune trajectories" during sars-cov- infection. blood transcriptome profiling involves measuring the abundance of circulating leukocyte rna on a genome-wide scale ( ) . processing of the samples and the raw sequencing data however, is time consuming and requires access to sophisticated laboratory and computational infrastructure. thus, the possibility of implementing this approach on large scales to ensure immediate translational potential is limited. such unbiased omics profiling data might rather be leveraged to inform the development of more practical, scalable and targeted transcriptional profiling assays. these assays could in turn serve to significantly bolster existing immune profiling capacity. fixed sets of transcripts grouped based on co-expression observed in large collections of reference datasets provide a robust platform for transcriptional profiling data analyses ( ) . here we leveraged a repertoire of transcriptional modules previously developed by our team ( ) . the repertoire is based on a collection of reference patient cohorts encompassing pathological or physiological states and individual transcriptome profiles. in this proof of principle study, we used the available transcript profiling data from two separate studies to select covid- relevant sets of modules ( , ) . next, we applied filters based on pre-specified selection criteria (e.g. immunologic relevance or therapeutic relevance). finally, expert curation was used as the last selection step. 
for this we have developed custom web applications to consolidate the information necessary for the evaluation of candidates. one of these applications provides access to module-level transcript abundance profiles for available covid- blood transcriptome profiling datasets. another web interface was implemented which serves as a scaffold for the juxtaposition of such transcriptional profiling data with extensive functional annotations. ( )]. indeed, the reference patient cohorts upon which our repertoire was based encompass a wide range of immune states, including infectious diseases but also autoimmune diseases, pregnancy and cancer. thus, the first step involved identifying subsets of modules for which changes could be observed in covid- patients. we used two sets of covid- patients for this proof of principle analysis. these datasets were contributed by xiong et al. ( ) (one control and three subjects) and ong et al. ( ) (nine controls and three subjects profiled at multiple time points). their data were generated using rna-seq and nanostring technology, respectively. the generic transcript panel used by ong et al. did not give sufficient coverage across the -module set. we thus mapped the transcript changes at a lower resolution, using a framework formed by module "aggregates". these aggregates were constituted by grouping modules based on similarities in abundance patterns across the reference datasets [see methods section and ( )]. we first assessed changes in transcript abundance resulting from sars-cov- infection across the module aggregates (figure ) . in general, we saw a decrease in aggregates associated with lymphocytic compartments (aggregates a & a ) and an increase in aggregates associated with myeloid compartments and inflammation (aggregates a & a ). 
as expected, we also saw increases over uninfected controls for the module aggregate associated with interferon (ifn) responses (a ) and the module aggregate presumably associated with the effector humoral response (a ). we detected a wide spread of values for aggregate a for the nanostring (ong et al.) dataset. however, this aggregate comprises only one module, with only two of its transcripts measured in this nanostring code set (the probe coverage across all aggregates is shown in supplementary figure ). despite large differences between the two studies in terms of design, range of clinical severity, technology platforms and module coverage, the combined overall changes (detected at a high-level perspective) are consistent with those observed in known acute infections, such as those caused by influenza, respiratory syncytial virus (rsv) or s. aureus. this consistency is evidenced by the patterns of change observed for the reference fingerprints shown alongside those of covid- patients (figure ) . overall, such a high-level analysis allows us to identify the module aggregates in our repertoire that may be included in further selection steps. in this proof of principle analysis, we selected aggregates for further analysis ( table ) . the abundance patterns for modules comprised in a given aggregate are not always homogeneous ( figure ) . thus, the next step consisted of identifying sets of modules within an aggregate that display coherent abundance patterns. to achieve this, we first mapped the changes in transcript abundance associated with covid- disease using the rnaseq dataset from xiong et al., as illustrated for a ( figure a ) and a ( figure a) . similar plots can be generated for all other aggregates using the "covid- " web application (also compiled in supplementary file and listed in table ). 
next, for each aggregate, we identified the modules that formed homogeneous clusters and assigned each cluster a module set id. for example, we designated the first a set as a /s . such module grouping is based only on patterns of transcript abundance observed in three covid- patients; however, the groupings were often consistent with those observed for the much larger reference cohorts that constitute the module repertoire ( figure b and figure b ). a /s , which is formed by m . and m . , serves as a good example of this consistency ( figure b) . likewise, the segregation of the modules forming a based on differences observed in the three covid- patients was also apparent in the reference patient cohorts ( figure b) . specifically, an increase in a /s modules, which accompanied a decrease in a /s modules, in these three patients was also characteristic of rsv patients. we ultimately derived homogeneous covid- relevant module sets from the aggregates selected in the earlier step ( table ) . these sets might be used as a basis for further selection. in the previous step, we used available covid- data to guide the selection of distinct "covid- relevant module sets". in the next step, we selected the transcripts within each module set that warranted inclusion in one of three illustrative covid- targeted panels. a first panel was formed using immunologic relevance as the primary criterion, a second was formed on the basis of relevance to coronavirus biology, and a third was constituted on the basis of relevance to therapy. for the first panel we matched transcripts comprised in each module set to a list of canonical immune genes (see methods for details). expert curation also involved accessing transcript profiling data from the reference datasets, indicating for instance leukocyte restriction or patterns of response to a wide range of immune stimuli in vitro. 
we describe our approach for module and gene annotation in more detail below and provide access to our resources to support expert curation ( table ) . for our illustrative case, we selected one representative transcript per module set to produce a panel comprised of representative transcripts ( table ) . examples of signatures surveyed by such a panel include: ) isg in a /s (interferon responses), which encodes a member of the ubiquitin family. isg plays a central role in the host defense to viral infections ( ) . ) gata in a /s (erythroid cells), which encodes a master regulator of erythropoiesis ( ) . it is associated with a module signature (a ) that we recently reported as being associated with immunosuppressive states, such as late stage cancer and maintenance immunosuppressive therapy in solid organ transplant recipients ( ) . in the same report we also found an association between this signature and heightened severity in patients with rsv infection and established a putative link with a population of immunosuppressive circulating erythroid cells ( ) . ) cd in a /s (cell cycle), which encodes the cd molecule expressed on different circulating leukocyte populations. in whole blood we find the abundance of its transcript to correlate with that of igj, tnfrsf (bcma) and txndc (m . ). such a signature was previously found to be increased in response to vaccination at day post administration, to correlate with the prevalence of antibody producing cells, and the development of antibody titers at day ( ) . ) tlr in a /s (inflammation), which encodes toll-like receptor . expression of transcripts comprising this aggregate is generally restricted to neutrophils and robustly increased during sepsis (e.g. as we have described in detail earlier for acsl , another transcript belonging to this aggregate ( )). ) gzmb in a /s (cytotoxic cells), which encodes granzyme b, a serine protease known to play a role in immune-mediated cytotoxicity. 
other transcripts forming this panel are listed in table . even with the limited amount of data available to guide the selection in the previous steps, it is reasonable to assume that such a panel (while not optimal) would already provide valid information for covid- immune profiling. additional covid- blood transcriptome data that will become available in the coming weeks will allow us to refine the overall selection process. a different translational connotation was given for this second panel. here, we based the selection on the same collection of module sets. however, this time, whenever possible, we included transcripts that could have value as targets for the treatment of covid- patients. an initial screen identified transcripts encoding molecules that are known targets for existing drugs (see methods). we further prioritized these candidates based on an expert's evaluation of the compatibility of use of the drugs for treating covid- patients. as an exception, module sets belonging to a (interferon response) were selected based on their suitability as markers of a response to interferon therapy. sets for which no targets of clinical relevance were identified ( / ) were instead represented in the panel by immunologically-relevant transcripts (defined earlier). we ultimately identified a preliminary set of targets through this high stringency selection process ( table ) ) ccr in a / (monocytes), encoding the chemokine (c-c motif) receptor , is targeted along with ccr by the drug cenicriviroc. this drug exerts potent anti-inflammatory activity ( ) . ) tbxa r in a / (platelets), encoding the thromboxane receptor, is targeted by several drugs with anti-platelet aggregation properties ( ) . ) pde a in a /s (inflammation), encoding phosphodiesterase a, is targeted by pentoxifylline, a nonselective phosphodiesterase inhibitor that increases perfusion and may reduce risk of acute kidney injury and attenuates lps-induced inflammation ( ) . 
) nqo in a /s (complement), encoding nad(p)h quinone dehydrogenase . the nqo antagonist vatiquinone (epi- ) has been found to inhibit ferroptosis ( ) , a process associated with tissue injury ( ) , including in sepsis ( ) . a complete list is provided in table . the fact that this transcript panel and the previous one survey the same pre-defined homogeneous covid- relevant module sets should make them largely synonymous (since modules are formed on the basis of co-expression). nevertheless, this second panel may be more relevant for investigators interested in exploring new therapeutic approaches or measuring responses to treatment. for the third panel designed in this proof of principle, we primarily selected transcripts based on their relevance to sars biology. as a first step, we used a literature profiling tool to identify sars, mers, or covid- literature articles that were associated with transcripts forming the covid- module sets. next, the potential associations were subjected to expert curation (see methods). once again, to keep redundancies to a minimum, we only included one candidate per set in this panel ( table ) . notable examples include: ) ltf in a /s (neutrophil activation), which encodes lactotransferrin, a protein known to block the binding of the sars-cov spike protein to host cells, thus exerting an inhibitory function at the viral attachment stage ( ) . ) furin in a /s (erythroid cells), which encodes a proprotein convertase that preactivates sars-cov- , thus reducing its dependence on target cell proteases for entry ( ) . ) egr in a /s (monocytes), encodes early growth this screen identified several molecules that may be of importance for sars-cov- entry and replication. it is expected that this knowledge will evolve rapidly over time and frequent updates may be necessary. as for the previous two panels, investigators may also have an interest in including more than one candidate per module set. 
this would of course also be feasible, although at the expense of parsimony. a vast amount of information is available to support the work of expert curators who are responsible for finalizing the selection of candidates. this process often requires accessing a number of different resources (e.g. those listed in table ). here we have built upon earlier efforts to aggregate this information in a manner that makes it seamlessly accessible to the curators. as proof of principle, we created dedicated, interactive presentations in prezi for module aggregates a ( ( )) and a (https://prezi.com/view/zycslyo nvjtwjfjkjqb/). these presentations are intended, on the one hand, to aggregate contextual information that can serve as a basis for data interpretation. on the other hand, they are intended to capture the results of the interpretative efforts of expert curators. the interactive presentations are organized in sections, each showing aggregated information from a different level: module-sets, modules and transcripts ( figure ). the information is derived from multiple online sources, including both third party applications and custom applications developed by our team ( table ) . among those is a web application developed specifically for this work, which was used to generate the covid- plots from ong et al. and xiong et al. (figure a ). the interactive presentation itself allows users to zoom in and out, determine spatial relationships and interactively browse the very large compendium of analysis reports and heatmaps generated as part of these annotation efforts. the last section, which contains transcript-centric information, is also the area where interpretations from individual curators are aggregated. we have annotated and interpreted some of the transcripts included in a /s in such a manner: ) oxtr, which encodes the oxytocin receptor through which the anti-inflammatory and wound healing properties of oxytocin are mediated ( ) . 
among our reference cohort datasets, oxtr is most highly increased in patients with s. aureus infection or active pulmonary tuberculosis ( ) . ) cd , which encodes a member of the tetraspanin family that facilitates the condensation of receptors and proteases activating mers-cov and promotes its rapid and efficient entry into host cells ( ) . ) tnfsf , which encodes ox l, a member of the tnf superfamily. although ox l is best known as a t-cell co-stimulatory molecule, reports have also shown that it is present on the neutrophil surface ( ) . furthermore, ox l blockade improved outcomes of sepsis in an animal model. our interpretation efforts have been limited thus far by expediency. certainly, interpretation will be the object of future, more targeted efforts. in the meantime, this annotation framework supports the selection of candidates forming the panels presented here. it may also serve as a resource for investigators who wish to design custom panels of their own. early reports point to profound immunological changes occurring in affected patients during the course of a sars-cov- infection ( , ) . in particular, patterns of immune dysfunction have been associated with clinical deterioration and the onset of severe respiratory failure ( ) . however, disease outcomes remain highly heterogeneous and factors contributing to clinical deterioration are poorly understood. among other modalities, means to establish comprehensive immune monitoring in cohorts of covid- patients are needed. here we designed an approach to select and curate targeted blood transcript panels relevant to covid- . when finalized, such panels could in turn serve as a basis for rapid implementation of focused transcript profiling assays. this process should become possible as more covid- blood transcriptome profiling datasets become available in the coming weeks and months. these data could, for instance, be used to refine the delineation of covid- module sets. 
because our selection strategy relies primarily on a pre-existing module repertoire framework, we anticipate that changes would only be relatively minor. indeed, one advantage of basing candidate selection on a repertoire of transcriptional modules is that it permits the derivation of non-synonymous transcript sets. in other words, each transcript included in the panel could survey the abundance of a different module (signature). basing selection on differential expression instead, for instance, would tend to select multiple transcripts from more dominant signatures (with highest significance / fold changes). but for more specific purposes, such as differential diagnosis or prediction, machine learning models would be more appropriate [for instance in sepsis studies: ( ) ]. indeed, such panels have already been developed for covid- ( ) , and we anticipate that more will emerge over the coming months. however, our intent here was different: our primary aim was to support the development of a solution that can monitor immune responses and functions. delineation of "immune trajectories" associated with clinical worsening of covid- patients is one application to consider. another application would be the measurement of responses to therapy (as part of standard of care or a trial). the immune profiling of asymptomatic or pre-symptomatic patients (e.g. quarantined) would be another setting where implementation of such an assay could prove useful. for this, it would for instance be possible to use protocols that we have previously developed for home-based self-sampling and blood rna stabilization ( , ) . different connotations were given for the three panels, which are presented here as a proof of principle. the panel consisting of immunologically relevant markers might have the highest general interest. however, measuring changes in the abundance of transcripts coding for molecules that are targetable by existing drugs could have higher translational potential. 
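the contrast drawn above between module-based selection and selection by differential expression can be illustrated with a minimal sketch. everything in it is hypothetical: the transcript names, module set labels, fold changes and priority ranks are invented for illustration, and the numeric priority rank is only a stand-in for the expert curation step, which the text does not describe as an algorithm.

```python
# Hypothetical candidates: (transcript, module_set, abs_fold_change, priority_rank).
# All names and numbers are invented; lower priority_rank means higher priority.
CANDIDATES = [
    ("t1", "setA", 8.0, 1),
    ("t2", "setA", 7.5, 2),
    ("t3", "setA", 7.0, 3),
    ("t4", "setB", 2.0, 1),
    ("t5", "setC", 1.8, 1),
    ("t6", "setC", 1.5, 2),
]


def one_per_module_set(candidates):
    """Keep the highest-priority transcript of each module set."""
    best = {}
    for transcript, module_set, _fc, rank in candidates:
        if module_set not in best or rank < best[module_set][1]:
            best[module_set] = (transcript, rank)
    return {module_set: t for module_set, (t, _rank) in best.items()}


def top_n_by_fold_change(candidates, n):
    """Keep the n transcripts with the largest absolute fold change."""
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    return [c[0] for c in ranked[:n]]


def module_sets_covered(candidates, selected):
    """Return the module sets surveyed by a list of selected transcripts."""
    lookup = {c[0]: c[1] for c in candidates}
    return {lookup[t] for t in selected}


panel = one_per_module_set(CANDIDATES)
top3 = top_n_by_fold_change(CANDIDATES, 3)
print(sorted(panel.values()))                      # one transcript per set
print(sorted(module_sets_covered(CANDIDATES, top3)))  # sets reached by top-3
```

with these invented inputs, a three-transcript panel built per module set surveys three distinct signatures, whereas the three transcripts with the largest fold changes all come from the dominant set.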
another illustrative panel comprises transcripts coding for molecules that are of relevance to sars-cov- biology and might be of additional interest in investigations of host-pathogen interactions. the common denominator between these panels is that they comprise representative transcripts of each of the module sets. other transcript combinations following the same principle would be possible, as well as the inclusion of multiple transcripts from the same set for added robustness. the obvious disadvantage of the latter, however, is the increase in the size of the panel. medium-throughput technology platforms, such as the nanostring ncounter system, fluidigm biomark or thermofisher openarray, would be appropriate for implementing custom profiling assays with the number of targets comprising the tentative panels presented here (or a combination thereof). downsizing panels to comprise ± key markers might serve as a basis for implementation on more ubiquitous real-time pcr platforms. overall, this work lays the ground for a framework that could support the development of increasingly more refined and interpretable targeted panels for profiling the authors declare no competing interests. this section includes, for a given module, reports from functional profiling tools as well as patterns of transcript abundance across the genes forming the module. drug targeting profiles were added to provide another level of information. c. gene-centric information. the information includes curated pathways from the literature, articles and reports from public resources. gene-centric transcriptional profiles that are available via gene expression browsing applications deployed by our group are also captured and used for context (gxb). a synthesis of the information gathered by expert curation and potential relevance to sars-cov- infection can also be captured and presented here. design and export of custom plots to populate the annotation framework. in house / open . ). 
the modular analysis was performed by using , rna-seq genes which overlapped with transcripts from the rd generation module construction ( ) . details of the analysis as described below section. nanostring genes which overlapped with transcripts from the nd generation module construction details of the analysis as described below section. we also used a reference dataset generated by our group that was previously used to construct the blood transcriptional module repertoire ( the method used to construct the transcriptional module repertoire has been described elsewhere ( , ) . the version used here is the third and last to have been developed by our group over a period of years. it is the object of a separate publication (available on a pre-print server ( )). briefly, the approach consists of identifying sets of co-expressed transcripts in a wide range of pathological or physiological states, focusing in this case on the blood transcriptome as the biological system. we determined co-expression based on patterns of datasets). next, this network was mined using a graph theory algorithm to define subsets of densely connected gene sets that constituted our module repertoire ("cliques" and "paracliques"). overall, transcriptional modules were identified, encompassing , transcripts. a supplemental file including the definition of this module repertoire along with the functional annotations is available elsewhere ( ) . to provide another level of granularity and facilitate data interpretation, a second round of clustering was performed to group the modules into "aggregates". this process was achieved by grouping the set of modules according to the patterns of transcript abundance across the reference datasets that were used for module construction. this segregation resulted in the formation of aggregates, each comprising between one and modules. the modular analyses were performed using the core set of the cut off comprised an absolute fc > . 
and a difference in counts > . the results for each module are reported as the percentage of its constitutive transcripts that increased or decreased in abundance. because the genes comprised in a module are selected based on the co-expression observed in blood, the changes in abundance within a given module tend to be coordinated and the dominant trend is therefore selected (the greater value of the percentage increased vs. percentage decreased). thus, the values range from - % (all constitutive transcripts are decreased) to + % (all constitutive transcripts are increased). a module was considered to be "responsive" when the proportion of transcripts found to be increased was > %, or when the proportion of transcripts found to be decreased was ≤ %. at the aggregate-level, the percent values of the constitutive modules were averaged. changes in transcript abundance summarized at the module or module-aggregate level were visualized using a custom fingerprint heatmap format. for each module, the percentage of increased transcripts is represented by a red spot and the percentage of decreased transcripts is represented by a blue spot. the fingerprint grid plots were generated using "complexheatmap" ( ) . a web application was developed to generate the plots and browse modules and module aggregates (https://drinchai.shinyapps.io/covid_ _project/). a detailed description and source code will be made available on github and as part of a separate biorxiv publication (in preparation). profiler module; accumenta biotech, boston, ma). next, the potential associations were assessed by manual curation. the curators prioritized the transcripts for which the associations could be confirmed based on importance and robustness. 
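the module-level and aggregate-level scoring described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, fold changes are assumed to be supplied as signed log2 values, and both cut-offs are left as parameters because their values are not recoverable from the text.

```python
from statistics import mean


def module_score(log2_fold_changes, count_differences, fc_cutoff, diff_cutoff):
    """Signed percentage of a module's transcripts following the dominant trend.

    Assumptions (not from the source): fold changes are signed log2 values,
    and a transcript is counted only if it passes both cut-offs.
    Returns a value between -100 (all decreased) and +100 (all increased).
    """
    n = len(log2_fold_changes)
    if n == 0:
        raise ValueError("a module must contain at least one transcript")
    increased = decreased = 0
    for fc, diff in zip(log2_fold_changes, count_differences):
        if abs(diff) <= diff_cutoff:
            continue  # difference in counts too small to be counted
        if fc > fc_cutoff:
            increased += 1
        elif fc < -fc_cutoff:
            decreased += 1
    pct_increased = 100.0 * increased / n
    pct_decreased = 100.0 * decreased / n
    # Keep the dominant trend; decreases are reported as negative values.
    return pct_increased if pct_increased >= pct_decreased else -pct_decreased


def aggregate_score(module_scores):
    """Average the signed percentages of the modules forming an aggregate."""
    return mean(module_scores)


# Hypothetical inputs: two modules of four transcripts each, cut-offs of 1.0 and 10.
mostly_up = module_score([1.5, 2.0, -0.1, 0.2], [50, 80, 5, 3], 1.0, 10)
mostly_down = module_score([-2.0, -1.5, -3.0, 1.2], [-40, -30, -60, 20], 1.0, 10)
print(mostly_up, mostly_down, aggregate_score([mostly_up, mostly_down]))
```

reporting a single signed value per module keeps the fingerprint compact, at the cost of hiding the minority trend when a module contains transcripts moving in both directions.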
immunological relevance: lists of immunologically relevant genes were retrieved from immport ( ) , and were used along with membership to ipa pathways (ingenuity pathway analysis, qiagen, germantown md) to annotate transcripts comprising covid- module sets. the curators prioritized annotated transcripts on the basis of their relevance to the functional annotations of the module set (e.g. interferon, inflammation, cytotoxic cells). the transcript with the highest priority rank was included in the assay. housekeeping genes: a recommended set of housekeeping genes is provided in table . these were selected on the basis of low variance observed across the transcriptome profiles generated for our reference cohorts. links to the resources described in this section and to video demonstrations are available in table . interactive presentations were created via the prezi web application. for this we have built and expanded upon an annotation framework established as part of the characterization of our reference blood transcriptome repertoire ( ) . several bioinformatic resources were used to populate interactive presentations that served as a framework for annotation of covid- relevant module sets. these resources include web applications deployed using shiny r, which permit to plot transcript abundance patterns at the module and aggregate levels. two of these applications were developed as part of a previous work establishing the blood transcriptome repertoire and applying it in the context of a metaanalysis of six public rsv datasets ( clinical characteristics and imaging manifestations of the novel coronavirus disease (covid- ):a multicenter study in wenzhou city clinical characteristics of coronavirus disease in china antibody responses to sars-cov- in patients with covid- on the alert for cytokine storm: immunopathology in covid- . 
arthritis rheumatol hoboken nj assessment of immune status using blood transcriptomics and potential implications for global health data-driven human transcriptomic modules determined by independent component analysis development and characterization of a fixed repertoire of blood transcriptome modules based on co-expression patterns across immunological states. biorxiv a dynamic immune response shapes covid- progression transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in covid- patients. emerg microbes infect isg in antiviral immunity and beyond regulation of gata levels in erythropoiesis identification of erythroid cell positive blood transcriptome phenotypes associated with severe respiratory syncytial virus infection. biorxiv neglected cells: immunomodulatory roles of cd + erythroid cells systems scale interactive exploration reveals quantitative and qualitative differences in response to influenza and pneumococcal vaccines long-chain acyl-coa synthetase role in sepsis and immunity: perspectives from a parallel review of public transcriptome datasets and of the literature immunotherapies for covid- : lessons learned from sepsis tocilizumab treatment in severe covid- patients attenuates the inflammatory storm incited by monocyte centric immune interactions revealed by single-cell analysis. biorxiv interleukin- blockade for severe covid- . 
medrxiv efficacy and safety study of cenicriviroc for the treatment of non-alcoholic steatohepatitis in adult subjects with liver fibrosis: centaur phase b study design from the design to the clinical application of thromboxane modulators effects of pentoxifylline on inflammatory parameters in chronic kidney disease patients: a randomized trial targeting ferroptosis: a novel therapeutic strategy for the treatment of mitochondrial diseaserelated epilepsy ferroptosis: a regulated cell death nexus linking metabolism, redox biology, and disease dexmedetomidine alleviated sepsis-induced myocardial ferroptosis and septic heart injury inhibition of sars pseudovirus cell entry by lactoferrin binding to heparan sulfate proteoglycans cell entry mechanisms of sars-cov- sars coronavirus papain-like protease inhibits the tlr signaling pathway through removing lys -linked polyubiquitination of traf and traf severe acute respiratory syndrome coronavirus orf antagonizes stat function by sequestering nuclear import factors on the rough endoplasmic reticulum/golgi membrane the tetraspanin cd facilitates mers-coronavirus entry by scaffolding host cell receptors and proteases ox ligand regulates inflammation and mortality in the innate immune response to sepsis the trinity of covid- : immunity, inflammation and intervention immune response to sars-cov- and mechanisms of immunopathological changes in covid- complex immune dysregulation in covid- patients with severe respiratory failure a community approach to mortality prediction in sepsis via gene expression analysis longitudinal peripheral blood transcriptional analysis of covid- patients captures disease progression and reveals potential biomarkers. 
medrxiv longitudinal monitoring of gene expression in ultra-low-volume blood samples selfcollected at home finger stick blood collection for gene expression profiling and storage of tempus blood rna tubes a modular analysis framework for blood genomics studies: application to systemic lupus erythematosus democratizing systems immunology with modular transcriptional repertoire analyses complex heatmaps reveal patterns and correlations in multidimensional genomic data triple combination of interferon beta- b, lopinavir-ritonavir, and ribavirin in the treatment of patients admitted to hospital with covid- : an open-label, randomised, phase trial search for specific biomarkers of ifnβ bioactivity in patients with multiple sclerosis immport, toward repurposing of open access immunological assay data for translational and clinical research characterizing blood modular transcriptional repertoire perturbations in patients with rsv infection: a hands-on workshop using public datasets as a source of training material. biorxiv an interactive web application for the dissemination of human systems immunology data copy number loss of the interferon gene cluster in melanomas is linked to reduced t cell infiltrate and poor patient prognosis a curated transcriptome dataset collection to investigate the blood transcriptional response to viral respiratory tract infection and vaccination aberrant cell cycle and apoptotic changes characterise severe influenza a infection--a meta-analysis of genomic signatures in circulating leukocytes gsan: an alternative to enrichment analysis for annotating gene sets pharmacological inhibition of ccr / signaling prevents and reverses alcohol-induced liver damage, steatosis, and inflammation in mice pi k-δ and pi k-γ inhibition by ipi- abrogates immune responses and suppresses activity in autoimmune and inflammatory disease models apobec g cytidine deaminase association with coronavirus nucleocapsid protein. 
virology apobec -mediated restriction of rna virus replication interferon-gamma and interleukin- downregulate expression of the sars coronavirus receptor ace in vero e cells coronavirus s protein-induced fusion is blocked prior to hemifusion by abl kinase inhibitors mers coronavirus induces apoptosis in kidney and lung by upregulating smad and fgf . nat microbiol sars coronavirus papainlike protease induces egr- -dependent up-regulation of tgf-β via ros/p severe acute respiratory syndrome coronavirus orf a protein interacts with caveolin selectivity in isg and ubiquitin recognition by the sars coronavirus papain-like protease a large-scale drug repositioning survey for sars-cov- antivirals. biorxiv spike protein of sars-cov stimulates cyclooxygenase- expression via both calcium-dependent and calcium-independent protein kinase c pathways sars-coronavirus open reading frame- b triggers intracellular stress pathways and activates nlrp inflammasomes severe acute respiratory syndrome coronavirus orf a protein activates the nlrp inflammasome by promoting traf -dependent ubiquitination of asc single cell rna sequencing of human tissues identify cell types and receptors of human coronaviruses association of rantes with the replication of severe acute respiratory syndrome coronavirus in thp- cells furin cleavage of the sars coronavirus spike glycoprotein enhances cell-cell fusion but does not affect virion entry protease inhibitors targeting coronavirus and filovirus entry c. figure supplementary figure key: cord- - ezisog authors: rocca, maría florencia; zintgraff, jonathan cristian; dattero, maría elena; santos, leonardo silva; ledesma, martín; vay, carlos; prieto, mónica; benedetti, estefanía; avaro, martín; russo, mara; nachtigall, fabiane manke; baumeister, elsa title: a combined approach of maldi-tof mass spectrometry and multivariate analysis as a potential tool for the detection of sars-cov- virus in nasopharyngeal swabs date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: ezisog coronavirus disease (covid- ) is caused by the severe acute respiratory syndrome coronavirus (sars-cov- ). the rapid, sensitive and specific diagnosis of sars-cov- by fast and unambiguous testing is widely recognized to be critical in responding to the ongoing outbreak. since the current testing capacity of rt-pcr-based methods is being challenged by the extraordinary worldwide demand for supplies such as rna extraction kits and pcr reagents, alternative and/or complementary testing assays should be developed. here, we exploit the potential of mass spectrometry technology combined with machine learning algorithms as an alternative fast tool for sars-cov- detection from nasopharyngeal swab samples. according to our preliminary results, mass spectrometry-based methods combined with multivariate analysis showed potential as a complementary diagnostic tool; further steps should focus on sample preparation protocols and the improvement of the technology applied. the novel coronavirus disease , caused by the sars-cov- virus, was declared a pandemic by the world health organization on march th following its emergence in wuhan, china. as of april th there were over . m confirmed cases of covid- in countries, with over , fatalities (johns hopkins coronavirus resource center, ). sars-cov- is one of four new pathogenic viruses which have jumped from animal to human hosts over the past years, and the current pandemic sends warning signs about the need for preparedness and associated research. clinical presentation of covid- ranges from mild to severe, with a high proportion of the population having no symptoms yet being equally infectious (yang et al., ; zhou et al., ). together, these features have led to intensive lockdown measures in most countries with the aim of restricting the spread of the virus, limiting the burden on healthcare systems and reducing the mortality rate. 
in parallel, there has been an extraordinary response from the scientific community. these collective efforts aim to understand the pathogenesis of the disease, to evaluate treatment strategies and to develop a vaccine at unprecedented speeds in order to minimize its impact on individuals and on the global economy (li, ; wenzhong and hualan, preprint) . the rapid, sensitive and specific diagnosis of sars-cov- by fast and unambiguous testing is widely recognized to be critical in responding to the outbreak. since the current testing capacity of rt-pcr-based methods is being challenged due to the extraordinary global demand of supplies such as rna extraction kits and pcr reagents, alternative and/or complementary testing assays need to be deployed now in an effort to accelerate our understanding of covid- disease (chin et al., ; antezack et al., ) . the aim of this work was to assess the potential of maldi-tof ms technology to create mass spectra from nasopharyngeal swabs in order to find specific discriminatory peaks by using machine learning algorithms, and whether those peaks were able to differentiate covid- positive samples from covid- negative samples. sample preparation and maldi-tof data acquisition. samples. first, we analyzed in triplicate samples of nasopharyngeal swab, preliminary tested by rt-pcr (corman et al., ) all spectra were verified using the flex analysis v . software (bruker daltonics, bremen, germany). the spectra selected for model generation and classification were treated according to a standard workflow including the following steps: baseline subtraction, normalization, recalibration, average spectra calculation, average peak list calculation, peak calculation in the individual spectra and normalization of peak list for model generation. main spectrum profiles (msps) were performed according to manufacturer's instructions. all samples were used to build an "in-house" database with maldi biotyper oc v . 
(bruker daltonics, bremen, germany); in addition, dendrograms were generated to assess the relatedness of these msps using default settings. spectra files from msps were exported as mzxml files using compassxport cxp . . . (bruker daltonics, bremen, germany) for visual analysis. at the same time, a new database was created in bionumerics v . . (applied maths, ghent, belgium) according to the manufacturer's instructions. all raw spectra were imported into the bionumerics database with x-axis trimming to a minimum of m/z. baseline subtraction (with a rolling disc with a size of points), noise computing (continuous wavelet transformation, cwt), smoothing (kaiser window with a window size of points and beta of points), and peak detection (cwt with a minimum signal to noise ratio of ) were performed. spectrum summarizing, peak matching and peak assignment were performed according to instructions from bionumerics. in short, all raw spectra were summarized into isolate spectra, and peak matching, using the option "existing peak classes only", was performed on isolate spectra using a constant tolerance of . , a linear tolerance of and a peak detection rate of %. binary peak matching tables were exported to summarize the presence of peak classes. by assigning biomarkers, only the presence and absence of peaks were investigated. to also assess quantitative peak data such as peak intensity and peak area, principal component analysis (pca), an unsupervised multivariate statistical tool, was additionally performed. all samples were examined for the unique masses (± da). classifier models based on machine learning. (stephens, ) . the top ten peaks are summarized in table . to evaluate the performance of the different approaches mentioned, accuracy, sensitivity, specificity, positive predictive value and negative predictive value were calculated (clinpro tools . : user manual, ). 
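the performance measures named above follow directly from the counts of true and false positives and negatives obtained when a classifier is compared with the rt-pcr reference result. the sketch below is a minimal illustration in python and is not the clinpro tools implementation; the function name `diagnostic_metrics` and the example counts are hypothetical and are not results from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, PPV and NPV.

    tp/fn: reference-positive samples called positive/negative.
    tn/fp: reference-negative samples called negative/positive.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    def ratio(numerator, denominator):
        # A ratio with an empty denominator is undefined; report None
        # instead of guessing a value.
        return numerator / denominator if denominator else None

    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only (not data from the study).
example = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

reporting all five values together is useful because accuracy alone can hide an imbalance between positive and negative samples, which matters when the two classes are of unequal size.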
the results were analyzed according to the following approaches: biomarker findings, construction of an "in-house" database, and the design of predictive models by machine learning. manual analysis of the spectra obtained by flex analysis v . software revealed one potential peak of negativity which was not detected in most of the positive samples: da. when bionumerics software was used, another potential biomarker was found in % of negative samples and only in % of positives: da. this peak is also detected with reproducible intensity in the average spectrum of that same class in clinprotools (fig. ). evaluation of the novel "in-house" database. the novel database was challenged with previously characterized samples different from those used to create it. they were processed in the same way as the results of recognition capacity (rc) and cross-validation (cv) values of all models used are summarized in table . for the ga classes and snn classes but correctly identified by the ga classes (not included in the statistical analysis). considering the sars-cov- undetectable samples: % ( / ) were correctly identified and only sample was misclassified (table ). discussion maldi-tof ms is a simple, inexpensive and fast technique that analyses protein profiles with a high reliability rate, and could be used as a rapid screening method in a large population (croxatto et al., ). these preliminary results suggest that maldi-tof ms coupled with clinprotools software represents an interesting alternative as a screening tool for diagnosis of sars-cov- , especially because of the good performance and accuracy obtained with samples in which viral presence was not detected. more samples need to be analyzed in order to make a definitive statement. however, this study using maldi-tof combined with machine learning has proven to be, as far as we know, a revolutionary alternative that deserves further development. 
the identification of specific biomarkers responsible for each peak or group of peaks represents a difficult and demanding task that requires further specific studies. based on the promising preliminary results, we should focus on the improvement of this potential diagnosis approach by assaying various techniques for proteins extraction to the clinical samples and on the expansion of the complementary database in the near future. these results constitute the basis for further research and we encourage researchers to explore the potential of maldi-tof ms in order to assess the feasibility of this technology, widely available in clinical microbiology laboratories, as a fast and inexpensive sars-cov- diagnostic tool. rapid diagnosis of periodontitis, a feasibility study using malditof mass spectrometry clinpro tools . : user manual stability of sars-cov- in different environmental conditions. the lancet microbe detection of novel coronavirus ( -ncov) by real-time rt-pcr covid- map -johns hopkins coronavirus resource center applications of maldi-tof mass spectrometry in clinical diagnostic microbiology structure, function, and evolution of coronavirus spike proteins edf statistics for goodness of fit and some comparisons covid- : attacks the -beta chain of hemoglobin and captures the porphyrin to inhibit human heme metabolism covid- ) : interim guidance clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a singlecentered, retrospective, observational study coronavirus disease (covid- ): a clinical update key: cord- - o l iy authors: cross, robert w.; prasad, abhishek n.; borisevich, viktoriya; woolsey, courtney; agans, krystle n.; deer, daniel j.; dobias, natalie s.; geisbert, joan b.; fenton, karla a.; geisbert, thomas w. title: use of convalescent serum reduces severity of covid- in nonhuman primates date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: o l iy passive transfer of convalescent plasma or serum is a time-honored strategy for treating infectious diseases. human convalescent plasma containing antibodies against sars-cov- is currently being used to treat covid- patients. however, most patients have been treated outside of randomized clinical trials, making it difficult to determine the efficacy of this approach. here, we assessed the efficacy of convalescent sera in a newly developed african green monkey model of covid- . groups of sars-cov- -infected animals were treated with pooled convalescent sera containing either high or low to moderate anti-sars-cov- neutralizing antibody titers. differences in viral load and disease pathology were minimal between monkeys that received the lower titer convalescent sera and untreated controls. however, and importantly, lower levels of sars-cov- in respiratory compartments, reduced gross and histopathological lesion severity in the lungs, and reductions in several parameters associated with coagulation and inflammatory processes were observed in monkeys that received convalescent sera versus untreated controls. our data support human studies suggesting that convalescent plasma therapy is an effective strategy if donors with high levels of antibodies against sars-cov- are employed and if recipients are at an early stage of disease. severe acute respiratory syndrome coronavirus (sars-cov- ) can cause a severe, potentially life-threatening viral pneumonia named coronavirus disease (world.health.organisation, ). the covid- pandemic originated in wuhan, china and spread across the globe at an explosive rate, leading to over million cases and hundreds of thousands of deaths in just over months' time. while no currently approved vaccines or therapeutics exist for covid- , a battery of medical countermeasures are being developed and assessed in human clinical trials at record speed (beigel et al., ; jackson et al., ; zhu et al., ). 
nonetheless, safety and efficacy trials can take considerable time to produce confidence prior to obtaining regulatory approvals, which creates an immediate need for therapeutic options that may be more accessible; ideally one with track records of success against related viruses. transfusion of convalescent blood products has been used in clinical settings for > years (casadevall et al., ) including for the treatment of emerging viruses such as ebola, influenza, and other viruses (mair-jenkins et al., ) . immunity to the closely related severe acute respiratory syndrome coronavirus (sars-cov) and middle east respiratory syndrome coronavirus (mers-cov) is understood to be owed, in part, to the development of potent neutralizing antibody responses (sariol and perlman, ) . indeed, some of the first approaches to treat humans for acute cases that were otherwise unresponsive to standard respiratory virus treatment protocols was the administration of convalescent plasma (cp) from sars-cov (yeh et al., ) or mers-cov (arabi et al., ) , respectively. recently, the world health organization approved a standardized protocol for the use of cp for the treatment of mers-cov (arabi et al., ) , yet there is still debate on the feasibility of cp use in the treatment of mers as donor antibody titers tend to be too low to produce therapeutic effect (corti et al., ) . nonetheless, with ever increasing caseloads and no other immediately available options, reports of the use of covid- convalescent plasma (ccp) for the treatment of severe covid- patients surfaced which suggest therapeutic benefit even if given in some cases of severe disease (joyner et al., a; shen et al., ) . building on these successes, large scale clinical trials have been initiated in association with nationwide donation programs (ccpp , ; malani et al., ) . 
in response to the worsening public health crisis, the united states food and drug administration issued a federal emergency use authorization (eua) for the use of ccp on august, to facilitate access to the treatment approach, despite ongoing clinical trials to fully evaluate safety and efficacy (us-fda, ) . in parallel with reports of success in humans, hamsters were recently used to experimentally demonstrate the potential of ccp to treat sars-cov- infection. while informative to demonstrate proof-of-concept, limited reagents are available to succinctly describe the host responses to infection and treatment in hamsters (chan et al., ; imai et al., ) . non-human primates (nhp) have long been used to model pathological responses to infection due to their physiological similarity to humans and abundance of cross-reactive reagents, which allow for more detailed analysis than possible in lower vertebrates. a natural extension to this utility is their value in determining predictive efficacy of medical countermeasures such as vaccines or therapies in humans. several groups have recently described nhp infection in a number of species including rhesus macaques, cynomolgus macaques, baboons, and marmosets (lu et al., ; rockx et al., ; singh et al., ) , none of which elicit overt clinical signs of disease reflecting the human condition, making evaluation of therapeutic approaches in these species challenging. recently we described a novel african green monkey (agm) model which recapitulates many of the most salient features of human disease including severe viral pneumonia, transient coagulopathy, and a prolonged state of recovery (cross et al., ; woolsey et al., ) . in this study, we challenged agms with sars-cov- and subsequently treated the animals with different pools of sera derived from agm previously infected with the virus. 
we provide direct evidence for the importance of neutralization potency on the efficacy of convalescent sera to reduce viral burden in the respiratory tract and to reduce systemic and localized evidence of covid- in agms. ten healthy, sars-cov- naïve agms were randomized into two treatment cohorts (n= each) and an untreated control cohort (n= ). all animals were challenged with a target dose of . x pfu of sars-cov- (sars-cov- /inmi -isolate/ /italy) via combined intranasal (i.n.) and intratracheal (i.t.) inoculation. ten hours post-challenge, the experimental cohorts were treated intravenously (i.v.) with pooled convalescent sera ( . ml/kg) obtained from animals infected with the homologous isolate of sars-cov- in previous studies (cross et al., ; woolsey et al., ) . animals in one experimental cohort received pooled sera from a group of three agms back-challenged days after primary challenge, and then euthanized at the scheduled study endpoint days after the back-challenge. to determine the sars-cov- binding and neutralizing potential of the treatment in vivo, sera from the convalescent sera-treated agms was fractionated from whole blood collected and dpi and assessed for binding by elisa and neutralizing activity by prnt (figure a,b) . the binding titer was : , (total virus), : , (anti-nucleoprotein), and : , (anti-spike rbd); which corresponded to a prnt in pooled sera of ~ : for this cohort (designated "high dose"; "hd"). the second experimental cohort received pooled sera from a group of three agms euthanized at the scheduled study endpoint days after challenge with sars-cov- (cross et al., ) ; the binding titer was : , (total virus, anti-nucleoprotein, and anti-spike rbd); which corresponded to a prnt for pooled sera of ~ : (designated as "low dose"; "ld"). animals were longitudinally monitored for clinical signs of illness and euthanized dpi. 
in agreement with previous reports (cross et al., ; hartman et al., ; johnston et al., ; speranza et al., ; woolsey et al., ) , sars-cov- -infected agms in this study showed mild to moderate clinical illness ( table ) . shifts in leukocyte populations as compared to prechallenge baseline counts; specifically, lymphocytopenia, generalized granulocytopenia (neutropenia, eosinopenia, and/or basopenia), and mild to moderate thrombocytopenia were common to most animals regardless of cohort starting approximately dpi ( table ) . four animals (hd-agm- , ld-agm- , ld-agm- , and c-agm- ) experienced monocytosis beginning - dpi; in two of these animals (hd-agm- and ld-agm- ) this coincided with generalized granulocytosis (neutrophilia, eosinophilia, and/or basophilia). prothrombin time (pt) was largely unaffected, yet a significantly prolonged coagulation time was noted at dpi for activated partial thromboplastin time (aptt) in hd versus control group (p= . ; two-way anova with tukey's multiple comparison); decreases in levels of thrombocytes were more notable in control animals; and increases in circulating fibrinogen were generally more pronounced in the control group compared to either of the experimental groups ( figure s ). these results suggest that while all animals appear to have experienced varying degrees of coagulopathy, treatment with sera with a higher sars-cov- neutralizing capacity may have partially ameliorated disease in the hd group. however, differences in these parameters were not statistically significant for most time points due to the small cohort sizes and individual animal variability. serum markers of renal and/or hepatic function (cre, alt, ast) were mildly elevated in most animals from treatment groups as well as the control group, while crp, a marker of acute systemic inflammation, was mild to moderately ( - fold over baseline) elevated - dpi in all but a single animal (ld-agm- ). 
circulating sars-cov- specific antibodies to total virus, nucleoprotein, and the receptor binding domain (rbd) of the spike protein were higher in all animals on dpi of the hd group compared to the ld group and untreated subjects, but by day , spike rbd titers in / hd animals began to decline slightly (figure a,b,c) . three of four animals in the hd-treated group had mean prnt titers of ~ : at dpi, while the fourth (hd-agm- ) had a neutralizing titer between : and : (figure d ). neutralizing titers in agms that have survived experimental sars-cov- challenge have previously been demonstrated to be as low as : , suggesting low neutralizing titers may be negligible when considered as a part of the entire immune response (cross et al., ; woolsey et al., ) . neutralizing antibody titers were markedly lower in ld-treated animals and a nearly complete lack of neutralizing activity was observed in the untreated control animals at dpi. at dpi, neutralizing antibody titers waned to ~ : in hdtreated animals and between : and : in ld-treated animals. a single control animal (c-agm- ) had similar neutralizing activity at this time point (figure e) . the rapid decrease in circulating neutralizing antibody titers by dpi and the lack of robust neutralization in the control and ld-treated animals indicated that much of the neutralizing activity in the hd-treated group was acquired through treatment. we next assessed viral load in whole blood and mucosal swabs on , , , , and dpi, bronchoalveolar lavage (bal) fluid on - , , and dpi, and lung homogenates at the study endpoint ( dpi) by both rt-qpcr detection of viral rna (vrna) and plaque titration of infectious virus. as previously reported (cross et al., ; woolsey et al., ) , there was no circulating sars-cov- detected in the peripheral blood, as assessed by rt-qpcr of whole blood and plaque titration of the plasma fraction, respectively (data not shown). 
vrna was detected in nasal swabs from three animals from the hd-treated group, four animals from the ld-treated group, and all animals from the control group, including / historical controls (figure a) . notably, in two animals from the hd-treated group, detection of vrna was limited to a single day ( dpi in hd-agm- and dpi in hd-agm- ). a low amount (~ . log pfu/ml) of infectious sars-cov- was recovered from nasal mucosa in a single animal from the hd-treated group dpi, while similar amounts of virus were detected on multiple time points for most animals from both the ld-treated and control groups (figure e) . interestingly, while levels of vrna and infectious virus from the oral mucosa were generally lower and less frequently detected in hdtreated animals compared to ld-treated animals and the two control animals from this study, both vrna and infectious virus were only detected from / animals from historical controls, despite animals from both studies being challenged with identical virus stock via identical challenge route (figure b,f) . detection of vrna from the rectal mucosa was similarly variable, present in samples from only / animals from the hd-treated group, / from the ld-treated group, and only two control animals (c-agm- and a single hc-agm), while infectious virus was not recovered from the rectal swabs of any animal (figure d , and data not shown). strikingly, while sars-cov- vrna was detected at similar levels in the bal fluid from all animals at dpi, only / animals in the hd-treated group had detectable vrna at dpi, compared to all of the animals in both the ld-treated and control groups (figure c ), while infectious virus was not recoverable at all from the bal fluid of both treated groups on day . conversely, infectious virus was detected from both controls from this study and all historical controls (figure g ). 
genomic vrna was detected in similar quantities from the lower, middle, and upper sections of the lungs from all animals in all groups (figure h ). while comparisons of viral load in mucosal compartments failed to reach statistical significance, taken together, these data suggest that treatment with convalescent sera may decrease viral replication and shedding and thus mitigate disease and transmission. on dpi, all animals were euthanized to gauge viral burden and determine pathological changes in the lungs associated with the treated infection. multifocal pulmonary consolidation with hyperemia and hemorrhage were noted in all agms at dpi. in all agms, the most severe lesions were located in the dorsal aspects of the lower lung lobes (figure ) . a board-certified veterinary pathologist approximated gross lesion severity for each lung lobe. gross lung scores were most severe in untreated control agms followed by the ld-treated animals and least severe among the hd-treated agms (figure , s ) . mild lymphoid enlargement was noted in one untreated control agm and two ld-treated agms. histologically, the untreated control agms euthanized at dpi developed interstitial pneumonia and multifocal alveolar flooding with edema, fibrin, red blood cells and mixed inflammatory cells, as previously observed (cross et al., ; woolsey et al., ) . the interstitial pneumonia was characterized by multifocal moderate expansion of alveolar septae with macrophages, lymphocytes, and fewer numbers of neutrophils (figure a) . bronchial respiratory epithelium was multifocally ulcerated with associated underlying acute inflammation. modest expansion of alveolar septae with collagen was noted multifocally (figure d) we measured plasma concentrations of a panel of cytokines/chemokines known to be altered in human covid- (laing et al., ) . on dpi, interleukin- (il- ) and il- were significantly elevated in control animals compared to both ld and hd groups [il- : p= . (hd) and . 
(ld) and il- p : p < . (hd) and . (ld); two-way anova with tukey's multiple comparison test] (figure b, e). interferon (ifn) gamma-induced protein (ip- ) was significantly elevated only in the ld cohort [p < . (compared to control) and < . (compared to hd); two-way anova with tukey's multiple comparison test] (figure c). significant elevations in macrophage chemotactic protein (mcp- ) were observed in all study and historical control animals [p= . (compared to hd); two-way anova with tukey's multiple comparison test] as well as in the ld group [p= . (compared to ld); two-way anova with tukey's multiple comparison test] on day (figure k). on day , il- was elevated only in two control animals (figure a). reduced levels of ifn beta, tumor necrosis factor (tnf) alpha, ifn gamma, il- , and mcp- were observed in all of the hd-treated animals as compared to control groups beginning dpi and trending through the remainder of the study (figure g, h, i, j, and k). in response to the dire need for therapeutic options, several human clinical trials have been initiated to evaluate both the safety and the efficacy of passive transfer of ccp to acutely ill covid- patients (joyner et al., a). overall, the consensus of these studies is that therapeutic benefit is possible; however, a number of caveats are associated with human studies, including unknown incubation times, effective dose, and/or route(s) of exposure. these shortcomings highlight the need for experimentally controlled studies to help understand potential risks of this approach. others have demonstrated therapeutic benefit in rodent models of covid- , but limited information was gleaned in terms of the pathophysiological parameters impacted by treatment with ccp (chan et al., ; imai et al., ). 
while hyper-immune plasma has been shown to provide therapeutic benefit in related mers-cov-infected primates (van doremalen et al., ), to date, there are no reports demonstrating convalescent sera or plasma treatment benefit in nhp models of covid- . here, we provide evidence that convalescent sera with high neutralizing antibody titers can be given as a postexposure treatment in an nhp model which faithfully recapitulates hallmark features of human covid- . compared to ld-treated or untreated control animals, hd-treated agms had lower viral burdens in respiratory compartments, reduced gross and histopathological lesion severity in the lungs, and reductions in several clinical parameters previously shown to be important in human disease, including prolonged coagulation times, elevated fibrinogen, thrombocytopenia, and hypercytokinemia. differences in clinical parameters between the ld-treated group and untreated control animals from this study or historical control animals were minimal; however, the lack of infectious sars-cov- in the bal samples from all of the ld-treated animals and reduced lung pathology suggest that an antiviral effect was present despite the lower concentration of neutralizing antibodies in the dose of convalescent sera administered. nonetheless, these data support the need for high potency neutralizing antibody content in convalescent blood product preparations to achieve maximal therapeutic benefit. previous work utilizing passive transfer of potent neutralizing antibody-containing plasma derived from a cohort of rhesus macaques vaccinated with a sars-cov spike expressing adc-mva vaccine resulted in exacerbation of pulmonary disease suggestive of antibody dependent enhancement (ade) (liu et al., ), which served as a caution about the potential dangers of targeting this antigen in the design of therapeutics and vaccines for sars-covs. 
while we characterized the presence of antibodies capable of binding all sars-cov- proteins, and more specifically nucleoprotein and the rbd of spike, we did not assess the differential binding potential of all of these targets in the treatments provided. therefore, we cannot rule out other antiviral mechanisms such as antibody-binding capacity or fragment crystallizable (fc) region-directed functions associated with antibodies present in these treatments. nor can we account for other soluble factors that may have been present in sera derived from survival of natural infection, which may explain the differences observed in our data versus the passive transfer of vaccine-derived sera. importantly, as a result of the lack of available therapeutic options, and the previous successes with cp in the treatment of sars-cov and mers-cov, thousands of humans have been treated with ccp through compassionate use or clinical trial access channels where preliminary results suggest treatment is well tolerated (joyner et al., b) . meta-analysis of the available data suggest a clear benefit as measured by reduced hospitalization times and minimal risk where adverse events were classified as common to cp treatments in general rather than complications related specifically to ccp or ade (joyner et al., a; sun et al., ) . ongoing clinical trials are working to refine the treatment approaches and our understanding of precisely what is needed to improve efficacy in terms of therapeutic dosing of ccp and issues related to safety (joyner et al., b) . several recent studies suggest that timing and dose may be critical to the success of treatment. for example, less therapeutic benefit was observed in patients that began treatment during advanced disease (bradfute et al., ; gharbharan et al., ) . 
further, a recent large-scale clinical trial in india highlighted the importance of donor ccp potency: the median neutralizing antibody potency of donor ccp was : (interquartile range = : - : ), whereas in our study the prnt values of both treatment groups were considerably higher ( ~ : low dose pool or : high dose pool). the trial was terminated early as no clear therapeutic benefit was evident (agarwal et al., ). here we have provided detailed experimental evidence in a relevant animal model of covid- which supports early administration of potent ccp. given the exponential growth of a convalescent covid- population, the availability of ccp as a treatment may serve as a therapeutic bridge until more potent and targeted immunotherapeutics such as monoclonal antibodies or other antiviral drugs become available. the authors would like to thank the utmb animal resource center for husbandry support. the authors declare no competing interests. the data sets used and/or analyzed during the current study are available from the corresponding author on reasonable request. , et al. ( ) . immunogenicity and safety of a recombinant adenovirus type- -vectored covid- vaccine in healthy adults aged years or older: a randomised, double-blind, placebo-controlled, phase trial. the lancet , - . the virus (sars-cov- /inmi -isolate/ /italy) was isolated on january , from the sputum of the first clinical case in italy, a tourist visiting from the hubei province of china who developed respiratory illness while traveling. the virus was initially passaged twice (p ) on vero e cells; the supernatant and cell lysate were collected and clarified following a freeze/thaw cycle. this isolate is certified mycoplasma and foot-and-mouth disease virus free. the complete sequence was submitted to genbank (mt ) and is available on the gisaid website (betacov/italy/inmi -isl/ : epi_isl_ ) upon registration. 
for in vivo challenge, the p virus was propagated on vero e cells and the supernatant was collected and clarified by centrifugation making the virus used in this study a p stock. prior to enrollment in this study, agms (chlorocebus aethiops; n= ; males, females; st kitts origin, worldwide primates, inc.) were tested for sero-reactivity to sars-cov- . all animals were seronegative. animals were anesthetized with ketamine and inoculated with a target dose of . x pfu of sars-cov- (sars-cov- /inmi -isolate/ /italy) through combined intranasal (i.n.) and intratracheal (i.t.) inoculation, with the dose being divided evenly between both routes. all animals were longitudinally monitored for clinical signs of illness, respiration quality, and clinical pathology. all measurements requiring physical manipulation of the animals were performed under sedation by ketamine. mucosal swabs were obtained using sterile swabs inserted into the mucosal cavity, gently rotated to maximize contact with the mucosal surface, and deposited into . ml screw-top tubes containing sterile mem media supplemented to % with fbs. on specified procedure days (days , , , , ), μl of blood was added to μl of avl viral lysis buffer (qiagen) for virus inactivation and rna extraction. following removal from the high containment laboratory, rna was isolated from blood and swabs using the qiaamp viral rna kit (qiagen). rna was isolated from blood and mucosal swabs and assessed using the cdc sars-cov- n assay primers/probe for reverse transcriptase quantitative pcr (rt-qpcr) [ ] . sars-cov- rna was detected using one-step probe rt-qpcr kits (qiagen) run on the cfx detection system (bio-rad), with the following cycle conditions: °c for minutes, °c for seconds, and cycles of °c for seconds and °c for seconds. threshold cycle (ct) values representing sars-cov- genomes were analyzed with cfx manager software, and data are presented as geq. 
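as an illustration of how ct values can be converted to geq, the sketch below builds a standard curve from a dilution series of quantified rna and inverts it. it is a minimal python example and not the authors' analysis, which was run in cfx manager software. the genome length (29,903 nt), the average nucleotide mass (340 g/mol), the function names, and the example dilution series are all assumptions introduced here for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def rna_copies(mass_ng, genome_length_nt=29_903, nt_mass_g_per_mol=340.0):
    """Estimate genome copies in an RNA standard from its mass.

    genome_length_nt and nt_mass_g_per_mol are illustrative assumptions,
    not values taken from the text.
    """
    molar_mass = genome_length_nt * nt_mass_g_per_mol  # g/mol of one genome
    return mass_ng * 1e-9 / molar_mass * AVOGADRO


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_geq(ct, slope, intercept):
    """Invert the standard curve to get genome equivalents for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series (log10 copies) and measured Ct values.
standards_log10 = [2, 3, 4, 5, 6]
standards_ct = [33.36, 30.04, 26.72, 23.40, 20.08]
slope, intercept = fit_standard_curve(standards_log10, standards_ct)
sample_geq = ct_to_geq(26.72, slope, intercept)
```

a slope near -3.32 corresponds to roughly 100% amplification efficiency, which is a common sanity check on a dilution series before it is used for quantification.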
to generate the geq standard curve, rna was extracted from supernatant derived from vero e cells infected with sars-cov- /inmi -isolate/ /italy, and the number of genomes was calculated using avogadro's number and the molecular weight of the sars-cov- genome.

virus titration was performed by plaque assay on all blood plasma, mucosal swab, and bronchoalveolar lavage (bal) samples. briefly, increasing -fold dilutions of the samples were adsorbed to vero e cell monolayers in duplicate wells ( μl). cells were overlaid with emem medium plus . % avicel, incubated for days, and plaques were counted after staining with % crystal violet in formalin. the limit of detection for this assay is pfu/ml.

neutralization titers were calculated by determining the dilution of serum that reduced plaque counts by % (prnt ). a standard pfu amount of sars-cov- was incubated with two-fold serial dilutions of serum samples for one hour. the virus-serum mixture was then used to inoculate vero e cells for minutes. cells were overlaid with emem medium plus . % avicel, incubated for days, and plaques were counted after staining with % crystal violet in formalin.

sars-cov- -specific igg antibodies were measured in sera by elisa at the indicated time points. immunosorbent -well plates were coated overnight with each antigen. for total virus-specific igg, plates were coated with a : dilution of irradiated sars-cov- infected or normal vero e lysate in pbs (ph . ) kindly provided by dr. thomas w. ksiazek (utmb). nucleoprotein and spike receptor binding domain (rbd) elisa kits were kindly provided by zalgen labs, llc. sera were initially diluted : and then two-fold through : , in % bsa in × pbs or in zalgen-provided reagents. after a one-hour incubation, plates were washed six times with wash buffer ( x pbs with . % tween- ) and incubated for an hour with a : dilution of horseradish peroxidase (hrp)-conjugated anti-primate igg antibody (fitzgerald industries international; cat: r-ig hrp).
rt sigmafast o-phenylenediamine (opd) substrate (p , sigma) was added to the wells after six additional washes to develop the colorimetric reaction. the reaction was stopped with m sulfuric acid - minutes after opd addition and absorbance values were measured at a wavelength of nm on a spectrophotometer (biotek cytation system). absorbance values were normalized by subtracting uncoated wells from antigen-coated wells at the corresponding serum dilution. end-point titers were defined as the reciprocal of the last adjusted serum dilution with a value ≥ . . necropsy was performed on all subjects euthanized at dpi. tissue samples of all major organs were collected for histopathologic and immunohistochemical (ihc) examination and were immersion-fixed in % neutral buffered formalin for > days. specimens were processed and embedded in paraffin and sectioned at μm thickness. for ihc, specific anti-sars immunoreactivity was detected using an anti-sars nucleocapsid protein rabbit primary antibody at a : dilution for minutes (novusbio). the tissue sections were processed for ihc using the thermofisher scientific lab vision autostainer (thermofisher scientific). secondary concentrations of immune mediators were determined by flow cytometry using legendplex multiplex technology (biolegend). serum levels of cytokines/chemokines were quantified using nonhuman primate inflammation -plex ( : dilution) kit. samples were processed in duplicate following the kit instructions and recommendations. following bead staining and washing, bead events were collected on a facs canto ii cytometer (bd biosciences) using bd facs diva software. the raw .fcs files were analyzed with biolegend's cloud-based legendplex™ data analysis software. the data was analyzed and graphed in graphpad prism . . lesion severity scores and tissue pcr and plaque assay titers were analyzed using either a one-way anova or a two-way anova supported by tukey's multiple comparisons test. 
elisas and legendplex assays were also analyzed using a two-way anova and tukey's multiple comparisons test. no data points were excluded for our analyses. significance cut-off values are outlined per data in their respective results sections only where significant differences in groups compared existed. (cross et al., ; woolsey et al., ) . elisa binding titers of sera collected on the indicated time points from agms enrolled in the current study following challenge with sars-cov- and passive transfer of convalescent sera for igg against sars-cov virus cellular lysates (total, (a)), nucleoprotein (n, (b) ), or spike protein receptor binding domain (rbd, (c)). prnt assays were performed on pooled convalescent sera from agms challenged with the homologous isolate of sars-cov- in previous studies (cross et al., ; woolsey et al., ) compared with control animals on day post infection (d) and day post infection (e). data shown is the percent reduction in sars-cov- plaque counts (mean of duplicate wells) compared to a control plate (no sera). agms. sars-cov- viral load was assessed by plaque titration (b, d, f) and/or rt-qpcr (a, c, e, g, h) from mucosal swabs and bal fluid collected at the indicated timepoints, and lung tissue harvested at necropsy ( dpi). for all panels, individual data points represent the mean of two technical replicates from a single assay. dashed horizontal lines indicate the limit of detection (lod) for the assay ( geq/ml for rt-qpcr; pfu/ml for plaque titration). bars indicate the mean for each cohort at the indicated time point, values for individual animals within each cohort are shown as color-coded symbols. historical control animals from a previous study utilizing the homologous virus are included for statistical purposes. error bars indicate the upper bound sd. to fit on a log scale axis, zero values (below lod) are plotted as " " ( ); however, statistical comparisons were performed using the original zero values. 
statistical significance was assessed by two-way anova with the geisser-greenhouse correction without the assumption of sphericity, except for (e) where mixed-effects modeling was used to account for missing historical control values at - dpi, and (f) where no correction for sphericity was necessary as only two time points were being compared. tukey's post-hoc test for multiple comparisons was used to assess differences between cohorts at matched time points. statistically significant comparisons are indicated by asterisk(s) (*= . ≤ p ≤ . , **= . ≤ p ≤ . ). baselines ( dpi) were calculated for each subject at the indicated timepoints. a two-way anova and tukey's multiple comparisons test was used to compare each treatment group at the respective time point (*, p < . ; **, p < . ; ***, p < . ; ****, p < . ). shown is the average value of duplicate samples for each subject and analyte. historical control animals from a previous study utilizing the homologous virus were included for statistical purposes. figure s : coagulation profiles of agms following challenge with sars-cov- and treatment with convalescent sera. prothrombin time (pt) (a), activated partial thromboplastin time (aptt) (b), and fibrinogen levels (shown as fold change from individual subject baseline values) (b), were measured using a vetscan vspro coagulation analyzer. note that historical control animals were not included for analysis in panels (a-d) due to a lack of available data for these parameters. bars indicate the mean for each cohort at the indicated timepoint, values for individual animals within each cohort are shown as color-coded symbols. error bars indicate the upper bound sd. statistical significance was assessed by two-way anova with the geisser-greenhouse correction without the assumption of sphericity, followed by tukey's post-hoc test for multiple comparisons to assess differences between cohorts at matched time points. asterisk indicates significance ( . ≤ p ≤ . ). 
(d) fold change from individual subject baselines ( dpi) in absolute counts of thrombocytes were calculated for each subject at the indicated timepoints (a). historical control animals from a previous study utilizing the homologous virus were included for statistical purposes. convalescent plasma in the management of moderate covid- in india: an open-label parallel-arm phase ii multicentre randomized controlled trial (placid trial). medrxiv feasibility, safety, clinical, and laboratory effects of convalescent plasma therapy for patients with middle east respiratory syndrome coronavirus infection: a study protocol remdesivir for the treatment of covid- -preliminary report severe acute respiratory syndrome coronavirus neutralizing antibody titers in convalescent plasma and recipients in new mexico: an open treatment study in patients with coronavirus disease . the journal of infectious diseases passive antibody therapy for infectious diseases national covid- convalescent plasma project-component simulation of the clinical and pathological manifestations of coronavirus disease rapid generation of a human monoclonal antibody to combat middle east respiratory syndrome intranasal exposure of african green monkeys to sars-cov- results in acute phase pneumonia with shedding and lung injury still present in the early convalescence phase convalescent plasma for covid- . a randomized clinical trial. medrxiv sars-cov- infection of african green monkeys results in mild respiratory disease discernible by pet/ct imaging and shedding of infectious virus from both respiratory and gastrointestinal tracts syrian hamsters as a small animal model for sars-cov- infection and countermeasure development an mrna vaccine against sars-cov- -preliminary report development of a coronavirus disease nonhuman primate model using airborne exposure. biorxiv evidence favouring the efficacy of convalescent plasma for covid- therapy. 
medrxiv early safety indicators of covid- convalescent plasma in patients a dynamic covid- immune signature includes associations with poor prognosis anti-spike igg causes severe acute lung injury by skewing macrophage responses during acute sars-cov infection comparison of nonhuman primates identified the suitable model for covid- the effectiveness of convalescent plasma and hyperimmune immunoglobulin for the treatment of severe acute respiratory infections of viral etiology: a systematic review and exploratory meta-analysis convalescent plasma and covid- comparative pathogenesis of covid- , mers, and sars in a nonhuman primate model lessons for covid- immunity from other coronavirus infections treatment of critically ill patients with covid- with convalescent plasma sars-cov- infection leads to acute infection with dynamic cellular and inflammatory flux in the lung that varies across nonhuman primate species. biorxiv sars-cov- infection dynamics in lungs of african green monkeys. biorxiv a potentially effective treatment for covid- : a systematic review and meta-analysis of convalescent plasma therapy in treating severe infectious disease fda issues emergency use authorization for convalescent plasma as potential promising covid- treatment, another achievement in administration's fight against pandemic establishment of an african green monkey model for covid- . biorxiv : the preprint server for biology key: cord- - gp cry authors: hoagland, daisy a.; clarke, daniel j.b.; møller, rasmus; han, yuling; yang, liuliu; wojciechowicz, megan l.; lachmann, alexander; oguntuyo, kasopefoluwa y.; stevens, christian; lee, benhur; chen, shuibing; ma’ayan, avi; tenoever, benjamin r title: modulating the transcriptional landscape of sars-cov- as an effective method for developing antiviral compounds date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: gp cry to interfere with the biology of sars-cov- , the virus responsible for the covid- pandemic, we focused on restoring the transcriptional response induced by infection. utilizing expression patterns of sars-cov- -infected cells, we identified a region in gene expression space that was unique to virus infection and inversely proportional to the transcriptional footprint of known compounds characterized in the library of integrated network-based cellular signatures. here we demonstrate the successful identification of compounds that display efficacy in blocking sars-cov- replication based on their ability to counteract the virus-induced transcriptional landscape. these compounds were found to potently reduce viral load despite having no impact on viral entry or modulation of the host antiviral response in the absence of virus. rna-seq profiling implicated the induction of the cholesterol biosynthesis pathway as the underlying mechanism of inhibition and suggested that targeting this aspect of host biology may significantly reduce sars-cov- viral load. a zoonotic transmission of an airborne virus capable of rapid spread has always been a potentially devastating threat to human society; this risk has only increased with globalization. indeed, the recent emergence of sars-cov- has proved an unprecedented global challenge. as the scientific community works fastidiously to learn more about the biology of the virus and develop a vaccine, there is a pressing need to identify therapeutics to reduce morbidity and mortality. to better understand the basis of the disease, initial efforts have focused on determining how different cell types and tissues respond to sars-cov- infection (blanco-melo et al., ; zhou et al., ; ziegler et al., ) . these efforts have generated a high-level understanding of how virus replication correlates with induction of the antiviral response and the changes in cellular processes. 
in general, a viral infection results in the production of numerous inflammatory triggers which can include aberrant rna, misfolded proteins, and/or damage to the cell (devarkar et al., ). in the case of sars-cov- , the virus appears to mask aberrant rna to prevent a robust induction of the cellular antiviral response. however, the processes of sars-cov- replication still result in a unique transcriptional footprint that presumably accommodates the virus life cycle (blanco-melo et al., ). here we attempt to leverage this information to predict compounds that counter these conditions by inverting transcriptional signatures. ideally, in the face of a pandemic we would be able to deploy a pan-antiviral that universally targets a transcriptional footprint common to all viruses. as viruses are obligate parasites, they rely almost exclusively on host machinery to effectively replicate and spread, making it difficult to develop an antiviral that is nontoxic to the host. adding to this challenge is the fact that every human pathogen antagonizes some portion of the cellular antiviral response, making each transcriptional landscape unique (garcía-sastre, ). while development of family-specific antivirals has had success with viruses such as orthomyxoviridae (krammer et al., ) and human immunodeficiency virus (gulick and flexner, ), such antivirals still do not exist for the coronavirus family. in the absence of a pan-specific coronavirus drug, or a sars-cov- vaccine, the next most useful tool would be the effective repurposing of fda-approved drugs. as this expansive list of compounds has been characterized in humans for toxicity and delivery, identified compounds could be immediately deployed should they show beneficial effects. so far, most drug repurposing efforts involve computational predictions via structural biology, network analysis, or in vitro drug screens (gordon et al., ).
until now, these drug screens have produced low overlap (kuleshov et al., ), a phenomenon likely due to many variables including the screen's readout or the choice of cell platform. there is an underutilized opportunity to understand the global molecular changes these drugs induce by using available sequencing data from diverse models and cell types. the library of integrated network-based cellular signatures (lincs) is an nih consortium working to define multi-omics footprints for pre-clinical and approved drugs (subramanian et al., ). this resource offers a promising computational means of finding compounds which may counter or mimic the gene expression changes induced by a given perturbation. by comparing changes in gene expression before and after drug treatment and/or viral infection, we can identify transcriptional changes that are inversely correlated between these two groups. applied here, such an approach may methodically identify antivirals whose mechanisms of action are based in returning cells to homeostasis while impairing virus replication. as past efforts successfully utilized the lincs resource to identify drugs that attenuate ebola virus in cell-based assays (duan et al., ), here we applied this approach to identify antivirals for sars-cov- . by analyzing expression profiling of sars-cov- infected cells, we identified a high-dimensional region in expression space unique to this virus, namely cholesterol biosynthesis, whose perturbation interferes with virus biology. drugs that perturb this region were validated in multiple cell-based assays including sars-cov- -infected vero cells, a ace cells, and human organoids. in all, these efforts identified four promising drug candidates: amlodipine, loperamide, terfenadine, and berbamine, each of which inhibits the virus independent of viral entry. in a recent study (blanco-melo et al., ), we applied gene expression profiling to generate transcriptional signatures from lung cultures infected with sars-cov- .
those analyses revealed that virus infection resulted in robust replication and a strong cellular response. as a follow-up, we used the rna-seq data collected from these studies and processed them using biojupies to create differential expression signatures (see methods). these signatures were then used as queries against the lincs l dataset, a collection of gene expression profiles generated following the administration of > , bioactive compounds including > , fda-approved drugs to human cell lines at a variety of different times and concentrations (subramanian et al., ). with l fwd, we could identify reciprocal transcriptional signatures between sars-cov- infection and a given compound. specifically, we identified a common region in the l fwd expression space that is not well characterized, and where genes down-regulated in sars-cov- infected cells are up-regulated by a collection of drugs and small molecules profiled by the l platform. by manually examining the drugs that consistently target this region in expression space, we identified terfenadine, loperamide, berbamine, trifluoperazine, amlodipine, rs- , and chlorpromazine as having potential therapeutic value against sars-cov- (figure a). in support of these findings, it is noteworthy that loperamide, berbamine, and trifluoperazine were recently reported as hits in an independent drug screen for compounds that block sars-cov- infection in african green monkey kidney (vero-e ) cells (jeon et al., ). the transcriptional signature implicating the aforementioned compounds includes genes down-regulated by the virus that are consistently up-regulated by these drugs ( out of / , p-value< . e- , fisher exact test; and out of / / , p-value< . e- , empirical simulations) (figure b).
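the overlap statistic quoted above can be illustrated with a one-sided fisher exact (hypergeometric) test. the python sketch below is not the authors' code. the function names, the choice of a one-sided test, and the example set sizes are assumptions made for illustration only, and the empirical simulation used for the three-way overlap is not reproduced.

```python
from math import comb


def overlap_p_value(n_background, n_set_a, n_set_b, n_overlap):
    """One-sided Fisher exact test for gene-set overlap.

    Returns P(overlap >= n_overlap) when n_set_b genes are drawn without
    replacement from a background of n_background genes, of which
    n_set_a belong to the first set (hypergeometric upper tail).
    """
    total = comb(n_background, n_set_b)
    upper = min(n_set_a, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def reversal_overlap(virus_down, drug_up, background):
    """Genes down-regulated by infection and up-regulated by a drug."""
    background = set(background)
    virus_down = set(virus_down) & background
    drug_up = set(drug_up) & background
    shared = virus_down & drug_up
    p = overlap_p_value(len(background), len(virus_down),
                        len(drug_up), len(shared))
    return sorted(shared), p


# Hypothetical sizes: 20,000 background genes, two sets of 250, 30 shared.
example_p = overlap_p_value(20_000, 250, 250, 30)
```

with these hypothetical sizes only about three shared genes would be expected by chance, so an overlap of 30 gives a very small p-value. the same logic underlies the significance reported for the drug and virus signatures.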
overall, based on the l data, these seven compounds influence the same pharmacological high-dimensional gene expression signature space and are predicted to disrupt key cellular processes that are modulated in response to sars-cov- infection. moreover, the expression signatures induced by all seven drugs across multiple cell-line contexts are highly correlated ( figure s a ). in addition to leveraging the l data, we also examined the top genes co-expressed with ace , the established receptor of sars-cov- (hoffmann et al., ) , based on correlations computed across thousands of diverse rna-seq datasets . we then searched for fda-approved compounds that induce or suppress the expression of these genes using drug signatures extracted from other independent transcriptome studies . this approach identified one additional compound, quercetin, as an effector of these ace co-expressed genes ( figure s b ). hence, all together, eight drugs were selected to be tested for efficacy against sars-cov- in vitro. next, we examined whether these eight drugs influenced sars-cov- infection of vero-e cells by applying them at the same concentration used in the l experiments ( μm) with the exception of quercetin ( μm), which was determined based on the concentration used in the top matching signatures extracted from geo (mutch et al., ) . out of the eight drugs, seven drugs completely abolished the presence of sars-cov- nucleocapsid when administered one hour prior to infection (figure a) . quercetin was the only drug where the viral nucleocapsid remained present, while chlorpromazine induced cytotoxicity and therefore was not characterized further ( figure s a ). we next tested the remaining seven drugs in more relevant cell-based models. specifically, we administered the compounds to an adenocarcinomic human alveolar basal epithelial (a ) cell line that constitutively expresses ace (herein referred to as a ace ). 
we pre-treated these cells with terfenadine, loperamide, berbamine, trifluoperazine, amlodipine, and rs- and infected them with sars-cov- . in agreement with our vero-e results, these data demonstrated that all six drugs successfully blocked virus replication (figure s b). these results were further corroborated at the rna level by quantitative rt-pcr based on the ' transcription-regulatory sequence (trs) and the ' end of the n open reading frame, which confirmed the effect of the six remaining drugs in both vero-e (figure s c) and a ace cells (figure b). lastly, to assess how these compounds impact viral growth, we performed plaque assays on the supernatant of treated a ace cells and found that terfenadine and berbamine reduced the production of progeny virions to below the limit of detection, whereas loperamide, trifluoperazine, amlodipine, and rs- all significantly reduced viral output by - orders of magnitude (figure c). to determine if inhibition was mediated by a block in viral entry, we tested each compound for the capacity to block renilla luciferase activity delivered by a recombinant vesicular stomatitis virus pseudotyped with sars-cov- spike. these data showed that entry was not affected by any of the tested compounds (figure d). to further understand the molecular basis for the inhibition of sars-cov- , we next administered the drugs to a ace cells both in the absence and presence of virus and measured genome-wide gene expression with rna-seq under all these conditions. as expected, in the absence of virus we observe that the six drugs induce expression signatures comparable to those generated by processing the l data ( figure figure s ). moreover, sars-cov- viral reads per million reads (rpm) in the rna-seq data are orders of magnitude lower in the six drug-treated populations than in dmso-treated a ace populations (figure b).
taken together, these annotation signatures implicate cholesterol biosynthesis and cell cycle as the two most prominent pathways modulated by the administration of these compounds. however, as there are thousands of other l compounds known to impact cell cycle, these data would implicate cholesterol biosynthesis as the predominant mode of action in inhibiting sars-cov- in these assays. to further assess the global effects of these drugs, we next applied them to human pluripotent stem cell-derived pancreatic endocrine organoid cultures, previously shown to support robust sars-cov- replication. in addition to testing the drugs for their ability to attenuate infection, we also performed genome-wide rna-seq profiling to examine the effect of these drugs on the human pancreatic organoid cultures. principal component analysis of infected organoids found that all these samples formed three major clusters in this transcriptional space defined by variance one and two (figure a). these data demonstrate that pca- could be largely defined as an uninfected vs. infected signature. one exception to this was berbamine, which formed a unique cluster for both variance one and two. when these same data were mapped on variance two and three, one could observe greater separation between the non-berbamine drug treatment and dmso, reflecting the changes to cholesterol biosynthesis (figure s a). aligning captured reads to a sars-cov- reference genome, we find that while trifluoperazine and rs- demonstrated only modest reduction of viral load, loperamide, terfenadine, amlodipine, and berbamine all successfully reduced virus transcription by approximately three orders of magnitude (figure b).
while genes involved in the cholesterol biosynthesis pathway are still upregulated in deg analysis (figure d -i, tables s - ), the most upregulated go annotation in amlodipine- and terfenadine-treated organoids is the type i interferon signaling pathway, and in berbamine-treated human organoids it is the regulation of signal transduction (figure c), indicating a more nuanced mechanism of action in this more biologically relevant tissue model. the lack of enrichment in trifluoperazine- and rs- -treated organoids is reflected in their decreased efficacy in lowering viral reads (figure b). taken together, these data suggest that screening for small molecules capable of reversing the transcriptional landscape induced by sars-cov- infection may be a bona fide means of identifying novel antiviral therapeutics. while additional in vivo testing will be required to ascertain whether amlodipine, berbamine, loperamide, and/or terfenadine represent effective antivirals against sars-cov- , the findings outlined here also provide us with a greater understanding of virus-host interactions. here we identified several potential drug repurposing candidates based on analysis of published rna-seq and l gene expression data. specifically, we identified a region in the pharmacological gene expression space that regulates cholesterol biosynthesis and appears to negatively impact virus replication. this led us to predict seven drugs as potential sars-cov- inhibitor candidates: terfenadine, loperamide, berbamine, trifluoperazine, amlodipine, rs- , and chlorpromazine. loperamide is an approved over-the-counter drug used to treat diarrhea, which is one of the symptoms of covid- . berbamine is an experimental drug extracted from a shrub native to japan, korea, and parts of china (greathouse and rigler, ). it is an alkaloid and a channel blocker that was previously shown to display anti-lung cancer (duan et al., ) and anti-inflammatory activity (jia et al., ).
trifluoperazine was shown to block epstein-barr virus (nemerow and cooper, ) , while amlodipine is an approved drug for blood pressure and chest pain (angina) (chahine et al., ) . amlodipine was recently reported to improve the outcome of patients with covid- (solaimanzadeh, ; zhang et al., ) . the preclinical drug rs- is a known ccr antagonist that blocks ccl signaling in sepsis (souto et al., ) . hence, its modulation of the immune response may interfere with sars-cov- host interactions. terfenadine was withdrawn from the market in after years due to evidence for cardiac toxicity via the binding to the herg channel yet may still provide short term benefit as a sars-cov- prophylactic or acute treatment (roy et al., ) . chlorpromazine was shown to block epstein-barr virus (nemerow and cooper, ) and influenza a virus (krizanova et al., ) and very recently was found to be effective against sars-cov- in multiple in-vitro assays (plaze et al., ) , although we ruled it out early due to cellular toxicity. these seven drugs were first tested in vero cells by two independent cell-based assays with six of them showing positive results. the six most promising compounds were then tested in a ace cells and in pancreatic endocrine human organoids. the six drugs significantly diminished sars-cov- infection in a ace cells, and four out of six of these drugs also inhibited the virus in the human organoid model. rna-seq profiling of a ace cells and the organoids in the presence of these drugs showed that berbamine displays quite different transcriptional effects compared with the other compounds. it has a more pronounced effect that is opposite in expression space, bringing the tissue closer to its normal uninfected state. it should be noted that the module of genes that are up regulated by the identified compounds is known to be targeted by commonly used cholesterol lowering drugs. 
it was suggested that statins may be considered as a potential treatment (castiglione et al., ) , but there is some evidence that statins may increase covid- infection (shrestha, ) . while such effects need to be further determined, our analysis suggests that the same pathways are likely involved in sars-cov- pathogenesis and have potential for sars-cov- inhibition. in this regard it is noteworthy that sars-cov- replication demands extensive intracellular membranes (snijder et al., ) which would be impacted by changes to cholesterol biogenesis, as is the case for hepatitis c virus, another positive sense rna virus (paul et al., ; stoeck et al., ) . while we demonstrated here the efficacy for several promising repurposed drugs as potential covid- treatments in multiple cell-based assays, it remains to be determined whether these drugs will show any benefit to patients. the next step will be to test these compounds in an animal model before initiating clinical trials. furthermore, with varying efficacy in different cell culture models and tentative value as therapeutics, all six aforementioned drugs clearly inhibit sars-cov- replication in vitro. should animal models prove these drugs to be ineffective, these data would still be of value as they can be used to study the biology of sars-cov- as it pertains to cholesterol biosynthesis. direction method on the normalized matrix or with limma on the original count matrix filtered using the method described in (chen et al., ) . all differentially expressed genes were submitted for analysis with l fwd and enrichr. jupyter notebooks from previously published rna-seq data were generated with biojupies. consensus analysis was performed by counting gene overlap in the top up or down differentially expressed genes based on the consensus l fwd and rna-seq signatures generated from the a -ace cells. 
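the overlap counting used for the consensus analysis can be sketched in a few lines of python. this is an illustrative sketch and not the authors' notebook. the function names, the toy gene identifiers and scores, and the cutoff `n` are hypothetical, and the number of top genes actually used is not stated here.

```python
def top_genes(signature, n):
    """Return (top-n up, top-n down) gene sets from a {gene: score} mapping.

    Scores are assumed to be signed differential-expression values,
    positive for up-regulation and negative for down-regulation.
    """
    ranked = sorted(signature, key=signature.get, reverse=True)
    return set(ranked[:n]), set(ranked[-n:])


def consensus_overlap(sig_a, sig_b, n):
    """Genes shared between the top-n up and top-n down sets of two signatures."""
    up_a, down_a = top_genes(sig_a, n)
    up_b, down_b = top_genes(sig_b, n)
    return {"up": sorted(up_a & up_b), "down": sorted(down_a & down_b)}


# Toy signatures with placeholder gene names (not real data).
rnaseq_sig = {"G1": 3.0, "G2": 2.0, "G3": 0.1, "G4": -1.5, "G5": -2.5}
l_fwd_sig = {"G1": 1.2, "G3": 2.2, "G2": 0.9, "G5": -0.7, "G4": -3.0}
shared = consensus_overlap(rnaseq_sig, l_fwd_sig, n=2)
# shared == {"up": ["G1"], "down": ["G4", "G5"]}
```

for the up and down sets to stay disjoint, `n` should be no more than half the number of genes in each signature.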
consensus l fwd genes for the drugs were created by enumerating all up and down signatures present in the l fwd database associated with those drugs. first the instances of genes appearing in the up or down gene sets were counted, and the cdf of the z-scored counts was computed to quantify the probability of a given gene being an essential member of signatures for a given drug. finally, the down probabilities were subtracted from the up probabilities and summed for all drugs resulting in genes ranked by their probable essentiality across all drugs. the top positive (mostly up regulated) and negative (mostly down regulated) genes were taken to represent the consensus genes for these drugs. the rna-seq signatures from the a -ace cells were computed by differential expression analysis with limma on the filtered count matrix. the top up and down regulated genes were taken as the signature. to generate the figure that compares the l differential expression signatures, we obtained all the signatures for the drugs rs- , trifluoperazine, chlorpromazine, amlodipine, berbamine, loperamide, and terfenadine from the l fwd processed data. drugs are tested in different cell lines and batches, resulting in , , , , , , and signatures, respectively. we then calculated a representative signature for each drug by averaging the differential expression per gene across the replicates. we then calculated the representative signature for all , unique drugs contained within the l fwd database. we calculated the pairwise similarity between the representative signatures by pearson correlation. the t-test was applied to compare the pairwise correlation distribution between the candidate drugs and the distribution of all pairwise correlation scores. ipsc-derived organoids: undifferentiated human pluripotent stem cells (hpscs) were maintained in feeder-free conditions. briefly, hpscs were cultured on matrigel-coated -well plates in commercial stemflex™ medium (thermofisher). 
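the consensus-gene ranking described in the computational methods above (count gene occurrences across a drug's up and down sets, z-score the counts, convert them to probabilities with a cdf, subtract down from up, and sum over drugs) can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' code: the text does not say which cdf was used, so a standard normal cdf is assumed; genes missing from a direction are assumed to contribute a probability of 0; and the function names and toy gene sets are hypothetical.

```python
from collections import Counter, defaultdict
from math import erf, sqrt
from statistics import mean, pstdev


def normal_cdf(z):
    """Standard normal CDF (an assumption; the text does not name the distribution)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def membership_probabilities(gene_sets):
    """Count how many gene sets contain each gene, z-score the counts,
    and map each z-score through the CDF."""
    counts = Counter(gene for genes in gene_sets for gene in set(genes))
    if not counts:
        return {}
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All genes occur equally often: every z-score is 0, so the CDF gives 0.5.
        return {gene: 0.5 for gene in counts}
    return {gene: normal_cdf((n - mu) / sigma) for gene, n in counts.items()}


def rank_consensus_genes(signatures_by_drug):
    """signatures_by_drug maps a drug name to a list of (up_genes, down_genes) pairs.
    Returns (gene, score) pairs sorted from most consistently up to most
    consistently down across all drugs."""
    totals = defaultdict(float)
    for signatures in signatures_by_drug.values():
        up = membership_probabilities([u for u, _ in signatures])
        down = membership_probabilities([d for _, d in signatures])
        for gene in set(up) | set(down):
            # Genes absent from one direction contribute 0 there (an assumption).
            totals[gene] += up.get(gene, 0.0) - down.get(gene, 0.0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy input: two signatures for one drug, one for another.
toy_signatures = {
    "drug_a": [({"HMGCR", "LDLR"}, {"MYC"}), ({"HMGCR"}, {"MYC", "JUN"})],
    "drug_b": [({"HMGCR", "INSIG1"}, {"JUN"})],
}
ranked = rank_consensus_genes(toy_signatures)
```

with real data, `signatures_by_drug` would hold the up and down gene sets retrieved for each drug; the head and the tail of the returned list then play the role of the consensus up and down genes.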
the medium was changed every day, and cells were passaged every ∼ days using relesr™ (stemcell technologies). in all normal hpsc cultures, μm rho-associated protein kinase (rock) inhibitor y- was added to the culture media only when passaging or thawing hpscs. for pancreatic endocrine organoid differentiation, we adapted previously published protocols.

a) l fwd analysis projecting the sars-cov- uninfected vs. infected a -ace rna-seq signature onto the l fwd space, where red represents mimicking signatures and blue represents reversing signatures. top mimickers are highlighted in pink and top reversers are highlighted in cyan. projection of reversing drugs: terfenadine, loperamide, berbamine, amlodipine, trifluoperazine, rs- , and chlorpromazine. here signatures are colored by their known mechanisms of action, and the signatures for each drug are highlighted in yellow. b) upset plot visualizing the overlap between consensus top up and down genes for the drugs tested in a -ace cells by rna-seq and from their collective signatures from the l fwd resource. these signatures are compared to the top up and down genes computed from a signature created from sars-cov- uninfected vs. infected a ace cells from gse . significant overlap is observed for the intersection between the consensus up genes from the rna-seq and l drug only sets ( genes, p-value < . e- , fisher exact test), and between the consensus up genes from the rna-seq, l drug only sets, and the top down genes from the a -ace infected cells ( genes, p-value < . e- , empirical sampling fitted to a poisson distribution). these genes are listed above their associated bar.

a ace cells were either mock-infected (mock) and treated with dmso or infected with sars-cov- in the presence of dmso or the indicated drugs for hours at moi . . a) principal component analysis of the rna-seq data samples in pc /pc space. b) sars-cov- reads in the rna-seq data normalized to total reads per million of each sample.
c) top enriched go term after stringdb analysis comparing either drug treated to dmso-treated ace -a s without infection, or drug-treated to dmso-treated a ace after sars-cov- infection. d-i) volcano plots of the infected a ace populations comparing dmso-or drug candidate-treated after sars-cov- infection. significance compared to dmso treated values and determined by unpaired two-tailed student's t-test, p< . =***; n= biological replicates for rna-seq data. human organoids were either mock-infected (mock) and treated with dmso or infected with sars-cov- in the presence of dmso or the indicated drugs for hours at moi . . a) principal component analysis of the rna-seq data samples in pc /pc space. b) sars-cov- reads in the rna-seq data normalized to total reads per million of each sample. c) top enriched go term after stringdb analysis comparing drug-treated to dmso-treated human organoids after sars-cov- infection. d-i) volcano plots of the infected human organoids comparing dmso-or drug candidate-treated after sars-cov- infection. significance compared to dmso treated values and determined by unpaired student's t-test, p< . = **, p< . = ****, n= biological replicates for rna-seq data. figure s : correlation among predicted drugs in the l space and strategy to predict quercetin a) pearson correlation distributions between l l fwd signatures for all possible pairs from the predicted (red) and between random pairs of signatures for other l l fwd drugs (blue). b) alternative strategy to predict small molecules for inhibiting sars-cov- . random walk plot of the ranks of quercetin signatures from the drug perturbation geo library from creeds (right). the ranks are produced by submitting the top most co-expressed genes with the gene ace based on the archs co-expression matrix for human rna-seq data. these genes are submitted to enrichr to rank drugs against the "drug perturbations from geo down" gene set library. 
significance compared to dmso treated values and determined by unpaired two-tailed student's t-test, p< . = *** p< . = ****, n= biological replicates for rt-qpcr and cell viability data. human organoids were either mock-infected (mock) and treated with dmso or infected with sars-cov- in the presence of dmso or the indicated drugs for hours at moi . ; n= biological replicates. a) pca analysis of pancreatic endocrine human organoids in pc /pc space. . supplementary tables - degs from a cells treated with drug candidates supplementary tables - degs from a cells treated with drug candidates and infected with sars-cov- supplementary tables - degs from pancreatic organoids treated with drug candidates and infected with sars-cov- biojupies and enrichr profiles for each drug condition imbalanced host response to sars-cov- drives development of covid- statin therapy in covid- infection randomized placebo-controlled trial of amlodipine in vasospastic angina from reads to genes to pathways: differential expression analysis of rna-seq experiments using rsubread and the edger quasi-likelihood pipeline structural basis for m g recognition and '-o-methyl discrimination in capped rnas by the innate immune receptor rig-i suppression of human lung cancer cell growth and migration by berbamine l cds : lincs l characteristic direction signatures search engine ten strategies of interferon evasion by viruses a sars-cov- protein interaction map reveals targets for drug repurposing isolation of the alkaloids, berberine and berbamine, from mahonia swaseyi long-acting hiv drugs for treatment and prevention sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor identification of antiviral drug candidates against sars-cov- from fda-approved drugs berbamine exerts antiinflammatory effects via inhibition of nf-κb and mapk signaling pathways the library of integrated network-based cellular signatures nih program: system-level cataloging of human 
cells response to perturbations influence of chlorpromazine on the replication of influenza virus in chick embryo cells the covid- gene and drug set library massive mining of publicly available rna-seq data from human and mouse hepatic cytochrome p- reductase-null mice show reduced transcriptional response to quercetin and reveal physiological homeostasis between jejunum and liver infection of b lymphocytes by a human herpesvirus, epstein-barr virus, is blocked by calmodulin antagonists morphological and biochemical characterization of the membranous hepatitis c virus replication compartment inhibition of the replication of sars-cov- in human cells by the fda-approved drug chlorpromazine. biorxiv herg, a primary human ventricular target of the nonsedating antihistamine terfenadine statin drug therapy may increase covid- infection a unifying structural and functional model of the coronavirus replication organelle: tracking down rna synthesis nifedipine and amlodipine are associated with improved mortality and decreased risk for intubation and mechanical ventilation in elderly patients hospitalized for covid- essential role of ccr in neutrophil tissue infiltration and multiple organ dysfunction in sepsis hepatitis c virus replication depends on endosomal cholesterol homeostasis a next generation connectivity map: l platform and the first , , profiles biojupies: automated generation of interactive notebooks for rna-seq data analysis in the cloud l fwd: fireworks visualization of drug-induced transcriptomic signatures extraction and analysis of signatures from the gene expression omnibus by the crowd a human pluripotent stem cell-based platform to study sars-cov- tropism and model virus infection in human cells and organoids calcium channel blocker amlodipine besylate is associated with reduced case fatality rate of covid- patients with hypertension sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell 
subsets across tissues

key: cord- - dfaqsv authors: moore, anne c.; dora, emery g.; peinovich, nadine; tucker, kiersten p.; lin, karen; cortese, mario; tucker, sean n. title: pre-clinical studies of a recombinant adenoviral mucosal vaccine to prevent sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: dfaqsv

there is an urgent need to develop efficacious vaccines against sars-cov- that also address the issues of deployment, equitable access, and vaccine acceptance. ideally, the vaccine would prevent virus infection and transmission as well as preventing covid- disease. we previously developed an oral adenovirus-based vaccine technology that induces both mucosal and systemic immunity in humans. here we investigate the immunogenicity of a range of candidate adenovirus-based vaccines, expressing full or partial sequences of the spike and nucleocapsid proteins, in mice. we demonstrate that, compared to expression of the s domain or a stabilized spike antigen, the full-length, wild-type spike antigen induces significantly higher neutralizing antibodies in the periphery and in the lungs when the vaccine is administered mucosally. antigen-specific cd + and cd + t cells were induced by this leading vaccine candidate at low and high doses. this full-length spike antigen plus nucleocapsid adenovirus construct has been prioritized for further clinical development.

the emergence of a novel coronavirus, severe acute respiratory syndrome coronavirus (sars-cov- ), the causative agent of covid- disease, in , has led to a global pandemic and significant morbidity, mortality and socio-economic disruption not seen in a century. coronavirus disease (covid- ) is a respiratory illness of variable severity, ranging from asymptomatic infection, to mild infection with fever and cough, to severe pneumonia and acute respiratory distress.
current reports suggest that asymptomatic spread is substantial , and sars-cov- infection induces a transient antibody response in most individuals . therefore, development of successful interventions is an immediate requirement to protect the global population against infection and transmission of this virus and its associated clinical and societal consequences. mass immunization with efficacious vaccines has been highly successful to prevent the spread of many other infectious diseases and can also prevent disease in the vulnerable through the induction of herd immunity. significant effort and resources are being invested in urgently identifying efficacious sars-cov- vaccines. a number of different vaccine platforms have demonstrated pre-clinical immunogenicity and efficacy against pneumonia , . several vaccines have demonstrated phase i or phase ii safety and immunogenicity [ ] [ ] [ ] . however, at this time, no vaccine has demonstrated efficacy in the field. the most advanced sars-cov- vaccine candidates are all given by the intramuscular (im) route, with some requiring - °c storage. this is a major barrier for vaccine dissemination and deployment during a pandemic in which people are asked to practice social distancing and avoid congregation. the ultimate goal of any vaccine campaign is to protect against disease by providing enough herd immunity to inhibit viral spread, not to make a set number of doses of vaccine. an injected solution takes a long period of time to administer and distribute and requires costly logistics, which means dose availability does not immediately translate to immunity. further, systemic immunization can induce immunity in the periphery and lower respiratory tract. however, these vaccines cannot induce mucosal immunity in the upper respiratory tract. 
mucosal iga (with its polymeric structure and the addition of the secretory component) creates more potent viral neutralization, can block viral transmission, and in general is more likely to create sterilizing immunity, given that it is the first line of defense against a respiratory pathogen. mucosal vaccines can induce mucosal immune responses, antibodies and t cells at wet surfaces. we are developing oral vaccines for multiple indications, including influenza and noroviruses, delivered in a tablet form for people. our vaccine platform is a replication-defective adenovirus type- vectored vaccine that expresses antigen along with a novel toll-like receptor agonist as an adjuvant. these vaccines have been well tolerated and are able to generate robust humoral and cellular immune responses to the expressed antigens. protective efficacy in humans was demonstrated against a respiratory virus days or more post vaccination, as shown in a well characterized experimental influenza infection model. furthermore, the vaccine has the advantages of room temperature stability and needle-free, easy administration, which favor it over injected vaccine approaches with respect to vaccine deployment and access. here, we describe the pre-clinical development of a sars-cov- vaccine based on vaxart's oral adenovirus platform. the key approach was to develop several vaccine candidates in parallel, in order to create premanufacturing seeds while initial immunogenicity experiments were in progress. given that the vaccines were made during the pandemic, rapid decisions were required to keep the manufacturing and regulatory timelines from slipping. we assessed the relative immunogenicity of four candidate vaccines that expressed antigens based on the spike (s) and nucleocapsid (n) sars-cov- proteins.
these proteins have been well characterized as antigens for related coronaviruses, such as sars-cov and mers (reviewed in yong, et al., ) and, increasingly, for sars-cov- spike. the aim of our vaccine is to induce immunogenicity on three levels; firstly, to induce potent serum neutralizing antibodies to s, secondly to induce mucosal immune responses, and thirdly to induce t cell responses to both vaccine antigens. this three-fold approach aims to induce robust and broad immunity capable of protecting the individual from virus infection as well as disease, promote rapid dissemination of vaccine during a pandemic, and to protect the population from virus transmission through herd immunity. here, we report the induction of neutralizing antibody (nab), igg and iga antibody responses, and t cell responses in mice following immunization of rad vectors expressing one or more sars-cov- antigens. initially, three different rad vectors were constructed to express different sars-cov- antigens. these were a vector expressing the full-length s protein (rad-s), a vector expressing the s protein and the n protein (rad-s-n), and a vector expressing a fusion protein of the s domain with the n protein (rad-s -n). the n protein of rad-s-n was expressed under control of the human beta actin promoter, which is much more potent in human cells than mouse cells. an additional construct where the expressed s protein was fixed in a prefusion conformation (rad-s(fixed)-n) was constructed at a later date as a control for exploring neutralizing antibody responses. these are described in figure . expression of the various transgenes was confirmed following infection of cells using flow cytometry and monoclonal antibodies to the s or n protein (supplemental figure ) . 
the primary objective of the initial mouse immunogenicity studies was to determine which of the rad vectors induced significant antibody responses to s, and to obtain those results rapidly enough to provide a gmp seed in time for manufacturing. we and others have observed that transgene expression by vaccine vectors orally administered to mice can be suppressed in their intestinal environment, so immunogenicity was assessed following intranasal (i.n.) immunization. animals were immunized i.n. and the antibody titers were measured over time by igg elisa. all three rad vectors induced nearly equivalent anti-s igg titers at weeks and , and the igg titer in all animals was significantly boosted by the second immunization (p < . , mann-whitney tests) (fig. a). however, the vector expressing full-length s (rad-s-n) induced higher neutralizing titers compared to the vector expressing only s (fig b). this was measured by two different neutralizing assays, one based on sars-cov- infection of vero cells (cvnt) and one based on a surrogate neutralizing assay (svnt). furthermore, rad-s-n induced higher lung iga responses to s and, unsurprisingly, to s (fig. c) compared to rad-s -n two weeks after the final immunization. notably, neutralizing titers in the lung were also significantly higher when rad-s-n was used compared to the s -containing vaccine (rad-s -n) (fig. d). this demonstrated that the rad-s-n candidate induced greater functional responses (nab and iga) compared to the vaccine containing just the s domain. because the n protein is much more highly conserved than the s protein, and is a target of long-term t cell responses induced by infection, the vector rad-s-n was chosen for gmp manufacturing.

three dose levels of rad-s-n were then tested to understand the dose responsiveness of this vaccine. the antibody responses to both s (fig. a) and s (fig. b) were measured.
similar responses were seen at all three dose levels at all timepoints. responses to s and s were significantly increased at week compared to earlier times, in all groups. the induction of s-specific t cells by rad-s-n at different doses was then assessed. induction of antigen-specific cd + and cd + t cells that produced effector cytokines such as ifn-g, tnf-a and il- was observed two weeks after immunizations (fig a). notably, little il- was induced by this vaccine and only in cd + t cells, providing a level of assurance that the risk for vaccine-dependent enhancement of disease was very low. furthermore, immunization with rad-s-n induced double and triple positive, multi-functional ifn-g, tnf-a and il- cd + t cells (fig. b). a second dose response experiment was performed to focus on t cell responses to the s protein, weeks after the final immunization (week of the study). splenocytes were stimulated overnight with a peptide library to the s protein, divided into two separate peptide pools. t cell responses in the two pools were summed and plotted (fig. c). animals administered the e iu and the e iu dose levels had significantly higher t cell responses compared to the untreated animals but produced a similar number of ifn-g secreting cells to each other, demonstrating a dose plateau at the e iu dose. notably, this t cell analysis was conducted weeks after the second immunization, potentially after the peak of t cell responses.

balb/c mice were immunized i.n. on days and with x iu, x iu or . x iu of rad co-expressing full-length s and n (rad-s-n). the amount of igg specific for s (a) and s in serum diluted / , was evaluated using a mesoscale binding assay. points represent the mean and lines represent the standard deviation.

rad-expressed wild-type s induces a superior neutralizing response compared to stabilized/pre-fusion s.
an additional study was performed to compare rad-s-n to a vaccine candidate with the s protein stabilized and with the transmembrane region removed (rad-s(fixed)-n). a stabilized version of the s protein has been proposed as a way to improve neutralizing antibody responses and produce fewer non-neutralizing antibodies. the s protein was stabilized through modifications as described by amanat et al., . rad-s-n induced higher serum igg titers to s (fig. a) at both timepoints tested, although the difference was not statistically significant at week by the mann-whitney test (p = . ). however, rad-s-n induced significantly higher neutralizing antibody responses (fig. b) than the stabilized version (p = . ). these results suggest that a wild-type version of the s protein is superior for a rad-based vaccine in mice.

the endgame to the covid- pandemic requires the identification and manufacture of a safe and effective vaccine and a subsequent global immunization campaign. a number of vaccine candidates have accelerated to phase iii global efficacy testing and, if sufficiently successful in these trials, may form the first generation of an immunization campaign. however, all of these advanced candidates are s-based vaccines that are injected. such an approach is unlikely to prevent virus transmission, but should prevent pneumonia and virus growth and damage in the lower respiratory tract and periphery, as evidenced in macaque challenge studies. one key constraint in a global covid- immunization campaign will be the cold chain distribution logistics and the bottleneck of requiring suitably trained health care workers (hcws) to inject the vaccine. current logistics costs, including cold chain and training, can double the cost of fully immunizing an individual in a low-middle income country (lmic). implementing a mass immunization campaign, requiring trained hcws for injection-based vaccines, will have a significant impact on healthcare resources in all countries.
the need for cold chain, biohazardous sharps waste disposal and training will result in increased cost, inequitable vaccine access, delayed vaccine uptake and prolongation of this pandemic. these costs will be magnified if vaccines are unable to provide long-term protection (natural immunity to other beta-coronaviruses is short-lived ), and annual injection-based campaigns are needed. vaxart's oral tablet vaccine platform provides a solution to these immunological as well as logistic, economic, access and acceptability problems. in this study we demonstrate, in an animal pre-clinical model, the immunogenicity of a sars-cov- vaccine using vaxart's vaccine platform; namely the induction of serum and mucosal neutralising antibodies and poly-functional t cells. mouse studies were designed to test immunogenicity of candidate vaccines rapidly in the spring of , before moving onto manufacturing and clinical studies critical to addressing the pandemic. vaxart's oral tablet vaccine platform has previously proven to be able to create reliable mucosal (respiratory and intestinal), t cell, and antibody responses against several different pathogens in humans , , , . we know from our prior human influenza virus challenge study that oral immunization was able to induce protective efficacy days post immunization; on par with the commercial quadrivalent inactivated vaccine . these features provide confidence that the adoption of the platform to covid- could translate to efficacy against this pathogenic coronavirus and could provide durable protection against virus infection. finally, a tablet vaccine campaign is much easier because qualified medical support is not needed to administer it. this ease of administration will result in increased vaccine access and potentially, acceptability, as has been evidenced by the success of easy-to-administer, oral polio vaccine, in the elimination of polio virus . 
these features could be even more important during sars-cov- immunization campaigns compared to other vaccines, as substantially more resources may be required to ensure uptake of this vaccine, given the global levels of covid- denialism, mistrust and increased vaccine hesitancy , . the tablet vaccine does not need refrigerators or freezers, does not require needles or vials, and can potentially be shipped via standard mail or by a delivery drone. these attributes significantly enhance deployment and distribution logistics, even permitting access to isolated regions with fewer technical resources. finally, from an immunological perspective, oral administration of this adenovirus is not compromised by pre-existing immunity to adenoviruses nor creates substantial anti-vector immunity , ; issues that have been shown to cause significantly decreased vaccine potency in an rad based sars-cov vaccine and can prevent durable increased immunity when the same adenovirus platform is re-administered by the im route . the choice of antigen can be difficult during a novel pandemic, a time in which key decisions are needed quickly. the s protein is believed to be the major neutralizing antibody target for coronavirus vaccines, as the protein is responsible for receptor binding, membrane fusion, and tissue tropism. when comparing sars-cov- wu- to sars-cov, the s protein was found to have . % identity . both sars-cov and sars-cov- are believed to use the same receptor for cell entry: the angiotensin-converting enzyme receptor (ace ), which is expressed on some human cell types . thus, sars-cov- s protein is being used as the leading target antigen in vaccine development so far and is an ideal target given that it functions as the key mechanism for viral binding to target cells. however, the overall reliance on the s protein and an igg serum response in the vaccines could eventually lead to viral escape. 
for influenza, small changes in the hemagglutinin binding protein, including a single glycosylation site, can greatly affect the ability of injected vaccines to protect. sars-cov- appears to be more stable than most rna viruses, but s protein mutations have already been observed without the selective pressure of a widely distributed vaccine. once vaccine pressure begins, escape mutations might emerge. we took two approaches to address this issue: firstly, to include the more conserved n protein in the vaccine, and secondly, to induce broader immune responses, namely through mucosal iga. high expression levels of ace are present in type ii alveolar cells of the lungs, absorptive enterocytes of the ileum and colon, and possibly even in oral tissues such as the tongue. transmission of the virus is believed to occur primarily through respiratory droplets and fomites between unprotected individuals in close contact, although there is some evidence of transmission via the fecal-oral route, as seen with both sars-cov and mers-cov, where coronaviruses can be shed in the feces of infected humans. there is also evidence that a subset of individuals who have gastrointestinal symptoms, rather than respiratory symptoms, are more likely to shed virus for longer. driving mucosal immune responses to s in both the respiratory and the intestinal tract may provide broader immunity and a greater ability to block transmission than targeting one mucosal site alone. blocking transmission, rather than just disease, will be essential to reducing infection rates and eventually eradicating sars-cov- . we have previously demonstrated that an oral, tableted rad-based vaccine can induce protection against respiratory infection and shedding following influenza virus challenge as well as intestinal immunity to norovirus antigens in humans.
furthermore, mucosal iga is more likely to be able to address any heterogeneity of the s proteins in circulating viruses than a monomeric igg response. mucosal iga has also been found to be more cross-reactive than igg for other respiratory pathogens. iga may also be a more neutralizing isotype than igg in covid- infection, and in fact neutralizing iga dominates the early immune response. notably, we also saw a higher ratio of neutralizing to non-neutralizing antibodies in lung than in serum in our mouse study, which supports the concept that iga may be more potent than igg. polymeric iga, through multiple binding interactions with the antigen and with fc receptors, can turn a weak single interaction into a higher overall affinity binding and activation signal, creating more cross-protection against heterologous viruses.

our second strategy to mitigate this potential vaccine-driven escape problem was to include the n protein in the vaccine construct. the n protein is highly conserved among beta-coronaviruses (greater than % identical) and contains several immunodominant t cell epitopes, and long-term memory to n can be found in sars-cov recovered subjects as well as in people with no known exposure to sars-cov or sars-cov- . in an infection setting, t cell responses to the n protein seem to correlate with increased neutralizing antibody responses. all of these reasons led us to add n to our vaccine approach. the protein was expressed in a cells (supp fig ). however, as the human beta actin promoter is more active in human cells than in mouse cells, we did not explore immune responses to n in balb/c mice, but will examine them more carefully in future nhp and human studies.

the optimum sequence and structure of the s protein to be included in a sars-cov- vaccine is a subject of debate.
several labs have suggested that reducing the s protein to the key neutralizing domains within the receptor binding domain (rbd) would promote higher neutralizing antibody responses and fewer non-neutralizing antibodies. we made a vaccine candidate composed of the s domain, which includes the rbd, to test this approach. although the s based vaccine produced similar igg binding titers to s , neutralizing antibody responses were significantly lower compared to the full-length s antigen. other gene-based vaccines have also shown that the reductionist approach to s is less effective, demonstrating that the dna vaccine expressing the full-length s protein produced higher neutralizing antibodies than shorter s segments. in agreement with these macaque studies, we observed that the sequence of the adenovirus-encoded antigen had a significant impact on antibody function, here with respect to neutralization. while reducing the potential for exposing non-neutralizing antibody epitopes seems reasonable in theory, this might reduce the t cell help that allows greater neutralizing antibody responses to develop. indeed, of the spike protein t cell responses, which make up % of the responses to sars-cov- , only % map to the receptor binding domain. stabilizing the s protein might be important for a protein vaccine, but not necessarily for a gene-based vaccine. the former is produced in vitro and must retain a homogeneous, defined structure, ready for injection. in contrast, the latter is expressed on the surface of a cell, in vivo, as in natural infection, substantially in a prefusion form, and the additional stabilization may be unnecessary for b cells to create antibodies against the key neutralizing epitopes. we directly compared a stabilized version of s to the wild-type version. the wild-type version was significantly better at inducing neutralizing antibody responses.
of interest, this was also observed in a dna vaccine study in nhps, where the stabilized version appeared to induce lower neutralizing antibody (nab) titers than the wild-type s. a slightly different result was observed in studies of rad vectors in nhps by mercado et al., where expressing a stabilized version of the s protein appeared to improve nab responses but lower t cell responses. in summary, stabilization does not universally improve the immune responses to gene-based or vector-based vaccines. multiple vaccine candidates are in, or are about to begin, clinical testing. because of their known safety and immunogenicity against epidemic pathogens such as ebola virus, two leading candidate vaccines are based on recombinant adenovirus vectors: university of oxford's chadox -ncov and janssen pharmaceutical's advac platforms. we saw stronger serum igg and nab titers in our study than were reported for chadox -ncov in balb/c mice; however, this might reflect differences in assay components. a rad vaccine study was performed by hassan et al., in which doses of e vp were given by intranasal delivery. the results were significant from the standpoint of blocking lung infection in a mouse sars-cov- challenge model. they reported serum antibody titers of e above background, similar to our results, despite using viral doses -to -log fold higher than those in our study. indeed, in our study, equivalently strong t cell and antibody responses were observed using e iu and e iu by the intranasal route. using these doses, we observed high percentages of cd + t cells (up to %) secreting ifn-g and tnf-a, and strong cd + t cell responses, after peptide restimulation. although we did not evaluate the trafficking properties of these antigen-specific t cells, we know that oral administration of this ad-based vaccine in humans induces high levels of mucosal homing lymphocytes.
a proportion of the antigenspecific cd + and cd + t cells were polyfunctional in this mouse study. vaccine-induced t cells possessing multiple functions may provide more effective elimination of virus subsequent to infection and therefore could be involved in the prevention of disease. however, it is uncertain at this time what is the optimum t cell phenotype required for protection against disease. in summary, these studies in mice represent our first step in creating a vaccine candidate, demonstrating the immunogenicity of the construct at even low vaccine doses and the elucidation of the full-length spike protein as a leading candidate antigen to induce t cell responses and superior systemic and mucosal neutralizing antibody. future work will focus on the immune responses in humans. for this study, four recombinant adenoviral vaccine constructs were created based on the published dna sequence of sars-cov- publicly available as genbank accession no. mn . . specifically, the published amino acid sequences of the sars-cov- spike protein (s protein) and the sars-cov- nucleocapsid protein (n protein) were used to synthesize nucleic acid sequences codon optimized for expression in homo sapiens cells (blue heron biotechnology, bothell, wa). these sequences were used to create recombinant plasmids containing transgenes cloned into the e region of adenovirus type (rad ), as described by he, et al , using the same vector backbone used in prior clinical trials for oral rad tablets , . as shown in fig , c. rad-s -n: rad vector using a fusion sequence combining the s region of sars-cov- s gene (including the native furin site between s and s ) with the full-length sars-cov- n gene. d. rad-s(fixed)-n: rad vector containing a stabilized s gene with the transmembrane region removed under the control of the cmv promoter and full-length sars-cov- n gene under control of the human beta-actin promoter. 
the s gene is stabilized through the following modifications: a) arginine residues at aa positions , , were deleted to remove the native furin cleavage site b) two stabilizing mutations were introduced: k p and v p c) transmembrane region was removed following p and replaced with bacteriophage t fibritin trimerization foldon domain sequence (gyipeaprdgqayvrkdgewvllstfl) all vaccines were grown in the expi f suspension cell-line (thermo fisher scientific), purified by cscl density centrifugation and provided in a liquid form for animal experiments. studies were approved for ethics by the animal care and use committees (iacuc). all of the procedures were carried out in accordance with local, state and federal guidelines and regulations. female - week old balb/c mice were purchased from jackson labs (bar harbor, me). because mice do not swallow pills, liquid formulations were instilled intranasally in µl per nostril, µl per mouse in order to test immunogenicity of the various constructs. serum was acquired by cheek puncture at various timepoints. elisas. specific antibody titers to proteins were measured similarly to methods described previously . briefly, microtiter plates (maxisorp: nunc) were coated in carbonate buffer ( . m at ph . ) with . ug/ml s protein (genscript). the plates were incubated overnight at °c in a humidified chamber and then blocked in pbs plus . % tween (pbst) plus % bsa solution for h before washing. plasma samples were serially diluted in pbst. after a -h incubation, the plates were washed with pbst at least times. antibodies were then added as a mixture of anti-mouse igg -horseradish peroxidase (hrp) and anti-mouse igg a-hrp (bethyl laboratories, montgomery, tx). each secondary antibody was used at a : , dilution. the plates were washed at least times after a -h incubation. antigen-specific mouse antibodies were detected with , =, , =-tetramethylbenzidine (tmb) substrate (rockland, gilbertsville, pa) and h so was used as a stop solution. 
the plates were read at nm on a spectra max m microplate reader. average antibody titers were reported as the reciprocal dilution giving an absorbance value greater than the average background plus standard deviations, unless otherwise stated. to measure responses to both s and s simultaneously, a multi-spot® -well, -spot plate (mesoscale devices; msd) was coated with sars cov- antigens. proteins were commercially acquired from a source (native antigen company) that produced them in mammalian cells ( cells). these were biotinylated and adhered to their respective spots by their individual u-plex linkers. to measure igg antibodies, plates were blocked with msd blocker b for hour with shaking, then washed three times prior to the addition of samples, diluted : . after incubation for hours with shaking, the plates were washed three times. the plates were then incubated for hour with the detection antibody at μg/ml (msd sulfo-tag tm anti-mouse igg). after washing times, the read buffer was added and the plates were read on the meso quickplex sq . neutralizing antibodies were routinely detected based on the sars-cov- surrogate virus neutralization test (svnt) kit (genscript). this elisa-based kit detects antibodies that hinder the interaction between the receptor binding domain (rbd) of the sars-cov- spike glycoprotein and the ace receptor on host cells, and is highly correlated to conventional virus neutralizing titers for sars-cov- infection of vero cells . the advantage of this approach is that the assay can be done in a bsl- laboratory. sera from mice immunized with the candidate vaccines was diluted at : , : , : , : , : and : using the provided sample dilution buffer. sera from non-immunized mice was diluted : . lung samples were diluted : , : , and : . positive and negative controls were prepared at a : volume ratio following the provided protocol. 
after dilution, sera or lung samples were individually incubated at a : ratio with hrp-rbd solution for minutes at °c. following incubation, µl of each hrp-rbd and sample or control mixture was added to the corresponding wells in the hace -precoated capture plate and once again incubated at °c for minutes. afterwards, wells were thoroughly washed and µl of the provided tmb (3,3',5,5'-tetramethylbenzidine) solution was added to each well and left to incubate for minutes at room temperature ( - °c). lastly, µl of stop solution was added to each well, and the plate was read on a spectra max m microplate reader at nm. the absorbance of a given sample is inversely related to the titer of anti-sars-cov- rbd neutralizing antibody in that sample. per the test kit protocol, a cut-off of % inhibition, comparing the od of the sample with the od of the negative control, was used to call a sample positive for neutralizing antibodies. samples that were negative at the lowest dilution were set equal to ½ of the lowest dilution tested, either for sera or . for lung samples. additional neutralizing antibody responses were measured in some studies using a cvnt assay at vismederi under bsl conditions. the cvnt assay uses cytopathic effect (cpe) as a readout to detect specific neutralizing antibodies against live sars-cov- in animal or human samples. the cvnt/cpe assay permits the virus to make multiple cycles of infection and release from cells; its exponential growth over a few days (usually hours of incubation) causes partial or complete detachment of the cell monolayer from the surface of the support, clearly identifiable as cpe. serum samples are heat inactivated for min at °c; two-fold dilutions, starting from : , are performed and then mixed with an equal volume of viral solution containing tcid of sars-cov- . the serum-virus mixture is incubated for hour at °c in a humidified atmosphere with % co .
after incubation, µl of the mixture at each dilution is added in duplicate to a cell plate containing a semiconfluent vero e monolayer. after hours of incubation the plates are inspected with an inverted optical microscope. the highest serum dilution that protects more than % of cells from cpe is taken as the neutralization titer. two weeks after the final immunization (day of the study), mice were sacrificed and bled via cardiac puncture. lungs were removed and snap frozen at - °c. on thawing, lungs were weighed. lungs were homogenized in µl dpbs using pellet pestles (sigma z ). homogenates were centrifuged at rpm for minutes and supernatants were frozen. the total protein content in lung homogenate was evaluated using a bradford assay to ensure equivalent amounts of tissue in all samples before evaluation of iga content. antigen-specific iga titers in lungs were detected using a mouse iga elisa kit (mabtech) and pnpp substrate (mabtech). briefly, maxisorp plates (nunc) were coated with s or s (the native antigen company; ng/well) in pbs for overnight adsorption at °c and then blocked in pbs plus . % tween (pbst) plus . % bsa (pbs/t/b) solution for h before washing. lung homogenates were serially diluted in pbs/t/b, starting at a : dilution. after hours of incubation and washing, bound iga was detected using mt a-alp conjugated antibody ( : ), according to the manufacturer's protocol. plates were read at nm. endpoint titers were taken as the x-axis intercept of the dilution curve at an absorbance value x standard deviations greater than the absorbance for naïve mouse serum. non-responding animals were assigned a titer of or ½ the value of the lowest dilution tested. spleens were removed and placed in ml hanks balanced salt solution (with m hepes and % fbs) before being pushed through a sterile strainer with a ml syringe. after rbc lysis (ebiosolutions), resuspension, and counting, the cells were ready for analysis.
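the surrogate neutralization readout described above compares the od of each sample with the od of the negative control and imputes half of the lowest dilution for negative samples. a minimal sketch of that arithmetic follows. it assumes that percent inhibition is computed as (1 − od sample / od negative control) × 100, and that the titer is the highest reciprocal dilution still at or above the cut-off; both are assumptions for illustration. the function names are hypothetical, and the cut-off is passed in as a parameter because its value is not legible in the text here.

```python
def percent_inhibition(od_sample, od_negative_control):
    """Percent signal inhibition of a sample relative to the negative control.

    Assumed formula: (1 - OD_sample / OD_negative_control) * 100.
    """
    if od_negative_control <= 0:
        raise ValueError("negative-control OD must be positive")
    return (1.0 - od_sample / od_negative_control) * 100.0


def neutralizing_titer(od_by_dilution, od_negative_control, cutoff_percent):
    """Titer from a dilution series (hypothetical helper).

    od_by_dilution maps reciprocal dilution -> sample OD.
    Returns the highest reciprocal dilution whose inhibition is at or above
    the cut-off (an assumption about how the titer is read out). If the
    sample is negative even at the lowest dilution, returns half of the
    lowest dilution tested, as described in the text.
    """
    dilutions = sorted(od_by_dilution)
    positive = [
        d for d in dilutions
        if percent_inhibition(od_by_dilution[d], od_negative_control) >= cutoff_percent
    ]
    if not positive:
        return dilutions[0] / 2
    return max(positive)
```

passing the cut-off explicitly keeps the sketch independent of any particular kit version; the dilution series and od values would come from the plate reader export.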
cells were cultured at e cells/well with two peptide pools representing the full-length s protein at µg/ml (genscript) overnight in order to stimulate the cells. the culture media consisted of rpmi media (lonza) with . m hepes, x l-glutamine , x mem basic amino acids, x penstrep, % fbs, and . e- mole/l beta-mercaptoethanol. antigen specific ifn-g elispots were measured using a mabtech kit. flow cytometric analysis was performed using an attune flow cytometer and flow jo version . . , after staining with the appropriate antibodies. for flow cytometry, e splenocytes per well were incubated for hours at ºc with peptide pools representing full-length s at either or ug/ml, adding brefeldin a (thermofisher) for the last hours of incubation. the antibodies used were apc-h conjugated cd , fitc conjugated cd , bv conjugated cd , percp-cy . conjugated ifn-y, bv conjugated il- , pe-cy conjugated tnfa, apc conjugated il- , alexa fluor conjugated cd , and pe conjugated cd l (bd biosciences). clinical features of patients infected with novel coronavirus in wuhan, china transmission and clinical characteristics of asymptomatic patients with sars-cov- infection clinical and immunological assessment of asymptomatic sars-cov- infections chadox ncov- vaccine prevents sars-cov- pneumonia in rhesus macaques dna vaccine protection against sars-cov- in rhesus macaques safety and immunogenicity of the chadox ncov- vaccine against sars-cov- : a preliminary report of a phase / , single-blind, randomised controlled trial immunogenicity and safety of a recombinant adenovirus type- -vectored covid- vaccine in healthy adults aged years or older: a randomised, double-blind, placebo-controlled, phase trial an mrna vaccine against sars-cov- -preliminary report in vitro comparison of the biologic activities of monoclonal monomeric iga, polymeric iga, and secretory iga blocking interhost transmission of influenza virus by vaccination in the guinea pig model safety and immunogenicity of an oral 
tablet norovirus vaccine, a phase i randomized, placebo-controlled trial high titre neutralising antibodies to influenza after oral tablet immunisation: a phase , randomised, placebo-controlled trial. the lancet infectious diseases oral administration of an adenovirus vector encoding both an avian influenza a hemagglutinin and a tlr ligand induces antigen specific granzyme b and ifn-gamma t cell responses in humans efficacy and immune correlates of protection induced by an oral influenza vaccine evaluated in a phase , placebo-controlled human experimental infection study recent advances in the vaccine development against middle east respiratory syndrome-coronavirus firewalls prevent systemic dissemination of vectors derived from human adenovirus type and suppress production of transgene-encoded antigen in a murine model of oral vaccination long-lived memory t lymphocyte responses against sars coronavirus nucleocapsid protein in sars-recovered patients a serological assay to detect sars-cov- seroconversion in humans projections of costs, financing, and additional resource requirements for low-and lower middle-income country immunization programs over the decade projecting the transmission dynamics of sars-cov- through the postpandemic period systemic and mucosal immune responses following oral adenoviral delivery of influenza vaccine to the human intestine by radio controlled capsule high titer neutralizing antibodies to influenza following oral tablet immunization: a randomized, placebocontrolled trial the lancet infectious diseases systematic review of mucosal immunity induced by oral and inactivated poliovirus vaccines against virus shedding following oral poliovirus challenge disinformation, misinformation and inequality-driven mistrust in the time of covid- : lessons unlearned from aids denialism mistrust in biomedical research and vaccine hesitancy: the forefront challenge in the battle against covid- in italy safety, tolerability, and immunogenicity of a 
recombinant adenovirus type- vectored covid- vaccine: a dose-escalation, open-label, nonrandomised, first-in-human trial first-in-human evaluation of the safety and immunogenicity of a recombinant adenovirus serotype hiv- env vaccine (ipcavd ) structural, glycosylation and antigenic variation between novel coronavirus ( -ncov) and sars coronavirus (sars-cov) receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus contemporary h n influenza viruses have a glycosylation site that alters binding of antibodies elicited by egg-adapted vaccine strains high expression of ace receptor of -ncov on the epithelial cells of oral mucosa report of the who-china joint mission on coronavirus disease (covid- ) enteric involvement of coronaviruses: is faecal-oral transmission of sars-cov- possible? digestive symptoms in covid- patients with mild disease severity: clinical presentation, stool viral rna testing, and outcomes comparison of antiviral activity between iga and igg specific to influenza virus hemagglutinin: increased potential of iga for heterosubtypic immunity iga dominates the early neutralizing antibody response to sars-cov- cross-protection against influenza a virus infection by passively transferred respiratory tract iga antibodies to different hemagglutinin molecules sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls detection of sars-cov- -specific humoral and cellular immunity in covid- convalescent individuals a novel receptor-binding domain (rbd)-based mrna vaccine against sars-cov- development of a covid- vaccine based on the receptor binding domain displayed on virus-like particles selective and cross-reactive sars-cov- t cell epitopes in unexposed humans single-shot ad vaccine protects against sars-cov- in rhesus macaques safety and immunogenicity of a -dose heterologous vaccine regimen with ad .zebov and mva-bn-filo ebola vaccines: -month data from 
a phase randomized clinical trial in safety and immunogenicity of a -dose heterologous vaccination regimen with ad .zebov and mva-bn-filo ebola vaccines: -month data from a phase randomized clinical trial in uganda and tanzania clinical assessment of a novel recombinant simian adenovirus chadox as a vectored vaccine expressing conserved influenza a antigens a single intranasal dose of chimpanzee adenovirus-vectored vaccine confers sterilizing immunity against sars-cov- infection a simplified system for generating recombinant adenoviruses very fast folding and association of a trimerization domain from bacteriophage t fibritin systemic and mucosal antibody responses following retroductal gene transfer to the salivary gland a sars-cov- surrogate virus neutralization test based on antibody-mediated blockage of ace -spike protein-protein interaction the authors would like to thank alessandro torelli, laura palladino, emanuele montomoli and the team at vismederi for running neutralizing antibody titers (bsl ) and susan johnson for critically reviewing the manuscript. authorship acm, mc, and snt designed the experiments; acm and snt wrote the manuscript; np, kpt, and kl performed immunological assays; acm, np, kpt, kl, mc, and snt analyzed the data. egd and snt designed the vaccine vectors. egd, nd, kpt, klm mc and snt are current employees and/or own stock options in vaxart, the sponsor of the studies. egd and snt are named as inventors covering a sars-cov- (ncov- ) vaccine. snt is named as an inventor on patent covering the vaccine platform. acm declares no competing interest. key: cord- -bz vab e authors: ouadfeul, sid-ali title: multifractal analysis of sars-cov- coronavirus genomes using the wavelet transform date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: bz vab e in this paper, the d wavelet transform modulus maxima lines (wtmm) method is used to investigate the long-range correlation (lrc) and to estimate the so-called hurst exponent of isolate rna sequences, downloaded from the ncbi database, of patients infected by the sars-cov- coronavirus. the knucleotidic, purine, pyrimidine, amino, keto and gc dna codings are used. the results show lrc character in most sequences, except for some sequences in which anti-correlated or classical brownian motion character is observed. the results also show that the sars-cov coronavirus undergoes mutation from one country to another or within the same country. severe acute respiratory syndrome sars-cov- is a member of the coronaviridae family and causes an illness called covid- , which can spread from person to person (conway, ) . it has many symptoms such as fever, headache, and tiredness. it causes respiratory difficulties that can lead to death, especially in people with chronic health conditions such as diabetes, arterial hypertension, and heart and lung disease. until now, there is no proven antiviral or vaccine for the sars-cov- virus. the fractal character of the distribution of nucleic acids in dna sequences has been widely studied by the scientific community, and many papers have been published in the literature. arneodo et al ( ) studied the long-range correlation (lrc) character of dna sequences using the d continuous wavelet transform method. zu-guo et al ( ) introduced a time series model from a statistical point of view and a visual representation from a geometrical point of view for dna sequence analysis; they also used the fractal dimension, the correlation dimension, the hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field.
cattani ( ) addressed the digital complex representation of dna sequences and the analysis of existing correlations by wavelets. the symbolic dna sequence is mapped into a nonlinear time series, and the study of this time series shows the existence of fractal shapes and symmetries. eight h n dna sequences from different locations around the world are analyzed. a further study examined the long-range correlations in genomic dna and the signature of the nucleosomal structure. audit et al ( ) showed that wavelet analysis of dna bending profiles reveals structural constraints on the evolution of genomic sequences, and voss ( ) studied the evolution of long-range fractal correlations and 1/f noise in dna base sequences. in this paper the d wavelet transform modulus maxima lines (wtmm) method is used to demonstrate the monofractal behavior of sars-cov- rna sequences downloaded from the ncbi database and to estimate the so-called hurst exponent; the goal is to investigate the lrc character of these sequences. we begin by describing the different dna codings that will be used. many codings of nucleic acids, which are formed by four different nucleotides, thymine (t), guanine (g), cytosine (c) and adenine (a), have been proposed in the literature. here we will use the following six codings (messaoudi et al, ): the knucleotidic coding (t= , g=- , a= , c=- ); the purine coding (a=g= , c=t=- ); the pyrimidine coding (c=t= , a=g=- ); the amino group coding (a=c= , g=t=- ); the keto coding (g=t= , a=c=- ); and the gc coding (g=c= , a=t=- ). for more details about the d wtmm method we invite readers to the paper of arneodo et al ( ) or ouadfeul and aliouane ( ) . one of the most important strengths of the wtmm method is its ability to identify the mono- or multifractal behavior of a given fractal process: a linear spectrum of exponents is enough to say that a given fractal process is monofractal (h is the hurst exponent).
for more details about this ability, we invite readers to the papers of ouadfeul and aliouane ( ). audit et al ( ) noted that there has been intense discussion about the existence, the nature and the origin of lrc in dna sequences in recent decades. besides fourier and autocorrelation analysis, different techniques including mutual information functions, dna walk representation, zipf analysis and entropies were used for statistical analysis of dna sequences. there were many objective reasons for this somewhat controversial situation. most of the investigations of lrc in dna sequences were performed using different techniques that all had their own advantages and limitations. they all consisted in measuring the power-law behavior of some characteristic quantity, e.g., the fractal dimension of the dna walk, the scaling exponent of the correlation function or the power-law exponent of the power spectrum. therefore, in practice, they all faced the same difficulties, namely finite size effects due to the finiteness of the sequence. the authors of that paper demonstrated the necessity of the wavelet transform microscope for studying the lrc character of dna sequences. the hurst exponent h of a dna walk, estimated using the wavelet transform method, indicates whether the walk is anti-correlated (h < 1/2: anti-persistent random walk) or positively correlated (h > 1/2: persistent random walk); h = 1/2 corresponds to classical brownian motion (audit et al, ) . in this part, isolate rna sequences are analyzed using the d wtmm method. these sequences are genbank records downloaded from the national center for biotechnology information (ncbi) database, and all of them come from people infected by the sars-cov- coronavirus. table shows the genbank code of each sequence and the origin (country) of each patient infected by this virus. these rna profiles are coded using the six coding methods detailed above.
then, the d wtmm analysis is applied to these sequences to estimate the so-called hölder exponent for each coded dna profile. figure shows the knucleotidic coding of the rna sequence , which has a length of bp, and the dna walk of this sequence is presented in figure . the dna walk at position n is defined as the cumulative sum of the coded values up to position n; for more details about the dna walk, we invite readers to the paper of peng ( ) . to demonstrate the fractal behavior of this dna walk, we calculate the spectral density of this sequence and plot it versus the frequency on a log-log scale (voss, ) . it is clear that log(|s(f)|^2) has a linear shape (see figure ), which demonstrates the scaling-law behavior of the spectral density versus the frequency. the first step is the calculation of the continuous wavelet transform (cwt); the analyzing wavelet is the complex morlet. for more details about the cwt calculation, the parameters of the analyzing wavelet, and the scaling method, we invite readers to the paper of ouadfeul and aliouane ( ) . table shows the average value of the hurst exponent for each coding method of the rna genbank sequences. we can observe that the purine coding (sensitive to a and g concentrations) and the pyrimidine coding (sensitive to c and t concentrations) have the smallest variation of the hurst exponent, whereas the amino coding (sensitive to a and c concentrations) and the gc coding have the highest variation.
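the coding and dna-walk steps described above can be sketched in a few lines. this is an illustration under stated assumptions, not the authors' implementation: the sign pattern of each binary coding follows the text, but the magnitude of 1 is an assumption because the numeric values are not legible here. the knucleotidic coding is omitted for the same reason, and the names `CODINGS`, `encode`, `dna_walk` and `classify_hurst` are hypothetical. the wtmm estimation of h itself is not reproduced.

```python
from itertools import accumulate

# Binary codings named in the text. Signs follow the text; the magnitude
# of 1 is an assumption made for this sketch.
CODINGS = {
    "purine": {"a": 1, "g": 1, "c": -1, "t": -1},
    "pyrimidine": {"c": 1, "t": 1, "a": -1, "g": -1},
    "amino": {"a": 1, "c": 1, "g": -1, "t": -1},
    "keto": {"g": 1, "t": 1, "a": -1, "c": -1},
    "gc": {"g": 1, "c": 1, "a": -1, "t": -1},
}


def encode(sequence, coding):
    """Map a nucleotide string to a numeric series using the chosen coding."""
    table = CODINGS[coding]
    return [table[base] for base in sequence.lower()]


def dna_walk(sequence, coding):
    """DNA walk: cumulative sum of the coded values up to each position."""
    return list(accumulate(encode(sequence, coding)))


def classify_hurst(h):
    """Interpret a Hurst exponent estimate as described in the text."""
    if h < 0.5:
        return "anti-persistent"
    if h > 0.5:
        return "persistent"
    return "brownian"
```

in practice an estimated h is never exactly 1/2, so a tolerance around 0.5 would be needed before calling a walk brownian; how wide that tolerance should be is left open here.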
we have performed a d wavelet based multifractal analysis of rna profiles downloaded from the ncbi database using the continuous wavelet transform, the analyzing wavelet is the complex morlet, the analyzing parameters of the wavelet transform modulus maxima lines wavelet based fractal analysis of dna sequences long-range correlations in genomic dna: a signature of the nucleosomal structure long-range correlations in genomic dna: a signature of the nucleosomal structure wavelet analysis of dna bending profiles reveals structural constraints on the evolution of genomic sequences the thermodynamics of fractals revisited with wavelets fractals and hidden symmetries in dna identification of coronavirus sequences in carp cdna from wuhan, china genomic data visualisation multifractal analysis revisited by the continuous wavelet transform applied in lithofacies segmentation from well-logs data long-range correlations in nucleotide sequences evolution of long-range fractal correlations and /f noise in dna base sequences fractals in dna sequence analysis authors would like thank the national center for biotechnology information for providing data. key: cord- -bah ege authors: kohmer, niko; westhaus, sandra; rühl, cornelia; ciesek, sandra; rabenau, holger f. title: clinical performance of sars-cov- igg antibody tests and potential protective immunity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bah ege as the current sars-cov- pandemic continues, serological assays are urgently needed for rapid diagnosis, contact tracing and for epidemiological studies. so far, there is little data on how commercially available tests perform with real patient samples and if detected igg antibodies provide protective immunity. 
focusing on igg antibodies, we demonstrate the performance of two elisa assays (euroimmun sars-cov- igg & vircell covid- elisa igg) in comparison to one lateral flow assay (lfa; fastep covid- igg/igm rapid test device) and two in-house developed assays (immunofluorescence assay (ifa) and plaque reduction neutralization test (prnt)). we tested follow-up serum/plasma samples of individuals pcr-diagnosed with covid- . most of the sars-cov- samples were from individuals with a moderate to severe clinical course who required an in-patient hospital stay. for all examined assays, the sensitivity ranged from . to . % for the early phase of infection (days - ) and from . to % for the later period (days - ) after pcr diagnosis of covid- . with the exception of one sample, all samples in the analysed cohort that tested positive in the commercially available assays examined (and in the in-house developed ifa) demonstrated neutralizing (protective) properties in the prnt, indicating a potential protective immunity to sars-cov- . regarding specificity, there was evidence that samples from individuals infected with endemic coronaviruses (hcov-oc , hcov- e) or epstein barr virus (ebv) cross-reacted in the elisa assays and ifa, in one case generating a false positive result (which may give a false sense of security). this needs to be further investigated.
there is an increasing demand for the detection of antibodies, especially of igg antibodies. convalescent plasma may be used for therapeutic or prophylactic approaches while vaccines and other drugs are under development ( ). for all these purposes, sensitive and especially highly specific antibody assays are needed. the spike (s) protein of sars-cov- has been shown to be highly immunogenic and is the main target for neutralizing antibodies ( ). currently there are many s protein based commercial or in-house developed assays available, but there is limited data on how these tests perform with clinical samples and on whether the detected igg antibodies provide protective immunity. this study aims to provide a quick overview of some of these assays (two commercially available elisas, an lfa, an ifa and a prnt), focusing on the detection and neutralization capacity of igg antibodies. in terms of sensitivity, our data are consistent with previously published data. in a study from liu et al., using an rs-based elisa assay, the group found sars-cov- igg antibodies in less than % of the samples from days - after disease onset.
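the sensitivity and specificity figures discussed here are simple proportions. as a minimal sketch, assuming the pcr result (or known negative status) is taken as the reference, they can be computed from counts as follows. the function names are hypothetical and no counts from the study are used.

```python
def sensitivity(true_positives, false_negatives):
    """Percent of reference-positive samples that the assay calls positive."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no reference-positive samples")
    return 100.0 * true_positives / total


def specificity(true_negatives, false_positives):
    """Percent of reference-negative samples that the assay calls negative."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no reference-negative samples")
    return 100.0 * true_negatives / total
```

a cross-reactive sample from an individual with an endemic coronavirus infection would count as a false positive in `specificity`, which is why even a single such case matters when the negative panel is small.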
the sensitivity increased to > % in samples from days [especially for potential protective igg antibodies against the s protein ( )] using elisa or lateral flow assays is more convenient and practicable than using the hands-on and time-intensive ifa or prnt, which can only be performed by experienced personnel and, in the case of the prnt, only in a bsl- laboratory. elisa-based assays can be automated and used for larger sample sizes. lateral flow assays can be used by less experienced personnel in a point-of-care setting, generating results in a short time. some samples, however, were only detected with the ifa and the prnt as gold standard. the titer needed for potential protective immunity is not yet (officially) defined. one study reported that an individual cleared sars-cov- without developing antibodies up to days after illness ( ). the mechanism of immunity, especially of protective immunity (if applicable), and how long it will last need to be further investigated. besides humoral mediated immunity, there is evidence that t-cell
sars-cov- and coronavirus disease : what we know so far
convalescent plasma transfusion for the treatment of covid- : systematic review
characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov
evaluation of nucleocapsid and spike protein-based elisas for detecting antibodies against sars-cov-
virological assessment of hospitalized patients with covid- . nature
perspectives on therapeutic neutralizing antibodies against the novel
long-term coexistence of sars-cov- with antibody response in covid- patients
the trinity of covid- : immunity, inflammation and intervention
severe acute respiratory syndrome coronavirus -specific antibody responses in coronavirus disease patients
key: cord- -ks fn pq authors: fraser, nicholas; brierley, liam; dey, gautam; polka, jessica k; pálfy, máté; nanni, federico; coates, jonathon alexis title: preprinting the covid- pandemic date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ks fn pq
the world continues to face an ongoing viral pandemic that presents a serious threat to human health. the virus underlying the covid- disease, sars-cov- , has caused over million confirmed cases and , deaths since january . although the last pandemic occurred only a decade ago, the way science operates and responds to current events has experienced a paradigm shift in the interim. the scientific community responded rapidly to the covid- pandemic, releasing over , covid- scientific articles within months of the first confirmed case, of which , were hosted by preprint servers. focussing on biorxiv and medrxiv, two growing preprint servers for biomedical research, we investigated the attributes of covid- preprints, their access and usage rates and characteristics of sharing across online platforms. our results highlight the unprecedented role of preprint servers in the dissemination of covid- science, and the impact of the pandemic on the scientific communication landscape.
following a steep increase in the posting of covid- research, traditional publishers adopted new policies to support the ongoing public health emergency response efforts. after multiple public calls from scientists [ ] , over publishers agreed to make all covid- work freely accessible by the th march [ , ] .
shortly after this, publishers (for example elife [ ] ) began to alter peer-review policies in an attempt to fast-track covid- research. towards the end of april, oaspa issued an open letter of intent to maximise the efficacy of peer review [ ] . the number of open-access covid- journal articles suggests that journals have largely been successful at implementing these new policies (supplemental fig. b) .
attributes of covid- preprints posted between january and april
to explore the attributes of covid- preprints in greater detail, we focused our following investigation on two of the most popular preprint servers in the biomedical sciences: biorxiv and medrxiv. between january and april , , preprints were deposited in total to biorxiv and medrxiv, of which the majority ( , , . %) were non-covid- related preprints ( fig. a) . while the weekly numbers of non-covid- preprints did not change much during this period, covid- preprint posting increased, peaking at over in early april. when the data were broken down by server, it was evident that whilst posting of covid- preprints to biorxiv had remained relatively steady, preprints posted to medrxiv increased with time (supplemental fig. a) . the increase in the rate of preprint posting poses challenges for their timely screening. only marginally faster screening was detected for covid- preprints than for non-covid- preprints (fig. b) when adjusting for differences between servers (two-way anova, interaction term; f , = . , p < . ). whilst covid- preprints were screened < day quicker on average within both servers (tukey hsd; both p < . ), larger differences were observed between servers (supplemental fig. b) , with biorxiv screening preprints approximately days quicker than medrxiv for both preprint types (both p < . ). the number of authors may give an indication as to the amount of work, resources used, and the extent of collaboration in a paper.
while the average number of authors of covid- and non-covid- preprints did not differ, covid- preprints showed slightly more variability in authorship team size (median, [iqr ] vs [iqr ]). single-author preprints were almost three times more common among covid- than non-covid- preprints (fig. c ). researchers may be shifting their publishing practice in response to the pandemic. among all identified corresponding authors of preprints during the pandemic, we found a significant association between preprint type and whether this was the author's first biorxiv or medrxiv preprint (chi-square, χ = . , df = , p < . ). among covid- corresponding authors, % were posting a preprint for the first time, compared to % of non-covid- corresponding authors in the same period. to further understand which authors have been drawn to begin using preprints since the pandemic began, we additionally stratified these groups by country. corresponding authors based in china had the greatest increase in representation among authors of covid- preprints when non-covid- preprints were taken as the expectation (fig. d ). additionally, india had a higher representation among covid- authors specifically using preprints for the first time compared to non-covid- posting patterns. moreover, we found that most countries posted their first covid- preprint close to the time of their first confirmed covid- case (supplemental fig. c - preprints were more likely to choose the more restrictive cc-by-nc-nd or cc-by-nd than those of non-covid- preprints, and less likely to choose cc-by and cc (fig. e) . preprint servers offer authors the opportunity to post new versions of a preprint, to improve upon or correct mistakes in an earlier version. the majority of preprints existed as only a single version for both covid- and non-covid- work, with very few preprints existing beyond two versions (fig. f) . this may partly reflect the relatively short time-span of our analysis period.
covid- preprints did not discernibly differ in number of versions compared with non-covid- preprints ( mann-whitney, p < . ) (fig. g ). this supports anecdotal observations that preprints are being used to share more works-in-progress rather than complete stories. we also found that covid- preprints contain fewer references than non-covid- preprints (median, . [ ]. we assessed differences in publication outcomes for covid- versus non-covid- preprints during our analysis period, which may be partially related to differences in preprint quality. published status (published/unpublished) was significantly associated with preprint type (chi-square; χ = . , df = , p = . ); within our timeframe, % of covid- preprints were published by the end of april, compared to the % of non-covid preprints that were published (fig. i ). these published covid- preprints were split across many journals, with clinical or multidisciplinary journals surveyed tending to publish the most papers that were previously preprints (supplemental fig. e ). to determine how publishers were prioritising covid- research, we compared the time from preprint posting to publication in a journal. delay from posting to subsequent publication was significantly accelerated for covid- preprints by a mean difference of . days compared to non-covid- preprints posted in the same time period (mean, . days [sd . ] vs . days [sd . ]; two-way anova; f , = . , p < . ). this did not appear driven by any temporal changes in publishing practices, as publication times of non-covid- preprints were similar to expectation of our control timeframe of september -january (fig. j ). covid- preprints also appeared to have significantly accelerated publishing regardless of publisher (two-way anova, interaction term; f , = . , p = . ) (supplemental fig. f ). however, data aggregated across several publishers revealed that on average, non-covid- manuscripts had a . 
% higher acceptance rate than covid- manuscripts, regardless of preprint availability (supplemental fig. g) . extensive access of preprint servers for covid- research to confirm that usage of covid- and non-covid- preprints was not an artefact of differing preprint server reliance during the pandemic, we compared usage to september -april , as a non-pandemic control period. we observed a slight decrease in abstract views (supplemental fig. a) and pdf downloads (supplemental fig. b ) in march , but otherwise, the usage data did not differ from that prior to the pandemic. secondly, we investigated usage across additional preprint servers (data kindly provided by each of the server operators). we found that covid- preprints were consistently downloaded more than non-covid- preprints during our timeframe, regardless of which preprint server hosted the manuscript (supplemental fig. c ), though the gap in downloads varied between server (two-way anova, interaction term; f , = . , p < . ). server usage differences were more pronounced for covid- preprints; multiple post-hoc comparisons confirmed that biorxiv and medrxiv received significantly higher usage per covid- preprint than all other servers for which data was available (tukey hsd; all p values < . ). however, for non covid- preprints, the only observed pairwise differences between servers indicated greater biorxiv usage than ssrn or research square (tukey hsd; all p values < . ). this suggests specific attention has been given disproportionately to biorxiv and medrxiv as repositories for covid- research. covid- preprints were shared more widely than non-covid- preprints based on citation data from dimensions, we found that covid- preprints are cited more often than non-covid- preprints (time-adjusted negative binomial regression; rate ratio = . , z = . , p < . ) (fig. a ), although it should be noted that only a minority of preprints received at least one citation in both groups ( . % vs . %). 
the highest cited preprint had citations, with the th most cited covid- preprint receiving citations (table ) we also investigated sharing of preprints on twitter to assess the exposure of wider public audiences to preprints, using data from altmetric. covid- preprints were tweeted at a greater rate than non- covid- preprints (rate ratio = . , z = . , p < . ) (fig. b ). the most tweeted non-covid- preprint received , tweets, whereas of the top tweeted covid- preprints were tweeted over , times each ( table ). many of the top tweeted covid- preprints were related to transmission, re-infection or seroprevalence and association with the bcg vaccine. the most tweeted covid- preprint ( , tweets) was a study investigating antibody seroprevalence in california [ ] , whilst the second most tweeted covid- preprint was a widely criticised (and later withdrawn) study linking the sars-cov- spike protein to hiv- glycoproteins [ ] . to better understand the main discussion topics associated with the top- most tweeted preprints, we analysed the hashtags used in original tweets (i.e. excluding retweets) mentioning those preprints (supplemental fig. a ). after removing generic or overused hashtags directly referring to the virus (e.g. "#coronavirus", "#covid- "), we found that the most dominant hashtag among tweets referencing preprints was "#hydroxychloroquine", a major controversial topic associated with two of the top ten most tweeted preprints. other prominent hashtags contained a mixture of direct, neutral references to the disease outbreak such as "#coronavirusoutbreak" and "#wuhan", and some more politicised terms, such as "#fakenews" and "#covidisalie", associated with conspiracy theories. as well as featuring heavily on social media, covid- research has also pervaded print and online news media. covid- preprints were used in news articles at a rate over two hundred times that of non-covid- preprints (rate ratio = . , z = . , p < . 
), although as with citations, only a minority were mentioned in news articles at all ( . % vs . %) (fig. c) . the top non-covid- preprints were reported in less than news articles in total, whereas the top covid- preprints were reported in over news articles (table ) . similarly, covid- preprints were also used in blogs at a significantly greater rate than non-covid- preprints (rate ratio = . , z = . , p < . ) ( fig. d ; table ). we noted that several of the most widely-disseminated non-covid- preprints featured topics relevant to infectious disease research, e.g. human respiratory physiology and personal protective equipment (tables and ) . independent covid- review projects have arisen to publicly review covid- preprints [ ] . to investigate engagement with preprints directly on the biorxiv and medrxiv platforms, we quantified the number of comments for preprints posted between january and april. we found that non-covid- preprints were rarely commented upon when compared to covid- preprints (time-adjusted negative binomial regression; rate ratio = . , z = . , p < . ) (fig. e) ; the most commented non- covid- preprint received only comments, whereas the most commented covid- preprint had over comments on the th april (table ). one preprint, which had comments was retracted within days of being posted following intense public scrutiny [ ] . collectively these data suggest that the most discussed or controversial covid- preprints are being rapidly and publicly scrutinised, with commenting systems being used for direct feedback and discussion of preprints. research square, arxiv). however, these citations occurred at a relatively low rate, typically constituting less than % of the total citations in these documents (fig. f ). fifty-eight individual covid- preprints from biorxiv or medrxiv were cited in examined policy documents, of which were cited more than once and were cited more than twice. 
most preprint citations occurred in documents from the ecdc, uk post and who sb with no preprints cited in analysed documents from the us hsscc. in comparison, only two instances of citations to preprints were observed among manually collected non-covid- policy documents from the same sources. to understand how different usage indicators may represent the sharing behaviour of different user groups, we calculated the correlation between the usage indicators presented above (citations, tweets, news articles, comments). for covid- preprints, we found weak correlation between the numbers of citations and twitter shares (spearman's ρ = . , p < . ), and the numbers of citations and news articles (spearman's ρ = . , p < . ) (fig. g) , suggesting that the preprints cited mostly within the scientific literature differed to those that were mostly shared by the wider public on other online platforms. there was a stronger correlation between covid- preprints that were most blogged and those receiving the most attention in the news (spearman's ρ = . , p < . ). moreover, there was a strong correlation between covid- preprints that were most tweeted and those receiving the most attention in the news (spearman's ρ = . , p < . ), suggesting similarity between preprints shared on social media and in news media (fig. g ). there was a weak correlation between the most tweeted covid- preprints and the most commented upon (spearman's ρ = . , p < . ). taking the top ten covid- preprints by each indicator, there was substantial overlap between all indicators except citations (supplemental fig. b ). we observed much weaker correlation between all indicators for non-covid- preprints (fig. h) show that preprints have been widely adopted for the dissemination and communication of covid- research, and in turn, the pandemic has greatly impacted the preprint and science publishing landscape. 
changing attitudes towards, and acceptance of, preprint servers within the life sciences may be one reason why covid- research is being shared so readily as preprints compared to past epidemics. in addition, the need to rapidly communicate findings prior to a lengthy review process might be responsible for this observation (fig. ) . a recent study involving qualitative interviews of multiple research stakeholders found "early and rapid dissemination" to be amongst the most often cited benefits of preprints [ ] . the fact that news outlets are reporting extensively on covid- preprints ( fig. c and g) represents a marked change in journalistic practice: pre-pandemic, biorxiv preprints received very little coverage in comparison to journal articles [ ] . this cultural shift provides an unprecedented opportunity to bridge the scientific and media communities to create a consensus on the reporting of preprints [ ] . another marked change was observed in the use of preprints in policy documents (fig. f) . preprints were remarkably absent in non-covid- policy documents yet present, albeit at relatively low levels, in covid- policy documents. in a larger dataset, two of the top journals cited in policy documents were found to be preprint servers (medrxiv and ssrn in th and th position respectively) [ ] . this suggests that preprints are being used to directly influence policy-makers and decision making. we only investigated a limited set of policy documents, largely restricted to europe and the us; whether this extends more globally remains to be explored. in the near future, we aim to examine the use of preprints in policy in more detail to address these questions. as most covid- preprints were not yet published, concerns regarding quality will persist [ ] . commenting on covid- preprints (fig. ) .
moreover, prominent scientists are using social media platforms such as twitter to publicly share concerns about poor-quality covid- preprints or to amplify high-quality preprints [ ] . the use of twitter to "peer-review" preprints provides additional, public scrutiny of manuscripts that can complement the more opaque and slower traditional peer-review process. although these new review platforms partially combat poor-quality preprints, it is clear that there is a dire need to better understand the general quality and trustworthiness of preprints compared to peer-reviewed articles. we found that comparable levels of preprints had been published within our short timeframe (fig. ) and that acceptance rates at several journals were only slightly reduced for covid- research compared to non-covid- articles (supplemental fig. ), suggesting that, generally, preprints were of relatively good quality. furthermore, recent studies have suggested that the quality of reporting in preprints differs little from their later peer-reviewed articles [ ] and we ourselves are currently undertaking a more detailed analysis (see version of our preprint for an initial analysis of published covid preprints [ ] ). however, the problem of poor-quality science is not unique to preprints and ultimately, a multi-pronged approach is required to solve some of these issues. for example, scientists must engage more responsibly with journalists and the public, in addition to upholding high standards when sharing research. more significant consequences for academic misconduct and the swift removal of problematic articles will be essential in aiding this. moreover, the politicisation of science has become a polarising issue and must be prevented at all costs. finally, transparency within the scientific process is essential in improving the understanding of its internal dynamics and providing accountability.
our data demonstrates the indispensable role that preprints, and preprint servers, are playing during a global pandemic. by communicating science through preprints, we are sharing at a faster rate than allowed by the current journal infrastructure. furthermore, we provide evidence for important future discussions around scientific publishing. methods preprint metadata for biorxiv and medrxiv we retrieved basic preprint metadata (dois, titles, abstracts, author names, corresponding author name and institution, dates, versions, licenses, categories and published article links) for biorxiv and medrxiv preprints via the biorxiv application programming interface (api; https://api.biorxiv.org). the api accepts a 'server' parameter to enable retrieval of records for both biorxiv and medrxiv. we initially collected metadata for all preprints posted from the time of the server's launch, corresponding to november for biorxiv and june for medrxiv, until the end of our analysis period on th april (n = , ). all data were collected on st may . note that where multiple preprint versions existed, we included only the earliest version and recorded the total number of following revisions. preprints were classified as "covid- preprints" or "non-covid- preprints" on the basis of the following terms contained within their titles or abstracts (case-insensitive): "coronavirus", "covid- ", "sars-cov", "ncov- ", " -ncov", "hcov- ", "sars- ". for comparison of preprint behaviour between the covid- outbreak and previous viral epidemics, namely western africa ebola virus and zika virus (supplemental fig. ) , the same procedure was applied using the keywords "ebola" or "zebov", and "zika" or "zikv", respectively. for all preprints contained in the subset, disambiguated author affiliation and country data for corresponding authors were retrieved by querying raw affiliation strings against the research organisation registry (ror) api (https://github.com/ror-community/ror-api). 
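the keyword-based classification of preprints described above can be sketched in a few lines of python. this is a minimal illustration rather than the authors' code: the function name and the example term list are hypothetical, and the list contains only the two search terms that are fully legible in the text, so the complete list from the methods would need to be substituted.

```python
import re

# Placeholder term list (assumption): only the two search terms that are
# fully legible in the text. Substitute the complete list from the methods.
EXAMPLE_TERMS = ("coronavirus", "sars-cov")


def classify_preprint(title, abstract, terms=EXAMPLE_TERMS):
    """Label a preprint 'covid' if any term occurs in its title or abstract.

    Matching is case-insensitive and on substrings, mirroring the
    description of the classification step; everything else is 'non-covid'.
    """
    if not terms:
        return "non-covid"
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    text = f"{title or ''} {abstract or ''}"
    return "covid" if pattern.search(text) else "non-covid"


if __name__ == "__main__":
    print(classify_preprint("A novel Coronavirus genome", ""))
    print(classify_preprint("Yeast cell cycle", "We study budding yeast."))
```

substring matching is deliberately permissive; as the limitations of the study note, it will also label preprints that were later revised to mention one of the terms.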
the api provides a service for matching affiliation strings against institutions contained in the registry, on the basis of multiple matching types (named "phrase", "common terms", "fuzzy", "heuristics", and "acronyms"). the service returns a list of potential matched institutions and their country, as well as the matching type used, a confidence score with values between and , and a binary "chosen" indicator relating to the most confidently matched institution. a small number (~ ) of raw affiliation strings returned from the biorxiv api were truncated at characters; for these records we conducted web-scraping using the rvest package for r [ ] to retrieve the full affiliation strings of corresponding authors from the biorxiv public webpages, prior to matching. for the purposes of our study, we aimed for higher precision than recall, and thus only included matched institutions where the api returned a confidence score of . a manual check of a sample of returned results also suggested higher precision for results returned using the "phrase" matching type, and thus we only retained results using this matching type. in a final step, we applied manual corrections to the country information for a small subset of records where false positives would be most likely to influence our results by a) iteratively examining the chronologically first preprint associated with each country following affiliation matching and applying manual rules to correct mismatched institutions until no further errors were detected (n = institutions); and b) examining the top most common raw affiliation strings and applying manual rules to correct any mismatched or unmatched institutions (n = institutions). in total, we matched , preprints to a country ( . %); for covid- preprints alone, preprints ( . %) were matched to a country. note that a similar, albeit more sophisticated method of matching biorxiv affiliation information with the ror api service was recently documented by abdill et al. [ ] . 
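the precision-oriented filtering of affiliation matches described above might look like the following sketch. it operates on already-retrieved match records and makes no network calls. the field names ("score", "matching_type", "organization", "country") are assumptions loosely modelled on the description of the service, and the confidence threshold is left as a parameter because its value is not legible in the text.

```python
def pick_country(matches, min_score, matching_type="PHRASE"):
    """Return the country of the best match that passes both filters.

    matches       -- list of dicts, one per candidate institution (assumed shape)
    min_score     -- minimum confidence score to accept (value is an assumption)
    matching_type -- only matches of this type are retained
    """
    wanted = matching_type.upper()
    kept = [
        m for m in matches
        if m.get("score", 0.0) >= min_score
        and str(m.get("matching_type", "")).upper() == wanted
    ]
    if not kept:
        return None  # unmatched: left for manual correction
    best = max(kept, key=lambda m: m["score"])
    return best.get("organization", {}).get("country")


# Hypothetical candidate matches for one raw affiliation string.
example_matches = [
    {"score": 1.0, "matching_type": "PHRASE",
     "organization": {"name": "Example University", "country": "Exampleland"}},
    {"score": 1.0, "matching_type": "FUZZY",
     "organization": {"name": "Other Institute", "country": "Otherland"}},
    {"score": 0.6, "matching_type": "PHRASE",
     "organization": {"name": "Third College", "country": "Thirdland"}},
]

if __name__ == "__main__":
    print(pick_country(example_matches, min_score=0.9))
```

returning none for anything that fails either filter trades recall for precision, which matches the stated aim; the unmatched strings are the ones a manual correction pass would then target.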
word counts and reference counts for each preprint were also added to the basic preprint metadata via scraping of the biorxiv public webpages (medrxiv currently does not display full html texts, and so calculating word and reference counts was limited to biorxiv preprints). web scraping was
usage data (abstract views and pdf downloads) were scraped from the biorxiv and medrxiv public webpages, using the rvest package for r (wickham, ) . biorxiv and medrxiv webpages display abstract views and pdf downloads on a calendar month basis; for subsequent analysis (e.g. figure ) , these were summed to generate total abstract views and downloads since the time of preprint posting. in total, usage data were recorded for , preprints ( . %); a small number were not recorded, possibly due to server issues during the web scraping process. note that biorxiv webpages also display counts of full-text views, although we did not include these data in our final analysis. this was partially to ensure consistency with medrxiv, which currently does not display full html texts, and partially due to ambiguities in the timeline of full-text publishing: the full text of a preprint is added several days after the preprint is first available, but the exact delay appears to vary from preprint to preprint. we also compared rates of pdf downloads for biorxiv and medrxiv preprints with a number of other preprint servers (preprints.org, ssrn, and research square) (supplemental
counts of multiple altmetric indicators (mentions in tweets, blogs, and news articles) were retrieved via altmetric (https://www.altmetric.com), a service that monitors and aggregates mentions of scientific articles on various online platforms. altmetric provide a free api (https://api.altmetric.com) against which we queried each preprint doi in our analysis set.
importantly, altmetric only contains records where an article has been mentioned in at least one of the sources tracked; thus, if our query returned an invalid response, we recorded counts for all indicators as zero. coverage of each indicator (i.e. the proportion of preprints receiving at least a single mention in a particular source) for preprints was . %, . %, and . % for mentions in tweets, blogs and news articles respectively. the high coverage on twitter is likely driven, at least in part, by automated tweeting of preprints by the official biorxiv and medrxiv twitter accounts. for covid- preprints, coverage was found to be . %, . % and . % for mentions in tweets, blogs and news articles respectively. to quantitatively capture how high-usage preprints were being received by twitter users, we retrieved all tweets linking to the top ten most-tweeted preprints. tweet ids were retrieved via the altmetric api service, and then queried against the twitter api using the rtweet package [ ] for r, to retrieve full tweet content. citation counts for each preprint were retrieved from the scholarly indexing database dimensions (https://dimensions.ai). an advantage of using dimensions in comparison to more traditional citation databases (e.g. scopus, web of science) is that dimensions also includes preprints from several sources within their database (including from biorxiv and medrxiv), as well as their respective citation counts. when a preprint was not found, we recorded its citation counts as zero. of all preprints, ( . %) recorded at least a single citation in dimensions. for covid- preprints, preprints ( . %) recorded at least a single citation.
comments
biorxiv and medrxiv html pages feature a disqus (https://disqus.com) comment platform to allow readers to post text comments. comment counts for each biorxiv and medrxiv preprint were retrieved via the disqus api service (https://disqus.com/api/docs/).
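the zero-filling rule and the coverage statistic used above for the altmetric and citation indicators can be made concrete with a short python sketch. the function names, the record layout and the example dois are hypothetical; the sketch only assumes that a lookup either returns per-indicator counts for a doi or returns nothing.

```python
def zero_fill(dois, records, indicators):
    """Give every DOI a count for every indicator, using 0 when no record exists."""
    filled = {}
    for doi in dois:
        found = records.get(doi, {})
        filled[doi] = {name: int(found.get(name, 0)) for name in indicators}
    return filled


def coverage(filled, indicator):
    """Proportion of preprints with at least one mention for the given indicator."""
    if not filled:
        return 0.0
    mentioned = sum(1 for counts in filled.values() if counts[indicator] > 0)
    return mentioned / len(filled)


# Hypothetical analysis set of four preprints, two of which have no record.
example_dois = ["doi-a", "doi-b", "doi-c", "doi-d"]
example_records = {
    "doi-a": {"tweets": 5, "news": 1},
    "doi-c": {"tweets": 2},
}
example_indicators = ("tweets", "blogs", "news")

if __name__ == "__main__":
    filled = zero_fill(example_dois, example_records, example_indicators)
    for name in example_indicators:
        print(name, coverage(filled, name))
```

treating a missing record as zero mentions keeps every preprint in the denominator, so coverage is computed over the whole analysis set rather than only over preprints the service happens to track.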
where multiple preprint versions existed, comments were aggregated over all versions. as with preprint perceptions among public audiences on twitter, we then examined perceptions among academic audiences by examining comment sentiment. text content of comments for covid- preprints were provided directly by the biorxiv development team. screening time for biorxiv and medrxiv to calculate screening time, we followed the method outlined by steve royle [ ] . in short, we calculate the screening time as the difference in days between the preprint posting date, and the date stamp of submission approval contained within biorxiv and medrxiv dois (only available for preprints posted after december th ). biorxiv and medrxiv preprints were filtered to preprints posted between january st -april th , accounting for the first version of a posted preprint. to describe the level of reliance upon preprints in policy documents, a set of policy documents were manually collected from the following institutional sources: the european centre for disease were then text-mined and manually verified to calculate the proportion of references that were preprints. preprint counts were compared across categories (e.g., covid- or non-covid- ) using chi-square tests or, in cases where any expected values were < , with fisher's exact tests using monte carlo simulation. quantitative preprint metrics (e.g. word count, comment count) were compared across categories using mann-whitney tests and correlated with other quantitative metrics using spearman's rank tests for univariate comparisons. for time-variant metrics (e.g. views, downloads, which may be expected to vary with length of preprint availability), we analysed the difference between covid- and non-covid- preprints using generalised linear regression models with calendar days since jan st as an additional covariate and negative binomially-distributed errors. 
this allowed estimates of time-adjusted rate ratios comparing covid- and non-covid- preprint metrics. negative binomial regressions were constructed using the function 'glm.nb' in r package mass [ ] . for multivariate categorical comparisons of preprint metrics (e.g. screening time between preprint type and preprint server), we constructed two-way factorial anovas, testing for interactions between both category variables in all cases. pairwise post-hoc comparisons of interest were tested using tukey's honest significant difference (hsd) while correcting for multiple testing, using function 'glht' in r package multcomp [ ].
who. covid- situation report
a novel coronavirus from patients with pneumonia in china
the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov-
a novel coronavirus associated with severe acute respiratory syndrome
novel coronavirus from a man with pneumonia in saudi arabia
biomedical journal speed and efficiency: a cross-sectional pilot survey of author experiences
tracking the popularity and outcomes of all biorxiv preprints
does it take too long to publish research?
accelerating scientific publication in biology
biorxiv: the preprint server for biology
biorxiv at year: a promising start
new preprint server for medical research
sharing data during zika and other global health emergencies | wellcome
preprints: an underutilized mechanism to accelerate outbreak science
who director-general's opening remarks at the media briefing on covid- -
coronanet: a dyadic dataset of government responses to the covid- pandemic
wellcome trust. coronavirus (covid- ): sharing research data | wellcome
calling all coronavirus researchers: keep sharing, stay open
publishers make coronavirus (covid- ) content freely available and reusable | wellcome
publishing in the time of covid- . elife
covid- publishers open letter of intent -rapid review
the licensing of biorxiv preprints. satoshi village
covid- antibody seroprevalence
uncanny similarity of unique inserts in the -ncov spike protein to hiv- gp and gag. biorxiv
the relationship between biorxiv preprints, citations and altmetrics
preprints and scholarly communication: an
horbach spjm. pandemic publishing: medical journals strongly speed up their publication process for covid-
technical and social issues influencing the adoption of preprints in the life sciences
we've just put an additional, cautionary note about the use of preprints on every @biorxivpreprint
tracking the twitter attention around the research efforts on the covid- pandemic
coronavirus: the spread of misinformation
covid- misinformation. uk parliam post
open science saves lives: lessons from the covid- pandemic
preprints could promote confusion and distortion
covid- -policy dataset
rapid publications risk the integrity of science in the era of covid-
open peer-review platform for covid- preprints
the role of research preprints in the academic response to the covid- epidemic
advancing scientific knowledge in times of pandemics
the mit press and uc berkeley launch rapid reviews: covid-
eye for manipulation: a profile of elisabeth bik
comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature
preprinting a pandemic: the role of preprints in the covid- pandemic
rvest: easily harvest (scrape) web pages
international authorship and collaboration in biorxiv preprints
) see. roadoi: find free versions of scholarly publications via unpaywall
collecting and analyzing twitter data
screenager: screening times at biorxiv. in: quantixed [internet
modern applied statistics with s
simultaneous inference in general parametric models
(a) number of covid- confirmed cases and reported deaths
cumulative growth of journal articles and preprints containing covid- related search terms
cumulative growth of preprints containing covid- related search terms, broken down by individual preprint server
stratified by author status where dark fill represents authors previously submitting at least one preprint prior to the covid- pandemic, light fill represents authors submitting a preprint for the first time during the covid- pandemic. (e) license type chosen by authors. (f) number of versions per preprint. (g) word counts per preprint. (h) reference counts per preprint. (i) percentage of preprints published in peer-reviewed journals
(green) and non-covid- preprints posted between
a- e) are displayed with log scales, with + added for visualisation. boxplot horizontal lines denote lower quartile, median, upper quartile, with whiskers extending to . *iqr. all boxplots additionally show raw data values for
parameters and limitations of this study
we acknowledge a number of limitations in our study. firstly, to assign a preprint as covid- or not, we used keyword matching to titles/abstracts on the preprint version at the time of our data extraction. this means we may have captured some early preprints, posted before the pandemic, that had been subtly revised to include a keyword relating to covid- . our data collection period was a tightly defined window (january-april ), which may impact upon the altmetric and usage data we collected, as those preprints posted at the end of april would have had less time to accrue these metrics.
acknowledgements
the authors would like to thank ted roeder, john inglis and richard sever from biorxiv and medrxiv for providing information relating to comments on covid- preprints.
we would also like to thank data availability all data and code used in this study are available on github (https://github.com/preprinting-a- pandemic/pandemic_preprints), with the exception of data provided by preprint servers, publishers and raw tweet data. data not publicly available may be shared following permission from the relevant provider and upon reasonable request to the corresponding author. jp is the executive director of asapbio, a non-profit organization promoting the productive use of preprints in the life sciences. gd is a biorxiv affiliate, part of a volunteer group of scientists that screen preprints deposited on the biorxiv server. mp is the community manager for prelights, a non-profit preprint highlighting service. gd key: cord- - s j q authors: salim khan, s muhammad; qurieshi, mariya amin; haq, inaamul; majid, sabhiya; bhat, arif akbar; nabi, sahila; ganai, nisar ahmad; zahoor, nazia; nisar, auqfeen; chowdri, iqra nisar; qazi, tanzeela bashir; kousar, rafiya; lone, abdul aziz; sabah, iram; nabi, shahroz; sumji, ishtiyaq ahmad; kawoosa, misbah ferooz; ayoub, shifana title: seroprevalence of sars-cov- specific igg antibodies in district srinagar, northern india – a cross-sectional study date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: s j q background prevalence of igg antibodies against sars-cov- infection provides essential information for deciding disease prevention and mitigation measures. we estimate the seroprevalence of sars-cov- specific igg antibodies in district srinagar. methods persons > years of age selected from hospital visitors across district srinagar participated in the study. we tested samples for the presence of sars-cov- specific igg antibodies using a chemiluminescent microparticle immunoassay-based serologic test. results age- and gender-standardized seroprevalence was . % ( % ci . % to . %). 
age - years, a recent history of symptoms of an influenza-like-illness, and a history of being placed under quarantine were significantly related to higher odds of the presence of sars-cov- specific igg antibodies. the estimated number of sars-cov- infections during the two weeks preceding the study, adjusted for test performance, was with an estimated (median) infection-to-known-case ratio of ( % ci to ). conclusions the seroprevalence of sars-cov- specific igg antibodies is low in the district. a large proportion of the population is still susceptible to the infection. a sizeable number of infections remain undetected, and a substantial proportion of people with symptoms compatible with covid- are not tested. in a population since they may miss many asymptomatic and pre-symptomatic infections. thus, an informed policy-and decision-making for control of the covid- epidemic in a community should not be based solely on rt-pcr-based numbers. seroprevalence surveys can provide an estimate of the proportion of the population that has developed antibodies against sars-cov- , an indication of recent sars-cov- infection. mild and asymptomatic infections, which may not have received rt-pcr testing, can be detected. besides, assuming that antibodies provide partial or total immunity, seroprevalence surveys give here, we present the results of a cross-sectional seroprevalence study in district srinagar, conducted between st and th july , to estimate the prevalence of igg antibodies against sars-cov- among adults using a sensitive and specific chemiluminescent microparticle immunoassay (cmia)-based test. study design, setting, and participants we conducted a cross-sectional seroprevalence study in district srinagar over two weeks from st july to th july . district srinagar is a city in the valley of kashmir in northern india. it has an estimated adult (> years) population of just over one million. 
study participants were adults (> years) who visited select hospitals across the district during the study period. we informed the participants of the study's purpose and procedure. we obtained written informed consent from those who agreed to participate. the institutional ethics committee of government medical college srinagar approved the study. sample size we estimated the minimum sample size needed based on an anticipated seroprevalence of % within an absolute error of . % with % confidence. we used a design effect of to adjust for the nature of sampling and increased the sample size further to account for a non-response of %. the minimum sample size needed for the study was . we targeted a sample size of . we deemed hospital-based selection of participants to be more convenient, rapid, and feasible, given the constrained human resources and time available for completion of the study. such a choice could, however, lead to a non-representative sample. we made efforts to reduce this bias by reporting age-and gender-standardized prevalence. we invited all adult patients (> years) coming to the selected hospitals during the study period for participation in the study. testing. the samples were carefully packed in designated vaccine carriers to avoid hemolysis during transportation. laboratory procedure we performed the test med using fully automated high throughput platform architect figures for urban areas in the report to calculate weights for reporting age-and gender- standardized seroprevalence. the details are in the supplementary material (s table) . to identify potential factors associated with sars-cov- seropositivity, we used logistic regression analysis and reported unadjusted and age-and gender-adjusted odds ratio with a % confidence interval. we estimated the number of infections till two weeks before the study period, i.e., th june to th june , by applying the age-and gender-specific seroprevalence rates found in the table) . 
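The sample size reasoning above can be illustrated with the standard formula for estimating a proportion, inflated by a design effect and a non-response allowance. This is a sketch under stated assumptions: the source does not give its exact formula, the numeric inputs are missing from this text, and the function name `min_sample_size` and all example values are hypothetical.

```python
import math


def min_sample_size(p, d, z=1.96, deff=1.0, nonresponse=0.0):
    """Minimum sample size to estimate a proportion.

    p           anticipated prevalence (0..1)
    d           absolute error (half-width of the confidence interval)
    z           normal quantile for the confidence level (1.96 for 95%)
    deff        design effect for non-simple sampling
    nonresponse expected non-response fraction (0..1)
    """
    n_srs = (z ** 2) * p * (1 - p) / d ** 2   # simple random sampling
    n = n_srs * deff / (1 - nonresponse)      # inflate for design and refusals
    return math.ceil(n)


# Hypothetical inputs, not the study's values.
n_example = min_sample_size(p=0.5, d=0.05, deff=2.0, nonresponse=0.1)
```

Dividing by `1 - nonresponse` rather than multiplying by `1 + nonresponse` ensures that the expected number of responders still meets the design-adjusted minimum.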
over the study period, eligible persons were invited for participation in the study. the refusal rate was . % ( / ). two thousand nine hundred twenty-three blood samples were collected, but demographic data was missing for participants. we analyzed data from participants (fig. ) . table) . young people, those with a history of ili symptoms in the four weeks preceding the interview, and those ever placed under quarantine were found to have significantly higher odds of the presence of sars-cov- specific igg antibodies (table ) . table) . when adjusted for sensitivity and specificity of the laboratory test kit [ ] , the number of infections comes down to ( % ci - ) (s table) . the cumulative number of rt-pcr confirmed cases till th june was in district srinagar, and the number was till th june . the mid-interval (median) cumulative number of rt-pcr confirmed cases in the two weeks preceding the study was (fig ) . the . we did not find any significant difference in seroprevalence among males and females (or . , % ci . - . ) ( and needs further investigation [ ] . the presence of ili symptoms in the recent past was the factor most strongly associated with the understanding the importance of this study and giving their time and consent for participation. the coronavirus disease (covid- ) and the virus that causes it pdf?sfvrsn=adb f _ . novel coronavirus( -ncov) situation report - . 
j&k admn declares lockdown; essential services to continue : the tribune india a systematic review of asymptomatic infections with covid- the implications of silent transmission for the control of covid- outbreaks epicollect -free and easy-to-use mobile data-gathering platform performance characteristics of the abbott architect sars-cov- igg assay and seroprevalence census of india website : office of the registrar general & census commissioner, india adjusting coronavirus prevalence estimates for laboratory test kit seroprevalence of anti-sars-cov- igg antibodies in geneva, switzerland (serocov- pop): a population-based study antibodies to sars-cov- in sites in the united states prevalence of sars-cov- in spain (ene-covid): a nationwide, population-based seroepidemiological study seroprevalence of covid- virus infection in guilan province estimation of seroprevalence of novel coronavirus disease (covid- ) using preserved serum at an outpatient setting in kobe, japan: a cross-sectional study. infectious diseases (except hiv/aids) prevalence of sars-cov- infection in the luxembourgish population the con-vince study sars-cov- -specific antibodies among adults in antibody tests in detecting sars-cov- infection: a meta-analysis the edi enzyme linked immunosorbent assays for the detection of sars-cov- igm and igg antibodies in human plasma detection of igm and igg antibodies in patients with coronavirus disease specific humoral and cellular immunity in covid- convalescent individuals covid- ) confirmed using an igm-igg antibody test sars-cov- infection serology: a useful tool to overcome lockdown? acute respiratory syndrome coronavirus -specific antibody responses in coronavirus a comparison study of sars-cov- igg antibody between male and female covid- patients: a possible reason underlying different outcome between sex can an effective sars-cov- vaccine be developed for the older population? 
immunity and ageing list of hospitals across district srinagar age and gender distribution of the population in district srinagar and the estimated population estimated number of infections in district srinagar age-and gender-standardized seroprevalence key: cord- - n j jj authors: duan, xiaohua; han, yuling; yang, liuliu; nilsson-payant, benjamin e.; wang, pengfei; zhang, tuo; xiang, jenny; xu, dong; wang, xing; uhl, skyler; huang, yaoxing; chen, huanhuan joyce; wang, hui; tenoever, benjamin; schwartz, robert e.; ho, david. d.; evans, todd; pan, fong cheng; chen, shuibing title: identification of drugs blocking sars-cov- infection using human pluripotent stem cell-derived colonic organoids date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n j jj the current covid- pandemic is caused by sars-coronavirus (sars-cov- ). there are currently no therapeutic options for mitigating this disease due to lack of a vaccine and limited knowledge of sars-cov- biology. as a result, there is an urgent need to create new disease models to study sars-cov- biology and to screen for therapeutics using human disease-relevant tissues. covid- patients typically present with respiratory symptoms including cough, dyspnea, and respiratory distress, but nearly % of patients have gastrointestinal indications including anorexia, diarrhea, vomiting, and abdominal pain. moreover, these symptoms are associated with worse covid- outcomes . here, we report using human pluripotent stem cell-derived colonic organoids (hpsc-cos) to explore the permissiveness of colonic cell types to sars-cov- infection. single cell rna-seq and immunostaining showed that the putative viral entry receptor ace is expressed in multiple hesc-derived colonic cell types, but highly enriched in enterocytes. multiple cell types in the cos can be infected by a sars-cov- pseudo-entry virus, which was further validated in vivo using a humanized mouse model. 
we used hpsc-derived cos in a high throughput platform to screen fda-approved drugs against viral infection. mycophenolic acid and quinacrine dihydrochloride were found to block the infection of sars-cov- pseudo-entry virus in cos both in vitro and in vivo, and were confirmed to block infection of sars-cov- virus. this study established both in vitro and in vivo organoid models to investigate infection of sars-cov- disease-relevant human colonic cell types and identified drugs that block sars-cov- infection, suitable for rapid clinical testing. previously, we reported a chemically-defined protocol to derive cos from hpscs , which we modified slightly based on published studies .
in brief, hues hescs were induced with chir (chir) and activin a to generate definitive endoderm (de) (extended data fig. a) . after days of culture with chir +fgf to induce hindgut endoderm (he), cells were treated with bmp , epidermal growth factor (egf), and chir for days to promote specification of colon progenitors (cps). starting on day , cps were treated with a colonic medium containing chir, ldn (ldn), and egf. after embedding these organoids in matrigel, spheroids became pseudostratified and progressively cavitated into fully convoluted organoids (fig. a) . the organoids expressed cdx , villin and satb , confirming colonic identity (fig. b) . immunocytochemistry confirmed that cos contain cell types found in normal colon, including keratin (krt ) + epithelial cells, mucin (muc ) + goblet cells, eph receptor b (ephb ) + transit-amplifying (ta) cells, and chromogranin a (chga) + neuroendocrine (ne) cells (fig. c) . single cell rna-seq was used to examine global transcript profiles at single cell resolution (extended data fig. b) . consistent with the immunostaining results, most cells express cdx and vil (extended data fig. c) . five cell clusters were identified including krt + epithelial cells, muc + goblet cells, ephb + ta cells, chga + ne cells, and lgr + or bmi + stem cells (fig. d-e, extended data fig. d) . we examined the expression of two factors associated with sars-cov- cell entry, the putative receptor ace and the protease tmprss . both are expressed in all five cell clusters, but highly enriched in krt + enterocytes ( fig. f-g) . two-dimensional correlation confirmed the co-expression relationship for ace and krt , as well as ace and tmprss (fig. h) . immunohistochemistry further validated the coexpression of krt and ace in hpsc-cos (fig. i) . 
to model infection of hpsc-cos with sars-cov- , we used a vesicular stomatitis virus (vsv) based sars-cov- pseudo-entry virus, with the backbone provided by a vsv-g pseudo-typed Δ g-luciferase virus and the sars-cov- spike protein incorporated into the surface of the viral particle (see methods for details) , . cos were fragmented and inoculated with the sars-cov- pseudo-entry virus. or hr post-infection (hpi), the cells were lysed and monitored for luciferase activity (extended data fig. a) . the organoids infected with sars-cov- pseudo-entry virus at moi= . showed a strong signal at hpi (fig. a) . single cell rna-seq was performed to examine the hpsc-derived cos at hpi. the same five cell populations were identified in the cos post-infection (fig. b and extended data fig. b-d) . compared to uninfected samples, the krt + enterocyte population decreased significantly (fig. c) . immunostaining confirmed increased cellular apoptosis, suggesting toxicity for these cells (extended data fig. e) . in addition, the ace + population was significantly depleted (fig. e) . the mrnas of sars-cov- pseudo-entry virus, including vsv-ns, vsv-n, and vsv-m, were detected in all five cell populations (fig. f) , but not in the uninfected cos (extended data fig. f) . immunostaining further validated the expression of luciferase in ace + , vil + , cdx + , krt + , and muc + cells (fig. g) . humanized mice carrying hpsc-cos in vivo provide a unique platform for modeling covid- . in brief, hpsc-cos were transplanted under the kidney capsule of nodscid il rg null mice. two weeks after transplantation, the organoid xenograft was removed and examined for cellular identities (fig. h) . consistent with in vitro culture, ace can be detected in hpsc-derived krt + enterocytes (fig. i) . sars-cov- pseudo-entry virus was inoculated locally. at hpi, the xenografts were removed and analyzed by immunohistochemistry.
luciferase was detected in the xenografts inoculated with virus, but not in mock-infected controls (fig. j) . immunohistochemistry detected luciferase in ace + and villin + cells, suggesting these are permissive to sars-cov- pseudo-entry virus infection in vivo (fig. k) . next, we adapted hpsc-cos to a high throughput screening platform and probed the prestwick fda-approved drug library to identify drug candidates capable of blocking sars-cov- pseudo-virus infection. in brief, hpsc-cos were cultured in -well plates. after overnight incubation, organoids were treated with drugs from the library at μ m. one hour after drug exposure, the organoids were inoculated with the sars-cov- pseudo-entry virus. hpi, the organoids were analyzed for luciferase activity (fig. a) . drugs that decreased the luciferase activity by at least % were chosen as primary hit drugs (fig. b) . eight drugs (extended data table ) were identified as lead hits and further tested for their capacities to decrease the luciferase signal in a dose-dependent manner (extended data fig. ) . these drugs could potentially function by blocking virus entry, by decreasing cell survival, or even by directly inhibiting luciferase activity. to distinguish these possibilities, the lead hit drugs were tested in comparison to hpsc-cos infected with a control vsvg-luciferase reporter virus. four of the lead hit drugs showed specificity to sars-cov- pseudo-entry virus, including mycophenolic acid (mpa, fig. c fig. h) . co-explanted humanized mice were treated with mg/kg mpa by ip injection, followed by local inoculation of sars-cov- pseudo-entry virus. at hpi, the mice were euthanized, and xenografts were analyzed by immunostaining. luc + cells were significantly fewer in the xenografts of mpa-treated mice than in those of vehicle-treated mice (fig. i-k) . finally, hpsc-cos were infected with sars-cov- virus at moi= . or . .
at hpi, immunostaining detected the expression of sars-cov membrane protein in the infected hpsc-cos, which partially co-localized with cdx and krt (fig. a) . bulk rna sequencing confirmed viral transcripts in the sars-cov- infected hpsc-cos (moi= . , fig. b) . the mock and infected hpsc-cos separated clearly into two distinct clusters in a pca plot (fig. c) . differential gene expression analysis showed striking induction of chemokine gene expression, including il a, cxcl , cxcl , cxcl , and il b, yet with no detectable levels of ifn-i or ifn-iii, which is consistent with recent reports [ ] [ ] [ ] (fig. d) . ingenuity pathway analysis of the differential gene expression list highlighted the production of nitric oxide and reactive oxygen species, oxidative phosphorylation, as well as il- production (fig. e) . the hpsc-cos were pre-treated with mpa or qnhc and infected with a relatively high titer of sars-cov- virus (moi= . ). immunostaining confirmed the decrease of sars-cov- + cells in mpa or qnhc-treated hpsc-cos (fig. f) . in summary, we report that hpsc-derived cos express ace and tmprss and are permissive to sars-cov- infection. there is currently a lack of physiologically relevant models for covid- disease that enable drug screens. previous studies were based on clinical data or transgenic animals, for example mice that express human ace . however, such transgenic animals fail to fully recapitulate the cellular phenotype and host response of human cells , . we adapted an hpsc-derived co platform for high throughput drug screening. using disease-relevant normal colonic human cells, we screened fda-approved compounds and identified mpa and qnhc, two drugs that can block the entry of sars-cov- into human cells. strikingly, in this assay, the efficacies of mpa and qnhc for blocking viral entry are more than times higher than that of chloroquine, a drug recently authorized by the fda for emergency use to treat covid- patients.
moreover, the mpa concentrations effective in blocking viral entry and replication are below those routinely used in clinical therapy . mpa is a reversible, non-competitive inhibitor of inosine- ′-monophosphate dehydrogenase and is used widely and safely as an immunosuppressive drug (mycophenolate mofetil; cellcept) to prevent organ rejection after transplantation and for the treatment of autoimmune diseases . mpa has been reported to block replication of human immunodeficiency virus , dengue , as well as middle east respiratory syndrome coronavirus (mers-cov) . several studies on mers-cov suggest that mpa may noncompetitively inhibit the viral papain-like protease while also altering host interferon response , . a recent study also predicts mpa would modulate the interaction between host protein inosine- '-monophosphate dehydrogenase (impdh ) and sars-cov- protein nonstructural protein (nsp ) . furthermore, a clinical study of mers-cov suggested that patients treated with mycophenolate mofetil had a % mortality rate, which is significantly lower than the overall mortality rate of % . qnhc (acriquine ® , atabrine ® , atebrin ® , mepacrine ® ) is an fda-approved antimalarial drug used more recently as an anthelmintic, antiprotozoal, antineoplastic, and antirheumatic agent . recent studies have shown that quinacrine protects mice against ebola virus infection in vivo . both mpa and qnhc can be considered candidates for clinical trials of covid- therapy. recombinant indiana vsv (rvsv) expressing sars-cov- spikes was generated as previously described . hek t cells were grown to % confluency before transfection with pcmv -sars-cov -spike (kindly provided by dr. peihui wang, shandong university, china) using fugene (promega). cells were cultured overnight at °c with % co .
the next day, medium was removed and vsv-g pseudotyped Δ g-luciferase (g*Δg-luciferase, kerafast) was used to infect the cells in dmem at an moi of for hr before washing the cells with x dpbs three times. dmem supplemented with % fetal bovine serum and i.u./ml penicillin and μ g/ml streptomycin was added to the infected cells and they were cultured overnight as described above. the next day, the supernatant was harvested and clarified by centrifugation at g for min and aliquots stored at − °c. severe acute respiratory syndrome coronavirus (sars-cov- ), isolate usa- to assay pseudo-typed virus infection on colon organoids, cos were seeded in well plates. pseudo-typed virus was added at moi= . plus polybrene at a final concentration of μ g/ml, and the plate centrifuged for hr at g. at hpi, the infection medium was replaced with fresh medium. at hpi, colon organoids were harvested for luciferase assays or immunostaining analysis. for chemical screening analysis, colon organoids were digested by tryple and seeded in well plates at x cells per well. after chemical treatment, pseudo-typed virus was added at moi= . and the plate centrifuged for hr at g. at hpi, hpsc-cos were harvested for luciferase assays according to the luciferase assay system protocol (promega). cells were lysed in ripa buffer for protein analysis or fixed in % formaldehyde for h for immunofluorescent staining, prior to safe removal from the bsl- facility. the colon organoids were released from matrigel using cell recovery solution (corning) on ice for hr, followed by fixation in % paraformaldehyde for hr at °c, washed twice with x pbs and allowed to sediment in % sucrose overnight. the organoids were then embedded in oct (tissuetek) and cryo-sectioned at μ m thickness. for indirect immunofluorescence staining, sections were rehydrated in x pbs for min, permeabilized with . % triton in x pbs for min, and blocked with blocking buffer containing % normal donkey serum in x pbs for hr. 
the sections were then incubated with the corresponding primary antibodies diluted in blocking buffer at °c overnight. the following day, sections were washed three times with x pbs before incubating with fluorophore-conjugated secondary antibody for one hr at rt. the sections were washed three times with x pbs and mounted with prolong gold antifade mounting media with dapi (life technologies). images were acquired using an lsm laser scanning confocal microscope (zeiss) and processed with zen or imaris (bitplane) software. organoids and tissues were fixed in % pfa for min at rt, blocked in mg + /ca + free pbs plus % horse serum and . % triton-x for hr at rt, and then incubated with primary antibody at °c overnight. the information for primary antibodies is provided in extended data table . secondary antibodies included donkey anti-mouse, goat, rabbit or chicken antibodies conjugated with alexa-fluor- , alexa-fluor- or alexa-fluor- fluorophores ( : , life technologies). nuclei were counterstained by dapi. antibodies. antibody-mediated fluorescence was detected on a li-cor odyssey clx imaging system and analyzed using image studio software (li-cor). the colon organoids cultured in matrigel domes were dissociated into single cells using . % trypsin (gibco) at °c for min, and the trypsin was then neutralized with dmem f supplemented with % fbs. the dissociated organoids were pelleted and resuspended with l medium (gibco) supplemented with mm hepes, and ng/ml dnasei (sigma). the resuspended organoids were then placed through a µm filter to obtain a single cell suspension, and stained with dapi followed by sorting of live we filtered cells with less than or more than genes detected as well as cells with mitochondria gene content greater than %, and used the remaining cells ( cells for the uninfected sample and cells for the infected sample) for downstream analysis. 
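The cell-level quality filter described above can be sketched as below. This is an illustration only: the actual thresholds are missing from this text, so the defaults (`min_genes`, `max_genes`, `max_mito`) are hypothetical placeholders, as are the function names and the dictionary layout of each cell record.

```python
def passes_qc(n_genes, mito_fraction, min_genes=200, max_genes=6000, max_mito=0.2):
    """Keep a cell unless it has too few or too many detected genes,
    or its mitochondrial gene content exceeds the cutoff.

    All default thresholds are hypothetical, not the study's values.
    """
    if n_genes < min_genes or n_genes > max_genes:
        return False
    return mito_fraction <= max_mito


def filter_cells(cells, **thresholds):
    """Return the cells that pass QC; each cell is a dict with
    'n_genes' and 'mito_fraction' keys (an assumed layout)."""
    return [
        cell for cell in cells
        if passes_qc(cell["n_genes"], cell["mito_fraction"], **thresholds)
    ]


# Hypothetical cells illustrating each exclusion rule.
cells = [
    {"id": "a", "n_genes": 1500, "mito_fraction": 0.05},  # kept
    {"id": "b", "n_genes": 100, "mito_fraction": 0.05},   # too few genes
    {"id": "c", "n_genes": 9000, "mito_fraction": 0.05},  # too many genes
    {"id": "d", "n_genes": 1500, "mito_fraction": 0.40},  # high mitochondrial content
]
kept_ids = [cell["id"] for cell in filter_cells(cells)]
```

An upper bound on detected genes is commonly used to drop likely doublets, and the mitochondrial cutoff to drop damaged cells; the source does not state its rationale.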
we normalized the gene expression umi counts for each sample separately using a deconvolution strategy implemented by the r scran package (v. . . ). in particular, we pre-clustered cells in each sample using the quickcluster function; we computed a size factor per cell within each cluster and rescaled the size factors by normalization between clusters using the computesumfactors function; and we normalized the umi counts per cell by the size factors and applied a logarithmic transform using the normalize function. we further normalized the umi counts across samples using the multibatchnorm function in the r batchelor package (v . . ). we identified highly variable genes using the findvariablefeatures function in the r seurat package (v . . ) , and selected the top variable genes after excluding mitochondrial genes, ribosomal genes and dissociation-related genes. the list of dissociation-related genes was originally built on mouse data ; we converted them to human ortholog genes using ensembl biomart. we aligned the two samples based on their mutual nearest neighbors (mnns) using the fastmnn function in the r batchelor package. this was done by performing a principal component analysis (pca) on the highly variable genes and then correcting the principal components (pcs) according to their mnns. we selected the corrected top pcs for downstream visualization and clustering analysis. we ran the uniform manifold approximation and projection (umap) dimensional reduction using the runumap function in the r seurat package with the number of training epochs set to . we clustered cells into eight clusters by constructing a shared nearest neighbor graph and then grouping cells of similar transcriptome profiles using the findneighbors function and findclusters function (resolution set to . ) in the r seurat package.
we identified marker genes for each cluster by performing differential expression analysis between cells inside and outside that cluster using the findmarkers function in the r seurat package. after reviewing the clusters, we merged four clusters that were likely from the stem cell population into a single cluster (lgr + or bmi + stem cells) and kept the other four clusters (krt + epithelial cells, muc + goblet cells, ephb + ta cells, and chga + ne cells) for further analysis. we re-identified marker genes for the merged five clusters and selected the top positive marker genes per cluster for the heatmap plot using the doheatmap function in the r seurat package . hpsc-cos were harvested by cell scraper, mixed with µl matrigel (corning) and transplanted under the kidney capsule of - weeks old male nsg mice. two weeks post-transplantation, sars-cov- pseudo-entry virus was inoculated locally at x ffu. at hpi, the mice were euthanized and used for immunohistochemistry analysis. to determine mpa's activity in vivo, the mice were treated with mg/kg mpa in ( %dmso/ % corn oil) by ip injection. two hours after drug administration, sars-cov- pseudo-entry virus was inoculated locally at x ffu. at hpi, the mice were euthanized and used for immunohistochemistry analysis. all animal work was conducted in agreement with nih guidelines and approved by the wcm institutional animal care and use committee (iacuc) and the institutional biosafety committee (ibc). to perform the high throughput small molecule screen, hpsc-cos were dissociated using tryple for min in a ℃ water bath and replated into % matrigel-coated well plates at , cells/ µl medium/well. after hr, cells were treated with compounds from an in-house library of ~ fda-approved drugs (prestwick) at µm. dmso treatment was used as a negative control. one hour later, cells were infected with sars-cov- pseudo virus (moi= . ).
after hpi, hpsc-cos were harvested for luciferase assay following the luciferase assay system protocol (promega). the authors declare the following competing interests: r.e.s. is on the scientific advisory board of miromatrix inc. the other authors have no competing of interest. scrna-seq and rna-seq data is available from the geo repository database, accession number gse . expression level identity a. npm ciqbp ldhb mgst snhg ncl plin hist h c lcn gdf phlda igfbp lrrc a s a emp cd b galt ceacam hnrnph anpep aldob rbp tm sf o selenop fabp tspan muc apoa kctd scgn tubaia or e npw cyba tph chga mdk cdkn c clca fcgbp muc guca a selenom frzb st galnac hes tff reg a. colonic organoids derived from human induced pluripotent stem cells for modeling colorectal cancer and drug testing differentiation of human pluripotent stem cells into colonic organoids via transient activation of bmp signaling sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor generation of vsv pseudotypes using recombinant deltag-vsv for studies on virus entry, identification of entry inhibitors, and immune responses to vaccines establishment and validation of a pseudovirus neutralization assay for sars-cov- chemokine up-regulation in sars-coronavirus-infected, monocyte-derived human dendritic cells dysregulated type i interferon and inflammatory monocyte-macrophage responses cause lethal pneumonia in sars-cov-infected mice a pneumonia outbreak associated with a new coronavirus of probable bat origin characterization of spike glycoprotein of sars-cov- on virus entry and its immune cross-reactivity with sars-cov consensus report on therapeutic drug monitoring of mycophenolic acid in solid organ transplantation the inhibition of nucleic acid synthesis by mycophenolic acid effects of mycophenolic acid on human immunodeficiency virus infection in vitro and in vivo mycophenolic acid inhibits dengue virus infection by preventing replication of viral rna 
thiopurine analogs and mycophenolic acid synergistically inhibit the papain-like protease of middle east respiratory syndrome coronavirus disulfiram can inhibit mers and sars coronavirus papain-like proteases via different modes treatment outcomes for patients with middle eastern respiratory syndrome coronavirus (mers cov) infection at a coronavirus referral center in the kingdom of saudi arabia repurposing quinacrine against ebola virus infection in vivo immunization-elicited broadly protective antibody reveals ebolavirus fusion loop as a site of vulnerability pooling across cells to normalize single-cell rna sequencing data with many zero counts comprehensive integration of single-cell data single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations moderated estimation of fold change and dispersion for rna-seq data with deseq key: cord- -qqyg um authors: zhu, xun; chang, ti-cheng; webby, richard; wu, gang title: idcov: a pipeline for quick clade identification of sars-cov- isolates date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qqyg um idcov is a phylogenetic pipeline for quickly identifying the clades of sars-cov- virus isolates from raw sequencing data based on a selected clade-defining marker list. using a public dataset, we show that idcov can make equivalent calls as annotated by nextstrain.org on all three common clade systems using user uploaded fastq files directly. web and equivalent command-line interfaces are available. it can be deployed on any linux environment, including personal computer, hpc and the cloud. the source code is available at https://github.com/xz-stjude/idcov. a documentation for installation can be found at https://github.com/xz-stjude/idcov/blob/master/readme.md. the on-going coronavirus disease (covid- ) pandemic has resulted in over , deaths, affect- ing more than countries and territories (csse, ; dong, et al., ) . 
the general impact of the disease calls for a quick response from the local health system at every stage of its transmission. severe acute respiratory syndrome coronavirus (sars-cov- ; previously known as -ncov) is the pathogenic cause of covid- (lescure, et al., ) . quickly can also provide more granular information to contact tracing and for epidemiological studies to understand disease severity and host genetic susceptibility to different lineages of sars-cov- . in order to quickly identify the clade of an isolate of sars-cov- given its sequencing fastq files, we have developed a bioinformatics pipeline and a user-friendly web interface. an equivalent command-line interface is also supplemented for batch-processing many samples in a terminal environment. we introduce this system as idcov. table d) shows the marker coverages. when a marker is covered by fewer than reads, the marker is deemed as undetermined and denoted as a question mark (?). csse. covid- dashboard by the center for systems science and engineering (csse) at johns nextflow enables reproducible computational workflows an interactive web-based dashboard to track covid- in real time gisaid -clade and lineage nomenclature aids in genomic epidemiology of active hcov- viruses haplotype-based variant detection from short-read sequencing introductions and early spread of sars-cov- in the new york city area year-letter genetic clade naming for sars-cov- on nextstain.org datahike is a durable datalog database powered by an efficient datalog query engine clinical and virological data of the first cases of covid- in europe: a case series the lancet infectious fast and accurate short read alignment with burrows-wheeler transform the sequence alignment/map format and samtools tracking the covid- pandemic in australia using genomics a new coronavirus associated with human respiratory disease in china key: cord- -qp ianzf authors: ali, fedaa; elserafy, menattallah; alkordi, mohamed h.; amin, muhamed title: ace 
coding variants in different populations and their potential impact on sars-cov- binding affinity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qp ianzf the susceptibility of different populations to the sars-cov- infection is not yet understood. a deeper analysis of the genomes of individuals from different populations might explain their risk for infection. in this study, a combined analysis of ace coding variants in different populations and computational chemistry calculations are conducted in order to probe the potential effects of ace coding variants on sars-cov- /ace binding affinity. our study reveals novel interaction data on the variants and sars-cov- . we could show that ace -k r; which is more frequent in the ashkenazi jewish population decrease the electrostatic attraction between sars-cov- and ace . on the contrary, ace -i v, r c, k r, d g, g r were found to increase the electrostatic attraction and increase the binding to sars-cov- ; ordered by the strength of binding from weakest to strongest. i v, r c, k r, d g and g r were more frequent in east asian, south asian, african and african american, european and european and south asian populations, respectively. sars-cov- /ace interface in the wt protein and corresponding variants is showed to be a dominated by van der waals (vdw) interactions. all the mutations except k r induce an increase in the vdw attractions between the ace and the sars-cov- . the largest increase of is observed for the r c mutant. despite numerous reports on the clinical manifestation of the recent novel coronavirus disease (covid- ) , including pathogenicity, diagnosis, and recommended treatment regimes, understanding of the mechanisms of infection, the virus-host interactions, and transmission is still in its early stage [ ] [ ] [ ] [ ] . the risk of certain populations for the infection is also still not well understood and remains under investigation. 
after the identification of the angiotensin-converting enzyme (ace ) as the sars-cov- receptor , a large number of reports have emerged trying to identify the pathology of the disease, and thus provide guidance to the efforts to design small-molecule inhibitors , . some recent studies attempted to probe the comorbidity among infected patients, specifically patients with hypertension, diabetes mellitus, coronary heart diseases, and cerebrovascular disease that showed increased risk. whether ace polymorphisms linked to the aforementioned diseases or administration of ace modulating drugs have a larger impact on the severity of infection is still a question to be investigated . communication between the pathogen and the host is mainly governed by the different protein-protein electrostatic interactions , . these proteins need to bind in a specific way for protein-protein complexes to form and for the signals to be effectively transmitted. the existence of residues that are involved in energetically favorable interactions at the protein-protein interface leads to enhanced interactions and, in turn, to viral-host cell fusion, marking the onset of the infection , . therefore, we opted herein to investigate the nature of the interactions at the sars-cov- /ace interface, and how the subtle variations in the structure of ace , induced by certain mutations, can alter the binding of sars-cov- in different populations. this was attempted here by analyzing different ace missense variants that code for ace -k r, ace -i v, ace -r c, ace -k r, ace -d g, ace -g r and the strength of their interaction with the spike protein (s-protein) of sars-cov- using molecular dynamics and monte carlo sampling, which are dominated by electrostatics. the electrostatic interactions are the dominant factor in protein-protein interactions , where the classical treatment of these interactions in solution is described by the poisson-boltzmann equation (pbe). 
our data demonstrate an increase in the strength of binding across the variants in the order mentioned, with k r showing the least favorable binding and g r showing the most favorable binding. we analyzed the frequencies of the alleles in different populations to identify the population in which each variant is more frequent. our data help in identifying variations of appreciable frequencies and the implications of each for the strength of the virus-receptor interactions in individuals of different populations. classical electrostatic calculations based on solving the poisson-boltzmann (pb) equation are used to investigate the interaction energies between sars-cov- and ace for different mutated ace structures. all continuum electrostatics calculations were performed with the freely available software "multi-conformer continuum electrostatics (mcce)" . in mcce, interaction energies are modeled by treating each residue as a separate fragment with integer charge; the fragments interact with each other by means of electrostatic and lennard-jones potentials . the poisson-boltzmann (pb) equation is solved numerically by the delphi software, which is integrated within the mcce code, to compute the self-energy of each conformer and the pairwise interaction energies between conformers and between conformers and the backbone. once the energies are calculated, the energy look-up table is built, and the theoretical titration is modeled by obtaining the boltzmann distribution of microstates using a monte carlo approach. the microstates of the structures are defined by the rotamers and protonation states of the different amino acids. the protein interior is treated as a medium with a low dielectric constant (ε_prot = ), whereas the solvent (water) is treated as a continuum medium with a high dielectric constant (ε_solv = ). 
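the monte carlo step described above, which samples the boltzmann distribution of microstates from a precomputed energy look-up table, can be sketched with a small metropolis sampler. this is not the mcce implementation: the energy table, the value of kt (about 0.593 kcal/mol near room temperature), the number of steps and all function names are assumptions chosen for illustration.

```python
import math
import random

KT = 0.593  # kcal/mol, roughly k_B * T near 298 K (assumed temperature)

def microstate_energy(state, self_e, pair_e):
    """Sum conformer self-energies and pairwise terms from the look-up table."""
    energy = sum(self_e[i][c] for i, c in enumerate(state))
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            energy += pair_e.get((i, state[i], j, state[j]), 0.0)
    return energy

def metropolis_occupancy(self_e, pair_e, n_steps=20000, seed=0):
    """Estimate the occupancy of each conformer by Metropolis sampling."""
    rng = random.Random(seed)
    state = [0] * len(self_e)
    energy = microstate_energy(state, self_e, pair_e)
    counts = [[0] * len(confs) for confs in self_e]
    for _ in range(n_steps):
        # Propose a new conformer for one randomly chosen residue.
        i = rng.randrange(len(state))
        trial = list(state)
        trial[i] = rng.randrange(len(self_e[i]))
        trial_e = microstate_energy(trial, self_e, pair_e)
        delta = trial_e - energy
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / KT):
            state, energy = trial, trial_e
        for k, c in enumerate(state):
            counts[k][c] += 1
    return [[n / n_steps for n in row] for row in counts]

# Hypothetical table: residue 0 has 2 conformers, residue 1 has 3 (kcal/mol).
toy_self = [[0.0, 1.0], [0.0, 0.5, 2.0]]
toy_pair = {(0, 1, 1, 1): -2.0}  # favorable contact between two conformers
occupancy = metropolis_occupancy(toy_self, toy_pair)
```

the most occupied conformer of each residue would then be the one carried forward, in the spirit of the selection step the text describes; a production calculation would also need convergence checks that this sketch omits.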
initial coordinates are defined according to the x-ray crystal structure of the wild type (wt) human ace with the receptor-binding domain (rbd) of the s protein of sars-cov- (pdb id: m ) from the protein data bank . the structure of the neutral amino acid transporter b at was removed from the pdb file, as it is far from the sars-cov- /ace binding site. furthermore, the structure of ace was trimmed by removing residues from p to the end of the chain. the wt residues of ace at positions , , , , and were mutated to arg (r), gly (g), arg (r), cys (c), arg (r) and val (v), respectively. the sidechains in the wt structure were replaced with the sidechains of the mutants using mcce (multi conformer continuum electrostatics) (fig. ) . a set of different sidechain conformers was generated for the wt and mutated structures by rotating the single bonds in the sidechains by degrees. in addition, conformers were created for charged amino acids according to their possible protonation states. then, these conformers were subjected to monte carlo sampling to obtain the boltzmann distribution based on the electrostatic interactions, which are calculated using delphi. the most occupied conformers in the generated boltzmann distributions were selected and subjected to molecular dynamics optimization using openmm. finally, the interaction energies were computed for the optimized structures using mcce. the electrostatic and van der waals interactions between amino acids in the rbd of sars-cov- and ace were analyzed using python-based code. in an attempt to better understand the susceptibility of different populations to infection by sars-cov- , we gathered data on ace missense variants from different projects and databases that aggregate allele frequencies (af). 
these projects and databases included single nucleotide polymorphism database (dbsnp) , , genomes project phase ( kgp ) , allele frequency aggregator (alfa project) a , exome aggregation consortium (exac) sars-cov- was reported to bind to human ace via different ace residues; q , d , h , y , q , m , k and r . we analyzed all ace coding variants reported and selected the variants that are close to the interaction site to test for changes in their binding interactions to sars-cov- . to focus on the most frequent variants, we did not include variants of very low allele count in all populations. the variants we tested were k r, d g, g r, r c, k r and i v. table s shows the frequency of each allele in the african, african american, american, ashkenazi jewish, asian (all, chinamap, east asian, han chinese south, han chinese beijing and south asian), european (all, european american, finnish and non-finnish) and latino. the electrostatic and the van der waal contribution to the interaction energies of sars-cov- /ace were compared between single mutated and wt protein at ph = (table ). in contrary to k r, most of the mutated structures exhibit a negative shift in the total interaction energy compared to the wt structure (native) i.e. more electrostatic attraction with the spike protein of sars-cov- . based on our calculations, g r mutant is shown to induce the largest increase in the binding energy between sars-cov- and ace , where the binding is more favorable by ~ . kcal/mol than the wt. however, the k r mutant causes a decrease in the binding energies by ~ . kcal/mol (table ). the substitution of lys (k) to arg (r) at positions and was shown to increase the electrostatic attraction between sars-cov- and ace by . -fold and . -fold in comparison to the wt, respectively. oppositely, substitution of r at position was shown to reduce the electrostatic attraction between sars-cov- and ace to . kcal/mol lower than that in wt. 
in addition, the d g mutant showed similar electrostatic interactions to the wt. however, the r c and i v mutants showed a significant decrease in the electrostatic attraction, by ~ kcal/mol compared to the wt (table ) . the sars-cov- /ace interface in the wt protein and corresponding mutants is shown to be dominated by vdw interactions, which account for more than % of the interaction energy. all the mutations except k r induced an increase in the vdw attractions between the ace and the sars-cov- . the largest increase of ~ kcal/mol is observed for the r c mutant (table ) . our results, shown in fig. table . the interaction energies between the sars-cov- and ace : electrostatics (kcal/mol). table and the corresponding population with the highest allele frequency. the k r variant, which decreases the ace -virus binding, was found to be most frequent in the ashkenazi jewish population ( . %). on the contrary, the asian population had the lowest allele frequency for the single nucleotide variant (snv) coding for k r. in addition, i v, which enhances ace interaction with sars-cov- , was observed in more than % of the east asian population. interestingly, i v was not detected in the african, african american, american, ashkenazi jewish and latino populations and was very rare in the european population. r c was most frequent in the south asian population ( . %), but was not detected in other asian subpopulations and was much less frequent or lacking in other populations. k r, which enhances the virus-receptor binding, was most frequent in the african and african american populations; the average of the af reported for the populations was ~ . % and ~ . %, respectively. finally, d g and g r, which showed the strongest interaction with ace , were most abundant in the european population, amounting to . % and ~ . %, respectively. g r was also abundant in the south asian population (~ . %). 
our findings give indications regarding the populations that could be more prone to sars-cov- infection due to enhanced binding affinity. nevertheless, proper infection, death and recovery statistics are needed to reach a definite conclusion. k r and i v are considered of intermediate frequencies (> %) in the ashkenazi jewish and east asian populations, respectively. therefore, we would refer to them as single nucleotide polymorphisms (snps) in the respective populations. however, the other variants are considered of rare frequencies (< %) , . nevertheless, in the era of individualized genomics, the individual differences within the same population should also be taken into account, as rare variants might exist that affect the virus binding to ace in many individuals. the possibility of individuals bearing more than one of the mutations that enhance the binding is also still an open question that will be answered via sequencing of ace in more individuals of the aforementioned populations. the presence of variants for the serine protease tmprss , required for s protein priming, in different populations could also account for differences in viral entry and pathogenesis . our data suggest that certain populations might be more affected by sars-cov- , depending on the frequency of the respective variants. to better understand the susceptibility of individuals from different populations to sars-cov- and their risk of infection, large scale sequencing projects should be performed. integrating population genetics will provide a new insight into the mystery of susceptibility, infection, pathogenicity and mortality in different regions. sequencing of ace in patients with the most severe conditions in every population will also provide a more reliable conclusion. 
the sequencing projects will not only help us test the association between ace variants and risk/severity of infection in certain populations, it will also allow more accurate analysis of variants for other host genes involved in the viral entry and pathogenicity . finally, our findings can potentially guide future attempts to devise small molecules inhibitors, designed specifically to disrupt the interaction between the strongest ace binding variants and the s protein of sars-cov- . the origin, transmission and clinical therapies on coronavirus disease (covid- ) outbreak-a n update on the status updated approaches against sars-cov- hypothesis for potential pathogenesis of sars-cov- infection-a review of immune changes in patients with viral pneumonia. emerging microbes and infections remdesivir as a possible therapeutic option for the covid- . travel medicine and infectious disease structural basis for the recognition of sars-cov- by full-length human ace . science ( -. ) are patients with hypertension and diabetes mellitus at increased risk for covid- infection? electrostatic contributions to protein-protein interactions: fast energetic filters for docking and their physical basis electrostatic aspects of protein-protein interactions sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor improving protein pka calculations with extensive side chain rotamer sampling dbsnp -database for single nucleotide polymorphisms and other classes of minor genetic variation. 
genome research dbsnp: a database of single nucleotide polymorphisms a global reference for human genetic variation the exac browser: displaying reference data information from over exomes the mutational constraint spectrum quantified from variation in , humans comparative genetic analysis of the novel coronavirus ( -ncov/sars-cov- ) receptor ace in different populations evaluating empirical bounds on complex disease genetic architecture the impact of rare and low-frequency genetic variants in common disease genotype and phenotype of covid- : their roles in pathogenesis alfa: allele frequency aggregator key: cord- -ngpsgn authors: tremiliosi, guilherme c.; simoes, luiz gustavo p.; minozzi, daniel t.; santos, renato i.; vilela, daiane c. b.; durigon, edison luiz; machado, rafael rahal guaragna; medina, douglas sales; ribeiro, lara kelly; rosa, ieda lucia viana; assis, marcelo; andrés, juan; longo, elson; freitas-junior, lucio h. title: ag nanoparticles-based antimicrobial polycotton fabrics to prevent the transmission and spread of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ngpsgn pathogens (bacteria, fungus and virus) are becoming a potential threat to the health of human beings and environment worldwide. they widely exist in the environment, with characteristics of variety, spreading quickly and easily causing adverse reactions. in this work, an ag-based material is used to be incorporated and functionalized in polycotton fabrics using pad-dry-cure method. this composite proved to be effective for inhibiting the sars-cov- virus, decreasing the number of replicates in . % after an incubation period of minutes. in addition, it caused . % inhibition of the pathogens s. aureus, e. coli and c. albicans, preventing cross-infections and does not cause allergies or photoirritation processes, demonstrating the safety of its use. pathogenic microbes are becoming a potential threat to the health of human beings and environment worldwide. 
today, the humanity has experienced epidemic diseases caused by both new and well known viruses, including hepatitis c, hiv/aids, sars-cov, mers, lassa fever, zika virus, and ebola virus, as well as yellow fever, influenza, and measles virus, which are more widespread but can be severe, despite the availability of vaccines. [ ] severe acute respiratory syndrome coronavirus (sars-cov- ) is a novel coronavirus that causes the coronavirus disease . since its first detection in december , [ ] it has affected millions of people worldwide, carrying a mortality rate much higher than the common flu. these public health outbreaks driven by emerging covid- infectious diseases constitute the forefront of global safety concerns and significant burden on global economies. while there is an urgent need for its effective treatment based on antivirals and vaccines, it is imperative to explore any other effective intervention strategies that may reduce the mortality and morbidity rates of this disease. in the absence of an effective vaccine, it is expected that not only the current pandemic will continue for several months, but other outbreaks caused by sars-cov- may take place in the future, in the coming months or years. [ ] furthermore, unknown viruses and/or pathogens will likely emerge again, and their pathogenicity, spread, contagion, and mechanism of action will be inquired. this current global crisis alarms us to the fact that we urgently need to prepare ourselves for a new and unpredictable epidemic in the future. the human body is a diverse ecosystem that harbors hundreds of trillions of microbes (bacteria, fungi and viruses) [ ] that might be or become pathogenic under certain circumstances. the development of innovative materials capable of avoiding the transmission, spread, and entry of these pathogens into the human body is currently in the spotlight. 
highly effective agents are needed not only to control the emergence of new covid- pandemia, their increased proliferation capability, and resistance that severely impact public health, but it is also fundamentally essential to explore strategies for the preparation and application of new materials against pathogen infection. often, the surface of a material is the medium by which the human body interacts with microbes. therefore, anti-pathogen strategies based on chemical modification of the material surface have been developed. one procedure is to form a layer on the surface of the material, thereby reducing the chance of contact between the pathogen and the surface of the material. this greatly reduces the number of pathogens adhering to the surface. another strategy is that killing the adhered pathogen directly by decorating the surface of the material with the biocide agent. use of personal protective equipment is considered to be one of the most important strategies for protecting from transmissible pathogens, particularly when aerosol transmission occurs and when no effective treatment or prophylaxis is available for the disease provoked by these pathogens in question. in the case of covid- , for instance, the who has recently issued recommendation for widespread use of face masks as an important tool in the control of sars-cov- spread. [ ] therefore, the current worldwide public health crisis of covid- has highlighted the particularly emergent need for materials that inactivate enveloped viruses on contact for preventing transmission. inorganic biocide surfaces and materials have attracted much attention due to their better stability and safety as compared with organic reagents for preventing infections and transmission. among inorganic agents, silver cation and metal are most widely used. 
however, ag cations tend to react with cl − , hs − , and so − in aqueous solution, forming precipitates, thus losing their biocide activity, which affects the practical application of ag-loaded biocide agents to a certain extent. ag and its compounds have been widely used since from ancient time, in bc, to prevent bacterial growth and wound infections, and make water potable. [ ] ag metal is a precious metal and is easily discolored under light and heat, but for many years, it was used as medical treatment as a broad-spectrum antibacterial compound before the discovery of antibiotics in the early th century. [ , ] nanotechnology is capable of modifying both ag cation and metal into their nano range, which dramatically changes their chemical and physical properties. ag nanoparticles (agnps) acquire special attention due to its specificity and environment friendly approach with a wide application in industry and medicine due to its antibacterial, antifungal, larvicidal and anti-parasitic characters. the use of agnps has been greatly enhanced due to the development of antibiotic resistance against several pathogenic bacteria, and they are employed in biomedical industry as coatings in dressings, in medicinal devices, in the form of nanogels in cosmetics and lotions, etc. [ , , ] according to the literature, there are plenty of protocols focused on the production of hybrid/composite materials based on ag nps, whose architecture is driven by different synthetic methods and reaction mechanisms. [ ] [ ] [ ] [ ] [ ] [ ] [ ] while the precise reasons for this unique chemistry and physics are unknown, the observed structures, their reproducibility, and synthetic control this reaction offers, there is plenty of room to find innovative possibilities for new technologies. [ , ] ag nps have been proven to be most useful because they have excellent antimicrobial properties against lethal viruses, microbes/germs, and other microorganisms. 
these nps are certainly the most extensively utilized material among all. thus, they have been used as antimicrobial agents in different textile industries. [ ] the noble metal nps are considered more specific and multipurpose agents with a diversity of biomedical applications, considering their use in extremely sensitive investigative assays, radiotherapy enhancement, gene delivery, thermal ablation, and drug delivery. these metallic nps are also considered to be nontoxic in gene and drug delivery applications. thus, metallic nps can offer diagnostic and therapeutic possibilities simultaneously. [ ] . the purpose of this work is to present an innovative material with high bactericide, fungicide and virucide efficiency when incorporated into textiles such as cotton-based materials, which make special biopolymer hosts for composite materials. finally, it is important to study the reliability of sintered agnps, to test and analyze their allergic response, dermatological photoirritant and photosensitive effects, as well as their antimicrobial, fungicide and antiviral activity. a fine-medium weight % polyester / % cotton woven fabric (plain weave, g/m²; width , m; ends /cm; picks /cm; yarn ne %polyester / % cotton) purchased from local suppliers (são carlos/sp, brazil) was used for the application purpose. an agnp colloidal solution (agnp-cs) and an agnp colloidal solution stabilized with organic polymers (agnp-op) were applied on the polycotton fabric using the pad-dry-cure method (nanoxclean® ag+fresh k, and nanoxclean® ag+fresh hybrid, respectively, provided by nanox tecnologia s.a. -são carlos/sp -brazil). an acrylic-based binder compound was used in the impregnation solution as well (starcoat denim gl provided by star colours ltda. -americana/sp, brazil). 
the polycotton fabric cut to the size of x cm was immersed in the solution containing % (% weight basis) of the antimicrobial products and % of the acrylic-based binder (% weight basis), for minutes and passed through a laboratory scale padder, with a % wet pick-up maintained for all the treatments. after drying ( °c, min) the fabric was annealed at °c for min, then washed with deionized water and then dried at °c for min in a ventilated oven. all samples were then conditioned at °c and % relative humidity for h. samples were produced according to table . micro-raman spectroscopy was performed using an ihr spectrometer (horiba jobin-yvon, japan) coupled to a charge-coupled device (ccd) detector and an argon-ion laser (melles griot, united states) operating at λ = . nm and mw. the spectra were recorded in the range of - cm - . morphologies of the composites were analyzed by field emission scanning electron microscopy (fe-sem) on a fei instrument (model inspect f ) operating at kv. fourier transform infrared spectroscopy (ftir) was performed using a jasco ft/ir- (japan) spectrophotometer operated in absorbance mode at room temperature. the spectra were recorded in the range of - cm - . the aatcc parallel streak standard method [ ] was used as a qualitative method to evaluate antibacterial activity of the treated fabrics. sterile plate count agar was dispensed in petri plates. hours broth cultures of the test organisms (escherichia coli (e. coli -atcc ) and staphylococcus aureus (s. aureus -atcc )) were used as inoculums. using a µl inoculation loop, a loopful of culture was loaded and transferred to the surface of the agar plate by making . cm long parallel streaks cm apart in the center of the plate, refilling the loop at every streak. the test specimen was gently pressed transversely across the five inoculum streaks to ensure intimate contact with the agar surface. the plates were incubated at °c for - hours. 
after incubation, a streak of interrupted growth underneath and along the side of the test material indicates antibacterial effectiveness of the fabric.

the quantitative antimicrobial activity of the treated polycotton fabrics was determined according to aatcc test method [ ] . fabric specimens (circular swatches . cm in diameter) were impregnated with . ml of inoculum in a ml container. the inoculum was a nutrient broth culture containing . ~ . · /ml colony forming units of microorganisms. e. coli and s. aureus were used as references for gram-negative and gram-positive bacteria, respectively, and c. albicans (atcc ) as a reference for fungi. the microorganisms on the treated polycotton fabric and those on a control sample were counted after a -hour incubation period at °c. the antimicrobial activity was expressed as the percentage reduction of the microorganisms after contact with the test specimen, compared to the number of microbial cells surviving after contact with the control, according to eq. ( ):

$$R\,(\%) = \frac{B - A}{B} \times 100$$

where A and B are the numbers of bacteria or fungi recovered from the antimicrobial-treated and untreated polycotton fabrics, respectively, in the jar incubated over the desired contact period.

an adaptation of the iso standard method for the determination of antiviral activity of textile products [ ] was used as a reference quantitative method to evaluate the treated polycotton's ability to inactivate sars-cov- virus particles (sars-cov- /human/bra/sp cc/ -mt ), under the tested conditions, at two different time intervals ( and minutes of contact time). the virus was inoculated into liquid media containing no fabric, treated polycotton samples or non-treated polycotton samples and incubated for different time periods. the media were then plated onto tissue cultures of vero ccl- cells.
after the incubation, the viral genetic material was quantified in each condition using real-time quantitative pcr, and the ability of each sample to inactivate sars-cov- was determined relative to the control samples. briefly, vero ccl- cells were plated onto -well plates at × cells per well. the cells were maintained in dmem high glucose culture medium (sigma-aldrich, c) supplemented with % fetal bovine serum, u/ml of penicillin, and µg/ml of streptomycin. the plate was incubated at ºc in a % co atmosphere for h. following this period, the medium was removed and replaced with . µl of dmem high glucose per well without supplementation. three test specimens, a non-treated polycotton control and the ag-based antimicrobial treated polycotton samples, measuring . cm apiece, were tested. each test specimen was placed into a different tube and . ml of dmem high glucose medium without supplementation was added to each tube. in parallel, µl of culture medium containing sars-cov- was diluted in . ml of dmem medium without supplementation, and then . µl of this viral suspension was added to each of the tubes containing the pieces of cloth. the mixtures were incubated with the virus for min and the tubes were homogenized every seconds. after this period, . µl of each sample was transferred to different wells of the plates containing the previously seeded cells. after a total of min of incubation, an additional . µl aliquot was removed from each tube and incubated in other wells on the same plate. as a control, the viral suspension was incubated in media without supplementation, with samples collected at and min used to infect vero cells on the same plate. the plate was incubated for h at ºc, % co for viral adsorption, and after this period, . µl of dmem high glucose medium containing % fetal bovine serum was added to each well, making a final volume of ml of medium per well containing % serum. immediately after adding the medium, the plate was further incubated at ºc, % co .
after h, the plate was removed from the incubator and µl of the medium from each well (each well a different condition) was removed and placed in lysis buffer for viral rna extraction. for the extraction, the magmax™ core nucleic acid purification kit (thermo fisher) was used, following the manufacturer's instructions, on the semi-automated platform magmax express- (applied biosystems, weiterstadt, germany). viral rna was detected using the agpath-id one-step rt-pcr kit (applied biosystems) on an abi sds real-time pcr machine (applied biosystems), using a published protocol and sequence of primers and probe for the e gene. [ ] the number of rna copies/ml was quantified by real-time rt-qpcr using a specific in vitro-transcribed rna quantification standard, kindly granted by christian drosten, charité universitätsmedizin berlin, germany, as described previously. [ ] the viricidal activity, or viral inactivation, was determined as a percentage relative to the control (media without fabric specimen). the experiment was repeated using the same experimental conditions, but with media incubated with two pieces of test specimen (instead of one) per condition.

a human repeat insult patch test (hript) was performed to determine the absence of potential for dermal irritability and sensitization of the treated fabrics. the study was carried out in maximized conditions, in which semi-occlusive dressings containing the investigational product and controls were applied to the participants' backs. the dressings were applied over six weeks: three weeks of alternating applications, two weeks of rest, and a new application of the dressing containing the product to a previously untreated area in the sixth week (challenge). the application site was read at each dressing change according to the reading scale recommended by the international contact dermatitis research group (icdrg) [ ] .
dermatological evaluations were carried out at the beginning and end of the study, and a physician was available for evaluation and assistance to the participants in case of a positive or adverse reaction. participants of both genders, with phototypes iii to iv (fitzpatrick) , [ ] aged between and were selected. the selected participants were distributed as shown in table .

since exposure to solar radiation can trigger or aggravate adverse reactions to topical products, knowing the behavior of the product on human skin stimulated with ultraviolet radiation is of fundamental importance for proof of safety. therefore, a single-center, blind, comparative clinical study was also conducted to assess the photoirritating and photosensitizing potential, with the aim of proving the absence of irritating potential of the product applied to the skin when exposed to ultraviolet radiation. the study was carried out with dressings containing the product applied to the participants' skin; after removal, controlled irradiation with a spectrum of ultraviolet radiation emission was performed. readings were performed according to the reading scale recommended by the icdrg. the study with the participants lasted five weeks, covering three phases: induction, rest and challenge. dermatological evaluations were performed at the beginning and end of the study, or when there was an indication of positivity or adverse reaction. participants of both genders, with phototype iii (fitzpatrick) , aged between and were selected. the selected participants were distributed as shown in table .

there are numerous ways to functionalize a textile substrate, from the development of new structures to finishes that modify the material's surface. surface modification through the incorporation of nanoparticles has been extensively studied and shows potential for obtaining devices with microbicidal activity.
[ ] [ ] [ ] [ ] [ ] [ ] in this scenario, nanoparticles can advantageously replace the micrometric particles used in finishes to obtain functional fabrics, because they have a greater surface area, resulting in better adhesion to fabrics and, consequently, greater durability of functionality. furthermore, it is possible to achieve a pronounced effect with small amounts of material without altering the original properties of the fabric. traditionally, pad-dry-cure is the most common finishing route used to impart different finish treatments on textile fabrics. [ , ] the interactions between the polycotton fabric and the ag nps were therefore investigated, in order to observe how these changes are reflected in the microbicidal properties of the new ag-based fabric.

in order to analyze the local structural order/disorder caused by the addition of ag-based antimicrobials to polycotton, micro-raman analyses were performed. the results are presented in figure . the adhesion and durability of a surface modification in fabrics depend on the surface chemical properties. [ ] since the main components of polycotton are cotton (formed by glucose monomers) and polyester (polyethylene terephthalate), the vibrations refer to c, o and h bonds. the cotton raman spectra of the samples can be divided into four blocks related to the glycosidic ring skeleton, the oh groups, the ch and ch groups and the acetylation of cotton. the glycosidic ring provides the fingerprint of the cotton raman spectra. these modes can be observed at , , and cm - and represent the symmetrical stretching of c-o-c in the plane, asymmetric and symmetrical stretching of c-o-c in the glycosidic link, and asymmetric stretching of c-c ring breathing. [ ] the presence of oh groups in the samples is shown by the modes located at and cm - , related to the twisting and deformation of the c-oh bonds. [ , ] four modes related to ch deformations and twisting are observed at , , and cm - .
there is also a mode located at cm - related to the twisting of the c-ch bond. [ ] the acetylation of the cotton used is further confirmed by the modes located at and cm - , referring to the o-c=o deformation and the stretching of the h c-c bonds. [ ] like cotton, polyester has its fingerprint given by the modes referring to its aromatic ring and its esters. the mode referring to the stretching of the c -c carbon of the aromatic polyester ring is observed at cm - , as well as its ch stretching at cm - . [ , ] it is also possible to observe the stretching of the c=o bonds of the esters at cm - and the stretching of the ch bonds of the methyl groups external to the ring at cm - . [ ] [ ] [ ] the addition of ag-based antimicrobials does not cause significant changes in the polycotton structure at short range.

as a complementary analysis to micro-raman spectroscopy, ftir measurements were performed to investigate the functional groups of the products after the incorporation of the two ag-based antimicrobial solutions into the polycotton ( figure ). the samples of pure polycotton and those modified with the ag-based antimicrobials show peaks located at , , , , , , , and cm - . the peaks located at , and cm - refer to oh stretching and ch deformation respectively, the latter being related to the ch groups of the cellulose structure. [ , ] the peaks located at , , and cm - correspond respectively to h o adsorbed on the polycotton surface, stretching of the ch bond, asymmetric deformation of the c-o-c groups and stretching vibrations of intermolecular ester bonding. [ ] [ ] [ ] as in the raman spectra, the cotton fingerprint can be observed in the ftir due to the presence of the band located at cm - , referring to the asymmetric stretching of the glycosidic ring, especially the c -o-c bonds.
[ ] compared with the non-treated polycotton, the modified polycotton shows a displacement of the band located at cm - , as well as a decrease of the band located around cm - , which refer to the in-plane bending of the o-h mode from the glycosidic units and the deformation of the oh, respectively. [ , ] these shifts, as well as the appearance of new bands in the ftir spectra of the modified polycotton, are due to interactions between polycotton and the ag-based antimicrobial additives. [ ] [ ] [ ]

figures d-f), it is possible to observe the formation of small ag nanoparticles on the polycotton surface, with an average size of . ± . nm. similar behavior was obtained by several other authors in works that incorporated agnps into polycotton in different ways. [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] for polycotton agnp-op, a smaller number of ag nanoparticles, with a larger average size ( . ± . nm) than for polycotton agnp-cs, was observed. in addition, there is a homogeneous distribution of micrometric crystals with

since the ag-based antimicrobial additives caused distinct surface interactions in polycotton fibers, this difference is expected to be reflected in their physical, chemical and biological properties. experiments were therefore carried out to evaluate the biological properties of the composites through their allergenic response in humans and their microbicidal activity against e. coli, s. aureus, c. albicans and sars-cov- .

the aatcc test results against s. aureus (gram positive) and e. coli (gram negative) bacteria for the non-treated and the ag-based antimicrobial treated polycotton samples are shown in figure and table . for the control polycotton, growth of e. coli and s. aureus was observed under the specimen, while no growth appeared for the treated fabric. the zone of inhibition for the control sample was mm, in comparison to - mm for the treated fabric.
it can be seen from these results that the ag-based antimicrobial treated fabrics displayed a high level of antibacterial performance. the antibacterial mechanism for gram-positive and gram-negative bacteria is associated with ag nps and their penetration into the cell membrane of these microorganisms. ag nps are able to penetrate cell membranes and release ag + ions, which have a high affinity for phosphorus and sulfur compounds, either from the membrane or from inside the cell. [ ] in addition, ag nps can generate reactive oxygen species (ros), whose intracellular accumulation leads to bacterial death from oxidative stress [ ] .

table . aatcc test results against s. aureus and e. coli for non-treated (control) and ag-based antimicrobial treated fabric samples.

the halo inhibition test (aatcc ) is only a qualitative test that shows bacterial inhibition, so a quantitative test (aatcc ) is required to determine the percentages of inhibition. the quantitative antimicrobial activities of finished textiles treated with the two different ag-based antimicrobials according to the aatcc standard are shown in table . these tests were also performed with c. albicans, in order to assess the fungicidal potential of the ag-based fabrics. in agreement with the qualitative test, the quantitative test showed that all the ag-based treated polycotton samples had efficient antimicrobial activity against bacteria and fungi, displaying a . % reduction in all tested samples.

a percent bacterial reduction as measured against a non-treated control.

the antiviral activity test was designed to determine the inactivation of viral particles upon short exposure to the products, which in this case were the ag-based treated polycotton samples incubated in liquid media. after a short period of incubation, the media were transferred to a cell culture, where viable virions would be able to enter cells and replicate within.
the supernatant of the cell cultures was recovered after h and the viral load was determined by rt-qpcr, giving the number of viral rna copies per ml. table shows the number of copies for the control media without any fabric sample, the non-treated polycotton, and the two ag-based treated polycotton samples at the two tested time periods. from the number of copies of each sample, the viral inactivation effect of each cloth was calculated, using the media without any fabric sample as control. in the second experiment, the number of copies per milliliter in each sample was also obtained and the percentage of inhibition of the products was calculated relative to the control media without any fabric sample. the results are summarized in table . the following graphs represent the data described in tables and for the control media without any fabric sample, the non-treated polycotton, and the two ag-based treated polycotton samples. figures and show the results of the first and second experiments, respectively, indicating the number of viral copies per ml, with the percentage of inhibition of each compound shown above its bar. inhibition was calculated for each treatment using its respective control. in both experiments, at the two time periods tested, the untreated polycotton showed a subtle activity, which was expected from data published by chin and colleagues. [ ] polycotton agnp-cs showed a high viricidal activity when incubated with the virus. at both time periods in both experiments, polycotton agnp-op achieved a higher rate of viral inactivation than polycotton agnp-cs. in short, both treated polycotton samples were effective in viral inhibition at and minutes in two different experiments that differed in the amount of virus per cm² of fabric ( x less virus/cm² in the second experiment). polycotton agnp-op showed the best activity, reaching .
% within two minutes of incubation with the virus in the second experiment. polycotton agnp-cs, despite being less effective than polycotton agnp-op, showed high anti-sars-cov- activity, with an inhibition rate of more than % in all tests performed. as an antiviral agent, ag nps can interfere with viral replication by two separate mechanisms of adhesion to the surface of the viral envelope. this adhesion prevents the virus from connecting to the cell it would infect, preventing contamination and possible damage. [ , ] these antiviral mechanisms are mainly caused by stress in infected cells (due to physical contact), generation of reactive oxygen species (ros), interactions with dna and enzymatic damage. [ ] the first mechanism is the binding of ag nps to sulfur residues of the virus's surface glycoproteins, preventing interaction with the receptor and entry into the host cell. [ , ] the second mechanism involves the passage of ag nps through the cell membrane, which ends up effectively blocking the transcription factors necessary for the adequate assembly of the viral progeny. [ ] thus, in addition to the unique behavior of ag nps in isolation, their interface with polymers can be explored, which may open the way for promising new applications in several fields.

both the human repeat insult patch test and the clinical study to assess the photoirritating and photosensitizing potential were conducted according to the cosmetic product safety assessment guide published by the brazilian regulatory agency anvisa [ ] , by the ecolyzer group (são paulo/sp, brazil), an independent and iso certified laboratory. for the hript, primary dermal irritability, accumulated dermal irritability and dermal sensitization potential were determined.
the clinical evaluation criterion was the observation of clinical signs or symptoms such as swelling (edema), redness (erythema), papules and vesicles according to the reading scale recommended by the icdrg. no adverse reactions (erythema, edema, papules or vesicles) were detected in the product's application areas in the analysis of primary and accumulated irritability and sensitization during the study period. the same clinical evaluation criterion was used to determine dermal photoirritation and dermal photosensitization in the clinical study. as in the hript, no adverse reactions (erythema, edema, papules or vesicles) were detected in the product's application areas during the study period. according to the results obtained from the sample of participants studied, we can conclude that the treated fabrics did not induce photoirritation, photosensitization, irritation or sensitization and can therefore be considered hypoallergenic, dermatologically tested and approved, and safe according to anvisa's guide for cosmetic product safety.

polycotton fabrics can be functionalized to attain antibacterial, antifungal and antiviral properties using a simple and very common finishing treatment, the pad-dry-cure method. the use of an aqueous ag nps solution mixed with an acrylic-based binder was demonstrated to achieve a high level of antimicrobial performance and can potentially provide high durability over washing cycles as a result of the binder in the impregnation solution. ftir and raman analyses showed different surface effects on the polycotton fibers due to the different chemical nature of the two tested antimicrobial finishing products. additionally, the antimicrobial finishing did not cause significant differences in the fiber diameters of the samples, as shown by the fe-sem images.
therefore, this product showed excellent universality in preparation, and no significant change in the fabric's organoleptic properties is expected, so no special conditions are required for its use on a large scale. future experimental investigation would open new avenues to extend the antiviral, antibacterial and antifungal treatment to a wide variety of surfaces in addition to polycotton fabrics, such as synthetic and natural fabrics, including cotton, polyesters and polyamides. beyond the scope of this work, this simple and facile antimicrobial finishing treatment could potentially be scaled for industrial applications after addressing challenges such as choosing appropriate processing conditions and developing feasible waste disposal protocols. the main differential capability of these ag-based fabrics is the prevention of cross infection caused by pathogens, such as the opportunistic bacteria and fungi responsible for the worsening of covid- , and other types of viruses. the fabrication of fabrics composed of these materials may provide new insights into the development of protective garments, and these new textile materials are expected to play an outstanding role as a new and important weapon against the current covid- pandemic.

are rna viruses candidate agents for the next global pandemic?
a review a novel coronavirus from patients with pneumonia in china sars and mers: recent insights into emerging coronaviruses revised estimates for the number of human and bacteria cells in the advice on the use of masks in the context of covid- : interim guidance- guía interna la oms silver nanoparticles as a new generation of antimicrobials silver in medicine: a brief history bc to present history of the medical use of silver effects of various heavy metal nanoparticles on enterococcus hirae and escherichia coli growth and proton-coupled membrane transport nanoparticles: alternatives against drug-resistant pathogenic microbes silver nanopaste: synthesis, reinforcements and application int synthesis of silver nanoparticles: chemical, physical and biological methods res silver nanoparticles: synthesis methods, bio-applications and properties shams-eldin e. and shalan a. e. advances in nanotechnology and antibacterial properties of biodegradable food packaging materials rsc adv therapeutic gold, silver, and platinum nanoparticles wires nanomedicine and nanobiotechnology silver nanoparticles: various methods of synthesis, size affecting factors and their potential applications-a review appl inhibition of candida auris biofilm formation on medical and environmental surfaces by silver nanoparticles antimicrobial characterization of silver nanoparticle-coated surfaces by "touch test" method a review on nanoparticles : their synthesis and types res antimicrobial activity assessment of textile materials: parallel streak method from american association of textile chemists and colorists antibacterial finishes on textile materials: assessment of developed from american association of textile chemists and colorists textiles -determination of antiviral activity of textile products identification of a novel coronavirus in patients with severe acute respiratory syndrome terminology of contact dermatitis the validity and practicality of sun-reactive skin types i through vi 
surface modification of biotextiles for medical applications coating of tpu-pdms-tms on polycotton fabrics for versatile protection polymers (basel) growing zno nanoparticles on polydopamine-templated cotton fabrics for durable antimicrobial activity and uv protection polymers (basel) uv resistant and fire retardant properties in fabrics coated with polymer based nanocomposites derived from sustainable and natural resources for protective clothing application compos fabrication of durably antibacterial cotton fabrics by robust and uniform immobilization of silver nanoparticles via mussel-inspired polydopamine ligand modified cellulose fabrics as support of zinc oxide nanoparticles for uv protection and antimicrobial activities application technologies for coating, lamination and finishing of technical textiles introduction to finishing application of ft-raman spectroscopy for in situdetection of microorganisms on the surface of textiles characterization of developing cotton fibers by confocal raman microscopy fibers vibrational spectroscopic investigation of australian cotton cellulose fibres part . 
a fourier transform raman study application of raman spectroscopy for differentiation among cotton and viscose fibers dyed with several dye classes raman spectroscopic investigation of acetylation of raw cotton spectrochim physicochemical modifications accompanying uv laser induced surface structures on poly(ethylene terephthalate) and their effect on adhesion of mesenchymal cells chemical depth profiling of photovoltaic backsheets after accelerated laboratory weathering reliab graphene/cotton composite fabrics as flexible electrode materials for electrochemical capacitors rsc adv eco-friendly cationic modification of cotton fabrics for improving utilization of reactive dyes rsc adv characterization of cotton fabric scouring by ft-ir atr spectroscopy pine cone and boron compounds effect as reinforcement on mechanical and flammability properties of polyester composites open chem influence of stacking sequence on the mechanical and dynamic mechanical properties of cotton/glass fiber reinforced polyester multi-finishing of polyester and polyester cotton blend fabrics activated by enzymatic treatment and loaded with antibacterial cotton fabric with enhanced durability prepared using silver nanoparticles and carboxymethyl chitosan synthesis and optical properties of composite films from p ht and sandwich-like ag-c-ag nanoparticles rsc adv laundering durable antibacterial cotton fabrics grafted with pomegranate-shaped polymer wrapped in silver nanoparticle aggregations sci silver-cotton nanocomposites: nano-design of microfibrillar structure causes morphological changes and increased tenacity sci enhancing the surface affinity with silver nano-particles for antibacterial cotton fabric by coating carboxymethyl chitosan and l-cysteine application of silver nanoparticles to cotton fabric as an antibacterial textile finish fibers polym preparation of cotton fibers with antibacterial silver nanoparticles antimicrobial activity and cytotoxicity of cotton fabric coated with 
conducting polymers, polyaniline or polypyrrole, and with deposited silver nanoparticles antibacterial properties of cotton fabric treated with silver nanoparticles durable antibacterial and cross-linking cotton with colloidal silver nanoparticles and butane tetracarboxylic acid without yellowing colloids surfaces b biointerfaces a new method of finishing of cotton fabric by in situ synthesis of silver nanoparticles ind controlled synthesis of ag nanoparticles with different morphologies and their antibacterial properties ag nanoparticles/α-ag wo composite formed by electron beam and femtosecond irradiation as peiris m. and poon l. l. m. stability of sars-cov- in different environmental conditions the lancet microbe protease inhibitors: silver bullets for chronic hepatitis c infection? cleve possible mechanism of inhibition of virus infectivity with nanoparticles semicond an overview application of silver nanoparticles in inhibition of herpes simplex virus artif interaction of silver nanoparticles with tacaribe virus mode of antiviral action of silver nanoparticles against hiv- uptake and intracellular distribution of silver nanoparticles in human mesenchymal stem cells agência nacional de vigilânia sanitária guia para avaliação de segurança de produtos cosméticos guia para avaliação de segurança de the authors acknowledge the financial support of the brazilian research financing institutions: fundação de amparo à pesquisa do estado de são paulo key: cord- - k qymqd authors: xiong, hua-long; cao, jia-li; shen, chen-guang; ma, jian; qiao, xiao-yang; shi, tian-shu; yang, yang; ge, sheng-xiang; zhang, jun; zhang, tian-ying; yuan, quan; xia, ning-shao title: several fda-approved drugs effectively inhibit sars-cov- infection in vitro date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: k qymqd to identify drugs that could potentially be used for the treatment of covid- , the potency of fda-approved drugs was evaluated using a robust pseudovirus assay and the candidates were further confirmed by an authentic sars-cov- assay. four compounds, clomiphene (citrate), vortioxetine, vortioxetine (hydrobromide) and asenapine (hydrochloride), showed potent inhibitory effects in both the pseudovirus and the authentic virus assay. the combination of clomiphene (citrate), vortioxetine and asenapine (hydrochloride) is much more potent than each drug used alone, with an ic of . μm. as of may , the covid- pandemic has claimed more than , lives, yet no effective drug is available. it is time-consuming to develop vaccines or specific drugs for a disease caused by a newly identified virus like sars-cov- . re-purposing of approved drugs may be a faster way to find a treatment for covid- . verifying drugs predicted to suppress sars-cov- , including drugs against similar viruses and broad-spectrum antiviral agents (bsaas), saves time in drug re-purposing, at the expense of missing some potential candidates. integrative antiviral drug repurposing methods based on big data analysis or on molecular docking and molecular dynamics are time-saving and high throughput. however, drugs identified by virtual screening still need to be verified in vitro and in vivo. in our previous research, a robust neutralization assay was established based on sars-cov- s-bearing vesicular stomatitis virus (vsv) pseudovirus and human ace expressing bhk cells (bhk -hace ) . single-cycle infection by recombinant vsv-sars-cov- -sdel mimics the entry of sars-cov- . pseudovirus infection can be detected by fluorescence hours after infection, making the assay time-saving for high-throughput screening . this pseudovirus-based assay is suitable for screening drugs that can block the infection of sars-cov- .
in this study, the anti-sars-cov- potential of fda-approved drugs was quantitatively evaluated by the pseudovirus-based assay. the screening procedure is illustrated in figure a and described in methods. the number of gfp-positive cells in each drug-treated well was counted and divided by the number of infected cells in the well without drug treatment to calculate the relative infection rate. the results of two repetitions showed that most drugs did not inhibit viral infection ( figure b ). forty-four drugs with relatively better inhibitory effect were selected for further validation. in the second round of screening, inhibition of vsv-sars-cov- -sdel virus infection and cytotoxicity were both measured (supplementary figure ) . among them, drugs were excluded due to cytotoxicity. drugs were selected for analysis of specificity to vsv-sars-cov- -sdel and verification by the authentic sars-cov- assay. to verify whether the selected drugs act on the spike protein of sars-cov- on the pseudovirus or on the vsv backbone, we evaluated the inhibitory effect of these compounds on vsv-g (the gfp sequence was inserted into the vsv genome so that vsv infection could be indicated by green fluorescence). three compounds, chloroquine diphosphate, ribavirin, and cetylpyridinium (chloride monohydrate), exhibited significant inhibitory effects on vsv-g, whereas no effect was noted for the other compounds ( figure c ). compounds including clomiphene (citrate), amiodarone (hydrochloride), vortioxetine, vortioxetine (hydrobromide) and asenapine (hydrochloride) were selected, and their function was confirmed using the authentic sars-cov- assay ( figure d and e). among them, the inhibitory effects of clomiphene (citrate) and vortioxetine were comparable to that of chloroquine diphosphate, while vortioxetine (hydrobromide) and asenapine (hydrochloride) were slightly less effective.
amiodarone (hydrochloride), by contrast, inhibited pseudovirus infection efficiently, with an ic of around . μm, but showed no effect on authentic sars-cov- infection even at a concentration of μm. to further evaluate the potential of these drugs for prophylaxis and combination therapy, we treated cells with pseudovirus and different drug combinations. the drug combinations were added either at the time of pseudovirus infection or hours before infection (table ). vortioxetine is an antidepressant used to treat major depressive disorder in adults. it is safe and well tolerated, and was approved in . no previous study has described an antiviral role for it. severe covid- patients are reported to have a high probability of suffering from mental illness. recently, another antidepressant, fluvoxamine, has been under evaluation as a potential covid- treatment by researchers from the washington university school of medicine, because the drug may prevent an overreaction of the immune system called a cytokine storm, which can result in life-threatening organ failure. the antiviral mechanism of vortioxetine remains unknown; however, the drug may bring both physical and psychological benefits for covid- patients. asenapine is an antipsychotic used to treat schizophrenia and bipolar i disorder and has been approved since . notably, it showed less cytotoxicity in this study than the other drugs that inhibited sars-cov- infection. in summary, our study identified four fda-approved drugs that have the potential to suppress sars-cov- infection. the robust assay based on the vsv-sars-cov- -sdel pseudovirus screened out potential drugs with high efficiency, and their inhibitory effect was then confirmed by authentic sars-cov- assay. the inhibitory effects of vortioxetine and clomiphene are superior, and the mechanism of these drugs seems to differ from that of chloroquine.
the combination of clomiphene (citrate), vortioxetine and asenapine (hydrochloride) greatly decreases the ic /ic for blocking virus infection. the clinical safety of these compounds has already been evaluated, and the availability of pharmacological data is expected to enable rapid preclinical and clinical evaluation for the treatment of covid- . based on existing clinical results, it seems difficult for any single drug to significantly benefit covid- patients, and combination therapy is more likely to speed patient recovery. this work identified novel drugs that suppress virus infection and provides more candidates for post-exposure prophylaxis and combination therapies. note that no in vivo test has been conducted and that the mechanism of these compounds remains unknown. more research is required to support the clinical application of these drugs for the treatment of covid- . vsv pseudovirus carrying the truncated spike protein of sars-cov- , named vsv-sars-cov- -sdel virus, was packaged as previously described . vsv-g was prepared in a similar way. the relative value or inhibition rate of candidate drugs was calculated from the decrease in gfp-positive cell number (for the pseudovirus-based assay) or in cytopathic effect (for the authentic sars-cov- -based assay). the ic (the half maximal inhibitory concentration), ic (the concentration for the % of the maximum inhibition) and cc (the % cytotoxic concentration) values were calculated by non-linear regression, i.e. log(inhibitor) vs. normalized response - variable slope or log(agonist) vs. response - find ecanything, using graphpad prism . (graphpad software, inc., san diego, ca, usa).
"clo" means clomiphene (citrate), "vor" means vortioxetine, "ase" means asenapine (hydrochloride) and "cq" means chloroquine diphosphate. "pre-treatment" means cells were treated with drugs hours before infection, while "co-treatment" means cells were treated with drugs at the time of infection.
ic , ic and cc were calculated using prism software (graphpad). "na" means the value cannot be calculated. moi= .
robust neutralization assay based on sars-cov- s-bearing vesicular stomatitis virus (vsv) pseudovirus and ace -overexpressed bhk cells
triple combination of interferon beta- b, lopinavir-ritonavir, and ribavirin in the treatment of patients admitted to hospital with covid- : an open-label, randomised, phase trial. the lancet
fda-approved selective estrogen receptor modulators inhibit ebola virus infection
ifitm proteins inhibit entry driven by the mers-coronavirus spike protein: evidence for cholesterol-independent mechanisms
clomiphene and its isomers block ebola virus particle entry and infection with similar potency: potential therapeutic implications
characterization of severe acute respiratory syndrome-associated coronavirus (sars-cov) spike glycoprotein-mediated viral entry
coronavirus cell entry occurs through the endo-/lysosomal pathway in a proteolysis-dependent manner
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
the safety and tolerability of vortioxetine: analysis of data from randomized placebo-controlled trials and open-label extension studies
generation of vsv pseudotypes using recombinant deltag-vsv for studies on virus entry, identification of entry inhibitors, and immune responses to vaccines
hua-long xiong, tian-ying zhang, quan yuan and ning-shao xia had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy. the authors declare that they have no conflicts of interest.
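the methods above obtain inhibitory concentrations by fitting a variable-slope dose-response curve in graphpad prism. as a hedged illustration of what such a fit implies, the sketch below assumes a standard hill curve normalized between 0 and 100 % inhibition and derives the concentration giving any chosen percentage of inhibition from a fitted half-maximal concentration and hill slope. the function names and example numbers are hypothetical, and the code does not reproduce prism's fitting procedure.

```python
def inhibition_percent(conc: float, ic50: float, hill_slope: float) -> float:
    """Normalized Hill curve: 0 % inhibition at zero drug, 100 % at saturation."""
    if conc < 0 or ic50 <= 0 or hill_slope <= 0:
        raise ValueError("conc must be >= 0; ic50 and hill_slope must be > 0")
    return 100.0 * conc**hill_slope / (ic50**hill_slope + conc**hill_slope)


def concentration_for_inhibition(ic50: float, hill_slope: float, percent: float) -> float:
    """Invert the Hill curve: IC_F = IC50 * (F / (100 - F)) ** (1 / hill_slope)."""
    if not 0.0 < percent < 100.0:
        raise ValueError("percent must lie strictly between 0 and 100")
    if ic50 <= 0 or hill_slope <= 0:
        raise ValueError("ic50 and hill_slope must be > 0")
    return ic50 * (percent / (100.0 - percent)) ** (1.0 / hill_slope)


# Hypothetical fit result (not a value from the study): IC50 = 1 uM, Hill slope = 1.
ic50_um, slope = 1.0, 1.0
conc_90 = concentration_for_inhibition(ic50_um, slope, 90.0)
print(f"concentration for 90 % inhibition: {conc_90:.2f} uM")
print(f"check: {inhibition_percent(conc_90, ic50_um, slope):.1f} % inhibition")
```

with a steeper hill slope the gap between the half-maximal concentration and the concentration for a higher percentage of inhibition narrows, which is why both values are usually reported together.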
key: cord- -hq um k authors: xiong, rui; zhang, leike; li, shiliang; sun, yuan; ding, minyi; wang, yong; zhao, yongliang; wu, yan; shang, weijuan; jiang, xiaming; shan, jiwei; shen, zihao; tong, yi; xu, liuxin; yu, chen; liu, yingle; zou, gang; lavillete, dimitri; zhao, zhenjiang; wang, rui; zhu, lili; xiao, gengfu; lan, ke; li, honglin; xu, ke title: novel and potent inhibitors targeting dhodh, a rate-limiting enzyme in de novo pyrimidine biosynthesis, are broad-spectrum antiviral against rna viruses including newly emerged coronavirus sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hq um k emerging and re-emerging rna viruses occasionally cause epidemics and pandemics worldwide, such as the ongoing outbreak of coronavirus sars-cov- . existing direct-acting antiviral (daa) drugs cannot be applied immediately to new viruses because of their virus specificity, and developing new daa drugs from scratch is too slow for outbreaks. host-targeting antiviral (hta) drugs therefore have many advantages against a broad spectrum of viruses, blocking viral replication while overcoming potential viral mutagenesis. herein, we identified two potent inhibitors of dhodh, s and s , with favorable drug-like and pharmacokinetic profiles, both of which showed broad-spectrum antiviral effects against various rna viruses, including influenza a virus (h n , h n , h n ), zika virus, ebola virus, and particularly the recent novel coronavirus sars-cov- . our results are the first to validate dhodh as an attractive host target, through high antiviral efficacy in vivo and low virus replication in dhodh knockout cells. we also propose that combining daa and hta drugs is a promising strategy for antiviral treatment, and show that s was more advantageous than oseltamivir in treating advanced influenza disease in severely infected animals.
notably, s is reported to be the most potent inhibitor so far, with an ec of nm and an si value > in sars-cov- -infected cells. this work demonstrates that both our self-designed candidates and old drugs (leflunomide/teriflunomide), with dual antiviral and immunosuppressive actions, may have clinical potential not only against influenza but also against covid- circulating worldwide, whether or not such viruses mutate. acute viral infections, such as those caused by influenza virus, sars-cov, mers-cov, ebola virus, zika virus, and the very recent sars-cov- , are an increasing and probably lasting global threat . broad-spectrum antivirals (bsa) are clinically needed for the effective control of emerging and re-emerging viral infectious diseases. however, although the research community has made great efforts to discover therapeutic antiviral agents for such emergencies, specific and effective drugs or vaccines with low toxicity have rarely been reported . thus, unfortunately, there are still no effective drugs at present for infection by the novel coronavirus sars-cov- , whose outbreak in december was first identified by several chinese groups [ ] [ ] [ ] , and which has now quickly spread throughout china and to more than other countries, infecting , patients and causing deaths by march , . the discovery of nucleoside or nucleotide analogs and of host-targeting antivirals (htas) are the two main strategies for developing bsa [ ] [ ] [ ] . because the former drug class usually causes drug resistance and toxicity, the discovery of htas has attracted much attention . several independent studies searching for htas have converged on compounds that target the host's pyrimidine synthesis pathway to inhibit virus infection, which indicates that viral replication is widely dependent on host pyrimidine synthesis [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] .
however, most of these compounds lack verified drug targets, which makes subsequent drug optimization and further application impossible , [ ] [ ] [ ] [ ] [ ] [ ] , . only a few pyrimidine synthesis inhibitors have been carried forward to animal studies, and their antiviral efficacies were unsatisfactory or entirely absent , , , , [ ] [ ] [ ] . for example, fa- , a pyrimidine synthesis inhibitor without a specific target, protected only . % of mice from lethal influenza a virus infection, compared with the daa drug zanamivir ( %) in parallel . two other compounds, cmp and fk , which target dhodh, a rate-limiting enzyme catalyzing the fourth step of the de novo pyrimidine synthesis pathway, could only inhibit replication of a dna virus (cmv) in rag -/- mice, and their therapeutic effects on the ensuing disease were unexplored. therefore, more potent pyrimidine synthesis inhibitors, especially ones with a specific drug target, urgently need to be developed to establish whether such an hta drug is valuable for clinical use or has any advantage over daa drugs in antiviral treatment. to identify potent and low-toxicity dhodh inhibitors (dhodhi), we previously conducted a hierarchical structure-based virtual screening (fig. a) of a library of ~ , compounds against the ubiquinone-binding site of dhodh . through structural optimization we finally obtained two highly potent dhodhi, s and s , with ic s of . nm and . nm , , which are > -fold more potent than the fda-approved dhodhi teriflunomide (ic of . nm). using these two potent inhibitors, we could fully evaluate dhodh as a host target both in infected cells and in vivo in infected animals. we found that targeting dhodh offers broad-spectrum antiviral efficacy against various rna viruses, including daa-resistant influenza virus and the newly emerged coronavirus sars-cov- .
in particular, our potent dhodhi can protect % of mice from lethal influenza challenge, which is as good as the daa drugs, and is effective even in the late phase of infection, when the daa drug is no longer effective. by determining the x-ray crystal structure of dhodh in complex with s ( supplementary data fig. a, table s ), we verified that the binding mode of s at the ubiquinone-binding site of dhodh is similar to that of s . the binding free energies for s and s were - . and - . kj/mol, and the binding equilibrium dissociation constants (kd) were . and . nm, respectively ( fig. b and c) . additionally, the two inhibitors exhibited clear fast-associating (kon) and slow-dissociating (koff) inhibition (table s ) , making them ideal drug candidates with a high level of target occupancy. moreover, s and s showed suitable half-lives ( . and . h, respectively) ( table s ) , much shorter than those of teriflunomide and leflunomide, indicating that they may be less likely to cause side effects from drug accumulation in the body (supplementary data fig. b and c ). to examine the antiviral activities of these dhodhi, we used influenza a virus as a model virus. a laboratory strain, a/wsn/ (h n , wsn), was used at tcid to infect mdck cells, and serial dilutions of drugs (dmso as control) were added at the time of infection. drug efficacy was evaluated by quantifying cell viability in both infected and non-infected cells, and the half-maximal effective concentration (ec ) and the half-cytotoxic concentration (cc ) of each drug were obtained accordingly. the selectivity index (si) was calculated as cc /ec . as shown in fig. d , the antiviral effect of leflunomide is hardly detectable at the cell culture level (ec > μm). however, teriflunomide, the active metabolite of leflunomide, exhibited a clear antiviral effect against the wsn virus (ec = . μm, cc = . μm, si= . ).
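the selectivity index defined above is a simple ratio of the half-cytotoxic to the half-maximal effective concentration; the following sketch makes the calculation explicit. the function name and the example concentrations are hypothetical and are not values from this study.

```python
def selectivity_index(cc50_um: float, ec50_um: float) -> float:
    """SI = CC50 / EC50; both concentrations must be in the same unit (here uM)."""
    if cc50_um <= 0 or ec50_um <= 0:
        raise ValueError("concentrations must be positive")
    return cc50_um / ec50_um


# Hypothetical compound: cytotoxic at 100 uM, half-maximally effective at 2 uM.
print(selectivity_index(100.0, 2.0))  # 50.0
```

a larger si means a wider margin between the concentration that suppresses the virus and the concentration that harms the host cells.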
compared with teriflunomide, the potent dhodhi s is ~ -fold stronger (ec = . μm) and s is ~ -fold stronger (ec = . μm) in antiviral efficacy. we also tested the influenza a virus subtypes h n and h n . the antiviral efficacies of the dhodhi followed the same pattern as against h n , namely s >s >teriflunomide>leflunomide (summarized in table ). the dose-response curves of s against h n (ec = . μm) and h n (ec = . μm) are shown in fig. e and f . when we compared drug efficacy by virus plaque assay, the results in fig. g showed that the positive-control daa drug oseltamivir (osel) reduced plaques to needlepoint size. with equivalent s treatment, however, no virus plaques were observable at all, indicating that s is more efficient than osel in inhibiting virus replication. taken together, the results indicate that dhodhi, especially s and s , exhibit direct antiviral activity against different subtypes of influenza a virus, shutting off virus multiplication more thoroughly than osel. as all acute infectious viruses rely on the cellular pyrimidine synthesis process to replicate, it is reasonable to speculate that dhodhi have broad-spectrum antiviral efficacy. we therefore tested several high-impact acute infectious rna viruses. teriflunomide, brequinar, s , and s all showed inhibitory effects against the ebola virus (ebov) mini-replicon, with ec values of . , . , . and . μm, respectively ( fig. a) . to our surprise, s , which showed relatively high cytotoxicity in mdck cells (cc = . μm in fig. d ), was less toxic to the bsr-t / cells supporting the ebov mini-replicon (cc = . μm). thus, a significantly high si= . was achieved by s . we subsequently tested the inhibitory effects of dhodhi against zika virus (fig. b) . ec values were . , . , . and . μm for teriflunomide, brequinar, s and s , respectively. again, s reached the highest selectivity index, si= . .
while we were preparing the manuscript, a severe outbreak of sars-cov- occurred in wuhan in december , and we responded quickly by examining the antiviral activity of dhodhi against this new coronavirus. the data in fig. c showed that all the dhodhi tested have low toxicity in sars-cov- -susceptible vero e cells. teriflunomide had a solid antiviral efficacy of ec = . μm (at moi= . , ~ . -fold stronger than favipiravir [ec = . μm] ) (fig. c upper right) , whereas its pro-drug leflunomide showed weaker inhibition, with ec = . μm (data not shown). we further performed an immunofluorescence assay to visualize drug efficacy. to determine more carefully the efficacy of teriflunomide, which as an approved drug can be transferred to clinical treatment of sars-cov- immediately, a slightly lower moi of . (fig. c upper left) was applied. under this condition, the ec of teriflunomide reached μm with si> , indicating that teriflunomide, with effective ec and si values, has the potential to treat sars-cov- -induced covid- as an 'old drug in new use' option. additionally, s and s exhibited ideal antiviral efficacies of ec = . μm (si> . ) and ec = . μm (extremely high si> . ), respectively ( fig. c lower panel). compared with remdesivir (ec = . μm, si> . ) and chloroquine (ec = . μm, si> . ) in our previous publication , both of which are currently in clinical trials against sars-cov- , s had much better ec and si values ( . -fold stronger than chloroquine in ec ) against sars-cov- . the data in fig. d clearly showed that as little as . nm ( . ec ) of s can dramatically inhibit sars-cov- infection, while an increased drug concentration of nm ( ec ) could further eliminate virus-infected cells. thus, s turns out to be the most efficient compound so far against sars-cov- at the cellular level. in all the previous studies, inhibitors to dhodh or pyrimidine synthesis pathway were fig. a ). the data in fig.
b showed that the body weights of mice in the dmso-treated 'virus group' all dropped to less than % and the mice died at d p.i.. the daa drug osel could indeed rescue all the mice from body-weight loss and death. equivalently, s ( mg/kg, red line) also conferred % protection with little body-weight loss, similar to osel. even s at . mg/kg and mg/kg could confer % and % protection, respectively. considering the cmax of s (≈ μm), we used both mg/kg and mg/kg in the following experiments. the results suggest that a modest dose of s ( mg/kg) achieves % protection, equivalent to the daa drug, when used from the beginning of infection. besides broad activity against different viruses, hta drugs such as dhodhi have another advantage over daa drugs: overcoming drug resistance. to prove this, we generated a currently circulating oseltamivir-resistant na h y mutant virus (in the wsn backbone) by reverse genetics (supplementary data fig. a and b) . we found that the na h y virus did not respond to osel ( mg/kg/day) treatment at all, but . mg/kg/day of s rescued % of mice from lethal infection with the na h y virus (supplementary data fig. c ). when mice were infected with a naturally isolated pandemic strain, sc , which is less sensitive to either osel ( % protection) or s ( % protection) ( fig. c ), we further observed % protection with combined s +osel treatment, indicating that combining hta and daa drugs can augment therapeutic effects. taken together, the data reaffirm dhodh as an attractive host target for treating viral disease, with efficacy equivalent to a daa drug and an advantage against daa-resistant viruses. to elucidate the essential role of dhodh in the viral replication cycle, we generated a dhodh -/- a cell line by crispr-cas gene knockout (ko) technology (fig. a) . unexpectedly, the cell proliferation rate was barely affected in dhodh -/- cells, indicating that dhodh is dispensable for cell growth for at least three days ( hours) ( supplementary fig. ).
by contrast, virus growth was largely inhibited in dhodh -/- cells compared with wild-type (wt) a cells, with an almost -fold reduction in infectious particles at hours post-infection (h.p.i.) (fig. b) . when s ( ic ) was added to the culture medium, a dramatic reduction of virus growth occurred only in wt cells and not in dhodh -/- cells ( fig. c and d) . these results show that virus growth, but not the concurrent cell growth, requires dhodh activity, and that the antiviral action of s is exerted by targeting dhodh. the general virus growth cycle includes virus entry, viral genome replication, and virus release. to further validate that viral genome replication is the major target of dhodhi, we used the influenza a virus mini-replicon system to quantify viral genome replication. brequinar, another potent inhibitor of dhodh, was included as a positive control , whereas osel, which targets the influenza na protein, served as a negative control. the results in fig. e showed no inhibition of viral genome replication in the osel-treated group, but obvious dose-dependent inhibition in the s - and s -treated groups as well as in the brequinar-treated group. almost % of viral genome replication was suppressed by ic of s ( μm) and s ( . μm). as dhodh catalyzes the oxidation of dihydroorotate (dho) to orotic acid (oro), which ultimately leads to utp and ctp, we added four nucleotides (adenine nucleotide (a), guanine nucleotide (g), uracil nucleotide (u), and cytosine nucleotide (c)), dho, or oro to the mini-replicon system to identify the target of s and s . the results in fig. f showed that the addition of μm of either u or c could effectively rescue viral genome replication in s - and s -treated cells (as well as brequinar-treated cells), whereas the addition of a or g did not change the inhibitory effects. moreover, supplementation with the dhodh substrate dho cannot rescue viral genome replication (fig.
g) , but supplementation with the dhodh product oro can gradually reverse the inhibitory effects of s and s (fig. h) . these results further confirm that compounds s and s inhibit viral genome replication by targeting dhodh and interrupting the fourth step of de novo pyrimidine synthesis. it is documented elsewhere that daa drugs such as osel are fully effective only in the early phase of infection, optimally within hours of symptom onset . to date, no drug is approved specifically for treating advanced influenza disease in the late phase. we hypothesized that s could be effective in the middle or late phase of disease because it targets dhodh, a host pro-viral factor that is not affected by the viral replication cycle. to test this, we compared the therapeutic windows of s and osel in the early (d -d ), middle & late (d -d ), and severe late (d -d or d -d ) phases (workflow shown in fig. a ). when drugs were given in the early phase, both osel treatment and 'osel+s ' combination treatment conferred % protection (fig. b ). when drugs were given in the middle & late phase (fig. c) , osel treatment alone wholly lost its antiviral effect, with no surviving mice. however, s treatment provided % protection, and the drug combination reached % protection. when drugs were given in the severe late phase, when mice were starting to die (fig. d) , neither osel nor s alone could rescue the mice from death, but combined treatment still conferred % survival. to better show the advantage of s in treating severe disease, we additionally treated mice slightly earlier, before death, at around % of initial weight (d -d ), with a more optimal dose of s ( mg/kg). the data in fig. e showed that s rescued % of mice from severe body-weight loss, and combined treatment conferred an additional % survival.
these results once again highlight that s has remarkable advantages over osel in treating severe disease in the late phase, and that its therapeutic effectiveness can be improved further when s is combined with a daa drug. severe acute infections, including influenza and covid- , are known to often induce pathogenic immunity in the form of cytokine/chemokine storms. leflunomide and teriflunomide are already used clinically in autoimmune disease to inhibit pathogenic cytokines and chemokines. we therefore suspected that dhodhi should also counter the cytokine storm in viral infectious disease. balf from either osel- or 's +osel'-treated mice was collected at d in an independently repeated experiment with a lower infection dose. parallel body weights excluded differences in virus load (supplementary fig. a) . the data in supplementary fig. b showed that pathogenic inflammatory cytokines were largely reduced in the 's +osel'-treated group compared with osel-treated mice, in the levels of il , mcp- , il , kc/gro(cxcl ), il , ifn-γ, ip- , il , tnf-α, gm-csf, epo, il p , mip α and il a/f (listed in order of the significance of the reduction). taken together, the results provide striking evidence that dhodh inhibitors are effective in infected animals not only by inhibiting virus replication (shown in fig. and fig. ) but also by suppressing excessive cytokine/chemokine storms. the use of dhodhi could ultimately benefit advanced disease in late infection. in this study, we applied dhodh inhibitors, including the computer-aided designed compound s , to viral infectious disease. we found that direct-targeting dhodhi are broad-spectrum antivirals both in cell culture and in vivo. the candidate s had the further advantage of low toxicity and high efficiency when used in infected animals. moreover, s can rescue severe influenza infection by limiting the inflammatory cytokine storm in vivo. dhodh is a rate-limiting enzyme catalyzing the fourth step in pyrimidine de novo synthesis.
it catalyzes the dehydrogenation of dihydroorotate (dho) to orotic acid (oro), which ultimately yields uridine (u) and cytosine (c) to supply the cell's nucleotide resources. under normal conditions, nucleotides are supplied by both de novo biosynthesis and salvage pathways, the latter of which recycles pre-existing nucleotides from food or other nutrition. in virus-infected cells, however, rapid viral replication demands a large intracellular nucleotide pool. it is therefore reasonable that de novo nucleotide biosynthesis, rather than the salvage pathway, is more critical for virus replication. our data indeed show that virus replication is largely restricted when the dhodh gene is knocked out, even in complete culture medium. by contrast, cell growth was not affected at all by the lack of dhodh, indicating that de novo nucleotide biosynthesis is dispensable for normal cell growth without infection for at least days. more interestingly, we note that, unlike dna viruses, rna viruses require ump but not tmp in their genomes. ump is the particular nucleotide whose de novo synthesis depends on dhodh, which means rna viruses might be more sensitive to dhodh activity. sars-cov- , for instance, has around % of ump in its genome, which may explain why dhodhi are effective and superior against sars-cov- . nevertheless, comparisons between different viruses are worth studying in the future. although several dhodhi have been documented as antiviral by high-throughput screening [ ] [ ] [ ] [ ] , most of these compounds are still at the cell culture level with unknown in vivo efficacy. therefore, the development of broad-spectrum antiviral agents targeting dhodh remains an exciting avenue in antiviral research. s and s show more potent inhibition and favorable pharmacokinetic profiles; moreover, the half-lives of s and s ( . and .
h, respectively) are much shorter and more appropriate than that of teriflunomide, indicating that they may be less likely to cause toxic side effects from drug accumulation in the body. strikingly, s was active in vivo against lethal-dose infection with influenza a viruses, not only when used from the beginning of infection but also in the late phase, when the daa drug is no longer effective. another surprise is the high si value of s against zika (si= . ), ebola (si= . ), and the current sars-cov- (si> ). these data indicate that s is highly promising for further development, as is s . the extremely high si of s may be due to its high binding affinity and favorable occupation of the ubiquinone-binding site of dhodh, with faster association (kon = . × m^-1 s^-1) and slower dissociation (koff= . × s^-1), which should reduce the possibility of off-target effects in vivo. acute viral infections usually cause severe complications associated with hyper-induction of pro-inflammatory cytokines, also known as a "cytokine storm", a term first used for severe influenza disease , . several studies showed that patients with lethal sars expressed high serum levels of pro-inflammatory cytokines (ifn-γ, il- , il- , il- , and tgfβ) and chemokines (ccl , cxcl , cxcl , and il- ) compared with patients with uncomplicated sars [ ] [ ] [ ] [ ] . similarly, in severe covid- cases, icu patients had higher plasma levels of il- , il- , il- , gcsf, mcp , mip a, and tnfα than non-icu patients . moreover, a clinical study of patients with covid- showed that the percentage of patients with il- above normal is higher in the severe group . in terms of treatment, immunomodulatory agents can reduce mortality and organ injury in severe influenza.
however, these immunomodulators are mostly not specific to viral infection but act through systemic regulation, such as corticosteroids, intravenous immunoglobulin (ivig) or angiotensin receptor blockers [ ] [ ] [ ] [ ] . leflunomide and its active metabolite teriflunomide have been approved for clinical treatment of excessive inflammatory diseases such as rheumatoid arthritis and multiple sclerosis . our data again show, in an influenza-a-virus-infected animal model, that dhodhi reduce the cytokine storm further than daa drugs do. we believe that dhodhi will play a similar immune-regulating role in covid- patients. thus, by targeting dhodh, a single key enzyme in both viral genome replication and immune regulation, a dual action of dhodhi can be realized against a broad spectrum of viruses and the corresponding pathogenic inflammation in severe infections. we hope our study may quickly benefit patients now suffering from severe covid- and other infectious diseases caused by emerging and re-emerging viruses. itc measurements were carried out at °c on a microcal itc (ge healthcare). for the titration of an inhibitor into dhodh, both were diluted in the same buffer ( mm hepes, ph . , mm kcl, % glycerol and % dmso). the concentration of dhodh in the cell was µm, and the concentration of inhibitor in the syringe was µm for s or µm for s . all titration experiments were performed by adding the inhibitor in steps of µl. the data were analyzed using microcal origin software by fitting to a one-site binding model. surface plasmon resonance experiments were performed with a biacore t (ge healthcare) according to our previous work . all drug concentrations were tested in at least three replicates. data were processed with graphpad prism software to calculate the ec and cc values of the compounds. ebolavirus must be handled in a bsl- laboratory.
to reduce the biological safety risks, the ebolavirus replicon system was chosen for the antiviral efficacy assay. to detect viral protein expression in vero e cells, cells were fixed with % paraformaldehyde and permeabilized with . % triton x- . after blocking, the cells were incubated with the primary antibody (a polyclonal antibody against the np of a bat sars-related cov), followed by incubation with the secondary antibody (alexa -labeled goat anti-rabbit, abcam). t cells were seeded into -well plates at × . diluted compounds were given by intraperitoneal (i.p.) injection once a day. drug treatment was initiated on days , , , , post-infection, respectively, and continued for several days. animal weight and survival were monitored daily, and mice were euthanized at the end of the experiment or when body-weight loss exceeded %. the protein structure data have been deposited in the protein data bank with accession number m b.
( . , , mg/kg), oseltamivir ( mg/kg) and s +oseltamivir ( mg/kg+ mg/kg) once per day from d -d , respectively. body weight and survival were monitored for days or until body weight fell to % (n = mice per group). (c) mice were inoculated intranasally with pfu of a/sc/ (h n ) and then treated i.p. with s ( mg/kg), oseltamivir ( mg/kg) and s +oseltamivir ( mg/kg+ mg/kg) once per day from d to d . body weight and survival were monitored until days post-infection or until body weight fell to %. the dotted line indicates the endpoint for mortality ( % of initial weight). body weights are presented as the mean percentage of the initial weight ±sd of - mice per group, and survival curves are shown.
fig. . discovery of novel dhodh inhibitors and their anti-influenza-a-virus activities.
broad-spectrum antiviral activity of dhodh inhibitors.
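the figure legends above report body weight as the mean percentage of initial weight ± sd and use a weight-loss endpoint for mortality. a minimal sketch of that bookkeeping follows; the function names, the example weights and the 75 % threshold are illustrative assumptions, since the actual endpoint value is not recoverable from the text.

```python
from statistics import mean, stdev


def percent_of_initial(initial_g: float, current_g: float) -> float:
    """Current body weight expressed as a percentage of the weight at infection."""
    if initial_g <= 0:
        raise ValueError("initial weight must be positive")
    return 100.0 * current_g / initial_g


def summarize_group(initial_g: list[float], current_g: list[float]) -> tuple[float, float]:
    """Mean and sample SD of percent-of-initial weight for one treatment group."""
    percents = [percent_of_initial(i, c) for i, c in zip(initial_g, current_g)]
    return mean(percents), stdev(percents)  # stdev needs at least two animals


def reached_endpoint(percent: float, threshold_percent: float) -> bool:
    """True once an animal's weight has fallen to the humane endpoint or below."""
    return percent <= threshold_percent


# Hypothetical group of two mice (weights in grams) and a hypothetical 75 % endpoint.
group_mean, group_sd = summarize_group([20.0, 20.0], [18.0, 16.0])
print(f"{group_mean:.1f} % +/- {group_sd:.1f}")
print(reached_endpoint(percent_of_initial(20.0, 14.8), 75.0))
```

keeping the endpoint as a parameter lets the same code serve protocols with different weight-loss limits.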
evaluation of the efficacy and safety of a statin/caffeine combination against h n , h n and h n virus infection in balb/c mice on dihydroorotate dehydrogenases and their inhibitors and uses sumoylation of influenza a virus nucleoprotein is essential for intracellular trafficking and virus growth isolation and characterization of zika virus imported to china using c / mosquito cells structure-based design of potent human dihydroorotate dehydrogenase inhibitors as anticancer agents minigenome-based reporter system suitable for high-throughput screening of compounds able to inhibit < replication and/or transcription new low-viscosity overlay medium for viral plaque assays this work was supported in part by the national key /-) cells. (e) the t cells were co-transfected with the influenza virus minigenome plasmid system (pb , pb , pa, np, ppoⅡ-np-luc, and prlsv ). after h.p.i., cells were treated with -fold serial dilutions of oseltamivir, s , s , and brequinar respectively. the luciferase activities were measured h of post-treatment. (f) effects of nucleotides addition on the antiviral efficacies of s and s . the -fold ic of s ( μm) or s ( . μm) and μm four nucleotides (adenosine, uridine, cytidine, guanosine) were added at the same time on t cells. after h of treatment, the luciferase activities were measured. (g and h) effects of addition of dihydroorotate (dho) or orotic acid (oro) on the antiviral efficacies of s and s . the luciferase activities were detected as above after treating with indicated concentrations of dho or oro. all results are presented as a mean of three replicates ± sd. statistical analysis, two-way anova for b, c, and d. one-way anova for e, f, g and h. ns, p > . ; *, p < . ; **, p < . ; ***, p < . . d - (d) . another groups of s ( mg/kg) or s +oseltamivir ( mg/kg+ mg/kg) were given i.p. once per day from d to d in e. the green bars indicate the period of drug administration. 
the body weight and survival were monitored until days post-infection or when the bodyweight reduced to %, respectively (n = - mice per group). the dotted line indicates endpoint for mortality ( % of initial weight). the body weights are present as the mean percentage of the initial weight ±sd of - mice per group and survival curve were shown. key: cord- - chuwvg authors: maclean, oscar a.; lytras, spyros; singer, joshua b.; weaver, steven; pond, sergei l. kosakovsky; robertson, david l. title: evidence of significant natural selection in the evolution of sars-cov- in bats, not humans date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: chuwvg rna viruses are proficient at switching to novel host species due to their fast mutation rates. implicit in this assumption is the need to evolve adaptations in the new host species to exploit their cells efficiently. however, sars-cov- has required no significant adaptation to humans since the pandemic began, with no observed selective sweeps to date. here we contrast the role of positive selection and recombination in the sarbecoviruses in horseshoe bats to sars-cov- evolution in humans. while methods can detect some evidence for positive selection in sars-cov- , we demonstrate these are mostly due to recombination and sequencing artefacts. purifying selection is also substantially weaker in sars-cov- than in the related bat sarbecoviruses. in comparison, our results show evidence for positive, specifically episodic selection, acting on the bat virus lineage sars-cov- emerged from. this signature of selection can also be observed among synonymous substitutions, for example, linked to ancestral cpg depletion on this bat lineage. we show the bat virus rmyn has recombinant cpg content in spike pointing to coinfection and evolution in bats without involvement of other species. 
our results suggest the non-human progenitor of sars-cov- was capable of human-human transmission as a consequence of its natural evolution in bats.

this analysis reported ten sites as showing significant evidence of positive selection across the pandemic phylogeny (table ). due to the low diversity in these sars-cov- samples, there is limited power to confidently estimate the synonymous and nonsynonymous substitution rates for each codon. this means that statistical power to identify positive selection in the form of dn/ds> for any given codon is limited, and the posterior distribution should be flat. the presence of statistically significant signatures of positive selection is therefore somewhat surprising.

to understand the specific mutational patterns that might explain these significant results, we looked at where in the phylogeny these putatively positively selected mutations were occurring. for all but two of the ten positively selected codons, this signal was being driven by apparent convergent evolution (or homoplasy) in the tree, with the same mutation occurring in parallel across the phylogeny. to investigate whether this observation was truly due to independent events or to recombination signatures in the sars-cov- outbreak tree, we first determined whether the samples with these convergent mutations were geographically correlated. selective pressure acting on an untreatable novel zoonotic virus is likely to be globally shared (adaptation to humans), whereas recombination requires co-localisation of viruses in time and space, so geographic clustering would be a good indication that these mutations are not independent. the homoplasies driving the orf l s and orf ab l s mutations were both found in south korean isolates, and each of the two instances of orf m d g and orf n i t was found in the netherlands. this geographic clustering was suggestive of recombination and was investigated further.
two of the ten fubar-flagged sites, spike codons and , did not show any homoplasies. both signals could be attributed to the same run of four neighbouring u to a mutations spanning the two codons. these mutations were found in only a single sample, epi_isl_ from beijing, and have not been observed since (to date / / ). this suggests that they were either sequencing errors or a single large mutation spanning two codons that has not subsequently spread. multiple nucleotide changes within a single codon should be rare, so sequencing error is a plausible explanation.

the positive selection signature at nsp codon can be explained by multiple homoplasies of g to u mutations at nucleotide . this mutation is found in four distinct haplotypes.

both the orf codon and orf ab codon positive selection signals appear to be driven by a single south korean sample (gisaid accession ). this sample possesses two derived mutations on either side of a hypothesised breakpoint. these pairs of derived mutations belong to samples with different haplotypes (supplementary figure ). therefore the sample appears to be a recombinant between sample and or . as both and were sequenced by the same lab and released at the same time, this recombination event may be an artefactual product of lab cross-contamination.

figure a), suggesting it is not the result of recombination. no newly sequenced samples uploaded up to / / containing the d g mutation clustered with the wuhan sample, suggesting that this haplotype did not spread or that this homoplasy is driven by sequencing error. additional sequences displaying apparent convergent evolution at this site have since been sequenced; these have been taken as evidence of positive selection .
however, given that this mutation now occurs in % of sequenced samples (as of / / ), it will be one of the mutations most likely to be variable if multiple viral genotypes are present following lab contamination or in mixed infections, and so most prone to being shuttled onto new backgrounds by recombination. therefore, whilst high frequency mutations are the most important to study, they are also the most prone to misleading homoplasies, and must be analysed with the most caution. the n orf site detected by fubar is driven by a similar convergent evolution event history. however, both samples exhibiting the same derived i to t mutation ( figure b ; gisaid ids and ) were sequenced by the same dutch lab and released at the same time, again suggesting that lab cross contamination is a likely driver. however, unlike south korean sample , there is only one shared derived mutation (codon ), and therefore the genomic evidence for recombination in these samples is weaker. the spike v f signal was driven by apparent convergent evolution between four french samples sequenced in january and a hong kong sample , which shows shared variation either side of the homoplasy suggesting it is not a recombinant (supplementary figure c) . looking through more recent data shows additional homoplasies in a simple neighbour joining tree. additionally, newly generated sequences since the fubar analysis cluster around the hong kong sample, further suggesting it is not a lab sequencing error. subsequent informal scans of newer data have revealed evidence of additional lab recombination events (see supplementary figure d ). in addition to searching for positive selection, we investigated if signatures of purifying selection on segregating variation in the current sars-cov- data could be observed (sequences as of / / ). we compared the relative frequencies of nonsynonymous and synonymous mutations in the pandemic data. 
codons with multiple mutations present were discarded from the analysis to avoid ambiguity in the order of mutations, and simplify synonymous/nonsynonymous classification. most mutations of both classes are at very low frequency (main text figure ) , indicative of the viral population expansion that the pandemic has undergone. dn/ds was approximately . in singletons, suggesting that % of nonsynonymous mutations are strongly deleterious and therefore never observed in the population. there is a weak observable trend towards a higher proportion of mutations being synonymous at the highest frequency intervals, suggestive of some ongoing selection against circulating amino acid replacements in the pandemic. this observation may be partially driven by sequencing errors which are not transmitted and so are at low frequency. these sequencing errors are likely to have a dn/ds value of , which may make the estimate that % amino acid replacements are strongly deleterious an underestimate of the true value. however, the decline in nonsynonymous/synonymous ratio occurs across the range of frequencies, suggesting that sequencing errors alone are not driving the trend. it is important to consider that the observed frequencies are likely to differ from true global frequencies due to biased sampling of infections in the pandemic , and so we caution against overinterpretation of specific mutation frequencies. 
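the classification and the back-of-envelope estimate described above can be sketched in python. this is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the dn/ds value in the usage line is a placeholder rather than the reported figure, and reading 1 - dn/ds as the fraction of nonsynonymous mutations removed by selection assumes that synonymous mutations are effectively neutral.

```python
from itertools import product

# Standard genetic code, with bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a single-nucleotide codon change as synonymous or nonsynonymous.

    Codons differing at more than one position are rejected, mirroring the
    choice to discard codons with multiple mutations (hypothetical helper).
    """
    ref = ref_codon.upper().replace("T", "U")
    alt = alt_codon.upper().replace("T", "U")
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be three unambiguous nucleotides")
    if sum(a != b for a, b in zip(ref, alt)) != 1:
        raise ValueError("expected exactly one nucleotide difference")
    return "synonymous" if CODON_TABLE[ref] == CODON_TABLE[alt] else "nonsynonymous"


def fraction_strongly_deleterious(dn_ds: float) -> float:
    """Rough share of nonsynonymous mutations never observed, taken as 1 - dN/dS.

    Assumes synonymous mutations are neutral; sequencing errors would bias
    dN/dS upwards and make this an underestimate.
    """
    if dn_ds < 0:
        raise ValueError("dN/dS cannot be negative")
    return 1.0 - dn_ds


# Usage with placeholder inputs (not values from the paper).
print(classify_codon_change("GAU", "GGU"))
print(fraction_strongly_deleterious(0.6))
```

restricting the classifier to single-nucleotide differences keeps the synonymous/nonsynonymous label unambiguous, which is the reason given above for discarding codons with multiple mutations.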
key: cord- - ow d authors: parvez, md sorwer alam; rahman, mohammad mahfujur; morshed, md niaz; rahman, dolilur; anwar, saeed; hosen, mohammad jakir title: genetic analysis of sars-cov- isolates collected from bangladesh: insights into the origin, mutation spectrum, and possible pathomechanism date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ow d

as the coronavirus disease (covid- ), caused by the severe acute respiratory syndrome coronavirus- (sars-cov- ), rages across the world, killing hundreds of thousands and infecting millions, researchers are racing against time to elucidate the viral genome. some bangladeshi institutes have joined this race and sequenced a few isolates of the virus collected from bangladesh. here, we present a genomic analysis of isolates. the analysis revealed that sars-cov- isolates sequenced from dhaka and chittagong belonged to lineages from europe and the middle east, respectively. our analysis identified a total of mutations, including three large deletions, half of which were synonymous. most of the missense mutations in bangladeshi isolates were found to have weak effects on pathogenesis. some mutations may make the virus less pathogenic than isolates from other countries. molecular docking analysis was performed to evaluate the effect of the mutations on the interaction between the viral spike proteins and the human ace receptor, though no significant difference in the interaction was observed. this study provides some preliminary insights into the origin of bangladeshi sars-cov- isolates, their mutation spectrum and possible pathomechanism, which may give an essential clue for designing therapeutics and managing covid- in bangladesh.
as of th may, genome sequences of bangladeshi sars-cov- isolates had been deposited in gisaid; however, the genome sequences of isolates were incomplete. thus, all complete genome sequences of the reported isolates of sars-cov- in bangladesh were retrieved from the gisaid database (https://www.gisaid.org/). because many bangladeshi people returned during the covid- outbreak, mainly from china, india, saudi arabia, spain, italy, japan, qatar, canada, kuwait, usa, france, sweden, and switzerland, the first deposited genome sequences from those countries were also retrieved. the sequence of the first isolate collected from china was used as the reference for further analysis.

we performed multiple sequence alignment using clustal omega [ , ] , with the sequence of the strain china [epi_isl_ ] as the reference genome. the alignment file was analyzed using the mview program of clustal omega [ ] . only variations in the coding regions were analyzed in this study.

fgenesv of softberry (http://linux .softberry.com/berry.phtml), a trained pattern/markov chain-based viral gene prediction tool, was used to predict the genes as well as the proteins from the viral genomes. each predicted protein (for each viral genome) was identified using the

the structural and functional effects of the missense variants, along with the stability change, were analyzed using different prediction tools. i-mutant was employed to analyze the stability change, with all parameters kept at their defaults [ ] . additionally, mutpred was used to predict the molecular consequences and functional effects of these mutations [ ] .

retrieved genome sequence of the sars-cov-

a total of complete genome sequences of sars-cov- isolates from bangladesh and genome sequences of isolates from other countries (china, india, saudi arabia, spain, italy, japan, qatar, canada, kuwait, usa, france, sweden, and switzerland) were retrieved from gisaid.
the strain of wuhan accession number with epi_isl_ was considered as the reference strain. phylogenetic tree analysis phylogenetic tree analysis revealed that all the selected bangladeshi isolates could be divided into two main groups, where one group shared a common ancestor with saudi arabia (fig ) . the other group found to have a similarity with the strain from switzerland, and it could be subdivided into two groups. in (table ) . additionally, three mutations occurring in surface glycoprotein, orf a and orf were predicted to alter the molecular consequences, including loss of sulfation in surface glycoprotein and loss of proteolytic cleavage in orf a and loss of allosteric site in orf (table and supplementary table ). in total, three models were generated using the template pdb id: vsb; one model for the spike protein of reference strain, and the two others were for two different mutant isolates from bangladesh (fig ) . two types of mutations were found in the spike proteins of all bangladeshi isolates, where most of the isolates were found to contain a substitution of d g. only one strain, epi_isl_ , found to have assessment scores of these three models were mostly similar to the template, which provided the spike proteins along with mutant models and the human ace receptor. interestingly, this molecular docking analysis revealed that the docking score for the three models against the human ace receptor was similar, and it was - . ; mutation in the spike proteins do not hamper binding with ace receptor. for three spike protein models, this study found that a domain of spike protein instead of whole protein, amino acid ranging from to , was involved in the interactions. this domain was conserved in all isolates resulting in similar interactions with ace (fig ) . 
table : prediction of the mutational effects on the structural stability.

table : prediction of the effects of the mutation on the molecular consequences.

table : mutpred score for all mutations. scores of < . indicate no effect on molecular consequences.

table : predicted number of genes and identity compared to the reference strain. (legend: each s label denotes one gisaid isolate, identified by its epi_isl_ accession; m: missing. columns: no, protein, and one column per isolate.)

key: cord- - pfa authors: yamamoto, norio; matsuyama, shutoku; hoshino, tyuji; yamamoto, naoki title: nelfinavir inhibits replication of severe acute respiratory syndrome coronavirus in vitro date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pfa

in december , severe acute respiratory syndrome coronavirus (sars-cov- ) emerged in wuhan, hubei province, china. no specific treatment has been established against coronavirus disease- (covid- ) so far. therefore, effective antiviral agents for the treatment of this disease are urgently needed, and several approved drugs such as lopinavir have been evaluated. here, we report that nelfinavir, an hiv- protease inhibitor, potently inhibits replication of sars-cov- . the effective concentrations for % and % inhibition (ec and ec ) of nelfinavir were . µm and . µm respectively, the lowest of the nine hiv- protease inhibitors including lopinavir.
the trough and peak serum concentrations of nelfinavir were three to six times higher than ec of this drug. these results suggest that nelfinavir is a potential candidate drug for the treatment of covid- and should be assessed in patients with covid- . in december , a novel betacoronavirus, designated as severe acute respiratory syndrome coronavirus (sars-cov- ), emerged in wuhan, hubei province, china , . as of march , , , confirmed cases with , deaths were reported, and countries/territories/areas were affected. as of now, no specific treatment has been established against coronavirus disease- . after the outbreak of severe acute respiratory syndrome (sars) in , screening of approved drugs identified at least two human immunodeficiency virus type- (hiv- ) protease inhibitors, lopinavir and nelfinavir, as compounds that inhibited sars-cov replication in vitro , . in the open-label study with sars patients in , the group treated with lopinavir-ritonavir ( mg and mg, respectively) in addition to ribavirin showed better clinical outcomes than the control group treated with only ribavirin . because sars-cov- is relatively similar to sars-cov , it is expected that these drugs will be effective for the treatment of covid- patients. to select the candidate drug for the clinical evaluation, the data of antiviral activity in vitro are of great help. however, there is no basic information about the effective concentration of hiv- protease inhibitors to abrogate sars-cov- replication. thus, in the present study, we evaluated the antiviral activity of nine approved hiv- protease inhibitors against sars-cov- in vitro. for the evaluation of the nine hiv- protease inhibitors, the % effective concentration (ec ), the % cytotoxic concentration (cc ), and the selectivity index (si, cc /ec ) were analyzed. 
moreover, based on the mean peak and trough serum concentration (c max and c trough ) of the compounds in human, c max -ec ratio (c max /ec ) and c trough -ec ratio (c trough /ec ) were calculated to assess the safety and efficacy of these compounds. among these inhibitors tested, the high concentrations of drugs were required to inhibit sars-cov- replication in amprenavir (ec = . µm, cc > µm, si > . ), darunavir (ec = . µm, cc > µm, si > . ), and indinavir (ec = . µm cc > µm, si > . ) (fig. g , h, i, j). tipranavir inhibited replication of sars-cov- , but si was relatively low (ec = . µm, cc = . µm, si = . ) (fig. f, j) . ritonavir (ec = . µm, cc = . µm, si = . ), saquinavir (ec = . µm, cc = . µm, si = . ), and atazanavir (ec = . µm, cc > µm, si > . ) suppressed sars-cov- at less than µm (fig. c , d, e, j). lopinavir, which has been clinically tested in patients with sars and covid- , blocked sars-cov- replication at a low concentration range and its si was relatively high among nine inhibitors (ec = . µm, cc = . µm, si = . ) (fig. b, j) . notably, nelfinavir (ec = . µm, cc = . µm, si = . ) potently inhibited virus replication at the lowest concentration and exhibited the highest si in the tested hiv- protease inhibitors (fig. a, j) . to examine the target step of hiv- protease inhibitors in the virus life cycle, time-of-addition assays were performed. intra-cellular viral rna was quantified at hpi with varying timing of drug addition. nelfinavir and lopinavir suppressed multiplication of viral rna in the cells even when these drugs were added at . hpi (fig. s ). these results suggest that both drugs functioned at the post-entry step. next, c max -ec ratio and c trough -ec ratio of nine compounds were calculated to estimate the clinical efficacy against covid- . nelfinavir (c max /ec = . , c trough /ec = . ), lopinavir (c max /ec = . , c trough /ec = . ), and tipranavir (c max /ec = . , c trough /ec = . ) exhibited c trough -ec ratio higher than one (fig. k ). 
ritonavir exceeded one in c max -ec ratio, but did not in c trough -ec ratio (c max /ec = . , c trough /ec = . ). saquinavir (c max /ec = . , c trough /ec = . ), atazanavir (c max /ec = . , c trough /ec = . ), amprenavir (c max /ec = . , c trough /ec = . ), darunavir (c max /ec = . , c trough /ec = . ), and indinavir (c max /ec = . , c trough /ec = . ) could not reach one in the c max -ec ratio. these results suggest that nelfinavir, lopinavir, and tipranavir can achieve ec of each drug in human and are effective in the treatment of covid- patients. nelfinavir showed the lowest ec ( . µm), the highest si ( . ), the second highest c max -ec ratio ( . ), and the highest c trough -ec ratio ( . ) among the tested hiv- protease inhibitors. these data indicate that nelfinavir is the best of nine drugs as a potential drug candidate for the treatment of covid- . lopinavir showed the inferior results to nelfinavir, namely, higher ec ( . µm), lower si ( . ), lower c max -ec ratio ( . ), and lower c trough -ec ratio ( . ) than nelfinavir. in a recent randomized, controlled, open-label trial, lopinavir-ritonavir did not significantly accelerate clinical improvement, reduce -day mortality, or decrease throat viral rna amount in patients with severe covid- . one of the reasons why the efficacy of lopinavir-ritonavir was not confirmed in the clinical trial might be associated with insufficient c max -ec and c trough -ec ratio to treat severe covid- . it remains to be determined whether earlier lopinavir-ritonavir treatment in mild covid- could have clinical benefit. tipranavir showed the highest c max -ec ratio ( . ) and the second highest c trough -ec ratio ( . ) of all tested drugs. this is because c max and c trough are much higher than the other tested drugs (fig. j) . since si of tipranavir was less than that of nelfinavir ( . vs. . ), nelfinavir seems better than tipranavir. the low si of tipranavir may raise concern about adverse effects to covid- patients. 
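the three indices compared above are simple ratios, and the following python sketch shows how they relate for a single compound. it is an illustration under stated assumptions: the class and field names are hypothetical, the numbers in the usage line are placeholders rather than measured values from this study, and the effective-concentration threshold passed as `ec_um` is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugProfile:
    """Hypothetical container for one compound; all concentrations in µM."""

    name: str
    ec_um: float       # effective concentration (threshold chosen by the caller)
    cc_um: float       # cytotoxic concentration
    cmax_um: float     # mean peak serum concentration
    ctrough_um: float  # mean trough serum concentration

    def __post_init__(self) -> None:
        if self.ec_um <= 0:
            raise ValueError("effective concentration must be positive")

    @property
    def selectivity_index(self) -> float:
        # si = cc / ec: larger values mean a wider margin before cytotoxicity.
        return self.cc_um / self.ec_um

    @property
    def cmax_ec_ratio(self) -> float:
        return self.cmax_um / self.ec_um

    @property
    def ctrough_ec_ratio(self) -> float:
        return self.ctrough_um / self.ec_um

    def stays_above_ec(self) -> bool:
        # True when even the trough serum level exceeds the effective concentration.
        return self.ctrough_ec_ratio > 1.0


# Usage with placeholder numbers (not values from the paper).
example = DrugProfile("compound-x", ec_um=2.0, cc_um=40.0, cmax_um=6.0, ctrough_um=3.0)
print(example.selectivity_index, example.cmax_ec_ratio, example.ctrough_ec_ratio)
print(example.stays_above_ec())
```

a c trough -ec ratio above one is the stricter of the two serum criteria, since it requires the drug level to remain above the effective concentration throughout the dosing interval rather than only at its peak.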
our findings reveal that nelfinavir is highly effective in inhibiting sars-cov- replication in vitro and has high c max -ec and c trough -ec ratios. we suggest that nelfinavir is a potential candidate drug for the treatment of covid- and should be assessed in patients with covid- .

we thank yoko yamamoto for her technical assistance. this work was supported, in part, by a grant from the japan agency for medical research and development to n.y.

key: cord- -vf wxkx authors: lokman, syed mohammad; rasheduzzaman, md.; salauddin, asma; barua, rocktim; tanzina, afsana yeasmin; rumi, meheadi hasan; hossain, md. imran; siddiki, amam zonaed; mannan, adnan; hasan, md. mahbub title: exploring the genomic and proteomic variations of sars-cov- spike glycoprotein: a computational biology approach date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vf wxkx

the newly identified sars-cov- has now been reported from around countries with more than a million confirmed human cases including more than deaths. the genomes of sars-cov- strains isolated from different parts of the world are now available, and the unique features of their constituent genes and proteins have recently received substantial attention.
spike glycoprotein is widely considered as a possible target to be explored because of its role during the entry of coronaviruses into host cells. we analyzed whole-genome sequences and spike protein sequences of sars-cov- using multiple sequence alignment tools. in this study, unique variations have been identified among the genomes including non-synonymous mutations and one deletion in the spike protein of sars-cov- . among the variations detected, variations were located at the n-terminal domain and variations at the receptor-binding domain (rbd) which might alter the interaction with receptor molecules. in addition, amino acid insertions were identified in the spike protein of sars-cov- in comparison with that of sars-cov. phylogenetic analyses of spike protein revealed that bat coronavirus have a close evolutionary relationship with circulating sars-cov- . the genetic variation analysis data presented in this study can help a better understanding of sars-cov- pathogenesis. based on our findings, potential inhibitors can be designed and tested targeting these proposed sites of variation. wuhan, hubei province of china in december . the death toll rose to more than , among , , confirmed cases around the globe (until april , ) [ ] . the virus causing covid- is named as severe acute respiratory syndrome coronavirus (sars-cov- ). based on the phylogenetic studies, the sars-cov- is categorized as a member of the genus betacoronavirus, the same lineage that includes sars coronavirus (sars-cov) [ ] that caused sars (severe acute respiratory syndrome) in china during [ ] . recent studies showed that sars-cov- has a close relationship with bat sars-like covs [ , ] [ ] ]. interestingly, s glycoprotein is characterized as the critical determinant for viral entry into host cells which consists of two functional subunits namely s and s . 
the s subunit recognizes and binds to the host receptor through the receptor-binding domain (rbd) whereas s is responsible for fusion with the host cell membrane [ [ ] , [ ] , [ ] ]. mers-cov uses dipeptidyl peptidase- (dpp ) as entry receptor [ ] whereas sars-cov and sars-cov- utilize ace- (angiotensin converting enzyme- ) [ ] , abundantly available in lung alveolar epithelial cells and enterocytes, suggesting s glycoprotein as a potential drug target to halt the entry of sars-cov- [ ] . according to recent reports, neutralizing antibodies are generated in response to the entry and fusion of surface-exposed s protein (mainly rbd domain) which is predicted to be an important target for vaccine candidates [ [ ] , [ ] , [ ] ]. however, sars-cov- has emerged with remarkable properties like glutamine-rich aa long exclusive molecular signature (dsqqtvgqqdgsednqtttiqtivevqpqlemeltpvvqtie) in position - of polyprotein ab (pp ab) [ ] , diversified receptor-binding domain (rbd), unique furin cleavage site (prrar↓sv) at s /s boundary in s glycoprotein which could play roles in viral pathogenesis, diagnosis and treatment [ ] . to date, few genomic variations of sars-cov- are reported [ [ ] , [ ] ]. there is growing evidence that spike protein, a amino acid long glycoprotein having multiple domains, possibly plays a major role in sars-cov- pathogenesis. viral entry to the host cell is initiated by the receptor-binding domain (rbd) of s head. upon receptor-binding, proteolytic cleavage occurs at s /s cleavage site and two heptad repeats (hr) of s stalk form a six-helix bundle structure triggering the release of the fusion peptide. as it comes into close proximity to the transmembrane anchor (tm), the tm domain facilitates membrane destabilization required for fusion between virus-host membranes [ [ ] , [ ] ]. 
insights into the sequence variations of s glycoprotein among available genomes are key to understanding the biology of sars-cov- infection, developing antiviral treatments and vaccines. in this study, we have analyzed genomic sequences of sars-cov- to identify mutations between the available genomes followed by the amino acid variations in the glycoprotein s to foresee their impact on the viral entry to host cell from structural biology viewpoint. all available sequences ( whole genome and surface glycoprotein sequences of sars-cov- ) related to the covid- pandemic were retrieved from ncbi virus variation resource repository (https://www.ncbi.nlm.nih.gov/labs/virus/) [ ] . in addition, all s glycoprotein sequences from different coronavirus families were retrieved for phylogenetic analysis. the ncbi reference sequence of sars-cov- s glycoprotein, accession number yp_ was used as the canonical sequence for the analyses of spike protein variants. variant analyses of sars-cov- genomes were performed in the genome detective coronavirus typing tool version . which is specially designed for this virus (https://www.genomedetective.com/app/typingtool/cov/) [ ] . for multiple sequence alignment (msa), genome detective coronavirus typing tool uses a reference dataset of whole genome sequences (wgs) where wgs were from known nine coronavirus species. the dataset was then aligned with muscle [ ] . entropy (h(x)) plot of nucleotide variations in sars-cov- genome was constructed using bioedit [ ] . mega x (version . . ) was used to construct the msas and the phylogenetic tree using pairwise alignment and neighborjoining methods in clustalw [ , ] . tree structure was validated by running the analysis on bootstraps [ ] replications dataset and the evolutionary distances were calculated using the poisson correction method [ ] . variant sequences of sars-cov- were modeled in swiss-model [ ] using the cryo-em spike protein structure of sars-cov- (pdb id vsb) as a template. 
the overall quality of the models was assessed on the rampage server [ ] by generating ramachandran plots (supplementary table ). pymol and biovia discovery studio were used for structure visualization and superposition [ , ] . multiple sequence alignment of the available genomes of sars-cov- was performed, and variations were found throughout the , bp long sars-cov- genome: in total, variations in the utr region, synonymous variations that cause no amino acid alteration, non-synonymous variations that change an amino acid residue, indels, and variations in the non-coding region (supplementary table ). among the variations, variations ( synonymous, non-synonymous mutations and one deletion) were observed in the region of orf s, which encodes the s glycoprotein responsible for viral fusion and entry into the host cell [ ] . notably, most of the sars-cov- genome sequences were deposited from the usa ( ) and china ( ) (supplementary fig. ). positional variability of the sars-cov- genome was calculated from the msa of sars-cov- whole genomes as an entropy value (h(x)) [ ] . excluding the ′ and ′ utrs, ten hypervariable hotspot positions were identified, of which seven were located in orf ab ( c>t, c>t, c>t, c>t, c>t, a>g, c>t) and one each in orf s ( a>g), orf a ( g>t), and orf ( t>c). the variability at positions and was found to be the highest among the hotspots ( fig. ). the phylogenetic analysis of a total of sequences ( unique sars-cov- and different coronavirus s glycoprotein sequences) was performed. the evolutionary distances showed that all the sars-cov- spike proteins cluster in the same node of the phylogenetic tree, confirming that the sequences are similar to refseq yp_ (fig. ) .
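the positional entropy h(x) used above as a measure of variability can be illustrated with a minimal sketch. it is not the bioedit implementation: the function names, the use of the natural logarithm, and the treatment of every character (including gaps) as an ordinary symbol are assumptions made only for illustration, and the four short sequences are toy data.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (natural log) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def alignment_entropy(aligned_seqs):
    """Entropy H(x) at every position of a set of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([s[i] for s in aligned_seqs]) for i in range(length)]

# Toy alignment (hypothetical data): columns 0 and 2 are conserved,
# columns 1 and 3 each carry one variant out of four sequences.
toy_alignment = ["ACGT", "ACGT", "ATGT", "ACGA"]
profile = alignment_entropy(toy_alignment)
```

a fully conserved column has entropy zero, and the entropy grows as the variants at a position become more evenly distributed, which is why peaks in such a profile mark hypervariable positions.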
bat coronaviruses have a close evolutionary relationship with sars-cov- , as different strains were found in the nearest outgroups and clades (bat coronavirus bm - , bat hp-beta coronavirus, bat coronavirus hku ), suggesting that coronavirus has a vast geographical spread and that bat is the most prevalent host (fig. ) . in other clades, the clusters appeared to follow different hosts, which may reflect evolutionary changes of the surface glycoprotein due to cross-species transmission. the reporting of viral hosts from different locations at different times is indicative of possible recombination. the s glycoprotein sequences of sars-cov- were retrieved from the ncbi virus fig. ). alterations of amino acid residue charge from positive to neutral (h y, r i, h q), negative to neutral (d n, d g, d y), negative to positive (d h, d h), and neutral to positive (n k, s r) were seen in variants qhw , qhs , qis , qis , qik , qis , qis , qis , qio , and qhr respectively, due to substitution by amino acids that differ in charge. the remaining variants were mutated to amino acids that are similar in charge (fig. a) . the sars-cov- spike protein variants were superposed with the cryo-electron microscopic structure of sars-cov- spike protein [ ] . the l f, n k, e d, f l, g v, s r, g s, v a, d h, and d h variants were excluded from superposition due to the absence of the respective residues in the d structure of the template (pdb: vsb). the superposition showed that most of the residue changes introduced bulkier amino acid residues (t i, h y, l f, s w, a t, h q, a s, a v, d y, and a v) in place of smaller residues, except y n, d n, r i, d g, and f c (fig. b-p) . fig. ) . the s subunit of spike protein, especially the heptad repeat region , fusion peptide domain, transmembrane domain, and cytoplasmic tail, was found to be highly conserved in the sars-cov and the sars-cov- variants, while the s subunit was more diverse, specifically the n-terminal domain (ntd) and receptor-binding domain (rbd).
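the grouping of substitutions by charge change described above can be sketched as a small lookup. this is only an illustration under stated assumptions: the function names are hypothetical, and histidine is counted as positively charged because the text classifies substitutions away from h as "positive to neutral"; at physiological ph histidine is only partially protonated, so other conventions would treat it as neutral.

```python
# One-letter codes; histidine is grouped with the positive residues here
# to mirror the classification used in the text (an assumption of this sketch).
POSITIVE = set("KRH")
NEGATIVE = set("DE")

def charge_class(residue):
    """Return 'positive', 'negative' or 'neutral' for a one-letter amino acid code."""
    r = residue.upper()
    if r in POSITIVE:
        return "positive"
    if r in NEGATIVE:
        return "negative"
    return "neutral"

def charge_change(original, substituted):
    """Describe how the side-chain charge class changes for a substitution."""
    before, after = charge_class(original), charge_class(substituted)
    return "unchanged" if before == after else f"{before} to {after}"

# Residue pairs taken from the substitution types named in the text
# (positions omitted because they are not recoverable here).
examples = {pair: charge_change(*pair) for pair in [("R", "I"), ("D", "H"), ("N", "K"), ("A", "V")]}
```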
covid is one of the most contagious pandemics the world has ever had with , , confirmed cases to date (april , ) and the cases have increased as high as times in less than a month [ ] . phylogenetic analysis showed that the sars-cov- is a unique coronavirus presumably related to bat coronavirus (bm - , hp-betacoronavirus). during this study, we [ ] , [ ] , [ ] , [ ] ]. likewise, a number of studies targeting sars-cov- spike protein have been undertaken for the therapeutic measures [ ] , but the unique structural and functional details of sars-cov- spike protein are still under scrutiny. we also found a variant (r i) at receptor binding domain (rbd) that mutated from positively charged arginine residue to neutral and smaller sized isoleucine residue (fig. i) . this change might alter the interaction of viral rbd with the host receptor because the r residue of sars-cov- is known to interact with the ace receptor for viral entry [ ] . similarly, alterations of rbd (g s, v a, h q, and a s) also could affect the interaction of sars-cov- spike protein with other molecules which require further investigations. qia and qis variants were found to have an alteration of alanine to valine (a v), and aspartic acid to tyrosine (d y) respectively in the alpha helix of the hr domain. previous reports have indicated that hr domain plays a significant role in viral fusion and entry by forming helical bundles with hr , and mutations including alanine substitution by valine (a v) in hr region are predominantly responsible for conferring resistance to mouse hepatitis coronaviruses against hr derived peptide entry inhibitors [ ] . this study hypothesizes the mutation (a v) found in that of sars-cov- might also have a role in the emergence of drug-resistance virus strains. also, the mutation (d h) found in the heptad repeat (hr) sars-cov- could play a vital role in viral pathogenesis. 
the sars-cov- s protein contains an additional furin protease cleavage site, prrars, in the s /s domain, which is conserved among all sequences analyzed in this study ( supplementary fig. ). this unique signature is thought to make sars-cov- more virulent than sars-cov and is regarded as a novel feature of the viral pathogenesis (ref ). according to previous reports, the more efficiently host cell proteases can process the coronavirus s protein, the more viral tropism can be accelerated, as is also the case in influenza virus [[ ] , [ ] , [ ] , [ ] ]. apart from that, this could also allow viruses to escape antiviral therapies targeting the transmembrane protease tmprss (clinicaltrials.gov, nct ), which is a well-reported protease that cleaves at the s /s site of the s glycoprotein [ ] . comparative analyses between sars-cov and sars-cov- spike glycoprotein showed % similarity between them where the most diverse region was . coronavirus disease (covid- ) situation reports severe acute respiratory syndrome-related coronavirus--the species and its viruses, a statement of the coronavirus study group lim, others, a novel coronavirus associated with severe acute respiratory syndrome bats are natural reservoirs of sars-like coronaviruses, science ( -. ) huang, others, a pneumonia outbreak associated with a new coronavirus of probable bat origin pei, others, a new coronavirus associated with human respiratory disease in china genome composition and divergence of the novel coronavirus ( -ncov) originating in china cryo-em structure of the -ncov spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein structure analysis of the receptor binding of -ncov fouchier, others, dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc greenough, others, angiotensin-converting enzyme is a functional receptor for the sars coronavirus functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses a.
nitsche, others, sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor wu, others, potent binding of novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody an exclusive amino acid signature in pp ab protein provides insights into the evolutive history of the novel human-pathogenic coronavirus (sars-cov ) the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade genomic variance of the -ncov coronavirus genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex interaction between heptad repeat and regions in spike protein of sars-associated coronavirus: implications for virus fusogenic mechanism and identification of fusion inhibitors virus variation resource--improved response to emergent viral outbreaks genome detective coronavirus typing tool for rapid identification and characterization of novel coronavirus genomes muscle: multiple sequence alignment with improved accuracy and proceedings. ieee comput. syst. bioinforma. conf. . 
csb bioedit: a user-friendly biological sequence alignment editor and analysis program for windows / /nt mega x: molecular evolutionary genetics analysis across computing platforms the neighbor-joining method: a new method for reconstructing phylogenetic trees bootstrap confidence levels for phylogenetic trees evolutionary divergence and convergence in proteins swiss-model: homology modelling of protein structures and complexes structure validation by calpha geometry: phi, psi and cbeta deviation pymol: an open-source molecular graphics tool receptor recognition mechanisms of coronaviruses: a decade of structural studies a parvovirus b synthetic genome: sequence features and functional competence cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding long-term protection from sars coronavirus infection conferred by a single immunization with an attenuated vsv-based vaccine human monoclonal antibodies against highly conserved hr and hr domains of the sars-cov spike protein are more broadly neutralizing a truncated receptor-binding domain of mers-cov spike protein potently inhibits mers-cov infection and induces strong neutralizing antibody responses: implication for developing therapeutics and vaccines fusion mechanism of -ncov and fusion inhibitors targeting hr domain in spike protein role of changes in sars-cov- spike protein in the interaction with the human ace receptor: an in silico analysis coronavirus escape from heptad repeat (hr )-derived peptide entry inhibition as a result of mutations in the hr domain of the spike fusion protein host cell proteases controlling virus pathogenicity role of hemagglutinin cleavage for the pathogenicity of influenza virus host cell proteases: critical determinants of coronavirus tropism and pathogenesis coronaviruses: an overview of their 
replication and pathogenesis receptor for mouse hepatitis virus is a member of the carcinoembryonic antigen family of glycoproteins laude, others, aminopeptidase n is a major receptor for the enteropathogenic coronavirus tgev key: cord- -el v a authors: tan, h.s. title: fourier spectral density of the coronavirus genome date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: el v a we present an analysis of the coronavirus rna genome via a study of its fourier spectral density based on a binary representation of the nucleotide sequence. we find that at low frequencies, the power spectrum presents a small and distinct departure from the behavior expected from an uncorrelated sequence. we provide a couple of simple models to characterize such deviations. away from a small low-frequency domain, the spectrum presents largely stochastic fluctuations about fixed values which vary inversely with the genome size generally. it exhibits no other peaks apart from those associated with triplet codon usage. we uncover an interesting, new scaling law for the coronavirus genome: the complexity of the genome scales linearly with the power-law exponent that characterizes the enveloping curve of the low-frequency domain of the spectral density. motivated by our search for deeper organizational principles governing genetic information , the study of a dna/rna genome via its fourier spectral density has given us several interesting insights into the code of life. an example of a seminal paper in this subject is that of voss in [ ] where the author found that the spectral density of the genome of many different species follows a power law of the form /k β in the low-frequency domain, with the exponent β potentially related to the organism's evolutionary category. in [ ] , β was found to be close to , a phenomenon shared by a wide variety of physical systems especially those that carry long-range correlations or characterized by a myriad of length scales. 
it was also found that the power spectra may contain defining peaks or resonances, for example at period for primates, vertebrates and invertebrates, or period - for yeast, bacteria and archaea as shown in [ ] where the peaks were remarkably related to aspects of protein structuring and folding. over the years, these methods and results have been extended in various ways [ ] , such as wavelet-type analysis [ , , ] of the sequences, using features of the spectra to classify and cluster genomes with the aid of neural networks [ ] , prediction of coding regions [ ] and periodic structures [ ] , etc. in this paper, we study the fourier spectral density of the genome of coronaviruses -a positivesense single-stranded rna genome with size ranging from roughly to kilobases, based on the dataset of [ ] which covers all four genera of coronaviruses. in addition, motivated by the recent covid- pandemic, we include the genomes of sars-cov- , a bat coronavirus bat-ratg of close genome identity, and the mers coronavirus. across the different genome sequences, we find that their fourier spectra take on the same form. there is a low frequency domain (k in units of inverse genome length) where a sinc-squared-like oscillatory form is enveloped by a roughly /k decay curve. this is followed by stochastic white-noise type fluctuations about fixed mean values which tend to vary inversely with the genome size. we find that a random, uncorrelated sequence -with the probability of occurrence for each nucleotide being its frequency ratio in the sequence -yields similar behavior in the low-frequency domain. we develop a few models to characterize the typical spectrum, and in the process stumble upon a linear scaling law between a measure of the complexity of each genome and the power-law exponent that describes the enveloping curve of the low-frequency domain. 
the complexity measure that we use here is intimately related to the shannon entropy of the sequence, and thus this relation concretely realizes a way by which information-theoretic content is carried within the genome's spectral density. now, power-law decay of the form /k α have previously been discussed in literature for other types of genomes (see for example [ , ] ). we would like to emphasize that here, we do not employ either the fast fourier transform or non-overlapping averaging procedures to smoothen the data in the low-frequency domain. these are common techniques used for easing computations in past related works, but may compromise the sensitivity by which we characterize the spectral curves. we also perform the spectral density analysis at the level of the coding region (a few thousand nucleotides) for the spike protein, an essential protein that binds to the host cell's receptor. we find that interestingly, the general features of the spectrum persist at the protein level, but not the scaling law mentioned above. our paper is organized as follows. in section , we present some background theory for our work, followed by section where we present the results and a few graphical plots for visualization, before concluding in section . the appendix a collects a table listing all the genbank accession numbers [ ] of the genomes, and another gathers several graphs useful for interpreting our various results. in this section, we present some essential mathematical concepts that form the basis for our study. our analysis of the rna genomes can only begin after transformation of the genome sequence consisting of the four nucleotides (adenine, cytosine, guanine and uracil) into a numerical string. the spectral density of interest here is the absolute square of the discrete fourier transform of a nucleotide indicator function φ(i) defined as follows where m denotes the length of the genome, and α, β denote particular choices of nucleotides. 
in the continuum limit and after averaging over some distribution of genomes, this approaches the fourier transform of the correlation function now, a basic premise lies in the choice of the indicator function φ(i). while various propositions have been explored in the literature, in this paper, following [ ] , we use a simple binary-valued model where for each nucleotide, φ(i) is equal to if the nucleotide is found at position 'i' and otherwise. for all our genome data, we find that ( ) exhibits a clear specific oscillatory form that resembles a sinc(-squared) function in the low-frequency domain (up to k ∼ ). in the following, we furnish a potential simple explanation of such low-frequency behavior. for simplicity and definiteness, we will mainly focus on the spectral density sum for the rest of the paper, but we have checked that the general features described above pertain to the cross-spectra s αβ (with β = α) as well as the individual autocorrelations s αα for all the four nucleotides. apart from computing various quantities at the level of the entire rna genome, we also examine the spectral density associated with the coding region for the spike protein. for the coronaviruses, apart from the spike protein, the genome encodes several proteins each carrying unique functions, such as the envelope, membrane, nucleocapsid, etc. in particular, the spike protein plays an essential role in host cell receptor binding during the process of viral infection, and is thus a common target for developments of antibodies and vaccines (see for example, [ ] ). now the coding region associated with this protein is only of the order of nucleotides, so a priori it is not clear if the spectral density can be meaningfully analyzed. we find however that the general features of the spectral density persist for the spike protein's coding region too. 
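as a rough illustration of the binary indicator construction described above, the following sketch evaluates the summed spectral density by a direct sum rather than a fast fourier transform, which also allows non-integer k. the function name, the sign convention of the exponent, the 1/m^2 normalization and the mapping of t to u are assumptions of this sketch and are not taken from the paper; the test sequence is a toy periodic string, not a viral genome.

```python
import cmath
import math

NUCLEOTIDES = "ACGU"

def spectral_density(seq, k):
    """Sum over the four nucleotides of |sum_j phi(j) exp(2*pi*i*k*j/M)|^2 / M^2.

    phi(j) is 1 where the given nucleotide sits at position j and 0 otherwise,
    so only the positions where the nucleotide occurs contribute to the sum.
    The 1/M^2 normalization is an assumption of this sketch.
    """
    seq = seq.upper().replace("T", "U")
    m = len(seq)
    total = 0.0
    for nucleotide in NUCLEOTIDES:
        amplitude = sum(
            cmath.exp(2j * math.pi * k * j / m)
            for j, base in enumerate(seq)
            if base == nucleotide
        )
        total += abs(amplitude) ** 2 / m**2
    return total

# Toy sequence with exact period 4 (hypothetical data): all spectral weight
# sits at k = 0 and at multiples of M/4.
toy = "ACGU" * 25
values = [spectral_density(toy, k) for k in (0, 1, 25)]
```

with this normalization the value at k = 0 equals the sum of squared nucleotide fractions, and for an uncorrelated sequence the level at large k falls off inversely with the sequence length, in line with the behavior described in the text.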
consider the case of an uncorrelated numerical sequence, where the probability of a nucleotide of type α occurring at some position is a constant, independent of the other positions and of the position itself. given n α such nucleotides in the sequence, we can estimate this constant to be n α /m, with the expectation value of the spectral density given by ( ). up to k ∼ , ( ) models the spectral density rather well. the local maxima of ( ) occur approximately at half-integer values of k, and thus the upper envelope of the oscillation is manifest as a /k decay function in this domain, which follows from expanding ( ) about k = . the decaying behavior of the envelope curve typically stops at about k ∼ , and thereafter the spectral density appears to be characterized by stochastic fluctuations about some fixed mean. although ( ) appears to model observed datasets well, the goodness of fit does not extend beyond the k ∼ range, nor is it clear from the data whether deviations from ( ) within the low-frequency domain are unimportant random fluctuations or otherwise. to gain further insights, we present a few simple models which characterize the observed deviations from ( ). the models' parameters can potentially be used for clustering coronavirus genomes if future studies prove that these values persist for larger sets of data, or more interestingly, they could potentially demonstrate correlation with other features of the genome that would help us recognize the presence of long-range correlations. from now on, we refer to ( ) as the 'uncorrelated background'. in the following, we present three models for the observed spectral density that characterize deviations from the uncorrelated background. the first two concern the description of the low-frequency domain (k ) whereas the third involves a more global description.
(a) power-law decay of the enveloping curve motivated by previous works on this subject, we consider fitting a power-law decay via leastsquare regression to the enveloping curve ( for k ∈ [ , ] ) of the form for some power exponent δ. the power-law description is convenient and has proven to be a popularly studied model for spectral density of genomes in general (see for example [ ] ). it is crucial to bear in mind that it is a coarse-grained description which doesn't extend to the origin, and valid only for the low-frequency domain. we would later find that this is the parameter that remarkably scales linearly with a measure of the genome complexity. for all our datasets, = δ − ∼ − . it is not a priori clear how large has to be in order for the deviation to be significant, and more sequences corresponding to each type of coronavirus should be studied in order to determine the range of and its statistical distribution. although we leave this for future work, we found evidence that the variation in the correlates with a measure of the complexity of the genome (which at the limit of infinite genome size approaches the shannon entropy) in a way that is distinctly different from a completely random sequence. it is useful to compute the expected δ for the hypothetical uncorrelated background ( ) which is parametrized by the genome size m and the sum of squares of nucleotide number α n α . for the general spectral density s(k), from least-square regression of the log-log relation, we obtain ≡ − δ to be where . . . = k (. . .) denotes averaging over the nine local maxima points in the domain k ∈ [ , ] . for the uncorrelated case of ( ), we find that the factor α n α cancels away in ( ) and numerically, δ ≈ . for all the datasets at the level of the genome and that of the protein coding region. this defines a background value for the detection of a deviation away from the completely random sequence. 
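the least-square regression of the log-log relation used to extract the exponent δ can be sketched as follows. the function name is hypothetical, and the points fed to it are synthetic: nine half-integer values of k with an exact inverse-square envelope, chosen only so that the recovered exponent is known in advance. in practice the inputs would be the local maxima of the observed spectral density in the low-frequency domain.

```python
import math

def fit_power_law(k_values, s_values):
    """Ordinary least squares on log S = c - delta * log k; returns (delta, c)."""
    xs = [math.log(k) for k in k_values]
    ys = [math.log(s) for s in s_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, y_mean - slope * x_mean

# Synthetic envelope points (illustrative only): S = 3 / k^2 at nine half-integer k.
ks = [j + 0.5 for j in range(1, 10)]
envelope = [3.0 / k**2 for k in ks]
delta, intercept = fit_power_law(ks, envelope)
```

because the fit is linear in log k and log s, the exponent is simply minus the fitted slope, and any overall prefactor of the spectrum is absorbed into the intercept.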
in contrast to an empirical power-law fitting of only the enveloping curve, one could adopt a bottom-up approach by postulating certain forms of the correlation function, and then performing the discrete fourier transform. consider the case where the correlation function is a linear function of the nucleotide separation, we can write for some constant κ, and r = α this function is invariant under the reflection k ↔ m − k, which is an exact discrete symmetry for the spectral density s(k) (or the individual s αα (k)) more generally. the parameter κ admits the physical interpretation of the presence of long-range correlation/anti-correlation depending on whether it's positive/negative, and we would find that apart from one exception, all our datasets can be matched to a positive κ of the order − . we find that if the curvefitting is performed taking into account only the first ten local maxima as in the case for δ-parameter, the local minima points at integral k-values are not well captured by the fitted curve, so we also include them in the curve-fitting. beyond the specific linear form of the correlation function postulated in ( ), it is also representative of a large class of correlation functions of the form where τ ≡ l − j,κ is a small constant andτ =κ |τ | m . this first-order truncation is identical to ( ) with f ( )κ = κ. thus, ( ) could approximate correlation functions of the general form f κ |τ | m whereκ is a small dimensionless parameter, and for example, if the correlation function turns out to be an exponentially decaying function of the form e − b|τ | m with b , then to a good approximation we can identify κ ∼ b. the power-law decay in (a) parametrizes the decay of the envelope whereas the model in ( ) could account for non-vanishing local minima in the low-frequency domain. beyond this region, we seek an interpolating curve that extends throughout the spectrum including the origin. 
for this purpose, we consider fitting a lorentzian function of the following form to the spectrum where n = α n α m − m and m is the mean value near the spectrum's midpoint, about which stochastic fluctuations are observed. this is a simple coarse-grained model which averages over the oscillations in the low-frequency domain and describes the overall decay of the spectrum via a smooth curve. like the κ parameter in the model ( ), the curve-fitting is performed with the set of extremal points in the low-frequency domain, with the initial and final conditions taken into account by first fixing n, m with their observed values for each genome sequence. as a useful reference, we also fit the lorentzian function to the uncorrelated background ( ) and finding b ≈ . with m ∼ − at the genome level, and m ∼ − at the protein coding region level. scaling laws manifest in the fourier spectral density have often motivated the study of features of the genome that reflect various properties of it being a complex system, such as the fractal dimension (of a suitably defined matrix representation of the correlation function), etc. a measure of the complexity of the genome considered in the past literature (see for example [ , , ] ) is defined as follows . where n α is the number of the α-nucleotide. the logarithmic argument counts the number of distinguishable permutations given a fixed number of each nucleotide. at large m , this admits a natural interpretation of the shannon entropy of the genome sequence. to see this, we can invoke stirling's formula to express the large-m limit of Ω as which is a function of only the fractional distribution of nucleotides. in this form ( ), the measure of complexity Ω is clearly the shannon entropy which measures the information entropy associated with a genome sequence where the probability of nucleotide-α occurring in any position is nα m . 
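the relation between the permutation-counting complexity measure and the shannon entropy can be checked numerically with a short sketch. the per-nucleotide normalization by 1/m, the natural logarithm and the function names are assumptions made here so that the large-m limit depends only on the nucleotide fractions, as the text states; the exact normalization of the original definition is not recoverable from this passage.

```python
import math
from collections import Counter

def complexity(seq):
    """(1/M) * ln( M! / prod_alpha n_alpha! ), with factorials via lgamma."""
    counts = Counter(seq)
    m = len(seq)
    log_permutations = math.lgamma(m + 1) - sum(
        math.lgamma(n + 1) for n in counts.values()
    )
    return log_permutations / m

def shannon_entropy(seq):
    """-sum_alpha p_alpha ln p_alpha with p_alpha = n_alpha / M."""
    counts = Counter(seq)
    m = len(seq)
    return -sum((n / m) * math.log(n / m) for n in counts.values())

# Toy sequence of genome-like length with equal nucleotide fractions (hypothetical data).
toy_genome = "ACGU" * 7500
omega = complexity(toy_genome)
entropy = shannon_entropy(toy_genome)
```

for a sequence of this length the two quantities differ only by a stirling correction of order ln(m)/m, while for very short sequences the difference is large, which is why the identification with the shannon entropy holds only at large m.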
we would find later that interestingly, the model parameter δ (but not κ) scales linearly with Ω across the dataset of types of coronavirus genomes. also, when restricted to the spike protein's level, the measure of complexity appears to scale linearly with the overall measure at the genome level. but the model parameter δ that is computed at the level of the spike protein does not correlate with Ω at either the genome/protein level, and neither does κ. our genome dataset consisting of types of coronaviruses spread across four genera mainly follows from reference [ ] plus a few other additions : sars-cov- , mers-cov and bat-ratg . bat-ratg is a bat coronavirus that was most recently found to have % genome identity with sars-cov- and featured in papers discussing a possible bat origin of the latter [ ] . we included it here to see how the model parameters for this genome compare to that of sars-cov- relative to the other coronaviruses. in the following, we outline the essential results, using the example of the sars-cov- reference genome for various graphical illustrations. we find that the fourier spectral density is characterized by the following features: (a) in a small low-frequency regime (k ), the uncorrelated background ( ) is a good approximation (see fig. from visual inspection of the relevant graphical plots, we find no obvious correlation among these model parameters, nor between them and the genome/spike protein sizes. but we find that δ w and Ω w appear to be related. linear regression yields the following best-fit line (see fig. ) with the line parameters being (with the % confidence intervals in brackets) α ≈ . ( . , . ), β ≈ − . (− . , − . ), since we checked that for all the coronaviruses, the assumption of a completely uncorrelated background yields δ ≈ . , this leads to a convenient definition of a reference complexity value Ω u ≈ . , which lies at the intersection between the uncorrelated vertical line and the observed one with finite slope. 
the difference between the observed complexity measure and Ω u in turn enacts a measure of the deviation from complete randomness of the sequence. there is also a similar relation between δ w and Ω s , consistent with the following linear relation that we found: Ω w = cΩ s , c ≈ . ( . , . ). it would be interesting to study this for other coding and non-coding regions as it is suggestive of some level of self-similarity for this complexity measure. ( ) obtained by fitting to all maxima and minima, with the best-fit value κ = . . (b) after k ∼ , the genome displays much more scatter about the uncorrelated background, and the models of deviation are no longer effective descriptions (see fig. ). stochastic fluctuations about a fixed mean appear to set in and there are no isolated peaks apart from two prominent ones at k ∼ m , m which have been seen and interpreted in past literature [ , ] to correspond to the universal triplet codon usage. we applied an (overlapping) moving average (of window size ∼ nucleotides) to smooth out the data, and checked that there is no apparent regime where some non-trivial scaling law holds (see fig. at the level of both the genome and protein coding region, the fixed mean parameter m appears to correlate with the genome size. it appears to generally decrease with the size of the sequence,at both levels of the genome and the spike protein (see figures a and b ) in appendix b). at the genome level, it is of the order ∼ − which is about larger than the value expected for the uncorrelated background, whereas at the spike protein level, m ∼ − which is times larger than the uncorrelated background. the lorentzian function that is fitted to the data with initial and final conditions fixed by r and m is parametrized by the showing how after about k = , the data points appear to be noisy and such stochastic fluctuations appear to persist throughout apart from a couple of isolated peaks. 
neither the envelope curve of /k δ nor equation ( ) continue to be effective descriptions. half-width parameter b. we find that this parameter generally increases with κ at both genome and spike protein levels (see figures a and b in appendix b). (c) finally, although for simplicity, we have kept to analyzing the spectral density corresponding to the sum of all the nucleotides, the general qualitative features described in (a) and (b) above apply to the spectral density for each individual nucleotide as well as the cross-spectra. we have presented a study of the fourier spectral density of the coronavirus genome at the level of the entire genome as well as the coding region for the spike protein. the power spectrum profile can be well-described by considering aspects of deviation from the hypothetical case of a random, uncorrelated sequence (eqn. ( ) ). we summarize the essential general features below: (i) there is a low-frequency domain (k ) which exhibits a clear oscillatory form that is close to ( ) . in this domain, we find that the enveloping curve connecting the local maxima is well-described by a power decay law of the form /k δ . we noted that the power exponent δ shows a correlation with a measure of complexity of the sequence (eqn. ( ) ) which in the limit of large genome size is the sequence's shannon entropy. the deviation from the uncorrelated background can be described by a linear relation between δ and Ω. this behavior does not however persist at the level of the spike protein's coding region. (ii) beyond the low-frequency domain, the spectrum displays stochastic fluctuations about certain fixed values m, and we find no other resonances apart from the peaks at m , m which are associated with the universal triplet codon usage. relative to the uncorrelated case, m is about higher at the genome level and about higher at the spike protein level. it also generally decreases with the size of the genome or the protein coding region. 
(iii) upon fitting the lorentzian function to the spectrum with initial and final conditions determined by r and m respectively, we find that its half-width parameter is correlated with κ (the dimensionless constant that defines the linearized correlation function in the low-frequency domain) and generally increases with it. this is observed at the level of both the genome and the spike protein. let us conclude by briefly pointing out several future directions and applications. it has been noted in the literature for some time that dna viruses and unicellular organisms tend to have mutation rates which vary inversely with the genome size ('drake's rule' [ , , ] ). this correlation has been studied for rna viruses recently (see for example [ ] ), although we are unaware of any evidence for the case of coronaviruses, which are the only rna virus family that has an exonuclease proofreading mechanism that enhances replication fidelity. the parameter m that we have introduced here appears to vary inversely with genome size, and thus it may be worthwhile to explore its role in models that attempt to explain viral mutation rates. in [ ] , a negative association between molecular evolution rate and genome size was established for rna viruses. it would be interesting to compute the parameter m for the viral sequences studied in [ , ] . another potential application of our work which has immediate relevance is to study the distribution of m for sars-cov- genomes, specifically to explore whether it could describe the current evolution of the virus (see for example [ ] ). the lorentzian function that we fit broadly to the spectrum as a whole is a coarse-grained description that does not model the transition from the low-frequency spectrum to the other part of the spectrum that appears to be dominated by stochastic fluctuations.
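the spectral quantities summarized in (i) to (iii) can be illustrated with a minimal sketch of the fourier spectral density of a nucleotide sequence. this is not the authors' code: it assumes the common convention of summing the squared moduli of the discrete fourier transforms of the four nucleotide indicator sequences, with a 1/n amplitude normalization, and the function names are hypothetical. a simple overlapping moving average of the kind mentioned for smoothing is included as well.

```python
import cmath

NUCLEOTIDES = "ACGT"


def power_spectrum(seq):
    """Summed power spectrum over the four nucleotide indicator sequences.

    Assumption: amplitude is normalized by 1/N, so spectrum[k] is
    sum over bases of |(1/N) * sum_n u_base(n) * exp(-2*pi*i*k*n/N)|^2.
    Direct O(N^2) evaluation; fine for short sequences, slow for full genomes.
    """
    seq = seq.upper()
    n = len(seq)
    spectrum = [0.0] * n
    for base in NUCLEOTIDES:
        positions = [i for i, c in enumerate(seq) if c == base]
        for k in range(n):
            amp = sum(cmath.exp(-2j * cmath.pi * k * p / n) for p in positions) / n
            spectrum[k] += abs(amp) ** 2
    return spectrum


def dominant_frequency(spectrum):
    """Index k in 1..N/2 with the largest power (k = 0 is excluded)."""
    half = len(spectrum) // 2
    return max(range(1, half + 1), key=lambda k: spectrum[k])


def moving_average(values, window):
    """Overlapping moving average with the given window size."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

for a perfectly period-three toy sequence such as `"ACG" * 100`, the dominant frequency falls at k = n/3, which is the position of the peak usually attributed to triplet codon structure.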
it would be interesting to develop theoretical models that could possibly account for such a transition and in the process, construct a clearer understanding for the parameter m or why the information-theoretic measure ( ) is relevant for the low-frequency domain. a complementary approach towards understanding correlation effects is to study directly the correlation function itself (see for example [ ] ), although this is more computationally intensive. it would be interesting to study what forms of correlation functions could lead to the enveloping curve being of the form /k δ . a few related models were proposed in [ , ] , and it may be worthwhile to revisit them in light of the newfound relation with the measure of complexity. finally, it would be interesting to perform a more extensive study of the models here with a larger set of viral genomes so that we have a fuller understanding of their statistical distribution and whether they can be useful in clustering and classifying purposes. motivated by the covid- pandemic, notwithstanding our limited dataset, in table below, we show the viral genome that is the closest neighbor to sars-cov- for each of the four model parameters at both levels of the genome and spike protein coding region. from table , we see that bat-rtg features most frequently and that apart from tgev and hku which infect pigs and humans respectively, the others are bat coronaviruses. collectively, they appear to be broadly compatible with the plausibility of the bat origin of sars-cov- , while to our knowledge, the association of sars-cov- with tgev and hku has never been made in literature. 
spike protein δ tgev bat-rtg κ bat-rtg hku m bat-rtg bat-cov- , hku b hku hku in this section, we collect several graphs useful for visualizing two particular trends observed: (i) the parameter m tends to vary inversely with size of genome/spike protein coding region, (ii) the linearized correlation function parameter κ and the half-width parameter b appears to be correlated. based on lectures delivered under the auspices of the dublin institute for advanced studies at trinity college evolution of long-range fractal correlations and /f noise in dna base sequences - bp periodicities in complete genomes reflect protein structure and dna folding universal /f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in dna sequences of the human genome multi-scale coding of genomic information: from dna sequence to genome structure and function wavelet analysis of dna sequences characterizing long-range correlations in dna sequences from wavelet analysis bacteria classification on power spectrums of complete dna sequences by self-organizing map long-range correlation properties of coding and noncoding dna sequences: genbank analysis periodicity in prokaryotic and eukaryotic genomes identified by power spectrum analysis coronavirus genomics and bioinformatics analysis statistics of dna sequences: a low-frequency analysis understanding long-range correlations in dna sequences national center for biotechnology information a maximum entropy principle for the distribution of local complexity in naturally occurring nucleotide sequences fractals and hidden symmetries in dna visualization and analysis of dna sequences using dna walks a pneumonia outbreak associated with a new coronavirus of probable bat origin periodicity of base correlation in nucleotide sequence a constant rate of spontaneous mutation in dna-based microbes rates of spontaneous mutation evolution of the mutation rate correlation between mutation rate and genome size 
in riboviruses: mutation rate of bacteriophage q moderate mutation rate in the sars coronavirus genome and its implications from molecular genetics to phylodynamics: evolutionary relevance of mutation rates across viruses complexities of viral mutation rates on the origin and continuing evolution of sars-cov- study of statistical correlations in dna sequences spatial /f spectra in open dynamical systems expansion-modification systems: a model for spatial /f spectra a quantitative genomic view of the coronaviruses: sars-cov acknowledgments i thank neal snyderman and rajesh parwani for stimulating discussions. this appendix collects the genbank accession id and names of the coronaviruses used in this work, which largely follows [ ] , our additions being sars-cov- , mers-cov and bat-ratg . these genomes can be freely downloaded from https://www.ncbi.nlm.nih.gov. for each genome, we exclude the poly(a) tail for our analysis. key: cord- - zyz hll authors: luban, jeremy; sattler, rachel; mühlberger, elke; graci, jason d.; cao, liangxian; weetall, marla; trotta, christopher; colacino, joseph m.; bavari, sina; strambio-de-castillia, caterina; suder, ellen l.; wang, yetao; soloveva, veronica; cintron-lue, katherine; naryshkin, nikolai a.; pykett, mark; welch, ellen m.; o’keefe, kylie; kong, ronald; goodwin, elizabeth; jacobson, allan; paessler, slobodan; peltz, stuart title: the dhodh inhibitor ptc arrests sars-cov- replication and suppresses induction of inflammatory cytokines date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zyz hll the coronavirus disease (covid- ) pandemic has created an urgent need for therapeutics that inhibit the sars-cov- virus and suppress the fulminant inflammation characteristic of advanced illness. here, we describe the anti-covid- potential of ptc , an orally available compound that is a potent inhibitor of dihydroorotate dehydrogenase (dhodh), the rate-limiting enzyme of the de novo pyrimidine biosynthesis pathway. 
in tissue culture, ptc manifests robust, dose-dependent, and dhodh-dependent inhibition of sars cov- replication (ec range, . to . nm) with a selectivity index > , . ptc also blocked replication of other rna viruses, including ebola virus. consistent with known dhodh requirements for immunomodulatory cytokine production, ptc inhibited the production of interleukin (il)- , il- a (also called il- ), il- f, and vascular endothelial growth factor (vegf) in tissue culture models. the combination of anti-sars-cov- activity, cytokine inhibitory activity, and previously established favorable pharmacokinetic and human safety profiles render ptc a promising therapeutic for covid- . the coronavirus disease pandemic is a serious threat to public health. severe acute respiratory syndrome coronavirus (sars-cov- ), the causative agent of covid- , is a positive-sense, single-stranded-rna virus of the coronaviridae family that shares . % sequence identity with sars-cov lu et al., ; wu et al., a; wu et al., b; zhou et al., b; zhu et al., ) . while both viruses are likely descendants of bat coronaviruses, the proximal source of this zoonotic virus is unknown (zhou et al., b) . since its first description at the end of , sars-cov- has spread around the globe, infecting millions of people, and killing hundreds of thousands (https://coronavirus.jhu.edu). an urgent medical need exists for effective treatments for this disease. in the early stages of covid- , the virus proliferates rapidly, and in some cases, triggers a cytokine storm -an excessive production of inflammatory cytokines (quartuccio et al., ; wiersinga et al., ) . this uncontrolled inflammation can result in hyperpermeability of the vasculature, multi-organ failure, acute respiratory distress syndrome (ards), and death (gupta et al., ; jose and manuel, ; quartuccio et al., ; wang et al., a; zhou et al., a) . acute respiratory distress syndrome is one of the leading causes of mortality in covid- (moore and june, ) . 
in addition, elevated levels of interleukin (il)- and il- are reported to be associated with severe pulmonary complications and death (pacha et al., ; ruan et al., ; russell et al., ) . a therapeutic that can inhibit sars-cov- replication while attenuating the cytokine storm would be highly beneficial for both the early and late stages of covid- . targeting the cellular de novo pyrimidine nucleotide biosynthesis pathway is one potential approach to treat both phases of covid- as viral replication and over-production of a subset of inflammatory cytokines are controlled by pyrimidine nucleotide levels (cheung et al., ; xiong et al., ) . dihydroorotate dehydrogenase (dhodh), the rate-limiting enzyme in this pathway, is located on the inner membrane of mitochondria and catalyzes the dehydrogenation of dihydroorotate (dho) to orotic acid, ultimately resulting in the production of uridine and cytidine triphosphates (utp and ctp) (figure ) (munier-lehmann et al., ) . the de novo pyrimidine biosynthesis pathway is also critical for the excessive production of a subset of inflammatory cytokines (including interferon-gamma, monocyte chemoattractant protein- [mcp- ], il- , and il- ) in both cultured cells and animal models of viral infection (cheung et al., ; xiong et al., ) . consistent with dhodh being mechanistically central to covid- , a recent multi-omics study of drug targets for viral infections prioritized dhodh inhibition as one of the top three mechanisms to consider for sars-cov- treatment . ptc is an orally bioavailable potent inhibitor of dhodh (cao et al., ) . treatment of cultured cells with ptc results in inhibition of dhodh activity, leading to increased levels of dho, the substrate for the dhodh enzyme (cao et al., ) . similar findings were documented in ptc -treated cancer patients, where administration of ptc resulted in increased blood levels of dho, indicating successful inhibition of dhodh in these patients (cao et al., ) . 
these results are consistent with increased dho levels observed in miller syndrome patients who carry mutations in the dhodh gene that reduce dhodh activity (duley et al., ) and with studies showing that ptc not only reduces vascular endothelial growth factor (vegf) levels in cultured cells, but also normalizes vegf levels in cancer patients (cao et al., ) . vegf is a stress-regulated cytokine whose levels are increased in these patients (plotkin et al., ). in studies of over human subjects, including healthy volunteers and oncology patients, ptc has manifested a favorable pharmacokinetic (pk) profile and has been generally well tolerated. the mechanism of action and the favorable pk and safety profiles of ptc support its investigation for use as a therapeutic for covid- . here, we evaluated the in vitro antiviral activity of ptc against a panel of rna viruses, with a specific focus on sars-cov- , as well as the drug's ability to attenuate the excessive production of inflammatory cytokines. our results indicate that ptc has considerable potential as a covid- therapeutic. schematic of the salvage and de novo pathways of pyrimidine biosynthesis. the salvage pathway recycles pre-existing nucleotides from food or other extracellular sources and does not appear to be sufficient to support viral rna replication.
the percentages in red denote the respective extents to which the relative levels of specific nucleotides are reduced during ptc treatment ( nm) of cultured ht fibrosarcoma cells for hours (cao et al., ) abbreviations: cad, complex of the following three enzymes: carbamoyl-phosphate synthetase , aspartate carbamoyl transferase, and dihydroorotase; cda, cytidine deaminase; cmp, cytidine monophosphate; ctp, cytidine triphosphate; dho, dihydroorotate; uck / , uridine-cytidine kinase and uridine-cytidine kinase ; udp, uridine diphosphate; ump, uridine monophosphate; upp, uridine phosphorylase; uprt, uracil phosphoribosyl transferase; utp, uridine triphosphate. the ability of ptc to inhibit sars-cov- replication was evaluated in vitro using two showed that ptc did not reduce cell number (figure a ). these results indicate that ptc reduces viral load in the absence of cytotoxicity. a. legend: (a) quantitative immunofluorescence analysis of sars-cov- -infected vero e cells treated with ptc . ptc was added at concentrations ranging from nm to µm minutes prior to infection of vero e cells with sars-cov- (usa-wa / ) at a multiplicity of infection (moi) of . . at hours post-infection, the cells were fixed, probed with antibodies against the nucleocapsid protein (np) of sars-cov- and stained with alexafluor conjugated secondary antibody. nuclei were stained with dapi. images acquired in the green (i.e., np of sars-cov- ) and the blue (dapi) channels were overlaid and are displayed as indicated (note images corresponding to the nm concentration of ptc were omitted for simplicity). scale bar, µm. (b) in order to quantify viral infection, images acquired in a were subjected to quantitative image analysis to determine the number of sars-cov- nucleocapsid positive cells at each concentration of ptc (i.e., , , , nm and µm). 
normalized percent numbers of nucleocapsid positive cells were plotted against log transforms of ptc concentrations and the ic was estimated using non-linear regression (goodness-of-fit r-squared = . ). data is plotted as the mean and standard deviation of independent replicates. in a second cell culture model, the efficacy of ptc in inhibiting sars-cov- replication was further evaluated. vero cells were infected with sars-cov- at a moi of . and ptc was added at increasing concentrations at various times from hours pre-infection to hours post-infection. viral titer was quantified by collecting the medium from the infected cell cultures and determining the tcid ( % tissue culture infectious dose). the % and % effective concentrations (ec and ec , respectively) of ptc were calculated. in parallel, the concentration of ptc necessary to reduce cell viability by % (cytotoxic concentration; the initial concentration range of ptc explored did not enable the determination of ec and ec for ptc against sars-cov- . therefore, a range of lower concentrations of ptc (from pm to µm) was evaluated. the ec and ec values were determined to be . nm and nm, respectively ( figure c and table ), which is consistent with the findings of the immunofluorescence experiments described in figure . the ec , determined as the midpoint of the logarithmic curve, was . nm. using a similar calculation based on the logarithmic curve, the ec was determined to be . nm. at the highest concentration of ptc tested ( µm), the cc was determined to be > µm ( figure d and table ) , indicating, similar to the immunofluorescence experiments described above, that ptc reduces viral load with little cell death. legend: (a) quantification of the dose dependent inhibition of sars-cov- replication by ptc . vero cells were pretreated for hours with the concentrations of ptc of , , or nm prior to the addition of sars-cov- . dashed line indicates the limit of quantification. (b). 
uridine blocks the ptc inhibition of sars-cov replication. vero cells were preincubated for or hours with ptc alone or with ptc plus µm uridine before addition of sars-cov- . (c) determination of ec and ec . vero cells were preincubated with ptc at . , . , . , . , . , . , , , , , , , , , , and nm prior to addition of sars-cov- . (d) ptc has little cytotoxic effect in vero cells. vero cells were preincubated for hours with ptc at . , . , . , . , . , . , . , . , and nm prior to addition of sars-cov- . (a-d) cells were inoculated with sars-cov- (usa-wa / ) at a moi of . . viral titer was determined by collecting the medium from the infected cells and assessing the concentration required for % infectivity of vero cells (the % tissue culture infectious dose assay or tcid ). cytotoxicity was evaluated by measuring intracellular atp (adenosine triphosphate) levels after hours of culture. data are plotted as the mean and standard deviation of independent replicates. the results in figures and indicate that ptc is a potent inhibitor of sars-cov- replication. considering its mechanism of inhibiting cellular de novo pyrimidine synthesis and the established role of pyrimidine nucleotides in viral replication, we investigated the spectrum of activity of ptc against a panel of rna viruses of both positive and negative genome polarity. ptc was found to have broad-spectrum antiviral activity against the rna viruses tested (table ) . ptc was a highly potent inhibitor of hepatitis c virus (hcv) replicon genotype b, poliovirus, ebola virus, and rift valley fever virus (ec ≤ nm). the cc values indicated minimal cytotoxicity in most of the viral infection models and were generally above the highest concentrations of ptc tested. 
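the dose-response quantities used above (ec values, cc values, and the selectivity index defined as the ratio of cc to ec) can be illustrated with a short sketch. it is a simplified stand-in, not the non-linear regression used in the study: it assumes a standard hill-type curve for the percent of control signal and estimates an effective concentration by log-linear interpolation between the two bracketing measurements. all function names and the example numbers in the usage note are hypothetical.

```python
import math


def hill_response(conc, ec50, hill=1.0, top=100.0, bottom=0.0):
    """Percent of control signal remaining at a given concentration (Hill model)."""
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)


def interpolate_ec(concs, responses, fraction=0.5):
    """Concentration at which the response has dropped by `fraction` of control.

    Assumptions: responses are percentages of an untreated control (100 = no
    inhibition), decrease with concentration, and bracket the target value.
    Interpolation is linear in log10(concentration).
    """
    target = 100.0 * (1.0 - fraction)
    pairs = sorted(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pairs, pairs[1:]):
        if r_lo >= target >= r_hi and r_lo != r_hi:
            t = (r_lo - target) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    raise ValueError("target response is not bracketed by the data")


def selectivity_index(cc50, ec50):
    """Ratio of the cytotoxic concentration to the effective concentration."""
    return cc50 / ec50
```

as a usage example with made-up values, responses generated by `hill_response` with an ec50 of 10 at concentrations 1, 10, and 100 give back an interpolated ec50 of 10, and `selectivity_index(1000, 2)` returns 500. when only a lower bound on cc is available, as with a "greater than" value, the resulting index is likewise only a lower bound.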
a: selectivity index is the ratio of cc to ec . b: values are mean ± standard deviation (sd). abbreviations: cc , compound concentration at which cell number is reduced by %; ec , compound concentration at which viral replication on a linear scale is inhibited by %; gfp, green fluorescent protein; hcv, hepatitis c virus replicon genotype b; piv- , parainfluenza type ; rsv, respiratory syncytial virus; rt-qpcr, quantitative reverse transcription pcr; sars-cov- , severe acute respiratory syndrome coronavirus ; tcid , tissue culture infectious dose %. ptc was originally identified as a compound that modulates expression of stress-regulated proteins like vegf and was only subsequently found to be a dhodh inhibitor (cao et al., ; weetall et al., ) . dhodh inhibitors have been shown to suppress virally induced inflammatory cytokine production (xiong et al., ) . the effect of ptc on cytokine production and release into the cell culture medium was assessed in the biologically multiplexed activity profiling (biomap) assay profiling platform. this system consists of co- fibroblasts, and endothelial cells, allowing for the measurement of physiologically relevant biomarker readouts of the activity of a test compound (berg et al., ; kunkel et al., a; kunkel et al., b) . each co-culture was stimulated to recapitulate relevant signaling networks that naturally occur in healthy human tissue and under certain disease states. for each system, cytotoxicity and cellular proliferation were assessed. ptc was added at increasing concentrations to each of the model systems one hour before cellular stimulation and remained present during the stimulation period for each co-culture. cytokine production was subsequently assessed. ptc was found to be a potent inhibitor of the production of a subset of immunomodulatory and inflammation-related cytokines that are associated with the stress response and poor prognosis in covid- ( figure ).
in the sag co-cell culture system, which models t-cell activation, ptc treatment resulted in significant decreases of mcp- , cd , and il- , as levels decreased after hours of stimulation (table ) . compared with the control, % and % reductions in mcp- and il- , respectively, were observed when cells were incubated with nm ptc , and a % decrease in cd was detected with nm ptc (all p values < . ). a % reduction in endothelial cellular proliferation as measured by srb (a measure of cell biomass) relative to control was seen following treatment with ptc at nm (p< . ) but not at or nm. in the bt co-cell culture system, which models chronic inflammatory conditions driven by b cell activation and antibody production, incubation of cells with nm ptc resulted in a significant reduction in the levels of soluble (s)igg, sil- a, sil- f, sil- , and stnfα released from the cells after hours of stimulation (range, % to %) (all p values < . ) ( figure and table ). the production of sigg was also significantly decreased by % following hours of stimulation (p < . ). similar to results obtained in the sag co-cell culture system, nm ptc significantly inhibited proliferation ( %) compared with control (p< . ). in the hdfsag co-cell culture system, which models chronic t cell activation and inflammation responses within a tissue setting, ptc was associated with significantly lower levels of secreted mcp- , il- , mmp- , sil- , sil- a, sil- f, sil- , sil- , stnfα and svegf compared with control (all p values < . ) ( figure and table ). ptc inhibition of cytokine production was particularly apparent for sil- f, sil- , and svegf, where a significant reduction in the levels of these cytokines was observed at as low as nm ptc (range % to %; all p values < . ). no significant differences between co-cultures treated with ptc or the control were seen with regard to proliferation or cytotoxicity. 
in the /th co-cell culture system, a model of mixed vascular th and th type inflammation that creates a pro-angiogenic environment promoting vascular permeability and recruitment of mast cells, basophils, eosinophils and t and b cells, significant inhibition of mcp- and sil- a production was seen relative to controls (all p values < . ) ( figure and table ). significant inhibition of mcp- production of % was observed when cells were incubated with nm ptc (p< . ). similar to results obtained with the hdfsag co-culture system, treatment with ptc did not result in a significant decrease in cell proliferation or increased cytotoxicity in the /th system. legend: the x-axis indicates the cytokines evaluated and the y-axis the log ratio of ptc treated vs. vehicle control. arrows indicate evaluation of cell viability, cytotoxicity, and proliferation, with the two thicker grey arrows showing values that reached statistical significance at nm ptc . biomarkers are annotated only when they differed significantly between ptc and vehicle control (p< . ), were outside of the significance envelope, and had an effect size > % [log ratio> . ].
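the log-ratio readout described in the legend can be related to a percent change with a short calculation. the sketch below assumes the ratio is expressed in base 10, which the text does not state, and the function names are hypothetical.

```python
import math


def log_ratio(treated, vehicle):
    """log10 of the treated-to-vehicle ratio for one biomarker readout.

    Assumption: base-10 logarithm; both readouts must be positive.
    """
    if treated <= 0 or vehicle <= 0:
        raise ValueError("readouts must be positive")
    return math.log10(treated / vehicle)


def percent_change(lr):
    """Percent change relative to vehicle implied by a log10 ratio."""
    return (10.0 ** lr - 1.0) * 100.0
```

under this assumption, a readout that is halved by treatment has a log ratio of about -0.301, which `percent_change` maps back to a 50% reduction.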
abbreviations: bcr, b-cell receptor; biomap, biologically multiplexed activity profiling; bt system, co-culture of cd + b cells and pbmc that utilizes bcr stimulation and sub-mitogenic tcr stimulation; hdfsag system, co-culture of human primary dermal fibroblasts and pbmc that is stimulated with sub-mitogenic tcr levels; ig, immunoglobulin; igg immunoglobulin g; il, interleukin; pbmc, peripheral blood mononuclear cell; sag system, co-culture of endothelial cells and pbmc stimulated with mitogenic levels of tcr ligands; tcr, t-cell receptor; /th system, co-culture of endothelial cells and th blasts stimulated with tcr ligands and cytokines; th , t helper type ; vegf, vascular endothelial growth factor . (megna et al., ; pacha et al., ) . results from the bioseek analysis described above using the bt and hdfsag co-cell culture systems demonstrated that ptc significantly inhibits the production of il- a and il- f. the ability of ptc to decrease il- a and il- f levels was further verified by assessing the effects of ptc on the production of il- a and il- f by th cells, a subset of cd t cells that produce these cytokines (martinez et al., ) . the ability of ptc to inhibit the production of secreted il- a and il- f was assessed in a model system in which pbmcs were stimulated with human t-activator cd /cd dynabeads in a culture containing cytokines and antibody to induce th differentiation. the covid- is characterized by an early stage of viral replication followed in some cases by overproduction of inflammatory cytokines. both the viral replication and the cytokine response are dependent upon pyrimidine nucleotides produced by the de novo biosynthesis pathway (cheung et al., ; xiong et al., ) . the results presented here demonstrate that ptc has a dual mechanism of action that inhibits viral replication and attenuates the production of a subset of inflammatory cytokines. ptc potently inhibited sars-cov- replication in vitro, with an ec of . nm ( . 
nm based on the logarithmic curve), and an antiviral selectivity of > , . inhibition of sars-cov- replication by ptc was prevented by the addition of exogenous uridine, which obviated the requirement for the de novo pyrimidine nucleotide synthesis pathway, consistent with the drug functioning as a dhodh inhibitor and with the fact that many rna viruses require high concentrations of pyrimidine nucleotides to transcribe or replicate their genomes (xiong et al., ) . accordingly, ptc also demonstrated broad-spectrum antiviral activity (table ) . a unique aspect of a dhodh inhibitor such as ptc is its ability to both inhibit viral replication and attenuate the cytokine storm (breedveld and dayer, ; li et al., ; xiong et al., ) . the cytokine response associated with covid- dramatically complicates the course of the infection in some patients and can result in ards with a high mortality rate. experience with other viral infections indicates that treating the excessive inflammatory immune events downstream of the infection is paramount to patient recovery (quartuccio et al., ) . studies presented here demonstrated that ptc inhibited production of multiple cytokines, including mcp- , il- , tnfα, vegf, and il- . inhibition of il- , il- , vegf, and igg by ptc may be of particular importance for treating covid- . il- appears to play a key role in excessive cytokine production resulting from viral infection and pulmonary complications in covid- (ruan et al., ; russell et al., ) . preliminary results from two small studies suggest that inhibition of the il- pathways by tocilizumab and siltuximab resulted in treatment benefit for covid- patients (gritti et al., ; xu et al., ) . comparable to what was found for mers-cov and sars-cov infections, il- a has also been associated with disease severity and lung injury in covid- (pacha et al., ) .
similarly, increased vegf levels promote vascular permeability and leakage, contributing to the pathophysiology of hypotension and pulmonary dysfunction (teuwen et al., ) . attenuation by ptc of the expression of these cytokines, which results from sars-cov- infection, may provide important and unique therapeutic benefits. the reduction in the levels of igg production by ptc may also be a clinically meaningful aspect of ptc treatment; elevated igg levels have been shown to be associated with the severity of covid- (tan et al., ; ). the novel dual mechanism of action of ptc distinguishes it from most other therapeutics being investigated in the clinic for the treatment of covid- , as many of these target either viral-specific processes or the immune response, but not both. due to its dual mechanism of action, ptc is expected to be effective in treating both early and later stages of covid- . this contrasts with direct-acting antivirals, which typically show their greatest effectiveness in the early phase of the disease. further, since ptc targets a cellular gene product (dhodh), and it is unlikely that viruses can overcome the need for pyrimidine nucleotides, the likelihood that the therapeutic effect of ptc will be compromised by the development of viral resistance is low. this may be particularly important for rna viruses whose high mutation frequency often promotes evasion of direct-acting antivirals. ptc is a highly potent inhibitor of sars-cov- replication, with an ec of . nm. several other dhodh inhibitors have also shown activity against sars-cov- (xiong et al., ) , including teriflunomide, an fda-approved treatment for relapsing forms of multiple sclerosis. teriflunomide has an ec of . µm , and similar to ptc , has also been shown to have broad-spectrum antiviral activity (xiong et al., ) . cov- infection have ec values reported to be in the micromolar range.
while these compounds have different mechanisms of action than ptc , it is notable that the reported ec of remdesivir is nm, that of chloroquine is . to . µm, and that of hydroxychloroquine is nm (choy et al., ; sanders et al., ; wang et al., b; xiong et al., ; yao et al., ) . the key to a successful covid- treatment is not only to have a potent molecule, but also to have a dose that can be delivered safely and that will sustain exposure in the blood to inhibit viral replication or infection. ptc is currently being evaluated to treat covid- in the phase / study ptc -vir- -cov (referred to as fite ) (https://clinicaltrials.gov/ct /show/nct ?term=ptc +and+covid&draw= &rank= ). the dose being used is based on a pk/pharmacodynamic relationship obtained in monkeys and in cancer patients with neurofibromatosis type (nf ), and the well-characterized pk profile from healthy volunteer and oncology studies (cao et al., ) (data on file). the ptc dose in the fite study is predicted to yield a cave of ng/ml and will thus be ~ -fold higher than the drug's ec and ~ -fold greater than the ec values against sars-cov- . the ptc levels would also be ~ -fold higher than the nm concentration that resulted in the -log reduction in the titer of sars-cov- in cultured vero cells. in summary, ptc is a highly potent inhibitor of sars-cov- replication that also suppresses production of a subset of pro-inflammatory cytokines, suggesting it has the potential to act through this dual mechanism to treat the viral and immune components of covid- . due to its ability to block viral replication and cytokine production, ptc may be effective in treating both early and later stages of the disease. this contrasts with direct-acting antivirals, which typically show their greatest effectiveness in the early phase of the disease (mcnicholl and mcnicholl, ; xiong et al., ), and with anti-inflammatory drugs that treat only the later phase of covid- (moore and june, ; quartuccio et al., ) .
it has been argued that it is important to combine antiviral therapy with immune suppressants as using only compounds that modify the immune response to treat the cytokine storms may make viral clearance more difficult (quartuccio et al., ) . importantly, ptc is orally bioavailable, has been extensively evaluated in human subjects, has well established pk and safety profiles and is generally well tolerated. these findings and prior clinical experience with ptc support the further development of this novel molecule for the treatment of covid- . the authors would like to thank leonid yurkovetskiy, mitchell white, and baylee heiden for thoughtful discussion and constructive advice and help with design and execution of sars-cov- experiments. poliovirus (ccl- ) was obtained from atcc. the received virus was amplified in hela cells, grown in dmem supplemented with % fbs and % penicillin and streptomycin solution and the titer evaluated by plaque assay on hela cell monolayers. vero cells were pretreated with increasing concentrations ( nm to µm) of ptc or ptc with µm uridine. after incubation at °c overnight, medium was removed, and african green monkey vero e cells were plated in -well plates at x cells per well. the activity of pt was evaluated in complex primary culture human cell systems in the biomap assays profiling platform (from bioseek, now part of eurofins). detailed protocols for the biomap primary human cell cultures and coculture systems have previously been published (berg et al., ; berg et al., ; melton et al., ; shah et al., ) . these systems consist of complex co-cultures of human peripheral blood mononuclear cells (pbmcs) pooled from healthy donors with early passage human primary fibroblasts, b-cells, or venular endothelial cells and are stimulated to recapitulate relevant signaling networks that naturally occur in human tissue or during specific disease states. in this study, four different co-culture systems were utilized. 
ptc was prepared in dmso (final concentration ≤ . %) and was added at the specified concentrations -hr before stimulation and remained in culture for -hrs (sag system), -hrs (hdfsag system), -hrs (bt system, soluble cytokines), or -hrs (bt system, secreted igg). the sag system is a co-culture of human umbilical vein endothelial cells (huvec) and peripheral blood mononuclear cells (pbmc) stimulated with mitogenic levels of t cell receptor (tcr) ligands to drive the polyclonal t cell stimulation (defined as x tcr ligand strength), expansion and elevated cytokine release that is characteristic of acute t cell immune activation and inflammation conditions. huvec cells were stimulated with cytokines (interleukin [il]- β, ng/ml; tnf-α, ng/ml; and ifn-γ, ng/ml) and the superantigen staphylococcal enterotoxin f from staphylococcus aureus [sag ( ng/ml)]. the sag system also captures the paracrine signaling between activated immune cells and inflamed endothelial cells that is relevant to vascular inflammation. the bt system is a co-culture of cd + b cells and pbmc that utilizes b cell receptor (bcr) stimulation and sub-mitogenic tcr stimulation (defined as / th tcr ligand strength) to capture the t cell dependent b cell activation and class switching that occurs in germinal centers. the bt system models diseases and chronic inflammatory conditions driven by b cell activation and antibody production. the hdfsag system is a co-culture of human primary dermal fibroblasts and pbmc that is stimulated with sub-mitogenic tcr levels (defined as / th tcr ligand strength) to model chronic t cell activation and inflammation responses within a tissue setting.
the /th system is a co-culture of huvec cells and th blasts stimulated with tcr ligands ( / th ligand strength) and cytokines to model mixed vascular t helper type (th ) and th type inflammation and create a pro-angiogenic environment that promotes vascular permeability and recruitment of mast cells, basophils, eosinophils and t and b cells. for all co-cell culture systems, ptc was added to cells at a concentration of . , . , , and nm hr prior to stimulation of the cells and was present during the -hr stimulation period. the effects of ptc on the levels of various biomarkers, including cytokines, growth factors, expression of surface molecules, and cell proliferation, were monitored. the levels of protein biomarkers were measured by elisa. cytotoxicity was quantified by alamar blue reduction, which measures cell viability by evaluating the level of oxidation during respiration. cell viability was quantified using the sulforhodamine b (srb) assay. srb binds cellular protein components and measures the total biomass; proliferation was measured by counting cells. pbmcs (allcells, inc.) were stimulated with t cell activator cd /cd dynabeads, which are coated with anti-cd and anti-cd antibodies (dynabeads; gibco, cat# d), in the presence of cytokines and antibodies to promote th differentiation while blocking th and th differentiation. the final concentration of cytokines was as follows: il- ( ng/ml), il- β il- ( ng/ml), and the neutralizing antibodies anti-il- ( μg/ml) and anti-ifn-γ ( μg/ml) in culture medium. subsequently, µl of washed dynabeads or medium were added to each well and the cells were incubated at °c, % co for to hours. following incubation, µl of x cytokine/ab mixture (described above) was added to each well (for the "no stimulating control", µl medium was added) and µl of ptc , ptc- (the inactive enantiomer of ptc ) (cao et al., ), brequinar, a , or medium was added. cells were incubated at °c, % co for a total of hours.
after hours, the medium was collected and analyzed by an enzyme-linked immunosorbent assay (elisa) kit (r&d system: il cat#dy - and il f cat#dy - ) according to manufacturer's instruction characterization of compound mechanisms and secondary activities by biomap analysis chemical target and pathway toxicity mechanisms defined in primary human cell systems building predictive models for mechanism-ofaction classification from phenotypic assay data sets leflunomide: mode of action in the treatment of rheumatoid arthritis targeting of hematologic malignancies with ptc , a novel potent inhibitor of dihydroorotate dehydrogenase with favorable pharmaceutical properties broad-spectrum inhibition of common respiratory rna viruses by a pyrimidine synthesis inhibitor with involvement of the host antiviral response remdesivir, lopinavir, emetine, and homoharringtonine inhibit sars-cov- replication in vitro elevated plasma dihydroorotate in miller syndrome: biochemical, diagnostic and clinical implications, and treatment with uridine use of siltuximab in patients with covid- pneumonia requiring ventilatory support extrapulmonary manifestations of covid- clinical features of patients infected with novel coronavirus in covid- cytokine storm: the interplay between inflammation and coagulation. 
the lancet respiratory medicine an integrative biology approach for analysis of drug action in models of human vascular inflammation rapid structure-activity and selectivity analysis of kinase inhibitors by biomap analysis in complex human primary cell-based models inhibitory effect of the antimalarial agent artesunate on collagen-induced arthritis in rats through nuclear factor kappa b and mitogen-activated protein kinase signaling pathway genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding regulatory t cells and th cells in viral infections: implications for multiple sclerosis and myocarditis neuraminidase inhibitors: zanamivir and oseltamivir may il- have a role in covid- infection? regulation of il- a production is distinct from il- f in a primary human cell co-culture model of t cell-mediated b cell activation cytokine release syndrome in severe covid- on dihydroorotate dehydrogenases and their inhibitors and uses covid- : a case for inhibiting il- ? hearing improvement after bevacizumab in patients with neurofibromatosis type urgent avenues in the treatment of covid- : targeting downstream inflammation to prevent catastrophic syndrome clinical predictors of mortality due to covid- based on an analysis of data of patients from wuhan associations between immune-suppressive and stimulating drugs and novel covid- -a systematic review of current evidence pharmacologic treatments for coronavirus disease (covid- ): a review mechanisms of skin toxicity associated with metabotropic glutamate receptor negative allosteric modulators viral kinetics and antibody responses in patients with covid- . 
medrxiv covid- : the vasculature unleashed clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro phase study of safety, tolerability, and pharmacokinetics of ptc , an inhibitor of stress-regulated protein translation transmission, diagnosis, and treatment of coronavirus disease (covid- ): a review genome composition and divergence of the novel coronavirus ( -ncov) originating in china a new coronavirus associated with human respiratory disease in china novel and potent inhibitors targeting dhodh, a rate-limiting enzyme in de novo pyrimidine biosynthesis, are broad-specturm antiviral against rna viruses including newly effective treatment of severe covid- patients with tocilizumab in vitro antiviral activity and projection of optimized dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus (sars-cov- ) clinical infectious diseases : an official publication of the infectious diseases society of america multi-omics study revealing tissue-dependent putative mechanisms of sars-cov- drug targets on viral infections and complex diseases. medrxiv clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study a pneumonia outbreak associated with a new coronavirus of probable bat origin a novel coronavirus from patients with pneumonia in china key: cord- - wfy f authors: gobeil, sophie m-c.; janowska, katarzyna; mcdowell, shana; mansouri, katayoun; parks, robert; manne, kartik; stalls, victoria; kopp, megan; henderson, rory; edwards, robert j; haynes, barton f.; acharya, priyamvada title: d g mutation alters sars-cov- spike conformational dynamics and protease cleavage susceptibility at the s /s junction date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: wfy f the sars-cov- spike (s) protein is the target of vaccine design efforts to end the covid- pandemic. despite a low mutation rate, isolates with the d g substitution in the s protein appeared early during the pandemic, and are now the dominant form worldwide. here, we analyze the d g mutation in the context of a soluble s ectodomain construct. cryo-em structures, antigenicity and proteolysis experiments suggest altered conformational dynamics resulting in enhanced furin cleavage efficiency of the g variant. furthermore, furin cleavage altered the conformational dynamics of the receptor binding domains (rbd) in the g s ectodomain, demonstrating an allosteric effect on the rbd dynamics triggered by changes in the sd region, that harbors residue and the furin cleavage site. our results elucidate sars-cov- spike conformational dynamics and allostery, and have implications for vaccine design. highlights sars-cov- s ectodomains with or without the k p, v p mutations have similar structures, antigenicity and stability. the d g mutation alters s protein conformational dynamics. d g enhances protease cleavage susceptibility at the s protein furin cleavage site. cryo-em structures reveal allosteric effect of changes at the s /s junction on rbd dynamics. the severe acute respiratory coronavirus (sars-cov- ) belongs to the b-coronavirus family of enveloped, positive-sense single stranded rna viruses, and has one of the largest genomes among rna viruses (de wit et al., ) . of the seven known coronaviruses that infect humans, four (hcov- e, hcov-oc , hcov-nl , cov-hku ) circulate annually causing generally mild respiratory symptoms in otherwise healthy individuals, while the sars-cov- and middle east respiratory syndrome coronavirus (mers-cov) , that are closely related to sars-cov- , have resulted in the - sars and mers epidemics (zumla et al., , respectively. 
the ongoing pandemic of coronavirus disease of , is a global public health emergency with more than million cases and million deaths recorded worldwide (dong et al., ) (https://coronavirus.jhu.edu). the surface of the sars-cov- is decorated with the spike (s) glycoprotein turonova et al., ) that is the target of most current vaccine development efforts sempowski et al., ) . in its prefusion conformation the sars-cov- s protein is a large homo-trimeric glycoprotein forming a crown (from the latin corõna) at the surface of the virus capsid. each s protomer is subdivided into two domains, s and s , which are delimited by a furin cleavage site at residue - (figure ) . the s domain comprises the n-terminal domain (ntd), an ntd-to-rbd linker (n r), the receptor binding domain (rbd), and subdomains and (sd and sd ). the s domain contains a second protease cleavage site (s ') followed by the fusion peptide (fp), heptad repeat (hr ), the central helix (ch), the connector domain (cd), heptad repeat (hr ), the transmembrane domain (tm) and a cytoplasmic tail (ct) (figure ) . the s domain is responsible for recognition and binding to the host-cell angiotensin-converting enzyme (ace ) receptor. the s domain is responsible for viral-host-cell membrane fusion and undergoes large conformational changes (hoffmann et al., a) , but only upon furin cleavage and further essential processing by cleavage at the s ' site by tmprss and related proteases (bestle et al., ; hoffmann et al., b; matsuyama et al., ) . previous reports have demonstrated the central role of the dynamics of the rbd domains between a "closed" (or all rbd-down receptor inaccessible conformation) and "open" (or rbd-up) conformations for recognition and binding to the host cell ace receptor (gui et al., ; shang et al., ; yuan et al., ) . 
since the early stages of the covid- pandemic, virus evolution has been followed by large-scale sequencing of the virus genomes isolated from patients, and several mutations that arose and propagated within different populations have been identified even though the virus has genetic proofreading mechanisms (elbe and buckland-merrett, ; korber et al., ) . the d g mutation in particular has attracted attention since it has quickly become the dominant strain of sars-cov- circulating worldwide (korber et al., ) . the d g mutation of the s protein has been associated in numerous reports with increased fitness and/or infectivity of the virus (korber et al., ; li et al., ; weissman et al., ) . cryo electron microscopy (cryo-em) structures of the s glycoprotein ectodomain have revealed that d is a surface residue in the vicinity of the furin cleavage site. mutation of this residue to a glycine is expected to disrupt a critical interprotomer hydrogen bond involving t of the s domain (korber et al., ) and resulting in a shift in the observed equilibrium between the open and closed state of the s protein ectodomain (johnson et al., ; weissman et al., ; yurkovetskiy et al., ) ( figure ). most structures of the sars-cov- s ectodomain currently available include two mutations, one to disrupt the furin cleavage site (rrar to gsas = s-gsas), and a double proline mutation (pp) of residues - , designed to prevent conformational change to the post-fusion state (wrapp et al., ) . originally designed for the mers s protein (pallesen et al., ) , insertion of two consecutive pro mutations at the junction of the hr and ch regions stabilized the pre-fusion conformation of the mers, sars and hcov-hku spikes, increased protein expression, and immunogenicity for the mers s protein (pallesen et al., ) . 
based on these prior data, introduction of two consecutive proline residues at the beginning of the central helix was postulated as a general strategy for retaining b-coronavirus s proteins in the prefusion conformation. thus, the pp mutations were carried over to the sars-cov- ectodomain (wrapp et al., ) that is currently widely used in the field for vaccine and structural studies, and is also the component of a vaccine candidate . although shown to stabilize the pre-fusion conformation of other coronaviruses, the effect of the pp insertion has not been systematically studied for the sars-cov- s ectodomain. with the goal of investigating the biophysical and structural consequences of the d g mutation, and to prevent the engineered pp mutations from confounding our observations, we produced two sars-cov- s ectodomain constructs with the native k and v residues, incorporating either a d or a g at position (figure ) . the rrar sequence in the furin cleavage site was replaced by a gsas sequence thus rendering the s constructs furin-cleavage deficient. to probe the effect of the d g substitution on furin cleavage of the s protein, we either reinstated the native furin sequence or replaced it with an exogeneous hrv c proteolysis signal. we determined the cryo-em structures of the uncleaved d and g s ectodomains, as well as the structure of the fully cleaved g s ectodomain of the currently globally dominant sars-cov- . our results demonstrate the effect of the d g substitution on the conformational dynamics and furin cleavage susceptibility of the s ectodomain, and reveal insights into the allostery between the rbd and distal regions of the s protein. 
while the sars-cov- s ectodomain construct that includes mutations of residues k and v , between the hr and ch subdomains (s domain), to prolines (pp) (named s-gsas/pp in this study) (figure ) is widely used in the field, the origin of this pp construct was based upon the stabilization of the pre-fusion conformation of other coronavirus spikes (pallesen et al., ; walls et al., ; wrapp et al., ) . here, we generated an analogous s ectodomain construct that had the native k and v residues (named s-gsas) (figure ). in our f expression system (see methods for details), both the s-gsas/pp and s-gsas constructs expressed at similar levels, yielding about mg final protein per l of culture. both proteins also showed similar migration profiles on sds-page and by size exclusion chromatography (sec) on a superose column (figure a, b) . similar to the s-gsas/pp construct henderson et al., ; wrapp et al., ) , s-gsas showed - % intact prefusion spike trimers by negative stain electron microscopy (nsem) (figure c ). this finding is in contrast to previous observations for mers and the sars-cov- ectodomains, which showed a mixture of the prefusion and postfusion conformations unless the pp mutation was included (pallesen et al., ) . binding of s-gsas and s-gsas/pp measured by elisa to ace and cr , both requiring an rbd-up conformation, ab and ab , two antibodies isolated from a covid- convalescent donor with epitopes mapping to the ace binding site and s domain respectively, and g , binding to a quaternary s glycan epitope, were all nearly identical demonstrating that both constructs showed similar antigenic behavior ( figure d ). using differential scanning fluorimetry to measure the spike thermostability, we found the s-gsas and s-gsas/pp ectodomains showed similar melting temperatures ( figure e ). 
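as an illustration of how a melting temperature can be read from a thermal unfolding curve such as those compared above, the sketch below takes the temperature at which the fluorescence signal rises most steeply (the maximum of the first derivative). this is one common convention and an assumption here; the text does not state how the melting temperatures were derived. the function name and the synthetic sigmoidal curve are hypothetical and are not data from this study.

```python
import math


def melting_temperature(temps, signal):
    """Estimate Tm as the midpoint of the interval with the steepest signal rise.

    temps  : strictly increasing temperatures (degrees C)
    signal : fluorescence readings, one per temperature
    Uses a simple finite difference between consecutive readings.
    """
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    best_slope = None
    best_tm = None
    for i in range(len(temps) - 1):
        slope = (signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i])
        if best_slope is None or slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2.0
    return best_tm


# Synthetic two-state unfolding curve (illustrative only): a sigmoid whose
# inflection point is placed at 60.5 degrees C, sampled every 1 degree.
example_temps = [float(t) for t in range(40, 81)]
example_signal = [1.0 / (1.0 + math.exp(-(t - 60.5) / 2.0)) for t in example_temps]
example_tm = melting_temperature(example_temps, example_signal)
```

real curves are noisier than this synthetic one, so smoothing before differentiation, or fitting a sigmoid, would usually be preferable to a raw finite difference.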
we next solved cryo-em structures of the s-gsas ectodomain (figures f-h , s -s , table s ), to compare with the s-gsas/pp structures (walls et al., ; wrapp et al., ) and to visualize the impact that the engineered pp mutations had on the structure of the sars-cov- spike ectodomain. two populations of the s-gsas s ectodomain were identified in the cryo-em dataset -a -rbd-up (or open) and a -rbd-down (or closed) conformation ( figure f and table s ). both structures were similar to the corresponding structures of s-gsas/pp (walls et al., ) , with overall rmsds of . Å and . Å for the -up and -down structures, respectively. in the region around the pp mutations, we found the s-gsas structures to be similar to the corresponding s-gsas/pp structures ( figure h ). in the s-gsas -rbd-up structure, we observed that the k side chain was appropriately positioned to make an interprotomer salt bridge with the d residue of the rbd of the adjacent protomer, an interaction that would be abrogated in the pp construct. the corresponding residues in the mers s protein, v and l , are non-polar, and the adjacent protomers are too far away to interact with these residues (figure s ). in the sars-cov- s protein cryo-em structure (pdb xlr, x b) the residues d -d (equivalent to sars-cov- d -d ) lie further from k , suggesting that this putative salt bridge interaction may be more transient in sars-cov- . overall, our data show that for the sars-cov- s ectodomain, the s-gsas construct showed structural, antigenic and stability behavior similar to that of the s-gsas/pp construct that included the k p and v p mutations at the junction of the ch and hr regions. while these and analogous mutations had proved beneficial for the expression and stability of other covs (pallesen et al., ) , for the sars-cov- s protein other compensating interactions may help confer stability to the pre-fusion form in the absence of the pp mutations.
for the rest of this study we have used the s-gsas construct as the platform for introducing mutations and other modifications of interest. to understand the molecular details of the spike d g mutation that arose and quickly dominated circulating sars-cov- isolates globally, we sought to assess the impact of the d g mutation on the structure and antigenicity of the sars-cov- s ectodomain. the d g mutated s-gsas construct (s-gsas/d g), yielded an average of ~ mg of purified protein per l of culture (n = ). the sds-page, sec and dsf profiles of the s-gsas/d g ( figure a ) were similar to that of the s-gsas s ectodomain (figure a, b) . nsem of the s-gsas/d g s ectodomain revealed typical and well-dispersed pre-fusion s particles ( figure b ). to visualize structural details at higher resolution, we determined the cryo-em structures of s-gsas/d g construct ( figure c -e, table s , figures s -s ) . two major populations of the s ectodomain were identified in the cryo-em dataset -one population with one rbd in the "up" or ace receptor accessible conformation and the other with all three rbds in the figure c ). this is consistent with our previous observations made with nsem data that showed an increase in the rbd-up population for the s-gsas/d g s ectodomain (weissman et al., ) . our results show that the d g mutation in the sd domain, even though distal from the rbd region, has an allosteric effect on rbd dynamics leading to alteration of up/down rbd dispositions. to understand the nature of this allostery, we examined changes in the s protein that accompany the up and down rbd transition (figure ) by comparing the rbd-up chain in the -rbd-up structure to the down chains in the -up and the -down structures ( figure c ). in each s protein protomer, the polypeptide chain folds into domains as it traverses the length of the s subunit and enters the s subunit i.e. 
the ntd (residues - ) followed by the rbd (residues - ), the sd (residues - ) and sd (residues - ) domains ( figure a ). the ntd and rbd are connected via a -residue linker spanning residues - (named n r) that stacks against the sd and sd domains ( figure a-d) , as it makes its way from the ntd to the rbd, essentially connecting all the individual domains in the s subunit, and forming "super" subdomains sd ' and sd ', respectively . upon overlaying the protomers with the rbd in the up position with the protomers with their rbds in the down position by using the s subunit residues - for superpositions, we found that the down-to-up rbd motion is accompanied by a rigid body movement of the sd ' domain resulting in a shift of up to ~ . Å of the sd domain ( figure d ), relative to its position in the rbd-down protomers and a shift of up to a Å in the n r linker as it hinges to enter into the rbd. this results in a ~ ° tilt of residues - of the n r linker region that forms part of the sd ' super subdomain, while residues - of the linker that associate with the sd subdomain remained virtually unmoved, showing only a slight tilt in the b beta strand that accompanied large movements in the rbd and adjoining sd ' domain ( figure d ). indeed, the sd ' super subdomain that harbors the d g mutation appears to form a conformationally invariant anchor with the mobile rbd and ntd domains at either end ( figure d) . additionally, the s subunit remains invariant between the different protomers showing that the large movements that occur in the s subunit are effectively arrested by the sd ' super subdomain conformationally invariant anchor. these observations are mirrored in the difference distance matrices (ddm) comparing the rbd-up and down chains ( figure e and figure s ). 
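a difference distance matrix of the kind referred to here can be sketched in a few lines: compute all pairwise cα-cα distances within each structure, then subtract one matrix from the other element by element. because only internal distances are used, no superposition is needed. the code below is a minimal illustration under stated assumptions: the function names are hypothetical, the two coordinate lists are assumed to contain equivalent residues in the same order, and the three-point toy coordinates are invented for the example rather than taken from any structure in this study.

```python
import math


def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between 3D points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def difference_distance_matrix(coords_a, coords_b):
    """Return D_b - D_a for two equally long lists of C-alpha coordinates.

    Entry [i][j] is positive when residues i and j are further apart in
    structure b than in structure a, and negative when they are closer.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of residues")
    dist_a = distance_matrix(coords_a)
    dist_b = distance_matrix(coords_b)
    n = len(dist_a)
    return [[dist_b[i][j] - dist_a[i][j] for j in range(n)] for i in range(n)]


# Toy example (invented coordinates): residue 3 swings away from a straight
# line while residues 1 and 2 stay fixed relative to each other.
toy_down = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_up = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
toy_ddm = difference_distance_matrix(toy_down, toy_up)
```

in the toy matrix, the entry for the rigid pair (residues 1 and 2) is zero while the entry for residues 1 and 3 is non-zero, which is how regions that move together and regions that move relative to each other are told apart in such a plot.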
ddm analyses (richards and kundrot, ) provide superposition-free comparisons between a pair of structures by calculating the differences between the distances of each pair of cα atoms in a structure and the corresponding pair of cα atoms in the second structure. the ddm analysis not only shows the large movement in the rbd region and the movement in the ntd, it also captures the movement in the n r linker and the sd domain observed in the structures. overall, these analyses show that the d g mutation is acquired within a key region encompassing the sd domain and an additional β-strand contributed by residues - of the n r linker that forms a region of relative structural stillness separating the mobile ntd and rbd, as well as isolating the motions in s from the s subunit. this distal mutation altering rbd conformational dynamics shows that small changes in this region can translate into large allosteric effects, and suggests a role for the sd domain in modulating rbd dynamics. in addition to the d g mutation, the sd subdomain also harbors a furin cleavage site (residues - ) that separates the s and s subunits (figure ) . cleavage of the s protein by furin at this site is essential for virus transmission (shang et al., ) . the proximity of the d g mutation to the furin cleavage site and the increased flexibility observed in the cryo-em dataset of the s-gsas/d g ectodomain ( figure c -e), prompted us to examine the effect of the d g substitution on furin cleavage. since our expression system (i.e. freestyle cells) endogenously expresses furin, in order to obtain uncleaved spike that we could then test for protease cleavage in vitro, we engineered a hrv c site ( amino acids long) to replace the furin cleavage site ( amino acids long) at the s /s junction, resulting in the s-hrv c and s-hrv c/d g s ectodomain constructs ( figure a ).
both proteins expressed in f cells but at lower yields compared to the s-gsas constructs ( µg/l and µg/l for the s-hrv c and s-hrv c/d g proteins, respectively). sec and sds-page profiles were similar to the s-gsas and s-gsas/d g proteins confirming well-folded and homogeneous spike preparations ( figure a , b). nsem micrographs showed characteristic kite-shaped particles for the pre-fusion s protein, and d-classification of particles from nsem revealed well folded spikes, further confirming that s-hrv c spikes retained the overall fold and structure of the s-gsas spikes (figure c, d) . to test the susceptibility of the hrv c site engineered at the junction of the s and s subunits to protease cleavage, we incubated the purified s-hrv c and s-hrv c/d g spikes with the hrv c enzyme and followed the digestion by analyzing aliquots taken at different time-points by sds-page ( figure e -g). we found that the digestion of the s-hrv c/d g spike ( figure f -g) proceeded at a faster rate than that of the s-hrv c spike ( figure e-g) with the s-hrv c/d g spike almost % digested within the first minutes of incubation, whereas, the s-hrv c constructs only achieved % of cleavage after hours, and a substantial portion remained uncleaved even upon addition of more enzyme followed by additional hours of incubation. these results suggested that the d g mutation increased the susceptibility of protease cleavage at the s /s junction. to study the effect of the d g substitution on protease cleavage at the s /s junction with the native furin site, we next generated spike ectodomains constructs where the furin site was restored to the native sequence, resulting in two constructs named s-rrar and s-rrar/d g ( figure a) . the proteins were expressed and purified using our usual methodology for the furin cleavage-deficient constructs (see methods). the sec profiles ( figure a ) showed a higher proportion of the first higher molecular weight peak. 
a second peak eluting at a similar molecular weight as the s-gsas spike (at ~ . ml elution volume) was used for further characterization. the sec profile of the s-rrar spike preparation showed small populations of lower molecular weight peaks that were not observed for the s-rrar/d g protein ( figure a ). on sds-page (figure b) , the peak corresponding to the s ectodomain showed the s-rrar construct as having one major band at the molecular weight corresponding to the spike monomer and some fainter bands corresponding to the s and s subunits while the s-rrar/d g protein showed a band corresponding to the spike monomer and the two bands corresponding to the molecular weights of the s and s subunits. the smaller molecular weight bands corresponding to the s and s subunits were in higher proportions in the s-rrar/d g spike preparation compared to the s-rrar preparation. in summary the sec and sds-page profiles showed that, although both the s-rrar and s-rrar/d g constructs were cleaved by endogeneous furin (figure b ) during protein expression the s and s subunits remained together in solution ( figure a) . consistent with the enhanced cleavage observed for the s-hrv c/d g spike relative to the s-hrv c spike, in the furin-site restored spikes we observed a higher proportion of cleaved spike in s-rrar/d g relative to s-rrar, suggesting that the d g mutation makes the spike more susceptible to furin cleavage. nsem of the purified s-rrar ( figure c ) and s-rrar/d g ( figure d) confirmed that both of these furin site-restored spikes formed well-folded spike ectodomains. we next digested the sec isolated fractions of the s-rrar and s-rrar/d g ectodomains ( figure a-d) in vitro by adding furin ( figure e ). as observed for the s-hrv c constructs, the d version of the spike was less susceptible to cleavage than the g mutant for the same incubation time with the enzyme. 
sec purification of the fully digested s-rrar/d g ectodomain revealed a peak corresponding to the ectodomain ( figure f) . on sds-page, this peak migrated as two distinct bands corresponding to the s and s domains thus confirming isolation of only the cleaved portion of the protein ( figure g ). nsem showed fully folded ectodomains for the furin digested and sec purified s-rrar/d g protein ( figure h ). in summary, these results show that acquisition of the d g mutation the s protein sd domain resulted in increased furin cleavage of the s ectodomain. to visualize the structure of the furin-cleaved s ectodomain at atomic level resolution, we obtained a cryo-em dataset, and resolved two populations of the furin-cleaved s ectodomain -a -rbd-up and a -rbd-down population ( figure a, figure s and s and table s ). we observed an increased proportion of the -rbd-down s compared to the uncleaved d g s ectodomain, thus reporting a change in the rbd conformational dynamics upon furin cleavage. consistent with this result, we observed reduced binding to ligands such as ace- and cr that require the rbd to be in the up conformation for binding ( figure b ). as expected, decrease in binding was also observed with antibody , isolated from a convalescent covid- donor, with an epitope overlapping with the ace binding site. antibody g that binds a quaternary glycan epitope in the s subunit showed a small decrease in binding with the furin-cleaved s ectodomain, whereas another covid- -derived s antibody showed increase in binding with the furin-cleaved s ectodomain. we compared the different protomers in the two structures by overlaying three protomers in the asymmetric -rbd-up structure and one protomer from the symmetric -rbd-down structure using residues - (comprising the ch and hr regions) for superposition ( figure c ). 
similar to observations made with the s-gsas/d g s ectodomain structure, the rbd up/down motion in the furin-cleaved g s ectodomain was associated with a movement in the sd domain and in the region of the rbd-to-ntd linker that joined the sd b sheet ( figure c, s b) . as observed for s-gsas/d g, the sd domain showed little conformational change and formed a stable motif anchoring the mobile ntd and rbd domains. these observations reinforce the divergent roles of the sd and sd domains in rbd motion. we next examined the region of the sd domain proximal to the ntd, and asked whether we could detect any structural changes in this region and, if so, whether these could be related to ntd motion. in the symmetric -rbd-down s ectodomain, all ntds are identical, each stacking against the down rbd of the adjacent protomer. in the asymmetric -rbd-up structure, each ntd was distinct. to distinguish between these, we named the ntds as follows: the ntd that was part of the protomer with the rbd in the up conformation was named ntd . ntd stacked against a down rbd that contacted the up-rbd at one end and the second down-rbd at the other. the ntd stacked against a down-rbd that contacted a down-rbd at one end, and the ntd had the least amount of rbd contact by virtue of contacting the up-rbd ( figure a ). observing the ntd-proximal region on the sd domain (marked by a dotted square on figure c ) that also contacted the rbd-to-ntd linker, we noted shifts in the t - loop between the different protomers. while the shifts were modest (with a maximal displacement of . Å), identical trends were observed in the -rbd-up structures of the s-gsas, s-gsas/d g and furin-cleaved s-gsas/d g s ectodomains, suggesting that this region of the sd domain responds to ntd motion and adopts a different conformation depending on the ntd environment ( figure d ). 
thus, these data provide further evidence for allostery in the s protein, with changes in the sd domain impacting the rbd conformational dynamics. while the sd domain remains almost structurally invariant, we observe small but reproducible changes in sd loops in response to rbd/ntd movement suggesting that small changes in the sd region may translate to large motions in the rbd/ntd region. stabilized ectodomain constructs have proven to be useful tools to understand conformational dynamics of cov s proteins. in particular, these have enabled high-resolution structural determination and atomic level understanding of the s ectodomain. they also are key components in vaccine development pipelines. the structural similarities in the s proteins of diverse covs have often enabled quick translation of structural rules and ideas from one cov s ectodomain to another. indeed, after the onset of the recent and ongoing covid- pandemic, the sars-cov- s ectodomain could be rapidly stabilized and structurally characterized by exploiting its similarities with other covs and following strategies that had proved successful previously pallesen et al., ; wrapp et al., ) . some of these stabilization strategies, such as introduction of proline residues in the fusion subunit to prevent transition from pre-to post-fusion, have been successful in stabilizing the pre-fusion conformation of diverse class i fusion proteins including rsv f (krarup et al., ) , hiv- env (sanders et al., ) , ebola and marburg gp (rutten et al., ) , influenza ha (qiao et al., ) and lassa gpc (hastie et al., ) . while the underlying hypothesis for the stabilization of the s ectodomain was that introduction of proline residues at the junction of the ch and hr helices would arrest conformational transition to the post-fusion form, we found that even without the pp mutations, the sars-cov- s ectodomain retained its pre-fusion form. 
moreover, even following furin cleavage the s ectodomain retained its pre-fusion conformation. these differences between the observed behavior of the sars-cov- s relative to other covs suggest that even though they retain similar overall topology and structural folds, there are differences between these covs that profoundly affect their structural and biological properties. studying and accounting for these will be essential not only to understand sars-cov- but also to appreciate the nature and origin of these differences between covs for anticipating, preparing for and rapidly combating future cov pandemics. viral surface proteins that are involved in receptor-binding-mediated cellular entry typically consist of flexible and moving parts that exhibit large conformational changes. while this conformational flexibility is necessary for function, structural checkpoints are required to prevent premature activation and destabilization or unfolding of the protein structure. conformationally-silent structural islands provide the necessary stabilizing anchors for adjacent regions undergoing large motions. in this study we have identified the sd domain in the sars-cov- s protein as such a conformational anchor that is spatially interspersed between the highly mobile ntd and rbd regions, while itself remaining relatively invariant in its conformation. this conformational invariability of the sd subdomain is reminiscent of the beta sandwich structure in the hiv- envelope glycoprotein that connects and anchors a mobile layered architecture of the gp inner domain (pancera et al., ) . the conformationally invariant sd also serves to contain the movements of the rbd and ntd to the s subunit, such that the s subunit was unchanged between the various rbd "up" and "down" protomers ( figure and figure s ). 
this suggests a role for the sd domain in preventing premature triggering due to the stochastic up/down rbd motions in the sars-cov- s protein, as well as the importance of downstream events such as ace receptor engagement and tmprss protease cleavage (bestle et al., ; hoffmann et al., b; matsuyama et al., ) in orchestrating the full extent of pre-to post-fusion transformation. in this study, we also assigned a key role to the n r linker that connects the ntd to the rbd within a protomer. rather than just being a connector, this -residue linker is also a modulator of conformational changes that are critical for receptor engagement. the linker contributes a beta strand to each of the sd and sd subdomains thus connecting all the structural domains in the s subunit. in addition to the much discussed d g mutation, the sd subdomain also houses the multibasic furin cleavage site that demarcates the s and s subunits. furin cleavage is an essential processing step for the s protein and is necessary for viral infection and transmission (hoffmann et al., a; shang et al., ) . we provide evidence in this study that the d g mutation enhances susceptibility of the sars-cov- s ectodomain to furin cleavage, thus raising the possibility that this is a contributor to increased fitness and transmissibility of d g isolates. in this paper, we study the effect of the d g mutation on rbd dynamics and susceptibility to furin cleavage. we find that the d g mutation results in increased furin cleavage susceptibility, which could be responsible for the increased transmissibility of the sars-cov- with the d g mutation. it is important to consider though that these results are further information and requests for resources and reagents should be directed to priyamvada acharya (priyamvada.acharya@duke.edu) . 
data and code availability cryo-em reconstructions and atomic models generated during this study are available at wwpdb and emdb (https://www.rcsb.org; http://emsearch.rutgers.edu) under the accession codes pdb ids kdg, kdh, kdk, kdl, kdi, kdj, ke , ke , ke , ke , ke , kea, keb, kec and emdb ids emdb- , emd- , emd- , emd- , emd- , emd- , emd- , emd- , emd- , emd- , emd- , emd- , emd- , emd- . gibco freestyle -f cells (embryonal, human kidney) were incubated at °c and % co in a humidified atmosphere. cells were incubated in freestyle expression medium (gibco) with agitation at rpm. plasmids were transiently transfected into cells using turbo (speedbiosystems) and incubated at °c, % co , rpm for days. on the day following transfection, hyclone cdm hek media (cytiva, ma) was added to the cells. antibodies were produced in expi cells (embryonal, human kidney). cells were incubated in expi expression medium at °c, rpm and % co in a humidified atmosphere. plasmids were transiently transfected into cells using the expifectamine transfection kit and protocol (gibco). all genes in this study were synthesized and sequenced by geneimmune biotechnology (rockville, md). the sars-cov- spike protein ectodomain constructs used comprised the protein residues − (genbank: mn ) with or without the d g mutation, with or without the furin cleavage site rrar (residue - ) mutated to gsas or levlfqgp (hrv c protease site), a c-terminal t fibritin trimerization motif, a c-terminal hrv c protease cleavage site (except for the constructs where the furin site was mutated to an hrv c site), a twinstreptag and an xhistag. all spike ectodomain constructs were cloned into the mammalian expression vector pαh (wrapp et al., ) . for the ace- construct, the c-terminus was fused to a human fc region. 
protein purification spike ectodomains were harvested from filtered and concentrated supernatant using streptactin resin (iba) and further purified by sec using a superose / gl increase column preequilibrated in mm tris, ph . , mm nacl, . % sodium azide. all protein purification steps were performed at room temperature in a single day. the purified proteins were flash frozen and stored at - °c in single-use aliquots. each aliquot was thawed by incubation (~ min) at °c before use. antibodies were produced in expi f cells and purified by protein a affinity. ace- with human fc tag was purified by protein a affinity chromatography. negative-stain electron microscopy samples were diluted to µg/ml in mm hepes ph . , mm nacl, % glycerol, . mm glutaraldehyde and incubated for minutes before quenching the glutaraldehyde by the addition of m tris (to a final concentration of mm) and minutes incubation. a -µl drop of sample was then applied to a glow-discharged carbon-coated grid for - seconds, blotted, stained with % uranyl formate, blotted and air-dried. images were obtained using a philips em electron microscope at kv, , × magnification, and a . Å pixel size. the relion (scheres, ) program was used for particle picking, d and d class averaging. differential scanning fluorimetry dsf assay was performed using tycho nt. (nanotemper technologies). spike ectodomains were diluted to approximately . mg/ml. intrinsic fluorescence was measured at nm and nm while the sample was heated from to °c at a rate of °c/min. the ratio of fluorescence ( / nm) and inflection temperatures (ti) were calculated by the tycho nt. apparatus. elisa assays spike samples were pre-incubated at different temperatures then tested for antibody or ace- binding in elisa assays as previously described . assays were run in two formats. 
in the first format antibodies or ace protein were coated on -well plates at µg/ml overnight at °c, washed, blocked and followed by two-fold serially diluted spike protein starting at µg/ml. binding was detected with polyclonal anti-sars-cov- spike rabbit serum (developed in our lab), followed by goat anti-rabbit-hrp and tmb substrate. absorbance was read at nm. in the second format, serially diluted spike protein was bound in individual wells of -well plates, which were previously coated with streptavidin at µg/ml and blocked. proteins were incubated at room temperature for hour, washed, then human mabs were added at µg/ml. antibodies were incubated at room temperature for hour, washed and binding detected with goat anti-human-hrp and tmb substrate. cryo-em purified sars-cov- spike preparations were diluted to a concentration of ~ . mg/ml in mm tris ph . , mm nacl and . % nan . a . -µl drop of protein was deposited on a quantifoil- . / . grid that had been glow discharged for seconds in a pelco easiglow™ glow discharge cleaning system. after a seconds incubation in > % humidity, excess protein was blotted away for . seconds before being plunge frozen into liquid ethane using a leica em gp plunge freezer (leica microsystems). frozen grids were imaged in a titan krios (thermo fisher) equipped with a k detector (gatan). no statistical analysis were performed in this study. table reagent or resource source identifier antibodies ace n/a cr n/a g n/a ab n/a ab n/a goat anti-rabbit-hrp abcam ab goat anti-human-hrp jackson immunoresearch laboratories first derivative (ratio) figure c with the s subunit colored by domain and the s subunit colored grey. rbd is colored red, ntd green, sd dark blue, sd orange and the linker between the ntd and rbd colored cyan. b. overlay of the individual protomers in the -rbd-up structure and a protomer in the c symmetric -rbd-down structure shown in figure c . 
the structures were superimposed using s subunit residues - (spanning the hr and ch regions). the domain colors of the up-rbd chain are as described in panel a. the down-rbds are colored salmon, the sd domains from the down rbd chains are colored light blue. the linker between the ntd and rbd in the down rbd chains are colored deep teal. c. zoomed-in view showing the association of the linker connecting the ntd and rbd with the sd and sd domains. d. zoomed-in views of individual domains marked in panel b. the n r linker spanning residues - connects the ntd and the rbd. residues - of the n r linker contribute a b-strand to the sd subdomain together forming the sd ' "super" subdomain. residues - of the n r linker contribute a b-strand to the sd subdomain together forming the sd ' "super" subdomain. e. difference distance matrices (ddm) showing structural changes between different protomers for the structures shown in figure c . the blue to white to red coloring scheme is illustrated at the bottom. ab (rbd-directed neutralizing antibody) and ab (s -directed non-neutralizing antibody) to s-gsas/d g (in blue) and the furin-cleaved s-rrar/d g ectodomain (in green) measured by elisa. the assay format was the same as in figure d . c. overlay of the individual protomers in the -rbd-up structure and a protomer in the c symmetric -down-rbd structure shown in panel a. rbd-up chain with the s subunit colored by domain and the s subunit colored grey. rbd is colored red, ntd colored green, sd dark blue, sd orange and the linker between the ntd and rbd colored cyan. the down rbds are colored salmon, the sd domains from the down rbd chains are colored light blue. the linker between the ntd and rbd in the down rbd chains are colored deep teal. insets show zoomed-in views of individual domains similar to the depiction in figure d . d. 
(left) the protomers of the -rbd-up structure of the furin-cleaved s-rrar/d g ectodomain superimposed using residues - and colored by the color of their ntd as depicted in panel a. zoomed-in views show region of the sd domain proximal to the ntd. a glycan cluster on the sars-cov- spike ectodomain is recognized by fab-dimerized glycan-reactive antibodies. biorxiv real-space refinement inphenixfor cryo-em and crystallography tmprss and furin are both essential for proteolytic activation of sars-cov- in human airway cells sars-cov- mrna vaccine design enabled by prototype pathogen preparedness sars and mers: recent insights into emerging coronaviruses an interactive web-based dashboard to track covid- in real time cold sensitivity of the sars-cov- spike ectodomain data, disease and diplomacy: gisaid's innovative contribution to global health features and development ofcoot ucsf chimerax: meeting modern challenges in visualization and analysis the bio d packages for structural bioinformatics cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding structural basis for antibody-mediated neutralization of lassa virus controlling the sars-cov- spike glycoprotein conformation a multibasic cleavage site in the spike protein of sars-cov- is essential for infection of human lung cells sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structures and distributions of sars-cov- spike proteins on intact virions tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus a highly stable prefusion rsv f vaccine derived from structural analysis of the fusion mechanism the impact of mutations in sars-cov- spike on viral infectivity and antigenicity macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix enhanced isolation of sars-cov- by tmprss -expressing cells 
immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen structure of hiv- gp with gp -interactive region reveals layered envelope architecture and basis of conformational mobility ucsf chimera?a visualization system for exploratory research and analysis cryosparc: algorithms for rapid unsupervised cryo-em structure determination specific single or double proline substitutions in the "spring-loaded" coiled-coil region of the influenza hemagglutinin impair or abolish membrane fusion activity identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure structure-based design of prefusion-stabilized filovirus glycoprotein trimers stabilization of the soluble, cleaved, trimeric form of the envelope glycoprotein complex of human immunodeficiency virus type a bayesian view on cryo-em structure determination processing of structurally heterogeneous cryo-em data in relion nih image to imagej: years of image analysis pandemic preparedness: developing vaccines and therapeutic antibodies for covid- cell entry mechanisms of sars-cov- situ structural analysis of sars-cov- spike reveals flexibility mediated by three hinges. science function, and antigenicity of the sars-cov- spike glycoprotein d g spike mutation increases sars cov- susceptibility to neutralization. 
medrxiv cryo-em structure of the -ncov spike in the prefusion conformation cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains structural and functional analysis of the d g sars-cov- spike protein variant coronaviruses -drug discovery and therapeutic options rbd "up" ( -rbd-up s) vs rbd "down" protomer # ( -rbd-up s) rbd "up" ( -rbd-up s) vs rbd "down" protomer ( -rbd-down s) rbd "up" ( -rbd-up s) vs rbd "down" protomer # ( -rbd-up s) rbd "down" protomer # ( -rbd-up s) vs rbd "down" protomer # ( -rbd-up s) rbd "down" protomer # ( -rbd-up s) vs rbd "down" protomer ( -rbd-down s) cryo-em data were collected at the national center for cryo-em access and training (nccat) and the simons electron microscopy center located at the new york structural biology center, supported by the nih common fund transformative high resolution cryo-electron microscopy program (u gm ) and by grants from the simons foundation key: cord- -vxxhglx authors: abitogun, folagbade; srivastava, r.; sharma, s.; komarysta, v.; akurut, e.; munir, n.; macalalad, l.; olawale, o.; owolabi, o.; abayomi, g.; debnath, s. title: covid : exploring uncommon epitopes for a stable immune response through mhc binding date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vxxhglx the covid pandemic has resulted in , , deaths as of th october , indicating the urgent need for a vaccine. this study highlights novel protein sequences generated by shot gun sequencing protocols that could serve as potential antigens in the development of novel subunit vaccines and through a stringent inclusion criterion, we characterized these protein sequences and predicted their d structures. we found distinctly antigenic sequences from the sars-cov- that have led to identification of proteins that demonstrate an advantageous binding with human leukocyte antigen- molecules. 
results show how previously unexplored proteins may serve as better candidates for subunit vaccine development due to their high stability and immunogenicity, reinforced by their hla- binding propensities and low global binding energies. this study thus takes a unique approach towards furthering the development of vaccines by employing multiple consensus strategies involved in immuno-informatics techniques. coronaviruses are a large group of viruses that are rather common throughout the community, causing illness ranging from the common cold to more severe diseases such as severe acute respiratory syndrome (sars-cov) and middle east respiratory syndrome (mers-cov) ( ) . the previous outbreaks of these coronaviruses in and respectively show similarities to the novel coronavirus ( ) , which was first reported in december in wuhan ( ) . it was later declared a pandemic on th march, by the world health organization (who), having caused a global emergency across many countries and territories around the world. the causative agent of the covid- disease is the severe acute respiratory syndrome coronavirus (sars-cov- ), which was initially tagged as novel coronavirus ( -ncov) by the world health organization ( ) before genomic studies revealed a . % and % nucleotide level similarity with sars-cov and bat coronavirus respectively ( , ) , hence necessitating its new identity. further analysis of its similarity with sars-cov and mers-cov, including phylogeny, revealed that it is categorized under the family coronaviridae, order nidovirales and genus betacoronavirus alongside the other two viruses, which have also caused pandemic situations in the past, causing thousands of deaths globally. 
( , ) the structural architecture of sars-cov- is an elaborate one, containing a positive single stranded rna as its genetic element with genomic length of about - kilobases, ( ) which encodes the major structural proteins being the spike (s) glycoprotein, membrane (m) protein, envelope (e) protein, and nucleocapsid (n) protein. ( , ) the structure of the spike glycoprotein of the virus is also an extended similarity with sars-cov, ( ) which together with covid : exploring uncommon epitopes for a stable immune response through mhc binding other proteins of the virus are candidates for vaccine development and are being explored in different settings due to the active roles of the proteins in the infectivity of the virus. ( ) ( ) ( ) ( ) developing an effective treatment for sars-cov- is a research priority. currently, different therapeutic strategies are being utilized by researchers to combat this dangerous covid- , with much of the focus being on developing novel drugs or vaccines. however, in recent years, the development of vaccine design has been revolutionized by the reverse vaccinology (rv), which aims to first identify promising vaccine candidate through bioinformatics analysis of the pathogen genome. one of the successes of this method is the bexsero vaccine, discovered for group b meningococcus. ( ) this immuno-informatics method includes the identification of peptide epitopes in the virus genome which can then be prepared by chemical synthesis techniques. these epitopes are easier in quality control; however, there is the need for structural modifications as well as the inclusion of delivery systems, and adjuvants in their formulation due to the low immunogenicity of the epitopes which is a result of their structural complexity and low molecular weight. ( ) recently, a set of b and t cell epitopes highly conserved in sars-cov- were identified from the s and n proteins of sars-cov. 
( ) however, studies have shown that full length spike protein vaccines for sars-cov may lead to antibody mediated disease enhancement causing inflammatory and liver damage in animal models ( , ) , which is why in this manuscript we applied immuno-informatics "in silico" approaches to identify potential cd + cytotoxic t cell epitopes from proteins of sars-cov- , sars-cov and mers-cov. in achieving this, the ~ . kb genome of the sars-cov- which encodes for proteins comprising structural proteins, accessory proteins and non-structural proteins as well as the ~ kb genome of sars-cov- comprising open reading frames (orf) were explored for some uncommon proteins of the viruses with therapeutic potential. identified proteins were thus considered for highly antigenic epitope prediction and evaluation. the antigenicity and immunogenicity of all identified epitopes were estimated and their interactions with the human leukocyte antigen (hla) class i system were evaluated, owing to the wide usage of the hla system as a strategy in the search for the etiology of infectious diseases and autoimmune disorders, ( ) as was explored in the case of sars-cov- after the first epidemic that broke out in east asia in − . this is linked to the fact that hla genes are known to display the highest level of diversity in the genome, with thousands of different alleles now reported. each of these alleles is also a combination of multiple single nucleotide polymorphisms (snps). thus, we also investigated the hla-class i system for evidence of disease association, with the hope that the discovery of disease susceptibility will help in developing protection such as vaccines against the virus. the uniprot ( ) database was searched for the protein sequences inferred from the genome sequence analysis of different isolates of highly pathogenic human coronaviruses. 
the sars cov, mers-cov, and sars-cov- proteins without experimentally defined d structure or functional annotation were chosen for further analysis and their sequence files in the fasta format were downloaded. the fasta sequences were used to infer the physicochemical properties of the chosen each model was refined using a molecular dynamics simulation approach implemented by galaxy refine ( ) tool or rapid energy minimization using discrete molecular dynamics with an all-atom representation for each residue implemented by chiron server. antigenicity prediction of the top eight uncharacterized coronavirus proteins selected after d molecular models' validation was performed using vaxijen tool. ( ) a threshold value of . was taken into account. non-antigenic peptides having vaxijen scores less than . were discarded, while antigenic epitopes with scores higher than . were further prioritized for their immunogenicity. the proteins predicted to be non-antigens were discarded from the further analysis. antiviral immunity relies on the ability of major histocompatibility complex (mhc) class i molecules to bind antigen molecules and display them to t cells. each candidate is subsequently refined by restricted interface side-chain rearrangement and by soft rigid-body optimization. the side-chain flexibility is modeled by rotamers and the obtained combinatorial optimization problem is solved by integer linear programming. following rearrangement of the side-chains, the relative position of the docking partners is refined by monte carlo minimization of the binding score function. the refined candidates are ranked by the binding score which is a combination of atomic contact energy, softened van der waals interactions, partial electrostatics and additional estimations of the binding free energy. 
in order to identify functionally important regions in the selected proteins and the selection of epitopes with the desired degree of conservation, consurf ( ) a total of non-structural sequences were identified and retrieved from uniprotkb database. the mass, isoelectric-ph, basic amino acid %, cysteine %, polarity and hydrophobicity of the protein sequences were computed and outlined in table . the lengths of these protein sequences varied from to amino acid residues. cysteine was found to have the highest presence in the selected proteins with an average of . % and its highest presence was observed in proteinq tfa with . %. arginine, lysine and histidine have an average presence of . %, . % and . % respectively. the polarity and non polarity of the final selected proteins ranged from . %to . % and from . %to . % respectively with m svf having the highest percentage of polar amino acids ( . %) while p dtd has the highest percentage of non-polar amino acids ( . %). likewise, the highest hydrophobicity index was recorded for p dtd . after obtaining the d structures of the proteins from modeling servers, the models were refined using galaxyrefine. upon refinement, the models were subjected to stereochemical and thermodynamic stability analysis. particularly, the models with highest ramachandran favored residues and lowest clash scores were used for energy analysis using chiron, scoop and errat servers as highlighted in table . all the protein structures had a ramachandran score of over % and an errat value of over , with q tfa , p dtd and p dtd all having a perfect % ramachandran scores. the free energy of the structures ranged from - . kcal/mol (q tfa ) to - . kcal/mol (a a h lbe ). force field values ranging from - . (a a h lbe ) to - . (p dtd ) were observed using gromos implementation of the swiss-pdb viewer. the models with lowest overall energies were determined to be stable. 
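The composition figures reported above (cysteine %, basic amino acid %, and the individual arginine, lysine and histidine percentages) are simple residue counts over a protein sequence. The following is a minimal sketch of that calculation, not the tool the authors used; the function names `residue_percentages` and `group_percentage` and the constant `BASIC_RESIDUES` are hypothetical names introduced here for illustration.

```python
from collections import Counter

# Basic amino acids named in the text: arginine (R), lysine (K), histidine (H).
BASIC_RESIDUES = frozenset("RKH")


def residue_percentages(sequence):
    """Return the percentage of each one-letter residue in a sequence.

    Whitespace (e.g. FASTA line breaks) is ignored and case is normalised.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def group_percentage(sequence, residues):
    """Return the summed percentage of all residues in the given group."""
    pct = residue_percentages(sequence)
    return sum(pct.get(aa, 0.0) for aa in set(residues))


if __name__ == "__main__":
    peptide = "YLCFLAFLL"  # one of the epitopes named in the text
    print("cysteine %:", round(group_percentage(peptide, "C"), 2))
    print("basic %:", round(group_percentage(peptide, BASIC_RESIDUES), 2))
```

Polarity and hydrophobicity indices depend on which residue classification or scale is chosen, so they are left out of this sketch rather than guessed.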
vaxijen antigenicity prediction was used to predict the overall antigenicity of the proteins as outlined in table . two proteins: m svf _mers and q tfa were predicted to be non-antigens with their antigenicity scores of . and . respectively while p dtd and p dtc showed very high antigenicity owing to their high scores of . and . respectively. t-cell epitope prediction was carried out using iedb, epijen, pepvac, rankpep, and netmhc. initially, , hla class i binding epitopes were predicted from out of stable proteins which were predicted to be antigenic. scrutiny on the basis of percentile rank filtered peptide epitopes. considerable binding affinity for hla- subtypes with high frequencies in asia and africa (hla-a* : , hla-a* : , hla-b* : , hla-b* : and hla b* : ) was observed from netmhcpan server which predicted high affinity binders (< . percentile rank) using el-ranks. among predicted epitopes of sars-cov- virus, ten ( ) epitopes showed considerably high immunogenicity towards the mhc-class molecules which were then selected for further analysis (table ). all the predicted epitopes were further subjected to allergenic and toxicity analysis, with all coming out as non-allergens and non toxins (table ) . epitopes including hlvdfqvti, rksapliel, and ylcflafll showed remarkably high antigenic tendencies, scoring . , . and . respectively. all of these epitopes, along with their features are reported in table (table ) . further docking analysis resulted in a total of global binding energies predictions which were refined with firedock ( table ). the global binding energies range from - . to - . indicating that all the proposed epitope sites from the modelled proteins can serve as good antigens. the conservation status of each residue in the selected t-cell epitopes in four ( ) sars cov- proteins (q tlc , p dtc , p dtd , p dtc ) were examined using consurf. 
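The selection described above applies several filters in sequence: a percentile-rank cutoff for predicted binding, an antigenicity threshold, and removal of predicted allergens and toxins. A minimal sketch of that logic follows. The `Epitope` record, the `shortlist` function and both threshold parameters are hypothetical names introduced for illustration; the numeric cutoffs are left as arguments because the values used in the study are not recoverable from this text, and the demo records are placeholders rather than results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    peptide: str
    allele: str
    percentile_rank: float  # lower rank = stronger predicted binder
    antigenicity: float     # higher score = more antigenic
    is_allergen: bool
    is_toxin: bool


def shortlist(epitopes, max_rank, min_antigenicity):
    """Keep strong, antigenic, non-allergenic, non-toxic binders.

    Results are sorted from strongest to weakest predicted binder.
    """
    kept = [
        e for e in epitopes
        if e.percentile_rank < max_rank
        and e.antigenicity > min_antigenicity
        and not e.is_allergen
        and not e.is_toxin
    ]
    return sorted(kept, key=lambda e: e.percentile_rank)


if __name__ == "__main__":
    # Placeholder records only; these are not values from the study.
    candidates = [
        Epitope("AAAAAAAAA", "allele-1", 0.2, 0.9, False, False),
        Epitope("CCCCCCCCC", "allele-1", 3.0, 0.9, False, False),
        Epitope("DDDDDDDDD", "allele-2", 0.1, 0.9, True, False),
    ]
    for e in shortlist(candidates, max_rank=0.5, min_antigenicity=0.4):
        print(e.peptide, e.allele, e.percentile_rank)
```

Keeping the thresholds as parameters makes it easy to see how sensitive the final shortlist is to each cutoff.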
results revealed that the highly conserved and exposed residues are found in all proteins, with p tdc and p dtc having the largest number of these residues. particularly, the epitopes without allergenicity and toxicity containing at least one predicted functional residue (highly conserved and exposed) included epitopes 'rksapliel' and 'hlvdfqvti' while t-cell epitopes 'rksapliel', 'flafllflv', and 'hlvdfqvti' contained at least one predicted structural residue (highly conserved and buried). interestingly, none of the epitopes from p dtd contained any functional residues (highly conserved and exposed) while the highest concentration of predicted structural and functional residues was found in two epitopes, hlvdfqvti ( functional residue and structural residues) and rksapliel ( functional residues and structural residue), both from the protein p dtc . despite the continuous unrest caused by the covid pandemic, considering the , , new cases recorded as of th september, which have raised the total reported cases to over . million and a total of , deaths globally, there is currently no available fda-approved vaccine against covid- ( ). if a vaccine is successfully developed against covid- , this will improve global human health. advancement in technology has led to various bioinformatic approaches such as reverse vaccinology (rv) and immuno-informatics being applied to revolutionize vaccine production in terms of production cost and time, which have been major setbacks in vaccine development. antibody response as well as cell mediated immunity can be established by using a proper protein antigen. reverse vaccinology (rv) has been useful in identification of potential vaccine candidates against pathogens and the world is in a race to get one against sars-cov . this approach presents an advantage over other vaccine approaches because it can identify all potential antigens coded by a genome irrespective of their abundance, phase of expression and immunogenicity. 
while most previous protein vaccines against sars-cov focused on the full-length spike protein of the virus, the vaccine development race against the covid virus has followed the same path, with the larger share of the vaccines currently in development utilizing the spike protein of sars-cov- ( ). this is despite the reported negative side effects in the host, including liver damage and antibody-mediated inflammatory diseases, of full-length spike protein vaccines and other whole-organism vaccines. additionally, novel technologies such as next-generation vaccines enabled by advances in nanotechnology, which rely on the principle that nanoparticles and viruses operate at the same length scale, are poised to make a clinical impact for the first time. however, many of these platforms may be several years away from deployment and may therefore have no impact on the current sars-cov- pandemic, which further justifies the priority given to quicker and safer vaccine development platforms. the current study focused on the hla- binding propensities of some uncommon proteins from the covid- virus. obtaining these antigenic epitopes using the immune-informatics approach will help inform which epitopes to use in the construction of protein vaccines against sars-cov- based on proteins other than the spike protein. all models of the selected proteins that passed the stability analyses were further analyzed for antigenicity. these analyses included the ramachandran analysis, a two-dimensional plot in the space of the angles ϕ and ψ, which quantify the rotation of the protein backbone and thus play a crucial role in the secondary and, later, the tertiary structure of proteins. only highly antigenic proteins were selected for further analysis, since these are expected to induce a better immune response in the host.
five ( ) of the most common hla- subtypes in africa and asia identified in this study were included in the analyses, and their potential interactions with the epitopes identified from the proteins were predicted using the netmhcpan . server. the server reported the binding affinities between these hla subtypes and the epitopes, which allowed the selection of epitopes expected to interact strongly with the hla alleles. the analysis of this interaction is important because vaccination is a proven strategy for protection from disease, and an ideal vaccine would include antigens that elicit a safe and effective protective immune response. hla-restricted epitope vaccines, including t-lymphocyte epitopes restricted by hla alleles, embody a newly developing and favorable immunization approach. research on hla-restricted epitope vaccines for the treatment of tumors and for the prevention of viral, bacterial, and parasitic infections has achieved substantial progress in recent years, which further supports the hla-binding propensity analysis done in this study. the predicted strongest binding epitopes were therefore further subjected to three evaluations: immunogenicity, which checked the ability of each epitope to induce an immune response and predicted how strongly it would do so; allergenicity, which assessed the potential of our epitopes to cause sensitization or allergic reactions related to ige antibody responses; and toxicity, the ability of an epitope to cause bodily harm. using these criteria, epitopes were identified, and these could be potential epitopes for vaccine constructs against sars-cov- . this was also supported by the low global binding energies between our identified epitopes and the hla molecules, ranging from - . to - . , which indicate that stable complexes form between them.
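the selection steps described above (binding rank first, then immunogenicity, allergenicity and toxicity) can be sketched in a few lines of python. this is a minimal illustration and not the pipeline used in the study: the `Epitope` record, the `select_epitopes` function, the cutoff values, the allele label and all numeric scores are hypothetical, and in practice each field would be filled from the output of the corresponding prediction server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    peptide: str
    allele: str            # placeholder label; real allele names come from the server output
    el_rank: float         # percentile rank of the binding prediction (lower = stronger binder)
    immunogenicity: float  # immunogenicity score (higher = more immunogenic)
    allergen: bool         # True if predicted to be an allergen
    toxic: bool            # True if predicted to be a toxin


def select_epitopes(candidates, max_rank, min_immunogenicity=0.0):
    """Keep strong binders that are immunogenic, non-allergenic and non-toxic.

    `max_rank` and `min_immunogenicity` are illustrative cutoffs chosen by the
    caller; they are assumptions, not the thresholds used in the study.
    """
    kept = [
        e for e in candidates
        if e.el_rank < max_rank
        and e.immunogenicity > min_immunogenicity
        and not e.allergen
        and not e.toxic
    ]
    # Strongest predicted binders first.
    return sorted(kept, key=lambda e: e.el_rank)


if __name__ == "__main__":
    # Peptide sequences are taken from the text; every score here is made up.
    demo = [
        Epitope("HLVDFQVTI", "HLA-demo", 0.10, 0.30, False, False),
        Epitope("RKSAPLIEL", "HLA-demo", 0.05, 0.20, False, False),
        Epitope("YLCFLAFLL", "HLA-demo", 2.00, 0.40, False, False),
    ]
    for e in select_epitopes(demo, max_rank=0.5):
        print(e.peptide, e.el_rank)
```

keeping the cutoffs as arguments rather than constants makes it easy to rerun the same selection with the thresholds of a different server or study.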
our extensive approach to the prediction of stable protein structures from previously uncharacterized proteins, and the subsequent application of immune-informatics for the identification of immunogens, has resulted in the classification of different epitope sites from different proteins that have high antigenicity and low binding energies with the hla- alleles most commonly present in asia and africa. these epitopes have a high tendency to provide effective and strong protective cytotoxic immunity against sars cov- if they are adequately exploited for vaccine production. a multiple consensus approach supports the reliability and reproducibility of these results. this study thus provides further knowledge on the association of these hla- alleles with covid , suggesting that good peptide coverage can be achieved by many different combinations of hla- alleles. multi-epitope based peptide vaccine design using three structural proteins (s, e, and m) of sars-cov- : an in silico approach knowledge, attitude, practice and perceived barriers among healthcare professionals regarding covid- : a cross-sectional survey from pakistan naming the coronavirus disease (covid- ) and the virus that causes it a novel coronavirus from patients with pneumonia in china a pneumonia outbreak associated with a new coronavirus of probable bat origin prevalence of comorbidities in the middle east respiratory syndrome coronavirus (mers-cov): a systematic review and meta-analysis immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen honglong wuet al. genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding characterization of the coronavirus m protein and nucleocapsid interaction in infected cells shuliang zhouet al.
clinical course and outcome of patients infected with the novel coronavirus, sars-cov- , discharged from two hospitals in wuhan raúl gómez román, stig tollefsen, melaniesaville. the covid- vaccine development landscape progress and prospects on vaccinedevelopment against sars-cov- .vaccines(basel) sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor molecular biology of flaviviruses use of serogroup bmeningococcal vaccines in persons aged ≥ years at increased risk for serogroup b meningococcal disease: recommendations of the advisory committee on immunization practices recent progress in adjuvant discovery for peptide based subunit vaccines preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies immunization with modified vaccinia virus ankara-based recombinant vaccine against severe acute respiratory syndrome is associated with enhanced hepatitis in ferrets evaluation of modified vaccinia virus ankara based recombinant sars vaccine in ferrets hla studies in the context of coronavirus outbreaks uniprot: a worldwide hub of protein knowledge the proteomics server for in-depth protein knowledge and analysis the embl-ebi search and sequence analysis tools apis in peptide hydrophobicity/hydrophilicity analysis tool critical assessment of methods of protein structure prediction (casp)-round xiii swiss-model: homology modelling of protein structures and complexes lomets : improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins intfold: an integrated server for modelling protein structures and functions from amino acid sequences the i-tasser suite: protein structure and function prediction falcon: a high-throughput protein structure prediction server based on remote homologue recognition the galaxy platform for accessible, reproducible and collaborative biomedical analyses 
template-based protein structure modeling using the raptorx web server ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field protein structure refinement driven by side-chain repacking automated minimization of steric clashes in protein structures procheck: a program to check the stereochemical quality of protein structures verify d: assessment of protein models with three dimensional profiles verification of protein structures: patterns of nonbonded atomic interactions deviations from standard atomic volumes as a quality measure for protein crystal structures scoop: an accurate and fast predictor of protein stability curves as a function of temperature the gromos biomolecular simulation program package a server for prediction of protective antigens, tumour antigens and subunit vaccines gold-standard data classification, open access genotype data and new query tools netmhcpan- . and netmhciipan- . : improved predictions of mhc antigen presentation by concurrent motif deconvolution and integration of ms mhc eluted ligand data the immune epitope database (iedb): update silico approach for predicting toxicity of peptides and proteins pep-fold : faster de novo structure prediction for linear peptides in solution and in complex ucsf chimera -a visualization system for exploratory research and analysis patchdock and symmdock: servers for rigid and symmetric docking firedock: a web server for fast interaction refinement in molecular docking using evolutionary data to raise testable hypotheses about protein function consurf: identification of functional regions in proteins by surface-mapping of phylogenetic information development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines coronavirus disease (covid- ) situation reports milken institute covid- vaccinetracker table . 
immunogenicity analysis of predicted mhc class epitopes we acknowledge the support of dnacompass, cmesbahf nigeria, biotrust scientific, galaxy project and neliref for this project. we also acknowledge the provision of figure images generated by rajavee srivastava and folagbade abitogun with ucsf chimera, discovery studio, pymol, mhc cluster server v . and consurf. we acknowledge t. thao, h. aminu, and v. ekundayo for their contribution to this project. folagbade abitogun is a research associate at the university college hospital, ibadan, nigeria. he is currently actively involved in antimicrobial resistance research and has a deep interest in the interplay between the human immune system and the clearance of infectious diseases, using molecular and bioinformatics approaches. coloured 'e' represents exposed residues, the green colored 'b' represents buried residues, the red colored 'f' represents functional residues (highly conserved and exposed) and the dark blue colored 's' represents structural residues (highly conserved and buried). the conservation scale represents the status of conservation from variable, average to conserved. key: cord- -kxhasabe authors: luo, ruibang; wong, yat-sing; lam, tak-wah title: tracking cytosine depletion in sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kxhasabe motivation danchin et al. have pointed out that cytosine drives the evolution of sars-cov- . a depletion of cytosine might lead to the attenuation of sars-cov- . results we built a website to track changes in the mono-, di-, and tri-nucleotide composition of sars-cov- over time. the website downloads new strains available from gisaid and updates its results daily. our analysis suggests that the composition of cytosine in coronaviruses is related to their reported mortality. using , sars-cov- strains collected in ten months, we observed cytosine depletion at a rate of about one cytosine loss per month from the whole genome.
availability the website is available at http://www.bio .cs.hku.hk/sarscov /. contact rbluo@cs.hku.hk supplementary information supplementary data are available at bioinformatics online. sars-cov- is a positive-strand rna virus. the intracellular multiplication of positive-strand rna viruses requires complex interactions with the host cell. the coronavirus genome mimics the structure of a cellular mrna. it is rapidly translated into an rna-dependent replicase and proteins that allow the virus to hijack specific host functions used during viral translation, rna replication, and viral envelope construction. the virus uses a critical set of metabolic pathways to access the pool of ribonucleotide triphosphates needed for the transcription of many copies of the replicated rna strand. this makes the viral sequence highly sensitive to changes in the nucleotide pool, the composition of which might be reflected in the evolution of the virus as it mutates. although coronaviruses have evolved a specific function meant to overcome some of this limitation via a proofreading step in the replicase function, mutations remain unavoidable, and viruses, which generate a vast number of particles within a single patient, will evolve quasi-species that progressively integrate the various types of selection pressure that the virus faces. among these pressures, the availability of ctp (cytidine triphosphate, and hence of cytosine-based precursors) might be an important driving force in the evolution of the virus. the ctp supply in a human is lower than that of the three other triphosphates essential for rna synthesis (atp, gtp, and utp), and the scarcity of ctp in the host will, in turn, lower the cytosine content of a virus genome as it evolves (armengaud, et al., ; danchin and marliere, ) . this phenomenon is called depletion, and it co-occurs with the decreased virulence of an rna virus commonly observed in an outbreak.
under constant selection pressure, the speed of depletion tends to stabilize, but if extraneous pressures such as massive vaccination or effective medication are applied, the depletion may slow down or even reverse (danchin and timmis, ) . the speed of cytosine depletion in new sars-cov- strains might therefore ) be used as a new molecular-level metric for gauging the speed of the spread of sars-cov- , and ) help predict the resistance of sars-cov- to new vaccines and drugs. we built an interactive website at http://www.bio .cs.hku.hk/sarscov to show the mono-, di-, and tri-nucleotide composition trends of the whole genome and single genes. the sars-cov- genomes used in this website are from gisaid (https://www.gisaid.org/). the acknowledgement table of the strains used in table is at http://www.bio .cs.hku.hk/sarscov /gisaid_cov _ .xls. new sars-cov- strains are retrieved from the gisaid platform on a daily basis. we used only full genomes and removed sequences shorter than kbp after removing gaps (runs of 'n's). iupac bases are substituted with c or the alphabetically smaller base. we used "betacov/wuhan/wiv / " (https://www.ncbi.nlm.nih.gov/nuccore/mn ) and its annotations as our reference. we aligned every full genome against the reference using mafft (katoh and standley, ) . because the completeness at both ends of the full genomes varied, we removed the bases aligned with the head bp and tail bp of the reference. in table , we first compared the nucleotide composition of multiple coronaviruses, including sars-cov- (an average of strains collected from dec , to feb , ), sars, mers, and four that cause the common cold. compared to sars-cov- , to which we assigned a relative mortality level of , the percentage of cytosine is lower in almost all of the eleven genes in the four common cold coronaviruses, which have a lower mortality level . however, the percentage of cytosine is higher in sars (mortality level ) and mers (mortality level ).
the result suggests that cytosine depletion is associated with reduced mortality of coronaviruses. next, we summarized the trends of the different nucleotides in sars-cov- strains collected from dec , to oct , . the results are available on an interactive website at http://www.bio .cs.hku.hk/sarscov . , strains from gisaid (elbe and buckland-merrett, ) were used (see methods). the trends of the mono-nucleotides a, c, g, and t in the whole genome and ten genes are shown in figure (whole genome, orf ab, s, e, m, and n) and supplementary figure (whole genome, ns , ns , ns a, ns b, and ns ). the whole genome shows signs of cytosine depletion, at a rate of about one cytosine loss per month. all genes except e, n, ns a, and ns b show signs of cytosine depletion. the trends of di-nucleotides and tri-nucleotides are available on the website. in the whole genome, all di-nucleotides containing a cytosine show signs of depletion except for "gc". funding r.l. was supported by the ecs (grant number ) of the hksar government, and by the urc fund at hku. conflict of interest: none declared. the importance of naturally attenuated sars-cov- in the fight against covid- cytosine drives evolution of sars-cov- sars-cov- variants: relevance for symptom granularity, epidemiology, immunity (herd, vaccines), virus origin and containment? data, disease and diplomacy: gisaid's innovative contribution to global health mafft multiple sequence alignment software version : improvements in performance and usability figure : the trend of mononucleotide a, c, g and t in , sars-cov- strains collected from table : the relative composition of cytosine in the eleven genes (using the annotation of wuhan-hu- ) of four common cold coronaviruses, sars and mers compared to the average of sars-cov- strains ( -ncov-avg , collected from dec , to feb , ). the mortality levels are set according to their reported mortality.
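the two quantities tracked in this work, the k-nucleotide composition of a genome and the rate at which the cytosine count changes over time, can be sketched as follows. this is a minimal sketch under stated assumptions and not the code behind the website: the function names `kmer_composition` and `slope_per_month` are hypothetical, windows containing ambiguous bases are simply skipped (a simplification that differs from the substitution rule described in the methods), and the depletion rate is estimated with an ordinary least-squares line, which the text does not specify.

```python
from collections import Counter

ACGT = frozenset("ACGT")


def kmer_composition(seq, k):
    """Relative frequency of every k-nucleotide (k=1 mono, 2 di, 3 tri).

    Overlapping windows are counted; windows containing anything other
    than A, C, G or T (for example 'N' or '-') are ignored.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if ACGT.issuperset(kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def slope_per_month(points):
    """Least-squares slope of y on x for points = [(month, value), ...].

    With value = cytosine count per genome, a negative slope means
    cytosine is being lost over time. Needs at least two distinct months.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    # Toy sequence and toy monthly counts, for illustration only.
    print(kmer_composition("ACGTNACGT", 2))
    print(slope_per_month([(0, 10), (1, 9), (2, 8)]))
```

in a real analysis each point passed to `slope_per_month` would be the mean cytosine count of the strains collected in one month, after alignment and trimming as described in the methods.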
key: cord- -omxcdiwt authors: basso, fernanda gisele; de paulo, alex fabianne; porto, geciane silveira; pereira, cristiano gonçalves title: cooperative efforts on developing vaccines and therapies for covid- cooperative efforts for covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: omxcdiwt health organizations have always sought partnerships to combine competencies in innovation, even with fierce competition in this sector. during the pandemic, it is relevant to observe how organizations behave in seeking quick and safe answers. the present research analyzes how cooperation networks were formed, considering the clinical trials of therapies and vaccines developed specifically to treat or prevent covid- . social network analysis was used to build cooperation networks and to apply metrics that characterize these connections. two statistics were evaluated: the strength of cooperation, which measures how strongly two organizations cooperate, and the unilateral dependence of cooperation, which measures how much each organization depends on the relationship. a total of clinical trials were identified, of which % are in cooperation. of the organizations that have partnerships, firms rank first, followed by universities. we extracted the main categories that concentrate % of partnerships in the trials of antibody, and vaccine. several organizations cooperate in multiple categories of trials, evidencing efforts to pursue different strategies to treat the disease. we found high strength of cooperation and an asymmetric dependency between partners, which can be attributed to specialized models of partnership that occur in competitive environments such as the current pandemic. cooperation was not limited by geographical proximity, and the advent of chinese players may represent a shift in the axis of biotechnological development.
finally, the challenge of finding therapeutic or immunological solutions for covid- reveals a clear composition of cooperation groups that complement their skills and manage organizational strategies to beat the pandemic. in this new paradigm, there can be partnerships not only in clinical trials but also in the development of pre-competitive technologies. this experience is expected to change the way organizations define their r&d strategies and to lead them to adopt a more collaborative innovation model. the establishment of partnerships between firms and institutes of science and technology (ist) has brought significant contributions to the transformation of knowledge resulting from technological development. the use by the productive sector of knowledge generated in universities is a driver for the development of new technologies, whose transfer is a complementary path to reach a higher technological level [ ] . from the perspective of open innovation (oi), organizations improve their capabilities by combining the internalization and outsourcing of resources [ ] and by establishing partnerships with other firms, customers, suppliers, or ist, which occur at different levels of affinity and complexity. the level of maturity that can be achieved in oi is the strong prioritization of the development of partnerships [ ] . technological cooperation networks, which are formed by groups of heterogeneous organizations [ ] , can be understood from the perspective of their structure: a horizontal pattern when partners from different sectors collaborate or a vertical pattern when the actors are from the same production chain [ ] [ ] . thus, organizations are grouped based on the formation of partnerships with firms, government agencies, investors, industrial associations, and research institutions [ ] . organizations in more heterogeneous cooperation networks tend to show better economic performance [ ] .
in addition, highly clustered communities induce a feeling of trust [ ] , which is relevant when a partnership is associated with commercially attractive discoveries and high profit expectations. thus, selecting the best partners is one of the main challenges for the success of businesses in cooperation [ ] . a deep understanding of cooperation networks can provide insights into the pattern of relationships affecting the innovation performance of their members; however, the overflow of knowledge may not be equally accessible to or appropriate for all members [ ] . thus, regarding the development of drugs for covid- , the cooperation networks resulting from clinical trials (ct) may emerge as an essential element, allowing organizations to share their results and meet expectations of drug efficacy more intensively and rapidly. the decision on which partner to choose for an r&d&i project in this sector is frequently driven by complementary competences, to beat the pandemic faster, but also by firms' strategies for optimizing their ct portfolios. identifying the most promising technologies in such a short time compared to traditional regulatory parameters, and then scaling up and distributing these therapies and/or vaccines, is a major challenge for investors and firms, regulatory agencies, and academics. in this context, cooperation networks during the pandemic can be perceived as performance indicators when capabilities are joined to fight the disease and provide a solution for the pandemic. efforts to seek partnerships and create new alliances to fight covid- have brought numerous challenges. there are risks due to the need to redirect the efforts of research teams that are working under ongoing funding (i.e. for other diseases). there may be delays, discontinuities or even difficulties in maintaining adherence to original protocols, which impairs statistical consistency [ ] .
notably, the pressure to obtain rapid results must not lead to disregard for the safety and integrity of the trials; therefore, the organizations involved are seeking maximum effectiveness in the shortest possible time by incorporating knowledge accumulated during other pandemics [ ] [ ] . in addition, concern regarding the accurate monitoring of numerous cts while avoiding redundancies has led to the creation, via artificial intelligence, of a real-time dashboard for covid- cts [ ]. given the extent and severity of covid- , numerous studies from different perspectives have sought to analyse developments related to the pandemic. however, the dynamics of cooperation, which can accelerate the development of a vaccine or drug that is effective against this disease, have not yet been explored. this study seeks to deeply understand the cooperation networks of organizations that have combined efforts to create and test new drugs to fight covid- , based on the cts mapped until july . for this purpose, social network analysis (sna) was used to map the interactions between organizations, to understand the determinants of cooperation, and to allow the analysis of a significant sample of cts resulting from collaborative development [ ] [ ] . notably, within the set of studies on cooperation networks, the adoption of sna using ct data is a strategy little explored in the literature. therefore, this study contributes to understanding the profile of the entities that cooperate the most during a pandemic and to highlight which this study uses sna as a method to identify and build cooperation networks for cts focused on the development of drugs and vaccines for covid- . the operational definitions of the main constructs used for innovation [ ], sna [ ] , and clinical trials [ ] are detailed in the supporting material (s appendix). the study used data from bio century, clinicaltrials.gov, and policy cures research collected until july th, .
the organizations were classified into firms, universities, hospitals, research institutes (ris) and government. the categories of clinical trials were grouped into eight categories (s table) . the sponsors, which are organizations that provide funding, were excluded from the networks because their participation was assumed to occur only through funding and not in the technological development and trials themselves. for the construction of cooperation networks, it was assumed that organizations signed agreements and/or treaties to develop specific studies, establishing joint ownership of the results and the new drug or vaccine, with the purpose of forming an alliance for innovation [ ] . cts with or more organizations were label standardized and analysed using gephi for graphical representation and calculation of the network metrics. the connections (edges) refer to the (non-directional) relationship of partnerships in each ct. the strength of cooperation (sc) was adopted to identify the affinity of the cooperation relationship between two organizations, and the unilateral dependence on a cooperation relationship (udc) was used to verify the asymmetry collaboration relationship [ ] . sc and udc values vary between and , and the results of each relationship are distributed in quartiles (q) as follows: q = to . ; q = . to . ; q = . to . , and q = . to . to avoid bias in the analysis, one-to-one relationships were suppressed because the metric values would automatically have a maximum value (s appendix). a total of ongoing cts involving drugs and vaccines for covid- were identified, of which % are in cooperation and % are being tested by only one organization (without cooperation). the most tested therapeutic categories involve antibodies, vaccines, and proteins, which represent % of the total cts in progress. 
the following categories stand out based on cooperation relevance (fig ) , in decreasing order: sirna ( %), vaccines ( %), protein-based technologies ( %), nucleic acid-based technologies ( %), antibodies ( %), and cell therapy ( %). conversely, despite technological effort, only % of trials involving small molecules involve cooperation (s table) . notably, organizations seek to accelerate technological development by establishing partnerships [ ] . in the case of covid- , time is crucial because obtaining effective results can generate a broad competitive advantage in this market, in addition to contributing to the resumption of normal activities after the pandemic. regarding the distribution of the types of organizations that cooperate by category (fig ) , a greater diversity of partnerships was observed in antibodies, followed by vaccines and proteins, because these categories address complex therapies and require complementarity among several disciplines (e.g., adjuvants in because clinical adoption and commercial success depend on the degree to which existing practices are incorporated into innovation processes [ ] , the diffusion of disruptive technologies in this field may encounter greater challenges. thus, there may be some difficulties in adopting technologies involving sirna because few firms have expertise in their development, which delays the approval of drugs. it is difficult to correctly deliver sirna treatments without the drugs being degraded by nuclease enzymes and without promoting side effects [ ] . this technological challenge is also observed in treatments involving cell therapy [ ] . the general cooperation network is constituted by cts (fig ) , and organizations establish cooperation relationships (s table) , indicating specific partnerships. the exceptions are collaborations involved in two cts and two partnerships involved in three cts.
by dividing the network by technological categories (fig ) , connected groups related to technologies involving antibodies were identified. most of them are hospitals that work together with several organizations, including other hospitals, in the same convalescent plasma ct. in general, these cooperation relationships were driven mainly by geographic proximity and involved government agencies. despite increasing globalization, regional action is relevant to innovation networks because it facilitates the exchange of knowledge between organizations [ ] . the second largest cluster is vaccines (fig ) , formed mainly by cts between universities, firms, and ris, and focused mainly on recombinant dna, rna, and live attenuated virus. epivax and tonix pharma stand out with four partnerships each. tonix maintains cooperation with the southern research institute and has three other relationships with the university of alberta, all involving engineered live attenuated virus in the preclinical phase. in turn, epivax cooperates with seven organizations, each with a different expertise. an inactivated sars-cov- vaccine candidate, developed by in a partnership with the butantan institute in brazil, is also in a phase iii ct. the partnership between pfizer and biontech also generates a product based on messenger rna, which is being tested in a phase ii ct in germany. there are protein-based technologies (fig ) , which can also include protein subunit vaccines, of which are in the preclinical phase and only two are in phase i cts. the ct involving scb- , which is also a protein subunit vaccine, by sichuan clover biopharma combines the expertise of gsk and dynavax to increase the immune response of patients. nvx-cov , a vaccine candidate developed by novavax, involves partnerships with polypeptide group, agc biologics and emergent biosolution to ensure the scaling of the compounds necessary for final product development and large-scale manufacturing.
these trials show that firms have intensified cooperation relationships to foster a faster response to covid- and ensure product scalability. among the four cts involving cell therapy, the chongging biotechnology and immuncyte (phase i/ii) partnership stands out for car-modified nk cells, which recognize and eliminate the virus. intraregional partnerships in china, in the biomedical field, favour technology spillover abroad and the production of innovation [ ] . four firms and one university developed three preclinical trials based on nucleic acids. ontochem and anixa biosciences participate in two trials involving technology that inhibits the ability of the virus to replicate and bind to human cell proteins. conversely, the university of columbia, oncogenuity, and fortress biotech use a platform to produce oligomers that can help fight covid- and accelerate the discovery of treatments for new outbreaks. the intense participation of chinese organizations in covid- cts is notable, which may represent not only a change in the geographical axis of the development of this type of technology but also the emergence of new players in the fierce biopharmaceutical market. chinese firms also began developing therapeutic strategies before those of any other country, leading the landscape because the outbreak started there. furthermore, organizations from countries without rich traditions in this area, based in eastern europe and latin america, have started to participate in this type of development. some organizations operate in more than one category (fig ) , which may be an indication of greater capacity for technological development and partnerships. the choice to cooperate in different fields, with different partners, contributes to expanding the connection of ct development networks for covid- , directly affecting the network metrics and producing more relevant nodes.
the network metrics (table sm ) indicate a low density, with an average and weighted degree with similar values, because only institutions developed at least two cts. among these, only institutions have built partnerships with different players. the higher density of the less explored categories may suggest that they are better connected, but it arises because they contain few actors and therefore require fewer connections [ ]. the diameter of the overall network is eight connections, which is considered a high value, impacted by organizations that have different partners in more than one ct. regarding betweenness centrality, which indicates high influence within a network due to the power to mediate relationships, it was found that organizations influence connectivity, including firms, nine universities and eight ris (table sm ). the remaining five organizations with the highest betweenness centrality are interconnected. the university of oxford has the highest betweenness centrality, as it participates in a ct that brings together other organizations, in addition to acting in two other trials with partners that also positively influence the network connection, e.g., astrazeneca. this firm also participates in three cts and sought to establish more specific partnerships with organizations that have more than one ct in cooperation. the high volume of triangulations of some organizations suggests multiple partnerships with a high impact on centrality and low risk diversification. when analysing the strength of cooperation, of the relationships, ( %) are one-to-one relationships; that is, organizations participating in only one ct and with a single partner. therefore, for these cases, sc and udc have the maximum value ( ). furthermore, % of the partners have at least one party with a high dependence on the relationship (udc in q ).
these characteristics of a high degree of cooperation and asymmetry in unilateral dependence can be attributed to partnership models with well-defined specializations between the parties, which occur in extremely competitive environments, corresponding to the current moment of the pandemic. for the other partners ( ), sc was found in q only for one relationship, i.e. astrazeneca and chinese academy sciences, that are developing a neutralizing antibody, referring to the fact that both work independently of each other. the other organizations have a median cooperation affinity, with a tendency to increase. this influences the distribution of quartiles of the udc, whose predominance in the combinations q -q and q -q reinforces the dependence asymmetry in the cooperation relationships. this finding is evident when observing the sc and udc measurements by category and type of organization, where antibodies, vaccines and proteins fit this profile (s - table) . among the organizations with greater cooperation in cts, vir biotech stands out with seven trials in progress, of which three focus on sirna and four focus on antibodies. vir biotech showed greater cooperation affinity with gsk for antibodies (sc = . ) and alnylam pharma for sirna (sc = . ). despite this greater affinity, vir biotech has a low dependence on the relationship, unlike the other two partners, who have a high dependence. johns hopkins university, with four trials involving antibodies, one involving proteins, and one involving a vaccine, shows a greater cooperation affinity with capricor (sc = . ) and has a low udc ( . ), while its partner has a relationship of total dependence in this category. another organization that stands out in this analysis is epivax, with six ongoing trials, of which five involve vaccines and one involves a protein, with low affinity with its partners but with dependence on the cooperation relationship. 
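The density and diameter reported for the cooperation network can be illustrated with a small sketch. It is not the authors' pipeline: the organization names are hypothetical placeholders, the graph is treated as undirected and unweighted, and the diameter is taken as the longest shortest path inside any connected component. The toy example also shows why a small category can look denser than a large one simply because it has few actors.

```python
from collections import deque


def density(n_nodes, edges):
    """Density of an undirected simple graph: 2m / (n * (n - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def adjacency(edges):
    """Build an adjacency map from a list of unique undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eccentricity(adj, source):
    """Largest shortest-path distance from source within its component (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path over all nodes, restricted to each component."""
    return max(eccentricity(adj, node) for node in adj)


# Hypothetical partnerships: a chain of four organizations plus one isolated pair.
large = [("org_a", "org_b"), ("org_b", "org_c"), ("org_c", "org_d"), ("org_e", "org_f")]
# Hypothetical small category in which all three actors cooperate.
small = [("org_x", "org_y"), ("org_y", "org_z"), ("org_x", "org_z")]

print(density(6, large), diameter(adjacency(large)))
print(density(3, small), diameter(adjacency(small)))
```

The small group reaches the maximum density with only three links, whereas the larger network stays sparse even with more links, which is the caveat raised above about the less explored categories.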
another characteristic observed is the wide diversity of partnerships between firms, hospitals, ri, and universities. the role of universities has a special meaning: the context of a pandemic requires complex research and leads to intensification of partnerships for ct execution, which are a complementary way to achieve the desired objectives. hospitals are also seen as key organizations in this type of partnership because they enable new drugs and vaccines to be tested quickly and in multiple centers. finally, in the case of firms, they set aside their history of disputes and started to share competences. in this new paradigm, there can be partnerships not only in ct but also in the development of pre-competitive technologies. this experience is expected to change the way organizations define their r&d strategies and lead them to adopt a collaborative innovation model more widely. this study has some limitations due to the availability of information. there is a difference between the quality of information provided by universities and industries. these variations can be perceived according to the geographical locations of organizations and clinical studies. in addition, there is the possibility that some groups may not fully report their status for competitive reasons or overreport to attract more funding. in future research, it is intended to analyze more broadly the process of technological development for drugs and vaccines that will result in disruptive technologies protected by patents, complementing the understanding of different aspects of cooperation in a new context such as the current pandemic. supporting information s appendix. main constructs used for innovation, sna, and clinical trials. s appendix. strength of cooperation (sc) s table. grouping of categories to characterize clinical studies. the categories were grouped considering the technological proximity that exists between them. 
in the case of vaccines, the class information and the type of compound provided by the biocentury were also evaluated. r&d cooperation, partner diversity, and innovation performance: an empirical analysis collaborative networks and product innovation performance: toward a contingency perspective relationship between cooperation networks and innovation performance of smes the role of networks in small and medium-sized enterprise innovation and firm performance effect of collaboration network structure on knowledge creation and technological performance: the case of biotechnology in canada strategic network partner fit, open innovation and organisational performance: a conceptual framework interfirm collaboration networks: the impact of large-scale network structure on firm innovation clinical trials and tribulations in the covid- era release from french national academy of medicine, national academy of pharmacy and academy of sciences the scientific literature on coronaviruses, covid- and its associated safety-related research dimensions: a scientometric analysis and scoping review networks: an introduction user guide -overview. 
clarivate analytics network capital, social capital and knowledge flow: how the nature of inter-organizational networks impacts on innovation introduction to modern information retrieval the relationship of industry structure to open innovation: cooperative value creation in pharmaceutical consortia cancer clinical trials in the era of genomic signatures: biomedical innovation, clinical utility, and regulatory-scientific hybrids chitosan-based nanoparticles for survivin targeted sirna delivery in breast tumor therapy and preventing its metastasis emerging car t cell therapies: clinical landscape and patent technological routes examining the moderating effect of technology spillovers embedded in the intra-and inter-regional collaborative innovation networks of china the geography and structure of global innovation networks: a knowledge base perspective comparing brain networks of different size and connectivity density using graph theory key: cord- - xpnd d authors: strömich, léonie; wu, nan; barahona, mauricio; yaliraki, sophia n. title: allosteric hotspots in the main protease of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xpnd d inhibiting the main protease of sars-cov- is of great interest in tackling the covid- pandemic caused by the virus. most efforts have been centred on inhibiting the binding site of the enzyme. however, considering allosteric sites, distant from the active or orthosteric site, broadens the search space for drug candidates and confers the advantages of allosteric drug targeting. here, we report the allosteric communication pathways in the main protease dimer by using two novel fully atomistic graph theoretical methods: bond-to-bond propensity analysis, which has been previously successful in identifying allosteric sites without a priori knowledge in benchmark data sets, and, markov transient analysis, which has previously aided in finding novel drug targets in catalytic protein families. 
we further score the highest ranking sites against random sites in similar distances through statistical bootstrapping and identify four statistically significant putative allosteric sites as good candidates for alternative drug targeting. for allosteric regulation of the main protease. by providing guidance for allosteric drug design we hope to open a new chapter for drug targeting efforts to combat covid- . results the first step in our graph analysis approach is the construction of an atomistic graph from a protein data bank (pdb) [ ] structure. this process takes into account strong and weak interactions like hydrogen bonds, electrostatic and hydrophobic interactions (see methods and fig. ) . additionally, we can incorporate water molecules, which in the case of the m pro are catalytically important and known to expand the catalytic dyad to a triad [ ] (see fig. s b ). in sites) or form a communication pathway [ ] . by applying quantile regression we are able to quantitatively rank all bonds, atoms and subsequently residues. this allows to score the hotspots we identified and statistically prove their significance. table s ) reveal two main areas of interest in the m pro . the hotspot on the back of the monomer opposite to the active site ( fig. a) is described in more detail in the paragraphs below. hotspot two is located in the dimer interface and contains four residues which form salt bridges between the two monomers. serine and arginine from one monomer connect to histidine and glutamine from the other one, respectively. interestingly, these bonds have been found to be essential for dimer formation which in turn is required for m pro activity [ , ] . to further clarify the interactions between the dimer halves ( hence, we chose these residues as source when looking into pro- tease dimer connectivity in comparison between sars-cov- and sars-cov. 
in sars-cov, this closer dimer packing led to an increased activ- figure : bond-to-bond propensities of m pro sourced from the orthosteric sites. the source sites have been chosen as the catalytically active residues his and cys in both chains of the homodimer and are shown in green (front a) and top b) view). all other residues are coloured by quantile score as shown in the legend and reveal two main areas of interest with important residues labelled. c) the propensity of each residue, Π_R, is plotted against the residue distance from the orthosteric site. the dashed line indicates the quantile regression estimate of the . quantile cutoff used for identifying relevant residues. atomistic level here, we assume that studying the dimer interface residues in a systematic manner would help elucidate the link between domain iii and the catalytic activity of the m pro . bond-to-bond propensities have been shown to successfully detect allosteric sites on proteins [ ] and we here present the results in the sars-cov- m pro to that effect. by choosing the active site residues histidine and cysteine as source, we can detect areas of strong connectivity towards the active centre, which allows us to reveal putative allosteric sites. we could detect two hotspots on the protease which might be targetable for allosteric regulation of the protease (fig. ). most of the residues present in the two putative sites are amongst the highest scoring residues, which are listed in table s . site (fig. a, shown in yellow) is located on the back of the monomer with respect to the active site and is formed by nine residues from domain i and ii (full list in table s ). the second hotspot identified with bond-to-bond propensities is located in the dimer interface and contains residues (tab. s ) which are located on both monomers (fig. b, shown in pink). two of these residues, glu and arg of the respective second monomer, form a salt bridge which is essential for dimerisation [ ] . 
quantile regression allows us to rank all residues in the protein and thus we can score both sites with an average residue quantile score as listed in table . site and have a high score of . and . , respectively, and score much higher than a randomly sampled site would score with . ( % ci: . - . ) for a site of the size of site or . ( % ci: . - . ) for a site of the size of site . our methodologies further allow us to perform the reverse analysis to assess the connectivity of the predicted allosteric sites. for this purpose, we defined the source as all residues within the respective identified sites (tables s and s ). after a full bond-to-bond propensity analysis and quantile regression to rank all residues, we are able to score the active site to obtain a measure for the connectivity towards the catalytic center (tab. s ). for site the active site score is . which is above a randomly sampled site score of . ( % ci: . - . ). however, for site the active site score is . which is only marginally above a randomly sampled site score of . ( % ci: . - . ). as site is located in the dimer interface, this is in line with the above described suggestion that the allosteric effect is not directly conferred from the dimer interface towards the catalytic centre. nonetheless, this site might provide scope for inhibiting the m pro by disrupting the dimer formation at these sites. figure : putative allosteric sites identified by bond-to-bond propensities. surface representation of the m pro dimer coloured by quantile score (as shown in the legend). a) rotated front view with site (yellow) which is located opposite the orthosteric site (coloured in green). b) top view with site (pink) located in the dimer interface. a detailed view of both sites is provided with important residues labelled. 
overall, this missing bidirectional connectivity hints at a more complex communication pattern in the protein and gave us reason to utilize another tool which has been shown to be effective in catalytic frameworks [ ] like the protease. figure a and a full list can be found in table s . in the sars-cov- m pro , this analysis subsequently led to the discovery of two more putative sites as shown in figure c . both hotspots are located on the back of the monomer in relation to the active site. site (shown in turquoise in figure c ) is located solely in domain ii and consists of ten residues as listed in table s . one of these is a cysteine at position which might provide a suitable anchor point for covalent drug design. site (orange in figure c ) is located further down the protein in domain i with residues as listed in table s . both sites were scored as described above and in the methods section. following the same thought process as described for site and , we can investigate the protein connectivity from the opposite site by sourcing our runs from the residues in site and . we then score the active site to measure the impact in multimeric proteins this might be due to another structural or dynamic factor which we did not yet uncover between site and the active site. overall we see a similar pattern of hot and cold spots in the sars-cov m pro (results not shown). we find a high overlap for the identified four sites which gives us confidence that a potential drug effort would find applications in where b is the n × m incidence matrix for the atomistic protein graph with n nodes and m edges; w = diag(w ij ) is an we define the bond propensity as: and then calculate the residue propensity of a residue r: markov transient analysis (mta). 
a complementary, node-based method, markov transient analysis (mta) identifies areas of the protein that are significantly connected to a site of interest, the source, such as the active site, and obtains the signal propagation that connects the two sites at the atomistic level. the method has been introduced and discussed in detail in ref. [ ] and has successfully identified allosteric hotspots and pathways without any a priori knowledge [ , ] . importantly, it captures all paths that connect the two sites. the contribution of each atom in the where t provides models for conditional quantile functions. this is significant here because it allows us to identify not the "average" atom or bond but those that are outliers from all those found at the same distance from the active site and because we are looking at the tails of highly non-normal distributions. as the distribution of propensities over distance follows an exponential decay, we use a linear function of the logarithm propensities can be found in ref. [ ] and for markov transient analysis in ref. [ ] . site scoring with structural bootstrap sampling. to allow an assessment of the statistical significance of a site of interest, we score the site against randomly sampled sites of the same size. for this purpose, the average residue quantile score of the site of interest is calculated. after sampling random sites on the protein, the average residue quantile scores are calculated. by performing a bootstrap with , resamples with replacement on the random sites average residue quantile scores, we are able to provide a confidence interval to assess the statistical significance of the site of interest score in relation to the random site score. investigation as shown in table . for each of these fragment-bound structures, we performed bond-to-bond propensity and markov transient analyses to evaluate the connectivity to the active site. the active site was scored as described above. 
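The site scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the residue quantile scores are synthetic, random sites are drawn uniformly from all residues (the source does not say whether spatial contiguity is enforced), and the percentile interval with alpha = 0.05 is an assumed choice. All function names are hypothetical.

```python
import random
import statistics


def average_quantile_score(site, quantile_scores):
    """Mean residue quantile score over the residues of one site."""
    return statistics.fmean([quantile_scores[res] for res in site])


def random_site_scores(quantile_scores, site_size, n_sites, rng):
    """Scores of randomly sampled sites with the same size as the site of interest."""
    residues = list(quantile_scores)
    return [
        average_quantile_score(rng.sample(residues, site_size), quantile_scores)
        for _ in range(n_sites)
    ]


def bootstrap_ci(values, n_resamples, rng, alpha=0.05):
    """Percentile bootstrap interval for the mean, resampling with replacement."""
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


rng = random.Random(0)
# Synthetic quantile scores for 300 residues (placeholder data, not from the paper).
scores = {res: rng.random() for res in range(300)}
site_of_interest = list(range(9))  # hypothetical nine-residue site

site_score = average_quantile_score(site_of_interest, scores)
random_scores = random_site_scores(scores, len(site_of_interest), 500, rng)
low, high = bootstrap_ci(random_scores, 2000, rng)
print(f"site score {site_score:.3f}; random-site mean CI [{low:.3f}, {high:.3f}]")
```

A site whose score lies well above the upper bound of the random-site interval would be read as significantly better connected than a random patch of the same size.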
a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china a novel coronavirus from patients with pneumonia in china the species severe acute respiratory syndrome-related coronavirus: classifying - ncov and naming it sars-cov- the severe acute respiratory syndrome a decade after sars: strategies for controlling emerging coron- aviruses dissection study on the severe acute respiratory syndrome c-like protease reveals structure-based prediction of protein allostery allosteric modulator discovery: from serendipity to structure-based design activation pathway of src kinase reveals intermediate states as targets for drug design perturbation-response scanning reveals key residues for allosteric control in hsp exploiting protein flexibility to predict the location of allosteric sites pars: a web server for the prediction of protein allosteric and regulatory sites allopred: prediction of allosteric pockets on proteins using normal mode pertur- bation analysis improved method for the identification and validation of allosteric sites structure-based statistical mechanical model accounts for the causality and energetics of allosteric communication reversing allosteric communication: from detecting allosteric sites to inducing and tuning targeted allosteric response mapping allosteric communications within individual proteins protein multi-scale organization through graph partitioning and robustness analysis: application to the myosin-myosin light chain interaction uncovering allosteric pathways in caspase- using markov transient analysis and multiscale community detection bagpype: a python package for the construction of atomistic, energy-weighted graphs from biomolecular structures prediction of allosteric sites and mediating interactions through bond-to-bond propensities allostery and cooperativity in multimeric proteins: bond- to-bond propensities in atcase the origin of allosteric 
functional modulation: multiple pre-existing pathways abstract : targeting rsk prevents both chemoresistance and metastasis in lung cancer the protein data bank sars-cov cl protease cleaves its c-terminal autoprocessing site by novel subsite cooperativity quaternary structure of the severe acute respiratory syndrome (sars) coronavirus main protease crystallographic and electrophilic fragment screening of the sars-cov- main protease potential anti-viral activity of approved repurposed drug against main protease of sars- cov- : an in silico based approach silico evaluation of the effectivity of approved protease inhibitors against the main protease of the novel sars-cov- virus targeting the dimerization of the main protease of coronaviruses: a potential broad- spectrum therapeutic strategy targeting non-catalytic cysteine residues through structure-guided drug discovery inference of macromolecular assemblies from crystalline state proteinlens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation inorganic chemistry: principles of structure and reactivity dreiding: a generic force field for molecular simulations automated design of the surface positions of protein helices hydrophobic potential of mean force as a solvation function for structure of complex networks: quantifying edge-to-edge relations by failure-induced flow redistribution algebraic graph theory random walks, markov processes and the multiscale modular organization of complex networks quantile regression quantreg: quantile regression. r package version exploring allostery in proteins with graph theory open-source foundation of the user-sponsored pymol molecular visualization system key: cord- -n ens authors: rosebrock, adam p. 
title: patient dna cross-reactivity of the cdc sars-cov- extraction control leads to an inherent potential for false negative results date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n ens testing for rna viruses such as sars-cov- requires careful handling of inherently labile rna during sample collection, clinical processing, and molecular analysis. tests must include fail-safe controls that affirmatively report the presence of intact rna and demonstrate success of all steps of the assay. a result of “no virus signal” is insufficient for clinical interpretation: controls must also say “the reaction worked as intended and would have found virus if present.” unfortunately, a widely used test specified by the us centers for disease control and prevention (cdc) incorporates a control that does not perform as intended and claimed. detecting sars-cov- with this assay requires both intact rna and successful reverse transcription. the cdc-specified control does not require either of these, due to its inability to differentiate human genomic dna from reverse-transcribed rna. patient dna is copurified from nasopharyngeal swabs during clinically-approved rna extraction and is sufficient to return an “extraction control success” signal using the cdc design. as such, this assay fails-unsafe: truly positive patient samples return a false-negative result of “no virus detected, control succeeded” following any of several readily-encountered mishaps. this problem affects tens-of-millions of patients worth of shipped assays, but many of these flawed reagents have not yet been used. there is an opportunity to improve this important diagnostic tool. as demonstrated here, a re-designed transcript-specific control correctly monitors sample collection, extraction, reverse transcription, and qpcr detection. this approach can be rapidly implemented and will help reduce truly positive patients from being incorrectly given the all-clear. 
one sentence summary: a widely-used covid- diagnostic is mis-designed and generates false-negative results, dangerously confusing “no” with “don’t know”, but it’s fixable. the focus of molecular testing will shift to clearing asymptomatic individuals based on absence of detectable virus. the us centers for disease control and prevention (cdc) have specified and given emergency use authorization (eua) for a sars-cov- molecular diagnostic used to detect viral rna in clinical samples ( ) . this assay is widely used and also serves as a prototype for many fda-authorized tests (discussed below). as i show here, this assay uses a poorly-designed specimen extraction control that does not perform as intended on real-world samples. the cdc control fails to report on specimen and process integrity. i alerted the cdc to this potential problem in march of , supported by bioinformatic prediction. their response, sent more than a month later, did not share my concerns. one, the cdc had not received widespread reports of false-negative tests. two, negative patients should err on the side of caution and re-test if concerned. three, the fda had approved this test, making it de jure correct (full correspondence in supplemental materials). false negatives are inherently difficult to detect, particularly when the assay in question is the current gold standard. testing reagents are in short supply, making it hard for individuals to be tested once, much less repeatedly. official approval is not a defense against facts. as i show below, the cdc design is both flawed and fixable. individuals would be incensed by a preventable false-positive diagnosis putting them into unnecessary quarantine. the public should be incensed by a poorly designed official assay whose preventable false-negative diagnoses allow virus-positive patients back into society. the -ncov real-time rt-pcr diagnostic panel is a cdc-designed and validated covid- assay comprised of three qpcr primer/probe sets. 
n and n generate sars-cov- -specific amplicons from reverse transcribed viral rna. the third, rp, targets the human rpp gene and is an "extraction control" that is intended and claimed to inform about successful sample collection, rna extraction, and reverse transcription ( ). it does not perform these functions in practice. the cdc design is fundamentally flawed and has the potential to return dangerous false negatives (fig. a) . the cdc-specified rp extraction control design generates identical amplicons from reverse transcribed rna and human genomic dna. megablast ( ) search of the rp forward, reverse, and probe sequences against the human refseq_rna transcript database identifies perfect matches within the rpp mrna. the design recognizes sequences that are entirely contained within a single exon, and therefore generates an identical amplicon from the rpp genomic locus (fig. b) . the cdc extraction control generates a positive signal in reactions containing intact rna. it also generates a positive signal from rna-free reactions containing human genomic dna (fig. s ). genomic dna is co-purified in quantities sufficient to generate strong positive signals for the cdc-specified extraction control during work-up of clinical rna specimens. to test for the presence of control-affecting dna, qpcr reactions lacking reverse transcriptase were performed on sars-cov- -positive clinical samples using the cdc-specified rp primer and probe. "purified rna" samples from covid- cases were obtained from a cap/clia certified university hospital molecular pathology laboratory. samples had been processed using either solid phase extraction (qiagen) or low ph organic extraction (trizol). both methods are permitted under current emergency use authorization (eua) for rna isolation. in reality, both methods co-purify significant amounts of dna ( ). 
all clinical samples tested generated unambiguous extraction control positive signals in the absence of reverse transcription, a reaction context that could not have detected an rna virus (fig. a). single-digit copies of genomic dna are sufficient to generate a positive control signal using the cdc-designed assay. driven by concern about the control probe signals observed in pcr-only analysis, i determined the quantity of genomic dna required to generate an assay-positive control signal. commercially supplied, pooled human dna was serially diluted across several orders of magnitude and analyzed by the same one-step rt-qpcr used for clinical samples (fig. b) . extrapolating from pure-dna samples, a positive signal from the cdc control would be readily achieved in samples containing one copy of the genome per reaction ( ). due to the presence of co-purifying genomic dna in clinical samples, loss of rna integrity leads to false-negative results using the cdc-specified control. rna, including the genome of the sars-cov- virus, is intrinsically more labile than dna. rna-degrading enzymes (rnases) rapidly hydrolyze rna while leaving dna intact. laboratory procedures that manipulate rna, including methods specified in the cdc eua, require careful sample, reagent, and consumable handling to avoid introduction of ubiquitous environmental or personnel-derived rnases. loss of rna integrity, accomplished here by intentional exposure to rnase, causes virus-positive clinical samples to return a false-negative "extraction control positive, virus negative" result using the cdc design (fig. ) . assays can be designed to specifically detect mature mrna, and ignore genomic dna, by generating amplicons that are of pcr-amplifiable size only after splicing or by targeting sequences created at exon-exon junctions (figs. , s - ). 
a transcript-specific control design incorporating both of these approaches correctly monitors human specimen collection, rna integrity, and successful reverse transcription, thus serving all intended roles of the human specimen / extraction control (figs. , s , and s and supplementary text). discussion: properly-designed extraction and reverse transcription controls are essential, not for detecting sars-cov- , but for determining when a negative test result is invalid. the unprecedented scale of molecular testing undertaken during the covid- pandemic has required that labs rapidly adopt this cdc-specified assay, among many other new assays. labs accustomed to automated walk-up systems are now performing manual sample handling and reaction setup one pipet tip at a time. the cdc design detects sars-cov- , but in the face of real-world assay errors, its faulty control design fails-unsafe and calls into question virus-negative results. the cdc specifies a once-per-plate control using in vitro transcribed rna, purified virus, or pooled patient samples. this approach does not inform about the suitability of individual specimens. degradation of sample rnas or failure of reverse transcription on a per-sample basis results in the inability to detect viral rnas while the dna-templated control signal remains. this opens the door to false-negative clinical interpretation of patients from whom virus was successfully collected. controls for specimen integrity and success of all reaction steps are critical to robust assay design. the currently-specified cdc control achieves neither. a number of other fda-approved sars-cov- molecular assays use synthetic rna spike-ins added at the time of extraction. this approach controls for post-extraction sample integrity and reverse transcription, but does not inform about sufficiency of collected patient samples ( ). 
the intent of the cdc's extraction control is good, monitoring ( ) sufficient human specimen collection, ( ) maintenance of sample integrity during processing, and ( ) successful reverse transcription and qpcr. these goals can be met by an extraction control that specifically detects human rna and is insensitive to genomic dna; the cdc's mis-executed plan can be put into practice by manufacturing and shipping an updated control. the dna cross-reactive control probe is used in over % of the covid- fda-authorized protocols listed as of may , ( ). million assays worth of cdc-design reagents have been produced and shipped by a single vendor as of april , ( ). this cdc-designed sars-cov- assay is fda-approved and unquestionably popular, but we must not confuse popularity and official approval with being correct. the scope of this problem is large, though its extent is unknowable: the sequences used in many test designs, commercial and academic, are not disclosed to the public. even fully disclosed sequences do not guarantee rapid discovery of errors. this poorly-designed cdc extraction control probe has been included in clinical rt-qpcr panels used in testing for rna viruses across more than a decade ( ). past errors cannot be unmade, but future false negatives that would arise from this design can be prevented. the current cdc algorithm states that molecular testing "…must be combined with clinical observations, patient history, and epidemiological information" ( ) . in the context of drive-up testing and potential nationwide return-to-work testing, clinical observations will be necessarily limited and often impersonal in scope. absence of data and data of absence must be carefully distinguished. a proper control does not provide definitive diagnostic results from a flawed specimen. it will, however, fail-safe, flag a result as invalid, and call attention to problem samples to enable focused re-testing or changes in clinical management. 
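The difference between a control that fails unsafe and one that fails safe can be made concrete with a small sketch of the interpretation logic. It is a simplified model, not the CDC's published algorithm: any viral probe signal is treated as a positive call, the boolean inputs stand in for real Ct thresholds, and the function names are hypothetical.

```python
def control_signal(intact_rna, rt_succeeded, genomic_dna_present, dna_cross_reactive):
    """Whether the human extraction control returns a positive signal.

    A transcript-specific control (dna_cross_reactive=False) needs intact RNA
    and successful reverse transcription; a DNA cross-reactive control also
    fires on co-purified genomic DNA alone.
    """
    from_rna = intact_rna and rt_succeeded
    from_dna = genomic_dna_present and dna_cross_reactive
    return from_rna or from_dna


def viral_signal(virus_collected, intact_rna, rt_succeeded):
    """Viral probes can only fire on reverse-transcribed, intact viral RNA."""
    return virus_collected and intact_rna and rt_succeeded


def interpret(viral_positive, control_positive):
    """Simplified report: positive, not detected, or invalid."""
    if viral_positive:
        return "positive"
    if control_positive:
        return "not detected"
    return "invalid"


def run_assay(virus_collected, intact_rna, rt_succeeded, dna_cross_reactive):
    """Simulate one specimen; genomic DNA is assumed to be co-purified."""
    viral = viral_signal(virus_collected, intact_rna, rt_succeeded)
    control = control_signal(intact_rna, rt_succeeded, True, dna_cross_reactive)
    return interpret(viral, control)


# Virus-positive specimen whose RNA was degraded during handling.
print(run_assay(True, False, True, dna_cross_reactive=True))   # not detected
print(run_assay(True, False, True, dna_cross_reactive=False))  # invalid
```

With the cross-reactive control, the degraded specimen is reported as "not detected"; with a transcript-specific control the same specimen is flagged "invalid" and can be re-tested.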
in the interest of public health, assays must generate as many valid results as feasible. this requires simultaneously maximizing identification of true positives and minimizing false negatives. virus-shedding patients must not be given an all-clear test result due to an improperly designed control that can be readily fixed. while perfect diagnostic tests are unachievable, assays should be designed and updated to strive for maximum specificity. the cost of incorrectly clearing sars-cov- -positive patients is incalculably large. implementing a redesigned control that performs as intended costs pennies per test. a new primer / probe set, such as the one described here, is a drop-in replacement that can be manufactured rapidly and at scale. it is imperative that the cdc take action to update its clinical algorithms and specify a re-designed transcript-specific extraction control that performs all of its claimed, and critically important, functions. . the amount of co-purified dna is lower for trizol-purified samples despite use of a larger fraction of the sample. this may reflect an even greater rna:dna co-purification ratio but may also reflect evolving clinical skills in patient sample collection during covid- testing. . the cdc eua states that the extraction control is considered positive / indicative of a successful extraction, where control probe ct < cycles using a fastdx instrument (see supplementary text). . multiple vendors of purified "reference" dna were used with similar results. promega human male control dna is available in relatively large, validated, lot-controlled batches, is widely cited, and is used in all figures shown here. . nasopharyngeal (np) swabs in transport media are a particularly sanguineous mixture that is frequently received with co-collected patient cells and tissue.
while there may be more patient dna in np-derived specimens than other collection types, human genomic dna is routinely intentionally collected from saliva, sputum, and other covid- relevant anatomic sites. . a properly-processed blank swab or empty vial of vtm will generate, as intended, a spikein positive signal. figures s -s table s fig. . a. the cdc-specified sars-cov- qpcr assay and associated clinical algorithm are intended to inform for presence of virus, or that the assay worked and could have found virus, but that no virus was found. signal from virus-targeting rt-qpcrs is sufficient to report "positive -ncov". an "invalid" report is returned where all viral and control probes are negative. the clinical interpretation is " -ncov not detected" when a human control, targeting the rpp gene, returns a positive signal and when viral probes are negative. patients who are truly virus negative will fall into this category. so will patients for whom sample collection, handling, or analysis problems have occurred, leading to false negative results. b. the cdc-specified extraction control primers and probe recognize a sequence that is identical in reverse transcribed rna and genomic dna. megablast search of the human refseq_dna database identifies perfect matches for the cdc-control primers, rp-f and rp-r, and probe, rp-p. this design amplifies and detects a base pair amplicon located entirely within exon of the rpp gene, located on human chromosome (coordinates from hg human genome build). specific detection of reverse-transcribed rnas can be accomplished by design of an exon-exon junction spanning amplicon and probe (supplemental figure ) . genomic dna is co-purified during rna extraction of clinical covid- samples and is sufficient to generate a positive signal using the cdc extraction control. a. qpcr (without reverse transcription) was performed using the cdc extraction control (rp) probe. raw ct values are shown. 
a total of representative covid- positive clinical samples were tested, fourteen purified by automated silica spin-column (qiagen) and fourteen by manual organic extractions (trizol, materials and methods). all samples contained genomic dna and generated extraction control positive signals well above the cdc assay threshold ( cycles, horizontal dashed line). significantly more genomic dna was present in samples purified by silica spin-columns compared to organic extraction (mean ct of . and . , respectively, p= . x - , unpaired t). co-purification of dna is to be expected based on the mechanisms of purification used, and is openly caveated in vendor literature ( ). per-plate process controls, shown, contained . ng of genomic dna, equivalent to ~ diploid cells. b. single-cell equivalents of purified genomic dna, absent rna, are sufficient to generate an "extraction control positive" signal using the cdc-rp probe. rt-qpcr was performed using viral one- step mastermix (materials and methods) on -fold serial dilutions of a commercially-supplied reference human dna (rna-free). dilution equivalent to one diploid copy of gdna per reaction ( . pg) generates signals well above the cdc-specified threshold, despite increase in ct variability at extreme dilutions. fig. . cdc-specified testing of sars-cov- -positive clinical specimens return a false negative following loss of rna integrity. reverse-transcriptase-qpcr data were generated using viral one-step rt-qpcr mastermix (materials and methods). extracted nucleic acid samples used in figure were pooled, divided into aliquots, and treated with either nuclease-free water or ribonuclease a (rnase a). commercial rna, dna, and no-template controls (ntc) were treated in parallel. samples were analyzed using the viral n , viral n , and cdc-designed rp control primer/probe sets, and a transcript-specific control targeting exons - of rpp (table s ). 
correct positive viral and control signals are generated from each infected-patient pool after mock treatment (water, left). loss of rna integrity causes clinical samples to return negative signals from both viral probes, but the cdc-specified rp control remains positive due to dna-templated amplification (rnase, right). a transcript-specific rp control accurately informs as to loss of sample integrity. as desired, samples with degraded rna do not generate an extraction control signal from a transcript-specific design, marking the result as invalid, rather than "virus not detected". horizontal dashed lines reflect software-generated ct thresholds. sample preparation fully de-identified clinical "rna" (more properly, total nucleic acid) specimens, generated in excess during testing of sars-cov- rt-qpcr-positive patients, were provided by the stony brook hospital biobank. nucleic acids had been previously purified from polyester-tipped nasopharyngeal swabs desorbed in viral transport media (vtm) and processed in a clia high complexity pathology laboratory. briefly, nucleic acid was extracted from µl vtm using qiagen qiaamp dsp viral mini spin columns processed on a qiacube automation system as per vendor instructions (qiagen, bv) or from µl vtm using manual trizol-mediated organic extraction (life technologies) followed by alcohol precipitation ( ). samples were eluted in µl of rnase-free water supplemented with . % nan (qiagen) or resuspended in rnase-free water (trizol) and stored at - c until use. commercial pooled rna (quantigene human reference, qs , life technologies) and pooled human male dna (g , promega and , life technologies) were used to generate calibration curves and serve as reaction controls. dna samples were diluted in nuclease-free water supplemented with . % tween- ; rna samples were diluted in nuclease-free water. low binding filter tips were used for all liquid handling.
pcr primer and probe design qpcr primers and hydrolysis ("taqman") probes were used for real-time detection. cdc-specified rnasep (rp) and sars-cov- (n and n ) probes were purchased from integrated dna technologies (ruo qualified, ; to conserve clinically certified reagents, research grade primers of identical composition were used). cdc primers were used as specified in the eua. a total of five rpp probe set designs from multiple vendors were ordered and empirically tested (table s ). these include probes targeting exon-exon junctions and a minor-groove-binding taqman probe (life technologies). the best performing design, generating an amplicon across exons - with an exon-exon junction-spanning probe, was ordered with a ' -carboxyfluorescein, -bhq and internally-quenched probe from idt as assay hs.pt. . . this design had the highest sensitivity (lowest ct), highest efficiency, and most consistent terminal signal characteristics of all candidate probes tested (supplementary figure s a). all non-cdc primer/probes were used at pmole primer/ pmole probe per µl reaction. qpcr reaction conditions reverse transcription coupled qpcr was carried out essentially as described in the cdc eua for covid- testing ( ) with the following modifications. to conserve supplies for clinical use, comparable research-grade reagents were used where possible. taqman™ fast virus -step master mix ( , life technologies) was used for one-step rt-qpcr and taqman fast advanced master mix ( , life technologies) was used for non-reverse transcriptase reactions. real-time data were collected on a validated viia qpcr instrument using a w block and vendor plates and seals. all runs were performed using cdc-specified thermal cycling parameters as per the eua. automatic baselining and thresholding of raw qpcr data were performed using vendor software (quantstudio . , life technologies).
efficiency of all qpcr probes was tested by using a simplex -fold serial dilution of quantigene commercial reference human rna (invitrogen) or dna (promega) ranging from . ng to . pg per reaction using virus onestep mastermix and taqman fast mastermix. to robustly determine cdc-rp response to genomic dna, four completely independent dilutions of genomic dna were performed down to . pg/reaction and analyzed using virus one-step mastermix. sample pooling and rnase treatment limited-volume clinical samples were pooled to generate sufficient material for the rnase-exposure experiment. a total of clinical rna samples were pooled, fourteen each purified by qiagen and trizol (above, sample preparation). three pools were generated for each technology, containing , , and clinical samples, respectively. pooling was performed based on sequence of the de-identified serialized sample names (q , q …q ) provided by the stony brook hospital biobank. sample names do not reflect time of collection nor any other patient or sampleintrinsic information. pool composition, beyond positive sars-cov- status and by extraction technology, is effectively random. µl of each clinical purified nucleic acid specimen was pooled into a fresh rnase-free polypropylene tube. µl of nuclease-free water was added to each of the four-specimen containing pools to compensate for the lower specimen count (resulting in a total volume of µl per pool). µl of each pool was aliquoted to a fresh tube. µl aliquots of . ng/µl promega control dna and quantigene rna were generated, as were µl aliquots of nuclease-free water as a no-template control. µl of nuclease free water was added to each of the "sham" samples (six pools plus three controls). µl ( . u/µl) of recombinant rnasea was added to each "rnase" sample in a physically separate laboratory space. capped tubes were mixed, vortexed, centrifuged briefly, and incubated at ˚ for minutes immediately prior to rt-qpcr setup. 
one-step rt-qpcr was performed on treated and control samples using cdc-rp, cdc-n , cdc-n , and hs.pt. . assays. plotting of raw qpcr data raw qpcr data were exported as deltarn (rox-normalized, background subtracted fluorescence) per cycle using quantstudio. "amplificationdata" were imported into r for plotting. optical and quantization noise result in small values centered around (i.e. either + or - x - ) during cycles prior to generation of detectable signal. to eliminate undefined log transformation errors of negative values, a small offset value of . was added to each deltarn signal prior to plotting on log-axis scale. this is evident as a displayed baseline signal centered around . in figure . unmodified data were used for all calculations and are included in supplementary data; this transformation is for convenience to make ≤ data plottable on a logarithmic axis. controls are useful; why not add more? assay design should be fit to task and balance throughput, cost, and sensitivity/specificity. controls are necessary, but add assay complexity, reagent cost, and can compete with throughput. the current cdc-eua uses a single detection channel, -carboxyfluoroscein, for each of three probes that are analyzed separately. three wells are required per patient or control specimen. adding even one more control decreases assay capacity by % while increasing the cost of one-step mastermix per patient by the same amount. more is not always necessary. a well designed, transcript-specific sample collection, extraction, and rt-qpcr probe can achieve a composite control using one channel. if the control does not turn positive, a virus-negative reaction cannot be trusted. a composite control does not specify where a problem occurred, only that one has. modern qpcr (and by extension, rt-qpcr) is not limited to a single detection channel; multiplex designs can generate data from two to five (or more) targets in a single well. 
multiple fluorescent dyes on probes of different sequence can be combined in carefully-designed assays, increasing the number of targets that can be interrogated in one tube. limitations of multiplex qpcr hardware, fluorescent dyes, and the realities of molecular biology constrain the degree of multiplexing and drive design concessions. multiplex designs are widely used in the clinic, including for sars-cov- testing. the ability to multiplex does not obviate good assay design. several current panel designs use a synthetic rna spike-in to monitor extraction and reverse transcription/pcr. patient sampling and extraction controls are critical to clinical interpretation and should be prioritized. as of may , , none of the fda eua'd sars-cov- assay designs contain both a human sample collection control and a separate extraction/rt control. four-plex rt-qpcr, used by several panels, could be leveraged as a pair of viral probes, an rna spike to monitor extraction/rt-qpcr, and an rna-specific human specimen control. this approach retains control for presence of a human specimen while providing more granularity than available from a single composite control, such as a transcript-specific rpp probe. this two-control design would enable more efficient handling of specimens where no human specimen control signal is generated. an intact spike-in signal would mean that that extraction and downstream handling of rna through rt-qpcr were successful, despite no patient rna having been obtained. patients should be called back in, re-sampled, re-extracted, and re-tested. in contrast, failure of the spike-in control means that something likely went amiss during handling. whether caused by a bad pipet tip, a one-off problem with extraction, or shaky hands at the end of a shift, excess viral transport media could be re-extracted and re-tested before recalling a patient into the health care setting for a re-swabbing. why change the assay when one could fix the input? 
although a re-designed transcript-specific control eliminates the false negative potential described here, approaches to make best use of the current reagent will likely be proposed. one potential alternative to eliminate confounding genomic dna is to digest extracted samples with deoxyribonuclease (dnase). this should be avoided. this approach is expensive (far more than a re-designed primer), requires addition of divalent cations that can interfere with downstream reverse transcription and pcr, and is imperfect: dna has evolved to be more physically robust than rna. as shown above and in many other contexts, exponential amplification during pcr generates detectable signals from single-digit copies of template. even commercial offerings of "reference rna" can be contaminated with low, but interfering, levels of human dna. quantigene human reference rna (life technologies) is specified as a mixture of dnase-treated rnas from multiple cell lines. in practice, this material continues to generate qpcr signals both using a taq-only, non-reverse transcribing, mastermix and a one-step rt-pcr mix following rnase a treatment (figs. and s ). ct is a unitless measurement and should be used with care when discussing quantity. qpcr is able to operate in qualitative, semi-quantitative, or fully quantitative modes, depending on assay design. currently deployed assays are effectively qualitative (yes/no: above or below an acceptable ct). throughout this manuscript, absolute ct values and differences in ct between samples are used as a proxy of inverse(log ) amount of input material or differences thereof between samples. comparisons are restricted to same-probe, same-instrument, same (or similar) reagent. calibration of ct:[nucleic acid] was performed using dilutions of commercial standards. an exception to this conservative approach is the use of the " cycle" threshold specified in the cdc eua.
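the relationship between ct and input quantity described above can be made concrete with a short python sketch. it is not the analysis used in this study: it assumes ideal doubling per cycle unless an efficiency is supplied, and uses the conventional standard-curve relation between slope and amplification efficiency. the function names and the ten-fold dilution series are made up for illustration and are not taken from this study's data.

```python
import math


def fold_difference(delta_ct: float, efficiency: float = 1.0) -> float:
    """Relative template amount implied by a Ct difference.

    efficiency = 1.0 means the product doubles every cycle.
    """
    return (1.0 + efficiency) ** delta_ct


def fit_slope(log10_inputs, cts):
    """Least-squares slope of Ct against log10(input amount)."""
    n = len(cts)
    mean_x = sum(log10_inputs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_inputs, cts))
    sxx = sum((x - mean_x) ** 2 for x in log10_inputs)
    return sxy / sxx


def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency from a standard-curve slope."""
    return 10.0 ** (-1.0 / slope) - 1.0


# Made-up ten-fold dilution series with ideal spacing between Ct values.
log10_input = [1.0, 0.0, -1.0, -2.0]
cycles_per_tenfold = math.log2(10)  # cycles needed for a ten-fold increase
cts = [20.0 + cycles_per_tenfold * i for i in range(4)]

slope = fit_slope(log10_input, cts)
print(round(slope, 2))                         # -3.32
print(round(efficiency_from_slope(slope), 3))  # 1.0
print(fold_difference(3.0))                    # 8.0
```

because the conversion depends on probe efficiency, instrument, and reagent, a ct difference is only meaningful as a quantity estimate within a single calibrated setup, which is why comparisons here are restricted to same-probe, same-instrument conditions.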
our results err on the side of underestimating the extreme sensitivity of the primary cdc design. the applied biosystems viia w instrument used here is moderately less sensitive (higher ct) for a given reaction composition and volume than a w fast reaction block in the same instrument (information obtained from life technologies technical support, / ). for a representative instrument, the w fast block ct was approximately . cycles lower across the dilution range. assays should be improved when feasible, not rubber-stamped and set in stone. i alerted cdc that their eua-specified control suffered from a potential false-negative problem on march , (attached below). their response, sent thirty-six days later (also attached), dismissed my concerns at multiple levels. one: there had not been widespread reporting of false negatives from this test. two: cdc's clinical algorithms state that negative-testing patients should be retested and that clinical management should not rely solely on this test. three: the fda had approved the test, making it, de jure, acceptable. each of the cdc's arguments is fallacious. false negatives are inherently difficult to catch, and identification of true positives is a challenging bootstrap problem. the idea of "re-test if you're uncertain" is tone deaf in the context of widespread shortages of testing reagents and long backlogs for sample processing, not to mention the moderately invasive nature of nasopharyngeal sampling. appealing to fda's authority may be legally satisfying, but does nothing to actually address issues of scientific merit. we will never know how many virus-positive patients have been given what amounts to an "all clear" due to false negatives generated by this flawed control design. the potential for misdiagnosis is real, and there is a window to make an important assay work as it was intended.
the alternative would be for cdc to say "our test doesn't work as we intended; if you are worried that you're sick and your test has come back negative, come back and be tested again until your fears are allayed, or you test positive." fixing some problems is difficult; others are difficult to identify but easy to remedy once found. as shown here, this important assay can be made to operate as intended by a re-manufacture of a single reagent. a b rna, but not genomic dna, is a productive template for reverse-transcription-qpcr amplification using any of the five tested transcript-specific rpp controls. one-step rt-qpcr was performed with the same mastermix used for clinical samples across a -fold dilution series (materials and methods). as expected, pre-pandemic commercial reference rna is nonreactive to the sars-cov- primer/probe sets. all six primer/probe sets targeting rpp show a strong and relatively linear signal across a range of rna input (a) (s a). using the same reverse-transcriptase-containing mastermix with mass-equivalent dilutions of rna-free genomic dna input, none of the transcript-specific probes generated a qpcr signal, within cycles, across a titration of purified dna ranging from , copies to . copies/reaction. (b) the cdc-specified rp control generates a strong signal from genomic dna. reactions containing ng of either total rna or genomic dna generated a ~ cycle difference for threshold crossing (supplemental text). modest baseline and ntc signal is due to offsetting of raw qpcr signals before plotting, as described in materials and methods. positive signal from a transcript-specific rpp probe requires both rna and reverse transcriptase (rt). the cdc-specified rp control requires neither rna nor rt when genomic dna is present.
dilutions of either commercial rna or dna were used as input for reactions containing either viral one-step rt-qpcr mastermix (containing reverse transcriptase) or taqman fast qpcr mastermix (lacking reverse transcriptase) using either the cdc-specified or transcript-specific rpp probe. the transcript-specific probe generates signal only in reactions with an rna input where reverse transcriptase is present (as a one-step mastermix). genomic dna does not generate a signal above baseline even in an extended -cycle amplification, nor does rna generate a signal in a qpcr-only (no reverse transcriptase) reaction for a transcript-specific probe. signal generated for cdc-rp probe in pcr-only analysis of "rna" is due to low-concentration genomic dna present in quantigene commercial rna (supplemental discussion), further underscoring the need for transcript-specific primer design. the cdna-specific rpp assay used here is doubly unable to generate qpcr signals from genomic dna. reverse-transcribed rpp rna can serve as a template for the hs.pt. . qpcr assay generating a relatively short amplicon (s a). the forward and reverse primers bind in the bodies of exon and , respectively. the hydrolysis probe binds at the exon -exon junction that is formed during mrna splicing. this primer/probe set does not generate a signal from a genomic dna template (shown above), due to the ~ . kb spacing between the forward and reverse primers that is un-amplifiable using the short annealing/extension cycles of the cdc ramping program (s b). even if pcr product were generated, the hydrolysis probe binding sequence is split in half, with each side ~ kb apart in a genome-templated amplicon. colleagues at the cdc, i am writing to express my concern that the rnase p probe sequence specified in the -ncoveua- revision assay does not serve its desired function as a control of sample collection and reverse transcription.
the goal of the rnasep probe is to provide a composite control that is positive if sufficient specimen, including intact rna, has been collected and a successful reverse-transcription qpcr has occurred. as per the eua, "failure to detect rnase p in any clinical specimens may indicate: − improper extraction of nucleic acid from clinical materials resulting in loss of rna and/or rna degradation." the specified rnasep probe is not selective for the rnasep transcript. the amplicon and probe sequence are contained within a single exon, do not span an exon-exon junction, and therefore detect both rna and dna templates (see illustration on following page). this fundamental problem undermines the control and makes "rnasep positive, viral transcript negative" samples ambiguous. samples with degraded rna (but intact human dna) will generate a qpcr signal for the rnasep control due to contaminating dna, but return a false negative for n and n (viral) amplicons due to the absence of intact rna. similarly, failure of the reverse transcription step precludes viral cdna synthesis, but any human genomic dna will generate a positive rp control signal. although the rp sequences are inherent non-selective for rnasep rna, it is conceivable that this primer was functionally tested in ideal conditions where sample extraction generated pure rna. as the range of sample preparation protocols used expands (either as specified in the eua or off-label), dna purification alongside rna is inevitable. as presently designed, the rnasep probe does not inform about the presence of intact specimen rna nor does it control for successful reverse transcription. this has the potential to adversely affect clinical decision making. the -ncoveua- rev assay certainly detects viral load from controls and patient samples (our facility has recently completely proficiency testing to that end), but this assay has the potential to return false negative results. 
i suggest that the " -ncov rrt-pcr diagnostic panel results interpretation guide" be updated to reflect this fact, and that a re-design of the rnasep probe (to generate a transcript-specific amplicon that will be insensitive to dna) be undertaken post haste. i am including annotations of genbank files for both rna (cds) and genomic sequences on the following page. thank you for your work on the rapid development of this assay; my goal in providing this feedback is to help improve upon your significant efforts and ensure that reported results reflect patient disease state. adam rosebrock, phd cdc's response, received april , . to protect the email addresses of those carbon copied, cdc-internal email addresses have been redacted. this email is being presented in full and unedited. it is internally ambiguous. the first paragraph correctly notes that my email reflected concerns about false negative potential. this is contradicted in the second paragraph as "false-positive results", which i hope to be a typo by the cdc. given the importance of this communication in the context of cdc's response, i am including a contact-info redacted version of this email in spite of the "do not distribute" message. no patient identifying information is present, and this represents us government work product (from the cdc) and that of a ny state, federally grant funded employee (the recipient). dear adam rosebrock, our apologies for the delayed response to your email. as you can imagine we are currently experiencing a high volume of inquiries, but patient care is a top priority and we want to thank you for reaching out to us about your concerns for the -ncov assay and the potential for false negatives. the assay rnasep control serves as an extraction (total nucleic acid) and amplification control. the assay positive control serves as a control for reverse transcriptase function. 
although there is a theoretical possibility that extracted rna might be degraded in individual samples, this assay design has been authorized by fda. in addition, cdc has not received feedback that the cdc eua covid- assay produces false-positive results when used according to the ifu. the cdc notes on its website and in its advice to both physicians and patients that the rt-pcr may come back negative if viral titers haven't yet reached the levels of detection, or potentially if the rna has degraded as is your concern, and to always factor in the patient's symptoms when making a diagnosis. i believe we can all agree that in this pandemic it is better to err on the side of caution. i hope this response answers some of your concerns, and have a wonderful day! this e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. any unauthorized review, use, disclosure or distribution is prohibited. if you are not the intended recipient, please contact the sender by e-mail and destroy all copies of the original. virological assessment of hospitalized patients with covid- revision: , centers for disease control and prevention: division of viral diseases acknowledgments: apr would like to thank amy a. caudy for technical discussion and critical reading of the manuscript, bruce futcher for discussion, and richard kew and karen bai key: cord- -o uplxe authors: beaudoin-bussières, guillaume; laumaea, annemarie; anand, sai priya; prévost, jérémie; gasser, romain; goyette, guillaume; medjahed, halima; perreault, josée; tremblay, tony; lewin, antoine; gokool, laurie; morrisseau, chantal; bégin, philippe; tremblay, cécile; martel-laferrière, valérie; kaufmann, daniel e.; richard, jonathan; bazin, renée; finzi, andrés title: decline of humoral responses against sars-cov- spike in convalescent individuals date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: o uplxe in the absence of effective vaccines and with limited therapeutic options, convalescent plasma is being collected across the globe for potential transfusion to covid- patients. the therapy has been deemed safe and several clinical trials assessing its efficacy are ongoing. while it remains to be formally proven, the presence of neutralizing antibodies is thought to play a positive role in the efficacy of this treatment. indeed, neutralizing titers of ≥ : have been recommended in some convalescent plasma trials for inclusion. here we performed repeated analyses at a one-month interval on convalescent individuals to evaluate how the humoral responses against the sars-cov- spike, including neutralization, evolve over time. we observed that receptor-binding domain (rbd)-specific igg slightly decreased between six and ten weeks after symptoms onset but rbd-specific igm decreased much more abruptly. similarly, we observed a significant decrease in the capacity of convalescent plasma to neutralize pseudoparticles bearing sars-cov- s wild-type or its d g variant. if neutralization activity proves to be an important factor in the clinical efficacy of convalescent plasma transfer, our results suggest that plasma from convalescent donors should be recovered rapidly after symptoms resolution. until an efficient vaccine to protect from sars-cov- infection is available, alternative approaches to treat or prevent acute covid- are urgently needed. a promising approach is the use of convalescent plasma containing anti-sars-cov- antibodies collected from donors who have recovered from covid- ( ). convalescent plasma therapy was successfully used in the treatment of sars, mers and influenza h n pandemics and was associated with improvement of clinical outcomes ( - ).
experience to date shows that the passive transfer of convalescent plasma to acute covid- patients is well tolerated and has presented some hopeful signs ( - ). in one study, the convalescent plasma used had high titers of igg to sars-cov- (at least : ), which correlated positively with neutralizing activity ( ). while it remains to be formally demonstrated, neutralizing activity is considered an important determinant of convalescent plasma efficacy ( ) and regulatory agencies have been recommending specific thresholds for qualifying convalescent plasma prior to its release. while neutralizing function has been associated with protection against reinfection in rhesus macaques expressing vsv-g. neutralizing activity against pseudoparticles bearing the sars-cov s glycoprotein was detected in only % of convalescent plasma and exhibited low potency, as previously reported (figure ) ( ). of note, while we observed enhanced infectivity for the d g variant compared to its wt sars-cov- s counterpart (figure s a), no major differences in neutralization with convalescent plasma were detected at either time-point (figure s b), thus suggesting that the d g change does not affect the overall conformation of the spike, in agreement with recent findings ( ). the capacity to neutralize sars-cov- s wt or d g-pseudotyped particles significantly correlated with the presence of rbd-specific igg, igm and anti-s antibodies (figure s ). interestingly, we observed a pronounced decrease ( - %) in the percentage of patients able to neutralize pseudoparticles bearing sars-cov- s glycoprotein between and weeks after symptoms onset. moreover, with plasma that still neutralized, the neutralization activity significantly decreased between these two time-points (figure c). interestingly, rbd-specific igm and neutralizing activity declined more significantly over time in convalescent plasma than rbd-specific igg and anti-s abs (figure s a, b).
moreover, while the loss of neutralizing activity on the wt and d g pseudoparticles over time correlated with the loss of anti-rbd igm and igg antibodies, the correlation was higher for igm than igg ( figure s c, d), suggesting that at least part of the neutralizing activity could be mediated by igm, as recently proposed ( , ) . in summary, our study indicates that plasma neutralization activity keeps decreasing past the sixth week of symptom onset ( ). it is currently unknown whether neutralizing activity is truly driving the efficacy of convalescent plasma in acute figure s . d g mutation enhances sars-cov- infectivity but does not affect its susceptibility to plasma neutralization. (a) reverse transcriptase normalized levels of pseudoviral particles bearing the sars-cov- s wt or d g variant were used to infect t/ace cells and infectivity measured h later by luciferase activity. graph shown represents the percentage of infectivity relative to pseudoviral particle bearing the sars-cov- s wt. statistical significance was tested using mann-whitney u tests (**** p < . ). (b) comparison between the neutralization id from pseudoparticles bearing sars-cov- s wt and sars-cov- s d g. statistical significance was tested using wilcoxon matched-pairs signed rank test. (ns, not significant). sars-cov- cell entry depends on ace and tmprss and is human coronavirus nl employs the severe acute respiratory syndrome coronavirus receptor for cellular entry cross-sectional evaluation of humoral responses against sars-cov- spike the membrane-proximal key: cord- -yffwd dc authors: douangamath, alice; fearon, daren; gehrtz, paul; krojer, tobias; lukacik, petra; owen, c. david; resnick, efrat; strain-damerell, claire; aimon, anthony; Ábrányi-balogh, péter; brandaõ-neto, josé; carbery, anna; davison, gemma; dias, alexandre; downes, thomas d; dunnett, louise; fairhead, michael; firth, james d.; jones, s. 
paul; keely, aaron; keserü, györgy m.; klein, hanna f; martin, mathew p.; noble, martin e. m.; o’brien, peter; powell, ailsa; reddi, rambabu; skyner, rachael; snee, matthew; waring, michael j.; wild, conor; london, nir; von delft, frank; walsh, martin a. title: crystallographic and electrophilic fragment screening of the sars-cov- main protease date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yffwd dc covid- , caused by sars-cov- , lacks effective therapeutics. additionally, no antiviral drugs or vaccines were developed against the closely related coronavirus, sars-cov- or mers-cov, despite previous zoonotic outbreaks. to identify starting points for such therapeutics, we performed a large-scale screen of electrophile and non-covalent fragments through a combined mass spectrometry and x-ray approach against the sars-cov- main protease, one of two cysteine viral proteases essential for viral replication. our crystallographic screen identified hits that span the entire active site, as well as hits at the dimer interface. these structures reveal routes to rapidly develop more potent inhibitors through merging of covalent and non-covalent fragment hits; one series of low-reactivity, tractable covalent fragments was progressed to discover improved binders. these combined hits offer unprecedented structural and reactivity information for on-going structure-based drug design against sars-cov- main protease. through ribosome frame-shifting, generates two polyproteins pp a and pp ab (bredenbeek et al., ) . these polyproteins produce most of the proteins of the replicase-transcriptase complex (thiel et al., ) . the polyproteins are processed by two viral cysteine proteases: a papain-like protease (pl pro ) which cleaves three sites, releasing non-structural proteins nsp - and a c-like protease, also referred to as the main protease (m pro ), that cleaves at sites to release non-structural proteins (nsp - ). 
these non-structural proteins form the replicase complex responsible for replication and transcription of the viral genome and have led to m pro and pl pro being the primary targets for antiviral drug development (hilgenfeld, ) . structural studies have played a key role in drug development and were quickly applied during the first coronavirus outbreak. early work by the hilgenfeld group facilitated targeting the m pro of coronaviruses (hilgenfeld, ) , spectrum antivirals. the most successful have been peptidomimetic α-ketoamide inhibitors (zhang et al., a) , which have been used to derive a potent α-ketoamide inhibitor that may lead to a successful antiviral (zhang et al., b) . to date, no drugs targeting sars-cov- have been verified by clinical trials and treatments are limited to those targeting disease symptoms. to contribute to future therapeutic possibilities, we approached the sars-cov- m pro as a target for high throughput drug discovery using a fragment-based approach (thomas et al., ) . we screened against over unique fragments leading to the identification of high value fragment hits, including non-covalent and covalent hits in the active site, and hits at the vital dimerization interface. here, these data are detailed along with potential ways forward for rapid follow-up design of improved, more potent, compounds. we report the apo structure of sars-cov- m pro with data to . Å. the construct we crystallised has native residues at both n- and c-termini, without cloning truncations or appendages which could otherwise interfere with fragment binding. electron density is present for all residues, including alternate conformations, many of which were absent in previous lower resolution crystal structures. the protein crystallised with a single protein polypeptide in the asymmetric unit, and the catalytic dimer provided by a symmetry-related molecule. the structure aligns closely with the m pro structures from sars-cov- and mers (rmsd of . Å and . 
Å respectively). the active site is sandwiched between two β-barrel domains, i (residue - ) and ii (residue - ) ( figure a ). domain iii (residue - ), forms a bundle of alpha helices and is proposed to regulate dimerization (shi and song, ) . the c-terminal residues, cys -gln , wrap against domain ii. however, the c terminal displays a degree of flexibility and wraps around domain iii in the n inhibitor complex (shi and song, ) (pdb id lu ). his and cys comprise the catalytic dyad and dimerisation completes the active site by bringing ser of the second dimer protomer into proximity with glu ( figure b ). this aids formation of the substrate specificity pocket and the oxyanion hole (hilgenfeld, between hits. screening at more stringent conditions ( µm per electrophile; . hours; °c) resulted in . % of the library labelling above % of protein (table s a) . these hits revealed common motifs, and we focused on compounds which offer promising starting points. compounds containing n-chloroacetyl-n'-sulfonamido-piperazine or n--chloroacetylaniline motifs were frequent hitters. such compounds can be highly reactive. therefore, we chose series members with relatively low reactivity for follow up crystallization attempts. for another series of hit compounds, containing a n-chloroacetyl piperidinyl- -carboxamide motif (table s ) which displays lower reactivity and were not frequent hitters in previous screens, we attempted crystallization despite their absence of labelling in the stringent conditions. while mild electrophilic fragments are ideal for probing the binding properties around the active site cysteine, their small size prevents extensive exploration of the substrate binding pocket. we performed an additional crystallographic fragment screen to exhaustively probe the m pro active site, and to find opportunities for fragment merging or growing. the electrophile fragment hits were added to crystals along with a total of unique fragments from libraries (table s ) . 
non-covalent fragments were soaked (collins et al., ) , whereas electrophile fragments were both soaked and co-crystallized as previously described (resnick et al., ) , to ensure that as many of the mass spectrometry hits as possible were structurally observed. a total of soaking and co-crystallization experiments resulted in mounted crystals. while some fragments either destroyed the crystals or their diffraction, datasets with a resolution better than . Å were collected. the best crystals diffracted to better than . Å, but diffraction to . Å was more typical, and no datasets worse than . Å were included in analysis ( figure s ). we identified fragment hits using the pandda method (pearce et al., ) , all of which were deposited in the protein data bank (table s ). active-site fragments: eight fragments were identified that bind in the s subsite and frequently form interactions with the side chains of the key residues his , through a pyridine ring or similar nitrogen-containing heterocycle, and glu through a carbonyl group in an amide or urea moiety ( figure ). several also reach across into the s subsite. we termed this location the "aromatic wheel" because of a consistent motif of an aromatic ring forming hydrophobic interactions with met or π-π stacking with his , with groups variously placed in axial directions. particularly notable is the vector into the small pocket between his , met and asp , exploited by three of the fragments (z (x ), z (x ) and z (x )) with fluoro and cyano substituents ( figure ). of the four fragments exploring subsite s , three contain an aromatic ring with a sulfonamide group forming hydrogen bonds with gln and pointing out of the active site towards the solvent interface ( figure ). these hits have expansion vectors suitable for exploiting the same his /met /asp pocket mentioned above. 
the experiment revealed one notable conformational variation, which was exploited by one fragment only (z (x ); figure ): a change in the sidechains of the key catalytic residues his , cys alters the size and shape of subsite s ʹ and thus the link to subsite s . this allows the fragment to bind, uniquely, to both s and s ʹ. in s , the isoxazole nitrogen hydrogen-bonds to his , an interaction that features in several other hits; and in s ʹ, the cyclopropyl group occupies the region sampled by the covalent fragments. notably, the n- methyl group offers a vector to access the s and s subsites. thus, compounds that interfere with dimerization might serve as quasi-allosteric inhibitors of protease activity. in this study three compounds bound at the dimer interface. fragment z (x ; figure a ) binds in a hydrophobic pocket formed by the sidechains of met , phe , arg and val . it also mediates two hydrogen bonds to the sidechain of gln and the backbone of met . its binding site is less than Å away from ser , whose mutation to alanine in sars-cov- protease reduced both dimerization and protease activity by about % ( ser . finally, pob (x ; york d library; figure c ), binds only Å from gly at the dimer interface and is encased between lys and val of one protomer and gly , arg , phe , lys and leu of the second, including two hydrogen bonds with the backbone of phe . heterobenzyl-piperazine motif crystallized in one binding mode with respect to the piperazinyl moiety ( figure c ) (with one exception, pcm- (x )). two structures (pcm- (x ), pcm- (x )) with a -halothiophen- -ylmethylene moiety exploit lipophilic parts of s , which is also recapitulated by the thiophenyl moiety in an analogous carboxamide (pcm- (x )). the other five structures point mainly to s , offering an accessible growth vector towards the nearby s pocket. a series of compounds containing a n-chloroacetyl piperidinyl- -carboxamide motif showed promising binding modes. 
to follow up on these compounds we performed a rapid second-generation compound synthesis. derivatives of this chemotype were accessible in mg-scale by reaction of n-chloroacetyl piperidine- -carbonyl chloride with various in-house amines, preferably carrying a chromophore to ease purification. these new compounds were tested by intact protein mass-spectrometry to assess protein labelling ( um compound; . h incubation, rt; table s b ). amides derived from non-polar amines mostly outcompeted their polar counterparts, hinting at a targetable lipophilic sub-region in this direction. the two amides with the highest labelling pg-cov- and pg-cov- (figure g,h) highlight the potential for further synthetic derivatization by amide n-alkylation or cross-coupling, respectively. the screen revealed unexpected covalent warheads from the series of -bromoprop- -yn- two peplites, containing threonine (ncl- (x )) and asparagine (ncl (x )) bound covalently to the active site cysteine (cys ), forming a thioenolether via c- addition with loss of bromine ( figure e ,f). the covalent linkage was unexpected and evidently the result of significant non-covalent interactions, specific to these two peplites, that position the electrophile group for nucleophilic attack. we note the side-chains make hydrogen-bonding interactions with various backbone nh and o atoms of thr and thr ; in the case of threonine, it was the minor r, r diastereomer (corresponding to d-allothreonine) that bound. the only other peplite observed (tyrosine, ncl- (x )) bound non-covalently to a different subsite. the highlighted structure-activity relationships are important for further optimisation. bromoalkynes have intrinsic thiol reactivity that is lower than that of established acrylamide- proteins. the data presented herein provides many clear routes to developing potent inhibitors against sars-cov- . 
the bound fragments comprehensively sample all subsites of the active site revealing diverse expansion vectors, and the electrophiles provide extensive, systematic as well as serendipitous, data for designing covalent compounds. it is widely accepted that new small molecule drugs cannot be developed fast enough to help against covid- . nevertheless, as the pandemic threatens to remain a long-term problem and vaccine candidates do not promise complete and lasting protection, antiviral molecules will remain an important line of defence. such compounds will also be needed to fight future pandemics (hilgenfeld, ) . our data will accelerate such efforts: therapeutically, through design of new molecules and to inform ongoing efforts at repurposing existing drugs; and for research, through development of probe molecules (arrowsmith et al., ) to understand viral biology. one example is the observation that fragment z (x ) is a close analogue of melatonin, although in this case, it is unlikely that melatonin mediates direct antiviral activity through inhibition of m pro , given its low molecular weight; nevertheless, melatonin is currently in clinical trials to assess its immune-regulatory effects on covid (clinicaltrials.gov identifier nct ). in line with the urgency, results were made available online immediately for download. additionally, since exploring d data requires specialised tools ( figure . these can be expected to result in potent m pro binders and compound synthesis is ongoing. collectively, the covalent hits provide rational routes to inhibitors of low reactivity and high selectivity. rationally designed covalent drugs are gaining traction, with many recent fda approvals (singh et al., , bauer, . their design is based on very potent reversible binding, that allows precise orientation of a low reactivity electrophile, so that formation of the covalent bond is reliant on binding site specificity, with minimal off-targets. 
( protease and other impurities were removed from the cleaved target protein by reverse nickel-nta. the relevant fractions were concentrated and applied to an s / gel filtration column equilibrated in mm hepes ph . , mm nacl buffer. the protein was concentrated to mg/ml using a kda mwco centrifugal filter device. crystallisation and structure determination: protein was thawed and diluted to mg/ml using mm hepes ph . , mm nacl. the sample was centrifuged at g for minutes. initial hits were found in well f of the proplex crystallisation screen, . m licl, . m tris ph , % peg k. these crystals were used to prepare a seed stock by crushing the proteins with a pipette tip, suspending in reservoir solution and vortexing for s in the reservoir solution with approximately glass beads ( . mm diameter, biospec products). adding dmso to the protein solution to a concentration of % and performing microseed matrix screening, many new crystallisation hits were discovered in commercial crystallisation screens. following optimisation, the final crystallisation condition was % peg k, % dmso, . m mes ph . with a seed stock dilution of / . the seeds were prepared from crystals grown in the final crystallisation condition. the drop ratios were . µl protein, . µl reservoir solution, . µl seed stock. crystals were grown using the sitting drop vapor diffusion method at °c and appeared within hours. initial diffraction data was collected on beamline i at diamond light source on a crystal grown in . m mes ph . , % peg k, cryoprotected using % peg . data were processed using dials ( light source on crystals grown using the . m mes ph . , % peg k, % dmso condition. to create a high-resolution dataset, datasets from crystals were scaled and merged using aimless (evans and murshudov, ) . waters acuity uplc class h instrument, in positive ion mode using electrospray ionization. uplc separation used a c column ( Å, . μm, mm × mm). the column was held at °c and the autosampler at °c. 
mobile solution a was . % formic acid in water, and mobile phase b was . % formic acid in acetonitrile. the run flow was . ml/min with gradient % b for min, increasing linearly to % b for min, holding at % b for . min, changing to % b in . min, and holding at % for min. the mass data were collected on a waters sqd detector with an m/z range of − . at a range of − m/z. raw data were processed using openlynx and deconvoluted using maxent. labelling assignment was performed as previously described (resnick et al., ) . fragment screening: fragments were soaked into crystals as previously described (collins et al. libraries.html). electrophile fragments identified by mass spectrometry were soaked by the same procedure as the other libraries, but in addition, they were also co-crystallised in the same crystallisation condition as for the apo structure. the protein was incubated with to - fold excess compound (molar ratio) for approximately h prior to the addition of the seeds and reservoir solution (following resnick et al (resnick et al., ) ). data were collected at the beamline i - at k and processed to a resolution of approximately . Å using xds (kabsch, ) the corresponding amides were prepared by addition of the acid chloride ( equiv.) as a dcm solution to the pertinent amines ( equiv.) in presence of pyridine ( equiv.) in dcm. heterogeneous reaction mixtures were treated with a minimal amount of dry dmf to achieve full solubility. after stirring the reaction mixtures overnight, the solvents were removed in by rotary evaporation, re-dissolved in % aq. mecn (and a minimal amount of dmso to achieve higher solubility), followed by purification by (semi-)preparative rp-hplc in mass-directed automatic mode or manually. synthesis of peplites hatu ( . eq.), dipea ( . eq.) and the acid starting material ( . eq.) were dissolved in dmf ( - ml) and stirred together at room temperature for min. 
- bromoprop- -yn- -amine hydrochloride was added and the reaction mixture was stirred at °c overnight. the reaction mixture was allowed to cool to room temperature, diluted with etoac or dcm and washed with saturated aqueous sodium bicarbonate solution, brine and water. the organic layer was dried over mgso , filtered and evaporated to afford crude product. the crude product was then purified by either normal or reverse phase chromatography. ( -bromoprop- -yn- -yl) n-( -bromoprop- -yn- -yl)- -(tert-butoxy) butanamide ( s, s)- -acetamido-n-( -bromoprop- -yn- -yl)- -(tert-butoxy)butanamide the reaction mixture was allowed to warm to room temperature and was stirred at room temperature for h then evaporated to dryness to afford crude product. the crude product was purified by flash silica chromatography, elution gradient - % meoh in dcm. pure fractions were evaporated to dryness to afford ( s, s)- -acetamido-n-( -bromoprop- -yn- (s)- -acetamido-n -( -bromoprop- -yn- -yl)succinimide (asparagine peplite) (s)- -acetamido-n -( -bromoprop- -yn- -yl)succinamide was synthesized according to general procedure a using (s)- -acetamido- -amino- -oxobutanoic acid ( mg, . mmol) and evaporating the reaction mixture to afford the crude product without aqueous work-up. the crude product was purified by flash silica chromatography, elution gradients - % meoh in dcm. pure fractions were evaporated to dryness to afford (s)- -acetamido-n - ( - bromoprop- -yn- -yl)succinamide ( mg, %) as a white solid. 
coronavirus main proteinase ( cl(pro)) structure: basis for design of anti-sars drugs the promise and peril of chemical probes new substructure filters for removal of pan assay interference compounds (pains) from screening libraries and for their exclusion in bioassays seven year itch: pan-assay interference compounds (pains) in -utility and limitations targeted covalent inhibitors for drug design long-range cooperative interactions modulate dimerization in sars cl(pro) covalent inhibitors in drug discovery: from accidental discoveries to avoided liabilities and designed therapies rapid experimental sad phasing and hot-spot identification with halogenated fragments severe respiratory illness caused by a novel coronavirus the primary structure and expression of the nd open reading frame of the polymerase gene of the coronavirus mhv polymerase is expressed by an efficient ribosomal frameshifting mechanism residues on the dimer interface of sars coronavirus c-like protease: dimer stability characterization and enzyme catalytic activity analysis quaternary structure of the severe acute respiratory syndrome (sars) coronavirus main protease gentle, fast and effective crystal soaking by acoustic dispensing a poised fragment library enables rapid synthetic expansion yielding the first reported inhibitors of phip( ), an atypical bromodomain covalent inhibitors design and discovery an interactive web-based dashboard to track covid- in real time design and synthesis of shape diverse -d fragments how good are my data and what is the resolution michelanglo: sculpting protein views on web pages without coding an improved model for fragment- based lead generation at astrazeneca structure-based design,synthesis, and biological evaluation of peptidomimetic sars-cov clpro inhibitors conservation of substrate specificities among coronavirus main proteases from sars to mers: crystallographic studies on coronaviral proteases enable antiviral drug design critical assessment of important 
regions in the subunit association and catalytic action of the severe acute respiratory syndrome coronavirus main protease two adjacent mutations on the dimer interface of sars coronavirus c-like protease cause different conformational changes in crystal structure structure of m(pro) from sars-cov- and discovery of its inhibitors integration, scaling, space-group assignment and post-refinement. acta crystallographica section d-biological crystallography dimple: a difference map pipeline for the rapid screening of crystals on the beamline heterocyclic electrophiles as new mura inhibitors design and characterization of a heterocyclic electrophilic fragment library for the discovery of cysteine-targeted covalent inhibitors the xchemexplorer graphical workflow tool for routine or large-scale protein-ligand structure determination early dynamics of transmission and control of covid- : a mathematical modelling study how significant are unusual protein-ligand interactions? insights from database mining newly discovered coronavirus as the primary cause of severe acute respiratory syndrome interactive jimd articles using the isee concept: turning a new page on structural biology data acedrg: a stereochemical description generator for ligands the alkyne moiety as a latent electrophile in irreversible covalent small molecule inhibitors of cathepsin k refmac for the refinement of macromolecular crystal structures crystallographic screening using ultra-low-molecular-weight ligands to guide drug design a multi-crystal method for extracting obscured crystallographic states from conventionally uninterpretable electron density electrophile-fragment screening the catalysis of the sars c-like protease is under extensive regulation by its extra domain the resurgence of covalent drugs assessment of chemical libraries for their druggability sars -beginning to understand a new virus mechanisms and enzymes involved in sars coronavirus genome expression structure-guided fragment-based drug 
discovery at the synchrotron: screening binding sites and correlations with hotspot mapping a structural view of the inactivation of the sars coronavirus main proteinase by benzotriazole esters data processing and analysis with the autoproc toolbox decision making in xia dials: implementation and evaluation of a new integration package fraglites-minimal, halogenated fragments displaying an efficient approach to druggability assessment and hit generation a new coronavirus associated with human respiratory disease in china structures of two coronavirus main proteases: implications for substrate binding and antiviral drug design design of wide-spectrum inhibitors targeting coronavirus main proteases the crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor isolation of a novel coronavirus from a man with pneumonia in saudi alpha-ketoamides as broad-spectrum inhibitors of coronavirus and enterovirus replication: structure-based design, synthesis, and activity assessment crystal structure of sars-cov- main protease provides a basis for design of improved alpha-ketoamide inhibitors recent advances in selective and irreversible covalent ligand development and validation a novel coronavirus from patients with pneumonia in china key: cord- - pb ag j authors: anand, praveen; puranik, arjun; aravamudan, murali; venkatakrishnan, aj; soundararajan, venky title: sars-cov- selectively mimics a cleavable peptide of human enac in a strategic hijack of host proteolytic machinery date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pb ag j molecular mimicry of host proteins is an evolutionary strategy adopted by viruses to evade immune surveillance and exploit host cell systems. 
we report that sars-cov- has evolved a unique s /s cleavage site (rrarsvas), absent in any previous coronavirus sequenced, that results in mimicry of an identical furin-cleavable peptide on the human epithelial sodium channel α-subunit (enac-α). genetic truncation at this enac-α cleavage site causes aldosterone dysregulation in patients, highlighting the functional importance of the mimicked sars-cov- peptide. single cell rna-seq from studies shows significant overlap between the expression of enac-α and ace , the putative receptor for the virus, in cell types linked to the cardiovascular-renal-pulmonary pathophysiology of covid- . triangulating this cellular fingerprint with amino acid cleavage signatures of human proteases shows the potential for tissue-specific proteolytic degeneracy wired into the sars-cov- lifecycle. we extrapolate that the evolution of sars-cov- into a global coronavirus pandemic may be in part due to its targeted mimicry of human enac and hijack of the associated host proteolytic network. the surface of sars-cov- virions is coated with the spike (s) glycoprotein, whose proteolysis is key to the infection lifecycle. after the initial interaction of the s-protein with the ace receptor , host cell entry is mediated by two key proteolytic steps. the s subunit of the sprotein engages ace , and viral entry into the host cell is facilitated by proteases that catalyze s /s cleavage , at arginine- /serine- (figure a ). this is followed by s ' site cleavage that is required for fusion of viral-host cell membranes . we hypothesized that the virus may mimic host substrates in order to achieve proteolysis. comparing sars-cov- with sars-cov shows that the former has acquired a distinctive sequence insertion at the s /s site (figure a) . the resulting tribasic -mer peptide (rrarsvas) on the sars-cov- s /s site is conserved among , of , circulating strains deposited at gisaid , as of april , (table s ). 
this peptide is also absent in over , non-covid- coronavirus s-proteins from the vipr database . strikingly, examining over million peptides ( -mers) of , canonical human proteins from uniprotkb shows that the peptide of interest (rrarsvas) is present exclusively in human enac-ɑ, also known as scnn a (p-value = e- ) (supplementary methods). the location of this sars-cov- mimicked peptide in the enac-ɑ structure is in the extracellular domain (figure b) . enac regulates na+ and water homeostasis and its expression levels are controlled by aldosterone and the associated renin-angiotensin-aldosterone system (raas) . similar to sars-cov , enac-ɑ needs to be proteolytically activated for its function . furin cleaves the equivalent peptide on mouse enac-ɑ between the arginine and serine residues in the th and th positions respectively (rsar|sass) , , akin to the recent report establishing furin cleavage at the s /s site of sars-cov- (figure b) . furthermore, a frameshift mutation leading to a premature stop codon in serine- at the th position of the enac-ɑ mimicked peptide (rrar|svas) is known to cause the monogenic disease pseudohypoaldosteronism type (pha ) . this emphasizes the functional salience of the -mer peptide being mimicked by sars-cov- . mimicry of human enac-ɑ by the s /s site raises the specter that sars-cov- may be hijacking the protease network of enac-ɑ for viral activation. we asked whether there is an overlap between putative sars-cov- infecting cells and enac-ɑ expressing cells. systematic single cell expression profiling of the ace receptor and enac-ɑ was performed across human and mouse samples comprising ~ . methods, figure c ) . interestingly, enac-ɑ is expressed in the nasal epithelial cells, type ii alveolar cells of the lung, tongue keratinocytes, and colon enterocytes (figure c and figures s -s ) , which are all implicated in covid- pathophysiology , . 
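The exclusivity search described above, in which every overlapping peptide of a protein collection is checked against the query rrarsvas, can be sketched as a k-mer index. This is a minimal illustration and not the authors' pipeline: the function names and the three toy sequences are invented for this example, and the reported p-value is not reproduced here.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Yield every overlapping k-mer of a protein sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_kmer_index(proteins, k):
    """Map each k-mer to the set of protein names that contain it."""
    index = defaultdict(set)
    for name, seq in proteins.items():
        for peptide in kmers(seq.upper(), k):
            index[peptide].add(name)
    return index


# Toy sequences invented for illustration; they are not real UniProtKB entries.
toy_proteins = {
    "toy_channel": "MKTLRRARSVASGGH",
    "toy_kinase": "MSVASRRAKLLPQ",
    "toy_other": "MAAARRARSLLK",
}

query = "RRARSVAS"  # the peptide of interest quoted in the text
index = build_kmer_index(toy_proteins, len(query))
hits = sorted(index.get(query, set()))
print(hits)
```

A query found in exactly one protein of the collection corresponds to the "present exclusively" observation in the text; with a real proteome the same index also allows a count of how many distinct peptides of that length exist in total.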
further, ace and enac-ɑ are known to be expressed generally in the apical membranes of polarized epithelial cells , . the overlap of the cell-types expressing ace and enac-ɑ, and similar spatial distributions at the apical surfaces, suggest that sars-cov- may be leveraging the protease network responsible for enac cleavage. beyond furin that cleaves the s /s site , we were intrigued by the possibility of other host proteases also being exploited by sars-cov- . we created a -dimensional vector space ( amino acids x positions on the peptide) for assessment of cleavage similarities between the human proteases with biochemical validation from the merops database (supplementary methods; < protease similarity metric < ) . this shows that furin (pcsk ) has overall proteolytic similarity to select pcsk family members, specifically pcsk ( . ), pcsk ( . ), pcsk ( . ), pcsk ( . ), and pcsk ( . ) ( table s ) . it is also known that the protease plg cleaves the ɣ-subunit of enac (enac-ɣ) . in order to extrapolate the tissue tropism of sars-cov- from the lens of the host proteolytic network, we assessed the co-expression of these proteases concomitant with the viral receptor ace and enac-ɑ (figure ) . this analysis shows that furin is expressed with ace and enac-ɑ in the colon (immature enterocytes, transit amplifying cells) and pancreas (ductal cells, acinar cells) of human tissues, as well as tongue (keratinocytes) of mouse tissues. pcsk and pcsk are broadly expressed across multiple cell types with ace and enac-ɑ, making them plausible broad-spectrum proteases that may cleave the s /s site. in humans, concomitant with ace and enac-ɑ, pcsk appears to be expressed in cells from the intestines, pancreas, and lungs, whereas pcsk is noted to be co-expressed in the respiratory tract and the pancreas (figure ) . it is worth noting that the extracellular proteases need not necessarily be expressed in the same cells as ace and enac-ɑ. 
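The amino acid by position vector space used above to compare protease cleavage preferences can be illustrated with a small sketch. It assumes that each protease is summarised by the per-position residue frequencies of its known substrates and that similarity is measured with the cosine of the angle between two such vectors; the exact metric is defined in the supplementary methods, so the cosine is an assumption here. The substrate lists, protease labels, and function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cleavage_profile(substrates):
    """Flatten per-position residue frequencies into a single vector.

    All substrates must be aligned peptides of equal length; the result has
    20 * length entries (one block of 20 frequencies per peptide position).
    """
    length = len(substrates[0])
    if any(len(s) != length for s in substrates):
        raise ValueError("substrates must all have the same length")
    vec = [0.0] * (len(AMINO_ACIDS) * length)
    weight = 1.0 / len(substrates)
    for peptide in substrates:
        for pos, residue in enumerate(peptide):
            vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)] += weight
    return vec


def cosine_similarity(u, v):
    """Cosine similarity; lies between 0 and 1 for non-negative vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical substrate sets, chosen only to make the example concrete.
protease_a = cleavage_profile(["RRARSVAS", "RSARSASS"])
protease_b = cleavage_profile(["RRARSVAS", "RKKRSVAG"])
protease_c = cleavage_profile(["GPLGIAGQ", "GPQGLLGA"])

sim_ab = cosine_similarity(protease_a, protease_b)
sim_ac = cosine_similarity(protease_a, protease_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

In this toy setup the two proteases that share basic residues upstream of the scissile bond score much closer to each other than to the unrelated third profile, which is the kind of ranking the similarity table in the text reports for furin and the other pcsk family members.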
among the pcsk family members with the potential to cleave the mimicked -mer peptide, it is intriguing that the same tissue can house multiple proteases and also that multiple tissues share the same set of proteases. our findings emphasize that redundancy may be wired into the mechanisms of host proteolytic activation of sars-cov- , and call for some caution in the ongoing development of selective human protease inhibitors as covid- therapeutics. the mimicry of a cleavable host peptide central to cardiovascular, renal, and pulmonary function provides a new perspective on the evolution of sars-cov- as the first coronavirus pandemic. the cartoon representation of the s-protein homotrimer from sars-cov- is shown (pdb id: vsb). one of the monomers is highlighted in red. the pairwise alignment of the s /s cleavage site required for the activation of sars and sars-cov- is depicted. the amino acid insertion evolved by sars-cov- , along with the abutting cleavage site, is shown in a box. (b) the cartoon representation of human enac protein is depicted (pdb id: bqn; chain in green), highlighting the enac-ɑ chain in green. the alignment on the right captures furin cleavage at the s /s site of sars-cov- , along with its striking molecular mimicry of the identical peptide from human enac-ɑ protein (circled in the cartoon rendering of human enac). the alignment also shows the equivalent -mer peptide of mouse enac-ɑ that is also known to be cleaved by furin. (c) the single cell transcriptomic co-expression of ace , enac-ɑ, and furin is summarized. the heatmap depicts the mean relative expression of each gene across the identified cell populations. the human and mouse single cell rna-seq data are visualized independently. the cell types are ranked based on decreasing expression of ace . the box highlights the ace positive cell types in mouse and human samples. 
the heatmap depicts the relative expression of ace and enac-ɑ along with all the proteases that can potentially cleave the s /s site. the relative expression levels are denoted on a scale of blue (low) to red (high). the rows denote proteases and columns denote cell-types. structure, function, and antigenicity of the sars-cov- spike glycoprotein activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites mechanisms of coronavirus cell entry mediated by the viral spike protein data, disease and diplomacy: gisaid's innovative contribution to global health: data, disease and diplomacy viperdb : an enhanced and web api enabled relational database for structural virology epithelial sodium transport and its control by aldosterone: the story of our internal environment revisited structure of the human epithelial sodium channel by cryo-electron microscopy distinct pools of epithelial sodium channels are expressed at the plasma membrane epithelial sodium channels are activated by furin-dependent proteolysis five novel mutations in the scnn a gene causing autosomal recessive pseudohypoaldosteronism type knowledge synthesis from million biomedical documents augments the deep expression profiling of coronavirus receptors augmented curation of unstructured clinical notes from a massive ehr system reveals specific phenotypic signature of impending covid- diagnosis regulation of the epithelial sodium channel (enac) by membrane trafficking peripheral localization of the epithelial sodium channel in the apical membrane of bronchial epithelial cells the merops database of proteolytic enzymes, their substrates and inhibitors in and a comparison with peptidases in the panther database plasmin activates epithelial na + channels by cleaving the γ subunit the authors thank patrick lenehan, david zemmour, travis hughes, tyler wagner, and mathai mammen for their careful review and feedback. 
the authors are also grateful to ramakrishna chilaka for the software visualization tools, and dhruti patwardhan, saranya marimuthu, jaya jain, dariusz murakowski, and enrique garcia-rivera for their assistance with databases. key: cord- -kejlm authors: liu, xiaoyu; gao, fang; gou, liming; chen, yin; gu, yayun; ao, lei; shen, hongbing; hu, zhibin; guo, xiling; gao, wei title: neutralizing antibodies isolated by a site-directed screening have potent protection on sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: kejlm neutralizing antibody is one of the most effective interventions for acute pathogenic infection. currently, over three million people have been identified for sars-cov- infection but sars-cov- -specific vaccines and neutralizing antibodies are still lacking. sars-cov- infects host cells by interacting with angiotensin converting enzyme- (ace ) via the s receptor-binding domain (rbd) of its surface spike glycoprotein. therefore, blocking the interaction of sars-cov- -rbd and ace by antibody would cause a directly neutralizing effect against virus. in the current study, we selected the ace interface of sars-cov- -rbd as the targeting epitope for neutralizing antibody screening. we performed site-directed screening by phage display and finally obtained one igg antibody ( a ) and several domain antibodies. among them, a and three domain antibodies ( a , d , and a ) were identified to act as neutralizing antibodies due to their capabilities to block the interaction between sars-cov- -rbd and ace -positive cells. the domain antibody a was predicted to have the best accessibility to all three ace -interfaces on the spike homotrimer. pseudovirus and authentic sars-cov- neutralization assays showed that all four antibodies could potently protect host cells from virus infection. 
overall, we isolated multiple formats of sars-cov- -neutralizing antibodies via site-directed antibody screening, which could be promising candidate drugs for the prevention and treatment of covid- . coronavirus disease (covid- ) is a worldwide epidemic of respiratory disease caused by the novel human coronavirus sars-cov- , . currently, over three million infected people have been identified in more than countries and regions by laboratory testing, with an average mortality rate of approximately % (https://covid .who.int/). the real number of infected cases is even higher, considering the limited detection capacity in many countries. therefore, there is an urgent need to develop an effective vaccine and neutralizing antibody against sars-cov- . sars-cov- is a single-stranded positive-sense rna virus of the β-coronaviridae family . it shares % nucleotide sequence identity with sars-cov- . both sars-cov- and sars-cov- infect host cells by directly interacting with the host angiotensin-converting enzyme- (ace ) receptor through their spike glycoprotein expressed on the viral membrane and subsequently trigger the fusion of the cell and virus membranes for cell entry , . spike glycoprotein exists as a homotrimeric complex on the viral membrane of coronaviruses . each spike monomer contains an s subunit and an s subunit . the s subunit binds to ace through its receptor-binding domain (rbd) to initiate cell recognition, whereas the s subunit anchors the spike protein to the viral envelope and responds to s -induced cell recognition to mediate effective membrane fusion via a conformational transition . these established infection mechanisms indicate that blocking the interaction of sars-cov- -rbd and ace would cause a direct neutralizing effect against the virus. neutralizing antibody is one of the most effective interventions for acute pathogenic infection . several approaches have been reported to obtain sars-cov- neutralizing antibodies successfully. 
among them, one approach is to screen the preexisting sars-cov- antibody repertoires by evaluating cross-reactivity . an alternative approach is to clone the neutralizing antibody from the isolated sars-cov- -rbd-specific single b cells from infected patients , . however, the feasibility of these two strategies is quite limited due to the rare chance of accessing either sars-cov- antibodies or sars-cov- -infected patients. therefore, in vitro site-directed screening in a human antibody library would be more feasible and efficient. the rbd is a relatively isolated domain of the s subunit with independent function . the crystal structure of the sars-cov- -rbd and the sars-cov- -rbd/ace complex , has already been reported. it presents quite similar interaction details compared to those of the previously determined sars-cov- -rbd/ace structure . notably, a recent study performed a systematic bioinformatics analysis to predict the potential b cell epitope and t cell epitope of sars-cov- . the only predicted conformational b cell epitope in the rbd is located within the ace interface (p -y ). this information suggests that the ace interface of sars-cov- -rbd might have high immunogenicity, which would be a suitable targeting epitope to develop sars-cov- -specific antibodies with potent neutralizing function by in vitro screening. in the current study, we selected the ace -interface of sars-cov- -rbd as the targeting epitope to screen neutralizing antibody. we performed site-directed antibody screening by phage display and finally obtained one igg antibody and three single domain antibodies with potent neutralizing activities for sars-cov- . these neutralizing antibodies are promising candidate drugs for the prevention and treatment of covid- . plasmids and reagents. the coding sequences of ace and the sars-cov- spike were cloned into the plvx vector to construct stable cell lines. 
the coding sequences of truncated ace (q -s ), sars-cov- -rbd (p -v ), sars-cov- -rbd (p -v ) and sars-cov- -rbd mut were cloned into the pfuse vector to obtain hfc-fusion proteins. sars-cov- -rbd mut was designed by substituting key residues on sars-cov- -rbd with ala or phe to disrupt its interaction with ace (table ) . the coding sequences of domain antibodies were also cloned into the pfuse vector to obtain hfc-fusion proteins. the coding sequences of the a heavy chain variable region and light chain variable region were amplified by adding the il- signal peptide and cloned into the expression vectors pfuse-chig-hg and pfuse -clig-hk (invivogen, san diego, ca), respectively. all plasmids were verified by sequencing. sars-cov- -rbd-his protein was purchased from genscript (genscript, nanjing), and gpc -his protein was purchased from r&d (minneapolis, mn). phage display. the tg clone was picked and cultured overnight in yt medium at °c. sars-cov- -rbd-his or sars-cov- -rbd-hfc proteins in pbs buffer were coated on elisa plates at °c overnight. the phage library (tomlinson i library and domain antibody library) and the coated wells were blocked with pbst with % milk at room temperature for h. the blocked phages were precleaned with negative antigens and then added into the coated wells to incubate for h at room temperature. after washing times with pbst, the sars-cov- -rbd-binding phages were eluted with mm triethylamine (sigma-aldrich, xuhui, shanghai). all eluted phages were collected and used to infect tg cells. after incubating with helper phages, the eluted phages were rescued with a titer of approximately ~ pfu/ml for the next round of screening. elisa. for direct elisa, the indicated antigen ( μg/ml) was coated on an elisa plate at °c overnight. after blocking, biotin-labeled ace , blocked phage or the indicated antibodies were added to the wells and incubated at °c for . h. 
streptavidin-hrp (thermo, pudong new area, shanghai), a rabbit anti-m hrp antibody (for phage) (ge healthcare, milwaukee, wi), or a goat anti-human fcγ hrp antibody (jackson immunoresearch, west grove, pa) was added. tmb and h so were added to detect the od nm value. for capture elisa, an anti-his antibody ( µg/ml) was coated on an elisa plate at °c overnight. after blocking, soluble antibodies extracted from the periplasm of tg were added into the wells and incubated at °c for . h. after washing, sars-cov- -rbd-hfc, sars-cov- -rbd-hfc or sars-cov- -rbd mut-hfc protein ( μg/ml) was added and incubated at °c for . h. after washing, a goat anti-human fcγ hrp antibody (jackson immunoresearch, west grove, pa) was added and incubated at °c for . h. tmb and h so were added to detect the od nm value. were performed under the approved standard operating procedures of biosafety level laboratory. live sars-cov- was isolated from throat swabs of sars-cov- infected patients in jiangsu province and identified by sequencing (strain beta cov-js ). viruses were amplified in vero e cells and prepared as working stocks at pfu/ml. for the neutralization assay, vero e cells were seeded into -well plates at /well and cultured overnight. sars-cov- ( tcid ) was pre-incubated with a series of diluted antibodies at °c for h. then, the virus-antibody mixtures were added to seeded vero e cells. cytopathic effects (cpes) were photographed days later. statistical analysis. all group data are expressed as the mean ± standard deviation (sd) of a representative experiment performed at least in triplicate, and similar results were obtained in at least three independent experiments. all statistical analyses were conducted using graphpad prism . . a two-tailed student's t-test of the means was used for statistical analysis, with *p < . defined as significant. to screen potent neutralizing antibodies against sars-cov- , we first analyzed the ace interface of sars-cov- -rbd ( figure a ). 
we selected sixteen residues essential for the hydrophobic or electrostatic effects within the ace interface of sars-cov- and made mutations ( table ) . the predicted structure model of sars-cov- -rbd mut showed that the overall conformation of the rbd did not change after mutation ( figure b ), but the surface property of the ace interface had been changed ( figure c ). we then fused sars-cov- -rbd, sars-cov- -rbd mut and sars-cov- -rbd with a human fc tag and performed purification ( figure d ). the ace binding activity of purified sars-cov- -rbd mut significantly decreased compared to that of sars-cov- -rbd ( figure e ), and a similar trend was also observed when we detected the binding of sars-cov- -rbd and sars-cov- -rbd mut on ace -cho cells ( figure f and figure g ). these results indicated that these mutations successfully abolished ace recognition by destroying the ace interface of sars-cov- -rbd. the purified sars-cov- -rbd mut would therefore be suitable to function as the negative antigen in our screening. to obtain sars-cov- -specific neutralizing antibodies, we performed site-directed antibody screening by phage display. we utilized sars-cov- -rbd-his and sars-cov- -rbd-hfc as the positive antigens and gpc -his and sars-cov- -rbd mut-hfc as the negative antigens to execute the selection within a naive human scfv antibody phage library and a domain antibody phage library, respectively ( figure a ). after four rounds of screening, the antigen-binding activity of the eluted phage dramatically increased ( figure b) . notably, the eluted phage exhibited a stronger binding signal on sars-cov- -rbd compared to that on sars-cov- -rbd mut, especially those from the domain antibody library ( figure c ), indicating an expected precleaning effect during selection. we then randomly picked single clones from the th round eluted phage and performed monoclonal phage elisa. the positive binders were enriched significantly in both libraries ( figure d ). 
all the positive binders were sequenced, and finally, we obtained nine enriched clones from the domain antibody library and one enriched clone from the scfv antibody library. among them, nine domain antibodies bound to sars-cov- -rbd specifically, whereas one scfv antibody ( a ) showed weak binding activity on sars-cov- -rbd and sars-cov- -rbd mut in the phage elisa ( figure e ). the nine binders isolated from the domain antibody library contained only the antibody heavy chain variable region. we then fused them with a human fc tag and performed purification. the a scfv binder was converted into a human igg and purified as well ( figure a ). among all the purified antibodies, a , d , a , c and a were selected for further evaluation due to their strong binding activities to both sars-cov- -rbd ( figure b ) and sars-cov- spike-overexpressing cells ( figure c ), in addition to their promising expression yield and purity. to examine the potential neutralizing capabilities of our candidate antibodies, we detected whether they would disturb the binding between sars-cov- -rbd and ace -positive cells. three domain antibodies ( a , d , a ) and a igg exhibited obvious inhibition in a dose-dependent manner, whereas the sars-cov- -neutralizing antibody m seemed to have no effect ( figure a and c). considering the high similarity of the rbds, we also evaluated the blocking effects of our antibodies on the binding between sars-cov- -rbd and ace -cho cells. none of our candidate antibodies showed inhibitory effects ( figure b and d) . notably, we suspected that clone a , which showed weak cross-reaction with sars-cov- -rbd in the phage elisa ( figure e ), might exhibit a certain blocking effect for the cell binding of sars-cov- -rbd, but actually, it showed specific blocking on only the cell binding of sars-cov- -rbd. we then performed spr to further evaluate the affinities of these candidate antibodies. the results showed that the affinities of our antibodies ranged from . 
nm to . nm (table and figure ). altogether, these results indicated that we obtained four antibodies that might have potential neutralizing functions against sars-cov- . domain antibody a was predicted to have advantage for accessing all three according to the recently reported cryo-em structure, the sars-cov- spike trimer appears in two distinct conformational states: a closed state with the three rbds embedded and an open state with only one rbd extended for ace binding. since the extended rbd presents a completely exposed ace interface that would be more easily captured by an antibody than the closed rbd, we simply wondered whether the remaining embedded ace interfaces of both the open and closed spike trimers could be accessed by an antibody. because the reported cryo-em structure of the sars-cov- spike trimer lacks some rbd residue information, we replaced its rbd with the determined structure of sars-cov- -rbd to establish the remodeled sars-cov- spike trimer. we then analyzed the interspace between each embedded ace interface and its neighboring monomer. the three interspaces in the closed trimer were quite uniform (from . Å to . Å). for the two interspaces of the open trimer, one did not change ( . Å), whereas the other was partly occupied by the extended rbd ( . Å) ( figure a ). the domain antibody exhibited a relatively smaller size and antigen interface because it contains only a heavy chain variable region with three complementarity-determining regions (cdrs) instead of six in igg ( figure b ). we then performed molecular docking to compare the binding patterns of a scfv and the domain antibodies on our remodeled spike trimers. both a and the domain antibodies were predicted to recognize all three ace interfaces of the closed trimer ( figure c ). however, when docked with the open trimer, a scfv was predicted to target only two ace interfaces. it could not access the embedded ace interface occupied by the extended rbd, probably due to its larger size. 
surprisingly, one of the domain antibodies, a , was predicted to block all three ace interfaces ( figure d ). these predictions suggested that the domain antibody might have the advantage of blocking all the ace interfaces of both closed and open trimers, which might offer more effective protection against sars-cov- . nevertheless, detailed structural examination of the binding patterns of these antibodies with sars-cov- -rbd and sars-cov- spike trimers is extremely important in our future investigations. to evaluate the potential antiviral activities of our antibodies, we prepared sars-cov- pseudovirus by replacing the coding sequence of vsv glycoprotein with sars-cov- spike glycoprotein in a lentivirus packaging system ( figure a ). we preincubated our candidate antibodies with sars-cov- pseudovirus and then added them to cultured ace -cho cells. the domain antibodies a , a , and d and a igg exhibited obvious neutralizing potencies, with ic values from . µg/ml to . µg/ml ( figure b ). these results were consistent with the blocking pattern observed in the sars-cov- -rbd and ace -cho cell-binding assays ( figure a and c). although c did not show a blocking effect in our cell-binding assay, it still showed a mild neutralizing capability ( figure b ). based on our pseudovirus experiment, we selected a igg, a and a to evaluate their neutralizing activities on authentic sars-cov- . the observations of cpe showed that a , a and a exhibited complete protection at . µg/ml, µg/ml and . µg/ml, respectively, when we performed a -day exposure to sars-cov- ( tcid ) ( figure c and d) . these protective effects were still quite stable when the exposure time was extended to to days (table ) . overall, we isolated several human monoclonal neutralizing antibodies against sars-cov- by site-directed screening strategy, which could be promising candidate drugs for the prevention and treatment of covid- . 
currently, several studies have obtained sars-cov- antibodies with neutralizing activity by phage display, including full-length antibodies and domain antibodies , . instead of using a mutant sars-cov- -rbd with a disrupted ace binding motif, one of these studies used a captured ace /sars-cov- -rbd complex to perform the pre-absorption and also obtained neutralizing antibodies . the rbds mediate the ace interaction for both sars-cov- and sars-cov- . many investigations utilize the rbd as the target region to produce neutralizing antibodies and vaccines - . it has been reported that immunizing rodents with sars-cov- -rbd elicits a robust neutralizing antibody response without antibody-dependent enhancement (ade) . according to the determined structure, the ace interface of the rbd represents only a small portion of the whole domain , and only those antibodies binding to the interface would directly interfere with the interaction with ace . this would be one possible explanation for why not all the antibodies isolated with rbd binding activity showed neutralizing activity. based on this information, we performed site-directed screening with the positive antigen sars-cov- -rbd and negative antigen sars-cov- -rbd mut to ensure that we obtained antibodies accurately targeting the ace interface. as we predicted, most of our candidate antibodies exhibited a significant blocking effect for ace recognition on cells ( figure ). although the antibody c did not, it did show a relatively mild virus-neutralizing function compared to that of the other candidates (figure ) . these functional evaluations support the feasibility of our site-directed screening strategy, but further validations, such as epitope determination by cryo-em, are still necessary. the small size of the domain antibody confers several unique advantages, including high expression yield, enhanced tissue penetration, and hidden epitope targeting - . 
phage display is a well-established strategy for in vitro antibody screening . the antibody is amplified in the prokaryotic system quickly and efficiently . we initially obtained ten enriched binders that all showed strong antigen-binding activity. however, when expressed in t cells, three of them lost antigen binding, and two of them exhibited poor yield. this effect is a common problem for phage display when shifting to a eukaryotic expression system. other in vitro screening strategies, such as yeast display or mammalian cell display , may offer more stable selection due to their eukaryotic background. antibodies selected from naive phage libraries usually require further optimization to improve their affinity. even though the affinities of our antibodies are in a range ( - m) similar to that of antibodies screened from sars-cov- -infected patients , we still need to perform affinity maturation to achieve more competitive affinity and neutralizing function against sars-cov- . on the other hand, collecting the blood of infected donors to construct a sars-cov- -specific phage library for antibody screening would be an alternative strategy. in the current study, we established site-directed phage display screening, a feasible and efficient in vitro assay, to obtain neutralizing antibodies. in addition to screening neutralizing antibodies against viruses, this strategy could be widely used for isolating function-blocking antibodies targeting the essential domains of various antigens. 
potent human neutralizing antibodies elicited by sars-cov- infection human monoclonal antibodies block the binding of sars-cov- spike protein to angiotensin converting enzyme receptor a -amino acid fragment of the sars coronavirus s protein efficiently binds angiotensin-converting enzyme cryo-em structure of the -ncov spike in the prefusion conformation structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structural basis for the recognition of sars-cov- by full-length human ace structure of sars coronavirus spike receptorbinding domain complexed with receptor a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- automated structure refinement of macromolecular assemblies from cryo-em maps using rosetta a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov mammalian cell display for the discovery and optimization of antibody therapeutics we thank our colleagues dr. yujie sun and dr. ningning wang for providing cell lines; the authors declare that they have no conflicts of interest with the contents of this key: cord- -mhviub e authors: le, brian l.; andreoletti, gaia; oskotsky, tomiko; vallejo-gracia, albert; rosales, romel; yu, katharine; kosti, idit; leon, kristoffer e.; bunis, daniel g.; li, christine; kumar, g. renuka; white, kris m.; garcía-sastre, adolfo; ott, melanie; sirota, marina title: transcriptomics-based drug repositioning pipeline identifies therapeutic candidates for covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: mhviub e the novel sars-cov- virus emerged in december and has few effective treatments. we applied a computational drug repositioning pipeline to sars-cov- differential gene expression signatures derived from publicly available data. we utilized three independent published studies to acquire or generate lists of differentially expressed genes between control and sars-cov- -infected samples. 
using a rank-based pattern matching strategy based on the kolmogorov-smirnov statistic, the signatures were queried against drug profiles from connectivity map (cmap). we validated sixteen of our top predicted hits in live sars-cov- antiviral assays in either calu- or t-ace cells. validation experiments in human cell lines showed that of the compounds tested to date (including clofazimine, haloperidol and others) had measurable antiviral activity against sars-cov- . these initial results are encouraging as we continue to work towards a further analysis of these predicted drugs as potential therapeutics for the treatment of covid- . sars-cov- has already claimed at least a million lives, has been detected in at least million people, and has likely infected at least another million. the spectrum of disease caused by the virus can be broad ranging from silent infection to lethal disease, with an estimated infection-fatality ratio around % . sars-cov- infection has been shown to affect many organs of the body in addition to the lungs . three epidemiological factors increase the risk of disease severity: increasing age, decade-by-decade, after the age of years; being male; and various underlying medical conditions . however, even taking these factors into account, there is immense interindividual clinical variability in each demographic category considered . recently, researchers found that more than % of people who develop severe covid- have misguided antibodies-autoantibodies-that attack the innate immune system. another . % or more of people who develop severe covid- carry specific genetic mutations that impact innate immunity. consequently, both groups lack effective innate immune responses that depend on type interferon, demonstrating a crucial role for type interferon in protecting cells and the body from covid- . 
whether the type interferon has been neutralized by autoantibodies or, because of a faulty gene, is produced in insufficient amounts or induces an inadequate antiviral response, the absence of a type ifn-mediated immune response appears to be a commonality among a subgroup of people who suffer from life-threatening covid- pneumonia . while numerous efforts are underway to identify potential therapies targeting various aspects of the disease, there is a paucity of clinically proven treatments for covid- . there have been efforts to therapeutically target the hyperinflammation associated with severe covid- , as well as to utilize previously identified antiviral medications , . one of these antivirals, remdesivir, an intravenously administered rna-dependent rna polymerase inhibitor, showed positive preliminary results in patients with severe covid- . in october , the fda approved remdesivir for the treatment of covid- . dexamethasone has also been shown to reduce the mortality rate in cases of severe covid- . nevertheless, the lack of treatments and the severity of the current pandemic warrant exploring, from every angle, methods for rapidly identifying preventive and therapeutic strategies. the traditional paradigm of drug discovery is generally regarded as protracted and costly, taking approximately years and over $ billion to develop and bring a novel drug to market . the repositioning of drugs already approved for human use mitigates the costs and risks associated with early stages of drug development, and offers shorter routes to approval for therapeutic indications. successful examples of drug repositioning include the indication of thalidomide for severe erythema nodosum leprosum and retinoic acid for acute promyelocytic leukemia . 
the development and availability of large-scale genomic, transcriptomic, and other molecular profiling technologies and publicly available databases, in combination with the deployment of the network concept of drug targets and the power of phenotypic screening, provide an unprecedented opportunity to advance rational drug design. drug repositioning is being extensively explored for covid- . high-throughput screening pipelines have been implemented in order to quickly test drug candidates as they are identified . in the past, our group has successfully applied a transcriptomics-based computational drug repositioning pipeline to identify novel therapeutic uses for existing drugs . this pipeline leverages transcriptomic data to perform a pattern-matching search between diseases and drugs. the underlying hypothesis is that for a given disease signature consisting of a set of up- and down-regulated genes, if there is a drug profile where those same sets of genes are instead down-regulated and up-regulated, respectively, then that drug could be therapeutic for the disease. this method has shown promising results for a variety of different indications, including inflammatory bowel disease , dermatomyositis , cancer , and preterm birth . in existing work from xing et al. , this pipeline has been used to identify potential drug hits from multiple input disease signatures derived from sars-cov or mers-cov data. the results were aggregated to obtain a consensus ranking, with drugs selected for in vitro testing against sars-cov- in vero e cell lines, with four drugs (bortezomib, dactolisib, alvocidib and methotrexate) showing viral inhibition . however, this pipeline has not yet been applied specifically to sars-cov- infection. a variety of different transcriptomic datasets related to sars-cov- were published in the spring of . in may , blanco-melo et al. 
studied the transcriptomic signature of sars-cov- in a variety of different systems, including human cell lines and a ferret model . by infecting human adenocarcinomic alveolar basal epithelial cells with sars-cov- and comparing to controls, the authors generated a list of differentially expressed genes. they observed two enriched pathways: one composed primarily of type-i interferon-stimulated genes (isgs) involved in the cellular response to viral infection; and a second composed of chemokines, cytokines, and complement proteins involved in the humoral response. after infecting the cell lines, blanco-melo et al. did not detect either ace or tmprss , which are the sars-cov- receptor and sars-cov- protease, respectively . however, supported viral replication was observed, thereby allowing the capture of some of the biological responses to sars-cov- . in may , another study by lamers et al. examined sars-cov- infection in human small intestinal organoids grown from primary gut epithelial stem cells. the organoids were exposed to sars-cov- and grown in various conditions, including wnt-high expansion media. enterocytes were readily infected by the virus, and rna sequencing revealed upregulation of cytokines and genes related to type i and iii interferon responses . a limited amount of transcriptomic data from human samples has also been published. one study detailed the transcriptional signature of bronchoalveolar lavage fluid (of which responding immune cells are often a primary component) of covid- patients compared to controls . despite a limited number of samples, the results were striking enough to reveal inflammatory cytokine profiles in the covid- cases, along with enrichments in the activation of apoptosis and the p signaling pathways. on the drug side, data are available in the form of differential gene expression profiles from testing on human cells. 
publicly-available versions include the connectivity map (cmap) , which contains genome-wide testing on approximately , drugs, wherein the differential profile for a drug was generated by comparing cultured cells treated with the drug to untreated control cultures. here, we applied our existing computational drug repositioning pipeline to identify drug profiles with significantly reversed differential gene expression compared to predicted inhibitors (including one tested in calu- ) were incubated with sars-cov- infected human embryonic kidney t cells overexpressing ace ( t-ace ) with viral replication determined using an immunofluorescence-based assay. in this study, we applied our drug repositioning pipeline to sars-cov- differential gene expression signatures derived from publicly available rna-seq data ( figure ). the transcriptomic data were generated from distinct types of tissues, so rather than aggregating them together, we predicted therapeutics for each signature and then combined the results. we utilized three independent gene expression signatures (labelled "alv", "exp", and "balf"), each of which consisted of lists of differentially expressed genes between sars-cov- samples and their respective controls. the alv signature was generated from human adenocarcinomic alveolar basal epithelial cells by comparing sars-cov- infection to mock-infection conditions . the exp signature originated from a study where organoids, grown from human intestinal cells expanded in wnt-high expansion media, were infected with sars-cov- and then compared to controls . the balf signature was from a contrast of primary human balf samples from two covid- patients versus three controls . each of these signatures was contrasted with drug profiles of differential gene expression from cmap. for each of the input signatures, we applied a significance threshold false discovery rate (fdr) < . . we further applied minimum fold change thresholds in order to identify the driving genes. 
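the thresholding step described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the pipeline's own code: the names `GeneResult` and `filter_signature`, the record layout, and the cutoff values in the example are all hypothetical, because the exact thresholds are not recoverable from the text here.

```python
from typing import NamedTuple, Optional


class GeneResult(NamedTuple):
    gene: str
    log2_fc: float  # log2 fold change, infected vs. control
    fdr: float      # multiple-testing-adjusted p-value


def filter_signature(results, fdr_cutoff: float,
                     min_abs_log2_fc: Optional[float] = None):
    """Keep genes with fdr < fdr_cutoff and, if a fold-change cutoff is
    given, |log2 FC| > min_abs_log2_fc. Returns genes ranked from most
    up-regulated to most down-regulated."""
    kept = []
    for r in results:
        if r.fdr >= fdr_cutoff:
            continue
        if min_abs_log2_fc is not None and abs(r.log2_fc) <= min_abs_log2_fc:
            continue
        kept.append(r)
    return sorted(kept, key=lambda r: r.log2_fc, reverse=True)


# Hypothetical values, for illustration only.
example = [
    GeneResult("gene_a", 2.5, 0.01),
    GeneResult("gene_b", -3.0, 0.001),
    GeneResult("gene_c", 0.4, 0.02),
    GeneResult("gene_d", 4.0, 0.20),
]
strict = filter_signature(example, fdr_cutoff=0.05, min_abs_log2_fc=2.0)
lenient = filter_signature(example, fdr_cutoff=0.05)  # no fold-change cutoff
```

leaving the fold-change cutoff unset mirrors the choice of applying a significance threshold alone when a signature would otherwise retain too few genes for pattern matching.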
the alv signature had only genes, with genes shared with the drug profiles; in order to maintain at least genes for the pattern-matching algorithm to work with, we applied no fold-change threshold. for the exp signature, we applied a |log fc| > cutoff, resulting in genes for the expansion signature ( shared with the drug profiles). for the balf signature, we processed the raw read count data to calculate differential gene expression values. we applied a |log fc| > cutoff, with the balf data yielding , protein-coding genes for the lavage fluid signature ( shared with the drug profiles). the gene lists for each of these signatures can be found in the supplement (tables s , s , s ). we used gsea (gene set enrichment analysis) we used the publicly available single-cell rnaseq dataset gse composed of patients ( healthy, presenting with mild covid- symptoms, and presenting with severe covid- symptoms) to further characterise the expression of dkk ( figure s ). data were re-analyzed following the standard seurat pipeline. from the analyses of the single-cell data, dkk is highly expressed in covid- patients compared to controls, specifically in severe patients and it is expressed by epithelial cells. after analyzing the input sars-cov- signatures, we utilized our repositioning pipeline to identify drugs with reversed profiles from cmap ( figure ). significantly reversed drug profiles were identified for each of the signatures using a permutation approach: hits from the alv signature (table s ), hits from the exp signature (table s ) , and hits from the balf signature (table s ). when visualizing the gene regulation of the input signatures and their respective top drug hits, the overall reversal pattern can be observed ( figure c -e). in total, our analysis identified unique drug hits (table s ) . twenty-five common drug hits were shared by at least two of the signatures (p = . 
), with four consensus drug hits (bacampicillin, clofazimine, haloperidol, valproic acid) across all three signatures (p = . ) (table , figure a ). we further characterized the common hits by examining their interactions with proteins in humans. we used known drug targets from drugbank and predicted additional targets using the similarity ensemble approach (sea) . we visualized the known interactions from drugbank in a network ( figure b ). we also aggregated the list of proteins which were found in drugbank for at least of the common hits (table s ) to confirm the validity of our approach, the inhibitory effects of of our drug hits which significantly reversed multiple sars-cov- profiles were assessed in live antiviral assays. the inhibitory effects of haloperidol, clofazimine, valproic acid, and fluticasone were evaluated in sars-cov- infected calu- cells (human lung epithelial cell line), with remdesivir also tested as a positive control. from these five, remdesivir and haloperidol inhibited viral replication ( figure a ), and the inhibitory effect was also observed by microscopy ( figure b ). additionally, drugs (bacampicillin, ciclopirox, ciclosporin, clofazimine, dicycloverine, fludrocortisone, isoxicam, lansoprazole, metixene, myricetin, pentoxifylline, sirolimus, tretinoin) were assessed in a live sars-cov- antiviral assay. remdesivir was again used as a positive control. this testing involved six serial dilutions of each drug to inhibit the replication of sars-cov- in t-ace cells using an immunofluorescence-based antiviral assay . all antiviral assays were paired with cytotoxicity assays using identical drug concentrations in uninfected human t-ace cells. 
positive control remdesivir and of our predicted drugs (bacampicillin, ciclopirox, ciclosporin, clofazimine, dicycloverine, isoxicam, metixene, pentoxifylline, sirolimus, and tretinoin) showed antiviral efficacy against sars-cov- , reducing viral infection by at least %, that was distinguishable from their cytotoxicity profile when tested in this cell line ( figure ). several inhibitors showed micromolar to sub-micromolar antiviral efficacy, including clofazimine, ciclosporin, ciclopirox, and metixene. these results not only support our predictive methods, but also identify several clinically approved drugs with potential for repurposing for the treatment of covid- . here, we used a transcriptomics-based drug repositioning pipeline to predict therapeutic drug hits for three different input sars-cov- signatures, each of which came from distinct human cell or tissue origins. we found significant overlap of the therapeutic predictions for these signatures. twenty-five of our drug hits reversed at least two of the three input signatures (table ). notably, of the hits from the exp signature were also hits for the balf signature, despite being generated from different types of tissue. the exp signature was generated from intestinal tissue, whereas the balf signature was generated from constituents of the respiratory tract. among the common hits reversing at least two of the signatures were two immunosuppressants (ciclosporin and sirolimus), an anti-inflammatory medication (isoxicam), and two steroids (fludrocortisone and fluticasone). our testing of clofazimine demonstrated sub-micromolar antiviral effects of this drug in sars-cov- infected t-ace and vero e cells (figures and s ). clofazimine is an orally administered antimycobacterial drug used in the treatment of leprosy. by preferentially binding to mycobacterial dna, clofazimine disrupts the cell cycle and eventually kills the bacterium.
in addition to being an antimycobacterial agent, clofazimine also possesses anti-inflammatory properties primarily by inhibiting t lymphocyte activation and proliferation. validation experiments revealed antiviral activity for of drug hits. further clinical investigation into these drug hits as well as potential combination therapies is warranted. we have previously developed and used a transcriptomics-based bioinformatics approach for drug repositioning in various contexts including inflammatory bowel disease, dermatomyositis, and spontaneous preterm birth. for a list of differentially expressed genes, the computational pipeline compares the ranked differential expression of a disease signature with that of a drug profile. a reversal score based on the kolmogorov-smirnov statistic is generated for each disease-drug pair, with the idea that if the drug profile significantly reverses the disease signature, then the drug could be potentially therapeutic for the disease. blanco-melo et al. generated a differential gene expression signature using rnaseq on human adenocarcinomic alveolar basal epithelial cells infected with sars-cov- propagated from vero e cells (gse ). due to the fast-moving nature of the research topic, we opted to use this cell line data in lieu of waiting for substantial patient-level data. this work identified differentially expressed genes (degs) - upregulated and downregulated. we used these genes as the alv signature for our computational pipeline (table s ) (v. . . r package) to call differentially expressed genes (degs). after adjusting for the sequencing platform, the default settings of deseq were used. principal components were generated using the deseq function ( figure s ), and heat maps were generated using the bioconductor package pheatmap (v. . . ) using the rlog-transformed counts ( figure s ). values shown are rlog-transformed and row-normalized. volcano plots were generated using the bioconductor package enhancedvolcano (v. . .
) ( figure s ). retaining only protein-coding genes and applying both a significance threshold and a fold-change cutoff (fdr < . , |log fc| > ), we obtained , genes to be used as the balf signature (table s ). functional enrichment gene-set analysis for gsea (gene set enrichment analysis) was performed using fgsea (v. . . r package) and the input gene lists were ranked by log fold change. the hallmark gene sets used in the gsea were downloaded from the molecular signatures database (msigdb). for go (gene ontology) terms, identification of enriched biological themes was performed using the david database. drug gene expression profiles were sourced from connectivity map (cmap), a publicly-available database of drugs tested on cancer cell lines.

computational gene expression reversal scoring

to compute reversal scores, we used a non-parametric rank-based method similar to the kolmogorov-smirnov test statistic. this analysis was originally suggested by the creators of the cmap database and has since been implemented in a variety of different settings. similar to past works, we applied a pre-filtering step to the cmap profiles to maintain only drug profiles which were significantly correlated with another profile of the same drug. drugs were assigned reversal scores based on their ranked differential gene expression profile relative to the sars-cov- ranked differential gene expression signature. a negative reversal score indicated that the drug had a profile which reversed the sars-cov- signature; that is, up-regulated genes in the sars-cov- signature were down-regulated in the drug profile and vice versa.

single-cell data analysis

quantification files were downloaded from geo gse . an individual seurat object for each sample was generated using seurat v. . while the data had been filtered by x's algorithm, we still needed to ensure the remaining cells were clean and devoid of artifacts.
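returning to the reversal scoring described earlier in this section, the idea can be sketched as below. this follows the kolmogorov-smirnov-style connectivity score published with the original cmap resource; it is an illustrative assumption that the pipeline's score has this form, and the function names `ks_enrichment` and `reversal_score` are hypothetical. the permutation step used to assess significance is not shown.

```python
def ks_enrichment(tag_genes, ranked_genes):
    """Signed KS-like statistic for a gene set within a ranked drug profile.

    ranked_genes is ordered from most up-regulated to most down-regulated.
    Positive values mean the set sits near the top of the ranking,
    negative values mean it sits near the bottom.
    """
    n = len(ranked_genes)
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    ranks = sorted(position[g] for g in tag_genes if g in position)
    t = len(ranks)
    if t == 0:
        return 0.0
    a = max((j + 1) / t - v / n for j, v in enumerate(ranks))
    b = max(v / n - j / t for j, v in enumerate(ranks))
    return a if a > b else -b


def reversal_score(disease_up, disease_down, ranked_genes):
    """Negative when disease-up genes fall at the bottom of the drug
    profile and disease-down genes at the top (i.e. the drug reverses
    the disease signature). Zero when both sets are enriched in the
    same direction."""
    ks_up = ks_enrichment(disease_up, ranked_genes)
    ks_down = ks_enrichment(disease_down, ranked_genes)
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


# Toy drug profile of ten genes, most up-regulated first (hypothetical).
drug_profile = [f"g{i}" for i in range(1, 11)]
score = reversal_score({"g9", "g10"}, {"g1", "g2"}, drug_profile)
```

in this toy case the disease-up genes sit at the bottom of the drug ranking and the disease-down genes at the top, so the score is negative, which is the reversal pattern the text describes.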
we calculated three confounders for the dataset: mitochondrial percentage, ribosomal percentage, and cell cycle state information. for each sample, cells were normalized for genes expressed per cell and per total expression, then multiplied by a scale factor of , and log-transformed. low-quality cells were excluded from our analyses by filtering out cells with greater than , and fewer than genes and cells with a high percentage of mitochondrial and ribosomal genes (greater than % for mitochondrial genes, and % for ribosomal genes). sctransform is a relatively new technique that uses "pearson residuals" (pr) to normalize the data. prs are independent of sequencing depth. we "regress out" the effects of mitochondrial and ribosomal genes, and the cell cycling state of each cell, so they do not dominate the downstream signal used for clustering and differential expression. we then performed a linear dimensional reduction (runpca function). the samples were then merged into one seurat object. data were then re-normalized and dimensionality reduction and significant principal for studies at the gladstone institutes, calu- cells, a human lung epithelial cell line for studies carried out at mount sinai, sars-cov- was propagated in vero e cells and t-ace cells, as previously described. two thousand cells were seeded into -well plates in dmem ( % fbs) and incubated for h at °c, % co .
then, h before infection, the medium was replaced with
key: cord- -ubkqny j authors: ranoa, diana rose e.; holland, robin l.; alnaji, fadi g.; green, kelsie j.; wang, leyi; brooke, christopher b.; burke, martin d.; fan, timothy m.; hergenrother, paul j. title: saliva-based molecular testing for sars-cov- that bypasses rna extraction date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ubkqny j convenient, repeatable, large-scale molecular testing for sars-cov- would be a key weapon to help control the covid- pandemic. unfortunately, standard sars-cov- testing protocols are invasive and rely on numerous items that can be subject to supply chain bottlenecks, and as such are not suitable for frequent repeat testing. specifically, personal protective equipment (ppe), nasopharyngeal (np) swabs, the associated viral transport media (vtm), and kits for rna isolation and purification have all been in short supply at various times during the covid- pandemic.
moreover, sars-cov- is spread through droplets and aerosols transmitted through person-to-person contact, and thus saliva may be a relevant medium for diagnosing sars-cov- infection status. here we describe a saliva-based testing method that bypasses the need for rna isolation/purification. in experiments with inactivated sars-cov- virus spiked into saliva, this method has a limit of detection of - viral particles per ml, rivalling the standard np swab method, and initial studies also show excellent performance with clinical samples. this saliva-based process is operationally simple, utilizes readily available materials, and can be easily implemented by existing testing sites, thus allowing for high-throughput, rapid, and repeat testing of large populations. graphical abstract that can be realistically scaled to test thousands of individuals a day. figure . schematic of sars-cov- testing. a) the current, widely-utilized diagnostic process involves nasopharyngeal (np) swabs and viral transport media (vtm), followed by rna extraction and isolation, with rt-qpcr analysis of the samples. np swabs, vtm, and rna purification kits have been in short supply at various times. b) in april of , saliva received emergency use authorization (eua) as a diagnostic sample, using rna extraction and isolation, followed by rt-qpcr. c) other groups have reported direct testing of np swabs in vtm by rt-qpcr. d) the university of illinois at urbana-champaign (uiuc) protocol involves saliva collection in standard ml conical tubes, heating ( °c for min), followed by addition of buffer and analysis by rt-qpcr. when considering various sample collection possibilities, saliva is attractive due to the known detection of sars-cov- through oral shedding, and the potential for rapid and easy self-collection, thus minimizing the need for direct healthcare provider-patient contact and consequently conserving ppe.
in addition, a number of recent reports have detailed the detection of sars-cov- in saliva through the workflow in figure b, including a report showing higher viral loads in saliva when compared to matched np swabs from the same patients. importantly, saliva (expelled in aerosols and droplets) may be a significant factor in person-to-person transmission of sars-cov- , and it has been suggested that np swab tests remain positive long after patients have ceased to be infectious (potentially due to detection of inactive virus or remnants of viral rna in the np cavity), whereas sars-cov- viral loads in saliva are highest during the first week of infection, when a person is most infectious. these data suggest that viral loads in saliva may be a good reflection of the transmission potential of patients infected with sars-cov- . while we are unaware of direct sars-cov- detection from saliva that bypasses rna isolation/purification, there are several reports of detection from swab/vtm that bypasses rna isolation/purification ( figure c). with the ultimate goal of providing convenient, scalable, and cost-effective molecular diagnostic testing for > , individuals per day using a single covid- testing center, here we report the discovery of a sensitive saliva-based detection method for sars-cov- that bypasses rna isolation/purification ( figure d ). this sars-cov- testing process and workflow is convenient, simple, rapid, and inexpensive, and can be readily adopted by any testing facility currently using rt-qpcr. while sars-cov- has been identified in the nasopharynx, collecting np samples is neither trivial nor innocuous, and for repeat testing to track disease progression within a given patient this method may prove unreliable, due to inconsistencies in repeated sampling and potential formation of scar tissue, altogether resulting in possible false negatives.
compounding these anatomic limitations, the procedure for np sample collection is invasive, further reducing patient compliance for repeated and serial sampling. saliva may serve as an important mediator in transmitting sars-cov- between individuals via droplets and aerosols, and thus viral loads in saliva may serve as a highly relevant correlate of transmission potential. however, saliva contains constituents that may hinder virus detection by rt-qpcr, such as degradative enzymes. as such, we sought to identify conditions that could take advantage of the many positives of saliva while overcoming potential limit of detection challenges with this collection medium. for the optimization phase of this work we utilized two versions of inactivated sars-cov- , one inactivated through gamma(γ)-irradiation ( x rads) and one inactivated through heat ( °c, min). for the detection of sars-cov- , we utilized the commercially available taqpath rt-pcr covid- kit, developed and marketed by thermo fisher scientific. this multiplex rt-qpcr kit targets the orf ab (replication), n-gene (nucleocapsid), and s-gene (spike) of sars-cov- . to reduce cost and extend reagent usage, we performed rt-qpcr reactions at half the suggested reaction mix volume. heat treatment. up-front heating of freshly collected saliva samples is attractive as a simple method to inactivate the virus without having to open the collection vessel. indeed, heat treatment is often used to inactivate patient saliva samples, thus conferring added biosafety by decreasing the likelihood of viral transmission via sample handling by personnel. common conditions for sars-cov- inactivation are heating at - °c for - min, although other temperatures and times have been examined.
using intact, γ-irradiated sars-cov- spiked into fresh human saliva (that was confirmed to be sars-cov- negative), we observed dramatic time- and temperature-dependent improvement in sars-cov- detection by direct rt-qpcr, without the use of rna extraction. when incubated at ambient temperature (no heat treatment), no sars-cov- genes were detectable ( figure ). as temperature and incubation time were increased, substantial improvement in virus detection was observed, with % identification of all sars-cov- genes, in all replicate samples, following a min incubation at °c. importantly, a short heating time ( minutes) at °c (as has been examined by others) does not allow for sensitive detection; the minute duration is essential, as it is likely that this extended heating inactivates components of saliva that inhibit rt-qpcr. thus, proper heating of patient samples allows for virus detection without the need for rna extraction, with the added benefit of inactivating the samples, thus substantially reducing biohazard risks. the effect of heat on sars-cov- detection. γ-irradiated sars-cov- (from bei, used at . x viral copies/ml) was spiked into fresh human saliva (sars-cov- negative). samples diluted : with x tris-borate-edta (tbe) buffer ( . ml in ml conical tubes) were incubated at °c (ambient temperature), or in a hot water bath at °c, °c, or °c, for , , , or min. all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked saliva samples, a positive control (pos; sars-cov- positive control, . x copies/ml, no ms ) and a negative control (neg; water, no ms ) were directly analyzed by rt-qpcr, in triplicate, for sars-cov- orf ab (green triangle), n-gene (red square), and s-gene (blue circle), and ms (open circle). undetermined ct values are plotted at . saliva collection buffer.
we next sought to evaluate saliva collection buffers as a means to enhance viral rna stability, but also to increase uniformity between saliva samples and to decrease sample viscosity. in conjunction with rna isolation/purification, other groups have utilized protocols whereby saliva was provided by a patient and soon thereafter combined with the collection buffer; reported collection buffers include phosphate buffered saline (pbs), dna/rna shield, and tris-edta (te). using intact, γ-irradiated sars-cov- spiked into fresh human saliva, which was then heat treated at °c for min, we observed outstanding virus detection when saliva samples were combined with either tris-borate-edta (tbe) or te buffer ( figure a) . comparable ct values were observed between tbe and te buffer, but te yielded greater variability between individual gene replicates, whereas tbe buffer yielded highly clustered data. in stark contrast, combining saliva with pbs or two commercially available buffers (dna/rna shield, sdna- ), completely abrogated viral detection, including the ms bacteriophage internal control, indicating that these buffers directly interfere with the rt-qpcr reaction itself. tbe, te, and pbs were further titrated with different concentrations of sars-cov- , where similar trends were observed, namely, greater replicate variability with te buffer, and no virus detection with pbs (supporting figure ) . thus, when saliva samples are combined with tbe buffer to a final working concentration of x, sars-cov- is detectable in saliva without rna extraction; te buffer is also suitable but more variability is observed. these findings further suggest that while pbs and commercially available buffers may be appropriate for samples that are processed via rna extraction, these agents are incompatible with direct saliva-to-rt-qpcr. the effect of collection buffer on sars-cov- detection. γ-irradiated sars-cov- (from bei, at . x or . 
x viral copies/ml) was spiked into fresh human saliva (sars-cov- negative) and combined at a : ratio with tris-borate-edta (tbe), tris-edta (te), phosphate buffered saline (pbs), dna/rna shield (zymo research), or sdna- (spectrum solutions) such that the final concentration of each buffer was x. samples ( . ml in ml conical tubes) were incubated in a hot water bath at °c for min. (b) detergent optimization. γ-irradiated sars-cov- ( . x viral copies/ml) was spiked into fresh human saliva (sars-cov- negative) and combined : with tbe buffer at a final working concentration of x. samples were treated with detergents (triton x- , %, . %, . %; tween , %, . %, . %; np- , %, %, . %) after heating at °c for min. all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked saliva samples, a positive control (pos; sars-cov- positive control, . x copies/ml, no ms ) and a negative control (neg; water, no ms ) were directly analyzed by rt-qpcr, in triplicate, for sars-cov- orf ab (green triangle), n-gene (red square), and s-gene (blue circle), and ms (open circle). undetermined ct values are plotted at . sample additives. in addition to saliva collection buffers, various additives have been explored for their ability to enhance sars-cov- detection. therefore, detergents, including triton x- , tween , and np- ( figure b), as well as various rna stabilizing agents, including rnase inhibitor, carrier rna, glycogen, tcep, proteinase k, bovine serum albumin (bsa), rnalater, and pbs-dtt (supporting figure ) were examined. notably, modest improvements in viral detection were observed with all detergents tested (~ ct, figure b ) and with addition of carrier rna, rnase inhibitor, and bsa (supporting figure ); these additives slightly improve virus detection without interfering with rt-qpcr. in addition, if clinical saliva specimens are especially viscous, addition of detergent may improve ease of sample handling.
however, inclusion of detergents prior to heat treatment inhibited viral detection, emphasizing the importance of adding detergents after heat treatment, if they are to be included (supporting figure ). of the detergents tested, tween was chosen for incorporation into the standard sample processing protocol, given its ease of handling and cost. when samples were treated with tween and tbe (alone or in combination, either before or after heating) the ideal workflow for virus detection, as defined by the lowest ct values with the greatest clustering of individual replicates, was tbe buffer before heating, and tween after heating (supporting figure ). however, it is important to note that comparable results were obtained when tbe was added after heating (supporting figure ), suggesting flexibility in when tbe buffer can be included during sample processing. altogether, the safest and most streamlined protocol would be: collection of saliva samples, heating at °c for min, addition of tbe buffer and tween , followed by rt-qpcr. using the optimized protocol of addition of tbe (or te) buffer at a : ratio with saliva, followed by heat treatment at °c for min and addition of tween to a final concentration of . %, the limit of detection (lod) was determined. other reports have suggested that sars-cov- is shed into saliva at a remarkably wide range from , - , , , copies/ml. while the lods of approved sars-cov- diagnostic methods can vary considerably ( - , viral copies/ml) and are not always reported, the best lod values for sars-cov- using rna extraction protocols appear to be approximately copies/ml. similarly, a lod of copies/ml was found for sars-cov- detection in saliva using rna purification. to determine the lod for this new direct protocol (salivart-qpcr), a side-by-side comparison was conducted of intact, γ-irradiated sars-cov- spiked into fresh human saliva compared to a process that includes rna isolation/purification.
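a dilution-series comparison of this kind can be summarised programmatically. the sketch below assumes one plausible reading of lod: the lowest spiked concentration at which, for it and every higher concentration tested, all replicates of all target genes return a determined ct. that criterion is an assumption made for illustration, since the text does not spell out the rule used, and the function name `limit_of_detection`, the data layout, and the example values are hypothetical.

```python
def limit_of_detection(ct_by_concentration):
    """Return the lowest concentration that is fully detected, scanning
    downward from the highest concentration and stopping at the first
    concentration with any undetermined replicate.

    ct_by_concentration maps copies/ml -> {target: [ct, ...]}, where an
    undetermined Ct is recorded as None. Returns None if even the
    highest concentration is not fully detected.
    """
    lod = None
    for conc in sorted(ct_by_concentration, reverse=True):
        targets = ct_by_concentration[conc]
        fully_detected = all(
            ct is not None for cts in targets.values() for ct in cts
        )
        if not fully_detected:
            break
        lod = conc
    return lod


# Hypothetical triplicate Ct values for two targets, for illustration only.
example_series = {
    1e5: {"n_gene": [25.0, 25.1, 24.9], "s_gene": [26.0, 26.2, 25.8]},
    1e4: {"n_gene": [28.4, 28.6, 28.5], "s_gene": [29.1, 29.0, 29.3]},
    1e3: {"n_gene": [33.0, None, 34.1], "s_gene": [34.5, 35.0, None]},
    1e2: {"n_gene": [None, None, None], "s_gene": [None, None, None]},
}
example_lod = limit_of_detection(example_series)
```

running the same function on the direct-saliva and purified-rna series side by side gives two values that can be compared directly.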
as shown in figure , comparable lod measurements were observed, with lod of ~ viral copies/ml for both the direct process with addition of tween and tbe buffer, and the process using rna purification. similar results were observed with heat-inactivated sars-cov- , whereby the lod was measured to be viral copies/ml for both rna extraction of saliva samples and direct saliva-to-rt-qpcr, with greater detection if the virus was directly analyzed in water (supporting limit of detection (lod) for assessment of sars-cov- from saliva, comparing a process utilizing rna isolation/purification to one that bypasses rna isolation/purification. γ-irradiated sars-cov- was spiked into fresh human saliva (sars-cov- negative), with or without tbe buffer( x) at . x , . x , . x , . x , . x , . x , . x , . x , and . x viral copies/ml. samples were incubated at °c for min, then combined with or without tween ( . %). all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked saliva samples were either processed for rna extraction followed by rt-qpcr (purified rna), or directly analyzed by rt-qpcr (direct saliva). all samples, including a positive control (pos; sars-cov- positive control, . x copies/ml, no ms ) and a negative control (neg; water, no ms ) were analyzed by rt-qpcr, in triplicate, for sars-cov- orf ab (green triangle), n-gene (red square), and s-gene (blue circle), and ms (open circle). undetermined ct values are plotted at . the limit of detection (lod) is indicated by the dotted vertical line. as the taqpath/mastermix rt-qpcr reagents from thermofisher provide the necessary specificity for sars-cov- detection in a simplified workflow, this system was utilized for all the experiments described above. 
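as a rough illustration of how an lod can be read off a dilution series like the ones described above, the sketch below takes the lod to be the lowest spiked concentration at which every replicate is detected, with every higher concentration also fully detected. that criterion, the function name, and the example values are assumptions made for illustration; the source reports the lod graphically and does not state a formal rule.

```python
from typing import Dict, List, Optional

# Ct values per spiked concentration (viral copies/ml); None marks an
# undetermined Ct. All numbers below are made up for illustration.
CtSeries = Dict[float, List[Optional[float]]]


def limit_of_detection(series: CtSeries) -> Optional[float]:
    """Return the lowest concentration that is fully detected, provided every
    higher concentration is fully detected as well (assumed criterion)."""
    lod = None
    for concentration in sorted(series, reverse=True):
        replicates = series[concentration]
        fully_detected = bool(replicates) and all(ct is not None for ct in replicates)
        if not fully_detected:
            break  # stop at the first concentration with a missed replicate
        lod = concentration
    return lod


# Hypothetical dilution series, three replicates per concentration.
example: CtSeries = {
    1e5: [24.1, 24.3, 24.0],
    1e4: [27.6, 27.9, 27.5],
    1e3: [31.2, 31.0, 31.8],
    1e2: [34.9, None, 35.4],  # one replicate undetermined
    1e1: [None, None, None],
}
print(limit_of_detection(example))
```

in a multiplex assay the replicate list for one concentration could hold the ct values of all targets, which makes the criterion stricter; whether that is appropriate depends on how a positive call is defined.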
however, we have also assessed the centers for disease control and prevention (cdc)-approved primers and probes for sars-cov- n and n genes, and the human rnase p (rp) gene control in this direct saliva-to-rt-qpcr protocol, and the results show that these primers give comparable lod values, with viral copies/ml using heat-inactivated sars-cov- , and viral copies/ml using γ-irradiated sars-cov- (supporting figure ). these findings further illustrate that our optimized protocol may be used with comparable detection across multiple analytical platforms. altogether, these findings indicate that the optimized protocol (heat treatment of saliva samples at °c for min / addition of tbe buffer and tween ) yields a lod that is comparable to reported clinical viral shedding concentrations in oral fluid, thus emphasizing the translatability of the protocol to detecting sars-cov- in patient samples. in preparation for clinical samples and real-world testing, we first evaluated the ability to detect spiked inactivated virus in samples that were stored at varying temperatures (ambient ( °c), °c, - °c, and - °c), for varying lengths of time (≤ hrs). most importantly, at room temperature and at °c, samples processed after hr showed little difference from those processed after hr storage, suggesting considerable flexibility in processing time (supporting figure ). some increased variability between individual gene replicates and loss of signal were observed with prolonged storage and freeze/thaw cycles (supporting figure ). next, we evaluated the effect of sample volume in the saliva collection vessels ( ml conical tubes) on viral detection, after heating at °c for min in a hot water bath, because of concerns about evaporation of smaller samples and incomplete heating of larger samples. no appreciable difference was observed across the anticipated range of clinical saliva sample volumes ( .
- ml), indicating that sample volume does not impact virus detection (supporting figure ) . furthermore, if samples are transferred to smaller vessels for more efficient long-term cold storage ( . ml microcentrifuge tubes), no appreciable differences in virus detection between different volumes is anticipated (supporting figure ) . finally, as clinical saliva samples can sometimes contain particulates, we next evaluated whether removal of the particulates via centrifugation affected viral detection (supporting figure ) . notably, if samples were centrifuged, with the resultant supernatant being used for direct rt-qpcr, the lod was approximately -fold worse, with fewer individual gene replicates being detected at lower viral copy numbers (supporting figure ) . therefore, we recommend avoiding centrifugation of samples if possible. altogether, these findings suggest that ( ) lod reproducibility. in order to evaluate the robustness of the optimized direct saliva-to-rt-qpcr approach, the lod of sars-cov- viral copies/ml was measured in independent replicate samples ( figure ). γ-irradiated sars-cov- was spiked into fresh saliva from two healthy donors, and two commercially available saliva sources. across all replicates, these samples with viral copies/ml were consistently detected (all three viral genes), further testifying to the ability of direct saliva-to-rt-qpcr to detect sars-cov- . in order to validate the specificity of our detection system to sars-cov- , saliva was spiked with or without sars-cov- (γ-irradiated virus, synthetic n-transcript), two other human coronaviruses (oc , e), sars and mers synthetic rna, and human rna (extracted from hek cells). among these samples, sars-cov- genes were only detected in the positive control, and sars-cov- samples, further supporting the specificity of the detection platform for sars-cov- (supporting figure ). limit of detection (lod) reproducibility. 
γ-irradiated sars-cov- was spiked into human saliva (sars-cov- negative), sourced fresh from two healthy donors, and purchased from two companies, in x tbe buffer at . x viral copies/ml. samples were incubated at °c for min, then tween was added to a final concentration of . %. all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked saliva samples were directly analyzed by rt-qpcr (direct saliva). all samples, including a positive control (pos; sars-cov- positive control, . x copies/ml, no ms ) and a negative control (neg; water, no ms ) were analyzed by rt-qpcr, in replicates of , for sars-cov- orf ab (green triangle), n-gene (red square), and s-gene (blue circle), and ms (open circle). undetermined ct values are plotted at . our findings support an optimized sars-cov- diagnostic approach that increases accessibility to testing by using saliva (rather than np swabs) and eliminates the need for rna extraction (thus saving time and resources). we next sought to assess our protocol with clinical samples. although the changes in viral load in the np cavity and in saliva over time are unknown, there is reason to believe they are different, , so exact concordance between the two samples might not be expected; detection in saliva can provide complementary information to that in the np cavity. to evaluate the ability of the direct saliva-to-rt-qpcr approach to detect sars-cov- in clinical patient specimens, saliva was collected contemporaneously with np swabs from individuals using the following protocol: after saliva collection, te was added at a : ratio, and samples were frozen for over a week before processing. for the evaluation, samples were thawed, x tbe buffer was added to a final concentration of x, heated at °c for min, cooled to room temperature, and tween was added to a final concentration of . %, followed by direct rt-qpcr. 
given the biological complexity of clinical samples, signal detection may vary with viral load and gene target length (orf ab > s > n); therefore, a given result was interpreted as positive if one or more gene targets were detected, and negative if no gene targets were detected. furthermore, a result was considered valid if all gene targets were detected in the sars-cov- positive control and no gene targets were detected in the negative control. a notable strength of a multiplex system is the ability to evaluate three independent viral genes in a single reaction, rather than relying upon multiple probes across different reactions for a single viral gene (as is done in other systems). one of the benefits of saliva-based testing is the possibility of frequent and easy retesting of samples and of individuals, and as such duplicate testing (testing of the same saliva sample two different times) was utilized for this study. of the samples analyzed, were positive for sars-cov- as assessed by np swab, and upon duplicate testing the direct saliva-to-rt-qpcr process identified the same samples as positive, with of saliva samples positive in both of the replicates. all samples identified as negative by np swab were also negative via saliva testing, although for one of these samples one of the duplicate runs was positive; that sample was negative upon retesting ( figure , table ). even though these samples were not run under the fully optimized protocol, this initial testing of clinical samples using direct saliva-to-rt-qpcr showed excellent performance. when testing samples a single time, it was . % sensitive and . % specific for sars-cov- , with an . % false negative and . % false positive rate, and . % positive and . % negative predictive value. using duplicate testing of samples, sensitivity and specificity, and positive and negative predictive values, all increased to %, and the false negative and positive rates decreased to %.
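the interpretation rules and performance figures above can be sketched as follows. this is a minimal illustration, not the authors' analysis code: the target labels, function names, and the counts in the usage lines are assumptions, and the np swab result is taken as the reference standard, as in the text.

```python
from typing import Dict, Optional

# Viral targets of the multiplex assay; the labels are illustrative.
VIRAL_TARGETS = ("ORF1ab", "N", "S")
Well = Dict[str, Optional[float]]  # target -> Ct, None if undetermined


def detected(ct: Optional[float]) -> bool:
    """A target counts as detected when a Ct value was determined."""
    return ct is not None


def run_is_valid(pos_ctrl: Well, neg_ctrl: Well) -> bool:
    """Valid run: every target detected in the positive control and
    no target detected in the negative control."""
    return (all(detected(pos_ctrl.get(t)) for t in VIRAL_TARGETS)
            and not any(detected(neg_ctrl.get(t)) for t in VIRAL_TARGETS))


def call_sample(sample: Well) -> str:
    """Positive if one or more viral targets are detected, else negative."""
    hit = any(detected(sample.get(t)) for t in VIRAL_TARGETS)
    return "positive" if hit else "negative"


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard 2x2 metrics; assumes every denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical usage: one sample with a single detected target, and
# made-up confusion-matrix counts (not the study's data).
print(call_sample({"ORF1ab": None, "N": 33.2, "S": None}))
print(diagnostic_metrics(tp=8, fp=1, tn=90, fn=1))
```

how two duplicate runs are combined into a final call is not spelled out in the text, so the sketch stops at the single-run call.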
comparison of np swab and saliva-based testing. when seeking to develop a sars-cov- molecular diagnostic protocol suitable for testing > , individuals a day, the ease with which saliva can be collected, and the known presence of the virus in saliva makes it highly desirable as the sample medium. as a diagnostic tool, such testing has the additional advantage of making assessments directly from an oral fluid that may be a culprit in transmission of sars-cov- . unfortunately, only a handful of studies have examined the viral load dynamics over time for saliva and np swab samples. , while these studies support the notion that sars-cov- tends to be at its highest level in saliva during the first week of infection, more information is needed on this important topic. in contrast, studies have shown that while live virus can no longer be cultured from patients days after symptom onset, np swabs continue to be positive after a patient is in the convalescent phase and no longer infectious. as such, it is quite possible that differences observed in studies comparing sars-cov- levels in saliva and np swabs are real, and not an artifact of different testing sensitivities; while in general concordance between the np swab and saliva testing has been high in other studies ( %, well, consistent with another report successfully using te to extract dry np swabs, tbe buffer provides more reliability and consistency in our direct saliva-to-rt-qpcr detection of sars-cov- . finally, the addition of the non-ionic detergent tween- also helped improve detection of sars-cov- , possibly by facilitating the opening of the viral capsid to allow the release of rna to provide sufficient template for rt-qpcr detection. 
our preliminary assessment of clinical samples is very promising, especially given that these samples were not collected and processed under the optimized protocol (they were collected before our discovery of the benefits of tbe buffer and tween ); with these samples te buffer was added to the sample, and they were frozen for over a week before processing. however, even under this non-optimized workflow we were able to identify all np swab positives with duplicate runs of the samples. next steps are to perform similar head-to-head comparisons between the np swab-based method and our optimized workflow with additional clinical samples. supply chain, costs, and next-generation technology. a major benefit of the simple workflow (see graphical abstract) detailed herein is its ability to be adopted by any diagnostic laboratory currently using rt-qpcr in sars-cov- testing. in addition to the time savings and major logistical benefits of using saliva and bypassing rna isolation/purification, our analysis of the costs of all reagents/disposables for this process amounts to ~$ per test, the bulk of which are the taqpath/mastermix. this cost could drop further if samples are pooled before rt-qpcr. pooling considerations will necessarily be informed by data on the expected positive rate in the population to be tested, and also the relationship between viral load and infectivity; - while one recent study showed that live sars-cov- could not be cultured from samples containing less than , , viral copies per ml, more information is needed. and, while there is no indication that taqpath/mastermix will be limited by the supply chain, we show that this process and workflow is also compatible with other primer sets, such as the n and n primers and probes from the cdc. 
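to make the dependence of pooling on the expected positive rate concrete, the sketch below uses classic two-stage (dorfman) pooling: each pool is tested once, and every member of a positive pool is then retested individually. the source does not specify a pooling scheme, so the scheme, the function names, and the assumption of a perfect assay with no loss of sensitivity from dilution are all illustrative.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected number of tests per person under two-stage pooling.

    One pooled test is shared by pool_size people; if the pool is positive
    (probability 1 - (1 - prevalence) ** pool_size), each member is retested.
    Assumes independent samples and a perfectly accurate assay.
    """
    if pool_size == 1:
        return 1.0  # no pooling: exactly one test per person
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + p_pool_positive


def best_pool_size(prevalence: float, max_pool_size: int = 32) -> int:
    """Pool size in 1..max_pool_size that minimizes expected tests per person."""
    return min(range(1, max_pool_size + 1),
               key=lambda k: expected_tests_per_person(prevalence, k))


# Hypothetical positive rates, for illustration only.
for rate in (0.001, 0.01, 0.05, 0.20):
    k = best_pool_size(rate)
    print(rate, k, round(expected_tests_per_person(rate, k), 3))
```

under these assumptions the reagent cost per person scales with the expected number of tests, so pooling pays off mainly when the positive rate is low; any loss of sensitivity on dilution would have to be checked against the measured lod.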
in the future, development of analogous saliva-based processes that bypass rna isolation/purification can be envisioned for alternative back-end detection technologies, such as the lamp method, , which if successful would result in an even shorter overall time from sample collection to results. in summary, described herein is a sensitive diagnostic method for sars-cov- that is operationally simple, bypasses supply chain bottlenecks, evaluates a clinically relevant infectious fluid, is appropriate for large scale repeat testing, is cost effective, and can be readily adopted by other laboratories. large scale sars-cov- testing will be a powerful weapon in preventing spread of this virus and helping to control the covid- pandemic. this work was supported by the university of illinois, urbana-champaign. we would like to thank dr. rashid bashir and dr. enrique valera for their assistance in coordinating and obtaining irb approval for acquisition of clinical samples, as well as dr. mark johnson, reubin mcguffin, maryellen sherwood, and carly skadden for their assistance in collection and distribution of saliva and discarded vtm samples from carle foundation hospital. we are grateful to dr. lois hoyer for use of quantstudio rt-qpcr instruments. all clinical samples from study participants were collected in accordance with university of illinois at in all virus stocks and rna transcripts were aliquoted in small volumes and stored at - °c. stocks were serially diluted to the correct concentration in rnase-free water on the day of experimentation. we performed a multiplex rt-qpcr assay using the taqpath rt-pcr covid- kit (thermo fisher cn a ) together with the taqpath -step master mix -no rox (thermo fisher cn a ). to reduce cost, rt-qpcr reactions were prepared at half the suggested reaction mix volume ( . µl instead of µl). µl of either saliva in tbe buffer or extracted rna were used as templates for the rt-qpcr reaction. 
all saliva samples used for pre-clinical studies were spiked with purified ms bacteriophage ( : ms :sample) as an internal control prior to analysis by rt-qpcr. for clinical samples, ms was added to the preparation of the reaction mix ( µl ms per reaction). covid- positive control rna at genomic copies/µl was used. negative control is ultrapure dnase/rnase-free distilled water (ambion cn ). all rt-qpcr reactions were performed in . ml -well reaction plates in a quantstudio system (applied biosciences). the limit of detection (lod) of the assay was performed by serial dilution of γ-irradiated sars-cov- ( - . x viral copies/ml) used to spike pooled fresh saliva samples. the rt-qpcr was run using the standard mode, consisting of a hold stage at °c for min, °c for min, and °c for min, followed by cycles of a pcr stage at °c for sec then °c for sec; with a . °c/sec ramp up and ramp down rate. in some experiments, the cdc-approved assay was used to validate our data using the taqpath - pre-aliquoted γ-irradiated sars-cov- ( . x viral copies/ml) was spiked into pre-aliquoted fresh human saliva (sars-cov- negative) and combined with following completion of rt-qpcr, data were processed using quantstudio design and analysis supporting figure . saliva collection buffer titration. γ-irradiated sars-cov- ( . supporting figure . limit of detection optimization. heat-inactivated sars-cov- was spiked into fresh human saliva (sars-cov- negative) in . x te or water at . x , . x , . x , . x , . x , and . x viral copies/ml. samples were incubated at °c for min. all samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked samples were either processed for rna extraction using a commercially available kit (magmax), or directly analyzed by rt-qpcr (direct saliva). all samples, including a positive control (pos; sars-cov- positive control, . 
x copies/ml, no ms ) and a negative control (neg; water, no ms ) were analyzed by rt-qpcr, in triplicate, for sars-cov- orf ab (green triangle), n-gene (red square), and s-gene (blue circle), and ms (open circle). undetermined ct values are plotted at . the limit of detection (lod) is indicated by the vertical dotted line. supporting figure . lod of direct saliva-to-rt-qpcr sars-cov- detection using cdc-approved primers and probes. heat-inactivated (a, b, c) and γ-irradiated (d, e, f) sars-cov- was spiked into fresh human saliva (sars-cov- negative) in x tris-borate-edta buffer (tbe) at . x , . x , . x , . x , . x , . x , and . x viral copies/ml. samples were incubated at °c for min. virus-spiked saliva samples, a positive control (pos; sars-cov- positive control, . x copies/ml) and a negative control (neg; water) were directly analyzed by rt-qpcr, in triplicate, for sars-cov- n gene (a, d) and n gene (b, e), and the human rp gene (c, f). undetermined ct values are plotted at . figure . effect of sample volume on sars-cov- detection. γ-irradiated sars-cov- ( . x viral copies/ml) was spiked into fresh human saliva (sars-cov- negative) and combined with tbe buffer : at a final working concentration of x. the sample was distributed into either ml conical or . ml microfuge tubes, at either % ( ml in ml conical, µl in . ml microfuge), % ( . ml in ml conical, µl in . ml microfuge), or % ( . ml in ml conical, µl in . ml microfuge) the vessel storage capacity. samples were incubated in a hot water bath at °c for min. all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked saliva samples, a positive control (pos; sars-cov- positive control, . x copies/ml, no ms ) and a negative control (neg; water, no ms ) were directly analyzed by rt-qpcr, in triplicate, for sars-cov- orf ab (green triangle), n-gene (red square), and s-gene (blue circle), and ms (open circle). undetermined ct values are plotted at . 
overcoming the bottleneck to widespread testing: a rapid review of nucleic acid testing approaches for covid- detection assay techniques and test development for covid- diagnosis delivery of infection from asymptomatic carriers of covid- in a familial cluster efficient high throughput sars-cov- testing to detect asymptomatic carriers evidence supporting transmission of severe acute respiratory syndrome coronavirus while presymptomatic or asymptomatic substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov- ) severe acute respiratory syndrome coronavirus (sars-cov- ) and coronavirus disease- (covid- ): the epidemic and the challenges the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application sars-cov- : what can saliva tell us? saliva-friend and foe in the covid- outbreak rapid detection of sars-cov- in saliva: can an endodontist take the lead in point-of-care covid- testing? saliva is more sensitive for sars-cov- detection in covid- patients than nasopharyngeal swabs comparison of sars-cov- detection in nasopharyngeal swab and saliva consistent detection of novel coronavirus in saliva sensitivity of nasopharyngeal swabs and saliva for the detection of severe acute respiratory syndrome coronavirus (sars-cov- ) sars-cov- detection by direct rrt-pcr without rna extraction preliminary support for "dry swab, extraction free" protocol for sars-cov- testing via rt-qpcr direct rt-qpcr detection of sars-cov- rna from patient nasopharyngeal swabs without an rna extraction step detection of sars-cov- rna by direct rt-qpcr on nasopharyngeal specimens without extraction of viral rna rapid direct nucleic acid amplification test without rna extraction for sars-cov- using a portable pcr thermocycler sars-cov- detection from nasopharyngeal swab samples without rna extraction validation of an extraction-free rt-pcr protocol for detection of sars-cov rna extraction-free covid- 
(sars-cov- ) diagnosis by rt-pcr to increase capacity for national testing programmes during a pandemic negative nasopharyngeal and oropharyngeal swabs do not rule out covid saliva is a non-negligible factor in the spread of covid- clinical significance of a high sars-cov- viral load in the saliva could sars-cov- be transmitted via speech droplets? medrxiv blueprint for a pop-up sars-cov- testing lab heat inactivation of the severe acute respiratory syndrome coronavirus evaluation of heating and chemical protocols for inactivating sars-cov- effective heat inactivation of sars-cov- an alternative workflow for molecular detection of sars-cov- -escape from the na extraction kit-shortage detection of severe acute respiratory syndrome coronavirus (sars-cov- ) is comparable in clinical samples preserved in saline or viral transport medium self-collected oral fluid and nasal swabs demonstrate comparable sensitivity to clinician collected nasopharyngeal swabs for covid- detection rapid and extraction-free detection of sars-cov- from saliva with colorimetric sars-cov- detection using an isothermal amplification reaction and a rapid, inexpensive protocol for sample inactivation and purification fast sars-cov- detection protocol based on rna precipitation and rt-qpcr in nasopharyngeal swab samples a simple rna preparation method for sars-cov- detection by rt-qpcr emergency use authorizations. u.s. 
food and drug administration temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by sars-cov- : an observational cohort study coronavirus disease test results after clinical recovery and hospital discharge among patients in china virological assessment of hospitalized patients with covid- saliva is a reliable tool to detect sars-cov- viral dynamics in asymptomatic patients with covid- a study on infectivity of asymptomatic sars-cov- carriers viral kinetics of sars-cov- in asymptomatic carriers and presymptomatic patients viral copies/ml) was spiked into fresh human saliva (sars-cov- negative) and combined with tbe buffer : to a final working concentration of x. samples ( . ml in ml conical tubes) were stored at °c (ambient temperature), °c, - °c, or - °c for , , , , , and hours. following storage, samples were incubated in a hot water bath at °c for min. all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked saliva samples stored under different conditions, a freshly prepared virusspiked saliva sample n-gene (red square), and s-gene x , and . x viral copies/ml) was spiked into fresh human saliva (sars-cov- negative) and combined with tbe buffer : at a final working concentration of x. samples were heat treated at °c for min, then treated with or without centrifugation at rpm for min. all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virusspiked saliva samples, centrifugation supernatants, a positive control (pos; sars-cov- positive control, . x copies/ml) and a negative control (neg; water) were directly analyzed by rt-qpcr, in triplicate, for sars-cov- orf ab (green triangle), n-gene commercially available saliva (lee biosciences and innovative research) were combined in equal proportions, diluted : with x tbe buffer, and spiked . 
x viral copies/ml of sars-cov- (γ-irradiated virus or synthetic n-transcript rna), human coronaviruses ( e, oc ), sars and mers synthetic rna, and human rna all saliva samples were spiked with purified ms bacteriophage ( : ms :sample) as an internal control. virus-spiked saliva samples, a positive control (pos; sars-cov- positive control, . x copies/ml, no ms ) and a negative control (neg; water, no ms ) were directly analyzed by rt-qpcr, in triplicate, for sars-cov- orf ab (green triangle), n-gene (red square), and sgene (blue circle), and ms (open circle) key: cord- - pgish x authors: zhao, yu; zhao, zixian; wang, yujia; zhou, yueqing; ma, yu; zuo, wei title: single-cell rna expression profiling of ace ,thereceptor of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pgish x a novel coronavirus sars-cov- was identified in wuhan, hubei province, china in december of . according to who report, this new coronavirus has resulted in , confirmed infections and , deaths in china by february, , with additional patients being identified in a rapidly growing number internationally. sars-cov- was reported to share the same receptor, angiotensin-converting enzyme (ace ), with sars-cov. here based on the public database and the state-of-the-art single-cell rna-seq technique, we analyzed the ace rna expression profile in the normal human lungs. the result indicates that the ace virus receptor expression is concentrated in a small population of type ii alveolar cells (at ). surprisingly, we found that this population of ace -expressing at also highly expressed many other genes that positively regulating viral entry, reproduction and transmission. this study provides a biological background for the epidemic investigation of the covid- , and could be informative for future anti-ace therapeutic strategy development. seq technique, we analyzed the ace rna expression profile in the normal human lungs. 
the result indicates that the ace virus receptor expression is concentrated in a small population of type ii alveolar cells (at ). surprisingly, we found that this population of ace -expressing at also highly expressed many other genes that positively regulate viral reproduction and transmission. a comparison between eight individual samples demonstrated that the asian male donor has an extremely large number of ace -expressing cells in the lung. this study provides a biological background for the epidemic investigation of the -ncov infection disease, and could be informative for future anti-ace therapeutic strategy development. severe infection by -ncov could result in acute respiratory distress syndrome (ards) and sepsis, causing death in approximately % of infected individuals , . once in contact with the human airway, the spike proteins of this virus can associate with the surface receptors of sensitive cells, which mediates the entrance of the virus into target cells for further replication.
the copyright holder for this preprint (which was not peer-reviewed) is the author/funder. all rights reserved. no reuse allowed without permission. https://doi.org/ . / . . . doi: biorxiv preprint
recently, xu et al. modeled the spike protein to identify the receptor for -ncov, and indicated that angiotensin-converting enzyme (ace ) could be the receptor for this virus . ace was previously known as the receptor for sars-cov and nl - . according to their modeling, although the binding strength between -ncov and ace is weaker than that between sars-cov and ace , it is still much higher than the threshold required for virus infection. zhou et al. conducted virus infectivity studies and showed that ace is essential for -ncov to enter hela cells . these data indicated that ace is likely to be the receptor for -ncov.
the expression and distribution of the receptor determine the route of virus infection, and the route of infection has major implications for understanding the pathogenesis and designing therapeutic strategies. previous studies have investigated the rna expression of ace in human tissues . however, the lung is a complex organ with multiple types of cells, and such real-time pcr rna profiling is based on bulk tissue analysis, with no way to elucidate the ace expression in each type of cell in the human lung. the ace protein level has also been investigated by immunostaining in lung and other organs , . these studies showed that in normal human lung, ace is mainly expressed by type ii and type i alveolar epithelial cells. endothelial cells were also reported to be ace positive. however, immunostaining analysis is known for its lack of signal specificity, and accurate quantification is another challenge for such analysis. the recently developed single-cell rna sequencing (scrna-seq) technology enables us to study the ace expression in each cell type and gives quantitative information at single-cell resolution. previous work has built up the online database for scrna-seq analysis of normal human lung transplant donors . in the current work, we used updated bioinformatics tools to analyze the data. in total, we analyzed , cells derived from normal lung tissue of to further understand the special population of ace -expressing at , we performed gene ontology enrichment analysis to study which biological processes are involved with this cell population by comparing them with the at cells not expressing ace . surprisingly, we found that multiple viral process-related go terms are significantly over-represented, including "positive regulation of viral process" (p value= .
), "viral life cycle" (p value= . ), "virion assembly" (p value= . ) and "positive regulation of viral genome replication" (p value= . ). these highly expressed viral process-related genes in ace -expressing at include: slc a , cxadr, cav , nup , ctbp , gsn, hspa b, stom, rab b, hacd , itgb , ist , nucks , trim , apoe, smarcb , ubp , chmp a, nup , hspa , dag , stau , icam , chmp , d ek, vps b, egfr, ccnk, ppia, ifitm , ppib, tmprss , ubc, lamp and chmp . therefore, it seems that the -ncov has cleverly evolved to hijack this population of at cells for its reproduction and transmission. we further compared the characteristics of the donors and their ace expression patterns. no association was detected between the number of ace -expressing cells and the age or smoking status of donors. of note, the male donors have a higher ace -expressing cell ratio than the female donors ( . % vs. . % of all cells, p value= . , mann whitney test). in addition, the distribution of ace is also more widespread in male donors than in females: at least different types of cells in male lung express this receptor, while only ~ types of cells in female lung express the receptor. this result is highly consistent with the epidemic investigation showing that most of the confirmed -ncov infected patients were men ( vs. , by jan , ). we also noticed that the only asian donor (male) has a much higher ace - altogether, in the current study, we report the rna expression profile of ace in the human lung at single-cell resolution. our analysis suggested that the expression of ace is concentrated in a special population of at which expresses many other genes favoring the viral process. this conclusion is different from the previous report which observed abundant ace not only in at , but also in endothelial cells . in fact, to our knowledge, endothelial cells
in fact, to our knowledge, endothelial cells sometimes can be non-specifically stained in immunohistochemical analysis. author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/ . / . . . doi: biorxiv preprint public datasets (geo: gse ) were used for bioinformatics analysis. firstly, we used seurat (version . . ) to read a combined gene-barcode matrix of all samples. we removed the low-quality cells with less than or more than , detected genes, or if their mitochondrial gene content was > %. genes were filtered out that were detected in less than cells. for normalization, the combined gene-barcode matrix was scaled by total umi counts, multiplied by , and transformed to log space. the highly variable genes were identified using the function findvariablegenes. variants arising from number of umis and percentage of mitochondrial genes were regressed out by specifying the vars.to.regress argument in seurat function scaledata. the expression level of highly variable genes in the cells was scaled and centered along each gene, and was conducted to principal component analysis. then we assessed the number of pcs to be included in downstream analysis by ( ) plotting the cumulative standard deviations accounted for each pc using the function pcelbowplot in seurat to identify the 'knee' point at a pc number after which successive pcs explain diminishing degrees of variance, and ( ) by exploring primary sources of heterogeneity in the datasets using the pcheatmap function in seurat. based on these two methods, we selected the first top significant pcs for two-dimensional t-distributed stochastic neighbor embedding (tsne), implemented by the seurat software with the default parameters. we used findclusters in seurat to identify cell clusters for each author/funder. all rights reserved. no reuse allowed without permission. 
sample. following clustering and visualization with t-distributed stochastic neighbor embedding (tsne), initial clusters were inspected and merged based on the similarity of marker genes and on a measure of phylogenetic identity computed with buildclustertree in seurat. identification of cell clusters was performed on the final aligned object, guided by marker genes. to identify the marker genes, differential expression analysis was performed with the function findallmarkers in seurat using the wilcoxon rank sum test. differentially expressed genes that were expressed in at least % of cells within the cluster and with a fold change of more than . (log scale) were considered to be marker genes. tsne plots and violin plots were generated using seurat.

b. cellular cluster map of the asian male. all samples were analyzed using the seurat r package. cells were clustered using a graph-based shared nearest neighbor clustering approach and visualized using a t-distributed stochastic neighbor embedding (tsne) plot. 
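the filtering and normalization steps described in the methods above (remove low-quality cells, drop rarely detected genes, scale each cell by its total umi count, multiply by a scale factor, move to log space) can be sketched in a few lines of numpy. this is only an illustration under stated assumptions, not the authors' seurat code: the function name and every threshold argument are hypothetical, the scale factor default of 1e4 is an assumption because the value is not preserved in the text, and log(1 + x) is assumed as the log transform.

```python
import numpy as np


def qc_and_lognormalize(counts, mito_mask, min_genes, max_genes,
                        max_mito_frac, min_cells, scale_factor=1e4):
    """Filter a genes x cells UMI matrix, then log-normalize it.

    Hypothetical helper: all thresholds are caller-supplied because the
    values used in the study are not preserved in the source text.
    """
    counts = np.asarray(counts, dtype=float)
    mito_mask = np.asarray(mito_mask, dtype=bool)

    # Per-cell QC metrics: detected genes, total UMIs, mitochondrial fraction.
    genes_per_cell = (counts > 0).sum(axis=0)
    total_umi = counts.sum(axis=0)
    mito_umi = counts[mito_mask].sum(axis=0)
    mito_frac = np.divide(mito_umi, total_umi,
                          out=np.zeros_like(total_umi), where=total_umi > 0)

    # Drop cells with too few or too many genes, or too much mito content.
    keep_cells = ((genes_per_cell >= min_genes)
                  & (genes_per_cell <= max_genes)
                  & (mito_frac <= max_mito_frac))

    # Drop genes detected in fewer than min_cells of the remaining cells.
    cells_per_gene = (counts[:, keep_cells] > 0).sum(axis=1)
    keep_genes = cells_per_gene >= min_cells

    # Scale each cell by its total UMI count, multiply, and log-transform.
    kept = counts[np.ix_(keep_genes, keep_cells)]
    totals = kept.sum(axis=0)
    frac = np.divide(kept, totals, out=np.zeros_like(kept), where=totals > 0)
    normalized = np.log1p(frac * scale_factor)
    return normalized, keep_genes, keep_cells


# Toy matrix: 3 genes (the last one mitochondrial) x 4 cells.
toy_counts = [[2, 0, 5, 1],
              [0, 0, 5, 1],
              [2, 1, 90, 0]]
norm, kept_genes, kept_cells = qc_and_lognormalize(
    toy_counts, mito_mask=[False, False, True],
    min_genes=2, max_genes=3, max_mito_frac=0.5, min_cells=1)
```

in this toy input the second cell is dropped for having too few detected genes and the third for its high mitochondrial fraction. after normalization, undoing the log transform gives columns that each sum to the scale factor, which is a quick sanity check on the scaling step.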
clinical features of patients infected with novel coronavirus in wuhan
a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster
evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission
the s proteins of human coronavirus nl and severe acute respiratory syndrome coronavirus bind overlapping regions of ace
crystal structure of nl respiratory coronavirus receptor-binding domain complexed with its human receptor
expression of elevated levels of pro-inflammatory cytokines in sars-cov-infected ace + cells in sars patients: relation to the acute lung injury and pathogenesis of sars
discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin. biorxiv
tissue distribution of ace protein, the functional receptor for sars coronavirus. a first step in understanding sars pathogenesis
binding of sars coronavirus to its receptor damages islets and causes acute diabetes
single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis

key: cord- -vert n authors: márquez-lópez, cristina; roche-molina, marta; garcía-quintáns, nieves; sacristán, silvia; siniscalco, david; gonzález-guerra, andrés; camafeita, emilio; lytvyn, mariya; guillen, maría i.; sanz-rosa, david; martín-pérez, daniel; sánchez-ramos, cristina; garcía, ricardo; bernal, juan a. title: sars-cov- protein nsp alters actomyosin cytoskeleton and phenocopies arrhythmogenic cardiomyopathy-related pkp mutant date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: vert n mutations in desmosomal plakophilin- (pkp ) are the most prevalent drivers of arrhythmogenic cardiomyopathy (acm) and a common cause of sudden cardiac death in young athletes. however, partner proteins that elucidate pkp cellular mechanism to understand cardiac dysfunction in acm are mostly unknown. here we identify the actin-based motor proteins myh and myh as key pkp interactors, and demonstrate that the expression of the acm-related pkp mutant r x alters actin fiber organization and cell mechanical stiffness. we also show that sars-cov- nsp protein acts similarly to this known pathogenic r x mutant, altering the actomyosin component distribution on cardiac cells. our data reveal that the viral nsp hijacks pkp into the cytoplasm and mimics the effect of delocalized r x mutant. these results demonstrate that cytoplasmic pkp , wildtype or mutant, induces the collapse of the actomyosin network, since shrna-pkp knockdown maintains the cell structure, validating a critical role of pkp localization in the regulation of actomyosin architecture. the fact that nsp and pkp mutant r x share similar phenotypes also suggests that direct sars-cov- heart infection could induce a transient acm-like disease in covid- patients, which may contribute to right ventricle dysfunction, observed in patients with poor survival prognosis. highlights the specific cardiac isoform plakophilin- a (pkp ) interacts with myh and myh . pkp delocalization alters actomyosin cytoskeleton component organization. sars-cov- nsp protein hijacks pkp from the desmosome into the soluble fraction where it is downregulated. viral nsp collapses the actomyosin cytoskeleton and phenocopies the arrhythmogenic cardiomyopathy-related mutant r x. coronavirus disease is caused by the severe acute respiratory syndrome coronavirus (sars-cov- ). in just a few months, covid- has become a worldwide health problem of unprecedented spread since the spanish flu. 
it is characterized by respiratory symptoms, but cardiac complications (fried et al, ), including arrhythmias, heart failure, and viral myocarditis, are also common, implicating myocardial injury as a possible pathogenic mechanism contributing to severe illness and mortality . emerging evidence suggests that sars-cov- infection compromises right ventricle (rv) function beyond inducing lung injury and acute respiratory distress syndrome. in fact, several studies show that rv dilation is strongly associated with in-hospital mortality, and that compromised rv function identifies higher-risk patients with covid- (argulian et al, ; li et al, ). although the generalized inflammatory response (cytokine storm) caused by covid- can induce deleterious cardiac dysfunction, the direct impact of sars-cov- infection on cardiac tissues is not well understood. like other coronaviruses, the sars-cov- genome consists of a single-stranded rna, ∼ kb in size, encoding proteins. the first open reading frames (orf), orf a/b, located at the ′ end, comprise about two-thirds of the whole genome length and encode polyproteins a and b (pp a, pp b). these polyproteins are processed into non-structural proteins (nsps) to form a replication-transcription complex that is involved in genome transcription and replication (thi nhu thao et al, ). among them is the n-terminal nonstructural protein (nsp ), which, as in other coronaviruses, has an analogous biological function in suppressing host gene expression (narayanan et al, ; tohya et al, ). nsp binds to the small ribosomal subunit and prevents canonical mrna translation by blocking the mrna entry tunnel (thoms et al, ). despite sharing over % of its amino acid sequence with the nsp of the original sars-cov, the differences could hide new functions. 
little is known about the role of nsp in specific cell types and about the consequences of recently described protein-protein interactions with host cell proteins such as desmosomal plakophilin- (pkp ) (gordon et al, ). alterations affecting the cardiac isoform of pkp a, which lacks the exon (gandjbakhch et al, ), account for - % of genotype-positive patients with arrhythmogenic cardiomyopathy (acm) (den haan et al, ; van tintelen et al, ). referred to as a desmosomal disorder, acm is a genetic disease of the heart muscle (alcalde et al, ; austin et al, ) that predisposes to sudden cardiac death (scd) (goff & calkins, ), particularly in young patients and athletes (coelho et al, ; maron et al, ). for many years acm has been known as arrhythmogenic right ventricular cardiomyopathy (arvc) (corrado et al, a), showing autosomal-dominant inheritance and usually manifesting as impaired function of the rv (mckoy et al, ). desmosomes are dynamic intercellular junctions that maintain the structural integrity of skin and heart tissues by withstanding shear forces (al-jassar et al, ; patel & green, ). actin is one of the major cytoskeletal proteins in eukaryotic cells and plays an essential role in several cellular processes, including mechano-resistance and contractile force generation (koenderink & paluch, ). defective regulation of the organization of actin filaments in sarcomeres, owing to genetic mutations or deregulated expression of cytoskeletal proteins, is a hallmark of many heart and skeletal muscle disorders (grimes et al, ). here, we uncover a new function of the sars-cov- nsp protein, which drives inappropriate actomyosin cytoskeleton organization through a pkp -dependent mechanism. by sequestering pkp from the desmosome, nsp interferes with the turnover and localization of the actin-based motor proteins non-muscle myosin heavy chain ii-a (myh ) and b (myh ). 
we identified these actomyosin components as key interactors of the pkp cardiac isoform using biochemical and mass spectrometry approaches. the main role of myh and myh proteins is to orchestrate the mechanoenzymatic properties of stress fibers. myosins execute numerous mechanical tasks in cells, including spatiotemporal organization of the actin cytoskeleton, adhesion, migration, cytokinesis, tissue remodeling, and membrane trafficking (juanes-garcia et al, ; pecci et al, ; vicente-manzanares et al, ; vicente-manzanares et al, ; vicente-manzanares et al, ). using atomic force microscopy together with fluorescence imaging, we show that expression of pkp (c. c>t), encoding the r x mutant found in acm patients from different families (alcalde et al., ; gerull et al, ), alters cell mechanical stiffness and, like nsp , modifies the actomyosin cytoskeleton. we found that nsp and r x expression led to ultrastructural alterations in myh and myh and a rearrangement of the f-actin cytoskeleton, paired with reduced cell height and the collapse of the actomyosin cytoskeleton. these results suggest that pkp plays an important role in shape determination and actin homeostasis and that these functions can be modified both by interaction with the sars-cov- protein nsp and by the acm-related mutant r x. altogether, our results suggest that nsp , through its interaction with pkp in the organization of the actomyosin cytoskeleton, may induce a transient acm-like phenotype in covid- patients with cardiac infection.

we recently demonstrated that a c-terminal deletion pkp mutant (r x) operates as a gain-of-function protein in arrhythmogenic cardiomyopathy (acm) (cruz et al, ). to gain insight into the molecular mechanisms that induce the disease, we studied the link between mutant pkp and the cellular cytoskeleton. it is known that pkp loss impairs cortical actin remodeling and normal desmosome assembly (godsel et al, ), which compromises cell mechanical properties such as cellular stiffness (puzzi et al, ). 
we measured cell stiffness in intact hl- cardiac cells by acquiring force-volume maps by atomic force microscopy (afm). these force maps were used to represent spatial variation in young's modulus as a measure of local cell elasticity (stiffness) (dufrene et al, ) in cell lines stably expressing pkp or mutant r x ( figure a ). the nanomechanical maps show that young's modulus was higher in regions with a high density of branched actin cytoskeleton networks and stress fibers. these observations were quantified by statistical analysis of young's modulus maps using a bottom-effect correction expression (garcia & garcia, ). the graphs show that the median value of the young's modulus measured in hl- pkp cells was higher than that measured in hl- r x cells ( figure b ). using afm in hl- cells, puzzi et al showed that pkp knock-down was also associated with decreased cellular stiffness, as indicated by a decreased young's modulus compared with control groups (puzzi et al., ). these results support the observation that r x cells lack most stress and f-actin fibers ( figure c ). images from the top central area of cells revealed a scarcity of f-actin filaments in the area above the nucleus of cells expressing the r x mutant ( . %± . % in pkp cells vs. . %± . % in r x cells; p< . , ****; n= ) ( figure c ). together, these data show that the higher absolute levels of f-actin and stress-fiber formation in pkp hl- cells are associated with greater cell stiffness, whereas the lower cell stiffness in mutant r x cells correlates with reduced f-actin and with stress fiber disassembly.

pkp interacts with the actomyosin proteins myh and myh .

to find candidates that could regulate the actin cytoskeleton and cellular stiffness, we explored the pkp interactome. we identified binding partners of cardiac-specific pkp a (gandjbakhch et al., ) by pull-down of halo-tagged pkp proteins followed by mass spectrometry (ms) analysis. 
pkp -halotag pulldown products included previously identified interacting partners such as the desmosomal components desmocollin, plakoglobin, desmoglein, and desmoplakin, together with novel interacting partners such as myosin (myh ) and myosin (myh ), also known as non-muscle myosins nmiia and nmiib ( figure a ). interaction with myh and myh was validated by immunoprecipitation from cells expressing egfp-pkp or egfp-r x ( figure b ). these results support a direct link between desmosomal pkp and actomyosin cytoskeleton components.

localization. confocal microscopy revealed that most of the r x mutant signal was located in the cytosol, displaced from the cell edge ( figure a ). image analysis in hela and hl- cells confirmed that the r x mutant was present at lower levels in the plasma membrane relative to the cytoplasm ( figure b ). after transfection of cells with pcag-pkp , pcag-r x, or both plasmids, immunoblot analysis of cell fractions demonstrated that wild-type pkp associates with the plasma membrane, whereas the r x mutant is predominantly detected in the cytoplasm and only occasionally in the membrane fraction ( figure c ). these results demonstrate that the c-terminal region of pkp is important for proper protein localization at the border contact region. they also show that pkp subcellular localization at the desmosome is not altered when it is co-expressed with the mutant r x pkp protein ( figure c ). these data suggest that the gain-of-function mutant pkp in the cytosol has a new activity, since this function is not mediated by deregulation of the levels or localization of wild-type pkp .

because myh and myh interact with mutant r x in the cytoplasm, we hypothesized that abnormal pkp localization might disrupt actomyosin cytoskeleton organization. first, we analyzed the precise subcellular distribution of the endogenous actomyosin cytoskeleton (myh , myh and actin) at the cell periphery in cardiac hl- cells expressing pkp or r x as egfp n-terminal fusion proteins ( figure a ). 
confocal imaging showed that the pkp signal coincides with f-actin and partially overlaps with myh and myh . in fact, a representative fluorescence intensity profile (along a cross-section line from the plasma membrane to the cytoplasm; figure a ) of egfp-tagged pkp (green line), together with myh (red), myh (blue) and f-actin (grey line), shows a structured arrangement with a more prominent intracellular localization of myh , resulting in the absence of fluorescent colocalization with egfp-pkp at the edge of the plasma membrane of the cell. likewise, fluorescence of f-actin was found to largely colocalize with that of egfp-pkp . line scan analysis shows that the vast majority of myh and myh filaments overlapped, except at the very leading edge, where myh is localized ahead of myh . by contrast, when the r x mutant is present, fluorescence of f-actin, myh and myh exhibited a full overlap, indicating inefficient distribution of these actomyosin cytoskeleton components at the cell edge ( figure b ). to analyze the distribution of the actomyosin elements in the cell population, we quantitatively evaluated the maximal signal intensity in multiple cells to confirm the effect of r x ( figure c ). it is important to note that f-actin, myh and myh organization in the r x mutant differed from what is observed in pkp shrna cells. when pkp is absent, only cortical actin is displaced from the periphery, in agreement with previous reports (godsel et al., ), and myh and myh organization remains unaltered compared with control cells ( figure c ). these results identify myh and myh as specific pkp interactors that provide a novel functional link between the localization of a desmosomal component and the organization of the actomyosin cytoskeleton.

a recent work from krogan's laboratory (gordon et al., ) has also shown that the sars-cov nonstructural protein (nsp ), expressed in human kidney hek cells as a xstrep affinity-tagged fusion protein, pulls down pkp present in the lysates. 
due to the preferential localization of pkp at the desmosome and the increased interaction of nsp -pkp in the soluble fraction (gordon et al., ) we speculated that nsp hijacks pkp from the desmosome into the cytoplasm. to assess this hypothesis and to address whether nsp delocalizes pkp , we transfected human cells with egfp-tagged pkp and xstrep-tagged nsp . representative confocal images of hela cell populations showed that under steady-state conditions egfp-pkp signal was reduced and mobilized from the desmosome into the cytoplasm ( figure a ). on the contrary, in the absence of viral nsp , wild-type transfected pkp localizes at the desmosomes around the cell perimeter. this effect of viral nsp on pkp protein was observed in human and mouse cells ( figure b ). fluorescence intensity profiles spotted a lack of pkp fluorescent signal at the cellular cortex in cell cultures transfected with nsp ( figure c ). consistently with the imaging analysis, we also confirmed that the expression of the viral protein nsp decreases pkp absolute protein level and dramatically reduces the amount of protein in the insoluble fraction. when cell lysates were divided in np- soluble and insoluble fractions we detected a significant decrease in the amount of pkp in the insoluble fraction ( figure d ). these data suggest that pkp is not stably associated with the desmosomal complex in cells expressing nsp and that pkp is unstable when displaced from the cellular cortex. transverse maximal projections of z-stack images in cells expressing pkp confirmed the organization of actin filament bundles into structures along the cell. in contrast, cells expressing the sars-cov- nsp protein or r x lacked this structure and the nucleus was close to the external plasma membrane with almost collapsed cytosolic space. 
we noted that in both nsp -expressing cells and r x mutant cells, the actin distribution around the cell differed from that observed in pkp controls ( figure c ), where actin fibers run over the nuclei. this difference in actin distribution was reflected, independently of the cell type, in the abnormal height of most r x and pkp -nsp cells, which were shorter than control cells (pkp , nsp or mock transfected) ( figure d ). we hypothesized that if the effect of pkp delocalization on the actin cytoskeleton is specific and based on an abnormal gain of function outside the desmosome, then pkp shrna knockdown (shpkp ), which does not induce pkp localization in the cytoplasm, should yield the same phenotype as wild-type cells. this is indeed the case, as % pkp knockdown does not alter cellular height in hl- cells ( figure e ). altogether, these results suggest that cytoplasmic localization of pkp , caused either by a mutant involved in acm development or by nsp hijacking during sars-cov- infection, can alter the integrity and assembly of the actin cytoskeleton by modulating the cortical distribution of myh and myh .

myocardial complications have been documented in a significant number of covid- patients (shi et al, ), with a strong involvement of the right ventricle (rv) in individuals with poor prognosis. our data suggest that the interaction between sars-cov- protein nsp and desmosomal pkp in the soluble fraction of the cardiomyocytes could contribute to sars-cov- - the most common subtype of acm is arrhythmogenic right ventricular cardiomyopathy (arvc), which is overrepresented in patients with mutations resulting in premature termination of the pkp protein (lazzarini et al, ). therefore, it is not surprising that pathogenic mutations in pkp have usually been associated with the classical form of the disease, which predominantly includes anomalous electrocardiograms and arrhythmias, with structural abnormalities that lead to progressive global rv dysfunction (van tintelen et al., ). 
acm presents very diverse penetrance, and the same mutation may cause severe heart failure in one patient and no symptoms in another (corrado et al, b). these phenotypic differences can be explained by environmental stressors such as exercise (cruz et al., ; james et al, ) that define the final disease outcome. under extreme exercise conditions the rv undergoes a greater load increase than the left ventricle, where a near-linear increase in pulmonary artery pressures predominantly contributes to a disproportionate increase in rv wall stress (la gerche & claessen, ). given the profound effect of exercise on rv structure and function, it is reasonable that acm develops in pkp mutant carriers with compromised structural integrity of the myocardial cytoskeleton. similarly, it has long been recognized that rv dysfunction is frequently associated with moderate to severe acute respiratory distress syndrome (ards), which is one of the major determinants of covid- mortality. since most ards patients require mechanical ventilation, the uncoupling between the rv and the pulmonary circulation under ventilation can also contribute to the fatal increase in rv stress. our data demonstrate that the sars-cov- nsp protein alters pkp function by collapsing the actomyosin cytoskeleton, like the acm-related mutant r x. given that acm development in carriers of the r x mutant is triggered by exercise (cruz et al., ), it is plausible that cardiac expression of nsp aggravates rv dysfunction in patients with ards and/or mechanical ventilation. thus, our data suggest that cardiac tissue infection by sars-cov- is overrepresented in covid- patients with the worst prognosis and higher death incidence. in fact, it has been documented that, over consecutive covid- autopsies, sars-cov- was found in the heart tissue in over % of the cases (lindner et al, ). 
proteins do not exist in isolation, and the function of any given protein cannot be understood without considering its interactions with other proteins and its place in cellular interaction networks (vidal et al, ). mutations can affect interactions in many ways: they can completely wipe out all protein interactions, disrupt some interactions while retaining or strengthening others, or generate new interfaces. loss of all interactions occurs most commonly through mutations that disrupt protein folding, leading to protein degradation (sahni et al, ). however, this is not the case with the pkp r x mutant, which can be detected (figure ), although it does not accumulate as pkp does. it is possible that pkp , once displaced from the desmosome into the soluble fraction, is strongly regulated. in support of this hypothesis, we observed similar protein levels in the soluble fraction for mutant r x and for pkp dragged from insoluble structures by viral nsp (figure ). furthermore, both pkp and r x proteins have a deleterious effect on actomyosin cytoskeleton organization, which reinforces the relevance of regulating pkp levels outside the desmosome. our study reveals a desmosome-independent role of pkp in the regulation of the relative distribution of myh , myh and f-actin at the nanoscale level, exerted through protein-protein interaction. the pkp gain-of-function mutant, mainly localized in the cytoplasm, alters the actin remodeling required for accurate relative distribution and the maintenance of f-actin-derived biomechanical properties and cell architecture (figures and ). this idea is reinforced by the fact that the nsp protein delocalizes pkp from the desmosome into the soluble fraction and deregulates the actomyosin network by downregulating and delocalizing myh and myh . dominant inherited cardiomyopathy could be caused by loss-of-function (haploinsufficiency) or gain-of-function mechanisms. 
we identified a possible mechanism through which the r x pkp variant could function as a gain-of-function mutation and not a simple loss-of-function allele. we hypothesize that the pathogenic r x variant functions by interrupting or rewiring highly connected interaction networks to disturb f-actin homeostasis. we predict that gain of function will also be the mode of action of other c-terminally truncated pkp variants that are stable enough to retain myh and myh interaction and to disrupt cell-to-cell subcellular localization. the clinvar database (https://www.ncbi.nlm.nih.gov/clinvar/), part of the ncbi entrez system, attempts to establish relationships between gene variants and phenotype.

the hek t (atcc, crl- ) and hela (atcc, ccl- ) cell lines were maintained in dmem (gibco) supplemented with % fbs, % penicillin/streptomycin, and mm l-glutamine. the atrial cardiomyocyte cell line hl- (sigma-aldrich) was maintained in claycomb medium (sigma-aldrich) supplemented with % fbs, % penicillin/streptomycin, and mm l-glutamine, as previously described (claycomb et al, ). hl- cells were seeded on plates coated with . % gelatin/fibronectin (sigma-aldrich). cell lines were maintained at ºc with % co . transient transfection with cdnas encoding egfp, egfp-tagged pkp and r x, or nsp proteins was performed in -mm tissue culture dishes (mattek) using the jetprime® reagent according to the manufacturer's protocol (polyplus transfection®). stable hl- cell lines were generated using the piggybac transposon system. cells were transfected with plasmids expressing pkp , shpkp , egfp-pkp , r x, egfp-r x or r x-egfp together with the ppb-transposase (cadinanos & bradley, ). cells were then selected for geneticin resistance (g , thermofisher scientific) and expanded for further experiments. 
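the stiffness results reported earlier rest on converting afm force-distance curves into young's modulus values. a minimal sketch of that conversion for a conical tip is given here, using the standard sneddon relation f = (2/π) · e/(1 − ν²) · tan(θ) · δ². it is an illustration only and not the authors' in-house software: the function names are hypothetical, a poisson ratio of 0.5 is an assumption commonly made for cells, the contact point is assumed to be already known, and the bottom-effect correction for finite cell thickness (garcia & garcia) is deliberately left out. the half-cone angle and modulus used in the example are made-up numbers.

```python
import math

import numpy as np


def deflection_to_force(deflection_m, spring_constant_n_per_m):
    """Hooke's law: cantilever deflection (m) to force (N)."""
    return spring_constant_n_per_m * np.asarray(deflection_m, dtype=float)


def sneddon_cone_force(indentation_m, youngs_modulus_pa, half_angle_deg,
                       poisson=0.5):
    """Force predicted by the Sneddon model for a conical indenter."""
    tan_theta = math.tan(math.radians(half_angle_deg))
    delta = np.asarray(indentation_m, dtype=float)
    return ((2.0 / math.pi) * youngs_modulus_pa / (1.0 - poisson ** 2)
            * tan_theta * delta ** 2)


def fit_youngs_modulus(indentation_m, force_n, half_angle_deg, poisson=0.5):
    """Least-squares fit of E from one approach curve (no bottom-effect term).

    The model is linear in delta**2, so the slope has a closed form.
    """
    x = np.asarray(indentation_m, dtype=float) ** 2
    f = np.asarray(force_n, dtype=float)
    slope = np.dot(x, f) / np.dot(x, x)
    tan_theta = math.tan(math.radians(half_angle_deg))
    return slope * math.pi * (1.0 - poisson ** 2) / (2.0 * tan_theta)


# Synthetic curve with made-up parameters (E = 1500 Pa, 20 degree half-angle).
indentation = np.linspace(0.0, 1e-6, 50)
force = sneddon_cone_force(indentation, 1500.0, 20.0)
fitted_e = fit_youngs_modulus(indentation, force, 20.0)

# A force-volume map yields one fitted modulus per pixel; the median over
# the map is the kind of summary statistic compared between cell lines.
median_e = float(np.median([fitted_e, 0.8 * fitted_e, 1.2 * fitted_e]))
```

on real data the fit would be applied to each pixel's approach curve after converting deflection to force with the calibrated spring constant. on thin regions of a cell the uncorrected model overestimates the modulus, which is the reason a bottom-effect correction is applied in the study.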
the afm experiments were performed with a commercial instrument (jpk nanowizard , jpk instruments ag, berlin, germany) mounted on an axio vert the half-cone angle was  º, and the nominal radius at the tip apex was nm. the upper cantilever surface was gold-coated to improve the signal-to-noise ratio in the deflection signal. to control the force applied on the cell, the deflection sensitivity was calibrated on a petri dish. the spring constant ( . - . n/m) was calculated using the thermal noise method (lozano et al, ). force-volume maps (dufrene et al., ) were generated for whole cells by acquiring force-distance curves ( x pixels ) over a µm² area. the tip-sample distance was modulated by applying a triangular waveform. each individual force-distance curve was acquired at a velocity of µm/s ( hz) and a range of  µm. to prevent sample damage, the maximum force applied to cells was nn. maps were analyzed with in-house software written in python. we analyzed r x-egfp variants. the program includes bottom-effect corrections for a conical tip to correct for finite cell thickness (garcia & garcia, ).

pkp isoforms were expressed in hek t cells as n-terminal halo tag fusion proteins. cell pellets were lysed using mammalian lysis buffer (g , promega). the bait-prey complexes, containing the pkp -halo-tagged fusion protein (bait) and the potential binding partners (prey), were pulled down using halolink resin (promega, madison, wi) and extensively washed in buffer containing mm tris (ph . ), mm nacl, mg/ml bsa, and . % igepal® ca- (octylphenoxypolyethoxyethanol, i , sigma-aldrich, oakville, on). purified bait-prey protein complexes were digested overnight with tev protease at °c to release halo-linked pkp protein, and the tag-free protein complexes were isolated with a his-trap-spin column. 
the eluted protein complexes were in-gel digested with trypsin as described previously (bonzon-kulichenko et al, ), and the resulting peptides were and fragment mass tolerances of ppm and . da, respectively; cys carbamidomethylation as a static modification; and met oxidation as a dynamic modification. the results were analyzed using the probability ratio method (martinez-bartolome et al, ), and a false discovery rate (fdr) for peptide identification was calculated based on search results against a decoy database using the refined method (navarro & vazquez, ).

nsp -expressing lentiviral particles were produced by cotransfection of pspax. (gag-pol), pmd -g (env) and plvx-ef a-nsp - xstrep-ires-puro plasmids into hek t cells with calcium phosphate. viruses were harvested h post-transfection, filtered through a . µm pes filter and concentrated by ultracentrifugation prior to titration by qpcr.

hela and hl- cell lines were fixed with % formaldehyde for minutes, then permeabilized for minutes at room temperature with . % triton x- , and finally stained with phalloidin-ifluor reagent (ab , abcam). cells were washed with phosphate buffered saline before addition of dapi (thermofisher scientific) and were examined under a leica confocal laser scanning microscope. z-stack images were captured and prepared as maximal to calculate ratio membrane-cytoplasm fluorescence z-stacks images from hl-with a leica sp confocal microscope with hc pl apo x/ . oil objective. regions of interest (rois) were drawn over maximal projection images to define the plasma membrane and the cytoplasm, excluding the nucleus and/or vacuoles. the plasma membrane-to-cytoplasm intensity ratio was calculated to normalize the intensity of the plasma membrane to the level of expression in each single cell.

after hours, transfected cells were washed once with ice-cold pbs and then dislodged by scraping using buffer np ( mm tris-hcl ph , ; mm nacl; % nonidet p substitute). 
cells were lysed for min at ºc on a rotator and the lysates were cleared by centrifugation ( rpm for min at ºc). pellet was resuspended in buffer np supplemented with % triton x- and x protein loading buffer ( mm tris-hcl ph , , % sds, mm mercaptoethanol, % glycerol, , % bromophenol blue). for plasma membrane protein extraction cells were washed once with ice-cold pbs and dislodged by scraping. plasma membrane proteins were extracted using the plasma membrane protein extraction kit from abcam. proteins were separated from membrane and cytoplasmic extracts on % sds-page gels and then western blotted with antibodies against pkp (eb , everest biotech), n-cadherin (sc- , santa cruz biotechnology), and gapdh (sc- , santa cruz biotechnology). secondary antibodies were anti-goat (abin , antibodies online) and anti-mouse (abin , antibodies online) as appropriate. immunoblots were developed with the odyssey imaging system. hek t cells were transfected with pegfp-n , pegfp-pkp , or pegfp-r x. after hours, cells were washed with ice-cold pbs and dislodged by scraping. proteins were extracted in np- buffer ( mm nacl, mm tris-hcl ph . , % np- , and protease and phosphatase inhibitors). samples were incubated with shaking for hours at ºc and then centrifuged for minutes at , g at ºc. all protein samples were quantified by the lowry method (biorad), and mg and g of each sample were used for coimmunoprecipitation (co-ip) and input, respectively. co-ip was performed with dynabeads® a ( d, thermofisher scientific). for each sample, l dynabeads were washed four times with np- buffer. after this, rat igg anti-egfp (kindly provided by the monoclonal antibody facility at the cnio, spain) were added to the dynabeads at : dilution and incubated with shaking for minutes at ºc. the antibody was then removed, and mg of protein per sample was added to the dynabeads, followed by incubation overnight with shaking at ºc. 
unbound proteins were removed by washing the dynabeads four times in np- buffer. bound proteins were eluted with l loading buffer ( % sds, mm β-mercapto-ethanol, % glycerol, mm tris-hcl ph . , . % bromophenolblue), heated for minutes at ºc, separated on % sds-page gels, and western blotted with antibodies against myh (gtx , genetex), myh ( s, cell signaling technology), and egfp ( , clontech). secondary antibodies were anti-mouse (abin , antibodies online) and anti-rabbit (abin , antibodies online), as appropriate. immunoblots were developed with the odyssey imaging system. no data were excluded from the analysis. all data were analyzed by one-way anova with the tukey multiple comparison post-test, two-way anova, unpaired student t test, or mann-whitney test. error bars represent sem. statistical significance of differences was assigned as follows: * p< . , ** p< . , *** p< . , **** p< . , and ns p> . . ochoa center of excellence (sev- - ) . this study was supported by mciu grant bfu - -r. 
the study was also partially supported by the "ayudas a la investigación cátedra real madrid-universidad europea".
figure residue: cell height (μm) for egfp-pkp, egfp-r x, and egfp.
mechanistic basis of desmosome-targeted diseases
stop-gain mutations in pkp are associated with a later age of onset of arrhythmogenic right ventricular cardiomyopathy
right ventricular dilation in hospitalized patients with covid- infection
impact of genotype on clinical course in arrhythmogenic right ventricular dysplasia/cardiomyopathy-associated mutation carriers
a robust method for quantitative high-throughput analysis of proteomes by o labeling
generation of an inducible and optimized piggybac transposon system
hl- cells: a cardiac muscle cell line that contracts and retains phenotypic characteristics of the adult cardiomyocyte
athletic training and arrhythmogenic right ventricular cardiomyopathy
arrhythmogenic cardiomyopathy
arrhythmogenic right ventricular cardiomyopathy
exercise triggers arvc phenotype in mice expressing a disease-causing mutated version of human plakophilin-
comprehensive desmosome mutation analysis in north americans with arrhythmogenic right ventricular dysplasia/cardiomyopathy
imaging modes of atomic force microscopy for application in molecular and cell biology
the variety of cardiovascular presentations of covid-
plakophilin a is the dominant isoform in human heart tissue: consequences for the genetic screening of arrhythmogenic right ventricular cardiomyopathy
determination of the elastic moduli of a single cell cultured on a rigid support by force microscopy
mutations in the desmosomal protein plakophilin- are common in arrhythmogenic right ventricular cardiomyopathy
plakophilin couples actomyosin remodeling to desmosomal plaque assembly via rhoa
sudden death related cardiomyopathies - arrhythmogenic right ventricular cardiomyopathy, arrhythmogenic cardiomyopathy, and exercise-induced cardiomyopathy
a sars-cov- protein interaction map reveals targets for drug repurposing
supporting the heart: functions of the cardiomyocyte's non-sarcomeric cytoskeleton
clinical presentation, long-term follow-up, and outcomes of arrhythmogenic right ventricular dysplasia/cardiomyopathy patients and family members
exercise increases age-related penetrance and arrhythmic risk in arrhythmogenic right ventricular dysplasia/cardiomyopathy-associated desmosomal mutation carriers
molecular control of non-muscle myosin ii assembly
architecture shapes contractility in actomyosin networks
increased flow, dam walls, and upstream pressure: the physiological challenges and atrial consequences of intense exercise
the arvd/c genetic variants database: update
calibration of higher eigenmode spring constants of atomic force microscope cantilevers
recommendations for physical activity and recreational sports participation for young patients with genetic cardiovascular diseases
properties of average score distributions of sequest: the probability ratio method
identification of a deletion in plakoglobin in arrhythmogenic right ventricular cardiomyopathy with palmoplantar keratoderma and woolly hair (naxos disease)
severe acute respiratory syndrome coronavirus nsp suppresses host gene expression, including that of type i interferon, in infected cells
a refined method to calculate false discovery rates for peptide identification using decoy databases
desmosomes in the heart: a review of clinical and mechanistic analyses
myh : structure, functions and role of non-muscle myosin iia in human disease
knock down of plakophilin dysregulates adhesion pathway through upregulation of mir b and alters the mechanical properties in cardiac cells
widespread macromolecular interaction perturbations in human genetic disorders
association of cardiac injury with mortality in hospitalized patients with covid- in wuhan
key: cord- -n ylgqfu authors: giri, rajanish; bhardwaj, taniya; shegane, meenakshi; gehi, bhuvaneshwari r.; kumar,
prateek; gadhave, kundlik; oldfield, christopher j.; uversky, vladimir n. title: when darkness becomes a ray of light in the dark times: understanding the covid- via the comparative analysis of the dark proteomes of sars-cov- , human sars and bat sars-like coronaviruses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: n ylgqfu the recently emerged coronavirus designated sars-cov- (also known as novel coronavirus ( -ncov) or wuhan coronavirus) is the causative agent of coronavirus disease (covid- ), which is now spreading rapidly throughout the world. more than , , cases of sars-cov- infection and more than , covid- -associated deaths had been reported worldwide at the time of writing, and these numbers are increasing by the hour. the world health organization (who) has declared the spread of sars-cov- a global public health emergency and has characterized covid- as a pandemic. the multiple sequence alignment data agreed with published reports on sars-cov- evolution and indicated that this virus is closely related to the bat severe acute respiratory syndrome-like coronavirus (bat sars-like cov) and the well-studied human sars coronavirus (sars cov). disordered regions in viral proteins are associated with viral infectivity and pathogenicity. therefore, in this study, we used a set of complementary computational approaches to examine the dark proteomes of sars-cov- , bat sars-like, and human sars covs by analysing the prevalence of intrinsic disorder in their proteins. according to our findings, the sars-cov- proteome contains very significant levels of structural order. in fact, except for nucleocapsid, nsp , and orf , the vast majority of sars-cov- proteins are mostly ordered and contain few intrinsically disordered protein regions (idprs). however, the idprs found in sars-cov- proteins are functionally important.
for example, cleavage sites in its replicase ab polyprotein are found to be highly disordered, and almost all sars-cov- proteins were shown to contain molecular recognition features (morfs), which are intrinsic disorder-based protein-protein interaction sites that are commonly utilized by proteins for interaction with specific partners. the results of our extensive investigation of the dark side of the sars-cov- proteome will have important implications for the structural and non-structural biology of sars or sars-like coronaviruses. significance: the infection caused by a novel coronavirus (sars-cov- ), which causes severe respiratory disease with pneumonia-like symptoms in humans, is responsible for the current covid- pandemic. no in-depth information on the structures and functions of sars-cov- proteins is currently available in the public domain, and no effective anti-viral drugs and/or vaccines have been designed for the treatment of this infection. our study provides the first comparative analysis of the order- and disorder-based features of the sars-cov- proteome relative to human sars and bat cov, which may be useful for structure-based drug discovery. intrinsically disordered proteins (idps) and intrinsically disordered protein regions (idprs), in order to better understand the interplay between the ordered and disordered components of the proteome. in the classical structure-function paradigm, it is believed that a unique, stable, and well-defined -dimensional structure is a prerequisite for a protein to accomplish its unique biological function. although this notion dominated scientific minds for over a hundred years, the idea of functional intrinsic disorder in proteins eventually came to the attention of structural biologists.
according to this "heretic" viewpoint, a noticeable number of biologically active proteins (or protein regions) fail to fold into well-defined structures and instead remain disordered, existing as highly dynamic ensembles of rapidly interconverting conformations under physiological conditions. these proteins and protein regions are now known as intrinsically disordered proteins (idps) and intrinsically disordered protein regions (idprs), respectively. the propensity of a protein to be functional while intrinsically disordered (like the propensity of ordered proteins to form unique biologically active structures) is determined by its amino acid sequence [ ] [ ] [ ] . idps exert their biological functions in numerous processes commonly associated with cellular signalling, gene regulation, and control by interacting with their physiological partners [ ] [ ] [ ] [ ] [ ] . these functions of idps and idprs are regulated by their protein-protein, protein-rna, and protein-dna interactions [ , ] . molecular recognition features (morfs) are regions in idps that are implicated in the regulation of idp function through protein-protein interactions and serve as the primary stage in molecular recognition. zhang and colleagues have reported the genomic sequence of sars-cov- with genbank accession number nc_ having , nucleotides. the virus, which was isolated from the bronchoalveolar lavage fluid of a patient, went through a cycle of renaming, from novel wuhan seafood market pneumonia virus to deadly wuhan coronavirus, to the novel coronavirus ( -ncov) or the wuhan- novel coronavirus (wuhan- -ncov), and was eventually named sars-cov- by the who [ ] . it is known that idps/idprs are present in all three kingdoms of life, and viral proteins often contain unstructured regions that have been strongly correlated with their virulence [ ] [ ] [ ] [ ] .
in this report, we investigated the disordered side of the sars-cov- proteome using a complementary set of computational approaches to check the prevalence of idprs in various sars-cov- proteins and to shed some light on their disorder-related functions. we also comprehensively analyzed idprs in the closely related viruses human sars cov and bat sars-like cov. furthermore, we identified protein functions related to protein-protein interactions, rna binding, and dna binding in all three viruses. since these three viruses are closely related, our study provides an important means for a better understanding of the sequence and structural peculiarities of their evolution. we believe that this study will help structural and non-structural biologists to design and perform experiments for a more in-depth understanding of this virus and its pathogenicity. it will also have long-term implications for developing new drugs or vaccines against this currently unpreventable infection. sequence retrieval and multiple sequence alignment. the protein sequences of bat cov (sars-like) and human sars cov were retrieved from uniprot (uniprot ids for individual proteins are listed in table ). the translated sequences of sars-cov- proteins were obtained from the genbank database [ ] (accession id: nc_ . ). we used these sequences for performing multiple sequence alignment (msa) and predicting the idprs. we used clustal omega [ ] for protein sequence alignment and espript . [ ] for constructing the aligned images. for the prediction of the intrinsic disorder predisposition of cov proteomes, we used multiple predictors, including members of the pondr ® (predictor of natural disordered regions) family, namely pondr ® vsl [ ] , pondr ® vl [ ] , pondr ® fit [ ] , and pondr ® vlxt [ ] , as well as the iupred platform for predicting long (≥ residues) and short (< residues) idprs [ ] .
these computational tools predict residues and regions that lack the tendency to form an ordered structure. residues with disorder scores exceeding the threshold value of . are considered intrinsically disordered, whereas residues with predicted disorder scores between . and . are considered flexible. the predicted percent of intrinsic disorder (ppid) was calculated for every protein of all three viruses from the outputs of six predictors. the detailed methodology has been given in our previous reports [ , ] . and disopred [ ] . protein residues with an anchor, morfpred, or disopred score above the threshold value of . , or a morfchibi_web score above the threshold value of . , are considered morf regions. idprs facilitate interactions with rnas and dnas and regulate many cellular functions [ ] . thus, for predicting the dna-binding residues in cov proteins, we used two online servers: drnapred [ ] and disordpbind [ ] . for rna-binding residues, we used pprint (prediction of protein rna-interaction) [ ] and disordpbind [ ] . the mean values of the predicted percentage of intrinsic disorder (mean ppids), which were obtained by averaging the predicted disorder scores from six disorder predictors (supplementary table - ) for each protein of sars-cov- , human sars, and bat cov, are presented in table . * these sequences are based on genome annotations conducted by wu et al. [ ] . proteins and their ppids are coloured to reflect their disorder status (ordered - blue, moderately disordered - pink, highly disordered - red). figures a, b , and c are d-disorder plots generated for sars-cov- , human sars and bat cov proteins, respectively, and represent the ppid pondr-fit vs. ppid mean plots. based on their predicted levels of intrinsic disorder, proteins can be classified as highly ordered (ppid < %), moderately disordered ( % ≤ ppid < %) and highly disordered (ppid ≥ %) [ ] .
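the ppid calculation and the three-way classification described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, each predictor is assumed to return one score per residue, and because the numeric threshold and class cutoffs were lost from the text, they are left as parameters. the values passed in the usage lines are placeholders only.

```python
from statistics import mean


def ppid(scores, threshold):
    """Percent of residues whose disorder score exceeds `threshold`."""
    if not scores:
        raise ValueError("scores must not be empty")
    n_disordered = sum(1 for s in scores if s > threshold)
    return 100.0 * n_disordered / len(scores)


def mean_ppid(scores_by_predictor, threshold):
    """Average the per-predictor PPID values obtained for one protein."""
    return mean(ppid(s, threshold) for s in scores_by_predictor.values())


def disorder_class(ppid_value, low_cutoff, high_cutoff):
    """Three-way classification: below low, between low and high, at or above high."""
    if ppid_value < low_cutoff:
        return "highly ordered"
    if ppid_value < high_cutoff:
        return "moderately disordered"
    return "highly disordered"


if __name__ == "__main__":
    # Toy per-residue scores for a 4-residue protein; not real predictor output.
    toy = {
        "predictor_a": [0.9, 0.2, 0.6, 0.1],
        "predictor_b": [0.7, 0.1, 0.3, 0.1],
    }
    # Placeholder threshold and cutoffs, chosen only to make the example run.
    value = mean_ppid(toy, threshold=0.5)
    print(value, disorder_class(value, low_cutoff=10.0, high_cutoff=30.0))
```

keeping the threshold and cutoffs as arguments makes it explicit that the class of a protein depends on these choices as well as on the predictor scores.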
from the data in table , figures a, b , and c, as well as the ppid-based classification, we conclude that the nucleocapsid protein from all three strains of coronavirus possesses the highest percentage of disorder and is classified as a highly disordered protein. orf b protein in bat cov, orf protein in sars-cov- , human sars, and bat cov, and orf b protein in human sars and sars cov belong to the class of moderately disordered proteins. the structural proteins, namely spike glycoprotein (s), envelope protein (e), and membrane protein (m), as well as the accessory proteins orf a, orf a, orf (orf a and orf b in the case of human sars), are ordered proteins in all three strains of coronavirus. orf and orf proteins also belong to the class of ordered proteins. in the ch-cdf plots of the proteins of (d) sars-cov- , (e) human sars, and (f) bat cov, the y coordinate of each protein spot signifies the distance of the corresponding protein from the boundary in the ch plot, and the x coordinate corresponds to the average distance of the cdf curve for the respective protein from the cdf boundary. in order to further investigate the nature of the disorder in proteins of sars-cov- , human sars, and bat cov, we utilized the combined ch-cdf tool, which uses the outputs of two binary classifiers of disorder, the charge hydropathy (ch) plot and the cumulative distribution function (cdf) plot. this provided a more detailed characterization of the global disorder predisposition of the query proteins and their classification according to disorder "flavours". the ch plot is a linear classifier that differentiates between proteins that are predisposed to possess extended disordered conformations, which include random coils and pre-molten globules, and proteins that have compact conformations (ordered proteins and molten globule-like proteins).
the other binary predictor, cdf is a nonlinear classifier that uses the pondr ® vlxt scores to discriminate ordered globular proteins from all disordered conformations, which include native molten globules, pre-molten globules, and random coils. the ch-cdf plot can be divided into four quadrants: q (bottom right quadrant) is an area of ch-cdf phase space that is expected to include ordered proteins; q (bottom left quadrant) includes proteins predicted to be disordered by cdf and compact by ch (i.e., native molten globules and hybrid proteins containing high levels of both ordered and disordered regions); q (top left quadrant) contains proteins that are predicted to be disordered by both ch and cdf analysis (i.e., highly disordered proteins with the extended disorder); and q (top right quadrant) possesses proteins disordered according to ch but ordered according to cdf analysis [ ] . figures d, e and f represent the ch-cdf analysis of proteins of sars-cov- , human sars, and bat cov and show that all the proteins are located within the two quadrants q and q . the ch-cdf analysis leads to the conclusion that all proteins of sars-cov- , human sars, and bat cov are ordered except nucleocapsid protein, which is predicted to be disordered by cdf but ordered by ch and hence lies in q . molecular recognition features (morfs) are short interaction-prone disordered regions found within idps/idprs that commence a disorder-to-order transition upon binding to their partners [ , ] . these regions are important for protein-protein interactions and may initiate an early step in molecular recognition [ ] . in this study, we have analyzed and compared morfs (protein-binding regions) in sars-cov- with human sars and bat cov. the results of this analysis are summarized in table , which clearly shows that most of the sars-cov- proteins contain at least one morf, indicating that disorder does play an important role in the functionality of these viral proteins. 
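the quadrant logic of the ch-cdf plot can be made explicit with a short python sketch. it is an illustration only: the function name is hypothetical, both boundaries are assumed to sit at zero, and the sign conventions (positive x meaning ordered by cdf, positive y meaning disordered by ch) are inferred from the quadrant descriptions above rather than stated by the authors. because the quadrant numbers were lost from the text, quadrants are labelled by position.

```python
def ch_cdf_quadrant(cdf_distance, ch_distance):
    """Map a protein's (x, y) position in the CH-CDF plot to a quadrant label.

    cdf_distance: x coordinate, average distance of the CDF curve from the
                  CDF boundary (assumed positive when CDF predicts order).
    ch_distance:  y coordinate, distance from the CH boundary (assumed
                  positive when CH predicts extended disorder).
    """
    ordered_by_cdf = cdf_distance >= 0
    disordered_by_ch = ch_distance >= 0

    if ordered_by_cdf and not disordered_by_ch:
        return "bottom right: ordered"
    if not ordered_by_cdf and not disordered_by_ch:
        return "bottom left: disordered by cdf, compact by ch"
    if not ordered_by_cdf and disordered_by_ch:
        return "top left: disordered by both ch and cdf"
    return "top right: disordered by ch, ordered by cdf"


if __name__ == "__main__":
    # Toy coordinates, not values read from the figures.
    for name, x, y in [("protein_1", 0.15, -0.20), ("protein_2", -0.10, -0.05)]:
        print(name, ch_cdf_quadrant(x, y))
```

under these conventions, a protein that is disordered by cdf but compact by ch, as reported for the nucleocapsid protein, falls in the bottom left quadrant.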
in addition to protein-protein interactions/protein-binding functions, idps and idprs also mediate functions by facilitating interactions with nucleotides such as dna and rna [ , ] . therefore, we used a combination of two different online servers to locate protein residues that show a propensity to bind dna as well as rna. nucleotide-binding residues in proteins of the three studied coronaviruses are listed in the supplementary tables. coronaviruses encode four structural proteins, namely, spike (s), envelope (e) glycoprotein, membrane (m), and nucleocapsid (n) proteins, which are translated from the last ~ kb nucleotides and form the outer cover of the covs, encapsulating their single-stranded genomic rna. the s protein is a large multifunctional protein forming the exterior of the cov particles [ , ] . it forms surface homotrimers and contains two distinct ectodomain regions known as s and s . in some covs, the s protein is actually cleaved into these subunits, which are joined non-covalently, whereas an additional proteolytic cleavage within the n-terminal part of the s subunit, which takes place upon virus endocytosis, generates the s ' protein. subunit s initiates viral infection by binding to the host cell receptors, s acts as a class i viral fusion protein that mediates fusion of the virion and cellular membranes and thereby promotes viral entry into the host cells, whereas s ' serves as a viral fusion peptide [ , ] . spike binds to the virion m protein through its c-terminal transmembrane region [ ] . as a class i viral fusion protein, the s protein binds to its specific surface receptor, angiotensin-converting enzyme (ace ), on the host cell plasma membrane through its n-terminal receptor-binding domain (rbd) and mediates viral entry into host cells [ ] . the s protein consists of an n-terminal signal peptide, a long extracellular domain, a single-pass transmembrane domain, and a short intracellular domain [ ] . a . Å resolution structure (pdb id: acc) of s protein from human sars complexed with its host binding partner ace has been obtained by cryo-electron microscopy (cryo-em). the biophysical analysis reported in a previous study has also revealed that the s protein from sars-cov- has a higher binding affinity to ace than the s protein from human sars [ ] . the mean disorder profile, calculated by averaging the disorder scores from all six predictors, is represented by a short-dot line (sky-blue line) in the graph. the light sky-blue shadow region signifies the mean error distribution. the residues missing in the pdb structure or the residues for which a pdb structure is unavailable are represented by the grey-coloured area in the corresponding graphs. (e) aligned disorder profiles generated for spike glycoprotein from sars-cov- (black line), human sars (red line), and bat cov (green line) based on the outputs of the pondr ® vsl . msa analysis among all three coronaviruses demonstrates that the s protein of sars-cov- has a . % sequence identity with bat cov and . % identity with human sars (supplementary figure s a) . all three s proteins are found to have a conserved c-terminal region. however, the n-terminal regions of the s proteins display noticeable differences. the significant sequence variation in the rbd, located at the n-terminal region of the s protein, might be the reason behind the variation in virulence and in receptor-mediated binding and entry into the host cell. according to our intrinsic disorder propensity analysis, the s proteins from all three covs analysed in this study are highly structured, as their predicted disorder propensity lies below % ( table ). in fact, the mean ppid scores of the s proteins of sars-cov- , human sars cov, and bat cov are calculated to be . %, . %, and . %, respectively. figures b, c , and d represent the intrinsic disorder profiles of s proteins from sars-cov- , human sars and bat cov obtained from six disorder predictors.
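pairwise sequence identity figures of the kind quoted above can be derived from two rows of a multiple sequence alignment. the python sketch below shows one common convention, which is an assumption here and not necessarily the one used by the alignment software: identical residues divided by the number of columns in which neither sequence has a gap. the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped, so the denominator
    is the number of gap-free columns (one of several possible conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # ignore columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments, not taken from any coronavirus protein.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values are only comparable when the same denominator is used throughout.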
finally, figure e shows aligned disorder profiles of s proteins from these covs and illustrates remarkable similarity in their disorder propensity, especially in the c-terminal region. it is of interest to map known functional regions of s proteins to their corresponding disorder profiles. the maturation of s protein requires a specific posttranslational modification (ptm), proteolytic cleavage, which happens in two stages. first, host cell furin or another cellular protease nicks the s precursor to generate s and s proteins, whereas the second cleavage, which takes place after viral attachment to host cell receptors, leads to the release of a fusion peptide, generating the s ' subunit. in human sars cov, the first and second cleavage sites are located at residues r and r , respectively, whereas in bat cov, the corresponding cleavage sites are residues r and r . as follows from figure , these cleavage sites are located within idprs. in human sars cov s protein, the fusion peptide (residues - ) is located within a flexible region and is characterized by a mean disorder score of . ± . . similarly, in bat cov s protein, the fusion peptide (residues - ) has a mean disorder score of . ± . . s protein contains two heptad repeat regions that form a coiled-coil structure during viral and target cell membrane fusion, assuming a trimer-of-hairpins structure needed for the functional positioning of the fusion peptide. in human sars cov s protein, the heptad repeat regions are formed by residues - and - , which have mean disorder scores of . ± . and . ± . , respectively. an analogous situation is observed for the s protein from bat cov, where these heptad repeat regions are positioned at residues - ( . ± . ) and - ( . ± . ). another functional region found in s proteins is the receptor-binding domain (residues - and - in human sars cov and bat cov, respectively), which contains a receptor-binding motif responsible for interaction with human ace .
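the regional values quoted above are a mean disorder score with a spread, taken over the residues of a functional region. a minimal python sketch of that summary follows. the function name is hypothetical, residue numbering is assumed to be 1-based and inclusive, and the sample standard deviation is an assumption, since the text does not say which measure of spread was reported.

```python
from statistics import mean, stdev


def region_disorder(scores, start, end):
    """Mean and sample standard deviation of per-residue disorder scores
    over residues start..end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(scores)):
        raise ValueError("region lies outside the protein")
    region = scores[start - 1:end]
    spread = stdev(region) if len(region) > 1 else 0.0
    return mean(region), spread


if __name__ == "__main__":
    # Toy per-residue profile, not real predictor output.
    profile = [0.1, 0.2, 0.3, 0.4, 0.5]
    m, s = region_disorder(profile, start=2, end=4)
    print(f"{m:.2f} ± {s:.2f}")
```

the same function can be applied to a cleavage site, fusion peptide, or heptad repeat once its residue range and a per-residue profile are available.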
in the s protein of human sars cov, this motif (residues - ) is not only characterized by structural flexibility, possessing a mean disorder score of . ± . , but also contains a disordered region (residues - ). since s protein is known as spike glycoprotein, it contains numerous glycosylation sites. due to the rather close similarity of the disorder profiles of the s proteins analysed here, we can assume that all the aforementioned indications of the functional importance of disordered and flexible regions in s proteins from sars cov and bat cov are also applicable to sars-cov- s protein. finally, table shows that s protein from sars-cov- contains one morf region at its c-terminus (residues - ) by morfchibi_web, two morf regions (residues - and residues - ) by morfpred, and one morf region at the n-terminus (residues - ) by disopred . these results indicate that intrinsic disorder is important for its interaction with binding partners. interestingly, the n-terminal region of s protein (residues - ) from all three viruses is predicted to be a disorder-based protein-binding region by two predictors (morfpred and disopred ). the n-terminal morf is implicated in viral interaction with the host receptor, and the c-terminal morf in m protein interaction and viral assembly. moreover, the morf regions lie mainly in the n- and c-terminal regions, suggesting a possible role during cleavage as well. in addition to protein-binding regions, s protein also shows many nucleotide-binding residues. tables , , and show that numerous rna-binding residues were predicted by pprint in all three viruses and a single rna-binding residue was predicted by disordpbind in human sars. further, drnapred and disordpbind predicted the presence of many dna-binding residues in s protein of all three viruses.
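morf regions such as those listed above are reported as residue ranges, whereas the predictors score individual residues. one simple way to go from scores to ranges is to collect contiguous runs of residues above the predictor's threshold, as in the python sketch below. this is an illustrative assumption about the post-processing, not a description of what the web servers do internally; the function name, the `min_length` filter, and the threshold value in the usage lines are hypothetical.

```python
def regions_above(scores, threshold, min_length=1):
    """Return (start, end) residue ranges, 1-based and inclusive, for
    contiguous runs of scores above `threshold` that span at least
    `min_length` residues."""
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position  # a new run begins here
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    # Toy per-residue scores and a placeholder threshold.
    toy_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.7, 0.7]
    print(regions_above(toy_scores, threshold=0.5))
    print(regions_above(toy_scores, threshold=0.5, min_length=3))
```

running the same routine on the output of several predictors gives one list of ranges per predictor, which can then be compared to see where the predicted regions overlap.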
these results point to s protein functions related to molecular recognition (protein-protein interaction, rna binding, and dna binding), such as interaction with the host cell membrane and subsequent viral infection. therefore, the identified idps/idprs and the residues/regions of s protein crucial for molecular recognition can be targeted for disorder-based drug discovery. envelope (e) protein is a small, multifunctional inner membrane protein that plays an important role in the assembly and morphogenesis of virions in the cell [ ] [ ] [ ] . e protein consists of two ectodomains associated with the n- and c-terminal regions, and a transmembrane domain. it homo-oligomerizes to form pentameric membrane-destabilizing transmembrane (tm) hairpins, creating a pore necessary for its ion channel activity [ ] . figure a shows the nmr structure (pdb id: mm ) of human sars envelope glycoprotein of - residues [ ] . msa results ( figure b ) illustrate that this protein is highly conserved, with only three amino acid substitutions in e protein of sars-cov- conferring its % sequence similarity with human sars and bat cov. bat cov shares % sequence identity with human sars. the mean ppids calculated for sars-cov- , human sars, and bat cov e proteins are . %, . %, and . %, respectively ( table ) . the e protein is found to have a reasonably well-predicted structure. our predictions suggest that the residues of the n- and c-termini display a higher tendency for disorder. the last hydrophilic residues (residues - ) have been reported to adopt a random-coil conformation with and without the addition of lipid membranes [ ] . the literature suggests that the last four amino acids of the c-terminal region of e protein, which contain a pdz-binding motif, are involved in protein-protein interactions with the tight junction protein pals .
our results support the literature, as we identified a long n-terminal region of approximately residues as a disorder-based protein-binding region in all three viruses (see table , supplementary table and ). pals is involved in maintaining the polarity of epithelial cells in mammals [ ] . the respective graphs in figures c, d , and e show the predicted intrinsic disorder profiles for e proteins of sars-cov- , human sars, and bat cov. we speculate that the disordered regions may be facilitating interactions with other proteins as well. in agreement with this hypothesis, table shows that in e protein from sars-cov- , the c-terminal domain serves as a protein-binding region. we found that residues - form a long morf in e proteins of all three viruses, as predicted by morfchibi_web ( table , supplementary table and ). as mentioned above, these randomly coiled binding residues at the c-terminus may gain structure while assisting the protein-protein interactions mediated by e protein. one more morf region (residues [ ] [ ] [ ] [ ] [ ]) in the transmembrane domain was observed by disopred in the e protein of all three viruses. since these residues are part of the ion channel, we speculate that they make specific interactions and may guide the specific functions of the ion channel. a few rna-binding residues (by pprint and disordpbind) and several dna-binding residues (by drnapred) are predicted for e protein in all three viruses. assembly by interacting with the nucleocapsid (n) and e proteins [ ] [ ] [ ] . protein m interacts specifically with a short viral packaging signal containing coronavirus rna in the absence of n protein, thereby highlighting an important nucleocapsid-independent viral rna packaging mechanism inside the host cells [ ] . it gains high-mannose n-glycans in the er, which are subsequently modified into complex n-glycans in the golgi complex. glycosylation of m protein is observed not to be essential for virion fusion in cell culture [ , ] .
cryo-em and tomography data indicate that m forms two distinct conformations, a compact m protein having high flexibility and low spike density, and an elongated m protein having a rigid structure and a narrow range of membrane curvature [ ] . some regions of m glycoproteins might serve as important dominant immunogens. although no structural information is available for the full-length m protein as yet, a short peptide of the membrane glycoprotein (residues - ) from human sars cov was co-crystallized with a complex between a- alpha chain of the hla class i histocompatibility antigen and β microglobulin (pdb id: i g) [ ] . figure a shows that within this complex, the co-crystallized m protein region exists in an extended conformation. m protein of sars-cov- has a sequence similarity of . % with bat cov and . % with human sars m proteins ( figure b ). our analysis revealed that the intrinsic disorder levels in m proteins of sars-cov- , human sars cov, and bat cov are relatively low, since these proteins show ppid values of . %, . %, and . %, respectively. this is in line with the previous publication by goh et al. on human sars hku , where they found a mean ppid of % using additional predictors such as topidp and foldindex along with the predictors used in our study [ ] . figures c, d , and e represent per-residue disorder profiles generated for m proteins of sars-cov- , human sars cov, and bat cov and show that, with the exception of their n- and c-terminal regions, these proteins are mostly ordered. the last residues of mers-cov m protein are important for intracellular trafficking and contain a determinant that localizes it to the golgi network [ ] . our results in table illustrate that the disordered c-tail of the m protein is predicted to contain a disorder-based protein-binding region and therefore can serve as a binding site for the specific partner required for its localization inside the host cell.
a long morf region (residues - ) at the c-terminus of m protein was observed in all three viruses by morfchibi_web. two morf regions (one at the n-terminus (residues - ) and one at the c-terminus (residues - )) were observed by disopred in human sars and bat cov. however, a single morf (residues - ) was observed in sars-cov- by disopred . morfpred also predicts a short morf at the c-terminus of sars-cov- (residues - ), human sars (residues - ), and bat cov (residues - ) ( table , supplementary tables and ) . furthermore, the m protein from all three viruses displays a strong tendency to bind rna (as predicted by pprint and disordpbind) and dna (as predicted by drnapred and disordpbind) (see supplementary tables , , and ). our understanding of the m protein of covs (idps and morfs at the c-terminus, and molecular recognition) elucidates its crucial role in interaction with the n and e proteins for viral assembly. nucleocapsid (n) protein: nucleocapsid (n) protein is one of the major viral proteins, playing several significant roles in transcription and virion assembly of coronaviruses [ ] . it binds to viral genomic rna, forming a ribonucleoprotein core required for rna encapsidation during viral particle assembly [ ] . the formation of sars-cov virus-like particles (vlps) has been reported to depend upon either m and e proteins or m and n proteins. for the effective production and release of vlps, co-expression of e or n proteins with m protein is necessary [ ] . n protein of human sars consists of two structural domains, the n-terminal rna-binding domain (ntd: - residues) and the c-terminal dimerization domain (ctd: - residues), with a disordered patch in between these domains. n protein has been demonstrated to bind viral rna using both ntd and ctd [ ] . figure a displays the nmr solution structure of the ntd of human sars cov nucleocapsid protein ( - residues) (pdb id: ssk) [ ] .
figure a shows an x-ray crystal structure of the ctd of human sars cov nucleocapsid protein ( - residues) (pdb id: gib) [ ] . a model of the domain organization of the n-protein from sars-cov- is shown in figure b . the amino acid-long n protein of sars-cov- shows a percentage identity of . % with the n protein of bat cov and . % with the human sars n protein (supplementary figure s b) . our analysis revealed that the n proteins of coronaviruses contain the highest levels of intrinsic disorder (see figure and table ). in fact, n proteins from sars-cov- , human sars cov, and bat cov are characterized by the mean ppid of . %, . %, and . %, respectively. in accordance with the previously evaluated intrinsic disorder predisposition [ ] , the n protein is highly disordered in all three sars viruses analysed in this study ( table ) . graphs in figures c, d , and e depict the disorder profiles of sars-cov- , human sars cov, and bat cov nucleocapsid proteins and show that their n- and c-terminal regions are completely disordered, and that all three proteins also contain a central unstructured segment. as expected, the intrinsic disorder predisposition of the n protein of sars-cov- is remarkably similar to that of the n protein of human sars cov as reported in a previous study [ ] . this is further supported by figure f , where pondr ® vsl -generated disorder profiles of these three proteins are overlapped to show almost complete coincidence of their major disorder-related features. it is clear that in n-proteins, the n- and c-termini and a long central segment are completely disordered. figure c shows that in the n protein from sars-cov- , residues - , - , - , - , and - are found to be disordered. many of these residues lie within the ntd and ctd regions and, owing to their structural plasticity, were not crystallized in the human sars cov n protein.
sars-cov- has a disordered segment from - residues, while human sars is predicted to have an unstructured segment from - residues. overall, all three n proteins are found to be highly disordered. the n protein from human sars cov has one phosphorylation site (residue s ) and several regions with compositional biases, such as ser-rich (residues - ), poly-leu, poly-gln, and poly-lys (residues - , - , and - ), all predicted to be disordered. similarly, in the n protein from bat cov, s is phosphorylated, and this protein has ser-rich, poly-leu, and poly-lys regions (residues - , - , and - , respectively), all of which are disordered. the n protein has been reported to use its central disordered region to interact with the m protein and hnrnp a , and for n-n self-association [ ] [ ] [ ] . the middle flexible region is also responsible for its rna-binding activity [ ] . deletion of - residues, - residues, and - residues of n abolishes its multimerization, rna-binding capacity, and hnrnp a interactions, respectively. supplementary tables and , and table show that the n protein is heavily decorated with numerous morfs, suggesting that this protein is a promiscuous binder. long disorder-based protein-binding regions at the n- and c-termini of the n protein of all three viruses were predicted by all four predictors (morfchibi_web, anchor, morfpred, and disopred ). indeed, this is the only protein in which we found so many morfs, as compared with the other structural, non-structural and accessory proteins of covs. the morfs present in these regions may mediate the above-mentioned interactions of n proteins. figure a represents another important disorder-related functional feature of the n protein. in fact, the ctd homodimer shown there is characterized by a highly intertwined morphology, which is typically a result of binding-induced folding [ ] [ ] [ ] , indicating that a very significant part of the ctd gains structure during dimerization. we identified numerous rna-binding residues in all three viruses using the pprint server.
this finding is consistent with the function of the n protein, which interacts with genomic rna to form a ribonucleoprotein core, a crucial step for rna encapsidation during viral particle assembly. in addition, drnapred and disordpbind predict multiple dna-binding residues for the n protein in sars-cov- , human sars, and bat cov. the long flexible regions (idprs) at the n- and c-termini of the sars-cov- n protein contain long protein-binding as well as nucleotide-binding regions that may have an important role in its interaction with viral rna. these flexible regions could potentially be targeted to inhibit the interaction of the n protein with viral genomic rna. literature suggests that some viral proteins are translated from the genes interspersed in between the genes of structural proteins. these proteins are known as accessory proteins, and many of them are proposed to be involved in viral pathogenesis [ ] . proteins orf a and orf b. orf a is a multifunctional protein with the molecular weight of ~ kda that has been found to localize in different organelles inside the host cells. also referred to as u , x , and orf , the gene for this protein is present between the s and e genes of the sars-cov genome [ ] [ ] [ ] . the homo-tetrameric complex of orf a has been demonstrated to form a potassium-ion channel on the host cell plasma membrane [ ] . it performs a major function during virion assembly by co-localizing with e, m, and s viral proteins [ , ] . orf b protein can be found in the cytoplasm, nucleolus, and outer membrane of mitochondria of the host cells [ , ] . in huh cells, its over-expression has been linked with the activation of ap- via erk and jnk pathways [ ] . transfection of orf b-egfp leads to cell growth arrest at the g /g phase of vero, , and cos- cells [ ] . orf a induces apoptosis via caspase / directed mitochondrial-mediated pathways, while orf b is reported to affect only the caspase -related pathways [ , ] .
on performing msa, results of which are shown in figure d , we found that the orf a protein from sars-cov- is slightly evolutionarily closer to the orf a of bat cov ( . %) than to the orf a of human sars cov ( . %). graphs in figures a, b , and c depict the propensity for disorder in orf a proteins of novel sars-cov- , human sars cov, and bat cov (sars-like), respectively. mean ppids in these orf a proteins are . % (sars-cov- ), . % (human sars), and . % (bat cov (sars-like)). orf a of sars-cov- shows protein-binding regions at its n-terminus (by morfchibi_web (residues - ), morfpred (residues - ), and disopred (residues - )) and at its c-terminus (by morfchibi_web (residues - ) and morfpred (residues - )) ( table ) . similarly, orf a of human sars and bat cov also shows morfs at the n- and c-termini, as predicted by morfchibi_web and morfpred (supplementary tables and ). these protein-binding regions in orf a may have a role in its co-localization with e, m, and s viral proteins. apart from morfs, it also displays several nucleotide-binding residues in all three viruses (see supplementary tables , , and ). in fact, orf a has the largest number of rna- and dna-binding residues among all the accessory proteins. these results indicate that the idps/idprs of this protein could be utilized in molecular recognition (protein-protein, protein-rna, and protein-dna interaction). according to the intrinsic disorder predisposition analysis of orf b proteins, their mean ppid values in sars-cov- , human sars cov, and bat cov are %, . %, and . %, respectively, as represented in figures a, b , and c. msa results ( figure d ) demonstrate that orf b of sars-cov- is not close to the orf b proteins of human sars and bat cov, having a sequence similarity of only . % and . %, respectively. as we can see in table , there is not a single morf found in orf b of sars-cov- .
however, for human sars we identified three morfs (residues - , - , and - ) and for bat cov one morf at the n-terminus (residues - ) using the morfchibi_web server. protein orf . orf is a short coronavirus protein with just residues. also known as p , this membrane-associated protein serves as an interferon (ifn) antagonist [ ] . it downregulates the ifn pathway by blocking a nuclear import protein, karyopherin α . using its c-terminal residues, orf disrupts the karyopherin import complex in the cytosol and, therefore, hampers the movement of transcription factors like stat into the nucleus [ , ] . it contains a ysel motif near its c-terminal region, which functions in protein internalization from the plasma membrane into the endosomal vesicles [ ] . another study has also demonstrated the presence of orf in endosomal/lysosomal compartments [ , ] . msa results (figure d) demonstrate that sars-cov- orf is closer to the orf protein of human sars cov, having a sequence similarity of . %, than to the orf of bat cov (sars-like) ( . %). novel sars-cov- orf is predicted to be the second most disordered structural protein, with ppid of . %, and with an especially disordered c-terminal region. our analysis of the intrinsic disorder predisposition using six predictors revealed the mean ppid in orf proteins of sars-cov- , human sars, and bat cov to be . %, . %, and . %, respectively ( table ) . graphs in figures a, b and c illustrate that orf proteins from all three studied coronaviruses are expected to be moderately disordered proteins with high disorder content in their c-terminal regions. these disordered regions are important for the biological activities of orf . as mentioned above, this hydrophilic region contains a lysosomal targeting motif (ysel) and a diacidic motif (ddee) responsible for binding and recognition during translocation [ ] . however, the n-terminal region does not show noticeable disorder.
residues - of the n-terminal region of human sars cov orf were shown to be α-helical and embedded in the membrane, although orf is not a transmembrane protein [ ] . a long morf region (residues - in sars-cov- , residues - in human sars, and residues - in bat cov) is also present at the c-terminus of orf proteins (table , and supplementary tables and ). no predictor other than morfchibi_web has located morfs in this protein. supplementary tables , , and show nucleotide-binding residues in orf of all three viruses. very few rna-binding residues are predicted by pprint and few dna-binding residues by drnapred. orf a and orf b proteins. alternatively called u , orf a is a type i transmembrane protein [ , ] . it has been shown to localize in the er, golgi, and peri-nuclear space. the presence of a krkte motif near the c-terminal region is needed for importing this protein from the er to the golgi apparatus [ , ] . orf a contributes to viral pathogenesis by activating the release of pro-inflammatory cytokines and chemokines, such as il- and rantes [ , ] . in another study, overexpression of bcl-xl in t cells blocked the orf a-mediated apoptosis [ ] . on the other hand, orf b is an integral membrane protein that has been shown to localize in the golgi complex [ , ] . the same reports also confirm the role of orf b as an accessory as well as a structural protein in the sars-cov virion [ , ] . figure d represents the . Å x-ray crystal structure of the - fragment of the orf a from human sars cov (pdb id: xak) and demonstrates the compact seven-stranded topology of this protein, which is similar to that of the ig-superfamily members [ ] . importantly, in this crystal structure, residues - constituted the region with missing electron density, indicating high structural flexibility of this segment.
in line with this hypothesis, the nmr solution structure of the - fragment of the orf a from human sars cov (pdb id: yo ) showed that residues - are highly disordered [ ] . at the domain level, the structure of the orf a protein includes a signal peptide, a luminal domain, a transmembrane domain, and a short cytoplasmic tail at the c-terminus [ , ] . we found that the -residue-long orf a protein of sars-cov- shares . % and . % sequence identity with orf a proteins of bat cov and human sars cov, respectively ( figure e) . on the other hand, the orf b of sars-cov- is found to be closer to orf b of human sars than to orf b of bat cov, showing sequence identities of . % and . %, respectively (see figure d ). as can be observed from table , our disorder predisposition analyses resulted in the overall ppid for orf a proteins of . % for sars-cov- , . % for bat cov and . % for human sars cov. mean ppids estimated for orf b proteins are . % for sars-cov- , . % for bat cov and . % for human sars cov. figures a, b , and c represent the residues predisposed for disorder in orf a proteins of sars-cov- , human sars cov, and bat cov, respectively. table shows that the orf a protein is expected to have several morfs, indicating the potential involvement of this protein in disorder-dependent protein-protein interactions. at the n-terminus, we observed one morf region (residues - ) predicted by disopred in all three viruses. in addition to protein-binding regions, orf a also contains several rna- and dna-binding residues. our analysis also shows that orf b proteins from all three viruses have low disorder content and, likewise, are not predicted to contain any morfs by any of the predictors used in this study ( table , supplementary tables and ).
although orf b does not contain protein-binding regions, it was found to contain nucleotide-binding (rna and dna) residues. figures a, b , and c depict the residues predisposed for disorder in orf b proteins of sars-cov- , human sars cov, and bat cov, respectively. according to our analysis, both proteins in all three studied coronaviruses have a mostly ordered structure. proteins orf a and orf b. in animals and isolates from early human infections, the orf gene codes for a single orf protein. however, in later infections (more specifically, at middle and late stages), a nucleotide deletion in the orf gene led to the formation of two distinct proteins, orf a and orf b, containing and residues, respectively [ , ] . both proteins have conformations different from that of the longer orf protein. it has been reported that overexpression of orf b resulted in the downregulation of e protein, while the proteins orf a and orf /orf ab have no effect on the expression of protein e. also, orf /orf ab was found to interact very strongly with proteins s, orf a, and orf a. orf a interacts with s and e proteins, whereas orf b protein interacts with e, m, orf a and orf a proteins [ ] . the disorder-based protein-binding regions of this protein identified in this study may have an important role in its interaction with other proteins. the orf protein found in early sars-cov- isolates has residues and, according to our analysis, shares a . % sequence identity with the orf protein of bat cov ( figure c) . furthermore, figures a and b show that there is no intrinsic disorder in either orf protein from sars-cov- and bat cov. therefore, these two proteins are predicted to be completely structured, with a mean ppid of . %. in orf a and orf b proteins of the human sars, the predicted disorder is estimated to be . % and . %, respectively (table ) . graphs in figures a and b illustrate the presence of some disorder near the n- and c-termini of orf a and orf b proteins.
table shows the identified morf regions in orf of sars-cov- . it shows three morf regions (residues - , - , and - ) predicted by morfchibi_web and one morf region (residues - ) predicted by disopred . in human sars, the n-terminus of both orf a (residues - ) and orf b (residues - ) was predicted to be a morf by the morfchibi_web server (supplementary table ) . further, four protein-binding regions (residues - , - , - , and - ) were identified by the morfchibi_web server in bat cov (supplementary table ). apart from protein-binding regions, orf of sars-cov- , orf a and orf b of human sars, and orf of bat cov also comprise several nucleotide-binding residues (see supplementary tables , , and ). the orf b protein is expressed from an alternative orf within the n gene through a leaky ribosome binding process [ ] . inside the host cells, orf b enters the nucleus, which is a cell cycle-independent process and represents a passive entry. this protein was shown to interact with a nuclear export protein receptor, exportin (crm ), with which it is translocated out of the nucleus [ ] . our morf analysis shows the presence of disorder-based protein-binding regions in the orf b protein, which may have a role in its interaction with crm and its further translocation out of the nucleus. a . Å resolution crystal structure of orf b protein from human sars cov (pdb id: cme) shows the presence of a dimeric tent-like β-structure along with the central hydrophobic amino acids ( figure d) . the published structure has a highly polarized distribution of charges, with positively charged residues on one side of the tent and negatively charged residues on the other [ ] . in the sequence record with accession id nc_ . , the translated protein sequence of orf b is not yet reported for sars-cov- . however, in the report by wu and colleagues [ ] , the sequences of sars-cov- are already annotated. therefore, we took the corresponding amino acid sequences from that study and conducted the intrinsic disorder analysis.
according to the msa results shown in figure e , orf b protein from sars-cov- shares . % identity with human sars and . % identity with bat cov. our idp analysis ( table ) shows that orf b from human sars is a moderately unstructured protein with a mean ppid estimated at . %. as depicted in figure a , b, and c, disorder lies mainly near the n-terminal end ( - residues) and near the central region ( - residues), with a well-ordered inner core in the human sars orf b protein. the x-ray crystal structure of orf b lacks electron density for the first residues and for - residues near the central region. this indicates that the corresponding regions are disordered and are difficult to crystallize owing to their highly dynamic structural organization. the sars-cov- orf b protein, with a mean ppid of . %, has a predicted disordered segment at the n-terminus ( - residues). orf b of bat cov is shown to have an intrinsic disorder content of . %, comparatively lower than that of the human sars orf b protein. morfs lie in the n-terminal region of orf b proteins ( table , supplementary tables and ). in the absence of other viral proteins, its first residues have been demonstrated to induce membranous structures similar to dmvs [ ] . the missing electron density in the n-terminal region of the available crystal structure also suggests that these flexible amino acids are likely to interact with host lipids. the first - residues of sars-cov are identified as a disorder-based protein-binding region that may have a role in its interaction with host lipids and the formation of dmvs. supplementary tables , , and present nucleotide-binding residues for orf b of sars-cov- , human sars, and bat cov. the newly emerged sars-cov- has an orf protein of amino acids. orf of sars-cov- has a % sequence similarity with orf of bat cov strain bat-sl-covzc [ ] .
however, we did not conduct the disorder analysis for orf from the bat-sl-covzc strain, since all our studies reported here are related to a different strain of bat cov (reviewed strain hku - ). therefore, we report here only the results of disorder analysis for the orf protein from sars-cov- , according to which this protein has a mean ppid of . % (see also figure for disorder profile of orf ). this protein contains a morf (residues - ) at its n-terminus, as predicted by morfchibi_web. further, we examined its tendency to bind nucleotides and found a few rna-binding sites; however, it does not contain dna-binding residues. protein orf . this is a -amino-acid-long uncharacterized protein of unknown function, which is present in human sars and bat cov. in sars-cov- , orf is a -amino-acid-long protein. according to the msa, orf of sars-cov- has . % identity with human sars and . % identity with bat cov, as represented in figure d . we have performed the intrinsic disorder analysis to see the peculiarities of the distribution of disorder predisposition in this protein. figures a, b, and c show the resulting disorder profiles of orf of sars-cov- , human sars cov, and bat cov. although these proteins have calculated mean ppid values of . %, . %, and . %, respectively, figure shows that they have flexible n- and c-terminal regions. this protein can use intrinsic disorder or structural flexibility for protein-protein interactions since it possesses morfs. it mainly contains morfs at its n- and c-terminal regions, as tabulated in table and supplementary tables and . it was also found to contain several rna- and dna-binding residues (supplementary tables , , and ) . these results indicate its potential role in functions related to molecular recognition, such as protein-protein, protein-rna, and protein-dna interaction.
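the mean ppid values and idpr boundaries quoted for each protein above can be illustrated with a minimal python sketch. it assumes that each predictor returns one disorder score per residue and that a residue counts as disordered when its score is at or above a cutoff; the default cutoff of 0.5, the function names, and the choice to average per-predictor ppids are assumptions made for illustration and are not taken from this study. the ppid class boundaries are left as caller-supplied arguments for the same reason.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def ppid(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Percentage of residues with a disorder score >= threshold (assumed cutoff)."""
    if not scores:
        raise ValueError("empty disorder profile")
    disordered = sum(1 for s in scores if s >= threshold)
    return 100.0 * disordered / len(scores)


def mean_ppid(profiles: Dict[str, Sequence[float]], threshold: float = 0.5) -> float:
    """Average of the per-predictor PPID values (one possible definition)."""
    return mean(ppid(p, threshold) for p in profiles.values())


def mean_profile(profiles: Dict[str, Sequence[float]]) -> List[float]:
    """Per-residue mean score across predictors; all profiles must share a length."""
    if len({len(p) for p in profiles.values()}) != 1:
        raise ValueError("profiles differ in length")
    return [mean(column) for column in zip(*profiles.values())]


def disordered_regions(scores: Sequence[float], threshold: float = 0.5,
                       min_len: int = 1) -> List[Tuple[int, int]]:
    """Runs of consecutive disordered residues as 1-based inclusive (start, end)."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(scores, start=1):
        if s >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a region that runs to the C-terminal end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))
    return regions


def classify(ppid_value: float, low_cutoff: float, high_cutoff: float) -> str:
    """Three-way PPID label; both cutoffs are supplied by the caller."""
    if ppid_value < low_cutoff:
        return "highly ordered"
    if ppid_value < high_cutoff:
        return "moderately disordered"
    return "highly disordered"
```

for example, calling `disordered_regions(mean_profile(profiles))` on a dictionary of per-predictor profiles returns the residue ranges that would be listed as idprs under these assumptions, and `classify(mean_ppid(profiles), low_cutoff, high_cutoff)` assigns the corresponding disorder class once the two cutoffs are chosen.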
in coronaviruses, due to ribosomal leakage during translation, two-thirds of the rna genome is processed into two polyproteins: (i) replicase polyprotein a and (ii) replicase polyprotein ab. both contain non-structural proteins (nsp - ) in addition to different proteins required for viral replication and pathogenesis. replicase polyprotein a contains an additional nsp protein of amino acids, the function of which has not yet been investigated. the longer replicase polyprotein ab of amino acids accommodates five other non-structural proteins (nsp - ) [ ] . these proteins assist in er membrane-induced vesicle formation, and these vesicles act as sites for replication and transcription. in addition to this, non-structural proteins work as proteases, helicases, and mrna capping and methylation enzymes, crucial for virus survival and replication inside host cells [ , ] . global analysis of intrinsic disorder in the replicase polyprotein ab. table represents the ppid mean scores of non-structural proteins (nsps) derived from the replicase polyprotein ab in sars-cov- , human sars cov, and bat cov. these values were obtained by combining the results from six disorder predictors (see supplementary table s -s ) . figures a, b , and c represent the d-disorder plots of the nsps coded by orf ab in sars-cov- , human sars cov, and bat cov, respectively. based on the mean ppid scores in table , figures a, b, c , and taking into account the ppid-based classification [ ] , we conclude that none of the nsps in sars-cov- , human sars cov, and bat cov are highly disordered. the highest disorder was observed for nsp proteins in all three coronaviruses. both nsp and nsp are moderately disordered proteins ( % ≤ ppid ≤ %). we also observed that nsp , nsp , nsp , nsp , nsp , nsp , nsp , nsp , and nsp have less than % disordered residues and hence, belong to the category of mostly ordered proteins.
other non-structural proteins, namely, nsp , nsp , nsp , and nsp have negligible levels of disorder (ppid < %), which tells us that these are highly structured proteins. the results of the ch-cdf analysis of the nsps from sars-cov- , human sars and bat cov are shown in figures d, e , and f , respectively. it was observed that all the nsps of the three coronaviruses are located within the quadrant q of the ch-cdf phase space, indicating that all the nsps are predicted to be mostly ordered. replicase polyprotein ab. the longer replicase polyprotein ab is a , amino acid-long polypeptide, which contains non-structural proteins listed in table . nsp , nsp , and nsp are cleaved using a viral papain-like proteinase (nsp /pl-pro), while the rest of nsps are cleaved by another viral c-like proteinase, nsp / cl-pro. we mapped the cleavage sites of the replicase ab polyprotein from human sars cov to the disorder profile of this polyprotein. figure represents the results of this analysis by showing zoomed-in regions surrounding all the cleavage sites, with a few flanking residues on either side. interestingly, we observed that all the cleavage sites are largely disordered, suggesting that intrinsic disorder may have a crucial role in the maturation of individual non-structural proteins. as the nsps of human sars cov are evolutionarily close to the nsps of sars-cov- , we hypothesize that the cleavage sites in the sars-cov- replicase ab polyprotein are also intrinsically disordered or flexible. to shed more light on other implications of idprs, the structural and functional properties of nsps and their predicted idprs are thoroughly described below. this protein acts as a host translation inhibitor as it binds to the s subunit of the ribosome and blocks the translation of cap-dependent mrnas as well as mrnas that use the internal ribosome entry site (ires) [ ] .
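the cleavage-site mapping described earlier in this section (overlaying polyprotein cleavage positions on the per-residue disorder profile) can be sketched as follows. this is only an illustrative outline: the function names, the convention that a site is given as the 1-based index of the residue immediately before the scissile bond, and the default flank of five residues are assumptions, not details reported in this study.

```python
from typing import Dict, Sequence


def window_mean(scores: Sequence[float], site: int, flank: int) -> float:
    """Mean disorder score over `flank` residues on each side of a cleavage site.

    `site` is the 1-based index of the residue just before the scissile bond
    (an assumed convention); the window is clipped at the sequence ends.
    """
    if flank < 1:
        raise ValueError("flank must be at least 1")
    if not 1 <= site < len(scores):
        raise ValueError("cleavage site outside the sequence")
    lo = max(0, site - flank)            # 0-based start of the window
    hi = min(len(scores), site + flank)  # 0-based exclusive end of the window
    window = scores[lo:hi]
    return sum(window) / len(window)


def cleavage_site_disorder(scores: Sequence[float], sites: Dict[str, int],
                           flank: int = 5) -> Dict[str, float]:
    """Local mean disorder around every named cleavage site."""
    return {name: window_mean(scores, pos, flank) for name, pos in sites.items()}
```

comparing each local mean with the mean over the whole polyprotein gives a simple numerical counterpart to the visual inspection of the zoomed-in plots; the site names and positions passed in `sites` would come from the annotated cleavage sites of the polyprotein under study.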
figure d shows the nmr solution structure (pdb id: gdt) of the human sars nsp protein ( - residues); residues - were not included in this structural analysis [ ] . sars-cov- nsp shares . % and . % sequence identity with nsp s of human sars cov and bat cov, respectively. its n-terminal region is found to be more conserved than the rest of the protein sequence ( figure e ). mean ppids of nsp s from sars-cov- , human sars cov, and bat cov are . %, . %, and . %, respectively. figure a , b, and c represent the graphs of predicted per-residue intrinsic disorder propensity of these nsp s. according to the analysis, the following regions are predicted to be disordered: sars-cov- (residues - and - ), human sars cov (residues - and - ), and bat cov (residues - and - ). the nmr solution structure of nsp from human sars revealed the presence of two unstructured segments near the n-terminal ( - residues) and c-terminal ( - residues) regions [ ] . the disordered region ( - residues) at the c-terminus is important for nsp expression [ ] . based on sequence homology with human sars cov nsp , the predicted disordered c-terminal region of sars-cov- nsp may play a critical role in its expression. alanine substitutions at k and h in the c-terminal region of nsp protein are reported to abolish its binding to the s subunit of the host ribosome [ ] . consistent with these data, several morfs are present in the unstructured segments of nsp proteins. these regions are tabulated in table , and supplementary tables and . this protein functions by disrupting the host survival pathway via interaction with the host proteins prohibitin- and prohibitin- [ ] . deletion in the coding sequence of nsp of the sars virus by reverse genetics only modestly attenuated viral growth and replication and allowed the recovery of virulent mutant viruses. this indicates the dispensable nature of the nsp protein for sars viruses [ ] .
the sequence identity of the nsp protein from sars-cov- with nsp s of human sars cov and bat cov amounts to . % and . %, respectively (supplementary figure s a) . we have estimated the mean ppids of nsp s of sars-cov- , human sars cov, and bat cov to be . %, . %, and . %, respectively (see table ). the per-residue predisposition for intrinsic disorder of nsp s from sars-cov- , human sars cov, and bat cov is depicted in figures a, b , and c. according to this analysis, the following regions in nsp proteins are predicted to be disordered: residues - (sars-cov- ), residues - (human sars), and residues - (bat cov). as listed in table , and supplementary tables and , human sars cov nsp does not contain a morf, while sars-cov- and bat cov have an n-terminally located morf region predicted by morfchibi_web. nsp is an almost , -residue-long viral papain-like protease (plp) that affects the phosphorylation and activation of irf and therefore antagonizes the ifn pathway [ ] . it was also demonstrated that nsp works by stabilizing the nf-κb inhibitor, thereby blocking the nf-κb pathway [ ] . figure d represents the . Å resolution x-ray crystal structure of the catalytic core of nsp protein from human sars cov (pdb id: fe ), which was obtained by andrew and colleagues [ ] . this structure comprises residues - of nsp . the structure revealed a fold similar to that of deubiquitinating enzymes, and its in vitro deubiquitinating activity was found to be high [ ] . the nsp protein of sars-cov- contains several substituted residues throughout the protein. it is about equally close to the nsp proteins of human sars and bat cov, sharing . % and . % identity, respectively (supplementary figure s b) . according to our results, the mean ppids of nsp proteins of sars-cov- , human sars, and bat cov are . %, . %, and . %, respectively ( table ) . graphs in figures a, b , and c portray the tendency of nsp proteins of sars-cov- , human sars, and bat cov for intrinsic disorder.
nsp proteins of all three studied sars viruses were found to be highly structured and characterized by rather similar disorder profiles. this is further supported by figure e , where pondr ® vsl -generated disorder profiles of these three proteins are overlapped to show almost complete coincidence of their major disorder-related features. according to the mean disorder analysis (see figures a, b, and c) , nsp proteins are predicted to have the following idprs: sars-cov- ( - , - , - ), human sars ( - , - , - ) and bat cov ( - , - , - ) . the first residues in nsp represent a ubiquitin-like globular fold, while - residues form the flexible acidic domain rich in glutamic acid. it is thought to bind and ubiquitinate viral e protein using the n-terminal acidic domain [ , ] . this unstructured segment has many morfs predicted by the anchor and morfpred servers, which may facilitate protein-protein interaction ( table ) . interestingly, nsp of all three viruses was found to have the highest number of rna-binding residues (supplementary tables , , and ) . nsp has been reported to induce the formation of the double-membrane vesicles (dmvs) with the co-expression of full-length nsp and nsp proteins for optimal replication inside host cells [ ] [ ] [ ] . when expressed alone, it localizes to the er membrane, but in infected cells it has been shown to be present in replication units. it was observed that nsp protein contains a tetraspanning transmembrane region having its n- and c-termini in the cytosol [ ] . no crystal or nmr solution structure is reported for this protein as of yet. the nsp protein of sars-cov- has multiple substitutions near the n-terminal region and has a quite conserved c-terminus (supplementary figure s c) . it is found to be closer to nsp of bat cov ( . % identity) than to human sars nsp ( %). mean ppids of nsp s from sars-cov- , human sars, and bat cov are estimated to be . %, . %, and . %, respectively.
the low level of intrinsic disorder is further illustrated by figures a, b , and c. with ppids around zero, nsp s were classified as highly structured proteins, which, however, contain some flexible regions. likewise, table shows the presence of only n- and c-terminal morfs, which possibly assist in cleavage of nsp protein from the long polyproteins a and ab. also referred to as cl-pro, nsp works as a protease that cleaves the replicase polyproteins ( a and ab) at major sites [ , ] . the x-ray crystal structure at . Å resolution (pdb id: c o) obtained for human sars cov nsp is shown in figure d . here, cl-protease is bound to a phenyl-beta-alanyl (s, r)-n-declin type inhibitor. another crystal structure resolved to . Å revealed a chymotrypsin-like fold and a conserved substrate-binding site connected to a novel α-helical fold [ ] . recently, the x-ray crystal structure (resolution . Å) was solved for the sars-cov- nsp in complex with an inhibitor n (pdb id: lu ) ( figure e ). nsp protein is found to be highly conserved in all three studied cov viruses. sars-cov- nsp shares a . % sequence identity with human sars nsp and . % with nsp of bat cov (supplementary figure s d) . therefore, it is not surprising that our analysis demonstrated identical mean ppid values of . % for nsp s from sars-cov- , human sars, and bat cov ( table ) . the predicted per-residue intrinsic disorder propensity of sars-cov- , human sars, and bat cov nsp s are presented in figures a, b , and c, respectively. as the graphs depict, nsp s have several flexible regions and an n-terminal idpr of six residues. consistent with the low flexibility of this protein, only a single morf, predicted by morfchibi_web, is present in the n-terminal region (residues - ) in nsp s of all three viruses (table , supplementary tables and ) . further, the identified nucleotide-binding residues in nsp of all three viruses are tabulated in supplementary tables , , and . non-structural protein (nsp ).
nsp protein is involved in blocking er-induced autophagosome/autolysosome vesicle formation that functions in restricting viral production inside host cells. it induces autophagy by activating the omegasome pathway, which is normally utilized by cells in response to starvation. sars nsp leads to the generation of small autophagosome vesicles, thereby limiting their expansion [ ]. nsp of sars-cov- is equally close to nsp s from both human sars and bat cov, having a sequence identity of . % (figure d). according to our analysis, mean ppids for nsp s are calculated to be . %, . %, and . % for sars-cov- , human sars cov, and bat cov, respectively. figures a, b, and c show the corresponding graphs of intrinsic disorder tendency of nsp s from sars-cov- , human sars cov, and bat cov and demonstrate that these proteins are highly ordered and show low flexibility. as they are membrane proteins, nsp proteins are predicted to have only a single morf near the n-terminal region (residues - in sars-cov- , residues - in human sars, and residues - in bat cov) by the disopred server (table , supplementary tables and ). the role of these protein-binding regions in the induction of autophagy needs to be elucidated.
non-structural proteins and (nsp and ).
the ~ kda nsp helps in primase-independent de novo initiation of viral rna replication by forming a hexadecameric ring-like structure with nsp protein [ , ]. both non-structural proteins and contribute molecules to the ring-structured multimeric viral rna polymerase. site-directed mutagenesis in nsp revealed a d/exd/e motif essential for in vitro catalysis [ ]. figure d depicts the . Å resolution electron microscopy-based structure (pdb id: nur) of the rdrp-nsp -nsp complex bound to the nsp . the structure identified conserved neutral nsp and nsp binding sites overlapping with finger and thumb domains on nsp of the virus [ ]. we found that nsp of sars-cov- shares % sequence identity with nsp of bat cov and . 
% with nsp from human sars (figure e), while sars-cov- nsp is closer to nsp of human sars ( . %) than to nsp of bat cov ( . %) (figure d). due to the high levels of sequence identity, mean ppids of all nsp s were found to be identical and equal to . %. both sars-cov- and human sars nsp proteins were calculated to have a mean ppid of . %, whereas for nsp of bat cov the mean disorder is predicted to be . %. figures a, b, and c display the intrinsic disorder profiles for nsp s, whereas figures a, b, and c represent the predicted intrinsic disorder propensity of nsp s. as our analysis suggests, nsp s might have a well-predicted structure, while nsp s are moderately disordered. nsp s are predicted to have a long idpr (residues - ) in both sars-cov- and human sars, and a somewhat shorter idpr in bat cov (residues - ). furthermore, sars-cov nsp uses its n-terminal residues (v , c , v , and v ) to form a hydrophobic core with nsp residues (m , m , l , m , and l ). additionally, h-bonding takes place between nsp q and nsp t residues [ ]. these amino acids are part of the morfs predicted in nsp and nsp proteins. the results are tabulated in supplementary tables , , and .
nsp protein is a single-stranded rna-binding protein [ ]. it might protect rna from nucleases by binding and stabilizing viral nucleic acids during replication or transcription [ ]. our results on the nucleotide-binding tendency of nsp show the presence of several rna-binding and few dna-binding residues in nsp of sars-cov- , human sars, and bat cov (supplementary tables , , and ). presumed to have evolved from a protease, nsp forms a dimer using its gxxxg motif [ , ]. figure d shows a . Å crystal structure of the homodimer of human sars nsp (pdb id: qz ) that revealed a unique oligosaccharide/oligonucleotide fold-like fold, previously unreported for other proteins [ ]. here, each monomer contains a cone-shaped β-barrel and a c-terminal α-helix arranged into a compact domain [ ].
nsp of sars-cov- is equally similar to nsp s from both human sars and bat cov, having a percentage identity of . %. differences in three amino acids at positions , and account for these similarity scores (figure e). as calculated, the mean ppids of nsp s of sars-cov- , human sars cov, and bat cov are . %, . %, and . %, respectively. figures a, b, and c depict the predicted intrinsic disorder propensity in the nsp protein from sars-cov- , human sars, and bat cov. according to our analysis, all three nsp s are rather structured but contain flexible regions. nsp contains conserved residues (r , k , y , r , r , f , k , y , f , k , r , and r ) with positively charged side chains suitable for binding the negatively charged phosphate backbone of rna, and aromatic side-chain amino acids providing stacking interactions [ ]. these residues are part of multiple disorder-based binding sites predicted by the morfchibi_web server (table , supplementary table and ).
nsp performs several functions for sars-cov. it forms a complex with nsp for dsrna hydrolysis in the ′ to ′ direction and activates its exonuclease activity [ ]. it also stimulates the methyltransferase (mtase) activity of nsp required during rna-cap formation after replication [ ]. figure d represents the x-ray crystal structure of the nsp /nsp complex (pdb id: c t) [ ]. in agreement with the results of previous biochemical experimental studies, the structure identified important interactions with the exon (exonuclease domain) of nsp without affecting its n -mtase activity [ , ]. sars-cov- nsp protein is quite conserved, having a . % sequence identity with nsp of human sars and . % with nsp of bat cov (figure e). mean ppids of all three studied nsp proteins are found to be . %. figures a, b, and c represent disorder profiles of nsp s and signify the lack of long idprs but the presence of flexible regions in these proteins. furthermore, a morf region (residues - ) was predicted by the morfpred server.
interestingly, the sars-cov nsp residues f , f , and v form van der waals interactions with many of the nsp amino acids [ ], and one residue (f ) is located in the morf region that we have identified. furthermore, many nucleotide-binding residues are found in all three viruses (supplementary tables , , and ), but the above-mentioned residues are not found to interact with dna/rna.
in coronaviruses, nsp is an rna-dependent rna polymerase (rdrp). it carries out both primer-independent and primer-dependent synthesis of viral rna with mn2+ as its metallic co-factor and viral nsp and as protein co-factors [ ]. as mentioned above, a . Å resolution structure of human sars nsp in association with nsp and nsp proteins (pdb id: nur) has been reported using electron microscopy (figure d). nsp has a polymerase domain resembling a "right hand", with a finger domain (residues - , - ), a palm domain (residues - , - ), and a thumb domain (residues - ) [ ]. sars-cov- nsp protein has a highly conserved c-terminal region (supplementary figure s e). it is found to share a . % sequence identity with human sars nsp and . % with bat cov nsp . mean ppid values for all three nsp s are estimated to be . % (table ). figures a, b, and c show that although these proteins are mostly ordered, they have multiple flexible regions. as the rdrp protein is observed to be mostly structured, significant morfs in disordered regions are not found (table , supplementary table and ).
nsp functions as a viral helicase and unwinds dsdna/dsrna in the ' to ' direction [ ]. recombinant viral helicase expressed in the e. coli rosetta strain was reported to unwind ~ bp per second [ ]. figure d represents a . Å x-ray crystal structure of human sars nsp (pdb id: jyt) [ ]. this helicase contains a - loop on a domain, which is primarily responsible for its unwinding activity. furthermore, the study revealed an important interaction of nsp with nsp that further enhances its helicase activity [ ].
the -amino-acid-long nsp of sars-cov- is almost completely conserved, as it shares . % sequence identity with nsp of human sars and . % with nsp of bat cov (supplementary figure s f). in accordance with our results, the mean ppids of all three nsp proteins are estimated to be . %. figures a, b, and c show that nsp s contain multiple flexible regions but do not possess significant disorder. as expected for a low-disorder protein, nsp does not contain any morf region, and no binding region was located by any of the servers used in any of the three viruses (table , supplementary table and ). it has many nucleotide-binding residues (rna and dna), which are tabulated in supplementary tables , , and .
nsp is a multifunctional viral protein that acts as an exoribonuclease (exon) and methyltransferase (n -mtase) in sars coronaviruses. its ' to ' exonuclease activity lies in the conserved dedd residues related to the exonuclease superfamily [ ]. its guanine-n methyltransferase activity depends upon s-adenosyl-l-methionine (adomet) as a cofactor [ ]. as mentioned above, nsp requires nsp for activating its exon and n -mtase activity inside the host cells. figure d depicts the . Å crystal structure of the human sars nsp /nsp complex (pdb id: c t), where amino acids - form the exon domain and residues - form the n -mtase domain of nsp . a loop (residues - ) is essential for its n -mtase activity [ ]. (supplementary figure s g). mean ppid values for nsp s from sars-cov- and human sars are calculated to be . %, while the nsp from bat cov has a mean ppid of . %. the predicted per-residue intrinsic disorder propensity of nsp s from sars-cov- , human sars, and bat cov is represented in figures a, b, and c, respectively. as can be observed from these plots and the corresponding ppid values, all nsp s are found to be highly structured. likewise, table shows that nsp contains two protein-binding regions (residues - and - ) predicted by the morfpred server in all three viruses.
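the way mean ppid values and idpr boundaries of the kind quoted above can be derived from per-residue disorder scores is sketched below. this is a minimal illustration, not the pipeline used in this study: the 0.5 score threshold, the 10%/30% class cut-offs, and all function names are assumptions introduced here, and the toy scores are made up.

```python
from statistics import mean

# Assumption: residues scoring at or above 0.5 are called disordered.
DISORDER_THRESHOLD = 0.5


def mean_profile(profiles):
    """Average per-residue scores across predictors.

    `profiles` maps a (hypothetical) predictor name to a list of scores,
    one score per residue; all lists must have the same length.
    """
    lengths = {len(scores) for scores in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all predictor profiles must cover the same residues")
    return [mean(column) for column in zip(*profiles.values())]


def ppid(scores, threshold=DISORDER_THRESHOLD):
    """Predicted percentage of intrinsic disorder for one profile."""
    if not scores:
        raise ValueError("empty profile")
    disordered = sum(1 for s in scores if s >= threshold)
    return 100.0 * disordered / len(scores)


def idprs(scores, threshold=DISORDER_THRESHOLD, min_len=1):
    """Return 1-based inclusive (start, end) runs of disordered residues."""
    regions, start = [], None
    for i, s in enumerate(scores, start=1):
        if s >= threshold and start is None:
            start = i  # a new disordered run begins here
        elif s < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))
    return regions


def disorder_class(ppid_value):
    """Assumed cut-offs: <10% ordered, 10-30% moderate, >=30% high."""
    if ppid_value < 10.0:
        return "highly ordered"
    if ppid_value < 30.0:
        return "moderately disordered"
    return "highly disordered"


if __name__ == "__main__":
    toy = {
        "predictor_a": [0.9, 0.7, 0.2, 0.1, 0.6, 0.8],
        "predictor_b": [0.7, 0.5, 0.2, 0.3, 0.2, 0.8],
    }
    profile = mean_profile(toy)
    print(ppid(profile), idprs(profile), disorder_class(ppid(profile)))
```

raising `min_len` restricts the output to longer idprs, which is one way to separate short disordered stretches from the long regions discussed for some of the proteins.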
as shown in supplementary tables , , and , the orf contains multiple nucleotide-binding residues.
nsp is a uridylate-specific rna endonuclease (nendou) that creates ′- ′ cyclic phosphates after cleavage. its endonuclease activity depends upon mn2+ ions as co-factors. conserved in nidoviruses, it acts as an important genetic marker due to its absence in other rna viruses [ ]. figure d represents a . Å crystal structure of uridylate-specific nsp (pdb id: h ) that was deduced by bruno and colleagues using x-ray diffraction [ ]. the monomeric nsp has three domains: an n-terminal domain ( - residues) formed by three anti-parallel β-strands and two α-helices packed together; a middle domain ( - residues) that contains an α-helix connected via a -amino-acid-long coil to an ordered region containing two α-helices and five β-strands; and a c-terminal domain ( - residues) consisting of two anti-parallel three-β-strand sheets on each side of a central α-helical core [ ]. the nsp is found to be quite conserved across human sars and bat covs. sars-cov- nsp shares an . % sequence identity with nsp of human sars and . % with nsp of bat cov (supplementary figure s h). calculated mean ppids of nsp s from sars-cov- , human sars, and bat cov are . %, . %, and . %, respectively. similar to many other non-structural proteins of coronaviruses, nsp s from sars-cov- , human sars, and bat cov are predicted to possess multiple flexible regions but contain virtually no idprs (see figures a, b, and c). similarly, no significant disorder-based binding regions are predicted in nsp proteins (table ). the sars-cov- protein contains one morf (residues - ) predicted by the morfpred server. the human sars protein does not have a single morf, while that of bat cov possesses two very short binding regions (supplementary table and ). supplementary tables , , and show the presence of many rna-binding residues and few dna-binding residues in nsp of all three viruses.
nsp protein is another mtase domain-containing protein.
as methylation of coronavirus mrnas occurs in steps, three proteins, nsp , nsp , and nsp , act one after another. the first event requires the initiation trigger from nsp protein, after which nsp methylates capped mrnas, forming cap- ( me) gpppa-rnas. nsp protein, along with its co-activator protein nsp , acts on cap- ( me) gpppa-rnas to give rise to the final cap- ( me)gpppa( 'ome)-rnas [ , ]. a Å x-ray crystal structure of the human sars nsp -nsp complex is depicted in figure d (pdb id: r ) [ ]. the structure consists of a characteristic fold present in the class i mtase family, comprising α-helices and loops surrounding a seven-stranded β-sheet [ ]. nsp protein of sars-cov- is found to be equally similar to nsp s from human sars and bat cov ( . %) (supplementary figure s i). mean ppids for nsp s from sars-cov- , human sars, and bat cov are . %, . %, and . %, respectively. in line with these ppid values, figures a, b, and c show that nsp s are mostly ordered proteins containing several flexible regions. correspondingly, no significant morfs are present in this protein (table , supplementary table and ). a single morf (residues - ) was found with the help of morfpred in all three viruses. further, several rna-binding and few dna-binding residues are also identified (supplementary tables , , and ).
replicase polyprotein a.
since replicase polyprotein a contains non-structural proteins - identical to those found in replicase polyprotein ab, we did not perform their disorder analysis separately. however, replicase polyprotein a has one additional non-structural protein designated as nsp . nsp is a small uncharacterized protein cleaved from the replicase polyprotein a. this small protein of unknown function requires experimental characterization. the intrinsic disorder predicting software used in this study requires amino acid sequences, which are at least -residue long. 
therefore, because of their short sequences (just residues), nsp s from all three studied coronaviruses were not checked for intrinsic disorder, disorder-based protein-binding regions, or nucleotide-binding residues. based on the msa outputs, nsp from sars-cov- was found to have a sequence identity of . % with nsp s from human sars and bat cov (figure ).
the emergence of new viruses and associated deaths around the globe represents one of the major concerns of modern times. despite its pandemic nature, there is very little information available in the public domain regarding the structures and functions of sars-cov- proteins. based on its similarity with human sars cov and bat cov, the published reports have suggested the functions of sars-cov- proteins. in this study, we utilized information available on the sars-cov- genome and translated proteome from genbank, and carried out a comprehensive computational analysis of the prevalence of intrinsic disorder in sars-cov- proteins. additionally, a comparison was made with proteins from close relatives of sars-cov- from the same group of beta coronaviruses, human sars cov and bat cov. our analysis revealed that in these three covs, the n proteins are highly disordered, possessing ppid values of more than %. these viruses also have several moderately disordered proteins, such as nsp , orf , and orf b. although other proteins have shown lower disorder content, almost all of them contain at least some idprs, and all cov proteins analysed in this study definitely have multiple flexible regions. importantly, our study provides novel information on the presence of intrinsic disorder at the cleavage sites of the replicase ab polyprotein of covs. this observation confirms the crucial role of idprs in the maturation of individual proteins. we also established that many of these proteins contain disorder-based binding motifs. 
since idps/idprs might undergo structural transition upon association with their physiological partners, our study generates important grounds for a better understanding of the functionality of these proteins, their interactions with other viral proteins, as well as their interactions with host proteins in different physiological conditions. this will also guide structural biologists in carrying out a structure-based analysis of the sars-cov- proteome to explore paths for the development of new drugs and vaccines.
the periodic outbreaks of pathogens worldwide always remind us of the lack of suitable drugs or vaccines for proper cure or treatment. in , nearly deaths were reported due to the sars outbreak in more than countries. but this time, the outbreak of wuhan's novel coronavirus (sars-cov- ) has quickly surpassed this number, indicating more casualties soon. the lack of accurate information and ignorance of primary symptoms are major causes of many infections. although efficient transmission from human to human is confirmed, the actual reasons for the fast spread of sars-cov- are still unknown, but some assumptions have been made by researchers and chinese authorities. the fast spread of sars-cov- , the covid- pandemic, and the associated introduction of quarantine have also had major impacts on the economy and education worldwide due to several restrictions, such as limited transportation, restrained or frozen travel, halted attendance of mass events, the introduction of distance teaching and learning, etc. due to advancements in sequencing techniques, the full genome sequence of sars-cov- was made available within a few days of the first infection report from wuhan, china. however, massive subsequent research needs to be done to identify the actual cause of sars-cov- infectivity and to design suitable treatments in the near future. certain possibilities can be explored with the available information. 
a mutational pressure study on this virus will be very interesting, to see if this virus transforms from bat sars to human sars to sars-cov- . more in-depth experimental studies using molecular and cell biology techniques to establish structure-function relationships are required for a better understanding of the functioning of sars-cov- proteins. additionally, based on the sequence homology and information on protein-protein interactions, the associated viral and host proteins should be explored to find means suitable for limiting replication, maturation, and ultimately pathogenesis of this virus. although structural biology techniques (so-called rational drug design) can be used in drug development utilizing high-throughput screening of compounds virtually or experimentally, the applicability of these techniques is limited by the presence of intrinsic disorder in target proteins. therefore, the thorough disorder analysis of three coronaviruses conducted in this study will help structural biologists to rationally design experiments keeping this information in mind.
authors contribution: rg: conception and design, interpretation of data, writing, and review of the manuscript, and study supervision. vnu: conception and design, acquisition and interpretation of data, writing, and review of the manuscript. ms, tb, pk, brg, kg: acquisition and interpretation of data, writing of the manuscript.
table . evaluation of intrinsic disorder in non-structural proteins of bat cov.
table : predicted morf residues in human sars proteins.
supplementary table : predicted morf residues in bat cov proteins.
supplementary table : predicted nucleotide-binding residues in sars-cov- proteins.
supplementary table : predicted nucleotide-binding residues in human sars proteins.
supplementary table : predicted nucleotide-binding residues in bat cov proteins.
supplementary figures s . 
multiple sequence alignments of structural proteins of all three studied coronaviruses were generated using clustal omega. the aligned images were created using espript . .
figure s a. msa of sars-cov- , human sars, and bat cov spike glycoproteins.
figure s b. msa of sars-cov- , human sars, and bat cov nucleoproteins.
supplementary figure s . multiple sequence alignments of non-structural proteins of all three studied coronaviruses were generated using clustal omega. the aligned images were created using espript . .
figure s a. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s b. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s c. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s d. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s e. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s f. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s g. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s h. msa of sars-cov- , human sars, and bat cov nsp proteins.
figure s i. msa of sars-cov- , human sars, and bat cov nsp proteins.
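the pairwise sequence identities quoted throughout the text are read off alignments such as those listed above. a minimal sketch of how a percent identity and the list of differing positions could be computed from two rows of an alignment is given below. the function names, the gap symbol, and the choice of denominator (every column in which at least one sequence has a residue) are assumptions made for illustration; alignment tools may use a different convention, so values need not match those reported here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Assumption: columns where both rows are gaps are ignored, and every
    other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def differing_positions(aligned_a, aligned_b, gap="-"):
    """1-based residue positions in sequence A that are substituted in B."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    positions, pos_a = [], 0
    for a, b in zip(aligned_a, aligned_b):
        if a != gap:
            pos_a += 1  # advance the ungapped coordinate of sequence A
        if a != gap and b != gap and a != b:
            positions.append(pos_a)
    return positions


if __name__ == "__main__":
    # Toy aligned sequences, not taken from any real protein.
    row_a = "ACDEFGHIKL"
    row_b = "ACDQFGHIKL"
    print(percent_identity(row_a, row_b), differing_positions(row_a, row_b))
```

reporting the differing positions alongside the percentage makes statements of the form "a difference in three amino acids accounts for the score" easy to check against the alignment.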
clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study nidovirales: evolving the largest rna virus genome discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavi full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus the molecular biology of coronaviruses identification of novel subgenomic rnas and noncanonical transcription initiation signals of severe acute respiratory syndrome coronavirus ultrastructure and origin of membrane vesicles associated with the severe acute respiratory syndrome coronavirus replication complex a contemporary view of coronavirus transcription classification of intrinsically disordered regions and proteins intrinsically disordered proteins and intrinsically disordered protein regions intrinsically unstructured proteins: re-assessing the protein structure-function paradigm flexible nets. the roles of intrinsic disorder in protein interaction networks identification and functions of usefully disordered proteins function and structure of inherently disordered proteins. current opinion in structural biology intrinsic disorder in transcription factors showing your id: intrinsic disorder as an id for recognition, regulation and cell signaling drnapred, fast sequence-based method that accurately predicts and discriminates dna- and rna-binding residues high-throughput prediction of rna, dna and protein binding regions mediated by intrinsic disorder a new coronavirus associated with human respiratory disease in china intrinsically disordered side of the zika virus proteome. frontiers in cellular and infection microbiology viral disorder or disordered viruses: do viral proteins possess unique features? 
deciphering the dark proteome of chikungunya virus prediction and functional analysis of native disorder in proteins from the three kingdoms of life fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega deciphering key features in protein structures with the new endscript server length-dependent prediction of protein intrinsic disorder optimizing long intrinsic disorder predictors with protein evolutionary information pondr-fit: a metapredictor of intrinsically disordered amino acids sequence complexity of disordered protein iupred a: context-dependent prediction of protein disorder as a function of redox state and protein binding the dark side of alzheimer's disease: unstructured biology of proteins from the amyloid cascade signaling pathway the dark proteome of cancer: intrinsic disorderedness and functionality of hif- α along with its interacting proteins why are "natively unfolded" proteins unstructured under physiologic conditions? subclassifying disordered proteins by the ch-cdf plot method computational identification of morfs in protein sequences using hierarchical application of bayes rule prediction of protein binding regions in disordered proteins anchor: web server for predicting protein binding regions in disordered proteins morfpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins disopred : precise disordered region predictions with annotated protein-binding activity prediction of disordered rna, dna, and protein binding regions using disordpbind prediction of rna binding sites in a protein using svm and pssm profile genome composition and divergence of the novel coronavirus ( -ncov) originating in china a majority of the cancer/testis antigens are intrinsically disordered proteins molecular recognition features in zika virus proteome mpmorfsdb: a database of molecular recognition features in membrane proteins 
disordered rna-binding region prediction with disordpbind coronavirus ibv: removal of spike glycopolypeptide s by urea abolishes infectivity and haemagglutination but not attachment to cells recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission mechanisms of coronavirus cell entry mediated by the viral spike protein cooperative involvement of the s and s subunits of the murine coronavirus spike protein in receptor binding and extended host range the cytoplasmic tail of the severe acute respiratory syndrome coronavirus spike protein contains a novel endoplasmic reticulum retrieval signal that binds copi and promotes interaction with membrane protein structure of sars coronavirus spike receptorbinding domain complexed with receptor important role for the transmembrane domain of severe acute respiratory syndrome coronavirus spike protein during entry cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace cryo-em structure of the -ncov spike in the prefusion conformation the coronavirus e protein: assembly and beyond incorporation of spike and membrane glycoproteins into coronavirus virions a severe acute respiratory syndrome coronavirus that lacks the e gene is attenuated in vitro and in vivo the transmembrane oligomers of coronavirus protein e structure of a conserved golgi complex-targeting signal in coronavirus envelope proteins structural and functional aspects of viroporins in human respiratory viruses: respiratory syncytial virus and coronaviruses the sars coronavirus e protein interacts with pals and alters tight junction formation and epithelial morphogenesis identifying sars-cov membrane protein amino acid residues linked to virus-like particle assembly self-assembly of severe acute respiratory syndrome coronavirus membrane protein the cytoplasmic tails of infectious bronchitis virus e and m proteins mediate their interaction nucleocapsid-independent specific 
viral rna packaging via viral envelope protein and viral rna signal differential maturation and subcellular localization of severe acute respiratory syndrome coronavirus surface proteins s, m and e studies on membrane topology, n-glycosylation and functionality of sars-cov membrane protein a structural analysis of m protein in coronavirus assembly and morphology the membrane protein of severe acute respiratory syndrome coronavirus acts as a dominant immunogen revealed by a clustering region of novel functionally and structurally defined cytotoxic tlymphocyte epitopes prediction of intrinsic disorder in mers-cov/hcov-emc supports a high oral-fecal transmission the cterminal domain of the mers coronavirus m protein contains a trans-golgi network localization signal the coronavirus nucleocapsid is a multifunctional protein ribonucleocapsid formation of severe acute respiratory syndrome coronavirus through molecular action of the n-terminal domain of n protein structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles multiple nucleic acid binding sites and intrinsic disorder of severe acute respiratory syndrome coronavirus nucleocapsid protein: implications for ribonucleocapsid protein packaging structure of the nterminal rna-binding domain of the sars cov nucleocapsid protein crystal structure of the severe acute respiratory syndrome (sars) coronavirus nucleocapsid protein dimerization domain reveals evolutionary linkage between corona-and arteriviridae characterization of protein-protein interactions between the nucleocapsid protein and membrane protein of the sars coronavirus the nucleocapsid protein of sars coronavirus has a high binding affinity to the human cellular heterogeneous nuclear ribonucleoprotein a analysis of multimerization of the sars coronavirus nucleocapsid protein localization of the rna-binding domain of mouse hepatitis virus nucleocapsid protein 
analysis of ordered and disordered protein complexes reveals structural features discriminating between stable and unstable monomers flexible nets: disorder and induced fit in the associations of p and - - with their partners in various protein complexes, disordered protomers have large per-residue surface areas and area of protein-, dnaand rna-binding interfaces sars coronavirus accessory proteins identification of a novel protein a from severe acute respiratory syndrome coronavirus subcellular localization and membrane association of sars-cov a protein a novel severe acute respiratory syndrome coronavirus protein, u , is transported to the cell surface and undergoes endocytosis severe acute respiratory syndrome-associated coronavirus a protein forms an ion channel and modulates virus release the severe acute respiratory syndrome (sars)-coronavirus a protein may function as a modulator of the trafficking properties of the spike protein the role of severe acute respiratory syndrome (sars)-coronavirus accessory proteins in virus pathogenesis mitochondrial location of severe acute respiratory syndrome coronavirus b protein nucleolar localization of nonstructural protein b, a protein specifically encoded by the severe acute respiratory syndrome coronavirus sars-cov accessory protein b induces ap- transcriptional activity through activation of jnk and erk pathways g /g arrest and apoptosis induced by sars-cov b protein in transfected cells over-expression of severe acute respiratory syndrome coronavirus b protein induces both apoptosis and necrosis in vero e cells the a protein of severe acute respiratory syndrome-associated coronavirus induces apoptosis in vero e cells severe acute respiratory syndrome coronavirus open reading frame (orf) b, orf , and nucleocapsid proteins function as interferon antagonists severe acute respiratory syndrome coronavirus orf antagonizes stat function by sequestering nuclear import factors on the rough endoplasmic reticulum/golgi 
membrane enhancement of murine coronavirus replication by severe acute respiratory syndrome coronavirus protein requires the n-terminal hydrophobic region but not c-terminal sorting motifs a putative diacidic motif in the sars-cov orf protein influences its subcellular localization and suppression of expression of cotransfected expression constructs the n-terminal region of severe acute respiratory syndrome coronavirus protein induces membrane rearrangement and enhances virus replication characterization of a unique group-specific protein (u ) of the severe acute respiratory syndrome coronavirus severe acute respiratory syndrome coronavirus a accessory protein is a viral structural protein augmentation of chemokine production by severe acute respiratory syndrome coronavirus a/x and a/x proteins through nf-kappab activation chemokine upregulation in sars-coronavirus-infected, monocyte-derived human dendritic cells induction of apoptosis by the severe acute respiratory syndrome coronavirus a protein is dependent on its interaction with the bcl-xl protein the orf b protein of severe acute respiratory syndrome coronavirus (sars-cov) is expressed in virus-infected cells and incorporated into sars-cov particles a protein of severe acute respiratory syndrome coronavirus inhibits cellular protein synthesis and activates p mitogen-activated protein kinase structure and intracellular targeting of the sars-coronavirus orf a accessory protein solution structure of the x protein coded by the sars related coronavirus reveals an immunoglobulin like fold and suggests a binding activity to integrin i domains the -nucleotide deletion present in human but not in animal severe acute respiratory syndrome coronaviruses disrupts the functional expression of open reading frame molecular evolution of the sars coronavirus during the course of the sars epidemic in china the human severe acute respiratory syndrome coronavirus (sars-cov) b protein is distinct from its counterpart in animal 
sars-cov and down-regulates the expression of the envelope protein in infected cells severe acute respiratory syndrome coronavirus accessory protein b is a virion-associated protein sars-cov b protein diffuses into nucleus, undergoes active crm mediated nucleocytoplasmic export and triggers apoptosis when retained in the nucleus the crystal structure of orf- b, a lipid binding protein from the sars coronavirus mechanisms and enzymes involved in sars coronavirus genome expression biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus c-like proteinase severe acute respiratory syndrome coronavirus protein nsp is a novel eukaryotic translation inhibitor that represses multiple steps of translation initiation novel -barrel fold in the nuclear magnetic resonance structure of the replicase nonstructural protein from the severe acute respiratory syndrome coronavirus identification of residues of sars-cov nsp that differentially affect inhibition of gene expression and antiviral signaling coronavirus nonstructural protein : common and distinct functions in the regulation of host and viral gene expression severe acute respiratory syndrome coronavirus nonstructural protein interacts with a host protein complex involved in mitochondrial biogenesis and intracellular signaling the nsp replicase proteins of murine hepatitis virus and severe acute respiratory syndrome coronavirus are dispensable for viral replication severe acute respiratory syndrome coronavirus papain-like protease ubiquitin-like domain and catalytic domain regulate antagonism of irf and nf-b signaling severe acute respiratory syndrome coronavirus papain-like protease: structure of a viral deubiquitinating enzyme nuclear magnetic resonance structure of the n-terminal domain of nonstructural protein from the severe acute respiratory syndrome coronavirus the envelope protein of severe acute respiratory syndrome coronavirus interacts with the non-structural protein 
and is ubiquitinated severe acute respiratory syndrome coronavirus nonstructural proteins , , and induce double-membrane vesicles mobility and interactions of coronavirus nonstructural protein two-amino acids change in the nsp of sars coronavirus abolishes viral replication localization and membrane topology of coronavirus nonstructural protein : involvement of the early secretory pathway in replication ligand-induced dimerization of middle east respiratory syndrome (mers) coronavirus nsp protease ( clpro): implications for nsp regulation and the development of antivirals a novel mutation in murine hepatitis virus nsp , the viral c-like proteinase, causes temperature-sensitive defects in viral growth and protein processing structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra alpha-helical domain coronavirus nsp restricts autophagosome expansion the sars-coronavirus nsp +nsp complex is a unique multimeric rna polymerase capable of both de novo initiation and primer extension insights into sars-cov transcription and replication from the structure of the nsp -nsp hexadecamer structure of the sars-cov nsp polymerase bound to nsp and nsp co-factors the severe acute respiratory syndrome-coronavirus replicative protein nsp is a singlestranded rna-binding subunit unique in the rna virus world variable oligomerization modes in coronavirus non-structural protein severe acute respiratory syndrome coronavirus nsp dimerization is essential for efficient viral growth rna '-end mismatch excision by the severe acute respiratory syndrome coronavirus nonstructural protein nsp /nsp exoribonuclease complex in vitro reconstitution of sars-coronavirus mrna cap methylation structural basis and functional analysis of the sars coronavirus nsp -nsp complex biochemical characterization of a recombinant sars coronavirus nsp rna-dependent rna polymerase capable of copying viral rna templates mechanism of nucleic acid unwinding by sars-cov helicase 
delicate structural coordination of the severe acute respiratory syndrome coronavirus nsp upon atp hydrolysis discovery of an rna virus '-> ' exoribonuclease that is critically involved in coronavirus rna synthesis major genetic marker of nidoviruses encodes a replicative endoribonuclease crystal structure and mechanistic determinants of sars coronavirus nonstructural protein define an endoribonuclease family coronavirus nonstructural protein is a cap- binding enzyme possessing (nucleoside- 'o)-methyltransferase activity biochemical and structural insights into the mechanisms of sars coronavirus rna ribose '-o-methylation by nsp /nsp protein complex key: cord- -v hrcl k authors: sang, eric r.; tian, yun; miller, laura c.; sang, yongming title: epigenetic evolution of ace and il- genes as non-canonical interferon-stimulated genes correlate to covid- susceptibility in vertebrates date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v hrcl k current novel coronavirus disease (covid- ) has spread globally within a matter of months. the virus establishes a success in balancing its deadliness and contagiousness, and causes substantial differences in susceptibility and disease progression in people of different ages, genders and pre-existing comorbidities. since these host factors are subjected to epigenetic regulation, relevant analyses on some key genes underlying covid- pathogenesis were performed to longitudinally decipher their epigenetic correlation to covid- susceptibility. the genes of host angiotensin-converting enzyme (ace , as the major virus receptor) and interleukin (il)- (a key immune-pathological factor triggering cytokine storm) were shown to evince active epigenetic evolution via histone modification and cis/trans-factors interaction across different vertebrate species. 
extensive analyses revealed that ace and il- genes are among a subset of non-canonical interferon-stimulated genes (non-isgs), which have been designated recently for their unconventional responses to interferons (ifns) and inflammatory stimuli through an epigenetic cascade. furthermore, significantly higher positive histone modification markers and position weight matrix (pwm) scores of key cis-elements corresponding to inflammatory and ifn signaling were discovered in both ace and il gene promoters across representative covid- -susceptible species compared to unsusceptible ones. these findings characterize ace and il- genes as non-isgs that respond differently to inflammatory and ifn signaling from the canonical isgs, and their epigenetic properties may serve as biomarkers to longitudinally predict covid- susceptibility in vertebrates and partially explain covid- inequality in people of different subgroups. for cell attachment and entry [ , ] . several groups have reported that sars-cov has higher receptor affinity for human ace than other coronaviruses, which may contribute to the high contagiousness and rapid spread of sars-cov in humans [ , ] . being a key enzyme in the body's renin-angiotensin-aldosterone system (raas), ace catalyzes angiotensinogen (agt) to produce the active forms of hormonal angiotensin (ang) - , which directly regulate the blood volume/pressure, body fluid balance, sodium and water retention, as well as co-opt multiple effects on inflammation, apoptosis, and generation of [ ] . these canonical isgs are mainly induced by type i and type iii ifns but overlap with those upregulated by type ii ifn (i.e. ifn-γ) [ ] [ ] [ ] [ ] . these isgs comprise a frontline of antiviral immunity to restrict virus spreading from the initial infection sites [ ] . however, based on gene evolution and epigenetic analyses, ace may not be a member of these classical antiviral isgs, and more isgs expression (figure ) [ ] [ ] [ ] [ ] . 
to confirm that, we conducted cross-species comparative analysis between il- and ace genes. first, annotation of encode epigenetic datasets revealed similar h k me and h k ac markers in il- and ace gene promoters in both humans and mice; however, significantly higher z-scores and enrichment of h k me and h k ac were detected in human il- and ace genes than in their mouse orthologs, respectively [ ] . second, we detected cis-regulatory elements (cres) that bind core transcription factors of non-isgs, including pu. , irfs, and nf-κb, in ace and il- gene proximal promoter regions across representative animal species [ ] . third, we found that the evolutionary increase in the response of ace , and especially il- , genes to inflammatory and ifn signaling may serve as an epigenetic marker for covid- susceptibility in some animal species including humans. finally, using our non-biased rna-seq data, we further categorized additional non-isgs that resemble the expression pattern of either il- or ace [ ] . notably, we detected two ace isoforms, which differ in both proximal promoters and coding regions, in some livestock species including pigs, dogs and cattle [ ] . in pigs, the ace short isoform (ace s) has an expression pattern more similar to il- than that of the long isoform (ace l). collectively, our findings characterize ace and il- genes as non-isgs responding lower pwm scores for these cres than those for il- genes; in particular, the pwm scores for nf-κb cre in ace genes were at - log units lower ( figure d ). this indicates that ace genes were less responsive to non-canonical nf-κb signaling mediated by nf- notably, only cre matrices to irf were shown in figure c ace group showed a mid-response to ifn-α but highest to ifn-γ ( figure a- d) . 
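the pwm comparison above rests on scoring promoter windows against a transcription-factor matrix. a minimal sketch of that idea follows, assuming a log2-odds matrix built from base counts with a uniform background and a pseudocount, scanned on one strand only. the toy counts, the function names, and the parameter defaults are illustrative assumptions; they are not the hocomoco matrices and do not reproduce the exact scoring used by the pwmscore tool.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds matrix.

    counts: list of dicts, one per motif position, mapping base -> count.
    A uniform background frequency is an assumption of this sketch.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_window_score(pwm, sequence):
    """Slide the matrix along one strand; return (best score, 0-based offset).

    Returns None when the sequence is shorter than the motif or no window
    consists solely of A/C/G/T.
    """
    width = len(pwm)
    seq = sequence.upper()
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][ch] for i, ch in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# hypothetical three-position motif that strongly prefers "ACG"
toy_counts = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
toy_hit = best_window_score(toy_pwm, "TTACGTT")  # best window starts at offset 2
```

comparing the best window score of orthologous promoters is one simple way to express a per-gene difference in log units; a genome-scale analysis would also scan the reverse strand and use a background model fitted to the species.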
figure d statistically demonstrates the stimulatory difference among three groups of ifn- interestingly, most of them belong to il- , tnf and chemokine superfamilies, whose roles scores between ace and il- genes, except nf-κb that mediates non-canonical nf-κb response (d) has a significantly lower mpwm score ( - log units), indicating that ace genes belong to a different non-isg group than il- . canonical isgs of human isg and irf are used as references. mpwm scores are calculated using tools at https://ccg.epfl.ch/pwmtools/pwmscore.php with cre matrices from meme-derived hocomocov tf collection affiliated with the pwm tools. pwm, position weight matrix. other abbreviations are as in figure . this shows that ace and especially il- genes from cov (+) species contain cres with significantly higher mpwm scores, indicating that in some vertebrate species, non-isgs like ace and especially il- genes evolved to obtain high inductive propensity by inflammatory and ifn signaling, and may serve as epigenetic biomarkers (or triggers) for susceptibility prediction for covid- and other ard syndromes. abbreviation: h_bat, great horseshoe bat, and other abbreviations are as in figure . figure : cross-species phylogenetic and topological comparison of il- and ace gene promoters. evolutionary analyses were conducted in mega x. the evolutionary history was inferred by using the maximum likelihood method and tamura-nei model. the tree with the highest log likelihood (- . ) is shown. the percentage of trees in which the associated taxa clustered together is shown next to the branches. initial tree(s) for the heuristic search were obtained automatically by applying neighbor-join and bionj algorithms to a matrix of pairwise distances estimated using the tamura-nei model, and then selecting the topology with superior log likelihood value. 
for topological comparison between phylogenic trees generated using il- and ace gene proximal promoters, the phylogenies of newick strings were generated using the mega, and topological comparison between the newick trees was performed with compare trees at (http://www.mas.ncl.ac.uk/~ntmwn/compare trees) to obtain the overall topological scores. orange circle: covid- susceptible species. arrows: other tentative marker species to determine which group (il- or ace ) of non-isgs are more determined for covid- susceptibility. abbreviations are as in figure . covid- dashboard by the center for systems science and engineering a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster a new coronavirus associated with human respiratory disease in china high contagiousness and rapid spread of severe acute respiratory syndrome coronavirus the contagion of mortality: a terror management health model for pandemics existing conditions of covid- cases and deaths epidemiology working group for ncip epidemic response disease control and prevention. 
the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) covid- ) [pdf] -world health organization considering how biological sex impacts immune responses and covid- outcomes comorbidities, clinical signs and symptoms, laboratory findings, imaging features, treatment strategies, and outcomes in adult and pediatric patients with covid- : a systematic review and meta-analysis description and proposed management of the acute covid- cardiovascular syndrome sars-cov- and covid- : a genetic, epidemiological, and evolutionary perspective deciphering the role of host genetics in susceptibility to severe covid- the - novel coronavirus acute respiratory syndrome coronavirus ) pandemic: a joint american college of academic international medicine-world academic council of emergency medicine of interferon-stimulated genes: what do they all do interferon target-gene expression and epigenomic signatures in health and disease regulation of type i interferon signaling in immunity and inflammation: a comprehensive review the encode portal as an epigenomics resource pwmscan: a fast tool for scanning entire genomes with a position-specific weight matrix antiviral regulation in porcine monocytic cells at different activation states noncanonical nf-κb signaling in health and disease sun sc. the non-canonical nf-κb pathway in immunity and inflammation orange circle: marking covid- susceptible species. 
figure : genome-wide categorizing of non-isgs based on the similarity of inductive pattern to il- and ace genes. 
the non-biased genome-wide transcriptomic data were generated using rna-seq of porcine lung macrophages activated with stimuli of il- , il- , lps, ifn-α or ifn-γ at ng/ml and infected by porcine arterivirus for h, using an illumina procedure as previously described [ ] . significantly differentially expressed genes (degs) in the renin-angiotensin system (ras), interleukin (il)- , tnf and chemokine super-families were annotated and grouped using heatmaps according to their inductive expression patterns similar to: (a) il- , (b) ace ; (c) examples of canonical isgs as reference; (d) averaged transcriptomic expression levels (normalized as reads per kilobase of transcript per million mapped reads, rpkm) of the grouped isgs or non-isgs above. indicated by arrows, pigs have two ace isoforms, namely ace l and ace s, which have different expression patterns: ace s, similar to il- , was shown to be less responsive to ifn-α but highly responsive to lps and ifn-γ. in contrast, ace l and another key gene, agt, in ras were categorized together with other non-isgs (b), whose expression pattern is more like that of canonical isgs (c) than that of the il- group (a). figure : working summary for il- and ace as non-isgs biomarkers and contribution to covid- susceptibility. the expression of non-isgs such as il- and ace is sequentially regulated by signals such as tnf, ifn and tlr signaling, which modify chromatin accessibility by activating histone modification and recruiting transcription factors, including pu. , irf and nf-κb, to the promoter regions of il- and ace genes. in turn, this amplifies the inflammatory loop through the il- mediated response and induces greater ace expression, which collectively contribute to the occurrence of respiratory distress syndrome as in covid- . therefore, high expression of non-isgs such as il- and ace could be biomarkers to determine covid- susceptibility and disease development in different animal species. 
abbreviations: non-isg, non-canonical interferon stimulated genes; gtf, stf, or tf, general (g), tissue-specific (s) transcription factor (tf); tlr, toll-like receptor; tss, transcription start site. key: cord- - tioh m authors: pezzotti, giuseppe; ohgitani, eriko; shin-ya, masaharu; adachi, tetsuya; marin, elia; boschetto, francesco; zhu, wenliang; mazda, osam title: rapid inactivation of sars-cov- by silicon nitride, copper, and aluminum nitride date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tioh m introduction viral disease spread by contaminated commonly touched surfaces is a global concern. silicon nitride, an industrial ceramic that is also used as an implant in spine surgery, has known antibacterial activity. the mechanism of antibacterial action relates to the hydrolytic release of surface disinfectants. it is hypothesized that silicon nitride can also inactivate the coronavirus sars-cov- . methods sars-cov- virions were exposed to wt.% aqueous suspensions of silicon nitride, aluminum nitride, and copper particles. the virus was titrated by the tcd method using veroe /tmprss cells, while viral rna was evaluated by real-time rt-pcr. immunostaining and raman spectroscopy were used as additional probes to investigate the cellular responses to virions exposed to the respective materials. results all three tested materials showed > % viral inactivation at one and ten minutes of exposure. degradation of viral rna was also observed with all materials. immunofluorescence testing showed that silicon nitride-treated virus failed to infect veroe /tmprss cells without damaging them. in contrast, the copper-treated virus suspension severely damaged the cells due to copper ion toxicity. raman spectroscopy indicated differential biochemical cellular changes due to infection and metal toxicity for two of the three materials tested. conclusions silicon nitride successfully inactivated the sars-cov- in this study. 
the mechanism of action was the hydrolysis-mediated surface release of nitrogen-containing disinfectants. both aluminum nitride and copper were also effective in the inactivation of the virus. however, while the former compound affected the cells, the latter compound had a cytopathic effect. further studies are needed to validate these findings and investigate whether silicon nitride can be incorporated into personal protective equipment and commonly touched surfaces, as a strategy to discourage viral persistence and disease spread. the novel coronavirus, sars-cov- , has led to a worldwide pandemic and raised interest in the surface-mediated transmission of viral diseases [ ] [ ] [ ] . respiratory aerosols and droplets, and contaminated surfaces facilitate viral spread from person to person leading to recommendations of social distancing, wearing of masks, hand-washing, and regular surface disinfection. data suggest that the sars-cov- virus can remain viable on copper, plastic, steel, and cardboard surfaces for - hours after contact, and up to -days on surgical masks [ ] . viral persistence on these and other materials presents a risk for the social and nosocomial propagation of covid- , the disease caused by sars-cov- . current viral inactivation methods include the surface application of chemicals, such as a combination of ethanol with hydrogen peroxide or sodium hypochlorite [ ] . irradiation of surfaces with ultraviolet light is another virus disinfection strategy [ ] . these and other proposed antiviral methodologies are ultimately limited by their toxicity to human cells [ , ] . as a practical solution, surfaces that are safe for human contact and capable of spontaneously inactivating viruses are desirable to control the spread of viral diseases. silicon nitride (si n ) is a non-oxide ceramic that has been used in many industries since the s. 
a formulation of si n is fda-cleared for use as an intervertebral spinal spacer in cervical and lumbar spine fusion surgery, with proven long-term safety, efficacy, and biocompatibility. clinical data for si n implants compare favorably with other spine biomaterials, such as allograft, titanium, and polyetheretherketone [ ] [ ] [ ] [ ] [ ] [ ] . a curious finding is that si n implants have a lower incidence of bacterial infection (i.e., less than . %) when compared to other implant materials ( . % to %) [ ] . this property reflects the complex surface biochemistry of si n , which elutes minute amounts of nitrogen that is converted to ammonia, ammonium, and other reactive nitrogen species (rns) that inhibit bacteria [ ] . a recent investigation also found that exposure to sintered si n powders in aqueous suspension inactivated h n (influenza a/puerto rico/ / ), feline calicivirus, and enterovirus (ev-a ) [ ] . based on these findings, it is hypothesized that si n may be able to inactivate sars-cov- . the present study compared the effects of exposing sars-cov- to aqueous suspensions of si n and aluminum nitride (aln) particles and two controls: a suspension of copper (cu) particles (positive control) and a sham treatment (negative control). cu was chosen as a positive control because of its well-known ability to inactivate a variety of microbes, including viruses [ ] . aln was included in the testing because, like si n , it is a nitrogen-based compound whose surface hydrolysis in aqueous solution leads to the elution of ammonia, with an attendant increase in ph. since comparable antiviral and antibacterial phenomena are believed to be operative for all nitride-based compounds, aln was used to provide additional insight into the antipathogenic mechanisms of nitrogen-containing inorganic materials [ , ] . 
si n , cu, and aln powders were acquired from commercial sources (sintx technologies, inc., salt lake city, ut, usa; fujifilm wako pure chemical corporation, osaka, japan; and tokuyama co., yamaguchi, japan, respectively). si n powder (nominal composition of wt.% si n , wt.% y o , and wt.% al o ) [ ] was prepared by aqueous mixing and spray-drying of the inorganic constituents, followed by sintering of the spray-dried granules (~ °c for ~ h), hot-isostatic pressing (~ °c, h, mpa in n ), aqueous-based comminution, and freeze-drying. the resulting powder had an average particle size of . ± . μm. as-received cu powder (usp grade . % purity) granules were comminuted to achieve a particle size comparable to the si n . aln powder had an average particle size of . ± . μm as received, which was comparable to si n . veroe /tmprss cells (japanese collection of research biosources cell bank, national institute of biomedical innovation, osaka, japan) were used in the viral assays [ ] . cells were grown in dulbecco's modified eagle's minimum essential medium (dmem) (nissui pharmaceutical co. ltd., tokyo, japan) supplemented with g disulfate ( mg/ml), penicillin ( units/ml), streptomycin ( μg/ml), % fetal bovine serum, and maintained at °c in a % co / % humidified atmosphere. the sars-cov- (japan/ai/i- / ; japan national institute of infectious diseases, tokyo, japan) viral stock was propagated using veroe /tmprss cells at °c for days. fifteen weight percent ( wt.%) of the si n , cu, and aln powders were separately dispersed in ml of pbs(-), followed by the addition of the viral suspension ( x median tissue culture infectious dose (tcid ) in μl). due to the higher density of the cu powder, its volumetric fraction was approximately one-third of the si n . mixing was performed gently at room temperature for min by slow manual rotation or for min using a rotation machine. after exposure, the powders were pelleted by centrifugation ( rpm mins) followed by filtration through a . 
μm filter (hawach sterile pes syringe filter, hawach scientific co., ltd., xi'an, china). supernatants were collected, aliquoted, and subjected to tcid assays and real-time rt-pcr. experiments were performed in triplicate, including a sham-treated virus suspension that was not exposed to any powder. a confluent monolayer of veroe /tmprss cells in a -well plate was inoculated with μl/well of each virus suspension in a tenfold serial dilution with . % fbs dmem (i.e., maintenance medium). viral adsorption was carried out at °c for h with gentle shaking every min. afterward, μl/well of the maintenance medium was added. the plate was incubated at °c in a % co / % humidified atmosphere for days. the cytopathic effect (cpe) of the infected cells was observed under a phase-contrast microscope. the cells were subsequently fixed by adding μl/well of glutaraldehyde followed by staining with . % crystal violet. the tcid was calculated according to the reed-muench method [ ] . after exposure to the powders, μl of the supernatants were used for viral rna extraction. rna was also extracted from the surfaces of the centrifuged and filtered powders. rna purification was performed by using a qiaamp viral rna mini kit (qiagen, germantown, md, usa). an aliquot of μl of isolated rna was reverse-transcribed using revertra ace ® qpcr rt master mix (toyobo, shiga, japan). quantitative real-time pcr was performed using a step-one plus real-time pcr system (applied biosystems, foster city, ca, usa) and two sets of primers/probes specific for the viral n gene. vero e /tmprss cells on cover glass were inoculated with μl of virus supernatant. after viral adsorption at °c for h, the cells were incubated with the maintenance medium in a co incubator for h. to detect infected cells, the cells were washed with tbs ( mm tris-hcl ph . , mm nacl) and fixed with % pfa for min at room temperature (rt) followed by membrane permeabilization with . % triton x in tbs for min at rt. 
the cells were blocked with % skim milk in tbs for min at rt and stained with anti-sars coronavirus envelope (rabbit) antibody (dilution = : ) (prosci inc., poway, ca, usa) for min at rt. after washing with a buffer, cells were incubated with an alexa goat anti-rabbit igg (h+l) ( : ) (thermo fisher scientific, waltham, ma, usa) and alexa phalloidin ( : ) (thermo fisher scientific, waltham, ma, usa) for min at rt in the dark. prolongtm diamond antifade mountant with dapi (thermo fisher scientific) was used as a mounting medium. the staining was observed under a fluorescent microscope bzx (keyence, osaka, japan). the total cells, phalloidin-staining cells, and protein-expressing cells were counted using the keyence bz-x analyzer. vero e /tmprss cells were infected with μl of each virus suspension on glass sites. after viral adsorption at °c for h, the infected cells were incubated with the maintenance medium in a co incubator for h and fixed with % paraformaldehyde for min at rt. after washing with distilled water twice, infected cells were air-dried and in situ analyzed using a highly sensitive raman spectroscope (labram hr , horiba/jobin-yvon, kyoto, japan) with a × optical lens. it operated in microscopic measurement mode with confocal imaging in two dimensions. a holographic notch filter within the optical circuit was used to efficiently achieve a spectral resolution of . cm - via a nm excitation source operating at mw. raman emissions were monitored using a single monochromator connected to an air-cooled charge-coupled device (ccd) detector (andor dv -oe ; × pixels). the acquisition time was fixed at s. thirty spectra were collected and averaged for each analysis time-point. raman spectra were deconvoluted into gaussian-lorentzian sub-bands using commercially available software (labspec . , horiba/jobin-yvon, kyoto, japan). the student's t-test determined statistical significance for n= and at a p-value of . using prism software (graphpad, san diego, ca usa). 
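the titration method above cites the reed-muench calculation without spelling it out. a minimal sketch of the standard procedure is given below: infected wells are accumulated from the most dilute row upward, uninfected wells from the least dilute row downward, and the 50% endpoint is interpolated between the two dilutions that bracket 50% cumulative infection. the function name and the well counts in the example are hypothetical and are not data from this study.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, total):
    """Return log10 of the 50% endpoint dilution by the Reed-Muench method.

    log10_dilutions: e.g. [-1, -2, -3] for 10^-1, 10^-2, 10^-3.
    infected: number of CPE-positive wells at each dilution.
    total: number of wells inoculated at each dilution.
    """
    # Order rows from least dilute to most dilute.
    rows = sorted(zip(log10_dilutions, infected, total), key=lambda r: -r[0])
    logs = [r[0] for r in rows]
    pos = [r[1] for r in rows]
    neg = [r[2] - r[1] for r in rows]
    n = len(rows)

    # A well infected at a high dilution would also be infected at any lower
    # dilution, so infected wells accumulate from the most dilute row upward.
    cum_pos = [sum(pos[i:]) for i in range(n)]
    # Conversely, uninfected wells accumulate from the least dilute row down.
    cum_neg = [sum(neg[:i + 1]) for i in range(n)]
    frac = [
        cp / (cp + cn) if (cp + cn) > 0 else 0.0
        for cp, cn in zip(cum_pos, cum_neg)
    ]

    for i in range(n - 1):
        if frac[i] >= 0.5 > frac[i + 1]:
            # Proportionate distance between the bracketing dilutions.
            pd = (frac[i] - 0.5) / (frac[i] - frac[i + 1])
            return logs[i] + pd * (logs[i + 1] - logs[i])
    raise ValueError("dilution series does not bracket the 50% endpoint")

# hypothetical plate: five wells per dilution, tenfold series
example_endpoint = reed_muench_log10_endpoint(
    [-1, -2, -3, -4, -5], [5, 5, 3, 1, 0], [5, 5, 5, 5, 5]
)
```

the titer per inoculum volume is the reciprocal of the endpoint dilution, that is 10 raised to the negative of the returned value; converting to a titer per ml additionally requires the inoculum volume per well.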
to examine whether viral rna was fragmented from exposure to the powders, rt-pcr tests were conducted on the viral n gene sequence. the results are shown in fig. (a)~(b) and (c)~(d) for and -min exposures, respectively. again, in comparison to the supernatant of the negative control that was not exposed to any powder, almost complete fragmentation of the rna was observed for cu while significant damage was caused by aln and to a lesser extent by si n . after min exposure to the powders, substantial cleavage of the rna was seen for all three materials. while cu still showed the most fragmentation, si n demonstrated similar effectiveness, and aln was essentially identical to the -min exposure condition. viral rna was virtually undetectable for all three materials from the pelleted powders after min of exposure (cf. fig. (a)~(b) ). this result suggests that the decrease of viral rna in the supernatant was not because of adherence of the rna to the powders, but rather due to degradation. figure . viral rna underwent severe degradation after exposure to copper or nitride particles. in (a) and (b), virus suspensions were exposed to cu, aln, and si n powders for min, and viral rna in supernatants and on particles were evaluated using viral n gene "set " and "set " primers, respectively. data collected on supernatants and pellet samples are given in comparison with the amount of viral n gene rna in suspension that was left untreated. in (c) and (d), results of rt-pcr tests for supernatants after min exposure of virus suspension to cu, aln, and si n powders for viral n gene "set " and "set " primers are shown, respectively. statistics are given in the inset according to unpaired two-tailed student's t-test (n= ). immunofluorescence imaging using anti-sars coronavirus envelope antibody (red), phalloidin that stains f-actin in viable cells (green), and dapi for cell nuclear staining (blue) was then used to confirm the tcid assay and gene fragmentation results. 
figures (a)~(d) show fluorescence micrographs representative of the veroe /tmprss cell populations that were inoculated with supernatants of (a) unexposed virions (i.e., negative control) and -min-exposed virions of (b) si n , (c) aln, and (d) cu. figure (e) shows cells that were not inoculated with the virus (labeled as "sham-infected" cells). the red-fluorescent signals in the negative control ( fig. (a) ) demonstrated that the virus had infected the cells and viral protein was synthesized. this contrasts with the sham-infected cells (fig. (e) ), which did not show any evidence of expression of the viral protein. figure . si n suppressed virus infection without affecting cell viability, whereas cu killed the cells. veroe /tmprss cells were inoculated with (a) unexposed virions, and virions -min-exposed to (b) si n , (c) aln, and (d) cu, followed by culture for h. in (e), non-inoculated cells (labeled as "sham-infected" cells) were also prepared and imaged for comparison. after fixation, cells were stained with anti-sars coronavirus envelope antibody (red), phalloidin to visualize f-actin (green), and dapi to stain nuclei (blue). fluorescence micrographs are shown, which are representative of n= samples. remarkably, cells inoculated with supernatants from si n and, to a lesser extent, from aln demonstrated near-normal viability with few infections. conversely, cells inoculated with the cu supernatant were essentially dead, as suggested by a complete lack of f-actin (fig. (d) ). this indicates that cell death did not result from viral infection but from toxic effects of intracellular free cu ions [ ] . quantification of the fluorescent microscopic results from figure is provided in figure , assuming that the phalloidin-positive cells were viable and anti-sars envelope antibody-stained cells were infected. 
these data demonstrate that about % of the viable veroe /tmprss cells from the negative control were infected, whereas only % and % of cells inoculated with supernatants from si n and aln were infected, respectively. raman spectroscopy examined veroe /tmprss cells exposed to the various supernatants to assess biochemical cellular changes due to infection and ionic (i.e., cu and al) toxicity. figure shows raman spectra in the frequency range - cm - for (a) uninfected veroe /tmprss cells, and cells inoculated with supernatants containing virions exposed for mins to (b) si n , (c) aln, (d) cu (positive control), and (e) no antiviral compounds (negative control). of fundamental importance are the vibrational bands of ring breathing and h-scissoring of the indole ring of tryptophan (at and cm - [ ] , labeled as t and t , respectively). tryptophan plays a vital role in protein synthesis and the generation of molecules for various immunological functions. its stereoisomers serve to anchor proteins within the cell membrane [ ] and its catabolites possess immunosuppressive functions [ ] . the catabolism of tryptophan is triggered by a viral infection. this occurs via the enzymatic activity of indoleamine- , -dioxygenase (ido) which protects the host cells from an over-reactive immune response. ido reduces tryptophan to kynurenine and then to n'-formyl-kynurenine. an increase in ido activity depletes tryptophan [ ] . consequently, the intensity of the tryptophan bands (t and t ) is an indicator of these biochemical changes. except for the cu-treated sample, the data presented in fig. (f) show an exponential decline in the combined tryptophan bands that correlates with the fraction of infected cells. (the chemical structure of n'-formyl-kynurenine is given in the inset for clarity.) the anomaly for copper provides further evidence of its toxicity. the veroe /tmprss cells consumed tryptophan to reduce cu + and stabilize it as cu + [ ] . figure . 
raman spectra of: (a) uninfected cells (i.e., unexposed to virions), and cells infected with sars-cov- virions exposed for min to (b) si n , (c) aln, and (d) cu; in (e), raman spectrum of cells infected by unexposed virions (negative control). in (f), a plot of the average intensity of the two tryptophan t and t bands (at and cm - , respectively) as a function of the fraction of infected cells by virions unexposed and exposed for min to different particles (cf. labels); in the inset, the structure of n'formylkynurenine, an intermediate in the catabolism of tryptophan upon enzymatic ido reaction. in (g), three possible conformations of tyrosine-based peptides that can justify the disappearance of ring vibrations in tyrosine (ty band) upon chelation of cu(ii) ions. the raman signals due to ring-stretching vibrations of adenine, cytosine, guanine, and thymine were found at , , , and cm - , and are labeled as a, cy , g, and th, respectively, in fig. ) [ ] . these bands were preserved after virus exposure. however, there was an anomaly for lines representative of tyrosine at and cm - labeled as ty and ty , respectively [ ] for cells infected with cu-exposed virions. the ring-breathing band ty of tyrosine was very weak compared to the other samples (cf. fig. (d) with (b) ). conversely, the c-c bond-related ty signal remained strong. this suggests that the aromatic ring of tyrosine chelated the cu ions [ ] . this explains why only the tyrosine ring-breathing mode was reduced while the c-c signal remained unaltered. three possible cu(ii) chelating conformations in tyrosine are given in fig. (g) [ , ] . for veroe /tmprss cells exposed to virions treated with aln (fig. (c) ), the tryptophan t and t bands were preserved, but the bands at and ~ cm - due to ring bending in dna cytosine (labeled as cy and cy , respectively, in fig. ) almost vanished [ ] . 
their disappearance is due to either progressive internucleosomal dna cleavage or from the formation of complexes, and both are related to toxicity [ , ] . the loss of the cytosine signals is interpreted as a toxic effect by al ions, although it is far less critical than copper. al + interacts with carbonyl o and/or n ring donors in nucleotide bases [ , ] and selectively binds to the backbone of the po group and/or to the guanine n- site of the g-c base pairs by chelation [ , ] . unlike exposure of the veroe /tmprss cells to cu and aln supernatants, which resulted in moderate to severe toxicity, si n invoked no modifications of tryptophan, tyrosine, and cytosine. the morphology of the spectrum for the si n viral supernatant closely matched that of the uninfected sham suspension (cf. figs. (a) and (b) ). the persistence of human coronaviruses on common materials (e.g., metal, plastic, paper, and fabric) and touch surfaces (e.g., knobs, handles, railings, tables, and desktops) can contribute to the nosocomial and social spread of disease [ , ] . warnes et al. reported that at room temperature with %- % humidity, the pathogenic human coronavirus e (hucov- e) remained infectious in a lung cell model after at least days of persistent viability on a variety of materials, such as teflon, polyvinyl chloride, ceramic tile, glass, stainless steel, and silicone rubber [ ] . these investigators also showed rapid hucov- e inactivation (within a few minutes) for simulated fingertip contamination on cu surfaces. cu ion release and the generation of reactive oxygen species (ros) were involved in viral inactivation; and increased contact time with copper and brass surfaces led to greater non-specific fragmentation of viral rna, indicating irreversible viral inactivation [ ] . more recently, doremalen et al. showed surface stability of both sars-cov- and sars-cov- virus on plastic, cardboard, stainless steel, and even cu surfaces for ~ hours after application [ ] . 
while breathable n -rated masks can capture particulates before they can be inhaled, sars-cov- virus particles remain active in mask filters for up to days [ ] . contact killing of viruses, such as observed on cu surfaces is, therefore, receiving renewed interest as a disease mitigation strategy [ ] . the present work is the first to show that compounds capable of endogenous nitrogen-release, such as si n and aln, can inactivate the sars-cov- virus at least as effectively as cu. these results suggest that multiple antiviral mechanisms may be operative, such as rna fragmentation, and in the case of cu, direct metal ion toxicity; but while cu and aln supernatants demonstrated strong and partial cellular lysis, respectively, si n provoked no metabolic alterations. the raman spectrum of veroe /tmprss cells exposed to the si n viral supernatant was like that of the uninfected sham. these findings indicate that while si n , cu, and aln were all capable of inactivating the sars-cov- virus, si n was the safest compound for the tested cell model. these data on sars-cov- are consistent with a recent investigation that showed rapid inactivation of three single-strand rna viruses (h n (influenza a/puerto rico/ / ), feline calicivirus, and enterovirus (ev-a ) upon exposure to aqueous suspensions of powdered wt.% si n [ ] . the antiviral effect in this prior study was related to the electrical attraction (including "binding competitive" to an envelope glycoprotein hemagglutinin in the case of influenza virus) and viral rna fragmentation by reactive nitrogen species (rns). these phenomena are due to the slow and controlled elution of nitrogen from si n 's surface, which forms ammonia (nh ) and ammonium (nh + ) moieties coupled with the release of free electrons and negatively charged silanols in aqueous solution. 
in the context of sars-cov- viral inactivation, two important aspects of si n 's surface chemistry play fundamental roles: (i) the similarity between the protonated amino groups, sioh-nh + at the surface of si n and the n-terminal of lysine, c-nh + on the virus; and, (ii) the elution of gaseous ammonia due to si n hydrolysis. a schematic representation of the interaction between sars-cov- and the si n surface is given in fig. (central panel) . the similarity is depicted in the left panel of this figure. it triggers an extremely effective "binding competitive" approach to sars-cov- inactivation which stems from several successful other examples such as hepatitis b [ ] and influenza a [ ] . the strong antiviral effect of eluted (gaseous) nh is due to its penetration of the virions and its reaction with the rna backbone. the rna undergoes alkaline transesterification through the hydrolysis of its phosphodiester bonds [ ] . rna phosphodiester bond cleavage is schematically depicted in the right panel of fig. . the rt-pcr and fluorescence microscopy results of the present study suggest contributions from both mechanisms to the inactivation of sars-cov- , consistent with earlier work [ ] . while additional experiments are ongoing to confirm the proposed mechanisms of inactivation, the tcid results shown in fig. and the rt-pcr data of fig. for viral rna harvested from either the supernatant or the si n particles provide important information about these mechanisms. although > % inactivation was achieved after exposure to si n for min, (fig. (b) ), only partial viral rna fragmentation was observed for the supernatant (fig. (a) ) while rna harvested from the si n particles ( fig. (b) ) was essentially fully fragmented. this suggests that the mechanism of inactivation for si n , as depicted in the left panel of fig. , had successive events of "competitive binding" and ammonia poisoning -a kind of "catch and kill" scenario. 
the complete rna fragmentation at min exposure to si n suggests that nitrogen elution is a key process that triggers a cascade of reactions, which result in virus inactivation (cf. right panel in fig. ). note that the antiviral effectiveness of si n was comparable to cu [ ] . while cu is an essential trace element for human health and an electron donor/acceptor for several key enzymes by altering redox states between cu + and cu + [ ] , these properties can also cause cellular damage [ ] . its use as an antiviral agent is limited by allergic dermatitis [ ] , hypersensitivity [ ] , and multiorgan dysfunction [ ] . in contrast, the safety of si n as a permanently implanted material during spine fusion surgery is well established by experimental and clinical data. these observations are consistent with previous studies for si n spine implants made from the same composition. si n is well known for its capabilities as an industrial material [ ] . load-bearing si n prosthetic hip bearings and spinal fusion implants were initially developed because of the superior strength and toughness of si n [ ] . later studies showed other properties of si n that are favored in designing orthopaedic implants, such as enhanced osteoconductivity [ ] [ ] [ ] [ ] and bacteriostasis [ , [ ] [ ] [ ] [ ] [ ] [ ] . taken together, these data suggest that si n 's surface chemistry, previously known to contribute concurrent upregulation of osteogenic activity and prevention of bacterial adhesion and biofilm formation, is also quite effective against viruses. sintered powders of si n have been incorporated into other materials, such as polymers [ , ] , other ceramics [ ] , and bioglass [ , ] to create composite structures that maintain the index osteogenic and antibacterial properties of monolithic si n . 
three-dimensional additive deposition of si n could, at least in theory, enable the manufacture of protective surfaces in health care that reduce fomite-mediated transmission of microbial disease [ ] . figure . schematic model illustrating: a chemical and electrical charge similarity between the protonated amine groups, si-nh + , at the surface of si n and the n-terminal of lysine, c-nh + in cells (left panel); and, the interaction of sars-cov- viruses with the charged molecular species at the surface of si n (specifically, at protonated amines charging plus) and the eluted species nh /nh + (central panel). the eluted n leaves + charged vacancies on the solid surface (violet-colored sites), which stem together with negatively charged silanols. the three-step process leading to rna backbone cleavage by the eluted nitrogen species (namely, deprotonation of '-hydroxyl groups, formation of a transient pentaphosphate, and cleavage of the phosphodiester bond in the rna backbone by alkaline transesterification through hydrolysis) is shown in the right panel. note that the similarity between the protonated amine and n-terminal of lysine might trigger an extremely effective "binding competitive" mechanism for sars-cov- virion inactivation, while eluted ammonia fatally degrades the virion rna in a combined "catch and kill" effect. incorporation of si n particles into the fabric of personal protective equipment, such as face masks, protective gowns, and surgical drapes could contribute to health workers as well as patient safety. further studies and testing are necessary to corroborate our findings and to establish the antiviral attributes of si n composites and coatings. in summary, it was shown that si n inactivates the sars-cov- virus in a matter of minutes following exposure. similar results with aln suggest that the mechanism of action is shared with other nitrogen-based compounds that release trace amounts of surface disinfectants slowly and for the long-term. 
these data may prove useful in designing products and strategies to mitigate the spread of viral diseases. how long do nosocomial pathogens persist on inanimate surfaces? a systematic review persistence of coronaviruses on inanimate surfaces and their inactivation with biocidal agents human coronavirus e remains infectious on common touch surface materials stability of sars-cov- in different environmental conditions uvc radiation as an effective disinfectant method to inactivate human papillomaviruses antibacterial properties and toxicity from metallic nanomaterials toxic effects of ultraviolet radiation on the skin a single center retrospective clinical evaluation of anterior cervical discectomy and fusion comparing allograft spacers to silicon nitride cages accelerated cervical fusion of silicon nitride versus peek spacers: a comparative clinical study porous silicon nitride spacers versus peek cages for anterior cervical discectomy and fusion: clinical and radiological results of a single-blinded randomized controlled trial clinical outcomes for anterior cervical discectomy and fusion with silicon nitride spine cages: a multicenter study clinical outcomes for lumbar fusion using silicon nitride versus other biomaterials two-year results of a double-blind multicenter randomized controlled non-inferiority trial of peek versus silicon nitride spinal fusion cages in patients with symptomatic degenerative lumbar disc disorders bacteriostatic behavior of surface-modulated silicon nitride in comparison to polyetheretherketone and titanium silicon nitride: a bioceramic with a gift a potent solid-state bioceramic inactivator of ssrna viruses metallic copper as an antimicrobial surface hydrolysis behavior of aluminum nitride in various solutions the hydrolysis of aluminum nitride: a problem or an advantage processing and characterization of silicon nitride bioceramics enhanced isolation of sars-cov- by tmprss -expressing cells a simple method of estimating fifty percent 
endpoints copper homeostasis in eukaryotes: teetering on a tightrope raman spectra of amino acids and their aqueous solutions the role of tryptophan side chains in membrane protein anchoring and hydrophobic mismatch tryptophan catabolism and regulation of adaptive immunity inhibition of indoleamine , -dioxygenase enhances the t-cell response to influenza virus infection copper(i) stabilization by cysteine/tryptophan motif in the extracellular domain of ctr surface-enhanced raman spectroscopy of dna bases cysteine conformation and sulfhydryl interactions in proteins and viruses. . normal coordinate analysis of the cysteine side chain in model compounds influences of peptide side chains on the metal ion binding site in metal ion-cationized peptides: participation of aromatic rings in metal chelation tyrosine: an efficient natural molecule for copper remediation discrimination between ricin and sulphur mustard toxicity in vitro using raman spectroscopy speciation of aluminum in biological systems a review of heavy metal cation binding to deoxyribonucleic acids for the creation of chemical sensors alteration of superhelical state of dna by aluminium (al) an ftir spectroscopic study of calf-thymus dna complexation with al(iii) and ga(iii) cations evidence that contaminated surfaces contribute to the transmission of hospital pathogens and an overview of strategies to address contaminated surfaces in hospital settings the role of contaminated surfaces in the transmission of nosocomial pathogens aerosol and surface stability of sars-cov- as compared with sars-cov- efficient inhibition of hepatitis b virus infection by a pres -binding peptide. sci rep a novel small-molecule compound disrupts influenza a virus pb cap-binding and inhibits viral replication ammonia as an in-situ sanitizer: inactivation kinetics and mechanisms of the ssrna virus ms by nh structural motifs, and inorganic models. science ( -) metal allergy and systemic contact dermatitis: an overview. 
dermatol res pract an overview of copper toxicity relevance to public health silicon nitride and related materials orthopedic applications of silicon nitride ceramics bioactive silicon nitride: a new therapeutic material for situ spectroscopic screening of osteosarcoma living cells on stoichiometry-modulated silicon nitride bioceramic surfaces silicon nitride: a synthetic mineral for vertebrate biology human osteoblasts grow transitional si/n apatite in quickly osteointegrated si n cervical insert decreased bacteria activity on si n surfaces compared with peek or titanium anti-infective and osteointegration properties of silicon nitride, poly (ether ether ketone), and titanium implants a spontaneous solid-state no donor to fight antibiotic resistant bacteria. mater today chem monitoring metabolic reactions in staphylococcus epidermidis exposed to silicon nitride using in situ time-lapse raman spectroscopy surface functionalization of polyethylene by incorporating si n into peek to produce antibacterial, osteoconductive, and radiolucent spinal implants silicon nitride laser cladding: a feasible technique to improve the biological response of zirconia biological response of human osteosarcoma cells to si n -doped bioglasses enhanced bioactivity of si n through trench-patterning and back-filling with bioglass® d-additive deposition of an antibacterial and osteogenic silicon nitride coating on orthopaedic titanium substrate the authors declare no conflicts of interest. key: cord- - oisrm s authors: liu, zhe; zheng, huanying; yuan, runyu; li, mingyue; lin, huifang; peng, jingju; xiong, qianlin; sun, jiufeng; li, baisheng; wu, jie; hulswit, ruben j.g.; bowden, thomas a.; rambaut, andrew; loman, nick; pybus, oliver g; ke, changwen; lu, jing title: identification of a common deletion in the spike protein of sars-cov- date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: oisrm s two notable features have been identified in the sars-cov- genome: ( ) the receptor binding domain (rbd) of sars-cov- ; ( ) a unique insertion of twelve nucleotides, or four amino acids (prra), at the s and s boundary. for the first feature, the similar rbd identified in a sars-like virus from pangolin suggests that the rbd in sars-cov- may already have existed in animal host(s) before it was transmitted to humans. the remaining puzzle is the history and function of the insertion at the s /s boundary, which is uniquely identified in sars-cov- . in this study, we identified two variants from the first guangdong sars-cov- cell strain, with deletion mutations at the polybasic cleavage site (prrar) and its flanking sites. more extensive screening indicates that the deletion at the flanking sites of prrar could be detected in of clinical samples and half of in vitro isolated viral strains. these data indicate that ( ) the deletion of qtqtn, at the flank of the polybasic cleavage site, likely benefits sars-cov- replication or infection in vitro but is under strong purifying selection in vivo, since it is rarely identified in clinical samples; ( ) there could be a very efficient mechanism for deleting this region from the viral genome, as variants losing - are commonly detected after two rounds of cell passage. the mechanistic explanation for the in vitro adaptation and in vivo purification processes (or the reverse) that led to such genomic changes in sars-cov- requires further work. nonetheless, this study provides valuable clues to aid further investigation of spike protein function and virus evolution. the deletion mutation identified during in vitro isolation should also be noted for current vaccine development. sars-cov- is a novel coronavirus first identified at the end of december that has caused a global pandemic of covid- . unlike the other two zoonotic coronaviruses, sars-cov- and mers , the genetic evolutionary history of sars-cov- is mostly unknown.
a recent analysis based on genetic information and protein structure highlights two notable features in the sars-cov- genome: ( ) the receptor binding domain (rbd) of sars-cov- is distinct from the most closely related bat-origin sars-related virus (ratg ) and is demonstrated to have a high affinity to the human ace receptor; ( ) a unique insertion of nucleotides (or four amino acids, prra) at the s and s boundary results in a polybasic (furin) cleavage site and three predicted o-linked glycans around the cleavage site . with respect to the first feature, the similar rbd identified in a sars-like virus from a pangolin suggests that the rbd in sars-cov- may already have existed in its potential animal host(s) before it was transmitted to humans . the remaining question is the history and function of the insertion at the s /s boundary, which is uniquely identified in sars-cov- . the insertion of proline is predicted to result in the addition of three o-linked glycans. the functional consequence of the polybasic cleavage site and o-linked glycans in sars-cov- is unknown. by sequencing the whole genome of sars-cov- , we identified two variants having deletion mutations at the polybasic cleavage site (prrar) and its flanking sites. more extensive screening indicates that the deletion at the flanking sites of prrar has been frequently observed in cell-isolated strains and could be verified by multiple sequencing methods. the first covid- clinical case in guangdong was reported on th january, with illness onset on st to investigate whether the deletions described above are random mutations occasionally identified in a strain or commonly occur after cell passages, we performed whole genome sequencing on the other sars-cov- viral strains collected after rounds of cell passage in vero-e or vero cells (supplemental table) . the corresponding original samples for these strains were collected between th january and th february .
multiplex pcr combined with nanopore sequencing was used, following the general protocol described at (https://artic.network/ncov- ). the artic pipeline was applied to trim primers and generate the bam files, which included all reads mapping to the sars-cov- reference genome (mn . ). variant sites were called using ivar with depth >= as a threshold. with this method, of cell isolate strains had different ratios of variants (> %) with a deletion at the flank of the polybasic cleavage site (deletion at - ) ( figure c) . one had the variant with a deletion at the polybasic cleavage site (deletion at - ). to find out whether the deletion at - was restricted to a specific genetic lineage, we next investigated the phylogenetic relationship of these strains and the first strain described above. as shown in figure d , the strains with a relatively higher ratio of this deletion were dispersed in the phylogenetic tree, suggesting that the deletion mutation was not restricted to a specific genetic lineage of sars-cov- viruses. to identify whether these deletions also occurred in original clinical samples, we screened the high-throughput sequencing data from clinical samples, which were collected between th february and th march in guangdong, china. these samples were sequenced using multiplex pcr combined with nanopore sequencing. there were sars-cov- genomes with average sequencing depth >= at the sites neighboring . as shown in table , the variants with the deletion at - were found in ( %) of clinical samples with ratios ranging from . - . %, indicating that this deletion may also occur in in vivo infections, even though the rate was extremely low compared to the in vitro results ( figure d ). to date, there are no genome sequences deposited in public datasets having this deletion. however, this does not mean that this variant does not exist in currently released sequences, since most variants with a lower ratio would be discarded when generating the final consensus sequences.
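the screening logic described above (a depth threshold at the site, followed by a per-sample ratio of deletion-supporting reads) can be sketched as follows. this is a minimal illustration in python and not the artic or ivar implementation; the function names, the input layout, and the cut-offs `min_depth` and `min_ratio` are hypothetical placeholders, because the exact thresholds are not recoverable from the text.

```python
from typing import Dict, Optional, Tuple


def deletion_ratio(deletion_reads: int, depth: int, min_depth: int) -> Optional[float]:
    """Return the fraction of reads supporting the deletion.

    Returns None when the site is not covered deeply enough to be called
    (depth below the assumed `min_depth` cut-off, or zero depth).
    """
    if depth <= 0 or depth < min_depth:
        return None
    return deletion_reads / depth


def screen_samples(
    samples: Dict[str, Tuple[int, int]],
    min_depth: int,
    min_ratio: float,
) -> Dict[str, float]:
    """Keep samples whose deletion ratio exceeds `min_ratio`.

    `samples` maps a sample id to (deletion_reads, depth) at the site of
    interest; both thresholds are assumptions supplied by the caller.
    """
    hits = {}
    for sample_id, (deletion_reads, depth) in samples.items():
        ratio = deletion_ratio(deletion_reads, depth, min_depth)
        if ratio is not None and ratio > min_ratio:
            hits[sample_id] = ratio
    return hits


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    demo = {"isolate_a": (50, 100), "isolate_b": (1, 200), "isolate_c": (5, 10)}
    print(screen_samples(demo, min_depth=20, min_ratio=0.1))
```

a low-frequency deletion that passes this kind of screen would still be absent from a majority-rule consensus sequence, which is the point made in the text about public datasets.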
a new coronavirus associated with human respiratory disease in china coronavirus disease (covid- ) situation reports origin and evolution of pathogenic coronaviruses the proximal origin of sars-cov- identifying sars-cov- related coronaviruses in malayan pangolins evidence and characteristics of human-to-human transmission of sars-cov- an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar the chinese sars molecular epidemiology consortium. molecular evolution of the sars coronavirus during the course of the sars epidemic in china attenuation of replication by a nucleotide deletion in sars-coronavirus acquired during the early stages of human-to-human transmission a novel bat coronavirus reveals natural insertions at the s /s cleavage site of the spike protein and a possible recombinant origin of hcov- genome sequence archive key: cord- - d mn ok authors: gouveia, duarte; miotello, guylaine; gallais, fabrice; gaillard, jean-charles; debroas, stéphanie; bellanger, laurent; lavigne, jean-philippe; sotto, albert; grenga, lucia; pible, olivier; armengaud, jean title: proteotyping sars-cov- virus from nasopharyngeal swabs: a proof-of-concept focused on a min mass spectrometry window date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: d mn ok rapid but yet sensitive, specific and high-throughput detection of the severe acute respiratory syndrome coronavirus (sars-cov- ) in clinical samples is key to diagnose infected people and to better control the spread of the virus. alternative methodologies to pcr and immunodiagnostic that would not require specific reagents are worth to investigate not only for fighting the covid- pandemic, but also to detect other emergent pathogenic threats. here, we propose the use of tandem mass spectrometry to detect sars-cov- marker peptides in nasopharyngeal swabs. 
we documented that the signal from the microbiota present in such samples is low and can be overlooked when interpreting shotgun proteomic data acquired on a restricted window of the peptidome landscape. mock ("simili") nasopharyngeal swabs spiked with different quantities of purified sars-cov- viral material were used to develop a nanolc-ms/ms acquisition method, which was then successfully applied to covid- clinical samples. we argue that peptides adetqalpqr and gfyaqgsr from the nucleocapsid protein are of utmost interest as their signal is intense and their elution can be obtained within a min window in the tested conditions. these results pave the way for the development of time-efficient viral diagnostic tests based on mass spectrometry. the new severe acute respiratory syndrome-related coronavirus (sars-cov- ) is the causative agent of covid- , the coronavirus disease that was first reported in december in the city of wuhan, china ( ) . due to its easy inter-human transmission, sars-cov- has since quickly spread worldwide, causing more than million covid- diagnosed infections and more than thousand deaths officially reported as of mid june (https://covid .who.int/). the rapid, sensitive and specific detection of the sars-cov- virus in large cohorts of clinical samples is of utmost importance to identify infected people and control the propagation of the virus by specific containment measures. at the same time, being able to catch the numerous sars-cov- variants represents an opportunity to identify attenuated forms of the virus ( ) . however, the occurrence of specific mutations, especially deletions, may challenge current molecular detection methodologies. the research community has invested great effort in the development of quick and accurate detection tests ( , ) . the gold standard in diagnostics relies on the amplification and measurement of the viral rna by reverse transcription polymerase chain reaction (rt-pcr).
rt-pcr is highly specific and achieves a good compromise between speed ( - min) and sensitivity. however, due to the great demand for pcr-based testing, shortage of rna extraction kits and pcr reagents may have limited the testing capacity in some countries at the early stage of the pandemic ( ) . besides, rt-pcr testing of clinical samples may in some cases be less efficient due to nucleic acid variations in the targeted regions (the primers or their close vicinity) that could affect the amplification rate ( , ) . for these reasons, alternative detection strategies that address these concerns should be developed to complement conventional tools. immunoassays, whole-genome sequencing ( ) and mass spectrometry (ms) ( ) technologies are commonly suggested alternatives to pcr-based assays. among these, new generation ms offers a highly sensitive technology that allows the rapid identification of thousands of proteins present in a single sample. the typing of organisms by tandem mass spectrometry (ms/ms), commonly referred to as "proteotyping", is based on the identification of specific peptide sequences that allow the unambiguous identification of organisms ( ) ( ) ( ) . the uniqueness of the mass to charge ratios and fragmentation patterns measured in ms/ms allows identifying peptides that differentiate organisms at the subspecies level. although classical ms-based identification of pathogens in the clinical setting is based on whole-cell maldi-tof technology ( ) , the field has thrived with the increases in speed, sensitivity, and accuracy of new ms instrumentation in the last decade. the coupling of new generation instruments with the separation power of liquid chromatography makes lc-ms/ms a valuable technology to implement in the routine of clinical laboratories. despite their high potential, the application of lc-ms/ms approaches to virus proteotyping remains scarce.
among the few examples available in the literature, lc-ms/ms was shown to be able to detect purified influenza virus ( ) and human metapneumovirus in clinical samples ( ) . because of the considerable damages of the covid pandemic, the mass spectrometry community quickly proposed to mobilize its efforts at helping to understand the molecular mechanisms of infection ( ) ( ) ( ) and at improving detection methods ( ) . several research groups started investigating ms-based quantification of peptides for the detection of sars-cov- in clinical samples, but these results are not yet published ( ) ( ) ( ) ( ) . these preliminary results indicate that targeted ms, in which the mass spectrometer is programmed to precisely detect and quantify a limited number of peptides of interest, can be successfully applied to virus detection. targeted ms is considered as the gold standard for peptide quantification due to its higher sensitivity when compared to shotgun proteomics approaches. nevertheless, this approach has a much lower throughput and is commonly used to test hypotheses on a subset of proteins of interest, in contrast to discovery shotgun proteomics. by being more flexible, the latter provides a more comprehensive picture of the viral peptidome, including the detection of variant sequences because of the possibility of detecting peptides without any previous knowledge of their sequences. here we established the proof of concept of the use of ms/ms for the rapid proteotyping of sars-cov- from clinical samples. we recently published a dataset from a shotgun lc-ms/ms experiment performed with sars-cov- infected cells and proposed a list of specific viral peptides that could be used for the development of targeted approaches ( ) . interestingly, we observed that some sars-cov- specific peptides eluted from the lc column at narrow windows of retention time. 
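to make the link between a marker peptide sequence and what the mass spectrometer measures more concrete, the following python sketch computes the monoisotopic mass and the precursor m/z of the two nucleocapsid peptides named in the abstract (adetqalpqr and gfyaqgsr). it is an illustration only, not part of the authors' workflow: it assumes unmodified peptides and standard monoisotopic residue masses, and the choice of doubly and triply protonated ions is an assumption, since the charge states are not recoverable from the text.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of one proton


def peptide_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def precursor_mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a positive charge state z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for peptide in ("ADETQALPQR", "GFYAQGSR"):
        for z in (2, 3):  # assumed charge states
            print(peptide, z, round(precursor_mz(peptide, z), 4))
```

for the doubly charged ions this gives values close to m/z 564.79 (adetqalpqr) and 443.21 (gfyaqgsr); a targeted or narrow-window method would look for these precursors together with their fragmentation patterns.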
here, we used lc-ms/ms with an orbitrap instrument (q exactive hf) for analyzing the peptidome from nasal swabs spiked with different quantities of viral material. by using a short lc gradient focusing on the region of interest identified in our previous study, we tested the detection of the virus in samples containing different quantities of viral peptides, as well as covid- clinical samples, paving the way for the development of time-efficient viral diagnostic tests based on an alternative platform. nasopharyngeal swab collection and processing for reference matrices. two nasopharyngeal swabs were collected using a sterile polyester swab with semi-flexible polystyrene handle (puritan) from two healthy volunteers (swabs r and r ). each swab was soaked in a tube containing µl of sterile water, incubated for min at room temperature, and then rinsed with µl of sterile water. the biological material from the µl of solution was precipitated with the addition of µl of trichloroacetic acid at % (w/v) and centrifugation at , g for min. the supernatant was discarded. the barely visible pellet was dissolved in µl of lds x containing % beta-mercaptoethanol, heated for min at °c, and centrifuged briefly. for each swab sample, a volume of µl of lds x sample was deposited on an sds-page gel and run for min. after migration, the gel was rinsed with water, stained with simplyblue safestain (invitrogen), and destained overnight in water. the two polyacrylamide gel bands corresponding to the whole proteome of each matrix were excised, processed as described ( ), and then subjected to trypsin gold proteolysis (promega) using . % proteasemax surfactant (promega). the nasal matrix peptide fractions were µl for each swab. peptides from the nasal swab matrices were analysed with a q-exactive hf mass spectrometer (thermo) coupled with an ultimate lc system (dionex-lc) and operated in data-dependent mode as previously described ( ).
a volume of µl of peptides was injected, desalted onto an acclaim pepmap c pre-column ( µm, Å, µm id x mm), and then resolved onto a nanoscale acclaim pepmap c column ( µm, Å, µm id x cm) with a -min gradient at a flow rate of . µl/min. the gradient was developed from to % of ch cn, . % formic acid in min, and then from to % in min, washed, and re-equilibrated. peptides were analysed with scan cycles initiated by a full scan of peptide ions in the orbitrap analyser, followed by high-energy collisional dissociation and ms/ms scans on the most abundant precursor ions (top method). full scan mass spectra were acquired from m/z to at a resolution of , with internal calibration activated on the m/z . signal. ion selection for ms/ms fragmentation and measurement was performed applying a dynamic exclusion window of sec and an intensity threshold of x . only ions with positive charges + and + were considered. vero e (atcc, crl- ) cells were cultured at °c in % co in dulbecco's modified eagle's medium (dmem, gibco, thermofisher) supplemented with % fetal calf serum (fcs) and . % penicillin-streptomycin. the sars-cov- strain -ncov/italy-inmi (genbank mt ) was provided by the lazzaro spallanzani national institute of infectious diseases (rome, italy) via the evag network (european virus archive goes global). sars-cov- stocks used in the experiments had undergone two passages on vero e cells and were stored at - °c. virus titer was . x plaque forming units (pfu)/ml, as determined by standard plaque assay (three dilutions in duplicate). all experiments entailing live sars-cov- were performed in our biosafety level facility and strictly followed its approved standard operating procedures. vero e cells ( x ) seeded into cm flasks were grown to cell confluence in ml dmem supplemented with % fcs and . % penicillin-streptomycin for one night at °c under % co . they were infected at a multiplicity of infection (moi) of . .
cells were harvested at days post infection (dpi) and the viral suspension was recovered after centrifugation at , rpm for min to remove cell debris. ml of the viral suspension were laid on ml of % (w/v) sucrose cushion prepared in nacl . m, edta mm, mm tris hcl buffer (ph . ) (tne buffer) in ultra-clear ml tubes (beckman coulter). samples were centrifuged at , rpm for h at °c. pellets were solubilised in l of cold tne buffer and a volume of . ml was laid on a five-step - % (w/v) sucrose gradient prepared in ultra-clear ml tubes (beckman coulter). the tubes were centrifuged at , rpm for h at °c in a beckman sw rotor. after recovery of the virus band, the viral suspension was inactivated by incubation with beta-propiolactone at a final concentration of . % for h at °c. plaque assay titration was used to quantify the purified virus and validate the viral inactivation. the inactivated purified virus sample (equivalent to . x pfu/ml) was quantified in terms of protein concentration ( . mg/ml) by uv spectrophotometry. a volume of µl was mixed with µl of lds x to obtain a protein fraction of . mg/ml. after denaturation at °c for min, a volume of µl ( . µg of proteins) was deposited on a nupage - % gel (invitrogen) and subjected to min electrophoretic migration. the whole proteome was excised as a single polyacrylamide gel band and subjected to trypsin proteolysis as previously described ( ). an aliquot of µl of peptides was extracted. ms/ms analysis was performed to confirm the high content of viral proteins in this sample (data not shown). sars-cov- viral peptides ( µl) were diluted in µl of h o, . % tfa. after mixing, µl of this tube was removed and diluted with µl of h o, . % tfa. this was repeated several times to obtain a one-third dilution cascade of viral peptides. two series of simili swabs were prepared in parallel. the two peptide fractions obtained from nasopharyngeal swabs ( µl) were diluted with µl of h o, . % tfa.
a volume of µl of this diluted matrix was added to each simili swab sample, giving a final volume of µl per sample. thus, each simili swab contained the equivalent of . % of the proteins harvested by a nasal swab. two biological replicates were prepared using each nasal swab matrix. a volume of µl per sample was injected in the q-exactive hf tandem mass spectrometer. the samples were analysed in the same conditions as above except that the gradient was developed from to . % of ch cn, . % formic acid for min at a flow rate of . µl/min. the min ms/ms acquisition started min after injection. nasopharyngeal swabs were collected from covid- diagnosed adult patients as routine medical controls and tested by rt-pcr assay for detecting sars-cov- in a nasopharyngeal sample (swabs t -t ). this study was approved by the institutional review boards of the university hospital of nîmes, france ( - - ). patients had previously been informed that part of these samples could be used for research purposes and had agreed. each swab was soaked in a tube containing ml of phosphate buffered saline (ph . ) sterile solution and transferred to the biosafety level facility. the biological material was precipitated with the addition of . ml of trichloroacetic acid at % (w/v). after centrifugation, the supernatant was discarded and the pellet was dissolved in µl of lds x containing % beta-mercaptoethanol, heated for min at °c, and deposited on a nupage - % gel (invitrogen). the proteins were subjected to min electrophoresis and treated as described above to obtain tryptic peptides. ms/ms acquisition was done as for the simili swabs. the min ms/ms acquisition started min after injection with an inclusion list comprising m/z values corresponding to viral peptides. ms/ms spectra from the nasopharyngeal swabs were searched against the generalist ncbinr database ( , , sequences totalling , , , amino acids) with the mascot daemon . . search engine (matrix science).
the search parameters were as follows: full trypsin specificity, maximum of two missed cleavages, mass tolerances of ppm on the parent ion and . da on the ms/ms, carbamidomethylated cysteine (+ . ) as a fixed modification, and oxidized methionine (+ . ) and deamidation of asparagine and glutamine (+ . ) as variable modifications. psms with an fdr< % were selected for peptide inference. peptides were assigned to taxa using the unipept . web interface ( ) with default parameters (equate i/l, filter duplicate peptides). ms/ms spectra from the simili sars-cov- contaminated swabs and from the covid- nasopharyngeal swabs were assigned with the mascot daemon . . search engine (matrix science) as follows: the spectra were first queried against the crap_contaminants_ - - .fasta file and then against the swissprot_human_isl_ _ - - database ( , sequences totalling , , amino acids) in follow-up mode and with the decoy option activated. this last database merges the sars-cov- viral proteins and the swissprot human proteome. the mascot search was performed with the same parameters as above. all peptide matches presenting a mascot peptide score with an fdr lower than % were assigned to protein sequences. ms peak areas were evaluated with skyline ( ). briefly, we created spectral libraries based on the dat files from each mascot search (cut-off . ) and uploaded the ms full scan information contained in the raw files. the protein database previously used for the mascot search was used as background proteome. only the viral proteins were added to the target panel. peptide settings were matched to those used in the mascot search. peak picking was manually checked for all peptides. the mass spectrometry and proteomics data acquired on simili swabs have been deposited to the proteomexchange consortium via the pride partner repository ( ) with the dataset identifiers pxd and . /pxd .
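the fdr-based selection of peptide-to-spectrum matches described above relies on a target-decoy search. as a rough illustration only, the sketch below shows one common way to derive a score cut-off from target and decoy matches. it is not the procedure implemented in mascot; the function name, the (score, is_decoy) input format, and the default threshold of 0.01 are assumptions made for this example, since the threshold value is not legible in the text.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cut-off whose estimated FDR stays <= max_fdr.

    psms    -- iterable of (score, is_decoy) pairs (hypothetical input format)
    max_fdr -- illustrative threshold; the value used in the study is an assumption
    The FDR is estimated as (decoy hits) / (target hits) above the cut-off.
    Returns None when no cut-off satisfies the threshold.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep lowering the cut-off while the estimated FDR is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def accepted_targets(psms, max_fdr=0.01):
    """Return the target PSM scores that pass the cut-off, highest first."""
    cutoff = score_cutoff_at_fdr(psms, max_fdr)
    if cutoff is None:
        return []
    return sorted((s for s, d in psms if not d and s >= cutoff), reverse=True)
```

for instance, calling accepted_targets on a list of scored matches returns only the target scores above the derived cut-off; a stricter max_fdr yields a higher cut-off and fewer accepted matches.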
to assess the performance of shotgun ms-based proteomics in detecting sars-cov- peptides in a background matrix consisting of nasopharyngeal swab protein material, we experimentally created tryptic peptidomes from i) a purified virus solution obtained from vero e cells infected with a sars-cov- reference strain, and ii) nasopharyngeal swabs obtained from two healthy volunteers (figure ). we first characterized the nasal peptidomes and searched for the presence of detectable microorganisms by metaproteomic data analysis. then, the virus peptidome was serially diluted into nasopharyngeal swab peptidomes to obtain two sets of seven tubes containing from ng (equivalent to infectious particles) to . ng (equivalent to infectious particle) of viral protein material. the fourteen samples were subsequently analyzed by lc-ms/ms. a window of min of acquisition within a min lc gradient was adjusted to target the region of elution of five previously identified virus-specific peptides ( ). the rationale for focussing the mass spectrometry measurements on these peptides was their remarkable sequence conservation amongst the numerous sars-cov- strains sequenced to date and/or their specificity to the novel coronavirus ( ). these peptides were the following: eitvatsr, gfyaegsr, htpinlvr, iaghhlgr, and adetqalpqr. while known variants exist for the latter, the other four peptides are conserved across the sars-cov- genomes sequenced so far. the ms/ms spectra acquired over min on the two nasopharyngeal swabs were analyzed to infer the main microbial components present in these samples, as their presence should be taken into account when creating an ad hoc database for ms/ms interpretation. swabs r and r yielded , and , ms/ms spectra, respectively, from which , and , were attributed to , and , peptide sequences from organisms present in the ncbinr database (fdr < %).
these peptide sequences were analyzed with the unipept tool ( ) to assess the biodiversity present in each sample through their taxon-specificity characteristics based on the lowest common ancestor approach. only a small proportion of the peptide sequences mapped by unipept belonged to microorganisms (table s ). a rather low number ( and ) of peptides from the r and r swabs, respectively, were attributed to bacteria, archaea or fungi. these corresponded to . % and . % of the mapped peptide sequences, respectively. to exclude false positive identifications, we applied a threshold of at least three taxon-specific peptides for organism validation at the species level, corresponding to . % of the total number of species-specific peptides ( and in each sample), as suggested by ( ). thus, one low-abundance corynebacterium was identified in sample r , namely corynebacterium accolens, with specific peptides. in swab r , corynebacterium propinquum, corynebacterium pseudodiphtheriticum, and dolosigranulum pigrum could be identified at the species taxonomical rank with , and specific peptide sequences, respectively. simili swabs containing specific quantities of sars-cov- virus and the equivalent of . % of the nasal matrix protein material collected during sampling were analysed by ms/ms with a short gradient. we first confirmed on the most diluted fraction that the bacterial signal was negligible for both fractions and thus did not need to be considered at the ms/ms attribution search stage. for this, the two datasets were searched against the generalist database ncbinr to check for the presence of non-human peptides in the swab peptidomes. the unipept analysis of the detected peptide sequences showed that only and peptides, from replicate and , respectively, were attributed to bacteria and no bacterial species could be confidently identified (table s ).
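the taxonomic assignment described above combines a lowest common ancestor step with a minimum of three taxon-specific peptides per species. the following sketch illustrates that logic under simplifying assumptions: lineages are given as root-to-species lists of names, the species rank sits at a fixed depth, and the function names and input format are hypothetical. it is not the unipept implementation.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Return the shared root-to-taxon prefix of several lineages.

    Each lineage is a list of taxon names ordered from root to species
    (a simplified, hypothetical representation of a taxonomy path).
    """
    lca = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca.append(ranks[0])
    return lca


def validated_species(peptide_lineages, species_depth, min_peptides=3):
    """Return species supported by at least min_peptides species-specific peptides.

    peptide_lineages -- dict mapping a peptide to the lineages of all organisms
                        that contain it (hypothetical input format)
    species_depth    -- number of ranks in a lineage that reaches species level
    A peptide counts as species-specific when its LCA reaches species depth.
    """
    counts = Counter()
    for lineages in peptide_lineages.values():
        lca = lowest_common_ancestor(lineages)
        if len(lca) >= species_depth:
            counts[lca[species_depth - 1]] += 1
    return {species for species, n in counts.items() if n >= min_peptides}
```

a peptide shared by two species of the same genus resolves only to the genus and therefore does not count toward either species, which is why the threshold is applied to species-specific peptides only.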
the short gradient ms analysis on the simili swabs, searched against the specific human/virus database, yielded , ms/ms spectra recorded in the fourteen samples. from these, , were attributed to , peptide sequences with an fdr below % (table s ). these data allowed for the identification of , protein groups (table s ). a small fraction of peptide-to-spectrum matches (psms), corresponding to . % of the total psms, allowed the identification of different viral peptide sequences, including the five peptides of interest. the peptides report on structural proteins from the virus: peptides from the nucleocapsid protein (n), peptides from the spike protein (s), and peptides from the membrane glycoprotein (m). at least one viral peptide was identified in all samples independently of the concentration of the viral material, from ng ( pfu) to ng ( pfu). however, no peptide from the virus was identified in the sample containing viral peptides corresponding to . ng ( pfu). the heatmap in figure displays the ms peak areas, the number of psms attributed to each peptide in each sample, and the number of viral peptides identified in each sample. the five peptides with the highest ms peak areas across samples were the following: eitvatsr, gfyaegsr, lnqlesk, adetqalpqr, and kadetqalpqr. among them, eitvatsr, gfyaegsr, and adetqalpqr are among the five peptides of interest. peptide htpinlvr was the seventh most abundant. conversely, peptide iaghhlgr was amongst the peptides with the lowest ms peak areas, along with peptides msecvlgqsk, lddkdpnfk, and eidrlnevak. as expected, the number of identified peptides decreased with the decreasing viral load in the sample. while all peptides were identified in the initial dilution containing ng of viral proteins ( pfu), in highly diluted samples containing ng of viral proteins ( pfu) and ng ( pfu), the virus was proteotyped with only and peptides, respectively.
in these samples only peptides from protein n were detected (adetqalpqr, kadetqalpqr, and gfyaegsr). generally, the peptides from protein n were the most consistently detected across samples. despite being among the peptides with higher peak areas in the chromatograms, peptides of interest htpinlvr and eitvatsr were only detected in the simili swabs containing an estimated ng of viral proteins ( pfu). on the other hand, the two other peptides of interest from protein n, gfyaegsr and adetqalpqr, allowed virus proteotyping in the sample containing ng of viral proteins ( pfu). of note, the peptide identified in the condition with ng of viral proteins ( pfu) is a missed-cleavage version of the adetqalpqr peptide: kadetqalpqr. peptide asanlaatk, which had not been previously selected among the "best" candidates for sars-cov- proteotyping ( ), was detected in the dilution with ng of viral proteins ( pfu) and was the most sensitive peptide from protein s. with the lc gradient used in this experiment and the delay for starting the acquisition, the retention times of peptides are shifted by minus - minutes compared to those described in our previous paper. peptides are generally well distributed along the gradient, with the exception of some peptide pairs that co-elute: kadetqalpqr/rvdfcgk, qlqqsmssadstqa/cygvsptk, adetqalpqr/gfyaegsr, and htpinlvr/eidrlnevak. six peptides elute in the first ten minutes of the gradient: iaghhlgr, kkadetqalpqr, asanlaatk, lnqlesk, kadetqalpqr, and rvdfcgk. of the utmost interest, three of the most conserved and well-detected peptides, eitvatsr, adetqalpqr, and gfyaegsr, elute in a -min window between and min of the gradient. nasopharyngeal swabs were sampled from nine covid- diagnosed patients with different clinical manifestations (moderate symptoms and asymptomatic) and at different postdiagnostic stages (table ).
due to the complexity of the samples, an inclusion list of m/z signals corresponding to the five peptides of interest as well as other sars-cov- peptides detectable in this gradient region ( ) was added to the acquisition method to increase the likelihood of their detection. table s reports this inclusion list, which contained m/z values for different precursors from different viral peptide sequences. the short gradient ms analysis on these clinical samples yielded between and , ms/ms spectra recorded per sample. sixty-five spectra were attributed to viral peptide sequences with an fdr below % (table s ). these data allowed for the detection of six peptides reporting on two viral proteins (table s ): lddkdpnfk, kadetqaipqr, kkadetqaipqr, adetqaipqr, gfyaegsr from protein n, and eitvatsr from protein m. the heatmap in figure displays the ms peak areas, the number of psms attributed to each peptide in each sample, the number of viral peptides identified in each sample, and the result from the pcr testing performed on the same sample. the virus was confidently proteotyped in clinical swabs t and t , with four and five peptides respectively. peptide eitvatsr was identified in swab t with two spectral counts, but virus detection in this sample cannot be validated since this peptide is not specific to sars-cov- ( ). as shown in table , swabs t and t correspond to patients who were diagnosed as sars-cov- positive by rt-pcr with relatively clear viral loads (ct values of and , respectively) and were sampled and days after their diagnosis and confinement. ms/ms results were negative for swabs that yielded a relatively low pcr signal (ct of and for swabs t and t ), an undetectable pcr signal (swabs t , t , t , and t ), or that were sampled days after diagnosis (swab t ). the negative ms/ms signals for swabs t , t , and t are explained by the very low viral load probably present in these samples.
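an inclusion list such as the one used for the clinical samples is a set of precursor m/z values derived from the peptide sequences. the sketch below shows how such values can be computed from monoisotopic residue masses. it is an illustration, not the procedure used to build table s : the function names are hypothetical, the default charge states (2+ and 3+) are an assumption because the charge states are not legible in the text, and cysteine is treated as carbamidomethylated to match the fixed modification stated in the search parameters.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # mass of H2O added to the residue sum
PROTON = 1.007276            # mass of one proton
CARBAMIDOMETHYL = 57.021464  # fixed modification on cysteine


def peptide_mass(sequence):
    """Monoisotopic neutral mass of a peptide, with carbamidomethylated cysteine."""
    mass = WATER
    for residue in sequence.upper():
        mass += RESIDUE_MASS[residue]
        if residue == "C":
            mass += CARBAMIDOMETHYL
    return mass


def precursor_mz(sequence, charge):
    """m/z of the protonated precursor ion at the given positive charge."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def inclusion_list(peptides, charges=(2, 3)):
    """Sorted (m/z, peptide, charge) entries; the charge states are an assumption."""
    entries = [
        (round(precursor_mz(peptide, z), 4), peptide, z)
        for peptide in peptides
        for z in charges
    ]
    return sorted(entries)
```

because leucine and isoleucine have identical masses, adetqalpqr and adetqaipqr give the same precursor m/z in this calculation, which is consistent with the "equate i/l" setting used for taxonomic assignment.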
to foster the development of alternative detection methods for sars-cov- , we performed a proof-of-concept study to assess the potential of ms/ms for proteotyping sars-cov- : i) in simulated nasal swabs containing different quantities of viral peptides; and ii) in nasopharyngeal swabs from covid- diagnosed patients. the two nasal peptidomes collected from healthy donors for the first experiment were first analyzed with a gradient of min to check for the presence of detectable microorganisms from the natural microbiota. a search against a generalist database such as ncbinr detected only trace levels of very low-abundance bacteria commonly found in the nasal tract ( ), thus confirming the absence of a measurable microbiome in the swab samples. based on this metaproteomic analysis, we used a human-only database as representative of the nasopharyngeal matrices for the subsequent analysis. the simili sars-cov- contaminated swabs contained a fixed amount of swab peptidome, plus a precise amount of viral peptidome corresponding to the expected quantities extracted from ng ( pfu), ng ( pfu), ng ( pfu), ng ( pfu), ng ( pfu), ng ( pfu), and . ng ( pfu) of sars-cov- . it is important to note that the virus produced in vero e cells and purified on sucrose gradient is only partially infectious, and thus the data are also presented in quantities of viral proteins. the real number of viral particles could be much higher in these samples; it could be roughly estimated, since the molecular weight of each viral protein is known, if the numbers of molecules per virus particle were documented for sars-cov- . here, we refer to the infectious dose as this is the most important parameter in terms of health concern, but the ratio of infectious particles in the nasopharyngeal swabs of patients may drastically differ from the purified virus fraction used here, and could even fluctuate during the course of the pathology.
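the seven viral quantities listed above result from the one-third dilution cascade described in the methods. as a minimal sketch, the functions below compute the dilution factor obtained from a transfer volume and a diluent volume, and the resulting series of amounts. the function names and the numerical values used in any call are illustrative assumptions, since the actual volumes and starting quantity are not legible in the text.

```python
def dilution_factor(transfer_volume, diluent_volume):
    """Fold dilution obtained by mixing a transferred aliquot with diluent.

    Both volumes must use the same unit; the values are up to the caller.
    """
    return (transfer_volume + diluent_volume) / transfer_volume


def dilution_cascade(start_amount, factor=3.0, steps=7):
    """Amounts per tube for a serial dilution, starting with the undiluted tube.

    start_amount -- quantity in the first tube (illustrative, any unit)
    factor       -- fold dilution between consecutive tubes (one third -> 3.0)
    steps        -- number of tubes; seven matches the seven conditions described
    """
    amounts = []
    amount = float(start_amount)
    for _ in range(steps):
        amounts.append(amount)
        amount /= factor
    return amounts
```

for example, transferring one volume of sample into two volumes of diluent gives a three-fold dilution, and seven consecutive tubes span a 729-fold range between the first and the last.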
the strategy proposed for the analysis of these simili swabs consisted of a shotgun ms analysis based on a short acquisition of min with a short lc gradient. for the clinical samples, we added an inclusion list of viral peptides in the ms method. the inclusion list made it possible to force the fragmentation of candidate viral peptide ions contained in the background matrix, even when they were not included in the top from the data-dependent acquisition method. the shotgun strategy resulted in the detection of viral peptides in six out of the seven conditions tested for the simili swab experiment. from the five peptides of interest, gfyaegsr and adetqalpqr proved to be the most detectable and most sensitive in this background matrix, allowing proteotyping of the virus down to the condition of ng of viral material ( pfu). one of the most interesting results was the omnipresence of peptide adetqalpqr and its two missed-cleavage versions kadetqalpqr and kkadetqalpqr. these peptides were consistently detected in out of identifications in the six most abundant conditions from the simili swab experiment. peptide kadetqalpqr was identified in all simili swabs from figure . these results clearly show that adetqalpqr, despite being prone to missed cleavages, is one of the most abundant and ionisable peptides and should be the main target for proteotyping sars-cov- . this result was confirmed by the analysis of the clinical swab samples, since peptides adetqalpqr, kadetqalpqr, and kkadetqalpqr were undoubtedly the most abundant in samples from covid- patients (figure and table ). in our previous work, we showed that this peptide sequence is also specific to sars-cov- , but presented several variants among the available sars-cov- genomes ( ). therefore, when targeting this peptide for viral detection with ms/ms we can also take into account both its missed-cleavage versions and its different variants.
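the relationship between adetqalpqr and its missed-cleavage versions can be illustrated with an in-silico tryptic digestion. the sketch below assumes the usual trypsin rule (cleavage after k or r, except when the next residue is p) and allows up to two missed cleavages, as in the search parameters. the function names are hypothetical, and the sequence used in the example call is a short illustrative fragment rather than the full nucleocapsid protein sequence.

```python
def cleavage_pieces(protein):
    """Split a protein after every K or R that is not followed by P."""
    pieces = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece without K/R
    return pieces


def digest(protein, max_missed=2):
    """Return the set of tryptic peptides with up to max_missed missed cleavages."""
    pieces = cleavage_pieces(protein.upper())
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            # Joining j - i + 1 consecutive pieces means j - i missed cleavages.
            peptides.add("".join(pieces[i:j + 1]))
    return peptides


# Illustrative fragment (an assumption for this example, not the full protein).
fragment = "KKKADETQALPQRQK"
forms = digest(fragment, max_missed=2)
for peptide in ("ADETQALPQR", "KADETQALPQR", "KKADETQALPQR"):
    print(peptide, peptide in forms)
```

with no missed cleavage allowed, only the fully cleaved form is produced, so a targeted method that monitors adetqalpqr alone would miss the signal carried by the two longer forms.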
surprisingly, the high-intensity peptide eitvatsr was only identified in simili swabs with a high concentration of viral protein material (figure ). by analyzing the ms and ms/ms spectra from these samples, we confirmed that this peptide co-eluted with another intense precursor from the background matrix that was fragmented simultaneously. the low mascot ion score attributed to these spectra hindered the confident identification of this peptide. this co-elution effect is most likely due to the use of the short chromatographic gradient, and one way to tackle it would be to use smaller isolation windows for fragmentation. this parameter was tested for the analysis of clinical swabs, but little or no improvement was observed. no ms/ms spectra were validated at fdr % for this peptide in swabs t and t , even with the presence of an ms peak corresponding to this peptide in swab t . this peptide is therefore problematic in this type of matrix and probably not suited for tracking sars-cov- in nasal swab samples with our specific experimental setup. the distribution of the peptides along the chromatogram in figure shows that the two most detectable peptides, gfyaegsr and adetqalpqr, eluted in a narrow window of retention time between - min in simili swab samples. for the clinical swab samples, we observed that the retention time for these two peptides was . ± . min for peptide adetqalpqr, and . ± . min for peptide gfyaegsr, as established with skyline (table s ). in the light of these new results, we argue that targeting peptides adetqalpqr and gfyaegsr with an extra short lc gradient of min, coupled to the enrichment of these hydrophilic peptides prior to lc injection, could be one way to develop quick and robust assays for detection of the virus in clinical samples and to improve the signal-to-noise ratio.
besides their high intensity, these peptides provide the needed specificity for a confident assay: peptide gfyaegsr is highly conserved among different sars-cov- genomes, and peptide adetqalpqr is specific to sars-cov- . the simultaneous detection of these two peptides could therefore provide unequivocal evidence for the presence of the virus. interestingly, a recent, not yet published study showed the high potential of the same two peptides by using a targeted proteomics assay ( ). the authors report limits of detection in the mid-attomole range corresponding to theoretically , sars-cov- particles in their specific experimental set-up. besides shortening the lc gradient to less than three min, sample preparation can also be optimized to develop more rapid peptidome preparation assays and to remove overly hydrophilic and overly hydrophobic peptides that could saturate the chromatography column. here, we performed an sds-page gel and in-gel proteolysis with trypsin to denature proteins and to remove any compounds present in the nasal swab that could be deleterious to chromatography or mass spectrometry. this procedure is known to be suboptimal, as only % of the peptide material deposited on the gel is recovered. the literature is becoming rich in alternative sample preparation protocols for ms-based proteomics. for example, we recently proposed a proteotyping assay based on sp magnetic beads for protein purification and digestion in roughly min ( ). being easily adapted to -well plates and robotization, sp based digestion is the method of choice for quick, high-throughput, and highly reproducible proteome digestions, as recently demonstrated ( , ). such sample preparation may further significantly increase the sensitivity of the tandem mass spectrometry proteotyping proposed in the present work. furthermore, more sensitive instruments and ms acquisition modes could be tested to gain further sensitivity.
in conclusion, we tested in this study the potential of lc-ms/ms based methods for proteotyping sars-cov- in nasopharyngeal swabs. with a min ms-acquisition window, we were able to identify and quantify several virus-specific peptides that allowed proteotyping of the virus in simulated swabs and clinical swabs from covid- patients. we argue that peptides adetqalpqr (and its variant forms) and gfyaegsr from the nucleocapsid protein are of utmost interest to develop quick and robust targeted assays for proteotyping the virus in nasopharyngeal swab samples. further research must be done to validate their usefulness and their limits of detection in clinical samples, and to develop the shortest possible pipeline. ja conceived the study with help from dg, gm, lg, and op. gm and jcg performed the mass spectrometry experimental work. jpl and as contributed the medical covid- swab samples. fg, sd, and lb contributed the sars-cov- biological material. dg, lg, gm, op and ja analysed the data. dg and ja wrote the manuscript with help from lg. table s . list of m/z values of the inclusion list; table s . unipept taxonomical analysis of swab r , swab r , and simili swab r - pfu samples; table s . list of peptide-to-spectrum matches (psms) assigned from the simili sars-cov- swabs (fdr< %); table s . list of proteins from the simili sars-cov- swabs and their spectral counts; table s . list of peptide-to-spectrum matches (psms) assigned in the clinical samples (fdr< %); table s . list of proteins identified in the clinical samples and their spectral counts; table s . skyline report with experimental data on precursor ions from the nine swabs from covid- patients. patients were numbered from "swab t " to "swab t ".
a pneumonia outbreak associated with a new coronavirus of probable bat origin the importance of naturally attenuated sars-cov- in the fight against covid- development and clinical application of a rapid igm-igg combined antibody test for sars-cov- infection diagnosis detection of sars-cov- in different types of clinical specimens overcoming the bottleneck to widespread testing: a rapid review of nucleic acid testing approaches for covid- detection correlation of chest ct and rt-pcr testing in coronavirus disease (covid- ) in china: a report of cases chest ct for typical -ncov pneumonia: relationship to negative rt-pcr testing diagnosing covid- : the disease and tools for detection pathogen proteotyping: a rapidly developing application of mass spectrometry to address clinical concerns proteotyping for the rapid identification of influenza virus and other biopathogens proteotyping: proteomic characterization, classification and identification of microorganisms--a prospectus identification of different respiratory viruses, after a cell culture step, by matrix assisted laser desorption/ionization time of flight mass spectrometry (maldi-tof ms) quantification of viral proteins of the avian h subtype of influenza virus: an isotope dilution mass spectrometry method applicable for producing more rapid vaccines in the case of an influenza pandemic targeted proteomics of human metapneumovirus in clinical samples and viral cultures proteomics of sars-cov- -infected host cells reveals therapy targets the covid- ms coalition-accelerating diagnostics, prognostics, and treatment peptides for targeted studies from experimental data-dependent acquisition tandem mass spectrometry data taking the shortcut for high-throughput shotgun proteomic analysis of bacteria rna-binding proteins are a major target of silica nanoparticles in cell extracts the unipept metaproteomics analysis pipeline skyline: an open source document editor for creating and analyzing targeted proteomics 
experiments the pride database and related tools and resources in : improving support for quantification data evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture the human nasal microbiota and staphylococcus aureus carriage evaluation of sample preparation methods for fast proteotyping of microorganisms by tandem mass spectrometry automated sample preparation with sp for low-input clinical proteomics the authors are indebted to dr silvia meschi (national institute for infectious diseases "lazzaro spallanzani" irccs, via portuense , rome, italia) for making the human -ncov strain -ncov/italy-inmi ( n- ) available. this publication was supported by the european virus archive goes global (evag) project that has received funding from the european union's horizon research and innovation programme under grant agreement n° . the authors are also grateful to the french alternative energies and atomic energy commission (cea), and the anr program "phylopeptidomics" (anr- -ce - - ) that supported part of this study. the authors have declared no conflict of interest. key: cord- -az pgi k authors: turjya, rafeed rahman; khan, md. abdullah-al-kamran; islam, abul bashar mir md. khademul title: perversely expressed long noncoding rnas can alter host response and viral proliferation in sars-cov- infection date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: az pgi k background since december , the world is experiencing an unprecedented crisis due to a novel coronavirus, sars-cov- . owing to poor understanding of pathogenicity, the virus is eluding treatment and complicating recovery. regulatory roles of long non-coding rnas (lncrnas) during viral infection and associated antagonism of host antiviral immune responses has become more evident in last decade. 
to elucidate possible functions of lncrnas in the covid- pathobiology, we have utilized rna-seq dataset of sars-cov- infected lung epithelial cells. results our analyses uncover differentially expressed lncrnas whose functions are broadly involved in cell survival and regulation of gene expression. by network enrichment analysis we find that these lncrnas can directly interact with differentially expressed protein-coding genes adar, edn , kynu, mall, tlr and ywhag; and also akap l, exosc , gdf , hectd , larp b, larp , mipol , upf , mov and prkar a, host genes that interact with sars-cov- proteins. these genes are involved in cellular signaling, metabolism, immune response and rna homeostasis. since lncrnas have been known to sponge micrornas and protect expression of upregulated genes, we also identified micrornas that are induced in viral infections; however, some lncrnas are able to block their usual suppressive effect on overexpressed genes and consequently contribute to host defense and cell survival. conclusions our investigation determines that deregulated lncrnas in sars-cov- infection are involved in viral proliferation, cellular survival, and immune response, ultimately determining disease outcome and this information could drive the search for novel rna therapeutics as a treatment option. by mid-may , the confirmed cases of covid- have crossed million ( ), making it one of the largest pandemics in modern times. since the first reported case at wuhan, china in december ( ), this viral disease has been responsible for more than , deaths worldwide ( ), and the number is steadily rising. the causative agent behind the disease is a novel beta-coronavirus ( ), which has been named sars coronavirus i.e. sars-cov- due to its similarity to the earlier sars-cov first detected in ( ). the rapid spread of the virus, the ever-increasing death toll, and absence of a sufficient treatment strategy has affected societies and economies all over the globe. 
the molecular mechanism of sars-cov- is complex and interrelated with host mechanisms, as is common with most pathogenic viruses ( ). it is possible that infected lung performing rna-seq analysis to identify differentially expressed (de) genes. however, no study has yet determined the possible outcomes of the deregulated lncrnas in sars-cov- infection. in the present study, we identified lncrnas that are differentially expressed in the transcriptome of sars-cov- infected cells compared to uninfected cells, and then related the putative effects of the deregulated lncrnas to the tug-of-war between sars-cov- and the host. we also investigated the possible consequences of lncrna deregulation for covid- disease pathobiology. to investigate the probable deregulation of lncrnas in sars-cov- infection, we first analyzed the hours post-infection transcriptome data of sars-cov- infected nhbe cells. analyzing the rna-seq data led to the identification of de genes and, intriguingly, amongst those de genes we discovered lncrna genes, of them upregulated while were downregulated (figure ). we next sought to elucidate the putative effects of these deregulated lncrnas in sars-cov- infection. to that end, we built a network of the deregulated lncrnas along with their interacting protein-coding target genes. among the differentially expressed protein-coding genes, were found to interact with the de lncrnas (table ). there are direct rna-rna interactions for genes, and rna-protein interactions for the rest (figure ). these lncrna-interacting protein-coding genes have potential roles during viral infections (table ). innate immune response against the infection is inevitable, but the complex interactions underlying its activation and effect may have been the target of lncrnas. hectd is an e ubiquitin-protein ligase that interacts with nsp .
as it is involved in class i mhc mediated antigen processing and presentation and innate immune system, the binding may lead to modulation of that response. in case of hectd mrna, absence of rna-rna interaction with downregulated kcnq ot lncrna can facilitate the response. ptges gene was upregulated and also found to interact with sars-cov- nsp protein. as it is involved in innate immune system and signaling pathways, this binding may exert an indirect influence. were analyzed using networkanalyst . ( ) tools for functional overrepresentation and network enrichment. mirnas that mostly targeted upregulated genes were finally selected. not applicable. publicly available data were utilized. analyses generated data are deposited as supplementary files. the authors declare that they have no competing interests. funding this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. exosc is a non-catalytic component of the rna exosome complex which has ' to ' exoribonuclease activity and participates in a multitude of cellular rna processing and degradation events ( ). protein-rna gdf is a member of the glial cell-derived neurotropic factor (gdnf) family which binds to gdnf family receptor α -like (gfral) protein, a transmembrane receptor exclusively expressed in the hind brain ( ). the protein is expressed in a broad range of cell types, acts as a pleiotropic cytokine and is involved in the stress response program of cells after cellular injury ( ). increased protein levels are associated with disease states such as tissue hypoxia, inflammation, acute injury and oxidative stress ( , ). 
meg rna-rna hectd is an e ubiquitin-protein ligase which accepts ubiquitin from an e ubiquitin-conjugating enzyme in the form of a thioester and then directly transfers the ubiquitin to targeted substrates and mediates 'lys- '-linked polyubiquitination of hsp aa which leads to its intracellular localization and reduced secretion ( ). the protein is involved in class i mhc mediated antigen processing and presentation and innate immune system. kcnq ot rna-rna larp b encodes a la-module containing factor that can bind au-rich rna sequence directly and promotes mrna accumulation and translation ( ). it was deemed a candidate tumor suppressor gene in glioma, as it was consistently decreased in human glioma stem cells and cell lines compared with normal neural stem cells. larp b overexpression strongly inhibited cell proliferation by inducing mitotic arrest and apoptosis and cdkn a and bax were upregulated ( ). nsp protein-rna larp works as a negative transcriptional regulator of polymerase ii genes, acting by means of the sk rnp system ( ). this snrnp complex inhibits a cyclin-dependent kinase, positive transcription elongation factor b, which is required for paused rna polymerase ii at a promoter to begin transcription elongation ( ). protein-rna prkar a is the camp-dependent protein kinase type ii-alpha regulatory subunit, which works as the regulatory subunit of the camp-dependent protein kinases involved in camp signaling in cells ( ).
outbreak of a novel coronavirus a new coronavirus associated with human respiratory disease in china epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak host immune response and immunobiology of human sars-cov- infection low expression of long noncoding rna gas -as predicts a poor prognosis in patients with nsclc low expression of long noncoding rna gas - as as a novel biomarker of poor prognosis for breast cancer rotavirus nsp triggers secretion of proinflammatory cytokines from macrophages via toll-like receptor pattern recognition receptors tlr and cd mediate response to respiratory syncytial virus reading the viral signature by toll-like receptors and other pattern recognition receptors lncrna kcnq ot attenuates sepsis-induced myocardial injury via regulating mir- - p/xiap axis acute respiratory distress syndrome. american journal of respiratory and critical care medicine kcnq ot promotes migration and inhibits apoptosis by modulating mir- - p/rab axis in oral squamous cell carcinoma lncrna meg aggravates palmitate-induced insulin resistance by regulating mir- - p/egr axis in hepatic cells. 
european review for medical and pharmacological sciences the long non-coding rna meg /mir-let- c- p axis regulates ethanol-induced hepatic steatosis and apoptosis by targeting nlrc ncbi geo: archive for functional genomics data sets-update tophat: discovering splice junctions with rna- initial sequencing and analysis of the human genome biases in illumina transcriptome sequencing caused by random hexamer priming the subread aligner: fast, accurate and scalable read mapping by seed-and-vote differential expression analysis for sequence count data string v : protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets cytoscape: a software environment for integrated models of biomolecular interaction networks npinter v . : an integrated database of ncrna interactions lncbook: a curated knowledgebase of human long non-coding rnas mirtarbase : updates to the experimentally validated microrna-target interaction database. nucleic acids research diana-lncbase v : indexing experimentally supported mirna targets on non-coding transcripts networkanalyst . : a visual analytics platform for comprehensive gene expression profiling and meta-analysis. 
adenosine deaminase acting on rna (adar ), a suppressor of double- stranded rna-triggered innate immune responses rna editing by adar prevents mda sensing of endogenous dsrna as nonself new antiviral pathway that mediates hepatitis c virus replicon interferon sensitivity through adar rna adenosine deaminase adar deficiency leads to increased activation of protein kinase pkr and reduced vesicular stomatitis virus growth following interferon treatment adenosine deaminase acting on rna (adar ) suppresses the induction of interferon by measles virus the large form of adar is responsible for enhanced hepatitis delta virus rna editing in interferon- alpha-stimulated host cells editing of hiv- rna by the double-stranded rna deaminase adar stimulates viral infection pyk mediates endothelin- signaling via p cas/bcar cascade and regulates human glomerular mesangial cell adhesion and spreading the role of endothelin- in pulmonary arterial hypertension purification and biochemical characterization of some of the properties of recombinant human kynureninase cloning and recombinant expression of rat and human kynureninase bene, a novel raft-associated protein of the mal proteolipid family, interacts with caveolin- in human endothelial-like ecv cells. the journal of biological chemistry a novel gene that encodes a protein with a putative src homology domain is a candidate gene for familial juvenile nephronophthisis the role of tlr in infection and immunity mycobacterium tuberculosis lipoproteins directly regulate human memory cd (+) t cell activation via toll- like receptors and . infection and immunity tlr and its co-receptors determine responses of macrophages and dendritic cells to lipoproteins of mycobacterium tuberculosis host defense mechanisms triggered by microbial lipoproteins through toll-like receptors. 
science cell activation and apoptosis by bacterial lipoproteins through toll-like receptor- - - γ associates with muscle specific kinase and regulates synaptic gene transcription at vertebrate neuromuscular synapse resistance exercise and insulin regulate as and interaction with - - in human skeletal muscle. diabetes - - gamma interacts with and is phosphorylated by multiple protein kinase c isoforms in pdgf-stimulated human vascular smooth muscle cells. dna and cell biology jnk phosphorylation of - - proteins regulates nuclear targeting of c-abl in the apoptotic response to dna damage. functional conservation of - - isoforms in inhibiting bad-induced apoptosis kank regulates rhoa-dependent formation of actin stress fibers and cell migration via - - in pi k-akt signaling mir- b- p promotes epithelial-mesenchymal transition in breast cancer cells through snail stabilization by directly targeting ywhag microrna- promoted esophageal squamous cell carcinoma cell growth and metastasis via targeting ywhag wong-staal f. a novel shuttle protein binds to rna helicase a and activates the retroviral constitutive transport element. the wong-staal f. mapping the functional domains of hap , a protein that binds rna helicase a and activates the constitutive transport element of type d retroviruses protein kinase a associates with ha and affects transcriptional coactivation by epstein-barr virus nuclear proteins. 
molecular and cellular biology the role of a-kinase anchoring protein -like protein in annealing of trnalys to hiv- rna the mammalian exosome mediates the efficient degradation of mrnas that contain au-rich elements growth differentiation factor (gdf ): a survival protein with therapeutic potential in metabolic diseases cardiovascular diseases: a translational prospective tgf-b superfamily cytokine mic- /gdf is a physiological appetite and body weight regulator gfral is the receptor for gdf and the ligand promotes weight loss in mice and nonhuman primates hectd regulates intracellular localization and secretion of hsp to control cellular behavior of the cranial mesenchyme larp b is an au-rich sequence associated factor that promotes mrna accumulation and translation identification of rna- binding protein larp b as a tumor suppressor in glioma la-related protein larp is a component of the sk ribonucleoprotein and affects transcription of cellular and viral polymerase ii genes functional characterization of a candidate tumor suppressor gene, mirror image polydactyly , in nasopharyngeal carcinoma human upf proteins target an mrna for nonsense-mediated decay when bound downstream of a termination codon molecular mechanisms for the rna-dependent atpase activity of upf and its regulation by upf binding of a novel smg- -upf -erf -erf complex (surf) to the exon junction complex triggers upf phosphorylation and nonsense-mediated mrna decay mov is a ' to ' rna helicase contributing to upf mrna target degradation by translocation along ' utrs identification of novel argonaute-associated proteins apobec g inhibits microrna-mediated repression of translation by interfering with the interaction between microrna silencing through risc recruitment of eif virus escape and manipulation of cellular nonsense-mediated mrna decay provides antiviral activity against rna viruses by enhancing rig-i-mavs-independent ifn induction not applicable. 
key: cord- -p euuop authors: doğan, tunca; atas, heval; joshi, vishal; atakan, ahmet; rifaioglu, ahmet sureyya; nalbat, esra; nightingale, andrew; saidi, rabie; volynkin, vladimir; zellner, hermann; cetin-atalay, rengul; martin, maria; atalay, volkan title: crossbar: comprehensive resource of biomedical relations with deep learning applications and knowledge graph representations date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: p euuop systemic analysis of available large-scale biological and biomedical data is critical for developing novel and effective treatment approaches against both complex and infectious diseases. because different sections of the biomedical data are produced by different organizations/institutions using various types of technologies, the data are scattered across individual computational resources, without any explicit relations/connections to each other, which greatly hinders comprehensive multi-omics-based analysis. we aimed to address this issue by constructing a new biological and biomedical data resource, crossbar, a comprehensive system that integrates large-scale biomedical data from various resources and stores them in a new nosql database, enriches these data with deep-learning-based prediction of relations between numerous biomedical entities, rigorously analyses the enriched data to obtain biologically meaningful modules, and displays them to users via easy-to-interpret, interactive and heterogeneous knowledge graph (kg) representations within an open access, user-friendly online web-service at https://crossbar.kansil.org. as a use-case study, we constructed crossbar covid- kgs (available at: https://crossbar.kansil.org/covid_main.php) that incorporate relevant virus and host genes/proteins, interactions, pathways, phenotypes and other diseases, as well as known and completely new predicted drugs/compounds.
our covid- graphs can be utilized for a systems-level evaluation of relevant virus-host protein interactions, mechanisms, phenotypic implications and potential interventions. systemic analysis of available large-scale biological and biomedical data is critical for developing novel and effective treatment approaches. parts of this big-data are continuously being updated and maintained by different organizations and institutions; thus, the data are scattered across individual computational resources. although these entities are biologically related and complementary to each other, the connections between datapoints at different resources are neither well established nor explicit. in addition to the connectivity problem, another issue related to biomedical data is the incompleteness of the knowledge space (e.g., possible ligands of a target biomolecule, or the disease-related implications of a critical mutation). there is a clear requirement for innovative computational approaches to integrate available biomedical big-data and to complete missing information with in silico predictions, to serve the ultimate aim of proposing novel treatment options (especially) for complex diseases. there are numerous studies in the literature that aimed to integrate the available biomedical data. these studies provided useful tools and methods to the life-sciences research community; however, many of them lack important functionalities, which prevents them from becoming widely adopted tools/services (supplementary information section ).
in this project, we aimed to address the current shortcomings by developing a comprehensive open access biomedical system entitled crossbar, by integrating various biological databases with each other, inferring the missing relations between existing data points, and constructing informative knowledge graphs based on specific biomedical components/terms such as a disease/phenotype, biological process, gene/protein and drug/compound, or specific combinations of them. to construct the crossbar system, we accomplished multiple sub-projects: (i) the construction of the crossbar database to house the integrated biomedical data, and of its api service to serve them, (ii) development of deep-learning-based drug/compound-target protein interaction (dti) prediction models,
crossbar database (crossbar-db) comprises carefully selected features from various data sources, namely uniprot, intact, interpro, reactome, ensembl, drugbank, chembl, pubchem, kegg, omim, orphanet, gene ontology, experimental factor ontology (efo) and human phenotype ontology (hpo). extract-transform-load (etl) pipelines were developed for heavy lifting of data from these resources by persisting specific data attributes with the implementation of logic rules. these pipelines fetch, cleanse, validate and consolidate the data, and thus implement a multi-omics data integration approach to release a single resource based on mongodb collections. crossbar-db, which provides a broad spectrum of information such as biomolecular functions, domains, interactions, pathways, diseases, phenotypes, drugs, compounds, etc., is hosted and maintained by the embl-ebi. current data statistics of the crossbar-db and the database schema are shown in fig. b and in fig. s , respectively. crossbar-db is periodically updated on an on-demand/request basis via an automated procedure, which keeps the underlying data up to date most of the time.
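the fetch, cleanse, validate and consolidate stages described above can be illustrated with a minimal python sketch. it is not the crossbar-db implementation; the record layout, the field names (`accession`, `name`, `pathway`), the validation rule and the "first source wins" merge rule are all assumptions made purely for illustration.

```python
from typing import Dict, Iterable, List


def cleanse(record: dict) -> dict:
    """Strip whitespace from string values and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue
        cleaned[key] = value
    return cleaned


def is_valid(record: dict) -> bool:
    """Hypothetical rule: a record must carry a non-empty 'accession'."""
    return "accession" in record


def consolidate(records: Iterable[dict]) -> Dict[str, dict]:
    """Merge records sharing an accession into one document per entry."""
    merged: Dict[str, dict] = {}
    for record in records:
        doc = merged.setdefault(record["accession"], {})
        for key, value in record.items():
            doc.setdefault(key, value)  # assumption: first source wins
    return merged


def run_etl(sources: List[Iterable[dict]]) -> Dict[str, dict]:
    """Fetch (iterate), cleanse, validate and consolidate all sources."""
    cleaned = (cleanse(rec) for source in sources for rec in source)
    return consolidate(rec for rec in cleaned if is_valid(rec))


# Toy inputs standing in for two upstream resources (made-up identifiers).
source_a = [{"accession": " X0001 ", "name": "example protein"}]
source_b = [
    {"accession": "X0001", "pathway": "PW-1"},
    {"accession": "", "pathway": "PW-2"},  # fails validation, dropped
]
documents = run_etl([source_a, source_b])
```

each value in `documents` plays the role of one document in a collection; a real pipeline would write these to the database instead of keeping them in memory.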
crossbar-db can be queried via a public restful api at: www.ebi.ac.uk/tools/crossbar/swagger-ui.html, which provides a multi-faceted view of the stored data through endpoints (fig. s ) . as a part of the crossbar project, we developed two novel deep-learning-based predictive systems: deepscreen and mdeepred, with the aim of enriching bioactivity data by identifying unknown interactions between drugs/drug-candidate compounds and target proteins. deepscreen employs convolutional neural networks to process d structural images of drugs/compounds in individually optimized high performance target-based prediction models, suited for well-studied targets . mdeepred utilizes both compound and target protein features within a pairwise input hybrid deep neural network architecture to produce real valued bioactivity predictions, especially for targets with a few or no training instances . we trained both systems using carefully filtered and integrated data in crossbar-db, and ran our trained-models on large compound and human protein spaces to obtain comprehensive bio-interaction predictions, which are included in our knowledge graphs. we also developed an accompanying computational tool, ibioprovis, which is an unsupervised-learning-based visualization system for exploring large drug/compound-target interaction datasets in reduced dimensions . the term knowledge graph (kg) defines a specialized data representation structure, in which a collection of entities (nodes) are linked to each other (edges) in a semantic context . in this study, we chose to represent heterogeneous biomedical data in kg structures. in crossbar knowledge graphs (crossbar-kg), biological components/terms (i.e., drugs/compounds, genes/proteins, bio-processes/pathways, phenotypes/diseases) are represented as nodes, and their known or predicted pairwise relationships are annotated as edges (a protein and its coding gene is treated as one merged term/entry/node). 
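as a rough illustration of the representation just described, the following python sketch stores typed nodes, merges a protein and its coding gene into a single node, and flags edges as known or predicted. the class and method names, the node-type strings and the identifiers are hypothetical and are not taken from the crossbar code base.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    predicted: bool = False  # True for in silico predicted relations


class KnowledgeGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}  # node id -> node type
        self.edges: Set[Edge] = set()
        self._gene_to_protein: Dict[str, str] = {}

    def add_node(self, node_id: str, node_type: str) -> None:
        self.nodes[node_id] = node_type

    def add_gene_protein(self, protein_id: str, gene_symbol: str) -> None:
        """Register a protein and its coding gene as one merged node."""
        self.add_node(protein_id, "gene/protein")
        self._gene_to_protein[gene_symbol] = protein_id

    def resolve(self, identifier: str) -> str:
        """Map a gene symbol to its merged node id; pass other ids through."""
        return self._gene_to_protein.get(identifier, identifier)

    def add_edge(self, a: str, b: str, predicted: bool = False) -> None:
        a, b = self.resolve(a), self.resolve(b)
        for node_id in (a, b):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges.add(Edge(a, b, predicted))


kg = KnowledgeGraph()
kg.add_gene_protein("PROT_1", "GENE_1")  # made-up identifiers
kg.add_node("DRUG_1", "drug/compound")
kg.add_edge("DRUG_1", "GENE_1", predicted=True)  # stored against PROT_1
```

resolving gene symbols before an edge is stored is one simple way to guarantee that a gene and its protein never appear as two separate nodes.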
the logic behind the construction of a knowledge graph is centered around queried biological components/terms, as shown in fig. c with a work-flow diagram and with an example disease term query. at each step of the process, an overrepresentation-based enrichment analysis has been performed to select the terms that are significantly associated with the growing graph, and to discard the rest. this analysis comprises a series of hypergeometric tests, based on the recorded relations in the crossbar database. here, we applied a layered construction approach, always taking the genes/proteins at the centre of the enrichment analysis. finally, additional relation types are incorporated to the graph as edges between the existing nodes (e.g., drug-disease, disease-pathway and disease-hpo), to further enrich the provided information. during the construction of graphs, terms (nodes) and their pairwise relations (edges) are directly obtained from the crossbar-db. crossbar-kgs clearly display the direct and indirect relations between all of the terms in the graph. these intensely-processed heterogeneous biological networks are expected to aid biomedical research, especially to infer mechanisms of diseases in relation to biomolecules, systems and candidate drugs. we developed the crossbar web-service (crossbar-ws) to make the crossbar-kgs available to the public in an easily interpretable and interactive way (https://crossbar.kansil.org). kgs are presented visually on web-browsers as flexible cytoscape networks. users can create queries with biomedical terms, individually or in combination, to obtain the relevant graph. combinatory term query is especially critical as it provides the ability to investigate the indirect biological relationships between the terms from both the same and different biomedical components. 
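the overrepresentation analysis mentioned above relies on hypergeometric tests. the python sketch below shows one such test for a single term, followed by a selection step over several terms. the text does not state the significance threshold, the background population or any multiple-testing correction, so the `alpha` value, the `population_size` argument and the absence of a correction are assumptions, as are all function and variable names.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_pvalue(population: int, annotated: int,
                     sample: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap) when `sample` items are drawn
    without replacement from `population`, `annotated` of which carry the term."""
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, min(annotated, sample) + 1)
    )
    return tail / total


def enriched_terms(graph_proteins: Set[str],
                   term_to_proteins: Dict[str, Set[str]],
                   population_size: int,
                   alpha: float = 0.05) -> List[Tuple[str, int, float]]:
    """Keep terms overrepresented among the proteins already in the graph."""
    hits = []
    for term, members in term_to_proteins.items():
        overlap = len(graph_proteins & members)
        if overlap == 0:
            continue
        p = hypergeom_pvalue(population_size, len(members),
                             len(graph_proteins), overlap)
        if p < alpha:  # assumption: uncorrected threshold
            hits.append((term, overlap, p))
    return sorted(hits, key=lambda hit: hit[2])


# Toy example with made-up identifiers and a background of 10 proteins.
graph = {"P1", "P2", "P3"}
terms = {
    "pathway_a": {"P1", "P2", "P3"},                    # p = 1/120
    "pathway_b": {"P3", "P4", "P5", "P6", "P7", "P8"},  # p = 116/120
}
hits = enriched_terms(graph, terms, population_size=10)
```

in this toy case only `pathway_a` survives the threshold, which mirrors the idea of selecting significantly associated terms and discarding the rest.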
since there are billions of different ways to query crossbar, it was not feasible to pre-calculate the resulting graphs; therefore, they are constructed on-the-fly, in real-time. several options are provided to users to customize the procedure before the search, such as the uniprot databases to be used (uniprotkb/swiss-prot or uniprotkb/swiss-prot+uniprotkb/trembl), the taxa to be included, and the number of terms/nodes to include from each entity type (selected from enrichment score-based ranked lists). it is also possible to display the graph using a variety of layout options, including our in-house crossbar-layout (fig. a). saving options let users store the graph in different formats, including json, figure-ready snapshots and protein-centric delimited data-tables. the interactive visualization also lets users prepare a custom display by relocating the nodes/edges as desired. as a use-case, we present coronavirus disease (covid- ) crossbar-kgs (https://crossbar.kansil.org/covid_main.php). starting from the end of , the new coronavirus (sars-cov- ) pandemic has ravaged the entire globe and caused immeasurable damage . as of july , the scientific endeavour to develop effective drugs and vaccines is at its peak, and systemic evaluation of the current knowledge about sars-cov- infection is expected to aid researchers in this struggle . to demonstrate the capabilities of crossbar, we have constructed two different versions of the covid- knowledge graph: (i) a large-scale version including nearly the entirety of the related information on different crossbar-integrated data sources, which is ideal for further network and machine learning based analysis or a detailed inspection (fig. b), and (ii) a simplified version distilled to include only the most relevant genes/proteins as provided in the uniprot-covid- portal (https://covid- .uniprot.org), which is ideal for fast interpretation (fig. c).
it is interesting to observe the indirect relations between the diseases/phenotypes in the kgs and covid- over the incorporated host proteins and enriched pathways, and between covid- and our in silico predicted drugs, as they may reveal further evidence to be utilized against covid- (fig. b ,c). for this, we conducted a short literature-based validation study and found that many of these drugs have already been tested at both preclinical and clinical stages as new covid- treatments (supplementary information section ). although covid- is a respiratory disease and lung lesions have been considered the major damage caused by sars-cov- , liver injury has also been reported in about one-third of hospitalized patients infected with the virus, and the majority of covid- patient deaths are associated with cytokine storm/release syndrome resulting in multi-organ damage . hence, with the aim of indicating the biological relevance of the information in crossbar-kgs, we conducted in vitro experimentation on drug-treated liver cancer cell-lines and comparatively analysed the results on both covid- kgs. chloroquine (cq) phosphate has been reported, with controversy, to be used in the treatment of covid- . cq is an anti-inflammatory drug that has been used in autoimmune diseases and can significantly alter the production of pro-inflammatory and anti-inflammatory cytokines. we investigated the effect of cq on normal hepatocyte-like huh cells and poorly differentiated mahlavu cells. cells were treated with cq and the differentially expressed gene (deg) data were acquired from a large multiplex panel of genes using the nanostring platform (fig. d ). our experimental data indicated significant alterations in jak/stat, pi k, ras, mapk pathways involving cytokine production in liver cells (full pathway list: table s . ).
these pathways were also present in both kgs, together with additional cytokine-related pathways such as interleukin signaling, and with dense connections to other biological components in the covid- crossbar-kgs, which is an expected output considering the mode of action of cq in covid- (fig. c). additional use-cases are provided in the supplementary information section . we believe that the crossbar system can be utilized towards the systematic analysis of the pharmacological effects of drugs, as it brings together the pieces of biological data that are relevant to the user's query, which can then be manually explored by the expert to build new hypotheses.
integrated data resources and the crossbar-db
the bespoke etl pipelines of crossbar-db were developed in java, using the spring batch framework to structure the jobs. the jobs are executed on state-of-the-art embl-ebi lsf clusters powered by ibm, in a parallel distributed fashion to reduce the processing time. the data are finally stored in mongodb in the form of independent data collections, thus providing schemaless flexibility and faster development, while sustaining data relationships in the form of nested documents. the pipelines have been both unit and integration tested using the spock framework in the groovy language. the public databases integrated in the crossbar system can be listed along with the type of the biomedical data they contain as follows; the statistics regarding the number of terms and annotations incorporated into crossbar-db from each resource listed above are given in fig. b and supplementary information fig. s , where the crossbar-db collections are shown. here, the collections entitled "activities", "assays", "molecules" and "targets" correspond to bioactivity, bioassay, compound and target entries in the chembl database, respectively.
"drugs" corresponds to drug entries in drugbank, "efo disease terms" corresponds to the disease entries in the experimental factor ontology, "hpo" corresponds to phenotype entries in the human phenotype ontology, and "proteins" corresponds to a subset of the protein entries in the uniprotkb. the remaining four collections belong to the pubchem data. there is no one-to-one correspondence between the incorporated data resources and the crossbar-db collections, since some of the resources had to be split into multiple collections for easier querying (e.g., chembl and pubchem). also, some of the sources are directly incorporated from the uniprot database and thus reside in the proteins collection (e.g., both terms and annotations for interpro and reactome, and only annotations for intact, omim and orphanet). it is possible to obtain cross-collection relational data (i.e., integrated relational data from multiple collections) by writing programmatic queries and submitting them to the api, as is done in crossbar-ws to construct the knowledge graphs. however, it is not possible to obtain this complex relational data in a single query using the swagger graphical user interface. currently, crossbar knowledge graphs do not include pubchem data due to both elevated computational demand (the sizes of pubchem collections are large) and high redundancy (a large portion of bioactivity data points in pubchem and chembl databases are shared). however, it is possible to query the crossbar-db using the provided api service to obtain data entries from pubchem database collections. the database and api construction work has been handled by the protein function development (uniprot database) team at embl-ebi. utilizing their expertise in biological database development and maintenance, together with the available strong computational infrastructure, the team managed to build a huge but stable resource.
the professional service-providing approach applied by the team allowed the proper and constant maintenance of both the database and the entire crossbar system.
the current knowledge corresponds to less than . % of the whole compound-target space . the high rate of missing dti data negatively impacts the integrated biomedical resources as well. in the crossbar project, we aimed to address this issue by producing machine learning based dti predictions and incorporating these predictions into the crossbar resource. the studies specifically about the development of these tools have already been published or are under review , ; in the framework of this study, however, we used our tools to produce dti predictions to be incorporated in our knowledge graphs. one critical topic in developing dti prediction models is the source dataset to be used in system training procedures. it is especially critical to construct large-scale dti datasets to train deep-learning models. to address this issue, we prepared a dti dataset from the chembl database that is suitable for training machine learning systems, with standardized filtering operations on targets, compounds and bioactivities. the dataset is updated with each chembl database release. we employed this dataset for the training and validation of the deep-learning based dti prediction models we developed in the framework of the crossbar project, and also as the source dataset for drug/compound-target interaction space visualization (the methods are described below). it can also be used for developing new dti prediction models. the current version of the bioactivity dataset (chembl v ) is available for public use at: https://github.com/cansyl/crossbar/blob/master/crossbar_db_api/chembl _preproc essed_activities_sp_b_pchembl.zip. details regarding the dataset can be found in our recent article .
deep learning based predictor - deepscreen
deepscreen was the first dti prediction system that we developed in this endeavour.
deepscreen is a high-performance drug-target interaction predictor that utilizes deep convolutional neural networks and -d structural compound representations (i.e., simple images) to predict their activity against intended target proteins. deepscreen system is composed of target protein specific prediction models, each independently trained using experimental bioactivity measurements against many drug candidate small molecules, and optimized according to the binding properties of the target proteins. the main novelty of deepscreen is employing readily available -d structural representations of compounds at the input level instead of conventional drug/compounds descriptors (e.g., molecular fingerprints) that display limited performance. deepscreen produces binary predictions, meaning that a compound is either predicted as active or inactive against a target protein. during the development of this method, we also carried out cell-based in vitro wet-lab experiments on computationally generated dti predictions, with the purposes of both validating the accuracy of the prediction models, and for gaining biological insight in the framework of health and disease, especially to contribute to the understanding of processes active in different cancer subtypes. deepscreen can be used for the fast screening of the chemogenomic space, to provide completely new dtis that can later be investigated experimentally in the fields of drug discovery and repurposing in crossbar, the data is stored in a non-relational database (mongodb), as separate collections for easy maintenance and fast querying. as a result, the database itself is not a knowledge graph. instead, biologically relevant small-scale knowledge graphs are constructed on-the-fly, triggered by users' queries with a single or multiple term(s) such as the names or ids of genes/proteins, diseases/phenotypes, compounds/drugs and/or pathways/biological processes of interest. 
in crossbar knowledge graphs, biological entities are represented as vertices/nodes. distinct types of nodes are defined for: (i) biomolecules (i.e., genes/proteins), (ii) biological mechanisms (i.e., processes/pathways), (iii) pathologies (i.e., diseases and phenotypes), and (iv) small molecule ligands used for treatment (i.e., drugs and drug candidate compounds). relations between different types of biological entities are expressed by the edges of the graph. edge types vary according to the defined relations. for a relation between (i) two proteins, the edge label is "interacts_with"; (ii) a gene/protein and a disease, the edge label is "related_to"; (iii) a drug/compound and a protein, the edge label is "targets"; (iv) a gene/protein and a pathway, the edge label is "involved_in"; (v) a gene/protein and a phenotype term, the edge label is "associated_with"; (vi) a drug and a disease, the edge label is "indicates"; (vii) a disease and a pathway, the edge label is "modulates"; and (viii) a disease and a phenotype term, the edge label is "associated_with". pathway-related information (both signalling and metabolic pathways) is incorporated in crossbar using a membership-based approach, in which each pathway is expressed on the graph as a single node and the nodes of its member proteins are connected to it via edges. this approach leaves out the detailed reaction-based mechanistic information provided in pathway databases such as reactome and kegg pathways; however, including this information in the style of a pathway resource network would prevent the generation of large heterogeneous networks composed of tens of different pathways and other components. nevertheless, it is possible to explore these pathways in detail using the provided links, which take the user to the corresponding page of the pathway database.
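the node and edge typing above can be sketched as a small lookup table. this is only an illustration: the edge labels are the ones quoted in the text, but the node type names (`protein`, `disease`, and so on) and the `edge_label` helper are hypothetical and are not taken from the crossbar code base.

```python
# Edge labels quoted from the text; node type names are hypothetical.
# Keys are alphabetically sorted pairs so that lookups are order-independent.
EDGE_LABELS = {
    ("protein", "protein"): "interacts_with",
    ("disease", "protein"): "related_to",
    ("compound", "protein"): "targets",
    ("drug", "protein"): "targets",
    ("pathway", "protein"): "involved_in",
    ("phenotype", "protein"): "associated_with",
    ("disease", "drug"): "indicates",
    ("disease", "pathway"): "modulates",
    ("disease", "phenotype"): "associated_with",
}


def edge_label(type_a, type_b):
    """Return the edge label for two node types, or None if no relation is defined."""
    key = tuple(sorted((type_a, type_b)))
    return EDGE_LABELS.get(key)
```

pairs that the text does not define (for instance a drug and a pathway) return `None` in this sketch, which mirrors the membership-based design: non-protein nodes are mostly linked through gene/protein nodes rather than directly to each other.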
both reactome and kegg pathways provide the same type of biological information at the level of large-scale biological processes; however, reactome also divides these processes into sub-pathways, whereas kegg only provides the pathway information at a generic level. in crossbar, due to the way the overrepresentation analysis is done, only specific sub-pathways are incorporated from the reactome database, in most cases. as a result, pathway information in the knowledge graphs is displayed at different levels of specificity, and thus, not redundant. a simplified form of the knowledge graph construction work-flow is displayed in fig. c . in would not be feasible. to address this problem, we applied a multi-staged overrepresentation-based enrichment analysis process during the construction of graphs. in this analysis, we calculate an independent enrichment score for each biological entity in the database (i.e., a disease, phenotype, drug, compound, gene/protein or pathway), to be considered as its relevance to the graph that is being constructed. the calculation of enrichment score and its statistical significance is done using a modified version of the hypergeometric test for overrepresentation , which also corresponds to a one-tailed fisher's exact test, and it is based on the statistics of the relations/connections with gene/protein nodes. 
for example, the enrichment score ($e_{d,w}$) and its significance ($s_{d,w}$), in terms of p-value, for a disease term d and graph w are calculated as follows:

$$e_{d,w} = \frac{m_d^{2} / n_w}{M_d / N} \qquad (1)$$

$$s_{d,w} = \sum_{i=m_d}^{\min(n_w, M_d)} \frac{\binom{M_d}{i}\binom{N - M_d}{n_w - i}}{\binom{N}{n_w}} \qquad (2)$$

where $e_{d,w}$ is the enrichment score calculated for the disease term d for graph w; $m_d$ represents the number of genes/proteins in graph w that are associated with disease d (it enters the score squared); $n_w$ represents the total number of gene/protein nodes in graph w; $M_d$ is the total number of genes/proteins (not necessarily in graph w) that are associated with disease d; and $N$ represents the total number of reviewed human gene/protein entries (i.e., uniprotkb/swiss-prot entries) in the crossbar database that are annotated with any disease entry. $s_{d,w}$ represents the significance (p-value) for the disease term d for graph w calculated in the hypergeometric test. while constructing the graph, an enrichment score is calculated for each disease entry in the crossbar database and these scores are used to rank disease entries according to their biological relevance to graph w. a cut-off value k is employed to include the top k relevant disease entries in graph w. the default value for k is , which means that only top- relevant diseases will be included in the graph. apart from diseases, the same methodology is used to filter out the nodes of neighbouring genes/proteins, phenotypes, drugs, compounds and pathways. significance values are not directly used in the filtering operation, since the main objective here is not to include only the significantly over-represented terms, but to reduce the number of nodes in the graph by filtering out the ones that are least relevant. in the traditional way of calculating an enrichment score, $m_d$ is not squared. the reason for squaring $m_d$ here is to break ties between the scores of terms in favour of the one with the higher $m_d$ value.
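a minimal sketch of the scoring and top-k filtering described above, under stated assumptions. the score is read as the squared in-graph count divided by the graph size, normalized by the background frequency, which is the reading that matches the variable definitions and the remark about the traditional (unsquared) form. the p-value uses the standard upper-tail hypergeometric sum; the text mentions a modified test, so the authors' exact form may differ. all function names are hypothetical.

```python
from math import comb


def enrichment_score(m, n, M, N):
    """Assumed form: (m^2 / n) / (M / N).

    m: genes/proteins in the graph associated with the term
    n: gene/protein nodes in the graph
    M: genes/proteins associated with the term overall
    N: annotated background genes/proteins
    """
    return (m * m / n) / (M / N)


def hypergeom_pvalue(m, n, M, N):
    """Standard upper tail P(X >= m); equivalent to a one-tailed Fisher's exact test."""
    upper = min(n, M)
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return hits / comb(N, n)


def top_k_terms(terms, n, N, k):
    """terms maps a term name to (m, M); returns the k best-scoring term names."""
    scored = [
        (enrichment_score(m, n, M, N), name)
        for name, (m, M) in terms.items()
        if m > 0
    ]
    # Highest score first; ties fall back to the name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]
```

with the unsquared score, a term with m=2, M=20 and a term with m=4, M=40 would tie for a graph of n=10 and a background of N=1000; squaring m ranks the second term higher, which is the tie-breaking behaviour the text describes.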
formalizing the graph construction around gene/protein entries during the construction of a knowledge graph, first, the gene/protein entries that are directly connected to the query term (i.e., core proteins) are fetched, such as the member genes/proteins of the queried signalling pathway. after that, neighbouring/interacting genes/proteins are added to the graph by calculating enrichment scores for each interacting protein, using the equation above, and filtering out based on the selected cut-off value. this is followed by the enrichment-based filtering and addition of other entity types; however, this time, both core and neighbouring genes/proteins are taken into consideration together to calculate the enrichment scores. if the user starts a heterogenous search that contains multiple terms from different entity types, both core and neighbouring genes/proteins are independently collected for each non-protein query term, queried gene/protein entries are added to this list (if there is any), and the entity collection process is continued using the union of these genes/proteins (fig. c) . this approach enables the exploration of direct and indirect relations between the queried terms. bioactive compound and bioactivity selection procedure small molecule compounds are selected and incorporated to kgs based on their reported bioactivities against target proteins. in a kg, a compound is represented as a node and a bioactivity is represented as an edge between a compound node and a gene/protein node. we start the bioactive compound collection procedure with a set of target gene/protein entries at hand (gathered in a previous step of the kg construction process), and obtain the compounds that are reported to be bio-actively interacting with these proteins, as their target biomolecules. despite having a simple logic, this procedure is extremely complex due to practical reasons. 
since there are more than million bioactivity measurements in the chembl database (v ), we rigorously filter these data points with the aim of providing only the most relevant bioactivity/compound information in crossbar-kgs. since crossbar is a gene/protein centric system, we first filter out the data points where the target is not a single protein. we also set an organism filter for the targets, where the default selection is human. additionally, we filter out bioactivities if their standard (activity) type is not one of these: ic , ec , ac , xc , ki, kd, potency; since these standard types provide roughly comparable measures of half-maximal response. furthermore, we eliminated data points without a pchembl value, which standardizes the above-mentioned standard types under one value in the negative logarithmic scale. bioactivity data points with an assigned pchembl value have usually received additional curation, and thus, they are more reliable. finally, with the aim of only taking the data points at the active binding range (i.e., high affinities between the ligand and the target) we discard the data points with a pchembl value less than (i.e., xc > µm). despite these filtering operations, we still usually end up with tens of thousands of compounds before the compound enrichment analysis, which significantly increases the kg construction run time. exploiting the fact that it is a better choice to include a compound with higher binding affinity compared to a compound with a lower binding affinity for the same target protein, we set the pchembl value cut off value to , at the beginning of the compound collection procedure. then, we reduce the cut off value and re-run the query if the total number of gathered compound entries is less than in the first run. we iteratively do this procedure until we obtain at least compound entries. 
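the filtering and the adaptive cut-off search can be sketched as follows. this is an illustration under assumptions, not the crossbar implementation: the record field names, the `SINGLE PROTEIN` target type string, the "50" suffixes on the activity types, the step size and every numeric threshold passed in are hypothetical placeholders, since the actual values are not recoverable from the text. the sketch also covers the symmetric case in which the cut-off is raised when too many compounds are returned.

```python
# Activity type names are assumed spellings of the half-maximal measures listed above.
ALLOWED_TYPES = {"IC50", "EC50", "AC50", "XC50", "Ki", "Kd", "Potency"}


def passes_static_filters(rec, organism="Homo sapiens"):
    """Keep single-protein targets of one organism with an allowed type and a pChEMBL value."""
    return (
        rec["target_type"] == "SINGLE PROTEIN"
        and rec["organism"] == organism
        and rec["standard_type"] in ALLOWED_TYPES
        and rec.get("pchembl") is not None
    )


def collect_compounds(records, start_cutoff, floor_cutoff, min_count, max_count,
                      step=0.5, ceiling=14.0):
    """Start from a strict pChEMBL cut-off, then relax or tighten it step by step."""
    usable = [r for r in records if passes_static_filters(r)]

    def hits(cutoff):
        return {r["compound"] for r in usable if r["pchembl"] >= cutoff}

    cutoff = start_cutoff
    found = hits(cutoff)
    if len(found) < min_count:
        # Too few compounds: lower the cut-off, but never below the activity floor.
        while len(found) < min_count and cutoff - step >= floor_cutoff:
            cutoff -= step
            found = hits(cutoff)
    elif len(found) > max_count:
        # Too many compounds: raise the cut-off to keep only the strongest binders.
        while len(found) > max_count and cutoff + step <= ceiling:
            cutoff += step
            found = hits(cutoff)
    return cutoff, found
```

starting strict and relaxing only when needed keeps the candidate set small for well-studied targets, which is where the run-time cost of the enrichment analysis would otherwise be highest.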
similarly, if the number of returned compounds is more than in the first run, we further increase the cut off iteratively until we obtain less than compound entries. this number (i.e., to ) is still much higher than the number of compounds we incorporate to a kg, which is between and ; however, we aim to enter the enrichment analysis with a high amount to be able to select the compounds that are interacting with multiple proteins in the network, not just one. another reason is to be able to select diverse compounds, in terms of their scaffolds/structures, which is explained below (under the compound clustering sub-section). there are more than one million compound entries in the chembl database, most of which have bioactivity data points against target biomolecules. since it is not feasible to include each and every bioactive compound node in a kg (otherwise the graph would be extremely crowded), only the most overrepresented compounds are tried to be incorporated. we observed that some of the compounds with the same (or a very similar) enrichment score are also structurally very similar to each other. these are mostly molecules with matching scaffolds, which are screened against the same target and produced similar results in the same bioassay. since their enrichment scores are similar as well, they are either selected or discarded together. to provide a better selection of compounds in the graph, we incorporated a structural property-based filtering in the enrichment analysis. the aim here is to select overrepresented compounds that are as diverse from each other as possible in terms of molecular structures, so that users will be provided with a variety of ligands for the target proteins in the graph. to achieve this, we calculated the pairwise molecular similarities between all compounds in crossbar-db using circular fingerprints (ecfp ) and the tanimoto coefficient. after that, we clustered the compounds based on a predefined similarity cut-off value of . 
, meaning that each cluster is composed of compounds that are at least % similar to each other. the cluster information is pre-calculated and recorded on our server. each time a knowledge graph is being constructed, enrichment score ranked compounds are checked one by one in terms of their cluster membership and if there already is a compound from the same cluster in the graph, the compound in turn is discarded (i.e., not incorporated to the graph). the same clustering-based selection approach is applied to incorporate compounds that are computationally predicted to interact with the proteins in the graph. following the finalization of the compound nodes, we check whether some of these compounds correspond to drugs that are already incorporated to the kg (since chembl also contains bioactivity measurements belonging to approved or investigational drugs), using the identifier mapping between chembl and drugbank databases. when a positive case is detected, we merge these two nodes and set the node type as a drug, since drugs are as a result, predicted relations comprise the least reliable part of the dti information provided in crossbar-kgs. with the aim of transmitting this evidence-based relation confidence information to users, we used edge labels in the kgs. in terms of visualization, these labels are encoded on the graphs as colours, such that green colour corresponds to dtis obtained from approved or investigational drugs, blue colour corresponds to experimental bioactivity measures obtained from chembl, and the red colour corresponds to computationally predicted interactions. during the generation of kgs, if a specific relation is obtained from multiple sources (e.g., when the same relation is reported both in drugbank and in chembl) the edge label of the more reliable relation is incorporated. to accomplish this task, kg construction process comprises an edge label update procedure. another process we applied at this step is the edge addition. 
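the cluster-based compound selection and the edge label update can be sketched as two small helpers. the function names and the label strings are hypothetical, and the reliability order (drug-derived, then experimental, then predicted) is an assumption inferred from the colour coding above, in which predicted relations are stated to be the least reliable.

```python
def select_diverse(ranked_compounds, cluster_of, k):
    """Walk compounds in enrichment-score order, keeping at most one per cluster."""
    chosen, seen_clusters = [], set()
    for compound in ranked_compounds:
        if len(chosen) >= k:
            break
        cluster = cluster_of[compound]
        if cluster in seen_clusters:
            continue  # a structurally similar compound is already in the graph
        seen_clusters.add(cluster)
        chosen.append(compound)
    return chosen


# Lower rank means more reliable; assumed mapping to green, blue and red edges.
RELIABILITY = {"drug": 0, "experimental": 1, "predicted": 2}


def merge_edge_label(current, new):
    """Keep the more reliable label when a relation is reported by several sources."""
    if current is None:
        return new
    return min(current, new, key=RELIABILITY.__getitem__)
```

because the cluster assignments are pre-calculated, the selection step is a single pass over the ranked list and adds very little to the graph construction time.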
some drugs possess bioactivity data points in chembl in addition to their approved targets in drugbank. to detect this, we first do a mapping between chembl compound entries and drugbank drug entries, to find the equivalent chembl entry for each drug. after that, we identify the reported chembl bioactivities between that compound and all of the proteins presented in the kg. for those relations, which were not already incorporated into the kg via drugbank, we added blue coloured edges. the same procedure is applied for adding red coloured edges to the drugs and compounds that have extra computationally predicted target interactions. a user query initiates the entity (node) gathering procedure first from the related database collections, using the crossbar restful api. together with the entities that match the search term, the information regarding the related/connected entities are obtained from the corresponding collection. after that, the next database collection is queried with the terms gathered at the previous step. the order of the api queries follows the logic defined for the construction of the kgs, as given in fig. c (simplified version) , and in fig. s . (full version). following the initiation of a query, the growing knowledge graph is displayed on the web-browser in real time (using cytoscape web), starting from the collection and filtering of core and neighbouring genes/proteins. the process is continued with the collection, filtering and addition of phenotype/pathway/disease terms, drugs, bioactive compounds and predicted interacting compounds to the kg (as nodes), together with their relations with gene/protein nodes (as edges). the construction process is finished with the addition of respective edges between non-protein nodes. an important subject in graph/network visualization is the layout. in crossbar-ws, we incorporated the standard layouts of cytoscape web, such as circle, cose, grid and concentric. 
however, none of these layouts were sufficient for communicating highly heterogeneous graphs with different types of nodes and different types of edges. to address this problem, we developed the crossbar layout, in which biological terms (nodes) from a specific biomedical component (e.g., diseases, pathways, ...) are placed on circular points within a fixed radius. with the aim of preventing overlapping nodes, the radius of each circle is selected as a different value. curved edge style (i.e., unbundled-bezier) is applied to reduce the amount of edge crossing. more information regarding the usage of crossbar-ws and its user interface can be found at https://crossbar.kansil.org/tutorial.php. crossbar web-service queries run in linear time and the actual duration of the process is correlated with the total number of core genes/proteins obtained within the query, together with the annotation volume of these genes/proteins. highly studied genes/proteins usually have high number of associations, which in turn, extends the actual query runtime in practice. according to our tests, most of the queries with disease, gene/protein, drug and compound terms (in terms of both single and combinatory term searches) take between and minutes to complete (from job submission to the display of the whole graph). however, most pathway and some phenotypic term queries take longer, especially when the number of directly associated genes/proteins is over one hundred. with the aim of creating a better user experience, we applied a procedure in which the collected nodes and their edges are instantaneously and interactively displayed on the screen, before the end of the job. this way, users do not have to wait for the whole job to be finished before starting to explore the kgs. the entire crossbar web-service including the web-site and the underlying api queries can be found in the github repository of the project (https://github.com/cansyl/crossbar). 
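the idea behind the crossbar layout described earlier in this section (one circle per biomedical component, each with its own radius) can be sketched as below. this is an illustrative reconstruction only: the function name, the default radii and the even angular spacing are assumptions, and the actual layout is implemented on top of cytoscape web rather than in python.

```python
import math


def concentric_type_layout(nodes_by_type, base_radius=100.0, radius_step=80.0):
    """Place the nodes of each type evenly on their own circle around the origin."""
    positions = {}
    for ring, (node_type, nodes) in enumerate(nodes_by_type.items()):
        # A different radius per node type keeps the circles from overlapping.
        radius = base_radius + ring * radius_step
        for i, node in enumerate(nodes):
            angle = 2 * math.pi * i / len(nodes)
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions
```

the returned coordinates could be passed to a preset-style layout in a graph viewer; edge routing (such as the curved style mentioned above) is left to the viewer.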
we constructed two versions of the covid- knowledge graph. the first is the large-scale version, which includes nearly all of the covid- related information recently accumulated in the scientific literature, organized and presented in an interpretable way. the second is the simplified version, which is suitable for quick exploration. we constructed the simplified version because the large-scale kg is difficult to explore visually due to its size. since most of the covid- related data has still not been integrated into the regular releases of biological databases, the data could not be pulled into the crossbar database at the time of writing this manuscript. as a result, we had to make manual interventions to obtain the data from crossbar data resources. we applied the same knowledge graph construction methodology incorporated in crossbar; however, we also conducted manual curation to a certain extent, since, unlike the rest of the crossbar data, covid- related information has not been extensively curated yet. we saved the pre-constructed kgs, which are directly accessible and viewable through the links given on our web-service (https://crossbar.kansil.org/covid_main.php). it is also important to note that, due to the integrated data resources, crossbar predominantly contains rare and complex disease data, and mostly leaves infectious diseases out. nevertheless, the constructed covid- graphs provided rich biomedical information. below, we describe the methodology followed for the generation of crossbar covid- kgs. construction of the large-scale covid- graph started with acquiring the related efo disease term named "covid- " (id: mondo: ). we also incorporated the disease term for "severe acute respiratory syndrome" (id: efo: ) (the original sars) into the graph, since sars is better annotated than covid- .
the full-scale covid- kg construction is continued as follows: we obtained related genes/proteins and their interactions from the intact database's covid- dataset. unlike a genetic disease, human genes/proteins represent only a portion of an infectious disease. due to this, we aimed to incorporate sars-cov and sars-cov- genes/proteins, as well as, the host genes/proteins into the graph. without any filtering, the kg contained , gene/protein and metabolite nodes from various organisms and , edges. due to the high number of genes/proteins in the kg, there was a risk of incorporating non-specific/irrelevant terms from the other biological components at later steps. to address this risk, we applied several filtering operations on this data. first, we eliminated all non-gene/protein nodes and we discarded the genes/proteins if the corresponding organism is not human or sars-cov/sars-cov- . second, we eliminated the protein entries that are not reviewed (i.e., not from uniprotkb/swiss-prot) except sars-cov- orf (accession: a a dja ), which currently is an unreviewed protein entry in uniprotkb/trembl. we also filtered out a portion of the host genes/proteins using interaction-based information, according to their confidence scores reported in intact. we discarded the edges between host proteins and sars-cov and/or sars-cov- proteins if the confidence score was less than . . we also discarded the edges between host proteins in the kg (i.e., neighbouring proteins) if their interaction confidence score is less than . . we removed the disconnected components made up of host proteins, which were formed due to the edge filtering operation. orthology relations between sars-cov and sars-cov- genes/proteins were annotated with "is ortholog of" edge type. the subunits of large protein complexes such as the nsps of replicase polyprotein ab of sars-cov/sars-cov- were mapped to their corresponding protein complex nodes with "is subunit of" edge label. 
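the confidence-based edge filtering and the removal of disconnected host components can be sketched as follows. this is a minimal illustration under assumptions: the function name and the edge tuple layout are hypothetical, and the two confidence thresholds are left as parameters because their values are not recoverable from the text.

```python
from collections import defaultdict


def filter_interactions(edges, viral, min_viral_host, min_host_host):
    """edges: iterable of (a, b, score); viral: set of viral protein identifiers.

    Drops low-confidence edges, then drops every component that is no longer
    connected to any viral protein.
    """
    kept = []
    for a, b, score in edges:
        # Edges touching a viral protein and host-host edges use separate thresholds.
        threshold = min_viral_host if (a in viral or b in viral) else min_host_host
        if score >= threshold:
            kept.append((a, b))

    adjacency = defaultdict(set)
    for a, b in kept:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Depth-first traversal from the viral proteins marks everything still connected.
    reachable = set()
    stack = [v for v in viral if v in adjacency]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(adjacency[node] - reachable)

    return [(a, b) for a, b in kept if a in reachable and b in reachable]
```

running the reachability pass after the score filter matters: an edge can pass the threshold and still be removed because the only path linking it to a viral protein was filtered out.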
after these operations, the finalized number of genes/proteins/subunits is (

pathways of covid- related host genes/proteins

signaling and metabolic pathway information was taken from reactome (via crossbar database) and kegg pathways data sources. the most relevant pathways were determined by the overrepresentation analysis and mapped to the related genes/proteins in the kg. some of the incorporated pathways are directly related to sars-cov- infection, such as "viral mrna translation" (r-hsa- ) or "isg antiviral mechanism" (r-hsa- ), while others are innate pathways of the host, such as "endocytosis" (hsa ), "cell cycle" (hsa ) or "nf-kappa b signaling pathway" (hsa ). we also incorporated pathway-disease

the resource for the phenotype terms is the human phenotype ontology (hpo) database. for each phenotype term that is associated with at least one gene in the kg according to hpo data, we calculated an enrichment score and p-value via overrepresentation analysis. from the score-ranked hpo term list, we selected phenotype terms that are not in a close parent-child relationship with each other in the hpo directed acyclic graph. hpo also has a curated list of sars related phenotype terms. these terms were also added to the network and mapped to the "covid- " and "severe acute respiratory syndrome" disease

the authors declare no competing interests. supplementary information is available for this paper.
genes/proteins that are associated with the query disease (i.e., core genes/proteins), ( ) collects additional genes/proteins (i.e., first-neighbours) using ppis of core genes/proteins, ( ) identifies biological processes (pathways), of which these genes/proteins (core+neighbouring) are members, ( ) gathers phenotypic terms (hpo) associated with the whole gene/protein set, ( ) obtains known drugs and drug candidate compounds targeting these genes/proteins, together with our deep-learning based interaction predictions, and ( ) revisits the disease collection to make another query with all collected genes/proteins, to obtain the disease entries that have similar implications as the query disease.

bioportal: enhanced functionality via new web services from the national center for biomedical ontology to access and use ontologies in software applications
the ontology lookup service: bigger and better
biograph: unsupervised biomedical knowledge discovery via automated hypothesis generation
systematic integration of biomedical knowledge prioritizes drugs for repurposing
biograph: a web application and a graph database for querying and analyzing bioinformatics resources
constructing biomedical knowledge graph based on semmeddb and linked open data
expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts
knowlife: a versatile approach for constructing a large knowledge graph for biomedical sciences
kabob: ontology-based semantic integration of biomedical databases
science forum: wikidata as a knowledge graph for the life sciences
high performance drug-target interaction prediction with convolutional neural networks using -d structural compound representations
mdeepred: novel multi-channel protein featurization for deep learning based binding affinity prediction in drug discovery
interactive visualization and analysis of compound bioactivity space
knowledge graph embedding by translating on hyperplanes
cytoscape: a software environment for integrated models of biomolecular interaction networks
a new coronavirus associated with human respiratory disease in china
liver diseases in covid- : etiology, treatment and prognosis
controversial treatments: an updated understanding of the coronavirus disease
recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases
enrichment or depletion of a go category within a class of genes: which test?
a sars-cov- protein interaction map reveals targets for drug repurposing

crossbar web-service (including a tutorial on how to use the service) is available at

dr aybar can acar (faculty member, metu, turkey) and dr tugba suzek (faculty member, mugla university, turkey) for helpful discussions, comments and support. we also thank dr ian dunham (director, open targets, uk) and dr andrew leach

): (b) the large-scale kg ( nodes and edges) and (c) the simplified kg ( nodes and edges). both of these graphs reveal the most overrepresented biological processes during a sars-cov- infection (i.e., cell cycle, viral mrna translation, endocytosis, interleukin signaling, etc.), as well as the potential treatment options with covid- related pre-clinical/clinical results (e.g., chloroquine, remdesivir, favipiravir, dexamethasone, etc.) and our novel in silico predictions

we checked the interaction between the significant degs (table s. ) and genes in the large-scale covid- kg, and applied fisher's exact test to analyse the significance of the presence of degs on the kg (table s. ) as opposed to the non-degs in the multiplex panel of the gene expression analysis platform (nanostring)

key: cord- -mtq yh authors: rodrigues, joão pglm; barrera-vilarmau, susana; teixeira, joão mc; seckel, elizabeth; kastritis, panagiotis; levitt, michael title: insights on cross-species transmission of sars-cov- from structural modeling date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: mtq yh severe acute respiratory syndrome coronavirus (sars-cov- ) is responsible for the ongoing global pandemic that has infected more than million people in more than countries worldwide. like other coronaviruses, sars-cov- is thought to have been transmitted to humans from wild animals. given the scale and widespread geographical distribution of the current pandemic, the question emerges whether human-to-animal transmission is possible and if so, which animal species are most at risk. here, we investigated the structural properties of several ace orthologs bound to the sars-cov- spike protein. we found that species known not to be susceptible to sars-cov- infection have non-conservative mutations in several ace amino acid residues that disrupt key polar and charged contacts with the viral spike protein. our models also predict affinity-enhancing mutations that could be used to design ace variants for therapeutic purposes. finally, our study provides a blueprint for modeling viral-host protein interactions and highlights several important considerations when designing these computational studies and analyzing their results. introduction sars-cov- , a novel betacoronavirus first identified in china in late , is responsible for the ongoing global pandemic that has infected more than million people worldwide and killed nearly . [ ] . based on comparative genomics, sars-cov- is thought to have been transmitted to humans from an animal host, most likely bats or pangolins [ ] . given the widespread human-to-human transmission across the globe, the question emerges whether humans can infect other animal species with sars-cov- , namely domestic and farm animals. identifying potential intermediate hosts that can act as reservoirs for the virus has both important global health, animal welfare, and ecological implications. 
during the course of this pandemic, there have been several news reports of domestic, farm, and zoo animals testing positive for sars-cov- infection. belgium [ ] and new york [ ] reported positive symptomatic cases in cats, the netherlands reported infection of minks in farms [ ] , and the bronx zoo in new york reported infections in lions and tigers [ ] . in all these cases, the vehicle of transmission appears to be an infected human owner or handler. more importantly, in the case of the mink farms in the netherlands, there is evidence of human-to-animal-to-human transmission. in addition to these reported cases, several groups put forward both pre-prints and peer-reviewed studies on animal susceptibility to sars-cov- under controlled laboratory conditions [ ] [ ] [ ] , two of which are of particular interest. the first study showed that cats, civets, and ferrets are susceptible to infection; pigs, chickens, and ducks are not, while the results for dogs were inconclusive [ ] . a second study, using human cells expressing recombinant sars-cov- receptor proteins showed that camels, cattle, cats, horses, sheep, and rabbit can be infected with the virus, but not chicken, ducks, guinea pigs, pigs, mice, and rats [ ] . together, these studies provide a dataset of confirmed susceptible and non-susceptible species that we can use to find molecular discriminants between the two groups. for simplicity, from here on we will refer to susceptible and non-susceptible species as sars-cov- pos and sars-cov- neg , respectively. like sars-cov- before, sars-cov- infection starts with the binding of the viral spike protein to the extracellular protease domain of angiotensin-converting enzyme (ace ) [ ] , a single-pass transmembrane protein expressed on the surface of a variety of tissues, including along the respiratory tract and the intestine. 
several biophysical and structural studies identified helices α and α , as well as a short loop between strands β and β in ace as the interface for the viral spike protein [ ] [ ] [ ] [ ] . these studies also identified key differences between the sequences of the receptor binding domains (rbd) of sars-cov- and sars-cov- , which explain the stronger interaction of the latter with human ace . as such, we can reasonably assume that sequence variation across ace orthologs might explain why some animal species are susceptible to infection while others are not. in addition, combining structural and binding data with the natural diversity of ace across species is likely to shine a light on the key aspects that drive ace interaction to viral rbds and ultimately help guide the development of therapeutic molecules against sars-cov- . unsurprisingly, several groups already published, or made available as preprints, multiple sequence and structure-based analyses of how sequence variation affects ace binding to sars-cov- rbd [ ] [ ] [ ] [ ] . two recent preprints, specifically, focus on the effects of ace variation on rbd binding. the first used an ace sequence library to select for mutants that bind rbd with high affinity, identifying several mutants that enhance or decrease affinity to the viral protein and providing a blueprint for engineering proteins and peptides with therapeutic purposes [ ] . while useful, we note that the authors carried out a single round of selection as opposed to the multiple rounds commonly carried out in similar studies. the second study used computational modeling to predict ΔΔg of mutations in animal species and assess their risk for infection [ ] . in addition, the authors also identified a number of locations on ace that contribute to binding the viral rbd, in particular residues , , , as well as a cluster of n-terminal hydrophobic amino acid residues. 
in this study, we aimed to leverage structural, binding, and sequence data to investigate how different ace orthologs bind to sars-cov- rbd. we selected animal species likely to encounter humans in a variety of residential, industrial, and commercial settings. for each of these species, we generated d models of ace bound to rbd and refined these models using short molecular dynamic simulations. after refinement, we found that models of sars-cov- pos species generally have a lower (better) score than those of sars-cov- neg species. further, we carried out a per-residue energy analysis that identified key locations in ace that are consistently mutated across sars-cov- neg species. collectively, our results provide a structural framework that explains why certain animal species are not susceptible to sars-cov- infection, and also suggests potential mutations that can enhance binding to the viral rbd. sequence conservation of ace orthologs we analyzed the sequence conservation of ace across our dataset, with respect to the entire sequence ( residues) and to the interface residues computed from a structure of ace bound to rbd (pdb id: m ) ( residues) ( table ). all orthologs are reasonably conserved, with global similarity values to the human ace sequence (hace ) ranging from % (goldfish) to . % (chimpanzee) (figure , left panel). all species coarsely cluster in three classes consistent with evolutionary distance to humans: primates have the highest similarity values, followed by other mammals, birds and reptiles, and finally fish. zooming in on the interface residues, we find substantially more variation (figure , right panel) . similarity values for these residues range from % (crocodile) to % (all primates) but, despite an overall correlation (pearson r of . ), do not always match global similarities. hedgehogs and sheep, for example, share . % and . % global similarity with hace , respectively, but % and . % for the interface region. 
in absolute numbers, these similarities mean that sheep share out of residues with hace at the interface with rbd, while hedgehogs share . the horseshoe bat, one of the proposed animal reservoirs for sars-cov- , shares . % interface similarity with hace , a value comparable to the . % of the sars-cov- neg mouse sequence. altogether, these results prompt two observations. first, neither global nor interface sequence similarity is predictive of sars-cov- susceptibility. second, the interface of the viral rbd is substantially plastic and able to bind ace orthologs that differ considerably from one another.

refinement of the hace :rbd complex

in order to validate the refinement protocol used in our analysis, we created and refined models of human ace (hace ) bound to sars-cov- rbd. we used the cryo-em structure of full-length human ace bound to the rbd, in the presence of the amino acid transporter b at (pdb id: m ). compared to a high-resolution crystal structure of the same complex (pdb id: m j), the cryo-em structure lacks several key contacts between our two proteins of interest, which we attribute to poor density for side-chain atoms at the interface region. our refinement protocol restores the majority of these contacts (table s ), yielding an average haddock score of - . (arbitrary units) for the best models of the best cluster. see materials and methods for further details on the protocol. these negative haddock scores suggest a favorable interface and agree with scores calculated for a reference set of transient protein-protein interactions (n= , haddock score=- . ± . ) [ ] . the interfaces in our models are dominated by hydrogen bond interactions involving the ace α helix and a small loop between strands β and β . a single salt-bridge, involving hace d and rbd k , is consistently present in all our hace models. these observations all agree with the published crystal structure.
further, the buried surface area of the refined models is also in agreement with published crystal structures (~ Å ). as such, considering the low quality of the interface region in our template structure, we are confident that our modeling and refinement protocol is robust enough to model all ace orthologs. refinement of orthologous ace :rbd complexes we modeled and refined complexes for all ace orthologs in our dataset (table ) using the same protocol as above. the representative models for each species ( best models of the best cluster) are available for visualization and download at https://joaorodrigues.github.io/ace -animal-models/. the haddock scores of all ace complexes (including hace ) range from - . (dog) to - . (mouse), a significant range that indicates substantial differences between these interfaces (table and figure ). the average haddock score is - . , very close to that of the human complex (- . ). overall, models of sars-cov- pos species have consistently lower (better) scores than those of sars-cov- neg species. although it is well-known that docking scores do not quantitatively correlate with experimental binding affinities [ ] , these scores suggest that sars-cov- neg species lack one or more key ace residues that contribute significantly to the interaction with rbd. to understand what forces drive the interactions between ace and sars-cov- rbd, we quantified the contribution of each component of the haddock scoring function to the overall score ( figure ). the haddock score is a linear combination of van der waals, electrostatics, and desolvation energy terms. in our models, electrostatics are the most discriminatory component (pearson r of . ), followed by desolvation ( . ), and finally van der waals ( . ). these correlations suggest that differences between the models of the different species originate primarily in polar and charged residues, in agreement with observations from experimental structures. 
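the decomposition described here can be illustrated with a minimal sketch: a weighted sum of the three energy terms, plus a pearson correlation between one term and the total score across species. the default weights below (1.0, 0.2, 1.0) are an assumption based on the weights commonly documented for the final refinement stage of haddock; the text only states that the score is a linear combination. the function names are hypothetical and no values from the study are reproduced.

```python
import math


def haddock_like_score(e_vdw, e_elec, e_desolv,
                       w_vdw=1.0, w_elec=0.2, w_desolv=1.0):
    """Weighted sum of van der Waals, electrostatics, and desolvation terms.

    The default weights are an assumption of this sketch, not values
    taken from the text.
    """
    return w_vdw * e_vdw + w_elec * e_elec + w_desolv * e_desolv


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

given one list of total scores and one list per energy term (one entry per species), calling `pearson_r(term_values, total_scores)` for each term gives the kind of per-term comparison reported in this section.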
in addition, the buried surface area of the models also correlates quite strongly with the haddock score (pearson r of . ), which is unsurprising since larger interfaces tend to make more contacts. most models bury between and Å , in agreement with the crystal and cryo-em structures, while the top-scoring species (dog and goldfish) bury nearly Å and the lowest-scoring species (mouse) buries only Å . finally, there is a weak correlation between the average haddock score of the representative models and the sequence similarity of the ace interface residues (pearson r of . ) (figure s ).

per-residue energetics of the ace :rbd interface

to gain further insight into how ace sequence variation across the different orthologs affects binding to sars-cov- rbd, we calculated haddock scores for each interface residue in the refined models (figure ). this high-resolution analysis reveals several sites that discriminate between sars-cov- pos and sars-cov- neg species. the first and most relevant of these sites is amino acid , which in hace (d ) interacts with rbd k to form the only intermolecular salt-bridge of the interface. in all sars-cov- pos species, this site is occupied by a negatively charged amino acid residue. in contrast, out of sars-cov- neg species have a hydrophobic or polar residue at this position. the goldfish ace sequence is an interesting outlier, with the second-best haddock score despite having a lysine at position that breaks the intermolecular salt-bridge. the loss of such an important site is compensated by the introduction of an alternative salt-bridge between e and rbd r . finally, the sequences of the top-scoring models also suggest that, between aspartate and glutamate, the latter results in a stronger interaction, likely due to a stabilizing effect of the longer side-chain. the second site is amino acid , a lysine in hace and in nearly all of the sars-cov- pos species, that interacts both with ace e and rbd q .
the only exceptions are the civet and dromedary sequences, mutated to threonine and glutamate, respectively. in the case of the civet, our models show that t can still hydrogen bond with both e and rbd q . dromedaries, on the other hand, share e with chickens, guinea pigs, and ducks, all sars-cov- neg species. however, and quite beautifully, dromedaries compensate the possible electrostatic repulsion between e and e with a lysine at position (q in hace ) leading to the formation of an additional intramolecular salt-bridge that stabilizes the fold of ace and frees e to hydrogen bond with q in % of our models. all three sars-cov- neg species have an additional charge-reversal mutation at position , although with different outcomes in our models. in both chicken and duck ace , e is locked in an intramolecular salt-bridge with r in all of our models, losing the intermolecular hydrogen bond with rbd q . lastly, guinea pigs compensate k e with e k and remain able to hydrogen bond with rbd. the third discriminatory site between sars-cov- pos and sars-cov- neg species is amino acid , a histidine in hace and a polar residue in all sars-cov- pos species. in our hace models, h is doubly-protonated and forms an intramolecular salt-bridge/hydrogen bond with e and an intermolecular hydrogen bond with the hydroxyl group of rbd y . in addition, in most of our models, the aromatic ring of h is close enough (< . Å) to the aliphatic side-chain of rbd l to form productive hydrophobic interactions. our energetic analysis shows that substituting h by polar (serine, threonine) or hydrophobic (leucine, valine) residues destabilizes the interface, while substitution by a tyrosine substantially contributes to a stronger interaction. sars-cov- neg species except mouse and rat have hydrophobic residues at position , losing the ability to hydrogen bond with rbd y . in addition, the side-chain of rbd l is often out of range of hydrophobic interactions. 
in contrast, the h y substitution in the dog, ferret, and civet sequences loses the intramolecular hydrogen bond with e but compensates by hydrophobic interactions with nearby rbd residues and hydrogen bonds with rbd r (ferret), s (civet) or y (dog). in addition, the loss of aromatic residues at position leads to a steep decrease in desolvation energy of the models( figure s ). . cc-by-nc-nd . international license was not certified by peer review) is the author/funder. it is made available under a the copyright holder for this preprint (which this version posted june , . . besides these three major discriminatory sites, we identified three other sites that are systematically mutated in sars-cov- neg species. the first of these sites is k (in hace ), which is involved in an intramolecular saltbridge with d and an additional backbone hydrogen bond with rbd g . in rat and mouse ace , both sars-cov- neg species, this residue is mutated to a histidine, which weakens the interaction with d , possibly leading to increased conformational dynamics of the β -β loop and consequently lower binding affinity. then, position , a glutamine in hace and in most other species, hydrogen bonds with rbd y in the majority of our models. in canary, chicken, pigeon, hedgehog, duck, and crocodile ace sequences, this amino acid is mutated to a glutamate. this substitution introduces the possibility of an additional intramolecular salt-bridge with k , in ace helix α , which we observe in some of our models, preventing the formation of the intermolecular hydrogen bond. finally, y in hace is mutated to phenylalanine in canary, chicken, rat, duck, and mouse ace , mostly sars-cov- neg species. although our models do not offer a clear reason as to why this mutation could be damaging to rbd binding, the loss of the terminal hydroxyl group could have two negative consequences. first, there is the clear loss of two possible hydrogen bonds, with ace q and rbd n . 
then, the gain in hydrophobicity could lead the aromatic moiety to bury between both α and α helices, causing rbd f to lose a valuable interaction partner. all our models and the scoring statistics are available for visualization and download at https://joaorodrigues.github.io/ace -animal-models/. can structural modeling predict cross-species transmission of sars-cov- ? our computational modeling of vertebrate ace orthologs bound to sars-cov- rbd discriminates between previously reported sars-cov- pos and sars-cov- neg species. models of sars-cov- neg species -chicken, duck, guinea pig, mouse, and rat -generally have higher (worse) haddock scores than average (figure ), suggesting that these species' non-susceptibility to infection could stem from deficient rbd binding to ace . despite this clear trend, there are two notable outliers. our modeling ranks guinea pig ace (sars-cov- neg ) as a better receptor for sars-cov- rbd than for example, human, cat, horse, or rabbit ace (all sars-cov- pos species), despite experiments showing that there is negligible binding between the two proteins [ ] . then, the goldfish ace sequence ranks second among all models, despite reports that fishes are unlikely to be susceptible to infection due to their physiology and environment [ ] . these two results highlight the need for critical thinking when evaluating predictions from computational models. as noted earlier in the introduction, sars-cov- infection is a complex multi-step process [ ] . thus, while we can assume that impaired ace binding decreases odds of infection, we cannot state that ace binding is predictive of infection. for instance, experiments with recombinant ace show that the pig ortholog binds sars-cov- rbd and leads to entry of the virus in host cells [ ] , but tests in live animals returned negative results [ ] . in addition, our modeling protocol makes assumptions about the bound state of the two proteins, starting from the cryo-em template structure. 
however, cryo-em structures of the full-length sars-cov- spike protein [ ] highlight multiple unbound conformations for rbd, and coarse-grained simulations of the hace :rbd complex show that there is substantial flexibility in some of the interfacial rbd loops [ ]. altogether, these limitations show that computational models alone cannot predict whether certain animal species are at risk of infection. what our models do predict, however, is that there are distinctive molecular features characteristic of sars-cov- neg species. as the adage goes, 'all models are wrong, but some are useful.' sars-cov- neg species lack important polar and charged ace residues on further inspection, we find that sars-cov- neg models rank worse due to a substantial decrease in electrostatic energy (figure ), indicating loss of polar interface contacts, namely hydrogen bonds and saltbridges ( figure ) . indeed, models of mouse, duck, rat, and chicken lack the ability to form an intermolecular salt-bridge with rbd due to the loss of hace d . these predictions are supported by experimental work, where mutants lacking a negative charge at this position are largely unable to bind rbd [ ] . non-conservative mutations at other sites on ace also contribute negatively to the interface scores. residues k and h (hace ) engage multiple neighboring residues in both intra-and intermolecular hydrogen bonds, contributing both to ace fold stability and rbd binding, respectively. our models suggest that the introduction of a negatively charged residue at position is disruptive to binding, in agreement with experiments [ ] . in sars-cov- pos species, like dromedary camels, this mutation is more likely to be tolerated due to additional compensatory mutations that stabilize the ace fold and still allow for contacts with rbd. in all sars-cov- neg species except guinea pig however, there are no additional mutations to compensate for this substitution. 
as for position , our predictions contrast with experimental measurements [ ] , which show that mutation to a hydrophobic residue improves binding between ace and rbd. in our models, the preference seems to be for aromatic residues (histidine, tyrosine) capable of both hydrogen bonds and hydrophobic interactions. we note, however, that our coverage of sequence space is limited to naturally occurring variants. unlike in the work referenced before [ ] , where the selection driver is rbd binding, natural selection of ace might impose additional constraints on sequence variability. finally, our models suggest that reduced flexibility of ace might be a positive contributor to rbd binding affinity. disrupting an intramolecular salt-bridge between d and k by substituting k with a shorter polar amino acid residue is a consistent feature in mice and rats, both sars-cov- neg species. these results are consistent with other computational modeling work [ ] , which suggests that rbd g d mutants bind worse to ace because of the disruption of this intramolecular salt-bridge.

natural variants of ace encode potential affinity-enhancing mutations for sars-cov- rbd

in addition to identifying mutations that impair binding of sars-cov- rbd, our models suggest several hace variants that could be used to enhance affinity between the two proteins. the clearest affinity enhancer seems to be d e, a variant observed in of the best scoring species (figure ) and shown in experiments to increase binding to rbd [ , ] . the longer side-chain of a glutamate residue can help strengthen and stabilize the intermolecular salt-bridge with rbd k . the impact of such conservative mutations in stabilizing protein interactions has been reported previously for other systems [ ] . the second predicted enhancer is h y, which, as we discussed above, contrasts with experimental measurements.
in addition to maintaining hydrogen bonds and hydrophobic contacts, our models show that this mutation results in a substantial increase in desolvation energy (figure s ). in summary, our protocol combines structural, sequence, and binding data to create a structure-based framework to understand sars-cov- susceptibility across different animal species. our models help rationalize the impact of naturally-occurring ace mutations on sars-cov- rbd binding and explain why certain species are not susceptible to infection with the virus. in addition, we propose possible affinity-enhancing mutants that can help guide engineering efforts for the development of ace -based antiviral therapeutics. despite the aforementioned limitations, our protocol and models can easily be replicated using freely-available tools and web servers and serve as a blueprint for future modeling studies on ace interactions with coronaviruses' rbds. finally, to prevent human-to-animal transmission, we recommend following the world organization for animal health guidelines: people infected with covid- should limit contact with their pets, as well as with other animals (including humans).

sequence alignment of ace orthologs

sequences of ace orthologs from species were retrieved from ncbi using the human gene as a reference (gene id: , updated on -apr- ) and the query term "ortholog_gene_ [group]". other species, such as rhinolophus sinicus, were manually included using custom queries. the sequences were aligned with mafft version [ , ] , using the alignment method fft-ns-i (standard). some sequences had undefined amino acids ('x'), which we converted to glycine to allow modeling without any bias for amino acid identity. all species and the respective protein identifiers are listed in table .
https://doi.org/ . / . . . doi: biorxiv preprint all calculations were based on the alignments from mafft, restricted to the region used for modeling (residues - ). to calculate sequence similarity, we considered the following groups based on physico-chemical properties: charged-positive (arg, lys, his), charged-negative (asp, glu), aromatic (phe, tyr, trp), polar (ser, thr, asn, gln), and apolar (ala, val, ile, met). cys, gly, and pro residues were considered individual classes. the modeling of ace orthologs was carried out using modeller . [ ] and custom python scripts (available upon request). we used the cryo-em structure of the sars-cov- rbd bound to human ace (pdb id: m ) [ ] as a template for all our subsequent models, including all glycans and the coordinates of rbd. to save computational resources, we modeled only the extracellular domain of ace , specifically residues - , which are known to be sufficient to bind to rbd. to avoid unwanted deviation from the initial cryo-em structure, we restricted the optimization and refinement of the models to the coordinates of atoms of mutated or inserted residues. we used the fastest library schedule for model optimization and the very_fast schedule for model refinement. for each species, we generated backbone or loop models and selected the one with the lowest normalized dope score as a representative. these final models were then processed to remove any sugar molecules in species where the respective asparagine residue had been mutated. the initial complex models were prepared for refinement using the pdb-tools suite [ ] . each chain was separated into a different pdb file (pdb_selchain) and standardized with ter and end statements (pdb_tidy). we used haddock . [ ] to carry out the refinement of the models. the protein molecules were parameterized using the standard force field in haddock, while the sugars were parameterized using updated parameters for carbohydrates [ ] . 
we used a modified version of the topology generation scripts to allow automatic detection of n-linked glycans and expand the range of the interface refinement ( Å distance cutoff). each initial homology model was refined through independent short molecular dynamics simulations in explicit solvent (solvshell=true). these refined models were then clustered using the fcc algorithm [ ] with default parameters and scored using the haddock score, a linear combination of van der waals, electrostatics, and desolvation. a lower haddock score is better. the top models of the top scoring cluster, ranked by its average haddock score, were selected as representatives of the complex. analysis of interface contacts of refined ace :rbd complexes we used the interfacea analysis library (version . ) (http://doi.org/ . /zenodo. ) to identify intermolecular contacts between hace and rbd, specifically hydrogen bonds, salt bridges, and aromatic ring stacking. hydrogen bonds were defined between any donor atom (nitrogen, oxygen, or sulfur bound to a hydrogen atom) within . Å of an acceptor atom (nitrogen, oxygen, or sulfur), if the donor-hydrogen-acceptor angle was between and degrees. salt bridges were defined between two residues with a pair of cationic/anionic groups within Å of each other. finally, two aromatic residues were defined as stacking if the centers of mass of the aromatic groups were within . Å (pi-stacking) or Å (t-stacking) and the angle between the planes of the rings was between and degrees (pi-stacking) or between and degrees (t-stacking). additionally, for pi-stacking interactions, the projected centers of both rings must fall inside the other ring. for each modelled species, we took the best models of the best cluster, judged by their haddock score, and aggregated all their contacts together. contacts present in at least models were considered representative. 
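the aggregation step described at the end of this section can be sketched as follows. this is a minimal illustration, not the interfacea implementation: the function name, the tuple layout of a contact, and the `min_models` threshold are hypothetical, and the actual threshold used in the study is not reproduced here.

```python
from collections import Counter


def representative_contacts(models, min_models):
    """Return contacts seen in at least `min_models` of the given models.

    `models` is a list with one entry per refined model; each entry is an
    iterable of hashable contacts, for example tuples of
    (ace2_residue, rbd_residue, contact_type). Both the tuple layout and
    the threshold are assumptions of this sketch.
    """
    counts = Counter()
    for contacts in models:
        # A contact is counted once per model, even if it is listed twice.
        counts.update(set(contacts))
    return {contact: n for contact, n in counts.items() if n >= min_models}
```

counting each contact at most once per model keeps the threshold interpretable as a number of models rather than a number of observations.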
per-residue decomposition of haddock scores

we used a custom cns [ ] script to calculate the haddock score of each residue at the interface between ace and rbd. briefly, the protocol was the following. for each model, since haddock uses a united-atom force field, we first added missing hydrogen atoms and minimized their coordinates, keeping all other atoms fixed. we marked a residue of ace as part of the interface if any of its atoms were within Å of any atom of rbd, and vice-versa. we then calculated the electrostatics, van der waals, and desolvation energies for each of these residues, considering only atoms belonging to the other protein chain. note that this protocol does not account for intramolecular effects of mutations. finally, we calculated the haddock score per residue, using the default scoring function weights, and averaged per-residue values for the best models of the best cluster of each species.

figure . sequence similarity of ace orthologs to human ace . global sequence similarity values range from - %, while similarity values for residues interacting with sars-cov- rbd (derived from m ) range from - %. species are ordered by decreasing global sequence similarity to human ace . colors indicate known susceptibility to infection: sars-cov- pos species in green, sars-cov- neg species in red, others in gray.
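the similarity values summarized in this figure follow the residue grouping given in materials and methods. a minimal sketch of that calculation is shown below, under two assumptions that the text does not settle: residues not assigned to any listed group (leucine, for example) are treated as single-member classes, and alignment columns with a gap in either sequence count as mismatches. the function names and the `positions` argument are hypothetical.

```python
# Residue classes as listed in Materials and Methods (one-letter codes).
GROUPS = {
    "charged-positive": "RKH",
    "charged-negative": "DE",
    "aromatic": "FYW",
    "polar": "STNQ",
    "apolar": "AVIM",
    "cys": "C",
    "gly": "G",
    "pro": "P",
}
RESIDUE_CLASS = {aa: name for name, members in GROUPS.items() for aa in members}


def residue_class(aa):
    """Class of a residue; unlisted residues form their own class (assumption)."""
    aa = aa.upper()
    return RESIDUE_CLASS.get(aa, aa)


def similarity(ref, other, positions=None):
    """Percentage of alignment columns whose residues share a class.

    `ref` and `other` are equal-length rows of the same alignment.
    `positions` optionally restricts the calculation to a list of 0-based
    columns, for example the interface residues. Columns with a gap ('-')
    in either row count as mismatches (assumption of this sketch).
    """
    if len(ref) != len(other):
        raise ValueError("sequences must come from the same alignment")
    cols = list(range(len(ref))) if positions is None else list(positions)
    if not cols:
        raise ValueError("no columns to compare")
    same = sum(
        1
        for i in cols
        if ref[i] != "-" and other[i] != "-"
        and residue_class(ref[i]) == residue_class(other[i])
    )
    return 100.0 * same / len(cols)
```

calling `similarity` once without `positions` and once with the interface columns gives the global and interface values that the figure contrasts.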
figure . haddock scores of modeled ace orthologs bound to sars-cov- rbd. the haddock score predicts the strength of the interaction between proteins. models of sars-cov- pos species (green) generally have better (more negative) scores than sars-cov- neg species (red), suggesting that impaired binding between the two proteins might explain differences in viral susceptibility. the scores shown here are the average of the best models for each ace ortholog.

figure . correlation of haddock score with individual energy terms and structural features. differences in electrostatics energy contribute the most towards discriminating sars-cov- pos species (green) from sars-cov- neg species (red), supporting observations of hydrogen bonding networks and charged interactions in experimental structures. the buried surface area of the models is also correlated with their haddock score. the units for van der waals and electrostatics energies, desolvation, and buried surface area are kcal.mol - , arbitrary units, and Å , respectively. the human complex is shown in black for reference.

figure . haddock score of individual ace interface residues. amino acid residues at positions , , , and are predicted to be the largest contributors to the stability of the interface. sars-cov- neg (red labels) species consistently show changes in these positions which could explain their non-susceptibility to the virus.
the top-scoring sars-cov- pos species (green labels) also suggest that hace d e and h y could potentially act as affinity enhancers. for each species, each block represents an interface residue of ace . the identity of the amino acid is shown in one-letter codes. the colors represent the average haddock score of each particular residue over the best models: lower scores (blue) indicate more favorable interactions. positive scores (dark red) indicate steric clashes or electrostatic repulsion. blank squares indicate that in that ortholog, that position is not part of the interface of the complex. residues marked with * and ** are observed to form hydrogen bonds or salt-bridges in the hace :rbd crystal structure, respectively. see materials and methods for additional details on definitions.

figure s . correlation between interface sequence similarity to hace and haddock score.

figure s . desolvation energy of individual ace interface residues. aromatic residues at position contribute the most to the gain in desolvation energy across all species of the complex, indicating that h y could be a potential affinity enhancing mutation in hace . for each species, each block represents an interface residue of ace . the identity of the amino acid is shown in one-letter codes. the colors represent the average desolvation energy of each particular residue over the best models: lower scores (blue) indicate more favorable interactions.
blank squares indicate that in that ortholog, that position is not part of the interface of the complex. residues marked with * and ** are observed to form hydrogen bonds or salt-bridges in the hace :rbd crystal structure, respectively. see materials and methods for additional details on definitions.

an interactive web-based dashboard to track covid- in real time
the proximal origin of sars-cov-
a cat appears to have caught the coronavirus, but it's complicated
mink infected two humans with coronavirus: dutch government. reuters.
seven more big cats test positive for coronavirus at bronx zoo. in: animals [internet]
susceptibility of ferrets, cats, dogs, and other domesticated animals to sars-coronavirus
potential host range of multiple sars-like coronaviruses and an improved ace -fc variant that is potent against both sars-cov- and sars-cov- . microbiology
simulation of the clinical and pathological manifestations of coronavirus disease (covid- ) in golden syrian hamster model: implications for disease pathogenesis and transmissibility
structural and functional basis of sars-cov- entry by using human ace
structure of the sars-cov- spike receptor-binding domain bound to the ace receptor
structural basis for the recognition of sars-cov- by full-length human ace
structural basis of receptor recognition by sars-cov-
the sequence of human ace is suboptimal for binding the s spike protein of sars coronavirus . biochemistry
sars-cov- spike protein predicted to form stable complexes with host receptor protein orthologues from mammals
receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus
sars-cov- , an evolutionary perspective of interaction with human ace reveals undiscovered amino acids necessary for complex stability
proteins feel more than they see: fine-tuning of binding affinity by properties of the non-interacting surface
are scoring functions in protein−protein docking ready to predict interactomes? clues from a novel binding affinity benchmark
viewpoint: sars-cov- (the cause of covid- in humans) is not known to infect aquatic food animals nor contaminate their products
the trinity of covid- : immunity, inflammation and intervention
structure, function, and antigenicity of the sars-cov- spike glycoprotein
mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform
aleaves facilitates on-demand exploration of metazoan gene family trees on mafft sequence alignment server with enhanced interactivity
comparative protein modelling by satisfaction of spatial restraints
pdb-tools: a swiss army knife for molecular structures
the haddock . web server: user-friendly integrative modeling of biomolecular complexes
compatible topologies and parameters for nmr structure determination of carbohydrates by simulated annealing
clustering biomolecular complexes by residue contacts similarity
version . of the crystallography and nmr system

acknowledgements

jpglmr acknowledges support from the molecular sciences software institute (aci- ). jpglmr and ml acknowledge funding from the national institutes of health usa (r gm ). plk acknowledges funding from the federal ministry for education and research (bmbf, zik program) ( z hn ) and the european regional development funds for saxony-anhalt (efre: zs/ / / ). the authors thank t. dots, k. lindorff-larsen, j. puglisi, and r.
fernandes for feedback and encouragement during the course of the project. key: cord- -qfdp ov authors: shaban, mohammed samer; müller, christin; mayr-buro, christin; weiser, hendrik; albert, benadict vincent; weber, axel; linne, uwe; hain, torsten; babayev, ilya; karl, nadja; hofmann, nina; becker, stephan; herold, susanne; schmitz, m. lienhard; ziebuhr, john; kracht, michael title: inhibiting coronavirus replication in cultured cells by chemical er stress date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qfdp ov coronaviruses (covs) are important human pathogens for which no specific treatment is available. here, we provide evidence that pharmacological reprogramming of er stress pathways can be exploited to suppress cov replication. we found that the er stress inducer thapsigargin efficiently inhibits coronavirus (hcov- e, mers-cov, sars-cov- ) replication in different cell types, (partially) restores the virus-induced translational shut-down, and counteracts the cov-mediated downregulation of ire α and the er chaperone bip. proteome-wide data sets revealed specific pathways, protein networks and components that likely mediate the thapsigargin-induced antiviral state, including herpud , an essential factor of er quality control, and er-associated protein degradation complexes. the data show that thapsigargin hits a central mechanism required for cov replication, suggesting that thapsigargin (or derivatives thereof) may be developed into broad-spectrum anti-cov drugs. one sentence summary / running title suppression of coronavirus replication through thapsigargin-regulated er stress, erqc / erad and metabolic pathways the er is critically involved in surveying the quality and fidelity of membrane and secreted protein synthesis, as well as the folding, assembly, transport and degradation of these proteins (wang & kaufman, ). 
the accumulation of unfolded or misfolded proteins in the er lumen leads to er stress and upr activation, thereby slowing down protein synthesis and increasing the folding capacity of the er (karagoz et al, ) . as a result, cellular protein homeostasis can be restored and the cell survives. if this compensatory mechanism fails, er stress pathways can also switch function and will eventually induce oxidative stress and cell death (hetz & papa, ; wang & kaufman, ). the system relies on three er membrane-inserted sensors, including the protein kinase r (pkr)-like er kinase (perk), inositol-requiring protein α (ire α) and cyclic amp-dependent transcription factor α (atf α). perk and ire α are ser/thr kinases whose conserved n termini are oriented towards the er lumen (wu et al, ) . in non-stressed cells, the highly abundant major er chaperone and er stress sensor binding-immunoglobulin protein bip (also called kda glucose- regulated protein, gpr ; heat shock protein family a member , hspa ) binds to perk and ire α, which keeps these two proteins in an inactive monomeric state ( the activation of er stress by infectious agents has been widely observed. however, with few exceptions, it remains to be studied how this response is shaped in a microbe-specific manner and whether or not these responses are beneficial or detrimental to the host (grootjans et al, ). moreover, there is a lack of knowledge on cov-mediated (de)regulation of er stress components at the protein level. the latter is important because covs, in common with many rna viruses, are known to cause a global shutdown of host protein synthesis (hilton et al, ). here, we report that cov infection activates upr signaling and induces er stress components at the mrna level but suppresses them at the protein level. strikingly, the well-known chemical activator of the upr, thapsigargin, exerts a profound antiviral effect in the lower nanomolar range on three different covs in different cell types. 
a detailed proteomics analysis reveals multiple thapsigargin- regulated pathways and a network of proteins that are suppressed by cov but (re)activated by chemically stressed infected cells. these results reveal new insight into central factors required for cov replication and open new avenues for targeted cov antivirals. results to investigate how covs modulate er stress components at the mrna compared to the protein level, we determined the expression levels of components of the er stress pathway kegg "protein processing in endoplasmic reticulum" in human huh liver cells, a commonly used cellular model for cov replication, in response to infections with hcov- e and mers-cov, respectively. for untreated huh cells, we obtained mrna (by rna-seq) and protein (by lc-ms/ms) expression data for components which revealed a positive correlation between mrna and protein abundancies (fig. a, upper graph) . however, in cell lysates obtained at h post infection (p.i.), this effect was largely lost (fig. a , middle and lower graph). pearson correlation matrix confirmed a progressive loss of correlation between mrna levels and protein levels for this pathway over a time course from h to h p.i. (fig. b) . thus, out of (for hcov- e) or (for mers-cov) er stress factors that were found to be regulated at the mrna level, only a few remained (down)regulated at the protein level at late time points (fig. c, fig. s ). to determine the functional consequences of this opposing regulation at the mrna and protein levels in cov-infected cells, we focused on hcov- e and assessed key regulatory features of the er stress pathway as shown in fig. a . as a reference, we included samples from cells exposed to thapsigargin, a compound that has been widely used to study prototypically activated er stress mechanistically (bertolotti et al., ; oslowski & urano, ) . 
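the mrna-protein comparison described above rests on pearson correlation coefficients. the following is a minimal sketch of that calculation, not the authors' pipeline (the methods state that correlations were calculated with sigmaplot); the function name `pearson_r` and the example values are hypothetical and serve only as illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log-scale abundances for five pathway components.
mrna_levels = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_levels = [1.1, 1.9, 3.2, 3.8, 5.1]
print(round(pearson_r(mrna_levels, protein_levels), 3))
```

computing this coefficient separately for each time point would give the kind of time-resolved comparison summarized in the correlation matrix, under the assumption that mrna and protein values are matched per component.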
this setup included experiments in which thapsigargin and virus were added simultaneously to the cell culture medium (followed by a further incubation for h) or thapsigargin was added to the cells at h p.i. for h (fig. b). the presence of thapsigargin in the growth medium resulted in a major drop in viral titers by more than -fold (from . * to . * pfu / ml), which was paralleled by reduced amounts of viral rna isolated from thapsigargin-treated, hcov- e-infected cells at h p.i. (fig. c). immunofluorescence analysis of hcov- e-infected cells treated with thapsigargin confirmed the impaired formation of functional viral replication/transcription complexes (rtcs), as shown by the reduced levels of both double-stranded rna (an intermediate of viral rna replication) and nonstructural protein (nsp) (an essential part of the viral rtc) (fig. d). a strong suppression of viral replication was also demonstrated by the reduced protein levels observed for the nucleocapsid (n) protein (a major coronavirus structural protein) as well as nonstructural proteins (nsp) and , both of which represent essential components of the viral replication complex (snijder et al, ) (fig. e, f). in all cases, the antiviral effect of thapsigargin remained readily detectable when the compound was added at h p.i., suggesting that it does not prevent viral entry but rather suppresses intracellular pathways required for efficient rna replication and/or particle formation and release, or activates unknown antiviral effector systems (fig. c-f). next, we investigated er stress signaling under these conditions. both virus and thapsigargin were confirmed to activate the perk branch of er stress (fig. e, f), as shown by the retarded mobility of perk in sds gels (indicating multisite phosphorylation) and by phosphorylation of the perk substrate eif α at ser (fig. e, f).
unlike thapsigargin treatment, hcov- e infection led to a weak but significant decrease of perk (mean ± %) and eif α (mean ± %) levels compared to the controls. infection also caused an approximately twofold (mean ± %) reduction in bip expression (fig. e, f) . in contrast, long-term thapsigargin treatment (for h or h) caused a - - fold increase in bip expression, also in hcov- e-infected cells, thus reversing the suppression by viral infection (fig. e, f) . similarly, thapsigargin treatment for h or h caused a . - -fold increase in ire α expression (but not phosphorylation), again also in infected cells (fig. e, f) . in this set of experiments, atf proved to be the only protein that was induced by virus alone (fig. e, f), while the expression levels of atf remained largely unchanged (fig. e, f) . these data show that both cov infection and chemicals like thapsigargin activate er stress through the same proximal perk pathway, although they affect downstream cellular outcomes differentially. the restoration of bip and ire α levels by long-term thapsigargin treatment further suggests that the cov-induced block of inducible host factors is not irreversible and can be reprogrammed by a (presumably protective) thapsigargin-mediated response. our comparative analyses of viral replication and host response lead us to conclude that chemically and virus-induced forms of er stress, although proceeding through the same core perk pathway, do not simply potentiate each other but rather (somewhat counterintuitively) counteract each other. to explore a potential pharmacological exploitation of this effect, we assessed the cytotoxicity of the combined thapsigargin treatment and virus infection, because both conditions are known to promote cell death. at h p.i., cell viability of hcov- e-infected huh cells was only marginally reduced (mean . ± . %) (fig. g, upper graph) . after h of incubation, thapsigargin decreased cell viability in a dose-dependent manner with a cc of . 
µm in line with previous reports (fig. g, middle graph, fig. h ) (sehgal et al, ; tombal et al, ) . the combination of thapsigargin and hcov- e infection did not cause additional cytotoxicity as shown by a nearly identical cc of . µm (lower graph, fig. g, fig. h ). at µm thapsigargin, i.e., a concentration shown to completely abolish viral protein translation and replication (see above), the cell viability of cells infected with hcov- e and treated with thapsigargin was . ± . %, suggesting that thapsigargin exerts its antiviral effects at concentrations well below its cytotoxic concentrations (fig. g, h) . to further characterize the metabolic state of the cells under the conditions used in these experiments, we investigated protein de novo synthesis. newly produced proteins were quantified by in vivo puromycinylation tagging of nascent protein chains followed by immunoblotting using anti-puromycin antibodies. hcov- e was found to shut down protein biosynthesis by . ± . % while h of thapsigargin treatment led to a shutdown by . ± . % (fig. i) . however, in infected cells, the simultaneous or delayed addition of thapsigargin restored (or rescued) protein biosynthesis to approximately % of the level observed in untreated cells (fig. i) . these data demonstrate that, although both viral infection and thapsigargin treatment (individually) induce er stress and cause a translational shut-down, their combination shows no additive harmful effects to the cells. on the contrary, their combination appears to have opposing effects that result in a partial restoration of the cellular metabolic capacity while retaining a profound antiviral effect. we next assessed if these effects were cell-type or virus-specific. 
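the cc values reported above summarize a dose-response curve as the concentration at which cell viability falls to half of the control. the source does not state how the curves were fitted, so the following is only a sketch of one simple option: log-linear interpolation between the two doses that bracket the half-maximal viability. the function name, the `threshold` parameter and the data points are assumptions made for illustration.

```python
import math


def cc50_by_interpolation(doses, viability, threshold=50.0):
    """Estimate the dose at which viability (in % of control) crosses
    `threshold`, interpolating linearly on a log10 dose axis.

    Returns None if no pair of adjacent doses brackets the threshold.
    All doses must be positive.
    """
    pairs = sorted(zip(doses, viability))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(pairs, pairs[1:]):
        if v_lo >= threshold >= v_hi and v_lo != v_hi:
            # Fraction of the way from the lower to the higher dose.
            frac = (v_lo - threshold) / (v_lo - v_hi)
            log_dose = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_dose
    return None


# Hypothetical dose-response data (arbitrary concentration units).
doses = [0.01, 0.1, 1.0, 10.0, 100.0]
viability = [98.0, 95.0, 90.0, 60.0, 20.0]
print(cc50_by_interpolation(doses, viability))
```

comparing the estimate for treated uninfected cells with the estimate for treated infected cells is the kind of check used above to argue that infection adds no further cytotoxicity; a four-parameter logistic fit would be the more common choice when enough data points are available.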
in line with the results described to characterize the underlying molecular mechanisms responsible for the observed antiviral effects of thapsigargin, we focused on two highly pathogenic coronaviruses, mers-cov and sars-cov, for which, to our knowledge, no side-by-side comparison of proteomic changes has been reported to date. the large-scale proteomic study included (i) untreated cells and cells that were (ii) infected with mers-cov, (iii) sars-cov- , (iv) treated with thapsigargin, (v and vi) infected with one of these viruses in the presence of thapsigargin. we used label-free quantification of six replicates per sample to determine the expression levels of > proteins from total cell extracts. in a systematic approach, we identified differentially expressed proteins (deps) based on pairwise comparisons of proteins obtained from untreated cells, virus-infected cells or thapsigargin-treated cells using a p value of -log ≥ . as cut-off. as visualized by volcano plot representations, mers-cov infection suppressed (at h p.i.) and proteins (at h p.i.), respectively, and increased the levels of proteins (at h p.i.) and proteins (at h p.i.), respectively (fig. a, b) , while sars-cov- suppressed the expression of proteins at h p.i. and proteins at h p.i. and increased the expression of proteins at h p.i. and proteins at h p.i. (fig. c, d) . we then devised a bioinformatics strategy to identify patterns of co-regulated or unique pathways and link deregulated protein sets identified in these data to specific (known) biological functions. as shown schematically in up-/or downregulated deps we found that many of the most highly enriched categories are related to rna, dna, metabolic functions and localization (fig. s a) . we then combined the pathway categories and searched this list for identical or unique go terms in response to mers-cov, sars- cov- or thapsigargin. 
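the selection of differentially expressed proteins by a -log p value cut-off, and the subsequent search for shared and unique sets, can be sketched as follows. this is an illustration only: the authors used perseus and excel, the actual cut-off values are not recoverable from the text, and the names `classify_deps`, `shared_and_unique`, `min_neg_log10_p` and `min_abs_log2_ratio` as well as the example proteins are hypothetical.

```python
import math


def classify_deps(results, min_neg_log10_p, min_abs_log2_ratio=0.0):
    """Split proteins into up- and downregulated sets.

    results maps a protein id to (log2_ratio, p_value).
    A protein is kept if -log10(p_value) >= min_neg_log10_p and its
    absolute log2 ratio exceeds min_abs_log2_ratio.
    """
    up, down = set(), set()
    for protein, (log2_ratio, p_value) in results.items():
        if not 0.0 < p_value <= 1.0:
            raise ValueError(f"invalid p value for {protein}: {p_value}")
        if -math.log10(p_value) < min_neg_log10_p:
            continue  # not significant at the chosen cut-off
        if log2_ratio > min_abs_log2_ratio:
            up.add(protein)
        elif log2_ratio < -min_abs_log2_ratio:
            down.add(protein)
    return up, down


def shared_and_unique(set_a, set_b):
    """Return (shared, only in a, only in b), as in a two-set venn diagram."""
    return set_a & set_b, set_a - set_b, set_b - set_a


# Hypothetical results for two conditions.
virus_1 = {"P1": (2.0, 0.001), "P2": (-1.5, 0.0001), "P3": (3.0, 0.5)}
virus_2 = {"P1": (1.2, 0.002), "P4": (1.8, 0.003)}
up_1, down_1 = classify_deps(virus_1, min_neg_log10_p=2.0)
up_2, down_2 = classify_deps(virus_2, min_neg_log10_p=2.0)
print(shared_and_unique(up_1, up_2))
```

the same set operations apply unchanged to lists of enriched pathway terms instead of protein identifiers.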
by filtering pathways (out of ) with enrichment p values ≤ -log we found pathway categories shared by both viruses and by thapsigargin, which are mostly related to rna, folding, stress and localization (fig. f, fig. s b ). pathway categories unique to thapsigargin almost exclusively represented metabolic and biosynthetic pathways as shown for the top overrepresented pathways containing up-or downregulated deps, suggesting that thapsigargin on its own, unlike cov infection, initiates a broad metabolic response (fig. f, fig. s b ). this raised the question of whether the thapsigargin effects were retained in infected cells or, alternatively, drug-sensitive pathway patterns were reprogrammed (or masked) by the virus infection. to address this point, we pooled all pathways enriched under virus+thapsigargin conditions and compared them to virus infection or thapsigargin alone. % ( out ) pathway terms were shared by these three conditions reflecting multiple stress-related, catabolic and rna regulatory processes (fig. g, h) . pathway terms were unique to the virus+thapsigargin situation. they primarily mapped to specific splicing, signaling (torc, rhoa, arf ) and transport/localization pathways (fig. g, h) . the categories shared by virus+thapsigargin and thapsigargin conditions but were not detectable in cells infected with virus (only) recapitulate the thapsigargin-regulated metabolic pathways (pyruvate, aldehyde, carbohydrate, amino/nucleotide sugar, amino acid and glutamine metabolism, tca cycle, erad pathway, n-linked glycosylation) (fig. g, h) . for several of these pathways (e.g. erad, heat stress, carbohydrate metabolism), some deps were induced while others were repressed, indicating remodeling of pathway functions at the protein level (fig. g, h) . the pathway terms that were absent in the virus+thapsigargin group of terms (groups , , of the venn diagram shown in fig. 
g ) represent a distinct set of terms, mostly related to nucleotide and dna- related processes, such as dna repair, dna unwinding, chromatin silencing (fig. g, h) . in summary, the functional analysis of deps at the level of differentially enriched pathway categories shows that the antiviral effects of thapsigargin strongly correlate with the activation / suppression of a range of metabolic programs. the enriched pathway terms provided important overarching information on shared and unique biological processes but not necessarily encompassed identical sets of deps as exemplified by the ten pathways shown in fig. i . we therefore refined our analysis to the individual component level to identify proteins with similar regulation between both viruses across both cell types. the proteomes of huh and vero e cells overlap by % (fig. a) . in this group, only identical proteins were found to be deregulated by both mers-cov and sars-cov- (fig. b, it becomes apparent that the majority of proteins are regulated in the same direction by thapsigargin alone; demonstrating that thapsigargin largely overrides any virus-induced modulation of host processes (fig. c ). in the absence of thapsigargin, the virus infection generally has little or opposite effects on the levels of the proteins, as exemplified by the suppression seen for bip (hspa ) or herpud (fig. c, highlighted in green). the induced factors map to pathways involving copi-mediated anterograde transport, er stress, organelle organization and apoptosis (fig. d) . across their pathway annotations, out of the proteins were reported to strongly interact, thus probably being involved in protein:protein networks that coordinate activities of the enriched pathways (fig. e, left graph) . likewise, the repressed proteins map to specific (though different) pathways, such as fatty acid degradation or viral life cycle (fig. d) . components can be allocated to a few small protein interaction networks (fig. e, right graph) . 
we then validated mass spectrometry data by immunoblotting, confirming the induction of herpud in thapsigargin-treated cells infected individually with each of the three covs (fig. f , g). we also confirmed an additional hit belonging to the enriched pathways go: (cellular amino acid metabolic process) and go: (response to endoplasmic reticulum stress, as shown in fig. g, h), cystathionine-γ-lyase (cth), a regulator of glutathione homeostasis and cell survival (lee et al, ), as a further independent example for the fidelity and robustness of the proteomics data (fig. f, g). the highly inducible herpud protein has an essential scaffolding function for the organization of searching our proteomics data for further erad factors we were able to retrieve a total of (for mers-cov) and (for sars-cov- ) proteins of the canonical erqc and erad pathways for which a differential expression was observed in virus-infected cells treated with thapsigargin (fig. h) . mapping of these data on the kegg pathway suggests that thapsigargin enhances or restores these mechanisms at key nodes of erqc and erad in coronavirus-infected cells (fig. s ) . we also intersected the + proteins jointly regulated by thapsigargin in mers-cov and sars- cov- infected cells with data from a recent genome-wide sgrna screen that reported new erad factors required for protein degradation (leto et al, ) . this analysis identified additional thapsigargin-regulated factors that may further support antiviral erad, including uba and znf , which were recently described either as negative regulators of dna virus infections or of autophagy, the latter process playing diverse roles during cov infection (fig. i) in conclusion, these data show that thapsigargin forces the (re)expression of a dedicated network of proteins with roles in er stress, erqc, erad, and a range of metabolic pathways. 
collectively, these changes at the protein level confer an "antiviral state" and profoundly suppress cov replication as summarized schematically in fig. j . discussion in this study, we report a potent inhibitory effect of the chemical thapsigargin on the replication of three human covs in three different cell types. clearly, the mechanistic basis for these effects remains to be identified in additional studies. the proteomic data show that thapsigargin affects multiple pathways beyond the core er stress response. they also indicate that it will not be trivial to identify the essential targets that mediate thapsigargin's antiviral effects. our data provide a rich resource for further drug target analysis, also in conjunction with the few deep protein sequencing studies available for sars-cov- (but not mers-cov) ( in the absence of effective therapeutic and prophylactic strategies (antivirals and vaccines) to combat coronaviruses, and in view of the current sars-cov- pandemic, we report these observations to invite other laboratories to embark on a broader investigation of this potential therapeutic avenue. given that thapsigargin concentrations in the lower nanomolar range were shown to abolish cov replication in cultured cells, even when added later in infection ( h translational shut-down and are secreted in a cell-type specific manner (fig. s ) . some of these cytokines may contribute to the cytokine storm observed in some covid- patients (mehta et al, ). while thapsigargin had no effect on il- , il- , cxcl and ccl in cell culture (fig. s ) , a single bolus of the compound was shown to efficiently reduce the translation of pro-inflammatory cytokines in preclinical models of sepsis (wei et al, ) . thus, an additional benefit of thapsigargin treatment may arise from dampening overshooting tissue inflammation in covid- patients. 
in summary, the study provides several lines of evidence that thapsigargin hits a central mechanism of cov replication, which may be exploited to develop novel therapeutic strategies. this compound or derivatives with improved specificity, pharmacokinetics and safety profiles may also turn out to be valuable to mitigate the consequences of potential future cov epidemics more effectively. infection. sci pepstatin a, pmsf and microcystin were dissolved in ethanol and leupeptin in water. other reagents were from sigma-aldrich or thermo fisher scientific anti perk (abcam, #ab ), anti bip (cell signaling, # ), anti eif α (cell signaling # ), anti p(s )-eif α (cell signaling # ), anti p anti ire α #sc- ), anti herpud antibody (abnova, #h -a ), anti cth antibody (cruz, #sc- ), anti hcov- e n protein ((ingenasa, batch ), mouse anti hcov- e nsp (gift from carsten grötzinger), rabbit anti hcov- e nsp (ziebuhr & siddell, ), anti mers-cov n protein (sinobiological, # -rp ), rabbit anti sars-cov n protein cross-reacting with sars-cov- n protein (gift from friedemann weber), anti sars-cov- n protein (rockland, # - -a ), anti puromycin the following secondary antibodies were used: dako p ; polyclonal goat anti-mouse immunoglobulins/hrp, dako p ; polyclonal goat anti-rabbit immunoglobulins/hrp, cy -coupled anti rabbit (rb) igg (dk, merck millipore, #ap c), dylight -coupled anti mouse (ms) igg (dk, immunoblotting was performed essentially as described (hoffmann et al after blocking with % dried milk in tris-hcl-buffered saline/ . % tween (tbst) for h, membranes were incubated for - h with primary antibodies, washed in tbst and incubated for - h with the peroxidase- coupled secondary antibody. proteins were detected by using enhanced chemiluminescence (ecl) systems from millipore or ge healthcare mrna expression analysis by rt-qpcr . 
- µg of total rna was prepared by column purification (machereynagel) and transcribed into cdna using moloney murine leukemia virus reverse transcriptase #ep ) in a total volume of or µl. or µl of this reaction mixture was used to amplify cdnas using fisher scientific) for gusb ( bp, hs _m ), il ( bp il ( bp, hs _m ), cxcl ( bp, hs _m ), and ccl ( bp, hs _m ), as well as taqman fast universal pcr master mix (applied biosystems/ all pcrs were performed in duplicate on an abi real-time pcr instrument. the cycle threshold value (ct) for each individual pcr product was calculated by the instrument's software, and the ct values obtained for inflammatory/target mrnas were normalized by subtracting the ct values obtained for gusb. the resulting Δct values were also used to calculate relative changes of mrna expression as ratio (r) of mrna expression of treated after x washing, cells were fixed with % paraformaldehyde in pbs (santa cruz, # ) for min, washed x min with hank's bss (pan, #p - ), blocked with % normal donkey serum (jackson immunoresearch, # - - ) for min and incubated with primary and secondary antibodies diluted in hank's bss containing . % saponin (sigma-aldrich, #s - g) for h at room temperature. following washing steps with hank's bss containing . % saponin in brief, . x huh or x mrc- cells were seeded in -well plates for hours and thereafter treated with dmso, thapsigargin, virus alone or virus plus thapsigargin for hours as indicated in the figure legends. then, the medium was replaced by µl complete cell culture medium including µl celltiter ® aqueous one solution reagent according to the manufacturer's recommendations. cells were further incubated for . - hour at ° c. then, absorbance values were measured at nm after h, µl mtt mix (dmem supplemented with % fcs containing µg/ml tetrazolium bromide, sigma) was added to each well. next, cells were incubated for - min at °c and fixed using . % pfa in pbs. 
the tetrazolium crystals were dissolved by adding µl/well isopropanol and the absorbance at nm was measured using an elisa reader (biotek) elisa sandwich elisas from r&d systems (duoset elisa for human il- (dy ) the cell culture supernatants were harvested, centrifuged at , × g at °c for s and stored at - °c. µl of the supernatants were either used undiluted or were diluted in cell culture medium as follows or mock infection) using two biological replicates resulting in rna-seq data sets. rna was sequenced (with rrna depletion) using illumina reagents and an illumina hiseq instrument (single read, bases) cysteines were alkylated with iodoacetamide and m urea buffer was exchanged to mm ammonium-bicarbonate buffer with a ph of . . samples were digested within the filter devices by the addition of sequencing grade modified trypsin (serva) and incubation at °c over- night. thereafter, the filter-units were transferred to fresh tubes. peptides were eluted by the addition of µl . m nacl solution and centrifugation ( . x g for min) finally, peptides were dissolved in µl water with % acetonitrile and . % formic acid. the mass spectrometric analysis of the samples was performed using a timstof pro mass spectrometer (bruker daltonic). a nanoelute hplc system sample loading was performed at a constant pressure of bar. separation of the tryptic peptides was achieved at °c column temperature with the following gradient of water/ . % formic acid (solvent a) and acetonitrile/ . % formic acid (solvent b) at a flow rate of nl/min: linear increase from %b to %b within minutes, followed by a linear gradient to %b within minutes and linear increase to % solvent b in additional minutes. finally b was increased to % within minutes and hold for additional minutes. the built-in "dda pasef-standard_ . sec_cycletime" method developed by bruker daltonics was used for mass spectrometric measurement. 
data analysis was performed using maxquant with the andromeda search engine, and uniprot databases were used for annotating and assigning protein identifiers. the normalized expression values assigned to uninfected huh cells were derived from a total of mock samples representing multiple technical repeats of two biological samples generated at each of the h, h, h, h time points in order to generate a common reference sample for the mean protein expression found in uninfected / untreated huh cells. this mean reference was used to calculate all ratio values. from the entire data set, only protein intensity values for uniprot ids assigned to kegg were extracted, quantile normalized and further analyzed using the software tools described below. for the data shown in fig. and , raw data from lc-ms/ms runs (representing two independent experiments and three technical replicates per sample) were mapped to homo sapiens. all data sets were processed by maxquant version . . . (raw data submission was done with version . . . ) (tyanova et al., a) with the match between runs option enabled, resulting in the identification of ids assigned to contaminants and reverse sequences were omitted. for calculation of ratio values between conditions, the x replicates from each condition were assigned to one analysis group. differentially expressed proteins were identified from log transformed normalized protein intensity values by volcano plot analysis using perseus functions. all subsequent filtering steps and heatmap representations were performed in excel as described in the figure legends. pearson correlations) were calculated using sigmaplot. thirty minutes prior to harvest, the medium was supplemented with µm puromycin (invivogen, #ant-pr- ). then, cells were lysed as described above. after immunoblotting (see below), membranes were stained with coomassie brilliant blue and then hybridized with an anti puromycin antibody (kerafast, #eq ) to detect puromycinylated polypeptides.
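the quantile normalization step mentioned in the data analysis above forces every sample to share the same intensity distribution. the following is a minimal sketch of the standard procedure, assuming a complete data matrix without missing values and without special handling of ties; it is not the authors' implementation, and the function name and example intensities are hypothetical.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples.

    samples is a list of equal-length lists, one list of protein
    intensities per sample. Each value is replaced by the mean, across
    samples, of the values that share its rank. Ties are broken by
    position rather than averaged (a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of values")
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the i-th smallest value across all samples.
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: two samples, three proteins each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

after this step, ratios between conditions are no longer dominated by sample-wide shifts in intensity, which is the usual reason for applying it before comparing protein levels.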
total cell lysates of mers-cov-and sars-cov- -infected cells used for immunoblotting or mass spectrometry were prepared as follows. cells were scraped in ice-cold pbs and pelleted at x g for min at °. cell pellets were washed in ice-cold pbs and stored in liquid n (or lysed and processed immediately). after thawing, cell pellets (corresponding to ≈ . cells seeded in mm dishes at the start of the experiment) were resuspended in µl of ice-cold ca + / mg + -free pbs and transferred to fresh tubes. after addition of µl of % sds, samples were heated at °c for min and centrifuged at x g for minute at room temperature. supernatants were transferred to a fresh tube and heated again at °c for minutes and centrifuged at x g for minute at room temperature. protein concentrations were determined with the detergent compatible bradford assay kit (pierce™, # ) using a -fold dilution. aliquots corresponding to - µg protein (per lane) were mixed with x sds sample buffer (roti ® load, roth, #k ) and stored at - °c prior to sds-page, or loaded immediately. cell lysates were subjected to sds-page on - % gels. the pageruler™ prestained protein ladder (thermo scientific, # ) was used as mr marker. key: cord- -ch jxy authors: vashi, yoya; jagrit, vipin; kumar, sachin title: understanding the b and t cells epitopes of spike protein of severe respiratory syndrome coronavirus- : a computational way to predict the immunogens date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ch jxy the novel severe respiratory syndrome coronavirus- (sars-cov- ) outbreak has caused a large number of deaths with thousands of confirmed cases worldwide. the present study followed computational approaches to identify b- and t-cell epitopes for spike glycoprotein of sars-cov- by its interactions with the human leukocyte antigen alleles. we identified twenty-four peptide stretches on the sars-cov- spike protein that are well conserved among the reported strains. 
the s protein structure further validated the presence of predicted peptides on the surface. of these, twenty are surface exposed and predicted to have reasonable epitope binding efficiency. the work could be useful for understanding the immunodominant regions in the surface protein of sars-cov- and could potentially help in designing some peptide-based diagnostics. binding domains of s proteins of sars-cov- and sars-cov bind with similar affinities to human ace , . as the situation worsens, there is a growing need for the development of suitable therapeutics and alternate diagnostics against sars-cov- for effective disease management strategies. diagnostic assays based on peptides have become increasingly important and indispensable because of their advantages over conventional methods . the present study aimed to locate appropriate epitopes within a particular protein antigen, which can elicit an immune response that could be selected for the synthesis of the immunogenic peptide. using a computational approach, the s glycoprotein of sars-cov- was explored to identify various immunodominant epitopes for the development of diagnostics. in addition, the results could help us to understand the sars-cov- surface protein response towards t and b cells. collection of targeted protein sequence we downloaded amino acid sequences (n= ) of s protein available at the time of study on targeted sars-cov- from the national center for biotechnology information (ncbi) database. identification of potential peptides to identify an immunodominant region, it is of extreme importance to select the conserved region within the s protein of sars-cov- . all the sequences were compared among themselves for variability using the protein variability server with the shannon method . the average solvent accessibility (asa) profile was predicted for each sequence using the sable server . bepipred .
linear epitope prediction module [ ] [ ] [ ] incorporated in immune epitope database (iedb) was used to predict potential epitopes within the s protein. the fasta sequence of the targeted protein was used as input, with all parameters left at their defaults. the potential epitopes are represented by blue peaks, while green-colored slopes represent non-epitopic regions (figure ). the existence of b-cell linear and discontinuous (conformational) epitopes within the identified segments could help us to identify the peptides that can elicit an immune response . we identified linear epitopes, predicted by ellipro (iedb), which contain regions from of our selected peptides highlighted in red in table . these identified b-cell linear epitopes are listed according to their positional values and scores. epitopes with high scores have more potential for antibody binding. five of our selected peptides (peptide numbers , , , , and in table ) were not considered as potential linear b-cell epitopes. using the same module, b-cell discontinuous epitopes were predicted, which gave epitope regions that contained regions from of our selected peptides highlighted in red (table ). six peptides (peptide numbers , , , , , and in table ) were not predicted as discontinuous b-cell epitopes. to further confirm, we used the abcpred server to detect b-cell epitopes, with the default threshold of . . it identified various epitopes with different lengths and scores; out of those, the regions which contained our selected peptides are highlighted in red (table ). a high score represents a good binding affinity with epitopes, and most of our peptides scored more than . and were predicted as linear b-cell epitopes. we used the iedb server to determine the binding affinity for human leucocyte antigen (hla) with our selected peptides from table .
as recommended by the iedb server, reference hla allele sets were used for the prediction of mhc-i and mhc-ii t-cell epitopes, as they provide comprehensive coverage of the population. all the predictions were made using iedb recommended procedures. we observed good binding affinities for our selected peptides. the list of binding affinities for mhc-i t-cell epitopes is given in table , where a low rank represents high binding affinity. epitopes with rank < %, indicating very high binding affinity, were selected. regions from all of our selected peptides except one were found to be potential t-cell epitope(s) with high binding affinity with hla-a and hla-b alleles. similarly, the list of binding affinities for mhc-ii t-cell epitopes is given in table . regions from our selected peptides are highlighted in red. the results revealed that around half of our selected peptides are potential t-cell epitope(s) with high binding affinity with hla-drb and hla-dp/dq alleles. overall, it was found that the regions identified in table not only had good b-cell and t-cell affinities, but the majority of them also overlapped with discontinuous epitopes (table ). the peptide segments identified from the set of sequences of the sars-cov- s glycoprotein appear to hold reasonable potential to act as immunogens. peptide-based success . in our study, we predicted both b-cell and t-cell epitopes for conferring immunity in different ways. we speculate that the identified epitopes with considerably good epitope binding efficiency have the potential to be immunodominant peptides. peptide-based sensitive and rapid diagnostic kits are considered a better alternative to the conventional serological tests that use the whole antigenic protein . the study could help us to use the predicted peptides as immunogens for the development of diagnostics against sars-cov- .

figure .

table . iedb ellipro predicted discontinuous epitopes for spike protein of sars-cov- .
sequences that match our selected peptides are marked in red.

table . iedb prediction of binding affinity with mhc-i alleles, only our selected peptides with percentile rank less than . are shown here. the binding affinity is considered higher for low percentile rank. sequences that match our selected peptides are marked in red.

(discontinuous epitope residue lists for chains b and c)

table . iedb prediction of binding affinity with mhc-ii alleles, only our selected peptides with percentile rank less than . are shown here. the binding affinity is considered higher for low percentile rank. sequences that match our selected peptides are marked in red.

key: cord- - qfoc authors: jahromi, reza; mogharab, vahid; jahromi, hossein; avazpour, arezoo title: synergistic effects of anionic surfactants on coronavirus (sars-cov- ) virucidal efficiency of sanitizing fluids to fight covid- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qfoc

our surrounding environment, especially often-touched contaminated surfaces, plays an important role in the transmission of pathogens in society. the shortage of effective sanitizing fluids, however, became a global challenge quickly after the coronavirus disease- (covid- ) outbreak in december . in this study, we present the effect of surfactants on coronavirus (sars-cov- ) virucidal efficiency in sanitizing fluids.
sodium dodecylbenzenesulfonate (sdbs), sodium laureth sulfate (sls), and two commercial products, a dish soap and a liquid hand soap, were studied with the goal of reducing the evaporation rate of sanitizing liquids to maximize surface contact time. twelve fluids with different recipes composed of ethanol, isopropanol, sdbs, sls, glycerin, and water of standardized hardness (wsh) were tested for their evaporation time and virucidal efficiency. evaporation time increased by - % when surfactant agents were added to the liquid. in addition, surfactant incorporation enhanced the virucidal efficiency by - % according to the -field test in the en : european standard method. most importantly, however, we found that surfactant addition provides a synergistic effect with alcohols to inactivate the sars-cov- virus. this study provides a simple, yet effective solution to improve the virucidal efficiency of commonly used sanitizers.

the coronavirus pandemic, also known as the covid- pandemic, is an ongoing pandemic of coronavirus disease , caused by severe acute respiratory syndrome coronavirus (sars-cov- ) [ ] . to date, more than . million cases have been confirmed, with more than , deaths globally. surfaces can become contaminated by hands, objects, settling of virus-containing aerosols, or contaminated fluids [ , ] . therefore, these surfaces may play an important role in the transmission of pathogens in society [ , ] . manual disinfection of surfaces by wiping and spraying antibacterial fluids is an important part of the healthcare setting. the shortage of effective sanitizing liquids, however, became a global challenge quickly after the covid- outbreak in december . ethanol and isopropanol are the major active ingredients (~ - %) of most common sanitizers, but the high volatility of these alcohols makes them less effective when sprayed on surfaces. the alcohol retention time on the surface plays a key role in its virucidal efficiency.
a common solution to overcome this challenge is to include less volatile material (such as aloe vera gel) in the sanitizer formulation. however, when blended with gels, the final solution becomes thicker and more difficult to spray on surfaces. anionic surfactants, which are used in the manufacturing of liquid soaps and detergents, are also known for their antibacterial efficiency because of their dual functionality. the hydrophobic side of these surfactants can dissolve the outer layer of viruses and bacteria, while the hydrophilic side dissolves in water [ , ] . therefore, these agents act as emulsifiers to remove the contamination. since the onset of the covid- pandemic, the world health organization (who) has recommended that individuals wash hands more frequently and for at least seconds to ensure bacterial and virus removal from the skin [ , ] . it has been shown that the lipid outer layer of sars-cov- is disrupted after a long enough contact time with anionic surfactants [ ] . in this study, we explored the effect of adding anionic surfactants to sanitizing liquids with different recipes. the main objective of this study was to reduce the volatility of the sanitizers without negatively influencing their thickness (viscosity) while boosting their virucidal properties. two well-known anionic surfactants, sodium dodecylbenzenesulfonate (sdbs) and sodium laureth sulfate (sls), which are commonly used in dish soaps and liquid soaps, respectively, were studied at % concentration. in addition, a commercial dish soap and a commercial liquid hand soap were tested with the goal of providing a simple, yet effective solution for daily applications. all tested liquids were also studied for their sars-cov- virucidal efficiencies.

the sars-cov- coronavirus was obtained from the molecular epidemiology laboratory at shiraz university of medical science, iran. ethanol, isopropanol, glycerin, sdbs, and sls were obtained from millipore sigma (beijing, china).
a commercial dish soap and a commercial hand soap were purchased from golrang (tehran, iran). water of standardized hardness (wsh) was used in control experiments and also as a diluting agent in the formulation of sanitizing fluids. twelve sanitizing fluids with different recipes, as shown in table , were prepared to examine the effect of individual components and mixtures on the evaporation rate and sars-cov- virucidal efficiency of the solutions. for evaporation tests, pvc material with a polyurethane (pur) surface coating was sliced into × cm ( mm thickness) pieces and placed on a flat bench, as shown in figure . sanitizing fluids were sprayed three times (from a distance of ~ cm from the surface) on the pvc object. fluid evaporation was recorded using a sony hxr-nx camera, and videos were processed using ulead videostudio software. a -watt amber light was installed cm above the surface to illuminate it so that higher-resolution videos could be recorded. the surrounding environment was isolated to eliminate possible air convection, and the room temperature was maintained at ~ ºc. the exact evaporation time was recorded for each sprayed liquid. the coronavirus suspension was prepared by infecting monolayers of a cell (human lung epithelial carcinoma cells) lines. the virus titers of these suspensions ranged from to tcid /ml. to determine the virucidal efficiency of each liquid, the -field test was performed according to the en : european standard with slight modification [ ] . briefly, four squares were marked as test fields in a row on the pur-coated pvc material, cm away from each other. the marked test field on this flooring was inoculated with the sars-cov- virus suspension. μl inoculum was pipetted on the first test field and distributed with a spatula. a granite block was rapidly moved from test field to test field and back within no longer than seconds.
after min contact time and drying, the sanitizing fluid was sprayed on the contaminated surface three times and left to act for min. the coronavirus was then recovered from all four fields with a nylon swab. the swab from each field was transferred to ml minimum essential medium (mem), and tubes were vortexed for s. virus titers were determined by endpoint dilution techniques according to the en : european standard and calculated using the method of kärber and spearman [ , ] . the virus reduction factor (rf) on the test fields was then determined according to the method described in detail by becker et al. [ ] . all experiments were carried out in triplicate.

water-based sanitizing solutions normally consist of ~ % alcohol (ethanol and/or isopropanol) as an active ingredient, ~ % glycerin to reduce dehydration if applied to the skin, and a small amount of fragrance. wsh, tested as control liquid, showed s evaporation time using the method described above. to study the evaporation of sanitizing fluids, individual and multiple-compound tests were performed. evaporation test results are presented in figure . the evaporation time of % isopropanol solution (s ) was higher than that of ethanol solution (s ), which was ascribed to the higher molecular weight of isopropanol compared with ethanol. the addition of glycerin to a mixture of % ethanol and % isopropanol (s versus s solutions) also increased the evaporation time by approximately % from to s. however, it is desirable to maximize the liquid contact time with the surface to enhance the virucidal efficiency of the sanitizer. sdbs surfactant, which is one of the main components of most commercial dish soaps, significantly increased the evaporation time from s (of ethanol solution s ) to s (for solution s ) (~ % increase).
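the percentage increases quoted for evaporation time follow from a simple relative-change calculation. the block below is a minimal sketch of that arithmetic; the function name `percent_increase` and the timings are hypothetical placeholders, not measurements from the study.

```python
def percent_increase(baseline_s: float, treated_s: float) -> float:
    """Relative increase of the treated time over the baseline, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline evaporation time must be positive")
    return (treated_s - baseline_s) / baseline_s * 100.0


# Hypothetical evaporation times in seconds (illustrative only).
alcohol_only_s = 40.0
alcohol_plus_surfactant_s = 70.0

increase = percent_increase(alcohol_only_s, alcohol_plus_surfactant_s)
print(f"evaporation time increased by {increase:.0f}%")
```

the same function applies to any pair of solutions, as long as the baseline is the formulation without the added component.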
in addition, the incorporation of % sls (a major constituent of liquid hand soaps) increased the evaporation time of isopropanol solution (s ) from s to s in solution s (~ % increase). these positive results encouraged us to investigate commercial dish soap and liquid hand soap. the volatility reduction in the case of these soaps was even more significant than that of sdbs and sls. notably, liquid hand soap (solution s ) increased the evaporation time of isopropanol solution by about % from to s. furthermore, the addition of % dish soap to the ethanol solution (s ) increased the evaporation time by about % from to s (of fluid s ), as shown in figure . solutions s and s were prepared to record the evaporation time in the absence of alcohols and glycerin. we observed that dish soap and hand soap increased the evaporation time by % and %, respectively, when compared with the control experiment (wsh). since evaporation experiments showed promise, we performed virucidal efficiency experiments thereafter. the twelve liquids exhibited different virucidal efficacies against the chosen coronavirus ( figure ). isopropanol showed a slightly higher (~ %) reduction factor than ethanol solution (s and s , figure ). however, the combination of the alcohols (s ) did not exhibit an rf between those of s and s . the virucidal efficiency of s was ~ % greater than the expected value (average of s and s ), which could be ascribed to a different virucidal activity in s . the addition of % glycerin did not influence the rf significantly, as can be seen from the rf value obtained for s ( . compared with solution s . therefore, there could be another positive synergistic effect between the three agents (alcohol, soap, and glycerin) that creates a need for further investigation. we hypothesize that alcohols participate in the virucidal activity by a dehydration mechanism, while anionic surfactants inactivate the virus via a cell disruption mechanism.
this hypothesis, with emphasis on the synergistic influence, will be studied as the next phase of the present investigation. solutions s and s were prepared to explore the virucidal properties of the commercial liquid soaps in the absence of alcohols. as expected, although these components slightly increased the rf value, the changes were negligible when compared with the control experiment (wsh).

evolution of severe acute respiratory syndrome coronavirus (sars-cov- ) as coronavirus disease (covid- ) pandemic: a global health emergency
microbicides and the environmental control of nosocomial viral infections
what do we know about the sars-cov- coronavirus in the environment?
the role of contaminated surfaces in the transmission of nosocomial pathogens, in use of biocidal surfaces for reduction of healthcare acquired infections
role of hospital surfaces in the transmission of emerging health care-associated pathogens: norovirus, clostridium difficile, and acinetobacter species
formation and stabilization of antimicrobial delivery systems based on electrostatic complexes of cationic-non-ionic mixed micelles and anionic polysaccharides
.c. and i. science, mechanism of interaction between ionic surfactants and polyglycol ethers in water
features, evaluation and treatment coronavirus (covid- )
the effectiveness of moral messages on public health behavioral intentions during the covid- pandemic
considerations for quarantine of individuals in the context of containment for coronavirus disease (covid- ): interim guidance
chemical disinfectants and antiseptics - quantitative suspension test for the evaluation of bactericidal activity of chemical disinfectants and antiseptics used in food, industrial, domestic, and institutional areas - test method and requirements
beitrag zur kollektiven behandlung pharmakologischer reihenversuche
the method of 'right and wrong cases' ('constant stimuli') without gauss's formulae
evaluation of the virucidal efficacy of disinfectant wipes with a test method simulating practical conditions

the experimental work for this study was carried out at jahrom university of medical sciences, iran, as part of the rapid response act to covid- . no humans or animals were subjected to experimentation during this study.

key: cord- - uzivsk authors: zhang, zheng; zhu, zhaozhong; chen, wenjun; cai, zena; xu, beibei; tan, zhiying; wu, aiping; ge, xingyi; guo, xinhong; tan, zhongyang; xia, zanxian; zhu, haizhen; jiang, taijiao; peng, yousong title: membrane proteins with high n-glycosylation, high expression, and multiple interaction partners were preferred by mammalian viruses as receptors date: - - journal: biorxiv doi: . / sha: doc_id: cord_uid: uzivsk

receptor mediated entry is the first step for viral infection. however, the relationship between viruses and receptors is still obscure.
here, by manually curating a high-quality database of pairs of mammalian virus-host receptor interactions, which included unique viral species or sub-species and virus receptors, we found the viral receptors were structurally and functionally diverse, yet they had several common features when compared to other cell membrane proteins: more protein domains, a higher level of n-glycosylation, a higher ratio of self-interaction and more interaction partners, and higher expression in most tissues of the host. additionally, the receptors used by the same virus tended to co-evolve. further correlation analysis between viral receptors and the tissue and host specificity of the virus shows that virus receptor similarity was a significant predictor of mammalian virus cross-species infection. this work could deepen our understanding of viral receptor selection and help evaluate the risk of viral zoonotic diseases.

systematic analysis of the characteristics of the viral receptor could help understand the mechanisms underlying receptor selection by viruses. the virus-receptor interaction was reported to be a principal determinant of viral host range, tissue tropism and cross-species infection [ , , ] . the existence and expression of the virus receptor in a host (or tissue) should be a prerequisite for viral infection of the host (or tissue) [ ] . usually, a virus mainly infects particular types of hosts or tissues. for example, the influenza virus mostly infects cells of the respiratory tract [ ] . however, the virus-receptor interaction is a highly dynamic process. some viruses can recognize one or more receptors [ , , ] , which can also differ among virus variants or during the course of infections [ , , ] . in some cases, a few amino acid mutations in the viral protein or the receptor could abolish or enhance viral infection [ ] [ ] [ ] .
besides, the virus-receptor interaction is under

we first investigated the structural characteristics of mammalian virus receptor proteins. as expected, all the mammalian virus receptor proteins were membrane proteins with at least one transmembrane alpha helix ( figure s a ). twenty-four of them had more than five helices, such as -hydroxytryptamine receptor a (htr a) and npc intracellular cholesterol transporter (npc ). the receptor proteins were mainly located in the cell membrane. besides, more than one third ( / ) of them were also located in the cytoplasm, and thirteen of them were located in the nucleus. then, the protein domain composition of the mammalian virus receptor proteins was analyzed. the mammalian virus receptor proteins contained a total of domains based on the pfam database, with each viral receptor protein containing more than two domains on average ( figure s b ). this was significantly more than that of human proteins or human membrane proteins (p-values < . in the wilcoxon rank-sum test). some viral receptor proteins may contain more than domains, such as complement c d receptor (cr ) and low density lipoprotein receptor (ldlr). the human viral receptors were observed to have ten or more n-glycosylation sites, such as complement c b/c b receptor (cr ) and lysosomal associated membrane protein (lamp ). figure b displays the modeled d-structure of htr a, the receptor for jc polyomavirus (jcpyv). five n-glycosylation sites, which were reported to be important for viral infection [ ] , are highlighted in red on the structure. for comparison, we also characterized the n-glycosylation level of human cell membrane proteins, human membrane proteins and all human proteins (figure a ). they were found to have a significantly lower level of n-glycosylation than human and mammalian virus receptors (p-values < . in the wilcoxon rank-sum test), which suggests the importance of n-glycosylation for the viral receptor.
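the group comparisons above rely on the wilcoxon rank-sum test, which the study ran with `wilcox.test()` in r. as an illustration of what the test computes, the block below is a minimal plain-python sketch that uses the normal approximation without tie or continuity correction, so its p-values will not match r exactly. the function names and the n-glycosylation site counts are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (U statistic for group_a, two-sided p-value, normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_sided


# Hypothetical n-glycosylation site counts per protein (illustrative only).
receptor_sites = [10, 12, 8, 15, 9]
other_membrane_sites = [2, 3, 1, 4, 2]
u_stat, p_value = rank_sum_test(receptor_sites, other_membrane_sites)
print(u_stat, p_value)
```

for small samples or many ties, an exact or tie-corrected implementation would be the safer choice; the approximation is kept here only to make the rank logic visible.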
enriched, such as "regulation of leukocyte activation" and "lymphocyte activation". for the go molecular function (table s ), besides the enrichment of terms related to virus receptor activity, the human virus receptors were also enriched in terms of binding to integrin, glycoprotein, cytokine, and so on. consistent with the enrichment analysis of go cellular component, the kegg pathways of "cell adhesion molecules", "focal adhesion" and "ecm-receptor interaction" were also enriched. besides, the pathway of "phagosome" was enriched (table s ), which may be associated with viral entry into the host cell. interestingly, some pathways associated with heart diseases were enriched, including "dilated cardiomyopathy", "hypertrophic cardiomyopathy", "arrhythmogenic right ventricular cardiomyopathy" and "viral myocarditis".

we next analyzed the protein-protein interactions (ppis) in which the mammalian virus receptor proteins took part. for the reason mentioned above, we only used the human (table s ) . when looking at the interactions between viral receptors, we found that of viral receptors interacted with themselves. this ratio ( / = %) was much higher than that of human proteins ( %), membrane proteins ( %) and human cell membrane proteins ( %). however, we found the viral receptors tended not to interact with each other ( figure s d since the virus has to compete with other proteins for binding to the receptor, proteins (table s ) .

since the viral receptor determines the host specificity of the virus to a large extent, it is expected that the closer the viral receptor is to its homolog in a species, the more likely the virus that uses the receptor is to infect that species. to validate this hypothesis, we first calculated the sequence identities between the viral receptors and their homologs in mammal species ( figure and table s ). for clarity, only mammal species, which were frequently observed, were presented in figure . then, table s .
what's the relationship between glycosylation and viral receptor selection? as we know, glycosylation of proteins is widely observed in eukaryote cells [ ] . it plays an important role in multiple cellular activities, such as folding and stability of glycoproteins, immune response, cell-cell adhesion, and so on. glycans are abundant on host cell surfaces. they were probably the primordial and fallback receptors for the virus [ ] . to use glycans as their receptors, a large number of viruses have stolen a host galectin and employed it as a viral lectin [ , ] . for example, the sjr fold, which is mainly responsible for glycan recognition and binding in cellular proteins, was observed in the viral capsid proteins of over one fourth of viruses [ ] . thus, during the process of searching for protein receptors, a protein with a high level of glycosylation could provide a basal attachment ability for the virus, and should be the preferred receptor for the virus. secondly, our analysis showed that the viral receptor proteins had a tendency to interact with themselves and had far more interaction partners than other membrane proteins. besides functioning as a viral receptor, the receptor protein functions in the host cell by interacting with other proteins of the host, such as signal molecules and ligands. therefore, the virus has to compete with these proteins for binding to the receptor [ ] . proteins with fewer interaction partners would thus be expected to be preferred by the virus. why did the virus select proteins with multiple interaction partners as receptors? one possible reason is that the receptor proteins are closely related to the "door" of the cell, so that many proteins have to interact with them to move in and out of the cell. this could be partly validated by the observation that, for the interaction partners of human viral receptors, six of the top ten enriched terms in the domain of go biological process were related to protein targeting or localization (table s ) .
for entry into the cell, the virus also selects these proteins as receptors. another possible reason is that viral entry into the cell needs the cooperation of multiple proteins which have not yet been identified as viral receptors. besides, previous studies show that the virus could structurally mimic native host ligands [ ] , which helps it bind to the host receptor.

(table s ). the number of transmembrane alpha helices of the mammalian virus receptors was derived from the uniprotkb database and the web server tmpred [ ] . the location of the viral receptor was inferred from the description of "subcellular location" for the receptor protein provided by uniprotkb, or from its go annotations: the viral receptors annotated with go terms which included the words "cell surface" or "plasma membrane" were considered to be located in the cell membrane; those annotated with go terms which included the words "cytoplasm", "cytosol" or "cytoplasmic vesicle", or shown to be in the cytoplasm in uniprotkb, were considered to be located in the cytoplasm; those annotated with go terms "nucleus" (go: ) or "nucleoplasm" (go: ) were considered to be located in the [ ] in r (version . . ) [ ] .

to identify the homologs of the mammalian virus receptors in other mammal species, the protein sequence of each viral receptor was searched against the database of mammalian protein sequences, which were downloaded from the ncbi refseq database [ ] on october th , , with the help of blast (version . . ) [ ] . analysis showed that in the database of mammalian protein sequences, there were mammal species which were richly annotated and had far more protein sequences than other mammal species (table s ) . therefore, only these mammal species were considered in the evolutionary analysis. based on the results of blast, the homolog of a viral receptor was defined as the hit with e-value smaller than e- , coverage equal to or greater than % and sequence identity equal to or greater than %.
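the homolog definition above amounts to a threshold filter over blast hits followed by keeping one hit per species. the block below is a minimal sketch of that logic. the `BlastHit` fields, the function name, and the threshold values in the example are hypothetical (the actual cutoffs are not recoverable from this text), and ranking the best hit by lowest e-value with identity as tie-breaker is an assumption rather than the authors' stated rule.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    species: str
    subject_id: str
    evalue: float
    coverage: float  # percent of the query covered by the alignment
    identity: float  # percent sequence identity


def best_homologs(hits, max_evalue, min_coverage, min_identity):
    """Keep hits passing all three thresholds, then one best hit per species."""
    best = {}
    for hit in hits:
        if hit.evalue >= max_evalue:      # e-value must be strictly smaller
            continue
        if hit.coverage < min_coverage:   # coverage must be >= threshold
            continue
        if hit.identity < min_identity:   # identity must be >= threshold
            continue
        current = best.get(hit.species)
        # Assumed ranking: lowest e-value first, higher identity breaks ties.
        if current is None or (hit.evalue, -hit.identity) < (
            current.evalue, -current.identity
        ):
            best[hit.species] = hit
    return best


# Hypothetical hits and placeholder thresholds (illustrative only).
hits = [
    BlastHit("mouse", "m1", 1e-50, 95.0, 88.0),
    BlastHit("mouse", "m2", 1e-80, 97.0, 91.0),
    BlastHit("pig", "p1", 1e-3, 99.0, 99.0),   # fails the e-value cutoff
    BlastHit("bat", "b1", 1e-60, 40.0, 90.0),  # fails the coverage cutoff
]
homologs = best_homologs(hits, max_evalue=1e-10, min_coverage=80.0, min_identity=30.0)
print({species: hit.subject_id for species, hit in homologs.items()})
```

a species with no hit passing the thresholds is simply absent from the result, which mirrors treating it as having no detectable homolog.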
only the closest homolog, i.e., the best hit, in each mammal species was used (table s ). similar methods as above were utilized to calculate the indicators of conservation level for these proteins. for the analysis of co-evolution between viral receptors, a phylogenetic tree was first built for each viral receptor based on the protein sequences of the receptor and its homologs in mammal species with the help of phylip (version . ) [ ] . the neighbor-joining method was used with the default parameters. then, the genetic distances between the viral receptor and its homologs were extracted from the phylogenetic tree with a perl script. finally, for a pair of viral receptors, the spearman correlation coefficient (scc) was calculated between the pairwise genetic distances of the viral receptors and their homologs, which was used to measure the extent of co-evolution between this pair of viral receptors. the set of human housekeeping genes was adapted from the work of eisenberg et al [ ] . a total of genes were identified as housekeeping genes (table s ). the mammalian virus-host interactions were primarily adapted from olival's work [ ] . one hundred and fifteen viruses in our database and of richly annotated mammal species could be mapped to those in olival's work (table s ). these viruses used a total of viral receptors. the sequence identities of these viral receptor proteins to their related homologs in the corresponding mammal species are presented in table s . for comparison, we also extracted genetic distances (host relatedness) between the mammal species and the viral host with reported receptors based on olival's work (table s ) . then, the genetic distance of the mammal species to the viral host with reported receptors, and the sequence identity of the receptor homolog in the mammal species to the viral receptor protein, were each used to predict whether a mammal species could be infected by the virus which infected the host with reported receptors.
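the comparison of the two predictors can be illustrated with the area under the roc curve, computed here directly as the probability that an infected species is ranked above a non-infected one. this is a minimal plain-python sketch, not the analysis pipeline used in the study; the function name and all scores are invented for illustration.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win, which equals the area under the ROC curve.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical receptor sequence identities (%) to the reported receptor.
identity_infected = [98.0, 95.0, 90.0, 85.0]
identity_not_infected = [92.0, 80.0, 75.0, 70.0]
auc_identity = auc(identity_infected, identity_not_infected)

# Hypothetical host genetic distances; smaller distance should mean more
# likely infection, so the sign is flipped before ranking.
distance_infected = [0.05, 0.10, 0.30, 0.20]
distance_not_infected = [0.15, 0.40, 0.50, 0.25]
auc_distance = auc([-d for d in distance_infected],
                   [-d for d in distance_not_infected])

print(auc_identity, auc_distance)
```

this sketch only yields the two auc values; testing whether they differ significantly requires a dedicated procedure, which is not reproduced here.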
the method of receiver operating characteristic (roc) curve was used to evaluate and compare their performance with the functions roc(), auc(), roc.test() and plot.roc() in the package "proc" [ ] in r (version . . ).

statistical analysis

all the statistical analysis was conducted in r (version . . ) [ ] . the wilcoxon rank-sum test was conducted with the function wilcox.test().

clusterprofiler: an r package for comparing biological

hz, tj and yp conceived and designed the study. zz and zzz did the computational

key: cord- -rdhjj g authors: taniwaki, s.a; silva, s.o.s; santana-clavijo, n.f; conselheiro, j. a; barone, g.t; menezes, a.a.r; pereira, e.s; brandão, p.e title: resource optimization in covid- diagnosis date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: rdhjj g

the emergence and rapid worldwide dissemination of a novel coronavirus (sars-cov- ) has resulted in a decreased availability of swabs for clinical sample collection, as well as of reagents for rt-qpcr diagnostic kits, which are considered the confirmatory test for covid- infection. this scenario showed the need to improve diagnostic capacity, so the aims of this study were to verify the possibility of reducing the reaction volume of rt-qpcr and to test cotton swabs as an alternative for sample collection. rt-qpcr volumes and rna sample concentration were optimized without affecting the sensitivity of the assays, using both probe-based and intercalating dye methods. although rayon swabs showed better performance, cotton swabs could be used as an alternative for clinical sample collection. covid- laboratory diagnosis is important for isolating cases and restricting the dissemination of the virus, so seeking alternatives that decrease the cost of assays improves control of the disease.

on december , , the world health organization (who) was notified about cases of severe pneumonia in wuhan, china (who, a) .
the etiological agent was identified as a coronavirus (coronaviridae: betacoronavirus: sarbecovirus: severe acute respiratory syndrome-related coronavirus) named sars-cov- (gorbalenya et al., ; lu et al., a) and the disease was denominated covid- (coronavirus disease ) (who, b) . confirmed cases are currently peaking in latin america and, for instance, on june th , more than , cases had been confirmed in brazil (available in https://covid.saude.gov.br/). chain reaction (rt-qpcr), which can detect active infection by detection of the viral genome. collection of nasopharyngeal samples alone ( swab used for both nostrils) or combined with an oropharyngeal ( swab) sample should be made with synthetic fiber swabs, which are more reliable because they interfere less with the downstream reactions (cdc, a) . protocols with primers and probes designed at the charité virology institute, berlin (corman et al., ) or at the centers for disease control and prevention (cdc, b; lu et al., b) became standard procedures for sars-cov- detection. with the increasing demand for rt-qpcr tests for both patients and population screening, there is an ongoing shortage of both rt-qpcr reagents and rayon swabs; besides, the relatively high costs of these consumables for poor countries pose a barrier to wider use of testing. this manuscript reports on the optimization of the charité and the cdc rt-qpcr protocols for sars-cov- detection regarding concentration and volumes of reagents for both probe and intercalating agent-based platforms, as well as on the substitution of cotton swabs for rayon swabs for sample collection. an inactivated sars-cov- isolate (kindly provided by prof. edison l. durigon, institute of biomedical sciences, university of são paulo, brazil) was used for total rna extraction with qiaamp ® viral rna mini kit (qiagen).
total rna concentrations of and % for final reactions of , and µl were tested using the agpath-id one step rt-pcr kit (applied biosystems ™ ) and stepone ™ real-time pcr systems (applied biosystems ™ ), with primers and probes for the rdrp and e genes (corman et al., ) . amplification conditions were ºc/ min (reverse transcription), ºc/ min (for the activation of the dna polymerase) followed by cycles at ºc/ seconds and ºc/ seconds. though the different total rna concentrations resulted in a cq difference, no difference was found among the three final reaction volumes (table ) . the performance of the e and rdrp gene sars-cov- rt-qpcrs, based on a final reaction volume of µl with µl of rna (table ) , was verified with a relative standard curve built with - to - dilutions of positive rna control. for the e and rdrp genes, respectively, detection was observed up to the - (cq mean= . , e= . % and r = . ) and - dilutions (cq mean= . , e= . % and r = . ). this lower sensitivity for the rdrp gene is possibly a consequence of lower transcription of the corresponding orf due to a -to- ' attenuation (irigoyen et al., ) . so, the results showed that the final volume does not interfere with the sensitivity of this assay. the kits gotaq ® probe -step rt-qpcr (promega) and superscript iii platinum one-step rt-pcr system (thermofisher scientific) were also tested for the final reaction volumes of and µl, as per manufacturer's instructions, in a real-time pcr system (applied biosystems ™ ), and the same results were found when compared to the agpath-id one step rt-pcr kit (applied biosystems ™ ). primer for sars-cov- kit (biosearch™ technologies), which includes primers and probes for sars-cov- n gene (n and n ) and human rnase p (endogenous control), was tested with final reaction volumes of and µl with % concentration of total rna, using agpath-id one step rt-pcr (applied biosystems ™ ) and stepone ™ real-time pcr systems (applied biosystems ™ ). 
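the efficiency (e) and r values reported for each relative standard curve follow from a linear regression of cq on log10 dilution. the following is a minimal sketch, assuming the conventional formula e = 10^(-1/slope) - 1; the function name and the dilution and cq values are hypothetical illustrations, not data or code from this study.

```python
def standard_curve(log10_dilutions, cq_values):
    """Least-squares fit of cq = slope * log10(dilution) + intercept.

    Returns (slope, intercept, r_squared, efficiency_percent).
    Efficiency uses the conventional formula E = 10**(-1/slope) - 1.
    """
    n = len(log10_dilutions)
    if n < 3 or n != len(cq_values):
        raise ValueError("need at least three paired points")
    mean_x = sum(log10_dilutions) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_dilutions)
    syy = sum((y - mean_y) ** 2 for y in cq_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_dilutions, cq_values))
    if sxx == 0 or syy == 0:
        raise ValueError("dilutions and cq values must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    efficiency_percent = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, intercept, r_squared, efficiency_percent


# Hypothetical ten-fold dilution series (illustrative values only).
example_x = [-1, -2, -3, -4, -5]
example_cq = [18.1, 21.5, 24.8, 28.2, 31.5]
slope, intercept, r2, eff = standard_curve(example_x, example_cq)
print(f"slope={slope:.2f} r2={r2:.4f} efficiency={eff:.1f}%")
```

a slope near -3.32 corresponds to a doubling of product per cycle, i.e. an efficiency of about 100%.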
the same conditions and relative standard curve ( - to - ) were used with the e gene reaction (corman et al., ) to enable comparison between assay performances. all reactions resulted in adequate efficiencies and r values (table ) and detected the targets up to the - dilution (table ) , though the n reaction resulted in a more intense fluorescence for the higher dilutions, allowing easier visualization of amplification. the difference in sensitivity between this last test and that previously described for the charité rdrp and e gene protocol is possibly due to the different positive control aliquots used. the limit of detection (lod) was calculated for each assay using replicates of each dilution ( x - , x - and x - ). the n reaction showed slightly better results, with an lod of x - dilution, while the e and n assays had an lod of x - dilution (table ) . for rnase p, the reaction with primers at nm was found to be more efficient (e= . %; r = . ), with a specific peak at ºc in the melt curve; at higher primer concentrations, primer dimers were detected in melt curve analysis, and at nm there was a loss in sensitivity. the e gene reaction was found to be as sensitive as the probe-based reaction for this target, being able to detect up to the - dilution of positive rna control with primers at both and nm, with a unique melting peak at ºc. at nm primer concentration, though, a higher efficiency was found (e= . % and r = . ), as well as an earlier target detection for the - dilution (cq mean= . and . , respectively, for and nm). the rdrp reaction showed insufficient performance: with primers at nm there was a loss in sensitivity, while at nm the sensitivity was equal to that of the probe-based reaction (detection up to the - dilution), but efficiency could not be evaluated due to primer dimer formation visualized in melt curve analysis. an inactivated sars-cov- isolate was -fold diluted in sterile saline from - to - in µl, in duplicate. 
for each respective dilution, one rayon (inlab diagnóstica, brazil) and one cotton (labor, brazil) swab, both with plastic shafts, were dipped into the tube (one swab/tube) until all the content was absorbed by the swab. undiluted sars-cov- ( ) was also included, as well as sterile saline (negative control). next, each swab was immersed in a conical tube containing ml sterile saline and kept at - ºc for hours, in order to mirror the routine flow of clinical samples. all swabs were finally submitted to rna extraction and tested with the probe-based e gene rt-qpcr protocol. rayon swabs showed more consistent results, with linear detection through all dilutions from to - (table ) , while cotton swabs failed to yield detection in two dilutions ( - and - ). the cdc does not recommend the use of calcium alginate swabs or swabs with wooden shafts, because these materials may contain pcr inhibitors, and recommends only synthetic fiber swabs for collection of clinical samples (cdc, a). however, some studies have shown good performance of cotton swabs for detection of human norovirus and rhinovirus by rt-qpcr (waris et al., ; lee et al., ) . cotton and rayon swabs have the same chemical structure, with o-h groups which form hydrogen bonds with nucleic acids, and showed optimal absorption capacity (above µl), but low extraction and recovery efficiency (bruijns, ) . although amplification failed in some dilutions, detection was possible at both lower and higher dilutions; thus cotton swabs could be an alternative to rayon swabs for clinical sample collection. table - probe-based rt-qpcr for the e gene of serial dilutions of sars-cov- sampled with cotton and rayon swabs. the ongoing shortage of rayon swabs and rt-qpcr reagents for the sampling and testing of persons suspected of covid- will surely have a negative impact on the control of the disease, due to lower sampling and a higher under-reporting rate. 
here, we demonstrated that decreasing the reaction volume does not interfere with the sensitivity of the assays, allowing diagnostic capacity to be increased, and that the choice of swab type must be based on local availability and the urgency of sampling and testing. since no loss of sensitivity was observed using µl final-volume reactions, the following protocols are suggested for probe-based and intercalating dye rt-qpcrs for detection of sars-cov- . amplification conditions were optimized for both probe and intercalating dye reactions and for different kits and primer sets. step step rt-pcr (applied biosystems ™ ) *gotaq probe -step rt-qpcr kit (promega) and superscript iii platinum one-step rt-pcr system (thermofisher scientific) were also tested sars-cov n and n and human rnase p final concentration volume/sample (µl) agpath-id buffer ( x) x . primer/probe mix ( x cnpq (brazilian national board for scientific and technological development grant number / - ) and capes (coordenação de aperfeiçoamento de pessoal de nível superior, brasil -finance code ). the authors are grateful to prof. edison l. durigon and his team at the laboratory of clinical and molecular virology, caroline cotrim aires from the laboratory of diagnostics of zoonosis and vector-borne diseases, and dr. solange maria de saboia e silva from health surveillance coordination for the institutional support and incentive to the development of the study interim guidelines for collecting, handling, and testing clinical specimens for covid- cdc. centers for disease control and prevention. 
-novel coronavirus ( -ncov) real-time rt-pcr primer and probe information the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov- high-resolution analysis of coronavirus gene expression by rna sequencing and ribosome profiling comparison of swab sampling methods for norovirus recovery on surfaces genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding. the lancet us cdc real-time reverse transcription pcr panel for detection of severe acute respiratory syndrome coronavirus . emerging infectious diseases comparison of sampling methods for the detection of human rhinovirus rna pneumonia of unknown cause-china novel coronavirus ( -ncov) key: cord- -tmszrtju authors: hoepel, willianne; chen, hung-jen; allahverdiyeva, sona; manz, xue; aman, jurjan; bonta, peter; brouwer, philip; de taeye, steven; caniels, tom; van der straten, karlijn; golebski, korneliusz; griffith, guillermo; jonkers, rené; larsen, mads; linty, federica; neele, annette; nouta, jan; van baarle, frank; van drunen, cornelis; vlaar, alexander; de bree, godelieve; sanders, rogier; willemsen, lisa; wuhrer, manfred; bogaard, harm jan; van gils, marit; vidarsson, gestur; de winther, menno; den dunnen, jeroen title: anti-sars-cov- igg from severely ill covid- patients promotes macrophage hyper-inflammatory responses date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tmszrtju for yet unknown reasons, severely ill covid- patients often become critically ill around the time of activation of adaptive immunity. here, we show that anti-spike igg from serum of severely ill covid- patients induces a hyper-inflammatory response by human macrophages, which subsequently breaks pulmonary endothelial barrier integrity and induces microvascular thrombosis. the excessive inflammatory capacity of this anti-spike igg is related to glycosylation changes in the igg fc tail. 
moreover, the hyper-inflammatory response induced by anti-spike igg can be specifically counteracted in vitro by use of the active component of fostamatinib, an fda- and ema-approved therapeutic small molecule inhibitor of syk. one sentence summary anti-spike igg promotes hyper-inflammation. coronavirus disease (covid- ) , which is caused by severe acute respiratory syndrome coronavirus (sars-cov- ), is characterized by mild flu-like symptoms in the majority of patients( , ). however, approximately % of the cases have more severe disease outcomes, with bilateral pneumonia that may rapidly deteriorate into acute respiratory distress syndrome (ards) and even death by respiratory failure. with high numbers of people being infected worldwide and very limited treatments available, safe and effective therapies for the most severe cases of covid- are urgently needed. remarkably, many of the covid- patients with severe disease show a dramatic worsening of the disease around - weeks after onset of symptoms ( , ) . this is suggested not to be a direct effect of viral infection, but instead to be caused by over-activation of the immune system, particularly because it coincides with the activation of adaptive immunity ( ) . this excessive immune response is frequently described as a 'cytokine storm', characterized by high levels of pro-inflammatory cytokines ( , ) . detailed assessment of the cytokine profile in severe cases of covid- indicates that some cytokines and chemokines are particularly elevated, such as il- , il- , and tnf ( ) ( ) ( ) . in contrast, type i and iii interferon (ifn) responses, which are critical for (early phase) anti-viral immunity, appear to be suppressed ( ) . combined, the high pro-inflammatory cytokines, known to induce collateral damage to tissues, together with muted anti-viral responses, indicate a highly unfavorable immune response in severe cases of covid- . 
previously, it has been shown that the virus leading to sars, sars-cov, causes severe inflammation and lung injury through igg antibodies ( ) . using a sars-cov macaque model, liu et al. demonstrated that the early emergence of anti-cov spike-protein iggs induces severe lung injury by skewing macrophage polarization away from wound-healing 'm ' characteristics towards a strong proinflammatory phenotype. notably, sars patients that eventually died from infection displayed similar conversion of these wound-healing lung macrophages, as well as the early and high presence of neutralizing igg antibodies. the pro-inflammatory consequences of these iggs could be suppressed by fc receptor (fcr) blockade. very similar to sars, severe covid- patients are characterized by an early rise and high titers of igg antibodies ( ) ( ) ( ) and show a similar conversion from anti-to proinflammatory lung macrophages ( ) . combined, these data hint towards the involvement of anti-spike igg in severe cases of covid- . therefore, in this study we explored the hypothesis that, similar to responses in sars, anti-spike antibodies drive excessive inflammation in severe cases of covid- . we assessed the effect of anti-spike antibodies from serum of critically ill covid- patients on human m macrophages. while different protocols are available for generating human m macrophages, our previous transcriptional analyses demonstrated that an m-csf and il- -induced monocyte differentiation protocol generates cells that most closely resemble primary human lung macrophages ( ) . since activation of immune cells by igg antibodies is known to require immune complex formation by binding of igg to its ligand ( , ) , we generated spike-igg immune complexes by incubating sars-cov- spike-coated wells with diluted serum from severely ill covid- patients (i.e. patients from the intensive care unit at the amsterdam umc) that tested positive for anti-sars-cov- igg. 
as shown in figure a , stimulation with spike protein alone did not induce cytokine production, while spike-igg immune complexes elicited small amounts il- β, il- , and tnf, but very high il- production by human macrophages. stimulation with the viral mimic polyic induced very little cytokine secretion. yet, since in the late phase of infection lung macrophages are simultaneously exposed to viral stimuli and anti-spike igg immune complexes, we also assessed the effect of the combination of these two stimuli. strikingly, combined stimulation of anti-spike igg immune complexes and polyic strongly amplified the production of covid- -associated pro-inflammatory cytokines il- β, il- , and tnf ( figure a) . induction of the anti-inflammatory cytokine il- was also increased, similar to what is observed in covid- patients ( ) . to verify the relevance of our findings for human lung macrophages, we assessed the response of primary alveolar macrophages that were obtained by bronchoalveolar lavage, which showed very similar responses ( figure b ). to assess whether this inflammatory response is really dependent on anti-spike igg antibodies, and not on other inflammatory components in serum of critically ill patients, we compared the effect of sera from intensive care lung disease patients that either ( ) did not have covid- , ( ) had covid- but were still negative for anti-spike igg, or ( ) had covid- and were positive for anti-spike igg (see table s for details). while serum of non-covid- patients and anti-spike igg-negative covid- patients showed no up-regulation of pro-inflammatory cytokines compared to individual polyic stimulation, il- β, il- , il- , and tnf production was strongly amplified by serum of covid- patients with anti-spike igg ( figure c ). 
substantiating these findings, rnaseq analysis of macrophages stimulated with sera from anti-spike igg positive covid- patients also showed clear induction of a pro-inflammatory gene program, as highlighted by induction of tnf, interleukins, chemokines, and macrophage differentiation factors ( figure d ). interestingly, ifn- and ifn- were also induced by anti-spike positive serum, while the classical downstream interferon response gene cxcl was reduced ( figure s a and s b), which may be related to reduced expression of ifn receptors ( figure s c ). in covid- patients, high anti-spike igg titers are associated with disease severity ( , ) . to determine whether anti-spike titers correlate with higher cytokine responses by human macrophages, we performed a principal component analysis (pca) of the combined cytokine production data for all samples, which upon overlaying with anti-spike igg titers (based on half maximal effective concentration ec ) indeed indicated that the hyper-inflammatory response of macrophages was associated with igg titers ( figure e ). this was subsequently confirmed by analyzing four serum samples with different titers using serial-step dilutions ( figure f ), which showed dose-dependent induction of pro-inflammatory cytokines ( figure g ). taken together, these data demonstrate that anti-spike igg immune complexes generated from serum of severely ill covid- patients induce a strong pro-inflammatory response by (otherwise immunosuppressive) human m macrophages, which is characterized by high production of classical cytokine storm mediators such as il- β, il- , il- , and tnf. in addition to excessive lung inflammation through a cytokine storm, severe covid- is characterized by pulmonary edema, following disruption of the microvascular endothelium ( ) , and by coagulopathy, which in many patients is characterized by pulmonary thrombosis ( ) . 
to test whether the excessive macrophage activation by anti-spike igg may contribute to pulmonary edema and thrombosis, we applied in vitro models for endothelial barrier integrity ( ) and in situ thrombosis ( ) using primary human pulmonary artery endothelial cells (hpaec), where thrombocytes can be added under flow conditions. while conditioned medium of macrophages that had been stimulated with only polyic induced a transient drop in endothelial barrier integrity, co-stimulation of macrophages with spike protein and serum of severe covid- patients induced long-lasting endothelial barrier disruption ( figure a ). in addition, during platelet perfusion we observed significantly increased platelet adhesion to endothelium exposed to conditioned medium of macrophages that had been costimulated with spike protein and serum ( figure b ). this effect was paralleled by an increase in von willebrand factor release from the endothelial cells ( figure c ), indicative for an active pro-coagulant state of the endothelium. these data indicate that anti-spike igg from severely ill covid- patients does not only induce hyperinflammation by macrophages, but also may contribute to permeabilization of pulmonary endothelium and microvascular thrombosis. in addition to the anti-spike antibodies from serum, we tested the effect of the recombinant anti-spike igg cova - , which we generated previously from b cells of covid- patients ( ) . as previously shown by others, the concentration of anti-sars-cov- igg in severe covid- patients peaks at an average of . μg/ml ( ) . to test the efficacy of our recombinant anti-spike igg, we stimulated macrophages with anti-spike immune complexes made with a high concentration (mimicking a serum concentration of μg/ml in our assay) of the recombinant antibody cova - . 
remarkably, the high concentration of recombinant anti-spike igg immune complexes elicited substantially less il- β, il- , and tnf than anti-spike immune complexes made from covid- serum ( figure a ). interestingly, we did not observe this difference for the induction of anti-inflammatory cytokine il- ( figure a ). these data suggest that the anti-spike igg in severe cases of covid- patients is intrinsically more pro-inflammatory than recombinant igg. one of the key characteristics that determines igg pathogenicity is glycosylation of the igg fc tail at position ( , ) . previously, we and others have shown that anti-spike igg of severe covid- patients has aberrant fucose and galactose expression, both compared to the total igg within these individual patients, as well as compared to anti-spike igg from mild or asymptomatic patients ( , ) . for a subset of covid- serum samples in the present study, the glycosylation pattern of anti-spike igg had been determined previously for the study of larsen et al. ( ) , which showed significantly decreased fucosylation and increased galactosylation of anti-spike igg compared to total igg within the tested patients (igg glycosylation of the sera used in this study is depicted in figure s a ). when plotting the percentage of anti-spike igg fucosylation against pro-inflammatory cytokine production, we observed an inverse correlation for igg fucosylation and production of il- β, il- , and tnf ( figure b ), while this was not seen for il- and il- ( figure s b ). to determine whether there is a causative role for these glycosylation changes in the induction of pro-inflammatory cytokines, we compared cytokine induction by regular cova - , to modified cova - that had low fucose and high galactose (glycosylation details in table s ). notably, cova - with low fucose and high galactose showed an increased capacity for amplification of pro-inflammatory cytokines ( figure c ). 
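the inverse relationship between anti-spike igg fucosylation and cytokine production is summarized by a pearson correlation coefficient. the following is a minimal sketch of that coefficient only; the function name and the example values are hypothetical and are not data from this study, and the associated p value and confidence bands are not computed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements (illustrative values only):
# percent fucosylation of anti-spike igg and cytokine level per serum.
fucosylation_percent = [98.0, 95.5, 93.0, 90.5, 88.0]
cytokine_level = [120.0, 180.0, 210.0, 260.0, 330.0]
print(f"r = {pearson_r(fucosylation_percent, cytokine_level):.3f}")
```

a negative coefficient indicates that lower fucosylation accompanies higher cytokine production in the sample set.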
these data indicate that anti-spike igg from covid- patients has an aberrant glycosylation pattern that makes these antibodies intrinsically more inflammatory than 'common' iggs by increasing their capacity to induce high amounts of pro-inflammatory cytokines. anti-spike igg from severely ill covid- patients promoted inflammatory cytokines, endothelial barrier disruption, and microvascular thrombosis, which are key phenomena in severely ill covid- patients that are thought to underlie the pathology. hence, counteracting this antibody-induced aberrant immune response could be of potential therapeutic interest. to determine how this antibody-induced inflammation could be counteracted, we first set out to investigate which receptors on human macrophages are activated by the anti-sars-cov- igg immune complexes. igg immune complexes can be recognized by fc gamma receptors (fcγrs), which include fcγri, fcγrii, and fcγriii ( ) . as shown in figure a , the human macrophage model used highly expressed all fcγrs. to determine whether fcγrs are involved in activation by anti-spike immune complexes, we blocked the different fcγrs with specific antibodies during stimulation, and analyzed cytokine production. as shown in figure b , all fcγrs contributed to anti-spike-induced cytokine induction, but the most pronounced inhibition was observed upon blockade of fcγrii. no inhibition was observed upon blocking of fc alpha receptor i (fcαri), suggesting that iga does not play a major role in the observed cytokine induction ( figure b ). fcγrs are known to induce signaling that critically depends on the kinase syk ( , ) . to determine whether we could counteract anti-spike-induced immune activation, we blocked syk using r , the active component of the small molecule inhibitor fostamatinib, an fda- and ema-approved drug for treatment of immune thrombocytopenia (itp) ( ) . 
strikingly, r almost completely blocked proinflammatory cytokine production induced by anti-spike igg from severe covid- patients ( figure c and d). importantly, inhibition by r appeared to be specific, since it selectively blocked anti-spike induced amplification of cytokines, but did not substantially affect cytokine production induced by polyic alone (figure c ). to assess the effects of inhibition by fostamatinib in greater detail, we analyzed the effects of r on macrophages stimulated with spike, covid- serum, and polyic by rnaseq. in total genes were suppressed by r treatment, while genes were induced (fdr< . , figure e ). in the downregulated genes many of the classical pro-inflammatory mediators were present, including tnf, il b, il and ccl . pathway analyses further showed no clear pathways in the upregulated genes, while suppressed genes were clearly linked to inflammatory pathways ( figure s a ). finally, gene set enrichment analysis (gsea) showed that genes associated with several pro-inflammatory pathways including il- signaling and tnf production and response, were significantly downregulated by r ( figure f ), while also response to type i ifn, fc-gamma receptor signaling, glycolysis, and platelet activation gene sets were suppressed ( figure s b ). these data demonstrate that the excessive inflammatory response by anti-spike igg from severely ill covid- patients can be counteracted by the syk inhibitor fostamatinib. in conclusion, our data show that anti-spike igg from serum of severely ill covid- patients strongly amplifies pro-inflammatory responses by human macrophages, and can contribute to subsequent endothelial barrier disruption and thrombosis. this may explain the observation that many covid- patients become critically ill around the time of activation of adaptive immune responses. in general, antibodies are beneficial for host defense by providing various mechanisms to counteract infections ( ) . 
these different effector functions of antibodies are modulated by antibody-intrinsic characteristics, such as isotype, subclass, allotype, and glycosylation ( ) . in severely ill covid- patients the glycosylation of anti-spike igg is changed, which we here show amplifies the igg effector function to promote pro-inflammatory cytokine production, of which covid- -associated cytokines such as il- β, il- , and tnf ( , ) are most pronounced. decreased igg fucosylation, as observed in severe cases of covid- , has so far only been observed in patients infected with hiv and dengue virus ( , ) , but may actually be a general phenomenon in the response to enveloped viruses ( ) . while decreased fucosylation increases the infection of cells by a process known as antibody-dependent enhancement (ade) ( ) , there is little evidence for antibody-enhanced infection in covid- ( ) . instead, our data show that increased pathology by de-fucosylated igg in covid- patients most likely results from excessive immune activation. the combination of decreased fucosylation and increased galactosylation of igg is known to particularly increase the affinity for fcγriii ( ) . while we show that fcγriii was partially responsible for anti-spike-induced inflammation, fcγrii contributed most, indicating that collaboration between multiple fcγrs is required for the hyper-inflammatory responses induced by the aberrant glycosylation of anti-spike igg. in addition to human alveolar macrophages, these fcγrs are expressed by various other myeloid immune cells ( ) , but also by airway epithelial cells ( ) , which are one of the main target cells of infection by sars-cov- and closely interact with activated macrophages ( ) . the observed hyper-inflammatory response induced by anti-spike igg from severe patients could be specifically counteracted by the syk inhibitor r , the active component of fostamatinib. 
notably, fostamatinib is an fda- and, recently, also ema-approved drug that is currently used for treatment of itp ( ) , which may facilitate repurposing for the treatment of severe covid- patients. a very recent study indicates that fostamatinib may also counteract acute lung injury by inhibiting mucin- expression on epithelial cells, suggesting that fostamatinib may target multiple pathways simultaneously ( ) . in addition to fostamatinib, other drugs that target key molecules in fcγr signaling could also be efficacious in counteracting anti-spike igg-induced inflammation in covid- patients. for example, the syk-dependent fcγr signaling pathway critically depends on the transcription factor irf ( , ) , which can be targeted using cell penetrating peptides ( ) . furthermore, fcγr stimulation is known to induce metabolic reprogramming of human macrophages ( ) , which is also observed in covid- patients ( ) , and therefore may provide additional targets for therapy. these findings may not only be valuable to find new ways to treat the most severely ill covid- patients, but may also have implications for the therapeutic use of convalescent serum, for which it may be wise to omit the deglycosylated iggs that are present in severely ill patients. table s for details). (f) binding elisa to determine the relative titer of anti-spike igg in sera of four severe covid- patients. od=optical density. (g) macrophages stimulated with spike protein and polyic were co-stimulated with different dilutions of serum from patients in panel f, after which il- production was determined after h. human pulmonary arterial endothelial cells were exposed to supernatant of macrophages that were unstimulated, or had been stimulated with polyic and spike protein, with or without covid- serum. endothelial barrier integrity was measured over time by measuring the resistance using electrical cell-substrate impedance sensing. (b) endothelium as stimulated under a. 
for h was perfused with platelets for minutes, after which the area covered by platelets was quantified. (c) flow supernatant was collected after perfusion under b, and vwf levels were measured with elisa. *p< . ; **p< . ; ***p< . ; ****p< . ; ns=not significant. macrophages stimulated with spike protein and polyic were co-stimulated with either x diluted serum from different anti-spike positive covid- patients (black dots), or with recombinant anti-spike antibody cova - (red dot). cytokine production was measured after h. (b) correlation graphs of fucosylation percentages of anti-spike igg from covid- serum against cytokine production of macrophages after stimulation as in panel a. the pearson correlation coefficient (r) and p value are illustrated on each graph with % confidence bands of the best-fit line. (c). macrophages stimulated with spike protein were co-stimulated with (combinations of) polyic, cova - (recombinant anti-spike igg ), or cova - that had been modified to express low fucose and high galactose. il- production was measured after h. representative donor (mean+stdev) of independent experiments. membrane expression of fcγri, ii, and iii by human macrophages was determined by flow cytometry. (b) fcγri, ii, and/or iii, or fcαri were blocked by specific antibodies, after which macrophages were stimulated with spike, covid- serum, polyic, or a combination. il- production was measured after h. triplicate values from a representative experiment with serum from three different covid- patients and two different macrophage donors (mean+stdev). (c/d). macrophages were pre-incubated with syk inhibitor r , after which cells were stimulated as in b. cytokine production was measured after h. panel c shows a representative donor (mean+stdev), panel d shows the response of macrophages stimulated with spike, polyic, and covid- serum, with or without pre-incubation with r . every pair of dots represents cytokine production after h by a different serum donor. ***p< . 
; ****p< . . (e) volcano plot depicting up-and down-regulated genes when comparing macrophages stimulated for h with spike, polyic, and serum to the same stimulation in the presence of r . fdr=false discovery rate. (f) gene set enrichment analysis (gsea) of curated gene sets suppressed by r : interleukin- -mediated signaling pathway (go: ), tnf production (go: ), response to tnf (go: ). nes stands for normalized enrichment score and adj. p represents the bh-adjusted p value. substrate for fucosylation, -deoxy- -fluoro-l-fucose ( ff) (carbosynth, md ) was added one hour prior to transfection. to produce a cova - variant with elevated galactosylation, f cells were co-transfected ( % of total dna) with a plasmid expressing beta- , -galactosyltransferase (b galt ). in addition mm d-galactose was added h before transfection. antibodies were purified with protein g affinity chromatography as previously described ( ) and stored in pbs at °c . to determine the glycosylation of cova - , aliquots of the mab samples ( µg) were subjected to acid denaturation ( mm formic acid, minutes), followed by vacuum centrifugation. subsequently, samples were trypsinized, and fc glycopeptides were measured as described previously ( ) . relative abundances of fc glycopeptides were determined, and levels of bisection, fucosylation, galactosylation and sialylation were determined as described before ( ) . cells ( , /well) were stimulated in a pre-coated plates as described above in combination with µg/ml polyic (sigma-aldrich). to block syk, cells were pre-incubated with . μm r (selleckchem) or dmso as a control, for minutes at °c. to block the different fcrs, cells were pre-incubated with μg/ml of the following antibodies: (anti-fcyri (cd ; . ; bd bioscience); anti-fcyriia (cd a; iv. ; stemcell technologies); anti-fcyriii (cd ; g ; bd bioscience) and anti-fcαri (cd ; mip a; abcam)) for minutes at °c. then media was added to get a final antibody concentration of μg/ml. 
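the figure legends above report a false discovery rate (fdr) threshold and bh-adjusted p values. the following is a minimal sketch of how such adjusted values are obtained, assuming bh denotes the benjamini-hochberg procedure; the function name and the example p values are hypothetical and are not taken from this study or from the software the authors used.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four gene sets (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, bh_adjust(raw)):
    print(f"p={p:.3f} adjusted={q:.3f}")
```

a gene or gene set is then called significant when its adjusted value falls below the chosen fdr threshold.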
paec passage - were seeded : in . % gelatin coated -well ( w e) or -well ( w idf pet) ibidi culture slides for electrical cell-substrate impedance sensing (ecis), as previously described ( ) . cells were maintained in culture in endothelial cell medium (ecm, sciencell, # ) supplemented with % penicillin/streptomycin, % endothelial cell growth supplement (ecgs), % fbs and % nonessential amino acids (neaa, biowest, #x - ), with medium change every other day. from seeding onwards, electrical impedance was measured at hz every minutes. cells were grown to confluence and after h, ecm medium was removed and replaced by supernatant of alveolar macrophage-like monocyte-derived macrophages stimulated for h as described above with polyic, or in combination with patient serum. within every experiment triplicate measurements were performed for each condition. for every experiment paecs and macrophages obtained from different donors were used. paecs were seeded in . % gelatin coated µ-slide vi . ibitreat flow slides (ibidi, # ) and cultured for days. paecs were pre-incubated for h with supernatant of alveolar macrophage-like monocytederived macrophages stimulated for h as described above with polyic, or in combination with patients serum before flow experiments were performed. on the day of perfusion, citrated blood was collected from healthy volunteers and platelets were isolated as previously described ( ) . platelets were perfused for minutes and phase-contrast and fluorescent images were taken with an etaluma ls microscope using a x phase-contrast objective. platelet adhesion was quantified in imagej by determining the area covered by platelets per field of view (fov). to determine cytokine production, supernatants were harvested after h of stimulation and cytokines were detected using the following antibody pairs: il- β and il- (u-cytech biosciences); tnf (ebioscience); and il- (invitrogen). 
to determine the concentration of anti-spike antibodies present in patient serum, spike protein was coated directly on -well plates at µg/ml overnight to determine serum binding using serial dilutions as previously described ( ) . to detect the vwf levels, flow supernatant was collected after perfusion and vwf levels were measured with elisa. an -well high affinity elisa plate was coated with polyclonal anti-vwf ( : , dako, #a ) and blocked with % bsa. samples were loaded and bound vwf was detected with hrp-conjugated rabbit polyclonal anti-vwf ( : , dako, #a ). normal plasma with a stock concentration of nm vwf (gifted from sanquin) was used as a standard for determination of vwf levels, measured at nm and nm. meso scale discovery multiplex assay v-plex custom human cytokine -plex kits for proinflammatory panel and chemokine panel (k a h- , for il- β, il- , il- , il- , tnf, ccl , cxcl ) and u-plex human interferon combo sector (k k- , for ifn-α a, ifn-β, ifn-γ, ifn-λ ) were purchased from mesoscale discovery (msd). the lyophilized cocktail mix calibrators for proinflammatory panel , chemokine panel , and calibrators for u-plex biomarker group (calibrator , , , ) were reconstituted in the provided assay diluents. u-plex plates were coated with the supplied linkers and biotinylated capture antibodies according to the manufacturer's instructions. proinflammatory cytokines and chemokines in supernatant collected at hours after stimulation were detected with pre-coated v-plex plates, while interferons in -hour supernatant were measured with coated u-plex plates. the assays were performed according to the manufacturer's protocol with overnight incubation of the diluted samples and standards at °c. the electrochemiluminescence (ecl) signal was detected by a meso quickplex sq plate reader (msd) and analyzed with discovery workbench software (v . , msd). 
the concentration of each sample was calculated based on the four-parameter logistic fitting model generated with the standards (concentration was determined according to the certificate of analysis provided by msd). log values of the measured levels of il- β, il- , il- , il- , tnf, ifn-β, ifn-γ, cxcl were used for the principal component analysis. log igg titers (half maximal effective concentration, ec ) were used for the color overlay. cells were stimulated as described above and lysed after hours. total rna was isolated with the rneasy mini kit (qiagen) and rnase-free dnase set (qiagen) per the manufacturer's protocol. cdna libraries were prepared using the standard protocol of kapa mrna hyperprep kits (roche) with input of ng rna per sample. size-selected cdna libraries were pooled and sequenced on a hiseq sequencer (illumina) to a depth of - m per sample according to the bp single-end protocol at the amsterdam university medical centers, location vrije universiteit medical center. raw fastq files were aligned to the human genome grch by star (v . . b) with default settings ( ) . indexed binary alignment map (bam) files were generated and filtered on mapq> with samtools (v . . ) ( ) . raw tag counts and reads per kilobase per million mapped reads (rpkm) per gene were calculated using homer's analyzerepeats.pl script with default settings and the -noadj or -rpkm options for raw counts and rpkm reporting ( ) for further analyses. after detachment, macrophages were stained with antibodies against fc gamma receptors: fcγri (cd ; cat# , biolegend), fcγrii (cd ; cat# , bd bioscience), and fcγriii (cd ; cat# , bd bioscience). fluorescence was measured with a cytoflex flow cytometer and analyzed with flowjo software version . . (flowjo, llc, ashland, or). fluorescence minus one (fmo) controls were used for each staining as negative controls. all data generated during this study are available within the paper. the rnaseq data were deposited on gene expression omnibus. 
source data are provided with this paper. statistical significance of the data was performed in graphpad prism version (graphpad software). correlation analysis: for the fucosylation and cytokine correlation analysis, pearson's correlation was performed for paired fucosylation percentage and measured cytokine concentration. for fig. c fig. an ordinary one-way anova was performed and corrected for tukey's comparisons test (fig a) or sidak's multiple comparison test ( fig. b and c) . for fig. d and fig. s a ratio paired t-test were used. all analyses were performed in the r statistical environment (v . . ). differential expression was assessed using the bioconductor package edger (v . . ) ( ) . lowly expressed genes were filtered with the filterbyexpr function and gene expression called differential with a false discovery rate (fdr) < . . pathway enrichment analyses were performed on the differentially regulated genes with an absolute log (fold change) higher than using the metascape (http://metascape.org/gp/index.html) macrophages stimulated with spike protein and polyic were co-stimulated with serum from icu lung patients that either did not have covid- , had covid- but were negative for anti-spike igg, or had covid- and were positive for anti-spike igg. every dot represents cytokine production after h (ifnβ, ifn-γ) or h (cxcl ) by a different serum donor (mean+sem). grey line indicates cytokine production induced by polyic-spike. statistics were made with brown-forsythe and welch's anova test and corrected by dunnett t test for multiple test correction (sample size: covid -: , covid + anti-spike-: , covid + anti-spike+: ). asterisks indicate significance: *p < . ; **p < . ; ****p < . ; ns=not significant. (c) heatmap showing scaled log expression level of ifn receptors assessed by rnaseq upon h stimulation of human macrophages with polyic, with or without spike protein and serum from five covid- patients that tested positive for anti-spike igg. table s . 
glycosylation of cova - iggs. levels of fucose, galactose, sialic acid, and bisection of cova - glyco-variants was measured as described previously ( ) . mild or moderate covid- the trinity of covid- : immunity, inflammation and intervention pathological inflammation in patients with covid- : a key role for monocytes and macrophages cytokine storms: understanding covid- . immunity imbalanced host response to sars-cov- drives development of covid- complex immune dysregulation in covid- patients with severe respiratory failure elevated levels of il- and crp predict the need for mechanical ventilation in covid- impaired type i interferon activity and exacerbated inflammatory responses in severe covid- patients. medrxiv anti-spike igg causes severe acute lung injury by skewing macrophage responses during acute sars-cov infection antibody responses to sars-cov- in patients with covid- profile of igg and igm antibodies against severe acute respiratory syndrome coronavirus (sars-cov- ) temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by sars-cov- : an observational cohort study single-cell landscape of bronchoalveolar immune cells in patients with covid- meta-analysis of in vitro-differentiated macrophages identifies transcriptomic signatures that classify disease macrophages in vivo active control of mucosal tolerance and inflammation by human iga and igg antibodies control of cytokine production by human fc gamma receptors: implications for pathogen defense and autoimmunity clinical features of patients infected with novel coronavirus in wuhan acute lung injury in patients with covid- infection covid- update: covid- -associated coagulopathy bosutinib prevents vascular leakage by reducing focal adhesion turnover and reinforcing junctional integrity in vitro microfluidic disease model to study whole blood-endothelial interactions and blood clot dynamics in real-time potent neutralizing antibodies from 
covid- patients define multiple targets of vulnerability covid- diagnosis and study of serum sars-cov- specific iga, igm and igg by chemiluminescence immunoanalysis. medrxiv igg subclasses and allotypes: from structure to effector functions the immunoregulatory roles of antibody glycosylation afucosylated immunoglobulin g responses are a hallmark of enveloped virus infections and show an exacerbated phenotype in covid- . biorxiv symptomatic sars-cov- infections display specific igg fc structures. medrxiv fcgammar-tlr cross-talk enhances tnf production by human monocyte-derived dcs via irf -dependent gene transcription and glycolytic reprogramming fc gamma receptor-tlr cross-talk elicits pro-inflammatory cytokine production by human m macrophages fostamatinib for the treatment of chronic immune thrombocytopenia the function of fcgamma receptors in dendritic cells and macrophages an inflammatory cytokine signature helps predict covid- severity and death. medrxiv natural variation in fc glycosylation of hiv-specific antibodies impacts antiviral activity igg antibodies to dengue enhanced for fcgammariiia binding determine disease severity dissecting antibody-mediated protection against sars-cov- fcgammariii stimulation breaks the tolerance of human nasal epithelial cells to bacteria through cross-talk with tlr covid- severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis a high content screen for mucin- -reducing compounds identifies fostamatinib as a candidate for rapid repurposing for acute lung injury during the covid- pandemic. 
biorxiv inhibition of irf cellular activity with cell-penetrating peptides that target homodimerization proteomic and metabolomic characterization of covid- multi-level glyco-engineering techniques to generate igg with defined fcglycans stasis promotes erythrocyte adhesion to von willebrand factor star: ultrafast universal rna-seq aligner the sequence alignment/map format and samtools moderated estimation of fold change and dispersion for rnaseq data with deseq differential expression analysis of multifactor rna-seq experiments with respect to biological variation metascape provides a biologist-oriented resource for the analysis of systemslevel datasets fast gene set enrichment analysis. biorxiv mapping identifiers for the integration of genomic datasets with the r/bioconductor package biomart buffy coats from healthy anonymous donors were acquired from the sanquin blood supply in amsterdam, the netherlands. all the subjects provided written informed consent prior to donation to sanquin. monocytes were isolated from the buffy coats through density centrifugation using lymphoprep™ (axis-shield) followed by human cd magnetic beads purification with the macs ® cell separation columns (miltenyi) as previously described ( ) . the resulting monocytes were seeded on tissue culture plates and subsequently differentiated to macrophages for days in the presence of ng/ml human m-csf (miltenyi) with iscove's modified dulbecco's medium (imdm, lonza) containing % fetal bovine serum (fbs) (biowest) and µg/ml gentamicin (gibco). the medium was renewed on the third day. after -day differentiation, the medium was replaced by culture medium without m-csf and supplemented with ng/ml il (r&d) for hours to generate alveolar macrophage-like monocyte-derived macrophages. 
these macrophages were then detached with key: cord- -hp s ebh authors: petráš, marek; lesný, petr; musil, jan; limberková, radomíra; pátíková, alžběta; jirsa, milan; krsek, daniel; březovský, pavel; koladiya, abhishek; vaníková, Šárka; macková, barbora; jírová, dagmar; krijt, matyáš; králová lesná, ivana; adámková, věra title: early immune response in mice immunized with a semi-split inactivated vaccine against sars-cov- containing s protein-free particles and subunit s protein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: hp s ebh the development of a vaccine against covid- is a hot topic for many research laboratories all over the world. our aim was to design a semi-split inactivated vaccine offering a wide range of multi-epitope determinants important for the immune system including not only the spike (s) protein but also the envelope, membrane and nucleocapsid proteins. we designed a semi-split vaccine prototype consisting of s protein-depleted viral particles and free s protein. next, we investigated its immunogenic potential in balb/c mice. the animals were immunized intradermally or intramuscularly with the dose adjusted with buffer or addition of aluminum hydroxide, respectively. the antibody response was evaluated by plasma analysis at days after the first or second dose. the immune cell response was studied by flow cytometry analysis of splenocytes. the data showed a very early onset of both s protein-specific antibodies and virus-neutralizing antibodies at % inhibition regardless of the route of vaccine administration. however, significantly higher levels of neutralizing antibodies were detected in the intradermally (geometric mean titer - gmt of . ± . ) than in the intramuscularly immunized mice (gmt of . ± . ). in accordance with this, stimulation of cellular immunity by the semi-split vaccine was suggested by elevated levels of b and t lymphocyte subpopulations in the murine spleens. 
these responses were more predominant in the intradermally immunized mice compared with the intramuscular route of administration. the upward trend in the levels of plasmablasts, memory b cells, th and th lymphocytes, including follicular helper t cells, was confirmed even in mice receiving the vaccine intradermally at a dose of . μg. we demonstrated that the semi-split vaccine is capable of eliciting both humoral and cellular immunity early after vaccination. our prototype thus represents a promising step toward the development of an efficient anti-covid- vaccine for human use. the covid pandemic has made the development of a vaccine an emergency priority. hectic research was enabled by recent major advances in sequencing, protein structure applied into the right caudal thigh muscle, the intradermal one was applied into the auricle under the guidance of a microscope. one week after the first or second dose, the mice were anesthetized with isoflurane and blood was intracardially withdrawn using a ml syringe, transferred to anticoagulant tubes (k edta), and mixed. whole blood was centrifuged ( rpm, minutes, °c), plasma aliquoted into µl microtubes and frozen at - ° c. the spleens were collected in chilled rpmi medium on ice and transported to laboratory within hours for flow cytometry examination. anti-s specific igg antibodies as well as virus-neutralizing antibodies were expressed as the titer. the value had to be log-transformed to pass a normality test (d'agostino & pearson or shapiro-wilk test) followed by a parametric t-test or analysis of variance (anova). as the lymphocyte populations were measured from pooled spleens of each group, the traditional statistical approach could not be employed; hence, the change in lymphocyte subsets using linear regression was determined. if the slope of lines exhibited values significantly different from null, then the observed change was considered to be confirmed. 
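The slope test described above (a change in a lymphocyte subset counts as confirmed when the fitted line has a slope significantly different from zero) can be illustrated with a minimal sketch. It is not the authors' analysis, which was run in Prism and Stata; the function name and the example numbers are hypothetical. It fits an ordinary least-squares line and returns the slope, its standard error, and the t statistic. The p value would then be read from a t distribution with n - 2 degrees of freedom, a step left out here because the Python standard library has no t distribution.

```python
import math


def slope_t_statistic(x, y):
    """Fit y = a + b*x by least squares; return (slope, std_error, t_statistic).

    The t statistic tests the null hypothesis that the slope is zero and
    has n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(sse / (n - 2) / sxx)
    t_stat = math.inf if std_error == 0 else slope / std_error
    return slope, std_error, t_stat


# Hypothetical example: number of doses vs. proportion of a cell subset.
doses = [0, 1, 2, 3]
subset_percent = [1.0, 3.0, 2.0, 4.0]
slope, std_error, t_stat = slope_t_statistic(doses, subset_percent)
```

With real data, the x values would be the number of doses (or the dose level) and the y values the pooled subset proportions for each group.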
merged lymphocyte subpopulations were analyzed with parametric tests after log- transformation to pass a normality test (shapiro-wilk test). the power of each test was insufficient since the sample size was small. a > % power of the test was only achieved when comparing the geometric mean titers of antibodies between immunized and unimmunized mice. the correlation between igg antibodies and virus-neutralizing antibodies was assessed with the pearson correlation coefficient after the log-transformation of their values. continuous data were summarized using standard descriptive statistics, i.e., median including range or geometric mean with standard deviation or % confidence interval. all tests were two-tailed, and the level of significance was set at . . statistical analyses were performed using prism (graphpad software, inc., san diego, ca) and stata version software (statcorp, college station, tx). results course of the culture, inactivation and purification process to obtain a viral stock for vaccine production, the virus strain was first purified by passages in vero e cells. subsequently, another passages were performed to generate a sufficient stock of virus for the production of the master seed as the source of a virus bank for vaccine production through the seed lot system. growth kinetics analysis of the second passage showed a sufficient virus replication rate reaching a titer of . - . log tcid /ml in the supernatant within - days ( figure ). the peak titers of . and . log tcid /ml from the supernatant and lysate, respectively, were at hours post-infection. moreover, multiplicities of infection (moi) of . - . at a culture temperature of °c and % co were confirmed. confluent monolayers of vero e cells, grown in optipro serum-free medium (life technologies europe bv, the netherlands) with % fetal bovine serum (fbs), were infected by the strain from the second passage of the stock and incubated in a serum-free medium at °c and % co for hours. 
the first viral suspension of ml was obtained by centrifugation of the supernatant harvest at g for minutes. the second suspension was harvested from infected cells by adding approx. ml of medium, overnight freezing and thawing at - °c followed by centrifugation under the same conditions. both live viral suspensions containing viral particles with typical corona spikes as documented by electron-microscopic inspection ( figure a ) were stored at °c for at least days. the suspensions were subsequently inactivated by adding beta-propiolactone diluted : , , and continuously shaken at a temperature of °c for hours. the beta-propiolactone hydrolysis at °c lasted hours. inactivated suspensions were centrifuged and the supernatants were stored at °c for one week. the second inactivation was performed with beta-propiolactone diluted : , at °c for hours. after that, the pellets were centrifuged at , g for minutes and both inactivated suspensions were subsequently placed in water baths to hydrolyze beta-propiolactone with continuous stirring at °c for hours with the temperature gradually decreasing to °c overnight. the inactivated viral suspension contained . μg/ml of the s protein for the supernatant harvest and . μg/ml for the lysate harvest. while the first inactivation did not influence the virion structure, the second one showed partial changes, i.e., complete and incomplete particles with disrupted s protein ( figure b and c). the inactivation including hydrolysis decreased the ph to . - . , which disrupted s protein binding to the viral envelope as documented by electron microscopy ( figure c). both suspensions were stored at °c for days, followed by vaccine purification and thickening to obtain μg/μl for the harvested supernatant and μg/μl for the harvested lysate. 
this was achieved by a multiple ultra/diafiltration process utilizing amicon ® ultra centrifugal filter units (amicon ultracel) with a volume of ml and a membrane molecular weight cutoff (mwco) of kda (amicon ® ultra- centrifugal filter unit - kda cutoff, merck millipore ltd., darmstadt, germany). the suspensions were centrifuged several times in amicon ultracel at , g for at least minutes to exchange the culture medium and to remove cellular debris and low-molecular substances. the suspension was washed several times with phosphate buffer saline (pbs) of ph . . within a dilution range of - to - . the result of purification and thickening were suspensions containing both virion particles free of the spike and separated s protein as documented by electron microscopy ( figure d ). the vaccine dose for the test of immunogenicity in mice was adjusted in pbs to concentrations of . significantly increasing proportions not only of follicular th lymphocytes but also th and th lymphocytes with an increasing number of doses ( figure a ). in addition, the same effect was observed in these mice for subsets of plasmablasts and memory b cells. discussion the above laboratory procedure generated a semi-split inactivated vaccine, i.e., a vaccine with the s protein separated from the viral particle exhibiting an early, both humoral and cellular, immune response. the design of this prototype vaccine was based on the usual procedures [tang , spruth , gao ]. we selected a strain showing higher infectivity with lower virulence, i.e., one that showed signs of attenuation. overall, we have tested several candidate strains for both immunogenicity through convalescent plasma and propagation capacity by viral titration. viral growth kinetics analysis was in line with the published data [gao , manenti . furthermore, we devised a technique to increase the virus concentration in suspension about to , -fold (note: this finding requires validation). 
to inactivate the virus properly, beta-propiolactone was chosen because it can alkylate the viral genome and keeps the virus capable of inducing a protective immune response [delrue ] . the process of viral suspension double inactivation was designed in accordance with the procedures of other vaccine manufacturers [pu , xia . design of a multiepitope-based peptide vaccine against the e protein of human covid- : an immunoinformatics approach guidelines for the use of flow cytometry and cell sorting in immunological studies (second edition) inactivated virus vaccines from chemistry to prophylaxis: merits, risks and challenges safety and immunogenicity of the chadox ncov- vaccine against sars-cov- : a preliminary report of a phase / , single-blind, randomised controlled trial development of an inactivated vaccine candidate for sars-cov- an in-silico approach to develop of a multi-epitope vaccine candidate against sars-cov- envelope (e) protein. res sq cov- cell entry depends on ace and tmprss and is blocked by a beigel jh; mrna- study group vaccine against sars-cov- -preliminary report evaluation of sars-cov- neutralizing antibodies using a cpe- based colorimetric live virus micro-neutralization assay in human serum samples epub ahead of print rna vaccine bnt b in adults immune and bioinformatics identification of t cell and b cell epitopes in the protein structure of sars-cov- : a systematic review an in-depth investigation of the safety and immunogenicity of an inactivated sars-cov- vaccine determination of % endpoint titer using a simple formula convergent antibody responses to sars-cov- in convalescent individuals concurrent human antibody and th type t-cell responses elicited by a covid- rna vaccine structural insight into the role of novel sars-cov- e protein: a potential target for vaccine development and other therapeutic strategies a double-inactivated whole virus candidate sars coronavirus vaccine stimulates neutralising and protective antibody responses. 
vaccine preparation, characterization and preliminary in vivo studies of inactivated sars-cov vaccine immunization with sars coronavirus vaccines leads to pulmonary immunopathology on challenge with the sars virus rna-based covid- vaccine bnt b selected for a pivotal efficacy study. medrxiv inactivated vaccine against sars-cov- on safety and immunogenicity outcomes: interim analysis of randomized clinical trials immunogenicity and safety of a sars-cov- double-blind, and placebo-controlled phase clinical trial augmentation of immune responses to sars coronavirus by a combination of dna and whole killed virus vaccines. vaccine immunogenicity and safety of a recombinant adenovirus type- -vectored covid- vaccine in healthy adults safety, tolerability, and immunogenicity of a recombinant adenovirus type- vectored covid- vaccine: a dose-escalation non-randomised, first-in-human trial key: cord- -ivbf kuq authors: faryami, ahmad; harris, carolyn a title: open source d printed ventilation device date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ivbf kuq covid- is an acute respiratory tract infection caused by a coronavirus known as sars-cov- . the common signs of infection include respiratory symptoms such as shortness of breath, breathing difficulties, dry cough, fever, and in some patients, severe acute respiratory syndrome, kidney failure, and death. , deaths from covid- has been reported as of today. while respiratory symptoms are commonly caused by the infection, the use of mechanical ventilation is required for some patients. the following is intended to review the development and testing of a d printed and open-source mechanical ventilation device that is capable of adjusting breathing rate, volume, and pressure simultaneously and was designed according to the latest clinical observations of the current pandemic. 
the intuitive design of this device along with the use of primarily d printed or readily available components allows the rapid manufacturing and transportation of this ventilation device to the impacted regions. open-source medical technologies have been developed for testing or experimental use in the past, but the global pandemic has elevated the importance of open-source technologies and medical equipment to rapidly expand the capacity of the healthcare system . this open-source positive pressure ventilation device (osppvd) is a response to the shortages in the hospitals' ventilation capacity, which has been reported to be essential to many covid- patients throughout the world. osppvd has been designed according to health care professionals' descriptions and observations of covid- infection instead of recreating already existing technologies , , . however, it is not intended to replace or negate the importance of medications or of the care provided by healthcare professionals. the open-source ventilation device is intended to assist in increasing the capacity of healthcare centers or to give healthcare providers the tools necessary to provide effective care to patients . this device operates based on the changes in hydrostatic pressure within three partially empty and interconnected containers. this is achieved through lowering and raising one of the containers relative to the other two stationary containers. at equilibrium, the height of the water column is equal in all three containers ( figure ). the change in the height of the actuating container results in a change in hydrostatic pressure and displacement of air and water ( figure - ). the flow of water from one container to the other two containers results in the displacement of air transferred to the patient through a series of tubes and one-way check valves. 
in this setup, two of the containers were secured at a set height while the other container was attached to an electric motor and was lowered or raised relative to the other containers. this setup is particularly useful since it does not require traditional compression equipment such as pistons or precision parts such as gearboxes, and the low number of moving parts makes the equipment relatively easy to sterilize and reuse between patients. a geared dc motor (amazon, seattle, wa, usa) rated at rpm at a v input was used in this setup, with each rotation of the motor corresponding to one cycle of inhalation and exhalation. the input voltage to the motor was adjusted via a dc motor speed controller (onyehn, seattle, wa, usa) in order to achieve the required respiration rate. a neiko a laser tachometer (amazon, seattle, wa, usa) was used to measure and record the revolution rate. a d printed motor housing was designed to retain the motor on a tripod stand mounted on the wall using two screws as illustrated in figure . importantly, the height regulator translates the rotational output of the electric motor into linear motion that raises and lowers the container. since the relative height of the containers is the variable factor of hydrostatic pressure in this setup, controlling the height difference between the containers determines the set inhalation and exhalation pressure. tidal volume is determined by the volume of water displaced during each cycle as a function of the height difference. the height regulator is d printed in two parts so that most printers with a building surface of at least cm in any dimension can manufacture the parts. finally, a shut-off valve was incorporated in the device to adjust tidal volume and to avoid overdistention of normal alveoli and volume-induced trauma. 
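The relation between container height difference and delivered pressure described above can be sketched with the standard hydrostatic formula P = rho * g * h. This is an illustrative calculation, not the authors' code: the function names, the assumed water density, and the example heights are hypothetical, and the result ignores losses in the tubing and valves.

```python
# Sketch of the standard hydrostatic relation P = rho * g * h.
# Density and example heights are illustrative assumptions, not values
# taken from the paper.

G = 9.80665             # standard gravity, m/s^2
RHO_WATER = 997.0       # kg/m^3, approximate density of water near room temperature (assumed)
PA_PER_CMH2O = 98.0665  # conventional definition of 1 cmH2O in pascals


def hydrostatic_pressure_pa(height_diff_cm, density=RHO_WATER):
    """Pressure in pascals produced by a liquid column of the given height difference."""
    return density * G * (height_diff_cm / 100.0)


def hydrostatic_pressure_cmh2o(height_diff_cm, density=RHO_WATER):
    """Same pressure expressed in cmH2O, the unit commonly used for ventilation."""
    return hydrostatic_pressure_pa(height_diff_cm, density) / PA_PER_CMH2O


# Hypothetical example: a 20 cm height difference between containers.
example_pressure = hydrostatic_pressure_cmh2o(20.0)
```

Because 1 cmH2O is defined from a water column, the pressure in cmH2O is numerically close to the height difference in centimeters, which is why the height setting can double as a rough pressure indicator when the density of the liquid is known.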
although the inherent advantages of a rigid metal frame with linear rails were initially put into consideration for this device, a d printed pulley system was eventually constructed instead, to reduce the overall number of all the required parts, cost and to improve the ease of storage and transport. a d printed pulley system was constructed using paracord, a mm skateboard ball bearing, and d printed parts. around three meters of readily available paracord was used in this setup. different types of tubing were utilized in the construction of this project ( table ). the tubing is used to transfer water or air. four mm od check valves were used in this setup to carry oxygenated air to the patient and carry the expired air away. the tubing that was used to interconnect the containers has a wider inner diameter to allow equilibrium to be reached rapidly between the containers. all parts were printed on an anycubic mega (amazon, seattle, wa, usa) using overture . mm polylactic acid filament (amazon, seattle, wa, usa). a . mm nozzle was used for printing the files instead of . mm nozzle to reduce the total printing time ( table ). all the computer-aided design files (cad) were made using autodesk inventor software and all the parts were printed without any additional supports, or brim. the d printed parts ( figure ) were color-coordinated with schematic diagrams (figure - ) . the d printed containers were specifically designed to be watertight and the internal walls were optimized for fused deposition manufacturing process to avoid water or air leakage. the inlet and outlet joints were designed at a reasonable tolerance to be printable with almost all consumer-grade d printers while eliminating the need for adhesives in order to further minimize the chance of leakage while in operation and reduce assembly time. in this experiment, the osppvd was operated over a -hour period to evaluate the device performance as well as measuring the output pressure. 
all the pressure measurements were made using the hti-xintai dual-port handheld digital manometer (amazon, seattle, wa, usa). all pressure measurements were recorded by a digital camera at frames per second and the pressure data from the recording were extracted through processing the video at a frame by frame basis using free video to jpg converter software. the systematic output is a direct result of precision and accuracy of the achieved height by the actuating container. while the use of a pulley system instead of a solid frame was determined as a cost reductive measure and significantly increased the mobility of the setup, the lack of a rigid framework also increased the difference between the set and measured height ( table ) . despite the micro-gaps that occur during the additive manufacturing process, no considerable leakage was observed during the experiment. though other polymers such as high impact polystyrene (hips) or polyethylene terephthalate (petg), are better suited for this application, the ease of use and ubiquity of polylactic acid filament allowed rapid prototyping and iterating of this device. optimization of printing speed and quality show exact printing settings vary between different makes and models (table ) . while the steady increase in pressure observed by a gradual pressurization is best represented by a sinusoidal wave, the handheld digital manometer was not capable of providing a sufficient passage for the flow of large volume air displaced to measure pressure changes through a cycle. however, the pressure achieved at each height was measured and the expected pressure was calculated using the hydrostatic pressure formula. regular tap water at room temperature was used in this experiment. while a maximum of ± . 
variance between the height set by the height regulator and the pressure measured by the digital manometer was observed during this experiment, the height regulator may also be used as an indicator for the peak pressure, assuming that the density of the medium is known: $p = h \times \rho$, where $p$ is the peak pressure, $h$ is the height difference between the containers, and $\rho$ is the density of the medium at . °c. the slight difference between the measured and expected pressure is within the range of difference observed between the expected and measured traveled distance. paracord is a readily available and cost-effective material for building the pulley system, especially due to its high tensile strength. however, it was observed to stretch under load, which results in a deviation from the expected height and pressure. a sturdier material may be more suitable for this application. clinical observations indicate that maintaining a positive end-expiratory pressure (peep) is crucial in avoiding alveolar collapse among covid- patients. maintaining peep can be achieved through a secondary source of compressed air and a pressure-regulated shut-off valve. however, this setup requires further studies and is currently under development , . the steady increase and decrease in pressure are best fit by a sinusoidal curve that is similar to the breathing pattern observed in human adults. the gradual decrease in pressure could be beneficial in avoiding pressure-related injury that may result in scar formation and fibrosis. according to the initial reports published by healthcare providers, the loss of compliance due to the onset of fibrosis in the lungs is observed in patients diagnosed with severe cases of covid- infection . an advantage of osppvd presented here is its capability to perform hyperventilation with high flow rates under relatively wide pressure range from that would allow the delivery of oxygen without the over-expansion of the lungs while being able to supply pressurized air as high as or even though the current setup is designed to only reach .
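the hydrostatic relation used above to derive the expected pressure from the set height can be sketched in a few lines of python. this is only an illustrative sketch: the density of water, the value of standard gravity, and the pascal-to-cmh2o conversion factor are assumptions supplied here rather than values reported in the text, and the function names are hypothetical.

```python
# assumed constants (not taken from the source text)
WATER_DENSITY_KG_M3 = 997.0   # tap water near room temperature
GRAVITY_M_S2 = 9.80665        # standard gravity
PA_PER_CMH2O = 98.0665        # conventional definition of 1 cmH2O in pascals


def hydrostatic_pressure_pa(height_cm, density=WATER_DENSITY_KG_M3):
    """Pressure in pascals produced by a liquid column of the given height."""
    return density * GRAVITY_M_S2 * (height_cm / 100.0)


def hydrostatic_pressure_cmh2o(height_cm, density=WATER_DENSITY_KG_M3):
    """Same pressure expressed in cmH2O, the unit commonly used for ventilation."""
    return hydrostatic_pressure_pa(height_cm, density) / PA_PER_CMH2O


def percent_deviation(measured, expected):
    """Relative difference between a measured and an expected value, in percent."""
    return 100.0 * (measured - expected) / expected
```

with these assumed constants, a height difference expressed in centimetres maps almost one-to-one onto a pressure in cmh2o, which is why the height regulator can double as a rough peak-pressure indicator when the density of the medium is known.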
the underlying functioning principle of this setup allows it to run through many cycles of inhalation and exhalation with minimal cyclical system fatigue (wear and tear). to our knowledge of the currently available systems, this is an advantage over other open-source ventilators that rely on the compression of a flexible air chamber with limited compression cycles before it has to be replaced from loss of compliance. furthermore, the flexible air chamber is best used when providing positive pressure necessary during the inhalation. this method of ventilation might not be suitable for patients with covid- as they mainly rely on lungs' compliance for exhalation. fibrosis and the loss of lung compliance is a common symptom among covid- patients and may not allow the patient to exhale normally . a conscious effort was made to reduce the number of electronic components needed to manufacture and control the device to reduce the production cost as well as minimizing the chance of device malfunction and injury to the patient due to electronic or software malfunction. however, another iteration of this setup is currently under development that is controlled by a micro-controllers and sensors to adjust the breathing rate and the intake volume according to the patients' needs autonomously and in real-time. oxygen therapy is an essential treatment for many infected patients, and this setup was designed to be capable of intaking concentrated oxygen and supplying the patient according to the healthcare provider's specified volume and pressure. this may be achieved through a regulated cylinder and an oxygen tank connected to the intake port of the positive pressure container. moisturizing and warming up the intake air are another feature that was put into consideration while designing this device. 
during this experiment, the intake tube was extended below the waterline in the compression chamber, which allowed the air to percolate through warm water before being compressed out of the container. the container was d printed from polyethylene terephthalate glycol (petg), due to its thermal properties compared to pla . this method was used to make the intake air warm and humid before being compressed and expelled. although an increase in the air temperature was noticed during this experiment, further analysis is required to measure the moisture content and confirm these results. furthermore, the exhaust air from the negative pressure container may contain airborne water droplets that could be hazardous to the medical professionals as well as other people in the vicinity of the patient. traditional methods of limiting the spread of airborne viruses, such as fine filters, may be used in this setup. a similar method may also be utilized to remove contaminants from the intake air before it is transported to the lungs . although calculating the tidal volume using the provided equation is rather simple, it is still necessary to include an inline pressure-regulated valve in order to avoid any injuries due to over-expansion of the lung. other safety valves and sensors may be added in order to protect the patient against ventilator-induced injury.
key: cord- -ztonp x authors: nagpal, sunil; srivastava, divyanshu; mande, sharmila s. title: what if we perceive sars-cov- genomes as documents? topic modelling using latent dirichlet allocation to identify mutation signatures and classify sars-cov- genomes date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ztonp x topic modeling is frequently employed for discovering structures (or patterns) in a corpus of documents. its utility in text-mining and document retrieval tasks in various fields of scientific research is rather well known. an unsupervised machine learning approach, latent dirichlet allocation (lda) has particularly been utilized for identifying latent (or hidden) topics in document collections and for deciphering the words that define one or more topics using a generative statistical model. 
here we describe how sars-cov- genomic mutation profiles can be structured into a ‘bag of words’ to enable identification of signatures (topics) and their probabilistic distribution across various genomes using lda. topic models were generated using ~ novel corona virus genomes (considered as documents), leading to identification of amino acid mutation signatures and nucleotide mutation signatures (equivalent to topics) in the corpus of chosen genomes through coherence optimization. the document assumption for genomes also helped in identification of contextual nucleotide mutation signatures in the form of conventional n-grams (e.g. bi-grams and tri-grams). we validated the signatures obtained using lda driven method against the previously reported recurrent mutations and phylogenetic clades for genomes. additionally, we report the geographical distribution of the identified mutation signatures in sars-cov- genomes on the global map. use of the non-phylogenetic albeit classical approaches like topic modeling and other data centric pattern mining algorithms is therefore proposed for supplementing the efforts towards understanding the genomic diversity of the evolving sars-cov- genomes (and other pathogens/microbes). a document is a thematic body of text containing a semantic structure of words. the theme of a document, also called as the primary topic, is constituted by a specific proportion of various words. considering existence of a finite vocabulary, different proportions of words (and their semantic similarity) in each document would drive the theme(s) or topic(s) of various documents. therefore, while words are apparent constituents of a document, topics are latent (or hidden). topic modeling, a statistical method, employs these characteristics of documents to discover hidden structures (or latent topics) . its utility in text-mining and document retrieval/classification tasks in various fields of scientific research is rather well known [ ] [ ] [ ] . 
in fact, latent dirichlet allocation (lda), an unsupervised machine learning approach, is particularly known for identifying latent topics in large document collections and deciphering the words that define the inferred topics using a generative statistical model. lda assumes that a document is generated by a distribution of all possible hidden topics, while a topic is generated by the distribution of all possible apparent words. this multiplicity of topic affiliation for documents and words is accommodated through assumption of dirichlet priors which can be optimized to get ideal distribution of coherent topics in a document . the approach can also be made akin to markov-chains for probing the temporal evolution of a large number of documents and document topics . a large number of sars-cov- genome sequences are being deposited to public repositories like gisaid through an unprecedented spirit of scientific collaboration across the world. the high volume of raw data is expected to balloon further by the end of this pandemic. each new sequenced genome is a mutant/variant (with few exceptions) of original reference genome i.e. wuhan/wiv / (epi_isl_ ). in other words, certain mutations at nucleotide and amino acid levels can be expected to be observed in the submitted genomes. understanding the evolution and diversity of these variants has been a subject of interest to a wide spectrum of researchers. various reports aimed at identification of clades or classification system(s) for these genomes have in fact been outcomes of the afore-mentioned problem statement . although conventional topic modeling has been utilized to understand covid- from literature data , can it be applied to obtain additional insights from sequenced sars-cov- genomes and to rather classify them? 
in other words, can we perceive each genome of sars-cov- as a document containing words in the form of characteristic mutations (figure )? consequently, a genome would essentially become a bag of mutations (like bag of words in a document). such an assumption can potentially enable classification of the entire genome corpus by identifying mutation signatures (equivalent to topics in document) through topic modeling. moreover, given the inherent temporal nature of genome collections, dynamic topic modeling (e.g. temporal lda or hidden markov model driven lda) may rather provide a way to probe the evolution of the genome variants in terms of identified mutation signatures , . a parallel between classical lda on a large document corpus and a genome corpus is illustrated in plate notation below, where:
α is a parameter governing the distribution structure of signatures (nucleotide and amino acid mutations) across all genomes (similar to topics across all documents)
θ is a random matrix representing dirichlet distribution of various signatures in the genomes (similar to topics in documents), such that θ(i,j) indicates the probability of the i th genome (document) to contain mutations (words) pertaining to the j th signature (topic)
β is a parameter governing the distribution structure of mutations across all signatures (similar to words across all topics)
η is a random matrix representing dirichlet distribution of various mutations in signatures (similar to words in topics), such that η(i,j) indicates the probability of the i th signature (topic) to contain the j th mutation
for each genome (document) g, first draw θg ∼ dirichlet(α), then for each n th mutation (word) of the
to substantiate the conjecture, a bag of mutations data structure for ~ sars-cov- genomes submitted to gisaid was created. 
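the generative story outlined above (draw a signature mixture θg for each genome, then draw each mutation from one of the signatures) can be made concrete with a small simulation. the following is a minimal sketch of the classical lda generative process using only the python standard library; it is not the authors' implementation, the function names and the placeholder mutation labels are hypothetical, and mutations may repeat within a simulated genome, unlike the binary representation used in this study.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_genomes, n_mutations, alpha, beta, vocabulary, seed=0):
    """Simulate genomes as bags of mutations under the LDA generative story.

    alpha: list of Dirichlet concentrations, one per signature (topic).
    beta:  symmetric Dirichlet concentration over the mutation vocabulary.
    """
    rng = random.Random(seed)
    n_signatures = len(alpha)
    # eta[k] is the mutation (word) distribution of signature (topic) k
    eta = [sample_dirichlet([beta] * len(vocabulary), rng)
           for _ in range(n_signatures)]
    corpus = []
    for _ in range(n_genomes):
        theta_g = sample_dirichlet(alpha, rng)  # signature mix of this genome
        genome = []
        for _ in range(n_mutations):
            k = rng.choices(range(n_signatures), weights=theta_g)[0]
            genome.append(rng.choices(vocabulary, weights=eta[k])[0])
        corpus.append((theta_g, genome))
    return eta, corpus
```

fitting lda amounts to inverting this process: given only the observed bags of mutations, infer the signature mixtures θ and the signature compositions η that most plausibly generated them.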
classical lda was employed to generate topic models leading to identification of amino acid mutation signatures and nucleotide mutation signatures (equivalent to topics) in the corpus of chosen genomes through rigorous hyper-parameter tuning for coherence optimization (figure ) . interestingly, most of the high weight inferred signatures had a good overlap with the previously identified clades specific to various geographical regions (refer table ). for example, the signature- , constituted predominantly by amino acid mutations n-p l/orf b-p s, orf a-l f, orf b-a v, orf a-t k, was observed to dominate in india and other asian regions . biology agnostic, data structure driven approaches for sars-cov- genome sequences may therefore have some merit in not only handling the large amount of genomic data, but also for identifying mutation signatures (and hence classifying genomes) that might be of interest to clinicians/ biologists . their cross validation against phylogenetic estimations can help fine tune the performance of these machine learning algorithms, thereby adding confidence to the use of unconventional methods for probing genomic diversity . approximately sars-cov- sequences, obtained from global initiative on sharing avian influenza data (gisaid) between jan-july , were used. nextstrain's augur pipeline was employed with default parameters to align the sequences against the reference wuhan/wiv / (epi_isl_ ) . individual proteins of sars-cov- were extracted post alignment and translated to the amino acid sequences. comparisons to reference amino acid and nucleotide sequences were performed to profile mutations for all viral genome sequences. a genome collection (document corpus) mapped to the identified nucleotide and amino acid substitution mutations (document vocabulary) was created. sample mutation profile data structure have been provided in supplementary table . 
it may be noted that only those genomes were employed for topic modelling which contained at least one amino acid mutation. as shown in figure , bag of words representation of a document in natural language processing pertains to two aspects of the document: table ). two corpus vocabularies were consequently created (one each for nucleotide and amino acid mutations). binary document vectors were prepared for each of the genomes against the corpus vocabularies for these two types of mutations. mutation-genome matrices so computed for the two corpora represented the global picture of 'bag of words' models for novel corona virus genomes. it may be noted that unlike a conventional natural language processing task, given the non-linguistic context of observed mutations, the issues pertaining to tokenization, stop words, lemmatization and stemming were not relevant here . given that most of the existing clade definitions employ two or more co-occurring mutations, a bigram nucleotide mutation model was also created for the genomes. the corpus vocabulary for bigrams was created by taking into account the observed co-occurring pairs of mutations in the entire corpus of nucleotide mutation vocabulary (and not all possible pairs of mutations), such that each genome was represented by a numerically sorted list of nucleotide mutations. it is pertinent to note that numeric sorting of mutations is critical in searching for bi-grams (or n-grams) for a meaningful contextual search. 
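the bag-of-mutations representation described above (a corpus vocabulary, one binary vector per genome, and position-sorted mutation lists as the starting point for n-gram searches) can be sketched as follows. the mutation labels in the usage example are hypothetical placeholders in a reference-position-alternate style, and the helper names are assumptions made for illustration only.

```python
import re


def build_vocabulary(genomes):
    """Collect every distinct mutation observed across all genomes."""
    return sorted({m for mutations in genomes.values() for m in mutations})


def binary_vectors(genomes, vocabulary):
    """Map each genome to a 0/1 vector over the corpus vocabulary."""
    index = {m: i for i, m in enumerate(vocabulary)}
    vectors = {}
    for genome_id, mutations in genomes.items():
        row = [0] * len(vocabulary)
        for m in mutations:
            row[index[m]] = 1
        vectors[genome_id] = row
    return vectors


def sort_by_position(mutations):
    """Order mutations numerically by the genomic position embedded in the label."""
    return sorted(mutations, key=lambda m: int(re.search(r"\d+", m).group()))
```

for example, with `genomes = {"g1": ["C300T", "A100G"], "g2": ["A100G"]}`, the vocabulary is `["A100G", "C300T"]` and the vectors are `[1, 1]` for `g1` and `[1, 0]` for `g2`. numeric rather than lexical sorting matters because a label such as `G20A` must precede `A100G` for neighbouring mutations to be compared in genomic order.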
it may be noted that the choice of bi-gram mutations is enabled by a probabilistic scoring procedure as follows: the probabilistic derivation of bi-grams paves the way for an initial estimation of signatures of any size in the corpus using the following progressive probabilistic scoring:
$$\mathrm{score}((n-1), m) = \frac{\mathrm{count}((n-1), m) - \mathrm{min}((n-1), m)}{\mathrm{count}((n-1)) \times \mathrm{count}(m)} \times um$$
where:
(n-1), m are mutations in a pair, such that (n-1) refers to the (n-1) sized mutation combination
um is total unique mutations (i.e. size of mutation vocabulary)
score refers to the confidence score for the given pair
count refers to the total occurrence in the corpus
min refers to the minimum occurrence threshold for the mutation(s) in the corpus
python's gensim library was employed to estimate topic (mutation signature) models for ~ sars-cov- genomes through online variational bayes (vb) algorithm as described previously , . quality of mutation signatures (topics) inferred by lda was assessed using a coherence score, which refers to an index of the semantic similarity between dominant mutations (words) of the mutation signature (topic). in other words, a mutation signature (topic) with high coherence is expected to have mutations (words) with high co-occurrence similarity score. a good overall mutation signature extraction is therefore expected to have a high mean coherence. the coherence measure was calculated for different numbers of mutation signature extractions between - and an optimal score for nucleotide as well as amino acid mutations was obtained. further hyperparameter optimization was performed for a range of alpha and beta measures (between . - . , step size of . ) and the number of topics, in order to maximize the coherence score, and optimal values for all three parameters were obtained using the grid-search algorithm . the entire implementation was executed on a core xeon series . ghz machine with gb ram in a python v . . kernel with gensim v . . 
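the pair-scoring rule given above can be sketched in plain python. this is an illustrative re-implementation rather than the authors' code: it assumes that candidate pairs are neighbouring mutations in each genome's position-sorted list (as in conventional phrase detection), and the function name, the `min_count` and `threshold` parameters, and the single-letter labels in the usage note are hypothetical.

```python
from collections import Counter


def score_bigrams(genomes, min_count, threshold):
    """Score adjacent mutation pairs and keep those above a threshold.

    genomes: list of position-sorted mutation lists, one per genome.
    Returns a dict mapping (mutation_a, mutation_b) to its score.
    """
    unigram_counts = Counter()
    bigram_counts = Counter()
    for mutations in genomes:
        unigram_counts.update(mutations)
        bigram_counts.update(zip(mutations, mutations[1:]))

    vocabulary_size = len(unigram_counts)  # "um" in the formula above
    scored = {}
    for (a, b), pair_count in bigram_counts.items():
        score = ((pair_count - min_count)
                 / (unigram_counts[a] * unigram_counts[b])
                 * vocabulary_size)
        if score > threshold:
            scored[(a, b)] = score
    return scored
```

for the toy corpus `[["A", "B", "C"], ["A", "B"], ["A", "B", "D"], ["C", "D"]]` with `min_count=1` and `threshold=0.5`, only the pair `("A", "B")` survives, with a score of (3 - 1) / (3 × 3) × 4 = 8/9. a detected bi-gram can then be treated as a single token and the procedure repeated to obtain tri-grams and longer combinations.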
and scikit-learn v . . for topic modelling using lda. word clouds provide quick visual reference to the dominant words in a bag of words. as shown in r k, g r (figure b) . this approach therefore provided a preliminary way of quickly visualizing the global mutation signatures. at a minimum co-occurrence count of genomes and threshold of , nucleotide mutation bigrams were identified, the most frequent (in ~ genomes) bi-grams being g a_g c and g a_g a, followed by a g_g t ( genomes). supplementary table provides a full list of the detected bi-grams along-with their respective scores and cooccurrence counts. similarly, supplementary table provides a list of tri-grams. g a_g a_g c was observed to be the most frequent tri-gram ( genomes), followed by a g_g a_g a ( genomes) and c t_c t_c t ( genomes). the probabilistic approach can be extended to co-occurring contextual mutations of any size (n-gram) (described in methods section). allocation amino acid mutation signatures and nucleotide mutation signatures were obtained at an alpha (α) value of . and beta (β) . . these hyper-parameters, as described in the methods section, were optimized through grid search. validating non-phylogenetic algorithms of genome classification against phylogenetic estimations can provide an index of suitability of the data structure driven methods. as a qualitative crosschecking, the dominating mutation composition of signatures inferred using lda was compared with the well known recurrent mutation reports and clade definitions. table provides a summary of the amino acid mutation signatures detected through lda and corresponding close literature evidence citing a similar phylogenetically estimated genome group/clade (if any). the mutations in the signature were ordered in according to the probability of their presence in the signature. consequently, each signature was dominated by the first mutation, as compared to the probability of occurrence of other mutations in the signature. 
also, it is pertinent to note that given the probabilistic nature of inference, a high total score (weight) is more likely to indicate co-occurring mutations across large number of genomes. first five mutations, in the order of their probability of occurrence in the signatures, have been listed in table . in addition, the bi-grams and tri-grams identified through probablistic approach in this study have already been supported with their score and prevalence across genomes (supplementary table and ). [ ] . orf -l s, orf b-y c, orf b-p l, n-s n, orf -v i [ ] while the mutation signatures obtained through unsupervised machine learning approaches are not phylogenetic, right choice of algorithms can enable identification of probabilistic (and hence reliable) markers to classify genomes based on observed mutations. in fact, an evolutionary trail may also be established by following a temporal approach to lda (or other methods of topic modeling). an increase in efficiency of signature detection may further be achieved through other topic modeling methods (e.g. short text topic modeling). importantly, insights obtained about latent signatures through machine learning approaches like lda can also guide phylogenetic estimations. this article is intended to encourage the use of unconventional data driven approaches as an avenue that deserves attention of both data scientists and biologists alike. this, we believe, is expected to supplement the efforts in understanding the genomic diversity of the evolving sars-cov- genomes (and other pathogens). 
orf a-m i orf b-p l, orf a-q h, orf a-t i orf b-p l, orf a-v l, n-d y orf a-g v, orf a-p s, orf a-i v orf a-a t, orf a-a v, orf a-p l, n-p s orf b-p s, orf a-l f, orf b-a v we gratefully acknowledge all the authors from the originating laboratories responsible for obtaining the specimens and the submitting 
laboratories where genetic sequence data were generated and shared via the gisaid initiative, on which this research is based. genome sequences and meta-data should be downloaded from https://www.gisaid.org. a sample file for mutation profiles generated for this research has been provided in supplementary table along with the original contributors of these virus genome sequences in supplementary table . authors would also like to thank the management of tata consultancy services ltd for promoting the environment of fundamental and applied research. authors would like to thank their colleague nishal k. pinna for his assistance in generating mutation profiles. key: cord- - bz acw authors: tang, mei san; case, james brett; franks, caroline e.; chen, rita e.; anderson, neil w.; henderson, jeffrey p.; diamond, michael s.; gronowski, ann m.; farnsworth, christopher w. title: association between sars-cov- neutralizing antibodies and commercial serological assays date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: bz acw introduction commercially available sars-cov- serological assays based on different viral antigens have been approved for the qualitative determination of anti-sars-cov- antibodies. however, there is limited published data associating the results from commercial assays with neutralizing antibodies. methods specimens from patients with pcr-confirmed covid- and a positive result by the roche elecsys sars-cov- , abbott sars-cov- igg, or euroimmun sars-cov- igg assays and control specimens were analyzed for the presence of neutralizing antibodies to sars-cov- . correlation, concordance, positive percent agreement (ppa), and negative percent agreement (npa) were calculated at several cutoffs. results were compared in patients categorized by clinical outcomes. results the correlation between sars-cov- neutralizing titer (ec ) and the roche, abbott, and euroimmun assays was . , . , and . respectively. 
at an ec of : , the concordance kappa with roche was . ( % ci; . - . ), with abbott was . ( . - . ), and with euroimmun was . ( . - . ). at the same neutralizing titer, the ppa and npa for the roche was % ( - ) & % ( - ); abbott was % ( - ) & % ( - ); and euroimmun was % ( - ) & % ( - ) for distinguishing neutralizing antibodies. patients who died, were intubated, or had a cardiac injury from covid- infection had significantly higher neutralizing titers relative to those with mild symptoms. conclusion covid- patients generate an antibody response to multiple viral proteins such that the calibrator ratios on the roche, abbott, and euroimmun assays are all associated with sars-cov- neutralization. nevertheless, commercial serological assays have poor npa for sars-cov- neutralization, making them imperfect proxies for neutralization. comparisons, all specimens were >d post-symptom onset. all statistical analyses were performed with graphpad prism (graphpad). the correlation of the sars-cov- neutralizing titer with the ratio reported by the roche, abbott, and ei assays was . , . , and . respectively (figure a-c) . higher neutralizing titers were generally associated with a higher ratio as measured by all three assays. at a cutoff of : for the neutralizing assay, the concordance kappa with roche was . ( % ci; . - . ), with abbott was . ( . - . ), and with ei was . ( . - . ). for all three assays, the concordance decreased with an increased threshold for neutralizing titers. contrast, no significant difference in ratio was observed between patients that died from covid- compared to those that survived using the roche, abbott, or ei assays. increased neutralizing antibody titers were also higher in patients that were intubated, had cardiac injury, or aki relative to those with milder covid- symptoms ( figure b-d). in contrast, no significant differences were noted between the groups regardless of outcomes when using the roche, abbott, and ei assays. 
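the agreement statistics reported above (positive percent agreement, negative percent agreement, and the concordance kappa) all derive from a 2×2 table comparing each commercial assay against the neutralization result at a chosen titer cutoff. the following sketch shows the standard definitions; it is not the authors' analysis code (the study used graphpad prism), and the function name and the toy data in the usage note are hypothetical.

```python
def agreement_stats(reference_positive, assay_positive):
    """Compute PPA, NPA and Cohen's kappa from paired boolean calls.

    reference_positive: True where the neutralizing titer meets the cutoff.
    assay_positive:     True where the commercial assay is positive.
    """
    pairs = list(zip(reference_positive, assay_positive))
    tp = sum(1 for r, a in pairs if r and a)
    fn = sum(1 for r, a in pairs if r and not a)
    fp = sum(1 for r, a in pairs if not r and a)
    tn = sum(1 for r, a in pairs if not r and not a)
    n = tp + fn + fp + tn

    ppa = tp / (tp + fn)  # agreement among reference-positive specimens
    npa = tn / (tn + fp)  # agreement among reference-negative specimens

    observed = (tp + tn) / n
    # agreement expected by chance from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return {"ppa": ppa, "npa": npa, "kappa": kappa}
```

with four reference-positive specimens of which three are assay-positive, and two reference-negative specimens of which one is assay-positive, this gives a ppa of 0.75, an npa of 0.5 and a kappa of 0.25. the sketch assumes both reference classes are present; an empty class would need to be handled before dividing.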
however, similar non- significant trends (i.e., increase in ratio) were observed in patients who were intubated, had cardiac injury, or aki with the ei assay. neutralizing titers trended higher in male patients and patients > years old, although this was not statistically significant. similar trends were observed with the serology assay ratios as well (supplemental were no significant differences in outcomes between patients. however, there was an increase in the ratio observed in high neutralizing titer patients ( . , % ci; . - . ) compared to low titer patients ( . , % ci; . - . ) on the abbott assay and the ei assay ( . , . - . vs. . , . - . ) (supplemental table all three commercial assays correlated with higher neutralizing titers, this was not universally true. consistent with this, the correlation between neutralizing titers and serological results were < . on all three commercial assays. these findings are neutralizing sars-cov- titers with anti-rbd igg or anti-s igg using laboratory developed elisas ( ). nonetheless, we found that higher ratios reported by all three commercial assays was associated with higher neutralizing titers. importantly, all three serological assays used in this study currently have emergency use authorization (eua) to qualitatively determine the presence of antibodies against sars-cov- . while a negative result on sars-cov- serological assays is likely to be associated with the absence of neutralizing antibody titers, a positive result is not reliable for predicting the presence of neutralizing antibodies. furthermore, since these assays are under the eua, they cannot be modified by the laboratory to report quantitative units. our results argue for a potential utility in reporting the ratio calculated for commercially available assays relative to the calibrator. we, along with others, have previously patients ( ). 
our findings are also consistent with a study assessing the agreement between the ei igg result and neutralizing titers on predominantly non-hospitalized convalescent plasma donors ( ). the authors demonstrated that at a neutralizing titer of : , the ppa and npa were % and % respectively and that neutralizing titers were higher in a small cohort of hospitalized patients. similarly, we demonstrate higher neutralizing titers among patients with worse outcomes in an almost entirely hospitalized cohort. unique to this study, we also compare commercial tests head-to- key: cord- -k gxww c authors: arévalo, ap; pagotto, r; pórfido, j; daghero, h; segovia, m; yamasaki, k; varela, b; hill, m; verdes, jm; duhalde vega, m; bollati-fogolín, m; crispo, m title: ivermectin reduces coronavirus infection in vivo: a mouse experimental model date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: k gxww c sars-cov is a single strand rna virus member of the type coronavirus family, responsible for causing covid- disease in humans. the objective of this study was to test the ivermectin drug in a murine model of coronavirus infection using a type family rna coronavirus similar to sars-cov , the mouse hepatitis virus (mhv). 
balb/cj female mice were infected with , pfu of mhv-a (group infected; n= ) and immediately treated with one single dose of μg/kg of ivermectin (group infected + ivm; n= ), or were not infected and treated with pbs (control group; n= ). five days after infection/treatment, mice were euthanized to obtain different tissues to check general health status and infection levels. overall results demonstrated that viral infection induces the typical mhv disease in infected animals, with livers showing severe hepatocellular necrosis surrounded by a severe lymphoplasmacytic inflammatory infiltration associated with a high hepatic viral load ( , au), while ivermectin administration showed a better health status with lower viral load ( , au; p< . ) and few livers with histopathological damage (p< . ), not showing statistical differences with control mice (p=ns). furthermore, serum transaminase levels (aspartate aminotransferase and alanine aminotransferase) were significantly lower in treated mice compared to infected animals. in conclusion, ivermectin seems to be effective to diminish mhv viral load and disease in mice, being a useful model for further understanding new therapies against coronavirus diseases. immediately after necropsy, liver and spleen were fixed in % neutral buffered formalin (ph . ) for further processing. for evaluation, they were embedded in paraffin, sectioned at µm and stained with hematoxylin-eosin (h&e), according to (kyuwa et al., ) . control group (p< . ). results are shown in figure a , b, c and d. 
mouse hepatitis virus infection, liver, mouse digestive system the fda- approved drug ivermectin inhibits the replication of sars-cov- in vitro a role for neutrophils in viral respiratory disease new insights about albumin and liver disease, annals of hepatology avermectin exerts anti-inflammatory effect by downregulating the nuclear transcription factor kappa-b and mitogen-activated protein kinase activation pathway ivermectin, 'wonder drug' from japan: the human use perspective the immunobiology of sars* modelos lineales generalizados mixtos: aplicaciones en infostat cryo-em analysis of the post-fusion structure of the sars-cov spike glycoprotein molecular pathology analyses of two fatal human infections of avian influenza a(h n ) virus zhong ns; china medical treatment expert group for covid- . clinical characteristics of coronavirus disease in china ivermectin: a systematic review from antiviral effects to covid- complementary regimen ivermectin as a broad-spectrum host antiviral: the real deal? cells animal and translational models of sars-cov- infection of mice and men: the coronavirus mhv and mouse models as a translational approach understand sars-cov- the importin alpha/beta-specific inhibitor ivermectin affects hif-dependent hypoxia response pathways acute hepatic failure in ifn-γ-deficient balb/c mice after murine coronavirus infection acute and chronic changes in the microcirculation of the liver in inbred strains of mice following infection with mouse hepatitis virus type pathogenesis of coronavirus-induced infections. review of pathological and immunological aspects hematological parameters in the early phase of influenza a virus infection in differentially susceptible inbred mouse strains liver immunology and its role in inflammation and homeostasis immunomodulatory effect of various anti-parasitics: a review ivermectin, a new candidate therapeutic against sars-cov- /covid- toxicology testing. 
in: haschek and rousseaux's handbook of toxicologic nuclear/nucleolar localization properties of c-terminal nucleocapsid protein of sars coronavirus ivermectin is a specific inhibitor of importin α/β-mediated nuclear import able to inhibit replication of hiv- and dengue virus coronavirus pathogenesis advances in virus nucleocytoplasmic transport of nucleocapsid proteins of enveloped rna viruses ivermectin inhibits lps-induced production of inflammatory cytokines and improves lps-induced survival in mice key: cord- -qkhfp l authors: steiner, daniel j.; cognetti, john s.; luta, ethan p.; klose, alanna m.; bucukovski, joseph; bryan, michael r.; schmuke, jon j.; nguyen-contant, phuong; sangster, mark y.; topham, david j.; miller, benjamin l. title: array-based analysis of sars-cov- , other coronaviruses, and influenza antibodies in convalescent covid- patients date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: qkhfp l detection of antibodies to upper respiratory pathogens is critical to surveillance, assessment of the immune status of individuals, vaccine development, and basic biology. the urgent need for antibody detection tools has proven particularly acute in the covid- era. we report a multiplex label-free antigen microarray on the arrayed imaging reflectometry (air) platform for detection of antibodies to sars-cov- , sars-cov- , mers, three circulating coronavirus strains (hku , e, oc ) and three strains of influenza. we find that the array is readily able to distinguish uninfected from convalescent covid- subjects, and provides quantitative information about total ig, as well as igg- and igm-specific responses. what they do not provide, however, is a broader understanding of the human immune response to sars-cov- infection, or illuminate potential relationships between covid- infection and previous infections (and immunity to) other respiratory viruses including circulating coronaviruses that cause the common cold. 
to address these goals, multiplex analytical techniques are required. a bead-based multiplex immunoassay for six coronaviruses infecting humans (pre-sars-cov- ) has been reported, and more recently a -plex assay on the quanterix platform focused on sars-cov- antigens has been described. despite these advances, there remains a significant need for analytical methods able to rapidly quantify antibodies not only to sars-cov- , but also to other coronaviruses, and other pathogenic viruses. most importantly, these must be able to discriminate among responses to different closely related viruses and different antigens from the same virus. to address this need, we have developed a prototype -plex array on the arrayed imaging reflectometry (air) platform. air is a label-free multiplex sensor method in which the surface chemistry and deposition of capture molecules to form a microarray on a silicon chip are carefully controlled such that s-polarized hene laser light at a . º incident angle to the chip undergoes total destructive interference within the surface film. binding to any probe spot on the array degrades the antireflective condition in proportion to the amount of material bound, yielding an increase in the reflected light as observed by a ccd camera. by comparing the intensity of the reflected light to an experimentally validated model, the thickness change for each spot, and therefore the quantity of each analyte in the sample, may be precisely and sensitively determined. we have previously reported the utility of influenza antigen arrays fabricated on the air platform for assessment of anti-influenza antibodies in human, animal, and avian serum, , both as a tool for viral surveillance and for assessment of the efficacy of a candidate vaccine. we have also demonstrated that air is scalable at least to -plex assays, used for discriminating different influenza virus serotypes. 
, we therefore anticipated that the platform would be useful as a way to quantify anti-sars-cov- antibodies, antibodies to other coronaviruses including circulating ("common cold") strains, and other respiratory pathogens including influenza. here, we discuss the development and testing of a mixed coronavirus / influenza antigen panel on air, and its application to analyzing the coronavirus antibody profile of a cohort of convalescent covid- patients and subjects of unknown disease status. material sources: for air assays, sars-cov- , sars-cov, mers, and influenza type a and b antigens were obtained from sino biological, inc., and are described in more detail below. most antigens were supplied as lyophilized material and reconstituted at the recommended concentrations using -mΩ water, while the remaining antigens were supplied frozen on dry ice. pbs-et was prepared as phosphate buffer ( mm monobasic sodium phosphate, mm dibasic sodium phosphate, mm nacl) with . % w/v tween- and mm edta. aminereactive substrates for fabrication of air arrays were provided by adarza biosystems, inc. for elisa assays, sars-cov- full-length spike and rbd were produced in-house using a mammalian expression system, , as was influenza a/h n /california hemagglutinin. hcov- e and hcov-oc spike proteins (baculovirus-expressed) were obtained from sino biological. tetanus toxoid (ttd) was obtained from calbiochem. antigen probe formulation: prior to microarray fabrication, antigens were buffer-exchanged and concentrated using amicon centrifugation filters (emd millipore) into phosphate buffer at ph . and ph . prior to use. during development, several printing concentrations and/or solution ph values of each antigen were tested, along with sugar additives (glycerol, trehalose) in order to optimize spot uniformity and morphology as well as initial probe thickness. preparation of arrays: arrays were printed on amine-reactive silicon oxide substrates (adarza biosystems, inc.) 
using a scienion sx piezoelectric microarrayer (scienion, a.g.) with spot volumes of approximately pl. six spots were printed for each antigen, the final layout of which is shown in figure . the number of spots arrayed was not critical to robust analytical performance or statistical analysis. each spot consists of approximately pixels when imaged by the ccd in an air chip reader (adarza biosystems, inc.), with each pixel representing a discrete interrogation of a unique probe surface region. therefore, averaging these pixel values together produces an inherently reliable measure of analyte-to-probe response. dilutions of polyclonal anti-fluorescein (anti-fitc, rockland inc.), were printed as negative intra-array controls. after printing, chips were mounted onto adhesive strips at appropriate spacing for -well plates, and then placed into mm sodium acetate buffer (ph ) for minutes. next, a . % bsa solution was added to each well resulting in a final bsa concentration of . % to passivate the remaining amine-reactive surface functionality. after blocking for minutes, the chips were transferred to new wells containing % fetal bovine serum (gibco) in pbs-et as a secondary block, and incubated for min. this step was required to reduce nonspecific binding from human serum at the assay endpoint. the chips were then rinsed briefly ( min) in new wells containing pbs-et, then transferred to wells containing microarray stabilizer solution (surmodics ivd). after a -minute incubation, the chips were dried at °c in an oven for min. this last step renders the sensors shelf-stable, until use in assays performed later. igm, or % fbs as a negative control. each of these conditions was produced in duplicate. secondary antibodies were diluted to μg/ml for both goat α-higg (jackson immunoresearch) and rabbit α-higm (rockland, inc.) in adarza diluent. 
after one hour of incubation with secondary antibodies at room temperature, chips were washed twice for minutes in pbs-et, then rinsed with water and dried with nitrogen as before. data analysis: air images were analyzed using the adarza ziva data analysis tool. probe spots with major defects or debris were manually flagged and eliminated, and minor defects in spot quality were automatically identified and excluded from the median intensity measurement. the median intensity values were converted to median thickness values using a best-fit line to an experimentally derived reflectance model. the median thickness values were then further processed in microsoft excel as described below, and are referred to simply as "thickness" hereafter. while anti-fitc spots were designed to serve as an intra-chip normalizer, they were not used as such because of the unexpected presence of anti-goat igg antibodies in some single-donor human serum samples. therefore, the blank area served as an intra-chip normalizer to mitigate any variation in the reactivity of the surface chemistry between air chips. the thickness of the blank area was subtracted from the thickness of each probe spot to produce "normalized thickness" values for each probe spot. all of the normalized thickness values across replicate chips (n= ) were averaged together (maximum of n= probe spots) for each antigen, and the standard deviation was calculated. the average thickness for each antigen in the fetal bovine serum (fbs) control was subtracted from the average thickness obtained for each antigen in each subject sample to produce the "normalized thickness change (Δ thickness)." in the case of the polyclonal antibody titration, the control chip was incubated in a matrix of fbs and pnhs. elisa assay: serum igg titers specific for sars-cov- proteins and selected non-coronavirus proteins were determined by elisa as described previously. 
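the normalization steps in the data analysis paragraph above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' excel workflow: the function names, the input layout (one list of spot thicknesses plus one blank-area thickness per chip), and the choice of the sample standard deviation are all hypothetical, and the thickness values in the usage lines are made up.

```python
from statistics import mean, stdev


def normalized_thickness(spot_thicknesses, blank_thickness):
    """Subtract the chip's blank-area thickness from each probe-spot thickness."""
    return [t - blank_thickness for t in spot_thicknesses]


def antigen_summary(chips):
    """Pool normalized spot values for one antigen across replicate chips.

    `chips` is a list of (spot_thicknesses, blank_thickness) pairs, one pair
    per replicate chip (hypothetical layout). Returns (mean, sample SD).
    """
    pooled = []
    for spots, blank in chips:
        pooled.extend(normalized_thickness(spots, blank))
    return mean(pooled), stdev(pooled)


def delta_thickness(sample_chips, control_chips):
    """Subtract the control (e.g. FBS) average from the subject-sample average."""
    sample_mean, sample_sd = antigen_summary(sample_chips)
    control_mean, _ = antigen_summary(control_chips)
    return sample_mean - control_mean, sample_sd


# Made-up thickness values (angstroms): two replicate chips for one antigen.
sample = [([12.0, 14.0], 2.0), ([13.0, 15.0], 3.0)]
control = [([3.0, 5.0], 2.0)]
change, sd = delta_thickness(sample, control)
print(f"delta thickness = {change:.2f} A (SD {sd:.2f})")
```

the blank subtraction is done per chip before pooling, so chip-to-chip differences in surface reactivity are removed before replicate spots are averaged.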
human serum standards were used to assign weight-based concentrations of antigen-specific igg as previously described, with the limit of assay sensitivity set at . μg/ml for all antigens. , results: . this is as expected given the prevalence of these viruses in the general population. addition of an anti-sars-cov- polyclonal antibody raised against the sars-cov- spike protein receptor binding domain (rbd) at μg/ml produced a strong signal on all three rbdcontaining antigens (s + s ecd, s , and rbd). overall response to the polyclonal antibody was well-behaved, and titrated to zero as expected ( figure c and d) . quantitative data are presented in Ångstroms of build. at the highest concentrations, significant cross-reactive binding to the hcov- e spike protein was observed, as well as some binding to the hcov-oc spike protein and mers s . calculated limits of detection for these data were . ng/ml (sars-cov- s + s ecd), . ng/ml (sars-cov- s ), and . ng/ml (sars-cov- rbd). however, these should be viewed as provisional, and subject to optimization. response of a commercial anti-sars-cov- rabbit polyclonal antibody (pab) on the array. (a) array exposed to array exposed to % fbs + % pnhs; (b) array exposed to μg/ml anti-sars-cov- pab in % fbs + % pnhs. strong responses to sars-cov- s +s ecd, s , and rbd are observed, as well as smaller cross-reactive responses to hcov- e, hcov-oc , and mers spike proteins; (c) quantitative data for the titration. convalescent serum array responses were compared to an elisa assay ( figure ). as elisa values were all igg-specific, and air data discussed thus far (obtained in a "label-free" mode) was a combination of igg and igm-specific responses, these results would not be expected to match precisely. differences in the expression system used for antigen production (baculovirus for commercial antigens used in air; hek t cells used for antigens used in the elisa assays) could also lead to differences. 
however, overall trends for sars-cov- antigens correlate well, as shown in figure . to provide further detail with regard to the response, air assays were run using secondary anti-igg and anti-igm antibodies to determine class-specific responses for a subset of samples ( figure ). was the case with assays run using the laboratory air assay, analysis using the ziva system readily discriminated between negative and convalescent samples (figure ). three putative convalescent covid- samples gave responses on all sars-cov- antigens that were below the threshold for a positive response (two standard deviations above the average of the negative samples). this is analogous to the air and elisa results obtained for sample hd , as described above. the remaining convalescent samples gave strong responses on at least one sars-cov- antigen, with many responding strongly to both rbd and s ( figure ). with at least one sars-cov- antigen response above threshold. health and disease result from many factors, including the overall landscape of a person's immune system. as such, methods for profiling antigen-specific antibody titers to a range of diseases in addition to the disease of primary current interest are of utility when studying the disease. to that end, we have presented preliminary data on a -plex array on the air platform, developed in response to the need to study sars-cov- but incorporating antigens for other coronaviruses and influenza. responses to sars-cov- antigens on the array effectively discriminated between serum samples from uninfected and covid- convalescent subjects, with generally good correlation to elisa data. follow-up assays demonstrated that exposure of the arrays to anti-igg and anti-igm antibodies enabled discrimination of antibody isotype. an important aspect of this work is the ability to evaluate anti-sars-cov- immunity in the context of the individual's overall immune landscape. 
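the positivity rule mentioned above (a response counts as positive when it exceeds the average of the negative samples by two standard deviations) can be written as a short python sketch. the function names, the dictionary layout, the antigen labels, and the numbers are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def positivity_threshold(negative_responses, n_sd=2.0):
    """Mean of the negative-sample responses plus n_sd standard deviations."""
    return mean(negative_responses) + n_sd * stdev(negative_responses)


def positive_antigens(sample_responses, negatives_by_antigen, n_sd=2.0):
    """List the antigens on which a sample exceeds its per-antigen threshold."""
    hits = []
    for antigen, response in sample_responses.items():
        cutoff = positivity_threshold(negatives_by_antigen[antigen], n_sd)
        if response > cutoff:
            hits.append(antigen)
    return hits


# Made-up responses; antigen labels are placeholders.
negatives = {"rbd": [1.0, 2.0, 3.0], "spike": [0.0, 1.0, 2.0]}
sample = {"rbd": 10.0, "spike": 2.5}
print(positive_antigens(sample, negatives))
```

under this rule a convalescent sample is called positive if the returned list is non-empty, which mirrors the "at least one antigen response above threshold" criterion in the text.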
because available chip real estate allows for substantial expansion of the multiplex capability of the array, in ongoing efforts we will add additional antigens for other strains of influenza (by analogy to our previous work ), as well as other upper respiratory infections such as respiratory syncytial virus and metapneumovirus. other coronavirus antigens including nucleocapsid (n) are also likely candidates for addition to the array, as they are known to produce an immune response (as seen in the elisa results, for example). thus, the flexibility of the air platform will prove useful not only in the current pandemic, but as other viruses inevitably emerge. report from the american society for microbiology covid- international summit targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals anti-sars-cov- virus antibody levels in convalescent plasma of six donors who have recovered from covid- serology assays to manage covid- developing antibody tests for sars-cov- " the lancet sars-cov- seroconversion in humans: a detailed protocol for a serological assay, antigen production, and test setup diagnostic value and dynamic variance of serum antibody in coronavirus disease development and clinical application of a rapid igm-igg combined antibody test for sars-cov- infection diagnosis profiling early humoral response to diagnose novel coronavirus disease (covid- ) test performance evaluation of sars-cov- serological assays development and evaluation of a multiplexed immunoassay for simultaneous detection of serum igg antibodies to six human coronaviruses ultra-sensitive high-resolution profiling of anti-sars-cov- antibodies for detecting early seroconversion in covid- patients a theoretical and experimental analysis of arrayed imaging reflectometry as a sensitive proteomics technique validation of arrayed imaging reflectometry biosensor response for protein-antibody interactions: cross-correlation of label-free, arrayed 
sensing of immune response to influenza antigens a multiplex label-free approach to avian influenza surveillance and serology crowd on a chip: label-free human monoclonal antibody arrays for serotyping influenza characterizing emerging canine h influenza viruses sars-cov- seroconversion in humans: a detailed protocol for a serological assay, antigen production, and test setup a serological assay to detect sars_cov- seroconversion in humans investigation of non-nucleophilic additives for reduction of morphological anomalies in protein arrays memory b cell expansion by seasonal influenza virus infection reflects early-life imprinting and adaptation to the infecting virus assignment of weight-based antibody units to a human antipneumococcal standard reference serum, lot -s statistical method for determining and comparing limits of detection of bioassays we thank alicia papalia for assistance with human samples, and florian krammer for the generous donation of plasmids for sars-cov- antigen production (s + s ecd and rbd). key: cord- -wzndpcq authors: albagi, sahar obi abd; al-nour, mosab yahya; elhag, mustafa; abdelihalim, asaad tageldein idris; haroun, esraa musa; essa, mohammed elmujtba adam; abubaker, mustafa; hassan, mohammed a. title: a multiple peptides vaccine against ncovid- designed from the nucleocapsid phosphoprotein (n) and spike glycoprotein (s) via the immunoinformatics approach date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wzndpcq due to the current covid- pandemic, the rapid discovery of a safe and effective vaccine is an essential issue, consequently, this study aims to predict potential covid- peptide-based vaccine utilizing the nucleocapsid phosphoprotein (n) and spike glycoprotein (s) via the immunoinformatics approach. to achieve this goal, several immune epitope database (iedb) tools, molecular docking, and safety prediction servers were used. 
according to the results, the spike peptide sqcvnlttrtqlppaytnsftrgvy is predicted to have the highest binding affinity to the b-cells. the spike peptide ftisvttei has the highest binding affinity to the mhc i hla-b allele. the nucleocapsid peptides ktfpptepk and rwyfyylgtgpeagl have the highest binding affinity to the mhc i hla-a allele and the three mhc ii alleles hla-dpa * : /dpb * : , hla-dqa * : /dqb * : , hla-drb , respectively. furthermore, those peptides were predicted to be non-toxic and non-allergenic. therefore, the combination of those peptides is predicted to stimulate better immunological responses with respectable safety. human coronaviruses (hcovs), including the severe acute respiratory syndrome coronavirus (sars-cov- , covid- ), are enveloped, positive-sense, single-stranded polyadenylated rna viruses that belong to the coronaviridae family. they cause systemic and respiratory zoonotic diseases [ ]. sars-cov- is a novel strain first detected in the city of wuhan, china, in december [ ]. it causes fever, cough, dyspnea, and bilateral infiltrates on chest imaging, and may progress to pneumonia [ ]. covid- is characterized by rapid spreading; "as the feb, it is reported in countries, causing over , infections with , deaths" [ ], and as of the th may, more than . million positive cases and . million deaths have been identified globally [ ]. unfortunately, covid- so far has no effective antiviral drug for treatment or vaccine for prevention; hence, extensive research should be conducted on the development of safe and effective vaccines and antiviral drugs [ ]. to develop a safe and effective covid- vaccine rapidly, the who recommended that "we must test all candidate vaccines until they fail to ensure that all of them have the chance of being tested at the initial stage of development". following this recommendation, there are currently over proposed vaccines. 
six of them are in clinical evaluation and in pre-clinical evaluation [ ]. vaccine development is pursued through multiple approaches, including inactivated, live-attenuated, non-replicating viral vector, dna, rna, recombinant protein, and peptide-based vaccines. "as of april , the global covid- vaccine r&d landscape confirmed active candidate vaccine" [ ]. consistent with global efforts, this study aims to predict a potential covid- peptide-based vaccine utilizing the nucleocapsid phosphoprotein (n) and spike glycoprotein (s) via the immunoinformatics approach. owing to the respectable antigenicity of the nucleocapsid and spike glycoprotein, they are appropriate targets for vaccine design [ ]. peptide vaccines are sufficient to stimulate cellular and humoral immunity without allergic responses [ ]. they are "safe, simply produced, stable, reproducible, cost-effective" [ ], and permit a broad spectrum of immunity [ ]; consequently, they are the targets of this study, and they have also been utilized in multiple studies concerning covid- vaccines [ ] [ ] [ ] [ ]. a total of covid- nucleocapsid phosphoprotein (n) and spike glycoprotein (s) sequences were retrieved from the national center for biotechnology information (ncbi) database [ ] in fasta format in march . the sequences with their accession numbers are listed in the supplementary file s . the protein sequences of the mhc i alleles hla-a* : , hla-b : , hla-c* : , and the mhc ii alleles hla-dpa * : /dpb * : , hla-dqa * : /dqb * : , hla-drb were obtained from the immuno polymorphism database (ipd-imgt/hla) [ ]. the retrieved covid- nucleocapsid phosphoprotein (n) and spike glycoprotein (s) sequences were aligned using the clustalw algorithm [ ] in the bioedit software version . . [ ] to identify the conserved regions between sequences. 
the b-cell peptides were predicted from the conserved regions using the linear epitope prediction tool "bepipred-test" on the immune epitope database (iedb) [ ] at the default threshold value - . . to predict the epitopes accurately, a combination of the hidden markov model (hmm) and a propensity scale method (parker and levitt) [ ] was used. the surface accessibility of the b-cell peptides was predicted via the emini surface accessibility tool [ ] on the iedb [ ] at the default threshold value. to identify the antigenic sites within the nucleocapsid phosphoprotein and spike glycoprotein, the kolaskar and tongaonkar method on the iedb [ ] was used at the default threshold value. to predict the interaction with different mhc i alleles, the major histocompatibility complex class i (mhc i) binding prediction tool on the iedb [ ] was used. all peptide lengths were set to amino acids. to predict the binding affinity, the artificial neural network (ann) prediction method was selected with a half-maximal inhibitory concentration (ic ) value of less than . in contrast, to predict the interaction with different mhc ii alleles, the major histocompatibility complex class ii (mhc ii) binding prediction tool was used. to predict the binding affinity, the nn align algorithm was selected with an ic value of less than . the human allele reference sets (hla dr, dp, and dq) were included in the prediction. to predict the percentage of peptides binding with various mhc i and mhc ii alleles that cover the world population, the population coverage tool on the iedb [ ] was used. to predict the peptides' allergenicity, the allergenfp v. . [ ] and allercatpro v. . [ ] servers were used. in contrast, to predict the peptides' toxicity, the toxinpred server [ ] was used. to model the d structure of the nucleocapsid, spike, and mhc molecules, the swiss-model server [ ] and the phyre web portal [ ] were used. to visualize the modeled structures, the ucsf chimera . software [ ] was used. 
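the filtering step described above (keep peptides of a fixed length whose predicted ic value falls below a cutoff) can be sketched in python. this is an illustration only: the record layout, the function name, the placeholder peptide sequences and allele labels, and in particular the cutoff and length used in the usage lines are hypothetical and are not the values used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingPrediction:
    peptide: str
    allele: str
    ic50: float  # predicted IC50; lower means stronger predicted binding


def filter_binders(predictions, ic50_cutoff, peptide_length=None):
    """Keep predictions below the IC50 cutoff, strongest binders first.

    If peptide_length is given, peptides of any other length are dropped.
    """
    kept = [
        p for p in predictions
        if p.ic50 < ic50_cutoff
        and (peptide_length is None or len(p.peptide) == peptide_length)
    ]
    return sorted(kept, key=lambda p: p.ic50)


# Placeholder records; sequences, alleles, and IC50 values are invented.
rows = [
    BindingPrediction("AAAAAAAAA", "allele-1", 35.0),
    BindingPrediction("CCCCCCCCC", "allele-1", 820.0),
    BindingPrediction("DDDDDDDDDD", "allele-2", 12.0),
    BindingPrediction("EEEEEEEEE", "allele-2", 90.0),
]
# Cutoff and length below are illustrative assumptions only.
strong = filter_binders(rows, ic50_cutoff=100.0, peptide_length=9)
for p in strong:
    print(p.peptide, p.allele, p.ic50)
```

the same function can be reused for the mhc ii predictions by passing a different cutoff and length.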
the predicted peptides were docked with the mhc i alleles hla-a* : , hla-b : , hla-c* : , and the mhc ii alleles hla-dpa * : /dpb * : , hla-dqa * : /dqb * : , hla-drb . the modeled mhc structures were prepared for docking with the cresset flare software [ ] using the normal calculation method. to dock the predicted peptides, the mdockpep [ ] and hpepdock [ ] [ ] [ ] [ ] [ ] servers were used. the predicted peptides were submitted as amino acid sequences. the d and d interactions were visualized using poseview [ ] at the proteinplus web portal [ ] and the cresset flare viewer [ ], respectively. according to the iedb [ ] prediction, the average binding scores for the nucleocapsid phosphoprotein and the spike glycoprotein were . and . , respectively. all values equal to or greater than the default threshold were predicted as potential b-cell binders. regarding the cytotoxic t-lymphocyte peptides, the mhc i binding prediction tool predicted that peptides from the nucleocapsid and peptides from the spike glycoprotein could interact with the different mhc i alleles. the most promising peptides are listed in table . in contrast, the mhc ii binding prediction tool predicted that peptides from the nucleocapsid and peptides from the spike glycoprotein could interact with the different mhc ii alleles. the most promising peptides are listed in table . according to the predictions of the mdockpep [ ] and hpepdock [ ] servers, the spike peptide (ftisvttei) has the highest binding affinity to the mhc i hla-b allele and the spike peptide (miaqytsal) has the highest binding affinity to the mhc i hla-c allele. the nucleocapsid peptide (ktfpptepk) has the highest binding affinity to the mhc i hla-a allele ( table ). according to the allergenfp v. . [ ], allercatpro v. . [ ], and toxinpred [ ] servers, all the predicted peptides except the spike peptide (evfnatrfasvyawn) were non-allergenic and non-toxic (tables and ). the hydrophobicity scores of the spike glycoprotein peptides were not available. 
due to the current covid- pandemic, the rapid discovery of a safe and effective vaccine is an essential issue [ ]. since a successful vaccine relies on the selection of the most antigenic parts and the best approaches [ ], the covid- nucleocapsid phosphoprotein (n) and spike glycoprotein (s) were selected to design a peptide vaccine. the antigenicity of the nucleocapsid and spike is well predicted [ ], and the advantages of peptide vaccines are well established [ , ]. peptide design via the immunoinformatics approach is achieved through multiple steps, including the prediction of b-cell and t-cell peptides, surface accessibility, antigenic sites, and population coverage. after the selection of the candidate peptides, their interaction with the mhc molecules is simulated and their safety is predicted [ ]. regarding the b-cell peptide prediction, the successful candidates must pass the threshold scores in the bepipred, parker hydrophilicity, kolaskar and tongaonkar antigenicity, and emini surface accessibility tests [ ]. the iedb bepipred test [ ] on the nucleocapsid predicted eleven peptides; however, the peptide dayktfpptepkkdkkkkadetqalpqrqkkqqtvtllpaadldd was the only one that passed all the tests. in contrast, the iedb bepipred test [ ] on the spike predicted forty-two peptides, but the peptides sqcvnlttrtqlppaytnsftrgvy and lgky were the only two that passed all the tests. as the length of effective b-cell peptides varies from - amino acids [ ], the peptide lgky is too short and the peptide dayktfpptepkkdkkkkadetqalpqrqkkqqtvtllpaadldd is too long. consequently, the spike peptide sqcvnlttrtqlppaytnsftrgvy is predicted to have the highest binding affinity to the b-cells (table ). concerning the t-cell peptide prediction, the test measures the peptides' binding affinity to the mhc molecules [ ]. 
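the b-cell selection logic described above (a candidate must reach the threshold in every test and must fall within an accepted length range) can be sketched in python. the function names, the score layout, the placeholder sequences, and all threshold and length values in the usage lines are hypothetical; they illustrate the rule and are not the iedb defaults or the limits used in this study.

```python
def passes_all_tests(scores, thresholds):
    """True if every test score is at or above that test's threshold."""
    return all(scores[test] >= cutoff for test, cutoff in thresholds.items())


def select_b_cell_candidates(candidates, thresholds, min_len, max_len):
    """Keep peptides that pass every test and fall within the length window."""
    return [
        peptide
        for peptide, scores in candidates.items()
        if passes_all_tests(scores, thresholds)
        and min_len <= len(peptide) <= max_len
    ]


# Illustrative thresholds only (assumed values).
thresholds = {
    "bepipred": 0.5,
    "hydrophilicity": 1.0,
    "antigenicity": 1.0,
    "accessibility": 1.0,
}
# Placeholder peptides with invented scores.
candidates = {
    "AAAA": {"bepipred": 0.9, "hydrophilicity": 2.0,
             "antigenicity": 1.1, "accessibility": 1.5},          # too short
    "AAAAAAAAAAAA": {"bepipred": 0.7, "hydrophilicity": 1.4,
                     "antigenicity": 1.0, "accessibility": 1.2},  # passes all
    "CCCCCCCCCCCC": {"bepipred": 0.8, "hydrophilicity": 1.6,
                     "antigenicity": 0.9, "accessibility": 1.3},  # fails one test
}
print(select_b_cell_candidates(candidates, thresholds, min_len=8, max_len=30))
```

the comparison uses "greater than or equal to" because the text states that values equal to or greater than the default threshold count as potential binders.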
the available mhc i alleles hla a, hla b, hla c, hla e, and mhc ii alleles hla-dr, hla-dq, and hla-dp were used. the collective results of the mhc i iedb tests [ ] revealed that the spike glycoprotein peptides ftisvttei and miaqytsal and the nucleocapsid peptide ktfpptepk are the most promising mhc i peptides. on the other hand, the spike peptides evfnatrfasvyawn and vfrssvlhstqdlfl and the nucleocapsid peptides aalalllldrlnqle, alalllldrlnqles, prwyfyylgtgpeag, and rwyfyylgtgpeagl are the most promising mhc ii peptides. to stimulate better immunological responses, the predicted peptides must interact and bind effectively with the mhc i and mhc ii molecules [ ]; therefore, their interaction with the mhc molecules must be studied. the simulation and prediction of the interaction between the predicted peptides and the mhc molecules are conducted using molecular docking studies that rely on the calculation of the binding free energy. the lowest binding energy score of the mhc-peptide complex indicates the best interaction and the highest stability [ ]. to validate the results of molecular docking, the mdockpep [ ] and hpepdock [ ] [ ] [ ] [ ] [ ] servers were used. the mdockpep server predicts the mhc-peptide interaction by "docking the peptides onto the whole surface of protein independently and flexibly using a novel the conformation restriction in its novel iterative approach. it ranks the docked peptides via the itscorepep scoring function that uses the known protein-peptide complex structures in the calculations" [ ]. in contrast, hpepdock uses "a hierarchical flexible peptide docking approach" to predict the mhc-peptide interaction [ ]. the mhc i alleles hla-a , hla-b , and hla-c were predicted to present the highly conserved sars-cov- peptides more effectively [ ]; hence, they were used in the molecular docking study. 
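the ranking rule described above (the lowest docking energy score marks the best predicted interaction, with two servers used as a cross-check) can be sketched in python. the function names, the data layout, and the peptide labels, allele labels, and scores in the usage lines are hypothetical. because different servers report scores on different scales, the sketch compares only which peptide each server ranks first, not the raw values.

```python
def best_peptide_per_allele(scores):
    """Pick the lowest-scoring peptide for each allele.

    `scores` maps (allele, peptide) -> docking energy score from one server;
    a lower score is taken to mean a better predicted interaction.
    """
    best = {}
    for (allele, peptide), score in scores.items():
        if allele not in best or score < best[allele][1]:
            best[allele] = (peptide, score)
    return best


def servers_agree(best_a, best_b):
    """For each shared allele, report whether both servers rank the same peptide first."""
    return {
        allele: best_a[allele][0] == best_b[allele][0]
        for allele in best_a
        if allele in best_b
    }


# Invented scores for two placeholder peptides and two placeholder alleles.
server_1 = {
    ("allele-1", "pep-a"): -150.0, ("allele-1", "pep-b"): -120.0,
    ("allele-2", "pep-a"): -90.0, ("allele-2", "pep-b"): -110.0,
}
server_2 = {
    ("allele-1", "pep-a"): -210.0, ("allele-1", "pep-b"): -190.0,
    ("allele-2", "pep-a"): -205.0, ("allele-2", "pep-b"): -180.0,
}
agreement = servers_agree(best_peptide_per_allele(server_1),
                          best_peptide_per_allele(server_2))
print(agreement)
```

an allele flagged as a disagreement is a case where a further criterion, such as the number of predicted hydrogen and hydrophobic contacts, may help decide between the two top-ranked peptides.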
the molecular docking results showed that the spike peptide ftisvttei has the lowest docking energy score with the mhc i hla-b allele, hence it is predicted to have the highest binding affinity. the spike peptide miaqytsal showed the lowest docking energy score with the mhc i hla-c , consequently, it is predicted to have the highest binding affinity to the mhc i hla-c allele. in contrast, regarding the mhc i hla-a ; the results of the mdockpep [ ] server showed that the nucleocapsid peptide ktfpptepk has the lowest docking energy score, but the results of hpepdock [ ] server showed the spike peptide miaqytsal has the lowest docking energy score ( table ). to illustrate the mhc-peptide interaction, the poseview [ ] at the proteinplus web portal [ ] that illustrate the d interactions and cresset flare viewer [ ] that illustrate the d interaction were used. the spike peptide ftisvttei interacts with the mhc i hla-b allele by forming hydrogen bonds with the amino acids thr , ser , ser a, asn a, tyr a, thr a, lys a, trp a, glu a and hydrophobic bonds with the amino acids thr , thr , trp a, ser a, trp a, ala a, leu a. the spike peptide miaqytsal interacts with the mhc i hla-c allele by forming hydrogen bonds with the amino acids tyr a, arg a, lys a, gln a, thr a and hydrophobic bonds with the amino acids ile , gln , gln a, ala , arg a, tyr a, trp a. in comparison, it interacts with the mhc i hla-a allele by forming hydrogen bonds with the amino acids thr , glu a, arg a, trp a, and hydrophobic bonds with the amino acids ile , trp a, tyr a, trp a. it forms six hydrogen bonds and seven hydrophobic bonds with the mhc i hla-c allele that is more than its bonds with the mhc i hla-a allele. this finding indicates the higher binding affinity of spike peptide miaqytsal to the mhc i hla-c allele and supports the mdockpep [ ] server score (table and figures , ). 
the nucleocapsid peptide ktfpptepk interacts with the mhc i hla-a allele by forming hydrogen bonds with the amino acids pro , glu a, lys a, asp a, trp a and hydrophobic bonds with the amino acids pro , pro , leu a, thr a, trp a, ala a, val a (figures and ). among the reported mhc i alleles, the hla-b allele was predicted to have "the greatest ability to present the highly conserved sars-cov- peptides" [ ]; therefore, the spike peptide ftisvttei is predicted to elicit the strongest response, since binding to mhc i stimulates the natural killer and cytotoxic t cells [ ]. regarding the interaction with the mhc ii molecule, the spike peptide evfnatrfasvyawn showed the lowest docking energy score with the three mhc ii alleles hla-dpa * : /dpb * : , hla-dqa * : /dqb * : , and hla-drb ; hence, it is predicted to have the highest binding affinity to the three alleles and to stimulate the cd + (helper) t cells more effectively, since the mhc ii molecule presents antigenic peptides to the cd + (helper) t cells [ ]. on the contrary, the nucleocapsid peptide (rwyfyylgtgpeagl) showed lower docking energy scores with the mhc ii allele hla-dqa * : /dqb * : . the results of the mdockpep [ ] server showed that the peptide (prwyfyylgtgpeag) has the lowest docking energy score; however, the results of the hpepdock [ ] server differed for the mhc ii alleles hla-dpa * : /dpb * : and hla-drb (table ). the spike peptide (evfnatrfasvyawn) interacts with the hla-dpa * : /dpb * : allele by forming hydrogen bonds with the amino acids val , ser , tyr , tyr a, glu a, hydrophobic bonds with the amino acids val , phe , tyr a, ala a, ala a, asn a, leu a, leu a, trp , tyr a, and pi-pi bonds with the aromatic amino acids phe , ser , tyr , trp , tyr a.
it interacts with the hla-drb allele by forming hydrogen bonds with the amino acids val , thr , ser , phe b, tyr b, tyr b, gly b, hydrophobic bonds with the amino acids val , asn , thr , phe , tyr , trp , glu b, cys b, phe b, asn b, tyr b, pro b, trp b, and pi-pi bonds with the aromatic amino acid phe , tyr , trp . its interaction with the hla-dqa * : /dqb * : allele was not obtained from the server. the nucleocapsid peptide (rwyfyylgtgpeagl) interacts with hla-dpa * : /dpb * : allele by forming hydrogen bonds with the amino acids glu , leu a, hydrophobic bonds with the amino acids trp , phe , tyr , gly , glu , gln a, leu a, leu a, leu a, phe a, pro a, pro a, tyr a, and pi-pi bonds with the aromatic amino acid trp , phe , tyr , phe a, tyr a. it interacts with the hla-dqa * : /dqb * : allele by forming hydrogen bonds with the amino acids glu , asn a, asn a, leu a, thr a, asn a, hydrophobic bonds with the amino acids trp , phe , tyr , gly , glu , gly , gln a, phe a, ala a, tyr a, thr a, ala a, ala a, and pi-pi bonds with the aromatic amino acid trp , phe , tyr , tyr a. it interacts with the hla-drb allele by forming hydrogen bonds with the amino acids glu , val b, cys b, tyr b, tyr b, asn b, gln b, hydrophobic bonds with the amino acids trp , phe , tyr , gly , glu , lys b, his b, tyr b, asn b, tyr b, trp b, and pi-pi bonds with the aromatic amino acid trp , phe , tyr . the nucleocapsid peptide (rwyfyylgtgpeagl) interacts most effectively with the hla-dqa * : /dqb * : allele, hence it showed the highest binding affinity to it (figures and ) . 
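the two nucleocapsid mhc ii candidates named earlier, prwyfyylgtgpeag and rwyfyylgtgpeagl, are 15-residue windows offset by a single position, so they share a 14-residue core and differ only at their ends. the short check below makes that overlap explicit; the function name is a hypothetical helper introduced for illustration and is not part of any tool used in the study.

```python
from typing import Optional


def shared_core(first: str, second: str) -> Optional[str]:
    """Return the common core if `second` is `first` shifted right by one residue."""
    if len(first) == len(second) and first[1:] == second[:-1]:
        return first[1:]
    return None


core = shared_core("prwyfyylgtgpeag", "rwyfyylgtgpeagl")
print(core)  # rwyfyylgtgpeag
```

because the two windows share all but one residue at each end, differences in their docking scores may reflect how each server places the terminal residues rather than a different binding core.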
concerning the two nucleocapsid peptides (rwyfyylgtgpeagl) and (prwyfyylgtgpeag) in the interaction with the mhc ii alleles hla-dpa * : /dpb * : and hla-drb , the d and d interaction results showed that the peptide (prwyfyylgtgpeag) interacts more effectively with the hla-dpa * : /dpb * : allele and the peptide (rwyfyylgtgpeagl) interacts more effectively with the hla-drb allele, although the two peptides differ only in the first and last amino acid (figures and ). comparing the spike and nucleocapsid peptides, the spike peptide (ftisvttei) showed a higher binding affinity to the mhc i hla-b allele. the nucleocapsid peptides (ktfpptepk) and (rwyfyylgtgpeagl) showed a higher binding affinity to the mhc i hla-a allele and to the three mhc ii alleles hla-dpa * : /dpb * : , hla-dqa * : /dqb * : , hla-drb , respectively; however, the total population coverage of the peptides ftisvttei and ktfpptepk is not high (table ). joshi a, et al., in their predictive covid- peptide-based vaccine study, found that the orf- a protein's peptide (itlcftlkr) binds most effectively with the mhc i hla alleles hla-a* : , hla-a* : [ ]. enayatkhani m, et al. included the nucleocapsid n in their predictive study, but they studied its interaction with the mhc i hla-a* : allele [ ]. kalita p, et al. included the nucleocapsid protein and spike glycoprotein; they used the predicted peptides from nucleocapsid, spike, and membrane glycoprotein to design a subunit vaccine [ ]. singh a, et al. used the nucleocapsid protein to design a multi-peptide vaccine [ ]. since we did not include the two alleles hla-a* : , hla-a* : in our study and did not design a subunit or multi-peptide vaccine, a direct comparison with these studies is not applicable. besides binding with the mhc molecules, the predicted peptides must be non-toxic and non-allergenic; hence, their safety was predicted using the allergenfp v. . [ ], allercatpro [ ] v. . , and toxinpred [ ] servers. the results showed that all the peptides were non-toxic. the allercatpro v. . [ ] server results showed no evidence of allergenicity for any of the peptides; however, the allergenfp v. . [ ] server predicts the spike peptide (evfnatrfasvyawn) as an allergen (tables and ).
a potential covid- peptide-based vaccine was predicted from the nucleocapsid phosphoprotein (n) and spike glycoprotein (s) via the immunoinformatics approach. the spike peptide sqcvnlttrtqlppaytnsftrgvy is predicted to have the highest binding affinity to the b-cells. the spike peptide ftisvttei has the highest binding affinity to the mhc i hla-b allele. the nucleocapsid peptides ktfpptepk and rwyfyylgtgpeagl have the highest binding affinity to the mhc i hla-a allele and the three mhc ii alleles hla-dpa * : /dpb * : , hla-dqa * : /dqb * : , hla-drb , respectively. furthermore, those peptides were predicted as non-toxic and non-allergenic. therefore, the combination of those peptides is predicted to stimulate better immunological responses. since the study is an in silico predictive work, further experimental studies are recommended to validate the obtained results.
a comparative sequence analysis to revise the current taxonomy of the family coronaviridae
novel coronavirus ( -ncov) : situation report,
coronavirus disease (covid- ): epidemiology, virology, clinical features, diagnosis, and prevention
sars-cov- : an emerging coronavirus that causes a global threat
covid- situation update worldwide
who solidarity trial -accelerating a safe and effective covid- vaccine
the covid- vaccine development landscape
genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan
peptide vaccine: progress and challenges
peptide-based vaccine successfully induces protective immunity against canine visceral leishmaniasis
reverse vaccinology approach to design a novel multi-epitope vaccine candidate against covid- : an in silico study
design of a peptide-based subunit vaccine against novel coronavirus sars-cov-
epitope based vaccine prediction for sars-cov- by deploying immuno-informatics approach
designing a multi-epitope peptide-based vaccine against sars-cov-
database resources of the national center for biotechnology information
ipd-imgt/hla database
clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties, and weight matrix choice
bioedit: a user-friendly biological sequence alignment editor and analysis program for windows / /nt
the immune epitope database (iedb) .
improved method for predicting linear b-cell epitopes
induction of hepatitis a virus-neutralizing antibody by a virus-specific synthetic peptide
allergenfp: allergenicity prediction by descriptor fingerprints
allercatpro-prediction of protein allergenicity potential from the protein sequence
in silico approach for predicting toxicity of peptides and proteins
swiss-model: an automated protein homology-modeling server
the phyre web portal for protein modeling, prediction, and analysis
ucsf chimera--a visualization system for exploratory research and analysis
molecular field extrema as descriptors of biological activity: definition and validation
mdockpep: an ab-initio protein-peptide docking server
hpepdock: a web server for blind peptide-protein docking based on a hierarchical algorithm
hhblits: lightning-fast iterative protein sequence searching by hmm-hmm alignment
improved tools for biological sequence comparison
comparative protein structure modeling of genes and genomes
the protein data bank
poseview--molecular interaction patterns at a glance
proteinsplus: a web portal for structure analysis of macromolecules
developing covid- vaccines at pandemic speed
vaccines: from empirical development to rational design
immunoinformatics approach for epitope-based peptide vaccine design and active site prediction against polyprotein of emerging oropouche virus
antibody epitope prediction -tutorial
present yourself! by mhc class i and mhc class ii molecules
relative binding free energy calculations in drug discovery: recent advances and practical considerations
human leukocyte antigen susceptibility map for sars-cov- . medrxiv
major histocompatibility complex (mhc) class i and mhc class ii proteins: conformational plasticity in antigen presentation
function and regulation of mhc class ii molecules in t-lymphocytes: of mice and men
the authors declare that they have no conflict of interest.
key: cord- -wivk bm authors: schoof, michael; faust, bryan; saunders, reuben a.; sangwan, smriti; rezelj, veronica; hoppe, nick; boone, morgane; billesbølle, christian b.; puchades, cristina; azumaya, caleigh m.; kratochvil, huong t.; zimanyi, marcell; deshpande, ishan; liang, jiahao; dickinson, sasha; nguyen, henry c.; chio, cynthia m.; merz, gregory e.; thompson, michael c.; diwanji, devan; schaefer, kaitlin; anand, aditya a.; dobzinski, niv; zha, beth shoshana; simoneau, camille r.; leon, kristoffer; white, kris m.; chio, un seng; gupta, meghna; jin, mingliang; li, fei; liu, yanxin; zhang, kaihua; bulkley, david; sun, ming; smith, amber m.; rizo, alexandrea n.; moss, frank; brilot, axel f.; pourmal, sergei; trenker, raphael; pospiech, thomas; gupta, sayan; barsi-rhyne, benjamin; belyy, vladislav; barile-hill, andrew w.; nock, silke; liu, yuwei; krogan, nevan j.; ralston, corie y.; swaney, danielle l.; garcía-sastre, adolfo; ott, melanie; vignuzzi, marco; walter, peter; manglik, aashish title: an ultra-potent synthetic nanobody neutralizes sars-cov- by locking spike into an inactive conformation date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: wivk bm without an effective prophylactic solution, infections from sars-cov- continue to rise worldwide with devastating health and economic costs. sars-cov- gains entry into host cells via an interaction between its spike protein and the host cell receptor angiotensin converting enzyme (ace ). disruption of this interaction confers potent neutralization of viral entry, providing an avenue for vaccine design and for therapeutic antibodies. here, we develop single-domain antibodies (nanobodies) that potently disrupt the interaction between the sars-cov- spike and ace . by screening a yeast surface-displayed library of synthetic nanobody sequences, we identified a panel of nanobodies that bind to multiple epitopes on spike and block ace interaction via two distinct mechanisms. 
cryogenic electron microscopy (cryo-em) revealed that one exceptionally stable nanobody, nb , binds spike in a fully inactive conformation with its receptor binding domains (rbds) locked into their inaccessible down-state, incapable of binding ace . affinity maturation and structure-guided design of multivalency yielded a trivalent nanobody, mnb -tri, with femtomolar affinity for sars-cov- spike and picomolar neutralization of sars-cov- infection. mnb -tri retains stability and function after aerosolization, lyophilization, and heat treatment. these properties may enable aerosol-mediated delivery of this potent neutralizer directly to the airway epithelia, promising to yield a widely deployable, patient-friendly prophylactic and/or early infection therapeutic agent to stem the worst pandemic in a century. monoclonal antibodies disclosed to date. our lead neutralizing molecule, mnb -tri, blocks sars-cov- entry in human cells at picomolar efficacy and withstands aerosolization, lyophilization, and elevated temperatures. mnb -tri provides a promising approach to deliver a potent sars-cov- neutralizing molecule directly to the airways for prophylaxis or therapy. synthetic nanobodies that disrupt spike-ace interaction to isolate nanobodies that neutralize sars-cov- , we screened a yeast surface-displayed library of > x synthetic nanobody sequences. our strategy was to screen for binders to the full spike protein ectodomain, in order to capture not only those nanobodies that would compete by binding to the ace -binding site on the rbd directly but also those that might bind elsewhere on spike and block ace interaction through indirect mechanisms. we used a mutant form of sars-cov- spike (spike*,) as the antigen ( ). spike* lacks one of the two activating proteolytic cleavage sites between the s and s domains and introduces two mutations to stabilize the pre-fusion conformation. spike* expressed in mammalian cells binds ace with a kd = nm ( supplementary fig. 
) , consistent with previous reports ( ). next, we labeled spike* with biotin or with fluorescent dyes and selected nanobody-displaying yeast over multiple rounds, first by magnetic bead binding and then by fluorescence-activated cell sorting (fig. a) . three rounds of selection yielded unique nanobodies that bound spike* and showed decreased spike* binding in the presence of ace . closer inspection of their binding properties revealed that these nanobodies fall into two distinct classes. one group (class i) binds the rbd and competes with ace (fig. b) . a prototypical example of this class is nanobody nb , which binds to spike* and to rbd alone with a kd of nm and nm, respectively ( fig. c ; table ). another group (class ii), exemplified by nanobody nb , binds to spike* (kd = nm), but displays no binding to rbd alone (fig. c, table ). in the presence of excess ace , binding of nb and other class i nanobodies is blocked entirely, whereas binding of nb and other class ii nanobodies is decreased only moderately (fig. b) . these results suggest that class i nanobodies target the rbd to block ace binding, whereas class ii nanobodies target other epitopes and decrease ace interaction with spike allosterically or through steric interference. indeed, surface plasmon resonance (spr) experiments demonstrate that class i and class ii nanobodies can bind spike* simultaneously (fig. d) . analysis of the kinetic rate constants for class i nanobodies revealed a consistently greater association rate constant (ka) for nanobody binding to the isolated rbd than to full-length spike* (table ) , which suggests that rbd accessibility influences the kd. we next tested the efficacy of our nanobodies, both class i and class ii, to inhibit binding of fluorescently labeled spike* to ace -expressing hek cells (table , fig. e ). 
class i nanobodies emerged with highly variable activity in this assay with nb and nb as two of the most potent clones with ic values of and nm, respectively (table ) to define the binding sites of nb and nb , we determined their cryogenic electron microscopy (cryo-em) structures bound to spike* ( fig. a state rbds only contacts a single rbd (fig. d) . nb interacts with the spike s domain external to the rbd our attempts to determine the binding site of nb by cryo-em proved unsuccessful. we therefore turned to radiolytic hydroxyl radical footprinting to determine potential binding sites for nb . spike*, either apo or bound to nb , was exposed to - milliseconds of synchrotron x-ray radiation to label solvent-exposed amino acids with hydroxyl radicals. radical-labeled amino acids were subsequently identified and quantified by mass spectrometry of trypsin/lys-c or glu- c protease digested spike*( ). two neighboring surface residues on the s domain of spike (m and h ) emerged as highly protected sites in the presence of nb we assessed multivalent nb binding to spike* by spr. both bivalent nb with a amino acid linker (nb -bi) and trivalent nb with two amino acid linkers (nb -tri) dissociate from spike* in a biphasic manner. the dissociation phase can be fitted to two components: a fast phase with kinetic rate constants kd of . x - s - for nb -bi and . x - s - for nb -tri, which are of the same magnitude as that observed for monovalent nb (kd = . x - s - ) and a slow phase that is dependent on avidity (kd = . x - for nb -bi and kd < . x - s - for nb -tri, respectively) ( fig. a) . the relatively similar kd for the fast phase suggests that a fraction of the observed binding for the multivalent constructs is nanobody binding to a single spike* rbd. by contrast, the slow dissociation phase of nb -bi and nb -tri indicates engagement of two or three rbds. 
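the biphasic dissociation described above, a fast component attributed to single-rbd binding plus a slow, avidity-dependent component, corresponds to a sum of two exponential decays. the sketch below is an assumption-laden illustration of that model and not the fitting procedure used for the spr data: the function name, the parameter names, and all numeric values are hypothetical placeholders.

```python
import math


def biphasic_dissociation(t: float, fast_fraction: float, k_fast: float, k_slow: float) -> float:
    """Fraction of the initial bound signal remaining after t seconds.

    fast_fraction: share of the signal that dissociates with rate k_fast (1/s);
    the remaining share dissociates with rate k_slow (1/s).
    """
    slow_fraction = 1.0 - fast_fraction
    return fast_fraction * math.exp(-k_fast * t) + slow_fraction * math.exp(-k_slow * t)


# Placeholder parameters: 40 % fast phase, rates chosen only for illustration.
for seconds in (0.0, 100.0, 1000.0):
    remaining = biphasic_dissociation(seconds, 0.4, 1e-2, 1e-5)
    print(seconds, round(remaining, 4))
```

in this picture a longer association time that lowers the fast-phase share simply reduces `fast_fraction`, leaving a trace dominated by the slow component.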
we observed no dissociation for the slow phase of nb -tri over minutes, indicating an upper boundary for kd of x - s - and subpicomolar affinity. this measurement remains an upper- bound estimate rather than an accurate measurement because the technique is limited by the intrinsic dissociation rate of spike* from the chip imposed by the chemistry used to immobilize spike*. we reasoned that the biphasic dissociation behavior could be explained by a slow interconversion between up-and down-state rbds, with conversion to the more stable down- state required for full trivalent binding. according to this view, a single domain of nb -tri engaged with an up-state rbd would dissociate rapidly. the system would then re-equilibrate as the rbd flips into the down-state, eventually allowing nb -tri to trap all rbds in closed spike*. to test this notion directly, we varied the time allowed for nb -tri binding to spike*. indeed, we observed an exponential decrease in the percent fast-phase with a t / of s ( table ). nb -tri shows a -fold enhancement of inhibitory activity, with an ic of . nm, whereas trimerization of nb and nb resulted in more modest gains of - and -fold ( nm and nm), respectively (fig. c) . we next confirmed these neutralization activities with a viral plaque assay using live sars- nb -tri proved exceptionally potent, neutralizing sars-cov- with an average ic of pm (fig. d ). nb -tri neutralized sars-cov- with an average ic of nm (fig. d) . we further optimized the potency of nb by selecting high-affinity variants. to this end, we prepared a new library, starting with the nb coding sequence, in which we varied each amino acid position of all three cdrs by saturation mutagenesis (fig. a) . after two rounds of magnetic bead-based selection, we isolated a population of high-affinity clones. sequencing revealed two highly penetrant mutations: i y in cdr and p y in cdr . 
we incorporated these two mutations into nb to generate matured nb (mnb ), which binds with -fold increased affinity to spike* as measured by spr (fig. b) . as a monomer, mnb inhibits both pseudovirus and live sars-cov- infection with low nanomolar potency, a ~ -fold improvement compared to nb ( fig. i -j, table ). a . Å cryo-em structure of mnb bound to spike* shows that, like the parent nanobody nb , mnb binds to closed spike (fig. c, supplementary fig. ) . the higher resolution map allowed us to build a model with high confidence and determine the effects of the i y and p y substitutions. mnb induces a slight rearrangement of the down-state rbds as compared to both previously determined structures of apo-spike* and spike* bound to nb , inducing a ° rotation of the rbd away from the central three-fold symmetry axis (fig. h) ( , ) . this deviation likely arises from a different interaction between cdr and spike*, which nudges the rbds into a new resting position. while the i y substitution optimizes local contacts between cdr in its original binding site on the rbd, the p y substitution leads to a marked rearrangement of cdr in mnb (fig. f-g) . this conformational change yields a different set of contacts between mnb cdr and the adjacent rbd (fig. d) . remarkably, an x-ray crystal structure of mnb alone revealed dramatic conformational differences in cdr and cdr between free and spike*-bound mnb , suggestive of significant conformational heterogeneity for the unbound nanobodies and induced-fit rearrangements upon binding to spike* (fig. e) . the binding orientation of mnb is similar to that of nb , supporting the notion that our multivalent design would likewise enhance binding affinity. unlike nb -tri, trivalent mnb (mnb -tri) bound to spike with no observable fast-phase dissociation and no measurable dissociation over ten minutes, yielding an upper bound for the dissociation rate constant kd of . x - s - (t / > days) and a kd of < pm (fig. b) . 
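the step from an upper bound on the dissociation rate constant to a lower bound on the half-life and an upper bound on the equilibrium constant follows from two standard relations for 1:1 binding kinetics: t1/2 = ln(2) / k_off and kd = k_off / k_on. the sketch below illustrates that arithmetic only; the function names and the rate constants are hypothetical placeholders, not the measured values.

```python
import math


def half_life_seconds(k_off: float) -> float:
    """Half-life of a complex with first-order dissociation rate k_off (1/s)."""
    return math.log(2) / k_off


def equilibrium_kd(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on


# Placeholder rate constants for illustration only.
example_k_on = 1e6    # 1/(M*s)
example_k_off = 1e-6  # 1/s, treated here as an upper bound

print(half_life_seconds(example_k_off) / 86400.0)  # half-life in days
print(equilibrium_kd(example_k_on, example_k_off))  # KD in mol/L
```

because both relations are monotonic in k_off, an upper bound on k_off gives a lower bound on the half-life and, for a fixed k_on, an upper bound on kd.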
as above, more precise measurements of the dissociation rate are precluded by the surface chemistry used to immobilize spike*. mnb -tri displays further gains in potency in both pseudovirus and live sars-cov- infection assays with ic values of pm ( . ng/ml) and pm ( . ng/ml), respectively (fig. h-i, table ). given the sub-picomolar affinity observed by spr, it is likely that these viral neutralization potencies reflect the lower limit of the assays. mnb -tri is therefore an exceptionally potent sars-cov- neutralizing antibody, among the most potent molecules disclosed to date. nb , nb -tri, mnb , and mnb -tri are robust proteins one of the most attractive properties that distinguishes nanobodies from traditional monoclonal antibodies is their extreme stability ( ). we therefore tested nb , nb -tri, mnb , and mnb -tri for stability regarding temperature, lyophilization, and aerosolization. temperature denaturation experiments using circular dichroism measurements to assess protein unfolding revealed melting temperatures of . , . , . , and . °c for nb , nb -tri, mnb and mnb -tri, respectively ( fig a) . aerosolization and prolonged heating of nb , mnb , and mnb -tri for hour at °c induced no loss of activity (fig b) . moreover, mnb and mnb -tri were stable to lyophilization and to aerosolization using a mesh nebulizer, showing no aggregation by size exclusion chromatography and preserved high affinity binding to spike* (fig. c-d) . there is a pressing need for prophylactics and therapeutics against sars-cov- infection. most recent strategies to prevent sars-cov- entry into the host cell aim at blocking the ace -rbd interaction. high-affinity monoclonal antibodies, many identified from convalescent patients, are leading the way as potential therapeutics ( - ). while highly effective in vitro, these agents are expensive to produce by mammalian cell expression and need to be intravenously administered by healthcare professionals ( ). 
moreover, large doses are likely to be required for prophylactic viral neutralization, as only a small fraction of systemically circulating antibodies cross the epithelial cell layers that line the airways ( ). by contrast, single domain antibodies (nanobodies) provide significant advantages in terms of production and deliverability. they can be inexpensively produced at scale in bacteria (e. coli) or yeast (p. pastoris). furthermore, their inherent stability enables aerosolized delivery directly to the nasal and lung epithelia by self-administered inhalation ( ). monomeric mnb is among the most potent single domain antibodies neutralizing sars-cov- discovered to date. multimerization of single domain antibodies has been shown to improve target affinity by avidity ( , ) . in the case of nb and mnb , however, our design strategy enabled a multimeric construct that simultaneously engages all three rbds, yielding profound gains in potency. furthermore, because rbds must be in the up-state to engage with ace , conformational control of rbd accessibility can serve as an added neutralization mechanism. indeed, our nb -tri and mnb -tri molecules were designed with this functionality in mind. sars-cov- seroconversion in humans: a detailed protocol for a serological assay, antigen production, and test setup trimeric sars-cov- spike interacts with dimeric ace with limited intra- spike avidity. 
biorxiv yeast surface display platform for rapid discovery of conformationally selective nanobodies automated electron microscope tomography using robust prediction of specimen movements motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy cryosparc: algorithms for rapid unsupervised cryo-em structure determination new tools for automated high-resolution cryo-em structure determination in relion- grigorieff, cistem, user-friendly software for single-particle image processing structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structure of a nanobody-stabilized active state of the β( ) adrenoceptor rosettaes: a sampling strategy enabling automated interpretation of difficult cryo-em maps coot: model-building tools for molecular graphics isolde: a physically realistic environment for model building into low- resolution electron-density maps phenix: a comprehensive python-based system for macromolecular structure solution allosteric nanobodies reveal the dynamic range and diverse mechanisms of g-protein-coupled receptor activation the beamline x c of the center for synchrotron biosciences: a national resource for biomolecular structure and dynamics experiments using synchrotron footprinting fast quantitative analysis of timstof pasef data with msfragger and ionquant msstats: an r package for statistical analysis of quantitative mass spectrometry-based proteomic experiments automatic processing of rotation diffraction data from crystals of initially unknown symmetry and cell constants phaser crystallographic software buster version . . . . cambridge, united kingdom: global phasing ltd figure . cryo-em structures of nb and nb bound to spike. a, cryo-em maps of spike*- nb complex in either closed (left) or open (right) spike* conformation. b, cryo-em maps of spike*-nb complex in either closed (left) or open (right) spike* conformation. 
the top views show receptor binding domain (rbd) up-or down-states. c, nb straddles the interface of two down-state rbds, with cdr reaching over to an adjacent rbd. d, nb binds a single rbd in the down-state (displayed) or similarly in the up-state nb in either rbd up-or down-state. e, comparison of rbd epitopes engaged by ace (purple), nb (red), or nb (green) multivalency improves nanobody affinity and inhibitory efficacy. a, spr of nb and multivalent variants. red traces show raw data and black lines show global kinetic fit for nb and independent fits for association and dissociation phases for nb -bi and nb -tri dissociation phase spr traces for nb -tri after variable association time ranging from curves were normalized to maximal signal at the beginning of the dissociation phase. percent fast phase is plotted as a function of association time (right) with a single exponential fit. n = independent biological replicates. c, inhibition of pseudotyped lentivirus infection of ace expressing hek t cells. n = biological replicates for all but nb -tri (n = ) d, inhibition of live sars-cov- virus. representative biological replicate with n = (right panel) or (left panel) technical replicates per concentration. n = biological replicates for all but nb and nb - tri (n = ) dissociation was observed for mnb -tri over minutes. c, cryo-em structure of spike*-mnb comparison of receptor binding domain (rbd) engagement by nb and mnb demonstrating changes in nb and mnb position and the adjacent rbd. e, comparison of mnb complementarity determining regions in either the cryo-em structure of the spike*-mnb complex or an x-ray crystal structure of mnb alone. f, cdr of nb and mnb binding to the rbd. as compared to i in nb nb and mnb binding to the rbd demonstrating a large conformational rearrangement of the entire loop in mnb . h, comparison of closed spike* bound to mnb and rotational axis for rbd movement is highlighted. 
i, inhibition of pseudotyped lentivirus infection of ace expressing hek t cells by mnb and mnb -tri. n = biological replicates j, mnb and mnb -tri inhibit sars-cov- infection of veroe cells in a plaque assay representative biological replicate with n = technical replicates per concentration. n = biological replicates for all samples average values from n = biological replicates for nb , nb , and nb -tri are presented c average values from n = biological replicates for nb , nb -bi, and nb -tri. n = biological replicates for all others nb -no binding nc -no competition np -not performed we thank the entire walter and manglik labs for facilitating the development and rapid execution of this large-scale collaborative effort. we thank sebastian bernales and tony de fougerolles for advice and helpful discussion, and jonathan weissman for input into the project and reagent and machine use. we thank jim wells for providing the ace ecd-fc construct, jason mclellan for providing spike, rbd, and ace constructs, and florian krammer for providing an rbd construct. we thank jesse bloom for providing the ace expressing hek t i . x . x - . x - . x . x - . x - . x - ( . x - ) np np nb ii . x . x - . x - nb nc . x - ( . x - ) . x - ( . x - )nb i . x . x - . x - . x . x - . x - . x - ( . x - ) . x - ( . x - ) . x - ( . x - )nb i . x . x - . x - . x . x - . x - . x - ( . x - ) np np nb i . x . x - . x - . x . x - . x - . x - ( . x - ) . x - ( . x - ) np nb i . x . x - . x - biphasic biphasic biphasic . x - ( . x - ) . x - ( . x - ) np nb i . x . x - . x - . x . x - . x - . x - ( . x - ) . x - ( . x - ) np nb i . x . x - . x - np . x - ( . x - ) np np nb ii . x . x - . x - nb nc . x - ( . x - ) np nb ii . x . x - . x - nb . x - ( . x - ) np np nb i . x . x - . x - . x . x - . x - . x - ( . x - ) . x - ( . x - ) np nb i . x . x - . x - . x . x - . x - . x - ( . x - ) np np ace n/a . x . x - . x - np np np . x - ( . x - ) . x - ( . x - ) np mnb i . x . x - . x - . x . x - . x - . x - ( . 
key: cord- -aq p uxo authors: węglarz-tomczak, ewelina; tomczak, jakub m.; giurg, mirosław; burda-grabowska, małgorzata; brul, stanley title: discovery of potent inhibitors of plprocov by screening a library of selenium-containing compounds date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: aq p uxo a collection of twelve organoselenium compounds, structural analogues of the antioxidant drug ebselen, was screened for inhibition of the papain-like protease (plpro) from the acute respiratory syndrome coronavirus (sars-cov- , cov ). this cysteine protease, being responsible for the hydrolysis of peptide bonds between specific amino acids, plays a critical role in cov replication and in assembly of new viral particles within human cells. the activity of the plpro cov is essential for the progression of coronavirus disease (covid- ) and it constitutes a key target for the development of anti-covid- drugs. here, we identified four strong inhibitors that bind favorably to the plpro cov with the ic in the nanomolar range. severe acute respiratory syndrome coronavirus (sars-cov- ) is the cause of the coronavirus disease in the coronavirus pandemic that on may , affected around . million people in over countries [ ] . the scale of the pandemic, the ease of its spread [ ] and the variety of potential complications [ ] , which are not yet fully understood, position covid- among the serious diseases faced by humankind so far. the extensive research on developing new antiviral drugs for covid- has identified two potential targets, both cysteine proteases that play a vital role in viral replication. main protease (m pro , also known as chymotrypsin-like protease clpro) is responsible for polypeptide processing during virus replication [ , ] . papain-like protease (pl pro ), in addition to its role in virus replication, assembles new viral particles within human cells [ , ] . harcourt et al.
[ ] showed that pl pro is a key enzyme in the pathogenesis of sars-cov- [ ] , the causative agent of the fatal global outbreak of respiratory disease in humans during - [ ] . in the study presented by shin et al. [ ] , pl pro from the novel sars-cov- is claimed to be the essential viral enzyme that weakens the antiviral immune response and helps the virus exploit the host's immune system for its own benefit. thus, the current understanding of the mechanisms of sars-cov- indicates that blocking pl pro is likely to be crucial for stopping further virus spread. the identification of sars-cov- pl pro as an essential viral enzyme [ ] offers a great opportunity for drug discovery. in recent studies, peptide analogues [ ] and ebselen [ ] have been identified as highly active inhibitors of pl pro . ebselen is a seleno-organic drug with well-known anti-inflammatory, anti-atherosclerotic and cytoprotective properties and low toxicity in humans [ ] . it has been reported to be an effective therapeutic agent in multiple diseases such as cancer and hepatitis c virus infection [ ] [ ] [ ] [ ] [ ] . in this paper, we present the inhibitory activity of twelve ebselen derivatives, obtained by substitution/functionalization of the phenyl ring, towards sars-cov- pl pro (pl pro sars) and sars-cov- pl pro (pl pro cov ). previously, these derivatives were shown to be highly effective inhibitors of human methionine aminopeptidase [ ] and to act as antiviral and antimicrobial agents [ ] . we show that some of them indeed possess higher activity than ebselen, which has recently been reported as a pl pro cov inhibitor [ ] , and thus could be considered as novel potential drugs against covid- . twelve ebselen derivatives/analogues, seven benzisoselenazol- ( h )-ones ( a -g ) and four , ′-dicarbamoyldiaryl diselenides, were found to be irreversible inhibitors of both enzymes. similar to the recently reported ebselen [ ] , they show an irreversible mode of action.
all of the phenylbenzisoselenazol- ( h )-ones completely inactivated pl pro cov at a concentration of μm (see table ). in the case of pl pro sars the range of inhibition was from % to % (see table ). a enzyme activity assayed in the absence of inhibitors is defined as % activity. "< " means that the enzyme was not active (i.e., almost zero relative activity of the enzymes). the inhibitors were screened in tris buffer. the concentration of the enzymes was nm. the release of the fluorophore was monitored continuously. the linear portion of the progress curve was used to calculate the velocity. each experiment was repeated at least three times and the results are presented as the average with standard deviation. for more details, please see the materials and methods section. the experiment with μm of the inhibitor identified highly active ligands. two derivatives of ebselen substituted with a hydroxy ( c ) or methoxy group ( d ) inhibited cov by % under this condition. the affinities of the diselenide orthologs ( c and d ) were overall in the same range (table and ). in contrast, replacing the phenyl in position with less hydrophobic substituents, hydrogen ( a ) or methyl ( b ), was not beneficial, and neither was substitution by methyl in the para position ( b ). we observed a similar relationship for the diselenides (table ). all compounds were less active toward pl pro sars. only compounds that are substituted derivatives of ebselen and their diselenide orthologs showed significant inhibition at a concentration of μm. the inhibitory potency was further investigated for the most significant ligands with pl pro cov and we found ic values in the nanomolar range for c , d , c and d (table ). all four compounds appeared to be very effective inhibitors of pl pro from cov , with ic constants in the nanomolar range, e.g. nm for compound e with methyl substitution in the para position. the most significant results obtained in this study were further illustrated with molecular modeling (figure and ) .
the modeled interactions do not show significant changes in the overall binding mode architecture compared with the ebselen-pl pro cov complex [ ] . as in the ebselen complex with the enzyme, the hydroxyl derivative occupies the same intersection between the catalytic cys -his -asp triad and trp , and it is wrapped by tyr , ala and leu , with which it makes face-to-edge stacking and π-alkyl interactions, respectively ( figure ). additionally, the se -phenyl and an indole from his form π-π stacking interactions. in the case of compound d , which possesses a hydroxyl group, negatively charged oxygen atoms coordinate the carboxyl group from asp (figure ), whereas the methoxy group in compound e forms π-alkyl interactions with aromatic rings from his and tyr (figure ). a model of the complex of compound d with sars-cov- pl pro (pdb: w c [ ] ). a model of the complex of compound e with sars-cov- pl pro (pdb: w c [ ] ). inhibition of papain-like protease from cov has recently been identified as a potential approach to the therapy of covid- [ ] . in this work, we used the library of ebselen derivatives/analogues and performed a comprehensive inhibition study of pl pro cov . all of the tested compounds proved to be covalent inhibitors of the enzyme. interestingly, all derivatives except a completely blocked pl pro cov at a high concentration of the inhibitor ( μm), but only some of them ( c , d , e and d , e ) inhibited pl pro sars while the rest failed. this outcome is even more apparent at an inhibitor concentration of μm. in the case of pl pro sars, none of the presented ebselen derivatives was able to block the enzyme. however, d , e , d and e inhibited pl pro cov at this concentration. the investigation of ic revealed that d , e , d and e reached approximately nm, which is one order of magnitude better than in the case of ebselen (around μ m as reported in [ ] ). in conclusion, we identified very effective inhibitors of pl pro cov , with ic constants in the nanomolar range.
our findings provide evidence that ebselen derivatives with an additional hydroxy or methoxy group are promising prospective drugs against covid- . sars-cov- plpro, sars-cov- plpro and ubiquitin-amc were purchased as , and μm solutions, respectively, from r&d systems. all compounds were obtained and fully characterized in previous studies [ ] [ ] [ ] . their purity and homogeneity were confirmed by hrms and se nmr. the enzymes were dissolved in a mm tris-hcl buffer containing dtt ( mm), nacl ( mm) and . mg/ml albumin, at ph . , and preincubated for min. spectrofluorimetric measurements were performed in a -well plate format at two wavelengths: excitation at nm and emission at nm. the release of the fluorophore was monitored continuously at an enzyme concentration of nm. the linear portion of the progress curve was used to calculate the velocity of hydrolysis. the inhibitors were screened against recombinant pl pro sars and pl pro cov at °c in the assay buffer described above. for steady-state measurements the enzymes were incubated for min at °c with an inhibitor before adding the substrate to the wells. eight different inhibitor concentrations were used. the concentration of the inhibitor that achieved % inhibition ( ic ) was taken from the dependence of the hydrolysis velocity on the logarithm of the inhibitor concentration [i]. molecular modeling studies were performed using discovery studio (dassault systemes biovia corp). the crystal structure of the sars-cov- (pdb id w c [ ] ) with protons added (assuming the protonation state of ph . ) was used as the starting point for calculations of the enzyme complexed with ebselen. the partial charges of all atoms were computed using the momany-rone algorithm. minimization was performed using the smart minimizer algorithm and the charmm force field up to an energy change of . or rms gradient of . . the generalized born model was applied. the nonbond radius was set to Å.
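the ic determination described in the methods above (reading the half-inhibition point off the dependence of hydrolysis velocity on the logarithm of inhibitor concentration) can be sketched as follows. this is a minimal illustration, not the authors' fitting procedure: it assumes simple linear interpolation on a log10 concentration axis between the two measured points that bracket half of the uninhibited velocity. the function name and the eight-point data series are hypothetical, and the series is synthetic rather than taken from the study.

```python
import math


def ic50_by_log_interpolation(concentrations, velocities, v0):
    """Estimate IC50 as the concentration where velocity falls to 50% of v0.

    concentrations: positive inhibitor concentrations (any consistent unit)
    velocities:     hydrolysis velocities measured at those concentrations
    v0:             velocity without inhibitor (defined as 100% activity)
    """
    if len(concentrations) != len(velocities):
        raise ValueError("need one velocity per concentration")
    half = 0.5 * v0
    points = sorted(zip(concentrations, velocities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        # find the pair of neighbouring points that brackets the half-level
        crosses = (v_lo - half) * (v_hi - half) <= 0
        if crosses and v_lo != v_hi:
            frac = (v_lo - half) / (v_lo - v_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    raise ValueError("velocity does not cross 50% of v0 in the tested range")


# synthetic eight-point series (NOT data from this study), generated from
# v = v0 / (1 + [I] / 2.0), so the true IC50 is 2.0 in arbitrary units
doses = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
rates = [1.0 / (1.0 + c / 2.0) for c in doses]
estimate = ic50_by_log_interpolation(doses, rates, v0=1.0)
print(f"estimated IC50: {estimate:.2f} (arbitrary units)")
```

interpolation needs no fitting library but is only as good as the spacing of the tested concentrations; a nonlinear fit of a dose-response model would normally be preferred when the data allow it.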
who novel coronavirus
high contagiousness and rapid spread of severe acute respiratory syndrome coronavirus . emerg
interim clinical guidance for management of patients with confirmed coronavirus disease (covid-
potential treatments for covid- ; a narrative literature review
analysis of therapeutic targets for sars-cov- and discovery of potential drugs by computational methods
identification of severe acute respiratory syndrome coronavirus replicase products and characterization of papain-like protease activity
the sars-coronavirus papain-like protease: structure, function and inhibition by designed antiviral compounds
coronaviruses: molecular and cellular biology
summary table of sars cases by country
inhibition of papain-like protease plpro blocks sars-cov- spread and promotes anti-viral immunity
activity profiling and structures of inhibitor-bound sars-cov- -plpro protease provides a framework for anti-covid- drug design
ebselen as a highly active inhibitor of pl pro cov
molecular actions of ebselen-an antiinflammatory antioxidant
ebselen as template for stabilization of a v mutant dimer for motor neuron disease therapy
ebselen inhibits qsox enzymatic activity and suppresses invasion of pancreatic and renal cancer cell lines
design, synthesis, and biological evaluation of benzoselenazole-stilbene hybrids as multi-target-directed anti-cancer agents
ebselen inhibits hepatitis c virus ns helicase binding to nucleic acid and prevents viral replication
the clinical drug ebselen attenuates inflammation and promotes microbiome recovery in mice after antibiotic treatment for cdi
identification of methionine aminopeptidase as a molecular target of the organoselenium drug ebselen and its derivatives/analogues: synthesis, inhibitory activity and molecular modeling study
synthesis of new alkylated and methoxylated analogues of ebselen with antiviral and antimicrobial properties
reaction of bis[( -chlorocarbonyl)phenyl] diselenide with phenols, aminophenols, and other amines towards diphenyl diselenides with antimicrobial and antiviral properties
the crystal structure of papain-like protease of sars cov-
we gratefully acknowledge dassault systemes for the free license for the biovia discovery studio package provided for our research. ewt is co-financed by a grant mobilność plus v from the polish ministry of science and higher education (grant no. /mob/v/ / ). ewt conceived the project. ewt designed the research and experiments with contributions from jt and sb. experimental work was done by ewt. molecular docking was done by jt and ewt. mg and mb contributed to synthesis. ewt, jt and sb drafted and revised the manuscript. the authors declare no competing interests. correspondence and requests for materials should be addressed to ewt.
key: cord- -zhwtlik authors: yazhini, arangasamy; sidhanta, das swayam prakash; srinivasan, narayanaswamy title: d g substitution enhances the stability of trimeric sars-cov- spike protein date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: zhwtlik sars-cov- spike protein with d g substitution has become the dominant variant in the ongoing covid- pandemic. several studies characterizing the new virus expressing the g variant show that it exhibits increased infectivity compared to the ancestral virus having the d spike protein. here, using in-silico mutagenesis and energy calculations, we analyzed inter-residue interaction energies and thermodynamic stability of the dominant (g ) and the ancestral (d ) variants of the spike protein trimer in ‘closed’ and ‘partially open’ conformations. we find that the local interactions mediated by aspartate at the th position are energetically frustrated and create an unfavourable environment. in contrast, glycine at the same position confers an energetically favourable environment and strengthens intra- as well as inter-protomer association.
such changes in the local interaction energies enhance the thermodynamic stability of the spike protein trimer, as the free energy difference (ΔΔg) upon glycine substitution is − . kcal/mol for the closed conformation and − . kcal/mol for the open conformation. our results on the structural and energetic basis of enhanced stability hint that g may confer increased availability of the functional form of the spike protein trimer and consequently higher infectivity than the d variant. according to epidemiological surveillance of the disastrous covid- pandemic, the causative agent, severe acute respiratory syndrome coronavirus- (sars-cov- ), harbours mutations that are associated with geography-specific etiological effects (brufsky, ; mercatelli and giorgi, ) . currently, three major variants of sars-cov- have been identified, namely d g in the spike protein, g v in the non-structural protein (ns ) and l s in the orf protein (forster et al., ) . in this article, we focus on the spike protein with the d g substitution. the spike protein of sars-cov- is a aa long transmembrane glycoprotein comprising three modules, namely a large ectodomain that protrudes from the surface, a single-pass transmembrane anchor and a short intracellular tail. the ectodomain has s and s regions responsible for host cell binding and viral-host membrane fusion, respectively. the s / cleavage site is present at the junction of the s and s regions, and the s ' cleavage site is located in the s region. depending on the orientation of the receptor binding domain (rbd) in the s region, protomers in the functional form of the spike protein trimer adopt an 'open' or 'closed' conformation (cai et al., ; walls et al., ; wrapp et al., ) . in the open conformation, the rbd exposes its ace receptor binding regions and interacts with the peptidase domain of the angiotensin-converting enzyme (ace ) receptor. this primary step clasps the virus onto the host surface (lan et al., ; yan et al., ) .
studies on sars-cov have shown that subsequent proteolysis at s /s cleavage site sheds s region from the spike protein and cleavage at s ' site near fusion peptide causes a large conformational change in the s region. such conformation change leads to an insertion of fusion peptide to host membrane and the formation of six-helix bundle. at this state, spike protein bridges viral envelope and host membrane. hairpin-like bend in s region brings both membranes to close proximity for fusion following which genetic material gets released into the cytoplasm of the human cell (belouzard et al., ) . it is also noted that due to multi-basic nature of s / cleavage site, the sars-cov- spike protein can be preactivated by furin enzyme during viral packaging (shang et al., ) . in contrast to sars-cov infection, this process reduces sars-cov- dependence on target cell proteases for the succeeding infection. therefore, mutations in the spike protein that influence the initial step for viral infection are associated with altered virus transmissibility and pathogenicity (brufsky, ; li et al., ) . the ancestral spike protein with aspartate at th position (s d ) has been asynchronously superseded by glycine substitution (s g ) world-wide. the dominant s g variant is shown to have higher infectivity than the s d variant (korber et al., ) . concomitantly, other studies report that glycine substitution disrupts a salt bridge interaction between aspartate at th position and lysine at th position of an adjacent protomer and may contribute to higher frequency of open conformation than the s d variant (cai et al., ; yurkovetskiy et al., ; zhang et al., a) . a recent cryo-em study further reveals that the glycine substitution prevents premature shedding of s region (zhang et al., b) . 
along similar lines, our calculation of local interaction energies and of the free energy difference between the aspartate and glycine variants of the spike protein, reported in this paper, suggests that glycine creates an energetically favourable local environment. as a result, it strengthens the association of the s and s regions of the same as well as adjacent protomer(s) and enhances the overall stability of the spike protein trimer. we generated an in silico model of the d g variant of the spike protein trimer using the structure editing tool in ucsf chimera with default parameters (pettersen et al., ) . side chains were optimized using the scwrl . program (krivov et al., ) . two d g variant models were generated, corresponding to the closed and partially open conformations of the spike protein trimer, based on the reference cryo-em structures available in the protein data bank (pdb) entries vxx and vyb, respectively. it must be noted that although a d g variant structure is available (pdb code: xs ), we have not considered it in this analysis because the rbd is absent from the solved structure. the effect of the d g substitution on local interaction energies was examined using the frustratometer algorithm (parra et al., ) . the underlying principle of the algorithm is that a native protein contains several conflicting contacts that lead to local frustration. to examine the frustrated contacts present in a protein, the algorithm systematically substitutes the residue type or alters the chemical configuration of each interacting pair (including water-mediated interactions) and generates approximately structural decoys for a given contact (elaborated in ferreiro et al. ). the extent of the change in total interaction energy between the native structure and the structural decoys, according to the associative memory, water mediated, structure and energy model (awsem, implemented as a molecular dynamics algorithm, awsem-md), decides whether the frustration of a given contact is minimal, neutral, or high.
when the native energy is at the lower end of the energy distribution of the structural decoys, the contact is stabilizing and minimally frustrated (favourable), whereas a native energy that falls in the middle of the decoy energy distribution indicates that the contact is neutrally frustrated. a native energy at the higher end of the energy distribution of the structural decoys indicates that the contact is destabilizing and highly frustrated (unfavourable). often, highly frustrated contacts signify functional constraints such as substrate binding, allosteric transitions, binding interfaces and conformational dynamics (ferreiro et al., , ). frustration of a contact is represented as the frustration index, a z-score of the interaction energy of the native contact with respect to the interaction energy distribution of the structural decoys generated for that specific contact. a frustration index below - indicates that the interacting pair is highly frustrated, while an index between - and . or above . indicates that the interacting pair is neutrally or minimally frustrated, respectively. depending upon the nature of the perturbation, frustration is referred to as 'mutational', 'configurational' or 'single-residue level'. in mutational frustration, the residue type is replaced by other residue types, while in configurational frustration, all possible interaction types between the native residue pair are sampled by altering the residue configuration. in the case of single-residue level frustration, only a single residue is considered: the structural decoy set comprises randomized residue types at that specific site, and the frustration index is calculated by evaluating changes in the protein energy upon altering the type of residue. in all three categories of frustration index, only the site/interaction concerned is altered and the rest of the structure is maintained as native.
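the frustration index described above, a z-score of the native contact energy against the decoy energy distribution followed by a three-way classification, can be sketched as follows. this is an illustrative sketch and not the frustratometer implementation: the function names are hypothetical, the population standard deviation is an assumption, and the sign is chosen so that a native energy below the decoy mean gives a positive index, which matches the description that low native energies are minimally frustrated. the cutoffs are passed in explicitly; the values used in the example (-1 and 0.78) are the ones commonly quoted for the frustratometer and are treated here as an assumption. the energies are toy numbers.

```python
from statistics import mean, pstdev


def frustration_index(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energies.

    Positive values mean the native contact is more favourable (lower in
    energy) than the average decoy; negative values mean it is less so.
    """
    mu = mean(decoy_energies)
    sigma = pstdev(decoy_energies)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (mu - native_energy) / sigma


def classify_frustration(index, high_cutoff, minimal_cutoff):
    """Map a frustration index to 'highly', 'neutral' or 'minimally'."""
    if index < high_cutoff:
        return "highly frustrated"
    if index > minimal_cutoff:
        return "minimally frustrated"
    return "neutral"


# toy decoy energies (NOT values from this study)
decoys = [0.0, 1.0, 2.0, 3.0, 4.0]
for native in (-1.0, 2.0, 5.0):
    idx = frustration_index(native, decoys)
    # cutoffs of -1 and 0.78 are assumed, see the lead-in
    print(native, round(idx, 2), classify_frustration(idx, -1.0, 0.78))
```

keeping the cutoffs as arguments makes the assumption visible at every call site instead of hiding it inside the classifier.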
in this study, we analyzed all categories of frustration indices for two variants of spike protein (s d and s g ) in the functional trimeric form. to study the effect of d g variation on the thermodynamic stability of the spike protein trimer, we calculated free energy changes upon aspartate to glycine substitution using buildmodel function in foldx (schymkowitz et al., ) . five iterations of free energy calculations were carried out to obtain converged results (tokuriki et al., ) . inferences of the results were derived from closed and partially open conformations of the spike protein trimer. as amino acid substitution alters local chemical environment, we probed d g effect on the energetics of local inter-residue interactions. this can be quantified as local frustration of a residue or inter-residue interactions. we calculated frustration index of residues and interresidue interactions for two variants of spike protein viz. aspartate or glycine at the th position. result shows that frustration index of aspartate in the spike protein (s d ) is - . , - . and - . for three protomers in the closed conformation (red lines in figure a ). the frustration index of aspartate in the partially open conformation is - . , - . and - . for three protomers (red lines in figure b) . hence, in both the conformations aspartate is highly frustrated. conversely, in glycine variant (s g ), the residue is neutrally frustrated with frustration index of - . , - . and - . for protomers in the closed conformation and - . , - . and - . for protomers in the partially open conformation (blue lines in figure ). this result implies that residue frustration at the th position has become neutral upon glycine substitution. 
in the spike protein of both conformations (s d ), aspartate is involved in intra-protomer contact (with residues ser , gly and gly ) as well as in inter-protomer contact (with asn , arg , ser , thr and pro ) through direct, long-range electrostatic or water-mediated interactions. mutational frustration index indicates that all the contacts are highly frustrated (figure , top panel) . however, in the closed conformation of s g variant, glycine interacts with phe , leu and cys of the same protomer and pro of the adjacent protomer. except inter-protomer contact through pro , all three intra-protomer contacts are minimally frustrated (figure , top left panel) . likewise, in the partially open conformation of s g variant, glycine has the same contact pattern as observed in the closed conformation along with one additional contact to val . of these five contacts, three are minimally frustrated and two are highly frustrated (figure , top right panel) . overall, the number of contacts as well as the number of highly frustrated contacts are reduced upon aspartate to glycine substitution. notably, glycine forms a greater number of minimally frustrated contacts indicating it creates more favourable environment around the th position compared to aspartate. next, we calculated configurational frustration index that indicates how favourable the native contact between two residues relative to other possible contacts those two residues can have. results show that in the closed conformation, aspartate (s d ) has one minimally frustrated contact with arg (figure , left bottom panel) . whereas, glycine (s g ) has six minimally frustrated contacts with residues ser , gly , asn , thr and arg of the same protomer and thr of a preceding protomer in the clock-wise direction (figure , left bottom panel). 
a similar trend is seen for the partially open conformation, in which aspartate (s d ) has a highly frustrated contact with gly while glycine (s g ) has the same contact but minimally frustrated, besides three minimally frustrated contacts with other residues (thr , arg and thr ) (figure , right bottom panel). these observations are common among the three protomers present in the spike protein trimer (supplementary table s ). hence, glycine has more favourable contacts than aspartate. overall, calculations of single-residue, mutational and configurational frustration reveal that glycine substitution modifies the local interaction energy in the favourable direction. if the reduction of frustration in the local interaction energies upon aspartate to glycine substitution is significant, it can influence the thermodynamic stability of the spike protein trimer. to examine this, we calculated the difference in the total free energy of the trimer between the s d and s g variants using the foldx package (schymkowitz et al., ) . results show that the free energy difference (ΔΔg) is - . kcal/mol for the closed conformation and - . kcal/mol for the partially open conformation. this suggests that the stabilizing effect of glycine substitution on the local environment markedly increases the overall stability of the spike protein trimer. together, these results imply that the enhanced stability of s g may confer increased availability of the functional form of the spike protein trimer and consequently higher infectivity compared to s d , as observed in recent experimental studies (korber et al., ; zhang et al., a, b). the increasing severity of the public health and economic crisis creates urgency to develop therapeutic interventions against covid- infection at the earliest. currently, the dominance of the d g variant of the sars-cov- spike protein, which is being intensively studied across the globe for covid- prophylaxis and treatment, invites special attention.
in this study, we demonstrate using in-silico approaches that glycine substitution at the th position changes the local environment from energetically frustrated to favourable for contacts present within as well as between protomer(s). consequently, the free energy of s g is lower than that of s d , and hence local changes in the interaction energies at the th position in each protomer have a significant effect on the overall thermodynamic stability of the spike protein trimer. this finding adds to our knowledge of the mechanism of increased transmissibility of s g . supplementary table s . the table contains details of the frustration index of inter-residue contacts present at the th position of the spike protein trimer in closed and partially open conformations. table s a in sheet provides the mutational frustration index of contacts present at the th position in the ancestral (d ) and dominant (g ) variants of the spike protein trimer. table s b in sheet provides the configurational frustration index of those contacts in the ancestral (d ) and dominant (g ) variants. the frustration state, namely minimal, neutral or high, indicates that the contact is energetically favourable, neutral or unfavourable, respectively.
mechanisms of coronavirus cell entry mediated by the viral spike protein
distinct viral clades of sars-cov- : implications for modeling of viral spread
distinct conformational states of sars-cov- spike protein
localizing frustration in native proteins and protein assemblies
frustration in biomolecules
phylogenetic network analysis of sars-cov- genomes
tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus
improved prediction of protein side-chain conformations with scwrl
structure of the sars-cov- spike receptor-binding domain bound to the ace receptor
the impact of mutations in sars-cov- spike on viral infectivity and antigenicity
geographic and genomic distribution of sars-cov- mutations
protein frustratometer : a tool to localize energetic frustration in protein molecules, now with electrostatics
ucsf chimera -a visualization system for exploratory research and analysis
the foldx web server: an online force field
cell entry mechanisms of sars-cov-
the stability effects of protein mutations appear to be universally distributed
function, and antigenicity of the sars-cov- spike glycoprotein
cryo-em structure of the -ncov spike in the prefusion conformation
structural basis for the recognition of sars-cov- by full-length human ace
structural and functional analysis of the d g sars-cov- spike protein variant
the d g mutation in the sars-cov- spike protein reduces s shedding and increases infectivity
structural impact on sars-cov- spike protein by d g substitution
key: cord- -lqio l k authors: arumugam, arunkumar; faron, matthew l.; yu, peter; markham, cole; wong, season title: a rapid covid- rt-pcr detection assay for low resource settings date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: lqio l k quantitative reverse transcription polymerase chain reaction (rt-qpcr) assay is the gold standard recommended to test for acute sars-cov- infection.
it has been used by the centers for disease control and prevention (cdc) and several companies in their emergency use authorization (eua) assays. rt-qpcr requires expensive equipment such as rna isolation instruments and real-time pcr thermal cyclers, which are not available in many low resource settings and developing countries. as a pandemic, covid- has quickly spread to the rest of the world. many underdeveloped and developing countries do not have the means for fast and accurate covid- detection to control this outbreak. using covid- positive clinical specimens, we demonstrated that rt-pcr assays can be performed in as little as minutes using untreated samples, heat-inactivated samples, or extracted rna templates. rapid rt-pcr was achieved using thin-walled pcr tubes and a setup including sous vide immersion heaters/circulators. our data suggest that rapid rt-pcr can be implemented for sensitive and specific molecular diagnosis of covid- in situations where sophisticated laboratory instruments are not available. quantitative reverse transcription polymerase chain reaction (rt-qpcr) assay is the gold standard recommended to test for acute sars-cov- infection. [ ] [ ] [ ] [ ] it has been used by the centers for disease control and prevention (cdc) and several companies in their emergency use authorization (eua) assays. despite its established performance in sensitivity and specificity, rt-qpcr requires expensive equipment such as rna isolation instruments and real-time pcr thermal cyclers, which are not available in many resource-limited settings. as a pandemic, covid- has quickly spread to the rest of the world. many underdeveloped and developing countries do not have the means for fast and accurate covid- detection to control this outbreak. therefore, any development toward rapid, sensitive, and affordable rt-pcr assays could help advance the diagnosis and thus limit the spread of covid- .
our approach is to update the "archaic" method of hand-transferring reaction tubes through a series of water baths. two water baths (one for denaturation and one for the reverse transcription and then the annealing/extension steps) were made using plastic food storage containers heated by sous vide immersion heaters. these heaters are easy to use and have precise temperature control and circulating functionality. the pcr tubes were then shuttled between the water baths using a servo motor-operated arm controlled by a raspberry pi-based device. [ ] [ ] [ ] our device cost ~$ to build, whereas other pcr machines typically cost thousands of dollars. in addition to its low price tag, the unit can process up to samples in one run. it is also extremely fast, as cycles of taqman probe-based pcr can be completed in minutes using conventional polypropylene pcr tubes. using thin-film pcr reaction tubes (e.g., cepheid smartcycler or glass capillary tubes), a -cycle rt-pcr reaction can be completed in as little as minutes ( minutes rt and minutes pcr with seconds per cycle). compared to most benchtop thermal cyclers, this is a significant improvement in speed. because there was no real-time fluorescence detection during the run, the results of the probe-based rt-pcr assays were examined using the end-point pcr approach. the fluorescence signal of the tubes after amplification was examined by placing the tubes on top of a blue led gel-viewing box and viewing them through an amber or orange colored filter. a cell phone camera was used to record the signal difference between the pcr tubes. we tested this rapid water bath rt-pcr approach using untreated or heat-inactivated samples directly added to one-step rt-pcr master mixes without an rna extraction step. we have also developed a -minute rapid extraction step using a magnetic particle-based method to produce rna templates. the results are described in the next section.
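the two-bath shuttling described above, and the way the total run time follows from the reverse transcription hold plus the number of cycles times the seconds per cycle, can be sketched as follows. this is an illustrative sketch, not the authors' raspberry pi control code: the function names, bath labels and all durations are hypothetical placeholders (the durations in the text did not survive extraction), and tube transfer time between baths is ignored.

```python
def build_schedule(rt_s, initial_denature_s, cycles, denature_s, anneal_extend_s):
    """Return the ordered list of (bath, seconds) dwell steps for one run.

    The cooler bath serves both reverse transcription and annealing/extension;
    the hotter bath serves denaturation.
    """
    steps = [("anneal_extend_bath", rt_s),          # reverse transcription hold
             ("denature_bath", initial_denature_s)]  # initial denaturation
    for _ in range(cycles):
        steps.append(("denature_bath", denature_s))
        steps.append(("anneal_extend_bath", anneal_extend_s))
    return steps


def total_seconds(steps):
    """Sum of dwell times; transfer time between baths is not included."""
    return sum(seconds for _, seconds in steps)


# placeholder protocol (NOT the timings used in this study)
schedule = build_schedule(rt_s=300, initial_denature_s=60,
                          cycles=40, denature_s=5, anneal_extend_s=15)
print(len(schedule), "steps,", total_seconds(schedule) / 60, "minutes")
```

a controller would walk this list, move the arm to the named bath and wait for the dwell time; adding a fixed per-transfer delay to each step would give a more realistic total.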
the sous vide immersion heaters we used are capable of heating water up to °c. this means they can be used to make an environment suitable for denaturing nucleic acid and amplicons. we used two -quart clear food storage containers ( . l) as water baths and we mounted each immersion heater to the side. the immersion heater can warm the water to °c in less than minutes from room temperature. a temperature data logger probe was immersed in the water to create a temperature plot (figure ) . following the initial heating, the immersion heater was left in the container for another minutes to observe its ability to maintain water temperature. after minutes, the data logger showed that the water bath temperature remained at a steady °c, with only . °c of variation. in another test, water was heated to °c from a room temperature of . °c. after minutes, the immersion heater successfully heated the water to a steady °c. as expected, the heater successfully maintained a temperature of °c with only °c of variation. these two results confirm that the sous vide immersion heater can maintain steady heat long enough for the denaturation and annealing/extension steps of rt-pcr. the only limitation we expect for this setup is its use at high-altitude locations where the boiling temperature is lower than °c. a lower denaturation temperature setting would be needed, along with a longer denaturation time. to determine the relative viral load of the clinical samples, µl of the specimen was extracted using an automated system (promega maxwell) and run on a commercial real-time thermal cycler. three µl of the templates were used in each rt-qpcr reaction. the qpcr threshold cycle (ct) values of these samples are listed in table . it took the commercial cycler hour and minutes to complete the reaction. furthermore, to test the need for rna extraction, µl of each medium was spiked into µl of the master mix.
the results show that untreated media samples can produce qualitative pcr results matching those performed using extracted rna templates. to test for sars-cov- using water baths, we prepared three singleplex pcr reactions, targeting n , n , and rnase p, respectively. extracted templates from covid- positive clinical specimens and contrived negative samples were tested using water bath-based rt-pcr. the rt-pcr run was completed in minutes. the picture of the pcr tubes after cycles shows that we can use a blue led gel box and cell phone camera to determine the test results (figure ) . in a covid- positive sample, all three sample tubes had increased fluorescence signal after the reaction. in a covid- negative sample, only the rnase p tube had increased in fluorescence intensity.
figure . n , n , and rnase p in thin-walled pcr tubes after an rt-pcr run using µl of extracted templates. total run time was minutes ( s/ s/ x( xs/ s)).
we repeated the work using regular pcr tubes. while the reaction took longer to complete ( minutes total), the results matched those of the thin-walled pcr tubes (figure ). the sample preparation step is generally time-consuming, regardless of whether it is done manually or automated. in addition, there was a shortage of the recommended viral rna extraction kits needed for the centers for disease control and prevention (cdc) rt-qpcr assay to diagnose sars-cov- . we investigated the feasibility of omitting the full rna extraction step to expedite the test without significantly impacting the test's sensitivity. our study corroborates the findings from others who have successfully performed covid- rt-qpcr reactions in a benchtop real-time thermal cycler by simply adding a few microliters of the unprocessed sample in transport medium directly into the rt-qpcr assay master mix.
[ ] [ ] [ ] [ ] the data presented below suggest that it is possible to skip the rna extraction step and use rapid rt-pcr to deliver fast covid- testing using minimal equipment. while this approach may not be desirable for use in developed countries, it will find applications in many remote regions of underdeveloped and developing countries where extraction devices and pcr thermal cyclers are not commonly available. we first tested whether we could expedite the rt-pcr test (targeting n ) by skipping the extraction step and spiking untreated samples directly into the rt-pcr master mix. we performed the initial reactions with a protocol of seconds rt, followed by seconds of reverse transcriptase inactivation and hot start for the pcr. this was followed by cycles of seconds of denaturation and seconds of annealing/extension. afterwards, photos were taken of the tubes illuminated on a blue led gel box with an amber viewing filter (figure ) . by comparing the tube intensity with the negative control, we were able to call out of of the samples as n positive (samples , , , , , , and ). sample was a weak positive. the signal intensity for samples , , and was not significantly different from the negative controls shown in the photo. these four untreated samples ( , , , and ) all produced ct over when tested by rt-pcr (table ) . we also speculate that the sensitivity of this run using untreated samples was low ( %) because we only spent seconds on reverse transcription. the inhibitors in the samples can affect the cdna production from the rt process as well as the polymerase extension step. though not presented here, we have also determined that ingredients in utm inhibit our rt-pcr reactions much more than vtm. this can explain why samples , , and (low viral load samples in utm) did not test positive. these results suggest that the limit of detection under these conditions is around ct values in the low s.
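the end-point calls above were made by eye against the negative control. a minimal sketch of how the same comparison could be made numerically is shown below, assuming one mean fluorescence value per tube has already been measured from the photo (for example with an image analysis tool). the function name `call_tubes`, the ratio threshold, and the intensity values are all hypothetical and are not taken from the study.

```python
from statistics import mean
from typing import Dict, Sequence


def call_tubes(
    intensities: Dict[str, float],
    negative_controls: Sequence[float],
    min_ratio: float = 1.5,  # assumed cutoff, would need empirical validation
) -> Dict[str, str]:
    """Call each tube positive if its signal exceeds the negative-control
    background by at least `min_ratio`-fold, otherwise negative."""
    background = mean(negative_controls)
    if background <= 0:
        raise ValueError("negative-control background must be positive")
    return {
        name: "positive" if value / background >= min_ratio else "negative"
        for name, value in intensities.items()
    }


if __name__ == "__main__":
    # Illustrative numbers only.
    tubes = {"sample_a": 200.0, "sample_b": 105.0}
    print(call_tubes(tubes, negative_controls=[100.0, 100.0]))
```

a fold-change over background is only one possible rule; a weak positive close to the cutoff would still need visual review or a repeat run.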
to reduce inhibition and improve sensitivity, a -minute, °c heat inactivation step was added to treat the samples. three microliters of these samples were then added to the rt-pcr master mix to perform the rapid test (n ). the photo taken after cycles shows a significant increase in the fluorescence signal in out of samples (figure ) . the signal intensity for sample , the sample with the highest threshold cycle value (ct = . ) among the samples, was not different from the vtm-only negative control. the improved sensitivity from using heat-inactivated samples for testing covid- is similar to our results with flu sample testing (data not shown) and to results recently reported by others. seven out of ten samples can be identified as positive covid- patient samples by comparing results with negative controls. the rt-pcr run was completed in minutes and the rt-pcr protocol was ( s/ s/ x( s/ s)). we next tested whether we can use a rapid rna extraction step to produce rna templates for rapid rt-pcr from raw samples (no heat inactivation). we developed our own rapid magnetic particle-based extraction protocol using lysis buffer and two wash buffers before elution took place at to °c. three µl of the extracted templates were added to the rt-pcr master mix and tested in the water bath method. the photo taken after cycles shows that we can call out of samples as n positive (figure ) . the results suggest that a quick extraction step can improve sensitivity.
figure . pcr tubes after an rt-pcr run using µl of extracted templates. ten out of ten samples can be identified as positive based on the fluorescence signal difference when compared to the negative control. the rt-pcr run was completed in minutes ( s/ s/ x( s/ s)).
our post-rt-pcr photos of the pcr tubes show that water bath-based rt-pcr allows for cycle rt-pcr reactions to finish in as little as minutes.
with a -minute rna extraction protocol, we were able to get positive rt-pcr results with all ten covid- positive clinical samples. using raw samples after minutes of heating, we were able to detect out of samples using our approach. using untreated samples with a -second reverse transcription step, we were only able to identify out of samples as positive. as expected, samples with a high viral load can easily be detected whether the rt-pcr input uses raw or extracted templates. samples with a low viral load or untreated covid- positive samples are more likely to be missed if an extraction step is not used, or when the pcr protocol is shortened and inhibitors affect the amplification efficiency. for covid- testing, using raw samples or minimal sample preparation steps may not significantly reduce the test sensitivity as most patients tend to have a higher viral load. therefore, we report that the use of untreated samples can be a viable option during the covid- pandemic. in addition, we can use a low-cost setup to allow fluorescence-based end-point rt-pcr reactions to test for covid- by running n , n , and rnase p rt-pcr reactions. this would allow locations with no access to real-time pcr thermal cyclers or even basic thermal cyclers to perform highly sensitive and specific gold-standard rt-pcr assays for covid- . the water baths are big enough to accept a -well pcr plate. the disadvantage of this approach is that a non-multiplexed reaction algorithm uses larger volumes of reagents. also, if regular pcr tubes are used, the rt-pcr will take up to minutes to complete. we have previously devised a low-cost alternative to a thermal cycler, the thermos thermal cycler (ttc), a pcr method that uses thermos cans as insulated water baths to create a semiautomated, cheaper alternative to conventional thermal cyclers for use in low-resource settings and small laboratories.
, for the current work, to achieve a steady circulating water system with consistent temperatures for denaturation and annealing/extension steps, we used two sous vide immersion heaters purchased at $ each (anova culinary, watt) to heat the water in two quart clear food storage containers (rubbermaid) to °c and °c, both of which are common pcr annealing and denaturing temperatures. the temperature of the °c bath was lowered to °c during the reverse transcription step. the sous vide immersion heater is a medium-sized device that can be securely clamped onto the edge of the food storage containers. its steady heating and water circulation functionality results in a well-maintained, even temperature throughout the bath for long periods of time. we were able to measure the temperature throughout the heating duration using a temperature data logger (hh u, omega engineering, norwalk, ct). automation of the thermal cycling reactions was achieved by programming a one-servo motor and devising a holder assembly with lego pieces to shuttle pcr tubes between the water baths (figure ). an arching movement was used to lift, transport, and lower the tubes between two baths in a single semi-circle motion. the tubes were contained in a hinged holder that would allow the tubes to remain upright throughout the movement phase. the mechanical portion was driven by a micro servo (mg s micro servo motor, amazon) and was controlled by a microcontroller (raspberry pi model a+) using a -channel pwm / servo hat (adafruit). precise movement by the servo moved the arm back and forth between the two baths. the semi-circle movement of the arm, therefore, can move the pcr tubes in and out of the water. the entire arm assembly was mounted on a small lab-jack. a usb connector powered the servos and controller. the covid- positive clinical samples were obtained from the medical college of wisconsin (mcw) and consisted of de-identified nasopharyngeal specimens collected for non-research purposes. 
specimens were defined as positive by a laboratory-defined assay consisting of cdc primers extracted on an emag (biomerieux, france) and tested on the dx (applied biosystems, foster city, ca). specimens were frozen within hours of collection. primer and probe sets for the sars-cov- rt-qpcr assay (cat. no. ) were purchased from integrated dna technologies (coralville, ia). the taqpath -step multiplex master mix (cat. no. the promega maxwell extraction was performed using as cartridge (promega corporation, madison, wi). µl of the samples from mcw was added to the cartridge and eluted in µl of the elution buffer provided in the kit. we used a rapid sample preparation protocol to reduce the sample-to-test time. µl of the samples from mcw were added to µl of lysis buffer and µl of magnetic bead solution (nuclisens magnetic particle extraction kit, biomerieux, durham, nc) and lysed for minute. the beads were collected and washed in wash buffers and for seconds each and eluted in µl of elution buffer at °c for minute. taqpath -step multiplex master mix was used for the rt-qpcr reaction with specific primers and probes tagged with fam. for the detection of sars-cov- target genes, n and n , the final concentration of primers was nm and that of probes was nm as per the cdc protocol ( table ). each pcr reaction was µl with µl samples/extracted template added. the typical run time for the bio-rad cycler to complete cycles was minutes (table ).
rapid pcr
rapid pcr was conducted using two water baths maintained at °c and °c ( table ) . reverse transcription was carried out for seconds at °c and initial denaturation was carried out for seconds, followed by cycles of pcr amplification with seconds of denaturation at °c and seconds of annealing/extension at °c. the fluorescence intensity of the reaction mix was analyzed after amplification. the total time for pcr was minutes, including minutes of rt and initial denaturation, and minutes of pcr amplification for cycles. it took .
minutes for -cycle reactions ( minutes of rt and initial denaturation and . minutes of pcr amplification). protocols that deviated from these typical settings are noted in the results section. pcr tubes used in this study was cepheid smartcycler reaction tubes and thin-walled polypropylene pcr tubes (cat. no. , sorenson bioscience). after amplification by the water bath thermal cycler, pcr tubes were placed above a gel-viewing blue led powered light box (lonza flashgel dock or io rodeo midi blue led transilluminator) to confirm amplification through fluorescence intensity. fluorescence across reactions was imaged using a smartphone camera (note , samsung electronics) with an amber filter placed in front of the lens. we visually determined the rt-pcr result as positive or negative using the negative control as background fluorescence. alternatively, software such as imagej can be utilized. the intent of the work was for clinical method development as a response to the covid- pandemic. in this work we used anonymized remnant material from samples that had been collected for clinical diagnostics of sars-cov- . our clinical specimens were provided by the medical college of wisconsin. these samples included oropharyngeal swabs in utm and m vtm. these de-identified samples are not considered to be human subjects. molecular diagnosis of a novel coronavirus ( -ncov) causing an outbreak of pneumonia a novel coronavirus from patients with pneumonia in china comparison of different samples for novel coronavirus detection by nucleic acid amplification tests detection of novel coronavirus ( -ncov) by real-time rt-pcr triplex real-time rt-pcr for severe acute respiratory syndrome coronavirus . 
emerging infectious disease journal the kinetic requirements of extreme qpcr a rapid and low-cost pcr thermal cycler for low resource settings a rapid and low-cost pcr thermal cycler for infectious disease diagnostics direct rt-qpcr detection of sars-cov- rna from patient nasopharyngeal swabs without an rna extraction step. biorxiv the potential use of unprocessed sample for rt-qpcr detection of covid- without an rna extraction step. biorxiv sars-cov- detection from nasopharyngeal swab samples without rna extraction. biorxiv fast sars-cov- detection by rt-qpcr in preheated nasopharyngeal swab samples. medrxiv sars-cov- viral load in upper respiratory specimens of infected patients division of viral diseases the authors declare no conflict of interest. aa is employed at ai biosciences. sw is the cofounder of ai biosciences. key: cord- -tk eturq authors: berrio, alejandro; gartner, valerie; wray, gregory a title: positive selection within the genomes of sars-cov- and other coronaviruses independent of impact on protein function date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: tk eturq background the emergence of a novel coronavirus (sars-cov- ) associated with severe acute respiratory disease (covid- ) has prompted efforts to understand the genetic basis for its unique characteristics and its jump from non-primate hosts to humans. tests for positive selection can identify apparently nonrandom patterns of mutation accumulation within genomes, highlighting regions where molecular function may have changed during the origin of a species. several recent studies of the sars-cov- genome have identified signals of conservation and positive selection within the gene encoding spike protein based on the ratio of synonymous to nonsynonymous substitution. such tests cannot, however, detect changes in the function of rna molecules. 
methods here we apply a test for branch-specific oversubstitution of mutations within narrow windows of the genome without reference to the genetic code. results we recapitulate the finding that the gene encoding spike protein has been a target of both purifying and positive selection. in addition, we find other likely targets of positive selection within the genome of sars-cov- , specifically within the genes encoding nsp and nsp . homology-directed modeling indicates no change in either nsp or nsp protein structure relative to the most recent common ancestor. thermodynamic modeling of rna stability and structure, however, indicates that rna secondary structure within both genes in the sars-cov- genome differs from those of ratg , the reconstructed common ancestor, and pan-cov-gd (guangdong). these sars-cov- -specific mutations may affect molecular processes mediated by the positive or negative rna molecules, including transcription, translation, rna stability, and evasion of the host innate immune system. our results highlight the importance of considering mutations in viral genomes not only from the perspective of their impact on protein structure, but also how they may impact other molecular processes critical to the viral life cycle. an important challenge in understanding zoonotic events is identifying the genetic changes that to identify branch specific positive selection, it is necessary to obtain a query and a reference alignment. we downloaded six high quality reference genomes from the subgenus sarbecovirus (table ). next, we used mafft (katoh & standley, ) (kearse et al., ) with default settings to build a sequence alignment. next, we refined the alignment using a gene by gene procedure. foreground branch is evolving at faster rates than the expectation from the background species. 
we performed a selection analysis on sliding windows of bp with a step of bp along a sequence alignment of reference genome sequences of coronaviruses of the subgenus sarbecovirus and two recently published sequences of pangolin coronavirus (liu, chen & chen, ; lam et al., ) . this procedure generates partitions where a tree topology can be fitted. to investigate the extent of positive selection or branches with long substitution rates along the sars-cov- genome, we used a branch-specific method known as adaptiphy that was initially pa_cov_guangxi_p l), (bat_cov_lyra , sars_cov)), bat_cov_bm ). this method is highly sensitive and specific and can differentiate between positive selection and relaxation of constraint. adaptiphy requires a reference alignment of at least kb for each species that is used as a putatively neutral proxy for computing substitution rates. viral genomes lack non-functional regions; therefore, the most reasonable proxy for neutral evolution has to be found in the regions outside the query window. to do this, we concatenated twenty regions of bp of the viral genome alignment that were drawn randomly with replacement from the entire genome alignment. then, for each query alignment, we built a reference alignment of kb as it produces a stable evolutionary standard of recombination rates. to control for the stochasticity of the evolutionary process, we ran each query against ten bootstrapped samples of reference alignments. finally, we used a custom r script to compute the likelihood ratio, which was used as a test statistic for a chi-squared test with one degree of freedom to calculate a p-value for each query. then, we corrected the distribution of all p-values per query region using the p.adjust() r function with the fdr method. next, we classified a query window to be under positive selection if the p-adjusted value was < . .
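the statistical step described above (a likelihood ratio statistic referred to a chi-squared distribution with one degree of freedom, followed by fdr correction) can be sketched as follows. this is an illustration in python rather than the authors' custom r script; the function names and input values are hypothetical, and the adjustment shown is the benjamini-hochberg procedure, which is what the "fdr" method of r's p.adjust() computes.

```python
import math
from typing import List, Sequence


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null), floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_df1_pvalue(statistic: float) -> float:
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom.

    For df = 1 the statistic is a squared standard normal, so
    P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = n - offset  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one query window.
    stat = lrt_statistic(lnl_null=-1050.0, lnl_alt=-1045.0)
    print(stat, chi2_df1_pvalue(stat))
    print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```

windows would then be flagged when the adjusted value falls below the chosen significance threshold, which is left as a parameter here because the cutoff is not recoverable from the text above.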
we were unable to successfully run adaptiphy on two windows because the outgroup species (bat_cov_bm ) contained a deletion of bp relative to sars-cov- , which spans the entire orf . next, we calculated the distribution of substitution rates for each branch and nodes in each query and reference sequence using phylofit (hubisz, pollard & siepel, (wong & nielsen, ) . to test for conservation, we used the phastcons computational method from phast (siepel et positive and negative selection are highly localized within coronavirus genomes we tested for branch-specific selection on nucleotide sequences in coronavirus genomes, the fourth signal is located in a segment encoding the s and s ' subunits that includes the boundary region between the s and the s subunits (fig ) , a region that includes the genes encoding nsp and nsp contain branch-specific signals of positive selection we also detected two shorter signals of positive selection within the sars-cov- genome that are located outside of the s gene, in orf a and orf b (fig a) . interestingly, both encode small proteins that contribute to viral replication. the first is nsp , which encodes a similarities to, but also notable differences from, that of sars-cov- (fig ) . in both species, s cov-ratg )))). recombination from a divergent species should produce an incongruent topology in one or more adjacent windows, revealing a recombined region and its approximate breakpoints. we identified regions where the topology differed from the expected (fig ) . of (fig c and a) . in contrast, signals of positive selection in sars-cov- and bat-cov-ratg are concentrated in the domain that mediates binding to the host receptor ace (fig c and a) . these distinct distributions suggest that modifications in different aspects of spike function took place as various coronaviruses adapted to novel hosts. 
in importantly, we also detected signals of positive selection in two additional regions of the sars-cov- genome, specifically within the genes encoding nsp and nsp (fig a) . of and rna structure (fig and ) . in the case of nsp protein, two nearly adjacent nonsynonymous substitutions at residues and occurred on the branch leading to sars-cov- ( fig b) . these both involve changing side chains with similar biochemical properties, respectively valine to alanine and valine to isoleucine. homology-directed modeling of protein structure suggests that these two amino acid substitutions have very little impact on either secondary or tertiary structure when comparing the sars-cov- protein orthologue to those of the other species examined (fig a) . in the case of nsp protein, no nonsynonymous substitutions evolved on the branch leading to sars-cov- . thus, the signal of positive selection within nsp is unlikely to reflect changes in protein structure or function, while the signal within nsp cannot affect either because the encoded polypeptide is identical. with highly similar and identical protein structures predicted for nsp and nsp , respectively, in silico identification of conserved cis -acting rna elements in the sars- regulate antagonism of irf and nf-kappab signaling ubiquitination-mediated regulation of interferon responses coronavirus spike proteins in viral entry functional rna elements in the dengue virus genome recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission the vienna rna websuite membrane rearrangements mediated by coronavirus nonstructural proteins and . 
virology mobility and interactions of coronavirus nonstructural protein visualizing genomic data using gviz and bioconductor molecular biology promoter regions of many neural-and nutrition-related genes have experienced positive selection during human evolution temporal dynamics in viral shedding and transmissibility of covid- structure of replicating sars-cov- polymerase a multibasic cleavage site in the spike protein of sars-cov- is essential for infection of human lung cells evidence of the recombinant origin of a bat severe acute respiratory syndrome (sars)-like coronavirus and its implications on the direct ancestor of sars discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus phast and rphast: phylogenetic analysis with space/time models coronavirus spike protein and tropism changes advances in virus research comprehensive in-vivo secondary structure of the sars-cov- genome reveals novel regulatory motifs and mechanisms. biorxiv : the preprint server for biology comprehensive in-vivo secondary structure of the sars-cov- genome reveals novel regulatory motifs and mechanisms. 
biorxiv : the preprint server for biology mafft multiple sequence alignment software version : improvements in performance and usability geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data the phyre web portal for protein modeling, prediction and analysis the architecture of sars-cov- tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- raxml-ng: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference identifying sars-cov- -related coronaviruses in malayan pangolins severe acute respiratory syndrome (sars) horseshoe bats through recombination emergence of sars-cov- through recombination and strong purifying selection the divergence between sars-cov- and ratg might be overestimated due to the extensive rna modification viral metagenomics revealed sendai virus and coronavirus infection of malayan pangolins (manis javanica) viennarna package . severe acute respiratory syndrome-associated coronavirus a protein forms an ion channel and modulates virus release cleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell-cell but not virus-cell fusion downloaded from mechanism and structural diversity of exoribonuclease- resistant rna structures in flaviviral rnas coronavirus cis-acting rna elements the sars coronavirus papain like protease can inhibit irf at a post activation step that requires deubiquitination activity coronavirus non-structural protein : evasion, attenuation, and possible treatments the sars coronavirus a protein causes endoplasmic reticulum stress and induces ligand-independent downregulation of the type interferon receptor viral innate immune evasion and the pathogenesis of emerging rna virus infections the ratio of replacement to silent divergence and tests of neutrality topology and membrane anchoring of the coronavirus replication complex: not all hydrophobic domains of nsp and 
nsp are membrane spanning emerging sars-cov- mutation hot spots include a novel rna-dependent-rna polymerase variant structure of the human metapneumovirus polymerase phosphoprotein complex clinical progression and viral load in a community outbreak of coronavirus- associated sars pneumonia: a prospective study the coding region of the hcv genome contains a network of regulatory rna structures estimating variability in the transmission of severe acute respiratory syndrome to household contacts in hong kong hyphy: hypothesis testing using phylogenies hyphy . -a customizable platform for evolutionary hypothesis testing using phylogenies rna genome conservation and secondary structure in sars-cov- and sars-related viruses: a first look comparative analysis of coronavirus genomic rna structure reveals conservation in sars-like coronaviruses. biorxiv : the preprint server for biology in search of molecular darwinism evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes the nonstructural proteins directing coronavirus rna synthesis and processing the severe acute respiratory syndrome coronavirus a protein up-regulates expression of fibrinogen in lung epithelial cells on the origin and continuing evolution of sars-cov- structural insights into coronavirus entry sars-cov- genomic variations associated with mortality rate of covid- inhibition of irf -dependent antiviral responses by cellular and viral proteins positive selection of orf a and orf genes drives the evolution of sars-cov- during the enhanced receptor binding of sars-cov- through networks of hydrogen-bonding and hydrophobic interactions structural and functional basis of sars-cov- entry by using exploitation of glycosylation in enveloped virus pathobiology detecting selection in noncoding regions of nucleotide sequences cryo-em structure of the -ncov spike in the prefusion conformation cryo-em analysis of a feline coronavirus spike protein reveals a unique structure and camouflaging 
glycans mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage fatcat: a web server for flexible structure comparison and structure similarity searching coronavirus open reading frame- a drives multimodal necrotic cell death the short- and long-range rna-rna interactome of sars-cov- co-first authors ribose '-o- methylation provides a molecular signature for the distinction of self and non-self mrna dependent on the rna sensor mda key: cord- -oj d ail authors: gorgun, d.; lihan, m.; kapoor, k.; tajkhorshid, e. title: binding mode of sars-cov fusion peptide to human cellular membrane date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: oj d ail infection of human cells by the sars-cov relies on its binding to a specific receptor and subsequent fusion of the viral and host cell membranes. the fusion peptide (fp), a short peptide segment in the spike protein, plays a central role in the initial penetration of the virus into the host cell membrane, followed by the fusion of the two membranes. here, we use an array of molecular dynamics (md) simulations taking advantage of the highly mobile membrane mimetic (hmmm) model, to investigate the interaction of the sars-cov fp with a lipid bilayer representing mammalian cellular membranes at an atomic level, and to characterize the membrane-bound form of the peptide. six independent systems were generated by changing the initial positioning and orientation of the fp with respect to the membrane, and each system was simulated in five independent replicas. in % of the simulations, the fp reaches a stable, membrane-bound configuration where the peptide deeply penetrated into the membrane. clustering of the results reveals two major membrane binding modes, the helix-binding mode and the loop-binding mode. 
taking into account the sequence conservation among the viral fps and the results of mutagenesis studies establishing the role of specific residues in the helical portion of the fp in membrane association, we propose that the helix-binding mode represents more closely the biologically relevant form. in the helix-binding mode, the helix is stabilized at an oblique angle with respect to the membrane with its n-terminus tilted towards the membrane core. analysis of the fp-lipid interactions shows the involvement of specific residues of the helix in membrane binding previously described as the fusion active core residues. taken together, the results shed light on a key step involved in sars-cov infection with potential implications in designing novel inhibitors. significance a key step in cellular infection by the sars-cov virus is its attachment to and penetration into the plasma membrane of human cells. these processes hinge upon the membrane interaction of the viral fusion peptide, a segment exposed by the spike protein upon its conformational changes after encountering the host cell. in this study, using molecular dynamics simulations, we describe how the fusion peptide from the sars-cov virus binds human cellular membranes and characterize, at an atomic level, lipid-protein interactions important for the stability of the bound state. coronavirus disease (covid- ) emerged in late as a significant threat to human health. it became a global pandemic by march ( , ) , and it continues to claim lives and to significantly impact all aspects of people's lives across the globe. covid- is caused by severe acute respiratory syndrome coronavirus (sars-cov ), a positive-strand rna virus that causes severe respiratory complications, among other symptoms, in humans ( ) . sars-cov recognizes and infects human cells that express a cell surface receptor termed angiotensin-converting enzyme (ace ) ( ) , which is specifically recognized by the viral spike glycoprotein (s-protein).
binding of the two proteins is a prerequisite for the fusion of the viral and cellular membranes ( ) , one of the first and required steps in viral infection facilitating the release of the viral genome into the infected cell ( ) ( ) ( ) ( ) . the binding of the virus to the surface cae receptor on the host cell is mediated by the s domain in the s-protein on the viral surface. the next key step, namly, virus-host membrane fusion is mediated by the s domain of the s-protein ( ) , the domain, which consists of multiple proteolytic cleavage sites, namely, one at the boundary of s /s and one at the s ' site, which are cleaved as part of the fusion process. ( ) ( ) ( ) cleavages at the s /s boundary and the s ' sites, downstream to the two heptad repeat regions (hr and hr ), induce the dissociation of the s subunits from the s-protein, followed by a series of conformational changes that trigger membrane fusion between the host cell membrane and the viral envelope ( , ) . the remaining s trimer, a post-fusion structural motif, is shared among all the class i viral fusion proteins ( , ) . a critical part of any viral fusion protein in the coronavirus family is the relatively apolar fusion peptide (fp), present in the s domain, which is responsible for directly inserting into and interacting with the host cell membrane, thereby initiating the fusion process ( , , ) . there are several characteristics that viral fps have in common and help locating the fp sequence. the sequences of the fps are highly conserved within each family of viruses (but not between families), the frequency of glycines and alanines in the sequence is high, and bulky hydrophobic residues as well as hydrophilic residues usually flank the cleavage sites ( , ) . some fps have a central kink via a proline and a helix-turn-helix structure, as observed in influenza virus hemagglutinin (ha). 
in such cases, proteolytic cleavage occurs directly at the n-terminus to the fp, and the peptides are thus called external or n-terminal fps. in other cases, the proteolytic cleavage site resides upstream from the fp which is relatively longer ( - amino acids) and contains a prolonged -helix in the fusion-active state such as in the cases of ebola virus or avian leukosis sarcoma virus ( , , ) . such fps are referred to as internal fps as in the case of sars-cov . to this date, three main fp regions have been suggested in sars-cov s-protein ( , ) , which are located in between hr and n-terminus of the s domain: ( ) at the n-terminus of hr , ( ) near the s /s cleavage site, and, ( ) at the c-terminus of the cleavage site s ' ( ) . based on the criteria stated above as well as experimental studies, most recent data suggest that immediately downstream of the s ' cleavage site of the sars-cov is the leading segment involved in the fusion process ( , ( ) ( ) ( ) ( ) . mutagenesis experiments showed the significance of the fp in this region, specifically carrying a region termed the fusion active core ( ) . the outer leaflet of human cell membranes is composed of a mixture of lipids including phosphatidylcholine (pc), phosphatidylethanolamine (pe), and cholesterol (chl) ( ) . a fluorescence spectroscopy study has shown that chl plays an important role in modulating the binding affinity and organization of the sars-cov fp in the membranes ( ) . therefore, taking into account the natural lipid composition of a mammalian cell in simulation studies such as the present one is important. characterizing how the fp binds to the membrane and how it interacts with specific lipids has been challenging experimentally. computational methods, particularly atomistic molecular dynamics (md) simulation, offer an alternative strategy to capture the membrane binding process of the fp and to probe the interface between the fp and the lipid membrane. 
one of the major challenges in simulating such processes lies in sufficient sampling of all possible fp membrane-binding poses. due to the slow dynamics of membrane lipids, they are often insufficiently sampled on the timescales which atomistic md simulations currently can access, causing the membrane binding and insertion of proteins to be heavily biased by the initial lipid distribution and protein placement. in this context, an alternative membrane model, termed the highly mobile membrane-mimetic (hmmm) model, has been developed to enhance lipid diffusion without compromising the atomistic description of lipid head groups ( ) ( ) ( ) ( ) . the hmmm model is based on the combination of a biphasic solvent system ( ) with short-tailed lipids at the interface ( ) . owing to its significantly enhanced lipid mobility the model has proven extremely efficient in describing mixed lipid bilayers, reproducibly capturing spontaneous (unbiased) membrane binding and insertion of a wide spectrum of peripheral proteins ( , ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) , and collecting significantly improved sampling of lipid-protein interactions. of particular interest to the present study, this method has also been successfully used to capture spontaneous membrane association of the influenza virus hemagglutinin fp ( ) . in this study, we perform an extensive set of hmmm simulations, to investigate membrane binding of the sars-cov fp, also taking into account several initial, different positioning of the peptide with respect to the membrane to further improve sampling. the results provide a detailed mechanistic picture of the initial step in the fusion process, focusing on the biophysical aspects of the virus-lipid bilayer interactions taking place during this process. 
characterizing the mechanism of the fusion-driving, fp-host membrane interactions is key to our understanding critical steps involved in viral infection, and might pave the way for development of novel therapeutic intervention strategies against the virus. as the first step for modeling the sars-cov fp, multiple sequence alignment was carried out for the s-proteins from different human coronaviruses (hcovs). there are a total of seven known hcovs, namely, e and nl belonging to the alpha subfamily of coronaviruses, and oc , hku , middle east respiratory syndrome (mers), severe acute respiratory syndrome (sars) and sars-cov , belonging to the beta subfamily of coronaviruses. out of these, s-protein structures are available for hku , mers, sars, as well as sars-cov . additionally, bat coronavirus ratg , a closely related homolog of sars-cov , was also included in the sequence alignment. multiple sequence alignment for the above eight sequences was carried out using the mafft program with the l-ins-i method ( ) and visualized using jalview ( ) (fig. s ). the available cryoem structure of the sars-cov s-protein at the time (pdb: vsb) ( ) contained only a partial structure ( residues) of the fp, making it unsuitable for developing the initial sars-cov fp model (fig. a) . since the s domain of the s-protein containing the fp is well-conserved among the sars coronaviruses, we used the s-protein from sars-cov as a guide for modeling the fp of the sars-cov . accordingly, the cryoem structure of the s-protein from sars-cov (pdb: xlr) ( ) containing the fp structure was used as a template for constructing the initial sars-cov fp model. our sars-cov fp model also contained the loop connecting the fp to the neighboring proximal region, suggested to be important in the fusion process ( - ), but not modeled fully in the current study due to lack of a suitable template structure at the time. 
the only two different residues between the sequences of sars-cov and sars-cov fps (i /m and d /e ) were mutated to the sars-cov residues (fig. b) . the initial fp model was then solvated and ionized with . m nacl in vmd ( ) . energy minimization was carried out using the steepest descent method for timesteps followed by solution equilibrium md for ns, in order to obtain a fully equilibrated sars-cov fp in an aqueous environment. the resulting equilibrated fp was used in the subsequent simulations investigating the membrane binding characteristics of the peptide.
[figure caption: s-protein with the highlighted fp (blue). each monomer in the trimer is drawn in a different color (grey, purple, and orange). b) the sars-cov fp. the missing residues of the sars-cov fp structure were modeled using sars-cov fp (pdb xlr), with the mutations of i /m and d /e . residues - are in an alpha helical conformation and residues - form a loop structure. c) hmmm membrane binding simulation setup. after simulating the modeled fp in solution for ns, we placed the equilibrated peptide above the hmmm lipid bilayers in several different orientations. the membrane lipid composition is pc/pe/chl ( / / mol%), which is representative of the outer leaflet of the human plasma membrane (carbon as tan, oxygen as red, phosphorus as blue, and nitrogen as orange). we use six different initial orientations for the fp placement above the membrane, rotating the peptide around its x axis with respect to the parallel orientation, to randomize the initial interactions: parallel (p, x= • ), antiparallel (a, x= • ), nosedive (n, x= • ), standing (s, x=− • ), inclined (i, x=− • ), and reclined (r, x= • ). lipid bilayers were randomly and individually built. each orientation was simulated in five independent replicas, with the total replica number of for membrane-binding simulations.]
we developed membrane-bound models of the sars-cov fp, utilizing multiple simulation replicas using the hmmm model ( fig. c) ( , , ) . 
symmetric conventional (full) membranes were first constructed using charmm-gui ( ) with a lipid composition of pc/pe/chl ( / / mol%), resembling the outer leaflet of human plasma membrane ( ) . the membranes were then converted to hmmm membranes by removing the atoms after the fifth carbon in the phospholipid acyl chains while keeping the cholesterol molecules intact. to mimic the membrane core, a previously developed in silico solvent, termed scse (including two carbon-like interaction centers) ( ) , was used to match the number of heavy atoms removed from the lipid tails in the previous step. the resulting hmmm membranes contained , scse molecules ( ) and lipids in each leaflet. in order to further expand the sampling of the phase space, we varied the initial placement and orientation of the fp, i.e., six rotated orientations (p, a, n, s, i, r) (fig. c) , to further reduce the initial bias. we solvated the systems using the solvate plugin in vmd ( ), with . m nacl resulting in total system sizes of , to , atoms and box sizes of × × to × × Å , respectively. multiple replicas were simulated for each fp orientation, as the diffusion and mixing of lipids and the process of membrane binding and insertion of the fp can be slow even when using hmmm membranes, and more sampling would ensure the meaningfulness of the obtained membrane-bound configurations. five independent hmmm membranes with each specified fp orientation were generated using a monte carlo based lipid mixing protocol developed in our group to further enhance variation of initial membrane configurations for each fp orientation. the systems were energy minimized for , steps and simulated for ns with the c atoms of the peptide harmonically restrained (k = . kcal mol − Å − ), followed by a production run of ns where c restraints were removed. a harmonic restraint along the z-axis, with a force constant k = . 
kcal mol − Å − , was applied to the c , c , and c atoms of the phospholipids and to the o atoms of cholesterol to mimic the atomic distributions of a full lipid bilayer more closely, and to prevent the occasional diffusion of short-tailed lipids into the aqueous phase, which is expected for these surfactant-like molecules. to prevent scse molecules from diffusing out of the core of the membrane, we subjected them to a grid-based restraining potential, applied using the gridforce ( ) feature of namd ( , ) . five replicas, each with an independently generated hmmm membrane and a starting orientation of the fp, were simulated, resulting in a total of independent membrane binding simulations. we define stable membrane binding with the criteria described below (see fig. ). first, a contact between the fp and the membrane is defined for any heavy atom of an fp residue that is within . Å of any lipid heavy atoms. any contiguous segment of the simulation trajectory with a length of at least ns during which at least one contact between the lipids and the fp existed was considered stable binding. the mid-points of these -ns segments are marked as a red point in fig. . to characterize the binding orientation of the fp with respect to the membrane, the first of the three principal axes (pa) of the fp helical segment (residues - ) was computed (fig. s ) . the angle between the first helical pa and the membrane normal, pa , was used to describe the tilting of the helical segment. two auxiliary angles, f and f , were further defined by the angle between the membrane normal and the vector component of the respective selected phenylalanine c -c vector perpendicular to the first pa, which together describe the rotational degrees of freedom of the fp about the helical first pa. k-medoids clustering algorithm ( ) was performed to categorize the membrane-bound poses of the fp, resulting in two major membrane binding modes. 
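the stable-binding criterion and the helix tilt angle defined above can be sketched in a few lines of numpy. this is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names `stable_binding_mask` and `helix_tilt_deg` are hypothetical, the contact cutoff and minimum segment length are left as user-supplied parameters because their values are not given here, and the first principal axis is approximated by the direction of largest coordinate spread (unweighted by mass) and oriented from the first to the last atom supplied.

```python
import numpy as np


def stable_binding_mask(min_dist, cutoff, min_frames):
    """Return a boolean mask of frames that lie inside a stably bound segment.

    min_dist   : per-frame minimum heavy-atom distance between peptide and lipids
    cutoff     : contact distance threshold (user-supplied, same units as min_dist)
    min_frames : minimum length, in frames, of an uninterrupted run of contact
    """
    in_contact = np.asarray(min_dist, dtype=float) < cutoff
    stable = np.zeros(in_contact.shape, dtype=bool)
    run_start = None
    for i, contact in enumerate(in_contact):
        if contact and run_start is None:
            run_start = i                      # a contact run begins here
        elif not contact and run_start is not None:
            if i - run_start >= min_frames:    # the run was long enough
                stable[run_start:i] = True
            run_start = None
    # close a run that extends to the final frame
    if run_start is not None and len(in_contact) - run_start >= min_frames:
        stable[run_start:] = True
    return stable


def helix_tilt_deg(coords, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between the long axis of `coords` and the membrane normal.

    coords : (n_atoms, 3) positions of the helical segment, ordered from the
             n-terminus to the c-terminus (assumption of this sketch)
    normal : membrane normal, taken here as the z axis
    """
    xyz = np.asarray(coords, dtype=float)
    centered = xyz - xyz.mean(axis=0)
    # the first right-singular vector is the direction of largest spread
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    # resolve the sign ambiguity: point the axis from first to last atom
    if np.dot(axis, xyz[-1] - xyz[0]) < 0:
        axis = -axis
    n = np.asarray(normal, dtype=float)
    cos_angle = np.dot(axis, n) / np.linalg.norm(n)
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
```

in use, `min_dist` would be computed per frame from the trajectory, and the mask would select the frames passed on to clustering. a mass-weighted inertia tensor could replace the svd step if exact principal axes of inertia are needed.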
for all the frames identified as membrane-bound (see above for definition), a vector composed of the z-distances between the com of individual fp residues and the lipid phosphate plane was used as the dissimilarity metric for clustering. the approximate cluster centers, i.e., medoids, were viewed as the representative structures for each fp membrane binding mode identified. to better visualize the distribution of membrane-bound poses as well as the clustering results, principal component analysis (pca) was performed to reduce the dimensionality of the original data set. in pca, the covariance matrix of the z-distances between the com of individual residues and the phosphate plane was computed and diagonalized. the resulting eigenvectors, i.e., the principal components (pcs), represent the coordinates that maximize the variance of projected data. the first two pcs were selected to project the original z-distance data of membrane-bound frames along with the clustering result onto the reduced dimension. the pca was performed using the scikit-learn package ( ) . the sars-cov attachment to and its penetration into the human plasma membrane are key steps in viral infection that rely on the interaction of the viral fp with the host cellular membrane. this aspect is the central phenomenon that we study here using atomistic simulations, in order to characterize the binding pose and conformation of the sars-cov fp when bound to the membrane. in the following sections we first describe the results of our spontaneous membrane binding simulations in terms of the overall binding of the fp using some coarse parameters. then, we use the interaction of individual residues from the fp in clustering and examine them in a reduced dimension to better classify the different microstates that arise during the binding process and discuss their biological significance. 
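the pca step described above (covariance matrix of per-residue z-distances, diagonalization, projection onto the first two pcs) can be written out explicitly. the text states that scikit-learn was used; the sketch below is a plain numpy equivalent meant only to make the steps visible, and the function name `pca_project` and the layout of the input array (frames in rows, residues in columns) are assumptions of this example.

```python
import numpy as np


def pca_project(z, n_components=2):
    """Project per-frame z-distance vectors onto their leading principal components.

    z : (n_frames, n_residues) array; entry [t, r] is the z-distance between the
        com of residue r and the phosphate plane in frame t (assumed layout)
    Returns (projected, components, explained_ratio).
    """
    z = np.asarray(z, dtype=float)
    centered = z - z.mean(axis=0)              # remove the per-residue mean
    cov = np.cov(centered, rowvar=False)       # (n_residues, n_residues)
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # columns are the pcs
    projected = centered @ components          # coordinates in the reduced space
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return projected, components, explained_ratio
```

the columns of `components` correspond to the eigenvectors whose per-residue weights indicate how much each residue's insertion depth contributes to a pc, and `explained_ratio` gives the fraction of variance captured by each retained pc. the sign of each eigenvector is arbitrary, so projections may be mirrored relative to another implementation.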
in order to characterize the membrane binding mode of the sars-cov fp to a lipid bilayer representing human cellular membranes, we performed independent copies of hmmm, membrane-binding simulations. spontaneous diffusion and membrane binding of the fp in each simulation replica can be monitored by tracking the center of mass (com) of the peptide. the component of the com was tracked with respect to the phosphate layer (the average position of all the phosphate groups) of each leaflet (blue and red lines, respectively, in fig. ). due to the applied periodic boundary conditions, and the free diffusion of the fp in the solution, the fp was able to diffuse towards either the upper or the lower leaflet of the membrane (fig. ) , which both include a lipid composition resembling the outer leaflet of human plasma membrane ( ) . among the performed membrane-binding simulations, instances of both stable (majority) and transient binding events, as well as cases with no membrane binding, were observed. because the lipid molecules in the membrane are neutral, there are no dominating electrostatic driving forces between the fp and the lipid bilayer. therefore, the major driving forces between the two are hydrophobic effects, and the fp diffusion in the solution can make the peptide take a longer time to make the initial encounter with the membrane. for further analysis of the membrane interactions, we selected only the portions of the trajectories where "stable membrane binding" or "membrane-bound state" was defined (see methods for details). residue contacts between any heavy atom of the fp and any heavy atom of the lipid bilayer are shown in fig. where the red segments of the graphs represent stable binding. consisting of six independent systems and five replicas of each, % of the replicas contain stably bound configurations (fig. ) . we observe stable binding in, e.g., parallel replicas p , p , p , and in antiparallel replicas a , a , a (fig. ) . 
in some simulations, nearly the entire length of the fp was observed to be engaged with lipids (e.g., replicas p , p , n , s , and r ), while in other cases, only a specific part of the peptide makes contact with the membrane, specifically via either the loop or the helical part. biasing the initial placement of the peptides allowed the first interaction of the fp to diversify. some peptides bound to the lipid bilayer within as little as ns, while others diffused and tumbled longer prior to interacting with the membrane, naturally allowing the peptide to unbias itself more from the effect of initial placement. examples include nosedive-placed peptides resulting in membrane-interacting fp via the loop part, or standing-placed peptides buried in the bilayer via their helical segment (fig. ) .
[figure caption: contacts of fp residues (y-axis) with the lipid bilayer over time (x-axis) in sampling step size of ns. if the fp is in contact with the membrane (heavy atom distance of less than . Å) for at least ns, the midpoint of that segment of the trajectory is marked with a black dot. short-term, transient contacts are labeled with black dots. we observe stable binding in % of the replicas.]
using the stable membrane-bound states defined above, we further examined the average position and lipid interaction of individual fp residues. we calculated the z-distance of the side chain com to the phosphate layer of the membrane (fig. ) . each ensemble average is calculated from the membrane-bound portions of the individual trajectory. the average depth of insertion of individual residues reaches as deep as Å below the phosphate layer, indicating interaction with the hydrophobic core of the membrane. we observe two major classes of membrane-bound states where either the alpha helical part (approximately residues - ) or the loop part (the rest of the residues at c-terminus) of the fp interacts with the lipid bilayer. 
for example, in replicas p , p , and n , the fp is bound to the membrane primarily through its alpha helical segment whereas in systems p , p , a , a , and a , the c-terminal end (the loop) is mostly engaged with the lipids. in order to better classify these binding modes in a reduced space, we next cluster the obtained membrane-bound configurations. to analyze the ensemble of fp membrane binding configurations captured in the hmmm simulations where stable membrane binding was observed, clustering was performed using the z-distances between the side chain com and the membrane lipid phosphate plane. despite the large variance in membrane-bound poses among md snapshots from different simulation replicas, two major clusters can be identified, which represent the two distinct binding patterns mentioned briefly above. we call the first cluster, where the n-terminal, helical part of the fp interacts with the membrane, the "helix-binding" mode. in the other major cluster, the fp interacts with the bilayer primarily from the c-terminal loop, hereby referred to as the "loop-binding" mode. for better visualization, we performed principal component analysis (pca) and reduced the dimension to the first two principal components (pcs), which together explain % of the variance. the nature of these two pcs, pc and pc , was examined by the eigenvectors, which quantitatively evaluate the contribution of membrane insertion by each residue (fig. s ) . from the eigenvectors, pc measures the contrast of membrane insertion by the helix and the c-terminal loop, whereas pc measures mostly how much the c-terminal loop is inserted. all membrane binding configurations were then projected and visualized in the reduced dimension defined by pc and pc (fig. ) .
[figure caption: transparent green areas represent the standard deviation of the distance.]
examination of the projections and selected md snapshots are consistent with the presence of two major membrane-binding modes as introduced earlier, i.e., either the helix (fig. a,d) or the loop (fig. c,f) , with a few involving both segments only at the membrane surface (fig. b,e) . the flexibility of the c-terminal loop resulted in a variety of scenarios in terms of its membrane interaction, including its deep membrane insertion (e.g., d and f in fig. ), tangential interaction (e.g., c and e in fig. ), or no interaction (e.g., a and b in fig. ). the two major clusters of membrane binding poses are also visualized in the reduced dimension (fig. a) , along with the two cluster centers showing the representative membrane binding modes (fig. b,c) . as expected, the cluster corresponding to the helix-binding mode demonstrated a larger variance of the loop involvement in the membrane binding compared to the cluster representing the loop-binding mode. to further analyze the membrane binding configurations of the fp in our simulations, we analyzed the orientation of the helical segment of the fp (residues - ), which remains alpha helical throughout all the simulation replicas (fig. s ) . we traced the internal angles pa , f , and f defined to describe the fp helix orientation (see methods for the definition of the angles) over the simulation trajectories in fig. a , for replicas with stably bound configurations. in the simulations where the fp binds the membrane in a helix-binding mode (p , p , n , n , s , s , r , r ), pa stabilizes at ∼ • which corresponds to an oblique position relative to the membrane where the n-terminus is tilted towards the membrane core. since the loop is more flexible and the reference internal angle is defined at the -helix moving in the solution, it is more difficult to provide an equally accurate description for the loop-binding modes (p , a , a , a , n , s , i , i , i , i ). 
in most of the replicas towards the end of the trajectory, pa is less than • . this means that the n-terminus of the helix is facing up, and as the angle decreases the helix becomes more orthogonal to the membrane. in the rest of the loop-binding modes, pa is larger than • , meaning that while the loop is inserted into the membrane, the helix is also interacting with the membrane but not fully inserted. the f angles in the helix-binding mode averaged around ∼ • where f is facing towards the lipid bilayer. this preference can be clearly seen in fig. b . since f is nearly located on the opposite side of the helix, this residue is mostly facing up in the membrane with f averaging around • . however, there is no specific preference for f and f angles in the loop-binding mode (fig. b ). within each family of viruses, the sequence of the fps is highly conserved. in addition, the alpha helical part of the sars-cov fp (sfiedllfnkv) is highly conserved among the coronaviridae family ( , ) , as indicated by the multiple sequence alignment (fig. s ), highlighting its importance in the viral life cycle. in our simulations we observed two membrane binding modes for the sars-cov fp, the helix-binding mode in which the helical segment of the fp engages with the membrane, and the loop-binding mode where the c-terminal loop of the peptide is the part primarily interacting with the lipids (fig. ). mutagenesis experiments have clearly shown that residues l , l , and f play a major role in viral fusion. given their importance, these residues are termed the "fusion active core" of the fp ( , ) . consistent with these results, in the helix-binding mode found in our simulations, residues l , l , and f are closely interacting with the membrane lipids with the two leucine side chains deeply inserted into the hydrophobic core of the membrane (fig. ) . notably, the same residues are found to be highly conserved among all human coronaviruses (fig. s ). 
given the high sequence conservation and established role of several residues in the helical segment of the fp that we simulated, we strongly believe and propose that the helix-binding mode represents the physiologically relevant membrane-bound form of the sars-cov fp. notably, from an energetic perspective, insertion of helical segments into the membrane might be less costly, since backbone amide groups in a helix satisfy their hydrogen bonds internally ( ) . based on structural data currently available, the loop-binding mode might become relevant to potential interaction of other segments of the protein with the membrane upon structural transition and cleavage of the s protein. in addition to the simulated segment in this study (which we refer to as the fp), there are other pieces proximal to the fp (named fp in sars-cov and fppr in sars-cov ) ( , ( ) ( ) ( ) ( ) , which are implicated in membrane fusion. the fusion peptide proximal region (fppr) of sars-cov downstream of the fp was later resolved and claimed to be involved in the structural rearrangement of the s protein prior to membrane fusion ( ) . an internal disulfide bond within the fppr, between c and c , was observed and is suggested to increase membrane-ordering activity ( , ) . the membrane-ordering activity of the fp, due to the fusion active core, is significantly higher than that of the fppr, and the activity of fp/fppr together is only slightly increased compared to the activity of fp and fppr separately ( ) . the loop-binding mode might support the formation of a "fusion platform" where both fp and fppr interact with the membrane simultaneously as two subdomains ( ) . to characterize such a platform interacting with the membrane, a longer peptide including the fppr should be studied in the future.
[figure caption fragment: axis changing with color in time. replica names are colored according to whether the fp is in a helix- (purple) or in a loop-binding mode (yellow). b) different helix orientations displayed by the two fp membrane-binding modes. the helix-binding mode prefers a pa angle between • and • , accumulated more around • , whereas the loop-binding mode prefers a wider range of pa angles, spanning from • to • . unlike the loop-binding mode, the helix-binding mode demonstrates a strong preference for f rather than f to face towards the membrane.]
covid- , which emerged as a severe pandemic worldwide, calls for accelerated development of novel therapeutic intervention strategies. the s-protein of the sars-cov contains the key machinery necessary for the infection of human cells, including the fp, a highly conserved segment that inserts into the human cellular membrane initiating the fusion of the virus. in this study, using a large set of simulations, we describe how the sars-cov fp binds mammalian cellular membranes and characterize, in atomic detail, lipid-protein interactions important for the stability of the bound state. characterizing the mechanism of the fusion-driving fp-host membrane interactions is key to our understanding of critical steps involved in the process of viral infection, paving the way for potential development of novel therapeutics against sars-cov . these include modulation of the fp-membrane binding interface through small molecules showing high specificity for this region of the s-protein, or inhibiting the key lipid-protein interactions observed. based on the suggested binding mode elucidated in our study, mutagenesis experiments can be designed to further confirm the role of the important residues implicated in membrane binding. given the close similarity of the fusion peptides in coronaviruses in general, these results can also be applicable to infections caused by other members of this life-threatening family of pathogens. dg, kk and et designed the research. dg carried out all simulations, dg and ml analyzed the data. all authors wrote the article. 
this work was supported by the national institutes of health (grants p -gm and r -gm to et). md simulations were performed using blue waters and computational resources provided by microsoft azure.
key: cord- -v zmksio authors: golden, joseph w.; cline, curtis r.; zeng, xiankun; garrison, aura r.; carey, brian d.; mucker, eric m.; white, lauren e.; shamblin, joshua d.; brocato, rebecca l.; liu, jun; babka, april m.; rauch, hypaitia b.; smith, jeffrey m.; hollidge, bradley s.; fitzpatrick, collin; badger, catherine v.; hooper, jay w. title: human angiotensin-converting enzyme transgenic mice infected with sars-cov- develop severe and fatal respiratory disease date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: v zmksio the emergence of sars-cov- has created an international health crisis. small animal models mirroring sars-cov- human disease are essential for medical countermeasure (mcm) development. mice are refractory to sars-cov- infection due to low-affinity binding to the murine angiotensin-converting enzyme (ace ) protein. here we evaluated the pathogenesis of sars-cov- in male and female mice expressing the human ace gene under the control of the keratin promoter. in contrast to non-transgenic mice, intranasal exposure of k -hace animals to two different doses of sars-cov- resulted in acute disease including weight loss, lung injury, brain infection and lethality. vasculitis was the most prominent finding in the lungs of infected mice. transcriptomic analysis from lungs of infected animals revealed increases in transcripts involved in lung injury and inflammatory cytokines. in the lower dose challenge groups, there was a survival advantage in the female mice with % surviving infection whereas all male mice succumbed to disease. male mice that succumbed to disease had higher levels of inflammatory transcripts compared to female mice. this is the first highly lethal murine infection model for sars-cov- . the k -hace murine model will be valuable for the study of sars-cov- pathogenesis and the assessment of mcms.
sars-cov- is a betacoronavirus and the causative agent of covid- , a febrile respiratory human disease that emerged in late in china and subsequently spread throughout the world ( , ). covid- is primarily a respiratory disease with a wide spectrum of severity ranging from a mild cough, to the development of hypoxia, and in some cases resulting in a life-threatening acute respiratory distress syndrome (ards) requiring mechanical ventilation ( , ). the most severe cases are generally skewed towards the aged population (> ) and those with underlying health conditions such as hypertension or cardiovascular disorders ( , ). sars-cov- human infections can also cause vasculature damage and coagulopathies, leading to infarction ( - ). acute disease often presents with elevated levels of inflammatory cytokines, including il- , and these host molecules may play a role in the pathogenic process ( , ). additionally, ~ % of cases include signs of neurological disease such as headache, anosmia (loss of smell), ataxia, meningitis, seizures and impaired consciousness ( ) ( ) ( ). to date, sars-cov- has infected over eight million people worldwide and resulted in the death of more than , . there is an urgent need for medical countermeasures to prevent this disease or limit disease severity in a post-exposure setting. similar to sars-cov ( ), sars-cov- binds to target cells via an interaction between the kda viral spike protein and the host angiotensin-converting enzyme (ace ) protein ( ) ( ) ( ) ( ) ). an important factor in host tropism of the virus is this receptor interaction, and reduced affinity between these two molecules greatly impacts host susceptibility to infection. both sars-cov and possibly to a greater extent sars-cov- bind to murine ace (mace) poorly compared to human ace (hace ) ( ). accordingly, mice are inherently refractory to infection by ace utilizing golden jw, et al sars-cov- pathogenesis in k -hace mice human coronaviruses ( ) ( ) ( ) .
in these animals, infection by sars-cov ( ) and sars-cov- ( ) is rapidly controlled; older mice are more permissive to lung replication, but lung injury is nevertheless limited and mortality is generally low. indeed, even mice lacking adaptive immunity due to rag deficiencies (no t-cells or b-cells) are protected against severe sars-cov disease, whereas mice genetically devoid of stat- , an important molecule involved in innate immunity, are sensitive to infection by sars-cov ( - ). however, disease in stat- mice is protracted and not highly representative of human infections ( , ). in response to the need for small animal models to study sars-cov, several laboratories ) . k limits expression to airway epithelial cells, colon and to a lesser extent kidney, liver, spleen and small intestine. a minor level of hace expression was also detected in the brain. these mice are susceptible to sars-cov strain urbani and develop severe respiratory disease subsequent to intranasal exposure, characterized by lung inflammation, serum cytokine and chemokine production as well as high lethality ( ). as with several mouse strains transgenic for hace , sars-cov infects the brains of k -hace mice ( , ). cns localization is speculated to play a major role in host mortality due to neuronal cell death, particularly cell loss in the cardiorespiratory control center ( ). some of these previous hace transgenic mice are being used for sars-cov- research and newer systems have been developed using crispr-cas ( , , ). however, none were shown to produce a consistent lethal disease and in models where lung injury occurred, it was most pronounced in aged animals ( , ). here, we evaluated susceptibility of k -hace mice to sars-cov- . we found that mice developed severe disease that included respiratory distress with weight loss and mortality, as well as brain infection.
sars-cov- produces severe and fatal infection of k -hace mice. two groups of -week old k -hace mice (n= per group) were intranasally infected with x or x pfu of sars-cov- . these groups were equally divided by sex (n= per group/sex). on day , two mice per group were euthanized to assess disease severity. the remaining mice per group were monitored up to days. we also infected c bl/ , balb/c and rag deficient mice with a challenge dose of x pfu. on day , all groups of infected k -hace mice began to lose weight, which was more pronounced in the female mice compared to the male mice in either challenge dose group (fig. a & fig. s a). non-transgenic mice did not lose any weight and no animal succumbed to disease. starting on day , several k -hace animals began to show signs of respiratory disease, including labored breathing and conjunctivitis. on day through day the majority of mice ( / mice) began to meet euthanasia criteria (n= ) or died (n= ). an additional animal in the low dose male group succumbed to disease on day , after a period of weight loss. the highest mean weight loss was in the female groups (> %), although male mice lost > % weight (fig s a,b). of the k -hace mice in the high dose challenge groups, all females and % of the male mice succumbed to disease. most female mice survived in the low dose group with % mortality, but all male mice succumbed to disease. the difference in survival between low dose male and female mice was significant (log-rank; p= . ), as was mortality between the high and low dose female groups (log-rank; p= . ). there was no statistical significance in survival between the other groups. lung homogenates taken on day showed high levels of virus in k -hace mice, in contrast to c bl/ or rag -deficient mice which had low levels of virus (fig. b,c).
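the survival comparisons above rely on the log-rank test. the following is a minimal sketch of a two-group log-rank test in plain python, included only to illustrate how such a comparison can be computed. it is not the authors' analysis code; the function name `logrank_test` and the input layout (survival times in days plus a 1/0 flag for death versus censoring) are assumptions made for this illustration, and any numbers fed to it here would be hypothetical rather than study data.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative sketch).

    times_*  : day of death or of last observation for each animal
    events_* : 1 if the animal died (or met euthanasia criteria), 0 if censored
    Returns (chi2, p_value) using a chi-square distribution with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Distinct days on which at least one death occurred.
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = 0.0
    expected_a = 0.0
    variance = 0.0
    for t in event_times:
        # Animals still under observation just before day t.
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = at_risk_a + at_risk_b
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        deaths_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        d = deaths_a + deaths_b

        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A on day t.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

with groups as small as those in a pilot challenge study, the chi-square approximation is rough, so an exact or permutation version of the test may be preferable in practice.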
the virus rna levels were nearly identical between all the infected k -hace groups and remained high in most of the euthanized mice, with the exception of the mouse that died on day . that animal had markedly lower levels of detectable genomic rna (fig. c). compared to non-transgenic mice and uninfected k -hace mice, infected k -hace mice had higher serum levels of tnf-α, il- and il- as well as the monocyte chemoattractants mcp- (ccl ) and mcp- (ccl ) (fig. d). levels of cytokines observed in mice that succumbed to disease were generally higher compared to those sampled on day . overall, these findings indicated that sars-cov- causes a severe disease in k -hace mice following intranasal exposure. lungs of k -hace mice exposed to sars-cov- show signs of acute disease. lungs were collected from k -hace mice on day or at the time of euthanasia due to disease severity. in k -hace mice, viral rna was detected by in situ hybridization (ish) in all mice on day and in most mice succumbing to disease on days - , but these levels diminished as disease progressed (fig a, fig s , table s ). additionally, ish labeling was patchy (fig. s ) and most severe at the day time point. ish labeling was present in both inflamed and normal appearing alveolar septa. positive ish labeling for sars-cov- was identified multifocally in alveolar septa in the lung, suggesting infection of pneumocytes and macrophages. this was confirmed by the detection of sars-cov- protein in e-cadherin positive cells (pneumocytes) and cd positive cells (alveolar and infiltrating macrophages) using immunofluorescence assay (ifa, fig. b,c).
in comparison with the normal lung architecture in uninfected control animals, infected mice necropsied on day , and those succumbing to disease on days - , had varying levels of lung injury including areas of lung consolidation characterized by inflammation/expansion of alveolar septa with fibrin, edema and mononuclear leukocytes and infiltration of vessel walls by numerous mononuclear leukocytes (fig a, fig s , and table s ). type ii pneumocyte hyperplasia was identified in less than half of infected animals. this lesion had a relatively patchy distribution except in the most severely affected animals, where it was more abundant. in areas of septal inflammation, exudation of fibrin and edema into alveolar lumina from damaged septal capillaries was observed in most animals. vasculitis was the most common finding and was present in ~ % of all mice (fig. a,b). the lesion encompassed small and intermediate caliber vessels, and was often characterized by near circumferential infiltration of the vessel wall by numerous mononuclear inflammatory cells. this lesion also contained small amounts of fibrin and occasional necrotic debris, affecting all tunics and obscuring the vessel wall architecture. however, the endothelial cells surrounding lung vasculature were largely free of viral rna (fig. b). in one low dose male mouse that died on day , numerous fibrin thrombi were identified in the small and intermediate vessels, suggestive of a hypercoagulable state within the lung. this same animal had marginally detectable virus in the lung (fig. c), and fibrin thrombi were not identified in other organs including liver and kidney. infected k -hace mice had elevated numbers of tunel positive cells, suggesting increased cell death (fig. c).
we also detected increased expression of ki- , a marker for cellular proliferation, likely expressed by proliferating pneumocytes and potentially by replicating alveolar macrophages (fig. d). neutrophilia was detected by h&e staining, consistent with an increase in myeloperoxidase (mpo)-positive cells (neutrophils, basophils and eosinophils) detected by ifa (fig. s a). however, the mpo positive cells were devoid of viral antigen (fig. s a). there was also an increase in the presence of cd positive macrophages (fig. s b) and a pronounced increase in cd and cd positive cells in infected lungs, indicative of infiltrating leukocytes including t-cells, which is consistent with histologic findings (fig. s c). these data indicate that k -hace mice develop a pronounced lung injury upon exposure to sars-cov- . transcriptional profiles of host immunological and inflammatory genes in lung homogenates from sars-cov- infected k -hace and c bl/ mice were examined by nanostring-based gene barcoding on day (k -hace and c bl/ ) or in k -hace mice at the time of euthanasia. transcripts were increased in k -hace mice and decreased at a log fold cutoff of and a p value < . . many of the increased transcripts in the k -hace mice were inflammatory genes including il- , interferon gamma and chemokines (ccl , ccl , ccl and cxcl ) (fig. a). the type i interferon transcripts ifna and ifnb along with the cytokines il- and il- were decreased. consistent with the increased presence of cd macrophages, cd transcripts were also significantly increased in k -hace mice. viral sensing pathways were elevated in infected mice with severe disease, indicated by high transcript was observed in the nasal turbinates in the majority of mice and rarely within the eyes of infected mice (fig. a, fig. s and table s ).
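the transcript calls above combine a log fold-change cutoff with a p-value threshold. a minimal sketch of that kind of filter is shown below. it assumes per-gene log2 fold changes and p values have already been computed by upstream software; the function names, the pseudocount, and the cutoff arguments are hypothetical choices for illustration, since the actual thresholds are not recoverable from the text.

```python
import math


def log2_fold_change(mean_infected, mean_control, pseudocount=1.0):
    """log2 ratio of mean normalized counts; the pseudocount (an assumption
    of this sketch) avoids division by zero for undetected transcripts."""
    return math.log2((mean_infected + pseudocount) / (mean_control + pseudocount))


def classify_transcripts(rows, lfc_cutoff, p_cutoff):
    """Split transcripts into increased and decreased sets.

    rows       : iterable of (gene, log2_fold_change, p_value) tuples
    lfc_cutoff : minimum absolute log2 fold change to call a change
    p_cutoff   : p values must be strictly below this threshold
    """
    calls = {"increased": [], "decreased": []}
    for gene, lfc, p_value in rows:
        if p_value >= p_cutoff:
            continue  # not statistically significant at this threshold
        if lfc >= lfc_cutoff:
            calls["increased"].append(gene)
        elif lfc <= -lfc_cutoff:
            calls["decreased"].append(gene)
    return calls
```

a call such as `classify_transcripts(rows, lfc_cutoff=1.0, p_cutoff=0.05)` would use placeholder thresholds; in a panel with many genes, the p values would normally be adjusted for multiple testing before filtering.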
viral rna in the eye was localized to the retina, suggesting viral infection of neurons in the inner nuclear layer and ganglion cell layer (fig. s ). despite infection, evidence of inflammation or other damage in the retina or elsewhere in the eye was not present. viral rna and spike protein were also detected to some degree in the nasal turbinate epithelium (predominantly olfactory epithelium) as early as day post-infection (fig. b), as indicated by co-staining with cytokeratin (fig. c). pathology was minimal, predominantly isolated to a few areas of olfactory epithelium atrophy, with degeneration or erosion present in the epithelium lining the dorsal and lateral nasal meatuses (fig. d). some cellular sloughing was also detected and these sloughed cells contained viral rna (fig. s ). in mice succumbing to disease on days - , virus was present in the olfactory bulb and most animals showed asymmetrical staining, with one bulb more positive than the other. viral rna was detected throughout the olfactory bulb, including in the olfactory nerve layer (onl), glomerular layer (gl), external plexiform layer (epl), and mitral cell layer (mcl) of olfactory bulbs in most animals. viral protein co-localized with the neuronal marker neun, suggesting virus was present within neurons in the olfactory bulb (fig. s ). virus was not detected in the olfactory bulb of animals taken on day , suggesting that virus trafficked to this region on day or . these data indicated that sars-cov- infects cells within the nasal turbinates, eyes and olfactory system and that infection was observed in epithelial cells and neurons. brain infection was not observed in the majority of animals examined on day , but was prevalent in mice necropsied on days - (table s ).
evidence of sars-cov- was found throughout the brain including strong but variably diffuse signal in regions of the thalamus, hypothalamus, amygdala, cerebral cortex, medulla, pons and midbrain (fig. a, table s ). similar intense but less diffuse signal was present within the hippocampus. ish positive cells included neurons of thalamic nuclei. in contrast, cells within the vessel walls and perivascular spaces infiltrated by mononuclear inflammatory cells were negative for viral genomic rna (fig. a). histopathological changes were detected in the brains of several infected k -hace mice euthanized on day - , but not in most mice taken on day (fig. b, table s and fig. s ). in the thalamus/hypothalamus, vasculitis was the most common lesion, characterized by endothelial hypertrophy and increases in mononuclear leukocytes within the vessel wall and/or filling the perivascular space. small amounts of necrotic debris were also identified. in the majority of these cases, the vascular lesion was characterized by the presence of increased numbers of microglia within the adjacent neuropil. occlusive fibrin thrombi were also detected within the thalamus in a few mice. meningitis was observed in a subset of animals and was associated with infiltration of mononuclear leukocytes (majority lymphocytes) and was most prominent adjacent to vessels. in the mouse that died late on day with massive pulmonary clotting, the rostral cerebral cortex brain lesions included small to intermediate size vessel walls multifocally expanded by mononuclear inflammatory cells and perivascular hemorrhage extending into the adjacent neuropil (fig. b). increased numbers of microglia were readily detected on h&e stained sections, expanding outward from the perivascular neuropil. an increase in numbers of microglia was found in the neuropil surrounding affected vessels.
additionally, brains showed signs of neuroinflammation indicated by increased staining of iba- and gfap, indicating microgliosis and astrocytosis, respectively (fig. c). necrosis was identified in at least five animals, and was most prominent within the periventricular region of the hypothalamus as well as in the amygdala. the lesion was characterized by moderate numbers of shrunken, angular cells with hypereosinophilic cytoplasm and pyknotic nuclei, surrounded by a clear halo (fig. s ). the morphology and location of individual cells was suggestive of neuronal necrosis, but further investigation will be required to confirm this finding. viral spike protein was detected in neun positive cells, indicating viral infection of neurons (fig. d). viral antigen was absent in gfap positive cells, suggesting virus does not productively infect astrocytes. these data indicate that similar to sars-cov, sars-cov- also targets the brain of k -hace mice, causing brain injury. as indicated by duplex ish labeling of brain, neurons are positive for both hace transgene expression and viral genomic rna (fig. s ). no animal showed outward signs of neurological deficit, such as hind limb paralysis, head tilting or tremors. other murine infection models for sars-cov- involving transgene expression of the human ace protein have been reported ( , , ). however, in these models, sars-cov- only produces a transient weight loss with some lung injury, and the animals generally recover. additionally, several of these models required mice aged > weeks for the most severe disease to occur, diminishing the practicality of these systems given the urgent need for medical countermeasures (mcms) ( , ). one system tested sars-cov- infection in mice in which hace was expressed under the control of the hfh promoter ( ).
infection in these mice was only ~ % lethal ( / mice) and lung injury (assessed by plethysmography) and weight loss were absent. lethality in this system was exclusive to animals where virus was detected in the brain. other recently reported sars-cov- murine models involved transduction of mouse lungs with a replication-incompetent adenovirus or an adeno-associated virus (aav) encoding the hace gene ( , ). transduced lung cells expressing hace supported sars-cov- replication and lung pathology ensued along with weight loss. however, disease was generally mild with no lethality. blockade of the type i interferon system using pharmacological intervention was needed to produce the most severe disease. murine systems faithfully producing the major elements of severe disease observed in humans will be more useful for identifying the most promising mcms. the k -hace mice lost considerable weight (> % in males and > % in females) and lethality in the high dose exceeded %. acute lung injury was detected in all animals succumbing to disease, with vascular damage the most common lesion. similar to findings with sars-cov, sars-cov- infected the brains of k -hace mice. brain infection resulted in vasculitis and inflammation, with sars-cov- antigen detected in neurons. it is possible virus enters the brain via the olfactory bulb, as has been reported for sars-cov, although more studies will be required as virus may also have entered the brain via inflamed vessels. whether mortality results directly from brain infection is not clear, but this has been speculated as the major cause of mortality in sars-cov infected mice ( ). infection in the brain was delayed by at least four days, as it was an uncommon finding in day animals. thus, early during infection lung appears to be the primary target.
we have not yet evaluated the protective efficacy of mcms in this model, but it has been reported that antibodies protect against sars-cov, demonstrating the k -hace system is useful for evaluating countermeasures against ace -targeting human coronaviruses. importantly, in our study some animals at the lower dose survived infection despite significant infection of k -hace mice by sars-cov- produces a disease similar to that observed in acute human cases, with development of an acute lung injury associated with edema, production of inflammatory cytokines and the accumulation of mononuclear cells in the lung. impacted lungs had elevated levels of transcripts consistent with respiratory damage such as increased expression of himf, which is involved in activation of lung endothelial cells in response to lung inflammation ( ). we also found increased levels of sgpl , a molecule associated with lung injury ( ) and known to be increased in mechanically ventilated mice ( ). a prominent finding in the infected k -hace mouse system was vasculitis. endotheliitis/vasculitis has been observed in human covid- patients and a role for the endothelium in acute disease is becoming more apparent ( , , ). direct viral infection in human endothelial cells has also been reported ( ). curiously, in the mice, virus was absent in these areas, suggesting vasculitis was due to host inflammatory processes, but further study will be required to determine if vascular damage is incurred by direct or indirect viral effects. it has been speculated that during human acute disease, inflammatory cytokines (il- in particular) are major drivers of tissue damage and this is supported by data from humans showing il- receptor targeting with pharmacological antagonists (tocilizumab) can reduce morbidity ( ). we found il- transcripts, as well as chemokines, are elevated in mouse lungs.
during human disease, pulmonary inflammation is associated with increases in lung granulocytes and an increase in macrophages ( , , ). some have speculated that these macrophages play an important role in host injury ( ). it is still unclear if human macrophages are directly infected by sars-cov- , though we found virus antigen in cd macrophages and others report infection in murine mac -positive macrophages ( ). further study will be required to determine the role macrophages play in sars-cov- lung injury and, given the reagents available, murine systems may be highly suited for these studies. in addition to vascular issues, coagulopathies are a common finding in humans ( ), with pulmonary embolism having been reported, along with clotting abnormalities leading to loss of limbs ( ). at least some mice produced evidence of these clotting issues, with one mouse presenting with fibrin thrombi in the lungs. during human infections, males have been reported to have more severe outcomes despite a similar infection rate ( ). in our experiments, we observed a statistically significant difference in survival of female and male mice infected at the lower dose of virus, with % of females but no males surviving infection. despite surviving, the female mice challenged with the low dose still lost a significant amount of weight (> %). transcriptomic profiling in mouse lungs indicated that female mice that succumbed to disease had modestly lower levels of il- , cxcl- and il- ra, suggesting a less intense inflammatory response. our work only involved a small number of animals, and more work will be required to determine if the k -hace system recapitulates this sex difference in disease severity. the neuroinvasive aspects of covid- are becoming more appreciated ( ).
indeed, sars-cov- causes neurological sequela in at least a third of human cases including headache, infection of these cells may help explain the loss of smell associated with some covid- cases. however, the virus may also gain access via disruption of the blood brain barrier, as these were inflamed in most animals and neurovasculitis has been found in humans ( ). overall, the k - were then heated in kit-provided antigen retrieval buffer and digested by kit-provided proteinase. sections were exposed to ish target probe pairs and incubated at °c in a hybridization oven for h. after rinsing, ish signal was amplified using kit-provided pre-amplifier and amplifier conjugated to alkaline phosphatase and incubated with a fast red substrate solution for min at room temperature. sections were then stained with hematoxylin, air-dried, and coverslipped. epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study a pneumonia outbreak associated with a new coronavirus of probable bat origin clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study targeting potential drivers of covid- : neutrophil extracellular traps covid- and crosstalk with the hallmarks of aging the hallmarks of covid- disease pulmonary vascular endothelialitis, thrombosis, and angiogenesis in covid- covid- : the vasculature unleashed covid- and its implications for thrombosis and anticoagulation the pathogenesis and treatment of the `cytokine storm' in covid- the many faces of the anti-covid immune response neuromechanisms of sars-cov- : a review early recovery following new onset anosmia during the covid- pandemic -an observational cohort study does sars-cov- invade the brain? 
translational lessons from animal models angiotensin-converting enzyme is a functional receptor for the sars coronavirus structural basis of receptor recognition by sars-cov- sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven cryo-em structure of the -ncov spike in the prefusion conformation structure of the sars-cov- spike receptor-binding domain bound to the ace receptor receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus animal models for sars and mers coronaviruses prior infection and passive transfer of neutralizing antibody prevent replication of severe acute respiratory syndrome coronavirus in the respiratory tract of mice sars-like wiv -cov poised for human emergence severe acute respiratory syndrome coronavirus infection causes neuronal death in the absence of encephalitis in mice transgenic for human ace a mouse-adapted sars-cov- model for the evaluation of covid- medical countermeasures a mouse model of sars-cov- infection and pathogenesis a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies mouse model of sars-cov- reveals inflammatory role of type i interferon signaling hypoxiainduced mitogenic factor (himf/fizz /relmalpha) increases lung inflammation and activates pulmonary microvascular endothelial cells via an il- -dependent mechanism protection of lps-induced murine acute lung injury by sphingosine- -phosphate lyase suppression sphingolipids in ventilator induced lung injury: role of sphingosine- -phosphate lyase endothelial cell infection and endotheliitis in covid- effective treatment of severe covid- patients with tocilizumab macrophages: a trojan horse in covid- ? the use of anti-inflammatory drugs in the treatment of people with severe coronavirus disease (covid- ): the perspectives of clinical immunologists from china severe covid- and aging: are monocytes the key? 
geroscience acute pulmonary embolism associated with covid- pneumonia detected by pulmonary ct angiography gender differences in patients with covid- : focus on severity and mortality this project was funded by a grant awarded to j.w.g. and j.w.h. from the military infectious disease program. we thank the usamriid histology lab and comparative medicine division for their assistance. we also express gratitude to the jackson laboratory for providing early access to duplex in situ hybridization. duplex in situ hybridization was performed using the rnascope . hd duplex assay kit (advanced cell diagnostics) according to the manufacturer's instructions with minor modifications. in addition to sars-cov- genomic rna probe mentioned above (# , green), another probe with c channel (# -c , red) specifically targeting human ace (nm_ . ) was designed and synthesized by advanced cell diagnostics. ish signal was amplified using kit-provided pre-amplifiers and amplifiers conjugated to either alkaline phosphatase or horseradish peroxidase, and incubated sequentially with a fast red and green chromogenic substrate solution for min at room temperature. sections were then stained with hematoxylin, air-dried, and coverslipped. key: cord- -pqn ojj authors: yao, hebang; cai, hongmin; li, tingting; zhou, bingjie; qin, wenming; lavillette, dimitri; li, dianfan title: a high-affinity rbd-targeting nanobody improves fusion partner’s potency against sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: pqn ojj a key step to the sars-cov- infection is the attachment of its spike receptor-binding domain (s rbd) to the host receptor ace . considerable research have been devoted to the development of neutralizing antibodies, including llama-derived single-chain nanobodies, to target the receptor-binding motif (rbm) and to block ace -rbd binding. simple and effective strategies to increase potency are desirable for such studies when antibodies are only modestly effective. 
here, we identify and characterize a high-affinity synthetic nanobody (sybody, sr ) as a fusion partner to improve the potency of rbm-antibodies. crystallographic studies reveal that sr binds to rbd at a conserved and ‘greasy’ site distal to rbm. although sr distorts rbd at the interface, it does not perturb the rbm conformation, hence displaying no neutralizing activities itself. however, fusing sr to two modestly neutralizing sybodies dramatically increases their affinity for rbd and neutralization activity against sars-cov- pseudovirus. our work presents a tool protein and an efficient strategy to improve nanobody potency. sars-cov- , the pathogenic virus for covid- , has caused a global pandemic since its first report in early december in wuhan, china ( ), posing a grave crisis for health and for economic and social order. sars-cov- is heavily decorated by its surface spike (s) ( , ), a single-pass membrane protein that is key for the host-virus interactions. during the infection, s is cleaved by host proteases ( , ), yielding the n-terminal s and the c-terminal s subunit. s binds to angiotensin-converting enzyme (ace ) ( - ) on the host cell membrane via its receptor-binding domain (rbd), causing conformational changes that trigger a secondary cleavage needed for the s mediated membrane fusion at the plasma membrane or in the endosome. because of this essential role, rbd has been a hot spot for the development of therapeutic monoclonal antibodies (mabs) and vaccines ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . llama-derived heavy chain-only antibodies (nanobodies) are attractive biotherapeutics ( ). these small (~ kda) proteins are robust, straightforward to produce, and amenable to engineering such as mutation and fusion.
owing to their ultra-stability, nanobodies have been reported to survive nebulization, a feature that has been explored for the development of inhaled nanobodies to treat respiratory viral diseases ( , ) , a category that includes covid- . owing to their high sequence similarity with human type vh domains (vh ), nanobodies are known to cause little immunogenicity ( ) . for the same reason, they can be humanized with relative ease to reduce immunogenicity when needed. nanobodies are therefore increasingly recognized as biotherapeutics. examples of nanobody drugs include caplacizumab ( ) for the treatment of acquired thrombotic thrombocytopenic purpura, and ozoralizumab and vobarilizumab, which are in clinical trials for rheumatoid arthritis ( , ) . recently, several groups have independently reported neutralizing nanobodies ( , ( ) ( ) ( ) ( ) ( ) ( ) or single-chain vh antibodies ( ) against sars-cov- with variable potencies. we have recently reported several synthetic nanobodies (sybodies) which bind rbd with various affinities and neutralizing activities ( ) . affinity and neutralizing activity are very important characteristics of therapeutic antibodies, and they can be improved in a number of ways such as random mutagenesis ( , ) and structure-based design. previously, in the case of one modestly neutralizing sybody, mr , we determined its structure and designed a single mutant that improves its potency by over folds ( ) . the rational design approach, while very effective, inevitably requires high-resolution structural information, which is non-trivial to obtain. generally applicable tools would be welcome. here, we report a strategy to increase sybody potency by biparatopic fusion with sr , a sybody that binds rbd tightly with a kd of . nm. as revealed by the crystal structure, sr engages the rbd at a conserved site that is distal to the rbm. 
as such, it does not neutralize sars-cov- but forms non-competing pairs with several other rbm-binders and increases their neutralization potency when conjugated. sr may be used as a general affinity-enhancer for both detection and therapeutic applications.

a high-affinity rbd binder without neutralizing activity

previously, we generated sybodies from three highly diverse synthetic libraries by ribosome and phage display with in vitro selections against the sars-cov- rbd. most of the sybody binders showed neutralizing activity. interestingly, about sybodies bound rbd but showed no neutralizing activity ( ) even at μm concentration. one such sybody, named sr , was characterized in this study. in analytic fluorescence-detection size exclusion chromatography (fsec), sr caused earlier retention of rbd (fig. a) which was included at a low concentration ( . μm), suggesting nanomolar affinity for sr -rbd binding. this was confirmed by bio-layer interferometry analysis (fig. b) which showed a kd of . nm and an off-rate of × - s - . consistent with its inability to neutralize sars-cov- pseudovirus, sr did not affect rbd-ace binding (fig. c) . to characterize the sr -rbd interactions in detail, we purified the complex (fig. d ), and obtained crystals (fig. d ) that diffracted to . Å resolution ( table ) . the structure was solved by molecular replacement using the published rbd and sybody structures (pdb ids m j and m ) ( , ) as search models. the structure was refined to rwork/rfree of . / . ( table ) . the asymmetric unit contained one molecule each of the rbd and sr , indicating an expected : stoichiometry. sr binds to the rbd sideways with a buried surface area of , . Å ( fig. a) , which is significantly larger than that for the previously reported sybodies sr ( . Å ) and mr ( . Å ) ( ) . the binding surface is near a heavily decorated glycosylation site, asn ( fig. 
a- c) , which, although at an apparent strategic position to possibly divide the accessible surfaces for immune surveillance, does not show clashes with sr . all three cdrs participated in the interaction by providing five (cdr ), three (cdr ), and nine h-bonds (cdr ) (fig. e- g) . peculiarly, the cdr , which contains a cluster of hydrophobic side chains that include met , val , phe , trp , and tyr , inserted into a greasy pocket (fig. b ) in the rbd that was lined with twelve hydrophobic/aromatic residues (fig. f) . unlike salt bridges, hydrophobic interactions are more tolerant of environmental changes such as shifts in ph and ionic strength. in addition, they are less specific and thus less likely to be affected by mutations. this binding mode thus makes sr an attractive candidate for detection purposes. most rbd-targeting neutralizing antibodies, including the neutralizing nanobodies characterized so far ( , - , , , - , - , , , ) , engage the rbd at the receptor-binding motif (rbm) (fig. a) , thus competing off ace and preventing viral entry. aligning the ace structure to the sr -rbd structure showed that the sr -binding epitope is distant from the rbm (fig. a) . comparing the epitopes of existing monoclonal antibodies showed that the sr epitope partly overlaps with those of cr ( ) and the recently identified ey a ( ) (fig. b, c ). it has been established that the binding of the bulky cr and ey a at the interface between rbd and the n- . taken together, the structural data rationalize the high-affinity binding between sr and rbd, and its inability to neutralize sars-cov- . because nanobodies are relatively easy to produce, the availability of nanobodies that recognize a wide spectrum of epitopes can provide a useful toolkit to probe the binding mode of uncharacterized antibodies using competitive binding assays. 
they may also be used to select binders with new epitopes by including them as pre-formed sybody-rbd complexes during in vitro selection (and thus excluding binders at the same site).

other rbd-targeting nanobodies ( , , , ) and mabs ( - , , , , , - ) . red, the collective epitope of rbm-binders; blue, the sr epitope; magenta, the collective epitope of cr and ey a; orange, the overlap between the

structure alignment of sr -rbd with ace -rbd revealed that the two rbd structures were overall very similar, with a cα rmsd of . Å (fig. a) . nevertheless, significant structural rearrangements at the binding interface were observed (fig. a, b) . specifically, the small α-helix α - (numbers mark start-end) moves towards the rbm by a dramatic ~ . Å and transforms to a short β sheet ( - ) which in turn forms a parallel β-sheet pair with β - in the cdr region. in addition, nudged by the cdr , the short helix α - swings towards the rbd core by ~ . Å. remarkably, these dramatic rearrangements did not cause a noticeable conformational change of the rbm (fig. a) , nor did they affect ace binding (fig. c) . given that rbd is a relatively small entity, and that the two surfaces are relatively close (~ Å), this was somewhat unexpected. a probable explanation is that rbd is very rigid and hence stable. indeed, as shown in fig. c , rbd showed ultra-stability, with an apparent melting temperature of greater than º c ( -min heating). intriguingly, the rearrangement happens in a region that is rich in disulfide bonds. specifically, α - is tethered between the disulfide pairs cys -cys and cys -cys , and α - bridges cys -cys and cys- -cys (fig. d) . thus, the three disulfide bonds segregate the two local motifs from the rest of rbd, preventing these conformational changes from propagating through the domain. the neutral feature of sr so far suggests it could bind to rbd in addition to rbm binders such as mr and sr ( ) . indeed, bli assays showed no competition between sr and mr (fig. 
a) , indicating a 'sandwich complex' where the rbd is bound by both sybodies. this non-competing feature was also observed in the case of mr (fig. b) which has also been shown to have neutralizing activity ( ) . as further proof of the simultaneous binding, we determined the structure of the sandwich complex sr -rbd-mr (fig. c, table ) to . Å resolution. the sandwich complex was similar to the individual mr - and sr -rbd complexes, with an overall cα rmsd of . and . Å, respectively. aligning the sandwich complex with the mr -rbd structure revealed no noticeable changes at the mr -binding surface (fig. c) , reinforcing the idea that sr binding neither allosterically changes the rbm surface nor affects rbm binders.

to the two-component complex structure (rbd (green) and mr , pdb id c w) ( ) .

although sr does not neutralize sars-cov- pseudovirus itself, its high affinity may help increase the affinity of other neutralizing nanobodies through an avidity effect upon fusion. indeed, the biparatopic fusion sr -mr displayed a remarkable increase in binding affinity compared to sr or mr alone. its kd of . nm (fig. a ) was lower than that of mr (kd = . nm) ( ) by folds and lower than that of sr (kd = . nm) by folds. consistently, sr -mr neutralized sars-cov- pseudovirus times more effectively (in molarity) than mr alone (fig. b) . that sr can enhance the potency of its fusion partner was also demonstrated in the case of mr . in its free form, mr bound to rbd with a kd of . nm (fig. c) , and showed modest neutralizing activity with an ic of . μg ml - ( . nm). fusing it to sr increased its affinity by over folds, giving a kd of . nm (fig. d) . in line with this, sr -mr showed a -fold higher neutralization activity compared to mr , with an ic of . nm ( . μg ml - ) (fig. e) . interestingly, when sr was fused to mr , a neutralizing antibody that had higher affinity (kd = . nm) than sr , the neutralizing activity decreased by folds (fig. f) . 
possible reasons include steric incompatibility caused by an improper linker length, and allosteric effects. these hypotheses warrant future structural investigation. binding affinity and neutralizing activity are important characteristics of therapeutic antibodies. for modestly neutralizing nanobodies, the potency can be increased in a number of ways, including random mutagenesis ( ) , structure-based design ( ) , and fusion ( , , ) . compared with the other two approaches, the fusion technique is more rapid, less involved, and does not rely on prior structural information. depending on whether the two fusion partners are the same, divalent nanobodies can be categorized into two types: monoparatopic and biparatopic. biparatopic fusions recognize two distinct epitopes on the same target. therefore, they are more likely to be resistant to escape mutants because simultaneous mutations at two epitopes should occur at a much lower rate than at a single epitope. because of its minute size, sr could be used as an 'add-on' to monoclonal antibodies, scfv fragments, and other nanobodies to enhance their affinity and potency, especially for those with modest neutralizing activities. in addition, due to its small size and high stability, sr may be chemically modified as a vector to deliver small-molecule inhibitors specifically targeting sars-cov- . in summary, we have structurally characterized sr , a high-affinity nanobody against sars-cov- rbd. although lacking neutralizing activity alone, sr is an attractive biparatopic partner for rbm-binders because its epitope is distinct from the rbm. our work presents a generally useful strategy and offers a simple and fast approach to enhance the potency of modestly active antibodies against sars-cov . the authors claim no conflict of interest. sars-cov- rbd was expressed essentially as described ( ) . 
briefly, a dna fragment encoding, from n- to c-terminus, residues - of sars-cov s, a gly-thr linker, the c protease site (levlfqgp), a gly-ser linker, the avi tag (glndifeaqkiewhe), a ser-gly linker, and a deca-his tag was cloned into the pfastbac-based vector. baculovirus was generated in sf cells following the invitrogen bac-to-bac transfection protocol. high five insect cells were infected with p virus. for crystallization, sr or sr -mr was mixed with rbd at a : . molar ratio. the mixture was then loaded onto a superdex column for gel filtration. fractions containing the complex were pooled and concentrated to mg ml - . to screen rbd binders by size exclusion chromatography (sec) using unpurified sybodies, rbd was fluorescently labelled as follows. first the avi-tagged rbd was for - s, before moving into sybody-free buffer for dissociation. the bli signal was monitored during the whole process. data were fitted with a : stoichiometry using the built-in software analysis . for kinetic parameters. for the competitive assay of the rbd between sr and ace , the rbd-coated sensor was saturated in nm of sr , before being soaked in nm sr with or without nm of ace . as a control, bli assays were also carried out by soaking the rbd-coated sensor in ace without sr . competitive rbd-binding assays for different sybodies were carried out in the same manner as described above. desired crystals were cryo-protected, harvested using a mitegen loop under a microscope, and flash-cooled in liquid nitrogen before diffraction. x-ray diffraction data were collected at beamline bl u ( ) at shanghai synchrotron radiation facility with a x μm beam on a pilatus m detector, with an oscillation of . ° and a wavelength of . Å. data were integrated using the software xds ( ) , and scaled and merged using aimless ( ) . the sr -rbd structure was solved by molecular replacement using phaser ( ) with pdb ids m j and m ( ) as the search models. 
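The 1:1 kinetic fit mentioned for the bli data can be illustrated with a minimal sketch of the standard one-site binding model, in which the dissociation constant is the ratio of the off-rate to the on-rate. This is not the instrument's built-in software, and the function names and the rate constants below are hypothetical values chosen only for illustration; they are not the measured values for any sybody in this study.

```python
import math


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant (M) for a 1:1 interaction: KD = k_off / k_on."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Sensor response at time t (s) while the sensor sits in analyte at `conc` (M)."""
    k_obs = k_on * conc + k_off                      # observed rate constant (1/s)
    r_eq = r_max * conc / (conc + k_off / k_on)      # plateau response at equilibrium
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, k_off):
    """Sensor response at time t (s) after moving into analyte-free buffer."""
    return r0 * math.exp(-k_off * t)


# Hypothetical rate constants, for illustration only.
k_on = 1.0e5    # 1/(M s)
k_off = 1.0e-4  # 1/s
kd = dissociation_constant(k_on, k_off)
print(f"KD = {kd * 1e9:.2f} nM")
```

At an analyte concentration equal to KD, the model predicts a plateau at half of the maximal response, which is a quick sanity check on any fitted parameter set.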
the sr -mr -rbd structure was solved using the sr -rbd and mr structure ( ) as search models. the models were manually adjusted as guided by the fo-fc maps in coot ( ) , and refined using phenix ( ) . structures were visualized using pymol ( ). the structure factors and coordinates were deposited in the protein data bank (pdb) under accession codes d z (sr +rbd) and d (sr -mr +rbd). a novel coronavirus outbreak of global health concern cryo-em structure of the -ncov spike in the prefusion conformation structure, function, and antigenicity of the sars-cov- spike glycoprotein cell entry mechanisms of sars-cov- sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structural basis of receptor recognition by sars-cov- structural and functional basis of sars-cov- entry by using human ace structural basis for the recognition of sars-cov- by full-length human ace conformational dynamics of sars-cov- trimeric spike glycoprotein in complex with receptor ace revealed by cryo-em. biorxiv structural basis for the neutralization of sars-cov- by an antibody from a convalescent patient a highly conserved cryptic epitope in the receptor binding domains of sars- a noncompeting pair of human neutralizing antibodies block covid- virus binding to its receptor ace a human neutralizing antibody targets the receptor-binding site of sars-cov- isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model convergent antibody responses to sars-cov- in convalescent individuals the receptor-binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in sars-cov- patients cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody potent neutralizing antibodies against multiple epitopes on sars-cov- spike. 
microbe neutralizing nanobodies bind sars-cov- spike rbd and block interaction with ace studies in humanized mice and convalescent humans yield a sars-cov- antibody cocktail potent neutralizing antibodies against sars-cov- identified by high-throughput single-cell sequencing of convalescent patients' b cells a potent neutralizing human antibody reveals the n-terminal domain of the spike protein of sars-cov- as a site of vulnerability. biorxiv potent neutralizing antibodies from covid- patients define multiple targets of vulnerability structures of human antibodies bound to sars-cov- spike reveal structural basis for potent neutralization of sars-cov- and role of antibody affinity maturation. biorxiv : the preprint server for biology the therapeutic potential of nanobodies delivery of alx- by inhalation greatly reduces respiratory syncytial virus disease in newborn lambs nanobodies® as inhaled biotherapeutics for lung diseases caplacizumab treatment for acquired thrombotic thrombocytopenic purpura emerging therapies in rheumatoid arthritis: focus on monoclonal antibodies. f res , f faculty rev- structural basis for potent neutralization of betacoronaviruses by single- potent synthetic nanobodies against sars-cov- and molecular basis for neutralization. biorxiv an ultra-high affinity synthetic nanobody blocks sars-cov- infection by locking spike into an inactive conformation. biorxiv an alpaca nanobody neutralizes sars-cov- by blocking receptor interaction synthetic nanobodies targeting the sars-cov- receptor-binding domain. biorxiv selection, biophysical and structural analysis of synthetic nanobodies that effectively neutralize sars-cov- . 
biorxiv identification of human single-domain antibodies against sars-cov- synthetic single domain antibodies for the conformational trapping of membrane proteins generation of synthetic nanobodies against delicate proteins the protein complex crystallography beamline (bl u ) at the shanghai synchrotron radiation facility how good are my data and what is the resolution? phaser crystallographic software features and development of coot phenix: a comprehensive python-based system for macromolecular structure solution key: cord- -t q tlq authors: jia, yong; shen, gangxu; zhang, yujuan; huang, keng-shiang; ho, hsing-ying; hor, wei-shio; yang, chih-hui; li, chengdao; wang, wei-lung title: analysis of the mutation dynamics of sars-cov- reveals the spread history and emergence of rbd mutant with lower ace binding affinity date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: t q tlq monitoring the mutation dynamics of sars-cov- is critical for the development of effective approaches to contain the pathogen. by analyzing sars-cov- and sars genome sequences, we provided direct genetic evidence that sars-cov- has a much lower mutation rate than sars. minimum evolution phylogeny analysis revealed the putative original status of sars-cov- and the early-stage spread history. the discrepant phylogenies for the spike protein and its receptor binding domain proved a previously reported structural rearrangement prior to the emergence of sars-cov- . despite that we found the spike glycoprotein of sars-cov- is particularly more conserved, we identified a mutation that leads to weaker receptor binding capability, which concerns a sars-cov- sample collected on th january from india. this represents the first report of a significant sars-cov- mutant, and raises the alarm that the ongoing vaccine development may become futile in future epidemic if more mutations were identified. 
highlights

based on the currently available genome sequence data, we showed that the sars-cov- genome has a much lower mutation rate and genetic diversity than sars during the - outbreak. the spike (s) protein encoding gene of sars-cov- is relatively more conserved than other protein-encoding genes, which is a good indication for the ongoing antiviral drug and vaccine development. minimum evolution phylogeny analysis revealed the putative original status of sars-cov- and the early-stage spread history. we confirmed a previously reported rearrangement in the s protein arrangement of sars-cov- , and propose that this rearrangement occurred between human sars-cov and a bat sars-cov, well before the transmission of sars-cov- to humans. we provide the first evidence that a mutated sars-cov- with reduced human ace receptor binding affinity has emerged in india, based on a sample collected on th january . 
evolutionary rate assessment

the ratio of nonsynonymous mutations (dn) to synonymous mutations (ds) was calculated using codeml in the paml (version . ) package ( ). cds sequences for each protein-encoding gene were filtered to remove redundant identical sequences. codon-based cds sequence alignment was then performed using the muscle program, and an individual nj tree was generated using mega . ( ) with the p-distance model. the resulting sequence alignment and phylogenetic tree files were used as paml inputs for the dn and ds calculations.

protein structural analyses

d structure of the sars-cov- spike glycoprotein in complex with (pdb: vw , vw ) has been determined recently ( , )

genetic diversity analyses identified a single amino acid mutation in rbd of the spike protein in sars-cov-

as of th march , there were a total of nucleotide sequences for sars-cov- in the ncbi database. by restricting to the complete or near-complete genomes, sequences from countries were obtained and used for further analyses. this encompasses records from usa, from china, and the rest from other countries: australia ( ), brazil ( ), finland ( ), india ( ), italy ( ), nepal ( ), spain ( ), south korea ( ), and sweden ( ). based on the gene model of the reference sars-cov- genome (genbank: nc_ . ), a total of protein-encoding open reading frames (orfs), plus utr and utr were annotated ( figure a) . overall, the gene sequences from different samples are highly homologous, sharing > . % identity, with the exception of utr ( . %) and utr ( %) ( table ) , which are relatively more divergent. sequence alignment showed that there is no mutation in orf , orf a, and orf b. the genetic diversity profile across the genomes is displayed in figure a . a few nucleotide sites within orf a, orf b, orf a, and orf exhibiting high genetic diversity were identified ( figure a) . the s protein is critical for virus infection and vaccine development. 
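The p-distance used for the nj trees and the percent identity reported between samples are two views of the same quantity: the proportion of aligned sites at which two sequences differ. The sketch below is a minimal illustration, not the mega implementation; the function names are hypothetical, and the choice to skip any column containing a gap or an ambiguous base (pairwise deletion) is an assumption.

```python
def p_distance(seq_a, seq_b, skip_chars="-N"):
    """Proportion of compared aligned sites at which two sequences differ.

    Columns where either sequence has a gap or an ambiguous base are
    ignored (pairwise deletion), which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


# Toy aligned fragments, not real viral sequence.
print(p_distance("ACGTACGT", "ACGTACGA"))
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Because the measure applies no correction for multiple substitutions at the same site, it is best suited to closely related sequences such as the near-identical genomes compared here.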
as shown in figure

to assess the mutation rate and genetic diversity of sars-cov- , the ratio of nonsynonymous mutations (dn) to synonymous mutations (ds) was calculated for each protein-encoding orf based on the sars-cov- and sars genomes. for sars-cov- , the highest dn was observed for orf ( . ), followed by orf a ( . ), orf ( . ), and orf ( . ) ( table ) , indicating that these genes may be more likely to accumulate nonsynonymous mutations. in contrast, orf b ( . ), the s gene ( . ) encoding the spike protein, and orf ( . ) are relatively more conserved in terms of nonsynonymous mutation. notably, orf , orf ab and orf are strictly conserved with no nonsynonymous mutation. compared to sars-cov- , sars displayed higher mutation rates for all of the orfs in the virus genome (table ) , suggesting overall higher levels of genetic diversity and mutation rate. in particular, the dn and ds values for the s gene in sars-cov are around and times higher than those for sars-cov- . in contrast, the mutation rate differences for orf a and orf b between sars-cov- and sars are relatively milder, varying from . times to . times only. in contrast to sars-cov- , which has strictly conserved orf , orf a, and orf b, sars displayed mutation rates at different levels. notably, the ds for orf is comparable between the two genomes at . and . , respectively. to trace the potential spread history of sars-cov- across the world, an unrooted minimum evolution (me) tree of the genomes was developed based on whole-genome sequence alignment. the clustering pattern of the me phylogeny

the spike glycoprotein is critical for virus infection. a recent study suggested that the s protein in sars-cov- may have undergone a structural rearrangement ( ). to investigate this hypothesis, two separate phylogenies were developed based on the full-s and rbd sequences, respectively. overall, the two phylogenies displayed similar clustering patterns, separating into three major clades (figure ) . 
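To make the dn and ds quantities compared above more concrete, the following sketch counts synonymous and nonsynonymous sites and differences between two aligned coding sequences and applies a jukes-cantor correction, in the spirit of counting methods such as nei-gojobori. It is explicitly not the maximum-likelihood procedure of codeml used in this study. As stated simplifications, codons that differ at more than one position contribute sites but no differences, and changes to stop codons are treated as nonsynonymous. All function names are hypothetical.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                sites += 1.0 / 3.0
    return sites


def count_sites_and_differences(seq_a, seq_b):
    """Return (S, N, Sd, Nd) for two aligned, in-frame coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguous bases
        s = (synonymous_sites(codon_a) + synonymous_sites(codon_b)) / 2.0
        syn_sites += s
        nonsyn_sites += 3.0 - s
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff == 1:  # simplification: multi-difference codons are not classified
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    return syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(seq_a, seq_b):
    """Return (dN, dS) under the simplified counting scheme above."""
    s, n, sd, nd = count_sites_and_differences(seq_a, seq_b)
    if s == 0 or n == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(nd / n), jukes_cantor(sd / s)


# Toy example: one synonymous (GCT->GCC) and one nonsynonymous (AAA->AGA) change.
print(dn_ds("ATGGCTAAA", "ATGGCCAGA"))
```

Separating the two rates in this way is what allows a gene such as s to be described as conserved at the protein level even when silent nucleotide changes are present.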
sars-cov- was identified in the same major clade, and was clustered most closely with two bat sars covs (highlighted in purple and green colors, figure ) and the human sars-cov (orange color, figure ) . in both phylogenies, sars-cov- is most closely related to bat_cov_ratg , suggesting sars- may have originated from bats. however, the evolutionary positions of human sars-cov and bat-sl-covz were swapped between the full-s and rbd-only phylogenies. in the full-s phylogeny, bat-sl-covz is relatively more similar to human sars-cov- , whilst in the rbd-only phylogeny human sars-cov is closer to sars-cov- than bat-sl-covz . taken together, these results suggest that the rbd of sars-cov- more likely originated from human sars-cov, whilst the rest of the s protein in sars-cov- may have originated from bat-sl-covz , supporting the potential structural rearrangement of the s protein in sars-cov- . bat_cov_ratg is similar to sars-cov- , indicating that the proposed structural rearrangement may have occurred in bats before transmission to humans. the rbd of the virus s protein binds to a receptor in host cells, and is responsible for the first step of cov infection ( ). thus, amino acid mutations in the rbd may have a significant impact on receptor binding and vaccine development. the d structure of the spike protein rbd of sars-cov- (pdb: vw ) has recently been determined in complex with the human ace receptor ( ). one amino acid mutation in the rbd of the s protein (r i) was identified among the sars-cov- genomes. sequence alignment showed that r is strictly conserved in sars-cov- , sars-cov and bat sars-like cov (figure a) . based on the determined cov _rbd-ace complex structure, r is located at the interface between rbd and ace , but is positioned relatively far away from ace , and thus does not interact directly with ace ( figure b) . however, the determined rbd -ace structure showed that r forms a hydrogen bond ( . Å in length) with the glycan attached to n from ace ( figure c ) ( ). 
the hydrogen bond may have contributed to the exceptionally higher ace binding affinity. in contrast, despite this arginine residue is also conserved in human sars-cov (corresponding to r in pdb: ajf), it is positioned relatively distant ( . Å) from the glycan bound to n from ace ( figure s ) . interestingly, the r-glycan hydrogen bond seem to be disrupted by the r i mutation in one sars-cov- potential interventions for novel coronavirus in china: a systematic review timely development of vaccines against sars-cov- . emerg microbes infec structure of sars coronavirus spike receptor-binding domain complexed with receptor evidence for a common evolutionary origin of coronavirus spike protein receptor- binding subunits cryo-em structure of the -ncov spike in the prefusion conformation structural basis of receptor recognition by sars-cov- angiotensin receptor blockers as tentative sars-cov- therapeutics. drug development research the spike protein of sars-cov -a target for vaccine and therapeutic development structural basis for the recognition of sars-cov- by full-length human ace proof of principle for epitope-focused vaccine design influenza virosomes in vaccine coronaviruses an rna proofreading machine regulates replication fidelity and diversity genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission annotationsketch: a genome annotation drawing library muscle: multiple sequence alignment with high accuracy and high throughput mega : molecular evolutionary genetics analysis version . 
for bigger datasets phylogenetic analysis by maximum likelihood genomic variance of the -ncov coronavirus on the origin and continuing evolution of sars-cov- a pneumonia outbreak associated with a new coronavirus of probable bat origin identifying sars-cov- related coronaviruses in malayan pangolins identifying sars-cov- related coronaviruses in malayan pangolins preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies

in contrast to the arginine residue, which is electrically charged and highly hydrophilic, the mutated isoleucine residue has a highly hydrophobic side chain with no hydrogen-bond potential ( figure e ). to sum up, the r i mutation identified from the sars-cov- strain in india represents a sars-cov- mutant with potentially reduced ace binding affinity.

hydrophobic profile changes due to the r i mutation, with red and white colours representing the highest and the lowest hydrophobicity, respectively. all amino acid numbering is according to the s protein of sars-cov- (nc_ . ) and human ace , respectively.

based on the currently available genome sequence data, our results showed that the mutation rate of sars-cov- is much lower than that of sars, which caused the - outbreak. our study is the first to provide a direct quantitative comparison between sars-cov- and sars. a relatively stable sars-cov- genome is a good indication for epidemic control, as less mutation raises the hope of rapid development of validated vaccines and antiviral drugs. our results are consistent with several recent genetic variance analyses on sars- ) , which suggested that the sars-cov- genomes are highly homogeneous. molecular geneticists closely monitoring the virus development also suggested that the mutation rate of sars-cov- remains at a low level. 
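The contrast between arginine and isoleucine described earlier in this section can be put in numbers with a published hydropathy scale. The sketch below uses the kyte-doolittle scale, which is a choice made here for illustration; the scale used for the hydrophobic profile figure in this study is not stated in the text. The function names and the toy peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_change(wild_type, mutant):
    """Change in hydropathy when residue `wild_type` is replaced by `mutant`."""
    return KYTE_DOOLITTLE[mutant.upper()] - KYTE_DOOLITTLE[wild_type.upper()]


def windowed_hydropathy(sequence, window=5):
    """Mean hydropathy over a sliding window along a peptide sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Arginine to isoleucine spans the full range of the scale.
print(hydropathy_change("R", "I"))

# Toy peptide (not the spike sequence) before and after an R-to-I change.
print(windowed_hydropathy("ARA", window=3))
print(windowed_hydropathy("AIA", window=3))
```

On this scale arginine is the most hydrophilic residue and isoleucine the most hydrophobic, so the substitution is the largest single-residue shift the scale can express.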
whilst it is generally safe to say that sars-cov- tends to mutate at a low rate, all current analyses are based only on data collected at the early stage of this pandemic. as the virus continues to spread rapidly around the world, and more genomic data accumulate, the evolution and mutation dynamics of sars-cov- still need to be monitored closely. one critical aim of our study is to identify the original status of sars-cov- before its wide transmission across different countries. due to the short time span of sample collection and the relatively low mutation rate of sars-cov- , we believe that a minimum evolution phylogeny may outperform other phylogenetic methods in achieving this aim. as expected, the earliest few reported sars-cov- accessions collected from wuhan, china were identified at the center of the phylogenetic tree with the shortest branches. interestingly, a number of virus genomes from usa were found to be almost identical to these putative original versions of the virus from wuhan. however, according to public media, the outbreak of sars-cov- in usa occurred relatively later than in other countries. one possible explanation for this observation is that the spread of sars-cov- in usa might have started much earlier than previously thought or reported. because a dominant proportion of the samples in this study were collected from china and usa, we observed a significantly higher level of genetic diversity in these two countries. most sars-cov- accessions from the other countries have closely related sisters from either china or usa. this data bias, on the other hand, may give us an advantage in tracing the spread history of sars-cov- in different countries. this suggestion is reliable because all of the samples in this study were collected at the early stage of the pandemic, which may avoid the potential data noise caused by recently published genomes with a complex spread background. 
one notable finding in our phylogenetic tree is that the singleton sars-cov- accessions collected from australia, brazil, south korea, italy and sweden were clustered together with two usa samples but without a chinese version, suggesting that these infection cases may be somehow related. in addition, one of the three samples collected from the cruise ship stranded in japan was found to be closely related to a sample collected from guangzhou, china, whilst the other two were grouped with several cases from the usa. notably, our phylogeny seems to support the presence of two major types of sars-cov- in the target samples, suggesting the potential existence of two spread sources. interestingly, this speculation is corroborated by an independent clustering analysis using a different phylogenetic method ( ). until now, the origin of sars-cov- and how it was transmitted to humans remain largely a mystery. early genomic data showed that human sars-cov- is an enveloped, positive-sense, single-stranded rna virus in the subgenus sarbecovirus of the genus betacoronavirus ( , ). evolutionarily, sars-cov- is most closely related to bat sars-like cov ( % genome sequence identity) and human sars cov ( %), the latter of which caused a worldwide outbreak in ( ). based on the strong genome sequence identity between sars-cov- and bat sars-like covs, it was initially speculated that sars-cov- may have originated from bats ( , ). however, a more recent study proposed that pangolins may be the most likely reservoir hosts, owing to the identification of closely related sars-covs in this species as well ( ). both animals can harbor coronaviruses related to sars-cov- . however, direct evidence of the transmission of sars-cov- from either bats or pangolins to humans is still missing.
prior to this study, several publications have suggested that sars-cov- may have originated from the genome recombination of sars-like covs from different animal hosts, as evidenced by the discrepant clustering patterns of phylogenies built from different genetic regions. lu ( ) first observed that the rbd of the s protein in sars-cov- is more closely related to human sars-cov, whilst the other part of its genome is more similar to bat sars-cov. later, peng ( ) identified a bat cov_ratg and several pangolin sars-covs that are consistently closer to sars-cov- than human sars-cov in either the full s protein or the rbd. by combining the data from these two studies, our study confirmed the observations reported in both, and further determined that the s protein recombination actually happened between human sars-cov and a bat sars-cov, well before its transmission to humans, with the newly identified bat sars-cov-ratg as an intermediate. another notable finding in this study is the identification of an amino acid mutation in the rbd of the s protein in sars-cov- . most importantly, we showed that this amino acid mutation is very likely to cause a reduced binding affinity to the human ace receptor. the rbd of the s protein binds to a receptor in host cells and is responsible for the first step of cov infection. the receptor binding affinity of the rbd directly affects the virus transmission rate. thus, it has been the major target for antiviral vaccine and therapeutic development, such as for sars ( ). although the s protein gene seems to be more conserved than the other protein-encoding genes in the sars-cov- genome, our study provides direct evidence that a mutated version of the sars-cov- s protein with a varied transmission rate may have already emerged. based on the close relationship of sars-cov- to sars, current vaccine and drug development for sars-cov- has also focused on the s protein and its human binding receptor ace ( , ).
thus, the observation in this study raised the alarm that a sars-cov- mutation with a varied epitope profile could arise at any time, which means current vaccine development against sars-cov- is at great risk of becoming futile. because the receptor recognition mechanism seems to be highly conserved between sars-cov- and sars-cov, which have been shown to share the common human cell receptor ace , one suggestion for the next step of therapeutic development is to focus on the identification of potential human ace receptor blockers, as suggested in a recent commentary ( ). this approach will avoid the above-

key: cord- -yfn vaan authors: meirson, tomer; bomze, david; markel, gal title: structural basis of sars-cov- spike protein induced by ace date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: yfn vaan

motivation: the recent emergence of the novel sars-coronavirus (sars-cov- ) and its international spread pose a global health emergency. the viral spike (s) glycoprotein binds the receptor angiotensin-converting enzyme (ace ) and promotes sars-cov- entry into host cells. the trimeric s protein binds the receptor using the distal receptor-binding domain (rbd), causing conformational changes in the s protein that allow priming by host cell proteases. unravelling the dynamic structural features used by sars-cov- for entry might provide insights into viral transmission and reveal novel therapeutic targets. using structures determined by x-ray crystallography and cryo-em, we performed structural analysis and atomic comparisons of the different conformational states adopted by the sars-cov- -rbd. results: here, we determined the key structural components induced by the receptor and characterized their intramolecular interactions. we show that κ-helix (also known as polyproline ii) is a predominant structure in the binding interface and in facilitating the conversion to the active form of the s protein.
we demonstrate a series of conversions between switch-like κ-helix and β-strand, and conformational variations in a set of short α-helices which affect the proximal hinge region. these conformational changes lead to an alternating pattern in conserved disulfide bond configurations positioned at the hinge, indicating a possible disulfide exchange, an important allosteric switch implicated in viral entry of various viruses, including hiv and murine coronavirus. the structural information presented herein enables us to inspect and understand the important dynamic features of sars-cov- -rbd and propose a novel potential therapeutic strategy to block viral entry. overall, this study provides guidance for the design and optimization of structure-based intervention strategies that target sars-cov- .

several members of the coronavirus family circulate in the human population and usually manifest mild respiratory symptoms (su, et al., ). however, over the past two decades, emerging coronaviruses (cov) have raised great public health concerns worldwide. the highly pathogenic severe acute respiratory syndrome-related coronavirus (sars-cov) (drosten, et al., ; ksiazek, et al., ) and middle-east respiratory syndrome coronavirus (mers-cov) (zaki, et al., ) have crossed the species barrier and cause deadly pneumonia in afflicted individuals, sars and mers, respectively. sars-cov first emerged in humans in guangdong province of china in , and its global spread was associated with deaths among , cases (de wit, et al., ; who, ). in , mers-cov was first identified in the arabian peninsula and spread to countries, infecting a total of , people and claiming lives (who, ). while the sars pandemic was finally stopped by conventional control measures, including patient isolation and travel restrictions, new cases of mers have been reported (yoon and kim, ).
in december , a previously unknown cov, named sars-cov- , was discovered in wuhan, hubei province of china (huang, et al., ; zhu, et al., ) . the sudden emergence of the novel sars-cov- has rapidly evolved into a pandemic that posed a serious threat to global health and economy (gates, ) . although sars and mers have a higher mortality rate, sars-cov- infection spreads much more rapidly . as of may , more than , , confirmed infections were reported in countries, including over , deaths (who, ) . mers-cov, sars-cov, and sars-cov were suggested to originate from bats and most likely serve as a reservoir host for these viruses (ge, et al., ; haagmans, et al., ; li, et al., ; memish, et al., ; zhou, et al., ) . detailed investigations of the zoonotic origin of human covs indicate that sars-cov was transmitted from palm civets to humans and mers-cov from dromedary camels to humans (guan, et al., ; haagmans, et al., ; kan, et al., ) . however, the intermediate host for zoonotic transmission of sars-cov- , linked to a wet animal market in wuhan, is still under investigation (walls, et al., ; ye, et al., ) . the spike (s) glycoprotein is a class i fusion protein that mediates the entry of covs into target cells (tortorici, et al., ) . the s protein forms homotrimers that protrude from the viral surface ( figure a ) and comprises two functional subunits, which facilitate viral attachment to the surface of host cells (s subunit) and fusion of the viral and cellular membranes (s subunit) ( figure b) . the distal part of s subunit, the receptor-binding domain (rbd), is linked through two anti-parallel hinge linkers which connect the domain to the n-terminal domain (ntd) and c-terminal domain (ctd ) and allow the transition between closed and open conformations (gui, et al., ; song, et al., ; walls, et al., ; yuan, et al., ) (figure a-c) . 
the open conformation of the s subunit facilitates interaction with angiotensin-converting enzyme (ace ) (figure d-e), which contributes to the stabilization of the prefusion state of the s subunit that contains the fusion machinery (song, et al., ; walls, et al., ). activated s protein is cleaved by host proteases at the s /s and s ' site, resulting in a cleaved s ' subunit that drives the fusion of viral and cellular membranes (belouzard, et al., ; millet and whittaker, ; pallesen, et al., ). the critical step in sars-cov- infection, which involves the transition from a metastable prefusion state to a stable post-fusion state, is triggered by binding to ace , which induces conformational changes in the rbd and the hinge region (gui, et al., ; pallesen, et al., ; song, et al., ; walls, et al., ). while the interaction between the s protein and ace has been extensively studied (han, et al., ; lan, et al., ; letko, et al., ; li, ; shang, et al., ; song, et al., ; wan, et al., ; yan, et al., ), the key determinants in the activation of the virus upon binding to the host receptor are poorly understood. we aimed to investigate the structural basis of the sars-cov- s protein modulation induced by virus-receptor interaction.

we searched the pdb database for high-quality structures of the sars-cov- s protein at closed, open, and bound conformations. for each conformation, we used the structure with the lowest resolution that contains the rbd and the hinge region. for the unbound-closed, unbound-open, and bound conformations, we used pdb ids vxx:b, vyb:b, and m j:e, respectively. to model the position of sars-cov- s protein compared to ace , we structurally aligned the models to the rbd of ace-b o at complex (yan, et al., ) ( m ) and sars-cov in the bound conformation with the highest degree of rbd opening (song, et al., ) ( ack). the structures were preprocessed with dock prep and aligned using ucsf chimera v . (pettersen, et al., ).
to compare the secondary structure compositions of attachment proteins, we used structures with the lowest resolution from different families of enveloped viruses with class i fusion proteins (white, et al., ). these structures were compared to a random dataset comprising the first non-viral pdb proteins deposited in with a resolution < Å. the initial assignment of secondary structure was performed using dssp (touw, et al., ). to assign κ-helix (alternative designation for polyproline ii [ppii]), we used a recently introduced method. briefly, we calculated the root-mean-square dihedral deviations (rmsdd) of the peptide backbone torsional angles φ and ψ as a measure of the average deviation from a reference κ-helix. to include short segments of κ-helix, the criterion for assignment was at least two consecutive residues with mean rmsdd below the cutoff (ε) of (mansiaux, et al., ). to analyze the conformational changes induced by ace binding, we compared the bound with the unbound form. we included both the open and closed states of the unbound structures to highlight the specific effects induced by the interaction with ace . the backbone dihedral angles (φ, ψ) were converted to generic helix parameters ϑ (angular step per residue) and d (rise per residue) as described by miyazawa (miyazawa, ) to describe the geometrical variations intuitively. van der waals (vdw) intramolecular interactions were determined using a distance cutoff of . Å. only interactions that were gained or lost by the bound compared to both unbound conformations were considered. to identify interacting residues with considerable shifts, displacements smaller than . Å were excluded. to identify important backbone hydrogen bonds (h-bonds), the mean difference in the electrostatic interaction energy between the bound and the unbound conformations was calculated using dssp.
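the rmsdd-based κ-helix assignment described above can be illustrated with a minimal sketch. it is not the authors' implementation (they report using bio d in r). the reference angles (φ = -75°, ψ = 145°, commonly cited values for ppii), the function names, and the reading of the criterion as "runs of consecutive residues each below the cutoff" are all assumptions made for illustration; the cutoff ε itself is left as a required parameter because its value is not recoverable from the text.

```python
import math

# Assumed canonical PPII reference angles in degrees (not stated in the text).
REF_PHI, REF_PSI = -75.0, 145.0


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rmsdd(phi, psi, ref_phi=REF_PHI, ref_psi=REF_PSI):
    """Root-mean-square dihedral deviation of one residue from the reference."""
    d_phi = angle_diff(phi, ref_phi)
    d_psi = angle_diff(psi, ref_psi)
    return math.sqrt((d_phi ** 2 + d_psi ** 2) / 2.0)


def assign_kappa(dihedrals, cutoff, min_len=2):
    """Return inclusive (start, end) index ranges of putative kappa-helix.

    dihedrals: sequence of (phi, psi) pairs in degrees, one per residue.
    cutoff:    RMSDD threshold (epsilon); hypothetical, supplied by the caller.
    min_len:   minimum run length; the text requires at least two residues.
    """
    segments = []
    start = None
    for i, (phi, psi) in enumerate(dihedrals):
        if rmsdd(phi, psi) < cutoff:
            if start is None:
                start = i  # open a new run
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None  # close the run
    # Close a run that extends to the last residue.
    if start is not None and len(dihedrals) - start >= min_len:
        segments.append((start, len(dihedrals) - 1))
    return segments
```

wrapping the angular difference before squaring matters because φ and ψ are periodic: 170° and -170° differ by 20°, not 340°.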
for each acceptor-donor pair, gain or loss of an h-bond was defined if the mean energy difference is <- kcal/mol or > kcal/mol, respectively. the magnitude of the energy threshold represents twice the standard cutoff (- . kcal/mol) for the existence of an h-bond (zhang and sagui, ). structural analysis was performed with the bio d package (grant, et al., ) in r version . . due to inconsistencies in the reported composition of disulfide bonds in the rbd in different structures (lavillette, et al., ; song, et al., ; walls, et al., ; wrapp, et al., ; yuan, et al., ) and to account for possible errors in modeling disulfide bonds (carpentier, et al., ; kleywegt and jones, ; villa and lasker, ; wlodawer, et al., ), we used disulfide by design . (dbd ) to calculate χ torsion angles and bond energies (craig and dombkowski, ). the dbd algorithm can accurately predict the chiralities and positions of disulfide bonds based on an energy function, reflecting the geometric characteristics of disulfide bonds among high-quality crystal structures (craig and dombkowski, ; wiedemann, et al., ). we used the estimated disulfide energy threshold of < . kcal/mol that applies to most naturally occurring disulfides, χ angles of - or ± (craig and dombkowski, ) and disulfide bond distance of . ± % (spek, ) to indicate a high probability of a disulfide bond that satisfies stereochemical constraints. the models were visualized using ucsf chimerax v . (goddard, et al., ). models with reassigned secondary structures were visualized using the academic version of schrodinger maestro v . (bell, et al., ). κ-helices and -helices were represented as ribbons and tubes, respectively. ribbons were drawn passing through the α-carbons. trajectories were produced by interpolating between the bound and unbound-closed conformation and visualized using vmd (humphrey, et al., ).
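the geometric part of the disulfide screening described above (torsion angle across the s-s bond and s-s distance) can be sketched as follows. this is an illustrative sketch only, not the dbd algorithm, and it omits the energy function entirely. the function names are hypothetical, the torsion is assumed to be the cβ-sγ-sγ-cβ dihedral, and every threshold is a caller-supplied parameter because the numeric values are not recoverable from the text.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    b1_len = math.sqrt(_dot(b1, b1))
    b1_hat = tuple(c / b1_len for c in b1)
    x = _dot(n1, n2)
    y = _dot(b1_hat, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


def disulfide_geometry(cb1, sg1, sg2, cb2):
    """Torsion across the S-S bond (CB-SG-SG-CB) and S-S distance."""
    return {"chi3": dihedral(cb1, sg1, sg2, cb2),
            "ss_distance": math.dist(sg1, sg2)}


def is_plausible(geom, chi3_targets, chi3_tol, dist_target, dist_tol_frac):
    """Check geometry against caller-supplied (hypothetical) thresholds.

    chi3_targets: allowed torsion angles in degrees, e.g. one per chirality.
    chi3_tol:     allowed deviation from a target angle, in degrees.
    dist_target:  expected S-S distance; dist_tol_frac is a fractional tolerance.
    """
    def wrapped(a, b):
        return (a - b + 180.0) % 360.0 - 180.0

    chi3_ok = any(abs(wrapped(geom["chi3"], t)) <= chi3_tol
                  for t in chi3_targets)
    dist_ok = abs(geom["ss_distance"] - dist_target) <= dist_tol_frac * dist_target
    return chi3_ok and dist_ok
```

the sign of the returned torsion distinguishes the two chiralities of a disulfide bond, which is the property the text refers to when it reports a switch in chirality. the target angles, tolerances and distance should be taken from the cited sources rather than from this sketch.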
to characterize the structural composition of the rbd of sars-cov- and other members of the class i fusion proteins, we calculated the distribution of secondary structure assignment. since κ-helix (ppii) often serves functional purposes in proteins (adzhubei, et al., ) and dssp does not assign the conformation, we performed reassignment of the secondary structures to reveal potential κ-helix conformations. compared to a random dataset in the pdb, structures of the attachment proteins of covs, influenza, measles, hiv and ebola viruses display higher proportions of β-strands and κ-helices, whereas α-helix is under-represented (figure a). among non-regular secondary structures, the random coil is the most over-represented assignment. the reassigned sars-cov- structure reveals a diverse distribution of κ-helices throughout the domain, whereas other secondary structures are clustered more closely together (figure b-c). the most common secondary structure in the ace binding interface is κ-helix (figure c). to gain insights into the effects of ace interaction on the s protein of sars-cov- , we analyzed the intramolecular structural variations in the bound versus the unbound-closed and unbound-open conformations. figure depicts a summary of the main differences in the intramolecular interaction and h-bond profile upon receptor binding and allows one to follow the interconnectivity path leading to the hinge region at the termini. while small changes in the rotational angles or rise of residues occur throughout the domain (figure a), the most substantial changes happen at non-regular secondary structures such as coils and turns. compared with the unbound-open state, the bound conformation is mainly associated with the formation of κ-helices ( residues), followed by α-helices ( residues), β-strands ( residues), and a -helix ( residues). the interaction map demonstrates a rich rearrangement of intramolecular interactions (figure a).
interactions are gained mostly in missing loops at the binding interface ( - ), whereas lost interactions are scattered throughout the domain (figure b). consequently, a new h-bond is formed between n and g , which converts ' into α . the adoption of a shorter-pitched α-helix pulls κ , which constructs the n-terminus of the hinge region. the main-chain h-bond of f with g and the π-π interaction with f are lost (figure c) while a stronger pairing of h-bond forms between g and y (figure b). the rearrangement of the interactions is associated with conformational changes, including the conversion between κ ' to β , which constructs the c-terminus of the hinge region. also, the loop downstream to y is displaced, and α-helix α is formed by a contribution of h-bond between the backbones of p and n . the h-bond between l and is lost, and the new α packs closely together with α , which includes vdw interactions between s and t (figure d). notably, this interaction is among the only gained contacts observed in the rbd, not involving the missing loops (figure a). also, a concentration of h-bonds is re-distributed along α , repositioning the α-helix one step back (from - to - ). this transition is associated with a small movement of β at the medial region of the hinge. the κ /κ loop between residues and exhibits a relatively large conformational change between the unbound-closed and the bound states (supplementary figure ). this loop is involved in the interaction between the rbd and ctd in the closed conformation. however, in the open conformation where rbd dissociates from ctd , the loop is missing in both sars-cov and sars-cov- (song, et al., ; walls, et al., ; wrapp, et al., ). the dissociation between the domains is required for the hinge and the rbd to move freely. together, this indicates that the rbd-ace complex stabilizes the κ /κ loop in a conformation that disfavors the interaction between rbd and ctd .
the rbd-ace complex inducible interactions culminate in the hinge region. the hinge, which we show to exhibit various conformational changes (figure and ), contains two pairs of cysteines. since cysteine pairs have the potential to act as allosteric switches (bekendam, et al., ; butera, et al., ; chiu and hogg, ), we hypothesized they are altered during the interaction between the s protein and ace receptor. therefore, we calculated the energy and geometrical features of the potential disulfide bonds in all high-quality structures of the rbd in sars-cov- . due to gross inconsistencies between disulfide assignments (lavillette, et al., ; song, et al., ; walls, et al., ; wrapp, et al., ; yuan, et al., ) and quality issues of these pairs, we also characterized quality metrics of the cysteine pairs. a summary of the conformational changes between the unbound and the bound rbd states is shown as an interpolated trajectory in figure a. also, a simplified model linking the distal part and the hinge is depicted in figure b.

the recent emergence of the sars-cov- pandemic represents a major epidemiological challenge. ace has been reported to be the receptor that initiates the activation of this novel cov (hoffmann, et al., ; yan, et al., ). in this study, we determined the key structural components induced by the receptor and characterized their intramolecular interactions. numerous structures of the prefusion human cov s proteins were determined at different states, and the key regions responsible for the interaction with the receptor were previously reported (lan, et al., ; shang, et al., ; walls, et al., ; wan, et al., ; wrapp, et al., ; yan, et al., ). however, to our knowledge, no previous study has extensively investigated the mechanism of transduction through the rbd. the structural transduction mechanism on a molecular level remains a difficult question to address experimentally.
therefore, we used current state-of-the-art structures of the sars-cov- -rbd and focused on their structural organization. these structures represent snapshots of the dynamic s protein and allow the conformational transitions between the different states to be tracked and modeled. characterizing the secondary structures of proteins is fundamental for gaining knowledge and simplifying the complicated d structures. we show that κ-helix is a predominant structure in the binding interface and in facilitating the conversion to the active form of the s protein. this conformation, which is commonly known as ppii, was recently designated as κ-helix, following the widespread criticism of the misleading name ppii (adzhubei, et al., ; hollingsworth, et al., ; mansiaux, et al., ; martin, et al., ). as many structures contain few prolines or none, the name 'polyproline' is considered inappropriate, and a more general term which abides the tradition of latin letters to secondary structures was proposed. the role of κ-helix in propagating interactions and facilitating switch-like components coincides with the assessment that they represent 'functional blocks' as compared with other conformations such as α-helices that often represent structural building blocks (adzhubei, et al., ). the flexible and extended conformation of κ-helices, as well as their non-regular h-bonds and preferred location on the surface of proteins, make them ideal elements for a wide range of molecular interactions (cubellis, et al., ; stapley and creamer, ; zagrovic, et al., ). however, despite being more common than most secondary structures, this conformation is often overlooked, apart from proline-rich regions. this is explained in part because it is not defined by h-bonds and is not assigned by the secondary structure assignment program employed in the pdb. other reasons include a lack of graphical representation and its misleading historical name.
our findings demonstrate that the high prevalence of κ-helix, as well as β-strands, are not unique to sars-cov- and appear to characterize other viruses. the conformational changes between different states of the sars-cov- -rbd are associated with a typical transition between β-strand and κ-helix, as they are closely related in the torsional space (hollingsworth and karplus, ; mansiaux, et al., ; oh, et al., ) . therefore, we hypothesize that κ-helix could serve as an efficient evolutionary tool due to its flexible nature, which could adapt more quickly in the dynamic environment compared to more restricted secondary structures. in line with this suggestion, austin et al. showed evolutionary conservation of κ-helix bias in intrinsically disordered regions that could be tuned by changing the distribution of κ-helix for multiple functions, including molecular recognition or allosteric regulation (austin elam, et al., ) . the hinge region was reported to facilitate the rbd motion and participate in the activation process (gui, et al., ; pallesen, et al., ; song, et al., ; walls, et al., ) , and our structural analysis suggests that the conformational changes culminate at the hinge which contains four highly conserved cysteines (shang, et al., ; . to explore a possible allosteric switching mechanism, we performed atomic comparisons of the cysteine pairs at different states of the s protein. since inconsistencies in disulfide assignment exist and errors in structure determination are not uncommon (carpentier, et al., ; kleywegt and jones, ; villa and lasker, ; wlodawer, et al., ) , we also assessed their geometric quality. atomic details of the structures at a ph range of . - . reveal alternating patterns in bond energy, geometric characteristics, and quality, between the pairs cys -cys and cys -cys , but not in cys - . nonetheless, cys - displays a switch in the chirality of the disulfide bond in the bound state. 
these findings demonstrate a switching mechanism between disulfide bonds of cys -cys and cys -cys , where at each state (open, closed, or bound), only one pair of cysteines satisfies favorable disulfide configuration and quality criteria. the unfavorable configuration is associated with unphysical disulfide bond characteristics, high energy, poor quality, or their combination. this indicates that the distorted disulfide bonds entail substantial stress or that bond assignments were inaccurate, and these pairs are reduced. both are possible as the cysteine pairs, located at the hinge, undergo significant conformational changes, and forced stretching of disulfide bonds is known to accelerate their cleavage (zhou, et al., ). the four cysteine residues are adjacent and aligned suitably for disulfide exchange reactions. such an arrangement makes it possible for a concerted series of disulfide exchange reactions to occur (zhou, et al., ). beyond serving a purely structural role, disulfide bonds can participate in redox reactions and act as allosteric switches controlling protein functions (bekendam, et al., ; butera, et al., ; chiu and hogg, ; zhou, et al., ). specific disulfide exchange reactions depend on a reducing agent such as thioredoxin or protein disulfide isomerase (pdi) (zhou, et al., ). rearrangement of disulfides (disulfide shuffling) can also occur via intra-protein thiol-disulfide exchange reactions without additional agents, which depends on conformational changes (chiu and hogg, ; zhang, et al., ). an increasing number of studies support an essential role for disulfide exchange in the entry of multiple viruses into susceptible cells (stantchev, et al., ). in hiv, the attachment of the gp subunit of the viral envelope (env) to its primary receptor cd induces conformational changes that cause disulfide exchange in a pdi-dependent manner and is obligatory for triggering the membrane-fusion process (fenouillet, et al., ; owen, et al., ; stantchev, et al., ).
more specifically, structural rearrangements of the s protein of cov murine hepatitis virus (mhv) during cell interaction have been reported to affect cell entry using disulfide shuffling in the rbd (gallagher, ; weismiller, et al., ), as our results suggest for sars-cov- . surprisingly, lavillette et al. showed that the sars-cov s subunit is redox insensitive using chemical manipulation of the redox state, in contrast to various viruses including hiv and the cov mhv (fenouillet, et al., ; lavillette, et al., ). however, the study utilized murine leukemia retrovirus (mlv) pseudotyped with s subunit, lacking the s subunit, a system with limited biological relevance as the subunits cooperate and form a tightly packed trimeric structure. furthermore, the subunits remain non-covalently bound after proteolytic s /s cleavage (tortorici, et al., ; walls, et al., ). we propose that targeted redox exchange between conserved cysteine pairs in the s protein could offer a new strategy in the development of high-affinity ligands against sars-cov- , with important therapeutic implications. recently, hati et al. showed, using molecular dynamics simulations, that reducing all disulfide bonds in both ace and sars-cov impairs their binding affinity (bhattacharyay and hati, ). however, more evidence is required to establish the role of redox potential and paired and unpaired cysteines in the s protein during viral entry. also, our study is limited to a computational assessment of structures reconstructed using x-ray and cryo-em, and the implications of the observed structural rearrangements remain to be determined. currently, no efficient antivirals against sars-cov- or other covs are available, and numerous clinical trials are underway (lythgoe and middleton, ). in parallel, efforts continue to develop antivirals and vaccines (chen, et al., ; liu, et al., ).
structure-based design of antivirals that efficiently recognize the target relies on understanding the main structural features, including local structural dynamics. furthermore, developing selective therapies and efficient vaccines against adaptive evolutionary patterns of the virus poses a significant challenge. this challenge is amplified by the persistence of the pandemic and the estimation that sars-cov- might continue to circulate in the population with renewed outbreaks (kissler, et al., ; tse, et al., ; ye, et al., ). our analysis has laid out the major inducible structural features of the sars-cov- -rbd and proposes a new potential therapeutic strategy to block viral entry. overall, this study may be helpful in guiding the development and optimization of structure-based intervention strategies that target sars-cov- .

the authors declare no conflict of interest. gal markel is supported by the samulei foundation grant for integrative immuno-oncology and by the israel science foundation ipmp grant. tomer meirson is supported by the foulkes foundation fellowship for md/phd students.

polyproline-ii helix in proteins: structure and function evolutionary conservation of the polyproline ii conformation surrounding intrinsically disordered phosphorylation sites a substrate-driven allosteric switch that enhances pdi catalytic activity primex and the schrödinger computational chemistry suite of programs activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites impact of thiol-disulfide balance on the binding of covid- spike protein with angiotensin converting enzyme receptor autoregulation of von willebrand factor function by a disulfide bond switch raman-assisted crystallography suggests a mechanism of x-ray-induced disulfide radical formation and reparation the sars-cov- vaccine pipeline: an overview allosteric disulfides: sophisticated molecular structures enabling flexible protein regulation disulfide by design .
the authors would like to thank haya and nehemia lemelbaum for their continuous generous support. key: cord- - xn m authors: heaton, brook e.; trimarco, joseph d.; hamele, cait e.; harding, alfred t.; tata, aleksandra; zhu, xinyu; tata, purushothama rao; smith, clare m.; heaton, nicholas s. title: srsf protein kinases and are essential host factors for human coronaviruses including sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xn m antiviral therapeutics against sars-cov- are needed to treat the pandemic disease covid- . pharmacological targeting of a host factor required for viral replication can suppress viral spread with a low probability of viral mutation leading to resistance. here, we performed a genome-wide screen in human lung epithelial cells to identify potential host therapeutic targets. we report that the kinase srpk , together with the closely related srpk , are jointly essential for sars-cov- replication; inhibition of srpk / with small molecules led to a dramatic decrease (more than , -fold) in sars-cov- virus production in immortalized and primary human lung cells. subsequent biochemical studies revealed that srpk / phosphorylate the viral nucleocapsid (n) protein at sites highly conserved across human coronaviruses and, due to this conservation, even a distantly related coronavirus was highly sensitive to an srpk / inhibitor. together, these data suggest that srpk / -targeted therapies may be an efficacious strategy to prevent or treat covid- and other coronavirus-mediated diseases. 
in december , a novel human coronavirus, now known as sars-cov- , emerged and began causing a human disease termed covid- . since then, a global pandemic has infected millions and caused hundreds of thousands of deaths to date. due to the prevalence and severity of this disease, the development of therapeutic interventions is of the highest importance. much attention has been focused on targeting viral proteins and their associated enzymatic activities. in particular, the virally encoded rna-dependent rna polymerase (rdrp) and the viral proteases are attractive potential targets. remdesivir, the only fda-approved antiviral for sars-cov- , is a nucleoside analogue which targets the viral rdrp and causes premature termination of transcription. the efficacy of this treatment, however, appears limited. in addition to targeting viral proteins directly, other antiviral development strategies attempt to target host factors that the virus requires to complete its lifecycle. relative to their hosts, viruses have dramatically less coding space in their genomes and therefore utilize host proteins to supplement the activities of their own proteins. the major advantages of inhibiting a virus indirectly via an essential host factor are two-fold: ( ) many viruses may utilize the same host protein, so host-directed therapeutics have the potential to be broadly acting, and ( ) while direct targeting of the virus can rapidly select for resistant viral mutants, it is thought to be much more difficult for a viral mutation to overcome inhibition of a co-opted host protein. the development of host-directed covid- therapeutics has been hindered, however, by an incomplete understanding of the host factors required for sars-cov- replication; this is in no small part due to the virus's recent emergence into the human population. 
sgrna sequencing data indicated that the host gene with the highest probability of being required for sars-cov- infection was the serine/arginine-rich protein kinase, srpk (figure b). we did not enrich for ace , presumably due to the numerous lentivirus-integrated copies of ace causing inefficient targeting in our cell line. we chose to move forward by interrogating potential roles for srpk during viral infection, as kinases are generally thought to be good drug targets and identifying drug targets was the initial goal of our screen. since a number of sars-cov- proteins have been reported to be phosphorylated, we took a bioinformatic approach to predict potential srpk phosphorylation sites across the entire sars-cov- genome (figure c, supplemental table ). the viral protein with by far the highest-confidence predicted srpk sites was the viral nucleoprotein (n). phosphoproteomic studies of sars-cov- proteins have reported that some of the predicted srpk phosphorylation sites in the n protein are indeed phosphorylated (fig c). we therefore hypothesized that srpk was likely responsible for the phosphorylation of at least some sites in the n protein. we next wanted to define the degree to which inhibition of srpk -mediated n phosphorylation would affect viral replication, especially since other non-srpk kinases have been predicted to be responsible for sars-cov- protein phosphorylation. we selected the srpk inhibitor srpin (figure d) as it has been reported to inhibit srpk (and srpk with an ~ x higher ic ) without affecting other kinases, including the related serine/arginine-rich protein kinase family members (srpks). treatment of a -ace cells with increasing concentrations of srpin revealed significant toxicity at a concentration of µm; lower concentrations, however, were well tolerated (figure e). we next infected cells treated with non-toxic concentrations of srpin with sars-cov- and allowed multicycle viral growth for hours. 
we observed a dramatic srpin -mediated decrease in viral rna and infectious viral titer (up to , -fold) at the higher drug concentrations (figure f,g) . the stronger viral inhibitory phenotype at higher concentrations of srpin suggested that srpk and srpk (which are known to have similar substrate specificities , ) likely need to be inhibited simultaneously for full suppression of viral replication. to further test that hypothesis, we made use of another small molecule (sphinx ) that targets srpk without appreciably affecting srpk . consistent with our previous data, while we could significantly inhibit sars-cov- replication by targeting srpk alone, the magnitude of the effect was smaller than with srpin (supplemental figure ) . finally, in order to provide direct evidence that both srpk and srpk could indeed phosphorylate the sars-cov- n protein, we performed in vitro phosphorylation assays with purified n, srpk or srpk , and -p atp. in a dose dependent manner, increasing concentrations of the kinases led to increased n protein phosphorylation ( figure h,i) . together, these data indicate that while srpk is the dominant kinase (and thus most table ). we therefore wanted to determine if the viral requirement for srpk / activity might be broadly conserved as well. srpin treatment of cells infected with the alphacoronavirus e (which is only distantly related to betacoronavirus sars-cov- ) inhibited the virus by more than , -fold at non-toxic concentrations of the drug ( figure g-h). these data indicate that the requirement for srpk / activity is not restricted to further, srpk is upregulated in multiple cancers including prostate, breast, lung, and glioma . in particular, it is thought that its regulation of vascular endothelial growth factor (vegf) is central to its importance for a number of different diseases . 
as such, a number of different inhibitors of srpk have been developed and used in various pre-clinical disease models; however, none of these compounds has yet been fda approved. in addition to its normal physiological roles and potential roles in cancer, srpk / kinase activity has been previously reported as important for the replication of have been reported to be suppressed after treatment with gsk- inhibitors, presumably by affecting n protein phosphorylation. further studies will be required to understand how phosphorylation of sars-cov- proteins affects their functions, as well as to experimentally define which kinases phosphorylate which viral proteins at which specific locations. it will also be important to define potential kinase redundancy and the effects on viral replication. at least for sars-cov- , it appears that srpk / are central to the regulation of viral infection, but it is worth noting that although our study has focused on the phosphorylation of the n protein, we cannot rule out that the viral inhibition phenotype observed is also or instead due to altered phosphorylation of other viral, or indeed host, (thermo fisher cat # ) and total rna was isolated by either phase separation with chloroform and isopropanol or using the zymo direct-zol rna miniprep kit (zymo cat #r ). one-step qrt-pcr was performed using the invitrogen express one-step superscript qrt-pcr kit (thermo fisher cat # ) and the commercial millipore, cat# ab ) and sars-cov- ( : , genetex cat# gtx ) in blocking buffer at °c overnight. membranes were then washed times in pbst, incubated with secondary antibodies in blocking buffer for h at room temperature, followed by three washes with pbst, and mounted using fluor g reagent with dapi. all confocal images were collected using an olympus confocal microscope fv with a x objective. 
covid- pathophysiology: a review early transmission dynamics in wuhan, china remdesivir is a direct-acting antiviral that inhibits rna- dependent rna polymerase from severe acute respiratory syndrome coronavirus with high potency remdesivir for the treatment of covid- -preliminary report. genome-wide crispr screen reveals host genes that regulate sars-cov- infection. biorxiv severe acute respiratory syndrome coronavirus from patient with coronavirus disease, united states improved vectors and genome-wide libraries for crispr screening mageck enables robust identification of essential genes from genome-scale crispr/cas knockout screens protein kinases--the major drug targets of the twenty-first century? gps . : an update on the prediction of kinase-specific phosphorylation sites in proteins characterisation of the transcriptome and proteome of sars-cov- reveals a cell passage induced in-frame deletion of the furin-like cleavage site from the spike glycoprotein the global phosphorylation landscape of sars-cov- growth factor receptor signaling inhibition prevents sarscov- replication. 
biorxiv utilization of host sr protein kinases and rna-splicing machinery during viral replication srpk : a differentially expressed sr protein-specific kinase involved in mediating the interaction and localization of pre-mrna splicing factors in mammalian cells development of potent, selective srpk inhibitors as potential topical therapeutics for neovascular eye disease structure of the sars coronavirus nucleocapsid protein rna- binding dimerization domain suggests a mechanism for helical packaging of viral rna the coronavirus nucleocapsid is a multifunctional protein pathogenesis of covid- from a cell biology perspective splicing kinase srpk conforms to the landscape of its sr protein substrate regulation of sr protein phosphorylation and alternative splicing by modulating kinetic interactions of srpk with molecular chaperones the many faces of srpk srpk inhibition in vivo: modulation of vegf splicing and potential treatment for multiple diseases serine-arginine protein kinase regulates ebola virus transcription inhibition of hepatitis c virus replication by a specific inhibitor of serine-arginine-rich protein kinase regulation of the subcellular distribution of key cellular rna-processing factors during permissive human cytomegalovirus infection human papillomavirus type e ^e protein is a potent inhibitor of the serine-arginine (sr) protein kinase srpk and inhibits phosphorylation of host sr proteins and of the viral transcription and replication regulator e molecular mechanism of sr protein kinase inhibition by the herpes virus protein icp suppression of hepatitis b virus replication by srpk and srpk via a pathway independent of the phosphorylation of the viral core protein hepatitis b virus core protein phosphorylation: identification of the srpk target sites and impact of their occupancy on rna binding and capsid structure identification of srpk and srpk as the major cellular protein kinases phosphorylating hepatitis b virus core protein phosphorylation of 
the arginine/serine dipeptide-rich motif of the severe acute respiratory syndrome coronavirus nucleocapsid protein modulates its multimerization, translation inhibitory activity and cellular localization the severe acute respiratory syndrome coronavirus nucleocapsid protein is phosphorylated and localizes in the cytoplasm by - - -mediated translocation glycogen synthase kinase- regulates the phosphorylation of severe acute respiratory syndrome coronavirus nucleocapsid protein and viral replication serine-arginine protein kinase (srpk ) inhibition as a potential novel targeted therapeutic strategy in prostate cancer an orally bioavailable broad-spectrum antiviral inhibits sars-cov- in human airway epithelial cell cultures and multiple coronaviruses in mice sars-cov- mrna was quantified by qrt-pcr n= . (e) calu- cells were treated with infectious virus was quantified by plaque assay n= . (f) diagram of the n protein of multiple red lines within protein indicate predicted srpk phosphorylation sites. (g) huh cells were treated with different concentrations of srpin and cytotoxicity was assayed with the celltiter-glo assay n= . (h) huh cells were treated with dmso or µm srpin then infected with e moi . . viral rna diagram of primary type ii pneumocyte cell culturing and infections. (j) primary type ii pneumocytes were isolated from different donors, cultured on transwell membranes, treated with dmso or µm srpin and then infected at moi= . after hrs cells were fixed and stained for type scale bar= µm. (k) microscopy images from (j) were quantified and % positive cells from four independent donors were plotted, n> . for all panels, error bars represent sem and *p< . **p< . 
by an unpaired key: cord- -q yqnlyl authors: armijos-jaramillo, vinicio; yeager, justin; muslin, claire; perez-castillo, yunierkis title: sars-cov- , an evolutionary perspective of interaction with human ace reveals undiscovered amino acids necessary for complex stability date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: q yqnlyl the emergence of sars-cov- has resulted in more than , infections and nearly , deaths globally so far. this novel virus is thought to have originated from an animal reservoir, and acquired the ability to infect human cells using the sars-cov cell receptor hace . in the wake of a global pandemic it is essential to improve our understanding of the evolutionary dynamics surrounding the origin and spread of a novel infectious disease. one way theory predicts selection pressures should shape viral evolution is to enhance binding with host cells. we first assessed evolutionary dynamics in select betacoronavirus spike protein genes to predict where these genomic regions are under directional or purifying selection between divergent viral lineages at various scales of relatedness. with this analysis, we determine a region inside the receptor-binding domain with putative sites under positive selection interspersed among highly conserved sites, which are implicated in structural stability of the viral spike protein and its union with human receptor hace . next, to gain further insights into factors associated with coronaviruses recognition of the human host receptor, we performed modeling studies of five different coronaviruses and their potential binding to hace . modeling results indicate that interfering with the salt bridges at hot spot could be an effective strategy for inhibiting binding, and hence for the prevention of coronavirus infections. 
we also propose that a glycine residue at the receptor binding domain of the spike glycoprotein can have a critical role in permitting bat variants of the coronaviruses to infect human cells. the recent emergence of the novel sars coronavirus (sars-cov- ) marked the third introduction of a highly pathogenic coronavirus into the human population in the twenty-first century, following the severe acute respiratory syndrome coronavirus (sars-cov) and the middle east respiratory syndrome coronavirus (mers-cov). the first, sars-cov emerged in november in the guangdong province of china and spread globally during - , infecting more than people and causing deaths (drosten et al., ; who, ) . mers-cov was the second emergence and was first detected in saudi arabia in and resulted in nearly human infections and deaths in countries (fehr et al., ; zaki et al., ) . in december , sars-cov- , a previously unknown coronavirus capable of infecting humans was discovered in the chinese city of wuhan, in the hubei province zhu et al., ) . sars-cov- is associated with an ongoing pandemic of atypical pneumonia, now termed coronavirus disease (covid- ) that has affected over , people with fatalities as of march , (who, . both sars-cov and mers-cov are thought to have originated in colonies of bats, eventually transmitted to humans, putatively facilitated by intermediate hosts such as palm civets and dromedary camels, respectively (cui et al., ) . the genome of sars-cov- shares about % nucleotide identity with that of sars-cov and is % identical to the bat coronavirus batcov ratg genome, reinforcing the probable bat origin of the virus . however, better assessing the evolutionary dynamics of sars-cov- is an active research priority worldwide. sars-cov, mers-cov and sars-cov- belong to the genus betacoronavirus within the subfamily coronavirinae of the family coronaviridae. 
members of this family are enveloped viruses containing a single positive-strand rna genome of - kb in length, the largest known rna virus genome. the coronavirus spherical virion consists of four structural proteins: the spike glycoprotein (s-protein), the envelope protein, the membrane protein and the nucleocapsid. the transmembrane trimeric s-protein plays a critical role in virus entry into host cells (gallagher & buchmeier, ; tortorici & veesler, ). it comprises two functional subunits: the s subunit, where the receptor-binding domain (rbd) is found, is responsible for binding host cell surface receptors, and the s subunit mediates subsequent fusion between the viral and cellular membranes (kirchdoerfer et al., ; yuan et al., ). both sars-cov and sars-cov- interact directly with angiotensin-converting enzyme (ace ) to enter host target cells (hoffmann et al., ; li et al., ; walls et al., ; yan et al., ). in the case of sars-cov, ace binding was found to be a critical determinant for the virus host range, and key amino acid residues in the rbd were identified as essential for ace -mediated sars-cov infection and adaptation to humans (li et al., ; li et al., ). understanding the dynamics that permit a virus to shift hosts is of considerable interest, and is further an essential preliminary step towards facilitating the development of vaccines and the discovery of specific drug therapies. we employ a multidisciplinary approach to look for evidence of diversifying selection on the s-protein gene, and model the interactions between human ace (hace ) and the rbd of selected coronavirus strains, which ultimately afforded us novel insights detailing virus and host cell interactions. given the rapid pace of discovery, we aim to add clarity to the evolutionary dynamics of disease strains by more precisely understanding the dynamics at the s-protein and its interaction with hace . 
the most similar genomes to sars-cov- mn were retrieved using blastp (altschul et al., ) against the nr database of genbank (table ). genomes were then aligned using mauve (darling et al., ) and the s-protein gene was trimmed. the extracted genomic sections were aligned using the translation align option of geneious (kearse et al., ) with a mafft plugin (katoh & standley, ). the phylogenetic reconstruction of s-protein genes was performed with phyml (guindon et al., ), using a gtr+i+g model with non-parametric bootstrap replicates. both the alignment and the tree were used as input for paml codeml (yang, ). the presence of sites under positive selection was tested by the comparison of m (it allows a proportion of positive, neutral and negative selection sites in the alignment) vs m (it allows a proportion of neutral and negative selection sites in the alignment) and m (ω follows a beta distribution plus a proportion of sites with ω> ) vs m (ω follows a beta distribution) models using the ete toolkit (huerta-cepas et al., ). the presence of tree nodes under positive selection was obtained with the free branch model and then tested by the comparison of branch free (a different ω for each selected branch) vs m (negative selection for all sites and branches; null model) and branch free vs branch neutral (ω= for selected branches) models. the presence of sites with positive selection under specific branches of the tree was tested with bsa (proportion of sites with positive selection in a specific branch of the tree) vs bsa (proportion of sites with neutral and purifying selection in a specific branch of the tree) models. a likelihood ratio test (lrt) was performed (p≤ . ) to compare the hypotheses contrasted by each pair of models. 
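The model comparisons above all rest on the likelihood ratio test. The following is a minimal sketch in Python, assuming the log-likelihoods of the null and alternative models have already been read from the codeml output. The function names and the example values are illustrative and are not taken from this study. Nested site-model pairs of the kind described above are conventionally tested against a chi-square distribution with 2 degrees of freedom, while a single extra free ω corresponds to 1 degree of freedom; only these two cases are handled here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        # P(Z^2 > x) for a standard normal Z
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # chi-square with 2 df is an exponential distribution with mean 2
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for a pair of nested models."""
    # The statistic is twice the gain in log-likelihood; clamp tiny negative
    # values that can arise from incomplete optimisation.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods for a null and an alternative site model.
stat, p_value = likelihood_ratio_test(-1234.5, -1229.0, df=2)
print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.4g}")
```

A small p-value here only says that the alternative model fits better; which sites drive the signal still has to be read from the posterior probabilities reported by the site model.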
we also used the set of programs available in hyphy (kosakovsky pond et al., ): fast unconstrained bayesian approximation (fubar) to detect overall sites under positive selection, and fixed effects likelihood (fel) to detect specific sites under positive selection in specific branches. we used the mixed effects model of evolution (meme) to detect episodic positive/diversifying selection and adaptive branch site rel (absrel) to detect branches in the tree under positive selection. the web server datamonkey (weaver et al., ) was used to perform the hyphy analyses. finally, treesaap (woolley et al., ) was used to detect sites under adaptation (in terms of physicochemical properties). the same alignment and tree described above were used for this analysis. all these analyses were repeated using the s-protein genes of a shorter list of accessions more distantly related to sars-cov- (broad dataset: ay , ay , dq , fj , ky , mg , mg , mn , nc_ ) to test the reproducibility of the predicted branches and sites under positive selection. the crystal structure of the sars-cov s-protein rbd (genbank id nc_ ) in complex with hace was retrieved from the protein data bank (code ajf) (berman et al., ). homology models were constructed using this structure as template for the rbds of sars-cov- (sars , genbank id mn ), the bat sars-like coronavirus isolate rm (rm , genbank id dq ) and the bat sars-like coronavirus isolate rs (rs , genbank id ky ). one additional homology model for the g d mutant of the sars-cov- rbd (sars -mut) was constructed. homology models were built with modeller v. (webb & sali, ) using its ucsf chimera interface (pettersen et al., ). five models were constructed for each target sequence and the one with the lowest dope score was selected as the final model. all non-amino acid residues were removed from the sars-cov rbd-hace complex to obtain a clean complex. 
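Choosing the final homology model amounts to taking the minimum over the candidate scores. The following is a minimal sketch, assuming the DOPE scores of the candidates have already been collected into a mapping from model file name to score. The file names and values are hypothetical, and the helper is not part of any modelling package.

```python
def pick_lowest_dope(scores):
    """Return the (model_name, score) pair with the lowest DOPE score."""
    if not scores:
        raise ValueError("no candidate models supplied")
    # DOPE is an energy-like score, so lower (more negative) is better.
    return min(scores.items(), key=lambda item: item[1])

# Hypothetical scores for candidate models of one target sequence.
candidates = {
    "target.B99990001.pdb": -1200.5,
    "target.B99990002.pdb": -1310.2,
    "target.B99990003.pdb": -1250.0,
}
best_name, best_score = pick_lowest_dope(candidates)
print(best_name, best_score)
```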
the homology models of the sars , rm , rs rbds and sars -mut were superimposed onto the sars-cov rbd to obtain their initial complexes with hace . these complexes were then subjected to molecular dynamics (md) simulations and estimation of their free energies of binding using amber (case et al., ). for the latter, ace was considered as the receptor and the rbds as ligands. the protocol described below was employed for all complexes and, unless otherwise noted, default software parameters were used. system preparation was performed with the tleap program of the amber suite. each complex was enclosed in a truncated octahedron box extending Å from any atom. next, the boxes were solvated with tip p water molecules and na+ ions were added to neutralize the excess charge. systems were minimized in two steps, the first of which consisted of steps of the steepest descent algorithm followed by cycles of conjugate gradient with protein atoms restrained using a force constant of kcal/mol.Å . the pme method with a cutoff of Å was used to treat long range electrostatic interactions. during the second minimization step the pme cutoff was set to Å and it proceeded for steps of the steepest descent algorithm followed by cycles of conjugate gradient with no restraints. the same pme cutoff of Å was used in all simulation steps from here on. both minimization stages were performed at constant volume. the minimized systems were heated from to k at constant volume, restraining all protein atoms with a force constant of kcal/mol.Å . the shake algorithm was used to constrain all bonds involving hydrogens and their interactions were omitted from this step on. heating took place for steps, with a time step of fs, and a langevin thermostat with a collision frequency of . ps - was employed. all subsequent md steps utilized the same thermostat settings. afterward, the systems were equilibrated for ps at a constant temperature of k and a constant pressure of bar. 
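The minimization and heating stages described so far map onto a small set of `&cntrl` namelist keywords in an amber input file. The following is a minimal sketch in Python that writes such inputs. Every numeric value is supplied by the caller, and the values and restraint mask in the usage lines are placeholders, not the settings used in this study. The helper names are assumptions made for illustration.

```python
def cntrl_block(title, **settings):
    """Render a title line followed by an &cntrl namelist."""
    lines = [title, " &cntrl"]
    for key, value in settings.items():
        if isinstance(value, str):
            value = f"'{value}'"  # namelist strings are quoted
        lines.append(f"  {key} = {value},")
    lines.append(" /")
    return "\n".join(lines) + "\n"

def restrained_minimization(sd_steps, cg_cycles, cutoff, restraint_wt, mask):
    """Steepest descent followed by conjugate gradient, with restraints."""
    return cntrl_block(
        "restrained minimization",
        imin=1,                       # run an energy minimization
        maxcyc=sd_steps + cg_cycles,  # total number of cycles
        ncyc=sd_steps,                # switch to conjugate gradient after this
        ntb=1,                        # periodic box at constant volume
        cut=cutoff,                   # direct-space cutoff in angstrom
        ntr=1,                        # positional restraints on
        restraint_wt=restraint_wt,
        restraintmask=mask,
    )

def heating(n_steps, dt, temp_start, temp_end, gamma_ln, cutoff, restraint_wt, mask):
    """Constant-volume heating with a Langevin thermostat and SHAKE."""
    # tempi sets the initial velocities and temp0 the thermostat target;
    # a gradual ramp would need an additional &wt section, omitted here.
    return cntrl_block(
        "restrained heating",
        imin=0, nstlim=n_steps, dt=dt,
        ntb=1, cut=cutoff,
        ntc=2, ntf=2,                 # SHAKE on bonds involving hydrogen
        ntt=3, gamma_ln=gamma_ln,     # Langevin thermostat, collision frequency
        tempi=temp_start, temp0=temp_end,
        ntr=1, restraint_wt=restraint_wt, restraintmask=mask,
    )

# Placeholder values only; ":1-100" is a hypothetical residue range.
print(restrained_minimization(500, 500, 10.0, 5.0, ":1-100"))
print(heating(10000, 0.002, 0.0, 300.0, 1.0, 10.0, 5.0, ":1-100"))
```

Keeping the stages as functions of their parameters makes it easy to apply one protocol to every complex, which is what the text describes.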
pressure was controlled with isotropic position scaling with a relaxation time of ps. the equilibrated systems were used as input for ns length production md simulations. the free energies of binding were computed under the mm-pbsa approach implemented in ambertools (case et al., ). a total of md snapshots were evenly selected, one every ps, from the last ns of the production run for mm-pbsa calculations. the ionic strength was set to mm and the solute dielectric factor was set to for all systems.
in order to detect branches and sites under positive/negative selection, two datasets were explored. the first ('closer' dataset) harbors the most similar genomes to wuhan-hu- coronavirus (sars-cov- ) (mn ). for this dataset, several genomes were excluded from the analysis because they showed minimal variation to other sequences. we used a preliminary phylogeny to select a representative isolate of each clade (table ) in order to exclude highly similar sequences. the second dataset ('broad' dataset) includes some accessions of the first dataset plus isolates less related to sars-cov- , like sars-like coronavirus isolates from different countries (see methods). we compare the results of the two datasets because the phylogenetic distance between orthologues in a given dataset has been demonstrated to alter the ability to detect selection in paml and meme (mcbee et al., ). in both datasets, we observed evidence of purifying selection in the majority of nodes of the tree. specifically, in the 'closer' dataset we identified nodes with evidence of negative selection, and under positive selection when the free ratios model of codeml was applied. to confirm the four nodes under positive selection we used the lrt to contrast hypotheses using the branch free, branch neutral and m models of codeml. using these approximations, any node predicted by the free ratios model with ω> was significantly different to the purifying (ω< ) or neutral (ω= ) models. 
an equivalent analysis was performed using absrel of hyphy, observing episodic diversifying selection in at least of nodes of the phylogenetic tree reconstructed with the 'closer' dataset (figure ). interestingly, one of the divisions detected with diversifying selection was the branch that contains sars-cov- , pangolin coronavirus isolate mp and bat coronavirus ratg (called sars-cov- group) but not the specific branch that contains sars-cov- . under positive selection in sars-cov- using the closer dataset without pangolin coronavirus isolate mp . this is interesting despite the influence of the dataset on the results, because site f is directly involved in the hace -rbd interaction, explaining at least in part the strong selection at this site. moreover, the branch-site models bsa (positive selection) and bsa (relaxation) of codeml were compared to find evidence of sites under positive selection in the branch of sars-cov- using the 'closer' dataset. bsa did not differ significantly from bsa (p> . ), indicating that selection cannot be confidently implicated, although it did when other datasets were used (including f ). in summary, we do find evidence of sites under positive/episodic selection in branches of closely related strains of the wuhan-hu- isolate coronavirus. however, there is no strong evidence of specific sites under positive selection in sars-cov- using the tools mentioned in this work. this result does not rule out the presence of positive selection sites in sars-cov- ; nonetheless, it shows the limitations of the methods in identifying with precision specific sites under positive selection in a precise taxon of a phylogenetic tree. we further caution that researchers need to be conservative in interpreting studies that use these methodologies, given that equivocal results can be generated by datasets varying in genetic similarity. 
to complement our analyses looking for evidence of selection among lineages, we specifically analyzed patterns of selection across sites in the s-protein genes using the sites models available in codeml and hyphy. model m of codeml detected . % of sites under positive selection (ω> ) and models m and m detected % of sites under purifying selection (ω< ). model m explains the data significantly better (p= e- ) than the m model, which takes into account only sites with neutral and purifying selection. to resolve these ambiguities in positive selection sites we calculated putative selection sites with codeml (using bayes empirical bayes from the m and m models) and fubar with different datasets reflecting the addition of novel sequences to online repositories (broad, closer, closer without mn and mt and closer without mt ), and we obtained different results. it is becoming increasingly clear that predictions of positively selected sites are highly influenced simply by the diversity of the individual sequences included in the datasets. in any case, the majority of predicted sites converge in the region between to , a section of the rbd. additionally, we used treesaap to detect important changes in biochemical amino acid properties over regions and/or sites along the betacoronavirus s-protein. using a sliding window size of (increasing by ) we detected that the region between to (using the sars-cov- s-protein as a reference) has drastic amino acid changes for alpha-helical tendencies. in addition, the section between residues to registers radical changes in amino acids implicated in the equilibrium constant (ionization of cooh). in the structural analysis we performed, the section between to forms a loop that is not present in certain s-proteins of coronaviruses isolated in bats. 
this loop extends the interaction area between the rbd of the s-protein and human ace ; in fact, the lack of this loop decreases the negative energy of interaction (increasing the binding) between these two molecules (see table ). these results, obtained from independent analyses, strongly highlight the importance of the to section. additionally, important hace -binding residues in the rbd from sars-cov- obtained from the crystallography and structure determination performed by shang et al. ( ) are also present in the section we highlight here. we propose that this region is the most likely to contain the sites under positive selection, based on the predictions of our codeml and fubar models. in that sense, we refer to this section as the region under positive selection (rps). it is important to additionally clarify that even inside the rps we found at least aa highly conserved between coronaviruses, several of which are predicted as sites under purifying selection. this shows that it is necessary to maintain unchanged sites around polymorphic sites, probably to conserve the protein structure and at the same time to retain the ability to colonize more than one host. interestingly, the rps of the pangolin coronavirus isolate mp differs in only one amino acid from the homologous region of sars-cov- , whereas the bat coronavirus ratg (the overall most similar isolate to sars-cov- sequenced at the moment) shows differences in the same region. several explanations could derive from this observation. the hypothesis of recombination inside the pangolin between a native coronavirus strain and a bat coronavirus (like ratg ) is congruent with our observation. this scenario was proposed and discussed as the origin of sars-cov- by lam et al. ( ), wong et al. ( ) and xiao et al. ( ); however, other explanations are possible. 
if the sars-cov- , ratg and pangolin coronavirus mp isolates are closely related, as shown in the tree in figure , we are observing the ancestral sequence of the rps in human and pangolin coronaviruses, and a mutated version in the bat virus. elucidating the origin of sars-cov- is beyond the scope of this work; nevertheless, sequencing of new coronavirus isolates in the near future could resolve this question. with a list of broader observations related to the role of selection across viral genomes in hand, we aimed to specifically understand how these regions could affect virus/host interactions. to understand in more depth the importance of the rps in the evolution of sars-cov- , we quantified the relative importance of this region in the interaction between the rbd and hace . to that end, md simulations were run for five complexes (listed in methods). in all cases the systems were stable, with root mean square deviations (rmsd) of their backbones between . Å and . Å relative to the initial complex structures during the last ns of the production run. we first investigated the network of contacts between the ligands (coronavirus rbds) and the receptor (hace ). overall, all complexes present a large number of contacts between the ligands and the receptor in at least % of the md snapshots selected for mm-pbsa calculations. common interactions with t , f , k , h , y , k , g , d and r of the receptor are observed in all systems. the full networks of interactions between the coronaviruses and the hace receptor are provided as supporting information. next we estimated the free energies of binding of the coronaviruses' rbds to hace ; the results of these evaluations are summarized in table . these calculations show that the sars , sars and rs viruses are predicted to favorably bind to the human hace receptor, while the rm and sars -mut variants present unfavorable free energies of binding. 
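the contact-network criterion described above (contacts present in at least a given fraction of the md snapshots) can be sketched as follows. this is an illustrative example only: the function names, the residue labels and the 0.5 cutoff are hypothetical, and the per-snapshot contact lists are assumed to have been extracted beforehand with a trajectory analysis tool.

```python
from collections import Counter


def contact_occupancy(snapshots):
    """Fraction of snapshots in which each (ligand, receptor) contact occurs.

    snapshots: list of iterables of (ligand_residue, receptor_residue) pairs,
               one iterable per MD snapshot.
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    counts = Counter()
    for contacts in snapshots:
        # set() ensures a contact is counted at most once per snapshot.
        counts.update(set(contacts))
    total = len(snapshots)
    return {pair: n / total for pair, n in counts.items()}


def persistent_contacts(snapshots, min_fraction):
    """Contacts whose occupancy is at least min_fraction, sorted by label."""
    occupancy = contact_occupancy(snapshots)
    return sorted(pair for pair, frac in occupancy.items() if frac >= min_fraction)


# Hypothetical residue labels and cutoff, for illustration only.
toy_snapshots = [
    {("LIG_A", "REC_X"), ("LIG_B", "REC_Y")},
    {("LIG_A", "REC_X")},
    {("LIG_A", "REC_X"), ("LIG_C", "REC_Z")},
    {("LIG_B", "REC_Y")},
]
print(persistent_contacts(toy_snapshots, min_fraction=0.5))
```
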
the fact that the bat coronavirus rs , in addition to sars and sars , presents a favorable interaction with hace is in accordance with the previous observation that it is able to infect human cells expressing this protein (hu et al., ) . to gain more insight into the contribution of the receptor and the rbds to the binding process, we performed energy decomposition experiments. the contributions of the residues in the studied coronaviruses that interact with the hace receptor are shown in table . rows are presented in such a way that each of them contains the residues occupying the same position in the virus rbd structures as in the sars rbd structure. from here on, residue numbering will take that of sars as reference. in general, most rbd residues show negative values of contribution to the free energies of binding to the human receptor. all studied rbds, except that of the rm coronavirus, have amino acids with large favorable contributions to the free energies of binding that directly interact with hace : k of sars and sars -mut, r in sars and r in rs . on the other hand, the g d mutation (d present in bat coronavirus strains) has a negative contribution to the binding of the rbd to hace . this site was predicted to be under purifying selection by fubar analyses, and is located within the rps. strikingly, the g d mutation (sars numbering) has a large negative influence on the free energy of binding in the two complexes that contain it. it is also worth noting that the three aspartic acid substitutions present in all systems contribute negatively to system stability. taking into account that the only difference between sars and sars -mut is the g d mutation, we postulate that this rbd position is critical for human receptor recognition by coronaviruses. to the best of our knowledge, no coronavirus having aspartic acid at this position is able to infect human cells. 
this result supports the prediction from fubar analyses indicating that the site g d is under purifying selection. combined, our results strongly suggest that the mutation of the d residue present in the coronaviruses from bats is critical for their rbds to recognize the human hace receptor. this also shows the importance of sites under purifying selection in the rps for rbd evolution. to better interpret the influence of the key interactions between the coronavirus rbds and their hace receptor, these interactions were analyzed. to select the representative structure of each system, the md snapshots employed for mm-pbsa calculations were clustered. then, the representative structure of a system was selected as the centroid of the most populated cluster. the predicted rbd-hace complexes for sars , sars and sars -mut are depicted in figure . many studies have focused on coronavirus mutations that favor adaptation to human hosts. for example, it has been shown that specific substitutions at positions , , , and ( , , , and in sars) of the rbd of sars favor the interaction between the rbd of sars and hace (cui et al., ) . likewise, homology modeling studies found favorable interactions between the residues occupying these positions in the sars rbd and the human receptor . the cornerstone of these favorable interactions is the complementarity of the rbds with hot spots and . these are salt bridges between k and e and between d and k of ace , which are buried in a hydrophobic environment (see figure ). in the cases of sars and sars, q (n in sars) and n (t in sars) add support to the hot spots according to these previous studies. these observations should also hold for the rs strain; however, the n a change in the latter compared to sars (a in rs ) adds little support to hot spot . 
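the representative-structure selection described above (clustering the snapshots, then taking the centroid of the most populated cluster) can be illustrated with a short sketch. it assumes that cluster labels and a pairwise distance matrix (for example, pairwise rmsd) are already available, and it interprets "centroid" as the medoid, i.e. the cluster member with the smallest summed distance to the other members; both the function name and that interpretation are assumptions, not a description of the software the authors used.

```python
from collections import Counter


def representative_snapshot(labels, dist):
    """Index of the medoid of the most populated cluster.

    labels: cluster label for each snapshot (list of hashable values).
    dist:   symmetric matrix (list of lists) of pairwise distances,
            e.g. pairwise RMSD between snapshots.
    """
    if not labels or len(labels) != len(dist):
        raise ValueError("labels and dist must be non-empty and the same size")
    # Most populated cluster (first one encountered wins on ties).
    largest_label, _ = Counter(labels).most_common(1)[0]
    members = [i for i, lab in enumerate(labels) if lab == largest_label]
    # Medoid: member minimising the summed distance to all other members.
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Toy data, for illustration only: snapshots 0-2 form the largest cluster.
toy_labels = [0, 0, 0, 1]
toy_dist = [
    [0.0, 1.0, 3.0, 9.0],
    [1.0, 0.0, 1.0, 9.0],
    [3.0, 1.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]
print(representative_snapshot(toy_labels, toy_dist))
```

choosing an actual snapshot (a medoid) rather than an averaged structure keeps the representative physically realistic, which matters when the structure is later inspected for salt bridges and contacts.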
in this case, to continue permitting human infection, the large favorable contribution of r in rs to the free energy of binding could compensate for the weak support provided by a to hot spot . interestingly, k is the residue forming the largest network of contacts with the analyzed rbds among those belonging to both hot spots. our simulations also show that in sars and sars the rbd amino acids with the largest contribution to the free energy of binding, k and r (see table ) respectively, do not interact with any hot spot residue. instead, they interact with d of hace in the sars complex and with e of the human receptor in the sars complex. this could indicate that interactions additional to those previously identified with the hace hot spots could be critical for the stabilization of the rbd-human receptor complexes. finally, we analyzed the possible reasons for the predicted negative impact that the g d mutation has on the predicted free energies of binding of the rbd to hace . as depicted in figure , g directly interacts with k in hot spot and its mutation interferes with the d -k salt bridge. specifically, d of the rbd points toward d of hace , yielding strong electrostatic repulsion between these amino acids. consequently, this portion of the rbd is pushed to a position further from hace than that observed in the wild-type receptor, resulting in the reduction of its network of contacts with k . as a result, the binding of the rbd to hace is considerably inhibited and unlikely to occur. a priority in ongoing research is to better understand coronavirus evolution, with specific interest in understanding the role of selection pressures in viral evolution, and in clarifying how viral strains can infect novel hosts. our experiments suggest that there are sites under positive selection in the s-protein gene of sars-cov- and other betacoronaviruses, particularly in a region that we called rps (region under positive selection) inside the rbd. 
however, we have identified that, by and large, sites in this region (and overall, in the s-protein gene) are under purifying selection. particularly, for the site d g, the presence of aspartic acid seems indispensable for the interaction with the hace . additionally, we performed md simulations and free energies of binding predictions for five different complexes of coronaviruses that do and do not infect human cells. our results suggest that as long as no disrupting interference occurs with both salt bridges at hot spots and , coronaviruses are able to bind to hace . modeling results suggest that interference with the hot spot could be an effective strategy for inhibiting the recognition of the rbd of the sars-cov- spike protein by its human host receptor ace and hence prevent infections. although additional simulations and experiments are required, all evidence suggests that the mutation of d in the bat variants of the coronaviruses permits infection of human cells. given the large contribution of sars k to the free energy of binding of the rbd to hace , we propose that blocking its interaction with the receptor d could be a promising strategy for future drug discovery efforts. gapped blast and psi-blast: a new generation of protein database search programs the -new coronavirus epidemic: evidence for virus evolution the protein data bank origin and evolution of pathogenic coronaviruses mauve: multiple alignment of conserved genomic sequence with rearrangements identification of a novel coronavirus in patients with severe acute respiratory syndrome middle east respiratory syndrome: emergence of a pathogenic human coronavirus coronavirus spike proteins in viral entry and pathogenesis new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml . . 
systematic biology sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus clinical features of patients infected with novel coronavirus in ete: a python environment for tree exploration more effective purifying selection on rna viruses than in dna viruses mafft multiple sequence alignment software version : improvements in performance and usability geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data pre-fusion structure of a human coronavirus spike protein hyphy . -a customizable platform for evolutionary hypothesis testing using phylogenies identification of -ncov related coronaviruses in malayan pangolins in southern china animal origins of the severe acute respiratory syndrome coronavirus: insight from ace -s-protein interactions angiotensin-converting enzyme is a functional receptor for the sars coronavirus receptor and viral determinants of sars-coronavirus adaptation to human ace the effect of species representation on the detection of positive selection in primate gene data sets ucsf chimera-a visualization system for exploratory research and analysis structural basis for receptor recognition by the novel coronavirus from wuhan on the origin and continuing evolution of sars-cov- structural insights into coronavirus entry structure, function, and antigenicity of the sars-cov- spike glycoprotein receptor recognition by novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars datamonkey . 
: a modern web application for characterizing selective and other evolutionary processes comparative protein structure modeling using modeller summary of probable sars cases with onset of illness from evidence of recombination in coronaviruses implicating pangolin origins of ncov- treesaap: selection on amino acid properties using phylogenetic trees isolation and characterization of -ncov-like coronavirus from malayan pangolins structural basis for the recognition of the sars-cov- by full-length human ace paml : phylogenetic analysis by maximum likelihood cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains a pneumonia outbreak associated with a new coronavirus of probable bat origin a novel coronavirus from patients with pneumonia in china the authors declare that they have no conflicts of interest. key: cord- -y bk jc authors: caruso, Ícaro p.; sanches, karoline; da poian, andrea t.; pinheiro, anderson s.; almeida, fabio c. l. title: dynamics of the n-terminal domain of sars-cov- nucleocapsid protein drives dsrna melting in a counterintuitive tweezer-like mechanism date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: y bk jc the n protein of betacoronaviruses is responsible for nucleocapsid assembly and other essential regulatory functions. its n-terminal domain (ntd) interacts and melts the double-stranded transcriptional regulatory sequences (dstrs), regulating the discontinuous subgenome transcription process. here, we used molecular dynamics (md) simulations to study the binding of sars-cov- n-ntd to non-specific (ns) and trs dsrnas. we probed dsrnas’ watson and crick (wc) base-pairing over replicas of ns md simulations, showing that only one n-ntd of dimeric n is enough to destabilize dsrnas, initiating melting. n-ntd dsrna destabilizing activity was more efficient for dstrs than dsns. 
n-ntd dynamics, especially a tweezer-like motion of β -β and -β loops, played a key role in wc base-pairing destabilization. based on experimental information available in the literature, we constructed kinetic models for n-ntd-mediated dsrna melting. our results support a : stoichiometry (n-ntd:dsrna), matching md simulations and raising different possibilities for n-ntd action: (i) two n-ntds of dimeric n would act independently, increasing efficiency; (ii) two n-ntds of dimeric n would bind to two different rna sites, bridging distant regions of the genome; and (iii) monomeric n would be active, opening up the possibility of a regulatory dissociation event. importance: coronaviruses are among the largest positive-sense rna viruses. they display a unique discontinuous transcription mechanism, involving n protein as a major player. the n-ntd promotes dsrna melting, releasing the nascent negative-sense strand via a poorly understood mechanism of action. it specifically recognizes the body trs conserved rna motif located at the ' end of each orf. n protein has the ability to transfer the nascent rna strand to the leader trs. the mechanism is essential, and a single mutation at the rna binding site of the n-ntd impairs viral replication. here, we describe a counterintuitive mechanism of action of n-ntd based on molecular dynamics simulation and kinetic modelling of the experimental melting activity of n-ntd. these data directly impact the understanding of how n protein acts in the cell and will guide future experiments. the recent pandemic of severe acute respiratory syndrome coronavirus (sars-cov- ), the causative agent of coronavirus disease (covid- ) , has become a global health emergency ( , ) . sars-cov- is an enveloped virus containing a large nonsegmented positive-sense single-stranded rna genome, belonging to the coronaviridae family ( , ) . 
the ' two-thirds of the coronaviruses' genome, corresponding to orf a/b, is translated into two polyproteins (pp a and pp ab) that are proteolytically processed into sixteen nonstructural proteins (nsps) ( ) . these nsps assemble in the viral replicasetranscriptase complex (rtc) at the endoplasmic reticulum membrane, being responsible for genome replication and transcription ( ) . conversely, the ' one-third of the genome is translated into accessory proteins as well as the four structural proteins, spike (s), membrane (m), envelope (e), and nucleocapsid (n), through a unique process of subgenomic mrna (sgmrna) transcription ( , ) . n is one of the most abundant viral proteins in the infected cell. it is a -kda multifunctional rna-binding protein that drives the viral rna packaging into a helical nucleocapsid ( ) . in addition, n localizes at the rtc at early stages of infection and plays a central role in the regulation of rna synthesis ( ) ( ) ( ) . it is composed of two functionally distinct folded domains, which are interspersed by an intrinsically disordered linker region enriched in arginine and serine residues. both the two domains and the linker region contribute individually to rna binding ( ) . the n-terminal domain (ntd) has been shown to interact with regulatory rna sequences during subgenome transcription, whereas the c-terminal domain (ctd) is responsible for n protein dimerization, which is crucial for nucleocapsid assembly ( , ) . the recently reported solution structure of sars-cov- n-ntd reveals a right hand-like fold, composed of a five-stranded central β-sheet flanked by two short α-helices, arranged in a β -β -β -β -β topology ( ) . the β-sheet core is referred to as the hand's palm, while the long β -β hairpin, mostly composed of basic amino acid residues, corresponds to the basic finger. the positively-charged cleft between the basic finger and the palm has been suggested as a putative rna binding site ( ) . 
genome replication is a continuous process in coronaviruses. in contrast, transcription is discontinuous and involves the production of sgmrnas ( ) . regulation of sgmrna synthesis is dependent on transcriptional regulatory sequences (trss) located either at the ' end of the positive-strand rna genome, known as the leader trs (trs-l), or at the ' end of each viral gene coding for structural and accessory proteins, called the body trs (trs-b). trs-l and trs-b share an identical core sequence, which allows for a template switch during sgmrna synthesis. once the trs-b has been copied, the nascent negative-strand rna is transferred to trs-l and transcription is terminated ( , ) . multiple well-orchestrated factors, including trs secondary structure, rna-rna and rna-protein interactions, influence sgmrna transcription ( ) . coronaviruses' n-ntd specifically interacts with the trs and efficiently melts a trs-ctrs rna duplex, facilitating template switch and playing a pivotal role in the regulation of discontinuous transcription ( , , , ) . despite its relevance for the viral replication cycle, the molecular basis underlying the specificity of interaction of sars-cov- n-ntd with the trs sequence remains elusive. thus, understanding the mechanism by which sars-cov- n-ntd specifically recognizes trs rna at atomic detail is paramount for the rational development of new antiviral strategies. here, we present a hypothesis for the molecular mechanism of dsrna melting activity of sars-cov- n-ntd. we showed by molecular dynamics (md) simulations ( replicas of ns) that n-ntd destabilizes dsrna's watson and crick base-pairing by dropping down intramolecular hydrogen bonds and perturbing the local rigid-body geometric parameters of dsrna. the destabilization is more significant for trs than for a non-specific (ns) dsrna sequence. 
moreover, a tweezer-like motion between β -β and -β loops of n-ntd seems to be a key dynamic feature for selectivity and, consequently, dsrna melting activity. we also constructed kinetic models for characterizing the melting activity of the dimeric n protein assuming : and : (n-ntd:dsrna) stoichiometries, revealing that only one n-ntd is enough for dsrna melting. we calculated the structural model of the n-ntd:dstrs complex based on the experimental data for the n-ntd interaction with a non-specific dsrna ( '-cacugac- ') (dsns) ( ) using the haddock . server ( ) . the structural restraints of the n-ntd:dsns complex were defined from csps titration performed by dinesh et al. ( ) ( ) . the lowest-energy structure of the n-ntd:dsns complex from the cluster with the lowest haddock score (fraction common contacts = . ± . Å, interface-rmsd = . ± . Å, and ligand-rmsd = . ± . Å) was used to mutate the dsns molecule to obtain the trs sequence ( '-ucuaaac- ') and, therefore, to generate the n-ntd:dstrs complex structure. figure a shows the structural model of the n-ntd:dstrs complex, in which the trs rna is inserted in a cleft located between the large protruding β -β loop, named finger, and the central -sheet of n-ntd, referred to as the palm. analysis of the electrostatic surface potential of n-ntd revealed that the dsrna-binding pocket is positively charged, with the finger being the highest charged region ( figure s ). this result is consistent with the charge complementarity of the nucleic acid phosphate groups that exhibit negative charge. it is worth mentioning that the orientation of the trs sense strand in the complex model is in agreement with experimental results described by keane et al. ( ) for the murine hepatitis virus structural homolog of n-ntd ( ) , in which the 'end of the sense strand binds close to β -β -β and the '-end binds next to β -β ( figure a ). 
we performed calculations of ns molecular dynamics (md) simulations to investigate the stability of the structural models of n-ntd in complex with either dsns or dstrs, as well as each of the biomolecules separately (dsns, dstrs, and n-ntd). figure b to estimate the stability of the watson-crick (wc) base-pairing of dsns and dstrs complexed with n-ntd, we evaluated the intramolecular hydrogen bonds formed between sense and anti-sense strands of the dsrna bound to n-ntd. in addition, the hydrogen bonds of the free rna molecules were investigated as a control parameter. this confirms the consistency of the force field used to describe the studied molecular system. for the n-ntd-bound dsrnas, we observed a decrease in the overall average number of intramolecular hydrogen bonds for both rna ligands, accompanied by an increase in respective standard deviations. this increase in standard deviation is due to a significant reduction in the average number of intramolecular hydrogen bonds of particular replicas, specifically for dstrs (runs , , , and ) and for dsns (runs and ) ( figure a ). the n-ntd-induced reduction in the number of intramolecular hydrogen bonds between the sense and anti-sense strands of dsrnas was more pronounced for dstrs than dsns, as shown by the analysis of score profile in figure . in contrast to the decrease in the number of intramolecular hydrogen bonds between the sense and anti-sense strands of dsrnas (wc base-pairing) due to n-ntd binding, we observed an increase in the average number of intermolecular hydrogen bonds formed between the nitrogenous bases of dstrs and n-ntd (protein-rna interaction) along the ns md simulation, whereas for dsns, this average value was constant ( figure b top). it is noteworthy that dsns has more hydrogen bond-forming sites (acceptor and donor) than dstrs and, in spite of that, the average number of intermolecular hydrogen bonds for the n-ntd:dstrs complex is higher after ns simulation. figure b figure s ). 
the structural model of the n-ntd:dstrs complex presented in figure c suggests a hypothesis for the mechanism of action of n-ntd in which only one domain is capable of destabilizing the rna duplex, possibly leading to its dissociation and ultimately release of the rna single strands. figure ). investigation of the stretch, stagger, and shear distances for dsns and dstrs showed that the equilibrium population at ~ Å decreased for both dsrnas as a result of n-ntd binding. however, this reduction is more drastic for dstrs than dsns, as can be seen in the inset for the respective plots in figure . in addition to this reduction effect, we also verified that n-ntd-bound dstrs exhibited clear subpopulations at ~ , ~± . , and ~ Å for the stretch, stagger, and shear distances, respectively. however, this destabilization effect is more evident for the n-ntd:dstrs complex, since the above analysis of angle and distance parameters suggests an impairment of base-pairing planarity accompanied by an increase in the separation between the nitrogenous bases of the complementary dsrna strands upon n-ntd binding. this result agrees well with the analysis of the intramolecular hydrogen bonds formed between the sense and anti-sense dsrna strands (see figure a ). it is worth mentioning that, even though base-pairing destabilization was more pronounced for dstrs than dsns, dsns suffered a greater reduction in the rna duplex twist, as suggested by the n-ntd-induced perturbation of the propeller angles. we also analyzed the population distributions of the local base-pair parameters for the replicas of dsns and dstrs in their free and n-ntd-bound states. figure s shows that the wc base-pairing perturbations observed in runs , , , and for dstrs due to n-ntd binding was also seen for the fully unbiased distribution generated from the replicas. 
nevertheless, an opposite effect occurred for dsns, for which the angle and distance parameters exhibited characteristics of stability and/or slight fluctuations around the equilibrium population. figure c ). an investigation of the motions filtered from the eigenvectors of pc and pc revealed that dstrs-bound n-ntd exhibited the largest conformational dynamics when compared to free and dsns-bound n-ntd, which were similar ( figure d and e). we highlight that the most evident motions took place in the n- and c-termini as well as the basic finger (β -β loop) for both free and dsrna-bound n-ntd. however, the eigenvectors of pc and pc for the n-ntd:dstrs complex suggested a wide motion between the basic finger and the -β loop, located at the palm, similar to a tweezer. interestingly, this tweezer-like motion was intrinsic to the residues located at the dsrna-binding cleft in n-ntd ( figure a ). our results of conformational flexibility from rmsf and pca for free and dsrna-bound n-ntd corroborated each other and suggest a significant contribution of the n- and c-termini and the basic finger (β -β loop) to n-ntd dynamics. they also revealed that n-ntd interaction with dstrs led to a general gain in protein conformational flexibility when compared to its free state. we suggest that this flexibility gain of dstrs-bound n-ntd over replicas of concatenated simulations may be a key structural factor to promote dstrs wc base-pairing destabilization upon n-ntd binding, as determined by the break of intramolecular hydrogen bonds ( figure b and a) and perturbation of the local base-pair parameters (figure ). increasing n-ntd concentration led to the dsrna melting curve, which is characterized by an exponential decay of fret efficiency as a function of n-ntd concentration. the melting curves reached either zero, for an n-ntd construct that contains the c-terminal serine/arginine (sr)-rich motif, or a plateau, for n-ntd itself ( ) . 
since the fret efficiency is a measure of the molar fraction of dsrna, in the simulated kinetic models presented here, we report the molar fraction of dsrna as a function of n-ntd concentration, simulating the dsrna melting curve. we used the software kinetiscope (http://hinsberg.net/kinetiscope/), which is based on a stochastic algorithm developed by bunker ( ) and gillespie ( ) . we used the elementary rate constants for individual chemical steps to produce an absolute time base ( figure a ). the starting condition mimics exactly the experimental condition, varying the concentration of n-ntd over nm dsrna (dstrs). the predictions were validated by direct comparison to the experimental data ( ) . to simulate the melting curve, we had to constrain the kinetic space, which is large because each model is composed of reactions and individual rate constants, assuming the following boundaries: (b ) the kinetic model must be complete, comprising all possible reactions for a given mechanism; (b ) the presence of n-ntd must lead to catalysis, with the melting of dsrna being faster than the annealing reaction; (b ) the equilibrium of the annealing is shifted toward the dsrna; and (b ) the equilibrium for the melting activity must be reached in a few seconds or less to be efficient in the cellular environment. the criterion for choosing the rate constants for the annealing reaction (r , figure ) was that it must be significantly slower than the melting activity (catalysis). since, to our knowledge, there is no experimental kinetic rate constant available for the annealing of dstrs, we fixed for the simulations a k off = × - s - , which is the experimental value of the dissociation rate constant observed for the almost inactive y a n-ntd+sr mutant ( ) . this mutant has a melting activity of hours, while our simulation showed melting activities of a few seconds ( figure b ). 
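the stochastic simulation approach mentioned above (the gillespie algorithm) can be illustrated with a deliberately reduced toy scheme: a duplex that melts into two single strands and re-anneals. this is a sketch under stated assumptions, not a reproduction of the kinetiscope models: it omits the n-ntd binding steps, works in molecule counts rather than concentrations, and the function name and rate constants are hypothetical.

```python
import random


def gillespie_melting(n_duplex, k_melt, k_anneal, t_end, seed=0):
    """Gillespie simulation of duplex <-> sense + antisense.

    n_duplex: initial number of duplex molecules.
    k_melt:   stochastic rate constant for melting (per duplex, per time unit).
    k_anneal: stochastic rate constant for annealing (per strand pair, per time unit).
    Returns a list of (time, duplex_count) points.
    """
    rng = random.Random(seed)
    duplex, sense, anti = n_duplex, 0, 0
    t = 0.0
    trajectory = [(t, duplex)]
    while t < t_end:
        a_melt = k_melt * duplex            # propensity of melting
        a_anneal = k_anneal * sense * anti  # propensity of annealing
        a_total = a_melt + a_anneal
        if a_total == 0.0:
            break                           # no reaction can fire any more
        t += rng.expovariate(a_total)       # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_melt:
            duplex, sense, anti = duplex - 1, sense + 1, anti + 1
        else:
            duplex, sense, anti = duplex + 1, sense - 1, anti - 1
        trajectory.append((t, duplex))
    return trajectory


# Hypothetical parameters, for illustration only.
traj = gillespie_melting(n_duplex=100, k_melt=0.5, k_anneal=0.01, t_end=20.0)
final_time, final_duplex = traj[-1]
print(f"duplex fraction at t = {final_time:.2f}: {final_duplex / 100:.2f}")
```

dividing the duplex count by the initial count gives a molar fraction of duplex comparable in spirit to the quantity reported in the melting curves; a full model would add the protein-bound species and their reactions as additional propensities.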
to yield an equilibrium shifted toward the dsrna, we used k on = × - m - s - , which is true below the melting temperature of the dsrna. any values of k off < s - , with an association constant k a , gives the same molar fraction of dsrna. we constrained the binding reactions r and r of n-ntd to the sense (trs) and antisense (ctrs) single-stranded rna (ssrna) ( figure a ) based on the published experimental values for these association constants ( , ) . since these values were very similar, to simplify the simulation, we used the same k a for both reactions (k a = × m - ). note that k on < m - s - makes the reaction too slow to reach equilibrium, violating boundary b (figures s a and s b ). for dsrna (dstrs) binding, there was no experimental data to constrain the simulation. however, simulations unambiguously showed that k a for reaction r must be of the same order of that for ssrnas, leading to the allowed ranges depicted in figure a . we also determined k on based on the simulations, taking boundaries b and b into consideration, which were also considered for reactions r to r (figure s b, s a and s b). all the constraints applied to reactions r to r are valid for both kinetic models (models and ). conversely, reactions r and r are specific for each kinetic model, being essential to comply with boundary b . for model , there is no experimental data available to constrain reactions r and r , but the simulations showed that they are tightly related to reactions r and r , being both k a and k off of the same order of magnitude for reactions r and r ( figure s c ). note that there is an intricate relationship between the formation of ssrna-bound states (c and c ) and the decrease of free or bound dsrna (dstrs and c ). to illustrate this relationship, figure b shows the kinetics at three concentrations of n-ntd. the simulated melting curves for model resembled the near exponential decay observed experimentally ( figure c, left) . 
interestingly, when the k off values of reactions r and r were larger than those of reactions r and r , we observed a plateau in the exponential decay of the dsrna melting curve (figure c). remarkably, melting curves that either decayed to zero or reached a plateau were observed experimentally, as mentioned before ( ) . it is worth mentioning that the kinetic model is fully compatible with the experimental data by grossoehme et al. ( ) ( ), as well as with the mechanism suggested by the md simulations, in which one n-ntd can initiate dsrna melting by destabilizing the wc base-pairing. we also evaluated kinetic model . this mechanism for n-ntd melting activity was suggested in the conclusion scheme drawn in the paper by grossoehme et al. ( ) . to build a kinetic model that would exclusively produce ssrna from the sandwiched dsrna, we had to replace reactions r and r of kinetic model . in the new model, reaction r forms the sandwiched dsrna (c , figure a) and reaction r is the dissociation of c into the ssrna-bound n-ntds (c and c , figure a). to simulate n-ntd melting activity considering model , we used the same boundaries described earlier (b , b , b and b ), with reactions r to r having almost the same constraints described for model . reactions r and r of model have no parallel to any other reaction. we scanned all the kinetic space that led to the catalysis of melting activity and observed two contrasting situations. the first is when the reaction r equilibrium is between - and m - , so that the dissociated forms c and c are always available and the melting curve is very steep (model a). the second is the opposite situation, in which the equilibrium is shifted toward the sandwich state (c ) with k a > m - (model b). figure c illustrates the melting curves obtained for the two situations. model a is characterized by highly efficient dissociation of the dsrna; k on and k off can assume any value.
particularly for model a, the kinetics of dsrna melting are also independent of k on for reactions r and r at fixed concentrations of n-ntd. all simulated conditions led to the curve in red (figure c), in which the minimal amount of n-ntd ( nm) led to complete dissociation of the dsrna (molar fraction of zero). figure s illustrates all the simulated boundaries. note that for model a, there is never an accumulation of c (figure s c). model b corresponds to the case in which the equilibrium of reaction r is shifted toward c (k a > m - ). figure s illustrates the reaction boundaries. in this situation, we were able to observe a melting curve (figure c, blue) with a near-exponential decay at low concentrations of n-ntd and a near-exponential rise at higher concentrations of n-ntd. for model b, we determined that k on has to be > s - to comply with boundaries b and b . for the melting activity to take place, the equilibrium of reaction r was shifted toward c (k a > , for model a or m - for model b) (figures s b and s c).

in the present work, we used computational simulations to unravel the dsrna melting activity of the isolated sars-cov- n-ntd. our molecular dynamics data suggested that, during interaction with dsrna, protein dynamics drive the destabilization of hydrogen bonds involved in the wc rna base-pairing, probably in a : stoichiometry (n-ntd:dsrna). we also showed that the capacity of n-ntd to break the wc base-pairing was sequence-specific, being more efficient for dstrs ( '-ucuaaac- ') than for a non-specific (ns) sequence ( '-cacugac- '). to further explore the n-ntd:dsrna stoichiometry, we constructed kinetic models based on experimental data by grossoehme et al ( ) ( ) . remarkably, the model using a : stoichiometry fits the experimental data well, reinforcing the mechanism we hypothesize here.
the strategy of performing ns-molecular dynamics simulations with the same starting structure but different seeds of the random number generator provided a large sampling of the conformational space of each molecular system (n-ntd, dsrnas, and n-ntd/dsrna complexes). this set of theoretical data ensured a statistically significant result showing that n-ntd destabilizes the wc base-pairing, especially for dstrs, through the replacement of intramolecular hydrogen bonds between the dsrna strands by intermolecular hydrogen bonds between n-ntd and the nitrogenous bases of each rna strand. the results also revealed, in an unbiased manner, that the rigid-body geometric parameters of the wc base-pairing were significantly changed by n-ntd binding. one notable n-ntd structural feature is the presence of a significant number of loops; only out of residues are involved in secondary structure ( , ) . this is a typical feature of a dynamic protein. in fact, our results revealed that n-ntd is a plastic protein, with the n and c-termini and the β -β loop (finger) as the most prominent dynamic regions. for the n-ntd:dstrs interaction, a remarkable tweezer-like motion between the finger and the -β loop could be related to the sequence-specific destabilization of the wc base-pairing. this led us to hypothesize that, following formation of the n-ntd:dstrs complex, the tweezer-like motion resulting from intrinsic protein dynamics might promote a steric effect causing a "compaction pressure" on the dsrna strands. this might expose residues from the bottom of the palm (finger/ -β cleft), allowing their interaction with the bases and leading to destabilization of the wc base-pairing (figure ). to confirm the mechanism that emerged from the md simulations, in which only one molecule of n-ntd was enough to initiate dsrna melting, we constructed kinetic models considering two possible scenarios: a stoichiometry of : or : for n-ntd and dsrna. the : stoichiometry is more intuitive since n protein is dimeric in solution ( ) .
to perform the docking, we took advantage of experimental data previously

next, the structural model of the n-ntd:dstrs ( '-ucuaaac- ') complex was generated from the lowest-energy structure of the n-ntd:dsns complex, derived from the cluster with the lowest haddock score, by mutating the dsrna sequence using w dna ( ) . therefore, both complexes have identical geometries, varying only in the dsrna sequences. the structural conformation of the constructed model of the n-ntd:dstrs complex was displayed using the web application http://skmatic.x dna.org, which allows easy creation of dssr (dissecting the spatial structure of rna)-pymol schematics ( ) .

molecular dynamics (md) calculations for n-ntd, dsrnas, and n-ntd:dsrna complexes were performed using gromacs (version . . ) ( ) . the molecular systems were modeled with the corrected amber -ol package, including the ff sb protein ( ) and ff bsc χ ol +ol nucleic acid ( , ) force fields, as well as the tip p water model ( ) . the structural models of n-ntd (pdb yi ), dsrnas (mutated pdb u ), and n-ntd:dsrna complexes (from molecular docking) were placed in the center of a cubic box solvated by a solution of mm nacl in water. the protonation state of ionizable residues was set according to the propka server ( ) considering ph . . periodic boundary conditions were used and all simulations were performed in the npt ensemble, keeping the system at k and . bar using the nose-hoover thermostat (τ t = ps) and the parrinello-rahman barostat (τ p = ps and compressibility = . - ·bar - ). a cutoff of Å was used for both lennard-jones and coulomb potentials. the long-range electrostatic interactions were calculated using the particle mesh ewald (pme) algorithm. in every md simulation, a time step of . fs was used and all covalent bonds involving hydrogen atoms were constrained to their equilibrium distance. a conjugate gradient minimization algorithm was used to relax the superposition of atoms generated in the box construction process.
energy minimizations were carried out with the steepest descent integrator and the conjugate gradient algorithm, using , kj·mol - ·nm - as the maximum force criterion. one hundred thousand steps of molecular dynamics were performed for each

directly with the n-ntd. we suggest that this activity is a consequence of the intrinsic dynamics of n-ntd, especially because of the tweezer-like motion between the β -β (finger) and -β loops. the protein is shown as a cartoon with the helix and β-strand secondary structures colored in cyan and orange, respectively. the dsrna is shown as a line model with the complementary strands colored in red and blue. the tweezer-like motion between the finger and -β loop is indicated by bidirectional arrows colored in magenta.

a novel coronavirus from patients with pneumonia in china
sars-cov- structure and replication characterized by in situ cryo-electron tomography
insights into sars-cov- genome, structure, evolution, pathogenesis and therapies: structural genomics approach
proteolytic processing of polyproteins a and ab between non-structural proteins and / of coronavirus infectious bronchitis virus is dispensable for viral replication in cultured cells
ultrastructure and origin of membrane vesicles associated with the severe acute respiratory syndrome coronavirus replication complex
unique and conserved features of genome and proteome of sars-coronavirus, an early split-off from the coronavirus group lineage
mechanisms and enzymes involved in sars coronavirus genome expression
analyses of coronavirus assembly interactions with interspecies membrane and nucleocapsid protein chimeras
coronavirus n protein n-terminal domain (ntd) specifically binds the transcriptional regulatory sequence (trs) and melts trs-ctrs rna duplexes
the coronavirus nucleocapsid protein is dynamically associated with the replication-transcription complexes
coronavirus nucleocapsid protein facilitates template switching and is required for efficient transcription
multiple nucleic acid binding sites and intrinsic disorder of severe acute respiratory syndrome coronavirus nucleocapsid protein: implications for ribonucleocapsid protein packaging
the nucleocapsid protein of the sars coronavirus is capable of self-association through a c-terminal amino acid interaction domain
the coronavirus nucleocapsid is a multifunctional protein
structural basis of rna recognition by the sars-cov- nucleocapsid phosphoprotein
continuous and discontinuous rna synthesis in coronaviruses
sequence motifs involved in the regulation of discontinuous coronavirus subgenomic rna synthesis
the sars coronavirus nucleocapsid protein -forms and functions
user-friendly integrative modeling of biomolecular complexes
functional transcriptional regulatory sequence (trs) rna binding and helix destabilizing determinants of murine hepatitis virus (mhv) nucleocapsid (n) protein
structural bioinformatics do_x dna: a tool to analyze structural fluctuations of dsdna or dsrna from molecular dynamics simulations
dna: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures
discrete simulation methods in combustion kinetics
a general method for numerically simulating the stochastic time evolution of coupled chemical reactions
crystal structure of sars-cov- nucleocapsid protein rna binding domain reveals potential unique drug targeting sites
biochemical characterization of sars-cov- nucleocapsid protein
crystal structure studies of rna duplexes containing s u:a and s u:u base pairs
web dna . for the analysis, visualization, and modeling of d nucleic acid structures
propka : consistent treatment of internal and surface residues in empirical p k a predictions
refinement of protein structures in explicit solvent
dssr-enabled innovative schematics of d nucleic acid structures with pymol
gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers
ff sb: improving the accuracy of protein side chain and backbone parameters from ff sb
refinement of the sugar-phosphate backbone torsion beta for amber force fields improves the description of z-and b-dna
refinement of the cornell et al. nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles
comparison of simple potential functions for simulating liquid water
the pymol molecular graphics system

model of the n-ntd:dsrna complex and its validation from molecular dynamics simulations. (a) structural model of the n-ntd:dstrs complex determined by molecular docking calculations and mutation of the dsns nucleotide sequence. the color of the rectangles corresponds to the nitrogenous base of the dsrna sense strand, namely a: red, c: yellow, u: cyan, and g: green. the large protruding β -β loop is referred to as the finger. (b) average rmsd values for dsns and dstrs in their free and n-ntd-bound states (top), average rmsd values for n-ntd in its free and dsrna-bound state (middle), and average number of contacts between n-ntd and dsrna atoms

the authors declare that no conflict of interest exists.